AESOP AI Academy — Curriculum Review Rubric

Your Assigned Role

Reviewer Assignment

See assignment below

Reviewer	Primary Lens	What to emphasize
Claude	Narrative & Curriculum Integrity	Is story doing pedagogical work? Is the curriculum architecture sound?
Gemini	Technical & Factual Accuracy	Are AI concepts, definitions, and cited cases correct and current?
ChatGPT	Learner Experience & Accessibility	Can a real learner at each level navigate, understand, and complete the work?
Perplexity	Real-World Alignment & Currency	Does content reflect the current AI landscape? Are sources and cases current?

Program Philosophy — Read Before Evaluating

The AESOP Model: Story Is How Humans Learn

The AESOP AI Academy is built on one foundational principle: story is how humans learn — not how they are entertained. This is not a stylistic choice. Every lesson is designed so the narrative creates the problem, and the concept section names what the story already demonstrated.

The curriculum is structured across three proficiency levels (Intro, Basic, Advanced) and five modules. The master content lives at the Advanced level; Intro and Basic are intentional subsets — not simplified versions of Advanced content, but curated selections delivered through a completely different lens appropriate to that level.

The highest-priority question in every evaluation is: After completing this lesson, can the learner actually DO something they could not do before? Not recall a definition — perform a task, make a judgment, or run an experiment.

Proficiency Levels

Ages 5–8

Intro

100% story-driven. Concrete, sensory language. The question being answered is: "What does it do?" No abstraction. No technical vocabulary.

Ages 9–12

Basic

~50–65% narrative. Relational/logical language. The question being answered is: "How does it work?" Concepts are named but explained through analogy.

Ages 13–18

Advanced

~20–35% narrative. Technical/systemic language. The question being answered is: "Why does it matter and what are the stakes?" Real documented cases. Full technical vocabulary.

Scoring Model

How to Score

Score	Rating	Meaning
0	Absent	Not present / completely fails
1	Poor	Attempted but misses the mark
2	Partial	Present but incomplete
3	Adequate	Meets expectations
4–5	Strong	Exceeds expectations

Score every criterion 0–5. Criteria marked ×2 are weighted: the raw 0–5 score is doubled, so each is worth up to 10 points. Total possible score per unit: 100 points. You score ALL five dimensions regardless of your primary role; your role only defines where to apply the most critical attention.
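The arithmetic above can be sketched in Python. This is a minimal sketch under stated assumptions, not the scoring dashboard's implementation: the `WEIGHTS` table and the function names are hypothetical, while the criteria and ×2 marks are copied from the criterion tables later in this rubric. The sketch expects a raw score for every criterion, including Real-World Case Fidelity, because the rubric does not say how that criterion is handled for non-Advanced units.

```python
# Hypothetical names throughout; weights mirror the rubric's criterion tables.
WEIGHTS = {
    "Narrative Integrity": {
        "Story Creates the Problem": 2,   # marked x2 in the rubric
        "Learner Lands the Insight": 1,
        "Narrative Density Match": 1,
        "Character Consistency": 1,
    },
    "Concept Accuracy": {
        "Definition Accuracy": 2,         # marked x2
        "Real-World Case Fidelity": 1,
        "Misconception Prevention": 1,
    },
    "Level Appropriateness": {
        "Vocabulary Calibration": 1,
        "Cognitive Framing": 2,           # marked x2
        "Subset Integrity": 1,
    },
    "Delivery Architecture": {
        "Learner Orientation": 1,
        "Pacing Support": 1,
        "Story-Concept Flow": 1,
    },
    "Applied Outcome": {
        "Lab Executability": 2,           # marked x2
        "Quiz Tests Judgment": 1,
        "Clear Capability Delta": 1,
    },
}

MAX_RAW = 5  # every criterion is scored 0-5 before weighting


def dimension_max(dimension):
    """Maximum weighted points available in one dimension."""
    return sum(MAX_RAW * weight for weight in WEIGHTS[dimension].values())


def dimension_score(dimension, raw_scores):
    """Weighted points for one dimension, given raw 0-5 scores by criterion."""
    total = 0
    for criterion, weight in WEIGHTS[dimension].items():
        raw = raw_scores[criterion]
        if not 0 <= raw <= MAX_RAW:
            raise ValueError(f"{criterion}: raw score {raw} is outside 0-{MAX_RAW}")
        total += raw * weight
    return total


def weighted_total(scores):
    """Unit total out of 100, given {dimension: {criterion: raw score}}."""
    return sum(dimension_score(dim, scores[dim]) for dim in WEIGHTS)
```

With these weights the dimension maxima come out to 25, 20, 20, 15, and 20, which matches the point values stated for each dimension and sums to 100.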

The Five Evaluation Dimensions

Narrative Integrity

Primary role: Claude · All reviewers score

25 pts

Does the narrative do actual pedagogical work — or is it decoration? Story must create the problem that the concept section answers.

Criterion	Evaluating Question	Max / Weight
Story Creates the Problem	Does the narrative create the exact problem or question that the concept section then answers? Or does the story feel disconnected?	10 pts (×2)
Learner Lands the Insight	Does the protagonist arrive at the insight themselves through the story, or does an adult/narrator explain it to them?	5 pts
Narrative Density Match	Is the story-to-concept ratio calibrated correctly for this level? (Intro = high story; Advanced = scenario hooks + dense concept)	5 pts
Character Consistency	Are established characters used consistently? Does the narrative feel like a continuous experience?	5 pts

0–1: Poor

Story is decoration. Concepts are text blocks with a thin narrative wrapper. Learner is told, not shown.

2–3: Adequate

Story is present and related, but the protagonist doesn't earn the insight — an adult explains it. Density roughly right but drifts.

4–5: Strong

The story creates a genuine problem. The learner character works it out. You couldn't remove the story without destroying the lesson.

Concept Accuracy

Primary role: Gemini · All reviewers score

20 pts

Wrong technical content creates confident misconceptions. Every definition must be correct at the depth appropriate to the level.

Criterion	Evaluating Question	Max / Weight
Definition Accuracy	Are AI terms (tokens, RLHF, hallucination, emergence, transformer, etc.) correctly defined at the appropriate depth for this level?	10 pts (×2)
Real-World Case Fidelity	(Advanced only) Are cited cases (Lemoine/LaMDA, Schwartz attorney, NYT v. OpenAI, etc.) described accurately and without distortion?	5 pts
Misconception Prevention	Does the content actively avoid and counter common AI misconceptions? (AI "thinks," AI "knows," AI is "neutral," AI is "magic")	5 pts

0–1: Poor

Definitions vague or wrong. Common misconceptions reinforced. Real cases absent or misrepresented.

2–3: Adequate

Core concepts roughly correct but lack precision. Misconceptions not reinforced but not corrected either.

4–5: Strong

Definitions precise and level-appropriate. Misconceptions named and countered. Advanced cases accurate with correct attribution and stakes.

Level Appropriateness

Primary role: ChatGPT · All reviewers score

20 pts

Each level (Intro / Basic / Advanced) must feel purposefully designed for that learner — not adapted from another level. Subset-not-simplification: Intro covers fewer concepts with a completely different framing, not watered-down Advanced content.

Criterion	Evaluating Question	Max / Weight
Vocabulary Calibration	Is vocabulary genuinely matched to the level? Intro = concrete/sensory; Basic = relational/logical; Advanced = technical/systemic. Not just simpler words.	5 pts
Cognitive Framing	Is the right question being asked for this level? Intro = "What does it do?"; Basic = "How does it work?"; Advanced = "Why does it matter and what are the stakes?"	10 pts (×2)
Subset Integrity	Does Intro/Basic content feel purposefully curated, or like truncated Advanced content?	5 pts

0–1: Poor

Feels like stripped-down Advanced. Vocabulary condescendingly oversimplified or accidentally too complex. "Dumbed down" rather than genuinely designed for the level.

2–3: Adequate

Vocabulary roughly calibrated but inconsistent. Framing appropriate in some sections but slips in others.

4–5: Strong

Each level feels written for that learner specifically. Intro feels naturally concrete; Advanced feels naturally systemic. Every concept in a lower level serves that level's complete learning outcome.

Delivery Architecture

All reviewers · Equal weight

15 pts

Does the structure of how content is delivered serve the learner? Navigation, pacing, and layout are only meaningful insofar as they help or hinder the learner's experience. This is not a technical audit.

Criterion	Evaluating Question	Max / Weight
Learner Orientation	Can a learner immediately understand where they are, how much remains, and what comes next — without needing instructions?	5 pts
Pacing Support	Does the delivery structure support the learner moving at a natural pace? Are there clear rest/break points? Does it feel rushed or padded?	5 pts
Story-Concept Flow	Does the layout make the transition from story → concept → lab → quiz feel natural and progressive, or jarring and arbitrary?	5 pts

0–1: Poor

A learner could not proceed without external guidance. Section transitions feel arbitrary.

2–3: Adequate

Learner can navigate but requires effort. Story-to-concept transitions work but feel mechanical.

4–5: Strong

Delivery feels invisible — learner is never thinking about navigation, only about the lesson. Story → concept → lab → quiz feel like one continuous experience.

Applied Outcome — Can the Learner DO Something?

⬆ HIGHEST PRIORITY DIMENSION · All reviewers

20 pts

Override Rule: If Dimension 5 (Applied Outcome) scores below 8/20, the unit cannot pass regardless of total score. A lesson where learners cannot do anything after completing it has failed its core purpose.
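A minimal sketch of the Override Rule in Python, assuming raw 0–5 criterion scores with Lab Executability doubled as marked in this dimension's criterion table. The function and constant names are hypothetical. The rubric does not state an overall pass mark, so the sketch only reports whether the override blocks a pass.

```python
D5_OVERRIDE_FLOOR = 8  # from the Override Rule: below 8/20 cannot pass


def d5_score(lab_executability, quiz_tests_judgment, clear_capability_delta):
    """Weighted Dimension 5 score out of 20 (Lab Executability counts x2)."""
    for raw in (lab_executability, quiz_tests_judgment, clear_capability_delta):
        if not 0 <= raw <= 5:
            raise ValueError(f"raw score {raw} is outside 0-5")
    return 2 * lab_executability + quiz_tests_judgment + clear_capability_delta


def blocked_by_override(d5):
    """True when the unit cannot pass, whatever its total score is."""
    return d5 < D5_OVERRIDE_FLOOR


# Example: a vague lab (2) with recall-only quiz (1) and fuzzy delta (1).
score = d5_score(2, 1, 1)          # 2*2 + 1 + 1 = 6
print(score, blocked_by_override(score))
```

Because the rule says "below 8", a Dimension 5 score of exactly 8 is not blocked in this sketch.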

The entire AESOP philosophy collapses if learners walk away with facts but no capability. After completing this lesson, can the learner actually DO something they could not do before? Not recall a definition — perform a task, make a judgment, or run an experiment.

Criterion	Evaluating Question	Max / Weight
Lab Executability	Can the story lab actually be performed — right now, by this learner, using the stated tools (AESOP or an LLM)? Is the task clearly defined and completable?	10 pts (×2)
Quiz Tests Judgment	Do quiz questions require the learner to apply, evaluate, or decide — rather than simply recall a definition or fact they just read?	5 pts
Clear Capability Delta	Can you complete this sentence: "After this lesson, this learner can ___"? Is that capability something real and meaningful — not just "knows what X is"?	5 pts

0–1: Poor

Labs are vague. Quizzes test recall. At the end of the lesson, the learner knows more facts but has no new capability.

2–3: Adequate

Lab has a real task but is underspecified. Quiz has some application but mostly recall. Capability delta exists but is fuzzy.

4–5: Strong

Lab is fully defined — learner knows what to do, where, what the output looks like, and what "done" means. Quiz forces judgment. You can state the capability delta in one sentence with an action verb.

Required Output Format

How to Report Your Scores

After completing your evaluation, report your scores in the following format, one block per lesson unit reviewed. This format feeds directly into the scoring dashboard.

REVIEWER: [Your name — Claude / Gemini / ChatGPT / Perplexity]
UNIT: [Module number and age group, e.g. "Module 1 – Basic (Ages 9–12)"]
LEVEL: [Intro / Basic / Advanced]
DATE: [Today's date]

DIMENSION 1 — NARRATIVE INTEGRITY (max 25)
  Story Creates the Problem:    [0–5]
  Learner Lands the Insight:    [0–5]
  Narrative Density Match:      [0–5]
  Character Consistency:        [0–5]
  D1 Notes: [Brief notes on findings]

DIMENSION 2 — CONCEPT ACCURACY (max 20)
  Definition Accuracy:          [0–5]
  Real-World Case Fidelity:     [0–5]
  Misconception Prevention:     [0–5]
  D2 Notes: [Brief notes on findings]

DIMENSION 3 — LEVEL APPROPRIATENESS (max 20)
  Vocabulary Calibration:       [0–5]
  Cognitive Framing:            [0–5]
  Subset Integrity:             [0–5]
  D3 Notes: [Brief notes on findings]

DIMENSION 4 — DELIVERY ARCHITECTURE (max 15)
  Learner Orientation:          [0–5]
  Pacing Support:               [0–5]
  Story-Concept Flow:           [0–5]
  D4 Notes: [Brief notes on findings]

DIMENSION 5 — APPLIED OUTCOME (max 20) *** HIGHEST PRIORITY ***
  Lab Executability:            [0–5]
  Quiz Tests Judgment:          [0–5]
  Clear Capability Delta:       [0–5]
  D5 Notes: Complete this sentence — "After this lesson, this learner can ___"

TOTAL SCORE: [weighted sum of all criteria] / 100
D5 SCORE:    [weighted sum of Dimension 5 criteria] / 20
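Before submitting a block, the two totals can be cross-checked against the criterion lines. The following Python sketch is one way to do that, not how the scoring dashboard parses reports. It assumes reviewers enter raw 0–5 values on the criterion lines (with or without square brackets) and that the totals are weighted sums. `WEIGHTS`, `D5_CRITERIA`, and `recompute_totals` are hypothetical names.

```python
import re

# Weight per criterion; the four x2 criteria come from the rubric's tables.
WEIGHTS = {
    "Story Creates the Problem": 2,
    "Learner Lands the Insight": 1,
    "Narrative Density Match": 1,
    "Character Consistency": 1,
    "Definition Accuracy": 2,
    "Real-World Case Fidelity": 1,
    "Misconception Prevention": 1,
    "Vocabulary Calibration": 1,
    "Cognitive Framing": 2,
    "Subset Integrity": 1,
    "Learner Orientation": 1,
    "Pacing Support": 1,
    "Story-Concept Flow": 1,
    "Lab Executability": 2,
    "Quiz Tests Judgment": 1,
    "Clear Capability Delta": 1,
}
D5_CRITERIA = {"Lab Executability", "Quiz Tests Judgment", "Clear Capability Delta"}

# Indented "Name:   4" or "Name:   [4]" lines hold raw criterion scores.
CRITERION_LINE = re.compile(r"^\s+(?P<name>[^:]+):\s+\[?(?P<raw>\d+)\]?\s*$")
# Unindented "TOTAL SCORE: 80 / 100" and "D5 SCORE: 16 / 20" lines.
REPORTED_LINE = re.compile(
    r"^(?P<label>TOTAL SCORE|D5 SCORE):\s+(?P<value>\d+)\s*/\s*\d+\s*$"
)


def recompute_totals(report_text):
    """Return (computed, reported) totals for one filled-in report block."""
    computed = {"TOTAL SCORE": 0, "D5 SCORE": 0}
    reported = {}
    for line in report_text.splitlines():
        match = CRITERION_LINE.match(line)
        if match:
            name = match.group("name").strip()
            if name not in WEIGHTS:
                continue  # e.g. a Notes line that happens to end in a digit
            raw = int(match.group("raw"))
            if raw > 5:
                raise ValueError(f"{name}: raw score {raw} is outside 0-5")
            points = raw * WEIGHTS[name]
            computed["TOTAL SCORE"] += points
            if name in D5_CRITERIA:
                computed["D5 SCORE"] += points
            continue
        match = REPORTED_LINE.match(line)
        if match:
            reported[match.group("label")] = int(match.group("value"))
    return computed, reported
```

A block is internally consistent when `computed == reported`. A mismatch usually means a ×2 criterion was summed without doubling.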