The AESOP AI Academy is built on one foundational principle: story is how humans learn, not merely how they are entertained. This is not a stylistic choice. Every lesson is designed so that the narrative creates the problem and the concept section names what the story has already demonstrated.
The curriculum is structured across three proficiency levels (Intro, Basic, Advanced) and five modules. The master content lives at the Advanced level; Intro and Basic are intentional subsets — not simplified versions of Advanced content, but curated selections delivered through a completely different lens appropriate to that level.
The highest-priority question in every evaluation is: After completing this lesson, can the learner actually DO something they could not do before? Not recall a definition — perform a task, make a judgment, or run an experiment.
Intro: 100% story-driven. Concrete, sensory language. The question being answered is "What does it do?" No abstraction and no technical vocabulary.
Basic: ~50–65% narrative. Relational/logical language. The question being answered is "How does it work?" Concepts are named but explained through analogy.
Advanced: ~20–35% narrative. Technical/systemic language. The question being answered is "Why does it matter and what are the stakes?" Real documented cases and full technical vocabulary.
Score every criterion 0–5. Criteria marked ×2 are weighted: their 0–5 score is doubled, so each is worth up to 10 points. The total possible score per unit is 100 points. Score all five dimensions regardless of your primary role; your role only determines where to apply the most critical attention.
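The scoring arithmetic can be written out as a formula. The symbols $s_c$ (a criterion's raw score) and $w_c$ (its weight) are introduced here only for illustration. Across the five dimensions there are 16 criteria. Four of them are marked ×2: Story Creates the Problem, Definition Accuracy, Cognitive Framing, and Lab Executability.

$$
\text{Total} = \sum_{c=1}^{16} w_c\, s_c, \qquad s_c \in \{0, 1, \dots, 5\}, \qquad
w_c = \begin{cases} 2 & \text{if } c \text{ is marked } \times 2 \\ 1 & \text{otherwise} \end{cases}
$$

$$
\text{Max} = 4 \times (2 \times 5) + 12 \times (1 \times 5) = 40 + 60 = 100
$$

The per-dimension maxima in the report format add up to the same total: 25 + 20 + 20 + 15 + 20 = 100.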
Dimension 1: Narrative Integrity (max 25). Does the narrative do actual pedagogical work, or is it decoration? The story must create the problem that the concept section answers.
| Criterion | Evaluating Question | Max / Weight |
|---|---|---|
| Story Creates the Problem | Does the narrative create the exact problem or question that the concept section then answers? Or does the story feel disconnected? | 10 pts (×2) |
| Learner Lands the Insight | Does the protagonist arrive at the insight themselves through the story, or does an adult/narrator explain it to them? | 5 pts |
| Narrative Density Match | Is the story-to-concept ratio calibrated correctly for this level? (Intro = 100% story; Basic = ~50–65% narrative; Advanced = ~20–35% narrative, with scenario hooks and dense concept) | 5 pts |
| Character Consistency | Are established characters used consistently? Does the narrative feel like a continuous experience? | 5 pts |
Low: Story is decoration. Concepts are text blocks with a thin narrative wrapper. The learner is told, not shown.
Mid: Story is present and related, but the protagonist doesn't earn the insight; an adult explains it. Density is roughly right but drifts.
High: The story creates a genuine problem. The learner character works it out. You couldn't remove the story without destroying the lesson.
Dimension 2: Concept Accuracy (max 20). Wrong technical content creates confident misconceptions. Every definition must be correct at the depth appropriate to the level.
| Criterion | Evaluating Question | Max / Weight |
|---|---|---|
| Definition Accuracy | Are AI terms (tokens, RLHF, hallucination, emergence, transformer, etc.) correctly defined at the appropriate depth for this level? | 10 pts (×2) |
| Real-World Case Fidelity | (Advanced only) Are cited cases (Lemoine/LaMDA, the Schwartz attorney case in Mata v. Avianca, NYT v. OpenAI, etc.) described accurately and without distortion? | 5 pts |
| Misconception Prevention | Does the content actively avoid and counter common AI misconceptions? (AI "thinks," AI "knows," AI is "neutral," AI is "magic") | 5 pts |
Low: Definitions are vague or wrong. Common misconceptions are reinforced. Real cases are absent or misrepresented.
Mid: Core concepts are roughly correct but lack precision. Misconceptions are not reinforced, but they are not corrected either.
High: Definitions are precise and level-appropriate. Misconceptions are named and countered. Advanced cases are accurate, with correct attribution and stakes.
Dimension 3: Level Appropriateness (max 20). Each level (Intro / Basic / Advanced) must feel purposefully designed for that learner, not adapted from another level. Subset, not simplification: Intro and Basic cover fewer concepts through a completely different framing. They are not watered-down Advanced content.
| Criterion | Evaluating Question | Max / Weight |
|---|---|---|
| Vocabulary Calibration | Is vocabulary genuinely matched to the level? Intro = concrete/sensory; Basic = relational/logical; Advanced = technical/systemic. Not just simpler words. | 5 pts |
| Cognitive Framing | Is the right question being asked for this level? Intro = "What does it do?"; Basic = "How does it work?"; Advanced = "Why does it matter and what are the stakes?" | 10 pts (×2) |
| Subset Integrity | Does Intro/Basic content feel purposefully curated, or like truncated Advanced content? | 5 pts |
Low: Feels like stripped-down Advanced content. Vocabulary is condescendingly oversimplified or accidentally too complex: "dumbed down" rather than genuinely designed for the level.
Mid: Vocabulary is roughly calibrated but inconsistent. Framing is appropriate in some sections but slips in others.
High: Each level feels written for that learner specifically. Intro feels naturally concrete; Advanced feels naturally systemic. Every concept in a lower level serves that level's complete learning outcome.
Dimension 4: Delivery Architecture (max 15). Does the way content is delivered serve the learner? Navigation, pacing, and layout matter only insofar as they help or hinder the learner's experience. This is not a technical audit.
| Criterion | Evaluating Question | Max / Weight |
|---|---|---|
| Learner Orientation | Can a learner immediately understand where they are, how much remains, and what comes next — without needing instructions? | 5 pts |
| Pacing Support | Does the delivery structure support the learner moving at a natural pace? Are there clear rest/break points? Does it feel rushed or padded? | 5 pts |
| Story-Concept Flow | Does the layout make the transition from story → concept → lab → quiz feel natural and progressive, or jarring and arbitrary? | 5 pts |
Low: A learner could not proceed without external guidance. Section transitions feel arbitrary.
Mid: The learner can navigate, but it takes effort. Story-to-concept transitions work but feel mechanical.
High: Delivery feels invisible: the learner is never thinking about navigation, only about the lesson. Story → concept → lab → quiz feel like one continuous experience.
Dimension 5: Applied Outcome (max 20, highest priority). The entire AESOP philosophy collapses if learners walk away with facts but no capability. After completing this lesson, can the learner actually DO something they could not do before? The test is not whether they can recall a definition. It is whether they can perform a task, make a judgment, or run an experiment.
| Criterion | Evaluating Question | Max / Weight |
|---|---|---|
| Lab Executability | Can the story lab actually be performed — right now, by this learner, using the stated tools (AESOP or an LLM)? Is the task clearly defined and completable? | 10 pts (×2) |
| Quiz Tests Judgment | Do quiz questions require the learner to apply, evaluate, or decide — rather than simply recall a definition or fact they just read? | 5 pts |
| Clear Capability Delta | Can you complete this sentence: "After this lesson, this learner can ___"? Is that capability something real and meaningful — not just "knows what X is"? | 5 pts |
Low: Labs are vague. Quizzes test recall. At the end of the lesson, the learner knows more facts but has no new capability.
Mid: The lab has a real task but is underspecified. The quiz has some application but is mostly recall. A capability delta exists but is fuzzy.
High: The lab is fully defined: the learner knows what to do, where to do it, what the output looks like, and what "done" means. The quiz forces judgment. You can state the capability delta in one sentence with an action verb.
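After completing your evaluation, report your scores in the following format, one block per lesson unit reviewed. This format feeds directly into the scoring dashboard. Enter each criterion's raw 0–5 score. When computing dimension subtotals, TOTAL SCORE, and D5 SCORE, double the four criteria marked ×2: Story Creates the Problem, Definition Accuracy, Cognitive Framing, and Lab Executability.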
REVIEWER: [Your name — Claude / Gemini / ChatGPT / Perplexity]
UNIT: [Module number and age group — e.g. "Module 1 – Basic (Ages 9–10)"]
LEVEL: [Intro / Basic / Advanced]
DATE: [Today's date]

DIMENSION 1 — NARRATIVE INTEGRITY (max 25)
Story Creates the Problem: [0–5]
Learner Lands the Insight: [0–5]
Narrative Density Match: [0–5]
Character Consistency: [0–5]
D1 Notes: [Brief notes on findings]

DIMENSION 2 — CONCEPT ACCURACY (max 20)
Definition Accuracy: [0–5]
Real-World Case Fidelity: [0–5]
Misconception Prevention: [0–5]
D2 Notes: [Brief notes on findings]

DIMENSION 3 — LEVEL APPROPRIATENESS (max 20)
Vocabulary Calibration: [0–5]
Cognitive Framing: [0–5]
Subset Integrity: [0–5]
D3 Notes: [Brief notes on findings]

DIMENSION 4 — DELIVERY ARCHITECTURE (max 15)
Learner Orientation: [0–5]
Pacing Support: [0–5]
Story-Concept Flow: [0–5]
D4 Notes: [Brief notes on findings]

DIMENSION 5 — APPLIED OUTCOME (max 20) *** HIGHEST PRIORITY ***
Lab Executability: [0–5]
Quiz Tests Judgment: [0–5]
Clear Capability Delta: [0–5]
D5 Notes: Complete this sentence — "After this lesson, this learner can ___"

TOTAL SCORE: [sum] / 100
D5 SCORE: [sum] / 20
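A reviewer may want to recompute the weighted totals and catch arithmetic slips before submitting a report. Assuming the filled-in report is available as plain text, the minimal sketch below does that. It is a hypothetical helper, not part of any documented AESOP dashboard. The names `CRITERIA`, `score_report`, and `check_reported_totals` are invented for illustration. The criterion names are copied from the report format, and the weights come from the rubric tables. The regex search ignores line breaks, so the helper works whether the report is laid out on one line or many. The sketch assumes every criterion has a numeric 0–5 score. The rubric does not say how the Advanced-only Real-World Case Fidelity criterion should be recorded for Intro or Basic units, so a report that leaves it blank raises an error here.

```python
import re

# Hypothetical helper, not an official AESOP dashboard API.
# Criterion names are copied from the report format; weights (x2) come
# from the rubric tables. Each entry maps name -> (dimension, weight).
CRITERIA = {
    "Story Creates the Problem": ("D1", 2),
    "Learner Lands the Insight": ("D1", 1),
    "Narrative Density Match": ("D1", 1),
    "Character Consistency": ("D1", 1),
    "Definition Accuracy": ("D2", 2),
    "Real-World Case Fidelity": ("D2", 1),
    "Misconception Prevention": ("D2", 1),
    "Vocabulary Calibration": ("D3", 1),
    "Cognitive Framing": ("D3", 2),
    "Subset Integrity": ("D3", 1),
    "Learner Orientation": ("D4", 1),
    "Pacing Support": ("D4", 1),
    "Story-Concept Flow": ("D4", 1),
    "Lab Executability": ("D5", 2),
    "Quiz Tests Judgment": ("D5", 1),
    "Clear Capability Delta": ("D5", 1),
}


def score_report(text: str) -> dict[str, int]:
    """Parse raw 0-5 criterion scores from a filled-in report and return
    weighted subtotals for D1-D5 plus the overall TOTAL."""
    totals = {f"D{i}": 0 for i in range(1, 6)}
    for name, (dim, weight) in CRITERIA.items():
        # One digit 0-5 followed by a word boundary: rejects unfilled
        # placeholders like "[0-5]" and pre-weighted entries like "10".
        match = re.search(re.escape(name) + r":\s*([0-5])\b", text)
        if match is None:
            raise ValueError(f"missing or out-of-range score for {name!r}")
        totals[dim] += int(match.group(1)) * weight
    totals["TOTAL"] = sum(totals[f"D{i}"] for i in range(1, 6))
    return totals


def check_reported_totals(text: str) -> list[str]:
    """Compare the TOTAL SCORE and D5 SCORE lines against recomputed
    values. Returns human-readable mismatches (empty if all agree)."""
    computed = score_report(text)
    problems = []
    for label, key in (("TOTAL SCORE", "TOTAL"), ("D5 SCORE", "D5")):
        match = re.search(re.escape(label) + r":\s*(\d+)", text)
        # Unfilled placeholders such as "[sum]" do not match and are skipped.
        if match and int(match.group(1)) != computed[key]:
            problems.append(
                f"{label} says {match.group(1)}, expected {computed[key]}"
            )
    return problems
```

Consider a report whose raw scores give weighted dimension totals of 20, 17, 14, 13, and 15 (79 in all) but whose TOTAL SCORE line says 78. For that report, `check_reported_totals` returns a single mismatch. A raw entry of 10 for a ×2 criterion is rejected rather than silently accepted. This catches reviewers who have applied the weighting themselves in the raw fields.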