Intro

Context and Memory

2 min read

Here's what makes people think AI is magic: it seems to remember everything you tell it. You ask a question, it answers. You say "hold on, I meant X," and it pivots perfectly. It feels like conversation with a person who never forgets.

It's not magic. It's architecture. And knowing how the architecture works is the difference between building systems that scale and building systems that fail.

AI doesn't have memory the way you do. It has a context window — a box of fixed size where everything you've said and everything it's said lives. Fill that box with the wrong things, and the system starts forgetting critical information. Keep the wrong things out, and you can run long, complex conversations without losing coherence.

This module is about understanding that box. Not theoretically. Practically. How do you design a system that handles long interactions? When does context matter? What breaks when you run out of space?

Your artifact

A context management design for an AI assistant application

By the end of this module, you will:

Explain what a context window is in plain, non-technical language
Design a conversation flow that handles long interactions gracefully
Choose the right memory strategy for a specific use case
Identify when context loss is causing bad outputs
Document a complete context management plan for a team

Scenario

The Debugging Journey

3 min read

A developer at a software company built a customer service chatbot. It worked flawlessly for the first two weeks. Users loved it. The company was thrilled.

By week three, something shifted. Customers started reporting that the bot was giving contradictory advice. One minute it would suggest a solution, the customer would say "I already tried that," and ten messages later, the bot would suggest the exact same solution again. It was forgetting what it had already recommended. It was contradicting itself. Occasionally it would suggest a feature that the customer had explicitly said didn't work for them fifteen minutes earlier.

The developer's first thought: the prompts need more specificity. She added guardrails, added examples, added stricter instructions. The bot got worse. The contradictions continued. She added even more guardrails. Still worse.

At some point, she stopped and asked a different question: What if the problem isn't the prompts? What if the problem is that the bot is forgetting things?

She tested it. Long conversations, short conversations, conversations where information was repeated. The pattern became clear: the longer the conversation, the more context the bot forgot. The system was losing information as the chat got longer. The bot wasn't broken. The architecture was.

That's when she realized: the bot had a context window. Everything it said and everything the customer said had to fit in that window. When the conversation got long enough, new messages pushed the old ones out. The bot was literally forgetting the beginning of the conversation — and then repeating suggestions it didn't know it had already made.

This module teaches you to recognize that problem before it reaches your customers.

Lesson

Three Kinds of Memory

3 min read

AI systems have three ways to remember. Understanding the difference between them is how you design systems that actually work at scale.

In-Context Memory

Everything in the current conversation — everything you've typed and everything the AI has typed — lives in the context window. This memory is precise, immediate, and completely temporary. When the conversation ends, it's gone. When the context window fills up, the oldest messages get pushed out. You can reference anything in the current conversation by name, by detail, by context. But once it leaves the window, the AI has no memory of it.

External Memory

Information stored outside the AI — in a database, a file, a search index, a vector store. The AI doesn't "remember" this directly. Instead, when it needs information, it retrieves it. If the AI is helping a customer service agent, the customer's full history might live in a database. The AI doesn't hold the whole history in its context window — it retrieves the relevant parts when needed. This scales much better than in-context memory, but it requires you to build the retrieval system.

Fine-Tuning Memory

Information baked into the model's weights during training. This is permanent, but it's slow to update and expensive. You don't fine-tune to remember a single customer's conversation — you fine-tune to bake in knowledge, style, or behavior that applies across all conversations. This is the wrong tool for transient information.

Most systems use all three. In-context memory for the current conversation, external memory for historical data, fine-tuning for patterns and knowledge that apply everywhere. Your job is to choose which memory type to use for each piece of information.

Context

Designing Around Limits

2 min read

Once you understand the three types of memory, design gets concrete. Three questions solve most context problems.

What's the longest conversation your use case will have?

And what happens at the limit? A customer service chat might run for 50 messages. A research assistant might run for 500. A tutor for a whole semester. Know your maximum conversation length. Then design memory around it — what information must stay in-context the whole time? What can be archived? What needs retrieval?

What information must always be in context?

Some things can't be retrieved. If the AI needs to know the user's name, their previous choices in this conversation, or a constraint they set at the start, that information must never leave the context window. Everything else can be retrieved on demand.

How do you tell users when the AI has forgotten something?

Transparency matters. If the AI can no longer access earlier messages, it should say so. If it's retrieving historical data from a database, that's different from retrieving from the current conversation. Users should understand what the AI can and can't remember.

These three questions become the design spec for your system. Answer them, and context management becomes tractable.

⚡ Skill Lab

Context Design

~25 minutes · 1 design challenge

What you're doing

You'll design a complete context management strategy for an AI assistant in a real domain. The AI will ask you about your design, present edge cases, and help you think through tradeoffs.

Roles

🏗️

You — ArchitectYou design the memory strategy for a specific AI assistant. Your design should handle real constraints and tradeoffs.

🎯

AI — Design PartnerI'll ask challenging questions about your design, present edge cases, and help you think through the consequences of your choices.

Domains to choose from

Customer service · Tutoring assistant · Project management · Medical triage intake

Design framework — answer all three

What's the longest conversation?

What info must stay in-context?

How do you handle forgotten info?

Success criteria

Your design should handle realistic conversation lengths, specify what memory strategy you're using for each piece of information, and address transparency with users.

Shift + Enter for a new line

✓ Module Complete

You've completed Module 2 of 8.

Next Module →