Here's what makes people think AI is magic: it seems to remember everything you tell it. You ask a question, it answers. You say "hold on, I meant X," and it pivots perfectly. It feels like conversation with a person who never forgets.
It's not magic. It's architecture. And knowing how the architecture works is the difference between building systems that scale and building systems that fail.
AI doesn't have memory the way you do. It has a context window — a box of fixed size where everything you've said and everything it's said lives. Fill that box with the wrong things, and the system starts forgetting critical information. Keep the wrong things out, and you can run long, complex conversations without losing coherence.
This module is about understanding that box. Not theoretically. Practically. How do you design a system that handles long interactions? When does context matter? What breaks when you run out of space?
A developer at a software company built a customer service chatbot. It worked flawlessly for the first two weeks. Users loved it. The company was thrilled.
By week three, something shifted. Customers started reporting that the bot was giving contradictory advice. One minute it would suggest a solution, the customer would say "I already tried that," and ten messages later, the bot would suggest the exact same solution again. It was forgetting what it had already recommended. It was contradicting itself. Occasionally it would suggest a feature that the customer had explicitly said didn't work for them fifteen minutes earlier.
The developer's first thought: the prompts need more specificity. She added guardrails, added examples, added stricter instructions. The bot got worse. The contradictions continued. She added even more guardrails. Still worse.
At some point, she stopped and asked a different question: What if the problem isn't the prompts? What if the problem is that the bot is forgetting things?
She tested it. Long conversations, short conversations, conversations where information was repeated. The pattern became clear: the longer the conversation, the more context the bot forgot. The system was losing information as the chat got longer. The bot wasn't broken. The architecture was.
That's when she realized: the bot had a context window. Everything it said and everything the customer said had to fit in that window. When the conversation got long enough, new messages pushed the old ones out. The bot was literally forgetting the beginning of the conversation — and then repeating suggestions it didn't know it had already made.
This module teaches you to recognize that problem before it reaches your customers.
AI systems have three ways to remember. Understanding the difference between them is how you design systems that actually work at scale.
Everything in the current conversation — everything you've typed and everything the AI has typed — lives in the context window. This memory is precise, immediate, and completely temporary. When the conversation ends, it's gone. When the context window fills up, the oldest messages get pushed out. You can reference anything in the current conversation by name, by detail, by context. But once it leaves the window, the AI has no memory of it.
Information stored outside the AI — in a database, a file, a search index, a vector store. The AI doesn't "remember" this directly. Instead, when it needs information, it retrieves it. If the AI is helping a customer service agent, the customer's full history might live in a database. The AI doesn't hold the whole history in its context window — it retrieves the relevant parts when needed. This scales much better than in-context memory, but it requires you to build the retrieval system.
Information baked into the model's weights during training. This is permanent, but it's slow to update and expensive. You don't fine-tune to remember a single customer's conversation — you fine-tune to bake in knowledge, style, or behavior that applies across all conversations. This is the wrong tool for transient information.
Most systems use all three. In-context memory for the current conversation, external memory for historical data, fine-tuning for patterns and knowledge that apply everywhere. Your job is to choose which memory type to use for each piece of information.
Once you understand the three types of memory, design gets concrete. Three questions solve most context problems.
And what happens at the limit? A customer service chat might run for 50 messages. A research assistant might run for 500. A tutor for a whole semester. Know your maximum conversation length. Then design memory around it — what information must stay in-context the whole time? What can be archived? What needs retrieval?
Some things can't be retrieved. If the AI needs to know the user's name, their previous choices in this conversation, or a constraint they set at the start, that information must never leave the context window. Everything else can be retrieved on demand.
Transparency matters. If the AI can no longer access earlier messages, it should say so. If it's retrieving historical data from a database, that's different from retrieving from the current conversation. Users should understand what the AI can and can't remember.
These three questions become the design spec for your system. Answer them, and context management becomes tractable.