The DEEP Agent Framework: 5 Principles for Designing AI Tools That Actually Work in Education
A pedagogical design framework, not a technical one
Alex Gray
Director, DEEP Education
Most AI tools built for education are built by technologists. They start with what AI can do and work backwards to find an educational use case. The result is technically impressive and pedagogically shallow: tools that demonstrate AI capability without meaningfully serving learning.
I have spent the last year building AI tools for education, and the experience convinced me that we need a better design framework. Not a technical framework (there are plenty of those), but a pedagogical design framework that starts with what education needs and works forward to how AI should be built to serve it.
That framework is the DEEP Agent Framework. It consists of five principles and four layers that I use to design every AI tool I build. I am sharing it here because I believe it has value beyond my own projects, for anyone building, evaluating, or commissioning AI tools for education.
Why Existing Approaches Fail
The standard approach to building AI tools for education follows a pattern: choose a use case (tutoring, assessment, feedback), select a model (GPT-4, Gemini, Claude), build an interface, and ship. The pedagogical design happens at the prompting level: a system prompt that tells the AI to "be a helpful tutor" or "provide constructive feedback."
This approach treats pedagogy as an afterthought. The system prompt is the thinnest possible layer of educational design, bolted onto a general-purpose model. It works for simple use cases, such as generating quiz questions or summarising texts, but it fails for anything that requires sustained, adaptive, pedagogically informed interaction.
The failure modes are predictable. The AI gives answers when it should ask questions. It moves on when it should verify understanding. It responds to frustration with more content instead of adjusting its approach. It treats every student the same way because it has no model of individual learning states. These are not technical failures; they are design failures caused by the absence of a pedagogical framework.
The Five Principles
The DEEP Agent Framework rests on five principles. Each one addresses a specific failure mode I have observed in educational AI tools.
Principle 1: Depth over Breadth
Most AI tools try to do everything. They are tutors and assessment generators and feedback providers and content creators. The result is that they do nothing particularly well.
The first principle is to build AI tools that go deep on one thing rather than wide on many things. An AI tutor should be the best possible tutor, with sophisticated question strategies, misconception detection, and adaptive difficulty, rather than a tutor that also marks essays and generates worksheets. A feedback tool should provide the most insightful, personalised, actionable feedback possible, rather than feedback that is acceptable but also comes with a lesson planner.
Depth is where educational value lives. Surface-level AI is a convenience. Deep AI is a pedagogical tool. And depth requires focus.
Principle 2: Separate Language from Logic
Large language models are language machines. They are extraordinarily good at generating coherent, contextually appropriate text. They are far less reliable at logic, reasoning, and consistent, structured decision-making. Yet most educational AI tools ask language models to do both: generate the text and make the pedagogical decisions.
The second principle is to separate these functions. The language model handles what it is good at: generating natural, engaging, contextually appropriate text. The pedagogical logic (when to ask a question, when to provide a hint, when to verify understanding, when to adjust difficulty) is handled by a separate decision layer that the language model follows.
In Athena, this means the Socratic questioning strategy is not embedded in a prompt. It is a structured decision engine that determines what type of interaction should happen next, and then instructs the language model to generate the appropriate text. The language model is the voice. The decision engine is the brain.
This separation makes the tool more reliable, more predictable, and more pedagogically sound. It means the AI's teaching behaviour is designed, not emergent, and designed by someone with pedagogical expertise, not by the statistical patterns of a language model's training data.
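To make the separation concrete, here is a minimal sketch in Python. It is not Athena's actual code: the names `Move`, `Turn`, `decide_next_move`, `INSTRUCTIONS`, and `respond`, and the specific rules and thresholds, are all assumptions chosen for illustration. The point is only that the pedagogical decision is made by plain, inspectable rules, and the language model is then asked to voice that decision.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Move(Enum):
    """The kinds of interaction the decision engine can choose (hypothetical set)."""
    ASK_QUESTION = "ask_question"
    GIVE_HINT = "give_hint"
    VERIFY = "verify_understanding"
    SIMPLIFY = "lower_difficulty"


@dataclass
class Turn:
    """What the engine knows about the student's latest response."""
    answered_correctly: bool
    consecutive_errors: int
    concept_verified: bool


def decide_next_move(turn: Turn) -> Move:
    """The 'brain': pedagogical logic as explicit rules, not as a prompt."""
    if turn.consecutive_errors >= 3:  # illustrative threshold, an assumption
        return Move.SIMPLIFY
    if not turn.answered_correctly:
        return Move.GIVE_HINT
    if not turn.concept_verified:
        return Move.VERIFY
    return Move.ASK_QUESTION


# Instructions handed to the language model once the move has been chosen.
INSTRUCTIONS = {
    Move.ASK_QUESTION: "Ask one open question that extends the idea. Do not give the answer.",
    Move.GIVE_HINT: "Give one short hint. Do not give the answer.",
    Move.VERIFY: "Ask the student to explain the idea in their own words.",
    Move.SIMPLIFY: "Restate the idea with a simpler, concrete example.",
}


def respond(turn: Turn, topic: str, generate_text: Callable[[str], str]) -> str:
    """The 'voice': the model only writes the text for a move it did not choose."""
    move = decide_next_move(turn)
    return generate_text(f"Topic: {topic}. {INSTRUCTIONS[move]}")
```

Here `generate_text` stands in for whatever language model call a real tool would make. Because it is passed in as a plain function, the decision rules can be reviewed and tested by an educator without any model involved.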
Principle 3: Model the State
Most AI interactions are effectively stateless. Each message is processed with no memory of what came before beyond the conversation window, and with no explicit model of the user's current state.
The third principle is that educational AI must maintain a model of the learner's state: what they know, what they are struggling with, what misconceptions they hold, and what their emotional or engagement state is. This model informs every decision the AI makes.
This is not about storing data permanently. It is about the AI maintaining, within a learning session, an understanding of where the student is. A tutor that asks the same diagnostic question twice because it does not remember the answer is not just inefficient; it is pedagogically incoherent. A feedback tool that does not account for what a student has already been told is generating noise, not signal.
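As an illustration, a session-scoped learner model can be very small. The sketch below is an assumption about how one might represent it in Python; the class name `LearnerState`, its fields, and the engagement labels are hypothetical, not taken from any particular tool. It lives only in memory for the duration of a session, and it lets the tutor check whether a diagnostic question has already been asked.

```python
from dataclasses import dataclass, field


@dataclass
class LearnerState:
    """What the tutor currently believes about one student, for one session."""
    known: set[str] = field(default_factory=set)
    struggling: set[str] = field(default_factory=set)
    misconceptions: set[str] = field(default_factory=set)
    engagement: str = "engaged"  # hypothetical labels: "engaged", "frustrated", "bored"
    diagnostics: dict[str, bool] = field(default_factory=dict)  # question id -> answered correctly

    def record_diagnostic(self, question_id: str, concept: str, correct: bool) -> None:
        """Store the answer and update what we think the student knows."""
        self.diagnostics[question_id] = correct
        if correct:
            self.known.add(concept)
            self.struggling.discard(concept)
        else:
            self.struggling.add(concept)
            self.known.discard(concept)

    def should_ask(self, question_id: str) -> bool:
        """Avoid asking the same diagnostic question twice in a session."""
        return question_id not in self.diagnostics
```

A tutor built this way would call `should_ask` before posing a diagnostic question and `record_diagnostic` after the student answers, so every later decision can draw on the same picture of the learner.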
Principle 4: Design Diagnostic Loops
Learning is not linear. Students do not progress smoothly from ignorance to understanding. They oscillate: grasping a concept, losing it, regaining it in a different form, connecting it to prior knowledge, and consolidating it through practice.
The fourth principle is to design AI tools with diagnostic loops: regular checkpoints where the AI assesses the learner's current understanding and adjusts its approach accordingly. This is the AI equivalent of checking for understanding, which is arguably the most important thing any teacher does.
In Athena, the verification gate is a diagnostic loop. The AI does not move on to a new concept until it has confirmed understanding of the current one. If verification fails, the AI loops back with a different approach: a simpler question, a concrete example, a visual analogy.
Without diagnostic loops, AI tools operate on the assumption that delivery equals learning. It does not. Delivery plus verification plus adjustment equals learning.
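A diagnostic loop of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Athena's implementation: the function name `verification_gate`, the `check_understanding` callback, and the label "standard explanation" for the first attempt are hypothetical. The fallback approaches are the three named above.

```python
from typing import Callable, Optional, Sequence

# Fallback approaches named in the text, tried in order after the first attempt fails.
FALLBACKS: Sequence[str] = ("simpler question", "concrete example", "visual analogy")


def verification_gate(
    concept: str,
    check_understanding: Callable[[str, str], bool],
    fallbacks: Sequence[str] = FALLBACKS,
    first_approach: str = "standard explanation",
) -> Optional[str]:
    """Deliver, verify, adjust: return the approach that led to verified
    understanding, or None if every approach was tried without success."""
    for approach in (first_approach, *fallbacks):
        # check_understanding teaches the concept with this approach and then
        # reports whether the student demonstrated understanding.
        if check_understanding(concept, approach):
            return approach  # gate passes, the tutor may move on
    return None  # gate never passed, so do not advance to a new concept
```

A return value of `None` is a useful signal in its own right: it suggests the tool should stop and escalate, for example by flagging the concept for the teacher, rather than continue to a new topic.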
Principle 5: Define Exit Criteria
Every educational interaction should have a clear endpoint. Not a time limit; a learning outcome. The student has demonstrated understanding. The misconception has been corrected. The skill has been practised to a sufficient level. The feedback has been understood and a revision plan has been articulated.
The fifth principle is that AI tools need defined exit criteria: clear conditions under which the interaction concludes because the learning objective has been met. Without exit criteria, AI interactions either end arbitrarily (the student gets bored and leaves) or continue indefinitely without purpose.
This is directly analogous to what good teachers do: they know what success looks like before the lesson starts, and they design the lesson to get students there. AI tools should operate on the same principle: every interaction should have a defined destination, and the AI should work towards it.
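One way to make exit criteria explicit is to write them down as named conditions that are checked against the session's progress. The Python sketch below is a hypothetical illustration; `SessionProgress`, `ExitCriterion`, and the example thresholds are assumptions, and a real tool would define criteria that match its own learning objectives.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class SessionProgress:
    """Evidence gathered so far in the session (hypothetical fields)."""
    verified_concepts: set[str] = field(default_factory=set)
    open_misconceptions: set[str] = field(default_factory=set)
    correct_streak: int = 0


@dataclass(frozen=True)
class ExitCriterion:
    """A named condition that must hold before the interaction can conclude."""
    description: str
    is_met: Callable[[SessionProgress], bool]


def unmet_criteria(progress: SessionProgress, criteria: Sequence[ExitCriterion]) -> list[str]:
    """List what still stands between the student and the learning objective."""
    return [c.description for c in criteria if not c.is_met(progress)]


def session_complete(progress: SessionProgress, criteria: Sequence[ExitCriterion]) -> bool:
    """The interaction ends when every criterion is met, not when time runs out."""
    return not unmet_criteria(progress, criteria)


# Example criteria, defined before the session starts (illustrative values).
CRITERIA = [
    ExitCriterion("understanding of 'equivalent fractions' verified",
                  lambda p: "equivalent fractions" in p.verified_concepts),
    ExitCriterion("no open misconceptions",
                  lambda p: not p.open_misconceptions),
    ExitCriterion("three correct practice answers in a row",
                  lambda p: p.correct_streak >= 3),
]
```

Listing the unmet criteria, rather than returning only true or false, gives the tool a destination to work towards: whatever is still on the list is what the next part of the interaction should address.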
The Four Layers
The DEEP Agent Framework organises these principles into four architectural layers:
Interface Layer: How the student interacts with the AI. This should be as simple as possible: the challenge should be the learning, not the interface.
Domain Layer: The subject-specific knowledge and pedagogical strategies. This is where curriculum content, common misconceptions, question strategies, and feedback patterns live. This layer is subject-specific and should be designed by subject experts.
State Layer: The model of the learner's current understanding, engagement, and progress. This layer tracks what the student knows, what they are struggling with, and how they are feeling.
Decision Layer: The logic that determines what the AI does next. This layer draws on the state layer (where is the student?) and the domain layer (what strategies are available?) to make pedagogical decisions. It is the brain of the system, and it should be designed with pedagogical expertise, not just technical capability.
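The way the four layers fit together can be shown with a deliberately small Python sketch. Everything here is a simplifying assumption made for illustration: the names `Domain`, `State`, `decide`, and `handle_answer`, the idea of an ordered list of strategies per concept, and the use of a single function as a stand-in for the interface.

```python
from dataclasses import dataclass


# Domain layer: subject-specific strategies, authored by subject experts.
@dataclass(frozen=True)
class Domain:
    strategies: dict[str, list[str]]  # concept -> ordered teaching strategies


# State layer: where the student currently is with one concept.
@dataclass
class State:
    concept: str
    failed_attempts: int = 0
    verified: bool = False


# Decision layer: combines state and domain to choose what happens next.
def decide(state: State, domain: Domain) -> str:
    if state.verified:
        return "advance"
    options = domain.strategies[state.concept]
    # Move to the next strategy after each failed attempt, staying on the last one.
    return options[min(state.failed_attempts, len(options) - 1)]


# Interface layer (stand-in): the single entry point a student-facing UI would call.
def handle_answer(state: State, domain: Domain, answer_correct: bool) -> str:
    if answer_correct:
        state.verified = True
    else:
        state.failed_attempts += 1
    return decide(state, domain)


# Example wiring with hypothetical content.
domain = Domain({"fractions": ["open question", "concrete example", "visual analogy"]})
state = State(concept="fractions")
next_step = handle_answer(state, domain, answer_correct=False)
```

The design choice worth noticing is the direction of dependency: the decision function reads from the state and the domain but owns neither, so a subject expert can change the strategies and a developer can change the interface without touching the pedagogical logic.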
Why This Matters for Schools
You do not need to build AI tools to benefit from this framework. You need to evaluate them. When you assess an AI tool for your school, ask which of these principles it embodies.
Does it go deep on one thing, or try to do everything? Does it separate language from logic, or rely on a prompt to make pedagogical decisions? Does it model the learner's state, or treat every interaction as independent? Does it include diagnostic loops, or assume delivery equals learning? Does it have defined exit criteria, or let interactions drift without purpose?
In my experience, most tools on the market today fail on at least three of these five criteria. That does not mean they are worthless; it means they are limited. Understanding those limitations helps you use them appropriately and advocate for better tools when the opportunity arises.
The DEEP Agent Framework is my attempt to articulate what "better" looks like. It is a starting point, not a final answer, and I expect it to evolve as the field matures. But the core conviction behind it will not change: AI in education should be designed by educators, grounded in pedagogy, and built to serve learning, not to demonstrate what technology can do.
Alex Gray
Director, DEEP Education
Education technology specialist with 20 years in the education sector. BSME AI Network Lead and ISC Edruptor 2024 & 2025. Alex founded DEEP Education, part of the DEEP Education Network by DEEP Professional, to help schools navigate AI integration with confidence.