
The Kirkpatrick Model is a four-level framework for evaluating the effectiveness of training programs. First published by Donald Kirkpatrick in 1959 and refined for decades since, it remains the most widely used training evaluation model in corporate learning and instructional design.
This guide covers all four levels with real-world examples, sample evaluation questions for each level, a planning template, the updated New World Kirkpatrick Model, common criticisms, and how Kirkpatrick compares to alternatives like the Phillips ROI Methodology and CIRO.
The Kirkpatrick Model is a training evaluation framework that measures the impact of a learning program across four sequential levels: Reaction, Learning, Behavior, and Results. Each level answers a different question about whether the training is working, and each level becomes more difficult (and more valuable) to measure than the one before it.
While the model is most often applied to corporate training and eLearning programs, it's flexible enough to evaluate any kind of learning intervention. Modern practitioners (including the Kirkpatrick family's own training organization) treat the model as cyclical rather than linear, and they teach evaluators to plan in reverse: start by defining your Level 4 business results, then work backward to design the training itself.
Here's how the four levels compare across the questions they answer, the methods used to measure them, and when to do that measurement.
Donald Kirkpatrick first introduced the four levels in his 1954 PhD dissertation at the University of Wisconsin, then published the framework in a 1959 series of articles in the Journal of the American Society of Training Directors (later renamed Training and Development Journal). His 1994 book Evaluating Training Programs: The Four Levels made the model the de facto standard for the training industry.
In the 2010s, Don's son Jim Kirkpatrick and daughter-in-law Wendy Kirkpatrick launched the New World Kirkpatrick Model through Kirkpatrick Partners. Their update kept the original four levels but added critical refinements: required drivers to support on-the-job behavior, confidence and commitment as Level 2 indicators, and a strong emphasis on planning evaluation from Level 4 backward.
The model continues to evolve. In 2026, Vanessa Milara Alzate is expanding the framework further with explicit attention to the performance environment: the systems, culture, and reinforcement structures that determine whether training actually sticks.
The sections below walk through each level in order from 1 to 4, which makes them easier to understand. But when you actually plan an evaluation strategy, you should work the opposite direction: start with Level 4 (what the business needs) and design backward. We'll cover that workflow later.
Level 1 captures how participants respond to the training experience. Specifically, it measures how engaging, relevant, and useful they found it. This is the most commonly collected type of evaluation data, usually gathered through a short post-training survey (sometimes called a "smile sheet").
One important update from the New World model: relevance is a stronger predictor of behavior change than satisfaction or engagement. A learner can enjoy a workshop and still not apply any of it. But if they tell you the content was directly relevant to their job, they're far more likely to actually use it. Build your Level 1 surveys around relevance, not just enjoyment.
Level 1 evaluation also shouldn't wait until the end of a program. Formative pulse checks during the training, like quick polls, check-in questions, or facilitator observations, let you adjust in the moment instead of finding out a week later that learners were lost in module two.
Use these as a starting point for your own questionnaire. Most are rated on a 1–5 Likert scale; the last two are open-ended.
A technical support call center rolls out new screen-sharing software and runs a one-hour webinar teaching agents when to use it, how to initiate a session, and how to handle legal disclaimers. At the end, agents complete a short online survey rating relevance, clarity, and confidence, plus two open-ended questions on what worked and what didn't. The training team uses the results to flag the disclaimer section as confusing and revise it before the next cohort.
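To make the survey analysis in this example concrete, here is a minimal Python sketch of how a training team might average 1–5 ratings per survey item and flag weak spots. The item names, the sample responses, and the 3.5 cutoff are all hypothetical and chosen only for illustration; they are not part of the Kirkpatrick Model.

```python
from statistics import mean

# Hypothetical survey responses: one dict per agent, ratings on a 1-5 Likert scale.
responses = [
    {"relevance": 5, "clarity_disclaimer": 2, "confidence": 4},
    {"relevance": 4, "clarity_disclaimer": 3, "confidence": 4},
    {"relevance": 5, "clarity_disclaimer": 2, "confidence": 3},
]


def summarize(responses, flag_below=3.5):
    """Return the mean rating per item and the items whose mean is below flag_below.

    flag_below is an assumed cutoff, not a value prescribed by the model.
    """
    items = responses[0].keys()
    means = {item: mean(r[item] for r in responses) for item in items}
    flagged = [item for item, m in means.items() if m < flag_below]
    return means, flagged


means, flagged = summarize(responses)
print(flagged)  # items the team may want to revise before the next cohort
```

With this made-up data, only the disclaimer item falls below the cutoff, which mirrors the kind of signal the training team acted on in the example.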
Level 2 measures whether learners actually acquired the knowledge, skills, and attitudes the training was designed to teach. This is the cornerstone of most instructional design work. It's where quizzes, demonstrations, and final assessments live.
The New World Kirkpatrick Model expands Level 2 beyond the original "KSA" (knowledge, skills, attitudes) trio. It adds two new indicators: confidence (do learners believe they can do this back on the job?) and commitment (do they intend to actually do it?). Both are strong predictors of whether Level 3 behavior change will happen, and both can be measured with a few well-placed survey questions.
Pre-tests are also worth the effort when feasible. Measuring knowledge before and after training is the cleanest way to attribute gains to the training itself rather than to existing experience.
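As a rough illustration of the pre-test and post-test comparison, the sketch below computes each learner's gain in percentage points and the cohort average. The learner names and scores are invented, and a simple difference is only one of several ways to express a gain.

```python
from statistics import mean

# Hypothetical percentage scores for the same learners before and after training.
pre_scores = {"agent_a": 50, "agent_b": 70, "agent_c": 60}
post_scores = {"agent_a": 80, "agent_b": 90, "agent_c": 85}


def score_gains(pre, post):
    """Return each learner's post-minus-pre gain in percentage points.

    Learners missing a post-test score are skipped.
    """
    return {name: post[name] - pre[name] for name in pre if name in post}


gains = score_gains(pre_scores, post_scores)
average_gain = mean(gains.values())
print(gains, average_gain)
```

Comparing each learner against their own baseline is what keeps prior experience from inflating the result: an agent who already scored 90% before training shows a small gain even if their final score is high.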
Knowledge check (multiple choice):
Skill demonstration prompts:
Confidence and commitment (1–5 scale):
After the call center webinar, agents complete a 10-question multiple-choice quiz on the screen-sharing process and legal disclaimers. They must score 80% or higher to receive certification. They also complete a live role-play with their supervisor (initiating a session, walking through the disclaimer, and screen-sharing successfully) before they're authorized to use the tool on real customer calls.
A quick contrast: for a coffee-roastery cleaning workshop, written tests don't cut it. Physical procedural skills are best measured by direct observation, like watching each operator clean the machine end to end.
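The certification rule from the call center example (80% or higher on the quiz, plus a successful role-play) can be written as a small check. This is only a sketch: the function name and parameters are hypothetical, and a real program might track the role-play in more detail than a single pass/fail flag.

```python
PASS_PERCENT = 80  # quiz threshold taken from the call center example


def is_certified(correct_answers, total_questions, role_play_passed):
    """Return True only if the agent meets the quiz threshold AND passed the role-play."""
    if total_questions <= 0:
        raise ValueError("total_questions must be positive")
    # Integer comparison avoids floating-point rounding at the exact 80% boundary.
    quiz_passed = correct_answers * 100 >= PASS_PERCENT * total_questions
    return quiz_passed and role_play_passed


print(is_certified(8, 10, True))   # meets both requirements
print(is_certified(9, 10, False))  # strong quiz score, but role-play not passed
```

Requiring both conditions reflects the point of the example: a written test shows knowledge, while the observed role-play shows the skill itself.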
Level 3 asks the question that actually matters to the business: are people behaving differently on the job because of this training? Learning something in a classroom and applying it back at your desk are two different things, and Level 3 is where the model starts producing data you can actually act on.
Two important practical notes. First, measurement should begin within a few weeks of training, not after 90 days. The longer you wait, the harder it is to attribute behavior changes to the training versus everything else happening in the work environment. A common cadence is to start observing 2–4 weeks post-training and continue through the 3–6 month window.
Second, behavior change doesn't happen because of training alone. Don Kirkpatrick identified four conditions necessary for behavior change: desire to change, knowledge of what to do, the right climate (a supportive manager, removed obstacles), and rewards for doing it. The New World model formalizes this idea as required drivers: the reinforcement, accountability, and support systems that turn learning into behavior. Without those drivers, even excellent training fails at Level 3.
Self-report (sent 30 and 60 days post-training):
Supervisor observation checklist (sample items):
The screen-sharing software is integrated with the call center's performance management platform, so every screen-share session is logged automatically. Three weeks after training, the team pulls a report: what percentage of eligible calls included a screen-share? Agents below a threshold get a coaching conversation, not a reprimand. The goal is to find out what's blocking transfer (forgot the steps? worried about customer reaction? no manager reinforcement?) and fix it.
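The adoption report described here boils down to a ratio per agent: calls that included a screen-share divided by eligible calls. The sketch below assumes a hypothetical log summary and a hypothetical 50% coaching threshold; the example itself does not specify a threshold, so pick one that fits your context.

```python
# Hypothetical log summary per agent: (eligible_calls, calls_with_screen_share).
call_logs = {
    "agent_a": (40, 30),
    "agent_b": (50, 10),
    "agent_c": (0, 0),  # no eligible calls yet, so no rate can be computed
}

COACHING_THRESHOLD = 0.5  # assumed value for illustration only


def adoption_rates(logs):
    """Return the share of eligible calls that used screen sharing, per agent."""
    return {
        agent: shared / eligible
        for agent, (eligible, shared) in logs.items()
        if eligible > 0  # skip agents with no eligible calls
    }


def needs_coaching(rates, threshold):
    """Return agents whose adoption rate is below the threshold, sorted by name."""
    return sorted(agent for agent, rate in rates.items() if rate < threshold)


rates = adoption_rates(call_logs)
print(needs_coaching(rates, COACHING_THRESHOLD))
```

The output is a list of people to talk to, not a verdict. The number tells you where transfer is stalling; the coaching conversation tells you why.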
Level 4 measures whether the training is actually moving the needle on the outcomes the business cares about, such as sales, customer satisfaction, retention, safety incidents, output, and error rates. This is where training proves its worth, and it's the level most organizations skip.
Two concepts make Level 4 more practical. The first is the distinction between leading and lagging indicators. Lagging indicators (quarterly revenue, annual turnover) tell you what already happened. Leading indicators (number of qualified demos booked, first-call resolution rate) predict where the lagging numbers are headed and let you course-correct earlier.
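The second is a shift away from chasing a precise ROI number. Calculating the dollar return on a training program with any real certainty is extremely hard because there are too many confounding variables. Kirkpatrick Partners now recommends two more honest alternatives: Return on Expectations (ROE) and Contributive ROI. Both acknowledge that training contributes to outcomes alongside other factors rather than claiming sole financial credit.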
Phillips' ROI Methodology (covered in the alternatives section below) takes a different approach and tries to isolate training's financial impact directly. Both views are valid; choose based on what your stakeholders find credible.
For the screen-sharing initiative:
For stakeholder ROE conversations:
Six months after the screen-sharing rollout, the team compares CSAT scores on calls that used screen sharing against a matched sample that didn't. Calls with screen sharing show a measurable lift in CSAT and a reduction in average handle time. Combined with the Level 3 data showing strong adoption, the training team makes a credible Contributive ROI argument: the program is one of several factors driving the CSAT improvement, and the evidence supports continued investment.
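A comparison like this one can start with something as simple as a difference in mean CSAT between the two groups of calls. The scores below are invented for illustration, and the function name is hypothetical. A raw difference in means does not prove the training caused the lift, which is consistent with the contributive framing above.

```python
from statistics import mean

# Hypothetical CSAT scores (1-5) for calls with screen sharing and a matched sample without.
csat_with_share = [5, 4, 5, 4, 5]
csat_without_share = [4, 3, 4, 4, 3]


def mean_lift(treated, comparison):
    """Return mean(treated) minus mean(comparison); positive means the treated group scored higher."""
    return mean(treated) - mean(comparison)


lift = mean_lift(csat_with_share, csat_without_share)
print(round(lift, 2))
```

The same function works for average handle time, where a negative result would indicate the reduction described in the example. With real data you would also want enough calls in each group for the difference to be meaningful.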
The New World Kirkpatrick Model, developed by Jim and Wendy Kirkpatrick, is the current, modernized version of the framework. It keeps the original four levels but adds several practical concepts that address the biggest weakness of the 1950s version: the assumption that good training automatically leads to behavior change and business results.
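The key additions are required drivers (the post-training reinforcement that enables behavior change), confidence and commitment as Level 2 indicators, an emphasis on planning from Level 4 backward, and Return on Expectations (ROE) as a more practical alternative to financial ROI.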
If you're learning the model today, learn the New World version. The original 1959 framework is foundational, but the updated model reflects what practitioners have figured out over six decades of trying to make it work in real organizations.
Here's the workflow most experienced evaluators follow: working from Level 4 backward to Level 1.
This workflow pairs well with broader ID frameworks like ADDIE and other instructional design models. Kirkpatrick handles the evaluation strategy; ADDIE handles the design and development workflow.
You can use this template to plan your evaluation with the Kirkpatrick Model. It's the exact template that I teach and share in the Instructional Design Project Lab.
The sample questions earlier in this guide can be assembled into a complete questionnaire. A practical approach:
The Kirkpatrick Model is the industry standard, but it has real limitations worth knowing about (especially if you're going to defend your evaluation choices to a skeptical stakeholder).
None of this invalidates the framework. It's still the most useful starting point we have. But pretending the model is airtight does the profession no favors.
Several other evaluation frameworks have been developed to address Kirkpatrick's limitations or take different angles on the same problem.
In practice, most organizations end up using Kirkpatrick as the spine and borrowing from Phillips (for ROI), CIRO (for front-end context), or Brinkerhoff (for case-based insight) when the situation calls for it.
Level 1 (Reaction) measures how learners respond to the training. Level 2 (Learning) measures what they know and can do. Level 3 (Behavior) measures whether they apply it on the job. Level 4 (Results) measures the impact on business outcomes.
Donald Kirkpatrick developed the model as part of his 1954 PhD dissertation at the University of Wisconsin and published it through a series of articles in 1959. His son Jim Kirkpatrick and daughter-in-law Wendy Kirkpatrick updated it into the New World Kirkpatrick Model in the 2010s.
Level 1: during and immediately after training. Level 2: at the end of training (and pre-training as a baseline where possible). Level 3: starting within a few weeks of training and continuing through the 3–6 month window. Level 4: 3–12 months post-training, with ongoing tracking.
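If it helps to put these windows on a calendar, here is a minimal sketch that turns a training end date into approximate measurement windows. It rests on several assumptions: months are approximated as 30 days, "a few weeks" is taken as two weeks, and Levels 1 and 2 are pinned to the training end date even though Level 1 can also be collected during training. The function name is hypothetical.

```python
from datetime import date, timedelta


def evaluation_windows(training_end):
    """Return an approximate (start, end) date window for each Kirkpatrick level.

    Assumptions: one month is about 30 days, and Level 3 starts 2 weeks after training.
    """
    day = timedelta(days=1)
    return {
        "Level 1": (training_end, training_end),  # immediately after training
        "Level 2": (training_end, training_end),  # end-of-training assessment
        "Level 3": (training_end + 14 * day, training_end + 180 * day),  # ~2 weeks to ~6 months
        "Level 4": (training_end + 90 * day, training_end + 360 * day),  # ~3 to ~12 months
    }


for level, (start, end) in evaluation_windows(date(2025, 1, 1)).items():
    print(level, start.isoformat(), end.isoformat())
```

Treat the dates as reminders rather than deadlines. The point is to schedule Level 3 and Level 4 measurement before the training runs, so those levels don't get skipped.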
The New World model adds required drivers (the post-training reinforcement that enables behavior change), confidence and commitment as Level 2 indicators, an emphasis on planning from Level 4 backward, and Return on Expectations (ROE) as a more practical alternative to financial ROI.
Yes, it remains the most widely used training evaluation framework in corporate L&D and instructional design. The New World version addresses most of the modern critiques of the original, and the four-level vocabulary is universal among practitioners.
The main criticisms are that the causal links between levels are weaker than the model implies, that organizations rarely measure Levels 3 and 4 in practice, that the model doesn't account for environmental and transfer factors, and that Level 4 attribution is genuinely difficult to defend.
Phillips adds a fifth level (financial ROI) and a specific methodology for isolating training's monetary impact. Kirkpatrick (especially the New World version) prefers Return on Expectations and Contributive ROI, which acknowledge that training contributes to outcomes alongside other factors rather than claiming sole financial credit.
The Kirkpatrick Model isn't a checklist. It's a discipline. Used well, it forces you to answer two hard questions before you build a single slide: What does the business actually need? And what will people need to do differently for that to happen? Everything else follows from there.
If you only do one thing with this framework, plan from Level 4 backward. That single habit separates evaluation that informs decisions from evaluation that fills out a checkbox.
