The Kirkpatrick Model: Four Levels of Training Evaluation (With Examples, Free Template, & FAQ)

By Devlin Peck. Updated on May 28, 2026.

The Kirkpatrick Model is a four-level framework for evaluating the effectiveness of training programs. Developed by Donald Kirkpatrick in 1959 and refined for decades since, it remains the most widely used training evaluation model in corporate learning and instructional design.

This guide covers all four levels with real-world examples, sample evaluation questions for each level, a planning template, the updated New World Kirkpatrick Model, common criticisms, and how Kirkpatrick compares to alternatives like the Phillips ROI Methodology and CIRO.

What Is the Kirkpatrick Model?

The Kirkpatrick Model is a training evaluation framework that measures the impact of a learning program across four sequential levels: Reaction, Learning, Behavior, and Results. Each level answers a different question about whether the training is working, and each level becomes more difficult (and more valuable) to measure than the one before it.

While the model is most often applied to corporate training and eLearning programs, it's flexible enough to evaluate any kind of learning intervention. Modern practitioners (including the Kirkpatrick family's own training organization) treat the model as cyclical rather than linear, and they teach evaluators to plan in reverse: start by defining your Level 4 business results, then work backward to design the training itself.

The Four Levels at a Glance

Here's how the four levels compare across the questions they answer, the methods used to measure them, and when to do that measurement.

| Level | Question Answered | Common Methods | When to Measure | Difficulty |
| --- | --- | --- | --- | --- |
| 1. Reaction | Did learners find the training engaging, relevant, and useful? | Post-training surveys, pulse checks, focus groups | During and immediately after training | Low |
| 2. Learning | Did learners gain the intended knowledge, skills, attitudes, confidence, and commitment? | Quizzes, tests, demonstrations, pre/post assessments | During and at the end of training | Moderate |
| 3. Behavior | Are learners applying what they learned on the job? | Observation, performance data, 360-degree feedback, supervisor reviews | Beginning within weeks of training; ongoing | High |
| 4. Results | Is the training producing measurable outcomes for the business? | KPIs, ROE, Contributive ROI, leading and lagging indicators | 3–12 months post-training; ongoing | Very High |

A Brief History of the Model (1959–2026)

Donald Kirkpatrick first introduced the four levels in his 1954 PhD dissertation at the University of Wisconsin, then published the framework in a 1959 series of articles in the Journal of the American Society of Training Directors (the publication later known as Training and Development). His 1994 book Evaluating Training Programs: The Four Levels made the model the de facto standard for the training industry.

In the 2010s, Don's son Jim Kirkpatrick and daughter-in-law Wendy Kirkpatrick launched the New World Kirkpatrick Model through Kirkpatrick Partners. Their update kept the original four levels but added critical refinements: required drivers to support on-the-job behavior, confidence and commitment as Level 2 indicators, and a strong emphasis on planning evaluation from Level 4 backward.

The model continues to evolve. In 2026, Vanessa Milara Alzate is expanding the framework further with explicit attention to the performance environment: the systems, culture, and reinforcement structures that determine whether training actually sticks.

Kirkpatrick's Four Levels of Training Evaluation Explained

The sections below walk through each level in order from 1 to 4, which makes them easier to understand. But when you actually plan an evaluation strategy, you should work in the opposite direction: start with Level 4 (what the business needs) and design backward. We'll cover that workflow later.

Level 1: Reaction

Level 1 captures how participants respond to the training experience. Specifically, it measures how engaging, relevant, and useful they found it. This is the most commonly collected type of evaluation data, usually gathered through a short post-training survey (sometimes called a "smile sheet").

One important update from the New World model: relevance is a stronger predictor of behavior change than satisfaction or engagement. A learner can enjoy a workshop and still not apply any of it. But if they tell you the content was directly relevant to their job, they're far more likely to actually use it. Build your Level 1 surveys around relevance, not just enjoyment.

Level 1 evaluation also shouldn't wait until the end of a program. Formative pulse checks during the training, like quick polls, check-in questions, or facilitator observations, let you adjust in the moment instead of finding out a week later that learners were lost in module two.

Level 1 Evaluation Techniques

Sample Level 1 Questions

Use these as a starting point for your own questionnaire. Most are rated on a 1–5 Likert scale; the last two are open-ended.

  1. The training was directly relevant to my job.
  2. I will be able to apply what I learned in my work.
  3. The content was clear and well-organized.
  4. The instructor (or eLearning experience) kept me engaged.
  5. The pace of the training was appropriate.
  6. The practice activities helped me build confidence.
  7. I would recommend this training to a colleague.
  8. I feel confident I can use what I learned after this session.
  9. What part of the training was most useful to you?
  10. What would you change to make it more useful?

Level 1 Example: Screen Sharing Training

A technical support call center rolls out new screen-sharing software and runs a one-hour webinar teaching agents when to use it, how to initiate a session, and how to handle legal disclaimers. At the end, agents complete a short online survey rating relevance, clarity, and confidence, plus two open-ended questions on what worked and what didn't. The training team uses the results to flag the disclaimer section as confusing and revise it before the next cohort.
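
As a rough illustration of how a team might spot a weak section in Level 1 data, here is a minimal Python sketch. The item names, the sample ratings, and the 3.5 cutoff are all assumptions made up for this example, not values from the scenario above.

```python
from statistics import mean

def summarize_likert(responses, flag_below=3.5):
    """Return the mean rating per survey item and the items whose mean is below flag_below."""
    ratings_by_item = {}
    for response in responses:
        for item, rating in response.items():
            ratings_by_item.setdefault(item, []).append(rating)
    means = {item: round(mean(ratings), 2) for item, ratings in ratings_by_item.items()}
    flagged = sorted(item for item, value in means.items() if value < flag_below)
    return means, flagged

# Hypothetical 1-5 ratings from three agents (illustrative data only).
responses = [
    {"relevance": 5, "clarity": 4, "disclaimer_clarity": 2, "confidence": 4},
    {"relevance": 4, "clarity": 4, "disclaimer_clarity": 3, "confidence": 4},
    {"relevance": 5, "clarity": 5, "disclaimer_clarity": 2, "confidence": 3},
]

means, flagged = summarize_likert(responses)
print(means)
print(flagged)  # items to review before the next cohort
```

Averages hide disagreement, so it is worth reading the open-ended answers alongside any item that gets flagged.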

Level 2: Learning

Level 2 measures whether learners actually acquired the knowledge, skills, and attitudes the training was designed to teach. This is the cornerstone of most instructional design work. It's where quizzes, demonstrations, and final assessments live.

The New World Kirkpatrick Model expands Level 2 beyond the original "KSA" (knowledge, skills, attitudes) trio. It adds two new indicators: confidence (do learners believe they can do this back on the job?) and commitment (do they intend to actually do it?). Both are strong predictors of whether Level 3 behavior change will happen, and both can be measured with a few well-placed survey questions.

Pre-tests are also worth the effort when feasible. Measuring knowledge before and after training is the cleanest way to attribute gains to the training itself rather than to existing experience.
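
A pre/post comparison can be sketched in a few lines of Python. The learner IDs and scores below are hypothetical, and the average paired gain is just one simple way to summarize the change; it is a sketch, not a prescribed Kirkpatrick calculation.

```python
from statistics import mean

def learning_gain(pre_scores, post_scores):
    """Average per-learner change (post minus pre), using only learners who took both tests."""
    shared = pre_scores.keys() & post_scores.keys()
    if not shared:
        raise ValueError("no learners with both a pre-test and a post-test score")
    return mean(post_scores[learner] - pre_scores[learner] for learner in shared)

# Hypothetical percentage scores keyed by learner ID (illustrative data only).
pre = {"a01": 50, "a02": 60, "a03": 70}
post = {"a01": 80, "a02": 85, "a03": 90, "a04": 75}  # a04 skipped the pre-test

print(learning_gain(pre, post))
```

Pairing scores by learner keeps late joiners or dropouts from skewing the comparison.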

Level 2 Evaluation Techniques

Sample Level 2 Questions & Assessment Formats

Knowledge check (multiple choice):

Skill demonstration prompts:

Confidence and commitment (1–5 scale):

Level 2 Example: Screen Sharing Assessment

After the call center webinar, agents complete a 10-question multiple-choice quiz on the screen-sharing process and legal disclaimers. They must score 80% or higher to receive certification. They also complete a live role-play with their supervisor (initiating a session, walking through the disclaimer, and screen-sharing successfully) before they're authorized to use the tool on real customer calls.
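
The two-part gate in this example (quiz score plus role-play) could be expressed as a small check like the one below. The function name and parameters are hypothetical; only the 10 questions and the 80% pass mark come from the scenario.

```python
def is_authorized(correct_answers, total_questions=10, pass_percent=80, passed_role_play=False):
    """True only if the quiz score meets the pass mark AND the role-play was passed."""
    if total_questions <= 0:
        raise ValueError("total_questions must be positive")
    # Integer comparison avoids floating-point edge cases at exactly 80%.
    passed_quiz = correct_answers * 100 >= pass_percent * total_questions
    return passed_quiz and passed_role_play

print(is_authorized(8, passed_role_play=True))   # meets both requirements
print(is_authorized(10, passed_role_play=False)) # perfect quiz, but no role-play yet
```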

A quick contrast: for a coffee-roastery cleaning workshop, written tests don't cut it. Physical procedural skills are best measured by direct observation, like watching each operator clean the machine end to end.

Level 3: Behavior

Level 3 asks the question that actually matters to the business: are people behaving differently on the job because of this training? Learning something in a classroom and applying it back at your desk are two different things, and Level 3 is where the model starts producing data you can actually act on.

Two important practical notes. First, measurement should begin within a few weeks of training, not after 90 days. The longer you wait, the harder it is to attribute behavior changes to the training versus everything else happening in the work environment. A common cadence is to start observing 2–4 weeks post-training and continue through the 3–6 month window.

Second, behavior change doesn't happen because of training alone. Don Kirkpatrick identified four conditions necessary for behavior change: desire to change, knowledge of what to do, the right climate (a supportive manager, removed obstacles), and rewards for doing it. The New World model formalizes this idea as required drivers: the reinforcement, accountability, and support systems that turn learning into behavior. Without those drivers, even excellent training fails at Level 3.

Level 3 Evaluation Techniques

Sample Level 3 Questions & Observation Methods

Self-report (sent 30 and 60 days post-training):

Supervisor observation checklist (sample items):

Level 3 Example: On-the-Job Screen Sharing Behavior

The screen-sharing software is integrated with the call center's performance management platform, so every screen-share session is logged automatically. Three weeks after training, the team pulls a report: what percentage of eligible calls included a screen-share? Agents below a threshold get a coaching conversation, not a reprimand. The goal is to find out what's blocking transfer (forgot the steps? worried about customer reaction? no manager reinforcement?) and fix it.
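
The adoption report described here boils down to a per-agent ratio. The sketch below assumes a hypothetical log format of (agent ID, eligible, used screen share) tuples and an arbitrary 50% threshold; a real platform export would look different.

```python
from collections import defaultdict

def adoption_rates(calls):
    """Share of each agent's eligible calls that included a screen-share session."""
    eligible = defaultdict(int)
    shared = defaultdict(int)
    for agent_id, is_eligible, used_screen_share in calls:
        if is_eligible:
            eligible[agent_id] += 1
            if used_screen_share:
                shared[agent_id] += 1
    return {agent: shared[agent] / eligible[agent] for agent in eligible}

def needs_coaching(rates, threshold=0.5):
    """Agents whose adoption rate is below the threshold, for a coaching conversation."""
    return sorted(agent for agent, rate in rates.items() if rate < threshold)

# Hypothetical call log (illustrative data only).
calls = [
    ("a01", True, True), ("a01", True, True), ("a01", True, True), ("a01", True, False),
    ("a02", True, True), ("a02", True, False), ("a02", True, False), ("a02", True, False),
    ("a02", False, False),  # not eligible, so it does not count against the agent
]

rates = adoption_rates(calls)
print(rates)
print(needs_coaching(rates))
```

Counting only eligible calls matters: an agent should not be flagged for skipping the tool on calls where it didn't apply.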

Level 4: Results

Level 4 measures whether the training is actually moving the needle on the outcomes the business cares about, such as sales, customer satisfaction, retention, safety incidents, output, and error rates. This is where training proves its worth, and it's the level most organizations skip.

Two concepts make Level 4 more practical. The first is the distinction between leading and lagging indicators. Lagging indicators (quarterly revenue, annual turnover) tell you what already happened. Leading indicators (number of qualified demos booked, first-call resolution rate) predict where the lagging numbers are headed and let you course-correct earlier.

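The second is a shift away from chasing a precise ROI number. Calculating the dollar return on a training program with any real certainty is extremely hard because there are too many confounding variables. Kirkpatrick Partners now recommends two more honest alternatives: Return on Expectations (ROE), which reports results against the outcomes stakeholders agreed to up front, and Contributive ROI, which treats training as one of several factors contributing to a result rather than claiming sole financial credit.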

Phillips' ROI Methodology (covered in the alternatives section below) takes a different approach and tries to isolate training's financial impact directly. Both views are valid; choose based on what your stakeholders find credible.

Level 4 Evaluation Techniques

Sample Level 4 Metrics & Questions

For the screen-sharing initiative:

For stakeholder ROE conversations:

Level 4 Example: Customer Satisfaction Impact

Six months after the screen-sharing rollout, the team compares CSAT scores on calls that used screen sharing against a matched sample that didn't. Calls with screen sharing show a measurable lift in CSAT and a reduction in average handle time. Combined with the Level 3 data showing strong adoption, the training team makes a credible Contributive ROI argument: the program is one of several factors driving the CSAT improvement, and the evidence supports continued investment.
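
The comparison in this example is, at its simplest, a difference between two group averages. The sketch below uses made-up CSAT scores and a hypothetical function name. A raw difference like this shows association, not proof of cause, which is why the example frames it as a Contributive ROI argument.

```python
from statistics import mean

def compare_groups(with_screen_share, matched_without):
    """Mean score for each group and the difference between them (the 'lift')."""
    mean_with = mean(with_screen_share)
    mean_without = mean(matched_without)
    return {"with": mean_with, "without": mean_without, "lift": mean_with - mean_without}

# Hypothetical 1-5 CSAT scores (illustrative data only).
csat_with_share = [5, 4, 5, 4]
csat_matched_sample = [4, 3, 4, 3]

result = compare_groups(csat_with_share, csat_matched_sample)
print(result)
```

The same function works for average handle time; there, a negative lift is the improvement.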

The New World Kirkpatrick Model

The New World Kirkpatrick Model, developed by Jim and Wendy Kirkpatrick, is the current, modernized version of the framework. It keeps the original four levels but adds several practical concepts that address the biggest weakness of the 1950s version: the assumption that good training automatically leads to behavior change and business results.

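The key additions are required drivers (the post-training reinforcement, accountability, and support that enable behavior change), confidence and commitment as Level 2 indicators, an emphasis on planning from Level 4 backward, and Return on Expectations (ROE) as a more practical alternative to financial ROI.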

If you're learning the model today, learn the New World version. The original 1959 framework is foundational, but the updated model reflects what practitioners have figured out over six decades of trying to make it work in real organizations.

How to Use the Kirkpatrick Model: A Step-by-Step Framework

Here's the workflow most experienced evaluators follow, working from Level 4 backward to Level 1.

  1. Define Level 4 outcomes with stakeholders. What measurable business result is this training supposed to support? Get a specific target where possible (e.g., "sell 800,000 units in year one," "reduce safety incidents by 20%," "improve CSAT by 5 points"). Document this as your Return on Expectations.
  2. Identify Level 3 critical behaviors. Working with subject matter experts and frontline managers, list the specific on-the-job behaviors that drive the Level 4 result. Be ruthless and focus on the few behaviors that matter most.
  3. Plan the required drivers. For each critical behavior, identify the reinforcement, accountability, and support that need to exist post-training. Who reinforces it? What's measured? What's rewarded? Without this step, training won't transfer.
  4. Define Level 2 learning objectives. What knowledge, skills, attitudes, confidence, and commitment do learners need to perform the critical behaviors? Write objectives that point directly at Level 3 behaviors.
  5. Design the Level 1 experience. Build the training to be relevant first, engaging second. Make sure learners can see the link between what they're doing in the session and what they'll do back on the job.
  6. Build evaluation instruments before launch. Write your surveys, quizzes, observation checklists, and metric dashboards before the training rolls out. If you wait until afterward, you'll miss baseline data and lose credibility with stakeholders.
  7. Implement with formative pulse checks. Don't wait for end-of-training feedback. Check in during the program, after each module, and at intervals post-training (30, 60, 90 days).
  8. Measure, report, and iterate. Share results with sponsors against the ROE you defined in step 1. Where the program is working, scale it. Where it's not, find out whether the issue is design, delivery, or missing required drivers, and then fix the root cause.
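
The 30-, 60-, and 90-day check-ins from step 7 are easy to put on a calendar before launch. This is a minimal sketch using Python's standard library; the training end date is an arbitrary example.

```python
from datetime import date, timedelta

def follow_up_dates(training_end, offsets_days=(30, 60, 90)):
    """Dates for post-training check-ins, counted in days from the end of training."""
    return [training_end + timedelta(days=offset) for offset in offsets_days]

# Hypothetical training end date (illustrative only).
schedule = follow_up_dates(date(2026, 1, 15))
for check_in in schedule:
    print(check_in.isoformat())
```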

This workflow pairs well with broader ID frameworks like ADDIE and other instructional design models. Kirkpatrick handles the evaluation strategy; ADDIE handles the design and development workflow.

Kirkpatrick Model Template & Questionnaire

You can use this template to plan your evaluation using the Kirkpatrick model. It’s the exact template that I teach and share in the Instructional Design Project Lab.

Level 1: Reaction

| Success Criteria | Planned Method(s) | Timing |
| --- | --- | --- |
| [Insert success criteria here] | [Insert proposed method(s) here] | [Insert proposed timing here] |

Level 2: Learning

| Success Criteria | Planned Method(s) | Timing |
| --- | --- | --- |
| [Insert success criteria here] | [Insert proposed method(s) here] | [Insert proposed timing here] |

Level 3: Behavior

| Success Criteria | Planned Method(s) | Timing |
| --- | --- | --- |
| [Insert success criteria here] | [Insert proposed method(s) here] | [Insert proposed timing here] |

Level 4: Results

| Success Criteria | Planned Method(s) | Timing |
| --- | --- | --- |
| [Insert success criteria here] | [Insert proposed method(s) here] | [Insert proposed timing here] |

Sample Questionnaire by Level

The sample questions earlier in this guide can be assembled into a complete questionnaire. A practical approach:

Criticisms and Limitations of the Kirkpatrick Model

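The Kirkpatrick Model is the industry standard, but it has real limitations worth knowing about (especially if you're going to defend your evaluation choices to a skeptical stakeholder). The main ones: the causal links between levels are weaker than the model implies, organizations rarely measure Levels 3 and 4 in practice, the model doesn't account for environmental and transfer factors, and Level 4 attribution is genuinely difficult to defend.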

None of this invalidates the framework. It's still the most useful starting point we have. But pretending the model is airtight does the profession no favors.

Alternatives to the Kirkpatrick Model

Several other evaluation frameworks have been developed to address Kirkpatrick's limitations or take different angles on the same problem.

| Model | What It Adds | Best For |
| --- | --- | --- |
| Phillips ROI Methodology | Adds a fifth level (financial ROI) and provides a specific methodology for isolating training's impact and converting results to monetary value | Organizations that need a defensible dollar-figure ROI |
| CIRO Model (Warr, Bird, Rackham) | Four stages: Context, Input, Reaction, Output. Adds front-end analysis (context and input) that Kirkpatrick assumes you've already done | Programs that need formal needs analysis built into the evaluation framework |
| Anderson's Value of Learning Model | Three-stage model emphasizing strategic alignment of learning with business priorities before measurement | Senior L&D leaders aligning a training portfolio with strategy |
| Brinkerhoff's Success Case Method | Identifies the most and least successful cases and studies them in depth, rather than averaging across all participants | Quickly understanding what makes training transfer (or fail to transfer) in real conditions |

In practice, most organizations end up using Kirkpatrick as the spine and borrowing from Phillips (for ROI), CIRO (for front-end context), or Brinkerhoff (for case-based insight) when the situation calls for it.

Frequently Asked Questions

What are the four levels of the Kirkpatrick Model?

Level 1 (Reaction) measures how learners respond to the training. Level 2 (Learning) measures what they know and can do. Level 3 (Behavior) measures whether they apply it on the job. Level 4 (Results) measures the impact on business outcomes.

Who developed the Kirkpatrick Model and when?

Donald Kirkpatrick developed the model as part of his 1954 PhD dissertation at the University of Wisconsin and published it through a series of articles in 1959. His son Jim Kirkpatrick and daughter-in-law Wendy Kirkpatrick updated it into the New World Kirkpatrick Model in the 2010s.

When should each level be measured?

Level 1: during and immediately after training. Level 2: during and at the end of training (and pre-training as a baseline where possible). Level 3: starting within a few weeks of training and continuing through the 3–6 month window. Level 4: 3–12 months post-training, with ongoing tracking.

What's the difference between the original and New World Kirkpatrick Model?

The New World model adds required drivers (the post-training reinforcement that enables behavior change), confidence and commitment as Level 2 indicators, an emphasis on planning from Level 4 backward, and Return on Expectations (ROE) as a more practical alternative to financial ROI.

Is the Kirkpatrick Model still relevant?

Yes, it remains the most widely used training evaluation framework in corporate L&D and instructional design. The New World version addresses most of the modern critiques of the original, and the four-level vocabulary is universal among practitioners.

What are the main criticisms of the Kirkpatrick Model?

The main criticisms are that the causal links between levels are weaker than the model implies, that organizations rarely measure Levels 3 and 4 in practice, that the model doesn't account for environmental and transfer factors, and that Level 4 attribution is genuinely difficult to defend.

How does Kirkpatrick compare to the Phillips ROI Model?

Phillips adds a fifth level (financial ROI) and a specific methodology for isolating training's monetary impact. Kirkpatrick (especially the New World version) prefers Return on Expectations and Contributive ROI, which acknowledge that training contributes to outcomes alongside other factors rather than claiming sole financial credit.

Putting the Kirkpatrick Model into Practice

The Kirkpatrick Model isn't a checklist. It's a discipline. Used well, it forces you to answer two hard questions before you build a single slide: what does the business actually need? and what will people need to do differently for that to happen? Everything else follows from there.

If you only do one thing with this framework, plan from Level 4 backward. That single habit separates evaluation that informs decisions from evaluation that fills out a checkbox.

About Devlin Peck
Devlin Peck is an instructional design educator and founder of Peck Academy, a licensed career school in Oregon, and the publisher of DevlinPeck.com.
