The Kirkpatrick Model: Four Levels of Training Evaluation (With Examples, Free Template, & FAQ)

The Kirkpatrick Model is a four-level framework for evaluating the effectiveness of training programs. Developed by Donald Kirkpatrick in 1959 and refined for decades since, it remains the most widely used training evaluation model in corporate learning and instructional design.

This guide covers all four levels with real-world examples, sample evaluation questions for each level, a planning template, the updated New World Kirkpatrick Model, common criticisms, and how Kirkpatrick compares to alternatives like the Phillips ROI Methodology and CIRO.

What Is the Kirkpatrick Model?

The Kirkpatrick Model is a training evaluation framework that measures the impact of a learning program across four sequential levels: Reaction, Learning, Behavior, and Results. Each level answers a different question about whether the training is working, and each is harder (and more valuable) to measure than the one before it.

While the model is most often applied to corporate training and eLearning programs, it's flexible enough to evaluate any kind of learning intervention. Modern practitioners (including the Kirkpatrick family's own training organization) treat the model as cyclical rather than linear, and they teach evaluators to plan in reverse: start by defining your Level 4 business results, then work backward to design the training itself.

The Four Levels at a Glance

Here's how the four levels compare across the questions they answer, the methods used to measure them, and when to do that measurement.

Level	Question Answered	Common Methods	When to Measure	Difficulty
1. Reaction	Did learners find the training engaging, relevant, and useful?	Post-training surveys, pulse checks, focus groups	During and immediately after training	Low
2. Learning	Did learners gain the intended knowledge, skills, attitudes, confidence, and commitment?	Quizzes, tests, demonstrations, pre/post assessments	During and at the end of training	Moderate
3. Behavior	Are learners applying what they learned on the job?	Observation, performance data, 360-degree feedback, supervisor reviews	Beginning within weeks of training; ongoing	High
4. Results	Is the training producing measurable outcomes for the business?	KPIs, ROE, Contributive ROI, leading and lagging indicators	3–12 months post-training, ongoing	Very High

A Brief History of the Model (1959–2026)

Donald Kirkpatrick first introduced the four levels in his 1954 PhD dissertation at the University of Wisconsin, then published the framework in a 1959 series of articles in the Journal of the American Society of Training Directors (later renamed Training and Development Journal). His 1994 book Evaluating Training Programs: The Four Levels made the model the de facto standard for the training industry.

In the 2010s, Don's son Jim Kirkpatrick and daughter-in-law Wendy Kirkpatrick launched the New World Kirkpatrick Model through Kirkpatrick Partners. Their update kept the original four levels but added critical refinements: required drivers to support on-the-job behavior, confidence and commitment as Level 2 indicators, and a strong emphasis on planning evaluation from Level 4 backward.

The model continues to evolve. In 2026, Vanessa Milara Alzate is expanding the framework further with explicit attention to the performance environment: the systems, culture, and reinforcement structures that determine whether training actually sticks.

Kirkpatrick's Four Levels of Training Evaluation Explained

The sections below walk through each level in order from 1 to 4, which makes them easier to understand. But when you actually plan an evaluation strategy, you should work the opposite direction: start with Level 4 (what the business needs) and design backward. We'll cover that workflow later.

Level 1: Reaction

Level 1 captures how participants respond to the training experience. Specifically, it measures how engaging, relevant, and useful they found it. This is the most commonly collected type of evaluation data, usually gathered through a short post-training survey (sometimes called a "smile sheet").

One important update from the New World model: relevance is a stronger predictor of behavior change than satisfaction or engagement. A learner can enjoy a workshop and still not apply any of it. But if they tell you the content was directly relevant to their job, they're far more likely to actually use it. Build your Level 1 surveys around relevance, not just enjoyment.

Level 1 evaluation also shouldn't wait until the end of a program. Formative pulse checks during the training, like quick polls, check-in questions, or facilitator observations, let you adjust in the moment instead of finding out a week later that learners were lost in module two.

Level 1 Evaluation Techniques

Post-training surveys (delivered via email, LMS, or in-session form)
In-the-moment polls and pulse checks
Short interviews or focus groups with a sample of participants
Facilitator observations during live sessions

Sample Level 1 Questions

Use these as a starting point for your own questionnaire. Most are rated on a 1–5 Likert scale; the last two are open-ended.

The training was directly relevant to my job.
I will be able to apply what I learned in my work.
The content was clear and well-organized.
The instructor (or eLearning experience) kept me engaged.
The pace of the training was appropriate.
The practice activities helped me build confidence.
I would recommend this training to a colleague.
I feel confident I can use what I learned after this session.
What part of the training was most useful to you?
What would you change to make it more useful?

Level 1 Example: Screen Sharing Training

A technical support call center rolls out new screen-sharing software and runs a one-hour webinar teaching agents when to use it, how to initiate a session, and how to handle legal disclaimers. At the end, agents complete a short online survey rating relevance, clarity, and confidence, plus two open-ended questions on what worked and what didn't. The training team uses the results to flag the disclaimer section as confusing and revise it before the next cohort.
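Here's a minimal sketch of how the survey results in this example might be summarized. The item names, the sample responses, and the 3.5 cut-off are all assumptions made for illustration; the model itself doesn't prescribe a threshold.

```python
from statistics import mean

# Hypothetical 1-5 Likert responses, one list per survey item.
responses = {
    "relevance": [5, 4, 5, 4, 4],
    "clarity_disclaimers": [2, 3, 3, 2, 4],
    "confidence": [4, 4, 3, 5, 4],
}

FLAG_BELOW = 3.5  # assumed cut-off for "needs revision"


def summarize(responses, flag_below=FLAG_BELOW):
    """Return {item: (mean score, flagged?)} for each survey item."""
    return {
        item: (round(mean(scores), 2), mean(scores) < flag_below)
        for item, scores in responses.items()
    }


summary = summarize(responses)
for item, (avg, flagged) in summary.items():
    print(f"{item}: {avg}" + ("  <- review" if flagged else ""))
```

With this made-up data, only the disclaimer item falls below the cut-off, which mirrors the revision decision in the example. The open-ended answers would still need to be read by a person.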

Level 2: Learning

Level 2 measures whether learners actually acquired the knowledge, skills, and attitudes the training was designed to teach. This is the cornerstone of most instructional design work. It's where quizzes, demonstrations, and final assessments live.

The New World Kirkpatrick Model expands Level 2 beyond the original "KSA" (knowledge, skills, attitudes) trio. It adds two new indicators: confidence (do learners believe they can do this back on the job?) and commitment (do they intend to actually do it?). Both are strong predictors of whether Level 3 behavior change will happen, and both can be measured with a few well-placed survey questions.

Pre-tests are also worth the effort when feasible. Measuring knowledge before and after training is the cleanest way to attribute gains to the training itself rather than to existing experience.
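As a rough sketch, a pre/post comparison can be as simple as subtracting each learner's pre-test score from their post-test score. The learner IDs and scores below are invented for illustration.

```python
from statistics import mean

# Hypothetical (pre, post) quiz scores in percent, keyed by learner id.
scores = {
    "agent_01": (50, 90),
    "agent_02": (70, 85),
    "agent_03": (60, 80),
}


def gains(scores):
    """Per-learner gain in percentage points (post minus pre)."""
    return {learner: post - pre for learner, (pre, post) in scores.items()}


per_learner = gains(scores)
average_gain = mean(per_learner.values())

print(per_learner)
print(f"Average gain: {average_gain:.1f} points")
```

A raw gain like this only shows that scores moved during the training window. It says nothing yet about whether the skill shows up on the job, which is the Level 3 question.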

Level 2 Evaluation Techniques

Multiple-choice quizzes and written tests (best for knowledge and cognitive skills)
Skill demonstrations and role plays (best for procedural or physical skills)
Pre- and post-assessments to measure gain
Confidence and commitment surveys (e.g., "How confident are you that you can do X on Monday?")
Case-study analyses and scenario-based questions

Sample Level 2 Questions & Assessment Formats

Knowledge check (multiple choice):

Which of the following is the correct first step in initiating a screen-sharing session? (A/B/C/D)
Before sharing your screen with a customer, you must: (select all that apply)

Skill demonstration prompts:

"Walk me through initiating a screen-sharing session with this practice customer."
"Show me how you would handle a customer who declines the screen-sharing request."

Confidence and commitment (1–5 scale):

I feel confident I can initiate a screen-sharing session correctly on my next live call.
I plan to use screen sharing on appropriate calls starting this week.

Level 2 Example: Screen Sharing Assessment

After the call center webinar, agents complete a 10-question multiple-choice quiz on the screen-sharing process and legal disclaimers. They must score 80% or higher to receive certification. They also complete a live role-play with their supervisor (initiating a session, walking through the disclaimer, and completing the screen share) before they're authorized to use the tool on real customer calls.
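The authorization rule in this example can be written down as a simple check. This is only a sketch: the 80% pass mark and 10-item quiz come from the example, while the class, field names, and sample agents are hypothetical.

```python
from dataclasses import dataclass

PASS_MARK = 0.80  # from the example: 80% or higher on the quiz
QUIZ_ITEMS = 10   # from the example: 10-question quiz


@dataclass
class AgentResult:
    # Hypothetical field names, chosen for this sketch.
    name: str
    quiz_correct: int
    role_play_passed: bool


def is_authorized(result: AgentResult) -> bool:
    """True only if the quiz meets the pass mark AND the role-play was passed."""
    quiz_ok = result.quiz_correct / QUIZ_ITEMS >= PASS_MARK
    return quiz_ok and result.role_play_passed


results = [
    AgentResult("Agent A", 9, True),
    AgentResult("Agent B", 8, False),  # passed the quiz, not the role-play
    AgentResult("Agent C", 7, True),   # passed the role-play, not the quiz
]
authorized = [r.name for r in results if is_authorized(r)]
print(authorized)
```

Requiring both conditions reflects the point of the example: a passing quiz score alone doesn't show that an agent can run a session with a customer.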

A quick contrast: in a workshop that teaches roastery staff to clean a coffee roaster, written tests don't cut it. Physical, procedural skills are best measured by direct observation, such as watching each operator clean the machine end to end.

Level 3: Behavior

Level 3 asks the question that actually matters to the business: are people behaving differently on the job because of this training? Learning something in a classroom and applying it back at your desk are two different things, and Level 3 is where the model starts producing data you can actually act on.

Two important practical notes. First, measurement should begin within a few weeks of training, not after 90 days. The longer you wait, the harder it is to attribute behavior changes to the training versus everything else happening in the work environment. A common cadence is to start observing 2–4 weeks post-training and continue through the 3–6 month window.

Second, behavior change doesn't happen because of training alone. Don Kirkpatrick identified four conditions necessary for behavior change: desire to change, knowledge of what to do, the right climate (a supportive manager, removed obstacles), and rewards for doing it. The New World model formalizes this idea as required drivers: the reinforcement, accountability, and support systems that turn learning into behavior. Without those drivers, even excellent training fails at Level 3.

Level 3 Evaluation Techniques

Direct on-the-job observation by supervisors or peers
Performance metrics already tracked by business systems (call data, sales activity, ticket resolution times)
360-degree feedback from managers, peers, and direct reports
Self-reported behavior surveys at intervals (30, 60, 90 days)
Action plans and follow-up coaching conversations
xAPI (Experience API / Tin Can) data for tracking informal and on-the-job activity

Sample Level 3 Questions & Observation Methods

Self-report (sent 30 and 60 days post-training):

How often have you used screen sharing on customer calls in the past 30 days?
What's been the biggest obstacle to applying what you learned?
What support from your manager or team would help you apply this more consistently?

Supervisor observation checklist (sample items):

Agent identified an appropriate moment to offer screen sharing
Agent read the disclaimer accurately and in full
Agent troubleshot connection issues without disengaging the customer

Level 3 Example: On-the-Job Screen Sharing Behavior

The screen-sharing software is integrated with the call center's performance management platform, so every screen-share session is logged automatically. Three weeks after training, the team pulls a report: what percentage of eligible calls included a screen-share? Agents below a threshold get a coaching conversation, not a reprimand. The goal is to find out what's blocking transfer (forgot the steps? worried about customer reaction? no manager reinforcement?) and fix it.
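A report like the one in this example might boil down to something like the sketch below. The log format, agent IDs, and the 50% coaching threshold are assumptions; the example doesn't specify a number.

```python
# Hypothetical per-agent totals for the three weeks after training:
# (eligible calls, calls that included a screen-share session).
call_logs = {
    "agent_01": (40, 30),
    "agent_02": (50, 10),
    "agent_03": (0, 0),  # no eligible calls yet
}

COACHING_THRESHOLD = 0.50  # assumed; pick a value that fits your context


def adoption_rates(call_logs):
    """Share of eligible calls that used screen sharing (None if no eligible calls)."""
    return {
        agent: (used / eligible if eligible else None)
        for agent, (eligible, used) in call_logs.items()
    }


def needs_coaching(rates, threshold=COACHING_THRESHOLD):
    """Agents whose adoption rate is known and below the threshold."""
    return [a for a, r in rates.items() if r is not None and r < threshold]


rates = adoption_rates(call_logs)
print(rates)
print(needs_coaching(rates))
```

Agents with no eligible calls are left out of the coaching list rather than counted as zero adoption, since there's nothing to coach on yet. The list is a prompt for a conversation about obstacles, not a verdict.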

Level 4: Results

Level 4 measures whether the training is actually moving the needle on the outcomes the business cares about, such as sales, customer satisfaction, retention, safety incidents, output, and error rates. This is where training proves its worth, and it's the level most organizations skip.

Two concepts make Level 4 more practical. The first is the distinction between leading and lagging indicators. Lagging indicators (quarterly revenue, annual turnover) tell you what already happened. Leading indicators (number of qualified demos booked, first-call resolution rate) predict where the lagging numbers are headed and let you course-correct earlier.

The second is a shift away from chasing a precise ROI number. Calculating the dollar return on a training program with any real certainty is extremely hard. There are too many confounding variables. Kirkpatrick Partners now recommends two more honest alternatives:

Return on Expectations (ROE): Did the training deliver what stakeholders said success would look like? Defined collaboratively at the start of the project.
Contributive ROI (cROI): An acknowledgment that training contributes to business results alongside other factors, rather than claiming sole credit.

Phillips' ROI Methodology (covered in the alternatives section below) takes a different approach and tries to isolate training's financial impact directly. Both views are valid; choose based on what your stakeholders find credible.
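To see why a precise ROI number is so sensitive to attribution, here's a small sketch using the standard ROI formula: net benefits divided by costs, times 100. The dollar figures and the 50% attribution share are invented for illustration.

```python
def roi_percent(benefits: float, costs: float) -> float:
    """ROI (%) = (benefits - costs) / costs * 100."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (benefits - costs) / costs * 100


# Invented figures for illustration only.
program_costs = 40_000.0
monetized_benefits = 100_000.0

# If training gets credit for the whole improvement:
full_credit = roi_percent(monetized_benefits, program_costs)

# If only half of the improvement is attributed to training (an assumption):
half_credit = roi_percent(monetized_benefits * 0.5, program_costs)

print(full_credit)  # 150.0
print(half_credit)  # 25.0
```

The same program reports 150% or 25% depending on one attribution assumption, which is exactly the kind of uncertainty that ROE and Contributive ROI are meant to acknowledge.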

Level 4 Evaluation Techniques

Tracking organization-level KPIs against pre-training baselines
Control-group comparisons where feasible
ROE check-ins with sponsors against pre-defined success criteria
Customer feedback, NPS, and CSAT data
Operational metrics (output, error rates, safety incidents, turnover)

Sample Level 4 Metrics & Questions

For the screen-sharing initiative:

CSAT (customer satisfaction) on calls that included screen sharing vs. calls that did not
First-call resolution rate
Average handle time on issues where screen sharing was used

For stakeholder ROE conversations:

Did we hit the customer-satisfaction targets you defined at the start of the project?
What evidence would convince you this program is worth continuing?
What unintended outcomes, positive or negative, have you observed?

Level 4 Example: Customer Satisfaction Impact

Six months after the screen-sharing rollout, the team compares CSAT scores on calls that used screen sharing against a matched sample that didn't. Calls with screen sharing show a measurable lift in CSAT and a reduction in average handle time. Combined with the Level 3 data showing strong adoption, the training team makes a credible Contributive ROI argument: the program is one of several factors driving the CSAT improvement, and the evidence supports continued investment.
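The comparison in this example could start with something as simple as the sketch below. The ratings and handle times are made up, and a real analysis would need far more calls and a properly matched sample.

```python
from statistics import mean

# Hypothetical (CSAT rating 1-5, handle time in minutes) per call.
with_sharing = [(5, 10), (4, 12), (5, 9), (4, 11)]
without_sharing = [(4, 14), (3, 15), (4, 13), (3, 16)]


def averages(calls):
    """Return (mean CSAT, mean handle time) for a list of (csat, minutes) calls."""
    return mean(c for c, _ in calls), mean(m for _, m in calls)


csat_with, aht_with = averages(with_sharing)
csat_without, aht_without = averages(without_sharing)

print(f"CSAT lift: {csat_with - csat_without:+.2f}")
print(f"Handle-time change: {aht_with - aht_without:+.2f} min")
```

A difference in averages doesn't establish cause on its own, which is why the example frames the result as a contribution rather than sole credit.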

The New World Kirkpatrick Model

The New World Kirkpatrick Model, developed by Jim and Wendy Kirkpatrick, is the current, modernized version of the framework. It keeps the original four levels but adds several practical concepts that address the biggest weakness of the 1950s version: the assumption that good training automatically leads to behavior change and business results.

The key additions:

Plan from Level 4 backward. Every project starts by defining business results, then identifies the critical behaviors that produce those results, then the learning required for those behaviors, then the experience that delivers the learning.
Required drivers at Level 3. Reinforcement, encouragement, rewards, and accountability systems that have to exist in the work environment for behavior change to happen. The model estimates that around 70% of behavior-change success comes from these drivers, not from training quality.
Confidence and commitment at Level 2. Two new learning indicators that predict transfer better than knowledge alone.
Return on Expectations (ROE) at Level 4. A stakeholder-defined measure of success that replaces the often-unreliable hunt for a precise ROI percentage.

If you're learning the model today, learn the New World version. The original 1959 framework is foundational, but the updated model reflects what practitioners have figured out over six decades of trying to make it work in real organizations.

How to Use the Kirkpatrick Model: A Step-by-Step Framework

Here's the workflow most experienced evaluators follow, working from Level 4 backward to Level 1.

1. Define Level 4 outcomes with stakeholders. What measurable business result is this training supposed to support? Get a specific target where possible (e.g., "sell 800,000 units in year one," "reduce safety incidents by 20%," "improve CSAT by 5 points"). Document this as your Return on Expectations.
2. Identify Level 3 critical behaviors. Working with subject matter experts and frontline managers, list the specific on-the-job behaviors that drive the Level 4 result. Be ruthless and focus on the few behaviors that matter most.
3. Plan the required drivers. For each critical behavior, identify the reinforcement, accountability, and support that needs to exist post-training. Who reinforces it? What's measured? What's rewarded? Without this step, training won't transfer.
4. Define Level 2 learning objectives. What knowledge, skills, attitudes, confidence, and commitment do learners need to perform the critical behaviors? Write objectives that point directly at Level 3 behaviors.
5. Design the Level 1 experience. Build the training to be relevant first, engaging second. Make sure learners can see the link between what they're doing in the session and what they'll do back on the job.
6. Build evaluation instruments before launch. Write your surveys, quizzes, observation checklists, and metric dashboards before the training rolls out. If you wait until afterward, you'll miss baseline data and lose credibility with stakeholders.
7. Implement with formative pulse-checks. Don't wait for end-of-training feedback. Check in during the program, after each module, and at intervals post-training (30, 60, 90 days).
8. Measure, report, and iterate. Share results with sponsors against the ROE you defined in step 1. Where the program is working, scale it. Where it's not, find out whether the issue is design, delivery, or missing required drivers, and then fix the root cause.

This workflow pairs well with broader ID frameworks like ADDIE and other instructional design models. Kirkpatrick handles the evaluation strategy; ADDIE handles the design and development workflow.

Kirkpatrick Model Template & Questionnaire

You can use this template to plan your evaluation using the Kirkpatrick model. It’s the exact template that I teach and share in the Instructional Design Project Lab.

Level 1: Reaction

Success Criteria	Planned Method(s)	Timing
[Insert success criteria here]	[Insert proposed method(s) here]	[Insert proposed timing here]

Level 2: Learning

Success Criteria	Planned Method(s)	Timing
[Insert success criteria here]	[Insert proposed method(s) here]	[Insert proposed timing here]

Level 3: Behavior

Success Criteria	Planned Method(s)	Timing
[Insert success criteria here]	[Insert proposed method(s) here]	[Insert proposed timing here]

Level 4: Results

Success Criteria	Planned Method(s)	Timing
[Insert success criteria here]	[Insert proposed method(s) here]	[Insert proposed timing here]
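If you'd rather keep the four tables above in a script or spreadsheet export, here's one way they might look as structured data. The class and field names are hypothetical, and the entries are loosely filled in from the screen-sharing example, with one cell left blank on purpose.

```python
from dataclasses import dataclass


@dataclass
class LevelPlan:
    # One row of the template: a level plus its three cells.
    level: int
    name: str
    success_criteria: str
    methods: str
    timing: str


# Listed from Level 4 down to Level 1, matching the plan-backward workflow.
plan = [
    LevelPlan(4, "Results", "CSAT lift on calls that use screen sharing",
              "CSAT comparison against matched calls", "6 months post-training"),
    LevelPlan(3, "Behavior", "Agents use screen sharing on eligible calls",
              "Automatically logged sessions", "Starting 3 weeks post-training"),
    LevelPlan(2, "Learning", "80% or higher on quiz; role-play passed",
              "10-question quiz, supervisor role-play", "End of training"),
    LevelPlan(1, "Reaction", "", "Online survey", "End of webinar"),
]


def incomplete_levels(plan):
    """Names of levels where any template cell is still blank."""
    return [
        p.name for p in plan
        if not all(cell.strip() for cell in (p.success_criteria, p.methods, p.timing))
    ]


print(incomplete_levels(plan))
```

Running a check like this before launch is a cheap way to catch a level that has a method and a date but no definition of success.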

Sample Questionnaire by Level

The sample questions earlier in this guide can be assembled into a complete questionnaire. A practical approach:

End-of-training survey (Level 1): 8–10 items covering relevance, clarity, engagement, confidence, and two open-ended questions.
Knowledge assessment (Level 2): 10–15 multiple-choice items aligned to learning objectives, plus 2–3 confidence and commitment items on a 5-point scale.
30/60/90-day follow-up (Level 3): 5–8 self-report items on frequency of application, perceived obstacles, and support received.
Stakeholder ROE check-in (Level 4): 3–5 questions asking sponsors to evaluate progress against the success criteria defined at kickoff.

Criticisms and Limitations of the Kirkpatrick Model

The Kirkpatrick Model is the industry standard, but it has real limitations worth knowing about (especially if you're going to defend your evaluation choices to a skeptical stakeholder).

The causal-link assumption is weak. The model implies that positive Level 1 reactions lead to Level 2 learning, which leads to Level 3 behavior, which leads to Level 4 results. Decades of research (including critiques from Will Thalheimer and academic work indexed by ERIC) show that the relationships between levels are far weaker than the model suggests. Strong Level 1 scores tell you almost nothing about whether Level 3 transfer will happen.
Most organizations stop at Levels 1 and 2. Because Levels 3 and 4 are harder to measure, they get skipped. The data that matters most is the data least often collected.
The model doesn't isolate training from the environment. If Level 3 behavior doesn't change, was the training bad or did the work environment fail to support it? The original model doesn't answer this clearly. (The New World model's "required drivers" concept addresses this directly.)
Attribution at Level 4 is genuinely hard. Business results are influenced by economic conditions, product changes, leadership, market timing, and dozens of other variables. Claiming training caused a specific outcome is rarely defensible. ROE and Contributive ROI are more honest framings.
The model is descriptive, not prescriptive. It tells you what to measure but not how to design effective training. It needs to be paired with instructional design models and learning science.

None of this invalidates the framework. It's still the most useful starting point we have. But pretending the model is airtight does the profession no favors.

Alternatives to the Kirkpatrick Model

Several other evaluation frameworks have been developed to address Kirkpatrick's limitations or take different angles on the same problem.

Model	What It Adds	Best For
Phillips ROI Methodology	Adds a fifth level (financial ROI) and provides a specific methodology for isolating training's impact and converting results to monetary value	Organizations that need a defensible dollar-figure ROI
CIRO Model (Warr, Bird, Rackham)	Four stages: Context, Input, Reaction, Output. Adds front-end analysis (context and input) that Kirkpatrick assumes you've already done	Programs that need formal needs analysis built into the evaluation framework
Anderson's Value of Learning Model	Three-stage model emphasizing strategic alignment of learning with business priorities before measurement	Senior L&D leaders aligning a training portfolio with strategy
Brinkerhoff's Success Case Method	Identifies the most and least successful cases and studies them in depth, rather than averaging across all participants	Quickly understanding what makes training transfer (or fail to transfer) in real conditions

In practice, most organizations end up using Kirkpatrick as the spine and borrowing from Phillips (for ROI), CIRO (for front-end context), or Brinkerhoff (for case-based insight) when the situation calls for it.

Frequently Asked Questions

What are the four levels of the Kirkpatrick Model?

Level 1 (Reaction) measures how learners respond to the training. Level 2 (Learning) measures what they know and can do. Level 3 (Behavior) measures whether they apply it on the job. Level 4 (Results) measures the impact on business outcomes.

Who developed the Kirkpatrick Model and when?

Donald Kirkpatrick developed the model as part of his 1954 PhD dissertation at the University of Wisconsin and published it through a series of articles in 1959. His son Jim Kirkpatrick and daughter-in-law Wendy Kirkpatrick updated it into the New World Kirkpatrick Model in the 2010s.

When should each level be measured?

Level 1: during and immediately after training. Level 2: during and at the end of training (with a pre-training baseline where possible). Level 3: starting within a few weeks of training and continuing through the 3–6 month window. Level 4: 3–12 months post-training, with ongoing tracking.

What's the difference between the original and New World Kirkpatrick Model?

The New World model adds required drivers (the post-training reinforcement that enables behavior change), confidence and commitment as Level 2 indicators, an emphasis on planning from Level 4 backward, and Return on Expectations (ROE) as a more practical alternative to financial ROI.

Is the Kirkpatrick Model still relevant?

Yes, it remains the most widely used training evaluation framework in corporate L&D and instructional design. The New World version addresses most of the modern critiques of the original, and the four-level vocabulary is universal among practitioners.

What are the main criticisms of the Kirkpatrick Model?

The main criticisms are that the causal links between levels are weaker than the model implies, that organizations rarely measure Levels 3 and 4 in practice, that the model doesn't account for environmental and transfer factors, and that Level 4 attribution is genuinely difficult to defend.

How does Kirkpatrick compare to the Phillips ROI Model?

Phillips adds a fifth level (financial ROI) and a specific methodology for isolating training's monetary impact. Kirkpatrick (especially the New World version) prefers Return on Expectations and Contributive ROI, which acknowledge that training contributes to outcomes alongside other factors rather than claiming sole financial credit.

Putting the Kirkpatrick Model into Practice

The Kirkpatrick Model isn't a checklist. It's a discipline. Used well, it forces you to answer two hard questions before you build a single slide: what does the business actually need? and what will people need to do differently for that to happen? Everything else follows from there.

If you only do one thing with this framework, plan from Level 4 backward. That single habit separates evaluation that informs decisions from evaluation that fills out a checkbox.
