Interleaving: The Evidence on Mixing Practice That Feels Harder but Works Better
Interleaving evidence: Firth, Rivers and Boyle (2021) systematic review, Rohrer's mathematics studies, and the desirable-difficulty mechanism that produces real gains.

Interleaving means mixing related problem types during practice instead of blocking them by topic. The counterintuitive finding is that mixing produces better long-term learning than blocking, even though students consistently report feeling less confident while they're doing it. The method sits in a family of "desirable difficulty" techniques that make practice feel harder in the short run and produce more durable learning in the long run. This evidence review sets out what interleaving actually is, what the 2021 systematic review by Firth, Rivers, and Boyle and adjacent classroom research show, and how to run it without destabilising students.
What interleaving actually is
Interleaving is the deliberate mixing of related but distinct problem types, cases, or categories during practice. If students are learning to identify four kinds of geometric proof, an interleaved practice set mixes proof types within the session. A blocked practice set does all of type A, then all of type B, then all of type C, then all of type D.
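As a rough illustration of the two formats, the sketch below builds a blocked ordering and an interleaved ordering from the same pool of problems. The function names, the type labels A to D, and the shuffled round-robin rule are assumptions made for this example; none of them come from the studies discussed in this review.

```python
import random

def blocked_set(problems_by_type):
    """Blocked order: every problem of one type before moving to the next."""
    return [(t, p) for t, problems in problems_by_type.items() for p in problems]

def interleaved_set(problems_by_type, seed=0):
    """Interleaved order: each round visits every remaining type once, in shuffled order."""
    rng = random.Random(seed)
    queues = {t: list(problems) for t, problems in problems_by_type.items()}
    ordered, last_type = [], None
    while any(queues.values()):
        round_types = [t for t, q in queues.items() if q]
        rng.shuffle(round_types)
        # Avoid opening a round with the type that closed the previous one.
        if len(round_types) > 1 and round_types[0] == last_type:
            round_types.append(round_types.pop(0))
        for t in round_types:
            ordered.append((t, queues[t].pop(0)))
            last_type = t
    return ordered

# Hypothetical pool: four proof types, three problems each.
pool = {t: [f"{t}{i}" for i in range(1, 4)] for t in "ABCD"}
blocked = blocked_set(pool)
mixed = interleaved_set(pool)
```

When every type has the same number of problems, no two consecutive items in `mixed` share a type, so the student has to decide which strategy fits on every item. If the counts are unequal, the last few items may all come from the largest type.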
The contrast matters because the two formats feel profoundly different to students. Blocked practice feels smooth. The strategy for each problem is the same as the last problem, so students apply it quickly. Interleaved practice feels harder. Each problem requires the student to first identify which strategy fits — the discrimination move — before solving.
The discrimination move is exactly where the learning sits. Students who have only practised in blocked sets often cannot tell problem types apart when those types appear mixed on a test. Students who have practised in interleaved sets are better prepared to do so.
What the research actually shows
The evidence base has firmed up over the last decade.
Firth, Rivers, and Boyle (2021) published a systematic review of interleaving as a concept-learning strategy in Review of Education. Their conclusion: interleaving produces consistent benefits for discrimination learning — the ability to tell related categories or problem types apart — across a range of subjects and ages, with the strongest effects on delayed tests rather than immediate tests.
Rohrer and colleagues' earlier classroom studies in mathematics — widely cited as the canonical interleaving evidence — reported large effects on delayed tests of problem classification and solution, with doubling or tripling of test performance in some studies when compared with blocked practice.
Kim and Webb (2022) meta-analysed spaced practice in second-language learning and found consistent positive effects that overlap conceptually with interleaving. The two techniques are cousins: spacing is about temporal distribution; interleaving is about categorical mixing. Both exploit the desirable-difficulty mechanism.
The subjective-difficulty paradox is the part that matters for classroom practice. Students practising interleaved sets report lower confidence and predict worse performance than students practising blocked sets — and then outperform them on delayed tests. Teachers who don't know this pattern typically revert to blocked practice at the first sign of student discomfort. The discomfort is the mechanism, not a signal to change course.
The three conditions for interleaving
Interleaving works when three conditions are met.
- The content is genuinely discriminable. If problem types are not actually different, mixing them does nothing. The method pays off when students need to decide which strategy or rule applies.
- Students have some baseline exposure to each type first. Pure interleaving from lesson one with no blocked practice overwhelms novices. A short blocked introduction to each type, followed by interleaved practice, is the defensible sequence.
- Practice is spaced across time as well as mixed within sessions. Interleaving within a single session produces some benefit. Interleaving across spaced sessions produces the larger effect.
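The second and third conditions describe a sequence that can be sketched as a simple schedule: one short blocked session per type, then mixed sessions that return to all types at widening intervals. This is a minimal sketch under stated assumptions. The function name `plan_unit` and the review gaps of 2, 7, and 14 days are placeholders chosen for illustration, not intervals recommended by the research cited here.

```python
from datetime import date, timedelta

def plan_unit(problem_types, start, review_gaps_days=(2, 7, 14)):
    """Return a list of (date, format, types) practice sessions."""
    sessions = []
    day = start
    # Baseline exposure: one short blocked session per type, on consecutive days.
    for t in problem_types:
        sessions.append((day, "blocked", [t]))
        day += timedelta(days=1)
    last_intro = day - timedelta(days=1)
    # Spaced return: each later session mixes all types.
    # The gaps are hypothetical placeholders, counted from the last blocked session.
    for gap in review_gaps_days:
        sessions.append(
            (last_intro + timedelta(days=gap), "interleaved", list(problem_types))
        )
    return sessions

plan = plan_unit(["like", "unlike", "subtract"], date(2024, 1, 8))
for when, fmt, types in plan:
    print(when.isoformat(), fmt, ", ".join(types))
```

The first condition, that the types are genuinely discriminable, is a judgement about the content and cannot be checked by a schedule.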
Classroom examples across phases
Primary. Year 5 mathematics on fractions. Monday's lesson covers adding fractions with like denominators, Tuesday's covers adding fractions with unlike denominators, and Wednesday's covers subtracting fractions. Rather than keeping each day's practice within its own block, the practice set for Wednesday mixes all three. Students must first decide which operation and denominator condition each problem fits, then solve.
Secondary. Year 10 science on energy transfers. Over three weeks, the topic covers mechanical, thermal, electrical, and radiative transfer. A Friday retrieval set across weeks two and three interleaves problems across all four transfer types, so students have to classify first and apply second.
Tertiary. First-year medical school on acid-base chemistry. A problem set mixes respiratory acidosis, respiratory alkalosis, metabolic acidosis, and metabolic alkalosis cases rather than blocking by disturbance type. Students must classify each blood-gas panel before applying the relevant compensation rules.
Where interleaving fails
The failure modes are specific.
- Pure interleaving with novices from lesson one. Students need baseline exposure to each type individually before mixing produces the discrimination benefit.
- Mixing content that is not genuinely distinguishable. If the problem types are variations of the same rule with no real discrimination required, interleaving adds cognitive load without learning benefit.
- Reverting at the first sign of student discomfort. Students will report that interleaved practice feels harder and predict they are doing worse. The feeling is the mechanism; the prediction is wrong. Teachers who back off lose the effect.
- Interleaving without spaced return. A single interleaved session produces some benefit but not the large durable effect. Return to the same content across weeks with interleaved sets each time.
- No retrieval in the interleaved set. Interleaving works best paired with retrieval practice — students recall and apply from memory, not from a formula list. Open-book interleaving is much weaker than closed-book interleaving.
Best fit and poor fit
Best fit: mathematics problem classification and solution, science concept classification, grammar and language contrasts, case-based clinical reasoning, and anything else where deciding which rule applies is a core skill.
Poor fit: skills where students need to build automaticity on a single procedure before discrimination is useful; early stages of teaching a wholly new concept.
Evidence caveat: immediate-test performance often shows little benefit and sometimes a small decrement. The case for interleaving is about delayed-test performance and transfer. Short-term assessments can mislead.
Teacher requirements, assessment, and resources
Interleaving is resource-light and design-sensitive. The cost is rebuilding practice sets so that they mix across types — which is a one-off planning investment for each unit.
Assess with delayed tests, not immediate ones. A week-later or unit-end test that includes mixed problem types is the right measurement instrument. Weekly immediate tests that stay within the blocked-practice format artificially favour blocked practice and hide the interleaving benefit.
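One way to read a delayed, mixed test is to score "chose the right strategy" separately from "got the right answer", since discrimination is the skill interleaving targets. The sketch below assumes each response is recorded with three hypothetical fields (`true_type`, `chosen_type`, `correct`); the field names and the sample records are invented for illustration and are not data from any study.

```python
from collections import defaultdict

def score_delayed_test(responses):
    """Per problem type, report classification rate and solution rate."""
    totals = defaultdict(lambda: {"n": 0, "classified": 0, "solved": 0})
    for r in responses:
        row = totals[r["true_type"]]
        row["n"] += 1
        # Classification: did the student pick the strategy that matches the type?
        row["classified"] += r["chosen_type"] == r["true_type"]
        # Solution: did the student reach the correct answer?
        row["solved"] += bool(r["correct"])
    return {
        t: {
            "classification_rate": row["classified"] / row["n"],
            "solution_rate": row["solved"] / row["n"],
        }
        for t, row in totals.items()
    }

# Invented placeholder records, only to show the input shape.
sample = [
    {"true_type": "A", "chosen_type": "A", "correct": True},
    {"true_type": "A", "chosen_type": "B", "correct": False},
    {"true_type": "B", "chosen_type": "B", "correct": True},
]
report = score_delayed_test(sample)
```

A type with a low classification rate but a high solution rate on correctly classified items points to a discrimination gap rather than a procedural one.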
How TAyumira supports interleaving
TAyumira supports interleaving in lesson plans that involve discriminable content types. When you pick it or enable it within a lesson, the generator produces:
- Interleaved practice sets that mix related problem types within and across sessions
- A suggested sequencing: initial blocked exposure, then interleaved practice with spaced return
- A cumulative retrieval quiz for the unit that mixes types
- Teacher notes explaining the subjective-difficulty paradox so you know what student reaction to expect
- A delayed-test template that measures discrimination rather than only procedure
Start for free — the Free tier covers the full workflow.
FAQ
What is the effect size of interleaving?
Firth, Rivers, and Boyle (2021), writing in Review of Education, reported a consistent positive effect of interleaving on concept learning and discrimination, with the strongest effects on delayed tests. Earlier classroom studies by Rohrer in mathematics reported large effects, in some cases a doubling or tripling of delayed-test performance compared with blocked practice.
What is the difference between interleaving and spaced practice?
Spaced practice is the temporal distribution of practice across sessions over time. Interleaving is the categorical mixing of related content within a practice session. The two are complementary: spacing combats forgetting, interleaving builds discrimination. The strongest effects occur when practice is both spaced and interleaved.
Why does interleaving feel harder for students?
Because it is harder. Interleaved practice requires students to first decide which strategy applies before solving — the discrimination move that blocked practice skips. Students experience lower confidence and predict worse performance, then perform better on delayed tests. The mismatch between felt difficulty and actual learning is why interleaving is classed as a "desirable difficulty."
Should new topics be introduced with interleaving?
No. Students need baseline exposure to each type individually first. The defensible sequence is: brief blocked introduction of each type, then interleaved practice that mixes them. Pure interleaving with true novices tends to overwhelm working memory before the discrimination benefit can appear.
How do I assess interleaving's effect in my classroom?
Use delayed tests a week or more after practice, and ensure the test mixes problem types. Immediate tests, or tests that stay within the blocked-practice format, artificially favour blocked practice. The true interleaving benefit shows up on delayed, mixed assessments.
Related evidence reviews
- Retrieval Practice and Spaced Practice Evidence
- Explicit Instruction Evidence
- Metacognition and Self-Regulated Learning Evidence
- Dual Coding and Multiple Representations Evidence
Sources
- Firth, J., Rivers, I., & Boyle, J. (2021). A systematic review of interleaving as a concept learning strategy. Review of Education.
- Rohrer, D. (2012). Interleaving helps students distinguish among similar concepts. Educational Psychology Review.
- Kim, S. K., & Webb, S. (2022). The effects of spaced practice on second language learning: A meta-analysis. Language Learning, 72(1), 269–319.
- Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In Psychology and the Real World. (Foundational desirable-difficulty reference.)
Try one interleaved practice set this week
Pick a lesson where students have already been introduced to three or four related problem types. Rebuild the practice set so it mixes all of them. Warn students that it will feel harder. Measure with a delayed test a week later. If you want interleaved practice sets and cumulative retrieval quizzes generated for you, create a free TAyumira account.

