22 April 2026 · TAyumira Editorial

Formative Assessment and Feedback: The Evidence Teachers Need to Know

Formative assessment evidence: Black and Wiliam's legacy, d ≈ 0.20 meta-analysis, d ≈ 0.48 feedback, EFA trial +2 months — and how to apply it in the classroom.

Formative assessment sits in an awkward place in the evidence base. For years it was sold with enormous headline effect sizes that did not hold up to careful meta-analysis. At the same time, the underlying idea — use evidence of learning, moment by moment, to decide what to teach next — is so plainly sensible that it has survived every wave of pedagogical fashion intact. This guide pulls the formative assessment evidence back to what the peer-reviewed literature actually supports in 2026, explains what the large-scale Embedding Formative Assessment trial in English secondary schools really found, and gives you the classroom routines that turn the evidence into practice.

What formative assessment actually is

Formative assessment is the planned and in-the-moment use of evidence about learning to adapt teaching and help students improve. The terminology originates in Michael Scriven's 1967 distinction between formative and summative evaluation, was carried into classroom assessment theory by Benjamin Bloom, and was popularised in schools through the work of Paul Black, Dylan Wiliam, and colleagues.

It is not a particular quiz format or a branded programme. It is a cycle: clarify success criteria, elicit evidence of current understanding, interpret errors diagnostically, and provide feedback that moves the learner from current performance toward the target. The cycle happens over a year, over a unit, over a lesson, and over a single minute of questioning.

What the research actually shows

The research base is strong in aggregate but far more modest than the early headline numbers suggested.

A cautious meta-analysis of K–12 formative assessment studies found a weighted mean effect of d ≈ 0.20 and a median of 0.25. That is much smaller than the d ≈ 0.70–0.80 figures that circulated in older policy documents. The smaller number reflects tighter study inclusion rules, more conservative handling of outliers, and a more honest treatment of implementation variation.

A very large feedback meta-analysis found an overall d ≈ 0.48, with substantial heterogeneity. That is a genuinely moderate-to-large effect. The most important finding is not the headline size but the moderators. Task-level, process-level, and self-regulation-level feedback outperformed praise and person-focused comments ("good job", "you are so smart"). Praise and grade-only feedback were either neutral or negative. Feedback that forces the learner into action — rework this, try this strategy, check this type of error — outperforms feedback that simply informs.

In the large English secondary Embedding Formative Assessment trial, teachers in treatment schools attended a two-year monthly teacher-learning-community programme. The trial found roughly +2 months of additional progress on Attainment 8, though it did not find corresponding gains in English or maths GCSE considered alone. That result is consistent with the smaller but real d ≈ 0.20 meta-analytic effect: formative assessment works, it works at scale, and the size of the effect is sensitive to implementation quality and duration.

The aggregate verdict is: strong in principle, smaller than the headline numbers, and implementation-sensitive. Done well, it reliably improves responsiveness and efficiency of teaching. Done as a superficial exercise — ticks next to boxes, generic "WWW/EBI" comments, data walls without reteaching — it disappears.

The cycle in practice

A working formative assessment routine has four moves.

  • Clarify success criteria. What does "good" look like here? Co-construct, show exemplars and non-exemplars, or display the criteria at the start.
  • Elicit evidence. Hinge questions, mini-whiteboards, retrieval starters, live circulating, exit tickets. The evidence has to be generated in a form you can actually see.
  • Interpret diagnostically. Categorise errors by likely cause, not by surface feature. Two students both got the fraction addition wrong — one because they added numerators and denominators separately, one because they found a common denominator but failed to adjust the numerators. That is two different lessons, not one.
  • Feed forward. Give concise task-, process-, or self-regulation-level feedback. Build in classroom time for students to act on it. Feedback the student never uses is feedback wasted.
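The diagnostic-interpretation step above amounts to a mapping from observed errors to likely causes. A minimal sketch in Python, using the fraction-addition example — the answer options and misconception labels are illustrative, not taken from any published instrument:

```python
# Illustrative sketch: categorising errors on 1/2 + 1/3 by likely cause.
# The wrong-answer options and misconception labels are hypothetical.
MISCONCEPTIONS = {
    "2/5": "added numerators and denominators separately",
    "2/6": "used a common denominator but did not adjust the numerators",
    "5/6": None,  # correct: (3 + 2) / 6
}

def diagnose(answer: str) -> str:
    """Return the likely cause of an error, or flag a correct answer."""
    if answer not in MISCONCEPTIONS:
        return "unrecognised error: inspect the working by hand"
    cause = MISCONCEPTIONS[answer]
    return "correct" if cause is None else cause

print(diagnose("2/5"))  # added numerators and denominators separately
print(diagnose("5/6"))  # correct
```

The design point is that each plausible wrong answer is written down in advance with its diagnosis, so marking produces a reteaching plan rather than just a score.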

What counts as good feedback

The feedback meta-analysis evidence is unusually clean on this point. Useful feedback is specific, actionable, focused on the task or the strategy, and given in time to matter. "Focus your introduction on the specific question asked" beats "good effort." "Check whether you multiplied or added in step three" beats "close."

Feedback that is praise-only or person-focused tends to depress performance, because it tells the student about themselves rather than about the work. The teacher who writes "brilliant!" on a mediocre paragraph has just trained the student to accept mediocre work.

A short rule of thumb: the number of minutes the class spends acting on feedback should exceed the number of minutes the teacher spent writing it. If it does not, the ratio is wrong.

Classroom examples across phases

Primary. Year 5 persuasive writing. Success criteria are co-constructed from two exemplars at the start of the unit. Students draft a paragraph; the teacher holds five-minute live conferences while the class writes. Every student then redrafts at least one sentence. Exit ticket: a rewritten topic sentence showing the targeted improvement.

Secondary. Year 10 algebra. The teacher uses a pre-planned hinge question with four multiple-choice answers after the initial explanation of factorising quadratics. If fewer than 75% of the class choose the correct option on mini-whiteboards, the teacher reteaches immediately. If 75% or more, the class moves to guided practice. The hinge is the decision point.
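The 75% hinge rule is a simple threshold decision. A hedged sketch — the function name and the default threshold are my own illustration, not a TAyumira feature:

```python
def hinge_decision(correct: int, total: int, threshold: float = 0.75) -> str:
    """Decide the next teaching move from mini-whiteboard responses."""
    if total == 0:
        raise ValueError("no responses sampled")
    proportion = correct / total
    # Below the threshold, reteach immediately; at or above it, move on.
    return "reteach" if proportion < threshold else "guided practice"

print(hinge_decision(20, 30))  # 0.67 -> reteach
print(hinge_decision(24, 30))  # 0.80 -> guided practice
```

The exact threshold matters less than committing to one in advance, so the hinge is a genuine decision point rather than a formality.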

Tertiary. Second-year seminar essays. Students receive an annotated exemplar and a rubric. Rubric-based peer review happens in-seminar before submission. Each student submits a 100-word reflection on the revision choices they made and why. The final piece is marked against the rubric, with feedback explicitly tied to one of three improvement categories.

Where schools go wrong

The common failure modes of formative assessment are remarkably consistent.

  • Generic comments. "Good effort. More detail next time." That is not feedback; that is a receipt that the teacher read the work.
  • Overmarking. Writing on every line of every piece of student work is unsustainable and, past a certain density, actually reduces the probability that the student acts on any of it.
  • Data walls without reteaching. A wall showing which students got which questions right is just a very expensive poster if it does not change what happens in the next lesson.
  • Misconception-blind marking. Giving a tick or cross without categorising the error type means the data is unusable for planning.
  • No response time. Students need timetabled time to act on feedback. If they don't use the feedback, the feedback didn't exist for them.

Mitigate these failure modes by focusing on one or two high-value next steps per piece of work, building response time into the lesson, and training teachers in diagnostic question design rather than in better comment-writing.

Best fit and limits

Formative assessment fits every age and subject, especially where misconceptions are common and tasks generate inspectable evidence. It is cheap from a materials point of view and expensive in teacher thinking. It requires assessment literacy, strong questioning, and routines for acting on evidence — not just for gathering it.

The limits sit at the edges. In a tightly summative, speeded-test environment, aggressive formative routines can compete with final test preparation if the two aren't aligned. In classrooms with weak behaviour routines, feedback and redraft cycles collapse because the time and attention aren't there.

Teacher requirements, assessment, and resources

Schools need assessment literacy across the staff. The most useful professional development targets are: designing hinge questions with explicit correct and incorrect options tied to likely misconceptions; listening to student answers diagnostically rather than ticking them for "yes / no"; writing short feed-forward comments that force action; building in classroom time for students to use feedback.

Evaluate the routine through pre/post task improvement, the rate at which students act on feedback, and pattern analysis of misconceptions across the cohort. Do not evaluate it through marking frequency or comment length; those metrics incentivise the wrong behaviour.
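Pattern analysis of misconceptions across a cohort can be as simple as tallying diagnosed error causes and letting the most frequent one drive the next lesson. A sketch using only the Python standard library — the data is invented for illustration:

```python
from collections import Counter

# Hypothetical diagnosed error causes from one class's exit tickets.
diagnosed_errors = [
    "added denominators", "added denominators", "no common denominator",
    "added denominators", "sign error", "no common denominator",
]

# Tally the causes so the most frequent misconception is retaught first.
pattern = Counter(diagnosed_errors)
most_common_cause, count = pattern.most_common(1)[0]
print(f"Reteach first: {most_common_cause} ({count} students)")
# Reteach first: added denominators (3 students)
```

Note that this only works if marking records the cause of each error, not just right or wrong — which is exactly why misconception-blind marking makes the data unusable.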

How to use TAyumira for formative assessment routines

TAyumira builds formative assessment into every lesson it generates. For any lesson the planner produces, you get:

  • A set of hinge questions with pre-written expected correct and incorrect responses
  • An aligned exit ticket tied to the stated success criteria
  • A live presenter with class-response polling, so you can sample every student in seconds
  • Automatic identification of misconception patterns across the class
  • Feed-forward comments generated from the student's actual work, not from a template

Start for free — the Free tier covers the full lesson generation workflow.

FAQ

What is the effect size of formative assessment?

A cautious meta-analysis of K–12 formative assessment studies reports a weighted mean effect of roughly d ≈ 0.20 with a median of 0.25 — much smaller than the headline numbers common in older policy documents. Feedback research reports a larger overall effect of d ≈ 0.48 with heavy moderation by feedback type, and the Embedding Formative Assessment trial found approximately +2 months of additional progress on Attainment 8 in English secondary schools.

What makes feedback effective?

Feedback is most effective when it is specific, actionable, and focused on the task, the process, or the learner's self-regulation rather than on the person. Praise-only and grade-only feedback tends to be neutral or negative. The strongest feedback forces the student into action: rework this, try this strategy, check for this category of error.

How often should I use hinge questions?

Every three to ten minutes of direct teaching is a reasonable rule of thumb. The point is to sample class understanding before moving on. If fewer than around 75% of the class get the hinge right, reteach. If around 75% or more do, move on.

Why did my marking not improve results?

Usually because the students never acted on the feedback. Research on feedback is consistent: feedback that the student does not use, or cannot use because the task is already over, does not change learning. Build in classroom time for action on feedback. If you cannot, the feedback investment is not paying off.

What is the difference between formative and summative assessment?

Formative assessment is used to adapt teaching and help students improve — it happens during learning. Summative assessment measures what was learned at the end of a period — it happens at the end of learning. The same quiz can be used formatively or summatively depending on what happens next with the results.


Try one formative routine this week

Pick one hinge question for one lesson. Write the correct option and three wrong options that each map to a specific misconception. Use mini-whiteboards to sample the whole class. Reteach or move on based on the result. If you want lesson plans that come with hinge questions pre-written, create a free TAyumira account.

Want lessons like this, generated for you?

The Free tier covers the full TAyumira workflow — pick a teaching method, enter your topic, and get a complete lesson in minutes.

Start free