Performance, not theatre: Three principles for better government performance measurement
Christian Schuster, Professor of Management and Public Policy at the Blavatnik School of Government, has worked with governments around the world on measuring management and performance in public sector organisations. He argues that, thanks to the digitisation of records and AI, governments have never been in a better position to measure performance, though measurability does not protect governments from repeating performance measurement mistakes of the past.
Every government wants to show its citizens that it produces value for them. Every government thus faces a measurement challenge: how to define and show the success of government organisations. In the private sector, profit and share prices provide ready approximations for success. Yet, public organisations pursue multi-dimensional social goals, not profit – from education to clean streets to nuclear safety. Ready measures of success are not available.
Performance measurement is thus riddled with tension: governments want to measure to evaluate progress and show success, yet face inherent challenges and complexity when doing so.
What happens when governments prioritise performance measurement without recognition of these inherent challenges and complexity? The history of performance measurement in the UK provides rich lessons.
Take, for example, the country’s ambulance system. Starting in 2002, regional ambulance trusts were given “stars”, based in part on how quickly their ambulances reached patients. Trusts that failed were named and shamed, while trusts that performed well were publicly recognised and granted greater autonomy. Response times were a politically convenient, easily communicated measure of ambulance ‘success’. And, indeed, after the introduction of the system, the share of patients reached within response time limits increased.
But did this mean that ambulances produced more value for citizens? It did not.
Ambulance trusts started gaming and cheating the system. They altered response times manually and started placing ambulances closer to patients who could be reached within the response time target, to the detriment of others who would now have to wait much longer. Measured performance was thus, in practice, a theatre performance, in which ambulances looked like they were performing when they were not.
Moreover, the system reduced the complex mission of ambulances – including providing care and saving lives – to a single indicator. Ambulances were now treating the clock, not the patient. The lesson is clear: when metrics become proxies for purpose, they can end up distorting the very goals they are meant to serve.
Three principles for performance measurement
The ambulance case highlights a common failure in public management: over-reliance on what is easy to measure, rather than what is important to measure. Response times are visible, comparable, and politically salient. Patient outcomes, quality of care, and equity are harder to capture. Such multi-dimensional missions are the norm in public services.
So, firstly, performance systems must capture mission attainment holistically. No single metric can reflect a multi-dimensional public mission. While public managers are inevitably constrained by what can be quantified, relying exclusively on quantitative indicators risks sidelining what matters. The solution is not to abandon measurement, but to broaden it: expand what is measured quantitatively, and combine quantitative indicators with qualitative indicators.
The ambulance case also highlights the importance of considering human behaviour. Individuals and organisations respond to being measured.
So, secondly, measures should be designed with gaming and cheating in mind. When targets are tied to rewards or sanctions, organisations will respond to incentives, sometimes in unintended ways. Organisations may game through, for instance, “cream skimming” (prioritising easier cases), threshold effects (focusing effort on cases just below a target cutoff), or “storming” (concentrating resources when measurement is most visible). Or they might cheat outright by falsifying the numbers.
The task for leaders is to anticipate and limit these behaviours. This requires iterative design: anticipating potential gaming before a measure is introduced, monitoring for signs of it after implementation, and being willing to revise measures in response, even when headline numbers look good. It also requires designing systems that preclude cheating, for example by avoiding situations where those being evaluated are the sole source of performance data. Independent verification, whether through audits, third-party data, or user-reported outcomes, can act as a powerful deterrent.
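One simple form such monitoring could take is a check for "bunching": an unusual pile-up of recorded values just inside a target. The sketch below is illustrative only. The function name, the 8-minute target, the half-minute window, and the recorded times are all assumptions made for the example, not figures from the ambulance case.

```python
def bunching_counts(times, target, window):
    """Count recorded times just below and just above a target.

    A large excess just below the target is a flag for closer
    inspection, not proof of manipulation.
    """
    below = sum(1 for t in times if target - window <= t <= target)
    above = sum(1 for t in times if target < t <= target + window)
    return below, above


# Hypothetical recorded response times in minutes (not real data).
recorded = [7.9, 7.8, 7.95, 8.0, 7.7, 8.1, 6.0, 9.5, 7.9]

# Assumed target of 8 minutes and a window of half a minute either side.
below, above = bunching_counts(recorded, target=8.0, window=0.5)
print(f"just below target: {below}, just above: {above}")
```

With these made-up numbers, six records fall just inside the target and only one just outside. A gap of that kind would be a reason to compare recorded times against an independent source, such as call logs or vehicle location data, rather than a conclusion in itself.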
Finally, performance systems should be designed to reinforce, not undermine, motivation. Many public servants are driven by a sense of purpose, professional pride, and commitment to those they serve. Targets and incentives can inadvertently crowd out these motivations by signalling that success is defined solely by hitting narrow numerical thresholds. To counter this, leaders should align metrics with missions and consider using performance information as a tool for learning rather than reward and punishment. Done well, performance management can strengthen intrinsic motivation by making visible how individual effort contributes to meaningful public outcomes.
The time for good performance measurement is now
Governments have grappled with performance measurement for as long as they have existed. In Ancient Egypt, Pharaohs rewarded officials for demonstrable outputs, including the accumulation of agricultural surplus.
Today, however, the context for performance measurement in government is fundamentally different from what it was even a decade ago. A plethora of government records are digitised, from procurement transactions and tax filings to land registries, health records, and unemployment claims. At the same time, advances in AI and machine learning have dramatically lowered the cost of cleaning, linking, and processing these datasets for analysis. What once required teams of statisticians working for months can now be done rapidly and at scale.
For instance, analysts can identify which local offices process unemployment claims more quickly, generate fewer appeals, and receive fewer citizen complaints – while controlling for differences in caseloads or regional labour market conditions. Similar techniques can be used to compare hospitals on patient outcomes and readmission rates, schools on student progress, or municipalities on the cost and speed of infrastructure delivery.
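To make "controlling for differences in caseloads" concrete, here is a minimal sketch of one way such a comparison could work: fit a straight line relating caseload to processing time across offices, then rank each office by how far it sits above or below what its caseload would predict. The office names, figures, and the function name are hypothetical, and a real analysis would use more offices, more control variables, and uncertainty estimates.

```python
from statistics import mean


def caseload_adjusted_gaps(offices):
    """Rank offices by actual minus expected processing time.

    Expected time comes from a one-variable least-squares line of
    average processing days on caseload. Negative gaps mean an office
    is faster than its caseload alone would predict.
    """
    xs = [o["caseload"] for o in offices]
    ys = [o["avg_days"] for o in offices]
    x_bar, y_bar = mean(xs), mean(ys)

    # Ordinary least-squares slope and intercept for a single predictor.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    gaps = [
        (o["name"], o["avg_days"] - (intercept + slope * o["caseload"]))
        for o in offices
    ]
    return sorted(gaps, key=lambda g: g[1])


# Hypothetical offices: monthly caseload and average days to process a claim.
offices = [
    {"name": "A", "caseload": 100, "avg_days": 12},
    {"name": "B", "caseload": 200, "avg_days": 14},
    {"name": "C", "caseload": 300, "avg_days": 26},
    {"name": "D", "caseload": 400, "avg_days": 28},
]

ranking = caseload_adjusted_gaps(offices)
for name, gap in ranking:
    print(f"office {name}: {gap:+.1f} days relative to expectation")
```

In this invented data, office A has the fastest raw processing time but is slightly slower than its light caseload would predict, while office D has the slowest raw time but does better than expected given the volume it handles. That reversal is the point of adjusting before comparing.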
More measurability, however, only translates into better performance measurement if governments heed lessons from the past: measure mission attainment holistically, anticipate how humans will react to measurement, and think of performance measures as learning tools first. Then performance measures might, in fact, enhance performance rather than theatre in government.
Christian’s latest book, ‘The Government Analytics Handbook’, published with the World Bank, explores how to use data to measure and improve public administrations. It has been downloaded over 50,000 times.