As frontier AI capabilities progress, approaches to placing safeguards around them are proliferating on a global scale. These include:
- progress in negotiations of the EU AI Act
- the UK Government’s AI regulation white paper
- the AI Risk Management Framework published by the National Institute of Standards and Technology (NIST) in the US
- China’s proposals for wide-ranging regulation of generative AI.
Alongside and in the context of these regulatory initiatives, model evaluations (or model audits) have become a popular idea for ensuring the safety of frontier AI models. For instance, the UK Government has identified international research collaboration on ‘evaluating model capabilities’ as a priority objective for the Global Summit on AI safety this November.
In essence, evaluations are designed to establish whether models have dangerous capabilities or the potential to cause harm. It is often advised that they be carried out independently by third parties to ensure an unbiased appraisal of risks.
While proposals for evaluations may be a welcome step forward, at present they are merely ideas and therefore need scrutiny to be developed into effective interventions. The good news is that there are lessons to learn from the history of policy and regulations.
This post summarises those lessons by discussing three common modes of policy failure and ways to avoid them through, among other things, consultation with the public and tech workers, and sufficient power to act on results.
Implementing AI evaluation systems appropriate to our understanding of the (rapidly evolving) problem space
Emerging technology policy is intrinsically challenging. Technology is capable of evolving much faster than the human systems for governing it. New institutions and methods for regulation need to be designed and established to fill this void, with pressure to ‘do something’ meaning that a highly imperfect (and likely soon out of date) response might be better than nothing. Additionally, government agencies are under unique pressure due to their own ways of operating, like the strict requirement to spend budget within the financial year or lose it.
All these challenges are exacerbated by the current slow pace of progress on understanding how AI works so that it can be more effectively steered. There has been a lack of investment to date by AI labs in what is sometimes called AI safety or alignment research – the process of getting AI to do what we want and having high confidence that it will do so across different settings and over time. This requires further scientific study of how AI technologies work, such as interpretability research – the study into how to understand the decision processes of an AI system – so that risks and harms can be identified and prevented.
In summary, everyone – not just policymakers – has a highly imperfect understanding of what frontier AI can do, its intrinsic risks, and therefore how it ought to be regulated.
All of this is an argument for two seemingly contradictory actions: more reflection, alongside more rapidly iterating regulatory responses. Acting too hastily, such as quickly setting standards for evaluations, runs a high risk of wasted effort to embed an approach that soon loses utility, as well as stifling innovative methodologies for evaluation. Yet we need agile systems to render AI safe sooner rather than later.
Notably, regulatory methods combining intervention and reflection have been shown to work in safety-critical domains such as health, where much of the legal framework can be iterated via guidance rather than requiring legislative change. In the context of AI regulation, the UK Frontier AI Taskforce, with its focus on technical safety work, would seem to be a needed step in the same direction.
An example of the complexity in getting this right can be seen in the EU AI Act. The Act was initially designed for narrow AI technologies, which are focused on specific tasks and contexts, so the regulatory framework centred on the risk of an AI system based on its intended use. But foundation models, which have recently proliferated and underpin technologies like ChatGPT, offer such a broad range of uses in different contexts that accounting for all risks is almost impossible. In fact, many of their capabilities and associated risks are only discovered during deployment.
Since the proliferation of large language models (LLMs) – a type of foundation model – the EU has had to significantly rework its legislation. Moreover, some have observed that further changes to the AI Act are necessary ‘such that [it] generalises from today’s focus on ChatGPT’, suggesting that it has now over-anchored to LLMs, the dominant type of foundation model today, and will again likely not be fit for future types of models.
Policymakers are in a difficult position because there is a bias towards taking action – even if there’s a high risk it might be counterproductive. A fine balance needs to be struck so that regulation mechanisms like evaluation are not so rigid that they quickly become unfit for purpose.
Engaging a broad range of voices during the policy design stage of evaluations
Tech workers in companies developing frontier AI will have useful views about how external evaluations should be deployed, including: which key risks they should tackle, at which points of the lifecycle to introduce them and how any burdens they might create on engineers can be mitigated.
It is not just those building the technology who will have a relevant view. The public will be both consumers of frontier AI deployments and subject to their influence, and therefore have a considerable interest in their evaluation. A lack of public trust could lead to products failing, or to pressure and campaigns to have frontier AI restricted. Academia and civil society groups typically provide another flank in defence of the public’s values.
Sadly, there are lessons from healthcare on how pivotal the trust of the public and key staff groups is, and how losing it curtails innovation. In 2013, care.data aimed to create a national NHS database of patients’ medical records, combining primary and secondary care data. However, poor public consultation undermined trust in national data aggregation.
Consent for sharing data was assumed, with the burden of ‘opting out’ placed on patients – an approach many general practice (GP) doctors, as well as the Information Commissioner’s Office, considered inadequate. NHS England used posters in GP surgeries and leaflets to convey the change to patients, but both types of printed material were easily missed. Poor communication created an inherent distrust of the scheme, and was one of the key reasons the project was put on hold and then scrapped in 2016.
Most disappointingly, and despite significant reflection on this high-profile failure, a similar project met the same fate in 2021. The General Practice Data for Planning and Research (GPDPR) scheme assumed public goodwill towards data sharing following the COVID-19 pandemic. However, it was shelved after a high number of patients withdrew their consent because of the scheme’s lack of assurance around privacy.
In the AI policy context, we might currently be in the best of circumstances: companies building frontier AI models have shown willingness towards external evaluations in the UK and US. However, even in command-and-control led industries, we know that the visions of management and staff are not always perfectly aligned, and that staff need to be involved in designing and implementing a change for it to succeed.
And even then, the case study of care.data shows how easily public trust can be lost. Two-way engagement with the public and key tech workers is an opportunity to design a robust evaluation policy from their perspectives, and to course-correct if the policy starts to drift from meeting their needs.
Sufficient power and commitment to act on evaluation results
The evaluation of products, services or how an organisation is complying with legislation is a key plank in effective regulatory regimes, but it is meaningless unless it informs a decision. This can be approving a product or service for the market or enforcing specific rules where there are issues or concerns.
The current voluntary AI evaluation agreements in the UK and US do not carry legal powers for approvals or enforcement. Eventually they will need to, and even then the legal power to act is not enough. Enforcement bodies, like regulators, may not always have sufficient human and financial resources, or political support behind their decisions. Two case studies highlight these complexities: GDPR enforcement and medicines approvals.
The GDPR’s enforcement regime is distributed across European countries, with different supervisory authorities undertaking investigations of data breaches. Because of this, GDPR legislation can be interpreted differently by different agencies, which some argue has led to inadequate enforcement.
Authorities have a range of actions available to them: amicable resolution, reprimands, or compliance orders requiring a company to change how it processes data. Fines of up to 4% of a company’s global turnover from the preceding financial year can also be issued. The actions national enforcement authorities choose, however, can be scrutinised where they affect other EU states.
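The fine ceiling is simple enough to sketch as a calculation. A minimal illustration in Python, with hypothetical turnover figures (note that Article 83(5) of the GDPR sets the cap at the higher of €20m or 4% of worldwide annual turnover, a detail not spelled out above):

```python
def gdpr_fine_cap(global_turnover_eur: float) -> float:
    """Maximum administrative fine under GDPR Article 83(5):
    the higher of EUR 20m or 4% of worldwide annual turnover
    for the preceding financial year."""
    return max(20_000_000.0, 0.04 * global_turnover_eur)

# Hypothetical examples: for a large platform the 4% figure dominates;
# for a small firm the EUR 20m floor applies.
print(gdpr_fine_cap(50_000_000_000))  # large platform: EUR 2bn cap
print(gdpr_fine_cap(100_000_000))     # small firm: EUR 20m cap
```

This also shows why the 4% cap scales enforcement to company size: for the largest firms it dwarfs the fixed floor.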
The Irish Data Protection Commissioner (DPC) has come under considerable scrutiny for what were considered too lenient actions when investigating big tech. A 2023 analysis by the Irish Council for Civil Liberties showed that the DPC had 67% of its GDPR investigation decisions in EU cases overruled by majority vote of its European counterparts at the European Data Protection Board (EDPB), who insisted on tougher enforcement action.
There are understandable – perhaps even predictable – reasons why this happened. Ireland hosts more big tech companies than any other EU state and is therefore responsible for the most investigations. While the DPC has been criticised for pursuing amicable resolutions, Commissioner Helen Dixon has highlighted the challenges it faces: the ‘disproportionate resources’ held by big tech in comparison to the regulator’s, and the limits on its funds and staff, have delayed and slowed investigations.
It is indeed notable how, when resourcing was boosted, the outcomes of investigations changed. From 2016 to 2022, the DPC’s budget increased from €5m to €23m (roughly 4.6 times the original budget), and in 2022 the DPC issued two-thirds of the fines and corrective measures across the combined European Economic Area and UK. So, while there are still many critics of the DPC and of GDPR enforcement more broadly, it seems reasonable to conclude that poor resourcing did negatively affect enforcement, and that this was at least partially rectifiable.
A different way to empower enforcement and evaluation agencies is through political support, and again the health sector offers an instructive example.
The National Institute for Health and Care Excellence (NICE) undertakes health economic assessments of drugs and medical devices for England. This fundamentally has an allocative function: ensuring that what is approved has the biggest patient impact and offers the most value for taxpayers. One key flank of this is appraisals of the clinical and cost-effectiveness of technologies, to inform their use in the NHS and social care.
In 1999, NICE’s first technology appraisal did not recommend Relenza – a drug used to treat and prevent influenza – for national uptake, due to a lack of proven clinical benefit at the time. In this regard NICE was fulfilling its purpose of ensuring that technologies paid for at a national scale were truly impactful. But the manufacturer hit back: its chair contended that the pharmaceutical industry would leave the UK because of an ‘environment antagonistic to this industry’, and called on the government to abolish the fledgling evaluator.
The then Health Minister rebuffed the challenge and supported NICE’s verdict. Most importantly, history showed that NICE’s processes worked as intended: the drug was eventually approved after additional evidence was collected to demonstrate clinical benefit, and at a substantial 33% cost reduction, which meant a lower cost burden on taxpayers and acceptable reimbursement for industry.
Clearly, things could have been different had the Government not lent its political backing and resources to NICE’s vision of evidence-based, value-driven care. How this lesson will translate to frontier AI is anyone’s guess, but it speaks to the importance of maintaining the independence of any evaluators.
Perhaps more relevant to the present is how funding of the Data Protection Commissioner has been affecting enforcement. The squeeze in public services – exemplified by recent pay disputes – would suggest that lack of resourcing for regulatory and evaluation agencies could be an issue.
Policymakers, particularly in the UK, have demonstrated a serious intention to understand and mitigate a range of risks from AI: from immediate risks to minoritised groups, exploited workers and further centralisation of power, to perceived existential risks. While there is clearly goodwill, it would be remiss not to point out the potential for history to repeat itself with forthcoming regulation.
The often-quoted George Santayana expression ‘those who cannot remember the past are condemned to repeat it’ is both a warning and a gift to us. Any new technology frontier will still challenge policymakers and regulators, but by learning from the past we will probably make better mistakes than we would otherwise.
 Guidance always has its roots in legislation, but can be iterated more rapidly and flexibly whereas legislation requires several legal and political steps at minimum. Explainer here: https://www.oneeducation.org.uk/difference-between-laws-regulations-acts-guidance-policies/
 One exemplary case study is the failed attempts to introduce new production and quality management processes in car manufacturing at General Motors, because of long-standing cultural issues with management.
 Data protection agencies in EU nations or regions in Germany.
 For a deeper and definitive account, read chapter 1 of ‘A Terrible Beauty’ by Timmins, Rawlins and Appleby. Available at: https://www.idsihealth.org/wp-content/uploads/2016/02/A-TERRIBLE-BEAUTY_resize.pdf [accessed 16th August 2023]