Data science and the case for ethical responsibility

8 March 2019 | Tim Gardam

Tim Gardam, Chief Executive of the Nuffield Foundation, recently delivered a speech to the British Computer Society on the history of data ethics and the importance of the Ada Lovelace Institute

History

The Nuffield Foundation was created in 1943 by Lord Nuffield, the British 20th century industrialist and philanthropist. Its main purpose is “the advancement of social well-being by scientific research”.

From its early years in post-war Britain, it has focused on disadvantage, the obstacles to opportunity and the interventions that might overcome them. In all this, it put scientific research to the fore.

In 1956, the Foundation gave grants of £80,000 to the Cambridge University Mathematics Laboratory to fund the successor computer to EDSAC, the Electronic Delay Storage Automatic Calculator, which, as you will know better than I, was the first practical general-purpose stored-program electronic computer. It was a principle of Nuffield always to back talent, and the Foundation was impressed that the grant applicant, one Dr MV Wilkes, whose team had created EDSAC, was determined to keep development pointed towards the scientific possibilities and did not want to apply his research too early to immediate commercial opportunities. Sir Maurice Wilkes was, of course, the founder of the British Computer Society (BCS), just one year after the Nuffield grant. It is difficult to think of a grant better spent in the history of the Foundation.

The Foundation’s interest in computing continued. A smaller grant in the early 1960s to the Welsh College of Advanced Technology funded research into an electronic digital computer.

Its interests have always stretched from science to social science, with Education at its core, in order, in the language of its first report, “to assist the disinterested study of human society”.

In the twenty-first century, it is impossible to conceive of such a challenge without reinforcing the close interlinking of data science and social science. However, there remains a challenge of translation across the two cultures and disciplines. This is one of the most potentially rewarding initiatives we could plot together.

If one sets the interests of the BCS and those of the Nuffield Foundation alongside each other, they are, in many respects, the same.

In Education: re-thinking the curriculum to keep up with the transformation of a digital society, improving the quality of computer science and IT teaching and supporting teachers. More broadly, exploring what sort of education will most benefit a child born this year, who will emerge from the world of study to the world of work in the 2040s, when assumptions about what constitutes work, or a productive life in a tolerant and free society, seem ever more uncertain.

The other area of common interest is probably the most important of all for social scientists and data scientists: the ethical development and use of data and AI in a digitally driven society. As the BCS says, “AI has the potential to make our lives easier, healthier and more fulfilled. It also poses threats that we don’t even fully understand yet. As tech companies develop AI machines, we’re contemplating the ethics of their advances”.

It is this second challenge that I’ll focus on here.

The Foundation came into being at a time in history when there appeared to have been a great resolution as to the future direction of society, with the election of a Labour government at the end of the war in 1945. What mattered then was effective implementation of those ideals of equitably shared health and prosperity into policy.

The challenge facing all of us, 75 years later, is to address those same issues of social well-being, opportunity and disadvantage at a very different moment in history – when, so it appears to me, there is no longer any such equivalent sense of resolution and little sign of a new one emerging. Indeed the debate is only just beginning as to what sort of settlement the technological revolution of a data driven society will bring to our culture and politics.

It is not that the social agenda in recent decades has changed beyond recognition, but the terms in which it needs now to be addressed undoubtedly have.

What links the perspectives of the BCS and the Nuffield Foundation is an understanding that the relationship between people and technology already shapes, and will increasingly shape, our most deep-seated perceptions of who we are and how we interact. It will change the individual’s relationship to the state.

Whether in the world of work, childhood development, the justice system, learning and skills, family dynamics, or intergenerational, ethnic minority and geographic tensions and inequalities, the impact of algorithms, data and AI changes fundamentally how we consider these issues.

Background to Ada

This is why the Nuffield Foundation has put £5m into setting up the Ada Lovelace Institute.

Ada is a partnership, designed to research and deliberate on the impact of data and AI on people and society, the ethical implications and the distributional effects. Its aim is not to polish the problems but to seek ways of addressing them.

We have established Ada with a number of other bodies, including the Royal Society, Royal Statistical Society, British Academy, Alan Turing Institute, Wellcome and, beyond the academy, Luminate and the tech sector’s body, techUK.

We would very much like the BCS to be a partner with our work, bringing its intellectual capital to inform the interdisciplinary work we need to do.

One of our models for Ada is the Nuffield Council on Bioethics. The Council was established 30 years ago to be a place for independent and interdisciplinary deliberation about the most complex ethical questions arising from biological and medical research. It was conceived as being upstream of regulation, identifying on the horizon the decisions that need to be addressed.

Today’s challenges are of course even more complex as the ethics surrounding the development and use of data driven systems and artificial intelligence involve scenarios well outside the long established institutional and normative frameworks of medical science.

In this debate, technological, societal and ethical arguments interlink ever more directly. These questions break down intellectual and disciplinary barriers – between social scientists and data scientists, both of whom also now have to take account of the underpinning disciplines of the humanities – moral philosophy and practical ethics.

There is no divide between science and ethics in addressing these challenges. Ethics is not some system of compliance; it involves considering from the outset whether what you are proposing is the right thing to do or the right way to go about it. There is little point having your ethics teams turning up at the end of the process.

Mapping and defining ethical terms

Last month, the Leverhulme Centre for the Future of Intelligence in Cambridge published a Nuffield funded roadmap for research setting out the ethical and social implications of data and artificial intelligence. With most roadmaps, one can presuppose the roads described are already built; the LCFI project seeks to map an intellectual terrain for our century that is as uncertain, as new and as alarming as those 16th century maps one sees in the Florentine Museum of the Medicis where the outlines of coastlines, continents and mountain ranges fade into terra incognita and pictures of sea monsters and dragons.

In the public debate of the past year, it has been the monsters and dragons who have come to dominate our view of the horizon, as we have seen revealed the murky practices of Facebook and the opacity of Google and Amazon – the underhand ways in which they secure their dominance and use their customers as products, with little effective regulatory framework to make them accountable.

This sense of growing dysfunctionality is itself a concern because if the discussion around AI becomes dominated by mistrust and questioning of its legitimacy, the huge benefits to personal well-being, health and prosperity will be lost. And worse, the development of AI will be implemented not by the innovation of liberal democracies, but by others who have no interest in a plural society where government and corporate power is held to account under the rule of law.

The thesis of the LCFI report is that, before one can identify the ethical co-ordinates to guide us through the risks of a data driven world, or indeed find the routes to well-being that the power of these technologies makes possible, we must first delineate and define key concepts and values, and then negotiate the tensions and trade-offs between them.

Only then can we resolve the ambiguities inherent in the way we use key terms such as consent, bias, ownership, explainability, and privacy. Too often, in this debate, those with conflicting interests use these terms on their own terms, and simply talk past each other. Some clarity of definition has to be the precondition for building a more rigorous evidence base on which to construct possible answers to the dilemmas raised, or, at the very least, to bottom out the nature of disagreements.

So there needs to be a way to reset the conversation and define the questions that we need to resolve.

The data scientist commendably always looks to provide a solution; tweak as you go along. There are some questions capable of technical solutions – “we can fix that” – but many others that are “wicked questions”, where there is no right answer, only competing values. For example: the trade-offs between individual privacy and public benefit, between accuracy and fairness, convenience and loss of control of information about yourself, personalisation and the loss of solidarity and common purpose. These require a framework of norms if they are to be negotiated successfully, even when they are not solvable in algorithmic terms. That is where you need the interaction with the social scientists and moral philosophers.

Power

Discussions about ethics soon collide with discussion about power. These questions also require the convening of different interests, not only across academic disciplines, but also between the Academy, public policy, civil society and, above all, involving practitioners, the interests of the private capital of the tech sector, the locked away repository of so much of the knowledge that affects all our futures. We face a complex challenge of translation not only between different academic languages but between public and commercial interests. Even when interests conflict, we need to confront this challenge of translation so that we can at least recognise what we mean by what we say.

Debates on what is at stake in an AI-determined world can be couched in inevitably quite elevated terms, focusing on the intersection of intelligent machines and the future of humanity. However, I think it may be more effective to approach these challenges in a more granular way, grounding this rather millenarian discourse in our current social dilemmas. We should start by examining the emerging distributional effects of technologies on different groups within society and identifying what one might call bias by design, whether that design is conscious or not.

Underlying all these issues are the inherent asymmetries of power in a data driven world. It is already the case that as consumers, we have ourselves become the product, and do not understand the terms and conditions to which we agree. In a world of AI, transparency itself becomes impossible. Decisions that may be better, in healthcare diagnostics for example, may also be unexplainable.

We need to decide how to frame our understanding of what we think of as belonging to us. Our individual data is of very limited value, but collectively it is the currency of our era. Some have argued that we have allowed ourselves to surrender our data unawares to companies that surreptitiously usurped our ownership of it. But this assumes we own our own data in the first place. Is the concept of individual data ownership a useful basis for our relationship to the Googles and Facebooks of the world? Or, outside of a framework of contract law, does there need to be a different framework for underpinning data rights as an aspect of human rights?

This in turn links to questions of individual and collective privacy. We may be prepared to trade our own privacy for advantage, but what happens when judgements are made about us as individuals because our data is part of collective assumptions made about some group into which we are categorised? This goes to the heart of the relation of the individual to corporate power or to the power of the state, and to our concept of justice and individual freedom under the law.

The latest Rubicon, which we may already have crossed, is facial recognition technology. In the last century, we accepted CCTV as a price for better security; today, AI is able not only to recognise immediately who you are, wherever you are, but also to make predictions as to what you are likely to do. In the US, I am told, AI companies are offering technology to schools so they can identify which of the pupils may be a potential “active shooter”. What then is left of our autonomy?

Culture

It is not difficult to identify the issues, but maybe one shortcoming of the current debate is that it focuses on the use of AI when equally tough questions may lie in its development. As Professor Helen Margetts, Director of the Public Policy Programme at the Alan Turing Institute, put it to me recently: “who makes the stuff?” And what are the implications embedded in its creation before it is ever put to use?

We all recognise that data and AI are not of themselves ethically charged, but their biases do reflect those of the people who create the algorithms. This is why the narrowness of tech culture has consequences. I will mention just one implication of this, one very close to the BCS’s interests – the gender and diversity problem in computing.

The BCS has forced attention to this issue by compiling the data that show only 17% of UK IT specialists are female. And indeed only 8% are disabled (as against 23% of the working-age population who are disabled); and non-white IT employees are more than twice as likely to be in part-time work as their white counterparts, as they are unable to find full-time jobs.

You may have seen a compelling article in the New York Times last week – The Secret History of Women in Coding. It is the chilling account of the gradual erasing in the USA of the role of women who had, from the 1950s to the 1970s, taken a leading role in the development of the computing sector, as, in the mid-1980s, the culture of the Academy and then of Silicon Valley took hold.

I learned that in the 1960s some executives argued that women’s traditional expertise at painstaking activities like knitting and weaving manifested precisely the right mind-set for computer programmers and the 1968 book Your Career in Computers stated that people who like “cooking from a cookbook” make good programmers.

In 1967, a Cosmopolitan article, Computer Girls, noted that women could make $20,000 a year doing this work (or more than $150,000 in today’s money). Unlike law or finance or most sectors, this was the rare white-collar occupation in which women could thrive.

The article goes to the core of the Nuffield Foundation’s and the BCS’s concerns – education. Women graduating with a degree in computer science fell from the high point of over 37% in 1984 to 17.5% in 2010. As personal computing took hold, fathers took it up with their sons; computers were a boy’s toy; they sat in the boy’s bedroom. The bias in the home fed through to bias in the classroom. A self-reinforcing culture of male assumption developed that if you were not already experienced in coding, you weren’t any good at it; computing was a “hard core” subject – a revealing metaphor – and it drove women from the lecture halls.

At times, this has developed into a rancid socio-biological filter in the tech start-up sector that only a type of male – white or Asian – was good enough to be a top grade computer programmer. In 2017, a news site that covers the technology industry revealed that 20 percent of Google’s technical employees were women, while only 1 percent were black and 3 percent were Hispanic. Facebook was nearly identical; the numbers at Twitter were 15 percent, 2 percent and 4 percent, respectively. Yet in India 40% of computer science graduates are women and at the University of Malaya in Kuala Lumpur 39% of the PhD candidates are female.

A significant social science research agenda of the next decade must be one that takes head-on the current and future patterns of the tech industry, its employment, economic, cultural, societal and ethical assumptions, and the significance of these on society’s decision-making. This cannot be the terrain of a single discipline as the consequences enter into the lives of all of us at the most intimate level. In outlining this challenge, it is very easy to underplay the transformative benefits in all parts of our lives that the data science revolution will bring. The challenge is to ensure these are spread as widely as possible. The current inequalities obviously include gender but go more widely and more deeply to the heart of our understanding of contemporary disadvantage and diversity, and the terms on which an inclusive digital society will be possible. And these are problems that data science and social science working together have an ethical responsibility to help to answer.