Minding the genomic data gap: COVID-19, genomics and health inequalities

The role of genomics in the data-driven pandemic response

17 June 2021

Project: Tackling health and social inequalities in data-driven systems

Keyword: Health data

Genomic technologies have been deployed at an unprecedented scale and pace during the pandemic. Rapid sequencing of SARS-CoV-2, the pathogenic coronavirus that caused the COVID-19 pandemic, has helped to inform the development of diagnostics and vaccines. Ongoing sequencing of the virus is providing knowledge on emerging variants and aiding surveillance efforts to detect and track novel ones.

As well as viral sequencing, work to sequence the genomes of people infected by SARS-CoV-2 has been underway since the early days of the pandemic. Research is examining how SARS-CoV-2 interacts with our immune system, to better understand, for example, why the response to infection varies and some people have no or mild symptoms while others become critically ill. As of April 2021, a global initiative to study the relationship between the ‘host’ genome and COVID-19 susceptibility, severity and outcomes has 47 groups involved from 19 countries.

Genomics will continue to have an important role to play in the ongoing response to, and recovery from, this pandemic, as well as in preparedness for future ones. But, as things stand, biases and gaps in data (genomic and associated clinical, social and epidemiological data) mean the benefits of genomics are unlikely to be distributed equally across all populations.

Genomics and the missing data challenge

A lack of diversity in datasets is a well-documented challenge in genomics and biomedical research. Historically, most human genomic studies have been performed in populations of European ancestry. One analysis in 2017 of two public databases of genomic studies found that more than 60% of studies were based on populations of European descent. Another review noted that people of European descent account for 88% of genomes in major genomic studies.

The underrepresentation of diverse populations in genomic datasets and studies exacerbates health inequalities. It limits the generalisability of research findings as well as the viability of using genomics in the clinical care of persons of non-European ancestry.

Over 1.5 million SARS-CoV-2 genomes have been submitted from 182 countries (as of May 2021) to GISAID, a database for sharing viral genomes. Yet 66% of these genomes are derived from just four countries: the USA, UK, Germany and Denmark. Discrepancies exist even within countries; the percentage of reported COVID-19 cases sequenced and shared varies markedly across different US states.

Longstanding data biases cannot be corrected overnight, so under current conditions they will continue to have repercussions for COVID-19 human genomics research. If viral sequencing is not representative of all regions and groups, there will be significant gaps in our understanding of how SARS-CoV-2 affects underserved populations. For example, in areas and groups where few or no viral samples are sequenced, it will take longer to detect novel variants and examine their significance, to understand transmission networks, and to inform the public health response.

In recognition of the issues posed by a lack of diversity in genomic studies, there is a growing number of initiatives to sequence under-represented regions and populations. Ensuring diversity and equity of access is one of the key pillars of the UK’s recently published strategy for genomic healthcare and research. It emphasises the importance of outreach and communication in diversifying datasets and increasing the data from ethnic minorities in genomic cohorts.

Deep, meaningful and targeted engagement is one step towards improving the diversity of datasets. However, while genomic science has had a critical role in tackling the pandemic (and, beyond that, in improvements in areas such as rare diseases, cancer and infectious diseases), simply reporting the merits of genomics is not enough to drive recruitment.

Failing to engage with perspectives outside the field of genomics places the onus on underrepresented populations to understand and to trust. Instead, the field of genomics must hold a mirror up to itself and see the role it has played in perpetuating a lack of diversity and how it should be doing better. This is a critical starting point to identify, recognise and redress the immediate and long-standing barriers to participation.

Minding the ‘missing data’ gap

Human genetics and genomics have a troubling historical connection with racist ideology and ‘race science’ that posits race as having a biological and genetic basis. Angela Saini has written extensively about how racialised ideas in science and society still persist.

Even today there is confusion around the inclusion of race as a category in research studies and in practice, and too often this leads to a dangerous conflation of socio-political race with genetic ancestry. Brothers et al. suggest a series of principles for those involved in publishing genetics and genomics research to consider. They propose that genetic ancestry and socio-political race should not be used as surrogates for one another. At the same time, they highlight that inclusion of race as a socio-political category is important in contexts where health disparities are observed and that those ‘examining the genetic contribution to health disparities should avoid framing health disparities in reductive terms’, that is, in ways that overlook the non-genetic factors that contribute to health inequalities.

This is especially pertinent in the context of COVID-19, which has had a profoundly disproportionate impact on racial and minority ethnic groups in the UK and the USA. Structural racism and social determinants of health have contributed to greater chances of contracting the virus and to worse outcomes when infected. It is crucial that the genomics community contextualises and communicates the contribution of non-genetic factors to COVID-19 health disparities, especially when studying underserved and underrepresented populations.

Ultimately, addressing data gaps in genomics requires a great deal of introspection from all those involved in genetics and genomics (including funders, research institutions, practitioners and industry) to acknowledge past racial transgressions in biomedical research, as well as recent biases and structural racism that have contributed to the lack of data diversity in the genomic era.

Research previously commissioned by Genomics England to understand barriers to participation for Black people of African or Caribbean descent points to a range of issues. These include: a lack of diversity within project teams and the genomics profession; an underestimation of the time and resources required to recruit diverse participants; recruitment and information resources that fail to visually represent Black people; negative historical associations; and contemporary and historical experiences of discrimination.

Others have similarly cited lagging diversity and inclusion in research within the field as a challenge, as well as limited engagement and follow-up with participants, and the preference of researchers to focus on well-characterised and well-studied population cohorts. In recognition of these challenges, the US National Human Genome Research Institute (NHGRI) has set out an action agenda for building a diverse genomics workforce.

Improving representation, whether in human genomics or pathogen genomics, also requires a global outlook. Equitable international collaborations and data sharing are key to this. Too often the Western world’s approach to global health and research has been entrenched in colonial and neo-colonial attitudes that perpetuate inequalities.

In these scenarios, wealthier nations place little or no focus on the local capacity building that would engender sustainable research in less economically developed nations. Data and samples are extracted for study without necessarily benefiting the nations from which they are derived. Efforts to address the data gaps in genomics, whether at a local or global level, must focus on realising demonstrable benefits for understudied populations. The World Economic Forum highlights some of the tensions arising in genomics that need to be considered in order to foster a broader sense of solidarity through equitable and ethical social practices.

The publication of the Genome UK implementation plan and recent announcement of a ‘Global Pandemic Radar’ signal that genomic sequencing is set to increase sharply over the coming months. If genomics is to deliver both effective and equitable outcomes, efforts to improve data diversity will also need to intensify.

Dr Sobia Raza is Senior Programme Manager, Centre for Genomic Pathogen Surveillance, Big Data Institute, University of Oxford and an Associate at the PHG Foundation. This post is written in a personal capacity and is not linked with either organisation.

Image credit: SusanneB

Author: Dr Sobia Raza
