Genomics, PRS, genetic ancestry, race & ethics — Anna Lewis

Over the past two decades, the cost of sequencing a whole human genome has fallen ten million-fold. However, advances in sequencing have not been matched by equal progress in our understanding of the genetic code. Alongside the scientific struggle to interpret the Book of Life, we must also confront the ethical and social issues raised by ongoing research.

In this week’s episode, Anna Lewis, a researcher at the Edmond & Lily Safra Center for Ethics at Harvard, guides us through some of the most complex issues in this field. She discusses the use of polygenic risk scores, and how a poor choice of scientific framework can hinder medical advances or, even worse, lend support to racist ideologies.

In 2003, the Human Genome Project announced it had sequenced the 3.2 billion base pairs that constitute the human genetic code or, more precisely, the genetic code of a few humans, covering just one copy of each chromosome pair. This project cost around $3 billion and took thirteen years. In 2023, it takes less than 24 hours and under $300 to sequence a whole genome. The learning rates for renewable technology and computing (Moore’s law) pale in comparison, making genomics a leading candidate for the technology that has seen the steepest price declines in history.
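To put the comparison with Moore’s law in numbers, here is a rough back-of-the-envelope sketch. It assumes a smooth exponential decline between the two figures quoted above (about $3 billion in 2003, about $300 in 2023), which is a simplification: the first figure is the cost of the whole project rather than a per-genome price. The function name is ours, and the two-year figure for Moore’s law is the usual rule of thumb, not a number from the episode.

```python
import math

def halving_time_years(start_cost, end_cost, years):
    """Years for the cost to halve, assuming a constant exponential decline."""
    fold_decline = start_cost / end_cost
    return years / math.log2(fold_decline)

# Figures quoted in the text: ~$3 billion (2003) down to ~$300 (2023).
fold_decline = 3e9 / 300
genome_halving = halving_time_years(3e9, 300, years=20)

# Rule-of-thumb comparison: Moore's law doubles transistor counts
# roughly every two years (an assumption for illustration only).
moore_doubling = 2.0

print(f"fold decline: {fold_decline:,.0f}")
print(f"implied halving time: {genome_halving:.2f} years")
print(f"ratio to Moore's law doubling time: {moore_doubling / genome_halving:.1f}x faster")
```

Under these assumptions the cost of sequencing halved in a little under a year, on average, over the two decades.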

Proteomics and gene editing have also witnessed revolutionary advances in recent years:

  • Protein folding: Google’s DeepMind released AlphaFold 2 in 2021, a deep learning model that can predict the shapes formed by proteins with speed and accuracy, thus enhancing our ability to understand their functions.
  • CRISPR: a technology that adapts the mechanism by which bacteria splice in DNA from their predators (phages) to remember them. CRISPR allows for low-cost copy-paste editing of DNA.

Despite these breakthroughs, Anna points out that many problems remain only partially understood. One of the deeper reasons for this is the highly complex interactions between genes. Prior to the sequencing of the human genome, scientists anticipated they would find 80,000 – 140,000 genes. Instead, depending on how the term ‘gene’ is defined, they found 20,000 – 30,000. This initial overestimate reflected an under-appreciation of the intricacies of how genes coordinate their functions.

Pathologies such as Huntington’s disease and hemophilia can be traced very clearly to single genes. In many cases, however, health outcomes depend on multiple genes, and single genes that lead to disease may not always be expressed (“switched on”). Furthermore, non-human DNA can play a role in health. Epigenetics and hologenomics are two fast-developing fields that study, respectively, the mechanisms driving the expression of human genes and the importance to health of non-human cells such as those in the gut microbiome.

We have transcribed The Book of Life, but we do not yet know how to read it. Not fully.

As a consequence, progress in medical technology has not kept pace with the improvements in gene sequencing. It should be conceded, however, that this lag is also due to the essential regulatory hurdles involved in bringing therapies safely to market. Undoubtedly, a similarly rigorous framework for AI would have prevented Large Language Models (LLMs) such as ChatGPT from emerging as early as they did.

Nonetheless, there is cause for optimism in the treatment of monogenic diseases such as Huntington’s, hemophilia, and Duchenne muscular dystrophy. But for the many human traits, whether pathogenic or not, that are polygenic (and thus depend on the expression of multiple genes), progress is slower. Considering that some 12,000 genetic variants influence height, we can appreciate the difficulties of linking traits back to the genome.

One of the tools developed to cut through this complexity is the Polygenic Risk Score (PRS). By examining a population, researchers apply statistical methods to understand how the frequency of occurrence of a trait (for example, type 2 diabetes) correlates with the genetic variants observed. The results are then used to predict the probability of a trait from an individual’s genome, with varying success. One of the key factors in the efficacy of such scores is whether the genome for which risk is being calculated is similar to those sampled in the population. But what does that mean?
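In its simplest and most common form, a PRS is a weighted sum: each variant’s estimated effect size multiplied by the number of copies of the risk allele the person carries (0, 1 or 2). The sketch below illustrates that idea only. The variant labels, effect sizes and cohort are invented for illustration, and real pipelines involve many further steps (variant selection, quality control, correction for correlated variants) that are omitted here.

```python
from statistics import mean, pstdev

# Hypothetical per-variant effect sizes, e.g. log odds ratios from a
# genome-wide association study. Labels and values are made up.
EFFECT_SIZES = {"rs_A": 0.12, "rs_B": -0.05, "rs_C": 0.30}

def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2) over the scored variants."""
    return sum(weights[v] * dosages.get(v, 0) for v in weights)

def standardize(score, reference_scores):
    """Express a raw score in standard deviations from the reference cohort mean."""
    return (score - mean(reference_scores)) / pstdev(reference_scores)

# Hypothetical reference cohort: the people the score was built on.
reference_cohort = [
    {"rs_A": 0, "rs_B": 1, "rs_C": 1},
    {"rs_A": 1, "rs_B": 0, "rs_C": 0},
    {"rs_A": 2, "rs_B": 2, "rs_C": 1},
    {"rs_A": 1, "rs_B": 1, "rs_C": 2},
]
reference_scores = [polygenic_risk_score(p, EFFECT_SIZES) for p in reference_cohort]

person = {"rs_A": 2, "rs_B": 1, "rs_C": 0}
raw = polygenic_risk_score(person, EFFECT_SIZES)
z = standardize(raw, reference_scores)
print(f"raw score: {raw:.2f}, relative to reference cohort: {z:+.2f} SD")
```

The standardization step is where the question of similarity bites: a raw score only means something relative to a reference cohort, and both the effect sizes and the cohort’s score distribution may not carry over to someone whose genome differs from those sampled.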

Population, Anna argues, is a term with a dangerously confusing array of meanings. The population of New York is a number, but it also represents a group of people who do not necessarily share anything meaningful for the purposes of genomics. In the past, PRSs have relied on concepts of race for creating a sample set and understanding the portability of results from that set to other individuals. However, while race remains an important and valid variable for studies in sociology and economics (for example, in understanding and correcting for the historical and ongoing consequences of discrimination), it lacks a biological basis. Race is a social construct subject to shifts and changes; who is considered “white,” for instance, has more to do with power structures than phenotype. A century ago, Southern European immigrants to the USA were not counted as white.

Recently, there has been a move away from race to the notion of genetic ancestry. However, Anna and her colleagues have observed that unless properly defined, this risks being no more than race renamed. In particular, if genetic ancestry is understood as simple continental groupings (East Asian origin, and so forth), it serves neither science nor society. On one hand, it fails because these groupings do not accurately represent genomically salient features. For example, a continental African grouping contains as much genetic diversity as all other groups combined. On the other hand, by suggesting that these groupings have scientific meaning, their use may inadvertently propagate racist ideologies.

It is crucial to develop Polygenic Risk Scores correctly: done well, they allow medical advice and interventions to be targeted more effectively. Anna and her colleagues argue that the most appropriate way to understand genetic ancestry is through the Ancestral Recombination Graph (ARG). Unlike continental ancestry groupings, which are flat in time and wide in extent, the ARG traces how each segment of an individual’s DNA was inherited, branching back through time among ancestors who lived at different times and in different places.
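To make the idea of an ARG more concrete, here is a toy sketch of one way such a graph could be represented: as a list of edges, each saying which stretch of the genome a child inherited through which parent. This is an illustration under our own assumptions, not the representation used in the paper or in any particular library. The node names, positions and the `lineage_at` helper are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    child: str
    parent: str
    left: int    # start of the inherited segment (inclusive)
    right: int   # end of the inherited segment (exclusive)

# Hypothetical toy graph over a 100-position genome. Recombination means the
# sample inherited positions 0-49 via ancestor "A" and 50-99 via ancestor "B".
EDGES = [
    Edge("sample", "A", 0, 50),
    Edge("sample", "B", 50, 100),
    Edge("A", "G", 0, 100),
    Edge("B", "G", 0, 30),
    Edge("B", "H", 30, 100),
]

def lineage_at(node, position, edges):
    """Follow the line of descent for one genomic position back in time.

    Assumes each node has at most one parent covering any given position.
    """
    path = [node]
    while True:
        parents = [e.parent for e in edges
                   if e.child == path[-1] and e.left <= position < e.right]
        if not parents:
            return path
        path.append(parents[0])

print(lineage_at("sample", 10, EDGES))
print(lineage_at("sample", 75, EDGES))
```

Different positions in the same genome trace back through different ancestors, which is why a single continental label cannot summarize an individual’s ancestry.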

The ARG is much more specific than continental groupings. However, because genes tend to be passed down in chunks rather than individually, it still offers sufficient coarse-graining to highlight where variations are significant, and to support metrics for grouping people such that highly predictive PRSs can be calculated. The ARG is a precisely defined and scientifically meaningful object, lacking the connotations of race. As Anna and her colleagues put it in their recent paper, this gets genetic ancestry right for science and society. So far, the research community has yet to combine ARGs and PRSs; Anna and her colleagues’ paper is a rallying call to do so.

To make the most of this window of opportunity to move away from race as a biological variable,
we would urge the adoption of a multidimensional and continuous conceptualization of ancestry,
free wherever possible of population categories, and not relying on continental labels that bear
striking resemblance to prior racist groups.

Anna Lewis et al, Getting genetic ancestry right for science and society. Science. 2022 Apr 15

Anna and I studied together at Oxford, starting our degrees in 2003, just as the human genome was first sequenced. It has been a pleasure to follow her career since then, through a PhD in systems biology, medtech startups, and back into academia. More technological and medical breakthroughs are on the horizon—perhaps polygenic CRISPR edits will allow us to influence intelligence, strength, and tendency towards violence. The work that Anna and others are doing in the ELSI field (Ethical, Legal & Social Implications) is crucial to the choices we will make as a species.

We are learning how we can use our tools; we need to be mindful of how we should use them.

Notes