Why is Our Genome Data So White? A Discussion on the Lack of Representation in Genome-Wide Association Studies


The public health research consensus is that social and economic factors are the main drivers of the health disparities observed in the United States. Biology and genetics contribute the least to these disparities. This is understandable: all humans share roughly 99.9% of their DNA, and access to quality healthcare and a healthy lifestyle is governed not by our DNA but by environmental factors. However, rapidly growing fields like pharmacogenomics and precision medicine promise to use genetic information to tailor healthcare to the individual. Because the vast majority of genetic databases contain information from individuals of European descent, there is no guarantee that underrepresented communities will benefit equally from these advances. If genetics is to remain a minor determinant of health disparities, the genetics community must correct its severe bias toward participants of European descent and increase the number of participants from underrepresented populations in genome-wide association studies. Otherwise, this lack of representation could exacerbate health disparities.


The field of genetics acknowledges that the vast majority of participants in genome-wide association studies (GWAS) are of European descent. A 2009 analysis found that about 96% of GWAS participants were of European descent (Popejoy & Fullerton, 2016). By 2018, this share had fallen to about 78%, largely because of the increasing representation of participants of Asian descent, but representation of participants of African, Hispanic/Latin American, and/or Native American descent has remained low (Sirugo et al., 2019). Participants of African descent made up about 2% of GWAS participants, participants of Hispanic or Latin American descent approximately 1%, and participants of Greater Middle Eastern, Native American, or Oceanian descent less than 1% (Sirugo et al., 2019). A 2017 study by Johnson et al. identified the consequences of this discrepancy in the field of pharmacogenetics. For participants of European descent, single nucleotide polymorphisms (SNPs) in three genes account for between 11% and 30% of the variance in the metabolism of warfarin, the most prescribed oral anticoagulant in the world (Sirugo et al., 2019). For participants of African descent, however, these SNPs account for a significantly smaller percentage of the variance, meaning that “the algorithms derived from Europeans do not translate into better and safer treatment across ethnic groups” (Sirugo et al., 2019). Many researchers recognize that increasing representation in genomics studies should be a priority, and this understanding has reached mainstream media in the past couple of years. However, it is important to recognize why certain populations are so severely underrepresented in genetics research, because those reasons show which protocols researchers need to prioritize to improve the representation of minorities in genomics research.

There is no doubt that the history of genetics has been rather hideous: the field has been used as a tool to justify and perpetuate bigotry, racism, and claims of supremacy over others. The story of Henrietta Lacks, whose “immortal” cancer cells underpin key developments in molecular medicine, and the story of Lucy, Anarcha, and Betsey, whose subjection to experimentation allowed James Marion Sims to become the so-called “Father of Modern Gynecology,” both highlight a lack of informed consent. In each case, the scientists profited not only financially but also in the form of recognition (Nature, 2020; Holland, 2018). The Guatemala experiments (in which researchers deliberately exposed more than 1,000 people to sexually transmitted diseases), the Tuskegee study (in which researchers withheld treatment from nearly 400 Black men with syphilis), and the Havasupai tribe case (in which DNA samples donated for diabetes research were used, without the tribe’s consent, to study mental illness, inbreeding, and theories of geographic origin that contradicted the tribe’s traditional stories) share the same common denominators of exploitation and a lack of informed consent (Rodriguez & García, 2013; Centers for Disease Control and Prevention, 2020; Harmon, 2010). With cases like these in science’s history, it is understandable why many members of underrepresented communities are hesitant to participate: they do not want to be the next “guinea pig” that scientists can exploit for recognition, financial gain, or the perpetuation of racist ideologies and agendas. These episodes have shaped the relationship that underrepresented communities have with the scientific community.

Still, distrust within marginalized communities is not the only reason certain populations are understudied; there are further underlying complexities. One factor in this disparity originates with the institutional bodies that allocate grants to scientists conducting GWAS. A 2011 Nature article stated that a mandate to include more participants from minority backgrounds was issued in 1985 (Bustamante et al., 2011). Thus, when it was found that twenty-four years later only 4% of GWAS participants came from non-European backgrounds (Bustamante et al., 2011), a figure that rose to a whopping 7% in 2011 and eventually to 22% in 2018, it became clear that fulfilling that mandate had not been a priority for many granting bodies. Esteban Burchard of the University of California, San Francisco, says that the delay in diversifying GWAS stems from “a system that distributes accountability across multiple scientific committees, leaving no one truly responsible for ensuring improvement” (Jacewicz, 2016).

The distribution of ancestry categories in GWAS. Left: ancestries represented in studies. Right: ancestries of all individuals included in GWAS.

Some researchers who have conducted genomics research and clinical trials with diverse participants suggest that the studies themselves are not doing enough to be inclusive. A 2005 study by Wendler et al. conducted a comprehensive literature search and identified 20 health research studies conducted in the United States (representing over 70,000 people) that published consent rates by race and ethnicity. They found that, on average, consent rates did not differ significantly between ethnic groups, although enrollment rates differed significantly between studies (Wendler et al., 2005). Their data suggest that participation is affected less by attitudes toward research (since consent rates are quite similar) than by awareness of the studies taking place and access to study sites (Wendler et al., 2005). This suggests that researchers should focus much more on identifying the factors behind this disparity in recruitment, such as travel expenses and language barriers, which they can then address themselves.

With these reasons in mind, researchers can take steps to improve participation both now and in the future. The most promising way to increase trust among marginalized communities is to increase minority representation in science, medicine, and public health. It is far more empowering for a patient from a marginalized group to see a scientist of a similar background working in a field with such a complicated history with marginalized communities. Significantly increasing representation in these fields will take years, so building trust this way will require a long-term effort to overcome barriers to entry. Nonetheless, there are actions researchers can take right now to improve participation in studies and the broader relationship between the scientific community and marginalized communities.

Jocelyn Ashford, a patient advocate for the biotech company Eidos Therapeutics, describes how she invites African Americans to participate in clinical trials that benefit their communities:

“I’ve found that one of the first and most important steps to creating an inclusive clinical trial is to engage the target community in discussions around the recruitment plan. By bringing these communities to the table early, we can hear their input instead of making assumptions about how to best reach them. We can hear their concerns and attempt to address them, while educating the communities about the importance of clinical trials, all that’s involved, and the potential to bring high-quality care to their community.” (Ashford, 2020)

She emphasizes building a positive relationship between researchers and participants, one focused not only on increasing the number of studies with strong minority representation but also on doing so ethically, in a way that respects participants’ concerns.

The All of Us program aims to recruit 1 million participants; as of July 2019, over 175,000 participants had volunteered their biospecimens for research, about 80% of whom are from historically underrepresented communities. (Sketchify on Canva)

Moreover, it is important to recognize and amplify the efforts that have been implemented to address this extreme difference in representation. For instance, the National Institutes of Health’s “All of Us” research program aims to recruit a diverse group of at least one million United States participants to provide more comprehensive phenotype and genotype data. Enrollment opened in May 2018, and as of July 2019, more than 175,000 participants had volunteered their biospecimens for research, about 80% of them from communities historically underrepresented in biomedical research. The All of Us research program investigators also emphasize the importance of a positive relationship between participants and researchers, particularly with regard to participants’ privacy:

“Establishing authentic engagement with participants and providing value for them will be key to long-term retention and continued recruitment of persons from diverse populations. The All of Us program must resist cyberattacks and effectively communicate the legal and procedural protections of privacy such as those afforded by the 21st Century Cures Act.” (The All of Us Research Program Investigators, 2019)

The All of Us research program investigators estimate that they will reach their target participant size by 2024. (The All of Us Research Program Investigators, 2019)

Science cannot change its past, but viewing this history alongside contemporary initiatives shows that there is more than enough room to build a positive relationship with underrepresented groups in the present and the future. Hopefully, it will not be long before representation across genomic studies better reflects, and better serves, the populations that have been neglected and mistreated for so long.

Ann-Marie Abunyewa is a first-year in Branford College. Ann-Marie is a prospective Classics and Molecular, Cellular, and Developmental Biology double major from Georgia. She can be contacted at ann-marie.abunyewa@yale.edu.


Works cited:

The All of Us Research Program Investigators. (2019). The All of Us research program. New England Journal of Medicine, 381(7), 668-676. https://doi.org/10.1056/NEJMsr1809937

Ashford, J. (2020, July 22). Clinical trials need to include more Black and other minority participants: Here’s how. In Stat. Retrieved December 21, 2020, from https://www.statnews.com/2020/07/22/clinical-trials-include-more-black-and-other-minority-participants/

Bustamante, C., De La Vega, F., & Burchard, E. (2011). Genomics for the world. Nature, 475, 163-165. https://doi.org/10.1038/475163a

Centers for Disease Control and Prevention. (2020, March 2). The Tuskegee timeline. In Centers for Disease Control and Prevention. Retrieved December 21, 2020, from https://www.cdc.gov/tuskegee/timeline.htm

Harmon, A. (2010, April 21). Indian tribe wins fight to limit research of its DNA. In New York Times. Retrieved December 21, 2020, from https://www.nytimes.com/2010/04/22/us/22dna.html

Holland, B. (2018, December 4). The ‘Father of Modern Gynecology’ performed shocking experiments on enslaved women. In History. Retrieved December 21, 2020, from https://www.history.com/news/the-father-of-modern-gynecology-performed-shocking-experiments-on-slaves

Jacewicz, N. (2016, June 16). Why are health studies so white? In The Atlantic. Retrieved December 21, 2020, from https://www.theatlantic.com/health/archive/2016/06/why-are-health-studies-so-white/487046/

Nature. (2020, September 1). Henrietta Lacks: Science must right a historical wrong. In Nature. Retrieved December 21, 2020, from https://www.nature.com/articles/d41586-020-02494-z

Popejoy, A., & Fullerton, S. (2016, October 12). Genomics is failing on diversity. In Nature. Retrieved December 21, 2020, from https://www.nature.com/news/genomics-is-failing-on-diversity-1.20759#/

Rodriguez, M., & García, R. (2013). First, do no harm: The US sexually transmitted disease experiments in Guatemala. American Journal of Public Health, 103(12), 2122-2126. https://doi.org/10.2105/AJPH.2013.301520

Sirugo, G., Williams, S., & Tishkoff, S. (2019). The missing diversity in human genetic studies. Cell, 177(1), 26-31. https://doi.org/10.1016/j.cell.2019.02.048

Wendler, D., Kington, R., Madans, J., Van Wye, G., Christ-Schmidt, H., Pratt, L., Brawley, O., Gross, C., & Emanuel, E. (2005). Are racial and ethnic minorities less willing to participate in health research? PLOS Medicine, 3(2), e19. https://doi.org/10.1371/journal.pmed.0030019

