Lower Exome Sequencing Coverage of Ancestrally African Patients in the Cancer Genome Atlas

This study analyzes, by patients' ethnic origin and across seven cancer types, the quality of the sequencing data included in the database of "The Cancer Genome Atlas" project.

Journal of the National Cancer Institute, in press, 2022, open access article

Abstract

Background: In the US, cancer disproportionately impacts Black and African American individuals. Identifying genetic factors underlying cancer disparities has been an important research focus and requires data that are equitable in both quantity and quality across racial groups. It is widely recognized that DNA databases quantitatively under-represent minorities. However, the differences in data quality between racial groups have not been well studied.

Methods: We compared the quality of germline and tumor exomes between ancestrally African and European patients in The Cancer Genome Atlas (TCGA) for seven cancers with at least 50 self-reported Black patients, in terms of sequencing depth, tumor purity, and the quality of germline variants and somatic mutations.

Results: Germline and tumor exomes from ancestrally African patients were sequenced at statistically significantly lower depth in six out of the seven cancers. For three cancers, most ancestrally European exomes were sequenced in early sample batches at higher depth, whereas ancestrally African exomes were concentrated in later batches and sequenced at much lower depth. For the other three cancers, the reasons for the lower sequencing coverage of ancestrally African exomes remain unknown. Furthermore, even when the sequencing depths were comparable, African exomes had disproportionately higher percentages of positions with insufficient coverage, likely due to the known European bias in the human reference genome that impacted exome capture kit design.

Conclusions: Lower overall and positional sequencing depths of ancestrally African exomes in TCGA led to under-detection and lower quality of variants, highlighting the need to consider epidemiological factors for future genomics studies.