PopInf: An Approach for Reproducibly Visualizing and Assigning Population Affiliation in Genomic Samples of Uncertain Origin

Angela M. Taravella Oill, Anagha J. Deshpande, Heini M. Natri, Melissa A. Wilson

Research output: Contribution to journal › Article › peer-review

Abstract

Germline genetic variation contributes to cancer etiology, but self-reported race is not always consistent with genetic ancestry, and samples may lack ancestry information altogether. In this study, we describe a flexible computational pipeline, PopInf, that visualizes principal component analysis output and assigns ancestry to samples of unknown genetic ancestry, given a reference population panel of known origins. PopInf is implemented as a reproducible workflow in Snakemake with a tutorial on GitHub. We provide a preprocessed reference population panel that can be quickly and efficiently implemented in cancer genetics studies. We ran PopInf on The Cancer Genome Atlas (TCGA) liver cancer data and identified discrepancies between reported race and inferred genetic ancestry. The PopInf pipeline facilitates visualization and identification of genetic ancestry across samples, so that this ancestry can be accounted for in studies of disease risk.
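
The general idea of projecting samples of unknown ancestry onto principal components fitted on a labeled reference panel can be sketched as follows. This is an illustrative sketch only, not PopInf's implementation: the abstract does not state the assignment rule, so a nearest-centroid rule in principal component space is assumed here, and all function names, population labels, and the simulated genotypes are hypothetical.

```python
import numpy as np

def fit_pca(reference, n_components=2):
    """Fit PCA on reference genotypes (samples x variants, coded 0/1/2)."""
    mean = reference.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(genotypes, mean, axes):
    """Project genotypes onto principal axes fitted on the reference panel."""
    return (genotypes - mean) @ axes.T

def assign_ancestry(ref_pcs, ref_labels, query_pcs):
    """Assumed rule: label each query sample by its nearest reference centroid."""
    ref_labels = np.asarray(ref_labels)
    labels = sorted(set(ref_labels.tolist()))
    centroids = np.stack([ref_pcs[ref_labels == lab].mean(axis=0) for lab in labels])
    # Euclidean distance from every query sample to every population centroid.
    dists = np.linalg.norm(query_pcs[:, None, :] - centroids[None, :, :], axis=2)
    return [labels[i] for i in dists.argmin(axis=1)]

# Toy data: two hypothetical populations with different allele frequencies.
rng = np.random.default_rng(0)
n_variants = 200
freq_a = rng.uniform(0.05, 0.95, n_variants)
freq_b = rng.uniform(0.05, 0.95, n_variants)

def simulate(freqs, n_samples):
    """Draw diploid genotypes (0, 1, or 2 alternate alleles) per variant."""
    return rng.binomial(2, freqs, size=(n_samples, freqs.size)).astype(float)

reference = np.vstack([simulate(freq_a, 30), simulate(freq_b, 30)])
ref_labels = ["POP_A"] * 30 + ["POP_B"] * 30
query = np.vstack([simulate(freq_a, 5), simulate(freq_b, 5)])

mean, axes = fit_pca(reference)
ref_pcs = project(reference, mean, axes)
query_pcs = project(query, mean, axes)
assigned = assign_ancestry(ref_pcs, ref_labels, query_pcs)
print(assigned)
```

Fitting the axes on the reference panel alone and then projecting the query samples keeps samples of unknown origin from influencing the components they are judged against. A real analysis would also need variant filtering and a way to flag samples that fall far from every reference cluster, neither of which is shown here.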

Original language: English (US)
Pages (from-to): 296-303
Number of pages: 8
Journal: Journal of Computational Biology
Volume: 28
Issue number: 3
State: Published - Mar 2021

Keywords

  • cancer GWAS
  • computational pipeline
  • population ancestry
  • principal component analysis
  • visualization

ASJC Scopus subject areas

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics
