TY - JOUR
T1 - PopInf
T2 - An Approach for Reproducibly Visualizing and Assigning Population Affiliation in Genomic Samples of Uncertain Origin
AU - Oill, Angela M. Taravella
AU - Deshpande, Anagha J.
AU - Natri, Heini M.
AU - Wilson, Melissa A.
N1 - Funding Information:
This publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM124827 to M.A.W. H.M.N. was supported by an ASU Center for Evolution and Medicine postdoctoral fellowship and the Marcia and Frank Carlucci Charitable Foundation postdoctoral award from the Prevent Cancer Foundation. A.M.T.O. was supported by The Graduate College at ASU and The Achievement Rewards for College Scientists (ARCS), Phoenix Chapter.
Publisher Copyright:
© Copyright 2020, Mary Ann Liebert, Inc.
PY - 2021/3
Y1 - 2021/3
N2 - Germline genetic variation contributes to cancer etiology, but self-reported race is not always consistent with genetic ancestry, and samples may not have identifying ancestry information. In this study, we describe a flexible computational pipeline, PopInf, to visualize principal component analysis output and assign ancestry to samples with unknown genetic ancestry, given a reference population panel of known origins. PopInf is implemented as a reproducible workflow in Snakemake with a tutorial on GitHub. We provide a preprocessed reference population panel that can be quickly and efficiently implemented in cancer genetics studies. We ran PopInf on The Cancer Genome Atlas (TCGA) liver cancer data and identified discrepancies between reported race and inferred genetic ancestry. The PopInf pipeline facilitates visualization and identification of genetic ancestry across samples, so that this ancestry can be accounted for in studies of disease risk.
KW - cancer GWAS
KW - computational pipeline
KW - population ancestry
KW - principal component analysis
KW - visualization
UR - http://www.scopus.com/inward/record.url?scp=85102123927&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102123927&partnerID=8YFLogxK
U2 - 10.1089/cmb.2019.0434
DO - 10.1089/cmb.2019.0434
M3 - Article
C2 - 33074720
AN - SCOPUS:85102123927
VL - 28
SP - 296
EP - 303
JO - Journal of Computational Biology
JF - Journal of Computational Biology
SN - 1066-5277
IS - 3
ER -