A critical assessment of storytelling: Gene ontology categories and the importance of validating genomic scans

Pavlos Pavlidis; Jeffrey D. Jensen; Wolfgang Stephan; Alexandros Stamatakis

doi:10.1093/molbev/mss136

Research output: Contribution to journal › Article › peer-review

165 Scopus citations

Abstract

In the age of whole-genome population genetics, so-called genomic scan studies often conclude with a long list of putatively selected loci. These lists are then further scrutinized to annotate these regions by gene function, corresponding biological processes, expression levels, or gene networks. Such annotations are often used to assess and/or verify the validity of the genome scan and the statistical methods that have been used to perform the analyses. Furthermore, these results are frequently considered to validate true-positives if the identified regions make biological sense a posteriori. Here, we show that this approach can be potentially misleading. By simulating neutral evolutionary histories, we demonstrate that it is possible not only to obtain an extremely high false-positive rate but also to make biological sense out of the false-positives and construct a sensible biological narrative. Results are compared with a recent polymorphism data set from Drosophila melanogaster.

Original language	English (US)
Pages (from-to)	3237-3248
Number of pages	12
Journal	Molecular Biology and Evolution
Volume	29
Issue number	10
DOIs	https://doi.org/10.1093/molbev/mss136
State	Published - Oct 2012
Externally published	Yes

Keywords

gene ontology
genome scanning
literature mining
positive selection
validation

ASJC Scopus subject areas

Ecology, Evolution, Behavior and Systematics
Molecular Biology
Genetics

Cite this

@article{2d4a379bd2c04c4e91e604e321b7461b,
  title = "A critical assessment of storytelling: Gene ontology categories and the importance of validating genomic scans",
  abstract = "In the age of whole-genome population genetics, so-called genomic scan studies often conclude with a long list of putatively selected loci. These lists are then further scrutinized to annotate these regions by gene function, corresponding biological processes, expression levels, or gene networks. Such annotations are often used to assess and/or verify the validity of the genome scan and the statistical methods that have been used to perform the analyses. Furthermore, these results are frequently considered to validate true-positives if the identified regions make biological sense a posteriori. Here, we show that this approach can be potentially misleading. By simulating neutral evolutionary histories, we demonstrate that it is possible not only to obtain an extremely high false-positive rate but also to make biological sense out of the false-positives and construct a sensible biological narrative. Results are compared with a recent polymorphism data set from Drosophila melanogaster.",
  keywords = "gene ontology, genome scanning, literature mining, positive selection, validation",
  author = "Pavlidis, Pavlos and Jensen, {Jeffrey D.} and Stephan, Wolfgang and Stamatakis, Alexandros",
  year = "2012",
  month = oct,
  doi = "10.1093/molbev/mss136",
  language = "English (US)",
  volume = "29",
  number = "10",
  pages = "3237--3248",
  journal = "Molecular Biology and Evolution",
  issn = "0737-4038",
  publisher = "Oxford University Press",
}

TY  - JOUR
T1  - A critical assessment of storytelling
T2  - Gene ontology categories and the importance of validating genomic scans
AU  - Pavlidis, Pavlos
AU  - Jensen, Jeffrey D.
AU  - Stephan, Wolfgang
AU  - Stamatakis, Alexandros
PY  - 2012/10
Y1  - 2012/10
N2  - In the age of whole-genome population genetics, so-called genomic scan studies often conclude with a long list of putatively selected loci. These lists are then further scrutinized to annotate these regions by gene function, corresponding biological processes, expression levels, or gene networks. Such annotations are often used to assess and/or verify the validity of the genome scan and the statistical methods that have been used to perform the analyses. Furthermore, these results are frequently considered to validate true-positives if the identified regions make biological sense a posteriori. Here, we show that this approach can be potentially misleading. By simulating neutral evolutionary histories, we demonstrate that it is possible not only to obtain an extremely high false-positive rate but also to make biological sense out of the false-positives and construct a sensible biological narrative. Results are compared with a recent polymorphism data set from Drosophila melanogaster.
AB  - In the age of whole-genome population genetics, so-called genomic scan studies often conclude with a long list of putatively selected loci. These lists are then further scrutinized to annotate these regions by gene function, corresponding biological processes, expression levels, or gene networks. Such annotations are often used to assess and/or verify the validity of the genome scan and the statistical methods that have been used to perform the analyses. Furthermore, these results are frequently considered to validate true-positives if the identified regions make biological sense a posteriori. Here, we show that this approach can be potentially misleading. By simulating neutral evolutionary histories, we demonstrate that it is possible not only to obtain an extremely high false-positive rate but also to make biological sense out of the false-positives and construct a sensible biological narrative. Results are compared with a recent polymorphism data set from Drosophila melanogaster.
KW  - gene ontology
KW  - genome scanning
KW  - literature mining
KW  - positive selection
KW  - validation
UR  - http://www.scopus.com/inward/record.url?scp=84866933213&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84866933213&partnerID=8YFLogxK
U2  - 10.1093/molbev/mss136
DO  - 10.1093/molbev/mss136
M3  - Article
C2  - 22617950
AN  - SCOPUS:84866933213
SN  - 0737-4038
VL  - 29
SP  - 3237
EP  - 3248
JO  - Molecular Biology and Evolution
JF  - Molecular Biology and Evolution
IS  - 10
ER  - 
