TY - JOUR
T1 - The Protein-Protein Interaction tasks of BioCreative III
T2 - Classification/ranking of articles and linking bio-ontology concepts to full text
AU - Krallinger, Martin
AU - Vazquez, Miguel
AU - Leitner, Florian
AU - Salgado, David
AU - Chatr-aryamontri, Andrew
AU - Winter, Andrew
AU - Perfetto, Livia
AU - Briganti, Leonardo
AU - Licata, Luana
AU - Iannuccelli, Marta
AU - Castagnoli, Luisa
AU - Cesareni, Gianni
AU - Tyers, Mike
AU - Schneider, Gerold
AU - Rinaldi, Fabio
AU - Leaman, Robert
AU - Gonzalez, Graciela
AU - Matos, Sergio
AU - Kim, Sun
AU - Wilbur, W. J.
AU - Rocha, Luis
AU - Shatkay, Hagit
AU - Tendulkar, Ashish V.
AU - Agarwal, Shashank
AU - Liu, Feifan
AU - Wang, Xinglong
AU - Rak, Rafal
AU - Noto, Keith
AU - Elkan, Charles
AU - Lu, Zhiyong
AU - Dogan, Rezarta I.
AU - Fontaine, Jean Fred
AU - Andrade-Navarro, Miguel A.
AU - Valencia, Alfonso
N1 - Funding Information:
The work of the CNIO task organizers (MK, MV, FL, AV) was funded by the projects BIO2007 (BIO2007-666855), CONSOLIDER (CSD2007-00050) and ENFIN (LSGH-CT-2005-518254). We would like to thank the following publishers for allowing us to use their articles for this task: Elsevier, Wiley, NPG, Rockefeller University Press, American Society for Biochemistry and Molecular Biology, American Society of Plant Biologists. We would like to thank Reverse Informatics and in particular Parthiban Srinivasan for their collaboration on the ACT. We especially thank the other BioCreative organizers for hosting the evaluation workshop, feedback and coordination. The OntoGene group (GS and FR) is partially supported by the Swiss National Science Foundation (grants 100014-118396/1 and 105315-130558/1) and by NITAS/TMS, Text Mining Services, Novartis Pharma AG, Basel, Switzerland. Additional contributors to their work are: Simon Clematide, Martin Romacker, Therese Vachon. Authors RL and GG (team 69) are supported by The Arizona Alzheimer’s Disease Data Management Core under NIH grant NIA P30 AG-19610. The work at University of Aveiro (SM, team 70) was supported by “Fundação para a Ciência e a Tecnologia”, under the research project PTDC/EIA-CCO/100541/2008. Team 73 (SK and WJW) is supported by the Intramural Research Program of the NIH, National Library of Medicine. Team 89 (SA and FLI) wants to acknowledge the support from the National Library of Medicine, grant numbers 5R01LM009836 to Hong Yu and 5R01LM010125 to Isaac Kohane. The work of team 90 was mainly supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC project BB/G013160/1, Automated Biological Event Extraction from the Literature for Drug Discovery), and the National Centre for Text Mining is supported by the UK Joint Information Systems Committee (JISC). Team 100 (ZL and RID) is grateful to other team members not listed as authors (M. Huang, A. Neveol, and Y. Yang) for their contribution to the tasks. Their work is supported by the Intramural Research Program of the NIH, National Library of Medicine. Team 104’s (JFF and MAAN) work was funded within the framework of the Medical Genome Research Programme NGFN-Plus by the German Ministry of Education and Research (reference number: 01GS08170), and by the Helmholtz Alliance in Systems Biology (Germany). GC was supported by the Telethon foundation and AIRC. KN would like to acknowledge funding from NIH R01 grant number GM077402. DS would like to acknowledge the participation in the MyMiner project of Marc Depaule, Elodie Drula and Christophe Marcelle. The MyMiner project was supported by the MyoRes European Network of Excellence, dedicated to the study of normal and aberrant muscle development, function and repair, and by the French Association against Myopathies (AFM). This article has been published as part of BMC Bioinformatics Volume 12 Supplement 8, 2011: The Third BioCreative - Critical Assessment of Information Extraction in Biology Challenge. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S8.
PY - 2011/10/3
Y1 - 2011/10/3
N2 - Background: Determining the usefulness of biomedical text mining systems requires realistic task definitions and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation or measuring time savings by using automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. For this purpose the BCIII-ACT corpus was provided, which includes training, development and test sets of over 12,000 PPI-relevant and non-relevant PubMed abstracts labeled manually by domain experts, together with the recorded human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full text articles and interaction detection method ontology concepts that had been applied to detect the PPIs reported in them. Results: A total of 11 teams participated in at least one of the two PPI tasks (10 in the ACT and 8 in the IMT) and a total of 62 persons were involved either as participants or in preparing data sets/evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. From the 52 runs submitted for the ACT, the highest Matthews Correlation Coefficient (MCC) score measured was 0.55 at an accuracy of 89% and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods; some of them also used lexical resources like MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, and some integrated NER approaches. For the IMT, a total of 42 runs were evaluated by comparing systems against manually generated annotations done by curators from the BioGRID and MINT databases. The highest AUC iP/R achieved by any run was 53%, the best MCC score 0.55. In the case of competitive systems with an acceptable recall (above 35%), the macro-averaged precision ranged between 50% and 80%, with a maximum F-Score of 55%. Conclusions: The results of the ACT task of BioCreative III indicate that classification of large unbalanced article collections reflecting the real class imbalance is still challenging. Nevertheless, text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than 1/4 of the time when compared to unranked results. Detecting associations between full text articles and interaction detection method PSI-MI terms (IMT) is more difficult than might be anticipated. This is due to the variability of method term mentions, errors resulting from pre-processing of articles provided as PDF files, and the heterogeneity and different granularity of method term concepts encountered in the ontology. However, combining the sophisticated techniques developed by the participants with supporting evidence strings derived from the articles for human interpretation could result in practical modules for biological annotation workflows.
UR - http://www.scopus.com/inward/record.url?scp=80053423937&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053423937&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-12-S8-S3
DO - 10.1186/1471-2105-12-S8-S3
M3 - Article
C2 - 22151929
AN - SCOPUS:80053423937
VL - 12
JO - BMC Bioinformatics
JF - BMC Bioinformatics
SN - 1471-2105
IS - SUPPL. 8
M1 - S3
ER -