The Protein-Protein Interaction tasks of BioCreative III

Classification/ranking of articles and linking bio-ontology concepts to full text

Martin Krallinger, Miguel Vazquez, Florian Leitner, David Salgado, Andrew Chatr-aryamontri, Andrew Winter, Livia Perfetto, Leonardo Briganti, Luana Licata, Marta Iannuccelli, Luisa Castagnoli, Gianni Cesareni, Mike Tyers, Gerold Schneider, Fabio Rinaldi, Robert Leaman, Graciela Gonzalez, Sergio Matos, Sun Kim, W. J. Wilbur, Luis Rocha, Hagit Shatkay, Ashish V. Tendulkar, Shashank Agarwal, Feifan Liu, Xinglong Wang, Rafal Rak, Keith Noto, Charles Elkan, Zhiyong Lu, Rezarta I. Dogan, Jean Fred Fontaine, Miguel A. Andrade-Navarro, Alfonso Valencia

Research output: Contribution to journal > Article

83 Citations (Scopus)

Abstract

Background: Determining the usefulness of biomedical text mining systems requires realistic task definitions and data selection criteria without artificial constraints, together with performance measures that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations and tried to address how the end user would oversee the generated output, for instance by providing ranked results and textual evidence for human interpretation, or by measuring the time saved by using automated systems. Detecting articles that describe complex biological events such as PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. For this purpose, the BCIII-ACT corpus was provided; it comprises training, development and test sets totaling over 12,000 PPI-relevant and non-relevant PubMed abstracts, labeled manually by domain experts, and also records the human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full text articles and the interaction detection method ontology concepts that had been applied to detect the PPIs reported in them.

Results: A total of 11 teams participated in at least one of the two PPI tasks (10 in the ACT and 8 in the IMT), and a total of 62 people were involved, either as participants or in preparing data sets and evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. Of the 52 runs submitted for the ACT, the highest Matthews correlation coefficient (MCC) score measured was 0.55 at an accuracy of 89%, and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods; some also used lexical resources such as MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, and some integrated NER approaches. For the IMT, a total of 42 runs were evaluated by comparing systems against manual annotations generated by curators from the BioGRID and MINT databases. The highest AUC iP/R achieved by any run was 53%, and the best MCC score was 0.55. For competitive systems with an acceptable recall (above 35%), the macro-averaged precision ranged between 50% and 80%, with a maximum F-score of 55%.

Conclusions: The ACT results of BioCreative III indicate that classifying large unbalanced article collections that reflect the real class imbalance is still challenging. Nevertheless, text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than a quarter of the time required with unranked results. Detecting associations between full text articles and interaction detection method PSI-MI terms (IMT) is more difficult than might be anticipated. This is due to the variability of method term mentions, errors resulting from pre-processing of articles provided as PDF files, and the heterogeneity and differing granularity of the method term concepts encountered in the ontology. However, combining the sophisticated techniques developed by the participants with supporting evidence strings derived from the articles for human interpretation could result in practical modules for biological annotation workflows.
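The abstract reports accuracy, MCC and AUC iP/R side by side. The following Python sketch illustrates, under stated assumptions, why these can diverge on an unbalanced collection. It is not the official BioCreative evaluation code: the function names and all counts are hypothetical, and `auc_ipr` assumes the common reading of AUC iP/R as the area under the interpolated precision/recall curve of a ranked list, where the interpolated precision at a given recall is the highest precision reached at that recall or any higher one.

```python
import math


def accuracy(tp, fp, tn, fn):
    """Fraction of all items that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc_ipr(ranked_labels):
    """Area under the interpolated precision/recall curve.

    ranked_labels: booleans ordered from the top-ranked item downwards,
    True meaning the item is relevant (assumed definition, see lead-in).
    """
    total_pos = sum(ranked_labels)
    if total_pos == 0:
        return 0.0
    # Precision measured at each rank where a relevant item is found.
    precisions = []
    hits = 0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    # Walk from the highest recall down, carrying the best precision seen,
    # so every recall step of width 1/total_pos uses interpolated precision.
    area, best = 0.0, 0.0
    for precision in reversed(precisions):
        best = max(best, precision)
        area += best / total_pos
    return area


# Hypothetical unbalanced set: 15 relevant and 85 non-relevant abstracts,
# scored for a trivial classifier that labels everything as non-relevant.
print(accuracy(tp=0, fp=0, tn=85, fn=15))
print(mcc(tp=0, fp=0, tn=85, fn=15))

# Hypothetical ranked output with three relevant items among four.
print(auc_ipr([True, False, True, True]))
```

In this toy setting the trivial classifier is correct on most items yet has an MCC of zero, which is one reason a correlation-based score is informative when the classes are unbalanced.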

Original language: English (US)
Article number: S3
Journal: BMC Bioinformatics
Volume: 12
Issue number: SUPPL. 8
DOI: https://doi.org/10.1186/1471-2105-12-S8-S3
State: Published - Oct 3 2011

Fingerprint

Protein-protein Interaction
Linking
Ontology
Ranking
Proteins
Interaction
Data Mining
Text Mining
Correlation coefficient
Area Under Curve
Annotation
Term
Concepts
Text
Competitive System
Workflow
Test Set
Error term
Granularity
Macros

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
  • Structural Biology

Cite this

Krallinger, M., Vazquez, M., Leitner, F., Salgado, D., Chatr-aryamontri, A., Winter, A., ... Valencia, A. (2011). The Protein-Protein Interaction tasks of BioCreative III: Classification/ranking of articles and linking bio-ontology concepts to full text. BMC Bioinformatics, 12(SUPPL. 8), [S3]. https://doi.org/10.1186/1471-2105-12-S8-S3

@article{2a68fe6ef8ec48189e12e5ac089972bf,
title = "The Protein-Protein Interaction tasks of BioCreative III: Classification/ranking of articles and linking bio-ontology concepts to full text",
author = "Martin Krallinger and Miguel Vazquez and Florian Leitner and David Salgado and Andrew Chatr-aryamontri and Andrew Winter and Livia Perfetto and Leonardo Briganti and Luana Licata and Marta Iannuccelli and Luisa Castagnoli and Gianni Cesareni and Mike Tyers and Gerold Schneider and Fabio Rinaldi and Robert Leaman and Graciela Gonzalez and Sergio Matos and Sun Kim and Wilbur, {W. J.} and Luis Rocha and Hagit Shatkay and Tendulkar, {Ashish V.} and Shashank Agarwal and Feifan Liu and Xinglong Wang and Rafal Rak and Keith Noto and Charles Elkan and Zhiyong Lu and Dogan, {Rezarta I.} and Fontaine, {Jean Fred} and Andrade-Navarro, {Miguel A.} and Alfonso Valencia",
year = "2011",
month = "10",
day = "3",
doi = "10.1186/1471-2105-12-S8-S3",
language = "English (US)",
volume = "12",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "SUPPL. 8",

}

TY - JOUR

T1 - The Protein-Protein Interaction tasks of BioCreative III

T2 - Classification/ranking of articles and linking bio-ontology concepts to full text

AU - Krallinger, Martin

AU - Vazquez, Miguel

AU - Leitner, Florian

AU - Salgado, David

AU - Chatr-aryamontri, Andrew

AU - Winter, Andrew

AU - Perfetto, Livia

AU - Briganti, Leonardo

AU - Licata, Luana

AU - Iannuccelli, Marta

AU - Castagnoli, Luisa

AU - Cesareni, Gianni

AU - Tyers, Mike

AU - Schneider, Gerold

AU - Rinaldi, Fabio

AU - Leaman, Robert

AU - Gonzalez, Graciela

AU - Matos, Sergio

AU - Kim, Sun

AU - Wilbur, W. J.

AU - Rocha, Luis

AU - Shatkay, Hagit

AU - Tendulkar, Ashish V.

AU - Agarwal, Shashank

AU - Liu, Feifan

AU - Wang, Xinglong

AU - Rak, Rafal

AU - Noto, Keith

AU - Elkan, Charles

AU - Lu, Zhiyong

AU - Dogan, Rezarta I.

AU - Fontaine, Jean Fred

AU - Andrade-Navarro, Miguel A.

AU - Valencia, Alfonso

PY - 2011/10/3

Y1 - 2011/10/3

UR - http://www.scopus.com/inward/record.url?scp=80053423937&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80053423937&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-12-S8-S3

DO - 10.1186/1471-2105-12-S8-S3

M3 - Article

VL - 12

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - SUPPL. 8

M1 - S3

ER -