Extractive summarization using cohesion network analysis and submodular set functions

Valentin Sergiu Cioaca, Mihai Dascalu, Danielle S. McNamara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Numerous approaches have been introduced to automate the process of text summarization, but only a few can be easily adapted to multiple languages. This paper introduces a multilingual text processing pipeline integrated in the open-source ReaderBench framework, which can be retrofitted to cover more than 50 languages. Given the need for extensibility and the lack of labeled training data for languages other than English, an unsupervised algorithm was preferred to perform extractive summarization (i.e., selecting the most representative sentences from the original document). Specifically, two different approaches relying on text cohesion were implemented: a) a graph-based text representation derived from Cohesion Network Analysis that extends TextRank, and b) a class of submodular set functions. Evaluations were performed on the DUC dataset, with the Gensim implementation of TextRank as the baseline. Our results using the submodular set functions outperform the baseline. In addition, two use cases in English and Romanian are presented, with corresponding graphical representations for the two methods.
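To illustrate the second approach named in the abstract, the following is a minimal sketch of extractive summarization by greedy maximization of a monotone submodular set function. It is not the authors' implementation: the facility-location coverage objective, the Jaccard word-overlap similarity (a stand-in for the cohesion measures the paper relies on), and the names `tokenize`, `jaccard`, `coverage`, and `greedy_summary` are all assumptions made for illustration.

```python
from typing import List, Set


def tokenize(sentence: str) -> Set[str]:
    """Lowercase bag of words with surrounding punctuation stripped (illustrative only)."""
    return {w.strip(".,;:!?()\"'").lower() for w in sentence.split()} - {""}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity; an assumed stand-in for a cohesion-based measure."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def coverage(selected: List[int], sim: List[List[float]]) -> float:
    """Facility-location objective: each sentence is credited with its
    best similarity to any selected sentence. Monotone and submodular."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_summary(sentences: List[str], budget: int) -> List[int]:
    """Pick up to `budget` sentence indices, adding at each step the
    sentence with the largest marginal gain in coverage."""
    bags = [tokenize(s) for s in sentences]
    sim = [[jaccard(a, b) for b in bags] for a in bags]
    selected: List[int] = []
    while len(selected) < min(budget, len(sentences)):
        current = coverage(selected, sim)
        best_gain, best_idx = -1.0, -1
        for i in range(len(sentences)):
            if i in selected:
                continue
            gain = coverage(selected + [i], sim) - current
            if gain > best_gain:
                best_gain, best_idx = gain, i
        selected.append(best_idx)
    # Return indices in document order so the summary reads naturally.
    return sorted(selected)


if __name__ == "__main__":
    doc = [
        "The cat sat on the mat.",
        "The cat sat on the mat today.",
        "Stock markets fell sharply.",
    ]
    for idx in greedy_summary(doc, budget=2):
        print(doc[idx])
```

Because the objective rewards covering sentences that are not yet well represented, the greedy step tends to skip near-duplicates of sentences already chosen. For monotone submodular objectives under a cardinality budget, this greedy procedure is known to achieve at least a (1 - 1/e) fraction of the optimal objective value.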

Original language: English (US)
Title of host publication: Proceedings - 2020 22nd International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 161-168
Number of pages: 8
ISBN (Electronic): 9781728176284
State: Published - Sep 2020
Event: 22nd International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2020 - Virtual, Timisoara, Romania
Duration: Sep 1 2020 to Sep 4 2020

Publication series

Name: Proceedings - 2020 22nd International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2020

Conference

Conference: 22nd International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2020
Country/Territory: Romania
City: Virtual, Timisoara
Period: 9/1/20 to 9/4/20

Keywords

  • Cohesion Network Analysis
  • Extractive summarization
  • SpaCy framework
  • Submodular functions
  • TextRank
  • Word Mover's Distance

ASJC Scopus subject areas

  • Computer Science Applications
  • Computational Mathematics
  • Modeling and Simulation
  • Numerical Analysis
