How important is size? An investigation of corpus size and meaning in both Latent Semantic Analysis and Latent Dirichlet Allocation

Scott A. Crossley, Mihai Dascalu, Danielle McNamara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Scopus citations

Abstract

This study examines how differences in corpus size influence the accuracy of Latent Semantic Analysis (LSA) spaces and Latent Dirichlet Allocation (LDA) spaces in two tasks: a word association task and a vocabulary definition test. Specific optimizations were considered in building each semantic model. Initial results indicate that larger corpora lead to greater accuracy and that LDA probabilistic models, similar to LSA vector spaces, can provide insights into cognitive processing at semantic levels.

Original language: English (US)
Title of host publication: FLAIRS 2017 - Proceedings of the 30th International Florida Artificial Intelligence Research Society Conference
Editors: Vasile Rus, Zdravko Markov
Publisher: AAAI Press
Pages: 293-296
Number of pages: 4
ISBN (Electronic): 9781577357872
State: Published - 2017
Event: 30th International Florida Artificial Intelligence Research Society Conference, FLAIRS 2017 - Marco Island, United States
Duration: May 22 2017 to May 24 2017

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
