How important is size? An investigation of corpus size and meaning in both Latent Semantic Analysis and Latent Dirichlet Allocation

Scott A. Crossley, Mihai Dascalu, Danielle McNamara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

This study examines how differences in corpus size influence the accuracy of Latent Semantic Analysis (LSA) spaces and Latent Dirichlet Allocation (LDA) spaces in two tasks: a word association task and a vocabulary definition test. Specific optimizations were considered in building each semantic model. Initial results indicate that larger corpora lead to greater accuracy and that LDA probabilistic models, similar to LSA vector spaces, can provide insights into cognitive processing at semantic levels.

Original language: English (US)
Title of host publication: FLAIRS 2017 - Proceedings of the 30th International Florida Artificial Intelligence Research Society Conference
Publisher: AAAI Press
Pages: 293-296
Number of pages: 4
ISBN (Electronic): 9781577357872
Publication status: Published - 2017
Event: 30th International Florida Artificial Intelligence Research Society Conference, FLAIRS 2017 - Marco Island, United States
Duration: May 22 2017 to May 24 2017

Other

Other: 30th International Florida Artificial Intelligence Research Society Conference, FLAIRS 2017
Country: United States
City: Marco Island
Period: 5/22/17 to 5/24/17

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software

Cite this

Crossley, S. A., Dascalu, M., & McNamara, D. (2017). How important is size? An investigation of corpus size and meaning in both Latent Semantic Analysis and Latent Dirichlet Allocation. In FLAIRS 2017 - Proceedings of the 30th International Florida Artificial Intelligence Research Society Conference (pp. 293-296). AAAI Press.