Latent Semantic Analysis (LSA) is a statistical model of word usage that has been used for a variety of applications. One of these applications is the quantitative assessment of the semantic content of written text. While LSA scores have correlated well with the qualitative ratings of human experts, it is unclear what aspect of knowledge is being reflected in an LSA output. The two experiments presented here were designed to address this general question. We were particularly interested in whether an LSA analysis more accurately reflects the factual or the conceptual knowledge contained in written material. Experiment 1 explored this issue by comparing LSA analyses of essays to human-generated scores. It also compared the LSA output to several measures of conceptual structure. Experiment 2 correlated LSA analyses of transcribed recall protocols with a series of comprehension measures that were designed to vary in the degree to which they reflect conceptual or factual knowledge. We found compelling evidence that LSA analyses reflect the text-based knowledge represented in essays and recall protocols more strongly than conceptual knowledge. Both studies also explored a methodological issue pertaining to the use of LSA. Specifically, does LSA have to be "trained" in the particular content area of the text to be analyzed? This question was addressed by running multiple LSA analyses, each performed with a different "semantic space" created through training on domain-specific or general content. We found that LSA performed best when trained in a content area specific to the material to be analyzed. These results are discussed with respect to the application of LSA analyses in the classroom and laboratory.
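The abstract does not include any implementation detail, but the core LSA pipeline it relies on — building a term-document matrix from a training corpus (the "semantic space"), reducing it with a truncated SVD, and comparing new texts by cosine similarity after folding them into that space — can be sketched as follows. Everything here is illustrative: the function names, the toy corpus, and the parameter `k` are assumptions for the sketch, not material from the study.

```python
import numpy as np

def build_semantic_space(docs, k=2):
    """Train an LSA 'semantic space': raw term-document counts
    followed by a rank-k truncated SVD (A ~ U_k S_k V_k^T)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    U, S, _ = np.linalg.svd(A, full_matrices=False)
    return index, U[:, :k], S[:k]

def embed(text, index, Uk, Sk):
    """Fold a new text into the space: d_hat = S_k^{-1} U_k^T d.
    Words unseen during training are simply ignored."""
    v = np.zeros(len(index))
    for w in text.lower().split():
        if w in index:
            v[index[w]] += 1.0
    return (Uk.T @ v) / Sk

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A toy "training corpus" standing in for a domain-specific space.
corpus = [
    "the cat purrs",
    "the feline purrs",
    "the dog barks",
    "the canine barks",
]
index, Uk, Sk = build_semantic_space(corpus, k=2)

# The hallmark of LSA: terms that never co-occur but share contexts
# ("cat" / "feline") end up close together in the reduced space.
sim_cat_feline = cosine(embed("cat", index, Uk, Sk),
                        embed("feline", index, Uk, Sk))
sim_cat_dog = cosine(embed("cat", index, Uk, Sk),
                     embed("dog", index, Uk, Sk))
```

In an essay-scoring application, the same `cosine` comparison would be run between a student essay and one or more reference texts, with the semantic space trained on a larger corpus — domain-specific or general, the choice the abstract's second question is about.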