AutoTutor's Coverage of Expectations during Tutorial Dialogue
The Context Dependent Sentence Abstraction (CDSA) model and Latent Semantic Analysis (LSA) were compared in their ability to predict sentence similarity. Evidence supports the conclusion that the CDSA model better predicts human ratings for short phrases and sentences than does LSA. Alternative theoretical reasons are given for this finding.

Introduction

Researchers in many disciplines within cognitive science have proposed and tested theoretical claims about the meaning of natural language expressions. One of the contemporary models is Latent Semantic Analysis (LSA; Landauer & Dumais, 1997). LSA is a statistical, corpus-based technique for representing world knowledge. It computes similarity comparisons between words or documents by capitalizing on the fact that words are similar when they are surrounded by similar words (i.e., the company a word keeps).

LSA takes quantitative information about co-occurrences of words in documents (paragraphs and sentences) and translates this into a K-dimensional space. The input of LSA is a large co-occurrence matrix that specifies the frequency of words in documents. LSA reduces each document and word into a lower dimensional space by using singular value decomposition. This way, the initially extremely large word-by-document co-occurrence matrix is typically reduced to about 300 dimensions. Each word ends up being a K-dimensional vector. The semantic relationship between words can be estimated by taking the cosine (normalized dot product) between two vectors.

Although LSA performance has been shown to be impressive at the paragraph level (Foltz, Gilliam, & Kendall, 2000; Landauer, Laham, Rehder, & Schreiner, 1997), other research has found limitations of LSA at the sentence level (Kintsch, 2001). In this paper we will present the Context Dependent Sentence Abstraction (CDSA) model, a corpus-based model that builds sentence meanings based on combinations of pooled adjacent neighbors of individual words.
We first discuss a weakness of vector-based representational systems (e.g., LSA) in handling sentence comprehension, then describe the CDSA model and present evidence supporting it.
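
The LSA steps described above (word-by-document count matrix, singular value decomposition, cosine between word vectors) can be illustrated with a minimal sketch. The toy vocabulary, the counts, the helper names `lsa_word_vectors` and `cosine`, and the choice of K = 2 are all assumptions made for illustration; they are not taken from the studies cited here. Practical LSA spaces use large corpora, about 300 dimensions, and usually a term-weighting step that is omitted in this sketch.

```python
import numpy as np


def lsa_word_vectors(counts, k):
    """Map each row (word) of a word-by-document count matrix to a k-dimensional vector."""
    # Thin SVD: counts = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    # Keep the k largest singular values and scale the word coordinates by them.
    return U[:, :k] * s[:k]


def cosine(a, b):
    """Normalized dot product between two vectors (0.0 if either vector is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Hypothetical toy data: rows are words, columns are documents.
vocab = ["doctor", "nurse", "hospital", "guitar", "drum"]
counts = np.array([
    [2, 1, 0, 0],   # doctor
    [1, 2, 0, 0],   # nurse
    [1, 1, 0, 0],   # hospital
    [0, 0, 2, 1],   # guitar
    [0, 0, 1, 2],   # drum
], dtype=float)

vectors = lsa_word_vectors(counts, k=2)
word_vec = dict(zip(vocab, vectors))

# Words that occur in the same documents should end up close together;
# words that never share a document should not.
sim_related = cosine(word_vec["doctor"], word_vec["nurse"])
sim_unrelated = cosine(word_vec["doctor"], word_vec["guitar"])
print(f"doctor~nurse:  {sim_related:.3f}")
print(f"doctor~guitar: {sim_unrelated:.3f}")
```

In this toy space, words that appear in the same documents receive a much higher cosine than words that never co-occur, which is the property the paragraph-level results rely on. The sketch compares single word vectors only and does not address how a sentence meaning should be composed.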