Lexical Features Are More Vulnerable, Syntactic Features Have More Predictive Power

Jekaterina Novikova, Aparna Balagopalan, Ksenia Shkaruta, Frank Rudzicz
Understanding the vulnerability of linguistic features extracted from noisy text is important both for developing better health text classification models and for interpreting vulnerabilities of natural language models. [...] Results are validated across three datasets representing different text-classification tasks, with different levels of lexical and syntactic complexity of both conversational and written language.
Impact of ASR on Alzheimer’s Disease Detection: All Errors are Equal, but Deletions are More Equal than Others
This paper finds that deletion errors affect detection performance the most, owing to their impact on features of syntactic complexity and discourse representation in speech, and proposes assigning a higher penalty to deletion errors in order to improve dementia detection performance.
Robustness and Sensitivity of BERT Models Predicting Alzheimer’s Disease from Text
This paper analyzes how a controlled amount of desired and undesired text alterations impacts performance of BERT and shows that BERT is robust to natural linguistic variations in text and not sensitive to removing clinically important information from text.


Computational Methods for Corpus Annotation and Analysis
This volume provides language and linguistics researchers with an accessible introduction to the state-of-the-art NLP technology that facilitates automatic annotation and analysis of large text corpora at both shallow and deep linguistic levels.
Detecting cognitive impairments by agreeing on interpretations of linguistic features
This paper proposes Consensus Networks (CNs), a framework that classifies after reaching agreement between modalities and significantly outperforms the traditional classifiers used in state-of-the-art papers.
INSIGHT Galway: Syntactic and Lexical Features for Aspect Based Sentiment Analysis
This work analyses various syntactic and lexical features for sentence level aspect based sentiment analysis and reports accuracies which are much higher than the provided baselines.
Automated classification of primary progressive aphasia subtypes from narrative speech transcripts
This study presents a method for evaluating and classifying connected speech in primary progressive aphasia using computational techniques and achieves accuracies well above baseline on the three binary classification tasks.
Automatic analysis of syntactic complexity in second language writing
  • X. Lu
  • Computer Science
  • 2010
The system takes a written language sample as input and produces fourteen indices of the sample's syntactic complexity. The underlying measures are designed with advanced second language proficiency research in mind, and the system was developed and evaluated using college-level second language writing data from the Written English Corpus of Chinese Learners.
The relation between content and structure in language production: An analysis of speech errors in semantic dementia
The study presents the first evidence that SD patients have problems with closed class items and make syntactic as well as semantic speech errors, although these grammatical abnormalities are mostly subtle rather than gross.
Detecting and Correcting Syntactic Errors in Machine Translation Using Feature-Based Lexicalized Tree Adjoining Grammars
This paper presents a novel approach for detecting and correcting ungrammatical translations of six single statistical machine translation systems, and proposes a new unification method that allows the unification procedure to continue when unification fails and propagates the failure information to relevant words.
Learning Word Vectors for Sentiment Analysis
This work presents a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term-document information as well as rich sentiment content, and finds that it outperforms several previously introduced methods for sentiment classification.
Spoken Language Derived Measures for Detecting Mild Cognitive Impairment
The results indicate that using multiple, complementary measures can aid in automatic detection of MCI, and demonstrate a statistically significant improvement in the area under the ROC curve (AUC) when using automatic spoken language derived features in addition to the neuropsychological test scores.
Lexical richness in the spontaneous speech of bilinguals
The focus of the present paper is on the measurement of lexical richness. Lexical richness is often measured either by the traditional type-token ratio (TTR) or by its square root variant, the index
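As a rough illustration of the lexical richness measures mentioned in the entry above, the following minimal sketch computes the type-token ratio (distinct words divided by total words) and a square-root variant. The function names and the simple regex tokenizer are hypothetical, and the square-root variant is assumed here to be types divided by the square root of tokens; the truncated summary does not spell out its exact definition.

```python
import math
import re


def tokenize(text):
    # Assumption: lowercase the text and keep runs of letters/apostrophes.
    # Real studies typically use a proper tokenizer and lemmatization.
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    # TTR = number of distinct word types / number of tokens.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def root_type_token_ratio(tokens):
    # Assumed square-root variant: types / sqrt(tokens),
    # intended to reduce TTR's sensitivity to sample length.
    return len(set(tokens)) / math.sqrt(len(tokens)) if tokens else 0.0


sample = "the cat sat on the mat and the dog sat too"
tokens = tokenize(sample)
print(f"TTR:      {type_token_ratio(tokens):.3f}")
print(f"Root TTR: {root_type_token_ratio(tokens):.3f}")
```

Plain TTR tends to fall as a text gets longer, because frequent words repeat; dividing by the square root of the token count is one common way to soften that length effect.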