MalwareTextDB: A Database for Annotated Malware Articles
Cybersecurity risks and malware threats are becoming increasingly dangerous and common. Despite the severity of the problem, there have been few NLP efforts focused on tackling cybersecurity. In this…
  • 20
  • 5
Labeling Gaps Between Words: Recognizing Overlapping Mentions with Mention Separators
In this paper, we propose a new model that is capable of recognizing overlapping mentions. We introduce a novel notion of mention separators that can be effectively used to capture how mentions…
  • 36
  • 4
Learning to Recognize Discontiguous Entities
This paper focuses on the study of recognizing discontiguous entities. Motivated by a previous work, we propose to use a novel hypergraph representation to jointly encode discontiguous entities of…
  • 15
  • 1
Weak Semi-Markov CRFs for Noun Phrase Chunking in Informal Text
This paper introduces a new annotated corpus based on an existing informal text corpus: the NUS SMS Corpus (Chen and Kan, 2013). The new corpus includes 76,490 noun phrases from 26,500 SMS messages,…
  • 13
  • 1
Efficient Dependency-Guided Named Entity Recognition
Named entity recognition (NER), which focuses on the extraction of semantically meaningful named entities and their semantic classes from text, serves as an indispensable component for several…
  • 13
Low-resource Cross-lingual Event Type Detection via Distant Supervision with Minimal Effort
The use of machine learning for NLP generally requires resources for training. Tasks performed in a low-resource language usually rely on labeled data in another, typically resource-rich, language. …
  • 6
The ARIEL-CMU Systems for LoReHLT18
This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL),…
  • 4
Analyzing Incorporation of Emotion in Emoji Prediction
In this work, we investigate the impact of incorporating emotion classes on the task of predicting emojis from Twitter texts. More specifically, we first show that there is a correlation between the…
  • 1
Neural Polysynthetic Language Modelling
Research in natural language processing commonly assumes that approaches that work well for English and other widely-used languages are "language agnostic". In high-resource languages, especially…
Materials for Learning to Recognize Discontiguous Entities
This is the supplementary material for “Learning to Recognize Discontiguous Entities” [Muis and Lu, 2016]. This material gives more details on the experiment setup, the ambiguity of each model, and…