Corpus ID: 7233052

Using a Probabilistic Model of Context to Detect Word Obfuscation

@inproceedings{Jabbari2008UsingAP,
  title={Using a Probabilistic Model of Context to Detect Word Obfuscation},
  author={Sanaz Jabbari and Ben Allison and Louise Guthrie},
  booktitle={LREC},
  year={2008}
}
This paper proposes a distributional model of word use and word meaning which is derived purely from a body of text, and then applies this model to determine whether certain words are used in or out of context. We suggest that we can view the contexts of words as multinomially distributed random variables. We illustrate how using this basic idea, we can formulate the problem of detecting whether or not a word is used in context as a likelihood ratio test. We also define a measure of semantic… 
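The abstract's core idea, treating a word's contexts as multinomially distributed and testing whether an observed context is better explained by that word's own context model than by general language, can be sketched as follows. The add-alpha smoothing, vocabulary size, and toy counts are illustrative assumptions, not the paper's actual estimates.

```python
import math
from collections import Counter

def multinomial_loglik(context, counts, vocab_size, alpha=1.0):
    """Add-alpha smoothed log-likelihood of a bag of context words
    under the multinomial model given by `counts`."""
    total = sum(counts.values()) + alpha * vocab_size
    return sum(math.log((counts.get(w, 0) + alpha) / total) for w in context)

def llr_in_context(context, word_counts, bg_counts, vocab_size=10_000):
    """Likelihood ratio test: positive scores mean the context is better
    explained by the word's own context model than by background language."""
    return (multinomial_loglik(context, word_counts, vocab_size)
            - multinomial_loglik(context, bg_counts, vocab_size))

# Toy models; all counts are made up for illustration.
background = Counter({"the": 50, "flight": 2, "cake": 2, "recipe": 2, "delayed": 2})
bomb_ctx_model = Counter({"flight": 5, "explode": 4, "airport": 6, "threat": 5})

print(llr_in_context(["flight", "delayed", "airport"], bomb_ctx_model, background))  # > 0: fits the word's contexts
print(llr_in_context(["cake", "recipe"], bomb_ctx_model, background))                # < 0: word looks out of context
```

A word used as an obfuscating substitute (e.g. "bomb" written as "cake") would score low under its surface word's context model, which is the signal the likelihood ratio test exploits.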

Tables from this paper

Citations

Using common-sense knowledge-base for detecting word obfuscation in adversarial communication
  • Swati Agarwal, A. Sureka
  • Computer Science
    2015 7th International Conference on Communication Systems and Networks (COMSNETS)
  • 2015
TLDR
This work presents a solution approach that exploits the vast amount of semantic knowledge in ConceptNet to address the technically challenging problem of word substitution in adversarial communication; results reveal that the proposed approach is effective.
Investigating the Application of Common-Sense Knowledge-Base for Identifying Term Obfuscation in Adversarial Communication
TLDR
This work uses ConceptNet to compute the conceptual similarity between any two given terms and define a Mean Average Conceptual Similarity (MACS) metric to identify out-of-context terms.
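The MACS idea summarized above, flagging the term whose mean similarity to the other terms in a sentence is lowest, can be sketched as below. The `toy_sim` function is a character-overlap stand-in for ConceptNet conceptual similarity, which this listing does not define; names and scoring details are assumptions for illustration.

```python
def mean_avg_similarity(terms, sim):
    """Mean similarity of each term to every other term in the sentence."""
    return {t: sum(sim(t, u) for j, u in enumerate(terms) if j != i) / (len(terms) - 1)
            for i, t in enumerate(terms)}

def out_of_context_term(terms, sim):
    """The term with the lowest mean similarity is flagged as out of context."""
    scores = mean_avg_similarity(terms, sim)
    return min(scores, key=scores.get)

# Toy stand-in similarity: Jaccard overlap of character sets (NOT ConceptNet).
def toy_sim(a, b):
    return len(set(a) & set(b)) / len(set(a) | set(b))

print(out_of_context_term(["cat", "cart", "chat", "xylophone"], toy_sim))
```

With a real conceptual-similarity backend, the same min-over-means structure would flag a substituted term whose concept is unrelated to the rest of the sentence.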
Performance Analysis of Different Sentence Oddity Measures Applied on Google and Google News Repository for Detection of Substitution
TLDR
This paper applies several sentence-oddity measures to different types of documents (general data and Google News data) to detect word substitution, and compares the performance of these measures.
Code Word Detection in Fraud Investigations using a Deep-Learning Approach
TLDR
It is demonstrated that deep neural language models can reliably be applied in fraud investigations for the detection of code words and it is shown that the state-of-the-art BERT model significantly outperforms other methods on this task.
Extraction of opinionated profiles from comments on web news
TLDR
Opinion Mining is a specific case of text mining that deals with the extraction of opinions; the classifier captures some notion of opinion, though many different avenues remain for improving the final results.
Computational approaches to the comparison of regional variety corpora: prototyping a semi-automatic system for German
Regional varieties of pluri-centric languages such as German are generally very similar with respect to their structure and the linguistic phenomena that occur. The extraction of differences is thus…
COLING 2018 The 27th International Conference on Computational Linguistics Proceedings of the First Workshop on Natural Language Processing for Internet Freedom (NLP4IF-2018)
  • Economics
  • 2018
Censorship of Internet content in China is understood to operate through a system of intermediary liability whereby service providers are liable for the content on their platforms. Previous work…
Creative Language Encoding under Censorship
TLDR
This position paper systematically categorizes human-created obfuscated language at various levels, investigates its basic mechanisms, gives an overview of the automated techniques needed to simulate human encoding, and summarizes remaining challenges for future research on the interaction between Natural Language Processing and encryption.
Automatic extraction of mobility activities in microblogs
TLDR
This dissertation evaluated a random sample of messages from Twitter, classifying each as containing mobility activities or not; the results were a precision of 82.7% and a recall of 62%, meaning that precision was prioritized over recall.

References

Showing 1-10 of 25 references
Lexical chains as representations of context for the detection and correction of malapropisms
TLDR
It is shown how lexical chains can be constructed by means of WordNet, and how they can be applied to one particular linguistic task: the detection and correction of malapropisms.
KU: Word Sense Disambiguation by Substitution
  • Deniz Yuret
  • Computer Science
    Fourth International Workshop on Semantic Evaluations (SemEval-2007)
  • 2007
TLDR
Describes a WSD system that uses a statistical language model, built from a large unannotated corpus, to evaluate the likelihood of various substitutes for a word in a given context and thereby determine the best sense for the word in novel contexts.
An introduction to latent semantic analysis
TLDR
The adequacy of LSA's reflection of human knowledge has been established in a variety of ways, for example, its scores overlap those of humans on standard vocabulary and subject matter tests; it mimics human word sorting and category judgments; it simulates word‐word and passage‐word lexical priming data.
Automatic Word Sense Discrimination
TLDR
This paper presents context-group discrimination, a disambiguation algorithm based on clustering, and demonstrates its good performance on a sample of natural and artificial ambiguous words.
Detecting Word Substitution in Adversarial Communication
TLDR
This paper considers ways to detect replacements that have similar frequencies to the original words, and considers the frequencies of generalized n-grams that are called kgrams, the change in frequency that results from removing the word under consideration from its context, and the change from replacing a word by its hypernym.
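One of the signals described above, the change in frequency when the word under consideration is removed from its context, can be sketched with a toy count table. The tuple-keyed `freq` lookup, the add-one smoothing, and the counts themselves are assumptions for illustration, not the paper's actual setup.

```python
def removal_oddity(left, word, right, freq):
    """Compare how often the context occurs with the word (trigram) versus
    with the word removed (gap bigram). A common surrounding context paired
    with a rare trigram suggests the word may have been substituted."""
    with_word = freq.get((left, word, right), 0)
    without_word = freq.get((left, right), 0)
    return (without_word + 1) / (with_word + 1)  # add-one to avoid division by zero

# Toy corpus counts (illustrative only).
freq = {
    ("the", "dog", "barked"): 10,
    ("the", "sofa", "barked"): 0,
    ("the", "barked"): 12,
}

print(removal_oddity("the", "dog", "barked", freq))   # low score: word fits its context
print(removal_oddity("the", "sofa", "barked", freq))  # high score: likely substitution
```

The same comparison can be run with the word replaced by its hypernym instead of removed, which is the other signal the paper's summary mentions.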
Similarity-Based Approaches to Natural Language Processing
TLDR
The clustering method, which uses the technique of deterministic annealing, represents (to the authors' knowledge) the first application of soft clustering to problems in natural language processing. Several nearest-neighbor approaches of this kind are compared on a word sense disambiguation task, and as a whole their performance is found to be far superior to that of standard methods.
The Generative Lexicon
TLDR
It is argued that lexical decomposition is possible if it is performed generatively, and a theory of lexical inheritance is outlined, which provides the necessary principles of global organization for the lexicon, enabling the natural language lexicon to be fully integrated into a conceptual whole.
Using Corpus Statistics and WordNet Relations for Sense Identification
TLDR
A statistical classifier is described that combines topical context with local cues to identify a word sense and is used to disambiguate a noun, a verb, and an adjective.
Finding translations for low-frequency words in comparable corpora
TLDR
A method is developed that aims to compensate for insufficient amounts of corpus evidence on rare words: prior to measuring cross-language similarities, the method uses same-language corpus data to model co-occurrence vectors of rare words by predicting their unseen co-occurrences and smoothing rare, unreliable ones.
Roget's thesaurus and semantic similarity
TLDR
A system that measures semantic similarity using a computerized 1987 Roget's Thesaurus is presented and evaluated with a few typical tests, comparing the results with those produced by WordNet-based similarity measures.