A Closer Look at Skip-gram Modelling
TLDR
Skip-gram modelling is analysed by computing all possible skip-grams in a training corpus and measuring how many adjacent (standard) n-grams these cover in test documents, and the amount of extra training data required to achieve skip-gram coverage using standard adjacent tri-grams is determined.
Lexical Disambiguation using Simulated Annealing
TLDR
A method for lexical disambiguation of text using the definitions in a machine-readable dictionary together with the technique of simulated annealing to select the optimal combinations of word senses for all the words in the sentence simultaneously.
Electric Words: Dictionaries, Computers, and Meanings
TLDR
A short history of meaning; symbolic accounts of definitional meaning; primitives in meaning definition; wordbooks as human artifacts and tasks and tools; text analysis and its relationship to dictionaries.
Subject-Dependent Co-Occurrence and Word Sense Disambiguation
TLDR
Using the subject classifications given in the machine-readable version of Longman's Dictionary of Contemporary English, subject-dependent co-occurrence links between words of the defining vocabulary are established to construct "neighborhoods", and the application of these neighborhoods to information retrieval is described.
Genus Disambiguation: A Study in Weighted Preference
TLDR
A series of experiments that weight the three factors in various ways is reported, and improvements to the algorithm, bringing it to about 90% accuracy, are described.
Natural Language Information Retrieval: TREC-8 Report
TLDR
This paper reports on the joint GE/Lockheed Martin/Rutgers/NYU natural language information retrieval project as related to the 5th Text Retrieval Conference (TREC-5), which uses natural language processing techniques to enhance the effectiveness of full-text document retrieval.
Unsupervised Anomaly Detection
TLDR
This paper presents several variants of an automatic technique for identifying an 'unusual' segment within a document, considers texts which are unusual because of author, genre, topic or emotional tone, and shows substantial improvements over a baseline in all cases.
Towards the Orwellian Nightmare: Separation of Business and Personal Emails
TLDR
This paper describes the largest-scale annotation project involving the Enron email corpus to date, which classified emails into the categories "Business" and "Personal" and then sub-categorised them by type within these categories.
Is there content in empty heads?
TLDR
It is shown that hierarchies of this type can be automatically constructed, by using the semantic category codes and the subject codes of the Longman Dictionary of Contemporary English to disambiguate the genus terms in noun definitions.