SUMSS: a wide-field radio imaging survey of the southern sky – II. The source catalogue
This paper is the second in a series describing the Sydney University Molonglo Sky Survey (SUMSS) being carried out at 843 MHz with the Molonglo Observatory Synthesis Telescope (MOST). The survey…
Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models
This article describes a number of log-linear parsing models for an automatically extracted lexicalized grammar and develops a new model and efficient parsing algorithm which exploits all derivations, including CCG's nonstandard derivations.
Learning multilingual named entity recognition from Wikipedia
The approach outperforms other approaches to automatic NE annotation, competes with gold-standard training when tested on an evaluation corpus from a different source, and performs 10% better than newswire-trained models on manually annotated Wikipedia text.
Evaluating Entity Linking with Wikipedia
This work reimplements three seminal NEL systems and presents a detailed evaluation of search strategies, finding that coreference and acronym handling lead to substantial improvement, and that search strategies account for much of the variation between systems.
Linguistically Motivated Large-Scale NLP with C&C and Boxer
An NLP system which is based on syntactic and semantic formalisms from theoretical linguistics, and which is used to analyse the entire Gigaword corpus in less than 5 days using only 18 processors, represents a breakthrough in NLP technology.
From distributional to semantic similarity
This dissertation describes how to extract contexts from a corpus of over 2 billion words and introduces a new context-weighted approximation algorithm with bounded complexity in context vector size that significantly reduces the system runtime with only a minor performance penalty.
Improvements in Automatic Thesaurus Extraction
An approximation algorithm is proposed, based on canonical attributes and coarse- and fine-grained matching, that reduces the time complexity and execution time of thesaurus extraction with only a marginal performance penalty.
Parsing the WSJ Using CCG and Log-Linear Models
A parallel implementation of the L-BFGS optimisation algorithm is described, which runs on a Beowulf cluster, allowing the complete Penn Treebank to be used for estimation. A new efficient parsing algorithm for CCG, which maximises expected recall of dependencies, is also developed.
The Importance of Supertagging for Wide-Coverage CCG Parsing
This paper describes the role of supertagging in a wide-coverage CCG parser which uses a log-linear model to select an analysis and shows that large increases in speed can be obtained by tightly integrating the supertagger with the CCG grammar and parser.
Compact continuum source finding for next generation radio surveys
A new source-finding algorithm, Aegean, is demonstrated, based on the application of a Laplacian kernel, which can avoid these problems and can produce complete and reliable source catalogues for the next generation of radio surveys.