Gábor Recski

We describe, and make public, large-scale language resources and the toolchain used in their creation, for fifteen medium density languages. To make the process uniform across languages, we selected tools that are either language-independent or easily customizable for each language, and reimplemented all stages that were taking too long. To achieve processing times…
We present our approach to measuring semantic similarity of sentence pairs used in SemEval 2015 tasks 1 and 2. We adopt the sentence alignment framework of (Han et al., 2013) and experiment with several measures of word similarity. We hybridize the common vector-based models with definition graphs from the 4lang concept dictionary and devise a measure of…
We investigate from the competence standpoint two recent models of lexical semantics, algebraic conceptual representations and continuous vector models. Characterizing what it means for a speaker to be competent in lexical semantics remains perhaps the most significant stumbling block in reconciling the two main threads of semantics, Chomsky's cognitivism…
We present a state-of-the-art algorithm for measuring the semantic similarity of word pairs using novel combinations of word embeddings, WordNet, and the concept dictionary 4lang. We evaluate our system on the SimLex-999 benchmark data. Our top score of 0.76 is higher than that of any published system that we are aware of, well beyond the average inter-annotator…
We present Minimum Description Length techniques for learning the structure of weighted languages. MDL is already widely used both for segmentation and classification tasks, and here we show it can be used to formalize further important tools in the descriptive linguists' toolbox, including the distinction between accidental and systematic gaps in the data,…
We created a simple gold standard for English-Hungarian NP-level alignment, Orwell's 1984, by manually verifying the automatically generated NP chunking and manually aligning the maximal NPs and PPs. Since the results are highly impacted by the quality of the NP chunking, we tested our alignment algorithms both with real-world (machine-obtained) chunkings,…