An information-theoretic perspective of tf-idf measures
  • Akiko Aizawa
  • Mathematics, Computer Science
  • Inf. Process. Manag.
  • 2003
The proposed PWI is expressed as a product of the occurrence probabilities of terms and their amounts of information, and corresponds well with the conventional term frequency-inverse document frequency measures that are commonly used in today's information retrieval systems.
A Conditional Variational Framework for Dialog Generation
This paper proposes a framework for conditional response generation based on specific attributes, which can be either manually assigned or automatically detected. The framework is validated in two scenarios, where the attribute refers to genericness and sentiment states, respectively.
A Language Model based Evaluator for Sentence Compression
This paper presents a language-model-based evaluator for deletion-based sentence compression, and an empirical study shows that the proposed model can effectively generate more readable compressions, comparable or superior to several strong baselines.
Scheduling of Genetic Algorithms in a Noisy Environment
New methods for adjusting the configuration parameters of genetic algorithms operating in a noisy environment are developed by modeling the search process as a statistical selection process and deriving equations useful for these problems.
Linguistic Techniques to Improve the Performance of Automatic Text Categorization
A method for incorporating natural language processing into existing text categorization procedures using a probabilistic language model and automatic extraction of terms based on POS tags automatically generated by a morphological analyzer is presented.
What Makes Reading Comprehension Questions Easier?
This study proposes simple heuristics to split each dataset into easy and hard subsets, examines the performance of two baseline models on each subset, and observes that baseline performance on the hard subsets degrades remarkably compared to that on the entire datasets.
A Fast Linkage Detection Scheme for Multi-Source Information Integration
This paper proposes a fast and efficient method for linkage detection that exploits a suffix array structure that enables linkage detection using variable length n-grams and dynamically generates blocks of possibly associated records using 'blocking keys' extracted from already known reliable linkages.
Dynamic Control of Genetic Algorithms in a Noisy Environment
Adaptive procedures for adjusting the parameters of genetic algorithms that operate in a noisy environment are presented, and it is shown that these adaptive procedures improve the performance of genetic algorithms over that of commonly used static ones.
NTCIR-12 MathIR Task Overview
This overview paper summarizes the task design, corpora, submitted runs, results, and the approaches used by participating groups of the NTCIR-12 MathIR Task.
MCAT Math Retrieval System for NTCIR-12 MathIR Task
Three granularity levels of textual information, a new approach for generating dependency graphs of math expressions, score normalization, cold-start weights, and unification are introduced, and these modules are found to have a very good impact on the search performance of the MCAT search system.