An information-theoretic perspective of tf-idf measures
  • Akiko Aizawa
  • Mathematics, Computer Science
  • Inf. Process. Manag.
  • 2003
TLDR
The proposed PWI is expressed as a product of the occurrence probabilities of terms and their amounts of information, and corresponds well with the conventional term frequency-inverse document frequency measures that are commonly used in today's information retrieval systems.
A Conditional Variational Framework for Dialog Generation
TLDR
This paper proposes a framework allowing conditional response generation based on specific attributes, which can be either manually assigned or automatically detected, and validates it on two different scenarios, where the attribute refers to genericness and sentiment states, respectively.
A Language Model based Evaluator for Sentence Compression
TLDR
A language-model-based evaluator for deletion-based sentence compression is proposed, and an empirical study shows that the proposed model can effectively generate more readable compressions, comparable or superior to several strong baselines.
Scheduling of Genetic Algorithms in a Noisy Environment
TLDR
New methods are developed for adjusting configuration parameters of genetic algorithms operating in a noisy environment by modeling the search process as a statistical selection process and deriving equations useful for these problems.
Linguistic Techniques to Improve the Performance of Automatic Text Categorization
TLDR
A method for incorporating natural language processing into existing text categorization procedures using a probabilistic language model and automatic extraction of terms based on POS tags automatically generated by a morphological analyzer is presented.
What Makes Reading Comprehension Questions Easier?
TLDR
This study proposes to employ simple heuristics to split each dataset into easy and hard subsets, examines the performance of two baseline models on each of the subsets, and observes that the baseline performance on the hard subsets degrades remarkably compared to that on the entire datasets.
A Fast Linkage Detection Scheme for Multi-Source Information Integration
TLDR
This paper proposes a fast and efficient method for linkage detection that exploits a suffix array structure, which enables linkage detection using variable-length n-grams, and dynamically generates blocks of possibly associated records using ‘blocking keys’ extracted from already known reliable linkages.
Dynamic Control of Genetic Algorithms in a Noisy Environment
TLDR
Adaptive procedures for adjusting parameters of genetic algorithms that operate in a noisy environment are presented, and it is shown that these adaptive procedures improve the performance of genetic algorithms over that of commonly used static ones.
NTCIR-12 MathIR Task Overview
TLDR
This overview paper summarizes the task design, corpora, submitted runs, results, and the approaches used by participating groups of the NTCIR-12 MathIR Task.
MCAT Math Retrieval System for NTCIR-12 MathIR Task
TLDR
Three granularity levels of textual information, a new approach for generating dependency graphs of math expressions, score normalization, cold-start weights, and unification are introduced, and it is found that these modules have a very good impact on the search performance of the MCAT search system.