Assessing thesaurus-based annotations for semantic search applications

Statistical methods for automated document indexing are becoming an alternative to the manual assignment of keywords. We argue that the quality of the thesaurus used as a basis for indexing is of crucial importance in automatic indexing, both in its ability to adequately cover the contents to be indexed and in its suitability for the specific indexing method used. We present an interactive tool for thesaurus evaluation that is based on a combination of statistical measures and appropriate…
Tagging and automation: challenges and opportunities for academic libraries
The paper presents the open-source software Semtinel, which offers a highly optimized toolbox for analysing thesauri and classifications, and finds that a combination of tagging, intellectual indexing, and automatic indexing is probably best suited to making the annotation of literature more efficient without compromising quality.
User-Centered Maintenance of Concept Hierarchies
The authors describe several successful approaches to the semi-automatic maintenance of taxonomies, which incorporate the human expert as a central part of the system.
Exploiting Title-Keywords Relation to Develop Concept Classifier for Scientific Community
A new approach to finding the evolving classification, termed a concept classifier for the scientific community, is proposed and implemented by aggregating the correlation between the title terms and keywords of papers.
Progress towards intelligent support for human articulation of concepts from examples
G. Pavel · Computer Science · Int. J. Knowl. Learn. · 2010
This article proposes a framework, called HMCD (Human Machine Concept Dance), that uses machine learning to reason from student classifications, and addresses the learning of natural concepts and the principles of working memories that can support the learning activities.


Automated Indexing with Thesaurus Descriptors: A Co-occurrence Based Approach to Multilingual Retrieval
Indexing documents with descriptors from a multilingual thesaurus is an approach to multilingual information retrieval; however, manual indexing is expensive, and most weighting schemes of automated indexing methods are not suited to selecting thesaurus descriptors.
Benchmarking ontology-based annotation tools for the Semantic Web
The main issues for evaluating ontology-based annotation tools, a key component in text mining applications for the Semantic Web, are discussed and the main requirements in terms of both usability and performance are outlined.
Metrics for Evaluation of Ontology-based Information Extraction
This paper discusses existing evaluation metrics, and proposes a new method for evaluating the ontology population task, which is general enough to be used in a variety of situations, yet more precise than many current metrics.
Exploring large document repositories with RDF technology: the DOPE project
This thesaurus-based search system uses automatic indexing, RDF-based querying, and concept-based visualization of results to support exploration of large online document repositories. Innovative…
Semi-Automatic Indexing of Full Text Biomedical Articles
This paper reports on an extension of MTI to the full text of articles appearing in online medical journals that are indexed for Medline, using a collection of 17 journal issues containing 500 articles to report on the effectiveness of the terms contributed by the whole article and by each section.
An Intrinsic Information Content Metric for Semantic Similarity in WordNet
A wholly intrinsic measure of Information Content that relies on hierarchical structure alone is presented, which is consequently easier to calculate, yet when used as the basis of a similarity mechanism it yields judgments that correlate more closely with human assessments than other, extrinsic measures of IC that additionally employ corpus analysis.
Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy
This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the…
Towards Text Knowledge Engineering
This work introduces a methodology for automating the maintenance of domain-specific taxonomies based on natural language text understanding; concept hypotheses are ranked according to credibility, and the most credible ones are selected for assimilation into the domain knowledge base.
Semantic Precision and Recall for Ontology Alignment Evaluation
Drawing on previous syntactic generalizations of precision and recall, semantically justified measures are proposed that satisfy maximal precision and maximal recall for correct and complete alignments, are compatible with classical precision and recall, and can be computed.
Using Information Content to Evaluate Semantic Similarity in a Taxonomy
This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content, which performs encouragingly well and is significantly better than the traditional edge counting approach.