Updating controlled vocabularies by analysing query logs

@article{Vallez2015UpdatingCV,
  title={Updating controlled vocabularies by analysing query logs},
  author={Marie Odile Vallez and Rafael Pedraza-Jim{\'e}nez and Llu{\'i}s Codina and Sa{\'u}l Blanco and Crist{\`o}fol Rovira},
  journal={Online Inf. Rev.},
  year={2015},
  volume={39},
  pages={870-884},
  url={https://api.semanticscholar.org/CorpusID:3444105}
}
A semi-automatic model for updating controlled vocabularies through the use of a text corpus and the analysis of query logs is presented.
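
The abstract suggests a pipeline in which terms observed in query logs are compared against an existing controlled vocabulary so that frequent, unmatched terms can be surfaced for review by indexers. The sketch below illustrates that general idea in Python; the function names, inputs, and frequency threshold are illustrative assumptions, not the authors' actual model.

```python
from collections import Counter
import re

def candidate_terms(query_log, vocabulary, min_freq=5):
    """Suggest candidate terms for a controlled vocabulary from query logs.

    A minimal sketch: count normalized query tokens and return those that
    appear frequently but are absent from the existing vocabulary.
    `query_log` is an iterable of raw query strings; `vocabulary` is a set
    of preferred terms (both hypothetical inputs, not the paper's data).
    """
    vocab = {t.lower() for t in vocabulary}
    counts = Counter()
    for query in query_log:
        # Normalize: lowercase and keep simple word tokens.
        for token in re.findall(r"[a-záéíóúüñ]+", query.lower()):
            counts[token] += 1
    # Terms searched often but missing from the vocabulary are candidates
    # for review by a human indexer (the "semi-automatic" step).
    return [(term, n) for term, n in counts.most_common()
            if n >= min_freq and term not in vocab]

if __name__ == "__main__":
    log = ["fake news detection", "fake news", "thesaurus update",
           "fake news verification", "news verification", "fake news"]
    thesaurus = {"thesaurus", "news", "verification"}
    print(candidate_terms(log, thesaurus, min_freq=2))  # [('fake', 4)]
```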

User search terms and controlled subject vocabularies in an institutional repository

The study presents a novel method for analyzing user search behavior to assist institutional repository (IR) managers in determining whether to invest in applying controlled subject vocabularies to IR content.

Etiquetado social y blog-scraping como alternativa para la actualización de vocabularios controlados: Aplicación práctica a un tesauro de Biblioteconomía y Documentación

It is concluded that free-language tags could be a better and faster way of contributing new terminology to controlled vocabularies than unstructured controlled-language lists.

Cultural Heritage Information Retrieval: Data Modelling and Applications

The Semantic Web enables a wide set of opportunities to develop smart applications based on rich cultural heritage (CH) information beyond better information retrieval; the paper also reviews intelligent applications and services developed in the CH domain after establishing semantic data models and Knowledge Organization Systems.

Information search by applying VDL-based iconic tags: an experimental study

This study is one of the first to verify how structured icons work in information searching and how a user's graphical cognition affects the tag-based information searching process.

Universidades en Google: hacia un modelo de análisis multinivel del posicionamiento web académico

It is concluded that a multilevel analysis is necessary to study the web positioning of the universities and that the proposed model is both viable and scalable.

Cultural Heritage Information Retrieval: Past, Present, and Future Trends

The process from the initial steps of adopting Semantic Web technologies in the CH domain to the latest developments in CH information retrieval is outlined, and the findings reveal that GLAMs are excellent and comprehensive sources of CH information.

Mídias sociais e bibliotecas na produção científica dos Estados Unidos

It was found that the interdomain of social media and libraries has been discussed by the scientific community since 2006, mainly within the scope of university libraries, and a possible epistemic community in formation was identified.

Cultural Heritage Data Management: The Role of Formal Ontology and CIDOC CRM

This chapter proposes CIDOC CRM as the most robust solution for information integration in CH and distinguishes knowledge engineering and formal ontology from other information modelling techniques as the necessary approach for tackling the broader data integration problem.

Posicionamiento web y medios de comunicación: ciclo de vida de una campaña y factores SEO

With the support of the projects: "Creación y contenido interactivo en la comunicación de información audiovisual: audiencias, diseño, sistemas y formatos" (CSO2015-64955-C4-2-R) | "El turista en la

Metodologias para revisão e atualização de tesauros: mapeamento da literatura

Introduction: Thesauri continue to play an important role in terminological standardization to facilitate the communication of information. In view of this recognized importance, such instruments

A semi-automatic indexing system based on embedded information in HTML documents

The tool DigiDoc MetaEdit, which allows the semi-automatic indexing of HTML documents, works by identifying and suggesting keywords from a thesaurus according to the information embedded in HTML documents; there is close to a 50% match or overlap between the two indexing systems.
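
As a rough illustration of this kind of metadata-driven suggestion (not the actual DigiDoc MetaEdit implementation, whose internals are not described here), the following Python sketch reads the embedded <title> and <meta> information of an HTML page and proposes matching thesaurus descriptors; the inputs are hypothetical.

```python
from html.parser import HTMLParser

class EmbeddedInfoParser(HTMLParser):
    """Collects <title> text and <meta name="keywords"/"description"> content."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and attrs.get("name", "").lower() in ("keywords", "description"):
            self.texts.append(attrs.get("content", ""))

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.texts.append(data)

def suggest_descriptors(html, thesaurus):
    """Suggest thesaurus descriptors that occur in the document's embedded metadata."""
    parser = EmbeddedInfoParser()
    parser.feed(html)
    embedded = " ".join(parser.texts).lower()
    return sorted(term for term in thesaurus if term.lower() in embedded)

if __name__ == "__main__":
    page = ('<html><head><title>Query log analysis for thesauri</title>'
            '<meta name="keywords" content="controlled vocabulary, indexing">'
            '</head><body>...</body></html>')
    print(suggest_descriptors(page, {"Controlled vocabulary", "Indexing", "Ontology"}))
```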

A Hybrid Information Retrieval Model Using Metadata and Text

A hybrid IR model that searches both the metadata and text fields of documents is presented, and experiments show that the hybrid approach outperforms either case alone, i.e. searching text only or metadata only.
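
A minimal sketch of how such a hybrid scheme might combine evidence, assuming a simple weighted sum of metadata and full-text matches (the weights and document structure below are illustrative, not the paper's model):

```python
def hybrid_score(query_terms, doc, w_meta=2.0, w_text=1.0):
    """Score a document by matches in metadata fields and in full text.

    A minimal hybrid metadata+text sketch; `doc` is a dict with
    'metadata' (list of field strings) and 'text' (body string).
    """
    meta = " ".join(doc["metadata"]).lower()
    text = doc["text"].lower()
    score = 0.0
    for term in query_terms:
        t = term.lower()
        if t in meta:
            score += w_meta   # metadata matches weighted more heavily
        if t in text:
            score += w_text
    return score

if __name__ == "__main__":
    docs = [
        {"id": 1, "metadata": ["water quality", "pollution"], "text": "survey of rivers"},
        {"id": 2, "metadata": ["hydrology"], "text": "water quality in lakes"},
    ]
    query = ["water quality"]
    ranked = sorted(docs, key=lambda d: hybrid_score(query, d), reverse=True)
    print([(d["id"], hybrid_score(query, d)) for d in ranked])  # [(1, 2.0), (2, 1.0)]
```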

The controlled versus natural indexing languages debate revisited: a perspective on information retrieval practice and research

The debate concerning controlled and natural indexing languages, as used in searching the databases of online hosts, in-house information retrieval systems, online public access catalogues and databases stored on CD-ROM, is revisited.

Still a Lot to Lose: The Role of Controlled Vocabulary in Keyword Searching

This study replicates the search process in the same online catalog, but after the addition of automated enriched metadata such as tables of contents and summaries, to address criticisms of the Gross/Taylor study.

Comparisons Between Internet Users' Free-Text Queries and Controlled Vocabularies: A Case Study in Water Quality

Comparisons were made between 3,275 free-text searches representing 2,075 unique terms and three controlled vocabularies: the Library of Congress Subject Headings, the Water Resources Abstracts Thesaurus, and the Aqualine Thesaurus.
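
A hedged sketch of the kind of comparison involved: computing, for each controlled vocabulary, the share of unique query terms that match an entry exactly. The vocabularies and queries below are toy examples, not the study's data.

```python
def vocabulary_coverage(queries, vocabularies):
    """Report, per controlled vocabulary, the share of unique query terms
    that match a vocabulary entry exactly (case-insensitive)."""
    unique_terms = {q.strip().lower() for q in queries}
    report = {}
    for name, vocab in vocabularies.items():
        vocab_lower = {t.lower() for t in vocab}
        matched = unique_terms & vocab_lower
        report[name] = len(matched) / len(unique_terms)
    return report

if __name__ == "__main__":
    queries = ["nitrate", "algal bloom", "pH", "turbidity", "fish kill"]
    vocabs = {
        "LCSH-like": {"Nitrate", "Turbidity", "Water quality"},
        "Thesaurus-like": {"Algal bloom", "pH", "Nitrate"},
    }
    print(vocabulary_coverage(queries, vocabs))  # {'LCSH-like': 0.4, 'Thesaurus-like': 0.6}
```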

What Have We Got to Lose? The Effect of Controlled Vocabulary on Keyword Searching Results

It was found that more than one-third of the records retrieved by successful keyword searches would be lost if subject headings were not present, and many individual cases exist in which 80, 90, and even 100 percent of the retrieved records would not be retrieved in the absence of subject headings.

Efficient automatic search query formulation using phrase‐level analysis

An implementable method for generating relevant queries from a user's text input is proposed, together with a new method for better quantifying the relevance of candidate search terms drawn from the input text using phrase-level analysis.
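
The following Python sketch gestures at phrase-level candidate generation: it extracts word n-grams from input text and ranks them with a simple frequency-times-length score. The scoring function is an illustrative stand-in, not the relevance quantification proposed in the paper.

```python
from collections import Counter
import re

def candidate_phrases(text, max_len=3, top_k=5):
    """Rank candidate search phrases drawn from input text.

    Generates word n-grams (up to `max_len` words), drops those made only
    of stopwords, and scores them by frequency * length (illustrative).
    """
    stop = {"the", "a", "of", "and", "in", "to", "for", "is", "on"}
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if all(w in stop for w in gram):
                continue
            counts[" ".join(gram)] += 1
    scored = {p: c * len(p.split()) for p, c in counts.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

if __name__ == "__main__":
    text = ("Query logs help to update controlled vocabularies. "
            "Controlled vocabularies support indexing and retrieval.")
    print(candidate_phrases(text))
```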

Expanded information retrieval using full-text searching

Use of combination terms along with proximity specification capability is a very powerful feature for retrieving relevant records from full-text searching, and can be useful for applications like literature-related discovery.
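
As an illustration of a proximity operator of the kind full-text systems expose (e.g. two terms required to occur within N words of each other), here is a small self-contained Python sketch; it is a generic example, not tied to any particular search engine.

```python
import re

def proximity_match(text, term_a, term_b, window=5):
    """Return True if term_a and term_b occur within `window` words of each other.

    A minimal sketch of a NEAR/window-style proximity check over a record.
    """
    words = re.findall(r"[a-z]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

if __name__ == "__main__":
    record = "Full-text searching expands retrieval when discovery tools index entire articles."
    print(proximity_match(record, "full", "searching", window=3))   # True
    print(proximity_match(record, "full", "articles", window=3))    # False
```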

A Statistical Approach to Term Extraction

This paper adopts some general principles of the statistical properties of terms and a method to obtain the corresponding language-specific parameters; the approach is used for the automatic identification of terminology and is quantitatively evaluated in an empirical study of English medical terms.
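
To make the general idea concrete, the sketch below scores single-word candidates by how much more frequent they are in a domain corpus than in a reference corpus (a simple log-ratio with add-one smoothing). This is a generic statistical term-extraction illustration, not the specific model or parameters of the paper.

```python
from collections import Counter
import math, re

def term_scores(domain_text, reference_text):
    """Score single-word term candidates by relative frequency.

    Words that are much more frequent in a domain corpus than in a general
    reference corpus are likely domain terms; scored by a smoothed log-ratio.
    """
    def freqs(text):
        words = re.findall(r"[a-z]+", text.lower())
        counts = Counter(words)
        return counts, sum(counts.values())

    d_counts, d_total = freqs(domain_text)
    r_counts, r_total = freqs(reference_text)
    scores = {}
    for word, c in d_counts.items():
        p_domain = c / d_total
        p_ref = (r_counts.get(word, 0) + 1) / (r_total + len(r_counts))  # add-one smoothing
        scores[word] = math.log(p_domain / p_ref)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    medical = "the angioplasty restored blood flow after the stenosis was treated"
    general = "the weather was fine and the children played after school"
    print(term_scores(medical, general)[:3])
```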