My Approach = Your Apparatus?

  • Julian Risch, Ralf Krestel
  • Published 23 May 2018
  • Computer Science
  • Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries
Comparative text mining extends from genre analysis and political bias detection to the revelation of cultural and geographic differences, through to the search for prior art across patents and scientific papers. These applications use cross-collection topic modeling for the exploration, clustering, and comparison of large sets of documents, such as digital libraries. However, topic modeling on documents from different collections is challenging because of domain-specific vocabulary. We present… 

Domain-specific word embeddings for patent classification

A deep learning approach based on gated recurrent units for automatic patent classification, built on trained word embeddings, that fulfills the need for domain-specific word embeddings for downstream tasks in the patent domain, such as patent classification or patent analysis.

Learning Patent Speak: Investigating Domain-Specific Word Embeddings

  • Julian Risch, Ralf Krestel
  • Computer Science
    2018 Thirteenth International Conference on Digital Information Management (ICDIM)
  • 2018
A deep learning approach based on gated recurrent units for automatic patent classification, built on word embeddings trained for the patent domain; the authors show that this approach increases average precision for patent classification by 17 percent compared to state-of-the-art approaches.

A Micro Perspective of Research Dynamics Through “Citations of Citations” Topic Analysis

A cross-collection topic model is used to reveal the research dynamics of topic disappearance, topic inheritance, and topic innovation in each generation of forward chaining; the analysis of forward chaining demonstrates that scientific influence exists in indirect citations.



Mining contrastive opinions on political texts using cross-perspective topic model

An extensive set of experiments has been conducted to evaluate the proposed unsupervised topic model for contrastive opinion modeling, which simulates the generative process of how opinion words occur in the documents of different collections.

A cross-collection mixture model for comparative text mining

A generative probabilistic mixture model is proposed for comparative text mining that simultaneously performs cross-collection clustering and within-collection clustering, and can be applied to an arbitrary set of comparable text collections.

Scalable Topical Phrase Mining from Text Corpora

This work proposes a novel phrase mining framework to segment a document into single and multi-word phrases, and a new topic model that operates on the induced document partition that discovers high quality topical phrases with negligible extra cost to the bag-of-words topic model in a variety of datasets.

An unsupervised topic segmentation model incorporating word order

This work presents a new unsupervised topic discovery model for a collection of text documents that does not break the document's structure such as paragraphs and sentences and preserves word order, and can generate two levels of topics of different granularity.

Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval

Most topic models, such as latent Dirichlet allocation, rely on the bag-of-words assumption. However, word order and phrases are often critical to capturing the meaning of text in many text mining…

Supervised cross-collection topic modeling

The results suggest that the proposed scLDA can generate meaningful collection-specific topics and achieves better retrieval accuracy than other related topic models.

Differential Topic Models

A differential topic model that captures both topic differences and similarities is presented for this application, and the model is shown to outperform the state of the art for document classification/ideology prediction on a number of text collections.

Exploring the Space of Topic Coherence Measures

This work is the first to propose a framework for constructing existing word-based coherence measures, as well as new ones, by combining elementary components, and shows that new combinations of components outperform existing measures with respect to correlation with human ratings.

Recommending patents based on latent topics

This paper investigates the use of latent Dirichlet allocation and Dirichlet multinomial regression to represent patent documents and to compute similarity scores, and compares these methods with state-of-the-art document representations and retrieval techniques.

Fast, Flexible Models for Discovering Topic Correlation across Weakly-Related Collections

Two probabilistic topic models, Correlated LDA (C-LDA) and Correlated HDP (C-HDP), are introduced, which address problems that can arise when analyzing large, asymmetric, and potentially weakly-related collections.