• Corpus ID: 2538517

Identifying Meaningful Citations

Marco Valenzuela, Vu A. Ha, Oren Etzioni
AAAI Workshop: Scholarly Big Data
We introduce the novel task of identifying important citations in scholarly literature, i.e., citations that indicate that the cited work is used or extended in the new effort. We annotate a dataset of approximately 450 citations with this information, and release it publicly. We propose a supervised classification approach that addresses this task with a battery of features that range from citation counts to where the citation appears in the body of the paper, and show that our approach…

Identifying Important Citations Using Contextual Information from Full Text

This paper explores the effectiveness of eight previously published features and six novel features (including context based, cue words based and textual based) on an annotated dataset of 465 citations and achieves an overall classification accuracy of 0.91 AUC using the Random Forest classifier.

An Authoritative Approach to Citation Classification

It is argued that authors themselves are in a primary position to answer the question of why something was cited; a new methodology for annotating citations and a significant new dataset of 11,233 citations annotated by 883 authors are introduced.

Incidental or Influential? - Challenges in Automatically Detecting Citation Importance Using Publication Full Texts

This work looks in depth at several studies that have attempted to automate the process of citation importance classification based on the publications’ full text and finds abstract similarity to be one of the most predictive features.

Mining the Context of Citations in Scientific Publications

This paper compares and builds upon four state-of-the-art models that detect important citations, using 450 citations manually annotated by experts and randomly selected from 20,527 papers in the Association for Computational Linguistics corpus.

Citation Classification for Behavioral Analysis of a Scientific Field

It is demonstrated that authors are sensitive to discourse structure and publication venue when citing, that online readers follow temporal links to previous and future work rather than methodological links, and that how a paper cites related work is predictive of its citation count.

Towards Finding a Research Lineage Leveraging on Identification of Significant Citations

It is hypothesized that such an automated system can facilitate relevant literature discovery, help identify knowledge flow for at least a certain category of papers, and identify the real impact of research work or a facility beyond direct citation counts.

Towards establishing a research lineage via identification of significant citations

It is hypothesized that such an automated system can facilitate relevant literature discovery and help identify knowledge flow for a particular category of papers, and the efficacy of the idea is demonstrated with two real-life case studies.

Incidental or influential? - A decade of using text-mining for citation function classification

Overall, it is shown that many of the features previously described in the literature either have been reported as not particularly predictive, cannot be reproduced from their existing descriptions, or should not be used because they rely on external, changing evidence.

Identification of important citations by exploiting research articles’ metadata and cue-terms from content

This paper presents a binary citation classification scheme, which is dominated by metadata-based parameters, and claims that the proposed approach can serve as the best alternative in scenarios wherein content is unavailable.

A meta-analysis of semantic classification of citations

This literature review investigates the approaches for characterizing citations based on their semantic type and explores the existing classification schemes, data sets, preprocessing methods, extraction of contextual and noncontextual features, and the different types of classifiers and evaluation approaches.



Measuring academic influence: Not all citations are equal

The hip‐index is presented along with a model for predicting academic influence that achieves good performance on this data set using only four features; among the features evaluated, the most predictive were found to be those based on the number of times a reference is mentioned in the body of a citing paper.

Unsupervised prediction of citation influences

A probabilistic topic model is devised that explains the generation of documents and incorporates the aspects of topical innovation and topical inheritance via citations, and its ability to predict the strength of influence of citations against manually rated citations is evaluated.

Automatic classification of citation function

This work shows that the annotation scheme for citation function is reliable, and presents a supervised machine learning framework to automatically classify citation function, using both shallow and linguistically-inspired features, finding a strong relationship between citation function and sentiment classification.

CiteSeer: an automatic citation indexing system

CiteSeer has many advantages over traditional citation indexes, including the ability to create more up-to-date databases which are not limited to a preselected set of journals or restricted by journal publication delays, completely autonomous operation with a corresponding reduction in cost, and powerful interactive browsing of the literature using the context of citations.

Joint latent topic models for text and citations

This work addresses the problem of joint modeling of text and citations in the topic modeling framework with two different models, called the Pairwise-Link-LDA and the Link-PLSA-LDA models, which combine the LDA and PLSA models into a single graphical model.

Logical Structure Recovery in Scholarly Articles with Rich Document Features

SectLabel is described, a module that further develops existing software to detect the logical structure of a document from existing PDF files, using the formalism of conditional random fields.

Latent Topic Models for Hypertext

This paper presents a probabilistic generative model for hypertext document collections that explicitly models the generation of links and shows how to perform EM learning on this model efficiently.

TopicFlow Model: Unsupervised Learning of Topic-specific Influences of Hyperlinked Documents

The TopicFlow model can be a powerful visualization tool to track the diffusion of topics across a citation network and is competitive with the state-of-the-art Relational Topic Models in predicting the likelihood of unseen text on two different data sets.

Digital Libraries and Autonomous Citation Indexing

Digital libraries incorporating ACI can help organize scientific literature and may significantly improve the efficiency of dissemination and feedback and speed the transition to scholarly electronic publishing.

The PageRank Citation Ranking: Bringing Order to the Web

This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.
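
As a rough illustration of the ranking idea summarized in the last entry, the following is a minimal power-iteration sketch applied to a toy citation graph. It is not taken from any of the listed papers: the function name `pagerank`, the damping value of 0.85, the convergence tolerance, and the choice to spread the rank of nodes without outgoing links uniformly over all nodes are assumptions made here for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank sketch (illustrative, not the paper's code).

    `links` maps each node to the list of nodes it links to (or cites).
    The damping factor and dangling-node handling are assumptions.
    """
    # Collect every node that appears as a source or as a target.
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(max_iter):
        # Rank held by nodes with no outgoing links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}

        # Each node passes an equal share of its damped rank to its targets.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share

        # Stop once the total change between iterations is negligible.
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: papers A and B both cite paper C.
citations = {"A": ["C"], "B": ["C"], "C": []}
ranks = pagerank(citations)
for paper in sorted(ranks, key=ranks.get, reverse=True):
    print(paper, round(ranks[paper], 4))
```

In this toy graph the cited paper C ends up with the highest score, and the scores sum to one because the rank of the node without outgoing links is redistributed rather than lost.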