Corpus ID: 2538517

Identifying Meaningful Citations

@inproceedings{Valenzuela2015IdentifyingMC,
  title={Identifying Meaningful Citations},
  author={Marco Valenzuela and Vu A. Ha and Oren Etzioni},
  booktitle={AAAI Workshop: Scholarly Big Data},
  year={2015}
}
We introduce the novel task of identifying important citations in scholarly literature, i.e., citations that indicate that the cited work is used or extended in the new effort. [...] We annotate a dataset of approximately 450 citations with this information, and release it publicly. We propose a supervised classification approach that addresses this task with a battery of features that range from citation counts to where the citation appears in the body of the paper, and show that our approach…
Identifying Important Citations Using Contextual Information from Full Text
TLDR
This paper explores the effectiveness of eight previously published features and six novel features (context-based, cue-word-based, and textual) on an annotated dataset of 465 citations and achieves an overall classification performance of 0.91 AUC using the Random Forest classifier.
Incidental or Influential? - Challenges in Automatically Detecting Citation Importance Using Publication Full Texts
TLDR
This work looks in depth at several studies that have attempted to automate the process of citation importance classification based on the publications’ full text and finds abstract similarity to be one of the most predictive features.
Determining How Citations Are Used in Citation Contexts
TLDR
This paper proposes a classification scheme for citation contexts, as well as a machine-learning-based approach to determine the classes automatically, and reveals that the classification performance varies significantly between the citation types.
Mining the Context of Citations in Scientific Publications
TLDR
This paper compares and builds upon four state-of-the-art models that detect important citations, using 450 citations manually annotated by experts and randomly selected from 20,527 papers in the Association for Computational Linguistics corpus.
Citation Classification for Behavioral Analysis of a Scientific Field
TLDR
It is demonstrated that authors are sensitive to discourse structure and publication venue when citing, that online readers follow temporal links to previous and future work rather than methodological links, and that how a paper cites related work is predictive of its citation count.
Incidental or influential? - A decade of using text-mining for citation function classification
TLDR
Overall, it is shown that many of the features previously described in the literature have either been reported as not particularly predictive, cannot be reproduced from their existing descriptions, or should not be used because they rely on external, changing evidence.
Identification of important citations by exploiting research articles’ metadata and cue-terms from content
TLDR
This paper presents a binary citation classification scheme, which is dominated by metadata-based parameters, and claims that the proposed approach can serve as the best alternative in scenarios where content is unavailable.
A Comprehensive Evaluation of Cue-Words based Features and In-text Citations based Features for Citation Classification
TLDR
A hybrid approach would present all possible combinations of cue-words and in-text citation-based features for citation classifications.
Important citation identification by exploiting content and section-wise in-text citation count
TLDR
A novel approach for binary citation classification is presented that exploits section-wise in-text citation frequencies, similarity scores, and overall citation count-based features to achieve improved precision over the contemporary state-of-the-art approach.
Important Citation Identification by Exploiting the Optimal In-text Citation Frequency
TLDR
This research explored the significance of applying a threshold value to the in-text citation frequency count for binary classification, identified the optimal threshold value, and applied it to classify citations into important and non-important ones.

References

SHOWING 1-10 OF 24 REFERENCES
Measuring academic influence: Not all citations are equal
TLDR
A model for predicting academic influence is presented that achieves good performance on this data set using only four features; among the features evaluated, the best were those based on the number of times a reference is mentioned in the body of a citing paper. The work also introduces the hip-index.
Unsupervised prediction of citation influences
TLDR
A probabilistic topic model is devised that explains the generation of documents and incorporates the aspects of topical innovation and topical inheritance via citations, and its ability to predict the strength of influence of citations against manually rated citations is evaluated.
Automatic classification of citation function
TLDR
This work shows that the annotation scheme for citation function is reliable, and presents a supervised machine learning framework to automatically classify citation function, using both shallow and linguistically-inspired features, finding a strong relationship between citation function and sentiment classification.
CiteSeer: an automatic citation indexing system
TLDR
CiteSeer has many advantages over traditional citation indexes, including the ability to create more up-to-date databases which are not limited to a preselected set of journals or restricted by journal publication delays, completely autonomous operation with a corresponding reduction in cost, and powerful interactive browsing of the literature using the context of citations.
Joint latent topic models for text and citations
TLDR
This work addresses the problem of joint modeling of text and citations in the topic modeling framework with two different models called the Pairwise-Link-LDA and the Link-PLSA-LDA models, which combine the LDA and PLSA models into a single graphical model.
Logical Structure Recovery in Scholarly Articles with Rich Document Features
TLDR
SectLabel is described, a module that further develops existing software to detect the logical structure of a document from existing PDF files, using the formalism of conditional random fields.
Latent Topic Models for Hypertext
TLDR
This paper presents a probabilistic generative model for hypertext document collections that explicitly models the generation of links and shows how to perform EM learning on this model efficiently.
TopicFlow Model: Unsupervised Learning of Topic-specific Influences of Hyperlinked Documents
TLDR
The TopicFlow model can be a powerful visualization tool to track the diffusion of topics across a citation network and is competitive with the state-of-the-art Relational Topic Models in predicting the likelihood of unseen text on two different data sets.
Digital Libraries and Autonomous Citation Indexing
TLDR
Digital libraries incorporating ACI can help organize scientific literature and may significantly improve the efficiency of dissemination and feedback and speed the transition to scholarly electronic publishing.
The PageRank Citation Ranking : Bringing Order to the Web
TLDR
This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.