A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications

Dongyeop Kang, Waleed Ammar, Bhavana Dalvi, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard H. Hovy, Roy Schwartz
Peer reviewing is a central component of the scientific publishing process. We present the first public dataset of scientific peer reviews available for research purposes (PeerRead v1), providing an opportunity to study this important artifact. The dataset consists of 14.7K paper drafts and the corresponding accept/reject decisions in top-tier venues including ACL, NIPS and ICLR. The dataset also includes 10.7K textual peer reviews written by experts for a subset of the papers. We describe the…


COMPARE: A Taxonomy and Dataset of Comparison Discussions in Peer Reviews

From a thorough observation of a large set of review sentences, a taxonomy of categories in comparison discussions is built, and a detailed annotation scheme is presented to analyze them.

Uncertainty-Aware Machine Support for Paper Reviewing on the Interspeech 2019 Submission Corpus

A comparative study of state-of-the-art text modelling methods on the newly crafted, largest review dataset of its kind based on Interspeech 2019, and the first to explore uncertainty-aware methods (soft labels, quantile regression) to address the subjectivity inherent in this problem.
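Quantile regression, one of the uncertainty-aware methods mentioned above, minimizes the so-called pinball loss. The sketch below is illustrative only (not the paper's implementation); it shows the standard definition of the loss for a quantile level tau:

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss for quantile level tau in (0, 1).

    Under-prediction is penalized with weight tau and over-prediction
    with weight (1 - tau), so minimizing this loss estimates the
    tau-quantile of the target distribution.
    """
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))
```

At tau = 0.5 the loss reduces to half the mean absolute error, i.e. median regression; higher tau values produce predictions that deliberately err on the low side less often, which is one way to express uncertainty about subjective review scores.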

Argument Mining for Understanding Peer Reviews

The content and structure of peer reviews under the argument mining framework is studied, through automatically detecting the argumentative propositions put forward by reviewers, and their types (e.g., evaluating the work or making suggestions for improvement).

DeepASPeer: Towards an Aspect-level Sentiment Controllable Framework for Decision Prediction from Academic Peer Reviews

This work studies how to leverage aspects and their corresponding sentiment to build a generic, controllable system that assists the editor/chair in determining a paper's outcome from its reviews and making better editorial decisions.

Can We Automate Scientific Reviewing?

Comprehensive experimental results show that system-generated reviews tend to touch upon more aspects of the paper than human-written reviews, but the generated text can suffer from lower constructiveness for all aspects except the explanation of the papers' core ideas, which it conveys in a largely factually correct way.

A Neural Citation Count Prediction Model based on Peer Review Text

This paper takes the initiative to utilize peer review data for the CCP task with a neural prediction model, incorporating an abstract-review match mechanism and a cross-review match mechanism to learn deep features from peer review text.

Multi-task Peer-Review Score Prediction

A multi-task shared structure encoding approach that automatically selects good shared network structures as well as good auxiliary resources to improve performance on the target task.

Does My Rebuttal Matter? Insights from a Major NLP Conference

The results suggest that a reviewer’s final score is largely determined by her initial score and the distance to the other reviewers’ initial scores, which could help better assess the usefulness of the rebuttal phase in NLP conferences.

Between Acceptance and Rejection: Challenges for an Automatic Peer Review Process

The general performance of existing state-of-the-art models for the RSP and PDP tasks is evaluated, investigating which types of instances these models tend to have difficulty classifying and how impactful they are; groups of instances that can negatively affect the models' performance are identified.

DeepSentiPeer: Harnessing Sentiment in Review Texts to Recommend Peer Review Decisions

The role of reviewer sentiment embedded within peer review texts in predicting the peer review outcome is investigated, and a deep neural architecture is proposed that takes into account three channels of information: the paper, the corresponding reviews, and the reviews' polarity, to predict the overall recommendation score as well as the final decision.

Content-Based Citation Recommendation

It is shown empirically that, although adding metadata improves performance on standard metrics, it favors self-citations, which are less useful in a citation recommendation setup; an online portal for citation recommendation based on this method is also released.

RevRank: A Fully Unsupervised Algorithm for Selecting the Most Helpful Book Reviews

A novel method for content analysis, especially suitable for product reviews, is presented, and RevRank is shown to clearly outperform a baseline imitating the user vote model used by Amazon.

A Quantitative Analysis of Peer Review

A number of unexpected results were found, in particular the low correlation between peer review outcome and impact in time of the accepted contributions and the presence of a high level of randomness in the analyzed peer review processes.

Single versus Double Blind Reviewing at WSDM 2017

In this paper we study the implications for conference program committees of using single-blind reviewing, in which committee members are aware of the names and affiliations of paper authors, versus double-blind reviewing, in which they are not.

Deep Unordered Composition Rivals Syntactic Methods for Text Classification

This work presents a simple deep neural network that competes with and, in some cases, outperforms such models on sentiment analysis and factoid question answering tasks while taking only a fraction of the training time.
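The "deep unordered composition" idea (average the input's word embeddings, then apply feed-forward layers, as in deep averaging networks) can be sketched in a few lines. The shapes and layer sizes below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def deep_averaging_network(embeddings, weights, biases):
    """Average word embeddings, then apply feed-forward layers.

    embeddings: (num_words, d) matrix of the input's word vectors.
    weights/biases: per-layer parameter lists.
    """
    h = embeddings.mean(axis=0)          # unordered composition: just average
    for W, b in zip(weights, biases):
        h = np.tanh(W @ h + b)           # nonlinear transformation per layer
    return h

# Toy forward pass: 5 words, 8-dim embeddings, two 8->8 hidden layers.
emb = rng.normal(size=(5, 8))
Ws = [rng.normal(size=(8, 8)) for _ in range(2)]
bs = [np.zeros(8) for _ in range(2)]
out = deep_averaging_network(emb, Ws, bs)
```

Because the composition ignores word order entirely, the model is cheap to train; the depth of the feed-forward stack, rather than syntax, is what gives it its expressive power.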

Journal ratings as predictors of articles quality in Arts, Humanities and Social Sciences: an analysis based on the Italian Research Evaluation Exercise

A positive relationship between peer evaluation and journal ranking is interpreted as evidence that journal ratings are good predictors of article quality, while also providing the first large-scale test of the robustness of expert-based classification.

Popularity of arXiv.org within Computer Science

The percentage of computer science papers placed on the arXiv is measured by cross-referencing published papers in DBLP with e-prints on arXiv; this fraction has risen dramatically among the most selective conferences in computer science.

NIH peer review percentile scores are poorly predictive of grant productivity

It is reported that these percentile scores awarded by peer review panels are a poor discriminator of productivity, which underscores the limitations of peer review as a means of assessing grant applications in an era when typical success rates are often as low as about 10%.

Reducing human effort and improving quality in peer code reviews using automatic static analysis and reviewer recommendation

Vipin Balachandran. 2013 35th International Conference on Software Engineering (ICSE), 2013.
Through a user study, it is shown that integrating static analysis tools with the code review process can improve the quality of code review, and a tool called Review Bot is proposed for integrating automatic static analysis with the code review process.

GloVe: Global Vectors for Word Representation

A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
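The GloVe objective is a weighted least-squares fit of word-vector dot products to log co-occurrence counts. A minimal NumPy sketch of the loss follows; the weighting constants x_max and alpha match the published defaults, but everything else here is illustrative rather than the reference implementation:

```python
import numpy as np

def glove_loss(W, W_tilde, b, b_tilde, X, x_max=100.0, alpha=0.75):
    """Weighted least-squares objective of the GloVe model.

    For each nonzero co-occurrence count X[i, j], the model fits
    w_i . w~_j + b_i + b~_j to log X[i, j], weighted by
    f(X) = min(1, (X / x_max) ** alpha), which damps rare pairs
    and caps the influence of very frequent ones.
    """
    i, j = np.nonzero(X)
    f = np.minimum(1.0, (X[i, j] / x_max) ** alpha)
    pred = np.sum(W[i] * W_tilde[j], axis=1) + b[i] + b_tilde[j]
    return np.sum(f * (pred - np.log(X[i, j])) ** 2)
```

The two vector sets W and W_tilde play symmetric roles (the "global matrix factorization" side), while the counts X come from local context windows, which is how the model combines the two families mentioned above.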