• Publications
Supervised and Traditional Term Weighting Methods for Automatic Text Categorization
  • Man Lan, C. Tan, J. Su, Y. Lu
  • Computer Science, Medicine
  • IEEE Transactions on Pattern Analysis and Machine…
  • 1 April 2009
TLDR
In the vector space model (VSM), text representation is the task of transforming the content of a textual document into a vector in the term space.
  • 479
  • 53
Predicting Discourse Connectives for Implicit Discourse Relation Recognition
TLDR
In this paper we attempt to overcome this difficulty for implicit relation recognition by automatically inserting discourse connectives between arguments with the use of a language model.
  • 107
  • 14
Proposing a New Term Weighting Scheme for Text Categorization
TLDR
We propose a new supervised term weighting scheme for text categorization and investigate its performance on two popular data collections.
  • 103
  • 9
ECNU: One Stone Two Birds: Ensemble of Heterogenous Measures for Semantic Relatedness and Textual Entailment
  • J. Zhao, Tiantian Zhu, Man Lan
  • Computer Science
  • SemEval@COLING
  • 1 August 2014
TLDR
This paper presents our approach to semantic relatedness and textual entailment subtasks organized as task 1 in SemEval 2014.
  • 91
  • 9
A Refined End-to-End Discourse Parser
TLDR
We use 9 components to construct the whole parser to identify discourse connectives, label arguments and classify the sense of Explicit or Non-Explicit relations in free texts.
  • 52
  • 9
Multi-task Attention-based Neural Networks for Implicit Discourse Relationship Representation and Identification
TLDR
We present a novel multi-task attention-based neural network model to address implicit discourse relationship representation and identification through two types of representation learning: an attention-based neural network for learning discourse relationship representation with two arguments, and a multi-task framework for learning knowledge from annotated and unannotated corpora.
  • 42
  • 9
ECNU at SemEval-2017 Task 1: Leverage Kernel-based Traditional NLP features and Neural Networks to Build a Universal Model for Multilingual and Cross-lingual Semantic Textual Similarity
TLDR
We build a universal model that combines traditional NLP methods with deep learning methods, and extensive experimental results show that this combination not only improves performance but also increases robustness when modeling the similarity of multilingual sentences.
  • 46
  • 6
Leveraging Synthetic Discourse Data via Multi-task Learning for Implicit Discourse Relation Recognition
TLDR
We present a multi-task learning based system which can effectively use synthetic data for implicit discourse relation recognition.
  • 45
  • 5
Initialization of cluster refinement algorithms: a review and comparative study
TLDR
This paper reviews the various cluster initialization methods in the literature by categorizing them into three major families, namely random sampling methods, distance optimization methods, and density estimation methods.
  • 108
  • 4
A comparative study on term weighting schemes for text categorization
TLDR
The term weighting scheme, which is used to convert documents into vectors in the term spaces, is a vital step in automatic text categorization.
  • 83
  • 4