• Corpus ID: 15523254

Routing documents according to style

Shlomo Engelson Argamon, Moshe Koppel, Galit Avneri

Most research on automated text categorization has focused on determining the topic of a given text. While topic is generally the main characteristic of an information need, there are other characteristics that are useful for information retrieval. In this paper we consider the problem of text categorization according to style. For example, in searching the web, we may wish to automatically determine if a given page is promotional or informative, was written by a native English speaker or not… 

Authorship Attribution of The Golden Lotus Based on Text Classification Methods

It is proved that, among four candidate authors, Wei Xu is most likely to be the author of The Golden Lotus.

A model-driven architecture for enterprise document management, supporting discovery and reuse

This thesis presents the overall Model-driven Reuse Architecture and a preliminary implementation developed to support the specific needs of teaching and learning in higher education, and evaluates the resulting web-based implementation, MRA-HE, in terms of how it performs against a set of realistic scenarios within the domain of higher education.

The Influence of Negative Emotions on Customer Innovation Activities: An Examination Using Sentiment Analysis

It is shown that negative emotion significantly affects innovation activities in the brand community, that frustration is the most influential of the discrete negative emotions, and that the influence of negative emotions increases with their intensity.



Mistake-Driven Learning in Text Categorization

This work studies three mistake-driven learning algorithms for a typical task of this nature, text categorization, and presents an algorithm, a variation of Littlestone's Winnow, which performs significantly better than any other algorithm tested on this task using a similar feature set.

Training algorithms for linear text classifiers

This work proposes that two machine learning algorithms, the Widrow-Hoff and EG algorithms, be used in training linear text classifiers for IR tasks, and theoretical analysis provides performance guarantees and guidance on parameter settings for these algorithms.


ONE element of style which seems to be characteristic of an author, in so far as can be judged from general impressions, is the length of his sentences. This author develops his thought in long,

The Statistical Study of Literary Vocabulary

A NEW book by Mr. Udny Yule is a statistical event which, though of great rarity, is all the more welcome when it occurs. This, however, is not a formal treatment of a subject which has already

Fast Effective Rule Induction

This paper evaluates the recently proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems, and proposes a number of modifications resulting in an algorithm RIPPERk that is very competitive with C4.5 and C4.5rules with respect to error rates, but much more efficient on large samples.

Fast effective rule induction

  • Proceedings of the Twelfth
  • 1995

Applied Bayesian and classical inference : the case of the Federalist papers

Authorship studies/textual statistics.

A Simple Rule-Based Part of Speech Tagger

This work presents a simple rule-based part of speech tagger which automatically acquires its rules and tags with accuracy comparable to stochastic taggers, demonstrating that the stochastic method is not the only viable method for part of speech tagging.