Examining Variations of Prominent Features in Genre Classification
TLDR
We use classifiers independently modeled on three groups of features to examine six genre classes, showing that the strongest features for making one classification are not necessarily the best features for carrying out another.
  • 35
  • 3
Automatically structuring domain knowledge from text: An overview of current research
TLDR
This paper presents an overview of automatic methods for building domain knowledge structures (domain models) from text collections.
  • 42
  • 1
Building a document genre corpus: a profile of the KRYS I corpus
TLDR
This paper describes the KRYS I corpus, consisting of documents classified into 70 genre classes.
  • 9
  • 1
CLEAR: a credible method to evaluate website archivability
TLDR
We introduce the Credible Live Evaluation of Archive Readiness (CLEAR) method, a set of metrics to quantify the level of archivability of any website.
  • 17
  • 1
AutoEval: An Evaluation Methodology for Evaluating Query Suggestions Using Query Logs
TLDR
AutoEval is an evaluation methodology that assesses the quality of query modifications generated by a model using the query logs of past user interactions with a search engine.
  • 15
  • 1
Detecting Family Resemblance: Automated Genre Classification
TLDR
This paper presents results in automated genre classification of digital documents in PDF format.
  • 9
  • 1
Genre Classification in Automated Ingest and Appraisal Metadata
TLDR
Metadata creation is a crucial aspect of the ingest of digital materials into digital libraries.
  • 17
  • 1
Formulating Representative Features with Respect to Genre Classification
TLDR
Document classification is one of the most fundamental steps in enabling the search, selection, and ranking of digital material according to its relevance in answering a predefined search.
  • 5
  • 1
"The Naming of Cats": Automated Genre Classification
TLDR
This paper builds on the work presented at ECDL 2006 in automated genre classification as a step toward automating metadata extraction from digital documents for ingest into digital repositories such as those run by archives.
  • 14
  • 1
Digital forensics formats: seeking a digital preservation storage format for web archiving
TLDR
In this paper we discuss archival storage formats from the point of view of digital curation and preservation.
  • 2
  • 1