In recent years, mining imbalanced data sets has received increasing attention from both theoretical and practical perspectives. This paper introduces the importance of imbalanced data sets and their broad application domains in data mining, and then summarizes the evaluation metrics and the existing methods for evaluating and solving the imbalance problem.
An author may have multiple names, and multiple authors may share the same name, simply because of name abbreviations, identical names, or name misspellings in publications or bibliographies. This can produce name ambiguity, which can affect the performance of document retrieval, web search, and database integration, and may cause improper attribution…
Due to name abbreviations, identical names, name misspellings, and pseudonyms in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web search, and database integration, and may cause improper attribution to authors. This…
Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a Support Vector Machine classification-based method for extracting metadata from the header part of research papers and show that it outperforms other…
Because of name variations, an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web search, and database integration, and may cause improper attribution to authors. This paper presents a hierarchical naive Bayes mixture model, an unsupervised learning approach, for…
Clinical investigations provide substantial evidence that the glucocorticoid receptor (GR) antagonist mifepristone leads to a rapid amelioration of depression. The molecular mechanisms by which mifepristone acts in the treatment of depression are not fully understood. Depression is associated with hippocampal plasticity, for which increased excitatory amino acid…
This paper examines the differences and similarities between two online computer science citation databases, DBLP and CiteSeer. Entries in DBLP are inserted manually, whereas CiteSeer entries are obtained autonomously via a crawl of the Web and automatic processing of user submissions. CiteSeer's autonomous citation database can be…
Acknowledgements in research publications, like citations, indicate influential contributions to scientific work; however, large-scale acknowledgement analyses have traditionally been impractical due to the high cost of manual information extraction. In this paper we describe a mixture method for automatically mining acknowledgements from research documents…
Quantitative susceptibility mapping (QSM) is a novel MRI method for quantifying the magnetic properties of tissue. In the brain, it reflects the molecular composition and microstructure of the local tissue. However, susceptibility maps reconstructed from single-orientation data still suffer from streaking artifacts that obscure structural details and small lesions.
BACKGROUND Few studies have investigated the relationship between anemia, smoking, drinking, and survival in esophageal squamous cell carcinoma (ESCC) treated with primary radiotherapy. This study aimed to evaluate the prognostic value of anemia, smoking, and drinking in patients receiving primary radiotherapy for ESCC. METHODS A total of 79 patients who…