Learn More
SeerSuite is a framework for scientific and academic digital libraries and search engines built by crawling scientific and academic documents from the web with a focus on providing reliable, robust services. In addition to full text indexing, SeerSuite supports autonomous citation indexing and automatically links references in research articles to(More)
CiteSeerX is a digital library search engine that provides access to more than 4 million academic documents with nearly a million users and millions of hits per day. Artificial intelligence (AI) technologies are used in many components of CiteSeerX e.g. to accurately extract metadata, intelligently crawl the web, and ingest documents. We present key AI(More)
The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of(More)
Acknowledgments are widely used in scientific articles to express gratitude and credit collaborators. Despite suggestions that indexing acknowledgments automatically will give interesting insights, there is currently, to the best of our knowledge, no such system to track acknowledgments and index them. In this paper we introduce AckSeer, a search engine and(More)
Online digital libraries that store and index research articles not only make it easier for researchers to search for scientific information, but also have been proven as powerful resources in many data mining, machine learning and information retrieval applications that require high-quality data. The quality of the data available in digital libraries(More)
We introduce a big data platform that provides various services for harvesting scholarly information and enabling efficient scholarly applications. The core architecture of the platform is built on a secured private cloud, crawls data using a scholarly focused crawler that leverages a dynamic scheduler, processes by utilizing a map reduce based(More)
—CiteSeer χ is a digital library that contains approximately 3.5 million scholarly documents and receives between 2 and 4 million requests per day. In addition to making documents available via a public Website, the data is also used to facilitate research in areas like citation analysis, co-author network analysis, scalability evaluation and information(More)
Acknowledgments in scientific and academic papers are a method of expressing appreciation and gratitude between scholars. Inside the acknowledgments section, authors usually thank different entities or individuals for their support, most commonly in terms of grants, suggestions and discussions. Social networks of authors and publications has been well(More)