PDFBox

Apache PDFBox is an open-source, pure-Java library that can be used to create, render, print, split, merge, alter, verify and extract text and meta…

Papers overview

2015

Superior to state-of-the-art approaches which compete in recognizing tables among 67 annotated government reports (PDF) released…

2015

Superior to state-of-the-art approaches which compete in table recognition with 67 annotated government reports in PDF format…

2014

Purpose – The purpose of this paper is to propose an automatic metadata extraction and retrieval system to extract…

2013

We present Icecite, a new fully web-based research paper management system (RPMS). Icecite facilitates the following otherwise…

2012

The Association for Library Collections and Technical Services (ALCTS) defines the goal of digital preservation as “the accurate…

2012

Text preprocessing and segmentation are critical tasks in search and text mining applications. Due to the huge amount of…

2012

Reference lists are usually given at the end of a scientific document, stating the sources used and pointers to…

2011

In our day-to-day life we come across unstructured data in many forms. These include books, journals, audio/video files and…

2010

By analyzing the characteristics of the multi-classification support vector machine, a BBT-SVM model was established, and the relevant…

1998

This report is the manual for the Matlab toolboxes pdfbox and dlqgbox. The toolboxes are used for design and analysis of systems…