PDFBox
Apache PDFBox is an open source pure-Java library that can be used to create, render, print, split, merge, alter, verify and extract text and meta…
Wikipedia
Related topics
8 relations
(formerly Ohloh)
COCOMO
Extensible Metadata Platform
Java
Papers overview
Semantic Scholar uses AI to extract papers important to this topic.
Open-domain Table Detection Using Large-scale PDF Files without Annotation
M. Fan, Doo Soon Kim
2015
Corpus ID: 11967133
Superior to state-of-the-art approaches which compete in recognizing tables among 67 annotated government reports (PDF) released…
Detecting Table Region in PDF Documents Using Distant Supervision
M. Fan, Doo Soon Kim
2015
Corpus ID: 14348894
Superior to state-of-the-art approaches which compete in table recognition with 67 annotated government reports in PDF format…
Extracting bibliographical data for PDF documents with HMM and external resources
W. Hsiao, Te-Min Chang, Thomas Erwin
Program
2014
Corpus ID: 9996467
Purpose – The purpose of this paper is to propose an automatic metadata extraction and retrieval system to extract…
The Icecite Research Paper Management System
Hannah Bast, Claudius Korzen
WISE
2013
Corpus ID: 19397656
We present Icecite, a new fully web-based research paper management system (RPMS). Icecite facilitates the following otherwise…
The Network is the Format: PDF and the Long-term Use of Digital Content
Sheila Morrissey
Archiving Conference
2012
Corpus ID: 15465259
The Association for Library Collections and Technical Services (ALCTS) defines the goal of digital preservation as “the accurate…
Improving the Extraction of Text in PDFs by Simulating the Human Reading Order
Ismael Hasan, Javier Parapar, Álvaro Barreiro
Journal of universal computer science (Online)
2012
Corpus ID: 6251760
Text preprocessing and segmentation are critical tasks in search and text mining applications. Due to the huge amount of…
Automatische Referenzextraktion mit PARSCIT (Automatic Reference Extraction with PARSCIT)
Karima Haddou ou Moussa, Philipp Mayr
2012
Corpus ID: 61665559
Reference lists are usually given at the end of a scientific document, listing the sources used and pointers to…
Text mining: Finding right documents from large collection of unstructured documents
Savidu Amarakoon, A. Caldera
The 3rd International Conference on Data Mining…
2011
Corpus ID: 14173759
In our day-to-day life we come across unstructured data in many forms. These include books, journals, audio/video files and…
Research of paper metadata extraction method based on SVM
Lu Le-bin
2010
Corpus ID: 63527625
By analyzing the characteristics of multi-classification support vector machine, BBT-SVM model was established, and the relevant…
Two Toolboxes for Systems with Random Delays
J. Nilsson
1998
Corpus ID: 64037714
This report is the manual for the Matlab toolboxes pdfbox and dlqgbox. The toolboxes are used for design and analysis of systems…