PDFBox

Apache PDFBox is an open source pure-Java library that can be used to create, render, print, split, merge, alter, verify and extract text and metadata of PDF files. Open Hub reports over 11,000 commits (since the start as an Apache project) by 18 contributors representing more than 140,000 lines of code. PDFBox has a well-established, mature codebase maintained by an average-size development team with increasing year-over-year commits. Using the COCOMO model, it took an estimated 46 person-years of effort.
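
As a rough illustration of how a COCOMO-style figure is derived from a line count, the following sketch applies the Basic COCOMO "organic mode" formula, effort = 2.4 × KLOC^1.05 person-months. The coefficients are the textbook values from Boehm's 1981 model; that Open Hub uses exactly these values, and the class and method names, are assumptions made for this example.

```java
class CocomoEstimate {
    // Basic COCOMO "organic mode" coefficients (Boehm, 1981).
    // Assumption: the text does not say which coefficients Open Hub applies.
    static final double A = 2.4;
    static final double B = 1.05;

    /** Estimated effort in person-months for a code base of the given size. */
    static double effortPersonMonths(double linesOfCode) {
        double kloc = linesOfCode / 1000.0; // the model works in thousands of lines
        return A * Math.pow(kloc, B);
    }

    /** Same estimate expressed in person-years (12 person-months per year). */
    static double effortPersonYears(double linesOfCode) {
        return effortPersonMonths(linesOfCode) / 12.0;
    }

    public static void main(String[] args) {
        // 140,000 is only the lower bound quoted in the text ("more than 140,000 lines").
        double loc = 140_000;
        System.out.printf("%.0f lines -> %.1f person-years%n", loc, effortPersonYears(loc));
    }
}
```

Under these assumptions, the 140,000-line lower bound gives roughly 36 person-years. The reported 46 person-years would correspond to a somewhat larger line count, which is consistent with the "more than 140,000" wording.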


Structure

Apache PDFBox has these components:
* PDFBox: the main part
* FontBox: handles font information
* XmpBox: handles XMP metadata
* Preflight (optional): checks PDF files for PDF/A-1b conformity


History

PDFBox was started in 2002 on SourceForge by Ben Litchfield, who wanted to be able to extract text from PDF files for Lucene. It became an Apache Incubator project in 2008, and an Apache top-level project in 2009. Preflight was originally named PaDaF and developed by Atos Worldline, and donated to the project in 2011. In February 2015, Apache PDFBox was named an Open Source Partner Organization of the PDF Association ("Apache™ PDFBox™ named an Open Source Partner Organization of the PDF Association", February 3, 2015).


See also

* List of PDF software


External links


Apache PDFBox Project