Apache PDFBox is an open source pure-Java library that can be used to create, render, print, split, merge, alter, verify and extract text and meta-data of PDF files.
Open Hub reports over 11,000 commits (since the start as an Apache project) by 18 contributors, representing more than 140,000 lines of code. PDFBox has a well-established, mature codebase maintained by an average-size development team with increasing year-over-year commits. Using the COCOMO model, it took an estimated 46 person-years of effort.
Structure
Apache PDFBox has these components:
* PDFBox: the main part
* FontBox: handles font information
* XmpBox: handles XMP metadata
* Preflight (optional): checks PDF files for PDF/A-1b conformity.
History
PDFBox was started in 2002 on SourceForge by Ben Litchfield, who wanted to be able to extract text from PDF files for Lucene. It became an Apache Incubator project in 2008, and an Apache top-level project in 2009.
Preflight was originally named PaDaF; it was developed by Atos Worldline and donated to the project in 2011.
In February 2015, Apache PDFBox was named an Open Source Partner Organization of the PDF Association.
Apache™ PDFBox™ named an Open Source Partner Organization of the PDF Association
February 3, 2015
See also
* List of PDF software
References
External links
Apache PDFBox Project