Apache PDFBox is an open-source, pure-Java library that can be used to create, render, print, split, merge, alter, and verify PDF files, and to extract their text and metadata. Open Hub reports over 11,000 commits (since the start as an Apache project) by 18 contributors, representing more than 140,000 lines of code. PDFBox has a well-established, mature codebase maintained by an average-size development team with increasing year-over-year commits. Using the COCOMO model, the project represents an estimated 46 person-years of effort.
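
To make the COCOMO figure above more concrete, here is a minimal sketch of how such an effort estimate can be computed from a line count. It assumes the Basic COCOMO "organic" coefficients (2.4 and 1.05); the text does not state which parameters were used for the quoted estimate, and the class and method names are hypothetical.

```java
public class CocomoEstimate {
    // Assumption: Basic COCOMO, "organic" mode coefficients.
    // The parameters behind the figure quoted in the article are not given.
    static final double A = 2.4;
    static final double B = 1.05;

    /** Estimated effort in person-months for the given number of source lines. */
    public static double personMonths(long linesOfCode) {
        double kloc = linesOfCode / 1000.0; // the model works in thousands of lines
        return A * Math.pow(kloc, B);
    }

    /** Same estimate, converted to person-years. */
    public static double personYears(long linesOfCode) {
        return personMonths(linesOfCode) / 12.0;
    }

    public static void main(String[] args) {
        long loc = 140_000; // lower bound quoted in the article ("more than 140,000 lines")
        System.out.printf("%d lines -> %.1f person-years%n", loc, personYears(loc));
    }
}
```

With these assumed coefficients, 140,000 lines works out to roughly 36 person-years. Because the article gives the line count only as a lower bound, a larger actual count or different parameters would account for the higher quoted figure.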


Structure

Apache PDFBox has these components:
* PDFBox: the main part
* FontBox: handles font information
* XmpBox: handles XMP metadata
* Preflight (optional): checks PDF files for PDF/A-1b conformity


History

PDFBox was started in 2002 on SourceForge by Ben Litchfield, who wanted to be able to extract text from PDF files for Lucene. It became an Apache Incubator project in 2008, and an Apache top-level project in 2009. Preflight was originally named PaDaF and developed by Atos Worldline, and was donated to the project in 2011. In February 2015, Apache PDFBox was named an Open Source Partner Organization of the PDF Association. (Source: "Apache™ PDFBox™ named an Open Source Partner Organization of the PDF Association", February 3, 2015.)


See also

* List of PDF software


External links


Apache PDFBox Project