Apache PDFBox is an open-source, pure-Java library for creating, rendering, printing, splitting, merging, altering, and verifying PDF files, and for extracting their text and metadata.
Open Hub reports over 11,000 commits (since the project's start at Apache) by 18 contributors, representing more than 140,000 lines of code. It characterizes PDFBox as a well-established, mature codebase maintained by an average-size development team with increasing year-over-year commits. Using the COCOMO model, Open Hub estimates that the project took 46 person-years of effort.
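To illustrate how a COCOMO-style estimate turns a line count into effort, here is a minimal sketch. It assumes the Basic COCOMO "organic" coefficients (2.4 and 1.05) and uses the rounded figure of 140,000 lines from the text. Open Hub's exact inputs and parameters are not stated here, so the class name, the method names, and the chosen coefficients are illustrative assumptions, and the result need not match the 46 person-years quoted above.

```java
// Hypothetical sketch of a Basic COCOMO effort estimate (organic mode).
// The coefficients are the textbook organic-mode values, assumed here;
// they are not confirmed as the ones Open Hub applies to PDFBox.
class CocomoSketch {
    static final double A = 2.4;   // organic-mode multiplier (assumed)
    static final double B = 1.05;  // organic-mode exponent (assumed)

    // Effort in person-months for a codebase of `kloc` thousand lines.
    static double effortPersonMonths(double kloc) {
        return A * Math.pow(kloc, B);
    }

    // Same effort expressed in person-years.
    static double effortPersonYears(double kloc) {
        return effortPersonMonths(kloc) / 12.0;
    }

    public static void main(String[] args) {
        double kloc = 140.0; // "more than 140,000 lines of code", rounded down
        System.out.printf("Estimated effort: %.1f person-years%n",
                effortPersonYears(kloc));
    }
}
```

With these assumed inputs the estimate comes to roughly 36 person-years. The gap from the reported 46 is expected, since the reported figure rests on Open Hub's own line count and settings.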
Structure
Apache PDFBox has these components:
* PDFBox: the main part
* FontBox: handles font information
* XmpBox: handles XMP metadata
* Preflight (optional): checks PDF files for PDF/A-1b conformity
History
PDFBox was started in 2002 on SourceForge by Ben Litchfield, who wanted to extract text from PDF files for Lucene. It became an Apache Incubator project in 2008 and an Apache top-level project in 2009.
Preflight was originally named PaDaF and was developed by Atos Worldline, which donated it to the project in 2011.
In February 2015, Apache PDFBox was named an Open Source Partner Organization of the PDF Association.
Apache™ PDFBox™ named an Open Source Partner Organization of the PDF Association, February 3, 2015
See also
* List of PDF software
References
External links
Apache PDFBox Project