
Apache Object Oriented Data Technology (OODT) is an open source data management system framework that is managed by the Apache Software Foundation. OODT was originally developed at NASA Jet Propulsion Laboratory to support capturing, processing and sharing of data for NASA's scientific archives.


History

The project started out as an internal NASA Jet Propulsion Laboratory project initiated by Daniel J. Crichton, Sean Kelly and Steve Hughes. The early focus of the effort was on information integration and search using XML, as described in Crichton et al.'s paper at the CODATA meeting in 2000. After OODT was deployed to the Planetary Data System and to the National Cancer Institute's Early Detection Research Network (EDRN) project, it moved in 2005 into the era of large-scale data processing and management via NASA's Orbiting Carbon Observatory (OCO) project. OODT's role on OCO was to usher in a new data management processing framework that, instead of tens of jobs per day and tens of gigabytes of data, would handle 10,000 jobs per day and hundreds of terabytes of data. Dr. Chris Mattmann at NASA JPL led a team of three to four developers between 2005 and 2009 that completely re-engineered OODT to support these new requirements.

Influenced by the emerging efforts in Apache Nutch and Hadoop, in which Mattmann participated, OODT was also reworked to make it more amenable to Apache Software Foundation style projects. In addition, Mattmann had a close relationship with Dr. Justin Erenkrantz, who was the Apache Software Foundation President at the time, and the idea to bring OODT to the Apache Software Foundation emerged. In 2009, Mattmann and his team received approval from NASA and from JPL to bring OODT to Apache, making it the first NASA project to be stewarded by the foundation. Seven years later, the project released version 1.0.


Features

OODT focuses on two canonical use cases: Big Data processing and information integration. Both were described in Mattmann's ICSE 2006 and SMC-IT 2009 papers. It provides three core services.


File Manager

A File Manager is responsible for tracking file locations and their metadata, and for transferring files from a staging area to controlled-access storage.
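
As a rough illustration of these responsibilities, the following Python sketch models a catalog that records each file's location and metadata and moves the file from a staging directory into an archive directory. The class `ToyFileManager`, its `ingest` method and the `demo` helper are hypothetical names invented for this example; they are not part of the OODT API, and the real File Manager is a separate Java service.

```python
import shutil
import tempfile
from pathlib import Path


class ToyFileManager:
    """Hypothetical stand-in for a file manager; not the OODT API."""

    def __init__(self, archive_dir):
        self.archive_dir = Path(archive_dir)
        self.archive_dir.mkdir(parents=True, exist_ok=True)
        self.catalog = {}  # file name -> {"location": ..., "metadata": ...}

    def ingest(self, staged_path, metadata):
        """Move a staged file into the archive and record it in the catalog."""
        staged_path = Path(staged_path)
        target = self.archive_dir / staged_path.name
        shutil.move(str(staged_path), str(target))
        self.catalog[staged_path.name] = {
            "location": target,
            "metadata": dict(metadata),
        }
        return target


def demo():
    """Stage one file, ingest it, and report what the catalog now knows."""
    with tempfile.TemporaryDirectory() as root:
        staging = Path(root) / "staging"
        staging.mkdir()
        staged = staging / "granule.dat"
        staged.write_text("example payload")

        manager = ToyFileManager(Path(root) / "archive")
        manager.ingest(staged, {"mission": "example"})
        entry = manager.catalog["granule.dat"]
        # (still in staging?, present in archive?, recorded metadata)
        return staged.exists(), entry["location"].exists(), entry["metadata"]
```

Calling `demo()` stages a single file, ingests it, and reports whether the file left the staging area, whether it exists in the archive, and which metadata was recorded.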


Workflow Manager

A Workflow Manager captures control flow and data flow for complex processes, and allows for reproducibility and the construction of scientific pipelines.
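
The idea of capturing control flow and data flow can be sketched in Python as a small dependency graph of tasks that runs in a reproducible order, with each task receiving the results of the tasks it depends on. The function `run_workflow` and the three task names are assumptions made for this illustration and do not reflect how OODT defines or schedules workflows.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, dependencies):
    """Run tasks in dependency order, passing results downstream.

    tasks: task name -> function taking a dict of upstream results.
    dependencies: task name -> set of task names it depends on.
    """
    results = {}
    # static_order() yields every task after all of its dependencies.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return order, results


# Hypothetical three-step pipeline: calibrate -> grid -> summarize.
TASKS = {
    "calibrate": lambda up: [x * 2 for x in (1, 2, 3)],
    "grid": lambda up: sum(up["calibrate"]),
    "summarize": lambda up: f"total={up['grid']}",
}
DEPENDENCIES = {
    "calibrate": set(),
    "grid": {"calibrate"},
    "summarize": {"grid"},
}
```

Because the execution order is derived from the declared dependencies rather than from the order in which tasks were written down, running `run_workflow(TASKS, DEPENDENCIES)` repeatedly yields the same sequence and the same results.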


Resource Manager

A Resource Manager handles allocation of Workflow Tasks and other jobs to underlying resources. For example, Python jobs go to nodes with Python installed, and jobs that require a large disk or substantial CPU capacity are sent to nodes that fulfill those requirements. In addition to the three core services, OODT provides three client-oriented frameworks that build on these services.
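
The Resource Manager's matching of jobs to capable nodes might be sketched as follows. This is a simplified first-fit model written for illustration only: the `Node` and `Job` classes, the `pick_node` function and the sample nodes are all hypothetical, and the scheduling policy OODT actually uses is not described here.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    software: frozenset
    disk_gb: int
    cpus: int


@dataclass
class Job:
    name: str
    needs_software: frozenset = frozenset()
    min_disk_gb: int = 0
    min_cpus: int = 1


def pick_node(job, nodes):
    """Return the first node that satisfies every requirement, or None."""
    for node in nodes:
        if (job.needs_software <= node.software
                and node.disk_gb >= job.min_disk_gb
                and node.cpus >= job.min_cpus):
            return node
    return None


# Hypothetical cluster used only for this example.
NODES = [
    Node("small", frozenset(), disk_gb=100, cpus=2),
    Node("py", frozenset({"python"}), disk_gb=200, cpus=4),
    Node("big", frozenset({"python"}), disk_gb=4000, cpus=32),
]
```

With these sample nodes, a job that needs Python is placed on `py`, a job that needs 1,000 GB of disk is placed on `big`, and a job whose requirements no node meets gets `None`.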


File Crawler

A File Crawler automatically extracts metadata and uses Apache Tika to identify file types and ingest the associated information into the File Manager.
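
The crawl-and-describe step can be approximated in a few lines of Python. Note the substitution: this sketch uses the standard library's `mimetypes` module, which guesses a type from the file extension only, as a stand-in for Apache Tika, which inspects file content. The `crawl` and `demo` functions and the record fields are hypothetical, and the hand-off to a file manager is left out.

```python
import mimetypes
import tempfile
from pathlib import Path


def crawl(directory):
    """Return one metadata record per file found under the directory."""
    records = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            # Extension-based guess; a stand-in for content-based detection.
            mime_type, _ = mimetypes.guess_type(path.name)
            records.append({
                "name": path.name,
                "size_bytes": path.stat().st_size,
                "mime_type": mime_type or "application/octet-stream",
            })
    return records


def demo():
    """Create two small files in a temporary directory and crawl them."""
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "notes.txt").write_text("hello")
        (Path(root) / "page.html").write_text("<p>hi</p>")
        return crawl(root)
```

Each record returned by `crawl` is the kind of per-file description that an ingestion step could then pass on to a catalog.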


Catalog and Archive Crawling Framework

A Push/Pull framework acquires remote files and makes them available to the system.


Catalog and Archive Service Production Generation Executive (CAS-PGE)

A scientific algorithm wrapper (called CAS-PGE, for Catalog and Archive Service Production Generation Executive) encapsulates scientific codes and allows them to execute independently of their environment, capturing provenance while doing so and making the algorithms easy to integrate into a production system.


CAS RESTful Services

A set of RESTful APIs that expose the capabilities of the File Manager, Workflow Manager and Resource Manager components.


OPSUI Monitor Dashboard

A web application that exposes services from the underlying OODT product, workflow and resource management control systems via the JAX-RS specification. It is currently built using Apache Wicket components.

The overall motivation for OODT's re-architecting was described in a 2013 paper by Mattmann in Nature called A Vision for Data Science. OODT is written in Java and, through its REST API, is used from other languages including Python.


Notable uses

OODT has been highlighted as contributing to NASA missions including Soil Moisture Active Passive and New Horizons. OODT also helps to power the Square Kilometre Array telescope, extending its use from Earth science and planetary science to radio astronomy and other sectors. OODT is also used within bioinformatics and is a part of the Knowledgent Big Data Platform.


External links

* http://oodt.apache.org