Sqoop is a command-line interface application for transferring data between relational databases and Hadoop.
The Apache Sqoop project was retired in June 2021 and moved to the Apache Attic.
Description
Sqoop supports incremental loads of a single table or a free-form SQL query, as well as saved jobs, which can be run multiple times to import updates made to a database since the last import. Imports can also be used to populate tables in Hive or HBase. Exports can be used to put data from Hadoop into a relational database. The name Sqoop comes from "SQL-to-Hadoop".
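Append-mode incremental imports can be modeled in a few lines. The following is a minimal sketch, not Sqoop's implementation. It assumes Python's built-in `sqlite3` module as a stand-in for the source relational database, a plain list as a stand-in for the files written to Hadoop, and a dictionary as a stand-in for the state a saved job keeps between runs. The `customers` table, `incremental_append_import` and `run_demo` are hypothetical names chosen for illustration. Each run imports only rows whose check column (`id` here) is greater than the last recorded value, then advances that value. This mirrors the behavior Sqoop 1 selects with `--incremental append`, `--check-column` and `--last-value`.

```python
import sqlite3


def incremental_append_import(conn, state, target):
    """Hypothetical model of an append-mode incremental import.

    Copies rows whose check column (id) exceeds state["last_value"]
    into target, then records the largest id imported so the next
    run skips rows that were already copied.
    """
    rows = conn.execute(
        "SELECT id, name FROM customers WHERE id > ? ORDER BY id",
        (state["last_value"],),
    ).fetchall()
    target.extend(rows)  # stand-in for writing new files to HDFS
    if rows:
        state["last_value"] = rows[-1][0]  # rows are ordered by id
    return len(rows)


def run_demo():
    # In-memory database standing in for the source RDBMS.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )

    job_state = {"last_value": 0}  # stand-in for a saved job's stored state
    imported = []                  # stand-in for the data already in Hadoop

    first = incremental_append_import(conn, job_state, imported)

    # A new row arrives in the source database between runs.
    conn.execute("INSERT INTO customers (id, name) VALUES (3, 'Edsger')")
    second = incremental_append_import(conn, job_state, imported)

    conn.close()
    return first, second, job_state["last_value"], imported


if __name__ == "__main__":
    print(run_demo())
```

The first run copies both existing rows and the second copies only the new one. Because append mode looks only for larger check-column values, rows updated in place are not picked up again; Sqoop's separate `lastmodified` mode, which tracks a timestamp column, is intended for that case.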
Sqoop became a top-level Apache project in March 2012.
Informatica provides a Sqoop-based connector in version 10.1 and later.
Pentaho has provided open-source Sqoop-based connector steps, ''Sqoop Import'' and ''Sqoop Export'', in its ETL suite Pentaho Data Integration since version 4.5 of the software.
Microsoft uses a Sqoop-based connector to help transfer data from Microsoft SQL Server databases to Hadoop.
Couchbase, Inc. also provides a Couchbase Server-Hadoop connector by means of Sqoop.
See also
* Apache Hadoop
* Apache Hive
* Apache Accumulo
* Apache HBase
External links
* Sqoop Wiki
* Sqoop Users Mailing List Archives