Sqoop is a command-line interface application for transferring data between relational databases and Hadoop.
The Apache Sqoop project was retired in June 2021 and moved to the Apache Attic.
Description
Sqoop supports incremental loads of a single table or a free-form SQL query, as well as saved jobs that can be run multiple times to import updates made to a database since the last import. Imports can also be used to populate tables in Hive or HBase. Exports can be used to put data from Hadoop into a relational database. The name Sqoop comes from "SQL-to-Hadoop".
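The incremental import, saved job, and export described above can be sketched as argument lists for the `sqoop` command. This is a minimal illustration, not part of Sqoop itself: it assumes the Sqoop 1 command-line tools `import`, `export`, and `job` with the flags `--connect`, `--table`, `--target-dir`, `--incremental`, `--check-column`, `--last-value`, and `--export-dir`. The helper function names, the JDBC URL, the table names, and the HDFS paths are all hypothetical.

```python
import shlex
from typing import List


def incremental_import_args(connect: str, table: str, check_column: str,
                            last_value: str, target_dir: str) -> List[str]:
    """Arguments for an append-mode incremental import of a single table.

    Only rows whose check column exceeds last_value are imported.
    """
    return [
        "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", last_value,
    ]


def saved_job_create_args(job_name: str, tool_args: List[str]) -> List[str]:
    """Wrap a tool invocation as a saved job definition.

    The bare "--" separates the job options from the wrapped tool arguments.
    """
    return ["job", "--create", job_name, "--"] + tool_args


def export_args(connect: str, table: str, export_dir: str) -> List[str]:
    """Arguments for exporting files under export_dir into a database table."""
    return [
        "export",
        "--connect", connect,
        "--table", table,
        "--export-dir", export_dir,
    ]


def to_command(args: List[str], executable: str = "sqoop") -> str:
    """Render an argument list as a shell-quoted command line."""
    return shlex.join([executable] + args)


if __name__ == "__main__":
    # Hypothetical connection string, tables, and paths, for illustration only.
    url = "jdbc:mysql://db.example.com/sales"
    import_args = incremental_import_args(url, "orders", "id", "0", "/data/orders")

    print(to_command(saved_job_create_args("orders_sync", import_args)))
    print(to_command(["job", "--exec", "orders_sync"]))
    print(to_command(export_args(url, "orders_summary", "/data/orders_summary")))
```

Defining the incremental import as a saved job, rather than running it directly, matters because the job can then be executed repeatedly to pick up rows added since the previous run, which is the "run multiple times" behavior the text describes. Building argument lists instead of a single string also keeps quoting separate from the choice of flags.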
Sqoop became a top-level Apache project in March 2012.
Informatica provides a Sqoop-based connector from version 10.1.
Pentaho provides open-source Sqoop-based connector steps, Sqoop Import and Sqoop Export, in its ETL suite Pentaho Data Integration since version 4.5 of the software.
Microsoft uses a Sqoop-based connector to help transfer data from Microsoft SQL Server databases to Hadoop.
Couchbase, Inc. also provides a Couchbase Server-Hadoop connector by means of Sqoop.
See also
* Apache Hadoop
* Apache Hive
* Apache Accumulo
* Apache HBase
External links
* Sqoop Wiki
* Sqoop Users Mailing List Archives