Greenplum is a
big data technology based on
MPP architecture and the
Postgres open source database technology. The technology was created by a company of the same name headquartered in
San Mateo,
California
around 2005. Greenplum was acquired by
EMC Corporation
in July 2010.
Starting in 2012, its
database management system
software became known as the Pivotal Greenplum Database sold through
Pivotal Software. Pivotal open-sourced the core engine, and development was continued by the Greenplum Database open source community and Pivotal.
In 2020, Pivotal was acquired by
VMware
and VMware continued to sponsor the Greenplum Database open source community and to commercialize the technology under the brand name VMware Tanzu Greenplum.
Company
Greenplum, the company, was founded in September 2003 by Scott Yara and Luke Lonergan. It was a merger of two smaller companies: Metapa (founded in August 2000 near
Los Angeles
) and Didera in
Fairfax, Virginia
.
Investors included SoundView Ventures, Hudson Ventures and Royal Wulff Ventures. A total of in funding was announced at the merger. Greenplum, based in
San Mateo, California
, released its
database management system
software based on
PostgreSQL
in April 2005, calling it Bizgres. Rounds of
venture capital
of about each were invested in March 2006 and February 2007.
In July 2006 a partnership with
Sun Microsystems
was announced. Sun, which had also acquired
MySQL AB
, participated in a round of investment in January 2009, led by
Meritech Capital Partners.
The Bizgres project included a few other members and was supported through about 2008, by which time the product itself was also simply called "Greenplum". The
Sun Fire X4500 was a reference architecture and was used by the majority of customers until a transition was made to
Linux
around that time. Greenplum was acquired by
EMC Corporation
in July 2010, becoming the foundation of EMC's
big data software division.
Although EMC did not disclose the value, it was estimated at . Greenplum's products at the time of acquisition were the Greenplum Database, Chorus (a management tool), and Data Science Labs. Greenplum had customers in
vertical markets including
eBay
. It became part of
Pivotal Software in 2012.
A variant called Hawq, which uses
Apache Hadoop to store data in the Hadoop file system, was announced in 2013. In 2015 the GreenplumDB and Hawq
open source software
projects were announced.
Technology
Pivotal's Greenplum database product uses
massively parallel
processing (MPP) techniques. Each
computer cluster
consists of a master node, a standby master node, and segment nodes.
All of the data resides on the segment nodes, and the catalog information is stored on the master nodes. Segment nodes run one or more segments, which are modified PostgreSQL database instances, each assigned a content identifier. For each table, the data is divided among the segment nodes based on the distribution column keys specified by the user in the
data definition language. For each segment content identifier there is both a primary segment and a mirror segment, which do not run on the same physical host. When a query enters the master node, it is parsed, planned, and dispatched to all of the segments, which execute the query plan and either return the requested data or insert the result of the query into a database table. The
Structured Query Language, version
SQL:2003, is used to present queries to the system. Transaction semantics comply with constraints known as
ACID.
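The layout described above can be illustrated with a minimal Python sketch. It is a toy model, not Greenplum code: the host names, the segment count, the helper functions, and the use of CRC32 as the hash are all assumptions made for illustration, and Greenplum's actual hashing and placement logic differ. It shows rows being divided among segments by a distribution key, each content identifier getting a primary and a mirror on different hosts, and a "master" combining partial results from every segment.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4          # hypothetical number of content identifiers (0..3)
HOSTS = ["sdw1", "sdw2"]  # hypothetical segment host names (needs >= 2 hosts)


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a content identifier.

    CRC32 is a deterministic stand-in, not Greenplum's real hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def place_segments(num_segments=NUM_SEGMENTS, hosts=HOSTS):
    """Give each content id a primary host and a mirror on a different host."""
    placement = {}
    for content_id in range(num_segments):
        primary = hosts[content_id % len(hosts)]
        mirror = hosts[(content_id + 1) % len(hosts)]
        placement[content_id] = (primary, mirror)
    return placement


def distribute(rows, key):
    """Divide rows among segments by hashing the distribution column."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key])].append(row)
    return segments


def dispatch_sum(segments, column, num_segments=NUM_SEGMENTS):
    """Run a SUM on every segment, then combine the partial sums centrally."""
    partials = [
        sum(row[column] for row in segments.get(content_id, []))
        for content_id in range(num_segments)
    ]
    return sum(partials)


# Eight toy rows distributed by a hypothetical "order_id" column.
rows = [{"order_id": i, "amount": i * 10} for i in range(1, 9)]
segments = distribute(rows, "order_id")
placement = place_segments()
total = dispatch_sum(segments, "amount")
```

Because every row lands on exactly one segment, the combined total equals the sum over the whole table no matter how the hash spreads the rows. An uneven spread affects only how much work each segment does, which is why the choice of distribution column matters in practice.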
Competitors include other MPP database management systems provided by major vendors such as
Teradata
,
Amazon Redshift,
Microsoft Azure, Alibaba
AnalyticDB and, in the past, IBM
Netezza.
Additional competition comes from other smaller competitors,
column-oriented databases such as HP
Vertica
,
Exasol, and
data warehousing vendors with non-MPP architecture, such as
Oracle Exadata
,
IBM Db2
and
SAP HANA
.
Greenplum Version 5
In September 2017, Greenplum Database Version 5 was released. Version 5 is the first iteration of the Greenplum project strategy of merging later PostgreSQL versions back into Greenplum, and it is based on PostgreSQL version 8.3, up from version 8.2 in the previous release. Version 5 also introduced general availability of the GPORCA optimizer, a cost-based SQL optimizer designed for big data.
Greenplum Version 6
In September 2019, Greenplum Database Version 6 was released. Version 6 is based on PostgreSQL version 9.4 and features massive gains in
OLTP performance. Greenplum 6 was reviewed by several media sources, which noted its alignment with open source Postgres
and its OLTP performance.