EMC Greenplum

Greenplum is a big data technology based on massively parallel processing (MPP) architecture and the Postgres open source database technology. The technology was created by a company of the same name headquartered in San Mateo, California, around 2005. Greenplum was acquired by EMC Corporation in July 2010. Starting in 2012, its database management system software became known as the Pivotal Greenplum Database, sold through Pivotal Software. Pivotal open sourced the core engine, and development was continued by the Greenplum Database open source community and Pivotal. In 2020, Pivotal was acquired by VMware, which continued to sponsor the Greenplum Database open source community and commercialized the technology under the brand name VMware Tanzu Greenplum. In November 2023, VMware was acquired by Broadcom. In May 2024, Tanzu by Broadcom decided to close source the Greenplum Database project. All future releases of Greenplum Database will be closed source and released as part of the VMware Tanzu Data Suite.


Company

Greenplum, the company, was founded in September 2003 by Scott Yara and Luke Lonergan. It was a merger of two smaller companies: Metapa (founded in August 2000 near Los Angeles) and Didera in Fairfax, Virginia. Investors included SoundView Ventures, Hudson Ventures and Royal Wulff Ventures. Funding was announced at the merger. Greenplum, based in San Mateo, California, released its database management system software based on PostgreSQL in April 2005, calling it Bizgres. Rounds of venture capital were invested in March 2006 and February 2007. In July 2006 a partnership with Sun Microsystems was announced. Sun, which had also acquired MySQL AB, participated in a round of investment in January 2009, led by Meritech Capital Partners. The Bizgres project included a few other members and was supported through about 2008, by which time the product was simply called "Greenplum" as well. The Sun Fire X4500 was a reference architecture used by the majority of customers until a transition was made to Linux around that time.

Greenplum was acquired by EMC Corporation in July 2010, becoming the foundation of EMC's big data software division. EMC did not disclose the value of the acquisition. Greenplum's products at the time of acquisition were the Greenplum Database, Chorus (a management tool), and Data Science Labs. Greenplum had customers in vertical markets, including eBay. It became part of Pivotal Software in 2012. A variant called Hawq, which uses Apache Hadoop to store data in the Hadoop file system, was announced in 2013. In 2015 the GreenplumDB and Hawq open source software projects were announced.


Technology

Pivotal's Greenplum database product uses massively parallel processing (MPP) techniques. Each computer cluster consists of a master node, a standby master node, and segment nodes. All of the data resides on the segment nodes, and the catalog information is stored in the master nodes. Segment nodes run one or more segments, which are modified PostgreSQL database instances, each assigned a content identifier. For each table, the data is divided among the segment nodes based on the distribution column keys specified by the user in the data definition language. For each segment content identifier there is both a primary segment and a mirror segment, which do not run on the same physical host. When a query enters the master node, it is parsed, planned and dispatched to all of the segments, which execute the query plan and either return the requested data or insert the result of the query into a database table. The Structured Query Language, version SQL:2003, is used to present queries to the system. Transaction semantics comply with the constraints known as ACID.

Competitors include other MPP database management systems provided by major vendors such as Teradata, Amazon Redshift, Microsoft Azure, Alibaba AnalyticDB and, in the past, IBM Netezza. Additional competition comes from smaller competitors, column-oriented databases such as HP Vertica and Exasol, and data warehousing vendors with non-MPP architectures, such as Oracle Exadata, IBM Db2 and SAP HANA.
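The distribution and dispatch scheme described in this section can be illustrated with a small Python sketch. It is a toy model, not Greenplum code: the CRC32 hash, the four-segment cluster, the host names, the round-robin mirror placement and every function name are assumptions made for illustration, and Greenplum's actual hashing and placement logic differ.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4                   # hypothetical number of segment content IDs
HOSTS = ["sdw1", "sdw2", "sdw3"]   # hypothetical segment host names


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment content identifier.

    CRC32 is a stand-in hash chosen for this sketch only.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_column, num_segments=NUM_SEGMENTS):
    """Divide a table's rows among segments by hashing the distribution column."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_column], num_segments)].append(row)
    return dict(segments)


def place_segments(hosts=HOSTS, num_segments=NUM_SEGMENTS):
    """Assign each content ID a (primary, mirror) host pair on different hosts."""
    if len(hosts) < 2:
        raise ValueError("mirroring needs at least two hosts")
    placement = {}
    for content_id in range(num_segments):
        primary = hosts[content_id % len(hosts)]
        mirror = hosts[(content_id + 1) % len(hosts)]  # always the next host
        placement[content_id] = (primary, mirror)
    return placement


def scatter_gather_sum(segments, column):
    """Each segment computes a partial sum; the master combines the partials."""
    partials = {cid: sum(r[column] for r in rows) for cid, rows in segments.items()}
    return sum(partials.values()), partials


if __name__ == "__main__":
    table = [{"customer_id": i, "amount": i * 10} for i in range(100)]
    segs = distribute(table, "customer_id")
    total, partials = scatter_gather_sum(segs, "amount")
    print("rows per segment:", {cid: len(rows) for cid, rows in segs.items()})
    print("partial sums:", partials, "total:", total)
    print("placement:", place_segments())
```

In this model the master never holds table rows: it only routes work to segments and combines their partial results, which mirrors the division of labor between master and segment nodes described above.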


Greenplum Version 7

In September 2023, Greenplum Database Version 7 was released. Version 7 is based on PostgreSQL version 12.12.


Greenplum Version 6

In September 2019, Greenplum Database Version 6 was released. Version 6 is based on PostgreSQL version 9.4 and features significant gains in OLTP performance. Media reviews of Greenplum 6 noted its alignment with open source Postgres and its OLTP performance.


Greenplum Version 5

In September 2017, Greenplum Database Version 5 was released. Version 5 was the first iteration of the Greenplum project strategy of merging later PostgreSQL versions back into Greenplum, and is based on PostgreSQL version 8.3, up from version 8.2 in the previous release. Version 5 also introduced the general availability of the GPORCA optimizer, a cost-based SQL optimizer designed for big data.

