WebScaleSQL was an
open-source
relational database management system
(RDBMS) created as a
software branch of the production-ready community releases of
MySQL
. By combining the efforts of several companies and incorporating various changes and new features into MySQL, WebScaleSQL aimed to meet the needs that arise when MySQL is deployed in large-scale environments, which involve large amounts of data and numerous
database server
s.
The
source code
of WebScaleSQL is hosted on
GitHub
and licensed under the terms of version 2 of the
GNU General Public License
.
The project website announced in December 2016 that the companies involved would no longer contribute to the project.
Overview
Running MySQL on numerous
servers with large amounts of data, at the scale of
terabyte
s and
petabyte
s, creates a set of difficulties that in many cases give rise to a need for customized MySQL features or for functional changes to MySQL. Several companies have faced the same (or very similar) difficulties in their
production environment
s, which resulted in multiple independent solutions to similar challenges.
WebScaleSQL was announced on March 27, 2014 as a joint effort of
Facebook
,
Google
,
LinkedIn
and
Twitter
(with
Alibaba Group
joining in January 2015), aiming to provide a centralized development structure for extending MySQL with new features specific to its large-scale deployments, such as building large
replicated databases running on
server farms
. As a result, WebScaleSQL attempted to open a path toward deduplicating the efforts each founding company had been putting into maintaining its own branch of MySQL, and toward bringing together more developers.
WebScaleSQL was created as a
branch
of MySQL's latest production-ready community release at the time, version 5.6. As the project aimed to closely follow new MySQL community releases, a branch was chosen rather than a
software fork
of MySQL. The choice of MySQL production-ready community releases as WebScaleSQL's upstream, rather than one of the available MySQL forks, was the result of a consensus among the four founding companies, which concluded that the features already present in MySQL 5.6 were suitable for large-scale deployments, while additional features of the same kind were planned for MySQL 5.7.
Features
The initial changes and feature additions that WebScaleSQL introduced to the MySQL 5.6
codebase
came from the engineers employed by the four founding companies; however, the project was open to
peer-reviewed
community contributions. Available new features and changes included the following:
* A
software framework
that provides automated testing of all proposed changes
* A customized suite of database performance tests
* Various changes to the
automated tests provided by the MySQL community releases
* Performance improvements in various areas, including
buffer pool flushing, execution of certain types of
SQL
queries, and support for
NUMA
architectures
* Changes related to large-scale deployments, such as the ability to specify sub-second
client
timeouts
* Performance and reliability improvements to the global transaction
identifier
(GTID)
feature of MySQL 5.6
* A
super_read_only
operation mode for the MySQL server, which disables data modification operations even for privileged database accounts
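The effect of super_read_only can be illustrated with a small toy model. This is only a sketch, not WebScaleSQL code: the names ServerMode and write_allowed are hypothetical, and the model assumes the stock MySQL behavior in which the ordinary read_only setting still lets privileged accounts modify data. Replication threads and other special cases are ignored.

```python
from dataclasses import dataclass


@dataclass
class ServerMode:
    """Toy model of the two read-only switches (hypothetical, for illustration)."""
    read_only: bool = False
    super_read_only: bool = False


def write_allowed(mode: ServerMode, user_is_privileged: bool) -> bool:
    """Return True if a data-modifying statement would be accepted."""
    if mode.super_read_only:
        # Blocks writes from every account, privileged or not.
        return False
    if mode.read_only:
        # Ordinary read-only mode still lets privileged accounts write.
        return user_is_privileged
    return True


if __name__ == "__main__":
    modes = (
        ServerMode(),
        ServerMode(read_only=True),
        ServerMode(super_read_only=True),
    )
    for mode in modes:
        print(mode, write_allowed(mode, False), write_allowed(mode, True))
```

In this model the only difference between the two switches is the outcome for a privileged account, which is the case the feature description singles out.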
Planned new features and changes included the following:
* New
asynchronous
MySQL client that will eliminate the client-side waiting while establishing
database connection
s, sending
queries, and receiving their results
* Availability of various
table
, user and
compression
statistics
* Changes to the internal compression mechanisms
* Addition of a logical
read-ahead mechanism that will bring significant performance improvements for
full table scans
Availability
WebScaleSQL is distributed in a source-code-only form, with no official binaries available.
Compiling
the source code and running WebScaleSQL is supported only on
x86-64
Linux
hosts, and requires a
toolchain
that supports the
C99
and
C++11
language standards.
The source code is hosted on GitHub and available under version 2 of the GNU General Public License (GPL v2).
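As a rough illustration of the platform restriction, a pre-build check might look like the following sketch. The function name is_supported_host is hypothetical and not part of WebScaleSQL, and the check covers only the operating system and CPU architecture. It does not verify that the toolchain supports the required language standards.

```python
import platform


def is_supported_host(system: str, machine: str) -> bool:
    """Return True for an x86-64 Linux host, the only platform described as supported."""
    # platform.machine() typically reports "x86_64" on 64-bit x86 Linux;
    # "amd64" is accepted as a common alias for the same architecture.
    return system == "Linux" and machine.lower() in {"x86_64", "amd64"}


if __name__ == "__main__":
    print(is_supported_host(platform.system(), platform.machine()))
```

Passing the system and machine names as arguments keeps the check testable on any machine.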
End of Contributions
In December 2016, the WebScaleSQL website announced that the companies originally involved in collaborating on the project (Facebook, Google, LinkedIn, Twitter, and Alibaba) would no longer contribute to it. The announcement attributed the end of the collaboration to differences among the needs of the various companies.
See also
*
Comparison of relational database management systems