Memcached (pronounced variously ''mem-cash-dee'' or ''mem-cashed'') is a general-purpose distributed memory-caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source (such as a database or API) must be read. Memcached is free and open-source software, licensed under the Revised BSD license. Memcached runs on Unix-like operating systems (Linux and macOS) and on Microsoft Windows. It depends on the libevent library.
Memcached's APIs provide a very large hash table distributed across multiple machines. When the table is full, subsequent inserts cause older data to be purged in least recently used (LRU) order. Applications using Memcached typically layer requests and additions into RAM before falling back on a slower backing store, such as a database.
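The LRU purging described above can be illustrated with a minimal Python sketch. It is a toy, single-process model built on the standard library's `OrderedDict`; the class name `LRUCache` and its `capacity` parameter are assumptions made for illustration and do not reflect how Memcached manages memory internally.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None  # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now more recently used than "b"
cache.set("c", 3)  # the table is full, so "b" is purged
```

After the last call, a lookup of "b" misses while "a" and "c" are still present, which is why callers need a fallback to the backing store.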
Memcached has no internal mechanism to track misses that may occur. However, some third-party utilities provide this functionality.
Memcached was first developed by Brad Fitzpatrick for his website LiveJournal, on May 22, 2003. It was originally written in Perl, then later rewritten in C by Anatoly Vorobey, then employed by LiveJournal. Memcached is now used by many other systems, including YouTube, Reddit, Facebook, Pinterest, Twitter, Wikipedia, and Method Studios. Google App Engine, Google Cloud Platform, Microsoft Azure, IBM Bluemix and Amazon Web Services also offer a Memcached service through an API.
Software architecture
The system uses a client–server architecture. The servers maintain a key–value associative array; the clients populate this array and query it by key. Keys are up to 250 bytes long and values can be at most 1 megabyte in size.
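A client might check these limits before sending an item. The following Python sketch shows one way to do so; the function name `check_item` and the reading of "1 megabyte" as 2**20 bytes are assumptions made for illustration, not part of any particular client library.

```python
MAX_KEY_BYTES = 250            # key limit stated above
MAX_VALUE_BYTES = 1024 * 1024  # "1 megabyte", taken here as 2**20 bytes (assumption)


def check_item(key: str, value: bytes) -> None:
    """Raise ValueError if the item would exceed the limits described above."""
    key_bytes = key.encode("utf-8")
    if not key_bytes:
        raise ValueError("key must not be empty")
    if len(key_bytes) > MAX_KEY_BYTES:
        raise ValueError(f"key is {len(key_bytes)} bytes, limit is {MAX_KEY_BYTES}")
    if len(value) > MAX_VALUE_BYTES:
        raise ValueError(f"value is {len(value)} bytes, limit is {MAX_VALUE_BYTES}")


check_item("userrow:42", b"example payload")  # within both limits, so no error
```

The key length is measured in encoded bytes rather than characters, since a non-ASCII key can occupy more bytes than it has characters.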
Clients use client-side libraries to contact the servers which, by default, expose their service at port 11211. Both TCP and UDP are supported. Each client knows all servers; the servers do not communicate with each other. If a client wishes to set or read the value corresponding to a certain key, the client's library first computes a hash of the key to determine which server to use. This gives a simple form of sharding and scalable shared-nothing architecture across the servers. The server computes a second hash of the key to determine where to store or read the corresponding value. The servers keep the values in RAM; if a server runs out of RAM, it discards the oldest values. Therefore, clients must treat Memcached as a transitory cache; they cannot assume that data stored in Memcached is still there when they need it. Other databases, such as MemcacheDB and Couchbase Server, provide persistent storage while maintaining Memcached protocol compatibility.
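The client-side server selection described above can be sketched in Python as follows. The server addresses, the function name `pick_server`, and the choice of MD5 with a modulo step are all assumptions made for illustration; real client libraries may use a different hash or a consistent-hashing scheme.

```python
import hashlib

# Hypothetical server list; every client must be configured with the same one.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def pick_server(key: str, servers=SERVERS) -> str:
    """Map a key to one server by hashing the key on the client side."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


server = pick_server("userrow:42")  # the same key always maps to the same server
```

With plain modulo hashing, adding or removing a server changes the mapping for most keys, which is one reason some clients prefer consistent hashing.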
If all client libraries use the same hashing algorithm to determine servers, then clients can read each other's cached data.
A typical deployment has several servers and many clients. However, it is possible to use Memcached on a single computer, acting simultaneously as client and server. The size of its hash table is often very large, limited only by the memory available across all the servers in the cluster. Where high-volume, wide-audience Web publishing requires it, this may stretch to many gigabytes. Memcached can be equally valuable for situations where either the number of requests for content is high, or the cost of generating a particular piece of content is high.
Security
Most deployments of Memcached are within trusted networks where clients may freely connect to any server. However, sometimes Memcached is deployed in untrusted networks or where administrators want to exercise control over the clients that are connecting. For this purpose Memcached can be compiled with optional SASL authentication support. The SASL support requires the binary protocol.
A presentation at BlackHat USA 2010 revealed that a number of large public websites had left Memcached open to inspection, analysis, retrieval, and modification of data.
Even within a trusted organisation, the flat trust model of memcached may have security implications. For efficient simplicity, all Memcached operations are treated equally. Clients with a valid need for access to low-security entries within the cache gain access to ''all'' entries within the cache, even when these are higher-security and that client has no justifiable need for them. If the cache key can be either predicted, guessed or found by exhaustive searching, its cache entry may be retrieved.
Some attempt to isolate setting and reading data may be made in situations such as high-volume web publishing. A farm of outward-facing content servers has ''read'' access to memcached containing published pages or page components, but no write access. Where new content is published (and is not yet in memcached), a request is instead sent to content generation servers that are not publicly accessible to create the content unit and add it to memcached. The content server then retries the retrieval and serves the content outwards.
Used as a DDoS attack vector
In February 2018, Cloudflare reported that misconfigured memcached servers were used to launch DDoS attacks on a large scale. The memcached protocol over UDP has a huge amplification factor, of more than 51000. Victims of the DDoS attacks include GitHub, which was flooded with 1.35 Tbit/s peak incoming traffic.
This issue was mitigated in Memcached version 1.5.6, which disabled the UDP protocol by default.
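The amplification factor mentioned above is the ratio of response size to request size. The short Python sketch below works through that arithmetic; the byte counts are illustrative assumptions chosen to show how a factor above 51000 can arise, not measurements taken from the reported attacks.

```python
def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Ratio of bytes sent to the victim to bytes sent by the attacker."""
    return response_bytes / request_bytes


# Illustrative numbers only: a 15-byte spoofed request that triggers
# a 768,000-byte response from a misconfigured server.
factor = amplification_factor(request_bytes=15, response_bytes=768_000)
print(factor)  # 51200.0
```

Because UDP does not verify the source address, the large response is delivered to whatever address the attacker placed in the request.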
Example code
''Note that all functions described on this page are pseudocode only. Memcached calls and programming languages may vary based on the API used.''
Converting database or object creation queries to use Memcached is simple. Typically, when using straight database queries, example code would be as follows:
    function get_foo(int userid)
        data = db_select("SELECT * FROM users WHERE userid = ?", userid)
        return data
After conversion to Memcached, the same call might look like the following:
    function get_foo(int userid)
        /* first try the cache */
        data = memcached_fetch("userrow:" + userid)
        if not data
            /* not found : request database */
            data = db_select("SELECT * FROM users WHERE userid = ?", userid)
            /* then store in cache until next get */
            memcached_add("userrow:" + userid, data)
        end
        return data
The client would first check whether a Memcached value with the unique key "userrow:userid" exists, where userid is some number. If the result does not exist, it would select from the database as usual, and set the unique key using the Memcached API add function call.
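The same read path can be written as a small runnable Python sketch. The `FakeCache` class, the `USERS` dictionary and the `db_select_user` helper are hypothetical stand-ins for a real Memcached client and database; they exist only so the example runs on its own.

```python
class FakeCache:
    """In-memory stand-in for a Memcached client (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def add(self, key, value):
        # Like the "add" call described above: store only if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True


USERS = {42: {"userid": 42, "name": "alice"}}  # stand-in for the users table
db_reads = 0


def db_select_user(userid):
    """Pretend database query that counts how often it is called."""
    global db_reads
    db_reads += 1
    return USERS.get(userid)


def get_foo(cache, userid):
    key = f"userrow:{userid}"
    data = cache.get(key)           # first try the cache
    if data is None:
        data = db_select_user(userid)  # not found: ask the database
        if data is not None:
            cache.add(key, data)    # store in the cache for the next read
    return data


cache = FakeCache()
first = get_foo(cache, 42)   # miss: reads the database and fills the cache
second = get_foo(cache, 42)  # hit: served from the cache
```

The counter `db_reads` ends at 1, showing that only the first call reached the database.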
However, if only this API call were modified, the server would end up fetching incorrect data following any database update actions: the Memcached "view" of the data would become out of date. Therefore, in addition to creating an "add" call, an update call would also be needed using the Memcached set function.
    function update_foo(int userid, string dbUpdateString)
        /* first update database */
        result = db_execute(dbUpdateString)
        if result
            /* database update successful : fetch data to be stored in cache */
            data = db_select("SELECT * FROM users WHERE userid = ?", userid)
            /* the previous line could also look like data = createDataFromDBString(dbUpdateString) */
            /* then store in cache until next get */
            memcached_set("userrow:" + userid, data)
        end
This call would update the currently cached data to match the new data in the database, assuming the database query succeeds. An alternative approach would be to invalidate the cache with the Memcached delete function, so that subsequent fetches result in a cache miss. Similar action would need to be taken when database records were deleted, to maintain either a correct or incomplete cache.
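The two options described above, refreshing the cached value with set or invalidating it with delete, can be compared in a short Python sketch. `FakeCache`, `USERS` and both function names are hypothetical stand-ins used only for illustration.

```python
class FakeCache:
    """In-memory stand-in for a Memcached client (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


USERS = {42: {"userid": 42, "name": "alice"}}  # stand-in for the users table


def update_and_refresh(cache, userid, changes):
    """Update the database, then overwrite the cached copy with fresh data."""
    USERS[userid].update(changes)
    cache.set(f"userrow:{userid}", dict(USERS[userid]))


def update_and_invalidate(cache, userid, changes):
    """Update the database, then drop the cached copy so the next read misses."""
    USERS[userid].update(changes)
    cache.delete(f"userrow:{userid}")


cache = FakeCache()
update_and_refresh(cache, 42, {"name": "bob"})       # cache now holds "bob"
refreshed = cache.get("userrow:42")
update_and_invalidate(cache, 42, {"name": "carol"})  # cache entry removed
after_invalidate = cache.get("userrow:42")
```

Refreshing keeps the entry warm at the cost of an extra write, while invalidating defers the work to the next reader.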
An alternate cache-invalidation strategy is to store a random number in an agreed-upon cache entry and to incorporate this number into all keys that are used to store a particular kind of entry. To invalidate all such entries at once, change the random number. Existing entries (which were stored using the old number) will no longer be referenced and so will eventually expire or be recycled.
    function store_xyz_entry(int key, string value)
        /* Retrieve the random number - use zero if none exists yet.
         * The key-name used here is arbitrary. */
        seed = memcached_fetch(":xyz_seed:")
        if not seed
            seed = 0
        end
        /* Build the key used to store the entry and store it.
         * The key-name used here is also arbitrary. Notice that the "seed" and the user's "key"
         * are stored as separate parts of the constructed hashKey string: ":xyz_data:(seed):(key)."
         * This is not mandatory, but is recommended. */
        string hashKey = sprintf(":xyz_data:%d:%d", seed, key)
        memcached_set(hashKey, value)

    /* "fetch_entry," not shown, follows identical logic to the above. */

    function invalidate_xyz_cache()
        existing_seed = memcached_fetch(":xyz_seed:")
        /* If no seed exists yet, entries were stored under the default of zero */
        if not existing_seed
            existing_seed = 0
        end
        /* Coin a different random seed */
        do
            seed = rand()
        until seed != existing_seed
        /* Now store it in the agreed-upon place. All future requests will use this number.
         * Therefore, all existing entries become un-referenced and will eventually expire. */
        memcached_set(":xyz_seed:", seed)
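The seed-based invalidation above can also be expressed as a runnable Python sketch. `FakeCache` and the helper `_entry_key` are hypothetical stand-ins, and the random range is an arbitrary choice for illustration; the key layout follows the pseudocode.

```python
import random


class FakeCache:
    """In-memory stand-in for a Memcached client (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


SEED_KEY = ":xyz_seed:"


def _entry_key(cache, key):
    seed = cache.get(SEED_KEY) or 0  # use zero if no seed exists yet
    return f":xyz_data:{seed}:{key}"


def store_xyz_entry(cache, key, value):
    cache.set(_entry_key(cache, key), value)


def fetch_xyz_entry(cache, key):
    return cache.get(_entry_key(cache, key))


def invalidate_xyz_cache(cache):
    existing = cache.get(SEED_KEY) or 0
    seed = existing
    while seed == existing:  # coin a seed that differs from the current one
        seed = random.randrange(1, 2**31)
    cache.set(SEED_KEY, seed)


cache = FakeCache()
store_xyz_entry(cache, 7, "hello")
before = fetch_xyz_entry(cache, 7)  # found under the current seed
invalidate_xyz_cache(cache)
after = fetch_xyz_entry(cache, 7)   # old entry is no longer referenced
```

The old entry still occupies memory until it expires or is evicted; the new seed only makes it unreachable.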
Usage
* MySQL - directly supports the Memcached API as of version 5.6.
* Oracle Coherence - directly supports the Memcached API as of version 12.1.3.
* Infinispan - directly supports Memcached.
See also
* Amazon ElastiCache
* Aerospike
* Couchbase Server
* Redis
* Mnesia
* MemcacheDB
* Hazelcast
* Cassandra
* Tarantool
* Ehcache
* Infinispan