HOME

TheInfoList



OR:

In
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
s, and
transaction processing Transaction processing is information processing in computer science that is divided into individual, indivisible operations called ''transactions''. Each transaction must succeed or fail as a complete unit; it can never be only partially compl ...
(transaction management), snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database (in practice it reads the last committed values that existed at the time it started), and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot. Snapshot isolation has been adopted by several major
database management system In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spa ...
s, such as
InterBase InterBase is a relational database management system (RDBMS) currently developed and marketed by Embarcadero Technologies. InterBase is distinguished from other RDBMSs by its small footprint, close to zero administration requirements, and multi-g ...
,
Firebird Firebird and fire bird may refer to: Mythical birds * Phoenix (mythology), sacred firebird found in the mythologies of many cultures * Bennu, Egyptian firebird * Huma bird, Persian firebird * Firebird (Slavic folklore) Bird species ''Various s ...
,
Oracle An oracle is a person or agency considered to provide wise and insightful counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. As such, it is a form of divination. Description The word '' ...
,
MySQL MySQL () is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A relational database orga ...
,
PostgreSQL PostgreSQL (, ), also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the Ingr ...
, SQL Anywhere,
MongoDB MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Serve ...
and
Microsoft SQL Server Microsoft SQL Server is a relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which ...
(2005 and later). The main reason for its adoption is that it allows better performance than
serializability In concurrency control of databases, Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman (1987)''Concurrency Control and Recovery in Database Systems''(free PDF download), Addison Wesley Publishing Company, Gerhard Weikum, Gottfried Vossen (20 ...
, yet still avoids most of the concurrency anomalies that serializability avoids (but not all). In practice snapshot isolation is implemented within
multiversion concurrency control Multiversion concurrency control (MCC or MVCC), is a concurrency control method commonly used by database management systems to provide concurrent access to the database and in programming languages to implement transactional memory. Description W ...
(MVCC), where generational values of each data item (versions) are maintained: MVCC is a common way to increase concurrency and performance by generating a new version of a database object each time the object is written, and allowing transactions' read operations of several last relevant versions (of each object). Snapshot isolation has been used to criticize the
ANSI The American National Standards Institute (ANSI ) is a private non-profit organization that oversees the development of voluntary consensus standards for products, services, processes, systems, and personnel in the United States. The orga ...
SQL-92 standard's definition of isolation levels, as it exhibits none of the "anomalies" that the SQL standard prohibited, yet is not serializable (the anomaly-free isolation level defined by ANSI). In spite of its distinction from serializability, snapshot isolation is sometimes referred to as ''serializable'' by Oracle.


Definition

A transaction executing under snapshot isolation appears to operate on a personal ''snapshot'' of the database, taken at the start of the transaction. When the transaction concludes, it will successfully commit only if the values updated by the transaction have not been changed externally since the snapshot was taken. Such a write–write conflict will cause the transaction to abort. In a ''write skew'' anomaly, two transactions (T1 and T2) concurrently read an overlapping data set (e.g. values V1 and V2), concurrently make disjoint updates (e.g. T1 updates V1, T2 updates V2), and finally concurrently commit, neither having seen the update performed by the other. Were the system serializable, such an anomaly would be impossible, as either T1 or T2 would have to occur "first", and be visible to the other. In contrast, snapshot isolation permits write skew anomalies. As a concrete example, imagine V1 and V2 are two balances held by a single person, Phil. The bank will allow either V1 or V2 to run a deficit, provided the total held in both is never negative (i.e. V1 + V2 ≥ 0). Both balances are currently $100. Phil initiates two transactions concurrently, T1 withdrawing $200 from V1, and T2 withdrawing $200 from V2. If the database guaranteed serializable transactions, the simplest way of coding T1 is to deduct $200 from V1, and then verify that V1 + V2 ≥ 0 still holds, aborting if not. T2 similarly deducts $200 from V2 and then verifies V1 + V2 ≥ 0. Since the transactions must serialize, either T1 happens first, leaving V1 = −$100, V2 = $100, and preventing T2 from succeeding (since V1 + (V2 − $200) is now −$200), or T2 happens first and similarly prevents T1 from committing. If the database is under snapshot isolation(MVCC), however, T1 and T2 operate on private snapshots of the database: each deducts $200 from an account, and then verifies that the new total is zero, using the other account value that held when the snapshot was taken. Since neither ''update'' conflicts, both commit successfully, leaving V1 = V2 = −$100, and V1 + V2 = −$200. Some systems built using
multiversion concurrency control Multiversion concurrency control (MCC or MVCC), is a concurrency control method commonly used by database management systems to provide concurrent access to the database and in programming languages to implement transactional memory. Description W ...
(MVCC) may support (only) snapshot isolation to allow transactions to proceed without worrying about concurrent operations, and more importantly without needing to re-verify all read operations when the transaction finally commits. This is convenient because MVCC maintains a series of recent history consistent states. The only information that must be stored during the transaction is a list of updates made, which can be scanned for conflicts fairly easily before being committed. However, MVCC systems (such as MarkLogic) will use locks to serialize writes together with MVCC to obtain some of the performance gains and still support the stronger "serializability" level of isolation.


Workarounds

Potential inconsistency problems arising from write skew anomalies can be fixed by adding (otherwise unnecessary) updates to the transactions in order to enforce the
serializability In concurrency control of databases, Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman (1987)''Concurrency Control and Recovery in Database Systems''(free PDF download), Addison Wesley Publishing Company, Gerhard Weikum, Gottfried Vossen (20 ...
property. ;Materialize the conflict: Add a special conflict table, which both transactions update in order to create a direct write–write conflict. ;Promotion: Have one transaction "update" a read-only location (replacing a value with the same value) in order to create a direct write–write conflict (or use an equivalent promotion, e.g. Oracle's SELECT FOR UPDATE). In the example above, we can materialize the conflict by adding a new table which makes the hidden constraint explicit, mapping each person to their ''total balance''. Phil would start off with a total balance of $200, and each transaction would attempt to subtract $200 from this, creating a write–write conflict that would prevent the two from succeeding concurrently. However, this approach violates the normal form. Alternatively, we can promote one of the transaction's reads to a write. For instance, T2 could set V1 = V1, creating an artificial write–write conflict with T1 and, again, preventing the two from succeeding concurrently. This solution may not always be possible. In general, therefore, snapshot isolation puts some of the problem of maintaining non-trivial constraints onto the user, who may not appreciate either the potential pitfalls or the possible solutions. The upside to this transfer is better performance.


Terminology

Snapshot isolation is called "serializable" mode in
Oracle An oracle is a person or agency considered to provide wise and insightful counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. As such, it is a form of divination. Description The word '' ...
Oracle Database Concepts 10g Release 1 (10.1) Chapter 13 : Data Concurrency and Consistency — Oracle Isolation Levels
Ask Tom : On Transaction Isolation Levels
and
PostgreSQL PostgreSQL (, ), also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the Ingr ...
versions prior to 9.1,PostgreSQL 9.0 Documentation: 13.2.2.1. Serializable Isolation versus True Serializability
/ref>PostgreSQL 9.1 press release
/ref>
/ref> which may cause confusion with the "real
serializability In concurrency control of databases, Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman (1987)''Concurrency Control and Recovery in Database Systems''(free PDF download), Addison Wesley Publishing Company, Gerhard Weikum, Gottfried Vossen (20 ...
" mode. There are arguments both for and against this decision; what is clear is that users must be aware of the distinction to avoid possible undesired anomalous behavior in their database system logic.


History

Snapshot isolation arose from work on
multiversion concurrency control Multiversion concurrency control (MCC or MVCC), is a concurrency control method commonly used by database management systems to provide concurrent access to the database and in programming languages to implement transactional memory. Description W ...
databases, where multiple versions of the database are maintained concurrently to allow readers to execute without colliding with writers. Such a system allows a natural definition and implementation of such an isolation level.
InterBase InterBase is a relational database management system (RDBMS) currently developed and marketed by Embarcadero Technologies. InterBase is distinguished from other RDBMSs by its small footprint, close to zero administration requirements, and multi-g ...
, later owned by
Borland Borland Software Corporation was a computer technology company founded in 1983 by Niels Jensen, Ole Henriksen, Mogens Glad and Philippe Kahn. Its main business was the development and sale of software development and software deployment produc ...
, was acknowledged to provide SI rather than full serializability in version 4, and likely permitted write-skew anomalies since its first release in 1985. Unfortunately, the ANSI
SQL-92 SQL-92 was the third revision of the SQL database query language. Unlike SQL-89, it was a major revision of the standard. Aside from a few minor incompatibilities, the SQL-89 standard is forward-compatible with SQL-92. The standard specificati ...
standard was written with a
lock Lock(s) may refer to: Common meanings *Lock and key, a mechanical device used to secure items of importance *Lock (water navigation), a device for boats to transit between different levels of water, as in a canal Arts and entertainment * ''Lock ...
-based database in mind, and hence is rather vague when applied to MVCC systems. Berenson ''et al.'' wrote a paper in 1995 critiquing the SQL standard, and cited snapshot isolation as an example of an isolation level that did not exhibit the standard anomalies described in the ANSI SQL-92 standard, yet still had anomalous behaviour when compared with serializable transactions. In 2008, Cahill ''et al.'' showed that write-skew anomalies could be prevented by detecting and aborting "dangerous" triplets of concurrent transactions.Michael J. Cahill, Uwe Röhm, Alan D. Fekete (2008
"Serializable isolation for snapshot databases"
''Proceedings of the 2008 ACM SIGMOD international conference on Management of data'', pp. 729–738, (SIGMOD 2008 best paper award)
This implementation of serializability is well-suited to
multiversion concurrency control Multiversion concurrency control (MCC or MVCC), is a concurrency control method commonly used by database management systems to provide concurrent access to the database and in programming languages to implement transactional memory. Description W ...
databases, and has been adopted in PostgreSQL 9.1, where it is referred to as "Serializable Snapshot Isolation", abbreviated to SSI. When used consistently, this eliminates the need for the above workarounds. The downside over snapshot isolation is an increase in aborted transactions. This can perform better or worse than snapshot isolation with the above workarounds, depending on workload.


References


Further reading

* Bettina Kemme, Gustavo Alonso
A new approach to developing and implementing eager database replication protocols
ACM Transactions on Database Systems (TODS), v.25 n.3, p. 333-379, Sept. 2000. * Gerhard Weikum, Gottfried Vossen, ''Transactional information systems: theory, algorithms, and the practice of concurrency control and recovery'', Morgan Kaufmann, 2002, * Yi Lin, Bettina Kemme, Marta Patiño-Martínez, Ricardo Jiménez-Peris
Middleware based data replication providing snapshot isolation
Proceedings of the 2005 ACM SIGMOD international Conf., 2005. * Marta Patiño-Martinez, Ricardo Jiménez-Peris, Bettina Kemme, Gustavo Alonso
MIDDLE-R: Consistent database replication at the middleware level
ACM Transactions on Computer Systems (TOCS). Volume 23 Issue 4. Pages 375-423. * Khuzaima Daudjee, Kenneth Salem, ''Lazy Database Replication with Snapshot Isolation'', VLDB 2006: pages 715-726 {{DEFAULTSORT:Snapshot Isolation Databases Concurrency control Transaction processing