== Typical cache usage ==
Below is a typical cache usage pattern for an item that needs to be updated every ''ttl'' units of time:

 function fetch(key, ttl)
     value ← cache_read(key)
     if (!value)
         value ← recompute_value()
         cache_write(key, value, ttl)
     return value

If recompute_value() takes a long time and the key is accessed frequently, many processes will simultaneously call recompute_value() upon expiration of the cache value. In typical web applications, the function may query a database, access other services, or perform some complicated operation (which is why this particular computation is being cached in the first place). When the request rate is high, the database (or any other shared resource) will suffer from an overload of requests/queries, which may in turn cause a system collapse.
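As a concrete illustration, the pattern above can be sketched in Python, using an in-process dict as a stand-in for a shared cache (the names `cache`, `recompute_value`, and `fetch` mirror the pseudocode and are illustrative):

```python
import time

# Hypothetical in-process cache: key -> (value, expiry_time)
cache = {}

def recompute_value():
    """Stand-in for an expensive computation, e.g. a database query."""
    return "fresh-value"

def fetch(key, ttl):
    """Read-through cache: recompute and store the value on a miss."""
    entry = cache.get(key)
    if entry is not None:
        value, expiry = entry
        if time.time() < expiry:
            return value
    # Cache miss or expired entry: every process reaching this point
    # calls recompute_value(), which is the source of the stampede.
    value = recompute_value()
    cache[key] = (value, time.time() + ttl)
    return value
```

In a real deployment the dict would be a shared store such as Memcached or Redis, so many workers can hit the miss branch at once.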
== Cache stampede mitigation ==

Several approaches have been proposed to mitigate cache stampedes (also known as dogpile prevention). They can be roughly grouped into three main categories.
=== Locking ===

To prevent multiple simultaneous recomputations of the same value, upon a cache miss a process will attempt to acquire the lock for that cache key and recompute the value only if it acquires the lock. There are different implementation options for the case when the lock is ''not'' acquired:
* Wait until the value is recomputed
* Return a "not-found" and have the client handle the absence of the value properly
* Keep a stale item in the cache to be used while the new value is recomputed
If implemented properly, locking can prevent stampedes altogether, but it requires an extra write for the locking mechanism. Apart from doubling the number of writes, the main drawback is implementing the locking mechanism correctly, taking care of edge cases such as failure of the process that acquired the lock, tuning a time-to-live for the lock, race conditions, and so on.
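A minimal single-process sketch of the locking approach, falling back to a stale item when the lock is held elsewhere (in a distributed setup the lock would itself be a cache write, e.g. an atomic "add if absent" with a TTL; the names here are illustrative):

```python
import threading
import time

cache = {}            # key -> (value, expiry_time)
key_locks = {}        # key -> threading.Lock (stands in for a shared lock store)
locks_guard = threading.Lock()

def recompute_value():
    """Stand-in for the expensive computation being protected."""
    return "fresh-value"

def fetch_with_lock(key, ttl):
    entry = cache.get(key)
    if entry is not None and time.time() < entry[1]:
        return entry[0]  # fresh hit, no lock needed
    with locks_guard:
        lock = key_locks.setdefault(key, threading.Lock())
    if lock.acquire(blocking=False):
        # We won the lock: recompute and write back.
        try:
            value = recompute_value()
            cache[key] = (value, time.time() + ttl)
            return value
        finally:
            lock.release()
    # Lock held by another worker: serve the stale entry if one exists,
    # otherwise report "not found" and let the caller decide.
    return entry[0] if entry is not None else None
```

This corresponds to the third option above (serve stale while recomputing); returning `None` on a cold miss corresponds to the "not-found" option.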
=== External recomputation ===

This solution moves the recomputation of the cache value from the processes needing it to an external process. The recomputation by the external process can be triggered in different ways:
* When the cache value approaches its expiration
* Periodically
* When a process needing the value encounters a cache miss
This approach requires one more moving part, the external process, that needs to be maintained and monitored. In addition, this solution requires unnatural code separation/duplication and is mostly suited for static cache keys (i.e., not dynamically generated, as in the case of keys indexed by an id).
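A sketch of the periodic variant, with a background thread standing in for the external process (a fixed, static key set, matching the limitation noted above; all names are illustrative):

```python
import threading
import time

cache = {}  # key -> value; no TTL needed, the refresher keeps entries fresh

def recompute_value(key):
    """Stand-in for the expensive computation."""
    return "value-for-" + key

def refresher(keys, interval, stop_event):
    """External recomputation loop: periodically refresh a known set of keys."""
    while not stop_event.is_set():
        for key in keys:
            cache[key] = recompute_value(key)
        stop_event.wait(interval)

def fetch(key):
    # Readers never recompute; a miss simply means the refresher
    # has not (yet) produced this key.
    return cache.get(key)
```

Because readers never trigger recomputation, a stampede cannot occur, at the cost of running and monitoring the refresher separately.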
=== Probabilistic early expiration ===

With this approach, each process may recompute the cache value before its expiration by making an independent probabilistic decision, where the probability of performing the early recomputation increases as we get closer to the expiration of the value. Since the probabilistic decision is made independently by each process, the effect of the stampede is mitigated as fewer processes will expire at the same time. The following implementation based on an exponential distribution has been shown to be optimal in terms of its effectiveness in preventing stampedes and how early recomputations can happen.

 function x-fetch(key, ttl, beta=1)
     value, delta, expiry ← cache_read(key)
     if (!value) or (time() - delta * beta * log(rand(0,1)) ≥ expiry)
         start ← time()
         value ← recompute_value()
         delta ← time() - start
         cache_write(key, value, ttl, delta)
     return value

The parameter ''beta'' can be set to a value greater than 1 to favor earlier recomputations and further reduce stampedes, but the authors show that setting ''beta''=1 works well in practice. The variable ''delta'' represents the time to recompute the value and is used to scale the probability distribution appropriately. This approach is simple to implement and effectively reduces cache stampedes by automatically favoring early recomputations when the traffic rate increases. One drawback is that it takes more memory in the cache, as we need to bundle the value ''delta'' with the cache item; when the caching system does not support retrieval of the key expiration time, we also need to store the expiration time (that is, time() + ''ttl'') in the bundle.
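The x-fetch pseudocode translates almost directly into Python (again with a dict standing in for the shared cache; `1 - random.random()` is used so the logarithm's argument stays in (0, 1], avoiding log(0)):

```python
import math
import random
import time

cache = {}  # key -> (value, delta, expiry)

def recompute_value():
    """Stand-in for the expensive computation."""
    return "fresh-value"

def x_fetch(key, ttl, beta=1.0):
    entry = cache.get(key)
    # -delta * beta * log(u) with u uniform in (0, 1] is an exponentially
    # distributed head start: the closer we are to expiry (and the longer
    # the recomputation takes), the likelier an early recomputation.
    if entry is None or (
        time.time() - entry[1] * beta * math.log(1.0 - random.random()) >= entry[2]
    ):
        start = time.time()
        value = recompute_value()
        delta = time.time() - start   # measured recomputation time
        cache[key] = (value, delta, time.time() + ttl)
        return value
    return entry[0]
```

Since here `delta` is the measured wall-clock cost of the recomputation, the early-expiration window adapts automatically: cheap values are recomputed close to their expiry, expensive ones earlier.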