What Every Developer Should Know About Cache Coherence

Cache coherence is the set of mechanisms and protocols that ensure any change to a shared data item is consistently reflected across all cached copies.

This matters most in systems where multiple caches are distributed across multiple processors or servers. Yes, processors or servers – cache coherence is used extensively in both hardware and software.

Why Cache Coherence Matters

Imagine this scenario:

  1. Server A reads data “X” from a database and stores a copy in its cache.
  2. Server B updates the value of “X” in the database.
  3. Server A, unaware of the update, continues to serve the now stale cached version of “X”.

This inconsistency leads to problems ranging from incorrect calculations to unpredictable user experiences. Cache coherence prevents such scenarios, as the sketch below illustrates.
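
Here is a minimal Python sketch of that failure mode, using plain dicts as stand-ins for the database and the per-server caches (the names and values are illustrative, not a real deployment):

    database = {"X": 100}   # stand-in for the real database

    class Server:
        def __init__(self, name):
            self.name = name
            self.cache = {}

        def read(self, key):
            if key in self.cache:         # cache hit: may be stale!
                return self.cache[key]
            value = database[key]         # cache miss: fetch from the DB
            self.cache[key] = value
            return value

        def write(self, key, value):
            database[key] = value         # update the source of truth...
            self.cache[key] = value       # ...but only OUR cache learns of it

    server_a = Server("A")
    server_b = Server("B")

    print(server_a.read("X"))   # 100 -> Server A now caches X
    server_b.write("X", 200)    # Server B updates X in the database
    print(server_a.read("X"))   # still 100: Server A serves a stale copy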

Core Principles of Coherent Caches

There are two main families of cache coherence protocols: snooping protocols and directory-based protocols.

Both families share two core principles.

  1. They need a mechanism to track which caches are holding copies of the same data item.
  2. Each cached data item has a state (e.g., Modified, Shared, Invalid) indicating its consistency with the main memory copy. Both ideas are sketched after this list.
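
In Python terms, the bookkeeping these two principles call for might look like this (a toy model with made-up cache names, not a real controller):

    from enum import Enum

    class State(Enum):
        MODIFIED = "M"   # this copy is newer than main memory
        SHARED = "S"     # clean copy, possibly held by several caches
        INVALID = "I"    # this copy must not be used

    # Principle 1: know which caches hold a copy of each data item.
    holders = {"X": {"cache0", "cache2"}}

    # Principle 2: track each cached copy's state relative to main memory.
    line_states = {
        ("cache0", "X"): State.SHARED,
        ("cache2", "X"): State.SHARED,
    }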

In snoop-based protocols, caches “listen” on a shared bus, monitoring all memory transactions. When a write to shared data is detected, every other cache holding that data either updates its copy with the new value or marks it as invalid.

In directory-based protocols, a central directory maintains the state of cached data across the system. When data changes, the directory instructs caches to update or invalidate their copies accordingly.

Hybrid protocols that combine both approaches also exist, but they are not common.

Snooping Protocols

In snooping protocols, all caches constantly monitor a shared bus for memory transactions. Here’s the basic flow:

  1. Read Request: A processor issues a read request for data X.
  2. Cache Check: Other processors’ caches snoop on the bus.
  3. Shared or Owned State: If a cache holds X in a Shared or Owned state, it responds, and the requesting processor retrieves the data.
  4. Invalidation: If a processor needs to modify X (currently in Shared state in other caches), it broadcasts an invalidation message on the bus, forcing other caches to discard their copies.
  5. Write-Back: After the modification, the processor either keeps the dirty line and writes it back to main memory later (Write-Invalidate) or immediately broadcasts the new value to the other caches (Write-Update).

As Step 5 above shows, there are two main types of snooping protocols (a sketch of the write-invalidate variant follows this list):

  • Write-Invalidate: The most prevalent example of this is the MESI protocol (Modified, Exclusive, Shared, Invalid). When a core writes to a cache line, it broadcasts an invalidate signal to other cores, which then invalidate their copies of the cache line. This ensures that only the writing cache has the valid version of the data.
  • Write-Broadcast (or Write-Update): In this approach, when a core updates a cache line, the new value is broadcast to all other caches. This method ensures that all caches always have the most recent value, but it can lead to higher bus traffic and power consumption.
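
Putting Steps 1 through 5 together, here is a minimal write-invalidate simulation in Python. A plain list stands in for the shared bus and the state names follow MESI; this is a sketch of the idea, not a faithful hardware model:

    from enum import Enum

    class State(Enum):
        MODIFIED = "M"
        EXCLUSIVE = "E"
        SHARED = "S"
        INVALID = "I"

    BUS = []            # every cache "attached to the bus" (a plain list here)
    MEMORY = {"X": 1}   # main memory

    class Cache:
        def __init__(self, name):
            self.name = name
            self.lines = {}   # address -> (state, value)
            BUS.append(self)

        def snoop_invalidate(self, addr):
            # Step 4: another cache announced a write, so drop our copy.
            if addr in self.lines:
                _, value = self.lines[addr]
                self.lines[addr] = (State.INVALID, value)

        def read(self, addr):
            state, value = self.lines.get(addr, (State.INVALID, None))
            if state is State.INVALID:
                # Steps 1-3, simplified: a real protocol would let a cache
                # holding the line in Modified state supply the data; here
                # we always fall back to main memory.
                value = MEMORY[addr]
                self.lines[addr] = (State.SHARED, value)
            return self.lines[addr][1]

        def write(self, addr, value):
            # Steps 4-5, write-invalidate: broadcast an invalidation, then
            # keep the only valid (Modified) copy locally.
            for cache in BUS:
                if cache is not self:
                    cache.snoop_invalidate(addr)
            self.lines[addr] = (State.MODIFIED, value)

    c0, c1 = Cache("c0"), Cache("c1")
    print(c0.read("X"))       # 1 -> c0 holds X in Shared state
    c1.write("X", 2)          # c1 invalidates c0's copy, goes Modified
    print(c0.lines["X"][0])   # State.INVALID: c0 must re-fetch X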

Snoop-based protocols scale poorly to large systems because bus traffic grows with the number of caches. Write-update variants are especially costly for frequently written data, since every single write is broadcast.

Directory-Based Protocols

Directory-based protocols rely on a central directory that tracks the location and state of cached data. These protocols do not rely on a bus and are therefore well-suited to systems with non-uniform memory access (NUMA) architectures. Here’s how it works:

  1. Read Request: A processor requests to read data X.
  2. Directory Lookup: The directory checks its records and informs the processor where the most up-to-date copy resides (another cache or main memory).
  3. Data Retrieval: The processor fetches the data from the designated location.
  4. Write Update: When a processor modifies X, it informs the directory, which updates its records and might instruct other caches to invalidate their copies.
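
A toy version of that directory bookkeeping, again in Python (the cpu names and the write-through simplification are assumptions made for brevity; real directories live in hardware alongside memory controllers):

    class Directory:
        """Toy directory: tracks, per address, which caches hold the line."""

        def __init__(self, memory):
            self.memory = memory
            self.entries = {}   # addr -> {"sharers": set, "owner": str or None}

        def _entry(self, addr):
            return self.entries.setdefault(addr, {"sharers": set(), "owner": None})

        def read(self, requester, addr):
            # Steps 2-3: tell the requester where the up-to-date copy lives.
            entry = self._entry(addr)
            source = entry["owner"] or "memory"
            entry["sharers"].add(requester)
            return source, self.memory[addr]

        def write(self, requester, addr, value):
            # Step 4: record the new owner and tell other sharers to invalidate.
            entry = self._entry(addr)
            to_invalidate = entry["sharers"] - {requester}
            entry["sharers"] = {requester}
            entry["owner"] = requester
            self.memory[addr] = value   # simplified write-through
            return to_invalidate        # in hardware: invalidation messages

    directory = Directory({"X": 1})
    print(directory.read("cpu0", "X"))       # ('memory', 1)
    print(directory.write("cpu1", "X", 2))   # {'cpu0'} must invalidate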

Common Examples

  • MSI Protocol: A basic protocol with three states: Modified, Shared, and Invalid.
  • MESI Protocol: Extends MSI with an Exclusive state, allowing a processor to have exclusive read/write access to data.
  • MOESI Protocol: An extension of MESI with an Owned state, which lets a cache supply modified data directly to other caches without first writing it back to main memory.
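
In code form, the full set of states these protocols draw from looks like this (the comments paraphrase the usual definitions):

    from enum import Enum

    class MOESI(Enum):
        MODIFIED = "M"    # dirty; this cache holds the only copy
        OWNED = "O"       # dirty, but other caches may hold Shared copies;
                          # this cache supplies the data and writes it back
        EXCLUSIVE = "E"   # clean and the only copy; a write needs no bus traffic
        SHARED = "S"      # clean copy, possibly present in several caches
        INVALID = "I"     # unusable; must be re-fetched before use

    # MSI uses only M, S, I; MESI adds E; MOESI adds O.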

Real-World Examples

Facebook Memcached

Facebook built the largest Memcached deployment in the world, and their paper “Scaling Memcache at Facebook” breaks down exactly how they did it – a great look into well-built distributed caching.

Intel’s Multi-Core Xeon Processors

Intel’s Xeon processors are a great example of hardware-based cache coherence. These processors use the MESI protocol (Modified, Exclusive, Shared, Invalid), a snooping protocol designed to maintain data consistency across the processor’s multiple cores.

How it works:

  • Each core in a multi-core Xeon processor has its own local cache.
  • When a core modifies data in its cache, the MESI protocol ensures that other cores’ caches invalidate their copies of that data, if they have it cached.
  • This protocol uses a combination of bus snooping (where each cache monitors, or “snoops” on, a shared communication bus to detect writes) and direct cache-to-cache transfers to keep data consistent and up to date.

Redis with Redis Sentinel

Redis is also often used as a distributed cache. In distributed environments, especially with features like Redis Sentinel, it maintains coherence through mechanisms for cache invalidation and data synchronization across nodes.

How it works:

  • Redis can be configured in a master-slave architecture, where the master node holds the current version of the data, and slave nodes replicate this data.
  • Redis Sentinel manages and monitors the Redis nodes, handling automatic failover and working to ensure that only one master node is active at any given time.
  • When a write occurs to the master node, changes are propagated to all slave nodes. If the master node fails, one of the slaves is promoted to master, ensuring data continuity and coherence.
  • Applications connected to this distributed cache always write through the master, while reads can be distributed across the master and slave nodes to balance load – at the cost of possibly reading slightly stale data, since replication between nodes is asynchronous.
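
With the redis-py client, that write-to-master, read-from-replica split looks roughly like the sketch below. The Sentinel address and the service name “mymaster” are assumptions about the deployment, not universal values:

    # Requires the redis-py package (pip install redis) and a running
    # Redis + Sentinel setup.
    from redis.sentinel import Sentinel

    # Sentinel address and service name depend on your deployment;
    # localhost:26379 and "mymaster" are common defaults, assumed here.
    sentinel = Sentinel([("localhost", 26379)], socket_timeout=0.5)

    master = sentinel.master_for("mymaster", socket_timeout=0.5)
    replica = sentinel.slave_for("mymaster", socket_timeout=0.5)

    master.set("X", 42)       # writes always go to the current master
    print(replica.get("X"))   # reads may be served by a replica; since
                              # replication is asynchronous, this value
                              # can briefly lag the master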
