Cache synchronization ensures that every cached copy of data reflects the latest changes in the “source of truth,” which is usually a database. Write policies dictate how updates flow between the cache and that underlying source.
Cache Synchronization Techniques, aka Write Policies
1. Write-Through Caching
Steps:
- Data Update Request: The application initiates a request to modify data.
- Simultaneous Writes: The data is written synchronously to both the cache and the underlying data source (database).
- Operation Complete: The write is acknowledged only after both the cache and the data source have been successfully updated.
- Pros:
- Strong Consistency: The cache is always perfectly in sync with the data source.
- Simple to Implement: No need for complex invalidation or eventual consistency mechanisms.
- Cons:
- Performance Overhead: Every write operation incurs the cost of both a cache update and a database write, potentially slowing down the application.
- When to Use:
- Your system needs strong data consistency and write performance is a secondary concern.
- You need simplified caching logic.
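The write-through flow above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the backing store is modeled as a plain dict, and real systems would add error handling for a failed store write.

```python
class WriteThroughCache:
    """Minimal write-through sketch: every write updates the cache
    and the backing store before the call returns."""

    def __init__(self, store):
        self.store = store   # backing "database": any dict-like object
        self.cache = {}

    def write(self, key, value):
        # Both writes happen synchronously; the operation is only
        # complete once the slower store write has succeeded.
        self.store[key] = value
        self.cache[key] = value

    def read(self, key):
        # The cache is kept in sync with the store, so a hit is safe.
        if key in self.cache:
            return self.cache[key]
        value = self.store[key]
        self.cache[key] = value
        return value
```

Because the store write sits on the critical path of `write()`, every update pays the database latency, which is exactly the performance trade-off described above.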
2. Write-Back Caching
Steps:
- Data Update Request: The application initiates a request to modify data.
- Cache Write: The data is first written to the cache.
- Acknowledgement: The application receives an immediate acknowledgement, allowing it to proceed.
- Asynchronous Write-Back: In the background, the updated data is written back to the underlying data source, often using a write-back queue for efficiency.
Note: A “dirty bit” is often used to mark data that has been changed but not yet written back to the storage.
- Pros:
- Excellent Write Performance: Writes are absorbed by the fast cache, deferring slower source updates.
- Reduced Load on Data Source: Fewer database writes can improve overall system throughput.
- Cons:
- Eventual Consistency: There might be a delay before the source reflects changes made in the cache.
- Risk of Data Loss: If the cache fails before a write-back occurs, data updates could be lost.
- Complexity: Requires mechanisms to manage write-back queues and handle potential cache coherence issues in distributed setups.
- When to Use:
- Systems with write-heavy workloads where immediate consistency is not strictly required.
- When offloading a database or primary data source is necessary to improve performance.
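The write-back steps can be sketched with a background worker draining a write-back queue, with a `dirty` set standing in for the dirty bit. This is a simplified single-process illustration; real write-back caches must also handle worker failure, batching, and crash recovery.

```python
import queue
import threading

class WriteBackCache:
    """Minimal write-back sketch: writes land in the cache immediately
    and are flushed to the backing store by a background thread."""

    def __init__(self, store):
        self.store = store               # backing "database": dict-like
        self.cache = {}
        self.dirty = set()               # keys changed but not yet flushed
        self.queue = queue.Queue()       # write-back queue
        worker = threading.Thread(target=self._flush_loop, daemon=True)
        worker.start()

    def write(self, key, value):
        self.cache[key] = value          # fast path: cache only
        self.dirty.add(key)              # mark as not yet persisted
        self.queue.put(key)              # schedule the slow store write

    def _flush_loop(self):
        while True:
            key = self.queue.get()
            self.store[key] = self.cache[key]   # asynchronous write-back
            self.dirty.discard(key)
            self.queue.task_done()

    def flush(self):
        """Block until all pending writes have reached the store."""
        self.queue.join()
```

Note that `write()` returns before the store is updated; if the process died with keys still in `dirty`, those updates would be lost, which is the data-loss risk listed above.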
3. Cache Aside Pattern
Steps:
- Cache Read Attempt: The application attempts to read data from the cache.
- Cache Miss: If the data is not found in the cache (a cache miss), the application falls through to the data source.
- Fetch from Source: The application retrieves the data from the original data source (database).
- Update Cache: The fetched data is written into the cache for future access.
- Validation Check (Optional): Before returning the data, a quick check can be performed against the data source to ensure it hasn’t changed since retrieval.
- Return Data: The data (either from the cache or the updated source) is returned to the application.
- Pros:
- Balances Performance and Consistency: Avoids unnecessary source interactions on cache hits while offering a mechanism to maintain acceptable freshness.
- Flexibility: The validation check before returning data provides some control over staleness tolerance.
- Cons:
- Increased Complexity: Requires implementing fetch, write, and validation logic.
- Overhead on Cache Misses: Fetches from the source occur on cache misses, impacting performance in those cases.
- When to Use:
- Occasional slight staleness is tolerable in exchange for performance benefits on cache hits.
- Systems where granular control over data staleness is desired for different types of data.
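The read path above can be sketched as follows. This is a minimal illustration assuming a dict-like store; it omits the optional validation check, and the choice to evict on update (rather than overwrite the cached value) is one common cache-aside convention, not the only one.

```python
class CacheAside:
    """Minimal cache-aside sketch: the application checks the cache
    first and lazily populates it from the store on a miss."""

    def __init__(self, store):
        self.store = store   # original data source: dict-like
        self.cache = {}

    def get(self, key):
        if key in self.cache:            # cache hit: no store round trip
            return self.cache[key]
        value = self.store[key]          # cache miss: fetch from source
        self.cache[key] = value          # populate for future reads
        return value

    def update(self, key, value):
        # Write to the store and drop the cached copy so the next
        # read refetches fresh data.
        self.store[key] = value
        self.cache.pop(key, None)
```

Reads that hit the cache never touch the store, while misses pay the fetch-and-populate cost, matching the trade-offs listed above.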
4. Cache Invalidation
Steps:
- Data Change: The underlying data source is updated (e.g., database record modified).
- Invalidation Trigger: The data source or an associated system sends an invalidation message (e.g., broadcast notification, webhook, update to a timestamp value).
- Caches React: Caches holding copies of the modified data mark those copies as invalid.
- Force Refetch: On the next read request for the invalidated data, caches fetch the latest version from the original data source.
- Pros:
- Enforces Consistency: Actively ensures caches don’t serve stale data after the source updates.
- Cons:
- Overhead: Invalidation messages or checks can add network traffic or processing overhead.
- Complexity: In distributed systems, reliably propagating invalidation events can be complex.
- When to Use:
- Systems with strict consistency requirements where data must accurately reflect changes in the source in a timely manner.
- Often used in conjunction with other techniques (write-back or cache aside) to trigger cache updates upon data changes.
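The invalidation flow can be sketched as a cache that drops an entry when notified of a source change. In this simplified single-process illustration the trigger is a direct method call; in a distributed system it would typically arrive via pub/sub, a webhook, or a version/timestamp check.

```python
class InvalidatingCache:
    """Minimal invalidation sketch: the data source (or a messaging
    layer) notifies the cache when a key changes; invalidated entries
    are refetched from the source on the next read."""

    def __init__(self, store):
        self.store = store   # original data source: dict-like
        self.cache = {}

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store[key]          # force refetch after invalidation
        self.cache[key] = value
        return value

    def invalidate(self, key):
        # Called by the invalidation trigger when the source record
        # changes (e.g. a broadcast notification or webhook handler).
        self.cache.pop(key, None)

db = {"user:1": "Ada"}
cache = InvalidatingCache(db)
cache.get("user:1")          # warm the cache
db["user:1"] = "Grace"       # the source changes...
cache.invalidate("user:1")   # ...and the trigger fires
print(cache.get("user:1"))   # refetched fresh value: "Grace"
```

Without the `invalidate` call, the cache would keep serving "Ada" indefinitely; the trigger is what enforces timely consistency with the source.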