Cache synchronization ensures that every cached copy of data reflects the latest changes in the “source of truth,” which is usually a database. Write policies dictate how updates flow between the cache and that underlying source.
Cache Synchronization Techniques, aka Write Policies
1. Write-Through Caching
Steps:
- Data Update Request: The application initiates a request to modify data.
- Simultaneous Writes: The data is written synchronously to both the cache and the underlying data source (database).
- Operation Complete: The write is acknowledged only after both the cache and the data source have been successfully updated.
- Pros:
- Strong Consistency: The cache is always perfectly in sync with the data source.
- Simple to Implement: No need for complex invalidation or eventual consistency mechanisms.
- Cons:
- Performance Overhead: Every write operation incurs the cost of both a cache update and a database write, potentially slowing down the application.
- When to Use:
- Your system needs strong data consistency and write performance is a secondary concern.
- You need simplified caching logic.
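The write-through flow above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the backing store is modeled as a plain dict, and real systems would add error handling for a failed store write.

```python
class WriteThroughCache:
    """Minimal write-through sketch: every write updates the cache
    and the backing store before the call returns."""

    def __init__(self, store):
        self.store = store   # backing "database": any dict-like object
        self.cache = {}

    def write(self, key, value):
        # Both writes happen synchronously; the operation is only
        # complete once the slower store write has succeeded.
        self.store[key] = value
        self.cache[key] = value

    def read(self, key):
        # The cache is kept in sync with the store, so a hit is safe.
        if key in self.cache:
            return self.cache[key]
        value = self.store[key]
        self.cache[key] = value
        return value
```

Because the store write sits on the critical path of `write()`, every update pays the database latency, which is exactly the performance trade-off described above.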
2. Write-Back Caching
Steps:
- Data Update Request: The application initiates a request to modify data.
- Cache Write: The data is first written to the cache.
- Acknowledgement: The application receives an immediate acknowledgement, allowing it to proceed.
- Asynchronous Write-Back: In the background, the updated data is written back to the underlying data source, often using a write-back queue for efficiency.
Note: A “dirty bit” is often used to mark data that has been changed but not yet written back to the storage.
- Pros:
- Excellent Write Performance: Writes are absorbed by the fast cache, deferring slower source updates.
- Reduced Load on Data Source: Fewer database writes can improve overall system throughput.
- Cons:
- Eventual Consistency: There might be a delay before the source reflects changes made in the cache.
- Risk of Data Loss: If the cache fails before a write-back occurs, data updates could be lost.
- Complexity: Requires mechanisms to manage write-back queues and handle potential cache coherence issues in distributed setups.
- When to Use:
- Systems with write-heavy workloads where immediate consistency is not strictly required.
- When offloading a database or primary data source is necessary to improve performance.
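The write-back steps can be sketched with a background worker draining a write-back queue, with a `dirty` set standing in for the dirty bit. This is a simplified single-process illustration; real write-back caches must also handle worker failure, batching, and crash recovery.

```python
import queue
import threading

class WriteBackCache:
    """Minimal write-back sketch: writes land in the cache immediately
    and are flushed to the backing store by a background thread."""

    def __init__(self, store):
        self.store = store               # backing "database": dict-like
        self.cache = {}
        self.dirty = set()               # keys changed but not yet flushed
        self.queue = queue.Queue()       # write-back queue
        worker = threading.Thread(target=self._flush_loop, daemon=True)
        worker.start()

    def write(self, key, value):
        self.cache[key] = value          # fast path: cache only
        self.dirty.add(key)              # mark as not yet persisted
        self.queue.put(key)              # schedule the slow store write

    def _flush_loop(self):
        while True:
            key = self.queue.get()
            self.store[key] = self.cache[key]   # asynchronous write-back
            self.dirty.discard(key)
            self.queue.task_done()

    def flush(self):
        """Block until all pending writes have reached the store."""
        self.queue.join()
```

Note that `write()` returns before the store is updated; if the process died with keys still in `dirty`, those updates would be lost, which is the data-loss risk listed above.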
3. Cache Aside Pattern
Steps:
- Cache Read Attempt: The application attempts to read data from the cache.
- Cache Miss: If the data is not found in the cache (a cache miss), the application falls through to the data source.
- Fetch from Source: The application retrieves the data from the original data source (database).
- Update Cache: The fetched data is written into the cache for future access.
- Validation Check (Optional): Before returning the data, a quick check can be performed against the data source to ensure it hasn’t changed since retrieval.
- Return Data: The data (either from the cache or the updated source) is returned to the application.
- Pros:
- Balances Performance and Consistency: Avoids unnecessary source interactions on cache hits while offering a mechanism to maintain acceptable freshness.
- Flexibility: The validation check before returning data provides some control over staleness tolerance.
- Cons:
- Increased Complexity: Requires implementing fetch, write, and validation logic.
- Overhead on Cache Misses: Fetches from the source occur on cache misses, impacting performance in those cases.
- When to Use:
- Occasional slight staleness is tolerable in exchange for performance benefits on cache hits.
- Systems where granular control over data staleness is desired for different types of data.
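The read path above can be sketched as follows. This is a minimal illustration assuming a dict-like store; it omits the optional validation check, and the choice to evict on update (rather than overwrite the cached value) is one common cache-aside convention, not the only one.

```python
class CacheAside:
    """Minimal cache-aside sketch: the application checks the cache
    first and lazily populates it from the store on a miss."""

    def __init__(self, store):
        self.store = store   # original data source: dict-like
        self.cache = {}

    def get(self, key):
        if key in self.cache:            # cache hit: no store round trip
            return self.cache[key]
        value = self.store[key]          # cache miss: fetch from source
        self.cache[key] = value          # populate for future reads
        return value

    def update(self, key, value):
        # Write to the store and drop the cached copy so the next
        # read refetches fresh data.
        self.store[key] = value
        self.cache.pop(key, None)
```

Reads that hit the cache never touch the store, while misses pay the fetch-and-populate cost, matching the trade-offs listed above.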
4. Cache Invalidation
Steps:
- Data Change: The underlying data source is updated (e.g., database record modified).
- Invalidation Trigger: The data source or an associated system sends an invalidation message (e.g., broadcast notification, webhook, update to a timestamp value).
- Caches React: Caches holding copies of the modified data mark those copies as invalid.
- Force Refetch: On the next read request for the invalidated data, caches fetch the latest version from the original data source.
- Pros:
- Enforces Consistency: Actively ensures caches don’t serve stale data after the source updates.
- Cons:
- Overhead: Invalidation messages or checks can add network traffic or processing overhead.
- Complexity: In distributed systems, reliably propagating invalidation events can be complex.
- When to Use:
- Systems with strict consistency requirements where data must accurately reflect changes in the source in a timely manner.
- Often used in conjunction with other techniques (write-back or cache aside) to trigger cache updates upon data changes.
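The invalidation flow can be sketched as a cache that drops an entry when notified of a source change. In this simplified single-process illustration the trigger is a direct method call; in a distributed system it would typically arrive via pub/sub, a webhook, or a version/timestamp check.

```python
class InvalidatingCache:
    """Minimal invalidation sketch: the data source (or a messaging
    layer) notifies the cache when a key changes; invalidated entries
    are refetched from the source on the next read."""

    def __init__(self, store):
        self.store = store   # original data source: dict-like
        self.cache = {}

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store[key]          # force refetch after invalidation
        self.cache[key] = value
        return value

    def invalidate(self, key):
        # Called by the invalidation trigger when the source record
        # changes (e.g. a broadcast notification or webhook handler).
        self.cache.pop(key, None)

db = {"user:1": "Ada"}
cache = InvalidatingCache(db)
cache.get("user:1")          # warm the cache
db["user:1"] = "Grace"       # the source changes...
cache.invalidate("user:1")   # ...and the trigger fires
print(cache.get("user:1"))   # refetched fresh value: "Grace"
```

Without the `invalidate` call, the cache would keep serving "Ada" indefinitely; the trigger is what enforces timely consistency with the source.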