Design Decision: Async Query Based Change Capture Vs Event Based Message Queue
type
Post
status
Published
date
Oct 17, 2024
slug
async-query-based-change-capture
summary
tags
software design patterns
cache invalidation
database caching
in-memory cache
database
message queue
event based architecture
data pipeline
Design Decision
category
icon
password
Importance
Tweet
Problem Statement
I have a in-memory cache that needs invalidation when the underlying source data is updated. The underlying source data is stored in a postgres database
Characteristic
- Higher read rate than write rate
- data is rarely update
- read performance is priority
- parallel read and write
Proposal 1 - Async Query Based Change Capture
- For each data row, keep a
updated_atdate field, update that field whenever the data row is updated.
- At every set interval, query the database for
updated_at >= x - where
xcould be - the last seen latest
updated_atfrom the previous query - the time of the previous query
Characteristics
- pull based
- cron
- async
Pros
- simple - no additional infrastructure other than existing database
Cons
- increases database load - compared to an event based push approach
- could be mitigated by using indexes - trade off between write and read latency
- no eventual consistency guarantees between parallel caches - happens when a write updates
updated_atto an earlier time and a read happens in between theupdated_atand the actual write xcould be negative offset to reduce chances of inconsistencies - trade off between query performance and inconsistency
- the length of the interval determines the longest possible lag between data update and data syncing across the servers’ in-memory cache
Proposal 2 - Event based message queue
- On database update, sends a message to queue that represents the change event
- consumers consume from the queue and update their cache accordingly
Variations
- consumers: listen vs poll
- write-back vs write-through
- write-back might result in loss of data
- write-through increases write latency
Pros
- takes load off database
- guaranteed eventual consistency
- less lag than proposal 1
Cons
- additional infrastructure
References
MediumLess is More: A Case for Query Based Change Data Capture
Less is More: A Case for Query Based Change Data Capture
Choose simple technology for long term stewardship