Cassandra auto-expiring data
Just found a beautiful feature of Cassandra — a single update that:
- performs an upsert (insert or update), so there’s no need to check for existing data
- ensures items are unique — because we’re using a set { … }
- and takes care of individual item expiration: each item is valid for only 2 weeks (1209600 seconds), after which it is removed; when no items are left, the row is removed as well
UPDATE covid_contact_tracing USING TTL 1209600 SET people_met = people_met + {'alice','charlie'} WHERE user = 'bob';
Seems to behave exactly as expected:
- If Bob meets Alice multiple times, Alice’s TTL is reset to 2 weeks each time
- After 2 weeks Alice indeed expires and is removed from the set
- When all contacts expire, Bob’s row is removed as well (it was created by UPDATE only, so nothing is left to keep it alive)
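The behavior listed above can be sketched with a toy in-memory model. This is not Cassandra or driver code: `ToyTable`, `add_contacts`, and the `people_met` method are hypothetical names, and the model only assumes what the list describes, namely that every element written gets its own fresh expiry and that a row with no live elements is gone.

```python
from dataclasses import dataclass, field

DAY = 24 * 60 * 60
TWO_WEEKS = 14 * DAY  # 1209600 seconds, the TTL used in the UPDATE


@dataclass
class ToyTable:
    """Toy stand-in for the table: user -> {contact: expiry time in seconds}."""
    rows: dict = field(default_factory=dict)

    def add_contacts(self, user, contacts, now, ttl=TWO_WEEKS):
        # Mirrors `people_met = people_met + {...}`: an upsert in which every
        # element written gets a fresh expiry, whether or not it already existed.
        row = self.rows.setdefault(user, {})
        for contact in contacts:
            row[contact] = now + ttl

    def people_met(self, user, now):
        # Only elements whose expiry is still in the future are visible.
        # Returns None when nothing is left, modeling the row disappearing.
        live = {c for c, expiry in self.rows.get(user, {}).items() if expiry > now}
        return live or None


table = ToyTable()
table.add_contacts("bob", {"alice", "charlie"}, now=0)
table.add_contacts("bob", {"alice"}, now=7 * DAY)  # Bob meets Alice again a week later

print(table.people_met("bob", now=10 * DAY))  # alice and charlie are both visible
print(table.people_met("bob", now=15 * DAY))  # only alice: charlie expired on day 14
print(table.people_met("bob", now=22 * DAY))  # None: alice expired on day 21
```

Meeting Alice again on day 7 pushes only her expiry to day 21; Charlie still expires on day 14.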
There are drawbacks, however:
- There’s no way to query the TTL of an individual collection (set/list/map) item, so you can never know how long it has left to live, or whether it has a TTL set at all.
- If you have existing items without TTL, they have to be reinserted with TTL; this means tons of writes, but at least the whole set can be reinserted at once.
- This breaks idempotence of writes: you cannot run a job that replays writes (for example, from multiple sources, to reduce the chance of missing items), since each replay would reset the TTL.
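The idempotence problem in the last point comes down to simple arithmetic. The sketch below is a toy calculation, not Cassandra code: `expiry_after` is a hypothetical helper, and it assumes that each write sets the element's expiry to the time the write is applied plus the TTL, so the latest applied write decides when the element disappears.

```python
DAY = 24 * 60 * 60
TTL = 14 * DAY  # 1209600 seconds


def expiry_after(write_times, ttl=TTL):
    """Expiry of one set element after writes applied at the given times.

    Toy assumption: every write sets expiry = time the write is applied + ttl,
    so the most recent write wins.
    """
    return max(write_times) + ttl


original = expiry_after([0])            # Bob met Alice on day 0
replayed = expiry_after([0, 10 * DAY])  # the same event replayed on day 10

print(original // DAY)  # 14
print(replayed // DAY)  # 24: the replay extended Alice's lifetime by 10 days
```

Applying the same event twice gives a different final state than applying it once, which is why a replay job is not safe here.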