Cassandra auto-expiring data

Just found a beautiful feature of Cassandra — a single update that:

  • performs an upsert (insert or update), so there’s no need to check for existing data
  • ensures items are unique, because we’re using a set { … }
  • and takes care of individual item expiration: each item is valid for only 2 weeks (1209600 seconds), after which it is removed; when no items are left, the row is removed as well

UPDATE covid_contact_tracing USING TTL 1209600
SET people_met = people_met + {'alice','charlie'}
WHERE user = 'bob';
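
For context, here is a minimal sketch of a table this statement could run against. The schema is an assumption inferred from the query (a text key plus a set<text> column), not something taken from a real deployment. The set has to be non-frozen, because a frozen collection is stored as a single value and its elements cannot expire individually.

```cql
-- Hypothetical schema, inferred from the UPDATE above
CREATE TABLE IF NOT EXISTS covid_contact_tracing (
    user text PRIMARY KEY,   -- the person whose contacts are tracked
    people_met set<text>     -- non-frozen set: each element carries its own TTL
);

-- TTL is given in seconds: 14 days * 24 h * 60 min * 60 s = 1209600
-- Read back Bob's current (unexpired) contacts
SELECT people_met FROM covid_contact_tracing WHERE user = 'bob';
```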

Seems to behave exactly as expected:

  • If Bob meets Alice multiple times, Alice’s TTL is reset to 2 weeks each time
  • After 2 weeks Alice indeed expires and is removed from the set
  • When all contacts expire, Bob’s row is removed as well
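
A quick way to check this behaviour without waiting two weeks is to repeat the experiment with a very short TTL. The sketch below assumes people_met is a non-frozen set<text> column and uses a made-up user 'dave' who has no existing row.

```cql
-- Same kind of update, but with a 10-second TTL so expiry is easy to observe
UPDATE covid_contact_tracing USING TTL 10
SET people_met = people_met + {'alice'}
WHERE user = 'dave';

-- Run right away: should return one row for 'dave' containing 'alice'
SELECT user, people_met FROM covid_contact_tracing WHERE user = 'dave';

-- Run the same SELECT again after more than 10 seconds: should return no rows,
-- because the only element has expired and nothing else keeps the row alive
```

The row disappears only because it was written with UPDATE alone. An INSERT also writes a row marker, so a row created by an INSERT without a TTL would stay around with an empty people_met after its contacts expire.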

There are drawbacks, however:

  • There’s no way to query the TTL of a collection (set/list/map) item, so you can never know how long it has left to live, or whether it has a TTL at all.
  • If you have existing items without a TTL, they have to be reinserted with one; this means tons of writes, but at least the whole set can be reinserted at once.
  • This breaks the idempotence of writes: a job that replays writes (for example, from multiple sources, to reduce the chance of missing items) would reset the TTL each time.
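
Reinserting a whole set with a TTL, as mentioned in the second drawback, could look roughly like this. It is only a sketch: it assumes the application has already read Bob's current contacts, and the two names stand in for whatever that read returned.

```cql
-- Assigning with "=" (instead of appending with "+") replaces the stored set,
-- so every element is rewritten and picks up the TTL from this statement
UPDATE covid_contact_tracing USING TTL 1209600
SET people_met = {'alice', 'charlie'}
WHERE user = 'bob';
```

Every element restarts with the full two weeks, whatever its age was before, which is the same TTL-reset effect described in the last drawback.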