Simple Scaling with Redis Lists
Relational databases are simple and powerful, but it's no secret that they have their limitations. So what happens when simple doesn't cut it and your relational database falls short? You might try throwing more servers at it, reaching for replication, or adding load balancing, all of which help scale read-intensive operations. But how would you go about scaling write-intensive operations?
Take for example some of the problems I've faced working with clients in the Adtech space:
- Tracking impressions across an ad network
- Monitoring user behaviors and actions
- Tracking traffic to and from affiliate networks
- Capturing spikes of traffic resulting in periodic loads 10x higher than normal
All of these required high data ingestion rates that couldn't be achieved the same way you'd remedy read-intensive bottlenecks. Rather than rearchitecting from the ground up, in each of these cases I was able to keep things simple and lean on tools that were already familiar and readily available.
Redis is an [...] in-memory data structure store, used as a database, cache, and message broker.
Although Redis is commonly used for caching the output of CPU-intensive or time-consuming tasks, its use cases extend far beyond that. Redis supports various data types including hashes, lists, sets and sorted sets, and of course strings, each offering the speed and scalability you've come to expect from a blazing-fast, in-memory data store.
We're going to look at an example of how to use Redis Lists for fast data ingestion to then process and persist the data to a relational database later on.
Redis Lists are simply lists of strings, sorted by insertion order.
Lists support a maximum length of over 4 billion elements and, as long as you're accessing elements near the head and tail of the list, maintain constant-time insertion and deletion. You can think of Lists as a feature-limited extension to your programming language's array or list data type.
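To make that analogy concrete, here's a minimal sketch of the core list operations using Laravel's Redis facade (the key name is just an example, not from the article's project):

use Illuminate\Support\Facades\Redis;

// Append to the tail and prepend to the head -- both constant time.
Redis::rpush('example:list', 'first');   // list is now ["first"]
Redis::rpush('example:list', 'second');  // ["first", "second"]
Redis::lpush('example:list', 'zero');    // ["zero", "first", "second"]

// Read a range without removing anything (0 to -1 means the whole list).
Redis::lrange('example:list', 0, -1);    // ["zero", "first", "second"]

// Pop from either end -- also constant time.
Redis::lpop('example:list');             // "zero"
Redis::rpop('example:list');             // "second"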
Because of this, Redis Lists are great for fast data ingestion. We can use a List as a "data channel": push data into it in real time, then consume, process, and ingest that data into our relational database in more performant batches.
The pseudocode samples below are written in PHP and reference components of the Laravel framework. However, the same methodology can be applied in your language/framework of choice.
Filename: AdSpotController.php
Description: Push data from an incoming request onto the end of the list as requests come in.
public function __invoke(Request $request)
{
    Redis::rpush('impressions', json_encode([
        'spot_id' => $request->spot_id,
        // more data points...
        'timestamp' => time(),
    ]));

    // ...
}
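For context, an invokable controller like this would typically be registered as a single-action route. A minimal sketch, assuming an API route (the path below is illustrative, not from the original project):

// routes/api.php
use App\Http\Controllers\AdSpotController;
use Illuminate\Support\Facades\Route;

// Each incoming impression request is handled by the invokable controller above.
Route::post('/impressions', AdSpotController::class);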
Filename: ImpressionListConsumer.php
Description: Pop data off the end of the list in chunks of 1,000 and process it before storing it in the relational database. Scheduled to run at a fixed interval (e.g. every 10 minutes).
public function handle()
{
    // Pop up to 1,000 entries at a time until the list is empty.
    while ($chunk = Redis::rpop('impressions', 1000)) {
        DB::transaction(function () use ($chunk) {
            collect($chunk)
                ->map(fn ($row) => json_decode($row, true))
                ->transform(...) // do some data processing...
                ->mapToGroups(...) // maybe summarize high polarity data
                ->each(function ($group) {
                    Impression::insert($group->all());
                });
        });
    }
}
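To run the consumer at a fixed interval, it can be wired into Laravel's scheduler. A minimal sketch, assuming ImpressionListConsumer is exposed as an Artisan command (the impressions:consume signature is made up for illustration):

// app/Console/Kernel.php
protected function schedule(Schedule $schedule)
{
    // Drain the "impressions" list every ten minutes; withoutOverlapping()
    // prevents a new run from starting while a previous one is still working.
    $schedule->command('impressions:consume')
        ->everyTenMinutes()
        ->withoutOverlapping();
}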
Closing
While these examples aren't exactly functional, they accurately depict how simple it is to get started with Redis Lists for fast data ingestion as a means of scaling your relational-database-driven application without introducing unnecessary complexity.
If there's one thing to take away from this article, aside from how powerful Redis is, it's this: the next time you're facing a scaling problem or technical limitation due to the constraints of your stack, consider exploring what's already available to you rather than introducing additional complexity.
But wait...
If you're curious about how far you can push this, I have easily scaled apps to tens of thousands of writes per second with a single 4 GB Redis instance. Redis' enterprise offering claims to support up to 200 million writes per second, should you need that.