Faster Eloquent chunking

#laravel

As your applications scale, processing large amounts of database records with Laravel Eloquent can become increasingly difficult. Resulting in out of memory exceptions and overall slowing down your application. Why is that?

When fetching results from the database you are, in turn, pulling that data into memory. Take this snippet of code for instance

Post::all()->each(function ($post) {
// ...
});

Which results in the following query, loading all records in the posts table into memory

select * from posts;

Typically, for tables with a small number of records, this is absolutely acceptable. However, as you accumulate tens of thousands of posts you will eventually begin hitting memory resource constraints of your webserver.

Chunking

A common approach in Laravel is to use Eloquent's (via BuildsQuery) [chunk()](https://laravel.com/api/6.x/Illuminate/Database/Concerns/BuildsQueries.html#method_chunk) method which fetches a fixed amount of records breaking a larger set into more consumable chunks.

Post::chunk(1000, function ($post) {
// ...
});

While this might seem fine there are both improvements and gotchas to be aware of.

First, imagine the following scenario: you are fetching Post records from the database to update an attribute that you are also using in a where clause

Post::where('published_at', '<', now())->chunk(1000, function ($post) {
$post->update('published_at', now());
});

Although contrived, this exemplifies a very real problem where such a query would result in an endless loop as the published_at attribute will always be less than now() at the time of the queries next execution (assuming accuracy to the second as with MySQL's timestamp column type or similar).

Second, there is the matter of the queries performance and its impact on the database server. The above code would result in a query similar to the following

select * from posts order by posts.id asc limit 1000 offset 9000

MySQL is unable to go directly to the offset due to deleted records and additional query constraints and, therefore, this query has to effectively select the first 10,000 records to return only the last 1,000 selected. As you can imagine this will not scale well into many-thousands or even millions of rows. This will cause your database server to use unnecessary resources slowing down all other queries from your application.

Chunking... but better!

In order to both prevent the unforeseen gotcha and improve database server performance we can use Eloquent's chunkById method

Post::where('published_at', '<', now())->chunkById(1000, function ($post) {
$post->update('published_at', now());
});

The above code snippet will result in a query similar to the following

select * from posts where published_at < '2019-09-11 12:00:00' and id > 9000 order by id asc limit 1000

Why is this approach considered "better"?

a) it allows MySQL to completely skip the first 9000 (assumingly sequential) records b) we will no longer be re-selecting records which we have already updated due to the id constraint in our where clause

Bonus - How?! 🤔

Diving into the chunkyById method of the BuildsQueries trait we see that the id of the last record fetched (remember, we're ordering by id in ascending order) is stored and used as a parameter in the next query to be run.

Syntax highlighting provided by Torchlight.