Chunk, ChunkByID and Cursor
Suppose you have to process more than 20,000 rows from the database. Using Laravel's get() or all() methods here is a poor choice, because they load the entire result set into memory at once before you can iterate over it, so you may run into an out-of-memory error.
The chunk method was introduced to overcome this. It breaks the large dataset into smaller chunks and processes them one at a time. In Laravel, chunk takes the chunk size as its first argument and a closure as its second:
User::chunk(200, function ($users) {
    foreach ($users as $user) {
        echo $user->name;
    }
});
But chunk is not recommended for queries that filter on a column and then update that same column inside the callback, as in the sample code below:
User::where('status', 1)->chunk(50, function ($users) {
    foreach ($users as $user) {
        $user->update(['status' => 0]);
    }
});
Instead of updating every matching row, this updates only about half of them. It fetches the first chunk and updates it, but those updated rows no longer match the where('status', 1) filter, so every subsequent query sees a smaller result set while the offset keeps growing, and rows get skipped.
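To see why, here is a minimal plain-PHP simulation of that behaviour (a sketch, not Eloquent code): 100 rows match the filter, the chunk size is 50, and the update removes each processed row from the filtered set.

```php
<?php

// Sketch: simulate chunk()'s limit/offset pagination when the callback's
// update removes rows from the filtered set. Plain arrays stand in for
// the users table; this is not Eloquent code.
$rows = array_fill(0, 100, 1); // 100 rows, all with status = 1

$offset = 0;
while (true) {
    // Each pass re-runs the filtered "query" with a growing offset,
    // just like chunk() does.
    $matching = array_keys(array_filter($rows, fn ($status) => $status === 1));
    $page = array_slice($matching, $offset, 50);
    if ($page === []) {
        break;
    }
    foreach ($page as $id) {
        $rows[$id] = 0; // the "update" removes the row from the filter
    }
    $offset += 50;
}

$missed = count(array_filter($rows, fn ($status) => $status === 1));
echo $missed; // 50 — half the rows were never processed
```

After the first pass only 50 rows still match, so the second query, at offset 50, finds nothing and the loop stops with half the rows untouched.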
In this scenario you should use the chunkById method instead, which paginates on the id column rather than an offset, so already-updated rows cannot shift the window:
User::where('status', 1)->chunkById(50, function ($users) {
    foreach ($users as $user) {
        $user->update(['status' => 0]);
    }
});
ChunkById()
chunkById differs from chunk in the queries it generates.
chunk paginates with limit and offset:
select * from `users` limit 20 offset 0
select * from `users` limit 20 offset 20
select * from `users` limit 20 offset 40
chunkById paginates on the id column instead of an offset, filtering on the last id of the previous chunk (shown here assuming sequential ids):
select * from `users` order by `id` asc limit 20
select * from `users` where `id` > 20 order by `id` asc limit 20
select * from `users` where `id` > 40 order by `id` asc limit 20
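This strategy is known as keyset pagination, and the idea can be sketched in plain PHP (made-up sequential ids, not Eloquent code): remember the highest id seen so far and filter on it, instead of counting an offset.

```php
<?php

// Sketch of keyset pagination, the strategy behind chunkById():
// track the highest id seen and ask only for ids greater than it.
$ids = range(1, 100); // stand-in for the users table's ids
$lastId = 0;
$pages = [];

while (true) {
    // Equivalent of: where id > $lastId order by id asc limit 20
    $candidates = array_values(array_filter($ids, fn ($id) => $id > $lastId));
    $page = array_slice($candidates, 0, 20);
    if ($page === []) {
        break;
    }
    $pages[] = $page;
    $lastId = end($page); // drives the next query's where clause
}

echo count($pages); // 5 pages of 20 ids each
```

Because each page is anchored to an id rather than a position, rows that disappear from the result set between queries never cause later rows to be skipped.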
Since both strategies set their own limit on every query, an outer limit() call is simply overridden. So you can't cap the total number of rows in the following manner:
$dataPerChunk = 20;
User::limit(200)->chunk($dataPerChunk, function ($users) {
    foreach ($users as $user) {
        echo $user->name;
    }
});
Or
User::limit(200)->chunkById($dataPerChunk, function ($users) {
    foreach ($users as $user) {
        echo $user->name;
    }
});
But since chunkById already filters on id internally, we can add our own where clause on id to act as a limit. So we write the query in this manner:
$dataPerChunk = 20;
$limit = 200;

// Fetch the id of the first row past the limit
$id = User::orderBy('id', 'asc')->limit(1)->offset($limit)->select('id')->first()->id;
// e.g. 201, if the ids are sequential starting from 1

// A where clause on id now acts as the limit
User::where('id', '<', $id)->chunkById($dataPerChunk, function ($users) {
    foreach ($users as $user) {
        echo $user->name;
    }
});
Cursor
Cursor has the same responsibility as chunk and chunkById, i.e. dealing with bulky data, but its way of handling it is quite different. It doesn't load all the records (say, 10,000 of them) at once; instead it is lazy, hydrating one Eloquent model at a time as the loop asks for it. This is the reason the name cursor was chosen. Cursor uses PHP generators to implement this, and because it runs a single query instead of one query per chunk, it avoids the per-chunk query overhead of the other methods.
$users = User::where('status', 1)->cursor();

foreach ($users as $user) {
    echo $user->name;
}
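The generator mechanism mentioned above can be sketched in plain PHP. Here a hypothetical lazyRows function stands in for what cursor() does internally; it is not Eloquent code, but it shows how only one row is produced per iteration:

```php
<?php

// Sketch of the generator idea behind cursor(): rows are yielded one
// at a time, only when the consuming loop asks for the next one.
function lazyRows(array $source): Generator
{
    foreach ($source as $row) {
        yield $row; // only this one row is handed out per iteration
    }
}

$seen = [];
foreach (lazyRows(['Amir', 'Bea', 'Chen']) as $name) {
    $seen[] = $name; // the generator resumes here on each pass
}

echo implode(',', $seen); // Amir,Bea,Chen
```

The loop body and the generator take turns: execution pauses at yield until the foreach requests the next value, which is why memory stays flat no matter how many rows the source holds.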