Optimizing Sorting and Filtering on JSON Columns in Laravel with Indexed Virtual Columns

Draft Disclaimer: Please note that this article is currently in draft form and may undergo revisions before final publication. The content, including information, opinions, and recommendations, is subject to change and may not represent the final version. We appreciate your understanding and patience as we work to refine and improve the quality of this article. Your feedback is valuable in shaping the final release.

Language Mismatch Disclaimer: Please be aware that the language of this article may not match the language settings of your browser or device.
Do you want to read articles in English instead ?

  • json column for sorting and filtering
  • as part of my full-time job, monitoring is something I do regularly (right now 1h a week)
  • max execution time of 30s on php processes
  • following the stack trace lead me to Filament list record page
  • page has a default sorting on a json column attribute
  • no index on json
  • time for virtual column
    • can be indexed
    • updated automatically
  • benchmark to confirm solution is better than better
  • created fake data
  • before and after
  • Using nested json column for filtering, sorting and query in general
  • started to be slow around 10k records
  • remembered Rob's virtual column work 3 years ago
  • decided to give it a try
  • add migration
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::table('metadata', function (Blueprint $table) {
            $table->string('gcma_approved_at')
                ->nullable()
                ->virtualAs("json_unquote(json_extract(data, '$.gcma_latest_approvalreapproval_date__c'))");
            $table->index('gcma_approved_at');
        });
    }

    public function down(): void
    {
        Schema::table('metadata', function (Blueprint $table) {
            $table->dropColumn('gcma_approved_at');
        });
    }
};

  • create HasVirtualFields trait
<?php
namespace App\Models\Concerns;

trait HasVirtualFields
{
    public function save(array $options = [])
    {
        if (isset($this->virtualFields)) {
            $this->attributes = array_diff_key($this->attributes, array_flip($this->virtualFields));
        }
        return parent::save($options);
    }
}
  • add trait and virtual column property to model
/**
    use HasVirtualFields;
     * The virtual generated columns on the model
     *
     * @var array
     */
    protected array $virtualFields = [
        'website_repository_virtual',
    ];
  • add
  • test environment // records 580k // env: local, mac pro // --
  • provisioning script
$count = 500 * 1000;
for ($i = 0; $i < 250; $i++) {
    Metadata::factory()->count(1000)->create();
}
  • wrapping up all with a test
<?php

namespace Tests\Feature\Models;

use Tests\TestCase;
use App\Models\Metadata;

class MetadataTest extends TestCase
{
    public function testGcmaApprovedAt()
    {
        $date = fake()->dateTimeBetween()->format(DATE_ATOM);
        $metadata = Metadata::factory()->create([
            'data' => [
                'gcma_latest_approvalreapproval_date__c' => $date,
            ],
        ]);
        $this->assertEquals($date, $metadata->gcma_approved_at);
    }

    public function testGcmaUpdatedAt()
    {
        $date = fake()->dateTimeBetween()->format(DATE_ATOM);
        $metadata = Metadata::factory()->create([
            'data' => [
                'gcma_version_modified_date__v' => $date,
            ],
        ]);
        $this->assertEquals($date, $metadata->gcma_updated_at);
    }
}
count(1000)->create(); } // scenario A: nested json column // count 728.008ms // retrieve page of 100 708.738ms // filter + paginate (page of 100) 589.468ms // sort + paginate (page of 100) 5,698.480ms // filter + order + paginate (page of 100) 1,926.094ms Benchmark::dd([ 'count' => fn () => Metadata::count(), 'paginate' => fn () => Metadata::paginate(100), 'filter + paginate' => fn () => Metadata::where('data->gcma_latest_approvalreapproval_date__c', '>', '2024-09-05')->paginate(100), 'sort + paginate' => fn () => Metadata::orderBy('data->gcma_latest_approvalreapproval_date__c')->paginate(100), 'filter + sort + paginate' => fn () => Metadata::where('data->gcma_latest_approvalreapproval_date__c', '>', '2024-09-05')->orderBy('data->gcma_latest_approvalreapproval_date__c')->paginate(100), ], iterations: 100); // scenario B // virtual column + index // count 927.826ms // retrieve page of 100 583.398ms -- because who you would retrieve 580k unpaginated // filter + paginate (page of 100) 52.941ms ~11 x faster // sort + paginate (page of 100) 750.218ms ~7.5 x faster // filter + order + paginate (page of 100) 53.684ms ~36 x faster Benchmark::dd([ 'count' => fn () => Metadata::count(), 'paginate' => fn () => Metadata::paginate(100), 'filter + paginate' => fn () => Metadata::where('gcma_approved_at', '>', '2024-09-05')->paginate(100), 'sort + paginate' => fn () => Metadata::orderBy('gcma_approved_at')->paginate(100), 'filter + sort + paginate' => fn () => Metadata::where('gcma_approved_at', '>', '2024-09-05')->orderBy('gcma_approved_at')->paginate(100), ], iterations: 100);