PERF: Add partial index on reviewables for topic view (PR #10492)

On the topic view route we query for reviewables of each post in the stream, using a query that filters on two unindexed columns. This results in a Parallel Seq Scan over all rows, which can take quite some time (~20ms was seen) on forums with lots of flags

After index is added PostgreSQL planner opts for a simple Index Scan and runs in sub 1ms.

Before:

                                     QUERY PLAN                                                  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=11401.08..11404.87 rows=20 width=28) (actual time=19.209..19.209 rows=1 loops=1)
   Group Key: r.target_id
   ->  Gather Merge  (cost=11401.08..11404.41 rows=26 width=28) (actual time=19.202..20.419 rows=1 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial GroupAggregate  (cost=10401.06..10401.38 rows=13 width=28) (actual time=16.958..16.958 rows=0 loops=3)
               Group Key: r.target_id
               ->  Sort  (cost=10401.06..10401.09 rows=13 width=16) (actual time=16.956..16.956 rows=0 loops=3)
                     Sort Key: r.target_id
                     Sort Method: quicksort  Memory: 25kB
                     Worker 0:  Sort Method: quicksort  Memory: 25kB
                     Worker 1:  Sort Method: quicksort  Memory: 25kB
                     ->  Nested Loop  (cost=0.42..10400.82 rows=13 width=16) (actual time=15.894..16.938 rows=0 loops=3)
                           ->  Parallel Seq Scan on reviewables r  (cost=0.00..10302.47 rows=8 width=12) (actual time=15.882..16.927 rows=0 loops=3)
                                 Filter: (((target_type)::text = 'Post'::text) AND (target_id = ANY ('{7565483,7565563,7565566,7565567,7565568,7565569,7565579,7565580,7565583,7565586,7565588,7565589,7565601,7565602,7565603,7565613,7565620,7565623,7565624,7565626}'::integer[])))
                                 Rows Removed by Filter: 49183
                           ->  Index Scan using index_reviewable_scores_on_reviewable_id on reviewable_scores s  (cost=0.42..12.27 rows=2 width=8) (actual time=0.029..0.030 rows=1 loops=1)
                                 Index Cond: (reviewable_id = r.id)
 Planning Time: 0.318 ms
 Execution Time: 20.470 ms

After:

                                                                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.84..342.54 rows=20 width=28) (actual time=0.038..0.038 rows=1 loops=1)
   Group Key: r.target_id
   ->  Nested Loop  (cost=0.84..341.95 rows=31 width=16) (actual time=0.020..0.033 rows=1 loops=1)
         ->  Index Scan using index_reviewables_on_target_id on reviewables r  (cost=0.42..96.07 rows=20 width=12) (actual time=0.013..0.026 rows=1 loops=1)
               Index Cond: (target_id = ANY ('{7565483,7565563,7565566,7565567,7565568,7565569,7565579,7565580,7565583,7565586,7565588,7565589,7565601,7565602,7565603,7565613,7565620,7565623,7565624,7565626}'::integer[]))
         ->  Index Scan using index_reviewable_scores_on_reviewable_id on reviewable_scores s  (cost=0.42..12.27 rows=2 width=8) (actual time=0.005..0.005 rows=1 loops=1)
               Index Cond: (reviewable_id = r.id)
 Planning Time: 0.253 ms
 Execution Time: 0.067 ms

GitHub

Just to err on the side of caution, I think it’ll be better to add the index here concurrently. Another suggestion I have is to add a custom name for this index because the default name Rails generates will not indicate anything about the filter. We do annotate the index in our code but if someone is looking through the index in the DB, it’ll seem like an index on the target_id.

As a rarely written and small table, I’m not 100% on the concurrent trade-off being worth it. I tested it on our largest site in terms of flags and the index is created very fast in just a few seconds. Anyway, I guess we can add it and live with the longer migration.

Like

FROM: index_reviewables_on_target_id
TO:   index_reviewables_on_target_id_where_post_type_eq_post

?

I’m not sure this model scales to complex filters in the partial index :thinking:

Inspecting it from the DB shows the complete information (second to last line):

discourse_development=# \d reviewables
                                               Table "public.reviewables"
         Column          |            Type             | Collation | Nullable |                 Default
-------------------------+-----------------------------+-----------+----------+-----------------------------------------
 id                      | bigint                      |           | not null | nextval('reviewables_id_seq'::regclass)
 type                    | character varying           |           | not null |
 status                  | integer                     |           | not null | 0
 created_by_id           | integer                     |           | not null |
 reviewable_by_moderator | boolean                     |           | not null | false
 reviewable_by_group_id  | integer                     |           |          |
 category_id             | integer                     |           |          |
 topic_id                | integer                     |           |          |
 score                   | double precision            |           | not null | 0.0
 potential_spam          | boolean                     |           | not null | false
 target_id               | integer                     |           |          |
 target_type             | character varying           |           |          |
 target_created_by_id    | integer                     |           |          |
 payload                 | json                        |           |          |
 version                 | integer                     |           | not null | 0
 latest_score            | timestamp without time zone |           |          |
 created_at              | timestamp without time zone |           | not null |
 updated_at              | timestamp without time zone |           | not null |
Indexes:
    "reviewables_pkey" PRIMARY KEY, btree (id)
    "index_reviewables_on_type_and_target_id" UNIQUE, btree (type, target_id)
    "index_reviewables_on_reviewable_by_group_id" btree (reviewable_by_group_id)
    "index_reviewables_on_status_and_created_at" btree (status, created_at)
    "index_reviewables_on_status_and_score" btree (status, score)
    "index_reviewables_on_status_and_type" btree (status, type)
    "index_reviewables_on_target_id" btree (target_id) WHERE target_type::text = 'Post'::text
    "index_reviewables_on_topic_id_and_status_and_created_by_id" btree (topic_id, status, created_by_id)

But is it a index on target_id! It indexes target_id in a btree only when target_type = 'Post', which in the real world is 90% of of table anyway.

All changes made!

@xfalcox Btw you don’t have to make those changes if it doesn’t make sense from your perspective. I might not suggest things that always make sense.

    add_index :reviewables, [:target_id], where: "target_type = 'Post'", algorithm: :concurrently, name: "index_post_reviewables_on_target_id"