Why does a GIN index on a jsonb column slow down my query, and what can I do about it?

Initialize test data:

CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE TABLE docs (data JSONB NOT NULL DEFAULT '{}');
-- generate 200k documents, ~half with type: "type1" and another half with type: "type2", unique incremented index and random uuid per each row
INSERT INTO docs (data)
SELECT json_build_object('id', gen_random_uuid(), 'type', (CASE WHEN random() > 0.5 THEN 'type1' ELSE 'type2' END) ,'index', n)::JSONB
FROM generate_series(1, 200000) n;
-- insert one more row with an explicit uuid to query by it later
INSERT INTO docs (data) VALUES (json_build_object('id', '30e84646-c5c5-492d-b7f7-c884d77d1e0a', 'type', 'type1' ,'index', 200001)::JSONB);
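
As a quick sanity check of the generated data, the query below counts rows per type. It is only a sketch that assumes the setup above ran as written; the alias names are illustrative. Because the split comes from random(), the two counts should be close to, but not exactly, 100,000 each.

```sql
-- Count how many documents carry each "type" value in the docs table.
SELECT data->>'type' AS doc_type,
       count(*)      AS row_count
FROM docs
GROUP BY data->>'type'
ORDER BY doc_type;
```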

First query - filter by data->type and limit:

-- FAST ~19ms
EXPLAIN ANALYZE
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
LIMIT 25;
/* "Limit  (cost=0.00..697.12 rows=25 width=90) (actual time=0.029..0.070 rows=25 loops=1)"
   "  ->  Seq Scan on docs  (cost=0.00..5577.00 rows=200 width=90) (actual time=0.028..0.061 rows=25 loops=1)"
   "        Filter: (data @> '{"type": "type1"}'::jsonb)"
   "        Rows Removed by Filter: 17"
   "Planning time: 0.069 ms"
   "Execution time: 0.098 ms" 
*/

Second query - filter by data->type, order by data->index, and limit:

-- SLOW ~250ms
EXPLAIN ANALYZE
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
ORDER BY data->'index' -- added ORDER BY
LIMIT 25;

/* "Limit  (cost=5583.14..5583.21 rows=25 width=90) (actual time=236.750..236.754 rows=25 loops=1)"
   "  ->  Sort  (cost=5583.14..5583.64 rows=200 width=90) (actual time=236.750..236.750 rows=25 loops=1)"
   "        Sort Key: ((data -> 'index'::text))"
   "        Sort Method: top-N heapsort  Memory: 28kB"
   "        ->  Seq Scan on docs  (cost=0.00..5577.50 rows=200 width=90) (actual time=0.020..170.797 rows=100158 loops=1)"
   "              Filter: (data @> '{"type": "type1"}'::jsonb)"
   "              Rows Removed by Filter: 99842"
   "Planning time: 0.075 ms"
   "Execution time: 236.785 ms"
*/

Third query - same as the second (previous) one, but with a btree index on data->index:

CREATE INDEX docs_data_index_idx ON docs ((data->'index'));

-- FAST ~19ms
EXPLAIN ANALYZE
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
ORDER BY data->'index' -- added BTREE index on this field
LIMIT 25;
/* "Limit  (cost=0.42..2473.98 rows=25 width=90) (actual time=0.040..0.125 rows=25 loops=1)"
   "  ->  Index Scan using docs_data_index_idx on docs  (cost=0.42..19788.92 rows=200 width=90) (actual time=0.038..0.119 rows=25 loops=1)"
   "        Filter: (data @> '{"type": "type1"}'::jsonb)"
   "        Rows Removed by Filter: 17"
   "Planning time: 0.127 ms"
   "Execution time: 0.159 ms"
*/

Fourth query - now filter by data->id and limit = 1:

-- SLOW ~116ms
EXPLAIN ANALYZE
SELECT * FROM docs
WHERE data @> ('{"id": "30e84646-c5c5-492d-b7f7-c884d77d1e0a"}')::JSONB -- querying by "id" field now
LIMIT 1;
/* "Limit  (cost=0.00..27.89 rows=1 width=90) (actual time=97.990..97.990 rows=1 loops=1)"
   "  ->  Seq Scan on docs  (cost=0.00..5577.00 rows=200 width=90) (actual time=97.989..97.989 rows=1 loops=1)"
   "        Filter: (data @> '{"id": "30e84646-c5c5-492d-b7f7-c884d77d1e0a"}'::jsonb)"
   "        Rows Removed by Filter: 189999"
   "Planning time: 0.064 ms"
   "Execution time: 98.012 ms"
*/

Fifth query - same as the fourth, but with a GIN (jsonb_path_ops) index on data:

CREATE INDEX docs_data_idx ON docs USING GIN (data jsonb_path_ops);

-- FAST ~17ms
EXPLAIN ANALYZE
SELECT * FROM docs
WHERE data @> '{"id": "30e84646-c5c5-492d-b7f7-c884d77d1e0a"}'::JSONB -- added gin index with jsonb_path_ops
LIMIT 1;
/* "Limit  (cost=17.55..20.71 rows=1 width=90) (actual time=0.027..0.027 rows=1 loops=1)"
   "  ->  Bitmap Heap Scan on docs  (cost=17.55..649.91 rows=200 width=90) (actual time=0.026..0.026 rows=1 loops=1)"
   "        Recheck Cond: (data @> '{"id": "30e84646-c5c5-492d-b7f7-c884d77d1e0a"}'::jsonb)"
   "        Heap Blocks: exact=1"
   "        ->  Bitmap Index Scan on docs_data_idx  (cost=0.00..17.50 rows=200 width=0) (actual time=0.016..0.016 rows=1 loops=1)"
   "              Index Cond: (data @> '{"id": "30e84646-c5c5-492d-b7f7-c884d77d1e0a"}'::jsonb)"
   "Planning time: 0.095 ms"
   "Execution time: 0.055 ms"
*/

Sixth (and last) query - same as the third query (filter by data->type, order by data->index, limit):

-- SLOW AGAIN! ~224ms
EXPLAIN ANALYZE
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
ORDER BY data->'index'
LIMIT 25;
/* "Limit  (cost=656.06..656.12 rows=25 width=90) (actual time=215.927..215.932 rows=25 loops=1)"
   "  ->  Sort  (cost=656.06..656.56 rows=200 width=90) (actual time=215.925..215.925 rows=25 loops=1)"
   "        Sort Key: ((data -> 'index'::text))"
   "        Sort Method: top-N heapsort  Memory: 28kB"
   "        ->  Bitmap Heap Scan on docs  (cost=17.55..650.41 rows=200 width=90) (actual time=33.134..152.618 rows=100158 loops=1)"
   "              Recheck Cond: (data @> '{"type": "type1"}'::jsonb)"
   "              Heap Blocks: exact=3077"
   "              ->  Bitmap Index Scan on docs_data_idx  (cost=0.00..17.50 rows=200 width=0) (actual time=32.468..32.468 rows=100158 loops=1)"
   "                    Index Cond: (data @> '{"type": "type1"}'::jsonb)"
   "Planning time: 0.157 ms"
   "Execution time: 215.992 ms"
*/

With the GIN index on the data column in place, the sixth query (identical to the third) is much slower. Is that because the data->type field has very few distinct values (only "type1" or "type2")? What can I do about it? I need the GIN index for other queries that benefit from it...

postgresql postgresql-9.4

— user606521

It looks like you are running into the issue that jsonb columns have a flat 1% statistics rate, as reported in "Working around the lack of jsonb statistics?". Looking at your query plans, the differences between the estimates and the actual executions are huge. The estimates say there are probably 200 rows, while the actual return is 100158 rows, and that causes the planner to favor certain strategies over others.
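
To see the size of the gap for yourself, a query along these lines compares the real share of matching rows with the planner's guess. It is a rough sketch, not part of the original measurements: the literal 200 is simply the rows= estimate copied from the plans in the question, and the column aliases are made up for illustration. Note that 200 out of roughly 200,000 rows works out to about 0.1%, so the effective default in these plans looks smaller than the 1% quoted above.

```sql
-- Compare the actual fraction of matching rows with the planner's estimate.
SELECT
    count(*) AS total_rows,
    count(*) FILTER (WHERE data @> '{"type": "type1"}'::jsonb) AS matching_rows,
    round(100.0 * count(*) FILTER (WHERE data @> '{"type": "type1"}'::jsonb)
          / NULLIF(count(*), 0), 2) AS actual_pct,
    -- 200 is the rows= estimate shown in the EXPLAIN output above (an assumption
    -- copied from those plans, not something this query computes).
    round(100.0 * 200 / NULLIF(count(*), 0), 2) AS estimated_pct
FROM docs;
```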

Since the choice in the sixth query comes down to favoring a bitmap index scan over an index scan, you can nudge the planner with SET enable_bitmapscan=off to make it fall back to the behavior in your third example.

Here is how that worked for me:

postgres@[local]:5432:postgres:=# EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
ORDER BY data->'index'
LIMIT 25;
                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=656.06..656.12 rows=25 width=90) (actual time=117.338..117.343 rows=25 loops=1)
   Buffers: shared hit=3096
   ->  Sort  (cost=656.06..656.56 rows=200 width=90) (actual time=117.336..117.338 rows=25 loops=1)
         Sort Key: ((data -> 'index'::text))
         Sort Method: top-N heapsort  Memory: 28kB
         Buffers: shared hit=3096
         ->  Bitmap Heap Scan on docs  (cost=17.55..650.41 rows=200 width=90) (actual time=12.838..80.584 rows=99973 loops=1)
               Recheck Cond: (data @> '{"type": "type1"}'::jsonb)
               Heap Blocks: exact=3077
               Buffers: shared hit=3096
               ->  Bitmap Index Scan on docs_data_idx  (cost=0.00..17.50 rows=200 width=0) (actual time=12.469..12.469 rows=99973 loops=1)
                     Index Cond: (data @> '{"type": "type1"}'::jsonb)
                     Buffers: shared hit=19
 Planning time: 0.088 ms
 Execution time: 117.405 ms
(15 rows)

Time: 117.813 ms
postgres@[local]:5432:postgres:=# SET enable_bitmapscan = off;
SET
Time: 0.130 ms
postgres@[local]:5432:postgres:=# EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
ORDER BY data->'index'
LIMIT 25;
                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..1320.48 rows=25 width=90) (actual time=0.017..0.050 rows=25 loops=1)
   Buffers: shared hit=4
   ->  Index Scan using docs_data_index_idx on docs  (cost=0.42..10560.94 rows=200 width=90) (actual time=0.015..0.045 rows=25 loops=1)
         Filter: (data @> '{"type": "type1"}'::jsonb)
         Rows Removed by Filter: 27
         Buffers: shared hit=4
 Planning time: 0.083 ms
 Execution time: 0.071 ms
(8 rows)

Time: 0.402 ms
postgres@[local]:5432:postgres:=#

If you want to go this route, make sure you disable that scan type only for the queries that show this behavior, otherwise you will get bad behavior in other query plans too. Something like this should work just fine:

BEGIN;
SET enable_bitmapscan=off;
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
ORDER BY data->'index'
LIMIT 25;
SET enable_bitmapscan=on;
COMMIT;
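
A variation on the same idea, offered as a sketch rather than a recommendation from the original answer: SET LOCAL limits the change to the current transaction, so there is no need to switch the setting back on by hand, and it also reverts if the transaction is rolled back.

```sql
BEGIN;
-- SET LOCAL lasts only until the end of the current transaction,
-- so the setting reverts on COMMIT or ROLLBACK without an explicit reset.
SET LOCAL enable_bitmapscan = off;
SELECT * FROM docs
WHERE data @> '{"type": "type1"}'::JSONB
ORDER BY data->'index'
LIMIT 25;
COMMIT;
-- Outside the transaction the session-level value applies again.
SHOW enable_bitmapscan;
```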

Hope that helps =)

— Kassandry

I'm not sure I understand you correctly (I'm not familiar with PG internals) - this behavior is caused by the low cardinality of the "type" field in the jsonb column (and, internally, by the flat statistics rate), right? And it also means that if I want to optimize my queries, I have to know the approximate cardinality of the jsonb field(s) I am querying in order to decide whether to turn enable_bitmapscan on or off.

— user606521

Yes, you understand it on both counts. The base 1% selectivity makes the planner favor looking up the field from the WHERE clause in the GIN index, because it believes that will return fewer rows, which isn't true. Since you can estimate the number of rows better, you can see that, because you are doing ORDER BY data->'index' LIMIT 25, scanning the first few entries of the other index (50 or so rows, including the ones thrown out by the filter) touches far fewer rows. Telling the planner that it really shouldn't try a bitmap scan therefore results in the faster query plan being used. Hope that clears things up. =)
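
The "50 or so rows" figure can be checked directly. The query below is a diagnostic sketch under the assumption that the docs table from the question is in place; the CTE and column names are made up for illustration. It walks the rows in data->'index' order and reports the position of the 25th match, which is how many rows an ordered index scan has to read before LIMIT 25 is satisfied. It sorts the whole table, so it is meant for inspection only, not as a replacement for the original query.

```sql
-- How many rows, in data->'index' order, are read before 25 matches are found?
WITH ordered AS (
    SELECT
        row_number() OVER (ORDER BY data->'index') AS pos,
        data @> '{"type": "type1"}'::jsonb         AS is_match
    FROM docs
),
matches AS (
    -- Number the matching rows in the same order.
    SELECT pos, row_number() OVER (ORDER BY pos) AS match_no
    FROM ordered
    WHERE is_match
)
SELECT pos AS rows_read_for_25_matches
FROM matches
WHERE match_no = 25;
```

This should line up with the index scan plans above, where 25 returned rows plus the "Rows Removed by Filter" count gives the number of rows actually visited.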

— Kassandry

There is also some additional clarifying information here: databasesoup.com/2015/01/tag-all-things-part-3.html and in this presentation, which should help: thebuild.com/presentations/json2015-pgconfus.pdf .

— Kassandry

The only work I know of is the JsQuery extension and its selectivity improvements, from Oleg Bartunov, Teodor Sigaev, and Alexander Korotkov. With any luck, it makes it into PostgreSQL core in 9.6 or later.

— Kassandry

I quoted the 1% figure in my answer from an email by Josh Berkus, a PostgreSQL Core Team member. Explaining where it comes from requires a much, much deeper understanding of the internals than I currently have, sorry. =( To find out exactly where that figure comes from, you could try replying on pgsql-performance@postgresql.org or asking in #postgresql on Freenode IRC.

— Kassandry