Hash Join vs Hash Semi Join

PostgreSQL 9.2

I'm trying to understand the difference between Hash Semi Join and a plain Hash Join.

Here are two queries:

Query I:

EXPLAIN ANALYZE SELECT * FROM orders WHERE customerid IN (SELECT
customerid FROM customers WHERE state='MD');

Hash Semi Join  (cost=740.34..994.61 rows=249 width=30) (actual time=2.684..4.520 rows=120 loops=1)
  Hash Cond: (orders.customerid = customers.customerid)
  ->  Seq Scan on orders  (cost=0.00..220.00 rows=12000 width=30) (actual time=0.004..0.743 rows=12000 loops=1)
  ->  Hash  (cost=738.00..738.00 rows=187 width=4) (actual time=2.664..2.664 rows=187 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 7kB
        ->  Seq Scan on customers  (cost=0.00..738.00 rows=187 width=4) (actual time=0.018..2.638 rows=187 loops=1)
              Filter: ((state)::text = 'MD'::text)
              Rows Removed by Filter: 19813

Query II:

EXPLAIN ANALYZE SELECT * FROM orders o JOIN customers c ON o.customerid = c.customerid WHERE c.state = 'MD';

Hash Join  (cost=740.34..1006.46 rows=112 width=298) (actual time=2.831..4.762 rows=120 loops=1)
  Hash Cond: (o.customerid = c.customerid)
  ->  Seq Scan on orders o  (cost=0.00..220.00 rows=12000 width=30) (actual time=0.004..0.768 rows=12000 loops=1)
  ->  Hash  (cost=738.00..738.00 rows=187 width=268) (actual time=2.807..2.807 rows=187 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 37kB
        ->  Seq Scan on customers c  (cost=0.00..738.00 rows=187 width=268) (actual time=0.018..2.777 rows=187 loops=1)
              Filter: ((state)::text = 'MD'::text)
              Rows Removed by Filter: 19813

As can be seen, the plans differ in only two ways: the join node is a Hash Semi Join in the first case, and the hash table consumes 7kB in the first case but 37kB in the second.

But I don't understand the difference in hash table size. In both plans the Hash node is fed by exactly the same Seq Scan node with the same Filter. Why is there a difference?

postgresql join hashing

— St.Antario

Have you looked at the actual output of the queries? Or use explain (analyze, verbose).

— jjanes

In the first query, only the customerid from customers needs to be stored in the hash table, because that is the only data needed to implement the semi-join. That is why the Hash node in the first plan reports width=4.

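In the second query, all the columns need to be stored in the hash table, because you are selecting all of the columns from the table (using *) rather than just testing for the existence of a customerid. The Hash node in the second plan accordingly reports width=268, and the same 187 rows take up 37kB instead of 7kB.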
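
To make the point concrete, here is a toy Python model of the two build phases. It is only a sketch of the general technique, not PostgreSQL's actual executor code. The function names, the sample rows, and the firstname and address columns are all made up for illustration; only customerid and state appear in the plans above.

```python
def build_semi_join_table(customers, keep):
    """Build side of a semi join: only the join key has to be remembered."""
    return {c["customerid"] for c in customers if keep(c)}


def build_join_table(customers, keep):
    """Build side of a regular join: whole rows are kept, because their
    columns are part of the query output."""
    table = {}
    for c in customers:
        if keep(c):
            table.setdefault(c["customerid"], []).append(c)
    return table


def hash_semi_join(orders, key_set):
    """Emit each order at most once if its key exists in the build side."""
    return [o for o in orders if o["customerid"] in key_set]


def hash_join(orders, table):
    """Emit one combined row per matching (order, customer) pair."""
    return [{**o, **c} for o in orders for c in table.get(o["customerid"], [])]


# Hypothetical sample data; firstname and address are invented columns.
customers = [
    {"customerid": 1, "state": "MD", "firstname": "Ann", "address": "1 Main St"},
    {"customerid": 2, "state": "NY", "firstname": "Bob", "address": "2 Oak Ave"},
    {"customerid": 3, "state": "MD", "firstname": "Cy", "address": "3 Elm Rd"},
]
orders = [
    {"orderid": 10, "customerid": 1},
    {"orderid": 11, "customerid": 2},
    {"orderid": 12, "customerid": 3},
    {"orderid": 13, "customerid": 1},
]


def in_md(c):
    return c["state"] == "MD"


semi_table = build_semi_join_table(customers, in_md)
join_table = build_join_table(customers, in_md)
semi_rows = hash_semi_join(orders, semi_table)
join_rows = hash_join(orders, join_table)

# Count how many column values each build side had to hold on to.
stored_semi = len(semi_table)
stored_join = sum(len(row) for rows in join_table.values() for row in rows)
print("values stored for semi join:", stored_semi)
print("values stored for join:", stored_join)
```

Both variants scan and filter customers identically and return the same orders here, but the regular join's table holds every column of every qualifying customer, which mirrors the width=4 versus width=268 difference in the two plans.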

— jjanes