电脑有固定IP 做网站/关键词推广营销
Hash索引只能处理简单等值比较。不论何时当一个索引列涉及到一个使用了=操作符的 比较时,查询规划器将考虑使用一个Hash索引。下面的命令将创建一个Hash索引:
CREATE INDEX name ON table USING HASH (column);
在官方文档中没有更多的关于hash索引的描述,于是这里我按照pg中hash索引的原理进行了实验:
hash索引特别适用于字段VALUE非常长(不适合b-tree索引,因为b-tree一个PAGE至少要存储3个ENTRY,所以不支持特别长的VALUE)的场景,例如很长的字符串,并且用户只需要等值搜索,建议使用hash index。
kingbase=# create table t_hash (id int, info text);
CREATE TABLE
kingbase=# insert into t_hash select generate_series(1,100), repeat(md5(random()::text),10000);
INSERT 0 100
kingbase=# create index idx_t_hash_1 on t_hash using btree (info);
ERROR: index row size 3720 exceeds maximum 2712 for index "idx_t_hash_1"
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
果然,这里因为info字段太长导致报错了!继续创建hash索引看看是不是可以呢?
kingbase=# create index idx_t_hash_1 on t_hash using hash (info);
WARNING: hash indexes are not WAL-logged and their use is discouraged
CREATE INDEX
索引创建成功了,但是下面却有一行WARNING的提示,为什么呢?参考官方文档中的原话:
Hash索引操作目前不被WAL记录,因此存在未写入修改,在数据库崩溃后需要 用REINDEX命令重建Hash索引。同样, 在完成初始的基础备份后,对于Hash索 引的改变也不会通过流式或基于文件的 复制所复制,所以它们会对其后使用它 们的查询给出错误的答案。正因为这些 原因,Hash索引已不再被建议使用。
原来kingbase中不建议使用hash索引,那如果想要实现长字符串的等值搜索,使用hash索引能够成功呢?
kingbase=# set enable_hashjoin=off;
SET
kingbase=# explain (analyze,verbose,timing,costs,buffers) select * from t_hash where info in (select info from t_hash limit 1); QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------Nested Loop (cost=0.03..3.29 rows=1 width=36) (actual time=3.816..209.564 rows=1 loops=1)Output: t_hash.id, t_hash.infoJoin Filter: (t_hash.info = t_hash_1.info)Rows Removed by Join Filter: 99Buffers: shared hit=404-> HashAggregate (cost=0.03..0.04 rows=1 width=32) (actual time=1.498..1.499 rows=1 loops=1)Output: t_hash_1.infoGroup Key: t_hash_1.infoBuffers: shared hit=3-> Limit (cost=0.00..0.02 rows=1 width=32) (actual time=0.028..0.030 rows=1 loops=1)Output: t_hash_1.infoBuffers: shared hit=1-> Seq Scan on PUBLIC.t_hash t_hash_1 (cost=0.00..2.00 rows=100 width=32) (actual time=0.025..0.025 rows=1 loops=1)Output: t_hash_1.infoBuffers: shared hit=1-> Seq Scan on PUBLIC.t_hash (cost=0.00..2.00 rows=100 width=36) (actual time=0.008..0.193 rows=100 loops=1)Output: t_hash.id, t_hash.infoBuffers: shared hit=1Planning time: 0.372 msExecution time: 209.626 ms
(20 行记录)
同样的语句,我们在pg中执行的情况如下:
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_hash where info in (select info from t_hash limit 1); QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.03..3.07 rows=1 width=22) (actual time=0.859..0.861 rows=1 loops=1) Output: t_hash.id, t_hash.info Buffers: shared hit=11 -> HashAggregate (cost=0.03..0.04 rows=1 width=18) (actual time=0.281..0.281 rows=1 loops=1) Output: t_hash_1.info Group Key: t_hash_1.info Buffers: shared hit=3 -> Limit (cost=0.00..0.02 rows=1 width=18) (actual time=0.012..0.012 rows=1 loops=1) Output: t_hash_1.info Buffers: shared hit=1 -> Seq Scan on public.t_hash t_hash_1 (cost=0.00..2.00 rows=100 width=18) (actual time=0.011..0.011 rows=1 loops=1) Output: t_hash_1.info Buffers: shared hit=1 -> Index Scan using idx_t_hash_1 on public.t_hash (cost=0.00..3.02 rows=1 width=22) (actual time=0.526..0.527 rows=1 loops=1) Output: t_hash.id, t_hash.info Index Cond: (t_hash.info = t_hash_1.info) Buffers: shared hit=6 Planning time: 0.159 ms Execution time: 0.898 ms
(19 rows)
确实,kingbase中hash索引无法支持长字符串的等值搜索.因为kingbase是基于pg9版本改造的,对于hash索引的支持还是较弱的,再加上不被wal记录,看来是在kingbase中hash索引的确没有什么使用的价值了.