正文

超長字段的索引

(2010-10-15 00:08:55) 下一個
SQL Server 不允許對長度超過900字節的varchar字段或超過450字節的nvarchar字段作成索引。遇上對這類字段做查詢時,就隻能做全表查詢,速度可想而知地慢。

網上看一篇文章介紹了一個好思路。就是為這個超長字段(F1)追加一個專門用來加速檢索的hash字段F2,根據F1的值生成Hash值,保存到F2中,然後對F2做成索引。當需要查詢F1時,隻需要對F2做查詢即可。圖例如下:
1)Table做成時
 ++++  F1  ++++ +++  F2 ++++
 ++++  Str                      Hash(Str)
 ++++  (例4000byte)    (16byte)
Insert into xxx (..,F1,F2,..) Values(...,@Str, Hash([@Str),...)

2)查詢時
原select ... From ... WHERE F1 = @xx        (無法使用INDEX)
現select ... From ... WHERE F2 = Hash(@xx)  (使用INDEX)

關鍵的這個Hash函數如下:(參照http://sqlservernation.com/blogs/howtos/archive/2009/03/15/hashing-large-data-strings.aspx

使用例 Hash (@xx)= fn_hashbytesMAX(@xx, 'MD5')

create function dbo.fn_hashbytesMAX
     (@string nvarchar(max)
     ,@Algo    varchar(10)
    )
    returns binary(16)
as
/************************************************************
*
*    Author:        Brandon Galderisi
*    Last modified: 03-MAR-2009
* Purpose: uses the system function hashbytes as well
*                   as sys.fn_varbintohexstr to split an
* nvarchar(max) string and hash in 8000 byte
*                   chunks hashing each 8000 byte chunk,,
*                   getting the 32 byte output, streaming each
*                   32 byte output into a string then hashing
*                   that string.
*
*************************************************************/
begin
     declare    @concat       varchar(max)
               ,@concatX      XML
               ,@NumHash      int
               ,@HASH         binary(16)
     set @NumHash = ceiling((datalength(@string)/2)/(4000.0))
    /* HashBytes only supports 8000 bytes so split the string if it is larger */
    if @NumHash>1
    begin
                                                        -- # * 4000 character strings
          ;with a as (select 1 as n union all select 1) -- 2
               ,b as (select 1 as n from a ,a a1)       -- 4
               ,c as (select 1 as n from b ,b b1)       -- 16
               ,d as (select 1 as n from c ,c c1)       -- 256
               ,e as (select 1 as n from d ,d d1)       -- 65,536
               ,f as (select 1 as n from e ,e e1)       -- 4,294,967,296 = 17+ TRILLION characters
               ,factored as (select row_number() over (order by n) rn from f)
               ,factors as (select rn,(rn*4000)+1 factor from factored)

          select @concat = cast((
          select right(sys.fn_varbintohexstr
                         (
                         hashbytes(@Algo, substring(@string, factor - 4000, 4000))
                         )
                      , 32) + ''
          from Factors
          where rn <= @NumHash
          for xml path('')
          ) as nvarchar(max))


          set @HASH = dbo.fn_hashbytesMAX(@concat ,@Algo)
    end
     else
     begin
          set @HASH = convert(binary(16), hashbytes(@Algo, @string))
     end

return @HASH
end


真是個巧妙的想法!

[ 打印 ]
閱讀 ()評論 (1)
評論
目前還沒有任何評論
登錄後才可評論.