On Feb 20, 6:56 pm, "%NAME%" <hbccan...@gmail.comwrote:

The title is not exactly a disnomer, so pleaes let me explain...

Say I have a large number of 0-1 vectors containing only

bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,

I wish to find all the 0-1 vectors that are subsumed by that query bit

vector.

For example, given bit vector query as 01100, i wish to find all bit

vectors that whose 1st, 4th, 5th bits are also zero, leaving

the 2nd, 3rd bits as either 0 or 1.

What I wish to do is to build an index on the bit vectors, so that I

could avoid checking each 0-1 vector in sequence. In another word, I

hope this index could help me quickly filter away those vectors that

guarunteed not to be subsumed by the given 0-1 vector.

I am not sure this is the right place to ask, but the index could be

used to solve a search engine problem: the 0-1 vector

corresponds to my keywords search (a 0 means that keyword NOT in my

query, while a 1 means the keyword is in), and all the webpages

corresponds to 0-1 vectors. I wish to find all the webpages that

subsumes my query.

If anyone happen to know how Google solves this problem, and it is not

confidential, pls let me know, thanks a bunch!!!!

Suppose I have a database of all the words from the

Official Scrabble Players Dictionary (OSPD). I can

create a consonant/vowel vector for all 5-letter

words using the SQL statement:

SELECT OSPD_words.OSPD,

IIf(Mid$([OSPD],1,1) Like "[a,e,i,o,u]",1,0) AS bit1,

IIf(Mid$([OSPD],2,1) Like "[a,e,i,o,u]",1,0) AS bit2,

IIf(Mid$([OSPD],3,1) Like "[a,e,i,o,u]",1,0) AS bit3,

IIf(Mid$([OSPD],4,1) Like "[a,e,i,o,u]",1,0) AS bit4,

IIf(Mid$([OSPD],5,1) Like "[a,e,i,o,u]",1,0) AS bit5,

[bit1] & [bit2] & [bit3] & [bit4] & [bit5] AS vector

FROM OSPD_words

WHERE (((Len([OSPD]))=5));

which produces:

OSPD bit1 bit2 bit3 bit4 bit5 vector

added 1 0 0 1 0 10010

adder 1 0 0 1 0 10010

addle 1 0 0 0 1 10001

adeem 1 0 1 1 0 10110

adept 1 0 1 0 0 10100

adieu 1 0 1 1 1 10111

adios 1 0 1 1 0 10110

adits 1 0 1 0 0 10100

adman 1 0 0 1 0 10010

admen 1 0 0 1 0 10010

admit 1 0 0 1 0 10010

admix 1 0 0 1 0 10010

adobe 1 0 1 0 1 10101

adobo 1 0 1 0 1 10101

..

..

..

Once I have those vectors, I can then query them to

find anything that has the 1st, 4th & 5th letters

as consonants and the 2nd & 3rd letters can be either

consonants or vowels. This is done using the

like "0??00" pattern matching criteria. The question

marks are single character wild cards:

SELECT vector_test1.OSPD, vector_test1.vector

FROM vector_test1

WHERE (((vector_test1.vector) Like "0??00"));

which filters the above output to show only the

requested vectors:

OSPD vector

baals 01100

backs 01000

baddy 01000

badly 01000

baffs 01000

baffy 01000

baggy 01000

bahts 01000

bails 01100

bairn 01100

baith 01100

baits 01100

..

..

..

brach 00100

bract 00100

brads 00100

brags 00100

braky 00100

brand 00100

..

..

..

And because we didn'y define "y" as a vowel, there

are even some 5-letter words with no vowels:

OSPD vector

ghyll 00000

lymph 00000

hymns 00000

gypsy 00000

crypt 00000

dryly 00000

lynch 00000

synth 00000

glyph 00000

synch 00000