By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
446,131 Members | 2,140 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 446,131 IT Pros & Developers. It's quick & easy.

a question on google's algorithm

P: n/a
The title is not exactly a disnomer, so pleaes let me explain...

Say I have a large number of 0-1 vectors containing only
bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
I wish to find all the 0-1 vectors that are subsumed by that query bit
vector.

For example, given bit vector query as 01100, i wish to find all bit
vectors that whose 1st, 4th, 5th bits are also zero, leaving
the 2nd, 3rd bits as either 0 or 1.

What I wish to do is to build an index on the bit vectors, so that I
could avoid checking each 0-1 vector in sequence. In another word, I
hope this index could help me quickly filter away those vectors that
guarunteed not to be subsumed by the given 0-1 vector.

I am not sure this is the right place to ask, but the index could be
used to solve a search engine problem: the 0-1 vector
corresponds to my keywords search (a 0 means that keyword NOT in my
query, while a 1 means the keyword is in), and all the webpages
corresponds to 0-1 vectors. I wish to find all the webpages that
subsumes my query.

If anyone happen to know how Google solves this problem, and it is not
confidential, pls let me know, thanks a bunch!!!!

Feb 21 '07 #1
Share this Question
Share on Google+
6 Replies


P: n/a
* %NAME%:
[interesting question, cross-posted to numerous groups]
No C++ content means OFF-TOPIC in [comp.lang.c++].

Thank you.

FUT: [comp.programming]

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Feb 21 '07 #2

P: n/a

%NAME% <hb*******@gmail.comwrote in message
news:11**********************@a75g2000cwd.googlegr oups.com...
The title is not exactly a disnomer, so pleaes let me explain...

Say I have a large number of 0-1 vectors containing only
bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
I wish to find all the 0-1 vectors that are subsumed by that query bit
vector.

For example, given bit vector query as 01100, i wish to find all bit
vectors that whose 1st, 4th, 5th bits are also zero, leaving
the 2nd, 3rd bits as either 0 or 1.

What I wish to do is to build an index on the bit vectors, so that I
could avoid checking each 0-1 vector in sequence. In another word, I
hope this index could help me quickly filter away those vectors that
guarunteed not to be subsumed by the given 0-1 vector.

I am not sure this is the right place to ask, but the index could be
used to solve a search engine problem: the 0-1 vector
corresponds to my keywords search (a 0 means that keyword NOT in my
query, while a 1 means the keyword is in), and all the webpages
corresponds to 0-1 vectors. I wish to find all the webpages that
subsumes my query.

If anyone happen to know how Google solves this problem, and it is not
confidential, pls let me know, thanks a bunch!!!!
Have you tried Googling(TM) your question?

---
William Ernest Reid

Feb 21 '07 #3

P: n/a
On Feb 20, 6:56 pm, "%NAME%" <hbccan...@gmail.comwrote:
The title is not exactly a disnomer, so pleaes let me explain...

Say I have a large number of 0-1 vectors containing only
bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
I wish to find all the 0-1 vectors that are subsumed by that query bit
vector.

For example, given bit vector query as 01100, i wish to find all bit
vectors that whose 1st, 4th, 5th bits are also zero, leaving
the 2nd, 3rd bits as either 0 or 1.

What I wish to do is to build an index on the bit vectors, so that I
could avoid checking each 0-1 vector in sequence. In another word, I
hope this index could help me quickly filter away those vectors that
guarunteed not to be subsumed by the given 0-1 vector.

I am not sure this is the right place to ask, but the index could be
used to solve a search engine problem: the 0-1 vector
corresponds to my keywords search (a 0 means that keyword NOT in my
query, while a 1 means the keyword is in), and all the webpages
corresponds to 0-1 vectors. I wish to find all the webpages that
subsumes my query.

If anyone happen to know how Google solves this problem, and it is not
confidential, pls let me know, thanks a bunch!!!!
Suppose I have a database of all the words from the
Official Scrabble Players Dictionary (OSPD). I can
create a consonant/vowel vector for all 5-letter
words using the SQL statement:

SELECT OSPD_words.OSPD,
IIf(Mid$([OSPD],1,1) Like "[a,e,i,o,u]",1,0) AS bit1,
IIf(Mid$([OSPD],2,1) Like "[a,e,i,o,u]",1,0) AS bit2,
IIf(Mid$([OSPD],3,1) Like "[a,e,i,o,u]",1,0) AS bit3,
IIf(Mid$([OSPD],4,1) Like "[a,e,i,o,u]",1,0) AS bit4,
IIf(Mid$([OSPD],5,1) Like "[a,e,i,o,u]",1,0) AS bit5,
[bit1] & [bit2] & [bit3] & [bit4] & [bit5] AS vector
FROM OSPD_words
WHERE (((Len([OSPD]))=5));

which produces:

OSPD bit1 bit2 bit3 bit4 bit5 vector
added 1 0 0 1 0 10010
adder 1 0 0 1 0 10010
addle 1 0 0 0 1 10001
adeem 1 0 1 1 0 10110
adept 1 0 1 0 0 10100
adieu 1 0 1 1 1 10111
adios 1 0 1 1 0 10110
adits 1 0 1 0 0 10100
adman 1 0 0 1 0 10010
admen 1 0 0 1 0 10010
admit 1 0 0 1 0 10010
admix 1 0 0 1 0 10010
adobe 1 0 1 0 1 10101
adobo 1 0 1 0 1 10101
..
..
..

Once I have those vectors, I can then query them to
find anything that has the 1st, 4th & 5th letters
as consonants and the 2nd & 3rd letters can be either
consonants or vowels. This is done using the
like "0??00" pattern matching criteria. The question
marks are single character wild cards:

SELECT vector_test1.OSPD, vector_test1.vector
FROM vector_test1
WHERE (((vector_test1.vector) Like "0??00"));

which filters the above output to show only the
requested vectors:

OSPD vector
baals 01100
backs 01000
baddy 01000
badly 01000
baffs 01000
baffy 01000
baggy 01000
bahts 01000
bails 01100
bairn 01100
baith 01100
baits 01100
..
..
..
brach 00100
bract 00100
brads 00100
brags 00100
braky 00100
brand 00100
..
..
..

And because we didn'y define "y" as a vowel, there
are even some 5-letter words with no vowels:

OSPD vector
ghyll 00000
lymph 00000
hymns 00000
gypsy 00000
crypt 00000
dryly 00000
lynch 00000
synth 00000
glyph 00000
synch 00000

Feb 21 '07 #4

P: n/a
In article <11**********************@a75g2000cwd.googlegroups .com>,
hb*******@gmail.com says...
The title is not exactly a disnomer, so pleaes let me explain...

Say I have a large number of 0-1 vectors containing only
bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
I wish to find all the 0-1 vectors that are subsumed by that query bit
vector.

For example, given bit vector query as 01100, i wish to find all bit
vectors that whose 1st, 4th, 5th bits are also zero, leaving
the 2nd, 3rd bits as either 0 or 1.
Assuming your bit-vectors were stored in an integer type, you could use
a comparison like this:

if ((bit_vector | pattern) == pattern)
// found
else
// not found

If you need to store larger bit-vectors than will fit in any integer
type (only 32 are guaranteed), you can look at std::bit_vector and/or
std::vector<bool>. IIRC, bit_vector supports boolean operations
directly, but for vector<boolyou'll probably need to write your own.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Feb 21 '07 #5

P: n/a
Jerry Coffin wrote:
hb*******@gmail.com says...
>The title is not exactly a disnomer, so pleaes let me explain...

Say I have a large number of 0-1 vectors containing only bits
corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
I wish to find all the 0-1 vectors that are subsumed by that
query bit vector.

For example, given bit vector query as 01100, i wish to find
all bit vectors that whose 1st, 4th, 5th bits are also zero,
leaving the 2nd, 3rd bits as either 0 or 1.

Assuming your bit-vectors were stored in an integer type, you
could use a comparison like this:

if ((bit_vector | pattern) == pattern)
// found
else
// not found

If you need to store larger bit-vectors than will fit in any
integer type (only 32 are guaranteed), you can look at
std::bit_vector and/or std::vector<bool>. IIRC, bit_vector
supports boolean operations directly, but for vector<bool>
you'll probably need to write your own.
The fault is basically that of the OP, but please do not post
off-topic information on c.l.c. It is easy to overlook the foolish
cross-posting, but you should not answer in any group where the
answer (and question) are off topic. F'ups set.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Feb 21 '07 #6

P: n/a
On Tue, 20 Feb 2007 18:56:19 -0600, NAME% wrote
(in article <11**********************@a75g2000cwd.googlegroups .com>):
The title is not exactly a disnomer, so pleaes let me explain...
What in the world is a disnomer? Is this another one of those "we made
up a word because the mood struck us" deals?
--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw

Mar 6 '07 #7

This discussion thread is closed

Replies have been disabled for this discussion.