473,386 Members | 1,752 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

a question on google's algorithm

The title is not exactly a disnomer, so pleaes let me explain...

Say I have a large number of 0-1 vectors containing only
bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
I wish to find all the 0-1 vectors that are subsumed by that query bit
vector.

For example, given bit vector query as 01100, i wish to find all bit
vectors that whose 1st, 4th, 5th bits are also zero, leaving
the 2nd, 3rd bits as either 0 or 1.

What I wish to do is to build an index on the bit vectors, so that I
could avoid checking each 0-1 vector in sequence. In another word, I
hope this index could help me quickly filter away those vectors that
guarunteed not to be subsumed by the given 0-1 vector.

I am not sure this is the right place to ask, but the index could be
used to solve a search engine problem: the 0-1 vector
corresponds to my keywords search (a 0 means that keyword NOT in my
query, while a 1 means the keyword is in), and all the webpages
corresponds to 0-1 vectors. I wish to find all the webpages that
subsumes my query.

If anyone happen to know how Google solves this problem, and it is not
confidential, pls let me know, thanks a bunch!!!!

Feb 21 '07 #1
6 1607
* %NAME%:
[interesting question, cross-posted to numerous groups]
No C++ content means OFF-TOPIC in [comp.lang.c++].

Thank you.

FUT: [comp.programming]

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Feb 21 '07 #2

%NAME% <hb*******@gmail.comwrote in message
news:11**********************@a75g2000cwd.googlegr oups.com...
The title is not exactly a disnomer, so pleaes let me explain...

Say I have a large number of 0-1 vectors containing only
bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
I wish to find all the 0-1 vectors that are subsumed by that query bit
vector.

For example, given bit vector query as 01100, i wish to find all bit
vectors that whose 1st, 4th, 5th bits are also zero, leaving
the 2nd, 3rd bits as either 0 or 1.

What I wish to do is to build an index on the bit vectors, so that I
could avoid checking each 0-1 vector in sequence. In another word, I
hope this index could help me quickly filter away those vectors that
guarunteed not to be subsumed by the given 0-1 vector.

I am not sure this is the right place to ask, but the index could be
used to solve a search engine problem: the 0-1 vector
corresponds to my keywords search (a 0 means that keyword NOT in my
query, while a 1 means the keyword is in), and all the webpages
corresponds to 0-1 vectors. I wish to find all the webpages that
subsumes my query.

If anyone happen to know how Google solves this problem, and it is not
confidential, pls let me know, thanks a bunch!!!!
Have you tried Googling(TM) your question?

---
William Ernest Reid

Feb 21 '07 #3
On Feb 20, 6:56 pm, "%NAME%" <hbccan...@gmail.comwrote:
The title is not exactly a disnomer, so pleaes let me explain...

Say I have a large number of 0-1 vectors containing only
bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
I wish to find all the 0-1 vectors that are subsumed by that query bit
vector.

For example, given bit vector query as 01100, i wish to find all bit
vectors that whose 1st, 4th, 5th bits are also zero, leaving
the 2nd, 3rd bits as either 0 or 1.

What I wish to do is to build an index on the bit vectors, so that I
could avoid checking each 0-1 vector in sequence. In another word, I
hope this index could help me quickly filter away those vectors that
guarunteed not to be subsumed by the given 0-1 vector.

I am not sure this is the right place to ask, but the index could be
used to solve a search engine problem: the 0-1 vector
corresponds to my keywords search (a 0 means that keyword NOT in my
query, while a 1 means the keyword is in), and all the webpages
corresponds to 0-1 vectors. I wish to find all the webpages that
subsumes my query.

If anyone happen to know how Google solves this problem, and it is not
confidential, pls let me know, thanks a bunch!!!!
Suppose I have a database of all the words from the
Official Scrabble Players Dictionary (OSPD). I can
create a consonant/vowel vector for all 5-letter
words using the SQL statement:

SELECT OSPD_words.OSPD,
IIf(Mid$([OSPD],1,1) Like "[a,e,i,o,u]",1,0) AS bit1,
IIf(Mid$([OSPD],2,1) Like "[a,e,i,o,u]",1,0) AS bit2,
IIf(Mid$([OSPD],3,1) Like "[a,e,i,o,u]",1,0) AS bit3,
IIf(Mid$([OSPD],4,1) Like "[a,e,i,o,u]",1,0) AS bit4,
IIf(Mid$([OSPD],5,1) Like "[a,e,i,o,u]",1,0) AS bit5,
[bit1] & [bit2] & [bit3] & [bit4] & [bit5] AS vector
FROM OSPD_words
WHERE (((Len([OSPD]))=5));

which produces:

OSPD bit1 bit2 bit3 bit4 bit5 vector
added 1 0 0 1 0 10010
adder 1 0 0 1 0 10010
addle 1 0 0 0 1 10001
adeem 1 0 1 1 0 10110
adept 1 0 1 0 0 10100
adieu 1 0 1 1 1 10111
adios 1 0 1 1 0 10110
adits 1 0 1 0 0 10100
adman 1 0 0 1 0 10010
admen 1 0 0 1 0 10010
admit 1 0 0 1 0 10010
admix 1 0 0 1 0 10010
adobe 1 0 1 0 1 10101
adobo 1 0 1 0 1 10101
..
..
..

Once I have those vectors, I can then query them to
find anything that has the 1st, 4th & 5th letters
as consonants and the 2nd & 3rd letters can be either
consonants or vowels. This is done using the
like "0??00" pattern matching criteria. The question
marks are single character wild cards:

SELECT vector_test1.OSPD, vector_test1.vector
FROM vector_test1
WHERE (((vector_test1.vector) Like "0??00"));

which filters the above output to show only the
requested vectors:

OSPD vector
baals 01100
backs 01000
baddy 01000
badly 01000
baffs 01000
baffy 01000
baggy 01000
bahts 01000
bails 01100
bairn 01100
baith 01100
baits 01100
..
..
..
brach 00100
bract 00100
brads 00100
brags 00100
braky 00100
brand 00100
..
..
..

And because we didn'y define "y" as a vowel, there
are even some 5-letter words with no vowels:

OSPD vector
ghyll 00000
lymph 00000
hymns 00000
gypsy 00000
crypt 00000
dryly 00000
lynch 00000
synth 00000
glyph 00000
synch 00000

Feb 21 '07 #4
In article <11**********************@a75g2000cwd.googlegroups .com>,
hb*******@gmail.com says...
The title is not exactly a disnomer, so pleaes let me explain...

Say I have a large number of 0-1 vectors containing only
bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
I wish to find all the 0-1 vectors that are subsumed by that query bit
vector.

For example, given bit vector query as 01100, i wish to find all bit
vectors that whose 1st, 4th, 5th bits are also zero, leaving
the 2nd, 3rd bits as either 0 or 1.
Assuming your bit-vectors were stored in an integer type, you could use
a comparison like this:

if ((bit_vector | pattern) == pattern)
// found
else
// not found

If you need to store larger bit-vectors than will fit in any integer
type (only 32 are guaranteed), you can look at std::bit_vector and/or
std::vector<bool>. IIRC, bit_vector supports boolean operations
directly, but for vector<boolyou'll probably need to write your own.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Feb 21 '07 #5
Jerry Coffin wrote:
hb*******@gmail.com says...
>The title is not exactly a disnomer, so pleaes let me explain...

Say I have a large number of 0-1 vectors containing only bits
corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
I wish to find all the 0-1 vectors that are subsumed by that
query bit vector.

For example, given bit vector query as 01100, i wish to find
all bit vectors that whose 1st, 4th, 5th bits are also zero,
leaving the 2nd, 3rd bits as either 0 or 1.

Assuming your bit-vectors were stored in an integer type, you
could use a comparison like this:

if ((bit_vector | pattern) == pattern)
// found
else
// not found

If you need to store larger bit-vectors than will fit in any
integer type (only 32 are guaranteed), you can look at
std::bit_vector and/or std::vector<bool>. IIRC, bit_vector
supports boolean operations directly, but for vector<bool>
you'll probably need to write your own.
The fault is basically that of the OP, but please do not post
off-topic information on c.l.c. It is easy to overlook the foolish
cross-posting, but you should not answer in any group where the
answer (and question) are off topic. F'ups set.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Feb 21 '07 #6
On Tue, 20 Feb 2007 18:56:19 -0600, NAME% wrote
(in article <11**********************@a75g2000cwd.googlegroups .com>):
The title is not exactly a disnomer, so pleaes let me explain...
What in the world is a disnomer? Is this another one of those "we made
up a word because the mood struck us" deals?
--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw

Mar 6 '07 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
by: adrianTNT | last post by:
Hello; Anyone knows how can I create a scrip that returns the google page rank of a given web page? I have seen this on many web pages but I want to host my own script and I dont know how it...
2
by: Philipp Lenssen | last post by:
Google With Your Stylesheet <http://blog.outer-court.com/archive/2003_12_22_index.html#1072120815601 84833> Search.CSS is a search engine which grabs Google results and displays them using...
87
by: ziliath | last post by:
I recently tried out the Google "top coder" contest, as a C++ coder. I noticed immediately that they expected me to know STL. To which I say, what the fuck?! I may be missing something, but at...
3
by: Lad | last post by:
Hi, I would like to make in my web application a similar navigation like Google uses, I mean at the bottom of each page there are numbers of returned pages after you search query. Has anyone...
10
by: bpontius | last post by:
The GES Algorithm A Surprisingly Simple Algorithm for Parallel Pattern Matching "Partially because the best algorithms presented in the literature are difficult to understand and to implement,...
3
by: Johann Blake | last post by:
This aticle presents factual evidence that Google's PageRank, inbound links and keywords are irrelevent to your placement in the search results. It presents several case studies that show...
2
by: Julio C. Hernandez Castro | last post by:
Dear all, We have just developped a new block cipher called Raiden, following a Feistel Network structure by means of genetic programming. Our intention now consists on getting as much feedback...
12
by: amogan | last post by:
**If interested & qualified, please reply with your resume directly to amogan@google.com** Referrals are always welcome!! Network System Test Engineer - Mountain View This position is...
20
by: tatata9999 | last post by:
The first generation of web site search engine hands-down is google. A majority of these web sites are static page -driven html pages. Now, I would think more and more web-based applications are...
4
by: oruccim | last post by:
hi my friends; google can searching in phrase but it is imposible. it have a lot of page in data base and quadrillions sentence it can't search in fulltxt all of them .it need a super algorithm....
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.