how did they do their search algorithm - help anyone with advice, I've got impossible requirements

P: n/a
vic
My manager wants me to develop a search program that would work like the one
they have at edorado.com.
She came up with her requirements after comparing how search works at
different websites, like eBay, Yahoo and others.
This is what she wants my program to be able to do
(try this test at different websites just for fun).

At eBay:

- enter the word 'television' in the search field and you will get 2155 items.

- enter the word 'televisions' (plural) and you will get only 60 items.

- enter the word 'tv' and you will get 5147 items.

In other words, if you enter different variations of the same word, you
get different results.

My manager showed me one website (www.edorado.com) where the above problem
didn't occur during her testing. When she searched for 3 different forms of the
word 'tv' she always got the same results.

And another important thing: the result set was very precise - televisions,
not other things. Speaking statistically, it had 'low noise', or in other words,
a very low percentage of unrelated items.

Could anyone who works on search engine development offer me
any advice on how I should approach the design of my algorithm? I don't
have much experience in programming this kind of search, and my manager is a
real snake. According to her, if a search engine returns hundreds of thousands
of results, the user will not be able to browse through all of them, so what
she wants my program to do is return fewer results, and only those that
are most relevant to the search term.

Could anyone help me with advice?

Jul 20 '05 #1
14 Replies


P: n/a
"vic" <vi*@hotmail.com> wrote:
> My manager wants me to develop a search program, that would work like they
> have it at edorado.com.
> She made up her requirements after having compared how search works at
> different websites, like eBay, Yahoo and others.
> This is what she wants my program to be able to do:
> (try this test at different websites just for fun).
>
> At eBay:
>
> - enter the word 'television' in a search field you will get 2155 items.
>
> - enter the word 'televisions' (in plural) you will get only 60 items
>
> - enter the word 'tv' you will get 5147 items.
>
> In other words - if entering different variations of the same word, you
> will be getting different results .
>
> My manager showed me one website (www.edorado.com ), where the above problem
> didn't occur during her testing. When searched for 3 different forms of the
> word 'tv' she was always getting the same results.

They seem to assign everything to a category and the search facility
is initially searching for categories not for specific items. For
example searching for antique finds the antiques category but does not
find any items with the word antique in their title in other
categories.

It is easy to set up the television category to be associated with the
keywords television, tv, etc.
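
The keyword-to-category association described above could be sketched roughly like this in Python. The category names, the keyword lists, and the function name `find_category` are all invented for illustration; nothing is known here about how edorado.com actually implements its search.

```python
# Hypothetical keyword table: every spelling a user might type maps to one category.
CATEGORY_KEYWORDS = {
    "televisions": ["television", "televisions", "tv", "tvs"],
    "antiques": ["antique", "antiques"],
}

# Invert the table once so that each lookup is a single dictionary access.
KEYWORD_TO_CATEGORY = {
    keyword: category
    for category, keywords in CATEGORY_KEYWORDS.items()
    for keyword in keywords
}


def find_category(query):
    """Return the category for a query, or None if no keyword matches."""
    return KEYWORD_TO_CATEGORY.get(query.strip().lower())


if __name__ == "__main__":
    for q in ("television", "Televisions", "TV", "plasma screen"):
        print(q, "->", find_category(q))
```

All three spellings of "tv" land in the same category, so they return the same result set, while a specific phrase such as "plasma screen" finds nothing at all, which is exactly the trade-off of searching categories rather than items.
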

> And another important thing: the result set came very precise - televisions,
> not other things, speaking statistically - 'low noise', or in other words -
> very low percentage of unrelated items.

See above, you're not searching for items, you're searching for
categories. Hence anything you find will be relevant (assuming
everything is in the correct category) but there's a good chance that
you won't find anything at all if you search for something specific.

Low noise is also probably because there are very items in total on
the edorado site.

> Could anyone from those who work on development of search engines offer me
> any advise on how I should approach to the design of my algorithm. I don't
> have much experience in programming this kind of search, and my manager is a
> real snake. According to her, if search engine brings hundreds of thousands
> results, the user would not be able to browse through all of them, so what
> she wants my program to do - is to bring less results, and only those that
> are most relevant to the search term.


Not at all an HTML issue is it? I suggest you ask in either a general
programming/information theory group or in one devoted to whatever
database and language you are using.

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net> <http://steve.pugh.net/>
Jul 20 '05 #2

P: n/a

"vic" <vi*@hotmail.com> wrote in message
news:R_*****************@newssvr27.news.prodigy.co m...
> My manager wants me to develop a search program, that would work like they
> have it at edorado.com.
> <snip>
> Could anyone from those who work on development of search engines offer me
> any advise on how I should approach to the design of my algorithm. I don't
> have much experience in programming this kind of search, and my manager is a
> real snake.
> <snip>
> Could anyone help me with advise?


Yes. Run away screaming.
Advise your boss that search engine programming is FAR too complex even for
experienced programmers and TEAMS of programmers.
During a recent usability study my company did, we had people use the site's
"search" function, which is badly broken. We would have the participants
look for words & terms that we KNEW were covered on the site. When they were
led to a search results page that had "No Matches", we'd ask them "OK, now
what would you do?" 100% of the users said "Leave and go to Google".

Providing search is a very valuable feature for any relatively large site.
But it IS NOT something to be programmed by an amateur. There are so many
considerations when building a search utility. Natural language searching,
taxonomy problems, misspellings, abbreviations, acronyms, different levels
of understanding the nomenclature of the site (often, site content is not
written in a way that users understand).

The absolute best option for you is to recommend to your boss that you use a
search utility programmed by people who know WTF they're doing.
If you have the budget, look at Verity Search http://verity.com/ it is
about $15k.
If you have no budget, get yourself HTDig http://www.htdig.org/
I'm personally not a big fan of HTDig's UI, but it is still far better than
most searches and FREE!

"Why Search Engines Fail" -
http://www.searchtools.com/info/whysearchesfail.html
"Why On-Site Searching Stinks" - http://www.uie.com/articles/search_stinks/
"Brands Suffer From Search Disfunctions" -
http://www.clickz.com/experts/brand/...le.php/1477641
"Linking And Searching" - http://www.humanfactors.com/downloads/jan032.htm
"Half Web Searchers Enter One Query, Look At One Page Of Results" -
http://usabilitynews.com/news/article1213.asp

-Karl
Jul 20 '05 #3

P: n/a
Steve Pugh <st***@pugh.net> wrote:
> Low noise is also probably because there are very items in total on
> the edorado site.


Insert the word 'few' in order to make sense of the above sentence.

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net> <http://steve.pugh.net/>
Jul 20 '05 #4

P: n/a
vic wrote:
> My manager wants me to develop a search program, that would work like they
> have it at edorado.com.
>
> Could anyone from those who work on development of search engines offer me
> any advise on how I should approach to the design of my algorithm.

Purchase a search engine. Google has a very nice one.
You can buy a large number of search engines for the amount of your
time it will take to develop one of the sort you are considering.
Do a market survey of what is available: price, installation time, and
training. Estimate how long it will take you to make one: time, money,
resources. Present the two options to your boss.

--
jmm dash list at sohnen-moe dot com
(Remove .TRSPAMTR for email)
Jul 20 '05 #5

P: n/a
After you've told your boss that you put the question to leading
experts in the field and that the responses you got support your view:
* It's not possible
* and/or if it were, it would be prohibitively expensive
* Explain why the edorado solution is of limited applicability and
unsuitable.
....
you may want to propose another way forward that goes some way toward
achieving her aim.

Depending on the nature of the site you are working on, you might be
able to do this:

Offer users the choice of either using a free-form search box (using a
third party search engine) or selecting from some predefined searches.

For example you could have a select field that offered a list of
categories. Reverting to your example one entry might be
"Televisions".

By presenting that choice you reduce the necessity for the user to
choose between TV, Television and televisions for their search and you
KNOW you can deliver some results for those pre-defined searches.

You can then feed the search engine with a search string of your
choosing, in this case "television" to get the largest number of hits
- if that is your intention, or maybe something more specific like
"television + electronics" to eliminate unhelpful results like "as seen
in our television advert".

This is no use if you are likely to need to handle a large number of
common choices.
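
A rough Python sketch of the predefined-search idea above. The menu labels, the table `PREDEFINED_SEARCHES`, and the function `build_query` are hypothetical, and the "+" query syntax is simply copied from the example in this post; a real third-party engine will have its own syntax.

```python
# Hypothetical predefined searches: menu label -> query string sent to the engine.
PREDEFINED_SEARCHES = {
    "Televisions": "television + electronics",
    "Antiques": "antique",
}


def build_query(menu_choice=None, free_text=""):
    """Prefer the canned query for a menu choice; otherwise fall back to free text."""
    if menu_choice in PREDEFINED_SEARCHES:
        return PREDEFINED_SEARCHES[menu_choice]
    return free_text.strip()


if __name__ == "__main__":
    print(build_query(menu_choice="Televisions"))
    print(build_query(free_text="  tv stand  "))
```

Whatever `build_query` returns is what gets handed to the search engine, so the site owner can tune each canned query by hand until its results look right.
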

Jul 20 '05 #6

P: n/a
vic wrote:

> My manager wants me to develop a search program, that would work like they
> have it at edorado.com.
> She made up her requirements after having compared how search works at
> different websites, like eBay, Yahoo and others.
> This is what she wants my program to be able to do:
> (try this test at different websites just for fun).
>
> At eBay:
>
> - enter the word 'television' in a search field you will get 2155 items.
>
> - enter the word 'televisions' (in plural) you will get only 60 items
>
> - enter the word 'tv' you will get 5147 items.
>
> In other words - if entering different variations of the same word, you
> will be getting different results .
>
> My manager showed me one website (www.edorado.com ), where the above problem
> didn't occur during her testing. When searched for 3 different forms of the
> word 'tv' she was always getting the same results.
>
> And another important thing: the result set came very precise - televisions,
> not other things, speaking statistically - 'low noise', or in other words -
> very low percentage of unrelated items.
>
> Could anyone from those who work on development of search engines offer me
> any advise on how I should approach to the design of my algorithm. I don't
> have much experience in programming this kind of search, and my manager is a
> real snake. According to her, if search engine brings hundreds of thousands
> results, the user would not be able to browse through all of them, so what
> she wants my program to do - is to bring less results, and only those that
> are most relevant to the search term.
>
> Could anyone help me with advise?

Yes, I can.

I think that sexual dissatisfaction among managers leads them to take
their frustration out on their subordinates.

When you see the viper coming for you, better escape.

Are you scared of losing your job?
Then what are you working for?

Well, my advice:

Try to explain to her
that she's absolutely right: if search sites returned only the links the
user wants to see,
it would be great.
But the relevance of search results is a big problem.

A Russian guy proposed a slightly better principle of search
and founded a top search engine.

And even it does not do everything you might expect.

The internet holds a very big volume of information, and probably even the
most effective search through all of it with existing algorithms,
even if all the sand on earth were turned into a search engine,
could take more time than dark energy needs to destroy this universe.

Tell her that you can code a classic search engine:

1. A crawler that walks the internet, determines which words occur
in the first 1 to 5 kilobytes of each page, and puts them into a database.

2. A search engine that just looks in this database.
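
The two parts above can be sketched in a few lines of Python as an inverted index. This is only a toy under stated assumptions: pages are passed in as a dict of URL to text (fetching them is left out), the names `build_index` and `search` are invented, and the 5 KB limit is approximated by counting characters rather than bytes.

```python
import re
from collections import defaultdict

MAX_CHARS = 5 * 1024  # index only the start of each page (characters, roughly 5 KB)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose first MAX_CHARS characters contain it.

    `pages` is a dict of URL -> page text; crawling is out of scope for this sketch.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text[:MAX_CHARS]):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    pages = {
        "http://example.com/a": "Cheap TV sets for sale",
        "http://example.com/b": "Antique TV cabinet",
    }
    index = build_index(pages)
    print(search(index, "antique tv"))
```

Note that this matches exact words only, so "television" and "tv" still give different results; that is the synonym and grammar problem discussed next.
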

Synonyms (different words with the same meaning) are also a problem;
you can of course make a list of synonyms.

That's good for "tv" and "television",
but what do you do with "oil" and "shell", or "machine oil" and "machine
shell"?

Yes, there are AI solutions, but they all need big processor resources.

And grammar.
In most Western languages the plural of a noun is marked with the endings "-s",
"-es" and "-en".
From what I can recall at the moment, the following may be added to the end
of words:
"-ing" (-n') (English, Danish, Norwegian)
"-ed" (English)
"-en" (Dutch, German, English)
"-t" (Dutch, German)
"-'s" (English, Danish)
"-e" (Danish) http://www.geocities.com/tsca.geo/dansk/

At the beginning may be added:
"ge-" (Dutch, German)

Finally, words that came into English from French may drop the
last mute syllable in most cases:
"-que" is replaced with "-c"
republique republic
technique technics

Optionally you may apply these grammar rules in sequence.
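
As an illustration of the suffix idea, here is a deliberately naive Python sketch that strips one common English ending so that "television" and "televisions" index to the same key. The suffix list, the minimum stem length, and the name `crude_stem` are assumptions made for this example; it is not an established stemming algorithm such as the Porter stemmer.

```python
# Endings to strip, longest first so that "-es" is tried before "-s".
SUFFIXES = ("ing", "'s", "es", "ed", "en", "s")
MIN_STEM = 3  # never shorten a word below this many characters


def crude_stem(word):
    """Strip at most one common ending from a word; deliberately naive."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("television", "televisions", "searching", "searched", "garden"):
        print(w, "->", crude_stem(w))
```

The sketch over-strips some words ("garden" becomes "gard") and leaves short forms like "tvs" alone, so it would still need a synonym list for cases like "tv" and "television".
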

Also some exotic letters may be simplified:
Glyphs "","","","" may be changed to "e", and so on.

Also "ae" "" "" may be replaced by each other, and all may be replaced by
"e".
""
may be replaced by "aa".
"" "" "oe"
"" "ue" "e"
"ij" "y" "u" "i" "" and even "ae" ""
In American English "a" and "u" may also replace each other.
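
A small Python sketch of this kind of letter simplification, using the standard `unicodedata` module. The accented characters in the post above did not survive archiving, so the characters and replacements in the `SPECIAL` table are illustrative assumptions, as is the function name `fold`.

```python
import unicodedata

# Explicit replacements (assumed table): letters that NFKD leaves untouched
# (ae-ligature, o-slash, sharp s, oe-ligature) plus a-ring, which is
# conventionally written "aa".
SPECIAL = {"æ": "ae", "ø": "oe", "å": "aa", "ß": "ss", "œ": "oe"}


def fold(text):
    """Lowercase text and reduce accented letters to plain ASCII where possible."""
    text = text.lower()
    text = "".join(SPECIAL.get(ch, ch) for ch in text)
    # NFKD splits a letter such as e-acute into "e" plus a combining accent,
    # and the combining accents are then dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for word in ("Café", "Århus", "über"):
        print(word, "->", fold(word))
```

Applying `fold` to both the indexed words and the query means either spelling finds the same entries. It maps u-umlaut to "u" rather than "ue"; which spelling is better depends on the language, so that choice would need its own table.
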

I do not recommend that you implement more difficult techniques,
because nobody needs yet another gray search site;
you would be spending your energy on something with no prospects.
Just let her see what can be done; that may be the goal of your job.

That's all.

Bye.

--Michaelo Mitrofanov

Jul 20 '05 #7

P: n/a
"vic" <vi*@hotmail.com> wrote in message news:<R_*****************@newssvr27.news.prodigy.c om>...
snip
> My manager showed me one website (www.edorado.com ), where the above problem
> didn't occur during her testing. When searched for 3 different forms of the
> word 'tv' she was always getting the same results.

snip

Er.. Perhaps the "powered by" link at the bottom of the search
function at that page would be of interest to your manager. Write them
a check and you're done.
http://www.freefind.com/
Jul 20 '05 #8

P: n/a
Tim
On Sun, 6 Jun 2004 12:03:47 -0400,
"Karl Groves" <ka**@NOSPAMkarlcore.com> posted:
> If you have no budget, get yourself HTDig http://www.htdig.org/
> I'm personally not a big fan of HTDig's UI, but it is still far better than
> most searches and FREE!


The interface is just HTML templates, you can completely rewrite it to suit
yourself. It's possible to make it work pretty much the same as most other
search engines, as far as how you input queries and receive results.

I've been playing with it for quite a while on an intranet. The things I
don't like about it are:

* I can't add an address to the database, it has to reindex the whole site.

* It doesn't handle UTF-8 encoded pages (I don't know about other
encodings, it works fine with ISO-8859-1 and ASCII, which are the other
ones I use).

* It doesn't handle character entities (well only a few of them). As soon
as it encounters something like "&hellip;" or "&rdquo;", it turns them into
"&amp;hellip;" or "&amp;rdquo;" (you'll see that crap in the excerpts it
returns with the results pages).

Most people probably won't notice that problem, as few seem to use proper
punctuation entities. It seems to handle &nbsp; okay, but I haven't gone
through and tested other common ones (e.g. &copy;, &trade;, etc.).

* It, like many other engines, is fond of using an excerpt from the top of
the page. That means that many pages have excerpts which are nothing more
than lists of the main site navigational links. Though you can tell it to
use meta descriptions instead, or even supply both separately.
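
For the character-entity problem in the list above, one possible workaround is to decode the excerpts yourself before displaying them. This is not an ht://Dig setting; it is a generic Python sketch using the standard `html.unescape` function, and the function name `clean_excerpt` is invented for the example.

```python
import html


def clean_excerpt(raw):
    """Decode character references, including ones that were escaped twice."""
    previous = None
    text = raw
    # Repeat until nothing changes, so "&amp;hellip;" becomes "&hellip;"
    # on the first pass and the ellipsis character on the second.
    while text != previous:
        previous = text
        text = html.unescape(text)
    return text


if __name__ == "__main__":
    print(clean_excerpt("Wait for it&amp;hellip; &amp;rdquo;done&amp;rdquo;"))
```

Decoding repeatedly will also unescape text that was deliberately double-escaped (for example a page that shows entity names as literal text), and the decoded text must be escaped again before it is written back into an HTML page.
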

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.
Jul 20 '05 #9

P: n/a

"Tim" <ti*@mail.localhost.invalid> wrote in message
news:1q*******************************@40tude.net. ..
> On Sun, 6 Jun 2004 12:03:47 -0400,
> "Karl Groves" <ka**@NOSPAMkarlcore.com> posted:
>> If you have no budget, get yourself HTDig http://www.htdig.org/
>> I'm personally not a big fan of HTDig's UI, but it is still far better than
>> most searches and FREE!
>
> The interface is just HTML templates, you can completely rewrite it to suit
> yourself. It's possible to make it work pretty much the same as most other
> search engines, as far as how you input queries and receive results.


You're right. I was referring to the search form itself, again which can be
modified to suit, I've just been too distracted by other things to do so.
Thanks for the other tips.

-Karl
Jul 20 '05 #10

P: n/a
"Steve Pugh" <st***@pugh.net> schrieb im Newsbeitrag
news:qc********************************@4ax.com...
"vic" <vi*@hotmail.com> wrote:
[...]
> They seem to assign everything to a category and the search facility
> is initially searching for categories not for specific items. For
> example searching for antique finds the antiques category but does not
> find any items with the word antique in their title in other
> categories.
>
> It is easy to set up the television category to be associated with the
> keywords television, tv, etc.


Also note that this is editorial work, not programming. It would need quite
a team of editors to achieve this even for one single language only.

> [...]
>> Could anyone from those who work on development of search engines offer me
>> any advise on how I should approach to the design of my algorithm. I don't
>> have much experience in programming this kind of search, and my manager is a
>> real snake. According to her, if search engine brings hundreds of thousands
>> results, the user would not be able to browse through all of them, so what
>> she wants my program to do - is to bring less results, and only those that
>> are most relevant to the search term.


If it were that easy... why would Google and others spend millions of
dollars on research and enhancing their search and indexing algorithms every
year?

Besides the paid and free search engines mentioned in other posts, also consider
subscribing to a remote search service such as Atomz (www.atomz.com).

--
Markus
Jul 20 '05 #11

P: n/a
Hi Vic,
Maybe I developed just the type of search engine you are looking for.
It is described at
http://www.contentsitesearch.esmartdesign.com
If you are interested please drop me an e-mail
Cheers,
Peter

Jul 23 '05 #12

P: n/a
"Peter JavaScript" <pe***************@hotmail.com> wrote:
> Hi Vic,
> Maybe I developed just the type of search engine you are looking for.
> It is described at
> http://www.contentsitesearch.esmartdesign.com
> If you are interested please drop me an e-mail
> Cheers,
> Peter


Replying to six months old messages with no quoted material to let
readers know what you're talking about. Classy.

Steve

Jul 23 '05 #13

P: n/a
"Steve Pugh" wrote in comp.infosystems.www.authoring.html:
> Replying to six months old messages with no quoted material to let
> readers know what you're talking about. Classy.


There seems to be a huge upsurge in some groups of even legitimate
replies to very old threads.

I wonder whether this phenomenon is a byproduct of the changes over
at Google.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Why We Won't Help You:
http://diveintomark.org/archives/200..._wont_help_you
Jul 23 '05 #14

P: n/a
On Sun, 2 Jan 2005 14:57:34 -0500, Stan Brown
<th************@fastmail.fm> wrote:

[...]
> There seems to be a huge upsurge in some groups of even legitimate
> replies to very old threads.
>
> I wonder whether this phenomenon is a byproduct of the changes over
> at Google.


The old version of Google Groups didn't let you reply to messages more
than about a month old (a sensible restriction, IMO). However, after
the switch to their new system, it seems that there is nothing to
prevent one from replying to any message of any age (at least last
time I checked).

Nick

--
Nick Theodorakis
ni**************@hotmail.com
contact form:
http://theodorakis.net/contact.html
Jul 23 '05 #15
