how did they do their search algorithm - help anyone with advice, I've got impossible requirements

vic
My manager wants me to develop a search program that works like the one
at edorado.com.
She came up with her requirements after comparing how search works at
different websites, like eBay, Yahoo and others.
This is what she wants my program to be able to do:
(try this test at different websites just for fun).

At eBay:

- enter the word 'television' in a search field -> you will get 2155 items.

- enter the word 'televisions' (in plural) -> you will get only 60 items

- enter the word 'tv' -> you will get 5147 items.

In other words, if you enter different variations of the same word, you
will get different results.

My manager showed me one website (www.edorado.com) where the above problem
didn't occur during her testing. When she searched for 3 different forms of the
word 'tv' she always got the same results.

And another important thing: the result set was very precise (televisions,
not other things). Statistically speaking it had 'low noise', or in other
words a very low percentage of unrelated items.

Could anyone who works on search engine development offer me
any advice on how I should approach the design of my algorithm? I don't
have much experience in programming this kind of search, and my manager is a
real snake. According to her, if a search engine brings back hundreds of
thousands of results, the user will not be able to browse through all of them,
so what she wants my program to do is bring back fewer results, and only those
that are most relevant to the search term.

Could anyone help me with advice?

Jul 20 '05 #1
"vic" <vi*@hotmail.com> wrote:
My manager wants me to develop a search program, that would work like they
have it at edorado.com.
She made up her requirements after having compared how search works at
different websites, like eBay, Yahoo and others.
This is what she wants my program to be able to do:
(try this test at different websites just for fun).

At eBay:

- enter the word 'television' in a search field -> you will get 2155 items.

- enter the word 'televisions' (in plural) -> you will get only 60 items

- enter the word 'tv' -> you will get 5147 items.

In other words - if entering different variations of the same word, you
will be getting different results .

My manager showed me one website (www.edorado.com ), where the above problem
didn't occur during her testing. When searched for 3 different forms of the
word 'tv' she was always getting the same results.
They seem to assign everything to a category and the search facility
is initially searching for categories not for specific items. For
example searching for antique finds the antiques category but does not
find any items with the word antique in their title in other
categories.

It is easy to set up the television category to be associated with the
keywords television, tv, etc.
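
A category-first lookup of the kind described above might be sketched like this in Python. The category names, keyword table and item titles are all made up for illustration; nothing here is known about how edorado.com actually does it.

```python
# Hypothetical catalogue: every item is filed under exactly one category.
ITEMS_BY_CATEGORY = {
    "Televisions": ["Sony 27-inch colour set", "Portable 14-inch TV"],
    "Antiques": ["Victorian writing desk"],
}

# Hypothetical keyword table, maintained by hand: keyword -> category.
CATEGORY_KEYWORDS = {
    "television": "Televisions",
    "televisions": "Televisions",
    "tv": "Televisions",
    "tvs": "Televisions",
    "antique": "Antiques",
    "antiques": "Antiques",
}


def search(query):
    """Return the items of the category matched by the query, or [] if none."""
    category = CATEGORY_KEYWORDS.get(query.strip().lower())
    if category is None:
        return []  # a specific query that matches no category finds nothing
    return ITEMS_BY_CATEGORY[category]
```

With this table, 'tv', 'television' and 'televisions' all return the same list, while a specific query such as 'Sony' returns nothing, which is the trade-off noted in this post.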
And another important thing: the result set came very precise - televisions,
not other things, speaking statistically - 'low noise', or in other words -
very low percentage of unrelated items.
See above, you're not searching for items, you're searching for
categories. Hence anything you find will be relevant (assuming
everything is in the correct category) but there's a good chance that
you won't find anything at all if you search for something specific.

Low noise is also probably because there are very items in total on
the edorado site.
Could anyone from those who work on development of search engines offer me
any advise on how I should approach to the design of my algorithm. I don't
have much experience in programming this kind of search, and my manager is a
real snake. According to her, if search engine brings hundreds of thousands
results, the user would not be able to browse through all of them, so what
she wants my program to do - is to bring less results, and only those that
are most relevant to the search term.


Not at all an HTML issue is it? I suggest you ask in either a general
programming/information theory group or in one devoted to whatever
database and language you are using.

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net> <http://steve.pugh.net/>
Jul 20 '05 #2

"vic" <vi*@hotmail.com> wrote in message
news:R_*****************@newssvr27.news.prodigy.com...
My manager wants me to develop a search program, that would work like they
have it at edorado.com.
<snip>
Could anyone from those who work on development of search engines offer me
any advise on how I should approach to the design of my algorithm. I don't
have much experience in programming this kind of search, and my manager is a
real snake.
<snip>
Could anyone help me with advise?


Yes. Run away screaming.
Advise your boss that search engine programming is FAR too complex even for
experienced programmers and TEAMS of programmers.
During a recent usability study my company did, we had people use the site's
"search" function, which is badly broken. We would have the participants
look for words & terms that we KNEW were covered on the site. When they were
led to a search results page that had "No Matches", we'd ask them "OK, now
what would you do?" 100% of the users said "Leave and go to Google".

Providing search is a very valuable feature for any relatively large site.
But it IS NOT something to be programmed by an amateur. There are so many
considerations when building a search utility. Natural language searching,
taxonomy problems, misspellings, abbreviations, acronyms, different levels
of understanding the nomenclature of the site (often, site content is not
written in a way that users understand).

The absolute best option for you is to recommend to your boss that you use a
search utility programmed by people who know WTF they're doing.
If you have the budget, look at Verity Search http://verity.com/ it is
about $15k.
If you have no budget, get yourself HTDig http://www.htdig.org/
I'm personally not a big fan of HTDig's UI, but it is still far better than
most searches and FREE!

"Why Search Engines Fail" -
http://www.searchtools.com/info/whysearchesfail.html
"Why On-Site Searching Stinks" - http://www.uie.com/articles/search_stinks/
"Brands Suffer From Search Disfunctions" -
http://www.clickz.com/experts/brand/...le.php/1477641
"Linking And Searching" - http://www.humanfactors.com/downloads/jan032.htm
"Half Web Searchers Enter One Query, Look At One Page Of Results" -
http://usabilitynews.com/news/article1213.asp

-Karl
Jul 20 '05 #3
Steve Pugh <st***@pugh.net> wrote:
Low noise is also probably because there are very items in total on
the edorado site.


Insert the word 'few' in order to make sense of the above sentence.

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net> <http://steve.pugh.net/>
Jul 20 '05 #4
vic wrote:
My manager wants me to develop a search program, that would work like they
have it at edorado.com.

Could anyone from those who work on development of search engines offer me
any advise on how I should approach to the design of my algorithm.

Purchase a search engine. Google has a very nice one.
You can buy a large number of search engines for the amount of your
time it will take to develop one of the sort you are considering.
Do a market survey of what is available: price, installation time, and
training. Estimate how long it will take you to make one: time, money,
resources. Present the two options to your boss.

--
jmm dash list at sohnen-moe dot com
(Remove .TRSPAMTR for email)
Jul 20 '05 #5
After you've told your boss that you put the question to leading
experts in the field and the responses you got support your view.
* It's not possible
* and/or if it were it would be prohibitively expensive
* Explain why the edorado solution is of limited applicability and
unsuitable.
....
You may want to propose another way forward that goes some way toward
achieving her aim.

Depending on the nature of the site you are working on, you might be
able to do this:

Offer users the choice of either using a free-form search box (using a
third party search engine) or selecting from some predefined searches.

For example you could have a select field that offered a list of
categories. Reverting to your example one entry might be
"Televisions".

By presenting that choice you reduce the necessity for the user to
choose between TV, Television and televisions for their search and you
KNOW you can deliver some results for those pre-defined searches.

You can then feed the search engine with a search string of your
choosing, in this case "television" to get the largest number of hits
- if that is your intention, or maybe something more specific like
"television + electronics" to eliminate unhelpful results like "as seen
in our television advert".

This is no use if you are likely to need to handle a large number of
common choices.
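
As a rough sketch of the idea above, the select field can map each entry to a query string you have chosen in advance, with the free-form box as a fallback. The endpoint URL, the `q` parameter and the menu entries below are hypothetical; a real third-party engine will have its own query syntax.

```python
from urllib.parse import urlencode

# Hypothetical menu entries mapped to query strings chosen by the site owner.
PREDEFINED_SEARCHES = {
    "Televisions": "television electronics",
    "Radios": "radio electronics",
}

# Hypothetical endpoint of the third-party search engine.
SEARCH_URL = "https://search.example.com/query"


def build_search_url(label=None, free_text=None):
    """Prefer the predefined query for a menu label; fall back to free text."""
    if label in PREDEFINED_SEARCHES:
        terms = PREDEFINED_SEARCHES[label]
    elif free_text:
        terms = free_text
    else:
        raise ValueError("need a known label or some free text")
    return SEARCH_URL + "?" + urlencode({"q": terms})
```

The table only stays manageable while the number of predefined searches is small, which is the limitation mentioned above.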

Jul 20 '05 #6
vic wrote:

My manager wants me to develop a search program, that would work like they
have it at edorado.com.
She made up her requirements after having compared how search works at
different websites, like eBay, Yahoo and others.
This is what she wants my program to be able to do:
(try this test at different websites just for fun).

At eBay:

- enter the word 'television' in a search field -> you will get 2155 items.

- enter the word 'televisions' (in plural) -> you will get only 60 items

- enter the word 'tv' -> you will get 5147 items.

In other words - if entering different variations of the same word, you
will be getting different results .

My manager showed me one website (www.edorado.com ), where the above problem
didn't occur during her testing. When searched for 3 different forms of the
word 'tv' she was always getting the same results.

And another important thing: the result set came very precise - televisions,
not other things, speaking statistically - 'low noise', or in other words -
very low percentage of unrelated items.

Could anyone from those who work on development of search engines offer me
any advise on how I should approach to the design of my algorithm. I don't
have much experience in programming this kind of search, and my manager is a
real snake. According to her, if search engine brings hundreds of thousands
results, the user would not be able to browse through all of them, so what
she wants my program to do - is to bring less results, and only those that
are most relevant to the search term.

Could anyone help me with advise?

Yes, I can.

I think the problem is sexual dissatisfaction among managers:
they take their frustrated ambitions out on their
subordinates.

When you see a viper coming towards you, you had better escape.

Are you scared of losing your job?
Then what are you working for?

Well, my advice:

Try to explain to her
that she is absolutely right: if search sites returned only the links the user
wants to see,
it would be great.
But the relevance of search engine results is a big problem.

A Russian guy proposed a slightly better search principle
and founded the top search engine.

And even that one does not do everything you might expect.

The Internet holds a very large volume of information, and probably the most
effective search through all of it with existing algorithms,
even if all the sand on Earth were turned into a search engine,
could take more time than it takes dark energy to destroy this universe.

Tell her that you can code a classic search engine:

1. A crawler that walks the Internet, determines which words occur
in the first 1 to 5 kilobytes of each page, and puts them into a database.

2. A search engine that just looks things up in this database.
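
The two parts above could be sketched as follows. This is only an illustration under some assumptions: the crawling itself is left out (the pages are passed in as a plain dictionary of URL to text), the "database" is an in-memory dictionary, and the function names are invented.

```python
import re
from collections import defaultdict

MAX_BYTES = 5 * 1024  # only look at the first 5 kilobytes of each page


def build_index(pages):
    """Map each word to the set of URLs whose first MAX_BYTES contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        head = text.encode("utf-8")[:MAX_BYTES].decode("utf-8", errors="ignore")
        for word in re.findall(r"\w+", head.lower()):
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the URLs that contain every word of the query (simple AND search)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

A lookup is then just a few dictionary accesses, but it only matches exact words, so 'tv' and 'television' are still different terms at this stage.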

Synonyms (different words with the same meaning) are also a problem;
you can of course make a list of synonyms.

That works for "tv" and "television",
but what do you do with "oil" and "shell", or "machine oil" and "machine
shell"?

Yes, there are AI solutions, but they all need a lot of processor resources.

And grammar.
In most Western languages the plural of a noun is marked with the ending "-s",
"-es" or "-en".
From what I can recall at the moment, these may be added to the end of words:
"-ing" (-n') (English, Danish, Norwegian)
"-ed" (English)
"-en" (Dutch, German, English)
"-t" (Dutch, German)
"-'s" (English, Danish)
"-e" (Danish) http://www.geocities.com/tsca.geo/dansk/

At the beginning of words, this may be added:
"ge-" (Dutch, German)

Finally, words that came into English from French usually lose
the final mute syllable:
"-que" is replaced with "-c"
republique republic
technique technics
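
A very crude version of the suffix handling listed above, for English only, might look like this. The suffix order and the minimum stem length are assumptions made for this sketch; it is not a real stemmer.

```python
# Suffixes from the list above, English only, longest first so that
# "-ing" is tried before "-s". This is a crude sketch, not a real stemmer.
SUFFIXES = ("ing", "'s", "es", "ed", "en", "s")
MIN_STEM = 3  # hypothetical threshold: never cut a word below three letters


def crude_stem(word):
    """Lower-case the word and strip at most one known suffix from its end."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word
```

Applying `crude_stem` to both the indexed words and the query makes 'television' and 'televisions' meet at the same term. It also over-strips (for example 'listen' becomes 'list'), which is one reason established algorithms such as the Porter stemmer use many more rules.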

Optionally you may make grammar sequention.

Also, some exotic letters may be simplified:
the glyphs "è", "é", "ê", "ë" may be changed to "e", and so on.

Also "ae", "æ" and "ä" may replace each other, and all may be replaced by
"e".
"å" may be replaced by "aa".
"ö" "ø" "oe"
"ü" "ue" "e"
"ij" "y" "u" "i" "ï" and even "ae" "æ"
In American English "a" and "u" may also replace each other.
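
One way to sketch this kind of letter folding in Python uses the standard `unicodedata` module. The mapping table is an assumption: it picks a single target spelling for each letter ("ä" to "ae", "å" to "aa" and so on) out of the several alternatives listed above.

```python
import unicodedata

# One possible canonical spelling for letters that should not simply lose
# their accent. The choice of targets here is an assumption.
SPECIAL = str.maketrans({
    "æ": "ae", "ä": "ae",
    "ø": "oe", "ö": "oe",
    "ü": "ue",
    "å": "aa",
})


def fold(text):
    """Lower-case the text, apply SPECIAL, then drop any remaining accents."""
    # NFC first, so that "a" + combining diaeresis is seen as the single letter "ä".
    text = unicodedata.normalize("NFC", text.lower()).translate(SPECIAL)
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

The same `fold` has to be applied to the indexed text and to the query, otherwise the two will never match.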

I do not recommend implementing any more difficult techniques,
because nobody needs yet another gray search site;
you would be spending your energy on something with no prospects.
Just let her see what can be done; that may be the goal of your job.

That's all.

Bye.

--Michaelo Mitrofanov

Jul 20 '05 #7
"vic" <vi*@hotmail.com> wrote in message news:<R_*****************@newssvr27.news.prodigy.com>...
snip
My manager showed me one website (www.edorado.com ), where the above problem
didn't occur during her testing. When searched for 3 different forms of the
word 'tv' she was always getting the same results.

snip

Er.. Perhaps the "powered by" link at the bottom of the search
function at that page would be of interest to your manager. Write them
a check and you're done.
http://www.freefind.com/
Jul 20 '05 #8
Tim
On Sun, 6 Jun 2004 12:03:47 -0400,
"Karl Groves" <ka**@NOSPAMkarlcore.com> posted:
If you have no budget, get yourself HTDig http://www.htdig.org/
I'm personally not a big fan of HTDig's UI, but it is still far better than
most searches and FREE!


The interface is just HTML templates, you can completely rewrite it to suit
yourself. It's possible to make it work pretty much the same as most other
search engines, as far as how you input queries and receive results.

I've been playing with it for quite a while on an intranet. The things I
don't like about it are:

* I can't add an address to the database, it has to reindex the whole site.

* It doesn't handle UTF-8 encoded pages (I don't know about other
encodings, it works fine with ISO-8859-1 and ASCII, which are the other
ones I use).

* It doesn't handle character entities (well only a few of them). As soon
as it encounters something like "&hellip;" or "&rdquo;", it turns them into
"&amp;hellip;" or "&amp;rdquo;" (you'll see that crap in the excerpts it
returns with the results pages).

Most people probably won't notice that problem, as few seem to use proper
punctuation entities. It seems to handle &nbsp; okay, but I haven't gone
through and tested other common ones (e.g. &copy;, &trade;, etc.).

* It, like many other engines, is fond of using an excerpt from the top of
the page. That means that many pages have excerpts which are nothing more
than lists of the main site navigational links. Though you can tell it to
use meta descriptions instead, or even supply both separately.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.
Jul 20 '05 #9

"Tim" <ti*@mail.localhost.invalid> wrote in message
news:1q*******************************@40tude.net...
On Sun, 6 Jun 2004 12:03:47 -0400,
"Karl Groves" <ka**@NOSPAMkarlcore.com> posted:
If you have no budget, get yourself HTDig http://www.htdig.org/
I'm personally not a big fan of HTDig's UI, but it is still far better than
most searches and FREE!
The interface is just HTML templates, you can completely rewrite it to suit
yourself. It's possible to make it work pretty much the same as most other
search engines, as far as how you input queries and receive results.


You're right. I was referring to the search form itself, again which can be
modified to suit, I've just been too distracted by other things to do so.
Thanks for the other tips.

-Karl
Jul 20 '05 #10
"Steve Pugh" <st***@pugh.net> wrote in message
news:qc********************************@4ax.com...
"vic" <vi*@hotmail.com> wrote:
[...]
They seem to assign everything to a category and the search facility
is initially searching for categories not for specific items. For
example searching for antique finds the antiques category but does not
find any items with the word antique in their title in other
categories.

It is easy to set up the television category to be associated with the
keywords television, tv, etc.


Also note that this is editorial work, not programming. It would need quite
a team of editors to achieve this, even for a single language.

[...]
Could anyone from those who work on development of search engines offer me
any advise on how I should approach to the design of my algorithm. I don't
have much experience in programming this kind of search, and my manager is a
real snake. According to her, if search engine brings hundreds of thousands
results, the user would not be able to browse through all of them, so what
she wants my program to do - is to bring less results, and only those that
are most relevant to the search term.


If it were that easy, why would Google and others spend millions of
dollars on research and enhancing their search and indexing algorithms every
year?

Besides the paid and free search engines mentioned in other posts, also
consider subscribing to a remote search service such as Atomz (www.atomz.com).

--
Markus
Jul 20 '05 #11
Hi Vic,
Maybe I developed just the type of search engine you are looking for.
It is described at
http://www.contentsitesearch.esmartdesign.com
If you are interested please drop me an e-mail
Cheers,
Peter

Jul 23 '05 #12
"Peter JavaScript" <pe***************@hotmail.com> wrote:
Hi Vic,
Maybe I developed just the type of search engine you are looking for.
It is described at
http://www.contentsitesearch.esmartdesign.com
If you are interested please drop me an e-mail
Cheers,
Peter


Replying to six months old messages with no quoted material to let
readers know what you're talking about. Classy.

Steve

Jul 23 '05 #13
"Steve Pugh" wrote in comp.infosystems.www.authoring.html:
Replying to six months old messages with no quoted material to let
readers know what you're talking about. Classy.


There seems to be a huge upsurge in some groups of even legitimate
replies to very old threads.

I wonder whether this phenomenon is a byproduct of the changes over
at Google.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Why We Won't Help You:
http://diveintomark.org/archives/200..._wont_help_you
Jul 23 '05 #14
On Sun, 2 Jan 2005 14:57:34 -0500, Stan Brown
<th************@fastmail.fm> wrote:

[...]
There seems to be a huge upsurge in some groups of even legitimate
replies to very old threads.

I wonder whether this phenomenon is a byproduct of the changes over
at Google.


The old version of google groups didn't let you reply to messages more
than about a month old (a sensible restriction, IMO). However, after
the switch to their new system, it seems that there is nothing to
prevent one from replying to any message of any age (at least last
time I checked).

Nick

--
Nick Theodorakis
ni**************@hotmail.com
contact form:
http://theodorakis.net/contact.html
Jul 23 '05 #15
