how did they do their search algorithm - help anyone with advice, I've got impossible requirements

vic
My manager wants me to develop a search program that works like the one
at edorado.com.
She came up with her requirements after comparing how search works at
different websites, such as eBay, Yahoo and others.
This is what she wants my program to be able to do
(try this test at different websites just for fun):

At eBay:

- enter the word 'television' in the search field and you will get 2155 items.

- enter the word 'televisions' (plural) and you will get only 60 items.

- enter the word 'tv' and you will get 5147 items.

In other words, if you enter different variations of the same word, you
will get different results.

My manager showed me one website (www.edorado.com) where the above problem
didn't occur during her testing. When she searched for 3 different forms of the
word 'tv', she always got the same results.

Another important thing: the result set was very precise. It contained televisions
and not other things; speaking statistically, it had 'low noise', in other words
a very low percentage of unrelated items.

Could anyone who works on search engine development offer me
any advice on how I should approach the design of my algorithm? I don't
have much experience in programming this kind of search, and my manager is a
real snake. According to her, if a search engine returns hundreds of thousands of
results, the user will not be able to browse through all of them, so
she wants my program to return fewer results, and only those that
are most relevant to the search term.

Could anyone help me with advice?

Jul 20 '05 #1
"vic" <vi*@hotmail.com> wrote:
My manager wants me to develop a search program, that would work like they
have it at edorado.com.
She made up her requirements after having compared how search works at
different websites, like eBay, Yahoo and others.
This is what she wants my program to be able to do:
(try this test at different websites just for fun).

At eBay:

- enter the word 'television' in a search field you will get 2155 items.

- enter the word 'televisions' (in plural) you will get only 60 items

- enter the word 'tv' you will get 5147 items.

In other words - if entering different variations of the same word, you
will be getting different results .

My manager showed me one website (www.edorado.com ), where the above problem
didn't occur during her testing. When searched for 3 different forms of the
word 'tv' she was always getting the same results.
They seem to assign everything to a category and the search facility
is initially searching for categories not for specific items. For
example searching for antique finds the antiques category but does not
find any items with the word antique in their title in other
categories.

It is easy to set up the television category to be associated with the
keywords television, tv, etc.
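
A minimal sketch of that idea in Python, assuming (as guessed above, not confirmed by edorado) that the search box first resolves the keyword to a category and then lists that category's items. The category names, keyword sets and item titles are made up for illustration.

```python
# Hypothetical data: each category lists the keywords that should lead to it.
CATEGORY_KEYWORDS = {
    "Televisions": {"television", "televisions", "tv", "tvs"},
    "Antiques": {"antique", "antiques"},
}

# Hypothetical catalogue: items are filed under exactly one category.
ITEMS_BY_CATEGORY = {
    "Televisions": ["28 inch widescreen set", "Portable 14 inch set"],
    "Antiques": ["Victorian writing desk"],
}

# Invert the mapping once so that a lookup is a single dictionary access.
KEYWORD_TO_CATEGORY = {
    keyword: category
    for category, keywords in CATEGORY_KEYWORDS.items()
    for keyword in keywords
}


def search_by_category(query):
    """Return the items of the category the query maps to, or an empty list."""
    category = KEYWORD_TO_CATEGORY.get(query.strip().lower())
    if category is None:
        return []
    return ITEMS_BY_CATEGORY.get(category, [])
```

With this data, 'tv', 'television' and 'televisions' all return the same two items, while a specific query such as 'plasma' returns nothing, which is the trade-off described in this reply.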
And another important thing: the result set came very precise - televisions,
not other things, speaking statistically - 'low noise', or in other words -
very low percentage of unrelated items.
See above, you're not searching for items, you're searching for
categories. Hence anything you find will be relevant (assuming
everything is in the correct category) but there's a good chance that
you won't find anything at all if you search for something specific.

Low noise is also probably because there are very items in total on
the edorado site.
Could anyone from those who work on development of search engines offer me
any advise on how I should approach to the design of my algorithm. I don't
have much experience in programming this kind of search, and my manager is a
real snake. According to her, if search engine brings hundreds of thousands
results, the user would not be able to browse through all of them, so what
she wants my program to do - is to bring less results, and only those that
are most relevant to the search term.


Not at all an HTML issue is it? I suggest you ask in either a general
programming/information theory group or in one devoted to whatever
database and language you are using.

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net> <http://steve.pugh.net/>
Jul 20 '05 #2

"vic" <vi*@hotmail.com> wrote in message
news:R_*****************@newssvr27.news.prodigy.com...
My manager wants me to develop a search program, that would work like they
have it at edorado.com. <snip> Could anyone from those who work on development of search engines offer me any advise on how I should approach to the design of my algorithm. I don't
have much experience in programming this kind of search, and my manager is a real snake. <snip> Could anyone help me with advise?


Yes. Run away screaming.
Advise your boss that search engine programming is FAR too complex even for
experienced programmers and TEAMS of programmers.
During a recent usability study my company did, we had people use the site's
"search" function, which is badly broken. We would have the participants
look for words & terms that we KNEW were covered on the site. When they were
led to a search results page that had "No Matches", we'd ask them "OK, now
what would you do?" 100% of the users said "Leave and go to Google".

Providing search is a very valuable feature for any relatively large site.
But it IS NOT something to be programmed by an amateur. There are so many
considerations when building a search utility: natural language searching,
taxonomy problems, misspellings, abbreviations, acronyms, and different levels
of understanding of the site's nomenclature (often, site content is not
written in a way that users understand).

The absolute best option for you is to recommend to your boss that you use a
search utility programmed by people who know WTF they're doing.
If you have the budget, look at Verity Search (http://verity.com/); it is
about $15k.
If you have no budget, get yourself HTDig (http://www.htdig.org/).
I'm personally not a big fan of HTDig's UI, but it is still far better than
most searches and FREE!

"Why Search Engines Fail" -
http://www.searchtools.com/info/whysearchesfail.html
"Why On-Site Searching Stinks" - http://www.uie.com/articles/search_stinks/
"Brands Suffer From Search Disfunctions" -
http://www.clickz.com/experts/brand/...le.php/1477641
"Linking And Searching" - http://www.humanfactors.com/downloads/jan032.htm
"Half Web Searchers Enter One Query, Look At One Page Of Results" -
http://usabilitynews.com/news/article1213.asp

-Karl
Jul 20 '05 #3
Steve Pugh <st***@pugh.net > wrote:
Low noise is also probably because there are very items in total on
the edorado site.


Insert the word 'few' in order to make sense of the above sentence.

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net> <http://steve.pugh.net/>
Jul 20 '05 #4
vic wrote:
My manager wants me to develop a search program, that would work like they
have it at edorado.com.

Could anyone from those who work on development of search engines offer me
any advise on how I should approach to the design of my algorithm.

Purchase a search engine. Google has a very nice one.
You can buy a large number of search engines for the amount of your
time it will take to develop one of the sort you are considering.
Do a market survey of what is available: price, installation time, and
training. Estimate what it will take you to make one: time, money,
resources. Present the two options to your boss.

--
jmm dash list at sohnen-moe dot com
(Remove .TRSPAMTR for email)
Jul 20 '05 #5
First tell your boss that you put the question to leading
experts in the field and that the responses you got support your view:
* It's not possible
* and/or, if it were, it would be prohibitively expensive
* Explain why the edorado solution is of limited applicability and
unsuitable.
....
After that, you may want to propose another way forward that goes some way toward
achieving her aim.

Depending on the nature of the site you are working on, you might be
able to do this:

Offer users the choice of either using a free-form search box (using a
third party search engine) or selecting from some predefined searches.

For example, you could have a select field that offers a list of
categories. Reverting to your example, one entry might be
"Televisions".

By presenting that choice you reduce the necessity for the user to
choose between TV, Television and televisions for their search and you
KNOW you can deliver some results for those pre-defined searches.

You can then feed the search engine with a search string of your
choosing, in this case "television" to get the largest number of hits
(if that is your intention), or maybe something more specific like
"television + electronics" to eliminate unhelpful results like "as seen
in our television advert".
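
A rough sketch of the predefined-search idea in Python. The table contents and the function name `build_query` are illustrative assumptions, and the query syntax is simply copied from the example above; whatever third-party engine you use will have its own syntax.

```python
# Hypothetical table: each entry of the select field maps to a search string
# that you have tested and know returns useful results.
PREDEFINED_SEARCHES = {
    "Televisions": "television + electronics",
    "Radios": "radio + electronics",
}


def build_query(selected_category=None, free_text=""):
    """Pick the string to send to the third-party search engine.

    A known category from the select field wins; otherwise the user's
    free-form text is passed through unchanged apart from trimming.
    """
    if selected_category in PREDEFINED_SEARCHES:
        return PREDEFINED_SEARCHES[selected_category]
    return free_text.strip()
```

Here `build_query("Televisions")` yields the curated string, while `build_query(None, " tv ")` falls back to the plain text "tv".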

This is no use if you are likely to need to handle a large number of
common choices.

Jul 20 '05 #6
vic wrote:

My manager wants me to develop a search program, that would work like they
have it at edorado.com.
She made up her requirements after having compared how search works at
different websites, like eBay, Yahoo and others.
This is what she wants my program to be able to do:
(try this test at different websites just for fun).

At eBay:

- enter the word 'television' in a search field you will get 2155 items.

- enter the word 'televisions' (in plural) you will get only 60 items

- enter the word 'tv' you will get 5147 items.

In other words - if entering different variations of the same word, you
will be getting different results .

My manager showed me one website (www.edorado.com ), where the above problem
didn't occur during her testing. When searched for 3 different forms of the
word 'tv' she was always getting the same results.

And another important thing: the result set came very precise - televisions,
not other things, speaking statistically - 'low noise', or in other words -
very low percentage of unrelated items.

Could anyone from those who work on development of search engines offer me
any advise on how I should approach to the design of my algorithm. I don't
have much experience in programming this kind of search, and my manager is a
real snake. According to her, if search engine brings hundreds of thousands
results, the user would not be able to browse through all of them, so what
she wants my program to do - is to bring less results, and only those that
are most relevant to the search term.

Could anyone help me with advise?

Yes, I can.

I think sexual dissatisfaction among managers leads them to vent
their unsatisfied ambitions on their
subordinates.

When you see the viper coming for you, better escape.

Are you scared of losing your job?
Then what are you working for?

Well, my advice:

Try to explain to her
that she's absolutely right: if search sites returned only the links the user
wants to see,
it would be great.
But relevance is a big problem for search engines.

A Russian guy proposed a somewhat better principle of search
and founded the top search engine.

And even that one does not do everything you might expect.

The Internet holds a very large volume of information, and probably even the most effective
search through all of it with existing algorithms,
even if all the sand on Earth were turned into a search engine,
could take more time than dark energy needs to destroy this universe.

Tell her that you can code a classic search engine:

1. A crawler that walks the Internet, determines which words occur
in the first 1 to 5 kilobytes of each page, and puts them into a database.

2. A search engine that just looks things up in this database.
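
The two parts above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the pages are given as an in-memory dict of URL to text (the actual crawling is left out), the "database" is a plain dictionary, and the function names are invented for the example.

```python
import re
from collections import defaultdict

PREFIX_BYTES = 5 * 1024  # only the first 5 kilobytes of each page are indexed


def tokenize(text):
    """Split text into lowercase runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Part 1: map every word found near the top of a page to the page URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        # Cut at the byte limit; a character split by the cut is dropped.
        head = text.encode("utf-8")[:PREFIX_BYTES].decode("utf-8", errors="ignore")
        for word in tokenize(head):
            index[word].add(url)
    return index


def search(index, query):
    """Part 2: return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Note that this matches exact words only, so it reproduces the eBay behaviour from the original question: 'television' and 'televisions' are different keys in the index.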

Synonyms (different words with the same meaning) are also a problem;
you may of course make a list of synonyms.

That works for "tv" and "television",
but what to do about "oil" and "shell", or "machine oil" and "machine
shell"?
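
A small Python sketch of such a synonym list, and of why it breaks down. The table entries are invented for illustration; a real list would be far longer and would still suffer from the same context problem.

```python
# Hypothetical synonym list: every known variant maps to one canonical word.
SYNONYMS = {
    "tv": "television",
    "televisions": "television",
    "shell": "oil",  # plausible for the company name, wrong for a casing
}


def canonicalize(query):
    """Replace each query word by its canonical form, ignoring context."""
    return [SYNONYMS.get(word, word) for word in query.lower().split()]
```

`canonicalize("tv")` gives `["television"]` as intended, but `canonicalize("machine shell")` gives `["machine", "oil"]`, because a flat word list cannot tell which meaning of "shell" is meant.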

Yes, there are AI solutions, but they all need a lot of processor resources.

And then there is grammar.
In most Western languages the plural of a noun is marked with the endings "-s",
"-es" and "-en".
From what I can recall at the moment, the following may be added to the end of words:
"-ing" (-n') (English, Danish, Norwegian)
"-ed" (English)
"-en" (Dutch, German, English)
"-t" (Dutch, German)
"-'s" (English, Danish)
"-e" (Danish) http://www.geocities.com/tsca.geo/dansk/

At the beginning may be added:
"ge-" (Dutch, German)

Finally, words that came into English from French may lose
their last mute syllable in most cases:
"-que" is replaced with "-c"
republique republic
technique technics
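
A naive Python sketch of the ending rules listed above, restricted to the English endings and the "-que" rule. The suffix order, the minimum stem length and the name `naive_stem` are assumptions made for this example; it is not a real stemming algorithm and will mangle some words.

```python
# English endings from the list above, longest first so "es" wins over "s".
SUFFIXES = ("'s", "ing", "ed", "es", "s")


def naive_stem(word, min_stem=3):
    """Strip one known ending so that word variants share a common key."""
    w = word.lower()
    if w.endswith("que"):
        w = w[:-3] + "c"  # French-style spelling: republique -> republic
    for suffix in SUFFIXES:
        # Keep at least min_stem characters so short words stay intact.
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w
```

If both the indexed words and the query words go through the same function, 'television' and 'televisions' end up under one key, as do 'technique' and 'technics'. The price is over-stemming: 'houses' becomes 'hous', for example.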

Optionally you may build a sequence of grammar rules.

Also, some exotic letters may be simplified:
the glyphs "","","","" may be changed to "e", and so on.

Also "ae" "" "" may be replaced by each other, and all may be replaced by
"e".
""
may be replaced by "aa".
"" "" "oe"
"" "ue" "e"
"ij" "y" "u" "i" "" and even "ae" ""
In American English "a" and "u" may also replace each other.
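
A possible Python sketch of this kind of letter simplification. The exact letters in the `DIGRAPHS` table are an assumption chosen to match the "ae", "aa", "oe" and "ue" spellings mentioned above; all remaining accents are removed with Unicode decomposition from the standard `unicodedata` module.

```python
import unicodedata

# Assumed table: letters commonly written out as two plain letters.
DIGRAPHS = {"æ": "ae", "ä": "ae", "å": "aa", "ö": "oe", "ø": "oe", "ü": "ue"}


def fold_letters(word):
    """Lowercase a word and reduce accented letters to plain ASCII letters."""
    # Compose first so that a letter typed as base + accent matches the table.
    w = unicodedata.normalize("NFC", word.lower())
    w = "".join(DIGRAPHS.get(ch, ch) for ch in w)
    # Decompose what is left and drop the combining accent marks.
    decomposed = unicodedata.normalize("NFKD", w)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Applied to both the index and the query, this makes 'café' and 'cafe' match, and likewise 'Müller' and 'Mueller'. Whether "ue" or a bare "u" is the better target depends on the language of the site.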

I do not recommend that you implement more difficult techniques,
because nobody needs yet another gray search site,
and you may spend your energy on something that has no prospects.
Just let her see what is possible; that may be the goal of your job.

That's all.

Bye.

--Michaelo Mitrofanov

Jul 20 '05 #7
"vic" <vi*@hotmail.com> wrote in message news:<R_*****************@newssvr27.news.prodigy.com>...
snip
My manager showed me one website (www.edorado.com ), where the above problem
didn't occur during her testing. When searched for 3 different forms of the
word 'tv' she was always getting the same results.

snip

Er.. Perhaps the "powered by" link at the bottom of the search
function on that page would be of interest to your manager. Write them
a check and you're done.
http://www.freefind.com/
Jul 20 '05 #8
Tim
On Sun, 6 Jun 2004 12:03:47 -0400,
"Karl Groves" <ka**@NOSPAMkarlcore.com> posted:
If you have no budget, get yourself HTDig http://www.htdig.org/
I'm personally not a big fan of HTDig's UI, but it is still far better than
most searches and FREE!


The interface is just HTML templates; you can completely rewrite it to suit
yourself. It's possible to make it work pretty much the same as most other
search engines, as far as how you input queries and receive results.

I've been playing with it for quite a while on an intranet. The things I
don't like about it are:

* I can't add an address to the database, it has to reindex the whole site.

* It doesn't handle UTF-8 encoded pages (I don't know about other
encodings, it works fine with ISO-8859-1 and ASCII, which are the other
ones I use).

* It doesn't handle character entities (well, only a few of them). As soon
as it encounters something like "&hellip;" or "&rdquo;", it turns them into
"&amp;hellip;" or "&amp;rdquo;" (you'll see that crap in the excerpts it
returns with the results pages).

Most people probably won't notice that problem, as few seem to use proper
punctuation entities. It seems to handle &nbsp; okay, but I haven't gone
through and tested other common ones (e.g. &copy;, &trade;, etc.).

* It, like many other engines, is fond of using an excerpt from the top of
the page. That means that many pages have excerpts which are nothing more
than lists of the main site navigational links. Though you can tell it to
use meta descriptions instead, or even supply both separately.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.
Jul 20 '05 #9

"Tim" <ti*@mail.localhost.invalid> wrote in message
news:1q*******************************@40tude.net...
On Sun, 6 Jun 2004 12:03:47 -0400,
"Karl Groves" <ka**@NOSPAMkarlcore.com> posted:
If you have no budget, get yourself HTDig http://www.htdig.org/
I'm personally not a big fan of HTDig's UI, but it is still far better than most searches and FREE!
The interface is just HTML templates, you can completely rewrite it to
suit yourself. It's possible to make it work pretty much the same as most other search engines, as far as how you input queries and receive results.


You're right. I was referring to the search form itself, which, again, can be
modified to suit; I've just been too distracted by other things to do so.
Thanks for the other tips.

-Karl
Jul 20 '05 #10
