By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
446,148 Members | 1,317 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 446,148 IT Pros & Developers. It's quick & easy.

LinearSearch to BinarySearch

stealwings
P: 34
I have this code that does a linear search within two files, but it is too slow......
Expand|Select|Wrap|Line Numbers
  1.  
  2. for(unsigned i=0;i<Standard.size();i++) 
  3.   for(unsigned j=0;j<MyWords.size();j++)  //each MyWords compare to the Standard 
  4.                                       //and see if it is in the standard vector
  5.         {
  6.             string::size_type FoundAt=MyWords[i].find(Standard[j]);
  7.             while( string::npos != FoundAt )
  8.                 {
  9.                     MyWords[i].replace( FoundAt, Standard[j].length(),Standard[j]+"/");
  10. FoundAt = MyWords[i].find( Standard[j], FoundAt + Standard[j].length() );
  11.                 }
  12.                             }
  13.               }
  14. /***********************************Create output file****************************************/
  15.     for(unsigned j=0;j<MyWords.size();j++)    // for each word in MyWords vector  
  16.     {
  17.         WriteWord<<MyWords[j]<<endl;    // write it to the destination file
  18.     }
  19.     WriteWord.flush();    // flush the memory to ensure all is written to the file
  20.     WriteWord.close();   // close the write file handls
  21.     return 0;
  22. }
I want to change it into binarysearch so it will perform faster, but the problem is I don't know how can I do that, I mean, if I change it to binarysearch the structure itself will have to be changed. I have the following questions that I can't solve so the binarysearch can be done:
1. Cut the sentences into words.
2. Compare first word of the sentences and see if matches, if match take second and so on till a phrase in my standard file is formed. Otherwise, discard word.
e.g. Lets say I have a sentences "This world is full of beauty!" and I have in my sorted array the phrase "full of beauty", I would like to take each word of the sentence to and compare it with my array like the following:
1. Take the word "This", since it doesn't match delete it
2. Take the word "world", since it doesn't match delete it.
3. and so on until "full", since "full" is part of my phrase "full of beauty" conserve it and take the next and compare it, and so on till I get the full phrase "full of beauty" then return that the phrase was found.
Hope someone can give me some hint, I really need to get this thing to work, and want to get it work myself but I really can't do. Thanks in advance to all.
Mar 21 '07 #1
Share this Question
Share on Google+
8 Replies


DeMan
100+
P: 1,806
You can do a binary search in the same structure, provided the data is sorted...Then instead of iterating through every option, you start in the middle, and choose whether this is your value, otherwise which side to search. You continue to do this until you find your word, (or conversely find you have no words left).

Thus, you have a sorted array,
"beauty", "Full", "of"

You cut "This" off the input string and compare it to the middle elemnet ("Full"), since "This" is lexicongraphically later, you compare to "of" and discard.

Same for "world"

"full" you find immediately...etc

In this example, there is no huge saving, but if your array contains more words it becomes more efficient
Mar 21 '07 #2

stealwings
P: 34
Thanks, but I think I haven't been clear in my question. I have I sort array of phrases which I named as standard, it contains 400 phrases sorted in alphabetical order. Then I have another file that contains lets say 1000 sentences, I want to cut this sentences into words, then do the comparison.
e.g. in the sentences I there is the word "excellent" , I want to take that word, and look if there is any phrase in my array that begins with that word, if true take the next word, if false discard the word and take next one.
Lets say I have in my array "excellent job" and "excellent work" so I will have to check the next word to make sure which is the match. And if the word that follows "excellent" is not neither discard, if is any of both return true.
Mar 21 '07 #3

DeMan
100+
P: 1,806
Sorry if I'm still on the wrong page.....

I think we can still modify the same idea.....

Assumiong you are always searching from the beginning of the word stored in the list you are searching......

say you array has:
"apples and oranges", "burgers and fries", "cats and dog", "excellent job", "excellent weather", "excellent work", "fantastic stuff"

and you want to search for "Doug does excellent work", we chop off Doug and does when we don't find them.....
we compare excellent and (depending on how you implement the comparison (it is trivial to implement one that ignores size and will match "excellent" with "excellent anything") you can do one of many things. At this point, you are at "excellent job" and you could concatenate " work" back onto your "excellent" and search linearly from where you are. You could add "w work" back on and search binary (although you's have to search over the remainder of the array, otherwise finding excellent from the other end may be a little complex). Or (depending on what you actually store in the array (like if you store a list of words in teh array, you could search again without the concatenation, although I'd suggest that's more work).

This (above) should work provided that ytou always want to match a longer phrase with a shorter substring (that is something having "and oranges" won't match to "apples and oranges" but "fred's apples and oranges" will.
You can also modify so that a phrase like "It was somewhere in the country, in a land of rock and scrub" matches to "somewhere in the country".

Hope this helps...

As alway post again if you want further clarification....
Mar 21 '07 #4

stealwings
P: 34
Sorry if I'm still on the wrong page.....

I think we can still modify the same idea.....

Assumiong you are always searching from the beginning of the word stored in the list you are searching......

say you array has:
"apples and oranges", "burgers and fries", "cats and dog", "excellent job", "excellent weather", "excellent work", "fantastic stuff"

and you want to search for "Doug does excellent work", we chop off Doug and does when we don't find them.....
we compare excellent and (depending on how you implement the comparison (it is trivial to implement one that ignores size and will match "excellent" with "excellent anything") you can do one of many things. At this point, you are at "excellent job" and you could concatenate " work" back onto your "excellent" and search linearly from where you are. You could add "w work" back on and search binary (although you's have to search over the remainder of the array, otherwise finding excellent from the other end may be a little complex). Or (depending on what you actually store in the array (like if you store a list of words in teh array, you could search again without the concatenation, although I'd suggest that's more work).

This (above) should work provided that ytou always want to match a longer phrase with a shorter substring (that is something having "and oranges" won't match to "apples and oranges" but "fred's apples and oranges" will.
You can also modify so that a phrase like "It was somewhere in the country, in a land of rock and scrub" matches to "somewhere in the country".

Hope this helps...

As alway post again if you want further clarification....
Thank you DeMan, but the problem here is I don't get how can I chop off Doug and does and how to concatenate work back to excellent, sorry if I driving you crazy but that is the exact thing I want to know. I mean, I also have the idea of how should it work but I don't know how do I code that idea.
Mar 21 '07 #5

stealwings
P: 34
I still don't get it, can some one help me out with my question??? Thank you all in advance!
Mar 23 '07 #6

DeMan
100+
P: 1,806
Sorry I didn't repkly sooner (I somehow missed your last post)

Given what you originally posted I had assumed you were already taking words from a string. You don't have to chop words per'se (especially if you are using local variables).

Say you have the String from before "Doug does excellent work".
You compare the beginning of this string to those in the array/linkedlist. As soon as we can tell Doug can't be find, we iterate to the character after the next space (so the string is still the same size, but we're pointing at the d of "does excellent work".
We begin the search again, quickly realising "does" is no good so we iterate to the char after the next space which is the e in "excellent".
Then we find a match on excellent and continue until one (or both) of the strings (the original "Doug does excellent work" and "excellent work") are completed (Thus avoiding the concatenate side of things).

I realise that much of this is probably not clear, but I can't draw diagrams, and cannot post code unless either
a) you can show you have done work based on previous posts
b) you can explain specifically what part you are having difficulty with (you can learn about strings by googling "c string" - and if you add terms like concatenate, manipulate, modify, compare you'll get more specific results)
Mar 23 '07 #7

stealwings
P: 34
Ok thanks again DeMan, I will try my best to figure it out these days, and if I have further questions I will continue pop upping the post, if these is not against the forum's rules. Or shall I star a new post further?
Mar 23 '07 #8

DeMan
100+
P: 1,806
If you feel the question is related to this post, then feel free to post again in this thread. (On the other hand a new thread tends to get mroe attention, I think)....
So it's up to you.

Look forward to hearing from you again!
Mar 23 '07 #9

Post your reply

Sign in to post your reply or Sign up for a free account.