Need to mark similar phrases in two different texts 
September 7th, 2008, 07:45 AM
Hello!
I need to mark similar phrases in two different texts, for example
using the <b> tag.
Example:
text 1:
Google Chrome is a browser that combines a minimal design with
sophisticated technology to make the web faster, safer, and easier.
text 2:
Hematology Analyzers – Simple, Sophisticated Technology Serving All
Patients - Clinical Diagnostics Technology Spotlight - Medcompare.
After comparing the following should be shown:
Google Chrome is a browser that combines a minimal design with
<b>sophisticated technology</b> to make the web faster, safer, and
easier.
Hematology Analyzers – Simple, <b>Sophisticated Technology</b> Serving
All Patients - Clinical Diagnostics Technology Spotlight - Medcompare.
Because "sophisticated technology" appears in both texts. Unfortunately I
don't know how to do this. Can you help me?
September 7th, 2008, 09:05 AM
Re: Need to mark similar phrases in two different texts
SuperNova wrote:
> Hello!
> I need to mark similar phrases in two different texts, for example
> using the <b> tag.
>
> Example:
>
> text 1:
> Google Chrome is a browser that combines a minimal design with
> sophisticated technology to make the web faster, safer, and easier.
>
> text 2:
> Hematology Analyzers – Simple, Sophisticated Technology Serving All
> Patients - Clinical Diagnostics Technology Spotlight - Medcompare.
>
> After comparing the following should be shown:
> Google Chrome is a browser that combines a minimal design with
> <b>sophisticated technology</b> to make the web faster, safer, and
> easier.
>
> Hematology Analyzers – Simple, <b>Sophisticated Technology</b> Serving
> All Patients - Clinical Diagnostics Technology Spotlight - Medcompare.
>
> Because "sophisticated technology" appears in both texts. Unfortunately I
> don't know how to do this. Can you help me?
That's not quite enough to go on for finding matches effectively. It
would be trivial if you had a predetermined list of phrases, or if you
used a query from the user.
However, as it stands, since the phrase could be anything, you'd end up
bolding useless things like definite and indefinite articles,
prepositions, pronouns, etc.
--
Curtis
September 7th, 2008, 02:25 PM
Re: Need to mark similar phrases in two different texts
SuperNova wrote:
> I need to mark similar phrases in two different texts, for example
> using the <b> tag.
Why do you want this?
This may work:
1) Make a list of words in each text.
2) Compute the intersection of these lists, so that the result is a list
with words which are present in both texts.
3) Filter this list to avoid common words such as 'it' and 'a'.
4) Mark all the words in the list bold in the texts.
Something like this:
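<?php
$text1 = 'Google Chrome[...]';
$text2 = 'Hematology Analyzers[...]';

// We don't want case sensitivity
$lower1 = strtolower($text1);
$lower2 = strtolower($text2);

// Arrays of words; PREG_SPLIT_NO_EMPTY drops the empty strings
// that would otherwise appear between adjacent separators
$array1 = preg_split('/\W+/', $lower1, -1, PREG_SPLIT_NO_EMPTY);
$array2 = preg_split('/\W+/', $lower2, -1, PREG_SPLIT_NO_EMPTY);

// Intersect: words present in both texts, each listed once
$intersect = array_unique(array_intersect($array1, $array2));

// Filter out common words (extend this list as needed)
$filter = array('a', 'it');
$filtered = array_diff($intersect, $filter);

// Make bold in a single pass: \b restricts matches to whole words,
// and the inserted <b> tags are never scanned a second time
if ($filtered) {
    $words = implode('|', array_map('preg_quote', $filtered));
    $pattern = '/\b(' . $words . ')\b/i';
    $text1 = preg_replace($pattern, '<b>$1</b>', $text1);
    $text2 = preg_replace($pattern, '<b>$1</b>', $text2);
}

echo $text1, "\n";
echo $text2, "\n";
?>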
September 7th, 2008, 06:15 PM
Re: Need to mark similar phrases in two different texts
Sjoerd wrote:
> Why do you want this?
>
> This may work:
> 1) Make a list of words in each text.
> 2) Compute the intersection of these lists, so that the result is a list
> with words which are present in both texts.
> 3) Filter this list to avoid common words such as 'it' and 'a'.
> 4) Mark all the words in the list bold in the texts.
Thank you for the code sample. It's a good thing to think about. But I
need to mark similar phrases: two or more consecutive words. Your code
marks every word the texts share, but I need to mark only runs of two
or more consecutive words.
September 7th, 2008, 06:45 PM
Re: Need to mark similar phrases in two different texts
SuperNova wrote:
>> Why do you want this?
>>
>> This may work:
>> 1) Make a list of words in each text.
>> 2) Compute the intersection of these lists, so that the result is a list
>> with words which are present in both texts.
>> 3) Filter this list to avoid common words such as 'it' and 'a'.
>> 4) Mark all the words in the list bold in the texts.
>
> Thank you for the code sample. It's a good thing to think about. But I
> need to mark similar phrases: two or more consecutive words. Your code
> marks every word the texts share, but I need to mark only runs of two
> or more consecutive words.
Then you can 'unmark' any word that is a single hit on its own.
This will leave marked only the words with 2 or more consecutive hits
(or am I missing something?)
--
Luuk
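A minimal sketch of the 'unmark' idea, assuming the words were already wrapped one by one as <b>word</b> by an earlier step. The function name keepRunsOnly is hypothetical. It merges bold words separated only by whitespace into one bold phrase and strips the tags from isolated hits. Note that it only checks adjacency within one text, so two shared words that sit next to each other here but not in the other text would still be merged.

```php
<?php
// Hypothetical helper: expects text in which single words are wrapped as <b>word</b>.
function keepRunsOnly(string $marked): string
{
    return preg_replace_callback(
        // One bold word, optionally followed by more bold words
        // separated only by whitespace
        '~<b>\w+</b>(?:\s+<b>\w+</b>)*~',
        function (array $m): string {
            $plain = str_replace(['<b>', '</b>'], '', $m[0]);
            // Whitespace inside the run means it holds two or more words
            return preg_match('/\s/', $plain) ? '<b>' . $plain . '</b>' : $plain;
        },
        $marked
    );
}

// Assumed output of a word-by-word marking step
$marked = 'Simple, <b>Sophisticated</b> <b>Technology</b> Serving All Patients'
        . ' - Clinical Diagnostics <b>Technology</b> Spotlight';
echo keepRunsOnly($marked), "\n";
```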
September 7th, 2008, 07:15 PM
Re: Need to mark similar phrases in two different texts
SuperNova wrote:
> Thank you for the code sample. It's a good thing to think about. But I
> need to mark similar phrases: two or more consecutive words. Your code
> marks every word the texts share, but I need to mark only runs of two
> or more consecutive words.
I am sure you can figure out how to make my example work with two words.
Although my previous post was elaborate and even included a working
example, I have no intention of writing code for you to solve your problem.
September 8th, 2008, 10:05 AM
Re: Need to mark similar phrases in two different texts
On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
> I have no intention of writing code for you to solve your problem.
I don't need code, I need an algorithm. The only approach I can think
of is to split each text into an array of words and compare them: if a
word matches, check the following word too, and if that also matches,
set the mark. But I was hoping there is a faster algorithm.
September 8th, 2008, 10:35 AM
Re: Need to mark similar phrases in two different texts
On Sep 8, 5:55 am, SuperNova <SerafimPa...@gmail.com> wrote:
> On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
>> I have no intention of writing code for you to solve your problem.
>
> I don't need code, I need an algorithm. The only approach I can think
> of is to split each text into an array of words and compare them: if a
> word matches, check the following word too, and if that also matches,
> set the mark. But I was hoping there is a faster algorithm.
You are probably looking for something along the lines of a dictionary
coder, the technique used in some compression algorithms. See
http://en.wikipedia.org/wiki/Dictionary_coder for how it works.
Instead of looking for characters, you will be looking for words.
Bill H
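A rough sketch of how a word-level dictionary could speed up the lookup. It borrows only the dictionary idea and is not an implementation of any particular compression algorithm. Every pair of adjacent words in each text is stored as an array key, so finding the shared pairs is a hash lookup rather than a nested scan. The function names wordsOf, bigramSet and sharedPairs are made up for illustration.

```php
<?php
// Lowercased words of a text, in order
function wordsOf(string $text): array
{
    return preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Dictionary of every pair of adjacent words, stored as array keys (a hash set)
function bigramSet(array $words): array
{
    $set = [];
    for ($i = 0; $i + 1 < count($words); $i++) {
        $set[$words[$i] . ' ' . $words[$i + 1]] = true;
    }
    return $set;
}

// Word pairs that occur in both texts, in the order they appear in $a
function sharedPairs(string $a, string $b): array
{
    return array_keys(array_intersect_key(bigramSet(wordsOf($a)), bigramSet(wordsOf($b))));
}

$text1 = 'Google Chrome is a browser that combines a minimal design with '
       . 'sophisticated technology to make the web faster, safer, and easier.';
$text2 = 'Hematology Analyzers - Simple, Sophisticated Technology Serving All '
       . 'Patients - Clinical Diagnostics Technology Spotlight - Medcompare.';
print_r(sharedPairs($text1, $text2));
```

Longer phrases show up as overlapping pairs ("a b" and "b c" for "a b c"), so a later pass would still have to join them.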
September 9th, 2008, 01:45 AM
Re: Need to mark similar phrases in two different texts
"SuperNova" <SerafimPanov@gmail.com> wrote in message
news:667509dc-02e4-4144-8482-81d5f31c36ff@d77g2000hsb.googlegroups.com...
> On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
>> I have no intention of writing code for you to solve your problem.
>
> I don't need code, I need an algorithm. The only approach I can think
> of is to split each text into an array of words and compare them: if a
> word matches, check the following word too, and if that also matches,
> set the mark. But I was hoping there is a faster algorithm.
Start by selecting two words in a sentence. Copy those, and search for them
in the other sentence. If you don't find a match, advance the word pointer
by one, select the second and third words, and repeat until you've reached
the last two words (i.e. the pointer is at the next-to-last word).
Every time you do find a match, try to extend it into a longer match until
that fails. Highlight the result. Then advance the outer pointer not by one
word, but by the number of words found.
Add some boundary checking so that you don't fall off the end of a piece
of text.
Make sure you invest some time in selecting the fastest code for this job.
You will probably want to use strpos or strstr, depending on how you code
this. strstr allows for some shortcuts, but a solution using strpos may be
faster.
You may need to tweak this algorithm so that it finds more matches, which
may even be longer ones. Consider:
A: If some text starts with abc, then ...
B: if some text contains something else but a substring of some text starts
with abc, then ...
What do you highlight? "some text" and "starts with abc, then ...", or "some
text starts with abc, then ...", or both? (Better examples will exist, but
you probably get the point.)
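The steps above could be sketched roughly as follows. This is one possible reading, not a tuned implementation: it compares arrays of words instead of calling strpos or strstr, always takes the longest match starting at the current word, and resolves the ambiguity in the A/B example by simply working left to right. The names tokenize and markSharedPhrases and the $minWords parameter are assumptions made for illustration.

```php
<?php
// Words of a text with their byte offsets: [[word, offset], ...]
function tokenize(string $text): array
{
    preg_match_all('/\w+/', $text, $m, PREG_OFFSET_CAPTURE);
    return $m[0];
}

// Wraps in <b> each run of $minWords or more consecutive words of $text
// that also occurs as consecutive words in $other (case-insensitive).
function markSharedPhrases(string $text, string $other, int $minWords = 2): string
{
    $tokens = tokenize($text);
    $words = array_map('strtolower', array_column($tokens, 0));
    $otherWords = array_map('strtolower', array_column(tokenize($other), 0));
    $n = count($words);
    $m = count($otherWords);
    $out = '';
    $pos = 0; // byte offset up to which $text has been copied to $out
    $i = 0;   // the "outer pointer": index of the current word in $text
    while ($i < $n) {
        // Longest run starting at word $i that occurs anywhere in $other
        $best = 0;
        for ($j = 0; $j < $m; $j++) {
            $len = 0;
            // The bounds checks keep us from running off the end of either text
            while ($i + $len < $n && $j + $len < $m
                && $words[$i + $len] === $otherWords[$j + $len]) {
                $len++;
            }
            $best = max($best, $len);
        }
        if ($best >= $minWords) {
            $start = $tokens[$i][1];
            $last = $tokens[$i + $best - 1];
            $end = $last[1] + strlen($last[0]);
            $out .= substr($text, $pos, $start - $pos)
                  . '<b>' . substr($text, $start, $end - $start) . '</b>';
            $pos = $end;
            $i += $best; // advance by the number of words found
        } else {
            $i++;
        }
    }
    return $out . substr($text, $pos);
}

$text1 = 'Google Chrome is a browser that combines a minimal design with '
       . 'sophisticated technology to make the web faster, safer, and easier.';
$text2 = 'Hematology Analyzers - Simple, Sophisticated Technology Serving All '
       . 'Patients - Clinical Diagnostics Technology Spotlight - Medcompare.';
echo markSharedPhrases($text1, $text2), "\n";
echo markSharedPhrases($text2, $text1), "\n";
```

The inner loops make this roughly quadratic in the number of words, which should be fine for short texts. For long texts, an index of word pairs would reduce the number of comparisons.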
September 9th, 2008, 06:35 AM
Re: Need to mark similar phrases in two different texts
On Sep 9, 6:37 am, "mijn naam" <whate...@hotmail.invalid> wrote:
> "SuperNova" <SerafimPa...@gmail.com> wrote in message news:667509dc-02e4-4144-8482-81d5f31c36ff@d77g2000hsb.googlegroups.com...
Thanks Bill and Mijn for helping. Your ideas are good, I think they will
help me.
Thanks!