
Need to mark similar phrases in two different texts

 
  #1  
Old September 7th, 2008, 07:45 AM
SuperNova

Hello!
I need to mark similar phrases in two different texts, for example with
a <b> tag.

Example:

text 1:
Google Chrome is a browser that combines a minimal design with
sophisticated technology to make the web faster, safer, and easier.

text 2:
Hematology Analyzers – Simple, Sophisticated Technology Serving All
Patients - Clinical Diagnostics Technology Spotlight - Medcompare.

After comparing the following should be shown:
Google Chrome is a browser that combines a minimal design with
<b>sophisticated technology</b> to make the web faster, safer, and
easier.

Hematology Analyzers – Simple, <b>Sophisticated Technology</b> Serving
All Patients - Clinical Diagnostics Technology Spotlight - Medcompare.

This is because "sophisticated technology" appears in both texts.
Unfortunately, I don't know how to do it. Can you help me?

  #2  
Old September 7th, 2008, 09:05 AM
Curtis

SuperNova wrote:
> Hello!
> I need to mark similar phrases in two different texts, for example with
> a <b> tag.
>
> Example:
>
> text 1:
> Google Chrome is a browser that combines a minimal design with
> sophisticated technology to make the web faster, safer, and easier.
>
> text 2:
> Hematology Analyzers – Simple, Sophisticated Technology Serving All
> Patients - Clinical Diagnostics Technology Spotlight - Medcompare.
>
> After comparing the following should be shown:
> Google Chrome is a browser that combines a minimal design with
> <b>sophisticated technology</b> to make the web faster, safer, and
> easier.
>
> Hematology Analyzers – Simple, <b>Sophisticated Technology</b> Serving
> All Patients - Clinical Diagnostics Technology Spotlight - Medcompare.
>
> Because "sophisticated technology" is repeated. But unfortunately I
> don't know how to do it. Can you help me?

That's not quite enough to go on for effectively finding matches. It
would be trivial if you had a pre-determined list of phrases, or you
used a query from the user.

However, as you have it now, and since the phrase could be anything,
you'd end up bolding useless things such as definite and indefinite
articles, prepositions, pronouns, etc.
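
A minimal sketch of the "pre-determined list" case mentioned above,
assuming PHP 7.4 or later. The function name mark_phrases and the phrase
list are made up for illustration, and the phrases are assumed to start
and end with a letter or digit so that \b can anchor them:

```php
<?php
// Hypothetical helper: wrap each listed phrase in <b>...</b>, ignoring case.
function mark_phrases(string $text, array $phrases): string
{
    if (!$phrases) {
        return $text;
    }
    // Longest phrases first, so a longer phrase wins over a shorter one it contains.
    usort($phrases, fn($a, $b) => strlen($b) <=> strlen($a));
    // preg_quote escapes any regex metacharacters inside the phrases.
    $alternatives = array_map(fn($p) => preg_quote($p, '/'), $phrases);
    $pattern = '/\b(' . implode('|', $alternatives) . ')\b/i';
    // A single pass, so the inserted <b> tags are never matched themselves.
    return preg_replace($pattern, '<b>$1</b>', $text);
}

$phrases = ['sophisticated technology'];
echo mark_phrases('Simple, Sophisticated Technology Serving All', $phrases), "\n";
```

The hard part of the original question is building that list
automatically, which this sketch does not attempt.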

--
Curtis
  #3  
Old September 7th, 2008, 02:25 PM
Sjoerd

SuperNova wrote:
> I need to mark similar phrases in two different texts, for example with
> a <b> tag.

Why do you want this?

This may work:
1) Make a list of words in each text.
2) Compute the intersection of these lists, so that the result is a list
with words which are present in both texts.
3) Filter this list to avoid common words such as 'it' and 'a'.
4) Mark all the words in the list bold in the texts.

Something like this:

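<?php
$text1 = 'Google Chrome[...]';
$text2 = 'Hematology Analyzers[...]';

// We don't want case sensitivity
$lower1 = strtolower($text1);
$lower2 = strtolower($text2);

// Array of words; \W+ with PREG_SPLIT_NO_EMPTY avoids empty entries
$array1 = preg_split('/\W+/', $lower1, -1, PREG_SPLIT_NO_EMPTY);
$array2 = preg_split('/\W+/', $lower2, -1, PREG_SPLIT_NO_EMPTY);

// Intersect: words present in both texts, each listed once
$intersect = array_unique(array_intersect($array1, $array2));

// Filter out common words (extend this list as needed)
$filter = array('a', 'an', 'the', 'it', 'is', 'to', 'and', 'with');
$filtered = array_diff($intersect, $filter);

// Make bold in a single pass, so the inserted <b> tags are never matched
// again; \b keeps 'all' from matching inside 'small'
if ($filtered) {
    $words = implode('|', array_map('preg_quote', $filtered));
    $pattern = '/\b(' . $words . ')\b/i';
    $text1 = preg_replace($pattern, '<b>$1</b>', $text1);
    $text2 = preg_replace($pattern, '<b>$1</b>', $text2);
}

echo $text1, "\n";
echo $text2, "\n";
?>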
  #4  
Old September 7th, 2008, 06:15 PM
SuperNova

Sjoerd wrote:
> Why do you want this?
>
> This may work:
> 1) Make a list of words in each text.
> 2) Compute the intersection of these lists, so that the result is a list
> with words which are present in both texts.
> 3) Filter this list to avoid common words such as 'it' and 'a'.
> 4) Mark all the words in the list bold in the texts.

Thank you for the code sample. It gives me something to think about. But
I need to mark similar phrases, meaning two or more consecutive words.
Your code marks every word the texts share, even when it stands alone.
  #5  
Old September 7th, 2008, 06:45 PM
Luuk

SuperNova wrote:
> > Why do you want this?
> >
> > This may work:
> > 1) Make a list of words in each text.
> > 2) Compute the intersection of these lists, so that the result is a list
> > with words which are present in both texts.
> > 3) Filter this list to avoid common words such as 'it' and 'a'.
> > 4) Mark all the words in the list bold in the texts.
>
> Thank you for the code sample. It's a good thing to think about. But I
> need to mark similar phrases, 2 or more words one after another. Your
> code marks all the similar words, but I need to mark only 2 or more
> words one after another.

Then you can 'unmark' every hit that is not next to another hit.

This leaves only the marked words that form runs of 2 or more
consecutive hits.

(Or am I missing something?)

--
Luuk
  #6  
Old September 7th, 2008, 07:15 PM
Sjoerd

SuperNova wrote:
> Thank you for the code sample. It's a good thing to think about. But I
> need to mark similar phrases, 2 or more words one after another. Your
> code marks all the similar words, but I need to mark only 2 or more
> words one after another.

I am sure you can figure out how to make my example work with two words.
Although my previous post was elaborate and even included a working
example, I have no intention of writing code to solve your problem for you.
  #7  
Old September 8th, 2008, 10:05 AM
SuperNova

On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
> I have no intentions to write code for you to solve your problem.

I don't need code, I need an algorithm. The only approach I can think of
is to split the texts into arrays of words and compare them: when a word
matches, check the next word too, and if that one also matches, set the
mark. But I hoped there would be a faster algorithm.

  #8  
Old September 8th, 2008, 10:35 AM
Bill H

On Sep 8, 5:55 am, SuperNova <SerafimPa...@gmail.com> wrote:
> On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
> > I have no intentions to write code for you to solve your problem.
>
> I don't need code, I need algorithm. But the only thing I'm thinking
> about is to split words in array and to check words. If words are
> alike, the second word should be checked again, if it is alike too,
> the mark should be set. But I hoped that there is more fast algorithm.

You are probably looking for something along the lines of a dictionary
coder, the process used in some compression algorithms. See
http://en.wikipedia.org/wiki/Dictionary_coder for how it works.
Instead of looking for characters, you will be looking for words.
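
One possible reading of this suggestion, sketched under assumptions: it
is not a real dictionary coder, only a word-level lookup table in the
same spirit. Every pair of consecutive words in text 2 becomes an array
key, so each pair of text 1 can be checked with a single isset() instead
of a rescan of text 2. The helper names words_of and shared_word_pairs
are hypothetical, and words are assumed to be ASCII:

```php
<?php
// Split a text into lowercase words (hypothetical helper).
function words_of(string $text): array
{
    return preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Return every pair of consecutive words that occurs in both texts.
function shared_word_pairs(string $text1, string $text2): array
{
    // Dictionary: each word pair of text 2 becomes an array key.
    $dictionary = [];
    $words2 = words_of($text2);
    for ($i = 0; $i + 1 < count($words2); $i++) {
        $dictionary[$words2[$i] . ' ' . $words2[$i + 1]] = true;
    }

    // Scan text 1 and look each of its pairs up in the dictionary.
    $shared = [];
    $words1 = words_of($text1);
    for ($i = 0; $i + 1 < count($words1); $i++) {
        $pair = $words1[$i] . ' ' . $words1[$i + 1];
        if (isset($dictionary[$pair])) {
            $shared[$pair] = true;   // stored as a key, so each pair is kept once
        }
    }
    return array_keys($shared);
}

$text1 = 'Google Chrome is a browser that combines a minimal design with '
       . 'sophisticated technology to make the web faster, safer, and easier.';
$text2 = 'Hematology Analyzers – Simple, Sophisticated Technology Serving All '
       . 'Patients - Clinical Diagnostics Technology Spotlight - Medcompare.';
print_r(shared_word_pairs($text1, $text2));
```

This only reports shared two-word phrases. Two overlapping shared pairs
do not guarantee a shared three-word phrase, so longer matches still
need to be verified against the other text.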

Bill H
  #9  
Old September 9th, 2008, 01:45 AM
mijn naam

"SuperNova" <SerafimPanov@gmail.com> wrote in message
news:667509dc-02e4-4144-8482-81d5f31c36ff@d77g2000hsb.googlegroups.com...
> On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
> > I have no intentions to write code for you to solve your problem.
>
> I don't need code, I need algorithm. But the only thing I'm thinking
> about is to split words in array and to check words. If words are
> alike, the second word should be checked again, if it is alike too,
> the mark should be set. But I hoped that there is more fast algorithm.

Start by selecting the first two words of a sentence. Copy those, and
search for them in the other sentence. If you don't find a match, advance
the word pointer by one, select the second and third words, and repeat
until you've reached the last two words (i.e. the pointer is at the
next-to-last word).

Every time you do find a match, try finding a longer match until that fails.
Highlight. Then advance the outer pointer not by one word, but by the number
of words found.

Add in some boundary checking so that you don't fall off the end of a piece
of text.
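
The steps above could be sketched roughly as follows, assuming PHP 7.4
or later. The sketch compares lists of words rather than calling strpos
on the raw strings, so punctuation between two words is ignored and only
ASCII words are recognised. The names tokens, contains_run and
mark_shared_phrases are invented for illustration:

```php
<?php
// Lowercased words of a text together with their byte offsets.
function tokens(string $text): array
{
    preg_match_all('/\w+/', $text, $m, PREG_OFFSET_CAPTURE);
    return array_map(fn($t) => [strtolower($t[0]), $t[1]], $m[0]);
}

// True if the word list $needle occurs as a consecutive run in $haystack.
function contains_run(array $haystack, array $needle): bool
{
    $n = count($needle);
    for ($j = 0; $j + $n <= count($haystack); $j++) {
        if (array_slice($haystack, $j, $n) === $needle) {
            return true;
        }
    }
    return false;
}

// Bold every run of two or more words in $text that also occurs in $other.
function mark_shared_phrases(string $text, string $other): string
{
    $tok = tokens($text);
    $words = array_column($tok, 0);
    $otherWords = array_column(tokens($other), 0);
    $out = '';
    $pos = 0;   // how much of $text has been copied to $out so far
    $i = 0;     // the word pointer
    while ($i + 1 < count($words)) {
        $len = 2;   // start with a two-word candidate
        if (!contains_run($otherWords, array_slice($words, $i, $len))) {
            $i++;   // no match: advance the pointer by one word
            continue;
        }
        // Match found: try longer candidates until one fails or the text ends.
        while ($i + $len < count($words)
            && contains_run($otherWords, array_slice($words, $i, $len + 1))) {
            $len++;
        }
        $start = $tok[$i][1];
        $last = $tok[$i + $len - 1];
        $end = $last[1] + strlen($last[0]);
        $out .= substr($text, $pos, $start - $pos)
              . '<b>' . substr($text, $start, $end - $start) . '</b>';
        $pos = $end;
        $i += $len;   // skip past the words just highlighted
    }
    return $out . substr($text, $pos);
}

echo mark_shared_phrases(
    'a minimal design with sophisticated technology to make the web faster',
    'Simple, Sophisticated Technology Serving All Patients'
), "\n";
```

Calling the function a second time with the arguments swapped marks the
other text. The loop conditions are the boundary checks: a candidate is
only built when enough words remain.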

Make sure you invest some time in selecting the fastest code to do this job.
You probably want to use strpos or strstr, depending on how you're going to
code this. strstr allows for some shortcuts, but perhaps a solution using
strpos is faster.
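
For reference, a small illustration of how the two functions report
their results. It makes no claim about which is faster, and the sample
string is only an assumption for the demo:

```php
<?php
$haystack = 'sophisticated technology to make the web faster';

// strpos returns the byte offset of the first match, or false if there is none.
// A match at the very start gives 0, so test with !== false, not with if (strpos(...)).
var_dump(strpos($haystack, 'sophisticated technology') !== false);

// stripos is the case-insensitive variant.
var_dump(stripos($haystack, 'Sophisticated Technology'));

// No match: false.
var_dump(strpos($haystack, 'simple technology'));

// strstr returns the rest of the haystack starting at the match, or false.
var_dump(strstr($haystack, 'make'));
```

Both functions match raw substrings, so 'art' would be found inside
'smart'. When working on strings rather than word arrays, the characters
before and after a hit still need to be checked for word boundaries.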

You may need to tweak this algorithm so that you can find more matches, which
may even be longer.

A: If some text starts with abc, then ...
B: if some text contains something else but a substring of some text starts
with abc, then ...

What do you highlight? "some text" and "starts with abc, then...", or "some
text starts with abc, then ..." or both? (better examples will exist, but
you probably got the point)

  #10  
Old September 9th, 2008, 06:35 AM
SuperNova

Thanks Bill and Mijn for helping. Your ideas are good; I think they will
help me.

Thanks!
 
