Need to mark similar phrases in two different texts

Hello!
I need to mark similar phrases in two different texts, for example
with a <b> tag.

Example:

text 1:
Google Chrome is a browser that combines a minimal design with
sophisticated technology to make the web faster, safer, and easier.

text 2:
Hematology Analyzers – Simple, Sophisticated Technology Serving All
Patients - Clinical Diagnostics Technology Spotlight - Medcompare.

After comparing the following should be shown:
Google Chrome is a browser that combines a minimal design with
<b>sophisticated technology</b> to make the web faster, safer, and
easier.

Hematology Analyzers – Simple, <b>Sophisticated Technology</b> Serving
All Patients - Clinical Diagnostics Technology Spotlight - Medcompare.

Because "sophisticated technology" is repeated. But unfortunately I
don't know how to do it. Can you help me?
Sep 7 '08 #1
SuperNova wrote:
> I need to mark similar phrases in two different texts, for example
> with a <b> tag.
> [...]
> Because "sophisticated technology" is repeated. But unfortunately I
> don't know how to do it. Can you help me?

That's not quite enough to go on for finding matches effectively. It
would be trivial if you had a predetermined list of phrases, or if you
used a query from the user.

However, as it stands, since the phrase could be anything, you'd end
up bolding useless things such as indefinite/definite articles,
prepositions, pronouns, etc.
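
For the predetermined-list case mentioned above, a minimal sketch might look like this. The function name markKnownPhrases and the $phrases array are hypothetical and only illustrate the idea; nothing like them appears in the original question.

```php
<?php
// Hypothetical helper: wrap every occurrence of a known phrase in <b> tags.
function markKnownPhrases(string $text, array $phrases): string
{
    if (!$phrases) {
        return $text;
    }
    // Longest phrases first, so a longer phrase wins over one it contains
    usort($phrases, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    // Escape regex metacharacters, including the "/" delimiter
    $quoted = array_map(function ($p) {
        return preg_quote($p, '/');
    }, $phrases);
    // One case-insensitive pass over the text, whole words only
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    return preg_replace($pattern, '<b>$1</b>', $text);
}

// Assumed, hand-picked phrase list
$phrases = array('sophisticated technology');
echo markKnownPhrases('Simple, Sophisticated Technology Serving All Patients', $phrases), "\n";
// Simple, <b>Sophisticated Technology</b> Serving All Patients
```

The hard part of the original question is building that list automatically, which this sketch does not attempt.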

--
Curtis
Sep 7 '08 #2
SuperNova wrote:
> I need to mark similar phrases in two different texts, for example
> with a <b> tag.

Why do you want this?

This may work:
1) Make a list of words in each text.
2) Compute the intersection of these lists, so that the result is a list
with words which are present in both texts.
3) Filter this list to avoid common words such as 'it' and 'a'.
4) Mark all the words in the list bold in the texts.

Something like this:

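<?php
$text1 = 'Google Chrome[...]';
$text2 = 'Hematology Analyzers[...]';

// Array of lowercase words (we don't want case sensitivity);
// \W+ with PREG_SPLIT_NO_EMPTY avoids empty entries
$array1 = preg_split('/\W+/', strtolower($text1), -1, PREG_SPLIT_NO_EMPTY);
$array2 = preg_split('/\W+/', strtolower($text2), -1, PREG_SPLIT_NO_EMPTY);

// Intersect: words present in both texts, each listed once
$intersect = array_unique(array_intersect($array1, $array2));

// Filter out common words
$filter = array('a', 'it');
$filtered = array_diff($intersect, $filter);

// Make bold. All words go into one pattern so that each text is
// rewritten in a single pass; \b restricts matches to whole words.
if ($filtered) {
    $pattern = '/\b(' . implode('|', array_map('preg_quote', $filtered)) . ')\b/i';
    $text1 = preg_replace($pattern, '<b>\1</b>', $text1);
    $text2 = preg_replace($pattern, '<b>\1</b>', $text2);
}

echo $text1, "\n";
echo $text2, "\n";
?>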
Sep 7 '08 #3
> Why do you want this?
>
> This may work:
> 1) Make a list of words in each text.
> [...]
> 4) Mark all the words in the list bold in the texts.

Thank you for the code sample. It's a good thing to think about. But I
need to mark similar phrases, that is, runs of 2 or more words one
after another. Your code marks every word the texts share, whereas I
need to mark only runs of 2 or more consecutive words.
Sep 7 '08 #4
SuperNova wrote:
>> Why do you want this?
>>
>> This may work:
>> [...]
>
> Thank you for the code sample. It's a good thing to think about. But I
> need to mark similar phrases, 2 or more words one after another. Your
> code marks all the similar words, but I need to mark only 2 or more
> words one after another.

Then you can 'unmark' any hit that is only one word long.

That leaves marked only the runs of 2 or more consecutive hits.

(Or am I missing something?)
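
A minimal sketch of this "unmark" step, assuming the text has already been marked word by word with <b> tags. The function name keepMarkedRuns is hypothetical.

```php
<?php
// Hypothetical helper: keep only runs of two or more adjacent bold words.
function keepMarkedRuns(string $marked): string
{
    // Join bold words separated only by whitespace: "</b> <b>" becomes " "
    $merged = preg_replace('/<\/b>(\s+)<b>/', '$1', $marked);
    // Unwrap any bold span that still holds a single word
    return preg_replace('/<b>(\w+)<\/b>/', '$1', $merged);
}

echo keepMarkedRuns('with <b>sophisticated</b> <b>technology</b> to make the <b>web</b> faster'), "\n";
// with <b>sophisticated technology</b> to make the web faster
```

One caveat: this only checks that the marked words are adjacent within one text. It does not check that the same words are adjacent, and in the same order, in the other text, so it can mark runs that are not shared phrases.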

--
Luuk
Sep 7 '08 #5
SuperNova wrote:
> Thank you for the code sample. It's a good thing to think about. But I
> need to mark similar phrases, 2 or more words one after another. Your
> code marks all the similar words, but I need to mark only 2 or more
> words one after another.

I am sure you can figure out how to make my example work with two words.
Although my previous post was elaborate and even included a working
example, I have no intention of writing the code to solve your problem
for you.
Sep 7 '08 #6
On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
> I have no intentions to write code for you to solve your problem.

I don't need code, I need an algorithm. The only approach I can think
of is to split the texts into arrays of words and compare the words.
If two words are alike, the next word is checked as well, and if that
one is alike too, the mark is set. But I was hoping there is a faster
algorithm.
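
The word-array comparison described here could be sketched roughly as follows. The function name sharedWordPairs is hypothetical, and the sketch only reports shared two-word phrases; it does not insert any markup.

```php
<?php
// Hypothetical helper: two-word phrases (lowercased) that occur in both texts.
function sharedWordPairs(string $text1, string $text2): array
{
    $words1 = preg_split('/\W+/', strtolower($text1), -1, PREG_SPLIT_NO_EMPTY);
    $words2 = preg_split('/\W+/', strtolower($text2), -1, PREG_SPLIT_NO_EMPTY);
    $pairs = array();
    for ($i = 0; $i + 1 < count($words1); $i++) {
        for ($j = 0; $j + 1 < count($words2); $j++) {
            // First word alike? Then check the following word as well.
            if ($words1[$i] === $words2[$j] && $words1[$i + 1] === $words2[$j + 1]) {
                $pairs[] = $words1[$i] . ' ' . $words1[$i + 1];
            }
        }
    }
    return array_values(array_unique($pairs));
}

print_r(sharedWordPairs(
    'a minimal design with sophisticated technology to make the web faster',
    'Simple, Sophisticated Technology Serving All Patients'
));
// The only shared pair here is "sophisticated technology"
```

The nested loops compare every word position in one text with every position in the other, which is why it feels slow on long texts.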

Sep 8 '08 #7
On Sep 8, 5:55 am, SuperNova <SerafimPa...@gmail.com> wrote:
> On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
>> I have no intentions to write code for you to solve your problem.
>
> I don't need code, I need an algorithm. [...]

You are probably looking for something along the lines of a dictionary
coder, the technique used in some compression algorithms. See
http://en.wikipedia.org/wiki/Dictionary_coder for how it works.
Instead of looking for repeated characters, you will be looking for
repeated words.
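
A loose word-level reading of that idea, not a dictionary coder proper: index every two-word sequence of one text in an array keyed by the pair, then look up the pairs of the other text. The function names wordList and sharedPairsViaIndex are hypothetical.

```php
<?php
// Hypothetical helper: lowercase word list of a text.
function wordList(string $text): array
{
    return preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: shared two-word phrases found via an array lookup.
function sharedPairsViaIndex(string $text1, string $text2): array
{
    // Build the "dictionary": every adjacent word pair of $text2 as a key
    $index = array();
    $w2 = wordList($text2);
    for ($j = 0; $j + 1 < count($w2); $j++) {
        $index[$w2[$j] . ' ' . $w2[$j + 1]] = true;
    }
    // Look up every adjacent word pair of $text1
    $found = array();
    $w1 = wordList($text1);
    for ($i = 0; $i + 1 < count($w1); $i++) {
        $pair = $w1[$i] . ' ' . $w1[$i + 1];
        if (isset($index[$pair])) {
            $found[$pair] = true;
        }
    }
    return array_keys($found);
}

print_r(sharedPairsViaIndex(
    'a minimal design with sophisticated technology to make the web faster',
    'Simple, Sophisticated Technology Serving All Patients'
));
// The only shared pair here is "sophisticated technology"
```

Each text is walked once, and the isset lookup replaces the inner loop of a word-by-word comparison.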

Bill H
Sep 8 '08 #8
"SuperNova" <Se**********@gmail.com> wrote in message
news:66**********************************@d77g2000hsb.googlegroups.com...
> On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
>> I have no intentions to write code for you to solve your problem.
>
> I don't need code, I need an algorithm. [...]

Start by selecting the first two words in a sentence. Copy those, and
search for them in the other sentence. If you don't find a match,
advance the word pointer by one, select the second and third words, and
repeat until you've reached the last two words (i.e. the pointer is at
the next-to-last word).

Every time you do find a match, try to extend it into a longer match
until that fails. Highlight the result. Then advance the outer pointer
not by one word, but by the number of words found.

Add in some boundary checking so that you don't fall off the end of a
piece of text.

Make sure you invest some time in selecting the fastest code to do this
job. You will probably want to use strpos or strstr, depending on how
you're going to code this. strstr allows for some shortcuts, but a
solution using strpos may be faster.
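
A small sketch of the string-search route. strpos is case-sensitive, so this swaps in stripos, its case-insensitive counterpart (a substitution, not something the post proposes), and adds a whole-word check because a plain substring search also hits inside longer words. The function name findWholePhrase is hypothetical.

```php
<?php
// Hypothetical helper: byte offset of the first whole-word, case-insensitive
// occurrence of $phrase in $text, or false if there is none.
function findWholePhrase(string $text, string $phrase)
{
    if ($phrase === '') {
        return false;
    }
    $offset = 0;
    while (($pos = stripos($text, $phrase, $offset)) !== false) {
        $end = $pos + strlen($phrase);
        // Accept the hit only if it is not glued to a letter or digit
        $before = $pos === 0 || !ctype_alnum($text[$pos - 1]);
        $after = $end === strlen($text) || !ctype_alnum($text[$end]);
        if ($before && $after) {
            return $pos;
        }
        $offset = $pos + 1; // keep searching after a rejected hit
    }
    return false;
}

var_dump(findWholePhrase('Simple, Sophisticated Technology Serving', 'sophisticated technology'));
// int(8)
var_dump(findWholePhrase('unsophisticated technology', 'sophisticated technology'));
// bool(false)
```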

You may need to tweak this algorithm so that it finds more matches,
which may even be longer.

A: If some text starts with abc, then ...
B: if some text contains something else but a substring of some text starts
with abc, then ...

What do you highlight? "some text" and "starts with abc, then ...", or
"some text starts with abc, then ...", or both? (Better examples exist,
but you probably get the point.)
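
The procedure described in this post (search for a pair, extend the match, then skip ahead by the number of words matched) could be sketched on word arrays like this. It works on arrays rather than on strpos or strstr, which is a simplification, and the function name findSharedRuns is hypothetical.

```php
<?php
// Hypothetical helper: runs of two or more words in $w1 that also occur in $w2,
// returned as array(start index in $w1, length in words).
function findSharedRuns(array $w1, array $w2): array
{
    $runs = array();
    $n1 = count($w1);
    $n2 = count($w2);
    $i = 0;
    while ($i + 1 < $n1) {              // stop at the next-to-last word
        $best = 0;
        for ($j = 0; $j + 1 < $n2; $j++) {
            // Extend the match for as long as both texts agree,
            // without running off the end of either array
            $len = 0;
            while ($i + $len < $n1 && $j + $len < $n2
                && $w1[$i + $len] === $w2[$j + $len]) {
                $len++;
            }
            $best = max($best, $len);
        }
        if ($best >= 2) {
            $runs[] = array($i, $best);
            $i += $best;                // skip the words just matched
        } else {
            $i++;                       // no match: move on by one word
        }
    }
    return $runs;
}

$a = 'If some text starts with abc, then ...';
$b = 'if some text contains something else but a substring of some text starts with abc, then ...';
$w1 = preg_split('/\W+/', strtolower($a), -1, PREG_SPLIT_NO_EMPTY);
$w2 = preg_split('/\W+/', strtolower($b), -1, PREG_SPLIT_NO_EMPTY);
foreach (findSharedRuns($w1, $w2) as $run) {
    echo implode(' ', array_slice($w1, $run[0], $run[1])), "\n";
}
// if some text
// starts with abc then
```

On the A/B example this greedy left-to-right pass reports "if some text" and "starts with abc then", and never the longer overlapping run "some text starts with abc then". That is exactly the ambiguity the question above points at; finding the longer run would need a different strategy.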

Sep 9 '08 #9
On Sep 9, 6:37 am, "mijn naam" <whate...@hotmail.invalid> wrote:
> "SuperNova" <SerafimPa...@gmail.com> wrote in message news:66**********************************@d77g2000hsb.googlegroups.com...

Thanks Bill and Mijn for helping. Your ideas are good, and I think they
will help me.

Thanks!
Sep 9 '08 #10
