Bytes | Software Development & Data Engineering Community

Need to mark similar phrases in two different texts

Hello!
I need to mark similar phrases in two different texts, for example to
use the <b> tag.

Example:

text 1:
Google Chrome is a browser that combines a minimal design with
sophisticated technology to make the web faster, safer, and easier.

text 2:
Hematology Analyzers – Simple, Sophisticated Technology Serving All
Patients - Clinical Diagnostics Technology Spotlight - Medcompare.

After comparing the following should be shown:
Google Chrome is a browser that combines a minimal design with
<b>sophisticated technology</b> to make the web faster, safer, and
easier.

Hematology Analyzers – Simple, <b>Sophisticated Technology</b> Serving
All Patients - Clinical Diagnostics Technology Spotlight - Medcompare.

Because "sophisticated technology" is repeated. But unfortunately I
don't know how to do it. Can you help me?
Sep 7 '08 #1
SuperNova wrote:
> Hello!
> I need to mark similar phrases in two different texts, for example to
> use the <b> tag.
>
> Example:
>
> text 1:
> Google Chrome is a browser that combines a minimal design with
> sophisticated technology to make the web faster, safer, and easier.
>
> text 2:
> Hematology Analyzers – Simple, Sophisticated Technology Serving All
> Patients - Clinical Diagnostics Technology Spotlight - Medcompare.
>
> After comparing the following should be shown:
> Google Chrome is a browser that combines a minimal design with
> <b>sophisticated technology</b> to make the web faster, safer, and
> easier.
>
> Hematology Analyzers – Simple, <b>Sophisticated Technology</b> Serving
> All Patients - Clinical Diagnostics Technology Spotlight - Medcompare.
>
> Because "sophisticated technology" is repeated. But unfortunately I
> don't know how to do it. Can you help me?

That's not quite enough to go on for finding matches effectively. It
would be trivial if you had a pre-determined list of phrases, or if you
used a query from the user.

However, as you have it now, since the phrase could be anything,
you'd end up bolding useless words such as definite and indefinite
articles, prepositions, pronouns, etc.
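
For the pre-determined list case mentioned above, a minimal sketch might look like this. The function name `mark_known_phrases` is hypothetical, and the sketch assumes every phrase starts and ends with a word character so that the `\b` word boundaries apply.

```php
<?php
// Hypothetical helper: wrap every occurrence of the given phrases in <b> tags,
// ignoring case and matching whole words only.
function mark_known_phrases(string $text, array $phrases): string
{
    if (!$phrases) {
        return $text; // nothing to look for
    }
    // Try longer phrases first so they win over their own prefixes.
    usort($phrases, function ($x, $y) {
        return strlen($y) - strlen($x);
    });
    // Escape regex metacharacters, including the "/" delimiter.
    $alternatives = implode('|', array_map(function ($p) {
        return preg_quote($p, '/');
    }, $phrases));
    // $1 re-inserts the matched text, so the original capitalisation is kept.
    return preg_replace('/\b(' . $alternatives . ')\b/i', '<b>$1</b>', $text);
}
```

The hard part of the original question is that no such list exists up front, which is what the rest of the thread deals with.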

--
Curtis
Sep 7 '08 #2
SuperNova wrote:
> I need to mark similar phrases in two different texts, for example to
> use the <b> tag.

Why do you want this?

This may work:
1) Make a list of words in each text.
2) Compute the intersection of these lists, so that the result is a list
with words which are present in both texts.
3) Filter this list to avoid common words such as 'it' and 'a'.
4) Mark all the words in the list bold in the texts.

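Something like this:

<?php
$text1 = 'Google Chrome[...]';
$text2 = 'Hematology Analyzers[...]';

// We don't want case sensitivity
$lower1 = strtolower($text1);
$lower2 = strtolower($text2);

// Array of words: split on runs of non-word characters, drop empty strings
$array1 = preg_split('/\W+/', $lower1, -1, PREG_SPLIT_NO_EMPTY);
$array2 = preg_split('/\W+/', $lower2, -1, PREG_SPLIT_NO_EMPTY);

// Intersect: words that are present in both texts
$intersect = array_unique(array_intersect($array1, $array2));

// Filter out common words
$filter = array('a', 'it');
$filtered = array_diff($intersect, $filter);

// Make bold: whole words only, all in one pass so the inserted
// <b> tags are never matched themselves
if ($filtered) {
    $alternatives = implode('|', array_map('preg_quote', $filtered));
    $pattern = "/\\b($alternatives)\\b/i";
    $text1 = preg_replace($pattern, '<b>$1</b>', $text1);
    $text2 = preg_replace($pattern, '<b>$1</b>', $text2);
}

echo $text1;
echo $text2;
?>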
Sep 7 '08 #3
> Why do you want this?
>
> This may work:
> 1) Make a list of words in each text.
> 2) Compute the intersection of these lists, so that the result is a list
> with words which are present in both texts.
> 3) Filter this list to avoid common words such as 'it' and 'a'.
> 4) Mark all the words in the list bold in the texts.

Thank you for the code sample. It's a good thing to think about. But I
need to mark similar phrases, 2 or more words one after another. Your
code marks all the similar words, but I need to mark only 2 or more
words one after another.
Sep 7 '08 #4
SuperNova wrote:
>> Why do you want this?
>>
>> This may work:
>> 1) Make a list of words in each text.
>> 2) Compute the intersection of these lists, so that the result is a list
>> with words which are present in both texts.
>> 3) Filter this list to avoid common words such as 'it' and 'a'.
>> 4) Mark all the words in the list bold in the texts.
>
> Thank you for the code sample. It's a good thing to think about. But I
> need to mark similar phrases, 2 or more words one after another. Your
> code marks all the similar words, but I need to mark only 2 or more
> words one after another.

Then you can 'unmark' every hit that stands alone, i.e. only 1
consecutive hit.

This will leave marked only the words with 2 or more consecutive hits.

(Or am I missing something?)
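
A minimal sketch of this mark-then-unmark idea, assuming case-insensitive matching on ASCII word characters. The function name `mark_runs` and its `$minRun` parameter are hypothetical.

```php
<?php
// Hypothetical helper: bold runs of at least $minRun consecutive words of
// $text that also occur somewhere in $other.
function mark_runs(string $text, string $other, int $minRun = 2): string
{
    // Lower-cased words of the other text as array keys, for fast lookups.
    $otherWords = array_flip(
        preg_split('/\W+/', strtolower($other), -1, PREG_SPLIT_NO_EMPTY)
    );

    // Alternating tokens: separators at even indices, words at odd indices.
    $tokens = preg_split('/(\w+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $count = count($tokens); // always odd: 2 * words + 1

    $runStart = 0;
    $runLength = 0;
    // Visit every word, plus one step past the end to flush the last run.
    for ($i = 1; $i <= $count; $i += 2) {
        $isHit = $i < $count && isset($otherWords[strtolower($tokens[$i])]);
        if ($isHit) {
            if ($runLength === 0) {
                $runStart = $i;
            }
            $runLength++;
            continue;
        }
        // The run just ended: keep it only if it is long enough.
        if ($runLength >= $minRun) {
            $tokens[$runStart] = '<b>' . $tokens[$runStart];
            $tokens[$i - 2] .= '</b>';
        }
        $runLength = 0;
    }
    return implode('', $tokens);
}
```

Note that this marks words that are adjacent in one text even when the same words are scattered in the other, so it is looser than a true phrase match.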

--
Luuk
Sep 7 '08 #5
SuperNova wrote:
> Thank you for the code sample. It's a good thing to think about. But I
> need to mark similar phrases, 2 or more words one after another. Your
> code marks all the similar words, but I need to mark only 2 or more
> words one after another.

I am sure you can figure out how to make my example work with two words.
Although my previous post was elaborate and even included a working
example, I have no intention of writing code for you to solve your problem.
Sep 7 '08 #6
On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
> I have no intention of writing code for you to solve your problem.

I don't need code, I need an algorithm. The only thing I can think of
is to split the texts into arrays of words and compare them word by
word. If two words match, the next words should be compared as well,
and if they match too, the mark should be set. But I was hoping there
is a faster algorithm.

Sep 8 '08 #7
On Sep 8, 5:55 am, SuperNova <SerafimPa...@gmail.com> wrote:
> On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
>> I have no intention of writing code for you to solve your problem.
>
> I don't need code, I need an algorithm. The only thing I can think of
> is to split the texts into arrays of words and compare them word by
> word. If two words match, the next words should be compared as well,
> and if they match too, the mark should be set. But I was hoping there
> is a faster algorithm.

You are probably looking for something along the lines of a dictionary
coder, the technique used in some compression algorithms. See
http://en.wikipedia.org/wiki/Dictionary_coder for how it works.
Instead of looking for repeated characters, you will be looking for
repeated words.
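
One way to borrow the dictionary idea at the word level, offered as a minimal sketch and not as a dictionary coder proper: index every pair of adjacent words of the second text in a hash table, then look up each pair of the first text and extend any hit word by word. The function names `split_words`, `bigram_index` and `shared_phrases` are hypothetical, and the sketch assumes case-insensitive matching on ASCII word characters.

```php
<?php
// Hypothetical helper: lower-cased words of a text.
function split_words(string $text): array
{
    return preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: map each pair of adjacent words to the positions
// where that pair starts.
function bigram_index(array $words): array
{
    $index = [];
    for ($i = 0; $i + 1 < count($words); $i++) {
        $index[$words[$i] . ' ' . $words[$i + 1]][] = $i;
    }
    return $index;
}

// Hypothetical helper: shared runs of 2 or more words, scanning $a from the left.
function shared_phrases(string $a, string $b): array
{
    $wa = split_words($a);
    $wb = split_words($b);
    $index = bigram_index($wb);
    $phrases = [];

    $i = 0;
    while ($i + 1 < count($wa)) {
        $key = $wa[$i] . ' ' . $wa[$i + 1];
        $best = 0;
        // Only positions in $b that already share two words are examined.
        foreach ($index[$key] ?? [] as $j) {
            $len = 2;
            while (isset($wa[$i + $len], $wb[$j + $len])
                && $wa[$i + $len] === $wb[$j + $len]) {
                $len++;
            }
            $best = max($best, $len);
        }
        if ($best > 0) {
            $phrases[] = implode(' ', array_slice($wa, $i, $best));
            $i += $best; // skip the words that were just matched
        } else {
            $i++;
        }
    }
    return $phrases;
}
```

The table lookup replaces a scan of the whole second text for every word pair, which is where the speed-up over a plain word-by-word comparison would come from.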

Bill H
Sep 8 '08 #8
"SuperNova" <Se**********@gmail.com> wrote in message
news:66**********************************@d77g2000hsb.googlegroups.com...
> On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
>> I have no intention of writing code for you to solve your problem.
>
> I don't need code, I need an algorithm. The only thing I can think of
> is to split the texts into arrays of words and compare them word by
> word. If two words match, the next words should be compared as well,
> and if they match too, the mark should be set. But I was hoping there
> is a faster algorithm.

Start by selecting the first two words of one sentence. Copy those, and search
for them in the other sentence. If you don't find a match, advance the word
pointer by one, select the second and third words, and repeat until you've
reached the last two words (i.e. the pointer is at the next-to-last word).

Every time you do find a match, try finding a longer match until that fails.
Highlight. Then advance the outer pointer not by one word, but by the number
of words found.

Add in some boundary checking so that you don't fall off the end of a piece
of text.

Make sure you invest some time in selecting the fastest code to do this job;
you probably want to use strpos or strstr, depending on how you're going to
code this. strstr allows for some shortcuts, but perhaps a solution using
strpos is faster.

You may need to tweak this algorithm so that you can find more matches, which
may even be longer.

A: If some text starts with abc, then ...
B: if some text contains something else but a substring of some text starts
with abc, then ...

What do you highlight? "some text" and "starts with abc, then ...", or "some
text starts with abc, then ...", or both? (Better examples exist, but
you probably get the point.)
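
The word-pointer search described above can be sketched as follows. This is a minimal sketch, not a tuned implementation: it compares arrays of lower-cased words instead of calling strpos or strstr on the raw strings (an assumption made here to avoid matching parts of words), and the names `words_of` and `common_phrases` are hypothetical.

```php
<?php
// Hypothetical helper: lower-cased words of a text.
function words_of(string $text): array
{
    return preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: phrases of at least $minWords words shared by $a and $b,
// found greedily from left to right in $a.
function common_phrases(string $a, string $b, int $minWords = 2): array
{
    $wa = words_of($a);
    $wb = words_of($b);
    $na = count($wa);
    $nb = count($wb);
    $phrases = [];

    $i = 0; // the outer word pointer into $a
    // Boundary check: stop once fewer than $minWords words remain.
    while ($i + $minWords <= $na) {
        // Longest run starting at word $i of $a that occurs anywhere in $b.
        $best = 0;
        for ($j = 0; $j < $nb; $j++) {
            $len = 0;
            while ($i + $len < $na && $j + $len < $nb
                && $wa[$i + $len] === $wb[$j + $len]) {
                $len++;
            }
            $best = max($best, $len);
        }
        if ($best >= $minWords) {
            $phrases[] = implode(' ', array_slice($wa, $i, $best));
            $i += $best; // advance by the number of words found
        } else {
            $i++;        // no match here: advance by one word
        }
    }
    return $phrases;
}
```

With sentences A and B from above, this left-to-right greedy version returns "if some text" and "starts with abc then", so the longer shared run "some text starts with abc then" is never reported. That is one concrete form of the question of what to highlight.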

Sep 9 '08 #9
On Sep 9, 6:37 am, "mijn naam" <whate...@hotmail.invalid> wrote:
> "SuperNova" <SerafimPa...@gmail.com> wrote in message
> news:66**********************************@d77g2000hsb.googlegroups.com...

Thanks Bill and Mijn for helping. Your ideas are good, I think they will
help me.

Thanks!
Sep 9 '08 #10
