Match exact word/phrase

Hi,

I have a phrase like this: "Rna binding proteins", and I want to match this exact phrase. I have written code like this:

    $sentence="Overall, the participation of additional RNA binding proteins in controlling beta-F1-ATPase expression.";

    $word="RNA binding proteins";

    if($sentence=~/\b$word\b/)
    {
        print "matched";
        $sentence=~s/(\b$word\b)/<spanstyle="background-color:#E1FF77">$1<\/span>/i;
    }
    print "<br> sentence=$sentence<br>";

"Rna binding proteins" should be highlighted in the sentence.

I have a problem: many sentences contain this phrase, but it is not getting matched or highlighted.

How can I write the code so that it also picks up sentences containing variants such as "Rna-binding protein" and "Rna binding protein"?

I don't understand why sentences containing "Rna binding proteins" are not being retrieved.

with regards
Archana
Aug 11 '08 #1
nithinpes
410 Expert 256MB
You can try this:

    $word="RNA[ -]binding proteins?";

    if($sentence=~/\b$word\b/i)
    {
        print "matched";
        # ... rest of your code ...
    }

- The words RNA and binding may be separated by a space or a hyphen, so both are placed inside a character class, which matches exactly one of the listed characters.
- Also, from your description, the phrase may contain 'protein' or 'proteins'. The '?' matches zero or one occurrence of the character preceding it ('s' in this case).
- The /i option makes the pattern match case-insensitive (both RNA and Rna will be matched). You can also use the /g option to extend the search to multiple occurrences within a line.
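
The three points above can be sketched together as follows. This is only an illustration: the sub name `highlight` and the sample strings are made up here and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every variant of the phrase in a highlight span
# and report how many replacements were made.
sub highlight {
    my ($sentence) = @_;
    # [ -] : a space or a hyphen between "RNA" and "binding"
    # s?   : optional plural "s"
    # /i   : ignore case, /g : replace every occurrence
    my $count = $sentence =~ s{\b(RNA[ -]binding proteins?)\b}{<span style="background-color:#E1FF77">$1</span>}gi;
    return ($sentence, $count || 0);   # s/// returns an empty string when nothing matched
}

for my $s ("Rna-binding protein A", "two RNA binding proteins", "no match here") {
    my ($out, $n) = highlight($s);
    print "$n: $out\n";
}
```

Using `s{...}{...}` instead of `s/.../.../` avoids having to escape the slash in `</span>`.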
Aug 11 '08 #2
KevinADC
4,059 Expert 2GB
Also, there is an error in the HTML code you posted:

    <spanstyle=

should be:

    <span style=
Aug 11 '08 #3
Hi,

Yes, I corrected that, but the result is still the same.

These are two sentences:

    Overall, the participation of additional RNA binding proteins in controlling beta-F1-ATPase expression and therefore, in defining the bioenergetic signature of the cancer cell is expected.

    RNA chaperones are non-specific RNA binding proteins that help RNA folding by resolving misfolded structures or preventing their formation.

Only the "RNA binding proteins" in the second sentence is matched and highlighted; the first sentence is not matched.

Why is the phrase not matched?

With regards

Archana
Aug 11 '08 #4
KevinADC
4,059 Expert 2GB
Are you reading lines from a file or are all the sentences one long string, or what?
Aug 11 '08 #5
Kelicula
176 Expert 100+
Without the "g" modifier, the substitution replaces only the first match in the string and then stops.

You need the "match global" modifier: add a "g" alongside your "case-insensitive" modifier (i).

    $sentence="Overall, the participation of additional RNA binding proteins in controlling beta-F1-ATPase expression.";
    $word="RNA binding proteins";

    if($sentence=~/\b$word\b/i)
    {
        print "matched";
        $sentence=~s/(\b$word\b)/<span style="background-color:#E1FF77">$1<\/span>/ig;
    }
    print "<br> sentence=$sentence<br>";

That should do it.
You may need a loop when one substitution can expose another match. For instance, to remove (nested (even deeply nested (like this))) remarks, you could use:

    1 while s/\([^()]*\)//g;   # this works on $_

That's right out of the "Programming Perl" book.
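
As a small sketch of how that idiom behaves, here it is wrapped in a sub that works on a copy of its argument rather than on `$_`. The name `strip_remarks` and the sample text are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the idiom above.
sub strip_remarks {
    my ($text) = @_;
    # Each pass deletes the innermost (...) groups, which turns the groups
    # around them into innermost groups; repeat until nothing is left to delete.
    1 while $text =~ s/\([^()]*\)//g;
    return $text;
}

print strip_remarks("keep (drop (and this (too))) this"), "\n";
```

The loop is needed here because the pattern only ever matches parentheses with no other parentheses inside them, so the outer levels become matchable only after the inner ones are gone.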

In your case, though, wrapping the substitution in a while loop would never finish: the replacement puts the matched phrase back into the string (inside the span), so the pattern matches again on every pass, with or without the "g" modifier. A single substitution with "g" is enough. It returns the number of replacements it made, so you also don't need the separate if statement. If nothing matches right from the start, that's OK.

So the final result would be:

    $sentence="Overall, the participation of additional RNA binding proteins in controlling beta-F1-ATPase expression.";
    $word="RNA binding proteins";

    if($sentence=~s/(\b$word\b)/<span style="background-color:#E1FF77">$1<\/span>/ig){
        print "matched";
    }
    print "<br> sentence=$sentence<br>";


Hope it helps!
Aug 11 '08 #6
Hi,

I tried it, but it didn't work.

I have paragraphs in a file. I split each paragraph into sentences and then match the word against each sentence.

Here is the code:

    open(FH,"param.txt")|| die "cannot open file\n";

    while(<FH>)
    {
        $content.=$_;
        if($_=~/PMID:(.*)/)
        {
            $content_all{$_}=$content;
            $content="";
            @sentences=split("\n\n",$content_all{$_});
            for($i=0;$i<=$#sentences;$i++)
            {
                print "<br> ***** $sentences[$i] ---> $i <br>";
            }
            &subpassparam($content_all{$_});
        }
    }
    sub subpassparam
    {
        $abs=$_[0];
        #print "<br>abstract passed = $abs <br>";
        @sentences=split("\n\n",$content_all{$_});
        @list=split /[.]\s+\W*/, $sentences[4];
        foreach(@list)
        {
            #print "<br>sentences = $_ <br>";
            if($_=~/\b$word\b/)
            {
                print "<br>yes its matching <br>";
                $_=~s/(\b$word\b)/<span style="background-color:#E1FF77">$1<\/span>/img;
                print "<br>sentences only at list = $_ <br> ";
            }
        }
    }

Here is the input file:

    Pull down experiments identified five novel TDRD3 interacting partners, most of which are potentially methylated RNA binding proteins

    Here we develop the notion that mRNA regulation via RNA binding proteins, or ribonomics, also contributes to post-ischemic TA

    PUF proteins comprise a highly conserved family of sequence-specific RNA binding proteins that regulate target mRNAs via binding directly to their 3'UTRs

    We review here results arising from the systematic functional analysis of Nova, a neuron-specific RNA binding protein targeted in an autoimmune neurological disorder associated with cancer

    A group of RNA binding proteins exerts their roles through the autonomous flowering pathway

    We have previously identified and characterized two novel nuclear RNA binding proteins, p34 and p37, which have been shown to bind 5S rRNA in Trypanosoma brucei

    These mRNAs encode RNA binding proteins, signaling molecules and a replication-independent histone

    Among the observed 3'UTR RNA binding proteins, we have confirmed a 52 kDa protein as the human La autoantigen by using purified recombinant protein and a polyclonal La antibody

    The modulation of mRNA binding proteins, therefore, illuminates a promising approach for the pharmacotherapy of those key pathologies mentioned above and characterized by a posttranscriptional dysregulation.

How do I match the exact word?
Is this approach wrong?
None of these sentences are picked up by the program.
How can I solve this problem?



With regards
Archana
Aug 12 '08 #7
Kelicula
176 Expert 100+
The statement you have:

    while(<FH>){
        1;
    }

will automatically go through each "line" of the input file, loading them one at a time into $_. I'm not sure what this line does:

    if($_=~/PMID:(.*)/){

But this code will find all matches and add the span around them. I assumed you also wanted to match "mRNA binding proteins".

    use diagnostics;
    use warnings;

    my $word = 'RNA binding proteins';

    open(FH, "param.txt") or die "Can't open file: $!";

    while(<FH>){
        if(/$word/){
            # m? also accepts "mRNA"; /i ignores case, /g replaces every match
            $_ =~ s/(\bm?$word\b)/<span style="background-color:#E1FF77">$1<\/span>/img;
        }
        print;    # prints every line, highlighted or not
    }

    close(FH);

If you want to print ONLY the lines that contained a match, try this:

    use diagnostics;
    use warnings;

    my $word = 'RNA binding proteins';

    open(FH, "param.txt") or die "Can't open file: $!";

    while(<FH>){
        if(/$word/){
            $_ =~ s/(\bm?$word\b)/<span style="background-color:#E1FF77">$1<\/span>/img;
            print "Match Found:\n";
            print "$_\n\n";
        }
    }

    close(FH);

Aug 12 '08 #8
Hi,

I have found a few sentences that contain the phrase but that I could not match:

    eukaryotic type KH-domain, typical of the KH-domain type I superfamily of RNA
    binding proteins, and both recombinant and native MOEP19 bind polynucleotides.

    Karyopherinbeta2 (Kap beta2) or transportin imports numerous RNA binding
    proteins into the nucleus.

    many questions remain about how these mechanisms are regulated by RNA binding
    proteins in the environment of differentiated cells and tissues.

The phrase is not matched in these sentences. In such cases, how should I match "RNA binding proteins"?

What conditions should I check for to match this phrase?

With regards
Archana
Aug 13 '08 #9
nithinpes
410 Expert 256MB
That is because a newline separates the words of the phrase. The regex you are using needs to allow for that. To cover all the cases you have mentioned so far, set $word as below:

    $word="RNA( |-|\n)binding( |\n)proteins?";

This does not work if you are reading one line at a time, because the phrase is then split across two reads. You need to change the input record separator so that a whole block of text is read at once. Setting it to the empty string turns on paragraph mode, which reads up to the next blank line:

    open(FH,"param.txt")|| die "cannot open file\n";
    $/ ="";

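
A rough, self-contained sketch of the paragraph-mode idea follows. To keep it runnable without param.txt it reads from an in-memory string, and the sub name `matching_paragraphs` and the sample text are assumptions for this example, not the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: return every paragraph of $text that contains the
# phrase, even when the phrase is broken across a line end.
sub matching_paragraphs {
    my ($text) = @_;
    my $word = qr/\bRNA[\s-]binding\s+proteins?\b/i;   # \s also matches "\n"

    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    local $/ = "";    # paragraph mode: each read returns one blank-line-separated block
    my @hits;
    while (my $para = <$fh>) {
        push @hits, $para if $para =~ $word;
    }
    close $fh;
    return @hits;
}

my $sample = "imports numerous RNA binding\nproteins into the nucleus.\n\nAn unrelated paragraph.\n";
my @found = matching_paragraphs($sample);
print scalar(@found), " paragraph(s) matched\n";
```

Using `local` restores the previous separator when the sub returns, so other reads in the program are not affected. Setting `$/` to `undef` instead would read the whole file in one go.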
Aug 13 '08 #10
nithinpes
410 Expert 256MB
Also, when reading a whole block of text at once, you need the /g modifier to search for the pattern repeatedly in the input string.
The lines:

    while(<FH>){
    if(/$word/){

should be changed to:

    while(<FH>){
    while(/$word/g){

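
As an illustration of what a `while (/.../g)` loop does, the sketch below collects the start position of every match. The sub name `match_offsets` and the sample string are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: list the start offset of every match of $re in $text.
sub match_offsets {
    my ($text, $re) = @_;
    my @offsets;
    # In scalar context, /g resumes searching where the previous match ended,
    # so the loop body runs once per occurrence.
    while ($text =~ /$re/g) {
        push @offsets, $-[0];    # $-[0] holds the start offset of the last match
    }
    return @offsets;
}

my @at = match_offsets("RNA binding proteins and RNA-binding protein",
                       qr/RNA[ -]binding proteins?/i);
print "@at\n";
```

One caution, as far as I know: modifying the string inside such a loop resets the search position, so do the highlighting with a separate `s///g` rather than inside the loop body.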
Aug 13 '08 #11
Hi,

I have one problem now.

So far I have used "RNA binding protein" as an example, but the phrase actually comes from a text box that takes user input.

Here is the code:

    $word=param('query');

Here are the input sentences:

    IRES elements consist of cis-acting RNA structures that often operate in association with specific RNA-binding proteins to recruit the translational machinery.

    Overall, the participation of additional RNA binding proteins in controlling beta-F1-ATPase expression and therefore, in defining the bioenergetic signature of the cancer cell is expected.

    We describe here a complete scaffold-independent analysis of the RNA-binding protein of the four KH domains of KSRP.

    RNA chaperones are non-specific RNA binding protein that help RNA folding by resolving misfolded structures or preventing their formation.

How can I match $word so that all of these sentences are retrieved?

I tried this, but it is not working:

    if($word=~/(.*)[\s\-]?/)

It matches only the last sentence.

But I want to match $word against all of the sentences.

How should I write the regular expression?

With regards
Archana
Aug 14 '08 #12
Kelicula
176 Expert 100+
Just add a "g" for global search.

Try to understand all of this: Regular Expression Tutorial
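
For the user-input case, one possible approach (a sketch only, not something tested against the original CGI script) is to build the pattern from the typed phrase instead of matching against $word itself. The sub name `phrase_regex` and the plural handling are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a user-typed phrase into a tolerant regex.
sub phrase_regex {
    my ($phrase) = @_;
    my @words = grep { length } split /[\s-]+/, $phrase;
    return unless @words;                  # nothing to search for
    $words[-1] =~ s/s\z//i;                # crude: drop a plural "s" here ...
    # quotemeta makes any regex metacharacters in the input match literally;
    # [\s-]+ allows a space, a hyphen or a newline between the words.
    my $body = join '[\s-]+', map { quotemeta($_) } @words;
    return qr/\b${body}s?\b/i;             # ... and make it optional instead
}

my $re = phrase_regex("RNA binding protein");
for my $s ("specific RNA-binding proteins to recruit",
           "non-specific RNA binding\nprotein that help",
           "a DNA binding protein") {
    print(($s =~ $re ? "match" : "no match"), "\n");
}
```

The resulting `$re` can then be used in the substitution in place of `\b$word\b`. If the query is echoed back into the page, it should also be HTML-escaped first.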
Aug 23 '08 #13
