Match 2 words in a line of a file

Hi,

I am pretty new to Python, hence this question.

I have a file with the output of a process. I need to search this file one
line at a time, and I am looking for the lines that have both the word
'event' and the word 'new'.

Note that I need only the lines that have both words, not lines that have
just one of them.

How do I write a regexp for this? Or better yet, should I even be using a
regexp, or is there a better way to do this?

Thanks

Jan 18 '07 #1
Maybe something like this would do:

import re

def lines_with_words(file, word1, word2):
    """Print all lines in file that have both words in it."""
    for line in file:
        # re.escape() keeps characters such as '.' or '(' in the words literal.
        if (re.search(r"\b" + re.escape(word1) + r"\b", line) and
                re.search(r"\b" + re.escape(word2) + r"\b", line)):
            print(line.rstrip())

Just call the function with a file object and two strings that
represent the words that you want to find in each line.

To match a whole word with a regex, you write r"\bWORD\b" (\b matches a
word boundary).

I don't know if there is a better way of doing this, but I believe that
this should at least work.
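For the original question of writing a single regexp, a common trick is one lookahead per word at the start of the pattern, so the two words may appear in either order. This is a sketch, not something from the posts; `has_both` is a hypothetical helper name.

```python
import re

def has_both(line, word1, word2):
    """Return True if line contains word1 and word2 as whole words, in any order."""
    # Each (?=.*\bWORD\b) lookahead scans ahead without consuming input,
    # so both must succeed starting from the beginning of the line.
    pattern = r"^(?=.*\b%s\b)(?=.*\b%s\b)" % (re.escape(word1), re.escape(word2))
    return re.search(pattern, line) is not None

print(has_both("new event received", "event", "new"))  # True
print(has_both("event queue empty", "event", "new"))   # False
print(has_both("renewed events", "event", "new"))      # False
```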

Jan 19 '07 #2
Without using re, this may work (untested ;-):

def lines_with_words(file, word1, word2):
    """Print all lines in file that have both words in it."""
    for line in file:
        words = line.split()  # splits on runs of whitespace only
        if word1 in words and word2 in words:
            print(line.rstrip())
/Jean Brouwers
Jan 19 '07 #3
MrJean1 wrote:
> words = line.split()
> if word1 in words and word2 in words:
This sounds better; it's probably faster than the RE version. Python 2.5
has a really fast str.__contains__ method, done by effbot:

def lines_with_words(file, word1, word2):
    """Print all lines in file that have both words in it.
    (For speed, word1 should be the less frequent of the two words.)"""
    for line in file:
        # Plain substring tests, with no word-boundary check.
        if word1 in line and word2 in line:
            print(line.rstrip())

Bye,
bearophile

Jan 19 '07 #4
I see two potential problems with the non regex solutions.

1) Consider a line: "foo (bar)". When you split it you will only get
two strings, as split by default only splits the string on white space
characters. Thus "'bar' in words" will return false, even though bar is
a word in that line.

2) If you have a line something like this: "foobar hello" then "'foo'
in line" will return true, even though foo is not a word (it is part of
a word).
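Both pitfalls are easy to reproduce in the interpreter. This small check uses the example strings from the post and contrasts split()/in with a \b regex; it is illustrative only.

```python
import re

line1 = "foo (bar)"
line2 = "foobar hello"

# Problem 1: str.split() only splits on whitespace, so "(bar)" stays one token.
print(line1.split())                       # ['foo', '(bar)']
print("bar" in line1.split())              # False

# Problem 2: a substring test also matches inside longer words.
print("foo" in line2)                      # True

# A \b word-boundary regex handles both cases.
print(bool(re.search(r"\bbar\b", line1)))  # True
print(bool(re.search(r"\bfoo\b", line2)))  # False
```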

Jan 19 '07 #5

Here's a solution using re.split:

import re
from io import StringIO

# Split on runs of non-word characters, so punctuation also separates words.
wordsplit = re.compile(r'\W+').split

def matchlines(fh, w1, w2):
    w1 = w1.lower()
    w2 = w2.lower()
    for line in fh:
        words = [x.lower() for x in wordsplit(line)]
        if w1 in words and w2 in words:
            print(line.rstrip())

test = """1st line of text (not matched)
2nd line of words (not matched)
3rd line (Word test) should match (case insensitivity)
4th line simple test of word's (matches)
5th line simple test of words not found (plural words)
6th line tests produce strange words (no match - plural)
7th line "word test" should find this
"""
matchlines(StringIO(test), 'test', 'word')

Jan 19 '07 #6

Rickard Lindberg, yesterday I was sleepy and my solution was wrong.
> 2) If you have a line something like this: "foobar hello" then "'foo'
> in line" will return true, even though foo is not a word (it is part of
> a word).
Right. Now I think the best solution is to use __contains__ (in) to
quickly find the lines that surely contain both substrings, and then, on
those (possibly rare) lines, use a correctly written RE. If the words are
uncommon enough, such a solution may be both fast and reliable.
Running a cheap, rough test first and a slow, reliable one only on the
rare positive results of the first is a technique commonly used in
computer science, and it is often both fast and reliable. (It breaks down
when the first test passes too often, or when it has false negatives.)
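A minimal sketch of that two-stage idea, assuming the lines come from any iterable such as an open file: `in` rejects most lines cheaply, and the \b regex runs only on the survivors. The function name and the generator interface are illustrative choices, not from the thread.

```python
import re

def lines_with_words(lines, word1, word2):
    """Yield lines containing both words as whole words.

    Cheap substring tests reject most lines first; the slower regex
    check only runs on lines that pass both substring tests.
    """
    pat1 = re.compile(r"\b%s\b" % re.escape(word1))
    pat2 = re.compile(r"\b%s\b" % re.escape(word2))
    for line in lines:
        # Fast filter: no false negatives, but possible false positives
        # such as "foo" inside "foobar".
        if word1 in line and word2 in line:
            # Slow, exact check removes the false positives.
            if pat1.search(line) and pat2.search(line):
                yield line.rstrip("\n")

sample = ["new event\n", "renewed events\n", "nothing here\n", "event: new (id 7)\n"]
print(list(lines_with_words(sample, "event", "new")))
# ['new event', 'event: new (id 7)']
```

The substring test can never reject a line the regex would accept, because a whole-word match is also a substring match; that is what keeps the filter free of false negatives.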

There are probably even faster solutions that scan the whole text at once
instead of line by line, but the code becomes too hairy and it's probably
not worth it.
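One way to scan the whole text at once, sketched under the assumption that the file fits in memory, is a single pattern compiled with re.MULTILINE and applied with findall(). `find_lines` is a hypothetical name, and whether this beats the line-by-line loop depends on the data, so time it before relying on it.

```python
import re

def find_lines(text, word1, word2):
    """Return every line of text containing both words as whole words."""
    # re.MULTILINE makes ^ and $ match at each line; '.' does not cross
    # newlines, so every match (and every lookahead) stays within one line.
    pattern = re.compile(
        r"^(?=.*\b%s\b)(?=.*\b%s\b).*$" % (re.escape(word1), re.escape(word2)),
        re.MULTILINE,
    )
    return pattern.findall(text)

text = "new event\nrenewed events\nold event\nevent: new\n"
print(find_lines(text, "event", "new"))  # ['new event', 'event: new']
```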

Bye,
bearophile

Jan 19 '07 #8
On 18 Jan 2007 18:54:59 -0800, "Rickard Lindberg"
<ri******@student.liu.se> wrote:
> I see two potential problems with the non regex solutions.
>
> 1) Consider a line: "foo (bar)". When you split it you will only get
> two strings, as split by default only splits the string on white space
> characters. Thus "'bar' in words" will return false, even though bar is
> a word in that line.
>
> 2) If you have a line something like this: "foobar hello" then "'foo'
> in line" will return true, even though foo is not a word (it is part of
> a word).
1) Depends on how you define a 'word'.

2) This can be resolved with

templine = ' ' + line.strip() + ' '  # strip the newline so the last word is padded too
if ' ' + word1 + ' ' in templine and ' ' + word2 + ' ' in templine:
    print(line.strip())
Dan
Jan 19 '07 #9
Daniel Klein wrote:
> 2) This can be resolved with
>
> templine = ' ' + line + ' '
> if ' ' + word1 + ' ' in templine and ' ' + word2 + ' ' in templine:

But then you will still have a problem matching the word "foo" in a
string like "bar (foo)".
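One way around both the punctuation problem and the "bar (foo)" case is to tokenize each line with re.findall(r"\w+", ...) instead of split() or space padding, then test set membership. This sketch assumes a "word" means any run of letters, digits or underscores (Dan's point 1) and matches case-insensitively, like the re.split version earlier in the thread; `has_both_words` is a hypothetical name.

```python
import re

def has_both_words(line, word1, word2):
    """True if both words appear as word tokens in line (case-insensitive)."""
    # Treat any run of letters, digits or underscores as a word.
    words = set(re.findall(r"\w+", line.lower()))
    return {word1.lower(), word2.lower()} <= words

print(has_both_words("bar (foo)", "foo", "bar"))       # True
print(has_both_words("foobar hello", "foo", "hello"))  # False
```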

Jan 20 '07 #10
