How to match literal backslashes read from a text file using regular expressions?

I'm parsing a text file to extract word definitions. For example the
input text file contains the following content:

di.va.gate \'di_--v*-.ga_-t\ vb
pas.sim \'pas-*m\ adv : here and there : THROUGHOUT

I am trying to obtain the words between two literal backslashes (\ .. \),
but I am not able to match them using the regexp
re.compile(r'\\[^\\]*\\').
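For reference, here is a minimal Python 3 sketch (not from the thread) of how the backslashes in this pattern are read. The r prefix keeps every backslash as typed, and the regex engine then treats each \\ pair as one literal backslash, so the pattern itself is fine. The sample line is copied from the input above. The re.escape() part shows why wrapping the whole pattern in re.escape() does not help: the result matches the pattern's own source text, not the input.

```python
import re

# In a raw string each backslash is kept as typed, so r'\\' is two characters.
pattern = r'\\[^\\]*\\'
assert pattern == '\\\\[^\\\\]*\\\\'  # the same string written without the r prefix

# The regex engine reads each "\\" pair as one literal backslash character.
line = "pas.sim \\'pas-*m\\ adv : here and there : THROUGHOUT"
m = re.search(pattern, line)
print(m.group(0))  # the text from the first backslash to the second, inclusive

# re.escape() is meant for text you want matched literally, not for a regex:
# escaping the pattern makes it match its own source text instead of the input.
escaped = re.escape(pattern)
print(re.search(escaped, line))                 # None
print(re.search(escaped, pattern) is not None)  # True
```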

Here is my sample script:

import re;

#slashPattern = re.compile(re.escape(r'\\[^\\]*\\'));
pattern = r'\\[^\\]*\\'
slashPattern = re.compile(pattern);

fdr = file( "parseinput",'r');
line = fdr.readline();

while (line != ""):
    if (slashPattern.match(line)):
        print line.rstrip() + " <-- matches pattern " + pattern
    else:
        print line.rstrip() + " <-- DOES not match pattern " + pattern
    line = fdr.readline();
print;
----------
The output

C:\home\krishna\lang\python>python wsparsetest.py
python wsparsetest.py
di.va.gate \'di_--v*-.ga_-t\ vb <-- DOES not match pattern \\[^\\]*\\
pas.sim \'pas-*m\ adv : here and there : THROUGHOUT <-- DOES not match pattern \\[^\\]*\\
-----------

What should I be doing to match those literal backslashes?

Thanks

Jul 21 '05 #1
cr*****@gmail.com wrote:
> import re;
Lose the semicolons ...

> #slashPattern = re.compile(re.escape(r'\\[^\\]*\\'));
> pattern = r'\\[^\\]*\\'
> slashPattern = re.compile(pattern);
>
> fdr = file( "parseinput",'r');
> line = fdr.readline();

You should upgrade so that you have a modern Python and a modern
tutor[ial] -- then you will be writing:

for line in fdr:
    do_something_with(line)
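A minimal sketch of the original script in that style, assuming Python 3 (print() is a function and file() no longer exists). To keep it self-contained, io.StringIO holding the two sample lines stands in for the file "parseinput"; with a real file you would write `with open("parseinput") as fdr:` instead. It also uses search() rather than match(), which is the actual fix discussed next.

```python
import io
import re

slash_pattern = re.compile(r'\\[^\\]*\\')

# Stand-in for open("parseinput"); a real file object iterates the same way.
sample = io.StringIO(
    "di.va.gate \\'di_--v*-.ga_-t\\ vb\n"
    "pas.sim \\'pas-*m\\ adv : here and there : THROUGHOUT\n"
)

results = []
with sample as fdr:
    for line in fdr:  # iterate over the file instead of calling readline() in a loop
        line = line.rstrip()
        # search() scans the whole line; match() would only try position 0.
        if slash_pattern.search(line):
            results.append(line + " <-- matches pattern " + slash_pattern.pattern)
        else:
            results.append(line + " <-- DOES not match pattern " + slash_pattern.pattern)

for result in results:
    print(result)
```

The `with` block closes the file automatically, which replaces the explicit readline() bookkeeping of the original loop.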

> while (line != ""):
Lose the extraneous parentheses ...
> if (slashPattern.match(line)):
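Your main problem is that you should be using the search() method, not
the match() method: match() only tries to match at the beginning of the
string, while search() scans the whole string for the first place the
pattern matches. Read the section on this topic in the re docs!!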
>>> import re
>>> pat = re.compile(r'\\[^\\]*\\')
>>> pat.match(r'abcd \xyz\ pqr')
>>> pat.search(r'abcd \xyz\ pqr')
<_sre.SRE_Match object at 0x00AE8988>
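The same experiment as a Python 3 sketch, with the results spelled out. The extra call `pat.match(text, 5)` uses the optional `pos` argument of a compiled pattern's match() to show that match() succeeds once it is started at the backslash, which is exactly the anchoring difference described above.

```python
import re

pat = re.compile(r'\\[^\\]*\\')
text = r'abcd \xyz\ pqr'

# match() is anchored at position 0, where the text starts with "a", not a backslash.
print(pat.match(text))       # None

# search() tries every position, so it finds the backslash-delimited run.
m = pat.search(text)
print(m.group(0), m.span())  # \xyz\ (5, 10)

# Starting match() at index 5 (the first backslash) makes it succeed too.
print(pat.match(text, 5).group(0))
```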


Jul 21 '05 #2
This should give you an idea of how to go about it (needs python 2.3 or
newer):
import re
slashPattern = re.compile(r'\\(.*?)\\')

for i,line in enumerate(file("parseinput")):
    print "line", i+1,
    match = slashPattern.search(line)
    if match:
        print "matched:", match.group(1)
    else:
        print "did not match"

#===== output =======================

line 1 matched: 'di_--v*-.ga_-t
line 2 matched: 'pas-*m

#====================================
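A Python 3 variation on this idea, sketched under a couple of assumptions: the input lines are held in a list instead of being read from "parseinput", and findall() is used so that every backslash-delimited field on a line is returned, not just the first. It swaps the non-greedy `(.*?)` for the negated class `([^\\]*)` from the original question; on these lines both stop at the next backslash, but the negated class can never run across one.

```python
import re

# Group 1 captures only the text between the two backslashes.
slash_pattern = re.compile(r'\\([^\\]*)\\')

lines = [
    "di.va.gate \\'di_--v*-.ga_-t\\ vb",
    "pas.sim \\'pas-*m\\ adv : here and there : THROUGHOUT",
]

for number, line in enumerate(lines, start=1):
    # With one group in the pattern, findall() returns a list of that group's text.
    found = slash_pattern.findall(line)
    if found:
        print("line", number, "matched:", ", ".join(found))
    else:
        print("line", number, "did not match")
```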
George

Jul 21 '05 #3
