regexp question

I want to match several regexps against a large body of text. What I
have so far is similar to this:

re1 = <some regexp>
re2 = <some regexp>
re3 = <some regexp>

big_re = re.compile(re1 + '|' + re2 + '|' + re3)

matches = big_re.finditer(file_list)
for match in matches:
    span = match.span()
    print "matched text =", file_list[span[0]:span[1]]
    print "matched re =", match.re.pattern

Now the "match.re.pattern" is the entire regexp, big_re. But I want
to print out the portion of the big re that was matched -- was it re1?
re2? or re3? Is it possible to determine this, or do I have to make
a second pass through the collection of re's and compare them against
the "matched text" in order to determine which part of the big_re was
matched?

thanks!!

Jul 18 '05 #1
On Fri, 05 Dec 2003 02:26:53 -0000, python_charmer2000 wrote:
> re1 = <some regexp>
> re2 = <some regexp>
> re3 = <some regexp>
>
> big_re = re.compile(re1 + '|' + re2 + '|' + re3)
>
> Now the "match.re.pattern" is the entire regexp, big_re. But I want
> to print out the portion of the big re that was matched -- was it re1?
> re2? or re3? Is it possible to determine this, or do I have to make
> a second pass through the collection of re's and compare them against
> the "matched text" in order to determine which part of the big_re was
> matched?


That second pass will work no matter what your regexes happen to be, and is easily
understood. Implement that, and see if it's fast enough. (Doing
otherwise is known as "premature optimisation" and is a bad practice.)
In fact, it may be better (from a readability standpoint) to simply
compile each of the regexes and match them all each time.
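
A minimal sketch of both simple approaches, written in present-day Python 3 syntax. The three patterns and the names `second_pass` and `match_each` are hypothetical stand-ins chosen for illustration; they are not from the original post.

```python
import re

# Hypothetical stand-ins for re1, re2 and re3 from the question.
PATTERNS = [r"\d+", r"[a-z]+", r"[A-Z]+"]
COMPILED = [re.compile(p) for p in PATTERNS]
BIG_RE = re.compile("|".join(PATTERNS))


def second_pass(text):
    """Find matches with the combined regex, then re-test each piece of
    matched text against the individual regexes to see which one fits."""
    results = []
    for match in BIG_RE.finditer(text):
        matched_text = match.group()
        for pattern in COMPILED:
            if pattern.fullmatch(matched_text):
                results.append((matched_text, pattern.pattern))
                break
    return results


def match_each(text):
    """Skip the combined regex and run every compiled regex over the text."""
    results = []
    for pattern in COMPILED:
        for match in pattern.finditer(text):
            results.append((match.start(), match.group(), pattern.pattern))
    return sorted(results)


for matched_text, source in second_pass("abc 123 XYZ"):
    print("matched text =", matched_text, "| matched re =", source)
```

Two caveats: regexes that depend on surrounding context (anchors, lookahead, lookbehind) may behave differently when re-tested against the isolated matched text, and `match_each` can report overlapping matches that the combined regex would never produce.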

An alternative, if it's not fast enough: Group the regexes and inspect
them with the re.MatchObject.group() method.
>>> import re
>>> regex1 = 'abc'
>>> regex2 = 'def'
>>> regex3 = 'ghi'
>>> big_regex = re.compile(
...     '(' + regex1 + ')'
...     + '|(' + regex2 + ')'
...     + '|(' + regex3 + ')'
...     )
>>> match = re.match( big_regex, 'def' )
>>> match.groups()
(None, 'def', None)
>>> match.group(1)
>>> match.group(2)
'def'
>>> match.group(3)
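
The same grouping idea can be combined with finditer() from the original question. The sketch below is one possible variation, not the method shown above: it assumes each alternative is wrapped in a named group and reads the match object's `lastgroup` attribute to see which one matched. The group names `first`, `second` and `third` and the helper `which_matched` are made up for illustration, and the syntax is present-day Python 3.

```python
import re

# Hypothetical names paired with the three example regexes.
NAMED_PATTERNS = [("first", "abc"), ("second", "def"), ("third", "ghi")]

# Builds (?P<first>abc)|(?P<second>def)|(?P<third>ghi)
big_regex = re.compile(
    "|".join("(?P<%s>%s)" % (name, pattern) for name, pattern in NAMED_PATTERNS)
)


def which_matched(text):
    """Return (matched text, name of the group that matched) for each match."""
    return [(m.group(), m.lastgroup) for m in big_regex.finditer(text)]


for matched_text, name in which_matched("ghi abc def"):
    print("matched text =", matched_text, "| matched re =", name)
```

With unnamed groups, as in the session above, `match.lastindex` plays the same role and gives the group number instead of a name.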

--
\ "As the evening sky faded from a salmon color to a sort of |
`\ flint gray, I thought back to the salmon I caught that morning, |
_o__) and how gray he was, and how I named him Flint." -- Jack Handey |
Ben Finney <http://bignose.squidly.org/>
Jul 18 '05 #2
