I want to match several regexps against a large body of text. What I
have so far is similar to this:
    re1 = <some regexp>
    re2 = <some regexp>
    re3 = <some regexp>
    big_re = re.compile(re1 + '|' + re2 + '|' + re3)
    matches = big_re.finditer(file_list)
    for match in matches:
        span = match.span()
        print "matched text =", file_list[span[0]:span[1]]
        print "matched re =", match.re.pattern
Now the "match.re.pattern" is the entire regexp, big_re. But I want
to print out the portion of the big re that was matched -- was it re1?
re2? or re3? Is it possible to determine this, or do I have to make
a second pass through the collection of re's and compare them against
the "matched text" in order to determine which part of the big_re was
matched?
thanks!!
On Fri, 05 Dec 2003 02:26:53 -0000, python_charmer2000 wrote:
> re1 = <some regexp>
> re2 = <some regexp>
> re3 = <some regexp>
> big_re = re.compile(re1 + '|' + re2 + '|' + re3)
>
> Now the "match.re.pattern" is the entire regexp, big_re. But I want
> to print out the portion of the big re that was matched -- was it re1?
> re2? or re3? Is it possible to determine this, or do I have to make
> a second pass through the collection of re's and compare them against
> the "matched text" in order to determine which part of the big_re was
> matched?

That will work no matter what your regexes happen to be, and is easily
understood. Implement that, and see if it's fast enough. (Doing
otherwise is known as "premature optimisation" and is a bad practice.)
In fact, it may be better (from a readability standpoint) to simply
compile each of the regexes and match them all each time.
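
As a rough illustration of the second-pass idea, here is a minimal sketch written for Python 3. The sub-patterns, the names in the `patterns` dict, and the `which_matched` helper are all made up for the example; they stand in for re1, re2 and re3 and are not from the original post.

```python
import re

# Hypothetical sub-patterns standing in for re1, re2 and re3.
patterns = {
    "re1": r"\d+",
    "re2": r"[a-z]+",
    "re3": r"[A-Z]+",
}

# One combined regex for the first pass, plus each piece compiled on its own.
big_re = re.compile("|".join(patterns.values()))
compiled = {name: re.compile(p) for name, p in patterns.items()}

def which_matched(text):
    """Return (matched_text, pattern_name) pairs for every match in text."""
    results = []
    for match in big_re.finditer(text):
        matched = match.group()
        # Second pass: the first sub-pattern that matches the whole
        # matched text, checked in the same order as the alternation.
        name = next(
            (n for n, c in compiled.items() if c.fullmatch(matched)),
            None,
        )
        results.append((matched, name))
    return results

print(which_matched("abc 123 XYZ"))
```

Checking the sub-patterns in definition order mirrors the left-to-right preference of `|`. Note that the second pass only sees the matched text, so sub-patterns that depend on surrounding context (anchors, lookarounds) may not be identified this way.
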
An alternative, if it's not fast enough: Group the regexes and inspect
them with the re.MatchObject.group() method.

    >>> import re
    >>> regex1 = 'abc'
    >>> regex2 = 'def'
    >>> regex3 = 'ghi'
    >>> big_regex = re.compile(
    ...     '(' + regex1 + ')'
    ...     + '|(' + regex2 + ')'
    ...     + '|(' + regex3 + ')'
    ...     )
    >>> match = re.match( big_regex, 'def' )
    >>> match.groups()
    (None, 'def', None)
    >>> match.group(1)
    >>> match.group(2)
    'def'
    >>> match.group(3)
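
A variation on the grouping approach, sketched here for Python 3 and not part of the original reply: give each group a name and read the match object's `lastgroup` attribute, which reports the name of the last matched capturing group. The group names and the `label_matches` helper are hypothetical, and the sketch assumes the sub-patterns do not define conflicting group names of their own.

```python
import re

# Hypothetical (name, pattern) pairs standing in for regex1..regex3.
patterns = [("regex1", "abc"), ("regex2", "def"), ("regex3", "ghi")]

# Wrap each sub-pattern in a named group: (?P<regex1>abc)|(?P<regex2>def)|...
big_regex = re.compile(
    "|".join("(?P<%s>%s)" % (name, pat) for name, pat in patterns)
)

def label_matches(text):
    """Return (matched_text, group_name) pairs for every match in text."""
    # Only one top-level alternative takes part in each match, so
    # lastgroup is the name of the alternative that matched.
    return [(m.group(), m.lastgroup) for m in big_regex.finditer(text)]

print(label_matches("ghi abc def"))
```

This keeps everything in a single pass with `finditer`, so no second comparison against the individual regexes is needed.
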
--
\ "As the evening sky faded from a salmon color to a sort of |
`\ flint gray, I thought back to the salmon I caught that morning, |
_o__) and how gray he was, and how I named him Flint." -- Jack Handey |
Ben Finney <http://bignose.squidly.org/>