Quicker reg exps?

Hi, I've written a reg exp for capturing a group of numbers from text files in the following format:
-1.4326 s < 0.6758 s < 1.4334 s
Any of the numbers can be positive or negative and the units (s) can change or even be absent. What I wanted was the three numbers (signs included)! Here was the reg exp I used to capture:

  $line =~ m!([ |-]\d+\.\d+)\s+.*?<\s+([ |-]\d+\.\d+)\s+(.*?)<\s+([ |-]\d+\.\d+)!   
 
 

The problem is that this must be used hundreds of thousands of times per file, so speed is an issue! Does anyone have any ideas to make this reg exp faster? I'm not fully aware of which reg exp constructs incur speed penalties.
Thanks!

Sep 24 '08 #1

numberwhun

The only thing I can think of right off (it's still early and my brain is still sleeping) is to make your regex non-greedy where you can.

Having a more exact regular expression is one key to speed. Also, in the beginning of your regex you have the following:

[ |-]

I assume that the gap before the pipe symbol is supposed to be a space, but to a regex, it's just whitespace and not part of the regex. To indicate a space in a regex, you would use \s, not an actual space.

Regards,

Jeff

Sep 24 '08 #2

Ganon11

Jeff,

A space inside a character class (such as the one he has) matches just that - a space. Whitespace is matched literally inside regexes unless the /x modifier is turned on. In other words,

$line =~ /This is a test./;

will correctly match "This is a test." but not "Thisisatest."

C:\Users\Ganon11>perl
while (1) {
   chomp(my $line = <STDIN>);
   if ($line =~ /This is a test./) {
      print "Successful match.\n";
   } else {
      print "No match.\n";
   }
}
^Z
This is a test.
Successful match.
Thisisatest.
No match.
^C

Similarly,

$line =~ /(\w+)[ \t]/;

will match "Dogs ", "Cats ", but not "Mouse".

C:\Users\Ganon11>perl
while (1) {
        chomp(my $line = <STDIN>);
        if ($line =~ /(\w+)[ \t]/) {
                print "Successful match.\n";
        } else {
                print "No match.\n";
        }
}
^Z
Dogs
No match.
Dogs and
Successful match.
Cats
Successful match.
There was a tab in the previous line
Successful match.
Mousenospace
No match.
^C

The special character \s is special only because it matches any kind of whitespace - therefore, I believe \s is equivalent to [ \t\n].
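The two points made in this post can be checked with a short script. This is only a sketch: the variable names are made up for illustration, and the dot in the test pattern is escaped here so it matches a literal full stop. It shows that every character inside [ |-] is literal (space, pipe, or hyphen; the pipe is not alternation), and that the /x modifier is what makes Perl ignore whitespace in a pattern outside of character classes.

```perl
use strict;
use warnings;

# Inside [...] every character is literal: this class matches one space,
# one pipe, or one hyphen. The | is not alternation here.
my $class = qr/[ |-]/;
my @class_hits = map { $_ =~ $class ? 1 : 0 } (' ', '|', '-', '+');
print "@class_hits\n";    # prints "1 1 1 0"

# Without /x, a space in the pattern must match a space in the input.
my $plain = 'This is a test.' =~ /This is a test\./ ? 1 : 0;
my $tight = 'Thisisatest.'    =~ /This is a test\./ ? 1 : 0;

# With /x, whitespace outside a character class is ignored, so the same
# pattern now matches only the run-together text.
my $x_tight = 'Thisisatest.'    =~ /This is a test\./x ? 1 : 0;
my $x_plain = 'This is a test.' =~ /This is a test\./x ? 1 : 0;

print "$plain $tight $x_tight $x_plain\n";    # prints "1 0 1 0"
```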

Sep 24 '08 #3

KevinADC

Try:

$line =~ m/(-?\d+\.\d+)[^<]+<\s+(-?\d+\.\d+)[^<]+<\s+(-?\d+\.\d+)/o;

The "o" on the end might also give a performance boost, but you would have to test the code to see whether that is true for your application.
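One way to do that testing is with the core Benchmark module. The script below is a rough sketch, not a definitive measurement: the sub names are hypothetical, and the sample line assumes positive numbers are padded with a space where the minus sign would go, because the pattern from the question needs a space, pipe, or hyphen directly before each number. Note that /o only affects patterns that interpolate variables, so on a constant pattern like this one it is unlikely to change anything, and it is left out here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed input: positive numbers are padded with a space where the minus
# sign would go, which the original pattern requires.
my $line = '-1.4326 s <  0.6758 s <  1.4334 s';

# Pattern from the question (the third, unused capture group is dropped).
sub match_original {
    my ($text) = @_;
    return $text =~ m!([ |-]\d+\.\d+)\s+.*?<\s+([ |-]\d+\.\d+)\s+.*?<\s+([ |-]\d+\.\d+)!;
}

# KevinADC's pattern, without /o.
sub match_negated {
    my ($text) = @_;
    return $text =~ m/(-?\d+\.\d+)[^<]+<\s+(-?\d+\.\d+)[^<]+<\s+(-?\d+\.\d+)/;
}

# Both should yield the same three numbers once compared numerically.
my @original = map { $_ + 0 } match_original($line);
my @negated  = map { $_ + 0 } match_negated($line);
print "@original\n@negated\n";

# Run each matcher for at least one CPU second and print a comparison table.
cmpthese(-1, {
    original => sub { my @n = match_original($line) },
    negated  => sub { my @n = match_negated($line) },
});
```

The timings depend on the perl version and on what the real lines look like, so it is worth feeding the benchmark a line copied from an actual input file.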

Sep 24 '08 #4

numberwhun

Quote (Ganon11): "The special character \s is special only because it matches any kind of whitespace - therefore, I believe \s is equivalent to [ \t\n]."

Plus, with \s, you can add quantifiers to match zero or more, whereas I believe you would have to include as many spaces as you expect the way he has done it. I was just looking at efficiency, and wasn't aware you could use a literal space like that.

Sep 24 '08 #5

Ganon11

You could use [ \t\n]+ or [ \t\n]* just like \s, it's just faster to write \s+ or \s*. I think.
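A quick check of that claim, as a sketch with made-up variable names: a quantifier attaches to a bracketed class in the same way it attaches to \s, so both substitutions below collapse the same run of a space, a tab, a newline, and another space.

```perl
use strict;
use warnings;

my $text = "a \t\n b";

# A quantifier applies to a bracketed class exactly as it does to \s.
(my $with_class = $text) =~ s/[ \t\n]+/_/g;
(my $with_s     = $text) =~ s/\s+/_/g;

print "$with_class $with_s\n";    # prints "a_b a_b"
```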

Sep 24 '08 #6

KevinADC

\s is actually a character class, not just a metacharacter; it's like \d ([0-9]) or \w ([a-zA-Z0-9_]) and not like \t or \n, which are escapes with only one meaning (tab and newline). Its exact meaning may also vary between older versions of perl and newer ones.

According to the perl 5.10 documentation:

\s matches a whitespace character, the set [\ \t\r\n\f] and others
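The difference from the narrower [ \t\n] class can be seen by testing individual characters. This is a small sketch with hypothetical names; it only covers the five characters named in the documentation quote above, not the "others".

```perl
use strict;
use warnings;

# Candidate characters, keyed by a printable name.
my @chars = (
    [ space    => ' '  ],
    [ tab      => "\t" ],
    [ newline  => "\n" ],
    [ cr       => "\r" ],
    [ formfeed => "\f" ],
);

my (%in_s, %in_class);
for my $pair (@chars) {
    my ($name, $char) = @$pair;

    # Record whether the single character matches each class.
    $in_s{$name}     = $char =~ /\A\s\z/      ? 1 : 0;
    $in_class{$name} = $char =~ /\A[ \t\n]\z/ ? 1 : 0;

    print "$name: \\s=$in_s{$name} [ \\t\\n]=$in_class{$name}\n";
}
```

Carriage return and form feed match \s but not [ \t\n], so the two are interchangeable only when the input is known to contain nothing but spaces, tabs, and newlines.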

Sep 24 '08 #7
