(mostly-)POSIX regular expressions

Hi,

I'm searching for a POSIX 1003.2 compatible regular expression engine.
The Python binding "pregex" by Neal Becker may do the job, but I did
not manage to download it as the original link
ftp://ftp.ctd.comsat.com/pub/
seems dead.

Does any old-timer (<wink>) have a copy of this package?

Cheers,

SB

May 27 '06 #1
Maybe this: http://www.pcre.org/pcre.txt and ctypes might work for you?
(I was surprised to find out that PCRE supported POSIX, but I don't know
which version it supports or how well.)

- Pad

May 28 '06 #2
Very good hint! I wouldn't have found it on my own ...

I have to study the doc, but "THE DFA MATCHING ALGORITHM" may do
what I need. Obviously, I didn't expect the Perl-Compatible Regular
Expressions to implement
"an alternative algorithm, provided by the pcre_dfa_exec() function,
that operates in a different way, and is not Perl-compatible".

Maybe the lib should be renamed to PCREWSO, for:
Perl-compatible regular expressions ... well, sort of :)

Cheers,

SB

May 28 '06 #3

Paddy wrote:
> maybe this: http://www.pcre.org/pcre.txt and ctypes might work for you?


Well, in the end, it doesn't fit. What I need is a "longest match" policy
in patterns like "(a)|(b)|(c)" and NOT a "left-to-right" policy.
Additionally, I need to be able to obtain the matched ("captured")
substring, and PCRE does not allow this in DFA mode.
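
To illustrate the difference, here is a minimal sketch using Python's built-in re module. The pattern and the input string are made up for the example; the point is only that re takes the first alternative that succeeds, not the longest one.

```python
import re

# Hypothetical alternatives: each one is a prefix of the next.
pattern = re.compile(r"(a)|(ab)|(abc)")

m = pattern.match("abc")

# Python's re tries the alternatives left to right and stops at the
# first one that succeeds, so the shortest alternative wins here.
first_match = m.group(0)      # text matched by the whole pattern
winning_group = m.lastindex   # index of the capturing group that matched

# A "longest match" policy would have selected the third alternative.
longest_wanted = max(["a", "ab", "abc"], key=len)

print(first_match, winning_group, longest_wanted)
```

With a "longest match" engine, the same pattern would consume the whole input instead of stopping after the first character.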

Too bad ...

SB

May 28 '06 #4
On 29/05/2006 7:46 AM, Sébastien Boisgérault wrote:
> Paddy wrote:
>> maybe this: http://www.pcre.org/pcre.txt and ctypes might work for you?
>
> Well finally, it doesn't fit. What I need is a "longest match" policy
> in patterns like "(a)|(b)|(c)" and NOT a "left-to-right" policy.
> Additionally, I need to be able to obtain the matched ("captured")
> substring and the PCRE does not allow this in DFA mode.


Perhaps you might like to be somewhat more precise with your
requirements. "POSIX-compliant" made me think of yuckies like [:fubar:]
in character classes :-)

The operands of | are such that the length is not fixed and so you can't
write them in descending length order? Care to tell us some more detail
about those operands?

If those operands are simple strings (LOGICAL|LOGIC|LOG) and you've got
more than a handful of them, try Danny Yoo's ahocorasick module.
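
For a small set of fixed strings, the "descending length order" idea can be sketched with the standard re module alone. This is only an illustration: the helper name longest_first_alternation is made up, and the sketch does not use the ahocorasick module.

```python
import re

def longest_first_alternation(words):
    """Build an alternation in which longer literals are tried first."""
    ordered = sorted(words, key=len, reverse=True)
    # re.escape keeps characters such as "<" or "+" literal.
    return re.compile("|".join(re.escape(w) for w in ordered))

# Shortest alternative first: re stops at the first one that matches.
naive = re.compile("LOG|LOGIC|LOGICAL")
# Same literals, longest first.
ordered = longest_first_alternation(["LOG", "LOGIC", "LOGICAL"])

naive_result = naive.match("LOGICAL").group(0)
ordered_result = ordered.match("LOGICAL").group(0)

print(naive_result, ordered_result)
```

This only works when every operand is a literal string; once the operands are real patterns of variable length, no fixed ordering guarantees the longest match.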

HTH,
John
May 28 '06 #5
John Machin wrote:
> On 29/05/2006 7:46 AM, Sébastien Boisgérault wrote:
>> Paddy wrote:
>>> maybe this: http://www.pcre.org/pcre.txt and ctypes might work for you?
>>
>> Well finally, it doesn't fit. What I need is a "longest match" policy
>> in patterns like "(a)|(b)|(c)" and NOT a "left-to-right" policy.
>> Additionally, I need to be able to obtain the matched ("captured")
>> substring and the PCRE does not allow this in DFA mode.
>
> Perhaps you might like to be somewhat more precise with your
> requirements.

Sure. More on this below.

> "POSIX-compliant" made me think of yuckies like [:fubar:]
> in character classes :-)

Yep. I do not need POSIX *syntax* for regular expressions but POSIX
*semantics*, at least the "leftmost-longest" part (in contrast to the
"first then longest" used in Python, Perl, .NET, etc.)

> The operands of | are such that the length is not fixed and so you can't
> write them in descending length order? Care to tell us some more detail
> about those operands?


Basically, I'd like to use the (excellent) python module SPARK
of John Aycock to build an (extended) C lexer. To do so, I need
to specify the patterns that match my tokens as well as a priority
between them. SPARK then builds a big alternate list of patterns
that begins with the high priority patterns and ends with the low
priority patterns and runs a match.

The problem is that you have to be very careful and specify the
priorities explicitly to get the desired results: "<=" shall be higher
than "<", decimal stuff higher than integer, etc., when most of the time
what you really want is to match the longest pattern ...
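
To make the ordering problem concrete, here is a minimal sketch of a priority-ordered alternation. It is not SPARK's actual code: the token names and the helpers build and first_token are made up for the illustration, and only the standard re module is used.

```python
import re

# Hypothetical token table, highest priority first.
TOKENS = [
    ("LE", r"<="),
    ("LT", r"<"),
    ("FLOAT", r"\d+\.\d+"),
    ("INT", r"\d+"),
]

def build(tokens):
    """Join the patterns into one alternation of named groups."""
    return re.compile("|".join("(?P<%s>%s)" % (name, pat) for name, pat in tokens))

def first_token(regex, text):
    """Return (token name, matched text) for the start of text."""
    m = regex.match(text)
    return (m.lastgroup, m.group(0))

good = build(TOKENS)                  # careful priority order
bad = build(list(reversed(TOKENS)))   # same patterns, wrong order

print(first_token(good, "<= 1"), first_token(bad, "<= 1"))
print(first_token(good, "3.14"), first_token(bad, "3.14"))
```

With the reversed table, "<=" is split after "<" and "3.14" is cut at "3", which is exactly the kind of mistake the explicit priorities are there to prevent.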

Worse, the priority work-around does not work well when you
compare keywords and (other) identifiers. To match "fortune"
as an identifier, you would need to give identifiers a higher
priority than keywords, and that is a problem: "for" would then be
matched as an identifier when it is a keyword.

I can come up with possible work-arounds for the "id vs
keyword" issue, but nothing that really makes me happy ...
Therefore, I was studying the possible replacement of the
Python native regular expression engine with a "POSIX
semantics" regular expression engine that would give the
longest match and save me a lot of extra work ...
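
One possible work-around, sketched below under my own assumptions, is to keep Python's re engine but try every token pattern at the current position and keep the longest match, using the table order only to break ties. The token table and the function longest_token are hypothetical and are not part of SPARK.

```python
import re

# Hypothetical token table; the order only matters for breaking ties.
TOKENS = [
    ("KEYWORD", re.compile(r"for|if|while")),
    ("IDENT", re.compile(r"[A-Za-z_]\w*")),
    ("LE", re.compile(r"<=")),
    ("LT", re.compile(r"<")),
]

def longest_token(text, pos=0):
    """Return (name, lexeme) for the longest match at pos, or None."""
    best = None
    for name, regex in TOKENS:
        m = regex.match(text, pos)
        # Strict ">" keeps the earlier (higher priority) entry on ties.
        if m and (best is None or len(m.group(0)) > len(best[1])):
            best = (name, m.group(0))
    return best

print(longest_token("fortune"))   # identifier wins: it is longer
print(longest_token("for"))       # tie: the keyword is listed first
print(longest_token("<= 1"))
```

This gives the longest match between token patterns only. Inside a single pattern, alternation still follows the first-match rule, so it is not full POSIX semantics.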

I hope it's clearer now :)

Any advice ?

Cheers

SB

May 29 '06 #6
I have a problem with the os.times() command: on different Python
versions, I get different output:

Server1# python
Python 2.3.4 (#1, Feb 2 2005, 11:44:13)
[GCC 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> import os
>>> print os.times()[4]
4880406.62
----------------------------------
Server2% python
Python 2.3.2 (#4, Sep 14 2004, 09:41:45) [C] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> import os
>>> print os.times()[4]
-21464227.74
----------------------------------
Server3% python
Python 2.4.1 (#1, May 16 2005, 15:19:29)
[GCC 4.0.0 20050512 (Red Hat 4.0.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> import os
>>> print os.times()[4]
18390711.21

On all 3 servers, the date command returns the same value.

Any suggestions?
May 29 '06 #7
