need help of regular expression genius

GHUM

I need to split a text at every ; (semicolon), but not at semicolons
which are "escaped" within a pair of $$ or $_$ signs.

My guess was that something along these lines should happen within csv.py;
but ... it is done within _csv.c :(

Example: the SQL text should be split at "<split here>" (of course,
those "split heres" are not there yet :)

set interval 2;
<split here>
CREATE FUNCTION uoibcachebetrkd(bigint, text, text, text, text, text,
timestamp without time zone, text, text) RETURNS integer
AS $_$
DECLARE
result int4;
BEGIN
update bcachebetrkd set
name=$2, wieoftjds=$3, letztejds=$4, njds=$5,
konzern=$6, letztespeicherung=$7, betreuera=$8, jdsueberkonzern=$9
where id_p=$1;
IF FOUND THEN
result:=-1;
else
insert into bcachebetrkd (
id_p, name, wieoftjds, letztejds, njds, konzern,
letztespeicherung, betreuera, jdsueberkonzern
)
values ($1, $2, $3, $4, $5, $6, $7, $8, $9);
result:=$1;
END IF;
RETURN result;
END;
$_$
LANGUAGE plpgsql;
<split here>
CREATE FUNCTION set_quarant(mylvlquarant integer) RETURNS integer
AS $$
BEGIN
perform relname from pg_class
where relname = 'quara_tmp'
and case when has_schema_privilege(relnamespace, 'USAGE')
then pg_table_is_visible(oid) else false end;
if not found then
create temporary table quara_tmp (
lvlquara integer
);
else
delete from quara_tmp;
end if;

insert into quara_tmp values (mylvlquarant);
return 0;
END;
$$
LANGUAGE plpgsql;
<split here>

Can anybody point me in the right direction: what does an RE look like
for "all ; but not those ; within $$"?

Harald

Aug 2 '06 #1

Ant

GHUM wrote:

> I need to split a text at every ; (semicolon), but not at semicolons
> which are "escaped" within a pair of $$ or $_$ signs.

Looking at your example SQL code, it probably isn't possible with
regexes. Consider the code:

$$
blah blah
....
$$
blah;
<split here>
xxx
$$
blah
blah
$$

Regexes aren't clever enough to count the number of backreferences, and
so won't help in the above case. You'd be better off creating a custom
parser using a stack or counter of some sort to decide whether or not
to split the text.
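
A minimal sketch of such a hand-written splitter, assuming the blocks are
never nested and always closed by the same delimiter that opened them. The
function name split_sql and its delimiters parameter are made up for this
illustration; they do not come from any library.

```python
def split_sql(text, delimiters=("$_$", "$$")):
    """Split text after every ';' that is not inside a $$ or $_$ block."""
    statements = []
    current_start = 0   # index where the current statement begins
    open_delim = None   # delimiter of the block we are inside, if any
    i = 0
    while i < len(text):
        if open_delim is not None:
            # Inside a quoted block: only the matching delimiter ends it.
            if text.startswith(open_delim, i):
                i += len(open_delim)
                open_delim = None
            else:
                i += 1
            continue
        for delim in delimiters:
            if text.startswith(delim, i):
                # A block opens here; remember which delimiter closes it.
                open_delim = delim
                i += len(delim)
                break
        else:
            # Ordinary character outside any block.
            if text[i] == ";":
                statements.append(text[current_start:i + 1])
                current_start = i + 1
            i += 1
    # Keep trailing text only if it is more than whitespace.
    tail = text[current_start:]
    if tail.strip():
        statements.append(tail)
    return statements
```

Each returned piece keeps its terminating semicolon, so split_sql(s) on
the sample from the first post should give one entry per statement.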

Aug 2 '06 #2

Anthra Norell

Harald,

This works. 's' is your SQL sample.

>>> import SE # From the Cheese Shop with a good manual
>>> Split_Marker = SE.SE (' ";=\<split here>" "~\$_?\$(.|\n)*?\$_?\$~==" ')
>>> s_with_split_marks = Split_Marker (s)
>>> s_split = s_with_split_marks.split ('<split here>')

That's it! And it isn't as complicated as it looks. The first expression
says: translate the semicolon to your split mark. The second expression
finds the $-blocks and says: translate them to themselves, so they don't
change. You can add as many expressions as you want. You'd probably want
to choose a more convenient split mark.
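
For readers without SE, the same two-expression idea can be sketched with
the standard re module. This is not SE's API, only an approximation of the
idea: the names TOKEN, mark_splits and split_statements are invented for
this example. It also differs in two details: it keeps the semicolon in
front of the marker, and it uses a backreference so that a block opened
with $$ is only closed by $$ (likewise for $_$).

```python
import re

# One alternation: a whole $$...$$ or $_$...$_$ block, or a single semicolon.
# The block alternative comes first, so it swallows any semicolons inside it.
TOKEN = re.compile(r"(\$_?\$).*?\1|;", re.DOTALL)


def mark_splits(sql, marker="<split here>"):
    """Append marker to every semicolon that is outside a $-block."""
    def replace(match):
        if match.group(0) == ";":
            return ";" + marker
        # A $-block: translate it to itself, so it does not change.
        return match.group(0)
    return TOKEN.sub(replace, sql)


def split_statements(sql, marker="<split here>"):
    """Return the statements, dropping pieces that are only whitespace."""
    marked = mark_splits(sql, marker)
    return [part for part in marked.split(marker) if part.strip()]
```

The marker must be a string that cannot occur in the SQL itself, which is
the same caveat as with the SE version.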

Frederic

Aug 2 '06 #3

Paul McGuire

"GHUM" <ha**************@gmail.com> wrote in message
news:11**********************@i42g2000cwa.googlegroups.com...

> I need to split a text at every ; (semicolon), but not at semicolons
> which are "escaped" within a pair of $$ or $_$ signs.

The pyparsing rendition of this looks very similar to the SE solution,
except for the regexps:

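text = """ ... input source text ... """

from pyparsing import SkipTo, Literal, replaceWith

# Quoted blocks are matched first, so semicolons inside them are skipped
ign1 = "$$" + SkipTo("$$") + "$$"
ign2 = "$_$" + SkipTo("$_$") + "$_$"
# A semicolon outside a block is replaced by itself plus a split marker
semi = Literal(";").setParseAction(replaceWith("; <***>"))
print((ign1 | ign2 | semi).transformString(text))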

In concept, this works just like the SE program: as the scanner/parser scans
through the input text, the ignorable expressions are looked for first, and
if found, just skipped over. If the semicolon expression is found, then its
parse action is executed, which replaces the ';' with "; <***>", or whatever
you choose.

The pyparsing wiki is at pyparsing.wikispaces.com.

-- Paul

Aug 3 '06 #4

GHUM

Paul,

> text = """ ... input source text ... """
> from pyparsing import SkipTo,Literal,replaceWith
> ign1 = "$$" + SkipTo("$$") + "$$"
> ign2 = "$_$" + SkipTo("$_$") + "$_$"
> semi = Literal(";").setParseAction( replaceWith("; <***>") )
> print (ign1 | ign2 | semi).transformString(text)

Thank you very much! This really looks beautiful and short! How could
I forget about pyparsing? Old loves are often better than adventures
with RE. :)

Two questions remain:
1) I did not succeed in finding documentation for pyparsing. Is there
something like a "full list of classes and their methods"?

2) for lack of 1) :)): is there something like
"setParseAction(splithereandreturnalistofelementssplittedhere)"?

Thanks again!

Harald

(of course, I can .split("<***>") the transformedString :)

Aug 3 '06 #5

Paul McGuire

"GHUM" <ha**************@gmail.com> wrote in message
news:11**********************@s13g2000cwa.googlegroups.com...

> Paul,
>
> > text = """ ... input source text ... """
> > from pyparsing import SkipTo,Literal,replaceWith
> > ign1 = "$$" + SkipTo("$$") + "$$"
> > ign2 = "$_$" + SkipTo("$_$") + "$_$"
> > semi = Literal(";").setParseAction( replaceWith("; <***>") )
> > print (ign1 | ign2 | semi).transformString(text)
>
> Thank you very much! This really looks beautiful and short! How could
> I forget about pyparsing? Old loves are often better than adventures
> with RE. :)

Good to hear from you again, Harald! I didn't recognize your "From"
address, but when I looked into the details, I recognized your name from
when we talked about some very early incarnations of pyparsing.

> Two questions remain:
> 1) I did not succeed in finding documentation for pyparsing. Is there
> something like a "full list of classes and their methods"?

Pyparsing ships with JPG and PNG files containing class diagrams, plus an
htmldoc directory containing epydoc-generated help files.
There are also about 20 example programs included (also accessible in the
wiki).

> 2) for lack of 1) :)): is there something like
> "setParseAction(splithereandreturnalistofelementssplittedhere)"?

I briefly considered what this grammar might look like, and rejected it as
much too complicated compared to .split("<***>"). You could also look into
using scanString instead of transformString (scanString reports the location
within the string of the matched text). Then when matching on a ";", use
the match location to help slice up the string and append to a list. But
again, this is so much more complicated than just .split("<***>"), I
wouldn't bother other than as an exercise in learning scanString.

Good luck!
-- Paul

Aug 3 '06 #6

GHUM

Paul,

> Pyparsing ships with JPG and PNG files containing class diagrams, plus an
> htmldoc directory containing epydoc-generated help files.
> There are also about 20 example programs included (also accessible in the
> wiki).

Yes. That's what I have been missing. Maybe you could add: "please also
download the .zip file if you use the windows installer to find the
documentation" :)))

> You could also look into using scanString instead of transformString

That's what I found:

    from pyparsing import SkipTo, Literal

    # txt holds the SQL source text
    ign1 = "$$" + SkipTo("$$") + "$$"
    ign2 = "$_$" + SkipTo("$_$") + "$_$"
    semi = Literal(";")

    von = 0        # start of the current statement
    befehle = []   # the collected statements
    for row in (ign1 | ign2 | semi).scanString(txt):
        if row[0][0] == ";":
            # scanString yields (tokens, start, end) for every match
            token, bis, von2 = row
            befehle.append(txt[von:von2])
            von = von2

I knew that for this common kind of problem there MUST be a better
solution than my homebrewed tokenizer (skimming through the text char by
char and remembering the switch to escape mode ... brrrrrr, looked like
Perl).

Thanks for the reminder of pyparsing. Maybe I should put a reminder
in my calendar ... something along the lines of "if you think of using an
RE, you have probably forgotten pyparsing" every 3 months :)))))

Best wishes and thank you very much for pyparsing and the hint

Harald

Aug 3 '06 #7
