split on blank lines

Hi everyone,

can somebody tell me why (using Python 2.3.2)

>>> import re
>>> re.compile(r"^$", re.MULTILINE).split("foo\n\nbar\n\nbaz")
['foo\n\nbar\n\nbaz']

? Being used to Perl semantics, I expect

['foo\n', 'bar\n', 'baz']

or something equivalent without the '\n' characters in the result
strings. I have found that
>>> re.compile(r"^\n", re.MULTILINE).split("foo\n\nbar\n\nbaz")

['foo\n', 'bar\n', 'baz']

I prefer the first version, however, because my intent is stated more
clearly. Could this be a bug in sre.py? (I looked at the code for a
good two minutes, but then my head started hurting.)

Thanks for your help,

Jan

Jul 18 '05 #1

Duncan Booth

jb****@hotmail.com (Jan Burgy) wrote in
news:80**************************@posting.google.com:

> can somebody tell me why (using Python 2.3.2)
>
> >>> import re
> >>> re.compile(r"^$", re.MULTILINE).split("foo\n\nbar\n\nbaz")
> ['foo\n\nbar\n\nbaz']
>
> ? Being used to Perl semantics, I expect
>
> ['foo\n', 'bar\n', 'baz']
>
> or something equivalent without the '\n' characters in the result
> strings. I have found that
>
> >>> re.compile(r"^\n", re.MULTILINE).split("foo\n\nbar\n\nbaz")
> ['foo\n', 'bar\n', 'baz']
>
> I prefer the first version however because my intent is stated more
> clearly. Could this be a bug in sre.py (I looked at the code for a
> good two minutes but then my head started hurting)

Given that re.compile("^$", re.MULTILINE).findall("foo\n\nbar\n\nbaz")
returns ['', ''] I would agree this looks like a bug. You could submit a
bug report on Sourceforge.
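Duncan's findall observation can be made concrete with finditer, which reports where each zero-width match of "^$" falls. The helper below cuts the string at those positions by hand, so it does not depend on whether re.split honours empty matches. It is a minimal sketch assuming Python 3; `split_on_matches` is a hypothetical name, not part of the re module.

```python
import re


def split_on_matches(pattern, text):
    """Cut text at every match of pattern, including zero-width matches.

    Hypothetical helper for illustration; not part of the re module.
    """
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Keep everything between the previous match and this one.
        pieces.append(text[last:match.start()])
        last = match.end()
    pieces.append(text[last:])
    return pieces


empty_line = re.compile(r"^$", re.MULTILINE)
text = "foo\n\nbar\n\nbaz"

print(empty_line.findall(text))                        # ['', '']
print([m.start() for m in empty_line.finditer(text)])  # [4, 9]
print(split_on_matches(empty_line, text))              # ['foo\n', '\nbar\n', '\nbaz']
```

Because each match is zero-width, the newline that forms a blank line stays at the start of the following piece, which is why the "^\n" pattern from the original post gives tidier output.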

Of course, if you really want to state your intentions, you could just use:

>>> "foo\n\nbar\n\nbaz".split('\n\n')

['foo', 'bar', 'baz']

as you aren't doing anything here that obviously benefits from regex
obfuscation.
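One caveat worth checking with the plain str.split approach: it splits on exactly two newlines, so a record separated by more than one blank line leaves a stray newline behind. Splitting on a run of two or more newlines (an alternative suggested here, not in the thread) treats the whole run as one separator:

```python
import re

text = "foo\n\n\nbar\n\nbaz"  # two blank lines between foo and bar

# str.split treats "\n\n" literally, so the third newline is left over.
print(text.split("\n\n"))         # ['foo', '\nbar', 'baz']

# A regex run of two or more newlines is consumed as a single separator.
print(re.split(r"\n{2,}", text))  # ['foo', 'bar', 'baz']
```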

--
Duncan Booth du****@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

Jul 18 '05 #2

Hans Nowak

Duncan Booth wrote:

> Given that re.compile("^$", re.MULTILINE).findall("foo\n\nbar\n\nbaz")
> returns ['', ''] I would agree this looks like a bug. You could submit a
> bug report on Sourceforge.

I may be wrong, but I would think that the behavior is correct. "^$" matches an
empty line. This is exactly what findall returns... two empty lines.

--
Hans (ha**@zephyrfalcon.org)
http://zephyrfalcon.org/

Jul 18 '05 #3

Duncan Booth

Hans Nowak <ha**@zephyrfalcon.org> wrote in
news:ma*************************************@python.org:

> Duncan Booth wrote:
> > Given that re.compile("^$",
> > re.MULTILINE).findall("foo\n\nbar\n\nbaz") returns ['', ''] I would
> > agree this looks like a bug. You could submit a bug report on
> > Sourceforge.
>
> I may be wrong, but I would think that the behavior is correct. "^$"
> matches an empty line. This is exactly what findall returns... two
> empty lines.

Perhaps you trimmed too much of the original context, but you have
misunderstood the original poster's intent.

The original post said:
> can somebody tell me why (using Python 2.3.2)
>
> >>> import re
> >>> re.compile(r"^$", re.MULTILINE).split("foo\n\nbar\n\nbaz")
> ['foo\n\nbar\n\nbaz']

Notice that the string they are splitting contains two empty lines. I
pointed out that re.findall correctly spots the two empty lines, and
therefore you would expect split to split the string at those same
points, but it doesn't.

For the avoidance of doubt: there is an inconsistency of behaviour between
re.findall and re.split. It looks to me like a bug in the re.split method.
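Some context that may explain the inconsistency: the Python 2 re documentation (at least in later 2.x releases) states that split will never split a string on an empty pattern match, and "^$" can only ever match the empty string. Python 3.5 and 3.6 rejected patterns that can only match an empty string, and Python 3.7 changed re.split to split on empty matches. A quick check, assuming Python 3.7 or later:

```python
import re

# Assumes Python 3.7+; Python 2.x ignores this split and 3.5/3.6 reject it.
text = "foo\n\nbar\n\nbaz"

# The zero-width "^$" matches now split the string; the newline forming
# each blank line ends up at the start of the following piece.
by_empty_match = re.split(r"^$", text, flags=re.MULTILINE)
print(by_empty_match)  # ['foo\n', '\nbar\n', '\nbaz']

# The consuming "^\n" pattern from the original post gives cleaner pieces.
by_newline = re.split(r"^\n", text, flags=re.MULTILINE)
print(by_newline)      # ['foo\n', 'bar\n', 'baz']
```

So on a current interpreter the "^$" form does split, but the "^\n" form is still the one that produces the output the original poster expected.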

--
Duncan Booth du****@rcp.co.uk

Jul 18 '05 #4

Jan Burgy

Duncan Booth <du****@NOSPAMrcp.co.uk> wrote in message news:<Xn***************************@127.0.0.1>...

> jb****@hotmail.com (Jan Burgy) wrote in
> news:80**************************@posting.google.com:
> > can somebody tell me why (using Python 2.3.2)
> >
> > >>> import re
> > >>> re.compile(r"^$", re.MULTILINE).split("foo\n\nbar\n\nbaz")
> > ['foo\n\nbar\n\nbaz']
> >
> > ? Being used to Perl semantics, I expect
> >
> > ['foo\n', 'bar\n', 'baz']
> >
> > or something equivalent without the '\n' characters in the result
> > strings. I have found that
> >
> > >>> re.compile(r"^\n", re.MULTILINE).split("foo\n\nbar\n\nbaz")
> > ['foo\n', 'bar\n', 'baz']
> >
> > I prefer the first version however because my intent is stated more
> > clearly. Could this be a bug in sre.py (I looked at the code for a
> > good two minutes but then my head started hurting)
>
> Given that re.compile("^$", re.MULTILINE).findall("foo\n\nbar\n\nbaz")
> returns ['', ''] I would agree this looks like a bug. You could submit a
> bug report on Sourceforge.
>
> Of course, if you really want to state your intentions, you could just use:
>
> >>> "foo\n\nbar\n\nbaz".split('\n\n')
> ['foo', 'bar', 'baz']
>
> as you aren't doing anything here that obviously benefits from regex
> obfuscation.

Thank you, Duncan, for your input. You're right, I will post a bug
report on SourceForge. Why, you ask, do I split on "^$" and not simply
"\n\n"? Simply because I'm dealing with an idiotic file format (not my
own, mind you) and I really want to split on "^\t*$" (I agree it's a
rather arbitrary definition of a blank line; once again, not mine).
When the above didn't work, I spent a long time questioning my
understanding of regular expressions until I had simplified my code to
the minimum that still reproduced the error. Sometimes I wish that
Python contained more elements from AWK (in particular, its "RS"
record separator).
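For the actual requirement, where a blank line contains only tabs ("^\t*$"), a pattern that consumes the blank lines avoids the empty-match question on any Python version, and a line-oriented reader gives something close to AWK's paragraph mode (RS=""). Both `split_records` and `paragraphs` are hypothetical helpers written for the definition stated above: lines containing spaces are not treated as blank, and unlike AWK the regex version does not discard leading or trailing blank lines.

```python
import io
import itertools
import re

# The newline ending a record, then one or more tab-only lines.
BLANK_RUN = re.compile(r"\n(?:\t*\n)+")


def split_records(text):
    """Split text on runs of tab-only lines (hypothetical helper)."""
    return BLANK_RUN.split(text)


def is_blank(line):
    """A line is blank if it holds nothing but tabs before its newline."""
    return line.rstrip("\n").strip("\t") == ""


def paragraphs(lines):
    """Yield groups of non-blank lines, loosely like AWK with RS=""."""
    for blank, group in itertools.groupby(lines, key=is_blank):
        if not blank:
            yield "".join(group)


sample = "foo\n\t\nbar\n\n\t\t\nbaz\n"
print(split_records(sample))                  # ['foo', 'bar', 'baz\n']
print(list(paragraphs(io.StringIO(sample))))  # ['foo\n', 'bar\n', 'baz\n']
```

The generator version works on any iterable of lines, so an open file object can be passed directly instead of reading the whole file into one string.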

Cheers,

Jan

--
Being an actuary is a lot harder than being a mathematician: it is
enough for a mathematician to prove that he or she is right.

Jul 18 '05 #5
