Is there a better/simpler way to filter blank lines?

I'm parsing some text files, and I want to strip blank lines in the
process. Is there a simpler way to do this than what I have here?

lines = filter(lambda line: len(line.strip()) > 0, lines)

Thomas
Nov 4 '08 #1
tmallen:
> I'm parsing some text files, and I want to strip blank lines in the
> process. Is there a simpler way to do this than what I have here?
> lines = filter(lambda line: len(line.strip()) > 0, lines)

xlines = (line for line in open(filename) if line.strip())

Bye,
bearophile
Nov 4 '08 #2
be************@lycos.com wrote:
> tmallen:
>> I'm parsing some text files, and I want to strip blank lines in the
>> process. Is there a simpler way to do this than what I have here?
>> lines = filter(lambda line: len(line.strip()) > 0, lines)
>
> xlines = (line for line in open(filename) if line.strip())
>
> Bye,
> bearophile

Or if you want to filter/loop at the same time, or if you don't want all the
lines in memory at the same time:

fp = open(filename, 'r')
for line in fp:
    if not line.strip():
        continue

    #
    # Do something with the non-blank line:
    #
fp.close()

-Larry
Nov 4 '08 #3
On Nov 4, 4:30 pm, bearophileH...@lycos.com wrote:
> tmallen:
>> I'm parsing some text files, and I want to strip blank lines in the
>> process. Is there a simpler way to do this than what I have here?
>> lines = filter(lambda line: len(line.strip()) > 0, lines)
>
> xlines = (line for line in open(filename) if line.strip())
>
> Bye,
> bearophile

I must be missing something:

>>> xlines = (line for line in open("new.data") if line.strip())
>>> xlines
<generator object at 0x6b648>
>>> xlines.sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'generator' object has no attribute 'sort'

What do you think?

Thomas
Nov 4 '08 #4
On Tue, 04 Nov 2008 13:27:00 -0800, tmallen wrote:
> I'm parsing some text files, and I want to strip blank lines in the
> process. Is there a simpler way to do this than what I have here?
>
> lines = filter(lambda line: len(line.strip()) > 0, lines)
>
> Thomas

lines = filter(lambda line: line.strip(), lines)
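
For comparison, here is a minimal sketch of the filter-based variants side by side. It assumes Python 3, where filter() returns a lazy iterator rather than a list, so each call is wrapped in list(). The sample_lines data is invented for illustration and stands in for the lines read from a file.

```python
# Hypothetical sample data; in the thread, `lines` comes from a text file.
sample_lines = ["first\n", "\n", "   \n", "second\n", "\t\n", "third"]

# The version from the question, with the comparison operator in place.
verbose = list(filter(lambda line: len(line.strip()) > 0, sample_lines))

# Shorter: an empty string is falsy, so the result of strip() is enough.
concise = list(filter(lambda line: line.strip(), sample_lines))

# The same test without a lambda, passing the str.strip method directly.
no_lambda = list(filter(str.strip, sample_lines))

print(concise)
```

All three keep the original lines, trailing newlines included; strip() is only used for the test, not to modify what is kept.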
--
Steven
Nov 4 '08 #5
On Tue, Nov 4, 2008 at 2:30 PM, tmallen <th**********@gmail.com> wrote:
> On Nov 4, 4:30 pm, bearophileH...@lycos.com wrote:
>> tmallen:
>>> I'm parsing some text files, and I want to strip blank lines in the
>>> process. Is there a simpler way to do this than what I have here?
>>> lines = filter(lambda line: len(line.strip()) > 0, lines)
>>
>> xlines = (line for line in open(filename) if line.strip())
>>
>> Bye,
>> bearophile
>
> I must be missing something:
> >>> xlines = (line for line in open("new.data") if line.strip())
> >>> xlines
> <generator object at 0x6b648>
> >>> xlines.sort()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'generator' object has no attribute 'sort'
>
> What do you think?

xlines is a generator, not a list. If you don't know what a generator
is, see the relevant parts of the Python tutorial/manual (Google is
your friend).
To sort the generator, you can use 'sorted(xlines)'
If you need it to actually be a list, you can do 'list(xlines)'
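
A small sketch of both suggestions, assuming Python 3. The io.StringIO object is only a stand-in for open("new.data"), and its contents are invented for illustration.

```python
import io

# Stand-in for open("new.data"); the contents are made up for this example.
fake_file = io.StringIO("pear\n\napple\n   \nmango\n")

xlines = (line for line in fake_file if line.strip())

# sorted() accepts any iterable, including a generator, and returns a new list.
sorted_lines = sorted(xlines)

# The generator has now been consumed, so a second pass finds nothing left.
leftover = list(xlines)

print(sorted_lines)
print(leftover)
```

One thing to watch for: a generator can only be walked once, so call sorted() or list() on it a single time and keep the result.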

Cheers,
Chris
--
Follow the path of the Iguana...
http://rebertia.com
Nov 4 '08 #6
Larry Bates <la*********@vitalEsafe.com> writes:
> be************@lycos.com wrote:
>> xlines = (line for line in open(filename) if line.strip())
>
> Or if you want to filter/loop at the same time, or if you don't want
> all the lines in memory at the same time

The above implementation creates a generator; so it, too, won't need
to load all the lines in memory at the same time.
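
To make the laziness visible, here is a rough sketch in Python 3. The source() helper and the pulled list are hypothetical names introduced only to record how many input lines have been read so far; they are not part of anyone's code in this thread.

```python
pulled = []  # records every line the generator expression has asked for

def source():
    """Hypothetical stand-in for a file: yields lines and logs each one."""
    for line in ["one\n", "\n", "two\n", "three\n"]:
        pulled.append(line)
        yield line

xlines = (line for line in source() if line.strip())

first = next(xlines)   # reads only "one\n"
second = next(xlines)  # skips the blank line, then reads "two\n"

print(pulled)  # "three\n" has not been read yet
```

Nothing is read until a line is requested, and the last line stays unread until the loop (or next()) asks for it.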

--
\ “Program testing can be a very effective way to show the |
`\ presence of bugs, but is hopelessly inadequate for showing |
_o__) their absence.” —Edsger W. Dijkstra |
Ben Finney
Nov 4 '08 #7
tmallen <th**********@gmail.com> writes:
> On Nov 4, 4:30 pm, bearophileH...@lycos.com wrote:
>> xlines = (line for line in open(filename) if line.strip())
>
> I must be missing something:
> >>> xlines = (line for line in open("new.data") if line.strip())
> >>> xlines
> <generator object at 0x6b648>

A generator <URL:http://www.python.org/dev/peps/pep-0255> produces a
sequence of items, but is not a collection. It will generate each item on
request, rather than having them all in memory at once.

for line in xlines:
    do_something_knowing_the_line_is_not_blank(line)

If you later *want* a collection containing all the items from the
generator, you can feed the generator (or any iterable) to a type that
can turn it into a collection. For example, to get all the filtered
lines as a list:

all_lines = list(xlines)

Note that some generators (not this one, which will end because the
file is finite size) never end, so feeding them to a constructor this
way will never return.
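
As an illustration of that caveat, a minimal sketch assuming Python 3. The naturals() generator is a hypothetical example of a never-ending generator; itertools.islice is one standard way to take a bounded slice of it instead of feeding it to list() directly.

```python
import itertools

def naturals():
    """Yield 0, 1, 2, ... without end (hypothetical helper for illustration)."""
    n = 0
    while True:
        yield n
        n += 1

# list(naturals()) would never return; islice stops after a fixed count.
first_five = list(itertools.islice(naturals(), 5))

print(first_five)
```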

--
\ “It is far better to grasp the universe as it really is than to |
`\ persist in delusion, however satisfying and reassuring.” —Carl |
_o__) Sagan |
Ben Finney
Nov 4 '08 #8
tmallen:
> I must be missing something:
> >>> xlines = (line for line in open("new.data") if line.strip())
> >>> xlines
> <generator object at 0x6b648>
> >>> xlines.sort()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'generator' object has no attribute 'sort'
>
> What do you think?

Congratulations, you have just met your first lazy construct ^_^
That's a generator, it yields nonblank lines one after the other. This
can be really useful.
If you want a real array of items, then you can do this:
lines = list(xlines)
Or use a list comp.:
lines = [line for line in open("new.data") if line.strip()]

Bye,
bearophile
Nov 4 '08 #9
On Nov 4, 3:30 pm, tmallen <thomasmal...@gmail.com> wrote:
> On Nov 4, 4:30 pm, bearophileH...@lycos.com wrote:
>> tmallen:
>>> I'm parsing some text files, and I want to strip blank lines in the
>>> process. Is there a simpler way to do this than what I have here?
>>> lines = filter(lambda line: len(line.strip()) > 0, lines)
>>
>> xlines = (line for line in open(filename) if line.strip())
>>
>> Bye,
>> bearophile
>
> I must be missing something:
> >>> xlines = (line for line in open("new.data") if line.strip())
> >>> xlines
> <generator object at 0x6b648>
> >>> xlines.sort()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'generator' object has no attribute 'sort'
>
> What do you think?
>
> Thomas

Using the surrounding parentheses creates a generator object, whereas
using square brackets would create a list. So, if you want to run list
operations on the resulting object, you'll want to use the list
comprehension instead.

i.e.

list_o_lines = [line for line in open(filename) if line.strip()]

Downside is the increased memory usage and processing time as you dump
the entire file into memory, whereas if you plan to do a "for line in
xlines:" operation, it would be faster to use the generator.
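
A quick sketch of the difference, assuming Python 3. The source list is a hypothetical in-memory stand-in for the lines of a file.

```python
import types

# Hypothetical stand-in for the lines of a file.
source = ["b\n", "\n", "a\n"]

as_generator = (line for line in source if line.strip())
as_list = [line for line in source if line.strip()]

# Only the list supports in-place list operations such as sort().
as_list.sort()

generator_has_sort = hasattr(as_generator, "sort")
is_generator = isinstance(as_generator, types.GeneratorType)

print(as_list, generator_has_sort, is_generator)
```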
Nov 4 '08 #10
Between this info and http://www.python.org/doc/2.5.2/tut/...00000000000000
, I'm starting to understand how I'll use generators (I've seen them
mentioned before, but never used them knowingly).
> list_o_lines = [line for line in open(filename) if line.strip()]
+1 for "list_o_lines"

Thanks for the help!
Thomas

Nov 4 '08 #11
Falcolas <ga******@gmail.com> writes:
> Using the surrounding parentheses creates a generator object

No. Using the generator expression syntax creates a generator object.

Parentheses are irrelevant to whether the expression is a generator
expression. The parentheses merely group the expression from
surrounding syntax.

--
\ “bash awk grep perl sed, df du, du-du du-du, vi troff su fsck |
`\ rm * halt LART LART LART!” —The Swedish BOFH, |
_o__) alt.sysadmin.recovery |
Ben Finney
Nov 5 '08 #12
tmallen wrote:
> On Nov 4, 4:30 pm, bearophileH...@lycos.com wrote:
>> tmallen:
>>> I'm parsing some text files, and I want to strip blank lines in the
>>> process. Is there a simpler way to do this than what I have here?
>>> lines = filter(lambda line: len(line.strip()) > 0, lines)
>>
>> xlines = (line for line in open(filename) if line.strip())
>>
>> Bye,
>> bearophile
>
> I must be missing something:
> >>> xlines = (line for line in open("new.data") if line.strip())
> >>> xlines
> <generator object at 0x6b648>
> >>> xlines.sort()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'generator' object has no attribute 'sort'
>
> What do you think?

I think there'd be no advantage to a sort method on a generator, since
theoretically the last item could be the first required in the sorted
sequence, so it's necessary to hold all items in memory to ensure the
sort is correct. So there's no point using a generator in the first place.
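
A rough way to see this, sketched in Python 3. The tracked() wrapper and the consumed list are hypothetical helpers that only record what has been pulled from the input; the data is arranged so the smallest item is not the first one.

```python
consumed = []  # every item that sorted() has pulled from the generator

def tracked(items):
    """Hypothetical wrapper: yield each item and log that it was read."""
    for item in items:
        consumed.append(item)
        yield item

# The smallest value sits in the middle of the input.
result = sorted(tracked([3, 1, 2]))
smallest = result[0]

print(consumed)  # the whole input was read before any result was available
print(smallest)
```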

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Nov 5 '08 #13
On Wed, 05 Nov 2008 12:06:42 +1100, Ben Finney wrote:
> Falcolas <ga******@gmail.com> writes:
>> Using the surrounding parentheses creates a generator object
>
> No. Using the generator expression syntax creates a generator object.
>
> Parentheses are irrelevant to whether the expression is a generator
> expression. The parentheses merely group the expression from surrounding
> syntax.

No they are important:

In [270]: a = x for x in xrange(10)
------------------------------------------------------------
File "<ipython console>", line 1
a = x for x in xrange(10)
^
<type 'exceptions.SyntaxError'>: invalid syntax
In [271]: a = (x for x in xrange(10))
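
The same check can be reproduced without an interactive session. This is a sketch assuming Python 3, so it uses range rather than xrange; is_valid_statement is a hypothetical helper that only compiles the source text and never runs it.

```python
def is_valid_statement(source):
    """Hypothetical helper: True if `source` compiles as a Python statement."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

bare = is_valid_statement("a = x for x in range(10)")
grouped = is_valid_statement("a = (x for x in range(10))")

print(bare, grouped)
```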

Ciao,
Marc 'BlackJack' Rintsch
Nov 5 '08 #14
Marc 'BlackJack' Rintsch <bj****@gmx.net> writes:
> On Wed, 05 Nov 2008 12:06:42 +1100, Ben Finney wrote:
>> Falcolas <ga******@gmail.com> writes:
>>> Using the surrounding parentheses creates a generator object
>>
>> No. Using the generator expression syntax creates a generator
>> object.
>>
>> Parentheses are irrelevant to whether the expression is a
>> generator expression. The parentheses merely group the expression
>> from surrounding syntax.
>
> No they are important:

Your example shows only that they're important for grouping the
expression from surrounding syntax. As I said.

They are *not* important for making the expression be a generator
expression in the first place. Parentheses are irrelevant for the
generator expression syntax.

--
\ “Today, I was — no, that wasn't me.” —Steven Wright |
`\ |
_o__) |
Ben Finney
Nov 5 '08 #15
Steve Holden <st***@holdenweb.com> writes:
> I think there'd be no advantage to a sort method on a generator,
> since theoretically the last item could be the first required in the
> sorted sequence

Worse, generators don't necessarily *have* a finite set of items, and
there's no way in general of telling whether any particular generator
will have a “last item” without trying to get all the items. So it
would be actively harmful to provide such a method on generators, IMO.

--
\ “Whatever you do will be insignificant, but it is very |
`\ important that you do it.” —Mahatma Gandhi |
_o__) |
Ben Finney
Nov 5 '08 #16
On Tue, 04 Nov 2008 20:25:09 -0500, Steve Holden wrote:
> I think there'd be no advantage to a sort method on a generator, since
> theoretically the last item could be the first required in the sorted
> sequence, so it's necessary to hold all items in memory to ensure the
> sort is correct. So there's no point using a generator in the first
> place.

You can't sort something lazily.

Actually, that's not *quite* true: it only holds for comparison sorts.
You can sort lazily using non-comparison sorts, such as Counting Sort:

http://en.wikipedia.org/wiki/Counting_sort

Arguably, the benefit of giving generators a sort() method would be to
avoid an explicit call to list. But I think many people would argue that
was actually a disadvantage, not a benefit, and that the call to list is
a good thing. I'd agree with them.

However, sorted() should take a generator argument, and in fact I see it
does:
>>> sorted( x+1 for x in (4, 2, 0, 3, 1) )
[1, 2, 3, 4, 5]

--
Steven
Nov 5 '08 #17
On Wed, 05 Nov 2008 13:18:27 +1100, Ben Finney wrote:
> Marc 'BlackJack' Rintsch <bj****@gmx.net> writes:
>
> Your example shows only that they're important for grouping the
> expression from surrounding syntax. As I said.
>
> They are *not* important for making the expression be a generator
> expression in the first place. Parentheses are irrelevant for the
> generator expression syntax.

Okay, technically correct, but parentheses belong to generator expressions
because they have to be there to separate them from surrounding syntax,
with the exception of when there are already enclosing parentheses. So
parentheses are tied to generator expression syntax.

Ciao,
Marc 'BlackJack' Rintsch
Nov 5 '08 #18
Marc 'BlackJack' Rintsch <bj****@gmx.net> writes:
> On Wed, 05 Nov 2008 13:18:27 +1100, Ben Finney wrote:
>> Marc 'BlackJack' Rintsch <bj****@gmx.net> writes:
>>
>> Your example shows only that they're important for grouping the
>> expression from surrounding syntax. As I said.
>>
>> They are *not* important for making the expression be a generator
>> expression in the first place. Parentheses are irrelevant for the
>> generator expression syntax.
>
> Okay, technically correct, but parentheses belong to generator expressions
> because they have to be there to separate them from surrounding syntax,
> with the exception of when there are already enclosing parentheses. So
> parentheses are tied to generator expression syntax.

No, I think that's factually wrong *and* confusing.

>>> list(i + 7 for i in range(10))
[7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

Does this demonstrate that parentheses are “tied to” integer literal
syntax? No.

Here, parentheses were used because they're part of the function call
syntax. In your example, parentheses were used as a grouping operator.
In neither case are they “tied to” the generator expression syntax.

It's best to be clear what parentheses *are* used for; they don't
“create a generator” nor are they “tied to” the generator
expression syntax.

--
\ “In any great organization it is far, far safer to be wrong |
`\ with the majority than to be right alone.” —John Kenneth |
_o__) Galbraith, 1989-07-28 |
Ben Finney
Nov 5 '08 #19
On Wed, 05 Nov 2008 14:39:36 +1100, Ben Finney wrote:
> Marc 'BlackJack' Rintsch <bj****@gmx.net> writes:
>> On Wed, 05 Nov 2008 13:18:27 +1100, Ben Finney wrote:
>>> Your example shows only that they're important for grouping the
>>> expression from surrounding syntax. As I said.
>>>
>>> They are *not* important for making the expression be a generator
>>> expression in the first place. Parentheses are irrelevant for the
>>> generator expression syntax.
>>
>> Okay, technically correct, but parentheses belong to generator
>> expressions because they have to be there to separate them from
>> surrounding syntax, with the exception of when there are already
>> enclosing parentheses. So parentheses are tied to generator
>> expression syntax.
>
> No, I think that's factually wrong *and* confusing.
>
> >>> list(i + 7 for i in range(10))
> [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
>
> Does this demonstrate that parentheses are “tied to” integer literal
> syntax? No.

You can use integer literals without parentheses, like the 7 above, but
you can't use generator expressions without them. They are always
there. In that way parentheses are tied to generator expressions.

If I see the pattern ``f(x) for x in obj if c(x)`` I look at whether it is
enclosed in parentheses or brackets to decide if it is a list
comprehension or a generator expression. That may not reflect the formal
grammar, but it is IMHO the easiest and most pragmatic way to look at this
as a human programmer.

Ciao,
Marc 'BlackJack' Rintsch
Nov 5 '08 #20
On Tue, 04 Nov 2008 15:36:23 -0600, Larry Bates <la*********@vitalEsafe.com> wrote:
> be************@lycos.com wrote:
>> tmallen:
>>> I'm parsing some text files, and I want to strip blank lines in the
>>> process. Is there a simpler way to do this than what I have here?
>>> lines = filter(lambda line: len(line.strip()) > 0, lines)
> ...
> Or if you want to filter/loop at the same time, or if you don't want all the
> lines in memory at the same time:

Or if you want to support potentially infinite input streams, such as
a pipe or socket. There are many reasons this is my preferred way of
going through a text file.

> fp = open(filename, 'r')
> for line in fp:
>     if not line.strip():
>         continue
>
>     #
>     # Do something with the non-blank line:
>     #
> fp.close()

Often, you want to at least rstrip() all lines anyway,
for other reasons, and then the extra cost is even less:

    line = line.rstrip()
    if not line: continue
    # do something with the rstripped, nonblank lines
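
Put together as a complete loop, that could look like the sketch below (Python 3). The nonblank_rstripped name is hypothetical, and io.StringIO merely stands in for an open file, pipe, or socket file object; with a real file you would typically open it in a with block so it is closed automatically.

```python
import io

def nonblank_rstripped(stream):
    """Yield each line with trailing whitespace removed, skipping blank ones."""
    for line in stream:
        line = line.rstrip()
        if not line:
            continue
        yield line

# Stand-in input; the contents are invented for illustration.
stream = io.StringIO("alpha  \n\n  beta\n\t\n")
result = list(nonblank_rstripped(stream))

print(result)
```

Because rstrip() only trims the right-hand side, leading indentation such as the two spaces before "beta" is preserved.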

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.se R'lyeh wgah'nagl fhtagn!
Nov 5 '08 #21
Why do I feel like the coding style in Lutz' "Programming Python" is
very far from idiomatic Python? The content feels dated, and I find
that most answers that I get for Python questions use a different
style from the sort of code I see in this book.

Thomas

Nov 5 '08 #22
Lie
On Nov 5, 4:56 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> On Wed, 05 Nov 2008 14:39:36 +1100, Ben Finney wrote:
>> Marc 'BlackJack' Rintsch <bj_...@gmx.net> writes:
>>> On Wed, 05 Nov 2008 13:18:27 +1100, Ben Finney wrote:
>>>> Your example shows only that they're important for grouping the
>>>> expression from surrounding syntax. As I said.
>>>>
>>>> They are *not* important for making the expression be a generator
>>>> expression in the first place. Parentheses are irrelevant for the
>>>> generator expression syntax.
>>>
>>> Okay, technically correct, but parentheses belong to generator
>>> expressions because they have to be there to separate them from
>>> surrounding syntax, with the exception of when there are already
>>> enclosing parentheses. So parentheses are tied to generator
>>> expression syntax.
>>
>> No, I think that's factually wrong *and* confusing.
>>
>> >>> list(i + 7 for i in range(10))
>> [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
>>
>> Does this demonstrate that parentheses are tied to integer literal
>> syntax? No.
>
> You can use integer literals without parentheses, like the 7 above, but
> you can't use generator expressions without them. They are always
> there. In that way parentheses are tied to generator expressions.
>
> If I see the pattern ``f(x) for x in obj if c(x)`` I look at whether it is
> enclosed in parentheses or brackets to decide if it is a list
> comprehension or a generator expression. That may not reflect the formal
> grammar, but it is IMHO the easiest and most pragmatic way to look at this
> as a human programmer.
>
> Ciao,
> Marc 'BlackJack' Rintsch

The situation is similar to tuples. What makes a tuple is the commas,
not the parens.
What makes a generator expression is "<exp> for <var-or-tuple> in
<exp>".

Parentheses are generally required because without them, it's almost
impossible to differentiate it from the surroundings. But they are not
part of the formally required syntax.
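
The tuple half of that comparison is easy to check. A minimal sketch in Python 3, with variable names chosen purely for illustration:

```python
pair = 1, 2         # the comma makes the tuple; no parentheses needed
single = 1,         # a trailing comma makes a one-element tuple
just_an_int = (1)   # parentheses alone only group: this is the int 1
empty = ()          # the one exception: the empty tuple is written ()

print(type(pair), type(single), type(just_an_int), type(empty))
```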
Nov 5 '08 #23
Lie <Li******@gmail.com> writes:
> What makes a generator expression is "<exp> for <var-or-tuple> in
> <exp>".
>
> Parentheses are generally required because without them, it's almost
> impossible to differentiate it from the surroundings. But they are not
> part of the formally required syntax.

... But *every* generator expression is surrounded by parentheses, isn't
it?

--
Arnaud
Nov 5 '08 #24
Arnaud Delobelle wrote:
> Lie <Li******@gmail.com> writes:
>> What makes a generator expression is "<exp> for <var-or-tuple> in
>> <exp>".
>>
>> Parentheses are generally required because without them, it's almost
>> impossible to differentiate it from the surroundings. But they are not
>> part of the formally required syntax.
>
> ... But *every* generator expression is surrounded by parentheses, isn't
> it?

Indeed, the syntax production is:

generator_expression ::= "(" expression genexpr_for ")"

albeit with the note: "The parentheses can be omitted on calls with only
one argument. See section 5.3.4 for the detail." but that only means
you don't need a second set of parentheses. A generator expression is
always enclosed in parentheses, the same is NOT true of a tuple.
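
A short sketch of that note about single-argument calls, assuming Python 3. The compile() call is only used to test whether the source text is accepted; it does not run anything.

```python
# Sole argument: the call's own parentheses are enough.
total = sum(n * n for n in range(4))

# With a second argument, the generator expression needs its own parentheses.
total_from_ten = sum((n * n for n in range(4)), 10)

# Leaving them out in a two-argument call is rejected at compile time.
try:
    compile("sum(n * n for n in range(4), 10)", "<example>", "eval")
    unparenthesized_ok = True
except SyntaxError:
    unparenthesized_ok = False

print(total, total_from_ten, unparenthesized_ok)
```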
Nov 5 '08 #25
Ben Finney wrote:
> Falcolas writes:
>> Using the surrounding parentheses creates a generator object
>
> No. Using the generator expression syntax creates a generator object.
>
> Parentheses are irrelevant to whether the expression is a generator
> expression. The parentheses merely group the expression from
> surrounding syntax.

As others have pointed out, the parentheses are part of the generator
syntax. If not for the parentheses, a list comprehension would be
indistinguishable from a list literal with a single element, a
generator object. It's also worth remembering that list
comprehensions are distinct from generator expressions and don't
require the creation of a generator object.
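
To see the two forms side by side, a minimal sketch in Python 3 (the variable names are only illustrative):

```python
import types

# Square brackets around the bare "for" clause: a list comprehension.
comprehension = [n for n in range(3)]

# Square brackets around a parenthesized generator expression:
# a list literal whose single element is a generator object.
literal_with_generator = [(n for n in range(3))]

print(comprehension)
print(len(literal_with_generator), type(literal_with_generator[0]))
```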

-Miles
Nov 5 '08 #26
