Good Python style?

Andreas Beyer

Hi,

I found the following quite cryptic code, which basically reads the
first column of some_file into a set.
In Python I am used to seeing much more verbose/explicit code. However,
the example below _may_ actually be faster than the usual "for line in ..."
Do you consider this code good Python style? Or would you recommend to
refrain from such complex single-line code??

Thanks!
Andreas

inp = resource(some_file)
# read first entries of all non-empty lines into a set
some_set = frozenset([line.split()[0] for line in \
filter(None, [ln.strip() for ln in inp])])

May 31 '07 #1

Subscribe Post Reply

1313

Steven D'Aprano

On Thu, 31 May 2007 09:59:07 +0200, Andreas Beyer wrote:

Hi,

I found the following quite cryptic code, which basically reads the
first column of some_file into a set.
In Python I am used to seeing much more verbose/explicit code. However,
the example below _may_ actually be faster than the usual "for line in ..."
Do you consider this code good Python style? Or would you recommend to
refrain from such complex single-line code??

Thanks!
Andreas

inp = resource(some_file)
# read first entries of all non-empty lines into a set
some_set = frozenset([line.split()[0] for line in \
filter(None, [ln.strip() for ln in inp])])

It's a little complex, but not excessively so. Any more, and it would
probably be too complex.

It would probably be easier to read with more readable names and a few
comments:

some_set = frozenset(
# ... from a list comp of the first word in each line
[line.split()[0] for line in
# ... each line has leading and trailing white space
# filtered out, and blanks are skipped
filter(None, # strip whitespace and filter out blank lines
[line.strip() for line in input_lines]
)])

Splitting it into multiple lines is self-documenting:

blankless_lines = filter(None, [line.strip() for line in input_lines])
first_words = [line.split()[0] for line in blankless_words]
some_set = frozenset(first_words)

As for which is faster, I doubt that there will be much difference, but I
expect the first version will be a smidgen fastest. However, I'm too lazy
to time it myself, so I'll just point you at the timeit module.

--
Steven.

May 31 '07 #2

Virgil Dupras

On May 31, 3:59 am, Andreas Beyer <m...@a-beyer.dewrote:

Hi,

I found the following quite cryptic code, which basically reads the
first column of some_file into a set.
In Python I am used to seeing much more verbose/explicit code. However,
the example below _may_ actually be faster than the usual "for line in ..."
Do you consider this code good Python style? Or would you recommend to
refrain from such complex single-line code??

Thanks!
Andreas

inp = resource(some_file)
# read first entries of all non-empty lines into a set
some_set = frozenset([line.split()[0] for line in \
filter(None, [ln.strip() for ln in inp])])

I think it would be more readable if you would take the filter out of
the list comprehension, and the list comprehension out of the set.

inp = resource(some_file)
stripped_lines = (ln.strip() for ln in inp)
splitted_lines = (line.split()[0] for line in stripped_lines if line)
some_set = frozenset(splitted_lines)

May 31 '07 #3

Michael Hoffman

Steven D'Aprano wrote:

It would probably be easier to read with more readable names and a few
comments:

[...]

Splitting it into multiple lines is self-documenting:

blankless_lines = filter(None, [line.strip() for line in input_lines])
first_words = [line.split()[0] for line in blankless_words]
some_set = frozenset(first_words)

Re-writing code so that it is self-documenting is almost always a better
approach. Premature optimization is the root of all evil.
--
Michael Hoffman

May 31 '07 #4

Alex Martelli

Andreas Beyer <ma**@a-beyer.dewrote:

Hi,

I found the following quite cryptic code, which basically reads the
first column of some_file into a set.
In Python I am used to seeing much more verbose/explicit code. However,
the example below _may_ actually be faster than the usual "for line in ..."
Do you consider this code good Python style? Or would you recommend to
refrain from such complex single-line code??

Thanks!
Andreas

inp = resource(some_file)
# read first entries of all non-empty lines into a set
some_set = frozenset([line.split()[0] for line in \
filter(None, [ln.strip() for ln in inp])])

Sparse is better than dense, and this code is far from the fastest one
could write -- it builds useless lists comprehensions where genexps
would do, it splits off all whitespace-separated words (rather than just
the first one) then tosses all but the first away, etc.

frozenset(first_word_of_line
for line in input_file
for first_word_of_line in line.split(None, 1)[:1]
)

does not sacrifice any speed (on the contrary), yet achieves somewhat
better clarity by clearer variable names, reasonable use of whitespace
for formatting, and avoidance of redundant processing.

If this set didn't have to be frozen, building it in a normal for loop
(with a .add call in the nested loop) would no doubt be just as good in
terms of both clarity and performance; however, frozen sets (like
tuples) don't lend themselves to such incremental building "in place"
(you'd have to build the mutable version, then "cast" it at the end; the
alternative of making a new frozenset each time through the loop is
clearly unpleasant), so I can see why one would want to avoid that.

A good, readable, and only slightly slower alternative is to write a
named generator:

def first_words(input_file):
for line in input_file:
first_word_if_any = line.split(None, 1)
if first_word_if_any:
yield first_word_if_any[0]

ff = frozenset(first_words(input_file))

A full-fledged generator gives you more formatting choices, an extra
name to help, and the possibility of naming intermediate variables.
Whether it's worth it in this case is moot -- personally I find the
"slice then for-loop" idiom (which I used in the genexp) just as clear
as the "test then index" one (which I used in first_words) to express
the underlying notion "give me the first item if any, but just proceed
without giving anything if the list is empty"; but I can imagine
somebody else deciding that the test/index idiom is a more direct
expression of that notion. You could use it in a genexp, too, of
course:

frozenset(first_word_if_any[0]
for line in input_file
for first_word_if_any in [line.split(None, 1)]
if first_word_if_any
)

but this requires the "for x in [y]" trick to simulate the "x=y"
assigment of an intermediate variable in a genexp or LC, which is never
an elegant approach, so I wouldn't recommend it.
Alex

May 31 '07 #5

by: Christian Seberino | last post by:

How does Ruby compare to Python?? How good is DESIGN of Ruby compared to Python? Python's design is godly. I'm wondering if Ruby's is godly too. I've heard it has solid OOP design but then...

Python

prototypes in Python [was: what is good in Prothon]

by: Michele Simionato | last post by:

So far, I have not installed Prothon, nor I have experience with Io, Self or other prototype-based languages. Still, from the discussion on the mailing list, I have got the strong impression that...

Python

Good Form

by: phez.asap | last post by:

I am new to Python but come from a C++ background so I am trying to connect the dots :) . I am really liking what I see so far but have some nubee questions on what is considered good form. For one...

Python

206

Python's "only one way to do it" philosophy isn't good?

by: WaterWalk | last post by:

I've just read an article "Building Robust System" by Gerald Jay Sussman. The article is here: http://swiss.csail.mit.edu/classes/symbolic/spring07/readings/robust-systems.pdf In it there is a...

Python

Good programming style

by: Astley Le Jasper | last post by:

I'm still learning python and would like to know what's a good way of organizing code. I am writing some scripts to scrape a number of different website that hold similar information and then...

Python

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice

Good Python style?

Similar topics