Perl and Python, a practical side-by-side example

Shawn Milo

I'm new to Python and fairly experienced in Perl, although that
experience is limited to the things I use daily.

I wrote the same script in both Perl and Python, and the output is
identical. The run speed is similar (very fast) and the line count is
similar.

Now that they're both working, I was looking at the code and wondering
what Perl-specific and Python-specific improvements to the code would
look like, as judged by others more knowledgeable in the individual
languages.

I am not looking for the smallest number of lines, or anything else
that would make the code more difficult to read in six months. Just
any instances where I'm doing something inefficiently or in a "bad"
way.

I'm attaching both the Perl and Python versions, and I'm open to
comments on either. The script reads a file from standard input and
finds the best record for each unique ID (piid). The best is defined
as follows: The newest expiration date (field 5) for the record with
the state (field 1) which matches the desired state (field 6). If
there is no record matching the desired state, then just take the
newest expiration date.

Thanks for taking the time to look at these.

Shawn

##############################################################################
Perl code:
##############################################################################
#! /usr/bin/env perl

use warnings;
use strict;

my $piid;
my $row;
my %input;
my $best;
my $curr;

foreach $row (<>){

    chomp($row);
    $piid = (split(/\t/, $row))[0];

    push ( @{$input{$piid} }, $row );
}

for $piid (keys(%input)){

    $best = "";

    for $curr (@{$input{$piid }}){
        if ($best eq ""){
            $best = $curr;
        }else{
            #If the current record is the correct state

            if ((split(/\t/, $curr))[1] eq (split(/\t/, $curr))[6]){
                #If existing record is the correct state
                if ((split(/\t/, $best))[1] eq (split(/\t/, $curr))[6]){
                    if ((split(/\t/, $curr))[5] gt (split(/\t/, $best))[5]){
                        $best = $curr;
                    }
                }else{
                    $best = $curr;
                }
            }else{
                #if the existing record does not have the correct state
                #and the new one has a newer expiration date
                if (((split(/\t/, $best))[1] ne (split(/\t/, $curr))[6]) and
                    ((split(/\t/, $curr))[5] gt (split(/\t/, $best))[5])){
                    $best = $curr;
                }
            }
        }
    }
    print "$best\n";
}

##############################################################################
End Perl code
##############################################################################

##############################################################################
Python code
##############################################################################

#! /usr/bin/env python

import sys

input = sys.stdin

recs = {}

for row in input:
    row = row.rstrip('\n')
    piid = row.split('\t')[0]
    if recs.has_key(piid) is False:
        recs[piid] = []
    recs[piid].append(row)

for piid in recs.keys():
    best = ""
    for current in recs[piid]:
        if best == "":
            best = current;
        else:
            #If the current record is the correct state
            if current.split("\t")[1] == current.split("\t")[6]:
                #If the existing record is the correct state
                if best.split("\t")[1] == best.split("\t")[6]:
                    #If the new record has a newer exp. date
                    if current.split("\t")[5] > best.split("\t")[5]:
                        best = current
                else:
                    best = current
            else:
                #If the existing record does not have the correct state
                #and the new record has a newer exp. date
                if best.split("\t")[1] != best.split("\t")[6] and \
                   current.split("\t")[5] > best.split("\t")[5]:
                    best = current

    print best
##############################################################################
End Python code
##############################################################################

Mar 2 '07

Marc 'BlackJack' Rintsch

In <54*************@mid.individual.net>, Bjoern Schliessmann wrote:

> Bruno Desthuilliers wrote:
>> Shawn Milo wrote:
>>> if recs.has_key(piid) is False:
>>
>> 'is' is the identity operator - practically, in CPython, it
>> compares memory addresses. You *don't* want to use it here.
>
> It's recommended to use "is None"; why not "is False"? Are there
> multiple False instances or is False generated somehow?

Before `True` and `False` existed, many people defined them as aliases to 1
and 0. And of course there are *many* other objects that can be used in the
boolean context of an ``if`` statement for testing "trueness" and
"falseness".

Ciao,
Marc 'BlackJack' Rintsch
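
The difference between an identity test and a truth test can be seen in a small sketch. It is written for Python 3 (the thread itself uses Python 2), and the helper names `is_false_by_identity` and `is_false_by_truthiness` are made up for illustration; they do not come from the thread.

```python
# Python 3 sketch; helper names are illustrative, not from the thread.

def is_false_by_identity(value):
    """True only when value is the False singleton itself."""
    return value is False


def is_false_by_truthiness(value):
    """True for anything that counts as false in an `if` test."""
    return not value


# Objects that are false in a boolean context; only the first is False itself.
falsy_values = [False, 0, 0.0, "", [], (), {}, None]

for value in falsy_values:
    print(repr(value), is_false_by_identity(value), is_false_by_truthiness(value))
```

Only `False` itself passes the identity test, while every value in the list passes the truth test. A function returning 0 or an empty container to mean "no" would therefore slip past an `is False` check.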

Mar 3 '07 #11

John Machin

On Mar 3, 7:08 pm, attn.steven....@gmail.com wrote:

> On Mar 2, 2:44 pm, "Shawn Milo" <S...@Milochik.com> wrote:
>
> (snipped)
>
>> I'm attaching both the Perl and Python versions, and I'm open to
>> comments on either. The script reads a file from standard input and
>> finds the best record for each unique ID (piid). The best is defined
>> as follows: The newest expiration date (field 5) for the record with
>> the state (field 1) which matches the desired state (field 6). If
>> there is no record matching the desired state, then just take the
>> newest expiration date.
>>
>> Thanks for taking the time to look at these.
>
> My attempts:
> ### Python (re-working John's code) ###
>
> import sys
>
> def keep_best(best, current):
>
>     ACTUAL_STATE = 1
>     # John had these swapped
>     DESIRED_STATE = 5
>     EXPIRY_DATE = 6

*Bullshit* -- You are confusing me with Bruno; try (re)?reading what
the OP wrote (and which you quoted above):
"""
The newest expiration date (field 5) for the record with
the state (field 1) which matches the desired state (field 6).
"""

and his code (indented a little less boisterously):

"""
#If the current record is the correct state
if current.split("\t")[1] == current.split("\t")[6]:
    #If the existing record is the correct state
    if best.split("\t")[1] == best.split("\t")[6]:
        #If the new record has a newer exp. date
        if current.split("\t")[5] > best.split("\t")[5]:
"""

Mar 3 '07 #12

William Heymann

On Saturday 03 March 2007, Ben Finney wrote:

> Bjoern Schliessmann <us**************************@spamgourmet.com> writes:
>
>     if not recs.has_key(piid): # [1]

Why not

    if piid not in recs:

That is shorter, simpler, easier to read and very slightly faster. Plus you
can change the data structure of recs later without changing that line so
long as it implements containment testing.
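
A small Python 3 sketch of that last point: the same `not in` test works unchanged on a dict and on any object that defines `__contains__`. The `RecordIndex` class and the `is_new` helper are hypothetical names used only for this illustration.

```python
# Python 3 sketch; RecordIndex and is_new are hypothetical, not from the thread.

class RecordIndex:
    """Stand-in for `recs` that is not a dict but supports `in`."""

    def __init__(self):
        self._ids = set()

    def add(self, piid):
        self._ids.add(piid)

    def __contains__(self, piid):
        # Called by the `in` and `not in` operators.
        return piid in self._ids


def is_new(piid, recs):
    """The test from the post, independent of the container type."""
    return piid not in recs


dict_recs = {"aaa": []}
index_recs = RecordIndex()
index_recs.add("aaa")

print(is_new("aaa", dict_recs), is_new("bbb", dict_recs))
print(is_new("aaa", index_recs), is_new("bbb", index_recs))
```

Note that `has_key` was removed from dicts in Python 3, so `in` is also the only spelling that runs on both major versions.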

Mar 3 '07 #13

Ben Finney

William Heymann <ko**@aesaeion.com> writes:

> On Saturday 03 March 2007, Ben Finney wrote:
>> Bjoern Schliessmann <us**************************@spamgourmet.com> writes:
>>
>>     if not recs.has_key(piid): # [1]
>
> Why not
>
>     if piid not in recs:
>
> That is shorter, simpler, easier to read and very slightly faster.

Perhaps if I'd made my posting shorter, simpler, easier to read and
slightly faster, you might have read the footnote to which the '[1]'
referred.

--
\ "Choose mnemonic identifiers. If you can't remember what |
`\ mnemonic means, you've got a problem." -- Larry Wall |
_o__) |
Ben Finney

Mar 3 '07 #14

Paddy

On Mar 2, 10:44 pm, "Shawn Milo" <S...@Milochik.com> wrote:

> I'm new to Python and fairly experienced in Perl, although that
> experience is limited to the things I use daily.
>
> I wrote the same script in both Perl and Python, and the output is
> identical. The run speed is similar (very fast) and the line count is
> similar.
>
> Now that they're both working, I was looking at the code and wondering
> what Perl-specific and Python-specific improvements to the code would
> look like, as judged by others more knowledgeable in the individual
> languages.

Hi Shawn, there is a web page that gives examples from Perl's
Datastructures Cookbook re-implemented in Python. It might be of help
for future Python projects:
http://wiki.python.org/moin/PerlPhrasebook

- Paddy.

Mar 3 '07 #15

Jussi Salmela

Shawn Milo wrote:

> <snip>
> I am not looking for the smallest number of lines, or anything else
> that would make the code more difficult to read in six months. Just
> any instances where I'm doing something inefficiently or in a "bad"
> way.
>
> I'm attaching both the Perl and Python versions, and I'm open to
> comments on either. The script reads a file from standard input and
> finds the best record for each unique ID (piid). The best is defined
> as follows: The newest expiration date (field 5) for the record with
> the state (field 1) which matches the desired state (field 6). If
> there is no record matching the desired state, then just take the
> newest expiration date.

I don't know if this attempt satisfies your criteria but here goes!

This is not a rewrite of your program but was created using your problem
description above. I've not included the reading of the data because it
has not much to do with the problem per se.

#============================================================
input = [
    "aaa\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20070120\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20071010\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20071111\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071201\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070612\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
]

input = [x[:-1].split('\t') for x in input]
recs = {}
for row in input:
    recs.setdefault(row[0], []).append(row)

for key in recs:
    rows = recs[key]
    rows.sort(key=lambda x:x[5], reverse=True)
    for current in rows:
        if current[1] == current[6]:
            break
    else:
        current = rows[0]
    print '\t'.join(current)
#============================================================
The output is:

aaa AAA ... ... ... 20070120 AAA
bbb AAA ... ... ... 20071212 AAA
ccc AAA ... ... ... 20071212 AAA

and it is the same as the output of your original code on this data.
Further testing would naturally be beneficial.
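
One way such further testing could look is a randomized cross-check, sketched here in Python 3. It compares the sort-then-scan selection with a direct restatement of the rule from the original post, using the field positions given there (1 = state, 5 = expiration date, 6 = desired state). All function names (`pick_by_sort_and_scan`, `pick_by_spec`, `random_rows`, `cross_check`) and the generated test values are assumptions made for this illustration.

```python
# Python 3 sketch; all names and generated data are illustrative assumptions.
import random


def pick_by_sort_and_scan(rows):
    """Sort newest first, then take the first row whose state matches."""
    ordered = sorted(rows, key=lambda r: r[5], reverse=True)
    for row in ordered:
        if row[1] == row[6]:
            return row
    return ordered[0]


def pick_by_spec(rows):
    """Restate the rule directly: prefer matching rows, then the newest date."""
    matching = [r for r in rows if r[1] == r[6]]
    candidates = matching or rows
    newest = max(r[5] for r in candidates)
    return next(r for r in candidates if r[5] == newest)


def random_rows(rng, count):
    """Build `count` rows for a single ID with random states and dates."""
    states = ["AAA", "BBB"]
    dates = ["20070101", "20070612", "20071212"]
    return [
        ["id", rng.choice(states), "...", "...", "...",
         rng.choice(dates), rng.choice(states)]
        for _ in range(count)
    ]


def cross_check(trials=200, seed=0):
    """Raise AssertionError on the first group where the two picks differ."""
    rng = random.Random(seed)
    for _ in range(trials):
        rows = random_rows(rng, rng.randint(1, 8))
        assert pick_by_sort_and_scan(rows) == pick_by_spec(rows), rows
    return trials


print("checked", cross_check(), "random groups")
```

Because `sorted` is stable even with `reverse=True`, both functions settle ties on the date in favour of the row that appears first in the input, so the comparison can be exact.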

Cheers,
Jussi

Mar 3 '07 #16

Bruno Desthuilliers

John Machin wrote:

> On Mar 3, 12:36 pm, Bruno Desthuilliers wrote:
> [snip]
>
>>     DATE = 5
>>     TARGET = 6
>
> [snip]
>
>> Now for the bad news: I'm afraid your algorithm is broken: here are my
>> test data and results:
>>
>> input = [
>>     #ID STATE ... ... ... TARG DATE
>>     "aaa\tAAA\t...\t...\t...\tBBB\t20071212\n",
>
> [snip]
>
> Bruno, The worse news is that your test data is broken.

Re-reading the OP's specs, the bad news is that my only neuron left is
broken. Shouldn't code at 2 o'clock in the morning :(

Mar 4 '07 #17

Bruno Desthuilliers

Bjoern Schliessmann wrote:

> Bruno Desthuilliers wrote:
>
>> Shawn Milo wrote:
>>
>>> if recs.has_key(piid) is False:
>>
>> 'is' is the identity operator - practically, in CPython, it
>> compares memory addresses. You *don't* want to use it here.
>
> It's recommended to use "is None"; why not "is False"? Are there
> multiple False instances or is False generated somehow?

Once upon a time, Python didn't have a "proper" boolean type. It only
had rules for boolean evaluation of a given object. According to these
rules - that of course still apply -, empty strings, lists, tuples or
dicts, numeric zeros and None are false in a boolean context. IOW, an
expression can eval to false without actually being the False object
itself. So the result of using the identity operator to test against
such an expression, while being clearly defined, may not be exactly what
you'd think.

To make a long story short:

if not []:
    print "the empty list evals to false in a boolean context"

if [] is False:
    print "this python interpreter is broken"

HTH

Mar 4 '07 #18

Bruno Desthuilliers

Shawn Milo wrote:
(snip)

> The script reads a file from standard input and
> finds the best record for each unique ID (piid). The best is defined
> as follows: The newest expiration date (field 5) for the record with
> the state (field 1) which matches the desired state (field 6). If
> there is no record matching the desired state, then just take the
> newest expiration date.

Here's a version with fixed test data and a somewhat better (and
faster) algorithm using Decorate/Sort/Undecorate (aka the Schwartzian
transform):

import sys
output = sys.stdout

input = [
    #ID STATE ... ... ... DATE TARGET
    "aaa\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20070120\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20071010\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20071111\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071201\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20070612\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
]

def find_best_match(input=input, output=output):
    PIID = 0
    STATE = 1
    EXP_DATE = 5
    DESIRED_STATE = 6

    recs = {}
    for line in input:
        line = line.rstrip('\n')
        row = line.split('\t')
        sort_key = (row[STATE] == row[DESIRED_STATE], row[EXP_DATE])
        recs.setdefault(row[PIID], []).append((sort_key, line))

    for decorated_lines in recs.itervalues():
        print >>output, sorted(decorated_lines, reverse=True)[0][1]

Lines are sorted first on whether the state matches the desired state,
then on the expiration date. Since it's a reverse sort, we first have
the lines that match (if any) sorted by date descending, then the lines that
don't match sorted by date descending. So in both cases, the 'best match'
is the first item in the list. Then we just have to get rid of the sort
key, et voilà !-)
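
The ordering described above can be traced on three records for one ID. This is a Python 3 sketch, and the `sort_key` helper is a name introduced here for illustration; it builds the same (state matches, expiration date) tuple as the code in this post.

```python
# Python 3 sketch of decorate/sort/undecorate; sort_key is an illustrative name.

def sort_key(line):
    """Build the (state matches desired state, expiration date) key."""
    row = line.split("\t")
    return (row[1] == row[6], row[5])


lines = [
    "aaa\tAAA\t...\t...\t...\t20071212\tBBB",
    "aaa\tAAA\t...\t...\t...\t20070120\tAAA",
    "aaa\tAAA\t...\t...\t...\t20070101\tAAA",
]

# Decorate: pair each line with its key.
decorated = [(sort_key(line), line) for line in lines]
# Sort: tuples compare element by element, and True sorts above False.
decorated.sort(reverse=True)
# Undecorate: drop the key from the winning pair.
best = decorated[0][1]

for key, _ in decorated:
    print(key)
print(best)
```

The keys come out as `(True, '20070120')`, `(True, '20070101')`, `(False, '20071212')`, so the record dated 20070120 wins even though a non-matching record has a later date. The date comparison is a plain string comparison, which only orders correctly because the dates are in fixed-width YYYYMMDD form.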

HTH

Mar 4 '07 #19

Peter Otten

Bruno Desthuilliers wrote:

>     print >>output, sorted(decorated_lines, reverse=True)[0][1]

Or just

    print >>output, max(decorated_lines)[1]

Peter
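
A Python 3 sketch of the whole pipeline with `max()` in place of the sort. The function name `best_lines` and the four sample records are assumptions for illustration; the grouping with `setdefault` and the key tuple follow the code quoted above.

```python
# Python 3 sketch; best_lines and the sample data are illustrative assumptions.

def best_lines(lines):
    """Return {piid: best line}, picking each winner with max()."""
    recs = {}
    for line in lines:
        line = line.rstrip("\n")
        row = line.split("\t")
        key = (row[1] == row[6], row[5])
        recs.setdefault(row[0], []).append((key, line))
    # max() needs one pass per group, where a full sort orders every record.
    return {piid: max(pairs)[1] for piid, pairs in recs.items()}


sample = [
    "aaa\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20070120\tAAA\n",
    "ccc\tAAA\t...\t...\t...\t20071201\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tBBB\n",
]

for piid, line in sorted(best_lines(sample).items()):
    print(line)
```

For "aaa" the matching record dated 20070120 is chosen; for "ccc" nothing matches, so the newest date, 20071212, wins. Since the (key, line) tuples are totally ordered, `max(pairs)` and `sorted(pairs, reverse=True)[0]` always select the same pair.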

Mar 4 '07 #20
