Perl and Python, a practical side-by-side example.

I'm new to Python and fairly experienced in Perl, although that
experience is limited to the things I use daily.

I wrote the same script in both Perl and Python, and the output is
identical. The run speed is similar (very fast) and the line count is
similar.

Now that they're both working, I was looking at the code and wondering
what Perl-specific and Python-specific improvements to the code would
look like, as judged by others more knowledgeable in the individual
languages.

I am not looking for the smallest number of lines, or anything else
that would make the code more difficult to read in six months. Just
any instances where I'm doing something inefficiently or in a "bad"
way.

I'm attaching both the Perl and Python versions, and I'm open to
comments on either. The script reads a file from standard input and
finds the best record for each unique ID (piid). The best is defined
as follows: The newest expiration date (field 5) for the record with
the state (field 1) which matches the desired state (field 6). If
there is no record matching the desired state, then just take the
newest expiration date.

Thanks for taking the time to look at these.

Shawn

############### ############### ############### ############### ##############
Perl code:
############### ############### ############### ############### ##############
#! /usr/bin/env perl

use warnings;
use strict;

my $piid;
my $row;
my %input;
my $best;
my $curr;

foreach $row (<>){

    chomp($row);
    $piid = (split(/\t/, $row))[0];

    push ( @{$input{$piid} }, $row );
}

for $piid (keys(%input)){

    $best = "";

    for $curr (@{$input{$piid }}){
        if ($best eq ""){
            $best = $curr;
        }else{
            #If the current record is the correct state

            if ((split(/\t/, $curr))[1] eq (split(/\t/, $curr))[6]){
                #If existing record is the correct state
                if ((split(/\t/, $best))[1] eq (split(/\t/, $curr))[6]){
                    if ((split(/\t/, $curr))[5] gt (split(/\t/, $best))[5]){
                        $best = $curr;
                    }
                }else{
                    $best = $curr;
                }
            }else{
                #if the existing record does not have the correct state
                #and the new one has a newer expiration date
                if (((split(/\t/, $best))[1] ne (split(/\t/, $curr))[6]) and
                    ((split(/\t/, $curr))[5] gt (split(/\t/, $best))[5])){
                    $best = $curr;
                }
            }
        }
    }
    print "$best\n";
}

############### ############### ############### ############### ##############
End Perl code
############### ############### ############### ############### ##############


############### ############### ############### ############### ##############
Python code
############### ############### ############### ############### ##############

#! /usr/bin/env python

import sys

input = sys.stdin

recs = {}

for row in input:
    row = row.rstrip('\n')
    piid = row.split('\t')[0]
    if recs.has_key(piid) is False:
        recs[piid] = []
    recs[piid].append(row)

for piid in recs.keys():
    best = ""
    for current in recs[piid]:
        if best == "":
            best = current;
        else:
            #If the current record is the correct state
            if current.split("\t")[1] == current.split("\t")[6]:
                #If the existing record is the correct state
                if best.split("\t")[1] == best.split("\t")[6]:
                    #If the new record has a newer exp. date
                    if current.split("\t")[5] > best.split("\t")[5]:
                        best = current
                else:
                    best = current
            else:
                #If the existing record does not have the correct state
                #and the new record has a newer exp. date
                if best.split("\t")[1] != best.split("\t")[6] and \
                   current.split("\t")[5] > best.split("\t")[5]:
                    best = current

    print best
############### ############### ############### ############### ##############
End Python code
############### ############### ############### ############### ##############
Mar 2 '07
In <54************ *@mid.individua l.net>, Bjoern Schliessmann wrote:
> Bruno Desthuilliers wrote:
>> Shawn Milo wrote:
>>> if recs.has_key(piid) is False:
>>
>> 'is' is the identity operator - practically, in CPython, it
>> compares memory addresses. You *don't* want to use it here.
>
> It's recommended to use "is None"; why not "is False"? Are there
> multiple False instances or is False generated somehow?

Before `True` and `False` existed, many people defined them as aliases to 1
and 0. And of course there are *many* other objects that can be used in the
boolean context of an ``if`` statement for testing "trueness" and
"falseness".

Ciao,
Marc 'BlackJack' Rintsch
Mar 3 '07 #11
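
A minimal sketch of the point made above, written for Python 3 (the thread's own code is Python 2). The helper names `is_falsy` and `is_the_false_object` are made up for illustration. It contrasts a truth test with an identity test against `False`:

```python
# Sketch: truth testing versus identity testing (Python 3).

def is_falsy(value):
    """Return True when `value` counts as false in an if statement."""
    return not value


def is_the_false_object(value):
    """Return True only when `value` is the False singleton itself."""
    return value is False


zero = 0     # the old-style stand-in for False
empty = []   # an empty container is also false in a boolean context

for candidate in (False, zero, empty, None, ""):
    # Every candidate is falsy, but only the first one *is* False.
    print(repr(candidate), is_falsy(candidate), is_the_false_object(candidate))
```

In the original script `has_key` happens to return a real boolean, so the `is False` test works there, but `not recs.has_key(piid)` does not depend on that detail.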
On Mar 3, 7:08 pm, attn.steven....@gmail.com wrote:
> On Mar 2, 2:44 pm, "Shawn Milo" <S...@Milochik.com> wrote:
>
> (snipped)
>
>> I'm attaching both the Perl and Python versions, and I'm open to
>> comments on either. The script reads a file from standard input and
>> finds the best record for each unique ID (piid). The best is defined
>> as follows: The newest expiration date (field 5) for the record with
>> the state (field 1) which matches the desired state (field 6). If
>> there is no record matching the desired state, then just take the
>> newest expiration date.
>> Thanks for taking the time to look at these.
>
> My attempts:
> ### Python (re-working John's code) ###
>
> import sys
>
> def keep_best(best, current):
>
>     ACTUAL_STATE = 1
>     # John had these swapped
>     DESIRED_STATE = 5
>     EXPIRY_DATE = 6

*Bullshit* -- You are confusing me with Bruno; try (re)?reading what
the OP wrote (and which you quoted above):
"""
The newest expiration date (field 5) for the record with
the state (field 1) which matches the desired state (field 6).
"""

and his code (indented a little less boisterously):

"""
#If the current record is the correct state
if current.split("\t")[1] == current.split("\t")[6]:
    #If the existing record is the correct state
    if best.split("\t")[1] == best.split("\t")[6]:
        #If the new record has a newer exp. date
        if current.split("\t")[5] > best.split("\t")[5]:
"""
Mar 3 '07 #12
On Saturday 03 March 2007, Ben Finney wrote:
> Bjoern Schliessmann <us************ **************@ spamgourmet.com> writes:
>
> if not recs.has_key(piid): # [1]

Why not

    if piid not in recs:

That is shorter, simpler, easier to read and very slightly faster. Plus you
can change the data structure of recs later without changing that line, so
long as it implements containment testing.

Mar 3 '07 #13
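
To illustrate the remark about containment testing, here is a small Python 3 sketch. `RecordIndex` and `is_new_id` are hypothetical names invented for this example and appear nowhere in the thread. The same `not in` test works unchanged on a dict, a set, or any class that defines `__contains__`:

```python
class RecordIndex:
    """Hypothetical container that stores rows grouped by ID."""

    def __init__(self):
        self._groups = {}

    def add(self, piid, row):
        # Create the group on first use, then append the row to it.
        self._groups.setdefault(piid, []).append(row)

    def __contains__(self, piid):
        # Supporting `in` / `not in` only requires this one method.
        return piid in self._groups


def is_new_id(piid, recs):
    """The membership test from the post; it never names the container type."""
    return piid not in recs


as_dict = {"aaa": ["row 1"]}
as_set = {"aaa"}
as_index = RecordIndex()
as_index.add("aaa", "row 1")

for container in (as_dict, as_set, as_index):
    print(type(container).__name__, is_new_id("aaa", container), is_new_id("zzz", container))
```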
William Heymann <ko**@aesaeion.com> writes:
> On Saturday 03 March 2007, Ben Finney wrote:
>> Bjoern Schliessmann <us************ **************@ spamgourmet.com> writes:
>>
>> if not recs.has_key(piid): # [1]
>
> Why not
>
> if piid not in recs:
>
> That is shorter, simpler, easier to read and very slightly faster.

Perhaps if I'd made my posting shorter, simpler, easier to read and
slightly faster, you might have read the footnote to which the '[1]'
referred.

--
\ "Choose mnemonic identifiers. If you can't remember what |
`\ mnemonic means, you've got a problem." -- Larry Wall |
_o__) |
Ben Finney

Mar 3 '07 #14
On Mar 2, 10:44 pm, "Shawn Milo" <S...@Milochik.com> wrote:
> I'm new to Python and fairly experienced in Perl, although that
> experience is limited to the things I use daily.
>
> I wrote the same script in both Perl and Python, and the output is
> identical. The run speed is similar (very fast) and the line count is
> similar.
>
> Now that they're both working, I was looking at the code and wondering
> what Perl-specific and Python-specific improvements to the code would
> look like, as judged by others more knowledgeable in the individual
> languages.

Hi Shawn, there is a web page that gives examples from Perl's
Datastructures Cookbook re-implemented in Python. It might be of help
for future Python projects:
http://wiki.python.org/moin/PerlPhrasebook

- Paddy.
Mar 3 '07 #15
Shawn Milo wrote:
> <snip>
> I am not looking for the smallest number of lines, or anything else
> that would make the code more difficult to read in six months. Just
> any instances where I'm doing something inefficiently or in a "bad"
> way.
>
> I'm attaching both the Perl and Python versions, and I'm open to
> comments on either. The script reads a file from standard input and
> finds the best record for each unique ID (piid). The best is defined
> as follows: The newest expiration date (field 5) for the record with
> the state (field 1) which matches the desired state (field 6). If
> there is no record matching the desired state, then just take the
> newest expiration date.

I don't know if this attempt satisfies your criteria but here goes!

This is not a rewrite of your program but was created using your problem
description above. I've not included the reading of the data because it
has not much to do with the problem per se.

#=============================================================
input = [
    "aaa\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20070120\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20071010\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20071111\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071201\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070612\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
]

input = [x[:-1].split('\t') for x in input]
recs = {}
for row in input:
    recs.setdefault(row[0], []).append(row)

for key in recs:
    rows = recs[key]
    rows.sort(key=lambda x:x[5], reverse=True)
    for current in rows:
        if current[1] == current[6]:
            break
    else:
        current = rows[0]
    print '\t'.join(current)
#=============================================================
The output is:

aaa AAA ... ... ... 20070120 AAA
bbb AAA ... ... ... 20071212 AAA
ccc AAA ... ... ... 20071212 AAA

and it is the same as the output of your original code on this data.
Further testing would naturally be beneficial.

Cheers,
Jussi
Mar 3 '07 #16
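
The loop in the post above relies on Python's `for ... else`: the `else` branch runs only when the loop finishes without hitting `break`. The sketch below isolates that step in Python 3. The function name `pick` and the two tiny row lists are made up for illustration, and the rows are assumed to be sorted newest first already:

```python
def pick(rows):
    """Return the first row whose state (field 1) equals its desired
    state (field 6), or the first row overall if none matches.

    `rows` is a non-empty list of field lists, sorted newest first.
    """
    for current in rows:
        if current[1] == current[6]:
            break              # found a matching row; skip the else branch
    else:
        current = rows[0]      # no break happened: fall back to the newest row
    return current


with_match = [
    ["aaa", "AAA", ".", ".", ".", "20071212", "BBB"],
    ["aaa", "AAA", ".", ".", ".", "20070120", "AAA"],
]
without_match = [
    ["bbb", "AAA", ".", ".", ".", "20071212", "BBB"],
    ["bbb", "AAA", ".", ".", ".", "20070612", "BBB"],
]

print(pick(with_match)[5])
print(pick(without_match)[5])
```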
John Machin wrote:
> On Mar 3, 12:36 pm, Bruno Desthuilliers wrote:
> [snip]
>>     DATE = 5
>>     TARGET = 6
>
> [snip]
>> Now for the bad news: I'm afraid your algorithm is broken: here are my
>> test data and results:
>>
>> input = [
>>     #ID STATE ... ... ... TARG DATE
>>     "aaa\tAAA\t...\t...\t...\tBBB\t20071212\n",
>
> [snip]
>
> Bruno, The worse news is that your test data is broken.

Re-reading the OP's specs, the bad news is that my only neuron left is
broken. Shouldn't code at 2 o'clock in the morning :(
Mar 4 '07 #17
Bjoern Schliessmann wrote:
> Bruno Desthuilliers wrote:
>>> Shawn Milo wrote:
>>>
>>> if recs.has_key(piid) is False:
>>
>> 'is' is the identity operator - practically, in CPython, it
>> compares memory addresses. You *don't* want to use it here.
>
> It's recommended to use "is None"; why not "is False"? Are there
> multiple False instances or is False generated somehow?

Once upon a time, Python didn't have a "proper" boolean type. It only
had rules for boolean evaluation of a given object. According to these
rules (which of course still apply), empty strings, lists, tuples or
dicts, numeric zeros and None are false in a boolean context. IOW, an
expression can evaluate to false without actually being the False object
itself. So the result of using the identity operator to test against
such an expression, while being clearly defined, may not be exactly what
you'd think.

To make a long story short:

if not []:
    print "the empty list evals to false in a boolean context"

if [] is False:
    print "this python interpreter is broken"

HTH
Mar 4 '07 #18
Shawn Milo wrote:
> (snip)
> The script reads a file from standard input and
> finds the best record for each unique ID (piid). The best is defined
> as follows: The newest expiration date (field 5) for the record with
> the state (field 1) which matches the desired state (field 6). If
> there is no record matching the desired state, then just take the
> newest expiration date.

Here's a fixed (wrt/ test data) version with a somewhat better (and
faster) algorithm using Decorate/Sort/Undecorate (aka Schwartzian transform):

import sys
output = sys.stdout

input = [
    #ID STATE ... ... ... DATE TARGET
    "aaa\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20070120\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20071010\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20071111\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071201\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20070612\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
]

def find_best_match(input=input, output=output):
    PIID = 0
    STATE = 1
    EXP_DATE = 5
    DESIRED_STATE = 6

    recs = {}
    for line in input:
        line = line.rstrip('\n')
        row = line.split('\t')
        sort_key = (row[STATE] == row[DESIRED_STATE], row[EXP_DATE])
        recs.setdefault(row[PIID], []).append((sort_key, line))

    for decorated_lines in recs.itervalues():
        print >output, sorted(decorated_lines, reverse=True)[0][1]

Lines are sorted first on whether the state matches the desired state,
then on the expiration date. Since it's a reverse sort, we first have
lines that match (if any) sorted by date descending, then the lines that
don't match sorted by date descending. So in both cases, the 'best match'
is the first item in the list. Then we just have to get rid of the sort
key, et voilà !-)

HTH
Mar 4 '07 #19
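
The ordering described above can be checked on a tiny group. This is a Python 3 sketch with made-up three-field records (state, expiration date, desired state) rather than the full seven-field rows. It works because tuples compare element by element and `False` sorts before `True`:

```python
# Three illustrative records: (state, expiration date, desired state).
group = [
    ("AAA", "20071212", "BBB"),   # newest, but state != desired state
    ("AAA", "20070120", "AAA"),   # matches, newer of the two matches
    ("AAA", "20070101", "AAA"),   # matches, older
]


def sort_key(record):
    """Build the (matches, date) key used for ranking."""
    state, exp_date, desired = record
    # Booleans compare as False < True; YYYYMMDD strings compare chronologically.
    return (state == desired, exp_date)


# Reverse sort: matching records first (newest to oldest), then the rest.
ranked = sorted(group, key=sort_key, reverse=True)

for record in ranked:
    print(sort_key(record), record)
```

The first element of `ranked` is the matching record with the newest date, even though a non-matching record has a later date.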
Bruno Desthuilliers wrote:
> print >output, sorted(decorated_lines, reverse=True)[0][1]

Or just

    print >output, max(decorated_lines)[1]

Peter
Mar 4 '07 #20
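
Putting the `max` suggestion together with the grouping step, a Python 3 version might look like the sketch below. It is one possible arrangement, not the code anyone in the thread posted: the function name `best_records` and the use of `collections.defaultdict` and a `key=` function (instead of decorated tuples) are choices made for this example.

```python
from collections import defaultdict

# Field positions as given in the original post.
PIID, STATE, EXP_DATE, DESIRED_STATE = 0, 1, 5, 6


def best_records(lines):
    """Map each ID to its best line: prefer rows whose state matches the
    desired state, and among equally preferred rows take the newest date."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        groups[line.split("\t")[PIID]].append(line)

    def rank(line):
        row = line.split("\t")
        return (row[STATE] == row[DESIRED_STATE], row[EXP_DATE])

    # max() scans each group once, so no full sort is needed.
    return {piid: max(rows, key=rank) for piid, rows in groups.items()}


sample = [
    "aaa\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20070120\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
]

for piid, line in sorted(best_records(sample).items()):
    print(line)
```

To read from standard input as the original script does, `best_records(sys.stdin)` should work after `import sys`, since the function only iterates over its argument.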
