I'm new to Python and fairly experienced in Perl, although that
experience is limited to the things I use daily.
I wrote the same script in both Perl and Python, and the output is
identical. The run speed is similar (very fast) and the line count is
similar.
Now that they're both working, I was looking at the code and wondering
what Perl-specific and Python-specific improvements to the code would
look like, as judged by others more knowledgeable in the individual
languages.
I am not looking for the smallest number of lines, or anything else
that would make the code more difficult to read in six months. Just
any instances where I'm doing something inefficiently or in a "bad"
way.
I'm attaching both the Perl and Python versions, and I'm open to
comments on either. The script reads a file from standard input and
finds the best record for each unique ID (piid). The best is defined
as follows: The newest expiration date (field 5) for the record with
the state (field 1) which matches the desired state (field 6). If
there is no record matching the desired state, then just take the
newest expiration date.
Thanks for taking the time to look at these.
Shawn
##########################################################################
Perl code:
##########################################################################
#! /usr/bin/env perl
use warnings;
use strict;
my $piid;
my $row;
my %input;
my $best;
my $curr;
foreach $row (<>){
    chomp($row);
    $piid = (split(/\t/, $row))[0];
    push ( @{$input{$piid} }, $row );
}
for $piid (keys(%input)){
    $best = "";
    for $curr (@{$input{$piid }}){
        if ($best eq ""){
            $best = $curr;
        }else{
            #If the current record is the correct state
            if ((split(/\t/, $curr))[1] eq (split(/\t/, $curr))[6]){
                #If existing record is the correct state
                if ((split(/\t/, $best))[1] eq (split(/\t/, $curr))[6]){
                    if ((split(/\t/, $curr))[5] gt (split(/\t/, $best))[5]){
                        $best = $curr;
                    }
                }else{
                    $best = $curr;
                }
            }else{
                #if the existing record does not have the correct state
                #and the new one has a newer expiration date
                if (((split(/\t/, $best))[1] ne (split(/\t/, $curr))[6]) and
                    ((split(/\t/, $curr))[5] gt (split(/\t/, $best))[5])){
                    $best = $curr;
                }
            }
        }
    }
    print "$best\n";
}
##########################################################################
End Perl code
##########################################################################
##########################################################################
Python code
##########################################################################
#! /usr/bin/env python
import sys
input = sys.stdin
recs = {}
for row in input:
    row = row.rstrip('\n')
    piid = row.split('\t')[0]
    if recs.has_key(piid) is False:
        recs[piid] = []
    recs[piid].append(row)
for piid in recs.keys():
    best = ""
    for current in recs[piid]:
        if best == "":
            best = current;
        else:
            #If the current record is the correct state
            if current.split("\t")[1] == current.split("\t")[6]:
                #If the existing record is the correct state
                if best.split("\t")[1] == best.split("\t")[6]:
                    #If the new record has a newer exp. date
                    if current.split("\t")[5] > best.split("\t")[5]:
                        best = current
                else:
                    best = current
            else:
                #If the existing record does not have the correct state
                #and the new record has a newer exp. date
                if best.split("\t")[1] != best.split("\t")[6] and \
                   current.split("\t")[5] > best.split("\t")[5]:
                    best = current
    print best
##########################################################################
End Python code
##########################################################################
Mar 2 '07
In <54*************@mid.individual.net>, Bjoern Schliessmann wrote:
> Bruno Desthuilliers wrote:
>> Shawn Milo wrote:
>>> if recs.has_key(piid) is False:
>> 'is' is the identity operator - practically, in CPython, it compares memory addresses. You *don't* want to use it here.
> It's recommended to use "is None"; why not "is False"? Are there
> multiple False instances or is False generated somehow?
Before `True` and `False` existed many people defined them as aliases to 1
and 0. And of course there are *many* other objects that can be used in a
boolean context of an ``if`` statement for testing "trueness" and
"falseness".
Ciao,
Marc 'BlackJack' Rintsch
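To make the difference between truthiness, equality and identity concrete, here is a small sketch. It is written for Python 3 (the thread's own code is Python 2), and the names `MY_FALSE` and `describe` are made up for illustration; they do not come from anyone's post.

```python
# Hypothetical alias of the kind people defined before True/False existed.
MY_FALSE = 0

def describe(value):
    """Return (is falsy, compares equal to False, is the False object)."""
    return (not value, value == False, value is False)

# Only the real False object passes the identity test; 0 merely equals it,
# and empty containers or None are falsy without equalling it at all.
for candidate in (False, MY_FALSE, "", [], None):
    print(repr(candidate), describe(candidate))
```

So `has_key(...) is False` only works because `has_key` happens to return the real `False` object; `not recs.has_key(piid)` does not depend on that detail.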
On Mar 3, 7:08 pm, attn.steven....@gmail.com wrote:
On Mar 2, 2:44 pm, "Shawn Milo" <S...@Milochik.com> wrote:
(snipped)
I'm attaching both the Perl and Python versions, and I'm open to
comments on either. The script reads a file from standard input and
finds the best record for each unique ID (piid). The best is defined
as follows: The newest expiration date (field 5) for the record with
the state (field 1) which matches the desired state (field 6). If
there is no record matching the desired state, then just take the
newest expiration date.
Thanks for taking the time to look at these.
My attempts:
### Python (re-working John's code) ###
import sys
def keep_best(best, current):
    ACTUAL_STATE = 1
    # John had these swapped
    DESIRED_STATE = 5
    EXPIRY_DATE = 6
*Bullshit* -- You are confusing me with Bruno; try (re)?reading what
the OP wrote (and which you quoted above):
"""
The newest expiration date (field 5) for the record with
the state (field 1) which matches the desired state (field 6).
"""
and his code (indented a little less boisterously):
"""
#If the current record is the correct state
if current.split("\t")[1] == current.split("\t")[6]:
    #If the existing record is the correct state
    if best.split("\t")[1] == best.split("\t")[6]:
        #If the new record has a newer exp. date
        if current.split("\t")[5] > best.split("\t")[5]:
"""
On Saturday 03 March 2007, Ben Finney wrote:
> Bjoern Schliessmann <us**************************@spamgourmet.com> writes:
> if not recs.has_key(piid): # [1]
Why not
if piid not in recs:
That is shorter, simpler, easier to read and very slightly faster. Plus you
can change the data structure of recs later without changing that line so
long as it implements containment testing.
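As a rough illustration of that last point, the sketch below (Python 3; the function name `group_by_piid` and the sample rows are invented for this example) runs the same `not in` test against a plain dict and against a `collections.defaultdict` without changing the line.

```python
from collections import defaultdict

def group_by_piid(rows, recs):
    """Append each tab-separated row to recs[piid], creating lists on demand."""
    for row in rows:
        piid = row.split("\t")[0]
        if piid not in recs:   # works for any container that supports `in`
            recs[piid] = []
        recs[piid].append(row)
    return recs

rows = ["a\t1", "b\t2", "a\t3"]
plain = group_by_piid(rows, {})
fancy = group_by_piid(rows, defaultdict(list))
print(plain)
print(dict(fancy))
```

With `has_key` the loop would be tied to objects that provide that method, which dicts no longer do in Python 3.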
William Heymann <ko**@aesaeion.com> writes:
> On Saturday 03 March 2007, Ben Finney wrote:
>> Bjoern Schliessmann <us**************************@spamgourmet.com> writes:
>> if not recs.has_key(piid): # [1]
> Why not
> if piid not in recs:
> That is shorter, simpler, easier to read and very slightly faster.
Perhaps if I'd made my posting shorter, simpler, easier to read and
slightly faster, you might have read the footnote to which the '[1]'
referred.
--
\ "Choose mnemonic identifiers. If you can't remember what |
`\ mnemonic means, you've got a problem." -- Larry Wall |
_o__) |
Ben Finney
On Mar 2, 10:44 pm, "Shawn Milo" <S...@Milochik.com> wrote:
I'm new to Python and fairly experienced in Perl, although that
experience is limited to the things I use daily.
I wrote the same script in both Perl and Python, and the output is
identical. The run speed is similar (very fast) and the line count is
similar.
Now that they're both working, I was looking at the code and wondering
what Perl-specific and Python-specific improvements to the code would
look like, as judged by others more knowledgeable in the individual
languages.
Hi Shawn, there is a web page that gives examples from Perl's
Datastructures Cookbook re-implemented in Python. It might be of help
for future Python projects: http://wiki.python.org/moin/PerlPhrasebook
- Paddy.
Shawn Milo wrote:
<snip>
I am not looking for the smallest number of lines, or anything else
that would make the code more difficult to read in six months. Just
any instances where I'm doing something inefficiently or in a "bad"
way.
I'm attaching both the Perl and Python versions, and I'm open to
comments on either. The script reads a file from standard input and
finds the best record for each unique ID (piid). The best is defined
as follows: The newest expiration date (field 5) for the record with
the state (field 1) which matches the desired state (field 6). If
there is no record matching the desired state, then just take the
newest expiration date.
I don't know if this attempt satisfies your criteria but here goes!
This is not a rewrite of your program but was created using your problem
description above. I've not included the reading of the data because it
has not much to do with the problem per se.
#============================================================
input = [
    "aaa\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20070120\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20071010\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20071111\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071201\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070612\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
]
input = [x[:-1].split('\t') for x in input]
recs = {}
for row in input:
    recs.setdefault(row[0], []).append(row)
for key in recs:
    rows = recs[key]
    rows.sort(key=lambda x:x[5], reverse=True)
    for current in rows:
        if current[1] == current[6]:
            break
    else:
        current = rows[0]
    print '\t'.join(current)
#============================================================
The output is:
aaa AAA ... ... ... 20070120 AAA
bbb AAA ... ... ... 20071212 AAA
ccc AAA ... ... ... 20071212 AAA
and it is the same as the output of your original code on this data.
Further testing would naturally be beneficial.
Cheers,
Jussi
John Machin wrote:
> On Mar 3, 12:36 pm, Bruno Desthuilliers >
> [snip]
>> DATE = 5
>> TARGET = 6
> [snip]
>> Now for the bad news: I'm afraid your algorithm is broken : here are my test data and results:
>> input = [
>>     #ID STATE ... ... ... TARG DATE
>>     "aaa\tAAA\t...\t...\t...\tBBB\t20071212\n",
> [snip]
> Bruno, The worse news is that your test data is broken.
Re-reading the OP's specs, the bad news is that my only neuron left is
broken. Shouldn't code at 2 o'clock in the morning :(
Bjoern Schliessmann wrote:
> Bruno Desthuilliers wrote:
>> Shawn Milo wrote:
>>> if recs.has_key(piid) is False:
>> 'is' is the identity operator - practically, in CPython, it compares memory addresses. You *don't* want to use it here.
> It's recommended to use "is None"; why not "is False"? Are there
> multiple False instances or is False generated somehow?
Once upon a time, Python didn't have a "proper" boolean type. It only
had rules for boolean evaluation of a given object. According to these
rules - that of course still apply -, empty strings, lists, tuples or
dicts, numeric zeros and None are false in a boolean context. IOW, an
expression can eval to false without actually being the False object
itself. So the result of using the identity operator to test against
such an expression, while being clearly defined, may not be exactly what
you'd think.
To make a long story short:
if not []:
    print "the empty list evals to false in a boolean context"
if [] is False:
    print "this python interpreter is broken"
HTH
Shawn Milo wrote:
(snip)
The script reads a file from standard input and
finds the best record for each unique ID (piid). The best is defined
as follows: The newest expiration date (field 5) for the record with
the state (field 1) which matches the desired state (field 6). If
there is no record matching the desired state, then just take the
newest expiration date.
Here's a fixed (wrt/ test data) version with a somewhat better (and
faster) algorithm using Decorate/Sort/Undecorate (aka Schwartzian transform):
import sys
output = sys.stdout
input = [
    #ID STATE ... ... ... DATE TARGET
    "aaa\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20070120\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20071010\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20071111\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071201\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "ccc\tAAA\t...\t...\t...\t20071212\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20070612\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
]
def find_best_match(input=input, output=output):
    PIID = 0
    STATE = 1
    EXP_DATE = 5
    DESIRED_STATE = 6
    recs = {}
    for line in input:
        line = line.rstrip('\n')
        row = line.split('\t')
        sort_key = (row[STATE] == row[DESIRED_STATE], row[EXP_DATE])
        recs.setdefault(row[PIID], []).append((sort_key, line))
    for decorated_lines in recs.itervalues():
        print >output, sorted(decorated_lines, reverse=True)[0][1]
Lines are sorted first on whether the state matches the desired state,
then on the expiration date. Since it's a reverse sort, we first have
lines that match (if any) sorted by date descending, then the lines that
don't match sorted by date descending. So in both cases, the 'best match'
is the first item in the list. Then we just have to get rid of the sort
key, et voilà !-)
HTH
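The ordering described above relies on how Python compares tuples: element by element, with `True` sorting after `False`. A minimal sketch in Python 3, using made-up keys rather than the test data from the post:

```python
# Each key is (state matches desired state, expiration date as YYYYMMDD text).
# These four keys are invented purely to show the ordering.
keys = [
    (False, "20071212"),
    (True, "20070120"),
    (True, "20070101"),
    (False, "20071111"),
]

# Reverse sort: matching records first, newest date first within each group.
ordered = sorted(keys, reverse=True)
for key in ordered:
    print(key)
```

Comparing the dates as strings is only safe because they are fixed-width YYYYMMDD values, as in the thread's sample data.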
Bruno Desthuilliers wrote:
> print >output, sorted(decorated_lines, reverse=True)[0][1]
Or just
print >output, max(decorated_lines)[1]
Peter
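For readers on Python 3, the `max` idea can also be written with a key function, so no decorated tuples are needed. This is only a sketch under assumptions: the function name `best_records`, the `sample` rows and the choice to return a dict are all invented here, and the field positions are the ones from the original problem statement (state in field 1, expiration date in field 5, desired state in field 6).

```python
from collections import defaultdict

PIID, STATE, EXP_DATE, DESIRED_STATE = 0, 1, 5, 6

def best_records(lines):
    """Map each piid to its best line: state match first, then newest date."""
    groups = defaultdict(list)
    for line in lines:
        row = line.rstrip("\n").split("\t")
        groups[row[PIID]].append(row)

    def sort_key(row):
        # True > False, so matching rows win; dates compare as YYYYMMDD text.
        return (row[STATE] == row[DESIRED_STATE], row[EXP_DATE])

    return {piid: "\t".join(max(rows, key=sort_key))
            for piid, rows in groups.items()}

# Hypothetical sample input in the same seven-field layout.
sample = [
    "aaa\tAAA\tx\tx\tx\t20071212\tBBB\n",
    "aaa\tAAA\tx\tx\tx\t20070120\tAAA\n",
    "aaa\tAAA\tx\tx\tx\t20070101\tAAA\n",
    "bbb\tAAA\tx\tx\tx\t20070612\tBBB\n",
    "bbb\tAAA\tx\tx\tx\t20071212\tBBB\n",
]
best = best_records(sample)
for piid in sorted(best):
    print(best[piid])
```

One difference from the decorated version: when two rows share the same key, `max` with a key function keeps the first one seen, whereas comparing `(sort_key, line)` tuples falls back to comparing the line text.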