hi
i'm learning python, and one area i'd use it for is data management in
scientific computing. in the case i've tried i want to reformat a data
file from a normalised list to a matrix with some sorted columns. to
do this at the moment i am using perl, which is very easy to do, and i
want to see if python is as easy.
so, the data i am using is some epiphyte population abundance data for
particular sites, and it looks like this:
1.00 1.00 1.00 "MO" 906.00 "genus species 1" 1.00
1.00 1.00 1.00 "MO" 906.00 "genus species 2" 1.00
1.00 1.00 1.00 "MO" 906.00 "genus species 3" 1.00
1.00 1.00 1.00 "MO" 906.00 "genus species 4" 1.00
(i have changed the data to protect the innocent) the first four
columns relate to the location, the fifth to the substrate, the sixth
is the epiphyte species and the seventh the abundance. i need to turn
this into a substrate x species matrix with columns 1 to 4 retained as
sorting columns and the intersection of species and substrate being the
abundance. the species name needs to be the column headers. this is
going to go into a multivariate analysis of variance programme that
only takes its data in that format. here is an example of the output
region location site stand substrate genus species 1 genus species
2 genus species 3 genus species 4 genus species 5 genus species
6 genus species 7
<..etc..>
1 1 1 MO 906 0 0 0 0 0 0 0 0 0 0 0 0 0 0
<..etc...>
so, to do this in perl - and i won't bore you with the whole script -
i read the file, split it into tokens and then populate a hash of
hashes, the syntax of which is
$HoH{$tokens[0]}{$tokens[1]}{$tokens[2]}{$tokens[3]}{$tokens[4]}{$tokens[5]}
= $tokens[6]
with the various location and species values as the keys of the hash,
and the abundance as the $tokens[6] value. this now gives me a
multidimensional data structure that i can use to loop over the keys
and sort them by each as i go, then to write out the data into a
matrix as above. the syntax for this is generally like
# level 1 - region
foreach $region (sort {$a <=> $b} keys %HoH) {
    # level 2 - location
    foreach $location (sort {$a <=> $b} keys %{ $HoH{$region} }) {
        # level 3 - site
        foreach $site (sort {$a <=> $b} keys %{ $HoH{$region}{$location} })
        <... etc ...>
there is a bit more perl obviously, but that is the general gist of
it. multidimensional hash and then looping and sorting to get the data
out.
ok. so how do i do this in python? i've tried the "perlish" way but
didn't get very far, however i know it must be able to be done!
if you want to respond to this, try benmoretti at yahoo dot com dot au
as i get too much spam otherwise
cheers
ben
The best solution would probably be to rely on a database that supports
pivot tables.
However, I've put together a simple class to generate a pivot table to get
you started. It's only 2D, i.e. f(row,col) -> value, but if I have
understood you correctly that should be sufficient (I am not good at
reading perl).
To read your data from a (text) file, have a look at Python's csv module.
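As a rough sketch of that suggestion (Python 3 syntax; the sample lines are copied from the question, and `io.StringIO` stands in for a real file purely for illustration), `csv.reader` with a space delimiter should keep each quoted species name together as one field:

```python
import csv
import io

# Stand-in for the real data file; the lines are copied from the question.
sample = io.StringIO(
    '1.00 1.00 1.00 "MO" 906.00 "genus species 1" 1.00\n'
    '1.00 1.00 1.00 "MO" 906.00 "genus species 2" 1.00\n'
)

# delimiter=' ' splits on single spaces; quotechar='"' protects the
# spaces inside the quoted species names.
records = list(csv.reader(sample, delimiter=' ', quotechar='"'))

for record in records:
    print(record)
```

Every field comes back as a string, so the numeric columns still need converting before they are sorted or summed.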
Peter
<code>
class Adder(object):
    """ Adds all values entered via set()
    """
    def __init__(self, value=0):
        self.value = value
    def set(self, value):
        self.value += value
    def get(self):
        return self.value

_none = object()

class First(object):
    """ Accepts any value the first time set() is called,
        requires the same value on subsequent calls of set().
    """
    def __init__(self):
        self.value = _none
    def set(self, value):
        if self.value is _none:
            self.value = value
        elif value != self.value:
            raise ValueError("%s expected but got %s" % (self.value, value))
    def get(self):
        return self.value

class Pivot(object):
    """ A simple Pivot table generator class
    """
    def __init__(self, valueAccumulator, rowHeaders):
        self.rows = set()
        self.columns = set()
        self.values = {}  # maps (row, column) -> accumulator instance
        self.valueAccumulator = valueAccumulator
        self.rowHeaders = rowHeaders
    def extend(self, table, extractRow, extractColumn, extractValue):
        for record in table:
            r = extractRow(record)
            c = extractColumn(record)
            self.rows.add(r)
            self.columns.add(c)
            try:
                fxy = self.values[r, c]
            except KeyError:
                fxy = self.valueAccumulator()
                self.values[r, c] = fxy
            fxy.set(extractValue(record))
    def toTable(self, defaultValue=None, columnKey=None, rowKey=None):
        """ returns a list of lists.
        """
        table = []
        rows = sorted(self.rows, key=rowKey)
        columns = sorted(self.columns, key=columnKey)
        headers = self.rowHeaders + columns
        table.append(headers)
        for row in rows:
            record = list(row)
            for column in columns:
                v = self.values.get((row, column))
                if v is None:
                    v = defaultValue  # no entry for this cell
                else:
                    v = v.get()
                record.append(v)
            table.append(record)
        return table

def printTable(p):
    for row in p.toTable():
        print(row)

if __name__ == "__main__":
    table = [
        "Jack Welsh Beer 1",
        "Richard Maier Beer 1",
        "Bill Bush Wine 2",
        "Bill Bush Wine 2",
    ]
    table = [row.split() for row in table]
    print(table)
    print("-" * 10)
    p = Pivot(Adder, ["Christian", "Surname"])
    def extractRow(record):
        return record[0], record[1]
    def extractValue(record):
        return int(record[3])
    def extractColumn(record):
        return record[2]
    p.extend(table, extractRow, extractColumn, extractValue)
    printTable(p)

    columns = "region location site stand substrate species abundance".split()
    table = [
        [1.0, 1.0, 1.0, "MO", 906, "species 1", 1],
        [1.0, 1.0, 1.0, "MO", 906, "species 2", 1],
        [1.0, 1.0, 1.0, "MO", 906, "species 3", 1],
        [1.0, 1.0, 1.0, "MO", 906, "species 1", 1],
        [1.0, 1.0, 1.0, "GO", 706, "species 4", 1],
        # [1.0, 1.0, 1.0, "GO", 706, "species 4", 2],# uncomment me
        [1.0, 1.0, 1.0, "GO", 806, "species 1", 1],
        [1.0, 1.0, 1.0, "GO", 906, "species 1", 1],
        [1.0, 1.0, 1.0, "GO", 106, "species 1", 1],
    ]
    p = Pivot(First, columns[:5])
    p.extend(table, lambda r: tuple(r[:5]),
             lambda r: r[5],
             lambda r: r[6])
    printTable(p)
</code>
bm******@chariot.net.au (ben moretti) wrote:
i'm learning python, and one area i'd use it for is data management in scientific computing. in the case i've tried i want to reformat a data file from a normalised list to a matrix with some sorted columns. to do this at the moment i am using perl, which is very easy to do, and i want to see if python is as easy.
Not being too familiar with Perl (or scientific computing), I'm not
sure if I understood everything correctly...
1.00 1.00 1.00 "MO" 906.00 "genus species 1" 1.00
1.00 1.00 1.00 "MO" 906.00 "genus species 2" 1.00
1.00 1.00 1.00 "MO" 906.00 "genus species 3" 1.00
1.00 1.00 1.00 "MO" 906.00 "genus species 4" 1.00
I _think_ you want your data as a nested dictionary like so:
{1: {1: {1: {"MO": {906: {"genus species 1": 1,
"genus species 2": 1,
"genus species 3": 1,
"genus species 4": 1} }}}}}
so, to do this in perl - and i won't bore you with the whole script - i read the file, split it into tokens
I hope I will NOT bore you with a whole script, but I've expanded your
data a bit to have a somewhat more complicated/structured data file to
work with (not shown here, this's more than long enough as it is); so
I'll first read it in and split it up:
###
import csv
f = open(r"i:\python\nestedtest.txt", "r") # my testdata
csvreader = csv.reader(f, delimiter=' ', quotechar='"')
###
From your output I gather that maybe you want the numbers as numbers, and
not as strings, so I'll convert the data while populating an
intermediate list:
###
def parselist(lst):
    """convert the list's values to floats or integers where
    appropriate"""
    parsed = []
    for itm in lst:
        try:
            f = float(itm)
            i = int(f)
            if i == f:
                parsed.append(int(i))
            else:
                parsed.append(f)
        except ValueError:
            parsed.append(itm)
    return parsed

datalist = []
for line in csvreader:
    datalist.append(parselist(line))
f.close() # don't need that anymore
###
and then populate a hash of hashes, the syntax of which is
$HoH{$tokens[0]}{$tokens[1]}{$tokens[2]}{$tokens[3]}{$tokens[4]}{$tokens[5]} = $tokens[6]
Now, if that does what I think it does (create a nested hash), then
hats off to Perl! I haven't found anything as concise built into
Python (but then I'm not a guru, maybe someone else knows a better
way?), so I rolled my own:
###
def nestdict(lst):
    """create a recursively nested dictionary from a _flat_ list"""
    dct = {}
    if len(lst) > 2:
        dct[lst[0]] = nestdict(lst[1:])
    elif len(lst) == 2:
        dct[lst[0]] = lst[1]
    return dct
###
which is good for ONE line of input; since I have a list of those, I
want to build up the dictionary line by line, for which I need another
function:
###
def nestextend(dct, upd):
    """recursively extend/update a nested dictionary with another one"""
    for key, val in upd.items():
        if isinstance(val, dict) and isinstance(dct.get(key), dict):
            # both sides are dictionaries: merge one level further down
            nestextend(dct[key], val)
        else:
            # new key, or a leaf value: store (or overwrite) it
            dct[key] = val

datadict = {}
for lst in datalist:
    nestextend(datadict, nestdict(lst))
###
datadict now holds all the data from the testfile in a nested
dictionary with the various locations and species values as the keys
of the hash, which is what (I hope) you wanted.
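For comparison, here is a minimal sketch of something closer to Perl's behaviour, using `collections.defaultdict` from the standard library (Python 3 syntax). The helper name `tree` and the sample rows are made up for illustration; the column order follows the question.

```python
from collections import defaultdict

def tree():
    """Return a dictionary that creates nested dictionaries on demand."""
    return defaultdict(tree)

# Hypothetical sample rows in the column order of the question:
# region, location, site, stand, substrate, species, abundance.
rows = [
    [1, 1, 1, "MO", 906, "genus species 1", 1],
    [1, 1, 1, "MO", 906, "genus species 2", 1],
    [1, 1, 2, "GO", 706, "genus species 1", 3],
]

hoh = tree()
for region, location, site, stand, substrate, species, abundance in rows:
    # Missing intermediate levels are created automatically, much like
    # $HoH{...}{...} = value in the Perl original.
    hoh[region][location][site][stand][substrate][species] = abundance

print(hoh[1][1][1]["MO"][906]["genus species 2"])
print(sorted(hoh[1][1]))
```

One caveat with this approach: merely looking up a missing key also creates it, so membership tests should use `in` rather than indexing.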
and the abundance is the $tokens[6] value. this now gives me a multidimensional data structure
Reading that I'm not sure I've understood anything - shouldn't you
want to use a multidimensional array for that? Anyone familiar with
Python's scientific/number crunching/array libraries should be able to
clear that up...
that i can use to loop over the keys and sort them by each as i go, then to write out the data into a matrix as above.
I'm not sure how you arrive at your matrix output, but looping over
the dictionary shouldn't be a problem now. However, since you also
want to sort the data (by key), and dictionaries notoriously don't
support that, I've written another function:
###
def nestsort(dct):
    """convert a nested dictionary to a nested (key, value) list,
    recursively sorting it by key"""
    lst = []
    try:
        items = sorted(dct.items())
        for key, value in items:
            lst.append([key, nestsort(value)])
        return lst
    except AttributeError:
        # not a dictionary, so this is a leaf value
        return dct

sorteddata = nestsort(datadict)
###
So now the data from the beginning looks like:
[1, [1, [1, ["MO", [906, ["genus species 1", 1],
["genus species 2", 1],
["genus species 3", 1],
["genus species 4", 1] ]]]]]
which you probably could have had cheaper...
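One possibly cheaper route, sketched here in Python 3 on the assumption that each record is a flat list in the column order of the question (the sample rows are made up): Python compares lists element by element, so sorting the flat records directly gives the same ordering without building the nested structure first. `itertools.groupby` can then collect the species that belong to each location key.

```python
from itertools import groupby

# Hypothetical records: region, location, site, stand, substrate,
# species, abundance.
records = [
    [1, 2, 1, "MO", 906, "genus species 2", 4],
    [1, 1, 1, "MO", 906, "genus species 3", 1],
    [1, 1, 1, "GO", 706, "genus species 1", 2],
    [1, 1, 1, "MO", 906, "genus species 1", 1],
]

# Lists compare element by element, so one sort orders the records by
# region, then location, site, stand, substrate and species.
records.sort()

# Group consecutive records that share the first five columns.
grouped = []
for key, group in groupby(records, key=lambda rec: tuple(rec[:5])):
    grouped.append((key, [(rec[5], rec[6]) for rec in group]))

for key, species in grouped:
    print(key, species)
```

This only works when every column holds values of one comparable type, for example all numbers or all strings within a column.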
Now you can do something like:
###
for region, rdata in sorteddata:
    print("Region", region)
    for location, ldata in rdata:
        print(" " * 2 + "Location", location)
        for site, sitedata in ldata:
            print(" " * 4 + "Site", site)
            for stand, stdata in sitedata:
                print(" " * 6 + "Stand", stand)
                for substrate, subdata in stdata:
                    print(" " * 8 + "Substrate", substrate)
                    for genus, abundance in subdata:
                        print(" " * 10 + "Genus", genus, "Abundance", abundance)
###
to test my script and your (real) data.
There's next to no error-checking and it sure'd be more
pythonic/beautiful/reusable if I'd subclass'd dict, but it works --
for my data at least.
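As for the matrix output itself, here is a minimal sketch in Python 3 with made-up sample records, not the original poster's real data. The function name `build_matrix` is hypothetical, and filling empty cells with 0 is an assumption based on the example output in the question. The idea is to collect abundances in a dictionary keyed by (row, species), then emit one line per sorted row key.

```python
def build_matrix(records, fill=0):
    """Pivot flat records into a header row plus one row per location key."""
    cells = {}
    for rec in records:
        row_key, species, abundance = tuple(rec[:5]), rec[5], rec[6]
        cells[row_key, species] = abundance  # a repeated pair overwrites

    row_keys = sorted({row for row, _ in cells})
    species_names = sorted({sp for _, sp in cells})

    header = ["region", "location", "site", "stand", "substrate"] + species_names
    matrix = [header]
    for row in row_keys:
        # One cell per species column; unrecorded species get the fill value.
        matrix.append(list(row) + [cells.get((row, sp), fill) for sp in species_names])
    return matrix

# Hypothetical records: region, location, site, stand, substrate,
# species, abundance.
records = [
    [1, 1, 1, "MO", 906, "genus species 1", 1],
    [1, 1, 1, "MO", 906, "genus species 2", 1],
    [1, 1, 1, "GO", 706, "genus species 3", 2],
]

matrix = build_matrix(records)
for line in matrix:
    print("\t".join(str(item) for item in line))
```

If the same row and species can legitimately appear more than once, the overwrite in `cells[row_key, species] = abundance` would need to become a sum or an explicit error instead.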
ok. so how do i do this in python? i've tried the "perlish" way but
Once more, it seems that "the perlish way" <> "the python way".
didn't get very far, however i know it must be able to be done!
I don't think there's much of anything either language can do that the
other can't, but of course some things are harder than others...
if you want to respond to this, try benmoretti at yahoo dot com dot au as i get too much spam otherwise
<posted to the NG and forwarded to you>
--
Christopher