Array of dict or lists or ....?

Pat

I can't figure out how to set up a Python data structure to read in data
that looks something like this (albeit somewhat simplified and contrived):
States
Counties
Schools
Classes
Max Allowed Students
Current enrolled Students

Nebraska, Wabash, Newville, Math, 20, 0
Nebraska, Wabash, Newville, Gym, 400, 0
Nebraska, Tingo, Newfille, Gym, 400, 0
Ohio, Dinger, OldSchool, English, 10, 0

With each line I read in, I would create a hash entry and increment the
number of enrolled students.

I wrote a routine in Perl using arrays of hash tables (but the syntax
was a bear) that allowed me to read in the data and with those arrays of
hash tables to arrays of hash tables almost everything was dynamically
assigned.

I was able to fill in the hash tables and determine if any school class
(e.g. Gym) had exceeded the number of max students or if no students had
enrolled.

No, this is not a classroom project. I really need this for my job.
I'm converting my Perl program to Python and this portion has me stumped.

The reason why I'm converting a perfectly working program is because no
one else knows Perl or Python either (but I believe that someone new
would learn Python quicker than Perl) and the Perl program has become
huge and is continuously growing.
Oct 6 '08 #1
> I can't figure out how to set up a Python data structure to read in data
> that looks something like this (albeit somewhat simplified and contrived):
>
> States
> Counties
> Schools
> Classes
> Max Allowed Students
> Current enrolled Students
>
> Nebraska, Wabash, Newville, Math, 20, 0
> Nebraska, Wabash, Newville, Gym, 400, 0
> Nebraska, Tingo, Newfille, Gym, 400, 0
> Ohio, Dinger, OldSchool, English, 10, 0
>
> With each line I read in, I would create a hash entry and increment the
> number of enrolled students.

A python version of what you describe:

class TooManyAttendants(Exception): pass

class Attendence(object):
    def __init__(self, max):
        self.max = int(max)
        self.total = 0
    def accrue(self, other):
        self.total += int(other)
        # raise once the running total exceeds the allowed maximum
        if self.total > self.max: raise TooManyAttendants
    def __str__(self):
        return "%s/%s" % (self.max, self.total)
    __repr__ = __str__

data = {}
for i, line in enumerate(file("input.txt")):
    print line,
    state, county, school, cls, max_students, enrolled = map(
        lambda s: s.strip(),
        line.rstrip("\r\n").split(",")
        )
    try:
        # one setdefault() per level: state -> county -> school -> class
        data.setdefault(
            state, {}).setdefault(
            county, {}).setdefault(
            school, {}).setdefault(
            cls, Attendence(max_students)).accrue(enrolled)
    except TooManyAttendants:
        print "Too many Attendants in line %i" % (i + 1)
print repr(data)
You can then access things like

a = data["Nebraska"]["Wabash"]["Newville"]["Math"]
print a.max, a.total

If capitalization varies, you may have to do something like

data.setdefault(
    state.upper(), {}).setdefault(
    county.upper(), {}).setdefault(
    school.upper(), {}).setdefault(
    cls.upper(), Attendence(max_students)).accrue(enrolled)

to make sure they're normalized into the same groupings.
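
Once the nested dictionary is built, finding classes that are over capacity or empty (the goal stated in the original question) comes down to walking the four levels. The following is only a minimal Python 3 sketch: it assumes plain `(max, total)` tuples at the leaves instead of the `Attendence` class above, and the function name `report` and the enrollment numbers are made up for illustration.

```python
# Hypothetical sample data: state -> county -> school -> class -> (max, total)
data = {
    "Nebraska": {
        "Wabash": {"Newville": {"Math": (20, 25), "Gym": (400, 0)}},
        "Tingo": {"Newfille": {"Gym": (400, 12)}},
    },
    "Ohio": {"Dinger": {"OldSchool": {"English": (10, 10)}}},
}


def report(data):
    """Return (over_capacity, empty) lists of (state, county, school, class)."""
    over, empty = [], []
    for state, counties in data.items():
        for county, schools in counties.items():
            for school, classes in schools.items():
                for cls, (max_students, total) in classes.items():
                    key = (state, county, school, cls)
                    if total > max_students:
                        over.append(key)
                    elif total == 0:
                        empty.append(key)
    return over, empty


over, empty = report(data)
print("Over capacity:", over)
print("No students:", empty)
```

The four nested loops mirror the four key levels, so a change in nesting depth means a change in this function as well.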

-tkc


Oct 7 '08 #2
Tim Chase:
> __repr__ = __str__

I don't know if that's a good practice.

> try:
>     data.setdefault(
>         state, {}).setdefault(
>         county, {}).setdefault(
>         school, {}).setdefault(
>         cls, Attendence(max_students)).accrue(enrolled)
> except TooManyAttendants:

I suggest to decompress that part a little, to make it a little more
readable.

Bye,
bearophile
Oct 7 '08 #3
>> __repr__ = __str__
>
> I don't know if that's a good practice.

I've seen it in a couple places, and it's pretty explicit what
it's doing.

>> try:
>>     data.setdefault(
>>         state, {}).setdefault(
>>         county, {}).setdefault(
>>         school, {}).setdefault(
>>         cls, Attendence(max_students)).accrue(enrolled)
>> except TooManyAttendants:
>
> I suggest to decompress that part a little, to make it a little more
> readable.

I played around with the formatting and didn't really like any of
the formatting I came up with. My other possible alternatives were:

try:
    data \
        .setdefault(state, {}) \
        .setdefault(county, {}) \
        .setdefault(school, {}) \
        .setdefault(cls, Attendence(max_students)) \
        .accrue(enrolled)
except TooManyAttendants:

or

try:
    (data
        .setdefault(state, {})
        .setdefault(county, {})
        .setdefault(school, {})
        .setdefault(cls, Attendence(max_students))
        ).accrue(enrolled)
except TooManyAttendants:

Both accentuate the setdefault() calls grouped with their
parameters, which can be helpful. Which one is "better" is a
matter of personal preference:

* no extra characters but hard to read
* backslashes, or
* an extra pair of parens

-tkc


Oct 7 '08 #4
On Mon, 06 Oct 2008 22:52:29 -0300, Tim Chase
<py*********@tim.thechases.com> wrote:
>>> __repr__ = __str__
>> [be************@lycos.com wrote]
>> I don't know if that's a good practice.
> I've seen it in a couple places, and it's pretty explicit what it's
> doing.

__repr__ is used as a fallback for __str__, so just defining __repr__ (and
leaving out __str__) is enough.
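
A small Python 3 sketch of that fallback, for illustration only. The class names `OnlyRepr` and `OnlyStr` are invented here; the point is that `str()` falls back to `__repr__`, while the reverse does not hold.

```python
class OnlyRepr:
    """Defines __repr__ only; str() falls back to it."""

    def __init__(self, max_students, total):
        self.max = max_students
        self.total = total

    def __repr__(self):
        return "%s/%s" % (self.max, self.total)


class OnlyStr:
    """Defines __str__ only; repr() does NOT fall back to it."""

    def __str__(self):
        return "custom str"


a = OnlyRepr(20, 3)
print(str(a), repr(a))  # both calls go through __repr__

b = OnlyStr()
print(str(b))           # uses __str__
print(repr(b))          # default object repr, not "custom str"
```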

--
Gabriel Genellina

Oct 7 '08 #5
Tim Chase <py*********@tim.thechases.com> writes:
>>> __repr__ = __str__
>>
>> I don't know if that's a good practice.
>
> I've seen it in a couple places, and it's pretty explicit what it's
> doing.

But what's the point? Simply define __repr__, and both repr and str
will pick it up.
Oct 7 '08 #6
Pat
Dennis Lee Bieber wrote:
> On Mon, 06 Oct 2008 19:45:07 -0400, Pat <Pa*@junk.com> declaimed the
> following in comp.lang.python:
>> I can't figure out how to set up a Python data structure to read in data
>> that looks something like this (albeit somewhat simplified and contrived):
>> States
>> Counties
>> Schools
>> Classes
>> Max Allowed Students
>> Current enrolled Students
>>
>> Nebraska, Wabash, Newville, Math, 20, 0
>> Nebraska, Wabash, Newville, Gym, 400, 0
>> Nebraska, Tingo, Newfille, Gym, 400, 0
>> Ohio, Dinger, OldSchool, English, 10, 0
>>
>> <snip>
> The structure looks more suited to a database -- maybe SQLite since
> the interface is supplied with the newer versions of Python (and
> available for older versions).

I don't understand why I need a database when it should just be a matter
of defining the data structure. I used a fictional example to make it
easier to (hopefully) convey how the data is laid out.

One of the routines in the actual program checks a few thousand
computers to verify that certain processes are running. I didn't want
to complicate my original question by going through all of the gory
details (multiple userids running many processes with some of the
processes having the same name). To save time, I fork a process for
each computer that I'm checking. It seems to me that banging away at a
database would greatly slow down the program and make the program more
complicated.

The Perl routine works fine and I'd like to emulate that behavior but
since I've just started learning Python I don't know the syntax for
designing the data structure. I would really appreciate it if someone
could point me in the right direction.
Oct 7 '08 #7
Would the following be suitable data structure:
....
struct = {}
struct["Nebraska"] = "Wabash"
struct["Nebraska"]["Wabash"] = "Newville"
struct["Nebraska"]["Wabash"]["Newville"]["topics"] = "Math"
struct["Nebraska"]["Wabash"]["Newville"]["Math"]["Max Allowed Students"] = 20
struct["Nebraska"]["Wabash"]["Newville"]["Math"]["Current enrolled Students"] = 0
....

Have an easy Yom Kippur,
Ron.

Oct 7 '08 #8
On Oct 7, 10:16 am, "Barak, Ron" <Ron.Ba...@lsi.com> wrote:
> Would the following be suitable data structure:
> ...
> struct = {}
> struct["Nebraska"] = "Wabash"
> struct["Nebraska"]["Wabash"] = "Newville"
> struct["Nebraska"]["Wabash"]["Newville"]["topics"] = "Math"
> struct["Nebraska"]["Wabash"]["Newville"]["Math"]["Max Allowed Students"] = 20
> struct["Nebraska"]["Wabash"]["Newville"]["Math"]["Current enrolled Students"] = 0
> ...

That's not quite right as stated.

>>> struct = {}
>>> struct["Nebraska"] = "Wabash"
>>> struct["Nebraska"]["Wabash"] = "Newville"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

Oct 7 '08 #9
-----Original Message-----
From: py********************************@python.org [mailto:python-
li*************************@python.org] On Behalf Of Pat
Sent: Tuesday, October 07, 2008 10:16 PM
To: py*********@python.org
Subject: Re: Array of dict or lists or ....?

> The Perl routine works fine and I'd like to emulate that behavior but
> since I've just started learning Python I don't know the syntax for
> designing the data structure. I would really appreciate it if someone
> could point me in the right direction.

states = {}

if 'georgia' not in states:
states['georgia'] = {}

states['georgia']['fulton'] = {}
states['georgia']['fulton']['ps101'] = {}
states['georgia']['fulton']['ps101']['math'] = {}
states['georgia']['fulton']['ps101']['math']['max'] = 100
states['georgia']['fulton']['ps101']['math']['current'] = 33
states['georgia']['dekalb'] = {}
states['georgia']['dekalb']['ps202'] = {}
states['georgia']['dekalb']['ps202']['english'] = {}
states['georgia']['dekalb']['ps202']['english']['max'] = 500
states['georgia']['dekalb']['ps202']['english']['current'] = 44

print states
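
The hand-written assignments above can also be generated from the input lines. The following is a hedged Python 3 sketch, not the poster's code: it assumes the sixth field of each line is added to the running enrollment, uses `setdefault` to create the intermediate dictionaries on demand, and the `lines` list (with invented enrollment numbers) stands in for the real input file.

```python
# Stand-in for the input file; the enrollment numbers are invented.
lines = [
    "Nebraska, Wabash, Newville, Math, 20, 5",
    "Nebraska, Wabash, Newville, Math, 20, 7",
    "Nebraska, Wabash, Newville, Gym, 400, 0",
    "Ohio, Dinger, OldSchool, English, 10, 11",
]

states = {}
for line in lines:
    state, county, school, cls, max_students, enrolled = [
        field.strip() for field in line.split(",")
    ]
    # setdefault returns the existing inner dict, or stores and returns a new one.
    entry = (
        states.setdefault(state, {})
        .setdefault(county, {})
        .setdefault(school, {})
        .setdefault(cls, {"max": int(max_students), "current": 0})
    )
    entry["current"] += int(enrolled)

math_class = states["Nebraska"]["Wabash"]["Newville"]["Math"]
print(math_class)
```

Each `setdefault` call handles the "create it if it is missing" case that the explicit `if 'georgia' not in states` test covers by hand.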
Oct 7 '08 #10
On Oct 7, 10:15 pm, Pat <P...@junk.net> wrote:
> Dennis Lee Bieber wrote:
>> On Mon, 06 Oct 2008 19:45:07 -0400, Pat <P...@junk.com> declaimed the
>> following in comp.lang.python:
>>> I can't figure out how to set up a Python data structure to read in data
>>> that looks something like this (albeit somewhat simplified and contrived):
>>> States
>>> Counties
>>> Schools
>>> Classes
>>> Max Allowed Students
>>> Current enrolled Students
>>> Nebraska, Wabash, Newville, Math, 20, 0
>>> Nebraska, Wabash, Newville, Gym, 400, 0
>>> Nebraska, Tingo, Newfille, Gym, 400, 0
>>> Ohio, Dinger, OldSchool, English, 10, 0
>>> <snip>
>> The structure looks more suited to a database -- maybe SQLite since
>> the interface is supplied with the newer versions of Python (and
>> available for older versions).

Seconded.

> I don't understand why I need a database when it should just be
> a matter of defining the data structure.

Picking an appropriate data structure depends on the kind of
functionality you want to provide. So far you basically described just
one requirement: keep a tally of how many students are in each class
and compare it to the max allowed (and zero). If that's the only kind
of query you want to run against your data, there's no reason to index
separately each state, county, or school; all you care about are
classes. A simple data structure that satisfies perfectly the
requirement could then be:

# mapping of {class-info : (max,enrolled)}

data = {
('Nebraska', 'Wabash', 'Newville', 'Math') : (20, 0),
('Nebraska', 'Wabash', 'Newville', 'Gym') : (400, 0),
('Nebraska', 'Tingo', 'Newville', 'Gym') : (400, 0),
('Ohio', 'Dinger', 'OldSchool', 'English') : (10, 0),
}

Of course this data structure is pretty bad at answering a query like
"how many classes are there in Nebraska" or "what's the average number
of enrolled students in Newville". The more general information you
might want to get from the data, the more obvious it becomes that you
need a real database.
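
To make the trade-off concrete, here is a minimal Python 3 sketch of the flat, tuple-keyed mapping. It is an illustration under assumptions: the values are lists rather than tuples so the tally can be updated in place, and the eleven enrollments are invented.

```python
# {(state, county, school, class): [max, enrolled]}
data = {
    ("Nebraska", "Wabash", "Newville", "Math"): [20, 0],
    ("Nebraska", "Wabash", "Newville", "Gym"): [400, 0],
    ("Nebraska", "Tingo", "Newville", "Gym"): [400, 0],
    ("Ohio", "Dinger", "OldSchool", "English"): [10, 0],
}

# Hypothetical enrollments to tally: eleven students sign up for one class.
for key in [("Ohio", "Dinger", "OldSchool", "English")] * 11:
    data[key][1] += 1

# The one stated requirement is a single pass over the values.
over = [key for key, (max_students, n) in data.items() if n > max_students]
empty = [key for key, (max_students, n) in data.items() if n == 0]

# A per-state question has no index to use and must scan every key.
nebraska_classes = sum(1 for (state, _, _, _) in data if state == "Nebraska")

print(over, len(empty), nebraska_classes)
```

The capacity check stays simple, while every query by state, county, or school becomes a full scan with its own filtering code.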

HTH,
George
Oct 7 '08 #11
George Sakkis <ge***********@gmail.com> writes:
> On Oct 7, 10:15 pm, Pat <P...@junk.net> wrote:
>> I don't understand why I need a database when it should just be a
>> matter of defining the data structure.
>
> Picking an appropriate data structure depends on the kind of
> functionality you want to provide.
> […]
> The more general information you might want to get from the data,
> the more obvious it becomes that you need a real database.

Thanks very much for posting this answer; I tried to do something
similar but couldn't get at the essential points the way you did here.

Perhaps the original poster is confusing “you should use a database”
with “you should use a database stored in a fully-concurrent
dedicated database management system”.

Far from it: with Python 2.5 you have SQLite (in the ‘sqlite3’
module), which would be ideal for implementing a powerful relational
SQL database used directly by one program instance, without needing a
full-blown database management system in a separately administered
server application.
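
A hedged Python 3 sketch of what that could look like with the standard `sqlite3` module and an in-memory database. The table name `classes`, the column names, and the enrollment numbers are assumptions made for this example, not anything from the original program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # no server; pass a filename to persist
conn.execute(
    """CREATE TABLE classes (
           state TEXT, county TEXT, school TEXT, class_name TEXT,
           max_students INTEGER, enrolled INTEGER,
           PRIMARY KEY (state, county, school, class_name))"""
)
rows = [
    ("Nebraska", "Wabash", "Newville", "Math", 20, 25),
    ("Nebraska", "Wabash", "Newville", "Gym", 400, 0),
    ("Nebraska", "Tingo", "Newfille", "Gym", 400, 12),
    ("Ohio", "Dinger", "OldSchool", "English", 10, 10),
]
conn.executemany("INSERT INTO classes VALUES (?, ?, ?, ?, ?, ?)", rows)

# Classes over their limit, classes with nobody enrolled, classes per state.
over = conn.execute(
    "SELECT state, school, class_name FROM classes WHERE enrolled > max_students"
).fetchall()
empty = conn.execute(
    "SELECT state, school, class_name FROM classes WHERE enrolled = 0"
).fetchall()
per_state = conn.execute(
    "SELECT state, COUNT(*) FROM classes GROUP BY state ORDER BY state"
).fetchall()

print(over, empty, per_state)
conn.close()
```

Each new question about the data becomes one more `SELECT` rather than another hand-written traversal.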

--
\ “Patience, n. A minor form of despair, disguised as a virtue.” |
`\ —Ambrose Bierce, _The Devil's Dictionary_, 1906 |
_o__) |
Ben Finney
Oct 8 '08 #12
On Tue, 07 Oct 2008 23:15:54 -0300, Pat <Pa*@junk.net> wrote:
> Dennis Lee Bieber wrote:
>> On Mon, 06 Oct 2008 19:45:07 -0400, Pat <Pa*@junk.com> declaimed the
>> following in comp.lang.python:
>>> I can't figure out how to set up a Python data structure to read in
>>> data that looks something like this (albeit somewhat simplified and
>>> contrived):
>>> States
>>> Counties
>>> Schools
>>> Classes
>>> Max Allowed Students
>>> Current enrolled Students
>>>
>>> Nebraska, Wabash, Newville, Math, 20, 0
>>> Nebraska, Wabash, Newville, Gym, 400, 0
>>> Nebraska, Tingo, Newfille, Gym, 400, 0
>>> Ohio, Dinger, OldSchool, English, 10, 0
>>> <snip>
>> The structure looks more suited to a database -- maybe SQLite since
>> the interface is supplied with the newer versions of Python (and
>> available for older versions).
>
> I don't understand why I need a database when it should just be a matter
> of defining the data structure. I used a fictional example to make it
> easier to (hopefully) convey how the data is laid out.

You don't need a full-blown-multiuser-concurrent-petabyte-capable-server
database, just one that does the job. SQLite is very small and comes with
Python 2.5.

> The Perl routine works fine and I'd like to emulate that behavior but
> since I've just started learning Python I don't know the syntax for
> designing the data structure. I would really appreciate it if someone
> could point me in the right direction.

So none of the previously posted alternatives worked for you?

--
Gabriel Genellina

Oct 8 '08 #13
Pat wrote:
> I can't figure out how to set up a Python data structure to read in data
> that looks something like this (albeit somewhat simplified and contrived):
>
> States
> Counties
> Schools
> Classes
> Max Allowed Students
> Current enrolled Students
>
> Nebraska, Wabash, Newville, Math, 20, 0
> Nebraska, Wabash, Newville, Gym, 400, 0
> Nebraska, Tingo, Newfille, Gym, 400, 0
> Ohio, Dinger, OldSchool, English, 10, 0
>
> With each line I read in, I would create a hash entry and increment the
> number of enrolled students.

You might want something like this:

>>> import collections, functools
>>> int_dict = functools.partial(collections.defaultdict, int)
>>> curr = functools.partial(collections.defaultdict, int)
>>> # builds a dict-maker where t = curr(); t['name'] += 1 "works"
>>> for depth in range(4):
...     # add a layer with a default of the preceding "type"
...     curr = functools.partial(collections.defaultdict, curr)
...
>>> base = curr()  # actually make one
>>> base['Nebraska']['Wabash']['Newville']['Math']['max'] = 20
>>> base['Nebraska']['Wabash']['Newville']['Math']['curr'] += 1
>>> base['Nebraska']['Wabash']['Newville']['Math']['curr']
1
>>> base['Nebraska']['Wabash']['Newville']['English']['curr']
0
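
A related variant, offered only as a sketch and not as part of the session above: a recursive `defaultdict` nests to any depth, which is closer to how Perl creates intermediate hashes on first use. The cost is that leaves no longer default to 0, so the counter has to be initialised explicitly. The name `tree` is chosen here for illustration.

```python
from collections import defaultdict


def tree():
    """A dict whose missing keys are created as nested trees on first access."""
    return defaultdict(tree)


base = tree()
leaf = base["Nebraska"]["Wabash"]["Newville"]["Math"]
leaf["max"] = 20
# .get() does not trigger the default factory, so the counter starts at 0.
leaf["curr"] = leaf.get("curr", 0) + 1
print(leaf["curr"])

print("Ohio" in base)  # membership tests do not create keys
_ = base["Ohio"]       # but a plain lookup does
print("Ohio" in base)
```

With either approach, merely looking up a missing key creates it, so checks for absent classes are safer with `in` than with indexing.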
--Scott David Daniels
Sc***********@Acm.Org
Oct 9 '08 #14
