Nested dictionaries trouble

Hello,

I'm writing a simple FTP log parser that sums file sizes as it runs. I
have a yearTotals dictionary with year keys and the monthTotals
dictionary as its values. The monthTotals dictionary has month keys
and file size values. The script works except the results are written
for all years, rather than just one year. I'm thinking there's an
error in the way I set my dictionaries up or reference them...

import glob, traceback

years = ["2005", "2006", "2007"]
months = ["01","02","03","04","05","06","07","08","09","10","11","12"]
# Create months dictionary to convert log values
logMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
             "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}
# Create monthTotals dictionary with default 0 value
monthTotals = dict.fromkeys(months, 0)
# Nest monthTotals dictionary in yearTotals dictionary
yearTotals = {}
for year in years:
    yearTotals.setdefault(year, monthTotals)

currentLogs = glob.glob("/logs/ftp/*")

try:
    for currentLog in currentLogs:
        readLog = open(currentLog,"r")
        for line in readLog.readlines():
            if not line: continue
            if len(line) < 50: continue
            logLine = line.split()

            # The 2nd element is month, 5th is year, 8th is filesize
            # Counting from zero:

            # Lookup year/month pair value
            logMonth = logMonths[logLine[1]]
            currentYearMonth = yearTotals[logLine[4]][logMonth]

            # Update year/month value
            currentYearMonth += int(logLine[7])
            yearTotals[logLine[4]][logMonth] = currentYearMonth
except:
    print "Failed on: " + currentLog
    traceback.print_exc()

# Print dictionaries
for x in yearTotals.keys():
    print "KEY",'\t',"VALUE"
    print x,'\t',yearTotals[x]
    #print " key",'\t',"value"
    for y in yearTotals[x].keys():
        print " ",y,'\t',yearTotals[x][y]

Thank you,
Ian

Apr 11 '07 #1
16 1605
On Wed, 11 Apr 2007 15:57:56 -0300, IamIan <ia****@gmail.com> wrote:
> I'm writing a simple FTP log parser that sums file sizes as it runs. I
> have a yearTotals dictionary with year keys and the monthTotals
> dictionary as its values. The monthTotals dictionary has month keys
> and file size values. The script works except the results are written
> for all years, rather than just one year. I'm thinking there's an
> error in the way I set my dictionaries up or reference them...
>
> monthTotals = dict.fromkeys(months, 0)
> # Nest monthTotals dictionary in yearTotals dictionary
> yearTotals = {}
> for year in years:
>     yearTotals.setdefault(year, monthTotals)

All your years share the *same* monthTotals object.
This is similar to this FAQ entry:
<http://effbot.org/pyfaq/how-do-i-create-a-multidimensional-list.htm>
You have to create a new dict for each year; replace the above code with:

yearTotals = {}
for year in years:
    yearTotals[year] = dict.fromkeys(months, 0)
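
To see the sharing concretely, here is a minimal sketch written for a current Python 3 interpreter rather than the Python 2 used in this thread. The names `shared` and `separate` are made up for illustration; everything else mirrors the snippets above.

```python
years = ["2005", "2006", "2007"]
months = ["01", "02", "03"]  # shortened for the demo

# Buggy setup: every year maps to one and the same inner dict.
monthTotals = dict.fromkeys(months, 0)
shared = {}
for year in years:
    shared.setdefault(year, monthTotals)

shared["2007"]["01"] += 100
# The update shows up under every year, because there is only one inner dict.
assert shared["2005"]["01"] == 100
assert shared["2005"] is shared["2007"]

# Fixed setup: build a fresh inner dict on each pass through the loop.
separate = {}
for year in years:
    separate[year] = dict.fromkeys(months, 0)

separate["2007"]["01"] += 100
assert separate["2005"]["01"] == 0
assert separate["2005"] is not separate["2007"]
```

The `is` checks are the quickest diagnostic: if two keys hold the identical object, a change made through one key is visible through the other.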

--
Gabriel Genellina
Apr 11 '07 #2

"IamIan" <ia****@gmail.com> wrote in message
news:11*********************@n59g2000hsh.googlegroups.com...
| Hello,
|
| I'm writing a simple FTP log parser that sums file sizes as it runs. I
| have a yearTotals dictionary with year keys and the monthTotals
| dictionary as its values. The monthTotals dictionary has month keys
| and file size values. The script works except the results are written
| for all years, rather than just one year. I'm thinking there's an
| error in the way I set my dictionaries up or reference them...
|
| import glob, traceback
|
| years = ["2005", "2006", "2007"]
| months = ["01","02","03","04","05","06","07","08","09","10", "11","12"]
| # Create months dictionary to convert log values
| logMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
|              "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}
| # Create monthTotals dictionary with default 0 value
| monthTotals = dict.fromkeys(months, 0)
| # Nest monthTotals dictionary in yearTotals dictionary
| yearTotals = {}
| for year in years:
|     yearTotals.setdefault(year, monthTotals)

Try

    yearTotals.setdefault(year, dict.fromkeys(months, 0))

so that each year starts with its own sub-dict instead of one shared by all.
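
An alternative that nobody in this thread proposed is to skip the pre-seeding step altogether with `collections.defaultdict` (available from Python 2.5 on). This is only a sketch: the `add_size` helper and the sample values are invented for illustration, and unlike the pre-seeded version it only creates entries for year/month pairs that actually occur.

```python
from collections import defaultdict

# The outer dict creates a fresh inner defaultdict(int) for each new year,
# and each inner dict starts every new month at 0.
yearTotals = defaultdict(lambda: defaultdict(int))

def add_size(year, month, size):
    """Add one file size to the running total for year/month (hypothetical helper)."""
    yearTotals[year][month] += size

add_size("2007", "02", 12)
add_size("2005", "01", 3)
add_size("2005", "01", 7)

print(yearTotals["2005"]["01"])  # 10
print(yearTotals["2007"]["02"])  # 12
```

One thing to keep in mind with this design: merely reading a missing key creates it, so test membership with `in` if empty entries would be a problem.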

tjr

Apr 11 '07 #3
IamIan wrote:
> Hello,
>
> I'm writing a simple FTP log parser that sums file sizes as it runs. I
> have a yearTotals dictionary with year keys and the monthTotals
> dictionary as its values. The monthTotals dictionary has month keys
> and file size values. The script works except the results are written
> for all years, rather than just one year. I'm thinking there's an
> error in the way I set my dictionaries up or reference them...
>
> import glob, traceback
>
> years = ["2005", "2006", "2007"]
> months = ["01","02","03","04","05","06","07","08","09","10","11","12"]
> # Create months dictionary to convert log values
> logMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
>              "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}

DRY violation alert!

logMonths = {
    "Jan":"01",
    "Feb":"02",
    "Mar":"03",
    "Apr":"04",
    "May":"05",
    #etc
}

months = sorted(logMonths.values())

> # Create monthTotals dictionary with default 0 value
> monthTotals = dict.fromkeys(months, 0)
> # Nest monthTotals dictionary in yearTotals dictionary
> yearTotals = {}
> for year in years:
>     yearTotals.setdefault(year, monthTotals)

A complicated way to write:

yearTotals = dict((year, monthTotals) for year in years)

And without even reading further, I can tell you have a problem here:
every 'year' entry in yearTotals points to *the same* monthTotals dict
instance. So when you update yearTotals['2007'], you see the change
reflected for all years. The cure is simple: forget the monthTotals
object, and define your yearTotals dict this way:

yearTotals = dict((year, dict.fromkeys(months, 0)) for year in years)

NB: for Python versions before 2.4, you need a list comprehension
instead of a generator expression, i.e.:

yearTotals = dict([(year, dict.fromkeys(months, 0)) for year in years])
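
For comparison, the sketch below (run under Python 3; the variable names `from_genexp`, `from_listcomp` and `from_dictcomp` are mine) builds the nested dict three ways. The dict comprehension is a later addition to the language (Python 2.7 and 3.x) and was not an option when this thread was written.

```python
years = ["2005", "2006", "2007"]
months = ["%02d" % m for m in range(1, 13)]

# Generator expression (Python 2.4+): no intermediate list is built.
from_genexp = dict((year, dict.fromkeys(months, 0)) for year in years)

# List comprehension: works on older interpreters too.
from_listcomp = dict([(year, dict.fromkeys(months, 0)) for year in years])

# Dict comprehension (Python 2.7+ / 3.x).
from_dictcomp = {year: dict.fromkeys(months, 0) for year in years}

assert from_genexp == from_listcomp == from_dictcomp

# dict.fromkeys() is called once per year, so the inner dicts are distinct.
from_dictcomp["2007"]["01"] += 5
assert from_dictcomp["2005"]["01"] == 0
```

All three call `dict.fromkeys(months, 0)` inside the loop, which is what matters; the choice between them is mostly a question of interpreter version and taste.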

HTH
Apr 11 '07 #4
1) You have this setup:

logMonths = {"Jan":"01", "Feb":"02",...}
yearTotals = {
    "2005":{"01":0, "02":0, ....}
    "2006":
    "2007":
}

Then when you get a value such as "Jan", you look up "Jan" in the
logMonths dictionary to get "01". Then you use "01" and the year, say
"2005", to look up the value in the yearTotals dictionary. Why do
that? What is the point of even having the logMonths dictionary? Why
not make "Jan" the key in the "2005" dictionary and look it up
directly:

yearTotals = {
    "2005":{"Jan":0, "Feb":0, ....}
    "2006":
    "2007":
}

That way you could completely eliminate the lookup in the logMonths
dict.

2) In this part:

logMonth = logMonths[logLine[1]]
currentYearMonth = yearTotals[logLine[4]][logMonth]
# Update year/month value
currentYearMonth += int(logLine[7])
yearTotals[logLine[4]][logMonth] = currentYearMonth

I'm not sure why you are using all those intermediate steps. How
about:

yearTotals[logLine[4]][logLine[1]] += int(logLine[7])

To me that is a lot clearer. Or, you could do this:

year, month, val = logLine[4], logLine[1], int(logLine[7])
yearTotals[year][month] += val

3)
> I'm thinking there's an error in the way
> I set my dictionaries up or reference them

Yep. It's right here:

for year in years:
    yearTotals.setdefault(year, monthTotals)

Every year refers to the same monthTotals dict. You can use a dict's
copy() method to make a copy:

monthTotals.copy()
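
One caveat worth knowing: copy() makes a shallow copy. That is enough here because the values are plain integers, but if the inner values were themselves mutable (lists, say), the copies would still share them, and `copy.deepcopy` would be the safer tool. A small Python 3 sketch with made-up data:

```python
import copy

template = {"Jan": 0, "Feb": 0}

# A shallow copy is fine for immutable values such as ints.
a = template.copy()
a["Jan"] += 5
assert template["Jan"] == 0

# With mutable values, a shallow copy still shares the inner objects.
nested = {"Jan": [], "Feb": []}
shallow = nested.copy()
shallow["Jan"].append(1)
assert nested["Jan"] == [1]      # the list is shared

deep = copy.deepcopy(nested)
deep["Feb"].append(2)
assert nested["Feb"] == []       # the deep copy is independent
```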

Here is a reworking of your code that also eliminates a lot of typing:

import calendar, pprint

years = ["200%s" % x for x in range(5, 8)]
print years

months = list(calendar.month_abbr)
print months

monthTotals = dict.fromkeys(months[1:], 0)
print monthTotals

yearTotals = {}
for year in years:
    yearTotals.setdefault(year, monthTotals.copy())
pprint.pprint(yearTotals)

logs = [
    ["", "Feb", "", "", "2007", "", "", "12"],
    ["", "Jan", "", "", "2005", "", "", "3"],
    ["", "Jan", "", "", "2005", "", "", "7"],
]

for logLine in logs:
    year, month, val = logLine[4], logLine[1], int(logLine[7])
    yearTotals[year][month] += val

for x in yearTotals.keys():
    print "KEY", "\t", "VALUE"
    print x, "\t", yearTotals[x]
    for y in yearTotals[x].keys():
        print " ", y, "\t", yearTotals[x][y]
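
To connect this back to real log lines, here is a rough Python 3 sketch of the parsing step. The sample lines are invented and assume only what the original post states: after split(), index 1 is the month abbreviation, index 4 the year, and index 7 the file size. The helper name `tally` is hypothetical.

```python
import calendar

years = ["2005", "2006", "2007"]
month_names = list(calendar.month_abbr)[1:]  # 'Jan' .. 'Dec' in the default C locale

yearTotals = {year: dict.fromkeys(month_names, 0) for year in years}

def tally(lines, totals):
    """Add each line's file size to totals[year][month]; skip malformed lines."""
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # too short to hold month, year and size
        year, month, size = fields[4], fields[1], fields[7]
        if year in totals and month in totals[year] and size.isdigit():
            totals[year][month] += int(size)

# Invented sample lines, shaped only to match the field positions above.
sample = [
    "Wed Feb 14 10:00:00 2007 1 host.example 12 /pub/a.txt",
    "Mon Jan 03 09:30:00 2005 1 host.example 3 /pub/b.txt",
    "Tue Jan 04 11:45:00 2005 2 host.example 7 /pub/c.txt",
    "garbage line",
]
tally(sample, yearTotals)

# Print only the year/month pairs that received any data.
for year in sorted(yearTotals):
    for month in month_names:
        if yearTotals[year][month]:
            print(year, month, yearTotals[year][month])
```

Skipping bad lines one at a time, instead of wrapping the whole loop in a single try/except, means one malformed line does not abort the remaining log files.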

Apr 11 '07 #5

Thank you everyone for the helpful replies. Some of the solutions were
new to me, but the script now runs successfully. I'm still learning to
ride the snake but I love this language!

Ian

Apr 11 '07 #7
On Apr 11, 2:57 pm, Bruno Desthuilliers
<bdesth.quelquech...@free.quelquepart.fr> wrote:
> IamIan wrote:
>
> yearTotals = dict([(year, dict.fromkeys(months, 0)) for year in years])
>
> HTH

List comprehensions without a list? What? Where? How?
Apr 12 '07 #8
On Apr 11, 7:01 pm, "7stud" <bbxx789_0...@yahoo.com> wrote:
> On Apr 11, 2:57 pm, Bruno Desthuilliers
> <bdesth.quelquech...@free.quelquepart.fr> wrote:
> > IamIan wrote:
> > yearTotals = dict([(year, dict.fromkeys(months, 0)) for year in years])
> > HTH
>
> List comprehensions without a list? What? Where? How?

Ooops. I copied the wrong one. I was looking at this one:

yearTotals = dict((year, monthTotals) for year in years)

Apr 12 '07 #9
On Apr 11, 7:28 pm, "7stud" <bbxx789_0...@yahoo.com> wrote:
> On Apr 11, 7:01 pm, "7stud" <bbxx789_0...@yahoo.com> wrote:
> > On Apr 11, 2:57 pm, Bruno Desthuilliers
> > <bdesth.quelquech...@free.quelquepart.fr> wrote:
> > > IamIan wrote:
> > > yearTotals = dict([(year, dict.fromkeys(months, 0)) for year in years])
> > > HTH
> > List comprehensions without a list? What? Where? How?
>
> Ooops. I copied the wrong one. I was looking at this one:
>
> yearTotals = dict((year, monthTotals) for year in years)

Never mind. I found this PEP:

http://www.python.org/dev/peps/pep-0289/
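
In short, PEP 289 lets a comprehension-like expression appear directly as a function argument without the square brackets, producing its items lazily instead of building a list first. A tiny sketch (Python 3; the names are illustrative only):

```python
years = ["2005", "2006", "2007"]

# List comprehension: the whole list of pairs exists before dict() sees it.
pairs_list = [(year, 0) for year in years]
assert isinstance(pairs_list, list)

# Generator expression: pairs are produced one at a time, on demand.
pairs_gen = ((year, 0) for year in years)
assert next(pairs_gen) == ("2005", 0)

# As the sole argument of a call, the extra parentheses can be dropped.
totals = dict((year, 0) for year in years)
assert totals == dict(pairs_list)
```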

Apr 12 '07 #10
I am using the suggested approach to make a years list:

years = ["199%s" % x for x in range(0,10)]
years += ["200%s" % x for x in range(0,10)]

I haven't had any luck doing this in one line though. Is it possible?

Thanks.

Apr 18 '07 #11
On Wednesday, Apr 18th 2007 at 12:16 -0700, quoth IamIan:

=>I am using the suggested approach to make a years list:
=>
=>years = ["199%s" % x for x in range(0,10)]
=>years += ["200%s" % x for x in range(0,10)]
=>
=>I haven't had any luck doing this in one line though. Is it possible?

I'm so green that I almost get a chubby at being able to answer something.
;-)

years = [str(1990+x) for x in range(0,20)]

Yes?

--
Time flies like the wind. Fruit flies like a banana. Stranger things have .0.
happened but none stranger than this. Does your driver's license say Organ ..0
Donor?Black holes are where God divided by zero. Listen to me! We are all- 000
individuals! What if this weren't a hypothetical question?
steveo at syslang.net
Apr 18 '07 #12
In <11**********************@p77g2000hsh.googlegroups.com>, IamIan wrote:
> years = ["199%s" % x for x in range(0,10)]
> years += ["200%s" % x for x in range(0,10)]
>
> I haven't had any luck doing this in one line though. Is it possible?

In [48]: years = map(str, xrange(1999, 2011))

In [49]: years
Out[49]:
['1999',
'2000',
'2001',
'2002',
'2003',
'2004',
'2005',
'2006',
'2007',
'2008',
'2009',
'2010']

Ciao,
Marc 'BlackJack' Rintsch
Apr 18 '07 #13
On Wed, 18 Apr 2007 12:16:12 -0700, IamIan wrote:
> I am using the suggested approach to make a years list:
>
> years = ["199%s" % x for x in range(0,10)]
> years += ["200%s" % x for x in range(0,10)]
>
> I haven't had any luck doing this in one line though. Is it possible?

years = ["199%s" % x for x in range(0,10)] + \
        ["200%s" % x for x in range(0,10)]

Sorry for the line continuation, my news reader insists on breaking the
line. In your editor, just delete the "\" and line break to make it a
single line.

If you don't like that solution, here's a better one:

years = [str(1990 + n) for n in range(20)]

Or there's this:

years = [str(n) for n in range(1990, 2010)]

Or this one:

years = map(str, range(1990, 2010))
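
A quick way to convince yourself that these all agree is to compare them directly. The sketch below targets Python 3, where map() returns an iterator and needs wrapping in list(); the variable names are just for the demo. Note that range(1990, 2010) stops before 2010, so the end points matter.

```python
two_part = ["199%s" % x for x in range(0, 10)] + ["200%s" % x for x in range(0, 10)]
offset = [str(1990 + n) for n in range(20)]
direct = [str(n) for n in range(1990, 2010)]
mapped = list(map(str, range(1990, 2010)))  # list() needed on Python 3

assert two_part == offset == direct == mapped
print(direct[0], direct[-1], len(direct))  # 1990 2009 20
```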
--
Steven.

Apr 19 '07 #14
Thank you again for the great suggestions. I have one final question
about creating an httpMonths dictionary like {'Jan':'01', 'Feb':'02',
etc.} with a minimal amount of typing. My code follows (using Python
2.3.4):

import calendar

# Create years list, formatting as strings
years = map(str, xrange(1990,2051))

# Create months list with three letter abbreviations
months = list(calendar.month_abbr)

# Create monthTotals dictionary with default value of zero
monthTotals = dict.fromkeys(months[1:],0)

# Create yearTotals dictionary with years for keys
# and copies of the monthTotals dictionary for values
yearTotals = dict([(year, monthTotals.copy()) for year in years])

# Create httpMonths dictionary to map month abbreviations
# to Apache numeric month representations
httpMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
              "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}

It is this last step I'm referring to. I got close with:

httpMonths = {}
for month in months[1:]:
    httpMonths[month] = str(len(httpMonths)+1)

but the month numbers are missing the leading zero for 01-09. Thanks!

Ian

Apr 19 '07 #15
IamIan <ia****@gmail.com> wrote in
news:11**********************@e65g2000hsc.googlegroups.com:
> Thank you again for the great suggestions. I have one final
> question about creating a httpMonths dictionary like {'Jan':'01'
> , 'Feb':'02' , etc} with a minimal amount of typing. My code
> follows (using Python 2.3.4):
>
> import calendar
>
> # Create years list, formatting as strings
> years = map(str, xrange(1990,2051))
>
> # Create months list with three letter abbreviations
> months = list(calendar.month_abbr)
>
> # Create monthTotals dictionary with default value of zero
> monthTotals = dict.fromkeys(months[1:],0)
>
> # Create yearTotals dictionary with years for keys
> # and copies of the monthTotals dictionary for values
> yearTotals = dict([(year, monthTotals.copy()) for year in years])
>
> # Create httpMonths dictionary to map month abbreviations
> # to Apache numeric month representations
> httpMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
>               "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}
>
> It is this last step I'm referring to. I got close with:
> httpMonths = {}
> for month in months[1:]:
>     httpMonths[month] = str(len(httpMonths)+1)
>
> but the month numbers are missing the leading zero for 01-09.
> Thanks!

Maybe something like:

httpMonths = dict((k, "%02d" % (x+1))
                  for x, k in enumerate(months[1:]))
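
The key ingredient is the "%02d" format, which pads the number to two digits with a leading zero. A sketch for Python 3 follows; on an interpreter older than 2.4 the generator expression would need square brackets around it, as mentioned earlier in this thread. `str.zfill` is an alternative I am adding here, not something proposed above.

```python
import calendar

months = list(calendar.month_abbr)  # index 0 is an empty string

# enumerate(..., 1) starts counting at 1 (the start argument needs Python 2.6+).
httpMonths = dict((name, "%02d" % num) for num, name in enumerate(months[1:], 1))

# Same result using str.zfill instead of %-formatting.
viaZfill = dict((name, str(num).zfill(2)) for num, name in enumerate(months[1:], 1))

assert httpMonths == viaZfill
print(httpMonths["Jan"], httpMonths["Sep"], httpMonths["Dec"])  # 01 09 12
```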

--
rzed
Apr 21 '07 #16
IamIan wrote:
> I am using the suggested approach to make a years list:
>
> years = ["199%s" % x for x in range(0,10)]
> years += ["200%s" % x for x in range(0,10)]
>
> I haven't had any luck doing this in one line though. Is it possible?

# Quick and dirty, and pretty obvious
years = ["199%s" % x for x in range(0,10)] + \
        ["200%s" % x for x in range(0,10)]

# hardly more involved, and quite a bit more generic
years = ["%s%s" % (c, y) for c in ("199", "200") for y in range(10)]
Apr 21 '07 #17
