Bytes IT Community

Nested dictionaries trouble

P: n/a
Hello,

I'm writing a simple FTP log parser that sums file sizes as it runs. I
have a yearTotals dictionary with year keys and the monthTotals
dictionary as its values. The monthTotals dictionary has month keys
and file size values. The script works except the results are written
for all years, rather than just one year. I'm thinking there's an
error in the way I set my dictionaries up or reference them...

import glob, traceback

years = ["2005", "2006", "2007"]
months = ["01","02","03","04","05","06","07","08","09","10","11","12"]
# Create months dictionary to convert log values
logMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
             "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}
# Create monthTotals dictionary with default 0 value
monthTotals = dict.fromkeys(months, 0)
# Nest monthTotals dictionary in yearTotals dictionary
yearTotals = {}
for year in years:
    yearTotals.setdefault(year, monthTotals)

currentLogs = glob.glob("/logs/ftp/*")

try:
    for currentLog in currentLogs:
        readLog = open(currentLog,"r")
        for line in readLog.readlines():
            if not line: continue
            if len(line) < 50: continue
            logLine = line.split()

            # The 2nd element is month, 5th is year, 8th is filesize
            # Counting from zero:

            # Lookup year/month pair value
            logMonth = logMonths[logLine[1]]
            currentYearMonth = yearTotals[logLine[4]][logMonth]

            # Update year/month value
            currentYearMonth += int(logLine[7])
            yearTotals[logLine[4]][logMonth] = currentYearMonth
except:
    print "Failed on: " + currentLog
    traceback.print_exc()

# Print dictionaries
for x in yearTotals.keys():
    print "KEY",'\t',"VALUE"
    print x,'\t',yearTotals[x]
    #print " key",'\t',"value"
    for y in yearTotals[x].keys():
        print " ",y,'\t',yearTotals[x][y]

Thank you,
Ian

Apr 11 '07 #1
16 Replies


P: n/a
On Wed, 11 Apr 2007 15:57:56 -0300, IamIan <ia****@gmail.com> wrote:
> I'm writing a simple FTP log parser that sums file sizes as it runs. I
> have a yearTotals dictionary with year keys and the monthTotals
> dictionary as its values. The monthTotals dictionary has month keys
> and file size values. The script works except the results are written
> for all years, rather than just one year. I'm thinking there's an
> error in the way I set my dictionaries up or reference them...
>
> monthTotals = dict.fromkeys(months, 0)
> # Nest monthTotals dictionary in yearTotals dictionary
> yearTotals = {}
> for year in years:
>     yearTotals.setdefault(year, monthTotals)

All your years share the *same* monthTotals object.
This is similar to this FAQ entry:
<http://effbot.org/pyfaq/how-do-i-create-a-multidimensional-list.htm>
You have to create a new dict for each year; replace the above code with:

yearTotals = {}
for year in years:
    yearTotals[year] = dict.fromkeys(months, 0)
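
The difference between the two loops can be checked directly. The following is only a minimal sketch, written for Python 3 (the thread itself uses Python 2); the names `shared` and `separate` and the shortened month list are made up for illustration.

```python
years = ["2005", "2006", "2007"]
months = ["01", "02", "03"]  # shortened for the example

# Original setup: every year maps to the one monthTotals object.
monthTotals = dict.fromkeys(months, 0)
shared = {}
for year in years:
    shared.setdefault(year, monthTotals)

shared["2005"]["01"] += 100
# The update shows up under every year because there is only one inner dict.
print(shared["2007"]["01"])                  # 100
print(shared["2005"] is shared["2007"])      # True

# Suggested setup: a new inner dict is built on each pass through the loop.
separate = {}
for year in years:
    separate[year] = dict.fromkeys(months, 0)

separate["2005"]["01"] += 100
print(separate["2007"]["01"])                # 0
print(separate["2005"] is separate["2007"])  # False
```

The `is` comparison is the quickest way to tell whether two keys refer to one object or to two equal-looking ones.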

--
Gabriel Genellina
Apr 11 '07 #2

P: n/a

"IamIan" <ia****@gmail.com> wrote in message
news:11*********************@n59g2000hsh.googlegroups.com...
| Hello,
|
| I'm writing a simple FTP log parser that sums file sizes as it runs. I
| have a yearTotals dictionary with year keys and the monthTotals
| dictionary as its values. The monthTotals dictionary has month keys
| and file size values. The script works except the results are written
| for all years, rather than just one year. I'm thinking there's an
| error in the way I set my dictionaries up or reference them...
|
| import glob, traceback
|
| years = ["2005", "2006", "2007"]
| months = ["01","02","03","04","05","06","07","08","09","10", "11","12"]
| # Create months dictionary to convert log values
| logMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
|              "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}
| # Create monthTotals dictionary with default 0 value
| monthTotals = dict.fromkeys(months, 0)
| # Nest monthTotals dictionary in yearTotals dictionary
| yearTotals = {}
| for year in years:
|     yearTotals.setdefault(year, monthTotals)

Try yearTotals.setdefault(year, dict.fromkeys(months, 0))
so you start with a separate subdict for each year instead of one shared by all.

tjr

Apr 11 '07 #3

P: n/a
IamIan wrote:
> Hello,
>
> I'm writing a simple FTP log parser that sums file sizes as it runs. I
> have a yearTotals dictionary with year keys and the monthTotals
> dictionary as its values. The monthTotals dictionary has month keys
> and file size values. The script works except the results are written
> for all years, rather than just one year. I'm thinking there's an
> error in the way I set my dictionaries up or reference them...
>
> import glob, traceback
>
> years = ["2005", "2006", "2007"]
> months = ["01","02","03","04","05","06","07","08","09","10","11","12"]
> # Create months dictionary to convert log values
> logMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
>              "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}

DRY violation alert!

logMonths = {
"Jan":"01",
"Feb":"02",
"Mar":"03",
"Apr":"04",
"May":"05",
#etc
}

months = sorted(logMonths.values())
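
Spelled out in full, the idea is that the month numbers are typed once and the `months` list is derived from the mapping. A small sketch in Python 3 syntax (the thread uses Python 2, where `sorted()` requires version 2.4 or later):

```python
logMonths = {
    "Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04",
    "May": "05", "Jun": "06", "Jul": "07", "Aug": "08",
    "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12",
}

# The zero-padded strings sort correctly as text, so no conversion is needed.
months = sorted(logMonths.values())
print(months[0], months[-1], len(months))  # 01 12 12
```
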
> # Create monthTotals dictionary with default 0 value
> monthTotals = dict.fromkeys(months, 0)
> # Nest monthTotals dictionary in yearTotals dictionary
> yearTotals = {}
> for year in years:
>     yearTotals.setdefault(year, monthTotals)

A complicated way to write:

yearTotals = dict((year, monthTotals) for year in years)

And without even reading further, I can tell you have a problem here:
every 'year' entry in yearTotals points to *the same* monthTotals dict
instance. So when updating yearTotals['2007'], you see the change
reflected for all years. The cure is simple: forget the monthTotals
object, and define your yearTotals dict this way:

yearTotals = dict((year, dict.fromkeys(months, 0)) for year in years)

NB: for Python versions before 2.4, you need a list comprehension instead
of a generator expression, i.e.:

yearTotals = dict([(year, dict.fromkeys(months, 0)) for year in years])
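
Both spellings build the same structure, and in both `dict.fromkeys` runs once per year, so each year gets its own inner dict. A minimal sketch in Python 3 syntax, with a shortened month list and made-up variable names:

```python
years = ["2005", "2006", "2007"]
months = ["01", "02", "03"]  # shortened for the example

# Generator expression: the pairs are fed to dict() one at a time.
from_genexp = dict((year, dict.fromkeys(months, 0)) for year in years)

# List comprehension: the full list of pairs is built first, then passed to dict().
from_listcomp = dict([(year, dict.fromkeys(months, 0)) for year in years])

print(from_genexp == from_listcomp)  # True

# Updating one year leaves the others alone.
from_genexp["2007"]["01"] += 5
print(from_genexp["2005"]["01"])     # 0
```
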

HTH
Apr 11 '07 #4

P: n/a
1) You have this setup:

logMonths = {"Jan":"01", "Feb":"02",...}
yearTotals = {
    "2005":{"01":0, "02":0, ....}
    "2006":
    "2007":
}

Then when you get a value such as "Jan", you look up "Jan" in the
logMonths dictionary to get "01". Then you use "01" and the year, say
"2005", to look up the value in the yearTotals dictionary. Why do
that? What is the point of even having the logMonths dictionary? Why
not make "Jan" the key in the "2005" dictionary and look it up
directly:

yearTotals = {
    "2005":{"Jan":0, "Feb":0, ....}
    "2006":
    "2007":
}

That way you could completely eliminate the lookup in the logMonths
dict.

2) In this part:

logMonth = logMonths[logLine[1]]
currentYearMonth = yearTotals[logLine[4]][logMonth]
# Update year/month value
currentYearMonth += int(logLine[7])
yearTotals[logLine[4]][logMonth] = currentYearMonth

I'm not sure why you are using all those intermediate steps. How
about:

yearTotals[logLine[4]][logLine[1]] += int(logLine[7])

To me that is a lot clearer. Or, you could do this:

year, month, val = logLine[4], logLine[1], int(logLine[7])
yearTotals[year][month] += val
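
One detail behind the shorter form: integers are immutable, so `+=` on a local name only rebinds that name, which is why the original code had to store the result back into the dict. Applying `+=` to the subscript updates the entry directly. A small sketch in Python 3 syntax with a made-up, already split log line:

```python
yearTotals = {"2005": {"Jan": 0}}
logLine = ["", "Jan", "", "", "2005", "", "", "7"]  # made-up sample fields

# Augmented assignment on a local name changes the name, not the dict entry.
current = yearTotals["2005"]["Jan"]
current += int(logLine[7])
print(current, yearTotals["2005"]["Jan"])  # 7 0

# Augmented assignment on the subscript stores the new value in the dict.
year, month, val = logLine[4], logLine[1], int(logLine[7])
yearTotals[year][month] += val
print(yearTotals["2005"]["Jan"])           # 7
```
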

3)
> I'm thinking there's an error in the way
> I set my dictionaries up or reference them

Yep. It's right here:

for year in years:
    yearTotals.setdefault(year, monthTotals)

Every year refers to the same monthTotals dict. You can use a dict's
copy() method to make a copy:

monthTotals.copy()
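
A quick check of what `copy()` gives you, as a sketch in Python 3 syntax with made-up names `a` and `b`. Note that `copy()` is a shallow copy; that is enough here because the values are plain integers.

```python
monthTotals = dict.fromkeys(["Jan", "Feb"], 0)

# Each call returns a new dict with the same keys and values.
a = monthTotals.copy()
b = monthTotals.copy()
a["Jan"] += 10

print(a["Jan"], b["Jan"], monthTotals["Jan"])  # 10 0 0
print(a is b)                                  # False
```
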

Here is a reworking of your code that also eliminates a lot of typing:

import calendar, pprint

years = ["200%s" % x for x in range(5, 8)]
print years

months = list(calendar.month_abbr)
print months

monthTotals = dict.fromkeys(months[1:], 0)
print monthTotals

yearTotals = {}
for year in years:
    yearTotals.setdefault(year, monthTotals.copy())
pprint.pprint(yearTotals)

logs = [
    ["", "Feb", "", "", "2007", "", "", "12"],
    ["", "Jan", "", "", "2005", "", "", "3"],
    ["", "Jan", "", "", "2005", "", "", "7"],
]

for logLine in logs:
    year, month, val = logLine[4], logLine[1], int(logLine[7])
    yearTotals[year][month] += val

for x in yearTotals.keys():
    print "KEY", "\t", "VALUE"
    print x, "\t", yearTotals[x]
    for y in yearTotals[x].keys():
        print " ", y, "\t", yearTotals[x][y]

Apr 11 '07 #5


P: n/a
Thank you everyone for the helpful replies. Some of the solutions were
new to me, but the script now runs successfully. I'm still learning to
ride the snake but I love this language!

Ian

Apr 11 '07 #7

P: n/a
On Apr 11, 2:57 pm, Bruno Desthuilliers
<bdesth.quelquech...@free.quelquepart.fr> wrote:
> IamIan wrote:
>
> yearTotals = dict([(year, dict.fromkeys(months, 0)) for year in years])
>
> HTH

List comprehensions without a list? What? Where? How?
Apr 12 '07 #8

P: n/a
On Apr 11, 7:01 pm, "7stud" <bbxx789_0...@yahoo.com> wrote:
> On Apr 11, 2:57 pm, Bruno Desthuilliers
> <bdesth.quelquech...@free.quelquepart.fr> wrote:
> > IamIan wrote:
> > yearTotals = dict([(year, dict.fromkeys(months, 0)) for year in years])
> > HTH
>
> List comprehensions without a list? What? Where? How?

Oops. I copied the wrong one. I was looking at this one:

yearTotals = dict((year, monthTotals) for year in years)

Apr 12 '07 #9

P: n/a
On Apr 11, 7:28 pm, "7stud" <bbxx789_0...@yahoo.com> wrote:
> On Apr 11, 7:01 pm, "7stud" <bbxx789_0...@yahoo.com> wrote:
> > On Apr 11, 2:57 pm, Bruno Desthuilliers
> > <bdesth.quelquech...@free.quelquepart.fr> wrote:
> > > IamIan wrote:
> > > yearTotals = dict([(year, dict.fromkeys(months, 0)) for year in years])
> > > HTH
> > List comprehensions without a list? What? Where? How?
>
> Oops. I copied the wrong one. I was looking at this one:
>
> yearTotals = dict((year, monthTotals) for year in years)

Never mind. I found this PEP:

http://www.python.org/dev/peps/pep-0289/
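
For readers who hit the same confusion, here is a short sketch of what PEP 289 describes, in Python 3 syntax; the names `pairs`, `first`, and `totals` are made up for illustration.

```python
years = ["2005", "2006", "2007"]

# A generator expression looks like a list comprehension in parentheses,
# but it produces its items on demand instead of building a list.
pairs = ((year, 0) for year in years)
first = next(pairs)
print(first)  # ('2005', 0)

# When it is the only argument of a call, the extra parentheses can be dropped.
totals = dict((year, 0) for year in years)
print(totals)
```
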

Apr 12 '07 #10

P: n/a
I am using the suggested approach to make a years list:

years = ["199%s" % x for x in range(0,10)]
years += ["200%s" % x for x in range(0,10)]

I haven't had any luck doing this in one line though. Is it possible?

Thanks.

Apr 18 '07 #11

P: n/a
On Wednesday, Apr 18th 2007 at 12:16 -0700, quoth IamIan:

=>I am using the suggested approach to make a years list:
=>
=>years = ["199%s" % x for x in range(0,10)]
=>years += ["200%s" % x for x in range(0,10)]
=>
=>I haven't had any luck doing this in one line though. Is it possible?

I'm so green that I almost get a chubby at being able to answer something.
;-)

years = [str(1990+x) for x in range(0,20)]

Yes?

--
Time flies like the wind. Fruit flies like a banana. Stranger things have .0.
happened but none stranger than this. Does your driver's license say Organ ..0
Donor?Black holes are where God divided by zero. Listen to me! We are all- 000
individuals! What if this weren't a hypothetical question?
steveo at syslang.net
Apr 18 '07 #12

P: n/a
In <11**********************@p77g2000hsh.googlegroups.com>, IamIan wrote:
> years = ["199%s" % x for x in range(0,10)]
> years += ["200%s" % x for x in range(0,10)]
>
> I haven't had any luck doing this in one line though. Is it possible?

In [48]: years = map(str, xrange(1999, 2011))

In [49]: years
Out[49]:
['1999',
'2000',
'2001',
'2002',
'2003',
'2004',
'2005',
'2006',
'2007',
'2008',
'2009',
'2010']

Ciao,
Marc 'BlackJack' Rintsch
Apr 18 '07 #13

P: n/a
On Wed, 18 Apr 2007 12:16:12 -0700, IamIan wrote:
> I am using the suggested approach to make a years list:
>
> years = ["199%s" % x for x in range(0,10)]
> years += ["200%s" % x for x in range(0,10)]
>
> I haven't had any luck doing this in one line though. Is it possible?

years = ["199%s" % x for x in range(0,10)] + \
        ["200%s" % x for x in range(0,10)]

Sorry for the line continuation, my news reader insists on breaking the
line. In your editor, just delete the "\" and line break to make it a
single line.
If you don't like that solution, here's a better one:

years = [str(1990 + n) for n in range(20)]

Or there's this:

years = [str(n) for n in range(1990, 2010)]

Or this one:

years = map(str, range(1990, 2010))
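
The variants can be compared against the original two-step version. This is a sketch in Python 3 syntax, where `map()` returns an iterator and needs a `list()` call (in the Python 2 used in this thread it returns a list already); the variable names are made up.

```python
# The asker's two-step version.
two_step = ["199%s" % x for x in range(0, 10)]
two_step += ["200%s" % x for x in range(0, 10)]

# The one-line alternatives.
by_offset = [str(1990 + n) for n in range(20)]
by_range = [str(n) for n in range(1990, 2010)]
by_map = list(map(str, range(1990, 2010)))

print(two_step == by_offset == by_range == by_map)  # True
print(by_range[0], by_range[-1], len(by_range))     # 1990 2009 20
```
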
--
Steven.

Apr 19 '07 #14

P: n/a
Thank you again for the great suggestions. I have one final question
about creating a httpMonths dictionary like {'Jan':'01' , 'Feb':'02' ,
etc} with a minimal amount of typing. My code follows (using Python
2.3.4):

import calendar

# Create years list, formatting as strings
years = map(str, xrange(1990,2051))

# Create months list with three letter abbreviations
months = list(calendar.month_abbr)

# Create monthTotals dictionary with default value of zero
monthTotals = dict.fromkeys(months[1:],0)

# Create yearTotals dictionary with years for keys
# and copies of the monthTotals dictionary for values
yearTotals = dict([(year, monthTotals.copy()) for year in years])

# Create httpMonths dictionary to map month abbreviations
# to Apache numeric month representations
httpMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
              "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}

It is this last step I'm referring to. I got close with:

httpMonths = {}
for month in months[1:]:
    httpMonths[month] = str(len(httpMonths)+1)

but the month numbers are missing the leading zero for 01-09. Thanks!

Ian

Apr 19 '07 #15

P: n/a
IamIan <ia****@gmail.com> wrote in
news:11**********************@e65g2000hsc.googlegroups.com:
> Thank you again for the great suggestions. I have one final
> question about creating a httpMonths dictionary like {'Jan':'01'
> , 'Feb':'02' , etc} with a minimal amount of typing. My code
> follows (using Python 2.3.4):
>
> import calendar
>
> # Create years list, formatting as strings
> years = map(str, xrange(1990,2051))
>
> # Create months list with three letter abbreviations
> months = list(calendar.month_abbr)
>
> # Create monthTotals dictionary with default value of zero
> monthTotals = dict.fromkeys(months[1:],0)
>
> # Create yearTotals dictionary with years for keys
> # and copies of the monthTotals dictionary for values
> yearTotals = dict([(year, monthTotals.copy()) for year in years])
>
> # Create httpMonths dictionary to map month abbreviations
> # to Apache numeric month representations
> httpMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
>               "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}
>
> It is this last step I'm referring to. I got close with:
> httpMonths = {}
> for month in months[1:]:
>     httpMonths[month] = str(len(httpMonths)+1)
>
> but the month numbers are missing the leading zero for 01-09.
> Thanks!

Maybe something like:
httpMonths = dict((k,"%02d" % (x+1))
for x,k in enumerate(months[1:]) )
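
The `"%02d"` format pads the number to two digits with a leading zero, which is the missing piece. A runnable sketch in Python 3 syntax follows. Two caveats: `calendar.month_abbr` depends on the current locale, so the keys are "Jan" to "Dec" only under an English or C locale, and on Python 2.3 (which predates generator expressions) the argument to `dict()` would need to be a list comprehension in square brackets.

```python
import calendar

# month_abbr[0] is an empty string, so the real months start at index 1.
months = list(calendar.month_abbr)

# enumerate() counts from 0, hence the +1; "%02d" zero-pads to two digits.
httpMonths = dict((name, "%02d" % (i + 1)) for i, name in enumerate(months[1:]))

print(httpMonths[months[1]], httpMonths[months[12]])  # 01 12
```
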

--
rzed
Apr 21 '07 #16

P: n/a
IamIan wrote:
> I am using the suggested approach to make a years list:
>
> years = ["199%s" % x for x in range(0,10)]
> years += ["200%s" % x for x in range(0,10)]
>
> I haven't had any luck doing this in one line though. Is it possible?

# Q&D (quick and dirty), and pretty obvious
years = ["199%s" % x for x in range(0,10)] + ["200%s" % x for x in range(0,10)]

# hardly more involved, and quite a bit more generic
years = ["%s%s" % (c, y) for c in ("199", "200") for y in range(10)]
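
In the nested form, the first `for` is the outer loop and the second is the inner one, so the prefixes are visited in order and each is paired with the digits 0 to 9. A quick check in Python 3 syntax, assuming the prefixes "199" and "200" so that the result covers the 1990 to 2009 range asked for:

```python
# Outer loop over decade prefixes, inner loop over the final digit.
years = ["%s%s" % (c, y) for c in ("199", "200") for y in range(10)]

print(years[0], years[9], years[10], years[-1])      # 1990 1999 2000 2009
print(years == [str(n) for n in range(1990, 2010)])  # True
```
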
Apr 21 '07 #17
