tips requested for a log-processing script

Pythoneers,
As a relatively new user of Python I would like to ask your advice on
the following script I want to create.

I have a logfile which contains records. All records have the same
layout and are stored in CSV format. Each record is (non-uniquely)
identified by a date and an itemID. Each itemID can occur 0 or more times
per month. Each record contains a figure/amount which I need to sum per
month and per itemID. I have already managed to separate the individual
parts of each logfile record by using the csv module from Python 2.5.
Very simple indeed.

Apart from this I have a configuration file, which contains the list of
itemIDs I need to focus on per month. Not all itemIDs are relevant for
each month; some matter only every second or third month, for example. All
records in the logfile with other itemIDs can be ignored. I have yet to
define the format of this configuration file, but am thinking about a 0
or 1 for each month, and then the itemID, like:
"1 0 0 1 0 0 1 0 0 1 0 0 123456" for an itemID 123456 which only needs
consideration in the first month of each quarter.
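
A minimal sketch of how a config line in the proposed format could be read, assuming the first flag stands for January and the fields are separated by whitespace. The names `parse_config_line` and `load_config` are hypothetical, chosen only for illustration.

```python
def parse_config_line(line):
    """Split one config line into (item_id, set of relevant month numbers 1..12)."""
    fields = line.split()
    if len(fields) != 13:
        raise ValueError("expected 12 month flags and an itemID: %r" % line)
    flags, item_id = fields[:12], fields[12]
    # Assumption: month numbers are 1-based and the first flag is January.
    months = {i + 1 for i, flag in enumerate(flags) if flag == "1"}
    return item_id, months


def load_config(lines):
    """Build {item_id: months} from an iterable of config lines, skipping blanks."""
    config = {}
    for line in lines:
        if line.strip():
            item_id, months = parse_config_line(line)
            config[item_id] = months
    return config


config = load_config(["1 0 0 1 0 0 1 0 0 1 0 0 123456"])
print(sorted(config["123456"]))  # [1, 4, 7, 10]
```

Keeping the months as a set makes the later relevance check a simple membership test.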

My question to this forum is: which data structure would you propose?
The logfile is not very big (about 200k max, average 200k) so I assume I
can hold it in memory in a list?

How would you propose I tackle the filtering of relevant/non-relevant
items from the logfile? Would you propose I use filter(func, list) for
this task, or is something else better?
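
One possible answer to both questions, sketched under assumptions: a dictionary keyed by (year, month, itemID) holds the running sums, and a plain `if` inside the loop does the filtering, so no separate `filter(func, list)` pass is needed. The sample data, the column order (date as YYYY-MM-DD, itemID, amount) and the name `sum_relevant` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; assumed column order: date (YYYY-MM-DD), itemID, amount.
SAMPLE_LOG = """\
2006-01-03,123456,10.50
2006-01-17,123456,4.50
2006-01-20,999999,7.00
2006-02-02,123456,3.00
2006-04-09,123456,2.25
"""

# itemID -> months (1..12) in which the item is relevant, as the config file would give.
RELEVANT = {"123456": {1, 4, 7, 10}}


def sum_relevant(rows, relevant):
    """Sum amounts per (year, month, itemID), skipping records the config excludes."""
    totals = defaultdict(float)
    for date, item_id, amount in rows:
        year, month = int(date[:4]), int(date[5:7])
        # Unknown itemIDs map to an empty tuple, so they are never relevant.
        if month in relevant.get(item_id, ()):
            totals[(year, month, item_id)] += float(amount)
    return dict(totals)


totals = sum_relevant(csv.reader(io.StringIO(SAMPLE_LOG)), RELEVANT)
for key in sorted(totals):
    print(key, totals[key])
```

Because the rows are consumed one at a time, the whole logfile never has to sit in a list. `float` is used for brevity; `decimal.Decimal` may suit monetary amounts better.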

In the end I want to mail the outcome of my process, but this seems
straightforward from the documentation I have found, although I must
connect to an external SMTP server.
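
A rough sketch of the mailing step with the standard `smtplib` and `email` modules. The addresses, the host name, the subject line and the use of STARTTLS on port 587 are all assumptions; the actual server settings depend on the provider. The send call is left commented out so the example runs without a network connection.

```python
import smtplib
from email.message import EmailMessage


def build_summary_message(totals, sender, recipient):
    """Format {(year, month, item_id): amount} as a plain-text e-mail."""
    lines = ["%04d-%02d  %s  %.2f" % (year, month, item_id, amount)
             for (year, month, item_id), amount in sorted(totals.items())]
    msg = EmailMessage()
    msg["Subject"] = "Monthly key figure summary"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(lines) + "\n")
    return msg


def send_message(msg, host, port=587, user=None, password=None):
    """Send msg through an external SMTP server (STARTTLS on 587 is an assumption)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if user is not None:
            server.login(user, password)
        server.send_message(msg)


msg = build_summary_message({(2006, 1, "123456"): 15.0},
                            "reports@example.com", "jaap@example.com")
print(msg.get_content())
# send_message(msg, "smtp.example.com")  # hypothetical host; uncomment to send
```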

Any tips, views, or advice are highly appreciated!
Jaap

PS: when I load the logfile in a spreadsheet I can create a pivot table
which does about the same ;-] but that is not what I want; the
processing must be automated in the end with a periodic script which
e-mails the summary of the key figure every month.
Nov 5 '06 #1
If you are running on Windows, you can use the win32com module to
automate the process of generating a pivot table in Excel, and then add
code to send it via e-mail.

Nov 5 '06 #2
Jaap wrote:
> Apart from this I have a configuration file, which contains the list of
> itemIDs I need to focus on per month. Not all itemIDs are relevant for
> each month, but for example only every second or third month. All
> records in the logfile with other itemIDs can be ignored. I have yet to
> define the format of this configuration file, but am thinking about a 0
> or 1 for each month, and then the itemID, like:
> "1 0 0 1 0 0 1 0 0 1 0 0 123456" for an itemID 123456 which only needs
> consideration at first month of each quarter.

It's probably not necessary if your records are on the order of 100K,
but if you're dealing with millions and above, you can write your
config file in binary using the struct module and condense it down to 6
bytes per record (32 bits for the ID and 12 bits for the month
occurrences). Filtering will also be faster, as for each record you just
have to do a bitwise AND with the 0..010...0 mask corresponding to a
given month.
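
A small sketch of that packing, under assumptions the post does not spell out: the ID is stored as a 4-byte unsigned integer, the 12 month flags sit in the low bits of a 2-byte unsigned integer, and bit 0 stands for January. The names `pack_entry` and `is_relevant` are hypothetical.

```python
import struct

# "<IH": little-endian, 4-byte unsigned itemID + 2-byte month mask = 6 bytes.
RECORD = struct.Struct("<IH")


def pack_entry(item_id, months):
    """Pack an itemID and its relevant months (1..12) into 6 bytes."""
    mask = 0
    for month in months:
        mask |= 1 << (month - 1)  # assumption: bit 0 = January, bit 11 = December
    return RECORD.pack(item_id, mask)


def is_relevant(packed, month):
    """Return True if the packed entry has the bit for this month set."""
    _item_id, mask = RECORD.unpack(packed)
    return bool(mask & (1 << (month - 1)))  # the bitwise AND with a one-bit mask


entry = pack_entry(123456, [1, 4, 7, 10])
print(len(entry), is_relevant(entry, 4), is_relevant(entry, 5))  # 6 True False
```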

George

Nov 5 '06 #3

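I would do something like this (obviously untested):

    import csv

    item_dict = {}
    with open(logfile, newline="") as f:  # logfile: path to the CSV log
        for row in csv.reader(f):
            # (code to get hold of item, date, amount; the column
            # order below is an assumption, adjust to the real layout)
            date, item, amount = row[0], row[1], float(row[2])
            if item not in item_dict:
                item_dict[item] = [(date, amount)]
            else:
                item_dict[item].append((date, amount))  # append takes one tuple

This will give you, for each unique item, a direct reference to wherever
it's been used.

I would then work through the config file and extract the items of interest
for the run date...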

HTH - Hendrik

Nov 6 '06 #4
Thanks!
All your replies have been both to the point and helpful for me.

You have proven both Python and its community are open and welcoming to
new users.

Jaap
Nov 6 '06 #5
