tips requested for a log-processing script

Pythoners,
As a relatively new user of Python I would like to ask your advice on
the following script I want to create.

I have a logfile which contains records. All records have the same
layout and are stored in CSV format. Each record is (non-uniquely)
identified by a date and an itemID. Each itemID can occur 0 or more times
per month. Each record contains a figure/amount which I need to sum per
month and per itemID. I have already managed to separate the individual
parts of each logfile record by using the csv module from Python 2.5.
Very simple indeed.

Apart from this I have a configuration file, which contains the list of
itemIDs I need to focus on per month. Not all itemIDs are relevant for
each month; some, for example, only every second or third month. All
records in the logfile with other itemIDs can be ignored. I have yet to
define the format of this configuration file, but am thinking about a 0
or 1 for each month, and then the itemID, like:
"1 0 0 1 0 0 1 0 0 1 0 0 123456" for an itemID 123456 which only needs
consideration in the first month of each quarter.

My question to this forum is: which data structure would you propose?
The logfile is not very big (about 200k max, average 200k) so I assume I
can keep it all in memory, for example in a list?

How would you propose I tackle the filtering of relevant/non-relevant
items from the logfile? Would you propose I use filter(func, list) for
this task, or is something else better?

In the end I want to mail the outcome of my process, but this seems
straightforward from the documentation I have found, although I must
connect to an external SMTP server.

Any tips, views, or advice are highly appreciated!
Jaap

PS: when I load the logfile in a spreadsheet I can create a pivot table
which does about the same ;-] but that is not what I want; the
processing must be automated in the end with a periodic script which
e-mails the summary of the key figure every month.
Nov 5 '06 #1
If you are running on Windows you can use the win32com module to
automate the process of generating a pivot table in Excel, and then add
code to send it via e-mail.
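
For the mailing step on its own, here is a minimal sketch using only the standard library (smtplib and email.message), written for a current Python 3 rather than the 2.5 mentioned in the question. The subject line, the addresses, the function names and the shape of the totals dictionary are all assumptions made for illustration; the SMTP host is whatever external server you have to use.

```python
import smtplib
from email.message import EmailMessage


def build_summary_mail(totals, sender, recipient):
    """Build a plain-text e-mail with one line per (month, itemID) total."""
    msg = EmailMessage()
    msg["Subject"] = "Monthly key figure summary"  # hypothetical subject
    msg["From"] = sender
    msg["To"] = recipient
    lines = ["%s  %s  %.2f" % (month, item, amount)
             for (month, item), amount in sorted(totals.items())]
    msg.set_content("\n".join(lines))
    return msg


def send_mail(msg, host, port=25):
    """Hand the message to an external SMTP server (host and port are placeholders)."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)


# Build a message from made-up totals; nothing is sent until send_mail() is called.
msg = build_summary_mail({("2006-10", "123456"): 42.5},
                         "script@example.com", "jaap@example.com")
```

Keeping the message building separate from the sending makes it possible to check the summary text without touching a mail server.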

Nov 5 '06 #2
Jaap wrote:
> Apart from this I have a configuration file, which contains the list of
> itemIDs I need to focus on per month. Not all itemIDs are relevant for
> each month; some, for example, only every second or third month. All
> records in the logfile with other itemIDs can be ignored. I have yet to
> define the format of this configuration file, but am thinking about a 0
> or 1 for each month, and then the itemID, like:
> "1 0 0 1 0 0 1 0 0 1 0 0 123456" for an itemID 123456 which only needs
> consideration in the first month of each quarter.

It's probably not necessary if your records are on the order of 100K,
but if you're dealing with millions and above, you can write your
config file in binary using the struct module and condense it down to 6
bytes per record (32 bits for the ID and 12 bits for the month
occurrences). Filtering will also be faster, as for each record you just
have to do a bitwise AND with the 0..010...0 mask corresponding to a
given month.
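
A minimal sketch of that idea, assuming a little-endian layout of a 4-byte unsigned ID followed by a 2-byte field of which only the low 12 bits are used, with bit 0 standing for January. The bit order, the format string and the function names are illustrative choices, not something fixed by the description above. Written for Python 3.

```python
import struct

# 4-byte unsigned item ID + 2-byte month mask = 6 bytes, no padding ("<").
RECORD = struct.Struct("<IH")


def months_to_mask(flags):
    """Turn twelve 0/1 flags (January first) into an integer bitmask."""
    mask = 0
    for month_index, flag in enumerate(flags):
        if flag:
            mask |= 1 << month_index
    return mask


def pack_config_line(line):
    """Pack a text line such as '1 0 0 1 0 0 1 0 0 1 0 0 123456' into 6 bytes."""
    fields = line.split()
    flags = [int(f) for f in fields[:12]]
    item_id = int(fields[12])
    return RECORD.pack(item_id, months_to_mask(flags))


def is_relevant(packed, month):
    """Check a month (1..12) against a packed record with a bitwise AND."""
    item_id, mask = RECORD.unpack(packed)
    return bool(mask & (1 << (month - 1)))


packed = pack_config_line("1 0 0 1 0 0 1 0 0 1 0 0 123456")
```

With this record, is_relevant(packed, 1) is true and is_relevant(packed, 2) is false; the same holds for months 4, 7 and 10 versus the rest.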

George

Nov 5 '06 #3
I would do something like this: (obviously untested)

    item_dict = {}
    for line in open(logfile, 'r', 1):
        # (code to get hold of item, date, amount)
        if item not in item_dict:
            item_dict[item] = [(date, amount)]
        else:
            item_dict[item].append((date, amount))

This will give you, for each unique item, a direct reference to every
place it has been used.

I would then work through the config file and extract the items of
interest for the run date...
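
Putting the two steps together, a rough sketch in Python 3 might look like the following. It assumes each log record has exactly three columns (date as YYYY-MM-DD, itemID, amount) and that the config file uses the "twelve flags, then itemID" format proposed in the question; the real column layout was never given, so adjust the unpacking to match. Instead of storing every (date, amount) pair it sums directly into a dictionary keyed by (year, month, itemID).

```python
import csv
import io
from collections import defaultdict


def load_config(lines):
    """Map itemID -> set of relevant month numbers (1..12)."""
    relevant = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 13:
            continue  # skip blank or malformed lines
        months = set(i + 1 for i, flag in enumerate(fields[:12]) if flag == "1")
        relevant[fields[12]] = months
    return relevant


def sum_per_month(log_file, relevant):
    """Sum amounts per (year, month, itemID), ignoring irrelevant records."""
    totals = defaultdict(float)
    # Assumed column order: date, itemID, amount.
    for date, item, amount in csv.reader(log_file):
        year, month = int(date[:4]), int(date[5:7])
        if month in relevant.get(item, ()):
            totals[(year, month, item)] += float(amount)
    return dict(totals)


# Made-up sample data, using io.StringIO in place of a real logfile.
config = load_config(["1 0 0 1 0 0 1 0 0 1 0 0 123456"])
log = io.StringIO("2006-10-03,123456,10.5\n"
                  "2006-10-17,123456,4.5\n"
                  "2006-11-02,123456,99.0\n"
                  "2006-10-05,999999,1.0\n")
totals = sum_per_month(log, config)
```

Here only the two October records for item 123456 are counted: November is not flagged for that item, and item 999999 is not in the config at all. The membership test inside the loop does the filtering, so a separate filter(func, list) pass is not needed.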

HTH - Hendrik

Nov 6 '06 #4
Hendrik van Rooyen wrote:
> "Jaap" <ja**@nospaml.com> wrote:
>
>> Pythoners,

Thanks!
All your replies have been both to the point and helpful for me.

You have proven that both Python and its community are open and
welcoming to new users.

Jaap
Nov 6 '06 #5
