
File not read to end

Hi,

I'm trying to write a simple log parsing program. I noticed that it
isn't reading my log file to the end.

My log is around 200,000 lines but it is stopping at line 26,428. I
checked that line and there aren't any special characters.

This is the file reading code segment that I'm using:
sysFile=open(sysFilename,'r')
lineCount = 0
for line in sysFile:
    lineCount +=1
    print str(lineCount) + " -- " + line

I also stuck this same code bit into a test script and it was able to
parse the entire log without problem. Very quirky.

This is my first foray from Perl to Python so I appreciate any help.

Thanks in advance.

--Andrew

Apr 25 '07 #1
Show us more of your surrounding code so we have some chance of figuring
out why this working code stops. There's nothing wrong with this code;
the problem is somewhere else.

Suggestion:

lineCount = 0
for line in sysFile:
    lineCount +=1
    print str(lineCount) + " -- " + line

can be written:

for lineCount, line in enumerate(sysFile):
    print "%i--%s" % (lineCount, line)
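
One detail about the enumerate() form: it counts from 0 by default, while the manual counter starts at 1, so the two versions print different numbers for the same line. A minimal sketch of keeping the numbering identical, written for Python 3 (the thread's code is Python 2). The helper name number_lines is hypothetical, io.StringIO stands in for the log file, and the start argument to enumerate() needs Python 2.6 or later:

```python
import io

def number_lines(lines, start=1):
    """Return 'N -- text' strings, numbering from `start` like the manual counter."""
    return ["%i -- %s" % (n, line.rstrip("\n")) for n, line in enumerate(lines, start)]

# io.StringIO stands in for the open log file in this sketch.
sample = io.StringIO("first\nsecond\nthird\n")
numbered = number_lines(sample)
for entry in numbered:
    print(entry)
```

Passing start=0 instead reproduces the default enumerate() numbering.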

-Larry
Apr 25 '07 #2
an**************@gmail.com wrote:
> My log is around 200,000 lines but it is stopping at line 26,428. I
> checked that line and there aren't any special characters.

Are you on Windows? Just in case, pass "rb" as the mode to open().
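
For background on why the mode can matter: on Windows, a file opened in text mode under Python 2 is read through the C runtime, which treats a Ctrl-Z byte (0x1A) as end-of-file, so a stray 0x1A in a log can end the loop early. A rough way to check a file for that byte, sketched in Python 3. The helper find_ctrl_z is hypothetical and not part of the original script, and the demo uses a throwaway temporary file rather than the real log:

```python
import os
import tempfile

def find_ctrl_z(path):
    """Return the 1-based line numbers that contain a 0x1A (Ctrl-Z) byte."""
    hits = []
    with open(path, "rb") as handle:  # binary mode: no newline or EOF translation
        for number, raw in enumerate(handle, 1):
            if b"\x1a" in raw:
                hits.append(number)
    return hits

# Demo on a throwaway file; a real run would pass the log's path instead.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"line one\nline \x1a two\nline three\n")
ctrl_z_lines = find_ctrl_z(tmp.name)
os.remove(tmp.name)
print(ctrl_z_lines)
```

An empty result for the real log would suggest the early stop has another cause.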

Regards,

--
.. Facundo
..
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Apr 25 '07 #3
On Apr 25, 3:03 pm, Facundo Batista <facu...@taniquetil.com.ar> wrote:
> andrew.jeffer...@gmail.com wrote:
> > My log is around 200,000 lines but it is stopping at line 26,428. I
> > checked that line and there aren't any special characters.
>
> Are you in Windows? Just in case, put "rb" as the mode of the open.

I am running Windows (Vista). I tried the "rb" as you suggested but it
didn't solve the problem. Thanks for the suggestion.

--Andrew

Apr 26 '07 #4
Hi Larry,

I've attached the whole script. Thanks again for your help.

--Andrew
import getopt, sys, re, os

def main():
    try:  # Get options for processing
        o, a = getopt.getopt(sys.argv[1:], 'a:d:hl')
    except getopt.GetoptError:
        # print help information and exit:
        usage()
        sys.exit(2)
    opts = {}
    for k,v in o:  # Parse parameters into hash
        opts[k] = v
    # make sure that all of the needed options are included
    if opts.has_key('-h'):  # Return help for -h
        usage()
        sys.exit(0)
    if opts.has_key('-l'):
        pathname = opts['-l']
    if not (opts.has_key('-a')):
        usage()
        sys.exit()
    else:
        address=opts['-a']
    if not (opts.has_key('-d')):
        usage()
        sys.exit()
    if not (opts.has_key('-l')):  # Use current path if not provided
        pathname = os.path.abspath(os.path.dirname(sys.argv[0]))

    # Get file names and open files
    sysFilename = os.path.abspath(pathname) + "\sys" + opts["-d"] + ".txt"
    #logFilename = opts["-l"] + "\log" + opts["-d"] + ".txt"
    spamFilename = os.path.abspath(pathname) + "\spam" + opts["-d"] + ".log"
    print "Loading Files:\n" + sysFilename + "\n" + spamFilename + "\n"
    try:  # Open log files
        sysFile=open(sysFilename,'rb')
        #logFile=open(logFilename,'r')
        spamFile=open(spamFilename,'rb')
    except:
        print "could not open file for reading" , sys.exc_info()[0]
        sys.exit()
    ToAddr = {}    # This will hold Messages TO the address
    FrAddr = {}    # This will hold Messages FROM the address
    numFound = 0   # For Testing
    notFound = 0   # For Testing
    lineCount = 0  # For Testing
    # Read file and get message IDs that correspond to the searched address
    for line in sysFile:
        lineCount +=1  # For Testing
        # print str(lineCount) + " -- " + line
        daRegex = re.compile(address)
        if daRegex.search(line):  # Found address in line - Continue processing
            #re.search(address,line): #If line has address
            print line + "\n"  # For Testing
            numFound +=1  # For Testing
            if re.search('MAIL FROM:',line):  # Add it (message id) to the From list if needed
                MID = getMID(line)
                if FrAddr.has_key(MID):
                    break
                else:
                    FrAddr[MID]=""
                    #print "From: " + MID + "\n"
            elif re.search('RCPT TO:',line):  # Add it (message id) to the To list if needed
                MID = getMID(line)
                if ToAddr.has_key(MID):
                    break
                else:
                    ToAddr[MID]=""
        else:
            notFound +=1  # For Testing
    # Close and re-open file for re-processes (there is probably a better way to do this)
    sysFile.close
    sysFile=open(sysFilename,'r')

    for line in sysFile:  # Get all messages with message IDs that have been found
        MID = getMID(line)
        if FrAddr.has_key(MID):
            FrAddr[MID]+=line
            # print line + "\n"
        elif ToAddr.has_key(MID):
            ToAddr[MID]+=line
    sysFile.close

    for line in spamFile:  # Get similar messages from spam file
        MID = getMID(line)
        if FrAddr.has_key(MID):
            FrAddr[MID]+='SPAM>>>'+ line
        elif ToAddr.has_key(MID):
            ToAddr[MID]+='SPAM>>>'+ line
    spamFile.close

    # open output files
    fname = pathname + "\\" + address + ".txt"
    fout = open(fname,'w')
    # Output and format
    for key in FrAddr.keys():
        fout.write("<<<<<<< FROM "+ address+ " Message ID "+ key + "------------\n")
        fout.write(FrAddr[key]+"\n")
    for key in ToAddr.keys():
        fout.write(">>>>>>To "+ address+ " Message ID "+ key + "------------\n")
        fout.write(ToAddr[key]+"\n")

    print "------------------- Done processing ---------------------"
    print "Found: " + str(numFound)  # Test
    print "Not matching: " + str(notFound)  # Test
    print "Line Count: " + str(lineCount)  # Test
    fout.close

def getMID(daLine):  # Extracts the message ID from the message
    p = re.compile("\(.*?\)")
    pid=p.search(daLine)
    if pid:
        id=pid.group()
        id=id.lstrip('\(')
        id=id.rstrip('\)')
        #print id
        return id
    else:
        return

def usage():  # Provides usage feedback
    print """
    Syntax:
    -a email account to find
    -l location of log files (OPTIONAL)
    -d date, in file date format (####)
    """

if __name__ == "__main__":  # Call main loop
    main()
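
On the "there is probably a better way to do this" comment in the script: one option is to rewind the open file with seek(0) instead of closing and reopening it. A minimal sketch in Python 3, assuming the second pass can use the same open mode as the first. io.StringIO stands in for the log file and count_lines is a hypothetical helper:

```python
import io

def count_lines(handle):
    """Consume the handle to its end and return the number of lines read."""
    return sum(1 for _ in handle)

log = io.StringIO("a\nb\nc\n")  # stand-in for the opened log file
first_pass = count_lines(log)
exhausted = count_lines(log)    # nothing left: the position is at the end
log.seek(0)                     # rewind to the start of the file
second_pass = count_lines(log)
print(first_pass, exhausted, second_pass)
```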
Apr 26 '07 #5
an**************@gmail.com wrote:
> I've attached the whole script. Thanks again for your help.
>
> --Andrew

Andrew, a tip:

If you attach the whole script, a lot of people will leave the thread.
Me, for example: I won't read 100 lines of code to find where the
problem is and then try to solve it.

The best way to handle this, and to get more help from the community,
is to start trimming your code. You take 20 lines away, and the problem
persists. You cut another 15, and the problem persists.

After ten minutes of work, you have 15 lines of code that still show
your problem. Send that to the community, and you'll surely get more
help.

As a fantastic side effect of that process, you normally *find* the
problem yourself, which is always better. :)

Regards,

--
.. Facundo
..
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Apr 26 '07 #6
In <11**********************@c18g2000prb.googlegroups.com>,
andrew.jefferies wrote:

> […]
>
> I've attached the whole script. Thanks again for your help.

There are ``break`` statements in the loop body!? Do you really want to
leave the loop at those places?
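
To make the difference concrete: break ends the whole for loop at the first repeated message ID, while continue only skips the current line, and the former would match a log that stops being read partway through. A small Python 3 sketch with made-up IDs; collect_ids is a hypothetical helper, not the script's code:

```python
def collect_ids(ids, stop_on_duplicate):
    """Return (ids seen, items processed); optionally break on the first duplicate."""
    seen = {}
    processed = 0
    for mid in ids:
        processed += 1
        if mid in seen:
            if stop_on_duplicate:
                break      # leaves the loop: later items are never read
            continue       # skips only this item and keeps reading
        seen[mid] = ""
    return sorted(seen), processed

sample_ids = ["A1", "B2", "A1", "C3", "D4"]
with_break = collect_ids(sample_ids, stop_on_duplicate=True)
with_continue = collect_ids(sample_ids, stop_on_duplicate=False)
print(with_break)
print(with_continue)
```

With break, the IDs after the first repeat are never seen; with continue, every item is processed.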

And I've seen at least two places with ``somefile.close``, which just
references the `close()` method but does not *call* it. Parentheses are
the "call operator" in Python and they are not optional!
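
A short illustration of that point, sketched in Python 3 with io.StringIO standing in for an open file (an assumption for the demo; real file objects expose the same closed attribute). It also shows a with block, which closes the file automatically:

```python
import io

handle = io.StringIO("some log text\n")  # stand-in for an open file

handle.close                 # only looks up the bound method; nothing is called
still_open = not handle.closed

handle.close()               # the parentheses perform the call
now_closed = handle.closed
print(still_open, now_closed)

# A with block closes the file automatically when the block exits.
with io.StringIO("more text\n") as managed:
    first_line = managed.readline()
closed_by_with = managed.closed
```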

Ciao,
Marc 'BlackJack' Rintsch
Apr 26 '07 #7
Thanks for the input Facundo. I'm not a frequent poster so I'm not up
on the etiquette and do appreciate the feedback. I'll better control
code spews in future posts!
Apr 26 '07 #8
"rb", please

Apr 27 '07 #9
