string parsing screwing up on large files?

Hello, I'm fairly new to Python but I've written a script that takes
in a special text file (a RenderMan .rib to be specific) and filters
some of the commands. The .rib file is a simple text file, but in
some cases it's very large: it can be 20 MB or more at times.

The script steps through each line looking for keywords and changes the
line if necessary, but most lines just pass in and out of the script
unmodified. The problem is that sometimes the lines aren't written out
correctly, and it's an intermittent problem. If I re-run the script
on the same input it usually works fine. After filtering about
100 files I might get 4 or 5 that come out bad; simply re-running
those fixes them.

Anyone know what I might look for? It's possible that the machine is
under a lot of I/O load and/or CPU load when it happens, but I'm not sure
about that. I normally send this processing to a render farm, so it's
hard to predict exactly what sort of load is going on at that time. It
feels like a buffer isn't getting flushed before the text is written
out, or something like that.
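
One way to rule out the unflushed-buffer theory is sketched below. This is only a sketch under assumptions: it targets Python 3, and `filter_line`, `filter_rib`, and the `ShadingRate` rule are hypothetical stand-ins, since the actual script was not posted. The output goes to a temporary file that is flushed, synced, and closed before it is renamed over the target, so nothing downstream can pick up a half-written file.

```python
import os
import tempfile


def filter_line(line):
    """Hypothetical rule: comment out ShadingRate commands, pass others through."""
    if line.startswith("ShadingRate"):
        return "# " + line
    return line


def filter_rib(src_path, dst_path):
    """Filter src_path line by line, then move the finished result to dst_path."""
    dst_dir = os.path.dirname(os.path.abspath(dst_path))
    # Keep the temp file in the destination directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as dst, open(src_path, "r") as src:
            for line in src:
                dst.write(filter_line(line))
            dst.flush()             # push Python's buffer to the OS
            os.fsync(dst.fileno())  # ask the OS to push its cache to disk
        # Both files are closed here; only now does the output become visible.
        os.replace(tmp_path, dst_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If bad files still turn up with a write path like this, the buffer theory becomes less likely and the filtering logic or the farm's file system is the next place to look.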

Any suggestions where I might look?

thanks

daniel
Jul 18 '05 #1
Daniel Kramer:
Any suggestions where I might look?


In the source code, probably. I've looked long and hard at your posting,
but I didn't find any bug there.

--
René Pijlman
Jul 18 '05 #2
On 19 Dec 2003 18:55:29 -0800, da*********@yahoo.com (Daniel Kramer) wrote:
Any suggestions where I might look?

What is telling you that some lines aren't correct? RenderMan syntax errors?
Maybe if you saved the bad file(s) and re-ran the script until you got a good
one, and then ran diff -u goodfile badfile to see how things were actually
changing, it would become clear. Or if not, you could post some diffs and
the code that should be accomplishing the changes, and we could go from there.
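
If diff isn't available on the farm machines, roughly the same comparison can be sketched in Python with the standard difflib module. The helper name `unified_diff_files` and the file names are placeholders, not anything from the original script, and difflib can be slow on files in the 20 MB range.

```python
import difflib


def unified_diff_files(good_path, bad_path):
    """Return a unified diff (like diff -u goodfile badfile) as one string."""
    # errors="replace" keeps the comparison going if the bad file holds garbage bytes.
    with open(good_path, "r", errors="replace") as f:
        good_lines = f.readlines()
    with open(bad_path, "r", errors="replace") as f:
        bad_lines = f.readlines()
    diff = difflib.unified_diff(
        good_lines, bad_lines, fromfile=good_path, tofile=bad_path
    )
    return "".join(diff)
```

Something like `print(unified_diff_files("goodfile", "badfile"))` then shows which lines differ; an empty string means the two files are identical.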

Is the code threaded? Are you perhaps clobbering something across threads
occasionally? Accidental name collisions? Unsynchronized accesses?

You might also want to mention what platform, Python version, etc. you are running.
Maybe there is a file system bug that an upgrade would fix? It doesn't happen often,
but it might be worth googling for your platform.
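
A quick way to collect those details on each farm machine is sketched below, using only the standard sys and platform modules. The function name `environment_report` is made up for illustration.

```python
import platform
import sys


def environment_report():
    """Return the interpreter and OS details worth including in a bug report."""
    return "\n".join([
        "Python:   " + sys.version.replace("\n", " "),
        "Platform: " + platform.platform(),
        "Machine:  " + platform.machine(),
    ])


print(environment_report())
```

Running it on a node that produced a bad file and on one that didn't may show whether the failures follow a particular OS or Python build.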

Regards,
Bengt Richter
Jul 18 '05 #3
