
file corruption on windows - possible bug

I've written a piece of code that iterates through a list of items and
determines the filename to write some piece of data to based on
something in the item itself. Here is a small example piece of code to
show the type of thing I'm doing::

#################################
file_dict = {}

a_list = [("a", "a%s" % i) for i in range(2500)]
b_list = [("b", "b%s" % i) for i in range(2500)]
c_list = [("c", "c%s" % i) for i in range(2500)]
d_list = [("d", "d%s" % i) for i in range(2500)]
joined_list = a_list + b_list + c_list + d_list

for key, value in joined_list:
    outfile = file_dict.setdefault(key, open("%s.txt" % key, "w"))
    outfile.write("%s\n" % value)

for f in file_dict.values():
    f.close()
#################################

Problem is, when I run this on Windows, I get 14,520 null ("\x00")
characters at the front of the file and each file is 16,390 bytes long.
When I run this script on Linux, each file is 13,890 bytes and contains
no "\x00" characters. This piece of code::

#################################
import cStringIO

file_dict = {}

a_list = [("a", "a%s" % i) for i in range(2500)]
b_list = [("b", "b%s" % i) for i in range(2500)]
c_list = [("c", "c%s" % i) for i in range(2500)]
d_list = [("d", "d%s" % i) for i in range(2500)]
joined_list = a_list + b_list + c_list + d_list

for key, value in joined_list:
    #outfile = file_dict.setdefault(key, open("%s.txt" % key, "w"))
    outfile = file_dict.setdefault(key, cStringIO.StringIO())
    outfile.write("%s\n" % value)

for key, io_string in file_dict.items():
    outfile = open("%s.txt" % key, "w")
    io_string.seek(0)
    outfile.write(io_string.read())
    outfile.close()
#################################

results in files containing 16,390 bytes and no "\x00" characters on
Windows and 13,890 bytes on Linux and no "\x00" characters (file size
difference on Windows and Linux is due to line ending). I'm still doing
a setdefault on the dictionary to create an object if the key doesn't
exist, but I'm using a cStringIO object rather than a file object. So,
I'm treating this just like it was a file and writing it out later.
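
The two file sizes can be checked with a little arithmetic. This is a minimal sketch, assuming the values "a0" through "a2499" from the example and one line ending per value; the variable names are only for illustration:

```python
# Expected size of one output file, assuming the values "a0" .. "a2499"
# from the example, each written with a single trailing "\n".
values = ["a%s" % i for i in range(2500)]

payload = sum(len(v) for v in values)    # characters, excluding line endings
lf_size = payload + len(values)          # "\n" stays one byte (Linux)
crlf_size = payload + 2 * len(values)    # "\n" becomes "\r\n" (Windows text mode)

print(lf_size, crlf_size)  # prints: 13890 16390
```

The 2,500-byte gap is exactly one extra "\r" per line, which matches the sizes reported above.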

Does anyone have any idea as to why this is writing over 14,000 "\x00"
characters to my file to start off with where printable characters
should go and then writing the remainder of the file correctly?
Jeremy Jones
Jul 19 '05 #1
Jeremy Jones wrote:
Here is a small example piece of code to
show the type of thing I'm doing::

#################################
file_dict = {}

a_list = [("a", "a%s" % i) for i in range(2500)]
b_list = [("b", "b%s" % i) for i in range(2500)]
c_list = [("c", "c%s" % i) for i in range(2500)]
d_list = [("d", "d%s" % i) for i in range(2500)]
joined_list = a_list + b_list + c_list + d_list

for key, value in joined_list:
    outfile = file_dict.setdefault(key, open("%s.txt" % key, "w"))
    outfile.write("%s\n" % value)

for f in file_dict.values():
    f.close()
#################################

Problem is, when I run this on Windows, I get 14,520 null ("\x00")
characters at the front of the file and each file is 16,390 bytes long.


Your call to setdefault is opening the file for writing every time it is
called, but using only the first handle to write to the file. I presume you
get a nasty interaction between the file handle you are using to write and
the other file handles which open the file in a destructive ("w") mode.

The fix is simply to open each file only once instead of 2500 times, e.g.
(untested code):

for key, value in joined_list:
    if key in file_dict:
        outfile = file_dict[key]
    else:
        outfile = file_dict[key] = open("%s.txt" % key, "w")
    outfile.write("%s\n" % value)
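
The underlying point is that Python evaluates the arguments to setdefault before the call, so the default expression runs on every iteration whether or not the key already exists. A minimal sketch of that behaviour, where make_value is a hypothetical stand-in for open() that only records how often it is called:

```python
calls = []

def make_value(key):
    # Hypothetical stand-in for open(): records each invocation.
    calls.append(key)
    return "handle-for-%s" % key

cache = {}
keys = ["a", "a", "a", "b", "b"]

for key in keys:
    # make_value(key) runs on every pass, even when key is already present;
    # setdefault then discards the new value and returns the stored one.
    cache.setdefault(key, make_value(key))

eager_calls = len(calls)

calls.clear()
cache.clear()

for key in keys:
    # Testing membership first means make_value runs once per distinct key.
    if key not in cache:
        cache[key] = make_value(key)

lazy_calls = len(calls)

print(eager_calls, lazy_calls)  # prints: 5 2
```

With open() in place of make_value, the first loop would create a fresh file object on every pass, which is the situation described above.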
Jul 19 '05 #2
On Mon, 09 May 2005 10:54:22 -0400, Jeremy Jones <za******@bellsouth.net> wrote:
I've written a piece of code that iterates through a list of items and
determines the filename to write some piece of data to based on
something in the item itself. Here is a small example piece of code to
show the type of thing I'm doing::

#################################
file_dict = {}

a_list = [("a", "a%s" % i) for i in range(2500)]
b_list = [("b", "b%s" % i) for i in range(2500)]
c_list = [("c", "c%s" % i) for i in range(2500)]
d_list = [("d", "d%s" % i) for i in range(2500)]
joined_list = a_list + b_list + c_list + d_list

for key, value in joined_list:
    outfile = file_dict.setdefault(key, open("%s.txt" % key, "w"))

You are opening files multiply, since the open is a default value expression that is
always evaluated. Try replacing the above line with the following two lines:

    try: outfile = file_dict[key]
    except KeyError: outfile = file_dict[key] = open("%s.txt" % key, 'w')

    outfile.write("%s\n" % value)

for f in file_dict.values():
    f.close()
#################################

Problem is, when I run this on Windows, I get 14,520 null ("\x00")
characters at the front of the file and each file is 16,390 bytes long.
When I run this script on Linux, each file is 13,890 bytes and contains
no "\x00" characters. This piece of code::

I don't want to think about the _exact_ explanation, but try the above (untested ;-)
and see if the symptoms change ;-)
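
For anyone trying this on a current interpreter, here is a minimal sketch of the same try/except approach, assuming Python 3 (the code in this thread is Python 2). The function name write_grouped and the use of a temporary directory are illustrative choices, not part of the original script, and newline="\n" is passed so that no line ending translation happens and the file size is the same on every platform:

```python
import os
import tempfile

def write_grouped(pairs, directory):
    """Write each value to <directory>/<key>.txt, opening each file once."""
    handles = {}
    try:
        for key, value in pairs:
            try:
                outfile = handles[key]
            except KeyError:
                # First time this key is seen: open its file exactly once.
                path = os.path.join(directory, "%s.txt" % key)
                outfile = handles[key] = open(path, "w", newline="\n")
            outfile.write("%s\n" % value)
    finally:
        # Close every handle, even if a write failed part-way through.
        for f in handles.values():
            f.close()

# Same data as joined_list in the example: all "a" items, then "b", "c", "d".
pairs = [(k, "%s%s" % (k, i)) for k in "abcd" for i in range(2500)]

with tempfile.TemporaryDirectory() as tmp:
    write_grouped(pairs, tmp)
    with open(os.path.join(tmp, "a.txt"), "rb") as f:
        data = f.read()

print(len(data), data.count(b"\x00"))  # prints: 13890 0
```

Reading the file back in binary mode makes it easy to confirm that no "\x00" bytes were written.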

Regards,
Bengt Richter
Jul 19 '05 #3
