file IO

Can anyone explain this?

I have a file called old.dat with two lines:

1
2

So it's 3 bytes long (there is no newline after the 2). I run the following:

f = open('old.dat', 'r')   # text mode
olddata = f.readlines()
f.close()

f = open('new.dat', 'w')   # text mode
f.writelines(olddata)
f.close()

new.dat is now 4 bytes long. ???

I need to reformat and then save some data, and then export the
reformatted data to a spreadsheet-friendly format. But after simply
copying the file with the script above (to isolate the problem), my
export function takes 10x as long as it does with the original file.
Worse, the output has an extra newline character at the end of each
line. Any suggestions would be much appreciated; I am going a bit crazy
trying to understand this.

Darren
Jul 18 '05 #1
One more bit of info. The extra newline character is added to the output
when I open the rewritten file like this:

import os
from mmap import mmap, ACCESS_READ
f = open('foobar.dat', 'rU')  # "U" = universal newlines (Python 2 era; removed in 3.11)
fd = f.fileno()
m = mmap(fd, os.fstat(fd).st_size, access=ACCESS_READ)
olddata = []
line = m.readline()
while line:
    olddata.append(line)
    line = m.readline()

Using mmap to read the original data file works. Any thoughts? I would
really like to stick with mmap; my data files are the right size to
benefit from it.

Darren
Jul 18 '05 #2
Are you using Windows? That would mean the answer is almost certainly
"something to do with carriage returns and binary vs. text mode". The
lack of a trailing newline on the last line of your example can also
cause additional trouble. My tests on Unix, with stdio, mmap, and
StringIO, never gave me a 4-byte file, but on Windows the same file
might read as "a\r\nb" in binary mode and as "a\nb" in text mode.
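A minimal Python 3 sketch of this explanation (an illustration, not code from the thread): passing `newline="\r\n"` to `open` forces Windows-style newline translation on any platform, so it reproduces both the 3-to-4 byte growth and the stray `\r` that a tool may show as an extra line break. The file names are hypothetical scratch files in a temporary directory.

```python
import os
import tempfile

# Hypothetical scratch directory, so no real files are touched.
tmp = tempfile.mkdtemp()
old_path = os.path.join(tmp, "old.dat")
new_path = os.path.join(tmp, "new.dat")

# old.dat as created on Linux: b"1\n2", no trailing newline, 3 bytes.
with open(old_path, "wb") as f:
    f.write(b"1\n2")

# Read in text mode, as in the original script.
with open(old_path, "r") as f:
    olddata = f.readlines()  # ['1\n', '2']

# newline="\r\n" makes Python 3 write every "\n" as "\r\n" on any OS,
# which imitates what text mode does on Windows.
with open(new_path, "w", newline="\r\n") as f:
    f.writelines(olddata)

print(os.path.getsize(old_path), os.path.getsize(new_path))  # 3 4

# A line that already ends in "\r\n" (e.g. one returned by mmap.readline)
# is translated again, leaving an extra "\r" before the line ending.
doubled_path = os.path.join(tmp, "doubled.dat")
with open(doubled_path, "w", newline="\r\n") as f:
    f.write("1\r\n")
with open(doubled_path, "rb") as f:
    raw = f.read()
print(raw)  # b'1\r\r\n'
```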

I doubt that the mmap module's readline knows whether the file was
opened in universal text mode. That mode is implemented by Python's own
file objects, while mmap works directly on the file descriptor.
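A short Python 3 check of that point (the file name is hypothetical): the text-mode file object hides the `\r`, while `mmap.readline` on the same descriptor returns the bytes as stored. The final `replace` is just one possible workaround for keeping mmap; it is an assumption of this sketch, not something suggested in the thread.

```python
import mmap
import os
import tempfile

# Hypothetical file with Windows line endings, like new.dat after the copy.
path = os.path.join(tempfile.mkdtemp(), "crlf.dat")
with open(path, "wb") as f:
    f.write(b"1\r\n2")

# A text-mode file object translates "\r\n" to "\n" while reading.
with open(path, "r") as f:
    text_lines = f.readlines()  # ['1\n', '2']

# mmap works from the file descriptor, so it sees the bytes on disk even
# though the Python file object here was opened in text mode.
with open(path, "r") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        raw_lines = list(iter(m.readline, b""))  # [b'1\r\n', b'2']

# One possible workaround: normalize the endings of the mmap'ed lines.
normalized = [line.replace(b"\r\n", b"\n") for line in raw_lines]
print(text_lines, raw_lines, normalized)
```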

On Unix, I don't find that a "while" loop with mmap.readline is any
faster than a "for" loop over a file:

[45426 lines, 409305 bytes]
$ timeit -s "..." "readspeed.read_stdio('/usr/share/dict/words')"
10 loops, best of 3: 34.9 msec per loop
$ timeit -s "..." "readspeed.read_mmap('/usr/share/dict/words')"
10 loops, best of 3: 107 msec per loop

[363416 lines, 3274440 bytes]
$ time python -c "import readspeed; readspeed.read_stdio('biggerfile.txt')"
real 0.372s user 0.331s sys 0.031s
$ time python -c "import readspeed; readspeed.read_mmap('biggerfile.txt')"
real 1.080s user 1.013s sys 0.021s

[2907328 lines, 26195520 bytes]
$ time python -c "import readspeed; readspeed.read_stdio('biggerfile.txt')"
real 2.603s user 2.308s sys 0.157s
$ time python -c "import readspeed; readspeed.read_mmap('biggerfile.txt')"
real 8.514s user 7.893s sys 0.153s

I didn't have any "bigger-than-RAM text files" around to test.

Testing "biggerfile.txt" with mode "rU" gives real 3.110s, so there is
some penalty from using universal newlines.
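For readers on Python 3: the `"U"` mode flag no longer exists (it was removed in Python 3.11). Universal newlines are the default for text files and are controlled by the `newline` argument of `open`. A small sketch with a hypothetical file that mixes all three line-ending styles:

```python
import os
import tempfile

# Hypothetical file mixing Unix, Windows and old-Mac line endings.
path = os.path.join(tempfile.mkdtemp(), "mixed.txt")
with open(path, "wb") as f:
    f.write(b"a\nb\r\nc\rd")

# The default newline=None is universal-newline mode (the old "rU"):
# "\n", "\r\n" and "\r" all come back as "\n".
with open(path, "r") as f:
    universal = f.readlines()

# newline="" still splits on all three endings but leaves them untranslated.
with open(path, "r", newline="") as f:
    untranslated = f.readlines()

print(universal)     # ['a\n', 'b\n', 'c\n', 'd']
print(untranslated)  # ['a\n', 'b\r\n', 'c\r', 'd']
```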

------------------------------------------------------------------------
# readspeed.py
from mmap import mmap, PROT_READ
import os

def consume(iterable):
    for j in iterable:
        pass

def read_stdio(filename):
    f = open(filename)  # open(filename, "rU") is slightly slower
    consume(f)

def read_mmap(filename):
    f = open(filename)
    fd = f.fileno()
    m = mmap(fd, os.fstat(fd).st_size, prot=PROT_READ)
    while 1:
        if not m.readline(): break
------------------------------------------------------------------------
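`prot=PROT_READ` is a Unix-only argument to `mmap`; `access=ACCESS_READ` works on both Unix and Windows. A hedged sketch of a portable line reader in the spirit of `read_mmap` (the function name `iter_mmap_lines` is hypothetical, and it has not been benchmarked against the numbers above):

```python
import mmap
import os
import tempfile


def iter_mmap_lines(filename):
    """Yield raw lines (bytes, line endings included) from a read-only map."""
    with open(filename, "rb") as f:
        # Mapping a zero-length file raises ValueError, so skip empty files.
        if os.fstat(f.fileno()).st_size == 0:
            return
        # Length 0 maps the whole file; access=ACCESS_READ works on both
        # Unix and Windows, unlike the Unix-only prot=PROT_READ.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # iter(callable, sentinel) calls m.readline() until it returns b"".
            yield from iter(m.readline, b"")


# Usage on a hypothetical scratch file.
path = os.path.join(tempfile.mkdtemp(), "words.txt")
with open(path, "wb") as f:
    f.write(b"apple\nbanana\ncherry\n")

lines = list(iter_mmap_lines(path))
print(lines)  # [b'apple\n', b'banana\n', b'cherry\n']
```

Because the lines come back as raw bytes, any `\r` stored in the file is still present; on files written on Windows the endings would need normalizing separately.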

Jul 18 '05 #3
Jeff Epler wrote:
Are you using Windows? That would mean the answer is almost certainly
"something to do with carriage returns and binary vs text mode".

I am using Windows (for now) and reading files created on a Linux
machine. I think you are right: it has something to do with mmap and the
\r\n Windows convention. Thank you (very much) for your response... I am
sane again.

Darren
Jul 18 '05 #4
Jeff Epler <je****@unpythonic.net> wrote in message news:<ma**************************************@pyt hon.org>...
Are you using Windows? That would mean the answer is almost certainly
"something to do with carriage returns and binary vs text mode".

I've come across this in C now that I'm forced to work under XP
(thank you, Cygwin!).

Open the file in binary mode ('rb' or 'r+b') and you avoid the entire
issue of newline translation.
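Applied to the copy script from the original question, the binary-mode advice might look like this in Python 3 (a sketch; the files are created in a temporary directory rather than the asker's real data):

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
old_path = os.path.join(tmp, "old.dat")  # names as in the question
new_path = os.path.join(tmp, "new.dat")

with open(old_path, "wb") as f:
    f.write(b"1\n2")

# Binary mode: no newline translation on read or on write, on any platform.
with open(old_path, "rb") as src:
    olddata = src.readlines()  # [b'1\n', b'2']
with open(new_path, "wb") as dst:
    dst.writelines(olddata)

print(os.path.getsize(old_path), os.path.getsize(new_path))  # 3 3
```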
Jul 18 '05 #5

You might want to look at using ilines on your mmap'ed file's data:
<http://members.dsl-only.net/~daniels/ilines.html>
It lets you build a universal-newline line generator on top of a
generator that yields blocks of characters.
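The ilines recipe itself is not reproduced here, and its API is not assumed. As an independent, hypothetical sketch of the same idea, `universal_lines` turns an iterable of byte chunks into lines, treating `\r\n`, `\r` and `\n` all as line endings and normalizing them to `\n`:

```python
import io


def universal_lines(chunks):
    r"""Yield lines from an iterable of bytes chunks.

    b"\r\n", b"\r" and b"\n" are all treated as line endings and
    normalized to b"\n"; a final line without an ending is yielded as-is.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        # A trailing b"\r" may be the first half of a b"\r\n" that is split
        # across two chunks, so hold it back until more data arrives.
        hold_cr = buf.endswith(b"\r")
        if hold_cr:
            buf = buf[:-1]
        buf = buf.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
        *complete, buf = buf.split(b"\n")
        for line in complete:
            yield line + b"\n"
        if hold_cr:
            buf += b"\r"
    # End of input: a held-back b"\r" really was a line ending.
    if buf.endswith(b"\r"):
        yield buf[:-1] + b"\n"
    elif buf:
        yield buf


# Usage: read a hypothetical in-memory file 4 bytes at a time.
stream = io.BytesIO(b"1\r\n2\r3\n")
chunks = iter(lambda: stream.read(4), b"")
result = list(universal_lines(chunks))
print(result)  # [b'1\n', b'2\n', b'3\n']
```

With an mmap object `m`, the chunk source could be `iter(lambda: m.read(65536), b"")`; the chunk size is arbitrary. Holding back a trailing `\r` until the next chunk arrives is what keeps a `\r\n` pair that straddles two chunks from being counted as two line endings.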

--
-Scott David Daniels
Sc***********@Acm.Org
Jul 18 '05 #6
