Problem reading/writing files

This is a bit of a peculiar problem. First off, this relates to Python
Challenge #12, so if you are attempting those and have yet to finish
#12, stop reading now, as there are potential spoilers here.

I have five different image files shuffled up in one big binary file.
In order to view them I have to "unshuffle" the data, which means
moving bytes around. Currently my approach is to read the data from the
original, unshuffle as necessary, and then write to 5 different files
(2 .jpgs, 2 .pngs and 1 .gif).

The problem is with the read() method. If I read a byte valued as 0x00
(in hexadecimal), the read method returns a character with the value
0x20. When printed as strings, these two values look the same (null and
space, respectively), but obviously this screws with the data and makes
the resulting image file unreadable. I can add a simple if statement to
correct this, which seems to make the .jpgs readable, but the .pngs
still have errors and the .gif is corrupted, which makes me wonder
whether the read method is doing this to other bytes as well.

Now, the *really* peculiar thing is that I made a simple little file
and used my hex editor to manually change the first byte to 0x00. When
I read that byte with the read() method, it returned the correct value,
which boggles me.

Anyone have any idea what could be going on? Alternatively, is there a
better way to shift about bytes in a non-text file without using the
read() method (since returning the byte as a string seems to be what's
causing the issue)? Thanks in advance!

Aug 4 '06 #1
have you been using text mode?
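
For readers wondering why the mode matters, here is a minimal sketch of the difference, written for Python 3 rather than the 2.4 used in this thread. The payload is made up, and latin-1 is chosen only because it can decode any byte value. In this sketch the null byte survives in both modes; what text mode changes is the line ending.

```python
import os
import tempfile

# Made-up sample data: a CRLF pair plus a null byte, as might occur in image data.
payload = b"a\r\nb\x00c"

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(payload)

    # Binary mode hands back exactly the bytes on disk.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode decodes the bytes and translates line endings to "\n".
    with open(path, "r", encoding="latin-1") as f:
        text = f.read()
finally:
    os.remove(path)

print(repr(raw))
print(repr(text))
```

The text-mode result is one character shorter because the "\r\n" pair is collapsed, which is enough to corrupt an image file.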

Aug 4 '06 #2
sm*******@hmc.edu wrote:
> This is a bit of a peculiar problem. First off, this relates to Python
> Challenge #12, so if you are attempting those and have yet to finish
> #12, as there are potential spoilers here.
>
> I have five different image files shuffled up in one big binary file.
> In order to view them I have to "unshuffle" the data, which means
> moving bytes around. Currently my approach is to read the data from the
> original, unshuffle as necessary, and then write to 5 different files
> (2 .jpgs, 2 .pngs and 1 .gif).
>
> The problem is with the read() method. If I read a byte valued as 0x00
> (in hexadecimal), the read method returns a character with the value
> 0x20.

No. It doesn't.

Ok, maybe it does, but I doubt this so severely that, without even
checking, I'll bet you a [virtual] beer it doesn't. :-)

Are you opening the file in binary mode?
Ok, I did check, it doesn't.

    >>> s = '\0'
    >>> len(s)
    1
    >>> print s
    \x00
    >>> f = open('noway', 'wb')
    >>> f.write(s)
    >>> f.close()

Checking that the file is a length 1 null byte:

$ hexdump noway
0000000 0000
0000001
$ ls -l noway
-rw-r--r-- 1 sforman sforman 1 2006-08-03 23:40 noway

Now let's read it and see...

    >>> f = open('noway', 'rb')
    >>> s = f.read()
    >>> f.close()
    >>> len(s)
    1
    >>> print s
    \x00

The problem is not with the read() method. Or, if it is, something
very very weird is going on.

If you can do the above and not get the same results I'd be interested
to know what file data you have, what OS you're using.
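
The same round trip can be run as a script over every possible byte value, which also answers the question of whether read() alters bytes other than 0x00. This is only a sketch in Python 3 syntax; the helper name `roundtrip` is made up for illustration.

```python
import os
import tempfile


def roundtrip(data):
    """Write data in binary mode, read it back, and return what was read."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# One of each byte value, 0x00 through 0xff.
all_bytes = bytes(range(256))
result = roundtrip(all_bytes)

# Collect the positions where the byte read differs from the byte written.
changed = [i for i in range(256) if result[i:i + 1] != all_bytes[i:i + 1]]
print(len(result), changed)
```

An empty list of changed positions means binary-mode reads and writes left every value alone.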

Peace,
~Simon

(Think about this: More people than you have tried the challenge. If
this had happened to them they'd have mentioned it too, and it would
have been fixed or at least addressed by now. Maybe.)

(Hmm, or maybe this is *part* of the challenge?)

Aug 4 '06 #3

sm*******@hmc.edu wrote:
> This is a bit of a peculiar problem. First off, this relates to Python
> Challenge #12, so if you are attempting those and have yet to finish
> #12, as there are potential spoilers here.
>
> I have five different image files shuffled up in one big binary file.
> In order to view them I have to "unshuffle" the data, which means
> moving bytes around. Currently my approach is to read the data from the
> original, unshuffle as necessary, and then write to 5 different files
> (2 .jpgs, 2 .pngs and 1 .gif).
>
> The problem is with the read() method. If I read a byte valued as 0x00
> (in hexadecimal), the read method returns a character with the value
> 0x20.

I doubt it. What platform? What version of Python? Have you opened the
file in binary mode i.e. open('thefile', 'rb') ?? Show us the relevant
parts of your code, plus what caused you to conclude that read()
changed data on the fly in an undocumented fashion.

> When printed as strings, these two values look the same (null and
> space, respectively),

Use the repr() function when you want to see what's *really* in an
object:

    >>> hasnul = 'a\x00b'
    >>> hasspace = 'a\x20b'
    >>> print hasnul, hasspace
    a b a b
    >>> print repr(hasnul), repr(hasspace)
    'a\x00b' 'a b'

> but obviously this screws with the data and makes
> the resulting image file unreadable. I can add a simple if statement to
> correct this, which seems to make the .jpgs readable, but the .pngs
> still have errors and the .gif is corrupted, which makes me wonder if
> the read method is not doing this to other bytes as well.
>
> Now, the *really* peculiar thing is that I made a simple little file
> and used my hex editor to manually change the first byte to 0x00. When
> I read that byte with the read() method, it returned the correct value,
> which boggles me.
>
> Anyone have any idea what could be going on? Alternatively, is there a
> better way to shift about bytes in a non-text file without using the
> read() method (since returning the byte as a string seems to be what's
> causing the issue)?

"seems to be" != "is" :-)

There is no simple better way. We need to establish what you are
actually doing to cause this problem to seem to happen. Kindly answer
the questions above ;-)
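
Building on the repr() tip above, a small hex listing can make a null and a space impossible to confuse when comparing against a hex editor. This is a sketch in Python 3 syntax, where indexing bytes yields integers; `hex_dump` is a made-up helper, not a standard library function.

```python
def hex_dump(data, start=0, width=8):
    """Return 'offset: hex bytes' lines for a bytes object."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        # Each item of a bytes object is an int in Python 3.
        hex_part = " ".join("%02x" % b for b in chunk)
        lines.append("%08x: %s" % (start + i, hex_part))
    return lines


has_nul = b"a\x00b"
has_space = b"a\x20b"

# repr() already separates the two cases...
print(repr(has_nul), repr(has_space))

# ...and the hex listing shows 00 versus 20 explicitly.
for line in hex_dump(has_nul + has_space):
    print(line)
```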

Cheers,
John

Aug 4 '06 #4
> What platform? What version of Python? Have you opened the
> file in binary mode i.e. open('thefile', 'rb') ?? Show us the relevant
> parts of your code, plus what caused you to conclude that read()
> changed data on the fly in an undocumented fashion.

Yes, I've been reading and writing everything in binary mode. I'm using
version 2.4 on a Windows XP machine.

Here is the code that I have been using to split up the original file:

# Open the shuffled source and the five output images, all in binary mode.
f = open('evil2.gfx','rb')
i1 = open('img1.jpg','wb')
i2 = open('img2.png','wb')
i3 = open('img3.gif','wb')
i4 = open('img4.png','wb')
i5 = open('img5.jpg','wb')
# Deal the bytes out round-robin: one byte to each image per pass.
for i in range(0,67575,5):
    i1.write(f.read(1))
    i2.write(f.read(1))
    i3.write(f.read(1))
    i4.write(f.read(1))
    i5.write(f.read(1))

f.close()
i1.close()
i2.close()
i3.close()
i4.close()
i5.close()
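
As an aside, the same split can be sketched with slicing instead of byte-at-a-time reads. This is an alternative in Python 3 syntax, not the code used in this thread; `deinterleave` and `split_file` are made-up names. Unlike the loop above, it processes the whole file rather than a fixed 67575 bytes.

```python
def deinterleave(data, n):
    """Split data into n streams; stream k receives bytes k, k+n, k+2n, ..."""
    return [data[k::n] for k in range(n)]


def split_file(src_path, dst_paths):
    """Read src_path once in binary mode and write one stream per destination."""
    with open(src_path, "rb") as src:
        data = src.read()
    for path, stream in zip(dst_paths, deinterleave(data, len(dst_paths))):
        with open(path, "wb") as dst:
            dst.write(stream)


# Tiny made-up input standing in for the real file.
sample = bytes(range(10))
streams = deinterleave(sample, 5)
print(streams[0], streams[4])
```

With the file names from the post, the call would look like split_file('evil2.gfx', ['img1.jpg', 'img2.png', 'img3.gif', 'img4.png', 'img5.jpg']).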

I first noticed the problem by looking at the original file and
img1.jpg side by side with a hex editor. Since img1 contains every 5th
byte from the original file, I was able to find many places where \x00
should have been copied to img1.jpg, but instead a \x20 was copied.
What caused me to suspect the read method was the following:
    >>> f = open('evil2.gfx','rb')
    >>> s = f.read()
    >>> print repr(s[19:22])
    '\xe0 \r'

Now, I have checked many times with a hex editor that the 21st byte of
the file is \x00, yet above you can see that it is reading it as a
space. I've repeated this with several different nulls in the original
file and the result is always the same.

As I said in my original post, when I try simply writing a null to my
own file and reading it (as someone mentioned earlier) everything is
fine. It seems to be only this file which is causing issue.
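
For checking a single position against a hex editor, a sketch like the following reads one byte at a given offset. It assumes Python 3, where indexing bytes gives an integer, and `byte_at` is a made-up helper. The 21st byte of a file is at 0-based offset 20, and the demo data below is invented to mirror the three bytes shown above.

```python
import os
import tempfile


def byte_at(path, offset):
    """Return the integer value of the byte at a 0-based offset, or None past EOF."""
    with open(path, "rb") as f:
        f.seek(offset)
        b = f.read(1)
    return b[0] if b else None


# Invented demo file: 19 filler bytes, then 0xe0, 0x00, 0x0d at offsets 19 to 21.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
try:
    with open(demo_path, "wb") as f:
        f.write(b"\xff" * 19 + b"\xe0\x00\r")
    value = byte_at(demo_path, 20)
    past_end = byte_at(demo_path, 100)
finally:
    os.remove(demo_path)

print(value, past_end)
```

If this reports 0 for an offset where a whole-file read shows a space, the two reads are probably not looking at the same file.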

Aug 4 '06 #5
Ok, now I'm very confused, even though I just solved my problem. I
copied the entire contents of the original file (evil2.gfx) from my hex
editor and pasted it into a text file. When I read from *this* file
using my original code, everything worked fine. When I read the 21st
byte, it came up as the correct \x00. Why this didn't work in trying to
read from the original file, I don't know, since the hex values should
be the same, but oh well...

Aug 4 '06 #6
Very weird. I tried your code on my system (Python 2.4, Windows XP) (but
using a copy of evil2.gfx I still had on my system), with no problems.

Are you sure that you don't have 2 copies of that file around, and that
your program is using the wrong one? Or is it possible that some module
imported with 'from blabla import *' clashes with the builtin open()?
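
Both possibilities can be checked from inside the session that misbehaves. The following is only a sketch in Python 3 syntax (in Python 2.4 the module is `__builtin__`, not `builtins`), and the helper names are made up. A relative path such as 'evil2.gfx' resolves against the current working directory, which can differ between an IDE and a script run from the command line.

```python
import builtins
import hashlib
import os


def describe_file(path):
    """Return the absolute path, size, and SHA-256 digest of the file a path resolves to."""
    full = os.path.abspath(path)
    # Use builtins.open explicitly so a shadowed open() cannot interfere.
    with builtins.open(full, "rb") as f:
        data = f.read()
    return full, len(data), hashlib.sha256(data).hexdigest()


def open_is_builtin(namespace):
    """True if 'open' in the namespace is absent or is the builtin open()."""
    return namespace.get("open", builtins.open) is builtins.open


# Where relative file names will be looked up in this session.
print(os.getcwd())

# Whether this module's open() is still the builtin one.
print(open_is_builtin(globals()))
```

Running describe_file('evil2.gfx') in both environments and comparing the results would show whether they see the same file.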

--
If I have been able to see further, it was only because I stood
on the shoulders of giants. -- Isaac Newton

Roel Schroeven
Aug 4 '06 #7
Well, now I tried running the script and it worked fine with the .gfx
file. Originally I was working in IDLE, which I wouldn't have
thought would make a difference, but when I ran the script on its own
it worked fine, and when I ran it in IDLE it didn't work unless the
data was in a text file. Weird.

Aug 4 '06 #8
Weird indeed: I ran the script under IDLE too...
--
If I have been able to see further, it was only because I stood
on the shoulders of giants. -- Isaac Newton

Roel Schroeven
Aug 4 '06 #9
