how to do reading of binary files?

Hi all,

I need some help on the following issue. I can't seem to solve it.

I have a binary (PCL) file.
In this file I want to search for specific codes (like <0C>). I have
tried to solve it by reading the file character by character, but this
is very slow. Especially with large files (>10MB), this takes quite
some time.
Does anyone have a hint/clue/solution for this?

thanks already!

Jeroen

Jun 8 '07 #1
jvdb wrote:
> Hi all,
>
> I need some help on the following issue. I can't seem to solve it.
>
> I have a binary (PCL) file.
> In this file I want to search for specific codes (like <0C>). I have
> tried to solve it by reading the file character by character, but this
> is very slow. Especially with large files (>10MB), this takes quite
> some time.
> Does anyone have a hint/clue/solution for this?

What does the searching have to do with the reading? 10MB easily fits into the
main memory of a decent PC, so just do

contents = open("file").read() # yes I know I should close the file...

print contents.find('\x0c')

Diez
Jun 8 '07 #2
On 8 Jun, 14:07, "Diez B. Roggisch" <d...@nospam.web.de> wrote:
> jvdb wrote:
> .......
> What does the searching have to do with the reading? 10MB easily fits into the
> main memory of a decent PC, so just do
>
> contents = open("file").read() # yes I know I should close the file...
>
> print contents.find('\x0c')
>
> Diez

True. But there is another issue attached to the one I wrote about.
Once I know how often this code occurs, I know the number of pages in the
file. After that I would like to be able to extract a given amount of
data:
file x contains 20 <0C>. Then, for example, I would like to extract
instance 5 through instance 12 from the file.
The reason I want to do this: the 0C stands for a page break in the PCL
language. This way I would be able to extract a certain number of
pages from the file.

Jun 8 '07 #3
jvdb wrote:
> On 8 Jun, 14:07, "Diez B. Roggisch" <d...@nospam.web.de> wrote:
>> jvdb wrote:
>> ......
>> What does the searching have to do with the reading? 10MB easily fits into the
>> main memory of a decent PC, so just do
>>
>> contents = open("file").read() # yes I know I should close the file...
>>
>> print contents.find('\x0c')
>>
>> Diez
>
> True. But there is another issue attached to the one I wrote about.
> Once I know how often this code occurs, I know the number of pages in the
> file. After that I would like to be able to extract a given amount of
> data:
> file x contains 20 <0C>. Then, for example, I would like to extract
> instance 5 through instance 12 from the file.
> The reason I want to do this: the 0C stands for a page break in the PCL
> language. This way I would be able to extract a certain number of
> pages from the file.

And? Finding the respective indices by using

last_needle_position = 0
positions = []
while last_needle_position != -1:
    last_needle_position = contents.find(needle, last_needle_position+1)
    if last_needle_position != -1:
        positions.append(last_needle_position)

will find all the page breaks. Then just slice contents appropriately.
Did you read the Python tutorial?

diez
Jun 8 '07 #4
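
For readers who want to try the "find the indices, then slice" idea from the post above, here is a minimal sketch. It is written for Python 3 (the thread itself uses Python 2), and the names find_all and extract_pages are hypothetical helpers, not anything from the thread or the standard library. It also assumes every 0x0C byte really is a page break, which later replies point out is not guaranteed. Unlike the loop above, the search starts at offset 0, so a needle in the very first byte is also reported.

```python
def find_all(data, needle):
    """Return the offsets of every occurrence of needle in data (both bytes)."""
    positions = []
    pos = data.find(needle)          # start at offset 0 so a leading match counts
    while pos != -1:
        positions.append(pos)
        pos = data.find(needle, pos + 1)
    return positions


def extract_pages(data, first, last, needle=b"\x0c"):
    """Return pages first..last (1-based, inclusive).

    Assumption: a "page" is everything up to and including a form feed.
    """
    breaks = find_all(data, needle)
    if not 1 <= first <= last <= len(breaks):
        raise ValueError("page range out of bounds")
    # Page 1 starts at offset 0; page n starts right after break n-1.
    start = 0 if first == 1 else breaks[first - 2] + 1
    end = breaks[last - 1] + 1       # include the closing form feed
    return data[start:end]
```

With this, the example from the question (instances 5 through 12) would be extract_pages(contents, 5, 12), where contents holds the file read in binary mode.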
In <5c*************@mid.uni-berlin.de>, Diez B. Roggisch wrote:
> jvdb wrote:
>> True. But there is another issue attached to the one I wrote about.
>> Once I know how often this code occurs, I know the number of pages in the
>> file. After that I would like to be able to extract a given amount of
>> data:
>> file x contains 20 <0C>. Then, for example, I would like to extract
>> instance 5 through instance 12 from the file.
>> The reason I want to do this: the 0C stands for a page break in the PCL
>> language. This way I would be able to extract a certain number of
>> pages from the file.
>
> And? Finding the respective indices by using
>
> last_needle_position = 0
> positions = []
> while last_needle_position != -1:
>     last_needle_position = contents.find(needle, last_needle_position+1)
>     if last_needle_position != -1:
>         positions.append(last_needle_position)
>
> will find all the page breaks. Then just slice contents appropriately.
> Did you read the Python tutorial?

Maybe splitting at '\x0c', selecting/slicing the wanted pages and joining
them again is enough, depending on the size of the files and the available
memory, of course.

One problem I see is that '\x0c' may not always be the page end. It may
occur in "rastered image" data too, I guess.

Ciao,
Marc 'BlackJack' Rintsch
Jun 8 '07 #5
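
The split/select/join suggestion above could look roughly like the sketch below. It is Python 3, the function name extract_pages_by_split is made up for illustration, and it deliberately ignores the caveat just raised: a 0x0C byte inside raster image data would be treated as a page break too.

```python
FORM_FEED = b"\x0c"


def extract_pages_by_split(data, first, last):
    """Return pages first..last (1-based, inclusive) by splitting at form feeds."""
    # split() drops the separators; whatever follows the final form feed
    # (often nothing) ends up as the last list element.
    pages = data.split(FORM_FEED)
    selected = pages[first - 1:last]
    # Re-insert the separators and terminate the last selected page.
    return FORM_FEED.join(selected) + FORM_FEED
```

This keeps the whole file plus the list of pieces in memory at once, which is fine for files of the size mentioned in the thread but is the reason for the "depending on the size of the files" remark.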
On 8 Jun, 15:19, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> In <5ct0evF31n73...@mid.uni-berlin.de>, Diez B. Roggisch wrote:
>> jvdb wrote:
>>> True. But there is another issue attached to the one I wrote about.
>>> Once I know how often this code occurs, I know the number of pages in the
>>> file. After that I would like to be able to extract a given amount of
>>> data:
>>> file x contains 20 <0C>. Then, for example, I would like to extract
>>> instance 5 through instance 12 from the file.
>>> The reason I want to do this: the 0C stands for a page break in the PCL
>>> language. This way I would be able to extract a certain number of
>>> pages from the file.
>>
>> And? Finding the respective indices by using
>>
>> last_needle_position = 0
>> positions = []
>> while last_needle_position != -1:
>>     last_needle_position = contents.find(needle, last_needle_position+1)
>>     if last_needle_position != -1:
>>         positions.append(last_needle_position)
>>
>> will find all the page breaks. Then just slice contents appropriately.
>> Did you read the Python tutorial?
>
> Maybe splitting at '\x0c', selecting/slicing the wanted pages and joining
> them again is enough, depending on the size of the files and the available
> memory, of course.
>
> One problem I see is that '\x0c' may not always be the page end. It may
> occur in "rastered image" data too, I guess.
>
> Ciao,
> Marc 'BlackJack' Rintsch

Hi,

Your last comment is also something I have noticed. There are a number
of occasions where this happens, and I will have to deal with that as well.
I will dive into this on Monday, after this hot weekend.

cheers,
Jeroen

Jun 8 '07 #6
On 2007-06-08, jvdb <st***********@gmail.com> wrote:
> I have a binary (PCL) file.
> In this file I want to search for specific codes (like <0C>). I have
> tried to solve it by reading the file character by character, but this
> is very slow. Especially with large files (>10MB), this takes quite
> some time.
> Does anyone have a hint/clue/solution for this?

I'd memmap the file.

http://docs.python.org/lib/module-mmap.html

If you prefer it to appear as an array of bytes instead of a
string, the various numeric/array packages can do that.

Numarray: http://stsdas.stsci.edu/numarray/num...ay.memmap.html
Vmaps: http://snafu.freedom.org/Vmaps/Vmaps.html
Numpy: <documentation is not free>

Since I can't point you to Numpy docs, here's a link to a
newsgroup thread with an example for numpy:

http://groups.google.com/group/comp....36baa98386d5e7

--
Grant Edwards (grante at visi.com)
Yow! I like your SNOOPY POSTER!!
Jun 8 '07 #7
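
As a rough illustration of the memory-mapping suggestion above, the following Python 3 sketch counts form feeds with the standard mmap module, so the operating system pages the file in on demand instead of the script reading it all up front. The function name count_form_feeds is hypothetical, and the count again assumes that every 0x0C byte is a page break.

```python
import mmap


def count_form_feeds(path):
    """Count 0x0C bytes in the file at path using a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. Mapping an empty file raises
        # ValueError, so callers may want to handle that case separately.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(b"\x0c")
            while pos != -1:
                count += 1
                pos = mm.find(b"\x0c", pos + 1)
            return count
```

The mapped object supports find() and slicing much like a bytes string, so the index-and-slice approach from earlier in the thread carries over with little change.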
On Jun 8, 2:07 am, "Diez B. Roggisch" <d...@nospam.web.de> wrote:
> ...
>
> What does the searching have to do with the reading? 10MB easily fits into the
> main memory of a decent PC, so just do
>
> contents = open("file").read() # yes I know I should close the file...
>
> print contents.find('\x0c')
>
> Diez

Better make that open("file", "rb").

Jun 8 '07 #8
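
Putting the two remarks together (binary mode, and closing the file), a Python 3 version of the one-liner might look like this sketch. The helper name first_form_feed is invented for the example. Binary mode matters because text mode may translate line endings on some platforms, and in Python 3 it would also try to decode the bytes as text.

```python
def first_form_feed(path):
    """Return the offset of the first 0x0C byte in the file, or -1 if absent."""
    # "rb" reads the raw bytes unchanged; the with block closes the file.
    with open(path, "rb") as f:
        contents = f.read()
    return contents.find(b"\x0c")
```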
