how to do reading of binary files?

jvdb

Hi all,

I need some help on the following issue. I can't seem to solve it.

I have a binary (PCL) file.
In this file I want to search for specific codes (like <0C>). I have
tried to solve it by reading the file character by character, but this
is very slow. Especially with large files (>10MB), it takes
quite some time.
Does anyone have a hint/clue/solution for this?

thanks already!

Jeroen

Jun 8 '07 #1


Diez B. Roggisch

What has the searching to do with the reading? 10MB easily fits into the
main memory of a decent PC, so just do

    contents = open("file").read()  # yes I know I should close the file...
    print(contents.find('\x0c'))

Diez

Jun 8 '07 #2
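
For readers on a current interpreter, the read-everything-then-search idea from the post above might look like the sketch below. It assumes Python 3, where data read in binary mode is a bytes object and the needle has to be written as b"\x0c". The helper names find_first and count_occurrences are made up for this illustration and are not from the thread.

```python
def find_first(path, needle=b"\x0c"):
    """Return the offset of the first needle in the file, or -1 if absent."""
    # Binary mode keeps every byte as-is; the with block closes the file.
    with open(path, "rb") as f:
        contents = f.read()
    return contents.find(needle)


def count_occurrences(path, needle=b"\x0c"):
    """Count non-overlapping occurrences of needle in the file."""
    with open(path, "rb") as f:
        return f.read().count(needle)
```

Both helpers load the whole file into memory, which is the point of the suggestion: for files of a few tens of megabytes a single read plus a search is usually far faster than a byte-by-byte loop.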

jvdb

True. But there is another issue attached to the one I wrote about.
Once I know how often this code occurs, I know the number of pages in the
file. After that I would like to be able to extract a given range of
data:
say file x contains 20 <0C>; then, for example, I would like to extract
instance 5 to instance 12 from the file.
The reason I want to do this: 0C stands for a page break in the PCL
language. This way I would be able to extract a certain range of
pages from the file.

Jun 8 '07 #3

Diez B. Roggisch

And? Finding the respective indices by using

    needle = '\x0c'
    last_needle_position = -1  # start before index 0 so a match there counts
    positions = []
    while True:
        last_needle_position = contents.find(needle, last_needle_position + 1)
        if last_needle_position == -1:
            break
        positions.append(last_needle_position)

will find all the page breaks. Then just slice contents appropriately.
Did you read the Python tutorial?

diez

Jun 8 '07 #4
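
The "find the indices, then slice" advice above could be sketched as follows. This assumes Python 3 bytes data and assumes that every form feed really ends a page, which later posts in the thread question. The names find_all and extract_pages, and the 1-based page numbering, are choices made for this example only.

```python
def find_all(data, needle=b"\x0c"):
    """Return the offsets of every non-overlapping needle in data."""
    positions = []
    pos = data.find(needle)
    while pos != -1:
        positions.append(pos)
        pos = data.find(needle, pos + len(needle))
    return positions


def extract_pages(data, first, last, needle=b"\x0c"):
    """Return pages first..last (1-based, inclusive), each with its form feed.

    Assumes every needle ends a page; bytes after the last needle are ignored.
    """
    breaks = find_all(data, needle)
    if not 1 <= first <= last <= len(breaks):
        raise ValueError("page range outside 1..%d" % len(breaks))
    # Page n starts right after break n-1 (or at offset 0 for the first page).
    start = 0 if first == 1 else breaks[first - 2] + len(needle)
    end = breaks[last - 1] + len(needle)
    return data[start:end]
```

With the example from the question, extract_pages(contents, 5, 12) would return the bytes from just after the fourth form feed up to and including the twelfth.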

Marc 'BlackJack' Rintsch

Maybe splitting at '\x0c', selecting/slicing the wanted pages and joining
them again is enough, depending on the size of the files and the available
memory, of course.

One problem I see is that '\x0c' may not always be the page end. It may
occur in "rastered image" data too, I guess.

Ciao,
Marc 'BlackJack' Rintsch

Jun 8 '07 #5
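
The split/select/join alternative mentioned above could be sketched like this. It again assumes Python 3 bytes data, and it shares the caveat from the post: a 0x0C byte inside raster data would be treated as a page break by mistake. The function name select_pages and its 1-based numbering are hypothetical.

```python
def select_pages(data, first, last, sep=b"\x0c"):
    """Return pages first..last (1-based, inclusive) via split and join."""
    pages = data.split(sep)
    if pages and pages[-1] == b"":
        pages.pop()  # data ended with a separator, so drop the empty tail
    wanted = pages[first - 1:last]
    if not wanted:
        return b""
    # Re-join and terminate the last selected page with a separator again.
    return sep.join(wanted) + sep
```

Compared with collecting offsets, this makes a second copy of the data as a list of pages, so it trades memory for brevity.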

jvdb

Hi,

Your last comment is also something I have noticed. There are a number
of occasions where this happens, so I have to deal with it as well.
I will dive into this on Monday, after this hot weekend.

cheers,
Jeroen

Jun 8 '07 #6

Grant Edwards

On 2007-06-08, jvdb <st***********@gmail.com> wrote:

> I have a binary (pcl) file.
> In this file i want to search for specific codes (like <0C>). I have
> tried to solve it by reading the file character by character, but this
> is very slow. Especially when it comes to files which are large
> (>10MB) this is consuming quite some time.
> Does anyone has a hint/clue/solution on this?

I'd memmap the file.

http://docs.python.org/lib/module-mmap.html

If you prefer it to appear as an array of bytes instead of a
string, the various numeric/array packages can do that.

Numarray: http://stsdas.stsci.edu/numarray/num...ay.memmap.html
Vmaps: http://snafu.freedom.org/Vmaps/Vmaps.html
Numpy: <documentation is not free>

Since I can't point you to Numpy docs, here's a link to a
newsgroup thread with an example for numpy:

http://groups.google.com/group/comp....36baa98386d5e7

--
Grant Edwards (grante at visi.com)
Yow! I like your SNOOPY POSTER!!

Jun 8 '07 #7
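
A minimal sketch of the memory-mapping suggestion above, using only the standard mmap module under Python 3. The function name find_all_mmap is invented for this example. Note that mmap refuses to map an empty file, which the sketch does not handle.

```python
import mmap


def find_all_mmap(path, needle=b"\x0c"):
    """Return offsets of every needle, searching through a memory map."""
    positions = []
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps it read-only.
        # mmap raises ValueError if the file is empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1:
                positions.append(pos)
                pos = mm.find(needle, pos + len(needle))
    return positions
```

The mapped object supports find and slicing much like a bytes object, but the operating system pages the file in on demand, so the whole file does not have to be copied into a Python object first.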

Roger Miller

On Jun 8, 2:07 am, "Diez B. Roggisch" <d...@nospam.web.de> wrote:

> ...
>
> What has the searching to do with the reading? 10MB easily fit into the
> main memory of a decent PC, so just do
>
>     contents = open("file").read()  # yes I know I should close the file...
>     print(contents.find('\x0c'))
>
> Diez

Better make that open("file", "rb").

Jun 8 '07 #8
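
To illustrate why the "rb" matters, here is a small sketch that writes a few bytes to a temporary file and reads them back in both modes. It assumes Python 3, where text mode decodes the bytes and translates line endings on read; the function name read_both_ways and the latin-1 encoding are choices made for this example.

```python
import os
import tempfile


def read_both_ways(payload):
    """Write payload to a temp file; return it read in binary and text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            as_bytes = f.read()  # exactly the bytes on disk
        # Text mode decodes and turns "\r\n" into "\n", so offsets shift.
        with open(path, "r", encoding="latin-1") as f:
            as_text = f.read()
    finally:
        os.remove(path)
    return as_bytes, as_text
```

For a payload such as b"a\r\nb\x0c", the binary read returns the data unchanged, while the text read comes back one character shorter, so any offsets found in it no longer match the file.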
