Hi all,
I need some help on the following issue. I can't seem to solve it.
I have a binary (pcl) file.
In this file I want to search for specific codes (like <0C>). I have
tried to solve it by reading the file character by character, but this
is very slow. Especially with large files
(>10MB) this consumes quite some time.
Does anyone have a hint/clue/solution for this?
Thanks already!
Jeroen
jvdb wrote:
> I have tried to solve it by reading the file character by character, but this
> is very slow.
What has the searching to do with the reading? 10MB easily fit into the
main memory of a decent PC, so just do
    contents = open("file").read() # yes I know I should close the file...
    print(contents.find('\x0c'))
Diez
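A minimal Python 3 sketch of the read-everything-then-search approach suggested above. The function name `first_form_feed` is made up for illustration; it assumes the file fits in memory and opens it in binary mode so the bytes arrive unmodified.

```python
def first_form_feed(path):
    """Return the byte offset of the first form feed (0x0C), or -1 if none."""
    # Binary mode returns bytes, so the needle must be bytes as well.
    with open(path, "rb") as f:
        contents = f.read()
    return contents.find(b"\x0c")
```

The `with` block closes the file as soon as the read finishes, and `bytes.find` does the scanning in C instead of in a Python-level loop over single characters.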
On 8 Jun, 14:07, "Diez B. Roggisch" <d...@nospam.web.de> wrote:
> What has the searching to do with the reading? 10MB easily fit into the
> main memory of a decent PC, so just do
> contents = open("file").read() # yes I know I should close the file...
True. But there is another issue attached to the one I wrote about.
Once I know how often this code occurs, I know the number of pages in the
file. After that I would like to be able to extract a given range of
pages.
File x contains 20 <0C> codes. Then, for example, I would like to extract
instances 5 to 12 from the file.
The reason I want to do this: 0C stands for a page break in the PCL
language. This way I would be able to extract a certain number of
pages from the file.
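The page count described here could be sketched as follows. This assumes the file contents are already in memory as `bytes` and that every 0x0C byte really is a page break; the names `FORM_FEED` and `count_form_feeds` are illustrative only.

```python
FORM_FEED = b"\x0c"  # the <0C> code, i.e. the ASCII form feed character

def count_form_feeds(data):
    """Count how many form feed bytes occur in data."""
    return data.count(FORM_FEED)
```

For a file with 20 <0C> codes this would return 20, which gives the upper bound for a page range such as 5 to 12.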
jvdb wrote:
> File x contains 20 <0C> codes. Then, for example, I would like to extract
> instances 5 to 12 from the file.
And? Finding the respective indices by using

    last_needle_position = -1  # start at -1 so a needle at index 0 is found too
    positions = []
    while True:
        last_needle_position = contents.find(needle, last_needle_position + 1)
        if last_needle_position == -1:
            break
        positions.append(last_needle_position)

will find all the page breaks. Then just slice contents appropriately.
Did you read the Python tutorial?
Diez
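The "find the indices, then slice" idea might look like this in Python 3. It is only a sketch: `find_all` and `extract_pages` are hypothetical helper names, pages are numbered from 1, and each page is assumed to end with its own form feed.

```python
def find_all(contents, needle):
    """Return the offsets of every occurrence of needle in contents."""
    positions = []
    pos = contents.find(needle)
    while pos != -1:
        positions.append(pos)
        pos = contents.find(needle, pos + 1)
    return positions

def extract_pages(contents, first, last, needle=b"\x0c"):
    """Return pages first..last (1-based, inclusive), form feeds included."""
    breaks = find_all(contents, needle)
    # Page n starts right after the form feed that closes page n - 1.
    start = 0 if first == 1 else breaks[first - 2] + 1
    end = breaks[last - 1] + 1
    return contents[start:end]
```

With this, `extract_pages(contents, 5, 12)` would return the bytes of pages 5 through 12. Note that any job setup commands that precede the first page are not carried over when `first` is greater than 1, so a real PCL job would probably need those prepended separately.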
In <5c*************@mid.uni-berlin.de>, Diez B. Roggisch wrote:
> will find all the page breaks. Then just slice contents appropriately.
Maybe splitting at '\x0c', selecting/slicing the wanted pages and joining
them again is enough, depending on the size of the files and the available
memory, of course.
One problem I see is that '\x0c' may not always be the page end. It may
occur in "rastered image" data too, I guess.
Ciao,
Marc 'BlackJack' Rintsch
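The split, slice and join variant could be sketched like this. The function name `extract_pages_by_split` is made up, pages are numbered from 1, and the sketch carries the same caveat as above: it treats every 0x0C byte as a page break, which may be wrong inside image data.

```python
def extract_pages_by_split(data, first, last, sep=b"\x0c"):
    """Return pages first..last (1-based, inclusive) using split and join."""
    # split() yields one chunk per page, plus whatever follows the last form feed.
    pages = data.split(sep)
    wanted = pages[first - 1:last]
    # join() puts form feeds between pages; append one to close the final page.
    return sep.join(wanted) + sep
```

This keeps roughly two copies of the file in memory at once (the original bytes and the list of chunks), which is usually acceptable for files in the 10MB range.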
On 8 Jun, 15:19, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> One problem I see is that '\x0c' may not always be the page end. It may
> occur in "rastered image" data too, I guess.
Hi,
Your last comment is also something I have noticed. There are a number
of occasions where this will happen, so I also have to deal with it.
I will dive into this on Monday, after this hot weekend.
Cheers,
Jeroen
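One possible way to deal with form feeds inside raster data, sketched under loud assumptions: it assumes the raster rows are sent with the PCL 5 "transfer raster data" command, written as ESC * b # W (or V), where # is the number of binary bytes that follow. The regular expression and the name `page_break_positions` are illustrative, and other commands that carry binary payloads (font downloads, for example) are not handled at all.

```python
import re

FORM_FEED = b"\x0c"
# ASSUMPTION: raster payloads are introduced by ESC * b <count> W or V,
# optionally preceded by chained lowercase-terminated parameters.
RASTER = re.compile(rb"\x1b\*b(?:[-+.\d]*[a-z])*(\d+)[VW]")

def page_break_positions(data):
    """Return offsets of form feeds that are not inside a raster payload."""
    positions = []
    pos = 0
    while True:
        ff = data.find(FORM_FEED, pos)
        if ff == -1:
            return positions
        # Is there a raster command between pos and this form feed?
        m = RASTER.search(data, pos, ff)
        if m:
            # Jump over the command and its binary payload, then look again.
            pos = m.end() + int(m.group(1))
        else:
            positions.append(ff)
            pos = ff + 1
```

The offsets returned here could replace a plain search for '\x0c' when slicing pages. Whether this covers a given set of files would have to be checked against the PCL that the printer driver actually produces.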
On Jun 8, 2:07 am, "Diez B. Roggisch" <d...@nospam.web.de> wrote:
> contents = open("file").read() # yes I know I should close the file...
Better make that open("file", "rb").
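A small Python 3 sketch of why the "rb" matters. The helper `read_both_ways` is hypothetical; it writes some bytes to a temporary file and reads them back once in binary mode and once in text mode (decoded as latin-1, chosen here only so that every byte decodes).

```python
import os
import tempfile

def read_both_ways(payload):
    """Write payload to a temp file; return (binary read, text-mode read)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            raw = f.read()
        # Default text mode translates "\r\n" to "\n" while reading.
        with open(path, encoding="latin-1") as f:
            text = f.read()
    finally:
        os.remove(path)
    return raw, text

raw, text = read_both_ways(b"page 1\r\n\x0cpage 2\r\n\x0c")
print(raw.find(b"\x0c"), text.find("\x0c"))  # offsets differ: 8 vs 7
```

In text mode the line endings are rewritten, so offsets no longer match the bytes on disk and the data written back out would not be identical to the input. For a binary format such as PCL, only the "rb" result is safe to slice.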