Hi all,
I need some help with the following issue, which I can't seem to solve.
I have a binary (PCL) file.
In this file I want to search for specific codes (like <0C>). I have
tried to solve it by reading the file character by character, but this
is very slow. Especially for large files (>10 MB) this takes quite
some time.
Does anyone have a hint/clue/solution for this?
Thanks already!
Jeroen
jvdb wrote:
> Hi all,
> I need some help with the following issue, which I can't seem to solve.
> I have a binary (PCL) file.
> In this file I want to search for specific codes (like <0C>). I have
> tried to solve it by reading the file character by character, but this
> is very slow. Especially for large files (>10 MB) this takes quite
> some time.
> Does anyone have a hint/clue/solution for this?
What does the searching have to do with the reading? 10 MB easily fits into the
main memory of a decent PC, so just do

    contents = open("file").read() # yes I know I should close the file...
    print(contents.find('\x0c'))

Diez
On 8 jun, 14:07, "Diez B. Roggisch" <d...@nospam.web.de> wrote:
> jvdb wrote:
> > .......
> What does the searching have to do with the reading? 10 MB easily fits into the
> main memory of a decent PC, so just do
>
>     contents = open("file").read() # yes I know I should close the file...
>     print(contents.find('\x0c'))
>
> Diez
True. But there is another issue attached to the one I described.
Once I know how often this code occurs, I know the number of pages in the
file. After that I would like to be able to extract a given range of
data:
file x contains 20 <0C>. Then, for example, I would like to extract from
instance 5 to instance 12 from the file.
The reason I want to do this: 0C stands for a page break in the PCL
language. This way I would be able to extract a certain range of
pages from the file.
jvdb wrote:
> True. But there is another issue attached to the one I described.
> Once I know how often this code occurs, I know the number of pages in the
> file. After that I would like to be able to extract a given range of
> data:
> file x contains 20 <0C>. Then, for example, I would like to extract from
> instance 5 to instance 12 from the file.
> The reason I want to do this: 0C stands for a page break in the PCL
> language. This way I would be able to extract a certain range of
> pages from the file.
And? Finding the respective indices by using

    needle = '\x0c'
    positions = []
    # Start from the first match so a needle at offset 0 is not skipped.
    last_needle_position = contents.find(needle)
    while last_needle_position != -1:
        positions.append(last_needle_position)
        last_needle_position = contents.find(needle, last_needle_position + 1)

will find all the page breaks. Then just slice contents appropriately.
Did you read the Python tutorial?
Diez
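The slicing step suggested above could look roughly like the following. This is only a minimal sketch: it assumes the file contents are already in memory as bytes, that every form feed really is a page break, and that pages are counted from 1. The helper names `find_all` and `extract_pages` are made up for illustration.

```python
def find_all(contents, needle=b"\x0c"):
    """Return the offsets of every occurrence of needle in contents."""
    positions = []
    position = contents.find(needle)
    while position != -1:
        positions.append(position)
        position = contents.find(needle, position + 1)
    return positions


def extract_pages(contents, first, last, needle=b"\x0c"):
    """Return pages first..last (1-based, inclusive), each ending with its form feed."""
    breaks = find_all(contents, needle)
    if not 1 <= first <= last <= len(breaks):
        raise ValueError("page range outside 1..%d" % len(breaks))
    # Page n starts right after break n-1; page 1 starts at offset 0.
    start = 0 if first == 1 else breaks[first - 2] + 1
    # Include the form feed that terminates the last requested page.
    end = breaks[last - 1] + 1
    return contents[start:end]


sample = b"one\x0ctwo\x0cthree\x0cfour\x0c"
print(extract_pages(sample, 2, 3))  # b'two\x0cthree\x0c'
```

For the example in the thread, `extract_pages(contents, 5, 12)` would return the bytes from just after the fourth form feed up to and including the twelfth. Any setup commands at the start of the file are not copied by this sketch and may need to be prepended separately.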
In <5c*************@mid.uni-berlin.de>, Diez B. Roggisch wrote:
> And? Finding the respective indices by using
>
>     needle = '\x0c'
>     positions = []
>     # Start from the first match so a needle at offset 0 is not skipped.
>     last_needle_position = contents.find(needle)
>     while last_needle_position != -1:
>         positions.append(last_needle_position)
>         last_needle_position = contents.find(needle, last_needle_position + 1)
>
> will find all the page breaks. Then just slice contents appropriately.
> Did you read the Python tutorial?
Maybe splitting at '\x0c', selecting/slicing the wanted pages and joining
them again is enough, depending on the size of the files and the available
memory, of course.
One problem I see is that '\x0c' may not always be a page end. It may
also occur in raster image data, I guess.
Ciao,
Marc 'BlackJack' Rintsch
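The split/select/join idea could be sketched as follows. It assumes the contents fit in memory as bytes and ignores the raster data caveat; `select_pages` is a hypothetical helper name, and pages are counted from 1.

```python
def select_pages(contents, first, last, separator=b"\x0c"):
    """Return pages first..last (1-based, inclusive), joined by the separator."""
    pages = contents.split(separator)
    selected = pages[first - 1:last]
    return separator.join(selected)


sample = b"one\x0ctwo\x0cthree\x0cfour"
print(select_pages(sample, 2, 3))  # b'two\x0cthree'
```

Note that the result has no trailing form feed after the last selected page, so one may have to be appended before sending the data to a printer.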
On 8 jun, 15:19, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> Maybe splitting at '\x0c', selecting/slicing the wanted pages and joining
> them again is enough, depending on the size of the files and the available
> memory, of course.
> One problem I see is that '\x0c' may not always be a page end. It may
> also occur in raster image data, I guess.
> Ciao,
> Marc 'BlackJack' Rintsch
Hi,
Your last comment is also something I have noticed. There are a number
of cases where this will happen, so I have to deal with it as well.
I will dive into this on Monday, after this hot weekend.
Cheers,
Jeroen
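One possible way to deal with form feed bytes inside raster data is to skip the binary payload of raster transfer commands while scanning. The sketch below assumes that raster rows are sent with the PCL 5 command `ESC * b # W`, where `#` is the number of data bytes that follow, and that this command is not combined with other parameters in a single escape sequence. Other commands that carry binary payloads (font downloads, for example) are not handled, and `page_break_positions` is a made-up name.

```python
import re

# Matches either a form feed or an uncombined "ESC * b <count> W" command.
TOKEN = re.compile(rb"\x0c|\x1b\*b(\d+)W")


def page_break_positions(data):
    """Return offsets of form feeds that lie outside raster payloads."""
    positions = []
    pos = 0
    while True:
        match = TOKEN.search(data, pos)
        if match is None:
            break
        if match.group(1) is None:
            # Plain form feed: record it and continue right after it.
            positions.append(match.start())
            pos = match.end()
        else:
            # Raster transfer: jump over the announced number of data bytes.
            pos = match.end() + int(match.group(1))
    return positions


raster = b"\x1b*b3W" + b"\x0c" + b"AB"   # 3 payload bytes, one of them 0x0C
data = b"page1" + b"\x0c" + raster + b"page2" + b"\x0c"
print(page_break_positions(data))  # [5, 19]
```

A plain `find` loop would report a third, false page break at offset 11 for this sample.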
On Jun 8, 2:07 am, "Diez B. Roggisch" <d...@nospam.web.de> wrote:
> ...
> What does the searching have to do with the reading? 10 MB easily fits into the
> main memory of a decent PC, so just do
>
>     contents = open("file").read() # yes I know I should close the file...
>     print(contents.find('\x0c'))
>
> Diez
Better make that open("file", "rb").
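A short sketch of why binary mode matters: in text mode, line endings may be translated on some platforms, and in Python 3 the bytes are also decoded to text, which can fail on binary data. The function name `count_form_feeds` and the temporary demo file are only for illustration.

```python
import os
import tempfile


def count_form_feeds(path):
    """Read the whole file as raw bytes and count the form feed (0x0C) bytes."""
    # "rb" means no newline translation and no text decoding.
    with open(path, "rb") as handle:
        contents = handle.read()
    return contents.count(b"\x0c")


# Write a small demo file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"page 1\r\n\x0cpage 2\x0c")
print(count_form_feeds(tmp.name))  # 2
os.remove(tmp.name)
```

The `with` block also closes the file, which takes care of the "I should close the file" remark in the original snippet.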