find a pattern in binary file

Hi there,
I need to find a hex pattern like 0x650A1010 in a binary file.
I can write a small algorithm that scans the whole file for the match,
but this file is huge, and I'm worried about performance.
Is there any STL method for a fast search?
Andrea
Jun 27 '08
On 21 Jun, 12:26, James Kanze <james.ka...@gmail.com> wrote:
On Jun 21, 3:59 am, "Eric Pruneau" <eric.prun...@cgocable.ca> wrote:
"vizzz" <andrea.visin...@gmail.com> wrote in message news:
aad55897-6560-4fd7-ae4f-5b8cc810f...@a70g2000hsh.googlegroups.com...
I need to find a hex pattern like 0x650A1010 in a binary file.
I can write a small algorithm that scans the whole file for the match,
but this file is huge, and I'm worried about performance.
Is there any STL method for a fast search?
Andrea
Check out boost::regex

Which requires a forward iterator, and so can't be used on data
in a file (for which he'll have at best an input iterator).

Also, if he's only looking for a fixed string, it's likely to be
significantly slower than some other algorithms.
Maybe explaining my goal can be useful.
in jpeg2000 files (jp2) there are several boxes made of 4byte length,
4byte type and then data.
i must check if box exist by searching somewhere in the file (boxes
can be anywhere in the whole file) for the box type (ex 0x650A1010).
Jun 27 '08 #11
James Kanze wrote:
On Jun 21, 2:13 am, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
Ivan wrote:
On Jun 20, 1:11 pm, vizzz <andrea.visin...@gmail.com> wrote:
Hmmm... I had a look at this and ran across a simple
problem. How do you read a binary file and just echo the
HEX for byte to the screen.
[snip]
The issue is the c++ read function doesn't return number of
bytes read... so on the last read into a buffer how do you
know how many characters to print?
Have a look at readsome().

Yes, have a look at it. Read its specification very carefully.
Because if you do, you'll realize that it is absolutely
worthless here.
I reread it again. I fail to see why it's worthless. Obviously, I am missing
something.
The function he's looking for is istream::gcount (), which
returns the number of bytes read by the last unformatted read.
His basic loop would be:

    while ( input.read( &buffer[ 0 ], buffer.size() ) ) {
        process( buffer.begin(), buffer.end() ) ;
    }
    process( buffer.begin(), buffer.begin() + input.gcount() ) ;
On the other hand, that looks very clean.
Best

Kai-Uwe
Jun 27 '08 #12
vizzz wrote:
Maybe explaining my goal will be useful.
In JPEG 2000 files (jp2) there are several boxes, each made of a 4-byte length,
a 4-byte type, and then data.
I must check whether a box exists by searching the file (boxes
can be anywhere in the whole file) for the box type (e.g. 0x650A1010).
What is the largest file size and on which system
do you want this to happen?

The C-memchr is, on modern compilers, very very
fast (it does 8 byte alignment on the pointer,
scans 32 or 64 bit at a time by bit ops and so on.)

You can't simply beat that one. Read the file
as a block (fread after stat(), ftell/SEEK_END)
or in chunks and find the first byte (and compare
the rest).

Otherwise, you could give memcmp() a shot
http://www.cplusplus.com/reference/c...ng/memcmp.html
maybe it's optimized as hard as memchr() is.
I didn't look into this but know from memchr()
it would get about double speed compared to the
naive implementation: if(*p == *q) ...

But if you can't slurp the whole file at
once into memory, you have of course to
deal with the possibility of broken pattern
across the read block boundary.

Regards

M.
Jun 27 '08 #13
James Kanze wrote:
On Jun 21, 2:13 am, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
Ivan wrote:
On Jun 20, 1:11 pm, vizzz <andrea.visin...@gmail.com> wrote:
Hmmm... I had a look at this and ran across a simple
problem. How do you read a binary file and just echo the
HEX for byte to the screen.
#include <iostream>
#include <ostream>
#include <fstream>
#include <iterator>
#include <iomanip>
#include <algorithm>
#include <cassert>

class print_hex {
    std::ostream * ostr_ptr;
    unsigned int line_length;
    unsigned int index;
public:
    print_hex ( std::ostream & str_ref, unsigned int length )
        : ostr_ptr( &str_ref )
        , line_length ( length )
        , index ( 0 )
    {}

    void operator() ( unsigned char ch ) {
        ++index;
        if ( index >= line_length ) {
            (*ostr_ptr) << std::hex << std::setw(2) << std::setfill( '0' )
                        << (unsigned int)(ch) << '\n';
            index = 0;
        } else {
            (*ostr_ptr) << std::hex << std::setw(2) << std::setfill( '0' )
                        << (unsigned int)(ch) << ' ';

Wouldn't it be preferable to set the formatting flags in the
constructor?
Yup.
I'd also provide an "indent" argument; if index
were 0, I'd output indent spaces, otherwise a single space---or
perhaps the best solution would be to provide a start of line
and a separator string to the constructor, then:
Good idea.

    (*ostr_ptr)
        << (inLineCount == 0 ? startString : separString)
        << std::setw( 2 ) << (unsigned int)( ch ) ;
    ++ inLineCount ;
    if ( inLineCount == lineLength ) {
        (*ostr_ptr) << endString ;
        inLineCount = 0 ;
    }

(This supposes that hex and fill were set in the constructor.)
Given the copying that's going on, I'd also simulate move
semantics, so that the final destructor could do something like:

    if ( inLineCount != 0 ) {
        (*ostr_ptr) << endString ;
    }
        }
    }
};

int main ( int argn, char ** args ) {
    assert( argn == 2 );
    std::ifstream in ( args[1], std::ios::binary );
    std::for_each( std::istreambuf_iterator< char >( in ),
                   std::istreambuf_iterator< char >(),
                   print_hex( std::cout, 25 ) );

Unless you're doing something relatively generic, with support
for different separators, etc., this really looks like a case of
for_each abuse.
Actually, with regard to for_each, I am growing more and more comfortable
using it. Of all algorithms, for_each seems the most silly; on the other
hand it is also the one that has the largest potential for specialized
versions that take advantage of internal knowledge about the underlying
sequence. E.g., I can easily imagine a special version for iterators into a
deque (where for_each would iterate over pages and within each page would
use a very fast loop using T* where it can skip the test for reaching a
page end). Similar optimizations should be possible for stream iterators.

    std::cout << '\n';

Which results in one new line too many if the number of elements
just happened to be an exact multiple of the line length.
You are making up specs :-)

But seriously: you are right, of course.
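Folding the suggestions from this exchange into one place gives something like the sketch below. It is not anyone's actual code: the function `hex_dump` and its parameter names are invented here. Formatting flags are set once, the start-of-line and separator strings are configurable, and the end-of-line string is written only when a line is actually open, so an input that is an exact multiple of the line length does not get a stray newline.

```cpp
#include <cstddef>
#include <iomanip>
#include <istream>
#include <iterator>
#include <ostream>
#include <string>

// Hypothetical helper: writes every byte of `in` to `out` as two hex digits.
void hex_dump(std::istream& in, std::ostream& out, std::size_t line_length,
              const std::string& start = "",
              const std::string& separator = " ",
              const std::string& end = "\n") {
    if (line_length == 0) line_length = 1;

    // Set the formatting once, and remember the old state to restore it.
    const std::ios_base::fmtflags old_flags = out.flags();
    const char old_fill = out.fill('0');
    out << std::hex;

    std::size_t in_line = 0;   // bytes already written on the current line
    for (std::istreambuf_iterator<char> it(in), last; it != last; ++it) {
        out << (in_line == 0 ? start : separator)
            << std::setw(2)
            << static_cast<unsigned int>(static_cast<unsigned char>(*it));
        if (++in_line == line_length) {
            out << end;
            in_line = 0;
        }
    }
    if (in_line != 0) out << end;   // close a partial last line only

    out.flags(old_flags);
    out.fill(old_fill);
}
```

A plain loop is used instead of `for_each`, which sidesteps the functor-copying issue mentioned above because the line counter lives in one place.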

About the only real use for this sort of output I've found is
debugging or experimenting, but there, I use it often enough
that I've a generic Dump<T> class (and a generic function which
returns it, for automatic type deduction), so that I can write
things like:

std::cout << dump( someObject ) << std::endl ;
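The `Dump<T>` class itself is not shown in the thread, so what follows is only a guess at the general shape of such a helper, not the class from the post: a small class template that remembers the object's address, a `dump()` function template so that the type is deduced, and an `operator<<` that prints the object representation byte by byte. The output depends on the platform's byte order and padding.

```cpp
#include <cstddef>
#include <iomanip>
#include <ostream>

// Hypothetical reconstruction; the real class is not in the thread.
template <typename T>
class Dump {
public:
    explicit Dump(const T& obj)
        : bytes_(reinterpret_cast<const unsigned char*>(&obj)) {}

    // Prints sizeof(T) bytes as space-separated two-digit hex values.
    friend std::ostream& operator<<(std::ostream& out, const Dump& d) {
        const std::ios_base::fmtflags old_flags = out.flags();
        const char old_fill = out.fill('0');
        out << std::hex;
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            if (i != 0) out << ' ';
            out << std::setw(2) << static_cast<unsigned int>(d.bytes_[i]);
        }
        out.flags(old_flags);
        out.fill(old_fill);
        return out;
    }

private:
    const unsigned char* bytes_;   // points at the dumped object, not a copy
};

// Lets the compiler deduce T, so callers can write: out << dump(someObject)
template <typename T>
Dump<T> dump(const T& obj) {
    return Dump<T>(obj);
}
```

Because only a pointer is stored, the dumped object has to outlive the `Dump` value, which is the case when `dump( someObject )` is used directly inside the output expression.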
[snip]

Hm, I never had a use for hex dumping objects. But, maybe I should try that
out.
Best

Kai-Uwe Bux
Jun 27 '08 #14
On Jun 21, 8:35 pm, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
James Kanze wrote:
On Jun 21, 2:13 am, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
Ivan wrote:
On Jun 20, 1:11 pm, vizzz <andrea.visin...@gmail.com> wrote:
Hmmm... I had a look at this and ran across a simple
problem. How do you read a binary file and just echo the
HEX for byte to the screen.
[snip]
The issue is the c++ read function doesn't return number of
bytes read... so on the last read into a buffer how do you
know how many characters to print?
Have a look at readsome().
Yes, have a look at it. Read its specification very carefully.
Because if you do, you'll realize that it is absolutely
worthless here.
I reread it again. I fail to see why it's worthless.
Obviously, I am missing something.
It will read a maximum of streambuf::in_avail characters. If
there are no characters in the buffer, streambuf::in_avail calls
showmanyc. And by default, all showmanyc does is return 0. An
implementation of filebuf may do more, if the system supports
some means of finding out exactly how many characters are in the
file, but it's not required to. Which means that basically,
readsome() may stop (returning 0 characters read) as soon as
there are no more characters in the buffer.

--
James Kanze (GABI Software) email:ja******* **@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #15
On 21 Jun, 20:57, Mirco Wahab <wahab-m...@gmx.de> wrote:
vizzz wrote:
Maybe explaining my goal will be useful.
In JPEG 2000 files (jp2) there are several boxes, each made of a 4-byte length,
a 4-byte type, and then data.
I must check whether a box exists by searching the file (boxes
can be anywhere in the whole file) for the box type (e.g. 0x650A1010).

What is the largest file size and on which system
do you want this to happen?
About 800-900MB on win32 (i'm using VS2008)

Jun 27 '08 #16
On Jun 21, 3:10 am, James Kanze <james.ka...@gmail.com> wrote:
(But IMHO, istream really isn't appropriate for binary; if I'm
really working with a binary file, I'll drop down to the system
API.)
That's exactly what I was thinking, but I wasn't sure if it was just
my lack of C++ knowledge that made it a pain to read binary data with
istream.

Thanks,
Ivan Novick
http://www.mycppquiz.com/
Jun 27 '08 #17
