finding/replacing a long binary pattern in a .bin file

yaipa

What would be the common sense way of finding a binary pattern in a
.bin file, say some 200 bytes, and replacing it with an updated pattern
of the same length at the same offset?

Also, the pattern can occur on any byte boundary in the file, so
chunking through the code at 16 bytes a frame may be a problem. The
file itself isn't so large, maybe 32 kbytes is all and the need for
speed is not so great, but the need for accuracy in the
search/replacement is very important.

Thanks,

--Alan

Jul 18 '05 #1


Stephen Thorne

On 12 Jan 2005 22:36:54 -0800, yaipa <ya***@yahoo.com> wrote:

What would be the common sense way of finding a binary pattern in a
.bin file, say some 200 bytes, and replacing it with an updated pattern
of the same length at the same offset?

Also, the pattern can occur on any byte boundary in the file, so
chunking through the code at 16 bytes a frame may be a problem. The
file itself isn't so large, maybe 32 kbytes is all and the need for
speed is not so great, but the need for accuracy in the
search/replacement is very important.

Okay, given the requirements.

f = file('mybinfile')
contents = f.read().replace(oldbinstring, newbinstring)
f.close()
f = file('mybinfile','w')
f.write(contents)
f.close()

Will do it, and do it accurately. But it will also read the entire
file into memory.

Stephen.

Jul 18 '05 #2

Bengt Richter

On Thu, 13 Jan 2005 16:51:46 +1000, Stephen Thorne <st************@gmail.com> wrote:

On 12 Jan 2005 22:36:54 -0800, yaipa <ya***@yahoo.com> wrote:
What would be the common sense way of finding a binary pattern in a
.bin file, say some 200 bytes, and replacing it with an updated pattern
of the same length at the same offset?

Also, the pattern can occur on any byte boundary in the file, so
chunking through the code at 16 bytes a frame may be a problem. The
file itself isn't so large, maybe 32 kbytes is all and the need for
speed is not so great, but the need for accuracy in the
search/replacement is very important.

Okay, given the requirements.

f = file('mybinfile')
contents = f.read().replace(oldbinstring, newbinstring)
f.close()
f = file('mybinfile','w')
f.write(contents)
f.close()

Will do it, and do it accurately. But it will also read the entire
file into memory.

You must be on linux or such, otherwise you would have shown opening the
_binary_ files (I assume that's what a .bin file is) with 'rb' and 'wb', IWT.

Not sure what system the OP was/is on.

BTW, I'm sure you could write a generator that would take a file name
and oldbinstring and newbinstring as arguments, and read and yield nice
os-file-system-friendly disk-sector-multiple chunks, so you could write

fout = open('mynewbinfile', 'wb')
for buf in updated_file_stream('myoldbinfile','rb', oldbinstring, newbinstring):
    fout.write(buf)
fout.close()

(left as an exercise ;-)
(modifying a file "in place" is another exercise)
(doing the latter with defined maximum memory buffer usage
even when mods increase the length of the file is another ;-)

Regards,
Bengt Richter

Jul 18 '05 #3

François Pinard

[Stephen Thorne]

On 12 Jan 2005 22:36:54 -0800, yaipa <ya***@yahoo.com> wrote:
What would be the common sense way of finding a binary pattern in
a .bin file, say some 200 bytes, and replacing it with an updated
pattern of the same length at the same offset? The file itself
isn't so large, maybe 32 kbytes is all and the need for speed is not
so great, but the need for accuracy in the search/replacement is
very important.
Okay, given the requirements.

f = file('mybinfile')
contents = f.read().replace(oldbinstring, newbinstring)
f.close()
f = file('mybinfile','w')
f.write(contents)
f.close()

Will do it, and do it accurately. But it will also read the entire
file into memory.

32Kb is a small file indeed, reading it in memory is not a problem!

People sometimes like writing long Python programs. Here is about the
same, a bit shorter: :-)

buffer = file('mybinfile', 'rb').read().replace(oldbinstring, newbinstring)
file('mybinfile', 'wb').write(buffer)
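For comparison, here is a minimal sketch of the same one-gulp approach in Python 3 (not the Python 2.4 used in this thread), where a file read in binary mode yields `bytes`. The function name `replace_in_file` and its `expected` parameter are hypothetical. Because the OP stressed accuracy, the sketch counts the matches first and refuses to write if the count is not what the caller expects; it also assumes, as the OP stated, that the replacement has the same length as the pattern.

```python
from pathlib import Path


def replace_in_file(path, old, new, expected=None):
    """Replace every occurrence of old with new in a binary file, in one gulp.

    Reads the whole file into memory, which is fine for ~32 KB files.
    If expected is given, nothing is written unless exactly that many
    (non-overlapping) occurrences were found.
    """
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    data = Path(path).read_bytes()   # binary mode: no newline translation
    count = data.count(old)          # counts the same matches replace() uses
    if expected is not None and count != expected:
        raise ValueError(f"found {count} occurrence(s), expected {expected}")
    Path(path).write_bytes(data.replace(old, new))
    return count
```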

--
François Pinard http://pinard.progiciels-bpi.ca

Jul 18 '05 #4

Jeff Shannon

Bengt Richter wrote:

BTW, I'm sure you could write a generator that would take a file name
and oldbinstring and newbinstring as arguments, and read and yield nice
os-file-system-friendly disk-sector-multiple chunks, so you could write

fout = open('mynewbinfile', 'wb')
for buf in updated_file_stream('myoldbinfile','rb', oldbinstring, newbinstring):
    fout.write(buf)
fout.close()

What happens when the bytes to be replaced are broken across a block
boundary? ISTM that neither half would be recognized....

I believe that this requires either reading the entire file into
memory, to scan all at once, or else conditionally matching an
arbitrary fragment of the end of a block against the beginning of the
oldbinstring... Given that the file in question is only a few tens of
kbytes, I'd think that doing it in one gulp is simpler. (For a large
file, chunking it might be necessary, though...)
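The boundary problem is easy to demonstrate. In the Python 3 sketch below, the helper `naive_chunked_replace` is hypothetical and deliberately wrong: it runs `replace()` on each 16-byte frame separately, as the OP feared, and misses a 4-byte pattern that straddles the first and second frames, while one `bytes.replace` over the whole buffer finds it.

```python
def naive_chunked_replace(data, old, new, chunksize):
    """Deliberately naive: run replace() on each fixed-size chunk separately."""
    pieces = []
    for i in range(0, len(data), chunksize):
        pieces.append(data[i:i + chunksize].replace(old, new))
    return b"".join(pieces)


data = bytes(range(32))      # two 16-byte "frames": 0x00..0x0f and 0x10..0x1f
old = data[14:18]            # b'\x0e\x0f\x10\x11' straddles the frame boundary
new = b"\xaa" * len(old)

whole = data.replace(old, new)                     # one gulp: pattern found
naive = naive_chunked_replace(data, old, new, 16)  # per frame: pattern missed

print(whole == data)   # False: the whole-buffer replace changed the data
print(naive == data)   # True: the chunked replace silently did nothing
```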

Jeff Shannon
Technician/Programmer
Credit International

Jul 18 '05 #5

Bengt Richter

On Thu, 13 Jan 2005 11:40:52 -0800, Jeff Shannon <je**@ccvcorp.com> wrote:

Bengt Richter wrote:
BTW, I'm sure you could write a generator that would take a file name
and oldbinstring and newbinstring as arguments, and read and yield nice
os-file-system-friendly disk-sector-multiple chunks, so you could write

fout = open('mynewbinfile', 'wb')
for buf in updated_file_stream('myoldbinfile','rb', oldbinstring, newbinstring):
    fout.write(buf)
fout.close()
What happens when the bytes to be replaced are broken across a block
boundary? ISTM that neither half would be recognized....

That was part of the exercise ;-)

(Hint: use str.find to find unbroken oldbinstrings in current inputbuffer and buffer out
safe changes, then when find fails, delete the safely used front of the input buffer,
and append another chunk from the input file. Repeat until last chunk has been appended
and find finds no more. Then buffer out the tail of the input buffer (if any) that then
won't have an oldbinstring to change).

I believe that this requires either reading the entire file into
memory, to scan all at once, or else conditionally matching an
arbitrary fragment of the end of a block against the beginning of the
oldbinstring... Given that the file in question is only a few tens of
kbytes, I'd think that doing it in one gulp is simpler. (For a large
file, chunking it might be necessary, though...)

It's certainly simpler to do it in one gulp, but it's not really hard to
do it in chunks. You just have to make sure your input buffer/chunksize is/are
larger than oldbinstring ;-)

Regards,
Bengt Richter

Jul 18 '05 #6

Bengt Richter

On Thu, 13 Jan 2005 11:40:52 -0800, Jeff Shannon <je**@ccvcorp.com> wrote:

Bengt Richter wrote:
BTW, I'm sure you could write a generator that would take a file name
and oldbinstring and newbinstring as arguments, and read and yield nice
os-file-system-friendly disk-sector-multiple chunks, so you could write

fout = open('mynewbinfile', 'wb')
for buf in updated_file_stream('myoldbinfile','rb', oldbinstring, newbinstring):
    fout.write(buf)
fout.close()

What happens when the bytes to be replaced are broken across a block
boundary? ISTM that neither half would be recognized....

I believe that this requires either reading the entire file into
memory, to scan all at once, or else conditionally matching an
arbitrary fragment of the end of a block against the beginning of the
oldbinstring... Given that the file in question is only a few tens of
kbytes, I'd think that doing it in one gulp is simpler. (For a large
file, chunking it might be necessary, though...)

Might as well post this, in case you're interested... warning, not very tested.
You want to write a proper test? ;-)

----< sreplace.py >-------------------------------------------------
def sreplace(sseq, old, new, retsize=4096):
    """
    iterate through sseq input string chunk sequence treating it
    as a continuous stream, replacing each substring old with new,
    and generating a sequence of retsize returned strings, except
    that the last may be shorter depending on available input.
    """
    inbuf = ''
    endsseq = False
    out = []
    start = 0
    lenold = len(old)
    lennew = len(new)
    while not endsseq:
        start, endprev = (old and [inbuf.find(old, start)] or [-1])[0], start # wrapping in a list stops a match at 0 becoming -1
        if start<0:
            start = endprev # restore find start pos
            for chunk in sseq: inbuf+= chunk; break
            else:
                out.append(inbuf[start:])
                endsseq = True
        else:
            out.append(inbuf[endprev:start])
            start += lenold
            out.append(new)
        if endsseq or sum(map(len, out))>=retsize:
            s = ''.join(out)
            while len(s)>= retsize:
                yield s[:retsize]
                s = s[retsize:]
            if endsseq:
                if s: yield s
            else:
                out = [s]

if __name__ == '__main__':
    import sys
    args = sys.argv[:]
    usage = """
    Test usage: [python] sreplace.py old new retsize [rest of args is string chunks for test]
        where old is old string to find in chunked stream and new is replacement
        and retsize is returned buffer size, except that last may be shorter"""
    if not args[1:]: raise SystemExit, usage
    try:
        args[3] = int(args[3])
        args[0] = iter(sys.argv[4:])
        print '%r\n-----------\n%s\n------------' %(sys.argv[1:], '\n'.join(sreplace(*args[:4])))
    except Exception, e:
        print '%s: %s' %(e.__class__.__name__, e)
        raise SystemExit, usage
--------------------------------------------------------------------

As mentioned, not tested very much beyond what you see:

[ 2:43] C:\pywk\ut>py24 sreplace.py x _XX_ 20 This is x and abcxdef 012x345 zzxx zzz x
['x', '_XX_', '20', 'This', 'is', 'x', 'and', 'abcxdef', '012x345', 'zzxx', 'zzz', 'x']
-----------
Thisis_XX_andabc_XX_
def012_XX_345zz_XX__
XX_zzz_XX_
------------

[ 2:43] C:\pywk\ut>py24 sreplace.py x _XX_ 80 This is x and abcxdef 012x345 zzxx zzz x
['x', '_XX_', '80', 'This', 'is', 'x', 'and', 'abcxdef', '012x345', 'zzxx', 'zzz', 'x']
-----------
Thisis_XX_andabc_XX_def012_XX_345zz_XX__XX_zzz_XX_
------------

[ 2:43] C:\pywk\ut>py24 sreplace.py x _XX_ 4 This is x and abcxdef 012x345 zzxx zzz x
['x', '_XX_', '4', 'This', 'is', 'x', 'and', 'abcxdef', '012x345', 'zzxx', 'zzz', 'x']
-----------
This
is_X
X_an
dabc
_XX_
def0
12_X
X_34
5zz_
XX__
XX_z
zz_X
X_
------------

[ 2:44] C:\pywk\ut>py24 sreplace.py def DEF 80 This is x and abcxdef 012x345 zzxx zzz x
['def', 'DEF', '80', 'This', 'is', 'x', 'and', 'abcxdef', '012x345', 'zzxx', 'zzz', 'x']
-----------
ThisisxandabcxDEF012x345zzxxzzzx
------------
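The runs above never put the pattern at the very start of the input, and that case is worth adding to a proper test. Code written for Python 2.4 often used `cond and a or b` in place of a conditional expression (added in 2.5), and that idiom returns `b` whenever `a` is false. Since a `find()` result of 0 is false, a match at index 0 comes back as -1. The two hypothetical helpers below, in Python 3, show the difference.

```python
def find_py24_style(buf, old, start):
    # "cond and a or b": falls through to -1 whenever find() returns 0,
    # because 0 is a false value in Python.
    return old and buf.find(old, start) or -1


def find_explicit(buf, old, start):
    # Explicit form: only fall back to -1 when old is empty.
    if not old:
        return -1
    return buf.find(old, start)


print(find_py24_style(b"xyz", b"x", 0))  # -1: the match at index 0 is lost
print(find_explicit(b"xyz", b"x", 0))    # 0
```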

If you wanted to change a binary file, you'd use it something like this (although you would
probably leave the buffer size at the default 4096, not 20, which is pretty silly other than
for demoing. At least the input chunks are 512 ;-)

>>> from sreplace import sreplace
>>> fw = open('sreplace.py.txt','wb')
>>> for buf in sreplace(iter(lambda f=open('sreplace.py','rb'):f.read(512), ''),'out','OUT',20):
...     fw.write(buf)
...
>>> fw.close()
>>> ^Z

[ 3:00] C:\pywk\ut>diff -u sreplace.py sreplace.py.txt
--- sreplace.py Fri Jan 14 02:39:52 2005
+++ sreplace.py.txt Fri Jan 14 03:00:01 2005
@@ -7,7 +7,7 @@
     """
     inbuf = ''
     endsseq = False
-    out = []
+    OUT = []
     start = 0
     lenold = len(old)
     lennew = len(new)
@@ -17,21 +17,21 @@
             start = endprev # restore find start pos
             for chunk in sseq: inbuf+= chunk; break
             else:
-                out.append(inbuf[start:])
+                OUT.append(inbuf[start:])
                 endsseq = True
         else:
-            out.append(inbuf[endprev:start])
+            OUT.append(inbuf[endprev:start])
             start += lenold
-            out.append(new)
-        if endsseq or sum(map(len, out))>=retsize:
-            s = ''.join(out)
+            OUT.append(new)
+        if endsseq or sum(map(len, OUT))>=retsize:
+            s = ''.join(OUT)
             while len(s)>= retsize:
                 yield s[:retsize]
                 s = s[retsize:]
             if endsseq:
                 if s: yield s
             else:
-                out = [s]
+                OUT = [s]
 
 if __name__ == '__main__':
     import sys

Regards,
Bengt Richter

Jul 18 '05 #7

yaipa

Bengt, and all,

Thanks for all the good input. The problem seems to be that .find()
is good for text files on Windows, but is not much use when it is
binary data. The script is for an assembly language build tool, so I know
the exact seek address of the binary data that I need to replace, so
maybe I'll just go that way. It just seemed a little more general to
do a search and replace rather than having to type in a seek address.

Of course I could use a Lib function to convert the binary data to
ASCII and back, but that seems a little over the top in this case.

Cheers,

--Alan

Jul 18 '05 #8

Bengt Richter

On 14 Jan 2005 15:40:27 -0800, "yaipa" <ya***@yahoo.com> wrote:

Bengt, and all,

Thanks for all the good input. The problem seems to be that .find()
is good for text files on Windows, but is not much use when it is
binary data. The script is for an assembly language build tool, so I know

Did you try it? Why shouldn't find work for binary data?? At the end of
this, I showed an example of opening and modding a text file _in binary_.
>>> s= ''.join(chr(i) for i in xrange(256))
>>> s
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18
\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`ab
cdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f
\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7
\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf
\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7
\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef
\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> for i in xrange(256):
...     assert i == s.find(chr(i))
...

I.e., all the finds succeeded for all 256 possible bytes. Why wouldn't you think that would work fine
for data from a binary file? Of course, find is case sensitive and fixed, not a regex, so it's
not very flexible. It wouldn't be that hard to expand to a list of old,new pairs as a change spec,
though. Of course that would slow it down some.
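One way to handle a list of (old, new) pairs is sketched below in Python 3. Note that this swaps in a different technique from `sreplace`: it builds a single regular expression from the escaped old patterns, so the data is scanned once and a replacement is never re-scanned by a later pair. The name `multi_replace` is hypothetical, and trying longer patterns first when two candidates start at the same offset is a design assumption, not something from the thread.

```python
import re


def multi_replace(data, pairs):
    """Apply several (old, new) byte-string replacements in a single pass."""
    table = dict(pairs)
    if not all(table):
        raise ValueError("old patterns must be non-empty")
    # Longest first, so b"abc" wins over b"ab" at the same position.
    olds = sorted(table, key=len, reverse=True)
    pattern = re.compile(b"|".join(re.escape(o) for o in olds))
    # Each match is looked up in the table and replaced by its new bytes.
    return pattern.sub(lambda m: table[m.group(0)], data)


print(multi_replace(b"ab", [(b"a", b"b"), (b"b", b"a")]))  # b'ba': swapped, not chained
```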

the exact seek address of the binary data that I need to replace, so
maybe I'll just go that way. It just seemed a little more general to
do a search and replace rather than having to type in a seek address.

Except you run the risk of not having a unique search result, unless you
have a really guaranteed unique pattern.
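A small Python 3 guard against that risk, offered as a sketch (the name `find_unique` is hypothetical): it returns the offset only if the pattern occurs exactly once, counting overlapping hits too, since an overlapping second occurrence would also make the offset ambiguous.

```python
def find_unique(data, pattern):
    """Return the offset of pattern in data, insisting it occurs exactly once."""
    first = data.find(pattern)
    if first < 0:
        raise ValueError("pattern not found")
    # Search again from first + 1 so overlapping occurrences are caught too.
    if data.find(pattern, first + 1) >= 0:
        raise ValueError("pattern is not unique")
    return first
```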
Of course I could use a Lib function to convert the binary data to
ASCII and back, but that seems a little over the top in this case.

I think you misunderstand Python strings. There is no need to "convert" the result
of open(filename, 'rb').read(chunksize). Re-read the example below ;-)
[...]

If you wanted to change a binary file, you'd use it something like
                          ^^^^^^^^^^^
(although you would probably leave the buffer size at the default 4096, not 20, which
is pretty silly other than for demoing. At least the input chunks are 512 ;-)

>>> from sreplace import sreplace
>>> fw = open('sreplace.py.txt','wb')

opens a binary output file

>>> for buf in sreplace(iter(lambda f=open('sreplace.py','rb'):f.read(512), ''),'out','OUT',20):

iter(f, sentinel) is the format above. It creates an iterator that
keeps calling f() until f()==sentinel, which it doesn't return, and that ends the sequence.
f in this case is lambda f=open(inputfilename):f.read(inputchunksize)
and the sentinel is '' -- which is what is returned at EOF.
The old thing to find was 'out', to be changed to 'OUT', and the 20 was a silly small
return chunk size for the sreplace(...) iterator. All these chunks were simply passed
to

...     fw.write(buf)
...
>>> fw.close()

and closing the file explicitly wrapped it up.

>>> ^Z
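The same `iter(callable, sentinel)` pattern carries over to Python 3 with one catch: a file opened in binary mode returns `b''` at end of file, and `b'' == ''` is false in Python 3, so the sentinel must be `b''` or the iterator never stops. A minimal sketch (the helper name `read_chunks` is hypothetical; `io.BytesIO` stands in for a real file):

```python
import io


def read_chunks(f, size=512):
    """Iterate over a binary file in chunks of at most `size` bytes."""
    # iter(callable, sentinel) calls f.read(size) repeatedly and stops as soon
    # as it returns the sentinel; the sentinel itself is never yielded.
    return iter(lambda: f.read(size), b"")


f = io.BytesIO(bytes(range(256)) * 5)             # 1280 bytes standing in for a file
print([len(chunk) for chunk in read_chunks(f)])   # [512, 512, 256]
```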

I just typed that in interactively to demo the file change process with the source itself, so the diff
could show the changes. I guess I should have made sreplace.py runnable as a binary file updater, rather
than a cute demo using command line text. The files are no worry, but what is the source of your old
and new binary patterns that you want to use for find and replace? You can't enter them in unescaped format
on a command line, so you may want to specify them in separate binary files, or you could specify them
as Python strings in a module that could be imported. E.g.,

---< old2new.py >------
# example of various ways to specify binary bytes in strings
from binascii import unhexlify as hex2chr
old = (
    'This is plain text.'
    + ''.join(map(chr,[33,44,55, 0xaa])) + '<<-- arbitrary list of binary bytes specified numerically if desired'
    + chr(33)+chr(44)+chr(55)+ '<<-- though this is plainer for a short sequence'
    + hex2chr('4142433031320001ff') + r'<<-- should be ABC012\x00\x01\xff'
)

new = '\x00'*len(old) # replace with zero bytes
-----------------------
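For readers on Python 3, where binary data is `bytes` rather than `str`, the same ways of spelling out a pattern look roughly like this. It is a sketch of equivalents, not part of Bengt's module: `bytes([...])` replaces `''.join(map(chr, ...))`, and `bytes.fromhex` does the job of `unhexlify` (which also still exists in Python 3's `binascii`).

```python
# Python 3 equivalents of the ways old2new.py spells out binary bytes.
old = (
    b'This is plain text.'
    + bytes([33, 44, 55, 0xaa])             # arbitrary list of byte values
    + b'\x21\x2c\x37'                       # the same 33, 44, 55 as escapes
    + bytes.fromhex('4142433031320001ff')   # hex digits: ABC012\x00\x01\xff
)

new = b'\x00' * len(old)   # same-length replacement made of zero bytes
print(len(old))            # 35
```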

BTW: Note: changing binaries can be dangerous! Do so at your own risk!!
And this has not been tested worth a darn, so caveat**n.

---< binfupd.py >------
from sreplace import sreplace
def main(infnam, outfnam, old, new):
    infile = open(infnam, 'rb')
    inseq = iter(lambda: infile.read(4096), '')
    outfile = open(outfnam, 'wb')
    try:
        try:
            for buf in sreplace(inseq, old, new):
                outfile.write(buf)
        finally:
            infile.close()
            outfile.close()
    except Exception, e:
        print '%s:%s' %(e.__class__.__name__, e)

if __name__ == '__main__':
    import sys
    try:
        oldnew = __import__(sys.argv[3])
        main(sys.argv[1], sys.argv[2], oldnew.old, oldnew.new)
    except Exception, e:
        print '%s:%s' %(e.__class__.__name__, e)
        raise SystemExit, """
Usage: [python] binfupd.py infname outfname oldnewmodulename
    where infname is read in binary, and outfname is written
    in binary, replacing instances of old binary data with new
    specified as python strings named old and new respectively
    in a module named oldnewmodulename (without .py extension).
"""
-----------------------

REMEMBER: NO WARRANTY FOR ANY PURPOSE! USE AT YOUR OWN RISK!

And, if you know where to seek to, that seems like the best way ;-)
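A Python 3 sketch of the seek approach, under the OP's constraints (known offset, same-length replacement). The name `patch_at_offset` is hypothetical. As an added safety check, it reads the bytes currently at the offset and refuses to write unless they are the expected old pattern, so a wrong address fails loudly instead of silently corrupting the binary.

```python
def patch_at_offset(path, offset, old, new):
    """Overwrite old with new at a known file offset, in place."""
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    with open(path, "r+b") as f:      # read/write, binary, no truncation
        f.seek(offset)
        current = f.read(len(old))
        if current != old:
            raise ValueError(f"unexpected bytes at offset {offset:#x}: {current!r}")
        f.seek(offset)                # the read moved the position; go back
        f.write(new)
```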

Regards,
Bengt Richter

Jul 18 '05 #9

John Lenton

On Wed, Jan 12, 2005 at 10:36:54PM -0800, yaipa wrote:

What would be the common sense way of finding a binary pattern in a
.bin file, say some 200 bytes, and replacing it with an updated pattern
of the same length at the same offset?

Also, the pattern can occur on any byte boundary in the file, so
chunking through the code at 16 bytes a frame may be a problem. The
file itself isn't so large, maybe 32 kbytes is all and the need for
speed is not so great, but the need for accuracy in the
search/replacement is very important.

ok, after having read the answers, I feel I must, once again, bring
mmap into the discussion. It's not that I'm any kind of mmap expert,
that I twirl mmaps for a living; in fact I barely have cause to use it
in my work, but give me a break! This is the kind of thing mmap
*shines* at!

Let's say m is your mmap handle, a is the pattern you want to find,
b is the pattern you want to replace, and n is the size of both a and
b.

You do this:

p = m.find(a)
m[p:p+n] = b

and that is *it*. Ok, so getting m to be an mmap handle takes more work
than open()(*). A *lot* more work, in fact, so maybe you're justified
in not using it; some people can't afford the extra

s = os.stat(fn).st_size
m = mmap.mmap(f.fileno(), s)

and now I'm all out of single-letter variables.

*) why isn't mmap easier to use? I've never used it with something
other than the file size as its second argument, and with its access
argument in sync with open()'s second arg.
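The two-line version assumes the pattern is present: `mmap.find()` returns -1 when it is not, so it is worth checking before writing. Below is a Python 3 sketch that does that and keeps searching, so every occurrence is replaced in place. The name `mmap_replace_all` is hypothetical. It assumes a non-empty file (a mapping length of 0, meaning the whole file, does not work on an empty one) and a replacement of the same length, since slice assignment on an mmap cannot change its size.

```python
import mmap


def mmap_replace_all(path, old, new):
    """Replace every occurrence of old with same-length new, in place, via mmap."""
    if not old or len(old) != len(new):
        raise ValueError("old must be non-empty and the same length as new")
    count = 0
    # "r+b" gives a writable file descriptor, so the default mapping is writable.
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        pos = m.find(old)
        while pos != -1:              # find() returns -1 when nothing matches
            m[pos:pos + len(old)] = new
            count += 1
            pos = m.find(old, pos + len(old))
        m.flush()                     # push the changes to the file
    return count
```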

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
If the aborigine drafted an IQ test, all of Western civilization would
presumably flunk it.
-- Stanley Garn


Jul 18 '05 #10

yaipa

John,

Thanks for reminding me of the mmap module. The following worked as
expected.
#--------------------------------------------------------
import mmap

source_data = open("source_file.bin", 'rb').read()
search_data = open("search_data.bin", 'rb').read()
replace_data = open("replace_data.bin", 'rb').read()

# copy source_file.bin to modified.bin
open("modified.bin", 'wb').write(source_data)

# open read/write in binary mode; the map writes through to the file
fp = open("modified.bin", 'r+b')
mm = mmap.mmap(fp.fileno(), 0)

start_addr = mm.find(search_data)
if start_addr < 0:      # find() returns -1 when the pattern is absent
    raise SystemExit("search pattern not found")
end_addr = start_addr + len(replace_data)
mm[start_addr:end_addr] = replace_data

mm.flush()
mm.close()
fp.close()
#--------------------------------------------------------

However, I chose to implement the string method approach in the build tool,
because there are two occurrences of *Pattern* in the .bin file to be
updated and the string method did both in one shot.

Cheers,

--Alan

Jul 18 '05 #11

yaipa

Thanks Francois,

It worked as expected.
-------------------------------------------------------------------------------
source_data = open("source_data.bin", 'rb').read()
search_data = open("search_data.bin", 'rb').read()
replace_data = open("replace_data.bin", 'rb').read()
outFile = open("mod.bin", 'wb')

file_offset = source_data.find(search_data)
print "file_offset:", file_offset

outData = source_data.replace(search_data, replace_data)
outFile.write(outData)
outFile.close()
print ""

Jul 18 '05 #12

yaipa

Jul 18 '05 #13

yaipa

Bengt,

Thanks for the input. Sorry, your diff threw me the first time I looked
at it, but then I went back and tried it later. Yes, it works fine and
I've tucked it away for later use. For this particular use case
str.replace seems to get the job done in short order, and the tool
needs to be maintained by folks not familiar with Python, so I went ahead
and used that. But I imagine I will use this bit of code when I need a
finer-grained tool.

Thanks again.
Cheers,

--Alan

Jul 18 '05 #14
