I want to scan a file byte for byte for occurrences of the four-byte
pattern 0x00000100. I've tried with this:
# start
import sys
numChars = 0
startCode = 0
count = 0
inputFile = sys.stdin
while True:
    ch = inputFile.read(1)
    numChars += 1
    if len(ch) < 1: break
    startCode = ((startCode << 8) & 0xffffffff) | (ord(ch))
    if numChars < 4: continue
    if startCode == 0x00000100:
        count = count + 1
print count
# end
But it is very slow. What is the fastest way to do this? Using some
native call? Using a buffer? Using whatever?
/David
Oct 28 '05
Peter Otten wrote:
Bengt Richter wrote:
What struck me was
>>> gen = byblocks(StringIO.StringIO('no'),1024,len('end?')-1)
>>> [gen.next() for i in xrange(10)]
['no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no']
Ouch. Seems like I spotted the subtle cornercase error and missed the big one.
No, you just realised subconsciously that we'd all spot the obvious one
and decided to point out the bug that would remain after the obvious one
had been fixed.
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/
Steven D'Aprano <st***@REMOVETHIScyber.com.au> wrote:
On Fri, 28 Oct 2005 15:29:46 +0200, Björn Lindström wrote:
"pi************@gmail.com" <pi************@gmail.com> writes:
f = open("filename", "rb")
s = f.read()
sub = "\x00\x00\x01\x00"
count = s.count(sub)
print count
That's a lot of lines. This is a bit off topic, but I just can't stand unnecessary local variables.
print file("filename", "rb").read().count("\x00\x00\x01\x00")
Funny you should say that, because I can't stand unnecessary one-liners.
In any case, you are assuming that Python will automagically close the file when you are done.
Nonsense. This behavior is deterministic. At the end of that line, the
anonymous file object goes out of scope, the object is deleted, and the file is
closed.
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Paul Watson wrote:
Here is a better one that counts, and not just detects, the substring. This is -much- faster than using mmap; especially for a large file that may cause paging to start. Using mmap can be -very- slow.
<ss = pattern, be = len(ss) - 1>
...
b = fp.read(blocksize)
count = 0
while len(b) > be:
    count += b.count(ss)
    b = b[-be:] + fp.read(blocksize)
...
In cases where that one wins and blocksize is big,
this should do even better:
...
block = fp.read(blocksize)
count = 0
while len(block) > be:
    count += block.count(ss)
    lead = block[-be:]
    block = fp.read(blocksize)
    count += (lead + block[:be]).count(ss)
...
--
-Scott David Daniels sc***********@acm.org
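The block-wise idea in the two posts above can be sketched as a complete function. This is only an illustration in modern Python 3, not code from the thread: the name `count_pattern`, the 64 KiB default block size, and the `tail` variable are assumptions made here. Like `str.count`, it counts non-overlapping matches inside each piece it examines, so its result can differ from a single whole-file `count` when occurrences of the pattern overlap.

```python
import io


def count_pattern(stream, pattern, blocksize=64 * 1024):
    """Count occurrences of pattern in a binary stream, reading block by block."""
    overlap = len(pattern) - 1   # bytes that must be carried across a boundary
    count = 0
    tail = b""                   # last `overlap` bytes of everything read so far
    while True:
        block = stream.read(blocksize)
        if not block:
            break
        # Matches that lie entirely inside this block.
        count += block.count(pattern)
        if overlap:
            # Matches that start in the carried tail and end in this block.
            # Neither piece is long enough to hold a whole match by itself,
            # so nothing is counted twice.
            count += (tail + block[:overlap]).count(pattern)
            tail = (tail + block)[-overlap:]
    return count


# Small demonstration on an in-memory stream (contents made up for illustration).
sample = b"\x00\x00\x01\x00" + b"abc" + b"\x00\x00\x01\x00"
print(count_pattern(io.BytesIO(sample), b"\x00\x00\x01\x00", blocksize=5))
```

Only the short tail is copied per block, rather than the whole buffer, which is the point of the second variant quoted above.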
Tim Roberts <ti**@probo.com> wrote:
...
print file("filename", "rb").read().count("\x00\x00\x01\x00")
Funny you should say that, because I can't stand unnecessary one-liners.
In any case, you are assuming that Python will automagically close the file when you are done.
Nonsense. This behavior is deterministic. At the end of that line, the anonymous file object goes out of scope, the object is deleted, and the file is closed.
In today's implementations of Classic Python, yes. In other equally
valid implementations of the language, such as Jython, IronPython, or,
for all we know, some future implementation of Classic, that may well
not be the case. Many, quite reasonably, dislike relying on a specific
implementation's peculiarities, and prefer to write code that relies
only on what the _language_ specs guarantee.
Alex
ne********@gmail.com wrote:
I think implementing a finite state automaton would be a good (best?) solution. I have drawn a FSM for you (try viewing the following in fixed width font). Just increment the count when you reach state 5.
[FSM diagram: states [1] through [5] linked by transitions on 0 and 1; the ASCII layout was lost in extraction]
If you don't understand FSM's, try getting a book on computational theory (the book by Hopcroft & Ullman is great.)
Here you don't have special cases whether reading in blocks or reading whole at once (as you only need one byte at a time).
Indeed, but reading one byte at a time is about the slowest way to
process a file, in Python or any other language, because it fails to
amortize the overhead cost of function calls over many characters.
Buffering wasn't invented because early programmers had nothing better
to occupy their minds, remember :-)
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/
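For readers who want to see the state-machine idea in code, here is a minimal Python 3 sketch. It is not a reconstruction of the diagram quoted above: the state numbering (0 to 3, meaning "number of pattern bytes matched so far") and the function name `count_with_fsm` are assumptions made for this example. It runs over a buffer that has already been read, so the file I/O is amortized, but the per-byte loop is still executed by the interpreter and should be expected to be much slower than `count`.

```python
def count_with_fsm(data):
    """Count occurrences of 00 00 01 00 in a bytes object with a small state machine."""
    state = 0   # how many leading bytes of the pattern have been matched
    count = 0
    for byte in data:            # iterating over bytes yields ints in Python 3
        if state == 0:
            state = 1 if byte == 0x00 else 0
        elif state == 1:
            state = 2 if byte == 0x00 else 0
        elif state == 2:
            if byte == 0x01:
                state = 3
            elif byte == 0x00:
                state = 2        # a longer run of zeros still ends in "00 00"
            else:
                state = 0
        else:                    # state == 3: "00 00 01" seen so far
            if byte == 0x00:
                count += 1
                state = 1        # the final 00 can begin the next match
            else:
                state = 0
    return count


print(count_with_fsm(b"ab\x00\x00\x01\x00cd"))
```

Because the final `00` is allowed to start the next match, this variant also counts overlapping occurrences, which `count` does not.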
"Alex Martelli" <al*****@yahoo.com> wrote in message
news:1h5760l.1e2eatkurdeo7N%al*****@yahoo.com...
In today's implementations of Classic Python, yes. In other equally valid implementations of the language, such as Jython, IronPython, or, for all we know, some future implementation of Classic, that may well not be the case. Many, quite reasonably, dislike relying on a specific implementation's peculiarities, and prefer to write code that relies only on what the _language_ specs guarantee.
How could I identify when Python code does not close files and depends on
the runtime to take care of this? I want to know that the code will work
well under other Python implementations and future implementations which may
not have this provided.
"Paul Watson" <pw*****@redlinepy.com> writes:
How could I identify when Python code does not close files and depends on the runtime to take care of this? I want to know that the code will work well under other Python implementations and future implementations which may not have this provided.
There is nothing in the Python language reference that guarantees the
files will be closed when the references go out of scope. That
CPython does it is simply an implementation artifact. If you want to
make sure they get closed, you have to close them explicitly. There
are some Python language extensions in the works to make this more
convenient (PEP 343) but for now you have to do it by hand.
ne********@gmail.com wrote:
I think implementing a finite state automaton would be a good (best?) solution. I have drawn a FSM for you (try viewing the following in fixed width font). Just increment the count when you reach state 5.
[FSM diagram: states [1] through [5] linked by transitions on 0 and 1; the ASCII layout was lost in extraction]
If you don't understand FSM's, try getting a book on computational theory (the book by Hopcroft & Ullman is great.)
I already have that book. The above solution is very slow in practice. None
of the solutions presented in this thread is nearly as fast as the
print file("filename", "rb").read().count("\x00\x00\x01\x00")
/David
On Sat, 29 Oct 2005 21:08:09 +0000, Tim Roberts wrote:
In any case, you are assuming that Python will automagically close the file when you are done.
Nonsense. This behavior is deterministic. At the end of that line, the anonymous file object goes out of scope, the object is deleted, and the file is closed.
That is an implementation detail. CPython may do that, but JPython does
not -- or at least it did not last time I looked. JPython doesn't
guarantee that the file will be closed at any particular time, just that
it will be closed eventually.
If all goes well. What if you have a circular dependence and the file
reference never gets garbage-collected? What happens if the JPython
process gets killed before the file is closed? You might not care about
one file not being released, but what if it is hundreds of files?
In general, it is best practice to release external resources as soon as
you're done with them, and not rely on a garbage collector which may or
may not release them in a timely manner.
There are circumstances where things do not go well and the file never
gets closed cleanly -- for example, when your disk is full, and the
buffer is only written to disk when you close the file. Would you
prefer that error to raise an exception, or to pass silently? If you want
close exceptions to pass silently, then by all means rely on the garbage
collector to close the file.
You might not care about these details in a short script -- when I'm
writing a use-once-and-throw-away script, that's what I do. But it isn't
best practice: explicit is better than implicit.
I should also point out that for really serious work, the idiom:
f = file("parrot")
handle(f)
f.close()
is insufficiently robust for production level code. That was a detail I
didn't think I needed to drop on the original newbie poster, but depending
on how paranoid you are, or how many exceptions you want to insulate the
user from, something like this might be needed:
try:
    f = file("parrot")
    try:
        handle(f)
    finally:
        try:
            f.close()
        except:
            print "The file could not be closed; see your sys admin."
except:
    print "The file could not be opened."
--
Steven.
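PEP 343, mentioned earlier in the thread, later became the `with` statement (available from Python 2.5 and in all of Python 3), which closes the file when the block exits, whether normally or through an exception, without relying on the garbage collector. A minimal Python 3 sketch of the one-liner written that way follows; the function name `count_in_file` and the throwaway demo file are assumptions made for this example.

```python
import os
import tempfile

PATTERN = b"\x00\x00\x01\x00"


def count_in_file(path, pattern=PATTERN):
    """Read the whole file and count the pattern, closing the file explicitly."""
    # The with block closes f on exit, even if read() raises.
    with open(path, "rb") as f:
        return f.read().count(pattern)


# Demonstration on a temporary file (name and contents made up for illustration).
fd, demo_path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as out:
        out.write(b"ab" + PATTERN + b"cd" + PATTERN)
    demo_count = count_in_file(demo_path)
finally:
    os.remove(demo_path)
print(demo_count)
```

This keeps the speed of the read-and-count approach while behaving the same on implementations that do not close files as soon as the last reference disappears.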
"Paul Watson" <pw*****@redlinepy.com> writes:
"Mike Meyer" <mw*@mired.org> wrote in message news:86************@bhuda.mired.org...
"Paul Watson" <pw*****@redlinepy.com> writes:
...
Did you do timings on it vs. mmap? Having to copy the data multiple times to deal with the overlap - thanks to strings being immutable - would seem to be a lose, and makes me wonder how it could be faster than mmap in general.
The only thing copied is a string one byte less than the search string for each block.
Um - you removed the code, but I could have *sworn* that it did
something like:
buf = buf[testlen:] + f.read(bufsize - testlen)
which should cause the creation of three strings: the last few
bytes of the old buffer, a new bufferful from the read, then the sum
of those two - created by copying the first two into a new string. So
you wind up copying all the data.
Which, as you showed, doesn't take nearly as much time as using mmap.
Thanks,
<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
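For anyone who wants to repeat the mmap comparison discussed in this thread, here is a minimal Python 3 sketch of an mmap-based count. It is not the code that was timed above, and it makes no claim about relative speed; the function name `count_with_mmap` and the demo file are assumptions made for this example. It uses `mmap.find` in a loop and skips past each match, so it counts non-overlapping occurrences like `count` does.

```python
import mmap
import os
import tempfile

PATTERN = b"\x00\x00\x01\x00"


def count_with_mmap(path, pattern=PATTERN):
    """Count non-overlapping occurrences of pattern in a memory-mapped file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(pattern)
            while pos != -1:
                count += 1
                pos = mm.find(pattern, pos + len(pattern))
            return count


# Demonstration on a temporary file (name and contents made up for illustration).
fd, mmap_demo_path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as out:
        out.write(b"ab" + PATTERN + b"cd" + PATTERN)
    mmap_demo_count = count_with_mmap(mmap_demo_path)
finally:
    os.remove(mmap_demo_path)
print(mmap_demo_count)
```

Timing this against the block-wise and read-everything versions on your own files is the only reliable way to settle which is faster on a given system.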