Is there a canonical way of iterating over the lines of a file that
are null-separated rather than newline-separated? Sure, I can
implement my own iterator using read() and split(), etc., but
considering that using "find -print0" is so common, it seems like
there should be a more canonical way.
|>oug
On Wed, Feb 23, 2005 at 10:54:50PM -0500, Douglas Alan wrote:
Is there a canonical way of iterating over the lines of a file that are null-separated rather than newline-separated?
I'm not sure if there is a canonical method, but I would recommend using a
generator to get something like this, where 'f' is a file object:
def readnullsep(f):
    # Need a place to put potential pieces of a null separated string
    # across buffer boundaries
    retain = []
    while True:
        instr = f.read(2048)
        if len(instr)==0:
            # End of file
            break
        # Split over nulls
        splitstr = instr.split('\0')
        # Combine with anything left over from previous read
        retain.append(splitstr[0])
        splitstr[0] = ''.join(retain)
        # Keep last piece for next loop and yield the rest
        retain = [splitstr[-1]]
        for element in splitstr[:-1]:
            yield element
    # yield anything left over
    yield retain[0]
Chris
Douglas Alan wrote:
Is there a canonical way of iterating over the lines of a file that are null-separated rather than newline-separated? Sure, I can implement my own iterator using read() and split(), etc., but considering that using "find -print0" is so common, it seems like there should be a more canonical way.
You could start with this code and add '\0' as a line terminator: http://members.dsl-only.net/~daniels/ilines.html
--Scott David Daniels Sc***********@Acm.Org
Christopher De Vries <de*****@idolstarastronomer.com> writes:
I'm not sure if there is a canonical method, but I would recommend using a generator to get something like this, where 'f' is a file object:
Thanks for the generator. It returns an extra blank line at the end
when used with "find -print0", which is probably not ideal, and is
also not how the normal file line iterator behaves. But don't worry
-- I can fix it.
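The extra empty item seems to come from how str.split() treats a trailing separator. A minimal illustration, where the file names are made-up sample data shaped like "find -print0" output:

```python
# Sample data shaped like `find -print0` output: every name ends with a NUL.
data = "a.txt\0b.txt\0"

pieces = data.split("\0")
# The trailing NUL leaves an empty string after the last separator,
# which a generator that yields the final piece unconditionally will emit.
print(pieces)  # ['a.txt', 'b.txt', '']
```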
In any case, as a suggestion to whoever it is that arranges for
stuff to be put into the standard library, there should be something
like this there, so everyone doesn't have to reinvent the wheel (even
if it's an easy wheel to reinvent) for something that any sysadmin
(and many other users) would want to do on practically a daily basis.
|>oug
Douglas Alan wrote:
.... In any case, as a suggestion to whoever it is that arranges for stuff to be put into the standard library, there should be something like this there, so everyone doesn't have to reinvent the wheel (even if it's an easy wheel to reinvent) for something that any sysadmin (and many other users) would want to do on practically a daily basis.
The general model is that you produce a module, and if it gains an
audience and a stable interface, inclusion might be considered. I'd
suggest you put up a recipe at ActiveState.
--Scott David Daniels Sc***********@Acm.Org
On Thu, Feb 24, 2005 at 02:03:52PM -0500, Douglas Alan wrote:
Thanks for the generator. It returns an extra blank line at the end when used with "find -print0", which is probably not ideal, and is also not how the normal file line iterator behaves. But don't worry -- I can fix it.
Sorry... I forgot to try it with a null-terminated string. I guess it further
illustrates the power of writing good test cases. Something like this would
help:
    # yield anything left over
    if retain[0]:
        yield retain[0]
The other modification would be an option to ignore multiple nulls in a row,
rather than returning empty strings, which could be done in a similar way.
Chris
On Thu, 24 Feb 2005 11:53:32 -0500, Christopher De Vries
<de*****@idolstarastronomer.com> wrote:
On Wed, Feb 23, 2005 at 10:54:50PM -0500, Douglas Alan wrote:
Is there a canonical way of iterating over the lines of a file that are null-separated rather than newline-separated?
I'm not sure if there is a canonical method, but I would recommend using a generator to get something like this, where 'f' is a file object:
def readnullsep(f):
    # Need a place to put potential pieces of a null separated string
    # across buffer boundaries
    retain = []
    while True:
        instr = f.read(2048)
        if len(instr)==0:
            # End of file
            break
        # Split over nulls
        splitstr = instr.split('\0')
        # Combine with anything left over from previous read
        retain.append(splitstr[0])
        splitstr[0] = ''.join(retain)
        # Keep last piece for next loop and yield the rest
        retain = [splitstr[-1]]
        for element in splitstr[:-1]:
(1) Inefficient (copies all but the last element of splitstr)
            yield element
    # yield anything left over
    yield retain[0]
(2) Dies when the input file is empty.
(3) As noted by the OP, can return a spurious empty line at the end.
Try this:
!def readweird(f, line_end='\0', bufsiz=8192):
!    retain = ''
!    while True:
!        instr = f.read(bufsiz)
!        if not instr:
!            # End of file
!            break
!        splitstr = instr.split(line_end)
!        if splitstr[-1]:
!            # last piece not terminated
!            if retain:
!                splitstr[0] = retain + splitstr[0]
!            retain = splitstr.pop()
!        else:
!            if retain:
!                splitstr[0] = retain + splitstr[0]
!                retain = ''
!            del splitstr[-1]
!        for element in splitstr:
!            yield element
!    if retain:
!        yield retain
Cheers,
John
On Thu, 24 Feb 2005 14:51:07 -0500, Christopher De Vries
<de*****@idolstarastronomer.com> wrote:
The other modification would be an option to ignore multiple nulls in a row, rather than returning empty strings, which could be done in a similar way.
Why not leave this to the caller? Efficiency?? Filtering out empty
lines is the least of your worries.
Try giving the callers options to do things they *can't* do
themselves, like a different line-terminator or a buffer size > 2048
[which could well enhance efficiency] or < 10 [which definitely
enhances testing]
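Leaving the filtering to the caller can be sketched as follows. This is only an illustration: read_records is a hypothetical stand-in for the generator discussed in this thread, written for Python 3, and the input is made-up sample data with two NULs in a row.

```python
import io

def read_records(f, line_end="\0", bufsiz=8192):
    """Hypothetical stand-in for the generator discussed in this thread."""
    retain = ""
    while True:
        chunk = f.read(bufsiz)
        if not chunk:
            # End of file
            break
        # Prepend the leftover from the previous read, then split.
        pieces = (retain + chunk).split(line_end)
        # The last piece may be incomplete; keep it for the next read.
        retain = pieces.pop()
        for piece in pieces:
            yield piece
    if retain:
        yield retain

# Caller-side filtering: drop empty records without any help from the generator.
f = io.StringIO("a\0\0b\0")
non_empty = [rec for rec in read_records(f, bufsiz=3) if rec]
print(non_empty)  # ['a', 'b']
```

A tiny bufsiz like 3 is used here only to force records to straddle buffer boundaries, in the spirit of the testing remark above.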
On Fri, Feb 25, 2005 at 07:56:49AM +1100, John Machin wrote:
Try this:
!def readweird(f, line_end='\0', bufsiz=8192):
!    retain = ''
!    while True:
!        instr = f.read(bufsiz)
!        if not instr:
!            # End of file
!            break
!        splitstr = instr.split(line_end)
!        if splitstr[-1]:
!            # last piece not terminated
!            if retain:
!                splitstr[0] = retain + splitstr[0]
!            retain = splitstr.pop()
!        else:
!            if retain:
!                splitstr[0] = retain + splitstr[0]
!                retain = ''
!            del splitstr[-1]
!        for element in splitstr:
!            yield element
!    if retain:
!        yield retain
I think this is a definite improvement... especially putting the buffer size
and line terminators as optional arguments, and handling empty files. I think,
however, that the if splitstr[-1]: ... else: ... clauses aren't necessary, so I
would probably reduce it to this:
!def readweird(f, line_end='\0', bufsiz=8192):
!    retain = ''
!    while True:
!        instr = f.read(bufsiz)
!        if not instr:
!            # End of file
!            break
!        splitstr = instr.split(line_end)
!        if retain:
!            splitstr[0] = retain + splitstr[0]
!        retain = splitstr.pop()
!        for element in splitstr:
!            yield element
!    if retain:
!        yield retain
Popping off that last member and then iterating over the rest of the list as
you suggested is so much more efficient, and it looks a lot better.
Chris
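For readers on Python 3, the reduced version above might be adapted roughly like this. It is a sketch, not the posters' code: it assumes the file is opened in binary mode so that records are bytes, and the name read_null_separated and the sample data are made up for illustration.

```python
import io

def read_null_separated(f, line_end=b"\0", bufsiz=8192):
    """Yield records from a binary file object, split on line_end.

    Assumes a single-byte separator such as NUL.
    """
    retain = b""
    while True:
        chunk = f.read(bufsiz)
        if not chunk:
            # End of file
            break
        pieces = chunk.split(line_end)
        if retain:
            # Join the leftover from the previous read to the first piece.
            pieces[0] = retain + pieces[0]
        # The last piece is either incomplete or empty; carry it over.
        retain = pieces.pop()
        yield from pieces
    if retain:
        # Final record had no terminator.
        yield retain

# In-memory stand-in for `find -print0` output, read 4 bytes at a time.
sample = io.BytesIO(b"a.txt\0dir/b.txt\0")
print(list(read_null_separated(sample, bufsiz=4)))  # [b'a.txt', b'dir/b.txt']
```

Working in bytes sidesteps decoding problems with unusual file names; callers who want text can decode each record themselves.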
On Thu, 24 Feb 2005 16:51:22 -0500, Christopher De Vries
<de*****@idolstarastronomer.com> wrote:
[snip] I think this is a definite improvement... especially putting the buffer size and line terminators as optional arguments, and handling empty files. I think, however that the if splitstr[-1]: ... else: ... clauses aren't necessary,
Indeed. Any efficiency gain would be negated by the if test and it's
only once per buffer-full anyway. I left all that stuff in to show
that I had actually analyzed the four cases i.e. it wasn't arrived at
by lucky accident.
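One way to check the cases mechanically (leftover present or absent, last piece terminated or not) is to run the reduced generator at every small buffer size and compare against a one-shot split. The sketch below assumes Python 3 and in-memory files; the helper name expected and the sample strings are made up for illustration.

```python
import io

def readweird(f, line_end="\0", bufsiz=8192):
    # Reduced version from earlier in the thread, reproduced for testing.
    retain = ""
    while True:
        instr = f.read(bufsiz)
        if not instr:
            break
        splitstr = instr.split(line_end)
        if retain:
            splitstr[0] = retain + splitstr[0]
        retain = splitstr.pop()
        for element in splitstr:
            yield element
    if retain:
        yield retain

def expected(data, line_end="\0"):
    # Reference result: split everything at once, drop one trailing empty piece.
    pieces = data.split(line_end)
    if not pieces[-1]:
        pieces.pop()
    return pieces

samples = ["", "a", "a\0", "a\0b", "a\0b\0", "\0", "a\0\0b\0", "abc\0defgh\0i"]
for data in samples:
    # Small buffer sizes force records to straddle read boundaries.
    for bufsiz in range(1, len(data) + 2):
        got = list(readweird(io.StringIO(data), bufsiz=bufsiz))
        assert got == expected(data), (data, bufsiz, got)
print("all cases agree")
```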
so I would probably reduce it to this:
[snip] Popping off that last member and then iterating over the rest of the list as you suggested is so much more efficient, and it looks a lot better.
Yeah. If it looks like a warthog, it is a warthog. The converse is of
course not true; examples of elegant insufficiency abound.
Cheers,
John