Canonical way of dealing with null-separated lines?

Douglas Alan

Is there a canonical way of iterating over the lines of a file that
are null-separated rather than newline-separated? Sure, I can
implement my own iterator using read() and split(), etc., but
considering that using "find -print0" is so common, it seems like
there should be a more canonical way.

|>oug

Jul 18 '05 #1

Christopher De Vries

On Wed, Feb 23, 2005 at 10:54:50PM -0500, Douglas Alan wrote:

Is there a canonical way of iterating over the lines of a file that
are null-separated rather than newline-separated?

I'm not sure if there is a canonical method, but I would recommend using a
generator to get something like this, where 'f' is a file object:

def readnullsep(f):
    # Need a place to put potential pieces of a null separated string
    # across buffer boundaries
    retain = []

    while True:
        instr = f.read(2048)
        if len(instr)==0:
            # End of file
            break

        # Split over nulls
        splitstr = instr.split('\0')

        # Combine with anything left over from previous read
        retain.append(splitstr[0])
        splitstr[0] = ''.join(retain)

        # Keep last piece for next loop and yield the rest
        retain = [splitstr[-1]]
        for element in splitstr[:-1]:
            yield element

    # yield anything left over
    yield retain[0]

Chris

Jul 18 '05 #2

Scott David Daniels

Douglas Alan wrote:

Is there a canonical way of iterating over the lines of a file that
are null-separated rather than newline-separated? Sure, I can
implement my own iterator using read() and split(), etc., but
considering that using "find -print0" is so common, it seems like
there should be a more canonical way.

You could start with this code and add '\0' as a line terminator:

http://members.dsl-only.net/~daniels/ilines.html
--Scott David Daniels
Sc***********@Acm.Org

Jul 18 '05 #3

Douglas Alan

Christopher De Vries <de*****@idolstarastronomer.com> writes:

I'm not sure if there is a canonical method, but I would
recommend using a generator to get something like this, where 'f'
is a file object:

Thanks for the generator. It returns an extra blank line at the end
when used with "find -print0", which is probably not ideal, and is
also not how the normal file line iterator behaves. But don't worry
-- I can fix it.

In any case, as a suggestion to whoever it is that arranges for
stuff to be put into the standard library, there should be something
like this there, so everyone doesn't have to reinvent the wheel (even
if it's an easy wheel to reinvent) for something that any sysadmin
(and many other users) would want to do on practically a daily basis.

|>oug

Jul 18 '05 #4
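
The extra blank line follows from how split() treats a trailing separator. A small illustration, in Python 3 syntax with made-up file names (not from the thread):

```python
# "find -print0" terminates every name with a null, including the last one.
data = "a.txt\0b.txt\0"

# split() treats the null as a separator, so the terminator after the
# last name produces one trailing empty string.
pieces = data.split("\0")
print(pieces)   # ['a.txt', 'b.txt', '']

# The built-in line handling treats "\n" as a terminator instead,
# so no empty final line appears.
lines = "a.txt\nb.txt\n".splitlines()
print(lines)    # ['a.txt', 'b.txt']
```

Any generator built on split() therefore has to drop that final empty piece if it is meant to behave like the normal file line iterator.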

Scott David Daniels

Douglas Alan wrote:
....

In any case, as a suggestion to whoever it is that arranges for
stuff to be put into the standard library, there should be something
like this there, so everyone doesn't have to reinvent the wheel (even
if it's an easy wheel to reinvent) for something that any sysadmin
(and many other users) would want to do on practically a daily basis.

The general model is that you produce a module, and if it gains an
audience and a stable interface, inclusion might be considered. I'd
suggest you put up a recipe at ActiveState.

--Scott David Daniels
Sc***********@Acm.Org

Jul 18 '05 #5

Christopher De Vries

On Thu, Feb 24, 2005 at 02:03:52PM -0500, Douglas Alan wrote:

Thanks for the generator. It returns an extra blank line at the end
when used with "find -print0", which is probably not ideal, and is
also not how the normal file line iterator behaves. But don't worry
-- I can fix it.

Sorry... I forgot to try it with a null-terminated string. I guess it further
illustrates the power of writing good test cases. Something like this would
help:

    # yield anything left over
    if retain[0]:
        yield retain[0]

The other modification would be an option to ignore multiple nulls in a row,
rather than returning empty strings, which could be done in a similar way.

Chris

Jul 18 '05 #6

John Machin

On Thu, 24 Feb 2005 11:53:32 -0500, Christopher De Vries
<de*****@idolstarastronomer.com> wrote:

On Wed, Feb 23, 2005 at 10:54:50PM -0500, Douglas Alan wrote:

Is there a canonical way of iterating over the lines of a file that
are null-separated rather than newline-separated?

I'm not sure if there is a canonical method, but I would recommend using a
generator to get something like this, where 'f' is a file object:

def readnullsep(f):
    # Need a place to put potential pieces of a null separated string
    # across buffer boundaries
    retain = []

    while True:
        instr = f.read(2048)
        if len(instr)==0:
            # End of file
            break

        # Split over nulls
        splitstr = instr.split('\0')

        # Combine with anything left over from previous read
        retain.append(splitstr[0])
        splitstr[0] = ''.join(retain)

        # Keep last piece for next loop and yield the rest
        retain = [splitstr[-1]]
        for element in splitstr[:-1]:

(1) Inefficient (copies all but the last element of splitstr)

            yield element

    # yield anything left over
    yield retain[0]

(2) Dies when the input file is empty.

(3) As noted by the OP, can return a spurious empty line at the end.

Try this:

def readweird(f, line_end='\0', bufsiz=8192):
    retain = ''
    while True:
        instr = f.read(bufsiz)
        if not instr:
            # End of file
            break
        splitstr = instr.split(line_end)
        if splitstr[-1]:
            # last piece not terminated
            if retain:
                splitstr[0] = retain + splitstr[0]
            retain = splitstr.pop()
        else:
            if retain:
                splitstr[0] = retain + splitstr[0]
            retain = ''
            del splitstr[-1]
        for element in splitstr:
            yield element
    if retain:
        yield retain

Cheers,
John

Jul 18 '05 #7

John Machin

On Thu, 24 Feb 2005 14:51:07 -0500, Christopher De Vries
<de*****@idolstarastronomer.com> wrote:

The other modification would be an option to ignore multiple nulls in a row,
rather than returning empty strings, which could be done in a similar way.

Why not leave this to the caller? Efficiency?? Filtering out empty
lines is the least of your worries.

Try giving the callers options to do things they *can't* do
themselves, like a different line-terminator or a buffer size > 2048
[which could well enhance efficiency] or < 10 [which definitely
enhances testing].

Jul 18 '05 #8
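
A sketch of both suggestions, in Python 3 syntax: the caller drops empty records with a one-line filter, and a tiny buffer size forces records to straddle buffer boundaries during testing. The generator here is only a compact stand-in with the same line_end and bufsiz parameters as readweird, and the sample data is illustrative, not from the thread.

```python
import io

def readweird(f, line_end='\0', bufsiz=8192):
    """Yield line_end-terminated records from text file object f."""
    retain = ''
    while True:
        instr = f.read(bufsiz)
        if not instr:
            break
        pieces = (retain + instr).split(line_end)
        retain = pieces.pop()      # possibly incomplete last record
        for piece in pieces:
            yield piece
    if retain:
        yield retain

data = 'alpha\0\0beta\0gamma\0'
expected = ['alpha', '', 'beta', 'gamma']

# Buffer sizes below 10 make almost every record cross a read boundary.
for bufsiz in range(1, 10):
    got = list(readweird(io.StringIO(data), bufsiz=bufsiz))
    assert got == expected, (bufsiz, got)

# Skipping empty records is left to the caller.
non_empty = [rec for rec in readweird(io.StringIO(data), bufsiz=4) if rec]
print(non_empty)   # ['alpha', 'beta', 'gamma']
```

Keeping the filter outside the generator means callers who do care about empty records (two nulls in a row) still see them.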

Christopher De Vries

On Fri, Feb 25, 2005 at 07:56:49AM +1100, John Machin wrote:

Try this:
def readweird(f, line_end='\0', bufsiz=8192):
    retain = ''
    while True:
        instr = f.read(bufsiz)
        if not instr:
            # End of file
            break
        splitstr = instr.split(line_end)
        if splitstr[-1]:
            # last piece not terminated
            if retain:
                splitstr[0] = retain + splitstr[0]
            retain = splitstr.pop()
        else:
            if retain:
                splitstr[0] = retain + splitstr[0]
            retain = ''
            del splitstr[-1]
        for element in splitstr:
            yield element
    if retain:
        yield retain

I think this is a definite improvement... especially putting the buffer size
and line terminators as optional arguments, and handling empty files. I think,
however, that the if splitstr[-1]: ... else: ... clauses aren't necessary, so I
would probably reduce it to this:

def readweird(f, line_end='\0', bufsiz=8192):
    retain = ''
    while True:
        instr = f.read(bufsiz)
        if not instr:
            # End of file
            break
        splitstr = instr.split(line_end)
        if retain:
            splitstr[0] = retain + splitstr[0]
        retain = splitstr.pop()
        for element in splitstr:
            yield element
    if retain:
        yield retain

Popping off that last member and then iterating over the rest of the list as
you suggested is so much more efficient, and it looks a lot better.

Chris

Jul 18 '05 #9
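
For reference, a sketch of how the reduced generator might be adapted to Python 3, where a file opened in binary mode returns bytes, so the separator and the retained fragment must be bytes too. The name read_null_separated and the sample data are assumptions made for illustration; io.BytesIO stands in for the output of "find -print0".

```python
import io
import os

def read_null_separated(f, line_end=b'\0', bufsiz=8192):
    """Yield records from binary file object f, split on line_end."""
    retain = b''
    while True:
        instr = f.read(bufsiz)
        if not instr:
            # End of file
            break
        splitstr = instr.split(line_end)
        if retain:
            splitstr[0] = retain + splitstr[0]
        # The last piece is incomplete, or empty if instr ended on line_end.
        retain = splitstr.pop()
        for element in splitstr:
            yield element
    if retain:
        yield retain

# Stand-in for the bytes produced by: find . -print0
fake_find_output = io.BytesIO(b'./a.txt\0./dir/b c.txt\0')

# os.fsdecode converts each raw name to str using the filesystem encoding.
names = [os.fsdecode(raw) for raw in read_null_separated(fake_find_output)]
print(names)   # ['./a.txt', './dir/b c.txt']
```

Like the version it is based on, this assumes a single-byte separator; a multi-byte line_end that happens to straddle two reads would not be recognized.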

John Machin

On Thu, 24 Feb 2005 16:51:22 -0500, Christopher De Vries
<de*****@idolstarastronomer.com> wrote:

[snip]

I think this is a definite improvement... especially putting the buffer size
and line terminators as optional arguments, and handling empty files. I think,
however, that the if splitstr[-1]: ... else: ... clauses aren't necessary,

Indeed. Any efficiency gain would be negated by the if test and it's
only once per buffer-full anyway. I left all that stuff in to show
that I had actually analyzed the four cases, i.e. it wasn't arrived at
by lucky accident.

so I would probably reduce it to this:
[snip]
Popping off that last member and then iterating over the rest of the list as
you suggested is so much more efficient, and it looks a lot better.

Yeah. If it looks like a warthog, it is a warthog. The converse is of
course not true; examples of elegant insufficiency abound.

Cheers,
John

Jul 18 '05 #10
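
The four cases are not spelled out in the thread. One plausible reading is that they are the combinations of (fragment left over from the previous read: empty or not) and (current buffer: ends with the terminator or not). Under that assumption, the per-buffer step of the reduced version can be checked case by case. The helper step is hypothetical, written only for this sketch in Python 3 syntax.

```python
def step(retain, instr, line_end='\0'):
    """One buffer's worth of work from the reduced readweird.

    Returns (complete_records, new_retain).
    """
    splitstr = instr.split(line_end)
    if retain:
        splitstr[0] = retain + splitstr[0]
    new_retain = splitstr.pop()
    return splitstr, new_retain

# 1. Nothing retained, buffer ends with the terminator.
assert step('', 'ab\0cd\0') == (['ab', 'cd'], '')
# 2. Nothing retained, buffer ends mid-record.
assert step('', 'ab\0cd') == (['ab'], 'cd')
# 3. Fragment retained, buffer ends with the terminator.
assert step('xy', 'ab\0cd\0') == (['xyab', 'cd'], '')
# 4. Fragment retained, buffer ends mid-record.
assert step('xy', 'ab\0cd') == (['xyab'], 'cd')
# A buffer with no terminator at all just extends the fragment.
assert step('xy', 'ab') == ([], 'xyab')
```

All four combinations come out right without the if/else branch, which is consistent with the simplification agreed on above.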
