Why does StringIO discard its initial value?

Leif K-Brooks

When StringIO gets an initial value passed to its constructor, it seems
to discard it after the first call to .write(). For instance:

>>> from StringIO import StringIO
>>> buffer = StringIO('foo')
>>> buffer.getvalue()
'foo'
>>> buffer.write('bar')
>>> buffer.getvalue()
'bar'
>>> buffer.write('baz')
>>> buffer.getvalue()
'barbaz'

The obvious workaround is to call buffer.write() with the initial value
instead of passing it to StringIO's constructor, so this issue doesn't
bother me very much, but I'm still curious about it. Is this the
expected behavior, and why isn't it mentioned in the docs if so?

Jul 18 '05 #1


jepler

Maybe this short interactive session can give you an idea why.

>>> from StringIO import StringIO
>>> b = StringIO("123456789")
>>> b.tell()
0
>>> b.write("abc")
>>> b.getvalue()
'abc456789'
>>> b.tell()
3

StringIO seems to operate like a file opened with "r+" (If I've got my modes
right): it is opened for reading and writing, and positioned at the beginning.
In my example, the write of 3 bytes overwrites the first 3 bytes of the file
and leaves the rest intact. In your example your first write overwrote the
whole initial contents of the file, so you couldn't notice this effect.
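
A minimal sketch of the same effect, assuming Python 3, where io.StringIO replaces the old StringIO module (the variable names are only illustrative). It also shows that seeking to the end before writing keeps the initial value:

```python
import io

# Assumption: Python 3, where io.StringIO replaces the old StringIO module.
buf = io.StringIO("123456789")
start = buf.tell()            # the stream starts positioned at the beginning
buf.write("abc")              # overwrites the first three characters
overwritten = buf.getvalue()  # the rest of the initial value is left intact

# To keep the initial value and append, seek to the end before writing.
buf2 = io.StringIO("foo")
buf2.seek(0, io.SEEK_END)
buf2.write("bar")
appended = buf2.getvalue()
```

Here overwritten is 'abc456789' and appended is 'foobar', which matches the "r+" reading of the behavior.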

Jeff


Jul 18 '05 #2

Raymond Hettinger

[Leif K-Brooks]

> The obvious workaround is to call buffer.write() with the initial value
> instead of passing it to StringIO's constructor,

More than just a workaround, it is the preferred approach.
That makes it easier to switch to cStringIO, where initialized objects are
read-only.

> Is this the expected behavior

Yes.

> , and why isn't it mentioned in the docs if so?

Per your request, the docs have been updated.

Raymond Hettinger

Jul 18 '05 #3

David Fraser

Raymond Hettinger wrote:

> [Leif K-Brooks]
>> The obvious workaround is to call buffer.write() with the initial value
>> instead of passing it to StringIO's constructor,
>
> More than just a workaround, it is the preferred approach.
> That makes it easier to switch to cStringIO, where initialized objects are
> read-only.

Others may find this helpful; it's a pure Python wrapper for cStringIO
that makes it behave like StringIO in not having initialized objects
read-only. Would it be an idea to extend cStringIO like this in the
standard library? It shouldn't lose performance if used like a standard
cStringIO, but it prevents frustration :-)

David

import cStringIO


class StringIO:
    """Read/write wrapper around cStringIO that accepts an initial value."""

    def __init__(self, buf=''):
        if not isinstance(buf, (str, unicode)):
            buf = str(buf)
        self.len = len(buf)
        # Create an empty (writable) cStringIO and write the initial value
        # into it; cStringIO.StringIO(buf) would give a read-only object.
        self.buf = cStringIO.StringIO()
        self.buf.write(buf)
        self.buf.seek(0)
        self.pos = 0
        self.closed = 0

    def __iter__(self):
        return self

    def next(self):
        if self.closed:
            raise StopIteration
        r = self.readline()
        if not r:
            raise StopIteration
        return r

    def close(self):
        """Free the memory buffer."""
        if not self.closed:
            self.closed = 1
            del self.buf, self.pos

    def isatty(self):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        return False

    def seek(self, pos, mode=0):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        self.buf.seek(pos, mode)
        self.pos = self.buf.tell()

    def tell(self):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        return self.pos

    def read(self, n=None):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        if n is None:
            r = self.buf.read()
        else:
            r = self.buf.read(n)
        self.pos = self.buf.tell()
        return r

    def readline(self, length=None):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        if length is not None:
            r = self.buf.readline(length)
        else:
            # Do not pass None through to cStringIO; call without a limit.
            r = self.buf.readline()
        self.pos = self.buf.tell()
        return r

    def readlines(self):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        lines = self.buf.readlines()
        self.pos = self.buf.tell()
        return lines

    def truncate(self, size=None):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        # cStringIO expects an integer size, so omit the argument when
        # no size was given.
        if size is None:
            self.buf.truncate()
        else:
            self.buf.truncate(size)
        self.pos = self.buf.tell()
        # Recompute the total length, then restore the position.
        self.buf.seek(0, 2)
        self.len = self.buf.tell()
        self.buf.seek(self.pos)

    def write(self, s):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        origpos = self.buf.tell()
        self.buf.write(s)
        self.pos = self.buf.tell()
        # Only recompute the length if the write went past the old end.
        if origpos + len(s) > self.len:
            self.buf.seek(0, 2)
            self.len = self.buf.tell()
            self.buf.seek(self.pos)

    def writelines(self, lines):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        self.buf.writelines(lines)
        self.pos = self.buf.tell()
        self.buf.seek(0, 2)
        self.len = self.buf.tell()
        self.buf.seek(self.pos)

    def flush(self):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        self.buf.flush()

    def getvalue(self):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        return self.buf.getvalue()

Jul 18 '05 #4

Raymond Hettinger

[David Fraser]

> Others may find this helpful; it's a pure Python wrapper for cStringIO
> that makes it behave like StringIO in not having initialized objects
> read-only. Would it be an idea to extend cStringIO like this in the
> standard library? It shouldn't lose performance if used like a standard
> cStringIO, but it prevents frustration :-)

IMO, that would be a step backwards. Initializing the object and then
writing to it is not a good practice. The cStringIO API needs to be as
file-like as possible. With files, we create an empty object and then
start writing (the append mode for existing files is a different story).
Good code ought to maintain that parallelism so that it is easier to
substitute a real file for a writeable cStringIO object.
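
A rough sketch of that parallelism, assuming Python 3's io.StringIO in place of cStringIO; write_report is a hypothetical helper, not part of any library. The same function writes to an in-memory object or to a real file, because both start empty and are only written to:

```python
import io
import os
import tempfile


def write_report(out, lines):
    """Write each line to any writable file-like object (hypothetical helper)."""
    for line in lines:
        out.write(line + "\n")


lines = ["alpha", "beta"]

# In-memory target: create it empty, then write.
mem = io.StringIO()
write_report(mem, lines)
mem_text = mem.getvalue()

# Real file target: the same call, nothing else changes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")
    with open(path, "w") as f:
        write_report(f, lines)
    with open(path) as f:
        file_text = f.read()
```

Both mem_text and file_text end up with the same contents, so the caller can swap one target for the other without touching write_report.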

This whole thread (except for the documentation issue which has been
fixed) is about fighting the API rather than letting it be a guide to good
code.

If there were something wrong with the API, Guido would have long
since fired up the time machine and changed the timeline so that all
would be as right as rain ;-)

Raymond Hettinger

Jul 18 '05 #5

David Fraser

Raymond Hettinger wrote:

> [David Fraser]
>> Others may find this helpful; it's a pure Python wrapper for cStringIO
>> that makes it behave like StringIO in not having initialized objects
>> read-only. Would it be an idea to extend cStringIO like this in the
>> standard library? It shouldn't lose performance if used like a standard
>> cStringIO, but it prevents frustration :-)
>
> IMO, that would be a step backwards. Initializing the object and then
> writing to it is not a good practice. The cStringIO API needs to be as
> file-like as possible. With files, we create an empty object and then
> start writing (the append mode for existing files is a different story).
> Good code ought to maintain that parallelism so that it is easier to
> substitute a real file for a writeable cStringIO object.
>
> This whole thread (except for the documentation issue which has been
> fixed) is about fighting the API rather than letting it be a guide to good
> code.
>
> If there were something wrong with the API, Guido would have long
> since fired up the time machine and changed the timeline so that all
> would be as right as rain ;-)

But surely the whole point of files is that you can do more than either
creating a new file or appending to an existing one (seek, write?)

The reason I wrote this was to enable manipulating zip files inside zip
files, in memory. This is on translate.sourceforge.net - I wanted to
manipulate Mozilla XPI files, and replace file contents etc. within the
XPI. The XPI files are zip format that contains jars inside (also zip
format). I needed to alter the contents of files within the inner zip files.

The zip classes in Python can handle adding files but not replacing
them. And cStringIO has the read-only limitation described above.

So I created extensions to the zipfile.ZipFile class that allow it to
delete existing files, and add them again with new contents (thus
replacing them).
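
For readers who only need the effect and not the in-place behavior, here is a hedged sketch in Python 3 using only the standard zipfile and io modules. It is not the ZipFile extension described above: instead of deleting a member in place, it copies every other member into a fresh in-memory archive. The helper names and the file names (chrome/en.jar and so on) are made up for illustration:

```python
import io
import zipfile


def make_zip(files):
    """Build an in-memory zip from a {name: bytes} dict (hypothetical helper)."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as zf:
        for fname, data in files.items():
            zf.writestr(fname, data)
    return out.getvalue()


def replace_member(zip_bytes, name, new_data):
    """Return new archive bytes with member `name` replaced by `new_data`.

    Hypothetical helper: copies all other members into a new archive
    rather than modifying the existing one in place.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename != name:
                dst.writestr(info, src.read(info.filename))
        dst.writestr(name, new_data)
    return out.getvalue()


# A zip inside a zip, entirely in memory (names are illustrative only).
inner = make_zip({"locale/en.dtd": b"old text"})
outer = make_zip({"chrome/en.jar": inner, "install.rdf": b"meta"})

# Replace a file in the inner archive, then put the inner archive back.
with zipfile.ZipFile(io.BytesIO(outer)) as zf:
    jar = zf.read("chrome/en.jar")
new_jar = replace_member(jar, "locale/en.dtd", b"new text")
new_outer = replace_member(outer, "chrome/en.jar", new_jar)
```

The trade-off is that every replacement rewrites the whole archive, which is simpler than true in-place deletion but does more copying.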

And I created wStringIO so that I could do this all in place on the
existing zip files.

This all required some extra hacking because of the dual-layer zip files.

But all this as far as I see would have been really tricky using the
existing zipfile and cStringIO classes, which both assume (conceptually)
that files are either readable or new or merely appendable (for zipfile).

The problem for me was not that cStringIO classes are too similar to
files, it was that they are too dissimilar. All of this would work with
either StringIO (but too slow) or real files (but I needed it in memory
because of the zipfiles being inside other zip files).

Am I missing something?

David

Jul 19 '05 #6
