
Why does StringIO discard its initial value?

When StringIO gets an initial value passed to its constructor, it seems
to discard it after the first call to .write(). For instance:

    >>> from StringIO import StringIO
    >>> buffer = StringIO('foo')
    >>> buffer.getvalue()
    'foo'
    >>> buffer.write('bar')
    >>> buffer.getvalue()
    'bar'
    >>> buffer.write('baz')
    >>> buffer.getvalue()
    'barbaz'

The obvious workaround is to call buffer.write() with the initial value
instead of passing it to StringIO's constructor, so this issue doesn't
bother me very much, but I'm still curious about it. Is this the
expected behavior, and if so, why isn't it mentioned in the docs?
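A minimal sketch of that workaround, written against Python 3's io.StringIO rather than the Python 2 StringIO module used in this thread (an assumption; the position semantics shown here are the same): start with an empty buffer, write the initial value, and later writes continue after it.

```python
import io

# Python 3 sketch (assumption: the thread itself uses Python 2's StringIO).
buffer = io.StringIO()       # start empty, like a freshly created file
buffer.write("foo")          # write the "initial value" explicitly
buffer.write("bar")          # later writes continue after it
buffer.write("baz")
print(buffer.getvalue())     # foobarbaz

# For contrast: a constructor-supplied value leaves the position at 0,
# so the first write overwrites it.
overwritten = io.StringIO("foo")
overwritten.write("bar")
print(overwritten.getvalue())  # bar
```

To read the contents back from the beginning afterwards, call buffer.seek(0) first, just as with a file.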
Jul 18 '05 #1
Maybe this short interactive session can give you an idea why.

    >>> from StringIO import StringIO
    >>> b = StringIO("123456789")
    >>> b.tell()
    0
    >>> b.write("abc")
    >>> b.getvalue()
    'abc456789'
    >>> b.tell()
    3

StringIO seems to operate like a file opened with "r+" (if I've got my modes
right): it is opened for reading and writing, and positioned at the beginning.
In my example, the write of 3 bytes overwrites the first 3 bytes of the file
and leaves the rest intact. In your example, your first write overwrote the
whole initial contents of the file, so you couldn't notice this effect.
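The "r+" analogy can be checked directly. This sketch assumes Python 3 (io.StringIO, plus a temporary file from tempfile): it writes the same data through a real file opened with "r+" and through an initialized StringIO, then shows that seeking to the end first with seek(0, io.SEEK_END) keeps the initial value and appends instead.

```python
import io
import os
import tempfile

# Real file with known contents (Python 3 assumed throughout).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("123456789")

with open(path, "r+") as f:      # read/write, positioned at the start
    f.write("abc")               # overwrites the first three characters
    f.seek(0)
    file_result = f.read()
os.remove(path)

buf = io.StringIO("123456789")   # also read/write, positioned at 0
buf.write("abc")
memory_result = buf.getvalue()

# To keep the initial value and append, move to the end first.
appending = io.StringIO("123456789")
appending.seek(0, io.SEEK_END)
appending.write("abc")

print(file_result, memory_result, appending.getvalue())
# abc456789 abc456789 123456789abc
```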

Jeff

Jul 18 '05 #2
[Leif K-Brooks]
> The obvious workaround is to call buffer.write() with the initial value
> instead of passing it to StringIO's constructor,

More than just a workaround, it is the preferred approach.
That makes it easier to switch to cStringIO, where initialized objects are
read-only.

> Is this the
> expected behavior

Yes.

> , and if so, why isn't it mentioned in the docs?

Per your request, the docs have been updated.

Raymond Hettinger
Jul 18 '05 #3
Raymond Hettinger wrote:
> [Leif K-Brooks]
>> The obvious workaround is to call buffer.write() with the initial value
>> instead of passing it to StringIO's constructor,
>
> More than just a workaround, it is the preferred approach.
> That makes it easier to switch to cStringIO, where initialized objects are
> read-only.


Others may find this helpful; it's a pure Python wrapper for cStringIO
that makes it behave like StringIO in not having initialized objects
read-only. Would it be a good idea to extend cStringIO like this in the
standard library? It shouldn't lose performance if used like a standard
cStringIO, but it prevents frustration :-)

David

    import cStringIO

    class StringIO:
        """Pure Python wrapper for cStringIO (Python 2) that, like the
        StringIO module, stays writable when given an initial value."""

        def __init__(self, buf=''):
            if not isinstance(buf, (str, unicode)):
                buf = str(buf)
            self.len = len(buf)
            # An empty cStringIO.StringIO() is writable, unlike one created
            # with an initial value, so copy the value in and rewind.
            self.buf = cStringIO.StringIO()
            self.buf.write(buf)
            self.buf.seek(0)
            self.pos = 0
            self.closed = 0

        def __iter__(self):
            return self

        def next(self):
            if self.closed:
                raise StopIteration
            r = self.readline()
            if not r:
                raise StopIteration
            return r

        def close(self):
            """Free the memory buffer."""
            if not self.closed:
                self.closed = 1
                del self.buf, self.pos

        def isatty(self):
            if self.closed:
                raise ValueError("I/O operation on closed file")
            return False

        def seek(self, pos, mode=0):
            if self.closed:
                raise ValueError("I/O operation on closed file")
            self.buf.seek(pos, mode)
            self.pos = self.buf.tell()

        def tell(self):
            if self.closed:
                raise ValueError("I/O operation on closed file")
            return self.pos

        def read(self, n=None):
            if self.closed:
                raise ValueError("I/O operation on closed file")
            if n is None:
                r = self.buf.read()
            else:
                r = self.buf.read(n)
            self.pos = self.buf.tell()
            return r

        def readline(self, length=None):
            if self.closed:
                raise ValueError("I/O operation on closed file")
            if length is not None:
                r = self.buf.readline(length)
            else:
                # cStringIO's readline() does not accept None as a length.
                r = self.buf.readline()
            self.pos = self.buf.tell()
            return r

        def readlines(self):
            if self.closed:
                raise ValueError("I/O operation on closed file")
            lines = self.buf.readlines()
            self.pos = self.buf.tell()
            return lines

        def truncate(self, size=None):
            if self.closed:
                raise ValueError("I/O operation on closed file")
            # cStringIO's truncate() does not accept None either.
            if size is None:
                self.buf.truncate()
            else:
                self.buf.truncate(size)
            self.pos = self.buf.tell()
            # Re-measure the length, then restore the position.
            self.buf.seek(0, 2)
            self.len = self.buf.tell()
            self.buf.seek(self.pos)

        def write(self, s):
            if self.closed:
                raise ValueError("I/O operation on closed file")
            origpos = self.buf.tell()
            self.buf.write(s)
            self.pos = self.buf.tell()
            # Only re-measure the length if the write went past the old end.
            if origpos + len(s) > self.len:
                self.buf.seek(0, 2)
                self.len = self.buf.tell()
                self.buf.seek(self.pos)

        def writelines(self, lines):
            if self.closed:
                raise ValueError("I/O operation on closed file")
            self.buf.writelines(lines)
            self.pos = self.buf.tell()
            self.buf.seek(0, 2)
            self.len = self.buf.tell()
            self.buf.seek(self.pos)

        def flush(self):
            if self.closed:
                raise ValueError("I/O operation on closed file")
            self.buf.flush()

        def getvalue(self):
            if self.closed:
                raise ValueError("I/O operation on closed file")
            return self.buf.getvalue()
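For comparison, in Python 3 (which postdates this thread) io.StringIO and io.BytesIO remain writable when given an initial value, so a wrapper like the one above should not be needed there; only the file-like positioning remains to keep in mind. A short sketch of editing initial bytes in place with seek() and write():

```python
import io

# Python 3 sketch: an initialised BytesIO is readable *and* writable.
buf = io.BytesIO(b"0123456789")
buf.seek(3)                # move into the existing contents
buf.write(b"abc")          # overwrite three bytes in place
buf.seek(0, io.SEEK_END)   # jump to the end ...
buf.write(b"!")            # ... and append
buf.seek(0)                # rewind before reading, as with a file
data = buf.read()
print(data)                # b'012abc6789!'
```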
Jul 18 '05 #4
[David Fraser]
> Others may find this helpful; it's a pure Python wrapper for cStringIO
> that makes it behave like StringIO in not having initialized objects
> read-only. Would it be a good idea to extend cStringIO like this in the
> standard library? It shouldn't lose performance if used like a standard
> cStringIO, but it prevents frustration :-)


IMO, that would be a step backwards. Initializing the object and then
writing to it is not a good practice. The cStringIO API needs to be as
file-like as possible. With files, we create an empty object and then
start writing (the append mode for existing files is a different story).
Good code ought to maintain that parallelism so that it is easier to
substitute a real file for a writable cStringIO object.
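That parallelism can be sketched as a function that relies only on write(), so callers can hand it either a real file or an empty in-memory buffer. write_report and its rows argument are hypothetical names for illustration, and Python 3's io.StringIO stands in for cStringIO.

```python
import io
import os
import tempfile

def write_report(out, rows):
    """Hypothetical helper: uses only out.write(), so any file-like
    object opened for writing will do."""
    for name, value in rows:
        out.write(f"{name}: {value}\n")

rows = [("alpha", 1), ("beta", 2)]

# In memory: create an empty buffer and start writing, as with a new file.
mem = io.StringIO()
write_report(mem, rows)

# On disk: the same call, unchanged.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w") as f:
    write_report(f, rows)
with open(path) as f:
    on_disk = f.read()
os.remove(path)

print(mem.getvalue() == on_disk)  # True
```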

This whole thread (except for the documentation issue which has been
fixed) is about fighting the API rather than letting it be a guide to good
code.

If there were something wrong with the API, Guido would have long
since fired up the time machine and changed the timeline so that all
would be as right as rain ;-)
Raymond Hettinger
Jul 18 '05 #5
Raymond Hettinger wrote:
> IMO, that would be a step backwards. Initializing the object and then
> writing to it is not a good practice. The cStringIO API needs to be as
> file-like as possible. With files, we create an empty object and then
> start writing (the append mode for existing files is a different story).
> [...]

But surely the whole point of files is that you can do more than just
create a new file or append to an existing one (seek, write?)

The reason I wrote this was to enable manipulating zip files inside zip
files, in memory. This is for translate.sourceforge.net: I wanted to
manipulate Mozilla XPI files and replace file contents etc. within the
XPI. XPI files are in zip format and contain jars (also zip
format). I needed to alter the contents of files within the inner zip files.

The zip classes in Python can handle adding files but not replacing
them. And cStringIO, as described above, makes objects created with an
initial value read-only.

So I created extensions to the zipfile.ZipFile class that allow it to
delete existing files, and add them again with new contents (thus
replacing them).

And I created wStringIO so that I could do all this in place on the
existing zip files.

This all required some extra hacking because of the dual-layer zip files.
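As far as I know, the standard zipfile module still has no way to delete or replace a member in place. A different technique from David's zipfile extension, sketched here in Python 3 under that assumption, is to rebuild each archive in memory with io.BytesIO, copying every member and substituting the one being replaced; for a jar inside an XPI, rebuild the inner archive first and then the outer one. replace_member and replace_in_nested are hypothetical helper names, and this sketch does not preserve member timestamps or other metadata.

```python
import io
import zipfile

def replace_member(zip_bytes, name, new_data):
    """Return a new archive (as bytes) equal to zip_bytes except that
    member `name` holds new_data. The archive is rebuilt in memory."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            if info.filename == name:
                data = new_data
            else:
                data = src.read(info.filename)
            # Keep each member's original compression method.
            dst.writestr(info.filename, data, compress_type=info.compress_type)
    # dst is closed here, so its central directory has been written to out.
    return out.getvalue()

def replace_in_nested(outer_bytes, inner_name, member_name, new_data):
    """Replace member_name inside the zip stored as inner_name (e.g. a
    .jar inside an .xpi), then store the updated inner archive back."""
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        inner_bytes = outer.read(inner_name)
    new_inner = replace_member(inner_bytes, member_name, new_data)
    return replace_member(outer_bytes, inner_name, new_inner)
```

Each call rewrites every member, so when several files change it is cheaper to collect all replacements and rebuild each archive once.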

But as far as I can see, all this would have been really tricky using the
existing zipfile and cStringIO classes, which both assume (conceptually)
that files are either readable, new, or merely appendable (for zipfile).

The problem for me was not that cStringIO classes are too similar to
files, it was that they are too dissimilar. All of this would work with
either StringIO (but too slow) or real files (but I needed it in memory
because of the zipfiles being inside other zip files).

Am I missing something?

David
Jul 19 '05 #6
