speedy Python strings?

Uwe Mayer

Hi,

thanks to previous help my wrapper for an abstract file type with variable
record length is working now.
I did some test runs and its awfully slow:

I didn't want to read in the whole file at once and I didn't like to read it
step by step (contains 32 bit length terminated strings (Delphi) and 32bit
integers), so I read in i.e. 2MB, work on that buffer and if the buffer
goes empty i load some more 2MB, etc.
For this buffer I use ordinary strings:

class myFile(file):
def read(self, *args):
...
self.buffer += file.read(self, *args)
...

and after reading information from the buffer I remove the read part from
it:

text = struct.unpack(" L", self.buffer[:4])
self.buffer = self.buffer[4:]

During debugging I saw the program accelerating when the buffersize
<len(self.buffe r) > grew smaller, thus my conclusion: since strings are
immutable the string operations are so slow.

I tested different sizes of the buffer and it performed best when I didn't
use that buffering system (which is sad, since I spend so much time on it).

Is there anything else I can use instead of normal strings that'll speed
this up?
What do you suggest how to deal with this situation? Do you suggest I remove
any buffering of the data?

Thanks for any comments
Yours
Uwe

Jul 18 '05 #1

Subscribe Reply

1562

Jp Calderone

On Tue, Jan 20, 2004 at 01:43:24AM +0100, Uwe Mayer wrote:

Hi,

thanks to previous help my wrapper for an abstract file type with variable
record length is working now.
I did some test runs and its awfully slow:

I didn't want to read in the whole file at once and I didn't like to read it
step by step (contains 32 bit length terminated strings (Delphi) and 32bit
integers), so I read in i.e. 2MB, work on that buffer and if the buffer
goes empty i load some more 2MB, etc.
For this buffer I use ordinary strings:

class myFile(file):
def read(self, *args):
...
self.buffer += file.read(self, *args)
...
Doing the above in a loop has quadratic complexity.

and after reading information from the buffer I remove the read part from
it:

text = struct.unpack(" L", self.buffer[:4])
self.buffer = self.buffer[4:]

The same is true for this line.

Your program will be much faster if you make fewer string copies.
"self.buffe r[4:]" is a string copy, as is "self.buffe r += anotherString".

The buffer() builtin is one way to avoid copying, but another way is to
simply keep track of how far into the string you are and use both low and
high indexes when slicing, instead of repeatedly chopping the front off the
string.

Jp

Jul 18 '05 #2

Roy Smith

In article <bu**********@n ews.rz.uni-karlsruhe.de>,
Uwe Mayer <me*****@hadiko .de> wrote:

since strings are
immutable the string operations are so slow.

More to the point, and I suspect what you're running into, string
addition is linear with respect to string length because it has to
create a new string each time and destroy the old one. Since each
addition is linear with string length, a loop which adds strings will
run in quadratic time.

Possibly you want to look at the cStringIO class.

Another possibility is to buffer things in a list, using append to add
to the end and pop(0) to pop off the front (a roll-your-own queue).
Lists do not suffer from the quadratic behavor that string addition does.

Jul 18 '05 #3

Stuart D. Gathman

On Tue, 20 Jan 2004 01:43:24 +0100, Uwe Mayer wrote:

class myFile(file):
def read(self, *args):
... # self.buffer += file.read(self, *args) # not this
self.buffer = self.buffer[self.pos:] + file.read(self, *args)
self.pos = 0 ...

and after reading information from the buffer I remove the read part from
it:

# text = struct.unpack(" L", self.buffer[:4])
# self.buffer = self.buffer[4:]
pos = self.pos
text = struct.unpack(" L", self.buffer[pos:pos+4])
self.pos = pos + 4

--
Stuart D. Gathman <st****@bmsi.co m>
Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

Jul 18 '05 #4

Josiah Carlson

> Another possibility is to buffer things in a list, using append to add

to the end and pop(0) to pop off the front (a roll-your-own queue).
Lists do not suffer from the quadratic behavor that string addition does.

Deleting from the front of a list results in every item being shifted up
a single entry. While this is insignificant for short lists, it is
significant for long lists:

def del_from_front( size): ... a = range(size)
... t = time.time()
... while a: a.pop(0)
... return time.time()-t
... del_from_front( 1000) 0.0160000324249 26758 del_from_front( 10000) 0.8589999675750 7324 del_from_front( 20000) 3.2970001697540 283 def del_from_end(si ze): ... a = range(size)
... t = time.time()
... while a: a.pop()
... return time.time()-t
... del_from_end(10 000) 0.0629999637603 75977 del_from_end(10 0000) 0.5160000324249 2676 del_from_end(10 00000)

5.4530000686645 508

In terms of buffering as the parent post asks, file IO is already
buffered by the underlying libraries and operating system. Another
layer of Python buffering doesn't seem to make much sense. I would
suggest writing off the buffering exercise as a learning experience.

- Josiah

Jul 18 '05 #5

Francis Avila

Stuart D. Gathman wrote in message ...

# text = struct.unpack(" L", self.buffer[:4])
# self.buffer = self.buffer[4:]
pos = self.pos
text = struct.unpack(" L", self.buffer[pos:pos+4])
self.pos = pos + 4

In this vein, I would also recommend looking at the array module. You
didn't describe the data structure, but if each record is simply a list of
4-byte integers, you could convert the whole record to ints at once like so:

record = array.array('L' , stringorlist)
Then, perform your loops on record, which will be a list-like object
supporting item insertion and deletion, and conversion to bytestrings and
other Python base types.

Or, you could simply use array.array() to implement your buffer more
efficiently than continually calling struct. Simply buffer the converted
data a chunk at a time, using array.

However, I don't think buffering helps much in this case. File operations
are already buffered by Python, and most (all?) OSes buffer reads
themselves, too. Reading a file is almost always very fast. I would
concentrate more on presenting a useful abstraction for your data, rather
than worrying about what is essentially an optimization problem.
--
Francis Avila

Jul 18 '05 #6

Jp Calderone

On Mon, Jan 19, 2004 at 10:53:47PM -0500, Francis Avila wrote:

Stuart D. Gathman wrote in message ...
# text = struct.unpack(" L", self.buffer[:4])
# self.buffer = self.buffer[4:]
pos = self.pos
text = struct.unpack(" L", self.buffer[pos:pos+4])
self.pos = pos + 4

In this vein, I would also recommend looking at the array module. You
didn't describe the data structure, but if each record is simply a list of
4-byte integers, you could convert the whole record to ints at once like so:

record = array.array('L' , stringorlist)
Then, perform your loops on record, which will be a list-like object
supporting item insertion and deletion, and conversion to bytestrings and
other Python base types.

Also note the existence of fromfile() - the array module actually makes it
possible to do this incredibly efficiently:

a = array.array('L' )
a.fromfile(open FileObj, count)

Jp

Jul 18 '05 #7

Tim Delaney

"Josiah Carlson" <jc******@uci.e du> wrote:

Deleting from the front of a list results in every item being
shifted up a single entry. While this is insignificant for
short lists, it is significant for long lists:

Note that there's been some very interesting discussions on python-dev
recently that may end up alleviating this issue somewhat in the future - but
no promises - there doesn't appear to be a consensus on the "right way" ...

Tim Delaney

Jul 18 '05 #8

Josiah Carlson

> Note that there's been some very interesting discussions on python-dev

recently that may end up alleviating this issue somewhat in the future - but
no promises - there doesn't appear to be a consensus on the "right way" ...

I've been following it. My favorite solution is to not change the
structure of the list at all, and to offer a reasonably fast fifo queue
in a standard library module with a bunch of other useful data
structures. Then you don't need to worry about breaking C modules that
rely on the list having that behavior, and get some useful data
structures out of the deal.

- Josiah

Jul 18 '05 #9

Similar topics

220

18990

What's better about Ruby than Python?

by: Brandon J. Van Every | last post by:

What's better about Ruby than Python? I'm sure there's something. What is it? This is not a troll. I'm language shopping and I want people's answers. I don't know beans about Ruby or have any preconceived ideas about it. I have noticed, however, that every programmer I talk to who's aware of Python is also talking about Ruby. So it seems that Ruby has the potential to compete with and displace Python. I'm curious on what basis it...

Python

2416

Prothon should not borrow Python strings!

by: Paul Prescod | last post by:

I skimmed the tutorial and something alarmed me. "Strings are a powerful data type in Prothon. Unlike many languages, they can be of unlimited size (constrained only by memory size) and can hold any arbitrary data, even binary data such as photos and movies.They are of course also good for their traditional role of storing and manipulating text." This view of strings is about a decade out of date with modern programmimg practice. From...

Python

3681

author index for Python Cookbook 2?

by: Andrew Dalke | last post by:

Is there an author index for the new version of the Python cookbook? As a contributor I got my comp version delivered today and my ego wanted some gratification. I couldn't find my entries. Andrew dalke@dalkescientific.com

Python

8259

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...

General

8192

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

8358

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...

Windows Server

8502

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...

General

7188

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...

Career Advice

4090

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

4195

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

1805

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP

1504

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General