Bytes IT Community

marshal.dumps quadratic growth and marshal.dump not allowing file-like objects

I'm stuck on a problem where I want to use marshal for serialization
(yes, yes, I know (c)Pickle is normally recommended here). I favor
marshal for speed for the types of data I use.

However it seems that marshal.dumps() for large objects has a
quadratic performance issue which I'm assuming is that it grows its
memory buffer in constant increments. This causes a nasty slowdown for
marshaling large objects. I thought I would get around this by passing
a cStringIO.StringIO object to marshal.dump() instead but I quickly
learned this is not supported (only true file objects are supported).

Any ideas about how to get around the marshal quadratic issue? Any
hope for a fix for that on the horizon? Thanks for any information.
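A minimal sketch of the kind of timing loop that shows the growth (the sizes here are arbitrary, and it is written against current Python syntax rather than the 2.5-era interpreter I'm actually running):

```python
import marshal
import time

def time_dumps(n):
    """Time marshal.dumps() on a list of n small tuples."""
    data = [(i, str(i)) for i in range(n)]
    start = time.perf_counter()
    payload = marshal.dumps(data)
    return time.perf_counter() - start, len(payload)

# Under linear buffer growth, doubling n far more than doubles the time.
for n in (50_000, 100_000, 200_000):
    elapsed, size = time_dumps(n)
    print("n=%d: %d bytes in %.3fs" % (n, size, elapsed))
```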
Jun 27 '08 #1
10 Replies

On Sunday, 15 June 2008 at 16:04, bk*****@gmail.com wrote:
cStringIO.StringIO object to marshal.dump() instead but I quickly
learned this is not supported (only true file objects are supported).

Any ideas about how to get around the marshal quadratic issue? Any
hope for a fix for that on the horizon?
Could you work around it by zipping the contents of the cStringIO.StringIO object?

--
Mailsweeper Home : http://it.geocities.com/call_me_not_now/index.html
Jun 27 '08 #2

bk*****@gmail.com wrote:
I'm stuck on a problem where I want to use marshal for serialization
(yes, yes, I know (c)Pickle is normally recommended here). I favor
marshal for speed for the types of data I use.

However it seems that marshal.dumps() for large objects has a
quadratic performance issue which I'm assuming is that it grows its
memory buffer in constant increments. This causes a nasty slowdown for
marshaling large objects. I thought I would get around this by passing
a cStringIO.StringIO object to marshal.dump() instead but I quickly
learned this is not supported (only true file objects are supported).

Any ideas about how to get around the marshal quadratic issue? Any
hope for a fix for that on the horizon? Thanks for any information.
Here's how marshal resizes the string:

newsize = size + size + 1024;
if (newsize > 32*1024*1024) {
    newsize = size + 1024*1024;
}

Maybe you can split your large objects and marshal multiple objects to keep
the size below the 32MB limit.

Peter
Jun 27 '08 #3

On Jun 15, 1:04 am, bkus...@gmail.com wrote:
However it seems that marshal.dumps() for large objects has a
quadratic performance issue which I'm assuming is that it grows its
memory buffer in constant increments.
Looking at the source in http://svn.python.org/projects/pytho...thon/marshal.c, it looks like the relevant fragment is in w_more():

. . .
size = PyString_Size(p->str);
newsize = size + size + 1024;
if (newsize > 32*1024*1024) {
    newsize = size + 1024*1024;
}
if (_PyString_Resize(&p->str, newsize) != 0) {
. . .

When more space is needed, the resize operation over-allocates by
double the previous need plus 1K. This should give amortized O(1)
performance just like list.append().

However, when that strategy requests more than 32MB, the resizing becomes less aggressive and grows only in 1MB blocks, giving the nasty quadratic behavior you observed.
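The gap between the two strategies can be sketched with a toy count of the bytes each one copies during resizes (a simplification that ignores allocator details):

```python
def copied_bytes(total, grow):
    """Bytes copied by resizes while a buffer grows to hold `total` bytes."""
    size = copied = 0
    while size < total:
        copied += size          # each resize copies the current contents
        size = grow(size)
    return copied

linear = lambda size: size + 1024            # old strategy: O(n**2) copying
doubling = lambda size: size + size + 1024   # new strategy: amortized O(n)

n = 10 * 1024 * 1024   # a 10MB payload
print("linear:", copied_bytes(n, linear))
print("doubling:", copied_bytes(n, doubling))
```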

Raymond
Jun 27 '08 #4

On Jun 15, 7:47 pm, Peter Otten <__pete...@web.de> wrote:
bkus...@gmail.com wrote:
I'm stuck on a problem where I want to use marshal for serialization
(yes, yes, I know (c)Pickle is normally recommended here). I favor
marshal for speed for the types of data I use.
However it seems that marshal.dumps() for large objects has a
quadratic performance issue which I'm assuming is that it grows its
memory buffer in constant increments. This causes a nasty slowdown for
marshaling large objects. I thought I would get around this by passing
a cStringIO.StringIO object to marshal.dump() instead but I quickly
learned this is not supported (only true file objects are supported).
Any ideas about how to get around the marshal quadratic issue? Any
hope for a fix for that on the horizon? Thanks for any information.

Here's how marshal resizes the string:

newsize = size + size + 1024;
if (newsize > 32*1024*1024) {
    newsize = size + 1024*1024;
}

Maybe you can split your large objects and marshal multiple objects to keep
the size below the 32MB limit.
But that change went into the svn trunk on 11-May-2008; perhaps the OP
is using a production release which would have the previous version,
which is merely "newsize = size + 1024;".

Do people really generate 32MB pyc files, or is stopping doubling at
32MB just a safety valve in case someone/something runs amok?

Cheers,
John
Jun 27 '08 #5

John Machin wrote:
>Here's how marshal resizes the string:

newsize = size + size + 1024;
if (newsize > 32*1024*1024) {
    newsize = size + 1024*1024;
}

Maybe you can split your large objects and marshal multiple objects to
keep the size below the 32MB limit.

But that change went into the svn trunk on 11-May-2008; perhaps the OP
is using a production release which would have the previous version,
which is merely "newsize = size + 1024;".
That is indeed much worse. Depending on what the OP means by "large objects", the problem may already be fixed in subversion.
Do people really generate 32MB pyc files, or is stopping doubling at
32MB just a safety valve in case someone/something runs amok?
A 32MB pyc would correspond to a module of roughly the same size. So
someone/something runs amok in either case.

Peter
Jun 27 '08 #6

Raymond Hettinger wrote:
When more space is needed, the resize operation over-allocates by
double the previous need plus 1K. This should give amortized O(1)
performance just like list.append().

However, when that strategy requests more than 32MB, the resizing becomes less aggressive and grows only in 1MB blocks, giving the nasty quadratic behavior you observed.
The marshal code has been revamped in Python 2.6. The old code in Python
2.5 uses a linear growth strategy:

size = PyString_Size(p->str);
newsize = size + 1024;
if (_PyString_Resize(&p->str, newsize) != 0) {
    p->ptr = p->end = NULL;
}

Anyway marshal should not be used by user code to serialize objects.
It's only meant for Python byte code. Please use the pickle/cPickle
module instead.
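A minimal sketch of the pickle route (the protocol choice here is an assumption; in 2.x you would reach for cPickle with protocol 2):

```python
import pickle

data = {"ids": list(range(1000)), "name": "example"}

# The highest binary protocol is the fastest wire format pickle offers.
payload = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)
assert restored == data
```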

Christian

Jun 27 '08 #7

On Jun 15, 3:16 am, John Machin <sjmac...@lexicon.net> wrote:
But that change went into the svn trunk on 11-May-2008; perhaps the OP
is using a production release which would have the previous version,
which is merely "newsize = size + 1024;".

Do people really generate 32MB pyc files, or is stopping doubling at
32MB just a safety valve in case someone/something runs amok?
Indeed. I (the OP) am using a production release which has the 1k
linear growth.
I am seeing the problems with ~5MB and ~10MB sizes.
Apparently this will be improved greatly in Python 2.6, at least up to
the 32MB limit.

Thanks all for responding.
Jun 27 '08 #8

On Jun 16, 1:08 am, bkus...@gmail.com wrote:
On Jun 15, 3:16 am, John Machin <sjmac...@lexicon.net> wrote:
But that change went into the svn trunk on 11-May-2008; perhaps the OP
is using a production release which would have the previous version,
which is merely "newsize = size + 1024;".
Do people really generate 32MB pyc files, or is stopping doubling at
32MB just a safety valve in case someone/something runs amok?

Indeed. I (the OP) am using a production release which has the 1k
linear growth.
I am seeing the problems with ~5MB and ~10MB sizes.
Apparently this will be improved greatly in Python 2.6, at least up to
the 32MB limit.
Apparently you intend to resist good advice and persist [accidental
pun!] with marshal -- how much slower is cPickle for various sizes of
data? What kinds of objects are you persisting?
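One way to answer that for your own data, sketched with illustrative shapes and modern module names (in 2.x, substitute cPickle for pickle):

```python
import marshal
import pickle
import time

data = [(i, float(i), str(i)) for i in range(100_000)]

def best_time(fn, repeats=3):
    """Best wall-clock time over a few runs, in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)

m = best_time(lambda: marshal.dumps(data))
p = best_time(lambda: pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
print("marshal: %.4fs  pickle: %.4fs" % (m, p))
```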

Jun 27 '08 #9

On Jun 15, 8:08 am, bkus...@gmail.com wrote:
Indeed. I (the OP) am using a production release which has the 1k
linear growth.
I am seeing the problems with ~5MB and ~10MB sizes.
Apparently this will be improved greatly in Python 2.6, at least up to
the 32MB limit.
I've just fixed this for Py2.5.3 and Py2.6. No more quadratic
behavior.
Raymond
Jun 27 '08 #10

Anyway marshal should not be used by user code to serialize objects.
It's only meant for Python byte code. Please use the pickle/cPickle
module instead.

Christian
Just for yucks let me point out that marshal has
no real security concerns of interest to the non-paranoid,
whereas pickle is a security disaster waiting to happen
unless you are extremely cautious... yet again.

Sorry, I know even a monkey learns after 3 times...

-- Aaron Watters

===
http://www.xfeedme.com/nucular/pydis...ETEXT=disaster
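To make the hazard concrete, here is a harmless sketch of the mechanism (the payload calls print, where a hostile one would name os.system or worse):

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to "rebuild" the object: call the first
    # element with the given arguments. Nothing stops it naming os.system.
    def __reduce__(self):
        return (print, ("this ran during pickle.loads",))

blob = pickle.dumps(Payload())
pickle.loads(blob)   # runs print(...); no Payload object is ever created
```

marshal.loads, by contrast, can only rebuild plain data types; it has no way to name a callable to invoke.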
Jun 27 '08 #11

This discussion thread is closed; replies have been disabled.