PEP 353: Using ssize_t as the index type

I've been working on PEP 353 for some time now.
Please comment, in particular if you are using 64-bit


PEP: 353
Title: Using ssize_t as the index type
Version: $Revision: 42333 $
Last-Modified: $Date: 2006-02-12 10:36:52 +0100 (So, 12 Feb 2006) $
Author: Martin v. Löwis <ma****@v.loewis.de>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Dec-2005

In Python 2.4, indices of sequences are restricted to the C type
int. On 64-bit machines, sequences therefore cannot use the full
address space, and are restricted to 2**31 elements. This PEP proposes
to change this, introducing a platform-specific index type
Py_ssize_t. An implementation of the proposed change is in

64-bit machines are becoming more popular, and the size of main memory
increases beyond 4GiB. On such machines, Python currently is limited,
in that sequences (strings, unicode objects, tuples, lists,
array.arrays, ...) cannot contain more than 2GiElements.

Today, very few machines have memory to represent larger lists: as
each pointer is 8B (in a 64-bit machine), one needs 16GiB to just hold
the pointers of such a list; with data in the list, the memory
consumption grows even more. However, there are three container types
for which users request improvements today:

* strings (currently restricted to 2GiB)
* mmap objects (likewise; plus the system typically
won't keep the whole object in memory concurrently)
* Numarray objects (from Numerical Python)
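
The 16GiB figure quoted above can be checked with a few lines of C. This is only an illustrative sketch: the helper names are hypothetical, and the 8-byte pointer size is the assumption stated in the text for 64-bit machines.

```c
#include <stdint.h>

/* Bytes needed just for the pointer array of a list with `elements`
   items. Assumption: pointers are 8 bytes, as on a 64-bit machine. */
uint64_t pointer_bytes(uint64_t elements)
{
    const uint64_t pointer_size = 8;
    return elements * pointer_size;
}

/* Convert a byte count to whole GiB (1 GiB = 2**30 bytes). */
uint64_t bytes_to_gib(uint64_t bytes)
{
    return bytes >> 30;
}
```

Calling bytes_to_gib(pointer_bytes(2**31)) gives 16, i.e. a list at the current limit already needs 16GiB for its pointers alone.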

As the proposed change will cause incompatibilities on 64-bit
machines, it should be carried out while such machines are not in wide
use (IOW, as early as possible).

A new type Py_ssize_t is introduced, which has the same size as the
compiler's size_t type, but is signed. It will be a typedef for
ssize_t where available.
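
As a rough illustration of the two properties required of the new type (same size as size_t, but signed), the following sketch uses ptrdiff_t as a stand-in. The name sketch_ssize_t and the choice of ptrdiff_t are assumptions made for this example only; they are not the definition proposed by this PEP.

```c
#include <stddef.h>

/* Hypothetical stand-in for Py_ssize_t. ptrdiff_t is signed and, on
   common platforms, as wide as size_t; that is an assumption here. */
typedef ptrdiff_t sketch_ssize_t;

/* Returns 1 if the stand-in has the same size as size_t. */
int same_size_as_size_t(void)
{
    return sizeof(sketch_ssize_t) == sizeof(size_t);
}

/* Returns 1 if the stand-in is signed, i.e. -1 stays negative. */
int is_signed(void)
{
    return (sketch_ssize_t)-1 < 0;
}
```

Both checks are expected to hold on ordinary 32-bit and 64-bit platforms; a platform where they fail would need a different underlying type.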

The internal representation of the length fields of all container
types is changed from int to Py_ssize_t, for all types included in the
standard distribution. In particular, PyObject_VAR_HEAD is changed to
use Py_ssize_t, affecting all extension modules that use that macro.

All occurrences of index and length parameters and results are changed
to use Py_ssize_t, including the sequence slots in type objects, and
the buffer interface.

New conversion functions PyInt_FromSsize_t and PyInt_AsSsize_t are
introduced. PyInt_FromSsize_t will transparently return a long int
object if the value exceeds LONG_MAX; PyInt_AsSsize_t will
transparently process long int objects.

New function pointer typedefs ssizeargfunc, ssizessizeargfunc,
ssizeobjargproc, and ssizessizeobjargproc are introduced. The
buffer interface function types are now called readbufferproc,
writebufferproc, segcountproc, and charbufferproc.

A new conversion code 'n' is introduced for PyArg_ParseTuple
and Py_BuildValue, which operates on Py_ssize_t.

The conversion codes 's#' and 't#' will output Py_ssize_t
if the macro PY_SSIZE_T_CLEAN is defined before Python.h
is included, and continue to output int if that macro
isn't defined.

At places where a conversion from size_t/Py_ssize_t to
int is necessary, the strategy for conversion is chosen
on a case-by-case basis (see next section).

To prevent loading extension modules that assume a 32-bit
size type into an interpreter that has a 64-bit size type,
Py_InitModule4 is renamed to Py_InitModule4_64.

Conversion guidelines

Module authors have the choice whether they support this PEP in their
code or not; if they support it, they have the choice of different
levels of compatibility.

If a module is not converted to support this PEP, it will continue to
work unmodified on a 32-bit system. On a 64-bit system, compile-time
errors and warnings might be issued, and the module might crash the
interpreter if the warnings are ignored.

Conversion of a module can either attempt to continue using int
indices, or use Py_ssize_t indices throughout.

If the module should continue to use int indices, care must be taken
when calling functions that return Py_ssize_t or size_t, in
particular, for functions that return the length of an object (this
includes the strlen function and the sizeof operator). A good compiler
will warn when a Py_ssize_t/size_t value is truncated into an int.
In these cases, three strategies are available:

* statically determine that the size can never exceed an int
(e.g. when taking the sizeof a struct, or the strlen of
a file pathname). In this case, write::

some_int = Py_SAFE_DOWNCAST(some_value, Py_ssize_t, int);

This will add an assertion in debug mode that the value
really fits into an int, and just add a cast otherwise.

* statically determine that the value shouldn't overflow an
int unless there is a bug in the C code somewhere. Test
whether the value is smaller than INT_MAX, and raise an
InternalError if it isn't.
* otherwise, check whether the value fits an int, and raise
a ValueError if it doesn't.
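
The last strategy (check at run time, fail if the value does not fit) might look roughly like the sketch below. It deliberately avoids the Python API so that it stands alone: size_to_int_checked is a hypothetical helper, and the -1 return marks the place where a real extension module would raise ValueError.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: store `value` in *out if it fits into an int.
   Returns 0 on success and -1 if the value would be truncated; *out is
   left untouched in the failure case. */
int size_to_int_checked(size_t value, int *out)
{
    if (value > (size_t)INT_MAX) {
        return -1;
    }
    *out = (int)value;
    return 0;
}
```

The comparison is done in size_t, so the check itself cannot overflow. For a signed source type such as Py_ssize_t, a check against INT_MIN would be needed as well.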

The same care must be taken for tp_as_sequence slots; in
addition, the signatures of these slots change, and the
slots must be explicitly recast (e.g. from intargfunc
to ssizeargfunc). Compatibility with previous Python
versions can be achieved with the test::

#if PY_VERSION_HEX < 0x02050000
typedef int Py_ssize_t;
#endif

and then using Py_ssize_t in the rest of the code. For
the tp_as_sequence slots, additional typedefs might
be necessary; alternatively, by replacing::

PyObject* foo_item(struct MyType* obj, int index)
{
    ...
}

with::

PyObject* foo_item(PyObject* _obj, Py_ssize_t index)
{
    struct MyType* obj = (struct MyType*)_obj;
    ...
}

it becomes possible to drop the cast entirely; the type
of foo_item should then match the sq_item slot in all
Python versions.

If the module should be extended to use Py_ssize_t indices, all usages
of the type int should be reviewed, to see whether it should be
changed to Py_ssize_t. The compiler will help in finding the spots,
but a manual review is still necessary.

Particular care must be taken for PyArg_ParseTuple calls:
they all need to be checked for s# and t# converters, and
PY_SSIZE_T_CLEAN must be defined before including Python.h
if the calls have been updated accordingly.

Why not size_t

An initial attempt to implement this feature tried to use
size_t. It quickly turned out that this cannot work: Python
uses negative indices in many places (to indicate counting
from the end). Even in places where size_t would be usable,
too many reformulations of code were necessary, e.g. in
loops like::

for(index = length-1; index >= 0; index--)

This loop will never terminate if index is changed from
int to size_t.
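
To make the kind of reformulation concrete, here is a small sketch with hypothetical function names. The first function is the loop from the text with a signed index; the second shows one way it would have to be rewritten for an unsigned index, since `index >= 0` is always true for size_t.

```c
#include <stddef.h>

/* Reverse iteration with a signed index: the loop shape from the text. */
long sum_reverse_signed(const int *items, long length)
{
    long total = 0;
    long index;
    for (index = length - 1; index >= 0; index--) {
        total += items[index];
    }
    return total;
}

/* The same traversal with an unsigned index. The comparison uses the
   value before the decrement, so the loop ends when index is 0 instead
   of waiting for a negative value that size_t cannot represent. */
long sum_reverse_unsigned(const int *items, size_t length)
{
    long total = 0;
    size_t index;
    for (index = length; index-- > 0; ) {
        total += items[index];
    }
    return total;
}
```

Every loop of the first form would need to be found and rewritten into something like the second form, which is the cost the paragraph above refers to.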

Why not Py_intptr_t

Conceptually, Py_intptr_t and Py_ssize_t are different things:
Py_intptr_t needs to be the same size as void*, and Py_ssize_t
the same size as size_t. These could differ, e.g. on machines
where pointers have segment and offset. On current flat-address
space machines, there is no difference, so for all practical
purposes, Py_intptr_t would have worked as well.
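
Whether the two sizes coincide on a given platform can be checked directly. The sketch below uses the standard C types intptr_t and size_t as stand-ins for Py_intptr_t and Py_ssize_t; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when a pointer-sized integer and size_t have the same
   width, as the text expects on flat-address-space machines. */
int intptr_matches_size_t(void)
{
    return sizeof(intptr_t) == sizeof(size_t);
}
```

On a segmented architecture this function could return 0, which is the case the conceptual distinction is meant to cover.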

Doesn't this break much code?

With the changes proposed, code breakage is fairly
minimal. On a 32-bit system, no code will break, as
Py_ssize_t is just a typedef for int.

On a 64-bit system, the compiler will warn in many
places. If these warnings are ignored, the code will
continue to work as long as the container sizes don't
exceed 2**31, i.e. it will work nearly as well as
it does currently. There are two exceptions to this
statement: if the extension module implements the
sequence protocol, it must be updated, or the calling
conventions will be wrong. The other exception is
the places where Py_ssize_t is output through a
pointer (rather than a return value); this applies
most notably to codecs and slice objects.

If the conversion of the code is made, the same code
can continue to work on earlier Python releases.

Doesn't this consume too much memory?

One might think that using Py_ssize_t in all tuples,
strings, lists, etc. is a waste of space. This is
not true, though: on a 32-bit machine, there is no
change. On a 64-bit machine, the size of many
containers doesn't change, e.g.

* in lists and tuples, a pointer immediately follows
the ob_size member. This means that the compiler
currently inserts 4 padding bytes; with the
change, these padding bytes become part of the size.
* in strings, the ob_shash field follows ob_size.
This field is of type long, which is a 64-bit
type on most 64-bit systems (except Win64), so
the compiler inserts padding before it as well.
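
The padding argument can be illustrated with two simplified structs. These are not the real object layouts: the struct and field names are assumptions of this sketch, and ptrdiff_t stands in for Py_ssize_t. The point is only that a 4-byte int followed by a pointer takes as much room as a pointer-sized integer followed by a pointer.

```c
#include <stddef.h>

/* Simplified header with an int size field, followed by a pointer. */
struct head_with_int {
    ptrdiff_t refcount;
    void *type;
    int size;            /* on LP64: 4 bytes, then 4 bytes of padding */
    void **items;
};

/* The same header with a pointer-sized signed size field. */
struct head_with_ssize {
    ptrdiff_t refcount;
    void *type;
    ptrdiff_t size;      /* on LP64: fills the former padding */
    void **items;
};

/* Returns 1 if both layouts occupy the same number of bytes. */
int layouts_have_equal_size(void)
{
    return sizeof(struct head_with_int) == sizeof(struct head_with_ssize);
}
```

On typical 32-bit and 64-bit ABIs the two structs have the same size and the items member sits at the same offset in both.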

Open Issues

* Marc-Andre Lemburg commented that complete backwards
compatibility with existing source code should be
preserved. In particular, functions that have
Py_ssize_t* output arguments should continue to run
correctly even if the callers pass int*.

It is not clear what strategy could be used to implement
that requirement.

This document has been placed in the public domain.
Feb 12 '06 #1
In article <43***********************@news.freenet.de>,
"Martin v. Löwis" <ma****@v.loewis.de> wrote:

Why not size_t

An initial attempt to implement this feature tried to use
size_t. It quickly turned out that this cannot work: Python
uses negative indices in many places (to indicate counting
from the end). Even in places where size_t would be usable,
to many reformulations of code where necessary, e.g. in ...

Minor typo: "too"
__________________________________________________ ______________________
TonyN.:' *firstname*nlsnews@georgea*lastname*.com
' <http://www.georgeanelson.com/>
Feb 12 '06 #2
Tony Nelson wrote:
Minor typo: "too"

Thanks, fixed.

Feb 12 '06 #3
