473,382 Members | 1,345 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,382 software developers and data experts.

Another Python rookie trying to port to C/C++ question.


A week or two ago, I asked here about porting Python to C. Got some good
answers (Note 1) but now I've got another question. Actually, more a
request for clarification of a topic that both the Python tutorial and
docs leave a touch murky to my understanding.

Dictionaries/"dict" types...

Am I understanding/interpreting correctly when I go with the idea that a
"dict" variable can be looked at as (effectively) two parallel arrays?
Something like this C construct:

struct DictStruct
{
char *KeyArray[<somenumber>];
ValType ValArray[<somenumber>];
}

Where "ValType" would be likely be a union, to allow
assigning/retrieving the actual values correctly depending on what their
individiual types were, and "<somenumber>" is just that: Some
more-or-less arbitrarily selected value that provides enough space to
handle all expected numbers of key/value pairs?

Similarly, would something like:

struct DictStruct
{
char Key[25];
ValType Value;
DictStruct *PrevDictEntry;
DictStruct *NextDictEntry;
}

(again, ValType would probably be a union to cover each type of value
that might be wanted in the dictionary - but in this case, Key directly
contains the Key's name) then building a double-linked list of struct
DictStructs end up behaving at least reasonably like a Python "dict"?

I expect that either will be a reasonable facsimile of how Python
handles dicts, with, of course, the corresponding overhead (which may be
surprisingly little, I'm thinking... but I have yet to code that part,
so I may find out it's going to be a nightmare) of "We want the value
that goes with key, so we've got to walk the KeyArray[] until we find a
match, then extract the value from the other array", or a similar
procedure for the doubly-linked list.

Or am I way out in left field with the way I'm RTFM?

(Note 1)
Along with quite a bit of entirely unwelcome "Python evangelism" --
Which part of "I don't care how good, bad, or indifferent Python is
compared to C/C++, or any other language. I want this Python code to be
written in C/C++, end of discussion." wasn't clear enough for those of
you who took it upon yourselves to preach (including the idiot who
decided to spew a relatively impressive (as such things go) string of
insults about how stupid I am, how far up my ass my head is, and how
pathetic my family all the way back to Adam must be for producing such a
throwback because I've got no interest in adopting "the perfect
language...") at me on the merits of Python, both on group and
in-mailbox?

--
Don Bruder - da****@sonic.net <--- Preferred Email - SpamAssassinated.
Hate SPAM? See <http://www.spamassassin.org> for some seriously great info.
I will choose a path that's clear: I will choose Free Will! - N. Peart
Fly trap info pages: <http://www.sonic.net/~dakidd/Horses/FlyTrap/index.html>
Jul 18 '05 #1
12 2802
Don Bruder <da****@sonic.net> writes:

[rewriting a Python program in C/C++]
Something like this C construct:

struct DictStruct
{
char *KeyArray[<somenumber>];
ValType ValArray[<somenumber>];
}
[...] Similarly, would something like:

struct DictStruct
{
char Key[25];
ValType Value;
DictStruct *PrevDictEntry;
DictStruct *NextDictEntry;
}

[...] end up behaving at least reasonably like a Python "dict"?


Yes.

Except that it would be less flexible, *much* slower, and
probably much more buggy.

Thomas
Jul 18 '05 #2
Don Bruder wrote:
A week or two ago, I asked here about porting Python to C. Got some good
answers (Note 1) but now I've got another question. Actually, more a
request for clarification of a topic that both the Python tutorial and
docs leave a touch murky to my understanding.

Dictionaries/"dict" types...
As you seem to feel at home with C, you could have a look into
Objects/dictobject.c and Include/dictobject.h of the source distribution.
Which part of "I don't care how good, bad, or indifferent Python is
compared to C/C++, or any other language. I want this Python code to be
written in C/C++, end of discussion." wasn't clear enough for those of


Smart people use efficient tools and build on existing libraries. If you are
serious about the C++ part of your subject line, the Standard library aka
STL has a map template that resembles Python's dict and might enhance both
development and execution speed.

Off topic (or not):
A single person can start a discussion but not limit it, I think that's a
feature of newsgroups rather than an annoyance.
Peter

Jul 18 '05 #3
Don Bruder wrote:
A week or two ago, I asked here about porting Python to C. Got some good
answers (Note 1) but now I've got another question. Actually, more a
request for clarification of a topic that both the Python tutorial and
docs leave a touch murky to my understanding.

Dictionaries/"dict" types...
As you seem to feel at home with C, you could have a look into
Objects/dictobject.c and Include/dictobject.h of the source distribution.
Which part of "I don't care how good, bad, or indifferent Python is
compared to C/C++, or any other language. I want this Python code to be
written in C/C++, end of discussion." wasn't clear enough for those of


Smart people use efficient tools and build on existing libraries. The
Standard library aka STL has a <map> template that might enhance both
development and execution speed.

Off topic (or not):
A single person can start a discussion but not end it, I think that's a
feature of newsgroups rather than an annoyance.
Peter

Jul 18 '05 #4
"Don Bruder" <da****@sonic.net> schrieb im Newsbeitrag
news:GS********************@typhoon.sonic.net...
I expect that either will be a reasonable facsimile of how Python
handles dicts, with, of course, the corresponding overhead (which may be
surprisingly little, I'm thinking... but I have yet to code that part,
so I may find out it's going to be a nightmare) of "We want the value
that goes with key, so we've got to walk the KeyArray[] until we find a
match, then extract the value from the other array", or a similar
procedure for the doubly-linked list.
Python hashes the keys for fast lookup.

(Note 1)
Along with quite a bit of entirely unwelcome "Python evangelism" --
Which part of "I don't care how good, bad, or indifferent Python is
compared to C/C++, or any other language. I want this Python code to be
written in C/C++, end of discussion." wasn't clear enough for those of
you who took it upon yourselves to preach (including the idiot who
decided to spew a relatively impressive (as such things go) string of
insults about how stupid I am, how far up my ass my head is, and how
pathetic my family all the way back to Adam must be for producing such a
throwback because I've got no interest in adopting "the perfect
language...") at me on the merits of Python, both on group and
in-mailbox?


You are treated as you treat others.
It might suprise you but this is a newsgroup not your personal information
lookup place.
Jul 18 '05 #5
In article <bk*************@news.t-online.com>,
Peter Otten <__*******@web.de> wrote:
Don Bruder wrote:
A week or two ago, I asked here about porting Python to C. Got some good
answers (Note 1) but now I've got another question. Actually, more a
request for clarification of a topic that both the Python tutorial and
docs leave a touch murky to my understanding.

Dictionaries/"dict" types...
As you seem to feel at home with C, you could have a look into
Objects/dictobject.c and Include/dictobject.h of the source distribution.


I may just go poking around there for more details.
Someone (via email) suggests that dicts are "supposed to be" (my
synopsis, not his words) a bit "murky", because all that's supposed to
be relevant is stuffing things into them, and pulling things out. That's
fine, completely admirable, and fully acceptable. *IF* You're working
directly in Python. I'm not. I'm working in a combination of C and C++,
attempting to duplicate the functionality of a program originally
written in Python. As such, the details *MUST* get clear if I'm to
create equivalent functionality. If only because I need to read data
that was written (presumably...) by a Python program. Hence, it becomes
neccesary to get down "in the muck" and actually know what's going on
behind the nice, tidy "dict.whosit.get()" (or whatever other operators
might be involved) interface that Python provides.
Which part of "I don't care how good, bad, or indifferent Python is
compared to C/C++, or any other language. I want this Python code to be
written in C/C++, end of discussion." wasn't clear enough for those of
Smart people use efficient tools and build on existing libraries.


There's a statement that needed saying.... Not...

If you are
serious about the C++ part of your subject line, the Standard library aka
STL has a map template that resembles Python's dict and might enhance both
development and execution speed.
I'm *REASONABLY* serious. This port started life as a "Do you really
know as much about C++ as you think you do? Prove it! Take this Python
program and make it C++!" sort of "mid-term" test in my self-taught C++
course. It's rapidly degenerating (due to some technicalities of C++
that I've never before encountered, and therefore had no expectation of
trouble from) into "Let's just get the damn thing working in straight C,
then worry about trying to turn it into C++ later."

Off topic (or not):
A single person can start a discussion but not limit it, I think that's a
feature of newsgroups rather than an annoyance.


Perhaps, and perhaps not. Guess it depends on your perspective. Having a
stream of insult and invective dumped into your mailbox because you've
made it plain that you don't consider somebody's "pet" language the
be-all and end-all of computer programming is hardly what I'd call a
"feature"...

Be that as it may, fortunately I have a thick enough hide to
figuratively "flip off" the jerk that was involved, and go on with life
knowing full well that in at least one respect, I'm so far superior to
him that words can't describe the gulf between us. Doesn't mean I'm not
going to mention that I think he's a hopeless putz, but I'm sure not
going to let it get me down to the point of giving up.

--
Don Bruder - da****@sonic.net <--- Preferred Email - SpamAssassinated.
Hate SPAM? See <http://www.spamassassin.org> for some seriously great info.
I will choose a path that's clear: I will choose Free Will! - N. Peart
Fly trap info pages: <http://www.sonic.net/~dakidd/Horses/FlyTrap/index.html>
Jul 18 '05 #6
In article <8y**********@python.net>,
Thomas Heller <th*****@python.net> wrote:
Don Bruder <da****@sonic.net> writes:

[rewriting a Python program in C/C++]
Something like this C construct:

struct DictStruct
{
char *KeyArray[<somenumber>];
ValType ValArray[<somenumber>];
}

[...]
Similarly, would something like:

struct DictStruct
{
char Key[25];
ValType Value;
DictStruct *PrevDictEntry;
DictStruct *NextDictEntry;
}

[...] end up behaving at least reasonably like a Python "dict"?


Yes.

Except that it would be less flexible, *much* slower, and
probably much more buggy.

Thomas


Well, since I'm not worried about "flexible", have my doubts about the
"much slower" part, and have no reason to believe that choice of
language automatically makes a routine buggy, I'll go with the tenative
plan I had. Seems you're confirming that I'm understanding things
properly when it comes to "This is what a dict is, and here's how it
works." Now to put finishing touches on my understanding of exactly what
goes on inside that "black box" that is the "dict"
class/type/whatever-it-is-you-Python-types-like-to-call-such-things, and
make some working code out of it...

--
Don Bruder - da****@sonic.net <--- Preferred Email - SpamAssassinated.
Hate SPAM? See <http://www.spamassassin.org> for some seriously great info.
I will choose a path that's clear: I will choose Free Will! - N. Peart
Fly trap info pages: <http://www.sonic.net/~dakidd/Horses/FlyTrap/index.html>
Jul 18 '05 #7
On Thu, 25 Sep 2003 18:55:34 GMT, Don Bruder <da****@sonic.net> wrote:
Dictionaries/"dict" types...

Am I understanding/interpreting correctly when I go with the idea that a
"dict" variable can be looked at as (effectively) two parallel arrays?
Something like this C construct:

struct DictStruct
{
char *KeyArray[<somenumber>];
ValType ValArray[<somenumber>];
}


Sort of, in a sense, but not quite. The types of items within the
dictionary (both key and value) are not constrained. This is no
problem as all Python objects are held using a reference scheme. So
(assuming that a pair-object scheme is used in the dictionary) the
struct will be defined much like...

struct DictPair
{
PyObject *Key;
PyObject *Data;
}

The items will almost certainly be held in a single 'array' of pair
objects.

The Python dictionaries use hashing. I don't know the details of the
hashing used. Anyway, there are many ways to handle dictionary-like
data structures.

C++ has the std::map template, which uses a form of binary tree called
a red-black tree (the red-black refers to a partial balancing system).

Tree-based mappings can be slightly more flexible than hashing (for
instance, a tree can be 'walked' in sorted order - the hashing process
does not preserve relative ordering) but hashing tends to be faster.

An interesting approach to dictionary-like objects is the ternary
tree. This is rather like cascading binary trees - each subtree
identifies one character and cascades into the subtree that finds the
next character. The third link in the 'ternary' is the 'I've
identified one character and moving on to the next' link.

A ternary tree can be faster than hashing - in particular at deciding
that a particular key is not present (hashing needs to calculate a
hash over the whole tree before looking up the result - a ternary tree
search may fail at the first character, without looking at the whole
key).

Digital trees (or tries, IIRC) can be even faster still - but wastes a
lot of memory. In a digital tree, each node has an array subscripted
by character/digit code of pointers to the next subtree.

Multiway trees (most often used on disk for database indexes - B trees
and B+ trees being types of multiway trees) may well be appropriate in
main store on modern desktop machines with caching and virtual memory.

If these options are just too complicated, sorted arrays can be
extremely effective as long as you don't have too many items. The C++
library includes std::vector (a resizable array) and binary search
algorithms which can ease the implementation.
Odds are, if you are translating Python to C++, you should use an
std::map to replace a dictionary. There are some different
assumptions, though. Dictionaries assume that the key is hashable, but
don't require relative comparisons ('<', '<=' etc). The std::map
requires relative comparisons.

This simply arises out of the different approaches - Python using
hashing and C++ using binary trees. Going from Python to C++, 99 times
out of 100 this is not a problem. From C++ to Python there is the
minor issue that Python dictionaries can't be _directly_ iterated in
sorted order, though there are certainly easy ways around that.
--
Steve Horne

steve at ninereeds dot fsnet dot co dot uk
Jul 18 '05 #8
On Thu, 25 Sep 2003 15:19:00 -0700, Brian Quinlan <br***@sweetapp.com>
wrote:
Well, since I'm not worried about "flexible", have my doubts about the
"much slower" part,


Your implementation is O(n) with the number of entries while the Python
implementation is O(log(n)). If n is small then your implementation
might be acceptably fast. Can't you just use the STL hash_map template
class?


Are you sure Python dictionaries are O(log n)? - Remember, they are
based on hashing.

Simple hashing is O(1). Handling of collisions and stuff will affect
that (and I'm no expert on hashing techniques) but I can't help
wondering if you're confusing Pythons hashing with tree-based
techniques.
--
Steve Horne

steve at ninereeds dot fsnet dot co dot uk
Jul 18 '05 #9
On Thu, 25 Sep 2003, Don Bruder wrote:
[...] As such, the details *MUST* get clear if I'm to create equivalent
functionality. If only because I need to read data that was written
(presumably...) by a Python program. Hence, it becomes neccesary to get
down "in the muck" and actually know what's going on behind the nice,
tidy "dict.whosit.get()" (or whatever other operators might be involved)
interface that Python provides.


is this data in 'pickle' format (as opposed to text, XML, or some specific
binary format)? if it is, reading it in C/C++ is going to be very hard
indeed; you might consider writing a small python program to read it and
convert it to something more tractable. if it isn't, you don't need to
'get down in the muck'; you just need to write a C program to do the job
that needs to be done.

if pickling is entirely new to you, then your data probably isn't pickled,
so you don't need to worry about it.
Off topic (or not): A single person can start a discussion but not
limit it, I think that's a feature of newsgroups rather than an
annoyance.


Perhaps, and perhaps not. Guess it depends on your perspective. Having a
stream of insult and invective dumped into your mailbox because you've
made it plain that you don't consider somebody's "pet" language the
be-all and end-all of computer programming is hardly what I'd call a
"feature"...


i'm sorry that happened; i hope you won't think badly of python and
pythoneers because of one moron. discreetly let the relevant people know
who he is and we'll fetch the Comfy Chair ... 8)

tom

--
Things fall apart - it's scientific

Jul 18 '05 #10
On Thu, 25 Sep 2003, Don Bruder wrote:
In article <8y**********@python.net>,
Thomas Heller <th*****@python.net> wrote:
Don Bruder <da****@sonic.net> writes:

[rewriting a Python program in C/C++]
Something like this C construct:

struct DictStruct
{
char *KeyArray[<somenumber>];
ValType ValArray[<somenumber>];
}
[...]
Similarly, would something like:

struct DictStruct
{
char Key[25];
ValType Value;
DictStruct *PrevDictEntry;
DictStruct *NextDictEntry;
}

[...] end up behaving at least reasonably like a Python "dict"?


Yes.

Except that it would be less flexible, *much* slower, and
probably much more buggy.


Well, since I'm not worried about "flexible", have my doubts about the
"much slower" part,


think again. python's dictionaries use an efficient implementation, are
written in C, and have been tuned over the course of several years. your
sketched design uses a very slow implementation, and is brand new.
and have no reason to believe that choice of language automatically
makes a routine buggy,


it's maturity that matters in this case; you will make mistakes, just like
the python (and STL) implementers did, but they've already had years to
find and fix them.

if you don't fancy using STL (which will be the case if you're steering
clear of C++ - and might well be even if you weren't!), at least go and
read up on how to implement dictionaries; they're a very well-studied
abstract type. hashtables are simple and efficient, and binary trees are
good too. there are more complicated things like ternary trees, red-black
trees, Patricia trees (my personal favourite :) ) and skip lists, but
you'll probably be fine with hashtables. if you can lay your hands on a
good data structures and algorithms textbook, you'll be fine (Sedgewick's
'Algorithms in C' is where i learnt my craft).

tom

--
Things fall apart - it's scientific

Jul 18 '05 #11
On Fri, 26 Sep 2003 00:33:46 +0100, Tom Anderson
<tw**@urchin.earth.li> wrote:
Patricia trees (my personal favourite :)


Oh goody - Something new to look up!
--
Steve Horne

steve at ninereeds dot fsnet dot co dot uk
Jul 18 '05 #12
Stephen Horne <$$$$$$$$$$$$$$$$$@$$$$$$$$$$$$$$$$$$$$.co.uk> wrote in
news:79********************************@4ax.com:
On Thu, 25 Sep 2003 15:19:00 -0700, Brian Quinlan <br***@sweetapp.com>
wrote:
Well, since I'm not worried about "flexible", have my doubts about the
"much slower" part,


Your implementation is O(n) with the number of entries while the Python
implementation is O(log(n)). If n is small then your implementation
might be acceptably fast. Can't you just use the STL hash_map template
class?


Are you sure Python dictionaries are O(log n)? - Remember, they are
based on hashing.

Simple hashing is O(1). Handling of collisions and stuff will affect
that (and I'm no expert on hashing techniques) but I can't help
wondering if you're confusing Pythons hashing with tree-based
techniques.


Python dictionaries are effectively O(1). Dictionaries are never more than
2/3rds full, so on average a dictionary lookup only needs a couple of
probes. Python's collision resolution is designed such that in most cases
lookups that collide on the first attempt will diverge for the next
attempt.

As some of the other posters have said dictionaries in Python have been
tuned over many years, and since the whole of Python is based on fast
dictionary lookups it is very unlikely that the OP will get close to the
same efficiency. That may not matter though as Python dictionaries are
tuned for general use, so a specific application might be able to do better
by losing some of the generality.

Some of the tuning extends beyond dictionaries, for example strings are
commonly used as dictionary keys, and Python avoids recalculating hash
values more than once for each string. It is unlikely that a C or C++
programmer would go to the effort of using a string type that cached hash
values throughout their application, so for many uses, string lookups in
dictionaries will almost certainly be slower (although there could be gains
elsewhere).

--
Duncan Booth du****@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
Jul 18 '05 #13

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Don Bruder | last post by:
Greetings, oh scaly ones... :) I'm a Mac user with a fairly high level of computer literacy, including reasonable programming skills in BASIC, C, Pascal, and several flavors of ML, just trying...
3
by: Michael Sparks | last post by:
Hi, I'm posting a link to this since I hope it's of interest to people here :) I've written up the talk I gave at ACCU Python UK on the Kamaelia Framework, and it's been published as a BBC...
8
by: Daniel Bickett | last post by:
This post started as an incredibly long winded essay, but halfway through I decided that was a terribly bad idea, so I've trimmed it down dramatically, and put it in the third person (for humor's...
7
by: clintonG | last post by:
I'm puzzled and don't think this is possible but if an application that is running on websiteA generates a file can FTP be used from websiteA to transfer that file to websiteB which would be...
12
by: Arash Partow | last post by:
Hi all, I've ported various hash functions to python if anyone is interested: def RSHash(key): a = 378551 b = 63689 hash = 0
206
by: WaterWalk | last post by:
I've just read an article "Building Robust System" by Gerald Jay Sussman. The article is here: http://swiss.csail.mit.edu/classes/symbolic/spring07/readings/robust-systems.pdf In it there is a...
3
by: chernevik | last post by:
Here is a newbie question: how do I get the server examples in the Preview chapter of "Progamming Python" (Lutz) to run? The code is supposed to be a little webserver on which to run examples,...
15
by: kyosohma | last post by:
Hi, I am trying to get a small group of volunteers together to create Windows binaries for any Python extension developer that needs them, much like the package/extension builders who volunteer...
31
by: raylopez99 | last post by:
I went through a bunch of Regex examples, and indeed it's quite powerful, including 'groups' using 'matches', word boundaries, lookahead matches, replacing and splitting text,etc--apparently...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.