473,657 Members | 2,513 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Interned Strings

Hello All,

I'm trying to clarify how Python avoids byte by byte
string comparisons most of the time. As I understand,
dictionaries keep strings, their keys (hash values),
and caches of their keys. Caching keys helps to avoid
recalculation of a string's hash value. So, when two
strings need to be compared, only their cached keys
are compared, which improves performance as there is
no need for byte by byte comparison.

Also, there is a global interning dictionary that
keeps interned strings. What I don't understand is why
strings are interned. How does it help with string
comparisons?

Thank you.

_______________ _______________ _______________ _____
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
Jan 10 '06 #1
1 1906
Dave wrote:
I'm trying to clarify how Python avoids byte by byte
string comparisons most of the time. As I understand,
dictionaries keep strings, their keys (hash values),
and caches of their keys.
If you are talking about dictionaries with string keys,
you also have the dictionary values, of course.
Caching keys helps to avoid
recalculation of a string's hash value.
Correct (s/keys/string hash values/).
So, when two
strings need to be compared, only their cached keys
are compared, which improves performance as there is
no need for byte by byte comparison.
No. When comparing to strings s1 and s2, the following
operations occur:
1. is s1 and s2 the very same string? If yes, they
are equal.
2. else, do they have the same size, the same first
byte (which might be a null byte), and do they
compare equal, byte-by-byte?
If yes, they are equal, if not, they are not equal.
3. Is it perhaps some other compare operation (<,>

<=, >, >=) that we want to perform? Do the
slow algorithm.

As you can see, the string hash is never consulted when
comparing string objects. It is only consulted to
find the potential dictionary slot in the first place.
Also, there is a global interning dictionary that
keeps interned strings. What I don't understand is why
strings are interned. How does it help with string
comparisons?


Why you look up a dictionary entry, this happens:
1. compute the key hash.
2. find the corresponding dictionary slot
If the slot is empty, KeyError.
3. compare the slot key with the search key.
If they are equal: value found.
If they are different: collision, go to the
next key.

Interned strings speed up step 1 and step 3. If
you only have interned strings throughout, you always
also have the hash value. Of course, you had to
compute the hash value when adding the string
to the interning dictionary.

The real speedup is in step 3, and based on
the assumption that collisions won't happen:
if you lookup the key (e.g. to find the value
of a global variable), you find the slot using
the computed hash. Then:
1. if the slot is empty, it's a KeyError.
2. if the slot is not empty, you first compare
for pointer equality. As collisions are
supposedly unlikely, this will be an equal
string most of the time. Then, if you have
interning, it even will be the *same* string,
so you only need to compare pointers to find
out they are the same string.

So assuming all strings are interned, this is how
you do the dictionary lookup.
1. fetch the hash value from the key (no need to
compute it - it's already cached)
2. go to the slot in the dict.
3. if the slot is empty (==NULL): KeyError
4. otherwise: success.
As you can see, in that case, there is no need to
compare the string values. If there are collisions,
and if not all strings are interned, the algorithm
gets more complicated, but four items above are
assumed to be the common case.

Regards,
Martin
Jan 10 '06 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

20
5766
by: Ravi | last post by:
Hi, I have about 200GB of data that I need to go through and extract the common first part of a line. Something like this. >>>a = "abcdefghijklmnopqrstuvwxyz" >>>b = "abcdefghijklmnopBHLHT" >>>c = extract(a,b) >>>print c "abcdefghijklmnop"
17
7391
by: Gordon Airport | last post by:
Has anyone suggested introducing a mutable string type (yes, of course) and distinguishing them from standard strings by the quote type - single or double? As far as I know ' and " are currently interchangeable in all circumstances (as long as they're paired) so there's no overloading to muddy the language. Of course there could be some interesting problems with current code that doesn't make a distinction, but it would be dead easy to fix...
1
2491
by: Byron Morgan | last post by:
Anyone run into this before? I have a python app that has been reliable, running for days on end without a crash. Suddenly, It repeatedly crashes with the following message: "Fatal Python error: Inconsistent interned string state. abnormal program termination" Some background:
16
2418
by: Paul Prescod | last post by:
I skimmed the tutorial and something alarmed me. "Strings are a powerful data type in Prothon. Unlike many languages, they can be of unlimited size (constrained only by memory size) and can hold any arbitrary data, even binary data such as photos and movies.They are of course also good for their traditional role of storing and manipulating text." This view of strings is about a decade out of date with modern programmimg practice. From...
4
10530
by: agent349 | last post by:
First off, I know arrays can't be compared directly (ie: if (arrary1 == array2)). However, I've been trying to compare two arrays using pointers with no success. Basically, I want to take three sets of character strings from the user. Then I want to run through each element and compare the two strings. If they match I print they match... I'm having a bit of trouble with the actual loop through each array using the pointers and comparing...
4
1659
by: Jon Skeet | last post by:
I've just noticed something rather odd and disturbing. The following code displays "True": using System; class Test { public static void Main(string args) { string x = new string ("".ToCharArray());
25
3541
by: Rainmaker | last post by:
Hi, Can anyone tell me an efficient algorithm to sort an array of strings? Keep in mind that this array is HUGE and so the algorithm should me efficient enough to deal with it. Thanks
2
22596
by: Potiuper | last post by:
Question: Is it possible to use a char pointer array ( char *<name> ) to read an array of strings from a file in C? Given: code is written in ANSI C; I know the exact nature of the strings to be read (the file will be written by only this program); file can be either in text or binary (preferably binary as the files may be read repeatedly); the amount and size of strings in the array won't be known until run time (in the example I have it in...
95
5046
by: hstagni | last post by:
Where can I find a library to created text-based windows applications? Im looking for a library that can make windows and buttons inside console.. Many old apps were make like this, i guess ____________________________________ | | | ------------------ | | | BUTTON | | | ...
0
8413
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8324
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
8842
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
8740
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
8513
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8617
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
7352
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
5642
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4173
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.