473,769 Members | 1,730 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

How do I get a reference to a KEY value of a dictionary?

I am new to python, so please bear with me if I am making some
conceptual error.

Basically I want to create a graph with an adjacency list
representation, but I don't want any of the adjacency lists to have
duplicate strings when it is avoidable. I have a function createEdge
that adds an edge to the graph. The arguments will be distinct since
they are read from text files. But basically I want to use the
dictionary as a string pool, and if the argument string equals
something in the pool already, don't use the argument string, just a
use a reference to something in the string pool already.

Is this a waste of time? Because I know in python I cannot be certain
that the argument strings that are read from files are even garbage
collected anyway. I could certainly do the job with duplicate
strings, but it would seem wasteful. I am a C programmer, and old
habits die hard.

The following code works -- I tested it and entries in the adjacency
list that are equivalent are in fact identical. But it seems rather
stupid to have a dictionary of the form {'alice': 'alice', 'bob':
'bob'}, etc. i.e. the keys and values are the same. It would be nice
if I can get a reference to the key in the same time as a hash lookup.
I know I can do dictionary.keys ().index( 'string' ) or something but
that would be pretty inefficient, I believe.

class GraphAccumulato r:
def __init__( self, fileFunction ):
self.fileFuncti on = fileFunction
self.adjList = {} # adjacency list
self.nodes = {} # list of nodes for
preventing string dupes

def createEdge( self, node1, node2 ):

# another way of doing by using a dictionary

nodes = self.nodes
if nodes.has_key( node2 ):
node2 = nodes[ node2 ]
else:
nodes[ node2 ] = node2

adjList = self.adjList
if adjList.has_key ( node1 ):
adjList[ node1 ].append( node2 );
else:
adjList[ node1 ] = [ node2 ]; # singleton edge list
Jul 18 '05 #1
15 2637
On 31 Jul 2003 17:36:41 -0700, Andy C wrote:
Is this a waste of time? Because I know in python I cannot be certain
that the argument strings that are read from files are even garbage
collected anyway. I could certainly do the job with duplicate
strings, but it would seem wasteful. I am a C programmer, and old
habits die hard.


Python dynamically binds names ("variables" ) to objects; many types,
including immutable strings, will not be duplicated if an identical one
is already stored.

The upshot is: store an identical string as many times as you like,
because PYthon will simply keep references to the one string object for
each duplicate.

--
\ "Friendship is born at that moment when one person says to |
`\ another, 'What! You too? I thought I was the only one!'" -- |
_o__) C.S. Lewis |
Ben Finney <http://bignose.squidly .org/>
Jul 18 '05 #2

"Andy C" <an******@yahoo .com> wrote in message
news:64******** *************** ***@posting.goo gle.com...
I am new to python, so please bear with me if I am making some
conceptual error.

Basically I want to create a graph with an adjacency list
representation, but I don't want any of the adjacency lists to have
duplicate strings when it is avoidable. I have a function createEdge that adds an edge to the graph. The arguments will be distinct since they are read from text files. But basically I want to use the
dictionary as a string pool, and if the argument string equals
something in the pool already, don't use the argument string, just a
use a reference to something in the string pool already.


Thinking in terms of 'references' can both help and hinder. (There
have been some long recent discussion.)

Are you familiar with this?
help('intern')


Help on built-in function intern:

intern(...)
intern(string) -> string

``Intern'' the given string. This enters the string in the
(global)
table of interned strings whose purpose is to speed up dictionary
lookups.
Return the string itself or the previously interned string object
with the
same value.

TJR
Jul 18 '05 #3
I don't think this is true in many cases. For example:
x = 'hi'
y = 'hi'
x is y True a = 'hi'
b = 'h'
b += 'i'
b 'hi' a is b
False

Also, when the strings come from different files read in using readline(),
it appears that duplicates aren't eliminated. I tried my code without the
hash table duplicate elimination, and the adjacency lists has non-identical
strings. After I constructed the adjacency lists, I compared them with 'is'
and they were not identical. After I used my hash table hack solution, all
equivalent strings were identical.

"Ben Finney" <bi************ ****@and-benfinney-does-too.id.au> wrote in
message news:sl******** *************** ********@iris.p olar.local... On 31 Jul 2003 17:36:41 -0700, Andy C wrote:
Is this a waste of time? Because I know in python I cannot be certain
that the argument strings that are read from files are even garbage
collected anyway. I could certainly do the job with duplicate
strings, but it would seem wasteful. I am a C programmer, and old
habits die hard.


Python dynamically binds names ("variables" ) to objects; many types,
including immutable strings, will not be duplicated if an identical one
is already stored.

The upshot is: store an identical string as many times as you like,
because PYthon will simply keep references to the one string object for
each duplicate.

--
\ "Friendship is born at that moment when one person says to |
`\ another, 'What! You too? I thought I was the only one!'" -- |
_o__) C.S. Lewis |
Ben Finney <http://bignose.squidly .org/>

Jul 18 '05 #4
Thanks, I think that is exactly what I'm looking for. I guess if you do a =
intern(a), then if a is equivalent but not identical to something in the
global string table, it will assign a to reference the object in the global
string table, and thus the old "a" value can be garbage collected.

Well, for a C/C++ programmer, they ARE references, and I can't imagine
thinking otherwise. They are implemented as references/pointers in the
interpreter. Thinking in value semantics can simplify things, but there
will always be some cases where it doesn't work.

I think my example is a good one. If you have a graph in an adjacency list
representation, you don't want the nodes to be copies of each other. They
should just point to a global list of nodes. This can definitely matter...
the worst case is when you have a complete graph, where you would have
n(n-1)/2 (or n(n-1) for a complete directed graph) node objects where you
only need n. Instead of 100,000 nodes (not an unreasonable size), you could
have 10 billion (totally unreasonable).

But it is a good question whether the garbage collection will make the
difference worth it, in most cases. I could reassign my references, but I
don't know when the old objects get deleted. That is, I can do a =
intern(a), but the old value of a could hang around in memory for who knows
how long. If anyone could shed some light on this, I would appreciate it.
"Terry Reedy" <tj*****@udel.e du> wrote in message
news:Qf******** ************@co mcast.com...

"Andy C" <an******@yahoo .com> wrote in message
news:64******** *************** ***@posting.goo gle.com...
I am new to python, so please bear with me if I am making some
conceptual error.

Basically I want to create a graph with an adjacency list
representation, but I don't want any of the adjacency lists to have
duplicate strings when it is avoidable. I have a function

createEdge
that adds an edge to the graph. The arguments will be distinct

since
they are read from text files. But basically I want to use the
dictionary as a string pool, and if the argument string equals
something in the pool already, don't use the argument string, just a
use a reference to something in the string pool already.


Thinking in terms of 'references' can both help and hinder. (There
have been some long recent discussion.)

Are you familiar with this?
help('intern')


Help on built-in function intern:

intern(...)
intern(string) -> string

``Intern'' the given string. This enters the string in the
(global)
table of interned strings whose purpose is to speed up dictionary
lookups.
Return the string itself or the previously interned string object
with the
same value.

TJR

Jul 18 '05 #5
Andy C:
Basically I want to create a graph with an adjacency list
representation, but I don't want any of the adjacency lists to have
duplicate strings when it is avoidable.
Took me a while but I finally figured out what you're asking.

Look in the docs for 'intern', which is a builtin. It's rarely used
because most people don't worry about this sort of duplication.
Is this a waste of time? Because I know in python I cannot be certain
that the argument strings that are read from files are even garbage
collected anyway. I could certainly do the job with duplicate
strings, but it would seem wasteful. I am a C programmer, and old
habits die hard.


You can never be certain that anything is every garbage collected.
Python-the-language makes no guarantees on that. Python-in-C
does make a stronger statement, but it's an implementation issue. As
a C programmer you'll be happy that it's a reference counted scheme
and so long as there are no cycles (which can't happen with a string),
the string will be deallocated when your code lets go of it.

This is true even of interned strings, at least in newer Pythons. If
no one references an interned string, it too goes away. (Older
Pythons make the strings immortal.)

Here's how you might change your code.

class GraphAccumulato r:
def __init__( self, fileFunction ):
self.fileFuncti on = fileFunction
self.adjList = {} # adjacency list

def createEdge( self, node1, node2 ):

# another way of doing by using a dictionary
node1 = intern(node1)
node2 = intern(node2)
adjList = self.adjList
if adjList.has_key ( node1 ):
adjList[ node1 ].append( node2 );
else:
adjList[ node1 ] = [ node2 ]; # singleton edge list

But personally I would put the intern outside of the graph
code - either in the reader or the code between the reader
and this one. Otherwise, consider:

# fully connect the graph
nodes = graph.adjList.k eys()
for n1 in nodes:
for n2 in nodes:
if n1 != n2:
graph.createEdg e(n1, n1)

This does extra interning even though it isn't needed.

But then again, normally I wouldn't worry about the interning
at all. ... Perhaps I should... Hmmm...

Andrew
da***@dalkescie ntific.com
Jul 18 '05 #6
In article <64************ **************@ posting.google. com>,
Andy C <an******@yahoo .com> wrote:

Basically I want to create a graph with an adjacency list
representation , but I don't want any of the adjacency lists to have
duplicate strings when it is avoidable. I have a function createEdge
that adds an edge to the graph. The arguments will be distinct since
they are read from text files. But basically I want to use the
dictionary as a string pool, and if the argument string equals
something in the pool already, don't use the argument string, just a
use a reference to something in the string pool already.


That makes some sense, but why use the string object for both key and
value? Just do

d[key] = True

You could optimize your coding style (if not your speed) in Python 2.3
by using sets. I'd recommend against using intern(), because in Python
2.2 and earlier, interned strings *never* get garbage collected. Even
in Python 2.3, you may end up with worse memory behavior.
--
Aahz (aa**@pythoncra ft.com) <*> http://www.pythoncraft.com/

This is Python. We don't care much about theory, except where it intersects
with useful practice. --Aahz
Jul 18 '05 #7
I don't see how I can do this and let me eliminate duplicates. I need to
assign the old duplicate string to the unique string that already exists.
Hence the question, how do I get a reference to the KEY value? I know I can
use keys() and do a linear search, but that is much more inefficient. I
would like to get a reference to the key value in the same time that it
takes to do a hash lookup (constant time).

Basically I would want to rewrite this section of the code I posted:

nodes = self.nodes
if nodes.has_key( node2 ):
node2 = nodes[ node2 ]
else:
nodes[ node2 ] = node2

This dictionary seems stupid, I agree. The keys and values are the same.
But in the first part of the if, I want to reassign node2 to an equivalent
string that already exists. How can I do that?

The intern solution seems reasonable, and it appears that it was designed
specifically for this problem. I wasn't aware of the implementation
problems. But I'm still curious about different ways to do it.
That makes some sense, but why use the string object for both key and
value? Just do

d[key] = True

You could optimize your coding style (if not your speed) in Python 2.3
by using sets. I'd recommend against using intern(), because in Python
2.2 and earlier, interned strings *never* get garbage collected. Even
in Python 2.3, you may end up with worse memory behavior.
--
Aahz (aa**@pythoncra ft.com) <*> http://www.pythoncraft.com/
This is Python. We don't care much about theory, except where it intersects with useful practice. --Aahz

Jul 18 '05 #8
On 2 Aug 2003 10:16:13 -0400, aa**@pythoncraf t.com (Aahz) wrote:
In article <w1************ ********@newssv r21.news.prodig y.com>,
Andy C <ay********@cor nell.edu> wrote:

I don't see how I can do this and let me eliminate duplicates. I need
to assign the old duplicate string to the unique string that already
exists. Hence the question, how do I get a reference to the KEY value?
I know I can use keys() and do a linear search, but that is much more
inefficient . I would like to get a reference to the key value in the
same time that it takes to do a hash lookup (constant time).


Ahhhh.... Right. Hmmmmm.... <thinks hard> You're correct, you do
need to set the value to the key. I think using a dict is better than
using intern(). Here's an optimization:

node2 = node_cache.setd efault(node2, node2)


I would sure be more comfortable with a change in nomenclature, e.g.,

node2name = node_name_cache .setdefault(nod e2name, node2name)

so that it is clear that you are dealing with name strings, not typical graph
vertex/node objects.

BTW, reading from a file, you can probably not in general avoid forward references,
(i.e., names of adjacent nodes yet to be defined), but they should be resolvable at the end,
so you could make nodes that refer to adjacent nodes directly instead of by name.
Nodes would carry a ref to their own (string) name, and go by that same name string in a graph dict,
so there should be no duplication of strings, just references to them. E.g.,
(I started to just sketch something, and ... well, it's hard to stop programming Python
before something is running at least preliminarily ;-) I subclassed dict to keep nodes in by name.

====< graph.py >============== =============== =============== =============
def boxit(s):
lines = s.splitlines()
wid = max(map(len,lin es))
ret = ['+-%s-+' % ('-'*wid)]
fmt = '| %%-%ds |'%wid
for line in lines: ret.append(fmt% line)
ret.append(ret[0])
return '\n'.join(ret)

def errep(Ex, *rest):
if __debug__: print boxit('%s: %s'%(Ex.__name_ _, rest and rest[0])) # and blunder on
else: raise Ex(*rest)

class Node(object):
__slots__ = 'name', 'adj_list'
def __init__(self, name):
self.name = name
self.adj_list = None # signify undefined, vs [] for no adjacent nodes

class Graph(dict):
def load(self, infile, print_lines=Fal se):
for line in infile:
if print_lines: print line.rstrip()
sline = line.strip()
if not sline or sline.startswit h('#'): continue
node_def_names = line.split() # assume line like 'nodename adj_node1_name adj_node2_name ...'
node_name = node_def_names[0]
if node_name in self:
# don't allow re-definition
errep(ValueErro r, 'Node %r being re-defined:\n new adj: %r\n old adj: %r' %(
node_name, node_def_names[1:], self[node_name].adj_list))
else:
self[node_name] = Node(node_name)
# set adj list, making making this a defined node
self[node_name].adj_list = node_def_names[1:]
# change adj lists to direct references to adjacent nodes
for node in self.values():
adj_list = node.adj_list
assert adj_list is not None
for i in xrange(len(adj_ list)):
adj_name = adj_list[i]
try:
adj_list[i] = self[adj_name]
except KeyError:
errep( ValueError, 'node %r refers to undefined adj node %r' % (node.name, adj_name))
# continuing if debugging
adj_list[i] = Node(adj_name) # adj_list None (vs []) signifies undefined
def show(self):
node_names = self.keys()
node_names.sort ()
visited = {}
ret = []
def prtree(node, indent=0):
if node.name in visited:
if indent: ret.append('%s% s (see previous)'%(ind ent*' ',node.name))
else:
visited[node.name]=True
if node.adj_list is None:
ret.append('%s% s (undefined) '%(indent*' ',node.name))
return
ret.append('%s% s'%(indent*' ',node.name))
for adj in node.adj_list:
prtree(adj, indent+1)
for node_name in node_names:
prtree(self[node_name])
ret.append('')
return '\n'.join(ret)

def test(filepath=' '):
graph = Graph() # name-string -> Node instance
if not filepath:
import StringIO
f_graph = StringIO.String IO("""
A B C D
B E F
C G H
G A
D B
E
F
H
D E F
X E Y
""")
else:
f_graph = file(filepath)

print 'Loading graph from %s ...\n--' % (filepath or '(test)')
graph.load(f_gr aph, True) # print the lines read
print '--'
f_graph.close()

print graph.show()

if __name__ == '__main__':
import sys
args = sys.argv[1:]
test(args and args[0] or '')
=============== =============== =============== =============== =============

Result:

[16:33] C:\pywk\clp\pp> graph.py
Loading graph from (test) ...
--

A B C D
B E F
C G H
G A
D B
E
F
H
D E F
+----------------------------------------+
| ValueError: Node 'D' being re-defined: |
| new adj: ['E', 'F'] |
| old adj: ['B'] |
+----------------------------------------+
X E Y

+-------------------------------------------------------+
| ValueError: node 'X' refers to undefined adj node 'Y' |
+-------------------------------------------------------+
--
A
B
E
F
C
G
A (see previous)
H
D
B (see previous)
X
E (see previous)
Y (undefined)

(Not tested beyond what you see. NTBWYS? ;-)

Regards,
Bengt Richter
Jul 18 '05 #9
On Sat, 02 Aug 2003 23:34:20 GMT, "Andy C" <ay********@cor nell.edu> wrote:
Google("intern-like memory saver").


Well, that seems very sensical, so how come it hasn't made it into the
language? And what's wrong with intern? (Though intern only works on
strings, not for immutable objects in general. I believe someone was asking
a pretty much identical question here, and someone replied with the
'memoize' pattern).

Can this be done without C, now that you can subclass the built-in
dictionary?

For a subclass of dict that may be of interest, see my post in this thread
timestamped less than 2 minutes before this post of yours ;-)

Regards,
Bengt Richter
Jul 18 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

36
28123
by: Riccardo Rossi | last post by:
Hi all! How does Python pass arguments to a function? By value or by reference? Thanks, Riccardo Rossi.
14
1925
by: Arik Funke | last post by:
Hi together, following code does compile fine on MS VC but not on gcc. What am I doing wrong? Is there a universal coding standard? Cheers, Arik --- class abc {
17
4206
by: Karl Irvin | last post by:
To use the Textstream object, I had to set a Reference to the Microsoft Scripting Runtime. This works good with A2000 Is the Scripting Runtime included with A2002 and A2003 so the Reference won't be broken when my app is opened with those versions. Also is the Scripting Runtime included as part of the A2000 Runtime Engine which some of my customers use.
5
1429
by: raghu | last post by:
Hi All, I am a new user of Python and am having a bit of problem understanding the Reference counting and memory leakage issues. Requesting help from experienced users.... I wrote the following simple program.
4
14762
by: NullQwerty | last post by:
Hi folks, I have a Dictionary which contains a string key and an object value. I want the object value to point to a property in my class and I want it to be by reference, so that later on I can change the value of the property through the dictionary. I am having difficulty making the value be by reference. Is this possible? I've even tried unsuccessfully to use unsafe code with pointers.
4
2410
by: Steve Goldman | last post by:
Even asking this question probably demonstrates that I have a fundamental misunderstanding of how values and references work in C#, but here goes: I'd like to assign a reference to an arbitrary class member to another class member, so that I can later mutate the stored reference. I'd like to use this for an undo function - basically, when you make a change to a class member value, you call SaveMemberValue() to store the reference and...
5
1868
by: Ethan Strauss | last post by:
Hi, I have just started using Generic Collections for .Net 2.0 and so far they are working very nicely for me. But, I do have a couple of questions. If I have a Generic collection which has a type which is a reference type, is there a way to get Collection.ContainsKey(RefKey) and Collection to work with different instances of equal keys? For example, I have defined a class of "codon" and I have made a dictionary private...
2
2363
by: Andy B | last post by:
I have the following listView control on a page: <asp:ListView ID="ListView1" runat="server" ItemPlaceholderID="PlaceHolder1"> <ItemTemplate> <dl> <dt>
275
12381
by: Astley Le Jasper | last post by:
Sorry for the numpty question ... How do you find the reference name of an object? So if i have this bob = modulename.objectname() how do i find that the name is 'bob'
0
9589
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9423
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10045
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
9994
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9863
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8870
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
6673
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5298
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5447
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.