Lists, tuples and memory.

Elbert Lev

Hi, all!

Here is the problem:
I have a file, which contains a common dictionary - one word per line
(appr. 700KB and 70000 words). I have to read it in memory for future
"spell checking" of the words comming from the customer. The file is
presorted. So here it goes:

lstdict = map(lambda x: x.lower().strip (),
file("D:\\Commo nDictionary.txt "))

Works like a charm. It takes on my machine 0.7 seconds to do the trick
and python.exe (from task manager data) was using before this line is
executed
2636K, and after 5520K. The difference is 2884K. Not that bad, taking
into account that in C I'd read the file in memory (700K) scan for
CRs, count them, replace with '\0' and allocate the index vector of
the word beginnigs of the size I found while counting CRs. In this
particular case index vector would be almost 300K. So far so good!

Then I realized, that lstdict as a list is an overkill. Tuple is
enough in my case. So I modified the code:

t = tuple(file("D:\ \CommonDictiona ry.txt"))
lstdict = map(lambda x: x.lower().strip (), t)

This code works a little bit faster: 0.5 sec, but takes 5550K memory.
And maybe this is understandable: after all the first line creates a
list and a tuple and the second another tuple (all of the same size).

But then I decieded to compare this pieces:

t = tuple(file("D:\ \CommonDictiona ry.txt")) # 1
lstdict = map(lambda x: x.lower().strip (), t) # 2

lstdict = map(lambda x: x.lower().strip (),
file("D:\\Commo nDictionary.txt ")) # 3

As expected, after line 2 memory was 5550K, but after line 3 it jumped
to 7996K!!!

The question:

If refference counting is used, why the second assignment to lstdict
(line 3) did not free the memory alocated by the first one (line 2)
and reuse it?

So one more experiment:

t = tuple(file("D:\ \CommonDictiona ry.txt")) # 1
lstdict = map(lambda x: x.lower().strip (), t) # 2
del lstdict # 3
lstdict = map(lambda x: x.lower().strip (),
file("D:\\Commo nDictionary.txt ")) # 4

In this case executing line 4 did not add memory!

By the way, the search speed is very-very good when done by function:

def inDict(lstdict, word):
try:
return lstdict[bisect(lstdict, word) - 1] == word
except:
return False

500 words are tested in less then 0.03 seconds.

Jul 18 '05 #1

Subscribe Reply

2162

Christopher T King

On 15 Jul 2004, Elbert Lev wrote:

But then I decieded to compare this pieces:

t = tuple(file("D:\ \CommonDictiona ry.txt")) # 1
lstdict = map(lambda x: x.lower().strip (), t) # 2

lstdict = map(lambda x: x.lower().strip (),
file("D:\\Commo nDictionary.txt ")) # 3

As expected, after line 2 memory was 5550K, but after line 3 it jumped
to 7996K!!!

The question:

If refference counting is used, why the second assignment to lstdict
(line 3) did not free the memory alocated by the first one (line 2)
and reuse it?

Because after line 2, you still have a 2MB tuple sitting around in 't'. ;)

Try this:

t = tuple(file("D:\ \CommonDictiona ry.txt")) # 1
lstdict = map(lambda x: x.lower().strip (), t)
del t # 2

lstdict = map(lambda x: x.lower().strip (),
file("D:\\Commo nDictionary.txt ")) # 3

Python doesn't know that you're not going to try to access t at some
future point, even though there are no more accesses in your code.

A couple of other thoughts on what you're doing:

1) Stick with lists. Tuples are meant to be used not really as immutable
lists, but as a way to group related items of different types. e.g. the
sequence 'apple','orange ','pear','banan a' should be stored in a list,
whereas the group of related items 'apple','red',3 ,5 (presumably
describing an apple in some predefined manner, say,
fruit,color,wid th,height) should be stored in a tuple. At least that's
what Guido wants us to do. ;)

2) Assuming you're using a newer version of Python, try using a list
comprehension instead of map(). It's a little bit cleaner, and will
probably be a bit faster too:

lstdict = [x.lower().strip () for x in file("D:\\Commo nDictionary.txt ")]

If you've never worked with them before, list comprehensions are a new
syntax intended to replace map(), filter(), and other such constructs with
one unified syntax. They're usually very easy to read and write (as the
one above is) since they help you avoid using lambda.

Jul 18 '05 #2

Larry Bates

Any reason not to put the words into a dictionary?

Then your code becomes:

lstdict=dict([(x.lower().stri p(), None) for x in
file("D:\\Commo nDictionary.txt ")])
if lstdict.has_key (word):
do something

(not tested, but should be close)

I'll bet access is faster (even if loading is slightly slower).
You also benefit from the fact that the dictionary file
doesn't need to be kept sorted.

HTH,
Larry Bates
Syscon, Inc.

"Elbert Lev" <el*******@hotm ail.com> wrote in message
news:94******** *************** ***@posting.goo gle.com...

Hi, all!

Here is the problem:
I have a file, which contains a common dictionary - one word per line
(appr. 700KB and 70000 words). I have to read it in memory for future
"spell checking" of the words comming from the customer. The file is
presorted. So here it goes:

lstdict = map(lambda x: x.lower().strip (),
file("D:\\Commo nDictionary.txt "))

Works like a charm. It takes on my machine 0.7 seconds to do the trick
and python.exe (from task manager data) was using before this line is
executed
2636K, and after 5520K. The difference is 2884K. Not that bad, taking
into account that in C I'd read the file in memory (700K) scan for
CRs, count them, replace with '\0' and allocate the index vector of
the word beginnigs of the size I found while counting CRs. In this
particular case index vector would be almost 300K. So far so good!

Then I realized, that lstdict as a list is an overkill. Tuple is
enough in my case. So I modified the code:

t = tuple(file("D:\ \CommonDictiona ry.txt"))
lstdict = map(lambda x: x.lower().strip (), t)

This code works a little bit faster: 0.5 sec, but takes 5550K memory.
And maybe this is understandable: after all the first line creates a
list and a tuple and the second another tuple (all of the same size).

But then I decieded to compare this pieces:

t = tuple(file("D:\ \CommonDictiona ry.txt")) # 1
lstdict = map(lambda x: x.lower().strip (), t) # 2

lstdict = map(lambda x: x.lower().strip (),
file("D:\\Commo nDictionary.txt ")) # 3

As expected, after line 2 memory was 5550K, but after line 3 it jumped
to 7996K!!!

The question:

If refference counting is used, why the second assignment to lstdict
(line 3) did not free the memory alocated by the first one (line 2)
and reuse it?

So one more experiment:

t = tuple(file("D:\ \CommonDictiona ry.txt")) # 1
lstdict = map(lambda x: x.lower().strip (), t) # 2
del lstdict # 3
lstdict = map(lambda x: x.lower().strip (),
file("D:\\Commo nDictionary.txt ")) # 4

In this case executing line 4 did not add memory!

By the way, the search speed is very-very good when done by function:

def inDict(lstdict, word):
try:
return lstdict[bisect(lstdict, word) - 1] == word
except:
return False

500 words are tested in less then 0.03 seconds.

Jul 18 '05 #3

Roy Smith

"Larry Bates" <lb****@swamiso ft.com> wrote:

Any reason not to put the words into a dictionary?
[...]
I'll bet access is faster (even if loading is slightly slower).

One trick I've used when I have something with a long startup time
because it's parsing static files and building data structures is to
build the data structure once, then pickle it (see the pickle and
cPickle modules).

You production code then just has to call cPickle.load (file) and you're
up and running way faster than re-parsing the original data files.

Jul 18 '05 #4

Andrew MacIntyre

On Fri, 15 Jul 2004, Elbert Lev wrote:

{...}

But then I decieded to compare this pieces:

t = tuple(file("D:\ \CommonDictiona ry.txt")) # 1
lstdict = map(lambda x: x.lower().strip (), t) # 2

lstdict = map(lambda x: x.lower().strip (),
file("D:\\Commo nDictionary.txt ")) # 3

As expected, after line 2 memory was 5550K, but after line 3 it jumped
to 7996K!!!

The question:

If refference counting is used, why the second assignment to lstdict
(line 3) did not free the memory alocated by the first one (line 2)
and reuse it?

So one more experiment:

t = tuple(file("D:\ \CommonDictiona ry.txt")) # 1
lstdict = map(lambda x: x.lower().strip (), t) # 2
del lstdict # 3
lstdict = map(lambda x: x.lower().strip (),
file("D:\\Commo nDictionary.txt ")) # 4

In this case executing line 4 did not add memory!

The reason the memory is not released without the explicit del is that the
reference count of the original object is not adjusted until the
rebinding of "lstdict" takes place, which is after the new object is fully
constructed. The del in your experiment forces this adjustment.

--
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: an*****@bullsey e.apana.org.au (pref) | Snail: PO Box 370
an*****@pcug.or g.au (alt) | Belconnen ACT 2616
Web: http://www.andymac.org/ | Australia

Jul 18 '05 #5

Terry Reedy

"Larry Bates" <lb****@swamiso ft.com> wrote in message
news:CL******** ************@co mcast.com...

Any reason not to put the words into a dictionary?
for spell checking, I think I would...
Then your code becomes:

lstdict=dict([(x.lower().stri p(), None) for x in
file("D:\\Commo nDictionary.txt ")])

The list comp creates an unnecessary (and, in this case, large)
intermediate list.
In 2.4, removing the [] pair would result in a generator expression.
In the meanwhile...

def lostrip(src): .... for w in src: yield w.lower().strip (), None
.... t = dict(lostrip(['abc ', 'DEG\n']))
t

{'abc': None, 'deg': None}

Terry J. Reedy

Jul 18 '05 #6

Dennis Lee Bieber

On Thu, 15 Jul 2004 15:52:53 -0400, Christopher T King
<sq******@WPI.E DU> declaimed the following in comp.lang.pytho n:

whereas the group of related items 'apple','red',3 ,5 (presumably
describing an apple in some predefined manner, say,
fruit,color,wid th,height) should be stored in a tuple. At least that's
Must be a mutant red pear <G>

I don't think I've ever seen an apple with a 2" difference in
height vs width. Granted, a Red Delicious (being a common apple) does
have a bit of a height advantage...

Now, if it were 3 (per pound) and 5 (pounds) <G> (I think one
reference for carb count [diabetic here] has small apples at 3/pound,
large at 2/pound).

-- =============== =============== =============== =============== == <
wl*****@ix.netc om.com | Wulfraed Dennis Lee Bieber KD6MOG <
wu******@dm.net | Bestiaria Support Staff <
=============== =============== =============== =============== == <
Home Page: <http://www.dm.net/~wulfraed/> <
Overflow Page: <http://wlfraed.home.ne tcom.com/> <

Jul 18 '05 #7

Elbert Lev

Andrew MacIntyre <an*****@bullse ye.apana.org.au > wrote in message news:<ma******* *************** *************** @python.org>...

On Fri, 15 Jul 2004, Elbert Lev wrote:

{...}
But then I decieded to compare this pieces:

t = tuple(file("D:\ \CommonDictiona ry.txt")) # 1
lstdict = map(lambda x: x.lower().strip (), t) # 2

lstdict = map(lambda x: x.lower().strip (),
file("D:\\Commo nDictionary.txt ")) # 3

As expected, after line 2 memory was 5550K, but after line 3 it jumped
to 7996K!!!

The question:

If refference counting is used, why the second assignment to lstdict
(line 3) did not free the memory alocated by the first one (line 2)
and reuse it?

So one more experiment:

t = tuple(file("D:\ \CommonDictiona ry.txt")) # 1
lstdict = map(lambda x: x.lower().strip (), t) # 2
del lstdict # 3
lstdict = map(lambda x: x.lower().strip (),
file("D:\\Commo nDictionary.txt ")) # 4

In this case executing line 4 did not add memory!

The reason the memory is not released without the explicit del is that the
reference count of the original object is not adjusted until the
rebinding of "lstdict" takes place, which is after the new object is fully
constructed. The del in your experiment forces this adjustment.

Thanks!

This is what I wanted to know.

Jul 18 '05 #8

Arthur

On Thu, 15 Jul 2004 15:52:53 -0400, Christopher T King
<sq******@WPI.E DU> wrote:

1) Stick with lists. Tuples are meant to be used not really as immutable
lists, but as a way to group related items of different types. e.g. the
sequence 'apple','orange ','pear','banan a' should be stored in a list,
whereas the group of related items 'apple','red',3 ,5 (presumably
describing an apple in some predefined manner, say,
fruit,color,wi dth,height) should be stored in a tuple. At least that's
what Guido wants us to do. ;)
But with Fletcher's confession of sticking with tabs ... its anarchy
out there.

2) Assuming you're using a newer version of Python, try using a list
comprehensio n instead of map(). It's a little bit cleaner, and will
probably be a bit faster too:

Faster is good.

Modern and progessive governance. The tax incentive concept at work.

Art

Jul 18 '05 #9

John Lenton

On 15 Jul 2004 12:09:47 -0700, Elbert Lev <el*******@hotm ail.com> wrote:

Hi, all!

Here is the problem:
I have a file, which contains a common dictionary - one word per line
(appr. 700KB and 70000 words). I have to read it in memory for future
"spell checking" of the words comming from the customer. The file is
presorted. So here it goes:

lstdict = map(lambda x: x.lower().strip (),
file("D:\\Commo nDictionary.txt "))

Works like a charm. It takes on my machine 0.7 seconds to do the trick
and python.exe (from task manager data) was using before this line is
executed
2636K, and after 5520K. The difference is 2884K. Not that bad, taking
into account that in C I'd read the file in memory (700K) scan for
CRs, count them, replace with '\0' and allocate the index vector of
the word beginnigs of the size I found while counting CRs. In this
particular case index vector would be almost 300K. So far so good!

Then I realized, that lstdict as a list is an overkill. Tuple is
enough in my case. So I modified the code:

t = tuple(file("D:\ \CommonDictiona ry.txt"))
lstdict = map(lambda x: x.lower().strip (), t)

This code works a little bit faster: 0.5 sec, but takes 5550K memory.
And maybe this is understandable: after all the first line creates a
list and a tuple and the second another tuple (all of the same size).

any reason you didn't do it as

lstdict = tuple([x.strip().lower (), file("D:\\Commo nDictionary.txt ")])

?

also, you might find you can use chunks of the bisect module to speed
up your searching, if you find using a dict too expensive. That's
assuming the bisect module is written in C, which I haven't checked.

--
John Lenton (jl*****@gmail. com) -- Random fortune:
bash: fortune: command not found

Jul 18 '05 #10

Similar topics

7053

tuples vs lists

by: Ivan Voras | last post by:

Are there any performance/size differences between using tuples and using lists? -- -- Every sufficiently advanced magic is indistinguishable from technology - Arthur C Anticlarke

Python

2761

Lists and Tuples

by: Jeff Wagner | last post by:

I've spent most of the day playing around with lists and tuples to get a really good grasp on what you can do with them. I am still left with a question and that is, when should you choose a list or a tuple? I understand that a tuple is immutable and a list is mutable but there has to be more to it than just that. Everything I tried with a list worked the same with a tuple. So, what's the difference and why choose one over the other? Jeff

Python

1398

tuples or lists

by: beliavsky | last post by:

Tuples, unlike lists, are immutable, which I crudely translate to mean "their contents cannot be changed". Out of habit, I use only lists, not tuples, in my code. But I wonder if this is poor style -- putting a collection of data in a tuple, when possible, makes it easier to follow the code. You do not have to think about whether contents have changed from where the tuple was first defined. Is "use tuples instead of lists, when possible"...

Python

1214

is this use of lists normal?

by: Gabriel B. | last post by:

I just sent an email asking for hints on how to import data into a python program As i said earlier i'm really new to python and besides being confortable with the syntax, i'm not sure if i'm on the right track with the logic I'm asking for hints again here at the list because i think i'm already into premature optimization...

Python

3131

convert list of tuples into several lists

by: Oliver Eichler | last post by:

Hi, I can find examples to convert lists into a list of tuples like: >>> map(None,,) but how is it done vice versa? Oliver

Python

2182

Sorting x lists based on one list

by: Philippe C. Martin | last post by:

Hi, I'm looking for an easy algorithm - maybe Python can help: I start with X lists which intial sort is based on list #1. I want to reverse sort list #1 and have all other lists sorted accordingly. Any idea is welcome.

Python

1688

finding items that occur before or after an item in lists

by: Simon Forman | last post by:

I've got a function that I'd like to improve. It takes a list of lists and a "target" element, and it returns the set of the items in the lists that appear either before or after the target item. (Actually, it's a generator, and I use the set class outside of it to collect the unique items, but you get the idea. ;-) ) data = , , ,

Python

4168

Extract String From Enclosing Tuple

by: rshepard | last post by:

I'm a bit embarrassed to have to ask for help on this, but I'm not finding the solution in the docs I have here. Data are assembled for writing to a database table. A representative tuple looks like this: ('eco', "(u'Roads',)", 0.073969887301348305) Pysqlite doesn't like the format of the middle term: pysqlite2.dbapi2.InterfaceError: Error binding parameter 1 - probably

Python

1683

When are immutable tuples *essential*? Why can't you just use lists *everywhere* instead?

by: seberino | last post by:

Please help me think of an example where immutable tuples are essential. It seems that everywhere a tuple is used one could just as easily use a list instead. chris

Python

9480

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

10313

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...

C / C++

10147

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...

Online Marketing

8968

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...

Career Advice

6735

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

5378

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

5511

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

4044

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

C# / C Sharp

3643

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP