How to find a label quickly?

wang frank

Hi,

I am a new Python user and I really love it.

I have a big text file where each line looks like:

label 3
teststart 5
endtest 100
newrun 2345

I opened the file with uu=open('test.txt','r') and then read the data with
xx=uu.readlines()

xx contains a list of the lines. I want to find specific labels and read
their data. Currently, I do this with:

for ss in xx:
    zz=ss.split( )
    if zz[0] = endtest:
        index=zz[1]

Since the file is big and I need to find more labels, this code runs slowly.
Is there any way to speed up the process? I thought of converting the data xx
from a list to a dictionary, so I can get the index quickly based on the
label. Can I do that efficiently?

Thanks

Frank


May 4 '07 #1


Larry Bates


Are the labels unique, that is, is no label ever repeated in the file? If
not, you will need some extra processing, because dictionary keys must be
unique.
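If labels can repeat, one option (a sketch, not something posted in the thread) is to map each label to a list of all its values. The function name `load_multi` is hypothetical, and the sketch assumes every useful line has exactly two whitespace-separated fields:

```python
from collections import defaultdict


def load_multi(lines):
    """Map each label to a list of every value seen for it, in file order."""
    table = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        label, value = fields
        table[label].append(value)
    return dict(table)


sample = ["label 3\n", "endtest 100\n", "endtest 250\n"]
print(load_multi(sample))  # {'label': ['3'], 'endtest': ['100', '250']}
```

With a real file you would pass the open file object instead of `sample`, so lines are read one at a time.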

Do you have control over the format of the test.txt file? If so, a small
change would put it into a format that the ConfigParser module can handle,
which would make lookups fast because it uses dictionaries.
[labels]
label=3
teststart=5
endtest=100
newrun=2345

With this you can have different sections ([section]) with labels under each
section. Use ConfigParser to read this and then get options with
get(section, option), or getint(section, option) for integer values.
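In Python 3 the module is spelled `configparser`. A minimal sketch of reading the format above; it uses `read_string` only so the example is self-contained (with a real file you would call `config.read("test.txt")` instead). Note that option names are case-insensitive by default:

```python
import configparser

text = """\
[labels]
label=3
teststart=5
endtest=100
newrun=2345
"""

config = configparser.ConfigParser()
config.read_string(text)

# getint() converts the stored string value to an int
endtest = config.getint("labels", "endtest")
print(endtest)  # 100
```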

-Larry

May 4 '07 #2

Miki

Hello Frank,

I am a new Python user and I really love it.

The more you know, the deeper the love :)

I have a big text file where each line looks like:

label 3
teststart 5
endtest 100
newrun 2345

I opened the file with uu=open('test.txt','r') and then read the data with
xx=uu.readlines()

This reads the whole file into memory, which might be a problem.

xx contains a list of the lines. I want to find specific labels and read
their data. Currently, I do this with:

for ss in xx:
    zz=ss.split( )
    if zz[0] = endtest:
        index=zz[1]

Since the file is big and I need to find more labels, this code runs slowly.
Is there any way to speed up the process? I thought of converting the data xx
from a list to a dictionary, so I can get the index quickly based on the
label. Can I do that efficiently?

IMO a better way is either not to load the whole file into memory:
# Untested
labels = {}.fromkeys(["endtest", "other_label"])
for line in open("test.txt"):
    label, value = line.split()
    if label in labels:
        labels[label] = value.strip()

Another option is to use an external fast program (such as egrep):
from os import popen
labels = {}
for line in popen("egrep 'endtest|other_label' test.txt"):
    label, value = line.strip().split()
    labels[label] = value
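In current Python, `subprocess` is the usual replacement for `os.popen`. A sketch of the same egrep idea, assuming a `grep` that supports `-E` is on the PATH and that labels are plain words with no regex metacharacters; the function name `grep_labels` is hypothetical:

```python
import subprocess


def grep_labels(path, wanted):
    """Return {label: value} for lines whose first field is in `wanted`."""
    # Anchor at line start and require whitespace after the label, so
    # 'endtest' does not also match lines such as 'oldendtest 7'.
    pattern = "^(" + "|".join(wanted) + ")[[:space:]]"
    result = subprocess.run(
        ["grep", "-E", pattern, path],
        capture_output=True,
        text=True,
    )
    # grep exits with 1 when nothing matches; only >1 means a real error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    labels = {}
    for line in result.stdout.splitlines():
        label, value = line.split()
        labels[label] = value
    return labels
```

For the sample file, `grep_labels("test.txt", ["endtest", "newrun"])` would return `{'endtest': '100', 'newrun': '2345'}`. Passing the arguments as a list avoids going through a shell, so no extra quoting is needed.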

HTH,
--
Miki <mi*********@gmail.com>
http://pythonwise.blogspot.com/

May 4 '07 #3

Duncan Booth

"wang frank" <fw*@hotmail.co.jpwrote:

Hi,

I am a new Python user and I really love it.

I have a big text file where each line looks like:

label 3
teststart 5
endtest 100
newrun 2345

I opened the file with uu=open('test.txt','r') and then read the data with
xx=uu.readlines()

First suggestion: never use readlines() unless you really want all the
lines in a list. Iterating over the file will probably be faster
(especially if some of the time you can abort the search without reading
all the way to the end).
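A sketch of that point: iterate over the file object directly and stop as soon as every wanted label has been seen. The function name `find_labels` is hypothetical; it assumes two fields per line and keeps the first occurrence of each label:

```python
def find_labels(lines, wanted):
    """Scan lines once, stopping early once all wanted labels are found."""
    remaining = set(wanted)
    found = {}
    for line in lines:  # a file object yields one line at a time
        fields = line.split()
        if len(fields) == 2 and fields[0] in remaining:
            found[fields[0]] = fields[1]
            remaining.discard(fields[0])
            if not remaining:
                break  # nothing left to look for; skip the rest of the file
    return found


# Usage with a real file:
# with open("test.txt") as f:
#     print(find_labels(f, ["endtest", "newrun"]))
```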

xx contains a list of the lines. I want to find specific labels and read
their data. Currently, I do this with:

for ss in xx:
    zz=ss.split( )
    if zz[0] = endtest:
        index=zz[1]

Ignoring the fact that what you wrote wouldn't compile, you could try:

if ss.startswith('endtest '):
    ...
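For reference, a runnable version of Frank's loop with the compile errors fixed (`==` instead of `=`, and `'endtest'` as a string) and `startswith` used for the test. The sample lines stand in for the file and are illustrative only:

```python
lines = ["label 3\n", "teststart 5\n", "endtest 100\n", "newrun 2345\n"]

index = None
for ss in lines:
    # The trailing space stops 'endtest' from matching e.g. 'endtests 7'
    if ss.startswith("endtest "):
        index = ss.split()[1]
        break  # drop this if a later occurrence should win instead

print(index)  # 100
```

Note that `index` is the string `'100'`; wrap it in `int()` if you need a number.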

Since the file is big and I need to find more labels, this code runs
slowly. Is there any way to speed up the process? I thought of converting
the data xx from a list to a dictionary, so I can get the index quickly
based on the label. Can I do that efficiently?

Yes, if you need to do this more than once you want to avoid scanning the
file repeatedly. So long as you are confident that every line in the file
is exactly two fields:

lookuptable = dict(s.split() for s in uu)

is about as efficient as you are going to get.
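If you cannot guarantee that every line has exactly two fields, a slightly more defensive sketch skips blank or malformed lines instead of letting `dict()` raise `ValueError`. The function name `build_table` is hypothetical; as with the one-liner, a repeated label keeps its last value:

```python
def build_table(lines):
    """Build {label: value}, skipping lines that do not have exactly two fields."""
    pairs = (line.split() for line in lines)
    return dict(p for p in pairs if len(p) == 2)


table = build_table(["label 3\n", "\n", "endtest 100\n", "bad line here\n"])
print(table["endtest"])  # 100
```

Build the table once, then every later lookup is a single dictionary access instead of another pass over the file.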

May 4 '07 #4
