
How to find a label quickly?

Hi,

I am new to Python and I really love it.

I have a big text file in which each line looks like:

label 3
teststart 5
endtest 100
newrun 2345

I opened the file with uu=open('test.txt','r') and then read the data with
xx=uu.readlines()

xx now holds a list of the lines. I want to find specific labels
and read their data. Currently I do this with:

    for ss in xx:
        zz=ss.split( )
        if zz[0] = endtest:
            index=zz[1]

Since the file is big and I need to find more labels, this code runs slowly.
Is there any way to speed up the process? I thought of converting the data xx
from a list to a dictionary, so I can get the index quickly based on the
label. Can I do that efficiently?

Thanks

Frank


May 4 '07 #1
Are the labels unique? That is, are labels never repeated in the file? If
they can repeat, you will need to do some extra processing, because
dictionary keys must be unique.

Do you have control over the format of the test.txt file? If so, a small
change would put it into a format that the ConfigParser module can handle,
which would make it faster because it uses dictionaries.

[labels]
label=3
teststart=5
endtest=100
newrun=2345

With this you can have different sections [section] with labels under each
section. Use ConfigParser to read the file and then get options with
getint(section, option).
-Larry
May 4 '07 #2
Hello Frank,
> I am new to Python and I really love it.
The more you know, the deeper the love :)

> I have a big text file in which each line looks like:
>
> label 3
> teststart 5
> endtest 100
> newrun 2345
>
> I opened the file with uu=open('test.txt','r') and then read the data with
> xx=uu.readlines()
This reads the whole file into memory, which might be a problem.

> xx now holds a list of the lines. I want to find specific labels
> and read their data. Currently I do this with:
>
>     for ss in xx:
>         zz=ss.split( )
>         if zz[0] = endtest:
>             index=zz[1]
>
> Since the file is big and I need to find more labels, this code runs slowly.
> Is there any way to speed up the process? I thought of converting the data xx
> from a list to a dictionary, so I can get the index quickly based on the
> label. Can I do that efficiently?
IMO a better way is to not load the whole file into memory:

    # Untested
    # Start with every wanted label mapped to None.
    labels = dict.fromkeys(["endtest", "other_label"])
    # Iterate over the file line by line instead of calling readlines().
    for line in open("test.txt"):
        label, value = line.split()
        if label in labels:
            labels[label] = value.strip()

Another option is to use a fast external program (such as egrep):

    from os import popen
    labels = {}
    # egrep filters the file; Python only sees the matching lines.
    for line in popen("egrep 'endtest|other_label' test.txt"):
        label, value = line.strip().split()
        labels[label] = value
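
A variation on the single-pass scan, as a rough sketch rather than a tested drop-in: it assumes the first occurrence of each label is the one wanted, so it can stop reading as soon as every label has been seen. The function name find_labels and the in-memory sample are hypothetical, used here only so the snippet runs without test.txt.

```python
import io

def find_labels(lines, wanted):
    """Scan lines once and return {label: value} for the wanted labels."""
    remaining = set(wanted)
    found = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        label, value = parts
        if label in remaining:
            found[label] = value
            remaining.discard(label)
            if not remaining:
                break  # every wanted label has been seen; stop reading
    return found

# Hypothetical in-memory stand-in for open("test.txt").
sample = io.StringIO("label 3\nteststart 5\nendtest 100\nnewrun 2345\n")
result = find_labels(sample, ["endtest", "teststart"])
print(result)
```

If the last occurrence of a label is the one that matters, the early break has to go and the whole file must be read.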

HTH,
--
Miki <mi*********@gmail.com>
http://pythonwise.blogspot.com/

May 4 '07 #3
"wang frank" <fw*@hotmail.co.jp> wrote:
> Hi,
>
> I am new to Python and I really love it.
>
> I have a big text file in which each line looks like:
>
> label 3
> teststart 5
> endtest 100
> newrun 2345
>
> I opened the file with uu=open('test.txt','r') and then read the data with
> xx=uu.readlines()

First suggestion: never use readlines() unless you really want all the
lines in a list. Iterating over the file will probably be faster
(especially if you can sometimes abort the search without reading
all the way to the end).

> xx now holds a list of the lines. I want to find specific
> labels and read their data. Currently I do this with:
>
>     for ss in xx:
>         zz=ss.split( )
>         if zz[0] = endtest:
>             index=zz[1]

Ignoring the fact that what you wrote wouldn't compile, you could try:

    if ss.startswith('endtest '):
        ...
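
A small sketch of how that test might sit inside a loop that aborts early, assuming only the first matching line matters. The helper name first_value and the sample data are hypothetical. The trailing space in the prefix keeps a label such as endtest2 from matching.

```python
import io

def first_value(lines, label):
    """Return the value on the first line whose label matches, else None."""
    prefix = label + " "  # trailing space avoids matching e.g. "endtest2"
    for line in lines:
        if line.startswith(prefix):
            fields = line.split()
            if len(fields) == 2:
                return fields[1]  # returning here stops reading the file
    return None

# Hypothetical in-memory stand-in for the open file object.
sample = io.StringIO("label 3\nendtest2 7\nendtest 100\nnewrun 2345\n")
value = first_value(sample, "endtest")
print(value)
```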

> Since the file is big and I need to find more labels, this code runs
> slowly. Is there any way to speed up the process? I thought of converting
> the data xx from a list to a dictionary, so I can get the index quickly
> based on the label. Can I do that efficiently?

Yes, if you need to do this more than once you want to avoid scanning the
file repeatedly. So long as you are confident that every line in the file
has exactly two fields:

    lookuptable = dict(s.split() for s in uu)

is about as efficient as you are going to get.
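
Note that dict() keeps only the last value when a label repeats. If labels are not unique, one possible approach (a sketch, not the only way) is to collect every value per label. The helper name build_lookup is hypothetical, and converting values with int() assumes they are all integers, as in the sample lines of the question.

```python
from collections import defaultdict
import io

def build_lookup(lines):
    """Map each label to the list of all values seen for it."""
    lookup = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        label, value = fields
        lookup[label].append(int(value))  # assumption: values are integers
    return lookup

# Hypothetical in-memory stand-in for the open file object uu.
sample = io.StringIO("label 3\nendtest 100\nnewrun 2345\nendtest 250\n")
lookup = build_lookup(sample)
print(lookup["endtest"])
```

Building the table is a single pass over the file; every lookup afterwards is a dictionary access rather than a rescan.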
May 4 '07 #4
