
Pattern matching from a text document

Ben
I'm currently trying to develop a demonstrator in Python for an
ontology of a football team. At present all the fit players are
exported to a text document.

The program reads the document in and splits it into a list of strings,
one per line (since each fit player and their attributes are entered
line by line in the text document), using list = target.splitlines()

The program then performs a loop like so:

    while foo > 0:
        if len(list) == 0:
            break
        else:
            pat = "([a-z]+)(\s+)([a-z]+)(\s+)([a-z]+)(\s+)(\d{1})(\d{1})(\d{1})(\d{1})(\d{1})([a-z]+)"
            ph = re.compile(pat,re.IGNORECASE)

            match = ph.match(list[1])

            forename = match.group(1)
            surname = match.group(3)
            attacking = match.group(7)
            defending = match.group(8)
            fitness = match.group(9)

            print forename
            print len(list)
            del list[0]

The two main problems I'm having are that the first entry in the
list is not printing and, once I have overcome that, that I need
each player and their related variables to be stored separately. This
is not happening at present because each time the loop runs it
overwrites the value in each variable.

Any help would be greatly appreciated.

Ben.

Jul 18 '05 #1


Ben, can you post a sample line from the document and indicate the fields you want to extract? I'm
sure it will be easier to help you this way.

George
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"If a slave say to his master: "You are not my master," if they convict
him his master shall cut off his ear."

Hammurabi's Code of Laws
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jul 18 '05 #2

First, if you're going to loop over each line, do it like this:

    for line in open('playerlist.txt'):
        pass  # do stuff here

Second, this statement is referencing the *second* item in the list,
not the first:

match = ph.match(list[1])

Third, a simple splitting of the lines by some delimiter character
would be easier than regular expressions, but whatever floats your
boat. If you insist on using regexen, then you should compile the
pattern before the loop. No need to do it over and over again.

Fourth, if you want to create a list of players in memory, then you
need either a class or some other structure to represent each player,
and then you need to add them to some kind of list as you go. Like
this:

    import re

    pat = r"([a-z]+)(\s+)([a-z]+)(\s+)([a-z]+)(\s+)(\d{1})(\d{1})(\d{1})(\d{1})(\d{1})([a-z]+)"

    # Compile once, before the loop.
    ph = re.compile(pat, re.IGNORECASE)
    players = []
    for line in open('playerlist.txt'):
        match = ph.match(line)
        # One dict per player, so nothing gets overwritten.
        player = {
            'forename' : match.group(1),
            'surname' : match.group(3),
            'attacking' : match.group(7),
            'defending' : match.group(8),
            'fitness' : match.group(9)
        }
        players.append(player)
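
One caveat with the loop above: ph.match returns None for any line that does not fit the pattern (a blank line, for instance), and calling match.group on None raises AttributeError. Below is a minimal sketch that skips such lines. It assumes each line is two name words, a class word, five digits and a trailing word; the function name parse_players and the named groups are illustrative choices, not something from the thread.

```python
import re

# Compiled once, outside the loop. The group names are illustrative.
PLAYER_RE = re.compile(
    r"(?P<forename>[a-z]+)\s+(?P<surname>[a-z]+)\s+[a-z]+\s+"
    r"(?P<ratings>\d{5})(?P<foot>[a-z]+)",
    re.IGNORECASE,
)


def parse_players(lines):
    """Return one dict per matching line; skip lines that do not match."""
    players = []
    for line in lines:
        match = PLAYER_RE.match(line)
        if match is None:  # blank or malformed line
            continue
        players.append(match.groupdict())
    return players


# In-memory lines stand in for the file so the sketch runs on its own.
sample = [
    "Freddie Ljungberg Player 02808right",
    "",
    "Ashley Cole Player 17705left",
]
players = parse_players(sample)
print(players[0]["forename"], players[1]["foot"])
```

Whether to skip bad lines silently or to report them depends on how much you trust the export; skipping is only the simplest option.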

Jul 18 '05 #3

Ben,

Others have answered your specific questions, but I thought
I'd use this opportunity to make a general statement. Unlike
some other programming languages, Python doesn't make the names
of its built-ins keywords. You should never, ever, ever name a
variable 'list' (the same is true of dict, tuple, str, ...).
When you do, your variable masks the built-in for the rest of
that scope. If this hasn't bitten you before, it will at some
point.
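
A small illustration of the masking described above. The function names shadowed and unshadowed are made up for this sketch, and the shadowing is kept inside a function so that it does not leak into the rest of the program.

```python
def shadowed():
    # The local name 'list' hides the built-in inside this function.
    list = "Thierry Henry".split()
    try:
        return list("abc")  # tries to call the list object, not the built-in
    except TypeError as exc:
        return exc


def unshadowed():
    # A different variable name leaves the built-in usable.
    names = "Thierry Henry".split()
    return list("abc"), names


print(shadowed())
print(unshadowed())
```

The first call ends in a TypeError because the name now refers to a list object, which is not callable. The second works because the built-in is still visible.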

It really doesn't sound like you need the complexity of regular
expressions just to read in some data. You might want to
investigate the csv module (for reading comma-delimited files),
or you might be able to use the simple .split() method (for
tab-delimited files).
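
A rough sketch of both suggestions. The comma-delimited text is a hypothetical layout, since the real file format has not been posted at this point in the thread; csv.reader and str.split are the standard-library calls being shown.

```python
import csv
import io

# Hypothetical comma-delimited export; the real file layout may differ.
csv_text = (
    "Dennis,Bergkamp,Player,90705either\n"
    "Thierry,Henry,Player,90906either\n"
)
rows = list(csv.reader(io.StringIO(csv_text)))

# With no argument, split() breaks on any run of whitespace, tabs included.
fields = "Dennis\tBergkamp\tPlayer\t90705either".split()

print(rows[0])
print(fields)
```

Either way each record comes back as a plain list of strings, so the fields can be picked out by position without a pattern.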

Hope this info helps.

Regards,
Larry Bates

Jul 18 '05 #4

Ben



Below are a few sample lines. Each has the name, followed by the class
(not important), followed by 5 digits, each of which can range 1-9 and
each of which details a different ability, such as fitness, attacking
ability, etc. Finally, the preferred foot is stated.

Freddie Ljungberg Player 02808right
Dennis Bergkamp Player 90705either
Thierry Henry Player 90906either
Ashley Cole Player 17705left

Thanks for your help

ben

Jul 18 '05 #5

On 24 Mar 2005 06:16:12 -0800, Ben wrote:

Below are a few sample lines. Each has the name, followed by the class
(not important), followed by 5 digits, each of which can range 1-9 and
each of which details a different ability, such as fitness, attacking
ability, etc. Finally, the preferred foot is stated.

Freddie Ljungberg Player 02808right
Dennis Bergkamp Player 90705either
Thierry Henry Player 90906either
Ashley Cole Player 17705left

    filename = 'players' # to adapt
    players = {} # mapping of name to abilities
    fin = open(filename)
    for line in fin:
        firstname, lastname, type_, ability = line.split()
        players[(lastname, firstname)] = Ability(ability)
    fin.close()

where Ability can be a simple function that returns the processed
information from the last word (string) of each line, or a class that
stores and manages such information:

    class Ability(object):
        def __init__(self, ability):
            digits = ability[:5]
            self.details = [int(d) for d in digits] # list of details
            self.preferred_foot = ability[5:]
            # and so on ....
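
To try the two pieces above together without a file on disk, here is a small self-contained sketch. The in-memory lines list is an assumption standing in for the opened file; everything else follows the code above.

```python
class Ability(object):
    def __init__(self, ability):
        # The first five characters are the single-digit ratings.
        self.details = [int(d) for d in ability[:5]]
        # Whatever follows the digits is the preferred foot.
        self.preferred_foot = ability[5:]


# Stand-in for the lines read from the file.
lines = [
    "Freddie Ljungberg Player 02808right",
    "Thierry Henry Player 90906either",
]

players = {}  # maps (lastname, firstname) to an Ability
for line in lines:
    firstname, lastname, type_, ability = line.split()
    players[(lastname, firstname)] = Ability(ability)

henry = players[("Henry", "Thierry")]
print(henry.details, henry.preferred_foot)
```

Note that the four-way unpacking only works when a line has exactly four whitespace-separated words; a player with a two-word surname would raise ValueError.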


Jul 18 '05 #6
