
Pattern matching from a text document

Ben
I'm currently trying to develop a demonstrator in Python for an
ontology of a football team. At present all the fit players are
exported to a text document.

The program reads the document in and splits it into a list of
strings, one per line (each fit player and their attributes are
entered on their own line in the text document), using
list = target.splitlines()

The program then performs a loop like so:

while foo > 0:
    if len(list) == 0:
        break
    else:
        pat = "([a-z]+)(\s+)([a-z]+)(\s+)([a-z]+)(\s+)(\d{1})(\d{1})(\d{1})(\d{1})(\d{1})([a-z]+)"
        ph = re.compile(pat,re.IGNORECASE)

        match = ph.match(list[1])

        forename = match.group(1)
        surname = match.group(3)
        attacking = match.group(7)
        defending = match.group(8)
        fitness = match.group(9)

        print forename
        print len(list)
        del list[0]

The two main problems I'm having are these. First, the first entry in
the list is not printing. Second, once I have overcome this problem, I
need each player and their related variables to be stored separately.
This is not happening at present because each time the loop runs it
overwrites the value in each variable.

Any help would be greatly appreciated.

Ben.

Jul 18 '05 #1

Ben, can you post a sample line from the document and indicate the fields you want to extract? I'm
sure it will be easier to help you this way.

George
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"If a slave say to his master: "You are not my master," if they convict
him his master shall cut off his ear."

Hammurabi's Code of Laws
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jul 18 '05 #2
First, if you're going to loop over each line, do it like this:

for line in open('playerlist.txt'):
    # do stuff here
    pass

Second, this statement is referencing the *second* item in the list,
not the first:

match = ph.match(list[1])
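To see why the first entry never prints, here is a minimal sketch (Python 3) that mimics the original loop on Ben's sample lines, which he posts later in the thread. The function names `names_buggy` and `names_fixed` are hypothetical, chosen only for illustration. The original loop would also stop with an IndexError once a single line is left, because `list[1]` no longer exists; the sketch catches that so the output can be compared.

```python
# Ben's sample lines, as posted later in the thread.
sample = [
    "Freddie Ljungberg Player 02808right",
    "Dennis Bergkamp Player 90705either",
    "Thierry Henry Player 90906either",
    "Ashley Cole Player 17705left",
]


def names_buggy(lines):
    """Mimics the original loop: reads lines[1] but deletes lines[0]."""
    lines = list(lines)  # work on a copy so the caller's list is untouched
    seen = []
    while lines:
        try:
            seen.append(lines[1].split()[0])
        except IndexError:
            break  # one line left: lines[1] does not exist
        del lines[0]
    return seen


def names_fixed(lines):
    """Iterates directly, so every line, including the first, is visited."""
    return [line.split()[0] for line in lines]


print(names_buggy(sample))  # ['Dennis', 'Thierry', 'Ashley']
print(names_fixed(sample))  # ['Freddie', 'Dennis', 'Thierry', 'Ashley']
```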

Third, a simple splitting of the lines by some delimiter character
would be easier than regular expressions, but whatever floats your
boat. If you insist on using regexen, then you should compile the
pattern before the loop. No need to do it over and over again.
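A minimal sketch of both approaches, assuming each line has exactly four whitespace-separated fields as in Ben's later sample (`Freddie Ljungberg Player 02808right`). The helper names `parse_with_split`, `parse_with_regex` and `PATTERN` are hypothetical. The pattern is compiled once at module level and written as a raw string; `(\d{1})` means the same as `(\d)`.

```python
import re

# Compiled once, outside any loop. Same groups as the pattern in the thread.
PATTERN = re.compile(
    r"([a-z]+)(\s+)([a-z]+)(\s+)([a-z]+)(\s+)(\d)(\d)(\d)(\d)(\d)([a-z]+)",
    re.IGNORECASE,
)


def parse_with_split(line):
    """Split on whitespace; assumes exactly four fields per line."""
    forename, surname, _kind, code = line.split()
    return forename, surname, code[:5], code[5:]


def parse_with_regex(line):
    """Return the same tuple via the regex, or None if the line does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    digits = "".join(m.group(7, 8, 9, 10, 11))  # group() with several args returns a tuple
    return m.group(1), m.group(3), digits, m.group(12)
```

Both return `('Freddie', 'Ljungberg', '02808', 'right')` for the sample line above; the split version is shorter and needs no group counting.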

Fourth, if you want to create a list of players in memory, then you
need either a class or some other structure to represent each player,
and then you need to add them to some kind of list as you go. Like
this:

import re

pat = r"([a-z]+)(\s+)([a-z]+)(\s+)([a-z]+)(\s+)(\d{1})(\d{1})(\d{1})(\d{1})(\d{1})([a-z]+)"

ph = re.compile(pat, re.IGNORECASE)
players = []
with open('playerlist.txt') as f:
    for line in f:
        match = ph.match(line)
        if match is None:
            continue  # skip blank or malformed lines
        player = {
            'forename': match.group(1),
            'surname': match.group(3),
            'attacking': match.group(7),
            'defending': match.group(8),
            'fitness': match.group(9),
        }
        players.append(player)
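If a class is preferred over a dict, a dataclass is one option. This is a sketch that assumes, from Ben's original code, that the first three digits are attacking, defending and fitness; `Player` and `load_players` are hypothetical names, and lines are assumed to have exactly four fields (a blank line would make the unpacking fail). Because a new object is appended on each pass, nothing gets overwritten.

```python
from dataclasses import dataclass


@dataclass
class Player:
    forename: str
    surname: str
    attacking: int
    defending: int
    fitness: int


def load_players(lines):
    """Build one Player per line; assumes 'Forename Surname Class 12345foot'."""
    players = []  # a new object per line, so earlier players are kept
    for line in lines:
        forename, surname, _kind, code = line.split()
        players.append(
            Player(forename, surname, int(code[0]), int(code[1]), int(code[2]))
        )
    return players
```

An open file object can be passed as `lines` just like a list of strings.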

Jul 18 '05 #3
Ben,

Others have answered your specific questions, but I thought
I'd use this opportunity to make a general statement. Unlike
some other programming languages, Python doesn't make its
built-in functions keywords. You should never, ever, ever name a
variable 'list' (the same is true of dict, tuple, str, ...).
When you do, you mask the built-in Python function with your
variable. If this hasn't bitten you before, it will at some
point.
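A small demonstration of the masking, with hypothetical function names: once `list` is assigned inside a function, calling `list(...)` there reaches the local list object and raises TypeError, while a descriptive name leaves the built-in usable.

```python
def shadowing_demo():
    # Assigning to 'list' makes it a local name that masks the built-in type.
    list = ["Freddie Ljungberg Player 02808right"]
    try:
        return list("abc")  # calls the local list object, not the built-in
    except TypeError as exc:  # 'list' object is not callable
        return exc


def renamed_demo():
    lines = ["Freddie Ljungberg Player 02808right"]  # a descriptive name instead
    return list("abc")  # the built-in is still reachable
```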

It really doesn't sound like you require regular-expression
complexity just to read in some data. You might want to
investigate the csv module (for reading comma-delimited files),
or you might just be able to use the simple .split() method (for
tab-delimited files).
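A sketch of the csv approach, assuming the export uses single spaces between fields as in Ben's sample lines; `io.StringIO` stands in for the real file, which would normally be opened with `open(path, newline="")`. For comma- or tab-delimited exports, `delimiter` would be `','` or `'\t'` instead.

```python
import csv
import io

# Stand-in for the exported player file (hypothetical contents from the thread).
data = io.StringIO(
    "Freddie Ljungberg Player 02808right\n"
    "Dennis Bergkamp Player 90705either\n"
)

# skipinitialspace=True ignores spaces immediately following a delimiter.
reader = csv.reader(data, delimiter=" ", skipinitialspace=True)
rows = [row for row in reader]  # each row is a list of field strings
```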

Hope this info helps.

Regards,
Larry Bates

Jul 18 '05 #4
Ben

George Sakkis wrote:
Ben, can you post a sample line from the document and indicate the
fields you want to extract? I'm sure it will be easier to help you this way.



Below are a few sample lines. Each has the name, followed by the class
(not important), followed by 5 digits, each of which can range 1-9 and
each of which details a different ability, such as fitness or attacking
ability. Finally, the preferred foot is stated.

Freddie Ljungberg Player 02808right
Dennis Bergkamp Player 90705either
Thierry Henry Player 90906either
Ashley Cole Player 17705left
Thanks for your help

ben
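Given this format, a regular expression with named groups avoids counting parentheses to find group 7, 8 or 9. A sketch, assuming every line is forename, surname, class, then five digits run together with the preferred foot; `PLAYER_RE` and `parse` are hypothetical names. `\d{5}` also accepts 0, which appears in the sample lines.

```python
import re

# Named groups make the fields self-describing instead of group(7), group(8), ...
PLAYER_RE = re.compile(
    r"(?P<forename>[a-z]+)\s+(?P<surname>[a-z]+)\s+(?P<kind>[a-z]+)\s+"
    r"(?P<digits>\d{5})(?P<foot>[a-z]+)$",
    re.IGNORECASE,
)


def parse(line):
    """Return a dict of fields, with the five digits converted to ints."""
    m = PLAYER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    info = m.groupdict()
    info["abilities"] = [int(d) for d in info.pop("digits")]
    return info
```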

Jul 18 '05 #5
On 24 Mar 2005 06:16:12 -0800, Ben wrote:

filename = 'players'  # to adapt
players = {}  # mapping of name to abilities
fin = open(filename)
for line in fin:
    firstname, lastname, type_, ability = line.split()
    players[(lastname, firstname)] = Ability(ability)
fin.close()

where Ability can be a simple function that returns the processed
information from the last word (string) of each line, or a class that
stores/manages that information:

class Ability(object):  # must be defined before the loading loop runs
    def __init__(self, ability):
        digits = ability[:5]
        self.details = [int(d) for d in digits]  # list of details
        self.preferred_foot = ability[5:]
        # and so on ....
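A self-contained, runnable version of this approach (Python 3), with `io.StringIO` standing in for the `players` file; the `load` function name and the blank-line guard are assumptions added for illustration. Keying on `(lastname, firstname)` also means `sorted(players)` lists players by surname.

```python
import io


class Ability:
    """Splits a field such as '02808right' into five digits and a foot."""

    def __init__(self, ability):
        self.details = [int(d) for d in ability[:5]]  # one int per ability
        self.preferred_foot = ability[5:]


def load(fin):
    players = {}  # (lastname, firstname) -> Ability
    for line in fin:
        if not line.strip():
            continue  # skip blank lines
        firstname, lastname, type_, ability = line.split()
        players[(lastname, firstname)] = Ability(ability)
    return players


sample = io.StringIO(
    "Thierry Henry Player 90906either\n"
    "Ashley Cole Player 17705left\n"
)
players = load(sample)
```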


Jul 18 '05 #6
