
* 'struct-like' list *

I'm still fairly new to python, so I need some guidance here...

I have a text file with lots of data. I only need some of the data. I
want to put the useful data into an [array of] struct-like
mechanism(s). The text file looks something like this:

[BUNCH OF NOT-USEFUL DATA....]

Name: David
Age: 108 Birthday: 061095 SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name........

I would like to have an array of "structs." Each struct has

struct Person{
    string Name;
    int Age;
    int Birthday;
    int SS;
}

I want to go through the file, filling up my list of structs.

My problems are:

1. How to search for the keywords "Name:", "Age:", etc. in the file...
2. How to implement some organized "list of lists" for the data
structure.

Any help is much appreciated.

Feb 6 '06 #1
Ernesto:
1. How to search for the keywords "Name:", "Age:", etc. in the file...
You could use regular expression matching:
http://www.python.org/doc/lib/module-re.html

Or plain string searches:
http://www.python.org/dev/doc/devel/...g-methods.html
2. How to implement some organized "list of lists" for the data
structure.


You could make it a list of bunches, for example:
http://aspn.activestate.com/ASPN/Coo...n/Recipe/52308

Or a list of objects of your custom class.
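
As an illustration of the "list of bunches" idea, here is a minimal sketch. It is written for Python 3 (the code elsewhere in this thread is Python 2), the `Bunch` class is an illustrative stand-in rather than a copy of the linked recipe, and the field names and values are sample data:

```python
class Bunch:
    """Hypothetical helper: collect named values as attributes."""

    def __init__(self, **fields):
        # Every keyword argument becomes an attribute of the instance.
        self.__dict__.update(fields)

    def __repr__(self):
        pairs = ", ".join("%s=%r" % item for item in sorted(self.__dict__.items()))
        return "Bunch(%s)" % pairs


# A plain list of bunches plays the role of the "array of structs".
people = [
    Bunch(name="David", age=108, birthday="061095", ss="476892771999"),
    Bunch(name="Fred", age=101, birthday="061065", ss="587903882000"),
]

for person in people:
    print(person.name, person.age)
```

A custom class with a fixed constructor does the same job but rejects misspelled field names, which is usually worth the extra lines once the record layout is settled.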

--
René Pijlman
Feb 6 '06 #2
I would like to have an array of "structs." Each struct has

struct Person{
string Name;
int Age;
int Birthday;
int SS;
}

The easiest way would be:

class Person:
    pass

john = Person()
david = Person()

john.name = "John Brown"
john.age = 35
# etc.

Think of john as a namespace with attributes (that is what we call them)
added at runtime.

A better approach would be to write a real class with a constructor:

class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def __str__(self):
        return "person name = %s and age = %i" % (self.name, self.age)

john = Person("john brown", 35)
print john  # this calls __str__


I want to go through the file, filling up my list of structs.

My problems are:

1. How to search for the keywords "Name:", "Age:", etc. in the file...
2. How to implement some organized "list of lists" for the data


This depends on the structure of the file.
Consider this format:

New
Name: John
Age: 35
Id: 23242
New
Name: xxx
Age
Id: 43324
OtherInfo: foo
New

Here you could read the whole file as one string and split it on "New".

Here is a small example:
>>> txt = "fooXbarXfoobar"
>>> txt.split("X")
['foo', 'bar', 'foobar']


In a more complicated case I would use a regexp, but
I doubt this is necessary in your case.
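
The split-on-"New" idea could be sketched as follows. This is written for Python 3, the function name `parse_records` is made up for illustration, and it assumes each record starts with a line containing only "New" and that fields look like "Key: value":

```python
# Sample text in the "New"-delimited layout shown above.
SAMPLE = """New
Name: John
Age: 35
Id: 23242
New
Name: xxx
Age
Id: 43324
OtherInfo: foo
New
"""


def parse_records(text, delimiter="New\n"):
    """Split text on the delimiter line and turn each chunk into a dict."""
    records = []
    for chunk in text.split(delimiter):
        record = {}
        for line in chunk.splitlines():
            if ":" not in line:
                continue  # skip lines without a "Key: value" shape
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
        if record:  # ignore the empty chunks around the delimiters
            records.append(record)
    return records


for record in parse_records(SAMPLE):
    print(record)
```

Note that the bare "Age" line in the second record carries no value, so that record simply has no "Age" key.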

Regards, Daniel

Feb 6 '06 #3
"Ernesto" <er*******@gmail.com> wrote in message
news:11**********************@g43g2000cwa.googlegroups.com...
I'm still fairly new to python, so I need some guidance here...

I have a text file with lots of data. I only need some of the data. I
want to put the useful data into an [array of] struct-like
mechanism(s). The text file looks something like this:

[BUNCH OF NOT-USEFUL DATA....]

Name: David
Age: 108 Birthday: 061095 SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name........

I would like to have an array of "structs." Each struct has

struct Person{
string Name;
int Age;
int Birthday;
int SS;
}

I want to go through the file, filling up my list of structs.

My problems are:

1. How to search for the keywords "Name:", "Age:", etc. in the file...
2. How to implement some organized "list of lists" for the data
structure.

Any help is much appreciated.

Ernesto -

Since you are searching for keywords and matching fields, and trying to
populate data structures as you go, this sounds like a good fit for
pyparsing. Pyparsing has built-in features for scanning through text and
extracting data, with suitably named data fields for accessing later.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

------------------------------------------------
from pyparsing import *

inputData = """[BUNCH OF NOT-USEFUL DATA....]

Name: David
Age: 108 Birthday: 061095 SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name: Fred
Age: 101 Birthday: 061065 SocialSecurity: 587903882000

[MORE USELESS DATA....]

Name: Barney
Age: 99 Birthday: 061265 SocialSecurity: 698014993111

[MORE USELESS DATA....]

"""

dob = Word(nums,exact=6)
# this matches your sample data, but I think SSN's are only 9 digits long
socsecnum = Word(nums,exact=12)

# define the personalData pattern - use results names to associate
# field names with matched tokens, can then access data as if they were
# attributes on an object
personalData = ( "Name:" + empty + restOfLine.setResultsName("Name") +
"Age:" + Word(nums).setResultsName("Age") +
"Birthday:" + dob.setResultsName("Birthday") +
"SocialSecurity:" + socsecnum.setResultsName("SS") )

# use personalData.scanString to scan through the input, returning the
# matching tokens, and their respective start/end locations in the string
for person,s,e in personalData.scanString(inputData):
    print "Name:", person.Name
    print "Age:", person.Age
    print "DOB:", person.Birthday
    print "SSN:", person.SS
    print

# or use a list comp to scan the whole file, and return your Person data,
# giving you your requested array of "structs" - not really structs, but
# ParseResults objects
persons = [person for person,s,e in personalData.scanString(inputData)]

# or convert to Python dict's, which some people prefer to pyparsing's
# ParseResults
persons = [dict(p) for p,s,e in personalData.scanString(inputData)]
print persons[0]
print

# or create an array of Person objects, as suggested in previous postings
class Person(object):
    def __init__(self,parseResults):
        self.__dict__.update(dict(parseResults))

    def __str__(self):
        return "Person(%s, %s, %s, %s)" % \
               (self.Name,self.Age,self.Birthday,self.SS)

persons = [Person(p) for p,s,e in personalData.scanString(inputData)]
for p in persons:
    print p.Name,"->",p

--------------------------------------
prints out:
Name: David
Age: 108
DOB: 061095
SSN: 476892771999

Name: Fred
Age: 101
DOB: 061065
SSN: 587903882000

Name: Barney
Age: 99
DOB: 061265
SSN: 698014993111

{'SS': '476892771999', 'Age': '108', 'Birthday': '061095', 'Name': 'David'}

David -> Person(David, 108, 061095, 476892771999)
Fred -> Person(Fred, 101, 061065, 587903882000)
Barney -> Person(Barney, 99, 061265, 698014993111)

Feb 6 '06 #4
[Ernesto]
I'm still fairly new to python, so I need some guidance here...

I have a text file with lots of data. I only need some of the data. I
want to put the useful data into an [array of] struct-like
mechanism(s). The text file looks something like this:

[BUNCH OF NOT-USEFUL DATA....]

Name: David
Age: 108 Birthday: 061095 SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name........

I would like to have an array of "structs." Each struct has

struct Person{
string Name;
int Age;
int Birthday;
int SS;
}

I want to go through the file, filling up my list of structs.

My problems are:

1. How to search for the keywords "Name:", "Age:", etc. in the file...
2. How to implement some organized "list of lists" for the data
structure.


Since you're just starting out in Python, this problem presents an
excellent opportunity to learn Python's two basic approaches to text
parsing.

The first approach involves looping over the input lines, searching for
key phrases, and extracting them using string slicing and using
str.strip() to trim irregular length input fields. The start/stop
logic is governed by the first and last key phrases and the results get
accumulated in a list. This approach is easy to program, maintain, and
explain to others:

# Approach suitable for inputs with fixed input positions
# (the slice offsets depend on the exact column layout of your file;
# adjust them if your columns differ)
result = []
for line in inputData.splitlines():
    if line.startswith('Name:'):
        name = line[7:].strip()
    elif line.startswith('Age:'):
        age = line[5:8].strip()
        bd = line[20:26]
        ssn = line[45:54]
        result.append((name, age, bd, ssn))
print result

The second approach uses regular expressions. The pattern is to search
for a key phrase, skip over whitespace, and grab the data field in
parenthesized group. Unlike slicing, this approach is tolerant of
loosely formatted data where the target fields do not always appear in
the same column position. The trade-off is having less flexibility in
parsing logic (i.e. the target fields must arrive in a fixed order):

# Approach for more loosely formatted inputs
import re
pattern = '''(?x)
Name:\s+(\w+)\s+
Age:\s+(\d+)\s+
Birthday:\s+(\d+)\s+
SocialSecurity:\s+(\d+)
'''
print re.findall(pattern, inputData)
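
A variation on the regular-expression approach, sketched for Python 3: named groups label each field, so every match can be turned into a dict instead of a positional tuple. The group names and the `INPUT_DATA` sample below are assumptions chosen for illustration; the pattern itself follows the one above.

```python
import re

# Sample input in the layout from the original question.
INPUT_DATA = """[BUNCH OF NOT-USEFUL DATA....]

Name: David
Age: 108 Birthday: 061095 SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name: Fred
Age: 101 Birthday: 061065 SocialSecurity: 587903882000
"""

# re.VERBOSE lets the pattern span several lines; whitespace inside it is ignored.
PERSON_RE = re.compile(r"""
    Name:\s+(?P<name>\w+)\s+
    Age:\s+(?P<age>\d+)\s+
    Birthday:\s+(?P<birthday>\d+)\s+
    SocialSecurity:\s+(?P<ss>\d+)
    """, re.VERBOSE)

# One dict per matched record, keyed by the group names.
people = [match.groupdict() for match in PERSON_RE.finditer(INPUT_DATA)]

for person in people:
    print(person["name"], person["age"])
```

As with the original pattern, `\w+` captures a single word, so a name containing spaces would need a looser group.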

Other respondents have suggested the third-party PyParsing module, which
provides a powerful general-purpose toolset for text parsing; however,
it is always worth mastering Python basics before moving on to
special-purpose tools. The above code fragments are easy to construct and not
hard to explain to others. Maintenance is a breeze.

Raymond

P.S. Once you've formed a list of tuples, it is trivial to create
Person objects for your Pascal-like structure:

class Person(object):
    def __init__(self, (name, age, bd, ssn)):
        self.name=name; self.age=age; self.bd=bd; self.ssn=ssn

personlist = map(Person, result)
for p in personlist:
    print p.name, p.age, p.bd, p.ssn
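
For readers on Python 3: tuple parameters in a function signature, as in `def __init__(self, (name, age, bd, ssn))`, are Python 2 syntax and were removed in Python 3. A rough equivalent is sketched below using `typing.NamedTuple`; the `from_fields` helper and the sample tuples are illustrative assumptions, not part of the original post.

```python
from typing import NamedTuple


class Person(NamedTuple):
    name: str
    age: int
    bd: str   # kept as text so leading zeros survive
    ssn: str

    @classmethod
    def from_fields(cls, fields):
        """Build a Person from a (name, age, bd, ssn) tuple of strings."""
        name, age, bd, ssn = fields
        return cls(name, int(age), bd, ssn)


# Tuples shaped like the `result` list built by the parsing code (sample values).
result = [
    ("David", "108", "061095", "476892771999"),
    ("Fred", "101", "061065", "587903882000"),
]

personlist = [Person.from_fields(fields) for fields in result]
for p in personlist:
    print(p.name, p.age, p.bd, p.ssn)
```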

Feb 7 '06 #5
Thanks for the approach. I decided to use regular expressions. I'm
going by the code you posted (below). I replaced the re.findall
line with my file handle's read() like this:

print re.findall(pattern, myFileHandle.read())

This prints out only brackets: []. Is a 're.compile' perhaps necessary?

Raymond Hettinger wrote:
# Approach for more loosely formatted inputs
import re
pattern = '''(?x)
Name:\s+(\w+)\s+
Age:\s+(\d+)\s+
Birthday:\s+(\d+)\s+
SocialSecurity:\s+(\d+)
'''
print re.findall(pattern, inputData)


Feb 7 '06 #6
Ernesto wrote:
Thanks for the approach. I decided to use regular expressions. I'm
going by the code you posted (below). I replaced the line re.findall
line with my file handle read( ) like this:

print re.findall(pattern, myFileHandle.read())

This prints out only brackets []. Is a 're.compile' perhaps necessary
?


If you see [], that means findall didn't find anything
that matches your pattern.
Calling re.compile on your pattern beforehand would not
make findall find the text; it's only there as an optimization.

Consider:
lines = [line for line in file("foo.txt").readlines()
         if re.match(r"\d+",line)]

In this case it's better to pre-compile the regexp once and use it
to match all lines:

number = re.compile(r"\d+")
lines = [line for line in file("foo.txt").readlines() if number.match(line)]

Fire up interactive Python and play with re and patterns.
Speaking from my own experience, the probability is
against you getting the pattern right the first time.
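
One way to narrow down why findall returns [] is to grow the pattern piece by piece and see where it stops matching. The sketch below is for Python 3; the helper name `first_failing_piece` and the sample strings are made up for illustration, and it does not claim to know what was actually wrong with the file in question.

```python
import re


def first_failing_piece(pieces, text):
    """Return the index of the first piece that makes the pattern stop matching.

    The pieces are joined left to right; None means the full pattern matches.
    """
    for count in range(1, len(pieces) + 1):
        if re.search("".join(pieces[:count]), text) is None:
            return count - 1
    return None


# The pattern from the earlier post, cut into one piece per field.
PIECES = [
    r"Name:\s+(\w+)\s+",
    r"Age:\s+(\d+)\s+",
    r"Birthday:\s+(\d+)\s+",
    r"SocialSecurity:\s+(\d+)",
]

good = "Name: David\nAge: 108 Birthday: 061095 SocialSecurity: 476892771999\n"
bad = "Name: David\nAge: 108 Brithdy: 061095 SocialSecurity: 476892771999\n"

print(first_failing_piece(PIECES, good))  # full pattern matches
print(first_failing_piece(PIECES, bad))   # index of the piece that breaks the match
```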

Regards, Daniel

Feb 7 '06 #7
Thanks !

Feb 7 '06 #8
On 6 Feb 2006 09:03:09 -0800, "Ernesto" <er*******@gmail.com> wrote:
I'm still fairly new to python, so I need some guidance here...

I have a text file with lots of data. I only need some of the data. I
want to put the useful data into an [array of] struct-like
mechanism(s). The text file looks something like this:

[BUNCH OF NOT-USEFUL DATA....]

Name: David
Age: 108 Birthday: 061095 SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name........

Does the useful data always come in fixed-format pairs of lines as in your example?
If so, you could just iterate through the lines of your text file, as in the example at the end [1].

I would like to have an array of "structs." Each struct has

struct Person{
string Name;
int Age;
int Birthday;
int SS;
}

You don't normally want to do real structs in Python. You probably want to define
a class to contain the data, e.g., class Person in the example at the end [1].

I want to go through the file, filling up my list of structs.

My problems are:

1. How to search for the keywords "Name:", "Age:", etc. in the file...
2. How to implement some organized "list of lists" for the data
structure.

It may be very easy, if the format is fixed and space-separated and line-paired
as in your example data, but you will have to tell us more if not.

[1] example:

----< ernesto.py >---------------------------------------------------------
class Person(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self): return 'Person(%r)'%self.name

def extract_info(lineseq):
    lineiter = iter(lineseq) # normalize access to lines
    personlist = []
    for line in lineiter:
        substrings = line.split()
        if substrings and isinstance(substrings, list) and substrings[0] == 'Name:':
            try:
                name = ' '.join(substrings[1:]) # allow for names with spaces
                line = lineiter.next()
                age_hdr, age, bd_hdr, bd, ss_hdr, ss = line.split()
                assert age_hdr=='Age:' and bd_hdr=='Birthday:' and ss_hdr=='SocialSecurity:', \
                    'Bad second line after "Name: %s" line:\n %r'%(name, line)
                person = Person(name)
                person.age = int(age); person.bd = int(bd); person.ss=int(ss)
                personlist.append(person)
            except Exception,e:
                print '%s: %s'%(e.__class__.__name__, e)
    return personlist

def test():
    lines = """\
[BUNCH OF NOT-USEFUL DATA....]

Name: David
Age: 108 Birthday: 061095 SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name: Ernesto
Age: 25 Birthday: 040181 SocialSecurity: 123456789

Name: Ernesto
Age: 44 Brithdy: 040106 SocialSecurity: 123456789

Name........
"""
    persondata = extract_info(lines.splitlines())
    print persondata
    ssdict = {}
    for person in persondata:
        if person.ss in ssdict:
            print 'Rejecting %r with duplicate ss %s'%(person, person.ss)
        else:
            ssdict[person.ss] = person
    print 'ssdict keys: %s'%ssdict.keys()
    for ss, pers in sorted(ssdict.items(), key=lambda item:item[1].name): #sorted by name
        print 'Name: %s Age: %s SS: %s' % (pers.name, pers.age, pers.ss)

if __name__ == '__main__': test()
---------------------------------------------------------------------------

this produces output:

[10:07] C:\pywk\clp>py24 ernesto.py
AssertionError: Bad second line after "Name: Ernesto" line:
'Age: 44 Brithdy: 040106 SocialSecurity: 123456789'
[Person('David'), Person('Ernesto')]
ssdict keys: [123456789, 476892771999L]
Name: David Age: 108 SS: 476892771999
Name: Ernesto Age: 25 SS: 123456789

if you want to try this on a file, (we'll use the source itself here
since it includes valid example data lines), do something like:
>>> import ernesto
>>> info = ernesto.extract_info(open('ernesto.py'))
AssertionError: Bad second line after "Name: Ernesto" line:
 'Age: 44 Brithdy: 040106 SocialSecurity: 123456789\n'
>>> info
[Person('David'), Person('Ernesto')]

tweak to taste ;-)
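
For anyone trying the line-pairing technique on Python 3, where `lineiter.next()` becomes `next(lineiter)`, a cut-down sketch follows. The function name `extract_pairs` and the dict-based records are illustrative choices, and unlike the code above it silently skips malformed records instead of printing the error.

```python
def extract_pairs(lines):
    """Pair each 'Name:' line with the line that follows it."""
    line_iter = iter(lines)
    people = []
    for line in line_iter:
        parts = line.split()
        if parts and parts[0] == "Name:":
            name = " ".join(parts[1:])        # allow for names with spaces
            detail = next(line_iter, "")      # empty string if the input ends early
            fields = detail.split()
            labels = fields[0::2]             # every other token is a label
            if len(fields) != 6 or labels != ["Age:", "Birthday:", "SocialSecurity:"]:
                continue                      # skip malformed records
            people.append({"name": name, "age": int(fields[1]),
                           "birthday": fields[3], "ss": fields[5]})
    return people


SAMPLE = """[BUNCH OF NOT-USEFUL DATA....]

Name: David
Age: 108 Birthday: 061095 SocialSecurity: 476892771999

Name: Ernesto
Age: 44 Brithdy: 040106 SocialSecurity: 123456789
"""

for person in extract_pairs(SAMPLE.splitlines()):
    print(person)
```

Because the for loop and the `next()` call share one iterator, the detail line is consumed together with its "Name:" line and is never examined a second time.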

Regards,
Bengt Richter
Feb 7 '06 #9
Thanks tons !

Feb 7 '06 #10
On Tue, 07 Feb 2006 18:10:05 GMT, bo**@oz.net (Bengt Richter) wrote:
[...]
----< ernesto.py >--------------------------------------------------------- [...]

Just noticed:

    substrings = line.split()
    if substrings and isinstance(substrings, list) and substrings[0] == 'Name:':
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^--not needed

str.split always returns a list, even if it's length 1, so that was harmless, but it should be

    if substrings and substrings[0] == 'Name:':

(the first term is needed because ''.split() => [], to avoid [][0])
Sorry.

Regards,
Bengt Richter
Feb 10 '06 #11
