Bytes IT Community

Is it possible to combine handle_data and regular expressions?

Hi,

I've experimented with regular expressions to solve my problems in the
past, but I have seen so many comments recommending HTMLParser and
sgmllib that I thought I would try a different approach this time, so
I tried HTMLParser.

I want to search through my SGML file for various strings of text and
find out what section they're in. What I have here does this to a
certain extent, but I was wondering if I could make handle_data and
regular expressions work together to do the job a little better.

For instance, when I search for "above" as I am here, I just get
something like this: '174.114[1]':'above'. That isn't very useful
because I want to know the context of "above" (i.e., the information
on either side of it) and maybe even use a regular expression to
filter the search a little more.
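
To make "the information on either side" concrete, here is a minimal sketch of a keyword-in-context lookup on a plain string. The function name keyword_in_context, the width parameter, and the sample sentence are all hypothetical and only there for illustration; the regular expression itself is where extra filtering could go.

```python
import re

def keyword_in_context(text, pattern, width=30):
    """Return (left, match, right) tuples for every regex match in text.

    width is the number of characters of context kept on each side
    (an illustrative default, not something from the original script).
    """
    results = []
    for m in re.finditer(pattern, text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        results.append((left, m.group(0), right))
    return results

# Made-up sample text; \b keeps "above" from matching inside longer words.
sample = "the limits described in paragraph (a) above apply to all permits"
hits = keyword_in_context(sample, r"\babove\b", width=15)
print(hits)
```

Each tuple keeps the matched text together with its neighbours, so the same pattern could be tightened (for example, requiring a paragraph reference before "above") without changing the surrounding logic.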

Any ideas?

As always, I'd appreciate feedback on my efforts.

Thanks,

Greg

###

from HTMLParser import HTMLParser
import os, re

class PartFinder(HTMLParser):
    """Record text chunks containing the search string, keyed by section."""

    def __init__(self):
        HTMLParser.__init__(self)
        # Per-instance state (a class-level dict would be shared by all parsers).
        self._main = ''
        self._subone = ''
        self._subtwo = ''
        self._full = None
        self._secDict = dict()

    def found(self):
        return self._secDict

    def handle_starttag(self, tag, attrs):
        # Build a section label such as '174.114[1]' from the 'no' attributes.
        if tag == "sec-main":
            self._main = dict(attrs).get('no', '')
            self._full = self._main

        if tag == "sec-sub1":
            self._subone = dict(attrs).get('no', '')
            self._full = self._main + '[' + self._subone + ']'

        if tag == "sec-sub2":
            self._subtwo = dict(attrs).get('no', '')
            self._full = (self._main + '[' + self._subone + ']' + '['
                          + self._subtwo + ']')

    def handle_data(self, data):
        if "Pt" in data:
            # Key on the full section label (the original tested _main but
            # stored under _full) and keep every hit rather than only one.
            self._secDict.setdefault(self._full, []).append(data)
            print self._secDict

if __name__ == "__main__":
    root = raw_input("Enter the path where the program should run: ")
    fname = raw_input("Enter name of the file: ")
    print
    given, ext = os.path.splitext(fname)

    inputFile = open(os.path.join(root, fname), 'r')
    data = inputFile.read()
    inputFile.close()

    parser = PartFinder()
    parser.feed(data)
    parser.close()
    x = parser.found()

    output_part = given + '.parts'
    outputFile = open(os.path.join(root, output_part), 'w')
    outputFile.write(str(x))
    outputFile.close()
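
One way handle_data and a regular expression might be combined is sketched below. It is only a sketch under assumptions: ContextFinder, its pattern and width arguments, and the sample markup are hypothetical, and the import is written so the block runs on Python 3 as well as on the Python 2 used in the script above. The tag names and the 'no' attribute are taken from that script.

```python
import re
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2, as in the script above

class ContextFinder(HTMLParser):
    """Hypothetical PartFinder variant: regex matches plus surrounding text."""

    def __init__(self, pattern, width=30):
        HTMLParser.__init__(self)
        self.pattern = re.compile(pattern)
        self.width = width       # characters of context kept on each side
        self.main = None
        self.sub1 = None
        self.section = None      # current label, e.g. '174.114[1]'
        self.found = {}          # section label -> list of context strings

    def handle_starttag(self, tag, attrs):
        no = dict(attrs).get('no')
        if tag == "sec-main":
            self.main = no
            self.section = no
        elif tag == "sec-sub1" and self.main is not None:
            self.sub1 = no
            self.section = "%s[%s]" % (self.main, no)
        elif tag == "sec-sub2" and self.sub1 is not None:
            self.section = "%s[%s][%s]" % (self.main, self.sub1, no)

    def handle_data(self, data):
        # Run the regex over each text chunk and keep a window around each hit.
        for m in self.pattern.finditer(data):
            start = max(0, m.start() - self.width)
            end = m.end() + self.width
            self.found.setdefault(self.section, []).append(data[start:end])

# Made-up sample markup using the same tags as the script above.
sample = ('<sec-main no="174.114"><sec-sub1 no="1">'
          'as provided in paragraph (a) above, the permit applies'
          '</sec-sub1></sec-main>')
finder = ContextFinder(r"\babove\b", width=12)
finder.feed(sample)
finder.close()
print(finder.found)
```

A caveat on the design: the parser may deliver one run of text to handle_data in several pieces (in Python 2 it is split at entity references, for example), so a match close to a piece boundary can lose part of its context. Buffering the text per section and searching it when the next section tag arrives would avoid that, at the cost of a little more state.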

Jan 19 '06 #1
