a html parse problem

cheng
#1: Jul 19 '05
Hi all,

If the HTML looks like this:
<meta name = "description" content = "a test page">
<meta name = "keywords" content = "keyword1 keyword2">

and I use:
def handle_starttag(self, tag, attrs):
    if tag == 'meta':
        self.attr = attrs
        self.headers += ['%s' % (self.attr)]
        self.attr = ''

I get the output:
[('name', 'description'), ('content', 'a test page')]

[('name', 'keywords'), ('content', 'keyword1 keyword2')]

Is there some way to take only the content values, like "a test page, keyword1, keyword2"?

bruno modulix
#2: Jul 19 '05

re: a html parse problem


cheng wrote:
> Hi all,
>
> If the HTML looks like this:
> <meta name = "description" content = "a test page">
> <meta name = "keywords" content = "keyword1 keyword2">
>
> and I use:
> def handle_starttag(self, tag, attrs):
>     if tag == 'meta':
>         self.attr = attrs
>         self.headers += ['%s' % (self.attr)]
>         self.attr = ''
>
> I get the output:
> [('name', 'description'), ('content', 'a test page')]
>
> [('name', 'keywords'), ('content', 'keyword1 keyword2')]
>
> Is there some way to take only the content values, like "a test page, keyword1, keyword2"?

And put it where ?-)

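Well, it might look like this:

def handle_starttag(self, tag, attrs):
    if tag == 'meta':
        # attrs is a list of (name, value) pairs, so build a dict for lookup.
        # self.content is assumed to be a list created elsewhere (e.g. in __init__).
        try:
            self.content.append(dict(attrs)['content'])
        except KeyError:
            pass  # this <meta> tag has no content attribute
        self.headers += ['%s' % attrs]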
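
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It assumes Python 3 and the standard-library html.parser module; the class name MetaContentParser and the variables html, parser and result are made up for illustration and do not come from the original code. It collects only the content attribute of each meta tag and joins the values with commas.

```python
from html.parser import HTMLParser  # Python 3 standard library


class MetaContentParser(HTMLParser):  # hypothetical name, for illustration only
    """Collect the content attribute of every <meta> start tag."""

    def __init__(self):
        super().__init__()
        self.content = []  # one entry per <meta> tag that has a content attribute

    def handle_starttag(self, tag, attrs):
        if tag == 'meta':
            # attrs arrives as a list of (name, value) pairs, not a dict
            value = dict(attrs).get('content')
            if value is not None:
                self.content.append(value)


# The two tags from the question, used as sample input
html = '''<meta name = "description" content = "a test page">
<meta name = "keywords" content = "keyword1 keyword2">'''

parser = MetaContentParser()
parser.feed(html)
parser.close()

result = ', '.join(parser.content)
print(result)  # a test page, keyword1 keyword2
```

Using dict(attrs).get('content') instead of try/except is only a style choice here; both skip meta tags that have no content attribute. To split the keywords into separate items as well, one could call value.split() before appending.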

HTH
--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'onurb@xiludom.gro'.split('@')])"