Hi,
I've created a custom filter based on HTMLParser, with the following
source:
class Filter(HTMLParser):
    def __init__(self, keyfile):
        HTMLParser.__init__(self)
        mykwfile = open(keyfile, 'r')
        self._keywords = []
        for kw in mykwfile.read().split('\n'):
            self._keywords.append(kw)
            print kw
        mykwfile.close()
        self._toProcess = False
        self.stack = []
    def handle_starttag(self, tag, attrs):
        if 'a' != tag:
            self.stack.append(self.__html_start_tag(tag, attrs))
            return
        attrs = dict(attrs)
        self._toProcess = True
        for key in self._keywords:
            if 'a' == tag:
                p = re.compile(key, re.IGNORECASE)
                if 'href' in attrs:
                    attrs['href'] = p.sub(r'XXXXX', attrs['href'])
        self.stack.append(self.__html_start_tag(tag, attrs))
    def handle_startendtag(self, tag, attrs):
        if 'img' != tag and 'meta' != tag:
            self.stack.append(self.__html_startend_tag(tag, attrs))
            return
        attrs = dict(attrs)
        self._toProcess = True
        for key in self._keywords:
            p = re.compile(key, re.IGNORECASE)
            if 'img' == tag:
                if 'src' in attrs:
                    attrs['src'] = p.sub(r'XXXXX', attrs['src'])
                if 'alt' in attrs:
                    attrs['alt'] = p.sub(r'XXXXX', attrs['alt'])
            if 'meta' == tag:
                if 'description' in attrs:
                    attrs['description'] = p.sub(r'XXXXX', attrs['description'])
                if 'content' in attrs:
                    attrs['content'] = p.sub(r'XXXXX', attrs['content'])
        if 'meta' == tag or 'img' == tag:
            self._toProcess = False
        self.stack.append(self.__html_startend_tag(tag, attrs))
    def handle_endtag(self, tag):
        self.stack.append(self.__html_end_tag(tag))
        if self._toProcess:
            self._toProcess = False
    def handle_data(self, data):
        if self._toProcess:
            for key in self._keywords:
                p = re.compile(key, re.IGNORECASE)
                data = p.sub(r'XXXXX', data)
        self.stack.append(data)
    def __html_start_tag(self, tag, attrs):
        return '<%s%s>' % (tag, self.__html_attrs(attrs))
    def __html_startend_tag(self, tag, attrs):
        return '<%s%s/>' % (tag, self.__html_attrs(attrs))
    def __html_end_tag(self, tag):
        return '</%s>' % (tag)
    def __html_attrs(self, attrs):
        _attrs = ''
        if attrs:
            _attrs = ' %s' % (' '.join([('%s="%s"' % (k,v)) for k,v in
                                        attrs.iteritems()]))
        return _attrs
But when I use it, it gives me the following error message:
ERROR Processor exception: AttributeError: 'list' object has no attribute 'iteritems'
Traceback (most recent call last):
  File "d:\esp\lib\python2.3\processors\DocDumpF.py", line 87, in Process
    p.feed(document.GetValue("data"))
  File "HTMLParser.py", line 108, in feed
  File "HTMLParser.py", line 148, in goahead
  File "HTMLParser.py", line 281, in parse_starttag
  File "d:\esp\lib\python2.3\processors\DocDumpF.py", line 121, in handle_starttag
    self.stack.append(self.__html_start_tag(tag, attrs))
  File "d:\esp\lib\python2.3\processors\DocDumpF.py", line 167, in __html_start_tag
    return '<%s%s>' % (tag, self.__html_attrs(attrs))
  File "d:\esp\lib\python2.3\processors\DocDumpF.py", line 178, in __html_attrs
    _attrs = ' %s' % (' '.join([('%s="%s"' % (k,v)) for k,v in attrs.iteritems()]))
Does anybody know why it says attrs is not a list element?
Thanks,
Rubén
def handle_starttag(self, tag, attrs):  # <-- attrs here is a list
    if 'a' != tag:
        self.stack.append(self.__html_start_tag(tag, attrs))  # <-- attrs here is still a list
        return
    attrs = dict(attrs)  # <-- now attrs is a dictionary
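A quick way to confirm this is to log what the parser actually hands to the handler. The following is only a minimal sketch, not the original poster's code: `AttrProbe` is a hypothetical name, and the import fallback is there so the snippet runs on both Python 3 (`html.parser`) and Python 2.6+ (`HTMLParser`).

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2, as used in this thread


class AttrProbe(HTMLParser):
    """Hypothetical helper: records the attrs object of every start tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples, not a dict
        self.seen.append((tag, attrs))


probe = AttrProbe()
probe.feed('<p class="intro"><a href="http://example.com/">link</a></p>')
probe.close()

for tag, attrs in probe.seen:
    print("%s %s %r" % (tag, type(attrs).__name__, attrs))

# Calling a dict-only method on that list reproduces the reported error
try:
    probe.seen[0][1].iteritems()
except AttributeError as exc:
    print(exc)
```

Every entry is reported as a `list`, which is why the non-'a' branch fails as soon as `__html_attrs` calls `iteritems()` on it.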
rabad wrote:
Hi,
I've created a custom filter based on HTMLParser, with the following
source:
(snip)
But when I use it, it gives me the following error message:
ERROR Processor exception: AttributeError: 'list' object has no
attribute 'iteritems'
(snip)
  File "d:\esp\lib\python2.3\processors\DocDumpF.py", line 178, in __html_attrs
    _attrs = ' %s' % (' '.join([('%s="%s"' % (k,v)) for k,v in attrs.iteritems()]))
Does anybody know why it says attrs is not a list element?
Actually, what the traceback says is that
1/ attrs is a list object
2/ list objects have no attribute named iteritems
If you assumed it was a dict, then it's probably time to re-read
HTMLParser's documentation. If instead you assumed lists had an iteritems
method, then it's probably time to re-read the Python tutorial !-)
IIRC, HTMLParser represents attributes as a list of (attrname, value)
pairs. If so (please check it out), your method should be rewritten as
return ' %s' % ' '.join(['%s="%s"' % attr for attr in attrs])
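Note that in the posted class `__html_attrs` receives a plain list for most tags, but a dict for the tags where `dict(attrs)` was called first, so a version that accepts both may be more convenient. A minimal sketch, assuming a standalone function named `format_attrs` (a hypothetical name, not part of HTMLParser):

```python
def format_attrs(attrs):
    """Render attributes as ' name="value" ...' (hypothetical helper).

    Accepts either the list of (name, value) pairs that HTMLParser
    passes to its handlers, or a dict built from that list.
    Values are inserted as-is; no quoting or escaping is attempted.
    """
    if isinstance(attrs, dict):
        pairs = list(attrs.items())
    else:
        pairs = list(attrs)
    if not pairs:
        return ''
    parts = []
    for name, value in pairs:
        if value is None:
            # valueless attribute such as "disabled"
            parts.append(name)
        else:
            parts.append('%s="%s"' % (name, value))
    return ' ' + ' '.join(parts)


print(format_attrs([('href', 'index.html'), ('title', 'Home')]))
print(format_attrs({'src': 'logo.png'}))
print(repr(format_attrs([])))
```

Returning an empty string for empty input keeps tags without attributes free of a trailing space, as in the original method.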
As a side note: __double_leading_underscores is probably a bit extreme.
The convention for implementation attributes is _single_leading_underscore.
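To illustrate the side note: names with two leading underscores inside a class body are name-mangled by Python, which is rarely needed for ordinary helper methods. A small sketch, with hypothetical class names `Mangled` and `Plain`:

```python
class Mangled(object):
    def __helper(self):
        return 'mangled'

    def call(self):
        # Inside the class body, self.__helper is rewritten by the
        # compiler to self._Mangled__helper
        return self.__helper()


class Plain(object):
    def _helper(self):
        # Single underscore: "implementation detail" by convention only
        return 'plain'


m = Mangled()
print(m.call())
print(hasattr(m, '__helper'))          # False
print(hasattr(m, '_Mangled__helper'))  # True
print(Plain()._helper())
```

The single-underscore form stays reachable under its own name, which makes it easier to override in a subclass or call while debugging.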