Problem with "&" character in XML.


I have walked a directory and written the following XML document.
One of the folder names contained an "&" character, so I replaced it with "&amp;".
#------------------test1.xml
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye
w&amp;y </dirname>
<file>
<name>def.txt</name>
<time>200607130417</time>
</file>
</Directory>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\hii
wx</dirname>
<file>
<name>abc.txt</name>
<time>200607130415</time>
</file>
</Directory>

Now, in my Python code, I want to parse this document and print each
directory name.
###----------handler------------filename---handler.py
from xml.sax.handler import ContentHandler

class oldHandler(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        self.dn = 0  # 1 while the parser is inside a <dirname> element

    def startElement(self, name, attrs):
        if name == 'dirname':
            self.dn = 1

    def characters(self, str):
        # Called once per chunk of text; one text node may arrive in several chunks.
        if self.dn:
            print(str)

    def endElement(self, name):
        if name == 'dirname':
            self.dn = 0
#---------------------------------------------------------------------
#main code--- fname----art.py
import sys
from xml.sax import make_parser
from handler import oldHandler  # module name matches handler.py above

ch = oldHandler()
saxparser = make_parser()

saxparser.setContentHandler(ch)
saxparser.parse(sys.argv[1])
#-----------------------------------------------------------------------------
I run the code as: $ python art.py test1.xml

I am getting this output:

C:\Documents and Settings\Administrator\Desktop\1\bye w
&
y
C:\Documents and Settings\Administrator\Desktop\1\hii wx

whereas I need output that looks like this:
C:\Documents and Settings\Administrator\Desktop\1\bye w&y

C:\Documents and Settings\Administrator\Desktop\1\hii wx

Can someone tell me the solution for this?

Jul 13 '06 #1
* Kirt wrote in comp.text.xml:
>i am getting output as:
>
>C:\Documents and Settings\Administrator\Desktop\1\bye w
>&
>y
>C:\Documents and Settings\Administrator\Desktop\1\hii wx
>
>where as i need an output which should look like this.
>C:\Documents and Settings\Administrator\Desktop\1\bye w&y
>
>C:\Documents and Settings\Administrator\Desktop\1\hii wx
>
>Can someone tell me the solution for this.

SAX allows the parser to split the text of an element across several
characters() events, which is what you are seeing here. Unless your parser
has a switch that forces it to accumulate the text before calling the
handler, you have to accumulate it yourself.
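
A minimal sketch of that accumulation, written for Python 3 rather than the Python 2 of the original post. The class name DirnameHandler, the SAMPLE string, and the wrapping <Directories> root element are assumptions made for illustration (the posted file shows no single root element). The handler collects the chunks while inside <dirname> and joins them only when the element ends:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class DirnameHandler(ContentHandler):
    """Collects the complete text of each <dirname> element (hypothetical name)."""

    def __init__(self):
        super().__init__()
        self.in_dirname = False
        self.parts = []      # chunks delivered by characters()
        self.dirnames = []   # one complete string per <dirname>

    def startElement(self, name, attrs):
        if name == "dirname":
            self.in_dirname = True
            self.parts = []

    def characters(self, content):
        # May be called several times for one text node, so only store the chunk.
        if self.in_dirname:
            self.parts.append(content)

    def endElement(self, name):
        if name == "dirname":
            text = "".join(self.parts)
            # Collapse the line break inside the element into a single space.
            self.dirnames.append(" ".join(text.split()))
            self.in_dirname = False


# Assumed sample input: one entry from the post, wrapped in a made-up root element.
SAMPLE = rb"""<Directories>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye
w&amp;y </dirname>
</Directory>
</Directories>"""

handler = DirnameHandler()
xml.sax.parseString(SAMPLE, handler)
for dirname in handler.dirnames:
    print(dirname)
```

Joining at endElement means the result does not depend on where the parser chooses to split the text, whether at the entity reference or anywhere else.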
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Jul 13 '06 #2
A SAX parser may report a single text node through any number of calls to
the characters method, so you need to accumulate everything you receive in
characters and output the text when you get a notification other than
characters.
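
One way to read this advice, sketched in Python 3 under assumptions: the class name TextBufferingHandler and the events list are hypothetical, and only startElement and endElement are treated as the "other" notifications (a fuller version would also flush on processing instructions and similar events). Text is buffered and emitted as a single item as soon as any element event arrives:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextBufferingHandler(ContentHandler):
    """Reports each text node once, however many chunks it arrived in (hypothetical name)."""

    def __init__(self):
        super().__init__()
        self._buffer = []
        self.events = []  # recorded (kind, value) pairs, for illustration only

    def _flush(self):
        # Emit the buffered text as one event, skipping whitespace-only runs.
        if self._buffer:
            text = "".join(self._buffer)
            self._buffer = []
            if text.strip():
                self.events.append(("text", text))

    def characters(self, content):
        self._buffer.append(content)

    def startElement(self, name, attrs):
        self._flush()
        self.events.append(("start", name))

    def endElement(self, name):
        self._flush()
        self.events.append(("end", name))


handler = TextBufferingHandler()
xml.sax.parseString(b"<name>w&amp;y</name>", handler)
print(handler.events)
# [('start', 'name'), ('text', 'w&y'), ('end', 'name')]
```

Because the flush happens in one place, the element-specific logic never sees partial text.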

Best Regards,
George
---------------------------------------------------------------------
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

Jul 13 '06 #3
Note that any good SAX tutorial will demonstrate how to buffer the
characters() events, if you don't feel like reinventing the solution
yourself.
Jul 13 '06 #4
