
Problem with "&" character in xml.


Question posted by: Kirt (Guest) on July 13th, 2006 06:45 AM

I walked a directory and wrote the following XML document.
One of the folders had an "&" character in its name, so I replaced it with "&amp;".
#------------------test1.xml
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye
w&amp;y </dirname>
<file>
<name>def.txt</name>
<time>200607130417</time>
</file>
</Directory>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\hii
wx</dirname>
<file>
<name>abc.txt</name>
<time>200607130415</time>
</file>
</Directory>

Now, in my Python code, I want to parse this document and print the directory
names.
###----------handler------------filename---handlers.py
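from xml.sax.handler import ContentHandler
class oldHandler(ContentHandler):
    def __init__(self):
        # flag: 1 while the parser is inside a <dirname> element
        self.dn = 0
    def startElement(self, name, attrs):
        if name=='dirname':
            self.dn=1

    def characters(self,str):
        # called once per chunk of text, so a single text node
        # may be printed on several lines
        if self.dn:
            print str

    def endElement(self, name):
        if name == 'dirname':
            self.dn=0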


#---------------------------------------------------------------------
#main code--- fname----art.py
import sys
from xml.sax import make_parser
from handlers import oldHandler

ch = oldHandler()
saxparser = make_parser()

saxparser.setContentHandler(ch)
saxparser.parse(sys.argv[1])
#-----------------------------------------------------------------------------
I run the code as: $ python art.py test1.xml

I am getting this output:

C:\Documents and Settings\Administrator\Desktop\1\bye w
&
y
C:\Documents and Settings\Administrator\Desktop\1\hii wx

whereas I need output that looks like this:
C:\Documents and Settings\Administrator\Desktop\1\bye w&y

C:\Documents and Settings\Administrator\Desktop\1\hii wx

Can someone tell me the solution for this?
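
On the writing side, the manual replacement of "&" by "&amp;" mentioned in the question can probably be left to the standard library. The following is a minimal Python 3 sketch, not the poster's actual code: the `folder` value is taken from the listing above purely for illustration, and `escape`/`unescape` are the helpers from `xml.sax.saxutils`.

```python
from xml.sax.saxutils import escape, unescape

# Illustrative folder name containing an ampersand (taken from the question).
folder = r"C:\Documents and Settings\Administrator\Desktop\1\bye w&y"

# escape() replaces &, < and > with the corresponding entity references.
escaped = escape(folder)
element = "<dirname>" + escaped + "</dirname>"

# unescape() reverses the substitution.
restored = unescape(escaped)

print(element)
```

Using `escape` also covers "<" and ">", which a hand-written replacement of "&" alone would miss.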

3 Answers Posted
Bjoern Hoehrmann July 13th, 2006 09:05 AM
#2: Re: Problem with "&" character in xml.

* Kirt wrote in comp.text.xml:
>i am getting output as:
>
>C:\Documents and Settings\Administrator\Desktop\1\bye w
>&
>y
>C:\Documents and Settings\Administrator\Desktop\1\hii wx
>
>where as i need an output which should look like this.
>C:\Documents and Settings\Administrator\Desktop\1\bye w&y
>
>C:\Documents and Settings\Administrator\Desktop\1\hii wx
>
>Can someone tell me the solution for this.


SAX allows the parser to split character data across several characters
events, as you are seeing here. Unless your parser has a switch that forces
it to accumulate the text before calling the handler, you have to do that
yourself.
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
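
The accumulation suggested above could look roughly like the following Python 3 sketch. It is one possible approach under stated assumptions, not the poster's code: the class name `DirnameHandler`, the helper `read_dirnames`, and the `<Listing>` wrapper element (added so the sample has a single root) are hypothetical. It also assumes the line break inside `<dirname>` should collapse to a single space, which would alter names that really contain consecutive spaces.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class DirnameHandler(ContentHandler):
    """Collects the complete text of each <dirname> element."""

    def __init__(self):
        super().__init__()
        self.in_dirname = False
        self.chunks = []      # pieces delivered by characters()
        self.dirnames = []    # completed directory names

    def startElement(self, name, attrs):
        if name == "dirname":
            self.in_dirname = True
            self.chunks = []

    def characters(self, content):
        # May be called several times for one text node: only store the piece.
        if self.in_dirname:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "dirname":
            # Join the pieces, then collapse the line break and stray spaces.
            text = "".join(self.chunks)
            self.dirnames.append(" ".join(text.split()))
            self.in_dirname = False


def read_dirnames(xml_bytes):
    """Parse XML given as bytes and return the list of directory names."""
    handler = DirnameHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.dirnames


# Sample input; <Listing> is a hypothetical root element.
SAMPLE = rb"""<Listing>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye
w&amp;y </dirname>
</Directory>
</Listing>"""

for name in read_dirnames(SAMPLE):
    print(name)
# prints: C:\Documents and Settings\Administrator\Desktop\1\bye w&y
```

Nothing is printed from `characters()` itself, so it no longer matters how the parser splits the text.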
George Bina
#3: Re: Problem with "&" character in xml.

A SAX parser may report a single text node through any number of calls to
the characters method, so you need to accumulate everything you receive in
characters and output the text when you get a notification other than
characters.

Best Regards,
George
---------------------------------------------------------------------
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
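
The rule described above (flush the buffer on any notification other than characters) can be sketched as a small reusable base class. This is an illustrative Python 3 sketch: `CoalescingHandler`, its `text()` hook, and `RunCollector` are hypothetical names, not part of `xml.sax`.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class CoalescingHandler(ContentHandler):
    """Buffers characters() calls and reports each text run once via text()."""

    def __init__(self):
        super().__init__()
        self._buffer = []

    def characters(self, content):
        self._buffer.append(content)

    def _flush(self):
        # Hand over the accumulated run, if any, and start a new one.
        if self._buffer:
            self.text("".join(self._buffer))
            self._buffer = []

    # Any notification other than characters() ends the current text run.
    def startElement(self, name, attrs):
        self._flush()

    def endElement(self, name):
        self._flush()

    def processingInstruction(self, target, data):
        self._flush()

    def text(self, content):
        """Hypothetical hook: override to receive one complete text run."""


class RunCollector(CoalescingHandler):
    """Example subclass that keeps every non-blank text run in a list."""

    def __init__(self):
        super().__init__()
        self.runs = []

    def text(self, content):
        if content.strip():          # ignore whitespace-only runs
            self.runs.append(content)


collector = RunCollector()
xml.sax.parseString(b"<dirname>bye w&amp;y</dirname>", collector)
print(collector.runs)
```

A subclass that overrides `startElement` or `endElement` would need to call the base implementation so the flush still happens.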



Joe Kesselman July 13th, 2006 10:05 PM
#4: Re: Problem with "&" character in xml.

Note that any good SAX tutorial will demonstrate how to buffer the
characters() events, if you don't feel like reinventing the solution
yourself.
 