
Problem with "&" character in XML.


I have walked a directory and written the following XML document.
One of the folders had an "&" character in its name, so I replaced it with "&amp;".
#------------------test1.xml
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye
w&amp;y </dirname>
<file>
<name>def.txt</name>
<time>200607130417</time>
</file>
</Directory>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\hii
wx</dirname>
<file>
<name>abc.txt</name>
<time>200607130415</time>
</file>
</Directory>

now in my python code i want to parse this doc and print the directory
name.
###----------handler------------filename---handlers.py
from xml.sax.handler import ContentHandler
class oldHandler(ContentHandler):
    def __init__(self):
        self.dn = 0
    def startElement(self, name, attrs):
        if name=='dirname':
            self.dn=1

    def characters(self,str):
        if self.dn:
            print str

    def endElement(self, name):
        if name == 'dirname':
            self.dn=0
#---------------------------------------------------------------------
#main code--- fname----art.py
import sys
from xml.sax import make_parser
from handlers import oldHandler

ch = oldHandler()
saxparser = make_parser()

saxparser.setContentHandler(ch)
saxparser.parse(sys.argv[1])
#-----------------------------------------------------------------------------
i run the code as: $python art.py test1.xml

i am getting output as:

C:\Documents and Settings\Administrator\Desktop\1\bye w
&
y
C:\Documents and Settings\Administrator\Desktop\1\hii wx

whereas I need output that looks like this:
C:\Documents and Settings\Administrator\Desktop\1\bye w&y

C:\Documents and Settings\Administrator\Desktop\1\hii wx

Can someone tell me the solution for this.

Jul 13 '06 #1


Kirt wrote:
> def characters(self,str):
>     if self.dn:
>         print str

The problem is here. "print" adds a newline. Don't use print, just append the
characters (to a string or list) until the endElement callback is called.

Jul 13 '06 #2

How do i append characters to a string?

actually my entire handler code is
class oldHandler(ContentHandler):
    def __init__(self):
        self.fn = 0
        self.dn = 0
        self.i=[]
        self.x=""
        self.y=""
        self.z=""
        self.t=0
        self.xx=''

    def startElement(self, name, attrs):
        if name=='dirname':
            self.dn=1
        if name=='name':
            self.fn=1
        if name=='time':
            self.t=1

    def characters(self,str):
        if self.dn:
            self.x=str

        if self.fn:
            self.y=str
        if self.t:
            self.z=str
            ss= self.x+'/'+self.y+','+self.z+ '\r \n'
            self.i.append(ss)
    def endElement(self, name):
        if name == 'dirname':
            self.dn=0
        if name=='name':
            self.fn=0
        if name=='time':
            self.t=0
    def endDocument(self):
        f=open('old.txt', 'w')
        self.i.sort
        f.writelines(self.i)
        f.close
so my old.txt now looks like this
y+def.txt,200607130417
C:\Documents and Settings\Administrator\Desktop\1\hii
wx\abc.txt,200607130415

But I want the output as
C:\Documents and Settings\Administrator\Desktop\1\bye
w&y\def.txt,200607130417
C:\Documents and Settings\Administrator\Desktop\1\hii
wx\abc.txt,200607130415
Jul 13 '06 #3

Kirt wrote:
> How do i append characters to a string?

I think the normal approach is to store an empty string (or list) in an
attribute in startElement(), append to it in characters() and use the result
in endElement().

    def startElement(self, name, attrs):
        self.chars = ''
    def characters(self, s):
        self.chars += s
    def endElement(self, name):
        value = self.chars

Or use a list and do this:

    def endElement(self, name):
        value = ''.join(self.char_list)
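
A minimal sketch of the buffering approach described above, written for Python 3 and applied to the dirname case from the original question. The class name DirnameHandler, the SAMPLE document and its <root> wrapper element are assumptions made for illustration; they are not from the thread.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Assumed sample: the thread's two entries, wrapped in a hypothetical <root>
# element so that the document has a single top-level element.
SAMPLE = rb"""<root>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye w&amp;y </dirname>
</Directory>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\hii wx</dirname>
</Directory>
</root>"""


class DirnameHandler(ContentHandler):
    """Collects the full text of every <dirname> element."""

    def __init__(self):
        super().__init__()
        self.in_dirname = False
        self.chars = []      # pieces of the current <dirname> text
        self.dirnames = []   # completed directory names

    def startElement(self, name, attrs):
        if name == "dirname":
            self.in_dirname = True
            self.chars = []  # start a fresh buffer for this element

    def characters(self, content):
        # May be called several times for one text node, so only accumulate.
        if self.in_dirname:
            self.chars.append(content)

    def endElement(self, name):
        if name == "dirname":
            # The text is complete only now; join the pieces and store them.
            self.dirnames.append("".join(self.chars).strip())
            self.in_dirname = False


handler = DirnameHandler()
xml.sax.parseString(SAMPLE, handler)
for dirname in handler.dirnames:
    print(dirname)
```

The list-plus-join variant is used here; replacing the list with a string and += would work the same way for small documents.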

Maybe you should consider switching to iterparse() of ElementTree or lxml.
Should be a bit easier to use than SAX ...

http://effbot.org/zone/element-iterparse.htm
http://codespeak.net/svn/lxml/trunk/doc/api.txt

Stefan
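
For comparison, here is a rough sketch of the iterparse() route mentioned above, using xml.etree.ElementTree from the Python 3 standard library. The helper name iter_dirnames, the SAMPLE document and its <root> wrapper are assumptions for illustration only.

```python
import io
import xml.etree.ElementTree as ET

# Assumed sample: the thread's two entries inside a hypothetical <root> element.
SAMPLE = rb"""<root>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye w&amp;y </dirname>
<file><name>def.txt</name><time>200607130417</time></file>
</Directory>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\hii wx</dirname>
<file><name>abc.txt</name><time>200607130415</time></file>
</Directory>
</root>"""


def iter_dirnames(source):
    """Yield the text of each <dirname> element found in source."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "dirname":
            # On the "end" event the element text is already complete,
            # so no manual buffering is needed.
            yield (elem.text or "").strip()
        elif elem.tag == "Directory":
            elem.clear()  # drop children we no longer need


found = list(iter_dirnames(io.BytesIO(SAMPLE)))
for dirname in found:
    print(dirname)
```

The entity reference is resolved by the parser, so elem.text already contains "&" rather than "&amp;".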

Jul 13 '06 #4

Thanks Stefan, your approach worked.

Jul 13 '06 #5

A SAX parser may report a single text node through any number of calls to
the characters method, so you need to accumulate everything you receive in
characters and output the text only when you get a notification other than
characters.

Best Regards,
George
---------------------------------------------------------------------
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
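
As a sketch of this advice applied to the full handler posted earlier in the thread (dirname, name and time combined into one line per file), the version below buffers text in characters() and only uses it in endElement(). It is written for Python 3; the class name ListingHandler, the SAMPLE document and its <root> wrapper are assumptions, and the lines are kept in a list instead of being written to old.txt. Note also that the posted handler wrote self.i.sort and f.close without parentheses, which only references those methods without calling them.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Assumed sample: the thread's document inside a hypothetical <root> element.
SAMPLE = rb"""<root>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye w&amp;y </dirname>
<file><name>def.txt</name><time>200607130417</time></file>
</Directory>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\hii wx</dirname>
<file><name>abc.txt</name><time>200607130415</time></file>
</Directory>
</root>"""


class ListingHandler(ContentHandler):
    """Builds one 'dirname\\name,time' line per <file> element."""

    def __init__(self):
        super().__init__()
        self.buffer = []    # text pieces of the element being read
        self.dirname = ""
        self.filename = ""
        self.lines = []

    def startElement(self, name, attrs):
        self.buffer = []    # discard whitespace seen between elements

    def characters(self, content):
        self.buffer.append(content)  # accumulate, never overwrite

    def endElement(self, name):
        text = "".join(self.buffer).strip()
        self.buffer = []
        if name == "dirname":
            self.dirname = text
        elif name == "name":
            self.filename = text
        elif name == "time":
            # All three values are complete here, so the line can be built.
            self.lines.append(f"{self.dirname}\\{self.filename},{text}")

    def endDocument(self):
        self.lines.sort()   # note the call parentheses


listing = ListingHandler()
xml.sax.parseString(SAMPLE, listing)
print("\n".join(listing.lines))
```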

Jul 13 '06 #6

Note that any good SAX tutorial will demonstrate how to buffer the
characters() events, if you don't feel like reinventing the solution
yourself.
Jul 13 '06 #7

Kirt wrote:
> i have walked a directory and have written the foll xml document.
> one of the folder had "&" character so i replaced it by "&amp;"
> #------------------test1.xml
> <Directory>
> <dirname>C:\Documents and Settings\Administrator\Desktop\1\bye
> w&amp;y </dirname>
> [...]
> #-----------------------------------------------------------------------------
> i run the code as: $python art.py test1.xml
>
> i am getting output as:
>
> C:\Documents and Settings\Administrator\Desktop\1\bye w
> &
> y
> C:\Documents and Settings\Administrator\Desktop\1\hii wx
>
> where as i need an output which should look like this.
> C:\Documents and Settings\Administrator\Desktop\1\bye w&y
>
> C:\Documents and Settings\Administrator\Desktop\1\hii wx

This SAX filter is another way of doing it:

http://aspn.activestate.com/ASPN/Coo.../Recipe/265881

Stefan
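
The linked recipe is not reproduced here. As an independent sketch of the general idea of a SAX filter that merges adjacent characters() events, the code below subclasses xml.sax.saxutils.XMLFilterBase from the Python 3 standard library. The names TextMergingFilter and RecordingHandler, the SAMPLE document and its <root> wrapper are assumptions for illustration, and only the non-namespace element events are handled.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase

# Assumed sample: two entries inside a hypothetical <root> element.
SAMPLE = rb"""<root>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye w&amp;y</dirname>
</Directory>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\hii wx</dirname>
</Directory>
</root>"""


class TextMergingFilter(XMLFilterBase):
    """Forwards each run of text as a single characters() event."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._pending = []

    def _flush(self):
        if self._pending:
            text = "".join(self._pending)
            self._pending = []
            super().characters(text)  # one merged event downstream

    def characters(self, content):
        self._pending.append(content)  # hold back until another event arrives

    def startElement(self, name, attrs):
        self._flush()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        super().endElement(name)

    def endDocument(self):
        self._flush()
        super().endDocument()


class RecordingHandler(ContentHandler):
    """Deliberately naive handler: records each characters() call as-is."""

    def __init__(self):
        super().__init__()
        self.in_dirname = False
        self.seen = []

    def startElement(self, name, attrs):
        self.in_dirname = name == "dirname"

    def characters(self, content):
        if self.in_dirname:
            self.seen.append(content)

    def endElement(self, name):
        self.in_dirname = False


recorder = RecordingHandler()
reader = TextMergingFilter(xml.sax.make_parser())
reader.setContentHandler(recorder)
reader.parse(io.BytesIO(SAMPLE))
print(recorder.seen)
```

With the filter in place, the downstream handler can stay as simple as the one in the original question, because each dirname arrives in a single call.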
Jul 17 '06 #8
