Problem with "&" character in XML


I have walked a directory and written the following XML document.
One of the folders had an "&" character, so I replaced it with "&amp;":
#------------------test1.xml
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye w&amp;y </dirname>
<file>
<name>def.txt</name>
<time>200607130417</time>
</file>
</Directory>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\hii wx</dirname>
<file>
<name>abc.txt</name>
<time>200607130415</time>
</file>
</Directory>

Now, in my Python code, I want to parse this document and print each directory
name.
###----------handler------------filename---handlers.py
from xml.sax.handler import ContentHandler

class oldHandler(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        self.dn = 0

    def startElement(self, name, attrs):
        if name == 'dirname':
            self.dn = 1

    def characters(self, content):
        if self.dn:
            print(content)

    def endElement(self, name):
        if name == 'dirname':
            self.dn = 0
#---------------------------------------------------------------------
#main code--- fname----art.py
import sys
from xml.sax import make_parser
from handlers import oldHandler

ch = oldHandler()
saxparser = make_parser()

saxparser.setContentHandler(ch)
saxparser.parse(sys.argv[1])
#-----------------------------------------------------------------------------
I run the code as: $ python art.py test1.xml

I get this output:

C:\Documents and Settings\Administrator\Desktop\1\bye w
&
y
C:\Documents and Settings\Administrator\Desktop\1\hii wx

whereas I need output that looks like this:
C:\Documents and Settings\Administrator\Desktop\1\bye w&y

C:\Documents and Settings\Administrator\Desktop\1\hii wx

Can someone tell me how to fix this?

Jul 13 '06 #1
Kirt wrote:
[...]
    def characters(self, content):
        if self.dn:
            print(content)

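The problem is here. "print" adds a newline after every characters() call, and the
parser may call characters() several times for a single piece of text. Don't use
print here; just append the characters (to a string or list) until the endElement()
callback is called.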
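A minimal sketch of that advice, written in Python 3 syntax. The class name `DirnameHandler`, its `dirnames` list and the inline sample are assumptions for illustration. The sample wraps the `<Directory>` elements in a hypothetical `<Directories>` root, because a well-formed XML document has exactly one root element, which the listing in the question lacks.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class DirnameHandler(ContentHandler):
    """Collects the full text of each <dirname> before using it."""

    def __init__(self):
        super().__init__()
        self.in_dirname = False
        self.parts = []      # text chunks of the current <dirname>
        self.dirnames = []   # completed directory names, in document order

    def startElement(self, name, attrs):
        if name == 'dirname':
            self.in_dirname = True
            self.parts = []

    def characters(self, content):
        # Only collect here; one text node may arrive in several pieces.
        if self.in_dirname:
            self.parts.append(content)

    def endElement(self, name):
        if name == 'dirname':
            self.in_dirname = False
            # strip() drops the trailing space and any surrounding newlines
            dirname = ''.join(self.parts).strip()
            self.dirnames.append(dirname)
            print(dirname)


# Hypothetical sample: same shape as test1.xml, plus a <Directories> root.
SAMPLE = rb"""<Directories>
<Directory><dirname>C:\Temp\bye w&amp;y </dirname></Directory>
<Directory><dirname>C:\Temp\hii wx</dirname></Directory>
</Directories>"""

handler = DirnameHandler()
xml.sax.parseString(SAMPLE, handler)
```

Because characters() only collects, printing happens once per element, in endElement(), so the sample prints `C:\Temp\bye w&y` on a single line.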

Jul 13 '06 #2
How do I append characters to a string?

Actually, my entire handler code is:
class oldHandler(ContentHandler):
    def __init__(self):
        self.fn = 0
        self.dn = 0
        self.i = []
        self.x = ""
        self.y = ""
        self.z = ""
        self.t = 0
        self.xx = ''

    def startElement(self, name, attrs):
        if name == 'dirname':
            self.dn = 1
        if name == 'name':
            self.fn = 1
        if name == 'time':
            self.t = 1

    def characters(self, content):
        if self.dn:
            self.x = content

        if self.fn:
            self.y = content
        if self.t:
            self.z = content
            ss = self.x + '/' + self.y + ',' + self.z + '\r \n'
            self.i.append(ss)

    def endElement(self, name):
        if name == 'dirname':
            self.dn = 0
        if name == 'name':
            self.fn = 0
        if name == 'time':
            self.t = 0

    def endDocument(self):
        f = open('old.txt', 'w')
        self.i.sort()
        f.writelines(self.i)
        f.close()
So my old.txt now looks like this:
y+def.txt,200607130417
C:\Documents and Settings\Administrator\Desktop\1\hii wx\abc.txt,200607130415

But I want the output to be:
C:\Documents and Settings\Administrator\Desktop\1\bye w&y\def.txt,200607130417
C:\Documents and Settings\Administrator\Desktop\1\hii wx\abc.txt,200607130415
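A sketch of the handler above restructured around one text buffer per element, again in Python 3. It assumes the backslash separator shown in the desired output and that `<time>` is the last field inside each `<file>`. The class name `OldHandler`, the `out_path` parameter (pass e.g. `'old.txt'` to write the file) and the sample document are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class OldHandler(ContentHandler):
    """Builds one 'dirname\\filename,time' line per <file> element."""

    def __init__(self, out_path=None):
        super().__init__()
        self.out_path = out_path  # e.g. 'old.txt'; None keeps lines in memory only
        self.buffer = []          # text chunks of the element being read
        self.dirname = ''
        self.filename = ''
        self.lines = []

    def startElement(self, name, attrs):
        self.buffer = []          # fresh buffer for every element

    def characters(self, content):
        self.buffer.append(content)  # may run several times per text node

    def endElement(self, name):
        text = ''.join(self.buffer).strip()
        self.buffer = []
        if name == 'dirname':
            self.dirname = text
        elif name == 'name':
            self.filename = text
        elif name == 'time':
            # assumed to be the last field of a <file>, so the record is complete
            self.lines.append('%s\\%s,%s\n' % (self.dirname, self.filename, text))

    def endDocument(self):
        self.lines.sort()         # sort() must be called, not just referenced
        if self.out_path is not None:
            with open(self.out_path, 'w') as f:
                f.writelines(self.lines)


# Hypothetical sample with a single <Directories> root element.
SAMPLE = rb"""<Directories>
<Directory>
<dirname>C:\Temp\bye w&amp;y </dirname>
<file><name>def.txt</name><time>200607130417</time></file>
</Directory>
<Directory>
<dirname>C:\Temp\hii wx</dirname>
<file><name>abc.txt</name><time>200607130415</time></file>
</Directory>
</Directories>"""

handler = OldHandler()
xml.sax.parseString(SAMPLE, handler)
print(''.join(handler.lines), end='')
```

The key change from the original handler is that characters() never overwrites anything; every value is read from the joined buffer in endElement().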
Jul 13 '06 #3
Kirt wrote:
How do i append characters to a string?
I think the normal approach is to store an empty string (or list) in an
attribute in startElement(), append to it in characters() and use the result
in endElement().

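    def startElement(self, name, attrs):
        self.chars = ''

    def characters(self, s):
        self.chars += s

    def endElement(self, name):
        value = self.chars

Or use a list and do this:

    def startElement(self, name, attrs):
        self.char_list = []

    def characters(self, s):
        self.char_list.append(s)

    def endElement(self, name):
        value = ''.join(self.char_list)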

Maybe you should consider switching to iterparse() from ElementTree or lxml.
It should be a bit easier to use than SAX.

http://effbot.org/zone/element-iterparse.htm
http://codespeak.net/svn/lxml/trunk/doc/api.txt
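The iterparse() suggestion can be sketched with the standard library's `xml.etree.ElementTree` (lxml's `etree.iterparse` is broadly compatible). On an `'end'` event the element is complete, so `elem.text` already holds the whole text with `&amp;` decoded. The function name `iter_dirnames` and the sample document are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_dirnames(source):
    """Yield the text of every <dirname> element in a file name or stream."""
    for event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == 'dirname':
            # The element is finished, so its text is complete here.
            yield (elem.text or '').strip()
        elif elem.tag == 'Directory':
            elem.clear()  # drop the finished subtree to limit memory use


# Hypothetical sample with a single <Directories> root element.
SAMPLE = rb"""<Directories>
<Directory><dirname>C:\Temp\bye w&amp;y </dirname></Directory>
<Directory><dirname>C:\Temp\hii wx</dirname></Directory>
</Directories>"""

for name in iter_dirnames(io.BytesIO(SAMPLE)):
    print(name)
```

With a real file, `iter_dirnames('test1.xml')` works the same way, since iterparse() also accepts a file name.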

Stefan

Jul 13 '06 #4
Thanks, Stefan, your approach worked.

Jul 13 '06 #5
A SAX parser may report a single text node through any number of calls to the
characters() method, so you need to accumulate everything you receive in
characters() and output the text only when you get a notification other than
characters().
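One way to see this is to record every characters() call separately. How the text is split is up to the parser; with the expat-based reader that `xml.sax` uses by default, the entity reference typically arrives as its own chunk, which matches the three-line output in the original question. Only the joined string is something to rely on. The class name `ChunkRecorder` and the sample path are illustrative.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ChunkRecorder(ContentHandler):
    """Records every characters() call exactly as the parser delivers it."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


recorder = ChunkRecorder()
xml.sax.parseString(rb"<dirname>C:\Temp\bye w&amp;y</dirname>", recorder)
print(recorder.chunks)           # how the text node was split (parser-dependent)
print(''.join(recorder.chunks))  # the accumulated text: C:\Temp\bye w&y
```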

Best Regards,
George
---------------------------------------------------------------------
George Cristian Bina
<oXygen/XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

Jul 13 '06 #6
Note that any good SAX tutorial will demonstrate how to buffer the
characters() events, if you don't feel like reinventing the solution
yourself.
Jul 13 '06 #7
Kirt wrote:
[...]
This SAX filter is another way of doing it:

http://aspn.activestate.com/ASPN/Coo.../Recipe/265881
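The recipe behind that (truncated) link is not reproduced here. As an independent sketch of the same idea, the standard library's `xml.sax.saxutils.XMLFilterBase` can sit between the parser and any handler, buffer consecutive characters() events, and pass each text run on as a single call. The class names `TextNormalizingFilter` and `DirnamePrinter` and the sample document are hypothetical; only the events overridden below flush the buffer.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class TextNormalizingFilter(XMLFilterBase):
    """Merges consecutive characters() events into one downstream call."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._pending = []

    def _flush(self):
        if self._pending:
            text = ''.join(self._pending)
            self._pending = []
            super().characters(text)

    def characters(self, content):
        self._pending.append(content)  # hold text until a non-text event

    def startElement(self, name, attrs):
        self._flush()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        super().endElement(name)

    def processingInstruction(self, target, data):
        self._flush()
        super().processingInstruction(target, data)

    def endDocument(self):
        self._flush()
        super().endDocument()


class DirnamePrinter(ContentHandler):
    """A naive handler like the one in the question; safe behind the filter."""

    def __init__(self):
        super().__init__()
        self.in_dirname = False
        self.dirnames = []

    def startElement(self, name, attrs):
        self.in_dirname = (name == 'dirname')

    def characters(self, content):
        if self.in_dirname:
            self.dirnames.append(content.strip())

    def endElement(self, name):
        self.in_dirname = False


# Hypothetical sample with a single <Directories> root element.
SAMPLE = rb"""<Directories>
<Directory><dirname>C:\Temp\bye w&amp;y </dirname></Directory>
<Directory><dirname>C:\Temp\hii wx</dirname></Directory>
</Directories>"""

handler = DirnamePrinter()
text_filter = TextNormalizingFilter(xml.sax.make_parser())
text_filter.setContentHandler(handler)
text_filter.parse(io.BytesIO(SAMPLE))
print(handler.dirnames)
```

The design choice here is to keep buffering out of the application handler entirely, so existing handlers that assume one characters() call per text node keep working unchanged.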

Stefan
Jul 17 '06 #8
