473,696 Members | 1,770 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Problem with "&" charater in xml.


i have walked a directory and have written the foll xml document.
one of the folder had "&" character so i replaced it by "&"
#------------------test1.xml
<Directory>
<dirname>C:\Doc uments and Settings\Admini strator\Desktop \1\bye
w&amp;y </dirname>
<file>
<name>def.txt </name>
<time>200607130 417</time>
</file>
</Directory>
<Directory>
<dirname>C:\Doc uments and Settings\Admini strator\Desktop \1\hii
wx</dirname>
<file>
<name>abc.txt </name>
<time>200607130 415</time>
</file>
</Directory

now in my python code i want to parse this doc and print the directory
name.
###----------handler------------filename---handler.py
from xml.sax.handler import ContentHandler
class oldHandler(Cont entHandler):
def __init__(self):
self.dn = 0
def startElement(se lf, name, attrs):
if name=='dirname' :
self.dn=1

def characters(self ,str):
if self.dn:
print str

def endElement(self , name):
if name == 'dirname':
self.dn=0
#---------------------------------------------------------------------
#main code--- fname----art.py
import sys
from xml.sax import make_parser
from handlers import oldHandler

ch = oldHandler()
saxparser = make_parser()

saxparser.setCo ntentHandler(ch )
saxparser.parse (sys.argv[1])
#-----------------------------------------------------------------------------
i run the code as: $python art.py test1.xml

i am getting output as:

C:\Documents and Settings\Admini strator\Desktop \1\bye w
&
y
C:\Documents and Settings\Admini strator\Desktop \1\hii wx

where as i need an output which should look like this.
C:\Documents and Settings\Admini strator\Desktop \1\bye w&y

C:\Documents and Settings\Admini strator\Desktop \1\hii wx

Can someone tell me the solution for this.

Jul 13 '06 #1
7 1794
Kirt wrote:
i have walked a directory and have written the foll xml document.
one of the folder had "&" character so i replaced it by "&amp;"
#------------------test1.xml
<Directory>
<dirname>C:\Doc uments and Settings\Admini strator\Desktop \1\bye
w&amp;y </dirname>
<file>
<name>def.txt </name>
<time>200607130 417</time>
</file>
</Directory>
<Directory>
<dirname>C:\Doc uments and Settings\Admini strator\Desktop \1\hii
wx</dirname>
<file>
<name>abc.txt </name>
<time>200607130 415</time>
</file>
</Directory

now in my python code i want to parse this doc and print the directory
name.
###----------handler------------filename---handler.py
from xml.sax.handler import ContentHandler
class oldHandler(Cont entHandler):
def __init__(self):
self.dn = 0
def startElement(se lf, name, attrs):
if name=='dirname' :
self.dn=1

def characters(self ,str):
if self.dn:
print str

The problem is here. "print" adds a newline. Don't use print, just append the
characters (to a string or list) until the endElement callback is called.

def endElement(self , name):
if name == 'dirname':
self.dn=0
#---------------------------------------------------------------------
#main code--- fname----art.py
import sys
from xml.sax import make_parser
from handlers import oldHandler

ch = oldHandler()
saxparser = make_parser()

saxparser.setCo ntentHandler(ch )
saxparser.parse (sys.argv[1])
#-----------------------------------------------------------------------------
i run the code as: $python art.py test1.xml

i am getting output as:

C:\Documents and Settings\Admini strator\Desktop \1\bye w
&
y
C:\Documents and Settings\Admini strator\Desktop \1\hii wx

where as i need an output which should look like this.
C:\Documents and Settings\Admini strator\Desktop \1\bye w&y

C:\Documents and Settings\Admini strator\Desktop \1\hii wx

Can someone tell me the solution for this.
Jul 13 '06 #2
How do i append characters to a string?

actually my entire handler code is
class oldHandler(Cont entHandler):
def __init__(self):
self.fn = 0
self.dn = 0
self.i=[]
self.x=""
self.y=""
self.z=""
self.t=0
self.xx=''

def startElement(se lf, name, attrs):
if name=='dirname' :
self.dn=1
if name=='name':
self.fn=1
if name=='time':
self.t=1

def characters(self ,str):
if self.dn:
self.x=str

if self.fn:
self.y=str
if self.t:
self.z=str
ss= self.x+'/'+self.y+','+se lf.z+ '\r \n'
self.i.append(s s)
def endElement(self , name):
if name == 'dirname':
self.dn=0
if name=='name':
self.fn=0
if name=='time':
self.t=0
def endDocument(sel f):
f=open('old.txt ', 'w')
self.i.sort
f.writelines(se lf.i)
f.close
so my old.txt now looks like this
y+def.txt,20060 7130417
C:\Documents and Settings\Admini strator\Desktop \1\hii
wx\abc.txt,2006 07130415

But i wont the output as
C:\Documents and Settings\Admini strator\Desktop \1\bye
w&y\def.txt,200 607130417
C:\Documents and Settings\Admini strator\Desktop \1\hii
wx\abc.txt,2006 07130415
Stefan Behnel wrote:
Kirt wrote:
i have walked a directory and have written the foll xml document.
one of the folder had "&" character so i replaced it by "&amp;"
#------------------test1.xml
<Directory>
<dirname>C:\Doc uments and Settings\Admini strator\Desktop \1\bye
w&amp;y </dirname>
<file>
<name>def.txt </name>
<time>200607130 417</time>
</file>
</Directory>
<Directory>
<dirname>C:\Doc uments and Settings\Admini strator\Desktop \1\hii
wx</dirname>
<file>
<name>abc.txt </name>
<time>200607130 415</time>
</file>
</Directory

now in my python code i want to parse this doc and print the directory
name.
###----------handler------------filename---handler.py
from xml.sax.handler import ContentHandler
class oldHandler(Cont entHandler):
def __init__(self):
self.dn = 0
def startElement(se lf, name, attrs):
if name=='dirname' :
self.dn=1

def characters(self ,str):
if self.dn:
print str


The problem is here. "print" adds a newline. Don't use print, just append the
characters (to a string or list) until the endElement callback is called.

def endElement(self , name):
if name == 'dirname':
self.dn=0
#---------------------------------------------------------------------
#main code--- fname----art.py
import sys
from xml.sax import make_parser
from handlers import oldHandler

ch = oldHandler()
saxparser = make_parser()

saxparser.setCo ntentHandler(ch )
saxparser.parse (sys.argv[1])
#-----------------------------------------------------------------------------
i run the code as: $python art.py test1.xml

i am getting output as:

C:\Documents and Settings\Admini strator\Desktop \1\bye w
&
y
C:\Documents and Settings\Admini strator\Desktop \1\hii wx

where as i need an output which should look like this.
C:\Documents and Settings\Admini strator\Desktop \1\bye w&y

C:\Documents and Settings\Admini strator\Desktop \1\hii wx

Can someone tell me the solution for this.
Jul 13 '06 #3
Kirt wrote:
How do i append characters to a string?
I think the normal approach is to store an empty string (or list) in an
attribute in startElement(), append to it in characters() and use the result
in endElement().

def startElement(se lf, ...):
self.chars = ''
def characters(self , s):
self.chars += s
def endElement(self , ...):
value = self.chars

Or use a list and do this:

def endElement(self , ...):
value = ''.join(self.ch ar_list)

Maybe you should consider switching to iterparse() of ElementTree or lxml.
Should be a bit easier to use than SAX ...

http://effbot.org/zone/element-iterparse.htm
http://codespeak.net/svn/lxml/trunk/doc/api.txt

Stefan

Stefan Behnel wrote:
>Kirt wrote:
>>i have walked a directory and have written the foll xml document.
one of the folder had "&" character so i replaced it by "&amp;"
#------------------test1.xml
<Directory>
<dirname>C:\Doc uments and Settings\Admini strator\Desktop \1\bye
w&amp;y </dirname>
<file>
<name>def.txt </name>
<time>200607130 417</time>
</file>
</Directory>
<Directory>
<dirname>C:\Doc uments and Settings\Admini strator\Desktop \1\hii
wx</dirname>
<file>
<name>abc.txt </name>
<time>200607130 415</time>
</file>
</Directory

now in my python code i want to parse this doc and print the directory
name.
###----------handler------------filename---handler.py
from xml.sax.handler import ContentHandler
class oldHandler(Cont entHandler):
def __init__(self):
self.dn = 0
def startElement(se lf, name, attrs):
if name=='dirname' :
self.dn=1

def characters(self ,str):
if self.dn:
print str

The problem is here. "print" adds a newline. Don't use print, just append the
characters (to a string or list) until the endElement callback is called.

>> def endElement(self , name):
if name == 'dirname':
self.dn=0
#---------------------------------------------------------------------
#main code--- fname----art.py
import sys
from xml.sax import make_parser
from handlers import oldHandler

ch = oldHandler()
saxparser = make_parser()

saxparser.set ContentHandler( ch)
saxparser.par se(sys.argv[1])
#-----------------------------------------------------------------------------
i run the code as: $python art.py test1.xml

i am getting output as:

C:\Document s and Settings\Admini strator\Desktop \1\bye w
&
y
C:\Document s and Settings\Admini strator\Desktop \1\hii wx

where as i need an output which should look like this.
C:\Document s and Settings\Admini strator\Desktop \1\bye w&y

C:\Document s and Settings\Admini strator\Desktop \1\hii wx

Can someone tell me the solution for this.
Jul 13 '06 #4
thanx stefan ur approach worked.

Stefan Behnel wrote:
Kirt wrote:
How do i append characters to a string?

I think the normal approach is to store an empty string (or list) in an
attribute in startElement(), append to it in characters() and use the result
in endElement().

def startElement(se lf, ...):
self.chars = ''
def characters(self , s):
self.chars += s
def endElement(self , ...):
value = self.chars

Or use a list and do this:

def endElement(self , ...):
value = ''.join(self.ch ar_list)

Maybe you should consider switching to iterparse() of ElementTree or lxml.
Should be a bit easier to use than SAX ...

http://effbot.org/zone/element-iterparse.htm
http://codespeak.net/svn/lxml/trunk/doc/api.txt

Stefan

Stefan Behnel wrote:
Kirt wrote:
i have walked a directory and have written the foll xml document.
one of the folder had "&" character so i replaced it by "&amp;"
#------------------test1.xml
<Directory>
<dirname>C:\Doc uments and Settings\Admini strator\Desktop \1\bye
w&amp;y </dirname>
<file>
<name>def.txt </name>
<time>200607130 417</time>
</file>
</Directory>
<Directory>
<dirname>C:\Doc uments and Settings\Admini strator\Desktop \1\hii
wx</dirname>
<file>
<name>abc.txt </name>
<time>200607130 415</time>
</file>
</Directory

now in my python code i want to parse this doc and print the directory
name.
###----------handler------------filename---handler.py
from xml.sax.handler import ContentHandler
class oldHandler(Cont entHandler):
def __init__(self):
self.dn = 0
def startElement(se lf, name, attrs):
if name=='dirname' :
self.dn=1

def characters(self ,str):
if self.dn:
print str

The problem is here. "print" adds a newline. Don't use print, just append the
characters (to a string or list) until the endElement callback is called.
def endElement(self , name):
if name == 'dirname':
self.dn=0
#---------------------------------------------------------------------
#main code--- fname----art.py
import sys
from xml.sax import make_parser
from handlers import oldHandler

ch = oldHandler()
saxparser = make_parser()

saxparser.setC ontentHandler(c h)
saxparser.pars e(sys.argv[1])
#-----------------------------------------------------------------------------
i run the code as: $python art.py test1.xml

i am getting output as:

C:\Documents and Settings\Admini strator\Desktop \1\bye w
&
y
C:\Documents and Settings\Admini strator\Desktop \1\hii wx

where as i need an output which should look like this.
C:\Documents and Settings\Admini strator\Desktop \1\bye w&y

C:\Documents and Settings\Admini strator\Desktop \1\hii wx

Can someone tell me the solution for this.
Jul 13 '06 #5
A SAX parser can notify a text node by calling any number of times the
characters method so you need to accumulate all the information you
receive on the characters method and output the text when you get a
notification different than characters.

Best Regards,
George
---------------------------------------------------------------------
George Cristian Bina
<oXygen/XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

Kirt wrote:
i have walked a directory and have written the foll xml document.
one of the folder had "&" character so i replaced it by "&amp;"
#------------------test1.xml
<Directory>
<dirname>C:\Doc uments and Settings\Admini strator\Desktop \1\bye
w&amp;y </dirname>
<file>
<name>def.txt </name>
<time>200607130 417</time>
</file>
</Directory>
<Directory>
<dirname>C:\Doc uments and Settings\Admini strator\Desktop \1\hii
wx</dirname>
<file>
<name>abc.txt </name>
<time>200607130 415</time>
</file>
</Directory

now in my python code i want to parse this doc and print the directory
name.
###----------handler------------filename---handler.py
from xml.sax.handler import ContentHandler
class oldHandler(Cont entHandler):
def __init__(self):
self.dn = 0
def startElement(se lf, name, attrs):
if name=='dirname' :
self.dn=1

def characters(self ,str):
if self.dn:
print str

def endElement(self , name):
if name == 'dirname':
self.dn=0
#---------------------------------------------------------------------
#main code--- fname----art.py
import sys
from xml.sax import make_parser
from handlers import oldHandler

ch = oldHandler()
saxparser = make_parser()

saxparser.setCo ntentHandler(ch )
saxparser.parse (sys.argv[1])
#-----------------------------------------------------------------------------
i run the code as: $python art.py test1.xml

i am getting output as:

C:\Documents and Settings\Admini strator\Desktop \1\bye w
&
y
C:\Documents and Settings\Admini strator\Desktop \1\hii wx

where as i need an output which should look like this.
C:\Documents and Settings\Admini strator\Desktop \1\bye w&y

C:\Documents and Settings\Admini strator\Desktop \1\hii wx

Can someone tell me the solution for this.
Jul 13 '06 #6
Note that any good SAX tutorial will demonstrate how to buffer the
characters() events, if you don't feel like reinventing the solution
yourself.
Jul 13 '06 #7
Kirt wrote:
i have walked a directory and have written the foll xml document.
one of the folder had "&" character so i replaced it by "&amp;"
#------------------test1.xml
<Directory>
<dirname>C:\Doc uments and Settings\Admini strator\Desktop \1\bye
w&amp;y </dirname>
[...]
#-----------------------------------------------------------------------------
i run the code as: $python art.py test1.xml

i am getting output as:

C:\Documents and Settings\Admini strator\Desktop \1\bye w
&
y
C:\Documents and Settings\Admini strator\Desktop \1\hii wx

where as i need an output which should look like this.
C:\Documents and Settings\Admini strator\Desktop \1\bye w&y

C:\Documents and Settings\Admini strator\Desktop \1\hii wx
This SAX filter is another way of doing it:

http://aspn.activestate.com/ASPN/Coo.../Recipe/265881

Stefan
Jul 17 '06 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
1990
by: C. Titus Brown | last post by:
Hi all, while playing with PBP/mechanize/ClientForm, I ran into a problem with the way htmllib.HTMLParser was handling encoded tag attributes. Specifically, the following HTML was not being handled correctly: <option value="Small (6&quot;)">Small (6)</option> The 'value' attr was being given the escaped value, not the
1
11439
by: DrTebi | last post by:
Hello, I have the following problem: I used to "encode" my email address within links, in order to avoid (most) email spiders. So I had a link like this: <a href="mailto:DrTebi@yahoo.com">DrTebi</a> This would work like a regular mailto link in any browser, but wouldn't be visible to spiders if they don't have a function to decode it.
11
3638
by: BoonHead, The Lost Philosopher | last post by:
I think the .NET framework is great! It's nice, clean and logical; in contradiction to the old Microsoft. It only saddens me that the new Microsoft still doesn't under stand there own rules when it comes to file paths. A lot of Microsoft installers for example, and also installers of other companies, do not work because they handle paths in the following manner:
2
20520
by: Eric Osman | last post by:
Hi, I'm looking for a javascript function that will convert input such as this: <CLUB Code=" into this: &lt;CLUB Code=&quot;
4
14790
by: barney | last post by:
Hello, I' m using .NET System.Xml.XmlDOcument. When I do the following: XmlDocument xml = new XmlDocument(); xml.Load("blah"); .... xml.Save("blub"); I've got the problem that the following expression: .... snip ...
5
3434
by: martin | last post by:
Hi, I would be extremly grateful for some help on producing an xml fragemt. The fragment that I wish to produce should look like this <Addresses> <Address>&qout;Somebody's Name&quot; &lt;me@mydomain.com&gt;</Address> </Addresses>
14
5928
by: Arne | last post by:
A lot of Firefox users I know, says they have problems with validation where the ampersand sign has to be written as &amp; to be valid. I don't have Firefox my self and don't wont to install it only because of this, so I hope some of you gurus can enlighten me with this :) In what circumstances can the "&amp;" in the source code be involuntary changed to "&" by a browser when or other software, when editing and uploading the file to the web...
3
1136
by: Kirt | last post by:
i have walked a directory and have written the foll xml document. one of the folder had "&" character so i replaced it by "&amp;" #------------------test1.xml <Directory> <dirname>C:\Documents and Settings\Administrator\Desktop\1\bye w&amp;y </dirname> <file> <name>def.txt</name> <time>200607130417</time> </file>
13
2800
by: Ragnar | last post by:
Hi, 2 issues left with my tidy-work: 1) Tidy transforms a "&amp;" in the source-xml into a "&" in the tidied version. My XML-Importer cannot handle it 2) in a long <title>-string a wrap is produced like: <title>my very long title blab la blab la Blabla bla </title> Importer also has got problems with it
0
8590
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
8845
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
7693
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
6512
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5848
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4351
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
1
3025
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
2304
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
1988
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.