I have walked a directory and written the following XML document.
One of the folders had an "&" character in its name, so I replaced it with "&amp;".
#------------------test1.xml
<Directory>
  <dirname>C:\Documents and Settings\Administrator\Desktop\1\bye w&amp;y </dirname>
  <file>
    <name>def.txt</name>
    <time>200607130417</time>
  </file>
</Directory>
<Directory>
  <dirname>C:\Documents and Settings\Administrator\Desktop\1\hii wx</dirname>
  <file>
    <name>abc.txt</name>
    <time>200607130415</time>
  </file>
</Directory>
Now, in my Python code, I want to parse this document and print the directory
name.
###----------handler------------filename---handlers.py
from xml.sax.handler import ContentHandler

class oldHandler(ContentHandler):
    def __init__(self):
        self.dn = 0
    def startElement(self, name, attrs):
        if name=='dirname':
            self.dn=1
    def characters(self,str):
        if self.dn:
            print str
    def endElement(self, name):
        if name == 'dirname':
            self.dn=0
#---------------------------------------------------------------------
#main code--- fname----art.py
import sys
from xml.sax import make_parser
from handlers import oldHandler
ch = oldHandler()
saxparser = make_parser()
saxparser.setContentHandler(ch)
saxparser.parse(sys.argv[1])
#-----------------------------------------------------------------------------
I run the code as: $ python art.py test1.xml
I am getting this output:
C:\Documents and Settings\Administrator\Desktop\1\bye w
&
y
C:\Documents and Settings\Administrator\Desktop\1\hii wx
whereas I need output that looks like this:
C:\Documents and Settings\Administrator\Desktop\1\bye w&y
C:\Documents and Settings\Administrator\Desktop\1\hii wx
Can someone tell me the solution for this?
Kirt wrote:
[...]
    def characters(self,str):
        if self.dn:
            print str
The problem is here. "print" adds a newline. Don't use print, just append the
characters (to a string or list) until the endElement callback is called.
How do I append characters to a string?
Actually, my entire handler code is:
class oldHandler(ContentHandler):
    def __init__(self):
        self.fn = 0
        self.dn = 0
        self.i=[]
        self.x=""
        self.y=""
        self.z=""
        self.t=0
        self.xx=''
    def startElement(self, name, attrs):
        if name=='dirname':
            self.dn=1
        if name=='name':
            self.fn=1
        if name=='time':
            self.t=1
    def characters(self,str):
        if self.dn:
            self.x=str
        if self.fn:
            self.y=str
        if self.t:
            self.z=str
            ss= self.x+'/'+self.y+','+self.z+ '\r \n'
            self.i.append(ss)
    def endElement(self, name):
        if name == 'dirname':
            self.dn=0
        if name=='name':
            self.fn=0
        if name=='time':
            self.t=0
    def endDocument(self):
        f=open('old.txt', 'w')
        self.i.sort
        f.writelines(self.i)
        f.close
So my old.txt now looks like this:
y+def.txt,200607130417
C:\Documents and Settings\Administrator\Desktop\1\hii wx\abc.txt,200607130415
But I want the output to be:
C:\Documents and Settings\Administrator\Desktop\1\bye w&y\def.txt,200607130417
C:\Documents and Settings\Administrator\Desktop\1\hii wx\abc.txt,200607130415
Kirt wrote:
How do I append characters to a string?
I think the normal approach is to store an empty string (or list) in an
attribute in startElement(), append to it in characters() and use the result
in endElement().
    def startElement(self, ...):
        self.chars = ''
    def characters(self, s):
        self.chars += s
    def endElement(self, ...):
        value = self.chars

Or use a list and do this:

    def endElement(self, ...):
        value = ''.join(self.char_list)
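For reference, the pattern above can be filled out into a small runnable script. This is only a sketch under a few assumptions: it is written for Python 3 (the code in this thread is Python 2), the class name `DirnameHandler` is made up for illustration, and the sample is wrapped in a hypothetical `<root>` element so that the document has a single root.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Assumed sample data: the <root> wrapper is not part of the original file.
SAMPLE = r"""<root>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye w&amp;y</dirname>
</Directory>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\hii wx</dirname>
</Directory>
</root>"""


class DirnameHandler(ContentHandler):
    """Collects the complete text of every <dirname> element."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.in_dirname = False
        self.chunks = []    # pieces delivered by characters()
        self.dirnames = []  # finished directory names

    def startElement(self, name, attrs):
        if name == 'dirname':
            self.in_dirname = True
            self.chunks = []  # start a fresh buffer for this element

    def characters(self, content):
        # May be called several times for one text node, so only buffer here.
        if self.in_dirname:
            self.chunks.append(content)

    def endElement(self, name):
        if name == 'dirname':
            # The text is complete only now; join the pieces once.
            self.dirnames.append(''.join(self.chunks).strip())
            self.in_dirname = False


handler = DirnameHandler()
xml.sax.parseString(SAMPLE.encode('utf-8'), handler)
for dirname in handler.dirnames:
    print(dirname)
```

The list-and-join variant is used here; the `self.chars += s` variant behaves the same for input of this size.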
Maybe you should consider switching to iterparse() of ElementTree or lxml.
Should be a bit easier to use than SAX ...
http://effbot.org/zone/element-iterparse.htm
http://codespeak.net/svn/lxml/trunk/doc/api.txt
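A rough idea of what the iterparse() route could look like, using the `xml.etree.ElementTree` module that ships with current Python versions. This is a sketch, not tested against the real file: the function name `read_entries` and the `<root>` wrapper around the sample are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Assumed sample data: the <root> wrapper is not part of the original file.
SAMPLE = r"""<root>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye w&amp;y </dirname>
<file><name>def.txt</name><time>200607130417</time></file>
</Directory>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\hii wx</dirname>
<file><name>abc.txt</name><time>200607130415</time></file>
</Directory>
</root>"""


def read_entries(source):
    """Return (dirname, filename, time) tuples from a file name or file object."""
    entries = []
    # Only 'end' events are requested, so each element is complete when seen.
    for event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == 'Directory':
            dirname = elem.findtext('dirname', default='').strip()
            for file_elem in elem.findall('file'):
                entries.append((dirname,
                                file_elem.findtext('name'),
                                file_elem.findtext('time')))
            elem.clear()  # free the subtree once it has been processed
    return entries


entries = read_entries(io.BytesIO(SAMPLE.encode('utf-8')))
for dirname, name, time in entries:
    print('%s\\%s,%s' % (dirname, name, time))
```

Because the element text is already assembled when the 'end' event arrives, no buffering of character chunks is needed.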
Stefan
Thanks Stefan, your approach worked.
A SAX parser may report a single text node through any number of calls to
the characters method, so you need to accumulate everything you receive
in characters and output the text only when you get a notification
other than characters.
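To see the chunking for yourself, something like the following sketch records every raw characters() call next to the reassembled text. It assumes Python 3; the `ChunkRecorder` class and its attribute names are invented for this example, and how many pieces the parser delivers depends on the parser, so no particular count is promised.

```python
import xml.sax
from xml.sax.handler import ContentHandler

SAMPLE = r"<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye w&amp;y</dirname>"


class ChunkRecorder(ContentHandler):
    """Keeps the raw characters() calls and the reassembled text nodes."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.calls = []   # one entry per characters() call
        self.buffer = []  # chunks of the text node currently being read
        self.texts = []   # one entry per complete text node

    def _flush(self):
        # Called on every notification that is not characters().
        if self.buffer:
            self.texts.append(''.join(self.buffer))
            self.buffer = []

    def characters(self, content):
        self.calls.append(content)
        self.buffer.append(content)

    def startElement(self, name, attrs):
        self._flush()

    def endElement(self, name):
        self._flush()


recorder = ChunkRecorder()
xml.sax.parseString(SAMPLE.encode('utf-8'), recorder)
print('characters() calls:', recorder.calls)
print('complete text nodes:', recorder.texts)
```

Whatever the number of calls turns out to be, joining them gives back the full directory name.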
Best Regards,
George
---------------------------------------------------------------------
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
Note that any good SAX tutorial will demonstrate how to buffer the
characters() events, if you don't feel like reinventing the solution
yourself.
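Applied to the complete handler posted earlier in the thread, the buffering idea might look roughly like this. It is a sketch under assumptions: Python 3, a made-up class name `FileListHandler`, a hypothetical `<root>` wrapper around the sample, and output written to stdout instead of old.txt.

```python
import sys
import xml.sax
from xml.sax.handler import ContentHandler

# Assumed sample data: the <root> wrapper is not part of the original file.
SAMPLE = r"""<root>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\hii wx</dirname>
<file><name>abc.txt</name><time>200607130415</time></file>
</Directory>
<Directory>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye w&amp;y </dirname>
<file><name>def.txt</name><time>200607130417</time></file>
</Directory>
</root>"""


class FileListHandler(ContentHandler):
    """Builds one 'dirname\\name,time' line per <file> element."""

    TEXT_ELEMENTS = ('dirname', 'name', 'time')

    def __init__(self):
        ContentHandler.__init__(self)
        self.buffer = None  # a list while inside a text element, else None
        self.values = {}    # latest dirname / name / time seen
        self.lines = []

    def startElement(self, name, attrs):
        if name in self.TEXT_ELEMENTS:
            self.buffer = []

    def characters(self, content):
        # Only collect; the text may arrive in several pieces.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name in self.TEXT_ELEMENTS:
            self.values[name] = ''.join(self.buffer).strip()
            self.buffer = None
        elif name == 'file':
            self.lines.append('%s\\%s,%s\n' % (self.values['dirname'],
                                               self.values['name'],
                                               self.values['time']))

    def endDocument(self):
        self.lines.sort()  # note the parentheses: sort must be called


handler = FileListHandler()
xml.sax.parseString(SAMPLE.encode('utf-8'), handler)
sys.stdout.writelines(handler.lines)
```

Two details from the posted handler are worth a second look: `self.i.sort` and `f.close` without parentheses only name the methods and never call them. When writing to a file, a `with open('old.txt', 'w') as f:` block takes care of closing.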
Kirt wrote:
I have walked a directory and written the following XML document.
One of the folders had an "&" character in its name, so I replaced it with "&amp;".
#------------------test1.xml
<Directory>
  <dirname>C:\Documents and Settings\Administrator\Desktop\1\bye w&amp;y </dirname>
[...]
#-----------------------------------------------------------------------------
I run the code as: $ python art.py test1.xml
I am getting this output:
C:\Documents and Settings\Administrator\Desktop\1\bye w
&
y
C:\Documents and Settings\Administrator\Desktop\1\hii wx
whereas I need output that looks like this:
C:\Documents and Settings\Administrator\Desktop\1\bye w&y
C:\Documents and Settings\Administrator\Desktop\1\hii wx
This SAX filter is another way of doing it: http://aspn.activestate.com/ASPN/Coo.../Recipe/265881
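For readers who cannot reach the recipe, the general idea of a text-merging SAX filter can be sketched as below. This is not the recipe's code, just an illustration built on `xml.sax.saxutils.XMLFilterBase`; the class names `TextMergeFilter` and `NaiveHandler`, the `<root>` wrapper, and the Python 3 syntax are all assumptions.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase

# Assumed sample data: the <root> wrapper is not part of the original file.
SAMPLE = r"""<root>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\bye w&amp;y</dirname>
<dirname>C:\Documents and Settings\Administrator\Desktop\1\hii wx</dirname>
</root>"""


class TextMergeFilter(XMLFilterBase):
    """Merges adjacent characters() events into one before forwarding."""

    def __init__(self, parent=None):
        XMLFilterBase.__init__(self, parent)
        self._pending = []

    def _flush(self):
        if self._pending:
            XMLFilterBase.characters(self, ''.join(self._pending))
            self._pending = []

    def characters(self, content):
        self._pending.append(content)  # hold back until the node is complete

    def startElement(self, name, attrs):
        self._flush()
        XMLFilterBase.startElement(self, name, attrs)

    def endElement(self, name):
        self._flush()
        XMLFilterBase.endElement(self, name)


class NaiveHandler(ContentHandler):
    """Uses each characters() call as-is, like the handler in the question."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.in_dirname = False
        self.seen = []

    def startElement(self, name, attrs):
        self.in_dirname = (name == 'dirname')

    def characters(self, content):
        if self.in_dirname:
            self.seen.append(content)

    def endElement(self, name):
        self.in_dirname = False


downstream = NaiveHandler()
merger = TextMergeFilter(make_parser())
merger.setContentHandler(downstream)
merger.parse(io.BytesIO(SAMPLE.encode('utf-8')))
for text in downstream.seen:
    print(text)
```

The appeal of a filter is that the downstream handler stays unchanged: it simply receives one characters() call per text node. A fuller version would also flush before other events such as processing instructions.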
Stefan