encoding during elementtree serialization

ElementTree's XML serialization routine, tree._write(file, node,
encoding, namespaces), looks like this (elided):

def _write(self, file, node, encoding, namespaces):
    # write XML to file
    tag = node.tag
    if tag is Comment:
        file.write("<!-- %s -->" % _escape_cdata(node.text, encoding))
    elif tag is ProcessingInstruction:
        file.write("<?%s?>" % _escape_cdata(node.text, encoding))
    else:
        ....
        file.write("<" + _encode(tag, encoding))
        if items or xmlns_items:
            items.sort() # lexical order

Note that "_escape_cdata" (which also performs encoding) and "_encode"
are called for pcdata (and attribute values) only, but not for the tag
literals like "<" and "<?%s?>".

Some profiling I've done suggests that encoding during recursion makes
serialization slightly slower than it would be if we could avoid
encoding pcdata and attribute values during recursion.
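
A rough way to check this kind of claim on a current Python 3 (not the Python 2 of the code above) is to time the two strategies on the same stream of fragments. This is only a sketch: sample_pieces, write_encoding_each, write_encoding_once and time_both are hypothetical helpers, not part of ElementTree, and the fragment stream is a stand-in for what a recursive writer would emit.

```python
import io
import timeit


def sample_pieces(n):
    """Build a hypothetical stream of small fragments, like a recursive writer emits."""
    pieces = []
    for i in range(n):
        pieces += ["<", "item", ' id="', str(i), '"', ">", "caf\u00e9", "</", "item", ">"]
    return pieces


def write_encoding_each(pieces, encoding):
    """Encode every fragment before writing it (encoding during recursion)."""
    out = io.BytesIO()
    for piece in pieces:
        out.write(piece.encode(encoding))
    return out.getvalue()


def write_encoding_once(pieces, encoding):
    """Write unencoded text, then encode the whole buffer once at the end."""
    out = io.StringIO()
    for piece in pieces:
        out.write(piece)
    return out.getvalue().encode(encoding)


def time_both(n=1000, repeat=20, encoding="utf-8"):
    """Return (seconds_each, seconds_once) for the two strategies."""
    pieces = sample_pieces(n)
    each = timeit.timeit(lambda: write_encoding_each(pieces, encoding), number=repeat)
    once = timeit.timeit(lambda: write_encoding_once(pieces, encoding), number=repeat)
    return each, once
```

Calling print(time_both()) shows the two timings side by side. The numbers depend on the machine, the Python version and the shape of the document, so they say nothing definite about the original library.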

Instead, we might be able to get away with encoding everything just once
at the end. But I don't know if this is kosher. Is there any reason to
not also encode tag literals and quotation marks that are attribute
containers, just once, at the end of serialization?
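
One possible reason to be careful: if I remember the 1.2 source correctly, the stock _escape_cdata falls back to numeric character references when a character cannot be represented in the target encoding. Encoding once at the end with the default strict error handling would lose that fallback. The sketch below (Python 3, with a hypothetical escape_text helper standing in for the no-encoding _escape_cdata) shows the failure, and shows that the built-in "xmlcharrefreplace" error handler restores the fallback but also rewrites characters in places such as tag names, where a character reference is not valid.

```python
def escape_text(text):
    """Escape markup characters only; no encoding (hypothetical helper)."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


# A finished document whose text contains a character outside us-ascii.
doc = "<p>" + escape_text("price: 5 \u20ac & up") + "</p>"

# Strict encoding of the whole document fails on the euro sign.
try:
    doc.encode("us-ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Encoding once with a character-reference fallback still works for text.
encoded = doc.encode("us-ascii", "xmlcharrefreplace")

# The same fallback applied to a non-ASCII tag name yields bytes that are
# not well-formed XML, because references are not allowed inside names.
bad_tag = "<caf\u00e9/>".encode("us-ascii", "xmlcharrefreplace")
```

So a single final encode seems safe only if unencodable characters can appear in text and attribute values alone, or if the writer checks names, comments and processing instructions separately.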

Even if that's not acceptable in general because tag literals cannot be
encoded, would it be acceptable for "ascii-compatible" encodings like
utf-8, latin-1, and friends?
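
To make "ascii-compatible" concrete: for such encodings the markup literals produce the same bytes whether or not they go through the encoder, so leaving them unencoded during recursion and encoding everything at the end give the same result. For UTF-16 the two approaches differ, partly because Python's "utf-16" codec writes a byte order mark on every encode call. A small Python 3 sketch, where MARKUP, same_as_ascii and the fragment list are illustrative assumptions:

```python
import codecs

MARKUP = '<a b="c">&amp;</a>'


def same_as_ascii(encoding):
    """True if the markup literals encode to the same bytes as in ASCII."""
    return MARKUP.encode(encoding) == MARKUP.encode("ascii")


ascii_compatible = {
    name: same_as_ascii(name)
    for name in ("utf-8", "latin-1", "utf-16", "utf-16-le")
}

# Encoding fragment by fragment versus encoding the joined text once.
pieces = ["<", "a", ">", "x", "</", "a", ">"]
piecewise = b"".join(p.encode("utf-16") for p in pieces)
at_once = "".join(pieces).encode("utf-16")

# Every separate encode call contributes its own byte order mark.
bom_count_piecewise = piecewise.count(codecs.BOM_UTF16)
bom_count_at_once = at_once.count(codecs.BOM_UTF16)
```

For utf-8 and latin-1 the literals come out byte-identical to ASCII. For UTF-16 they do not, and only the encode-once version has a single byte order mark at the start.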

Something like:

def _escape_cdata(text, encoding=None, replace=string.replace):
    # doesn't do any encoding
    text = replace(text, "&", "&amp;")
    text = replace(text, "<", "&lt;")
    text = replace(text, ">", "&gt;")
    return text

class _ElementInterface:

    ...

    def write(self, file, encoding="us-ascii"):
        assert self._root is not None
        if not hasattr(file, "write"):
            file = open(file, "wb")
        if not encoding:
            encoding = "us-ascii"
        elif encoding != "utf-8" and encoding != "us-ascii":
            file.write("<?xml version='1.0' encoding='%s'?>\n" % encoding)
        tmp = StringIO()
        self._write(tmp, self._root, encoding, {})
        file.write(tmp.getvalue().encode(encoding))

    def _write(self, file, node, encoding, namespaces):
        # write XML to file
        tag = node.tag
        if tag is Comment:
            file.write("<!-- %s -->" % _escape_cdata(node.text, encoding))
        elif tag is ProcessingInstruction:
            file.write("<?%s?>" % _escape_cdata(node.text, encoding))
        else:
            items = node.items()
            xmlns_items = [] # new namespaces in this scope
            try:
                if isinstance(tag, QName) or tag[:1] == "{":
                    tag, xmlns = fixtag(tag, namespaces)
                    if xmlns: xmlns_items.append(xmlns)
            except TypeError:
                _raise_serialization_error(tag)
            file.write("<" + tag)

I smell the mention of a Byte Order Mark coming on. ;-)
Feb 8 '06 #1