
MS XML Parser error in CData section

!NoItAll's Avatar
Member
 
Join Date: May 2006
Location: Madison, Wi
Posts: 76
#1: Mar 8 '09
The MSXML parser is choking on a single character that often appears in my data within a CDATA section.

Try this:

    Dim bRet As Boolean
    Dim lRet As Long
    Dim xmlDoc As MSXML2.DOMDocument
    Set xmlDoc = New MSXML2.DOMDocument

    ' Load synchronously so parseError is populated when Load returns
    xmlDoc.async = False

    ' Load returns False when the file fails to parse
    bRet = xmlDoc.Load("d:\test.xml")

    ' filepos returns the position of the A9 byte (copyright symbol)
    lRet = xmlDoc.parseError.filepos

The file I'm loading looks like this:

<NRCS_2NEWARC RECORDNUMBER= "1844">
<TreeStructure Data="19930121"/>
<![CDATA[NEWSS © 1993 All Rights Reserved ]]>
</NRCS_2NEWARC>

I saved this as d:\test.xml. Note the A9 byte (the copyright symbol) inside the CDATA section: MSXML chokes on it every time. If I remove the copyright symbol, everything works as expected. Why? I thought a CDATA section was supposed to be passed through intact! The only thing you're not supposed to put in a CDATA section is ]]>, which terminates it.
This is frustrating!
I've tried it with MSXML 4, 5, and 6.
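
When Load fails like this, the parseError object carries more than filepos. The following is a minimal sketch of dumping its fields to see what the parser actually objects to. It assumes a project reference to a Microsoft XML library, as the code above does, and the Sub name ShowParseError is only illustrative.

```vb
' Sketch: print the details MSXML records about a failed Load.
' ShowParseError is a hypothetical helper name, not part of MSXML.
Public Sub ShowParseError()
    Dim xmlDoc As MSXML2.DOMDocument
    Set xmlDoc = New MSXML2.DOMDocument

    ' Block until parsing finishes so parseError is populated on return
    xmlDoc.async = False

    If Not xmlDoc.Load("d:\test.xml") Then
        With xmlDoc.parseError
            Debug.Print "Error code: 0x" & Hex$(.errorCode)
            Debug.Print "Reason:     " & .reason
            Debug.Print "Line/col:   " & .Line & "/" & .linepos
            Debug.Print "File pos:   " & .filepos
            Debug.Print "Source:     " & .srcText
        End With
    Else
        Debug.Print "Document loaded without errors."
    End If
End Sub
```

The reason text is usually the quickest way to tell an encoding problem from a markup problem.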
!NoItAll's Avatar
#2: Mar 9 '09

re: MS XML Parser error in CData section


OK, I see the problem. That byte, standing alone, is not valid UTF-8, which is the encoding the XML parser assumes when the file declares no other. The correct thing to do is to convert the data to UTF-8 first, but that seems stupid to me. Again, I thought CDATA was supposed to go completely uninterpreted so you could put any old garbage in there. Apparently not: it has to be proper garbage....
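
A CDATA section only switches off markup recognition. The bytes of the file are still decoded into characters before the parser looks at them, so they have to be valid in the document's encoding. One way to do the conversion described above is sketched here, under two assumptions: that the file is really Windows-1252 (where the single byte A9 is the copyright symbol), and that ADO is installed so ADODB.Stream is available. The function name LoadAnsiXml is hypothetical.

```vb
' Sketch: decode a single-byte (ANSI) file into a VB string, then hand
' the string to loadXML so the parser never sees the raw A9 byte.
' LoadAnsiXml is a hypothetical helper; "windows-1252" is an assumption
' about the source data and should be changed if the files differ.
Public Function LoadAnsiXml(ByVal sPath As String, _
                            ByVal xmlDoc As MSXML2.DOMDocument) As Boolean
    Const adTypeText As Long = 2      ' ADODB StreamTypeEnum value

    Dim stm As Object                 ' late-bound ADODB.Stream
    Dim sXml As String

    Set stm = CreateObject("ADODB.Stream")
    stm.Type = adTypeText
    stm.Charset = "windows-1252"
    stm.Open
    stm.LoadFromFile sPath
    sXml = stm.ReadText               ' reads the whole stream by default
    stm.Close

    ' loadXML parses a string that is already Unicode
    LoadAnsiXml = xmlDoc.loadXML(sXml)
End Function
```

It might be called as bRet = LoadAnsiXml("d:\test.xml", xmlDoc). If the files can be changed at the source, an alternative that needs no code is to start each file with an XML declaration naming its real encoding, so the parser does not fall back to UTF-8.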