MSXML: interpretation of encoded characters

I have a big XML doc full of Greek &encoded; characters, which I am
attempting to split up with the MSXML DOM object and store in chunks in a
database. (Working with ASP.)

The problem I'm currently having is that as soon as I load the doc into
xmldom ("MSXML2.DOMDocument.3.0") the encoded characters get
interpreted. When they get inserted into the database they have lost
their original format.

e.g. "&#x2642;" gets inserted as "♂" when I really need it to be
preserved as "&#x2642;".

How can I stop the MSXML dom object from interpreting the characters?

Thanks

Drew

Jul 20 '05 #1
"DrewM" <bo***@doesntexist.com> wrote in message
news:3f*********************@mistral.news.newnet.co.uk
e.g. "&#x2642;" gets inserted as "♂" when I really need it to be
preserved as "&#x2642;".

How can I stop the MSXML dom object from interpreting the characters?


You cannot. An XML file is the serialized form of a DOM tree, written in
a particular character encoding. Character references get replaced with
the characters they stand for as the document gets loaded into memory.

What you have is not the string '&#x2642;', but a reference saying "I am
Unicode character 9794", and the in-memory representation will reflect
exactly this fact.

The string '&#x2642;' would be represented as "&amp;#x2642;" in the XML
file, but then it is plain text and no longer the character itself.
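The difference between the two forms can be checked outside MSXML. The following is a minimal sketch in Python, using the standard library's xml.etree.ElementTree as a stand-in parser (an assumption for illustration; the thread itself uses MSXML from ASP). Any conforming XML parser should resolve the reference in the same way.

```python
import xml.etree.ElementTree as ET

# A numeric character reference is resolved by the parser: the element
# text is the single character U+2642, not the string "&#x2642;".
parsed = ET.fromstring("<t>&#x2642;</t>")
char_text = parsed.text

# Escaping the ampersand keeps the literal characters "&#x2642;" as text.
escaped = ET.fromstring("<t>&amp;#x2642;</t>")
literal_text = escaped.text

print(len(char_text), ord(char_text))  # one character, code point 9794
print(len(literal_text), literal_text)  # eight characters of plain text
```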

What may help you is using the Unicode datatypes for your columns - nchar,
nvarchar or, FWIW, ntext (you do not use them, so your character is
displayed as the characters 'â™‚', which are the three bytes of its UTF-8
encoding shown one character per byte).
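The odd-looking characters can be reproduced with a short Python sketch. It assumes the text was written as UTF-8 and then read back as Windows-1252, which is a common cause of this pattern but not something the thread confirms. Note that U+2642 takes two bytes in UTF-16 and three bytes in UTF-8; the three-character garble corresponds to the UTF-8 form.

```python
male_sign = "\u2642"  # the character that &#x2642; refers to

# UTF-8 encodes this character as three bytes.
utf8_bytes = male_sign.encode("utf-8")

# Reading those bytes as Windows-1252 yields one character per byte.
misread = utf8_bytes.decode("cp1252")

# Reversing the wrong decoding step recovers the original character.
repaired = misread.encode("cp1252").decode("utf-8")

print(utf8_bytes, misread, repaired)
```

If the stored value can be repaired this way, the data itself is intact and only one decoding step along the path is using the wrong code page.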

If you then put data from your database back to XML, the correct entity
will be used again.
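If the stored text has to come back out with explicit references such as "&#x2642;", one option is to convert non-ASCII characters when writing the output. This is a minimal Python sketch of that idea; the helper name to_hex_char_refs is hypothetical, and the function handles only non-ASCII characters (it does not escape '&' or '<').

```python
def to_hex_char_refs(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal character reference.

    Hypothetical helper for illustration. Markup characters such as '&'
    and '<' are left untouched and would need separate XML escaping.
    """
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


stored = "Symbol: \u2642"            # what a Unicode column would hold
restored = to_hex_char_refs(stored)  # what gets written back into the XML
print(restored)
```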

Martin
Jul 20 '05 #2
Martin Boehm wrote:
What may help you is using the Unicode datatypes for your columns - nchar,
nvarchar or, FWIW, ntext (you do not use them, so your character is
displayed as the characters 'â™‚', which are the three bytes of its UTF-8
encoding shown one character per byte).

If you then put data from your database back to XML, the correct entity
will be used again.


Thanks for your reply, Martin.

The column I'm inserting into is ntext. The characters get inserted like
'â™,' all the same. Is that what I'd expect?

When I pull the text back out of the database they stay as weird
characters instead of going back to the correct entities - but that may
be my error.

I guess my core question is should the characters look like 'â™,' in my
ntext database column?

Thanks

Drew

Jul 20 '05 #3
"DrewM" <bo***@doesntexist.com> wrote in message
news:3f*********************@mistral.news.newnet.co.uk
What may help you is using the unicode datatypes for your rows -
nchar, nvarchar or, FWIW, ntext [...]


[...]

The column I'm inserting into is ntext. The characters get inserted
like 'â™,' all the same. Is that what I'd expect?


Since I am not sure what exactly you are doing, could you post some small
code snippets showing your XML and ASP? Which version of SQL Server do
you use?
Q239530 might help you, but I guess you know that already.

Martin

P.S.: I am not online again until next Monday, so do not wait. ;-)
Jul 20 '05 #4
Hi.
I have some similar problems and was wondering who here could help...

I have some large Greek and Russian encoded XML files, and when I try to
display them in HTML, the encoding seems to break down partway through at
certain spots.
Here is an example of the Greek XML...

<option number="1">&#x39D;&#x3B1;
&#x3C0;&#x3C1;&#x3BF;&#x3BB;&#x3AC;&#x3B2;&#x3B5;&#x3B9;
&#x3BC;&#x3AE;&#x3C0;&#x3C9;&#x3C2; &#x3C4;&#x3B1;
&#x3BC;&#x3B7;&#x3C7;&#x3B1;&#x3BD;&#x3AE;&#x3BC;&#x3B1;&#x3C4;&#x3B1;
&#x3C3;&#x3B2;&#x3AE;&#x3C3;&#x3BF;&#x3C5;&#x3BD;
&#x3BB;&#x3CC;&#x3B3;&#x3C9;
&#x3C5;&#x3C8;&#x3B7;&#x3BB;&#x3CE;&#x3BD;
&#x3C0;&#x3B9;&#x3AD;&#x3C3;&#x3B5;&#x3C9;&#x3C2; &#x3C3;&#x3B5;
&#x3C0;&#x3B5;&#x3C1;&#x3B9;&#x3C0;&#x3C4;&#x3CE;&#x3C3;&#x3B5;&#x3B9;&#x3C2;
&#x3C0;&#x3BF;&#x3C5; &#x3B7;
&#x3B8;&#x3B5;&#x3C1;&#x3BC;&#x3BF;&#x3BA;&#x3C1;&#x3B1;&#x3C3;&#x3AF;&#x3B1;
&#x3C0;&#x3B5;&#x3C1;&#x3B9;&#x3B2;&#x3AC;&#x3BB;&#x3BB;&#x3BF;&#x3BD;&#x3C4;&#x3BF;&#x3C2;
&#x3B5;&#x3AF;&#x3BD;&#x3B1;&#x3B9;
&#x3C5;&#x3C8;&#x3B7;&#x3BB;&#x3AE; &#x3BA;&#x3B1;&#x3B9;
&#x3C4;&#x3B1; &#x3BC;&#x3B7;&#x3C7;&#x3B1;&#x3BD;&#x3AE;&#x3BC;&#x3B1;&#x3C4;&#x3B1;
&#x380;˜{%!</option><option number="2">Î?α ανακαλÏ?Ï?ει
μηÏ?ανήμαÏ?α Ï?οÏ? Ï?Ï?Ï?Ï?ν Î*Ï?οÏ?ν κάÏ?οια
διαÏ?Ï?οή αÏ?Ï? Ï?ην Ï?ελεÏ?Ï?αία Ï?οÏ?ά Ï?οÏ?
Î*γινε Î*λεγÏ?οÏ? και Ï?α οÏ?οία μÏ?οÏ?εί να
Ï?Ï?ειάζεÏ?αι να εÏ?ιÏ?Î</option>

I don't want to display the proper characters here, I just want the XML
to be formed properly with encoded characters so that it can be parsed
later on...
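One way to get XML that contains only ASCII plus numeric character references is to let the serializer do the conversion for the whole document. The sketch below uses Python's xml.etree.ElementTree rather than MSXML (an assumption for illustration), and the two-letter sample text is made up. Serializing with an ASCII output encoding makes ElementTree write every non-ASCII character as a decimal reference, and the result parses back to the same text.

```python
import xml.etree.ElementTree as ET

# Sample element holding two Greek letters (U+039D, U+03B1) as real characters.
source = '<option number="1">\u039d\u03b1</option>'
element = ET.fromstring(source)

# With an ASCII output encoding, characters that cannot be represented
# are written as numeric character references.
ascii_xml = ET.tostring(element, encoding="us-ascii")

# The ASCII-only output is still well-formed and round-trips to the same text.
round_trip = ET.fromstring(ascii_xml)

print(ascii_xml)
print(round_trip.text == element.text)
```

Because the output is pure ASCII, it is not affected by a later step that guesses the wrong code page, which is the kind of damage visible in the second option of the sample above.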
Jul 20 '05 #5
