
MSXML and UTF-8 chinese characters

K
I have an XML file encoded in UTF-8.
It contains some Chinese characters (both Simplified Chinese and
Traditional Chinese).

When loading the XML file with the MSXML parser, I used the code below to retrieve
the data in a node. The CString was then displayed in a CListCtrl. The
Traditional Chinese characters were shown correctly, but for the
Simplified characters I encountered many "?", although some characters were
correct.

if (MSXML::NODE_ELEMENT == pChild->nodeType)
{
    // read the "id" attribute of the element
    MSXML::IXMLDOMNamedNodeMapPtr pAttrs = pChild->attributes;
    MSXML::IXMLDOMNodePtr pAttr = pAttrs->getNamedItem(L"id");
    CString id = OLE2T(pAttr->text);

    // read the text of the element's first child
    MSXML::IXMLDOMNodePtr pWording = pChild->firstChild;
    CString wording = OLE2T(pWording->text);

    // add the wording to language
    pMessageLanguage->m_wordingList.insert(MessageWordingListPair(id, wording));
}
Nov 16 '05 #1
4 Replies


You should compile with UNICODE and _UNICODE defined!
Otherwise you have to convert the Unicode text to MBCS yourself...
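
A rough sketch of what the UNICODE / _UNICODE switch does: the character type behind TCHAR-based types such as CString is picked at compile time. The names tchar_sketch, tstring_sketch and is_wide_build below are made up for illustration and are not part of the Windows or MFC headers.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for the TCHAR typedef: wide characters when the
// build defines UNICODE/_UNICODE, narrow (code page dependent) otherwise.
#if defined(UNICODE) || defined(_UNICODE)
using tchar_sketch = wchar_t;
#else
using tchar_sketch = char;
#endif

// Hypothetical stand-in for a TCHAR-based string type such as CString.
using tstring_sketch = std::basic_string<tchar_sketch>;

// True when text is stored as wide characters, so a wide string coming
// from the parser can be kept without any code page conversion.
constexpr bool is_wide_build()
{
    return std::is_same<tchar_sketch, wchar_t>::value;
}
```

In a narrow build, every wide string handed to such a type has to pass through a code page conversion first, and that is the step where characters can get lost.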

--
Greetings
Jochen

Do you need a memory-leak finder ?
http://www.codeproject.com/tools/leakfinder.asp
Nov 16 '05 #2

K,

Does your XML file begin with the following line?

<?xml version="1.0" encoding="UTF-8" ?>

If not, add this line and see what happens. If you do have this line (or
you add it) and still have problems, then you may be using characters that
Windows cannot support or your fonts cannot display (i.e. traditional
Chinese).

Windows supports Unicode up to version 2.1 only. The XML parser converts
your XML source to UTF-16 and parses it internally. When the XML parser sees
the line above, it will convert your XML file from UTF-8 with no loss of
information. However, without this line (specifically without the encoding
clue) the system default ANSI code page will be used when converting to
UTF-16.

Even with this line, you may still have characters that your fonts can't
display, but no loss will occur in the conversion to/from UTF-8.
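
To make the "no loss" point concrete, here is a minimal, portable sketch of a UTF-8 to UTF-16 decoder. It is not MSXML's code; the function name utf8_to_utf16 is an assumption chosen for this example. Every valid UTF-8 sequence maps to exactly one code point, which in turn maps to one or two UTF-16 code units, so nothing has to be replaced by "?".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decode a UTF-8 byte string into UTF-16 code units.
// Throws std::invalid_argument on malformed input (bad lead byte,
// truncated sequence, bad continuation byte, overlong form,
// surrogate code point, or value above U+10FFFF).
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;      // code point being assembled
        std::size_t extra = 0;     // number of continuation bytes
        std::uint32_t min_cp = 0;  // smallest value allowed for this length

        if (b0 < 0x80)                { cp = b0;        extra = 0; min_cp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; min_cp = 0x10000; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i - 1 < extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (cp < 0x10000) {
            // Basic Multilingual Plane: one UTF-16 code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary plane: split into a surrogate pair.
            cp -= 0x10000;
            assert(cp <= 0xFFFFF);
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, the bytes E7 AE 80 decode to the single code unit U+7B80, a Simplified Chinese character. If "?" shows up later, the loss most likely happened in a different conversion step, not in this one.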

Hope this helps (and I hope I know what I'm talking about :-)

-MerkX

Nov 16 '05 #3

K
My project is compiled as a UNICODE build, and my XML begins with the
<?xml ... ?> line, but the problem still persists.

After reading the node with MSXML, can I use the OLE2T macro and then assign the
result to a CString?

What does CString store internally? I'm using VS.NET to compile my
projects.

I can view and edit the XML file in Dreamweaver, so the fonts must be
supported by my system. However, after loading the XML file with MSXML,
getting the node, assigning its text to a CString and displaying it, the problem
happens: some Simplified Chinese characters become "?", but some are okay.

Nov 16 '05 #4

> After reading in the node in MSXML, can I use the macro OLE2T then
> assign it to a CString?
>
> What does CString store internally? I'm using VS.NET to compile my
> projects.

CString stores ANSI in an ANSI application and Unicode in a UNICODE app.
If your app is Unicode, there is no need to use

But question marks are usually the result of bad code page conversions.
Are you sure there are no conversions happening
(maybe in m_wordingList.insert, or in MessageWordingListPair)?
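
A small portable sketch of how a code page conversion produces question marks. Latin-1 (ISO-8859-1) is used here only as a stand-in target, because its mapping is trivial; a real ANSI build would narrow to the system code page instead, and on Windows that step is normally done by WideCharToMultiByte. The function name narrow_to_latin1 is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Narrow UTF-16 text to a single-byte code page (Latin-1 as a stand-in).
// Code points U+0000..U+00FF map to the byte with the same value; every
// other character has no representation and is replaced by '?'.
// If 'replaced' is not null, it receives the number of lost characters.
std::string narrow_to_latin1(const std::u16string& in,
                             std::size_t* replaced = nullptr)
{
    std::string out;
    std::size_t lost = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t u = in[i];
        if (u <= 0xFF) {
            out.push_back(static_cast<char>(static_cast<unsigned char>(u)));
            continue;
        }
        // A surrogate pair is one character, so it yields a single '?'.
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            ++i;
        }
        out.push_back('?');
        ++lost;
    }
    assert(out.size() <= in.size());
    if (replaced) *replaced = lost;
    return out;
}
```

Which characters survive depends entirely on the target code page. One possible explanation for the symptom in this thread, offered only as a guess, is a conversion through a Traditional Chinese code page somewhere along the way: characters that exist in that code page would come through, and Simplified-only characters would turn into "?".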

Mihai
Nov 16 '05 #5
