UTF8 Encoding - .NET Framework

Matt

I have a problem where I am working with extended character sets in XML but
I have also found that any time I work with a translation or internally
generated Xml document I get the dreaded message, "Data at the root level is
invalid. Line 1 position 1".

If you run the following code, there will be extra bytes at the beginning of
the resulting string. I believe this is some type of BigEndian encoding or
something. My question is this: how do I do this, load the result into
the DOM object, and keep all character encoding intact?

Thanks,

Matt

using System;
using System.Xml;
using System.IO;
using System.Text;
class Program
{
    static void Main(string[] args)
    {
        MemoryStream ms = new MemoryStream();
        // Create Xml
        XmlTextWriter writer = new XmlTextWriter(ms, System.Text.Encoding.UTF8);
        writer.WriteStartDocument(true);
        writer.WriteStartElement("data", "www.contoso.com");
        writer.WriteEndElement();
        writer.WriteEndDocument();
        // Flush Document
        writer.Close();
        // Get resulting document
        string text = Encoding.UTF8.GetString(ms.GetBuffer());
        Console.WriteLine(text);
        // Load resulting Xml Document into DOM
        XmlDocument xml = new XmlDocument();
        try
        {
            xml.LoadXml(text);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}

Aug 15 '06 #1

Martin Honnen

Matt wrote:

MemoryStream ms = new MemoryStream();
// Create Xml
XmlTextWriter writer = new XmlTextWriter(ms, System.Text.Encoding.UTF8);
writer.WriteStartDocument(true);
writer.WriteStartElement("data", "www.contoso.com");
writer.WriteEndElement();
writer.WriteEndDocument();
// Flush Document
writer.Close();
// Get resulting document
string text = Encoding.UTF8.GetString(ms.GetBuffer());

Why are you using a stream to write the XML to but then want a string
with XML? If you want a string then write to a StringWriter not a stream.
If you use a stream then simply load the XML from that stream e.g.

MemoryStream ms = new MemoryStream();
// Create Xml
XmlTextWriter writer = new XmlTextWriter(ms, System.Text.Encoding.UTF8);
writer.WriteStartDocument(true);
writer.WriteStartElement("data", "www.contoso.com");
writer.WriteEndElement();
writer.WriteEndDocument();
// Flush Document
writer.Flush();

ms.Position = 0;

XmlDocument xmlDocument = new XmlDocument();
try {
    xmlDocument.Load(ms);
    Console.WriteLine("Got\r\n{0}", xmlDocument.OuterXml);
}
catch (Exception e) {
    Console.WriteLine(e);
}
finally {
    writer.Close();
}
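
For the other route mentioned above (write to a StringWriter when a string is what you want), a minimal sketch might look like the following. The class and method names (StringWriterSketch, BuildXml, Load) are made up for illustration. Because nothing is ever encoded to bytes, no byte order mark is involved; the XML declaration written this way should state encoding="utf-16", which LoadXml accepts when given a string.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, for illustration only.
static class StringWriterSketch
{
    // Writes the same <data> element as above, but to a StringWriter,
    // so the result is a .NET string and never passes through bytes.
    public static string BuildXml()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);
        writer.WriteStartDocument(true);
        writer.WriteStartElement("data", "www.contoso.com");
        writer.WriteEndElement();
        writer.WriteEndDocument();
        writer.Close();
        return sw.ToString();
    }

    // Loads the generated markup into the DOM with LoadXml.
    public static XmlDocument Load()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(BuildXml());
        return doc;
    }

    static void Main()
    {
        Console.WriteLine(BuildXml());
        Console.WriteLine(Load().DocumentElement.LocalName);
    }
}
```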

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/

Aug 16 '06 #2

Matt

Martin,

Simple, because a stream does not change extended character sets where strings do. This was a quick way to show the issue I am having with working with foreign characters in a document. If you would like, I can give a style sheet and a document that has German characters and you can translate and see the issues for yourself when using a string.

Matt

Aug 16 '06 #3

Martin Honnen

Matt wrote:

Simple, because a stream does not change extended character sets where
strings do. This was a quick way to show the issue I am having with
working with foreign characters in a document. If you would like, I can
give a style sheet and a document that has German characters and you can
translate and see the issues for yourself when using a string.

I don't understand the issue; you need to explain what problem you have.

A .NET string is a sequence of Unicode characters so it does not have
any encoding or character set. Unicode has all characters of lots of
scripts/languages around the world so in a .NET string you do not have
any problem using German umlauts or French accented characters or
Greek letters or Hebrew letters.

And as shown, you can use a stream if you want, but then I would suggest
loading it back from the stream with a method that takes a stream.
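
As for the extra bytes in the original code, the likely cause is not big-endian data. Encoding.UTF8 emits a three-byte byte order mark (EF BB BF), and Encoding.UTF8.GetString keeps it as a leading U+FEFF character, which LoadXml then typically rejects at line 1, position 1. In addition, GetBuffer() returns the whole internal buffer, including unused bytes past the stream length, where ToArray() returns only what was written. The sketch below is one way to see this, assuming the same writer setup as above; BomSketch, WriteXml and ReadText are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class, for illustration only.
static class BomSketch
{
    // Writes the <data> document to a MemoryStream as UTF-8, as in the original post.
    public static MemoryStream WriteXml()
    {
        MemoryStream ms = new MemoryStream();
        XmlTextWriter writer = new XmlTextWriter(ms, Encoding.UTF8);
        writer.WriteStartDocument(true);
        writer.WriteStartElement("data", "www.contoso.com");
        writer.WriteEndElement();
        writer.WriteEndDocument();
        writer.Flush();
        return ms;
    }

    // Decodes the stream through a StreamReader, which skips the byte order mark.
    public static string ReadText(MemoryStream ms)
    {
        ms.Position = 0;
        StreamReader reader = new StreamReader(ms, Encoding.UTF8);
        return reader.ReadToEnd();
    }

    static void Main()
    {
        MemoryStream ms = WriteXml();

        // Decoding the raw bytes keeps the byte order mark as the first character.
        string raw = Encoding.UTF8.GetString(ms.ToArray());
        Console.WriteLine("First char is U+FEFF: {0}", raw[0] == '\uFEFF');

        // GetBuffer() may be longer than the data actually written.
        Console.WriteLine("Buffer {0} bytes, data {1} bytes",
            ms.GetBuffer().Length, ms.Length);

        // The text read via StreamReader starts at the XML declaration and loads.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(ReadText(ms));
        Console.WriteLine(doc.OuterXml);
    }
}
```

If a string really is required, reading it back through a StreamReader as in ReadText avoids both the leading U+FEFF and the padding from GetBuffer().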

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/

Aug 16 '06 #4
