
Encoding XML troubles

Hi,

I am trying to write a generic RSS/Atom/OPML feed client. The problem
is that those XML feeds may use different encodings:

- <?xml version="1.0" encoding="ISO-8859-1" ?>...
- <?xml version="1.0" encoding="utf-8" ?>...
- ...

I am using WebRequest to fetch the feeds. Simplified, my code looks
like this:

WebRequest req = WebRequest.Create(url);
StreamReader reader = new StreamReader(..., Encoding.Default);
string result = reader.ReadToEnd();

As you can see on the second line, I have to specify the encoding of
the expected stream up front (otherwise StreamReader defaults to UTF-8).
However, as I do not know the encoding while fetching the XML stream,
I use Encoding.Default.

And this is where my problem lies: I would like to read the resulting
XML string, get the declared encoding, and re-encode the string with
the correct encoding. Otherwise, special characters are unreadable or
missing in the result string.

I have tried the following workarounds without success:
- Converting the result XML string directly from Encoding.Default to
the encoding declared in the XML:
result = this.convertString(result, Encoding.Default,
Encoding.GetEncoding(myEncodingStringFromXMLFile));

The convertString function uses code similar to the convert example on
MSDN: http://msdn.microsoft.com/library/de...classtopic.asp
--> did not work: the characters remained as they were before

- Creating a second StreamReader instance with the right encoding:
StreamReader reader2 = new StreamReader(...,
Encoding.GetEncoding(myEncodingStringFromXMLFile));
string result = reader2.ReadToEnd();
--> did not work: it seems that the response stream from the
WebRequest class can only be read once! I get an error when
trying to modify the Position property of the stream (another guy had
exactly the same problem:
http://groups.google.ch/groups?hl=de...f67a0c2&rnum=1)

Is there another solution than fetching the URL twice? Am I missing
some basic functionality? Thanks for your help...

Greets,

Phil
Nov 17 '05 #1


fitsch wrote:
> Hi,
>
> I am trying to write a generic RSS/Atom/OPML feed client. The problem
> is that those XML feeds may use different encodings:
>
> - <?xml version="1.0" encoding="ISO-8859-1" ?>...
> - <?xml version="1.0" encoding="utf-8" ?>...
> - ...
>
> I am using WebRequest to fetch the feeds. Simplified, my code looks
> like this:
>
> WebRequest req = WebRequest.Create(url);
> StreamReader reader = new StreamReader(..., Encoding.Default);
> string result = reader.ReadToEnd();
>
> As you can see on the second line, I have to specify the encoding of
> the expected stream up front (otherwise StreamReader defaults to UTF-8).
> However, as I do not know the encoding while fetching the XML stream,
> I use Encoding.Default.

Note that Encoding.Default is your OS default character set, not a
magic catch-all encoding. This step will already render a lot of XML
input useless.

> And this is where my problem lies: I would like to read the resulting
> XML string, get the declared encoding, and re-encode the string with
> the correct encoding. Otherwise, special characters are unreadable or
> missing in the result string.

Once it's a string, it's a string. You must re-*de*code bytes.

> I have tried the following workarounds without success:
> - Converting the result XML string directly from Encoding.Default to
> the encoding declared in the XML:
> result = this.convertString(result, Encoding.Default,
> Encoding.GetEncoding(myEncodingStringFromXMLFile));
>
> The convertString function uses code similar to the convert example on
> MSDN:
> http://msdn.microsoft.com/library/de...ry/en-us/cpref/html/frlrfsystemtextencodingclasstopic.asp
> --> did not work: the characters remained as they were before
>
> - Creating a second StreamReader instance with the right encoding:
> StreamReader reader2 = new StreamReader(...,
> Encoding.GetEncoding(myEncodingStringFromXMLFile));
> string result = reader2.ReadToEnd();
> --> did not work: it seems that the response stream from the
> WebRequest class can only be read once! I get an error when
> trying to modify the Position property of the stream (another guy had
> exactly the same problem:
> http://groups.google.ch/groups?hl=de...f67a0c2&rnum=1)
>
> Is there another solution than fetching the URL twice? Am I missing
> some basic functionality? Thanks for your help...

The functionality to safely decode XML content is already available in
the BCL. Just use an XmlTextReader.
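
To illustrate the "decode bytes, not strings" point, here is a minimal sketch, not a definitive implementation. It assumes you copy the response stream into memory once, so the raw bytes can be decoded again without a second request. The class and method names (DecodeBytesSketch, ReadAllBytes, Decode) are made up for this example, and the hard-coded bytes stand in for a downloaded feed.

```csharp
using System;
using System.IO;
using System.Text;

static class DecodeBytesSketch
{
    // Copies a forward-only stream (such as a response stream) into memory
    // so the raw bytes can be decoded more than once.
    public static byte[] ReadAllBytes(Stream input)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = input.Read(chunk, 0, chunk.Length)) > 0)
            {
                buffer.Write(chunk, 0, read);
            }
            return buffer.ToArray();
        }
    }

    // Decodes the raw bytes with the named encoding, e.g. the name taken
    // from the encoding="..." attribute of the XML declaration.
    public static string Decode(byte[] raw, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(raw);
    }

    public static void Main()
    {
        // Stand-in for downloaded data: "café" as ISO-8859-1 bytes.
        byte[] wire = { 0x63, 0x61, 0x66, 0xE9 };
        byte[] raw = ReadAllBytes(new MemoryStream(wire));

        string wrong = Decode(raw, "utf-8");      // 0xE9 alone is not valid UTF-8
        string right = Decode(raw, "ISO-8859-1"); // matches the real encoding

        Console.WriteLine(right == "caf\u00E9"); // True
        Console.WriteLine(wrong == right);       // False
    }
}
```

The string decoded with the wrong encoding has already lost the original byte, which is why converting that string afterwards cannot repair it. Keeping the byte array around is what makes a second decoding attempt possible.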

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 17 '05 #2

fitsch <fi****@bluewin.ch> wrote:
> I am trying to write a generic RSS/Atom/OPML feed client. The problem
> is that those XML feeds may use different encodings:
>
> - <?xml version="1.0" encoding="ISO-8859-1" ?>...
> - <?xml version="1.0" encoding="utf-8" ?>...
> - ...
>
> I am using WebRequest to fetch the feeds. Simplified, my code looks
> like this:
>
> WebRequest req = WebRequest.Create(url);
> StreamReader reader = new StreamReader(..., Encoding.Default);
> string result = reader.ReadToEnd();

Why bother reading it as a string? The best solution is to get the
stream and pass it directly to XmlTextReader. The XmlTextReader,
which knows how to deal with the encoding part of the XML declaration,
can then do the right thing.
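
A rough sketch of that approach, under a few assumptions: the names FeedReaderSketch, Load and Fetch are hypothetical, loading into an XmlDocument is just one possible way to consume the reader, and the in-memory bytes in Main stand in for a real feed so the example runs without network access.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;

static class FeedReaderSketch
{
    // Parses XML straight from raw bytes. XmlTextReader chooses the decoder
    // from the byte order mark or the encoding="..." declaration.
    public static XmlDocument Load(Stream input)
    {
        XmlTextReader xml = new XmlTextReader(input);
        XmlDocument doc = new XmlDocument();
        doc.Load(xml);
        return doc;
    }

    // Hypothetical helper: fetches a feed and passes the response stream
    // to Load without going through a StreamReader or a string.
    public static XmlDocument Fetch(string url)
    {
        WebRequest req = WebRequest.Create(url);
        using (WebResponse resp = req.GetResponse())
        using (Stream body = resp.GetResponseStream())
        {
            return Load(body);
        }
    }

    public static void Main()
    {
        // Stand-in for a Latin-1 feed; "é" becomes the single byte 0xE9.
        string text = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><title>caf\u00E9</title>";
        byte[] wire = Encoding.GetEncoding("ISO-8859-1").GetBytes(text);

        XmlDocument doc = Load(new MemoryStream(wire));
        Console.WriteLine(doc.DocumentElement.InnerText == "caf\u00E9"); // True
    }
}
```

Because the stream is read only once and never turned into a string first, there is no need to know the encoding in advance or to rewind the response stream.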

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 17 '05 #4
