Encoding problem from usenet

I am really having trouble with character encoding.
The application I am creating is based on an NNTP component from Smilla:

smilla.ru

My problem occurs when I read a string that contains special characters and the charset is utf-8. It works fine with iso-8859-1 strings (I think).

The problem occurs when the case in the code below is Q and the charset is utf-8:
s= "=?Utf-8?Q?M=C3=A1scara_WindowsForm?="

the return value of this function is:
"Máscara_WindowsForm"

it should be:
Máscara WindowsForm

I hope someone can help. I am really frustrated; I have already spent many days on this problem.

Here is the code I am using:

public static string DecodeHeaderString(string s)
{
    Match header = Regex.Match(s, @"=\?(.+)\?(.+)\?(.+)\?=");
    if (header.Success)
    {
        string charset = header.Groups[1].Value;
        string encoding = header.Groups[2].Value;
        string text = header.Groups[3].Value;
        switch (encoding)
        {
            case "B":
                text = Encoding.Default.GetString(Convert.FromBase64String(text));
                break;
            case "Q":
                text = Encoding.Default.GetString(FromQuotedPrintableString(text));
                break;
        }
        text = Charset.Decode(charset, text, false);
        return text;
    }
    else
        return s;
}

Encoding issue - utf-8?


--
--------------------------------- --- -- -
Posted with NewsLeecher v3.8 Final
Web @ http://www.newsleecher.com/?usenet
------------------- ----- ---- -- -
Jun 27 '08 #1
8 Replies


On Jun 2, 4:31 pm, Lisa (lisa19...@hotmail.com) wrote:

<snip>
Encoding issue - utf-8?
Yes - your code is explicitly using Encoding.Default to convert from
the raw bytes to text, despite the fact that the quoted-printable line
says that the encoding is UTF-8.

You need to work out the encoding to use, and apply it - using
Encoding.Default will use the default encoding for your particular
machine.
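
As a minimal sketch of that idea (the class and method names here are made up for illustration, not part of the original code or the Smilla library), the raw bytes can be decoded with the charset named in the header instead of Encoding.Default:

```csharp
using System;
using System.Text;

public static class DeclaredCharsetDemo
{
    // Decodes raw bytes using the charset named in the header,
    // rather than the machine-dependent Encoding.Default.
    // Encoding.GetEncoding throws ArgumentException for unknown names.
    public static string DecodeWithDeclaredCharset(string charset, byte[] rawBytes)
    {
        Encoding encoding = Encoding.GetEncoding(charset);
        return encoding.GetString(rawBytes);
    }
}
```

For the bytes 0x4D 0xC3 0xA1 and the charset "Utf-8", this returns "Má", because 0xC3 0xA1 is the UTF-8 encoding of "á".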

Jon
Jun 27 '08 #2

Thanks Jon, You made my day!! You are my hero!!
Jun 27 '08 #3

Lisa (li*******@hotmail.com) wrote:
Thanks Jon, You made my day!! You are my hero!!
Not a problem, but just to check - you didn't just switch from
Encoding.Default to Encoding.UTF8 did you? Otherwise things will go
equally wrong when someone posts a quoted-printable subject line with
an ISO-8859-1 encoding (or whatever).
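
A small sketch of why a hard-coded Encoding.UTF8 would go wrong in that case (the names are illustrative only): the same text "Má" is a different byte sequence in ISO-8859-1, and those bytes are not valid UTF-8.

```csharp
using System;
using System.Text;

public static class HardCodedEncodingDemo
{
    // "Má" encoded as ISO-8859-1: 'á' is the single byte 0xE1.
    private static readonly byte[] Latin1Bytes = { 0x4D, 0xE1 };

    // Wrong: treats ISO-8859-1 bytes as UTF-8. 0xE1 starts a multi-byte
    // UTF-8 sequence that never completes, so the decoder substitutes
    // the replacement character U+FFFD.
    public static string DecodeAsUtf8()
    {
        return Encoding.UTF8.GetString(Latin1Bytes);
    }

    // Right: uses the charset the sender actually declared.
    public static string DecodeAsDeclared()
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(Latin1Bytes);
    }
}
```

DecodeAsDeclared() gives back "Má", while DecodeAsUtf8() gives "M" followed by a replacement character.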

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Jun 27 '08 #4

This is how it looks now. I hope it is correct.

Just for fun: what about iso-2022-jp, will it work? And what about saving the data in SQL Server 2005: which data type should I use? At the moment I use nvarchar(max); is this a good approach? (I do not want
the data to be searchable.)

public static string DecodeHeaderString(string s)
{
    Match header = Regex.Match(s, @"=\?(.+)\?(.+)\?(.+)\?=");
    if (header.Success)
    {
        string charset = header.Groups[1].Value;
        string encoding = header.Groups[2].Value;
        string text = header.Groups[3].Value;
        switch (encoding)
        {
            case "B":
                switch (charset)
                {
                    case "":
                        text = Encoding.Default.GetString(Convert.FromBase64String(text));
                        break;
                    default:
                        text = Encoding.GetEncoding(charset).GetString(Convert.FromBase64String(text));
                        break;
                }
                break;
            case "Q":
                switch (charset)
                {
                    case "":
                        text = Encoding.Default.GetString(FromQuotedPrintableString(text));
                        break;
                    default:
                        text = Encoding.GetEncoding(charset).GetString(FromQuotedPrintableString(text));
                        break;
                }
                break;
        }
        text = Charset.Decode(charset, text, false);
        return text;
    }
    else
        return s;
}
Jun 27 '08 #5

Lisa (li*******@hotmail.com) wrote:
This is how it looks now. I hope it is correct.

Just for fun: what about iso-2022-jp, will it work?
Give it a try - but I'm not sure. You may wish to prepopulate a map of
charsets you know are likely to occur. (I'm sure that .NET has the
charset in question, but may not know it by that name.)
And what about
saving the data in SQL Server 2005: which data type should I use? At
the moment I use nvarchar(max); is this a good approach? (I do not want
the data to be searchable.)
I wouldn't like to say for sure, but I think it should be okay. You may
need to set your database's collation - I'm not a SQL server expert.
public static string DecodeHeaderString(string s)
<snip>

Given that the code to get the charset is the same regardless of the
transport, you could separate that bit out first...
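
A rough sketch of that restructuring, under some assumptions: FromQEncodedString is a hypothetical stand-in for the poster's FromQuotedPrintableString (whose source is not shown in the thread), the regex is tightened so that the charset and payload cannot contain "?", and the Charset.Decode call from the component is left out because its behaviour is not described here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class HeaderDecoderSketch
{
    // Matches one encoded word of the form =?charset?B-or-Q?payload?=
    private static readonly Regex EncodedWord =
        new Regex(@"=\?([^?]+)\?([BbQq])\?([^?]*)\?=");

    public static string DecodeHeaderString(string s)
    {
        Match header = EncodedWord.Match(s);
        if (!header.Success)
            return s;

        // Resolve the charset once; it does not depend on B versus Q.
        Encoding encoding = Encoding.GetEncoding(header.Groups[1].Value);
        string payload = header.Groups[3].Value;

        // Only the step from payload text to raw bytes differs.
        byte[] bytes = header.Groups[2].Value.ToUpperInvariant() == "B"
            ? Convert.FromBase64String(payload)
            : FromQEncodedString(payload);

        return encoding.GetString(bytes);
    }

    // Hypothetical helper: "=XX" is a hex-encoded byte, "_" stands for a
    // space, and every other character is taken as a single ASCII byte.
    private static byte[] FromQEncodedString(string text)
    {
        List<byte> bytes = new List<byte>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            byte value;
            if (c == '_')
            {
                bytes.Add(0x20);
            }
            else if (c == '=' && i + 2 < text.Length + 0 &&
                     byte.TryParse(text.Substring(i + 1, 2), NumberStyles.HexNumber,
                                   CultureInfo.InvariantCulture, out value))
            {
                bytes.Add(value);
                i += 2; // skip the two hex digits just consumed
            }
            else
            {
                bytes.Add((byte)c);
            }
        }
        return bytes.ToArray();
    }
}
```

With the subject line from the first post, DecodeHeaderString("=?Utf-8?Q?M=C3=A1scara_WindowsForm?=") returns "Máscara WindowsForm". Because both regexes require at least one character for the charset, the empty-charset branch in the posted code is never reached, which is why this sketch has no such branch.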

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Jun 27 '08 #6

I'll check out the database collation thing...

And what did you mean by "You may wish to prepopulate a map of
charsets you know are likely to occur. (I'm sure that .NET has the
charset in question, but may not know it by that name.)"? Could you please write a few words about that, and maybe link to a sample if you have one?

Thanks for your help and time :)

Jun 27 '08 #7

On Jun 2, 10:53 pm, Lisa (lisa19...@hotmail.com) wrote:
I'll check out the database collation thing...

And what did you mean by "You may wish to prepopulate a map of
charsets you know are likely to occur. (I'm sure that .NET has the
charset in question, but may not know it by that name.)"? Could you please write a few words about that, and maybe link to a sample if you have one?
Well, you're currently using Encoding.GetEncoding to convert from an
encoding's name (as written in the quoted printable line) to an
Encoding object. It's possible that there are encodings which .NET
does know about, but not by the name that people use in the QP line.
Suppose QP used "Utf_eight" instead of "UTF-8" - you'd need a way of
mapping from one form to another. You'll need to look at as much
sample data as you've got to find out what's actually in use "in the
wild".
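
One possible shape for such a map, as a sketch only: the single alias below is the made-up "Utf_eight" example from this post, and falling back to us-ascii for unknown names is an assumption, not something the thread settles.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CharsetMap
{
    // Hypothetical aliases: names seen "in the wild" mapped to names
    // that Encoding.GetEncoding understands. Fill this from real data.
    private static readonly Dictionary<string, string> Aliases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "Utf_eight", "utf-8" },
        };

    public static Encoding Resolve(string charset)
    {
        string name = charset.Trim();
        string mapped;
        if (Aliases.TryGetValue(name, out mapped))
            name = mapped;

        try
        {
            return Encoding.GetEncoding(name);
        }
        catch (ArgumentException)
        {
            // Assumption: treat unknown charsets as us-ascii rather than failing.
            return Encoding.ASCII;
        }
    }
}
```

Resolve("Utf_eight") then yields the UTF-8 encoding, and names that .NET already knows pass straight through. On .NET Core and later, code-page encodings such as iso-2022-jp or windows-1252 are only found after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); on the .NET Framework they are available without that step.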

Do you see what I mean?

Jon
Jun 27 '08 #8

Making a mapping was a really good idea. I have gathered some data since yesterday, and these are the charsets I found:

Charset (as logged)     Count
UTF-8; format=flowed      238
iso-2022-jp                14
koi8-r; format=flowe       23
windows-1257; format        3
Windows-1252              163
ISO-8859-1; format=f      982
us-ascii                21979
windows-1256                1
gb2312                      3
KOI8-R                      5
windows-1256; format        1
ISO-8859-9                  2
windows-1251                8
(not defined)             140
us-ascii; format=flo       10
ISO-8859-15; format=       41
ISO-8859-2; format=f       17
windows-1251; format        1
iso-8859-15                70
Utf-8                    2468
windows-1257               16
ISO-8859-1               5086
iso-8859-2                 18
windows-1252; format        7

There are actually 140 of those where the charset is not defined, and it also seems like the string is cut off after 20 characters. I am not sure yet; maybe there are some bugs in the Smilla library. I'll check it out and let you know...
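
Many of the values above differ only in letter case or in a trailing "; format=flowed" parameter. A minimal sketch of a normalizing step that could run before the lookup (the helper name is hypothetical, and returning an empty string for a missing charset is an assumption the caller has to handle):

```csharp
using System;

public static class CharsetNormalizer
{
    // Reduces a logged value such as "UTF-8; format=flowed" to "utf-8".
    // Anything after the first ';' is treated as an unrelated parameter.
    public static string Normalize(string rawCharset)
    {
        if (rawCharset == null)
            return string.Empty;

        int semicolon = rawCharset.IndexOf(';');
        string name = semicolon >= 0
            ? rawCharset.Substring(0, semicolon)
            : rawCharset;

        // Trim whitespace and optional quotes, then fold case so that
        // "Utf-8", "UTF-8" and "utf-8" all count as the same charset.
        return name.Trim().Trim('"').ToLowerInvariant();
    }
}
```

Because the name comes before the semicolon, this also copes with the values that were cut off after 20 characters, for example "koi8-r; format=flowe" becomes "koi8-r".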

kisses Lisa
Jun 27 '08 #9
