Encoding problem from usenet


I am really having trouble with encoding characters.
The application I am creating is based on an NNTP component from Smilla

smilla.ru

My problem occurs when I read a string that contains special characters and the charset is set to utf-8. It works fine with iso-8859-1 strings (I think).

The problem occurs when the case below is Q and the charset is utf-8:
s= "=?Utf-8?Q?M=C3=A1scara_WindowsForm?="

the return value of this function is:
"Máscara_WindowsForm"

it should be:
Máscara WindowsForm

I hope someone can help. I am really frustrated... I have already spent many days on this problem...

Here is the code I am using:

public static string DecodeHeaderString(string s)
{
    Match header = Regex.Match(s, @"=\?(.+)\?(.+)\?(.+)\?=");
    if (header.Success)
    {
        string charset = header.Groups[1].Value;
        string encoding = header.Groups[2].Value;
        string text = header.Groups[3].Value;
        switch (encoding)
        {
            case "B":
                text = Encoding.Default.GetString(Convert.FromBase64String(text));
                break;
            case "Q":
                text = Encoding.Default.GetString(FromQuotedPrintableString(text));
                break;
        }
        text = Charset.Decode(charset, text, false);
        return text;
    }
    else
        return s;
}

Encoding issue - utf-8?


--
--------------------------------- --- -- -
Posted with NewsLeecher v3.8 Final
Web @ http://www.newsleecher.com/?usenet
------------------- ----- ---- -- -
Jun 27 '08 #1
On Jun 2, 4:31 pm, Lisa (lisa19...@hotmail.com) wrote:

<snip>
> Encoding issue - utf-8?
Yes - your code is explicitly using Encoding.Default to convert from
the raw bytes to text, despite the fact that the quoted-printable line
says that the encoding is UTF-8.

You need to work out the encoding to use, and apply it - using
Encoding.Default will use the default encoding for your particular
machine.
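
A minimal sketch of that idea, assuming the charset name has already been pulled out of the header and the payload has already been unescaped to raw bytes. The class and method names here are made up for illustration; only Encoding.GetEncoding and GetString are real .NET calls.

```csharp
using System;
using System.Text;

public static class DeclaredCharsetDemo
{
    // Decodes raw bytes with the charset named in the header
    // instead of the machine-dependent Encoding.Default.
    public static string DecodeWith(string charsetName, byte[] rawBytes)
    {
        // GetEncoding throws ArgumentException for names it does not recognise.
        Encoding encoding = Encoding.GetEncoding(charsetName);
        return encoding.GetString(rawBytes);
    }
}
```

For the bytes behind "M=C3=A1scara" (4D C3 A1 73 63 61 72 61), DeclaredCharsetDemo.DecodeWith("utf-8", bytes) gives "Máscara".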

Jon
Jun 27 '08 #2
Thanks Jon, You made my day!! You are my hero!!
Jun 27 '08 #3
Lisa (li*******@hotmail.com) wrote:
> Thanks Jon, You made my day!! You are my hero!!
Not a problem, but just to check - you didn't just switch from
Encoding.Default to Encoding.UTF8 did you? Otherwise things will go
equally wrong when someone posts a quoted-printable subject line with
an ISO-8859-1 encoding (or whatever).
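
To illustrate why a fixed Encoding.UTF8 would be just as wrong, here is a small sketch with hypothetical names. It relies on the fact that "á" is the single byte E1 in ISO-8859-1 but the two bytes C3 A1 in UTF-8, so each byte sequence only round-trips with its own charset.

```csharp
using System;
using System.Text;

public static class CharsetMismatchDemo
{
    // "Máscara" as it would arrive under each charset.
    public static readonly byte[] Latin1Bytes = { 0x4D, 0xE1, 0x73, 0x63, 0x61, 0x72, 0x61 };
    public static readonly byte[] Utf8Bytes = { 0x4D, 0xC3, 0xA1, 0x73, 0x63, 0x61, 0x72, 0x61 };

    // Decodes the bytes with whatever charset name the caller passes in.
    public static string Decode(string charsetName, byte[] rawBytes)
    {
        return Encoding.GetEncoding(charsetName).GetString(rawBytes);
    }
}
```

Decode("iso-8859-1", Latin1Bytes) and Decode("utf-8", Utf8Bytes) both give "Máscara". Swapping the charsets does not: the UTF-8 bytes read as ISO-8859-1 come out as "MÃ¡scara", and the ISO-8859-1 bytes read as UTF-8 contain an invalid sequence.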

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Jun 27 '08 #4

This is how it looks now... I hope it is correct.

Just for fun: what about iso-2022-jp, will it work?? And how about saving the data in SQL Server 2005, which data type do I have to use? At the moment I use nvarchar(max), is this a good approach? (I do not want the data to be searchable)

public static string DecodeHeaderString(string s)
{
    Match header = Regex.Match(s, @"=\?(.+)\?(.+)\?(.+)\?=");
    if (header.Success)
    {
        string charset = header.Groups[1].Value;
        string encoding = header.Groups[2].Value;
        string text = header.Groups[3].Value;
        switch (encoding)
        {
            case "B":
                switch (charset)
                {
                    case "":
                        text = Encoding.Default.GetString(Convert.FromBase64String(text));
                        break;
                    default:
                        text = Encoding.GetEncoding(charset).GetString(Convert.FromBase64String(text));
                        break;
                }
                break;
            case "Q":
                switch (charset)
                {
                    case "":
                        text = Encoding.Default.GetString(FromQuotedPrintableString(text));
                        break;
                    default:
                        text = Encoding.GetEncoding(charset).GetString(FromQuotedPrintableString(text));
                        break;
                }
                break;
        }
        text = Charset.Decode(charset, text, false);
        return text;
    }
    else
        return s;
}
Jun 27 '08 #5
Lisa (li*******@hotmail.com) wrote:
> This is how it looks now... I hope it is correct.
>
> Just for fun: what about iso-2022-jp, will it work??

Give it a try - but I'm not sure. You may wish to prepopulate a map of
charsets you know are likely to occur. (I'm sure that .NET has the
charset in question, but may not know it by that name.)

> and how about
> saving data in sql server 2005, which datatype do I have to use. At
> the moment I use nvarchar(max), is this a good approach? (I do not want
> the data to be searchable)

I wouldn't like to say for sure, but I think it should be okay. You may
need to set your database's collation - I'm not a SQL Server expert.

> public static string DecodeHeaderString(string s)

<snip>

Given that the code to get the charset is the same regardless of the
transport, you could separate that bit out first...
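
One way that separation could look, as a sketch rather than a drop-in replacement. Several things here are assumptions: the tightened regex, the FromQEncodedString helper (a stand-in for FromQuotedPrintableString, whose source is not shown in the thread), and leaving out the Charset.Decode call from the Smilla library. The helper also maps "_" to a space, which is what RFC 2047 specifies for "Q"-encoded header words.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class HeaderDecoderSketch
{
    // Hypothetical helper: RFC 2047 "Q" encoding, where "_" is a space
    // and "=XX" is one hex-escaped byte. Other characters are assumed ASCII.
    public static byte[] FromQEncodedString(string text)
    {
        using (var buffer = new MemoryStream())
        {
            for (int i = 0; i < text.Length; i++)
            {
                char c = text[i];
                if (c == '_')
                {
                    buffer.WriteByte(0x20);
                }
                else if (c == '=' && i + 2 < text.Length
                         && Uri.IsHexDigit(text[i + 1]) && Uri.IsHexDigit(text[i + 2]))
                {
                    buffer.WriteByte(Convert.ToByte(text.Substring(i + 1, 2), 16));
                    i += 2; // skip the two hex digits just consumed
                }
                else
                {
                    buffer.WriteByte((byte)c);
                }
            }
            return buffer.ToArray();
        }
    }

    public static string DecodeHeaderString(string s)
    {
        Match header = Regex.Match(s, @"=\?([^?]*)\?([BbQq])\?([^?]*)\?=");
        if (!header.Success)
        {
            return s;
        }

        // Resolve the charset once, independent of the transfer encoding.
        string charsetName = header.Groups[1].Value;
        Encoding charset = charsetName.Length == 0
            ? Encoding.Default
            : Encoding.GetEncoding(charsetName);

        // Only the bytes-from-text step differs between "B" and "Q".
        string payload = header.Groups[3].Value;
        byte[] raw = header.Groups[2].Value.ToUpperInvariant() == "B"
            ? Convert.FromBase64String(payload)
            : FromQEncodedString(payload);

        return charset.GetString(raw);
    }
}
```

With this shape, DecodeHeaderString("=?Utf-8?Q?M=C3=A1scara_WindowsForm?=") gives "Máscara WindowsForm". Like the original, it returns only the first encoded word and ignores any surrounding text.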

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Jun 27 '08 #6
I'll check out the database collation thing...

And what did you mean by "....You may wish to prepopulate a map of
charsets you know are likely to occur. (I'm sure that .NET has the
charset in question, but may not know it by that name.)"... could you please write a few words about that. Maybe a link to a sample if you have one.

Thanks for your help and time :)

Jun 27 '08 #7
On Jun 2, 10:53 pm, Lisa (lisa19...@hotmail.com) wrote:
> I'll check out the database collation thing...
>
> And what did you mean by "....You may wish to prepopulate a map of
> charsets you know are likely to occur. (I'm sure that .NET has the
> charset in question, but may not know it by that name.)"... could you please write a few words about that. Maybe a link to a sample if you have one.

Well, you're currently using Encoding.GetEncoding to convert from an
encoding's name (as written in the quoted printable line) to an
Encoding object. It's possible that there are encodings which .NET
does know about, but not by the name that people use in the QP line.
Suppose QP used "Utf_eight" instead of "UTF-8" - you'd need a way of
mapping from one form to another. You'll need to look at as much
sample data as you've got to find out what's actually in use "in the
wild".
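
A rough sketch of such a map, assuming a plain dictionary of aliases consulted before Encoding.GetEncoding. The alias entries are made up for illustration ("Utf_eight" is the invented name from the paragraph above); the real list would come from the sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CharsetAliases
{
    // Hypothetical aliases: replace with names actually seen in real headers.
    private static readonly Dictionary<string, string> Aliases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "Utf_eight", "utf-8" },
            { "utf8", "utf-8" },
            { "latin1", "iso-8859-1" }
        };

    // Maps a header charset name to an Encoding, or returns 'fallback'
    // when the name is missing or unknown to this runtime.
    public static Encoding Resolve(string name, Encoding fallback)
    {
        if (string.IsNullOrEmpty(name))
        {
            return fallback;
        }

        string canonical;
        if (!Aliases.TryGetValue(name, out canonical))
        {
            canonical = name; // not an alias, try the name as given
        }

        try
        {
            return Encoding.GetEncoding(canonical);
        }
        catch (ArgumentException)
        {
            return fallback;
        }
    }
}
```

Passing the fallback in explicitly keeps the decision about unknown charsets (Encoding.Default, ASCII, or an error) with the caller.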

Do you see what I mean?

Jon
Jun 27 '08 #8
Making a mapping was a really good idea. I have gathered some data since yesterday, and these are the charsets I found:

Charset (as reported)    Count
UTF-8; format=flowed       238
iso-2022-jp                 14
koi8-r; format=flowe        23
windows-1257; format         3
Windows-1252               163
ISO-8859-1; format=f       982
us-ascii                 21979
windows-1256                 1
gb2312                       3
KOI8-R                       5
windows-1256; format         1
ISO-8859-9                   2
windows-1251                 8
(not defined)              140
us-ascii; format=flo        10
ISO-8859-15; format=        41
ISO-8859-2; format=f        17
windows-1251; format         1
iso-8859-15                 70
Utf-8                     2468
windows-1257                16
ISO-8859-1                5086
iso-8859-2                  18
windows-1252; format         7

There are actually 140 where the charset is not defined, and it also seems like the string is cut off after 20 characters... I am not sure yet, maybe there are some bugs in the Smilla library... I'll check it out and let you know...
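
Many of the values in the list differ only in letter case or in a trailing "; format=..." parameter, so a small clean-up step before the lookup would collapse them. A sketch with a hypothetical helper, assuming the raw value is whatever followed "charset=" in the header:

```csharp
using System;

public static class CharsetNameCleaner
{
    // Returns the bare, lower-cased charset name,
    // or null when no charset was declared.
    public static string Normalize(string rawValue)
    {
        if (rawValue == null)
        {
            return null;
        }

        // Drop any parameter after the first ';' (for example "; format=flowed").
        int semicolon = rawValue.IndexOf(';');
        string name = semicolon >= 0 ? rawValue.Substring(0, semicolon) : rawValue;

        // Remove surrounding whitespace and optional double quotes.
        name = name.Trim().Trim('"');

        return name.Length == 0 ? null : name.ToLowerInvariant();
    }
}
```

Because everything after the semicolon is discarded, the entries truncated at 20 characters still normalize to the same charset name as their untruncated counterparts.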

kisses Lisa
Jun 27 '08 #9