Encoding problem from usenet


I am really having trouble with character encoding.
The application I am creating is based on an NNTP component from Smilla

smilla.ru

My problem is when I read a string which contains special characters and the character set is utf-8. It works fine with iso-8859-1 strings (I think).

The problem occurs when the case below is Q and the encoding is set to utf-8:
s = "=?Utf-8?Q?M=C3=A1scara_WindowsForm?="

The return value of this function is:
"Máscara_WindowsForm"

It should be:
Máscara WindowsForm

I hope someone can help. I am really frustrated... I have already spent many days on this problem...

Here is the code I am using:

public static string DecodeHeaderString(string s)
{
    Match header = Regex.Match(s, @"=\?(.+)\?(.+)\?(.+)\?=");
    if (header.Success)
    {
        string charset = header.Groups[1].Value;
        string encoding = header.Groups[2].Value;
        string text = header.Groups[3].Value;
        switch (encoding)
        {
            case "B":
                text = Encoding.Default.GetString(Convert.FromBase64String(text));
                break;
            case "Q":
                text = Encoding.Default.GetString(FromQuotedPrintableString(text));
                break;
        }
        text = Charset.Decode(charset, text, false);
        return text;
    }
    else
        return s;
}

Encoding issue - utf-8?


--
--------------------------------- --- -- -
Posted with NewsLeecher v3.8 Final
Web @ http://www.newsleecher.com/?usenet
------------------- ----- ---- -- -
Jun 27 '08 #1
On Jun 2, 4:31 pm, Lisa (lisa19...@hotmail.com) wrote:

<snip>
> Encoding issue - utf-8?

Yes - your code is explicitly using Encoding.Default to convert from
the raw bytes to text, despite the fact that the quoted-printable line
says that the encoding is UTF-8.

You need to work out the encoding to use, and apply it - using
Encoding.Default will use the default encoding for your particular
machine.
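
To make the point concrete, here is a minimal sketch (not code from the thread) that decodes the same three bytes with two different charsets. The class and method names are made up for illustration; only Encoding.GetEncoding and GetString are real .NET calls.

```csharp
using System;
using System.Text;

public static class SameBytesDemo
{
    // Decodes raw bytes with the charset named by the caller,
    // rather than with the machine-dependent Encoding.Default.
    public static string Decode(byte[] raw, string charsetName)
    {
        return Encoding.GetEncoding(charsetName).GetString(raw);
    }

    public static void Main()
    {
        // "Má" encoded as UTF-8: 'M' is 0x4D, 'á' is 0xC3 0xA1.
        byte[] raw = { 0x4D, 0xC3, 0xA1 };

        string asUtf8 = Decode(raw, "utf-8");        // two characters: M, á
        string asLatin1 = Decode(raw, "iso-8859-1"); // three characters: M, Ã, ¡

        Console.WriteLine(asUtf8.Length);
        Console.WriteLine(asLatin1.Length);
    }
}
```

If Encoding.Default happens to be a single-byte code page, the second result is roughly what the original function produces for a UTF-8 subject line.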

Jon
Jun 27 '08 #2
Thanks Jon, You made my day!! You are my hero!!
Jun 27 '08 #3
Lisa (li*******@hotmail.com) wrote:
> Thanks Jon, You made my day!! You are my hero!!

Not a problem, but just to check - you didn't just switch from
Encoding.Default to Encoding.UTF8 did you? Otherwise things will go
equally wrong when someone posts a quoted-printable subject line with
an ISO-8859-1 encoding (or whatever).
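
A small sketch of that failure mode, assuming a subject whose bytes are ISO-8859-1. The helper names are hypothetical; the only library calls are Encoding.UTF8 and Encoding.GetEncoding.

```csharp
using System;
using System.Text;

public static class CharsetChoiceDemo
{
    // Hard-coded UTF-8: correct only when the header really is UTF-8.
    public static string DecodeHardCoded(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw);
    }

    // Charset taken from the header itself.
    public static string DecodeDeclared(string charset, byte[] raw)
    {
        return Encoding.GetEncoding(charset).GetString(raw);
    }

    public static void Main()
    {
        // "Má" in ISO-8859-1 is one byte per character: 0x4D 0xE1.
        byte[] latin1Bytes = { 0x4D, 0xE1 };

        // A lone 0xE1 is not valid UTF-8, so the decoder substitutes U+FFFD.
        bool mangled = DecodeHardCoded(latin1Bytes).IndexOf('\uFFFD') >= 0;
        bool correct = DecodeDeclared("iso-8859-1", latin1Bytes) == "M\u00E1";

        Console.WriteLine(mangled);
        Console.WriteLine(correct);
    }
}
```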

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Jun 27 '08 #4

This is how it looks now... I hope this is correct now.

Just for fun: what about iso-2022-jp, will it work?? And how about saving data in SQL Server 2005, which datatype do I have to use? At the moment I use nvarchar(max), is this a good approach? (I do not want
the data to be searchable)

public static string DecodeHeaderString(string s)
{
    Match header = Regex.Match(s, @"=\?(.+)\?(.+)\?(.+)\?=");
    if (header.Success)
    {
        string charset = header.Groups[1].Value;
        string encoding = header.Groups[2].Value;
        string text = header.Groups[3].Value;
        switch (encoding)
        {
            case "B":
                switch (charset)
                {
                    case "":
                        text = Encoding.Default.GetString(Convert.FromBase64String(text));
                        break;
                    default:
                        text = Encoding.GetEncoding(charset).GetString(Convert.FromBase64String(text));
                        break;
                }
                break;
            case "Q":
                switch (charset)
                {
                    case "":
                        text = Encoding.Default.GetString(FromQuotedPrintableString(text));
                        break;
                    default:
                        text = Encoding.GetEncoding(charset).GetString(FromQuotedPrintableString(text));
                        break;
                }
                break;
        }
        text = Charset.Decode(charset, text, false);
        return text;
    }
    else
        return s;
}
Jun 27 '08 #5
Lisa (li*******@hotmail.com) wrote:
> This is how it looks now... I hope this is correct now.
>
> Just for fun: what about iso-2022-jp, will it work??

Give it a try - but I'm not sure. You may wish to prepopulate a map of
charsets you know are likely to occur. (I'm sure that .NET has the
charset in question, but may not know it by that name.)

> and how about
> saving data in sql server 2005, which datatype do I have to use. At
> the moment I use nvarchar(max), is this a good approach? (I do not want
> the data to be searchable)

I wouldn't like to say for sure, but I think it should be okay. You may
need to set your database's collation - I'm not a SQL server expert.

> public static string DecodeHeaderString(string s)
<snip>

Given that the code to get the charset is the same regardless of the
transport, you could separate that bit out first...
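
One way that separation might look, as a sketch rather than the thread's final code. Assumptions: FromQ is a hypothetical stand-in for the FromQuotedPrintableString helper (which is not shown in the thread), the Smilla Charset.Decode call is left out because its behaviour is not documented here, and only the first encoded-word in the string is decoded, as in the original. In the RFC 2047 "Q" encoding an underscore stands for a space, which is why the expected result is "Máscara WindowsForm".

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class HeaderDecoder
{
    // Matches one encoded-word of the form =?charset?encoding?text?=
    private static readonly Regex EncodedWord =
        new Regex(@"=\?([^?]+)\?([BbQq])\?([^?]*)\?=");

    public static string DecodeHeaderString(string s)
    {
        Match header = EncodedWord.Match(s);
        if (!header.Success)
            return s;

        // Resolve the charset once, independently of the transfer encoding.
        Encoding charset = Encoding.GetEncoding(header.Groups[1].Value);
        string text = header.Groups[3].Value;

        byte[] raw;
        switch (header.Groups[2].Value.ToUpperInvariant())
        {
            case "B":
                raw = Convert.FromBase64String(text);
                break;
            default: // "Q" is the only other value the regex allows
                raw = FromQ(text);
                break;
        }
        return charset.GetString(raw);
    }

    // Hypothetical Q decoder: "_" is a space, "=XX" is one hex-encoded byte,
    // anything else is assumed to be a plain ASCII character.
    private static byte[] FromQ(string text)
    {
        using (var bytes = new MemoryStream())
        {
            for (int i = 0; i < text.Length; i++)
            {
                char c = text[i];
                if (c == '_')
                {
                    bytes.WriteByte(0x20);
                }
                else if (c == '=' && i + 2 < text.Length)
                {
                    bytes.WriteByte(Convert.ToByte(text.Substring(i + 1, 2), 16));
                    i += 2;
                }
                else
                {
                    bytes.WriteByte((byte)c);
                }
            }
            return bytes.ToArray();
        }
    }
}
```

An unknown charset name makes Encoding.GetEncoding throw ArgumentException, and malformed hex after "=" makes Convert.ToByte throw FormatException, so real input would need handling for both.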

--
Jon Skeet
Jun 27 '08 #6
I'll check out the database collation thing...

And what did you mean by "...You may wish to prepopulate a map of
charsets you know are likely to occur. (I'm sure that .NET has the
charset in question, but may not know it by that name.)"? Could you please write a few words about that? Maybe a link to a sample, if you have one.

Thanks for your help and time :)

Jun 27 '08 #7
On Jun 2, 10:53 pm, Lisa (lisa19...@hotmail.com) wrote:
> I'll check out the database collation thing...
>
> And what did you mean by "...You may wish to prepopulate a map of
> charsets you know are likely to occur. (I'm sure that .NET has the
> charset in question, but may not know it by that name.)"? Could you please write a few words about that? Maybe a link to a sample, if you have one.

Well, you're currently using Encoding.GetEncoding to convert from an
encoding's name (as written in the quoted printable line) to an
Encoding object. It's possible that there are encodings which .NET
does know about, but not by the name that people use in the QP line.
Suppose QP used "Utf_eight" instead of "UTF-8" - you'd need a way of
mapping from one form to another. You'll need to look at as much
sample data as you've got to find out what's actually in use "in the
wild".
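
A minimal sketch of such a map, assuming a hand-maintained alias table. The class name, the table contents and the choice to return null for unknown names are all illustrative; "Utf_eight" is the made-up example from the paragraph above, not a real charset label.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CharsetResolver
{
    // Hypothetical alias table: names seen in headers mapped to names .NET accepts.
    private static readonly Dictionary<string, string> Aliases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "Utf_eight", "utf-8" },
        };

    // Returns the Encoding for a header charset name, or null if it cannot be resolved.
    public static Encoding Resolve(string name)
    {
        if (string.IsNullOrEmpty(name))
            return null;

        string known;
        if (Aliases.TryGetValue(name, out known))
            name = known;

        try
        {
            return Encoding.GetEncoding(name);
        }
        catch (ArgumentException)
        {
            // .NET does not recognise this name; log it and extend the table.
            return null;
        }
    }
}
```

Logging every name that comes back null is a cheap way to find out which aliases the table still needs. On .NET Core and later, some legacy code pages are only available after registering CodePagesEncodingProvider, so a null result there may mean "not registered" rather than "unknown name".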

Do you see what I mean?

Jon
Jun 27 '08 #8
Making a mapping was a really good idea. I have gathered some data since yesterday, and these are the charsets I found:

Charset               Count
UTF-8; format=flowed    238
iso-2022-jp              14
koi8-r; format=flowe     23
windows-1257; format      3
Windows-1252            163
ISO-8859-1; format=f    982
us-ascii              21979
windows-1256              1
gb2312                    3
KOI8-R                    5
windows-1256; format      1
ISO-8859-9                2
windows-1251              8
(not defined)           140
us-ascii; format=flo     10
ISO-8859-15; format=     41
ISO-8859-2; format=f     17
windows-1251; format      1
iso-8859-15              70
Utf-8                  2468
windows-1257             16
ISO-8859-1             5086
iso-8859-2               18
windows-1252; format      7

There are actually 140 where the charset is not defined, and it also seems like the string is cut off after 20 characters. I am not sure yet; maybe there are some bugs in the Smilla library... I'll check it out and let you know...
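
Several of the values above carry a trailing "; format=..." parameter, and some are empty. A small sketch of cleaning them before looking up an encoding is shown here; the class name and the fallback argument are assumptions for illustration. MIME's default for text with no charset is us-ascii, so that is a plausible fallback, but whether it suits this data is a judgment call.

```csharp
using System;

public static class CharsetCleaner
{
    // Reduces a raw value such as "UTF-8; format=flowed" to "UTF-8".
    // Returns the fallback when nothing usable is left.
    public static string Normalize(string raw, string fallback)
    {
        if (raw == null)
            return fallback;

        // Drop any parameters that follow the charset name.
        int semicolon = raw.IndexOf(';');
        string name = semicolon >= 0 ? raw.Substring(0, semicolon) : raw;

        // Remove surrounding whitespace and optional double quotes.
        name = name.Trim().Trim('"');
        return name.Length == 0 ? fallback : name;
    }
}
```

Because the name comes before the semicolon, this also copes with values truncated at 20 characters, such as "koi8-r; format=flowe".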

kisses Lisa
--
--------------------------------- --- -- -
Posted with NewsLeecher v3.8 Final
Web @ http://www.newsleecher.com/?usenet
------------------- ----- ---- -- -
Jun 27 '08 #9
