
Convert text encoded with character references (&#123;) to Unicode or UTF-8

Daniel Köster
#1: Nov 22 '05
Does anyone have tips on how to convert text encoded with character
references (&#123;) to Unicode or UTF-8 using VB.NET? Is there a function
or something that can help with the conversion?

Using a simple replace of "this" with "that" is not an option, since there
are some Asian texts that I need to convert as well (Chinese, Thai and
Japanese); the replacement list would be too large to handle.

What I want to do is compare a file encoded with character references
(e.g. &#123;) with a file encoded with normal Unicode characters
(e.g. ö, ä, å).

Best regards
Daniel


Jon Skeet [C# MVP]
#2: Nov 22 '05


Just do "normal" parsing to find each &#xxx; sequence, then use
Substring (or similar) to extract the xxx part, parse it as an integer
(Int32.Parse or Convert.ToInt32), and cast the result to a character.
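A minimal sketch of those parsing steps, written in Python purely as an illustration rather than VB.NET. The helper name decode_decimal_refs is hypothetical, and it assumes only decimal references such as &#123; need handling; hex forms (&#x7B;) and named entities (&ouml;) are left untouched. In VB.NET the final step would be ChrW, or Char.ConvertFromUtf32 for code points above U+FFFF, since a single Char cannot hold those.

```python
def decode_decimal_refs(text: str) -> str:
    """Replace decimal character references like '&#123;' with the characters they encode.

    Hypothetical helper: hex references (&#x7B;) and named entities (&ouml;) are not decoded.
    """
    out = []
    pos = 0
    while True:
        start = text.find("&#", pos)           # find the next "&#"
        if start == -1:
            out.append(text[pos:])             # no more references: copy the rest
            break
        end = text.find(";", start + 2)        # find the closing ";"
        digits = text[start + 2:end] if end != -1 else ""
        if digits.isascii() and digits.isdigit() and int(digits) <= 0x10FFFF:
            out.append(text[pos:start])        # text before the reference
            out.append(chr(int(digits)))       # parse as an integer, convert to a character
            pos = end + 1
        else:
            out.append(text[pos:start + 2])    # not a decimal reference: keep "&#" as-is
            pos = start + 2
    return "".join(out)
```

Because each reference is decoded from its number, no per-language replacement table is needed for Chinese, Thai or Japanese text.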

Cor Ligthert
#3: Nov 22 '05

:-)


Mihai N.
#4: Nov 22 '05

> Just do "normal" parsing to find the &#xxx; to start with, then use
> Substring (or whatever) to get the xxx bit, parse it as an integer
> (Int32.Parse or Convert.ToInt32) and cast the result to a character.

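Or just use HttpUtility.HtmlDecode (in the System.Web namespace), which
turns character references such as &#123; back into the characters they
stand for; HttpUtility.HtmlEncode is its encoding counterpart.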
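The same idea with a library decoder, again as a Python sketch rather than .NET code: html.unescape from Python's standard library plays roughly the role HttpUtility.HtmlDecode plays here, and it also handles hex references and named entities. The helpers nfc, same_text and same_files, and the assumption that both files are UTF-8, are illustrative only. The NFC normalization step is an addition not mentioned in the thread; it makes a precomposed ö and an o followed by a combining diaeresis compare equal.

```python
import html
import unicodedata
from pathlib import Path


def nfc(text: str) -> str:
    # Canonical composition, so "o" + U+0308 and U+00F6 ("ö") become identical.
    return unicodedata.normalize("NFC", text)


def same_text(with_refs: str, plain: str) -> bool:
    # Decode &#123;, &#x7B; and named entities like &ouml; in the first text only,
    # then compare both texts after normalization.
    return nfc(html.unescape(with_refs)) == nfc(plain)


def same_files(path_with_refs: str, path_plain: str) -> bool:
    # Assumption: both files are stored as UTF-8.
    with_refs = Path(path_with_refs).read_text(encoding="utf-8")
    plain = Path(path_plain).read_text(encoding="utf-8")
    return same_text(with_refs, plain)
```

Usage would be something like same_files("with_refs.txt", "plain.txt"), where both file names are hypothetical. Only the file with references is unescaped, so literal text such as "&lt;" in the plain file is not altered.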


Daniel Köster
#5: Nov 22 '05

Thank you very much!!!

Best regards
Daniel

