Special character to &abc equivalents

Hi,

I'm reading a file and writing it to the html output for a page.

I've come across two difficulties which I would like to solve.

The files contain special characters from European alphabets, namely
those which have the two little dots above the vowels called umlauts.

Normally, these are rendered in HTML using "&auml;", but in the file
they are just ä.

1. I'm using a StreamReader to read the file and I have found that if I
don't use System.Text.Encoding.UTF7 then the characters are lost
completely. Is this the correct way, or is there a way to automatically
get the Stream Reader to select the correct encoding, or use other code
to determine which would be best?

2. Having read the character from the file, it is output literally to
the HTML, which I guess is to be expected. Is there a way to process a
string in order to change the ä to &auml; and so on?

Thanks in advance for any replies.
Nov 19 '05 #1
My advice is to set the underlying operating system's encoding to whatever
you need, and then use StreamReader and StreamWriter with
System.Text.Encoding.Default, which uses the OS encoding.

I had the same problem with Turkish encoding, and this was the best solution
(IMHO).

--

Thanks,
Yunus Emre ALPÖZEN
BSc, MCAD.NET

Nov 19 '05 #2
Colin Peters wrote:
> 1. I'm using a StreamReader to read the file and I have found that if
> I don't use System.Text.Encoding.UTF7 then the characters are lost
> completely.

UTF-7 is hardly what you want. Did you try ISO-8859-1? Or Windows-1252?

> Is this the correct way, or is there a way to
> automatically get the Stream Reader to select the correct encoding,
> or use other code to determine which would be best?

In general, there's no way to guess a character encoding because
there's no universal metadata that could tell you what encoding is
being used.

To put it differently: You must know the encoding, or allow the user to
switch between possible encodings.

> 2. Having read the character from the file, it is output literally to
> the html, which I guess is to be expected. Is there a way to process
> a string in order to change the ä to &auml; and so on.

That's not necessary if the page is encoded correctly.

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
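
A minimal sketch of the point made in this reply, that the bytes themselves do not say which encoding produced them. The class and method names (`DecodeDemo`, `Decode`) and the sample bytes are made up for illustration; only `Encoding.GetEncoding` and `GetString` come from the framework.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Hypothetical helper: decodes raw bytes with the named encoding.
    public static string Decode(byte[] bytes, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }

    public static void Main()
    {
        // 0xE4 is "ä" in ISO-8859-1; on its own it is not valid UTF-8.
        byte[] sample = { 0x61, 0xE4 };

        string asLatin1 = Decode(sample, "ISO-8859-1");
        string asUtf8 = Decode(sample, "UTF-8");

        // Decoded as ISO-8859-1, the umlaut survives.
        Console.WriteLine(asLatin1 == "a\u00E4");

        // Decoded as UTF-8, the byte is replaced by U+FFFD,
        // which is how the character gets "lost".
        Console.WriteLine(asUtf8.IndexOf('\uFFFD') >= 0);
    }
}
```

The same two bytes give different strings depending on the decoder, so the reader has to be told the encoding up front.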
Nov 19 '05 #3
Yunus Emre ALPÖZEN [MCAD.NET] wrote:
> My advice is to set the underlying operating system's encoding to whatever
> you need, and then use StreamReader and StreamWriter with
> System.Text.Encoding.Default, which uses the OS encoding.

Unfortunately I'm using shared hosting. I have little influence over
operating system parameters.

Thanks anyway.
Nov 19 '05 #4
Joerg Jooss wrote:
> UTF-7 is hardly what you want. Did you try ISO-8859-1? Or Windows-1252?

I didn't see these offered by IntelliSense as options on the class
System.Text.Encoding.

Thanks anyway.
Nov 19 '05 #5
You can set the response encoding in the Page directive, for example:

<%@ Page Language="VB" ResponseEncoding="UTF-8" %>

or

<%@ Page Language="C#" ResponseEncoding="ISO-8859-1" %>

Juan T. Llibre
ASP.NET MVP
http://asp.net.do/foros/
ASP.NET forums in Spanish
Come, let's talk about ASP.NET...
Nov 19 '05 #6
Colin Peters wrote:
> Joerg Jooss wrote:
> > UTF-7 is hardly what you want. Did you try ISO-8859-1? Or
> > Windows-1252?
>
> I didn't see this as an option provided by Intellisense for the class:
> System.Text.Encoding

The Encoding class exposes only a few encodings as static properties. You
can obtain any supported encoding by name using Encoding.GetEncoding(), e.g.

Encoding enc = Encoding.GetEncoding("ISO-8859-1");

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
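
Putting the `Encoding.GetEncoding()` tip together with the StreamReader from the original question might look like the sketch below. The class name `ReadDemo`, the helper `ReadAll`, and the temporary sample file are assumptions for illustration, and the file is assumed to be ISO-8859-1.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadDemo
{
    // Hypothetical helper: reads a whole file using the named encoding.
    public static string ReadAll(string path, string encodingName)
    {
        Encoding enc = Encoding.GetEncoding(encodingName);
        using (StreamReader reader = new StreamReader(path, enc))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Write a small ISO-8859-1 sample file: the bytes of "März".
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, new byte[] { 0x4D, 0xE4, 0x72, 0x7A });
            Console.WriteLine(ReadAll(path, "ISO-8859-1") == "M\u00E4rz");
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

On newer .NET (Core and later), ISO-8859-1 is available by default, but Windows-1252 is only available after registering `CodePagesEncodingProvider.Instance` via `Encoding.RegisterProvider`. That does not apply to the .NET Framework version discussed in this thread.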
Nov 19 '05 #7
Aha! The penny has dropped. Or in this case, the Euro.

Many thanks to all.

Nov 19 '05 #8
Server.HtmlEncode(string) will convert any "special chars" from a text file
to the relevant &abc; equivalent without having to worry about codepages... I
use it in my chat application to prevent malicious code being inserted into
the database.

Regards,

Paul Parkinson (www.elysaria.com)
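
A small sketch of the HTML-encoding step described above. `Server.HtmlEncode` needs an ASP.NET page context, so this standalone example substitutes `System.Net.WebUtility.HtmlEncode`, which is an assumption made for runnability and not the call from the post. The names `EscapeDemo` and `ToHtml` are hypothetical.

```csharp
using System;
using System.Net;

public static class EscapeDemo
{
    // Hypothetical helper: escapes text for safe inclusion in HTML.
    public static string ToHtml(string text)
    {
        return WebUtility.HtmlEncode(text);
    }

    public static void Main()
    {
        // Markup characters are replaced with entities.
        Console.WriteLine(ToHtml("<b>Tom & Jerry</b>"));
        // prints: &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;

        // Non-ASCII input is escaped too; inspect the output on your framework.
        Console.WriteLine(ToHtml("M\u00E4rz"));
    }
}
```

Two caveats: the encoder works on a string that has already been decoded, so the file still has to be read with the right encoding first. Also, depending on the framework version, an umlaut may come out as a numeric reference such as `&#228;` instead of the named `&auml;`. Browsers render both the same way.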

Nov 19 '05 #9
