UTF-8 Encoding

During the development cycle I receive HTML files from designers who work on Macs and PCs but use tools other than Visual Studio, so these files are sometimes not UTF-8 encoded.

I see that Visual Studio creates a globalization tag with UTF-8 as the
requestEncoding and responseEncoding.

I have two questions regarding this:
1. Does the globalization tag convert an ANSI-encoded file into UTF-8 when
it compiles the ASPX and ASCX pages?
2. Is there an MS tool (or 3rd-party tool) that can quickly tell me whether a file
is UTF-8 encoded and batch-convert a set of files to UTF-8? I have UltraEdit, but
it requires me to open each file, view the encoding, and select the conversion from
the menu.

Thanks.
Nov 19 '05 #1
Hi Jmh,

Welcome to the ASP.NET newsgroup.
The encoding of ASPX pages in VS.NET and the ASP.NET runtime follows the rules below.

Strings hardcoded in a code file (.cs or .vb) are compiled into the assembly at
compile time, so we do not need to worry about them. Strings in .aspx (and .ascx)
files are compiled dynamically into an assembly at runtime, and those are what we
need to take care of.

1. When developing an .aspx page in VS.NET, the IDE saves the page using the
machine's default ANSI code page (determined by the System Locale). You can also
use the save options to change the file to UTF-8 or Unicode encoding manually.

The <globalization> element in web.config has a fileEncoding attribute that
specifies the encoding of .aspx files and other dynamic resources (.ascx, ...).
The ASP.NET runtime uses this encoding to parse the .aspx pages. By default,
<globalization> does not explicitly set fileEncoding, which means the runtime
uses the machine's default ANSI code page (System Locale) to load the .aspx files.
That is fine when we develop and run the ASP.NET pages on the same machine. But
if we develop pages on one box and deploy them to another server that may have a
different System Locale, it is recommended that we explicitly save the .aspx
files with a certain charset (encoding) and specify the same value for
fileEncoding in web.config.
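For illustration, a minimal <globalization> section might look like the following
(the utf-8 values here are just an example; use whatever encoding the files were
actually saved with):

<!-- web.config (illustrative values only) -->
<configuration>
  <system.web>
    <globalization
        fileEncoding="utf-8"
        requestEncoding="utf-8"
        responseEncoding="utf-8" />
  </system.web>
</configuration>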

2. After the ASP.NET runtime has successfully parsed the .aspx file and loaded the
strings into memory, all strings (characters) in .NET are represented as UTF-16
in memory, no matter what charset they were encoded with in the source file. When
ASP.NET is about to render the page content to the client, it encodes the
in-memory strings using the charset (encoding) specified by the <globalization>
element's "responseEncoding" attribute.
The "requestEncoding" attribute specifies the charset (encoding) used to decode
the incoming bytes from the client side (query string, cookies, ...).

Both of these can be overridden in code through Request.ContentEncoding and
Response.ContentEncoding.
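
As a rough sketch of that override (purely illustrative; forcing UTF-8 here is
just an example, and where in the page lifecycle you set these is up to you):

// In a page's code-behind; overrides the <globalization> defaults for this request.
protected void Page_Load(object sender, EventArgs e)
{
    // Decode incoming request bytes (query string, form, ...) as UTF-8.
    Request.ContentEncoding = System.Text.Encoding.UTF8;

    // Encode the rendered response as UTF-8.
    Response.ContentEncoding = System.Text.Encoding.UTF8;
}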

In addition, regarding the
=================
Is there an MS tool (or 3rd-party tool) that can quickly tell me whether a file
is UTF-8 encoded and batch-convert a set of files to UTF-8?
================
question you mentioned, I am not aware of any existing tool. However, we can
check whether a file is UTF-8 encoded by reading its first bytes: UTF-8 files
saved by most Windows tools start with a three-byte BOM (analogous to the
two-byte BOM of a UTF-16 text file). See below:

#Byte Order Mark FAQ (from www.unicode.org)
http://www.websina.com/bugzero/kb/unicode-bom.html
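
As a small illustration (not a definitive test, since a UTF-8 file is not
required to carry a BOM), checking for the EF BB BF marker could look like this:

using System;
using System.IO;

class BomCheck
{
    // Returns true if the file starts with the UTF-8 BOM (EF BB BF).
    static bool HasUtf8Bom(string path)
    {
        byte[] bom = new byte[3];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = fs.Read(bom, 0, 3);
            return read == 3 && bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF;
        }
    }

    static void Main(string[] args)
    {
        foreach (string path in args)
            Console.WriteLine("{0}: {1}", path,
                HasUtf8Bom(path) ? "has UTF-8 BOM" : "no UTF-8 BOM");
    }
}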

Also, we can open a text file in Notepad and choose Save As: if Notepad has
successfully loaded the file as UTF-8, the encoding in the Save As dialog will
already be set to UTF-8.
As for batch-converting files between code pages/charsets, we can write a small
utility with .NET's System.Text and System.IO APIs to convert them, as long as we
know the source and destination charsets.
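
A minimal sketch of such a utility, assuming the source files are in the machine's
default ANSI code page (the folder and pattern arguments are just placeholders):

using System;
using System.IO;
using System.Text;

class BatchToUtf8
{
    // Usage example: BatchToUtf8 C:\site *.html
    static void Main(string[] args)
    {
        string folder = args[0];
        string pattern = args[1];

        // Assumed source encoding: the machine's default ANSI code page.
        Encoding source = Encoding.Default;

        foreach (string path in Directory.GetFiles(folder, pattern))
        {
            // Decode with the source encoding into .NET's in-memory (UTF-16) string...
            string text = File.ReadAllText(path, source);

            // ...then write it back out as UTF-8 (Encoding.UTF8 emits a BOM here).
            File.WriteAllText(path, text, Encoding.UTF8);
        }
    }
}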

These are just my understandings; if you have any other questions or ideas,
please feel free to post here.
Hope this helps.

Steven Cheng
Microsoft Online Support

Get Secure! www.microsoft.com/security
(This posting is provided "AS IS", with no warranties, and confers no
rights.)




Nov 19 '05 #2