Autodetect UTF-8 vs ISO-8859-1

As part of an error correction mechanism, I'd like to
autodetect ISO-8859-1 vs UTF-8 usage. Where is this
described concisely?

-Lars

Jul 20 '05 #1
Lars <la**@nospam.ac.no> writes:
> As part of an error correction mechanism, I'd like to autodetect
> ISO-8859-1 vs UTF-8 usage. Where is this described concisely?

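For an arbitrary text file, it is impossible to distinguish between
the two automatically and be 100 percent sure of choosing correctly.
However, if the file contains no invalid UTF-8 sequences, it is almost
certainly UTF-8. A pure-ASCII file is valid in both and decodes the
same either way. In UTF-8, every byte above 0x7F must be part of a
multi-byte sequence: a lead byte followed by the right number of
continuation bytes (0x80 to 0xBF). Latin-1 text rarely produces that
pattern by accident, so it would be a very unusual ISO-8859-1 file
with accented or other non-ASCII characters that did not contain
invalid UTF-8 sequences.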
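The heuristic above can be sketched in a few lines of Python. This is a minimal illustration, not a standard API. It assumes the whole file is already in memory as bytes. The helper names (is_valid_utf8, guess_encoding, decode_text) are hypothetical. It relies on Python's built-in strict "utf-8" codec to do the validation.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8.

    Python's "utf-8" codec is strict by default. It rejects stray
    continuation bytes, truncated sequences, overlong forms and encoded
    surrogates, so a failed decode means "not UTF-8".
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic: valid UTF-8 -> "utf-8", else "iso-8859-1".

    Every byte string is valid ISO-8859-1, so the fallback always decodes.
    Pure-ASCII input is reported as "utf-8"; it decodes identically anyway.
    """
    return "utf-8" if is_valid_utf8(data) else "iso-8859-1"


def decode_text(data: bytes) -> str:
    """Decode `data` with whichever encoding guess_encoding() picks."""
    return data.decode(guess_encoding(data))
```

For example, `guess_encoding("café".encode("iso-8859-1"))` returns "iso-8859-1". The lone byte 0xE9 announces a three-byte UTF-8 sequence that never arrives. The residual uncertainty also shows up here. The Latin-1 text "Ã©" is stored as the bytes 0xC3 0xA9, which are also the valid UTF-8 encoding of "é", so this heuristic chooses UTF-8 for it.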

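For XML files, it's much simpler: if the document is ISO-8859-1, that
has to be declared in the XML declaration. Without a declaration or
other external information, an XML parser must assume UTF-8 or UTF-16.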
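Reading the declared encoding from the start of an XML document can be sketched as below. The function name xml_declared_encoding and its `default` parameter are hypothetical. The regular expression is a simplification of the XML 1.0 grammar, and the sketch ignores byte-order marks, UTF-16 and other non-ASCII-compatible encodings. In practice, an XML parser such as Python's xml.etree.ElementTree reads the declaration itself.

```python
import re

# An XML declaration must be the very first thing in the document.
# This pattern captures the value of its optional encoding
# pseudo-attribute (EncName in the XML 1.0 grammar); re.match()
# anchors the search at the start of the data.
_XML_DECL = re.compile(
    rb"<\?xml\s+version\s*=\s*([\"'])[^\"']*\1"
    rb"(?:\s+encoding\s*=\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\2)?"
)


def xml_declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, lowercased.

    With no declaration, or no encoding pseudo-attribute, XML 1.0
    requires UTF-8 or UTF-16. This sketch does not look for a
    byte-order mark and simply returns `default` in that case.
    """
    m = _XML_DECL.match(data)
    if m and m.group(3):
        return m.group(3).decode("ascii").lower()
    return default
```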

--
Dean Tiegs, NE¼-20-52-25-W4
“Confortare et esto robustus” (“Take courage and be strong”)
http://telusplanet.net/public/dctiegs/
Jul 20 '05 #2
Dean Tiegs wrote:
> For XML files, it's much simpler: if it is ISO-8859-1, it has to be
> declared in the XML declaration.

Or by some lower-level protocol, such as the charset parameter of the HTTP Content-Type header.
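HTTP reuses MIME's Content-Type syntax. One way to pull out the charset parameter, sketched under that assumption, is to let the standard library's email.message.Message parse it. The wrapper name charset_from_content_type is hypothetical. When both the header and the XML declaration name an encoding, the protocol-level charset is generally treated as taking precedence.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: str,
                              default: Optional[str] = None) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lowercased.

    Message handles MIME parameter parsing, including stripping quotes.
    get_content_charset() returns `default` when no charset is present.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)
```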
--
Johannes Koch
In te domine speravi; non confundar in aeternum.
(“In thee, O Lord, have I trusted: let me never be confounded.” Te Deum, 4th cent.)

Jul 20 '05 #3