How to test the encoding of a file?

Hello,
Do you know a good program to test what sort of character encoding
is used in a file?
I use iconv, but it can only translate from one character encoding to
another. The problem is that I have some files, and the way I get them
doesn't assure me that the encoding they claim to use is the one
they actually use.

Thanks for threading on this subject with me.

P.S. I don't think that testing all the encoding possibilities with
iconv is a good solution.
Jul 20 '05 #1
YGUEL wrote:
I have seen Appendix F of XML 1.0, but does any code exist that
does that?

Jul 20 '05 #2

"YGUEL" <ma**********@libertysurf.fr> wrote in message
news:53**************************@posting.google.com...
> Hello,
> Do you know a good program to test what sort of character encoding
> is used in a file?
Conformant XML parsers do this up to a certain point (the ones that implement
Appendix F of the XML 1.0 spec, as you mentioned).
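
To give an idea of what Appendix F describes, here is a minimal Python sketch of the first step: guessing the encoding family from the first few bytes (byte order mark, or the byte layout of "<?xm"). The function name sniff_xml_encoding and the "ascii-compatible" / "ebcdic" labels are my own, not from any library, and a real parser goes on to read the encoding declaration after this step.

```python
import codecs

def sniff_xml_encoding(head: bytes) -> str:
    """Guess an encoding family from the first bytes of an XML file.

    Hypothetical helper in the spirit of XML 1.0 Appendix F; it only
    looks at the first four bytes and never reads the declaration.
    """
    # Byte order marks. The UTF-32 marks are checked before the UTF-16
    # ones because the UTF-32-LE mark starts with the UTF-16-LE mark.
    boms = [
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
    ]
    for bom, name in boms:
        if head.startswith(bom):
            return name

    # No byte order mark: look at how "<?xm" (or "<") is laid out.
    patterns = [
        (b"\x00\x00\x00<", "utf-32-be"),
        (b"<\x00\x00\x00", "utf-32-le"),
        (b"\x00<\x00?", "utf-16-be"),
        (b"<\x00?\x00", "utf-16-le"),
        (b"<?xm", "ascii-compatible"),   # read the declaration next
        (b"\x4c\x6f\xa7\x94", "ebcdic"),  # "<?xm" in EBCDIC
    ]
    for pattern, name in patterns:
        if head.startswith(pattern):
            return name

    # Neither a mark nor a declaration: the spec's default is UTF-8.
    return "utf-8"
```

"ascii-compatible" only means the declaration can be read as ASCII; the actual encoding (UTF-8, ISO-8859-1, ...) still has to come from the encoding="..." attribute.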
> I use iconv, but it can only translate from one character encoding to
> another. The problem is that I have some files, and the way I get them
> doesn't assure me that the encoding they claim to use is the one
> they actually use.

The problem here is that there is no idiot-proof way to do this.
Take this kind of document, for example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<doc>*</doc>

where * would be the copyright sign, for example (byte value xA9 in
ISO-8859-1). BUT suppose that, despite ISO-8859-1 being specified, the
document was actually saved in UTF-8, so * was saved as the bytes
xC2 xA9. Now if you load that file with an XML parser and write the
result out as UTF-8, you get xC3 x82 xC2 xA9 (the first two bytes are
xC2 converted to UTF-8, and the last two bytes are xA9 converted to UTF-8).
Since the bytes xC2 and xA9 are perfectly legal Latin-1 characters, how
would you detect that the file was saved in the wrong encoding?
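
The byte arithmetic above can be reproduced in a few lines of Python. This is only an illustration of the example, plus one common heuristic (not a proof): if the data contains non-ASCII bytes and still decodes strictly as UTF-8, it was probably saved as UTF-8. The helper name looks_like_utf8 is made up for this sketch.

```python
original = "\u00a9"                       # the copyright sign
saved = original.encode("utf-8")          # what actually ended up on disk
misread = saved.decode("iso-8859-1")      # parser trusts the declaration
reencoded = misread.encode("utf-8")       # result written back as UTF-8

print(saved)      # b'\xc2\xa9'
print(reencoded)  # b'\xc3\x82\xc2\xa9'

def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: non-ASCII bytes are present and all form valid UTF-8."""
    try:
        data.decode("utf-8")  # strict decoding raises on invalid sequences
    except UnicodeDecodeError:
        return False
    # Pure ASCII decodes fine in almost any encoding, so it tells us nothing.
    return any(byte >= 0x80 for byte in data)
```

The heuristic can still be wrong: a genuine Latin-1 file that happens to contain the two characters xC2 xA9 in a row is also valid UTF-8, which is exactly the ambiguity described above.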
> Thanks for threading on this subject with me.
>
> P.S. I don't think that testing all the encoding possibilities with
> iconv is a good solution.


If you're dealing with XML, an XML declaration with encoding="whatever"
specified would only be recognized by an XML parser, not by iconv.
There might be some solutions available that I'm not aware of, though; try Google.
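
As a rough sketch of what a small check outside a full parser could look like, the Python below pulls the declared encoding out of the XML declaration and flags files that declare something other than UTF-8 yet contain non-ASCII bytes that are all valid UTF-8. The names declared_encoding and suspicious are hypothetical, and the sketch assumes an ASCII-compatible file with no byte order mark in front of the declaration.

```python
import re

# Matches encoding="..." inside a leading XML declaration (ASCII-compatible files only).
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

def declared_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = _DECL.match(data[:200])
    return match.group(1).decode("ascii") if match else None

def suspicious(data: bytes) -> bool:
    """True if the file declares a non-UTF-8 encoding but looks like UTF-8."""
    declared = declared_encoding(data)
    if declared is None or declared.lower() in ("utf-8", "utf8"):
        return False
    try:
        data.decode("utf-8")  # strict: fails on most real Latin-1 text
    except UnicodeDecodeError:
        return False
    # Only non-ASCII content can reveal a mismatch.
    return any(byte >= 0x80 for byte in data)

doc = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<doc>\xc2\xa9</doc>'
print(declared_encoding(doc), suspicious(doc))
```

A True result is a hint to look at the file by hand, not a verdict, for the reason given in the copyright-sign example.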

with respect,
Toni Uusitalo
Jul 20 '05 #3
