How to test the encoding of a file?

Hello,
Do you know a good program for testing which character encoding is
used in a file?
I use iconv, but it can only convert from one character encoding to
another. The problem is that I have some files, and the way I get them
gives me no assurance that the encoding they claim to use is the one
they actually use.

Thanks for discussing this subject with me.

P.S. I don't think that trying every possible encoding with iconv is a
good solution.
Jul 20 '05 #1
YGUEL wrote:

I have seen Appendix F of the XML 1.0 specification, but is there any
code that implements it?

Jul 20 '05 #2

"YGUEL" <ma**********@libertysurf.fr> wrote in message
news:53**************************@posting.google.com...
> Hello,
> Do you know a good program for testing which character encoding is
> used in a file?

Conformant XML parsers do this up to a certain point (the ones that
implement Appendix F of the XML 1.0 specification, as you mentioned).
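
For illustration, the kind of detection Appendix F describes can be sketched roughly as follows in Python. This is a simplified sketch, not what any particular parser does: the name `sniff_xml_encoding` is made up for this example, the EBCDIC case from the appendix is left out, and the declaration is only read when the file starts with ASCII-compatible bytes.

```python
import codecs
import re

# Byte-order marks. UTF-32 is checked before UTF-16 because the
# UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]

# Without a BOM, the bytes of the leading "<" / "<?" reveal the code unit width.
_PREFIXES = [
    (b"\x00\x00\x00<", "UTF-32BE"),
    (b"<\x00\x00\x00", "UTF-32LE"),
    (b"\x00<\x00?", "UTF-16BE"),
    (b"<\x00?\x00", "UTF-16LE"),
]

# encoding="..." inside an XML declaration at the very start of the data.
_DECL = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._\-]*)[\"']"
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding an XML parser would assume for `data`."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    for prefix, name in _PREFIXES:
        if data.startswith(prefix):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # No BOM and no declaration: XML 1.0 defaults to UTF-8.
    return "UTF-8"
```

Note that this only reports what the file claims or appears to be at the byte-pattern level; it cannot tell whether the declared encoding is the one the content was really saved in.
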

> I use iconv, but it can only convert from one character encoding to
> another. The problem is that I have some files, and the way I get them
> gives me no assurance that the encoding they claim to use is the one
> they actually use.

The problem here is that there is no idiot-proof way to do this.
Suppose we have this kind of document, for example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<doc>*</doc>

where * stands for the copyright sign (byte 0xA9 in ISO-8859-1; it is
not an ASCII character). Suppose also that, despite ISO-8859-1 being
declared, the document was actually saved as UTF-8, so * was written
as the two bytes 0xC2 0xA9. If you now load that file with an XML
parser, it reads two characters instead of one, and writing them back
out as UTF-8 gives 0xC3 0x82 0xC2 0xA9: the first two bytes are 0xC2
converted to UTF-8, and the last two bytes are 0xA9 converted to
UTF-8. Since 0xC2 and 0xA9 are both perfectly legal Latin-1
characters, how would you detect that the file was saved in the wrong
encoding?
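
The byte values above can be reproduced with a few lines of Python. This is just a small demonstration of the scenario, with a helper name (`simulate_misdeclared`) invented for this example:

```python
def simulate_misdeclared(text: str,
                         actual: str = "utf-8",
                         declared: str = "iso-8859-1"):
    """Save `text` in one encoding, then read it back as another."""
    saved = text.encode(actual)          # bytes really written to disk
    misread = saved.decode(declared)     # what a parser trusting the declaration sees
    reencoded = misread.encode("utf-8")  # the misread text written out as UTF-8
    return saved, misread, reencoded


saved, misread, reencoded = simulate_misdeclared("\u00a9")  # copyright sign
print(saved.hex(" "))      # c2 a9
print(len(misread))        # 2 (one character became two)
print(reencoded.hex(" "))  # c3 82 c2 a9
```

No error is raised at any step, which is exactly why the mismatch goes unnoticed.
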

> Thanks for discussing this subject with me.
>
> P.S. I don't think that trying every possible encoding with iconv is a
> good solution.


If you're dealing with XML, the XML declaration with encoding="whatever"
specified is only recognized by an XML parser, not by iconv.
There might be some solutions available that I'm not aware of, though; try Google.
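
One heuristic that may help in the Latin-1 versus UTF-8 case is a sketch like the one below (Python, function names invented for this example). It relies on the observation that Latin-1 text containing accented or special characters is very unlikely to also be valid UTF-8, so data that decodes strictly as UTF-8 and contains non-ASCII bytes but is declared as something else deserves a second look. It is a heuristic only and cannot tell the various single-byte encodings apart.

```python
import codecs


def looks_like_utf8(data: bytes) -> bool:
    """True if data has non-ASCII bytes and is strictly valid UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return any(b >= 0x80 for b in data)


def check_declared_encoding(data: bytes, declared: str) -> str:
    """Return 'invalid', 'suspicious', or 'plausible' for a declared encoding."""
    # Normalize aliases, e.g. "ISO-8859-1" or "latin1" (LookupError if unknown).
    codec = codecs.lookup(declared).name
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return "invalid"      # bytes cannot be in the declared encoding at all
    if codec != "utf-8" and looks_like_utf8(data):
        return "suspicious"   # decodes, but is more likely UTF-8
    return "plausible"


as_utf8 = "<doc>\u00a9</doc>".encode("utf-8")
as_latin1 = "<doc>\u00a9</doc>".encode("iso-8859-1")
print(check_declared_encoding(as_utf8, "ISO-8859-1"))    # suspicious
print(check_declared_encoding(as_latin1, "ISO-8859-1"))  # plausible
print(check_declared_encoding(as_latin1, "UTF-8"))       # invalid
```
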

with respect,
Toni Uusitalo
Jul 20 '05 #3
