Determining possible encodings of a given text

Nordlöw

How do I efficiently determine which possible encoding(s) a given text
is in? Can I use the iconv.h api somehow?

Thanks in advance,
Nordlöw

Jun 27 '08 #1

Kevin

I don't know.
I hope an expert can answer.

"Nordlöw" <pe*********@gmail.com> wrote in message
news:e0**********************************@x41g2000hsb.googlegroups.com...

How do I efficiently determine which possible encoding(s) a given text
is in? Can I use the iconv.h api somehow?

Thanks in advance,
Nordlöw

Jun 27 '08 #2

Jens Thoms Toerring

In comp.lang.c Nordloew <pe*********@gmail.com> wrote:

>How do I efficiently determine which possible encoding(s) a given text
>is in? Can I use the iconv.h api somehow?

Sorry, but that's not a question about the C programming
language; it's about a specific task and libraries (which may
be written in C, but that doesn't make it on-topic). The basic
question would remain the same whether you used C, C++, Perl
or any other programming language.

So just a few hints: figuring out which encoding is used for a
file is probably a very difficult task, since it would require
that the program understand something about the content of
the file. It's probably possible to make a well-educated
guess if the file is long enough, but a method that always gets
it right is, as far as I can see, impossible. And libiconv
isn't going to be of any help, since it's for converting from an
already known encoding to another; it doesn't try to guess the
source encoding (except in the most trivial way, using the
locale-dependent character encoding when no source encoding
has been specified).

If you're interested in a more in-depth discussion it probably
would make sense to post to comp.programming instead.

Regards, Jens
--
\ Jens Thoms Toerring ___ jt@toerring.de
\__________________________ http://toerring.de

Jun 27 '08 #3

Richard Tobin

In article <e0**********************************@x41g2000hsb.googlegroups.com>,
Nordlöw <pe*********@gmail.com> wrote:

>How do I efficiently determine which possible encoding(s) a given text
>is in? Can I use the iconv.h api somehow?

What do you need to know?

If it doesn't contain any bytes above 127, it's probably ASCII. If it
contains lots of zeros in the even or odd positions, it's probably
UTF-16. If it contains bytes above 127 *and* they're consistent with
UTF-8, then it's almost certainly UTF-8. If it contains a small
proportion of bytes above 127, it's quite likely some ISO-Latin-N
encoding. I don't know much about Far Eastern encodings.
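
The heuristics above could be sketched in C roughly as follows. This is only a minimal sketch, not code from any library: the names `guess_encoding` and `is_valid_utf8`, the enum labels, and both thresholds (zeros in more than a quarter of the byte pairs, fewer than one high byte in five) are assumptions made for illustration. The UTF-16 test runs first because UTF-16 text made of ASCII characters contains no bytes above 127 either.

```c
#include <stddef.h>

/* Hypothetical result labels; the names are illustrative only. */
enum guess { GUESS_ASCII, GUESS_UTF8, GUESS_UTF16, GUESS_LATIN, GUESS_UNKNOWN };

/* Return 1 if buf[0..n) is well-formed UTF-8, else 0. Rejects bad lead or
   continuation bytes, truncation, overlong forms, surrogates, > U+10FFFF. */
static int is_valid_utf8(const unsigned char *buf, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = buf[i];
        unsigned long cp;
        size_t len;

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return 0;                          /* stray continuation or invalid lead */

        if (n - i < len) return 0;              /* sequence cut off at end of buffer */
        for (size_t k = 1; k < len; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000)) return 0;               /* overlong */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
        i += len;
    }
    return 1;
}

/* Apply the heuristics in order; the thresholds are assumed, not tuned. */
static enum guess guess_encoding(const unsigned char *buf, size_t n)
{
    size_t high = 0, zero_even = 0, zero_odd = 0, pairs = n / 2;

    if (n == 0) return GUESS_UNKNOWN;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] > 127) high++;
        if (buf[i] == 0) { if (i % 2 == 0) zero_even++; else zero_odd++; }
    }
    /* Lots of zeros in the even or odd positions: probably UTF-16. */
    if (pairs > 0 && (zero_even * 4 > pairs || zero_odd * 4 > pairs))
        return GUESS_UTF16;
    if (high == 0) return GUESS_ASCII;             /* no bytes above 127 */
    if (is_valid_utf8(buf, n)) return GUESS_UTF8;  /* high bytes consistent with UTF-8 */
    if (high * 5 < n) return GUESS_LATIN;          /* small proportion of high bytes */
    return GUESS_UNKNOWN;
}
```

Called on the bytes of "Nordlöw" stored as UTF-8 (`4E 6F 72 64 6C C3 B6 77`) this reports `GUESS_UTF8`; the same name stored as ISO-8859-1 (`... 6C F6 77`) fails the UTF-8 check and falls through to `GUESS_LATIN`. It cannot say which Latin-N variant is in use.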

You might look at http://jchardet.sourceforge.net/

-- Richard
--
:wq

Jun 27 '08 #4

Malcolm McLean

"Nordlöw" <pe*********@gmail.com> wrote in message

>How do I efficiently determine which possible encoding(s) a given text
>is in? Can I use the iconv.h api somehow?

It can't be done efficiently. You need to run through the common encodings
and check for plaintext.
If you don't have a set of encodings it is even more difficult.
Look up Markov modelling.
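
One way to "run through the common encodings" is to ask iconv whether the bytes convert cleanly under each candidate. iconv does not guess anything itself, but a failed conversion rules a candidate out. What follows is a minimal sketch assuming a POSIX iconv; the helper name `decodes_as` is hypothetical, and which encoding names are accepted depends on the iconv implementation installed.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..n) converts cleanly from
   `encoding` to UTF-8, 0 if iconv rejects the input, and -1 if this
   iconv implementation does not support the conversion. */
static int decodes_as(const char *encoding, const char *buf, size_t n)
{
    iconv_t cd = iconv_open("UTF-8", encoding);   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)buf;     /* iconv takes char **, but does not modify the input bytes */
    size_t inleft = n;
    int ok = 1;

    while (inleft > 0) {
        char out[256];          /* converted text is discarded; only validity matters */
        char *outp = out;
        size_t outleft = sizeof out;

        if (iconv(cd, &in, &inleft, &outp, &outleft) == (size_t)-1) {
            if (errno == E2BIG)
                continue;       /* output buffer full: reuse it and carry on */
            ok = 0;             /* EILSEQ (invalid) or EINVAL (truncated) */
            break;
        }
    }
    iconv_close(cd);
    return ok;
}
```

A caller would loop over its own list of candidate names and keep those for which `decodes_as` returns 1. This only narrows the set: single-byte encodings such as ISO-8859-1 accept nearly any byte string, so telling those apart still needs a check that the result looks like plausible text, which is where the statistical modelling comes in.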
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Jun 27 '08 #5
