On Jul 12, 2:09 am, Joe Gottman <jgott...@carolina.rr.com> wrote:
> I have an application that opens and reads a text file using an
> ifstream. Recently one or two users have entered the name of a Unicode
> file, which caused my program to crash because it couldn't handle it.
> Is there any way to determine after opening a file whether or not it
> contains Unicode characters? I don't want to read the Unicode, I just
> want to be able to detect it so I can throw an exception.
Several comments:
-- How do you write a program which crashes if a file contains
Unicode? How can it possibly matter?
-- Which encoding format? Just saying Unicode doesn't mean
anything; is it UTF-8, UTF-16 (LE or BE), or UTF-32 (LE or
BE)?
-- Typically, the OS isn't going to tell you, and nothing in
standard C++ will either. You'll just have to read a bit,
and use some heuristics. If every other byte is 0, or
almost, for example, you're probably looking at UTF-16. If
3 out of 4 bytes are 0 (more or less), UTF-32. For UTF-8,
look for the UTF-8 multibyte sequences.
-- And is Unicode really the problem? What happens if the user
specifies an executable? Or any other of a number of binary
file types?
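
To make the "read a bit, and use some heuristics" point concrete, here is a rough sketch. Everything in it is my own assumption, not anything standard: the names (Guess, readSample, isWellFormedUtf8, guessEncoding), the 4096-byte sample size, and the 60%/30% thresholds for the proportion of zero bytes. It also assumes the text is mostly Latin script, since that is what makes the zero-byte counts meaningful, and it only checks the structure of UTF-8 sequences (overlong forms and surrogates are not rejected).

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical labels for this sketch; not part of any library.
enum class Guess { Utf32, Utf16, Utf8, Ascii, Unknown };

// Read up to maxBytes raw bytes from the start of the file.
// Binary mode, so the runtime does not translate anything.
std::string readSample(const char* path, std::size_t maxBytes = 4096)
{
    std::ifstream in(path, std::ios::binary);
    std::string buf(maxBytes, '\0');
    in.read(&buf[0], static_cast<std::streamsize>(maxBytes));
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}

// Structural UTF-8 check: every lead byte must be followed by the
// right number of continuation bytes (10xxxxxx). A sequence cut off
// at the end of the buffer counts as an error here, so a sample that
// ends mid-character will be reported as not well formed.
bool isWellFormedUtf8(const std::string& buf, bool& sawMultibyte)
{
    sawMultibyte = false;
    std::size_t i = 0;
    while (i < buf.size()) {
        unsigned char c = static_cast<unsigned char>(buf[i]);
        std::size_t extra = 0;
        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;           // stray continuation or invalid lead byte
        if (extra > buf.size() - i - 1) return false;   // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char next = static_cast<unsigned char>(buf[i + k]);
            if ((next & 0xC0) != 0x80) return false;
        }
        if (extra > 0) sawMultibyte = true;
        i += extra + 1;
    }
    return true;
}

// Guess from the proportion of zero bytes first, then from UTF-8 structure.
Guess guessEncoding(const std::string& sample)
{
    if (sample.empty()) return Guess::Unknown;
    std::size_t zeros = 0;
    for (char ch : sample) {
        if (ch == '\0') ++zeros;
    }
    if (zeros * 10 > sample.size() * 6) return Guess::Utf32;  // about 3 bytes in 4
    if (zeros * 10 > sample.size() * 3) return Guess::Utf16;  // about 1 byte in 2
    if (zeros > 0) return Guess::Unknown;                     // likely binary
    bool sawMultibyte = false;
    if (!isWellFormedUtf8(sample, sawMultibyte)) return Guess::Unknown;
    return sawMultibyte ? Guess::Utf8 : Guess::Ascii;
}
```

You would call it as guessEncoding(readSample(filename)) before handing the file to the real parser. It is only a guess: a Latin-1 file or an executable will usually come back as Unknown, and a short or unusual file can fool it either way.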
In the end, I think Alf had the only realistic option: GIGO.
Presumably, your text file has some format, which you parse.
Finding unexpected characters should cause errors in the parse
(and not crash the program), so you output a message saying that
there is a problem at such and such a place in the file. (Try
renaming an executable .cpp, and feeding it to the C++ compiler.
You'll probably get a lot of error messages:-), but the compiler
shouldn't crash.)
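
As an illustration of reporting "a problem at such and such a place" instead of crashing, something along these lines would do. It is a sketch under my own assumptions: the names BadByte and findFirstUnexpected are made up, and "unexpected" is taken to mean anything other than printable ASCII, tab, CR or LF. A real parser would apply whatever its own file format allows.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type for this sketch.
struct BadByte {
    bool found;             // false if the whole stream was acceptable
    std::size_t line;       // 1-based line of the offending byte
    std::size_t column;     // 1-based column within that line
    unsigned char value;    // the byte itself
};

// Scan the stream and return the position of the first byte that is
// not printable ASCII, tab, CR or LF.
BadByte findFirstUnexpected(std::istream& in)
{
    std::size_t line = 1;
    std::size_t column = 0;
    char ch;
    while (in.get(ch)) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (c == '\n') {
            ++line;
            column = 0;
            continue;
        }
        ++column;
        bool ok = (c >= 0x20 && c < 0x7F) || c == '\t' || c == '\r';
        if (!ok) return BadByte{true, line, column, c};
    }
    return BadByte{false, 0, 0, 0};
}
```

The caller can then turn a found result into an error message or an exception naming the line and column, which works the same whether the user supplied UTF-16, an executable, or anything else the parser does not understand.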
--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34