How to find out whether a file is Unicode or not

Hi
As the subject says, how do you know if a file is Unicode, ASCII or
whatever?

ta
Jun 27 '08 #1
On Jun 25, 5:15 pm, codefragm...@googlemail.com wrote:
As the subject says, how do you know if a file is Unicode, ASCII or
whatever?
You can't find out for sure. You can read some portion of it and check
for common patterns, but it's still only going to be a guess.

Is there no way that you can require files to be in a certain encoding
in your situation?
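
One possible pattern check, as a rough sketch rather than anything posted in this thread: try to decode the bytes as strict UTF-8 and see whether the decoder objects. The class and method names below are made up for illustration. A pass is only evidence, not proof, since plain ASCII also passes and some byte sequences are valid in several encodings.

```csharp
using System;
using System.Text;

static class EncodingGuess
{
    // Returns true if the bytes decode as UTF-8 with no invalid sequences.
    // Caveat: if you pass only the first part of a file, a multi-byte
    // character cut off at the end will make this return false.
    public static bool LooksLikeUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes the decoder throw instead of
        // silently substituting U+FFFD for malformed input.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

A lone byte such as 0xE9 ("é" in Latin-1) fails this check, while the two-byte UTF-8 form 0xC3 0xA9 passes.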

Jon
Jun 27 '08 #2
On Jun 26, 1:15 am, codefragm...@googlemail.com wrote:
Hi
As the subject says, how do you know if a file is Unicode, ASCII or
whatever?

ta
I think you could check for a BOM (byte order mark) at the start of a
Unicode text file, but there's no certain, universal way to determine a
text encoding. I haven't seen any text editor that does this.
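
For what it's worth, a BOM check could look something like the sketch below. The byte values are the standard BOMs; the class name, method names and returned labels are invented for this example. A missing BOM tells you nothing, since many UTF-8 and all plain ASCII files have none.

```csharp
using System;

static class BomSniffer
{
    // Returns a label for the BOM at the start of the data, or "none".
    // UTF-32 LE is tested before UTF-16 LE because its BOM (FF FE 00 00)
    // begins with the UTF-16 LE BOM (FF FE).
    public static string DetectBom(byte[] data)
    {
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32 LE";
        if (StartsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32 BE";
        if (StartsWith(data, 0xFF, 0xFE)) return "UTF-16 LE";
        if (StartsWith(data, 0xFE, 0xFF)) return "UTF-16 BE";
        return "none";
    }

    // True if data begins with exactly the given prefix bytes.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

You would feed it the first few bytes of the file, for example the first four bytes read through a FileStream.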

Jun 27 '08 #3
On 25 Jun, 17:20, "Jon Skeet [C# MVP]" <sk...@pobox.com> wrote:
On Jun 25, 5:15 pm, codefragm...@googlemail.com wrote:
As the subject says, how do you know if a file is Unicode, ASCII or
whatever?

You can't find out for sure. You can read some portion of it and check
for common patterns, but it's still only going to be a guess.

Is there no way that you can require files to be in a certain encoding
in your situation?

Jon
Hi
Thanks for the reply, I'm new to Unicode in general.
- Can you have a file that's part Unicode and part ASCII, or are they
one or the other?
- Once the file is read into C#, is there any way of checking the loaded
strings to see if they're Unicode?
- Anyone got some example code for checking the BOM?

I want to write a noddy program to read in a file that may be ASCII or
may be Unicode. If it's Unicode, it will rewrite it as ASCII (fine so far)
and tell you that it did so. It could check the file size, which I guess
should be halved, but I'm surprised there's no easier way of doing this.
Jun 27 '08 #4
<co**********@googlemail.com> wrote:
Thanks for the reply, I'm new to Unicode in general.
- Can you have a file that's part Unicode and part ASCII, or are they
one or the other?
A file is really just a sequence of bytes. How those bytes are
interpreted is up to the programs using the file. You could certainly
have a file which changed encoding half way through - it would just be
a pain to work with.
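
To make the "just a sequence of bytes" point concrete, here is a small sketch (names are invented for illustration) that decodes the same four bytes two ways. Encoding.Unicode in .NET means UTF-16 little-endian.

```csharp
using System;
using System.Text;

static class SameBytesDemo
{
    // "Hi" encoded as UTF-16 little-endian: two bytes per character.
    public static readonly byte[] Sample = { 0x48, 0x00, 0x69, 0x00 };

    // Interprets the bytes as UTF-16 LE; Sample gives the 2-character "Hi".
    public static string ReadAsUtf16(byte[] bytes)
    {
        return Encoding.Unicode.GetString(bytes);
    }

    // Interprets the same bytes as ASCII; Sample gives 4 characters,
    // 'H', NUL, 'i', NUL, because each byte becomes its own character.
    public static string ReadAsAscii(byte[] bytes)
    {
        return Encoding.ASCII.GetString(bytes);
    }
}
```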
- Once the file is read into C#, is there any way of checking the loaded
strings to see if they're Unicode?
No, it doesn't work that way. All strings in .NET are stored as Unicode
internally. You could see whether all of the characters in the string
are part of the ASCII character set though.
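
A minimal sketch of that ASCII check, with a hypothetical helper name: since a .NET string is a sequence of UTF-16 code units, it is enough to test that each one is at most 0x7F.

```csharp
using System;

static class AsciiCheck
{
    // True if every char in the string is in the ASCII range U+0000..U+007F.
    // An empty string counts as all-ASCII.
    public static bool IsAllAscii(string text)
    {
        foreach (char c in text)
        {
            if (c > 0x7F) return false;
        }
        return true;
    }
}
```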
- Anyone got some example code for checking the BOM?
Not offhand - although I believe StreamReader has an overload to auto-
detect the BOM. Have a look at the docs to check.
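
As a sketch of how that overload might be used (the wrapper class and method are hypothetical, the StreamReader constructor and CurrentEncoding property are real): pass true for detectEncodingFromByteOrderMarks and read CurrentEncoding after the first read. The fallback encoding is whatever the caller guesses for files with no BOM.

```csharp
using System;
using System.IO;
using System.Text;

static class ReaderDetection
{
    // Reads the whole file and reports which encoding StreamReader used.
    // If the file has no BOM, the reader simply uses the fallback.
    public static string ReadWithBomDetection(
        string path, Encoding fallback, out Encoding used)
    {
        using (var reader = new StreamReader(
            path, fallback, detectEncodingFromByteOrderMarks: true))
        {
            string text = reader.ReadToEnd();
            // CurrentEncoding is only meaningful after the first read.
            used = reader.CurrentEncoding;
            return text;
        }
    }
}
```

Comparing the reported encoding with the fallback gives a program a way to say whether a BOM was found and which one.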

See http://pobox.com/~skeet/csharp/unicode.html for an introduction to
the topic.
--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon_skeet
C# in Depth: http://csharpindepth.com
Jun 27 '08 #5
