How to find out if a file is Unicode or not

Hi
As the subject says, how do you know if a file is Unicode, ASCII, or
whatever?

ta
Jun 27 '08 #1
On Jun 25, 5:15 pm, codefragm...@googlemail.com wrote:
> As the subject says, how do you know if a file is Unicode, ASCII, or
> whatever?
You can't find out for sure. You can read some portion of it and check
for common patterns, but it's still only going to be a guess.

Is there no way that you can require files to be in a certain encoding
in your situation?

Jon
Jun 27 '08 #2
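
To make "read some portion of it and check for common patterns" concrete, here is a minimal sketch. It is in Python rather than C#, purely for illustration, and the function name `guess_encoding`, the labels it returns, and the 40% threshold are all made up for this example. As Jon says, the result is only a guess.

```python
def guess_encoding(sample: bytes) -> str:
    """Best-effort label for the encoding of `sample`. A guess, not a proof."""
    if not sample:
        return "empty"

    # Mostly-Latin text stored as UTF-16 has a NUL in every other byte.
    # The 0.4 threshold is an arbitrary assumption for this sketch.
    half = max(len(sample) // 2, 1)
    even_nuls = sample[0::2].count(0)
    odd_nuls = sample[1::2].count(0)
    if odd_nuls > 0.4 * half and even_nuls == 0:
        return "utf-16-le (guess)"
    if even_nuls > 0.4 * half and odd_nuls == 0:
        return "utf-16-be (guess)"

    # Every byte below 0x80: plain ASCII (which is also valid UTF-8).
    if all(b < 0x80 for b in sample):
        return "ascii"

    # Non-ASCII bytes that happen to form valid UTF-8 sequences.
    try:
        sample.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8 (guess)"
```

Note that a sample cut off in the middle of a multi-byte UTF-8 character would come back as "unknown", which is one more reason to treat the answer as a hint only.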
On Jun 26, 1:15 am, codefragm...@googlemail.com wrote:
> Hi
> As the subject says, how do you know if a file is Unicode, ASCII, or
> whatever?
>
> ta
I think you could check for a BOM (byte order mark) at the start of a
Unicode text file, but there's no certain and universal way to determine
a text encoding. I haven't seen any text editor that does this.

Jun 27 '08 #3
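
A BOM check along the lines suggested above could look roughly like this. It is a sketch in Python rather than C#, for illustration only; `detect_bom` and the `_BOMS` table are names invented here, while the `codecs.BOM_*` constants are from the Python standard library. A missing BOM tells you nothing, since plenty of Unicode files are written without one.

```python
import codecs

# Longer BOMs go first: the UTF-32 LE BOM (FF FE 00 00) begins with the
# same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(head: bytes):
    """Return the encoding named by a leading BOM, or None if there is no BOM."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

Passing the first four bytes of the file is enough. One known ambiguity: a UTF-16 LE file whose first character after the BOM is U+0000 looks the same as a UTF-32 LE BOM.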
On 25 Jun, 17:20, "Jon Skeet [C# MVP]" <sk...@pobox.com> wrote:
> On Jun 25, 5:15 pm, codefragm...@googlemail.com wrote:
> > As the subject says, how do you know if a file is Unicode, ASCII, or
> > whatever?
>
> You can't find out for sure. You can read some portion of it and check
> for common patterns, but it's still only going to be a guess.
>
> Is there no way that you can require files to be in a certain encoding
> in your situation?
>
> Jon
Hi
Thanks for the reply, I'm new to Unicode in general.
- Can you have a file that's part Unicode and part ASCII, or are they
one or the other?
- Once the file is read into C#, is there any way of checking the loaded
strings to see if they're Unicode?
- Anyone got some example code for checking the BOM?

I want to write a noddy program to read in a file that may be ASCII,
or may be Unicode. If it's Unicode it will rewrite it
as ASCII (fine so far) and tell you that it did it. It could check
the file size, which I guess should be halved,
but I'm surprised there's no easier way of doing this?
Jun 27 '08 #4
<co**********@googlemail.com> wrote:
> Thanks for the reply, I'm new to Unicode in general.
> - Can you have a file that's part Unicode and part ASCII, or are they
> one or the other?
A file is really just a sequence of bytes. How those bytes are
interpreted is up to the programs using the file. You could certainly
have a file which changed encoding half way through - it would just be
a pain to work with.
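
To illustrate "just a sequence of bytes": the same five bytes give different text depending on which encoding the reader assumes. A small sketch in Python (not C#, just for illustration; the byte values are picked for this example):

```python
# Five bytes on disk. Nothing in them says which encoding was intended.
data = bytes([0x63, 0x61, 0x66, 0xC3, 0xA9])

as_utf8 = data.decode("utf-8")      # C3 A9 is read as one character: é
as_latin1 = data.decode("latin-1")  # C3 and A9 are read as two characters: Ã ©

print(as_utf8)    # café
print(as_latin1)  # cafÃ©
```

Both decodes succeed without error, which is why inspecting the bytes alone cannot settle the question.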
> - Once the file is read into C#, is there any way of checking the loaded
> strings to see if they're Unicode?
No, it doesn't work that way. All strings in .NET are stored as Unicode
internally. You could see whether all of the characters in the string
are part of the ASCII character set though.
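
The "are all characters in the ASCII set" check is a one-liner in most languages. A sketch in Python (for illustration only, not C#; `is_all_ascii` is a name made up here):

```python
def is_all_ascii(text: str) -> bool:
    """True if every character has a code point below 128, i.e. is in ASCII."""
    return all(ord(ch) < 128 for ch in text)
```

Python 3.7 and later also ship this as the built-in `str.isascii()`. The same idea in C# would be a loop over the string's characters comparing each against 127.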
> - Anyone got some example code for checking the BOM?
Not offhand - although I believe StreamReader has an overload to auto-
detect the BOM. Have a look at the docs to check.
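
For the "noddy program" itself, the overall shape could be something like the sketch below. It is Python rather than C# and only an illustration: `rewrite_as_ascii` is a hypothetical name, and it assumes that a file without a BOM is already ASCII and should be left alone, which, per the rest of this thread, is a guess rather than a guarantee.

```python
import codecs

def rewrite_as_ascii(raw: bytes):
    """Return (ascii_bytes, was_converted) for the raw contents of a file."""
    if raw.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" decodes UTF-8 and drops the leading BOM.
        text = raw.decode("utf-8-sig")
    elif raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # "utf-16" reads the BOM to pick the byte order, then drops it.
        text = raw.decode("utf-16")
    else:
        # Assumption for this sketch: no BOM means "already ASCII, do nothing".
        return raw, False
    # Characters with no ASCII equivalent are replaced with "?".
    return text.encode("ascii", errors="replace"), True
```

Usage would be to read the file in binary mode, call the function, and write the result back only when the second value is true. Checking the file size is not needed; for UTF-8 input the size would not halve anyway.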

See http://pobox.com/~skeet/csharp/unicode.html for an introduction to
the topic.
--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon_skeet
C# in Depth: http://csharpindepth.com
Jun 27 '08 #5