Bytes | Software Development & Data Engineering Community

How to determine stream type?

Given a file, how do I know if it's ASCII, Unicode or binary? And how
do I know if it's RTF, HTML, etc.? In other words, how do I find the
stream type or MIME type?
(No, the file extension cannot be the answer.)

Thanks

Nov 15 '05 #1

There's no way of doing it, basically. A stream is just a sequence of
bytes, and it's perfectly possible to have a stream of bytes which is a
valid document when viewed from more than one perspective (e.g. a text
file in two different encodings).
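A small sketch of the point, in Python rather than C# (the bytes behave the same way in any language): encoding "café" as UTF-8 produces bytes that also decode without error as Latin-1, just to different text. The variable names are illustrative only.

```python
# The same four-plus-one bytes are "valid" under two different encodings.
data = "café".encode("utf-8")  # b'caf\xc3\xa9'

# UTF-8 reading: the two bytes C3 A9 form a single character, e-acute.
as_utf8 = data.decode("utf-8")

# Latin-1 maps every byte value to a character, so it never fails;
# here C3 and A9 become two separate characters.
as_latin1 = data.decode("latin-1")

print(as_utf8, as_latin1)
```

Nothing in the bytes themselves says which reading was intended.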

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 15 '05 #2

Only large-system operating systems such as VMS [DEC / Compaq] and MVS [IBM]
make any formal distinction between file types. In these systems there are
even physical differences between file types, insofar as they are stored
differently and are accessed with different code routines.

Under operating systems such as DOS / Windows-family, and *NIX / Linux, a
'file' is merely a named, persistent collection of bytes, and the only way
to tell whether a file contains data that is to be interpreted as text or
as binary is by adherence to some convention, such as file extension usage
[e.g. '.txt' indicates a text file], or by schemes such as searching files
for 'magic numbers' [i.e. byte sequences known to uniquely identify file
types], a technique heavily used in the *NIX / Linux world [the latter
systems also make distinctions between things like sockets and devices at
the operating system level, but this hardly helps in identifying file types].
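The magic-number scheme can be sketched in Python as a table of byte prefixes. The signatures are a small, illustrative sample (PNG, GIF, JPEG, PDF, RTF, ZIP and the OLE2 compound-file header used by legacy Word/Excel files); the function name `sniff_magic` and the `application/x-ole-storage` label are assumptions for this sketch, not part of any framework API.

```python
# Illustrative sample of well-known signatures ("magic numbers");
# a real sniffer would carry a much larger table.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"{\\rtf", "application/rtf"),
    (b"PK\x03\x04", "application/zip"),
    # OLE2 compound-file header, shared by legacy .doc/.xls/.ppt files.
    # The label is unofficial (assumed for this sketch).
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/x-ole-storage"),
]


def sniff_magic(header: bytes):
    """Return a MIME-type guess for the leading bytes, or None if nothing matches."""
    for signature, mime in MAGIC_NUMBERS:
        if header.startswith(signature):
            return mime
    return None
```

Typical use is `sniff_magic(f.read(16))` on a file opened in binary mode. A match only says the leading bytes fit a pattern: the ZIP signature is shared by .jar and other ZIP-based formats, and the OLE2 header alone cannot tell Word from Excel.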

Thus, the answer is: there is no way of guaranteeing what a file's 'type'
actually is. All you can do is adhere to some convention, and hope that
everyone else follows suit. When attempting to access a particular file you
would check to ensure that the data read in conforms to the expected pattern
/ format for that file type.

For example, an HTML file could be expected to contain an <HTML> tag
somewhere near the start of the file, while many proprietary file formats
[e.g. MS Excel, Word etc.] would sport a byte collection known as a 'header'
containing 'fields' with version information and the like. If, in reading
such files, the expected tags are found, or 'sensible' values for each
field are read in, then you can be reasonably sure [though not absolutely
certain] that the 'correct' file type has been accessed.
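A hedged Python sketch of both kinds of check: a case-insensitive search for an `<html` or `<!doctype html` marker near the start of the data, and a 'sensible field values' test using the PNG header, whose first chunk must be a 13-byte IHDR chunk with nonzero width and height. The 1024-byte window and the function names are arbitrary choices for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_html(data: bytes, window: int = 1024) -> bool:
    """Heuristic: an <html or <!doctype html marker near the start (case-insensitive)."""
    head = data[:window].lower()
    return b"<html" in head or b"<!doctype html" in head


def png_header_is_sane(data: bytes) -> bool:
    """Check the signature, then that the first chunk is IHDR with plausible fields."""
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE):
        return False
    # Big-endian: chunk length, 4-byte chunk type, width, height.
    length, chunk_type, width, height = struct.unpack(">I4sII", data[8:24])
    return length == 13 and chunk_type == b"IHDR" and width > 0 and height > 0
```

Passing both checks still only makes the type likely, in keeping with the "reasonably sure, not absolutely certain" caveat above.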

Note that I made no mention of 'streams', which are nothing more than
program objects that are temporarily connected or linked to file(s) for
purposes of file data access / updating. Now, it might be possible for such
objects to report information about the file, or about the current
connection / linkage status. However, when first establishing a link to a
specified file, such objects can merely make the checks mentioned earlier to
ascertain the 'correctness' of the file.
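One way to make such checks at the moment a stream is opened, sketched in Python under the assumption that the caller still needs the full content afterwards: read the first few bytes, then either seek back (seekable streams) or put them back in front (non-seekable streams). `peek_header` is a hypothetical helper name.

```python
import io
from typing import BinaryIO, Tuple


def peek_header(stream: BinaryIO, n: int = 16) -> Tuple[bytes, BinaryIO]:
    """Return (header, stream), where the returned stream still yields every byte."""
    if stream.seekable():
        # Remember the position, read the header, then rewind.
        start = stream.tell()
        header = stream.read(n)
        stream.seek(start)
        return header, stream
    # Non-seekable: read the header, then buffer header + remainder in memory.
    # Simple for a sketch, but unsuitable for very large inputs.
    header = stream.read(n)
    return header, io.BytesIO(header + stream.read())
```

For example, `header, stream = peek_header(f)` followed by a signature check on `header` leaves `stream` positioned at the original start. A production version would chain the header with the unread stream rather than buffering it all.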

I'm not sure this is the type of response you were after, but the rather
general nature of your query seemed to warrant it. Additionally, it is the
type of issue that transcends any one programming language / environment.

I hope this helps.

Anthony Borla
Nov 15 '05 #3
Anthony Borla <aj*****@bigpond.com> wrote:
Under operating systems such as DOS / Windows-family, and *NIX / Linux, a
'file' is merely a named, persistent collection of bytes


Actually that's not true - a file has other attributes under all of the
above. Under Windows a file may be read-only, or hidden, with various
security attributes. Under NT-based systems it may also have alternate
"streams" (not to be confused with the .NET concept of a stream) which
may give additional information. Some Linux file-systems have metadata
too.

A plain Stream in .NET terms, however, has none of this - that really
*is* just a sequence of bytes. Derived types may add more information,
as you've said.
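To illustrate the distinction in Python: the operating system keeps metadata such as size, permission bits and (on Windows) attribute flags alongside a file's bytes, and `os.stat` exposes some of it. None of these fields says what the content *is*, and a plain in-memory stream such as `io.BytesIO` carries none of them. The dictionary keys are made up for this sketch.

```python
import os
import stat


def file_attributes(path: str) -> dict:
    """Collect some metadata the OS stores alongside a file's bytes (keys are ad hoc)."""
    st = os.stat(path)
    # st_file_attributes exists only on Windows; treat it as 0 elsewhere.
    win_attrs = getattr(st, "st_file_attributes", 0)
    return {
        "size": st.st_size,
        "owner_writable": bool(st.st_mode & stat.S_IWUSR),
        "hidden_on_windows": bool(win_attrs & stat.FILE_ATTRIBUTE_HIDDEN),
        "modified": st.st_mtime,
    }
```

NTFS alternate data streams are not covered here; they are reached by name rather than through `os.stat`.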

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 15 '05 #4

Although it's not possible to be certain, enough tests should allow you to
figure out what it is (within a limited domain). There is a method[1] that
comes with Internet Explorer that can test for 26 different types (according
to the docs)[2]. It's not perfect, but it's the safest bet you have.
As for Unicode/ASCII differentiation, unless you find byte order marks and
are reasonably sure it's text, not binary, it's not possible to say. Above
all, you should do your best to keep track of the type upon loading, but
these should allow you to do some very basic checks.

1.
http://msdn.microsoft.com/library/de...mefromdata.asp
2.
http://msdn.microsoft.com/library/de...appendix_a.asp
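The BOM check plus a rough text/binary test might look like this in Python. It is a heuristic sketch under stated assumptions: NUL bytes are taken to mean 'binary' (which also misclassifies BOM-less UTF-16/32), pure 7-bit data is reported as ASCII, and anything else that decodes cleanly as UTF-8 is assumed to be UTF-8. The encoding names are Python codec names.

```python
# Byte order marks, longest first where prefixes overlap:
# the UTF-32 LE BOM begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xef\xbb\xbf", "utf-8-sig"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]


def guess_text_encoding(data: bytes):
    """Best-effort guess: an encoding name, 'binary', or None if undecidable."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    if b"\x00" in data:
        # NUL is rare in 8-bit text; BOM-less UTF-16/32 also ends up here.
        return "binary"
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Some 8-bit code page, or binary: the bytes alone cannot say which.
        return None
```

Note that FF FE 00 00 is itself ambiguous (a UTF-32 LE BOM, or a UTF-16 LE BOM followed by a NUL character); the sketch simply prefers UTF-32.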
Nov 15 '05 #5
Hopefully we'll see this potentially nice feature in framework v1.2 and
beyond...

I hadn't really considered the issue, but I do side with the original poster
in that there SHOULD be a common code base that can determine the type of
stream. And since MIME is becoming a convenient standard, so be it.
--
Eric Newton
C#/ASP Application Developer
http://ensoft-software.com/
er**@cc.ensoft-software.com [remove the first "CC."]

Nov 15 '05 #6

"Eric Newton" <er**@cc.ensoft-software.com> wrote:
Hopefully we'll see this potentially nice feature in framework v1.2 and
beyond...

I hadn't really considered the issue, but I do side with the original poster
in that there SHOULD be a common code base that can determine the type of
stream. And since MIME is becoming a convenient standard, so be it.
It has its ups, but it is still, unfortunately, mostly a guess. Outside of
creating standard formats (for example, an XML document that had a <format>
tag), this will always be a guess, and bad luck could result in an incorrect
detection.
I suspect that it should be fairly trivial to get a good guess between image
formats, SGML-derived, XML and other text formats, and perhaps other RIFF-type
objects, but more complicated, proprietary binary formats are probably out of
the question. Text encoding is also an issue because, with the exception of
some forms of Unicode, there is no marker, only text data.
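The 'fairly trivial' families can be sketched in Python: a RIFF file starts with the tag `RIFF` and a 4-byte size, followed by a form type such as `WAVE`, `AVI ` or `WEBP` at bytes 8 to 11, and XML/SGML-style documents usually start, after an optional UTF-8 BOM and whitespace, with `<`. The returned labels are descriptive strings chosen for this example, not MIME types.

```python
UTF8_BOM = b"\xef\xbb\xbf"

# A few common RIFF form types (note the trailing space in "AVI ").
RIFF_FORMS = {
    b"WAVE": "RIFF/WAVE audio",
    b"AVI ": "RIFF/AVI video",
    b"WEBP": "RIFF/WebP image",
}


def sniff_family(data: bytes):
    """Guess a coarse family: RIFF container, XML, other markup, or None."""
    if len(data) >= 12 and data.startswith(b"RIFF"):
        # Layout: b"RIFF", 4-byte little-endian size, 4-byte form type.
        return RIFF_FORMS.get(data[8:12], "RIFF container (unrecognised form)")
    text = data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data
    text = text.lstrip().lower()
    if text.startswith(b"<?xml"):
        return "XML"
    if text.startswith(b"<"):
        return "SGML-style markup (e.g. HTML)"
    return None
```

Telling *which* XML vocabulary or HTML dialect it is would need further inspection of the content.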

However, a managed implementation would be of value, especially if you could
plug in your own recognizers. Even if it's not provided in the 1.2/2.0
framework, it is something an independent developer could write.
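A pluggable design along those lines could look like the following Python sketch. Everything here (`TypeDetector`, `register`, the recognizer signature and the confidence scores) is hypothetical: each recognizer inspects the leading bytes and returns a `(mime_type, confidence)` pair or `None`, and the detector keeps the highest-confidence guess.

```python
from typing import Callable, List, Optional, Tuple

# A recognizer looks at leading bytes and returns (mime_type, confidence) or None.
Recognizer = Callable[[bytes], Optional[Tuple[str, float]]]


class TypeDetector:
    """Hypothetical pluggable detector that keeps the best-scoring guess."""

    def __init__(self, header_size: int = 512):
        self.header_size = header_size
        self._recognizers: List[Recognizer] = []

    def register(self, recognizer: Recognizer) -> Recognizer:
        self._recognizers.append(recognizer)
        return recognizer  # returning it allows use as a decorator

    def detect(self, data: bytes) -> Optional[Tuple[str, float]]:
        header = data[: self.header_size]
        guesses = [g for g in (r(header) for r in self._recognizers) if g is not None]
        return max(guesses, key=lambda g: g[1], default=None)


detector = TypeDetector()


@detector.register
def pdf(header: bytes):
    return ("application/pdf", 0.95) if header.startswith(b"%PDF-") else None


@detector.register
def plain_text(header: bytes):
    # Weak fallback: 7-bit data without NULs could be almost any text format.
    return ("text/plain", 0.2) if header.isascii() and b"\x00" not in header else None
```

Adding a format is then just another decorated function, which is the "plug in your own recognizers" idea; the confidence values are arbitrary and would need tuning against real files.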
Nov 15 '05 #7
