How to recognize whether file has XML format or not?

How to recognize whether file has XML format or not?
Here is the code segment:

    XmlDocument* pDomDocument = new XmlDocument();
    try
    {
        pDomDocument->Load(strFileName);
    }
    catch (Exception* e)
    {
        ....
    }

Of course, if we try to load a non-XML file, an exception will be thrown.
However, an exception might be thrown in other cases as well (the file doesn't
exist, etc.).
I need to identify the specific case where an attempt was made to load a non-XML file.
It seems to me that the XmlException class doesn't provide that kind of detail.
Any other ideas on how I can accomplish this? Thanks in advance.
Nov 12 '05 #1
11 Replies


Hi Dale

That's why there's more than one Exception class in the Framework. VB code
follows but you get the idea:

HTH

Nigel

    Try
        Dim t As New Xml.XmlTextReader("C:\test.xml")
        Dim d As New Xml.XmlDocument
        d.Load(t)
        MessageBox.Show(d.OuterXml)
    Catch ex As Xml.XmlException
        ' The file was read, but the parser rejected its content
        MessageBox.Show("XML exception " & ex.Message)
    Catch ex As IO.FileNotFoundException
        ' The file could not be found at all
        MessageBox.Show("File Not found exception " & ex.Message)
    End Try

"Dale" wrote:
How to recognize whether file has XML format or not?

Nov 12 '05 #2
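
The split in the reply above, between "file missing" and "parser rejected the content", also exists outside .NET. Here is a minimal Python sketch, assuming the standard-library `xml.etree.ElementTree` parser rather than `XmlDocument`; `classify_load` is a hypothetical helper name, not part of any library:

```python
import xml.etree.ElementTree as ET


def classify_load(path):
    """Return 'xml', 'not-well-formed', or 'missing' for the given path."""
    try:
        ET.parse(path)
    except FileNotFoundError:
        # The file is absent; the parser never saw any content
        return "missing"
    except ET.ParseError:
        # The file was read, but it is not well-formed XML
        return "not-well-formed"
    return "xml"
```

Note that "not-well-formed" still covers both a damaged XML file and a file that was never XML, so this separates I/O failures from parse failures and nothing more.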

Hello Nigel,

Thank you for your response.

Well, I used the "file doesn't exist" case just as an example. What if the file
does exist but has a corrupted XML format, etc.? My point was that XmlException
doesn't provide a clear indication for the case where we try to load a non-XML
file. Am I wrong? Are there other ways to recognize a non-XML file?

Thanks,
Dale

Nov 12 '05 #3

(Hi Nigel!)

Dale,

The problem is that the parser can't decide whether you have a non-XML
file or a malformed XML file. There's no notion of a corrupted XML format.
Either the file is well-formed and it's XML, or it is not and it's not
XML. I don't see any alternative. A non-XML file is seen by the parser
as a malformed XML file anyway. When you submit the file to the XML
parser, it's supposed to be XML. If it's not, the parser will just
report "malformed". You can't expect an XML parser to report "Sorry,
it's a bitmap!" :-) .

--
Patrick Philippot - Microsoft MVP
MainSoft Consulting Services
www.mainsoft.fr
Nov 12 '05 #4
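
The point made above can be checked with any XML parser. A small Python sketch, assuming the standard-library `xml.etree.ElementTree` parser as a stand-in for the .NET one (`parse_outcome` and the sample byte strings are made up for illustration):

```python
import xml.etree.ElementTree as ET


def parse_outcome(data: bytes) -> str:
    """Return 'ok', or the name of the exception class the parser raised."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return type(exc).__name__
    return "ok"


# First bytes of a bitmap-style header: never meant to be XML
bitmap_like = b"BM\x36\x00\x0c\x00\x00\x00"
# Intended as XML, but the <item> tag is never closed
broken_xml = b"<config><item></config>"

print(parse_outcome(bitmap_like))
print(parse_outcome(broken_xml))
```

Both inputs end up in the same exception class; only the error message and position differ, and those say nothing about what the file was supposed to be.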

>>The problem is that the parser can't decide whether you have a non XML
file or a malformed XML file.<<

Too bad ...

Nov 12 '05 #5

Dale wrote:
How to recognize whether file has XML format or not?


From an architectural point of view, this problem is usually solved using
content type (media or MIME type) identifiers: each resource has an
appropriate content type. XML documents have the text/xml or application/xml
content types, or types derived from these (text/*+xml or application/*+xml).
So if you are dealing with a resource on the web, use the content type to
detect XML documents.
On a file system it works too, but unfortunately not in .NET.
--
Oleg Tkachenko [XML MVP]
http://blog.tkachenko.com
Nov 12 '05 #6
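
For the web case described above, the check is plain string matching on the content type. A minimal Python sketch; `is_xml_media_type` is a hypothetical helper, and accepting any `+xml` suffix regardless of the top-level type (for example `image/svg+xml`) is an assumption that goes slightly beyond the two families named in the post:

```python
def is_xml_media_type(content_type: str) -> bool:
    """Heuristic check of a Content-Type value for an XML media type."""
    # Drop parameters such as "; charset=utf-8" and normalise case
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("text/xml", "application/xml"):
        return True
    # Derived types use the "+xml" suffix, e.g. application/xhtml+xml
    return "/" in media_type and media_type.endswith("+xml")
```

This only helps when something supplies a content type in the first place, which is not the case for a bare .zzz file on disk.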

Oleg,

I don't deal with web content. I'll tell you why I asked this question in
the first place.

We are migrating our application from VC++ 6.0 to .NET. Our app used to save all
configuration data to a file with the extension .zzz (for example), which is
a binary file. We want the migrated application to save all that data in
XML format in a file with the same extension (.zzz). At the same time, we
want the migrated application to be able to open files in the old format
(binary .zzz).

Basically, what I want is an elegant way to decide how I should open a .zzz file:

    if (zzz is XML file)
    {
        OpenAsXML(zzz);
    }
    else
    {
        // zzz is binary
        OpenAsBinary(zzz);
    }

Of course I can open the .zzz file, read the first string, search for an "xml" substring, etc.
But all that looks ugly to me. I was sure that there is a simple and elegant
way to distinguish between XML and non-XML files.

Thanks,
Dale

Nov 12 '05 #7

Dale wrote:
Of course I can open the .zzz file, read the first string, search for an "xml" substring, etc.
But all that looks ugly to me. I was sure that there is a simple and elegant
way to distinguish between XML and non-XML files.


I don't think there is any elegant way. Usually the test can be done
on a small portion of the file content, e.g. its header. Binary files
(such as Java's *.class or Windows' *.exe) usually have some constant
byte sequence at the beginning, which allows easy file type detection.
The same can be done with XML and it's not so ugly; the only problem is
that the XML declaration is actually optional.
--
Oleg Tkachenko [XML MVP]
http://blog.tkachenko.com
Nov 12 '05 #8
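
The header test mentioned above needs only a handful of bytes. A minimal Python sketch using the two formats named in the post (Java class files begin with the bytes CA FE BA BE, Windows executables with "MZ"); the table and function names are hypothetical:

```python
# Constant leading byte sequences ("magic numbers") of two binary formats
KNOWN_SIGNATURES = {
    b"\xca\xfe\xba\xbe": "java-class",
    b"MZ": "windows-exe",
}


def read_header(path, size=8):
    """Read only the first few bytes; the rest of the file is not needed."""
    with open(path, "rb") as f:
        return f.read(size)


def sniff_signature(header: bytes):
    """Return the format name whose signature the header starts with, or None."""
    for magic, name in KNOWN_SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None
```

A call such as `sniff_signature(read_header(path))` then identifies the file without parsing it.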

Dale

The XmlException route will work in this case, but you might prefer to do
what parsers do under the hood and read the first few bytes of the file. If
you have a declaration (and you should), then it's not too hard to work out
the different scenarios for character encoding with UTF-8, UCS-2 big/little
endian, and ISO-8859-1: you'll have a byte order mark (if you are
using Unicode) followed by the various representations of <?xml.

If you match one of these, then do an XML file open; otherwise do a binary
file open.

HTH

Nigel Armstrong

Nov 12 '05 #9
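
The byte-level check suggested above can be sketched as follows. This is a Python illustration under stated assumptions: only UTF-8 / ISO-8859-1 and UTF-16 in both byte orders are covered, each with and without a byte order mark, and the names are made up for the example:

```python
import codecs

XML_DECL = "<?xml"


def build_prefixes():
    """Byte patterns for '<?xml' in common encodings, with and without a BOM."""
    prefixes = []
    for bom, encoding in (
        (codecs.BOM_UTF8, "utf-8"),         # without the BOM: same bytes as ISO-8859-1
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
    ):
        encoded = XML_DECL.encode(encoding)
        prefixes.append(bom + encoded)  # byte order mark, then the declaration
        prefixes.append(encoded)        # declaration only
    return tuple(prefixes)


XML_DECL_PREFIXES = build_prefixes()


def starts_with_xml_declaration(header: bytes) -> bool:
    """True if the header begins with an XML declaration in a known encoding."""
    return header.startswith(XML_DECL_PREFIXES)
```

The longest pattern is 12 bytes (a 2-byte mark plus five 2-byte characters), so reading the first 16 bytes of the file is enough. A file without a declaration is reported as not matching, even if it is XML.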

Dale wrote:
Basically, what I want is an elegant way to decide how I should open a .zzz
file:

    if (zzz is XML file)
    {
        OpenAsXML(zzz);
    }
    else
    {
        // zzz is binary
        OpenAsBinary(zzz);
    }


Then try to open the file as binary first. I guess it's easy to
recognize whether it's a valid binary configuration file by looking for
some signature in the file (don't tell me you didn't put one in that
file format :-) ). If it's not a binary configuration file, then it
should be an XML file.

--
Patrick Philippot - Microsoft MVP
MainSoft Consulting Services
www.mainsoft.fr
Nov 12 '05 #10
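
The "binary first" order proposed above might look like this in outline. A Python sketch only: the signature `ZZZ1` is invented for the example (the thread never says what the old format starts with), and `open_as_binary` / `open_as_xml` stand for whatever loaders the application already has:

```python
# Hypothetical signature; substitute the real leading bytes of the old format
LEGACY_SIGNATURE = b"ZZZ1"


def detect_format(header: bytes, signature: bytes = LEGACY_SIGNATURE) -> str:
    """Return 'binary' if the header carries the legacy signature, else 'xml'."""
    return "binary" if header.startswith(signature) else "xml"


def open_config(path, open_as_binary, open_as_xml):
    """Dispatch to the matching loader after reading only the signature bytes."""
    with open(path, "rb") as f:
        header = f.read(len(LEGACY_SIGNATURE))
    if detect_format(header) == "binary":
        return open_as_binary(path)
    # Anything else is assumed to be XML; the XML loader reports real errors
    return open_as_xml(path)
```

The design relies on the old format being recognisable; if it has no signature, this approach does not apply.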

W3C's standard describes how to do that:
http://www.w3.org/TR/xml11/#sec-guessing. And each XML document must have
the xml declaration so if a file doesn't have it then you can be sure it's
not XML.

Jerry

Nov 12 '05 #11

Jerry Pisk wrote:
> W3C's standard describes how to do that:
> http://www.w3.org/TR/xml11/#sec-guessing.

That section describes how to detect the character encoding of an XML document,
which is quite different from what Dale wants.

> And each XML document must have
> the xml declaration so if a file doesn't have it then you can be sure it's
> not XML.

That's true only for XML 1.1. In XML 1.0 the XML declaration is optional.
And XML 1.1 support is still close to nonexistent; that's a matter for the future.

--
Oleg Tkachenko [XML MVP]
http://blog.tkachenko.com
Nov 12 '05 #12
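
Because the declaration is optional in XML 1.0, a check for `<?xml` alone rejects some valid documents. A looser first-pass test is sketched below in Python. It is a heuristic with assumptions: it handles only ASCII-compatible encodings (UTF-8, ISO-8859-1), and a binary file that happens to start with `<` would pass, so a real parse is still needed to confirm. The function name is hypothetical:

```python
import xml.etree.ElementTree as ET

UTF8_BOM = b"\xef\xbb\xbf"


def looks_like_xml_start(header: bytes) -> bool:
    """True if, after an optional UTF-8 BOM and whitespace, the data starts with '<'."""
    if header.startswith(UTF8_BOM):
        header = header[len(UTF8_BOM):]
    return header.lstrip(b" \t\r\n").startswith(b"<")


# A document with no declaration is still well-formed XML 1.0
root = ET.fromstring(b"<config/>")
print(root.tag)
```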
