Bytes | Software Development & Data Engineering Community
How to recognize whether file has XML format or not?

How to recognize whether file has XML format or not?
Here is the code segment:

XmlDocument* pDomDocument = new XmlDocument();
try
{
    pDomDocument->Load(strFileName);
}
catch(Exception* e)
{
    ....
}

Of course, if we try to load a non-XML file, an exception will be thrown.
However, an exception might be thrown in other cases as well (the file
doesn't exist, etc.).
I need to identify the specific case when an attempt was made to load a
non-XML file.
It seems to me that the XmlException class doesn't provide that kind of detail.
Any other ideas on how I can accomplish this? Thanks in advance.
Nov 12 '05 #1
Hi Dale

That's why there's more than one Exception class in the Framework. VB code
follows but you get the idea:

HTH

Nigel

Try
    Dim t As New Xml.XmlTextReader("C:\test.xml")
    Dim d As New Xml.XmlDocument
    d.Load(t)
    MessageBox.Show(d.OuterXml)
Catch ex As Xml.XmlException
    MessageBox.Show("XML exception " & ex.Message)
Catch ex As IO.FileNotFoundException
    MessageBox.Show("File Not found exception " & ex.Message)
End Try
Nov 12 '05 #2
Hello Nigel,

Thank you for your response.

Well, I used the "file doesn't exist" case just as an example. What if the
file does exist but has a corrupted XML format, etc.? My point was that
XmlException doesn't provide a clear indication for the case when we try to
load a non-XML file. Am I wrong? Are there other ways to recognize a non-XML
file?

Thanks,
Dale
Nov 12 '05 #3
(Hi Nigel!)

Dale,

The problem is that the parser can't decide whether you have a non XML
file or a malformed XML file. There's no idea of corrupted XML format.
Either the file is well-formed and it's XML or it is not and it's not
XML. I don't see any alternative. A non XML file is seen by the parser
as a malformed XML file anyway. When you submit the file to the XML
parser, it's supposed to be XML. If it's not, the parser will just
report "malformed". You can't expect an XML parser to report "Sorry,
it's a bitmap!" :-)

--
Patrick Philippot - Microsoft MVP
MainSoft Consulting Services
www.mainsoft.fr
Nov 12 '05 #4
>>The problem is that the parser can't decide whether you have a non XML
file or a malformed XML file.<<

Too bad ...
Nov 12 '05 #5
Dale wrote:
How to recognize whether file has XML format or not?


From an architectural point of view, this problem is usually solved using
content type (media or MIME type) identifiers: each resource has an
appropriate content type. XML documents have the text/xml or application/xml
content types, or types derived from these (text/*+xml or application/*+xml).
So if you deal with resources on the web, use the content type to detect XML
documents.
On a file system it works too, but unfortunately not in .NET.
--
Oleg Tkachenko [XML MVP]
http://blog.tkachenko.com
Nov 12 '05 #6
Oleg,

I don't deal with web content. I'll tell you why I've asked this question in
the first place.

We are migrating our application from VC++ 6.0 to .NET. Our app used to save
all its configuration data to a binary file with the extension .zzz (for
example). We want the migrated application to save that data in XML format
in a file with the same extension (.zzz). At the same time, we want the
migrated application to be able to open files in the old format (binary .zzz).

Basically, what I want is an elegant way to decide how I should open a .zzz
file:

if(zzz is XML file)
{
    OpenAsXML(zzz);
}
else
{
    // zzz is binary
    OpenAsBinary(zzz);
}

Of course I can open the zzz file, read the first string, search for an xml
substring, etc. But all that looks ugly to me. I was sure that there was a
simple and elegant way to distinguish between XML and non-XML files.

Thanks,
Dale
Nov 12 '05 #7
Dale wrote:
Of course I can open the zzz file, read the first string, search for an xml
substring, etc. But all that looks ugly to me. I was sure that there was a
simple and elegant way to distinguish between XML and non-XML files.


I don't think there is any elegant way. Usually the test can be done on a
small portion of the file content, e.g. its header. Binary files (such as
Java's *.class or Windows' *.exe) usually have some constant byte sequence
at the beginning, which allows easy file type detection. The same can be
done with XML and it's not so ugly; the only problem is that the XML
declaration is actually optional.
--
Oleg Tkachenko [XML MVP]
http://blog.tkachenko.com
Nov 12 '05 #8
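As a rough illustration of the header check described above, here is a minimal sketch in standard C++ (not .NET-specific; the function name startsLikeXml is made up for this example). Because the XML declaration is optional, it only asks whether the first significant byte is '<', after skipping an optional UTF-8 byte order mark and leading whitespace. It assumes an ASCII-compatible encoding, and a binary file that happens to start with '<' would fool it, so treat it as a heuristic rather than a definitive test.

```cpp
#include <cstddef>
#include <string>

// Heuristic only: returns true when the first significant byte of `head`
// is '<'. `head` is assumed to hold the first bytes of the file, read in
// binary mode, in an ASCII-compatible encoding (UTF-8, ISO-8859-1, ...).
bool startsLikeXml(const std::string& head)
{
    std::size_t i = 0;

    // Skip an optional UTF-8 byte order mark (EF BB BF).
    if (head.compare(0, 3, "\xEF\xBB\xBF") == 0)
        i = 3;

    // Skip the whitespace XML allows before the root element.
    while (i < head.size() &&
           (head[i] == ' ' || head[i] == '\t' ||
            head[i] == '\r' || head[i] == '\n'))
        ++i;

    return i < head.size() && head[i] == '<';
}
```

If the check passes, the file still has to go through the XML parser; the sketch only decides which open path to try first.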
Dale

The XmlException route will work in this case, but you might prefer to do
what parsers do under the hood and read the first few bytes of the file. If
you have a declaration (and you should), then it's not too hard to work out
the different scenarios for character encoding with UTF-8, UCS-2 big/little
endian and ISO-8859-1: you'll have a byte order mark (if you are using
Unicode) followed by the various representations of <?xml.

If you match one of these, then do an XML file open; otherwise do a binary
file open.

HTH

Nigel Armstrong
Nov 12 '05 #9
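The byte-matching idea above (a byte order mark followed by some representation of <?xml) could look roughly like the following sketch in standard C++. The names readHead, beginsWith and hasXmlDeclaration are invented for this example, and only four cases are covered: plain ASCII-compatible text, UTF-8 with a BOM, and UTF-16 little/big endian with a BOM. Other encodings, and files without a declaration, are not recognized.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Reads up to `count` bytes from the start of a file, in binary mode.
// Returns an empty string if the file cannot be opened.
std::string readHead(const std::string& path, std::size_t count = 16)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    std::string head(count, '\0');
    in.read(&head[0], static_cast<std::streamsize>(count));
    head.resize(static_cast<std::size_t>(in.gcount()));
    return head;
}

// True when `head` begins with the `n` bytes at `bytes`.
static bool beginsWith(const std::string& head, const char* bytes, std::size_t n)
{
    return head.size() >= n && head.compare(0, n, bytes, n) == 0;
}

// Heuristic: does `head` start with "<?xml", optionally after a BOM?
bool hasXmlDeclaration(const std::string& head)
{
    // ASCII-compatible encodings (UTF-8 without BOM, ISO-8859-1, ...).
    static const char plain[] = { '<', '?', 'x', 'm', 'l' };
    // UTF-8 with byte order mark EF BB BF.
    static const char utf8[] = { '\xEF', '\xBB', '\xBF', '<', '?', 'x', 'm', 'l' };
    // UTF-16 little-endian with byte order mark FF FE.
    static const char utf16le[] = { '\xFF', '\xFE', '<', 0, '?', 0, 'x', 0, 'm', 0, 'l', 0 };
    // UTF-16 big-endian with byte order mark FE FF.
    static const char utf16be[] = { '\xFE', '\xFF', 0, '<', 0, '?', 0, 'x', 0, 'm', 0, 'l' };

    return beginsWith(head, plain, sizeof plain)
        || beginsWith(head, utf8, sizeof utf8)
        || beginsWith(head, utf16le, sizeof utf16le)
        || beginsWith(head, utf16be, sizeof utf16be);
}
```

A caller would pass readHead(path) to hasXmlDeclaration and choose the XML or binary open path from the result.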
Dale wrote:
Basically what I want is elegant way to decide how should I open .zzz
file:

if(zzz is XML file)
{
    OpenAsXML(zzz);
}
else
{
    // zzz is binary
    OpenAsBinary(zzz);
}


Then try to open the file as binary first. I guess it's easy to
recognize whether it's a valid binary configuration file by looking for
some signature in the file (don't tell me you didn't put one in that
file format :-) ). If it's not a binary configuration file, then it
should be an XML file.

--
Patrick Philippot - Microsoft MVP
MainSoft Consulting Services
www.mainsoft.fr
Nov 12 '05 #10
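The "check for the binary signature first" approach suggested above might be sketched like this in standard C++. The thread never says what signature the old .zzz format uses, so the bytes in kLegacySignature are purely hypothetical placeholders, as are the names FileKind and classify. Anything that does not carry the signature is handed to the XML path, where the parser confirms or rejects it.

```cpp
#include <cstddef>
#include <string>

enum FileKind { LegacyBinary, ProbablyXml };

// HYPOTHETICAL signature: replace with the real magic bytes of the old
// binary format. The value here is an assumption made for illustration.
static const char kLegacySignature[] = { 'Z', 'Z', 'Z', '\x01' };

// Classifies a file from its first bytes (`head`, read in binary mode).
FileKind classify(const std::string& head)
{
    const std::size_t n = sizeof kLegacySignature;

    // Old-format files are assumed to start with the signature.
    if (head.size() >= n && head.compare(0, n, kLegacySignature, n) == 0)
        return LegacyBinary;

    // Everything else is treated as XML; the parser has the final say.
    return ProbablyXml;
}
```

One advantage of this ordering is that it does not depend on the XML declaration being present or on the file's character encoding.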
