Hi,
Is it possible to parse an XML file in C so that I can fulfill these
requirements:
1) Replace all "<" and ">" signs inside the body of a tag with a space, e.g.:
Example 1:
<foo>blabla < bla </foo>
becomes
<foo>blabla   bla </foo>
Example 2:
<foo>blabla > bla </foo>
becomes
<foo>blabla   bla </foo>
2) Remove all extra spaces at the end of every line of the XML file.
3) Replace all special characters (Unicode or hexadecimal characters) with a
space.
I mean, the XML file is not well-formed if there are "<" and ">" signs
scattered all over it. It is not a valid file in that case, so I do not think
a parser would be appropriate. (How would the parser react when it
encounters a "<" that does not correspond to the beginning of a tag?)
Do you have an idea of how I can write a program to deal with these
requirements?
The technical environment is Unix, KSH, and C (gcc).
I am thinking of using the "sed" command instead. I can get rid of the extra
spaces and replace the special characters, but I still do not know how to
deal with the extra ">" and "<" signs.
Thanks for your help.
Marc Dubois wrote:
> hi,
> is it possible to parse an XML file in C so that i can fulfill these
> requirements :
> 1) replace all "<" and ">" signs inside the body of tag by a space, e.g. :
> [...]
> 2) Remove all extra spaces at the end of every line of the XML file
> 3) Replace all special characters ( Unicode or Hexadecimal characters) by a
> space
> I mean the XML file is not well formed if there are "<" and ">" signs a
> little bit everywhere,
> it is not a valid file in that case, so i do not think the use of a parser
> would be appropriate in that case. (How would the parser react when it
> encounters a < that does not correspond to the beginning of a tag ???)
> Do you have an idea on how i can write a program to deal with these
> requirements ?
> Technical environment is : Unix, KSH, and C (gcc)
> I am thinking of using the "sed" command instead, i can get rid of the extra
> spaces and replace the special characters but i still do not know how to
> deal with the extra ">" and "<" signs.
Pretty much OT for this group. Try a newsgroup that deals with POSIX
tools, or try "man sed".
<state:off-topic>
Also, look at XML tidy/validation tools. HTML tidy has limited XML support.
</>
Marc Dubois wrote:
> hi,
> is it possible to parse an XML file in C
Of course it is "possible." Is it easy?
Depends on your experience writing parsers.
The XML grammar is not especially complicated
-- that's sort of the point of it.
If you are willing to take a canned solution, there is Expat for C: http://www.jclark.com/xml/expat.html
However, your problem seems to be formatting and error correction, not
XML parsing. For example,
> <foo>blabla < bla </foo>
Is not XML.
> <foo>blabla > bla </foo>
Is not XML.
> 2) Remove all extra spaces at the end of every line of the XML file
You don't need anything but an address of a char array and '\0' to do
that :-)
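To make that concrete, here is a minimal sketch of trimming one line in place by moving the terminating '\0'. The function name rstrip_line is made up for illustration, and treating tabs as "extra spaces" and keeping the final newline are assumptions, not something the original post asks for.

```c
#include <string.h>

/* Strip trailing spaces and tabs from one line, in place.
 * A final '\n', if present, is preserved (an assumption).
 * Returns the new length of the string. */
size_t rstrip_line(char *line)
{
    size_t len = strlen(line);
    int had_newline = (len > 0 && line[len - 1] == '\n');

    if (had_newline)
        len--;                       /* step over the newline */

    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
        len--;                       /* walk back over trailing blanks */

    if (had_newline)
        line[len++] = '\n';          /* put the newline back */
    line[len] = '\0';                /* terminate at the new end */
    return len;
}
```

Called on each buffer returned by fgets(), this would handle requirement 2 one line at a time, provided no line is longer than the buffer.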
> 3) Replace all special characters ( Unicode or Hexadecimal characters) by a
> space
This part might be an interesting problem.
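What counts as a "special character" is not defined in the original post, so the following is only a sketch under one possible reading: every byte that is not printable ASCII gets replaced, while tab, newline and carriage return are left alone. The function name blank_special and that definition are assumptions.

```c
#include <stddef.h>

/* Replace every byte outside printable ASCII (0x20..0x7E) with a space,
 * in place. Tab, newline and carriage return are kept (an assumption).
 * Returns the number of bytes replaced. */
size_t blank_special(char *s)
{
    size_t replaced = 0;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;

        if (c == '\t' || c == '\n' || c == '\r')
            continue;                /* keep ordinary whitespace */
        if (c < 0x20 || c > 0x7E) {
            *s = ' ';                /* control or non-ASCII byte */
            replaced++;
        }
    }
    return replaced;
}
```

Note that this works byte by byte, so a multi-byte UTF-8 character turns into several spaces rather than one. If "hexadecimal characters" means numeric references such as &#x41; instead, a different scan would be needed.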
> I mean the XML file is not well formed if there are "<" and ">" signs a
> little bit everywhere,
Right, so you realize this, and you realize that an XML parser will
simply choke on it and (maybe) tell you where the errors are :-)
> it is not a valid file in that case, so i do not think the use of a parser
> would be appropriate in that case. (How would the parser react when it
> encounters a < that does not correspond to the beginning of a tag ???)
Hopefully, it will emit a diagnostic message ...
> Do you have an idea on how i can write a program to deal with these
> requirements ?
> Technical environment is : Unix, KSH, and C (gcc)
> I am thinking of using the "sed" command instead, i can get rid of the extra
> spaces and replace the special characters but i still do not know how to
> deal with the extra ">" and "<" signs.
You could use a lookahead technique, since you always know what you want
to match. The naive approach I'd start with would be to work the
tokens from the outer extremes inward, maybe making a first pass just
to validate that the angle brackets all match up.
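One way the lookahead idea could be sketched in C is below. It is a simpler left-to-right heuristic, not the outer-to-inner approach described above: a "<" is accepted as the start of a tag only if the next character could begin a tag and a ">" turns up before the next "<". The names opens_tag and blank_stray_brackets are hypothetical, and comments, CDATA sections and attribute values containing ">" are not handled.

```c
#include <ctype.h>
#include <string.h>

/* Heuristic (an assumption): '<' opens a tag only if the next character
 * could start a tag (letter, '_', '/', '?', '!') and a '>' appears
 * before the next '<'. */
static int opens_tag(const char *p)
{
    unsigned char next = (unsigned char)p[1];

    if (!(isalpha(next) || next == '_' || next == '/' ||
          next == '?' || next == '!'))
        return 0;
    /* Look ahead to the next angle bracket of either kind. */
    p += 1 + strcspn(p + 1, "<>");
    return *p == '>';
}

/* Replace stray '<' and '>' in s with spaces, in place.
 * Returns the number of characters replaced. */
size_t blank_stray_brackets(char *s)
{
    size_t replaced = 0;
    int in_tag = 0;

    for (; *s != '\0'; s++) {
        if (*s == '<') {
            if (!in_tag && opens_tag(s)) {
                in_tag = 1;          /* a plausible tag starts here */
            } else {
                *s = ' ';            /* stray '<' */
                replaced++;
            }
        } else if (*s == '>') {
            if (in_tag) {
                in_tag = 0;          /* this '>' closes the open tag */
            } else {
                *s = ' ';            /* stray '>' */
                replaced++;
            }
        }
    }
    return replaced;
}
```

With this heuristic, a buffer holding `<foo>blabla < bla </foo>` ends up with the middle "<" turned into a space while both tags survive. A "<" directly followed by a letter inside the text, with a later ">" before any other "<", would still be mistaken for a tag, so the output is worth checking with a real parser afterwards.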
> Thanks for your help.
I replied to your post because I work in a Java environment, and I
realize I am spoiled. Doing XML in Java is too simple to warrant much
discussion. Writing an XML parser in C from scratch, on the other hand,
would be a very interesting problem.
After considering it for about half a second, I'd look into the
difficulty level of using the Xerces-C++ library in a C app. Or the
XML::Parser Perl module.
I realize you want to feed it invalid XML and correct errors; I know
from experience that you can use Xerces to a certain extent to locate
errors, so it might not be terribly hard to take that approach: make
passes through the Xerces validator to find errors, fix them, and end up
with the ability to do SAX or DOM on the document for free.
I have never, ever, even considered touching Xerces-C++, so I don't know
if it has anything in common with Xerces-Java. The docs on the Xerces
site make it look easy enough to use.
Somebody out there has done this, right?
Just curious, why do you want to use C for this? I'm not bashing C,
(I love it), but this seems like the kind of task Perl was created
for.
--
-Rob Hoelz
On Tue, 12 Dec 2006 22:07:14 +0100 "Marc Dubois" <no@spam.com>
wrote:
[...]
I don't know Perl.
"Rob Hoelz" <ho***@wisc.edu> wrote in message
news:20061212172417.133beba0@TheRing...
Just curious, why do you want to use C for this? I'm not bashing C,
(I love it), but this seems like the kind of task Perl was created
for.
--
-Rob Hoelz
It's a good language; I'd consider learning it if I were you.
"Marc Dubois" <no@spam.com> wrote:
> I don't know Perl.
--
-Rob Hoelz
Rob Hoelz wrote:
Just curious, why do you want to use C for this?
Please don't top-post. Your replies belong following or interspersed
with properly trimmed quotes. See the majority of other posts in the
newsgroup, or:
<http://www.caliburn.nl/topposting.html>
"Default User" <de***********@yahoo.com> writes:
> Rob Hoelz wrote:
>> Just curious, why do you want to use C for this?
> Please don't top-post. Your replies belong following or interspersed
> with properly trimmed quotes. See the majority of other posts in the
> newsgroup, or:
> <http://www.caliburn.nl/topposting.html>
Lecturing on top posting is OT.
Richard wrote:
> "Default User" <de***********@yahoo.com> writes:
>> Rob Hoelz wrote:
>>> Just curious, why do you want to use C for this?
>> Please don't top-post. Your replies belong following or interspersed
>> with properly trimmed quotes. See the majority of other posts in the
>> newsgroup, or:
>> <http://www.caliburn.nl/topposting.html>
> Lecturing on top posting is OT.
It's somehow ironic, but so is lecturing on OT :-)
--
Johannes
You can have it:
Quick, Accurate, Inexpensive.
Pick two.
"John F" <sp**@127.0.0.1> writes:
> Richard wrote:
>> "Default User" <de***********@yahoo.com> writes:
>>> Rob Hoelz wrote:
>>>> Just curious, why do you want to use C for this?
>>> Please don't top-post. Your replies belong following or interspersed
>>> with properly trimmed quotes. See the majority of other posts in the
>>> newsgroup, or:
>>> <http://www.caliburn.nl/topposting.html>
>> Lecturing on top posting is OT.
> It's somehow ironic, but so is lecturing on OT :-)
By convention, meta-discussions about topicality are considered
topical.
In my opinion, discussions about how to post properly should also be
considered topical. If nobody ever complained about top-posting, we'd
end up with an ugly mixture of top-posting, bottom-posting,
mid-posting, and whatever other forms of posting some random person
decides Looks Really Cool. The newsgroup will become more difficult
to read, and those who spend the most time here will lose patience and
give up on the newsgroup. Since spending a lot of time here
correlates fairly strongly (but not perfectly) with expertise, I
suggest that this would be to the great detriment of the newsgroup.
Personally, I *usually* don't complain about top-posting unless I
happen to be replying to the article anyway.
Perhaps we should agree on a de facto standard tag, like "[TP]", for
articles that complain about top-posting without adding new content.
Or perhaps there should be a more generic tag for criticisms of
posting style. (In my opinion, articles that complain about posting
style *and* discuss C need no such tag.)
--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.