programming: SAX and get content between open and close tag?


Question posted by: Rui Maciel (Guest) on July 6th, 2006 01:35 PM
Is it possible to, using the SAX approach, extract the XML content between
an opening and closing tag as if it was a continuous string of text?

For example, let's say we have the following document:

<first>
<second>
<alpha>foo</alpha>
<beta>bar</beta>
</second>
</first>

Is it possible to directly extract the content between <first> and </first>
as if it was a text string?


Thanks in advance
Rui Maciel
--
Running Kubuntu 6.06 with KDE 3.5.3 and proud of it.
jabber:rui_maciel@jabber.org
12 Answers Posted
Juergen Kahrs July 6th, 2006 01:55 PM
#2: Re: programming: SAX and get content between open and close tag?

Rui Maciel wrote:
> <first>
> <second>
> <alpha>foo</alpha>
> <beta>bar</beta>
> </second>
> </first>
>
> Is it possible to directly extract the content between <first> and </first>
> as if it was a text string?


I think it's not possible. Do you expect the tags inside
<first> to appear as text also? Or do you expect the
character data between the tags to appear only? What
_exactly_ do you expect to be the result of your example?
Joe Kesselman July 6th, 2006 02:55 PM
#3: Re: programming: SAX and get content between open and close tag?

Rui Maciel wrote:
> Is it possible to directly extract the content between <first> and </first>
> as if it was a text string?


Not using standard SAX. Run those events back through a SAX serializer
to regenerate the text from them.
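
A minimal sketch of what "running the events back through a SAX serializer" can look like, using Python's standard xml.sax module, chosen only for illustration. XMLGenerator is the stock serializer in xml.sax.saxutils; the function name reserialize and the sample document are assumptions made for this example. Note that the result is regenerated markup rather than the original bytes: here, for instance, the serializer adds an XML declaration that the input did not have.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


def reserialize(xml_text: str) -> str:
    """Parse xml_text with SAX and write the resulting events back out as text."""
    out = io.StringIO()
    # XMLGenerator is a ContentHandler that prints every event as markup.
    handler = XMLGenerator(out, encoding="utf-8")
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return out.getvalue()


# Sample document taken from the question.
doc = "<first>\n<second>\n<alpha>foo</alpha>\n<beta>bar</beta>\n</second>\n</first>"
print(reserialize(doc))
```

This round-trips the whole document; restricting the output to one element needs some extra state in the handler.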
rui.maciel@gmail.com July 6th, 2006 03:05 PM
#4: Re: programming: SAX and get content between open and close tag?


Joe Kesselman wrote:
> Not using standard SAX. Run those events back through a SAX serializer
> to regenerate the text from them.


I see what you mean. But that seems to be a bit redundant, doesn't it?
I mean, run an XML text through a parser, decompose it and then generate
the exact same information from the parser's information... It looks
like too much trouble just to end up practically where we were before.
It would be a lot simpler if it was possible to extract the original
content which is enclosed by certain tags.


Rui Maciel

Rui Maciel
#5: Re: programming: SAX and get content between open and close tag?

Juergen Kahrs wrote:
> I think it's not possible. Do you expect the tags inside
> <first> to appear as text also? Or do you expect the
> character data between the tags to appear only? What
> exactly do you expect to be the result of your example?


What I had in mind was to extract the literal text which is enclosed in the
<first> and </first> tags, where the child tags would appear also as if
they were text. To put it in other words, extract the XML subsection
enclosed by the <first> and </first> tags.

Is it possible?


Thanks and best regards
Rui Maciel
--
Running Kubuntu 6.06 with KDE 3.5.3 and proud of it.
jabber:rui_maciel@jabber.org
Joe Kesselman July 6th, 2006 03:15 PM
#6: Re: programming: SAX and get content between open and close tag?

Rui Maciel wrote:
> It would be a lot simpler if it was possible to extract the original
> content which is enclosed by certain tags.


The parser has to grovel through all the bytes anyway, to make sure it
has found the correct matching close-tag.

And this is a relatively uncommon case. Normally if folks are reading an
XML document at all, it's because they want its meaning, not its markup.
(For example, note that the meaning of the text is indeterminate without
knowing what namespace declarations it inherits from its surrounding
context.)

There are special cases where this could be useful... but SAX is
designed for the most general cases.
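
The point about inherited namespace declarations can be illustrated with a small sketch. It uses Python's standard xml.etree.ElementTree purely for demonstration; the prefix p, the URI http://example.com/ns and the helper name parses are made up for this example. The same fragment that is meaningful inside its parent cannot even be parsed once it is cut out as raw text, because the prefix declaration stayed behind on the enclosing element.

```python
import xml.etree.ElementTree as ET

# The prefix "p" is declared on <first>, not on the child that uses it.
whole = '<first xmlns:p="http://example.com/ns"><p:alpha>foo</p:alpha></first>'
# The literal text between <first ...> and </first>.
fragment = "<p:alpha>foo</p:alpha>"


def parses(xml_text: str) -> bool:
    """Return True if xml_text is accepted by a namespace-aware parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses(whole))     # the child resolves to {http://example.com/ns}alpha
print(parses(fragment))  # the prefix "p" is unbound in the bare fragment
```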
Greger
#7: Re: programming: SAX and get content between open and close tag?

Rui Maciel wrote:
> Joe Kesselman wrote:
>> Not using standard SAX. Run those events back through a SAX serializer
>> to regenerate the text from them.
>
> I see what you mean. But that seems to be a bit redundant, doesn't it?
> I mean, run an XML text through a parser, decompose it and then generate
> the exact same information from the parser's information... It looks
> like too much trouble just to end up practically where we were before.
> It would be a lot simpler if it was possible to extract the original
> content which is enclosed by certain tags.
>
> Rui Maciel

http://www.saxproject.org/quickstart.html
for java, what language do you use?
--
Qx RSS Reader 1.2.6a released
RSS Reader for Linux.
http://www.gregerhaga.net/qxrss-1.2.6-dox
rui.maciel@gmail.com July 6th, 2006 05:55 PM
#8: Re: programming: SAX and get content between open and close tag?


Greger wrote:
> http://www.saxproject.org/quickstart.html
> for java, what language do you use?


I'm using C++ at the moment with Qt's XML library.

That site seems rather nice. I'll read it to see if I can finally get the
hang of this XML parsing thing.


Thanks for your help
Rui Maciel

William Park
#9: Re: programming: SAX and get content between open and close tag?

Rui Maciel <rui.maciel@gmail.com> wrote:
> Juergen Kahrs wrote:
>> I think it's not possible. Do you expect the tags inside
>> <first> to appear as text also? Or do you expect the
>> character data between the tags to appear only? What
>> exactly do you expect to be the result of your example?
>
> What I had in mind was to extract the literal text which is enclosed in the
> <first> and </first> tags, where the child tags would appear also as if
> they were text. To put it in other words, extract the XML subsection
> enclosed by the <first> and </first> tags.
>
> Is it possible?


If the <first> tag is not nested, then treat the XML file as a long string.
So, find the first <first>, then find the first </first>. Otherwise,
you have to do some bookkeeping.
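
A rough sketch of the string-search idea, written in Python for illustration. The function name slice_between is hypothetical, and the code assumes the simple case described above: the opening tag carries no attributes, the element is not nested inside itself, and the tag text does not also occur inside a comment or CDATA section.

```python
from typing import Optional


def slice_between(xml_text: str, tag: str) -> Optional[str]:
    """Return the raw text between the first <tag> and the next </tag>, or None."""
    open_tag = f"<{tag}>"
    close_tag = f"</{tag}>"
    start = xml_text.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)  # skip past the opening tag itself
    end = xml_text.find(close_tag, start)
    if end == -1:
        return None
    return xml_text[start:end]


# Sample document taken from the question.
doc = "<first>\n<second>\n<alpha>foo</alpha>\n<beta>bar</beta>\n</second>\n</first>"
print(slice_between(doc, "first"))
```

Unlike a parser-based approach, this returns the original characters untouched, but it does no well-formedness checking at all.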

--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/
Joe Kesselman July 7th, 2006 12:35 AM
#10: Re: programming: SAX and get content between open and close tag?

William Park wrote:
> If the <first> tag is not nested, then treat the XML file as a long string.
> So, find the first <first>, then find the first </first>. Otherwise,
> you have to do some bookkeeping.


In other words, text-based rather than XML-based processing, the
"desperate PERL hacker" solution. Doable. Ugly. Sometimes worth
considering, but often means you're asking the wrong questions or
optimizing the wrong things.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Malcolm Dew-Jones July 7th, 2006 01:15 AM
#11: Re: programming: SAX and get content between open and close tag?

Joe Kesselman (keshlam-nospam@comcast.net) wrote:
> William Park wrote:
>> If the <first> tag is not nested, then treat the XML file as a long string.
>> So, find the first <first>, then find the first </first>. Otherwise,
>> you have to do some bookkeeping.
>
> In other words, text-based rather than XML-based processing, the
> "desperate PERL hacker" solution. Doable. Ugly. Sometimes worth
> considering, but often means you're asking the wrong questions or
> optimizing the wrong things.

No, I think he means that your sax event handler code does something like
the following

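global variable first_depth = 0;   # number of <first> elements currently open

sub start_element( the_element_as_an_object )
{
    # entering a <first> element: one level deeper
    if (the_element_as_an_object->its_name == 'first')
    {
        first_depth++;
    }

    # inside <first> (the <first> start tag included): echo the tag as text
    if (first_depth > 0)
    {
        my_print_element_as_text( the_element_as_an_object );
    }
}

sub end_element( the_element_end_as_an_object )
{
    # print the end tag before decrementing, so </first> itself is echoed
    if (first_depth > 0)
    {
        my_print_element_end_as_text( the_element_end_as_an_object );
    }

    if (the_element_end_as_an_object->its_name == 'first')
    {
        first_depth--;
    }
}

sub handle_everything_else( the_thing_as_an_object )
{
    # character data and other events are echoed only while inside <first>
    if (first_depth > 0)
    {
        my_print_thing_as_text( the_thing_as_an_object );
    }
}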


You have to provide the my_print_xxx_as_text routines, and of course the
above is completely pseudo code, but I think you might get the idea.
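
For a runnable take on the same depth-counter idea, here is a sketch in Python's standard xml.sax, used only for illustration. The class name SubtreeWriter and the function extract_inner are invented for this example, and XMLGenerator stands in for the my_print_xxx_as_text routines. One deliberate difference from the pseudo code: this version emits only what lies inside the target element, without the enclosing start and end tags. The output is regenerated from events, so it is equivalent markup and not guaranteed to match the original bytes.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class SubtreeWriter(XMLGenerator):
    """Re-serialize only the events that occur inside a target element."""

    def __init__(self, out, target):
        super().__init__(out, encoding="utf-8")
        self.target = target
        self.depth = 0  # how many target elements are currently open

    def startDocument(self):
        pass  # suppress the XML declaration XMLGenerator would write

    def startElement(self, name, attrs):
        if self.depth > 0:  # already inside the target: echo the tag
            super().startElement(name, attrs)
        if name == self.target:
            self.depth += 1

    def endElement(self, name):
        if name == self.target:
            self.depth -= 1
        if self.depth > 0:  # still inside the target: echo the end tag
            super().endElement(name)

    def characters(self, content):
        if self.depth > 0:
            super().characters(content)


def extract_inner(xml_text: str, target: str) -> str:
    """Return the re-serialized content of the first-level target element(s)."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), SubtreeWriter(out, target))
    return out.getvalue()


# Sample document taken from the question.
doc = "<first>\n<second>\n<alpha>foo</alpha>\n<beta>bar</beta>\n</second>\n</first>"
print(extract_inner(doc, "first"))
```

Because the counter tracks depth rather than a simple flag, a <first> nested inside another <first> is echoed as ordinary content.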

Joe Kesselman July 7th, 2006 01:55 AM
#12: Re: programming: SAX and get content between open and close tag?

Malcolm Dew-Jones wrote:
> You have to provide the my_print_xxx_as_text routines, and of course the
> above is completely pseudo code, but I think you might get the idea.


That's the "reserialize SAX events into text form" solution, which Rui
was objecting to.


--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Greger
#13: Re: programming: SAX and get content between open and close tag?

Rui Maciel wrote:
> Greger wrote:
>> http://www.saxproject.org/quickstart.html
>> for java, what language do you use?
>
> I'm using C++ at the moment with Qt's XML library.
>
> That site seems rather nice. I'll read it to see if I can finally get the
> hang of this XML parsing thing.
>
> Thanks for your help
> Rui Maciel

I have never used SAX myself (I use the libxml2 tree API in my project), but
what you'd probably need to do is "trigger" the function that processes the
contents of a tag when the tag type you are looking for occurs.
Better: see the Qt documentation; I am sure there are simple ways to achieve
what you are trying to do.
--
Qx RSS Reader 1.2.6a released
RSS Reader for Linux.
http://www.gregerhaga.net/qxrss-1.2.6-dox
 