programming: SAX and get content between open and close tag?

Is it possible to, using the SAX approach, extract the XML content between
an opening and closing tag as if it was a continuous string of text?

For example, let's say we have the following document:

<first>
<second>
<alpha>foo</alpha>
<beta>bar</beta>
</second>
</first>

Is it possible to directly extract the content between <first> and </first>
as if it was a text string?
Thanks in advance
Rui Maciel
--
Running Kubuntu 6.06 with KDE 3.5.3 and proud of it.
jabber:ru********@jabber.org
Jul 6 '06 #1
Rui Maciel wrote:
> <first>
> <second>
> <alpha>foo</alpha>
> <beta>bar</beta>
> </second>
> </first>
>
> Is it possible to directly extract the content between <first> and </first>
> as if it was a text string?
I think it's not possible. Do you expect the tags inside
<first> to appear as text also? Or do you expect the
character data between the tags to appear only? What
_exactly_ do you expect to be the result of your example?
Jul 6 '06 #2
Rui Maciel wrote:
> Is it possible to directly extract the content between <first> and </first>
> as if it was a text string?
Not using standard SAX. Run those events back through a SAX serializer
to regenerate the text from them.
Jul 6 '06 #3
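
A minimal sketch of the "run the events back through a serializer" idea, written in Python with the standard xml.sax module rather than the Qt classes the original poster uses. The class name SubtreeSerializer and the helper inner_xml are hypothetical names chosen for illustration; XMLGenerator is the serializer that ships with xml.sax.saxutils.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class SubtreeSerializer(xml.sax.ContentHandler):
    """Forward SAX events to a serializer only while inside the target element."""

    def __init__(self, target, out):
        super().__init__()
        self.target = target
        self.depth = 0  # number of target elements currently open
        self.writer = XMLGenerator(out, encoding="utf-8")

    def startElement(self, name, attrs):
        # Write first, then count: the outermost target tag itself is skipped.
        if self.depth > 0:
            self.writer.startElement(name, attrs)
        if name == self.target:
            self.depth += 1

    def endElement(self, name):
        # Count first, then write: the outermost closing tag is skipped too.
        if name == self.target:
            self.depth -= 1
        if self.depth > 0:
            self.writer.endElement(name)

    def characters(self, content):
        if self.depth > 0:
            self.writer.characters(content)


def inner_xml(document, target):
    """Return the regenerated markup found between <target> and </target>."""
    out = io.StringIO()
    xml.sax.parseString(document.encode("utf-8"), SubtreeSerializer(target, out))
    return out.getvalue()
```

The result is regenerated from events, not copied from the input, so details such as attribute quoting, whitespace inside tags, and comments are not guaranteed to match the original text.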

Joe Kesselman wrote:
> Not using standard SAX. Run those events back through a SAX serializer
> to regenerate the text from them.
I see what you mean. But that seems to be a bit redundant, doesn't it?
I mean, run an XML text through a parser, decompose it and then generate
the exact same information from the parser's information... It looks
like too much trouble just to end up practically where we were before.
It would be a lot simpler if it was possible to extract the original
content which is enclosed by certain tags.
Rui Maciel

Jul 6 '06 #4
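
Outside standard SAX, some event-based parsers do report where each event starts in the input, which makes it possible to slice the original bytes instead of regenerating them. The following is only a sketch of that idea, assuming Python's xml.parsers.expat binding and its CurrentByteIndex attribute, and assuming the whole document is held in memory in an ASCII-compatible encoding such as UTF-8. The function name extract_raw_element is hypothetical, and nothing here applies directly to Qt's XML classes.

```python
import xml.parsers.expat


def extract_raw_element(data, tag):
    """Return the original bytes of the first <tag>...</tag> element, or None."""
    parser = xml.parsers.expat.ParserCreate()
    state = {"depth": 0, "start": None, "end": None}

    def on_start(name, attrs):
        if name != tag or state["end"] is not None:
            return
        if state["depth"] == 0:
            # Offset of the "<" that opens the outermost matching start tag.
            state["start"] = parser.CurrentByteIndex
        state["depth"] += 1

    def on_end(name):
        if name != tag or state["end"] is not None:
            return
        state["depth"] -= 1
        if state["depth"] == 0:
            # The event starts at "</"; the end tag finishes at the next ">".
            state["end"] = data.index(b">", parser.CurrentByteIndex) + 1

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(data, True)
    if state["end"] is None:
        return None
    return data[state["start"]:state["end"]]
```

The slice includes the enclosing tags themselves and keeps entity references and whitespace exactly as written, because it is cut from the input and not rebuilt from events. The parser still reads every byte in order to find the matching close tag.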
Juergen Kahrs wrote:
> I think it's not possible. Do you expect the tags inside
> <first> to appear as text also? Or do you expect the
> character data between the tags to appear only? What
> exactly do you expect to be the result of your example?
What I had in mind was to extract the literal text which is enclosed in the
<first> and </first> tags, where the child tags would appear also as if
they were text. To put it in other words, extract the XML subsection
enclosed by the <first> and </first> tags.

Is it possible?
Thanks and best regards
Rui Maciel
--
Running Kubuntu 6.06 with KDE 3.5.3 and proud of it.
jabber:ru********@jabber.org
Jul 6 '06 #5
ru********@gmail.com wrote:
> It would be a lot simpler if it was possible to extract the original
> content which is enclosed by certain tags.
The parser has to grovel through all the bytes anyway, to make sure it
has found the correct matching close-tag.

And this is a relatively uncommon case. Normally if folks are reading an
XML document at all, it's because they want its meaning, not its markup.
(For example, note that the meaning of the text is indeterminate without
knowing what namespace declarations it inherits from its surrounding
context.)

There are special cases where this could be useful... but SAX is
designed for the most general cases.
Jul 6 '06 #6
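
The point about inherited namespace declarations can be seen with a small experiment. This is a sketch using Python's xml.etree.ElementTree; the namespace URI urn:example:p and the helper parses_alone are made up for illustration. The fragment is copied literally from inside the document, yet it cannot be parsed on its own because the declaration for the prefix p lives on the enclosing element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the prefix "p" is declared on the outer element only.
DOCUMENT = '<first xmlns:p="urn:example:p"><p:second>foo</p:second></first>'

# The literal text between <first ...> and </first>.
FRAGMENT = '<p:second>foo</p:second>'


def parses_alone(text):
    """Return True if text is well-formed, namespace-aware XML by itself."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

Here parses_alone(DOCUMENT) is True while parses_alone(FRAGMENT) is False, since the parser rejects the unbound prefix once the surrounding context is gone.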
ru********@gmail.com wrote:
> Joe Kesselman wrote:
>> Not using standard SAX. Run those events back through a SAX serializer
>> to regenerate the text from them.
>
> I see what you mean. But that seems to be a bit redundant, doesn't it?
> I mean, run an XML text through a parser, decompose it and then generate
> the exact same information from the parser's information... It looks
> like too much trouble just to end up practically where we were before.
> It would be a lot simpler if it was possible to extract the original
> content which is enclosed by certain tags.
> Rui Maciel
http://www.saxproject.org/quickstart.html
That is for Java; what language do you use?
--
Qx RSS Reader 1.2.6a released
RSS Reader for Linux.
http://www.gregerhaga.net/qxrss-1.2.6-dox
Jul 6 '06 #7

Greger wrote:
> http://www.saxproject.org/quickstart.html
> for java, what language do you use?
I'm using C++ at the moment with Qt's XML library.

That site seems rather nice. I'll read it to see if I can finally get the
hang of this XML parsing thing.
Thanks for your help
Rui Maciel

Jul 6 '06 #8
Rui Maciel <ru********@gmail.com> wrote:
> Juergen Kahrs wrote:
>> I think it's not possible. Do you expect the tags inside
>> <first> to appear as text also? Or do you expect the
>> character data between the tags to appear only? What
>> exactly do you expect to be the result of your example?
>
> What I had in mind was to extract the literal text which is enclosed in the
> <first> and </first> tags, where the child tags would appear also as if
> they were text. To put it in other words, extract the XML subsection
> enclosed by the <first> and </first> tags.
>
> Is it possible?
If the <first> tag is not nested, then treat the XML file as a long string.
So, find the first <first>, then find the first </first>. Otherwise,
you have to do some bookkeeping.

--
William Park <op**********@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/
Jul 6 '06 #9
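
The string-search approach for the non-nested case could look roughly like this in Python. It is a sketch only: the function name between_first_tags is hypothetical, and it assumes the opening tag is written exactly as <first> with no attributes or extra whitespace.

```python
def between_first_tags(text, tag):
    """Return the text between the first <tag> and the first following </tag>."""
    open_tag = "<" + tag + ">"
    close_tag = "</" + tag + ">"

    start = text.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)  # skip past the opening tag itself

    end = text.find(close_tag, start)
    if end == -1:
        return None
    return text[start:end]
```

With a nested input such as "<first>a<first>b</first>c</first>" this returns "a<first>b", which is the case where the extra bookkeeping becomes necessary. A <first> that appears inside a comment or CDATA section would also fool it.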
William Park wrote:
> If the <first> tag is not nested, then treat the XML file as a long string.
> So, find the first <first>, then find the first </first>. Otherwise,
> you have to do some bookkeeping.
In other words, text-based rather than XML-based processing, the
"desperate PERL hacker" solution. Doable. Ugly. Sometimes worth
considering, but often means you're asking the wrong questions or
optimizing the wrong things.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Jul 6 '06 #10
Joe Kesselman (ke************@comcast.net) wrote:
: William Park wrote:
: If the <first> tag is not nested, then treat the XML file as a long string.
: So, find the first <first>, then find the first </first>. Otherwise,
: you have to do some bookkeeping.

: In other words, text-based rather than XML-based processing, the
: "desperate PERL hacker" solution. Doable. Ugly. Sometimes worth
: considering, but often means you're asking the wrong questions or
: optimizing the wrong things.

No, I think he means that your sax event handler code does something like
the following

global variable first_depth = 0;

sub start_element( the_element_as_an_object )
{
    # Count before printing, so the opening <first> tag is printed too.
    if (the_element_as_an_object->its_name == 'first')
    {
        first_depth++;
    }

    if (first_depth > 0)
    {
        my_print_element_as_text( the_element_as_an_object );
    }
}

sub end_element( the_element_end_as_an_object )
{
    # Print before counting down, so the closing </first> tag is printed too.
    if (first_depth > 0)
    {
        my_print_element_end_as_text( the_element_end_as_an_object );
    }

    if (the_element_end_as_an_object->its_name == 'first')
    {
        first_depth--;
    }
}

sub handle_everything_else( the_thing_as_an_object )
{
    # Character data and anything else is printed only while inside <first>.
    if (first_depth > 0)
    {
        my_print_thing_as_text( the_thing_as_an_object );
    }
}
You have to provide the my_print_xxx_as_text routines, and of course the
above is completely pseudo code, but I think you might get the idea.

Jul 7 '06 #11
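
For readers who want something runnable, the pseudocode above can be translated fairly directly into Python's xml.sax. This is a sketch under assumptions: the class name PrintInsideFirst and the helper print_first are hypothetical, the my_print_xxx_as_text routines are replaced by simple writes to a text buffer, and only elements, attributes, and character data are handled.

```python
import io
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class PrintInsideFirst(xml.sax.ContentHandler):
    """Write every event seen inside <first> (tags included) back out as text."""

    def __init__(self, out, target="first"):
        super().__init__()
        self.out = out
        self.target = target
        self.first_depth = 0  # how many target elements are currently open

    def startElement(self, name, attrs):
        if name == self.target:
            self.first_depth += 1
        if self.first_depth > 0:
            self.out.write("<" + name)
            for key, value in attrs.items():
                # quoteattr adds the quotes and escapes the value.
                self.out.write(" " + key + "=" + quoteattr(value))
            self.out.write(">")

    def endElement(self, name):
        if self.first_depth > 0:
            self.out.write("</" + name + ">")
        if name == self.target:
            self.first_depth -= 1

    def characters(self, content):
        if self.first_depth > 0:
            # Re-escape &, < and > so the output stays well-formed.
            self.out.write(escape(content))


def print_first(document):
    """Return the re-serialized <first> element(s) of document as a string."""
    out = io.StringIO()
    xml.sax.parseString(document.encode("utf-8"), PrintInsideFirst(out))
    return out.getvalue()
```

Unlike a version that skips the outer tags, this one follows the ordering of the pseudocode, so the enclosing <first> and </first> tags appear in the output.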
Malcolm Dew-Jones wrote:
> You have to provide the my_print_xxx_as_text routines, and of course the
> above is completely pseudo code, but I think you might get the idea.
That's the "reserialize SAX events into text form" solution, which Rui
was objecting to.
--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Jul 7 '06 #12
ru********@gmail.com wrote:
> Greger wrote:
>> http://www.saxproject.org/quickstart.html
>> for java, what language do you use?
>
> I'm using C++ at the moment with Qt's XML library.
>
> That site seems rather nice. I'll read it to see if I can finally get the
> hang of this XML parsing thing.
> Thanks for your help
> Rui Maciel
I have never used SAX myself (I use the libxml2 tree API in my project), but what
you'd probably need to do is to "trigger" the function that processes the
contents of a tag when the tag type you are looking for occurs.
Better: see the Qt documentation; I am sure there are simple ways to achieve
what you are trying to do.
--
Qx RSS Reader 1.2.6a released
RSS Reader for Linux.
http://www.gregerhaga.net/qxrss-1.2.6-dox
Jul 7 '06 #13
