
XML Parser

I am updating an XML parser developed by a third party. The XML below is
parsed by the system using the code that follows:

<Company>
  <section id="officers">
    <section id="secretary">
      <section id="stakeholder id">15728499</section>
      <section id="name">FRED BLOGGS</section>
    </section>
    <section id="director" type="current">
      <section id="stakeholder id">15728498</section>
      <section id="name">JOHN SMITH</section>
    </section>
  </section>
  <section id="company identification">
    <section id="name">ABC LIMITED</section>

$extract = parseXMLForLtd($xml, $type);
function parseXMLForLtd($xml="", $type) {
    preg_match_all( "/\<section(.*?)\<\/section\>/s", $xml, $reportData);
    $reportData[0] = preg_replace('/\n/i','',$reportData[0]);
    $result = $reportData[0];
    for($i=0; $i<count($result);$i++) {
        $item = trim(stripslashes($result[$i]));

        if(eregi('<section id="name">',$item) && eregi('<section id="company identification">',$item)) {
            $extract[name] = preg_replace('/([<][\/a-zA-Z0-9 ="-]+[>])/i', '', $item);

However, when I try to parse the other field called "name" using:

        else if(eregi('<section id="name">',$item) && eregi('<section id="secretary">',$item)) {
            $extract[secretary] = preg_replace('/([<][\/a-zA-Z0-9 ="-]+[>])/i', '', $item);

It does not work. Could anyone please advise how I can extract "Fred
Bloggs"?

Thanks In Advance
Jan 4 '08 #1
1 Reply



Richard Price <ne**@directroute.co.uk> wrote in
<13*************@corp.supernews.com>:
> I am updating an XML parser developed by a third party. The
> XML below is parsed by the system using the code that
> follows:
>
> $extract = parseXMLForLtd($xml, $type);
> function parseXMLForLtd($xml="", $type) {
>     preg_match_all( "/\<section(.*?)\<\/section\>/s", $xml, $reportData);
>     $reportData[0] = preg_replace('/\n/i','',$reportData[0]);
>     $result = $reportData[0];
>     for($i=0; $i<count($result);$i++) {
>         $item = trim(stripslashes($result[$i]));
>
>         if(eregi('<section id="name">',$item) && eregi('<section id="company identification">',$item)) {
>             $extract[name] = preg_replace('/([<][\/a-zA-Z0-9 ="-]+[>])/i', '', $item);

OMG.

> However, when I try to parse the other field called
> "name" using:
>
>         else if(eregi('<section id="name">',$item) && eregi('<section id="secretary">',$item)) {
>             $extract[secretary] = preg_replace('/([<][\/a-zA-Z0-9 ="-]+[>])/i', '', $item);
>
> It does not work. Could anyone please advise how I can
> extract "Fred Bloggs"?

Parsing hierarchical markup languages using regexen is an
exercise in futility, if not worse.
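
To see why the second branch never fires, here is a minimal sketch that runs the same pattern against a trimmed copy of the posted XML. The closing tags for the part cut off in the post are my assumption, and the director block is left out for brevity. Because `(.*?)` is non-greedy, each match ends at the first `</section>` it meets, whatever the nesting.

```php
<?php
// Trimmed sample based on the posted XML. The closing tags after
// "ABC LIMITED" are assumed; the post is cut off at that point.
$xml = <<<XML
<Company>
<section id="officers">
<section id="secretary">
<section id="stakeholder id">15728499</section>
<section id="name">FRED BLOGGS</section>
</section>
</section>
<section id="company identification">
<section id="name">ABC LIMITED</section>
</section>
</Company>
XML;

// Same pattern as the original parser. Each match runs from a
// "<section" to the FIRST "</section>" that follows it.
preg_match_all('/<section(.*?)<\/section>/s', $xml, $m);

// Strip newlines, as the original code does.
$matches = str_replace("\n", '', $m[0]);

foreach ($matches as $i => $item) {
    echo $i, ': ', $item, "\n";
}
```

On this sample the pattern yields three chunks. The first holds the "officers" and "secretary" opening tags plus the stakeholder id, the second holds only `<section id="name">FRED BLOGGS</section>`, and the third holds "company identification" together with its name. No single chunk contains both `id="secretary"` and `id="name"`, so a test for both can never succeed. The company branch works only because "name" happens to be the first child of "company identification".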

<http://www.php.net/manual/en/ref.dom.php>

An XPath expression fetching the node you need would be:

/Company/section[@id='officers']/section[@id='secretary']/section[@id='name']/text()
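
As a rough sketch of how that expression could be used with PHP's DOM extension (`DOMDocument` and `DOMXPath`): the sample XML is a trimmed copy of the posted document with the missing closing tags assumed, and `firstText()` is a hypothetical helper of mine, not part of the existing parser.

```php
<?php
// Trimmed sample based on the posted XML. The closing tags after
// "ABC LIMITED" are assumed; the post is cut off at that point.
$xml = <<<XML
<Company>
  <section id="officers">
    <section id="secretary">
      <section id="stakeholder id">15728499</section>
      <section id="name">FRED BLOGGS</section>
    </section>
  </section>
  <section id="company identification">
    <section id="name">ABC LIMITED</section>
  </section>
</Company>
XML;

// Hypothetical helper: return the value of the first node matched
// by an XPath query, or null when nothing matches.
function firstText(DOMXPath $xpath, $query) {
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$extract = array();
$extract['secretary'] = firstText(
    $xpath,
    "/Company/section[@id='officers']/section[@id='secretary']/section[@id='name']/text()"
);
$extract['name'] = firstText(
    $xpath,
    "/Company/section[@id='company identification']/section[@id='name']/text()"
);

echo $extract['secretary'], "\n"; // FRED BLOGGS
echo $extract['name'], "\n";      // ABC LIMITED
```

Each field gets its own path, so the two "name" sections cannot be confused, and the result does not depend on the order of sibling elements.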

--
....also, I submit that we all must honourably commit seppuku
right now rather than serve the Dark Side by producing the
HTML 5 spec.
Jan 4 '08 #2
