
Remove data outside a pair of xml tags.

Hi!

I am a PHP beginner.
I hope somebody can help me with a problem that I've been having.
I've been trying to clean up junk data that I have at the beginning and
end of an XML file. Let's say I have an XML file with some junk data,
like the one below:

--------------------------------------------------------------
Junk dataflasjfasj
<firsttag>
<secondtag>data</secondtag>
<thirdtag>whatever</thirdtag>
</firsttag>

junk dataga.
---------------------------------------------------------------

Does someone know how to remove the junk from the file and return just
the actual XML (including the <firsttag> tag)?

I tried to use strpos and substr, but somehow the tag is missing from
the string I get back.

Any help would be appreciated!

Thanks,
Lazy Pig

Jun 21 '06 #1
5 Replies


*** la*******@gmail.com wrote (21 Jun 2006 11:08:10 -0700):
> --------------------------------------------------------------
> Junk dataflasjfasj
> <firsttag>
> <secondtag>data</secondtag>
> <thirdtag>whatever</thirdtag>
> </firsttag>
>
> junk dataga.
> ---------------------------------------------------------------
>
> Does someone know how to remove the junk in the file and just return
> the actual xml stuff (including the <firsttag> tag)?


// s lets "." match newlines; the lookahead and $1 keep the tags themselves.
$foo = preg_replace('/^.*?(?=<firsttag>)/s', '', $foo);
$foo = preg_replace('/(<\/firsttag>).*$/s', '$1', $foo);
This is just an idea; make sure it works as expected in all your special cases.
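
The strpos/substr route mentioned in the question can do the same job without regular expressions. A minimal sketch, assuming the root element name is known and the opening tag carries no attributes; `extract_element` is a made-up helper name, not something from the thread:

```php
<?php
// Hypothetical helper: keep only the text from the first "<$tag>" through
// the last "</$tag>", both inclusive. Returns null if either is missing.
function extract_element(string $raw, string $tag): ?string
{
    $open  = '<' . $tag . '>';
    $close = '</' . $tag . '>';

    $start = strpos($raw, $open);    // first opening tag
    $end   = strrpos($raw, $close);  // last closing tag

    // strpos() may legitimately return 0, so compare against false with ===.
    if ($start === false || $end === false || $end < $start) {
        return null;
    }

    // substr() takes a length, not an end offset.
    return substr($raw, $start, $end + strlen($close) - $start);
}

$raw = "Junk dataflasjfasj\n<firsttag>\n<secondtag>data</secondtag>\n"
     . "<thirdtag>whatever</thirdtag>\n</firsttag>\n\njunk dataga.\n";

echo extract_element($raw, 'firsttag'), "\n";
```

If the result is echoed into an HTML page, the browser treats <firsttag> as markup and hides it, which may be why the tag seemed not to be returned. Passing the string through htmlspecialchars() before echoing makes it visible.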
--
-+ Álvaro G. Vicario - Burgos, Spain
++ http://bits.demogracia.com is my site for web programmers
+- http://www.demogracia.com is my chlorine-free humor site
--
Jun 21 '06 #2

la*******@gmail.com wrote:

> --------------------------------------------------------------
> Junk dataflasjfasj
> <firsttag>
> <secondtag>data</secondtag>
> <thirdtag>whatever</thirdtag>
> </firsttag>
>
> junk dataga.
> ---------------------------------------------------------------
>
> Does someone know how to remove the junk in the file and just return
> the actual xml stuff (including the <firsttag> tag)?


I believe this should work:

<?php
$str = 'Junk dataflasjfasj
<firsttag>
<secondtag>data</secondtag>
<thirdtag>whatever</thirdtag>
</firsttag>

junk dataga.
';

echo preg_replace('/.*?<(.*)>.*/s', '<$1>', $str);
?>
--
Tommy Gildseth
http://design.twobarks.com/
Jun 21 '06 #3

Tommy Gildseth wrote:
> la*******@gmail.com wrote:
>
>> --------------------------------------------------------------
>> Junk dataflasjfasj
>> <firsttag>
>> <secondtag>data</secondtag>
>> <thirdtag>whatever</thirdtag>
>> </firsttag>
>>
>> junk dataga.
>> ---------------------------------------------------------------
>>
>> Does someone know how to remove the junk in the file and just return
>> the actual xml stuff (including the <firsttag> tag)?
>
> I believe this should work:
>
> ....snip snip php code


Well... not quite: that fails if the junk data contains < or >.

This might be better:

<?php
$str = 'Junk> dat<aflasjfasj
<firsttag>
<secondtag>data</secondtag>
<thirdtag>whatever</thirdtag>
</firsttag>

junk data>ga. < sadfsda fsd
';

echo preg_replace('/.*?(<[^<]+>.*<.*?>).*/s', '$1', $str);
?>
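
To make the difference between the two patterns concrete, here is a small sketch that runs both on junk containing stray brackets. The last input, with a tag-shaped piece of trailing junk, is an assumed extra case that is not from the thread; it suggests the second pattern is still a heuristic rather than a general fix:

```php
<?php
// The two patterns from the posts above.
$first  = '/.*?<(.*)>.*/s';
$second = '/.*?(<[^<]+>.*<.*?>).*/s';

$xml = "<firsttag>\n<secondtag>data</secondtag>\n</firsttag>";

// Stray brackets in the junk: the first pattern keeps everything from the
// first "<" to the last ">", the second skips ahead to the first complete tag.
$stray = "Junk> dat<aflasjfasj\n" . $xml . "\n\njunk data>ga. < sadfsda fsd\n";
var_dump(preg_replace($first, '<$1>', $stray) === $xml);
var_dump(preg_replace($second, '$1', $stray) === $xml);

// Assumed extra case: trailing junk that contains something tag-shaped.
// The second pattern ends its capture at the last "<...>" it can find.
$tagged = "junk\n" . $xml . "\ntrailing <b> junk\n";
var_dump(preg_replace($second, '$1', $tagged) === $xml);
```

When the root element name is known in advance, anchoring on that name is less fragile than guessing where the XML starts and ends.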

--
Tommy Gildseth
http://design.twobarks.com/
Jun 21 '06 #4

Thank you all for your responses!

It turns out my XML file also contains junk inside its XML tags, as
shown below.

----------------------------------------------
1ffc
<firsttag>
<secondtag>some data</secondtag>
<third
1ffc
tag>Data for third tag</thirdtag>
</firsttag>
0
--------------------------------------------------
I did some research on the web, and it turns out the junk that I have
in the XML file is Greek characters (the junk is "1ffc", "fa1", and
some numbers, including the number 0 at the end of the XML file). This
junk prevents the XML from being parsed correctly.

Does anybody have any idea how to get rid of this junk, or convert
these characters to empty strings?

Thanks for your help,

Lazy Pig

Jun 21 '06 #5

*** la*******@gmail.com wrote (21 Jun 2006 15:40:00 -0700):
> I did some research on the web, and it turns out the junk that I have
> in the XML file is Greek characters (the junk is "1ffc", "fa1", and
> some numbers, including the number 0 at the end of the XML file). This
> junk prevents the XML from being parsed correctly.
>
> Does anybody have any idea how to get rid of this junk, or convert
> these characters to empty strings?


This reminds me of the raw responses you get when using the "chunked" transfer
encoding. Check the user notes on the fsockopen() and fpassthru() manual pages.
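
If that is what is happening here, the stray values would be chunk sizes rather than characters: in chunked transfer encoding each chunk is preceded by its length in hexadecimal on its own line, and a chunk of length 0 ends the body. A minimal sketch of undoing that framing by hand, assuming the whole raw body is already in a string with its original CRLF line endings intact; `decode_chunked` is a made-up name:

```php
<?php
// Hypothetical helper: decode an HTTP/1.1 "chunked" body that was read raw.
// Each chunk is "<size in hex>[;extensions]\r\n<size bytes>\r\n"; a chunk of
// size 0 ends the body. Trailer headers after the last chunk are ignored.
function decode_chunked(string $body): string
{
    $decoded = '';
    $pos = 0;
    $len = strlen($body);

    while ($pos <= $len && ($eol = strpos($body, "\r\n", $pos)) !== false) {
        // The size line may carry ";extension" text after the hex digits.
        $sizeLine = substr($body, $pos, $eol - $pos);
        $hex = trim(explode(';', $sizeLine, 2)[0]);
        if ($hex === '' || !ctype_xdigit($hex)) {
            throw new InvalidArgumentException('Malformed chunk size line');
        }

        $size = (int) hexdec($hex);
        if ($size === 0) {
            break; // terminating chunk
        }

        $decoded .= substr($body, $eol + 2, $size);
        $pos = $eol + 2 + $size + 2; // skip the data and its trailing CRLF
    }

    return $decoded;
}

// Build a sample body whose second chunk starts in the middle of a tag,
// the way "1ffc" (0x1ffc = 8188 bytes) splits <thirdtag> in the question.
$parts = ["<firsttag>\n<third", "tag>x</thirdtag>\n</firsttag>"];
$raw = '';
foreach ($parts as $part) {
    $raw .= dechex(strlen($part)) . "\r\n" . $part . "\r\n";
}
$raw .= "0\r\n\r\n";

echo decode_chunked($raw), "\n";
```

Stripping the numbers with a regex instead would be risky, since the same digits could also occur in real data.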

Also, if you're downloading the file from your script, I'd suggest you try
the cURL functions and see if the garbage goes away.
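
A rough sketch of that suggestion, assuming the cURL extension is loaded. `fetch_body` and the commented-out URL are placeholders, not from the thread:

```php
<?php
// Hypothetical helper: download a URL with the cURL extension, which
// removes chunked transfer framing before handing back the body.
function fetch_body(string $url): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('Could not initialise cURL');
    }

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP status >= 400 as a failure

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Download failed: ' . curl_error($ch));
    }

    return $body;
}

// Placeholder URL for illustration only:
// $xml = fetch_body('http://example.com/feed.xml');
```

With this approach the script never sees the chunk-size lines, so no cleanup step should be needed for them.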

--
-+ Álvaro G. Vicario - Burgos, Spain
++ http://bits.demogracia.com is my site for web programmers
+- http://www.demogracia.com is my chlorine-free humor site
--
Jun 21 '06 #6
