
Parse HTML ASCII

When parsing HTML, is it possible to have all the ASCII codes converted to
their real values first, so that I do not need to search for them and exclude
them?

For example, the following is retrieved as a price; however, it would be easier
to extract using a regex if the code were first converted to a dollar sign:

<h3>

&#36;249,000

</h3>

Thanks in advance...
Jun 28 '06 #1
6 Replies


Hi McHenry,

You're probably looking for html_entity_decode():

<?php
echo html_entity_decode('<h3>&#36;249,000</h3>');
?>

HTH.
Ruben.

--
http://www.phpforums.nl
Jun 28 '06 #2


When I run the example code above, it outputs the HTML exactly as it appears
above and doesn't convert the ASCII codes.
Jun 29 '06 #3


When I run it, I get $249,000 with this code:

<?php
echo html_entity_decode('<h3>&#36;249,000</h3>');
?>

I am using PHP Designer and PHP version 5.1.4.

Thanks in Advance...
IchBin, Pocono Lake, Pa, USA http://weconsultants.phpnet.us
__________________________________________________ ________________________

'If there is one, Knowledge is the "Fountain of Youth"'
-William E. Taylor, Regular Guy (1952-)
Jun 29 '06 #4

Was the output displayed in a browser? The browser may be converting the ASCII
codes itself while PHP is still feeding it the raw codes.

If you write the function's output to a .txt file, you'll find it still
contains the raw codes...
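
One way to settle whether PHP or the browser does the conversion is to look at the bytes PHP produces with no browser in between. A minimal sketch, assuming the markup carried the dollar sign as the numeric entity &#36; and that the script is run from the command line; the variable names are made up for illustration:

```php
<?php
// Assumed input: the dollar sign written as the numeric entity &#36;
$html = '<h3>&#36;249,000</h3>';

$decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

// var_dump() prints the string length next to the value, so nothing
// downstream can quietly convert the entity for you.
var_dump($html);     // length 21: the entity is still five characters
var_dump($decoded);  // length 17: the entity has become one character

// bin2hex() shows the byte at the position of the former entity.
echo bin2hex(substr($decoded, 4, 1)), "\n"; // 24 is the ASCII code of "$" in hex
```

If both lengths come out the same, the decoding did not happen in PHP and any dollar sign on screen comes from whatever is displaying the output.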

Jun 29 '06 #5


Read The Fine Manual :)

http://php.net/html-entity-decode

On that page, there's some example code:

function unhtmlentities($string)
{
    // replace numeric entities
    // (relies on the /e modifier, which PHP 7 removed)
    $string = preg_replace('~&#x([0-9a-f]+);~ei', 'chr(hexdec("\\1"))', $string);
    $string = preg_replace('~&#([0-9]+);~e', 'chr(\\1)', $string);
    // replace literal entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}

Try and see if it works.
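
On PHP 7 and later the /e modifier no longer exists, so a function built on it will not run there. Below is a sketch of the same idea using preg_replace_callback(). The name unhtmlentities_cb is hypothetical, closures need PHP 5.3 or newer, and like the manual's version it only handles code points that fit into chr(), i.e. single bytes:

```php
<?php
// Hypothetical name; same approach as the manual's helper, minus /e
function unhtmlentities_cb($string)
{
    // hexadecimal numeric entities, e.g. &#x24;
    $string = preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) {
        return chr(hexdec($m[1]));
    }, $string);
    // decimal numeric entities, e.g. &#36;
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) {
        return chr((int) $m[1]);
    }, $string);
    // named entities, e.g. &amp;
    $trans_tbl = array_flip(get_html_translation_table(HTML_ENTITIES));
    return strtr($string, $trans_tbl);
}

echo unhtmlentities_cb('<h3>&#36;249,000</h3>'), "\n"; // <h3>$249,000</h3>
echo unhtmlentities_cb('&#x24;5 &amp; up'), "\n";      // $5 & up
```

For anything beyond single-byte characters, html_entity_decode() with an explicit character set is probably the safer route.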

--
"ohjelmoija on organismi joka muuttaa kofeiinia koodiksi" -lpk
sp**@outolempi.net | Gedoon-S @ IRCnet | rot13(xv***@bhgbyrzcv.arg)
Jun 29 '06 #6

McHenry wrote:
> Output displayed in the browser? Maybe the browser is converting the ASCII
> however it is still being fed the raw codes by PHP
>
> If you output the function to a txt file you'll find it's still the raw
> codes...


What exactly do you mean by "raw codes"? And what tool are you using to view
the stored file?

I'd try storing the output UTF-8 encoded, either by using something like:

utf8_encode(html_entity_decode('<h3>&#36;249,000</h3>'));

or by passing the output encoding as the third parameter to
html_entity_decode().

Then store that in a text file and open it in a Unicode-aware editor.
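
A rough sketch of the third-parameter variant, carried through to the regex extraction the original question was after. It assumes the dollar sign arrives as the numeric entity &#36;; the file name decoded.txt and the variable names are made up for illustration:

```php
<?php
// Assumed input: the dollar sign written as the numeric entity &#36;
$html = '<h3>&#36;249,000</h3>';

// Third argument: the character set of the decoded output
$decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

// With a literal "$" in place, the price is easy to match
$price = null;
if (preg_match('~\$([0-9,]+)~', $decoded, $m)) {
    $price = (int) str_replace(',', '', $m[1]);
    echo $price, "\n"; // 249000
}

// Store the decoded text so it can be opened in a Unicode-aware editor
// (hypothetical file name, written to the system temp directory)
file_put_contents(sys_get_temp_dir() . '/decoded.txt', $decoded);
```

Decoding first and matching second keeps the regex free of entity handling, which is the simplification asked about at the top of the thread.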

HTH.
Ruben.
--
http://www.phpforums.nl
Jun 29 '06 #7
