
Parse HTML ASCII

McHenry
#1: Jun 28 '06
When parsing HTML, is it possible to have all the ASCII codes converted to
their actual characters first, so that I do not need to search for them and
exclude them?

For example, the following is retrieved as a price; however, it would be
easier to extract with a regex if the code were first converted to a dollar sign:

<h3>

$249,000

</h3>

Thanks in advance...



Ruben van Engelenburg
#2: Jun 28 '06

McHenry wrote:
> For example the following is retrieved as a price however it would be easier
> to extract using a regex if the code was first converted to a dollar sign:
>
> <h3>
>
> $249,000
>
> </h3>


Hi McHenry,

You're probably looking for html_entity_decode():

<?php
echo html_entity_decode('<h3>$249,000</h3>');
?>
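A minimal sketch of the decode-then-match approach McHenry describes. It assumes the page source contained the numeric character reference &#36; for the dollar sign (the posts in this thread show it already rendered as $, which is why the quoted call looks like it does nothing). The function name extract_price(), the price pattern and the explicit 'UTF-8' charset argument are illustrative choices, not part of Ruben's call.

```php
<?php
// Sketch only: decode character references first, then match the price.
// Assumption: the raw page uses "&#36;" (or "&#x24;") for "$".
// extract_price() is a hypothetical helper name used for illustration.

function extract_price(string $html): ?string
{
    // Turn "&#36;", "&#x24;", "&amp;" etc. into real characters.
    $text = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

    // Now a plain "$" can be matched, e.g. "$249,000" or "$1,250.50".
    if (preg_match('/\$\s*([0-9][0-9,]*(?:\.[0-9]{2})?)/', $text, $m)) {
        return $m[1];
    }
    return null; // no price found
}

$raw = "<h3>\n\n&#36;249,000\n\n</h3>";
echo extract_price($raw), "\n"; // prints 249,000
```

Decoding first means the regex only ever has to deal with a literal dollar sign, whichever form of reference the page happened to use.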

HTH.
Ruben.

--
http://www.phpforums.nl

McHenry
#3: Jun 29 '06

"Ruben van Engelenburg" <ruben@NOSPAM.nl> wrote in message
news:44a28738$0$31643$e4fe514c@news.xs4all.nl...
> You're probably looking for html_entity_decode():
>
> <?php
> echo html_entity_decode('<h3>$249,000</h3>');
> ?>

When I run the example code above, it outputs the HTML exactly as it appears
above and doesn't convert the ASCII codes?


IchBin
#4: Jun 29 '06

McHenry wrote:
> When I run the example code above it outputs the HTML as it appears above
> and doesn't convert the ascii codes ?

When I run it, I get $249,000.

<?php
echo html_entity_decode('<h3>$249,000</h3>');
?>

I am using PHP Designer and PHP version 5.1.4.

Thanks in Advance...
IchBin, Pocono Lake, Pa, USA http://weconsultants.phpnet.us
___________________________________________________________________________

'If there is one, Knowledge is the "Fountain of Youth"'
-William E. Taylor, Regular Guy (1952-)

McHenry
#5: Jun 29 '06

"IchBin" <weconsul@ptd.net> wrote in message
news:iMKcnZphKs4J6T7ZUSdV9g@ptd.net...
> when I run it I get $249,000

Is that the output displayed in the browser? Maybe the browser is converting
the ASCII codes, but PHP is still feeding it the raw codes.

If you write the function's output to a .txt file, you'll find it still
contains the raw codes...
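One way to test this claim without a browser in the way is to decode the string, write it to a temporary file, read it back and dump it with var_dump(), which shows the exact string and its byte length. This is a sketch assuming the input used the numeric reference &#36; (the thread shows it already rendered as $); the temporary file exists only for the demonstration. Run it from the command line so nothing renders the HTML.

```php
<?php
// Sketch: check what PHP itself writes out, bypassing any browser rendering.
// Assumption: the input uses the numeric reference "&#36;" for "$".

$input   = '<h3>&#36;249,000</h3>';
$decoded = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

// Write the decoded string to a temporary text file and read it back.
$file = tempnam(sys_get_temp_dir(), 'decode');
file_put_contents($file, $decoded);
$fromFile = file_get_contents($file);
unlink($file); // clean up the temporary file

// An undecoded entity would still show up literally as "&#36;" here.
var_dump($fromFile);                            // string(17) "<h3>$249,000</h3>"
var_dump(strpos($fromFile, '&#36;') === false); // bool(true)
```

If the file still contained &#36;, the first dump would report a longer string with the entity spelled out.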


Kimmo Laine
#6: Jun 29 '06

"McHenry" <mchenry@mchenry.com> wrote in message
news:44a377a2$0$12236$5a62ac22@per-qv1-newsreader-01.iinet.net.au...
> Output displayed in the browser ? Maybe the browser is converting the
> ASCII however it is still being fed the raw codes by PHP
>
> If you output the function to a txt file you'll find it's still the raw
> codes...

Read The Fine Manual :)

http://php.net/html-entity-decode

On that page, there's some example code:
function unhtmlentities($string)
{
    // replace numeric entities
    // (the /e modifier used here was deprecated in PHP 5.5 and removed in PHP 7.0)
    $string = preg_replace('~&#x([0-9a-f]+);~ei', 'chr(hexdec("\\1"))', $string);
    $string = preg_replace('~&#([0-9]+);~e', 'chr(\\1)', $string);
    // replace literal (named) entities by flipping the translation table
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}
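The function above relies on the /e modifier, which no longer exists in current PHP, and chr() only produces a single byte, so codes above 255 come out wrong. A sketch of the same idea for current PHP (7.4 or later, for the arrow functions) swaps in preg_replace_callback() for the numeric references; utf8_chr() is a hypothetical helper written here to encode a code point as UTF-8, not a PHP built-in.

```php
<?php
// Sketch of unhtmlentities() for current PHP (7.4+).
// utf8_chr() is a hypothetical helper: code point -> UTF-8 byte string.

function utf8_chr(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);                                   // 1 byte (ASCII)
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

function unhtmlentities(string $string): string
{
    // Hexadecimal references such as &#x24;
    $string = preg_replace_callback(
        '~&#x([0-9a-f]+);~i',
        fn(array $m): string => utf8_chr((int) hexdec($m[1])),
        $string
    );
    // Decimal references such as &#36;
    $string = preg_replace_callback(
        '~&#([0-9]+);~',
        fn(array $m): string => utf8_chr((int) $m[1]),
        $string
    );
    // Named references such as &amp; or &eacute;, via the flipped table
    $trans_tbl = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8'));
    return strtr($string, $trans_tbl);
}

echo unhtmlentities('<h3>&#36;249,000</h3>'), "\n"; // prints <h3>$249,000</h3>
```

In practice, on any PHP 5 or later, html_entity_decode($s, ENT_QUOTES, 'UTF-8') already decodes numeric references, so a hand-rolled version like this is mainly useful for seeing how the conversion works.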

Try and see if it works.

--
"a programmer is an organism that turns caffeine into code" -lpk
spam@outolempi.net | Gedoon-S @ IRCnet | rot13(xvzzb@bhgbyrzcv.arg)


Ruben van Engelenburg
#7: Jun 29 '06

McHenry wrote:
> Output displayed in the browser ? Maybe the browser is converting the ASCII
> however it is still being fed the raw codes by PHP
>
> If you output the function to a txt file you'll find it's still the raw
> codes...

What exactly do you mean by "raw codes"? And what tool are you using to
view the stored file?

I'd try storing the output UTF-8 encoded, either by using something like:

utf8_encode(html_entity_decode('<h3>$249,000</h3>'));

or by simply passing the output encoding as the third parameter of
html_entity_decode().

Then store that in a text file and open it in a Unicode-aware editor.
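A sketch of the third-parameter route: ask html_entity_decode() for UTF-8 output directly, save it, and check that the file holds well-formed UTF-8. Before PHP 5.4 the function's default charset was ISO-8859-1, which is why wrapping the call in utf8_encode() helped; utf8_encode() is deprecated as of PHP 8.2. The sample input (including the &euro; entity) and the file name price.txt are assumptions for illustration only.

```php
<?php
// Sketch: decode straight to UTF-8 via the third (encoding) argument.
// Sample input and file name are illustrative assumptions.

$html = '<h3>&#36;249,000 &euro;</h3>';
$utf8 = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

// Store it in a text file to open in a Unicode-aware editor.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'price.txt';
file_put_contents($path, $utf8);

// preg_match() with the /u modifier fails on invalid UTF-8, so a
// result of 1 confirms the stored bytes are well-formed UTF-8.
$valid = preg_match('//u', file_get_contents($path)) === 1;
echo $utf8, "\n", $valid ? "valid UTF-8\n" : "not UTF-8\n";
```

The euro sign is included because it has no ISO-8859-1 byte at all, so it only survives if the output really is UTF-8.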

HTH.
Ruben.
--
http://www.phpforums.nl