
Parse HTML ASCII

 
  #1  
June 28th, 2006, 01:25 PM
McHenry

When parsing HTML, is it possible to have all the ASCII codes (HTML character
entities) converted to their real values first, so that I do not need to search
for them and exclude them?

For example, the following is retrieved as a price; however, it would be easier
to extract using a regex if the code was first converted to a dollar sign:

<h3>

$249,000

</h3>

Thanks in advance...



  #2  
June 28th, 2006, 01:45 PM
Ruben van Engelenburg

McHenry wrote:
> For example the following is retrieved as a price however it would be easier
> to extract using a regex if the code was first converted to a dollar sign:
>
> <h3>
>
> $249,000
>
> </h3>


Hi McHenry,

You're probably looking for html_entity_decode():

<?php
echo html_entity_decode('<h3>$249,000</h3>');
?>
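
To tie this back to the original goal (pulling the price out with a regex), here is a minimal sketch. It assumes the dollar sign arrives as the numeric entity `&#36;`, that the target charset is UTF-8, and that prices look like `$249,000`; the helper name `extract_price` is made up for illustration.

```php
<?php
// Hypothetical helper: decode entities first, then match a dollar amount.
function extract_price(string $html): ?string
{
    // ENT_QUOTES also decodes quote entities; 'UTF-8' is the assumed charset.
    $decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

    // A dollar sign, digits with optional thousands separators, optional cents.
    if (preg_match('/\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?/', $decoded, $m)) {
        return $m[0];
    }
    return null; // no price found
}

echo extract_price("<h3>\n&#36;249,000\n</h3>"), "\n";
```

Decoding before matching means the regex only has to know about the literal `$`, not about every entity form the page might use.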

HTH.
Ruben.

--
http://www.phpforums.nl
  #3  
June 29th, 2006, 04:25 AM
McHenry


"Ruben van Engelenburg" <ruben@NOSPAM.nl> wrote in message
news:44a28738$0$31643$e4fe514c@news.xs4all.nl...
> McHenry wrote:
>> For example the following is retrieved as a price however it would be
>> easier to extract using a regex if the code was first converted to a
>> dollar sign:
>>
>> <h3>
>>
>> $249,000
>>
>> </h3>
>
>
> Hi McHenry,
>
> You're probably looking for html_entity_decode():
>
> <?php
> echo html_entity_decode('<h3>$249,000</h3>');
> ?>
>
> HTH.
> Ruben.
>
> --
> http://www.phpforums.nl

When I run the example code above, it outputs the HTML exactly as it appears
above and doesn't convert the ASCII codes.


  #4  
June 29th, 2006, 06:35 AM
IchBin

McHenry wrote:
> "Ruben van Engelenburg" <ruben@NOSPAM.nl> wrote in message
> news:44a28738$0$31643$e4fe514c@news.xs4all.nl...
>> McHenry wrote:
>>> For example the following is retrieved as a price however it would be
>>> easier to extract using a regex if the code was first converted to a
>>> dollar sign:
>>>
>>> <h3>
>>>
>>> $249,000
>>>
>>> </h3>
>>
>> Hi McHenry,
>>
>> You're probably looking for html_entity_decode():
>>
>> <?php
>> echo html_entity_decode('<h3>$249,000</h3>');
>> ?>
>>
>> HTH.
>> Ruben.
>>
>> --
>> http://www.phpforums.nl
>
> When I run the example code above it outputs the HTML as it appears above
> and doesn't convert the ascii codes ?
>
>

When I run it, I get $249,000:

<?php
echo html_entity_decode('<h3>$249,000</h3>');
?>

I am using PHP Designer and PHP version 5.1.4

Thanks in Advance...
IchBin, Pocono Lake, Pa, USA http://weconsultants.phpnet.us
__________________________________________________ ________________________

'If there is one, Knowledge is the "Fountain of Youth"'
-William E. Taylor, Regular Guy (1952-)
  #5  
June 29th, 2006, 06:45 AM
McHenry


"IchBin" <weconsul@ptd.net> wrote in message
news:iMKcnZphKs4J6T7ZUSdV9g@ptd.net...
> McHenry wrote:
>> "Ruben van Engelenburg" <ruben@NOSPAM.nl> wrote in message
>> news:44a28738$0$31643$e4fe514c@news.xs4all.nl...
>>> McHenry wrote:
>>>> For example the following is retrieved as a price however it would be
>>>> easier to extract using a regex if the code was first converted to a
>>>> dollar sign:
>>>>
>>>> <h3>
>>>>
>>>> $249,000
>>>>
>>>> </h3>
>>>
>>> Hi McHenry,
>>>
>>> You're probably looking for html_entity_decode():
>>>
>>> <?php
>>> echo html_entity_decode('<h3>$249,000</h3>');
>>> ?>
>>>
>>> HTH.
>>> Ruben.
>>>
>>> --
>>> http://www.phpforums.nl
>>
>> When I run the example code above it outputs the HTML as it appears above
>> and doesn't convert the ascii codes ?
>
> when I run it I get $249,000

Is that output displayed in the browser? Maybe the browser is converting the
ASCII codes, while PHP is still feeding it the raw codes.

If you write the function's output to a text file, you'll find it still
contains the raw codes...
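
One way to take the browser out of the picture is to run the script from the command line and inspect the string itself. This is only a sketch: it assumes the input uses the numeric entity `&#36;`, and `has_numeric_entities` is a made-up helper name.

```php
<?php
// Hypothetical check: does the text still contain numeric entities?
function has_numeric_entities(string $text): bool
{
    // Matches decimal (&#36;) and hexadecimal (&#x24;) forms.
    return preg_match('/&#(?:x[0-9a-f]+|[0-9]+);/i', $text) === 1;
}

$raw     = '<h3>&#36;249,000</h3>';
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

var_dump(has_numeric_entities($raw));     // the input contains an entity
var_dump(has_numeric_entities($decoded)); // the decoded string should not
echo bin2hex($decoded), "\n";             // hex dump; 24 is the byte for "$"
```

If the second `var_dump` reports `true` on a given PHP version, the decoding really is not happening on the PHP side, whatever the browser shows.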

>
> <?php
> echo html_entity_decode('<h3>$249,000</h3>');
> ?>
>
> I am using PHP Designer and PHP version 5.1.4
>
> Thanks in Advance...
> IchBin, Pocono Lake, Pa, USA http://weconsultants.phpnet.us
> __________________________________________________ ________________________
>
> 'If there is one, Knowledge is the "Fountain of Youth"'
> -William E. Taylor, Regular Guy (1952-)


  #6  
June 29th, 2006, 08:05 AM
Kimmo Laine

"McHenry" <mchenry@mchenry.com> wrote in message
news:44a377a2$0$12236$5a62ac22@per-qv1-newsreader-01.iinet.net.au...
>
> "IchBin" <weconsul@ptd.net> wrote in message
> news:iMKcnZphKs4J6T7ZUSdV9g@ptd.net...
>> McHenry wrote:
>>> "Ruben van Engelenburg" <ruben@NOSPAM.nl> wrote in message
>>> news:44a28738$0$31643$e4fe514c@news.xs4all.nl...
>>>> McHenry wrote:
>>>>> For example the following is retrieved as a price however it would be
>>>>> easier to extract using a regex if the code was first converted to a
>>>>> dollar sign:
>>>>>
>>>>> <h3>
>>>>>
>>>>> $249,000
>>>>>
>>>>> </h3>
>>>>
>>>> Hi McHenry,
>>>>
>>>> You're probably looking for html_entity_decode():
>>>>
>>>> <?php
>>>> echo html_entity_decode('<h3>$249,000</h3>');
>>>> ?>
>>>>
>>>> HTH.
>>>> Ruben.
>>>>
>>>> --
>>>> http://www.phpforums.nl
>>>
>>> When I run the example code above it outputs the HTML as it appears
>>> above and doesn't convert the ascii codes ?
>>
>> when I run it I get $249,000
>
> Output displayed in the browser ? Maybe the browser is converting the
> ASCII however it is still being fed the raw codes by PHP
>
> If you output the function to a txt file you'll find it's still the raw
> codes...
>

Read The Fine Manual :)

http://php.net/html-entity-decode

On that page, there's some example code:

function unhtmlentities($string)
{
    // replace numeric entities
    $string = preg_replace('~&#x([0-9a-f]+);~ei', 'chr(hexdec("\\1"))', $string);
    $string = preg_replace('~&#([0-9]+);~e', 'chr(\\1)', $string);
    // replace literal entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}

Try and see if it works.
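
A note for anyone on a newer PHP: the `/e` modifier used above was deprecated in PHP 5.5 and removed in PHP 7, so the snippet no longer runs there as written. Below is a rough sketch of the same idea using `preg_replace_callback()`. The function name `unhtmlentities_callback` is made up for this example, and like the original it relies on `chr()`, so it only handles code points that fit in a single byte.

```php
<?php
// Hypothetical modern variant of the manual's unhtmlentities() example.
function unhtmlentities_callback(string $string): string
{
    // Hexadecimal numeric entities, e.g. &#x24;
    $string = preg_replace_callback(
        '~&#x([0-9a-f]+);~i',
        function (array $m): string {
            return chr((int) hexdec($m[1]));
        },
        $string
    );
    // Decimal numeric entities, e.g. &#36;
    $string = preg_replace_callback(
        '~&#([0-9]+);~',
        function (array $m): string {
            return chr((int) $m[1]);
        },
        $string
    );
    // Named entities, e.g. &amp;
    $table = array_flip(get_html_translation_table(HTML_ENTITIES));
    return strtr($string, $table);
}

echo unhtmlentities_callback('<h3>&#36;249,000</h3>'), "\n";
```

Numeric entities are handled before named ones, so an escaped sequence such as `&amp;#36;` ends up as the literal text `&#36;` rather than being decoded twice.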

--
"ohjelmoija on organismi joka muuttaa kofeiinia koodiksi" -lpk
spam@outolempi.net | Gedoon-S @ IRCnet | rot13(xvzzb@bhgbyrzcv.arg)


  #7  
June 29th, 2006, 08:05 AM
Ruben van Engelenburg

McHenry wrote:

> Output displayed in the browser ? Maybe the browser is converting the ASCII
> however it is still being fed the raw codes by PHP
>
> If you output the function to a txt file you'll find it's still the raw
> codes...

What exactly do you mean by "raw codes"? And what tool are you
using to view the stored file?

I'd try storing the output UTF-8 encoded, either by using something like:

utf8_encode(html_entity_decode('<h3>$249,000</h3>'));

or by passing the output encoding as the third parameter to
html_entity_decode().

Then store that in a text file and open it in a Unicode-aware editor.
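
For the third-parameter route, something along these lines should do. It is a sketch under a few assumptions: the input uses entities such as `&#36;` and `&euro;`, UTF-8 is the encoding you want on disk, and `save_decoded` is a made-up helper name. Note that `utf8_encode()` has been deprecated since PHP 8.2, so passing the encoding directly is the safer option on current versions.

```php
<?php
// Hypothetical helper: decode entities to UTF-8 and write the result to a file.
function save_decoded(string $html, string $path): string
{
    // Third argument: the character set of the decoded output.
    $decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    file_put_contents($path, $decoded);
    return $decoded;
}

// Write to a temporary file, read it back, then clean up.
$path = tempnam(sys_get_temp_dir(), 'dec');
save_decoded('<h3>&#36;249,000 &euro;</h3>', $path);
echo file_get_contents($path), "\n";
unlink($path);
```

Reading the file back in PHP (or opening it in a Unicode-aware editor) shows what was actually stored, independent of any browser rendering.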

HTH.
Ruben.
--
http://www.phpforums.nl
 
