Parse HTML ASCII

When parsing HTML, is it possible to have all the ASCII codes (HTML character
entities) converted to their real characters first, so that I do not need to
search for them in order to exclude them?

For example, the following is retrieved as a price; however, it would be easier
to extract using a regex if the code were first converted to a dollar sign:

<h3>

&#36;249,000

</h3>

Thanks in advance...
Jun 28 '06 #1

Hi McHenry,

You're probably looking for html_entity_decode():

<?php
echo html_entity_decode('<h3>&#36;249,000</h3>');
?>
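
As a rough sketch of where this leads, you could decode first and then pull the number out with a regex. This assumes the price arrives with the dollar sign written as the numeric entity &#36;, and that the markup is as simple as the fragment in the question. The variable names are only for illustration:

```php
<?php
// Assumed input: the price fragment with the dollar sign as a numeric entity.
$html = '<h3>&#36;249,000</h3>';

// Decode entities first so the regex can match a literal dollar sign.
$decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

// Capture the digits and thousands separators that follow the dollar sign.
if (preg_match('~<h3>\s*\$([0-9,]+)\s*</h3>~', $decoded, $m)) {
    // Strip the commas to get a plain integer.
    $price = (int) str_replace(',', '', $m[1]);
    echo $price, "\n"; // prints 249000
}
```

Decoding before matching means the pattern only has to deal with a literal dollar sign, not every way the character might be encoded.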

HTH.
Ruben.

--
http://www.phpforums.nl
Jun 28 '06 #2

"Ruben van Engelenburg" <ru***@NOSPAM.nl> wrote in message
news:44***********************@news.xs4all.nl...

When I run the example code above, it outputs the HTML exactly as it appears
above and doesn't convert the ASCII codes.
Jun 29 '06 #3
McHenry wrote:
> When I run the example code above, it outputs the HTML as it appears above
> and doesn't convert the ASCII codes?


When I run it, I get $249,000

<?php
echo html_entity_decode('<h3>&#36;249,000</h3>');
?>

I am using PHP Designer and PHP version 5.1.4

Thanks in Advance...
IchBin, Pocono Lake, Pa, USA http://weconsultants.phpnet.us
__________________________________________________ ________________________

'If there is one, Knowledge is the "Fountain of Youth"'
-William E. Taylor, Regular Guy (1952-)
Jun 29 '06 #4

"IchBin" <we******@ptd.net> wrote in message
news:iM********************@ptd.net...

> when I run it I get $249,000


Is the output being displayed in the browser? Maybe the browser is converting
the ASCII codes itself, while PHP is still feeding it the raw codes.

If you write the function's output to a text file, you'll find it still
contains the raw codes...
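
One way to settle this without a browser in the loop is to compare the strings in PHP itself. This is only a sketch: the input assumes the dollar sign was written as &#36;, and the output path is a made-up location in the system temp directory.

```php
<?php
// Assumed input: the dollar sign written as a numeric entity.
$raw = '<h3>&#36;249,000</h3>';
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

// A browser would render both strings the same way, so compare them here.
var_dump($raw === $decoded);                    // bool(false) if decoding changed the string
var_dump(strpos($decoded, '&#36;') === false);  // bool(true) if the entity is gone

// Write the decoded text to a file and inspect it in a plain text editor.
$path = sys_get_temp_dir() . '/decoded.txt';    // hypothetical file name
file_put_contents($path, $decoded);
```

If the first check reports that the strings are identical, the entity was probably already decoded (or mangled) before the string ever reached html_entity_decode().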

Jun 29 '06 #5
"McHenry" <mc*****@mchenry.com> wrote in message
news:44***********************@per-qv1-newsreader-01.iinet.net.au...

> Is the output being displayed in the browser? Maybe the browser is converting
> the ASCII codes itself, while PHP is still feeding it the raw codes.
>
> If you write the function's output to a text file, you'll find it still
> contains the raw codes...


Read The Fine Manual :)

http://php.net/html-entity-decode

On that page, there's some example code:

function unhtmlentities($string)
{
    // replace numeric entities
    // (note: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0)
    $string = preg_replace('~&#x([0-9a-f]+);~ei', 'chr(hexdec("\\1"))', $string);
    $string = preg_replace('~&#([0-9]+);~e', 'chr(\\1)', $string);
    // replace literal entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}

Try and see if it works.
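
Since the /e modifier in that example no longer exists in current PHP, here is a minimal sketch of the same idea using preg_replace_callback(). The function name unhtmlentities_callback is made up for illustration. Like the original it relies on chr(), so it only handles single-byte code points (0-255):

```php
<?php
// Sketch of the manual's helper without the /e modifier.
// Hypothetical name; chr() limits this to code points 0-255.
function unhtmlentities_callback($string)
{
    // Replace hexadecimal numeric entities such as &#x24;
    $string = preg_replace_callback(
        '~&#x([0-9a-f]+);~i',
        function ($m) { return chr(hexdec($m[1])); },
        $string
    );
    // Replace decimal numeric entities such as &#36;
    $string = preg_replace_callback(
        '~&#([0-9]+);~',
        function ($m) { return chr((int) $m[1]); },
        $string
    );
    // Replace named entities using the flipped translation table.
    $table = array_flip(get_html_translation_table(HTML_ENTITIES));
    return strtr($string, $table);
}

echo unhtmlentities_callback('<h3>&#36;249,000</h3>'), "\n";
```

On any reasonably recent PHP, html_entity_decode() already handles numeric entities on its own, so a helper like this is mainly useful for seeing what the manual's example was doing.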

--
"ohjelmoija on organismi joka muuttaa kofeiinia koodiksi" -lpk
sp**@outolempi.net | Gedoon-S @ IRCnet | rot13(xv***@bhgbyrzcv.arg)
Jun 29 '06 #6
McHenry wrote:
> Is the output being displayed in the browser? Maybe the browser is converting
> the ASCII codes itself, while PHP is still feeding it the raw codes.
>
> If you write the function's output to a text file, you'll find it still
> contains the raw codes...


What exactly do you mean by "raw codes"? And also what tool are you
using to view the stored file?

I'd try storing the output UTF-8 encoded, either by using something like:

utf8_encode(html_entity_decode('<h3>&#36;249,000</h3>'));

or just pass the output encoding as the third parameter to
html_entity_decode().

Then store that in a text file and open it in a Unicode-aware editor.
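
For what it's worth, the third-parameter approach could look like the sketch below. It assumes UTF-8 is the encoding you want. The euro entity is only there to show a character outside ASCII, and the file path is a hypothetical temp location.

```php
<?php
// ENT_QUOTES decodes both single and double quotes; 'UTF-8' is the output encoding.
$decoded = html_entity_decode('<h3>&euro;249,000</h3>', ENT_QUOTES, 'UTF-8');

// The euro sign takes three bytes in UTF-8.
echo bin2hex(html_entity_decode('&euro;', ENT_QUOTES, 'UTF-8')), "\n"; // prints e282ac

// Store the result and open it in a Unicode-aware editor.
$path = sys_get_temp_dir() . '/price-utf8.txt'; // hypothetical file name
file_put_contents($path, $decoded);
```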

HTH.
Ruben.
--
http://www.phpforums.nl
Jun 29 '06 #7
