reg exp

A Perl script is formatting text for an HTML page. It changes things like
& to &amp;, but it should not change &nbsp;. It uses \ as an escape
character, so \&nbsp; becomes &nbsp;. The final results are
correct, but is there a better way to do this?

Input file test.txt
\HOME & \&nbsp; BORN \& FREE BORN FREE ' \' HELP " \" w\\\\\\\w

1st change
1a= \HOME &amp; \&nbsp; BORN \& FREE BORN FREE '' \' HELP &quot; \" w\\\\\\\w
2nd change
1b= HOME &amp; &nbsp; BORN & FREE BORN FREE '' ' HELP &quot; " w\\\w

#!/usr/local/bin/perl5
#
%encode = ( '&' => '&amp;',
'"' => '&quot;',
'\'' => '\'\'' );

$data = `cat test.txt`;
print "Oa= $data\n";
$data =~ s/(?<!\\)(.)/defined($encode{$1})?$encode{$1}:$1/eg;
print "1a= $data\n";
$data =~ s/(\\)(.)/$2/g;
print "1b= $data\n";
This is perl, v5.8.0 built for PA-RISC2.0 On HP-Unix.
Jul 19 '05 #1
Ken Chesak wrote:
A Perl script is formatting text for an HTML page. It changes things like
& to &amp;, but it should not change &nbsp;. It uses \ as an escape
character, so \&nbsp; becomes &nbsp;. The final results are
correct, but is there a better way to do this?

Input file test.txt
\HOME & \&nbsp; BORN \& FREE BORN FREE ' \' HELP " \" w\\\\\\\w

1st change
1a= \HOME &amp; \&nbsp; BORN \& FREE BORN FREE '' \' HELP &quot; \"
w\\\\\\\w
2nd changes
1b= HOME &amp; &nbsp; BORN & FREE BORN FREE '' ' HELP &quot; "
w\\\w

#!/usr/local/bin/perl5
#
%encode = ( '&' => '&amp;',
'"' => '&quot;',
'\'' => '\'\'' );

$data = `cat test.txt`;
print "Oa= $data\n";
$data =~ s/(?<!\\)(.)/defined($encode{$1})?$encode{$1}:$1/eg;
print "1a= $data\n";
$data =~ s/(\\)(.)/$2/g;
print "1b= $data\n";


Don't know about better, but this does it with one substitution, and
does not require escaping of HTML entities in the original text:

$data =~ s{(&#?\w+;)|\\(.)|([&"'])}
{ $1 ? $1 : $2 ? $2 : $encode{$3} }eg;
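
A minimal sketch for trying the single substitution against the sample input from the question. The sub name encode_html is made up for the illustration, and the replacement tests with defined() rather than plain truth, which is a small change from the line as posted:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Characters to encode, as in the original script.
my %encode = (
    '&' => '&amp;',
    '"' => '&quot;',
    "'" => "''",
);

# encode_html is a hypothetical helper name, not something from the thread.
sub encode_html {
    my ($data) = @_;
    # Alternative 1 keeps an existing entity such as &nbsp; or &#160; as it is.
    # Alternative 2 drops the backslash and keeps the escaped character.
    # Alternative 3 encodes a bare &, " or '.
    $data =~ s{(&#?\w+;)|\\(.)|([&"'])}
              { defined($1) ? $1 : defined($2) ? $2 : $encode{$3} }eg;
    return $data;
}

# A single-quoted here-doc keeps every backslash literally.
my $input = <<'END';
\HOME & \&nbsp; BORN \& FREE BORN FREE ' \' HELP " \" w\\\\\\\w
END
chomp $input;

print encode_html($input), "\n";
```

For the sample line this gives the same text as the 1b= output of the two-pass script, and an unescaped &nbsp; in the input is left alone as well.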

Another thing is that I'm a bit confused about the wider purpose with
the exercise...

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Jul 19 '05 #2
Ken Chesak wrote:
A Perl script is formatting text for an HTML page. It changes things like
& to &amp;, but it should not change &nbsp;.


You've got bad or inconsistent input data.
Whatever process created the "&nbsp;" items is responsible for making
sure that all the other & occurrences are set to "&amp;". You should
fix the upstream process instead of post-processing its output.
-Joe
Jul 19 '05 #3
Gunnar,

Thanks, that works nicely. I had not thought of using the ";" to
anchor the HTML reserved words.

I have one question: what do the ? and : do on the following line?
{ $1 ? $1 : $2 ? $2 : $encode{$3} }eg;

The purpose of the script is to format the text for HTML. It was
originally changing every & to &amp;, so when they started putting &nbsp;
in, it was changed to &amp;nbsp;, which does not mean anything
to HTML.

Thanks again,
Ken
Jul 19 '05 #4
Ken Chesak wrote:
I had one question, what does the ? and : do on the following line,
{ $1 ? $1 : $2 ? $2 : $encode{$3} }eg;


It's called the conditional operator, and is a shorter way of writing

if ($1) {
    $1
} elsif ($2) {
    $2
} else {
    $encode{$3}
}

See "perldoc perlop".
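
To see that the two forms agree, here is a small sketch with ordinary variables standing in for the capture groups. The sub names pick_ternary and pick_if are made up for the illustration:

```perl
use strict;
use warnings;

# Nested conditional operator. ?: is right-associative, so this parses as
# $x ? $x : ($y ? $y : $z).
sub pick_ternary {
    my ($x, $y, $z) = @_;
    return $x ? $x : $y ? $y : $z;
}

# The same decision written out with if/elsif/else.
sub pick_if {
    my ($x, $y, $z) = @_;
    if ($x) {
        return $x;
    } elsif ($y) {
        return $y;
    } else {
        return $z;
    }
}

# Each row prints the same value twice: "a a", then "b b", then "c c".
for my $args ( [ 'a', 'b', 'c' ], [ undef, 'b', 'c' ], [ undef, undef, 'c' ] ) {
    printf "%s %s\n", pick_ternary(@$args), pick_if(@$args);
}
```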

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Jul 19 '05 #5
Or a longer way of writing...

$1 || $2 || $encode{$3}

...depending on your point of view.
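
One caveat applies to both the ?: form and the || form: they test for truth, and Perl treats the string "0" as false, so an escaped \0 in the input would fall through to $encode{$3} and vanish. A sketch of the difference, using made-up sub names. The // (defined-or) operator would be shorter, but it needs Perl 5.10 or later and this thread is on 5.8.0:

```perl
use strict;
use warnings;

# Picks the first TRUE value, like $1 || $2 || $encode{$3}.
sub first_true {
    my ($x, $y, $z) = @_;
    return $x || $y || $z;
}

# Picks the first DEFINED value, so "0" and "" are kept.
sub first_defined {
    my ($x, $y, $z) = @_;
    return defined($x) ? $x : defined($y) ? $y : $z;
}

print first_true(undef, 'a', 'z'), "\n";      # a
print first_true(undef, '0', 'z'), "\n";      # z, because "0" is false
print first_defined(undef, '0', 'z'), "\n";   # 0
```

For input that never escapes a 0 the shorter forms behave the same, so whether the extra defined() checks are worth it depends on the data.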
Jul 19 '05 #6
