urlencode vs rawurlencode

Hi All,

I can see from the manual that the difference between urlencode and
rawurlencode is that urlencode translates spaces to '+' characters, whereas
rawurlencode translates it into its hex code.

My question is, is there any real world difference between these two
functions? Or perhaps another way of asking the question: *why* are there
two different functions? In what situation would you need one, and not be
able to use the other?

Thanks!

-Josh
Jul 17 '05 #1
Joshua Beall wrote:
I can see from the manual that the difference between urlencode and
rawurlencode is that urlencode translates spaces to '+' characters, whereas
rawurlencode translates it into its hex code.

My question is, is there any real world difference between these two
functions?

I don't know.

Or perhaps another way of asking the question: *why* are there two
different functions?

A good question. I don't know the answer to that either.

A plus sign is reserved in the query component. A reserved character
may be used for its reserved purpose or, if it doesn't conflict with
the reserved purpose, as data.

Encoding spaces as plus signs is specific to form encoding. The
HTML4.01 specification describes the encoding process: "[i]f the
method is 'get' and the action is an HTTP URI, the user agent takes
the value of action, appends a `?' to it, then appends the form data
set, encoded using the 'application/x-www-form-urlencoded' content
type" (HTML4.01, sec. 17.13.3). So, here, spaces are encoded as plus
signs; elsewhere, spaces are encoded as "%20", as explained in
RFC2396, section 2.4.
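
To make the two encodings concrete, here is a minimal sketch of what each PHP function does with a space. The variable names and the sample value are only illustrative.

```php
<?php
// Hypothetical form value containing a space.
$value = 'foo bar';

// urlencode() follows application/x-www-form-urlencoded: space becomes "+".
$formStyle = urlencode($value);

// rawurlencode() percent-encodes the space instead.
$rawStyle = rawurlencode($value);

echo $formStyle, "\n"; // foo+bar
echo $rawStyle, "\n";  // foo%20bar
```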

Consider:

1. <http://domain.example/?baz=foo+bar>
2. <http://domain.example/?baz=foo%20bar>
3. <http://domain.example/?baz=foo%2Bbar>

All three are syntactically valid URIs. The first could be a URI
generated from an HTML form, where the action specified was
<http://domain.example/>, the method GET and the form data set
consisting of a control named "baz" with current value "foo bar". The
space in the current value is replaced with a plus sign.

Reading Björn Höhrmann's explanation of reserved characters in

"Re: Good/Bad - URI encoding in HTML editor",
http://lists.w3.org/Archives/Public/...2May/0032.html

we see that numbers one and two are *not* equivalent.

Also related is Terje Bless' request for clarification

"Ambiguity of Allowed/Recommended URI Syntax and Escaping",
http://lists.w3.org/Archives/Public/...2Nov/0014.html

In what situation would you need one, and not be able to use the other?

That depends on the URI generator, I think.

The documentation for urlencode says "[t]his function is convenient
when encoding a string to be used in a query part of a URL" [1]. I
don't see any reason to favour it over rawurlencode, however, which
encodes as per section 2.4 of RFC2396 (modulo the fact it always
encodes certain unreserved characters [2]).
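
One place where the choice does seem to matter is outside the query component: in a path segment a plus sign is just a plus sign, so a space there has to be written as "%20". The sketch below assumes PHP 5.4 or later (for the PHP_QUERY_RFC3986 flag of http_build_query); the file name, the /files/ path, and the form data are made up for illustration.

```php
<?php
// Hypothetical file name and form data, used only for illustration.
$fileName = 'annual report.pdf';
$params   = ['baz' => 'foo bar'];

// Path segment: "+" has no special meaning here, so use rawurlencode().
$path = '/files/' . rawurlencode($fileName);

// Query string, form style (the default, PHP_QUERY_RFC1738): space becomes "+".
$formQuery = http_build_query($params);

// Query string, percent-encoded style: space becomes "%20".
$rawQuery = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo 'http://domain.example' . $path . '?' . $formQuery, "\n";
// http://domain.example/files/annual%20report.pdf?baz=foo+bar
echo 'http://domain.example' . $path . '?' . $rawQuery, "\n";
// http://domain.example/files/annual%20report.pdf?baz=foo%20bar
```

Either query form should be read back correctly by a form-aware parser; only the path really requires the "%20" form.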

Refs.:

"Uniform Resource Identifiers (URI): Generic Syntax", 1998,
http://www.ietf.org/rfc/rfc2396.txt

"Uniform Resource Locators (URL)", 1994,
http://www.ietf.org/rfc/rfc1738.txt

[1] "PHP: urlencode - Manual",
http://www.php.net/manual/en/function.urlencode.php

[2] Section 2.3 of RFC2396 says:

| Unreserved characters can be escaped without changing the semantics
| of the URI, but this should not be done unless the URI is being used
| in a context that does not allow the unescaped character to appear.

--
Jock
Jul 17 '05 #2
If I sound confused, that's because I am.

John Dunlop wrote:
Consider:

1. <http://domain.example/?baz=foo+bar>
2. <http://domain.example/?baz=foo%20bar>
3. <http://domain.example/?baz=foo%2Bbar>
[ ... ]
Reading Björn Höhrmann's explanation of reserved characters in

"Re: Good/Bad - URI encoding in HTML editor",
http://lists.w3.org/Archives/Public/...2May/0032.html

we see that numbers one and two are *not* equivalent.


Actually, I think, numbers one and two are equivalent. Hopefully I've
got this straight in my head now. :-)
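
For what it's worth, PHP's own query parsing agrees with this reading. A minimal sketch, assuming parse_str() decodes a query string the same way PHP populates $_GET (which is how the manual describes it); the $uris and $decoded names are just for illustration:

```php
<?php
$uris = [
    'http://domain.example/?baz=foo+bar',
    'http://domain.example/?baz=foo%20bar',
    'http://domain.example/?baz=foo%2Bbar',
];

$decoded = [];
foreach ($uris as $uri) {
    // Extract the query component, then decode it as a form-style query.
    $query = parse_url($uri, PHP_URL_QUERY);
    parse_str($query, $fields);
    $decoded[] = $fields['baz'];
}

var_dump($decoded[0] === $decoded[1]); // bool(true): both are "foo bar"
var_dump($decoded[2]);                 // string(7) "foo+bar"
```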

RFC1630, which I hadn't read before, sums up Tim BL's original intent:

| Within the query string, the plus sign is reserved as shorthand
| notation for a space. Therefore, real plus signs must be encoded.
| This method was used to make query URIs easier to pass in systems
| which did not allow spaces.

According to RFC1738, sec. 3.3, however, plus signs weren't reserved
in the query component ("searchpart") of an HTTP URL. That means they
had no reserved purpose, so a plus sign meant a plus sign, not a
space, and they didn't need to be encoded.

Then came along RFC2396 and the plus sign became reserved in the query
component again. Real plus signs must now be encoded. It doesn't say
what the reserved purpose is for plus signs. I guess, then, plus
signs are shorthand for spaces.
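
In PHP terms, "real plus signs must be encoded" looks roughly like the sketch below. The value 'C++' and the parameter name q are hypothetical; parse_str() stands in for whatever form-style parser receives the query.

```php
<?php
// Hypothetical value containing literal plus signs.
$value = 'C++';

// Both functions escape a literal "+" as %2B.
$encodedForm = urlencode($value);    // C%2B%2B
$encodedRaw  = rawurlencode($value); // C%2B%2B

// Sent unencoded in a query string, each "+" is read back as a space.
parse_str('q=' . $value, $unescaped);
parse_str('q=' . $encodedForm, $escaped);

var_dump($unescaped['q']); // string(3) "C  "
var_dump($escaped['q']);   // string(3) "C++"
```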

Previously, I was under the impression that a question mark mustn't
appear in query components. It seems I was wrong. A URI may contain
more than one question mark, although URI generators are discouraged
from generating such URIs. The second "?" should always be treated as
data by parsers. See

Roy T. Fielding, 2002-11-17, "Re: Ambiguity of Allowed/Recommended URI
Syntax and Escaping",
http://lists.w3.org/Archives/Public/...2Nov/0015.html

Refs.:

RFC1630 (informational), 1994-06, "Universal Resource Identifiers in
WWW: A Unifying Syntax for the Expression of Names and Addresses of
Objects on the Network as used in the World-Wide Web",
http://www.ietf.org/rfc/rfc1630.txt

RFC1738 (proposed standard), 1994-12, "Uniform Resource Locators
(URL)",
http://www.ietf.org/rfc/rfc1738.txt

RFC2396 (draft standard), 1998-08, "Uniform Resource Identifiers
(URI): Generic Syntax",
http://www.ietf.org/rfc/rfc2396.txt

--
Jock
Jul 17 '05 #3
"John Dunlop" wrote
1. <http://domain.example/?baz=foo+bar>
2. <http://domain.example/?baz=foo%20bar>


Note that these two also yield the same value in $_GET['baz'] (which by
design holds the decoded values). But when explicitly (manually) decoding
$_SERVER['QUERY_STRING'], be aware that rawurldecode
(http://php.net/rawurldecode) does *not* convert + characters into spaces!
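
A small sketch of that pitfall. The $queryString variable is a hard-coded stand-in for $_SERVER['QUERY_STRING'] so that it runs from the command line, and the manual split assumes a single name=value pair.

```php
<?php
// Stand-in for $_SERVER['QUERY_STRING']; hard-coded for illustration.
$queryString = 'baz=foo+bar';

// Split "name=value" by hand, as one might when bypassing $_GET.
list($name, $rawValue) = explode('=', $queryString, 2);

$viaRaw = rawurldecode($rawValue); // "+" is left alone
$viaUrl = urldecode($rawValue);    // "+" becomes a space

var_dump($viaRaw); // string(7) "foo+bar"
var_dump($viaUrl); // string(7) "foo bar"
```

So for a query string produced by an HTML form, urldecode (or parse_str) is the matching decoder.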

Adriaan
Jul 17 '05 #4
