A critique of cgi.escape

Lawrence D'Oliveiro

The "escape" function in the "cgi" module escapes characters with special
meanings in HTML. The ones that need escaping are '<', '&' and '"'.
However, cgi.escape only escapes the quote character if you pass a second
argument of True (the default is False):

>>> cgi.escape("the \"quick\" & <brown> fox")
'the "quick" &amp; &lt;brown&gt; fox'

>>> cgi.escape("the \"quick\" & <brown> fox", True)
'the &quot;quick&quot; &amp; &lt;brown&gt; fox'

This seems to me to be dumb. The default option should be the safe one: that
is, escape _all_ the potentially troublesome characters. The only time you
can get away with NOT escaping the quote character is outside of markup,
e.g.

<TEXTAREA>
unescaped "quotes" allowed here
</TEXTAREA>

Nevertheless, even in that situation, escaped quotes are acceptable.

So I think the default for the second argument to cgi.escape should be
changed to True. Or alternatively, the second argument should be removed
altogether, and quotes should always be escaped.
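
To make the risk concrete, here is a minimal sketch rather than the cgi module's own code. The helper `escape_html` is a hypothetical stand-in that mirrors the documented behaviour (always escape '&', '<' and '>'; escape '"' only when `quote` is true), and the `<input>` template and the sample input are made up for illustration.

```python
def escape_html(s, quote=False):
    """Hypothetical stand-in mirroring the documented cgi.escape behaviour."""
    s = s.replace("&", "&amp;")   # must be done first
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

user_input = 'x" onmouseover="alert(1)'

# Without quote escaping, the value breaks out of the attribute.
unsafe = '<input value="%s">' % escape_html(user_input)
# With quote escaping, it stays inside the attribute value.
safe = '<input value="%s">' % escape_html(user_input, True)

print(unsafe)
print(safe)
```

In element content both results would be harmless; the difference only matters once the escaped text is placed inside a quoted attribute.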

Can changing the default break existing scripts? I don't see how. It might
even fix a few lurking bugs out there.

Sep 23 '06


Lawrence D'Oliveiro

In message <ma**************************************@python.org>, Fredrik
Lundh wrote:

Max M wrote:

>It also makes the escaped html harder to read for standard cases.

and slows things down a bit.

(cgi.escape(s, True) is slower than cgi.escape(s), for reasons that are
obvious for anyone who's looked at the code).

What you're doing is adding to the reasons why the existing cgi.escape
function is stupidly designed and implemented. The True case is by far the
most common, so to make that the slow case, as well as being the
non-default case, is doubly brain-dead.

Sep 26 '06 #51

Lawrence D'Oliveiro

In message <45***********************@dread15.news.tele.dk>, Max M wrote:

Jon Ribbens wrote:
>In article <ma**************************************@python.org>, Fredrik
Lundh wrote:
>>>There's nothing to say that cgi.escape should take them both into
account in the one function
so what exactly are you using cgi.escape for in your code ?

To escape characters so that they will be treated as character data
and not control characters in HTML.

>>>What precisely do you think it would "break"?
existing code, and existing tests.

I'm sorry, that's not good enough. How, precisely, would it break
"existing code"? Can you come up with an example, or even an
explanation of how it *could* break existing code?

Some examples are:

- Possibly any code that tests for string equality in a rendered
html/xml page.

You've got to be kidding. Any programmer knows that, to test two strings for
equality, you should do that on a canonical (non-encoded) representation.

- Code that generates cgi.escaped() markup and (rightfully) for some
reason expects the old behaviour to be used.

Whenever I use a channel-coding function, I expect the resulting output to
be only fit for feeding into the channel. I do NOT expect to do anything
else with it. Any kind of data manipulation I do, I do BEFORE feeding it
into the output channel, which means BEFORE putting it through the channel
coding.

- Third-party code that parses/scrapes content from cgi.escaped() markup.
(you could even break Java code this way :-s )

If that code follows the HTML rules, it will work.

Sep 26 '06 #52

Lawrence D'Oliveiro

In message <ma**************************************@python.org>, Fredrik
Lundh wrote:

In article <ef**********@news.albasani.net>, Georg Brandl wrote:

>>I'm sorry, that's not good enough. How, precisely, would it break
"existing code"? Can you come up with an example, or even an
explanation of how it could break existing code?

Is that so hard to see? If cgi.escape replaced "'" with an entity
reference, code that expects it not to do so would break.

Sorry, that's still not good enough. Why would any code expect such a
thing?
>>
that's not up to you to decide, though.

Yes it is. An HTML-quoting function converts a string to its HTML-compatible
representation. Since it is now HTML-compatible, any code that tries to
work with it afterwards has got to expect it to be HTML-compatible. Which
means it has to allow for what HTML allows.

Sep 26 '06 #53

Lawrence D'Oliveiro

In message <ma**************************************@python.org>, Fredrik
Lundh wrote:

Lawrence D'Oliveiro wrote:

>>Georg Brandl wrote:

A function is broken if its implementation doesn't match the
documentation.

or if it doesn't match the designer's intent. cgi.escape is old enough
that we would have noticed that, by now...

_We_ certainly have noticed it.

you're not the designer...

I don't have to be. Whoever the designer was, they had not properly thought
through the uses of this function. That's quite obvious already, to anybody
who works with HTML a lot. So the function is broken and needs to be fixed.

If you're worried about changing the semantics of a function that keeps the
same "cgi.escape" name, then fine. We delete the existing function and add
a new, properly-designed one. _That_ will be a wake-up call to all the
users of the existing function to fix their code.

Sep 26 '06 #54

Steven D'Aprano

On Mon, 25 Sep 2006 16:48:03 +0200, Max M wrote:

Any change in Python that has these consequences will rightfully be
considered a bug. So what you are suggesting is to knowingly introduce a
bug in the standard library!

It isn't like there have never been backwards _in_compatible changes to
the standard library before.

Ten seconds of googling finds
http://www.python.org/download/relea...3/highlights/:

int() - this can now return a long when converting a string with many
digits, rather than raising OverflowError. (New in 2.3a2: issues a
FutureWarning when sign-folding an unsigned hex or octal literal.)

Bastion and rexec - these modules are disabled, because they aren't
safe in Python 2.3 (nor in Python 2.2). (New in 2.3a2.)

Hex/oct literals prefixed with a minus sign were handled
inconsistently. This has been fixed in accordance with PEP 237. (New
in 2.3a2.)

Passing a float to C functions expecting an integer now issues a
DeprecationWarning; in the future this will become a TypeError. (New
in 2.3a2.)

None - assignment to variables or attributes named None will now
trigger a warning. In the future, None may become a keyword.

And more, all from one release.

If the behaviour of cgi.escape is "broken", or incomplete, or misleading,
then Python has a great mechanism for introducing incompatible changes
slowly: warnings.

It isn't good enough to say that the function does what it says it does,
if what it does is dangerous and misleading. Artificial example:

def sqr(x):
    """Returns the square of almost all numbers."""
    if x != 1: return x**2
    else: return -1

The function does exactly what it says, and yet still has badly dangerous
behaviour that risks introducing serious bugs. If people are relying on
unit tests which include specific tests for that behaviour, then the
function and the code needs to be fixed in parallel. That's what the
warnings module is for.
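
A minimal sketch of what such a gradual transition could look like. This is an assumption about one possible approach, not anything the standard library does: the function `escape`, the `_UNSET` sentinel and the warning text are all hypothetical.

```python
import warnings

_UNSET = object()  # sentinel: lets us tell "not passed" apart from False

def escape(s, quote=_UNSET):
    """Hypothetical transitional escape(): warns when the old default is relied on."""
    if quote is _UNSET:
        warnings.warn(
            "the default for 'quote' may change to True; pass it explicitly",
            FutureWarning,
            stacklevel=2,
        )
        quote = False  # keep the old behaviour during the transition
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit = escape('a "b" <c>')        # warns, old behaviour
    explicit = escape('a "b" <c>', True)  # no warning
```

Callers who pass `quote` explicitly see no change at all; only code that relies on the default is told that the default may move.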

So any arguments about "breaking code" are a red herring: if cgi.escape
does the wrong thing (and that's arguable), and code relies on that
behaviour, then the code is already broken and needs to be fixed in
parallel with the function. So can we accept that:

(1) *if* there is a problem with cgi.escape it needs to be fixed;

(and, dear gods, I would hope that nobody here wants to argue that Python
should make backwards compatibility a higher virtue than correctness!)

(2) it doesn't need to be fixed *immediately* without warning;

(3) but it can be fixed through a gradual process with warning; and

(4) unit tests and code that expect the (presumed) bad behaviour can be
fixed gradually?

Now that we've got that out of the way, can we CALMLY and RATIONALLY
discuss whether cgi.escape is or isn't broken?

Or, more specifically, UNDER WHAT CIRCUMSTANCES it does the wrong thing?

--
Steven D'Aprano

Sep 26 '06 #55

Gabriel G

At Monday 25/9/2006 11:08, Jon Ribbens wrote:

What precisely do you think it would "break"?
existing code, and existing tests.

I'm sorry, that's not good enough. How, precisely, would it break
"existing code"? Can you come up with an example, or even an
explanation of how it *could* break existing code?

FWIW, a *lot* of unit tests on *my* generated html code would break,
and I imagine a *lot* of other people's code would break too. So
changing the defaults is not a good idea.
But if you want, put this in sitecustomize.py and pretend it said
quote=True:

import cgi
cgi.escape.func_defaults = (True,)
del cgi
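
For readers on Python 3, where `func_defaults` is spelled `__defaults__`, the same trick can be sketched on a stand-in function. The `escape` below is hypothetical and only imitates the documented behaviour; it is not the real cgi.escape.

```python
def escape(s, quote=False):
    """Hypothetical stand-in for cgi.escape."""
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

before = escape('say "hi"')     # default quote=False: quotes left alone
escape.__defaults__ = (True,)   # Python 3 spelling of func_defaults
after = escape('say "hi"')      # default is now quote=True
```

Rebinding the defaults affects every caller in the process, which is exactly why it belongs in a site-wide file and not in library code.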

Gabriel Genellina
Softlab SRL


Sep 26 '06 #56

Steve Holden

Jon Ribbens wrote:

In article <ma**************************************@python.org>, Brian Quinlan wrote:

>>>Now you're just being ridiculous. In this thread you have been rude,
evasive, insulting, vague, hypocritical, and have failed to answer
substantive points in favour of sarcastic and erroneous sniping - I'd
suggest it's you that needs to worry about being taken seriously.

Actually, at least in the context of this mailing list, Fredrik doesn't
have to worry about that at all. Why? Because he is one of the most
prolific contributors to the Python language and libraries

I would have hoped that people don't treat that as a licence to be
obnoxious, though. I am aware of Fredrik's history, which is why I
was somewhat surprised and disappointed that he was being so rude
and unpleasant in this thread. He is not living up to his reputation
at all. Maybe he's having a bad day ;-)

I generally find that Fredrik's rudeness quotient is satisfactorily
biased towards discouraging ill-informed comment. As far as rudeness
goes, I've found your approach to this discussion to be pretty
obnoxious, and I'm generally known as someone with a high tolerance for
idiotic behaviour.

If your intention was to troll you could not have crafted your
contributions in a better way.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Sep 26 '06 #57

Dan Bishop

Lawrence D'Oliveiro wrote:

In message <ma**************************************@python.org>, Fredrik
Lundh wrote:

Max M wrote:

It also makes the escaped html harder to read for standard cases.
and slows things down a bit.

(cgi.escape(s, True) is slower than cgi.escape(s), for reasons that are
obvious for anyone who's looked at the code).

What you're doing is adding to the reasons why the existing cgi.escape
function is stupidly designed and implemented. The True case is by far the
most common, so to make that the slow case, as well as being the
non-default case, is doubly brain-dead.

How exactly would you make s = s.replace('"', "&quot;") faster than
*not* doing the replacement?

Sep 26 '06 #58

Duncan Booth

Lawrence D'Oliveiro <ld*@geek-central.gen.new_zealand> wrote:

In message <Xn*************************@127.0.0.1>, Duncan Booth
wrote:

>If I have a unicode string such as: u'\u201d' (right double quote),
then I want that encoded in my html as '&#8221;' (or &rdquo; but the
numeric form is better).

Right-double-quote is not an HTML special, so there's no need to quote
it. I'm only concerned here with characters that have special meanings
in HTML markup.

There is no need to quote " or ' either except in particular situations.

Would you care to suggest how you get a right double quote into any iso-
8859-1 encoded web page without quoting it? Even if the page is utf-8
encoded quoting it can be a good idea.

>
>There should be a one-stop shop where I can take my unicode text and
convert it into something I can safely insert into a generated html
page; at present I need to call both cgi.escape and s.encode to get
the desired effect.

What you're really asking for is a version of cgi.escape that a) fixes
the bugs discussed in this thread, and b) copes with different
encodings while doing so.

To handle b), you would need to pass it some indication of what the
encoding of the string is. In any case, converting a literal
right-double-quote to &#8221; is not relevant to the purpose of
cgi.escape.

You don't seem to understand about html entity escapes. &#8221; is a valid
way to express right double quote whatever the page encoding. There is no
need to know the encoding of the page in order to escape entities, just
escape anything which can be problematic.
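
A minimal sketch of such a one-stop helper, written for Python 3. The name `to_ascii_html` is hypothetical; the only library feature relied on is the standard `xmlcharrefreplace` error handler of `str.encode`.

```python
def to_ascii_html(text):
    """Hypothetical one-stop helper: escape markup, then use numeric references."""
    text = (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;"))
    # Non-ASCII characters become &#NNNN; references, valid in any page encoding.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

result = to_ascii_html('\u201cquoted\u201d & <b>')
print(result)
```

The markup characters are escaped first so that the ampersands introduced by the numeric references are not escaped a second time.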

Sep 26 '06 #59

Duncan Booth

Lawrence D'Oliveiro <ld*@geek-central.gen.new_zealand> wrote:

>(cgi.escape(s, True) is slower than cgi.escape(s), for reasons that
are obvious for anyone who's looked at the code).

What you're doing is adding to the reasons why the existing cgi.escape
function is stupidly designed and implemented. The True case is by far
the most common, so to make that the slow case, as well as being the
non-default case, is doubly brain-dead.

It is slightly slower because it does more. Both cases are about 15 times
faster than the regular expression implementation someone posted to this
thread yesterday.
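
Anyone who wants to check such figures on their own machine could use a sketch like the one below. Both functions are assumptions made for illustration: `escape_chain` imitates the chained-replace approach, and `escape_regex` is one plausible regular-expression variant, not the implementation that was posted to the thread. No particular speed ratio is claimed here.

```python
import re
import timeit

def escape_chain(s, quote=False):
    """Chained str.replace calls, in the style of the documented behaviour."""
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

_TABLE = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}
_PATTERN = re.compile(r'[&<>"]')

def escape_regex(s):
    """Hypothetical regex variant: one pass, one lookup per special character."""
    return _PATTERN.sub(lambda m: _TABLE[m.group(0)], s)

sample = 'the "quick" & <brown> fox ' * 100

t_plain = timeit.timeit(lambda: escape_chain(sample), number=200)
t_quote = timeit.timeit(lambda: escape_chain(sample, True), number=200)
t_regex = timeit.timeit(lambda: escape_regex(sample), number=200)
print("chain: %.4fs  chain+quote: %.4fs  regex: %.4fs" % (t_plain, t_quote, t_regex))
```

The absolute numbers depend on the interpreter and the input, so they are worth measuring on realistic data before drawing conclusions.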

Sep 26 '06 #60

Lawrence D'Oliveiro

In message <11**********************@m7g2000cwm.googlegroups.com>, Dan
Bishop wrote:

Lawrence D'Oliveiro wrote:
>In message <ma**************************************@python.org>, Fredrik
Lundh wrote:

Max M wrote:

It also makes the escaped html harder to read for standard cases.

and slows things down a bit.

(cgi.escape(s, True) is slower than cgi.escape(s), for reasons that are
obvious for anyone who's looked at the code).

What you're doing is adding to the reasons why the existing cgi.escape
function is stupidly designed and implemented. The True case is by far
the most common, so to make that the slow case, as well as being the
non-default case, is doubly brain-dead.

How exactly would you make s = s.replace('"', "&quot;") faster than
*not* doing the replacement?

Wrong answer. Correctness comes first, then we worry about efficiency.

Sep 26 '06 #61

Lawrence D'Oliveiro

In message <ma**************************************@python.org>, Gabriel G
wrote:

At Monday 25/9/2006 11:08, Jon Ribbens wrote:

>What precisely do you think it would "break"?

existing code, and existing tests.

I'm sorry, that's not good enough. How, precisely, would it break
"existing code"? Can you come up with an example, or even an
explanation of how it *could* break existing code?

FWIW, a *lot* of unit tests on *my* generated html code would break...

Why did you write your code that way?

Sep 26 '06 #62

Georg Brandl

Lawrence D'Oliveiro wrote:

In message <ma**************************************@python.org>, Fredrik
Lundh wrote:

>Lawrence D'Oliveiro wrote:

>>>Georg Brandl wrote:

A function is broken if its implementation doesn't match the
documentation.

or if it doesn't match the designer's intent. cgi.escape is old enough
that we would have noticed that, by now...

_We_ certainly have noticed it.

you're not the designer...

I don't have to be. Whoever the designer was, they had not properly thought
through the uses of this function. That's quite obvious already, to anybody
who works with HTML a lot. So the function is broken and needs to be fixed.

If you're worried about changing the semantics of a function that keeps the
same "cgi.escape" name, then fine. We delete the existing function and add
a new, properly-designed one. _That_ will be a wake-up call to all the
users of the existing function to fix their code.

What about the users who don't need to "fix" their code since it's working fine
and flawlessly with the current cgi.escape?

Georg

Sep 26 '06 #63

Georg Brandl

Lawrence D'Oliveiro wrote:

In message <45***********************@dread15.news.tele.dk>, Max M wrote:

>Lawrence is right that the escape method doesn't work the way he expects
it to.

Rewriting a library module simply because a developer is surprised is a
*very* bad idea.

I'm not surprised. Disappointed, yes. Verging on disgust at some comments in
this thread, yes. But "surprised" is what a lot of users of the existing
cgi.escape function are going to be when they discover their code isn't
doing what they thought it was.

Why should they be surprised? The documentation states clearly what cgi.escape()
does (as does the docstring).

Georg

Sep 26 '06 #64

Lawrence D'Oliveiro

In message <ef**********@news.albasani.net>, Georg Brandl wrote:

Lawrence D'Oliveiro wrote:
>In message <45***********************@dread15.news.tele.dk>, Max M wrote:

>>Lawrence is right that the escape method doesn't work the way he expects
it to.

Rewriting a library module simply because a developer is surprised is a
*very* bad idea.

I'm not surprised. Disappointed, yes. Verging on disgust at some comments
in this thread, yes. But "surprised" is what a lot of users of the
existing cgi.escape function are going to be when they discover their
code isn't doing what they thought it was.

Why should they be surprised? The documentation states clearly what
cgi.escape() does (as does the docstring).

Documentation frequently states stupid things. Doesn't mean it should be
treated as sacrosanct.

Sep 26 '06 #65

Lawrence D'Oliveiro

In message <ef**********@news.albasani.net>, Georg Brandl wrote:

Lawrence D'Oliveiro wrote:
>In message <ma**************************************@python.org>, Fredrik
Lundh wrote:

>>Lawrence D'Oliveiro wrote:

Georg Brandl wrote:
>
>A function is broken if its implementation doesn't match the
>documentation.
>
or if it doesn't match the designer's intent. cgi.escape is old
enough that we would have noticed that, by now...

_We_ certainly have noticed it.

you're not the designer...

I don't have to be. Whoever the designer was, they had not properly
thought through the uses of this function. That's quite obvious already,
to anybody who works with HTML a lot. So the function is broken and needs
to be fixed.

If you're worried about changing the semantics of a function that keeps
the same "cgi.escape" name, then fine. We delete the existing function
and add a new, properly-designed one. _That_ will be a wake-up call to
all the users of the existing function to fix their code.

What about the users who don't need to "fix" their code since it's working
fine and flawlessly with the current cgi.escape?

They're just lucky, I guess, that the bugs haven't bitten them--yet.

Sep 26 '06 #66

Max M

Lawrence D'Oliveiro wrote:

In message <ma**************************************@python.org>, Gabriel G
wrote:

>At Monday 25/9/2006 11:08, Jon Ribbens wrote:

>>>>What precisely do you think it would "break"?
existing code, and existing tests.
I'm sorry, that's not good enough. How, precisely, would it break
"existing code"? Can you come up with an example, or even an
explanation of how it *could* break existing code?
FWIW, a *lot* of unit tests on *my* generated html code would break...

Why did you write your code that way?

Stop feeding the troll.

Sep 26 '06 #67

Jon Ribbens

In article <ma**************************************@python.org>, Steve Holden wrote:

>I would have hoped that people don't treat that as a licence to be
obnoxious, though. I am aware of Fredrik's history, which is why I
was somewhat surprised and disappointed that he was being so rude
and unpleasant in this thread. He is not living up to his reputation
at all. Maybe he's having a bad day ;-)

I generally find that Fredrik's rudeness quotient is satisfactorily
biased towards discouraging ill-informed comment.

It's a pity he's being rude when presented with well-informed comment
then.

As far as rudeness goes, I've found your approach to this discussion
to be pretty obnoxious, and I'm generally known as someone with a
high tolerance for idiotic behaviour.

Why do you say that? I have confined myself to simple logical
arguments, and been frankly very restrained when presented with
rudeness and misunderstanding from other thread participants.
In what way should I have modified my postings?

Sep 26 '06 #68

Georg Brandl

Lawrence D'Oliveiro wrote:

In message <ef**********@news.albasani.net>, Georg Brandl wrote:

>Lawrence D'Oliveiro wrote:
>>In message <45***********************@dread15.news.tele.dk>, Max M wrote:

Lawrence is right that the escape method doesn't work the way he expects
it to.

Rewriting a library module simply because a developer is surprised is a
*very* bad idea.

I'm not surprised. Disappointed, yes. Verging on disgust at some comments
in this thread, yes. But "surprised" is what a lot of users of the
existing cgi.escape function are going to be when they discover their
code isn't doing what they thought it was.

Why should they be surprised? The documentation states clearly what
cgi.escape() does (as does the docstring).

Documentation frequently states stupid things. Doesn't mean it should be
treated as sacrosanct.

That's not the point. The point is that someone using cgi.escape() will hardly
be surprised by what it does and doesn't do.

Georg

Sep 26 '06 #69

Jim

Jon Ribbens wrote:

You're right - I've never seen anyone do such a thing. It sounds like
a highly dubious and very fragile sort of test to me, of very limited
use.

I have code that checks to see if my CGI scripts generate the pages
that I expect. That code would break. (Whether I should not have
written them that way is a different point, but it would break.)

Jim

Sep 26 '06 #70

Lawrence D'Oliveiro

In message <ef**********@news.albasani.net>, Georg Brandl wrote:

Lawrence D'Oliveiro wrote:
>In message <ef**********@news.albasani.net>, Georg Brandl wrote:

>>Lawrence D'Oliveiro wrote:
In message <45***********************@dread15.news.tele.dk>, Max M
wrote:

Lawrence is right that the escape method doesn't work the way he
expects it to.
>
Rewriting a library module simply because a developer is surprised is
a *very* bad idea.

I'm not surprised. Disappointed, yes. Verging on disgust at some
comments in this thread, yes. But "surprised" is what a lot of users of
the existing cgi.escape function are going to be when they discover
their code isn't doing what they thought it was.

Why should they be surprised? The documentation states clearly what
cgi.escape() does (as does the docstring).

Documentation frequently states stupid things. Doesn't mean it should be
treated as sacrosanct.

That's not the point. The point is that someone using cgi.escape() will
hardly be surprised by what it does and doesn't do.

And this surprise, or lack of it, is relevant to the argument how, exactly?

Sep 26 '06 #71

Steve Holden

Jon Ribbens wrote:

In article <ma**************************************@python.org>, Steve Holden wrote:

>>>I would have hoped that people don't treat that as a licence to be
obnoxious, though. I am aware of Fredrik's history, which is why I
was somewhat surprised and disappointed that he was being so rude
and unpleasant in this thread. He is not living up to his reputation
at all. Maybe he's having a bad day ;-)

I generally find that Fredrik's rudeness quotient is satisfactorily
biased towards discouraging ill-informed comment.

It's a pity he's being rude when presented with well-informed comment
then.

>>As far as rudeness goes, I've found your approach to this discussion
to be pretty obnoxious, and I'm generally known as someone with a
high tolerance for idiotic behaviour.

Why do you say that? I have confined myself to simple logical
arguments, and been frankly very restrained when presented with
rudeness and misunderstanding from other thread participants.
In what way should I have modified my postings?

Please allow me to apologise. I have clearly been confusing you with
someone else. A review of your contributions to the thread confirms your
assertion.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Sep 26 '06 #72

Sion Arrowsmith

Jon Ribbens <jo********@unequivocal.co.uk> wrote:

>In article <Xn*************************@127.0.0.1>, Duncan Booth wrote:
>I guess you've never seen anyone write tests which retrieve some generated
html and compare it against the expected value. If the page contains any
unescaped quotes then this change would break it.
You're right - I've never seen anyone do such a thing. It sounds like
a highly dubious and very fragile sort of test to me, of very limited
use.

So what sort of test would you use, that doesn't involve comparing
actual output against expected output?

--
\S -- si***@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
___ | "Frankly I have no feelings towards penguins one way or the other"
\X/ | -- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump

Sep 26 '06 #73

Christophe

Sion Arrowsmith wrote:

Jon Ribbens <jo********@unequivocal.co.uk> wrote:
>In article <Xn*************************@127.0.0.1>, Duncan Booth wrote:
>>I guess you've never seen anyone write tests which retrieve some generated
html and compare it against the expected value. If the page contains any
unescaped quotes then this change would break it.
You're right - I've never seen anyone do such a thing. It sounds like
a highly dubious and very fragile sort of test to me, of very limited
use.

So what sort of test would you use, that doesn't involve comparing
actual output against expected output?

Well, one could say that the expected output is the one as it'll be
interpreted by the HTML browser. And thus, the test should unescape the
HTML string and compare it to the original string instead of
mandating a specific encoding.
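
A minimal sketch of that kind of test, assuming Python 3's `html` module (`html.escape` and `html.unescape`) as a stand-in for cgi.escape and a matching decoder. The sample string is made up for illustration.

```python
import html

original = 'the "quick" & <brown> fox'

for quote in (False, True):
    encoded = html.escape(original, quote=quote)
    # Compare the decoded text, not the exact escaped form.
    assert html.unescape(encoded) == original

without_quotes = html.escape(original, quote=False)
with_quotes = html.escape(original, quote=True)
```

The two escaped strings differ character for character, yet both decode to the same text, so a test written this way would pass whichever default the escaping function uses.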

Sep 26 '06 #74

Jon Ribbens

In article <ma**************************************@python.org>, Steve Holden wrote:

>Why do you say that? I have confined myself to simple logical
arguments, and been frankly very restrained when presented with
rudeness and misunderstanding from other thread participants.
In what way should I have modified my postings?

Please allow me to apologise. I have clearly been confusing you with
someone else. A review of your contributions to the thread confirms your
assertion.

Oh, ok! You had me worried for a minute there ;-)

Sep 26 '06 #75

Fredrik Lundh

Jon Ribbens wrote:

This has nothing to do with character encodings.

it has *everything* to do with encoding of existing data into HTML so it can be
safely transported to, and recreated by, an HTML-aware client.

does the word "information set" mean anything to you?

</F>

Sep 26 '06 #76

Steve Holden

Lawrence D'Oliveiro wrote:

In message <ef**********@news.albasani.net>, Georg Brandl wrote:

>>Lawrence D'Oliveiro wrote:

>>>In message <ef**********@news.albasani.net>, Georg Brandl wrote:
Lawrence D'Oliveiro wrote:

>In message <45***********************@dread15.news.tele.dk>, Max M
>wrote:
>
>
>>Lawrence is right that the escape method doesn't work the way he
>>expects it to.
>>
>>Rewriting a library module simply because a developer is surprised is
>>a *very* bad idea.
>
>I'm not surprised. Disappointed, yes. Verging on disgust at some
>comments in this thread, yes. But "surprised" is what a lot of users of
>the existing cgi.escape function are going to be when they discover
>their code isn't doing what they thought it was.

Why should they be surprised? The documentation states clearly what
cgi.escape() does (as does the docstring).

Documentation frequently states stupid things. Doesn't mean it should be
treated as sacrosanct.

That's not the point. The point is that someone using cgi.escape() will
hardly be surprised by what it does and doesn't do.

And this surprise, or lack of it, is relevant to the argument how, exactly?

Is there *any* branch of this thread that won't end with some snippy
remark from you?
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Sep 26 '06 #77

Fredrik Lundh

Lawrence D'Oliveiro wrote:

>(cgi.escape(s, True) is slower than cgi.escape(s), for reasons that are
obvious for anyone who's looked at the code).

What you're doing is adding to the reasons why the existing cgi.escape
function is stupidly designed and implemented. The True case is by far the
most common

really? most HTML attributes cannot even contain things that would need to
be escaped, while *all* element content needs escaping. and the web contains
a lot of element content, as should be obvious to anyone who's been there...

</F>

Sep 26 '06 #78

Georg Brandl

Lawrence D'Oliveiro wrote:

In message <ef**********@news.albasani.net>, Georg Brandl wrote:

>Lawrence D'Oliveiro wrote:
>>In message <ef**********@news.albasani.net>, Georg Brandl wrote:

Lawrence D'Oliveiro wrote:
In message <45***********************@dread15.news.tele.dk>, Max M
wrote:
>
>Lawrence is right that the escape method doesn't work the way he
>expects it to.
>>
>Rewriting a library module simply because a developer is surprised is
>a *very* bad idea.
>
I'm not surprised. Disappointed, yes. Verging on disgust at some
comments in this thread, yes. But "surprised" is what a lot of users of

^^^^^^^^^^^

>>>>the existing cgi.escape function are going to be when they discover
their code isn't doing what they thought it was.

Why should they be surprised? The documentation states clearly what
cgi.escape() does (as does the docstring).

Documentation frequently states stupid things. Doesn't mean it should be
treated as sacrosanct.

That's not the point. The point is that someone using cgi.escape() will
hardly be surprised by what it does and doesn't do.

And this surprise, or lack of it, is relevant to the argument how, exactly?

Which argument? You said users were going to be surprised, I told you why they
aren't.

Georg

(Okay, this is my last posting to this thread)

Sep 26 '06 #79

Fredrik Lundh

Georg Brandl wrote:

It says "to HTML-safe sequences". That's reasonably clear without the need
to reproduce the exact replacements for each character.

the same documentation tells people what function to use if they want to quote
*everything* that might need to be quoted, so if people did actually understand everything that
was written in a reasonably clear way, this thread wouldn't even exist.

</F>

Sep 26 '06 #80

Jon Ribbens

In article <ma**************************************@python.org>, Fredrik Lundh wrote:

>This has nothing to do with character encodings.

it has *everything* to do with encoding of existing data into HTML
so it can be safely transported to, and recreated by, an HTML-aware
client.

I can't tell if you're disagreeing or not. You escape the character
"<" as the sequence of characters "&lt;", for example, because
otherwise the HTML user agent will treat it as the start of a tag and
not as character data. You will notice that the character encoding is
utterly irrelevant to this.

does the word "information set" mean anything to you?

You would appear to be talking about either game theory, or XML,
neither of which have anything to do with HTML.

Sep 26 '06 #81

Fredrik Lundh

Jon Ribbens wrote:

It's a pity he's being rude when presented with well-informed comment
then.

since when is the output of

import random, sys
messages = [
    "that's irrelevant",
    "then their code is broken already",
    "that's not good enough",
    "then their tests are broken already",
    "you're rude",
    ]
for x in xrange(sys.maxint):
    print random.choice(messages)

well-informed? heck, it doesn't even pass the turing test ;-)

</F>

Sep 26 '06 #82

Jon Ribbens

In article <ma**************************************@python.org>, Fredrik Lundh wrote:

the same documentation tells people what function to use if they
want to quote *everything* that might need to be quoted, so if
people did actually understand everything that was written in a
reasonably clear way, this thread wouldn't even exist.

The fact that you don't understand that that's not true is the reason
you've been getting into such a muddle in this thread.

Sep 26 '06 #83

Fredrik Lundh

Jon Ribbens wrote:

>does the word "information set" mean anything to you?

You would appear to be talking about either game theory, or XML,
neither of which have anything to do with HTML.

you see no connection between XML's concept of information set and
HTML? (hint: what's XHTML?)

</F>

Sep 26 '06 #84

Jon Ribbens

In article <ma**************************************@python.org>, Fredrik Lundh wrote:

>It's a pity he's being rude when presented with well-informed comment
then.

since when is the output of

[snip code]

well-informed? heck, it doesn't even pass the turing test ;-)

Since when did that bear any resemblance to what I have said?

Are you going to grow up and start addressing the substantial points
raised, rather than making puerile sarcastic remarks?

An apology from you would not go amiss.

Sep 26 '06 #85

Jon Ribbens

In article <ma**************************************@python.org>, Fredrik Lundh wrote:

Jon Ribbens wrote:

>>does the word "information set" mean anything to you?

You would appear to be talking about either game theory, or XML,
neither of which have anything to do with HTML.

I notice that yet again you've snipped the substantial point and
failed to answer it, presumably because you don't know how.

you see no connection between XML's concept of information set and
HTML? (hint: what's XHTML?)

I am perfectly well aware of what XHTML is. If you're trying to make
a point, please get to it, rather than going off on irrelevant
tangents. What do XML Information Sets have to do with escaping
control characters in HTML?

Sep 26 '06 #86

Fredrik Lundh

Jon Ribbens wrote:

>the same documentation tells people what function to use if they
want to quote *everything* that might need to be quoted, so if
people did actually understand everything that was written in a
reasonably clear way, this thread wouldn't even exist.

The fact that you don't understand that that's not true is the reason
you've been getting into such a muddle in this thread.

it's a fact that it's not true that the documentation points to the function
that it points to ? exactly what definitions of the words "fact" and "true"
are you using here ?

</F>

Sep 26 '06 #87

Jon Ribbens

In article <ma**************************************@python.org>, Fredrik Lundh wrote:

>>the same documentation tells people what function to use if they
want to quote *everything* that might need to be quoted, so if
people did actually understand everything that was written in a
reasonably clear way, this thread wouldn't even exist.

The fact that you don't understand that that's not true is the reason
you've been getting into such a muddle in this thread.

it's a fact that it's not true that the documentation points to the function
that it points to ? exactly what definitions of the words "fact" and "true"
are you using here ?

You misunderstand again. The second half of the sentence is the untrue
bit ("if people did ... understand ... this thread wouldn't even exist"),
not the first.

Sep 26 '06 #88

Fredrik Lundh

Jon Ribbens wrote:

I notice that yet again you've snipped the substantial point and
failed to answer it, presumably because you don't know how.

cute.

What do XML Information Sets have to do with escaping control
characters in HTML?

figure out the connection, and you'll have the answer to your "substantial
point".

</F>

Sep 26 '06 #89

Jon Ribbens

In article <ma**************************************@python.org>, Fredrik Lundh wrote:

>What do XML Information Sets have to do with escaping control
characters in HTML?

figure out the connection, and you'll have the answer to your "substantial
point".

If you don't know the answer, you can say so y'know. There's no shame
in it.

Sep 26 '06 #90

Fredrik Lundh

Jon Ribbens wrote:

If you don't know the answer, you can say so y'know.

I know the answer. I'm pretty sure everyone else who's actually read my posts
to this thread might have figured it out by now, too. But since you're still trying
to "win" the debate, long after it's over, I think it's safest to end this thread right
now. *plonk*

Sep 26 '06 #91

Jon Ribbens

In article <ma**************************************@python.org>, Fredrik Lundh wrote:

I know the answer. I'm pretty sure everyone else who's actually
read my posts to this thread might have figured it out by now, too.
But since you're still trying to "win" the debate, long after it's
over, I think it's safest to end this thread right now. *plonk*

It's sad to see a grown man throw his toys out of his pram, just
because he's losing an argument...

Sep 26 '06 #92

Brian Quinlan

A summary of this pointless argument:

Why cgi.escape should be changed to escape double quote (and maybe
single quote) characters by default:
o escaping should be very aggressive by default to avoid subtle bugs
o over-escaping is not likely to harm most programs significantly
o people who do not read the documentation may be surprised by its
  behavior

Why cgi.escape should NOT be changed:
o it is currently used in lots of code and changing it will almost
  certainly break some of it, test suites at minimum, e.g.
      assert my_template_system("<p>{foo}</p>", foo='"') == '<p>"</p>'
o escaping attribute values is less common than escaping element
  text so people should not be punished with:
  - harder to read output
  - (slightly) increased file size
  - (slightly) decreased performance
o cgi.escape is not meant for serious web application development, so
  either roll your own (trivial) function to do escaping how you want
  it or use the one provided by your framework (if it is not automatic)
o the documentation describes the current behavior precisely and
  suggests solutions that provide more aggressive escaping, so arguing
  about surprising behavior is not reasonable
o it doesn't even make sense for an escape function to exist in the cgi
  module, so it should only be used by old applications for
  compatibility reasons
Cheers,
Brian
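
A short sketch of the trade-off summarised above, assuming Python 3's html.escape in place of cgi.escape. The helper names escape_text and escape_attr are hypothetical and only illustrate the "roll your own" option. One difference to keep in mind: html.escape with quote=True also escapes the single quote, which cgi.escape(s, True) did not.

```python
import html

def escape_text(s):
    """Escape for element content only: &, < and > (hypothetical helper)."""
    return html.escape(s, quote=False)

def escape_attr(s):
    """Escape for quoted attribute values too: also " and ' (hypothetical helper)."""
    return html.escape(s, quote=True)

value = 'say "hi" & <wave>'
print(escape_text(value))  # say "hi" &amp; &lt;wave&gt;
print(escape_attr(value))  # say &quot;hi&quot; &amp; &lt;wave&gt;

# A test pinned to an exact output string depends on which variant is used,
# which is the compatibility concern raised in the summary.
assert '<p>%s</p>' % escape_text('"') == '<p>"</p>'
assert '<p>%s</p>' % escape_attr('"') == '<p>&quot;</p>'
```

Both outputs are valid inside element content; only the second is safe inside a quoted attribute value.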

Sep 26 '06 #93

Paul Rubin

Brian Quinlan <br***@sweetapp.com> writes:

o cgi.escape is not meant for serious web application development,

What is it meant for then? Why should the library ever implement
anything in a half-assed way unsuitable for serious application
development, if it can supply a robust implementation instead?

Your other points are reasonable. I like the idea of adding an option
to escape single quotes, but I don't care much what the defaults are.

I notice that the options for pickle.dump/dumps changed incompatibly
between Python 2.2 and 2.3, and nobody really cared.

Sep 26 '06 #94

Jon Ribbens

In article <ma**************************************@python.org>, Brian Quinlan wrote:

A summary of this pointless argument:

Your summary seems pretty reasonable, but please note that later on,
the thread was not about cgi.escape escaping (or not) quote
characters (as described in your summary), but about Fredrik arguing,
somewhat incoherently, that it should have to take character encodings
into consideration.

Sep 26 '06 #95

Brian Quinlan

Paul Rubin wrote:

Brian Quinlan <br***@sweetapp.com> writes:
>o cgi.escape is not meant for serious web application development,

What is it meant for then? Why should the library ever implement
anything in a half-assed way unsuitable for serious application
development, if it can supply a robust implementation instead?

I'd have to dig through the revision history to be sure, but I imagine
that cgi.escape was originally only used in the cgi module (and there
only in its various print_* functions). Then it started being used by
other core Python modules, e.g. cgitb, DocXMLRPCServer.

The "mistake", if there was one, was probably that escape wasn't spelled
_escape and got documented in the LaTeX documentation system.

All of this is just speculation though.

Cheers,
Brian

Sep 26 '06 #96

George Sakkis

Lawrence D'Oliveiro wrote:

Fredrik Lundh wrote:
you're not the designer...

I don't have to be. Whoever the designer was, they had not properly thought
through the uses of this function. That's quite obvious already, to anybody
who works with HTML a lot. So the function is broken and needs to be fixed.

If you're worried about changing the semantics of a function that keeps the
same "cgi.escape" name, then fine. We delete the existing function and add
a new, properly-designed one. _That_ will be a wake-up call to all the
users of the existing function to fix their code.

Wow. Are you always that arrogant for things you know very little
about, or just plain stupid ?

Sep 26 '06 #97

Brian Quinlan

Jon Ribbens wrote:

In article <ma**************************************@python.org>, Brian Quinlan wrote:
>A summary of this pointless argument:

Your summary seems pretty reasonable, but please note that later on,
the thread was not about cgi.escape escaping (or not) quote
characters (as described in your summary), but about Fredrik arguing,
somewhat incoherently, that it should have to take character encodings
into consideration.

And, of course, about you telling people that their explanations are not
good enough :-)

BTW, I am curious about how you do unit testing. The example that I used
in my summary is a very common pattern but would break if cgi.escape
changed its semantics. What do you do instead?

Cheers,
Brian

Sep 26 '06 #98

Jon Ribbens

In article <ma**************************************@python.org>, Brian Quinlan wrote:

>Your summary seems pretty reasonable, but please note that later on,
the thread was not about cgi.escape escaping (or not) quote
characters (as described in your summary), but about Fredrik arguing,
somewhat incoherently, that it should have to take character encodings
into consideration.

And, of course, about you telling people that their explanations are not
good enough :-)

I guess, if you mean the part of the thread which went "it'll break
existing code", "what existing code"? "existing code" "but what
existing code?" "i dunno, just, er, code" "ok *how* will it break it?"
"i dunno, it just will"?

BTW, I am curious about how you do unit testing. The example that I used
in my summary is a very common pattern but would break if cgi.escape
changed its semantics. What do you do instead?

To be honest I'm not sure what *sort* of code people test this way. It
just doesn't seem appropriate at all for web page generating code. Web
pages need to be manually viewed in web browsers, and validated, and
checked for accessibility. Checking they're equal to a particular
string just seems bizarre (and where does that string come from
anyway?)

Sep 26 '06 #99

Brian Quinlan

Jon Ribbens wrote:

I guess, if you mean the part of the thread which went "it'll break
existing code", "what existing code"? "existing code" "but what
existing code?" "i dunno, just, er, code" "ok *how* will it break it?"
"i dunno, it just will"?

See below for a possible example.

>BTW, I am curious about how you do unit testing. The example that I used
in my summary is a very common pattern but would break if cgi.escape
changed its semantics. What do you do instead?

To be honest I'm not sure what *sort* of code people test this way. It
just doesn't seem appropriate at all for web page generating code.

Well, there are dozens (hundreds?) of templating systems for Python.
Here is a (simplified/modified) unit test for my company's system (yeah,
we lifted some ideas from Django):

test.html
---------
<p>{foo | escape}</p>

test.py
-------
t = Template("test.html")
t['foo'] = 'Brian -> "Hi!"'
assert str(t) == '<p>Brian -&gt; "Hi!"</p>'

So how would you test our template system?

Web pages need to be manually viewed in web browsers, and validated, and
checked for accessibility.

True.

Checking they're equal to a particular
string just seems bizarre (and where does that string come from
anyway?)

Maybe, which is why I'm asking you how you do it. Some of our web
applications contain 100s of script generated pages. Testing each one by
hand after making a change would be completely impossible. So we use
HTTP scripting for testing purposes, i.e. send this request, grab the
results, verify that the text in the element with id="username" equals
"Brian Quinlan", etc. The test also validates that each page is well
formed. We also view each page at some point but not every time a
developer makes a change that might (i.e. everything) affect the entire
system.

Cheers,
Brian
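
The kind of check described above (compare the text of the element with id="username" rather than the raw markup) could look roughly like this. It is only a sketch using Python 3's standard html.parser; the class TextById and the function text_of are hypothetical names, not part of any system mentioned in the thread, and void elements such as <br> inside the target element are not handled.

```python
from html.parser import HTMLParser

class TextById(HTMLParser):
    """Collect the text inside elements carrying a given id attribute."""

    def __init__(self, wanted_id):
        super().__init__()      # default convert_charrefs=True decodes entities
        self.wanted_id = wanted_id
        self.depth = 0          # greater than 0 while inside the wanted element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1     # nested element inside the wanted one
        elif dict(attrs).get("id") == self.wanted_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

def text_of(page, element_id):
    """Return the decoded text content of the element with the given id."""
    parser = TextById(element_id)
    parser.feed(page)
    parser.close()
    return "".join(parser.chunks)

# The same check passes whether or not the quotes were escaped.
page_a = '<p id="username">Brian "Q" Quinlan</p>'
page_b = '<p id="username">Brian &quot;Q&quot; Quinlan</p>'
assert text_of(page_a, "username") == text_of(page_b, "username")
```

Because the comparison is made on decoded text, a test written this way would not care whether the escaping function emits a literal quote or &quot;, unlike an exact string comparison of the generated markup.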

Sep 26 '06 #100
