By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,521 Members | 1,407 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,521 IT Pros & Developers. It's quick & easy.

A better XSS trap (Feedback wanted)

P: n/a
Hej everybody,

I built something for myself that might help some of you as well.
Looking at a couple of PHP template engines made me think.

I have two main requirements for a presentation layer framework:
- use PHP as the template language
- effective XSS prevention without betting on discipline

Plain PHP only satisfies the first. I could not find a PHP template
engine that satisfies both. (Savant doesn't.)

So here is my own minimal solution and I would like to know your
opinion. Also, if anybody has seen something like it out there, please
point me to it.

The Idea:
Automatically wrap every string meant for output into an object, which
offers filtering methods like htmlentities. This also means intercepting
access to strings contained in Arrays and Objects. This can be done
using the Decorator pattern.

The code:
http://code.google.com/p/cvphplib/so...runk/cvphplib/
svn checkout http://cvphplib.googlecode.com/svn/trunk/cvphplib/

Example usage:
// first a simple string
<? $string = CV_OutputFilter::filter( '<marquee>evil</marquee>' ); ?>
<?=$string?triggers an error
<?=$string->htmlentities()?works fine
<?=$string->urlencode()? works fine
<?=$string->raw()?outputs the unfiltered value

// extracting a bunch of filtered variables into the local scope
<?php
$vars = array( 'x'=>5, 'o'=>new O(), 'array' =array('<i>'=>'<b>') );
extract( CV_OutputFilter::filter($vars)->toArray() );
?>

// access to object members
<?=$o->var?triggers an error
<?=$o->method()?triggers an error
<?=$o->var->htmlentities()?works fine
<?=$o->method()->htmlentities()?works fine

// access to array elements
<?=$array['<i>']?triggers an error
<?=$array['<i>']->htmlentities()?works fine

// Iterating over an array
<? foreach( $array as $value ){} ?works fine
<? foreach( $array as $key =$value ){} ?throws an exception, because
$key would not be filtered in this case

// decorating array keys requires some iterator magic
<? foreach( $array->key_as($key) as $value ): ?>
<?=$key->htmlentities()?>: <?=$value->htmlentities()?<br/>
<? endforeach; ?>
Potential problem:
- potentially slow (due to many object instantiations and reflection)

Benefits:
- effective XSS prevention without betting on discipline
- template-engine-like variable extraction into local scope
- clean and short syntax
- very little to learn

Functionality already implemented, but not shown in the example:
- register custom filter methods
- enable __toString() with custom default filter
- use tuple array(key,value) for $value instead of 'key_as'-magic
- register custom filter applied on keys in ->toArray()
- decoration of multidimensional arrays and webs of object references

More example code:
http://code.google.com/p/cvphplib/so...tputFilter.php
http://code.google.com/p/cvphplib/so...tputFilter.php

So, what do you think?

Best regards

Christopher
Aug 30 '08 #1
Share this Question
Share on Google+
6 Replies


P: n/a
On 30 Aug, 18:23, Christopher Vogt <christopher.v...@rwth-aachen.de>
wrote:
Hej everybody,

I built something for myself that might help some of you as well.
Looking at a couple of PHP template engines made me think.

I have two main requirements for a presentation layer framework:
- use PHP as the template language
- effective XSS prevention without betting on discipline
<snip>
// extracting a bunch of filtered variables into the local scope
<?php
$vars = array( 'x'=>5, 'o'=>new O(), 'array' =array('<i>'=>'<b>') );
extract( CV_OutputFilter::filter($vars)->toArray() );
?>
You seem to be putting the cart before the horse. Representations
should reflect the application of the data. Preventing XSS requires a
different representation from preventing SQL injection - the solution
is to change the representation of the data just before it moves out
of PHP (into the browser or into the database).

So where's the benefit in replicating every change-of-representation
function as an object method?

How does this enforce change of representation without a discipline?
You can't enforce that the programmer calls the correct transform at
consumption, and you are requiring a discipline of encapsulating data
at the point where it enters the system.

C.
Aug 31 '08 #2

P: n/a
Hej C.,

thank you for your critique.
><?php
$vars = array( 'x'=>5, 'o'=>new O(), 'array' =array('<i>'=>'<b>') );
extract( CV_OutputFilter::filter($vars)->toArray() );
?>
The method name 'filter' might be confusing. Maybe
CV_OutputFilter::decorate($vars) is a better name. The filtering is not
done at this point, but later, when the decorator object's methods (like
->htmlentities()) are called.
Representations should reflect the application of the data.
What do you mean by that?
Preventing XSS requires a
different representation from preventing SQL injection - the solution
is to change the representation of the data just before it moves out
of PHP (into the browser or into the database).
Isn't my solution doing exactly that? In case of an HTML template the
data is converted to html entities, the very moment before it is
embedded into the html. <?=$some_var->htmlentities()?>
So where's the benefit in replicating every change-of-representation
function as an object method?
How does this enforce change of representation without a discipline?
The benefit is the following. The usual way to apply htmlentities or
other escape methods is by calling a function and passing the string as
an argument, like <?=htmlentities($some_var)?>. However people tend to
forget or neglect the function call and embed <?=$some_var?instead,
which opens an XSS vulnerability. My solution forces the developer to
deliberately choose an escape method (or the raw data) because
forgetting to call an escape method fails with an error.
You can't enforce that the programmer calls the correct transform at
consumption,
I can only force him to deliberately choose. Of course I cannot force
him to choose the right thing.
and you are requiring a discipline of encapsulating data
at the point where it enters the system.
Ok, maybe I was a little unclear in this point. I do NOT mean
CV_OutputFilter::filter when the data enters the system, but before it
leaves it. In MVC terms CV_OutputFilter::filter would be called at the
end of the controller, just before the view is executed.

Does this clear things up?

Christopher
Aug 31 '08 #3

P: n/a
On Aug 30, 7:23*pm, Christopher Vogt <christopher.v...@rwth-aachen.de>
wrote:
Hej everybody,

I built something for myself that might help some of you as well.
Looking at a couple of PHP template engines made me think.

I have two main requirements for a presentation layer framework:
*- use PHP as the template language
*- effective XSS prevention without betting on discipline

Plain PHP only satisfies the first. I could not find a PHP template
engine that satisfies both. (Savant doesn't.)

So here is my own minimal solution and I would like to know your
opinion. Also, if anybody has seen something like it out there, please
point me to it.

The Idea:
Automatically wrap every string meant for output into an object, which
offers filtering methods like htmlentities. This also means intercepting
access to strings contained in Arrays and Objects. This can be done
using the Decorator pattern.

The code:http://code.google.com/p/cvphplib/so...runk/cvphplib/
svn checkouthttp://cvphplib.googlecode.com/svn/trunk/cvphplib/

Example usage:
// first a simple string
<? $string = CV_OutputFilter::filter( '<marquee>evil</marquee>' ); ?>
<?=$string?triggers an error
<?=$string->htmlentities()?works fine
<?=$string->urlencode()?* *works fine
<?=$string->raw()?outputs the unfiltered value

// extracting a bunch of filtered variables into the local scope
<?php
$vars = array( 'x'=>5, 'o'=>new O(), 'array' =array('<i>'=>'<b>') );
extract( CV_OutputFilter::filter($vars)->toArray() );
?>

// access to object members
<?=$o->var?triggers an error
<?=$o->method()?triggers an error
<?=$o->var->htmlentities()?works fine
<?=$o->method()->htmlentities()?works fine

// access to array elements
<?=$array['<i>']?triggers an error
<?=$array['<i>']->htmlentities()?works fine

// Iterating over an array
<? foreach( $array as $value ){} ?works fine
<? foreach( $array as $key =$value ){} ?throws an exception, because
$key would not be filtered in this case

// decorating array keys requires some iterator magic
<? foreach( $array->key_as($key) as $value ): ?>
* * * * <?=$key->htmlentities()?>: <?=$value->htmlentities()?<br/>
<? endforeach; ?>

Potential problem:
*- potentially slow (due to many object instantiations and reflection)

Benefits:
*- effective XSS prevention without betting on discipline
*- template-engine-like variable extraction into local scope
*- clean and short syntax
*- very little to learn

Functionality already implemented, but not shown in the example:
*- register custom filter methods
*- enable __toString() with custom default filter
*- use tuple array(key,value) for $value instead of 'key_as'-magic
*- register custom filter applied on keys in ->toArray()
*- decoration of multidimensional arrays and webs of object references

More example code:http://code.google.com/p/cvphplib/so...vphplib/tests/...

So, what do you think?
Haven't checked out any code other than the stuff you posted here, but
I like it.

I like how just trying to display a variable triggers an error,
forcing the author to think. Additionally, I like how you're not
trying to add any framework-ish features, but leave it up to the
framework coders to maybe automatically do a CV_OutputFilter::filter()
on all variables sent to the view (assuming an MVC setup).

I see only one downside, and that is that IMHO the position where
things are done is slightly architecturally incorrect: you typically
want to do htmlentities() as the last operation, or second last
sometimes (before an nl2br/bbcode formatter or so). This means that if
there are operations performed on the variable that require it
unmangled, the author needs to do a workaround:

<?= htmlentities(parse_important_stuff_using_html_char s($var->raw())) ?
>
Naturally, if parse_important_stuff_using_html_chars is common, this
can be solved by registering it as a custom filter, of course.

Another worry: did you test the speed of your approach? I haven't
looked into the source, but it appears quite some magic requires
__get, __set and __call, which may slow down the view element of the
page a bit.

(btw, as another (on topic) shameless plug, my home cooked compiling
template engine does htmlentities or htmlspecial chars by default when
echoing expressions, and allows pure PHP code, http://e.teeselink.nl/mplate..
if you're cool with that, I may want to add your filter class thingy
as an extention to it, though, for more fine grained control)

Egbert
Sep 11 '08 #4

P: n/a
Hej Egbert,

thank you for your feedback (and that you like it ;)).
I see only one downside, and that is that IMHO the position where
things are done is slightly architecturally incorrect: you typically
want to do htmlentities() as the last operation, or second last
sometimes (before an nl2br/bbcode formatter or so).
My idea was: One normally applies few sets of functions to most of the
variables most of the time. Maybe one set for normal fields, one for
textareas, and one for urls. Having them in custom filter methods makes
view code shorter and more maintainable. And you are right, for special
cases there is always ->raw(). I don't really see the downside ;).
Another worry: did you test the speed of your approach? I haven't
looked into the source, but it appears quite some magic requires
__get, __set and __call, which may slow down the view element of the
page a bit.
I have this worry as well as there is some magic going on indeed. I did
not test performance yet. It should absolutely be verified with
benchmarks, but my guess is the following:
For most views it should not be a performance problem, when there are
only up to a few dozen variables access in the view. My implementation
does lazy decoration (as in lazy evaluation), meaning that the strings
are almost only decorated on demand and that _not_ everything is
recursed in the beginning. This means one ends up with a number of
instantiated objects roughly matching the number of variables actually
accessed in the view.
However, for heavily dynamic-data-dependent views, such as large tables,
performance could be a problem. But then it would still be possible to
access the whole table using ->raw() and thereby circumvent decoration
of every single cell. But of course this puts back the duty to call
htmlentities manually.
(btw, as another (on topic) shameless plug, my home cooked compiling
template engine does htmlentities or htmlspecial chars by default when
echoing expressions
Actually I thought about having a default escape method using __toString
in my StringFilter as well, but then I disabled it by default. It is
still possible to enable it again and register one filter as the
__toString default filter, but I do not recommend it. In my eyes it is
better to always be forced to choose the right filter depending on the
context and for example html, urls or javascript are different contexts
that require different filters.
and allows pure PHP code, http://e.teeselink.nl/mplate.
I had a quick look on the website and do not see a large difference to
Smarty, expect that mplate seems to put a Smarty-like syntax closer to
PHP-Code. Still there is a new syntax to learn. Why not use pure PHP
instead? <?=$some_var?or {$some_var} is no big difference in my eyes,
only that the first is well known and does not need to be compiled an
extra time. What did I miss ;)?
if you're cool with that, I may want to add your filter class thingy
as an extention to it, though, for more fine grained control)
Sure! I am happy you like it. It's published unter GPL. It would be
great if you link http://code.google.com/p/cvphplib/ in case you use my
code. Note that this was a prototype. I already have some future changes
in mind to further decrease the already small learning curve and at the
same time add some more flexibility. Get the latest code from the SVN.

/Christopher
Sep 12 '08 #5

P: n/a
(btw, as another (on topic) shameless plug, my home cooked compiling
template engine does htmlentities or htmlspecial chars by default when
echoing expressions

Actually I thought about having a default escape method using __toString
in my StringFilter as well, but then I disabled it by default. It is
still possible to enable it again and register one filter as the
__toString default filter, but I do not recommend it. In my eyes it is
better to always be forced to choose the right filter depending on the
context and for example html, urls or javascript are different contexts
that require different filters.
I think you're right there. Maybe I'll have to change some of that.
and allows pure PHP code,http://e.teeselink.nl/mplate.

I had a quick look on the website and do not see a large difference to
Smarty, expect that mplate seems to put a Smarty-like syntax closer to
PHP-Code. Still there is a new syntax to learn. Why not use pure PHP
instead? <?=$some_var?or {$some_var} is no big difference in my eyes,
only that the first is well known and does not need to be compiled an
extra time. What did I miss ;)?
Mostly because of control structures. IMHO, things like

<?php foreach($foo as $bar): ?>
<li>toink <?=$bar?>
<?php if($bar=="banana"): ?>
<i>(a banana)</i>
<?php endif; ?>
</li>
<?php endforeach; ?>

are a horrible read, and using normal syntax with if() { and } instead
of if(): .. endif; doesn't make it any better either. Wrapping them
into curly or angular braces makes the whole thing a much nicer read,
and easier to understand for non-programmers (doing the HTML design).
To my experience, it is a big difference for some.

Additionally, having used Smarty for some earlier projects made
maintenance very easy; for a project some time ago we had created a
whole bunch of additional tempalte functions, and with the verboseness
of their argument lists (HTML-style) and the fact that everything
looked very lean and elegant made adding features over a year later
very easy - no need for looking at the code and thinking "ok wait,
what was I doing here..". I'm convinced that easy-to-read source code
helps a lot with this. And I think most not completely small and
simple pure-PHP views are not very easy to read. Finally, a template
engine helps me get out of "programming mode" and into "design mode"
in my head.

But it's really a matter of taste; I doubt I can convince anyone who's
fine with using plain PHP for views to use mplate instead. I can
imagine some smarty users may be into it, however.
if you're cool with that, I may want to add your filter class thingy
as an extention to it, though, for more fine grained control)

Sure! I am happy you like it. It's published unter GPL. It would be
great if you linkhttp://code.google.com/p/cvphplib/in case you use my
code. Note that this was a prototype. I already have some future changes
in mind to further decrease the already small learning curve and at the
same time add some more flexibility. Get the latest code from the SVN.
Willdo.

After some thought: does your approach also allow objects to be
filtered? this would possibly be a whole bit more complicated, with
method call return values and magic/non-magic properties, etc..

-egbert
Sep 13 '08 #6

P: n/a
Hej Egbert,
>In my eyes it is
better to always be forced to choose the right filter depending on the
context and for example html, urls or javascript are different contexts
that require different filters.

I think you're right there. Maybe I'll have to change some of that.
I think it is partly a question of taste. I am following the
perfectionists path here. But automatically applying htmlentities to
everything by default (but with a by-pass option) also has an advantage.
It makes template code shorter, which is also nice.

About template lanugages:
Mostly because of control structures. [...] Wrapping them
into curly or angular braces makes the whole thing a much nicer read,
and easier to understand for non-programmers (doing the HTML design).
To my experience, it is a big difference for some.

But it's really a matter of taste;
I agree, in the end it comes down to a question of taste and psychology.
I personally stick to PHP only, because the I gain of a template engine
is smaller than the effort for setup, compilation and learning a new
language. And I disagree with your statement on the mplate website,
compared to plain PHP, mplate IS another language and a new way of
working, even if it is closer to PHP.

However, I also use Smarty in one project for quite some years now.
Mainly because of it's security feature. The project is sort of a web
service, where the clients can provide arbitrary presentation logic
using smarty templates. It started before the common web service
standards became really popular or at least known to me. I was a lot
less experienced at that time. Our biggest Problem throughout all time
was that people didn't know the Smarty syntax and having to learn it was
a barrier. So another reason why I like PHP only templates is that PHP
is well known and well documented.
After some thought: does your approach also allow objects to be
filtered? this would possibly be a whole bit more complicated, with
method call return values and magic/non-magic properties, etc..
Yes, objects are also wrapped dynamically and recursively using magic
methods. So are arrays using the ArrayAccess interface. Accessing
anything inside them returns a decorated value. Read the examples in the
SVN, it's all in there :). Browse the SVN from the google code page if
you want. (And yes it was a "whole bit more complicated" to implement
;), but I wanted a comprehensive solution and SPL helped a lot).

One oddity for objects is, that since they are wrapped by decorating
objects, using methods or operators like "instanceof" on the outer
objects does not work as expected, at least if you expect equivalent
results. I didn't like that at first. But in a way it makes sense, as
the wrapping object controls all access to the wrapped object, even to
the type information. So as with strings, anytime the REAL value is
needed, ->raw() is the answer.

/Christopher
Sep 13 '08 #7

This discussion thread is closed

Replies have been disabled for this discussion.