473,785 Members | 2,801 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

A better XSS trap (Feedback wanted)

Hej everybody,

I built something for myself that might help some of you as well.
Looking at a couple of PHP template engines made me think.

I have two main requirements for a presentation layer framework:
- use PHP as the template language
- effective XSS prevention without betting on discipline

Plain PHP only satisfies the first. I could not find a PHP template
engine that satisfies both. (Savant doesn't.)

So here is my own minimal solution and I would like to know your
opinion. Also, if anybody has seen something like it out there, please
point me to it.

The Idea:
Automatically wrap every string meant for output into an object, which
offers filtering methods like htmlentities. This also means intercepting
access to strings contained in Arrays and Objects. This can be done
using the Decorator pattern.

The code:
http://code.google.com/p/cvphplib/so...runk/cvphplib/
svn checkout http://cvphplib.googlecode.com/svn/trunk/cvphplib/

Example usage:
// first a simple string
<? $string = CV_OutputFilter ::filter( '<marquee>evil </marquee>' ); ?>
<?=$string?trig gers an error
<?=$string->htmlentities() ?works fine
<?=$string->urlencode()? works fine
<?=$string->raw()?output s the unfiltered value

// extracting a bunch of filtered variables into the local scope
<?php
$vars = array( 'x'=>5, 'o'=>new O(), 'array' =array('<i>'=>' <b>') );
extract( CV_OutputFilter ::filter($vars)->toArray() );
?>

// access to object members
<?=$o->var?triggers an error
<?=$o->method()?trigg ers an error
<?=$o->var->htmlentities() ?works fine
<?=$o->method()->htmlentities() ?works fine

// access to array elements
<?=$array['<i>']?triggers an error
<?=$array['<i>']->htmlentities() ?works fine

// Iterating over an array
<? foreach( $array as $value ){} ?works fine
<? foreach( $array as $key =$value ){} ?throws an exception, because
$key would not be filtered in this case

// decorating array keys requires some iterator magic
<? foreach( $array->key_as($key) as $value ): ?>
<?=$key->htmlentities() ?>: <?=$value->htmlentities() ?<br/>
<? endforeach; ?>
Potential problem:
- potentially slow (due to many object instantiations and reflection)

Benefits:
- effective XSS prevention without betting on discipline
- template-engine-like variable extraction into local scope
- clean and short syntax
- very little to learn

Functionality already implemented, but not shown in the example:
- register custom filter methods
- enable __toString() with custom default filter
- use tuple array(key,value ) for $value instead of 'key_as'-magic
- register custom filter applied on keys in ->toArray()
- decoration of multidimensiona l arrays and webs of object references

More example code:
http://code.google.com/p/cvphplib/so...tputFilter.php
http://code.google.com/p/cvphplib/so...tputFilter.php

So, what do you think?

Best regards

Christopher
Aug 30 '08 #1
6 1344
On 30 Aug, 18:23, Christopher Vogt <christopher.v. ..@rwth-aachen.de>
wrote:
Hej everybody,

I built something for myself that might help some of you as well.
Looking at a couple of PHP template engines made me think.

I have two main requirements for a presentation layer framework:
- use PHP as the template language
- effective XSS prevention without betting on discipline
<snip>
// extracting a bunch of filtered variables into the local scope
<?php
$vars = array( 'x'=>5, 'o'=>new O(), 'array' =array('<i>'=>' <b>') );
extract( CV_OutputFilter ::filter($vars)->toArray() );
?>
You seem to be putting the cart before the horse. Representations
should reflect the application of the data. Preventing XSS requires a
different representation from preventing SQL injection - the solution
is to change the representation of the data just before it moves out
of PHP (into the browser or into the database).

So where's the benefit in replicating every change-of-representation
function as an object method?

How does this enforce change of representation without a discipline?
You can't enforce that the programmer calls the correct transform at
consumption, and you are requiring a discipline of encapsulating data
at the point where it enters the system.

C.
Aug 31 '08 #2
Hej C.,

thank you for your critique.
><?php
$vars = array( 'x'=>5, 'o'=>new O(), 'array' =array('<i>'=>' <b>') );
extract( CV_OutputFilter ::filter($vars)->toArray() );
?>
The method name 'filter' might be confusing. Maybe
CV_OutputFilter ::decorate($var s) is a better name. The filtering is not
done at this point, but later, when the decorator object's methods (like
->htmlentities() ) are called.
Representations should reflect the application of the data.
What do you mean by that?
Preventing XSS requires a
different representation from preventing SQL injection - the solution
is to change the representation of the data just before it moves out
of PHP (into the browser or into the database).
Isn't my solution doing exactly that? In case of an HTML template the
data is converted to html entities, the very moment before it is
embedded into the html. <?=$some_var->htmlentities() ?>
So where's the benefit in replicating every change-of-representation
function as an object method?
How does this enforce change of representation without a discipline?
The benefit is the following. The usual way to apply htmlentities or
other escape methods is by calling a function and passing the string as
an argument, like <?=htmlentities ($some_var)?>. However people tend to
forget or neglect the function call and embed <?=$some_var?in stead,
which opens an XSS vulnerability. My solution forces the developer to
deliberately choose an escape method (or the raw data) because
forgetting to call an escape method fails with an error.
You can't enforce that the programmer calls the correct transform at
consumption,
I can only force him to deliberately choose. Of course I cannot force
him to choose the right thing.
and you are requiring a discipline of encapsulating data
at the point where it enters the system.
Ok, maybe I was a little unclear in this point. I do NOT mean
CV_OutputFilter ::filter when the data enters the system, but before it
leaves it. In MVC terms CV_OutputFilter ::filter would be called at the
end of the controller, just before the view is executed.

Does this clear things up?

Christopher
Aug 31 '08 #3
On Aug 30, 7:23*pm, Christopher Vogt <christopher.v. ..@rwth-aachen.de>
wrote:
Hej everybody,

I built something for myself that might help some of you as well.
Looking at a couple of PHP template engines made me think.

I have two main requirements for a presentation layer framework:
*- use PHP as the template language
*- effective XSS prevention without betting on discipline

Plain PHP only satisfies the first. I could not find a PHP template
engine that satisfies both. (Savant doesn't.)

So here is my own minimal solution and I would like to know your
opinion. Also, if anybody has seen something like it out there, please
point me to it.

The Idea:
Automatically wrap every string meant for output into an object, which
offers filtering methods like htmlentities. This also means intercepting
access to strings contained in Arrays and Objects. This can be done
using the Decorator pattern.

The code:http://code.google.com/p/cvphplib/so...runk/cvphplib/
svn checkouthttp://cvphplib.google code.com/svn/trunk/cvphplib/

Example usage:
// first a simple string
<? $string = CV_OutputFilter ::filter( '<marquee>evil </marquee>' ); ?>
<?=$string?trig gers an error
<?=$string->htmlentities() ?works fine
<?=$string->urlencode()? * *works fine
<?=$string->raw()?output s the unfiltered value

// extracting a bunch of filtered variables into the local scope
<?php
$vars = array( 'x'=>5, 'o'=>new O(), 'array' =array('<i>'=>' <b>') );
extract( CV_OutputFilter ::filter($vars)->toArray() );
?>

// access to object members
<?=$o->var?triggers an error
<?=$o->method()?trigg ers an error
<?=$o->var->htmlentities() ?works fine
<?=$o->method()->htmlentities() ?works fine

// access to array elements
<?=$array['<i>']?triggers an error
<?=$array['<i>']->htmlentities() ?works fine

// Iterating over an array
<? foreach( $array as $value ){} ?works fine
<? foreach( $array as $key =$value ){} ?throws an exception, because
$key would not be filtered in this case

// decorating array keys requires some iterator magic
<? foreach( $array->key_as($key) as $value ): ?>
* * * * <?=$key->htmlentities() ?>: <?=$value->htmlentities() ?<br/>
<? endforeach; ?>

Potential problem:
*- potentially slow (due to many object instantiations and reflection)

Benefits:
*- effective XSS prevention without betting on discipline
*- template-engine-like variable extraction into local scope
*- clean and short syntax
*- very little to learn

Functionality already implemented, but not shown in the example:
*- register custom filter methods
*- enable __toString() with custom default filter
*- use tuple array(key,value ) for $value instead of 'key_as'-magic
*- register custom filter applied on keys in ->toArray()
*- decoration of multidimensiona l arrays and webs of object references

More example code:http://code.google.com/p/cvphplib/so...vphplib/tests/...

So, what do you think?
Haven't checked out any code other than the stuff you posted here, but
I like it.

I like how just trying to display a variable triggers an error,
forcing the author to think. Additionally, I like how you're not
trying to add any framework-ish features, but leave it up to the
framework coders to maybe automatically do a CV_OutputFilter ::filter()
on all variables sent to the view (assuming an MVC setup).

I see only one downside, and that is that IMHO the position where
things are done is slightly architecturally incorrect: you typically
want to do htmlentities() as the last operation, or second last
sometimes (before an nl2br/bbcode formatter or so). This means that if
there are operations performed on the variable that require it
unmangled, the author needs to do a workaround:

<?= htmlentities(pa rse_important_s tuff_using_html _chars($var->raw())) ?
>
Naturally, if parse_important _stuff_using_ht ml_chars is common, this
can be solved by registering it as a custom filter, of course.

Another worry: did you test the speed of your approach? I haven't
looked into the source, but it appears quite some magic requires
__get, __set and __call, which may slow down the view element of the
page a bit.

(btw, as another (on topic) shameless plug, my home cooked compiling
template engine does htmlentities or htmlspecial chars by default when
echoing expressions, and allows pure PHP code, http://e.teeselink.nl/mplate..
if you're cool with that, I may want to add your filter class thingy
as an extention to it, though, for more fine grained control)

Egbert
Sep 11 '08 #4
Hej Egbert,

thank you for your feedback (and that you like it ;)).
I see only one downside, and that is that IMHO the position where
things are done is slightly architecturally incorrect: you typically
want to do htmlentities() as the last operation, or second last
sometimes (before an nl2br/bbcode formatter or so).
My idea was: One normally applies few sets of functions to most of the
variables most of the time. Maybe one set for normal fields, one for
textareas, and one for urls. Having them in custom filter methods makes
view code shorter and more maintainable. And you are right, for special
cases there is always ->raw(). I don't really see the downside ;).
Another worry: did you test the speed of your approach? I haven't
looked into the source, but it appears quite some magic requires
__get, __set and __call, which may slow down the view element of the
page a bit.
I have this worry as well as there is some magic going on indeed. I did
not test performance yet. It should absolutely be verified with
benchmarks, but my guess is the following:
For most views it should not be a performance problem, when there are
only up to a few dozen variables access in the view. My implementation
does lazy decoration (as in lazy evaluation), meaning that the strings
are almost only decorated on demand and that _not_ everything is
recursed in the beginning. This means one ends up with a number of
instantiated objects roughly matching the number of variables actually
accessed in the view.
However, for heavily dynamic-data-dependent views, such as large tables,
performance could be a problem. But then it would still be possible to
access the whole table using ->raw() and thereby circumvent decoration
of every single cell. But of course this puts back the duty to call
htmlentities manually.
(btw, as another (on topic) shameless plug, my home cooked compiling
template engine does htmlentities or htmlspecial chars by default when
echoing expressions
Actually I thought about having a default escape method using __toString
in my StringFilter as well, but then I disabled it by default. It is
still possible to enable it again and register one filter as the
__toString default filter, but I do not recommend it. In my eyes it is
better to always be forced to choose the right filter depending on the
context and for example html, urls or javascript are different contexts
that require different filters.
and allows pure PHP code, http://e.teeselink.nl/mplate.
I had a quick look on the website and do not see a large difference to
Smarty, expect that mplate seems to put a Smarty-like syntax closer to
PHP-Code. Still there is a new syntax to learn. Why not use pure PHP
instead? <?=$some_var? or {$some_var} is no big difference in my eyes,
only that the first is well known and does not need to be compiled an
extra time. What did I miss ;)?
if you're cool with that, I may want to add your filter class thingy
as an extention to it, though, for more fine grained control)
Sure! I am happy you like it. It's published unter GPL. It would be
great if you link http://code.google.com/p/cvphplib/ in case you use my
code. Note that this was a prototype. I already have some future changes
in mind to further decrease the already small learning curve and at the
same time add some more flexibility. Get the latest code from the SVN.

/Christopher
Sep 12 '08 #5
(btw, as another (on topic) shameless plug, my home cooked compiling
template engine does htmlentities or htmlspecial chars by default when
echoing expressions

Actually I thought about having a default escape method using __toString
in my StringFilter as well, but then I disabled it by default. It is
still possible to enable it again and register one filter as the
__toString default filter, but I do not recommend it. In my eyes it is
better to always be forced to choose the right filter depending on the
context and for example html, urls or javascript are different contexts
that require different filters.
I think you're right there. Maybe I'll have to change some of that.
and allows pure PHP code,http://e.teeselink.nl/mplate.

I had a quick look on the website and do not see a large difference to
Smarty, expect that mplate seems to put a Smarty-like syntax closer to
PHP-Code. Still there is a new syntax to learn. Why not use pure PHP
instead? <?=$some_var? or {$some_var} is no big difference in my eyes,
only that the first is well known and does not need to be compiled an
extra time. What did I miss ;)?
Mostly because of control structures. IMHO, things like

<?php foreach($foo as $bar): ?>
<li>toink <?=$bar?>
<?php if($bar=="banan a"): ?>
<i>(a banana)</i>
<?php endif; ?>
</li>
<?php endforeach; ?>

are a horrible read, and using normal syntax with if() { and } instead
of if(): .. endif; doesn't make it any better either. Wrapping them
into curly or angular braces makes the whole thing a much nicer read,
and easier to understand for non-programmers (doing the HTML design).
To my experience, it is a big difference for some.

Additionally, having used Smarty for some earlier projects made
maintenance very easy; for a project some time ago we had created a
whole bunch of additional tempalte functions, and with the verboseness
of their argument lists (HTML-style) and the fact that everything
looked very lean and elegant made adding features over a year later
very easy - no need for looking at the code and thinking "ok wait,
what was I doing here..". I'm convinced that easy-to-read source code
helps a lot with this. And I think most not completely small and
simple pure-PHP views are not very easy to read. Finally, a template
engine helps me get out of "programmin g mode" and into "design mode"
in my head.

But it's really a matter of taste; I doubt I can convince anyone who's
fine with using plain PHP for views to use mplate instead. I can
imagine some smarty users may be into it, however.
if you're cool with that, I may want to add your filter class thingy
as an extention to it, though, for more fine grained control)

Sure! I am happy you like it. It's published unter GPL. It would be
great if you linkhttp://code.google.com/p/cvphplib/in case you use my
code. Note that this was a prototype. I already have some future changes
in mind to further decrease the already small learning curve and at the
same time add some more flexibility. Get the latest code from the SVN.
Willdo.

After some thought: does your approach also allow objects to be
filtered? this would possibly be a whole bit more complicated, with
method call return values and magic/non-magic properties, etc..

-egbert
Sep 13 '08 #6
Hej Egbert,
>In my eyes it is
better to always be forced to choose the right filter depending on the
context and for example html, urls or javascript are different contexts
that require different filters.

I think you're right there. Maybe I'll have to change some of that.
I think it is partly a question of taste. I am following the
perfectionists path here. But automatically applying htmlentities to
everything by default (but with a by-pass option) also has an advantage.
It makes template code shorter, which is also nice.

About template lanugages:
Mostly because of control structures. [...] Wrapping them
into curly or angular braces makes the whole thing a much nicer read,
and easier to understand for non-programmers (doing the HTML design).
To my experience, it is a big difference for some.

But it's really a matter of taste;
I agree, in the end it comes down to a question of taste and psychology.
I personally stick to PHP only, because the I gain of a template engine
is smaller than the effort for setup, compilation and learning a new
language. And I disagree with your statement on the mplate website,
compared to plain PHP, mplate IS another language and a new way of
working, even if it is closer to PHP.

However, I also use Smarty in one project for quite some years now.
Mainly because of it's security feature. The project is sort of a web
service, where the clients can provide arbitrary presentation logic
using smarty templates. It started before the common web service
standards became really popular or at least known to me. I was a lot
less experienced at that time. Our biggest Problem throughout all time
was that people didn't know the Smarty syntax and having to learn it was
a barrier. So another reason why I like PHP only templates is that PHP
is well known and well documented.
After some thought: does your approach also allow objects to be
filtered? this would possibly be a whole bit more complicated, with
method call return values and magic/non-magic properties, etc..
Yes, objects are also wrapped dynamically and recursively using magic
methods. So are arrays using the ArrayAccess interface. Accessing
anything inside them returns a decorated value. Read the examples in the
SVN, it's all in there :). Browse the SVN from the google code page if
you want. (And yes it was a "whole bit more complicated" to implement
;), but I wanted a comprehensive solution and SPL helped a lot).

One oddity for objects is, that since they are wrapped by decorating
objects, using methods or operators like "instanceof " on the outer
objects does not work as expected, at least if you expect equivalent
results. I didn't like that at first. But in a way it makes sense, as
the wrapping object controls all access to the wrapped object, even to
the type information. So as with strings, anytime the REAL value is
needed, ->raw() is the answer.

/Christopher
Sep 13 '08 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
5993
by: gregory_may | last post by:
This code seems to "work" but I get the following errors: An unhandled exception of type 'System.NullReferenceException' occurred in system.windows.forms.dll then this one: An unhandled exception of type 'System.ExecutionEngineException' occurred in system.windows.forms.dll Anyone know how to either trap these errors or prevent them from happening?
6
2164
by: lastusernameleft | last post by:
i've been researching this issue for a while and can't come to any conclusive answer, mostly it seems to be a preference over syntax, some saying c# is elegant while vb is clunky, or that c# is geeky while vb is more naturally legible. there dont seem to be many capabilities in one over the other, at least with whidbey, so basically, is there anything c# can do that vb can't?
3
8661
by: LineVoltageHalogen | last post by:
Greeting All, I have a stored proc that dynamically truncates all the tables in my databases. I use a cursor and some dynamic sql for this: ...... create cursor Loop through sysobjects and get all table names in my database. .... exec ('truncate table ' + @TableName)
6
4213
by: Ali Akbar | last post by:
Hi everyone, I just wanted to know how my program can trap a particular API call by windows. (The idea is that I want to hook all the open text boxes and keep hooking new ones as and when they are created). So for this i want to trap the CreateWindow or CreateWindowEx API as that I can keep track of the handles etc. I have an idea that this can be done by mapping the entry points of kernel32.dll, gdi32.dll etc, just like how SoftIce...
4
1290
by: Chris Botha | last post by:
If one wants to assure that there is only one instance of an app running, then the popular solution is to use a Mutex and a message box, telling the user there is already one running. I decided to go one further, bring the 1st instance to the foreground, no message boxes. Impressive, I thought. So the first instance stores the main window handle in shared memory (win32 calls), second instance gets the window handle from shared memory if it...
3
5014
by: Robert W. | last post by:
In my WinForms app I wanted to implement a Read-Only textbox. I didn't like the appearance a textbox takes on when the ReadOnly property is set true so instead I trapped the KeyPress event with a generic event handler: private void textBoxMakeReadOnly(object sender, KeyPressEventArgs e) { e.Handled = true; }
7
1803
by: apollonius2 | last post by:
Greetings, I have been working on a little project today to help me better understand classes in Python (I really like Python). I am a self taught programmer and consider myself to fall in the "beginner" category for sure. It was initially sparked by reading about "state machines". This was my attempt at it however I feel it is not quite where it should be: ON = "ON"
2
6533
by: Linda Liu[MSFT] | last post by:
Hi George, Keyboard input introduces the idea of focus. In Windows, a particular element is designated as having the focus, meaning that it acts as the target for keyboard input. The user sets the focus by clicking the mouse or tapping the stylus on the control in question, or by using navigation keys such as the Tab and cursor keys. In principle, any user interface element can receive the focus. The IsFocused property is defined on...
84
3973
by: Patient Guy | last post by:
Which is the better approach in working with Javascript? 1. Server side processing: Web server gets form input, runs it into the Javascript module, and PHP collects the output for document prep. 2. Client side processing: Web server gets form input and passes it to PHP which includes the Javascript written in a way to make the form input processed on the client side and rendered (probably using DOM function calls) on that side as...
0
9480
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10147
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10090
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9949
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
7499
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6739
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5380
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
1
4050
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
2879
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.