Automated Form Validation?

Stefan Richter

Hi, after coding for days on stupid form validations -
like:
strings (min / max length), numbers (min / max value), money (min / max
value), postcodes (min / max value), telephone numbers,
email addresses and so on.

I thought it might be better to program an automated, dynamic
form validation that works for all kinds of fields, shows the
necessary error messages and highlights the corresponding form fields.

Before I start to reinvent the wheel I thought I should ask you guys
if someone has done this before, or if there are some good examples of
how to do this on the web?

Thanks,

Stefan

Jul 17 '05 #1

Matt Mitchell

"Stefan Richter" <Do****@gmx.de> wrote in message
news:e2**************************@posting.google.com...
: Hi, after coding for days on stupid form validations -
: like:
: strings (min / max length), numbers (min / max value), money (min / max
: value), postcodes (min / max value), telephone numbers,
: email addresses and so on.
:
: I thought it might be better to program an automated, dynamic
: form validation that works for all kinds of fields, shows the
: necessary error messages and highlights the corresponding form fields.
:
: Before I start to reinvent the wheel I thought I should ask you guys
: if someone has done this before, or if there are some good examples of
: how to do this on the web?

I've rolled my own, as I suspect many people on here have; that said, there
are many excellent class libraries for this available online!

For starters, have a look on http://pear.php.net/ and http://phpclasses.org/
(before Manuel Lemos gets his plug in...).

There are also many PHP Application Framework (search for this on google)
systems, which will provide this kind of thing - I expect things like
phppeanuts probably do it...

Matt

Jul 17 '05 #2

Marcin Dobrucki

Matt Mitchell wrote:

> For starters, have a look on http://pear.php.net/ and http://phpclasses.org/
> (before Manuel Lemos gets his plug in...).

I have been using PEAR's QuickForm which does this kind of thing.
See the "User Documentation" and from there "Quick Start". Basically,
once you add an element to the form, you do $f->addRule(...), and
QuickForm creates the necessary JavaScript code to validate the fields.

/Marcin

Jul 17 '05 #3

Manuel Lemos

Hello,

on 02/22/2005 06:51 PM Stefan Richter said the following:

> Hi, after coding for days on stupid form validations -
> like:
> strings (min / max length), numbers (min / max value), money (min / max
> value), postcodes (min / max value), telephone numbers,
> email addresses and so on.
>
> I thought it might be better to program an automated, dynamic
> form validation that works for all kinds of fields, shows the
> necessary error messages and highlights the corresponding form fields.
>
> Before I start to reinvent the wheel I thought I should ask you guys
> if someone has done this before, or if there are some good examples of
> how to do this on the web?

As Matt mentioned, you may want to try this very popular forms
generation and validation class:

http://www.phpclasses.org/formsgeneration
--

Regards,
Manuel Lemos

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

PHP Reviews - Reviews of PHP books and other products
http://www.phpclasses.org/reviews/

Metastorage - Data object relational mapping layer generator
http://www.meta-language.net/metastorage.html

Jul 17 '05 #4

Chung Leong

"Stefan Richter" <Do****@gmx.de> wrote in message
news:e2**************************@posting.google.com...

> Hi, after coding for days on stupid form validations -
> like:
> strings (min / max length), numbers (min / max value), money (min / max
> value), postcodes (min / max value), telephone numbers,
> email addresses and so on.
>
> I thought it might be better to program an automated, dynamic
> form validation that works for all kinds of fields, shows the
> necessary error messages and highlights the corresponding form fields.
>
> Before I start to reinvent the wheel I thought I should ask you guys
> if someone has done this before, or if there are some good examples of
> how to do this on the web?

I have always found it easy to do it in every page. I mean, how hard is it
to check the length of a string or run a regexp?

Jul 17 '05 #5

Matt Mitchell

"Chung Leong" <ch***********@hotmail.com> wrote in message
news:Nu********************@comcast.com...
: > Before I start to reinvent the wheel I thought I should ask you guys
: > if someone has done this before, or if there are some good examples of
: > how to do this on the web?
:
: I have always found it easy to do it in every page. I mean, how hard is it
: to check the length of a string or run a regexp?

It's not hard, but it's pointless doing it 80-200 times in a single
application, when you can do it once and it will all just *work*. Less code
to debug, for starters.

It also allows you to do things like construct site interfaces
*dynamically*.

Obviously I wouldn't write a class to check that 2==2, but a method/function
to verify that an email address is valid, or a url is valid, can clean up
code and make sure that it's doing exactly what you expect it do.

Do you unroll all your loops too?

;-)

Matt

Jul 17 '05 #6

Chung Leong

"Matt Mitchell" <m_****************************@metalsponge.net> wrote in
message news:bV********************@fe1.news.blueyonder.co.uk...

> It's not hard, but it's pointless doing it 80-200 times in a single
> application, when you can do it once and it will all just *work*. Less code
> to debug, for starters.

It sounds good in theory, but we all know that it rarely is the case that
you write it once and it all just works. Validation is open-ended. You will
likely have to tinker with the code over time. When you do, good QA practice
tells you to retest all parts of the application that could be affected. And
obviously, it's no fun having to retest 80-200 features just because one new
input field requires special handling.

Jul 17 '05 #7

Matt Mitchell

"Chung Leong" <ch***********@hotmail.com> wrote in message
news:m-********************@comcast.com...
: "Matt Mitchell" <m_****************************@metalsponge.net> wrote in
: message news:bV********************@fe1.news.blueyonder.co.uk...
: > It's not hard, but it's pointless doing it 80-200 times in a single
: > application, when you can do it once and it will all just *work*. Less
: > code to debug, for starters.
:
: It sounds good in theory, but we all know that it rarely is the case that
: you write it once and it all just works. Validation is open-ended. You will
: likely have to tinker with the code over time. When you do, good QA practice
: tells you to retest all parts of the application that could be affected. And
: obviously, it's no fun having to retest 80-200 features just because one new
: input field requires special handling.

OK, I'm obviously missing something here.
I have a web app, with several forms that need to be filled in. One field
type that often comes up is an email address.

Scenario 1:

Each time the email address is checked, I have some code along the lines of

if (!preg_match('/(quickcommonsolution|some\\ mad\\ regex)/', $someinput)) ...

I have this code in my application 200 times.

Later on, I decide to change the rules for validating an email address.

I now have to change and CHECK 200 lines to make sure the regex is still
correct.

Scenario 2:

Each time the email address is checked, I do

if (!this_is_a_valid_email($someinput)) ...

I have this code in my application 200 times.

Later on, I decide to change the rules for validating an email address.

I now have to change and CHECK 1 line to make sure the regex is still
correct, because it is in the function defined.

You can have more than one type of "generic" field, maybe with variation
parameters - so have functions that can handle it.
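
A minimal sketch of Scenario 2, for illustration only. The thread names
this_is_a_valid_email() but never shows its body, so the body here is an
assumption (it leans on PHP's built-in filter_var(), which was added to PHP
after this thread was written), and is_string_length_between() is a
hypothetical example of a "generic" check with variation parameters.

```php
<?php
// Assumed body: the thread only gives the function name.
function this_is_a_valid_email(string $input): bool
{
    // filter_var() returns the filtered string on success, false on failure.
    return filter_var($input, FILTER_VALIDATE_EMAIL) !== false;
}

// Hypothetical generic check with variation parameters (min / max length).
function is_string_length_between(string $input, int $min, int $max): bool
{
    $length = strlen($input); // byte length; mb_strlen() suits multibyte text
    return $length >= $min && $length <= $max;
}

$someinput = 'stefan@example.com';
if (!this_is_a_valid_email($someinput)) {
    echo "Please enter a valid e-mail address.\n";
}
```

If the rules for e-mail addresses change later, only the body of the one
function changes; the 200 call sites stay as they are.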

The point is, if there are repeating aspects of your code, you place them in
the same place, so that you only have to write, check and change the code
ONCE. I really do fail to see the sense in what you're saying here, sorry.

Matt

Jul 17 '05 #8

Michael Fesser

.oO(Chung Leong)

> It sounds good in theory, but we all know that it rarely is the case that
> you write it once and it all just works. Validation is open-ended. You will
> likely have to tinker with the code over time.

OOP might help.

> When you do, good QA practice
> tells you to retest all parts of the application that could be affected. And
> obviously, it's no fun having to retest 80-200 features just because one new
> input field requires special handling.

You could write validation classes for every purpose that's needed:
string checks, range check, numbers, dates, sets, ... Whenever you need
something special, you just have to write a new validator and apply it
to the form field/value. And if you want to change a validation
algorithm you just have to alter one class.
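
One way the validator-class idea could look, as a rough sketch. The
interface, the class names and validate_form() are all made up for this
example; nothing here comes from an existing library.

```php
<?php
// Hypothetical interface: one small class per kind of check.
interface Validator
{
    public function isValid(string $value): bool;
}

class LengthValidator implements Validator
{
    private $min;
    private $max;

    public function __construct(int $min, int $max)
    {
        $this->min = $min;
        $this->max = $max;
    }

    public function isValid(string $value): bool
    {
        $length = strlen($value);
        return $length >= $this->min && $length <= $this->max;
    }
}

class RangeValidator implements Validator
{
    private $min;
    private $max;

    public function __construct(float $min, float $max)
    {
        $this->min = $min;
        $this->max = $max;
    }

    public function isValid(string $value): bool
    {
        return is_numeric($value)
            && (float) $value >= $this->min
            && (float) $value <= $this->max;
    }
}

class SetValidator implements Validator
{
    private $allowed;

    public function __construct(array $allowed)
    {
        $this->allowed = $allowed;
    }

    public function isValid(string $value): bool
    {
        return in_array($value, $this->allowed, true);
    }
}

// Applies one validator per field and returns the names of the failing fields.
function validate_form(array $input, array $validators): array
{
    $failed = [];
    foreach ($validators as $field => $validator) {
        $value = isset($input[$field]) ? (string) $input[$field] : '';
        if (!$validator->isValid($value)) {
            $failed[] = $field;
        }
    }
    return $failed;
}
```

A special case then becomes one more class implementing Validator, and the
existing classes are left untouched.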

Micha

Jul 17 '05 #9

rick.huby

From my experience of form checking it should be easy to set specific
rules for the various fields.

I overcame the overall problem by creating a FormChecker class with
functions such as ValidateEmail, ValidateText, ValidateURL etc.

If suddenly you find that you need to do a specific new check (eg
validate an email from a specific domain) then just create a new
function for this and leave the other one be.

This *should* mean that you don't need to worry about retesting all of
the other functions you have used elsewhere.
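
A rough sketch of that FormChecker idea. ValidateEmail and ValidateURL are
the names mentioned above; their bodies are assumptions (built on PHP's
filter_var()), and ValidateEmailFromDomain is a hypothetical name for the
"specific new check".

```php
<?php
class FormChecker
{
    // Assumed body; the post only names the function.
    public static function ValidateEmail(string $email): bool
    {
        return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    }

    // Assumed body; the post only names the function.
    public static function ValidateURL(string $url): bool
    {
        return filter_var($url, FILTER_VALIDATE_URL) !== false;
    }

    // Hypothetical later addition: a stricter check lives in its own
    // function, so ValidateEmail() itself is not modified.
    public static function ValidateEmailFromDomain(string $email, string $domain): bool
    {
        if (!self::ValidateEmail($email)) {
            return false;
        }
        $at = strrpos($email, '@');
        return strcasecmp(substr($email, $at + 1), $domain) === 0;
    }
}
```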

With regard to the overall issue - are you saying that you have a
function defined in an include file for validating the email, or that
you have the actual code on each validation page?

Jul 17 '05 #10

Matt Mitchell

<ri*******@e-connected.com> wrote in message
news:11**********************@f14g2000cwb.googlegroups.com...

: With regard to the overall issue - are you saying that you have a
: function defined in an include file for validating the email, or that
: you have the actual code on each validation page?

Specifically, I suggested putting this kind of code into a function/method
call, and was told it is better to write the code *every time it is needed*.

Discuss

Jul 17 '05 #11

Chung Leong

"Matt Mitchell" <m_****************************@metalsponge.net> wrote in
message news:uc*******************@fe3.news.blueyonder.co.uk...

> Scenario 2:
>
> Each time the email address is checked, I do
>
> if (!this_is_a_valid_email($someinput)) ...
>
> I have this code in my application 200 times.
>
> Later on, I decide to change the rules for validating an email address.
>
> I now have to change and CHECK 1 line to make sure the regex is still
> correct, because it is in the function defined.

I don't know what kind of application would "often" ask the user for an e-mail
address. But anyway...

Taking your example, say initially the validation function will only return
true if an address is of the form xx**@xxxx.xxx. Throughout the application
the assumption is made that an e-mail address is of that format and that
format only. Since it cannot contain single quotes, when you insert an
address into a SQL statement, you decide that it's not necessary to escape
them. Likewise, since it cannot contain angle brackets, you decide not to
pass it through htmlspecialchars() when you echo it.

Now some time later, you--or perhaps someone else--decide to relax the
validation rule to accommodate full RFC 2822 syntax, that is, e-mail addresses
with display names. With a change to that one line of code, you suddenly
introduce God knows how many SQL injection and cross-site scripting
vulnerabilities into your application. At the least, your application
wouldn't work correctly. And you wouldn't even know that has happened, since
your argument is predicated on you not retesting features that have been
affected by the change.

> The point is, if there are repeating aspects of your code, you place them
> in the same place, so that you only have to write, check and change the code
> ONCE. I really do fail to see the sense in what you're saying here,
> sorry.

No one is arguing against modularization here. Data validation is just not
well disposed to be modularized and centralized. I mean, think about it,
what constitutes valid data? That term has little meaning without a context.
We say that a piece of data is valid if it conforms to the expectation of
the code that makes use of the data. The most obvious example is date
handling. If the date is going to be converted to a Unix timestamp, it
cannot be earlier than 1970. On the other hand, if it's going to be stored
as a string, then "N/A" could be valid. How the validation should be done
depends on how the data will be used. Instead of trying to communicate this
context information to some independent validation module, it is often easier
to just do the validation right there.
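
To make the date example concrete, here is a small sketch in which the same
input is valid for one use and invalid for another. The function names and
the strict YYYY-MM-DD input format are assumptions made for this
illustration, and the "not earlier than 1970" rule is simply the constraint
stated above.

```php
<?php
// Parses a strict YYYY-MM-DD date in UTC; returns null when the input is
// malformed or names an impossible day such as 2005-02-30.
function parse_iso_date(string $input): ?DateTimeImmutable
{
    $date = DateTimeImmutable::createFromFormat('!Y-m-d', $input, new DateTimeZone('UTC'));
    if ($date === false || $date->format('Y-m-d') !== $input) {
        return null;
    }
    return $date;
}

// Context 1: the value will be stored as a Unix timestamp, so, following
// the post, dates before 1970-01-01 are rejected.
function is_valid_for_timestamp_column(string $input): bool
{
    $date = parse_iso_date($input);
    return $date !== null && $date->getTimestamp() >= 0;
}

// Context 2: the value will be stored as a string, where "N/A" is fine.
function is_valid_for_free_text_column(string $input): bool
{
    return $input === 'N/A' || parse_iso_date($input) !== null;
}
```

Both checks can still share parse_iso_date(); what differs per context is
only the final rule applied to the parsed value.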

Jul 17 '05 #12

Matt Mitchell

"Chung Leong" <ch***********@hotmail.com> wrote in message
news:wb********************@comcast.com...
: "Matt Mitchell" <m_****************************@metalsponge.net> wrote in
: message news:uc*******************@fe3.news.blueyonder.co.uk...
: I don't know what kind of application would "often" ask the user for an e-mail
: address. But anyway...

It was intended as an example - to reduce the level of abstraction in the
discussion (which was about abstraction in code!...)

:
: Taking your example, say initially the validation function will only return
: true if an address is of the form xx**@xxxx.xxx. Throughout the application
: the assumption is made that an e-mail address is of that format and that
: format only. Since it cannot contain single quotes, when you insert an
: address into a SQL statement, you decide that it's not necessary to escape
: them. Likewise, since it cannot contain angle brackets, you decide not to
: pass it through htmlspecialchars() when you echo it.

I would refute this "sane programming scenario" right at the point where you
decide that user-inputted data is fine to insert into a database without
escaping. On which particular planet is this a good idea? If you are
taking even basic precautions against attacks, then you escape ALL data
before putting it into the database - even down to things like making sure
that numeric fields contain numeric data, etc.

: Now some time later, you--or perhaps someone else--decide to relax the
: validation rule to accommodate full RFC 2822 syntax, that is, e-mail addresses
: with display names. With a change to that one line of code, you suddenly
: introduce God knows how many SQL injection and cross-site scripting
: vulnerabilities into your application. At the least, your application
: wouldn't work correctly. And you wouldn't even know that has happened, since
: your argument is predicated on you not retesting features that have been
: affected by the change.

Or you don't introduce any vulnerabilities at all, if you follow proper
programming practices. Data is escaped before putting it into a database,
entity-escaped before putting it in an html page, and url-encoded before
putting it in a url.

If changing the validation rules on a field can cause this kind of problem,
then you don't have a security problem with the field values - you have a
security problem with a programmer who doesn't check data before using it.
Period.
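
For the HTML and URL parts of that rule, a minimal sketch using PHP's
built-in htmlspecialchars() and rawurlencode(). The wrapper names
html_escape() and url_param() and the /lookup URL are made up for this
example; the database side is best left to the driver's own parameter
binding and is not covered here.

```php
<?php
// Entity-escape a value before echoing it into an HTML page.
function html_escape(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// URL-encode a value before placing it in a query string.
function url_param(string $value): string
{
    return rawurlencode($value);
}

// An RFC 2822 style address with a display name, as in the example above.
$email = '"Matt" <matt@example.com>';

echo '<p>Contact: ' . html_escape($email) . "</p>\n";
echo '<a href="/lookup?email=' . url_param($email) . "\">look up</a>\n";
```

Because the escaping happens at the point of output, it does not matter
whether the validation rule upstream is strict or relaxed.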

: > The point is, if there are repeating aspects of your code, you place them
: > in the same place, so that you only have to write, check and change the
: > code ONCE. I really do fail to see the sense in what you're saying here,
: > sorry.

OK, I'll quote your earlier post:

: "Matt Mitchell" <m_****************************@metalsponge.net> wrote in
: message news:bV********************@fe1.news.blueyonder.co.uk...
: > It's not hard, but it's pointless doing it 80-200 times in a single
: > application, when you can do it once and it will all just *work*. Less
: > code to debug, for starters.
:
: It sounds good in theory, but we all know that it rarely is the case that
: you write it once and it all just works. Validation is open-ended. You will
: likely have to tinker with the code over time. When you do, good QA practice
: tells you to retest all parts of the application that could be affected. And
: obviously, it's no fun having to retest 80-200 features just because one new
: input field requires special handling.

(To set it in context, this was a response to my comment that it would make
more sense to code regex email validation in a write-once function, rather
than coding it for each use)

This posting would seem to be arguing that modularization is *not* a good
idea, that it's better to write the code each time you need that
functionality.

:
: No one is arguing against modularization here. Data validation is just not
: well disposed to be modularized and centralized. I mean, think about it,
: what constitutes valid data? That term has little meaning without a context.

OK, so here are a few examples of classes of data that can be validated to
some kind of rule:

UK phone numbers/postcodes
US phone numbers/zipcodes
European postcodes

Most countries' car registration numbers, except for "vanity" plates
Social security numbers for most countries
Credit card numbers
Bank account numbers
Passport numbers
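
Two of these can be sketched in a few lines. This is only an illustration:
the function names are made up, the Luhn checksum is the standard check
digit scheme used by payment card numbers (it says nothing about whether a
card actually exists), and the ZIP pattern covers just the five-digit and
ZIP+4 forms.

```php
<?php
// Luhn checksum: starting from the rightmost digit, double every second
// digit, subtract 9 from any result above 9, and require that the total
// is a multiple of 10.
function passes_luhn_check(string $number): bool
{
    $digits = preg_replace('/[\s-]/', '', $number); // allow spaces and dashes
    if (preg_match('/^\d{12,19}\z/', $digits) !== 1) {
        return false;
    }
    $sum = 0;
    $double = false;
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $digit = (int) $digits[$i];
        if ($double) {
            $digit *= 2;
            if ($digit > 9) {
                $digit -= 9;
            }
        }
        $sum += $digit;
        $double = !$double;
    }
    return $sum % 10 === 0;
}

// US ZIP code: five digits, optionally followed by a dash and four digits.
function is_us_zip_code(string $zip): bool
{
    return preg_match('/^\d{5}(-\d{4})?\z/', $zip) === 1;
}
```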

: We say that a piece of data is valid if it conforms to the expectation of
: the code that makes use of the data. The most obvious example is date
: handling. If the date is going to be converted to a Unix timestamp, it
: cannot be earlier than 1970. On the other hand, if it's going to be stored
: as a string, then "N/A" could be valid. How the validation should be done

But if a user enters "N/A" when you want to make sure they are entering a
date, then "N/A" is NOT valid - maybe it would be a better idea to indicate
somewhere else that there is no valid date in an input field/database field.
Remember the problems that came up for "09-09-99" being used to indicate "no
date"?

: depends on how the data will be used. Instead of trying to communicate this
: context information to some independent validation module, it often easier
: to just do the validation right there.

But in the vast majority of cases, the validation IS generic. Most computer
software, most people, and most businesses handle the same type of data
repeatedly; computers are useful because they are good at doing the same
task over and over and over again, exactly the same each time. People are
very bad at doing this, and that's why it's better to get something right,
and then let a computer handle getting it done right the next time.

Jul 17 '05 #13

Tony Marston

"Matt Mitchell" <m_****************************@metalsponge.net> wrote in
message news:dL*******************@fe3.news.blueyonder.co.uk...

> <ri*******@e-connected.com> wrote in message
> news:11**********************@f14g2000cwb.googlegroups.com...
>
> : With regard to the overall issue - are you saying that you have a
> : function defined in an include file for validating the email, or that
> : you have the actual code on each validation page?
>
> Specifically, I suggested putting this kind of code into a function/method
> call, and was told it is better to write the code *every time it is
> needed*.

Whoever told you that is an utter moron. The trick to writing efficient code
is "write once, use many" instead of "write many, use once". I have written
a single validation class which can perform the basic validation for any and
every form. It has two inputs: (1) an associative array of name=value pairs
(usually the $_POST array), and (2) an array of validation rules for each
field in the form. Take a look at the following URLs for details:

http://www.tonymarston.co.uk/php-mys...ects2.html#a5a

http://www.tonymarston.co.uk/php-mysql/databaseobjects2.html#$fieldspec
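
The general shape of a rule-driven validator with those two inputs can be
sketched as below. This is not the class from the linked articles: the rule
keys (required, maxlength, pattern), the field names and validate_fields()
are assumptions chosen for the illustration.

```php
<?php
// Takes (1) name => value pairs and (2) name => rules, and returns
// name => error message for every field that fails.
function validate_fields(array $data, array $rules): array
{
    $errors = [];
    foreach ($rules as $field => $rule) {
        $raw = $data[$field] ?? '';
        $value = is_scalar($raw) ? trim((string) $raw) : '';

        if ($value === '') {
            if (!empty($rule['required'])) {
                $errors[$field] = 'This field is required.';
            }
            continue; // nothing else to check on an empty optional field
        }
        if (isset($rule['maxlength']) && strlen($value) > $rule['maxlength']) {
            $errors[$field] = "Must be at most {$rule['maxlength']} characters.";
            continue;
        }
        if (isset($rule['pattern']) && preg_match($rule['pattern'], $value) !== 1) {
            $errors[$field] = 'Invalid format.';
        }
    }
    return $errors;
}

// Hypothetical rules for a small form.
$rules = [
    'name'     => ['required' => true, 'maxlength' => 40],
    'postcode' => ['required' => true, 'pattern' => '/^\d{5}\z/'],
    'phone'    => ['pattern' => '/^[0-9 +()-]{6,20}\z/'],
];

$errors = validate_fields(
    ['name' => 'Stefan', 'postcode' => '1234', 'phone' => ''],
    $rules
);
```

Because the errors come back keyed by field name, the same array can drive
both the error messages and the highlighting of the corresponding fields.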

--
Tony Marston

http://www.tonymarston.net

Jul 17 '05 #14

Jean-Marc Molina

Stefan Richter a écrit/wrote :

> Hi, after coding for days on stupid form validations -
> ...
> Before I start to reinvent the wheel I thought I should ask you guys
> if someone has done this before, or if there are some good examples of
> how to do this on the web?

I can't check all the existing answers, but my advice is to look at the PEAR
Validate package: http://pear.php.net/package/Validate. To avoid reinventing
the wheel, I would implement a form validator as a singleton, and derive each
form's validation class from a form validation base class. Each class calls
the form validator's methods to validate its form attributes and implements
the base class's methods, such as display and validate.

--
Jean-Marc.

Jul 17 '05 #15

Matt Mitchell

"Tony Marston" <to**@NOSPAM.demon.co.uk> wrote in message
news:cv*******************@news.demon.co.uk...

: Whoever told you that is an utter moron. The trick to writing efficient code
: is "write once, use many" instead of "write many, use once". I have written
: a single validation class which can perform the basic validation for any and

This is exactly the viewpoint I advanced myself! (Obviously the "write
once, use many" section - can't comment on the "moron" bit...)

This whole flamewar is about whether or not you can write generic
input-validation code.

: every form. It has two inputs: (1) an associative array of name=value pairs
: (usually the $_POST array), and (2) an array of validation rules for each
: field in the form. Take a look at the following URLs for details:

Similar kind of thing here - I use an array of fields, giving field name,
restrictions/validation requirements, description, etc, and the code handles
input, output, validation and form creation.

Matt

Jul 17 '05 #16

Chung Leong

"Matt Mitchell" <m_****************************@metalsponge.net> wrote in
message news:Cq*******************@fe2.news.blueyonder.co.uk...

> I would refute this "sane programming scenario" right at the point where you
> decide that user-inputted data is fine to insert into a database without
> escaping. On which particular planet is this a good idea? If you are
> taking even basic precautions against attacks, then you escape ALL data
> before putting it into the database - even down to things like making sure
> that numeric fields contain numeric data, etc.

That's a bit unfair, isn't it? I would, of course, lose every argument arguing
against the ideal scenario. Everything would be hunky-dory if everyone
followed best practice et al. What guarantee can you give that best practice
was followed, given that, as you said below, people are prone to err? And
keep in mind that validation has a direct bearing on security. If your
assertion that everything was coded according to best practice turned out to
be untrue, then you have all sorts of holes in your application.

> But in the vast majority of cases, the validation IS generic. Most computer
> software, most people, and most businesses handle the same type of data
> repeatedly; computers are useful because they are good at doing the same
> task over and over and over again, exactly the same each time. People are
> very bad at doing this, and that's why it's better to get something right,
> and then let a computer handle getting it done right the next time.

If that's true, then the validation rules aren't going to change. So you're
back to square one.

Jul 17 '05 #17

Chung Leong

"Tony Marston" <to**@NOSPAM.demon.co.uk> wrote in message
news:cv*******************@news.demon.co.uk...

> Whoever told you that is an utter moron. The trick to writing efficient code
> is "write once, use many" instead of "write many, use once". I have written
> a single validation class which can perform the basic validation for any and
> every form. It has two inputs: (1) an associative array of name=value pairs
> (usually the $_POST array), and (2) an array of validation rules for each
> field in the form. Take a look at the following URLs for details:

Maybe I'm an utter moron, but I don't see how your class can validate fields
that are cross-dependent. A simple example is date range. Whether an end
date is valid depends on the start date, and validity of both could be
dependent on the current date.

Jul 17 '05 #18

Tony Marston

"Chung Leong" <ch***********@hotmail.com> wrote in message
news:p7********************@comcast.com...

> "Tony Marston" <to**@NOSPAM.demon.co.uk> wrote in message
> news:cv*******************@news.demon.co.uk...
> > Whoever told you that is an utter moron. The trick to writing efficient
> > code is "write once, use many" instead of "write many, use once". I have
> > written a single validation class which can perform the basic validation
> > for any and every form. It has two inputs: (1) an associative array of
> > name=value pairs (usually the $_POST array), and (2) an array of
> > validation rules for each field in the form. Take a look at the following
> > URLs for details:
>
> Maybe I'm an utter moron, but I don't see how your class can validate fields
> that are cross-dependent. A simple example is date range. Whether an end
> date is valid depends on the start date, and validity of both could be
> dependent on the current date.

That is because there are two levels of validation:
(1) Primary - where a single field can be validated according to various
specifications, such as type and size, or that its contents matches a
particular pattern, et cetera.
(2) Secondary - where a field needs to be validated against another field,
which may or may not be on the same table.

In my infrastructure all primary validation is handled by a single
validation class (refer to
http://www.tonymarston.co.uk/php-mys...lidation.class)
which makes use of an array of field specifications. Once this has been
defined all primary validation is automatic.

Secondary validation cannot be defined in the $fieldspec array, so it must
be defined elsewhere (refer to
http://www.tonymarston.co.uk/php-mys...aq.html#faq13).
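
As an illustration of what a "secondary" check might look like for the date
range example, here is a short sketch. It is not taken from the linked
pages: the field names start_date and end_date, the function name, and the
rule that a start date may not lie in the past are all assumptions.

```php
<?php
// Cross-field check, assumed to run only after each date has already
// passed primary (single-field) validation as YYYY-MM-DD.
function check_date_range(array $data, DateTimeImmutable $today): array
{
    $start = DateTimeImmutable::createFromFormat('!Y-m-d', (string) ($data['start_date'] ?? ''));
    $end   = DateTimeImmutable::createFromFormat('!Y-m-d', (string) ($data['end_date'] ?? ''));

    if ($start === false || $end === false) {
        return ['form' => 'Dates must pass primary validation first.'];
    }

    $errors = [];
    if ($start < $today) {
        $errors['start_date'] = 'Start date must not be in the past.';
    }
    if ($end < $start) {
        $errors['end_date'] = 'End date must not be before the start date.';
    }
    return $errors;
}

// The current date is passed in rather than read inside the function,
// which keeps the rule easy to test.
$today  = new DateTimeImmutable('today');
$errors = check_date_range($_POST, $today);
```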

--
Tony Marston

http://www.tonymarston.net

Jul 17 '05 #19

Matt Mitchell

"Chung Leong" <ch***********@hotmail.com> wrote in message
news:wv********************@comcast.com...

: That's a bit unfair, isn't it? I would, of course, lose every argument arguing
: against the ideal scenario. Everything would be hunky-dory if everyone
: followed best practice et al. What guarantee can you give that best practice
: was followed, given that, as you said below, people are prone to err? And
: keep in mind that validation has a direct bearing on security. If your
: assertion that everything was coded according to best practice turned out to
: be untrue, then you have all sorts of holes in your application.
:

I was considering this from the standpoint of coding to improve security.
If you are arguing from the viewpoint that "it is necessary to prevent sql
injection/xss attacks/etc", then it is natural to assume that you would
validate all user data and check it doesn't contain anything harmful.
Whether you do this by refusing to allow users to edit the site templates,
or by escaping all their data to block them, it is a logically inconsistent
argument to say "my method which checks user data is better than yours,
because yours requires the code to check the user data".
: > But in the vast majority of cases, the validation IS generic. Most computer
: > software, most people, and most businesses handle the same type of data
: > repeatedly; computers are useful because they are good at doing the same
: > task over and over and over again, exactly the same each time. People are
: > very bad at doing this, and that's why it's better to get something right,
: > and then let a computer handle getting it done right the next time.

: If that's true, then the validation rules aren't going to change. So you're
: back to square one.

1 - If the rules *never* change, then it is still more efficient to code the
validation once, and then reference that function/method multiple times
2 - I didn't say that the validation rules never change, I was arguing that
if they *do* change, it is easier, less error-prone, and more efficient to
change them in a single location; additionally this method can lead to much
more readable code.

Matt

Jul 17 '05 #20

Chung Leong

"Matt Mitchell" <m_****************************@metalsponge.net> wrote in
message news:Ko*******************@fe2.news.blueyonder.co.uk...

> I was considering this from the standpoint of coding to improve security.
> If you are arguing from the viewpoint that "it is necessary to prevent sql
> injection/xss attacks/etc", then it is natural to assume that you would
> validate all user data and check it doesn't contain anything harmful.
> Whether you do this by refusing to allow users to edit the site templates,
> or by escaping all their data to block them, it is a logically inconsistent
> argument to say "my method which checks user data is better than yours,
> because yours requires the code to check the user data".

What the heck kind of approach to improve security is that? Improving
security by...doing nothing? You can't just assume that data
validation/encoding happens correctly. The only way you can verify that user
input is correctly and securely handled is through boundary testing. In the
e-mail example, all the fields would pass the test prior to the change
because special characters are rejected by the validation routine. Whether
they are escaped is thus not tested. After the change, special characters
get through validation to reach the database or HTML. If you do not retest
all parts of the application affected by the change then there's always a
chance of a vulnerability being introduced.

> 1 - If the rules *never* change, then it is still more efficient to code the
> validation once, and then reference that function/method multiple times

My answer to that is "the validation function exists already; it's called
preg_match." Regular expressions can describe the requirement of the input
exactly, whereas generic terms like "date" and "phone number" cover a variety
of possible formats. When exactly does a function like IsPhoneNumber()
return true? Is 555-444-4444 ok? Or does it have to be (555) 444-4444? What
if I put in a country code?
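
To illustrate the point about exactness, the two formats mentioned above
can each be pinned down by their own pattern. The constant names and the
matches() helper are made up for this sketch.

```php
<?php
// One pattern per accepted format, so each says exactly what it allows.
const PHONE_DASHED = '/^\d{3}-\d{3}-\d{4}\z/';       // 555-444-4444
const PHONE_PARENS = '/^\(\d{3}\) \d{3}-\d{4}\z/';   // (555) 444-4444

// preg_match() returns 1 on a match, 0 on no match, false on error.
function matches(string $pattern, string $input): bool
{
    return preg_match($pattern, $input) === 1;
}

$phone = '(555) 444-4444';
if (!matches(PHONE_DASHED, $phone) && !matches(PHONE_PARENS, $phone)) {
    echo "Please enter the phone number as 555-444-4444 or (555) 444-4444.\n";
}
```

Neither pattern accepts a country code; under this approach, allowing one
would mean writing a third pattern.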

> 2 - I didn't say that the validation rules never change, I was arguing that
> if they *do* change, it is easier, less error-prone, and more efficient to
> change them in a single location; additionally this method can lead to much
> more readable code.

Again, you're ignoring the need for testing, which would consume far more
time than making the change itself. And it goes without saying that untested
code is more error-prone.

I would rather make the changes one at a time and test each of them than
make one change and have it propagate throughout. Copy-and-pasting takes
seconds and I would know which part of the application has been affected.

Jul 17 '05 #21

Matt Mitchell

"Chung Leong" <ch***********@hotmail.com> wrote in message
news:Hc********************@comcast.com...
: > Whether you do this by refusing to allow users to edit the site templates,
: > or by escaping all their data to block them, it is a logically inconsistent
: > argument to say "my method which checks user data is better than yours,
: > because yours requires the code to check the user data".
:
: What the heck kind of approach to improve security is that? Improving
: security by...doing nothing? You can't just assume that data
: validation/encoding happens correctly. The only way you can verify that user
: input is correctly and securely handled is through boundary testing. In the

OK, I think this is going to be my last post in this thread, so here goes...

Shades of rtfThread here. Brief executive summary of the to-and-fro on
this:

MM: Use templates and function calls for validation
CL: Templates are bad. People can insert javascript into them. Run XSS
    attacks. Validation functions are bad, since you validate things
    differently every time.
MM: Generic validation is good, since you write it once and then just *use*
    it - you know it works already.
CL: But what about if you change the criteria for validation? Then your
    code will break.

<lots of to-and-fro with "yes it will" "no it won't" "yo' mama" etc, etc>

My point on validation and escaping is that
1 - when you accept data into the system, validate it. Check it follows the
rules you want. Reject it if it doesn't. Sanitise the nasty bits out.
2 - when you store data, make sure that it encodes into a way so that
everything behaves the way you're expecting. Don't insert an unescaped
single quote mark into a database, just because your script was passed one.

: e-mail example, all the fields would pass the test prior to the change
: because special characters are rejected by the validation routine. Whether
: they are escaped is thus not tested. After the change, special characters
: get through validation to reach the database or HTML. If you do not retest
: all parts of the application affected by the change then there's always a
: chance of a vulnerability being introduced.

Applying the approach above to the email example, after the change:
1 - emails will only be passed as "ok" if any special characters occur in
the right form. Validation is slightly more complex than just saying "oh
yeah, it matches /^[a-zA-Z0-9@.+-]{3,}$/". I would never advocate just
allowing any string that contained an "@" char in it as a valid email
address.
2 - after you've checked that the email address IS in a legal format, before
storing it in a database you need to ensure that the characters are encoded
in such a way that you construct a database query that is correct, legal,
and does not cause any security problems on the server. All user data is
encoded with slashes added, etc, before it goes anywhere near a database,
even if the field is only allowed to match /^[a-z]*$/. I know this because
the database access code does it, so the rest of the application doesn't
have to check whether the data needs escaping: it will *always* get done.

How is this "doing nothing"?
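
To make the two steps concrete, here is a minimal sketch in PHP. The function
names (isValidEmail, storeEmail) and the "subscribers" table are made up for
illustration, and I've used PDO prepared statements as the "always escape"
mechanism instead of adding slashes by hand. Treat it as one possible shape,
not the actual code from my system.

```php
<?php
// Hypothetical helper names; a sketch under the assumptions stated above.

// Step 1: validation. Accept only well-formed addresses, reject the rest.
function isValidEmail(string $input): bool
{
    // filter_var() returns the filtered string on success, false on failure.
    return filter_var($input, FILTER_VALIDATE_EMAIL) !== false;
}

// Step 2: storage. The database layer binds every value as a parameter,
// so callers never have to decide whether a value "needs" escaping.
function storeEmail(PDO $db, string $email): void
{
    // "subscribers" is an assumed table with a single "email" column.
    $stmt = $db->prepare('INSERT INTO subscribers (email) VALUES (:email)');
    $stmt->execute([':email' => $email]);
}
```

An address such as o'brien@example.com can pass step 1 and still contains a
single quote, which is exactly why step 2 happens regardless of what the
validation rule allows.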

: > 1 - If the rules *never* change, then it is still more efficient to code the
: > validation once, and then reference that function/method multiple times
:
: My answer to that is "the validation function exists already; it's called
: preg_match." Regular expressions can describe the requirement of the input

So, a function that uses a regular expression *can't* specify input
requirements exactly? How come?

Also, applying a reductio ad absurdum to your argument:
1 - using preg_match() is redundant, since it's possible to write code that
will do this using string comparison functions.
2 - PHP is redundant, since any other Turing-complete language can carry out
these operations.

: of possible formats. When exactly does a function like IsPhoneNumber()
: return true? Is 555-444-4444 ok? Or does it have to be (555) 444-4444? What
: if I put in a country code?

This is a great example of confusing presentation with information.

If you are accepting a standard US phone number as input, the key criterion
is that there should be precisely 10 numeric digits in the string. You can
restrict it more, e.g. limit to valid area codes, etc, but this principle is
valid.

So:
    s/[^0-9]//g
    if match /^[0-9]{10}$/
        isphonenumber = true
    else
        isphonenumber = false
    end if

That way, "555-444-4444" will validate, as will "(555) 444-4444". The
brackets, spaces and hyphens do not affect the underlying semantics of the
input. Understanding this type of distinction and abstraction is key to
designing a non-trivial system.
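
In PHP, the intent of that pseudocode might look roughly like this.
isUsPhoneNumber is a made-up name, and the sketch ignores every character
that isn't a digit before counting, so treat it as an illustration of the
principle, not a finished validator.

```php
<?php
// Hypothetical function name; strips presentation, then checks information.
function isUsPhoneNumber(string $input): bool
{
    // Drop presentation characters: anything that is not a digit.
    // preg_replace() returns null only on a regex error; fall back to ''.
    $digits = preg_replace('/[^0-9]+/', '', $input) ?? '';

    // What remains must be exactly ten digits.
    return preg_match('/^[0-9]{10}$/', $digits) === 1;
}
```

Both isUsPhoneNumber('555-444-4444') and isUsPhoneNumber('(555) 444-4444')
return true, because they carry the same ten digits; '555-4444' returns
false.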

What *if* you wanted to add different countries' phone numbers into possible
validation types? One solution: have the country code in a separate field.
Then you can check this to be a valid code, and then select an appropriate
validation function/regex to check the "local" part of the phone number.
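
A rough sketch of the separate-field approach follows. The function name and
the rule table are hypothetical, and the patterns are deliberately simplistic
placeholders, not real national numbering plans.

```php
<?php
// Hypothetical dispatcher: pick the rule for the local part by country code.
function isPhoneNumberFor(string $countryCode, string $local): bool
{
    // Assumed rule table: country calling code => pattern for the digits
    // of the local part. Illustrative values only.
    $rules = [
        '1'  => '/^[0-9]{10}$/',     // US: exactly ten digits
        '44' => '/^[0-9]{9,10}$/',   // UK: placeholder rule
    ];

    // Unknown country code: reject rather than guess.
    if (!isset($rules[$countryCode])) {
        return false;
    }

    // Strip presentation characters, then apply the country's rule.
    $digits = preg_replace('/[^0-9]+/', '', $local) ?? '';
    return preg_match($rules[$countryCode], $digits) === 1;
}
```

Adding a country then means adding one entry to the table; the callers do not
change.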

Or you could write one very long regex that handles all of this at once; it
is functionally equivalent, though probably not the best way to implement
it.

It's all still possible, and since validating several different countries'
phone numbers is so much more complex, perhaps it would be better to
encapsulate this functionality into a function so that the code, once
verified, can be re-used without having to make sure the code is inserted
correctly.

: exactly, whereas generic terms like "date" and "phone number" cover variety

No, generic terms like "date" and "phone number" come from a fuzzy, human
domain. The concept of data validation is to restrict the fuzzy input to a
clearly-defined subset. If I tell you the date is "30 February 2007.2", I
suspect that you would tell me that there is no such date. Why would you do
this? Because it is invalid. There are rules for the content of dates, and
most computer systems that need to manipulate data generally place
restrictions on the kinds of data they are prepared to work with.
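
For instance, a date check along these lines would reject that input. This is
only a sketch: isValidDate and the "day month-name year" format are
assumptions made for the example, while checkdate() is PHP's built-in
calendar check.

```php
<?php
// Hypothetical helper: accepts dates written like "28 February 2007".
function isValidDate(string $input): bool
{
    $months = [
        'january' => 1, 'february' => 2, 'march' => 3, 'april' => 4,
        'may' => 5, 'june' => 6, 'july' => 7, 'august' => 8,
        'september' => 9, 'october' => 10, 'november' => 11, 'december' => 12,
    ];

    // Rule 1: the shape must be "day month-name four-digit-year", nothing more.
    if (preg_match('/^([0-9]{1,2}) ([A-Za-z]+) ([0-9]{4})$/D', $input, $m) !== 1) {
        return false;
    }

    // Rule 2: the month name must be one we know.
    $month = $months[strtolower($m[2])] ?? null;
    if ($month === null) {
        return false;
    }

    // Rule 3: the day must exist in that month and year (leap years included).
    return checkdate($month, (int) $m[1], (int) $m[3]);
}
```

"30 February 2007.2" fails rule 1 because of the trailing ".2", and
"30 February 2007" fails rule 3 because February has no 30th day.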

: > 2 - I didn't say that the validation rules never change, I was arguing that
: > if they *do* change, it is easier, less error-prone, and more efficient to
: > change them in a single location; additionally this method can lead to much
: > more readable code.
:
: Again, you're ignoring the need for testing, which would consume far more
: time than making the change itself. And it goes without saying that untested

So changing one regex is more time-consuming than changing 30? And testing
one regex is more time-consuming than testing 30? What I'm saying is that
once you decide a field/variable/whatever needs to contain a valid email
(within the given requirements of the system), and you have a function which
tests for this property, it isn't necessary to test it beyond demonstrating
that it follows the criteria set.

: code is more error-prone.
:
: I would rather make the changes one at a time and test each of them than to
: make one change and have it propagate throughout. Copy-and-pasting takes
: seconds and I would know which part of the application has been affected.

Again, every time I want to take an email as input, I check it for validity.
I always want emails to fit within the same set of limits. Similarly, when
I take a US phone number as input, I have a very rigid set of criteria to
make sure that it *is* a US phone number. Different parts of the same
system do not have different "conceptions" of what a valid US phone number
is; the abstraction is created so that it can be applied multiple times.

--------

Please feel free to reply to this (both Chung Leong and anyone else who might
be interested); I shan't reply, however. I enjoy reading and posting in this
newsgroup because I find it interesting, and I think it is a good idea to
help other people who may be struggling with problems I've had to solve in
the past.

It does seem that we will never agree on many key points, so this is
unlikely to be a fruitful way to spend time. I *do* want to emphasise,
though, that I think the use of code modularisation (through functions, OOP
or any other method) is a Good Thing because it makes for better-engineered
software. I also feel that the use of templates for web pages can bring
about similar benefits, and makes for more maintainable sites, especially
if many people work on many different parts of the sites.

I'm not interested in being vindicated; I suspect quite a few people out
there agree with me on at least a few points, but even if they do not, I
still respectfully oppose your position.

Matt Mitchell

Jul 17 '05 #22
