
validation with 404 htaccess redirection


Hello everybody

Does anybody know why the W3C validator cannot fetch pages that use 404 htaccess
redirection? I set up two web sites so that clients request non-existent
URLs, and htaccess redirects the calls to a script which parses the URL and
produces the requested pages. It works fine with browsers, but when I try to
validate a page I get a 404 error - which bewilders me, because I thought
Apache does the redirection internally without sending the error message to
the client. Why is that, and is there a way to use the validator for such pages
(other than validating by uploading a file)?

Bartek Górny
--
#!/usr/bin/env python
print 'sygnatura'
Jul 23 '05 #1
15 Replies


Taki Jeden <ba*********@interia.pl> wrote:

> Does anybody know why w3c validator can not get pages that use 404
> htaccess redirection?

There is no such thing as 404 redirection. HTTP code 404 means that the
requested resource is not available. It would be incorrect for a client to
treat it otherwise.

> I set up two web sites so that clients request
> non-existent urls, but htaccess redirects calls to a script which
> parses the url and produces requested pages.

Then don't do that. (We don't really know what you do in detail, but it is
pretty clear that it's wrong.)

> It works fine with browsers,

Browsers are known to have bugs.

> but when I try to validate the page I get a 404 error - which
> bewilders me, because I thought Apache does the redirection internally
> without sending the error message to the client.


You did? Well, we have even less odds of knowing what's going on, since you
don't reveal the URL, still less show what you have in the .htaccess file.

Have you realized that search engines probably behave correctly, i.e.
remove a page from their indexes if they get a 404 response?

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #2

Jukka K. Korpela wrote:

> Taki Jeden <ba*********@interia.pl> wrote:
>
>> Does anybody know why w3c validator can not get pages that use 404
>> htaccess redirection?
>
> There is no such thing as 404 redirection. HTTP code 404 means that the
> requested resource is not available. It would be incorrect for a client to
> treat it otherwise.
>
>> I set up two web sites so that clients request
>> non-existent urls, but htaccess redirects calls to a script which
>> parses the url and produces requested pages.
>
> Then don't do that. (We don't really know what you do in detail, but it is
> pretty clear that it's wrong.)

You don't know what I'm doing, but you know it's wrong? Cool.

My .htaccess file is like this:

ErrorDocument 404 /script.php

Every time you request a page that is not there, your request is handled by
the 'script.php'. It works - check out www.safetycam.pl, for example.

>> It works fine with browsers,
>
> Browsers are known to have bugs.

Right - so please call the Mozilla and Firefox guys, tell them their
browsers have a serious bug :) (and Microsoft, of course)

>> but when I try to validate the page I get a 404 error - which
>> bewilders me, because I thought Apache does the redirection internally
>> without sending the error message to the client.
>
> You did? Well, we have even less odds of knowing what's going on, since
> you don't reveal the URL, still less show what you have in the .htaccess
> file.


Yes I did, but I think I know what my mistake was: my guess is that the 404
is sent to the client together with the ErrorDocument information, then the
client sends another request asking for the ErrorDocument, attaching the
initial request - while I initially thought that it is Apache itself that
uses the directive to handle the request. If my guess is right, that would
be VERY bad news for me :(

Bartek

--
#!/usr/bin/env python
print 'sygnatura'
Jul 23 '05 #3

In our last episode,
<cv**********@atlantis.news.tpi.pl>,
the lovely and talented Taki Jeden
broadcast on comp.infosystems.www.authoring.html:
> Jukka K. Korpela wrote:
>
>> Taki Jeden <ba*********@interia.pl> wrote:
>>
>>> Does anybody know why w3c validator can not get pages that use 404
>>> htaccess redirection?
>>
>> There is no such thing as 404 redirection. HTTP code 404 means that the
>> requested resource is not available. It would be incorrect for a client to
>> treat it otherwise.
>>
>>> I set up two web sites so that clients request
>>> non-existent urls, but htaccess redirects calls to a script which
>>> parses the url and produces requested pages.
>>
>> Then don't do that. (We don't really know what you do in detail, but it is
>> pretty clear that it's wrong.)
>
> You don't know what I'm doing, but you know it's wrong? Cool.
>
> My .htaccess file is like this:
>
> ErrorDocument 404 /script.php
>
> Every time you request a page that is not there, your request is handled by
> the 'script.php'. It works - check out www.safetycam.pl, for example.

Little wonder you are having problems. 404 means "Not found."
The ErrorDocument directive allows you to present the user with
an attractive human-readable page for the "Not found" message.
It is still an Error which is why it is called *ErrorDocument*
and the message is "Not found."

Of course people who do customize their 404 document often
do validate it, but they do so *before* pointing ErrorDocument
at it.

>>> It works fine with browsers,
>>
>> Browsers are known to have bugs.
>
> Right - so please call the Mozilla and Firefox guys, tell them their
> browsers have a serious bug :) (and Microsoft, of course)

The bug, of course, is in your brain. Whatever possessed you to
think that 404 was anything other than an error?

>>> but when I try to validate the page I get a 404 error - which
>>> bewilders me, because I thought Apache does the redirection internally
>>> without sending the error message to the client.
>>
>> You did? Well, we have even less odds of knowing what's going on, since
>> you don't reveal the URL, still less show what you have in the .htaccess
>> file.
>
> Yes I did, but I think I know what my mistake was: my guess is that the 404
> is sent to the client together with the ErrorDocument information, then the
> client sends another request asking for ErrorDocument, attaching the
> initial request - while I initially thought that it is Apache itself that
> uses the directive to handle the request. If my guess is right, that would
> be VERY bad news for me :(


What happens is Apache sends 404 and then the document. The 404
causes many agents, especially bots such as search spiders and,
as you have discovered, validation bots to sign off on the spot.
The idea of the ErrorDocument after all is to provide the human
with a human-readable clue as to what has happened - something
that is pointless if a human is not operating the client.

There is no point in Apache sending 404 to itself. It knows the
document isn't there. It does not substitute the ErrorDocument
for a non-existing document, but only substitutes it for its own
built-in rather plain not-found-error document.

--
Lars Eighner ei*****@io.com http://www.io.com/~eighner/
Save the Rainforest! Eat a vegetarian!
Jul 23 '05 #4
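
To make the point in the post above concrete, here is a minimal Python sketch. It is not Apache: it is a toy server that, like a local ErrorDocument, sends a perfectly usable HTML body together with a 404 status line, plus a client that goes by the status the way a validator or spider would. The handler class, the `fetch` and `demo` helpers, and the loopback address are all made up for the illustration.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Generated page</body></html>"


class ErrorDocumentHandler(BaseHTTPRequestHandler):
    """Toy stand-in for a local ErrorDocument: real content, 404 status."""

    def do_GET(self):
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(url):
    """Return (status, body) as a status-aware client sees them."""
    # An empty ProxyHandler keeps the request on the loopback interface.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # urllib reports 4xx/5xx as failures; the body is still attached.
        return error.code, error.read()


def demo():
    """Start the toy server on a free port, fetch one page, shut down."""
    server = HTTPServer(("127.0.0.1", 0), ErrorDocumentHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch(f"http://127.0.0.1:{port}/no-such-page")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = demo()
    print(status, len(body))
```

The body arrives intact, but the client only gets it through the error path. A browser renders the body anyway, which is why the site appears to work there while stricter agents give up.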



Lars Eighner wrote:

> What happens is Apache sends 404 and then the document.


Since the URI on the ErrorDocument line is local.

In the case of a full URI, however, Apache issues a 302. The OP might
therefore want to try something like:

ErrorDocument 404 http://www.safetycam.pl/script.php

Thor

--
http://www.anta.net/OH2GDF
Jul 23 '05 #5

Thor Kottelin <th**@anta.net> wrote in news:42***************@anta.net:


> Lars Eighner wrote:
>
>> What happens is Apache sends 404 and then the document.
>
> Since the URI on the ErrorDocument line is local.
>
> In the case of a full URI, however, Apache issues a 302. The OP might
> therefore want to try something like:
>
> ErrorDocument 404 http://www.safetycam.pl/script.php


Usually not a good idea, as it obfuscates the true HTTP
response code, and it also means that "script.php" will
not have access to the 'REDIRECT_*' Apache environment
variables.

--
Dave Patton
Canadian Coordinator, Degree Confluence Project
http://www.confluence.org/
My website: http://members.shaw.ca/davepatton/
Jul 23 '05 #6

Taki Jeden <ba*********@interia.pl> wrote in
news:cv**********@atlantis.news.tpi.pl:
> Jukka K. Korpela wrote:
>
>> Taki Jeden <ba*********@interia.pl> wrote:
>>
>>> Does anybody know why w3c validator can not get pages that use 404
>>> htaccess redirection?
>>
>> There is no such thing as 404 redirection. HTTP code 404 means that
>> the requested resource is not available. It would be incorrect for a
>> client to treat it otherwise.
>>
>>> I set up two web sites so that clients request
>>> non-existent urls, but htaccess redirects calls to a script which
>>> parses the url and produces requested pages.
>>
>> Then don't do that. (We don't really know what you do in detail, but
>> it is pretty clear that it's wrong.)
>
> You don't know what I'm doing, but you know it's wrong? Cool.

From your description, yes, what you are doing is wrong,
and people only have the information you provided, so
when they see something wrong, it's not surprising that
they tell you it is wrong ;-)

http://httpd.apache.org/docs/mod/cor...#errordocument
http://httpd.apache.org/docs/custom-error.html
http://www.php.net/manual/en/function.header.php

> My .htaccess file is like this:
>
> ErrorDocument 404 /script.php

Nothing wrong with doing that. When a non-existent
document is requested, Apache will set the contents
of some 'REDIRECT_*' environment variables, and then
PHP will parse "script.php".

> Every time you request a page that is not there, your request is
> handled by the 'script.php'. It works

Perhaps, for some definitions of "works", but not properly,
otherwise you wouldn't be here asking about your problem :-)

>>> but when I try to validate the page I get a 404 error - which
>>> bewilders me, because I thought Apache does the redirection
>>> internally without sending the error message to the client.
>>
>> You did? Well, we have even less odds of knowing what's going on,
>> since you don't reveal the URL, still less show what you have in the
>> .htaccess file.
>
> Yes I did, but I think I know what my mistake was: my guess is that
> the 404 is sent to the client together with the ErrorDocument
> information, then the client sends another request asking for
> ErrorDocument, attaching the initial request


Well, why bother guessing? The Apache behaviour is
documented at the above URLs. Moreover, if you use
the Firefox LiveHTTPHeaders extension you would see
that your guess is wrong.

If you are using "script.php" as a custom error
document, that's fine. If you are serving 'content'
via script.php, then that's not the correct way
to do things.

For example, this URL:
http://www.confluence.org/fred.php
generates an error page, along with a 404 response
header, because fred.php doesn't exist.

The same script also handles this case:
http://www.confluence.org/calendar/index.html
Because index.php exists, but not index.html,
the error document script issues, via header(),
an "HTTP/1.x 301 Moved Permanently" response,
and specifies the redirect location:
http://www.confluence.org/calendar/index.php
which the browser (and the validator) handles properly.

--
Dave Patton
Canadian Coordinator, Degree Confluence Project
http://www.confluence.org/
My website: http://members.shaw.ca/davepatton/
Jul 23 '05 #7
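
The decision described in the post above can be sketched in a few lines of Python. This is only a model of the logic, not the actual script behind confluence.org: the function name, the `existing_paths` set, and the use of a path rather than a full URL in `Location` are assumptions for the sake of the example.

```python
import posixpath


def error_document_response(requested_path, existing_paths):
    """Decide what an error-document script could send for a missing URL.

    Returns (status, headers). If the missing URL ends in .html and a
    sibling .php file exists, answer with a permanent redirect to it;
    otherwise report an honest 404.
    """
    root, ext = posixpath.splitext(requested_path)
    if ext == ".html":
        candidate = root + ".php"
        if candidate in existing_paths:
            return 301, {"Location": candidate}
    return 404, {}


if __name__ == "__main__":
    existing = {"/calendar/index.php"}  # hypothetical document tree
    print(error_document_response("/calendar/index.html", existing))
    print(error_document_response("/fred.php", existing))
```

The point of the design is that the status code always tells the truth: the client hears either "moved, look over there" or "not found", never a 404 wrapped around real content.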

"Taki Jeden" wrote in comp.infosystems.www.authoring.html:

> Does anybody know why w3c validator can not get pages that use 404 htaccess
> redirection?


Because your server returns a "404 not found" status. The fact that
it also returns a lovely page is beside the point: the 404 means
that the _requested_ page is not there.

--

Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
Jul 23 '05 #8

On Sun, 27 Feb 2005 18:47:06 +0100, Taki Jeden
<ba*********@interia.pl> wrote:

> Hello everybody
>
> Does anybody know why w3c validator can not get pages that use 404 htaccess
> redirection? I set up two web sites so that clients request non-existent
> urls, but htaccess redirects calls to a script which parses the url and
> produces requested pages. It works fine with browsers, but when I try to
> validate the page I get a 404 error - which bewilders me, because I thought
> Apache does the redirection internally without sending the error message to
> the client. Why is that, and is there a way to use validator for such pages
> (other than validating by uploading file)?
>
> Bartek Górny


A permanent redirect is a 301. A temporary redirect is a 302. A 404
means page not found. Sounds like you have your codes muddled.

BB
--
www.kruse.co.uk/ SE*@kruse.demon.co.uk
Affordable SEO!
--
Jul 23 '05 #9

Lars Eighner wrote:

> The bug, of course, is in your brain. Whatever possessed you to
> think that 404 was anything other than an error?

Hang on - exception is an error too, but try/except is often used for
program flow control. Nothing wrong with my brain.

> What happens is Apache sends 404 and then the document. The 404
> causes many agents, especially bots such as search spiders and,
> as you have discovered, validation bots to sign off on the spot.
> The idea of the ErrorDocument after all is to provide the human
> with a human-readable clue as to what has happened - something
> that is pointless if a human is not operating the client.
>
> There is no point in Apache sending 404 to itself. It knows the
> document isn't there. It does not substitute the ErrorDocument
> for a non-existing document, but only substitutes it for its own
> built-in rather plain not-found-error document.

Ok, this is the answer to my question. Thanks.

Time to mod_rewrite...

BG


--
#!/usr/bin/env python
print 'sygnatura'
Jul 23 '05 #10


In case anybody is interested, here is the way to do it (I admit that my
initial idea was not very clever...).

This should be done with mod_rewrite, by putting the following in the
VirtualHost section:

<IfModule mod_rewrite.c>
RewriteEngine on
# leave requests for images under /graph alone ("-" means no substitution)
RewriteRule ^/graph - [L]
# alternatively, leave static file types alone, matched by extension
RewriteRule \.(jpg|gif|swf|pps|ppt|doc|zip|png|pdf)$ - [L]
# keep the webalizer stats accessible the standard way
RewriteRule ^/statistics - [L]
# ...and internally rewrite everything else to the main script
RewriteRule ^/ /mainscript.php [L]
</IfModule>

This achieves the same goal - all requests are handled by mainscript, while
$_SERVER['REQUEST_URI'] remains intact, so we can parse it and do whatever
we want. But no error is sent - the rewrite is transparent to the client. And I
didn't even have to rewrite a single line of my scripts :)

Bartek Górny

Taki Jeden wrote:

> Hello everybody
>
> Does anybody know why w3c validator can not get pages that use 404
> htaccess redirection? I set up two web sites so that clients request
> non-existent urls, but htaccess redirects calls to a script which parses
> the url and produces requested pages. It works fine with browsers, but
> when I try to validate the page I get a 404 error - which bewilders me,
> because I thought Apache does the redirection internally without sending
> the error message to the client. Why is that, and is there a way to use
> validator for such pages (other than validating by uploading file)?
>
> Bartek Górny


--
#!/usr/bin/env python
print 'sygnatura'
Jul 23 '05 #11
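
For readers who want to check which URLs end up in the main script, the rule ordering above can be modelled in plain Python. This is a sketch of the intent only, not of mod_rewrite itself: the `rewrite` function is hypothetical, and anchoring the patterns at the start or end of the path is an assumption about what the rules are meant to match.

```python
import re

# Requests matching any of these are served as-is (images, downloads, stats).
PASS_THROUGH = [
    re.compile(r"^/graph"),
    re.compile(r"\.(jpg|gif|swf|pps|ppt|doc|zip|png|pdf)$"),
    re.compile(r"^/statistics"),
]


def rewrite(path):
    """Return the path that would actually be served for a request path."""
    for pattern in PASS_THROUGH:
        if pattern.search(path):
            return path  # first match wins, nothing is rewritten
    # Everything else goes to the front script; the client never sees this.
    return "/mainscript.php"


if __name__ == "__main__":
    for sample in ("/graph/logo.png", "/docs/report.pdf",
                   "/statistics/index.html", "/products/42"):
        print(sample, "->", rewrite(sample))
```

Because the substitution happens inside the server, the client still sees the URL it asked for and a normal status code, which is the difference from the ErrorDocument approach.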


Actually, it is even easier than that - put

header("HTTP/1.1 200 OK");

as the first line of the handling script. It is sent instead of the 404
header, and everything goes fine.

BG

--
#!/usr/bin/env python
print 'sygnatura'
Jul 23 '05 #12

Taki Jeden <ba*********@interia.pl> wrote in
news:d0**********@nemesis.news.tpi.pl:

> Actually, it is even easier than that - put
>
> header("HTTP/1.1 200 OK");
>
> as the first line of the handling script. It is sent instead of the 404
> header, and everything goes fine.


You are missing the point. If a requested resource
doesn't exist, you shouldn't, in most cases, be
obfuscating that fact by server-side redirections
or 'faking' the HTTP response code.

In addition, creating a new message thread that
refers to "something else"(i.e. 'yet another way',
'it is even easier than that') is not the proper
way to communicate on Usenet.

--
Dave Patton
Canadian Coordinator, Degree Confluence Project
http://www.confluence.org/
My website: http://members.shaw.ca/davepatton/
Jul 23 '05 #13

Dave Patton wrote:

> Taki Jeden <ba*********@interia.pl> wrote in
> news:d0**********@nemesis.news.tpi.pl:
>
>> Actually, it is even easier than that - put
>>
>> header("HTTP/1.1 200 OK");
>>
>> as the first line of the handling script. It is sent instead of the 404
>> header, and everything goes fine.
>
> You are missing the point. If a requested resource
> doesn't exist, you shouldn't, in most cases, be
> obfuscating that fact by server-side redirections
> or 'faking' the HTTP response code.


I found this advice in Chris Beasley's article:

http://www.sitepoint.com/article/sea...riendly-urls/2

Why shouldn't I do that? I'm just telling Apache how to handle this
situation - in effect, I'm sort of 'catching the exception'. Instead of 404
header and error page, Apache sends 200 header and desired page - sending
error header would be wrong, because the requested resource does exist. Or,
if my handler script can't figure out what to send, it can send 404 header
as well.

BG

--
#!/usr/bin/env python
print 'sygnatura'
Jul 23 '05 #14
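
The "catch the exception" idea from the post above, where the handler answers 200 when it can build the page and 404 when it cannot, might look roughly like this in Python. The `PAGES` table and the `handle` function are hypothetical stand-ins for whatever the real script looks up; nothing here is taken from the poster's actual code.

```python
from urllib.parse import urlsplit

# Hypothetical content table standing in for the real script's lookup.
PAGES = {
    "/cameras/warsaw": "<h1>Cameras in Warsaw</h1>",
}


def handle(request_uri, pages=PAGES):
    """Return (status, body) for a request the web server handed over."""
    path = urlsplit(request_uri).path  # ignore any query string
    body = pages.get(path)
    if body is None:
        # The script cannot produce this page either, so say so honestly.
        return 404, "<h1>Not Found</h1>"
    return 200, body


if __name__ == "__main__":
    print(handle("/cameras/warsaw?lang=pl"))
    print(handle("/no/such/page"))
```

Whichever way the request reaches the script, the status should describe the outcome for the requested URL: 200 only when real content is returned, 404 when there is none.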



Dave Patton wrote:
> creating a new message thread that
> refers to "something else"(i.e. 'yet another way',
> 'it is even easier than that') is not the proper
> way to communicate on Usenet.


If <news:d0**********@nemesis.news.tpi.pl> appeared to you as a new thread,
your newsreader is either broken (unable to understand References lines) or
misconfigured.

Thor

--
http://www.anta.net/OH2GDF
Jul 23 '05 #15

Thor Kottelin <th**@anta.net> wrote in news:42***************@anta.net:


> Dave Patton wrote:
>
>> creating a new message thread that
>> refers to "something else"(i.e. 'yet another way',
>> 'it is even easier than that') is not the proper
>> way to communicate on Usenet.
>
> If <news:d0**********@nemesis.news.tpi.pl> appeared to you as a new
> thread, your newsreader is either broken (unable to understand
> References lines) or misconfigured.


Or, I'm 'misconfigured' ;-)
Because the subject line was changed, and I routinely do
a 'catchup', it appeared that the posting I replied to was
the start of a new thread. Now that I reloaded older
articles, I see it was posted as part of the original thread.
My bad.

--
Dave Patton
Canadian Coordinator, Degree Confluence Project
http://www.confluence.org/
My website: http://members.shaw.ca/davepatton/
Jul 23 '05 #16

This discussion thread is closed
