Unrecognized file format problem with valid html, please help!

Jeff Parker

I have a web application for the real estate industry. Here is
one of the sites using said application.

http://www.wellsre.com

As you can see if you click this link here
http://validator.w3.org/check?uri=ht...ww.wellsre.com

This site validates just fine using the W3C validator.

The problem that I have is that Google does not recognise the file
format of this site.

check this link here
http://www.google.com/search?sourcei...TF-8&q=wellsre
and this link here

http://64.233.167.104/search?q=cache...+wellsre&hl=en

I am not sure why this page, which validates just fine with the W3C
validator, is not recognised and spidered properly by Google.

As you can imagine, my clients are less than happy about this and I am
at a loss for what to do about it.

If anyone has any ideas for me they would be much appreciated.

thank you

Jeff Parker

Jul 23 '05 #1

C A Upsdell

"Jeff Parker" <pu********@hotmail.com> wrote in message
news:32**************************@posting.google.c om...

I have a web application that for the real estate industry. Here is
one of the sites using said application.

http://www.wellsre.com

You have the HTML tag on the same line as the DOCTYPE. Most unusual. Try
putting them on separate lines.

Jul 23 '05 #2

Lars Eighner

In our last episode,
<32**************************@posting.google.com >,
the lovely and talented Jeff Parker
broadcast on comp.infosystems.www.authoring.html:

I have a web application that for the real estate industry. Here is
one of the sites using said application. http://www.wellsre.com

Where is the rest of it? That is, what is the actual filename
and why is the trailing slash missing?

Adding a meta http-equiv with the content type might help. Are
you certain the server is sending the correct content type for
this file?

--
Lars Eighner -finger for geek code- ei*****@io.com http://www.io.com/~eighner/
If it wasn't for muscle spasms, I wouldn't get any exercise at all.

Jul 23 '05 #3

Leif K-Brooks

Jeff Parker wrote:

I have a web application that for the real estate industry. Here is
one of the sites using said application.

http://www.wellsre.com

the problem that i have is that google does not recognise the file
format of this site

The "HTML" in your content-type is all-caps. Try fixing that.

[leif@localhost leif]$ HEAD http://www.wellsre.com
200 OK
Cache-Control: no-cache
Connection: close
Date: Wed, 10 Nov 2004 04:22:16 GMT
Server: Microsoft-IIS/5.0
Content-Length: 20689
Content-Type: text/HTML; Charset=ISO-8859-1
Client-Date: Wed, 10 Nov 2004 04:25:47 GMT
Client-Peer: 66.232.22.13:80
Client-Response-Num: 1
Set-Cookie: ASPSESSIONIDQCASRDCB=PJBHCLKCPEEHHDOKJBIIIDPI; path=/
X-Powered-By: ASP.NET
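
For what it's worth, RFC 2616 (section 3.7) says the media type, subtype and parameter names are case-insensitive, so the header above is unusual but not invalid. Below is a minimal sketch of how a client could read it without tripping over case, using Python's standard email.message module. The helper name parse_content_type is made up for illustration, and none of this says anything about what Google's spider actually does.

```python
from email.message import Message


def parse_content_type(header_value):
    """Return (media_type, charset), both lower-cased.

    parse_content_type is a hypothetical helper for illustration only.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() lower-cases "type/subtype"; get_content_charset()
    # finds the charset parameter case-insensitively and lower-cases its value.
    return msg.get_content_type(), msg.get_content_charset()


if __name__ == "__main__":
    # The header as sent by the server, and the conventional lowercase form.
    print(parse_content_type("text/HTML; Charset=ISO-8859-1"))
    print(parse_content_type("text/html; charset=ISO-8859-1"))
```

Both calls should give the same pair, which is the point: a consumer that compares the raw string instead would treat the two headers as different.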

Jul 23 '05 #4

Lachlan Hunt

Jeff Parker wrote:

I have a web application that for the real estate industry. Here is
one of the sites using said application.

http://www.wellsre.com

the problem that i have is that google does not recognise the file
format of this site

The HTTP response headers [1] contain:

Content-Type: text/HTML; Charset=ISO-8859-1

I suspect that may be the problem. I've never seen the content type
fields written in uppercase, they're usually written in lowercase. I
don't know if it's invalid or not to have it in uppercase (according to
the relevant RFCs: RFC 2616 (HTTP1.1), 2045 (MIME) or 2046 (Media
Types)), but perhaps google doesn't recognise it like that. Fix your
server to send:

Content-Type: text/html; charset=ISO-8859-1

Also, even though it is valid HTML, you should look into replacing all
those layout tables and presentational elements/attributes with CSS, and
use a DOCTYPE that doesn't trigger quirks mode [2] in browsers. You
should also use instead of to create separate paragraphs.

eg. Write this:
paragraph 1 ...
paragraph 2 ...

instead of:
paragraph 1 ...
 
paragraph 2 ...

[1] http://cgi.w3.org/cgi-bin/headers?ur...w.wellsre.com/
[2] http://www.mozilla.org/docs/web-deve.../doctypes.html

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://SpreadFirefox.com/ Igniting the Web

Jul 23 '05 #5

Neal

On 9 Nov 2004 17:55:41 -0800, Jeff Parker <pu********@hotmail.com> wrote:

I have a web application that for the real estate industry. Here is
one of the sites using said application.

http://www.wellsre.com

As you can see if you click this link here
http://validator.w3.org/check?uri=ht...ww.wellsre.com

This site validates just fine using the w3 validator

the problem that i have is that google does not recognise the file
format of this site

check this link here
http://www.google.com/search?sourcei...TF-8&q=wellsre

See http://www.google.com/search?q=%22We...wellsre.com%22

Jul 23 '05 #6

Neal

On Tue, 09 Nov 2004 23:36:14 -0500, Neal <ne*****@yahoo.com> wrote:

On 9 Nov 2004 17:55:41 -0800, Jeff Parker <pu********@hotmail.com> wrote:
I have a web application that for the real estate industry. Here is
one of the sites using said application.

http://www.wellsre.com

As you can see if you click this link here
http://validator.w3.org/check?uri=ht...ww.wellsre.com

This site validates just fine using the w3 validator

the problem that i have is that google does not recognise the file
format of this site

Oh, never mind, I see now.

What is the file format? You never told us.

Jul 23 '05 #7

Mark Parnell

On Wed, 10 Nov 2004 04:31:26 GMT, Lachlan Hunt <sp***********@gmail.com>
declared in comp.infosystems.www.authoring.html:

paragraph 1 ...
paragraph 2 ...

That would be:

paragraph 1 ...
paragraph 2 ...

--
Mark Parnell
http://www.clarkecomputers.com.au

Jul 23 '05 #8

Neal

On 9 Nov 2004 17:55:41 -0800, Jeff Parker <pu********@hotmail.com> wrote:

http://www.wellsre.com

Possibly unrelated but worth mentioning - in Opera 7.23 the page appears
two times if I reload. One below the other.

Bizarre.

Jul 23 '05 #9

Stan Brown

"Lars Eighner" <ei*****@io.com> wrote in
comp.infosystems.www.authoring.html:

<32**************************@posting.google.co m> Jeff Parker:
http://www.wellsre.com

Where is the rest of it? That is, what is the actual filename

You and your browser don't need to know that.

and why is the trailing slash missing?

It isn't.

P.S. I'm a big fan of proper attributions, but four lines does seem
like a superabundance of riches.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/

Jul 23 '05 #10

Lars Eighner

In our last episode, <MP************************@news.odyssey.net>,
the lovely and talented Stan Brown broadcast on
comp.infosystems.www.authoring.html:

"Lars Eighner" <ei*****@io.com> wrote in
comp.infosystems.www.authoring.html:
<32**************************@posting.google.com > Jeff Parker:
http://www.wellsre.com

Where is the rest of it? That is, what is the actual filename

You and your browser don't need to know that.

I'm not the one begging for help here.

and why is the trailing slash missing?

It isn't.

Oh, it is one of those *invisible* trailing slashes.

P.S. I'm a big fan of proper attributions, but four lines does seem
like a superabundance of riches.

--
Lars Eighner -finger for geek code- ei*****@io.com http://www.io.com/~eighner/
If it wasn't for muscle spasms, I wouldn't get any exercise at all.

Jul 23 '05 #11

Neal

Lars Eighner wrote:

Stan Brown broadcast
It isn't.

Oh, it is one of those *invisible* trailing slashes.

AFAIK the trailing slash is not needed at the end of a domain. It is at
the end of a directory.

Jul 23 '05 #12

Brian

Lars Eighner wrote:

Jeff Parker :
I have a web application that for the real estate industry.

http://www.wellsre.com

Where is the rest of it? That is, what is the actual filename

There is no filename on the client end, only a url and a resource,
hopefully with a mime type.
and why is the trailing slash missing?

The trailing slash on that url is optional.

--
Brian (remove "invalid" to email me)

Jul 23 '05 #13

Lars Eighner

In our last episode,
<op**************@news.individual.net>,
the lovely and talented Neal
broadcast on comp.infosystems.www.authoring.html:

Lars Eighner wrote:
Stan Brown broadcast
It isn't.

Oh, it is one of those *invisible* trailing slashes.

AFAIK the trailing slash is not needed at the end of a domain. It is at
the end of a directory.

It is my understanding that, at least with some combinations of
browsers and servers, an extra http transaction is required if
the trailing slash is omitted. Moreover, from googling on
trailing slash domain, I find several reports of google handling
sites somewhat differently according to whether the trailing slash
is included.

The question isn't whether your browser or my browser can get the
page. Obviously most - if not all - modern browsers can bring up the
page by hook or by crook. The question was about some apparently
mysterious google behavior, but whether a quirk in google's spider or
in google's subsequent processing is involved I don't know.

--
Lars Eighner -finger for geek code- ei*****@io.com http://www.io.com/~eighner/
If it wasn't for muscle spasms, I wouldn't get any exercise at all.

Jul 23 '05 #14

Neal

On Wed, 10 Nov 2004 17:25:57 -0600, Lars Eighner <ei*****@io.com> wrote:

The question isn't whether your browser or my browser can get the
page. Obviously most - if not all - modern browsers can bring up the
page by hook or by crook. The question was about some apparently
mysterious google behavior, but whether a quirk in google's spider or
in google's subsequent processing is involved I don't know.

It shouldn't be related to the slash. Likely his filetype is being
mis-served or is otherwise screwed up.

Jul 23 '05 #15

Brian

Lars Eighner wrote:

It is my understanding that, at least with some combinations of
browsers and servers, an extra http transaction is required if the
trailing slash is omitted.

These 2 urls are equivalent:

http://www.example.com
http://www.example.com/

Both of them point to the root of the http server at www.example.com.

These 2 are not:
http://www.example.com/foo
http://www.example.com/foo/

The reason the last 2 are not equivalent is because they point to 2
different urls. It is entirely possible to have one resource at /foo and
another at /foo/ on the same server. On Apache, if there is a directory
name /foo/ in the public document part of the server, and a client
requests /foo, then, barring any special server configuration, the
server will redirect the client to /foo/. Perhaps that's what you were
thinking of.
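
The equivalence described above can be sketched in Python with the standard urllib.parse module. normalize_root is a made-up helper name. It only fills in an empty path with "/" and leaves every other path alone, since /foo and /foo/ may be different resources.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_root(url):
    """Replace an empty path with "/"; leave any non-empty path untouched.

    normalize_root is a hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for url in (
        "http://www.example.com",
        "http://www.example.com/",
        "http://www.example.com/foo",
        "http://www.example.com/foo/",
    ):
        print(url, "->", normalize_root(url))
```

The first two urls come out identical; the last two stay distinct, because only the server can say whether they name the same resource.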

--
Brian (remove "invalid" to email me)

Jul 23 '05 #16

Lars Eighner

In our last episode,
<yn*********************@bgtnsc04-news.ops.worldnet.att.net>,
the lovely and talented Brian
broadcast on comp.infosystems.www.authoring.html:

Lars Eighner wrote:
Jeff Parker :
I have a web application that for the real estate industry.

http://www.wellsre.com

Where is the rest of it? That is, what is the actual filename

There is no filename on the client end, only a url and a resource,
hopefully with a mime type.

But it is not at all clear this is a client-side problem. It
certainly could be: google's spider could be doing something
very peculiar. But it could be a server-side problem. Therefore
it would be very useful to know as much about what is going on
on the server side as possible.

--
Lars Eighner -finger for geek code- ei*****@io.com http://www.io.com/~eighner/
If it wasn't for muscle spasms, I wouldn't get any exercise at all.

Jul 23 '05 #17

Stan Brown

"Lars Eighner" <ei*****@io.com> wrote in
comp.infosystems.www.authoring.html:

In our last episode, <MP************************@news.odyssey.net>,
the lovely and talented Stan Brown broadcast on
comp.infosystems.www.authoring.html:
"Lars Eighner" <ei*****@io.com> wrote in
comp.infosystems.www.authoring.html:

<32**************************@posting.google.co m> Jeff Parker:
http://www.wellsre.com

Where is the rest of it? That is, what is the actual filename
You and your browser don't need to know that.

I'm not the one begging for help here.

The OP and the OP's browser don't need to know that either. No
client needs to know that. There may not even be a file; that's the
business of the server.

and why is the trailing slash missing?
It isn't.

Oh, it is one of those *invisible* trailing slashes.

??

There is no need for a slash after the host name. This has been
discussed here extensively in the past, and it's easy enough to look
up:

"An HTTP URL takes the form:
http://<host>:<port>/<path>?<searchpart>
where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 80. No user name or password is
allowed. <path> is an HTTP selector, and <searchpart> is a query
string. The <path> is optional, as is the <searchpart> and its
preceding "?". If neither <path> nor <searchpart> is present, the
"/" may also be omitted."

<http://www.cse.ohio-state.edu/cs/Services/rfc/rfc-text/rfc1738.txt>

I believe RFC 1738 (Dec 1994) was the first official spec for URLs;
certainly there have been elaborations since then but this should
show that the trailing slash on host name has never been a
requirement.
P.S. I'm a big fan of proper attributions, but four lines does seem
like a superabundance of riches.

Well, you're down to three lines. It's a step in the right
direction.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/

Jul 23 '05 #18

Lars Eighner

In our last episode, <MP************************@news.odyssey.net>,
the lovely and talented Stan Brown broadcast on
comp.infosystems.www.authoring.html:

"Lars Eighner" <ei*****@io.com> wrote in
comp.infosystems.www.authoring.html:
In our last episode, <MP************************@news.odyssey.net>,
the lovely and talented Stan Brown broadcast on
comp.infosystems.www.authoring.html:
"Lars Eighner" <ei*****@io.com> wrote in
comp.infosystems.www.authoring.html:
<32**************************@posting.google.c om> Jeff Parker:
> http://www.wellsre.com

Where is the rest of it? That is, what is the actual filename

You and your browser don't need to know that.

I'm not the one begging for help here.

The OP and the OP's browser don't need to know that either. No
client needs to know that. There may not even be a file; that's the
business of the server.

It is also the business of the server to send the right
content type. Somehow, google isn't getting the content
type right. Maybe the problem is google's spider. Or
maybe the problem is with the server. In any event,
there are good reasons anyone who really wanted to make
useful suggestions about the problem would need to know
where the document is coming from.

I notice you haven't made any suggestions at all about
the problem: useful, lame, obvious, or esoteric. I suggest
that could be because you have no interest at all in being
helpful and no intellectual curiosity about the problem.

and why is the trailing slash missing?

It isn't.

Oh, it is one of those *invisible* trailing slashes.

??

There is no need for a slash after the host name. This has been
discussed here extensively in the past, and it's easy enough to look
up:

What I did look up was 'trailing slash domain' on google where
I found numerous references to google treating pages differently
according to whether there was a trailing slash on the domain.
I consider it possible that all of those references were from
people who were mistaken, but I also think it possible that
google by accident or design does do something different in
those cases. If so, then perhaps google doesn't operate according
to any number of RFCs, but the person with the problem doesn't
care about the RFCs. He wants his page to show up properly on
google.

P.S. I'm a big fan of proper attributions, but four lines does seem
like a superabundance of riches.

Well, you're down to three lines. It's a step in the right
direction.

The difference being a \032 instead of a \n

--
Lars Eighner -finger for geek code- ei*****@io.com http://www.io.com/~eighner/
If it wasn't for muscle spasms, I wouldn't get any exercise at all.

Jul 23 '05 #19

Alan J. Flavell

On Wed, 10 Nov 2004, Lars Eighner wrote:

In our last episode, <MP************************@news.odyssey.net>,
the lovely and talented Stan Brown broadcast on
comp.infosystems.www.authoring.html:
There is no need for a slash after the host name. This has been
discussed here extensively in the past, and it's easy enough to look
up:

What I did look up was 'trailing slash domain'

Except that this is *not* the "trailing slash" referred to in those
discussions.

After the hostname (and optional :portnumber) comes a slash which
separates the host part from the local part of the URL.

When the local part is empty, this separating slash is optional.

That slash *looks* to you like a trailing slash: but it isn't, because
it has the localpart of the URL on the right of it. It just so
happens that, in this specific case, the localpart is empty.

on google where I found numerous references to google treating pages
differently according to whether there was a trailing slash on the
domain.

Correct. When the URL's local part needs to include a trailing slash,
that slash is meaningful. When it is omitted, the server might return
some quite different resource; in many practical cases what it will do
is to send a redirection to a corrected URL with the trailing slash
added (but this behaviour is only a widely accepted convention - it
isn't in any way fundamental). The client then has to retrieve that
corrected URL in an extra transaction.
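
A toy model of that convention, in Python, may make the extra transaction easier to see. It is only a sketch: respond, directories and files are invented names, the status codes are the usual 200/301/404, and no real server is implemented this simply.

```python
def respond(path, directories, files):
    """Return (status, location) for a request path in a toy server model.

    directories holds paths that end in "/", files holds paths that do not.
    All names here are hypothetical and exist only for this illustration.
    """
    if path in files or path in directories:
        return 200, None            # served directly, one transaction
    if path + "/" in directories:
        return 301, path + "/"      # redirect to the corrected URL
    return 404, None


if __name__ == "__main__":
    directories = {"/", "/foo/"}
    files = {"/index.html"}
    for path in ("/", "/foo", "/foo/", "/bar"):
        print(path, respond(path, directories, files))
```

In this model a request for /foo costs two transactions (the 301, then the fetch of /foo/), while a request for / costs one. The slash after the host name never triggers a redirect because the client always sends at least "/" as the path.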

Jul 23 '05 #20

Dr John Stockton

JRS: In article <MP************************@news.odyssey.net>, dated
Wed, 10 Nov 2004 16:03:07, seen in news:comp.infosystems.www.authoring.
html, Stan Brown <th************@fastmail.fm> posted :

< In our last episode,
< <32**************************@posting.google.com >,
< the lovely and talented Jeff Parker
< broadcast on comp.infosystems.www.authoring.html:

P.S. I'm a big fan of proper attributions, but four lines does seem
like a superabundance of riches.

Recent USEFOR thinking is/was visible in work-in-progress
<URL:http://www.ietf.org/internet-drafts/draft-ietf-usefor-useage-00.txt>
and/or
<URL:http://www.ietf.org/internet-drafts/draft-ietf-usefor-article-13.txt>

The retro-cited attribution is
(a) not compliant
(b) puerile.

My attributions are compliant; they include optional parts.

--
© John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME ©
Web <URL:http://www.uwasa.fi/~ts/http/tsfaq.html> -> Timo Salmi: Usenet Q&A.
Web <URL:http://www.merlyn.demon.co.uk/news-use.htm> : about usage of News.
No Encoding. Quotes before replies. Snip well. Write clearly. Don't Mail News.

Jul 23 '05 #21

Brian

Lars Eighner wrote:

Stan Brown
"Lars Eighner" <ei*****@io.com> wrote:

> Jeff Parker:
>
>> http://www.wellsre.com
>
> Where is the rest of it? That is, what is the actual
> filename

The OP and the OP's browser don't need to know that either. No
client needs to know that. There may not even be a file; that's the
business of the server.

It is also the business of the server to send the right content type.

Sure, but that information is not sent via the url. The server may use
file extension to determine which mime type to send, but the client does
not know how, or even if there is such an association. You asked where
the "rest of" the url was. Stan Brown correctly noted that the url was
not missing anything.

I notice you haven't made any suggestions at all about the problem:
useful, lame, obvious, or esoteric.

This isn't a helpdesk, but a discussion forum. Answers are often
provided incidentally, but it is not a requirement to participate.

Stan Brown:
P.S. I'm a big fan of proper attributions, but four lines does
seem like a superabundance of riches.

Agreed. Please trim the attribution novel you put at the top of your
replies.

--
Brian (remove "invalid" to email me)

Jul 23 '05 #22

Brian

Lars Eighner wrote:

Brian:

Lars Eighner wrote:
Jeff Parker :

http://www.wellsre.com

Where is the rest of it? That is, what is the actual filename
There is no filename on the client end, only a url and a resource,
hopefully with a mime type.

But it is not at all clear this is a client-side problem.

Then why ask about the url? There is no information about the content
type in the url of a resource. That information can only legitimately
come from a response header.

it could be a server-side problem. Therefore it would be very useful
to know as much about what is going on on the server side as
possible.

Forgive me, but you have not been very consistent. When you ask where
the rest of the url is, that leads us to believe that you have
misunderstood something rather fundamental here.

BTW, you seem to be taking this personally. This is a discussion forum,
and one of its most valuable aspects is peer review, which is usually
swift and can be pitiless. But consider the value of that peer review.
If someone gets something wrong, others will correct them, hopefully
before the op gets misled by misinformation.

--
Brian (remove "invalid" to email me)

Jul 23 '05 #23
