Unrecognized file format problem with valid HTML, please help!

I have a web application for the real estate industry. Here is
one of the sites using that application.

http://www.wellsre.com

As you can see if you click this link here
http://validator.w3.org/check?uri=ht...ww.wellsre.com

this site validates just fine using the W3C validator.

The problem I have is that Google does not recognise the file
format of this site.

Check this link here:
http://www.google.com/search?sourcei...TF-8&q=wellsre
and this link here:

http://64.233.167.104/search?q=cache...+wellsre&hl=en

I am not sure why this page, which validates just fine with the W3C
validator, is not recognised and spidered properly by Google.

As you can imagine, my clients are less than happy about this, and I am
at a loss for what to do about it.

If anyone has any ideas for me, they would be much appreciated.

Thank you

Jeff Parker
Jul 23 '05 #1
"Jeff Parker" <pu********@hotmail.com> wrote in message
news:32**************************@posting.google.com...
I have a web application that for the real estate industry. Here is
one of the sites using said application.

http://www.wellsre.com


You have the HTML tag on the same line as the DOCTYPE. Most unusual. Try
putting them on separate lines.
Jul 23 '05 #2
In our last episode,
<32**************************@posting.google.com>,
the lovely and talented Jeff Parker
broadcast on comp.infosystems.www.authoring.html:
I have a web application that for the real estate industry. Here is
one of the sites using said application. http://www.wellsre.com


Where is the rest of it? That is, what is the actual filename
and why is the trailing slash missing?

Adding a meta http-equiv with the content type might help. Are
you certain the server is sending the correct content type for
this file?
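For reference, the element Lars describes would look like `<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">` in the document head. The Python sketch below (the class and function names are hypothetical, not part of any tool mentioned in this thread) shows how a client could read that declaration back out of the markup with the standard-library `html.parser`. Note that HTML 4.01 gives the HTTP header priority over this element when determining the character encoding, so the meta element supplements a correct server header rather than replacing it.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class MetaContentTypeFinder(HTMLParser):
    """Records the content of the first <meta http-equiv="Content-Type">."""

    def __init__(self) -> None:
        super().__init__()
        self.content_type: Optional[str] = None

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # html.parser lower-cases tag and attribute names, so "META" and
        # "HTTP-EQUIV" in the source still arrive here as "meta"/"http-equiv".
        if tag != "meta" or self.content_type is not None:
            return
        attr_map = dict(attrs)
        # Attribute values keep their original case, so compare case-insensitively.
        if (attr_map.get("http-equiv") or "").lower() == "content-type":
            self.content_type = attr_map.get("content")


def meta_content_type(markup: str) -> Optional[str]:
    """Return the Content-Type declared by a meta element, or None."""
    finder = MetaContentTypeFinder()
    finder.feed(markup)
    finder.close()
    return finder.content_type


page = '<head><META HTTP-EQUIV="Content-Type" content="text/html; charset=ISO-8859-1"></head>'
print(meta_content_type(page))  # prints: text/html; charset=ISO-8859-1
```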

--
Lars Eighner -finger for geek code- ei*****@io.com http://www.io.com/~eighner/
If it wasn't for muscle spasms, I wouldn't get any exercise at all.
Jul 23 '05 #3
Jeff Parker wrote:
I have a web application that for the real estate industry. Here is
one of the sites using said application.

http://www.wellsre.com

the problem that i have is that google does not recognise the file
format of this site


The "HTML" in your content-type is all-caps. Try fixing that.

[leif@localhost leif]$ HEAD http://www.wellsre.com
200 OK
Cache-Control: no-cache
Connection: close
Date: Wed, 10 Nov 2004 04:22:16 GMT
Server: Microsoft-IIS/5.0
Content-Length: 20689
Content-Type: text/HTML; Charset=ISO-8859-1
Client-Date: Wed, 10 Nov 2004 04:25:47 GMT
Client-Peer: 66.232.22.13:80
Client-Response-Num: 1
Set-Cookie: ASPSESSIONIDQCASRDCB=PJBHCLKCPEEHHDOKJBIIIDPI; path=/
X-Powered-By: ASP.NET
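Leif's check uses the `HEAD` command-line client. A rough Python equivalent using only the standard library's `http.client` is sketched below; the function name `head_content_type` is made up for this example. It returns the raw header value, so a mixed-case value such as `text/HTML` shows up exactly as the server sent it.

```python
import http.client
from typing import Optional, Tuple


def head_content_type(host: str, path: str = "/", port: int = 80,
                      timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send an HTTP HEAD request and return (status code, raw Content-Type)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        response.read()  # a HEAD response has no body; this just drains it
        # getheader() looks the name up case-insensitively but returns the
        # value exactly as sent, so "text/HTML" is not normalized here.
        return response.status, response.getheader("Content-Type")
    finally:
        conn.close()
```

For example, `head_content_type("www.wellsre.com")` would return the status code together with whatever Content-Type value that server sends at the time of the request.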
Jul 23 '05 #4
Jeff Parker wrote:
I have a web application that for the real estate industry. Here is
one of the sites using said application.

http://www.wellsre.com

the problem that i have is that google does not recognise the file
format of this site


The HTTP response headers [1] contain:

Content-Type: text/HTML; Charset=ISO-8859-1

I suspect that may be the problem. I've never seen the content type
fields written in uppercase, they're usually written in lowercase. I
don't know if it's invalid or not to have it in uppercase (according to
the relevant RFCs: RFC 2616 (HTTP/1.1), 2045 (MIME) or 2046 (Media
Types)), but perhaps google doesn't recognise it like that. Fix your
server to send:

Content-Type: text/html; charset=ISO-8859-1
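RFC 2045 (section 5.1) states that the type, subtype, and parameter names in a Content-Type value are not case sensitive, and RFC 2046 says the same of charset values, so `text/HTML; Charset=ISO-8859-1` is formally equivalent to the lowercase form; a client that compared the raw string exactly would still not treat them as the same. The sketch below, with hypothetical helper names, uses Python's `email.message.Message` to compare values the tolerant way and to produce the conventional spelling Lachlan suggests.

```python
from email.message import Message
from typing import Optional, Tuple


def parse_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Return (media type, charset) from a Content-Type value, both lower-cased."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_type() lower-cases "type/subtype" (and falls back to
    # "text/plain" if the value is malformed); get_content_charset() finds
    # the parameter name case-insensitively and lower-cases its value.
    return msg.get_content_type(), msg.get_content_charset()


def canonical_content_type(value: str) -> str:
    """Rebuild the value in the conventional spelling, e.g. for a server config."""
    media_type, charset = parse_content_type(value)
    if charset is None:
        return media_type
    return f"{media_type}; charset={charset.upper()}"


sent = "text/HTML; Charset=ISO-8859-1"  # the header quoted above
print(parse_content_type(sent))      # ('text/html', 'iso-8859-1')
print(canonical_content_type(sent))  # text/html; charset=ISO-8859-1
```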

Also, even though it is valid HTML, you should look into replacing all
those layout tables and presentational elements/attributes with CSS, and
use a DOCTYPE that doesn't trigger quirks mode [2] in browsers. You
should also use <p> instead of <br><br> to create separate paragraphs.

e.g. Write this:
<p>paragraph 1 ...</p>
</p>paragraph 2 ...</p>

instead of:
paragraph 1 ...
<br><br>
paragraph 2 ...
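If the pages are generated by the application, the `<br><br>` runs could be converted mechanically. The following is a rough Python sketch with a made-up function name, assuming the paragraph text contains no other markup that matters (a regular expression is not an HTML parser, so real templates deserve a proper edit):

```python
import re

# Two <br> tags (any case, optional "/" and surrounding whitespace) mark a break.
_DOUBLE_BR = re.compile(r"\s*<br\s*/?>\s*<br\s*/?>\s*", re.IGNORECASE)


def br_runs_to_paragraphs(fragment: str) -> str:
    """Wrap each chunk separated by <br><br> in its own <p> element."""
    chunks = _DOUBLE_BR.split(fragment)
    return "\n".join(f"<p>{chunk.strip()}</p>" for chunk in chunks if chunk.strip())


print(br_runs_to_paragraphs("paragraph 1 ...\n<br><br>\nparagraph 2 ..."))
# prints:
# <p>paragraph 1 ...</p>
# <p>paragraph 2 ...</p>
```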

[1] http://cgi.w3.org/cgi-bin/headers?ur...w.wellsre.com/
[2] http://www.mozilla.org/docs/web-deve.../doctypes.html

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://SpreadFirefox.com/ Igniting the Web
Jul 23 '05 #5
On 9 Nov 2004 17:55:41 -0800, Jeff Parker <pu********@hotmail.com> wrote:
I have a web application that for the real estate industry. Here is
one of the sites using said application.

http://www.wellsre.com

As you can see if you click this link here
http://validator.w3.org/check?uri=ht...ww.wellsre.com

This site validates just fine using the w3 validator

the problem that i have is that google does not recognise the file
format of this site

check this link here
http://www.google.com/search?sourcei...TF-8&q=wellsre


See http://www.google.com/search?q=%22We...wellsre.com%22
Jul 23 '05 #6
On Tue, 09 Nov 2004 23:36:14 -0500, Neal <ne*****@yahoo.com> wrote:
On 9 Nov 2004 17:55:41 -0800, Jeff Parker <pu********@hotmail.com> wrote:
I have a web application that for the real estate industry. Here is
one of the sites using said application.

http://www.wellsre.com

As you can see if you click this link here
http://validator.w3.org/check?uri=ht...ww.wellsre.com

This site validates just fine using the w3 validator

the problem that i have is that google does not recognise the file
format of this site


Oh, never mind, I see now.

What is the file format? You never told us.
Jul 23 '05 #7
On Wed, 10 Nov 2004 04:31:26 GMT, Lachlan Hunt <sp***********@gmail.com>
declared in comp.infosystems.www.authoring.html:
<p>paragraph 1 ...</p>
</p>paragraph 2 ...</p>


That would be:

<p>paragraph 1 ...</p>
<p>paragraph 2 ...</p>

--
Mark Parnell
http://www.clarkecomputers.com.au
Jul 23 '05 #8
On 9 Nov 2004 17:55:41 -0800, Jeff Parker <pu********@hotmail.com> wrote:
http://www.wellsre.com


Possibly unrelated but worth mentioning - in Opera 7.23 the page appears
two times if I reload. One below the other.

Bizarre.
Jul 23 '05 #9
"Lars Eighner" <ei*****@io.com> wrote in
comp.infosystems.www.authoring.html:
<32**************************@posting.google.com> Jeff Parker:
http://www.wellsre.com


Where is the rest of it? That is, what is the actual filename


You and your browser don't need to know that.
and why is the trailing slash missing?


It isn't.
P.S. I'm a big fan of proper attributions, but four lines does seem
like a superabundance of riches.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Jul 23 '05 #10
In our last episode, <MP************************@news.odyssey.net>,
the lovely and talented Stan Brown broadcast on
comp.infosystems.www.authoring.html:
"Lars Eighner" <ei*****@io.com> wrote in
comp.infosystems.www.authoring.html:
<32**************************@posting.google.com> Jeff Parker:
http://www.wellsre.com


Where is the rest of it? That is, what is the actual filename
You and your browser don't need to know that.
I'm not the one begging for help here.
and why is the trailing slash missing?

It isn't.
Oh, it is one of those *invisible* trailing slashes.
P.S. I'm a big fan of proper attributions, but four lines does seem
like a superabundance of riches.


--
Lars Eighner -finger for geek code- ei*****@io.com http://www.io.com/~eighner/
If it wasn't for muscle spasms, I wouldn't get any exercise at all.
Jul 23 '05 #11
Lars Eighner wrote:
Stan Brown broadcast
It isn't.

Oh, it is one of those *invisible* trailing slashes.


AFAIK the trailing slash is not needed at the end of a domain. It is at
the end of a directory.
Jul 23 '05 #12
Lars Eighner wrote:
Jeff Parker :
I have a web application that for the real estate industry.

http://www.wellsre.com

Where is the rest of it? That is, what is the actual filename


There is no filename on the client end, only a url and a resource,
hopefully with a mime type.
and why is the trailing slash missing?


The trailing slash on that url is optional.

--
Brian (remove "invalid" to email me)
Jul 23 '05 #13
In our last episode,
<op**************@news.individual.net>,
the lovely and talented Neal
broadcast on comp.infosystems.www.authoring.html:
Lars Eighner wrote:
Stan Brown broadcast
It isn't.
Oh, it is one of those *invisible* trailing slashes.

AFAIK the trailing slash is not needed at the end of a domain. It is at
the end of a directory.


It is my understanding that, at least with some combinations of
browsers and servers, an extra http transaction is required if
the trailing slash is omitted. Moreover, from googling on
trailing slash domain, I find several reports of google handling
sites somewhat differently according to whether the trailing slash
is included.

The question isn't whether your browser or my browser can get the
page. Obviously most - if not all - modern browsers can bring up the
page by hook or by crook. The question was about some apparently
mysterious google behavior, but whether a quirk in google's spider or
in google's subsequent processing is involved I don't know.

--
Lars Eighner -finger for geek code- ei*****@io.com http://www.io.com/~eighner/
If it wasn't for muscle spasms, I wouldn't get any exercise at all.
Jul 23 '05 #14
On Wed, 10 Nov 2004 17:25:57 -0600, Lars Eighner <ei*****@io.com> wrote:
The question isn't whether your browser or my browser can get the
page. Obviously most - if not all - modern browsers can bring up the
page by hook or by crook. The question was about some apparently
mysterious google behavior, but whether a quirk in google's spider or
in google's subsequent processing is involved I don't know.


It shouldn't be related to the slash. Likely his filetype is being
mis-served or is otherwise screwed up.

Jul 23 '05 #15
Lars Eighner wrote:
It is my understanding that, at least with some combinations of
browsers and servers, an extra http transaction is required if the
trailing slash is omitted.


These 2 urls are equivalent:

http://www.example.com
http://www.example.com/

Both of them point to the root of the http server at www.example.com.

These 2 are not:
http://www.example.com/foo
http://www.example.com/foo/

The last 2 are not equivalent because they are 2 distinct urls. It is
entirely possible to have one resource at /foo and another at /foo/ on
the same server. On Apache, if there is a directory named /foo/ in the
public document part of the server, and a client
requests /foo, then, barring any special server configuration, the
server will redirect the client to /foo/. Perhaps that's what you were
thinking of.
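The redirect described here costs the client a second request. The toy model below makes that visible. It is a simplified, hypothetical stand-in for the server rule (Apache's real behavior depends on its configuration), with made-up function names, and 301 (Moved Permanently) is used as a typical redirect status.

```python
from typing import Optional, Set, Tuple


def respond(path: str, directories: Set[str]) -> Tuple[int, Optional[str]]:
    """Toy server rule: redirect a directory requested without its slash."""
    if path in directories and not path.endswith("/"):
        return 301, path + "/"  # status and Location header
    return 200, None


def fetch(path: str, directories: Set[str], max_requests: int = 5) -> Tuple[str, int]:
    """Follow redirects like a client would; return (final path, requests made)."""
    for requests_made in range(1, max_requests + 1):
        status, location = respond(path, directories)
        if status != 301 or location is None:
            return path, requests_made
        path = location
    raise RuntimeError("too many redirects")


dirs = {"/foo"}  # hypothetical: /foo is a directory on this server
print(fetch("/foo/", dirs))  # ('/foo/', 1)
print(fetch("/foo", dirs))   # ('/foo/', 2): one extra transaction
print(fetch("/", dirs))      # ('/', 1): the root needs no redirect
```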

--
Brian (remove "invalid" to email me)
Jul 23 '05 #16
In our last episode,
<yn*********************@bgtnsc04-news.ops.worldnet.att.net>,
the lovely and talented Brian
broadcast on comp.infosystems.www.authoring.html:
Lars Eighner wrote:
Jeff Parker :
I have a web application that for the real estate industry.

http://www.wellsre.com

Where is the rest of it? That is, what is the actual filename

There is no filename on the client end, only a url and a resource,
hopefully with a mime type.


But it is not at all clear this is a client-side problem. It
certainly could be: google's spider could be doing something
very peculiar. But it could be a server-side problem. Therefore
it would be very useful to know as much about what is going on
on the server side as possible.

--
Lars Eighner -finger for geek code- ei*****@io.com http://www.io.com/~eighner/
If it wasn't for muscle spasms, I wouldn't get any exercise at all.
Jul 23 '05 #17
"Lars Eighner" <ei*****@io.com> wrote in
comp.infosystems.www.authoring.html:
In our last episode, <MP************************@news.odyssey.net>,
the lovely and talented Stan Brown broadcast on
comp.infosystems.www.authoring.html:
"Lars Eighner" <ei*****@io.com> wrote in
comp.infosystems.www.authoring.html:

<32**************************@posting.google.com> Jeff Parker:
http://www.wellsre.com

Where is the rest of it? That is, what is the actual filename
You and your browser don't need to know that.


I'm not the one begging for help here.


The OP and the OP's browser don't need to know that either. No
client needs to know that. There may not even be a file; that's the
business of the server.
and why is the trailing slash missing?
It isn't.


Oh, it is one of those *invisible* trailing slashes.


??

There is no need for a slash after the host name. This has been
discussed here extensively in the past, and it's easy enough to look
up:

"An HTTP URL takes the form:
http://<host>:<port>/<path>?<searchpart>
where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 80. No user name or password is
allowed. <path> is an HTTP selector, and <searchpart> is a query
string. The <path> is optional, as is the <searchpart> and its
preceding "?". If neither <path> nor <searchpart> is present, the
"/" may also be omitted."

<http://www.cse.ohio-state.edu/cs/Services/rfc/rfc-text/rfc1738.txt>

I believe RFC 1738 (Dec 1994) was the first official spec for URLs;
certainly there have been elaborations since then but this should
show that the trailing slash on host name has never been a
requirement.
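The quoted grammar can be checked mechanically. This Python sketch, a hypothetical helper built on the standard `urllib.parse.urlsplit`, splits an http URL into the RFC 1738 parts, applies the default port of 80, and strips the separating "/" so that the two spellings of a bare host compare equal while /foo and /foo/ do not. It rejects a user name or password, since the quoted text disallows them.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class HttpUrl(NamedTuple):
    host: str
    port: int
    path: str        # RFC 1738 <path>: the text after the separating "/"
    searchpart: str  # the text after "?"


def parse_http_url(url: str) -> HttpUrl:
    parts = urlsplit(url)
    if parts.scheme.lower() != "http" or not parts.hostname:
        raise ValueError(f"not an http URL with a host: {url!r}")
    if parts.username is not None or parts.password is not None:
        raise ValueError("RFC 1738 allows no user name or password in http URLs")
    # urlsplit keeps the separating "/" at the start of .path; drop it so an
    # empty <path> comes out the same whether or not the "/" was written.
    path = parts.path[1:] if parts.path.startswith("/") else parts.path
    port = parts.port if parts.port is not None else 80  # default port 80
    return HttpUrl(parts.hostname, port, path, parts.query)


print(parse_http_url("http://www.example.com") == parse_http_url("http://www.example.com/"))  # True
print(parse_http_url("http://www.example.com/foo") == parse_http_url("http://www.example.com/foo/"))  # False
print(parse_http_url("http://www.example.com:8080/foo?x=1"))
# HttpUrl(host='www.example.com', port=8080, path='foo', searchpart='x=1')
```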
P.S. I'm a big fan of proper attributions, but four lines does seem
like a superabundance of riches.


Well, you're down to three lines. It's a step in the right
direction.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Jul 23 '05 #18
In our last episode, <MP************************@news.odyssey.net>,
the lovely and talented Stan Brown broadcast on
comp.infosystems.www.authoring.html:
"Lars Eighner" <ei*****@io.com> wrote in
comp.infosystems.www.authoring.html:
In our last episode, <MP************************@news.odyssey.net>,
the lovely and talented Stan Brown broadcast on
comp.infosystems.www.authoring.html:
"Lars Eighner" <ei*****@io.com> wrote in
comp.infosystems.www.authoring.html:
<32**************************@posting.google.com> Jeff Parker:
> http://www.wellsre.com

Where is the rest of it? That is, what is the actual filename

You and your browser don't need to know that.


I'm not the one begging for help here.
The OP and the OP's browser don't need to know that either. No
client needs to know that. There may not even be a file; that's the
business of the server.
It is also the business of the server to send the right
content type. Somehow, google isn't getting the content
type right. Maybe the problem is google's spider. Or
maybe the problem is with the server. In any event,
there are good reasons anyone who really wanted to make
useful suggestions about the problem would need to know
where the document is coming from.

I notice you haven't made any suggestions at all about
the problem: useful, lame, obvious, or esoteric. I suggest
that could be because you have no interest at all in being
helpful and no intellectual curiosity about the problem.

and why is the trailing slash missing?

It isn't.


Oh, it is one of those *invisible* trailing slashes.
??
There is no need for a slash after the host name. This has been
discussed here extensively in the past, and it's easy enough to look
up:
What I did look up was 'trailing slash domain' on google where
I found numerous references to google treating pages differently
according to whether there was a trailing slash on the domain.
I consider it possible that all of those references were from
people who were mistaken, but I also think it possible that
google by accident or design does do something different in
those cases. If so, then perhaps google doesn't operate according
to any number of RFCs, but the person with the problem doesn't
care about the RFCs. He wants his page to show up properly on
google.
P.S. I'm a big fan of proper attributions, but four lines does seem
like a superabundance of riches.

Well, you're down to three lines. It's a step in the right
direction.


The difference being a \040 (a space) instead of a \n

--
Lars Eighner -finger for geek code- ei*****@io.com http://www.io.com/~eighner/
If it wasn't for muscle spasms, I wouldn't get any exercise at all.
Jul 23 '05 #19
On Wed, 10 Nov 2004, Lars Eighner wrote:
In our last episode, <MP************************@news.odyssey.net>,
the lovely and talented Stan Brown broadcast on
comp.infosystems.www.authoring.html:
There is no need for a slash after the host name. This has been
discussed here extensively in the past, and it's easy enough to look
up:
What I did look up was 'trailing slash domain'


Except that this is *not* the "trailing slash" referred to in those
discussions.

After the hostname (and optional :portnumber) comes a slash which
separates the host part from the local part of the URL.

When the local part is empty, this separating slash is optional.

That slash *looks* to you like a trailing slash: but it isn't, because
it has the localpart of the URL on the right of it. It just so
happens that, in this specific case, the localpart is empty.
on google where I found numerous references to google treating pages
differently according to whether there was a trailing slash on the
domain.


Correct. When the URL's local part needs to include a trailing slash,
that slash is meaningful. When it is omitted, the server might return
some quite different resource; in many practical cases what it will do
is to send a redirection to a corrected URL with the trailing slash
added (but this behaviour is only a widely accepted convention - it
isn't in any way fundamental). The client then has to retrieve that
corrected URL in an extra transaction.
Jul 23 '05 #20
JRS: In article <MP************************@news.odyssey.net>, dated
Wed, 10 Nov 2004 16:03:07, seen in news:comp.infosystems.www.authoring.html,
Stan Brown <th************@fastmail.fm> posted :

< In our last episode,
< <32**************************@posting.google.com>,
< the lovely and talented Jeff Parker
< broadcast on comp.infosystems.www.authoring.html:

P.S. I'm a big fan of proper attributions, but four lines does seem
like a superabundance of riches.


Recent USEFOR thinking is/was visible in work-in-progress
<URL:http://www.ietf.org/internet-drafts/draft-ietf-usefor-useage-00.txt>
and/or
<URL:http://www.ietf.org/internet-drafts/draft-ietf-usefor-article-13.txt>

The retro-cited attribution is
(a) not compliant
(b) puerile.

My attributions are compliant; they include optional parts.

--
© John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME ©
Web <URL:http://www.uwasa.fi/~ts/http/tsfaq.html> -> Timo Salmi: Usenet Q&A.
Web <URL:http://www.merlyn.demon.co.uk/news-use.htm> : about usage of News.
No Encoding. Quotes before replies. Snip well. Write clearly. Don't Mail News.
Jul 23 '05 #21
Lars Eighner wrote:
Stan Brown
"Lars Eighner" <ei*****@io.com> wrote:

> Jeff Parker:
>
>> http://www.wellsre.com
>
> Where is the rest of it? That is, what is the actual
> filename
The OP and the OP's browser don't need to know that either. No
client needs to know that. There may not even be a file; that's the
business of the server.
It is also the business of the server to send the right content type.


Sure, but that information is not sent via the url. The server may use
file extension to determine which mime type to send, but the client does
not know how, or even if there is such an association. You asked where
the "rest of" the url was. Stan Brown correctly noted that the url was
not missing anything.
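To illustrate the point that the mapping lives on the server: a server (or a script standing in for one) might map file extensions to media types roughly as below. The sketch uses Python's standard `mimetypes` module and a made-up function name; IIS and Apache each keep their own configurable tables, so this is an analogy rather than what the OP's server actually does. A client sees only the resulting Content-Type header, never the table.

```python
import mimetypes


def content_type_for(filename: str, default: str = "application/octet-stream") -> str:
    """Guess a media type from a file name's extension, falling back to default."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed if guessed is not None else default


print(content_type_for("index.html"))  # text/html
print(content_type_for("README"))      # application/octet-stream (no extension)
```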
I notice you haven't made any suggestions at all about the problem:
useful, lame, obvious, or esoteric.


This isn't a helpdesk, but a discussion forum. Answers are often
provided incidentally, but it is not a requirement to participate.
Stan Brown:
P.S. I'm a big fan of proper attributions, but four lines does
seem like a superabundance of riches.


Agreed. Please trim the attribution novel you put at the top of your
replies.

--
Brian (remove "invalid" to email me)
Jul 23 '05 #22
Lars Eighner wrote:
Brian:

Lars Eighner wrote:
Jeff Parker :

http://www.wellsre.com

Where is the rest of it? That is, what is the actual filename
There is no filename on the client end, only a url and a resource,
hopefully with a mime type.


But it is not at all clear this is a client-side problem.


Then why ask about the url? There is no information about the content
type in the url of a resource. That information can only legitimately
come from a response header.
it could be a server-side problem. Therefore it would be very useful
to know as much about what is going on on the server side as
possible.


Forgive me, but you have not been very consistent. When you ask where
the rest of the url is, that leads us to believe that you have
misunderstood something rather fundamental here.

BTW, you seem to be taking this personally. This is a discussion forum,
and one of its most valuable aspects is peer review, which is usually
swift and can be pitiless. But consider the value of that peer review.
If someone gets something wrong, others will correct them, hopefully
before the OP gets misled by misinformation.

--
Brian (remove "invalid" to email me)
Jul 23 '05 #23
