
How to judge whether content type is truly "text/html"?

I know that HttpWebRequest.GetResponse() returns an HttpWebResponse.
The response has a ContentType property, but that property simply
reflects the HTTP response header. It is possible for the content to
actually be HTML while the ContentType is "image/jpeg".

Is there an effective way to judge whether the response body is truly
"text"?
My idea is to read the first several bytes of the response stream
and check whether they are displayable characters. But they could
be in any encoding. Should I try every encoding?
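
A rough sketch of the byte-inspection idea, not a definitive test. One common heuristic avoids trying every encoding: accept a body that starts with a Unicode byte-order mark, and otherwise reject it if the prefix contains control bytes that text almost never uses. Python is used here for brevity; the name `looks_like_text` and the exact set of control bytes are assumptions made for illustration. The same check could be applied to the first bytes read from the stream returned by GetResponseStream().

```python
import codecs

# Byte-order marks that identify a Unicode encoding outright.
_BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
)

# Control bytes that almost never occur in text: everything below 0x20
# except tab, line feed, form feed, carriage return and escape.
_BINARY_BYTES = frozenset(range(0x20)) - {0x09, 0x0A, 0x0C, 0x0D, 0x1B}


def looks_like_text(prefix: bytes) -> bool:
    """Guess whether the first bytes of a response body are text."""
    if not prefix:
        return False
    # A byte-order mark means the body declares itself as Unicode text.
    if prefix.startswith(_BOMS):
        return True
    # Otherwise, a single "binary" control byte is treated as disqualifying.
    return not any(b in _BINARY_BYTES for b in prefix)
```

Bytes of 0x80 and above are deliberately allowed, so text in UTF-8 and in most legacy single-byte or double-byte encodings passes without decoding. One known gap: UTF-16 or UTF-32 text without a byte-order mark contains NUL bytes and would be misjudged as binary.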

Sep 21 '06 #1
6 Replies


The property is not decided by the HTTP response header. It is decided by
the web server and/or the developer who created the web site. The problem
here is that the whole purpose of the Content-Type header is to tell the
client what is stored in the stream of bits being sent. Since a stream of
bits is just 1's and 0's, there's no way to tell without it.
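
For illustration, a minimal sketch of what a client gets out of that header: the value splits into a media type and an optional charset parameter. It is written in Python for brevity and uses the standard library's `email.message.Message` to do the parsing; the helper name `parse_content_type` is hypothetical.

```python
from email.message import Message
from typing import Optional, Tuple


def parse_content_type(header_value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type header value into (media type, charset)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # Both accessors normalise their result to lower case.
    return msg.get_content_type(), msg.get_content_charset()
```

For example, `parse_content_type("text/html; charset=UTF-8")` yields the media type `text/html` and the charset `utf-8`, while a bare `image/jpeg` yields no charset at all.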

However, I have never heard of what you describe happening. If it did,
browsers would not be able to view the content, and whoever created the web
site would know about it very shortly (from the response of the users).

--
HTH,

Kevin Spencer
Microsoft MVP
Software Composer

A watched clock never boils.

Sep 21 '06 #2

Kevin Spencer wrote:
> The property is not decided by the HTTP Response Header. It is decided by
> the web server and/or the developer who created the web site. The problem
> here is, the reason for the ContentType header is to tell the client what is
> stored in the stream of bits it is sending. Since a stream of bits is just
> 1's and 0's there's no way to tell without it.

Yes, the web server can configure the response MIME type, which then ends
up in the HTTP response header. That is my understanding.

> However, I have never heard of what you describe happening. If it did,
> browsers would not be able to view the content, and whoever created the web
> site would know about it very shortly (from the response of the users).

I manually set one HTML file to be served as "image/jpeg" in IIS 6, then
requested the page from another machine and captured the HTTP traffic with
Fiddler. The response header was indeed "Content-Type: image/jpeg".
Interestingly, IE still rendered the HTML page, while Firefox could not
display it. It looks like IE does some further checking of the content.

Sep 21 '06 #3

Vadym Stetsyak wrote:
> Hello, Morgan!
>
> MC> I know that HttpWebRequest.GetResponse() generates a HttpWebResponse.
> MC> The response has one ContentType property. But the property is just
> MC> decided by http response header. It is possible that the content is
> MC> actually HTML, while the ContentType is "image/jpeg".
>
> If you're talking to a "well-behaved" web server, it gives you content
> from the set you've specified in the Accept header.

I agree.
I just happen to have to handle some abnormal situations :p

> MC> Is there any effective way to judge whether the response type is truly
> MC> "text"?
> MC> I have an idea to read the first several bytes of the response stream;
> MC> and check whether they are real displayable characters. But, they can
> MC> be any kind of Encoding. Should I try all kinds of Encoding?
>
> IMO there is no good way to verify whether it is "text".
> As a workaround you can check the response content for the subset of printable
> characters...

The problem is the encoding.
However, HTML tags are written in English characters, which map to the
same byte values (33-127) in most encodings.
Perhaps trying to parse some HTML tags would work.
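
A minimal sketch of that tag-based check, written in Python for brevity. It relies on the point made above: in ASCII-compatible encodings the tag names occupy the same byte values, so the raw bytes can be compared without decoding. The function name `looks_like_html` and the list of markers are assumptions chosen for illustration, and the list is not exhaustive.

```python
# Tag openers that commonly appear at the start of an HTML document.
# Illustrative only; real pages may begin with other markup.
_HTML_MARKERS = (
    b"<!doctype html",
    b"<html",
    b"<head",
    b"<body",
    b"<title",
    b"<script",
    b"<!--",
)


def looks_like_html(prefix: bytes) -> bool:
    """Guess whether a body prefix starts with HTML markup."""
    # Drop a UTF-8 byte-order mark, if present.
    if prefix.startswith(b"\xef\xbb\xbf"):
        prefix = prefix[3:]
    # Skip leading ASCII whitespace and compare case-insensitively.
    head = prefix.lstrip(b" \t\r\n\x0c").lower()
    return head.startswith(_HTML_MARKERS)
```

This does not work for UTF-16 or UTF-32 bodies, where each ASCII character is padded with NUL bytes; those would need to be decoded first.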

> --
> Regards, Vadym Stetsyak
> www: http://vadmyst.blogspot.com
Sep 21 '06 #4

Hi Morgan,

Your experience underscores my point. While it is possible to manually (or,
perhaps unintentionally) change the Content-Type header, any web site that
did would find out about it very quickly, because there are many different
browsers in use out there, and the site's owners would hear about the
problem and fix it.

It isn't productive to imagine the most remote of possibilities and handle
them gracefully. If one did, one would never finish much of anything.
Sometimes the most graceful thing to do is to handle the error as an error
and move on. My guess is that you would never run into the issue at all.

--
HTH,

Kevin Spencer
Microsoft MVP
Software Composer

A watched clock never boils.

Sep 21 '06 #6

Kevin Spencer wrote:
> Hi Morgan,
>
> Your experience underscores my point. While it is possible to manually (or,
> perhaps unintentionally) change the ContentType header, any web site that
> did would find out about it very quickly, because there are many different
> browsers in use out there, and they would hear about the problem and fix it.
>
> It isn't productive to imagine the most remote of possibilities and handle
> them gracefully. If one did, one would never finish much of anything.
> Sometimes the most graceful thing to do is to handle the error as an error
> and move on. My guess is that you would never run into the issue at all.

I happened to find a function, FindMimeFromData, in UrlMon.dll. It
works.

http://msdn.microsoft.com/workshop/n...appendix_a.asp
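
FindMimeFromData is only available on Windows. As a rough, portable illustration of the general idea of sniffing a type from the data (this is not the algorithm UrlMon uses), the leading bytes can be compared against a few well-known file signatures and the result checked against the declared header. The sketch is in Python for brevity; the names `sniff_signature` and `contradicts_header` and the short signature table are assumptions made for the example.

```python
from typing import Optional

# Well-known file signatures ("magic numbers") and the media type they imply.
_SIGNATURES = (
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
)
_KNOWN_TYPES = frozenset(media_type for _magic, media_type in _SIGNATURES)


def sniff_signature(prefix: bytes) -> Optional[str]:
    """Return the media type implied by the leading bytes, if recognised."""
    for magic, media_type in _SIGNATURES:
        if prefix.startswith(magic):
            return media_type
    return None


def contradicts_header(declared: str, prefix: bytes) -> bool:
    """Return True when the body prefix disagrees with the declared type."""
    # Ignore parameters such as "; charset=utf-8".
    declared = declared.split(";", 1)[0].strip().lower()
    sniffed = sniff_signature(prefix)
    if sniffed is not None:
        return sniffed != declared
    # Nothing matched: suspicious only if the declared type has a signature.
    return declared in _KNOWN_TYPES
```

With the case from this thread, `contradicts_header("image/jpeg", b"<html>")` reports a mismatch, because a JPEG body would have to start with the bytes FF D8 FF.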

Sep 25 '06 #7

This discussion thread is closed