HttpWebResponse.GetResponseStream returns incomplete stream

Hi, given the following code, I've been successful in grabbing pages
for parsing, but for a certain page template (containing a particular
piece of code) the stream always ends right after that code. If you try
this with just about any type of url (including urls from the same site
without that piece of code) it works fine, but with urls containing the
piece of code, the stream is returned only up to that point.

Dim sURL As String
' Works (along with 1000's of other sites/templates/servers):
sURL = "http://www.msnbc.msn.com/id/14191819"
' Doesn't work:
sURL = "http://www.time.com/time/business/article/0,8599,1226309,00.html"

Dim oSR As StreamReader = getPageContent(sURL)
' If you do oSR.ReadToEnd here, you'll see the page broken at the wrong place

Private Function getPageContent(ByVal URL As String) As StreamReader
    Dim oResponse As HttpWebResponse = Nothing
    Dim oSR As StreamReader = Nothing
    Dim oRequest As HttpWebRequest
    Try
        oRequest = CType(WebRequest.Create(URL), HttpWebRequest)
        oResponse = CType(oRequest.GetResponse, HttpWebResponse)
        oSR = New StreamReader(oResponse.GetResponseStream())
    Catch ex As Exception
        ' Errors are swallowed here; the function then returns Nothing
    End Try
    Return oSR
End Function

The stream for the time.com pages ends *every time* right after:
<strong>SUBSCRIBE TO TIME MAGAZINE FOR JUST $1.99</strong></a>

.... and the number of characters varies depending on the story, but
each time the "Subscribe" link is there, the response stream dies right
after it. If you view the source of those pages, you'll see a single
blank character, and then an html comment ( <!--cm_searchtext end-->).

So I'm stuck, is it possible that the single character between the </a>
and the comment is breaking the stream? Could it be the server thinking
(correctly) that I'm parsing it and choosing that as the location each
time to cut me off? (Changing the UserAgent property of the
HttpWebRequest doesn't affect the outcome at all). I've played with
several properties of HttpWebRequest, including spoofing a UserAgent,
setting KeepAlive to true, SendChunked, and ProtocolVersion... but
nothing I do seems to keep this from happening.

Any help would be appreciated.
Thanks!
STA

Aug 22 '06 #1
Thus wrote ThePants,
Hi, given the following code, I've been successful in grabbing pages
for parsing, but for a certain page template (containing a particular
piece of code) the stream always ends right after that code. If you
try this with just about any type of url (including urls from the same
site without that piece of code) it works fine, but with urls
containing the piece of code, the stream is returned only up to that
point.
[...]
... and the number of characters varies depending on the story, but
each time the "Subscribe" link is there, the response stream dies
right after it. If you view the source of those pages, you'll see a
single blank character, and then an html comment ( <!--cm_searchtext
end-->).

So I'm stuck, is it possible that the single character between the
</a> and the comment is breaking the stream? Could it be the server
thinking (correctly) that I'm parsing it and choosing that as the
location each time to cut me off? (Changing the UserAgent property of
the HttpWebRequest doesn't affect the outcome at all). I've played
with several properties of HttpWebRequest, including spoofing a
UserAgent, setting KeepAlive to true, SendChunked, and
ProtocolVersion... but nothing I do seems to keep this from happening.
That's a nasty one. At the point where the text is being truncated, there
is a NULL (0x00) character in the page. It's actually the Encoding object
that breaks here, not the response stream. Unfortunately, specifying a
DecoderFallback doesn't work, which seems to be a bug. As a workaround, buffer
the entire response in a MemoryStream, remove all NULL characters, and decode
the buffer with an Encoding instance.

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Aug 24 '06 #2
Joerg Jooss wrote:
That's a nasty one. At the point where the text is being truncated, there
is a NULL (0x00) character in the page. It's actually the Encoding object
that breaks here, not the response stream. Unfortunately, specifying a
DecoderFallback doesn't work, which seems to be a bug. As a workaround, buffer
the entire response in a MemoryStream, remove all NULL characters, and decode
the buffer with an Encoding instance.

Cheers,
--
Thanks very much for the reply, Joerg. This did the trick! Thank you
very very much for your help.

Aug 25 '06 #3
hi...

could you show me how you did this?

best regards
Aug 28 '06 #4
Thus wrote ja*****@gmail.com,
hi...

could you show me how you did this?
OK, assuming you have a byte array "bytes" containing the entire response,
all you need to do is:

using (MemoryStream buffer = new MemoryStream(bytes.Length)) {
    foreach (byte b in bytes) {
        // Copy every byte except NULL (0x00)
        if (b != 0x0) {
            buffer.WriteByte(b);
        }
    }
    bytes = buffer.ToArray();
}

// Assuming UTF-8 encoding here...
string response = Encoding.UTF8.GetString(bytes);
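
For reuse, the same idea can be wrapped in a small helper. This is only a sketch: the names NullStripper, StripNulls and DecodeWithoutNulls are made up for illustration, and the caller has to supply the encoding (UTF-8 is assumed in the snippet above, but a given page may declare something else).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of the .NET class library.
public static class NullStripper
{
    // Returns a copy of the input with every 0x00 byte removed.
    public static byte[] StripNulls(byte[] bytes)
    {
        using (MemoryStream buffer = new MemoryStream(bytes.Length))
        {
            foreach (byte b in bytes)
            {
                if (b != 0x0)
                {
                    buffer.WriteByte(b);
                }
            }
            return buffer.ToArray();
        }
    }

    // Strips NULL bytes first, then decodes with the encoding the caller assumes.
    public static string DecodeWithoutNulls(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(StripNulls(bytes));
    }
}
```

Note that dropping 0x00 bytes is only safe for single-byte encodings and UTF-8. In UTF-16 or UTF-32, ordinary characters contain zero bytes, so this would corrupt the text.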

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Aug 28 '06 #5
hi..
i have no words to show you how much i am appreciating your help.
but, i couldn't figure out how to capture the stream (from webrequest)
in byte arrays
could you help me out with this too?

best regards ^^

Aug 29 '06 #6
Thus wrote ja*****@gmail.com,
hi..
i have no words to show you how much i am appreciating your help.
but, i couldn't figure out how to capture the stream (from webrequest)
in byte arrays
could you help me out with this too?
best regards ^^
That's System.IO 101 ;-)

Here's a method that sends a HttpWebRequest and copies its response to an
arbitrary Stream object. If you pass a MemoryStream as "outStream", you'll
get what you want.

private void SendRequest(HttpWebRequest request, Stream outStream) {
    Debug.Assert(outStream.CanWrite);

    using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
    using (Stream responseStream = response.GetResponseStream()) {
        byte[] buffer = new byte[0x1000];
        int bytes;
        // Read returns 0 once the end of the response has been reached
        while ((bytes = responseStream.Read(buffer, 0, buffer.Length)) > 0) {
            outStream.Write(buffer, 0, bytes);
        }
    }
}
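
To end up with the byte array asked about, the copy loop can be pointed at a MemoryStream and the result taken with ToArray(). The following is a rough sketch along those lines; ResponseBuffer, CopyStream and ReadAllBytes are illustrative names, not existing framework APIs.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper for illustration; not part of the .NET class library.
public static class ResponseBuffer
{
    // Copies everything from input to output in 4 KB chunks.
    public static void CopyStream(Stream input, Stream output)
    {
        byte[] buffer = new byte[0x1000];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
        }
    }

    // Sends the request and returns the complete response body as raw bytes.
    public static byte[] ReadAllBytes(HttpWebRequest request)
    {
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream responseStream = response.GetResponseStream())
        using (MemoryStream memory = new MemoryStream())
        {
            CopyStream(responseStream, memory);
            return memory.ToArray();
        }
    }
}
```

The returned array is the raw, undecoded response, so any NULL bytes can be removed from it before an Encoding turns it into a string.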

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Aug 29 '06 #7
Here's my Function in vb.net. Probably not terribly efficient, but I
needed to copy the string back to a memorystream as output. Thanks
again to Joerg for the suggestion.

Private Function getPageContent(ByVal URL As String) As MemoryStream
    Dim oResponse As HttpWebResponse = Nothing
    Dim oSB As New StringBuilder
    Dim oRequest As HttpWebRequest
    Try
        oRequest = CType(WebRequest.Create(URL), HttpWebRequest)
        oRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 2.0.50727; .NET CLR 1.1.4322)"
        oResponse = CType(oRequest.GetResponse, HttpWebResponse)
        Dim oStreamResponse As Stream = oResponse.GetResponseStream()
        Dim oStreamRead As New StreamReader(oStreamResponse)
        Dim readBuff(256) As [Char]
        Dim nCount As Integer = oStreamRead.Read(readBuff, 0, 256)
        While nCount > 0
            ' Strip NULL characters from each chunk before appending it
            Dim outputData As New [String](readBuff, 0, nCount)
            outputData = Replace(outputData, vbNullChar, "")
            oSB.Append(outputData)
            nCount = oStreamRead.Read(readBuff, 0, 256)
        End While
        oStreamResponse.Close()
        oStreamRead.Close()
    Catch ex As Exception

    End Try
    ' Write the cleaned text back into a MemoryStream as Windows-1252
    Dim oWorkStream As New MemoryStream
    Dim oEnc As Encoding = Encoding.GetEncoding(1252)
    Dim oSW1 As New StreamWriter(oWorkStream, oEnc)
    oSW1.Write(oSB.ToString)
    oSW1.Flush()
    oWorkStream.Position = 0
    Return oWorkStream
End Function

Aug 29 '06 #8
Hi.. joerg

once again,,,
thank you for your time,,, and attention, i wish i could invite you a
beer someday.
well.. here is the problem.

i tried the code,,, but it's still throwing the same result..
try this particular url,,
"http://www.altavista.com/web/results?itag=ody&q=tire&kgs=1&kls=0"

if you put this url in IE and run it,,, probably you will get the
result with SIDE Sponsor section (right side of page, under sponsored
match)

but if you run this from .net,,, and display the stream (after
processing it),,
you will only see the result without SIDE SPONSORED MATCH section

I don't know if i am explaining well
could you see the problem here?

what i need is to display the whole page including every sponsored link..

best regards

jake

Aug 30 '06 #9
Thus wrote ja*****@gmail.com,
Hi.. joerg

once again,,,
thank you for your time,,, and attention, i wish i could invite you a
beer someday.
well.. here is the problem.
i tried the code,,, but it's still throwing the same result.. try this
particular url,,
"http://www.altavista.com/web/results?itag=ody&q=tire&kgs=1&kls=0"

if you put this url in IE and run it,,, probably you will get the
result with SIDE Sponsor section (right side of page, under sponsored
match)

but if you run this from .net,,, and display the stream (after
processing it),,
you will only see the result without SIDE SPONSORED MATCH section
I don't know if i am explaining well
could you see the problem here?
what i need is to display the whole page including every sponsored link..

best regards
Guess what, I don't even get that sidebar in IE... but at least I seem to
get the exact same content with HttpWebRequest.

Usually, when web applications or web sites behave strangely while being
accessed through your own client application, that is caused by HTTP headers
the site uses to personalize content which are missing in your request --
such as User-Agent to identify the browser, or Accept-Language to identify
your locale. To be on the safe side, you should consider sending these headers:

// request is a HttpWebRequest
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)";
request.Accept = "*/*";
request.Headers["Accept-Language"] = "en-us";

This way, you're pretending to be a US IE 6 SP1 that likes any content --
exactly what the real IE sends.
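
Put together as a small factory method, this might look like the sketch below. BrowserHeaders and CreateBrowserLikeRequest are illustrative names only, and the User-Agent string is a shortened example rather than what any particular browser sends. Creating the request does not contact the server, so the header values can be inspected before anything goes over the wire.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration; not part of the .NET class library.
public static class BrowserHeaders
{
    // Newer .NET versions flag WebRequest.Create as obsolete (SYSLIB0014);
    // the warning is suppressed here because this sketch follows the thread's API.
#pragma warning disable SYSLIB0014
    public static HttpWebRequest CreateBrowserLikeRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        // User-Agent and Accept are restricted headers and must be set via properties.
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)";
        request.Accept = "*/*";

        // Accept-Language has no dedicated property, so it goes through the Headers collection.
        request.Headers["Accept-Language"] = "en-us";
        return request;
    }
#pragma warning restore SYSLIB0014
}
```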

Cheers,
--
Joerg Jooss
ne********@joergjooss.de

Aug 31 '06 #10
