HttpWebResponse.GetResponseStream returns incomplete stream

Hi, given the following code, I've been successful in grabbing pages
for parsing, but for a certain page template (containing a particular
piece of code) the stream always ends right after that code. If you try
this with just about any type of URL (including URLs from the same site
without that piece of code) it works fine, but with URLs containing the
piece of code, the stream is returned only up to that point.

Dim sURL As String
' Works (along with 1000's of other sites/templates/servers):
sURL = "http://www.msnbc.msn.com/id/14191819"
' Doesn't work:
sURL = "http://www.time.com/time/business/article/0,8599,1226309,00.html"

Dim oSR As StreamReader = getPageContent(sURL)
' If you do oSR.ReadToEnd here, you'll see the page broken at the
' wrong place

Private Function getPageContent(ByVal URL As String) As StreamReader
    Dim oResponse As HttpWebResponse = Nothing
    Dim oSR As StreamReader = Nothing
    Dim oRequest As HttpWebRequest
    Try
        oRequest = CType(WebRequest.Create(URL), HttpWebRequest)
        oResponse = CType(oRequest.GetResponse(), HttpWebResponse)
        oSR = New StreamReader(oResponse.GetResponseStream())
    Catch ex As Exception
        ' Errors are swallowed here, so oSR stays Nothing on failure
    End Try
    Return oSR
End Function

The stream for the time.com pages ends *every time* right after:
<strong>SUBSCRIBE TO TIME MAGAZINE FOR JUST $1.99</strong></a>

... and the number of characters varies depending on the story, but
each time the "Subscribe" link is there, the response stream dies right
after it. If you view the source of those pages, you'll see a single
blank character, and then an HTML comment (<!--cm_searchtext end-->).

So I'm stuck, is it possible that the single character between the </a>
and the comment is breaking the stream? Could it be the server thinking
(correctly) that I'm parsing it and choosing that as the location each
time to cut me off? (Changing the UserAgent property of the
HttpWebRequest doesn't affect the outcome at all). I've played with
several properties of HttpWebRequest, including spoofing a UserAgent,
setting KeepAlive to true, SendChunked, and ProtocolVersion ... but
nothing I do seems to keep this from happening.

Any help would be appreciated.
Thanks!
STA

Aug 22 '06 #1
That's a nasty one. At the point where the text is being truncated, there
is a NULL (0x00) character in the page. It's actually the Encoding object
that breaks here, not the response stream. Unfortunately, specifying a
DecoderFallback doesn't work -- seems to be a bug. As a workaround, buffer
the entire response in a MemoryStream, remove all NULL characters, and decode
the buffer with an Encoding instance.
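
One way to check this diagnosis before changing any decoding code is to scan the raw response bytes for 0x00. The following is a minimal sketch and not code from this thread: the NulScanner class and FindNulOffsets method are hypothetical names, and it assumes the whole response body has already been read into a byte array.

```csharp
using System;
using System.Collections.Generic;

static class NulScanner
{
    // Returns the offset of every 0x00 byte in the raw response body.
    public static List<int> FindNulOffsets(byte[] bytes)
    {
        List<int> offsets = new List<int>();
        for (int i = 0; i < bytes.Length; i++)
        {
            if (bytes[i] == 0x00)
            {
                offsets.Add(i);
            }
        }
        return offsets;
    }
}
```

If the first offset reported lines up with the place where the decoded text stops, the NULL byte is the likely cause of the truncation.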

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Aug 24 '06 #2
Thanks very much for the reply, Joerg. This did the trick! Thank you
very very much for your help.

Aug 25 '06 #3
hi...

could you show me how you did this?

best regards
Aug 28 '06 #4
OK, assuming you have a byte array "bytes" containing the entire response,
all you need to do is:

using (MemoryStream buffer = new MemoryStream(bytes.Length)) {
    foreach (byte b in bytes) {
        // Copy every byte except NULL (0x00)
        if (b != 0x0) {
            buffer.WriteByte(b);
        }
    }
    bytes = buffer.ToArray();
}

// Assuming UTF-8 encoding here...
string response = Encoding.UTF8.GetString(bytes);

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Aug 28 '06 #5
hi..
i have no words to show you how much i am appreciating your help.
but, i couldn't figure out how to capture the stream (from webrequest)
in byte arrays
could you help me out with this too?

best regards ^^

Aug 29 '06 #6
That's System.IO 101 ;-)

Here's a method that sends a HttpWebRequest and copies its response to an
arbitrary Stream object. If you pass a MemoryStream as "outStream", you'll
get what you want.

private void SendRequest(HttpWebRequest request, Stream outStream) {
    Debug.Assert(outStream.CanWrite);

    using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
    using (Stream responseStream = response.GetResponseStream()) {
        byte[] buffer = new byte[0x1000];
        int bytes;
        // Read returns 0 once the end of the stream is reached
        while ((bytes = responseStream.Read(buffer, 0, buffer.Length)) > 0) {
            outStream.Write(buffer, 0, bytes);
        }
    }
}
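
The copy loop and the NULL-stripping step from earlier in the thread can also be folded into a single pass. What follows is a rough sketch and not code posted by anyone in this thread: NulSafeDecoder and Decode are hypothetical names, and the caller is assumed to know the page's encoding.

```csharp
using System;
using System.IO;
using System.Text;

static class NulSafeDecoder
{
    // Reads 'input' to the end, drops every 0x00 byte, and decodes what is left.
    public static string Decode(Stream input, Encoding encoding)
    {
        using (MemoryStream cleaned = new MemoryStream())
        {
            byte[] buffer = new byte[0x1000];
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] != 0x00)
                    {
                        cleaned.WriteByte(buffer[i]);
                    }
                }
            }
            return encoding.GetString(cleaned.ToArray());
        }
    }
}
```

With a response in hand, a call might look like NulSafeDecoder.Decode(response.GetResponseStream(), Encoding.UTF8). Note that stripping zero bytes is only safe for single-byte encodings and UTF-8; in UTF-16 or UTF-32 a zero byte is a normal part of many characters, so this approach would corrupt the text.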

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Aug 29 '06 #7
Here's my Function in vb.net. Probably not terribly efficient, but I
needed to copy the string back to a memorystream as output. Thanks
again to Joerg for the suggestion.

Private Function getPageContent(ByVal URL As String) As MemoryStream
    Dim oResponse As HttpWebResponse = Nothing
    Dim oSB As New StringBuilder
    Dim oRequest As HttpWebRequest
    Try
        oRequest = CType(WebRequest.Create(URL), HttpWebRequest)
        oRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 2.0.50727; .NET CLR 1.1.4322)"
        oResponse = CType(oRequest.GetResponse(), HttpWebResponse)
        Dim oStreamResponse As Stream = oResponse.GetResponseStream()
        Dim oStreamRead As New StreamReader(oStreamResponse)
        Dim readBuff(256) As [Char]
        Dim nCount As Integer = oStreamRead.Read(readBuff, 0, 256)
        While nCount > 0
            Dim outputData As New [String](readBuff, 0, nCount)
            ' Strip NULL characters from each chunk before appending it
            outputData = Replace(outputData, vbNullChar, "")
            oSB.Append(outputData)
            nCount = oStreamRead.Read(readBuff, 0, 256)
        End While
        oStreamResponse.Close()
        oStreamRead.Close()
    Catch ex As Exception

    End Try
    ' Write the cleaned text back into a MemoryStream for the caller
    Dim oWorkStream As New MemoryStream
    Dim oEnc As Encoding = Encoding.GetEncoding(1252)
    Dim oSW1 As New StreamWriter(oWorkStream, oEnc)
    oSW1.Write(oSB.ToString)
    oSW1.Flush()
    oWorkStream.Position = 0
    Return oWorkStream
End Function

Aug 29 '06 #8
Hi Joerg,

Once again, thank you for your time and attention. I wish I could buy you a
beer someday.
Well, here is the problem.

I tried the code, but it still gives the same result.
Try this particular URL:
"http://www.altavista.com/web/results?itag=ody&q=tire&kgs=1&kls=0"

If you open this URL in IE, you will probably get the result with the side
sponsor section (right side of the page, under "sponsored match").

But if you run this from .NET and display the stream (after processing it),
you will only see the result without the side sponsored match section.

I don't know if I am explaining this well.
Can you see the problem here?

What I need is to display the whole page, including every sponsored link.

best regards

jake

Aug 30 '06 #9
Guess what, I don't even get that sidebar in IE... but at least I seem to
get the exact same content with HttpWebRequest.

Usually, when web applications or web sites behave strangely while being
accessed through your own client application, that is caused by HTTP headers
the site uses to personalize content which are missing in your request --
such as User-Agent to identify the browser, or Accept-Language to identify
your locale. To be on the safe side, you should consider sending these headers:

// request is a HttpWebRequest
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)";
request.Accept = "*/*";
request.Headers["Accept-Language"] = "en-us";

This way, you're pretending to be a US IE 6 SP1 that likes any content --
exactly what the real IE sends.

Cheers,
--
Joerg Jooss
ne********@joergjooss.de

Aug 31 '06 #10
