Bytes | Software Development & Data Engineering Community

Retrieving web page data with the Microsoft Winsock Control

AaronL
Hello,

I am currently working on a project that has me in a bit of a bind. I want to retrieve web pages from the internet and strip them down to plain text. I'll be using regular expressions to strip out the HTML itself; the problem is actually getting the web pages from the internet.
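For the stripping step, a minimal sketch of the regular-expression approach, assuming the late-bound VBScript regular expression engine (`CreateObject("VBScript.RegExp")`, available on standard Windows installs) and a hypothetical function name `StripTags`. It only removes tags; the contents of script and style blocks and HTML entities such as `&amp;` are left alone.

```vb
' Hypothetical helper: remove HTML tags using the VBScript RegExp object.
' Late binding means no project reference is needed.
' Assumption: a simple "<...>" pattern is good enough; script/style
' contents and HTML entities are NOT handled.
Public Function StripTags(ByVal Html As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True          ' replace every match, not just the first
    re.IgnoreCase = True
    re.Pattern = "<[^>]*>"    ' from "<" up to the next ">"
    StripTags = re.Replace(Html, "")
End Function
```

Because the pattern stops at the first `>`, a tag with a literal `>` inside an attribute value would be cut short; for typical pages that is a rare case.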

I tried using the Microsoft Internet Transfer Control (ITC), but my client had problems with some web pages not downloading: one particular page would get as far as the <body> tag and then stop. My client told me there are known issues with the ITC, so we decided to find an alternative.

Before I tried the ITC, I used the Microsoft Winsock Control, but with that control web pages were truncated, which caused my HTML-stripping routine to malfunction. At my client's request, we don't want to use the Microsoft Internet Controls either.

I feel the Microsoft Winsock Control is the best way to go because, to my knowledge, it is the lowest-level way of communicating with servers on the internet. I suspect that my lack of understanding of how the Winsock Control works is what is causing my problems. Can someone look at this code and tell me what I'm doing wrong, or how I can use the Winsock Control for this project? My job is somewhat on the line at this point.

Here is the code that I put in a module:
```vb
'Data variables
Public DataIn As String
Public DataOut As String
Public ErrMsg As String

'Network variables
Public URL As String
Public Port As Integer

'Functional Variables
Public NetExecuting As Boolean
Public NetTimer As Long

Public Function GetHTML(URL As String, Port As Integer)

    NetTimer = 0

    'If winsock is still executing, wait
    If NetExecuting = True Then
        Do
            DoEvents
        Loop Until NetExecuting = False
    End If

    DataIn = ""

    If frmWinSock.Winsock.State <> sckClosed Then
        frmWinSock.Winsock.Close
    End If

    'If port is 0 then set it to 80
    If Port <> 0 Then
        frmWinSock.Winsock.RemotePort = Port
    Else
        frmWinSock.Winsock.RemotePort = 80
    End If

    'Send connection string compatible with Internet Explorer
    DataOut = "GET / HTTP/1.0" _
    & vbCrLf & "Accept: text/html" _
    & vbCrLf & "Host: " & URL _
    & vbCrLf & "Connection: open " _
    & vbCrLf & "User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)" _
    & vbCrLf & "Referer: " _
    & vbCrLf & "Cookie: " _
    & vbCrLf & vbCrLf

    frmWinSock.Winsock.RemoteHost = URL

    frmWinSock.Winsock.Connect
    Do
        DoEvents
    Loop Until frmWinSock.Winsock.State = sckClosed And DataIn <> ""

    GetHTML = DataIn

End Function
```
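The module code always sends `GET /`, so it requests the site's root page whatever URL is passed; `Connection: open` is not a standard header value; and if URL includes the `http://` prefix, both RemoteHost and the Host header receive a full URL instead of a bare host name. A minimal sketch of building the request from the URL, assuming plain `http://` URLs with no explicit `:port` and no query-string encoding concerns. `SplitUrl` and `BuildRequest` are hypothetical names, not part of the original project.

```vb
' Hypothetical helper: split "http://news.yahoo.com/world/" into the bare
' host name (for RemoteHost and the Host header) and the path (for the
' GET line). Assumes http:// only and no ":port" in the URL.
Public Sub SplitUrl(ByVal FullUrl As String, Host As String, UrlPath As String)
    Dim p As Long
    ' Drop the scheme if present
    If LCase$(Left$(FullUrl, 7)) = "http://" Then FullUrl = Mid$(FullUrl, 8)
    p = InStr(1, FullUrl, "/")
    If p = 0 Then
        Host = FullUrl
        UrlPath = "/"
    Else
        Host = Left$(FullUrl, p - 1)
        UrlPath = Mid$(FullUrl, p)
    End If
End Sub

' Hypothetical helper: build a plain HTTP/1.0 request. "Connection: close"
' asks the server to close the socket when the response is complete, so
' the Winsock Close event can be treated as "all data received".
Public Function BuildRequest(ByVal Host As String, ByVal UrlPath As String) As String
    BuildRequest = "GET " & UrlPath & " HTTP/1.0" & vbCrLf & _
                   "Host: " & Host & vbCrLf & _
                   "Accept: text/html" & vbCrLf & _
                   "User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)" & vbCrLf & _
                   "Connection: close" & vbCrLf & _
                   vbCrLf
End Function
```

In GetHTML, these would replace the hard-coded request: call `SplitUrl URL, Host, UrlPath` (with `Host` and `UrlPath` declared as String), then set `frmWinSock.Winsock.RemoteHost = Host` and `DataOut = BuildRequest(Host, UrlPath)`. The empty Referer and Cookie headers are simply left out.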
Here is the form code that handles the Winsock events:
```vb
Private Sub timNetTimeOut_Timer()
    Winsock.Close
End Sub

Private Sub Winsock_Close()
    NetExecuting = False
End Sub

Private Sub Winsock_Connect()
    timNetTimeOut.Enabled = True
    NetExecuting = True
    ErrMsg = ""
    Winsock.SendData DataOut
End Sub

Private Sub Winsock_DataArrival(ByVal bytesTotal As Long)
    Dim DataArrived As String
    On Error Resume Next
    Winsock.GetData DataArrived
    DataIn = DataIn & DataArrived
End Sub

Private Sub Winsock_Error(ByVal Number As Integer, Description As String, ByVal Scode As Long, ByVal Source As String, ByVal HelpFile As String, ByVal HelpContext As Long, CancelDisplay As Boolean)
    NetExecuting = False
    ErrMsg = "A network error has occurred: " & Number & " " & Description
End Sub
```
I set the timeout timer's interval to 10 seconds, which should be plenty of time to retrieve a web page, but some pages still get truncated; http://news.yahoo.com is one example.
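Two details of the form code may contribute to the truncation. The timeout timer is enabled in Winsock_Connect but never switched off, so once enabled it keeps firing every 10 seconds and can close a later connection partway through a download (setting `Enabled = True` on a timer that is already running does not restart its countdown). Also, when the server closes the connection, Winsock_Close does not call `Winsock.Close`, so the control stays in the sckClosing state and the `State = sckClosed` wait loop in GetHTML likely only ends when the timer closes the socket. A sketch of the handlers with those points changed, assuming the same control names (`Winsock`, `timNetTimeOut`) and module variables as the original; it is untested against the original project, and the timeout message text is made up for illustration.

```vb
' Sketch (assumption, not tested against the original project):
'  1. Winsock_Close calls Winsock.Close so State returns to sckClosed.
'  2. The timeout timer is switched off when the transfer ends.
'  3. Each DataArrival restarts the timer, so it measures inactivity
'     rather than total download time.
Private Sub timNetTimeOut_Timer()
    timNetTimeOut.Enabled = False
    ErrMsg = "Timed out waiting for data"   ' illustrative message
    Winsock.Close                           ' a local Close raises no Close event
    NetExecuting = False
End Sub

Private Sub Winsock_Connect()
    ErrMsg = ""
    NetExecuting = True
    timNetTimeOut.Enabled = True
    Winsock.SendData DataOut
End Sub

Private Sub Winsock_DataArrival(ByVal bytesTotal As Long)
    Dim DataArrived As String
    Winsock.GetData DataArrived, vbString
    DataIn = DataIn & DataArrived
    ' Restart the inactivity countdown
    timNetTimeOut.Enabled = False
    timNetTimeOut.Enabled = True
End Sub

Private Sub Winsock_Close()
    ' The server has finished sending; close our end as well
    timNetTimeOut.Enabled = False
    Winsock.Close
    NetExecuting = False
End Sub

Private Sub Winsock_Error(ByVal Number As Integer, Description As String, ByVal Scode As Long, ByVal Source As String, ByVal HelpFile As String, ByVal HelpContext As Long, CancelDisplay As Boolean)
    timNetTimeOut.Enabled = False
    ErrMsg = "A network error has occurred: " & Number & " " & Description
    Winsock.Close
    NetExecuting = False
End Sub
```

With these handlers, the wait loop in GetHTML could set `NetExecuting = True` before calling Connect and then wait with `Loop Until NetExecuting = False`, since a timed-out request may leave DataIn empty and the original `DataIn <> ""` condition would then never be met.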
Jan 6 '10 #1
AaronL
I found that Winsock has a problem receiving large data streams. If anyone knows a way around this, I could use the help. Thanks!
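Over TCP, a large page normally arrives as a series of DataArrival events rather than in one piece, so the complete page is only available once the server has closed the connection. Even then, DataIn begins with the HTTP status line and headers, which would otherwise be fed to the HTML stripper. A minimal sketch, with hypothetical helper names `ResponseBody` and `ContentLength`, that splits the response at the blank line ending the headers and reads the Content-Length header when the server sends one, so a truncated download can at least be detected. Content-Length counts bytes, so comparing it with `Len()` of the body is only exact for pages in a single-byte encoding.

```vb
' Hypothetical helper: return everything after the header block.
' In HTTP/1.x the headers end at the first blank line (CRLF CRLF).
Public Function ResponseBody(ByVal Response As String) As String
    Dim p As Long
    p = InStr(1, Response, vbCrLf & vbCrLf)
    If p = 0 Then
        ResponseBody = ""            ' headers incomplete: no body yet
    Else
        ResponseBody = Mid$(Response, p + 4)
    End If
End Function

' Hypothetical helper: value of the Content-Length header, or -1 if the
' server did not send one.
Public Function ContentLength(ByVal Response As String) As Long
    Dim p As Long, q As Long
    ContentLength = -1
    p = InStr(1, Response, vbCrLf & "Content-Length:", vbTextCompare)
    If p = 0 Then Exit Function
    p = p + Len(vbCrLf & "Content-Length:")
    q = InStr(p, Response, vbCrLf)
    If q = 0 Then Exit Function
    ContentLength = CLng(Val(Mid$(Response, p, q - p)))   ' Val skips the leading space
End Function
```

A caller could treat the page as truncated when `ContentLength(DataIn) >= 0` and `Len(ResponseBody(DataIn))` is smaller than that value.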
Jan 6 '10 #2
I found your code clean and I liked it, thanks.

What works for me is to accumulate the incoming data in a buffer as it is received and save the buffer to a file once the closing </HTML> tag has arrived.
Oct 16 '10 #3
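```vb
' Assumes form-level declarations elsewhere:
'   Dim Buffer As String   ' accumulates the whole response
'   Dim i As Long          ' index used to name the output file
Private Sub Wsck_DataArrival(ByVal bytesTotal As Long)
    Dim Response As String
    Dim FileNum As Integer

    ' Each DataArrival event delivers only the next piece of the response
    Wsck.GetData Response, vbString
    Buffer = Buffer & Response

    ' Save the buffer once the closing HTML tag has arrived
    ' (App.Path & "\folder" must already exist)
    If InStr(1, Buffer, "</HTML>", vbTextCompare) > 0 Then
        FileNum = FreeFile
        Open App.Path & "\folder\" & i & ".txt" For Output As #FileNum
        Print #FileNum, Buffer
        Close #FileNum
    End If
End Sub
```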
Oct 16 '10 #4
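Checking for </HTML> works for many pages, but some pages omit the closing tag or send more data after it. A variation, sketched here under the same assumptions as the DataArrival handler (a form-level `Buffer` string, an index `i`, and an existing folder under App.Path), is to save the buffer when the server closes the connection, which for an HTTP/1.0 request marks the end of the response. `SaveText` is a hypothetical helper name.

```vb
' Hypothetical helper: write Data to FileName, replacing any existing file
Private Sub SaveText(ByVal FileName As String, ByVal Data As String)
    Dim FileNum As Integer
    FileNum = FreeFile
    Open FileName For Output As #FileNum
    Print #FileNum, Data;           ' ";" stops Print adding a trailing CRLF
    Close #FileNum
End Sub

' Assumes the form-level Buffer and i used by Wsck_DataArrival
Private Sub Wsck_Close()
    Wsck.Close                      ' return the control to sckClosed
    SaveText App.Path & "\folder\" & i & ".txt", Buffer
    Buffer = ""                     ' clear for the next download
End Sub
```

With this in place, the `</HTML>` check in Wsck_DataArrival can be dropped so that each page is written exactly once.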
