Regex and screen scraping

Hello all,

Well, I have to say I'm getting excited about my app; it's almost there.
I have added a button to IE, and I am calling the current instance of IE
and grabbing the URL out just fine. I'm using the WebClient to grab the
HTML. So far so good, and I'm only half bald.

Now I am at the point where I need to extract a couple of fields from
the HTML itself. I have read about using regex to do this but am a
little confused; maybe I've just been staring at the screen too long.

I get this HTML returned.

<b>Binding:</b> Paperback<br> <b>Publisher:</b>

What I need to extract is the word Paperback from the above string.

Here is what I have so far; I have no idea if it's right:

Dim regex As New Regex("<b>Binding:</b>((.|\n)*?)<br> <b>Publisher:", _
    RegexOptions.IgnoreCase)

But uhhhhhh, now what do I do with that to return just the word
Paperback?

I have several items on the same page that need to be returned. I am a
little lost about how I need to read it in. Do I need to put
it into a StreamReader or ... well, what do I do with it then?

Chris
Nov 22 '05 #1
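
For anyone landing here with the same question, a minimal sketch of one way to pull the value out with a capture group. Regex works directly on a String, so if the HTML is already in a String no StreamReader is needed. The ExtractField helper, the module name, and the "Example Press" publisher value are made up for illustration, and the pattern assumes each value sits between "<b>Label:</b>" and the next "<br>", as in the fragment above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BindingExample

    ' Hypothetical helper: returns the text between "<b>label:</b>" and the
    ' next "<br>", or Nothing when the label is not found.
    Function ExtractField(ByVal html As String, ByVal label As String) As String
        ' The parentheses form capture group 1; \s* keeps the surrounding
        ' whitespace out of the captured value.
        Dim pattern As String = "<b>" & Regex.Escape(label) & ":</b>\s*(.*?)\s*<br>"
        Dim m As Match = Regex.Match(html, pattern, _
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then
            Return m.Groups(1).Value
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Fragment from the post, extended with a made-up publisher value
        ' so that a second field can be shown.
        Dim html As String = _
            "<b>Binding:</b> Paperback<br> <b>Publisher:</b> Example Press<br>"

        Console.WriteLine(ExtractField(html, "Binding"))    ' Paperback
        Console.WriteLine(ExtractField(html, "Publisher"))  ' Example Press
    End Sub

End Module
```

Calling the helper once per label covers several fields on the same page. Regex-based scraping like this tends to break when the page markup changes, so it is worth checking for Nothing on every field.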
Hi Chris,

If you want to do it the Document Object Model way, you can use mshtml.
You have to set a reference to it using

Project -> Add Reference -> .NET -> Microsoft.mshtml

Do not add an Imports statement for it, because that freezes your IDE;
instead, write the fully qualified name every time you need it.

By the way, did you know that the newsgroup

microsoft.public.dotnet.languages.vb is a better place for this kind of
question?

I hope this helps.

Cor
Nov 22 '05 #2
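
A rough sketch of what the mshtml route might look like, assuming the project already references Microsoft.mshtml as described above. Nothing here comes from the original posts: the TextAfterLabel helper and the module name are hypothetical, and the code assumes the value is a plain text node directly after the "<b>" label. Names are fully qualified instead of imported, in line with the advice above. Loading a page this way can also run scripts embedded in the HTML, so treat it as a starting point only.

```vbnet
Imports System

Module MshtmlSketch

    ' Hypothetical helper: returns the text node that directly follows the
    ' <b> element whose text equals label, or Nothing when none is found.
    Function TextAfterLabel(ByVal html As String, ByVal label As String) As String
        ' Load the HTML string into an in-memory document.
        Dim doc As New mshtml.HTMLDocument()
        Dim doc2 As mshtml.IHTMLDocument2 = CType(doc, mshtml.IHTMLDocument2)
        doc2.write(html)
        doc2.close()

        ' Walk every <b> element and compare its text with the label.
        Dim doc3 As mshtml.IHTMLDocument3 = CType(doc, mshtml.IHTMLDocument3)
        For Each el As mshtml.IHTMLElement In doc3.getElementsByTagName("b")
            If el.innerText IsNot Nothing AndAlso el.innerText.Trim() = label Then
                ' The value is assumed to be the next sibling node of the label.
                Dim sibling As mshtml.IHTMLDOMNode = _
                    CType(el, mshtml.IHTMLDOMNode).nextSibling
                If sibling IsNot Nothing AndAlso sibling.nodeValue IsNot Nothing Then
                    Return sibling.nodeValue.ToString().Trim()
                End If
            End If
        Next
        Return Nothing
    End Function

End Module
```

With the fragment from the question, a call such as TextAfterLabel(html, "Binding:") would be the intended usage. The advantage over a regex is that the lookup depends on the element structure and not on the exact characters between tags.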
