
Get Data from a Website

OldBirdman
Access 2002 - Windows XP Home SP3
I want to get some data from a website. Currently I select the desired area of the web page and copy it to the clipboard. This selected area contains text and 2 embedded images, a .jpg (~7K) and a .gif.
When I press Ctrl+V (Paste) into an Access textbox, the 2 images are gone. Also gone is everything the web page's designers use to place the pictures, colors, etc. on the page. I know nothing about this.
I have to return to the web page and separately SaveAs the .jpg image, and note which .gif image is shown so I can enter it manually using a combobox.

I want to do the following automatically:
1) Get the text without the embedded images.
2) Determine which (of about 20) .gif is present. This may be in the image name, if I could see it.
3) Determine whether the .jpg is one of 3 known images, and if not, SaveAs strFileName, where I generate strFileName after analyzing the text from 1).
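When only the clipboard text is pasted, the `<img>` tags (and with them the image file names) are lost; working from the page's HTML source keeps them. Below is a minimal sketch of goals 1) and 2), shown in Python with the standard-library `html.parser` purely to illustrate the technique. It assumes the HTML source is available as a string and that both images appear as ordinary `<img src=...>` tags; `PageScraper` and `scrape` are hypothetical names, and the logic would need porting to VBA for use in Access.

```python
from html.parser import HTMLParser


class PageScraper(HTMLParser):
    """Collects visible text and <img> src values from an HTML string."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_sources = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            # The image file name (e.g. which of the ~20 .gifs) lives here.
            src = dict(attrs).get("src")
            if src:
                self.image_sources.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text outside scripts and styles.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def scrape(html):
    """Return (text, gif_sources, jpg_sources) for one page."""
    parser = PageScraper()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    gifs = [s for s in parser.image_sources if s.lower().endswith(".gif")]
    jpgs = [s for s in parser.image_sources
            if s.lower().endswith((".jpg", ".jpeg"))]
    return text, gifs, jpgs
```

The .gif name returned this way could be matched against the combobox's row source instead of being noted by eye.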

I notice that pasting the entire web page into WordPad gives different results in the area I'm interested in than selecting just that area of the web page and pasting it alone into WordPad.

Can someone point me toward info on how to do this?
Mar 9 '09 #1
3 Replies


mshmyob
Hello OB,

Have you tried HTML scraping? A member has posted some code here that you should be able to modify. I haven't tried the code myself, but have a look at it, or read some articles on HTML scraping for more details:

http://bytes.com/topic/access/answer...-html-scraping
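One common scraping approach is plain string searching between known marker strings (in VBA this would be `InStr` and `Mid`). A language-neutral sketch in Python follows; `extract_between` and `extract_all` are hypothetical helpers, and the markers used in the usage note are made-up examples that would have to be read off the real page source.

```python
def extract_between(source, start_marker, end_marker, start_at=0):
    """Return (text, next_position) for the first text found between the
    two markers at or after start_at, or (None, -1) if either is missing."""
    start = source.find(start_marker, start_at)
    if start == -1:
        return None, -1
    start += len(start_marker)
    end = source.find(end_marker, start)
    if end == -1:
        return None, -1
    return source[start:end], end + len(end_marker)


def extract_all(source, start_marker, end_marker):
    """Collect every occurrence of text between the markers, in order."""
    results = []
    pos = 0
    while True:
        found, pos = extract_between(source, start_marker, end_marker, pos)
        if found is None:
            return results
        results.append(found)
```

For example, `extract_all(page, 'src="', '"')` lists every quoted `src` value, which includes the .gif and .jpg file names.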


cheers,

Mar 10 '09 #2

ADezii
@OldBirdman
Couldn't the entire HTML source code be pasted into Word, where it could easily be examined, or saved as text only, so that the entire file can be opened in Access and each line analyzed in turn?
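The line-by-line idea can be sketched as follows, in Python for illustration, assuming the page source has already been saved as a UTF-8 text file. `scan_saved_source` and the `IMAGE_REF` pattern are hypothetical; the pattern only catches file names ending in .gif, .jpg or .jpeg.

```python
import re

# Matches a path or URL fragment ending in .gif, .jpg or .jpeg (any case).
IMAGE_REF = re.compile(r'[\w./:-]+\.(?:gif|jpe?g)\b', re.IGNORECASE)


def scan_saved_source(path):
    """Yield (line_number, image_reference) for each image file name
    found while reading the saved page source one line at a time."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for line_no, line in enumerate(f, start=1):
            for match in IMAGE_REF.finditer(line):
                yield line_no, match.group(0)
```

Reading the file line by line mirrors the "open the file in Access and analyze each line" suggestion; no Word automation is involved.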
Mar 11 '09 #3

OldBirdman
I cannot get the code referred to for HTML scraping to work. I get the message "Compile error: User-defined type not defined" on line 2:

    Dim webBrowser As webBrowser

Apparently Access 2002 does not know what a webBrowser is. I don't understand the code, and I can't figure it out if I can't run it. I learn from new examples by stepping through the code, but this code won't even start for me.
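"User-defined type not defined" generally means VBA cannot resolve a type name, most often because the library that defines it is not checked under Tools > References. A route that needs no browser control at all is to request the page's HTML directly over HTTP. A minimal Python sketch of that idea follows; `fetch_page_source` is a hypothetical name, and from Access the comparable route would probably be an HTTP object such as MSXML2.XMLHTTP (an assumption, not tested here).

```python
from urllib.request import Request, urlopen


def fetch_page_source(url, timeout=30):
    """Download the raw HTML of a page, with no browser control involved."""
    # Some sites may reject requests that lack a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string is the full page source, so the image tags that disappear on a clipboard paste are still there to be scraped.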

ADezii wrote: "Couldn't the entire HTML source code be pasted into Word, where it could easily be examined, or saved as text only, so that the entire file can be opened in Access and each line analyzed in turn?"
I've never tried automation, which I believe this involves. However, I did open Word and paste the entire web page into it. The paste took about 45 seconds (hourglass), even on repeated pastes, and clearing the document in preparation for another paste took 10 seconds.
Loading Word has its own time penalty, and I don't know enough about Word to create a command button that reads the text. So that's a whole new subject to investigate, but not now, I think.
Looking at the Word page after the paste, I still can't find the address of the .jpg picture to download and save. The .gif name is embedded in the image frame, and that name would be enough for my purposes if I knew how to get to it. But the time cost is excessive, and I would still have to return to IE and "SaveAs..." the image.
I currently paste my selected area of the web page into an unbound textbox, assign its contents to a string variable, and scrape that. This gets me all the info I need except the .gif name and the ability to save the .jpg image.
So far, my steps are:

1) Press the "New Record" button in my Access database (myDB).
2) Alt+Tab to Microsoft Internet Explorer (IE)
   and select the correct tab and/or navigate
   to the desired web page.
3) Select the desired section of the web page.
4) Ctrl+C to copy it to the clipboard.
5) Alt+Tab to return to myDB.
6) Press the command button "Paste from Website".
7) If the MsgBox "GIF not determined" appears, clear it with "OK".
8) Alt+Tab to return to IE.
9) Right-click the .jpg image and select "Save Picture As...",
      Ctrl+V to paste the file name into the dialog
          (this was generated in step 6
           and copied to the clipboard),
      then press Enter or click "Save" to save the image.
10) Mentally note which .gif is displayed.
11) Alt+Tab to return to myDB.
12) If 7) displayed the message, click the combobox and
      select the row for the .gif displayed.
13) Click the command button to record that an image
      was actually acquired and saved.
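Steps 9 and 13 could in principle be replaced by downloading the image directly and deciding in code whether it is new. A minimal Python sketch, assuming the image's URL is already known from the page source and that the 3 recurring .jpg images can be recognized by comparing file contents through a SHA-256 digest. `save_image_if_new` and `KNOWN_JPG_DIGESTS` are hypothetical names, and the digest set must be filled in from the real images.

```python
import hashlib
import pathlib
from urllib.request import urlopen

# Hypothetical placeholder: SHA-256 hex digests of the 3 .jpg images
# that never need saving. Fill in from the real files.
KNOWN_JPG_DIGESTS = frozenset()


def save_image_if_new(image_url, target_path, known_digests=KNOWN_JPG_DIGESTS):
    """Download the image and save it to target_path unless its bytes
    match one of the known images. Returns True if a file was written."""
    with urlopen(image_url) as response:
        data = response.read()
    if hashlib.sha256(data).hexdigest() in known_digests:
        return False  # one of the known recurring images; skip it
    path = pathlib.Path(target_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return True
```

The True/False result is what step 13 records by hand, and `target_path` plays the role of strFileName.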
Clumsy as these instructions seem, replacing them with the following doesn't seem to help:

1) Same as 1) above.
2) Same as 2) above.
3) Ctrl+A to select the entire web page.
4) Ctrl+C to copy the web page to the clipboard.
5) Alt+Tab to Word.
6) Ctrl+A to select everything in Word.
7) Ctrl+V to paste what was copied in step 4,
    overwriting anything already in Word.
8) Alt+Tab to myDB.
9) Press the command button "Scrape from Word".
10) <<I still have no .jpg image; not sure of the steps here>>
What I am aiming for (and may not get to) is:

1) Same as 1) above.
2) Same as 2) above.
3) Alt+Tab to myDB.
4) Press the command button "Get from Website".
Mar 11 '09 #4
