
How to extract data from an HTML file

Hi,
I am writing a program that needs to extract data from an HTML file.
Each HTML file contains two dates that I want to extract, but the main problem is that the dates are different in each file. The word "AKTIVA" always comes before these dates.
I got as far as searching for the word "AKTIVA", but after that I have no idea how to get at the dates.

I also tried another method: transferring the whole contents of the HTML file into an Excel file. But when I open the newly created file using a Connection object, it shows the error "External Table Is Not In Required Format". The values in the cells are apparently not in a correct format.

Can anybody tell me how to make one of the above methods work, or give me another idea?

I am eagerly waiting for an answer.
varinder
May 30 '08 #1
1 Reply


!NoItAll
HTML is not a data format, so you cannot query it the way you would query a data source.
XML is a data format, and it has rules that HTML does not. What you are doing is called "screen scraping", and it is a dubious activity at best.
Here's an example:
In XML, tags can identify a string as a piece of data, such as a <firstname>, a <lastname> or a <date>. HTML tags mostly describe how the string is to be displayed (big, bold and blue). HTML is a display structure.
You can only hope that whoever creates the HTML never changes how they do it - but they will. HTML is also typically not well formed (unless it is XHTML) and therefore cannot be parsed reliably with standard XML tools.
To parse HTML you will pretty much be forced to do it all manually with InStr, Left, Right, Mid, Replace, etc. It gets pretty ugly - and the code has to change every time the page author changes their mind.

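'***************************************************************
Public Function GetDatefromHTML(ByVal sHTML As String) As Date
    Dim sTemp As String
    Dim dDate As Date

    ' Trap every failure below: "aktiva" not found (InStr returns 0,
    ' which makes Mid$ raise an error) or text that is not a valid date
    On Error GoTo BadDate
    ' Keep everything from the first case-insensitive match of "aktiva" onward
    sTemp = Mid$(sHTML, InStr(1, sHTML, "aktiva", vbTextCompare))
    dDate = CDate(Mid$(sTemp, [number of chars into sTemp the date begins], [length of the date string]))
    On Error GoTo 0
    GetDatefromHTML = dDate
    Exit Function

BadDate:
    ' Fall back to a fixed sentinel date, built without relying on the locale
    GetDatefromHTML = DateSerial(1970, 1, 1)
End Function
'***************************************************************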
There are lots of other ways to do this - this one is pretty quick and dirty. The CDate function (built into VB) converts most valid date strings into a Date value, interpreting the string according to your computer's locale settings.
If the HTML changes, or the date is bad, the function returns 1/1/1970 (you can pick any date, but it has to return a date).
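If you do not know in advance how far after "AKTIVA" the dates start, one option is to strip the tags and search the remaining text for date-like tokens. The following is only a rough sketch under some assumptions: GetDatesAfterMarker is a made-up helper name, the dates are assumed to be numeric (for example 31.12.2007, 31/12/2007 or 31-12-2007), and it relies on the VBScript.RegExp object that ships with the Windows Script runtime, created late-bound so no project reference is needed.

```vb
' Hypothetical helper: returns up to two date-like strings found after sMarker.
' Assumes numeric dates such as 31.12.2007, 31/12/2007 or 31-12-2007.
Public Function GetDatesAfterMarker(ByVal sHTML As String, ByVal sMarker As String) As Collection
    Dim colDates As New Collection
    Dim lPos As Long
    Dim sText As String
    Dim oRegEx As Object
    Dim oMatches As Object
    Dim i As Long

    ' Return the (possibly empty) collection in every case
    Set GetDatesAfterMarker = colDates

    ' Locate the marker, ignoring case; 0 means it is not in the file
    lPos = InStr(1, sHTML, sMarker, vbTextCompare)
    If lPos = 0 Then Exit Function

    ' Late-bound regular expression object from the Windows Script runtime
    Set oRegEx = CreateObject("VBScript.RegExp")
    oRegEx.Global = True

    ' Replace every tag with a space so only the visible text remains
    oRegEx.Pattern = "<[^>]*>"
    sText = oRegEx.Replace(Mid$(sHTML, lPos + Len(sMarker)), " ")

    ' Collect the first two date-like tokens that follow the marker
    oRegEx.Pattern = "\b\d{1,2}[./-]\d{1,2}[./-]\d{2,4}\b"
    Set oMatches = oRegEx.Execute(sText)
    For i = 0 To oMatches.Count - 1
        If colDates.Count = 2 Then Exit For
        colDates.Add oMatches.Item(i).Value
    Next i
End Function
```

You would call it as Set colDates = GetDatesAfterMarker(sHTML, "AKTIVA") and check colDates.Count before using the results. The function returns the dates as strings on purpose: whether CDate reads 01.02.2008 as 1 February or 2 January depends on the locale, so if the files always use one fixed order it is probably safer to split the string yourself and build the value with DateSerial.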
May 31 '08 #2
