473,406 Members | 2,356 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,406 software developers and data experts.

Extracting data form HTML using vba in Access

Hello Everybody,
Long time since my last log here.
My question is how to extract data from HTML file, which is on my computer, using vba. This all is done in Access.
One very important thing is that the file does not contain tables.
Below the data in the HTML file:
Expand|Select|Wrap|Line Numbers
  1. <html>
  2. <head>
  3.     <meta http-equiv="content-type" content="text/html">
  4.     <meta name="copy-right" content="Fuji Machine Mfg. Co., Ltd.">
  5.     <meta http-equiv="Expires" content="{$now}">
  6.     <link rel="stylesheet" href="../../Definition/Timing_IndexReportStyle.css" type="text/css">
  7.     <title>nxt-kimball3 Timing Report</title>
  8. </head>
  9.  
  10. <script type="text/javascript" src="../../Definition/ReplaceInnerHTML.js"></script>
  11. <script type="text/javascript">
  12. <!--
  13.     function CreateContent()
  14.     {
  15.         ReplaceInnerHTML("Target1", "nxt-kimball3_TimingReportHeader_T.xml", "../../Definition/Common_Header.xsl");
  16.         ReplaceInnerHTML("Target2", "nxt-kimball3_TimingReportHeaderUnit_T.xml", "../../Definition/Common_HeaderUnit.xsl");
  17.         ReplaceInnerHTML("Target3", "nxt-kimball3_TimingReportUnitNXT_T.xml", "../../Definition/NXT_Unit.xsl");
  18.     }
  19. -->
  20. </script>
  21.  
  22. <body onload="CreateContent()">
  23. <img class="logo" src="https://bytes.com/img/logo.jpg" width="70">
  24. <h1>Timing Report</h1>
  25. <blockquote>
  26.     <hr>
  27.     <div id="Target1"></div>
  28.     <hr>
  29.     <div id="Target2"></div>
  30.     <hr>
  31.     <div id="Target3"></div>
  32.     <h3>Cycle time detail</h3>
  33.     <pre>
  34.  
  35.     Number Module Head Stage    PP Cycle Nozzle Name     Nozzle Count Qty
  36.     ------+------+----+--------+--------+--------------+------------+---
  37.          1 M3-2   V12  DP20           11
  38.                                          R07-010-070               12 128
  39.          2 M3-2   V12  DP20           11
  40.                                          R07-010-070                9  96
  41.                                          R07-013-070                3  32
  42.          3 M3-2   V12  DP20           12
  43.                                          R07-010-070               10 112
  44.                                          R07-025-070                2  16
  45.          4 M3-2   V12  DP20           16
  46.                                          R07-013-070                6  96
  47.          5 M3-2   H12S DP20           16
  48.                                          R07-013-070                8  80
  49.                                          R07-025-070                2  32
  50.          6 M3-2   H04S DP20           16
  51.                                          R19-013-155-S              2  32
  52.                                          R19-037-155-S              1  16
  53.          7 M6-2   H02  TUM            16
  54.                                          R36-070G-260               1  16
  55.                                          R36-100G-260               1  16
  56.  
  57.      Nozzle required number 
  58.     ----------------------------------------
  59.      Nozzle Name                        Qty
  60.     ---------------------------------+------
  61.      R07-010-070                         31
  62.      R07-013-070                         17
  63.      R19-013-155-S                        2
  64.      R07-025-070                          4
  65.      R19-037-155-S                        1
  66.      R36-070G-260                         1
  67.      R36-100G-260                         1
  68.  
  69.  
  70.      Feeder required number 
  71.     ----------------------------------------
  72.      Feeder Name                        Qty
  73.     ---------------------------------+------
  74.      KT-0800-180                         21
  75.      KT-1200-180                          2
  76.      KT-1600-380                          1
  77.      KT-2400-380                          1
  78.      KT-3200-380                          1
  79.     </pre>
  80.     <hr>
  81. </blockquote>
  82. </body>
  83. </html>
  84.  
And i want to exctract the data in the data table starting on line 35 going thru line 55 in the above code block into a Access table.

If You have some sample code it will best.

Thank You all.
May 5 '16 #1

✓ answered by jforbes

Another option is to use RegEx to parse out the Records. RegEx is a pretty complicated concept, but it works wonders in a situation like this.

In an attempt to prove it out, I mocked up a Form with two TextBoxes and a Button which when the Button is clicked, it parses the txtHTML TextBox and finds all "Records" that match a given pattern and then lists them in the other TextBox:
Expand|Select|Wrap|Line Numbers
  1. Private Sub cmdParseHTML_Click()
  2.  
  3.     Dim sHTML As String
  4.     Dim sRecords As String
  5.     Dim sPattern As String
  6.     Dim vRecords As Variant
  7.     Dim iCount As Integer
  8.  
  9.     sPattern = "[0-9]+ [a-zA-Z0-9-]+ [a-zA-Z0-9]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+[a-zA-Z0-9-]+"
  10.     sHTML = Nz(Me.txtHTML.Value, "")
  11.  
  12.     vRecords = getRxMatches(sHTML, sPattern, True, True, True)
  13.  
  14.     If UBound(vRecords) > 0 Then
  15.         For iCount = LBound(vRecords) To UBound(vRecords)
  16.             sRecords = sRecords & "Record " & (iCount + 1) & " contains: " & vRecords(iCount) & vbCrLf
  17.         Next iCount
  18.         sRecords = sRecords & "Total Records Found: " & iCount
  19.     Else
  20.         sRecords = "Could not find any records"
  21.     End If
  22.  
  23.     Me.txtRecords.Value = sRecords
  24.  
  25. End Sub
The above code relies on the following function:
Expand|Select|Wrap|Line Numbers
  1. Public Function getRxMatches(ByVal SourceString As String, ByVal Pattern As String, Optional ByVal IgnoreCase As Boolean = True, Optional ByVal MultiLine As Boolean = True, Optional ByVal MatchGlobal As Boolean = True) As Variant
  2. ' Source:  http://bytecomb.com/regular-expressions-in-vba/
  3. ' Reference for Regex Patterns:  http://www.regexr.com/
  4.  
  5.     Dim oMatch As Object
  6.     Dim arrMatches() 'As Object
  7.     Dim lngCount As Long
  8.     Dim RegExp As Object
  9.  
  10.     Set RegExp = CreateObject("vbscript.regexp")
  11.  
  12.     ' Initialize to an empty array
  13.     arrMatches = Array()
  14.     With RegExp
  15.         .MultiLine = MultiLine
  16.         .IgnoreCase = IgnoreCase
  17.         .Global = MatchGlobal
  18.         .Pattern = Pattern
  19.         For Each oMatch In .Execute(SourceString)
  20.             ReDim Preserve arrMatches(lngCount)
  21.             arrMatches(lngCount) = oMatch.Value
  22.             lngCount = lngCount + 1
  23.         Next
  24.     End With
  25.  
  26.     getRxMatches = arrMatches
  27. End Function
The trick to make this work for you is to create a reliable pattern. The line:
Expand|Select|Wrap|Line Numbers
  1. sPattern = "[0-9]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+[a-zA-Z0-9-]+"
is where the Pattern is defined. This Pattern, the one in the code, is defined as a Number followed by a space then by five strings containing Alphanumerics (and dashes) separated by spaces. This pattern is just to prove the concept; you'll probably need to work out the exact pattern for yourself. You can use the really cool website at http://www.regexr.com/ to help you troubleshoot your patterns. There's also this documentation from Microsoft to scare the crap out of you: Introduction to Regular Expressions (Scripting)

3 2385
PhilOfWalton
1,430 Expert 1GB
I use something similar to extract share prices from a web page or extract data from an email.

Roughly, I have a table of "Codes"
Expand|Select|Wrap|Line Numbers
  1. CodeID
  2. TableID            Points to table to which the code refers
  3. LinkedTableKeyID   Points to Linked table key
  4. CRBefore    Y/N    Carriage Return before data
  5. SearchString       String being searched for
  6. CRAfter            Carriage Return after data
  7. CodeOrder          Order of search
  8. CodeIsUsed  Y/N    Is this string used or is it one for information for other web sites
  9. FieldName          If we want to load the value found into a field
  10. FieldDesc          If we want to show a description of the field found
  11. CodeBody    Y/N    True = Body of Email, False = Head
  12. CodeRemarks        Any odd comments
  13.  
Here is an example
Expand|Select|Wrap|Line Numbers
  1. SearchString    CodeIsUsed    CodeOrder    CRBefore    CRAfter    FieldName    FieldDesc    CodeBody    CodeRemarks    LinkedTableKeyID    CodeID    ExistingLinkedTableKeyID
  2. "<input type=""hidden"" id=""citicode"" name=""citicode"" value=""KMF8"" />"    -1    10    0    0            -1        204    142    
  3. "<div class=""contentTop1"">KMF8</div>"    -1    20    0    0            -1        204    143    
  4. "<div class=""contentTop1"">GBX&nbsp;</div>"    -1    30    0    0            -1        204    144    
  5. "<div class=""contentTop1"">GBX"    -1    40    0    0    SharePrice        -1        204    145    
  6. </div>    -1    50    0    0            -1        204    153    
  7.  
Looks horrible.
Basically, I read through the HTML until I find the first code, then continue reading through till I find the next code etc. Eventually I find the code just before the information I want (in this case share price) and load it into a field called [SharePrice] in my table.

Obviously a brief outline, but if it looks promising, we can go into it further.

Phil
May 5 '16 #2
jforbes
1,107 Expert 1GB
Another option is to use RegEx to parse out the Records. RegEx is a pretty complicated concept, but it works wonders in a situation like this.

In an attempt to prove it out, I mocked up a Form with two TextBoxes and a Button which when the Button is clicked, it parses the txtHTML TextBox and finds all "Records" that match a given pattern and then lists them in the other TextBox:
Expand|Select|Wrap|Line Numbers
  1. Private Sub cmdParseHTML_Click()
  2.  
  3.     Dim sHTML As String
  4.     Dim sRecords As String
  5.     Dim sPattern As String
  6.     Dim vRecords As Variant
  7.     Dim iCount As Integer
  8.  
  9.     sPattern = "[0-9]+ [a-zA-Z0-9-]+ [a-zA-Z0-9]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+[a-zA-Z0-9-]+"
  10.     sHTML = Nz(Me.txtHTML.Value, "")
  11.  
  12.     vRecords = getRxMatches(sHTML, sPattern, True, True, True)
  13.  
  14.     If UBound(vRecords) > 0 Then
  15.         For iCount = LBound(vRecords) To UBound(vRecords)
  16.             sRecords = sRecords & "Record " & (iCount + 1) & " contains: " & vRecords(iCount) & vbCrLf
  17.         Next iCount
  18.         sRecords = sRecords & "Total Records Found: " & iCount
  19.     Else
  20.         sRecords = "Could not find any records"
  21.     End If
  22.  
  23.     Me.txtRecords.Value = sRecords
  24.  
  25. End Sub
The above code relies on the following function:
Expand|Select|Wrap|Line Numbers
  1. Public Function getRxMatches(ByVal SourceString As String, ByVal Pattern As String, Optional ByVal IgnoreCase As Boolean = True, Optional ByVal MultiLine As Boolean = True, Optional ByVal MatchGlobal As Boolean = True) As Variant
  2. ' Source:  http://bytecomb.com/regular-expressions-in-vba/
  3. ' Reference for Regex Patterns:  http://www.regexr.com/
  4.  
  5.     Dim oMatch As Object
  6.     Dim arrMatches() 'As Object
  7.     Dim lngCount As Long
  8.     Dim RegExp As Object
  9.  
  10.     Set RegExp = CreateObject("vbscript.regexp")
  11.  
  12.     ' Initialize to an empty array
  13.     arrMatches = Array()
  14.     With RegExp
  15.         .MultiLine = MultiLine
  16.         .IgnoreCase = IgnoreCase
  17.         .Global = MatchGlobal
  18.         .Pattern = Pattern
  19.         For Each oMatch In .Execute(SourceString)
  20.             ReDim Preserve arrMatches(lngCount)
  21.             arrMatches(lngCount) = oMatch.Value
  22.             lngCount = lngCount + 1
  23.         Next
  24.     End With
  25.  
  26.     getRxMatches = arrMatches
  27. End Function
The trick to make this work for you is to create a reliable pattern. The line:
Expand|Select|Wrap|Line Numbers
  1. sPattern = "[0-9]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+[a-zA-Z0-9-]+"
is where the Pattern is defined. This Pattern, the one in the code, is defined as a Number followed by a space then by five strings containing Alphanumerics (and dashes) separated by spaces. This pattern is just to prove the concept; you'll probably need to work out the exact pattern for yourself. You can use the really cool website at http://www.regexr.com/ to help you troubleshoot your patterns. There's also this documentation from Microsoft to scare the crap out of you: Introduction to Regular Expressions (Scripting)
May 5 '16 #3
Phil, jforbes,

Thank You,

This Helps a lot. I will try it out and i hope it will work :)
May 9 '16 #4

Sign in to post your reply or Sign up for a free account.

Similar topics

3
by: zaphod | last post by:
I want to use MS Accesss as a front end for an existing MySQL database. Since Acceess doesn't have the same data types, eg. ENUM, am I tied to the smaller subset of types supported by Access? I'm...
4
by: phong.lee | last post by:
Here is an issue i've been trying to figure out which i do not have an answer to. Basically i have an excel spreadsheet which has data that i copy and paste from another resource. I also created...
4
by: dubing | last post by:
Hello, Here is what we need to do. On a Linux Apache server, we need to write PHP or Perl scripts to pull the data directly from the Access database and display extracted data on the web. ...
13
by: nitenmistry | last post by:
Hi, I'm creating a HTML report using Python and would like to import some tabular data which is in tab delimited format into the HTML page. The data needs to be displayed as a table within the...
4
by: Debbiedo | last post by:
My software program outputs an XML Driving Directions file that I need to input into an Access table (although if need be I can import a dbf or xls) so that I can relate one of the fields...
9
by: Hamayun Khan | last post by:
Hi I m using form like below <form action='' method = post> <input type=text name=txt id=txt> <select name=region id=region multiple> <option value=1>ONE</option> <option value=2>Two</option>...
5
gpraghuram
by: gpraghuram | last post by:
Hi, I want to know how to extract information from a XML file using a XSL style sheet using perl. I have alreday written a XSL for a XML file. My requirement is to apply this XSL on the xml which...
1
by: typedefcoder | last post by:
I have two files, xml and an html and need to extract data from these on certain patterns. my XML file is pretty well formatted and i can use getline to read a line and search data between tags. ...
6
by: prasad joshi | last post by:
i want to extract text data from notepad file using access vba 2007 ..... can any body tell me how to do this please give me sample examples if anyone have.
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.