Hello Everybody,
Long time since my last log here.
My question is how to extract data from HTML file, which is on my computer, using vba. This all is done in Access.
One very important thing is that the file does not contain tables.
Below the data in the HTML file: -
<html>
-
<head>
-
<meta http-equiv="content-type" content="text/html">
-
<meta name="copy-right" content="Fuji Machine Mfg. Co., Ltd.">
-
<meta http-equiv="Expires" content="{$now}">
-
<link rel="stylesheet" href="../../Definition/Timing_IndexReportStyle.css" type="text/css">
-
<title>nxt-kimball3 Timing Report</title>
-
</head>
-
-
<script type="text/javascript" src="../../Definition/ReplaceInnerHTML.js"></script>
-
<script type="text/javascript">
-
<!--
-
function CreateContent()
-
{
-
ReplaceInnerHTML("Target1", "nxt-kimball3_TimingReportHeader_T.xml", "../../Definition/Common_Header.xsl");
-
ReplaceInnerHTML("Target2", "nxt-kimball3_TimingReportHeaderUnit_T.xml", "../../Definition/Common_HeaderUnit.xsl");
-
ReplaceInnerHTML("Target3", "nxt-kimball3_TimingReportUnitNXT_T.xml", "../../Definition/NXT_Unit.xsl");
-
}
-
-->
-
</script>
-
-
<body onload="CreateContent()">
-
<img class="logo" src="https://bytes.com/img/logo.jpg" width="70">
-
<h1>Timing Report</h1>
-
<blockquote>
-
<hr>
-
<div id="Target1"></div>
-
<hr>
-
<div id="Target2"></div>
-
<hr>
-
<div id="Target3"></div>
-
<h3>Cycle time detail</h3>
-
<pre>
-
-
Number Module Head Stage PP Cycle Nozzle Name Nozzle Count Qty
-
------+------+----+--------+--------+--------------+------------+---
-
1 M3-2 V12 DP20 11
-
R07-010-070 12 128
-
2 M3-2 V12 DP20 11
-
R07-010-070 9 96
-
R07-013-070 3 32
-
3 M3-2 V12 DP20 12
-
R07-010-070 10 112
-
R07-025-070 2 16
-
4 M3-2 V12 DP20 16
-
R07-013-070 6 96
-
5 M3-2 H12S DP20 16
-
R07-013-070 8 80
-
R07-025-070 2 32
-
6 M3-2 H04S DP20 16
-
R19-013-155-S 2 32
-
R19-037-155-S 1 16
-
7 M6-2 H02 TUM 16
-
R36-070G-260 1 16
-
R36-100G-260 1 16
-
-
Nozzle required number
-
----------------------------------------
-
Nozzle Name Qty
-
---------------------------------+------
-
R07-010-070 31
-
R07-013-070 17
-
R19-013-155-S 2
-
R07-025-070 4
-
R19-037-155-S 1
-
R36-070G-260 1
-
R36-100G-260 1
-
-
-
Feeder required number
-
----------------------------------------
-
Feeder Name Qty
-
---------------------------------+------
-
KT-0800-180 21
-
KT-1200-180 2
-
KT-1600-380 1
-
KT-2400-380 1
-
KT-3200-380 1
-
</pre>
-
<hr>
-
</blockquote>
-
</body>
-
</html>
-
And i want to exctract the data in the data table starting on line 35 going thru line 55 in the above code block into a Access table.
If You have some sample code it will best.
Thank You all.
Another option is to use RegEx to parse out the Records. RegEx is a pretty complicated concept, but it works wonders in a situation like this.
In an attempt to prove it out, I mocked up a Form with two TextBoxes and a Button which when the Button is clicked, it parses the txtHTML TextBox and finds all "Records" that match a given pattern and then lists them in the other TextBox: - Private Sub cmdParseHTML_Click()
-
-
Dim sHTML As String
-
Dim sRecords As String
-
Dim sPattern As String
-
Dim vRecords As Variant
-
Dim iCount As Integer
-
-
sPattern = "[0-9]+ [a-zA-Z0-9-]+ [a-zA-Z0-9]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+[a-zA-Z0-9-]+"
-
sHTML = Nz(Me.txtHTML.Value, "")
-
-
vRecords = getRxMatches(sHTML, sPattern, True, True, True)
-
-
If UBound(vRecords) > 0 Then
-
For iCount = LBound(vRecords) To UBound(vRecords)
-
sRecords = sRecords & "Record " & (iCount + 1) & " contains: " & vRecords(iCount) & vbCrLf
-
Next iCount
-
sRecords = sRecords & "Total Records Found: " & iCount
-
Else
-
sRecords = "Could not find any records"
-
End If
-
-
Me.txtRecords.Value = sRecords
-
-
End Sub
The above code relies on the following function: - Public Function getRxMatches(ByVal SourceString As String, ByVal Pattern As String, Optional ByVal IgnoreCase As Boolean = True, Optional ByVal MultiLine As Boolean = True, Optional ByVal MatchGlobal As Boolean = True) As Variant
-
' Source: http://bytecomb.com/regular-expressions-in-vba/
-
' Reference for Regex Patterns: http://www.regexr.com/
-
-
Dim oMatch As Object
-
Dim arrMatches() 'As Object
-
Dim lngCount As Long
-
Dim RegExp As Object
-
-
Set RegExp = CreateObject("vbscript.regexp")
-
-
' Initialize to an empty array
-
arrMatches = Array()
-
With RegExp
-
.MultiLine = MultiLine
-
.IgnoreCase = IgnoreCase
-
.Global = MatchGlobal
-
.Pattern = Pattern
-
For Each oMatch In .Execute(SourceString)
-
ReDim Preserve arrMatches(lngCount)
-
arrMatches(lngCount) = oMatch.Value
-
lngCount = lngCount + 1
-
Next
-
End With
-
-
getRxMatches = arrMatches
-
End Function
The trick to make this work for you is to create a reliable pattern. The line: - sPattern = "[0-9]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+[a-zA-Z0-9-]+"
is where the Pattern is defined. This Pattern, the one in the code, is defined as a Number followed by a space then by five strings containing Alphanumerics (and dashes) separated by spaces. This pattern is just to prove the concept; you'll probably need to work out the exact pattern for yourself. You can use the really cool website at http://www.regexr.com/ to help you troubleshoot your patterns. There's also this documentation from Microsoft to scare the crap out of you: Introduction to Regular Expressions (Scripting) 3 2385
I use something similar to extract share prices from a web page or extract data from an email.
Roughly, I have a table of "Codes" -
CodeID
-
TableID Points to table to which the code refers
-
LinkedTableKeyID Points to Linked table key
-
CRBefore Y/N Carriage Return before data
-
SearchString String being searched for
-
CRAfter Carriage Return after data
-
CodeOrder Order of search
-
CodeIsUsed Y/N Is this string used or is it one for information for other web sites
-
FieldName If we want to load the value found into a field
-
FieldDesc If we want to show a description of the field found
-
CodeBody Y/N True = Body of Email, False = Head
-
CodeRemarks Any odd comments
-
Here is an example -
SearchString CodeIsUsed CodeOrder CRBefore CRAfter FieldName FieldDesc CodeBody CodeRemarks LinkedTableKeyID CodeID ExistingLinkedTableKeyID
-
"<input type=""hidden"" id=""citicode"" name=""citicode"" value=""KMF8"" />" -1 10 0 0 -1 204 142
-
"<div class=""contentTop1"">KMF8</div>" -1 20 0 0 -1 204 143
-
"<div class=""contentTop1"">GBX </div>" -1 30 0 0 -1 204 144
-
"<div class=""contentTop1"">GBX" -1 40 0 0 SharePrice -1 204 145
-
</div> -1 50 0 0 -1 204 153
-
Looks horrible.
Basically, I read through the HTML until I find the first code, then continue reading through till I find the next code etc. Eventually I find the code just before the information I want (in this case share price) and load it into a field called [SharePrice] in my table.
Obviously a brief outline, but if it looks promising, we can go into it further.
Phil
Another option is to use RegEx to parse out the Records. RegEx is a pretty complicated concept, but it works wonders in a situation like this.
In an attempt to prove it out, I mocked up a Form with two TextBoxes and a Button which when the Button is clicked, it parses the txtHTML TextBox and finds all "Records" that match a given pattern and then lists them in the other TextBox: - Private Sub cmdParseHTML_Click()
-
-
Dim sHTML As String
-
Dim sRecords As String
-
Dim sPattern As String
-
Dim vRecords As Variant
-
Dim iCount As Integer
-
-
sPattern = "[0-9]+ [a-zA-Z0-9-]+ [a-zA-Z0-9]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+[a-zA-Z0-9-]+"
-
sHTML = Nz(Me.txtHTML.Value, "")
-
-
vRecords = getRxMatches(sHTML, sPattern, True, True, True)
-
-
If UBound(vRecords) > 0 Then
-
For iCount = LBound(vRecords) To UBound(vRecords)
-
sRecords = sRecords & "Record " & (iCount + 1) & " contains: " & vRecords(iCount) & vbCrLf
-
Next iCount
-
sRecords = sRecords & "Total Records Found: " & iCount
-
Else
-
sRecords = "Could not find any records"
-
End If
-
-
Me.txtRecords.Value = sRecords
-
-
End Sub
The above code relies on the following function: - Public Function getRxMatches(ByVal SourceString As String, ByVal Pattern As String, Optional ByVal IgnoreCase As Boolean = True, Optional ByVal MultiLine As Boolean = True, Optional ByVal MatchGlobal As Boolean = True) As Variant
-
' Source: http://bytecomb.com/regular-expressions-in-vba/
-
' Reference for Regex Patterns: http://www.regexr.com/
-
-
Dim oMatch As Object
-
Dim arrMatches() 'As Object
-
Dim lngCount As Long
-
Dim RegExp As Object
-
-
Set RegExp = CreateObject("vbscript.regexp")
-
-
' Initialize to an empty array
-
arrMatches = Array()
-
With RegExp
-
.MultiLine = MultiLine
-
.IgnoreCase = IgnoreCase
-
.Global = MatchGlobal
-
.Pattern = Pattern
-
For Each oMatch In .Execute(SourceString)
-
ReDim Preserve arrMatches(lngCount)
-
arrMatches(lngCount) = oMatch.Value
-
lngCount = lngCount + 1
-
Next
-
End With
-
-
getRxMatches = arrMatches
-
End Function
The trick to make this work for you is to create a reliable pattern. The line: - sPattern = "[0-9]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+ [a-zA-Z0-9-]+[a-zA-Z0-9-]+"
is where the Pattern is defined. This Pattern, the one in the code, is defined as a Number followed by a space then by five strings containing Alphanumerics (and dashes) separated by spaces. This pattern is just to prove the concept; you'll probably need to work out the exact pattern for yourself. You can use the really cool website at http://www.regexr.com/ to help you troubleshoot your patterns. There's also this documentation from Microsoft to scare the crap out of you: Introduction to Regular Expressions (Scripting)
Phil, jforbes,
Thank You,
This Helps a lot. I will try it out and i hope it will work :)
Sign in to post your reply or Sign up for a free account.
Similar topics
by: zaphod |
last post by:
I want to use MS Accesss as a front end for an existing MySQL database.
Since Acceess doesn't have the same data types, eg. ENUM, am I tied to the
smaller subset of types supported by Access? I'm...
|
by: phong.lee |
last post by:
Here is an issue i've been trying to figure out which i do not have an
answer to. Basically i have an excel spreadsheet which has data that
i copy and paste from another resource. I also created...
|
by: dubing |
last post by:
Hello,
Here is what we need to do. On a Linux Apache server, we need to
write PHP or Perl scripts to pull the data directly from the Access
database and display extracted data on the web. ...
|
by: nitenmistry |
last post by:
Hi,
I'm creating a HTML report using Python and would like to import some tabular data which is in tab delimited format into the HTML page.
The data needs to be displayed as a table within the...
|
by: Debbiedo |
last post by:
My software program outputs an XML Driving Directions file that I need
to input into an Access table (although if need be I can import a dbf
or xls) so that I can relate one of the fields...
|
by: Hamayun Khan |
last post by:
Hi
I m using form like below
<form action='' method = post>
<input type=text name=txt id=txt>
<select name=region id=region multiple>
<option value=1>ONE</option>
<option value=2>Two</option>...
|
by: gpraghuram |
last post by:
Hi,
I want to know how to extract information from a XML file using a XSL style sheet using perl.
I have alreday written a XSL for a XML file.
My requirement is to apply this XSL on the xml which...
|
by: typedefcoder |
last post by:
I have two files, xml and an html and need to extract data from these on certain patterns. my XML file is pretty well formatted and i can use getline to read a line and search data between tags.
...
|
by: prasad joshi |
last post by:
i want to extract text data from notepad file using access vba 2007 .....
can any body tell me how to do this
please give me sample examples if anyone have.
|
by: Charles Arthur |
last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
|
by: emmanuelkatto |
last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud.
Please let me know.
Thanks!
Emmanuel
|
by: BarryA |
last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
|
by: Hystou |
last post by:
There are some requirements for setting up RAID:
1. The motherboard and BIOS support RAID configuration.
2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
| |