How to speed up a code loop with INSERT INTO query?

Needing to import and parse data from a large PDF file into an Access
2002 table: I start by converting the PDF file to an html file. Then
I read this html text file, line by line, into a table using a code
loop and an INSERT INTO query. About 800,000 records of raw text.
Later, I can loop through and parse these 800,000 strings into
usable data using more code.

The problem I have is that the conversion of the text file, using a
code loop and an INSERT INTO query, runs at about 30 records per
second. Nearly 8 hours total. I welcome suggestions as to how to
speed this up (either code-based and/or hardware-based).

'=====Loop import text file into table code=====
' dumps.html is a 14,000 KB text file, with 800,000 lines.
' Click() and TaDa() are routines that make a sound so I can hear
' the progress of the loop.
' The table EWBsRaw is local and has two fields, ID (an autonumber)
' and Field1 (a 255-size text field).
'===
Public Sub ImportPDFtable()
Dim TheLine As Long
Dim F As Integer
Dim txtLine As String
TheLine = 0
F = FreeFile
Open "G:\BOD\DEWRs\dumps.html" For Input As #F
Do Until EOF(F)
    Line Input #F, txtLine
    TheLine = TheLine + 1
    DoCmd.RunSQL "INSERT INTO EWBsRaw ( Field1 ) SELECT '" & _
        Replace(Left(txtLine, 254), "'", "") & "' AS Expr1"
    Call Click
Loop
Close #F
Debug.Print TheLine
Call TaDa
End Sub
'=====end code=======

Aug 2 '07 #1
Why not just import the html file into a table using the built-in
import features? To do it manually, you would go to File - Get
External Data - Import and select the html file. To do it in code,
you would use DoCmd.TransferText. On the import, your lines would
automatically truncate to 255 characters if that were the size of the
field you were importing to. You could do a global replace of the '
either before or after the import.
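
Air code for the import (untested; it assumes the file is first
renamed with a .txt extension, and you may need a saved import
specification so that each raw line lands in the single text field):

' spec name left blank; Access will guess the layout, or use a saved spec
DoCmd.TransferText acImportDelim, , "EWBsRaw", "G:\BOD\DEWRs\dumps.txt", False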

As far as your code goes, I would expect that the delay has a lot to
do with the Replace(Left(...)) calls and the sound routine, but I
imagine you tested it without the Click.

hope this helps
-John

SaltyBoat wrote:
<snip original post and code>
Aug 2 '07 #2

"SaltyBoat" <ha*****@gmail.comwrote in message
news:11**********************@e9g2000prf.googlegro ups.com...
Needing to import and parse data from a large PDF file into an Access
2002 table: I start by converted the PDF file to a html file. Then
I read this html text file, line by line, into a table using a code
loop and an INSERT INTO query. About 800,000 records of raw text.
Later, I can then loop through and parse these 800,000 strings into
usable data using more code.

The problem I have is that the conversion of the text file, using a
code loop and an INSERT INTO query, runs at the speed of about 30
records per second. Nearly 8 hours total. I welcome suggestions as
to how to speed this up. (Either code based, and/or hardware based.)

'=====Loop import text file into table code=====
' dumps.html is a 14,000 KB text file, with 800,000 lines.
' Click() and TaDa() are code that makes sound so I can hear the
progress of the loop.
' The table EWBsRaw is local and has two fields, ID (an autonumber)
and Field1 (255 size text field).
'===
Public Sub ImportPDFtable()
TheLine = 0
F = FreeFile
Open "G:\BOD\DEWRs\dumps.html" For Input As #F
Do Until EOF(F)
Line Input #F, txtLine
TheLine = TheLine + 1
DoCmd.RunSQL "INSERT INTO EWBsRaw ( Field1 ) SELECT '" &
Replace(Left(txtLine, 254), "'", "") & "' AS Expr1"
Call Click
Loop
Close #F
Debug.Print TheLine
Call TaDa
End Sub
'=====end code=======
If a query can't do it as John Welch posted, why not open EWBsRaw as a
recordset to insert the records? RunSQL needs to open and close the
table for each new record, and Open/Close can be very expensive
operations. Also, what is "Call Click", and how long does it take to
run?

Air Code

dim rsEWB as recordset

set rsEWB = currentdb.openrecordset( "EWBsRaw")
....
Do Until ...
....
rsEWB.addnew
rsEWB![Field1] = Replace(Left(txtLine, 254), "'", "")
rsEWB.Update
....
Loop
....
rsEWB.Close
set rsEWB = nothing
Aug 2 '07 #3
open a DAO recordset in Append mode and use AddNew .. Update
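
For example (air code; dbAppendOnly opens the dynaset write-only, so
Access skips loading the existing rows):

Dim rsEWB As DAO.Recordset
Set rsEWB = CurrentDb.OpenRecordset("EWBsRaw", dbOpenDynaset, dbAppendOnly)
' ... then rsEWB.AddNew / rsEWB.Update inside the loop, as below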

SaltyBoat wrote:
<snip original post and code>
Aug 2 '07 #4
I'd link the text data (not import it), then loop/parse through it,
then insert the usable data once it's clean. This way the bulk raw
data stays out of the database, and the information is saved already
trimmed and clean.
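
Air code for the link (untested; "EWBsRawLinked" is just a placeholder
name, and a saved specification may again be needed for
one-line-per-row text):

' creates a linked table over the text file instead of importing it
DoCmd.TransferText acLinkDelim, , "EWBsRawLinked", "G:\BOD\DEWRs\dumps.txt", False

Then run the parsing queries against EWBsRawLinked and INSERT only the
cleaned rows into a permanent table.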

On Aug 2, 12:59 pm, SaltyBoat <hall...@gmail.com> wrote:
<snip original post and code>

Aug 2 '07 #5
Thanks everyone for the help and advice. I tried the link-to-html and
the DoCmd.TransferText suggestions and found that they seem to require
that the data embedded in the html use an html table format, which my
data does not have. I then tried the advice of using a recordset in
code, versus the DoCmd.RunSQL "...INSERT INTO...", and using a
recordset helps quite a bit. I found that the speed of my record
processing more than tripled. Very good improvement! And, a couple of
people speculated about the delay of the Call Click sound routine;
upon testing I found it to have no perceptible hit on performance.
Again, thanks for the help, this Usenet group is great! Here is the
working, debugged 'real code' that I ended up using, which is only
slightly modified from paii, Ron's 'air code'. Thanks, Ron.

I want to speed this up even more.
I am curious about whether faster hardware can help.
Is the likely bottleneck the CPU processing time?
Or the speed of writes to disk?

===========begin code========
Public Sub importPDFrecordset()
Dim rsEWB As DAO.Recordset
Dim F As Integer
Dim txtLine As String
Set rsEWB = CurrentDb.OpenRecordset("EWBsRaw")
F = FreeFile
Open "G:\BOD\DEWRs\dumps.html" For Input As #F
Do Until EOF(F)
    Line Input #F, txtLine
    rsEWB.AddNew
    rsEWB![Field1] = Replace(Left(txtLine, 254), "'", "")
    rsEWB.Update
Loop
Close #F
rsEWB.Close
Set rsEWB = Nothing
End Sub
============end code=========

On Aug 2, 10:30 am, "paii, Ron" <n...@no.com> wrote:
<snip quoted post and air code>

Aug 2 '07 #6
I was thinking that you would use TransferText and just treat the file
as a text file rather than an html file, and have it import each line
into a single field.
If you want to do it with the recordset method and speed it up, you
might consider replacing all the ' characters with a text editor
beforehand, and then just saying rsEWB![Field1] = txtLine

SaltyBoat wrote:
<snip quoted post and code>

Aug 3 '07 #7
On Aug 2, 5:05 pm, SaltyBoat <hall...@gmail.com> wrote:
<snip quoted post and code>

The bang may be slow. You could probably speed it up a little bit by
declaring the field, e.g.

Dim TheField As DAO.Field

After opening the recordset:

Set TheField = rsEWB![Field1]

And later:

TheField = Replace(Left(txtLine, 254), "'", "")
or
TheField.Value = Replace(Left(txtLine, 254), "'", "")
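
Pulled together with your recordset loop, that would look something
like this (untested air code, same table and path as before):

Public Sub importPDFrecordsetFast()
Dim rsEWB As DAO.Recordset
Dim TheField As DAO.Field
Dim F As Integer
Dim txtLine As String
Set rsEWB = CurrentDb.OpenRecordset("EWBsRaw")
Set TheField = rsEWB![Field1]  ' resolve the bang lookup once
F = FreeFile
Open "G:\BOD\DEWRs\dumps.html" For Input As #F
Do Until EOF(F)
    Line Input #F, txtLine
    rsEWB.AddNew
    TheField.Value = Replace(Left(txtLine, 254), "'", "")
    rsEWB.Update
Loop
Close #F
rsEWB.Close
Set rsEWB = Nothing
End Sub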

And you might reduce time a little more by using a Regular Expression
to do the replace. This example removes HTML tags:

Public Function RemoveTags(ByVal InputString As String) As String
' requires that Visual Basic Script be installed
' VBS is installed by default with Windows
' it disappears only with direct action by an administrator
Dim re As Object
Set re = CreateObject("VBScript.RegExp")
InputString = Trim(InputString)
With re
    .Global = True
    .IgnoreCase = True
    .Pattern = "\<(.|\n)*?\>"
    RemoveTags = .Replace(InputString, "")
End With
Set re = Nothing
End Function

I think the pattern for replacing a single quote would be "\'"

Of course, you don't want to keep instantiating and releasing that RE
object for every text line, so you'd probably want to give it module
scope; or load the whole file into a string, do the replace on the
whole string, save the string to a file again (maybe a new file), and
then import the individual lines. That might be more efficient as
well.

Of course faster hardware will help. I have a 64-bit machine, and a
laptop. But I develop on an old hand-me-down clunker, an eMachines
1742. I figure if the app is fast enough on it, it'll be fast enough
on anything. When I want something to whir I go over to the 64-bit
machine. Sometimes I sit and wait for a long time before I realize
that it completed the task so fast I didn't notice that it started.

Aug 3 '07 #8
Thanks John Welch, I just did a menu command 'File|Get External Data|
Import...|Text Files' on the html file renamed as a txt file. (I am
guessing that this menu wizard uses the same underlying TransferText
method.) Total import of 907,000 lines of text took about three
minutes. This is a fantastic improvement over the eight hours it used
to take using code and a DoCmd.RunSQL INSERT INTO statement. <grin>

Also, prior to import I used a text editor to strip out the
troublesome 'single quote' characters, taking about five seconds.
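
If I ever need to skip the text-editor step too, I could probably do
the whole thing in code along these lines (untested air code, in the
spirit of the suggestions above: read the file into one string, strip
the quotes in memory, write a cleaned .txt copy, then import it with
the same TransferText engine the wizard uses):

Public Sub ImportDumpFile()
Dim F As Integer
Dim strAll As String
' slurp the whole file into one string
F = FreeFile
Open "G:\BOD\DEWRs\dumps.html" For Binary As #F
strAll = Space$(LOF(F))
Get #F, , strAll
Close #F
' strip the troublesome single quotes in memory
strAll = Replace(strAll, "'", "")
' write a cleaned copy with a .txt extension for the import engine
F = FreeFile
Open "G:\BOD\DEWRs\dumps.txt" For Output As #F
Print #F, strAll;
Close #F
' same engine the import wizard uses
DoCmd.TransferText acImportDelim, , "EWBsRaw", "G:\BOD\DEWRs\dumps.txt", False
End Sub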

On Aug 2, 8:32 pm, John Welch <so...@nospam.com> wrote:
<snip quoted post and code>

Aug 3 '07 #9

This discussion thread is closed

Replies have been disabled for this discussion.
