By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
425,543 Members | 1,887 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 425,543 IT Pros & Developers. It's quick & easy.

Open very large text files using VBA?

P: 2
Is there a way to open very large text files in VBA?

I wrote some code in VBA under Excel 2003 to read database extracts, add the field names as the first line, and to convert the fixed length fields to tab delimited. This allowed the data to be imported into a data analysis tool. Easy so far.

The problem came when I tried to open the larger files. The biggest is around 350GB. VBA gave an error.

I've got a workaround by splitting the files into smaller chunks, around 1GB, and processing them, then importing the parts into the analysis tool. I'm pretty happy with the results, but it's not perfect because the big data files have to be split as a separate step.

Does anyone know of a way the splits could be managed from within VBA or, preferably, VBA could read the files without needing them split into smaller chunks?

Also, which other languages would allow me to process such big files? It must be possible, because the split programs manage to process them without crashing.

Thanks

James
Nov 6 '10 #1

✓ answered by CookieRevised

The limit with native VBA/VB6 file commands (like: open, close, filelen(), lof(), etc) is indeed 2GB. No matter how you access them (random, input, output, binary).

This because all those commands and functions use LONG as their datatype. And a LONG in VBA/VB6 is a signed 4 byte number, or to put it simply: ranging from -2.147.483.648 to max 2.147.483.647.

So, if you want to process files larger than 2GB, you'll need to use low-level Windows API's instead. Microsoft has provided a KB article which explains all, including a VBA/VB6 class which you can directly use:

https://support.microsoft.com/en-us/kb/189981

Share this Question
Share on Google+
8 Replies


ADezii
Expert 5K+
P: 8,619
I've never worked on any monstrous Text Files of which you describe, but I did find this Thread very interesting. Supposedly, if a Text File is well-formed and has a Database-like structure consisting of Rows and Columns, then you can use ADO to query the contents of the Text File. From the content of your description, your Text Files clearly qualify. I found some useful Code on the Internet and am currently trying to adapt it to your specific need. Is there any chance that you can send me, via my Personal E-Mail Address, a sample File that is close to what you normally experience? This would be for testing purposes, since I have no idea how well, or if at all, this approach will work.
Nov 6 '10 #2

NeoPa
Expert Mod 15k+
P: 31,399
My experience with text files in VBA is that 2GB is the limit of what can be handled. I may have that wrong and it can handle 4GB, but I think it's only 2. 2GB & 4GB are the numbers that can be represented by signed and unsigned 32-bit integers respectively. Actually, each value is short of the full GB boundary by one (2GB-1 and 4GB-1). Larger files can be opened of course. It's simply that there is no way to specify the position required in the file for I/O once you get past the 2GB/4GB boundary.

I'd be interested in finding a way to reference further into files in VBA myself. 64-bit machines would handle much, much more (16 Exabytes or 4 Gigabytes^2) natively, but I expect ordinary Access would restrict them to the scope of the 32-bits anyway. At least for the foreseeable future.

It's a shame, because RAM can be accessed further in than that by using segment registers, but files cannot benefit from that technique.
Nov 7 '10 #3

ADezii
Expert 5K+
P: 8,619
James, here is some Code that I have been experimenting with, using an ADO approach. I ran into a couple of snafus right off the bat, but I'll post it here for your review. For some strange reason, it will only read the first 1,006 Lines out of 5,735. Can't work on it too much now, since I am at work (LOL), but I'll get back to it in a day or two, at which time you will already have solved the problem! (LOL).
Expand|Select|Wrap|Line Numbers
  1. 'Must set a Reference to the Microsoft ActiveX Data Objects X.X Object Library
  2. Dim rsData As ADODB.Recordset
  3.  
  4. '******************************* USER DEFINED *******************************
  5. Const conPATH As String = "C:\Test\"                'PATH without Filename  *
  6. Const conTEXT_FILE_NAME As String = "MyData.txt"    'Text File Name only    *
  7. '****************************************************************************
  8.  
  9. Dim szConnect As String
  10. Dim strSQL As String
  11. Dim lngStart As Long
  12. Dim lngEnd As Long
  13. Dim lngCtr As Long
  14.  
  15. 'Create the Connection String. If the first row of the Text File has Column
  16. 'names, the last line above should be: "Extended Properties=Text;"
  17. szConnect = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
  18.             "Data Source=" & conPATH & ";" & _
  19.             "Extended Properties=""Text;HDR=NO"";"
  20.  
  21. 'Create the SQL statement.
  22.  strSQL = "SELECT * FROM " & conTEXT_FILE_NAME & ";"
  23.  
  24.  Set rsData = New ADODB.Recordset
  25.  
  26.  rsData.Open strSQL, szConnect, adOpenForwardOnly, _
  27.  adLockReadOnly, adCmdText
  28.  
  29.  DoCmd.Hourglass True
  30.  
  31.  lngStart = timeGetTime()
  32.  
  33.  With rsData
  34.    Do While Not .EOF
  35.      lngCtr = lngCtr + 1
  36.        'Waste Time
  37.        Debug.Print "Line# " & Format(lngCtr, "0000000000") & " | " & "Length of Field: " & _
  38.                     Len(.Fields(0)) & " | " & "Value: " & .Fields(0)
  39.          rsData.MoveNext
  40.    Loop
  41.  End With
  42.  
  43. lngEnd = timeGetTime()
  44.  
  45. DoCmd.Hourglass False
  46.  
  47. MsgBox "Processing Time Using ADO: " & (lngEnd - lngStart) / 1000 & " seconds..."
  48.  
  49. rsData.Close
  50. Set rsData = Nothing
Nov 9 '10 #4

P: 2
Well, I'm off to work this morning, and when I get back I'll try to see if I can merge your code and MSDN Article titled "Retrieving Large Data (ADO.NET)". This looks promising.
Nov 10 '10 #5

NeoPa
Expert Mod 15k+
P: 31,399
Just to clarify, I was referring in my post (#3) to a little known or used facility within VBA which gives direct binary access to a file (without any value descriptors). I use the following Open command :
Expand|Select|Wrap|Line Numbers
  1. Open strName For Binary Access Read Write Lock Write As #intFileNo
strName and intFileNo are previously set up variables of course, but this gives such direct access to the file in as much as a string set up with a value of the number of characters required in your buffer can be used to read from (Get #FileNo, lngFilePos, strBuffer) and write to (Put #FileNo, lngFilePos, strBuffer) the file directly.

I mention this simply to clarify my earlier post as I realised many may well wonder what I was going on about (This facility even existing is not well publicised and I expect many believe it doesn't even exist). With the file size limited to what can be addressed directly by a Long Integer value though, this can never exceed 4GB, nor 2GB if it treats it as a signed value (which I would doubt - but the documentation is not specific about this - and I've certainly never stretched it that far).

For that reason particularly (the complicated nature of handling all the I/O directly, could be another reason for putting most people off) I would say this is of limited use for your situation. I hope (and expect) that the ADO.Recordset code should be a more usable solution for you.

Good luck with your project.
Nov 11 '10 #6

P: 1
The limit with native VBA/VB6 file commands (like: open, close, filelen(), lof(), etc) is indeed 2GB. No matter how you access them (random, input, output, binary).

This because all those commands and functions use LONG as their datatype. And a LONG in VBA/VB6 is a signed 4 byte number, or to put it simply: ranging from -2.147.483.648 to max 2.147.483.647.

So, if you want to process files larger than 2GB, you'll need to use low-level Windows API's instead. Microsoft has provided a KB article which explains all, including a VBA/VB6 class which you can directly use:

https://support.microsoft.com/en-us/kb/189981
May 7 '15 #7

zmbd
Expert Mod 5K+
P: 5,397
Cookie that's a nice find!
I'm tempted to set that as "Best Answer"
Any second on the motion?
May 7 '15 #8

NeoPa
Expert Mod 15k+
P: 31,399
You have a second on that my friend :-)

Very interesting post and link. I'll be saving that one away for sure.

I would comment that I did realise that Long values themselves can only reach 2GB but that isn't an absolute block on using them to refer to areas between 2GB and 4GB. Clearly that's how it's implemented, but it's not beyond logic for MS to have implemented an interface whereby negative numbers are treated within the interface as unsigned values. In truth, I'm a little disappointed by their lack of imagination on this one.

Nevertheless, your link takes us beyond all that. I love that it also comes with a usable class for VBA. Wonderful.
May 8 '15 #9

Post your reply

Sign in to post your reply or Sign up for a free account.