473,799 Members | 2,940 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Open very large text files using VBA?

2 New Member
Is there a way to open very large text files in VBA?

I wrote some code in VBA under Excel 2003 to read database extracts, add the field names as the first line, and to convert the fixed length fields to tab delimited. This allowed the data to be imported into a data analysis tool. Easy so far.

The problem came when I tried to open the larger files. The biggest is around 350GB. VBA gave an error.

I've got a workaround by splitting the files into smaller chunks, around 1GB, and processing them, then importing the parts into the analysis tool. I'm pretty happy with the results, but it's not perfect because the big data files have to be split as a separate step.

Does anyone know of a way the splits could be managed from within VBA or, preferably, VBA could read the files without needing them split into smaller chunks?

Also, which other languages would allow me to process such big files? It must be possible, because the split programs manage to process them without crashing.

Thanks

James
Nov 6 '10 #1
8 17324
ADezii
8,834 Recognized Expert Expert
I've never worked on any monstrous Text Files of which you describe, but I did find this Thread very interesting. Supposedly, if a Text File is well-formed and has a Database-like structure consisting of Rows and Columns, then you can use ADO to query the contents of the Text File. From the content of your description, your Text Files clearly qualify. I found some useful Code on the Internet and am currently trying to adapt it to your specific need. Is there any chance that you can send me, via my Personal E-Mail Address, a sample File that is close to what you normally experience? This would be for testing purposes, since I have no idea how well, or if at all, this approach will work.
Nov 6 '10 #2
NeoPa
32,579 Recognized Expert Moderator MVP
My experience with text files in VBA is that 2GB is the limit of what can be handled. I may have that wrong and it can handle 4GB, but I think it's only 2. 2GB & 4GB are the numbers that can be represented by signed and unsigned 32-bit integers respectively. Actually, each value is short of the full GB boundary by one (2GB-1 and 4GB-1). Larger files can be opened of course. It's simply that there is no way to specify the position required in the file for I/O once you get past the 2GB/4GB boundary.

I'd be interested in finding a way to reference further into files in VBA myself. 64-bit machines would handle much, much more (16 Exabytes or 4 Gigabytes^2) natively, but I expect ordinary Access would restrict them to the scope of the 32-bits anyway. At least for the foreseeable future.

It's a shame, because RAM can be accessed further in than that by using segment registers, but files cannot benefit from that technique.
Nov 7 '10 #3
ADezii
8,834 Recognized Expert Expert
James, here is some Code that I have been experimenting with, using an ADO approach. I ran into a couple of snafus right off the bat, but I'll post it here for your review. For some strange reason, it will only read the first 1,006 Lines out of 5,735. Can't work on it too much now, since I am at work (LOL), but I'll get back to it in a day or two, at which time you will already have solved the problem! (LOL).
Expand|Select|Wrap|Line Numbers
  1. 'Must set a Reference to the Microsoft ActiveX Data Objects X.X Object Library
  2. Dim rsData As ADODB.Recordset
  3.  
  4. '******************************* USER DEFINED *******************************
  5. Const conPATH As String = "C:\Test\"                'PATH without Filename  *
  6. Const conTEXT_FILE_NAME As String = "MyData.txt"    'Text File Name only    *
  7. '****************************************************************************
  8.  
  9. Dim szConnect As String
  10. Dim strSQL As String
  11. Dim lngStart As Long
  12. Dim lngEnd As Long
  13. Dim lngCtr As Long
  14.  
  15. 'Create the Connection String. If the first row of the Text File has Column
  16. 'names, the last line above should be: "Extended Properties=Text;"
  17. szConnect = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
  18.             "Data Source=" & conPATH & ";" & _
  19.             "Extended Properties=""Text;HDR=NO"";"
  20.  
  21. 'Create the SQL statement.
  22.  strSQL = "SELECT * FROM " & conTEXT_FILE_NAME & ";"
  23.  
  24.  Set rsData = New ADODB.Recordset
  25.  
  26.  rsData.Open strSQL, szConnect, adOpenForwardOnly, _
  27.  adLockReadOnly, adCmdText
  28.  
  29.  DoCmd.Hourglass True
  30.  
  31.  lngStart = timeGetTime()
  32.  
  33.  With rsData
  34.    Do While Not .EOF
  35.      lngCtr = lngCtr + 1
  36.        'Waste Time
  37.        Debug.Print "Line# " & Format(lngCtr, "0000000000") & " | " & "Length of Field: " & _
  38.                     Len(.Fields(0)) & " | " & "Value: " & .Fields(0)
  39.          rsData.MoveNext
  40.    Loop
  41.  End With
  42.  
  43. lngEnd = timeGetTime()
  44.  
  45. DoCmd.Hourglass False
  46.  
  47. MsgBox "Processing Time Using ADO: " & (lngEnd - lngStart) / 1000 & " seconds..."
  48.  
  49. rsData.Close
  50. Set rsData = Nothing
Nov 9 '10 #4
James From Canb
2 New Member
Well, I'm off to work this morning, and when I get back I'll try to see if I can merge your code and MSDN Article titled "Retrieving Large Data (ADO.NET)". This looks promising.
Nov 10 '10 #5
NeoPa
32,579 Recognized Expert Moderator MVP
Just to clarify, I was referring in my post (#3) to a little known or used facility within VBA which gives direct binary access to a file (without any value descriptors). I use the following Open command :
Expand|Select|Wrap|Line Numbers
  1. Open strName For Binary Access Read Write Lock Write As #intFileNo
strName and intFileNo are previously set up variables of course, but this gives such direct access to the file in as much as a string set up with a value of the number of characters required in your buffer can be used to read from (Get #FileNo, lngFilePos, strBuffer) and write to (Put #FileNo, lngFilePos, strBuffer) the file directly.

I mention this simply to clarify my earlier post as I realised many may well wonder what I was going on about (This facility even existing is not well publicised and I expect many believe it doesn't even exist). With the file size limited to what can be addressed directly by a Long Integer value though, this can never exceed 4GB, nor 2GB if it treats it as a signed value (which I would doubt - but the documentation is not specific about this - and I've certainly never stretched it that far).

For that reason particularly (the complicated nature of handling all the I/O directly, could be another reason for putting most people off) I would say this is of limited use for your situation. I hope (and expect) that the ADO.Recordset code should be a more usable solution for you.

Good luck with your project.
Nov 11 '10 #6
CookieRevised
1 New Member
The limit with native VBA/VB6 file commands (like: open, close, filelen(), lof(), etc) is indeed 2GB. No matter how you access them (random, input, output, binary).

This because all those commands and functions use LONG as their datatype. And a LONG in VBA/VB6 is a signed 4 byte number, or to put it simply: ranging from -2.147.483.648 to max 2.147.483.647.

So, if you want to process files larger than 2GB, you'll need to use low-level Windows API's instead. Microsoft has provided a KB article which explains all, including a VBA/VB6 class which you can directly use:

https://support.microsoft.com/en-us/kb/189981
May 7 '15 #7
zmbd
5,501 Recognized Expert Moderator Expert
Cookie that's a nice find!
I'm tempted to set that as "Best Answer"
Any second on the motion?
May 7 '15 #8
NeoPa
32,579 Recognized Expert Moderator MVP
You have a second on that my friend :-)

Very interesting post and link. I'll be saving that one away for sure.

I would comment that I did realise that Long values themselves can only reach 2GB but that isn't an absolute block on using them to refer to areas between 2GB and 4GB. Clearly that's how it's implemented, but it's not beyond logic for MS to have implemented an interface whereby negative numbers are treated within the interface as unsigned values. In truth, I'm a little disappointed by their lack of imagination on this one.

Nevertheless, your link takes us beyond all that. I love that it also comes with a usable class for VBA. Wonderful.
May 8 '15 #9

Sign in to post your reply or Sign up for a free account.

Similar topics

3
4868
by: Stuart | last post by:
Hi, Please can anyone help me join 5 large (1.8gb) text files togeather to create 1 very large file. I have some code in PHP but it bombs out at 2gb (seems there is a limit and php needs re compiling, something thats not an option for me) I don't want to take up all the servers resources so something that uses little memory would be very good indeed!
0
2419
by: Navin | last post by:
hi, i am using the following code to generate a xls file using the content type now when the user opnes the file at his pc it takes long time to open..if the no of records in file is large does the use of html tags has slow down the process of opening in excel... does speical care needs to be taken so that large html files open in excel quiclky.....................
7
5440
by: mpdsal | last post by:
Hello. I have some very large text files that I need to import into Access. These files are basically SAP system reports that can be up to 100,000 records but they contain a boatload of data I don't care about. My thinking was to clean up as much as I can and edit the files into columns, much like Excel would do, except I am exceeding the 60,000 record limit. Can anyone suggest a good free text editor that will allow me to cleanup enough...
6
6372
by: Rajorshi Biswas | last post by:
Hi folks, Suppose I have a large (1 GB) text file which I want to read in reverse. The number of characters I want to read at a time is insignificant. I'm confused as to how best to do it. Upon browsing through this group and other sources on the web, it seems that there are many ways to do it. Some suggest that simply fseek'ing to 8K bytes before the end of file, and going backwards is the way. In this case, am I guaranteed best results...
6
1897
by: syed javid | last post by:
hi all, I am getting an error when i process 1 MB txt file using StreamReader and i am reading char by char using streamReader.Read() method at some point it is throwing an exception System.ComponentModel.win32Exception:The handle is invalid. Any help for this problem or any kind of informaiton to process large files the file size can be 10 MB. thanks in advance
1
1954
by: Hutty | last post by:
I have a program that open text files and compares them, however, when reading files larger than 500kb the programs seems to bomb. I get re-directed to "page not found". Any idea how to get around this issue? Ultimate goal is to read text files as big as 50mb or more. I'm using the "input type=file " to upload file, and then using Streamreader to read the text. Thanks -- Hutty
1
2411
by: svijay | last post by:
hi I have got a strange problem. May I know any solution for this. Here is the detailed description about the problem We have got a mainframe system and also production and development server.
6
6616
by: Yi Xing | last post by:
Hi, I need to read specific lines of huge text files. Each time, I know exactly which line(s) I want to read. readlines() or readline() in a loop is just too slow. Since different lines have different size, I cannot use seek(). So I am thinking of building an index for the file for fast access. Can anybody give me some tips on how to do this in Python? Thanks. Yi
2
2757
by: tensto | last post by:
Hi, I am running MS Access 2003 on an XP machine. I have developed a database which have some of the tables fields in a language different than English (Eastern Europe one, Polish and others), i.e. an Unicode codepage is used as there're some specific non-English characters being used, for example in people's names. Here comes the tricky part -- I have written a piece of VBA code to export some tables/queries into a flat text files which...
0
9687
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
10482
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
1
10225
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
10027
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9072
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7564
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5585
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
3759
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2938
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.