Bytes | Software Development & Data Engineering Community
Split a large file and then go back through the smaller chunks

I have a large data file that I split into smaller, more manageable chunks (from a single 12.86 GB file to chunks of 500 MB to 1.6 GB).

I now want to extend the Perl script to go back through those chunks, pull out any invoices larger than 250 MB, and print each of them to its own file as well.

How do I go about doing that?

Here is what I am currently working with...

#!/usr/bin/perl
use strict;
use warnings;

my $chunksize  = 500_000_000;   # 500 MB
my $filenumber = 0;
my $infile     = "infile.dat";
my $outsize    = 0;

open(INFILE, "<", $infile) or die "Can't open $infile: $!";
open(OUTFILE, ">", "outfile_$filenumber.dat") or die "Can't open outfile_$filenumber.dat: $!";

while (<INFILE>) {
    chomp;

    # Roll over to a new chunk only on a line with "11" in columns 68-69
    # (presumably the first record of an invoice), so no invoice is cut in half.
    if ($outsize > $chunksize and /^.{67}11/) {
        close OUTFILE;
        $outsize = 0;
        $filenumber++;
        open(OUTFILE, ">", "outfile_$filenumber.dat") or die "Can't open outfile_$filenumber.dat: $!";
    }

    print OUTFILE "$_\n";
    $outsize += length($_) + 1;   # +1 for the newline that chomp removed
}
close INFILE;
close OUTFILE;
Jun 5 '07 #1
miller
1,089 Recognized Expert Top Contributor
As Kevin has already noted on TT, you're posting this same question on multiple forums: here, PG, TT, and who knows where else. Please note that cross-posting will likely net you less help, because the most prolific experts are often on multiple forum sites. Cross-posters notoriously do not come back for the help that is offered, so people stop offering.

Brigmar was nice enough to provide you with your current script on TT. However, no one is going to do an entire project for you, even one that is this easy.

The only hard part about this project is that, during testing, it can take a long time to parse through files more than a meg in size, let alone a gig. It therefore helps to cut a small sample file out of the data for testing before rolling out the production-level code.
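As a rough illustration of the sample-file idea, here is a minimal sketch that copies the first N lines of a big file into a small test file. The sub name make_sample, the script name, and the file names in the usage comment are made up for this example; nothing here comes from the script posted on TT.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy the first $max_lines lines of $src to $dst and return the number copied.
# (make_sample is a hypothetical helper name, not part of the posted script.)
sub make_sample {
    my ($src, $dst, $max_lines) = @_;
    open(my $in,  '<', $src) or die "Can't open $src: $!";
    open(my $out, '>', $dst) or die "Can't open $dst: $!";
    my $copied = 0;
    while (my $line = <$in>) {
        last if $copied >= $max_lines;
        print {$out} $line;
        $copied++;
    }
    close $in;
    close $out or die "Can't close $dst: $!";
    return $copied;
}

# Hypothetical usage: perl make_sample.pl infile.dat sample.dat 100000
if (@ARGV == 3) {
    my $n = make_sample(@ARGV);
    print "Copied $n lines to $ARGV[1]\n";
}
```

Note that a sample cut at an arbitrary line count will probably end in the middle of an invoice, so the last invoice in the sample should not be trusted when checking sizes.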

I would have advised you to break the project up into steps.
1) Count the number of lines in the file
2) Count the number of invoices in the file
3) Measure the size of the file and the size of each invoice.
4) Implement the code to output groups of invoices to secondary files.

None of these steps is difficult, and together they are all you need to accomplish your goal. The code that you were provided has the fundamentals for this, so it's up to you to put them together.
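To make steps 1 through 3 concrete, here is a minimal sketch of a single pass that counts lines, counts invoices, and adds up the size of each invoice. It assumes that a line with "11" in columns 68-69 starts a new invoice. That pattern is borrowed from the split script and is only a guess at the record layout, so adjust it to the real format. The sub name scan_invoices and the script name are invented for the example, and step 4 (writing the large invoices out) is deliberately left undone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: a line with "11" in columns 68-69 starts a new invoice.
# The pattern is borrowed from the split script; the thread never confirms it.
my $INVOICE_START = qr/^.{67}11/;

# Read an open filehandle and return (line count, total bytes, array ref of
# invoice sizes in bytes). Lines before the first invoice start count toward
# the totals but belong to no invoice.
sub scan_invoices {
    my ($fh) = @_;
    my ($lines, $bytes) = (0, 0);
    my @sizes;
    while (my $line = <$fh>) {
        $lines++;
        $bytes += length $line;
        push @sizes, 0 if $line =~ $INVOICE_START;   # open a new invoice
        $sizes[-1] += length $line if @sizes;        # charge line to current invoice
    }
    return ($lines, $bytes, \@sizes);
}

# Hypothetical usage: perl scan_invoices.pl outfile_0.dat [threshold_bytes]
if (@ARGV) {
    my ($file, $threshold) = @ARGV;
    $threshold = 250_000_000 unless defined $threshold;   # 250 MB, per the question
    open(my $in, '<', $file) or die "Can't open $file: $!";
    my ($lines, $bytes, $sizes) = scan_invoices($in);
    close $in;
    my $big = grep { $_ > $threshold } @$sizes;   # count of oversized invoices
    printf "%d lines, %d bytes, %d invoices, %d over %d bytes\n",
        $lines, $bytes, scalar @$sizes, $big, $threshold;
}
```

Only one running total per invoice is kept in memory, never the invoice itself, so the pass should cope with gigabyte-sized chunks. Run it on a small sample file first and compare the totals against what you expect.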

If you're having trouble with any particular part, feel free to ask here. However, it's up to you to do the programming yourself.

- Miller
Jun 6 '07 #2
