I have 10 million records in a tab-delimited text file (data.txt), and I have another file with line numbers (line_num.txt).
My data.txt is as follows:
first 1 res 234
sec 23 fre 890
dec 26 der 690
...
and my line_num.txt is as follows:
23
456
3467
...
What I have in mind is to load these line numbers (23rd, 456th, 3467th) into a hash (a query hash), then go through data.txt line by line; if the current line number appears in the line-number hash, drop that line. I don't have any code ready, but I would like to know whether there is any other, faster way of doing this, as I have millions of records to clean up.
Please let me know
Here is my rough plan in pseudocode:
1. Load the line numbers from line_num.txt into line_num_hash
2. count = 0
3. for every line of data.txt (<>) {
       count++;
       if (count exists in line_num_hash) {
           load the line into data_query_hash
       }
   }
4. open data.txt again and iterate through every line (<>) {
       if ($_ exists in data_query_hash) {
           # don't print (this line is to be removed)
       } else {
           print the line
       }
   }
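For the single-pass version of the idea I described above, I imagine something roughly like the following. This is only an untested sketch; it assumes the line numbers in line_num.txt are 1-based and that the cleaned output just goes to STDOUT.

#!/usr/bin/perl
use strict;
use warnings;

# Load the line numbers to be removed into a hash (keys are the line numbers).
my %remove;
open my $nums, '<', 'line_num.txt' or die "Cannot open line_num.txt: $!";
while (my $n = <$nums>) {
    chomp $n;
    $remove{$n} = 1 if $n =~ /^\d+$/;    # skip blank or malformed lines
}
close $nums;

# Single pass over data.txt: print only the lines whose number is NOT listed.
# $. is Perl's built-in line counter for the filehandle just read.
open my $data, '<', 'data.txt' or die "Cannot open data.txt: $!";
while (my $line = <$data>) {
    print $line unless exists $remove{$.};
}
close $data;

I would save this as something like remove_lines.pl (name made up) and run it as "perl remove_lines.pl > data_clean.txt", so data.txt is never overwritten in place. Only the line numbers are held in memory (a few thousand small keys rather than 10 million lines), and the big file is read just once, which is why I think it should be faster than the two-pass plan above. Is this the right way to go, or is there something better?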