Need Help parsing a file

P: 4
Hi,
I have a .data file which is essentially just a text file with a bunch of numbers.
It's in the format "xx yy zzz abc
ccf fffd sss xxx
xx sss qq"
Each line holds a varying number of values, separated by spaces (not tabs or commas).
I need to convert the file into a format better suited to the SQL database system I am implementing. The output file should look like
"1,xx
1,yy
1,zzz
1,abc
2,ccf
2,fffd
2,sss" ... and so on

I have no experience with Perl, but it has been recommended to me for parsing text files. From what I understand the code should be about 10-20 lines, and it would be great if someone could help me out. I tried searching, but could barely decipher what I found.
So far all I have is:

    #!/usr/local/bin/perl
    #program to modify data
    $file = 'users/Damien/Downloads/DOROTHEA/dorothea_test.data;
    open(INFO, $file);
    open(DATA, ">edited");
    @lines = <INFO>;
    @count = 1;

    foreach $line (@lines)
    {

Any help would be greatly appreciated.
Nov 18 '08 #1
6 Replies


eWish
Expert 100+
P: 971
I guess I don't understand the need for the numbers.

You would need to use the split function to split the text on each occurrence of whitespace.
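
As a minimal sketch of that suggestion, the snippet below splits one line into its whitespace-separated fields. The helper name fields_of and the sample line are made up for illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: break one line of text into its fields.
sub fields_of {
    my ($line) = @_;
    # split ' ' is the special whitespace mode: it skips leading
    # whitespace and then splits on runs of whitespace.
    return split ' ', $line;
}

my @fields = fields_of("xx yy  zzz abc\n");
print join('|', @fields), "\n";   # prints xx|yy|zzz|abc
```

The trailing newline counts as whitespace here, so it does not end up in the last field.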

I would think that using a hash would be a better and easier way to keep track of your data.
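
One way the hash idea could look, as a rough sketch only: each line number maps to an array of the values found on that line. The helper name index_by_line and the sample data are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: map each 1-based line number to an array
# reference holding the values on that line.
sub index_by_line {
    my (@lines) = @_;
    my %values_by_line;
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        $values_by_line{$line_no} = [ split ' ', $line ];
    }
    return \%values_by_line;
}

my $index = index_by_line("xx yy zzz abc", "ccf fffd sss xxx", "xx sss qq");

# Walk the hash in numeric key order and print "line,value" rows.
for my $line_no (sort { $a <=> $b } keys %$index) {
    print "$line_no,$_\n" for @{ $index->{$line_no} };
}
```

Keeping the values grouped per line makes it possible to emit the flat "line,value" rows later, or any other layout, from the same structure.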

--Kevin
Nov 18 '08 #2

P: 4
Thanks for the quick reply.
That function seems to work. I know a hash would be the better structure, but the output has to keep its current layout to stay backwards compatible with an older database, which it would not be with a hash.

I think I get it, but I have run into a problem: the output file is empty. I've posted the code below in the hope that someone can point out the errors.

    #!/usr/local/bin/perl
    #program to modify data
    $file = 'users/Damien/Downloads/DOROTHEA/dorothea_test.data';
    open(INFO, $file);
    open(DATA, ">>edited.data") or die "cannot open file for reading: $!";
    @lines = <INFO>;
    $count = 1;

    foreach $line (@lines)
    {
    print DATA join('\n', $count . ',' . split(/\W/, $line));
    $count += 1;
    }

    close(DATA);
    close(INFO);
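
A possible reading of what goes wrong in the print statement, offered as a sketch and not as a diagnosis given in the thread: the `.` operator puts split in scalar context, so on current Perl versions it yields a field count instead of the fields, and '\n' in single quotes is a literal backslash followed by n. A completely empty output file may also mean that the input never opened, since open(INFO, $file) is not checked with "or die". The helper numbered_rows and the sample values are hypothetical.

```perl
use strict;
use warnings;

my $line  = "ccf fffd sss xxx";
my $count = 2;

# Concatenation forces scalar context, so split returns the number of fields.
my $joined = $count . ',' . split(/\s+/, $line);
print "$joined\n";             # prints 2,4

# Single quotes do not interpret escapes: '\n' is two characters.
print length('\n'), "\n";      # prints 2
print length("\n"), "\n";      # prints 1

# Hypothetical helper: one way to build the intended "count,value" rows.
sub numbered_rows {
    my ($n, $text) = @_;
    return map { "$n,$_\n" } split ' ', $text;
}
print numbered_rows($count, $line);
```
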
Nov 18 '08 #3

KevinADC
Expert 2.5K+
P: 4,059
See how this works:

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    my $file = 'users/Damien/Downloads/DOROTHEA/dorothea_test.data';
    # Open the output for appending and the input for reading
    open(my $out, '>>', 'edited.data') or die "cannot open edited.data for appending: $!";
    open(my $in, '<', $file) or die "cannot open $file for reading: $!";
    while (my $line = <$in>) {
       chomp $line;                  # drop the trailing newline
       for (split /\s+/, $line) {    # one value per run of whitespace
          print $out "$.,$_\n";      # $. is the current input line number
       }
    }
    close($out);
    close($in);
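
To try the same loop without touching real files, here is a small sketch that wraps it in a subroutine and feeds it in-memory filehandles. The name number_values and the sample input (taken from the format described in the question) are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from $in and write
# "line_number,value" rows to $out.
sub number_values {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        # $. holds the line number of the handle last read from
        print {$out} "$.,$_\n" for split ' ', $line;
    }
}

# In-memory filehandles stand in for the real input and output files.
my $input  = "xx yy zzz abc\nccf fffd sss xxx\nxx sss qq\n";
my $result = '';
open(my $in,  '<', \$input)  or die "cannot open input string: $!";
open(my $out, '>', \$result) or die "cannot open output string: $!";
number_values($in, $out);
close($out);
close($in);

print $result;   # first rows are 1,xx / 1,yy / 1,zzz / 1,abc
```
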
Nov 18 '08 #4

P: 4
Thanks a lot for the help. The code works brilliantly!!
I must say, Perl looks like a very interesting language. Figuring that out in Java would have taken far too long and involved far too much code.
I hope you don't mind if I ask about the parts I don't understand. I want to understand the code, not just get the job done.
Nov 18 '08 #5

P: 4
Correct me if my assumptions/findings are wrong here:
a) the 'my' command just declares user variables
b) chomp just eliminates any newline character
c) \s+ (the plus just indicates one or more space rather than just a single space)

I understand the logic behind the while and for loops, and it seems really intuitive and simple.
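
Points b) and c) can be checked with a short experiment. This is only an illustrative sketch; the sample string is made up.

```perl
use strict;
use warnings;

my $line = "xx  yy zzz\n";    # note the two spaces after xx

chomp $line;                   # removes the trailing newline only
print length($line), "\n";     # prints 10

my @by_single = split / /,   $line;   # every single space is a separator
my @by_run    = split /\s+/, $line;   # each run of whitespace is one separator

print join('|', @by_single), "\n";    # prints xx||yy|zzz
print join('|', @by_run), "\n";       # prints xx|yy|zzz
```

With a single-space pattern the doubled space produces an empty field, which is why the + matters for data with irregular spacing.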
Nov 18 '08 #6

nithinpes
Expert 100+
P: 410
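'my' declares a lexically scoped variable. It is visible only within the block (or the script file) where it is declared, and not inside a subroutine that is merely called from that block.
'local', by contrast, gives dynamic scoping: it temporarily assigns a new value to a global (package) variable, and that value is also seen by any subroutine called before the enclosing block ends.
The other two findings are essentially right. To be precise, chomp removes only a trailing newline from the end of the string, and \s+ matches one or more whitespace characters, which includes tabs as well as spaces.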
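
The difference between the two kinds of scoping can be seen in a small sketch. All names here ($level, show_level, with_local, with_my) are made up for illustration.

```perl
use strict;
use warnings;

our $level = 'global';             # package variable, so local can be applied

sub show_level { return $level }   # reads the package variable at call time

sub with_local {
    local $level = 'localized';    # temporary value, seen by called subs
    return show_level();
}

sub with_my {
    my $level = 'lexical';         # new lexical, invisible to show_level
    return show_level();
}

print with_local(), "\n";   # prints localized
print with_my(), "\n";      # prints global
print show_level(), "\n";   # prints global
```

Once with_local returns, the package variable has its original value again, which is what the last line shows.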
Nov 18 '08 #7
