Need Help parsing a file

Hi,
I have a .data file which is essentially just a text file with a bunch of numbers.
It's in the format "xx yy zzz abc
ccf fffd sss xxx
xx sss qq"
Basically it is a random amount of numbers per line separated using spaces (not tabs or commas).
What I need to do is convert the file into a format more suitable for a SQL database system I am implementing. I need the output file to look like
"1,xx
1,yy
1,zzz
1,abc
2,ccf
2,fffd
2,sss" ... and so on

I have no clue how to use Perl, but it has been recommended to me for parsing text files. From what I understand the code should be about 10-20 lines, and it would be great if someone could help me out. I tried searching for something but could barely decipher what was out there.
So far all I have is:

    #!/usr/local/bin/perl
    #program to modify data
    $file = 'users/Damien/Downloads/DOROTHEA/dorothea_test.data;
    open(INFO, $file);
    open(DATA, ">edited");
    @lines = <INFO>;
    @count = 1;

    foreach $line (@lines)
    {

Any help would be greatly appreciated.
Nov 18 '08 #1
eWish
971 Expert 512MB
I guess I don't understand the need for the numbers.

You would need to use the split function to split the text on each occurrence of whitespace.

I would think that using a hash would be a better and easier way to keep track of your data.

--Kevin
Nov 18 '08 #2
Thanks for the quick reply.
That command seems to have worked. I know a hash would in fact be much better, but the data has to keep its current structure to stay backwards compatible with an older database, which would not be possible with a hash.

I think I get it, but I seem to have run into a problem: the output file does not have anything in it. I've attached the code in the hope that someone can point out the errors.

    #!/usr/local/bin/perl
    #program to modify data
    $file = 'users/Damien/Downloads/DOROTHEA/dorothea_test.data';
    open(INFO, $file);
    open(DATA, ">>edited.data") or die "cannot open file for reading: $!";
    @lines = <INFO>;
    $count = 1;

    foreach $line (@lines)
    {
    print DATA join('\n', $count . ',' . split(/\W/, $line));
    $count += 1;
    }

    close(DATA);
    close(INFO);
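
A note on why the script above may write nothing useful. This is a sketch of likely causes, not a confirmed diagnosis. The `open(INFO, $file)` call is unchecked, so if the relative path does not resolve, `@lines` is empty and the loop never prints. Separately, the `.` operator puts `split` in scalar context, where it returns a field count instead of the fields, and `'\n'` in single quotes is a literal backslash followed by `n`. The sample line and the variable names `$joined`, `$sep`, and `@rows` are made up for illustration.

```perl
use strict;
use warnings;

my $line  = "xx yy zzz abc\n";   # hypothetical sample line
my $count = 1;

# Concatenation puts split in scalar context, so it yields the
# number of fields rather than the fields themselves.
my $joined = $count . ',' . split(/\W/, $line);

# Single quotes do not interpolate escapes: this is two characters.
my $sep = '\n';

# One way to get "count,field" pairs: map over the list from split.
my @rows = map { $count . ",$_" } split(/\s+/, $line);

print "$joined\n";
print length($sep), "\n";
print join("\n", @rows), "\n";
```

Adding `or die "$!"` to the input `open` would show straight away whether the path is the problem.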
Nov 18 '08 #3
KevinADC
4,059 Expert 2GB
See how this works:

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    my $file = 'users/Damien/Downloads/DOROTHEA/dorothea_test.data';

    # Open the output for appending and the input for reading.
    open(my $out, '>>', 'edited.data') or die "cannot open edited.data for appending: $!";
    open(my $in, '<', $file) or die "cannot open $file for reading: $!";

    while (my $line = <$in>) {
       chomp $line;
       # $. is the current input line number; print one "line,value" row per field.
       for (split /\s+/, $line){
          print $out "$.,$_\n";
       }
    }
    close($out);
    close($in);
Nov 18 '08 #4
Thanks a lot for the help. The code works brilliantly!!
I must say, Perl looks like a very interesting language. Figuring that out in Java would have taken far too long and involved way too much code.
I hope you don't mind if I ask about the parts I don't get. I want to understand the code, not just get it done for the sake of getting it done (if that makes any sense at all).
Nov 18 '08 #5
Correct me if my assumptions/findings are wrong here:
a) the 'my' command just declares user variables
b) chomp just removes the trailing newline character
c) \s+ matches whitespace (the plus just means one or more whitespace characters rather than a single one)

I understand the logic behind the while and for loops, and it seems really intuitive and simple.
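
Points b) and c) can be checked with a short script. This is a minimal sketch; the sample line and the names `$clean`, `@by_regex`, and `@by_space` are hypothetical. It also shows one caveat: with `/\s+/`, leading whitespace on a line produces an empty first field, while the special pattern `' '` skips it.

```perl
use strict;
use warnings;

my $line = "  12 7   305\n";   # hypothetical line with leading spaces

chomp(my $clean = $line);      # removes the trailing newline only

# /\s+/ treats each run of whitespace as one separator, but leading
# whitespace produces an empty first field.
my @by_regex = split /\s+/, $clean;

# The special pattern ' ' also skips leading whitespace.
my @by_space = split ' ', $clean;

print scalar(@by_regex), " fields with /\\s+/\n";
print scalar(@by_space), " fields with ' '\n";
```

If the data file never has leading spaces, the two patterns give the same result.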
Nov 18 '08 #6
nithinpes
410 Expert 256MB
'my' declares lexically scoped (local) variables. A variable declared with 'my' is accessible within the block or script where it is declared, but not inside a subroutine that is merely called from that block.
'local', by contrast, is used for dynamic scoping: it temporarily changes the value of a package variable, and that value is visible to any subroutine called from the block.
Your other two findings are right.
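
The difference can be seen in a small script. This is an illustrative sketch only; `$level`, `show_level`, `with_local`, and `with_my` are hypothetical names, not part of the code earlier in the thread.

```perl
use strict;
use warnings;

our $level = 'global';            # package variable

sub show_level { return $level }  # reads the package variable

sub with_local {
    local $level = 'local';       # dynamic scope: visible in called subs
    return show_level();
}

sub with_my {
    my $level = 'my';             # lexical scope: invisible to show_level
    return show_level();
}

print with_local(), "\n";
print with_my(), "\n";
print show_level(), "\n";
```

This prints `local`, then `global`, then `global`: the called subroutine sees the value set by 'local' but never the 'my' variable, and the package variable gets its old value back once `with_local` returns.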
Nov 18 '08 #7
