By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
437,693 Members | 1,972 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 437,693 IT Pros & Developers. It's quick & easy.

Importing data from Excel into MySQL using Perl

numberwhun
Expert Mod 2.5K+
P: 3,503
Ok, let me preface this thread by saying that I cannot provide a real sample of the data that I am working with. The reason is, because it contains a lot of private, confidential information (for work) that I cannot divulge. But, I can provide the details of the database coding that I have been working on. I have been racking my brains the last few days with this and I am at a point that I believe I need some guidance.

I am working on a project for work that will be a database of all of our client information. It is quite an ambitious project, but it is something that is desperately needed by my group. We have a couple thousand clients (currently) and that is growing by leaps and bounds at the moment, so getting this project completed sooner rather than later is a bit critical.

The initial load of client data was provided to me in an excel spreadsheet. It has 39 columns of information and almost 2000 rows, 1 row of information per client. In order to work with the data, I initially started using the Spreadsheet::ParseExcel module, but after attempting to pull out the data, noticed there were only 1300 lined of data in the file, instead of the almost 2000 lines that exist in the file. So, I changed my tactics. I saved the spreadsheet out to a file, but chose .csv as the format. I chose a ":" as the field delimiter and it seemed to export fine, showing the correct number of lines when I checked.

I then wrote the below code to go through the file, line by line, put each value in a variable, prepare the INSERT statement, and then execute. Here is the code I have:

Expand|Select|Wrap|Line Numbers
  1. #!/usr/bin/perl
  2.  
  3. use strict;
  4. use warnings;
  5. use DBI;
  6.  
  7. # Connect to database
  8. my $dbh = DBI->connect('DBI:mysql:dsmdb', 'username', 'password') or die "Connection failed: $DBI::errstr";
  9.  
  10. # Open data file
  11. open(DATAFILE, "<file.csv") or die "Canont open data file:  $!";
  12.  
  13. my $sth;
  14.  
  15. while(<DATAFILE>){
  16.  
  17.     # Take the current line of data and split it out into variables
  18.     my ($PartnerName, $CorporationName, $ContactGivenName, $ContactPhone, $ContactEmailID, $CommunicationProtocol, $SecurityType, $ECID, $Source, $GEID, $GEMatchType, $CrossoverWithGTI, $ECIDv1, $CustomerName, $UltimateECID, $UltimateName, $CostCenter, $DomicileRegion, $SalesRegion, $SalesHead, $SalesManager, $TMO1SID, $TMO1Name, $TMO2SID, $TMO2Name, $BankerSID, $BankerName, $CSOMgrSID, $CSOMgrName, $CSOSID, $CSOName, $CSPSID, $CSPName, $LOB, $SubLOB, $Segment, $Market, $Industry, $ClientType, $ClientStatus) = split(/:/, $_);
  19.  
  20.     # Strip leading white space from the values in all the variables
  21.     $PartnerName =~ s/^\s*//;
  22.     $CorporationName =~ s/^\s*//;
  23.     $ContactGivenName =~ s/^\s*//;
  24.     $ContactPhone =~ s/^\s*//;
  25.     $ContactEmailID =~ s/^\s*//;
  26.     $CommunicationProtocol =~ s/^\s*//;
  27.     $SecurityType =~ s/^\s*//;
  28.     $ECID =~ s/^\s*//;
  29.     $Source =~ s/^\s*//;
  30.     $GEID =~ s/^\s*//;
  31.     $GEMatchType =~ s/^\s*//;
  32.     $CrossoverWithGTI =~ s/^\s*//;
  33.     $ECIDv1 =~ s/^\s*//;
  34.     $CustomerName =~ s/^\s*//;
  35.     $UltimateECID =~ s/^\s*//;
  36.     $UltimateName =~ s/^\s*//;
  37.     $CostCenter =~ s/^\s*//;
  38.     $DomicileRegion =~ s/^\s*//;
  39.     $SalesRegion =~ s/^\s*//;
  40.     $SalesHead =~ s/^\s*//;
  41.     $SalesManager =~ s/^\s*//;
  42.     $TMO1SID =~ s/^\s*//;
  43.     $TMO1Name =~ s/^\s*//;
  44.     $TMO2SID =~ s/^\s*//;
  45.     $TMO2Name =~ s/^\s*//;
  46.     $BankerSID =~ s/^\s*//;
  47.     $BankerName =~ s/^\s*//;
  48.     $CSOMgrSID =~ s/^\s*//;
  49.     $CSOMgrName =~ s/^\s*//;
  50.     $CSOSID =~ s/^\s*//;
  51.     $CSOName =~ s/^\s*//;
  52.     $CSPSID =~ s/^\s*//;
  53.     $CSPName =~ s/^\s*//;
  54.     $LOB =~ s/^\s*//;
  55.     $SubLOB =~ s/^\s*//;
  56.     $Segment =~ s/^\s*//;
  57.     $Market =~ s/^\s*//;
  58.     $Industry =~ s/^\s*//;
  59.     $ClientType =~ s/^\s*//;
  60.     $ClientStatus =~ s/^\s*//;
  61.  
  62.  
  63.     $sth = $dbh->prepare qw(INSERT INTO client_info (PartnerName CorporationName ContactGivenName ContactPhone ContactEmailID CommunicationProtocol SecurityType ECID Source GEID GEMatchType CrossoverWithGTI ECIDv1 CustomerName UltimateECID UltimateName CostCenter DomicileRegion SalesRegion SalesHead SalesManager TMO1SID TMO1Name TMO2SID TMO2Name BankerSID BankerName CSOMgrSID CSOMgrName CSOSID CSOName CSPSID CSPName LOB SubLOB Segment Market Industry ClientType ClientStatus) VALUES (?)) or die "Prepare failed: " . $dbh->errstr();
  64.  
  65.  
  66.     $sth->execute qw($PartnerName $CorporationName $ContactGivenName $ContactPhone $ContactEmailID $CommunicationProtocol $SecurityType $ECID $Source $GEID $GEMatchType $CrossoverWithGTI $ECIDv1 $CustomerName $UltimateECID $UltimateName $CostCenter $DomicileRegion $SalesRegion $SalesHead $SalesManager $TMO1SID $TMO1Name $TMO2SID $TMO2Name $BankerSID $BankerName $CSOMgrSID $CSOMgrName $CSOSID $CSOName $CSPSID $CSPName $LOB $SubLOB $Segment $Market $Industry $ClientType $ClientStatus) or die "Execute failed: " . $dbh->errstr();
  67. }
  68.  
Now, I have done some tweaking here and there, but at this point, when I run the above script, I get:

$ ./parse_csv_to_db.pl
DBI prepare: invalid number of arguments: got handle + 45, expected handle + between 1 and 2
Usage: $h->prepare($statement [, \%attr]) at ./parse_csv_to_db.pl line 67, <DATAFILE> line 1.
The data itself, by the way, has some issues that I have to find a way to work around. There are a log of names in there that are in the format of "lastname, firstname middlename". Notice the comma after the last name, it is a reason I did not use a comma as a delimiter. I know that there are also names that have an apostrophe in them, such as those Irish names in the format of "O'Something". (sorry, cannot give real names, but you get the idea).

So, the data has characters in it that I am wondering if they are messing up the data load. There are fields that contain the phrase "#EMPTY" and some that are literally empty and don't even have that. They are empty fields all together.

I really need some help as this data really needs to get loaded, but is for some reason really causing a problem.

As an FYI, line 67 that is mentioned in the error is the prepare statement in the above code. When I went into MySQL and played with a "problem line" of data, I found that the statement would not work with the empty fields unless I put something in them. I need it to put nothing in the entry.

I did notice that about 5 entries our of the almost 2000, did, for some reason, load, but that is a far cry from being finished. The entries even had some entry fields, so I am not sure what the problem is. Any ideas are absolutely welcome.

Regards,

Jeff
Apr 18 '10 #1

✓ answered by RonB

You might want to look at using:
DBD::AnyData -- DBI access to XML, CSV and other formats
http://search.cpan.org/~jzucker/DBD-....09/AnyData.pm

Or, you may want to look at using the built-in features of mysql for importing data.

I haven't used it myself, but this might help.
LOAD DATA INFILE
http://dev.mysql.com/doc/refman/5.1/en/load-data.html

Share this Question
Share on Google+
15 Replies


Expert Mod 100+
P: 589
There are a number of issues with the code you posted, but I'll focus on a portion of the prepare/execute statement.

Your prepare statement is using only 1 place holder, but you're passing 40 values in the execute statement. The number of placeholders needs to match the number of values passed in the execute statement.
Apr 18 '10 #2

Expert Mod 100+
P: 589
You might want to look at using:
DBD::AnyData -- DBI access to XML, CSV and other formats
http://search.cpan.org/~jzucker/DBD-....09/AnyData.pm

Or, you may want to look at using the built-in features of mysql for importing data.

I haven't used it myself, but this might help.
LOAD DATA INFILE
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
Apr 18 '10 #3

numberwhun
Expert Mod 2.5K+
P: 3,503
Thanks Ron! I will take a look at what you have posted and get back to this thread. I appreciate the suggestions/help.

Regards,

Jeff
Apr 18 '10 #4

numberwhun
Expert Mod 2.5K+
P: 3,503
You know, learning curves are certainly one of those things that I enjoy. Its in those learning curves that I tend to encounter the issues that I need to learn about an overcome. This one one of those things.

Thank you @RonB! I got the initial data load done using the following within MySQL:

Expand|Select|Wrap|Line Numbers
  1. mysql> LOAD DATA INFILE  "/path/to/file.csv" INTO TABLE client_info FIELDS TERMINATED BY ':' (PartnerName,CorporationName,ContactGivenName,ContactPhone,ContactEmailID,CommunicationProtocol,SecurityType,ECID,Source,GEID,GEMatchType,CrossoverWithGTI,ECIDv1,CustomerName,UltimateECID,UltimateName,CostCenter,DomicileRegion,SalesRegion,SalesHead,SalesManager,TMO1SID,TMO1Name,TMO2SID,TMO2Name,BankerSID,BankerName,CSOMgrSID,CSOMgrName,CSOSID,CSOName,CSPSID,CSPName,LOB,SubLOB,Segment,Market,Industry,ClientType,ClientStatus);
  2.  
Granted, I had to be logged in already to mysql and have selected the correct database to use. Then, I just issued the above command to populate the already created table and voila!!! It worked.

I must really play with my script as I would really like it to work through it because in the futture, there will be more spreadsheets, each containing an update, but also containing this original data. So, I will need to scrub out the already existing data and only load the new stuff. Fun fun fun!!! Now to concentrate on the rest of the project. I am sure there will be further posts from me here as I try to get this off the ground in a reasonable amount of time.

Thanks again @RonB! I really appreciated the help!

Regards,

Jeff
Apr 18 '10 #5

Expert Mod 100+
P: 589
You're welcome.
Apr 18 '10 #6

Expert Mod 100+
P: 589
Jeff,

I thought I'd add a few comments on your code which might help in other parts of your project.

When opening a filehandle, it's best to use a lexical var for the handle instead of the bareword, and don't use all uppercase letters. One advantage of the lexical var is that the handle will automatically close when it goes out of scope. It's also better to use the 3 arg form of open and in most cases you should include the filename in the die statement.

None of those substitution regex's are needed, but if they were, it would be best to use a for/foreach loop. Instead of those regex's, correct the regex in the split.
Expand|Select|Wrap|Line Numbers
  1. split(/\s*:\s*/, $_)
Line lengths greater that 80 characters should be avoided whenever possible and line lengths of 500+ chars is crazy. In this case I would have used an array instead of 40 scalars.

Since the prepare statement never changes, it should be defined outside of the loop.
Apr 18 '10 #7

numberwhun
Expert Mod 2.5K+
P: 3,503
Some great points Ron, thanks! This is what happens when you stop coding for a while. I am getting back into it more heavily and have misplaced (better than forgotten) some of the Best Practices that go along with being a coder.

I see me scraping the script as is an doing a full re-write.

Regards,

Jeff
Apr 18 '10 #8

eWish
Expert 100+
P: 971
Here is an sample of a db statement that I would use.

Expand|Select|Wrap|Line Numbers
  1. my $insert_statement = $dbh->prepare('INSERT INTO table_name(column1, column2) VALUES(?,?)');
  2.    $insert_statement->execute($value1, $value2);
  3.    $insert_statement->finish(); 
--Kevin
Apr 19 '10 #9

numberwhun
Expert Mod 2.5K+
P: 3,503
Thanks Kevin! Much appreciated as well! I am going to compare to what I had and see where I can modify it and hopefully get it working.

I don't know if its just my not knowing how to truly use Perl DBI or what, but it can't be as hard as I seemed to have been making it. ***shakes head***
Apr 19 '10 #10

eWish
Expert 100+
P: 971
I like working with the DBI. If you need any help and the Fishmonger *cough* *cough*, I mean if RonB is not around I will be glad to offer my assistance.

--Kevin
Apr 21 '10 #11

numberwhun
Expert Mod 2.5K+
P: 3,503
LOL, ok, must ask.... why do you call him "Fishmonger"?
Apr 21 '10 #12

Expert Mod 100+
P: 589
FishMonger is my usrername on most other forums. It goes back to when I used to work for my brother in the seafood (wholesale,retail,restaurant) business.

http://www.rosysatthebeach.com/story%20of%20rosys.htm
Apr 21 '10 #13

eWish
Expert 100+
P: 971
Ron,

I have seen you on many other forums, therefore, I knew your FishMonger name. I do like the assistance you offer to the OP's. I just wish I was as good as you :)

--Kevin
Apr 22 '10 #14

Expert Mod 100+
P: 589
Kevin,

Thanks for the compliment.

I wish I was as good as I make it appear. I generally think that I'm just a little above average. There are many other people on these forums that have far more knowledge/experience than I.
Apr 22 '10 #15

numberwhun
Expert Mod 2.5K+
P: 3,503
See, Kevin (eWish) totally underestimates his knowledge, IMHO. I have seen you answer some pretty good questions Kevin, and I am willing to bet you could easily get a job as a Perl coder, if you happened to go that route.

@RonB I agree with Kevin. Your answers are always quite helpful to the OPs and the rest of us as well. Personally, I have fallen a bit off the development bandwagon with the advent of becoming re-employed last October, but am trying to get back on there. Your help kicked me in a good direction.

Thanks to both of you!

Regards,

Jeff
Apr 22 '10 #16

Post your reply

Sign in to post your reply or Sign up for a free account.