
Importing data from Excel into MySQL using Perl

numberwhun
3,509 Recognized Expert Moderator Specialist
OK, let me preface this thread by saying that I cannot provide a real sample of the data that I am working with, because it contains a lot of private, confidential information (for work) that I cannot divulge. But I can provide the details of the database coding that I have been working on. I have been racking my brain the last few days with this, and I am at a point where I believe I need some guidance.

I am working on a project for work that will be a database of all of our client information. It is quite an ambitious project, but it is something that is desperately needed by my group. We have a couple thousand clients (currently) and that is growing by leaps and bounds at the moment, so getting this project completed sooner rather than later is a bit critical.

The initial load of client data was provided to me in an Excel spreadsheet. It has 39 columns of information and almost 2000 rows, 1 row of information per client. In order to work with the data, I initially started using the Spreadsheet::ParseExcel module, but after attempting to pull out the data, noticed there were only 1300 lines of data in the file, instead of the almost 2000 lines that exist in the file. So, I changed my tactics. I saved the spreadsheet out to a file, but chose .csv as the format. I chose a ":" as the field delimiter and it seemed to export fine, showing the correct number of lines when I checked.

I then wrote the below code to go through the file, line by line, put each value in a variable, prepare the INSERT statement, and then execute. Here is the code I have:

Expand|Select|Wrap|Line Numbers
  1. #!/usr/bin/perl
  2.  
  3. use strict;
  4. use warnings;
  5. use DBI;
  6.  
  7. # Connect to database
  8. my $dbh = DBI->connect('DBI:mysql:dsmdb', 'username', 'password') or die "Connection failed: $DBI::errstr";
  9.  
  10. # Open data file
  11. open(DATAFILE, "<file.csv") or die "Cannot open data file: $!";
  12.  
  13. my $sth;
  14.  
  15. while(<DATAFILE>){
  16.  
  17.     # Take the current line of data and split it out into variables
  18.     my ($PartnerName, $CorporationName, $ContactGivenName, $ContactPhone, $ContactEmailID, $CommunicationProtocol, $SecurityType, $ECID, $Source, $GEID, $GEMatchType, $CrossoverWithGTI, $ECIDv1, $CustomerName, $UltimateECID, $UltimateName, $CostCenter, $DomicileRegion, $SalesRegion, $SalesHead, $SalesManager, $TMO1SID, $TMO1Name, $TMO2SID, $TMO2Name, $BankerSID, $BankerName, $CSOMgrSID, $CSOMgrName, $CSOSID, $CSOName, $CSPSID, $CSPName, $LOB, $SubLOB, $Segment, $Market, $Industry, $ClientType, $ClientStatus) = split(/:/, $_);
  19.  
  20.     # Strip leading white space from the values in all the variables
  21.     $PartnerName =~ s/^\s*//;
  22.     $CorporationName =~ s/^\s*//;
  23.     $ContactGivenName =~ s/^\s*//;
  24.     $ContactPhone =~ s/^\s*//;
  25.     $ContactEmailID =~ s/^\s*//;
  26.     $CommunicationProtocol =~ s/^\s*//;
  27.     $SecurityType =~ s/^\s*//;
  28.     $ECID =~ s/^\s*//;
  29.     $Source =~ s/^\s*//;
  30.     $GEID =~ s/^\s*//;
  31.     $GEMatchType =~ s/^\s*//;
  32.     $CrossoverWithGTI =~ s/^\s*//;
  33.     $ECIDv1 =~ s/^\s*//;
  34.     $CustomerName =~ s/^\s*//;
  35.     $UltimateECID =~ s/^\s*//;
  36.     $UltimateName =~ s/^\s*//;
  37.     $CostCenter =~ s/^\s*//;
  38.     $DomicileRegion =~ s/^\s*//;
  39.     $SalesRegion =~ s/^\s*//;
  40.     $SalesHead =~ s/^\s*//;
  41.     $SalesManager =~ s/^\s*//;
  42.     $TMO1SID =~ s/^\s*//;
  43.     $TMO1Name =~ s/^\s*//;
  44.     $TMO2SID =~ s/^\s*//;
  45.     $TMO2Name =~ s/^\s*//;
  46.     $BankerSID =~ s/^\s*//;
  47.     $BankerName =~ s/^\s*//;
  48.     $CSOMgrSID =~ s/^\s*//;
  49.     $CSOMgrName =~ s/^\s*//;
  50.     $CSOSID =~ s/^\s*//;
  51.     $CSOName =~ s/^\s*//;
  52.     $CSPSID =~ s/^\s*//;
  53.     $CSPName =~ s/^\s*//;
  54.     $LOB =~ s/^\s*//;
  55.     $SubLOB =~ s/^\s*//;
  56.     $Segment =~ s/^\s*//;
  57.     $Market =~ s/^\s*//;
  58.     $Industry =~ s/^\s*//;
  59.     $ClientType =~ s/^\s*//;
  60.     $ClientStatus =~ s/^\s*//;
  61.  
  62.  
  63.     $sth = $dbh->prepare qw(INSERT INTO client_info (PartnerName CorporationName ContactGivenName ContactPhone ContactEmailID CommunicationProtocol SecurityType ECID Source GEID GEMatchType CrossoverWithGTI ECIDv1 CustomerName UltimateECID UltimateName CostCenter DomicileRegion SalesRegion SalesHead SalesManager TMO1SID TMO1Name TMO2SID TMO2Name BankerSID BankerName CSOMgrSID CSOMgrName CSOSID CSOName CSPSID CSPName LOB SubLOB Segment Market Industry ClientType ClientStatus) VALUES (?)) or die "Prepare failed: " . $dbh->errstr();
  64.  
  65.  
  66.     $sth->execute qw($PartnerName $CorporationName $ContactGivenName $ContactPhone $ContactEmailID $CommunicationProtocol $SecurityType $ECID $Source $GEID $GEMatchType $CrossoverWithGTI $ECIDv1 $CustomerName $UltimateECID $UltimateName $CostCenter $DomicileRegion $SalesRegion $SalesHead $SalesManager $TMO1SID $TMO1Name $TMO2SID $TMO2Name $BankerSID $BankerName $CSOMgrSID $CSOMgrName $CSOSID $CSOName $CSPSID $CSPName $LOB $SubLOB $Segment $Market $Industry $ClientType $ClientStatus) or die "Execute failed: " . $dbh->errstr();
  67. }
  68.  
Now, I have done some tweaking here and there, but at this point, when I run the above script, I get:

$ ./parse_csv_to_db.pl
DBI prepare: invalid number of arguments: got handle + 45, expected handle + between 1 and 2
Usage: $h->prepare($statement [, \%attr]) at ./parse_csv_to_db.pl line 67, <DATAFILE> line 1.

The data itself, by the way, has some issues that I have to find a way to work around. There are a lot of names in there in the format of "lastname, firstname middlename". Notice the comma after the last name; that is the reason I did not use a comma as the delimiter. I know that there are also names that have an apostrophe in them, such as Irish names in the format of "O'Something". (Sorry, I cannot give real names, but you get the idea.)

So, I am wondering whether characters in the data are messing up the load. There are fields that contain the literal string "#EMPTY", and some that are completely empty and don't even have that.

I really need some help, as this data needs to get loaded but is for some reason causing problems.

As an FYI, line 67 mentioned in the error is the prepare statement in the above code. When I went into MySQL and played with a "problem line" of data, I found that the statement would not work with the empty fields unless I put something in them. I need it to insert nothing for those entries.

I did notice that about 5 entries out of the almost 2000 did, for some reason, load, but that is a far cry from being finished. Those entries even had some empty fields, so I am not sure what the problem is. Any ideas are absolutely welcome.

Regards,

Jeff
Apr 18 '10 #1
RonB
589 Recognized Expert Moderator Contributor
There are a number of issues with the code you posted, but I'll focus on a portion of the prepare/execute statement.

Your prepare statement uses only one placeholder, but you're passing 40 values in the execute statement. The number of placeholders needs to match the number of values passed in the execute statement. Also note that prepare() takes a single SQL string; wrapping the statement in qw() splits it into a list of words, which is why DBI complains about the number of arguments.
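For illustration, a corrected pair might look like this cut-down sketch (only three of your 40 columns are shown, and the values are made up):

Expand|Select|Wrap|Line Numbers
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:dsmdb', 'username', 'password',
                       { RaiseError => 1 });

# One "?" per column and one value per "?" -- the counts must match,
# and the whole statement is passed to prepare() as a single string.
my $sth = $dbh->prepare(
    'INSERT INTO client_info (PartnerName, CorporationName, ContactGivenName)
     VALUES (?, ?, ?)'
);
$sth->execute('Sample Partner', 'Sample Corp', 'Jane');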
Apr 18 '10 #2
RonB
589 Recognized Expert Moderator Contributor
You might want to look at using:
DBD::AnyData -- DBI access to XML, CSV and other formats
http://search.cpan.org/~jzucker/DBD-....09/AnyData.pm

Or, you may want to look at using the built-in features of mysql for importing data.

I haven't used it myself, but this might help.
LOAD DATA INFILE
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
Apr 18 '10 #3
numberwhun
3,509 Recognized Expert Moderator Specialist
Thanks Ron! I will take a look at what you have posted and get back to this thread. I appreciate the suggestions/help.

Regards,

Jeff
Apr 18 '10 #4
numberwhun
3,509 Recognized Expert Moderator Specialist
You know, learning curves are certainly one of those things that I enjoy. It's in those learning curves that I tend to encounter the issues that I need to learn about and overcome. This was one of those things.

Thank you @RonB! I got the initial data load done using the following within MySQL:

Expand|Select|Wrap|Line Numbers
mysql> LOAD DATA INFILE "/path/to/file.csv" INTO TABLE client_info FIELDS TERMINATED BY ':' (PartnerName,CorporationName,ContactGivenName,ContactPhone,ContactEmailID,CommunicationProtocol,SecurityType,ECID,Source,GEID,GEMatchType,CrossoverWithGTI,ECIDv1,CustomerName,UltimateECID,UltimateName,CostCenter,DomicileRegion,SalesRegion,SalesHead,SalesManager,TMO1SID,TMO1Name,TMO2SID,TMO2Name,BankerSID,BankerName,CSOMgrSID,CSOMgrName,CSOSID,CSOName,CSPSID,CSPName,LOB,SubLOB,Segment,Market,Industry,ClientType,ClientStatus);
Granted, I had to already be logged in to mysql and have selected the correct database to use. Then, I just issued the above command to populate the already created table and voila!!! It worked.
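(By that I mean something like the following, using the same database name as in my script:)

Expand|Select|Wrap|Line Numbers
$ mysql -u username -p
mysql> USE dsmdb;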

I must still play with my script, as I would really like to get it working, because in the future there will be more spreadsheets, each containing an update but also containing this original data. So, I will need to scrub out the already existing data and only load the new stuff (one possible approach is sketched below). Fun fun fun!!! Now to concentrate on the rest of the project. I am sure there will be further posts from me here as I try to get this off the ground in a reasonable amount of time.
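One way to handle those re-loads could be to let MySQL update existing rows in place using INSERT ... ON DUPLICATE KEY UPDATE. This is only a rough sketch and assumes client_info has a UNIQUE index to match on -- ECID is used here as a purely hypothetical key, and only three of the 40 columns are shown:

Expand|Select|Wrap|Line Numbers
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:dsmdb', 'username', 'password',
                       { RaiseError => 1 });

# If a row with the same ECID already exists, update it; otherwise insert it.
my $sth = $dbh->prepare(
    'INSERT INTO client_info (ECID, CustomerName, ClientStatus)
     VALUES (?, ?, ?)
     ON DUPLICATE KEY UPDATE
         CustomerName = VALUES(CustomerName),
         ClientStatus = VALUES(ClientStatus)'
);
$sth->execute('12345', 'Sample Corp', 'Active');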

Thanks again @RonB! I really appreciated the help!

Regards,

Jeff
Apr 18 '10 #5
RonB
589 Recognized Expert Moderator Contributor
You're welcome.
Apr 18 '10 #6
RonB
589 Recognized Expert Moderator Contributor
Jeff,

I thought I'd add a few comments on your code which might help in other parts of your project.

When opening a filehandle, it's best to use a lexical var for the handle instead of the bareword, and don't use all uppercase letters. One advantage of the lexical var is that the handle will automatically close when it goes out of scope. It's also better to use the 3 arg form of open and in most cases you should include the filename in the die statement.

None of those substitution regexes are needed, but if they were, it would be best to use a for/foreach loop. Instead of those substitutions, correct the regex in the split so it strips the whitespace as it splits:
Expand|Select|Wrap|Line Numbers
split(/\s*:\s*/, $_)
Line lengths greater than 80 characters should be avoided whenever possible, and a line length of 500+ characters is crazy. In this case I would have used an array instead of 40 scalars.

Since the prepare statement never changes, it should be defined outside of the loop. Pulling all of those points together might look like the sketch below.
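A minimal sketch along those lines -- not a drop-in replacement, since only four of the 40 column names from your original script are listed:

Expand|Select|Wrap|Line Numbers
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Lexical filehandle, three-arg open, one array instead of 40 scalars,
# and prepare() hoisted out of the loop.
my @columns = qw(PartnerName CorporationName ContactGivenName ContactPhone);

my $dbh = DBI->connect('DBI:mysql:dsmdb', 'username', 'password',
                       { RaiseError => 1 });

# Build the column list and a matching number of "?" placeholders.
my $sql = sprintf 'INSERT INTO client_info (%s) VALUES (%s)',
                  join(', ', @columns),
                  join(', ', ('?') x @columns);
my $sth = $dbh->prepare($sql);

my $file = 'file.csv';
open my $fh, '<', $file or die "Cannot open $file: $!";

while (my $line = <$fh>) {
    chomp $line;
    # Split on the colon delimiter, trimming surrounding whitespace;
    # the -1 keeps trailing empty fields so the value count stays constant.
    my @values = split /\s*:\s*/, $line, -1;
    $sth->execute(@values);
}

close $fh;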
Apr 18 '10 #7
numberwhun
3,509 Recognized Expert Moderator Specialist
Some great points Ron, thanks! This is what happens when you stop coding for a while. I am getting back into it more heavily and have misplaced (better than forgotten) some of the Best Practices that go along with being a coder.

I can see myself scrapping the script as-is and doing a full re-write.

Regards,

Jeff
Apr 18 '10 #8
eWish
971 Recognized Expert Contributor
Here is a sample of a DB statement that I would use.

Expand|Select|Wrap|Line Numbers
my $insert_statement = $dbh->prepare('INSERT INTO table_name(column1, column2) VALUES(?,?)');
   $insert_statement->execute($value1, $value2);
   $insert_statement->finish();
--Kevin
Apr 19 '10 #9
numberwhun
3,509 Recognized Expert Moderator Specialist
Thanks Kevin! Much appreciated as well! I am going to compare it to what I had, see where I can modify mine, and hopefully get it working.

I don't know if it's just me not truly knowing how to use Perl DBI or what, but it can't be as hard as I seem to have been making it. ***shakes head***
Apr 19 '10 #10
