473,387 Members | 1,516 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,387 software developers and data experts.

imputing missing values in MySQL?

Hello,
I'm not very experienced in SQL and I need some advice. I have a
comma separarated values file containing around 20 million records and about
20 fields. There are many missing values (blank). I am only directly
interested in 2 fields (1 & 4), but need to use two other fields (2 & 3) for
imputing values for Field 1 (if missing). If Field 4 is missing I must
delete / ignore the record. The (Python) types are,

Field1 (string) Field2 (string) Field3 (string) Field4 (integer)
For imputation:

If Field 1 is blank I substitute a value from another record which matches
on Field 2 and partially matches (all but first 4 characters) on Field 3.
Failing this I want to randomly select a value for Field 1 by choosing
randomly from the (Field 1) values of those (complete) records with matching
Field 2. Otherwise the record is ignored / deleted.

I tried doing this in pure Python, but ran into memory problems. If I can
manage to construct a suitable MySQL table I can pull out the relevant
records and get on with my analysis. But (due to my limited knowledge of
SQL) I find that I'm still (I think) doing far too much in Python.

e.g. I'm parsing each record individually and inserting into one of two
tables (one for records with values for Field 1 and one for records with
missing values for Field 1). Then I'm going to iterate over all the records
in the second table, impute, and add the resulting record to the first
table. Somehow I think there must be a better way. Any ideas? TIA.

Duncan
Jul 19 '05 #1
2 3517
Duncan - if you think about using another toolset -
I would recommend Visual FoxPro and odbc to your mySql db.

You can parse the csv file into vfp, then use either a remote view or
straight sql pass through to handle the record insert into mySql, using the
rules you've desribed below .
If you have MS Visual Studio 6.0 floating around anywhere - VFP 6.0 is on CD
#1.
I've used VFP from versions 5 to 8 with mySql, with considerable results and
low time for coding .

I'm not suggesting you learn yet another toolset, though - but I am a bit
fan of that one.
mondo regards [Will]
--
William Sanders / Electronic Filing Group Remove the DOT BOB to reply via
email.
FREE LONG DISTANCE -> mailto:ex********@efgroup.net
mySql / VFP / MS-SQL
"Duncan Smith" <bu*****@urubu.freeserve.co.uk> wrote in message
news:bh**********@newsg3.svr.pol.co.uk...
Hello,
I'm not very experienced in SQL and I need some advice. I have a
comma separarated values file containing around 20 million records and about 20 fields. There are many missing values (blank). I am only directly
interested in 2 fields (1 & 4), but need to use two other fields (2 & 3) for imputing values for Field 1 (if missing). If Field 4 is missing I must
delete / ignore the record. The (Python) types are,

Field1 (string) Field2 (string) Field3 (string) Field4 (integer)
For imputation:

[snip]
Jul 19 '05 #2
Duncan - if you think about using another toolset -
I would recommend Visual FoxPro and odbc to your mySql db.

You can parse the csv file into vfp, then use either a remote view or
straight sql pass through to handle the record insert into mySql, using the
rules you've desribed below .
If you have MS Visual Studio 6.0 floating around anywhere - VFP 6.0 is on CD
#1.
I've used VFP from versions 5 to 8 with mySql, with considerable results and
low time for coding .

I'm not suggesting you learn yet another toolset, though - but I am a bit
fan of that one.
mondo regards [Will]
--
William Sanders / Electronic Filing Group Remove the DOT BOB to reply via
email.
FREE LONG DISTANCE -> mailto:ex********@efgroup.net
mySql / VFP / MS-SQL
"Duncan Smith" <bu*****@urubu.freeserve.co.uk> wrote in message
news:bh**********@newsg3.svr.pol.co.uk...
Hello,
I'm not very experienced in SQL and I need some advice. I have a
comma separarated values file containing around 20 million records and about 20 fields. There are many missing values (blank). I am only directly
interested in 2 fields (1 & 4), but need to use two other fields (2 & 3) for imputing values for Field 1 (if missing). If Field 4 is missing I must
delete / ignore the record. The (Python) types are,

Field1 (string) Field2 (string) Field3 (string) Field4 (integer)
For imputation:

[snip]
Jul 19 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Marcus | last post by:
Hello, quick question about MySQL storing NULL values... Say I have a textbox called $_POST and a variable $var. if(empty($_POST)) $var = NULL; else $var = $_POST; Disregarding...
1
by: Maciej Paras | last post by:
Hello! I've written 2 pages: one i HTML format, and second - in ASP. When I'm posting data from HTML page, I receive this error generated by ASP page: Microsoft JET Database Engine (0x80040E14)...
0
by: David Otton | last post by:
Hi, small but disturbing oddity, this. I have a simple table: CREATE TABLE musician ( musicianid int(10) unsigned NOT NULL auto_increment, name varchar(60) NOT NULL default '', PRIMARY KEY...
0
by: Duncan Smith | last post by:
Hello, I'm not very experienced in SQL and I need some advice. I have a comma separarated values file containing around 20 million records and about 20 fields. There are many missing values...
0
by: Daniel Mercier | last post by:
Hi, I'm new to mysql and just installed phpmyadmin. Here is the versions info: System Linux linlyne.spider.sense 2.4.22-1.2199.nptl #1 PHP Version 4.3.8
2
by: Daniel Mercier | last post by:
Hi, I'm new to mysql and just installed phpmyadmin. Here is the versions info: System Linux linlyne.spider.sense 2.4.22-1.2199.nptl #1 PHP Version 4.3.8
0
by: Sean Quilty | last post by:
Help! We are getting the following errors on a slave server: 060304 7:42:03 Error reading packet from server: Could not find first log (server_errno=65535) 060304 7:43:03 Slave: Failed...
2
by: Ike | last post by:
Im wondering if someone can take a look at this simple INSERT statement which is puking on me. I KNOW I must have something stupid in it that I am NOT seeing, which will likely be obvious to...
1
ssnaik84
by: ssnaik84 | last post by:
Hi Guys, Last year I got a chance to work with R&D team, which was working on DB scripts conversion.. Though there is migration tool available, it converts only tables and constraints.. Rest of...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.