imputing missing values in MySQL?

Duncan Smith

Hello,
I'm not very experienced in SQL and I need some advice. I have a
comma separarated values file containing around 20 million records and about
20 fields. There are many missing values (blank). I am only directly
interested in 2 fields (1 & 4), but need to use two other fields (2 & 3) for
imputing values for Field 1 (if missing). If Field 4 is missing I must
delete / ignore the record. The (Python) types are,

Field1 (string) Field2 (string) Field3 (string) Field4 (integer)
For imputation:

If Field 1 is blank I substitute a value from another record which matches
on Field 2 and partially matches (all but first 4 characters) on Field 3.
Failing this I want to randomly select a value for Field 1 by choosing
randomly from the (Field 1) values of those (complete) records with matching
Field 2. Otherwise the record is ignored / deleted.

I tried doing this in pure Python, but ran into memory problems. If I can
manage to construct a suitable MySQL table I can pull out the relevant
records and get on with my analysis. But (due to my limited knowledge of
SQL) I find that I'm still (I think) doing far too much in Python.

e.g. I'm parsing each record individually and inserting into one of two
tables (one for records with values for Field 1 and one for records with
missing values for Field 1). Then I'm going to iterate over all the records
in the second table, impute, and add the resulting record to the first
table. Somehow I think there must be a better way. Any ideas? TIA.

Duncan

Jul 19 '05 #1

Subscribe Post Reply

3517

swdev1

Duncan - if you think about using another toolset -
I would recommend Visual FoxPro and odbc to your mySql db.

You can parse the csv file into vfp, then use either a remote view or
straight sql pass through to handle the record insert into mySql, using the
rules you've desribed below .
If you have MS Visual Studio 6.0 floating around anywhere - VFP 6.0 is on CD
#1.
I've used VFP from versions 5 to 8 with mySql, with considerable results and
low time for coding .

I'm not suggesting you learn yet another toolset, though - but I am a bit
fan of that one.
mondo regards [Will]
--
William Sanders / Electronic Filing Group Remove the DOT BOB to reply via
email.
FREE LONG DISTANCE -> mailto:ex********@efgroup.net
mySql / VFP / MS-SQL
"Duncan Smith" <bu*****@urubu.freeserve.co.uk> wrote in message
news:bh**********@newsg3.svr.pol.co.uk...

Hello,
I'm not very experienced in SQL and I need some advice. I have a
comma separarated values file containing around 20 million records and about 20 fields. There are many missing values (blank). I am only directly
interested in 2 fields (1 & 4), but need to use two other fields (2 & 3) for imputing values for Field 1 (if missing). If Field 4 is missing I must
delete / ignore the record. The (Python) types are,

Field1 (string) Field2 (string) Field3 (string) Field4 (integer)
For imputation:

[snip]

Jul 19 '05 #2

swdev1

Hello,
I'm not very experienced in SQL and I need some advice. I have a
comma separarated values file containing around 20 million records and about 20 fields. There are many missing values (blank). I am only directly
interested in 2 fields (1 & 4), but need to use two other fields (2 & 3) for imputing values for Field 1 (if missing). If Field 4 is missing I must
delete / ignore the record. The (Python) types are,

Field1 (string) Field2 (string) Field3 (string) Field4 (integer)
For imputation:

[snip]

Jul 19 '05 #3

Similar topics

mysql and null values

by: Marcus | last post by:

Hello, quick question about MySQL storing NULL values... Say I have a textbox called $_POST and a variable $var. if(empty($_POST)) $var = NULL; else $var = $_POST; Disregarding...

PHP

Syntax error (missing operator) in query expression

by: Maciej Paras | last post by:

Hello! I've written 2 pages: one i HTML format, and second - in ASP. When I'm posting data from HTML page, I receive this error generated by ASP page: Microsoft JET Database Engine (0x80040E14)...

ASP / Active Server Pages

Missing row

by: David Otton | last post by:

Hi, small but disturbing oddity, this. I have a simple table: CREATE TABLE musician ( musicianid int(10) unsigned NOT NULL auto_increment, name varchar(60) NOT NULL default '', PRIMARY KEY...