473,419 Members | 3,517 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,419 software developers and data experts.

imputing missing values in MySQL?

Hello,
I'm not very experienced in SQL and I need some advice. I have a
comma separarated values file containing around 20 million records and about
20 fields. There are many missing values (blank). I am only directly
interested in 2 fields (1 & 4), but need to use two other fields (2 & 3) for
imputing values for Field 1 (if missing). If Field 4 is missing I must
delete / ignore the record. The (Python) types are,

Field1 (string) Field2 (string) Field3 (string) Field4 (integer)
For imputation:

If Field 1 is blank I substitute a value from another record which matches
on Field 2 and partially matches (all but first 4 characters) on Field 3.
Failing this I want to randomly select a value for Field 1 by choosing
randomly from the (Field 1) values of those (complete) records with matching
Field 2. Otherwise the record is ignored / deleted.

I tried doing this in pure Python, but ran into memory problems. If I can
manage to construct a suitable MySQL table I can pull out the relevant
records and get on with my analysis. But (due to my limited knowledge of
SQL) I find that I'm still (I think) doing far too much in Python.

e.g. I'm parsing each record individually and inserting into one of two
tables (one for records with values for Field 1 and one for records with
missing values for Field 1). Then I'm going to iterate over all the records
in the second table, impute, and add the resulting record to the first
table. Somehow I think there must be a better way. Any ideas? TIA.

Duncan
Jul 19 '05 #1
2 3522
Duncan - if you think about using another toolset -
I would recommend Visual FoxPro and odbc to your mySql db.

You can parse the csv file into vfp, then use either a remote view or
straight sql pass through to handle the record insert into mySql, using the
rules you've desribed below .
If you have MS Visual Studio 6.0 floating around anywhere - VFP 6.0 is on CD
#1.
I've used VFP from versions 5 to 8 with mySql, with considerable results and
low time for coding .

I'm not suggesting you learn yet another toolset, though - but I am a bit
fan of that one.
mondo regards [Will]
--
William Sanders / Electronic Filing Group Remove the DOT BOB to reply via
email.
FREE LONG DISTANCE -> mailto:ex********@efgroup.net
mySql / VFP / MS-SQL
"Duncan Smith" <bu*****@urubu.freeserve.co.uk> wrote in message
news:bh**********@newsg3.svr.pol.co.uk...
Hello,
I'm not very experienced in SQL and I need some advice. I have a
comma separarated values file containing around 20 million records and about 20 fields. There are many missing values (blank). I am only directly
interested in 2 fields (1 & 4), but need to use two other fields (2 & 3) for imputing values for Field 1 (if missing). If Field 4 is missing I must
delete / ignore the record. The (Python) types are,

Field1 (string) Field2 (string) Field3 (string) Field4 (integer)
For imputation:

[snip]
Jul 19 '05 #2
Duncan - if you think about using another toolset -
I would recommend Visual FoxPro and odbc to your mySql db.

You can parse the csv file into vfp, then use either a remote view or
straight sql pass through to handle the record insert into mySql, using the
rules you've desribed below .
If you have MS Visual Studio 6.0 floating around anywhere - VFP 6.0 is on CD
#1.
I've used VFP from versions 5 to 8 with mySql, with considerable results and
low time for coding .

I'm not suggesting you learn yet another toolset, though - but I am a bit
fan of that one.
mondo regards [Will]
--
William Sanders / Electronic Filing Group Remove the DOT BOB to reply via
email.
FREE LONG DISTANCE -> mailto:ex********@efgroup.net
mySql / VFP / MS-SQL
"Duncan Smith" <bu*****@urubu.freeserve.co.uk> wrote in message
news:bh**********@newsg3.svr.pol.co.uk...
Hello,
I'm not very experienced in SQL and I need some advice. I have a
comma separarated values file containing around 20 million records and about 20 fields. There are many missing values (blank). I am only directly
interested in 2 fields (1 & 4), but need to use two other fields (2 & 3) for imputing values for Field 1 (if missing). If Field 4 is missing I must
delete / ignore the record. The (Python) types are,

Field1 (string) Field2 (string) Field3 (string) Field4 (integer)
For imputation:

[snip]
Jul 19 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Marcus | last post by:
Hello, quick question about MySQL storing NULL values... Say I have a textbox called $_POST and a variable $var. if(empty($_POST)) $var = NULL; else $var = $_POST; Disregarding...
1
by: Maciej Paras | last post by:
Hello! I've written 2 pages: one i HTML format, and second - in ASP. When I'm posting data from HTML page, I receive this error generated by ASP page: Microsoft JET Database Engine (0x80040E14)...
0
by: David Otton | last post by:
Hi, small but disturbing oddity, this. I have a simple table: CREATE TABLE musician ( musicianid int(10) unsigned NOT NULL auto_increment, name varchar(60) NOT NULL default '', PRIMARY KEY...
0
by: Duncan Smith | last post by:
Hello, I'm not very experienced in SQL and I need some advice. I have a comma separarated values file containing around 20 million records and about 20 fields. There are many missing values...
0
by: Daniel Mercier | last post by:
Hi, I'm new to mysql and just installed phpmyadmin. Here is the versions info: System Linux linlyne.spider.sense 2.4.22-1.2199.nptl #1 PHP Version 4.3.8
2
by: Daniel Mercier | last post by:
Hi, I'm new to mysql and just installed phpmyadmin. Here is the versions info: System Linux linlyne.spider.sense 2.4.22-1.2199.nptl #1 PHP Version 4.3.8
0
by: Sean Quilty | last post by:
Help! We are getting the following errors on a slave server: 060304 7:42:03 Error reading packet from server: Could not find first log (server_errno=65535) 060304 7:43:03 Slave: Failed...
2
by: Ike | last post by:
Im wondering if someone can take a look at this simple INSERT statement which is puking on me. I KNOW I must have something stupid in it that I am NOT seeing, which will likely be obvious to...
1
ssnaik84
by: ssnaik84 | last post by:
Hi Guys, Last year I got a chance to work with R&D team, which was working on DB scripts conversion.. Though there is migration tool available, it converts only tables and constraints.. Rest of...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.