472,988 Members | 2,639 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,988 software developers and data experts.

Extracting data from dump file

TYR
I have a large dump file that originated in a MySQL db; I need to get
it into an SQLite file.

Various options are suggested around the web; none of them seem to
work (most failing to import the thing in the first place). So I
removed the assorted taggery from each end, leaving just a big text
file taking the following format:

('value', 'value', 'value, 'value'),
('value','value','value','value')...

I planned to find some way of splitting the thing at the commas
outside the bracketed groups, thus giving me a list of tuples; then I
could of course CREATE TABLE foo VALUES '1', '2, '3', '4' and then
iterate through the list INSERTing INTO.

Then my problems began; I tried using the python csv method, replacing
the string ),( with \t and then using \t as the delimiter. First
problem; there's a size limit coded into the module. No problem, use
csv.field_size_limit() to alter it. Problem; it doesn't actually parse
at all, just sends the whole thing as a string and the SQL INSERT
fails with a "not enough args" error.

Tried using string.split() and re.split(data, r'\t'); first gave the
same error, second failed with a "too many named groups" error. Tried
using ; as a delimiter and going back to csv; this fails to match
the ; for some reason. Any ideas?
Nov 23 '07 #1
1 3482
On Nov 24, 5:46 am, TYR <a.harrow...@gmail.comwrote:
I have a large dump file that originated in a MySQL db; I need to get
it into an SQLite file.

Various options are suggested around the web; none of them seem to
work (most failing to import the thing in the first place). So I
removed the assorted taggery from each end, leaving just a big text
file taking the following format:

('value', 'value', 'value, 'value'),
('value','value','value','value')...
Consider the possibility that individual data rows are separated by "),
\n(" or maybe "), \n(" or something else, not "),(".

What does
print repr(open("my_dump.txt", "rb").read(1000))
tell you? Are there any "\n" in there at all? BTW, what platform are
you running on?
>
I planned to find some way of splitting the thing at the commas
outside the bracketed groups, thus giving me a list of tuples; then I
could of course CREATE TABLE foo VALUES '1', '2, '3', '4' and then
CREATE TABLE foo VALUES '1', .... are you sure?

iterate through the list INSERTing INTO.

Then my problems began; I tried using the python csv method, replacing
the string ),( with \t and then using \t as the delimiter. First
problem; there's a size limit coded into the module. No problem, use
csv.field_size_limit() to alter it.
>>import csv
csv.field_size_limit()
131072
>>>
And do you really expect that any one of your data rows will take more
than 128KB in the dump file?? How large did you have to set the limit
to make it "work"???
Problem; it doesn't actually parse
at all,
Sounds like it did parse, with the result being one csv row containing
one very long field.
just sends the whole thing as a string and the SQL INSERT
fails with a "not enough args" error.

Tried using string.split() and re.split(data, r'\t'); first gave the
same error, second failed with a "too many named groups" error.
All of these are indications that you don't have the correct
delimiter.
Tried
using ; as a delimiter and going back to csv; this fails to match
the ; for some reason. Any ideas?
Examine your dump carefully. Are there any newlines other than at the
logical end of data rows (like e.g. it's folded at column 78)? If so,
you'll have to replace those by spaces or delete them. Work out
exactly what is separating data rows. Let's call it "sep". If you can
use file_contents.replace(sep, "\t"), then you can use
file_contents.split(sep) directly; there's no need for a two step
approach. If sep is variable, then use re.split.

Try this on a small subset of your dump file. Before you go anywhere
near the database, check the results of your split: did you get the
expected number of rows [currently you are getting 1 row], does each
row look OK? Unsubtle hint: the first row may start with "junk(" and
the last may end with ")junk" if you haven't already allowed for that.

Then try inserting into the database. Print a row number and row
contents before each insert (or wrap try/except around each insert) so
that you know which row any error relates to.

If you still have a problem, ask again, but do show exactly what your
dump file looks like and what your code is -- my crystal ball needs a
long recharge after all of the above!

HTH,
John
Nov 23 '07 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
by: Droopy | last post by:
Hi, I am running an application on 74 PC (1 instance on each PC). This morning, 2 applications crashes. There were neither popup windows signaling it nor dump file created. I only find in the...
9
by: Shivam | last post by:
Hi All, I have a mysql dump file... something like. __________________________________________________ CREATE TABLE nation ( ip int(11) unsigned NOT NULL default '0', country char(2) NOT...
0
by: Sunil Basu | last post by:
Hi, I have a interesting thing to know and discuss with you. I am extracting data from an Excel file in a Delphi DbGrid through SQL. I want to create a criteria on a specific cell value of the...
0
by: runner7 | last post by:
I used file_get_contents() to read a pdf into a string and then tried to extract the encoded part between the "stream" and "endstream" words using the strpos() and substr() functions. (I could not...
0
by: sgsiaokia | last post by:
I need help in extracting data from another source file using VBA. I have problems copying the extracted data and format into the required data format. And also, how do i delete the row that is not...
1
by: avinash jain | last post by:
I got a segmentation fault.. I came to know that the problem could be easliy solved be the core dump files. could any tell how to create a core dump file and how to read a core dump file. I am...
1
by: JSagar | last post by:
Hello Expert ! I am new to python , i had done 'C' Code , which extracted file data , But i don't want to give my exe On remote side. Following thing is it Possible using python? -...
1
by: davidson1 | last post by:
Hai friends, In our college we have oracle database , i need the old records for testing my project , and the old records are in dump file. Now i need to tranfer the data from oracle dump file...
4
by: poolboi | last post by:
hi guys i've having some problem extracting data from a text file example if i got a text file with infos like: Date 2008-05-01 Time 22-10 Date 2008-05-01 Time 21-00 Date 2008-05-02 Time...
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 4 Oct 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM) The start time is equivalent to 19:00 (7PM) in Central...
0
tracyyun
by: tracyyun | last post by:
Hello everyone, I have a question and would like some advice on network connectivity. I have one computer connected to my router via WiFi, but I have two other computers that I want to be able to...
2
by: giovanniandrean | last post by:
The energy model is structured as follows and uses excel sheets to give input data: 1-Utility.py contains all the functions needed to calculate the variables and other minor things (mentions...
1
by: Teri B | last post by:
Hi, I have created a sub-form Roles. In my course form the user selects the roles assigned to the course. 0ne-to-many. One course many roles. Then I created a report based on the Course form and...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 1 Nov 2023 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM) Please note that the UK and Europe revert to winter time on...
3
by: nia12 | last post by:
Hi there, I am very new to Access so apologies if any of this is obvious/not clear. I am creating a data collection tool for health care employees to complete. It consists of a number of...
0
isladogs
by: isladogs | last post by:
The next online meeting of the Access Europe User Group will be on Wednesday 6 Dec 2023 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, Mike...
4
by: GKJR | last post by:
Does anyone have a recommendation to build a standalone application to replace an Access database? I have my bookkeeping software I developed in Access that I would like to make available to other...
3
SueHopson
by: SueHopson | last post by:
Hi All, I'm trying to create a single code (run off a button that calls the Private Sub) for our parts list report that will allow the user to filter by either/both PartVendor and PartType. On...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.