
Analysing Wikipedia dump file...

I just downloaded Wikipedia's Wikibooks dump file.
http://download.wikimedia.org/
I'm having a hard time figuring out what to do with a 130MB XML file.
I tried hard to convert it to a MySQL database with xml2sql but grr...
mysqlimport keeps failing with column id invalid...

How can I use the damn thing?
130MB of XML! Most of my text editors / viewers just fail...
May 14 '06 #1
On Mon, 15 May 2006 00:43:57 +1000, Lloyd Dupont wrote:
> I just downloaded Wikipedia's Wikibooks dump file.
> http://download.wikimedia.org/
> I'm having a hard time figuring out what to do with a 130MB XML file.
> I tried hard to convert it to a MySQL database with xml2sql but grr...
> mysqlimport keeps failing with column id invalid...
>
> How can I use the damn thing?
> 130MB of XML! Most of my text editors / viewers just fail...


There's documentation on their download page about the format they are
using and tools that can be used to import the dumps in a database. Have
you tried them?
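
If the editors choke on the file, one way to see the format for yourself is to print only the first few lines instead of opening the whole thing. A minimal Python sketch; the function name `peek` and the 40-line default are just illustrative choices, not anything from the Wikimedia tools:

```python
from itertools import islice


def peek(path, max_lines=40, encoding="utf-8"):
    """Return the first max_lines lines of a text file without reading the rest."""
    # errors="replace" keeps a stray bad byte from aborting the read.
    with open(path, encoding=encoding, errors="replace") as handle:
        # islice stops after max_lines, so the 130MB body is never loaded.
        return [line.rstrip("\n") for line in islice(handle, max_lines)]
```

Calling something like `print("\n".join(peek("wikibooks-dump.xml")))` (the file name is hypothetical) should be enough to see the root element and the first page entries.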
May 14 '06 #2
As I said, I tried mysqlimport on the SQL file created by xml2sql from the big
XML, but I keep getting an SQL error (invalid value for column).

--
Regards,
Lloyd Dupont

NovaMind development team
NovaMind Software
Mind Mapping Software
<www.nova-mind.com>
"Mehdi" <vi****@REMOVEME.gmail.com> wrote in message
news:1q*******************************@40tude.net...
> On Mon, 15 May 2006 00:43:57 +1000, Lloyd Dupont wrote:
> > I just downloaded Wikipedia's Wikibooks dump file.
> > http://download.wikimedia.org/
> > I'm having a hard time figuring out what to do with a 130MB XML file.
> > I tried hard to convert it to a MySQL database with xml2sql but grr...
> > mysqlimport keeps failing with column id invalid...
> >
> > How can I use the damn thing?
> > 130MB of XML! Most of my text editors / viewers just fail...
>
> There's documentation on their download page about the format they are
> using and tools that can be used to import the dumps in a database. Have
> you tried them?

May 15 '06 #3
In microsoft.public.dotnet.languages.vc Lloyd Dupont <net.galador@ld> wrote:
> As I said, I tried mysqlimport on the SQL file created by xml2sql from the big
> XML, but I keep getting an SQL error (invalid value for column).


But did you try the tools they give you? Specifically, mwdumper.
http://meta.wikimedia.org/wiki/Data_dumps
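
If you only want to read the pages rather than load them into MySQL, a streaming parser avoids holding the whole 130MB in memory. The following is a rough Python sketch, not part of mwdumper; it assumes the usual MediaWiki export layout (`<page>` elements containing `<title>` and a `<text>` inside `<revision>`), and the helper names `local_name` and `iter_pages` are made up for illustration:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page>, one page at a time.

    source may be a file name or a binary file object. If a page has
    several revisions, only the text of the last one seen is returned.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            # Drop the finished page's children so memory stays roughly flat.
            elem.clear()
```

Something like `for title, text in iter_pages("wikibooks-dump.xml"): print(title)` (file name hypothetical) would list the page titles. Stripping the namespace from each tag is a deliberate choice here, since the export namespace changes between dump versions.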

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
May 16 '06 #4
