
Wikipedia - converting data stored in the SQL database to HTML

Is there an existing script/tool that can
extract records from the Wikipedia
SQL database and generate proper HTML
from them?
e.g. converting all occurrences of
[[xxx|yyy]] to <a href="xxx">yyy</a>
etc.
Or, even better, a script/tool that can generate
and write all the HTML files to disk,
given the YYYYMMDD_cur_table.sql
data, so that the Wikipedia content
becomes available on a local computer
without running a server?
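(A minimal sketch of the kind of [[xxx|yyy]] conversion meant above, in Python; it handles only plain wiki links, and real wikitext has many more constructs such as templates, images and tables:

import re

# Minimal sketch: turn [[target|label]] and [[target]] wiki links into
# HTML anchors. This handles only plain links, nothing else.
LINK = re.compile(r"\[\[([^|\]]+)(?:\|([^\]]+))?\]\]")

def wikilinks_to_html(text):
    def repl(match):
        target = match.group(1).strip()
        label = match.group(2) or target
        href = target.replace(" ", "_")  # crude page-name-to-URL guess
        return '<a href="%s">%s</a>' % (href, label)
    return LINK.sub(repl, text)

print(wikilinks_to_html("See [[Python (programming language)|Python]] and [[Perl]]."))
)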

By the way:
has anyone succeeded in installing
a local Wikipedia server? As I remember,
what made me fail at this was that the
MySQL server was not able to handle a
database larger than 2 GByte
(the English part of the current data and
usually the ..._old_table.sql exceed
this size).

Claudio
Jul 18 '05 #1
3 Replies


Claudio Grondi wrote:
> Is there an existing script/tool that can extract records from the
> Wikipedia SQL database and generate proper HTML from them?

They're not in Python, but there are a couple of tools available here:
<http://tinyurl.com/692pt>.

> By the way: has anyone succeeded in installing a local
> Wikipedia server?


I loaded all of the Wikipedia data into a local MySQL server a while
back without any problems. I haven't attempted to run MediaWiki on top
of that, but I don't see why it wouldn't work.
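(For anyone trying to reproduce such an import, a minimal sketch of streaming a dump file into a local MySQL server from Python is shown below; the dump file name, the user and the database name "wikipedia" are placeholders, not the actual setup used here:

import subprocess

# Minimal sketch: stream a Wikipedia SQL dump into a local MySQL server
# without reading the whole file into memory. The dump file name, the
# user and the database name "wikipedia" are placeholders.
DUMP = "20040727_cur_table.sql"

with open(DUMP, "rb") as dump:
    # The mysql command-line client executes the statements it reads
    # from stdin; -p makes it prompt for the password interactively.
    subprocess.check_call(["mysql", "-u", "root", "-p", "wikipedia"], stdin=dump)
)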
Jul 18 '05 #2


<http://tinyurl.com/692pt> redirects (when it is not simply down) to
http://en.wikipedia.org/wiki/Wikiped...D_distribution

From that page I can see only one tool (not a couple) that is actually
available for download and use:

http://www.tommasoconforti.com/ - the home of Wiki2static

Wiki2static (version 0.61, 2nd Aug 2004)
http://www.tommasoconforti.com/wiki/wiki2static.tar.gz
is a Perl script that converts a Wikipedia SQL dump
into an HTML tree suitable for offline browsing or CD distribution.

I failed to find any documentation, so I had to play
directly with the script settings myself:

$main_prefix = "u:/WikiMedia-Static-HTML/";
$wiki_language = "pl";

and ran (from the script's current directory):
\> wiki2static.pl Q:\WikiMedia-MySQL-Dump\pl\20040727_cur_table.sql
to test the script on a small (112 MByte) SQL dump.

The script has now been running for over half an hour
and has so far created 1,555 folders and
generated 527 files with a total size of 6 MBytes,
consuming only 16 seconds of CPU time.
I estimate the script will need approx. 6 hours
for this 100 MByte file, which scales to about 120 hours
(6 h x 2000 MB / 100 MB) for the 2 GByte English dump ...

Any further hints? What am I doing wrong?

(After one hour of runtime there are now 1,627 folders
and 1,307 files with a total size of 15.6 MB, and only
20 seconds of CPU time have been consumed, even though
I raised the priority of the process to high half an hour
ago on my W2K box running Perl 5.8.3.)

Claudio
P.S.
> I loaded all of the Wikipedia data into a local MySQL server a while
> back without any problems.
What was the size of the dump file you imported into
the MySQL database? Importing only the current
version, which "a while back" was smaller
than 2 GByte (i.e. skipping the history dump),
causes no problems with MySQL.
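(One frequently reported cause of such a size ceiling on older MySQL installations is the default MyISAM table pointer size, which can be raised by rebuilding the table with larger MAX_ROWS / AVG_ROW_LENGTH values. A hedged sketch using the MySQLdb module follows; the connection parameters and the table name 'cur' are assumptions, and whether this was actually the limit hit here is not confirmed:

import MySQLdb

# Hedged sketch: rebuild a MyISAM table with larger row pointers so it
# can grow past the default size limit. Connection parameters and the
# table name 'cur' are assumptions about the dump's schema.
conn = MySQLdb.connect(host="localhost", user="root", passwd="secret", db="wikipedia")
cursor = conn.cursor()
cursor.execute("ALTER TABLE cur MAX_ROWS=1000000000 AVG_ROW_LENGTH=10240")
conn.close()
)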

"Leif K-Brooks" <eu*****@ecritters.biz> schrieb im Newsbeitrag
news:3a*************@individual.net... Claudio Grondi wrote:
Is there an already available script/tool able to extract records and
generate proper HTML code out of the data stored in the Wikipedia SQL
data base?


They're not in Python, but there are a couple of tools available here:
<http://tinyurl.com/692pt>.
By the way: has someone succeeded in installation of a local
Wikipedia server?


I loaded all of the Wikipedia data into a local MySQL server a while
back without any problems. I haven't attempted to run Mediawiki on top
of that, but I don't see why that wouldn't work.

Jul 18 '05 #3

> $main_prefix = "u:/WikiMedia-Static-HTML/";
> $wiki_language = "pl";
>
> The script has now been running for over half an hour
> and has so far created 1,555 folders and
> generated 527 files with a total size of 6 MBytes,
> consuming only 16 seconds of CPU time.
> I estimate the script will need approx. 6 hours
> for this 100 MByte file, which scales to about 120 hours
> for the 2 GByte English dump ...
>
> Any further hints? What am I doing wrong?


In the meantime I noticed that the script had started to
download media files from the Internet. Setting
$include_media = 2;
in the script solved the problem.

Thank you Leif for pointing me to
http://tinyurl.com/692pt

What I am still missing is a texvc binary for
Windows. Does anyone have a ready-to-use
compiled version, or can someone point me to one?

Automatic conversion from Perl to Python does not seem
to be available (yet), apart from the service offered at
http://www.crazy-compilers.com/bridgekeeper/ ,
and Perl syntax is so far from anything I already know
that I currently see no chance of coming up with
a Python version of
http://www.tommasoconforti.com/wiki/wiki2static.tar.gz

Claudio
Jul 18 '05 #4
