Wikipedia - converting data stored in the SQL database to HTML

Is there an existing script or tool that can
extract records from the Wikipedia SQL database
and generate proper HTML from them,
e.g. by converting all occurrences of
[[xxx|yyy]] to <a href=xxx>yyy</a>
and so on?
Or, even better, a script or tool that, given the
YYYYMMDD_cur_table.sql dump, generates all the
HTML files and writes them to disk, so that the
Wikipedia content becomes available on a local
computer without running a server?
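
As a rough illustration of the link conversion asked about above, here is a minimal Python sketch. It is not taken from any existing Wikipedia tool: the function name `wikilinks_to_html`, the `.html` suffix, and the space-to-underscore file naming are assumptions made for the example. It only handles plain `[[target]]` and `[[target|label]]` links, not templates, images, or nested markup, and it leaves the text outside the links untouched.

```python
import html
import re
from urllib.parse import quote

# Matches [[target]] and [[target|label]]. Nested links, templates,
# images, and namespaces are deliberately not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def wikilinks_to_html(text, suffix=".html"):
    """Replace wiki-style links in text with HTML anchors to local files."""

    def replace(match):
        target = match.group(1).strip()
        # Fall back to the target when no label (or an empty one) is given.
        label = match.group(2) if match.group(2) else target
        # Assumed file naming: spaces become underscores, the rest is URL-quoted.
        href = quote(target.replace(" ", "_")) + suffix
        return '<a href="{}">{}</a>'.format(
            html.escape(href, quote=True), html.escape(label)
        )

    return WIKILINK.sub(replace, text)


if __name__ == "__main__":
    print(wikilinks_to_html("See [[Main Page|home]] and [[Python]]."))
```

A full converter would also need to read the article text out of the SQL dump and handle the rest of the wiki markup, which this sketch does not attempt.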

By the way:
has anyone succeeded in installing
a local Wikipedia server? As far as I remember,
my attempt failed because the
MySQL server could not handle a
database larger than 2 GByte
(the English part of the current data and
usually the ..._old_table.sql dump exceed
this size).

Claudio
Jul 18 '05 #1
Claudio Grondi wrote:
> Is there an already available script/tool able to extract records and
> generate proper HTML code out of the data stored in the Wikipedia SQL
> data base?

They're not in Python, but there are a couple of tools available here:
<http://tinyurl.com/692pt>.

> By the way: has someone succeeded in installation of a local
> Wikipedia server?


I loaded all of the Wikipedia data into a local MySQL server a while
back without any problems. I haven't attempted to run Mediawiki on top
of that, but I don't see why that wouldn't work.
Jul 18 '05 #2

<http://tinyurl.com/692pt> redirects (when it is not down) to
http://en.wikipedia.org/wiki/Wikiped...D_distribution

I see from this page only one tool (not a couple) which is available
to download and use:

http://www.tommasoconforti.com/ the home of Wiki2static

Wiki2static (version 0.61, 2nd Aug 2004)
http://www.tommasoconforti.com/wiki/wiki2static.tar.gz
is a Perl script to convert a Wikipedia SQL dump
into an html tree suitable for offline browsing or CD distribution.

I could not find any documentation, so I had to experiment
with the settings in the script itself:

$main_prefix = "u:/WikiMedia-Static-HTML/";
$wiki_language = "pl";

I then ran it (from the script's directory):
> wiki2static.pl Q:\WikiMedia-MySQL-Dump\pl\20040727_cur_table.sql
to test the script on a small SQL dump (112 MByte).

The script has now been running for over half an hour
and has so far created 1,555 folders and
generated 527 files with a total size of 6 MBytes,
consuming only 16 seconds of CPU time.
I estimate that the script needs approx.
6 hours for a 100 MByte file, which gives 120 hours
for the 2 GByte file of the English dump ...
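
The estimate above is a plain linear extrapolation. A small Python sketch of the same arithmetic follows, assuming the conversion time scales linearly with dump size and rounding 2 GByte to 2000 MByte as the post does; the function name `estimate_hours` is made up for the example.

```python
def estimate_hours(dump_mb, hours_per_100_mb=6.0):
    """Linearly scale an observed conversion rate to a dump of dump_mb megabytes."""
    return dump_mb / 100.0 * hours_per_100_mb


if __name__ == "__main__":
    english_dump_mb = 2000  # "2 GByte", rounded to 2000 MByte as in the post
    hours = estimate_hours(english_dump_mb)
    print("{:.0f} hours, i.e. {:.0f} days".format(hours, hours / 24))
```

Whether the linear assumption holds depends on what the script spends its time on, so the result is only a rough guide.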

Any further hints? What am I doing wrong?

(After one hour of runtime there are now 1,627 folders
and 1,307 files with a total size of 15.6 MB, and the
process has consumed 20 seconds of CPU time, even though
I raised its priority to high half an hour ago.
This is on my W2K box running Perl 5.8.3.)

Claudio
P.S.
> I loaded all of the Wikipedia data into a local MySQL server a while
> back without any problems.

What was the size of the dump file you imported into
the MySQL database? Importing only the current
version (skipping the history dump), which "a while back"
was smaller than 2 GByte, causes no problems with MySQL.

Jul 18 '05 #3
> $main_prefix = "u:/WikiMedia-Static-HTML/";
> $wiki_language = "pl";
> The script is running now for over half an hour
> and has created yet 1.555 folders and
> generated 527 files with a total size of 6 MBytes
> consuming only 16 seconds of CPU time.
> I estimate the time until the script is ready to appr.
> 6 hours for a 100 MByte file, which gives 120 hours
> for a 2 GByte file of the english dump ...
> Any further hints? What am I doing wrong?


In the meantime I noticed that the script had started to
download media files from the Internet. Setting
$include_media = 2;
in the script solved the problem.
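
The mismatch reported earlier in the thread (about 20 seconds of CPU time over an hour of runtime) is the usual sign of a process that is waiting on the network or the disk rather than computing. As a minimal Python sketch of how to check for this, the hypothetical helper `cpu_vs_wall` below compares elapsed time with CPU time for a single call; `time.sleep` merely stands in for a task that waits on a download.

```python
import time


def cpu_vs_wall(func, *args, **kwargs):
    """Run func and return (wall_seconds, cpu_seconds) spent in the call."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


if __name__ == "__main__":
    # time.sleep stands in for a task that waits on the network or disk.
    wall, cpu = cpu_vs_wall(time.sleep, 0.2)
    print("wall: {:.2f}s  cpu: {:.2f}s".format(wall, cpu))
```

When the CPU time is a small fraction of the wall-clock time, raising the process priority will not help, because the process is not competing for the CPU in the first place.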

Thank you Leif for pointing me to
http://tinyurl.com/692pt

What I am still missing is a binary of texvc for
Windows. Does anyone have a ready-to-use
compiled version, or can someone point me to one?

An automatic conversion from Perl to Python does not
seem to be available (yet), except for the service offered by
http://www.crazy-compilers.com/bridgekeeper/ ,
and Perl syntax is so far from what I already know
that I currently see no chance of coming up with
a Python version of
http://www.tommasoconforti.com/wiki/wiki2static.tar.gz

Claudio
Jul 18 '05 #4
