Wikipedia - converting data stored in an SQL database to HTML

Is there an existing script or tool that can
extract records from the Wikipedia SQL database
and generate proper HTML from them,
e.g. by converting all occurrences of
[[xxx|yyy]] to <a href=xxx>yyy</a>
etc.?
Or, even better, a script or tool that, given the
YYYYMMDD_cur_table.sql dump, generates all the
HTML files and writes them to disk, so that the
Wikipedia content becomes available on a local
computer without running a server?
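
A minimal Python sketch of the link conversion described above, offered only as an illustration. The function name `wikilinks_to_html`, the regular expression, and the "underscores plus .html" file naming are assumptions made for this example; they are not taken from MediaWiki or from any existing tool, and real wiki markup has many more cases (templates, images, nested links) that this does not handle.

```python
import html
import re
from urllib.parse import quote

# Matches [[target]] and [[target|label]]; anything nested or malformed is left untouched.
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]|]*))?\]\]")


def wikilinks_to_html(text: str) -> str:
    """Replace [[target|label]] with <a href="target.html">label</a>."""

    def _replace(match: re.Match) -> str:
        target = match.group(1).strip()
        # Fall back to the target when no label (or an empty one) is given.
        label = (match.group(2) or target).strip()
        # Hypothetical file naming: spaces become underscores, ".html" is appended.
        href = quote(target.replace(" ", "_")) + ".html"
        return f'<a href="{href}">{html.escape(label)}</a>'

    return WIKI_LINK.sub(_replace, text)


if __name__ == "__main__":
    print(wikilinks_to_html("See [[New York|NYC]] and [[Perl]]."))
```

Writing static pages to disk would then be a matter of running each article's text through such a function and saving the result under the same file naming, so that the generated links resolve locally.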

By the way:
has anyone succeeded in installing a local
Wikipedia server? As I remember, my attempt
failed because the MySQL server could not
handle a database larger than 2 GByte
(the English part of the current data and
usually the ..._old_table.sql dump exceed
this size).

Claudio
Jul 18 '05 #1
Claudio Grondi wrote:
> Is there an already available script/tool able to extract records and
> generate proper HTML code out of the data stored in the Wikipedia SQL
> data base?

They're not in Python, but there are a couple of tools available here:
<http://tinyurl.com/692pt>.

> By the way: has someone succeeded in installation of a local
> Wikipedia server?

I loaded all of the Wikipedia data into a local MySQL server a while
back without any problems. I haven't attempted to run Mediawiki on top
of that, but I don't see why that wouldn't work.
Jul 18 '05 #2

<http://tinyurl.com/692pt> redirects (when it is not down) to
http://en.wikipedia.org/wiki/Wikiped...D_distribution

On that page I see only one tool (not a couple) that is available
to download and use:

http://www.tommasoconforti.com/ the home of Wiki2static

Wiki2static (version 0.61, 2 Aug 2004)
http://www.tommasoconforti.com/wiki/wiki2static.tar.gz
is a Perl script that converts a Wikipedia SQL dump
into an HTML tree suitable for offline browsing or CD distribution.

I could not find any documentation, so I had to experiment
with the script settings directly:

$main_prefix = "u:/WikiMedia-Static-HTML/";
$wiki_language = "pl";

and then ran it (from the script's directory) with:
> wiki2static.pl Q:\WikiMedia-MySQL-Dump\pl\20040727_cur_table.sql
to test the script on a small SQL dump
(112 MByte).

The script has now been running for over half an hour
and has so far created 1,555 folders and
generated 527 files with a total size of 6 MBytes,
consuming only 16 seconds of CPU time.
I estimate that the script will need approx.
6 hours for a 100 MByte file, which gives 120 hours
for the 2 GByte file of the English dump ...
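
The extrapolation above can be written out as a short Python sketch. It assumes that run time scales linearly with dump size and takes 2 GByte as 2000 MByte; the function name `estimate_hours` is made up for this illustration, and the 6-hours-per-100-MByte rate is simply the estimate quoted above.

```python
def estimate_hours(dump_mbytes: float, hours_per_100_mbytes: float = 6.0) -> float:
    """Linearly scale the estimated rate to a dump of the given size."""
    return dump_mbytes / 100.0 * hours_per_100_mbytes


if __name__ == "__main__":
    # 2 GByte taken as 2000 MByte, matching the figure in the post.
    print(estimate_hours(2000))  # 120.0
```

With only 16 seconds of CPU time used in half an hour, the process spends nearly all of its time waiting rather than computing, so a linear estimate like this is only as good as the assumption that the waiting continues at the same rate.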

Any further hints? What am I doing wrong?

(After one hour of runtime there are now 1,627 folders
and 1,307 files with a total size of 15.6 MB, and
20 seconds of CPU time have been consumed, even though
I raised the priority of the process to high
half an hour ago. This is on my W2K box running
Perl 5.8.3.)

Claudio
P.S.
> I loaded all of the Wikipedia data into a local MySQL server a while
> back without any problems.

What was the size of the dump file you imported into
the MySQL database? Importing only the current
version (skipping the history dump), which "a while back"
was smaller than 2 GByte, causes no problems with MySQL.

"Leif K-Brooks" <eu*****@ecritters.biz> wrote in message
news:3a*************@individual.net...
> Claudio Grondi wrote:
> > Is there an already available script/tool able to extract records and
> > generate proper HTML code out of the data stored in the Wikipedia SQL
> > data base?
>
> They're not in Python, but there are a couple of tools available here:
> <http://tinyurl.com/692pt>.
>
> > By the way: has someone succeeded in installation of a local
> > Wikipedia server?
>
> I loaded all of the Wikipedia data into a local MySQL server a while
> back without any problems. I haven't attempted to run Mediawiki on top
> of that, but I don't see why that wouldn't work.

Jul 18 '05 #3
> $main_prefix = "u:/WikiMedia-Static-HTML/";
> $wiki_language = "pl";
> The script is running now for over half an hour
> and has created yet 1.555 folders and
> generated 527 files with a total size of 6 MBytes
> consuming only 16 seconds of CPU time.
> I estimate the time until the script is ready to appr.
> 6 hours for a 100 MByte file, which gives 120 hours
> for a 2 GByte file of the english dump ...
> Any further hints? What am I doing wrong?


In the meantime I noticed that the script had started to
download media files from the Internet. Setting
$include_media = 2;
in the script solved the problem.

Thank you Leif for pointing me to
http://tinyurl.com/692pt

What I am still missing is a binary of texvc for
Windows. Does anyone have a ready-to-use
compiled version, or can you point me to one?

A Perl-to-Python converter does not seem to be
available (yet), apart from the service offered by
http://www.crazy-compilers.com/bridgekeeper/ ,
and Perl syntax is so far from what I already
know that I currently see no chance of coming up
with a Python version of
http://www.tommasoconforti.com/wiki/wiki2static.tar.gz

Claudio
Jul 18 '05 #4
