473,651 Members | 2,461 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Advice sought on best way to get data into a Linux db2 8.1.4 database

Hi all,

I'm looking for some advice on how best to implement storage of access
logs into a db/2 8.1.4 database running on a RH 7.2 system.

I have 5 (squid) web caches running here that service the whole
university. All access to external web sites must go through these
caches. Each cache generates a gzip'd access log file that's about
100Mbytes every night.

At the moment I'm ftp'ing them to a central system each night for
processing which generates a set of html stat files that are about
1.2Gbytes - that's per night.
Needless to say at that rate the 36 Gbyte disk space I've assigned to
it doesn't last very long.I'm therefore looking for a way of
transferring the data into a back end database that I can access via a
web interface that makes use of stored procedures,java beans and jsp
pages. I probably don't have to dump the data into the db in real time
so I could just post process the existing access log files every
night. Having said that, updating the database in real time would save
a lot of disk space.

I can create a named pipe on the linux box that the squid caching
process writes to and have the other end connected to a process that
munges the data round and writes it into a database. The code I've got
( not mine) is written in perl and writes data to a mysql database and
it would (i think) be a trivial task to write the data into a db/2
back end instead

The other option is to cat an existing log through the same prog and
jsut update
the info off line.

All the web caches are running RH 8.0 with 4 Gbytes of RAM 4 with 2 *
Gbit/s network links in a trunked configuration so the DB/2 server
would be receiving input from 5 caches simultaneously

I suppose what i'm asking is what would be the quickest way of getting
the data into the database. Stick with perl/DBI ?, java prog to
process the input (doesn't feel as if this would be the quickest way
of doing things) piping a data file through a db2 cli interface? or
something else?

Any help suggestions appreciated

alex
Nov 12 '05 #1
3 2062
It is pretty quick to use the Perl/DBI to load data into db2. I loaded about 5GB data (CSV format) into a backend Linux DB2 in just 10 hrs (the hardware is slightly slower than yours). The only disadvantage is the transction control. You have to add this to MySQL Perl code ( MySQL doesn't support transaction by default). An alternative is to use DB2 CLI to do batch loading. The LOAD facility is very quick, in my case it took only 3 hrs to finish the loading and plus integrity check (depends on your table definition and the size of your data). But if you're not confident with the raw data, you'd better use IMPORT, which inserts data row by row and it took me more than 12 hrs to complete the task.

May this help.

Bing
Alex wrote:
Hi all,

I'm looking for some advice on how best to implement storage of access
logs into a db/2 8.1.4 database running on a RH 7.2 system.

I have 5 (squid) web caches running here that service the whole
university. All access to external web sites must go through these
caches. Each cache generates a gzip'd access log file that's about
100Mbytes every night.

At the moment I'm ftp'ing them to a central system each night for
processing which generates a set of html stat files that are about
1.2Gbytes - that's per night.
Needless to say at that rate the 36 Gbyte disk space I've assigned to
it doesn't last very long.I'm therefore looking for a way of
transferring the data into a back end database that I can access via a
web interface that makes use of stored procedures,java beans and jsp
pages. I probably don't have to dump the data into the db in real time
so I could just post process the existing access log files every
night. Having said that, updating the database in real time would save
a lot of disk space.

I can create a named pipe on the linux box that the squid caching
process writes to and have the other end connected to a process that
munges the data round and writes it into a database. The code I've got
( not mine) is written in perl and writes data to a mysql database and
it would (i think) be a trivial task to write the data into a db/2
back end instead

The other option is to cat an existing log through the same prog and
jsut update
the info off line.

All the web caches are running RH 8.0 with 4 Gbytes of RAM 4 with 2 *
Gbit/s network links in a trunked configuration so the DB/2 server
would be receiving input from 5 caches simultaneously

I suppose what i'm asking is what would be the quickest way of getting
the data into the database. Stick with perl/DBI ?, java prog to
process the input (doesn't feel as if this would be the quickest way
of doing things) piping a data file through a db2 cli interface? or
something else?

Any help suggestions appreciated

alex


Nov 12 '05 #2
off the topic - make sure that you siut on a supported LInux distro. We
validated RH7.2 but RH took it off support (and the same is true for RH8,
RH9). If you are not on a RHEL versuion they are not willing to give you any
support and we can not help from the DB2 side fixing their kernel issues
(even if we would ilke to sometimes).

Boris
"Alex" <A.******@hull. ac.uk> wrote in message
news:b2******** *************** ***@posting.goo gle.com...
Hi all,

I'm looking for some advice on how best to implement storage of access
logs into a db/2 8.1.4 database running on a RH 7.2 system.

I have 5 (squid) web caches running here that service the whole
university. All access to external web sites must go through these
caches. Each cache generates a gzip'd access log file that's about
100Mbytes every night.

At the moment I'm ftp'ing them to a central system each night for
processing which generates a set of html stat files that are about
1.2Gbytes - that's per night.
Needless to say at that rate the 36 Gbyte disk space I've assigned to
it doesn't last very long.I'm therefore looking for a way of
transferring the data into a back end database that I can access via a
web interface that makes use of stored procedures,java beans and jsp
pages. I probably don't have to dump the data into the db in real time
so I could just post process the existing access log files every
night. Having said that, updating the database in real time would save
a lot of disk space.

I can create a named pipe on the linux box that the squid caching
process writes to and have the other end connected to a process that
munges the data round and writes it into a database. The code I've got
( not mine) is written in perl and writes data to a mysql database and
it would (i think) be a trivial task to write the data into a db/2
back end instead

The other option is to cat an existing log through the same prog and
jsut update
the info off line.

All the web caches are running RH 8.0 with 4 Gbytes of RAM 4 with 2 *
Gbit/s network links in a trunked configuration so the DB/2 server
would be receiving input from 5 caches simultaneously

I suppose what i'm asking is what would be the quickest way of getting
the data into the database. Stick with perl/DBI ?, java prog to
process the input (doesn't feel as if this would be the quickest way
of doing things) piping a data file through a db2 cli interface? or
something else?

Any help suggestions appreciated

alex

Nov 12 '05 #3
Ken
A.******@hull.a c.uk (Alex) wrote in message news:<b2******* *************** ****@posting.go ogle.com>...
I suppose what i'm asking is what would be the quickest way of getting
the data into the database. Stick with perl/DBI ?, java prog to
process the input (doesn't feel as if this would be the quickest way
of doing things) piping a data file through a db2 cli interface? or
something else?


I'd recommend a typical warehouse etl solution. However, rather than
go the route of commercial etl tools (especially for such a small
project), I'd implement with simple commodity components:
- sftp the gzipped files to the warehouse server
- transform the files from extract to load images using an
easily-maintained
language such as python (or whatever works best for you)
- use the db2 load utility for loading into base fact tables
- archive both extract and load files in a gzipped format in case you
want to
recover or change your data model.

If you go this route then there are only two challenges initially:
- learning the ins & outs of the db2 load utility
- process management

The load utility isn't that bad - just need to get familiar with how
it locks the table, load recovery, etc. And it's very fast - a small
box using load can easily hit 20,000 rows / sec - far faster than
anything you could do with java, perl, etc.

Process management is trickier. But if you only need to do loads once
a day it isn't too tough. I normally just keep the extract,
transport, transform, and load processes completely separate - each
run once a minute via cron, each ensuring that only a single copy is
running, and each interfacing to the other processes only via a flat
file. Need to ensure that files aren't picked up until they're
complete - so a signally file, or a file rename should be performed at
the conclusion of a successful process.

However, an initial implemention (given just 5 files a day) could be
far simplier - based on a static schedule, and an alerting process in
the event that those files aren't available when the downstream
processes get kicked off. Developed in this fashion, a simple log
fact table solution could be developed in 1-3 days, and then easily
enhanced later if you wanted.

ken
Nov 12 '05 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
3005
by: danaf | last post by:
Dear all, (1)Is it possible to host PHP-based Web Portal + MySQL Database (e.g. PHPNuke) on MS Windows System? If yes which server software I need? (2)Compare PHP over JSP/ASP/CGI in performance/memory usage/others...which one is the best for a web portal targetted around few thousand concurrent users? Any article have detail comparisions are welcomed. (3)What is the function of SOAP?
1
1318
by: john | last post by:
Hello all. Thanks for reading and for any advice... Ok, I have a linux webserver with php/mysql/apache. Next, there are csv (comma separated values) log files on a user's WINDOWS machine (and not on the webserver) in a local directory of theirs like so: d:\reports\log_data2004_`\log_data_jan\log_file_jan.txt d:\reports\log_data2004_2\log_data_feb\log_file_feb.txt
4
2209
by: Socheat Sou | last post by:
After a brief, but informative, discussion on Freenode's #zope chatroom, I was advised to consult the gurus on c.l.p. I'm working for a small company who is in desperate need to rewrite it's 15+ year old, flat-file, client-tracking database. The staff uses OSX, while I administer the couple linux servers we have to run our website and some internal sites. In our initial brainstorming session, we decided to go with a Python backend...
6
4611
by: Robert W. | last post by:
I'm building my first major C# program and am try to use best practices everywhere. So I've implemented the "Document/View Model" whereby: - There's a Windows Form, which we'll call "formView" - There's a Class, which we'll call "classDocument" Thus, when an instance of each the form and the class are instantiated, they each will hold the same data, albeit be independent of each other. To keep the two in perfect sync with each other,...
1
1205
by: Jolly Student | last post by:
Dear Colleagues: Thank you for taking the time to read this - I recently posted here with regards to what was possible with .NET. I have been working as a systems engineer for about fifteen years now. My
1
9623
by: David Van D | last post by:
Hi there, A few weeks until I begin my journey towards a degree in Computer Science at Canterbury University in New Zealand, Anyway the course tutors are going to be teaching us JAVA wth bluej and I was wondering if anyone here would be able to give me some tips for young players such as myself, for learning the language. Is this the best Newsgroup for support with JAVA?
5
3516
by: Ray Tomes | last post by:
Hi Folks I am an old codger who has much experience with computers in the distant past before all this object oriented stuff. Also I have loads of software in such languages as FORTRAN and BASIC, QBASIC etc that is very useful except that it really doesn't like to run on modern operating systems and has hopeless graphics resolution and lack of ease of use in some ways.
9
1419
by: Duncan Smith | last post by:
Hello, I find myself in the, for me, unusual (and at the moment unique) position of having to write a web application. I have quite a lot of existing Python code that will form part of the business logic. This relies on 3rd party libraries (such as numpy) which would make porting to e.g. IronPython difficult (I would imagine). I was thinking LAMP (the P standing for Python, of course), particularly as I was originally encouraged to go...
0
1579
by: Chris Rebert | last post by:
On Mon, Nov 17, 2008 at 10:42 AM, Abah Joseph <joefazee@gmail.comwrote: Have you considered basing this off existing software for schools, like one of the programs listed on http://en.wikipedia.org/wiki/Learning_management_system ? Sounds like one hell of a project. Are you sure you aren't encountering the Second System Effect (http://en.wikipedia.org/wiki/Second-system_effect) despite this being
0
8275
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
8695
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
8460
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8576
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
7296
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
6157
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
4143
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
4281
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
1585
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.