Hi all,
I'm looking for some advice on how best to implement storage of access
logs in a DB/2 8.1.4 database running on a RH 7.2 system.
I have 5 (squid) web caches running here that service the whole
university. All access to external web sites must go through these
caches. Each cache generates a gzip'd access log file that's about
100Mbytes every night.
At the moment I'm ftp'ing them to a central system each night for
processing, which generates a set of HTML stat files that are about
1.2 Gbytes - that's per night.
Needless to say, at that rate the 36 Gbytes of disk space I've assigned
to it doesn't last very long. I'm therefore looking for a way of
transferring the data into a back-end database that I can access via a
web interface making use of stored procedures, Java beans and JSP
pages. I probably don't have to dump the data into the db in real time,
so I could just post-process the existing access log files every
night. Having said that, updating the database in real time would save
a lot of disk space.
I can create a named pipe on the Linux box that the squid caching
process writes to, with the other end connected to a process that
munges the data and writes it into a database. The code I've got
(not mine) is written in Perl and writes data to a MySQL database, and
it would (I think) be a trivial task to write the data into a DB/2
back end instead.
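To give a feel for what I mean, here's a rough sketch of the parsing
side of such a loader. The log line is squid's native access.log
format; the table and column names in the commented-out DBI part are
placeholders I've made up, not anything that exists yet:

```perl
#!/usr/bin/perl
# Sketch: parse one squid "native" access.log line into the columns
# we'd feed to DBI.  The DB/2 switch should mostly be the DSN, e.g.
#   my $dbh = DBI->connect('dbi:DB2:LOGSDB', $user, $pass);
# instead of a dbi:mysql:... DSN (assumes DBD::DB2 is installed).
use strict;
use warnings;

sub parse_squid_line {
    my ($line) = @_;
    my ($ts, $elapsed, $client, $action_code, $size,
        $method, $url, $ident, $hier, $ctype) = split /\s+/, $line;
    my ($action, $status) = split m{/}, $action_code;
    return {
        ts      => $ts,
        elapsed => $elapsed,
        client  => $client,
        action  => $action,
        status  => $status,
        bytes   => $size,
        method  => $method,
        url     => $url,
    };
}

# With a live handle, each record would then go through a prepared
# statement (ACCESS_LOG and its columns are hypothetical):
#   my $sth = $dbh->prepare(
#     'INSERT INTO ACCESS_LOG (TS, ELAPSED, CLIENT, ACTION, STATUS,
#      BYTES, METHOD, URL) VALUES (?,?,?,?,?,?,?,?)');
#   $sth->execute(@{$rec}{qw(ts elapsed client action status bytes method url)});

my $rec = parse_squid_line(
  '1067198409.120 231 10.0.0.1 TCP_MISS/200 4512 GET ' .
  'http://example.com/ - DIRECT/1.2.3.4 text/html'
);
print "$rec->{client} $rec->{status} $rec->{url}\n";
```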
The other option is to cat an existing log through the same program and
just update the info offline.
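The two feeding modes would share the same loader, something like the
following (paths are made up, and plain cat stands in for the loader
script here just to show the fifo plumbing):

```shell
#!/bin/sh
# Real-time mode: the loader reads a fifo that squid's access log
# points at.  Simulated below with a single echoed log line.
PIPE=/tmp/access.pipe
rm -f "$PIPE"
mkfifo "$PIPE"

# Loader (here just cat) reads the fifo in the background...
cat "$PIPE" > /tmp/loaded.log &

# ...while the writer (squid, simulated by echo) feeds it.
echo "1067198409.120 231 10.0.0.1 TCP_MISS/200 4512 GET http://example.com/ - DIRECT/1.2.3.4 text/html" > "$PIPE"
wait

# Offline mode: replay a night's gzip'd log through the same loader:
#   zcat access.log.1.gz | ./loader.pl
rm -f "$PIPE"
```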
All the web caches are running RH 8.0 with 4 Gbytes of RAM each and
2 x 1 Gbit/s network links in a trunked configuration, so the DB/2
server would be receiving input from 5 caches simultaneously.
I suppose what I'm asking is: what would be the quickest way of getting
the data into the database? Stick with Perl/DBI? A Java program to
process the input (it doesn't feel as if this would be the quickest way
of doing things)? Piping a data file through the DB2 CLI? Or
something else?
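On the CLI option: for a nightly batch job, one possibility I'm aware
of is converting each log to a delimited file and using DB2's LOAD
utility, which is generally much faster than row-by-row inserts.
Something along these lines (database and table names are invented,
and I haven't benchmarked this against DBI):

```shell
# Assumes /tmp/access.del has been produced by the Perl munger as a
# comma-delimited file matching ACCESS_LOG's columns (both hypothetical).
db2 connect to LOGSDB
db2 "LOAD FROM /tmp/access.del OF DEL INSERT INTO ACCESS_LOG"
db2 connect reset
```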
Any help/suggestions appreciated.
alex