Richard Huxton wrote:
Madison Kelly wrote:
Hi all,
I am new to the list and I didn't want to seem rude at all so I
wanted to ask if this was okay first.
No problem. Reading your message below, you might want to try the
performance list, but general is a good place to start.
I have a program I have written in Perl which uses a PostgreSQL
database as the backend. The program works but the performance is
really bad. I have been reading as much as I can on optimizing
performance but still it isn't very reasonable. At one point I had my
program able to process 175,000 records in 16min 10sec on a Pentium3
650MHz, 448MB RAM test machine. Since then I got a Pentium3 1GHz,
512MB system and I have tried a lot of things to get the performance
up but now it is substantially slower and I can't seem to figure out
what I am doing wrong.
A few places to start:
1. VACUUM FULL
This will make sure any unused space is reclaimed
2. ANALYZE
This will recalculate stats for the tables
3. Basic performance tuning:
http://www.varlena.com/varlena/Gener...bits/index.php
There's also a good guide to the postgresql.conf file on varlena.com
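Steps 1 and 2 can be combined from psql; for example (file_dir below is just the one table from your schema, and note that VACUUM FULL takes an exclusive lock while it runs):

```sql
VACUUM FULL ANALYZE;           -- whole database: reclaim dead space, then refresh planner stats
VACUUM FULL ANALYZE file_dir;  -- or target just the one heavily-updated table
```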
Would it be appropriate to ask for help on my program on this list?
Full disclosure: The program won't be initially GPL'ed because it is
for my company but it will be released for free to home users and the
source code will be made available (similar to other split-license
programs), though once my company makes its money back I think they
will fully GPL it (I am on my boss's case about it :p ).
No problem - what you license your software under is your concern. Once
you've taken the basic steps described above, try to pick out a specific
query that you think is too slow and provide:
1. PostgreSQL version
2. Basic hardware info (as you have)
3. Sizes of tables.
4. Output of EXPLAIN ANALYZE <query here>
The EXPLAIN ANALYZE runs the query and shows how much work PG thought it
would be and how much it actually turned out to be.
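For instance, using the lookup from your own code (the literal values here are just placeholders):

```sql
EXPLAIN ANALYZE
SELECT null FROM file_dir
WHERE file_src_uuid = 'some-uuid'
  AND file_parent_dir = '/foo'
  AND file_name = 'bar';
```

If the plan shows a "Seq Scan on file_dir" with a large actual row count, the planner isn't using an index for the lookup.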
HTH
Thank you very much!! I am using PostgreSQL 7.4 on a stock install of
Fedora Core 2 on my IBM ThinkPad A22m (P3 1GHz, 512MB RAM, not the fastest
HDD). The drive carrier I am using is connected via USB2 and uses a few
different hard drives with the fastest being a couple of Barracuda
7200.7 drives (2MB cache, 7,200rpm). I described the program in my reply
to Martijn so here is some of the code (code not related to psql
snipped, let me know if posting it would help - sorry for the wrapping...):
=-[ Calling the database ]-=
# Open the connection to the database
my $DB = DBI->connect("DBI:Pg:dbname=$db_name","$user") || die("Connect
error (Is PostgreSQL running?): $DBI::errstr");
# Prepare the select statements before using them for speed:
$select_sth = $DB->prepare("SELECT null FROM file_dir WHERE
file_src_uuid=? AND file_parent_dir=? AND file_name=?") || die
"$DBI::errstr";
$select_up = $DB->prepare("UPDATE file_dir SET file_perm=?,
file_own_user=?, file_own_grp=?, file_size=?, file_mod_date=?,
file_mod_time=?, file_mod_time_zone=?, file_exist=? WHERE
file_src_uuid=? AND file_parent_dir=? AND file_name=?") || die
"$DBI::errstr";
$select_in = $DB->prepare("INSERT INTO file_dir ( file_src_uuid,
file_name, file_dir, file_parent_dir, file_perm, file_own_user,
file_own_grp, file_size, file_mod_date, file_mod_time,
file_mod_time_zone, file_backup, file_restore, file_display, file_exist
) VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? )") || die
"$DBI::errstr";
# Set the 'file_exist' flag to 'false', then reset existing files to 'true'.
$DB->do("UPDATE file_dir SET file_exist='f' WHERE
file_src_uuid='$file_src_uuid'") || die "$DBI::errstr";
# Start scanning the drive
$num = $select_sth->execute($file_src_uuid,$relative_dir,$file_name ) ||
die "$DBI::errstr";
if ( $num > 0 )
{
$select_up->execute($file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_exist,$file_src_uuid,$file_parent_dir,$file_name)
|| die "$DBI::errstr";
}
else
{
$select_in->execute($file_src_uuid,$file_name,$file_dir,$file_parent_dir,$file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_backup,$file_restore,$file_display,$file_exist)
|| die "$DBI::errstr";
}
# We need to grab the existing file settings for the special file '/.'
$DBreq=$DB->prepare("SELECT file_backup, file_restore, file_display FROM
file_dir WHERE file_parent_dir='/' AND file_name='.' AND
file_src_uuid='$file_src_uuid'") || die $DBI::errstr;
$DBreq->execute() || die $DBI::errstr;
@file_backup=$DBreq->fetchrow_array();
$file_backup=$file_backup[0];
$file_restore=$file_backup[1];
$file_display=$file_backup[2];
# Jump into the re-entrant subroutine to scan directories and sub-dirs
&list_files($real_dir, $exclude_list_num, $relative_dir, $file_backup,
$file_restore, $file_display);
# Inside the sub routine
# Does the directory/file/symlink already exist? (there are three of
# these for each file type)
$num = $select_sth->execute($file_src_uuid,$relative_dir,$file_name ) ||
die "$DBI::errstr";
if ( $num > 0 )
{
$select_up->execute($file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_exist,$file_src_uuid,$file_parent_dir,$file_name)
|| die "$DBI::errstr";
}
else
{
# The file did not exist so we will use the passed parent settings
# for the 'file_backup' flag and leave the 'file_display' flag set to 'f'
$select_in->execute($file_src_uuid,$file_name,$file_dir,$file_parent_dir,$file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_backup,$file_restore,$file_display,$file_exist)
|| die "$DBI::errstr";
}
# If this was a file I would loop and process the next file in the
# directory; if it was a directory itself I would now re-enter the
# subroutine to process its contents, and when I fell back I would
# pick up where I left off.
# Returning from the final subroutine and finishing up
$DB->do("VACUUM ANALYZE");
=-[ finished DB related source code ]-=
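One thing worth checking in code like the above: with DBI's default AutoCommit on, every execute() runs in its own transaction, so a scan of 175,000 files pays the commit overhead 175,000 times. A minimal sketch of wrapping the whole scan in a single transaction (begin_work/commit/rollback are standard DBI methods; the variables are the ones from the code above):

```perl
$DB->begin_work || die "$DBI::errstr";   # turns AutoCommit off until commit/rollback
eval {
    # ... the whole scan: the $select_sth / $select_up / $select_in executes ...
    &list_files($real_dir, $exclude_list_num, $relative_dir,
                $file_backup, $file_restore, $file_display);
    $DB->commit;                         # one commit for the entire run
};
if ($@) {
    $DB->rollback;                       # undo the partial scan on any failure
    die "Scan failed: $@";
}
```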
Here is the schema for the 'file_dir' table which I hit repeatedly here:
=-[ file_dir table and index schemas ]-=
CREATE TABLE file_dir (                        -- Used to store info on every file on source partitions
	file_id            serial unique,          -- make this 'bigserial' if there may be more than 2 billion files in the database
	file_src_uuid      varchar(40) not null,   -- the UUID of the source partition hosting the original file
	file_org_uuid      varchar(40),            -- the UUID that the file came from (when the file was moved by TLE-BU)
	file_name          varchar(255) not null,  -- name of the file or directory
	file_dir           bool not null,          -- 't' = is directory, 'f' = file
	file_parent_dir    varchar(255) not null,  -- if directory '/foo/bar', parent is '/foo'; if file '/foo/bar/file', parent is '/foo/bar'. The mount directory is treated as '/' so any directories below it will be ignored for this record.
	file_perm          varchar(10) not null,   -- file or directory permissions
	file_own_user      varchar(255) not null,  -- the file's owning user (by name, not UID!!)
	file_own_grp       varchar(255) not null,  -- the file's owning group (by name, not GID!!)
	file_size          bigint not null,        -- file size in bytes
	file_mod_date      varchar(12) not null,   -- file's last edited date
	file_mod_time      varchar(20) not null,   -- file's last edited time
	file_mod_time_zone varchar(6) not null,    -- file's last edited time zone
	file_backup        boolean not null default 'f', -- 't' = include in backup jobs, 'f' = do not include
	file_restore       boolean not null default 'f', -- 't' = include in restore jobs, 'f' = do not include
	file_display       boolean not null default 'f', -- 't' = display, 'f' = hide
	file_exist         boolean default 't'     -- catches files deleted since the last scan: before a rescan, all files in a given src_uuid are set to 'f' (deleted), each file found or updated is reset to 't' (exists), and any record still 'f' at the end of the scan is removed.
);
-- CREATE INDEX file_dir_idx ON file_dir (file_src_uuid, file_name, file_parent_dir);
=-[ Finish file_dir table and index schemas ]-=
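Worth noting: that CREATE INDEX is commented out, so every `$select_sth` lookup has to scan the whole table. Actually creating it is likely the single biggest win; since all three WHERE conditions are equality tests, the exact column order matters less than the index simply existing. A sketch, followed by a stats refresh so the planner will consider it:

```sql
CREATE INDEX file_dir_idx ON file_dir (file_src_uuid, file_name, file_parent_dir);
ANALYZE file_dir;  -- refresh statistics so the planner sees the new index
```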
Thanks so much!!
Madison
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match