Bytes IT Community

How can I reduce the number of queries to my PostgreSQL database?

SR
As a starter project for learning Python/PostgreSQL, I am building a
Books database that stores information on the books on my bookshelf.

Say I have three tables.

Table "books" has columns book_id, title, subtitle, ISBN.

Table "authors" has columns author_id, author surname, author
first names, biographical notes.

Table "bookauthors" has two columns: book_id, author_id.

The bookauthors table links the books and authors tables.

Scenario: I have a python script which creates web page listing all
books in the database, and all authors for each book. My python script
does essentially three things:

1. retrieve a list of all book_ids and book_titles.

2. for each book_id, query the bookauthors table and retrieve all
author names for that book_id.

3. display it all out as an html table on a web page.

The script works fine, if a little slow. I think that's because if I
have 50 books in my database, my script performs 51 database queries (1
for all book names; then 1 for each book). A colleague of mine
suggested that I could get away with two queries, 1 to read the book
ids and titles, and 1 to read the bookauthors table to pull in *all*
relations, and then do all the work in Python.

I think I know where he's coming from, but I don't know where to begin.
Any clues? Is there a specific name for this technique?
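[For reference, the colleague's two-query idea can be sketched in plain
Python. The sample rows below are hypothetical stand-ins for what the two
cursors would return; only two round trips to the database are needed,
and the per-book lookup happens in a dictionary instead:]

```python
# Hypothetical results of the two queries (stand-ins for cursor.fetchall()):
# Query 1: SELECT book_id, title FROM books
books = [(1, "Puppetry"), (2, "A Book")]
# Query 2: SELECT bookauthors.book_id, authors.surname
#          FROM bookauthors JOIN authors
#            ON authors.author_id = bookauthors.author_id
book_authors = [(1, "Bill"), (1, "Ben"), (2, "Who")]

# One pass over the link rows builds a book_id -> [surname, ...] map
authors_by_book = {}
for book_id, surname in book_authors:
    authors_by_book.setdefault(book_id, []).append(surname)

# One pass over the books pairs each title with its author list
listing = [(title, authors_by_book.get(book_id, []))
           for book_id, title in books]
```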

Apr 8 '06 #1
6 Replies



SR wrote:
<SNIP>


The specific name you are looking for is a 'join'. There are
many references and tutorials available, but I suggest you start
with the PostgreSQL tutorial, which is part of the documentation
supplied with PostgreSQL.

Here is a link to the 'join' command in the online manual.

http://www.postgresql.org/docs/8.1/i...rial-join.html

HTH

Frank Millman

Apr 8 '06 #2

P: n/a
>>>>> "SR" == <sh**@shay-riggs.fsnet.co.uk> writes:
SR> Scenario: I have a python script which creates web page listing
SR> all books in the database, and all authors for each book. My
SR> python script does essentially three things:

SR> 1. retrieve a list of all book_ids and book_titles.

SR> 2. for each book_id, query the bookauthors table and retrieve all
SR> author names for that book_id.

SR> 3. display it all out as an html table on a web page.

That's one query, if you're willing to make it advanced enough,
although you need to define an aggregate so that PostgreSQL can
concatenate and comma-separate the author names. Such an aggregate
typically needs more than one database function, and could look as
follows:

CREATE OR REPLACE FUNCTION author_agg_sfunc(TEXT, TEXT)
RETURNS TEXT AS '
    SELECT $1 || '', '' || $2;
' LANGUAGE sql;

-- with an empty initcond the leftover separator ends up leading,
-- not trailing, so the final function must trim it from the front
CREATE OR REPLACE FUNCTION author_agg_ffunc(TEXT)
RETURNS TEXT AS '
    SELECT trim(leading '', '' from $1);
' LANGUAGE sql;

CREATE AGGREGATE author_agg (
    basetype = TEXT,
    sfunc = author_agg_sfunc,
    stype = TEXT,
    finalfunc = author_agg_ffunc,
    initcond = ''
);

Then you could use it as follows:

SELECT author_agg(authors.name),
       foo,
       bar
FROM authors, writes, books
WHERE authors.id = writes.author_id
AND writes.book_id = books.id
GROUP BY foo, bar;
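[For readers more at home in Python, here is a rough equivalent of what
the aggregate computes, with hypothetical rows standing in for the
grouped query results; the three parts of the fold mirror initcond,
sfunc and finalfunc:]

```python
# Joined rows as the GROUP BY would see them: (book title, author name)
rows = [("Puppetry", "Bill"), ("Puppetry", "Ben"), ("A Book", "Who")]

state = {}                                # one running state per group
for title, author in rows:
    s = state.get(title, "")              # initcond = ''
    state[title] = s + ", " + author      # sfunc: $1 || ', ' || $2
# finalfunc: drop the separator left over from the empty initial state
result = {title: s[2:] for title, s in state.items()}
```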

This is the solution that I would use after working nearly a decade
with databases. It is neither simple nor obvious to the novice, but
it's the Right Way To Do It. For a learning exercise, this is way over
the top, but I thought you might benefit from seeing that - so long as
you only need information that would reasonably fit in one table on a
web page or the equivalent - one query is always enough. Or perhaps
that should be One Query Is Always Enough. :-) Learn at your own pace,
but you might want to keep this in mind for future reference.

Martin
Apr 8 '06 #3

In article <11**********************@v46g2000cwv.googlegroups.com>,
SR <sh**@shay-riggs.fsnet.co.uk> wrote:
<SNIP>


Yup. The technique is called "using a relational database". This is
precisely the sort of thing SQL does well. Let's say you want to find
out who wrote 'The Hitchhikers Guide to the Galaxy'. You could do the
following (all sql untested and, let's face it, probably not understood
by author):

1. Query for that book to get the book_id
SELECT book_id FROM books WHERE title='The Hitchhikers Guide To The Galaxy'

2. Look up that book's author ids in the bookauthors table
SELECT author_id FROM bookauthors WHERE book_id=<book id>

3. Look up that author in the authors table
SELECT surname FROM authors WHERE author_id=<author id>

or do

SELECT surname FROM authors, books, bookauthors
WHERE books.book_id=bookauthors.book_id
AND authors.author_id=bookauthors.author_id
AND title='The Hitchhikers Guide To The Galaxy'

Slick, no? You want something like:

SELECT title, surname, books.book_id FROM authors, books, bookauthors
WHERE books.book_id=bookauthors.book_id
AND authors.author_id=bookauthors.author_id

If you have more than one author for a book then the book will
appear in the result multiple times. You'll have to combine
those yourself (the book_id column can help here; I don't know
if you can leverage more SQL for that job).
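[One way to do that combining in Python is itertools.groupby, which
merges adjacent rows sharing a key; it only works if the rows arrive
sorted by book_id, so the query needs an ORDER BY. The rows below are
hypothetical:]

```python
from itertools import groupby
from operator import itemgetter

# Rows as (title, surname, book_id), already ordered by book_id --
# groupby only merges *adjacent* rows, so the ORDER BY is essential
rows = [("A Book", "Who", 2), ("A Book", "Second", 2), ("Other", "Smith", 3)]

books = []
for book_id, group in groupby(rows, key=itemgetter(2)):
    group = list(group)
    title = group[0][0]
    books.append((book_id, title, [surname for _, surname, _ in group]))
```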

You can optimize some of these SQL queries if you like.
Optimizing JOINs (which is what these are) is a serious
business, but for piddly databases of this size it really
isn't necessary.

Alan
--
Defendit numerus
Apr 10 '06 #4

In article <11**********************@v46g2000cwv.googlegroups.com>,
"SR" <sh**@shay-riggs.fsnet.co.uk> wrote:
> The script works fine, if a little slow. I think that's because if I
> have 50 books in my database, my script performs 51 database queries (1
> for all book names; then 1 for each book)


If your database is that small, why bother with sophisticated
relational-database queries at all? Why not just load everything into
memory, and use sets and dicts and things to put it all together?

This is feasible for databases with up to thousands or even tens of
thousands of records in them, on today's machines.
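[A sketch of that all-in-memory approach, with dicts and a list standing
in for the three tables loaded by three plain SELECTs; the data below is
hypothetical:]

```python
# Whole tables pulled into memory with three simple SELECTs
books = {1: "Puppetry", 2: "A Book"}             # book_id -> title
authors = {10: "Bill", 11: "Ben", 12: "Who"}     # author_id -> surname
bookauthors = [(1, 10), (1, 11), (2, 12)]        # (book_id, author_id)

# The "join" is then just dictionary lookups over the link table
shelf = {}
for book_id, author_id in bookauthors:
    shelf.setdefault(books[book_id], []).append(authors[author_id])
```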
Apr 11 '06 #5

SR
Martin Christensen said:
<SNIP>
> That's one query, if you're willing to make it advanced enough,
> although you need to make an aggregate to enable PostgreSQL to
> concatenate and comma separate author names.
<SNIP>
> This is the solution that I would use after working nearly a decade
> with databases. It is neither simple nor obvious to the novice, but
> it's the Right Way To Do It.
<SNIP>


Thanks for that... I'm not going to argue with a decade's experience!
I'd never heard of aggregates before, but I'll look into them. Perhaps
I'll be able to impress my friends with them one day.

The reason for keeping the authors separate was to wrap them with an
appropriate HTML href, but presumably your solution could be adapted
for this purpose?

Cheers,

Shay

Apr 11 '06 #6

SR
>> Say I have three tables.

> Only three? <G>
Well, yeah, OK, it's more than that, but after years of being worn away
by "Post a minimal example" requests on comp.text.tex, a minimal
example is what you got...
> Something like {untested... Might need to do a subselect for the
> second JOIN}:
>
> SELECT books.book_id, title, subtitle, ISBN, surname, firstname, notes
> FROM books
> LEFT OUTER JOIN bookauthors ON books.book_id = bookauthors.book_id
> LEFT OUTER JOIN authors ON bookauthors.author_id = authors.author_id
> ORDER BY books.book_id
>
> The reason for the LEFT OUTER JOIN, if I recall the syntax, is to
> ensure that you get any books that don't have any authors. The sort
> order is to make sure the records are grouped properly for later
> processing.
Thanks for the stuff on LEFT OUTER JOIN. Authorless books would be one
of those things I wouldn't have noticed going astray.
> The output will duplicate the book information for those books that
> have multiple authors (the simple meaning of "unnormalized"):
>
> 2, A Book, Of Nothing, 123, Who, Guess, something
> 2, A Book, Of Nothing, 123, Second, I'm, or other
I think this goes along with what I thought of immediately after
posting the question: one query to gather all info needed, then
post-process in Python to order it all (so *that's* why I posted
here...). My thoughts had been to turn

[ 1, "Puppetry", "Bill" ]
[ 1, "Puppetry", "Ben" ]
[ 1, "Puppetry", "Flowerpot Men" ]

into

[ 1, "Puppetry", [ "Bill", "Ben", "Flowerpot Men" ] ]

(if that's not overcomplicating it a bit)...
> To make your report, you would output the book-specific information
> only when it changes (this means you need to initialize a temp record to
> null data, and compare each record to the temp; when the compare fails,
> put out the new book data, and copy it into the temp -- in this example,
> just saving the book ID number would be sufficient, as long as it is a
> unique/primary key). THEN, put out the author information. If the
> comparison of book data passes, it is the same book with an additional
> author, and you just need to output the author data.
>
> tmp_bookID = None
> for bk in theCursor:
>     if tmp_bookID != bk[0]:  # assumes book_id is the first field
>         Output_Book_Data(bk)
>         tmp_bookID = bk[0]
>     Output_Author_Data(bk)


... which appears to be along the lines of what your code does! (Where
Output_Author_Data(bk) could append to the author list of the current
book.)

I'll go away and see how I can 'adapt' your example code.

Thanks!

Shay

Apr 11 '06 #7
