Bytes IT Community

How can I reduce the number of queries to my PostgreSQL database?

SR
As a starter project for learning Python/PostgreSQL, I am building a
Books database that stores information on the books on my bookshelf.

Say I have three tables.

Table "books" has columns book_id, title, subtitle, ISBN.

Table "authors" has columns author_id, author surname, author
first names, biographical notes.

Table "bookauthors" has two columns: book_id, author_id.

The bookauthors table links the books and authors tables.

Scenario: I have a python script which creates web page listing all
books in the database, and all authors for each book. My python script
does essentially three things:

1. retrieve a list of all book_ids and book_titles.

2. for each book_id, query the bookauthors table and retrieve all
author names for that book_id.

3. display it all out as an html table on a web page.

The script works fine, if a little slow. I think that's because if I
have 50 books in my database, my script performs 51 database queries (1
for all book names; then 1 for each book). A colleague of mine
suggested that I could get away with two queries, 1 to read the book
ids and titles, and 1 to read the bookauthors table to pull in *all*
relations, and then do all the work in Python.

I think I know where he's coming from, but I don't know where to begin.
Any clues? Is there a specific name for this technique?
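[For reference, the colleague's two-query idea can be sketched in plain
Python. The sample rows below are hypothetical stand-ins for what the two
cursors would return; only two round trips to the database are needed,
and the per-book lookup happens in a dictionary instead:]

```python
# Hypothetical results of the two queries (stand-ins for cursor.fetchall()):
# Query 1: SELECT book_id, title FROM books
books = [(1, "Puppetry"), (2, "A Book")]
# Query 2: SELECT bookauthors.book_id, authors.surname
#          FROM bookauthors JOIN authors
#            ON authors.author_id = bookauthors.author_id
book_authors = [(1, "Bill"), (1, "Ben"), (2, "Who")]

# One pass over the link rows builds a book_id -> [surname, ...] map
authors_by_book = {}
for book_id, surname in book_authors:
    authors_by_book.setdefault(book_id, []).append(surname)

# One pass over the books pairs each title with its author list
listing = [(title, authors_by_book.get(book_id, []))
           for book_id, title in books]
```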

Apr 8 '06 #1
6 Replies



SR wrote:
<SNIP>


The specific name you are looking for is a 'join'. There are
many references and tutorials available, but I suggest you start
with the PostgreSQL tutorial, which is part of the documentation
supplied with PostgreSQL.

Here is a link to the 'join' command in the online manual.

http://www.postgresql.org/docs/8.1/i...rial-join.html

HTH

Frank Millman

Apr 8 '06 #2

P: n/a
>>>>> "SR" == <sh**@shay-riggs.fsnet.co.uk> writes:
SR> Scenario: I have a python script which creates web page listing
SR> all books in the database, and all authors for each book. My
SR> python script does essentially three things:

SR> 1. retrieve a list of all book_ids and book_titles.

SR> 2. for each book_id, query the bookauthors table and retrieve all
SR> author names for that book_id.

SR> 3. display it all out as an html table on a web page.

That's one query, if you're willing to make it advanced enough,
although you need to define an aggregate so that PostgreSQL can
concatenate and comma-separate the author names. Such an aggregate
typically needs more than one database function, and could look as
follows:

CREATE OR REPLACE FUNCTION author_agg_sfunc(TEXT, TEXT)
RETURNS TEXT AS '
    SELECT $1 || '', '' || $2;
' LANGUAGE sql;

-- with an empty initcond the leftover separator ends up leading,
-- not trailing, so the final function must trim it from the front
CREATE OR REPLACE FUNCTION author_agg_ffunc(TEXT)
RETURNS TEXT AS '
    SELECT trim(leading '', '' from $1);
' LANGUAGE sql;

CREATE AGGREGATE author_agg (
    basetype = TEXT,
    sfunc = author_agg_sfunc,
    stype = TEXT,
    finalfunc = author_agg_ffunc,
    initcond = ''
);

Then you could use it as follows:

SELECT author_agg(authors.name),
       foo,
       bar
FROM authors, writes, books
WHERE authors.id = writes.author_id
AND writes.book_id = books.id
GROUP BY foo, bar;
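[For readers more at home in Python, here is a rough equivalent of what
the aggregate computes, with hypothetical rows standing in for the
grouped query results; the three parts of the fold mirror initcond,
sfunc and finalfunc:]

```python
# Joined rows as the GROUP BY would see them: (book title, author name)
rows = [("Puppetry", "Bill"), ("Puppetry", "Ben"), ("A Book", "Who")]

state = {}                                # one running state per group
for title, author in rows:
    s = state.get(title, "")              # initcond = ''
    state[title] = s + ", " + author      # sfunc: $1 || ', ' || $2
# finalfunc: drop the separator left over from the empty initial state
result = {title: s[2:] for title, s in state.items()}
```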

This is the solution that I would use after working nearly a decade
with databases. It is neither simple nor obvious to the novice, but
it's the Right Way To Do It. For a learning exercise, this is way over
the top, but I thought you might benefit from seeing that - so long as
you only need information that would reasonably fit in one table on a
web page or the equivalent - one query is always enough. Or perhaps
that should be One Query Is Always Enough. :-) Learn at your own pace,
but you might want to keep this in mind for future reference.

Martin
Apr 8 '06 #3

In article <11**********************@v46g2000cwv.googlegroups.com>,
SR <sh**@shay-riggs.fsnet.co.uk> wrote:
<SNIP>


Yup. The technique is called "using a relational database". This is
precisely the sort of thing SQL does well. Let's say you want to find
out who wrote 'The Hitchhikers Guide to the Galaxy'. You could do the
following (all sql untested and, let's face it, probably not understood
by author):

1. Query for that book to get the book_id
SELECT book_id FROM books WHERE title='The Hitchhikers Guide To The Galaxy'

2. Look up that book's author ids in the bookauthors table
SELECT author_id FROM bookauthors WHERE book_id=<book id>

3. Look up that author in the authors table
SELECT surname FROM authors WHERE author_id=<author id>

or do

SELECT surname FROM authors, books, bookauthors
WHERE books.book_id=bookauthors.book_id
AND authors.author_id=bookauthors.author_id
AND title='The Hitchhikers Guide To The Galaxy'

Slick, no? You want something like:

SELECT title, surname, books.book_id FROM authors, books, bookauthors
WHERE books.book_id=bookauthors.book_id
AND authors.author_id=bookauthors.author_id

If you have more than one author for a book then the book will
appear in the result multiple times. You'll have to combine
those yourself (the book_id column can help here; I don't know
if you can leverage more SQL for that job).
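[One way to do that combining in Python is itertools.groupby, which
merges adjacent rows sharing a key; it only works if the rows arrive
sorted by book_id, so the query needs an ORDER BY. The rows below are
hypothetical:]

```python
from itertools import groupby
from operator import itemgetter

# Rows as (title, surname, book_id), already ordered by book_id --
# groupby only merges *adjacent* rows, so the ORDER BY is essential
rows = [("A Book", "Who", 2), ("A Book", "Second", 2), ("Other", "Smith", 3)]

books = []
for book_id, group in groupby(rows, key=itemgetter(2)):
    group = list(group)
    title = group[0][0]
    books.append((book_id, title, [surname for _, surname, _ in group]))
```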

You can optimize some of these SQL queries if you like.
Optimizing JOINs (which is what these are) is a serious
business, but for piddly databases of this size it really
isn't necessary.

Alan
--
Defendit numerus
Apr 10 '06 #4

In article <11**********************@v46g2000cwv.googlegroups.com>,
"SR" <sh**@shay-riggs.fsnet.co.uk> wrote:
> The script works fine, if a little slow. I think that's because if I
> have 50 books in my database, my script performs 51 database queries (1
> for all book names; then 1 for each book)


If your database is that small, why bother with sophisticated
relational-database queries at all? Why not just load everything into
memory, and use sets and dicts and things to put it all together?

This is feasible for databases with up to thousands or even tens of
thousands of records in them, on today's machines.
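[A sketch of that all-in-memory approach, with dicts and a list standing
in for the three tables loaded by three plain SELECTs; the data below is
hypothetical:]

```python
# Whole tables pulled into memory with three simple SELECTs
books = {1: "Puppetry", 2: "A Book"}             # book_id -> title
authors = {10: "Bill", 11: "Ben", 12: "Who"}     # author_id -> surname
bookauthors = [(1, 10), (1, 11), (2, 12)]        # (book_id, author_id)

# The "join" is then just dictionary lookups over the link table
shelf = {}
for book_id, author_id in bookauthors:
    shelf.setdefault(books[book_id], []).append(authors[author_id])
```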
Apr 11 '06 #5

SR
Martin Christensen said:
<SNIP>
> That's one query, if you're willing to make it advanced enough,
> although you need to make an aggregate to enable PostgreSQL to
> concatenate and comma separate author names.
<SNIP>
> This is the solution that I would use after working nearly a decade
> with databases. It is neither simple nor obvious to the novice, but
> it's the Right Way To Do It.
<SNIP>


Thanks for that... I'm not going to argue with a decade's experience!
I'd never heard of aggregates before, but I'll look into them. Perhaps
I'll be able to impress my friends with them one day.

The reason for keeping the authors separate was to wrap them with an
appropriate HTML href, but presumably your solution could be adapted
for this purpose?

Cheers,

Shay

Apr 11 '06 #6

SR
>> Say I have three tables.

> Only three? <G>
Well, yeah, OK, it's more than that, but after years of being worn away
by "Post a minimal example" requests on comp.text.tex, a minimal
example is what you got...
> Something like {untested... Might need to do a subselect for the
> second JOIN}:
>
> SELECT books.book_id, title, subtitle, ISBN, surname, firstname, notes
> FROM books
> LEFT OUTER JOIN bookauthors ON books.book_id = bookauthors.book_id
> LEFT OUTER JOIN authors ON bookauthors.author_id = authors.author_id
> ORDER BY books.book_id
>
> The reason for the LEFT OUTER JOIN, if I recall the syntax, is to
> ensure that you get any books that don't have any authors. The sort
> order is to make sure the records are grouped properly for later
> processing.
Thanks for the stuff on LEFT OUTER JOIN. Authorless books would be one
of those things I wouldn't have noticed going astray.
> The output will duplicate the book information for those books that
> have multiple authors (the simple meaning of "unnormalized"):
>
> 2, A Book, Of Nothing, 123, Who, Guess, something
> 2, A Book, Of Nothing, 123, Second, I'm, or other
I think this goes along with what I thought of immediately after
posting the question: one query to gather all info needed, then
post-process in Python to order it all (so *that's* why I posted
here...). My thoughts had been to turn

[ 1, "Puppetry", "Bill" ]
[ 1, "Puppetry", "Ben" ]
[ 1, "Puppetry", "Flowerpot Men" ]

into

[ 1, "Puppetry", [ "Bill", "Ben", "Flowerpot Men" ] ]

(if that's not overcomplicating it a bit)...
> To make your report, you would output the book-specific information
> only when it changes (this means you need to initialize a temp record to
> null data, and compare each record to the temp; when the compare fails,
> put out the new book data, and copy it into the temp -- in this example,
> just saving the book ID number would be sufficient, as long as it is a
> unique/primary key). THEN, put out the author information. If the
> comparison of book data passes, it is the same book with an additional
> author, and you just need to output the author data.
>
> tmp_bookID = None
> for bk in theCursor:
>     if tmp_bookID != bk[0]:  # assumes book_id is the first field
>         Output_Book_Data(bk)
>         tmp_bookID = bk[0]
>     Output_Author_Data(bk)


... which appears to be along the lines of what your code does! (Where
Output_Author_Data(bk) could append to the author list of the current
book.)

I'll go away and see how I can 'adapt' your example code.

Thanks!

Shay

Apr 11 '06 #7
