As a starter project for learning Python/PostgreSQL, I am building a
Books database that stores information on the books on my bookshelf.
Say I have three tables.
Table "books" contains columns for book_id, title, subtitle, ISBN.
Table "authors" contains columns for author_id, author surname, author
first names, biographical notes.
Table "bookauthors" contains two columns: book_id, author_id.
The bookauthors table links the books and authors tables.
Scenario: I have a Python script which creates a web page listing all
books in the database, and all authors for each book. My Python script
does essentially three things:
1. retrieve a list of all book_ids and book_titles.
2. for each book_id, query the bookauthors table and retrieve all
author names for that book_id.
3. display it all out as an html table on a web page.
The script works fine, if a little slow. I think that's because if I
have 50 books in my database, my script performs 51 database queries (1
for all book names; then 1 for each book). A colleague of mine
suggested that I could get away with two queries, 1 to read the book
ids and titles, and 1 to read the bookauthors table to pull in *all*
relations, and then do all the work in Python.
I think I know where he's coming from, but I don't know where to begin.
Any clues? Is there a specific name for this technique?
SR wrote:
> I think I know where he's coming from, but I don't know where to begin. Any clues? Is there a specific name for this technique?
The specific name you are looking for is to 'join' tables. There will
be many references and tutorials available, but I suggest you start
with the PostgreSQL tutorial, which is part of the documentation
supplied with PostgreSQL.
Here is a link to the 'join' command in the online manual. http://www.postgresql.org/docs/8.1/i...rial-join.html
HTH
Frank Millman
>>>>> "SR" == <sh**@shay-riggs.fsnet.co.uk> writes:
SR> Scenario: I have a python script which creates web page listing
SR> all books in the database, and all authors for each book. My
SR> python script does essentially three things:
SR> 1. retrieve a list of all book_ids and book_titles.
SR> 2. for each book_id, query the bookauthors table and retrieve all
SR> author names for that book_id.
SR> 3. display it all out as an html table on a web page.
That's one query, if you're willing to make it advanced enough,
although you need to make an aggregate to enable PostgreSQL to
concatenate and comma separate author names. However, this aggregate
will typically need more than one database function. Such an aggregate
could be as follows:
-- State function: append ", <name>" to the running text.
CREATE OR REPLACE FUNCTION author_agg_sfunc(TEXT, authors.name%TYPE)
RETURNS TEXT AS '
    SELECT $1 || '', '' || $2;
' LANGUAGE sql;

-- Final function: the state starts out empty, so the result carries a
-- leading ", " that has to be stripped.
CREATE OR REPLACE FUNCTION author_agg_ffunc(TEXT)
RETURNS TEXT AS '
    SELECT trim(leading '', '' from $1);
' LANGUAGE sql;

CREATE AGGREGATE author_agg (
    basetype = VARCHAR(100),
    sfunc = author_agg_sfunc,
    stype = TEXT,
    finalfunc = author_agg_ffunc,
    initcond = ''
);
Then you could use it as follows:
SELECT author_agg(authors.name),
foo,
bar
FROM authors, writes, books
WHERE authors.id = writes.author_id
AND writes.book_id = books.id
GROUP BY foo, bar;
This is the solution that I would use after working nearly a decade
with databases. It is neither simple nor obvious to the novice, but
it's the Right Way To Do It. For a learning exercise, this is way over
the top, but I thought you might benefit from seeing that - so long as
you only need information that would reasonably fit in one table on a
web page or the equivalent - one query is always enough. Or perhaps
that should be One Query Is Always Enough. :-) Learn at your own pace,
though, but you might want to keep this in mind for future reference.
Martin
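For readers who want to try the idea without a PostgreSQL server, here is a minimal sketch of the same pattern using Python's built-in sqlite3 module as a stand-in (an assumption; the thread itself is about PostgreSQL). The class name AuthorAgg and the sample rows are made up for illustration. The step() method plays roughly the role of the state function and finalize() the role of the final function; the table names follow the authors/writes/books query above.

```python
import sqlite3


class AuthorAgg:
    """Rough analogue of author_agg: collect names, then join them."""

    def __init__(self):
        self.names = []              # running state, starts out empty

    def step(self, name):
        if name is not None:         # skip NULLs
            self.names.append(name)

    def finalize(self):
        # Sorted so the output does not depend on row order.
        return ", ".join(sorted(self.names))


conn = sqlite3.connect(":memory:")
conn.create_aggregate("author_agg", 1, AuthorAgg)
conn.executescript("""
    CREATE TABLE books   (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE writes  (book_id INTEGER, author_id INTEGER);
    INSERT INTO books   VALUES (1, 'Puppetry'), (2, 'Solo Work');
    INSERT INTO authors VALUES (1, 'Bill'), (2, 'Ben'), (3, 'Ann');
    INSERT INTO writes  VALUES (1, 1), (1, 2), (2, 3);
""")

# One query: one result row per book, with its authors concatenated.
rows = conn.execute("""
    SELECT books.id, books.title, author_agg(authors.name)
    FROM authors, writes, books
    WHERE authors.id = writes.author_id
      AND writes.book_id = books.id
    GROUP BY books.id, books.title
    ORDER BY books.id
""").fetchall()

for book_id, title, names in rows:
    print(book_id, title, names)
```

Joining a list in finalize() sidesteps the leading-separator cleanup that the string-concatenating version needs.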
In article <11**********************@v46g2000cwv.googlegroups.com>,
SR <sh**@shay-riggs.fsnet.co.uk> wrote:
> I think I know where he's coming from, but I don't know where to begin. Any clues? Is there a specific name for this technique?
Yup. The technique is called "using a relational database". This is
precisely the sort of thing SQL does well. Let's say you want to find
out who wrote 'The Hitchhikers Guide to the Galaxy'. You could do the
following (all sql untested and, let's face it, probably not understood
by author):
1. Query for that book to get the book id
SELECT id FROM books WHERE title='The Hitchhikers Guide To The Galaxy'
2. Look up the author id for that book in the bookauthors table
SELECT author_id FROM bookauthors WHERE book_id=<book id>
3. Look up that author in the authors table
SELECT name FROM authors WHERE id=<author id>
or do
SELECT name FROM authors, books, bookauthors
WHERE books.id=bookauthors.book_id
AND authors.id=bookauthors.author_id
AND title='The Hitchhikers Guide To The Galaxy'
Slick, no? You want something like:
SELECT title, name, book_id FROM authors, books, bookauthors
WHERE books.id=bookauthors.book_id
AND authors.id=bookauthors.author_id
If you have more than one author for a book then the book will
appear in the result multiple times. You'll have to combine
those yourself (the book_id column can help here; I don't know
if you can leverage more SQL for that job).
You can optimize some of these SQL queries if you like.
Optimizing JOINs (which is what these are) is a serious
business, but for piddly databases of this size it really
isn't necessary.
Alan
--
Defendit numerus
In article <11**********************@v46g2000cwv.googlegroups.com>,
"SR" <sh**@shay-riggs.fsnet.co.uk> wrote:
> The script works fine, if a little slow. I think that's because if I have 50 books in my database, my script performs 51 database queries (1 for all book names; then 1 for each book).
If your database is that small, why bother with sophisticated
relational-database queries at all? Why not just load everything into
memory, and use sets and dicts and things to put it all together?
This is feasible for databases with up to thousands or even tens of
thousands of records in them, on today's machines.
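A minimal sketch of that in-memory approach, assuming each table has already been fetched in full (one query per table). The three lists below are made-up stand-ins for what cursor.fetchall() might return, and the names author_name, authors_of and listing are illustrative only.

```python
from collections import defaultdict

# Hypothetical results of "SELECT ..." on each table, one query apiece.
books = [(1, "Puppetry"), (2, "Gardening"), (3, "Anonymous Tales")]
authors = [(10, "Bill"), (11, "Ben"), (12, "Weed")]
bookauthors = [(1, 10), (1, 11), (2, 12)]

# author_id -> name, for constant-time lookups
author_name = dict(authors)

# book_id -> list of author names, built in a single pass over the link table
authors_of = defaultdict(list)
for book_id, author_id in bookauthors:
    authors_of[book_id].append(author_name[author_id])

# Final listing: every book, including any without authors.
listing = [(title, authors_of.get(book_id, [])) for book_id, title in books]

for title, names in listing:
    print(title, "-", ", ".join(names) or "(no authors)")
```

The number of queries stays fixed at three however many books there are; the joining work simply moves from the database into Python dictionaries.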
Martin Christensen said:
> SR> Scenario: I have a python script which creates web page listing all books in the database, and all authors for each book. My python script does essentially three things:
> SR> 1. retrieve a list of all book_ids and book_titles.
> SR> 2. for each book_id, query the bookauthors table and retrieve all author names for that book_id.
> SR> 3. display it all out as an html table on a web page.
> That's one query, if you're willing to make it advanced enough, although you need to make an aggregate to enable PostgreSQL to concatenate and comma separate author names. However, this aggregate will typically need more than one database function. Such an aggregate could be as follows:
> <SNIP>
> This is the solution that I would use after working nearly a decade with databases. It is neither simple nor obvious to the novice, but it's the Right Way To Do It. For a learning exercise, this is way over the top, but I thought you might benefit from seeing that - so long as you only need information that would reasonably fit in one table on a web page or the equivalent - one query is always enough. Or perhaps that should be One Query Is Always Enough. :-) Learn at your own pace, though, but you might want to keep this in mind for future reference.
Thanks for that... I'm not going to argue with a decade's experience!
I'd never heard of aggregates before, but I'll look into them. Perhaps
I'll be able to impress my friends with them one day.
The reason for keeping the authors separate was to wrap them with an
appropriate HTML href, but presumably your solution could be adapted
for this purpose?
Cheers,
Shay
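One possible way to keep the per-author links, sketched under the assumption that the query hands back (author_id, name) pairs for each book rather than a pre-joined string. The function name author_links and the /author/<id> URL pattern are hypothetical, not something from the thread.

```python
from html import escape


def author_links(authors):
    """Build a comma-separated run of links from (author_id, name) pairs."""
    return ", ".join(
        '<a href="/author/{}">{}</a>'.format(int(author_id), escape(name))
        for author_id, name in authors
    )


html = author_links([(10, "Bill"), (11, "Ben & Co")])
print(html)
```

Escaping the name keeps characters such as & from breaking the generated page.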
>> Say I have three tables.
> Only three? <G>
Well, yeah, OK, it's more than that, but after years of being worn away
by "Post a minimal example" requests on comp.text.tex, a minimal
example is what you got...
> Something like {untested}:
> SELECT books.book_id, title, subtitle, ISBN, surname, firstname, notes
> FROM books
> LEFT OUTER JOIN bookauthors ON books.book_id = bookauthors.book_id
> LEFT OUTER JOIN authors ON bookauthors.author_id = authors.author_id
> ORDER BY books.book_id
> The reason for the LEFT OUTER JOINs, if I recall the syntax, is to ensure that you get any books that don't have any authors (a plain JOIN to authors would drop them again). The sort order is to make sure the records are grouped properly for later processing.
Thanks for the stuff on LEFT OUTER JOIN. Authorless books would be one
of those things I wouldn't have noticed going astray.
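To see the authorless-book effect concretely, here is a small sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made so the example runs anywhere). The sample rows are invented. Note that both joins are outer joins in the second query; an inner join to authors would drop the authorless row again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books       (book_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE authors     (author_id INTEGER PRIMARY KEY, surname TEXT);
    CREATE TABLE bookauthors (book_id INTEGER, author_id INTEGER);
    INSERT INTO books       VALUES (1, 'Puppetry'), (2, 'Anonymous Tales');
    INSERT INTO authors     VALUES (1, 'Bill');
    INSERT INTO bookauthors VALUES (1, 1);
""")

# Inner joins: a book with no bookauthors row silently disappears.
inner = conn.execute("""
    SELECT books.title, authors.surname
    FROM books
    JOIN bookauthors ON books.book_id = bookauthors.book_id
    JOIN authors ON bookauthors.author_id = authors.author_id
    ORDER BY books.book_id
""").fetchall()

# Outer joins: every book is kept; missing author columns come back as None.
outer = conn.execute("""
    SELECT books.title, authors.surname
    FROM books
    LEFT OUTER JOIN bookauthors ON books.book_id = bookauthors.book_id
    LEFT OUTER JOIN authors ON bookauthors.author_id = authors.author_id
    ORDER BY books.book_id
""").fetchall()

print(inner)
print(outer)
```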
> The output will duplicate the book information for those books that have multiple authors (the simple meaning of "unnormalized"):
> 2, A Book, Of Nothing, 123, Who, Guess, something
> 2, A Book, Of Nothing, 123, Second, I'm, or other
I think this goes along with what I thought of immediately after
posting the question: one query to gather all info needed, then
post-process in Python to order it all (so *that's* why I posted
here...). My thoughts had been to turn
[ 1, "Puppetry", "Bill" ]
[ 1, "Puppetry", "Ben" ]
[ 1, "Puppetry", "Flowerpot Men" ]
into
[ 1, "Puppetry", [ "Bill", "Ben", "Flowerpot Men" ] ]
(if that's not overcomplicating it a bit)...
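That reshaping can be sketched with itertools.groupby, assuming the rows arrive already sorted by book id (for example via ORDER BY), since groupby only merges adjacent rows. The second book in the sample data is made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Flat rows as they might come back from the joined query, sorted by book id.
rows = [
    [1, "Puppetry", "Bill"],
    [1, "Puppetry", "Ben"],
    [1, "Puppetry", "Flowerpot Men"],
    [2, "Gardening", "Weed"],
]

# Group on (book_id, title) and collect the third field of each row.
nested = [
    [book_id, title, [row[2] for row in group]]
    for (book_id, title), group in groupby(rows, key=itemgetter(0, 1))
]

print(nested)
```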
> To make your report, you would output the book specific information only when it changes (this means you need to initialize a temp record to null data, and compare each record to the temp; when the compare fails, put out the new book data, and copy it into the temp -- in this example, just saving the book ID number would be sufficient, as long as it is a unique/primary key). THEN, put out the Author information. If the comparison of book data passes, it is the same book with an additional author, you just need to output the author data.
>
> tmp_bookID = None
> for bk in theCursor:
>     if tmp_bookID != bk[0]:  # assumes book_id is first field
>         Output_Book_Data(bk)
>         tmp_bookID = bk[0]
>     Output_Author_Data(bk)
.... which appears to be along the lines of what your code does! (Where
Output_Author_Data(bk) could append to the author list of the current
book.)
I'll go away and see how I can 'adapt' your example code.
Thanks!
Shay