By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
432,385 Members | 847 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 432,385 IT Pros & Developers. It's quick & easy.

LEFT JOIN on very large tables

P: n/a
My database has two tables - films has 10,000 records and reviews has 20,000 records. Whenever I try to list all the films and the
count of its reviews, MySQL locks and I have to restart the service.

Here's the SQL...

SELECT films.*, COUNT(reviews.id)
FROM films LEFT JOIN reviews ON films.id = reviews.filmID
GROUP BY films.id
LIMIT 0, 100;

JOIN works but doesn't return records for films with no reviews. Please help!

--

StealthBanana
Jul 17 '05 #1
Share this Question
Share on Google+
13 Replies


P: n/a
In article <d7**********@news8.svr.pol.co.uk>,
"StealthBananaT" <no***@nowhere.com> wrote:
My database has two tables - films has 10,000 records and reviews has 20,000
records. Whenever I try to list all the films and the
count of its reviews, MySQL locks and I have to restart the service.

Here's the SQL...

SELECT films.*, COUNT(reviews.id)
FROM films LEFT JOIN reviews ON films.id = reviews.filmID
GROUP BY films.id
LIMIT 0, 100;

JOIN works but doesn't return records for films with no reviews. Please
help!


Are films.id and reviews.filmID indexes?

--
DeeDee, don't press that button! DeeDee! NO! Dee...

Jul 17 '05 #2

P: n/a
"StealthBananaT" <no***@nowhere.com> wrote:
My database has two tables - films has 10,000 records and reviews has 20,000 records. Whenever I try to list all the films and the
count of its reviews, MySQL locks and I have to restart the service.

Here's the SQL...

SELECT films.*, COUNT(reviews.id)
FROM films LEFT JOIN reviews ON films.id = reviews.filmID
GROUP BY films.id
LIMIT 0, 100;

JOIN works but doesn't return records for films with no reviews. Please help!


This shouldn't work at all unless "id" is the only field in "films". The
GROUP BY clause must include every field in the SELECT list that isn't in
an aggregate function.

Do you have indexes on films.id, reviews.id, and reviews.filmID?
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Jul 17 '05 #3

P: n/a
"Tim Roberts" <ti**@probo.com> wrote in message news:b6********************************@4ax.com...
"StealthBananaT" <no***@nowhere.com> wrote:
My database has two tables - films has 10,000 records and reviews has 20,000 records. Whenever I try to list all the films and
the
count of its reviews, MySQL locks and I have to restart the service.

Here's the SQL...

SELECT films.*, COUNT(reviews.id)
FROM films LEFT JOIN reviews ON films.id = reviews.filmID
GROUP BY films.id
LIMIT 0, 100;

JOIN works but doesn't return records for films with no reviews. Please help!


This shouldn't work at all unless "id" is the only field in "films". The
GROUP BY clause must include every field in the SELECT list that isn't in
an aggregate function.

Do you have indexes on films.id, reviews.id, and reviews.filmID?


The code definitely works - I've tried it with fewer records.

There's no index on reviews.filmID. That is a foreign key so that I can follow which film each review is of.

--

StealthBanana
Jul 17 '05 #4

P: n/a
"Michael Vilain" <vi****@spamcop.net> wrote in message news:vi**************************@comcast.dca.giga news.com...
In article <d7**********@news8.svr.pol.co.uk>,
"StealthBananaT" <no***@nowhere.com> wrote:
My database has two tables - films has 10,000 records and reviews has 20,000
records. Whenever I try to list all the films and the
count of its reviews, MySQL locks and I have to restart the service.

Here's the SQL...

SELECT films.*, COUNT(reviews.id)
FROM films LEFT JOIN reviews ON films.id = reviews.filmID
GROUP BY films.id
LIMIT 0, 100;

JOIN works but doesn't return records for films with no reviews. Please
help!


Are films.id and reviews.filmID indexes?


Both tables have id as their index. filmID is a foreign key so that I know which film the review is of. Here's the SQL that
created both tables...

CREATE TABLE `films` (
`id` int(10) unsigned NOT NULL auto_increment,
`title` varchar(100) NOT NULL default '',
PRIMARY KEY (`id`)
)

CREATE TABLE `reviews` (
`id` int(10) unsigned NOT NULL auto_increment,
`filmID` int(10) unsigned NOT NULL default '0',
`author` varchar(20) NOT NULL default '',
`date` int(10) unsigned NOT NULL default '0',
`region` tinyint(1) unsigned NOT NULL default '0',
`review` longtext NOT NULL,
`rating` tinyint(1) unsigned NOT NULL default '0',
PRIMARY KEY (`id`)
)

--

StealthBanana
Jul 17 '05 #5

P: n/a
On Wed, 25 May 2005 12:52:08 +0100, StealthBananaT wrote:
My database has two tables - films has 10,000 records and reviews has 20,000 records.
SELECT films.*, COUNT(reviews.id)


The code definitely works - I've tried it with fewer records.


Try restricting the number of fields you select from films to just the
title, for example.
--
Firefox Web Browser - Rediscover the web - http://getffox.com/
Thunderbird E-mail and Newsgroups - http://gettbird.com/
Jul 17 '05 #6

P: n/a
On Wed, 25 May 2005 13:54:17 +0200, I wrote:
Try restricting the number of fields you select from films to just the
title, for example.


Oh, I see on your other reply `films` *is* just the title... (and id).
--
Firefox Web Browser - Rediscover the web - http://getffox.com/
Thunderbird E-mail and Newsgroups - http://gettbird.com/
Jul 17 '05 #7

P: n/a
In article <d7**********@newsg3.svr.pol.co.uk>,
"StealthBananaT" <no***@nowhere.com> wrote:
"Michael Vilain" <vi****@spamcop.net> wrote in message
"news:vi**************************@comcast.dca.gig anews.com...
In article <d7**********@news8.svr.pol.co.uk>,
"StealthBananaT" <no***@nowhere.com> wrote:
My database has two tables - films has 10,000 records and reviews has
20,000
records. Whenever I try to list all the films and the
count of its reviews, MySQL locks and I have to restart the service.

Here's the SQL...

SELECT films.*, COUNT(reviews.id)
FROM films LEFT JOIN reviews ON films.id = reviews.filmID
GROUP BY films.id
LIMIT 0, 100;

JOIN works but doesn't return records for films with no reviews. Please
help!


Are films.id and reviews.filmID indexes?


Both tables have id as their index. filmID is a foreign key so that I know
which film the review is of. Here's the SQL that
created both tables...

CREATE TABLE `films` (
`id` int(10) unsigned NOT NULL auto_increment,
`title` varchar(100) NOT NULL default '',
PRIMARY KEY (`id`)
)

CREATE TABLE `reviews` (
`id` int(10) unsigned NOT NULL auto_increment,
`filmID` int(10) unsigned NOT NULL default '0',
`author` varchar(20) NOT NULL default '',
`date` int(10) unsigned NOT NULL default '0',
`region` tinyint(1) unsigned NOT NULL default '0',
`review` longtext NOT NULL,
`rating` tinyint(1) unsigned NOT NULL default '0',
PRIMARY KEY (`id`)
)


In order to not do a linear table scan, you must create indices on all
fields that require random access or are part of a join. Create an
index for the reviews.filmID field.

--
DeeDee, don't press that button! DeeDee! NO! Dee...

Jul 17 '05 #8

P: n/a
But you're joining the reviews table on reviews.filmID -- all the more
reason to create an index on it. With no index, the database is likely
scanning the reviews table for every row in films...in other words,
it's scanning through 20,000 records 10,000 times over.

Jul 17 '05 #9

P: n/a
"ZeldorBlat" <ze********@gmail.com> wrote in message news:11**********************@g44g2000cwa.googlegr oups.com...
But you're joining the reviews table on reviews.filmID -- all the more
reason to create an index on it. With no index, the database is likely
scanning the reviews table for every row in films...in other words,
it's scanning through 20,000 records 10,000 times over.


I've added an index to filmID (I think), but I'm having the same problem.

According to MySQLCC, the SQL that would create the reviews table now looks like this

CREATE TABLE `reviews` (
`id` int(10) unsigned NOT NULL auto_increment,
`filmID` int(10) unsigned NOT NULL default '0',
`author` varchar(20) NOT NULL default '',
`date` int(10) unsigned NOT NULL default '0',
`region` tinyint(1) unsigned NOT NULL default '0',
`review` longtext NOT NULL,
`rating` tinyint(1) unsigned NOT NULL default '0',
PRIMARY KEY (`id`),
KEY `FOREIGN` (`filmID`)
);

Any help would be greatly appreciated.

--

StealthBanana
Jul 17 '05 #10

P: n/a
"Michael Vilain" <vi****@spamcop.net> wrote in message news:vi**************************@comcast.dca.giga news.com...
In article <d7**********@newsg3.svr.pol.co.uk>,
"StealthBananaT" <no***@nowhere.com> wrote:
"Michael Vilain" <vi****@spamcop.net> wrote in message
"news:vi**************************@comcast.dca.gig anews.com...
> In article <d7**********@news8.svr.pol.co.uk>,
> "StealthBananaT" <no***@nowhere.com> wrote:
>
>> My database has two tables - films has 10,000 records and reviews has
>> 20,000
>> records. Whenever I try to list all the films and the
>> count of its reviews, MySQL locks and I have to restart the service.
>>
>> Here's the SQL...
>>
>> SELECT films.*, COUNT(reviews.id)
>> FROM films LEFT JOIN reviews ON films.id = reviews.filmID
>> GROUP BY films.id
>> LIMIT 0, 100;
>>
>> JOIN works but doesn't return records for films with no reviews. Please
>> help!
>
> Are films.id and reviews.filmID indexes?
>


Both tables have id as their index. filmID is a foreign key so that I know
which film the review is of. Here's the SQL that
created both tables...

CREATE TABLE `films` (
`id` int(10) unsigned NOT NULL auto_increment,
`title` varchar(100) NOT NULL default '',
PRIMARY KEY (`id`)
)

CREATE TABLE `reviews` (
`id` int(10) unsigned NOT NULL auto_increment,
`filmID` int(10) unsigned NOT NULL default '0',
`author` varchar(20) NOT NULL default '',
`date` int(10) unsigned NOT NULL default '0',
`region` tinyint(1) unsigned NOT NULL default '0',
`review` longtext NOT NULL,
`rating` tinyint(1) unsigned NOT NULL default '0',
PRIMARY KEY (`id`)
)


In order to not do a linear table scan, you must create indices on all
fields that require random access or are part of a join. Create an
index for the reviews.filmID field.


I'd like to thank everybody for their help, but the problem remains. I've tried making reviews.filmID an index but that seemed to
have no effect.

I have a third table called users, and when I try to count the number of reviews submitted by users I have exactly the same
problem...

SELECT users.name, COUNT(reviews.id)
FROM users LEFT JOIN reviews ON users.name = reviews.author
GROUP BY users.name;

This is the users table...

CREATE TABLE `users` (
`name` varchar(20) NOT NULL default '',
`password` varchar(40) NOT NULL default '',
`level` tinyint(1) unsigned NOT NULL default '0',
`email` varchar(100) NOT NULL default '',
`sessionID` varchar(100) default '0',
PRIMARY KEY (`name`)
);

I'm guessing that the problem is linked to the sheer volume of the data in the tables.

--

StealthBanana
Jul 17 '05 #11

P: n/a
In article <d7**********@newsg3.svr.pol.co.uk>,
"StealthBananaT" <no***@nowhere.com> wrote:
"Michael Vilain" <vi****@spamcop.net> wrote in message
"news:vi**************************@comcast.dca.gig anews.com...
In article <d7**********@newsg3.svr.pol.co.uk>,
"StealthBananaT" <no***@nowhere.com> wrote:
"Michael Vilain" <vi****@spamcop.net> wrote in message
"news:vi**************************@comcast.dca.gig anews.com...
> In article <d7**********@news8.svr.pol.co.uk>,
> "StealthBananaT" <no***@nowhere.com> wrote:
>
>> My database has two tables - films has 10,000 records and reviews has
>> 20,000
>> records. Whenever I try to list all the films and the
>> count of its reviews, MySQL locks and I have to restart the service.
>>
>> Here's the SQL...
>>
>> SELECT films.*, COUNT(reviews.id)
>> FROM films LEFT JOIN reviews ON films.id = reviews.filmID
>> GROUP BY films.id
>> LIMIT 0, 100;
>>
>> JOIN works but doesn't return records for films with no reviews.
>> Please
>> help!
>
> Are films.id and reviews.filmID indexes?
>

Both tables have id as their index. filmID is a foreign key so that I
know
which film the review is of. Here's the SQL that
created both tables...

CREATE TABLE `films` (
`id` int(10) unsigned NOT NULL auto_increment,
`title` varchar(100) NOT NULL default '',
PRIMARY KEY (`id`)
)

CREATE TABLE `reviews` (
`id` int(10) unsigned NOT NULL auto_increment,
`filmID` int(10) unsigned NOT NULL default '0',
`author` varchar(20) NOT NULL default '',
`date` int(10) unsigned NOT NULL default '0',
`region` tinyint(1) unsigned NOT NULL default '0',
`review` longtext NOT NULL,
`rating` tinyint(1) unsigned NOT NULL default '0',
PRIMARY KEY (`id`)
)


In order to not do a linear table scan, you must create indices on all
fields that require random access or are part of a join. Create an
index for the reviews.filmID field.


I'd like to thank everybody for their help, but the problem remains. I've
tried making reviews.filmID an index but that seemed to
have no effect.

I have a third table called users, and when I try to count the number of
reviews submitted by users I have exactly the same
problem...

SELECT users.name, COUNT(reviews.id)
FROM users LEFT JOIN reviews ON users.name = reviews.author
GROUP BY users.name;

This is the users table...

CREATE TABLE `users` (
`name` varchar(20) NOT NULL default '',
`password` varchar(40) NOT NULL default '',
`level` tinyint(1) unsigned NOT NULL default '0',
`email` varchar(100) NOT NULL default '',
`sessionID` varchar(100) default '0',
PRIMARY KEY (`name`)
);

I'm guessing that the problem is linked to the sheer volume of the data in
the tables.


what happens if you try using the "EXPLAIN" directive to have MySQL tell
you what it would try to do for your query?

--
DeeDee, don't press that button! DeeDee! NO! Dee...

Jul 17 '05 #12

P: n/a
On Thu, 26 May 2005 04:13:41 +0100, "StealthBananaT" <no***@nowhere.com> wrote:
"Michael Vilain" <vi****@spamcop.net> wrote in message news:vi**************************@comcast.dca.giga news.com...
In article <d7**********@newsg3.svr.pol.co.uk>,
"StealthBananaT" <no***@nowhere.com> wrote:
"Michael Vilain" <vi****@spamcop.net> wrote in message
"news:vi**************************@comcast.dca.gig anews.com...
In article <d7**********@news8.svr.pol.co.uk>,
"StealthBananaT" <no***@nowhere.com> wrote:

> My database has two tables - films has 10,000 records and reviews has
> 20,000
> records. Whenever I try to list all the films and the
> count of its reviews, MySQL locks and I have to restart the service.
>
> Here's the SQL...
>
> SELECT films.*, COUNT(reviews.id)
Change that to films.id, count(reviews.id) and it make more sense given the
GROUP BY, although that's already been pointed out.
FROM films LEFT JOIN reviews ON films.id = reviews.filmID
> GROUP BY films.id
> LIMIT 0, 100;
>
> JOIN works but doesn't return records for films with no reviews. Please
> help!

Are films.id and reviews.filmID indexes?
Both tables have id as their index. filmID is a foreign key so that I know
which film the review is of. Here's the SQL that
created both tables...

CREATE TABLE `films` (
`id` int(10) unsigned NOT NULL auto_increment,
`title` varchar(100) NOT NULL default '',
PRIMARY KEY (`id`)
)

CREATE TABLE `reviews` (
`id` int(10) unsigned NOT NULL auto_increment,
`filmID` int(10) unsigned NOT NULL default '0',
`author` varchar(20) NOT NULL default '',
`date` int(10) unsigned NOT NULL default '0',
`region` tinyint(1) unsigned NOT NULL default '0',
`review` longtext NOT NULL,
`rating` tinyint(1) unsigned NOT NULL default '0',
PRIMARY KEY (`id`)
)


In order to not do a linear table scan, you must create indices on all
fields that require random access or are part of a join. Create an
index for the reviews.filmID field.


I'd like to thank everybody for their help, but the problem remains. I've tried making reviews.filmID an index but that seemed to
have no effect.

I have a third table called users, and when I try to count the number of reviews submitted by users I have exactly the same
problem...

SELECT users.name, COUNT(reviews.id)
FROM users LEFT JOIN reviews ON users.name = reviews.author
GROUP BY users.name;

I'm guessing that the problem is linked to the sheer volume of the data in the tables.


Tables with 10k and 20k rows are not large in terms of rows - but the thing
that catches my attention is the "longtext" field in review.

This could make the table physically quite large; if MySQL is scanning the
table even once, then that could take some time.

You might consider moving the `review` field out to a separate table, with
primary key the same as reviews. That way you can run your queries on the
review metadata without it having to work through the full text of the review.

mysql> select count(*) from films;
+----------+
| count(*) |
+----------+
| 10000 |
+----------+
1 row in set (0.02 sec)

mysql> select count(*) from reviews;
+----------+
| count(*) |
+----------+
| 20000 |
+----------+
1 row in set (0.00 sec)

mysql> select films.id, count(*)
-> from films
-> left outer join reviews
-> on (films.id = reviews.filmID)
-> group by films.id
-> limit 0, 100;

+-----+----------+
| id | count(*) |
+-----+----------+
| 1 | 2 |
[ snipped ]
| 99 | 1 |
| 100 | 3 |
+-----+----------+
100 rows in set (0.02 sec)

That's on an ancient Pentium 200 whilst it's compiling something in the
background, with the tables filled with random-ish data, an index on filmID,
but the review field containing only the string 'a review'.

But load up the reviews table with 250 words in each 'review' field and things
take a turn for the worse; left it running for 15 minutes with no result.

--
Andy Hassall / <an**@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
Jul 17 '05 #13

P: n/a
On Thu, 26 May 2005 04:13:41 +0100, StealthBananaT wrote:

I'd like to thank everybody for their help, but the problem remains. I've tried making reviews.filmID an index but that seemed to
have no effect.

I have a third table called users, and when I try to count the number of reviews submitted by users I have exactly the same
problem...


LIMIT can be very ineffiecient at returning results, effectively.

In your case it'll have to calculate the number of reviews for 10,000
before returning 100 rows.

SELECT f.title, f.id, count( r.id )
FROM films f
LEFT JOIN reviews r ON ( f.id = r.filmid )
WHERE f.id > 0 AND f.id <= 100
GROUP BY f.id

Use the where clause for pagination, you'll find your system responds a
lot quicker ;-)

Jul 17 '05 #14

This discussion thread is closed

Replies have been disabled for this discussion.