473,513 Members | 4,022 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

How can I do this in SQL?

MySQL 4.0.23-standard

I run an online movie database with over 80,000 movies and over 500,000
people stored within.

We use tables like `movies`, `people`, etc. for storing the actual
movie information, a short plot summary, etc. These tables are MyISAM.

For information like who directed or produced a movie, we have tables
like `movies_directedBy` and `movies_producedBy` which have three
columns -- the movie ID in the `movies` table, the person ID in the
`people` table, and a notes column.

These "relationship" tables are mostly InnoDB, with the exception of
the table for casting information, `people_castIn`, which is MyISAM as
fulltext searching is required on this table.

In order to decide what occupation a person is predominantly (e.g.
director, actor, produced, sound tech, etc.) we currently use PHP to
fetch a count of rows from each of these `movies_*` relationship tables
which have a given person ID. The table with the most entries for that
person ID is deemed to represent his or her main occupation.

I'd like to know whether it's possible to do this entirely in SQL,
since it's really slow, especially when the query involves 10 or more
people, to fetch COUNT(*)s from multiple tables having a combined total
of over 3 million rows.

Is it possible?

Thanks in advance for any help

Jasper Bryant-Greene
Cabbage Promotions

Jul 23 '05 #1
11 1708
MLH
<snip>
We use tables like `movies`, `people`, etc. for storing the actual
movie information, a short plot summary, etc. These tables are MyISAM.

For information like who directed or produced a movie, we have tables
like `movies_directedBy` and `movies_producedBy` which have three
columns -- the movie ID in the `movies` table, the person ID in the
`people` table, and a notes column.

These "relationship" tables are mostly InnoDB, with the exception of
the table for casting information, `people_castIn`, which is MyISAM as
fulltext searching is required on this table.

In order to decide what occupation a person is predominantly (e.g.
director, actor, produced, sound tech, etc.) we currently use PHP to
fetch a count of rows from each of these `movies_*` relationship tables
which have a given person ID. The table with the most entries for that
person ID is deemed to represent his or her main occupation.

I'd like to know whether it's possible to do this entirely in SQL,
since it's really slow, especially when the query involves 10 or more
people, to fetch COUNT(*)s from multiple tables having a combined total
of over 3 million rows.

I suggest you start with a better normalization scheme. Restructure
your data into tblMovies, tblDirectors, tblProducers and maybe
throw in tblActors for good measure. Make sure each has primary
key field (natural keys or otherwise). Your tblMovies table should
have link fields to the primary keyfields in tblDirectors and
tblProducers and maybe fields like [StarActor] & [StarActress].
However, I would suggest a junction table between tblMovies
and tblActors, as movies can have several (3 - 5 - a dozen...)
co-stars. Call it tblCoStars and furnish it with [ActorID] and
[MovieID] fields with a 1-to-many relationship between itself
and each of the other 2 tables, with tblCoStars being the many
side of both relationships. Build yourself indices in tblMovies on
each of these fields: [MovieID] (of course), then [DirectorID],
[ProducerID] and [ActorID].

Here's some sample SQL that should churn out a lot of rows
if you have the rich & full datasets you mentioned...

SELECT DISTINCTROW tblMovies.MovieName, tblMovies.ProducerID,
tblMovies.DirectorID, tblMovies.ReleaseDate, tblMovies.BoxOfficeGross,
tblMovies.MovieRating, tblMovies.MovieDescrip, [DirectorFName] & " " &
[DirectorLName] AS Director, [ProducerFName] & " " & [ProducerLName]
AS Producer
FROM (tblMovies INNER JOIN tblDirectors ON tblMovies.DirectorID =
tblDirectors.DirectorID) INNER JOIN tblProducers ON
tblMovies.ProducerID = tblProducers.ProducerID
WHERE ((tblMovies.ProducerID=1 Or tblMovies.ProducerID=19 Or
tblMovies.ProducerID=39 Or tblMovies.ProducerID=101 Or
tblMovies.ProducerID=311) AND (tblMovies.DirectorID=17 Or
tblMovies.DirectorID=77 Or tblMovies.DirectorID=211 Or
tblMovies.DirectorID=321));

(untested)
Jul 23 '05 #2
MLH
<big snip>
If you call your junction table tblCast & give it 2 fields: [MovieID]
and [ActorID], you could put it in the query to include co-stars in
the query dynaset.

The following should produce list of movies, producers, directors,
actors for a number of films directed by any of several different
directors OR produced by any of several different producers.

SELECT DISTINCTROW tblMovies.MovieName, tblMovies.ProducerID,
tblMovies.DirectorID, tblMovies.ReleaseDate, tblMovies.BoxOfficeGross,
tblMovies.MovieRating, tblMovies.MovieDescrip, [DirectorFName] & " " &
[DirectorLName] AS Director, [ProducerFName] & " " & [ProducerLName]
AS Producer, [ActorFName] & " " & [ActorLName] AS Actor
FROM tblActors INNER JOIN (((tblMovies INNER JOIN tblDirectors ON
tblMovies.DirectorID = tblDirectors.DirectorID) INNER JOIN
tblProducers ON tblMovies.ProducerID = tblProducers.ProducerID) INNER
JOIN tblCast ON tblMovies.MovieID = tblCast.MovieID) ON
(tblMovies.ActorID = tblActors.ActorID) AND (tblActors.ActorID =
tblCast.ActorID)
WHERE ((tblMovies.ProducerID=1 Or tblMovies.ProducerID=19 Or
tblMovies.ProducerID=39 Or tblMovies.ProducerID=101 Or
tblMovies.ProducerID=311)) OR ((tblMovies.DirectorID=17 Or
tblMovies.DirectorID=77 Or tblMovies.DirectorID=211 Or
tblMovies.DirectorID=321));

Jul 23 '05 #3
Hi

Thanks for your reply, but I think you have misunderstood me. I'm not
looking to create lists of actors, producers, directors, etc. for
movies. We already do that and it works fine. What I am looking to do
is create an SQL query that will do the equivilant of what I am
currently doing in PHP, which is:

1. Given a person ID, get the number of records in about 10-15
different tables (directors, producers, actors, crewmembers, sound
techs, art department, makeup, sfx, etc.).
2. Return the name of the table which had the largest number of
records.

This tells me what the person's predominant occupation is. The problem
is, step 1 requires 10-15 different SQL queries executing COUNT(*)
operations. I'm looking for a way to do it all in a single, or at least
in fewer, SQL queries, to improve performance.

Also the scheme you have suggested requires a huge amount of data
redundancy as a single person could be an actor, a producer, a
director, a crewmember and much more. In your scheme there would be
information stored seperately for each occupation of each person, an
exponentially larger amount of information. We can't afford for that
much data redundancy when we're dealing with millions of records.

We already have many more tables (actors, crewmembers, art dep., makeup
dep., sfx, visual fx, sound techs, motion control, etc. etc.) but I
didn't list them all as it wasnt necessary.

Jasper

Jul 23 '05 #4
On 07/04/2005, Jasper Bryant-Greene wrote:
We already have many more tables (actors, crewmembers, art dep.,
makeup dep., sfx, visual fx, sound techs, motion control, etc. etc.)
but I didn't list them all as it wasnt necessary.


Why do you make different tables for different occupations instead of
just a people-table, a movie-table and a table linking people/movies
with the occupation as extra attribute?

Then you can simply say:

SELECT
peoplemovies.occupation,
COUNT(peoplemovies.id) AS number
FROM peoplemovies
WHERE
peoplemovies.people_id = <<whatever>>
GROUP BY peoplemovies.occupation
ORDER BY number DESC
LIMIT 1

--
felix
Jul 23 '05 #5
<snip>
I apologize. I went back and read your question.
Here it is...

Is it possible?

Answer: Yes.

However, I would never consider constructing
a set of database tables as large as those you
described in the same manner you did. Follow
my other instructions and your sql will be much
more efficient and easier to write.
Jul 23 '05 #6
Felix -- we don't create seperate tables.

There is one `people` table which contains all the information specific
to that person.

There is then a huge set of linking tables, `movies_directedBy`,
`movies_writtenBy`, `movies_makeupBy`, `movies_soundBy`, `movies_artBy`
which contain just three columns -- the ID of the movie, the ID of the
person, and a notes column.

This is the best way to do it -- the method that Archie suggests is
not. If I followed Archie's instructions I would have the same number
of tables (one for each occupation) but I would have a lot of data
redundancy which is not acceptable.

I apologise if I haven't made myself clear in previous posts.

Jul 23 '05 #7
Sorry Felix, I re-read your post and now see what you are suggesting.
Excellent idea, I'm going to implement that.

Thanks

Jul 23 '05 #8
Jasper Bryant-Greene wrote:
In order to decide what occupation a person is predominantly (e.g.
director, actor, produced, sound tech, etc.) we currently use PHP to
fetch a count of rows from each of these `movies_*` relationship tables
which have a given person ID. The table with the most entries for that
person ID is deemed to represent his or her main occupation.


I think the following might work, but I haven't tried it so I'm not sure
if there's a logical conflict with doing multiple aggregate functions in
this case.

SELECT p.person_ID,
COUNT(d.movie_ID) AS directedByCount,
COUNT(w.movie_ID) AS writtenByCount
FROM people AS p
LEFT OUTER JOIN movies_directedBy AS d ON p.persion_ID = d.person_ID
LEFT OUTER JOIN movies_writtenBy AS w ON p.persion_ID = w.person_ID
WHERE p.person_ID IN ( $my_person_id_list )
GROUP BY p.person_ID

It should be clear how to add extra joins and COUNT()'s for your other
matching tables. But keep in mind that MySQL has a limit of 31 tables
in a single join (or 63 tables if you're running on a 64-bit operating
system).

Then you can do your comparison between the counts in your PHP code and
decide who is a director, who is a writer, etc.

It occurs to me, though, do you make any allowances for _when_ the
person was a director vs. a writer? What if they were a writer from
1970-1979 and they wrote a lot of movies, but they've been a director
ever since 1979, though for for a smaller total number of projects? :-)

Regards,
Bill K.
Jul 23 '05 #9

"Jasper Bryant-Greene" <ja******@gmail.com> wrote in message
news:11**********************@o13g2000cwo.googlegr oups.com...
MySQL 4.0.23-standard

I run an online movie database with over 80,000 movies and over 500,000
people stored within.

We use tables like `movies`, `people`, etc. for storing the actual
movie information, a short plot summary, etc. These tables are MyISAM.

For information like who directed or produced a movie, we have tables
like `movies_directedBy` and `movies_producedBy` which have three
columns -- the movie ID in the `movies` table, the person ID in the
`people` table, and a notes column.

These "relationship" tables are mostly InnoDB, with the exception of
the table for casting information, `people_castIn`, which is MyISAM as
fulltext searching is required on this table.

In order to decide what occupation a person is predominantly (e.g.
director, actor, produced, sound tech, etc.) we currently use PHP to
fetch a count of rows from each of these `movies_*` relationship tables
which have a given person ID. The table with the most entries for that
person ID is deemed to represent his or her main occupation.

I'd like to know whether it's possible to do this entirely in SQL,
since it's really slow, especially when the query involves 10 or more
people, to fetch COUNT(*)s from multiple tables having a combined total
of over 3 million rows.

Is it possible?

Thanks in advance for any help

Jasper Bryant-Greene
Cabbage Promotions


Jasper,

Why are you doing dynamic queries on a relatively static database? Sure you
are adding new data but the new data is a very small percentage of the
total. Why not run a batch process on a regular basis to determine the
predominate occupation of a person? You could also take into account time
frames to give a more accurate occupation or just list the number of times
the person has acted, directed, produced, etc.. Storing this data and
updating it when needed will save you a vast amount of resources.

Kevin Sproule
Jul 23 '05 #10
Thanks everyone for your help. Kevin -- once we have the process of
actually figuring out the occupation sorted, yes, we will cache the
data in a simple `occupation` field in the `people` table. We need to
reliably calculate this information first though.

What I have now is the following SQL query:

SELECT b.name, COUNT(a.movie) AS n, c.id AS movieID, c.title, c.year
FROM movies_people AS a, occupations AS b, movies AS c
WHERE a.person=$id AND b.id=a.occupation AND c.id=a.movie
GROUP BY a.occupation
ORDER BY n DESC
LIMIT 1

the `movies_people` table is a link between the movie and the person,
the `occupations` table is a table of possible occupations, and the
`movies` table is the master movies table.

This returns me a single row containing the name of the person's
predominant occupation, for example director, and details about a movie
which they "performed that occupation" on, for example a movie they
directed.

The problem is, I want to retrieve the newest movie they performed this
occupation on. In other words, I want the row from the `movies` table
with the highest value for `year`. I've tried fiddling with GROUP BY
and ORDER BY clauses to no avail. Any clues?

By the way -- very good point regarding taking time into account. I
will look at this once I have sorted these issues.

Thanks again everyone for your help. Best regards

Jul 23 '05 #11
Jasper Bryant-Greene wrote:
SELECT b.name, COUNT(a.movie) AS n, c.id AS movieID, c.title, c.year
FROM movies_people AS a, occupations AS b, movies AS c
WHERE a.person=$id AND b.id=a.occupation AND c.id=a.movie
GROUP BY a.occupation
ORDER BY n DESC
LIMIT 1

The problem is, I want to retrieve the newest movie they performed this
occupation on. In other words, I want the row from the `movies` table
with the highest value for `year`.


I think in this case you're trying to get too many answers from one
query. As you've seen, you're getting values from an arbitrary record
in the movies table, out of those that match the grouping by a.occupation.

IMHO, MySQL fails to obey proper SQL standards, by allowing you to ask
for c.* in this query. This should be an error in SQL, because it gives
ambiguous results, because you aren't grouping by a unique key of the
movies table. In MySQL's case, it generally gives you the results from
the first row that matches based on the physical ordering of rows in the
database file; this ordering is arbitrary and in general beyond your
control.

But if you were to use GROUP BY c.id, of course this would not give the
desired results for COUNT(a.movie), which you want to be counted per
occupation type.

This is a pretty good clue that you're actually doing two different
queries, but trying to force them into one SQL statement.
It's not worth it, man! Don't be a hero! <g>

Regards,
Bill K.
Jul 23 '05 #12

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
3329
by: James | last post by:
I have a from with 2 fields: Company & Name Depening which is completed, one of the following queries will be run: if($Company){ $query = "Select C* From tblsample Where ID = $Company Order By Company ASC";
5
2738
by: Scott D | last post by:
I am trying to check and see if a field is posted or not, if not posted then assign $location which is a session variable to $location_other. If it is posted then just assign it to $location_other I keep getting "Notice: Undefined index: location_other" referring to (!($_POST)) { $location_other = $location; } else
2
2711
by: Nick | last post by:
Can someone please tell me how to access elements from a multiple selection list? From what ive read on other posts, this is correct. I keep getting an "Undefined variable" error though... Form page************************************************************ <form action="/process.php" method="get" name="formOne" id="formOne"> <select...
2
2528
by: Alexander Ross | last post by:
I have a variable ($x) that can have 50 different (string) values. I want to check for 7 of those values and do something based on it ... as I see it I have 2 options: 1) if (($x=="one") || ($x=="two") || ... || ($x=="seven")) ... or 2) switch ($x){ case("one"):
0
3259
by: Dan Foley | last post by:
This script runs fine, but I'd like to know why it's so slow.. Thanks for any help out there on how i can make it faster (it might take up to 5 min to write these 3 export files whith 15 records each!!!) Dan ==================== <style> body, table, tr, td { font-family: 'verdana'; font-size: 12px;
5
3193
by: Lee Redeem | last post by:
Hi there I've created abd uploaded this basic PHP script: <html> <head> <title>PHP Test</title> </head> <body> <H1 align="center">
5
10035
by: christopher vogt | last post by:
Hi, i'm wondering if there is something like $this-> to call a method inside another method of the same class without using the classname in front. I actually use class TEST { function func1()
6
2653
by: Phil Powell | last post by:
Ok guys, here we go again! SELECT s.nnet_produkt_storrelse_navn FROM nnet_produkt_storrelse s, nnet_produkt_varegruppe v, nnet_storrelse_varegruppe_assoc sv, nnet_produkt p WHERE s.nnet_produkt_storrelse.id = sv.nnet_produkt_storrelse_id AND sv.nnet_produkt_varegruppe_id = v.nnet_produkt_varegruppe_id AND sv.nnet_produkt_varegruppe_id IN (...
1
2182
by: Michel | last post by:
a site like this http://www.dvdzone2.com/dvd Can you make it in PHP and MySQL within 6 weeks? If so, send me your price 2 a r a (at) p a n d o r a . b e
11
3135
by: Maciej Nadolski | last post by:
Hi! I can`t understand what php wants from me:( So: Cannot send session cache limiter - headers already sent (output started at /home/krecik/public_html/silnik.php:208) in /home/krecik/public_html/silnik.php on line 251 Line 208: print ( "error: " . mysql_error()); Line 251: session_register("uprawnienia", "zalogowany"); I can...
0
7270
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main...
0
7178
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language...
0
7563
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...
1
7125
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For...
0
5703
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then...
0
4757
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...
0
3252
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
1
813
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
470
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.