473,799 Members | 2,997 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

How can I do this in SQL?

MySQL 4.0.23-standard

I run an online movie database with over 80,000 movies and over 500,000
people stored within.

We use tables like `movies`, `people`, etc. for storing the actual
movie information, a short plot summary, etc. These tables are MyISAM.

For information like who directed or produced a movie, we have tables
like `movies_directe dBy` and `movies_produce dBy` which have three
columns -- the movie ID in the `movies` table, the person ID in the
`people` table, and a notes column.

These "relationsh ip" tables are mostly InnoDB, with the exception of
the table for casting information, `people_castIn` , which is MyISAM as
fulltext searching is required on this table.

In order to decide what occupation a person is predominantly (e.g.
director, actor, produced, sound tech, etc.) we currently use PHP to
fetch a count of rows from each of these `movies_*` relationship tables
which have a given person ID. The table with the most entries for that
person ID is deemed to represent his or her main occupation.

I'd like to know whether it's possible to do this entirely in SQL,
since it's really slow, especially when the query involves 10 or more
people, to fetch COUNT(*)s from multiple tables having a combined total
of over 3 million rows.

Is it possible?

Thanks in advance for any help

Jasper Bryant-Greene
Cabbage Promotions

Jul 23 '05 #1
11 1735
MLH
<snip>
We use tables like `movies`, `people`, etc. for storing the actual
movie information, a short plot summary, etc. These tables are MyISAM.

For information like who directed or produced a movie, we have tables
like `movies_directe dBy` and `movies_produce dBy` which have three
columns -- the movie ID in the `movies` table, the person ID in the
`people` table, and a notes column.

These "relationsh ip" tables are mostly InnoDB, with the exception of
the table for casting information, `people_castIn` , which is MyISAM as
fulltext searching is required on this table.

In order to decide what occupation a person is predominantly (e.g.
director, actor, produced, sound tech, etc.) we currently use PHP to
fetch a count of rows from each of these `movies_*` relationship tables
which have a given person ID. The table with the most entries for that
person ID is deemed to represent his or her main occupation.

I'd like to know whether it's possible to do this entirely in SQL,
since it's really slow, especially when the query involves 10 or more
people, to fetch COUNT(*)s from multiple tables having a combined total
of over 3 million rows.

I suggest you start with a better normalization scheme. Restructure
your data into tblMovies, tblDirectors, tblProducers and maybe
throw in tblActors for good measure. Make sure each has primary
key field (natural keys or otherwise). Your tblMovies table should
have link fields to the primary keyfields in tblDirectors and
tblProducers and maybe fields like [StarActor] & [StarActress].
However, I would suggest a junction table between tblMovies
and tblActors, as movies can have several (3 - 5 - a dozen...)
co-stars. Call it tblCoStars and furnish it with [ActorID] and
[MovieID] fields with a 1-to-many relationship between itself
and each of the other 2 tables, with tblCoStars being the many
side of both relationships. Build yourself indices in tblMovies on
each of these fields: [MovieID] (of course), then [DirectorID],
[ProducerID] and [ActorID].

Here's some sample SQL that should churn out a lot of rows
if you have the rich & full datasets you mentioned...

SELECT DISTINCTROW tblMovies.Movie Name, tblMovies.Produ cerID,
tblMovies.Direc torID, tblMovies.Relea seDate, tblMovies.BoxOf ficeGross,
tblMovies.Movie Rating, tblMovies.Movie Descrip, [DirectorFName] & " " &
[DirectorLName] AS Director, [ProducerFName] & " " & [ProducerLName]
AS Producer
FROM (tblMovies INNER JOIN tblDirectors ON tblMovies.Direc torID =
tblDirectors.Di rectorID) INNER JOIN tblProducers ON
tblMovies.Produ cerID = tblProducers.Pr oducerID
WHERE ((tblMovies.Pro ducerID=1 Or tblMovies.Produ cerID=19 Or
tblMovies.Produ cerID=39 Or tblMovies.Produ cerID=101 Or
tblMovies.Produ cerID=311) AND (tblMovies.Dire ctorID=17 Or
tblMovies.Direc torID=77 Or tblMovies.Direc torID=211 Or
tblMovies.Direc torID=321));

(untested)
Jul 23 '05 #2
MLH
<big snip>
If you call your junction table tblCast & give it 2 fields: [MovieID]
and [ActorID], you could put it in the query to include co-stars in
the query dynaset.

The following should produce list of movies, producers, directors,
actors for a number of films directed by any of several different
directors OR produced by any of several different producers.

SELECT DISTINCTROW tblMovies.Movie Name, tblMovies.Produ cerID,
tblMovies.Direc torID, tblMovies.Relea seDate, tblMovies.BoxOf ficeGross,
tblMovies.Movie Rating, tblMovies.Movie Descrip, [DirectorFName] & " " &
[DirectorLName] AS Director, [ProducerFName] & " " & [ProducerLName]
AS Producer, [ActorFName] & " " & [ActorLName] AS Actor
FROM tblActors INNER JOIN (((tblMovies INNER JOIN tblDirectors ON
tblMovies.Direc torID = tblDirectors.Di rectorID) INNER JOIN
tblProducers ON tblMovies.Produ cerID = tblProducers.Pr oducerID) INNER
JOIN tblCast ON tblMovies.Movie ID = tblCast.MovieID ) ON
(tblMovies.Acto rID = tblActors.Actor ID) AND (tblActors.Acto rID =
tblCast.ActorID )
WHERE ((tblMovies.Pro ducerID=1 Or tblMovies.Produ cerID=19 Or
tblMovies.Produ cerID=39 Or tblMovies.Produ cerID=101 Or
tblMovies.Produ cerID=311)) OR ((tblMovies.Dir ectorID=17 Or
tblMovies.Direc torID=77 Or tblMovies.Direc torID=211 Or
tblMovies.Direc torID=321));

Jul 23 '05 #3
Hi

Thanks for your reply, but I think you have misunderstood me. I'm not
looking to create lists of actors, producers, directors, etc. for
movies. We already do that and it works fine. What I am looking to do
is create an SQL query that will do the equivilant of what I am
currently doing in PHP, which is:

1. Given a person ID, get the number of records in about 10-15
different tables (directors, producers, actors, crewmembers, sound
techs, art department, makeup, sfx, etc.).
2. Return the name of the table which had the largest number of
records.

This tells me what the person's predominant occupation is. The problem
is, step 1 requires 10-15 different SQL queries executing COUNT(*)
operations. I'm looking for a way to do it all in a single, or at least
in fewer, SQL queries, to improve performance.

Also the scheme you have suggested requires a huge amount of data
redundancy as a single person could be an actor, a producer, a
director, a crewmember and much more. In your scheme there would be
information stored seperately for each occupation of each person, an
exponentially larger amount of information. We can't afford for that
much data redundancy when we're dealing with millions of records.

We already have many more tables (actors, crewmembers, art dep., makeup
dep., sfx, visual fx, sound techs, motion control, etc. etc.) but I
didn't list them all as it wasnt necessary.

Jasper

Jul 23 '05 #4
On 07/04/2005, Jasper Bryant-Greene wrote:
We already have many more tables (actors, crewmembers, art dep.,
makeup dep., sfx, visual fx, sound techs, motion control, etc. etc.)
but I didn't list them all as it wasnt necessary.


Why do you make different tables for different occupations instead of
just a people-table, a movie-table and a table linking people/movies
with the occupation as extra attribute?

Then you can simply say:

SELECT
peoplemovies.oc cupation,
COUNT(peoplemov ies.id) AS number
FROM peoplemovies
WHERE
peoplemovies.pe ople_id = <<whatever>>
GROUP BY peoplemovies.oc cupation
ORDER BY number DESC
LIMIT 1

--
felix
Jul 23 '05 #5
<snip>
I apologize. I went back and read your question.
Here it is...

Is it possible?

Answer: Yes.

However, I would never consider constructing
a set of database tables as large as those you
described in the same manner you did. Follow
my other instructions and your sql will be much
more efficient and easier to write.
Jul 23 '05 #6
Felix -- we don't create seperate tables.

There is one `people` table which contains all the information specific
to that person.

There is then a huge set of linking tables, `movies_directe dBy`,
`movies_written By`, `movies_makeupB y`, `movies_soundBy `, `movies_artBy`
which contain just three columns -- the ID of the movie, the ID of the
person, and a notes column.

This is the best way to do it -- the method that Archie suggests is
not. If I followed Archie's instructions I would have the same number
of tables (one for each occupation) but I would have a lot of data
redundancy which is not acceptable.

I apologise if I haven't made myself clear in previous posts.

Jul 23 '05 #7
Sorry Felix, I re-read your post and now see what you are suggesting.
Excellent idea, I'm going to implement that.

Thanks

Jul 23 '05 #8
Jasper Bryant-Greene wrote:
In order to decide what occupation a person is predominantly (e.g.
director, actor, produced, sound tech, etc.) we currently use PHP to
fetch a count of rows from each of these `movies_*` relationship tables
which have a given person ID. The table with the most entries for that
person ID is deemed to represent his or her main occupation.


I think the following might work, but I haven't tried it so I'm not sure
if there's a logical conflict with doing multiple aggregate functions in
this case.

SELECT p.person_ID,
COUNT(d.movie_I D) AS directedByCount ,
COUNT(w.movie_I D) AS writtenByCount
FROM people AS p
LEFT OUTER JOIN movies_directed By AS d ON p.persion_ID = d.person_ID
LEFT OUTER JOIN movies_writtenB y AS w ON p.persion_ID = w.person_ID
WHERE p.person_ID IN ( $my_person_id_l ist )
GROUP BY p.person_ID

It should be clear how to add extra joins and COUNT()'s for your other
matching tables. But keep in mind that MySQL has a limit of 31 tables
in a single join (or 63 tables if you're running on a 64-bit operating
system).

Then you can do your comparison between the counts in your PHP code and
decide who is a director, who is a writer, etc.

It occurs to me, though, do you make any allowances for _when_ the
person was a director vs. a writer? What if they were a writer from
1970-1979 and they wrote a lot of movies, but they've been a director
ever since 1979, though for for a smaller total number of projects? :-)

Regards,
Bill K.
Jul 23 '05 #9

"Jasper Bryant-Greene" <ja******@gmail .com> wrote in message
news:11******** **************@ o13g2000cwo.goo glegroups.com.. .
MySQL 4.0.23-standard

I run an online movie database with over 80,000 movies and over 500,000
people stored within.

We use tables like `movies`, `people`, etc. for storing the actual
movie information, a short plot summary, etc. These tables are MyISAM.

For information like who directed or produced a movie, we have tables
like `movies_directe dBy` and `movies_produce dBy` which have three
columns -- the movie ID in the `movies` table, the person ID in the
`people` table, and a notes column.

These "relationsh ip" tables are mostly InnoDB, with the exception of
the table for casting information, `people_castIn` , which is MyISAM as
fulltext searching is required on this table.

In order to decide what occupation a person is predominantly (e.g.
director, actor, produced, sound tech, etc.) we currently use PHP to
fetch a count of rows from each of these `movies_*` relationship tables
which have a given person ID. The table with the most entries for that
person ID is deemed to represent his or her main occupation.

I'd like to know whether it's possible to do this entirely in SQL,
since it's really slow, especially when the query involves 10 or more
people, to fetch COUNT(*)s from multiple tables having a combined total
of over 3 million rows.

Is it possible?

Thanks in advance for any help

Jasper Bryant-Greene
Cabbage Promotions


Jasper,

Why are you doing dynamic queries on a relatively static database? Sure you
are adding new data but the new data is a very small percentage of the
total. Why not run a batch process on a regular basis to determine the
predominate occupation of a person? You could also take into account time
frames to give a more accurate occupation or just list the number of times
the person has acted, directed, produced, etc.. Storing this data and
updating it when needed will save you a vast amount of resources.

Kevin Sproule
Jul 23 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
3364
by: James | last post by:
I have a from with 2 fields: Company & Name Depening which is completed, one of the following queries will be run: if($Company){ $query = "Select C* From tblsample Where ID = $Company Order By Company ASC";
5
2760
by: Scott D | last post by:
I am trying to check and see if a field is posted or not, if not posted then assign $location which is a session variable to $location_other. If it is posted then just assign it to $location_other I keep getting "Notice: Undefined index: location_other" referring to (!($_POST)) { $location_other = $location; } else
2
2736
by: Nick | last post by:
Can someone please tell me how to access elements from a multiple selection list? From what ive read on other posts, this is correct. I keep getting an "Undefined variable" error though... Form page************************************************************ <form action="/process.php" method="get" name="formOne" id="formOne"> <select name="owner" size="6" multiple id="owner"> <option value="one">one</option> <option...
2
2560
by: Alexander Ross | last post by:
I have a variable ($x) that can have 50 different (string) values. I want to check for 7 of those values and do something based on it ... as I see it I have 2 options: 1) if (($x=="one") || ($x=="two") || ... || ($x=="seven")) ... or 2) switch ($x){ case("one"):
0
3285
by: Dan Foley | last post by:
This script runs fine, but I'd like to know why it's so slow.. Thanks for any help out there on how i can make it faster (it might take up to 5 min to write these 3 export files whith 15 records each!!!) Dan ==================== <style> body, table, tr, td { font-family: 'verdana'; font-size: 12px;
5
3234
by: Lee Redeem | last post by:
Hi there I've created abd uploaded this basic PHP script: <html> <head> <title>PHP Test</title> </head> <body> <H1 align="center">
5
10058
by: christopher vogt | last post by:
Hi, i'm wondering if there is something like $this-> to call a method inside another method of the same class without using the classname in front. I actually use class TEST { function func1()
6
2672
by: Phil Powell | last post by:
Ok guys, here we go again! SELECT s.nnet_produkt_storrelse_navn FROM nnet_produkt_storrelse s, nnet_produkt_varegruppe v, nnet_storrelse_varegruppe_assoc sv, nnet_produkt p WHERE s.nnet_produkt_storrelse.id = sv.nnet_produkt_storrelse_id AND sv.nnet_produkt_varegruppe_id = v.nnet_produkt_varegruppe_id AND sv.nnet_produkt_varegruppe_id IN ( SELECT nnet_produkt_varegruppe_id FROM nnet_produkt_varegruppe
1
2199
by: Michel | last post by:
a site like this http://www.dvdzone2.com/dvd Can you make it in PHP and MySQL within 6 weeks? If so, send me your price 2 a r a (at) p a n d o r a . b e
11
3186
by: Maciej Nadolski | last post by:
Hi! I can`t understand what php wants from me:( So: Cannot send session cache limiter - headers already sent (output started at /home/krecik/public_html/silnik.php:208) in /home/krecik/public_html/silnik.php on line 251 Line 208: print ( "error: " . mysql_error()); Line 251: session_register("uprawnienia", "zalogowany"); I can understand that sth, is wrong in line 251 after line 208 and it is logical to
0
9687
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9541
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10252
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10231
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
6805
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5585
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4141
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3759
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2938
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.