Xyon (xyon) wrote,

The Movie Cloud

So a while back I wondered how the "movie cloud" would grow if you approached it the way (e.g.) six degrees of separation does.

Or, using movies as nodes and actors as edges, what is the maximum distance one could travel from a given movie?


My assumption: 10 Things I Hate About You (1999) is in the cloud. (This is always my example because it sorts first alphabetically in my rated movies table.)

The general principle:
Seed the movie cloud (level 0) with one movie. Seed the persons cloud (level 0) with the persons associated with the seed movie.
Movie cloud level 1 contains all of the movies that the people in persons cloud level 0 were in and that are not already in the cloud. Persons cloud level 1 contains all of the people in those movies who are not already in the persons cloud.
Repeat, level by level, until nothing new is added.
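
The principle above can be sketched in a few lines of Python. This is only an illustration, not the procedure I actually ran against MySQL: the function name build_clouds, the (movie, person) input format, and the toy credits are all made up for the example.

```python
from collections import defaultdict


def build_clouds(credits, seed_movie):
    """Expand the movie and persons clouds level by level.

    credits is an iterable of (movie, person) pairs (hypothetical format).
    Returns two dicts mapping each reached movie/person to its cloud level.
    """
    cast = defaultdict(set)   # movie  -> persons credited in it
    films = defaultdict(set)  # person -> movies they are credited in
    for movie, person in credits:
        cast[movie].add(person)
        films[person].add(movie)

    movie_level = {seed_movie: 0}
    person_level = {p: 0 for p in cast[seed_movie]}
    new_persons = set(person_level)
    level = 0
    while new_persons:
        level += 1
        # Movies reached through the newest persons, minus what we already have.
        new_movies = {m for p in new_persons for m in films[p]
                      if m not in movie_level}
        for m in new_movies:
            movie_level[m] = level
        # Persons in those new movies, minus what we already have.
        new_persons = {p for m in new_movies for p in cast[m]
                       if p not in person_level}
        for p in new_persons:
            person_level[p] = level
    return movie_level, person_level


# Toy data: A and B share p2, B and C share p3, D is disconnected.
toy_credits = [("A", "p1"), ("A", "p2"), ("B", "p2"),
               ("B", "p3"), ("C", "p3"), ("D", "p9")]
toy_movies, toy_persons = build_clouds(toy_credits, "A")
```

On the toy data, movie D and person p9 never show up in either cloud, which is the same effect as the movies and persons that stay outside the real cloud.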

What was added at each level:
mysql> select count(*), cloud_level from cloud_cover_movies group by 2 order by 2;
+----------+-------------+
| count(*) | cloud_level |
+----------+-------------+
|        1 |           0 |
|      418 |           1 |
|    61678 |           2 |
|   241960 |           3 |
|    51766 |           4 |
|     2648 |           5 |
|      380 |           6 |
|       25 |           7 |
|        5 |           8 |
+----------+-------------+
9 rows in set (0.16 sec)

mysql> select count(*), cloud_level from cloud_cover_persons group by 2 order by 2;
+----------+-------------+
| count(*) | cloud_level |
+----------+-------------+
|       50 |           0 |
|    12997 |           1 |
|   401309 |           2 |
|   491555 |           3 |
|    59679 |           4 |
|     4441 |           5 |
|      449 |           6 |
|       42 |           7 |
|       11 |           8 |
+----------+-------------+
9 rows in set (0.43 sec)



But what good are those numbers?

mysql> call cloud_details();
+--------+----------+---------+-----------+------------+
| Movies | MoviePct | Persons | PersonPct | CloudLevel |
+--------+----------+---------+-----------+------------+
|      1 |   0.0002 |      50 |    0.0050 |          0 |
|    419 |   0.0891 |   13047 |    1.3051 |          1 |
|  62097 |  13.2022 |  414356 |   41.4479 |          2 |
| 304057 |  64.6442 |  905911 |   90.6179 |          3 |
| 355823 |  75.6499 |  965590 |   96.5876 |          4 |
| 358471 |  76.2129 |  970031 |   97.0318 |          5 |
| 358851 |  76.2937 |  970480 |   97.0767 |          6 |
| 358876 |  76.2990 |  970522 |   97.0809 |          7 |
| 358881 |  76.3000 |  970533 |   97.0820 |          8 |
+--------+----------+---------+-----------+------------+
(I'm cheating here: my cloud_details procedure actually did 9 separate selects, and I've compressed them into one table for happiness' sake.)

(There are 470355 movies and 999704 persons in my database, based on last week's IMDb data dump. I only have actors and acting credits listed right now.)
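
The cumulative table is just a running sum of the per-level counts divided by the totals above. A small Python sketch of that arithmetic follows; the name cumulative_coverage is made up for the example, and this is not what the cloud_details procedure does internally.

```python
from itertools import accumulate

TOTAL_MOVIES = 470355
TOTAL_PERSONS = 999704

# Per-level counts copied from the two "group by" queries.
movies_added = [1, 418, 61678, 241960, 51766, 2648, 380, 25, 5]
persons_added = [50, 12997, 401309, 491555, 59679, 4441, 449, 42, 11]


def cumulative_coverage(added, total):
    """Return (running_total, percent_of_total) for each cloud level."""
    return [(running, round(100 * running / total, 4))
            for running in accumulate(added)]


movie_rows = cumulative_coverage(movies_added, TOTAL_MOVIES)
person_rows = cumulative_coverage(persons_added, TOTAL_PERSONS)

for level, ((m, m_pct), (p, p_pct)) in enumerate(zip(movie_rows, person_rows)):
    print(level, m, m_pct, p, p_pct)
```

The running totals reproduce the Movies and Persons columns, ending at 358881 movies and 970533 persons at level 8.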

Now I suppose the challenge is to come up with an actual maximal chain: one that takes 8 connections to get from the seed movie to one of the 5 movies at level 8.
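
One way such a chain could be found, as a rough sketch: do the same breadth-first expansion, but remember for each movie which earlier movie and which shared person reached it first, then walk those links back from one of the farthest movies. The function name farthest_chain and the toy credits are made up for illustration, and I haven't run this against the real data.

```python
from collections import defaultdict, deque


def farthest_chain(credits, seed_movie):
    """Return [movie, person, movie, ...] from the seed to a farthest movie.

    credits is an iterable of (movie, person) pairs (hypothetical format).
    """
    cast = defaultdict(set)   # movie  -> persons credited in it
    films = defaultdict(set)  # person -> movies they are credited in
    for movie, person in credits:
        cast[movie].add(person)
        films[person].add(movie)

    # parent[movie] = (previous movie, person linking the two); None for seed.
    parent = {seed_movie: None}
    queue = deque([seed_movie])
    last = seed_movie
    while queue:
        movie = queue.popleft()
        # Breadth-first order never decreases in distance, so the last movie
        # dequeued is one of the farthest from the seed.
        last = movie
        for person in cast[movie]:
            for nxt in films[person]:
                if nxt not in parent:
                    parent[nxt] = (movie, person)
                    queue.append(nxt)

    # Walk the remembered links back to the seed, then reverse.
    chain = [last]
    while parent[chain[-1]] is not None:
        prev_movie, person = parent[chain[-1]]
        chain.extend([person, prev_movie])
    return chain[::-1]


toy_credits = [("A", "p1"), ("A", "p2"), ("B", "p2"),
               ("B", "p3"), ("C", "p3"), ("D", "p9")]
toy_chain = farthest_chain(toy_credits, "A")
```

On the toy data the chain is A, p2, B, p3, C, which is 2 connections. When several movies tie for farthest, this returns just one of them, and which one depends on iteration order.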