Xyon (xyon) wrote,

So I spent some time working on my movie database project again this week. The early part of the week went into working out the process from database schema to full population, so that I could run it in a more or less automated fashion.

I replaced several Perl scripts with inline UPDATE statements, which saves the time I used to spend exporting and re-importing the data.
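A minimal sketch of that idea. It assumes a hypothetical `people`/`credits` schema (the post doesn't show its tables) and uses Python's built-in sqlite3 as a stand-in for MySQL so it runs without a server. The lookup a script would do row by row happens in one set-based UPDATE instead. In MySQL itself the same thing is often written as a multi-table `UPDATE ... JOIN ... SET`.

```python
import sqlite3


def resolve_person_ids(conn):
    """Fill credits.person_id from people.name with one set-based UPDATE.

    Hypothetical schema, for illustration only. Rows with no matching name
    are left NULL, because the subquery returns NULL for them.
    """
    cur = conn.execute(
        """
        UPDATE credits
           SET person_id = (SELECT p.id FROM people AS p
                             WHERE p.name = credits.person_name)
         WHERE person_id IS NULL
        """
    )
    conn.commit()
    return cur.rowcount  # number of rows the UPDATE touched


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people  (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE credits (movie TEXT, person_name TEXT, person_id INTEGER);
    INSERT INTO people (id, name) VALUES (1, 'Doe, John'), (2, 'Doe, Jane');
    INSERT INTO credits (movie, person_name) VALUES
        ('Movie A', 'Doe, John'), ('Movie B', 'Doe, Jane');
    """
)
print(resolve_person_ids(conn))  # 2
print(conn.execute("SELECT movie, person_id FROM credits ORDER BY movie").fetchall())
```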

I also fixed the DDL: I had several redundant indexes that I didn't notice, because I never really looked at the tables with "show indexes" after creating them back in November.
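Redundant indexes of the common kind can be spotted mechanically from `SHOW INDEX` output. This is a sketch, not a tool from the post. It takes (Table, Key_name, Seq_in_index, Column_name) rows, four of the columns MySQL's `SHOW INDEX` returns, and flags any index whose column list is a leftmost prefix of another index on the same table. It ignores uniqueness (Non_unique), prefix lengths (Sub_part) and index type. Treat its hits as candidates to review, not as indexes to drop blindly. The index names in the example are hypothetical.

```python
from collections import defaultdict


def redundant_indexes(rows):
    """Return sorted (table, redundant_key, covered_by) triples.

    rows: (Table, Key_name, Seq_in_index, Column_name) tuples from SHOW INDEX.
    """
    # Rebuild each index's ordered column list from the per-column rows.
    columns = defaultdict(list)
    for table, key, _seq, col in sorted(rows, key=lambda r: (r[0], r[1], r[2])):
        columns[(table, key)].append(col)

    found = []
    for (table, key), cols in columns.items():
        if key == "PRIMARY":
            continue  # never suggest dropping the primary key
        for (other_table, other), other_cols in columns.items():
            if other_table != table or other == key:
                continue
            if other_cols[: len(cols)] != cols:
                continue  # cols is not a leftmost prefix of other_cols
            # For exact duplicates, flag only one index of the pair.
            if len(cols) == len(other_cols) and other != "PRIMARY" and key < other:
                continue
            found.append((table, key, other))
            break
    return sorted(found)


rows = [
    ("movie_credits", "PRIMARY", 1, "id"),
    ("movie_credits", "idx_movie", 1, "movie_id"),
    ("movie_credits", "idx_movie_person", 1, "movie_id"),
    ("movie_credits", "idx_movie_person", 2, "person_id"),
    ("movie_credits", "idx_person", 1, "person_id"),
]
print(redundant_indexes(rows))  # [('movie_credits', 'idx_movie', 'idx_movie_person')]
```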

I also paid attention to the mysqldump output: it prefaces its inserts with something akin to "/*!40000 ALTER TABLE `movie_credits` DISABLE KEYS */;" and then re-enables the keys when done. One update query had taken 45 minutes to run. The next time I refreshed, I shut off indexing first (I must have done full reconstructs of the database at least 30 times over the week), and the query took 2 minutes, plus 3 minutes to build the index once indexing was re-enabled. Things like this are probably what made my original attempt at an importer suck. Maybe I'll get brave this week and try again.
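The mysqldump pattern can be reproduced around any bulk statement. Here is a sketch as a Python context manager. The `/*!40000 ... */` wrapper is a MySQL version-conditional comment, and inside a script you control, the plain `ALTER TABLE ... DISABLE KEYS` / `ENABLE KEYS` statements do the same job. In MySQL this only affects non-unique indexes on MyISAM tables. `cursor` is assumed to be any DB-API cursor connected to MySQL, and only `execute()` is used. `RecordingCursor` and the `staging_credits` table are hypothetical stand-ins so the example runs without a server.

```python
from contextlib import contextmanager


@contextmanager
def keys_disabled(cursor, table):
    """Suspend non-unique index maintenance on `table` during a bulk change."""
    cursor.execute(f"ALTER TABLE `{table}` DISABLE KEYS")
    try:
        yield cursor
    finally:
        # Rebuild the indexes in one pass, even if the bulk statement failed.
        cursor.execute(f"ALTER TABLE `{table}` ENABLE KEYS")


class RecordingCursor:
    """Hypothetical stand-in for a real cursor: it just logs the SQL."""

    def __init__(self):
        self.log = []

    def execute(self, sql):
        self.log.append(sql)


cur = RecordingCursor()
with keys_disabled(cur, "movie_credits"):
    # Hypothetical bulk statement; staging_credits is not from the post.
    cur.execute("INSERT INTO movie_credits SELECT * FROM staging_credits")
print(cur.log)
```

Putting ENABLE KEYS in `finally` means a failed load never leaves the table with its indexes switched off.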

Total import time is now 30 minutes, on top of the 55 minutes the JMDb import takes. 18 minutes of that are spent building the 7,615,196 credits entries (the 4,056,630 actor/actress entries took 6 minutes, and rebuilding the indexes took 3 minutes).
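For a sense of throughput, the rates below simply divide each row count quoted above by the time quoted for it. They are back-of-the-envelope figures, not separate measurements.

```python
def rows_per_second(rows, minutes):
    """Average insert rate for `rows` rows loaded in `minutes` minutes."""
    return rows / (minutes * 60)


# Figures quoted in the post.
print(f"credits: {rows_per_second(7_615_196, 18):,.0f} rows/s")        # 7,051
print(f"actor/actress: {rows_per_second(4_056_630, 6):,.0f} rows/s")   # 11,268
```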

Today I added a little bit to the interface: the movie plot and trivia now sit in hidden divs that are shown via JavaScript (plus some other minor changes). Yes, I did it. Yes, it changes the page; but also yes (and this is the important part), the text ("Show Plot"/"Hide Plot") indicates that the screen is going to change.

But the best part is that now I have a semi-automated solution (JMDb is still manual), and my IMDb ID lookups aren't failing due to the name format having changed (since I'm using yesterday's data).

(Of course, I still need IMDb to fix the dual gender issue of people like Jamie Alcroft (http://www.imdb.com/name/nm0017314/), et al -- as well as any of the gripes I made back when I was first starting this related to the *.list files that may still exist)