Xyon (xyon) wrote,

Movie Database Again

Yeah, I love my movie database...

When I worked on being able to update my database with the latest datafiles from IMDb I used this strategy:
for every data field that wasn't re-importable (movie ratings, imdb ids, movie reviews, et cetera)
  dump data to file
end for

Build new database
import saved data
production to archive
temporary to production

Everything was scripted, and it was all fine... except that "import saved data" was starting to take upwards of 25 minutes. MySQL was having to reparse all of the data, do a index match, build a record set, do the update, et cetera, once for each movie (at around 1.5 seconds per (420 movies currently)) and actor (a lot quicker at the unit level, but 19000 of them added up)

So I moved my data (movie ratings and reviews) to a user namespace (database name is the owning user's username) and the IMDb IDs to an archive database.

Why is this a win? Time to import my data: 0 time units. Time to import movie ids: 0.11s (total). Time to import person ids: 1.97s (total).

So now my only problem is that the JMDB import that I use to then build my format opens a transaction handle but doesn't close it... so lately I've been running out of hard drive space on the mysql partition because the import is building up a multi gigabyte transaction log that it never commits.
Tags: movie database
  • Post a new comment


    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.