Tips for Importing Large Datasets Using phpMyAdmin

I recently imported a large dataset using phpMyAdmin (no SSH) and learned a couple things along the way that I thought worth sharing (if only with my future self).

1) Use split

phpMyAdmin’s import has a maximum file size, which is significantly smaller than the dataset I was importing. My initial solution to this problem was to manually break up the text file containing the data. After doing this for a few minutes, I thought “I should make a tool to do this automatically.” Then I thought “Maybe someone already made that tool.” So I headed over to Google to find an application to download, only to discover it’s already on my machine. It’s a Unix command, aptly named “split.” Here’s how I used it:

split -l 250000 bigfile.csv

That gave me a bunch of files of 250,000 lines each, about 8MB in size, comfortably under my 10MB maximum.

2) Use gzip

phpMyAdmin can unpack compressed files. I got an error about file size headers when I tried using a zip file, so I went with gzip instead, and that worked fine, reducing my 8MB files to about 1.5MB. I used gzip like this:

gzip x*

You’re probably wondering: if my maximum upload size is 10MB, why didn’t I split into 30MB files and get 6MB gzips? I didn’t do that because of another limit: time. PHP scripts (including phpMyAdmin) can only run for so long at one go, and it takes a while to run 250,000 MySQL queries.
After that, it’s just a series of imports, which should run smoothly. If you’re willing and able to SSH to your database, of course, command-line MySQL makes this all a single step. But when that’s not an option, this will at least make the slow alternative less so.

One Comment

  1. David St-Onge
    Posted September 11, 2008 at 8:22 am | Permalink

    Is you MySpace blog export still working ?
    I would love to export all my post (not just 10) to wordpress. I have an error when I try your blog export …

    Thanks !

    David

    1

Post a Comment

Your email is never published nor shared.