The Minesweeper Machine

There’s a device in the movie Minority Report that predicts the future for Tom Cruise. I think most of us would call it a computer, but it’s obviously fantasy, hardly like modern computers. Still, it works basically like a modern computer, giving output in response to input.

I think that’s pretty close to our collective vision of the future of computers. I remember when the movie was first released there was a lot of talk about the gesture interfaces, the instant video playback, etc. These were discussions about what kind of computers we might use some day. Strangely, I don’t remember anyone asking how you’d play Minesweeper on that thing.

Seriously, as amazing as that thing was, all signs pointed to it being completely unable to run Minesweeper. That computer had only one application. It was a complicated application, but there was no way to quit it so you could open another application like Minesweeper. Not only was there no way to open Minesweeper, you couldn’t even open a browser to download the application bundle, nor open your file browser to move it into your applications directory.

Tom Cruise should have been furious when he found out his employer spent so much on such a crippled device. But he wasn’t. And he wasn’t for the same reason no one cares that they can’t upgrade the firmware on their TV: when a device does one task, it’s not really a computer.

I’m sure you know where I’m going with this, so I’ll just get to it: Apple’s new iPad is not a computer. With the exception of quitting applications, everything you can’t do on the Minority Report prediction machine is similarly impossible on the iPad. It’s a device with a more specific purpose than general computing. That purpose isn’t very specific, but it also isn’t general computing. It’s specific enough that you don’t need access to the file system and you don’t need to run multiple applications at the same time. A lot of people are bothered by this specificity; I think those people are largely missing the point.

Did you know Nintendo’s Wii doesn’t have a photo editor application? You probably did, but you never really thought of the possibility, because you don’t think of it as a computer. It has games, a browser, email, a photo viewer, and many other applications you find on a computer, but it’s clearly not a computer.

Why not? Part of the non-computer designation is the lack of multitasking or file system access. But I think most of it is the branding. The Wii is a gaming console from a company that makes gaming consoles. The iPhone, on the other hand, is a phone from a company that makes computers. It’s hard not to think of it as a computer, to remember it’s primarily a phone.

It’s even more difficult to think of the iPad as something other than a computer, because it doesn’t fit into another existing category. If it’s not a computer, what is it? Apple didn’t help answer this question with the name. It’s a pad? Well, no, that doesn’t really evoke enough to distract us from the underlying computer. It’s not even close to how our TV’s computers are completely abstracted away.

I don’t have a good name for this new type of device that isn’t quite a computer. John Gruber made a good analogy to the introduction of automatic transmissions in cars. They made cars a lot easier to drive, but also a little more difficult to tinker with, a little less powerful. Maybe we should call these new devices automatic computers.

Whatever we call them, I think they are the future, and conventional computers will in time be relegated to a category of device only computing professionals use. As a computing professional, that’s not exactly a happy thought for me. But as a computer user, that’s great. When I’m not moving files around to do some serious computing, I’d like nothing more than for the filesystem to disappear, for all other applications to quit, and for my focus to be entirely on what I’m doing.

Indeed, I’m doing that right now, composing this in WriteRoom, a full-screen text editor. It’s not at all difficult to see how most people would prefer this experience all the time. A week ago, I was thinking the iPhone’s interface was limited because the hardware capabilities are limited. I expected it to get more and more like a computer over time.

Now I expect the iPhone and the iPad to grow further and further away from traditional computer interfaces. The process of exiting one iPhone application and opening another, for example, is still clumsy in terms of exposing an underlying process I really don’t care about. I don’t want to stop talking on the phone, exit my phone application, go to my home screen, and open my email application to write an email. I just want to stop talking and start writing.

Part of me still wants to play Minesweeper on the prediction machine, but I think I’ll forget about that when I have a separate Minesweeper machine. Now that computers are starting to fade into the background, it’s important for those of us closest to computers to remember: this is a good thing.

Web Development Conventions

I’ve started cataloging my web development conventions on a single page. I certainly don’t have everything down yet, and readily admit that some of it is completely unproductive, borderline OCD, but I thought it might be interesting, maybe even useful.

Webshots Get Original Photo Bookmarklet

A while ago I made a bookmarklet to get the original photo from a Flickr page. I used it myself a bit, but mostly I just did it to warn those who publish with Flickr that there wasn’t much more than the illusion of copy protection on photos. Flickr since made their security-by-obscurity a little more obscure and my original bookmarklet no longer works.

Yesterday someone pointed me to Webshots, specifically the source code of a Webshots photo where the source URL is available in plain text, and asked me to make a new bookmarklet. So here it is:

Webshots Get Original Photo

If you’re unfamiliar with bookmarklets, you drag this into your bookmarks and then click it when viewing a photo in Webshots to get the original. I assume at some point Webshots will change things and this will break, so don’t get too attached to it.

Tips for Importing Large Datasets Using phpMyAdmin

I recently imported a large dataset using phpMyAdmin (no SSH) and learned a couple things along the way that I thought worth sharing (if only with my future self).

1) Use split

phpMyAdmin’s import has a maximum file size, which is significantly smaller than the dataset I was importing. My initial solution to this problem was to manually break up the text file containing the data. After doing this for a few minutes, I thought “I should make a tool to do this automatically.” Then I thought “Maybe someone already made that tool.” So I headed over to Google to find an application to download, only to discover it’s already on my machine. It’s a Unix command, aptly named “split.” Here’s how I used it:

split -l 250000 bigfile.csv

That gave me a bunch of files of 250,000 lines each, about 8MB in size, comfortably under my 10MB maximum.
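
If you’re on a machine without split, the same idea is only a few lines of Python. This is a rough sketch, not a replacement for the real command: the function name and the chunk_0000.csv naming are made up for illustration, and it assumes the file is UTF-8 text with one record per line.

```python
from pathlib import Path


def split_by_lines(source, lines_per_chunk, out_dir):
    """Write `source` out as numbered chunk files of at most `lines_per_chunk` lines."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    chunk = None
    # newline="" keeps the original line endings untouched in both directions.
    with open(source, "r", encoding="utf-8", newline="") as src:
        for line_number, line in enumerate(src):
            if line_number % lines_per_chunk == 0:
                # Start a new chunk file every `lines_per_chunk` lines.
                if chunk is not None:
                    chunk.close()
                path = out_dir / f"chunk_{len(chunk_paths):04d}.csv"
                chunk = open(path, "w", encoding="utf-8", newline="")
                chunk_paths.append(path)
            chunk.write(line)
    if chunk is not None:
        chunk.close()
    return chunk_paths
```

Calling split_by_lines("bigfile.csv", 250000, "chunks") would do roughly what the split command above does, just with different file names.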

2) Use gzip

phpMyAdmin can unpack compressed files. I got an error about file size headers when I tried using a zip file, so I went with gzip instead, and that worked fine, reducing my 8MB files to about 1.5MB. I used gzip like this:

gzip x*

You’re probably wondering: if my maximum upload size is 10MB, why didn’t I split into 30MB files and get 6MB gzips? I didn’t do that because of another limit: time. PHP scripts (including phpMyAdmin) can only run for so long at one go, and it takes a while to run 250,000 MySQL queries.

After that, it’s just a series of imports, which should run smoothly. If you’re willing and able to SSH to your database, of course, command-line MySQL makes this all a single step. But when that’s not an option, this will at least make the slow alternative less so.
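
The gzip step can also be approximated in Python if the command isn’t handy. Again, just a sketch using the standard library; gzip_file is a made-up helper name, and like the real gzip it replaces each file with a .gz version.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(path):
    """Compress `path` to `path.gz` and delete the original, like `gzip` does by default."""
    path = Path(path)
    gz_path = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        # Stream the data so large chunks never have to fit in memory at once.
        shutil.copyfileobj(src, dst)
    path.unlink()
    return gz_path
```

Looping that over the chunk files gives the same result as running gzip on all of them at once.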

Download Daily Show

You may remember that Viacom sued YouTube for a billion dollars over “Inducement of Copyright Infringement.” What YouTube does, specifically, is make it possible to download clips of, for example, the Daily Show, without watching Comedy Central’s ads. That’s certainly not the purpose of YouTube, but making it possible to download is a side effect of posting anything on the internet.

What many people don’t know is those Daily Show clips were also available on Comedy Central’s website, and could be downloaded through almost the exact same process through which they’re downloaded from YouTube. But Comedy Central’s website being incredibly difficult to access made such downloading more difficult. For one thing, it was almost impossible to locate any but the most recent clips on Comedy Central’s site.

Enter TheDailyShow.com, Viacom’s apparent attempt to compete with YouTube by offering a more accessible interface. Now every Daily Show clip has its own page at a semi-permanent address, and these pages are relatively easy to locate. So you can find and watch your favorite old Daily Show clip, just like you once could on YouTube, only without the same “Inducement of Copyright Infringement.” Why didn’t YouTube just do that?

It turns out they did. The increased accessibility of TheDailyShow.com also makes it incredibly easy to download Daily Show clips just as they were downloaded from YouTube. If you follow that link, you’ll see just how easy it is. Paste in a link, start downloading. The lesson here: on the internet, “Inducement of Copyright Infringement” is pretty much synonymous with “accessible.” You can’t view something on the internet without first downloading it, so the only way to make it difficult-to-download is to make it difficult-to-view. I suspect someone at Viacom imagined they could build an easy-to-view but difficult-to-download video website to demonstrate what YouTube could be, and why they’re suing. That someone was completely wrong.

25 Statements About Microformats

At the last Refresh Denver meeting, I talked about microformats. Rather than doing a slideshow presentation, I just wrote down and passed out 25 statements about microformats that I hoped would be interesting enough to drive discussion. It worked out well enough, I think. In hindsight, it probably would have been better to focus less on the abstract concepts and more on the specifics of how microformats work. But I definitely plan to use this kind of “25 statements” style for future talks, as it ends up being more flexible and more participatory than a typical presentation.

MySpace Code Licensing

As I mentioned previously, many people have asked me about releasing the source code for various MySpace feed services. Back when I first created the blog service, it was actually released under an open source license. That was a mistake. The code was written to work on my specific server, and most people downloading it and trying to install it on different servers didn’t understand it well enough to make the adjustments necessary to get it working. So what I got in exchange for releasing the source code was a series of requests for help installing and customizing, and not a single code contribution.

Why don’t I just rewrite the code to work better on other servers? To turn that back around, why would I? I created the original services to solve problems for myself. Specifically, I want to be able to keep up with my friends’ words and events without checking MySpace regularly. Then I made the services available for others to freely use. I’d already made them for myself, so that wasn’t much work. But packaging the code to run better on other servers is work, and that work doesn’t benefit me at all. If it doesn’t get me code contributions and it doesn’t make my life easier, why should I do it?

It turns out money is a pretty good reason. When people started offering me money for the code, suddenly that work became more interesting. This money for work exchange is a novel concept, but apparently others have tried it before. So I mostly copied what they’ve done and I’m now offering licenses for the code behind my various MySpace feed services. You can buy a license for either the blog feed creator, the comment feed creator, or the event feed creator for $100 each, or you can buy all three for $250.

The license is pretty straightforward, but there is one somewhat abnormal clause that requires you to keep your own copy of the code updated as I release new versions. I don’t want a lot of broken versions of the code out there with my name on it, and I don’t think keeping it working is too much to ask. You can license it and not keep it updated if you really want, but then I’m going to stop sending you updates. So you can either update promptly, or not at all. Other than that it basically boils down to you giving me money and me giving you code.

Proxy Service Optimization

Over the past few days, many people have noticed some problems with the MySpace feeds. That’s because over the past few days I’ve been experimenting with potential ways to speed up these feeds. Some of these experiments have been less successful, some breaking the feeds altogether. But I think I’ve come up with some successful solutions, which I’ll document here.

The big bottleneck in the MySpace feed services, as with any proxy service, is getting the original content. MySpace pages weigh in at around 45kb each. At a dozen or so requests every minute, that’s a lot. So step one, taken a long time ago, was reducing that load.

It’s the nature of feeds that they’re requested much more often than there’s actually new content. That’s sort of the point of feeds. Ideally the feeds would only reload content from MySpace when there’s something new to load. There are various methods built into HTTP, e.g. etags, to ask a server “is there anything new on this page?” Unfortunately, MySpace doesn’t support any of them, so the only way to find out if there’s something new is to look at the actual page.
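
For the curious, this is roughly what the etag approach looks like against a server that does support it. It’s a sketch in Python using only the standard library, not code from the feed service; the function names are made up, and it assumes the server sends an ETag header and answers 304 Not Modified when nothing has changed.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag):
    """Build a GET request that asks the server to skip the body if `etag` still matches."""
    return urllib.request.Request(url, headers={"If-None-Match": etag})


def fetch_if_changed(url, etag):
    """Return (body, new_etag), or (None, etag) when the server answers 304 Not Modified."""
    try:
        with urllib.request.urlopen(build_conditional_request(url, etag)) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as error:
        # urllib reports 304 as an HTTPError; here it just means "nothing new".
        if error.code == 304:
            return None, etag
        raise
```

When the server plays along, an unchanged page costs a few hundred bytes of headers instead of the full 45kb.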

My solution to this was to only look at the actual page once per hour. I figure a one hour lag between when your friend updates her MySpace page and when that update shows up in your feed is okay. If you need to know sooner than that, you should 1) go outside or 2) check MySpace directly. So if a page was already reloaded in the past hour, I use a local copy instead of requesting an update from MySpace.
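
The once-per-hour rule boils down to something like the following. This is a simplified Python sketch rather than the service’s actual code: the cache is just a dictionary, and fetch stands in for whatever function really requests the page.

```python
import time

MAX_AGE_SECONDS = 60 * 60  # one hour, as described above


def get_page(url, cache, fetch, now=None):
    """Return page content for `url`, calling `fetch` only if the cached copy is over an hour old.

    `cache` maps url -> (fetched_at, content); `fetch` is any callable taking a URL.
    """
    now = time.time() if now is None else now
    entry = cache.get(url)
    if entry is not None and now - entry[0] < MAX_AGE_SECONDS:
        # Fresh enough: serve the local copy and leave the origin server alone.
        return entry[1]
    content = fetch(url)
    cache[url] = (now, content)
    return content
```

The now parameter is only there so the timing can be tested without waiting an hour.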

And this was all the optimization the feed service needed for a long time. But lately the feeds have become popular enough that more was needed. I was saving the cached pages in a database, which held both the content and the time of the last update. Fetching the time was almost instant, but fetching 45kb of text out of a database is a relatively slow process, and due to the centralized nature of a database, a few dozen relatively slow database queries will quickly create a backlog that takes a few minutes (forever in internet time) to clear up.

While that backlog is clearing up, a script is waiting for a response from the database. But scripts take up resources, so they can’t just wait around forever. If the database request takes too long, the script times out, and that’s when you see an error message saying my server did not respond. That’s no good.

So the next optimization step was to move the content from the database to individual files. Files don’t take as long to read and write because they don’t need to be indexed for searching like a database record. And if one file does take a long time to read, that doesn’t slow down all the other files as it would in a database. I still kept the update times in the database, because those need to be requested by URL, and I didn’t want to deal with working out some sort of URL-to-file-name mapping.
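
Here’s a sketch of that split in Python, with SQLite standing in for the real database. The table layout, the function names, and the choice to name each file after its row id are all assumptions for illustration; the point is only that the database row stays tiny while the 45kb of content lives in its own file.

```python
import sqlite3
from pathlib import Path


def open_index(db_path=":memory:"):
    """Create the small table that holds only URLs and update times."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL, updated REAL NOT NULL)"
    )
    return db


def save_page(db, cache_dir, url, content, now):
    """Record the update time in the database and write the content to its own file."""
    cursor = db.execute("UPDATE pages SET updated = ? WHERE url = ?", (now, url))
    if cursor.rowcount == 0:
        db.execute("INSERT INTO pages (url, updated) VALUES (?, ?)", (url, now))
    db.commit()
    page_id = db.execute("SELECT id FROM pages WHERE url = ?", (url,)).fetchone()[0]
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # The row id doubles as the file name, so no URL-to-file-name scheme is needed.
    path = cache_dir / f"{page_id}.html"
    path.write_text(content, encoding="utf-8")
    return path


def load_page(db, cache_dir, url):
    """Return (updated, content) for a cached URL, or None if it was never cached."""
    row = db.execute("SELECT id, updated FROM pages WHERE url = ?", (url,)).fetchone()
    if row is None:
        return None
    page_id, updated = row
    return updated, (Path(cache_dir) / f"{page_id}.html").read_text(encoding="utf-8")
```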

So that sped up things quite a bit, but I’d still notice a bit of backlog in the database requests every now and then. Why was it taking so long to read and update a simple time in the database? It turns out it was taking so long to find the record of a specific URL.

I was saving the URLs in a TEXT field in the database. I knew VARCHAR fields were indexed faster, but they’re limited to 255 characters, and I wasn’t sure if I’d be getting URLs longer than 255 characters. So I used TEXT, but I created an index, so searches would be faster. The additional time required to update a TEXT index on adding new URLs was apparently more than the time saved in searching, though, so the index actually slowed down the database even more.

So I revisited VARCHAR. Some quick research showed that, out of the 21,000 or so URLs I had cached, only one URL was longer than 255 characters, and it wasn’t even valid. So my third step, after creating a cache and then moving the cache content from the database to the file system, was to index URLs as VARCHAR instead of TEXT.
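
The “quick research” amounts to a length check like this one. It’s a Python sketch with a made-up function name, assuming the cached URLs have been pulled into a list first; the same check could just as easily be a single query against the database.

```python
def urls_over_limit(urls, limit=255):
    """Return the URLs that would not fit in a column of `limit` characters."""
    return [url for url in urls if len(url) > limit]
```

If that comes back empty, or only with junk, a 255-character column is safe to use.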

Now everything seems to be nice and speedy. There’s still a lag when a page actually needs to be requested from MySpace, but for all other requests, the feeds seem to be loading almost instantly. There’s one question I was asked several times by people who noticed the feeds were slow and/or broken: why don’t I just spread out the load by giving out copies of the code? That would certainly remove the need for optimization. But I actually have a few different reasons for not doing that, which I’ll save for the next article.

MySpace Comments Feed Creator

A few different people have notified me about a bug in the MySpace Comments RSS Creator that caused some comments to not show up in the feed. The problem was with comments that had “Currently Online” notices in them. This was kind of difficult to track down because it only showed up when the people leaving the comments were online at the time the comment was being converted to a feed. But it’s fixed now.

Along with that bug fix, Atom feeds are now possible, and so the name is changed from “RSS Creator” to “Feed Creator,” bringing the comments in line with the blog and event feed creators.

MySpace Feed Creator

Pei Huang helpfully pointed out that the MySpace RSS Creator wasn’t working quite right on some blogs. Apparently some blogs have the time at the top of each post and others have it at the bottom. The tool would work on both, but the post times would all be wrong when the time was on top. I don’t know if this is something new at MySpace or if it’s always been like this, but the tool will handle both types now.

Also, it can now produce Atom feeds in addition to RSS feeds. If you don’t know what that means, you probably have no reason to care, but the name has been changed accordingly. As always, please let me know if you notice any problems.
