We recently replaced most of our image resizing code with Thumbor, an open-source thumbnailing server. This post describes how and why we migrated to a standalone thumbnailing architecture, and addresses some of the challenges we faced along the way.

Background

Historically, 99designs has largely been powered by a monolithic PHP application. Maintaining this application has become increasingly difficult as our team and codebase have grown. One cause of this difficulty is that the application contains a lot of incidental functionality: supporting code that isn't the core purpose of the application, but which is necessary for its operation.

To address this, in 2013 we set ourselves a technical goal of migrating to a more service-oriented architecture. This means breaking big masses of functionality into discrete services and libraries that each do one thing well. Such a design tends to yield smaller, more cohesive services, and provides natural lines along which our team can subdivide.

Image thumbnailing is a generic function required by many graphics-intensive websites, and a prime candidate for extraction into a standalone service.

Thumbnails at 99designs

Our 230,000+ strong designer community uploads a new image to 99designs every ~6 seconds. We serve several thumbnail variations of these images across the site.

Our thumbnailing solution needs to scale to serve our production traffic load. The approach we've used until recently has been to generate thumbnails ahead-of-time using asynchronous task queues. Every time a designer uploads an image, we kick off a task that generates thumbnails of that image and stores them in S3:

Uploading an image enqueues a resize task

If a thumbnail request arrives while the task is generating the thumbnail, we serve a placeholder image:

Asynchronously generating image thumbnails

Once the thumbnailing task finishes, we can serve the resized images:

Serving the generated thumbnails
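The ahead-of-time flow described above can be sketched in a few lines. This is a minimal illustration only, assuming in-memory stand-ins for the task queue and for S3; the names `upload_image`, `run_next_task`, `serve_thumbnail` and the sizes in `SIZES` are hypothetical and are not taken from our PHP application.

```python
from collections import deque

PLACEHOLDER = "placeholder.png"
SIZES = [(320, 240), (640, 480)]  # hypothetical predefined thumbnail sizes

task_queue = deque()   # stands in for the asynchronous task queue
thumbnail_store = {}   # stands in for S3: (image, size) -> thumbnail name


def upload_image(image):
    """Uploading an image only enqueues the resize work."""
    task_queue.append(image)


def run_next_task():
    """A worker generates every predefined size and stores the results."""
    image = task_queue.popleft()
    for size in SIZES:
        width, height = size
        thumbnail_store[(image, size)] = f"{image}@{width}x{height}"


def serve_thumbnail(image, size):
    """Serve the stored thumbnail, or a placeholder if it is not ready yet."""
    return thumbnail_store.get((image, size), PLACEHOLDER)
```

Note that a request for a size outside `SIZES` can never be satisfied in this model, because only the predefined sizes are ever generated.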

This architecture has served us pretty well. It keeps response times low and scales nicely, but it has a few shortcomings:

  • We've intertwined the image resizing logic with our PHP application. Other apps in our stack have to implement their own resizing.

  • It's not the simplest solution. There's quite a bit of complexity: deduping resize tasks, using client-side polling to check if a resize operation has completed, etc.

  • We can only serve thumbnails at predefined sizes. If we decided to introduce a new thumbnail size, we'd have to generate that thumbnail for tens of millions of existing images.

A better solution is to create a separate, simpler thumbnailing service that any application in our stack can use.

Thumbor overview

Enter Thumbor. Thumbor is an open-source thumbnail server developed by the clever people behind globo.com. Thumbor resizes images on-demand using specially constructed URLs that contain the URL of the original image and the desired thumbnail dimensions, e.g.:

http://thumbor.example.com/320x240/http://images.example.com/llamas.jpg

In this example, the Thumbor server at thumbor.example.com fetches llamas.jpg from images.example.com over HTTP, resizes it to 320x240 pixels, and streams the thumbnail image data directly to the client.

At face value this seems less scalable than our previous task-based solution, but some careful use of caching ensures we only do the resize work once per thumbnail.
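To make the URL scheme concrete, here is a minimal sketch that assembles a thumbnail URL in the same simplified shape as the example above. The function name `thumbor_url` is hypothetical, and a real deployment would normally also include a signature segment in the path, as in the signed URLs shown later in this post.

```python
def thumbor_url(server, width, height, image_url):
    """Build an on-demand thumbnail URL: <server>/<width>x<height>/<original URL>."""
    return f"{server.rstrip('/')}/{width}x{height}/{image_url}"


url = thumbor_url(
    "http://thumbor.example.com",
    320,
    240,
    "http://images.example.com/llamas.jpg",
)
print(url)
```

Because the dimensions live in the URL, asking for a new thumbnail size is just a matter of generating a different URL.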

New architecture

The high-level thumbnailing architecture now looks like this:

Serving thumbnails on-demand via Thumbor

Our applications generate URLs that point to a Thumbor server (via a CDN). The first request for a particular thumbnail blocks while Thumbor fetches the original image and produces the resized version. We set long cache expiry times on the resulting images, so they're effectively cached forever. The CDN serves all subsequent thumbnail requests.
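The effect of the CDN can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Origin` and `Cdn` classes are hypothetical in-memory stand-ins, and a real CDN decides what to cache based on the HTTP caching headers on the response rather than a plain dictionary.

```python
class Origin:
    """Stands in for the Thumbor cluster and counts how often it resizes."""

    def __init__(self):
        self.resize_count = 0

    def get(self, url):
        self.resize_count += 1  # the expensive fetch-and-resize step
        return f"thumbnail-bytes-for:{url}"


class Cdn:
    """Stands in for the CDN: caches each thumbnail URL indefinitely."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, url):
        if url not in self.cache:
            # Cache miss: only the first request reaches the origin.
            self.cache[url] = self.origin.get(url)
        return self.cache[url]
```

The cache key is the full thumbnail URL, so each size of each image is resized once and then served from the cache.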

We put a cluster of Thumbor servers behind an elastic load balancer to cope with production traffic. This also provides redundancy if one of the servers dies.

The resulting architecture is very simple, and our image-resizing capability is neatly encapsulated as a standalone service. This means we avoid the need to re-implement thumbnailing in each of our applications—all that's needed is a small client library to produce Thumbor URLs.

Usage example

We created Phumbor to generate Thumbor URLs in PHP applications. Here's how you might implement a Thumbor view helper:

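<?php
// View helper: returns a Phumbor URL builder for the given original image.
function thumbnail($original)
{
    // Base URL of the Thumbor server (or of the CDN in front of it).
    $server = 'http://thumbnails.example.com';
    // Must match the security key configured on the Thumbor server.
    $secret = 'MY_SECRET_KEY';
    return new \Thumbor\Url\Builder($server, $secret, $original);
}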

You might use it in a template like this:

<img src="<?php echo thumbnail('http://images.example.com/llamas.jpg')->resize(320, 240) ?>" />
<img src="<?php echo thumbnail('http://images.example.com/foo.png')->resize(320, 240) ?>" />

This produces the following HTML:

<img src="http://thumbnails.example.com/5yVqQzzWIuobw9rd4UebeF9v78c=/320x240/http://images.example.com/llamas.jpg" />
<img src="http://thumbnails.example.com/X8oXlCzK1ce_UIxiZ0tlv5vF7nY=/320x240/http://images.example.com/foo.png" />
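The first path segment in these URLs is a signature derived from the secret key, which stops clients from requesting arbitrary thumbnails by editing the URL. The sketch below shows the general idea, assuming the signature is a URL-safe base64 encoding of an HMAC-SHA1 digest over the rest of the path. That is our understanding of Thumbor's scheme, but treat the details as an assumption and rely on a client library such as Phumbor in practice. The function names are hypothetical.

```python
import base64
import hashlib
import hmac


def sign_path(secret, path):
    """Return a URL-safe base64 HMAC-SHA1 digest of the thumbnail path."""
    digest = hmac.new(
        secret.encode("utf-8"), path.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


def signed_url(server, secret, width, height, image_url):
    """Build <server>/<signature>/<width>x<height>/<original URL>."""
    path = f"{width}x{height}/{image_url}"
    return f"{server}/{sign_path(secret, path)}/{path}"
```

The server recomputes the signature from the requested path and its own copy of the key, and rejects the request if the two don't match.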

Implementation strategy

We used a couple of complementary techniques to test Thumbor's capabilities before committing to its use in production.

Firstly, we used feature-flipping to selectively enable Thumbor URLs for certain users. Initially we used this to let developers click around the site and check that Thumbor was generating thumbnails correctly.

Secondly, we used asynchronous tasks to simulate a production traffic load on the Thumbor service. Every time an app server handled a thumbnail request, we enqueued a task that requested that same thumbnail from the new Thumbor service. This allowed us to check performance of the service without risking a disruption to our users.
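The shape of this shadow-traffic technique can be sketched as follows. This is an illustration only: it uses an in-process queue and thread in place of a real task queue, and `fetch_from_thumbor` is a hypothetical stand-in for an HTTP request to the new service.

```python
import queue
import threading

shadow_queue = queue.Queue()  # stands in for the asynchronous task queue
shadow_results = []           # (url, status) pairs recorded for later analysis


def fetch_from_thumbor(thumbnail_url):
    """Hypothetical stand-in for an HTTP GET against the new service."""
    return 200


def handle_thumbnail_request(thumbnail_url):
    """Serve from the existing system and mirror the request to Thumbor."""
    shadow_queue.put(thumbnail_url)  # enqueue only; the user never waits on it
    return f"existing-thumbnail-for:{thumbnail_url}"


def shadow_worker():
    """Replay mirrored requests against the new service until told to stop."""
    while True:
        url = shadow_queue.get()
        if url is None:  # sentinel value: shut the worker down
            break
        shadow_results.append((url, fetch_from_thumbor(url)))
```

The user-facing response never depends on the mirrored request, so a slow or failing Thumbor service only shows up in the recorded results.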

Finally, we used our feature-flipping system to incrementally roll out Thumbor thumbnails to all our users. This worked better than pointing all traffic at the Thumbor service at once, which tended to cause a spike in response times.
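One common way to implement an incremental rollout like this is to assign each user a stable bucket and enable the feature for buckets below a configurable percentage. The sketch below is a generic illustration of that idea, not a description of our feature-flipping system; the function names are hypothetical.

```python
import hashlib


def rollout_bucket(user_id):
    """Map a user to a stable bucket in the range 0-99."""
    digest = hashlib.sha1(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def use_thumbor(user_id, rollout_percent):
    """Enable Thumbor URLs for roughly rollout_percent of users."""
    return rollout_bucket(user_id) < rollout_percent
```

Because the bucket depends only on the user ID, a user who sees Thumbor thumbnails at 10% keeps seeing them as the percentage is raised towards 100%.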

Thumbor configuration

Some of our Thumbor configuration settings differ from the recommended defaults. We tweaked our configuration in response to our performance measurements.

Thumbor ships with a number of imaging backends; the default and recommended backend is PIL. Our testing showed that the OpenCV backend is much faster than PIL (3-4x faster). Unfortunately, OpenCV can't resize GIFs or images with alpha transparency. As a result, we implemented a simple multiplexing backend that delegates to OpenCV wherever possible and falls back to PIL for the images OpenCV can't handle.
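The dispatch logic of such a multiplexing backend might look roughly like this. It is a simplified sketch and does not use Thumbor's actual engine interface: the classes are hypothetical stand-ins, and an image is modelled as a plain dictionary with assumed `format` and `has_alpha` keys.

```python
class FastEngine:
    """Hypothetical stand-in for the OpenCV backend."""

    name = "opencv"

    def resize(self, image, width, height):
        return (self.name, width, height)


class FallbackEngine:
    """Hypothetical stand-in for the PIL backend."""

    name = "pil"

    def resize(self, image, width, height):
        return (self.name, width, height)


class MultiplexingEngine:
    """Delegate to the fast engine unless the image needs the fallback."""

    def __init__(self, fast, fallback):
        self.fast = fast
        self.fallback = fallback

    def choose(self, image):
        # GIFs and images with alpha transparency go to the fallback engine.
        if image["format"] == "GIF" or image["has_alpha"]:
            return self.fallback
        return self.fast

    def resize(self, image, width, height):
        return self.choose(image).resize(image, width, height)
```

Keeping the choice in a single `choose` method means the rest of the pipeline sees one engine, regardless of which backend does the work.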

Our production Thumbor cluster consists of 6x c1.medium EC2 instances behind an ELB, each running 4 Thumbor processes behind nginx. This cluster can comfortably serve all our production traffic.

In general, we've found Thumbor to be quite stable, and we expect it to mature further as more people use it and contribute improvements.

Conclusion

Our Thumbor service now serves all design entry thumbnails for our main PHP application. The resulting architecture is much simpler and the service is usable by other applications in our stack. We'll continue to use Thumbor in future apps we develop, and look for more opportunities to simplify our codebase by progressively adopting a more service-oriented architecture.