99designs Tech Blog

Adventures in web development

Swiftly and Machine Learning: Part 2

In this series of guest blog posts, 99designs intern Daniel Williams takes us through how he has applied his knowledge of Machine Learning to the challenge of classifying Swiftly tasks based on what the customer requests.

The challenge

Swiftly is an online service from 99designs that lets customers get small graphic design jobs done quickly and affordably. It’s powered by a global network of professional designers who tackle things like business card updates and photo retouching in 30 minutes or less – an amazing turnaround time for a service with real people in the loop!

With speed central to the service’s value, any time wasted in getting a task to a designer with experience in its specific requirements could have a detrimental impact on the customer’s experience.

The ultimate aim is complete and accurate automation of job-to-designer matching, with the customer simply saying in their own terms what they need. To move towards that goal, we decided to apply machine learning to further develop Swiftly’s “Intelligent Matching System”.

This is part two of a three-part blog series. In part one we tried to determine the types of tasks. In this post, we use machine learning to classify tasks into these task categories. A future post will discuss using our predictions for task allocation.

Categories to predict

To set up a machine learning problem, we need to first decide on what we want the answers to be. After the last post’s experimentation, I decided to split the classification into two parts: what type of document is to be edited or created, and what type of work is needed on the document.

This gives us 7 document types:

  • Logo
  • Business Card
  • Icon
  • Template (ppt / pdf / word etc)
  • Header / Banner / Ad / Poster
  • Social Media
  • Other Image

and 9 types of graphic design work appropriate for small tasks:

  • Vectorisation
  • Transparency
  • Holidays edit
  • Creative Update
  • Resize
  • Reformat
  • General Edit
  • Colour Change
  • Text Change

For example, one task might be Vectorisation on a Logo, another might be Text Change on a Business Card. In total, 63 different combinations of document and work type exist. This is what we’re trying to predict.

Obtaining training data

In my last post, I used unsupervised techniques that don’t need training data. Now that we have a specific outcome we’d like to predict, supervised methods are more appropriate. They use training data to find patterns associated with each category, patterns that might be hard for humans to spot. For us, that training data will be a bunch of historical tasks and the correct categories for them.

However, obtaining good training data is a large problem in itself, especially given how many combinations of categories there are!

Mechanical Turk

Knowing how much work was involved, my first instinct was to outsource it to Amazon’s Mechanical Turk service. Mechanical Turk is named after an elaborate 18th century hoax that was exhibited across Europe, in which an automaton could play a strong game of chess against a human opponent. It was a hoax because it was not an automaton at all: there was a human chess player concealed inside the machine, secretly operating it.


Amazon calls its service Artificial Artificial Intelligence, and it is a form of ‘fake’ machine learning. We use software to submit tasks for classification, but real people all over the world get paid a little money to do the categorising for us.

Manual Classification

Unfortunately, the results I achieved from Mechanical Turk were poor. Even humans incorrectly classified many tasks, and this data, if fed into my machine learning classifier, would lead it to poor conclusions and low accuracy. The Turkers may have lacked some specialised knowledge about graphic design, or I may not have set up the Mechanical Turk task sufficiently well. (I wish I had read this post before diving into Mechanical Turk!)

Ultimately, having an accurate training set is perhaps the most important part of developing a good classifier. I rolled up my sleeves, and manually inspected and classified approximately 1200 Swiftly design briefs myself. This was slow and monotonous, but it meant that I knew I had an excellent quality training set.

Pre-processing Pipeline

Our classifier doesn’t accept raw text, but instead we must turn design briefs into features it can make decisions on. Human language is complicated, so there are many steps to go from text to features. Any good natural language system has such a pipeline. In ours, we:

  1. Tokenise: split the text up into individual ‘words’
  2. Remove punctuation and casing
  3. Remove stop words (common words with no predictive power such as ‘a’, ‘the’)
  4. Perform stemming (reducing words to their ‘stem’, e.g. “bounced”, “bounce”, “bouncing” and “bounces” all become “bounc”)
  5. Perform lemmatisation (see below)
  6. Convert from words (“unigrams”) to word pairs (“bigrams”)

We covered the first four steps in the last post, so let’s go over steps 5 and 6 here.

Lemmatisation

Lemmatisation is similar to stemming. It’s the process of grouping related words together by replacing several variations with a common shared symbol. For example, Swiftly task descriptions often contain URLs. Lemmatisation of URLs would mean replacing every URL with a common placeholder (for example “$URL”). So the following brief:

On this business card, please change “www.coolguynumber1.com” to “www.greatestdude.org”

becomes:

On this business card, please change “$URL” to “$URL”

We do this because the number of words that occur in the data set is large, but many only occur once or twice. Nearly every URL we see in a brief will be unique. For our machine learner, it can only say something useful about words which are shared between different tasks, so all these unique words and URLs are wasted.

We do this because pre-processing involves generating a list of all the words that appear in the training dataset. However, words that only appear once in the dataset are removed because they add noise. URLs are generally unique and are unlikely to occur more than once. Without lemmatisation, we lose all information gained from the presence of URLs in a brief. With lemmatisation, we instead get the symbol “$URL” many times. If a URL in a task description turns out to be a discriminating feature, this should increase classification accuracy.

Other lemmas that I used included: dimensions (e.g. 300px x 400px), emails, DPI measures and hexadecimal codes for colours (eg. #CC3399). With these, the following (entirely fictional) task description transforms from:

Please change the email on this business card from coolguy99@99designs.com to koolguy99@99designs.com. Can you also include a link to my website www.coolestguyuknow.net on the bottom? Please also change all the fonts to #CC3399 and the circle to #4C3F99. I want a few different business card sizes, namely: 400 x 400, 30 x 45 and 5600 by 3320. Thanks!

to:

Please change the email on this business card from $EMAIL to $EMAIL. Can you also include a link to my website $URL on the bottom? Please also change all the fonts to $CHEX and the circle to $CHEX. I want a few different business card sizes, namely: $DIM, $DIM and $DIM. Thanks!

Now URLs, email addresses, dimensions and so on can all take many different forms. The easiest way to match as many as possible is to use regular expressions. I used these patterns to perform my lemmatisation (for Python’s re module), you might find them useful too.

import re

# URL regex from: http://daringfireball.net/2010/07/improved_regex_for_matching_urls
DIM_RE = re.compile(r"\b\d+\s?(?:[wW]|px|Px|[Pp]ixels|[hH])?\s*(?:x|by|X)\s*\d+\s?(?:[hH]|px|Px|[Pp]ixels|[wW])?\b", re.DOTALL)
URL_RE = re.compile(r"""((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))""", re.DOTALL)
EMAIL_RE = re.compile(r"[A-Za-z0-9\.\+_-]+@[A-Za-z0-9\._-]+\.[a-zA-Z]*", re.DOTALL)
DPI_RE = re.compile(r"""\d+\s?(?:DPI|dpi)""")
CHEX_RE = re.compile(r"""#[A-Fa-f0-9]{6}""")
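
To apply these lemmas, each pattern is substituted with its placeholder in turn. Here’s a minimal sketch of that step; the helper name, the $DPI placeholder and the ordering (emails before URLs, so an address isn’t half-eaten by the URL pattern) are my own choices rather than details from the production pipeline:

LEMMAS = [
    (EMAIL_RE, "$EMAIL"),  # emails first so the URL pattern doesn't grab the domain
    (URL_RE, "$URL"),
    (DIM_RE, "$DIM"),
    (DPI_RE, "$DPI"),
    (CHEX_RE, "$CHEX"),
]

def lemmatise(text):
    """Replace URLs, emails, dimensions, DPI measures and hex colours
    with shared placeholder tokens."""
    for pattern, placeholder in LEMMAS:
        text = pattern.sub(placeholder, text)
    return text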

Bigrams

Previously I had worked with each word in the text individually (“unigrams”), but this often means words have no context. So, for example, “business card” was broken into “business” and “card”, and the importance of those words appearing together was lost. Bigrams are simply pairs of words that appear next to each other. So, if we include both unigrams and bigrams, the text “business card” would provide us the features “business”, “card” and “business card”. This captures more of the context of certain phrases. In our data, the top bigrams after stemming were:

bigram                frequency
would like            72
logo add              49
take exist            47
fun creativ           47
add fun               44
exist logo            44
busi card             33
transpar background   28
$URL $URL             24
creativ festiv        22
festiv element        22
bat pumpkin           21
spooki element        21
pumpkin skeleton      21
busi name             20
name logo             20
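
A table like this only takes a few lines of Python to produce. A rough sketch, assuming the briefs have already been run through steps 1-5 of the pipeline and are available as lists of stemmed tokens (the preprocessed_briefs variable here is hypothetical):

from collections import Counter

def bigrams(tokens):
    # pair each token with its right-hand neighbour, e.g. ['busi', 'card'] -> ['busi card']
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

counts = Counter()
for tokens in preprocessed_briefs:
    counts.update(bigrams(tokens))

for bigram, freq in counts.most_common(16):
    print("{0} {1}".format(bigram, freq))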

The pipeline in action

Let’s do a worked example using the sentence below:

Please change the email on this business card from coolguy99@gmail.com to koolguy99@gmail.com. Thanks!

Our pipeline first tokenises the sentence into words. Follow each word from left to right in the table below to see how it gets transformed by the pipeline.

STEP 1                      STEP 2                      STEP 3                      STEP 4                      STEP 5
Tokenisation                Punctuation / Case Removal  Stop Words                  Stemming                    Lemmatisation
Please                      please                      -                           -                           -
change                      change                      change                      chang                       chang
the                         the                         -                           -                           -
email                       email                       email                       email                       email
on                          on                          -                           -                           -
this                        this                        -                           -                           -
business                    business                    business                    busi                        busi
card                        card                        card                        card                        card
from                        from                        -                           -                           -
coolguy99@gmail.com         coolguy99@gmail.com         coolguy99@gmail.com         coolguy99@gmail.com         $EMAIL
to                          to                          -                           -                           -
koolguy99@gmail.com         koolguy99@gmail.com         koolguy99@gmail.com         koolguy99@gmail.com         $EMAIL
Thanks!                     thanks                      -                           -                           -

Finally we generate bigrams, which leaves us with the following list of features: “chang”, “email”, “busi”, “card”, “$EMAIL”, “chang email”, “email busi”, “busi card”, “card $EMAIL” and “$EMAIL $EMAIL”.
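
Pulling it all together, a stripped-down version of the pipeline might look like the sketch below. It reuses the lemmatise helper sketched earlier and nltk for stop words and stemming; note that it applies lemmatisation to the raw text up front (so that multi-word patterns such as dimensions can match), whereas the post lists it as step 5, and the punctuation handling is simplified:

import nltk

STEMMER = nltk.stem.snowball.EnglishStemmer()
STOPWORDS = set(nltk.corpus.stopwords.words('english'))
PUNCTUATION = r"""!@#$%^&*()_+=][{}'-";:/?\.,~`"""

def extract_features(brief):
    """Turn a raw design brief into a list of unigram and bigram features."""
    tokens = []
    for word in lemmatise(brief).split():
        if word.startswith("$"):
            # placeholder from lemmatisation ($EMAIL, $URL, ...): keep it,
            # just trimming any trailing punctuation
            tokens.append(word.rstrip(PUNCTUATION))
            continue
        word = word.strip(PUNCTUATION).lower()
        if word and word not in STOPWORDS:
            tokens.append(STEMMER.stem(word))
    # add bigrams alongside the unigrams
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]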

Vectorisation

As discussed in the last post, we need to convert text into a numerical format. I used a simple model known as the bag-of-words vector space model. This model represents each document as a vector: a count of how many times each different word occurred in it. The vector has n dimensions, where n is the total number of terms in the whole collection of documents. In our training dataset there are 9186 tokens, so each brief’s vector is extremely sparse: the vast majority of terms have a count of 0.

Once the data set has been converted into vectors, it can be used to train a supervised learning algorithm.
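
The post doesn’t say which vectoriser was used; as one concrete (assumed) option, scikit-learn’s CountVectorizer can do the bag-of-words counting directly over the feature lists our pipeline produces:

from sklearn.feature_extraction.text import CountVectorizer

# each brief is already a list of unigram/bigram features from the pipeline,
# so the analyzer just passes the lists straight through
vectoriser = CountVectorizer(analyzer=lambda features: features)

# briefs_features: a list of feature lists, one per brief (assumed to exist)
X = vectoriser.fit_transform(briefs_features)   # sparse matrix, n_briefs x n_terms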

Supervised Learning: Training the Classifier

Now that our data’s in the desired format, we can finally develop a system that learns to tell the difference between the various categories. This is called building a classifier model. Once the model has been built, new briefs can be fed into it and it will predict their category (called their label).

Supervised classification (image credit: NLTK)

What we’ve discussed so far is getting labels and extracting features using our pipeline. But what algorithm should we use?

Multinomial Naive Bayes

I chose to use the Multinomial Naive Bayes (“MNB”) classifier for this task. The Naive Bayes Wikipedia page does a good job of explaining the mathematics behind the classifier in detail. Suffice to say that it is simple, computationally efficient, and has been shown to work surprisingly well in the field of document classification.
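
The post doesn’t name a particular implementation, but scikit-learn ships one; under that assumption, and reusing the extract_features and vectoriser helpers sketched earlier, training and using it looks roughly like this:

from sklearn.naive_bayes import MultinomialNB

# X: sparse count matrix from the vectorisation step
# y: the manually assigned document-type labels for the ~1200 training briefs
clf = MultinomialNB(alpha=1.0)   # alpha=1.0 gives the Laplacian smoothing mentioned below
clf.fit(X, y)

# predict the document type of a new, unseen brief
features = extract_features("Please make my logo background transparent")
print(clf.predict(vectoriser.transform([features])))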

A (simplified) worked Example

A simplified way of thinking about how the algorithm works in the context of document classification is:

  1. For each token in the total training dataset, what is the probability of that token being associated with each class?
  2. For each token in a particular brief, add up the probabilities of each class for each token
  3. Pick the class with the highest probability.

So, say we have the following probabilities (after Laplacian smoothing and normalisation) for the tokens from our earlier example occurring in each category type:

Columns: (1) Other Image, (2) Header / Banner / Ad / Poster / Flier, (3) Logo, (4) Business Card, (5) Template work (ppt / pdf / word etc), (6) Icon, (7) Social Media

Token        (1)        (2)        (3)        (4)        (5)        (6)        (7)
card         0.00019    0.00322    0.00257    0.0155     0.00021    0.00055    9e-05
busi         0.00038    0.00154    0.00325    0.00915    0.00021    0.00048    0.00037
busi card    6e-05      0.00055    0.00174    0.00904    0.00021    0.00048    9e-05
chang        0.00275    0.00445    0.00416    0.00525    0.00064    0.00159    0.00028
file         0.00596    0.00395    0.00649    0.00525    0.00245    0.00408    0.00241
logo         0.00096    0.0054     0.0266     0.00513    0.00075    0.00512    0.00408
need         0.00832    0.00672    0.00717    0.00478    0.00139    0.00623    0.00232
attach       0.00467    0.00622    0.00364    0.00414    0.0017     0.00484    0.0012
updat        0.00013    0.00104    0.00079    0.00391    0.00032    0.00042    9e-05
$EMAIL       0.00019    0.00073    0.0002     0.00373    0.00032    0.00014    0.00028

Given the brief:

update the logo on my business card

We would match up each token with its probabilities in the table above, giving us the following table. Adding up each column then gives us a score for that class.

Columns: (1) Other Image, (2) Header / Banner / Ad / Poster / Flier, (3) Logo, (4) Business Card, (5) Template work (ppt / pdf / word etc), (6) Icon, (7) Social Media

Token        (1)        (2)        (3)        (4)        (5)        (6)        (7)
card         0.00019    0.00322    0.00257    0.0155     0.00021    0.00055    9e-05
busi         0.00038    0.00154    0.00325    0.00915    0.00021    0.00048    0.00037
busi card    6e-05      0.00055    0.00174    0.00904    0.00021    0.00048    9e-05
logo         0.00096    0.0054     0.0266     0.00513    0.00075    0.00512    0.00408
updat        0.00013    0.00104    0.00079    0.00391    0.00032    0.00042    9e-05
sum:         0.00172    0.0118     0.035      0.0427     0.0017     0.00705    0.00472

Business card has the highest score, and so that is our prediction. Simple! The mathematics is a little more sophisticated than this, but the intuition behind it is the same.
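
The intuition in that table translates almost directly into code. Here’s a toy sketch of the scoring step, where token_probs is assumed to be a mapping from token to its per-class probabilities (as in the table above); the real classifier works with the full probability model, as noted:

def classify(tokens, token_probs):
    """token_probs maps token -> {class_name: probability}.
    Sum each class's probabilities over the brief's tokens and
    return the class with the highest total."""
    scores = {}
    for token in tokens:
        for cls, prob in token_probs.get(token, {}).items():
            scores[cls] = scores.get(cls, 0.0) + prob
    return max(scores, key=scores.get)

# classify(["updat", "logo", "busi", "card", "busi card"], token_probs)
# -> "Business Card" for the probabilities above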

Classifier Structure

Now, we have two types of classes to predict: document type and task type. I decided to build the classifier structure to reflect this. A top-level classifier, trained on the full dataset, predicts the document type (logo, business card, etc). Then a separate specialised classifier for each document type predicts the task category. So, for example, we have a classifier dedicated to working out the task type for business card cases, trained only on those cases.
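
A rough sketch of this two-tier structure, using the scikit-learn MultinomialNB from the earlier sketch (the class and method names here are illustrative, not taken from the production code):

class TwoTierClassifier(object):
    """Top-level classifier picks the document type; a per-document-type
    sub-classifier then picks the work type."""

    def __init__(self):
        self.doc_clf = MultinomialNB()
        self.task_clfs = {}          # document type -> sub-classifier

    def fit(self, X, doc_labels, task_labels):
        self.doc_clf.fit(X, doc_labels)
        for doc_type in set(doc_labels):
            # train each sub-classifier only on briefs of its document type
            rows = [i for i, d in enumerate(doc_labels) if d == doc_type]
            sub = MultinomialNB()
            sub.fit(X[rows], [task_labels[i] for i in rows])
            self.task_clfs[doc_type] = sub

    def predict(self, x):
        doc_type = self.doc_clf.predict(x)[0]
        task_type = self.task_clfs[doc_type].predict(x)[0]
        return doc_type, task_type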

The training and classification is summarised in these handy diagrams.

Classifier Training

Classifier Structure

Classification

Classifier Structure

Results

Are we getting good predictions?

To see whether our algorithm is, in fact, learning with experience, we can plot a learning curve. This tells us both how the classifier is doing, and how helpful more data would be. To test this, I plotted the 10-fold cross-validated accuracy of the top-layer classifier as the training set size is increased:

Learning Curve 1

It looks like our machine is learning! The more data it sees, the better it gets at picking out the correct category. It looks as though accuracy may flatten off at about 80%. This suggests that to do better, we’d need to find new features instead of just collecting more cases. The sub-classifiers, as a result of the classifier structure, have less data to work with in the training set. However, they appeared to follow a similar learning curve.
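
For reference, scikit-learn can produce this kind of curve with very little code. A hedged sketch, assuming the X and y arrays from earlier (the plotting details are mine, not from the original experiments):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve

sizes, train_scores, test_scores = learning_curve(
    MultinomialNB(), X, y, cv=10,
    train_sizes=np.linspace(0.1, 1.0, 10))

plt.plot(sizes, test_scores.mean(axis=1), marker="o")
plt.xlabel("Training set size")
plt.ylabel("10-fold cross-validated accuracy")
plt.show()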

Accuracy of various implementations

Over the course of my experiments, I tested the accuracy of a variety of implementation and algorithms. For those interested in the details, accuracy figures are below.

Classifier Type / Algorithm Type    MNB        NB         Baseline
Specialised Sub-Classifier:
    Top Level Classifier            78.62 %    60.17 %    36.33 %
    Sub-Classifier                  69.46 %    61.54 %    32.97 %
    Combined accuracy               54.61 %    37.03 %    11.97 %
Generalised Sub-Classifier:
    Top Level Classifier            78.62 %    60.17 %    36.33 %
    Sub-Classifier                  59.97 %    50.95 %    24.13 %
    Combined accuracy               47.15 %    30.66 %    8.77 %
Single Classifier:
    Accuracy                        45.58 %    39.12 %    11.43 %

The “Specialised Sub-Classifier” is the implementation we discussed above, whereas the “Generalised Sub-Classifier” used a single classifier for task type, rather than one per document type. The “Single Classifier” tries to hit both targets at once, classifying against the full set of 63 category combinations. I also compared Multinomial Naive Bayes against plain Naive Bayes (NB) and a simple Zero-R baseline.

Wrapping up

The two-tier classifier approach worked the best, picking the document type correctly nearly 80% of the time, but getting both document and task type right only 55% of the time. The Multinomial Naive Bayes also did better than Naive Bayes on this task, as expected.

How might we improve our results? As the learning curve suggested, the most promising avenue is engineering new, more informative features rather than simply collecting more training data.

Next Time

Next time, I will discuss how this system can be applied to the next stage of the customer-to-designer matching process. How do we figure out which categories a particular designer may be good at? And how do we make sure that designer gets those tasks?

About Daniel

Daniel Williams is a Bachelor of Science (Computing and Software Science) student at the University of Melbourne and Research Assistant at the Centre for Neural Engineering where he applies Machine Learning techniques to the search for genetic indicators of Schizophrenia. He also serves as a tutor at the Department of Computing and Information Systems. Daniel was one of four students selected to take part in the inaugural round of Tin Alley Beta summer internships and he now works part-time at 99designs. Daniel is an avid eurogamer, follower of “the cricket”, and hearty enjoyer of the pub.

We’re Hiring a Ruby on Rails Developer

by John Barton

Here at 99designs we’re what you’d call a polyglot shop – we’ve got a mix of PHP, Ruby, Python, and Go in production. And when we say production, we mean at serious scale. Our mission is to connect the world with great graphic designers wherever they are, something we do quite a bit of.

Right now we’re on a hunt for a developer who can Help Us Out™. Usually we advertise for a generalist “web developer” and then find the right place for them internally based on their strengths. This time we’re trying to hire a very specific skill set for a very specific project. The skills are Ruby and Rails, and the project is building out our new payments service.

Company wide we’re transitioning to having small, decentralised teams with their own product lines and the attendant SOA/Platform to support that goal. Last year we had great success with creating our single sign-on system in Go, and this year we’re rounding out the platform with a shared payments system in Rails*.

This new service will enable us to spin up new product lines or move into new international markets quickly. Between the iterative approach we’re taking to replacing our old payments system, and the UX for both the customers using the service and the developers integrating against it, there are some exciting and interesting problems to solve on this project.

The existing team on the project are very strong developers with good knowledge of the problem space, but not a lot of Rails experience. We need a mid to senior developer to come in and help “set the tone” of the codebase. That role had been filled within the team by me (John Barton, internet famous as “angry webscale ruby guy”), but I’ve since been promoted to manage the engineering team as a whole, and between all the meetings and spreadsheets it’s hard to keep up the pace of contribution that this project deserves.

You’ll need to be the diesel engine of the team: churn through the backlog turning features into idiomatic and reliable Rails code at a steady cadence. There are opportunities to coach within the team, but even just creating a sizeable body of code to be an example of “this is how we do it” (cue Montell Jordan playing https://www.youtube.com/watch?v=0hiUuL5uTKc) will keep this project on track.

The quality of the codebase after 3 months of progress is high. We don’t believe in magic make-believe numbers here, but right now we’re sitting on a code climate GPA of 4.0. If you’re a fan of Sandi Metz’s Practical Object Oriented Design in Ruby or Avdi Grimm’s Objects on Rails you will feel right at home in this codebase.

If this is something you’re interested in and think you can help us out with, check out the job ad.

*You may be wondering “why not Go?” for this system. The short answer is that there’s enough complexity in the business rules that the expressiveness of Ruby is very useful, and being a financial project, moving numbers around in a database is very important and ActiveRecord is more mature than any of the ORMs available in Go right now. I’m happy to elaborate on our line of thinking during your interview ;-)

Debugging Varnish

by Richo Healey

At 99designs we heavily (ab)use Varnish to make our app super fast, but also to do common, simple tasks without having to invoke our heavy-by-contrast PHP stack. As a result, our Varnish config is pretty involved, containing more than 1000 lines of VCL, and a non-trivial amount of embedded C.

When we started seeing regular segfaults, it was a pretty safe assumption that one of us had goofed writing C code. So how do you track down a transient segfault in a system like Varnish? Join us down the rabbit hole…

Get a core dump

The first step is to modify your production environment to provide you with useful core dumps. There are a few steps in this:

  1. First of all, configure the kernel to provide core dumps by setting a few sysctls:

    echo 1 > /proc/sys/kernel/core_uses_pid
    echo 2 > /proc/sys/fs/suid_dumpable
    mkdir /mnt/cores
    chmod 777 /mnt/cores
    echo  /mnt/cores/core > /proc/sys/kernel/core_pattern
    

    In order, this:

    • Tells the kernel to append PIDs to core file names, making it easy to marry up the cores with the logs
    • Tells the kernel that suid binaries are allowed to dump core
    • Creates a place to store cores on AWS’s ephemeral storage (if like us you’re on EC2)
    • Tells the kernel to write core files out there
  2. With this done, and no known way to trigger the bug, play the waiting game.

  3. When varnish explodes, it’s show time. Copy the core file, along with the shared object that varnish emits from compiling the VCL (Located in /var/lib/varnish/$HOSTNAME) over to a development instance and let the debugging begin.

Locate the crash point

If you have access to the excellent LLDB from the LLVM project, use that. In our case, getting it to work on Ubuntu 12.04 would have involved upgrading half the system, resulting in an environment too dissimilar to production, so we stuck with gdb.

If you spend a lot of time in a debugger, you’ll probably want to use a helper like fG!’s gdbinit or voltron to make your life easier. I use voltron, but because of some of the clumsiness in gdb’s API, immediately ran into some bugs.

Finally, debugging environment working, it’s time to dig into the crash. Your situation is going to be different to ours, but here’s how we went about debugging a problem like this recently:

Debugging the core dump with voltron

As you can see in the [code] pane, the faulting instruction is mov 0x0(%rbp),%r14, trying to load the value pointed to by RBP into r14. Looking in the register view we see that RBP is NULL.

Inspecting the source, we see that the faulting routine is inlined, and that the compiler has hijacked RBP (the base pointer for the current stack frame) to use as argument storage for the inline routine.

The offending assembly code

Of specific interest is this portion:

   0x000000000045a7c9 <+265>:   mov    0x223300(%rip),%rbp        # 0x67dad0 <pagesize_mask>
   0x000000000045a7d0 <+272>:   not    %rbp
   0x000000000045a7d3 <+275>:   and    0x10(%r14),%rbp
   0x000000000045a7d7 <+279>:   cmpb   $0x0,0x223303(%rip)        # 0x67dae1 <opt_junk>
=> 0x000000000045a7de <+286>:   mov    0x0(%rbp),%r14
   0x000000000045a7e2 <+290>:   mov    0x28(%r14),%r15

In plain English, this:

  • Loads a rip relative address into rbp (pagesize_mask)
  • Inverts rbp bitwise
  • Performs a bitwise AND against the value 16 bytes into the structure pointed to by r14 (mapelm->bits)
  • Checks whether the opt_junk flag is zero (seemingly pointlessly)
  • Tries to load the address pointed to by rbp into r14, which faults.

Which is emitted by:

static inline void
arena_dalloc_small(arena_t *arena, arena_chunk_t *chunk, void *ptr,
    arena_chunk_map_t *mapelm)
{
    arena_run_t *run;
    arena_bin_t *bin;
    size_t size;

    run = (arena_run_t *)(mapelm->bits & ~pagesize_mask);
    assert(run->magic == ARENA_RUN_MAGIC);
    bin = run->bin; // XXX KABOOM
    size = bin->reg_size;

We now know that the fault is caused by a mapelm struct with a bits member set to zero; but why are we getting passed this broken struct with garbage in it?

Digging in deeper

Since this function is declared inline, it’s actually folded into the calling frame. The only reason it appears in the backtrace at all is because the callsite is present in the DWARF debugging data.

We can poke at the value by inferring its location from the upstream assembly, but it’s easier to jump into the next upstream frame and inspect that:

(gdb) frame 1
#1  arena_dalloc (arena=0x7f28c4000020, ptr=0x7f28c40008c0, chunk=0x7f28c4000000) at jemalloc_linux.c:3939
3939    in jemalloc_linux.c
(gdb) info locals
pageind = <optimized out>
mapelm = 0x7f28c4000020
(gdb) p *mapelm
$3 = {link = {rbn_left = 0x300000001, rbn_right_red = 0x100002fda}, bits = 0}
(gdb)

So this looks like an element in a red-black tree, with two neighbours and a null for the bits member. Let’s double check:

(gdb) ptype *mapelm
type = struct arena_chunk_map_s {
    struct {
        arena_chunk_map_t *rbn_left;
        arena_chunk_map_t *rbn_right_red;
    } link;
    size_t bits;
}
(gdb) ptype arena_run_t
type = struct arena_run_s {
    arena_bin_t *bin;
    unsigned int regs_minelm;
    unsigned int nfree;
    unsigned int regs_mask[1];
}
(gdb)

Wait, wat?

Looking back to get our bearings:

run = (arena_run_t *)(mapelm->bits & ~pagesize_mask);

The code is trying to generate a pointer to this arena run structure, taking the bits member of the mapelm struct and ANDing it against the inverse of pagesize_mask to locate the start of a page. Because bits is zero, this is the start of the zero page: a NULL pointer.

This is enough to see how it’s crashing, but doesn’t give us much insight for why. Let’s go digging.

Looking back at the code snippet, we see an assertion that the arena_run_t structure’s magic member is correct, so with that known we can go looking for other structures in memory. A quick grep turns up:

./lib/libjemalloc/malloc.c:#  define ARENA_RUN_MAGIC 0x384adf93

pagesize_mask is just the page size -1, meaning that any address bitwise AND against the inverse of the pagesize_mask will give you the address at the beginning of that page.

We can therefore just search every writable page in memory for the magic number at the correct offset.

.. Or can we?

typedef struct arena_run_s arena_run_t;
struct arena_run_s {
#ifdef MALLOC_DEBUG
    uint32_t    magic;
#  define ARENA_RUN_MAGIC 0x384adf93
#endif

    /* Bin this run is associated with. */
    arena_bin_t *bin;
...

The magic number and the magic member of the struct (conveniently located as the first 4 bytes of each page) only exist if we’ve got a debug build.

Aside: can we abuse LD_PRELOAD for profit?

At this point all signs point to either a double free in varnish’s thread pool implementation, leading to an empty bucket (bits == 0), or a bug in its memory allocation library jemalloc.

In theory, it should be pretty easy to rule out jemalloc, by swapping in another malloc library implementation. We could do that by putting, say tcmalloc, in front of its symbol resolution path using LD_PRELOAD:

We’ll add:

export LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.0

to /etc/varnish/default and bounce varnish. Then move all the old core files out of the way, wait (and benchmark!)

However, there’s a flaw in our plan. Older versions of varnish (remember that we’re on an LTS distribution of Ubuntu) vendor in a copy of jemalloc and statically link it, meaning that the symbols free and malloc are resolved at compile time, not runtime. This means no easy preload hacks for us.

Rebuilding Varnish

The easy solution won’t work, so let’s do the awkward one: rebuild varnish!

apt-get source varnish

Grab a copy of the varnish source, and link it against tcmalloc. Before that though, I deleted lib/libjemalloc and used grep to remove every reference to jemalloc from the codebase (which was basically just some changes to the configure script and makefiles), and then added -ltcmalloc_minimal to CFLAGS before building. As an aside, the Ubuntu packages for tcmalloc ship /usr/lib/libtcmalloc_minimal.so.0 but not /usr/lib/libtcmalloc_minimal.so, which means the linker can’t find the library; I had to manually create a symlink.

With this new varnish in production, we haven’t yet seen the same crash, so it appears that it was a bug in jemalloc, probably a nasty interaction between libpthread and libjemalloc (The crash was consistently inside thread initialization).

Try it yourself?

Let’s hope not. But if you do a lot of Varnish hacking with custom extensions, occasional C bugs are to be expected. This post walked you through a tricky Varnish bug, giving you an idea of the tools and tricks around debugging similar hairy segfaults.

If you’re messing around with voltron, you might find my voltron config and the tmux script I use to setup my environment a useful starting point.

Swiftly and Machine Learning: Part 1

by Daniel Williams

In this series of guest blog posts, 99designs intern Daniel Williams takes us through how he has applied his knowledge of Machine Learning to the problem of classifying Swiftly tasks.

Introduction

Swiftly is an online service from 99designs that lets customers get small graphic design jobs done quickly and affordably. It’s powered by a global network of professional designers who tackle things like business card updates and photo retouching in 30 minutes or less – an amazing turnaround time for a service with real people in the loop!

Given that we have a pool of designers waiting for customer work, how can we best allocate them tasks? Currently we take a naive but fair approach: assign each new task to the designer that has been waiting in the queue the longest. But there’s room for improvement: designers excel at different types of tasks, so ideally we’d match tasks to designers based on expertise. To do this we need to be able to categorise tasks by the skills they require.

Our approach is to solve the problem with machine learning. The first step is to find a way to automatically categorise a design brief, with the categories forming our “areas of expertise”. The next is figuring out what categories a particular designer is good at. If we can build solid methods for both of these steps, we can begin matching designers to tasks.

In this post, I’ll introduce the problem and walk through some attempts at applying unsupervised techniques for discovering task categories. Follow along, and you may recognise a similar situation of your own that you can apply these methods to.

Swiftly tasks

Swiftly tasks are meant to be quick to fire off and highly flexible. The customer fills in a short text box saying what they want done, uploads an image or two, and then waits for the result. This type of description, plain text and raw images, is highly unstructured. Since image recognition and indexing is its own hard problem, we’ll skip the images for now and focus on the text.

Here’s a couple of examples:

Task A

More Handsome

  1. Remove the man’s glasses.

  2. Make the man’s face MORE HANDSOME.

Task B

In my logo, there is a “virtual” flight path of an airplane. I have had comments that the virtual flight path goes into the middle of the Pacific Ocean for no reason - not a logical graphic. I want you to “straighten” out the flight path - as shown on the Blue lines in the attached PDF titled “Modified_Logo.PDF.” I still want the flight path lines to be in white, with black triangles separating the segments. I just want the segments to be straighter and not go over the ocean as in the original. Please contact me for any clarification. I am uploading the EPS and AI files as well to make the change. Thank you!

How might a human classify these tasks? I would probably classify the first as “image manipulation” and the second as “logo refresh,” although the second could just as easily be “image manipulation” as well. Already you can see that classifying these sorts of tasks into concrete categories is perhaps going to be more art than science.

Figuring out the categories

The first major problem is deciding on a sensible set of categories. This has turned out to be more difficult than I first imagined. Customers use Swiftly for a wide range of tasks. Plus, there’s quite a bit of overlap — one Swiftly task is sometimes a combination of multiple small tasks. My initial approach, just to get a feel for the data, was to eyeball 100 task briefs and attempt to invent categories and classify them manually. The result of this process:

Category                  Number of Tasks
Logo Refresh (Holidays)   34
Logo Refresh              11
Copy Change               11
Vectorise                 13
Resize/Reformat           17
Transparency              1
Image Manipulation        10
Too hard to classify      3

A large number of the instances were hard to classify, even for a human! I was not 100% happy with the categories that I came up with, with many tasks not fitting comfortably into the buckets. I decided to apply some unsupervised machine learning techniques in an attempt to cluster design briefs into logical groups. Can a machine do better?

Unsupervised clustering

I explored software called gensim, an unsupervised natural language processing and topic modelling library for Python. Gensim comes equipped with various powerful topic modelling algorithms, which are capable of extracting a pre-specified number of topics and associating words with those topics. It also helps with converting a corpus of documents into various formats (e.g. the vector space model). The main algorithm that I made use of is called Latent Dirichlet Allocation. The first step is converting the text corpus into a model that allows for the application of mathematical operations.

The vector space model

To apply mathematical-based algorithms to natural language, we need to convert language into a mathematical format. I used a simple model known as the bag-of-words vector space model. This model represents each document as a vector, where each dimension of the vector corresponds to a different word. The value of a word in a particular document is just the number of times it appeared in that particular document. The vector will have n dimensions, where n is the total number of terms in the whole collection of documents. Let’s try an example.

Say we have the following collection of documents:

  1. The monster looked like a very large bird.
  2. The large bird laid very large eggs.
  3. The monster’s name was “eggs.”

After finding all the unique words (“the,” “monster”, etc.) and assigning them an index in the vector, we can count those words in each document to turn each document into a word frequency vector:

  1. (1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0)
  2. (1, 0, 0, 0, 0, 1, 2, 1, 1, 1, 0, 0)
  3. (1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
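
To make the counting concrete, here’s a tiny sketch that builds word-frequency vectors in this way. It is deliberately naive: it doesn’t normalise “monster’s” down to “monster” the way the hand-worked vectors above do, so its output differs very slightly.

docs = [
    "the monster looked like a very large bird".split(),
    "the large bird laid very large eggs".split(),
    "the monster's name was eggs".split(),
]

# assign each unique word an index in order of first appearance
vocab = {}
for doc in docs:
    for word in doc:
        vocab.setdefault(word, len(vocab))

# count each word per document to build the frequency vectors
vectors = []
for doc in docs:
    vec = [0] * len(vocab)
    for word in doc:
        vec[vocab[word]] += 1
    vectors.append(vec)

print(vectors)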

Corpus pre-processing

If you just split your text into words on whitespace and apply this naively, the results can be messy. On the one hand text contains punctuation we want to ignore. On the other, this is going to work best when we have lots of words in common between the documents. Do we really want to treat “Egg”, “egg” and “eggs” as different words? To get the best results, you deal with these kinds of problems in a pre-processing step.

In our pre-processing, we:

  1. Split the document description into individual tokens (i.e. words)
  2. Put tokens into lower case
  3. Remove punctuation from start and end of tokens
  4. Remove stop words (e.g. “and”, “but”, “the”, …)
  5. Perform stemming

Stemming is the process where words are reduced to their “stem” or root format, basically chopping any variation off their end. For example, the words “stemmer,” “stemming” and “stemmed” would all be reduced to just “stem”. I used the nltk implementation of the snowball stemmer to perform this step. All of these steps can be performed very easily in Python:

import nltk

PUNCTUATION = """!@#$%^&*()_+=][{}'-";:/?\.,~`"""

def tidy_text(task_description):
    """Does the following:
    1. Tokenises words
    2. Removes punctuation
    3. Removes stop words
    4. Puts words through the snowball stemmer"""

    stemmer = nltk.stem.snowball.EnglishStemmer()
    stopwords = nltk.corpus.stopwords.words('english')

    outwords = []
    for word in task_description.split():
        word = word.strip(PUNCTUATION).lower()
        if word not in stopwords:
            outwords.append(stemmer.stem(word))

    return outwords

Running our earlier bird examples through this function, we get:

['monster', 'look', 'like', 'larg', 'bird']
['larg', 'bird', 'laid', 'larg', 'egg']
['monster', 'name', 'egg']

This process reduces the noise in the vector space model, because words that mean the same thing are mapped to the same token (through stemming and punctuation and case normalisation) and words that probably do not add any meaning are removed (through stop word removal). I expect the pre-processing will eventually need to be much more in-depth, but for now this should get us started.

Latent Dirichlet Allocation (LDA)

LDA is an algorithm developed to automatically discover topics contained within a text corpus. Gensim uses an “online” implementation of LDA, which means that it breaks the documents into chunks and regularly updates the LDA model (as opposed to batch processing, which processes the whole corpus at once). It is a generative probabilistic model that uses Bayesian probabilities to assign, for each document in the corpus, the probability that it belongs to each topic. Importantly, the number of topics must be supplied in advance. Since I did not know how many topics might exist, I decided to apply LDA with varying numbers of topics. For example, if we ran LDA with 6 topics, the result for a single document might look like this:

[(0, 0.0208), (1, 0.549), (2, 0.0208), (3, 0.366), (4, 0.0208), (5, 0.0208)]

This means LDA places that document 2% in topic 0, 55% in topic 1, 2% in topic 2, and so on. For the simple analysis I am doing, I just want the best guess topic. We can convert the result from probabilistic to deterministic by just picking the topic with the highest probability.

# lda_result is the list of (topic, probability) pairs shown above
best_topic = max(lda_result, key=lambda topic_prob: topic_prob[1])

Much of my approach in the following segments is based on Gensim’s author’s LDA guides.

Pre-processing for LDA

I extracted ~4400 job descriptions from the Swiftly database. I removed formatting of each, and applied the pre-processing steps described above (tokenisation, stemming, stop word removal etc.). The result was a plain text file, with each pre-processed Swiftly job on a new line, like this:

within attach illustr file top left window white background we’d like follow item creat use 2 version complet logo also word there vertic version horizont version 1 creat version taglin get organ 2 logo 2 put 4 logo 2 taglin 2 without transpar background

need make titl look better take text top adjust remain element offic los angel mayor eric garcetti partnership ucla labor center rosenberg foundat california communiti foundat california endow asian americans/pacif island philanthropi cordial invit close recept also add hashtag bottom descriptor dreamsummer13 take rsvp august 19 2013

need logo revamp want logo look great monogram ex chanel gucci lv etc logo consist letter r&b want classi font letter either back back intertwin ex roll royc logo ysl gucci lv etc

tri new font similar attach chang colour solid blue rather way edg fade white/light blue look font use www.tradegecko.com logo style font look

want logo tag line made bigger line logo origin cut close caus distort need logo deliv format includ transpar also imag clear need enhanc

remov man glass make man face handsom

I then used the gensim tools to create the vector model required for LDA. On the recommendation of the gensim authors, I also removed all tokens that only appeared once. The doc2bow function used in the MyCorpus class below converts the document into the vector space format discussed above.

from gensim import corpora

# pre-processed swiftly jobs, one job per line
CORPUS = "StemmedStoppedCorpus.txt"

class MyCorpus(object):
    """Streams the corpus from disk, one bag-of-words vector per document."""
    def __iter__(self):
        for line in open(CORPUS):
            yield dictionary.doc2bow(line.split())

# create dictionary mapping between text and ids
dictionary = corpora.Dictionary(line.split() for line in open(CORPUS))

# find words that only appear once in the entire doc set
once_ids = [tokenid for tokenid, docfreq in dictionary.dfs.iteritems() if docfreq == 1]

# remove once words
dictionary.filter_tokens(once_ids)

# "compactify" - removes gaps in ID mapping created by removing the once words
dictionary.compactify()

# save dictionary to file for future use
dictionary.save("swiftly_corpus.dict")

# create a corpus object
swiftly_corpus = MyCorpus()

# store to disk, for later use
corpora.MmCorpus.serialize("swiftly_corpus.mm", swiftly_corpus)
Regarding the above code, the MM file is a file format known as Matrix Market format, which represents a matrix of sparse vectors. The dictionary file above simply maps the word_id integers that are used in the MM format to the actual word each id represents.

Applying LDA

Now that the corpus has been stored as a matrix of vectors, we can apply the LDA model and start clustering the Swiftly jobs. This is done with the following lines of code. We can generate different models by changing the num_topics argument in the ldamodel.LdaModel() function.

import logging, gensim

# pre-processed swiftly data files
DICTIONARY = "swiftly_corpus.dict"
MM_FILE = "swiftly_corpus.mm"

# the number of topics to create
N_TOPICS = 6

# set up logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

# load mapping dictionary
id2word = gensim.corpora.Dictionary.load(DICTIONARY)

# load market matrix file
mm = gensim.corpora.MmCorpus(MM_FILE)

# create the lda model. Use chunks of 500 documents, update the model once per chunk, and make 3 passes over the corpus.
lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=N_TOPICS, update_every=1, chunksize=500, passes=3)

# save the results
lda.save("swiftly_lda{0}_model.lda".format(N_TOPICS))

LDA Results

We can use gensim’s lda.show_topics() method to get a sense of the different clusters that LDA has picked out.

print "LDA where K = {0}\n".format(N_TOPICS)
count = 0
for i in lda.show_topics(topics=-N_TOPICS, topn=20, log=False, formatted=True):
    print "TOPIC {0}: {1}\n".format(count, i)
    count += 1

Where N_TOPICS = 6, the results are:

LDA where K = 6

TOPIC 0: 0.033*logo + 0.028*holiday + 0.025*busi + 0.023*name + 0.021*card + 0.020*chang + 0.019*follow + 0.013*christma + 0.011*incorpor + 0.009*font + 0.009*compani + 0.008*attach + 0.008*line + 0.008*text + 0.007*need + 0.007*replac + 0.007*like + 0.007*2 + 0.006*1 + 0.006*would

TOPIC 1: 0.032*background + 0.029*like + 0.020*logo + 0.020*imag + 0.018*white + 0.018*make + 0.018*need + 0.018*would + 0.017*look + 0.016*color + 0.013*transpar + 0.013*font + 0.012*text + 0.012*black + 0.012*use + 0.010*chang + 0.010*want + 0.010*word + 0.010*one + 0.009*also

TOPIC 2: 0.049*logo + 0.042*file + 0.032*exist + 0.032*creativ + 0.029*element + 0.028*fun + 0.028*etc + 0.025*take + 0.024*need + 0.020*add + 0.020*vector + 0.020*festiv + 0.017*snowflak + 0.015*tree + 0.013*attach + 0.013*use + 0.011*ai + 0.010*snow + 0.010*ep + 0.009*convert

TOPIC 3: 0.031*logo + 0.023*need + 0.020*file + 0.016*attach + 0.014*like + 0.014*look + 0.014*color + 0.013*make + 0.012*use + 0.010*imag + 0.010*size + 0.008*would + 0.008*design + 0.008*2 + 0.008*want + 0.008*halloween + 0.007*format + 0.007*version + 0.007*creat + 0.007*chang

TOPIC 4: 0.040*x + 0.031*imag + 0.024*cover + 0.020*px + 0.018*photo + 0.018*app + 0.013*pictur + 0.012*need + 0.011*size + 0.010*book + 0.009*icon + 0.009*screen + 0.008*googl + 0.008*73 + 0.008*2 + 0.007*attach + 0.007*suppli + 0.007*psd + 0.006*back + 0.006*chang

TOPIC 5: 0.055*celebr + 0.049*add + 0.029*decor + 0.027*logo + 0.024*take + 0.017*banner + 0.015*facebook + 0.014*bottom + 0.014*pumpkin + 0.013*profil + 0.012*bat + 0.012*spooki + 0.012*skeleton + 0.011*side + 0.010*right + 0.009*text + 0.009*say + 0.008*pictur + 0.008*element + 0.008*etc

The number before each token represents how discriminating that token is for the category. Ideally, by eyeballing the discriminating tokens for a topic we could understand and identify it, giving it a useful name. As you can see, this proved to be difficult. I suspected that there are probably more than six unique categories of tasks on Swiftly, so I ran LDA with N_TOPICS set to different numbers. With 15 topics (this time just the top 10 words per topic, without their weights, listed one topic per line for easier comprehension), the results are:

TOPIC 1: imag, file, pictur, like, line, high, resolut, photoshop, layer, hand
TOPIC 2: need, imag, attach, size, file, word, 2, make, logo, 1
TOPIC 3: element, exist, etc, logo, icon, halloween, app, add, like, theme
TOPIC 4: tree, snow, santa, thanksgiv, leav, gold, outlin, make, fall, turkey
TOPIC 5: yellow, use, view, new, servic, replac, team, super, feel, color
TOPIC 6: creativ, take, add, fun, logo, pumpkin, spooki, bat, skeleton, offer
TOPIC 7: celebr, logo, decor, make, etc, word, possibl, add, text, bit
TOPIC 8: file, background, logo, holiday, need, vector, transpar, white, png, ai
TOPIC 9: like, look, snowflak, logo, would, color, want, make, someth, font
TOPIC 10: chang, color, blue, code, red, font, dark, green, match, panton
TOPIC 11: festiv, pdf, send, file, need, back, page, digit, psd, version
TOPIC 12: logo, card, christma, use, font, attach, like, creat, file, busi
TOPIC 13: need, attach, page, imag, text, websit, px, pictur, use, photo
TOPIC 14: name, follow, busi, logo, chang, incorpor, compani, card, line, replac
TOPIC 15: x, cover, photo, like, would, look, suppli, 73, websit, templat

At this point, I realised that more pre-processing would be required to get this right. For instance, it seemed strange that in topic 15 the most discriminating word is ‘x’. Looking closer, I realised that this is because topic 15 represents a resize / reformatting job brief. The ‘x’ gets picked out because a large number of customers are specifying dimensions (e.g. 200px x 500px). I was also surprised to find out that ‘73’ was so discriminating, but a little bit of digging revealed that a twitter profile picture is 73x73 pixels. To address this problem, I plan to use a preprocessing step called Lemmatisation.

Lemmatisation is useful for grouping things like numbers, colours, URLs, email addresses and image dimensions together so that different values are treated equally. For example, if there is a specific colour mentioned in a brief, we don’t really care what the specific colour is—we just care that the brief mentions a colour. In our case, we believe that a brief containing a colour (e.g. #FF00FF) or image dimensions (e.g. 400x300) might give us clues about what type of task it is so we convert anything that looks like these to the tokens $COLOUR and $DIM.

Despite the shortcomings of my pre-processing, this clustering task has picked out some interesting topics! Some, as is probably inevitable, are “junk topics”. Further, seasonal words seem to appear in lots of topics, which is a strange result. Despite this, many of the topics are classifiable. Topic 5 was interesting, where ‘yellow’ was such a discriminating term. A very quick (and non-scientific) review of the data suggests that people often do not like the colour yellow (I agree with them!) and want it changed. An attempt to name the topics from the table above:

  • Topic 1: Change an image so it’s in higher resolution
  • Topic 3: Change or create a logo or icon, perhaps for a smartphone app
  • Topic 4: Edits of a seasonal nature (Christmas, Thanksgiving)
  • Topic 5: Replace yellow (?!)
  • Topic 6: Halloween edits
  • Topic 8: Vectorisation task, e.g. “take this png file, turn it into a vector on a transparent background”
  • Topic 10: Change a colour in some way, often a font. “Panton” is a stemmed form of “pantone”, a popular colour chart
  • Topic 14: Change copy or update information on a business card
  • Topic 15: Resize or reformat a photo, often for social media purposes

Having to provide the number of topics to LDA, before you even know what’s reasonable, feels like a chicken-and-egg problem. It’s possible to try different numbers of topics and eyeball the results, but at times it felt a bit too much like guesswork. Nevertheless, I view these results as a decent “proof of concept”. It’s reassuring that a computer can find categories like this, and suggests that with more tweaking and a nicely labelled dataset, the job of automatically classifying Swiftly task briefs is entirely possible!

Next time…

That wraps up my experiments with unsupervised classification for this post. Next time, I plan to discuss my efforts after I settle on the Swiftly categories. I’d like to develop a nice labelled training data set (most likely using Amazon’s Mechanical Turk service), and then experiment with supervised machine learning techniques. I will also detail my efforts at developing a more sophisticated pre-processing procedure. Tune in!

About Daniel

Daniel Williams is a Bachelor of Science (Computing and Software Science) student at the University of Melbourne and Research Assistant at the Centre for Neural Engineering where he applies Machine Learning techniques to the search for genetic indicators of Schizophrenia. He also serves as a tutor at the Department of Computing and Information Systems. Daniel was one of four students selected to take part in the inaugural round of Tin Alley Beta summer internships. Daniel is an avid eurogamer, follower of “the cricket”, and hearty enjoyer of the pub.

On-demand Image Thumbnailing With Thumbor

by Stuart Campbell

We recently replaced most of our image resizing code with Thumbor, an open-source thumbnailing server. This post describes how and why we migrated to a standalone thumbnailing architecture, and addresses some of the challenges we faced along the way.

Background

Historically, 99designs has largely been powered by a monolithic PHP application. Maintaining this application has become increasingly difficult as our team and codebase grow. One cause of this difficulty is that the application contains a lot of incidental functionality—supporting code that isn’t the core purpose of the application, but which is necessary for its operation.

As such, we set ourselves a technical goal in 2013 to migrate to a more service-oriented architecture. This means breaking big masses of functionality into discrete services and libraries that do one thing well. Such a design tends to yield smaller, more cohesive services, and provides natural lines along which our team can subdivide.

Image thumbnailing is a generic function required by many graphics-intensive websites, and a prime candidate for extraction into a standalone service.

Thumbnails at 99designs

Our 230,000+ strong designer community uploads a new image to 99designs every ~6 seconds. We serve several thumbnail variations of these images across the site.

Our thumbnailing solution needs to scale to serve our production traffic load. The approach we’ve used until recently has been to generate thumbnails ahead-of-time using asynchronous task queues. Every time a designer uploads an image, we kick off a task that generates thumbnails of that image and stores them in S3:

If a thumbnail request arrives while the task is generating the thumbnail, we serve a placeholder image:

Once the thumbnailing task finishes, we can serve the resized images:

This architecture has served us pretty well. It keeps response times low and scales nicely, but it has a few shortcomings:

  • We’ve intertwined the image resizing logic with our PHP application. Other apps in our stack have to implement their own resizing.

  • It’s not the simplest solution. There’s quite a bit of complexity: deduping resize tasks, using client-side polling to check if a resize operation has completed, etc.

  • We can only serve thumbnails at predefined sizes. If we decided to introduce a new thumbnail size, we’d have to generate that thumbnail for tens of millions of existing images.

A better solution is to create a separate, simpler thumbnailing service that any application in our stack can use.

Thumbor overview

Enter Thumbor. Thumbor is an open-source thumbnail server developed by the clever people behind globo.com. Thumbor resizes images on-demand using specially constructed URLs that contain the URL of the original image and the desired thumbnail dimensions, e.g.:

http://thumbor.example.com/320x240/http://images.example.com/llamas.jpg

In this example, the Thumbor server at thumbor.example.com fetches llamas.jpg from images.example.com over HTTP, resizes it to 320x240 pixels, and streams the thumbnail image data directly to the client.

At face value this seems less scalable than our previous task-based solution, but some careful use of caching ensures we only do the resize work once per thumbnail.

New architecture

The high-level thumbnailing architecture now looks like this:

Our applications generate URLs that point to a Thumbor server (via a CDN). The first request for a particular thumbnail blocks while Thumbor fetches the original image and produces the resized version. We set long cache expiry times on the resulting images, so they’re effectively cached forever. The CDN serves all subsequent thumbnail requests.

We put a cluster of Thumbor servers behind an elastic load balancer to cope with production traffic. This also gives redundancy when one of the servers dies.

The resulting architecture is very simple, and our image-resizing capability is neatly encapsulated as a standalone service. This means we avoid the need to re-implement thumbnailing in each of our applications—all that’s needed is a small client library to produce Thumbor URLs.

Usage example

We created Phumbor to generate Thumbor URLs in PHP applications. Here’s how you might implement a Thumbor view helper:

<?php
function thumbnail($original)
{
    $server = 'http://thumbnails.example.com';
    $secret = 'MY_SECRET_KEY';
    return new \Thumbor\Url\Builder($server, $secret, $original);
}

You might use it in a template like this:

<img src="<?php echo thumbnail('http://images.example.com/llamas.jpg')->resize(320, 240) ?>" />
<img src="<?php echo thumbnail('http://images.example.com/foo.png')->resize(320, 240) ?>" />

This produces the following HTML:

<img src="http://thumbnails.example.com/5yVqQzzWIuobw9rd4UebeF9v78c=/320x240/http://images.example.com/llamas.jpg" />
<img src="http://thumbnails.example.com/X8oXlCzK1ce_UIxiZ0tlv5vF7nY=/320x240/http://images.example.com/foo.png" />

Implementation strategy

We used a couple of complementary techniques to test Thumbor’s capabilities before committing to its use in production.

Firstly, we used feature-flipping to selectively enable Thumbor URLs for certain users. Initially we used this to let developers click around the site and check that Thumbor was generating thumbnails correctly.

Secondly, we used asynchronous tasks to simulate a production traffic load on the Thumbor service. Every time an app server handled a thumbnail request, we enqueued a task that requested that same thumbnail from the new Thumbor service. This allowed us to check performance of the service without risking a disruption to our users.

Finally, we used our feature-flipping system to incrementally roll out Thumbor thumbnails to all our users. This worked better than immediately pointing all traffic at the Thumbor service, which tended to cause a spike in response times.

Thumbor configuration

Some of our Thumbor configuration settings differ from the recommended defaults. We tweaked our configuration in response to our performance measurements.

Thumbor ships with a number of imaging backends; the default and recommended backend is PIL. Our testing showed that the OpenCV backend is much faster (3-4x) than PIL. Unfortunately, OpenCV can’t resize GIFs or images with alpha transparency. As a result, we implemented a simple multiplexing backend that delegates to OpenCV wherever possible and falls back to PIL in the degenerate case.

Our production Thumbor cluster consists of 6x c1.medium EC2 instances behind an ELB, each running 4 Thumbor processes behind nginx. This cluster can comfortably serve all our production traffic.

Generally we've found Thumbor to be quite stable, and we expect it to mature further as more people use it and contribute improvements.

Conclusion

Our Thumbor service now serves all design entry thumbnails for our main PHP application. The resulting architecture is much simpler and the service is usable by other applications in our stack. We’ll continue to use Thumbor in future apps we develop, and look for more opportunities to simplify our codebase by progressively adopting a more service-oriented architecture.

Internationalizing 99designs

by Lars Yencken

Two years ago, 99designs had localized sites for a handful of English speaking countries, and our dev team had little experience in multilingual web development. But we felt that translating our site was an important step, removing yet another barrier for designers and customers all over the world to work together. Today we serve localized content to customers in 18 countries, across six languages. Here’s how we got there, and some of the road blocks we ran into.

Starting local

The most difficult aspect of internationalizing is language, so we started with localization: everything but language. In particular, this means region-appropriate content and currency. A six-month development effort saw us refactor our core PHP codebase to support local domains for a large number of countries (e.g. 99designs.de), where customers could see local content and users could pay and receive payments in local currencies. At the end of this process, each time we launched a regional domain we began redirecting users to that domain from our Varnish layer, based on GeoIP lookups. The process has changed little since then, and continued to serve us well in our recent launch in Singapore.

Languages and translation

With localization working, it was time to make hard decisions about how we would go about removing the language barrier for non-English speakers (i.e. the majority of the world). There were a lot of questions for us to answer.

  • What languages will we offer users in a given region?
  • How will users choose their language?
  • How will we present translated strings to users?
  • How will strings be queued for translation?
  • Who will do the translation?

What languages to offer?

Rather than making region, language and currency all user selectable, we chose to restrict language and currency availability to a user's region. This was a trade-off which made working with local content easier: if our German region doesn't support Spanish, we avoid having to write Spanish marketing copy for it. Our one caveat was that all regions support English as a valid language. Since English is an international language of trade, this lessens any negative impact of region pinning.

Translating strings

There were two main approaches we considered for translation: use a traditional GNU gettext approach and begin marking up strings for translation, or else try a translation proxy such as Smartling. gettext had several advantages: it has a long history and is well supported by web frameworks; it's easily embedded; and translations just become additional artifacts which can be easily version controlled. However, it would require a decent refactoring of our existing PHP codebase, and it left open the question of how to source translations.
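For the curious, the gettext route would have looked roughly like this in PHP (standard gettext extension calls; the locale, text domain and strings below are purely illustrative):

<?php
// Standard PHP gettext setup; locale, domain and strings are illustrative only.
putenv('LC_ALL=de_DE.UTF-8');
setlocale(LC_ALL, 'de_DE.UTF-8');
bindtextdomain('messages', __DIR__ . '/locale');
textdomain('messages');

// Every user-visible string gets wrapped so it can be extracted into .po files,
// and placeholders replace string concatenation.
$username = 'Daniel';
echo _('Start your design contest');
echo sprintf(_('Welcome back, %s'), $username);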

In Smartling's approach, a user's request is proxied through Smartling's servers, which in turn request the English version of our site and apply translations to the response before the user receives it. When a translation is missing, the English version is served and the string is added to a queue to be translated. Pulling this off would substantially reduce the amount of code we needed to change, a great win. However, it risked making us reliant on a third party for our uptime and performance.

In the end, we went with Smartling for several reasons. They provided a source of translators, and expertise in internationalization which we were lacking. Uptime and performance risks were mitigated somewhat by two factors. Firstly, Smartling’s proxy would be served out of the US-East AWS region, the same region our entire stack is served from, increasing the likelihood that their stack and ours would sink or swim together. Secondly, since our English language domains would continue to be served normally, the bulk of our traffic would still bypass the proxy and be under our direct control.

Preparing our site

We set our course and got to work. There was substantially more to do than we first realized, mostly spread over three areas.

Escaping user-generated content

Strings on our site which contained user content quickly filled our translation queue (think “Logo design for Greg” vs “Logo design for Sarah”). Contest titles, descriptions, usernames, comments, you name it, anything sourced from a user had to be found and wrapped in a <span class="sl_notranslate"> tag. This amounted to a significant ongoing audit of the pages on our site, fixing them as we went.
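A tiny helper makes this easy to apply consistently; something along these lines (the function name is made up for this example):

<?php
// Illustrative helper (the name is hypothetical): wrap user-sourced content so
// the translation proxy leaves it untouched.
function no_translate($userContent)
{
    return '<span class="sl_notranslate">'
        . htmlspecialchars($userContent, ENT_QUOTES, 'UTF-8')
        . '</span>';
}

// For example, in a contest page template:
echo '<h1>Logo design for ' . no_translate('Greg & Sons') . '</h1>';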

Preparing JavaScript for translation

Our JavaScript similarly needed to be prepared for translation, with rich client-side pages the worst hit. All strings needed to be hoisted to a part of the JS file which could be marked up for translation. String concatenation was no longer OK, since it made flawed assumptions about the grammar of other languages. Strings served through a JSON API were likewise hidden from translation, meaning we had to find other ways to serve the same data.

Making our design more flexible

In our design and layout, we could no longer be pixel-perfect, since translated strings for common navigation elements were often much longer in the target language. Instead, it forced us to develop a more robust design which could accommodate the variation in string width. We stopped using CSS transforms to vary the case of text stylistically, since other languages are more sensitive to case changes than English.

The wins snowball

After 9 months of hard work, we were proud to launch a German-language version of our site, a huge milestone for us. With the hardest work now done, the following 9 months saw us launch French, Italian, Spanish and Dutch-language sites. Over time, the amount of new engineering work decreased with each launch, so that the non-technical aspects of marketing to, supporting and translating a new region now dominate the time to launch a new language.

The challenges

We also encountered several unexpected challenges.

Client-side templating

We mentioned earlier that the richer the client-side JS, the more work required to ensure smooth translation. The biggest barrier for us was our use of Mustache templates, which were initially untranslatable on the fly. To their credit, Smartling vastly improved their support for Mustache during our development, allowing us to clear this hurdle.

Translating non-web artifacts

It should be no surprise: translation by proxy is a strategy for web pages, but not a strong one for non-web artifacts. In particular, translating emails was a pain for a long time, and in the worst case consisted of engineers and country managers emailing templates back and forth for translation. After some time, we worked around this by using Smartling's API in combination with gettext for email translation.

Exponential growth of translation strings

Over time, we repeatedly found our translation queue clogged with huge numbers of strings awaiting translation. Many of these cases were bugs where we hadn’t appropriately marked up user-generated content, but the most stubborn were due to our long-tail marketing efforts. Having a page for each combination of industry, product category and city led to an explosion of strings to translate. Tackling these properly would require a natural language generation engine with some understanding of each language’s grammar. For now we’ve simply excluded these pages from our translation efforts.

The future

This has been an overview of the engineering work involved in localizing and translating a site like ours to other languages. Ultimately, we feel that the translation proxy approach we took cut down our time to market significantly; we’d recommend it to other companies who are similarly expanding. Now that several sites are up and running, we’ll continue to use a mix of the proxy and gettext approaches, where each is most appropriate.

We’re proud to be able to ship our site in multiple languages, and keen to keep breaking down barriers between businesses and designers wherever they may be, enabling them to work together in the languages in which they’re most comfortable.

Discuss on Hacker News or Reddit.

The Rails Asset Pipeline for Every Framework!

by Daniel Heath

I recently found myself wanting the features of the Rails asset pipeline in my golang project at work. Since there isn't much in the way of asset pipelining for golang yet, I built it. Turns out, Sprockets is really easy to integrate. Here's how you can go about setting it up for your project.

Assets in development

First things first - let's get it to the 'it works on my machine' stage. I've put together a sample repo using the asset pipeline, which you can use as a guide.

The setup for your app will be similar:

  • The assets folder contains your stylesheets, javascript, etc (this directory name is set in sprockets/environment.rb).
  • You’ll need a similar Rakefile to build assets (and maybe launch the server)
  • You might store the sprockets directory somewhere else - update the Rakefile to match.
  • Use a Gemfile and the bundler rubygem to manage dependencies.
  • Edit the rakefile to change the port the asset server runs on.

When your app starts up (in development), it should make a request to http://localhost:11111/assets/manifest.json, which provides a JSON hash linking asset names (e.g. “application.css”) to the relative URLs the compiled assets can be fetched from. To generate a link to an asset in your app, use the JSON hash you fetched to look up the URL. For example, the URL for “application.css” might look like http://localhost:11111/application-8e5bf6909b33895a72899ee43f5a9d53.css.

That should be all you need for development - you should be able to see SASS/Coffeescript assets compiled and loading normally. Hooray!

Assets in production

For production we want to pre-compile assets rather than regenerating them each time they change.

rake assets will create a ‘public’ folder containing ‘manifest.json’ (same format as before). Get this directory onto your production servers. git add -Af public/ will add it to source control if you deploy via git.

When generating a link to an asset, simply look up manifest.json from the filesystem rather than from HTTP.
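The lookup itself is just a JSON parse and a hash read, so it's easy in any language. Here's a rough sketch in PHP (used only because it's the language of the other examples on this blog; the manifest shape, paths and environment check are assumptions based on the description above):

<?php
// Illustrative only: assumes a flat manifest mapping logical names to
// fingerprinted URLs, as described above. Paths and the env check are made up.
function asset_url($name)
{
    static $assets = null;
    if ($assets === null) {
        $json = getenv('APP_ENV') === 'production'
            ? file_get_contents(__DIR__ . '/public/manifest.json')              // pre-compiled manifest
            : file_get_contents('http://localhost:11111/assets/manifest.json'); // dev asset server
        $assets = json_decode($json, true);
    }
    return $assets[$name];
}

echo '<link rel="stylesheet" href="' . asset_url('application.css') . '">';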

Fin

If you’ve followed these steps, you’ll have a fully functioning asset pipeline for your golang project. The whole thing, including deployment, took me well under a day to add to our app. The resulting assets are minified, concatenated, and gzipped (for size). They are also fingerprinted, so you can serve them with an unlimited cache lifetime and reap the benefits.

Although I set this up for golang, there’s nothing go-specific about it. The same technique works just as well for any language or framework without a mature asset pipeline. If you find yourself in need, just use this pattern and you can be up and running in no time.

Bug Tracking With GitHub Survivor

by Stuart Campbell

At 99designs, we try to make sure we’re always fixing bugs as well as writing code. It can be easy to neglect bugs when you’re busy churning out new features.

We use GitHub issues to track bugs in our various applications. GitHub issues integrate well with our codebase, commits and pull requests, but the reporting facilities are a bit limited.

As our team grows, it’s become increasingly important for us to be able to answer key questions about bugs, including:

  • How many bugs are currently open?
  • Have we each remembered to spend time working on bug fixes this sprint?
  • Are we closing more bugs than we’re opening?

To help answer these questions, a few of our team spent a number of hack days implementing a bug dashboard named GitHub Survivor.

Unlike the similarly-named reality TV show, GitHub Survivor doesn’t feature eliminations, gruelling physical challenges, or Jeff Probst. However, it does pit developers against one another — in a light-hearted way.

We display GitHub Survivor on a big screen in the office, where all the team can see it. We’ve found it helps keep our minds on bugs — it reminds us to make a small effort every sprint, gradually bringing the bug count closer to zero.

A bug leaderboard occupies the bulk of the screen. It shows who’s closed the most bugs this sprint (may they be laden with Praise and Whisky!) and who’s forgotten to spend some time fixing bugs (may they toil in the maintenance of a thousand Malbolge programs!).

There are charts showing the number of bugs opened and closed in recent sprints, the open bug count over time, and a big indicator showing the current open bug count.

The source is available for you to inspect and adapt to your needs. Please try it out, make improvements and contribute them back! We hope you find it useful.

We’re passionate about building high-quality software at 99designs, and this is just one way we measure whether we’re doing a good job of that. If you’re similarly interested in building cool things in an awesome environment, check out our open positions!

Join the discussion at Hacker News and Reddit.

Ruby Metaprogramming for the Lulz

by Richo Healey.

What if it were possible to call methods with spaces in their name directly from Ruby?

If you’ve seen Gary Bernhardt’s awesome talk where he digs into some of the quirks of Ruby, you’ll know that it’s pretty trivial to get bare words in Ruby:

1.9.3-p286 :001 > def method_missing(*args)
1.9.3-p286 :002?>   args.join(" ")
1.9.3-p286 :003?>   end
 => nil
1.9.3-p286 :004 > ruby has bare words
SystemStackError: stack level too deep

Wait.. What?

Disappointing, to say the least. Obviously something is amiss. It turns out this is just a quirk of irb; if you try instead:

1.9.3-p286 :001 > def self.method_missing(*args)
1.9.3-p286 :002?>   args.join(" ")
1.9.3-p286 :003?>   end
=> nil
1.9.3-p286 :004 > ruby has bare words
=> "ruby has bare words"
1.9.3-p286 :005 >

Cool, so what else can we do with this? It’s trivial to define a method with a space in its name, and calling it isn’t terribly difficult:

1.9.3-p286 :005 > self.class.send(:define_method, :"i have a space") do
1.9.3-p286 :006 >     puts "I has a space"
1.9.3-p286 :007?>   end
=> #<Proc:0x007ff89c1e0b58@(irb):5 (lambda)>
1.9.3-p286 :008 > send(:"i have a space")
I has a space
=> nil
1.9.3-p286 :009 >

But having created such a monstrosity, how do you call it from the REPL? Or for that matter, from an actual Ruby program? This is obviously something you should be doing in production…

self.instance_exec do
def method_missing(sym, *args)
  # Splat args if passed in from a parent call
  if args.length == 1 && args[0].is_a?(Array) && args[0][0].class == NameError
    args = args[0]
  end

  method_names, arguments = args.partition { |a| a.class == NameError }
  method([sym.to_s, *method_names.map(&:name)].join(" ")).call(*arguments)
rescue NameError => e
  return [e, *arguments]
end
end

Bam. You may be looking at this baffled (or if you’re reasonably tight with metaprogramming in Ruby, sharpening/setting fire to something with a view to causing me significant bodily harm).

Walking through this, we first of all act on whatever self is; in most cases this will be the local scope. If we didn’t do this, we’d be defining the method on Object, which can cause all kinds of headaches when you’re trying to debug.

Immediately after this, we unpack arguments if they look like they were created by an earlier instance of this method. This is unwieldy, but unfortunately Ruby’s single return values and the recursion we’re employing here make it necessary. We could definitely define a subclass of Array to make the test cleaner and the implementation more robust, but I preferred to keep this as short as possible and use the bare minimum number of Ruby primitives.

Once we’ve unpacked our arguments, we do the real magic. First off, we split our arguments into NameErrors, the container we’re using for our missing method names, and everything else (the legitimate arguments we were called with).

We try to find a method with the current name (as we’ll be building our method name right to left with recursive calls to method_missing), and failing that we pack up our current attempt with our arguments, and return it for the next pass.

There are enough issues with this (if you defined the methods foo bar baz and bar baz, a call to foo bar baz would call foo with the return value of bar baz) to make it unwieldy. On the other hand, if those bugs are the only thing stopping you from putting this into production, you've probably got larger issues.

If this large scale abuse of the language excites you, you might be interested to know that we’re hiring.

At this point you're probably eager to know… does it work?

1.9.3-p286 :001 > load "bare_words.rb"
1.9.3-p286 :002 > self.class.send(:define_method, :"i has a space") do |name, greeting|
1.9.3-p286 :003 >     puts "#{greeting}, #{name}!"
1.9.3-p286 :004?>   end
=> #<Proc:0x007fc6b41872c0@(irb):2 (lambda)>
1.9.3-p286 :005 > i has a space "richo", "Hello"
Hello, richo!
=> nil
1.9.3-p286 :006 >

Join the discussion at Hacker News and Reddit.

Talk Like a Pirate Day

by David Lutz

If you had happened to be wandering around the 99designs office today, you would have heard hysterical laughter and cries of "yarrrr".

Something we try to do is foster a good "DevOps" working culture. One of the critical components of DevOps is a collaborative way of working and a close working relationship between those in development and operations (and other parts of the business, for that matter). Company culture is tricky to improve when it's not working, but immediately obvious when it is.

A key component of good company culture is good communication. We make extensive use of IRC as a communication medium. Naturally we use it for technical discussions like "How does this component of code work?", but, just like GitHub and Etsy, we also use IRC within the company as a very effective way of documenting what both dev and ops are doing day to day. We also have a bot that lives in IRC and does useful things for us. Our bot is called agent99. She makes our life easier and can do things like deploying new versions of code to our production website.

She also has a bit of character. She can find memes to punctuate the moment, or fetch funny pictures of cats when asked.

We’ve actually open-sourced agent99 as well as many other pieces of code. We like to contribute code that we think might be useful to others back to the community.

So what was the source of the hysterical laughter? Well, today is Talk Like a Pirate Day. So one of our staff hacked together a plugin to make agent99 send a "yarr" over the office speakers. Hilarity ensued. It was quickly repatched so that "yarr" only works one golden day a year.

This is a small example of how everyone here is empowered to use technology to make the office a more enjoyable workplace. And a work environment that is both fun and technically challenging is a competitive advantage that helps us attract and retain motivated and happy employees!