High-throughput Product Recommendations
Tyler Hutcherson
December 15, 2021

Recently, I wrote about Skafos’ interactive search engine for product recommendations. That post highlights the overall ML approach and focuses on why it works for our unique business and set of product offerings within Shopify.

In this post, I will open the hood a bit further and share some of the tools we use to make our product recommendations fast enough for modern web-based interactivity. As you might expect, it comes down to picking the right tools for the job and an architecture that is flexible and scalable.


Starting from the bottom of the stack, our machine learning models are wrapped in Annoy (Approximate Nearest Neighbors Oh Yeah), a popular open-source library developed at Spotify. Annoy allows for efficient vector similarity search between items, in this case, eCommerce products. It’s fast, reliable, and accurate. The speed boost comes in part from holding the models in memory, and because Annoy indexes are memory-mapped files, several processes can share the same one. Additionally, as the name indicates, the library performs an “approximate” search using a forest of trees, trading a little accuracy for a large gain in speed.
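
To make the “approximate” part concrete, here is a minimal pure-Python sketch of the general idea: split the items with random hyperplanes into small leaves, then compare a query only against the items in its own leaves. This is not Annoy’s implementation or API. The product names, 2-D vectors, leaf size, number of trees, and squared Euclidean distance are all assumptions made for illustration.

```python
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def build_tree(ids, vectors, rng, leaf_size=2):
    """Recursively split item ids with random hyperplanes until leaves are small."""
    if len(ids) <= leaf_size:
        return {"leaf": list(ids)}
    a, b = rng.sample(ids, 2)  # two random items define the splitting hyperplane
    normal = [x - y for x, y in zip(vectors[a], vectors[b])]
    midpoint = [(x + y) / 2 for x, y in zip(vectors[a], vectors[b])]
    offset = dot(normal, midpoint)
    near_a = [i for i in ids if dot(normal, vectors[i]) > offset]
    near_b = [i for i in ids if dot(normal, vectors[i]) <= offset]
    if not near_a or not near_b:  # identical vectors cannot be split further
        return {"leaf": list(ids)}
    return {
        "normal": normal,
        "offset": offset,
        "near_a": build_tree(near_a, vectors, rng, leaf_size),
        "near_b": build_tree(near_b, vectors, rng, leaf_size),
    }

def leaf_for(tree, query):
    """Walk down one tree to the leaf on the query's side of every split."""
    while "leaf" not in tree:
        side = "near_a" if dot(tree["normal"], query) > tree["offset"] else "near_b"
        tree = tree[side]
    return tree["leaf"]

def nearest(forest, vectors, query, k=3):
    """Approximate k-NN: gather leaf candidates from every tree, then rank them exactly."""
    candidates = set()
    for tree in forest:
        candidates.update(leaf_for(tree, query))

    def sq_dist(item_id):
        return sum((x - y) ** 2 for x, y in zip(vectors[item_id], query))

    return sorted(candidates, key=sq_dist)[:k]

# Hypothetical product embeddings (made-up 2-D vectors for illustration only)
vectors = {
    "lamp_a": [0.9, 0.1],
    "lamp_b": [0.8, 0.2],
    "rug_a": [0.1, 0.9],
    "rug_b": [0.2, 0.8],
    "sofa_a": [0.5, 0.5],
}
rng = random.Random(0)
forest = [build_tree(list(vectors), vectors, rng) for _ in range(5)]
top = nearest(forest, vectors, vectors["lamp_a"], k=2)
```

Only the handful of candidates found in the leaves are scored, which is where the corner-cutting happens: a true neighbor that lands on the other side of a split can be missed. Using several trees with different random splits makes that less likely.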

The beauty of using Annoy is that it’s just a vessel for delivering similarity recommendations. At the end of the day, you still need your own “special sauce” to create the model itself. At Skafos, we’ve invested countless hours into developing proprietary algorithms for transforming products into numerical vectors, a process called “embedding” or “encoding”: image feature extraction, text processing, dimensionality reduction, matrix factorization… you name it.
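
For readers new to the idea of embedding, here is a deliberately simple sketch of turning product titles into vectors with a bag-of-words count and comparing them with cosine similarity. It is a toy stand-in and not the proprietary approach described above. The titles and the function names (build_vocab, embed, cosine) are hypothetical.

```python
import math

def build_vocab(titles):
    """Map every distinct lowercase token to a vector position."""
    tokens = sorted({tok for title in titles for tok in title.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}

def embed(title, vocab):
    """Encode a title as token counts over the vocabulary (bag of words)."""
    vec = [0.0] * len(vocab)
    for tok in title.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0

# Hypothetical product titles
titles = ["brass table lamp", "brass floor lamp", "wool area rug"]
vocab = build_vocab(titles)
table_lamp, floor_lamp, rug = (embed(t, vocab) for t in titles)

lamp_vs_lamp = cosine(table_lamp, floor_lamp)  # two of three tokens shared
lamp_vs_rug = cosine(table_lamp, rug)          # no tokens shared
```

The two lamps share two of their three tokens, so their similarity is about 0.67, while the lamp and the rug share none and score 0. Vectors like these are what a similarity index such as Annoy stores and searches.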


The querying piece is quite simple. As discussed in the last post, our search engine is composed of many models. We use multithreading to query the models concurrently, and then reduce their results into a final score vector. Python’s concurrent.futures module lets you perform asynchronous execution with threads using a ThreadPoolExecutor.

Here’s a sample of code that might be useful:

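import concurrent.futures
# Submit search jobs to a thread pool executor (models and product_id are defined elsewhere)
futures = []
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as ex:
    for model in models:
        futures.append(ex.submit(model.search_by_item, product_id))
    # Handle each result as soon as its search finishes
    for future in concurrent.futures.as_completed(futures):
        result = future.result()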
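
The reduce step is not shown above, so here is one way the whole fan-out and reduce could look end to end. It is a sketch under assumptions: ToyModel is a hypothetical stand-in for a real model, each model is assumed to return a dictionary of candidate scores, and summing the scores is just one possible reduction.

```python
import concurrent.futures
from collections import defaultdict

class ToyModel:
    """Hypothetical stand-in for one similarity model."""

    def __init__(self, scores):
        self.scores = scores  # {query_product_id: {candidate_id: similarity}}

    def search_by_item(self, product_id):
        return self.scores.get(product_id, {})

def combined_scores(models, product_id, max_workers=3):
    """Query every model in a thread pool, then sum the per-candidate scores."""
    totals = defaultdict(float)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        futures = [ex.submit(m.search_by_item, product_id) for m in models]
        for future in concurrent.futures.as_completed(futures):
            for candidate, score in future.result().items():
                totals[candidate] += score
    # Highest combined score first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

models = [
    ToyModel({"lamp_a": {"lamp_b": 0.9, "sofa_a": 0.2}}),  # e.g. an image-based model
    ToyModel({"lamp_a": {"lamp_b": 0.7, "rug_a": 0.4}}),   # e.g. a text-based model
]
ranked = combined_scores(models, "lamp_a")
```

Here lamp_b ranks first because both toy models score it highly. Threads suit this job when each search spends its time waiting on I/O or inside native code that releases the GIL. For pure-Python CPU-bound work, a process pool may be a better fit.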


Redis is the backbone of our architecture that makes the machine learning APIs soar. For those unfamiliar, it’s a “remote dictionary server” hosting an in-memory key/value store with some built-in datatypes and operations. We use Redis for the following on the backend:

  • Result caching
  • Product attribute & price filtering with RediSearch
  • Inter-service communication with Pub/Sub
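
As an illustration of the first bullet, result caching usually follows a cache-aside pattern: look up the key, and only run the models on a miss. The sketch below assumes a client that exposes get(key) and set(key, value, ex=seconds), which matches the redis-py client. The key format, the TTL, and the FakeCache class (an in-memory stand-in so the example runs without a Redis server) are hypothetical.

```python
import json

def cached_search(cache, product_id, search_fn, ttl_seconds=300):
    """Return cached recommendations if present; otherwise compute and store them."""
    key = f"recs:{product_id}"  # hypothetical key naming scheme
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = search_fn(product_id)
    cache.set(key, json.dumps(result), ex=ttl_seconds)  # ex = expiry in seconds
    return result

class FakeCache:
    """In-memory stand-in for a Redis client (ignores expiry)."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value, ex=None):
        self.store[key] = value

calls = []

def slow_search(product_id):
    calls.append(product_id)  # track how often the models are actually queried
    return ["lamp_b", "sofa_a"]

cache = FakeCache()
first = cached_search(cache, "lamp_a", slow_search)   # miss: runs the search
second = cached_search(cache, "lamp_a", slow_search)  # hit: served from the cache
```

The second call never touches the search function, which is the point: repeated requests for the same product skip inference entirely until the entry expires.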


I’m not going to try to write anything deep about microservices, as they are a widely debated and discussed topic in the software architecture world. Just do a bit of googling and you will see what I mean…

We build microservices because they let us release fixes and new features with ease and flexibility. They also benefit ML/AI applications by separating core functionality, resources, and business logic in a way that optimizes for inference and API throughput.

We use a tool called Hydra to power our microservice architecture (fun fact: it’s also built on top of Redis).

These are just a few examples of powerful tools that you can use to speed up your machine learning APIs. The right choice depends on the use case and the functional requirements of the app and end user. Hopefully, this gives you some ideas to start down the path.

What kinds of tools do you use to improve machine learning model speed and performance? Tweet me and let me know.