
Designing Home Feeds

August 3, 2017 at 03:22 PM

This piece was originally posted on the Quora Design Blog.


This is a foundational piece I’m going to refer back to as we publish a series of posts on machine learning and design. Machine Learning is extremely important at Quora. It powers our personalized digest emails, the home feed, applying topics to questions, moderation, routing questions to people to answer, spam detection and more.

In our case, the answer to the question of whether designers need to understand Machine Learning is that almost all of our designers have some basic understanding of it, and we’ve also developed a framework for who needs to know how much.

Note that I’m going to use Designer and PM interchangeably because the two functional roles have a lot of overlap at Quora and this should be equally helpful to both. Imagine you are someone working on…

Design Patterns

Your day-to-day is working with type, grid, layout, color, and establishing design patterns to be reused across the site. This is about as far from ML as you can get.

Nevertheless, it’s extremely useful to realize that content in a highly personalized product can greatly vary from user to user and it’s a risk designing for people only like yourself. For example, when I worked at Facebook, my news feed would be full of high quality DSLR photos from other employees, not unlike this launch photo from the 2013 Newsfeed redesign.

But it turned out most real users didn’t have friends with great cameras at the time, so in reality most screens were smaller, and the emphasis the design placed on presenting high-quality images simply wasn’t as important as internally perceived.

Spam/Quality Control Systems

Spam filters are based on classification [1]. Classifiers use data from the past to label whether something belongs to one class (in this case, spam) or another (not spam). Any such system makes mistakes, either marking well-intentioned messages as spam (false positives) or still letting spam through (false negatives). This generalizes to multi-class classification as well.
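To make the two failure modes concrete, here is a deliberately naive sketch of a keyword-based spam classifier. The keywords and messages are made up for illustration; real classifiers learn from labeled training data rather than a hand-picked word list.

```python
# A toy keyword-based spam classifier, showing how its mistakes split
# into false positives (ham flagged as spam) and false negatives
# (spam that slips through). All keywords and messages are hypothetical.

SPAM_KEYWORDS = {"winner", "free", "prize"}

def is_spam(message: str) -> bool:
    words = set(message.lower().split())
    return bool(words & SPAM_KEYWORDS)

# Each pair is (message text, whether it actually is spam).
messages = [
    ("You are a winner claim your free prize", True),     # actual spam, caught
    ("Lunch is free in the office kitchen today", False), # legit, but flagged
    ("Hey, are we still on for tomorrow?", False),        # legit, passes
]

false_positives = sum(1 for text, spam in messages if is_spam(text) and not spam)
false_negatives = sum(1 for text, spam in messages if not is_spam(text) and spam)
```

Here the harmless lunch announcement trips the filter, which is exactly the kind of mistake the UI has to account for.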

Handling false positives and negatives becomes the main user experience concern of such systems, firstly on the client:

Messenger’s Message Requests Interface is a flow for handling messages from people you don’t already know. It has to solve for spam, harassment and celebrity inbox overload.

But if your classification system is not bulletproof, for example if legitimate emails sometimes get marked as spam, then message senders have to take partial responsibility and ask you to “make sure you check your spam folder”.

Your job then becomes handling false positives and negatives when they happen. When things get inadvertently marked as spam, you’re designing for the fear of missing out; when something makes it past the filters, you’re explaining the potential failure and giving the user the feedback loops to deal with it.

Notifications

Notifications are a great lever for increasing engagement and giving people feedback for their actions but these systems are complex to design.

As companies grow larger, they often develop a system where different teams can create different emails or notifications to be sent to users, and a classifier, based on each user’s previous engagement and current bandwidth, determines whether or not to send each one. LinkedIn’s email optimization framework is one example.

But these systems are often blind to special cases where false positives or negatives are particularly expensive:

  1. If someone I follow comments on an answer of mine, that is really high signal and I want to know it happened, even if I don’t have the time to respond to many other comments.
  2. I might care much more about feedback on answers I wrote recently than on ones I wrote a long time ago.

Knowing when a user expects a deterministic relationship and the system cannot afford a false negative is extremely important to preventing a fear of missing out and maintaining trust.

Often ML systems are augmented with a ruleset to preserve wanted behavior. In fact, there are many ways to encode “rules” into practical ML systems, which I talk about in this answer. Moreover, some products go so far as to avoid Machine Learning altogether for these problems and have incredibly complex rule sets so they can preserve the desired level of trust. For example, Slack uses this flowchart to determine whether to send you a push notification:

Slack’s rules for determining whether to push a notification
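A flowchart like this is ultimately just deterministic branching, which is why teams reach for it when trust matters more than cleverness. A minimal sketch of the idea, with rules invented for illustration (not Slack’s actual logic):

```python
# A hedged sketch of a rule-based notification decision, loosely in the
# spirit of Slack's flowchart. The specific rules are hypothetical.

def should_push(channel_muted: bool, dnd_enabled: bool,
                mentions_user: bool, device_active: bool) -> bool:
    if channel_muted or dnd_enabled:
        return False           # the user explicitly opted out
    if device_active:
        return False           # they'll see it in the app anyway
    return mentions_user       # only push things addressed to them
```

Because every branch is explicit, the product team can guarantee a mention never gets silently dropped, a promise that is much harder to make about a learned classifier.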

Ranked Feeds

Unlike Twitter, the home feeds for Facebook, Quora and Netflix are not sorted by time but instead are ranked for relevance. This brings us to our first regression [2] problem, wherein the ML algorithm is trying to predict the probability that you’ll like a story. This is a number between 0 and 1, learned from many previously observed 0 and 1 outcomes.

Having to sort a list of possible results (note the resemblance to search engines, which we’ll talk about soon) saves you some of the pain that comes with false positives and negatives in classification, but there are several other concerns that go into making a good home feed experience, including freshness, diversity, and users’ expectation of seeing the things that matter most first.

So the people driving the user experience often have an important role to play in defining the objective function [3], which you can think of as what the system is trying to maximize. For example, if Facebook tries to maximize the chances you’ll like a story, it’ll show you very interesting but possibly outdated content, or many stories from the same person, and these are user experience considerations that need to be factored in.
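One way to picture how those considerations enter the objective is as extra terms in the score the feed sorts by. The sketch below is a toy illustration: the weights, the one-day decay constant, and the per-author penalty are all invented, not any product’s real values.

```python
# A toy feed-ranking score that folds UX concerns into the objective:
# p_like alone would favor interesting-but-stale stories, so we add a
# freshness bonus and subtract a penalty for repeating the same author.
# All constants here are hypothetical.
import math

def feed_score(p_like: float, age_hours: float, stories_from_author: int,
               freshness_weight: float = 0.3,
               diversity_penalty: float = 0.1) -> float:
    freshness = math.exp(-age_hours / 24)           # decays over about a day
    repetition = diversity_penalty * stories_from_author
    return p_like + freshness_weight * freshness - repetition
```

With a score like this, a slightly less interesting story from an hour ago can outrank a fascinating one from last week, which is usually what users expect from a “home” feed.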

I’ve written a little bit about designing home feeds on the blog before but stay tuned, there’s a lot more on the way.

Also, it’s really important that you collect the right signal from users about whether they truly liked the content or not, which brings me to…

Content Actions

Feedback is oh-so-important to these ranking systems. Any regression model can only optimize something it can measure, which means you need clear signals from the product that a user wants more of something or less of something.

These signals could be implicit, such as the amount of time you spent watching a video on YouTube or explicit, like downvoting a question or answer on Quora. I talk about them in quite some detail in another post but the main takeaways are that these signals must be:

  1. Predictive: Actually indicative of a good user experience
  2. Unambiguous: Doesn’t muddle intents, e.g. Twitter’s original “star” content action was designed for bookmarking but used in practice for liking
  3. Dense: Used often enough by users that you can learn robustly from the data. This is why implicit signals like watch time tend to do extremely well.

Explaining Magic Numbers

Oftentimes, a regression model is used to price something, but discerning users want to understand why it’s priced the way it is. For example, Airbnb ran into the problem of trying to explain predicted pricing to hosts. Thankfully, their model was (or was close enough to) a linear regression, where you could express the predicted price as a weighted sum of uniqueness, demand and location, which they could translate into easy-to-understand visualizations.

Smart Pricing regression model next to a visualization explaining the model is made of three parts that vary per host, from Amber Cartwright’s Invisible Design.
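The reason a linear model is so explainable is that its prediction decomposes exactly into per-factor contributions that sum to the total. The weights and feature values below are hypothetical, but the decomposition itself is how any linear model works:

```python
# Why linear models are easy to explain: the predicted price splits into
# per-factor contributions that add up to the total, so each factor can
# be shown to the host directly. All numbers here are made up.

weights = {"uniqueness": 12.0, "demand": 0.8, "location": 25.0}
listing = {"uniqueness": 3.0, "demand": 40.0, "location": 2.5}

contributions = {name: weights[name] * listing[name] for name in weights}
predicted_price = sum(contributions.values())
```

Each entry of `contributions` is a bar you could draw in a visualization, and the bars provably add up to the quoted price; no such exact decomposition exists for a deep or heavily nonlinear model.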

Model interpretability is a big area of ongoing work in Machine Learning, and one tradeoff we’re seeing is that the more complex and accurate the model, the less interpretable it usually is.

Search

Which brings me now to Google, which was probably the first mainstream ranking product with complex UI considerations. Note that I don’t say Machine Learning yet because it wasn’t always ML powered, but now it is.

In producing a search result page, a search engine has to go through several of the considerations we’ve outlined above — filtering out adult/offensive content, ranking a list under a complex objective function, explaining numbers in the metadata and collecting feedback for whether a search result satisfied the query (I’ll let you think about how they do this).

But search has a unique UI consideration of its own, which is discerning user intent. As you can see, when I search for the movie, I could be searching for an overview (right hand column), showtimes, news, trailers, reviews or maybe something else entirely.

It’s important for a designer to understand what different combinations of intents a query can have and to design a UI system that can flexibly adapt to the many different possibilities.

Positioning Related Content

ML is also used in surfacing related content. It is typically formulated as a regression problem maximizing the probability of clicking through on a related link.

But in the YouTube example above, “Everything Wrong with Underworld: Rise of the Lycans” has nothing in common with the history of Japan. It just happens to be something YouTube knows I’m likely to click because I watched another video from the same channel. YouTube plays a tricky balancing act here in making sure some of the related content is actually related as well as positioning the feature more defensively as “Up Next” instead of its original name “Related Videos.”

Voice & Conversational

Chatbots are all the rage (or are they already so 2016?), but both natural language understanding and speech synthesis are in their infancy. So it’s important for a designer to work well with the team and understand the limitations of the system.

For example, having chatbots remember context is very hard, and speech synthesis is only just reaching human levels of quality, with breakthrough improvements arriving all the time.

Natural language is so much at the frontier right now that product opportunities unavailable last week might open up today.

Copywriting

Last but certainly not least, countless user studies on personalized products across the industry reveal that many people don’t even realize that what they’re seeing is personalized for them, so even the little details matter, for example titling the feed “Top Stories for You” instead of “Top Stories,” or using the person’s name in the appropriate places in the UI.


So that’s a non-exhaustive but fairly comprehensive list of reasons it makes sense for product people to get familiar with ML. Up next, I’ll write a little about where to pick up a functional knowledge of Machine Learning, and follow up with some interesting case studies and experiences we’ve had over the years.

Glossary

Hopefully these are enough to make intuitive sense. I’ll fill them in more rigorously in another post.

  1. Classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.
  2. Regression in this context is predicting a real value outcome based on previous observations, e.g. predicting the price of a new house on the market based on other similar houses that have been priced as a function of their features (square footage, number of rooms, etc.)
  3. Objective Function: When training a machine learning model based on previous observations (this is called training data), we do this by giving it either “mistakes” to minimize or something to maximize. For example, if we’re predicting how likely it is that a user will like a story in Facebook’s news feed, then the probability of liking that story P(like) is the objective function we want to maximize over the training data.
  4. Features: These are attributes of the data that are useful in predicting the objective function and can be observed in fresh inputs that we need to classify or predict a real output for. For example, the number of times “Nigerian prince” appears in an email is a feature used to predict whether an email is spam. Similarly, the square footage of a house is a feature used in predicting its price.