MGID Blog 》How Does a Content Recommendation Engine Work?

As visitors browse the Internet, they leave behind vast amounts of behavioral data, such as the pages they visit, what they click on, and what they share on social media. Through testing, we compared different algorithms and found ways to identify which ads have the highest probability of being clicked.

Think back to a time when friends recommended a movie to you for the weekend. Were their tips based on what they knew you liked, on what they liked themselves combined with the assumption that your tastes are similar, or on the context of your conversation at the time?

Today, almost every online destination, from e-commerce stores to online movie theatres and social platforms, runs recommendation systems that observe user behavior in the background and suggest the items each user is most likely to engage with.

All online giants are striving to get better at providing the most relevant and personalized recommendations to their users. In this article, you'll get a sneak peek at different types of recommender systems, filtering algorithms, and how the MGID content recommendation engine works.

What is a content recommendation engine?

A content recommender system takes observations of a user’s behavior and predicts which other items the same user will respond to. Essentially, it helps create personalized experiences, like a friend who knows you, what you like, what others like, and what options are out there for you.

User clicks, purchases, views, reading behavior, and other actions can be represented as a graph, with users on one side, content or items on the other, and a line connecting a user to each item they purchased, viewed, or clicked. In some systems, these connections vary in strength; for example, the strength may reflect the number of times an item was purchased or a movie rating on a scale from 1 to 10. The task, then, is to predict which of the missing lines should be added to this graph and how strong they would be.
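
To make the graph view concrete, here is a minimal Python sketch. The users, movies, and edge weights are made up for illustration; the code simply stores each observed connection as a weighted (user, item) edge and lists the missing edges whose existence and strength a recommender would have to predict.

```python
# Hypothetical interaction graph: each (user, item) edge carries a strength,
# such as a purchase count or a rating on a 1-10 scale.
interactions = {
    ("alice", "movie_a"): 9,
    ("alice", "movie_b"): 7,
    ("bob", "movie_a"): 8,
    ("bob", "movie_c"): 6,
    ("carol", "movie_b"): 5,
}

users = sorted({user for user, _ in interactions})
items = sorted({item for _, item in interactions})


def missing_edges(interactions, users, items):
    """Return every (user, item) pair with no observed edge.

    These are the unknown lines whose existence and strength a
    recommender has to predict.
    """
    return [(u, i) for u in users for i in items if (u, i) not in interactions]


for user, item in missing_edges(interactions, users, items):
    print(f"predict strength of edge {user} -> {item}")
```

With three users and three movies there are nine possible lines; five are observed, so four remain to be predicted.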

Recommendation algorithms can work from either the user side or the item side. User-based filtering algorithms are fairly straightforward: they find other users with similar interests or behavior patterns, analyze which items those similar users chose, and suggest these items to a new user.
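
User-based filtering can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production method: the ratings are hypothetical, cosine similarity is an assumed choice of similarity measure, and the `k` most similar users vote for the items they chose, with each vote weighted by that user’s similarity.

```python
from math import sqrt

# Hypothetical data: user -> {item: rating}.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 3, "d": 5},
    "carol": {"b": 2, "c": 5, "e": 4},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_user_based(target, ratings, k=2, n=3):
    """Score items the target has not seen, using the k most similar users."""
    others = [(cosine(ratings[target], ratings[o]), o) for o in ratings if o != target]
    neighbours = sorted(others, reverse=True)[:k]
    scores = {}
    for sim, other in neighbours:
        for item, r in ratings[other].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:n]


print(recommend_user_based("alice", ratings))
```

Here Alice gets `d` ahead of `e`: Bob is the closer neighbour and rated `d` higher than Carol rated `e`.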

Item-based algorithms, by contrast, look for related items across the catalog. What counts as ‘related’ has to be decided case by case. Often, it means that item A was chosen (purchased, clicked, watched, etc.) unusually often by users who also chose item B (the related item).
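
One common way to quantify “unusually often” is lift, which compares how often two items are chosen together with how often that would happen by chance. The sketch below assumes lift as the relatedness measure and uses made-up shopping baskets; plain co-occurrence counts, conditional probability, or cosine similarity would be equally valid choices.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set holds the items one user chose.
baskets = [
    {"tent", "sleeping_bag", "stove"},
    {"tent", "sleeping_bag"},
    {"tent", "lantern"},
    {"novel", "bookmark"},
    {"novel", "sleeping_bag"},
]


def lift_scores(baskets):
    """Lift(A, B) = P(A and B) / (P(A) * P(B)).

    Values above 1 mean A and B are chosen together more often than
    they would be if users picked them independently.
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    return {
        (a, b): (c / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), c in pair_counts.items()
    }


def related_items(item, scores, min_lift=1.0):
    """Items whose lift with `item` exceeds min_lift, strongest first."""
    related = [
        (lift, b if a == item else a)
        for (a, b), lift in scores.items()
        if item in (a, b) and lift > min_lift
    ]
    return [other for _, other in sorted(related, reverse=True)]


print(related_items("novel", lift_scores(baskets)))
```

With this data, the novel is related only to the bookmark; the novel and sleeping bag pair has a lift below 1, so it is treated as coincidence rather than a relationship.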

The history of recommendation engines

In 1998, when it was still primarily a bookstore, Amazon launched a very simple item-to-item recommendation engine. This first algorithm was based on collaborative filtering and suggested new items for purchase based on what the user already had in their shopping cart. The feature was very well received by users, and recommender systems have since spread across the web.

By 2003, Amazon and other large e-commerce operators had made this feature more sophisticated: it now provided recommendations based on a user’s past purchases and the items they browsed in the store. Search result pages used a different algorithm that featured items more closely related to the search query. More pages, including browse pages and product detail pages, carried at least some recommended content. At that time, about 30% of all pageviews on Amazon came from the recommender system.

Online players in entertainment, travel, and other niches then adopted recommendation algorithms as well. Netflix relied on recommendations so heavily that in 2006 it announced a machine learning competition for the movie rating prediction problem, the Netflix Prize, offering $1 million for a significant improvement in the accuracy of its movie recommendation system. Submissions were assessed by how much they reduced the root mean squared error (RMSE) of the predicted ratings, with a 10% reduction relative to Netflix’s own Cinematch system set as the target.
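
For reference, RMSE is the square root of the average squared difference between predicted and actual ratings, so large errors are penalized more heavily than small ones. A short Python sketch of the metric and of a 10% target check follows; the ratings are invented, and `baseline` simply stands in for whatever system a challenger is compared against.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def meets_prize_target(candidate_rmse, baseline_rmse, target=0.10):
    """True if the candidate lowers RMSE by at least `target` relative to the baseline."""
    return candidate_rmse <= baseline_rmse * (1 - target)


# Made-up star ratings on a 1-5 scale, for illustration only.
actual = [4, 3, 5, 2, 1]
baseline = [3.5, 3.5, 4.0, 3.0, 2.0]
candidate = [3.9, 3.2, 4.6, 2.4, 1.3]

print(rmse(baseline, actual), rmse(candidate, actual))
print(meets_prize_target(rmse(candidate, actual), rmse(baseline, actual)))
```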

Finally, in the 2010s, digital publishers and news websites also started using content discovery recommendations, suggesting additional articles to visitors based on their onsite behavior or interests. These recommended articles can lead to content from the same site or to other sites, show video content, or preview other web formats.

On-site recommendations can increase user engagement with the publisher’s site and decrease bounce rates, while off-site recommendations are used to promote external content projects, advertise products, and generate leads.

Today, publishers add content discovery recommendations in a variety of ways, from simple plugins to dedicated platforms with diverse functionality. One example is the MGID platform, which was the first to offer content recommendation widgets, today’s most popular format.

Some recommendation engines (mainly plugins) rely on keyword analysis and tags to suggest content similar to what the user is currently consuming. Others analyze user behavior, such as how users engage with different content, their interests, and their sociodemographics, to provide recommendations.
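
A tag-based plugin can be approximated with a set-overlap measure. This sketch assumes each article carries a set of tags and uses Jaccard similarity (shared tags divided by all distinct tags); the article names and tags are hypothetical.

```python
def jaccard(tags_a, tags_b):
    """Jaccard similarity: shared tags divided by all distinct tags."""
    a, b = set(tags_a), set(tags_b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical article tags.
articles = {
    "budget-travel-tips": {"travel", "budget", "europe"},
    "cheap-flights-guide": {"travel", "budget", "flights"},
    "pasta-recipes": {"food", "italy", "recipes"},
}


def similar_articles(current, articles, n=2):
    """Rank other articles by tag overlap with the one being read."""
    others = [(jaccard(articles[current], tags), name)
              for name, tags in articles.items() if name != current]
    # Keep the top n, dropping articles with no tag overlap at all.
    return [name for score, name in sorted(others, reverse=True)[:n] if score > 0]


print(similar_articles("budget-travel-tips", articles))
```

A reader of `budget-travel-tips` is shown `cheap-flights-guide` (two of four distinct tags shared), while the pasta article, with no overlap, is filtered out.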

Collaborative vs content-based filtering

There are two general approaches to building recommendation algorithms: content-based filtering and collaborative filtering.

Content-based filtering labels each item or user with certain characteristics and then compares these features to judge how similar they are. This requires knowing the products or the audience really well. For example, to conclude that two movies are similar and recommend one to a user who expressed interest in the other, the recommender engine has to know the movies’ genre, country of origin, director, release date, etc.
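
The movie example can be sketched as a weighted attribute match. Everything here is an assumption for illustration: the metadata, the attribute weights, and the 0.5 threshold. Real systems typically learn such weights or use richer item representations.

```python
# Hypothetical movie metadata using the attributes named above.
movies = {
    "Movie A": {"genre": "sci-fi", "country": "US", "director": "Director X", "decade": 1990},
    "Movie B": {"genre": "sci-fi", "country": "US", "director": "Director Y", "decade": 1990},
    "Movie C": {"genre": "drama", "country": "FR", "director": "Director Z", "decade": 2010},
}

# Assumed attribute weights; they sum to 1 so similarity ranges from 0 to 1.
WEIGHTS = {"genre": 0.4, "director": 0.3, "country": 0.15, "decade": 0.15}


def attribute_similarity(a, b):
    """Weighted share of attributes on which two items agree."""
    return sum(w for attr, w in WEIGHTS.items() if a[attr] == b[attr])


def recommend_similar(liked, movies, threshold=0.5):
    """Suggest movies whose metadata is close enough to one the user liked."""
    scored = [(attribute_similarity(movies[liked], meta), title)
              for title, meta in movies.items() if title != liked]
    return [title for score, title in sorted(scored, reverse=True) if score >= threshold]


print(recommend_similar("Movie A", movies))
```

Movie A and Movie B share genre, country, and decade for a similarity of 0.7, so Movie B is recommended; Movie C shares nothing with Movie A.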

In contrast to content-based filtering, collaborative systems do not require deep product expertise or extensive categorization, because they look at actual user behavior. The features are extracted directly from historical data on past interactions between users and items. The engine builds a giant user-item matrix and identifies common clusters to make suggestions. Suitable distance metrics can be used to compare users or items within this matrix, and the matrix can also be factorized into latent features.

Collaborative recommender systems, in turn, fall into two types:

memory-based filtering

These systems look for item-to-item or user-to-user similarity. Essentially, they make recommendations on the principle that whoever bought (or viewed, clicked, etc.) product A also bought product B. Memory-based systems can be very precise, but they require similarity computations across many dimensions and are hard to scale.
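
As a minimal sketch of memory-based, item-to-item filtering: the “model” is simply the stored rating history, and each prediction is computed on demand as a similarity-weighted average of the user’s own ratings on other items. The ratings are hypothetical, and plain cosine similarity is an assumed choice; many systems mean-center ratings first (adjusted cosine).

```python
from math import sqrt

# Hypothetical stored history: item -> {user: rating}.
item_ratings = {
    "A": {"u1": 5, "u2": 4, "u3": 1},
    "B": {"u1": 4, "u2": 5, "u3": 2},
    "C": {"u1": 1, "u2": 2, "u3": 5},
}


def item_cosine(i, j):
    """Cosine similarity between two items over the users who rated both."""
    common = set(item_ratings[i]) & set(item_ratings[j])
    if not common:
        return 0.0
    dot = sum(item_ratings[i][u] * item_ratings[j][u] for u in common)
    norm_i = sqrt(sum(item_ratings[i][u] ** 2 for u in common))
    norm_j = sqrt(sum(item_ratings[j][u] ** 2 for u in common))
    return dot / (norm_i * norm_j)


def predict_rating(item, user_history):
    """Similarity-weighted average of the user's ratings on other items
    (None if the user has no history to go on)."""
    num = den = 0.0
    for other, rating in user_history.items():
        sim = item_cosine(item, other)
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


# Predict how a user with this history would rate item B.
print(predict_rating("B", {"A": 5, "C": 1}))
```

Because every prediction walks over the stored history, the cost grows with the number of users and items. Note also that plain cosine on raw ratings still rates C as fairly similar to B (about 0.65), which pulls the prediction down; mean-centering is the usual fix.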

model-based filtering

Here, the algorithm is based on matrix factorization: you choose a certain number of latent features (parameters) and learn weights for them, i.e. build a mathematical model that predicts how strongly a user will respond to an item. You also have to choose the model’s objective function, for example, the likelihood of purchase.
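
Matrix factorization can be sketched with stochastic gradient descent. This sketch assumes explicit ratings and a squared-error objective with L2 regularization; the likelihood-of-purchase objective mentioned in the text would instead use a logistic loss on purchase/no-purchase data. The number of latent features `k`, the learning rate, the regularization strength, and the data are all illustrative choices.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=2000, seed=0):
    """Learn k latent features per user (P) and per item (Q) so that their
    dot product approximates each observed rating.

    Objective: squared error on observed ratings plus L2 regularization,
    minimized with stochastic gradient descent.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def mf_predict(P, Q, u, i):
    """Predicted affinity of user u for item i."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


# Hypothetical (user, item, rating) triples; the other cells are unknown.
observed = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 5), (2, 2, 2)]
P, Q = factorize(observed, n_users=3, n_items=3)
print(round(mf_predict(P, Q, 0, 2), 2))  # score for an unobserved pair
```

Once trained, `mf_predict` can score any unobserved pair, such as user 0 and item 2, which is how the model fills in the missing entries of the matrix.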

To sum up, content-based models can be used when all the relevant features of items and users are known. Collaborative filtering, on the other hand, produces recommendations without deep product expertise and is preferable when hand-labeled features would likely lead to biased conclusions. To make use of available product expertise while avoiding potential biases, hybrid filtering can be used.
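
One simple way to build a hybrid is a weighted blend of the two scores. In this sketch (an assumption, not a prescribed method), the weight on the collaborative score grows with how much interaction data an item has, so brand-new items with no history fall back on content features. The `ramp` of 20 interactions is an arbitrary illustrative value.

```python
def hybrid_score(content_score, collab_score, n_interactions, ramp=20):
    """Blend a content-based score and a collaborative score (both in [0, 1]).

    The weight on the collaborative score rises linearly with the item's
    interaction count and caps at 1 after `ramp` interactions (assumed value),
    so items with no history rely entirely on content features.
    """
    alpha = min(n_interactions / ramp, 1.0)
    return alpha * collab_score + (1 - alpha) * content_score


print(hybrid_score(0.8, 0.2, n_interactions=0))   # new item: content only
print(hybrid_score(0.8, 0.2, n_interactions=10))  # equal blend
print(hybrid_score(0.8, 0.2, n_interactions=50))  # collaborative only
```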

MGID content recommendation engine

MGID’s algorithm picks the native advertisements a user is most likely to be interested in, based on their past behavior and the current context of the page. The objective the engine optimizes is CTR (click-through rate): the system predicts the likelihood of a user clicking on each advertisement and shows the ads with the highest probability.
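
Ranking by predicted CTR can be illustrated with a logistic model that turns a handful of features into a click probability. This is a hedged sketch, not MGID’s model: the feature names, weights, and bias are hypothetical, and a real system would learn them from click logs.

```python
from math import exp

# Hypothetical learned parameters.
WEIGHTS = {"category_match": 1.2, "recent_category_clicks": 0.8, "ad_historical_ctr": 15.0}
BIAS = -4.0

# Hypothetical candidate ads with their features for this user and page.
candidates = {
    "ad_1": {"category_match": 1, "recent_category_clicks": 2, "ad_historical_ctr": 0.02},
    "ad_2": {"category_match": 0, "recent_category_clicks": 0, "ad_historical_ctr": 0.05},
    "ad_3": {"category_match": 1, "recent_category_clicks": 0, "ad_historical_ctr": 0.01},
}


def predicted_ctr(features, weights, bias):
    """Logistic model: probability of a click given a feature vector."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-z))


def top_ads(candidates, k=2):
    """Return the k ads with the highest predicted click probability."""
    return sorted(candidates,
                  key=lambda ad: predicted_ctr(candidates[ad], WEIGHTS, BIAS),
                  reverse=True)[:k]


print(top_ads(candidates))
```

Here `ad_2` has the best historical CTR, but `ad_1` and `ad_3` match the page category, and `ad_1` also matches the user’s recent clicks, so those two are shown.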

The algorithm is based on hybrid item-based filtering: the recommender system blends data from content-based algorithms (using the content categories of the web page and the advertisement, sociodemographics, audience interests, etc.) with behavior-based algorithms (using the user’s past pageviews, clicks, and impressions).

The importance of short-term user interest

When building the recommender system, we aimed to identify the features that help us show the most relevant and click-worthy ads. Through a series of experiments and tests, we determined that short-term user interest, i.e. the user’s most recent actions on the site, such as clicks and pageviews, is the strongest predictor of which advertisements will get clicked.

For example, a user is more likely to click an ad from a particular category if they recently clicked other ads from the same category. By using short-term user interest as one of the main predictors for choosing content recommendations, we increased the average CTR by 3.5% in product campaigns and by 4.5% in content campaigns.
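
The short-term interest signal can be illustrated as a re-ranking step. In this sketch, which is an assumption rather than MGID’s actual formula, each candidate ad has a baseline CTR, and ads from categories the user clicked recently receive a boost proportional to that category’s share of the user’s recent clicks; the `boost` factor and the data are hypothetical.

```python
from collections import Counter


def rerank_by_short_term_interest(ads, recent_click_categories, boost=1.0):
    """Re-rank (ad_id, category, base_ctr) tuples, boosting categories
    that make up a large share of the user's recent clicks."""
    counts = Counter(recent_click_categories)
    n = len(recent_click_categories)

    def score(ad):
        _, category, base_ctr = ad
        share = counts[category] / n if n else 0.0
        return base_ctr * (1 + boost * share)

    return [ad_id for ad_id, _, _ in sorted(ads, key=score, reverse=True)]


# Hypothetical candidates with baseline CTRs.
ads = [("ad_cars", "autos", 0.020), ("ad_shoes", "fashion", 0.025), ("ad_loans", "finance", 0.022)]

print(rerank_by_short_term_interest(ads, ["autos", "autos", "autos", "fashion"]))
print(rerank_by_short_term_interest(ads, []))
```

With three recent clicks on autos and one on fashion, the car ad moves from last to first; with no recent clicks, the ranking falls back to baseline CTR.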

The system updates immediately as new information about user clicks and pageviews arrives. For each ad placement, the MGID recommendation engine takes the webpage context and the user’s recent actions on the site, looks up the most relevant advertisements, filters out duplicate or dismissed ads, and then shows the remaining ads to the visitor.
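
The serving steps can be sketched as a small filtering-and-ranking function. The names are hypothetical: `already_shown` stands for ads already placed on the page, `dismissed` for ads the user closed, and `score` for whatever relevance model ranks the survivors.

```python
def select_ads(candidates, already_shown, dismissed, score, k=3):
    """Drop duplicate, already-shown, or dismissed ads, then return the top k by score."""
    seen = set()
    eligible = []
    for ad in candidates:
        if ad in seen or ad in already_shown or ad in dismissed:
            continue
        seen.add(ad)
        eligible.append(ad)
    return sorted(eligible, key=score, reverse=True)[:k]


# Hypothetical lookup result (with a duplicate) and relevance scores.
candidates = ["a", "b", "a", "c", "d", "e"]
scores = {"a": 0.03, "b": 0.05, "c": 0.06, "d": 0.04, "e": 0.01}

print(select_ads(candidates, already_shown={"c"}, dismissed={"d"}, score=scores.get, k=2))
```

Ad `c` has the highest score but is already on the page, and `d` was dismissed, so `b` and `a` fill the two slots.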

Timing matters: if a user clicked on ads from a particular category even a few days ago, that is weak evidence that ads from the same category will be useful to them today. Therefore, we identify and store only recent user behavior data.
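
Keeping only recent behavior can be sketched as a sliding time window that evicts old events. The one-hour window in the example is a hypothetical value, not MGID’s setting, and the class assumes events arrive in time order.

```python
import time
from collections import deque


class RecentActivity:
    """Keeps only events newer than `window_seconds`; older ones are evicted."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # (timestamp, category), oldest first

    def add(self, category, ts=None):
        ts = time.time() if ts is None else ts
        self.events.append((ts, category))
        self._evict(ts)

    def _evict(self, now):
        # Events must arrive in time order for front-of-queue eviction to be valid.
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()

    def categories(self, now=None):
        now = time.time() if now is None else now
        self._evict(now)
        return [category for _, category in self.events]


activity = RecentActivity(window_seconds=3600)  # hypothetical one-hour window
activity.add("autos", ts=0)
activity.add("fashion", ts=3000)
print(activity.categories(now=4000))
```

In practice, one such window would be kept per user (for example, in a dictionary keyed by user ID), and its contents would be the input for short-term interest features.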

Final thought

All online giants are competing to build better recommendation systems. On the one hand, people’s tastes and behavior can never be perfectly predicted, because they depend on so many variables that are constantly in flux. On the other hand, it is possible to estimate the most probable matches and show the most relevant ads using vast amounts of preference and behavioral data.

In native advertising, the recommendation engine acts as a third party that balances users’ interests with publishers’ settings and advertisers’ targeting. This way, native content recommendations can boost reader engagement and bring more conversions and sales.

Now that we’ve covered the basics of how content recommender systems work and how MGID approaches them, we invite you to contact us to find out how the MGID platform can help your online business.

Oleksii Borysov

Oleksii Borysov is the VP of Product at MGID with over 10 years of experience in AdTech. Oleksii is a recognized expert and leader within the industry who always focuses on innovations that bring results. At MGID, his role centers on innovation leadership, transferring proven solutions, and finding new ways to improve our core product, the MGID global advertising platform. Oleksii built the data science unit from scratch and has been responsible for hiring the product team and guiding them in implementing new features.