Google’s Data-Driven Attribution Model Isn’t Perfect, But It Is Progress

Last-click attribution is the most commonly used attribution model. Why? Because it’s very simple. But it’s also clearly flawed.

A user’s path through the funnel is shaped by multiple touch points, including ad impressions that are seen or heard but never clicked. Assigning all the credit to the last click is like crediting your entire fitness level to your last workout.

I bet you can’t remember the last time you clicked a Geico ad. But if you live in the US, you can easily fill in the blanks in the following sentence: “A 15-minute call could save you 15% or more on ___ ____________.”

The next time you need car insurance, you’ll more likely than not type “Geico” into your browser and click the first link you see: an AdWords ad. Geico will count the resulting insurance quote as a lead, but here’s an important question: Does the ad you clicked deserve full credit for that lead?

Google AdWords, which supports six attribution models, recently changed its default from last click to a complex model Google calls “data-driven attribution.” The name is rather unfortunate. All attribution models, including last click, are driven by data. Google might as well call it an “electricity-powered” attribution model.
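Assuming the other five are the familiar rule-based models (last click, first click, linear, time decay and position-based), their behavior is simple enough to state precisely. The sketch below shows how each would split credit for a single converting path. It is a minimal illustration, not Google’s implementation: the ad names are hypothetical, and the 7-day half-life for time decay and the 40/20/40 split for position-based are assumptions chosen to mirror commonly cited defaults.

```python
from collections import defaultdict
from typing import Dict, List


def _aggregate(path: List[str], weights: List[float]) -> Dict[str, float]:
    """Sum per-touch weights by touch-point name (an ad can appear more than once)."""
    credit: Dict[str, float] = defaultdict(float)
    for touch, weight in zip(path, weights):
        credit[touch] += weight
    return dict(credit)


def last_click(path: List[str]) -> Dict[str, float]:
    # All credit to the final touch point before the conversion.
    return _aggregate(path, [0.0] * (len(path) - 1) + [1.0])


def first_click(path: List[str]) -> Dict[str, float]:
    # All credit to the first touch point.
    return _aggregate(path, [1.0] + [0.0] * (len(path) - 1))


def linear(path: List[str]) -> Dict[str, float]:
    # Equal credit to every touch point.
    return _aggregate(path, [1.0 / len(path)] * len(path))


def time_decay(path: List[str], days_before_conversion: List[float],
               half_life_days: float = 7.0) -> Dict[str, float]:
    # Weight halves for every `half_life_days` between touch and conversion
    # (the 7-day half-life is an assumption), then normalize to sum to 1.
    raw = [2 ** (-d / half_life_days) for d in days_before_conversion]
    total = sum(raw)
    return _aggregate(path, [r / total for r in raw])


def position_based(path: List[str], end_share: float = 0.4) -> Dict[str, float]:
    # `end_share` each to the first and last touch, the remainder split evenly
    # across the middle touches (40/20/40 by default; an assumption).
    n = len(path)
    if n <= 2:
        return linear(path)  # with one or two touches, split evenly
    middle = (1.0 - 2 * end_share) / (n - 2)
    return _aggregate(path, [end_share] + [middle] * (n - 2) + [end_share])


# Hypothetical converting path, oldest touch first.
path = ["generic_search_ad", "display_ad", "brand_search_ad"]
for model in (last_click, first_click, linear, position_based):
    print(model.__name__, model(path))
print("time_decay", time_decay(path, days_before_conversion=[14, 7, 0]))
```

Under last click, the brand search ad gets everything and the generic search and display ads get nothing, which is exactly the Geico problem described above.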

In principle, the idea behind data-driven attribution sounds great. The example given by Google appears to indicate a model that correlates conversions with certain events, such as clicks on particular ads. The credit is then spread across the events that correlate most strongly with conversions.
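Google has not published the model, so the following is only a toy reading of that description: score each ad by the correlation between “the path contains this ad” and “the path converted,” clip negative scores to zero, and split each conversion among the ads on its path in proportion to those scores. The paths and ad names are hypothetical, and this is not Google’s method.

```python
from math import sqrt
from typing import Dict, List, Set, Tuple

# Each path: (set of ads the user clicked, whether the user converted).
Path = Tuple[Set[str], bool]


def pearson(xs: List[int], ys: List[int]) -> float:
    """Pearson correlation (the phi coefficient for 0/1 data); 0.0 if either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


def correlation_weights(paths: List[Path]) -> Dict[str, float]:
    """Correlation of 'path contains this ad' with 'converted', negatives clipped to 0."""
    ads = sorted(set().union(*(touched for touched, _ in paths)))
    converted = [int(c) for _, c in paths]
    return {
        ad: max(pearson([int(ad in touched) for touched, _ in paths], converted), 0.0)
        for ad in ads
    }


def allocate_credit(paths: List[Path]) -> Dict[str, float]:
    """Split each conversion across the ads on its path in proportion to their weights."""
    weights = correlation_weights(paths)
    credit = {ad: 0.0 for ad in weights}
    for touched, converted in paths:
        if not converted or not touched:
            continue
        total = sum(weights[ad] for ad in touched)
        for ad in touched:
            # Fall back to an even split if no ad on this path correlates positively.
            credit[ad] += weights[ad] / total if total > 0 else 1.0 / len(touched)
    return credit


# Hypothetical user paths.
paths: List[Path] = [
    ({"ad_A", "ad_B"}, True),
    ({"ad_A"}, True),
    ({"ad_B"}, False),
    ({"ad_C"}, False),
    ({"ad_A", "ad_C"}, True),
    ({"ad_B", "ad_C"}, False),
]
print(allocate_credit(paths))
```

Because the score is a single correlation per ad across all traffic, this toy has no notion of context, which is the limitation raised next.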

Unfortunately, not much is known about how the model is built or how exactly it works. It’s a black box that might be powered by a regression, a neural net or something else entirely. Who knows?

As someone who works with predictive modeling, my first question is whether Google’s “data-driven attribution” model accounts for context and interactions.

In Google’s example, the ad for “Bike tour New York” might correlate more strongly with conversions than “Bike tour Brooklyn waterfront” across all traffic. For traffic from within the New York area, however, the more specific ad, “Bike tour Brooklyn waterfront,” might perform better. A model that looks only at overall correlations would miss that interaction between ad and location.
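To make the interaction concrete, here is a sketch with hypothetical click and conversion counts (not data from Google’s example). With these numbers the New York ad wins overall (6.0% vs. 4.5% conversion rate), while the Brooklyn ad wins within the New York area (8.0% vs. 6.0%). A model that uses only the aggregate rate would favor the wrong ad for local traffic; one that includes an ad-by-location interaction would not.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# (ad, geo segment, clicks, conversions): hypothetical numbers for illustration only.
ROWS: List[Tuple[str, str, int, int]] = [
    ("Bike tour New York", "NY area", 100, 6),
    ("Bike tour New York", "outside NY", 400, 24),
    ("Bike tour Brooklyn waterfront", "NY area", 100, 8),
    ("Bike tour Brooklyn waterfront", "outside NY", 100, 1),
]


def conversion_rates(rows, by_segment: bool) -> Dict[Hashable, float]:
    """Conversion rate per ad, or per (ad, segment) when by_segment is True."""
    clicks: Dict[Hashable, int] = defaultdict(int)
    convs: Dict[Hashable, int] = defaultdict(int)
    for ad, segment, c, v in rows:
        key = (ad, segment) if by_segment else ad
        clicks[key] += c
        convs[key] += v
    return {key: convs[key] / clicks[key] for key in clicks}


overall = conversion_rates(ROWS, by_segment=False)
segmented = conversion_rates(ROWS, by_segment=True)
for key, rate in {**overall, **segmented}.items():
    print(key, f"{rate:.1%}")
```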

Second, Google does not appear to explain how ad views that do not result in clicks count toward attribution under the new default model, if at all.

Google mentions “holdback experiments” as a way to calibrate the model and arrive at incrementality, which is encouraging. In my view, strictly controlled holdback experiments are the gold standard of attribution and incrementality measurement. Here is how such an experiment works:

1. A certain percentage of the target audience, say 10%, is held back as a control group. Users in the control group are not exposed to the ads.
2. After the campaign is complete, the advertiser shares its list of buyers with the ad provider.
3. Some users in the control group will convert anyway. The difference in conversion rate (and in the monetary value of conversions) between the exposed group and the control group represents the true incrementality of the campaign.
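The holdback steps can be sketched as follows. Users are assigned to the holdback with a salted hash of their ID, so the split is effectively random but repeatable; the lift is then the difference in conversion rate (and revenue per user) between the groups, scaled to the size of the exposed group, with a two-proportion z-test as a rough check that the difference is not noise. The salt, group sizes and conversion counts are hypothetical, and hash-based assignment is one common approach, not necessarily what Google or any particular provider uses.

```python
import hashlib
from dataclasses import dataclass
from math import erfc, sqrt


def in_holdback(user_id: str, holdback_share: float = 0.10,
                salt: str = "example-campaign") -> bool:
    """Deterministically assign roughly holdback_share of users to the control group."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 2**32  # roughly uniform in [0, 1)
    return bucket < holdback_share


@dataclass
class GroupResult:
    users: int        # users assigned to the group
    conversions: int  # buyers matched back to the group
    revenue: float    # revenue from those buyers


def incrementality(exposed: GroupResult, control: GroupResult) -> dict:
    cr_exp = exposed.conversions / exposed.users
    cr_ctl = control.conversions / control.users
    lift = cr_exp - cr_ctl
    # Pooled two-proportion z-test: could the lift plausibly be noise?
    pooled = (exposed.conversions + control.conversions) / (exposed.users + control.users)
    se = sqrt(pooled * (1 - pooled) * (1 / exposed.users + 1 / control.users))
    z = lift / se if se > 0 else 0.0
    rev_lift = exposed.revenue / exposed.users - control.revenue / control.users
    return {
        "exposed_rate": cr_exp,
        "control_rate": cr_ctl,
        "absolute_lift": lift,
        "relative_lift": lift / cr_ctl if cr_ctl > 0 else float("inf"),
        "incremental_conversions": lift * exposed.users,
        "incremental_revenue": rev_lift * exposed.users,
        "p_value": erfc(abs(z) / sqrt(2)),  # two-sided
    }


# Hypothetical campaign results after a 10% holdback.
exposed = GroupResult(users=90_000, conversions=1_350, revenue=135_000.0)
control = GroupResult(users=10_000, conversions=120, revenue=12_000.0)
for name, value in incrementality(exposed, control).items():
    print(f"{name}: {value:.4g}")
```

Only the difference between the groups is attributed to the campaign; the control group’s conversions estimate what would have happened without any ads.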

In practice, this kind of attribution study is challenging to implement. It usually involves resolving the identities of both converted users and exposed users, which raises obvious privacy challenges. Clearly, Google cannot run this type of study for every campaign, but at least such studies appear to be used for calibration.
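The identity-resolution step can be sketched as a join on hashed identifiers. This assumes both sides key users by email address (the addresses here are hypothetical) and agree to normalize and hash it with SHA-256. Hashing is pseudonymization, not anonymization: anyone with a list of candidate emails can hash them and test for matches, which is part of why the privacy challenges are real.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable


def hash_email(email: str) -> str:
    """Normalize (trim, lowercase) and SHA-256 an email address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def match_buyers(buyer_emails: Iterable[str],
                 group_by_hash: Dict[str, str]) -> Counter:
    """Count the advertiser's buyers that fall in each provider-side group."""
    buyer_hashes = {hash_email(e) for e in buyer_emails}
    return Counter(group for h, group in group_by_hash.items() if h in buyer_hashes)


# Provider side: hashed IDs of users in each group (hypothetical users).
provider_groups = {
    hash_email("alice@example.com"): "exposed",
    hash_email("bob@example.com"): "control",
    hash_email("carol@example.com"): "exposed",
}
# Advertiser side: its buyer list, shared after the campaign.
buyers = [" Alice@Example.com ", "bob@example.com", "dave@example.com"]
print(match_buyers(buyers, provider_groups))
```

The matched counts per group are the conversion numbers that feed the exposed-versus-control comparison of the holdback experiment.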

The new default attribution model should tell advertisers which components of their Google campaigns contributed the most conversions. It won’t, however, answer the question of incrementality, or tell them which components of their overall ad spend produced the most conversions.

Still, it is a step in the right direction.

(As published on AdExchanger)

Dmitri Kazanski

Dmitri Kazanski is MGID’s Head of Product for North America. He brings a rich leadership background spanning more than 21 years in ad technology. Dmitri’s experience touches the entire breadth of the LUMAscape, including publishers, programmatic media buyers, data providers, and SSPs and exchanges. Drawing on an advanced engineering degree, a business degree, and Lean Six Sigma and Agile training, Dmitri has led teams that released groundbreaking products considered ahead of their time, including Native (2005), Dynamic Messaging (2006), Header Bidding (2012) and Continuous Score Bidding (2016). He is particularly passionate about applying predictive modeling and machine learning to resolve the inefficiencies of the ad tech ecosystem.