Understanding Marketing Attribution Algorithms: Why the PAIN May Not Be Worth the GAIN in the Long Run

In one of my earlier blog posts, I mentioned that there is a big market for Data-Driven Marketing Attribution (DDMA) solutions. Marketers across industries have the tough job of evaluating vendors who come up with different algorithms to solve the attribution problem.
In a way, the buyer is offered a bouquet of algorithms to choose from, with no single one being touted as the right answer to the buyer’s situation. These algorithms have fancy names like Markov Chain, Shapley Value, Hidden Markov Model, etc. They sound big and complex, which they are to some extent, but one can get under the hood, understand how they work, and interpret their output. Each comes with its own set of assumptions, which the vendor may or may not tell you. Let me now briefly explain the popular attribution algorithms, or what-the-vendor-never-tells.

DDMA Demystified

(This is where I tell you more about the Markov Chain Model, the Hidden Markov Model, etc. In case you already know them, or you know that knowing these names is enough, smart call: let’s head to the next section, Comprehensibility of Algorithms Stops Here.)

1. Markov Chain Model: Named after the famous Russian mathematician Andrey Markov, a Markov chain model is defined by:

  • A set of states – a traversal through the states generates a sequence. Each state visited adds to the sequence (except for begin and end).
  • A set of transitions with associated probabilities – the transitions emanating from a given state define a distribution over the possible next states.

Defining States: Customer journeys contain one or more contacts across a variety of channels. The condition of a person (customer) after interacting with a particular marketing channel (say, a Search ad, an Email, a Banner ad or a TV ad) is a state. The prospective customer moves between those states according to transition probabilities. The customer path has three special states: a START state that represents the starting point of a customer journey; a CONVERSION state representing a successful conversion; and an absorbing NULL state for customer journeys that have not ended in a conversion.
The customer interaction data may show a series of interactions leading to the final purchase. The contribution of each channel is measured by how many conversions disappear when that particular channel is removed from the chain (the so-called removal effect).
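To make the removal idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the four journeys are made up, the function names (`transition_probs`, `conversion_prob`, `attribute`) are mine and not any vendor’s API, and the chain is first-order (the next state depends only on the current one). It estimates transition probabilities from the journeys, computes the chance of reaching CONVERSION from START, and then repeats that with each channel removed.

```python
from collections import defaultdict

START, CONV, NULL = "START", "CONVERSION", "NULL"


def transition_probs(journeys):
    """Estimate first-order transition probabilities from observed journeys.

    Each journey is a pair: (list of channels touched, converted flag).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in journeys:
        path = [START] + list(channels) + [CONV if converted else NULL]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
        for a, nexts in counts.items()
    }


def conversion_prob(probs, removed=None, sweeps=200):
    """Probability of eventually reaching CONVERSION from START.

    If `removed` is given, any step into that channel counts as lost (NULL).
    Solved by repeated sweeps over the states until the values settle.
    """
    value = {s: 0.0 for s in probs}
    value[CONV], value[NULL] = 1.0, 0.0
    for _ in range(sweeps):
        for state, nexts in probs.items():
            if state == removed:
                continue
            value[state] = sum(
                p * (0.0 if nxt == removed else value[nxt])
                for nxt, p in nexts.items()
            )
    return value[START]


def attribute(journeys):
    """Split observed conversions across channels by normalised removal effect.

    Assumes at least one journey converted (otherwise the base rate is zero).
    """
    probs = transition_probs(journeys)
    base = conversion_prob(probs)
    channels = [s for s in probs if s != START]
    effects = {c: 1.0 - conversion_prob(probs, removed=c) / base for c in channels}
    total_effect = sum(effects.values())
    conversions = sum(1 for _, converted in journeys if converted)
    return {c: conversions * e / total_effect for c, e in effects.items()}


# Hypothetical journeys, purely for illustration.
journeys = [
    (["Search", "Email"], True),
    (["Search"], False),
    (["Email"], True),
    (["Banner", "Search"], False),
]
credit = attribute(journeys)
print(credit)  # roughly Email 1.2, Search 0.6, Banner 0.2 of the 2 conversions
```

Note what the sketch quietly assumes: a customer who would have seen the removed channel is simply lost, and the removal effects are rescaled so that the credits add up to the observed conversions. Those are exactly the kind of assumptions worth asking a vendor about.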

2. Hidden Markov Model: The Hidden Markov Model (HMM) assumes that a consumer moves in a staged manner from a disengaged state to the state of conversion, following the concept of the conversion funnel, and that ad interactions affect the consumer’s movement through these stages. When the consumer actively interacts with an ad (e.g. by clicking on it), the likelihood of conversion increases considerably. HMM assigns attribution credit to an ad based on the incremental impact it has on the consumer’s probability to convert.
This is one model where the basic assumption is in keeping with our intuition of how a consumer actually reaches the stage of purchase. I agree that the consumer’s path to purchase is no longer linear, to the extent that we can no longer record their progress in a classic sales funnel in the CRM. It spans multiple channels and devices, but we can appreciate that the mental state of a consumer would follow a sequential path of 3-5 stages from Awareness to Final Purchase. At least that’s the way I make my purchases.

(Figure: Consumer’s mental state during the purchasing process)
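The "incremental impact" idea can be sketched with a toy staged funnel. To be clear about what this is not: it is not a full HMM. There are no emissions and no parameter estimation; the stage names, the transition matrices and the helper names (`step`, `conversion_probability`, `incremental_impact`) are all hypothetical numbers and names I picked for illustration. The sketch only shows how an ad’s credit can be read off as the drop in conversion probability when that ad is swapped for "no ad".

```python
# Stages of a toy funnel; the last one is the absorbing "Converted" stage.
STAGES = ["Disengaged", "Engaged", "Converted"]

# Hypothetical stage-transition matrices, one per kind of touch.
# Row = current stage, column = next stage, each row sums to 1.
MATRICES = {
    "none":   [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    "Banner": [[0.8, 0.2, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],
    "Search": [[0.5, 0.4, 0.1], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],
}


def step(dist, matrix):
    """Push a probability distribution over stages through one touch."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def conversion_probability(touches, matrices, start):
    """Probability of sitting in the last (Converted) stage after all touches."""
    dist = list(start)
    for touch in touches:
        dist = step(dist, matrices[touch])
    return dist[-1]


def incremental_impact(touches, matrices, start):
    """Credit per touch: full conversion probability minus the probability
    when that single touch is replaced by the 'none' baseline."""
    full = conversion_probability(touches, matrices, start)
    impact = {}
    for i, touch in enumerate(touches):
        without = touches[:i] + ["none"] + touches[i + 1:]
        impact[(i, touch)] = full - conversion_probability(without, matrices, start)
    return impact


touches = ["Banner", "Search"]
start = [1.0, 0.0, 0.0]  # everybody begins Disengaged
impact = incremental_impact(touches, MATRICES, start)
print(impact)  # roughly 0.06 for the Banner touch and 0.16 for the Search touch
```

In a real HMM the stages are hidden and the transition probabilities have to be estimated from observed behaviour such as impressions and clicks, which is where most of the modelling effort (and most of the assumptions) goes.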

3. Classification Algorithms: Classification algorithms are commonly used Machine Learning algorithms. Some of the popular ones used for DDMA are Naïve Bayes, Logistic Regression, Decision Trees, etc. Basically, all these algorithms look at the consumer’s interactions with various ads and whether s/he converts or not. Conversion is a binary outcome, so the classification problem is to predict it from those interactions, and the fitted model is then used to estimate the individual contribution of each channel.
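As an illustration of the classification route, here is a small logistic regression written in plain Python. It is a sketch under stated assumptions, not how any particular vendor does it: the eight rows are invented, each feature is simply "was the consumer exposed to this channel, yes or no", the small L2 penalty is my addition to keep the weights finite on such a tiny dataset, and reading a coefficient as a channel’s contribution is itself a modelling assumption.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, n_features, lr=0.1, l2=0.01, epochs=2000):
    """Fit P(convert) = sigmoid(bias + w . x) by batch gradient ascent.

    rows: list of (features, label) with 0/1 features and a 0/1 label.
    l2:   small ridge penalty on the weights (an assumption of this sketch).
    """
    bias = 0.0
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad_b = 0.0
        grad_w = [0.0] * n_features
        for x, y in rows:
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = y - p  # gradient of the log-likelihood w.r.t. the logit
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias += lr * grad_b
        weights = [w + lr * (g - l2 * w) for w, g in zip(weights, grad_w)]
    return bias, weights


# Hypothetical data: exposure flags for (Search, Email, Banner), then converted?
channels = ["Search", "Email", "Banner"]
rows = [
    ((1, 1, 0), 1),
    ((1, 0, 0), 0),
    ((0, 1, 0), 1),
    ((0, 0, 1), 0),
    ((1, 1, 1), 1),
    ((0, 1, 1), 0),
    ((1, 0, 1), 0),
    ((1, 0, 0), 1),
]

bias, weights = fit_logistic(rows, len(channels))
coef = dict(zip(channels, weights))
print(coef)  # on this toy data Email gets a larger coefficient than Banner
```

A larger coefficient means exposure to that channel goes with higher odds of conversion in the data. It says nothing about the order of touches, which is one of the things the Markov-style models try to capture.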

4. Shapley Value Method: The Shapley Value comes from Game Theory, the field formalized by von Neumann and Morgenstern in Theory of Games and Economic Behavior in 1944. The method, named in honor of Lloyd Shapley, who introduced it in 1953, is a solution concept in cooperative game theory. To each cooperative game it assigns a unique distribution of the total surplus generated by the coalition of all players. The assumption is that each coalition may attain some payoff; each player is then credited with its average marginal contribution across the coalitions it could join. When the Shapley Value Method is used for DDMA, the players in the game are the marketing touch points (i.e. ad exposures) and the outcome/payoff is the number of conversions.
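For a handful of channels the Shapley value can be computed by brute force. The sketch below assumes we already know how many conversions each combination of channels produces; the numbers in `conversions` are hypothetical and the function name `shapley_values` is mine. For every channel it averages the extra conversions that channel brings when added to each possible coalition of the other channels, using the standard Shapley weights.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition.

    players: list of player names (here, marketing channels).
    value:   function mapping a frozenset of players to that coalition's payoff.
    """
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Hypothetical conversions observed for each combination of channels.
conversions = {
    frozenset(): 0,
    frozenset({"Search"}): 10,
    frozenset({"Email"}): 6,
    frozenset({"Banner"}): 2,
    frozenset({"Search", "Email"}): 20,
    frozenset({"Search", "Banner"}): 13,
    frozenset({"Email", "Banner"}): 8,
    frozenset({"Search", "Email", "Banner"}): 24,
}

credit = shapley_values(["Search", "Email", "Banner"], conversions.__getitem__)
print(credit)  # credits add up to the 24 conversions of the full coalition
```

The catch is the enumeration: with n channels there are 2^n coalitions, and in real data many of them are rarely or never observed, so vendors have to approximate. How they do that is another assumption worth asking about.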

And that’s it. This is where the comprehensibility of algorithms probably stops.
The next generation of machine learning algorithms, based on Neural Networks and Deep Learning, are all Black Boxes, and these Black Boxes are going to take over the WORLD. The arrival of the next generation of algorithms is a great equalizer, where the NERD and the HERD become the same (as far as explaining why an algorithm does what it does goes).

Welcome to the World of Black Box Algorithms: Neural Networks/Deep Learning

Neural networks/Deep Learning are being hyped as the next big thing, and they pretty much already are. They are key components of many AI (Artificial Intelligence) applications, including image recognition, speech recognition, behaviour recognition, natural language understanding, machine translation and, of course, self-driving cars.

They can be a cause for concern because they’re ‘black boxes’ when it comes to elucidating exactly how their results are generated.

So what is Deep Learning?

(Image: Deep Learning) Nah, not this!

Deep Learning networks are loosely modeled on neurons in the human brain.
(Source: Andrew Ng: Deep Learning Course)

Neural Networks are good at supervised learning (learning from labeled data). Given a lot of data, they can perform image, speech and behaviour recognition. Deep Learning is all about large Neural Networks. Andrew Ng (former chief scientist at Baidu, co-founder of Coursera, Stanford CS faculty) explains how and why Deep Learning algorithms will beat every other algorithm. He calls it the Virtuous Cycle of AI.

The Virtuous Cycle of AI (Source: Andrew Ng: Deep Learning Course)

Most algorithms taper off: they don’t get any better with more data. But not Deep Learning algorithms, and this leads to the Virtuous Cycle of AI. This is most likely the reason why Google Search beats every other search engine: not necessarily because of superior algorithms, but because of greater access to data. The more people use Google for search, the better it gets, and hence even more people use Google.

So I expect the next generation of attribution algorithms to be based on Deep Learning. And maybe attribution as a concept will become irrelevant when managerial decisions on marketing spend are directly provided as answers by Deep Learning algorithms. These are scary times because the guy with the data is the guy who wins. The guy who is winning is the guy who has the data, which as of now is only Google and Facebook (possibly Amazon, if Bezos decides this is the next thing for him).


Marketers look forward to data-driven algorithms for assigning the right amount of credit to the advertisements that led to the final consumer purchase, and they would like to moderate/supplement this information with their own gut and intuition. In the days to come, with every customer interaction being digitally recorded ab initio, most probably by Google, we are creating a perfect platform for the data-hungry Deep Learning algorithms, which may simply tell you how to tweak your marketing budget. You will have to take it at face value. The current set of algorithms can be understood and explained, down to how the results were arrived at, if you put in the EFFORT. Believe me, with the arrival of Deep Learning (Neural Networks) we can forget what’s under the hood. If marketers are not able to understand it, they will have to trust Google: if Google says it is true, you will have to believe it.

So is Machine Learning worth learning, or should we just leave it to the Machines? If we cannot decide, we can ask Google.


