Prescriptive modeling is the pinnacle of analytics value. It doesn't focus on what happened, or even on what will happen – it takes analytics further by telling us what we should do to change what will happen. To harness this extra prescriptive power, however, we must take on an additional assumption… a causal assumption. The naive practitioner may not be aware that moving from predictive to prescriptive comes with the baggage of this lurking assumption. I Googled 'prescriptive analytics' and searched the first ten articles for the word 'causal.' Not to my surprise (but to my disappointment), I didn't get a single hit. I loosened the specificity of my word search by trying 'assumption' – this one did surprise me, not a single hit either! It's clear to me that this is an under-taught component of prescriptive modeling. Let's fix that!
When you use prescriptive modeling, you are making causal bets, whether you know it or not. And from what I've seen, this is a terribly under-emphasized point given its importance.
By the end of this article, you'll have a clear understanding of why prescriptive modeling rests on causal assumptions and how to tell whether your model/approach meets them. We'll get there by covering the topics below:
- Brief overview of prescriptive modeling
- Why does prescriptive modeling have a causal assumption?
- How do we know if we have met the causal assumption?
What Is Prescriptive Modeling?
Before we get too far, I want to say that this is not an article on prescriptive analytics – there is plenty of information about that elsewhere. This portion is a quick overview to serve as a refresher for readers who are already at least somewhat familiar with the topic.
There is a widely known hierarchy of three analytics types: (1) descriptive analytics, (2) predictive analytics, and (3) prescriptive analytics.
Descriptive analytics looks at attributes and qualities in the data. It calculates trends, averages, medians, standard deviations, etc. Descriptive analytics doesn't attempt to say anything more about the data than is empirically observable. Typically, descriptive analytics is found in dashboards and reports. The value it provides is in informing the user of the key statistics in the data.
Predictive analytics goes a step beyond descriptive analytics. Instead of summarizing data, predictive analytics finds relationships within the data. It attempts to separate the noise from the signal in those relationships to find underlying, generalizable patterns. From those patterns, it can make predictions on unseen data. It goes further than descriptive analytics because it provides insights about unseen data, rather than just the data that are directly observed.
Prescriptive analytics goes a further step beyond predictive analytics. Prescriptive analytics uses models created by predictive analytics to recommend good or optimal actions. Often, prescriptive analytics will run simulations through predictive models and recommend the strategy with the most desirable outcome.
Let's consider an example to better illustrate the difference between predictive and prescriptive analytics. Imagine you are a data scientist at a company that sells subscriptions to online publications. You have developed a model that predicts the probability that a customer will cancel their subscription in a given month. The model has multiple inputs, including promotions sent to the customer. So far, you've only engaged in predictive modeling. One day, you get the bright idea that you should feed different discounts into your predictive model, observe the impact of the discounts on customer churn, and recommend the discount that best balances the cost of the discount against the benefit of increased customer retention. With your shift in focus from prediction to intervention, you have graduated to prescriptive analytics!
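A minimal sketch of that discount-search idea, under stated assumptions: the churn model, its coefficients, and the subscription price below are all invented stand-ins for a real trained model and real business numbers.

```python
# Hypothetical sketch of the prescriptive step: feed candidate discounts
# through a churn model and recommend the one with the best expected revenue.

def churn_probability(discount: float) -> float:
    """Toy stand-in for a trained churn model (invented coefficients)."""
    return min(max(0.30 - 0.8 * discount, 0.02), 0.98)

def expected_revenue(monthly_fee: float, discount: float) -> float:
    """Fee kept after the discount, weighted by the retention probability."""
    return (1.0 - churn_probability(discount)) * monthly_fee * (1.0 - discount)

def best_discount(monthly_fee: float, candidates=(0.0, 0.05, 0.10, 0.15, 0.20)) -> float:
    """Prescriptive step: simulate each candidate discount, recommend the best."""
    return max(candidates, key=lambda d: expected_revenue(monthly_fee, d))

print(best_discount(20.0))  # the discount balancing cost vs. retention benefit
```

The loop over candidates is the "simulation" part of prescriptive analytics; everything hinges on `churn_probability` capturing the *causal* effect of a discount, which is exactly the assumption this article is about.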
Below are examples of possible analyses for the customer churn model at each level of analytics:
Now that we've been refreshed on the three types of analytics, let's get into the causal assumption that is unique to prescriptive analytics.
The Causal Assumption in Prescriptive Analytics
Moving from predictive to prescriptive analytics feels intuitive and natural. You have a model that predicts an important outcome using features, some of which are in your control. It makes sense to then simulate manipulating those features to drive toward a desired outcome. What doesn't feel intuitive (at least to a junior modeler) is that doing so moves you into dangerous territory if your model hasn't captured the causal relationships between the target variable and the features you intend to change.
We'll first show the dangers with a simple example involving a rubber duck, leaves and a pool. We'll then move on to real-world failures that have come from making causal bets when they weren't warranted.
Leaves, a pool and a rubber duck
You enjoy spending time outside near your pool. As an astute observer of your environment, you notice that your favorite pool toy – a rubber duck – is usually in the same part of the pool as the leaves that fall from a nearby tree.

Eventually, you decide that it's time to clear the leaves out of the pool. There is a specific corner of the pool that is easiest to access, and you want all of the leaves to be in that area so you can more easily collect and discard them. Given the model you have created – the rubber duck is in the same area as the leaves – you decide that it would be very clever to move the toy to the corner and watch in delight as the leaves follow the duck. Then you'll simply scoop them up and proceed with the rest of your day, enjoying your newly cleaned pool.
You make the change and feel like a fool as you stand in the corner of the pool, right over the rubber duck, net in hand, while the leaves stubbornly stay in place. You have made the terrible mistake of using prescriptive analytics when your model doesn't satisfy the causal assumption!

Perplexed, you look into the pool again. You notice a slight disturbance in the water coming from the pool jets. You then rethink your predictive modeling approach, using the angle of the jets to predict the location of the leaves instead of the rubber duck. With this new model, you estimate how you need to configure the jets to push the leaves to your favorite corner. You move the jets and this time you are successful! The leaves drift to the corner, you remove them and go on with your day a wiser data scientist!
This is a quirky example, but it does illustrate a few points well. Let me call them out.
- The rubber duck is a classic 'confounded' variable. It is also pushed around by the pool jets but has no impact on the location of the leaves.
- Both the rubber duck model and the pool jet model made accurate predictions – if we simply wanted to know where the leaves were, the two could be equivalently good.
- What breaks the rubber duck model has nothing to do with the model itself and everything to do with how you used the model. The causal assumption wasn't warranted, but you moved forward anyway!
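These points can be seen in a toy simulation (all numbers invented): the jets are a confounder driving both positions, so the duck *predicts* the leaves almost perfectly, yet intervening on the duck changes nothing.

```python
import random

# Toy pool simulation: the jets drive both the duck and the leaves.
random.seed(0)

def simulate(n=10_000):
    rows = []
    for _ in range(n):
        jets = random.gauss(0.0, 1.0)           # jet angle drives everything
        duck = jets + random.gauss(0.0, 0.1)    # duck drifts with the current
        leaves = jets + random.gauss(0.0, 0.1)  # so do the leaves
        rows.append((jets, duck, leaves))
    return rows

def corr(xs, ys):
    """Pearson correlation, computed by hand to stay dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

data = simulate()
ducks = [d for _, d, _ in data]
leaves = [l for _, _, l in data]
print(f"duck vs. leaves correlation: {corr(ducks, leaves):.2f}")  # very strong

# The 'intervention': grab the duck and set it in the corner (position 5.0).
# The leaves' data-generating process never involved the duck, so they stay put.
leaves_after = [l for _, _, l in ((j, 5.0, l) for j, _, l in data)]
print("leaves moved:", leaves_after != leaves)  # False
```

Prediction only needs association; prescription needs the arrow of causation to point from the feature you manipulate to the outcome.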
I hope you enjoyed the whimsical example – now let's transition to real-world examples.
Shark Tank Pitch
In case you haven't seen it, Shark Tank is a show where entrepreneurs pitch their business idea to wealthy investors (called 'sharks') in the hopes of securing investment money.
I was recently watching a Shark Tank re-run (as one does) – one of the pitches in the episode (Season 10, Episode 15) was for a company called GoalSetter. GoalSetter allows parents to open 'mini' bank accounts in their child's name that family and friends can make deposits into. The idea is that instead of giving toys or gift cards to children as presents, people can make deposits and children can save up for things ('goals') they want to purchase.
I have no qualms with the business idea, but in the presentation, the entrepreneur made this claim:
…kids who have savings accounts in their name are six times more likely to go to college and four times more likely to own stocks by the time they are young adults…
Assuming this statistic is true, the statement, by itself, is all fine and well. We can look at the data and see that there is a relationship between a child having a bank account in their name and going to college and/or investing (descriptive). We could even develop a model that predicts whether a child will go to college or own stocks, using a bank account in their name as a predictor (predictive). But this tells us nothing about causation! The investment pitch carries a subtle prescriptive message – "give your child a GoalSetter account and they will be more likely to go to college and own stocks." While semantically similar to the quote above, these two statements are worlds apart! One is a statement of statistical fact that relies on no assumptions; the other is a prescriptive statement that carries a huge causal assumption! I hope that confounding-variable alarms are ringing in your head right now. It seems much more likely that things like household income, the financial literacy of parents and cultural influences would relate to both the probability of opening a bank account in a child's name and that child going to college. It does not seem likely that giving a random kid a bank account in their name will increase their chances of going to college. This is like moving the duck in the pool and expecting the leaves to follow!
Reading Is Fundamental Program
In the 1960s, there was a government-funded program called 'Reading Is Fundamental (RIF).' Part of this program focused on putting books in the homes of low-income children. The goal was to increase literacy in those households. The strategy was partially based on the observation that homes with more books in them had more literate children. You might know where I'm going with this one based on the Shark Tank example we just discussed. Observing that homes with a lot of books have more literate children is descriptive. There is nothing wrong with that. But when you start making recommendations, you step out of descriptive space and leap into the prescriptive world – and as we've established, that comes with the causal assumption. Putting books in homes assumes that the books cause the literacy! Research by Susan Neuman found that putting books in homes was not sufficient to increase literacy without additional resources¹.
Of course, giving books to children who can't afford them is a good thing – you don't need a causal assumption to do good things 😊. But if you have the specific goal of increasing literacy, you would be well-advised to assess the validity of the causal assumption behind your actions if you want to realize your desired outcomes!
How do we know if we satisfy the causal assumption?
We've established that prescriptive modeling requires a causal assumption (so thoroughly that you're probably exhausted!). But how do we know if our model meets the assumption? When thinking about causality and data, I find it helpful to split my thinking between experimental and observational data. Let's go through how we can feel good (or maybe at least 'okay') about the causal assumption with each of these two types of data.
Experimental Data
If you have access to good experimental data for your prescriptive modeling, you are very lucky! Experimental data is the gold standard for establishing causal relationships. The details of why this is the case are out of scope for this article, but I will say that the randomized assignment of treatments in a well-designed experiment deals with confounders, so you don't have to worry about them ruining your causal assumptions.
We can train predictive models on the output of a good experiment – i.e., good experimental data. In this case, the data-generating process meets the causal identification conditions between the target variable and the variables that were randomly assigned treatments. I want to emphasize that only variables that were randomly assigned in the experiment qualify for a causal claim on the basis of the experiment alone. The causal effect of other variables (called covariates) may or may not be correctly captured. For example, imagine that we ran an experiment that randomly provided multiple plants with various levels of nitrogen, phosphorus and potassium, and we measured the plant growth. From this experimental data, we created the model below:
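The original shows the model as an image, which is not reproduced here; based on the surrounding description, it would be a linear regression along these lines (symbol layout is my reconstruction), with sun exposure included as a non-randomized covariate:

```latex
\text{growth} = \beta_0 + \beta_1\,\text{nitrogen} + \beta_2\,\text{phosphorus} + \beta_3\,\text{potassium} + \beta_4\,\text{sun exposure} + \varepsilon
```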

Because nitrogen, phosphorus and potassium were treatments that were randomly assigned in the experiment, we can conclude that betas 1 through 3 estimate causal effects on plant growth. Sun exposure was not randomly assigned, which prevents us from claiming a causal relationship on the strength of the experimental data. This isn't to say that a causal claim can never be justified for covariates, but such a claim would require additional assumptions that we will cover in the observational data section coming up.
I've used the qualifier good when talking about experimental data multiple times now. What is a good experiment? I'll go over two common issues I've seen that prevent an experiment from producing good data, but there is a lot more that can go wrong. You should read up on experimental design if you want to go deeper.
Execution errors: This is one of the most common issues with experiments. I was once assigned to a project a few years ago where an experiment had been run, but some data were mixed up regarding which subjects got which treatments – the data was not usable! If there were significant execution errors, you may not be able to draw valid causal conclusions from the experimental data.
Underpowered experiments: This can happen for multiple reasons – for example, there may not be enough signal coming from the treatment, or there may have been too few experimental units. Even with perfect execution, an underpowered study may fail to uncover real effects, which can prevent you from establishing the causal relationships required for prescriptive modeling.
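To put "underpowered" in rough numbers, here is a back-of-the-envelope sample-size calculation using the standard normal-approximation formula for a two-group comparison. The inverse-CDF helper is hand-rolled to avoid dependencies; a real design should use dedicated power-analysis software.

```python
from math import ceil, erf, sqrt

def z(p: float) -> float:
    """Inverse standard normal CDF via bisection (keeps this dependency-free)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if 0.5 * (1 + erf(mid / sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate units per arm for a two-sample mean comparison.

    effect_size is Cohen's d (difference in means / standard deviation).
    """
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2)

print(n_per_group(0.5))  # a 'medium' effect: roughly 63 units per arm
print(n_per_group(0.2))  # a small effect needs far more units
```

The takeaway: a weak treatment signal (small effect size) demands many more experimental units, and running with fewer risks missing real effects entirely.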
Observational Data
Satisfying the causal assumption with observational data is much more difficult, risky and controversial than with experimental data. The randomization that is the key ingredient of experimental data is powerful because it removes the problems caused by all confounding variables – known and unknown, observed and unobserved. With observational data, we don't have access to this extremely helpful power.
Theoretically, if we can correctly control for all confounding variables, we can still make causal claims with observational data. While some may disagree with this statement, it is widely accepted in principle. The real challenge lies in the application.
To correctly control for a confounding variable, we need to (1) have high-quality data for the variable and (2) correctly model the relationship between the confounder and our target variable. Doing this for each known confounder is difficult, but it isn't the worst part. The worst part is that you can never know with certainty that you have accounted for all confounders. Even with strong domain knowledge, the possibility that there is an unknown confounder "out there" remains. The best we can do is include every confounder we can think of and then rely on what is called the 'no unmeasured confounders' assumption to estimate causal relationships.
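To make "controlling for a confounder" concrete, here is a toy simulation (all numbers invented, loosely echoing the books-and-literacy example): income drives both the "treatment" and the outcome, so the naive comparison is badly biased, while a within-stratum comparison recovers something close to the true effect.

```python
import random

# Toy observational data: income raises both the chance of the 'treatment'
# (books in the home) and the outcome (a literacy score). Numbers invented.
random.seed(42)
TRUE_EFFECT = 1.0  # the treatment's real causal effect on the score

data = []
for _ in range(50_000):
    income = random.choice([0, 1, 2])               # confounder strata
    treated = random.random() < 0.2 + 0.3 * income  # richer homes: more books
    score = TRUE_EFFECT * treated + 2.0 * income + random.gauss(0, 1)
    data.append((income, treated, score))

def diff_in_means(rows):
    t = [s for _, tr, s in rows if tr]
    c = [s for _, tr, s in rows if not tr]
    return sum(t) / len(t) - sum(c) / len(c)

# Naive estimate: ignores income, so it lands well above the true effect.
naive = diff_in_means(data)

# Adjusted estimate: compare treated vs. untreated *within* each income
# stratum, then average the strata (a simple way to 'control for' income).
strata = [[r for r in data if r[0] == lvl] for lvl in (0, 1, 2)]
adjusted = sum(diff_in_means(s) * len(s) for s in strata) / len(data)

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

Stratification only rescues us here because income was measured; an unmeasured confounder would leave the adjusted estimate biased too, which is exactly why the 'no unmeasured confounders' assumption is unavoidable.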
Modeling with observational data can still add a lot of value in prescriptive analytics, even though we can never know with certainty that we have accounted for all confounding variables. With observational data, I think of the causal assumption as being met in degrees rather than in binary fashion. As we account for more confounders, we capture the causal effect better and better. Even if we miss a few confounders, the model may still add value. As long as the missed confounders don't have too large an impact on the estimated causal relationships, we may be able to add more value making decisions with a slightly biased causal model than with the approach we used before prescriptive modeling (e.g., rules or intuition-based decisions).
Having a pragmatic mindset with observational data can be important because (1) observational data is cheaper and much more common than experimental data and (2) if we insist on airtight causal conclusions (which we can't get with observational data), we may be leaving value on the table by ruling out causal models that are 'good enough,' though not perfect. You and your business partners will need to decide how much leniency to allow in meeting the causal assumption; a model built on observational data may still add major value!
Wrapping it up
While prescriptive analytics is powerful and has the potential to add a lot of value, it relies on causal assumptions, whereas descriptive and predictive analytics do not. It is important to understand and to meet the causal assumption as well as possible.
Experimental data is the gold standard for estimating causal relationships. A model built on good experimental data is in a strong position to meet the causal assumptions required by prescriptive modeling.
Establishing causal relationships with observational data is harder because of the possibility of unknown or unobserved confounding variables. We should balance rigor and pragmatism when using observational data for prescriptive modeling – rigor to consider and attempt to control for every confounder possible, and pragmatism to understand that while the causal effects may not be perfectly captured, the model may still add more value than the current decision-making process.
I hope this article has helped you gain a better understanding of why prescriptive modeling relies on causal assumptions and how to go about meeting those assumptions. Happy modeling!
- Neuman, S. B. (2017). Principled Adversaries: Literacy Research for Political Action. Teachers College Record, 119(6), 1–32.