Written by Gianluca Binelli.
In the past few weeks, Covid-19 has been spreading around the world. This tragedy, which has affected thousands of lives, is forcing people to change their lifestyles and habits to limit the spread of the virus. The main goal is to reduce the impact of the virus and so help national health care systems take care of people in need. Countries around the globe are taking containment measures by introducing social distancing and quarantines.
In the coming months, the virus is going to impact the economy in all aspects, digital marketing included. Business leaders across the globe are trying to assess the impact.
To assess that impact as scientifically as possible, our analysts did what they do best: they used a script.
Building a “Parallel Universe” with the Synthetic Control Method
To assess the impact of changes in consumer behaviour and of the various government measures, we use the Synthetic Control Method.
Imagine building a parallel universe where you can compare the impact Covid-19 has had on the global economy with a universe where Covid-19 never reared its ugly head. The Synthetic Control Method lets us do exactly this.
Leveraging the CausalImpact R Package
CausalImpact is a great package, developed by Google (link to the paper here), which performs an improved version of the so-called Synthetic Control Method. This is an econometric technique that estimates the causal effect of an action or policy introduced at a specific point in time by “Frankensteining” together a “perfect twin” of the time series we are interested in.
When we talk about the “perfect twin” we are referring to the “counterfactual”, or, “what would have happened to our time-series if the action/policy had not occurred”. To create this “perfect twin”, we need to use n similar time series which were not affected by the action/policy.
Example: How Many Sales Were Lost Due to the Lockdown Effect?
Let’s assume we are working with an international marketing campaign active in many European countries (UK, Germany, Italy, France, etc.) and we want to check whether Italy’s lockdown has had an effect on sales.
To perform such an analysis, we take the Italian sales time series (at a daily level) and use the date on which the Italian Government imposed the lockdown (March 9th) as the start of the intervention period.
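To make the idea of a "before" and an "after" concrete, here is a minimal Python sketch (not our R script) that splits a daily date index around the lockdown. The year 2020, the function name `split_periods` and the choice to count the intervention day itself as "after" are assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumed intervention date: the Italian lockdown of 9 March 2020.
INTERVENTION = date(2020, 3, 9)


def split_periods(dates, intervention=INTERVENTION):
    """Return (pre, post) lists of dates; the intervention day goes into post."""
    pre = [d for d in dates if d < intervention]
    post = [d for d in dates if d >= intervention]
    return pre, post


# Hypothetical daily index running from 1 March to 15 March 2020.
days = [date(2020, 3, 1) + timedelta(days=i) for i in range(15)]
pre, post = split_periods(days)
print(len(pre), "pre-intervention days,", len(post), "post-intervention days")
```

The pre-intervention days are the ones used to build the twin; the post-intervention days are the ones on which the effect is measured.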
In order to estimate Italy’s “perfect twin”, the CausalImpact package will use the other European countries’ sales time-series and, by taking a weighted average of those countries, create the counterfactual.
By looking at the difference between the actual sales time series and the counterfactual after the policy was introduced, we can estimate the impact of the measures on sales.
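The logic of the last two paragraphs can be sketched in a few lines of Python. This is a deliberately simplified illustration: the weights are fitted with plain least squares on the pre-lockdown days, whereas CausalImpact fits a Bayesian structural time-series model. All figures and names (`fit_weights`, `uk`, `de`, `italy`) are invented for the example.

```python
def fit_weights(target, controls):
    """Least-squares weights w minimising sum((target - sum_j w[j] * controls[j]) ** 2)."""
    k = len(controls)
    # Normal equations: (X^T X) w = X^T y
    a = [[sum(x * y for x, y in zip(controls[i], controls[j])) for j in range(k)]
         for i in range(k)]
    b = [sum(x * y for x, y in zip(controls[i], target)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            factor = a[row][col] / a[col][col]
            for c in range(col, k):
                a[row][c] -= factor * a[col][c]
            b[row] -= factor * b[col]
    # Back substitution
    w = [0.0] * k
    for row in reversed(range(k)):
        tail = sum(a[row][c] * w[c] for c in range(row + 1, k))
        w[row] = (b[row] - tail) / a[row][row]
    return w


# Invented daily sales: 5 pre-lockdown days followed by 2 post-lockdown days.
uk = [100.0, 110.0, 105.0, 120.0, 115.0, 118.0, 121.0]
de = [80.0, 78.0, 85.0, 90.0, 88.0, 91.0, 89.0]
italy = [70.0, 74.5, 73.75, 82.5, 79.5, 60.0, 55.0]
n_pre = 5

# Fit the weights on the pre-lockdown days only.
weights = fit_weights(italy[:n_pre], [uk[:n_pre], de[:n_pre]])

# Counterfactual: weighted combination of the control countries after the lockdown.
counterfactual = [weights[0] * u + weights[1] * d
                  for u, d in zip(uk[n_pre:], de[n_pre:])]

# Estimated effect: actual sales minus the counterfactual.
effect = [actual - twin for actual, twin in zip(italy[n_pre:], counterfactual)]
print("weights:", weights)
print("effect per day:", effect)
```

The key design point carries over to the real analysis: the weights are learned only from days before the intervention, so the twin has no way of "knowing" about the lockdown.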
This is how it works.
Step 0: Wash Your Hands. No, Seriously, Do It.
Step 1: Your Input Data
Prepare a spreadsheet with your dataset.
You can make a copy of this example input sheet and adjust it to your own data. Bear in mind that the first column must contain the dates and the second column must contain the time series you want to study. The remaining columns hold the similar time series used to build the twin.
Done? Good. Now download it as a CSV.
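Before running anything, it can be worth checking that the CSV has the expected shape. The Python sketch below is only an illustration of that layout: the column names, the ISO date format and the sample figures are assumptions, so adapt them to your own export.

```python
import csv
import io
from datetime import date

# Hypothetical export: dates first, the series under study second, controls after.
SAMPLE_CSV = """date,italy,uk,germany
2020-03-07,70,100,80
2020-03-08,74,110,78
2020-03-09,60,118,91
"""


def load_series(handle):
    """Parse a CSV into (dates, target column name, {column name: values})."""
    reader = csv.reader(handle)
    header = next(reader)
    dates = []
    columns = {name: [] for name in header[1:]}
    for row in reader:
        if not row:
            continue  # skip blank lines
        dates.append(date.fromisoformat(row[0]))
        for name, value in zip(header[1:], row[1:]):
            columns[name].append(float(value))
    # The second column of the sheet is the series we want to study.
    return dates, header[1], columns


dates, target_name, columns = load_series(io.StringIO(SAMPLE_CSV))
controls = [name for name in columns if name != target_name]
print("target:", target_name, "| controls:", controls, "| rows:", len(dates))
```

To read a real file, pass `open("your_file.csv", newline="")` instead of the `io.StringIO` object.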
Step 2: Running the Script
If you do not have it already, download RStudio (link here) and follow the installation instructions.
Then download our R script (link at the bottom of the page) and run it in RStudio.
Step 3: Output of the Script
This is how the script output should look.
The first part of the graph shows the main time-series (bold line) and the predicted twin behaviour (dotted line).
Please note that it is very important that the perfect twin tracks the main line closely in the period before the intervention, i.e. the differences between the two time-series should be as small as possible.
By looking at the first cell of the graph we can see that the dotted line is pretty close to the bold line, hence we can conclude that the twin we created artificially was a good one.
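If you prefer a number to eyeballing the chart, one simple option is the mean absolute percentage error between the actual series and the twin before the intervention. The sketch below is Python with invented figures; the 5% threshold is an arbitrary assumption, not a rule from the CausalImpact package.

```python
def mean_abs_pct_error(actual, predicted):
    """Average of |actual - predicted| / |actual| over all days."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Invented pre-intervention values for the actual series and its twin.
actual_pre = [100.0, 110.0, 105.0, 120.0]
twin_pre = [98.0, 112.0, 105.0, 117.0]

# Hypothetical tolerance: treat the twin as acceptable below 5% average error.
MAX_ERROR = 0.05

mape = mean_abs_pct_error(actual_pre, twin_pre)
good_fit = mape < MAX_ERROR
print(f"pre-period error: {mape:.2%}, good twin: {good_fit}")
```

Note that this measure breaks down if the actual series contains zeros, in which case an absolute error is a safer choice.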
If you see a huge difference, such as in the frame below, you should change the set of time-series you are using to predict the counterfactual.
The second part of the graph shows the precise difference between the actual time-series and the counterfactual.
In the last cell of the graph, you can see the cumulative effect of the intervention on sales.
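The second and last cells of the graph correspond to two simple calculations, sketched here in Python with invented post-intervention figures: the day-by-day difference, and its running total.

```python
from itertools import accumulate

# Invented post-intervention values: actual sales and the twin's prediction.
actual_post = [60.0, 55.0, 58.0]
counterfactual_post = [82.0, 83.0, 81.0]

# Second cell: pointwise difference between actual and counterfactual.
pointwise = [a - c for a, c in zip(actual_post, counterfactual_post)]

# Last cell: running total of the pointwise differences.
cumulative = list(accumulate(pointwise))

# Relative effect: total difference as a share of what the twin predicted.
relative = cumulative[-1] / sum(counterfactual_post)

print("pointwise:", pointwise)
print("cumulative:", cumulative)
print(f"relative effect: {relative:.1%}")
```

A negative cumulative value means sales were lost relative to the twin. The real output also draws an uncertainty band around these lines, which this sketch leaves out.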
The R script also produces a written analysis of the output, which should help you interpret the results.
Where to Get Started
You can find the raw R script here. It is commented, but if you have any questions please reach out in the comments below and we’ll be happy to help.
Stay safe. #andratuttobene,
Booster Box
P.S. Would you like to help Bergamo? Bergamo city and the surrounding province have been drastically affected by the Coronavirus: more than 3,900 people are infected. Doctors and personnel of the Papa Giovanni XXIII Hospital are working tirelessly. We fear the situation will only get worse in the coming days. We can all help them reinforce their intensive care unit by making a donation at https://www.gofundme.com/f/emergenza-covid-cesvi-per-bergamo