Modeling Marketing Mix using PyMC3


Data

I use the demo dataset provided by Robyn and follow the same data preparation steps so that both approaches share the same baseline for comparison. To obtain the Robyn reference solution, I ran the demo.R file that ships with the Robyn R package without changing any settings.

The dataset consists of 208 weeks of revenue and includes:

  • 5 media spend channels: tv_S, ooh_S, print_S, facebook_S, search_S
  • 2 media channels that also have exposure information (impressions, clicks): facebook_I, search_clicks_P
  • Organic media without spend: newsletter
  • Control variables: events, holidays, competitor sales (competitor_sales_B)

The modeling window is 92 weeks, from 2016-11-21 to 2018-08-20. Since the data is weekly and starts on 2015-11-23, row 52 corresponds to 2016-11-21 and the half-open slice [52:144] covers the 92 weeks up to 2018-08-20. To keep the code generic, I define two index variables to refer to this time window:

START_ANALYSIS_INDEX = 52
END_ANALYSIS_INDEX = 144

Data Preparation

We have to apply two preparation steps as described in Robyn:

Robyn leverages Prophet, Facebook’s open source ML library for time series forecasting. We use Prophet to automatically decompose the trend, seasonality and holiday effects directly from the response and use them as input variables for further modeling. This capability often increases model fit and reduces autoregressive patterns in residuals.

In addition, we can convert categorical variables such as events into numeric variables using Prophet’s decomposition.

The second preparation step:

When using exposure variables (impressions, clicks, GRPs etc.) instead of spend, Robyn fits a nonlinear model with the Michaelis-Menten function between exposure and spend to establish the spend-exposure relationship.

The second step allows us to convert exposure back into spend for the final share-of-spend vs. share-of-effect comparison.

Let’s load the data first:

import numpy as np
import pandas as pd
import pymc3 as pm
from prophet import Prophet  #older versions: from fbprophet import Prophet
from scipy import optimize
from sklearn.preprocessing import MinMaxScaler

data = pd.read_csv("./data/data_raw_2015-11-23__2019-11-11.csv", parse_dates = ["DATE"])
data.columns = [c.lower() if c in ["DATE"] else c for c in data.columns]
data

Image by the author

The holidays data comes in a separate file that is originally shipped with Prophet. It has daily granularity, so it must first be aggregated to the weekly level. Robyn uses German holidays in its demo.

holidays = pd.read_csv("./data/prophet_holidays_daily.csv", parse_dates = ["ds"])
holidays["begin_week"] = holidays["ds"].dt.to_period('W-SUN').dt.start_time

#combine same-week holidays into one holiday
holidays_weekly = holidays.groupby(["begin_week", "country", "year"], as_index = False).agg({'holiday': '#'.join, 'country': 'first', 'year': 'first'}).rename(columns = {'begin_week': 'ds'})

#keep German holidays only, as in the Robyn demo
holidays_weekly_de = holidays_weekly.query("country == 'DE'").copy()
holidays_weekly_de

Image by the author

Prophet Decomposition

Now we are ready to fit Prophet on our data, using the weekly German holidays, the categorical events variable, and yearly seasonality. It is important to note that we apply the decomposition to all available data, not only to the modeling window.

prophet_data = data.rename(columns = {'revenue': 'y', 'date': 'ds'})

#add categorical into prophet
prophet_data = pd.concat([prophet_data, pd.get_dummies(prophet_data["events"], drop_first = True, prefix = "events")], axis = 1)

prophet = Prophet(yearly_seasonality = True, holidays = holidays_weekly_de)
prophet.add_regressor(name = "events_event2")
prophet.add_regressor(name = "events_na")

prophet.fit(prophet_data[["ds", "y", "events_event2", "events_na"]])
prophet_predict = prophet.predict(prophet_data[["ds", "y", "events_event2", "events_na"]])

Image by the author

Let’s extract the seasonality, trend, holidays, and events components:

prophet_columns = [col for col in prophet_predict.columns if not col.endswith(("upper", "lower"))]
events_numeric = prophet_predict[prophet_columns].filter(like = "events_").sum(axis = 1)

final_data = data.copy()
final_data["trend"] = prophet_predict["trend"]
final_data["season"] = prophet_predict["yearly"]
final_data["holiday"] = prophet_predict["holidays"]
final_data["events"] = (events_numeric - np.min(events_numeric)).values

Spend-Exposure Estimation

In this step, we estimate the non-linear spend-exposure relationship using the Michaelis-Menten function.

The spend-to-exposure function is defined as follows:

exposure = V_max * spend / (K_m + spend)

where V_max and K_m are the two parameters we have to estimate. They will later be used to invert the relationship and convert exposure back into spend. The parameters are estimated on the modeling window.

#define the Michaelis-Menten spend-to-exposure function
spend_to_exposure_menten_func = lambda spend, V_max, K_m: V_max * spend / (K_m + spend)

media_exposures = ["facebook_I", "search_clicks_P"]
media_spends = ["facebook_S", "search_S"]

media_spend_exposure_df = pd.DataFrame()
for media_exposure, media_spend in zip(media_exposures, media_spends):
    #initial guesses for the curve fit
    V_max = final_data[media_exposure].values[START_ANALYSIS_INDEX : END_ANALYSIS_INDEX].max()
    K_m = V_max / 2
    spend = final_data[media_spend].values[START_ANALYSIS_INDEX : END_ANALYSIS_INDEX]
    exposure = final_data[media_exposure].values[START_ANALYSIS_INDEX : END_ANALYSIS_INDEX]
    best_values, _ = optimize.curve_fit(f = spend_to_exposure_menten_func, xdata = spend, ydata = exposure, p0 = [V_max, K_m])
    media_spend_exposure_df = pd.concat([media_spend_exposure_df,
                                         pd.DataFrame({'spend': [media_spend],
                                                       'exposure': [media_exposure],
                                                       'V_max': [best_values[0]],
                                                       'K_m': [best_values[1]]})]).reset_index(drop = True)

media_spend_exposure_df

Image by the author
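For later use, the fitted V_max and K_m also allow us to invert the curve and map exposure back into spend (spend = K_m * exposure / (V_max - exposure)). Below is a minimal sketch of that inversion; the helper name exposure_to_spend_menten_func is mine and not part of the original code:

#hypothetical helper (not part of the original notebook): inverse Michaelis-Menten
exposure_to_spend_menten_func = lambda exposure, V_max, K_m: K_m * exposure / (V_max - exposure)

#example: convert facebook exposure in the modeling window back into estimated spend
row = media_spend_exposure_df.query("exposure == 'facebook_I'").iloc[0]
facebook_spend_est = exposure_to_spend_menten_func(
    final_data["facebook_I"].values[START_ANALYSIS_INDEX:END_ANALYSIS_INDEX],
    V_max = row["V_max"],
    K_m = row["K_m"],
)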

Bayesian Modeling

Now we are ready for modeling. We model our response variable (revenue) as an additive linear regression, which can be described by the following equation:

revenue = b_0 + Σ_m b_m * Saturation(Adstock(x_m)) + Σ_c b_c * x_c + ϵ

where b_0 corresponds to the baseline revenue, the b_m coefficients correspond to the media variables transformed through the adstock and saturation functions, the b_c coefficients correspond to the control variables, and ϵ is noise. All of these coefficients, the noise, and the parameters of the adstock and saturation functions are estimated by the model.

The first decision we have to make prior to modeling is how to normalize the dependent and independent variables. Normalizing the independent variables makes the approach transferable to other data sources, because the same priors can be reused for most variables; moreover, it is very difficult to set sensible priors on non-normalized data. I therefore apply 0-1 normalization to the independent variables and experiment with two different normalizations of the response variable:

  • Scaling by 100K
  • 0–1 Normalization

Scaling by 100K

The original revenue ranges from 672,250 to 3,827,520, so scaling it by 100K gives a range of roughly 6.7 to 38.3, which makes it easier to experiment with priors.

First, I define my input variables:

data = final_data

transform_variables = ["trend", "season", "holiday", "competitor_sales_B", "events", "tv_S", "ooh_S", "print_S", "facebook_I", "search_clicks_P", "newsletter"]
delay_channels = ["tv_S", "ooh_S", "print_S", "facebook_I", "search_clicks_P", "newsletter"]
media_channels = ["tv_S", "ooh_S", "print_S", "facebook_I", "search_clicks_P"]
control_variables = ["trend", "season", "holiday", "competitor_sales_B", "events"]
target = "revenue"

Then I normalize the independent variables using MinMaxScaler and scale the dependent variable by 100K:

data_transformed = data.copy()
numerical_encoder_dict = {}

for feature in transform_variables:
    scaler = MinMaxScaler()
    original = data[feature].values.reshape(-1, 1)
    transformed = scaler.fit_transform(original)
    data_transformed[feature] = transformed
    numerical_encoder_dict[feature] = scaler

dependent_transformation = None
original = data[target].values
data_transformed[target] = original / 100_000

The modeling part consists of several steps:

  • First, I iterate through the media channels (delay channels) and define the priors for the adstock and saturation transformations (a minimal sketch of these transformations is given after this list). I experimented with different priors but finally use those described in the Bayesian Methods paper.
  • I apply the adstock transformation to all available data so that the carryover effect can build up from the historical data.
  • I apply the saturation transformation only on the modeling window defined by START_ANALYSIS_INDEX and END_ANALYSIS_INDEX.
  • I force the regression coefficients of the delay channels to be positive by using a HalfNormal distribution.
  • Next, I iterate through the remaining control variables without any constraint on the sign of the coefficients.
  • I define the prior for the intercept as a normal distribution centered at the mean revenue. Robyn’s methodology constrains the intercept to be positive, which would correspond to the HalfNormal prior left commented out in the code below.
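The model below relies on two helper functions, adstock_geometric_theano_pymc3 and saturation_hill_pymc3, whose definitions are not shown in this section. As a rough orientation, here is a minimal sketch of what they can look like, assuming a geometric adstock implemented with theano.scan and a Hill saturation of the form x^alpha / (x^alpha + gamma^alpha); the exact implementations used in the article may differ.

import theano
import theano.tensor as tt

#sketch only: geometric adstock, x_decayed[t] = x[t] + theta * x_decayed[t - 1]
#assumes x is a float array (e.g. a 0-1 normalized media column)
def adstock_geometric_theano_pymc3(x, theta):
    x = tt.as_tensor_variable(x)

    def adstock_step(x_t, carryover, theta):
        return x_t + theta * carryover

    x_decayed, _ = theano.scan(
        fn = adstock_step,
        sequences = [x],
        outputs_info = [tt.zeros_like(x[0])],
        non_sequences = [theta],
    )
    return x_decayed

#sketch only: Hill saturation with shape parameter alpha and half-saturation point gamma
def saturation_hill_pymc3(x, alpha, gamma):
    return x ** alpha / (x ** alpha + gamma ** alpha)

With these helpers in place, the model is specified as follows: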
response_mean = []
with pm.Model() as model_2:
    for channel_name in delay_channels:
        print(f"Delay Channels: Adding {channel_name}")

        x = data_transformed[channel_name].values

        adstock_param = pm.Beta(f"{channel_name}_adstock", 3, 3)
        saturation_gamma = pm.Beta(f"{channel_name}_gamma", 2, 2)
        saturation_alpha = pm.Gamma(f"{channel_name}_alpha", 3, 1)

        #adstock on all data, saturation only on the modeling window
        x_new = adstock_geometric_theano_pymc3(x, adstock_param)
        x_new_sliced = x_new[START_ANALYSIS_INDEX:END_ANALYSIS_INDEX]
        saturation_tensor = saturation_hill_pymc3(x_new_sliced, saturation_alpha, saturation_gamma)

        channel_b = pm.HalfNormal(f"{channel_name}_media_coef", sd = 3)
        response_mean.append(saturation_tensor * channel_b)

    for control_var in control_variables:
        print(f"Control Variables: Adding {control_var}")

        x = data_transformed[control_var].values[START_ANALYSIS_INDEX:END_ANALYSIS_INDEX]

        control_beta = pm.Normal(f"{control_var}_control_coef", sd = 3)
        control_x = control_beta * x
        response_mean.append(control_x)

    intercept = pm.Normal("intercept", np.mean(data_transformed[target].values), sd = 3)
    #alternative, positivity-constrained intercept:
    #intercept = pm.HalfNormal("intercept", sd = 3)

    sigma = pm.HalfNormal("sigma", 4)

    likelihood = pm.Normal("outcome",
                           mu = intercept + sum(response_mean),
                           sd = sigma,
                           observed = data_transformed[target].values[START_ANALYSIS_INDEX:END_ANALYSIS_INDEX])

I generate samples from the prior distribution to check whether my choice of priors was reasonable:
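The exact call is not shown in the excerpt above; a minimal sketch, assuming PyMC3’s pm.sample_prior_predictive is used inside the model context:

#draw prior samples for all parameters and for the prior predictive "outcome"
with model_2:
    prior_samples = pm.sample_prior_predictive(samples = 1000)

#prior predictive draws of the (scaled) revenue over the modeling window
prior_outcome = prior_samples["outcome"]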

Image by the author

And plot prior distributions:
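The original figure is not reproduced here; one simple way to visualize a few of the sampled priors with matplotlib (the parameter selection below is illustrative):

import matplotlib.pyplot as plt

#plot histograms of a few sampled prior distributions (illustrative selection)
params_to_plot = ["intercept", "sigma", "tv_S_adstock", "tv_S_media_coef"]

fig, axes = plt.subplots(1, len(params_to_plot), figsize = (16, 3))
for ax, param in zip(axes, params_to_plot):
    ax.hist(prior_samples[param], bins = 30)
    ax.set_title(param)
plt.tight_layout()
plt.show()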
