Credit Card Project [5-6]

The promotion was finally launched on February 1, 2020. 

Was the promotion effective? We will find out!

In the earlier modules, we created two data sets:

  • Lead1 and
  • Lead1_control

Let's first combine the two data sets:

data all_lead1;
set lead1 (in=a) lead1_control (in=b);
length flag $8;
/* Tag each row with the data set it came from */
if a then flag = 'Target';
else if b then flag = 'Control';
run;

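The All_lead1 data set contains the full list of customers tracked for this Sephora promo (both target and control groups).

Only the customers in the target group received the Sephora promotion in some form (i.e. email, tele-sales, Facebook). The control group was not included in the promotion, so it serves as the baseline for comparison.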
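Before going further, it may be worth confirming that the combined data set looks the way we expect. The step below is a minimal sketch of such a check, assuming custno is meant to identify one customer per row (the lesson does not state this explicitly):

```sas
/* How many customers ended up in each group? */
proc freq data=all_lead1;
  tables flag / missing;
run;

/* If n_rows and n_customers differ, some custno values are repeated,
   for example a customer who appears in both lead1 and lead1_control. */
proc sql;
  select count(*)               as n_rows,
         count(distinct custno) as n_customers
  from all_lead1;
quit;
```

If the two counts match and Flag has no missing values, each customer belongs to exactly one group.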

Now, let's track their spending in February 2020.

proc sql;
create table tran_track_temp1 as
/* All transaction columns, plus customer attributes and the group flag */
select a.*, age, income, gender, flag
from transaction_table_202002 a inner join all_lead1 b
on a.custno = b.custno
order by a.custno;
quit;

The Proc SQL step above looks at the transaction records in February.

It inner-joins the All_lead1 table with the transaction record data.

The output tran_track_temp1 table contains the transaction records for customers in our Sephora promo. Because this is an inner join, customers who made no transactions in February do not appear in it.

Now, for each customer we will calculate how many transactions they have made, and the total amount they spent in February:

proc sql;
create table tran_track as
/* One row per customer: transaction count and total spend */
select custno, flag, count(*) as num_transac, sum(transaction_amount) as tot_spend
from tran_track_temp1
group by custno, flag
order by custno, flag;
quit;

The proc sql step above creates two columns for each customer:

  • Num_transac: the number of transactions made
  • Tot_spend: the amount spent in February
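Note that tran_track only holds customers with at least one February transaction, so the averages computed from it exclude customers who spent nothing. If the comparison should cover every customer in the promo, one option is sketched below. It is not part of the original analysis: the table name tran_track_all is hypothetical, and the step assumes each custno appears once in all_lead1.

```sas
/* Keep every customer from all_lead1. Customers with no February
   transactions get zero for both measures instead of being dropped. */
proc sql;
  create table tran_track_all as
  select a.custno,
         a.flag,
         coalesce(b.num_transac, 0) as num_transac,
         coalesce(b.tot_spend, 0)   as tot_spend
  from all_lead1 a
       left join tran_track b
         on a.custno = b.custno
  order by a.custno;
quit;
```

Running the same t-tests on tran_track_all would then compare spend per targeted customer rather than spend per active customer. The results quoted in this module are based on tran_track.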

We can now perform a two-sample Student's t-test to compare the results between the two groups.
First, let's compare the total amount spent by the target and control groups:

proc ttest data=tran_track;
class flag;
var tot_spend;
run;

In our example, we want to compare the results of the target and control groups.

The Flag variable specifies whether the customer is in the target or control group, so we specify it in the Class statement.

The measure we want to analyze is the total spend, so the Tot_spend variable is specified in the Var statement.

Now, let's look at the results.

The target group spent an average of $1,527 in February, compared with $1,291 by the control group.

The spend increment is about $235: on average, the target group spent $235 more than the control group.
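The group averages can also be cross-checked outside of Proc TTEST. The step below is a small sketch using Proc MEANS. The choice of statistics and the maxdec= setting are just one reasonable option, not something the lesson prescribes:

```sas
/* Summary statistics for both measures, split by group */
proc means data=tran_track n mean std min max maxdec=2;
  class flag;
  var tot_spend num_transac;
run;
```

The Mean column for Tot_spend should agree with the group means reported by Proc TTEST, and the Std column gives a first impression of whether the two groups have a similar spread.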

Is the difference statistically significant? We'll look at the p-values from the t-test results.

The Proc TTEST output provides p-values from two different methods:

  • Pooled method
  • Satterthwaite method

The Pooled method is used when the two comparison groups (i.e. target and control) have equal variances; otherwise, the Satterthwaite method is used.
How do we know if the two groups have equal variances?
We look at the Equality of Variances test (a folded F test), which is also provided in the output.

The p-value for the equality of variance test is 0.0372, which is less than 0.05.
We can conclude that there is a significant difference between the variances of the two groups.
As a result, we will look at the p-value from the Satterthwaite method, which is 0.0455.
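To focus on just the tables used in this decision, the output can be narrowed with an ODS SELECT statement. This is an optional sketch. Statistics, Equality and TTests are the ODS table names that Proc TTEST gives to its summary statistics, equality of variances test and t-test results:

```sas
/* Print only the group statistics, the equality of variances test,
   and the Pooled / Satterthwaite t-tests */
ods select Statistics Equality TTests;
proc ttest data=tran_track alpha=0.05;
  class flag;
  var tot_spend;
run;
```

Read the Equality table first. If its p-value is below 0.05, use the Satterthwaite row of the TTests table; otherwise use the Pooled row.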

The p-value is less than 0.05.

We can conclude that there is a statistically significant difference between the total amounts spent by the two groups.

In layman’s terms, customers in the target group, who received the promotional messages, spent an average of $235 more than those who were not included in the promotion. 

A difference of this size is unlikely to arise by chance alone, so the promotion appears to have increased spending.

Now, let's also look at whether the promotion increased the number of transactions made by the customers.

proc ttest data=tran_track;
class flag;
var num_transac;
run;

On average, the target group made slightly more transactions (5.18) than the control group (5.01).
The p-values from both the Pooled and the Satterthwaite methods are above 0.05.

The result is not statistically significant.
We have no evidence that the promotion led customers to make more purchases (in terms of transaction count).