Python A/B Test

A/B Test

0 minute read

In this project we are perfoming a so called A/B test to assess the sigh up rate of two groups using Chi-Square test for independence. Summary for the business, for the data we have so far, we can see that it’s not nessecary to spend money for the expensive mailing and keep the simpler and cheeper one.

IMPORT REQUIRED PACKAGES

import pandas as pd
from scipy.stats import chi2_contingency, chi2

IMPORT DATA

campaign_data=pd.read_excel('Data/grocery_database.xlsx', sheet_name='campaign_data')

FILTER OUR DATA-drop the row with mailer group=Control as our task is to compare mailer1 and mailer2 groups

campaign_data=campaign_data.loc[campaign_data['mailer_type']!='Control']

SUMMARISE TO GET OUR OBSERVED FREQUENCIES

observed_values=pd.crosstab(campaign_data['mailer_type'],campaign_data['signup_flag']).values
mailer1_signup_rate=123/(252+123)
mailer2_signup_rate=127/(209+127)
print(mailer1_signup_rate, mailer2_signup_rate)

SET HYPOTHESIS AND SET ACCEPTANCE CRITERIA

null_hypothesis='There is no relationship between the mailer type and signup rate. They are independent.'
alternate_hypothesis='There is a relationship between the mailer type and signup rate. They are not independent.'

acceptance_criteria=0.05

CALCULATE EXPECTED FREQUENCIES AND CH-SQUARE STATISTIC

chi2_statistic, p_value, dof, expected_values=chi2_contingency(observed_values, correction=False)
print(chi2_statistic, p_value)

FIND THE CRITICAL VALUE FOR OUR TEST

critical_value=chi2.ppf(1-acceptance_criteria, dof)
print(critical_value)

PRINT THE RESULTS (CHI-SQUARE STATISTIC)

if chi2_statistic >= critical_value:
    print(f'As our chi-square statistic of {chi2_statistic} is higher than our critical value {critical_value}, we reject the null hypothesis and conclude that: {alternate_hypothesis}')
else:
    print(f'As our chi-square statistic of {chi2_statistic} is lower than our critical value {critical_value}, we retain the null hypothesis and conclude that: {null_hypothesis}')

PRINT THE RESULTS (P_VALUE)

if p_value <= acceptance_criteria:
    print(f'As our p-value of {p_value} is lower than our acceptance_criteria {acceptance_criteria}, we reject the null hypothesis and conclude that: {alternate_hypothesis}')
else:
    print(f'As our p-value of {p_value} is higher than our acceptance_criteria {acceptance_criteria}, we retain the null hypothesis and conclude that: {null_hypothesis}')

OUTPUT: permutation_importance

Khatuna Kurdovanidze

Data Analyst/Data Scientist Portfolio