Introduction

In the world of e-commerce, optimizing the conversion rate is a paramount goal. A/B testing is a powerful tool that allows businesses to experiment with changes to their websites and product pages, enabling data-driven decisions that improve conversion rates. This case study presents an A/B test conducted for an e-commerce website to enhance its conversion rate.

Background

  • Business: XYZ Electronics

  • Objective: To increase the conversion rate on the product page

  • Duration: June 2023

Hypothesis

The hypothesis for this A/B test is that a change to the product page will result in a higher conversion rate. The change to be tested is an alternative design of the checkout page, where the original design will be compared against the new design.

Figure: the original checkout design, without the PayPal button.
Figure: the new design to be tested, with the PayPal button.

Test Duration

The test would last between 2 and 4 weeks, during the month of January 2017.

Data Understanding

Data Collection/Sourcing

The data was stored in a database, to be queried and imported into R. We start by importing the required libraries.

# For connecting to PostgreSQL database
library(RPostgreSQL)

# For data manipulation and visualization
library(tidyverse)

# advanced themes for ggplot visualization
library(ggthemes)

Connecting to the Database

The code used to connect to the database has been excluded for privacy reasons.
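For reference, a minimal connection sketch is shown below; the driver call follows the standard RPostgreSQL pattern, while the host, database name, user, and password are placeholders rather than the real credentials.

# connect to the PostgreSQL database (placeholder credentials, not the real ones)
drv <- dbDriver("PostgreSQL")
conn <- dbConnect(drv,
                  host     = "localhost",
                  dbname   = "ecommerce_db",
                  user     = "analyst",
                  password = "********")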

Next, we list all the tables in the database so we can identify the ones needed for the test.

dbListTables(conn)
## [1] "transactions"   "users"          "ab_data"        "countries"     
## [5] "combined_table"

Query the database to collect the required data and preview the tables.

# load the tables
ab_data <- as_tibble(dbGetQuery(conn, "SELECT * FROM ab_data"))
countries <- as_tibble(dbGetQuery(conn, "SELECT * FROM countries"))

# preview the tables
head(ab_data)
head(countries)

Data Structure and Data Manipulation

First we get the structure of both tables.

str(ab_data)
## tibble [294,478 × 5] (S3: tbl_df/tbl/data.frame)
##  $ user_id     : int [1:294478] 851104 804228 661590 853541 864975 936923 679687 719014 817355 839785 ...
##  $ timestamp   : POSIXct[1:294478], format: "2017-01-21 22:11:48" "2017-01-12 08:01:45" ...
##  $ group       : chr [1:294478] "control" "control" "treatment" "treatment" ...
##  $ landing_page: chr [1:294478] "old_page" "old_page" "new_page" "new_page" ...
##  $ converted   : int [1:294478] 0 0 0 0 1 0 1 0 1 1 ...
str(countries)
## tibble [290,584 × 2] (S3: tbl_df/tbl/data.frame)
##  $ user_id: int [1:290584] 834778 928468 822059 711597 710616 909908 811617 938122 887018 820683 ...
##  $ country: chr [1:290584] "UK" "US" "UK" "UK" ...

The ab_data data frame has 294478 rows and 5 columns, while the countries data frame has 290584 rows and 2 columns. The two tables are related by the user_id variable.

# join the data
ab_data_combined <- ab_data %>%
  left_join(countries, by = "user_id")
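As a quick sanity check on the join (a sketch; the expectation of 294478 rows assumes user_id is unique in countries, so the left join does not duplicate rows):

# dimensions of the combined data: rows from ab_data plus the added country column
dim(ab_data_combined)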

Data Definition

  • user_id: the unique identifier for each visitor in the dataset.

  • timestamp: the date and time the user entered the page.

  • group: the experiment group to which the user belongs (control or treatment).

  • landing_page: the page shown to the user (old_page or new_page).

  • converted: whether the user converted (1) or did not convert (0).

  • country: the geographical location of the user.

Variables

  • Treatment indicators: landing page, group

  • Response indicator: converted
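Before splitting by group, the overall conversion rate gives a useful baseline (a quick sketch; output omitted):

# pooled conversion rate across all users, regardless of group
mean(ab_data_combined$converted)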

Data Analysis

Randomization check

The first thing to do is to check whether the randomization has been done properly and the treatments are evenly balanced.

table(ab_data_combined$group, ab_data_combined$landing_page)
##            
##             new_page old_page
##   control       1928   145274
##   treatment   145311     1965

Some data appears to have been misplaced: the control group should see old_page and the treatment group should see new_page. Next, we investigate which of the variables has been wrongly assigned.

colSums(table(ab_data_combined$group, ab_data_combined$landing_page))
## new_page old_page 
##   147239   147239

The landing_page variable is correctly randomized.

rowSums(table(ab_data_combined$group, ab_data_combined$landing_page))
##   control treatment 
##    147202    147276

The group variable has been wrongly assigned and will be corrected.
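Before correcting, we can count how many rows carry a mismatched group/landing_page pairing; from the table above this should equal 1928 + 1965 (a quick sketch):

# rows where group and landing_page disagree
ab_data_combined %>%
  filter((group == "control" & landing_page == "new_page") |
           (group == "treatment" & landing_page == "old_page")) %>%
  nrow()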

Fixing Randomization

# replace wrongly placed control and treatment group
ab_data_combined <- ab_data_combined %>%
  mutate(group = ifelse(landing_page == "new_page", "treatment", "control"))

# check for correct randomization
table(ab_data_combined$group, ab_data_combined$landing_page)
##            
##             new_page old_page
##   control          0   147239
##   treatment   147239        0

The randomization has been fixed and the experiment can continue without issues.

Missing Data

After correcting the randomization, we check whether there are missing values in any of the columns of the data frame.

anyNA(ab_data_combined)
## [1] FALSE

There are no missing values in the data.
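Had anyNA() returned TRUE, a per-column count would show where the gaps are (a quick sketch; all zeros here):

# number of missing values in each column
colSums(is.na(ab_data_combined))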

EDA

Total number of responses

The total number of customers in the experiment is 294478, spread across 3 countries.
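These counts can be verified directly from the combined data (a quick sketch; output omitted):

# total number of rows and number of distinct countries
nrow(ab_data_combined)
n_distinct(ab_data_combined$country)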

ab_data_combined %>%
  ggplot(aes(group, fill = country))+
  geom_bar(position = "dodge")+
  scale_fill_manual(values = c("yellow4", "wheat4", "tan4"))+
  theme_minimal()+
  ggtitle("Frequency of Groups According to Countries")

Frequency of Members That Converted

ab_data_combined %>%
  filter(converted == 1) %>%
  ggplot(aes(country, fill = group))+
  geom_bar(position = "dodge2")+
  scale_fill_manual(values = c("tan4", "wheat4"))+
  labs(y = "Frequency",
       title = "Frequency of Converted Customer Across the Various Regions")+
  theme_minimal()

Statistical Tests

Probability of Conversion for the Groups

ab_data_combined %>%
  group_by(group, landing_page) %>%
  summarize(prob = sum(converted)/length(converted))
## `summarise()` has grouped output by 'group'. You can override using the
## `.groups` argument.

Is the probability of conversion equal across the landing pages?

prop.test(xtabs(~landing_page + converted, data = ab_data_combined)[, 2:1])
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  xtabs(~landing_page + converted, data = ab_data_combined)[, 2:1]
## X-squared = 1.8568, df = 1, p-value = 0.173
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.0039880781  0.0007144889
## sample estimates:
##    prop 1    prop 2 
## 0.1188408 0.1204776

There is no significant difference between the landing pages; the probability of conversion is roughly the same for both, as indicated by the p-value (0.173).
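To quantify how small the gap is, the per-page rates and their absolute difference can be pulled directly from the same contingency table (a quick sketch):

# conversion rate per landing page and the difference between them
conv_tab <- xtabs(~landing_page + converted, data = ab_data_combined)
rates <- conv_tab[, "1"] / rowSums(conv_tab)
rates
diff(rates)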

Does conversion change across the landing pages?

t.test(converted~landing_page, data = ab_data_combined)
## 
##  Welch Two Sample t-test
## 
## data:  converted by landing_page
## t = -1.3683, df = 294466, p-value = 0.1712
## alternative hypothesis: true difference in means between group new_page and group old_page is not equal to 0
## 95 percent confidence interval:
##  -0.0039813041  0.0007077149
## sample estimates:
## mean in group new_page mean in group old_page 
##              0.1188408              0.1204776

The t-test shows that there is no significant difference between the landing pages, as the confidence interval for the difference in means contains zero.

Does the conversion rate change across countries?

ab_data_combined %>%
  group_by(country) %>%
  summarise(prob = sum(converted)/length(converted))

The probability of conversion seems similar across regions, but we carry out a test to formally compare the 3 regions.

One-Way ANOVA Test

anova_test <- aov(converted ~ country, data = ab_data_combined)

summary(anova_test)
##                 Df Sum Sq Mean Sq F value Pr(>F)
## country          2      0  0.1360   1.291  0.275
## Residuals   294475  31020  0.1053
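Although the ANOVA is not significant, a pairwise post-hoc comparison can confirm that no individual pair of countries differs (a quick sketch):

# Tukey pairwise comparisons of mean conversion rate between countries
TukeyHSD(anova_test)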

The p-value shows that country does not have an effect on the conversion rate of the store.

Summary

While the tests and visuals have shown that there is no significant difference between the landing pages, we can note that:

  • The old page (mean conversion rate = 0.1205) converted slightly more than the new page (mean conversion rate = 0.1188).

  • A p-value of 0.1712 means that, if the two pages truly had the same effect, there would be a 17.12% probability of observing a difference at least as extreme as the one observed; this is not strong enough evidence to reject the hypothesis that the pages perform equally.

  • Given these preliminary results, there is no need to move forward with the new_page; keeping the current page avoids the cost of rolling out the new design.

  • Country also does not have an effect on the conversion rate.

Conclusion and Recommendation

An A/B test was carried out to determine whether a new landing page should be implemented by an online store to increase the conversion rate of its customers. After comparing the means and testing for significance, it is recommended to keep the old landing page: it converted marginally better than the new page, the difference is not statistically significant, and retaining it avoids the cost of implementing the new design.