In the world of e-commerce, optimizing the conversion rate is a paramount goal. A/B testing is a powerful tool that lets businesses experiment with changes to their websites and product pages, enabling data-driven decisions that improve conversion rates. This case study presents an A/B test conducted on an e-commerce website to improve its conversion rate.
Business: XYZ Electronics
Objective: To increase the conversion rate on the product page
Duration: June 2023
The hypothesis for this A/B test is that a change to the product page will result in a higher conversion rate. The change to be tested is an alternative design of the checkout page with a PayPal button, and the original design will be compared against this new design.
Figure: the original checkout design (without the PayPal button) and the new design to be tested (with the PayPal button).
The test was planned to run for between 2 and 4 weeks during January 2017.
The data was stored in a database, from which it was queried and imported into R. First, we load the required libraries:
# For connecting to PostgreSQL database
library(RPostgreSQL)
# For data manipulation and visualization
library(tidyverse)
# advanced themes for ggplot visualization
library(ggthemes)
Connecting to the Database
The code used to connect to the database has been omitted for privacy reasons.
Next, we list all the tables in the database (e.g. with DBI's dbListTables()) to identify the ones needed for the test.
## [1] "transactions" "users" "ab_data" "countries"
## [5] "combined_table"
We then query the database to load the required tables and preview them.
# load the tables
ab_data <- as_tibble(dbGetQuery(conn, "SELECT * FROM ab_data"))
countries <- as_tibble(dbGetQuery(conn, "SELECT * FROM countries"))
# preview the tables
head(ab_data)
Next, we inspect the structure of both tables with str().
## tibble [294,478 × 5] (S3: tbl_df/tbl/data.frame)
## $ user_id : int [1:294478] 851104 804228 661590 853541 864975 936923 679687 719014 817355 839785 ...
## $ timestamp : POSIXct[1:294478], format: "2017-01-21 22:11:48" "2017-01-12 08:01:45" ...
## $ group : chr [1:294478] "control" "control" "treatment" "treatment" ...
## $ landing_page: chr [1:294478] "old_page" "old_page" "new_page" "new_page" ...
## $ converted : int [1:294478] 0 0 0 0 1 0 1 0 1 1 ...
## tibble [290,584 × 2] (S3: tbl_df/tbl/data.frame)
## $ user_id: int [1:290584] 834778 928468 822059 711597 710616 909908 811617 938122 887018 820683 ...
## $ country: chr [1:290584] "UK" "US" "UK" "UK" ...
The ab_data data frame has 294,478 rows and 5 columns, while the countries data frame has 290,584 rows and 2 columns. The two tables are related by the user_id variable.
user_id: the unique identifier for each visitor in the dataset.
timestamp: the date and time the user entered the page.
group: the experimental group (control or treatment) to which the user belongs.
landing_page: the page (old_page or new_page) shown to the user.
converted: whether the user converted (1) or not (0).
country: the geographical location of the user.
Treatment indicators: landing_page, group
Response indicator: converted
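The later steps work with a combined data frame, ab_data_combined, that carries each user's country. A minimal sketch of building such a table with base R's merge(), assuming a left join of ab_data onto countries by user_id; the toy rows below are made up for illustration and are not the study's data.

```r
# Toy stand-ins for the two tables (hypothetical values, not the study's data)
ab_data_toy <- data.frame(
  user_id      = c(101L, 102L, 103L, 104L),
  group        = c("control", "treatment", "treatment", "control"),
  landing_page = c("old_page", "new_page", "new_page", "old_page"),
  converted    = c(0L, 1L, 0L, 1L)
)
countries_toy <- data.frame(
  user_id = c(101L, 102L, 103L, 104L),
  country = c("UK", "US", "UK", "US")
)

# Left join: keep every row of ab_data_toy and attach the matching country
combined_toy <- merge(ab_data_toy, countries_toy, by = "user_id", all.x = TRUE)

str(combined_toy)
```

In the tidyverse style used elsewhere in this analysis, dplyr::left_join(ab_data, countries, by = "user_id") performs the same left join.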
The first step is to check whether the randomization was done properly, i.e. whether the control group saw only the old page and the treatment group only the new page.
##
## new_page old_page
## control 1928 145274
## treatment 145311 1965
Some rows appear to be mismatched: the control group should see only old_page and the treatment group only new_page, yet 1,928 control rows landed on new_page and 1,965 treatment rows landed on old_page. Next, we investigate which of the two variables was recorded incorrectly.
## new_page old_page
## 147239 147239
The landing_page variable is split exactly in half (147,239 rows each), so it is taken to be correctly randomized.
## control treatment
## 147202 147276
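The group variable is not split evenly (147,202 vs. 147,276), so it is taken to be the incorrectly recorded variable and will be corrected by reassigning it from landing_page.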
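Before group is reassigned, the mismatched rows can be flagged explicitly. A minimal base R sketch, assuming the rule stated above (control sees old_page, treatment sees new_page); the toy data is hypothetical.

```r
# Hypothetical toy data with two deliberately mismatched rows (rows 2 and 4)
toy <- data.frame(
  group        = c("control", "control", "treatment", "treatment", "control"),
  landing_page = c("old_page", "new_page", "new_page", "old_page", "old_page")
)

# Cross-tabulate group against landing page; off-diagonal cells are mismatches
print(table(toy$group, toy$landing_page))

# A row is mismatched when "in treatment" and "saw new_page" disagree
mismatched   <- (toy$group == "treatment") != (toy$landing_page == "new_page")
n_mismatched <- sum(mismatched)
print(n_mismatched)
```

Applied to ab_data, the same expression should count the 1,928 + 1,965 = 3,893 off-diagonal rows of the first cross-tabulation.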
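# ab_data_combined is assumed to be ab_data joined with countries on user_id
# (its creation is not shown in this report)
# reassign group from landing_page: new_page -> treatment, old_page -> control
ab_data_combined <- ab_data_combined %>%
  mutate(group = ifelse(landing_page == "new_page", "treatment", "control"))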
# check for correct randomization
table(ab_data_combined$group, ab_data_combined$landing_page)
##
## new_page old_page
## control 0 147239
## treatment 147239 0
The randomization has been fixed, and the analysis can continue.
## Missing Data
After correcting the randomization, we check whether any column of the data frame contains missing values.
## [1] FALSE
There are no missing values.
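The single FALSE above is consistent with a whole-table check such as anyNA(). A sketch of a per-column view, which also shows where missing values sit if any exist; the toy data frame is hypothetical.

```r
# Hypothetical toy data frame with one missing country
toy_missing <- data.frame(
  user_id   = 1:4,
  converted = c(0L, 1L, 0L, 1L),
  country   = c("UK", NA, "US", "US")
)

# Whole-table check: TRUE if any cell is NA
any_missing <- anyNA(toy_missing)

# Per-column count of missing values
missing_by_column <- colSums(is.na(toy_missing))

print(any_missing)
print(missing_by_column)
```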
The experiment contains 294,478 observations from 3 countries. Next, we compute the conversion rate for each group and landing page.
ab_data_combined %>%
  group_by(group, landing_page) %>%
  # the mean of a 0/1 variable is the conversion rate;
  # .groups = "drop" returns an ungrouped result and silences the regrouping message
  summarize(prob = mean(converted), .groups = "drop")
##
## 2-sample test for equality of proportions with continuity correction
##
## data: xtabs(~landing_page + converted, data = ab_data_combined)[, 2:1]
## X-squared = 1.8568, df = 1, p-value = 0.173
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.0039880781 0.0007144889
## sample estimates:
## prop 1 prop 2
## 0.1188408 0.1204776
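There is no significant difference between the landing pages: the p-value (0.173) is well above 0.05, and the 95% confidence interval for the difference in conversion proportions (-0.0040 to 0.0007) includes zero. The conversion rates are essentially the same (0.1188 for new_page vs. 0.1205 for old_page).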
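The test above is R's prop.test() on a 2×2 table of landing page by conversion (the [, 2:1] in the data line reorders the columns so that converted = 1 comes first, making the reported proportions conversion rates). As a sketch of what it computes: the pooled two-proportion z statistic, squared, equals prop.test()'s X-squared when the continuity correction is turned off. The counts below are hypothetical.

```r
# Pooled two-proportion z-test (no continuity correction)
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  p_pool <- (x1 + x2) / (n1 + n2)                 # common rate under H0: p1 == p2
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  z <- (p1 - p2) / se
  list(z = z, p_value = 2 * pnorm(-abs(z)))        # two-sided p-value
}

# Hypothetical conversions and visitors for the new and old page
x <- c(new = 118, old = 125)
n <- c(new = 1000, old = 1000)

res_z    <- two_prop_z(x[["new"]], n[["new"]], x[["old"]], n[["old"]])
res_prop <- prop.test(x, n, correct = FALSE)

print(res_z)
print(unname(res_prop$statistic))   # equals res_z$z^2
```

With correct = TRUE (the default, used in the output above), prop.test() applies Yates' continuity correction, so its statistic is usually slightly smaller than z².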
##
## Welch Two Sample t-test
##
## data: converted by landing_page
## t = -1.3683, df = 294466, p-value = 0.1712
## alternative hypothesis: true difference in means between group new_page and group old_page is not equal to 0
## 95 percent confidence interval:
## -0.0039813041 0.0007077149
## sample estimates:
## mean in group new_page mean in group old_page
## 0.1188408 0.1204776
The t-test also shows no significant difference between the landing pages: the p-value (0.1712) is above 0.05, and the 95% confidence interval for the difference in means (-0.0040 to 0.0007) includes zero.
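Because converted is coded 0/1, each group's mean is its conversion rate, so the Welch t-test compares the two rates directly. A sketch using the formula interface on hypothetical data, checking whether the 95% confidence interval for the difference contains zero (the decision rule used above):

```r
# Hypothetical 0/1 outcomes: 12 of 100 converted on new_page, 13 of 100 on old_page
toy_t <- data.frame(
  landing_page = rep(c("new_page", "old_page"), each = 100),
  converted    = c(rep(c(1, 0), c(12, 88)), rep(c(1, 0), c(13, 87)))
)

# Group means of a 0/1 variable are conversion rates
rates <- sapply(split(toy_t$converted, toy_t$landing_page), mean)

# Welch two-sample t-test (var.equal = FALSE is the default)
tt <- t.test(converted ~ landing_page, data = toy_t)
ci <- tt$conf.int

# Not significant at the 5% level if the interval straddles zero
ci_contains_zero <- ci[1] < 0 && ci[2] > 0

print(rates)
print(ci)
print(ci_contains_zero)
```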
The probability of conversion appears similar across countries, but I will carry out a one-way ANOVA to compare the 3 countries formally.
## Df Sum Sq Mean Sq F value Pr(>F)
## country 2 0 0.1360 1.291 0.275
## Residuals 294475 31020 0.1053
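The table above is the summary of a one-way ANOVA of converted on country. A sketch of the same kind of call on hypothetical data (the country labels are placeholders). Since the outcome is binary, a chi-squared test of independence on the country × converted table is a common alternative; it is shown alongside as a swapped-in technique, not the one used in this analysis.

```r
# Hypothetical data: 3 placeholder countries, 50 users each, 0/1 conversions
toy_c <- data.frame(
  country   = rep(c("A", "B", "C"), each = 50),
  converted = c(rep(c(1, 0), c(6, 44)),
                rep(c(1, 0), c(7, 43)),
                rep(c(1, 0), c(5, 45)))
)

# One-way ANOVA: does mean conversion differ by country?
fit <- aov(converted ~ country, data = toy_c)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# Alternative for a binary outcome: chi-squared test of independence
chi   <- chisq.test(table(toy_c$country, toy_c$converted))
chi_p <- chi$p.value

print(summary(fit))
print(c(anova_p = anova_p, chi_p = chi_p))
```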
The p-value (0.275) shows that country does not have a significant effect on the store's conversion rate.
# Summary
While the tests and visuals have shown no significant difference between the landing pages, we can note that:
- the old page (mean conversion rate = 0.1205) converted slightly more often than the new page (mean conversion rate = 0.1188) in this sample
- A p-value of 0.1712 means that, if the two pages truly had the same conversion rate, there would be a 17.12% probability of observing a difference at least as extreme as the one observed. This is not strong evidence against the pages having the same effect, so the difference is not significant.
- Given this preliminary test, there is no need to move forward with the new page; keeping the old page saves the cost of the change.
- Country also does not have a significant effect on the conversion rate.
An A/B test was carried out to determine whether an online store should implement a new landing page to increase its customers' conversion rate. After comparing the means and testing for significance, it is recommended to keep the old landing page: its observed conversion rate was slightly higher than the new page's, even though the difference is not statistically significant, and keeping it avoids the cost of switching.