Analyzing Sales Performance:
E-Commerce Insights from Indian Market
In this project, I delved into the dynamic landscape of Indian e-commerce, exploring the intricacies of sales performance through meticulously curated data.
Tool used: R Programming Language
Introduction
Welcome to the world of e-commerce sales analysis! In this dataset, i delved into the dynamic landscape of Indian e-commerce, exploring the intricacies of sales performance through meticulously curated data. the dataset, sourced from Kaggle, comprises three comprehensive CSV files: List of Orders, Order Details, and Sales Targets.
Dataset Overview
- List of Orders: This dataset offers a detailed glimpse into purchase information, featuring essential data points such as Order ID, Date of Purchase, and Customer Details.
- Order Details: Dive deeper into the specifics of this dataset, which includes Order ID, Order Price, Quantity, Profit, and detailed Category and Subcategory information.
- Sales Target: Gain insight into the overarching goals and aspirations of each product category with this dataset, showcasing sales target amounts and dates.
Loading Package
I started by loading and installing the package required for necessary data cleaning, analysis and visualization
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
Load Datasets
The next step was to load the dataset required for analysis
Order_List<- read.csv(“C:\\Users\\USER\\Documents\\PERSONAL\\DATA ANALYTICS\\My Portfolio\\R Studio\\List of Orders.csv”)
Order_Details<- read.csv(“C:\\Users\\USER\\Documents\\PERSONAL\\DATA ANALYTICS\\My Portfolio\\R Studio\\Order Details.csv”)
Sales_Target<- read.csv(“C:\\Users\\USER\\Documents\\PERSONAL\\DATA ANALYTICS\\My Portfolio\\R Studio\\Sales Target.csv”)
Data Cleaning and Exploration
I cleaned and explored each dataset for analysis
Order_List Dataset
glimpse(Order_List)
## Rows: 500
## Columns: 5
## $ Order.ID <chr> “B-25601”, “B-25602”, “B-25603”, “B-25604”, “B-25605”, “B…
## $ Order.Date <chr> “1/4/2018”, “1/4/2018”, “3/4/2018”, “3/4/2018”, “5/4/2018…
## $ CustomerName <chr> “Bharat”, “Pearl”, “Jahan”, “Divsha”, “Kasheen”, “Hazel”,…
## $ State <chr> “Gujarat”, “Maharashtra”, “Madhya Pradesh”, “Rajasthan”, …
## $ City <chr> “Ahmedabad”, “Pune”, “Bhopal”, “Jaipur”, “Kolkata”, “Bang…
I converted the ‘order_Date’ column from character datatype to date datatype
Order_List$Order.Date<- dmy(Order_List$Order.Date)
glimpse(Order_List)
## Rows: 500
## Columns: 5
## $ Order.ID <chr> “B-25601”, “B-25602”, “B-25603”, “B-25604”, “B-25605”, “B…
## $ Order.Date <date> 2018-04-01, 2018-04-01, 2018-04-03, 2018-04-03, 2018-04-…
## $ CustomerName <chr> “Bharat”, “Pearl”, “Jahan”, “Divsha”, “Kasheen”, “Hazel”,…
## $ State <chr> “Gujarat”, “Maharashtra”, “Madhya Pradesh”, “Rajasthan”, …
## $ City <chr> “Ahmedabad”, “Pune”, “Bhopal”, “Jaipur”, “Kolkata”, “Bang…
Order_Details Dataset
glimpse(Order_Details)
## Rows: 1,500
## Columns: 6
## $ Order.ID <chr> “B-25601”, “B-25601”, “B-25601”, “B-25601”, “B-25602”, “B…
## $ Amount <int> 1275, 66, 8, 80, 168, 424, 2617, 561, 119, 1355, 24, 193,…
## $ Profit <int> -1148, -12, -2, -56, -111, -272, 1151, 212, -5, -60, -30,…
## $ Quantity <int> 7, 5, 3, 4, 2, 5, 4, 3, 8, 5, 1, 3, 3, 4, 6, 2, 1, 2, 9, …
## $ Category <chr> “Furniture”, “Clothing”, “Clothing”, “Electronics”, “Elec…
## $ Sub.Category <chr> “Bookcases”, “Stole”, “Hankerchief”, “Electronic Games”, …
Sales_Target Dataset
glimpse(Sales_Target)
## Rows: 36
## Columns: 3
## $ Month <chr> “Apr-18”, “May-18”, “Jun-18”, “Jul-18”, “Aug-18”, “Sep-18”, “…
## $ Category <chr> “Furniture”, “Furniture”, “Furniture”, “Furniture”, “Furnitur…
## $ Target <int> 10400, 10500, 10600, 10800, 10900, 11000, 11100, 11300, 11400…
Summary Statistics
Let’s summarize the dataset before carrying further analysis and answering the business questions
summary(Order_List)
## Order.ID Order.Date CustomerName State
## Length:500 Min. :2018-04-01 Length:500 Length:500
## Class :character 1st Qu.:2018-07-20 Class :character Class :character
## Mode :character Median :2018-11-05 Mode :character Mode :character
## Mean :2018-10-21
## 3rd Qu.:2019-01-25
## Max. :2019-03-31
## City
## Length:500
## Class :character
## Mode :character
##
##
##
summary(Order_Details)
## Order.ID Amount Profit Quantity
## Length:1500 Min. : 4.0 Min. :-1981.00 Min. : 1.000
## Class :character 1st Qu.: 45.0 1st Qu.: -9.25 1st Qu.: 2.000
## Mode :character Median : 118.0 Median : 9.00 Median : 3.000
## Mean : 287.7 Mean : 15.97 Mean : 3.743
## 3rd Qu.: 322.0 3rd Qu.: 38.00 3rd Qu.: 5.000
## Max. :5729.0 Max. : 1698.00 Max. :14.000
## Category Sub.Category
## Length:1500 Length:1500
## Class :character Class :character
## Mode :character Mode :character
##
##
##
summary(Sales_Target)
## Month Category Target
## Length:36 Length:36 Min. : 9000
## Class :character Class :character 1st Qu.:10050
## Mode :character Mode :character Median :11450
## Mean :12108
## 3rd Qu.:14500
## Max. :16000
Business Questions
- What is the total sales revenue for each month?
In answering this question, i joined both the Order_List and Order_Details table using the inner join function
Sales_Table<- inner_join(Order_List, Order_Details, by = “Order.ID”)
Afterwards, i calculated the sales of each transaction
Sales_Table<- mutate(Sales_Table, Sales = Amount * Quantity)
Additionally, I formatted the date column to a character datatype to display both the month names and year when calculating the sales for each month.
Sales_Table$Order.Date<- format(Sales_Table$Order.Date, format = “%b-%y”)
glimpse(Sales_Table)
## Rows: 1,500
## Columns: 11
## $ Order.ID <chr> “B-25601”, “B-25601”, “B-25601”, “B-25601”, “B-25602”, “B…
## $ Order.Date <chr> “Apr-18”, “Apr-18”, “Apr-18”, “Apr-18”, “Apr-18”, “Apr-18…
## $ CustomerName <chr> “Bharat”, “Bharat”, “Bharat”, “Bharat”, “Pearl”, “Pearl”,…
## $ State <chr> “Gujarat”, “Gujarat”, “Gujarat”, “Gujarat”, “Maharashtra”…
## $ City <chr> “Ahmedabad”, “Ahmedabad”, “Ahmedabad”, “Ahmedabad”, “Pune…
## $ Amount <int> 1275, 66, 8, 80, 168, 424, 2617, 561, 119, 1355, 24, 193,…
## $ Profit <int> -1148, -12, -2, -56, -111, -272, 1151, 212, -5, -60, -30,…
## $ Quantity <int> 7, 5, 3, 4, 2, 5, 4, 3, 8, 5, 1, 3, 3, 4, 6, 2, 1, 2, 9, …
## $ Category <chr> “Furniture”, “Clothing”, “Clothing”, “Electronics”, “Elec…
## $ Sub.Category <chr> “Bookcases”, “Stole”, “Hankerchief”, “Electronic Games”, …
## $ Sales <int> 8925, 330, 24, 320, 336, 2120, 10468, 1683, 952, 6775, 24…
Finally, I calculated the sales for each month
Sales_Table %>%
select(Order.Date, Sales) %>%
group_by(Order.Date) %>%
summarise(Total_Sales = sum(Sales)) %>%
arrange(desc (Total_Sales))
## # A tibble: 12 × 2
## Order.Date Total_Sales
## <chr> <int>
## 1 Jan-19 337229
## 2 Mar-19 321750
## 3 Nov-18 229785
## 4 Feb-19 177620
## 5 Oct-18 176663
## 6 Dec-18 164839
## 7 Apr-18 152827
## 8 Aug-18 151514
## 9 May-18 138767
## 10 Sep-18 132717
## 11 Jun-18 110015
## 12 Jul-18 53144
Visualizing the above
ggplot(Sales_Table, aes(x = Order.Date, y = Sales)) +
geom_bar(stat = “identity”, fill = “blue”) +
labs(title = “Monthly Performance Analysis”, X = “Date”, y = “Total Sales”, caption = “Data sourced from Kaggle”)+
theme(axis.text.x = element_text(angle = 90)) + theme(plot.title = element_text(hjust = 0.4))
- The top three performing months were January-2019, March-2019 and November-2018
- Which category has the highest sales revenue in the given period?
Sales_Table %>%
select(Category, Sales) %>%
group_by(Category) %>%
summarise(Total_Sales = sum(Sales)) %>%
filter(Total_Sales == max(Total_Sales))
## # A tibble: 1 × 2
## Category Total_Sales
## <chr> <int>
## 1 Electronics 816583
- From the above, Electronics was the top performing product category
- How does the sales target for each category vary month by month?
Sales_Target %>%
select(Category, Target, Month) %>%
group_by(Month, Category) %>%
summarise(Total_Target = sum(Target))
## `summarise()` has grouped output by ‘Month’. You can override using the
## `.groups` argument.
## # A tibble: 36 × 3
## # Groups: Month [12]
## Month Category Total_Target
## <chr> <chr> <int>
## 1 Apr-18 Clothing 12000
## 2 Apr-18 Electronics 9000
## 3 Apr-18 Furniture 10400
## 4 Aug-18 Clothing 14000
## 5 Aug-18 Electronics 9000
## 6 Aug-18 Furniture 10900
## 7 Dec-18 Clothing 16000
## 8 Dec-18 Electronics 9000
## 9 Dec-18 Furniture 11400
## 10 Feb-19 Clothing 16000
## # ℹ 26 more rows
- The monthly targets for each product category vary. For Furniture, the monthly target changes range from +100 to +500 units. Clothing targets were increased by 2000 units quarterly in 2018, but remained constant after an increase in Q3 2018. Electronics targets were steady throughout 2018 but saw a significant 77% increase from 9000 to 16,000 units per month in January 2019, remaining constant thereafter.
- Which city has the highest number of orders?
Sales_Table %>%
select(City, Quantity) %>%
group_by(City) %>%
summarise(Order_Count = sum(Quantity)) %>%
filter(Order_Count == max(Order_Count))
## # A tibble: 1 × 2
## City Order_Count
## <chr> <int>
## 1 Indore 1084
- The city with the highest number of orders was Indore
- What is the average profit margin for each category?
Firstly, i calculated the profit margin of each transaction
Sales_Table<- mutate(Sales_Table, Profit_Margin = (Profit/Sales)* 100)
I then calculated the average profit margin for each category and housed it in a new variable
P_Margin <- Sales_Table %>%
select (Category, Profit_Margin) %>%
group_by(Category) %>%
summarise(Avg_Profit_Margin = mean(Profit_Margin))
print(P_Margin)
## # A tibble: 3 × 2
## Category Avg_Profit_Margin
## <chr> <dbl>
## 1 Clothing 1.68
## 2 Electronics 0.375
## 3 Furniture -2.36
finally, i rounded off the output and Concatenated the “%” symbol
P_Margin$Avg_Profit_Margin<-round(P_Margin$Avg_Profit_Margin,2)
P_Margin$Avg_Profit_Margin<- paste0(P_Margin$Avg_Profit_Margin, “%”)
print(P_Margin)
## # A tibble: 3 × 2
## Category Avg_Profit_Margin
## <chr> <chr>
## 1 Clothing 1.68%
## 2 Electronics 0.37%
## 3 Furniture -2.36%
- From the above analysis, Clothing had the highest Avg_Profit_Margin followed by Electronics, while Furniture is currently being sold at a loss.
- Is there any correlation between the quantity of products ordered and the profit generated?
In answering the above, i created object variables for both quantity and product before calculating the correlation
Profit_obj<- Sales_Table$Profit
Quantity_obj<- Sales_Table$Quantity
I then calculated the correlation
correlation<- cor(Quantity_obj, Profit_obj)
print(correlation)
## [1] 0.00244136
Furthermore, i also analyzed the correlation using a scattered plot
ggplot(data = Sales_Table, aes(x = Quantity, y = Profit))+
geom_point() + labs(title = “A Correlation plot between Quantity and Profit”, x = “Quantity”, y = “Profit”,
caption = “Data sourced from Kaggle”)
- Examining the output and plot, it’s clear that there is a very low correlation between Quantity and Profit. This indicates that profit does not rely on the quantity sold. In other words, selling more items does not necessarily lead to higher profits. This suggests that other factors, such as pricing, cost of goods, or operational efficiency, might be influencing profitability more than the sheer volume of sales.
- What is the percentage achievement of sales targets for each category?
In resolving the above business question, i started by joining both Sales_Table and Sales_Target tables using the Full Join function and housed it in a new variable
Target_Table<- full_join(Sales_Table, Sales_Target)
## Joining with `by = join_by(Category)`
## Warning in full_join(Sales_Table, Sales_Target): Detected an unexpected many-to-many relationship between `x` and `y`.
## Row 1 of `x` matches multiple rows in `y`.
## Row 13 of `y` matches multiple rows in `x`.
## If a many-to-many relationship is expected, set `relationship =
## “many-to-many”` to silence this warning.
I went ahead to calculate the percentage achieved for each category
Target_Achieved<-Target_Table %>%
select(Category, Sales, Target) %>%
group_by(Category) %>%
summarise(Total_Sales = sum(Sales),
Total_Sales_Target = sum(Target),
Percentage_Achieved = (Total_Sales/Total_Sales_Target)*100)
print(Target_Achieved)
## # A tibble: 3 × 4
## Category Total_Sales Total_Sales_Target Percentage_Achieved
## <chr> <int> <int> <dbl>
## 1 Clothing 7974264 165126000 4.82920
## 2 Electronics 9798996 39732000 24.66273
## 3 Furniture 7989180 32294700 24.73836
Finally, i rounded off the output and Concatenated the “%” symbol
Target_Achieved$Percentage_Achieved<-round(Target_Achieved$Percentage_Achieved, 2)
Target_Achieved$Percentage_Achieved <- paste0(Target_Achieved$Percentage_Achieved,”%”)
print(Target_Achieved)
## # A tibble: 3 × 4
## Category Total_Sales Total_Sales_Target Percentage_Achieved
## <chr> <int> <int> <chr>
## 1 Clothing 7974264 165126000 4.83%
## 2 Electronics 9798996 39732000 24.66%
## 3 Furniture 7989180 32294700 24.74%
- Based on the analysis, Furniture led in target achievement with a 24.74% rate, closely matching the performance of Electronics. Despite running at a loss, the data indicates demand, and with adjustments, profitability could be achieved. Electronics ranked second with a 24.66% achievement rate, outperforming Clothing but still showing room for improvement. This relatively higher achievement suggests stronger performance or more effective sales strategies for this category. Clothing achieved only 4.83% of its sales target, indicating a significant shortfall. This suggests potential issues in sales strategies or market conditions affecting clothing sales. However, it is important to note that higher targets were often allocated to clothing, possibly due to its higher profitability compared to other categories.
- How does the sales performance vary across different states?
State_Sales<- Sales_Table %>%
select(State, Sales) %>%
group_by(State) %>%
summarise(Tol_Sales = sum(Sales)) %>%
arrange(desc (Tol_Sales))
print(State_Sales)
## # A tibble: 19 × 2
## State Tol_Sales
## <chr> <int>
## 1 “Madhya Pradesh” 569685
## 2 “Maharashtra” 467660
## 3 “Uttar Pradesh” 150032
## 4 “Gujarat” 100292
## 5 “Delhi” 97071
## 6 “Rajasthan” 94050
## 7 “Andhra Pradesh” 82897
## 8 “Punjab” 77591
## 9 “Karnataka” 66231
## 10 “West Bengal” 58035
## 11 “Nagaland” 57985
## 12 “Haryana” 54891
## 13 “Bihar” 54082
## 14 “Jammu and Kashmir” 53201
## 15 “Kerala ” 46158
## 16 “Himachal Pradesh” 39850
## 17 “Tamil Nadu” 29195
## 18 “Goa” 27919
## 19 “Sikkim” 20045
Visualizing the above
State_Sales$State<- factor(State_Sales$State, levels = State_Sales$State[order(-State_Sales$Tol_Sales)])
ggplot(State_Sales, aes(x = State, y = Tol_Sales)) +
geom_bar(stat = “identity”, fill = “blue”) +
labs(title = “State Performance Analysis”, X = “State”, y = “Sales”, caption = “Data sourced from Kaggle”)+
theme(axis.text.x = element_text(angle = 90)) + theme(plot.title = element_text(hjust = 0.4))
- Madhya Pradesh was the top performing state for the period under review
- Can we identify any monthly seasonal trends in sales for each category?
ggplot(Sales_Table, aes(x= Order.Date, y= Sales, colour = Category)) +
geom_line() +
labs(title = “Monthly Seasonal Trend in Sales”, x = “Year”, y =”Sales”, caption = “Data sourced from Kaggle”)+
scale_color_manual(values = c(“Clothing” = “red”, “Electronics” = “magenta”, “Furniture” = “blue”)) +
facet_wrap(~Category)+ theme(axis.text.x = element_text(angle = 90))
- From the above analysis, Clothing sold more in August during the summer season. Electronics recorded higher sales in January during the winter season, while Furniture saw more sales in March during the spring season.
- Are there any outliers in terms of order amount or profit that need further investigation?
ggplot(data = Sales_Table, aes(x = Amount, y = Profit))+
geom_point(fill = “blue”) + labs(title = “A Scattered plot between Amount and Profit”, x = “Amount”, y = “Profit”,
caption = “Data sourced from Kaggle
- The scattered plot between Amount and Profit revealed several data points that can be considered outliers, which may require further investigation:
- High Amount, Low Profit: There are points with high amounts (greater than 4000) but relatively low profit. These outliers indicate transactions where a large amount was involved but did not yield proportional profit, suggesting possible inefficiencies, pricing issues, or unexpected costs.
- High Loss: A few data points show significant losses (ranging from -1000 to -2000) at various amount levels. These outliers may indicate problematic transactions where the cost exceeded the revenue by a substantial margin, warranting investigation into the reasons behind these high losses.
- Isolated Points: Points that are isolated from the main cluster, especially the one with a profit higher than 1000 or a loss lower than -1000, are outliers. These could be due to exceptional circumstances, errors, or unique cases that differ significantly from the typical transaction.
To conclude, the outliers in the scatter plot, which are far from the main cluster of points, suggest areas where business processes or transactions need to be examined. These points indicate unusual transactions that might reveal issues or anomalies that should be investigated further.
Conclusion
The analysis of the Indian e-commerce dataset reveals insightful trends in sales performance across different product categories. Electronics emerged as the top-performing category with the highest total sales revenue. Furniture, while also having significant sales, shows a negative average profit margin, indicating it is being sold at a loss. Clothing, despite having the highest average profit margin at 1.68%, had the lowest percentage achievement of sales targets at 4.83%. Additionally, Indore recorded the highest number of orders among all cities. This comprehensive analysis underscores the critical need for category-specific strategies to enhance profitability and achieve sales targets effectively.