Merged Markets: Exploring Beverage Sales Through Combined Data
Merged Markets: Exploring Beverage Sales Through Combined Data Data Science Project
Data Wrangling with Pandas

Merged Markets: Exploring Beverage Sales Through Combined Data

This project involves analyzing retail and warehouse sales data of various items from multiple suppliers over four years (2017-2020). You will identify trends and patterns in sales performance , supplier contribution, and inventory management to ultimately improve sales strategies, supplier relationships, and stock allocation.
Start this project
Merged Markets: Exploring Beverage Sales Through Combined DataMerged Markets: Exploring Beverage Sales Through Combined Data
Project Created by

Adeyinka Odiaka

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

input

How many Item types are there in all of the datasets?

Start by merging the datasets and count the number of unique ITEM TYPE there are in the DataFrame.

input

How much did the store make in Retail sale in 2018 and 2019

Merge the datasets containing sales information for these two years. Once combined, filter the merged DataFrame to isolate retail sales data for each year individually. Sum up these filtered values, calculate the total retail sales for each year and sum them up. Your final answer should be a float, like 7.65.

codevalidated

Identify the Top selling Items(In Retail) across all years

Merge the data from all years. Then, calculate the total RETAIL SALES for each unique item across all years. Finally, use the .sort_values() function in descending order to identify the top-selling items for the entire period (2017-2020)in hierarchical. Store your final output in the variable top_sellers_all_years.

input

Which item type had the most retail sale throughout the years?

From the output of the previous activity, determine the ITEM TYPE with the highest value.

codevalidated

Calculate the percentage change in sales over the years.

Calculate the percentage change in sales for each item year-over-year (YoY). Start by adding a new column to the combined DataFrame:TOTAL SALES as the sum of RETAIL SALES and WAREHOUSE SALES. Group the data by YEAR to compute annual total sales, then calculate the year-over-year percentage change in these sales. Finally, store your final result in the variable: annual_sales.

Calculating the year-over-year percentage change in sales is important because it provides valuable insights into the company's growth and performance over time. It helps identify trends, such as consistent growth or decline, and can reveal the impact of market conditions, strategic decisions, or external factors on sales.

codevalidated

Calculate the rank of each `ITEM TYPE` based on `TOTAL SALES` for each individual year.

In this activity, the goal is to determine and rank the top-selling items based on their total sales for each individual year from 2017 to 2020. Group merged data by YEAR and ITEM TYPE and reset the index. Then sort the aggregated data in descending order based on total sales for each year. Your final result should be stored in the variable: yearly_item_sales_sorted.

codevalidated

Identify the top 10 suppliers in terms of sales(2018-2020).

Concatenate selected datasets to ensure the analysis focuses solely on years 2018, 2019, and 2020. Next, group the data by SUPPLIER and TOTAL SALES(Make sure to create this column anew and reset the index ) and calculate the total sales contributed by each supplier across these three years. Finally sort the results to identify the top ten performing suppliers in terms of sales contribution. Store your result in the variable: supplier_sales_contribution_sorted.

codevalidated

Determine the top suppliers(in terms of `RETAIL SALES`) for alcoholic beverages in 2017 and 2020.

Begin by merging the datasets in question, then filter the merged dataset to include WINE, BEER and LIQUOR. Next, aggregate the filtered data by YEAR and SUPPLIER, calculating the total sales for wine attributed to each supplier annually and reset the index. Using these aggregated results, identify the supplier with the highest total wine sales for each year within the dataset. Store your final output in the variable: top_supplier_wine.

codevalidated

Determine the number of unique items supplied by each supplier.

Combine the datasets for all years and group the combined dataset by the SUPPLIER column and count the unique ITEM TYPE entries for each supplier, then calculate the number of unique items each supplier provides. The resulting DataFrame(using .reset_index()) will contain two columns: SUPPLIER and UNIQUE_ITEM_COUNT. Store your final output in the variable: unique_items_per_supplier.

codevalidated

Which suppliers provide the most Items?

Using the output from the previous, sort unique_items_per_supplier in descending order based on the count of unique item codes to identify the top suppliers with the most diverse product portfolios. Finally, display the top 10 result in the variable top_10_suppliers_by_items

codevalidated

Compute Average Monthly Sales to Identify Best Months Yearly

Begin by applying a dictionary that maps numeric month values to month names to a combined dataset of all years, creating a new MONTH_NAME column. Group this DataFrame by YEAR and MONTH_NAME and calculate the average sales per month. Reset the index and then identify the months with the highest average sales yearly. Store your output as best_months.

codevalidated

Identify which months had the highest sales of beer before 2020

Start by merging the datasets for years before 2020. Filter the combined DataFrame to include only entries where the ITEM TYPE is BEER. Next, compute a TOTAL SALES column and group the data by YEAR and MONTH(resetting the index), summing the TOTAL SALES for each month. Store your final result in beer_sales_by_month.

codevalidated

Identify which Sales Channel has the most sales for each year

Calculate and compare the total sales revenue generated through warehouse channels versus retail channels for each year from 2017 to 2020.

Aggregate the merged DataFrame by YEAR, summing up the total WAREHOUSE SALES and RETAIL SALES for each year. The resulting output: sales_comparison DataFrame is then filtered for retail and warehouse sales individually. Merge the result of retail and warehouse sales, and store in the variable: merged_sales.

codevalidated

Compute Average Sales per Transaction from 2017 to 2020

We aim to find out the average value of each transaction for every sales channel across all years from 2017 to 2020. To do this, we need the total sales amount per transaction, aggregated by year and sales channel. Once we have this, we can determine whether transaction sizes vary depending on whether the sales were warehouse or retail.

First, use the merged DataFrame and aggregate it by both YEAR and ITEM TYPE. Calculate the total sales using .sum() function and count the number of transactions using .count() function for each item type in each year.

With the total sales and transaction count in hand, compute the average sales per transaction by dividing the total sales by the number of transactions. Append this calculation as a new column AVG_SALES_PER_TRANSACTION to your DataFrame.

Finally, store your output to the variable: :average_sales_comparison.

codevalidated

Calculate the Percentage Contribution of Warehouse and Retail Sales to Total Sales Volume Before 2020

Merge selected datasets, to include the years in question, then group the merged DataFrame by YEARand aggregate the sum of WAREHOUSE SALES, RETAIL SALES, and TOTAL SALES. Next, calculate the percentage contribution of warehouse and retail sales to the total sales for each year. This is achieved by dividing the sum of WAREHOUSE SALES and RETAIL SALES by TOTAL SALES and multiplying by 100. Store your final output in the variable: sales_contribution.

codevalidated

Identifying Dominant Sales Channels for Suppliers

Do certain suppliers dominate one channel over the other, and what factors might influence these preferences?

First, create a new DataFrame by grouping the merged DataFrame for all years by SUPPLIER and ITEM TYPE and aggregating the sum of RETAIL SALES and WAREHOUSE SALES. Then calculate the total sales for each supplier across both channels by adding RETAIL SALES and WAREHOUSE SALES. Next, compute the percentage of sales contributed by each channel relative to the total sales for each supplier. This is done by dividing RETAIL SALES and WAREHOUSE SALES by TOTAL SALES and multiplying by 100. Finally, identify suppliers that dominate each channel based on higher percentage sales. Suppliers with a higher percentage of retail sales compared to warehouse sales are categorized as retail-dominant, while those with higher warehouse sales percentages are classified as warehouse-dominant. Combine both results and store in the variable: dominant_suppliers.

codevalidated

Identify which item types are predominantly sold through each channel.

Group the combined data for all years by Year and Item Type, aggregating the sum of Retail Sales and Warehouse Sales. At the same time, calculate the total sales per year across both channels; for this, reset the index for both computations. Merge the aggregated item sales data with the total sales data to compute the percentage of total sales contributed by each item type in each channel. Achieve this by dividing the Retail Sales Item and Warehouse Sales Item by the Retail Sales Total and Warehouse Sales Total, respectively, and then multiplying by 100. The resulting DataFrame: item_sales is your result.

Merged Markets: Exploring Beverage Sales Through Combined DataMerged Markets: Exploring Beverage Sales Through Combined Data
Project Created by

Adeyinka Odiaka

This project is part of

Data Wrangling with Pandas

Explore other projects