All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
To explore the relationship between land size and total sales, we want to categorize the land area using the Total Land Area (acres)
column into meaningful categories: Small
, Medium
, Large
, and Extra Large
. We will use the following ranges to define each category: 0
, 50
, 100
, 200
, and inf
.
Create a new column named LandAreaCategory
and store the result in this column. The categories should be as follows:
Small
: land areas between 0 and 50 acres.Medium
: land areas between 50 and 100 acres.Large
: land areas between 100 and 200 acres.Extra Large
: land areas above 200 acres.Your result should look similar to the series shown in the image below:
In order to analyze the distribution of sales across different price levels, we need to categorize the prices of our products. We will use the Price per Unit
column to do this.
Below are the price categories and their corresponding price ranges:
Low
: prices between 0 and 10Medium
: prices between 10 and 20High
: prices between 20 and 30Premium
: prices above 30Create a new column called PriceCategory
to store these categories. Your result should look similar to the series shown in the image below:
Please make sure to create the categories and store them in the PriceCategory
column as mentioned.
To examine the impact of different dairy brands on sales performance, we will create dummy variables for each brand using the Brand
column. The dummy variables should have the prefix Brand
and use the default separator.
Store the resulting dummy variables in the variable brand_dummies
.
Please note that the resulting dataframe should look similar to the example shown below:
We want to analyze the impact of product shelf life on customer demand by discretizing the Shelf Life (days)
column into different categories. To do this, create a new column called ShelfLifeCategory
to store the results. The categories should be as follows:
Short
: shelf life between 0 and 7 daysMedium
: shelf life between 7 and 14 daysLong
: shelf life between 14 and 30 daysVery Long
: shelf life above 30 daysNote: Make sure that the first interval is included in your analysis. The resulting categories should be similar to the series shown below:
Create dummy variables for the Storage Condition
column, which contains different conditions such as refrigerated and ambient. This will allow us to examine the effects of storage conditions on sales. Use the prefix "Storage" and the default separator when creating the dummy variables. Save the resulting dataframe in a variable named storage_dummies
.
In the resulting dataframe, the storage conditions should be represented as separate columns with binary values indicating whether a particular storage condition is present or not, similar to the example shown below:
Create binary indicators for different sales channels, such as online and offline, to identify the most effective channels for dairy sales. Use the prefix Channel:
as a separator for each indicator. Save the results in a variable called channel_dummies
.
Make sure your output dataframe looks similar to the example provided above.
In order to analyze the seasonal sales trends, we need to categorize the production dates into specific time periods (months) and create a new column called ProductionMonth
.
The production dates will be divided into different time periods based on month ranges throughout the dataset, leaving the first interval inclusive. Each time period will be labeled with a number from 1 to the length of the month bins.
Once the categorization is completed, the results should be stored in the ProductionMonth
column. The final outcome should resemble the series shown in the provided image.
To assess the adequacy of stock levels and determine optimal replenishment strategies, we need to analyze the Quantity in Stock (liters/kg)
column. We will categorize the stock levels into three meaningful categories:
Low
- for stock levels below the first quantileMedium
- for stock levels between the first and second quantilesHigh
- for stock levels above the second quantileCreate a new column named StockCategory
to store the results. Additionally, store the bins in a variable stock_bins
.
Note that the expected output should resemble the provided image.
Sort the revenue, from the 'Approx. Total Revenue(INR)' column, into four distinctive categories: Low
, Medium
, High
, and Premium
.
The classification guidelines are as follows:
Low
for the first quarter of revenue (first 25%)Medium
for the second quarter (25% to 50%)High
for the third quarter (50% to 75%)Premium
for the last quarter (75% to 100%)Generate these categories and include them in a newly created column named RevenueCategory
.
Below is an illustration of what the results might look like:
To optimize reorder amounts for various dairy products, characterize the 'Reorder Quantity (liters/kg)' column into four categories: Low, Medium, High, and Very High.
The outcome should be saved in a new column known as 'ReorderCategory'. Detailed information on how to categorize would look like this:
Please see the preview below:
Store the bar chart in the variable stock_category_plot
.
Notes:
Your bar chart should:
Have a Distribution of Stock Categories
title
Stock Category
as the X label valueCount
as the Y label valueThe desired output chart should look like this:
Store the line chart in the variable monthly_production_plot
.
Notes:
Your line chart should:
Have a Monthly Production Trends
title
Production Month
as the X label valueTotal Quantity
as the Y label valueo
markerThe desired output chart should look like this:
Store the resulting series in the variable median_quantity_sold
.
The resulting series should look like this:
Store the resulting series in the variable total_revenue
.
The resulting series should look like this: