All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
In this activity, you are asked to merge the products_df
and sales_df
dataframes based on their product_ID
columns. After the merge operation, group the resulting dataframe by product_type
. Calculate the total sales for each product category and store the result in a new column called total_price
. Note that some categories may not have sold any products. For these categories, fill the total sales value with 0. Store the resulting dataframe in a variable named total_sales_by_category
.
The resulting dataframe should look like this:
Combine the DataFrames, customers_df
and orders_df
, on the customer_id
column. Group the updated DataFrame according to customer_id
. For each customer, you need to determine the number of total orders and find the average order value. Lastly, rename the DataFrame columns in customer_analysis
to total_orders
and average_payment
.
The resulting dataframe should look like this:
Combine the products_df
and sales_df
dataframes using a left
join on the product_ID
column. This ensures that products without any sales are included. Then, group the combined dataframe by product_ID
and compute the total quantity sold for each product using the sum()
function. After that, merge the products_df
with the total_quantity_sold
dataframe, again using a left
join on the product_ID
column. Replace any NaN values in the quantity_sales
column with 0. Lastly, compute the remaining quantity for each product by subtracting the quantity_sales
from the quantity
.
Save the resulting dataframe in the product_inventory
variable.
The resulting dataframe should look like this:
Combine the customers_df
, orders_df
, and sales_df
data frames using a left
join based on the customer_id
and order_id
columns. Then group the combined data frame by city
. Compute the total sales and average order value for each city using the sum()
and mean()
functions. Substitute NaN
values in the payment column with 0
. Rename the total_price
and payment
columns to total_price_in_sales
and average_order_payment
.
The final data frame, which includes total_price_in_sales
and average_order_payment
, should be stored in the geographic_analysis
variable.
The resulting dataframe should look like this:
Combine the customers_df
and orders_df
dataframes based on the customer_id
column. Identify duplicate rows in the combined dataframe using the duplicated()
function, specifically where the customer_id
column is replicated. Eliminate any duplicate rows with the drop_duplicates()
function. Store the resulting dataframe of customer_id
and customer_name
in the returning_customers
variable. This dataframe will provide a listing of all returning customers and their names.
The resulting dataframe should look like this:
Combine the orders_df
, sales_df
, and customers_df
dataframes using the order_id
and customer_id
columns, respectively. Organize the combined dataframe according to gender. Work out the average order value and quantity for each gender. Change the names of the total_price
and order_id
columns to average_total_price
and number_of_orders
, respectively.
Save the final dataframe, which should include gender
, average_total_price
, and number_of_orders
, in a variable titled avg_order_value_by_gender
.
The resulting dataframe should look like this:
Store the resulting series in the variable sales_by_age_group
.
Notes:
customers_age_group_df
contains a customer_id
and age_group
columns, orders_df
contains order_id
and customer_id
, and sales_df
contains order_id
and total_price
.
Age groups are '<25', '25-40', '40-60', '60+'
and there is a code to run in activity 7 in the notebook to add the age_group
column to customers_age_group_df
dataframe.
The resulting series should look like this:
Merge the products_df
and sales_df
dataframes using the product_ID
and product_id
columns. Next, group the merged dataframe by color and use the sum()
function to add up the quantity_sales
column - this will give the total quantity sold for each color. Then, you should reset the index of the new dataframe and sort it by the quantity_sales
column. Finally, save the final dataframe in the most_sold_colors
variable.
The resulting dataframe should look like this:
Merge the products_df
, sales_df
, and orders_df
data frames using the product_ID
, product_id
, and order_id
columns, respectively. Be sure to use suffixes while merging. Then, group the merged data frame by city
and product_name
, and aggregate the quantity_sales
column to find the total quantity sold for each product in every city. Next, sort this new data frame according to the city
and total_quantity_sold
columns. Now, group the sorted data frame by the city
. Finally, save the resultant data frame in a variable named top_selling_products_by_city
.
The resulting dataframe should look like this:
Combine the customers_df
and orders_df
dataframes using the customer_id
column. Next, group the unified dataframe to compute the aggregate amount each customer paid. Make sure to reset the index of the resulting dataframe. Then, arrange the resultant dataframe in descending order by the payment
column.
Next, modify the payment
column name to paid_amount
.
Save the final dataframes, which include the customer_name
, gender
, and paid_amount
columns, in the top_male
and top_female
variables.
The resulting dataframes should look like this:
Combine the orders_df
and sales_df
dataframes to obtain details related to orders and products. Work out the delivery time for every order. Group the combined dataframe by order_id
and consolidate order_id
and delivery_time
columns to identify the number of products per order and the corresponding delivery time. Order the final dataframe by delivery_time
. Save the output dataframe in the fastest_orders
variable.
The resulting dataframe should look like this:
Combine the orders_df
and customers_df
dataframes to obtain both order and customer information. Substitute any null values in the order_date
and delivery_date
fields with today's date. Compute the delivery duration for each order. Subsequently, group the combined dataframe by 'state' and find the average delivery duration for each state. Rename the columns appropriately. Assign the final dataframe to the variable avg_delivery_time_by_state
.
Note: for customers who have no orders, put a default delivery time of 0 day (same day).
The resulting dataframe should look like this:
Merge the products_price_ranges_df
and sales_df
dataframes to combine product details with sales information. Group the merged dataframe by price_range
and calculate the total sales for each category. Rename the total_price
column to total_sales
for clarity. Finally, sort the resultant dataframe. Save this dataframe in the price_range_analysis
variable.
products_price_ranges_df
dataframe. The resulting dataframe should look like this: