Diving Deep into Data Relationships: Pandas Merging with Shopping Cart Data
Diving Deep into Data Relationships: Pandas Merging with Shopping Cart Data Data Science Project
Data Wrangling with Pandas

Diving Deep into Data Relationships: Pandas Merging with Shopping Cart Data

The goal in this lab is to enhance your skills more in merging and joining datasets by providing the opportunity to explore and analyze the intricate relationships within the shopping cart data about customer details, product information, and order history provided in Shopping Cart Database.
Start this project
Diving Deep into Data Relationships: Pandas Merging with Shopping Cart DataDiving Deep into Data Relationships: Pandas Merging with Shopping Cart Data
Project Created by

Anurag Verma

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

codevalidated

Calculate the Total Sales for Each Product Category

In this activity, you are asked to merge the products_df and sales_df dataframes based on their product_ID columns. After the merge operation, group the resulting dataframe by product_type. Calculate the total sales for each product category and store the result in a new column called total_price. Note that some categories may not have sold any products. For these categories, fill the total sales value with 0. Store the resulting dataframe in a variable named total_sales_by_category.

The resulting dataframe should look like this:

solution_1

codevalidated

Find the total number of orders and the average order value for each customer.

Combine the DataFrames, customers_df and orders_df, on the customer_id column. Group the updated DataFrame according to customer_id. For each customer, you need to determine the number of total orders and find the average order value. Lastly, rename the DataFrame columns in customer_analysis to total_orders and average_payment.

The resulting dataframe should look like this:

solution_2

codevalidated

Determine the total quantity sold and remaining quantity for each product.

Combine the products_df and sales_df dataframes using a left join on the product_ID column. This ensures that products without any sales are included. Then, group the combined dataframe by product_ID and compute the total quantity sold for each product using the sum() function. After that, merge the products_df with the total_quantity_sold dataframe, again using a left join on the product_ID column. Replace any NaN values in the quantity_sales column with 0. Lastly, compute the remaining quantity for each product by subtracting the quantity_sales from the quantity.

Save the resulting dataframe in the product_inventory variable.

The resulting dataframe should look like this:

solution_3

codevalidated

Determine the total sales and average order value for each city.

Combine the customers_df, orders_df, and sales_df data frames using a left join based on the customer_id and order_id columns. Then group the combined data frame by city. Compute the total sales and average order value for each city using the sum() and mean() functions. Substitute NaN values in the payment column with 0. Rename the total_price and payment columns to total_price_in_sales and average_order_payment.

The final data frame, which includes total_price_in_sales and average_order_payment, should be stored in the geographic_analysis variable.

The resulting dataframe should look like this:

solution_4

codevalidated

Identify returning customers (customers who have made more than one order)

Combine the customers_df and orders_df dataframes based on the customer_id column. Identify duplicate rows in the combined dataframe using the duplicated() function, specifically where the customer_id column is replicated. Eliminate any duplicate rows with the drop_duplicates() function. Store the resulting dataframe of customer_id and customer_name in the returning_customers variable. This dataframe will provide a listing of all returning customers and their names.

The resulting dataframe should look like this:

solution_5

codevalidated

Determine the average order value by gender.

Combine the orders_df, sales_df, and customers_df dataframes using the order_id and customer_id columns, respectively. Organize the combined dataframe according to gender. Work out the average order value and quantity for each gender. Change the names of the total_price and order_id columns to average_total_price and number_of_orders, respectively.

Save the final dataframe, which should include gender, average_total_price, and number_of_orders, in a variable titled avg_order_value_by_gender.

The resulting dataframe should look like this:

solution_6

codevalidated

Determine the total orders price per age group.

Store the resulting series in the variable sales_by_age_group.

Notes:

  1. customers_age_group_df contains a customer_id and age_group columns, orders_df contains order_id and customer_id, and sales_df contains order_id and total_price.

  2. Age groups are '<25', '25-40', '40-60', '60+' and there is a code to run in activity 7 in the notebook to add the age_group column to customers_age_group_df dataframe.

The resulting series should look like this:

solution_7

codevalidated

Determine the most sold colors.

Merge the products_df and sales_df dataframes using the product_ID and product_id columns. Next, group the merged dataframe by color and use the sum() function to add up the quantity_sales column - this will give the total quantity sold for each color. Then, you should reset the index of the new dataframe and sort it by the quantity_sales column. Finally, save the final dataframe in the most_sold_colors variable.

The resulting dataframe should look like this:

solution_8

codevalidated

Determine the top selling products in each city.

Merge the products_df, sales_df, and orders_df data frames using the product_ID, product_id, and order_id columns, respectively. Be sure to use suffixes while merging. Then, group the merged data frame by city and product_name, and aggregate the quantity_sales column to find the total quantity sold for each product in every city. Next, sort this new data frame according to the city and total_quantity_sold columns. Now, group the sorted data frame by the city. Finally, save the resultant data frame in a variable named top_selling_products_by_city.

The resulting dataframe should look like this:

solution_9

codevalidated

Determine the name of the male and female customers who paid the most.

Combine the customers_df and orders_df dataframes using the customer_id column. Next, group the unified dataframe to compute the aggregate amount each customer paid. Make sure to reset the index of the resulting dataframe. Then, arrange the resultant dataframe in descending order by the payment column.

Next, modify the payment column name to paid_amount.

Save the final dataframes, which include the customer_name, gender, and paid_amount columns, in the top_male and top_female variables.

The resulting dataframes should look like this:

solution_10

codevalidated

Determine the number of products in the 5 orders with the shortest delivery time

Combine the orders_df and sales_df dataframes to obtain details related to orders and products. Work out the delivery time for every order. Group the combined dataframe by order_id and consolidate order_id and delivery_time columns to identify the number of products per order and the corresponding delivery time. Order the final dataframe by delivery_time. Save the output dataframe in the fastest_orders variable.

The resulting dataframe should look like this:

solution_11

codevalidated

Determine the average delivery time by state.

Combine the orders_df and customers_df dataframes to obtain both order and customer information. Substitute any null values in the order_date and delivery_date fields with today's date. Compute the delivery duration for each order. Subsequently, group the combined dataframe by 'state' and find the average delivery duration for each state. Rename the columns appropriately. Assign the final dataframe to the variable avg_delivery_time_by_state.

Note: for customers who have no orders, put a default delivery time of 0 day (same day).

The resulting dataframe should look like this:

solution_12

codevalidated

Determine the total sales per price range.

Merge the products_price_ranges_df and sales_df dataframes to combine product details with sales information. Group the merged dataframe by price_range and calculate the total sales for each category. Rename the total_price column to total_sales for clarity. Finally, sort the resultant dataframe. Save this dataframe in the price_range_analysis variable.

  • Note: price range is determined by the code provided in activity 13 section in the notebook, you just have to run it and then use products_price_ranges_df dataframe.

The resulting dataframe should look like this:

solution_13

Diving Deep into Data Relationships: Pandas Merging with Shopping Cart DataDiving Deep into Data Relationships: Pandas Merging with Shopping Cart Data
Project Created by

Anurag Verma

What's up, friends! 👋 I'm a computer science student about to finish my last year of college. 🎓 I LOVE writing code! ❤️ It makes me so happy! 😄 Whether I'm goofing in notebooks 📓 or coding in Python 🐍, writing programs is a blast! 💥

What's up, friends! 👋 I'm a computer science student about to finish my last year of college. 🎓 I LOVE writing code! ❤️ It makes me so happy! 😄 Whether I'm goofing in notebooks 📓 or coding in Python 🐍, writing programs is a blast! 💥

This project is part of

Data Wrangling with Pandas

Explore other projects