Perform the selection and store the results in the variable duplicate_rows.
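The description of this selection task is missing from the page. A minimal sketch, assuming the goal is to select rows that exactly repeat an earlier row, could use `DataFrame.duplicated`. The sample data and the columns `InvoiceNo` and `Quantity` are hypothetical stand-ins for the retail dataset.

```python
import pandas as pd

# Hypothetical sample of retail orders; column names are assumptions
df = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536365", "536366"],
    "StockCode": ["85123A", "85123A", "71053", "22633"],
    "Quantity": [6, 6, 6, 6],
    "UnitPrice": [2.55, 2.55, 3.39, 1.85],
})

# duplicated() flags each row identical to an earlier row across all columns;
# keep="first" (the default) leaves the first occurrence unflagged
duplicate_rows = df[df.duplicated(keep="first")]
print(duplicate_rows)
```

Only the second `85123A` line is selected, because it matches row 0 in every column.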
This data contains duplicate orders with the same quantity and unit price, so drop these duplicates, keeping the first occurrence (keep='first').
Perform the dropping and store the results in the variable df_without_duplicate_orders.
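One way to sketch this, assuming an "order" is identified by invoice and stock code, is `drop_duplicates` with a `subset` of the order columns plus `Quantity` and `UnitPrice`. The subset choice, the `InvoiceNo` and `Quantity` column names, and the data are assumptions.

```python
import pandas as pd

# Hypothetical orders: rows 0 and 1 are the same order repeated
df = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536365", "536366"],
    "StockCode": ["85123A", "85123A", "85123A", "85123A"],
    "Quantity": [6, 6, 8, 6],
    "UnitPrice": [2.55, 2.55, 2.55, 2.55],
})

# Assumed definition of a duplicate order: same invoice, product,
# quantity and unit price; keep="first" retains the first such row
order_cols = ["InvoiceNo", "StockCode", "Quantity", "UnitPrice"]
df_without_duplicate_orders = df.drop_duplicates(subset=order_cols, keep="first")
print(df_without_duplicate_orders)
```

Row 2 survives because its quantity differs, and row 3 survives because it belongs to another invoice.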
Each invoice should list a given stock code only once per customer, although the customer may order different quantities. Drop these duplicates, keeping the first non-NaN value.
Perform the dropping and store the results in the variable df_keep_first.
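`drop_duplicates(keep="first")` keeps the whole first row, NaN values included. Taking the first non-NaN value per column is what `GroupBy.first()` does, so one possible interpretation is the sketch below. The grouping keys `InvoiceNo` and `CustomerID` and the sample data are assumptions.

```python
import pandas as pd

# Hypothetical data: the first 85123A line is missing its Description
df = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366"],
    "CustomerID": [17850, 17850, 13047],
    "StockCode": ["85123A", "85123A", "22633"],
    "Quantity": [6, 12, 6],
    "Description": [None, "WHITE HANGING HEART T-LIGHT HOLDER", "HAND WARMER UNION JACK"],
})

# One row per (invoice, customer, stock code); first() takes, column by
# column, the first value in each group that is not NaN/None
keys = ["InvoiceNo", "CustomerID", "StockCode"]
df_keep_first = df.groupby(keys, as_index=False).first()
print(df_keep_first)
```

The merged `85123A` row gets `Quantity` 6 from the first line and its `Description` from the second. Note that `groupby` drops rows whose key (for example `CustomerID`) is NaN unless `dropna=False` is passed.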
To show the number of unique transactions per weekday and StockCode combination, first drop rows that repeat the same StockCode on the same day.
Perform the dropping and store the results in the variable df_unique_stock_day.
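A minimal sketch, assuming an `InvoiceDate` timestamp column: derive the calendar day, drop repeated (day, StockCode) pairs, then count per weekday. The `InvoiceDay` and `Weekday` helper columns are hypothetical names, and the data is illustrative.

```python
import pandas as pd

# Hypothetical transactions: 85123A is sold twice on 2010-12-01
df = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "536367"],
    "StockCode": ["85123A", "85123A", "85123A"],
    "InvoiceDate": pd.to_datetime(
        ["2010-12-01 08:26", "2010-12-01 09:02", "2010-12-02 10:00"]
    ),
})

# Hypothetical helper column: the timestamp truncated to midnight
df["InvoiceDay"] = df["InvoiceDate"].dt.normalize()

# Keep one row per StockCode per calendar day
df_unique_stock_day = df.drop_duplicates(subset=["InvoiceDay", "StockCode"], keep="first")

# Count the remaining rows per weekday and StockCode
counts = (
    df_unique_stock_day
    .assign(Weekday=df_unique_stock_day["InvoiceDay"].dt.day_name())
    .groupby(["Weekday", "StockCode"])
    .size()
)
print(counts)
```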
Imagine it is Black Friday and each customer may buy only one product per invoice. We therefore need to drop all rows belonging to invoices that contain more than one product.
Perform the dropping and store the results in the variable df_black_friday.
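Because every row of a multi-product invoice must go, `keep=False` fits here: it drops all occurrences of a duplicated `InvoiceNo` instead of keeping one. A sketch with hypothetical data, assuming one row per product line and an `InvoiceNo` column:

```python
import pandas as pd

# Hypothetical data: invoice 536365 contains two products
df = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "536367"],
    "StockCode": ["85123A", "71053", "22633", "84406B"],
    "CustomerID": [17850, 17850, 17850, 13047],
})

# keep=False removes every row whose InvoiceNo appears more than once
df_black_friday = df.drop_duplicates(subset="InvoiceNo", keep=False)
print(df_black_friday)
```

With `keep="first"` the first product of invoice 536365 would survive, which would not enforce the one-product rule.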
Suppose we want to list all unique countries in the data: drop duplicate countries, keeping the first row for each.
Perform the dropping and store the results in the variable df_unique_countries.
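A short sketch, assuming a `Country` column and hypothetical data; `subset` restricts the comparison to that single column:

```python
import pandas as pd

# Hypothetical orders from several countries
df = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "536367", "536368"],
    "Country": ["United Kingdom", "France", "United Kingdom", "Australia"],
})

# One row per country: the first order seen from each
df_unique_countries = df.drop_duplicates(subset="Country", keep="first")
print(df_unique_countries)
```

If only the names are needed, `df["Country"].unique()` returns them as an array instead of whole rows.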
Suppose we want to list all products ordered from our retail store: drop duplicate products based on StockCode, Description, and UnitPrice.
Perform the dropping and store the results in the variable df_unique_products.
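A sketch with hypothetical data, using the three columns named above as the `subset`; `Quantity` is an assumed extra column that is ignored when comparing rows:

```python
import pandas as pd

# Hypothetical order lines; 85123A appears at two different prices
df = pd.DataFrame({
    "StockCode": ["85123A", "85123A", "85123A", "71053"],
    "Description": ["WHITE HANGING HEART T-LIGHT HOLDER"] * 3 + ["WHITE METAL LANTERN"],
    "UnitPrice": [2.55, 2.55, 2.95, 3.39],
    "Quantity": [6, 12, 6, 6],
})

# Rows count as the same product only if all three columns match;
# Quantity differences are ignored
product_cols = ["StockCode", "Description", "UnitPrice"]
df_unique_products = df.drop_duplicates(subset=product_cols, keep="first")
print(df_unique_products)
```

Because `UnitPrice` is part of the key, the same stock code sold at a new price is kept as a separate product row.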
We want to know every distinct total cost paid by each customer, so drop the duplicates.
Perform the dropping and store the results in the variable df_customer_unique_payments.
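The page does not say how the total cost is computed. A minimal sketch, assuming it is `Quantity * UnitPrice` per line and stored in a hypothetical `TotalCost` column, then deduplicating on (`CustomerID`, `TotalCost`):

```python
import pandas as pd

# Hypothetical order lines
df = pd.DataFrame({
    "CustomerID": [17850, 17850, 17850, 13047],
    "Quantity": [6, 3, 6, 6],
    "UnitPrice": [2.50, 5.00, 1.00, 2.50],
})

# Hypothetical TotalCost column; rounding to cents avoids float noise
# (e.g. 6 * 2.55) making equal amounts compare as different
df["TotalCost"] = (df["Quantity"] * df["UnitPrice"]).round(2)

# Keep each (customer, total cost) pair once
df_customer_unique_payments = df.drop_duplicates(subset=["CustomerID", "TotalCost"], keep="first")
print(df_customer_unique_payments)
```

Customer 17850 paid 15.0 twice, so the second line is dropped. Customer 13047's 15.0 is kept because the pair includes the customer.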
Perform the dropping and store the results in the variable df_unique.