Tidying Up: Removing Duplicates in Video Game Sales Data
Data Cleaning with Pandas

This project walks you through a dataset of video game sales, with activities designed to hone your skills in finding, flagging, and removing duplicate data. Get ready to dive deep and master these essential data cleaning skills!
Project Created by

Vidhi Shah

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

input

Count the total number of duplicate values.

Let's start off by identifying how many duplicate values we need to conquer.

Find out and enter the total number of duplicate values in the dataframe.
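
One way to approach this count is sketched below. It assumes "duplicate values" means fully duplicated rows, and it uses a small made-up DataFrame rather than the project dataset, so the number it prints is not the answer to the activity.

```python
import pandas as pd

# Toy data (hypothetical, not the project dataset).
df = pd.DataFrame({
    "Address": ["1 Elm St", "2 Oak Ave", "1 Elm St", "3 Pine Rd"],
    "Avatar": ["Violet", "Green", "Violet", "Blue"],
})

# duplicated() flags every row that repeats an earlier row;
# summing the boolean flags counts them.
total_duplicates = int(df.duplicated().sum())
print(total_duplicates)  # 1
```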

multiplechoice

Counting Unique and Duplicate Addresses

Find the number of unique and duplicate values in the Address column.

Choose the correct option from below.

Note: The order is (Unique Address Count, Duplicate Address Count).
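
A minimal sketch of the two counts on made-up addresses follows. It assumes "unique" means the number of distinct values and "duplicate" means repeats after the first occurrence; the activity may define these differently.

```python
import pandas as pd

# Toy data (hypothetical, not the project dataset).
addresses = pd.Series(
    ["1 Elm St", "2 Oak Ave", "1 Elm St", "3 Pine Rd", "2 Oak Ave", "1 Elm St"],
    name="Address",
)

unique_count = addresses.nunique()                    # distinct values
duplicate_count = int(addresses.duplicated().sum())   # repeats after the first

print((unique_count, duplicate_count))  # (3, 3)
```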

codevalidated

Identify Duplicates by Avatar Name

Your task is to identify and extract the distinct duplicate values from the Avatar column.

Step 1: Identify all the duplicate values in the Avatar column and store them in a variable named duplicate_avatars.

Step 2: From the duplicate_avatars variable, extract the unique values of these duplicates and store them in a new variable named duplicate_avatar_names.

Your result should match the following output:

img3
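
The two steps above can be sketched as follows on toy data. The variable names come from the activity; the sample values are invented, and using the default `keep="first"` in `duplicated()` is an assumption about what the expected output contains.

```python
import pandas as pd

# Toy data (hypothetical, not the project dataset).
df = pd.DataFrame({"Avatar": ["Violet", "Green", "Violet", "Blue", "Green", "Violet"]})

# Step 1: keep only the entries that repeat an earlier Avatar value.
duplicate_avatars = df.loc[df["Avatar"].duplicated(), "Avatar"]

# Step 2: reduce those repeats to their distinct names.
duplicate_avatar_names = duplicate_avatars.unique()

print(list(duplicate_avatar_names))  # ['Violet', 'Green']
```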

multiplechoice

Which of the following code snippets will remove all duplicate rows from a DataFrame `df`, keeping only the first occurrence of each row?
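
For reference, here is a small sketch of how `keep="first"` behaves in `drop_duplicates()`. The DataFrame and its `Spent` column are made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical): rows 0 and 1 are identical.
df = pd.DataFrame({"Avatar": ["Violet", "Violet", "Blue"], "Spent": [10, 10, 20]})

# keep="first" is the default, so this matches df.drop_duplicates().
first_kept = df.drop_duplicates(keep="first")

print(list(first_kept.index))  # [0, 2]
```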

codevalidated

Identifying Duplicates Across Multiple Columns

Identify the duplicate rows in which both the Length of Membership and Yearly Amount Spent columns match.

Your result should match the following output:

img5
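
A possible approach, sketched on invented numbers: pass both column names as `subset` so that a row counts as a duplicate only when the pair of values repeats. Using `keep=False` to show every matching row is an assumption; the expected output may list only the later occurrences.

```python
import pandas as pd

# Toy data (hypothetical, not the project dataset).
df = pd.DataFrame({
    "Length of Membership": [3.5, 4.1, 3.5, 3.5],
    "Yearly Amount Spent": [500.0, 620.0, 500.0, 480.0],
})

cols = ["Length of Membership", "Yearly Amount Spent"]

# Rows 0 and 2 share both values; row 3 matches on only one column.
multi_col_duplicates = df[df.duplicated(subset=cols, keep=False)]

print(list(multi_col_duplicates.index))  # [0, 2]
```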

codevalidated

Flag Rows with Duplicate Values

Create a new column named Is_Duplicate_Address to flag rows that have duplicate values in the Address column.

Your result should match the following output:

img6
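
A flag column of this kind can be sketched as below. The data is made up, and `keep=False` (flag every row whose address appears more than once) is an assumption; with the default `keep="first"`, the first occurrence would stay `False`.

```python
import pandas as pd

# Toy data (hypothetical, not the project dataset).
df = pd.DataFrame({"Address": ["1 Elm St", "2 Oak Ave", "1 Elm St"]})

# True for every row whose Address value occurs more than once.
df["Is_Duplicate_Address"] = df["Address"].duplicated(keep=False)

print(df["Is_Duplicate_Address"].tolist())  # [True, False, True]
```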

codevalidated

Identifying and Dropping Duplicates by Address

Drop all the duplicate values present in the Address column and store the result in a new variable called df_unique_addresses.

Your result should match the following output:

img7
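
A minimal sketch on toy data, assuming the first row for each address should be kept. If the activity intends every repeated address to be removed entirely, `keep=False` would be needed instead.

```python
import pandas as pd

# Toy data (hypothetical, not the project dataset).
df = pd.DataFrame({
    "Address": ["1 Elm St", "2 Oak Ave", "1 Elm St"],
    "Avatar": ["Violet", "Green", "Blue"],
})

# Only the Address column decides what counts as a duplicate.
df_unique_addresses = df.drop_duplicates(subset="Address")

print(list(df_unique_addresses.index))  # [0, 1]
```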

multiplechoice

When using the `drop_duplicates()` method, what happens if you set the `keep` parameter to `False`?
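
To see the effect of `keep=False` in practice, consider this small sketch. The DataFrame and its `Spent` column are invented for illustration.

```python
import pandas as pd

# Toy data (hypothetical): rows 0 and 1 are identical.
df = pd.DataFrame({"Avatar": ["Violet", "Violet", "Blue"], "Spent": [10, 10, 20]})

# keep=False drops every member of a duplicate group, not just the repeats.
none_kept = df.drop_duplicates(keep=False)

print(list(none_kept.index))  # [2]
```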

codevalidated

Dropping Duplicates by Length of Membership

Drop all the duplicate values other than the first occurrence from the Length of Membership column and store the result in a new variable called df_cleaned_first.

Your result should match the following output:

img9
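
The same idea with an explicit `keep="first"` might look like this. The values are made up, and the original index labels are left untouched, which may or may not match the expected output.

```python
import pandas as pd

# Toy data (hypothetical, not the project dataset).
df = pd.DataFrame({
    "Length of Membership": [3.5, 4.1, 3.5, 2.0],
    "Avatar": ["Violet", "Green", "Blue", "Teal"],
})

# Keep the first row for each Length of Membership value, drop later repeats.
df_cleaned_first = df.drop_duplicates(subset=["Length of Membership"], keep="first")

print(list(df_cleaned_first.index))  # [0, 1, 3]
```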

multiplechoice

When using `drop_duplicates()`, how can you ensure that the operation only considers specific columns for detecting duplicates?

multiplechoice

Why is it important to specify the `subset` parameter when using the `drop_duplicates()` function in pandas?
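
As a rough illustration of what `subset` changes, the sketch below compares the two calls on invented data. Without `subset`, rows must match in every column to count as duplicates.

```python
import pandas as pd

# Toy data (hypothetical): same Address twice, but different Avatar values.
df = pd.DataFrame({
    "Address": ["1 Elm St", "1 Elm St", "2 Oak Ave"],
    "Avatar": ["Violet", "Green", "Blue"],
})

all_columns = df.drop_duplicates()                    # compares whole rows
address_only = df.drop_duplicates(subset=["Address"])  # compares Address only

print(len(all_columns), len(address_only))  # 3 2
```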

multiplechoice

No Duplicates!

Is there any column with 0 duplicate values? Let's find out!

Your task is to find the list of columns that have 0 duplicate values.

Choose the correct option:
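
One way to check each column is sketched below on made-up data. It treats a column as duplicate-free when no value in it repeats.

```python
import pandas as pd

# Toy data (hypothetical, not the project dataset).
df = pd.DataFrame({
    "Address": ["1 Elm St", "2 Oak Ave", "3 Pine Rd"],
    "Avatar": ["Violet", "Green", "Violet"],
})

# Collect the columns in which duplicated() never returns True.
no_duplicate_columns = [col for col in df.columns if not df[col].duplicated().any()]

print(no_duplicate_columns)  # ['Address']
```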

input

Max Duplicates!

Identify and enter the name of the column that has the most duplicates!
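
A possible sketch, again on invented data: count the repeats per column and take the label of the largest count. It assumes duplicates are counted per column with the default `keep="first"`.

```python
import pandas as pd

# Toy data (hypothetical, not the project dataset).
df = pd.DataFrame({
    "Address": ["1 Elm St", "1 Elm St", "2 Oak Ave", "3 Pine Rd"],
    "Avatar": ["Violet", "Violet", "Violet", "Blue"],
})

# Number of repeated values in each column, indexed by column name.
duplicate_counts = df.apply(lambda col: col.duplicated().sum())

# idxmax() returns the column label with the highest count.
max_duplicate_column = duplicate_counts.idxmax()

print(max_duplicate_column)  # Avatar
```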

Project Created by

Vidhi Shah

As a Project Author at DataWars, I dive into the world of data science and AI/ML with a millennial flair, constantly intrigued by the inner workings of technology. While I'm not crunching numbers, you'll find me cheering for my favorite cricket team.

This project is part of

Data Cleaning with Pandas