Handling Missing and Null Values in YouTube Channel Data

This project takes you on an exploratory journey through a diverse dataset of YouTubers. It features numerous activities designed to hone your skills in data cleaning, particularly focusing on handling missing and null values. Get ready to dive deep and master these essential data manipulation techniques!
Project Created by

Vidhi Shah

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

multiplechoice

Find the total number of null values in the dataset

To initiate the data cleaning process, it's important to understand where the data is missing in your dataset.

Your task is to find the total number of null values in the dataset. Please choose the correct method or code from the provided options to accomplish this.
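As a minimal sketch of the idea, the total can be obtained by chaining two sums over the boolean mask that isnull() returns. The small DataFrame below is a made-up stand-in for the project's dataset, and its column values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the YouTubers dataset; the values are invented.
df = pd.DataFrame({
    "Username": ["a", "b", None, "d"],
    "Subscribers": [1000.0, np.nan, 2500.0, np.nan],
    "Country": ["India", None, None, "Brazil"],
})

# isnull() marks every missing cell as True.
# The first sum() counts missing cells per column,
# the second sum() adds those per-column counts into a single total.
nulls_per_column = df.isnull().sum()
total_nulls = int(nulls_per_column.sum())

print(nulls_per_column)
print("Total missing values:", total_nulls)
```

Keeping the per-column counts in their own variable is handy, since later steps usually need them as well.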

multiplechoice

Figure out which column has the most missing values

Based on the results of Activity 1, which column do you think has the most missing values?
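One possible way to answer this programmatically, rather than by reading the counts, is to ask pandas for the label of the largest per-column count. The DataFrame here is hypothetical sample data, not the project's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "Username": ["a", "b", "c", "d"],
    "Subscribers": [1000.0, np.nan, 2500.0, 400.0],
    "Country": [None, None, None, "Brazil"],
})

# Count missing cells per column, then take the label of the largest count.
missing_counts = df.isnull().sum()
most_missing_column = missing_counts.idxmax()

print(missing_counts.sort_values(ascending=False))
print("Column with the most missing values:", most_missing_column)
```

If two columns tie for the largest count, idxmax() returns the first one, so printing the sorted counts alongside it is a useful check.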

multiplechoice

Non-Missing Notifiers: Counting Valid Data

Next, analyze the dataset for non-null values.

Your task is to choose the option that correctly counts the non-null values in the Subscribers column.
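For reference, here is a small sketch of two equivalent ways to count non-null values in a single column. The Subscribers values below are invented sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "Subscribers": [1000.0, np.nan, 2500.0, np.nan, 400.0],
})

# notnull() is the inverse of isnull(); summing the True values counts valid cells.
valid_via_notnull = int(df["Subscribers"].notnull().sum())

# Series.count() skips missing values, so it gives the same number.
valid_via_count = int(df["Subscribers"].count())

print(valid_via_notnull, valid_via_count)
```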

codevalidated

Remove Completely Empty Rows

Using the dropna() function, remove all rows that are entirely empty.

Store the result in the dataframe df_cleaned.

The result should match the following output:

[Image: expected output for Activity 4]
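A minimal sketch of this step, assuming a small invented DataFrame in place of the real dataset: passing how="all" tells dropna() to drop a row only when every cell in it is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; row 1 is entirely empty, rows 2 and 3 only partly.
df = pd.DataFrame({
    "Username": ["a", None, "c", None],
    "Subscribers": [1000.0, np.nan, np.nan, np.nan],
    "Country": ["India", None, None, "Brazil"],
})

# how="all" removes a row only if all of its values are missing.
# The default, how="any", would also remove the partly filled rows.
df_cleaned = df.dropna(how="all")

print(df_cleaned)
```

The original index labels are kept, so a gap appears where the empty row was removed. Calling reset_index(drop=True) afterwards would renumber the rows if that matters for later steps.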

codevalidated

Column-Specific Removal

Using dropna(), drop the columns where more than 50% of the data is missing.

Store the result in the df_cleaned_column variable.

The result should match the following output:

[Image: expected output for Activity 5]
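One way to express the 50% rule is through the thresh parameter of dropna(), which sets the minimum number of non-missing values a column needs in order to be kept. The sketch below uses invented data, and rounding the threshold up with math.ceil is an assumption about how to treat a column that is exactly half empty (it is kept here); the project's own solution may compute the threshold differently.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical sample data: Rank is 75% missing, Country is exactly 50% missing.
df = pd.DataFrame({
    "Username": ["a", "b", "c", "d"],
    "Rank": [1.0, np.nan, np.nan, np.nan],
    "Country": ["India", None, None, "Brazil"],
})

# A column survives if at least half of its values are present,
# which means only columns with MORE than 50% missing are dropped.
min_non_null = math.ceil(len(df) * 0.5)

# axis=1 applies the rule to columns instead of rows.
df_cleaned_column = df.dropna(axis=1, thresh=min_non_null)

print(df_cleaned_column.columns.tolist())
```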

input

Identify the Column with the Least Missing Values

After removing columns with excessive missing values, use isnull() to identify which column now has the fewest missing values.

Write down the column name below.
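A short sketch of how the answer could be looked up rather than read off by eye, using invented sample data. Sorting the counts in ascending order puts the column with the fewest missing values first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "Subscribers": [1000.0, np.nan, np.nan, 400.0],
    "Country": ["India", "Brazil", None, "Japan"],
    "Category": [None, None, None, "Music"],
})

# Per-column missing counts, smallest first.
missing_sorted = df.isnull().sum().sort_values()
least_missing_column = missing_sorted.index[0]

print(missing_sorted)
print("Column with the fewest missing values:", least_missing_column)
```

Several columns may share the lowest count (for example, zero), so it is worth looking at the whole sorted list and not only its first entry.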

codevalidated

The Rank Rescuer

Use mean imputation to handle missing values in the Rank column.

The result should match the following output:

[Image: expected output for Activity 7]
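Mean imputation can be sketched as follows: compute the mean of the observed values, then use it to fill the gaps. The Rank values are invented sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({"Rank": [1.0, np.nan, 3.0, 8.0]})

# mean() ignores missing values by default, so this is the mean of 1, 3 and 8.
rank_mean = df["Rank"].mean()

# Assign the filled column back instead of filling in place.
df["Rank"] = df["Rank"].fillna(rank_mean)

print(df)
```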

multiplechoice

Use of Mean Imputation

Why is mean imputation suitable for the Rank column?

codevalidated

Comment Caution: Imputing Average Comments

Use median imputation to handle missing values in the Average Comments column.

Median imputation gives a more typical picture of general comment volume, because the median is not pulled up or down by extreme outliers.

The result should match the following output:

[Image: expected output for Activity 9]
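The sketch below illustrates why the median can be the safer choice when a column contains outliers. The Average Comments values are invented, with one deliberately extreme entry.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one extreme outlier (5000).
df = pd.DataFrame({"Average Comments": [10.0, np.nan, 20.0, 5000.0, 30.0]})

# Compare the two candidate fill values before imputing.
comments_mean = df["Average Comments"].mean()      # dragged upward by the outlier
comments_median = df["Average Comments"].median()  # stays near the typical values

df["Average Comments"] = df["Average Comments"].fillna(comments_median)

print("mean:", comments_mean, "median:", comments_median)
print(df)
```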

codevalidated

Apply Forward Fill

Apply forward fill to handle missing values in the Country column.

The result should match the following output:

[Image: expected output for Activity 10]
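Forward fill copies the last seen value down into the gaps that follow it. A minimal sketch with invented Country values:

```python
import pandas as pd

# Hypothetical sample data; the first row is missing on purpose.
df = pd.DataFrame({"Country": [None, "India", None, "Brazil", None]})

# ffill() propagates the previous valid value forward.
df["Country"] = df["Country"].ffill()

remaining_missing = int(df["Country"].isnull().sum())
print(df)
print("Still missing:", remaining_missing)
```

The first row stays empty because there is no earlier value to copy from, which is the kind of gap a later backward fill can close.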

codevalidated

Apply Backward Fill

Apply backward fill to handle any remaining missing values in the Country column.

The result should match the following output:

[Image: expected output for Activity 11]
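Backward fill works in the opposite direction, copying the next valid value up into the gaps before it. The sketch assumes a Country column that still has a missing first row, as would be left over after a forward fill; the values are invented.

```python
import pandas as pd

# Hypothetical state of the column after a forward fill: only the leading gap remains.
df = pd.DataFrame({"Country": [None, "India", "India", "Brazil", "Brazil"]})

# bfill() propagates the next valid value backward, closing the leading gap.
df["Country"] = df["Country"].bfill()

print(df)
```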

multiplechoice

Understand the combination of Forward and Backward Fill

Why might combining forward and backward fill be beneficial?

codevalidated

View Boost: Interpolate Average Views

Apply linear interpolation using interpolate() to estimate and fill missing values in the Average Views column.

The result should match the following output:

[Image: expected output for Activity 13]
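Linear interpolation fills each gap with values spaced evenly between the known neighbors on either side. A small sketch with invented Average Views values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "Average Views": [100.0, np.nan, 300.0, np.nan, np.nan, 600.0],
})

# method="linear" is the default; it is spelled out here for clarity.
# It treats rows as equally spaced and draws a straight line between known points.
df["Average Views"] = df["Average Views"].interpolate(method="linear")

print(df)
```

With the default settings, missing values at the very start of a column are left as they are, since there is no earlier point to interpolate from.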

codevalidated

Impute Subscribers Using Forward Fill and Mode

First apply forward fill, then impute remaining missing values in Subscribers using the mode.

The result should match the following output:

[Image: expected output for Activity 14]
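The two-step approach can be sketched as below, using invented Subscribers values. Computing the mode after the forward fill (rather than before it) is an assumption made for this example; the task statement does not say which the expected solution uses.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the first row is missing on purpose.
df = pd.DataFrame({"Subscribers": [np.nan, 1000.0, np.nan, 1000.0, 2500.0]})

# Step 1: forward fill closes every gap that has an earlier value above it.
df["Subscribers"] = df["Subscribers"].ffill()

# Step 2: mode() returns a Series (there can be ties), so take the first entry.
subscribers_mode = df["Subscribers"].mode().iloc[0]
df["Subscribers"] = df["Subscribers"].fillna(subscribers_mode)

print(df)
```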

codevalidated

Impute Category Using Unknown

Your task is to fill the missing Category values with a new category named Unknown. This strategy keeps the handling simple and preserves the integrity of your analysis by marking unknown data explicitly instead of guessing a category.

The result should match the following output:

[Image: expected output for Activity 15]
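Filling with a constant placeholder is a one-line operation. The sketch uses invented Category values.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({"Category": ["Music", None, "Gaming", None]})

# Replace every missing entry with the explicit label "Unknown".
df["Category"] = df["Category"].fillna("Unknown")

print(df["Category"].value_counts())
```

Because Unknown is now an ordinary category, it shows up in value counts and group-by results, which keeps the amount of missing data visible in later analysis.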

codevalidated

Fill Missing Content Type Values with Unknown

Fill the missing values in the Content Type column with a new category named Unknown.

The result should match the following output:

[Image: expected output for Activity 16]

Project Created by

Vidhi Shah

As a Project Author at DataWars, I dive into the world of data science and AI/ML with a millennial flair, constantly intrigued by the inner workings of technology. When I'm not crunching numbers, you'll find me cheering for my favorite cricket team.

This project is part of

Data Cleaning with Pandas