Practice GroupBy operations with Netflix data

In this hands-on project, we'll explore the Netflix TV shows and movies dataset. You'll learn how to use pandas GroupBy operations to group data by a single column or by multiple columns, and you'll apply built-in and custom functions to aggregate and summarize the data. Get ready to dive into the world of data manipulation with Netflix data!
Project Created by

Anurag Verma

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.


Drop records where the imdb_score column has missing values (NaN)

Note that you have to modify the original dataframe.
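As a minimal sketch of this step, assuming the data is loaded into a dataframe named titles_df (the toy rows below are made up for illustration), dropping rows in place could look like this:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Netflix titles data (values are illustrative only).
titles_df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "imdb_score": [7.1, np.nan, 6.4],
})

# Drop rows whose imdb_score is NaN; inplace=True modifies titles_df itself.
titles_df.dropna(subset=["imdb_score"], inplace=True)
```

Passing subset limits the check to imdb_score, so rows with missing values in other columns are kept.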


For each TV show or movie with a NaN value in the age certification column, replace it with 'No certification'

Note that you have to modify the original dataframe.
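One possible way to do this, assuming the column is named age_certification (the name is an assumption, as is the toy data), is to assign the filled column back to the dataframe:

```python
import numpy as np
import pandas as pd

# Toy data; the column name age_certification is assumed, not confirmed.
titles_df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "age_certification": ["R", np.nan, "PG"],
})

# Assigning the result back updates the original dataframe.
titles_df["age_certification"] = titles_df["age_certification"].fillna("No certification")
```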


For each TV show or movie with a NaN value in the seasons column, replace it with the most frequently occurring value (the mode) in that column

Note that you have to modify the original dataframe.
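A short sketch of mode imputation on made-up data: Series.mode() returns a Series because ties are possible, so this version simply takes the first entry.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only.
titles_df = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "seasons": [1.0, np.nan, 1.0, 3.0, np.nan],
})

# mode() ignores NaN by default; take the first value if several tie.
most_common_seasons = titles_df["seasons"].mode()[0]
titles_df["seasons"] = titles_df["seasons"].fillna(most_common_seasons)
```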


Count the number of movies or TV shows for each age certification

Store the resulting dataframe in the variable certification_counts.

Your result should look similar to this dataframe:

activity1-answer
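A counting sketch on toy data, assuming the column is named age_certification. The exact shape of the expected answer is not shown here, so the output column name count is a guess:

```python
import pandas as pd

# Toy data; column names are assumptions.
titles_df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "age_certification": ["R", "PG", "R", "TV-MA"],
})

# size() counts rows per group; reset_index turns the result into a dataframe.
certification_counts = (
    titles_df.groupby("age_certification").size().reset_index(name="count")
)
```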


Count the number of movies and TV shows (separately) produced in each release year

Store the resulting dataframe in the variable count_by_release_year.

Your result should look similar to this dataframe:

activity2-answer
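Grouping by more than one column works by passing a list of keys. The sketch below assumes columns named release_year and type, with type holding values such as MOVIE and SHOW; both the names and the values are assumptions.

```python
import pandas as pd

# Toy data; column names and type labels are assumed.
titles_df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "type": ["MOVIE", "MOVIE", "SHOW", "SHOW"],
    "release_year": [2020, 2020, 2020, 2021],
})

# One row per (release_year, type) pair that actually occurs in the data.
count_by_release_year = (
    titles_df.groupby(["release_year", "type"]).size().reset_index(name="count")
)
```

Pairs with no titles are simply absent. Chaining .unstack(fill_value=0) on the size() result instead would give one column per type with zeros filled in.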


Calculate the average runtime and IMDb score of movies and TV shows for each release year

Store the resulting dataframe in the variable average_duration_imdb_score.

Your result should look similar to this dataframe:

activity3-answer
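Selecting several columns after groupby applies the same aggregation to each of them. A sketch on toy data, assuming the columns are named release_year, runtime, and imdb_score:

```python
import pandas as pd

# Toy data; runtime and release_year are assumed column names.
titles_df = pd.DataFrame({
    "release_year": [2020, 2020, 2021],
    "runtime": [100, 50, 90],
    "imdb_score": [6.0, 8.0, 7.0],
})

# Mean of both columns per release year; release_year becomes the index.
average_duration_imdb_score = (
    titles_df.groupby("release_year")[["runtime", "imdb_score"]].mean()
)
```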


Count the number of movies and TV shows for each genre

Store the resulting dataframe in the variable genre_counts.

Note: you have to explode the genres column first.

Your result should look similar to this dataframe:

activity4-answer
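The explode step turns each list element into its own row, so a title with two genres is counted once per genre. The sketch below uses toy data in which genres already holds Python lists. If the real column stores lists as strings, it would need parsing first (for example with ast.literal_eval), which is an assumption about the data and not something shown here.

```python
import pandas as pd

# Toy data; genres holds real Python lists in this sketch.
titles_df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": [["drama", "crime"], ["drama"], ["comedy"]],
})

# One row per (title, genre) pair, then count rows per genre.
exploded = titles_df.explode("genres")
genre_counts = exploded.groupby("genres").size().reset_index(name="count")
```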


Calculate the standard deviation of the IMDb ratings of movies and TV shows for each release year

Store the resulting dataframe in the variable imdb_score_std.

Your result should look similar to this dataframe:

activity5-answer
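A sketch of a per-group standard deviation on toy data. Note that pandas uses the sample standard deviation (ddof=1) by default, so a year with a single title yields NaN. The output column name is an assumption.

```python
import pandas as pd

# Toy data for illustration only.
titles_df = pd.DataFrame({
    "release_year": [2020, 2020, 2021],
    "imdb_score": [6.0, 8.0, 7.0],
})

# Sample standard deviation (ddof=1) of imdb_score within each year.
imdb_score_std = (
    titles_df.groupby("release_year")["imdb_score"]
    .std()
    .reset_index(name="imdb_score_std")
)
```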


Calculate the maximum TMDB popularity and minimum IMDb score for each production country

Store the resulting dataframe in the variable TMDB_popularity.

Note: you have to explode the production_countries column first.

Your result should look similar to this dataframe:

activity6-answer
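Different aggregations for different columns can be expressed with named aggregation. In this sketch the column names tmdb_popularity and imdb_score, the output column names, and the toy values are all assumptions:

```python
import pandas as pd

# Toy data; production_countries holds real Python lists in this sketch.
titles_df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "production_countries": [["US"], ["US", "GB"], ["GB"]],
    "tmdb_popularity": [40.0, 30.0, 5.0],
    "imdb_score": [7.0, 6.5, 5.0],
})

# One row per (title, country), then a different aggregation per column.
exploded = titles_df.explode("production_countries")
TMDB_popularity = exploded.groupby("production_countries").agg(
    max_tmdb_popularity=("tmdb_popularity", "max"),
    min_imdb_score=("imdb_score", "min"),
)
```

Each keyword in agg names an output column and maps it to a (source column, function) pair.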


Calculate the total IMDb votes and the average TMDB score for each genre

Store the resulting dataframe in the variable genres_votes_scores.

Note: you have to explode the genres column first.

Your result should look similar to this dataframe:

activity7-answer


Calculate the average rating deviation from the mean for each genre

Store the resulting dataframe in the variable genre_avg_deviation.

Note: you have to explode the genres column first.

Your result should look similar to this dataframe:

activity8-answer
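This activity is a natural place for a custom aggregation function. The task wording is open to interpretation; the sketch below reads it as the mean absolute deviation of IMDb scores within each genre, which is an assumption. The helper name mean_abs_deviation and the output column name are hypothetical.

```python
import pandas as pd

def mean_abs_deviation(scores: pd.Series) -> float:
    """Average absolute distance of each score from the group's mean."""
    return (scores - scores.mean()).abs().mean()

# Toy data; genres holds real Python lists in this sketch.
titles_df = pd.DataFrame({
    "title": ["A", "B"],
    "genres": [["drama"], ["drama", "comedy"]],
    "imdb_score": [6.0, 8.0],
})

exploded = titles_df.explode("genres")
# agg accepts any callable that reduces a Series to a single value.
genre_avg_deviation = (
    exploded.groupby("genres")["imdb_score"]
    .agg(mean_abs_deviation)
    .reset_index(name="avg_deviation")
)
```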


Calculate the standardized score of TMDB popularity for each movie or TV show within its respective genre using the implemented function, and modify the original dataframe to include it

Store the resulting dataframe in the variable titles_df with a new column called standardized_tmdb_popularity.

Refer to this link for more information about the standardized score formula: https://en.wikipedia.org/wiki/Standard_score
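Unlike an aggregation, a standardized score keeps one value per row, which is what transform is for. The sketch below is not the project's implemented function: standardize is a hypothetical stand-in that uses the sample standard deviation, and the toy data assumes each row already has a single genre.

```python
import pandas as pd

def standardize(values: pd.Series) -> pd.Series:
    """Standard score: (x - group mean) / group standard deviation."""
    return (values - values.mean()) / values.std()

# Toy data; one genre per row is assumed here for simplicity.
titles_df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genres": ["drama", "drama", "comedy", "comedy"],
    "tmdb_popularity": [10.0, 20.0, 5.0, 15.0],
})

# transform returns a result aligned with the original rows.
titles_df["standardized_tmdb_popularity"] = (
    titles_df.groupby("genres")["tmdb_popularity"].transform(standardize)
)
```

Because the result has the same index as titles_df, it can be assigned directly as a new column.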


Find the minimum and maximum release year for each type (movie or TV show)

Store the resulting dataframe in the variable min_max_year.

Your result should look similar to this dataframe:

activity10-answer
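Passing a list of function names to agg produces one output column per function. A sketch on toy data, assuming columns named type and release_year:

```python
import pandas as pd

# Toy data; column names and type labels are assumed.
titles_df = pd.DataFrame({
    "type": ["MOVIE", "MOVIE", "SHOW", "SHOW"],
    "release_year": [1999, 2020, 2015, 2021],
})

# Result has columns "min" and "max", indexed by type.
min_max_year = titles_df.groupby("type")["release_year"].agg(["min", "max"])
```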


Calculate the average IMDb score and the max TMDB score for each genre and release year combination

Store the resulting dataframe in the variable genre_year_scores.

Note: you have to explode the genres column first.

Your result should look similar to this dataframe:

activity11-answer


Calculate the average length of titles (number of characters) for each genre

Store the resulting dataframe in the variable genre_average_length.

Note: you have to explode the genres column first.

Your result should look similar to this dataframe:

activity12-answer
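One way to approach this is to compute the title length as a helper column before exploding, so each genre row carries the length of its title. The names title_length and average_title_length are hypothetical, and the data is made up:

```python
import pandas as pd

# Toy data; genres holds real Python lists in this sketch.
titles_df = pd.DataFrame({
    "title": ["Up", "Dark"],
    "genres": [["comedy"], ["drama", "comedy"]],
})

# str.len() counts characters; assign adds the helper column without
# modifying titles_df itself.
exploded = titles_df.assign(title_length=titles_df["title"].str.len()).explode("genres")
genre_average_length = (
    exploded.groupby("genres")["title_length"]
    .mean()
    .reset_index(name="average_title_length")
)
```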


Find the count and average IMDb score for each age certification category

Store the resulting dataframe in the variable certification_stats.

Your result should look similar to this dataframe:

activity13-answer
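Count and mean can be computed in a single pass by listing both in agg. A sketch on toy data, assuming the column is named age_certification:

```python
import pandas as pd

# Toy data; age_certification is an assumed column name.
titles_df = pd.DataFrame({
    "age_certification": ["R", "R", "PG"],
    "imdb_score": [6.0, 8.0, 5.0],
})

# "count" ignores NaN scores; "mean" averages the non-missing ones.
certification_stats = (
    titles_df.groupby("age_certification")["imdb_score"].agg(["count", "mean"])
)
```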

Project Created by

Anurag Verma

What's up, friends! 👋 I'm a computer science student about to finish my last year of college. 🎓 I LOVE writing code! ❤️ It makes me so happy! 😄 Whether I'm goofing in notebooks 📓 or coding in Python 🐍, writing programs is a blast! 💥


This project is part of

Data Wrangling with Pandas
