Practice GroupBy operations with Netflix data


Drop records where the imdb_score column has missing values (NaN)

Note that you have to modify the original dataframe.
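A minimal sketch of this step, assuming the data lives in a dataframe named titles_df (the name used later in this project). The toy rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Netflix titles data; the values are made up.
titles_df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "imdb_score": [7.1, np.nan, 6.4],
})

# inplace=True changes titles_df itself instead of returning a new frame.
titles_df.dropna(subset=["imdb_score"], inplace=True)
```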


For each TV show or movie with a NaN value in the age certification column, replace it with 'No certification'

Note that you have to modify the original dataframe.
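One way to fill the gaps, sketched on made-up rows. The column name age_certification is an assumption; check the actual column name in your dataframe.

```python
import numpy as np
import pandas as pd

# Toy data; "age_certification" is an assumed column name.
titles_df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "age_certification": ["PG", np.nan, "R"],
})

# Assigning back to the column updates the original dataframe.
titles_df["age_certification"] = titles_df["age_certification"].fillna(
    "No certification"
)
```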


For each TV show or movie with a NaN value in the seasons column, replace it with the most frequently occurring value (the mode) in that column

Note that you have to modify the original dataframe.
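A possible approach using Series.mode, which ignores NaN by default and returns the most frequent value(s) in sorted order. The toy values are made up.

```python
import numpy as np
import pandas as pd

# Toy data; the values are made up.
titles_df = pd.DataFrame({"seasons": [1.0, np.nan, 2.0, 1.0, np.nan]})

# mode() can return several values on a tie; take the first one.
most_common_seasons = titles_df["seasons"].mode().iloc[0]
titles_df["seasons"] = titles_df["seasons"].fillna(most_common_seasons)
```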


Count the number of movies or TV shows for each age certification

Store the resulting dataframe in the variable certification_counts.

Your result should look similar to this dataframe:

activity1-answer
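A sketch of the counting step with groupby and size. The column name age_certification and the output column name count are assumptions, so the exact shape may differ from the expected answer.

```python
import pandas as pd

# Toy data; "age_certification" is an assumed column name.
titles_df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "age_certification": ["PG", "R", "PG", "No certification"],
})

# size() counts rows per group; reset_index turns the result into a dataframe.
certification_counts = (
    titles_df.groupby("age_certification").size().reset_index(name="count")
)
```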


Count the number of movies and TV shows (separately) produced in each release year

Store the resulting dataframe in the variable count_by_release_year.

Your result should look similar to this dataframe:

activity2-answer
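Counting movies and shows separately suggests grouping by two keys. The sketch below assumes columns named release_year and type, with made-up values.

```python
import pandas as pd

# Toy data; "release_year" and "type" are assumed column names.
titles_df = pd.DataFrame({
    "release_year": [2019, 2019, 2019, 2020],
    "type": ["MOVIE", "SHOW", "MOVIE", "SHOW"],
})

# One row per (release_year, type) pair with its number of titles.
count_by_release_year = (
    titles_df.groupby(["release_year", "type"]).size().reset_index(name="count")
)
```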


Calculate the average runtime and IMDb score of movies and TV shows for each release year

Store the resulting dataframe in the variable average_duration_imdb_score.

Your result should look similar to this dataframe:

activity3-answer
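A sketch that averages two columns at once. It groups by release year only and assumes the columns are named release_year and runtime; imdb_score is the name given earlier in this project.

```python
import pandas as pd

# Toy data; "release_year" and "runtime" are assumed column names.
titles_df = pd.DataFrame({
    "release_year": [2019, 2019, 2020],
    "runtime": [100, 120, 45],
    "imdb_score": [7.0, 8.0, 6.5],
})

# Selecting a list of columns keeps the result a dataframe.
average_duration_imdb_score = (
    titles_df.groupby("release_year")[["runtime", "imdb_score"]].mean()
)
```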


Count the number of movies and TV shows for each genre

Store the resulting dataframe in the variable genre_counts.

Note: you have to explode the genres column first.

Your result should look similar to this dataframe:

activity4-answer
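The explode step can be sketched as follows. This assumes each genres cell already holds a Python list; if the column stores strings that only look like lists, they would need to be parsed first.

```python
import pandas as pd

# Toy data; each "genres" cell is assumed to be a list.
titles_df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": [["drama", "crime"], ["drama"], ["comedy"]],
})

# explode() produces one row per (title, genre) pair.
exploded = titles_df.explode("genres")
genre_counts = exploded.groupby("genres").size().reset_index(name="count")
```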


Calculate the standard deviation of movie and TV show ratings for each release year

Store the resulting dataframe in the variable imdb_score_std.

Your result should look similar to this dataframe:

activity5-answer
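A sketch assuming "ratings" refers to the imdb_score column and that the year column is named release_year. Note that pandas computes the sample standard deviation (ddof=1) by default, so a year with a single title yields NaN.

```python
import pandas as pd

# Toy data; "release_year" is an assumed column name.
titles_df = pd.DataFrame({
    "release_year": [2019, 2019, 2020, 2020, 2020],
    "imdb_score": [6.0, 8.0, 5.0, 7.0, 9.0],
})

# std() uses ddof=1 (sample standard deviation) unless told otherwise.
imdb_score_std = titles_df.groupby("release_year")[["imdb_score"]].std()
```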


Calculate the maximum TMDB popularity and minimum IMDb score for each production country

Store the resulting dataframe in the variable TMDB_popularity.

Note: you have to explode the production_countries column first.

Your result should look similar to this dataframe:

activity6-answer
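Different aggregations per column can be requested by passing a dictionary to agg. The sketch assumes production_countries cells are lists and that the popularity column is named tmdb_popularity.

```python
import pandas as pd

# Toy data; "tmdb_popularity" is an assumed column name.
titles_df = pd.DataFrame({
    "production_countries": [["US"], ["US", "GB"], ["GB"]],
    "tmdb_popularity": [10.0, 30.0, 50.0],
    "imdb_score": [7.0, 6.0, 8.0],
})

exploded = titles_df.explode("production_countries")

# Map each column to the aggregation it should receive.
TMDB_popularity = exploded.groupby("production_countries").agg(
    {"tmdb_popularity": "max", "imdb_score": "min"}
)
```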


Calculate the sum of IMDb votes for each genre and find the average TMDB score

Store the resulting dataframe in the variable genres_votes_scores.

Note: you have to explode the genres column first.

Your result should look similar to this dataframe:

activity7-answer
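Named aggregation is one way to compute a sum and a mean together while controlling the output column names. The input columns imdb_votes and tmdb_score, and the output names total_imdb_votes and avg_tmdb_score, are assumptions for illustration.

```python
import pandas as pd

# Toy data; "imdb_votes" and "tmdb_score" are assumed column names.
titles_df = pd.DataFrame({
    "genres": [["drama"], ["drama", "comedy"]],
    "imdb_votes": [100, 50],
    "tmdb_score": [7.0, 8.0],
})

exploded = titles_df.explode("genres")

# Each keyword is output_name=(input_column, aggregation).
genres_votes_scores = exploded.groupby("genres").agg(
    total_imdb_votes=("imdb_votes", "sum"),
    avg_tmdb_score=("tmdb_score", "mean"),
)
```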


Calculate the average rating deviation from the mean for each genre

Store the resulting dataframe in the variable genre_avg_deviation.

Note: you have to explode the genres column first.

Your result should look similar to this dataframe:

activity8-answer
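The task wording leaves open which deviation is meant. One reading, sketched here, is the mean absolute deviation of imdb_score from the genre's own mean; the helper mean_abs_deviation and the output column avg_deviation are hypothetical names, and the expected answer may use a different definition.

```python
import pandas as pd

# Toy data; the values are made up.
titles_df = pd.DataFrame({
    "genres": [["drama"], ["drama", "comedy"], ["comedy"]],
    "imdb_score": [6.0, 8.0, 5.0],
})

def mean_abs_deviation(scores):
    # Average distance of each score from the mean of its group.
    return (scores - scores.mean()).abs().mean()

exploded = titles_df.explode("genres")
genre_avg_deviation = (
    exploded.groupby("genres")["imdb_score"]
    .apply(mean_abs_deviation)
    .reset_index(name="avg_deviation")
)
```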


Calculate the standardized score of TMDB popularity for each movie or TV show within its respective genre using the implemented function, and modify the original dataframe to include it

Store the resulting dataframe in the variable titles_df with a new column called standardized_tmdb_popularity.

Refer to this link for more information about the standardized score formula: https://en.wikipedia.org/wiki/Standard_score
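A per-group standard score can be sketched with groupby and transform, which returns one value per original row. The function standardize below is a hypothetical stand-in for the function provided in the project, the column name tmdb_popularity is an assumption, and exploding genres first is one possible reading of "within its respective genre".

```python
import pandas as pd

# Toy data; "tmdb_popularity" is an assumed column name.
titles_df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genres": [["drama"], ["drama"], ["drama", "comedy"], ["comedy"]],
    "tmdb_popularity": [10.0, 20.0, 30.0, 10.0],
})

def standardize(values):
    # Hypothetical stand-in: (x - mean) / std, using pandas' default ddof=1.
    return (values - values.mean()) / values.std()

# One row per (title, genre); a fresh index avoids duplicate labels.
titles_df = titles_df.explode("genres").reset_index(drop=True)

# transform() applies the function per genre and keeps the row order.
titles_df["standardized_tmdb_popularity"] = (
    titles_df.groupby("genres")["tmdb_popularity"].transform(standardize)
)
```

Passing ddof=0 to std would give the population version of the formula instead.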


Find the minimum and maximum release year for each type (movie or TV show)

Store the resulting dataframe in the variable min_max_year.

Your result should look similar to this dataframe:

activity10-answer
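Passing a list of function names to agg yields one column per aggregation. The sketch assumes columns named type and release_year, with made-up values.

```python
import pandas as pd

# Toy data; "type" and "release_year" are assumed column names.
titles_df = pd.DataFrame({
    "type": ["MOVIE", "MOVIE", "SHOW", "SHOW"],
    "release_year": [1999, 2021, 2005, 2022],
})

# The result has a "min" and a "max" column, indexed by type.
min_max_year = titles_df.groupby("type")["release_year"].agg(["min", "max"])
```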


Calculate the average IMDb score and the max TMDB score for each genre and release year combination

Store the resulting dataframe in the variable genre_year_scores.

Note: you have to explode the genres column first.

Your result should look similar to this dataframe:

activity11-answer
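Grouping on a combination means passing both keys to groupby, which produces a two-level index. The column names release_year and tmdb_score are assumptions.

```python
import pandas as pd

# Toy data; "release_year" and "tmdb_score" are assumed column names.
titles_df = pd.DataFrame({
    "genres": [["drama"], ["drama"], ["drama", "comedy"]],
    "release_year": [2019, 2019, 2020],
    "imdb_score": [6.0, 8.0, 7.0],
    "tmdb_score": [6.5, 7.5, 9.0],
})

exploded = titles_df.explode("genres")

# The result is indexed by (genres, release_year) pairs.
genre_year_scores = exploded.groupby(["genres", "release_year"]).agg(
    {"imdb_score": "mean", "tmdb_score": "max"}
)
```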


Calculate the average length of titles (number of characters) for each genre

Store the resulting dataframe in the variable genre_average_length.

Note: you have to explode the genres column first.

Your result should look similar to this dataframe:

activity12-answer
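Title length can be derived with the str.len accessor before grouping. The helper column title_length is a hypothetical name, and the column title is assumed to hold the title text.

```python
import pandas as pd

# Toy data; "title" is an assumed column name.
titles_df = pd.DataFrame({
    "title": ["Up", "Heat"],
    "genres": [["drama"], ["drama", "crime"]],
})

exploded = titles_df.explode("genres")

# Number of characters in each title (hypothetical helper column).
exploded["title_length"] = exploded["title"].str.len()
genre_average_length = exploded.groupby("genres")["title_length"].mean()
```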


Find the count and average IMDb score for each age certification category

Store the resulting dataframe in the variable certification_stats.

Your result should look similar to this dataframe:

activity13-answer
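Count and mean can be computed in a single pass over one column. The sketch assumes the certification column is named age_certification; note that count skips NaN scores.

```python
import pandas as pd

# Toy data; "age_certification" is an assumed column name.
titles_df = pd.DataFrame({
    "age_certification": ["PG", "PG", "R"],
    "imdb_score": [6.0, 8.0, 7.0],
})

# One row per certification with "count" and "mean" columns.
certification_stats = (
    titles_df.groupby("age_certification")["imdb_score"].agg(["count", "mean"])
)
```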

Anurag Verma
