Practice Data Filtering & Sorting with Hollywood Movie Data
Practice Data Filtering & Sorting with Hollywood Movie Data Data Science Project
Intro to Pandas for Data Analysis

Practice Data Filtering & Sorting with Hollywood Movie Data

Dive into Hollywood's world through data analysis using Python and Pandas! Explore movie datasets to uncover trends in budgets, genres, and revenues. Master techniques like .loc[], .sort_values(), and .query() to gain insights. Use .str.contains() and compound conditions for advanced filtering. Uncover the secrets behind blockbusters in this exciting journey through cinema's landscape.
Start this project
Practice Data Filtering & Sorting with Hollywood Movie DataPractice Data Filtering & Sorting with Hollywood Movie Data
Project Created by

Dhrubaraj Roy

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

input

Find the movie with the maximum `runtime`.

Write the value of the movie with the maximum runtime as a floating point number. For example, 676.0

multiplechoice

What is the earliest Release Date?

Select from the following options, what is the earliest date found in the release_date column?

input

What's the highest value in the `vote_count` column?

Identify the highest value present in the vote_count column. Please write it as a floating point number, for example, 12751.0

codevalidated

Select the first 5 movies

Select the first 5 movies from this df using .iloc[] method and store the result in the variable first_5_movies

input

What is the `title` of the 10th movie?

Enter the name of the 10th movie.

codevalidated

Select `title`, `release_date` of movies in 2nd, 6th, 11th positions

Find the 2nd, 6th, and 11th movies or series and extract only the columns at index positions 0 (title) and 12 (release_date). Store the result in variable selected_movies.

codevalidated

Sort the `df` by `release_date` in ascending order.

This activity modifies the original df by sorting it in ascending order based on the release_date column.

codevalidated

Select movies in English and sort them by Vote Average

Filter the movies with original_language equals to English, and sort them by vote_average in descending order. Store your result in df_english_movies.

codevalidated

Select movies/series with long runtime and high rating

Find the movies that have a runtime value greater than 120 and vote_average greather than 7. Store your result in the variable df_runtime_above_120.

codevalidated

Sort by `genres` (asc) and `revenue` (desc)

Sort the df, genres in ascending order and revenue in descending order. Store the result in the variable df_genres_revenue.

codevalidated

Select Action movies/series and sort them by voting average and revenue

Find only the Action genre movies or series and sort them by the vote_average column in descending order and the revenue column in ascending order.

codevalidated

Select movies/series with `budget > 50M` and `runtime > 120 min`

Filter the df to display movies that have a budget greater than $50,000,000 and a runtime longer than 120 minutes. Show only the original_title, budget, and runtime columns. Store your result in the variable df_budget_runtime.

multiplechoice

Select Action and Adventure movies/series with high `vote_average` and sort them by `vote_average` (asc) and `revenue` (asc)

What are the top three movies in the df with a vote_average over 7 in the Action or Adventure genres?

Note - Sort them by ascending order of the vote_average and revenue columns.

codevalidated

Select `Star Wars` movies/series and sort them by `release_date` (asc) and `revenue` (desc)

Filter the df with Star Wars in the title, and sort them by the release_date column in ascending order and the revenue column in descending order. Store your result in the variable df_star_wars.

codevalidated

Select `original_title`, `budget`, `revenue`, `vote_average` for movies with `budget > 100M`

Select only the original_title, budget, revenue, and vote_average columns with a budget over $100,000,000. Store the result in the variable df_high_budget.

Note - Don't use comma (,).

codevalidated

Find movies with `runtime > 120` and `vote_average > 7`. Sort them in descending order.

Select only the movies or series with a runtime greater than 120 minutes and a vote_average greater than 7 and sort them by the runtime column in descending order. Store the result in the variable df_run_vote

codevalidated

Find top 10 movies/series with `vote_count` and `vote_average > 7`, show `original_title`, `vote_count`, `vote_average`.

Select the top 10 movies or series with a vote_count greater than 1000 and a vote_average greater than 7. Show only the original_title, vote_count, and vote_average columns. Store the result in the variable df_top_10.

codevalidated

Find the movies or series that have a `vote_average` above 8 or have made more than 500,000,000 in `revenue`.

Store the result in the variable df_high.

Note - Don't use comma (,)

input

Enter the release date of the movie that is not in `English`.

Write down the earliest release_date of the movie that is not in English. To find the earliest date, sort the release_date column in ascending order.

codevalidated

Select movies with high runtime, rating, popularity and sort them by descending order.

Identify the movies with a runtime greater than 150 minutes, vote_average greater than 7, and popularity greater than 20 and sort the df by the vote_average column in descending order. Store the result in the variable df_long_movies.

codevalidated

Find Action, Adventure, or Sci-Fi movies sorted by descending order

Sort the df by the runtime column in descending order and select movies in the Action or Adventure or Science Fiction genres. Store the result in the variable df_action_sci_adv.

codevalidated

Find movies with a budget under 20M and revenue over 100M and sort them by descending order.

Sort the df by the revenue column in descending order and select movies with a budget less than 20,000,000 and revenue greater than 100,000,000. Store the result in the variable df_budget_revenue.

Practice Data Filtering & Sorting with Hollywood Movie DataPractice Data Filtering & Sorting with Hollywood Movie Data
Project Created by

Dhrubaraj Roy

Project Author at DataWars, responsible for leading the development and delivery of innovative machine learning and data science projects.

Project Author at DataWars, responsible for leading the development and delivery of innovative machine learning and data science projects.

This project is part of

Intro to Pandas for Data Analysis

Explore other projects