Exploring and Analysing Superhero Attributes using Pandas

This project takes you into the world of superheroes, exploring their abilities, origins, and more through a comprehensive dataset. You’ll engage in activities that challenge your data cleaning, filtering, plotting and analysis skills with Pandas. Ready to uncover the data behind your favorite heroes? Let’s get started!
Project Created by

Lohith Unnam

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

codevalidated

Rename Columns

Rename the columns in the dataset for consistency. Change type_race to race_type and has_immortality to is_immortal.
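
A minimal sketch of this step, assuming the data is already loaded into a DataFrame named df. The three-column sample below is hypothetical and stands in for the real dataset.

```python
import pandas as pd

# Hypothetical sample; the real dataset has many more rows and columns.
df = pd.DataFrame({
    "name": ["Hero A", "Hero B"],
    "type_race": ["Human", "Mutant"],
    "has_immortality": [0, 1],
})

# rename() returns a new DataFrame, so the result is assigned back to df.
df = df.rename(columns={"type_race": "race_type", "has_immortality": "is_immortal"})

print(df.columns.tolist())
```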

codevalidated

Convert Ability Columns to Descriptive Labels

Transform the binary values in the ability-related columns has_shapeshifting, has_telepathy, has_regeneration, is_immortal, has_teleportation into descriptive labels. Replace 1 with Yes and 0 with No.
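
One possible approach is to map each column through a small dictionary. The sample values below are hypothetical; only the column names come from the activity text.

```python
import pandas as pd

ability_cols = [
    "has_shapeshifting",
    "has_telepathy",
    "has_regeneration",
    "is_immortal",
    "has_teleportation",
]

# Hypothetical sample with binary ability flags.
df = pd.DataFrame({
    "name": ["Hero A", "Hero B"],
    "has_shapeshifting": [0, 1],
    "has_telepathy": [1, 1],
    "has_regeneration": [0, 0],
    "is_immortal": [0, 1],
    "has_teleportation": [1, 0],
})

labels = {1: "Yes", 0: "No"}

# map() turns any value that is not a key of `labels` into NaN,
# so this assumes the columns contain only 0 and 1.
for col in ability_cols:
    df[col] = df[col].map(labels)

print(df)
```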

codevalidated

Select the First 10 Rows of the DataFrame

Select the first 10 rows of the dataframe df and store it in a variable named first_ten_rows.
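
A short sketch using head(), with a hypothetical 25-row DataFrame in place of the real data.

```python
import pandas as pd

# Hypothetical sample with 25 rows.
df = pd.DataFrame({
    "name": [f"Hero {i}" for i in range(25)],
    "overall_score": list(range(25)),
})

# head(10) returns the first 10 rows; df.iloc[:10] would be equivalent here.
first_ten_rows = df.head(10)

print(first_ten_rows.shape)
```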

codevalidated

Drop Columns with High Missing Values

Drop columns that have more than 30% of their values missing.
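
One way to sketch this is to compute the fraction of missing values per column and keep only the columns at or below the threshold. The sample columns height and weight are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: height is 10% missing, weight is 40% missing.
df = pd.DataFrame({
    "name": [f"Hero {i}" for i in range(10)],
    "height": [180, 175, np.nan, 190, 168, 201, 177, 185, 172, 183],
    "weight": [80, np.nan, np.nan, 95, np.nan, 110, np.nan, 88, 70, 77],
})

# isna() gives a boolean frame; the mean of each column is its missing fraction.
missing_fraction = df.isna().mean()

# Keep only the columns whose missing fraction does not exceed 30%.
df = df.loc[:, missing_fraction <= 0.30]

print(df.columns.tolist())
```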

codevalidated

Remove Rows Containing NaN Values and Reset the Index

Clean the dataset by dropping all rows containing any NaN values. After removing these rows, reset the df's index to ensure it remains sequential.
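
A minimal sketch, assuming a hypothetical four-row sample in which two rows contain a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; rows 1 and 2 each contain a missing value.
df = pd.DataFrame({
    "name": ["Hero A", "Hero B", "Hero C", "Hero D"],
    "height": [180, np.nan, 175, 190],
    "creator": ["Marvel Comics", "DC Comics", None, "DC Comics"],
})

# dropna() removes every row with at least one NaN/None value.
# reset_index(drop=True) renumbers the rows 0..n-1 and discards the old index.
df = df.dropna().reset_index(drop=True)

print(df)
```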

codevalidated

Convert Height Strings to Numeric Values

Extract numeric height values in centimeters from mixed string formats and convert them into numeric (int) data type.
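
The exact string formats are not shown here, so the sketch below assumes heights look like "6'2 / 188 cm" or "201 cm" and that the column is named height. Both are assumptions; the regular expression would need adjusting for the real data.

```python
import pandas as pd

# Hypothetical height strings; the real formats may differ.
df = pd.DataFrame({
    "name": ["Hero A", "Hero B", "Hero C"],
    "height": ["6'2 / 188 cm", "5'10 / 178 cm", "201 cm"],
})

# Capture the digits that appear immediately before "cm".
cm_text = df["height"].str.extract(r"(\d+)\s*cm", expand=False)

# Convert the captured text to integers. This assumes every row matched;
# unmatched rows would be NaN and would need handling before astype(int).
df["height"] = pd.to_numeric(cm_text).astype(int)

print(df["height"].tolist())
```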

codevalidated

Filter and Store Human Race Superheroes

Extract all superheroes who belong to the human race from the race_type column. Store the resultant dataframe in df_main_race.
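
A sketch using a boolean mask. It assumes the human race is labelled "Human" in race_type; the actual label in the dataset may differ.

```python
import pandas as pd

# Hypothetical sample; the label "Human" is an assumption.
df = pd.DataFrame({
    "name": ["Hero A", "Hero B", "Hero C"],
    "race_type": ["Human", "Mutant", "Human"],
})

# The comparison produces a boolean Series that selects matching rows.
df_main_race = df[df["race_type"] == "Human"]

print(df_main_race)
```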

codevalidated

Filter Superheroes by Alignment and Creator

Filter the dataset to include only those superheroes who have a Good alignment and are part of the Marvel Comics. Store the resultant dataframe in marvel_good_alignment.
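
Two conditions can be combined with the & operator, as sketched below. The column names alignment and creator and the values "Good" and "Marvel Comics" are assumed to match the dataset.

```python
import pandas as pd

# Hypothetical sample; column names and labels are assumptions.
df = pd.DataFrame({
    "name": ["Hero A", "Hero B", "Hero C", "Hero D"],
    "alignment": ["Good", "Bad", "Good", "Good"],
    "creator": ["Marvel Comics", "Marvel Comics", "DC Comics", "Marvel Comics"],
})

# Each condition needs its own parentheses because & binds tighter than ==.
is_good = df["alignment"] == "Good"
is_marvel = df["creator"] == "Marvel Comics"
marvel_good_alignment = df[is_good & is_marvel]

print(marvel_good_alignment)
```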

codevalidated

Sort by Overall Score and Intelligence

Sort df first by overall_score in descending order, then by intelligence_score. Store the sorted dataframe in df_sorted.
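
A sketch of a two-key sort. The activity does not state the direction for intelligence_score, so the example assumes descending for both keys; change the second entry of ascending if the other order is wanted.

```python
import pandas as pd

# Hypothetical sample with a tie on overall_score.
df = pd.DataFrame({
    "name": ["Hero A", "Hero B", "Hero C"],
    "overall_score": [90, 100, 100],
    "intelligence_score": [50, 70, 95],
})

# The second key only breaks ties in the first key.
# Descending order for intelligence_score is an assumption.
df_sorted = df.sort_values(
    by=["overall_score", "intelligence_score"],
    ascending=[False, False],
)

print(df_sorted)
```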

codevalidated

Filter Superheroes with Specific Superpowers

Filter the dataset to find superheroes who have both super speed and super strength. Store the filtered dataframe in df_superpowers.
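
How superpowers are stored is not described here. The sketch below assumes a hypothetical superpowers column holding a comma-separated string, and the labels "Super Speed" and "Super Strength"; the real dataset may use a different layout.

```python
import pandas as pd

# Hypothetical layout: one comma-separated string of powers per hero.
df = pd.DataFrame({
    "name": ["Hero A", "Hero B", "Hero C"],
    "superpowers": [
        "Super Speed, Super Strength, Flight",
        "Super Strength",
        "Super Speed, Telepathy",
    ],
})

# Plain (non-regex), case-insensitive substring checks for each power.
has_speed = df["superpowers"].str.contains("Super Speed", case=False, regex=False)
has_strength = df["superpowers"].str.contains("Super Strength", case=False, regex=False)

# Keep only the heroes that have both powers.
df_superpowers = df[has_speed & has_strength]

print(df_superpowers)
```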

codevalidated

Group by Creator and Calculate Average Scores

Group the dataset by creator and calculate the average overall_score for each creator. Store the result in a variable named average_scores_by_creator.
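
A sketch of the group-and-average step on a hypothetical sample. The final line shows one way to read off the creator with the highest average, using idxmax().

```python
import pandas as pd

# Hypothetical sample scores.
df = pd.DataFrame({
    "name": ["Hero A", "Hero B", "Hero C"],
    "creator": ["Marvel Comics", "Marvel Comics", "DC Comics"],
    "overall_score": [10, 20, 30],
})

# Result is a Series indexed by creator.
average_scores_by_creator = df.groupby("creator")["overall_score"].mean()

# idxmax() returns the index label (the creator) of the largest average.
top_creator = average_scores_by_creator.idxmax()

print(average_scores_by_creator)
print(top_creator)
```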

input

Identify the Creator Whose Characters Have the Highest Average Overall Score

codevalidated

Analyze Race Based on Intelligence

Group the dataset by race and calculate the average intelligence_score for each race. Store the result in race_intelligence.
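
A sketch that also sorts the averages so the highest-scoring race appears first. It assumes the race column is the renamed race_type; the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample; race_type as the grouping column is an assumption.
df = pd.DataFrame({
    "race_type": ["Human", "Human", "Kryptonian", "Mutant", "Mutant"],
    "intelligence_score": [60, 80, 90, 75, 85],
})

# Average intelligence per race, sorted from highest to lowest.
race_intelligence = (
    df.groupby("race_type")["intelligence_score"]
    .mean()
    .sort_values(ascending=False)
)

print(race_intelligence)
```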

multiplechoice

Which of the following is the most intelligent superhero race?

codevalidated

Discretize Overall Scores

Create a function called categorize_score that categorizes superheroes based on their overall_score into five categories:

  • 0-50 (Very Low)
  • 51-100 (Low)
  • 101-150 (Moderate)
  • 151-200 (High)
  • 201-250 (Very High)

Then, create a new column named score_category to store these categories.
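
The categories above can be sketched as a chain of comparisons. The example assumes every overall_score lies between 0 and 250, so anything above 200 falls into the last category.

```python
import pandas as pd

def categorize_score(score):
    """Map an overall_score to one of the five categories listed above."""
    if score <= 50:
        return "Very Low"
    elif score <= 100:
        return "Low"
    elif score <= 150:
        return "Moderate"
    elif score <= 200:
        return "High"
    else:
        # Assumes scores never exceed 250.
        return "Very High"

# Hypothetical scores chosen to sit on the category boundaries.
df = pd.DataFrame({"overall_score": [0, 50, 51, 150, 151, 250]})

df["score_category"] = df["overall_score"].apply(categorize_score)

print(df)
```

pd.cut with explicit bin edges would be an alternative, but the activity asks for a function.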

codevalidated

Create a Power Index

Use the apply function to create a new column power_index, which is the sum of intelligence_score, speed_score, and durability_score.
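
A sketch using a row-wise apply on a hypothetical two-row sample.

```python
import pandas as pd

# Hypothetical sample scores.
df = pd.DataFrame({
    "name": ["Hero A", "Hero B"],
    "intelligence_score": [80, 60],
    "speed_score": [50, 90],
    "durability_score": [70, 40],
})

# axis=1 passes each row to the function as a Series.
df["power_index"] = df.apply(
    lambda row: row["intelligence_score"] + row["speed_score"] + row["durability_score"],
    axis=1,
)

print(df[["name", "power_index"]])
```

Summing the three columns directly would be faster on large data; apply is used here because the activity asks for it.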

codevalidated

Count the Number of Superpowers

Use apply to count the number of superpowers each superhero has and add it as a new column num_superpowers.
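
The storage format for superpowers is not given here. This sketch assumes a hypothetical superpowers column with a comma-separated string per hero and no missing values; a dataset with one column per power would need a different count.

```python
import pandas as pd

# Hypothetical layout: comma-separated powers, empty string for none.
df = pd.DataFrame({
    "name": ["Hero A", "Hero B", "Hero C"],
    "superpowers": ["Super Speed, Flight, Telepathy", "Super Strength", ""],
})

def count_powers(powers):
    # Split on commas and ignore empty pieces so "" counts as zero powers.
    return len([p for p in powers.split(",") if p.strip()])

df["num_superpowers"] = df["superpowers"].apply(count_powers)

print(df[["name", "num_superpowers"]])
```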

codevalidated

Visualize Overall Score Distribution

Create a histogram to visualize the distribution of overall_score. Store the plot in a variable named overall_score_histogram.
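
A sketch using the pandas plotting interface, which relies on matplotlib. The bin count, labels, and sample scores are illustrative choices, and the Agg backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import pandas as pd

# Hypothetical sample scores.
df = pd.DataFrame({"overall_score": [12, 35, 48, 77, 90, 110, 145, 160, 210, 240]})

# Series.plot returns a matplotlib Axes object, stored under the requested name.
overall_score_histogram = df["overall_score"].plot(
    kind="hist",
    bins=5,
    title="Distribution of overall_score",
)
overall_score_histogram.set_xlabel("overall_score")
```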

codevalidated

Create a Correlation Matrix

Generate a correlation matrix to explore the relationships between different scores like intelligence_score, speed_score, and durability_score. Store the correlation matrix in correlation_matrix.
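
A sketch restricted to the three score columns named above, on hypothetical values. corr() computes pairwise Pearson correlation by default.

```python
import pandas as pd

score_cols = ["intelligence_score", "speed_score", "durability_score"]

# Hypothetical sample scores.
df = pd.DataFrame({
    "intelligence_score": [50, 60, 70, 80],
    "speed_score": [20, 45, 55, 90],
    "durability_score": [80, 60, 65, 30],
})

# The result is a square DataFrame with the columns as both index and columns.
correlation_matrix = df[score_cols].corr()

print(correlation_matrix.round(2))
```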

multiplechoice

Which two of the following columns have the highest correlation?

codevalidated

Scatter Plot of Height vs Power Score

Create a scatter plot of height against power_score to explore any potential relationship. Store the plot in height_vs_power_plot.

codevalidated

Analyze Superpowers Across Genders

Group the dataset by gender and analyze the average number of superpowers each gender possesses. Store the result in gender_superpowers.

codevalidated

Analyze Alignment Distribution

Create a pie chart showing the distribution of alignments (Good, Bad, etc.) across the dataset. Store the chart in alignment_pie_chart.
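
A sketch that counts each alignment and plots the counts as a pie chart. The column name alignment and the sample labels are assumptions, and the Agg backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import pandas as pd

# Hypothetical sample; column name and labels are assumptions.
df = pd.DataFrame({
    "alignment": ["Good", "Good", "Bad", "Good", "Neutral", "Bad"],
})

# value_counts() returns one count per distinct alignment.
alignment_counts = df["alignment"].value_counts()

# Series.plot returns a matplotlib Axes object.
alignment_pie_chart = alignment_counts.plot(kind="pie", autopct="%1.1f%%")
alignment_pie_chart.set_ylabel("")

# The most common alignment can also be read from the counts directly.
majority_alignment = alignment_counts.idxmax()
print(majority_alignment)
```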

multiplechoice

Which alignment do the majority of superheroes have?

Answer this question by looking at the pie chart plotted in the 25th activity.

codevalidated

Visualize Height Distribution

Visualize the distribution of height using a boxplot. Store the result in height_boxplot.

Project Created by

Lohith Unnam

I'm an undergraduate student majoring in Computer Science and Engineering with a focus on Artificial Intelligence and Machine Learning. I have a strong passion for programming and AI. I aim to make meaningful contributions to the field of AI.


This project is part of

Data Wrangling with Pandas
