All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Rename the columns in the dataset for consistency. Change type_race to race_type and has_immortality to is_immortal.
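A minimal sketch of the renaming step, assuming the data is loaded in a pandas dataframe named df. The toy rows below are made up for illustration and are not part of the real dataset.

```python
import pandas as pd

# Toy stand-in for the superheroes dataset (values are invented).
df = pd.DataFrame({
    "name": ["Hero A", "Hero B"],
    "type_race": ["Human", "Alien"],
    "has_immortality": [0, 1],
})

# rename() returns a new dataframe, so assign the result back to df.
df = df.rename(columns={
    "type_race": "race_type",
    "has_immortality": "is_immortal",
})
```

Columns not listed in the mapping are left unchanged.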
Transform the binary values in the ability-related columns has_shapeshifting, has_telepathy, has_regeneration, is_immortal, and has_teleportation into descriptive labels. Replace 1 with Yes and 0 with No.
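One possible way to do this is to map each ability column through a small dictionary. This is a sketch on invented data, and it assumes the columns hold only the integers 1 and 0.

```python
import pandas as pd

# Toy data: each ability column contains only 1 or 0 (assumption).
df = pd.DataFrame({
    "has_shapeshifting": [1, 0],
    "has_telepathy": [0, 0],
    "has_regeneration": [1, 1],
    "is_immortal": [0, 1],
    "has_teleportation": [1, 0],
})

ability_cols = [
    "has_shapeshifting",
    "has_telepathy",
    "has_regeneration",
    "is_immortal",
    "has_teleportation",
]

# map() replaces each value by its label; values missing from the
# dictionary would become NaN, so check for unexpected codes first.
labels = {1: "Yes", 0: "No"}
for col in ability_cols:
    df[col] = df[col].map(labels)
```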
Select the first 10 rows of the dataframe df and store them in a variable named first_ten_rows.
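A short sketch using head(), with a made-up dataframe standing in for df:

```python
import pandas as pd

# Toy dataframe with 25 rows (the id column is hypothetical).
df = pd.DataFrame({"id": range(25)})

# head(10) returns the first 10 rows without modifying df.
first_ten_rows = df.head(10)
```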
Drop columns that have more than 30% of their values missing.
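A sketch of one way to apply the 30% threshold: compute the share of missing values per column, then keep only the columns at or below the limit. The column names and values below are invented.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical columns).
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "height": [180.0, np.nan, 175.0, 190.0, 168.0],  # 20% missing
    "weight": [np.nan, np.nan, 80.0, np.nan, 70.0],  # 60% missing
})

# isna().mean() gives the fraction of missing values in each column.
missing_share = df.isna().mean()

# Keep columns whose missing share does not exceed 30%.
df = df.loc[:, missing_share <= 0.30]
```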
Clean the dataset by dropping all rows containing any NaN values. After removing these rows, reset the df's index to ensure it remains sequential.
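The two steps can be chained, as in this sketch on invented data. Passing drop=True keeps the old index from being added back as a column.

```python
import numpy as np
import pandas as pd

# Toy data: the rows at positions 1 and 2 each contain a missing value.
df = pd.DataFrame({
    "name": ["A", "B", None, "D"],
    "overall_score": [10.0, np.nan, 30.0, 40.0],
})

# Drop rows with any NaN, then renumber the index from 0.
df = df.dropna().reset_index(drop=True)
```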
Extract the numeric height values in centimeters from the mixed string formats and convert them to the int data type.
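The exact string formats are not shown here, so the following is only a sketch. It assumes, hypothetically, that every value contains a whole number followed by "cm", such as "6'2 • 188 cm" or "175 cm", and that the column is named height.

```python
import pandas as pd

# Invented examples of mixed height strings (assumed format).
df = pd.DataFrame({"height": ["6'2 • 188 cm", "175 cm", "5'10 • 178 cm"]})

# Capture the digits that come right before "cm".
cm = df["height"].str.extract(r"(\d+)\s*cm", expand=False)

# Convert the captured text to integers. Rows that do not match the
# pattern would be NaN and must be handled before casting to int.
df["height"] = pd.to_numeric(cm).astype(int)
```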
Extract all superheroes who belong to the human race from the race_type column. Store the resultant dataframe in df_main_race.
Filter the dataset to include only those superheroes who have a Good alignment and are part of Marvel Comics. Store the resultant dataframe in marvel_good_alignment.
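A sketch of combining two conditions with a boolean mask. The column names alignment and creator are assumptions, as are the sample rows.

```python
import pandas as pd

# Toy data (column names and values are assumed for illustration).
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "alignment": ["Good", "Bad", "Good", "Good"],
    "creator": ["Marvel Comics", "Marvel Comics", "DC Comics", "Marvel Comics"],
})

# Each condition needs its own parentheses; & means "and" element-wise.
is_good = df["alignment"] == "Good"
is_marvel = df["creator"] == "Marvel Comics"
marvel_good_alignment = df[is_good & is_marvel]
```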
Sort df first by overall_score in descending order, then by intelligence_score. Store the sorted dataframe in df_sorted.
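A sketch of a two-key sort on invented data. The direction for intelligence_score is not stated, so descending order is assumed for both keys here.

```python
import pandas as pd

# Toy data (invented values).
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "overall_score": [50, 90, 90],
    "intelligence_score": [70, 60, 80],
})

# Ties in overall_score are broken by intelligence_score.
# ascending takes one flag per sort key (both descending: assumption).
df_sorted = df.sort_values(
    by=["overall_score", "intelligence_score"],
    ascending=[False, False],
)
```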
Filter the dataset to find superheroes who have both super speed and super strength. Store the filtered dataframe in df_superpowers.
Group the dataset by creator and calculate the average overall_score for each creator. Store the result in a variable named average_scores_by_creator.
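A sketch of the group-and-average pattern on invented data:

```python
import pandas as pd

# Toy data (invented values).
df = pd.DataFrame({
    "creator": ["Marvel Comics", "DC Comics", "Marvel Comics", "DC Comics"],
    "overall_score": [10, 20, 30, 40],
})

# The result is a Series indexed by creator.
average_scores_by_creator = df.groupby("creator")["overall_score"].mean()
```

The same pattern applies to any other grouping column and score column.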
Group the dataset by race and calculate the average intelligence_score for each race. Store the result in race_intelligence.
Create a function called categorize_score that categorizes superheroes based on their overall_score into five categories:
Then, create a new column named score_category to store these categories.
Use the apply function to create a new column power_index, which is the sum of intelligence_score, speed_score, and durability_score.
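A sketch of a row-wise apply on invented data. Passing axis=1 makes apply call the function once per row.

```python
import pandas as pd

# Toy data (invented values).
df = pd.DataFrame({
    "intelligence_score": [80, 60],
    "speed_score": [50, 90],
    "durability_score": [70, 40],
})

# Sum the three scores for each row.
df["power_index"] = df.apply(
    lambda row: row["intelligence_score"]
    + row["speed_score"]
    + row["durability_score"],
    axis=1,
)
```

For a plain sum, selecting the three columns and calling sum(axis=1) gives the same result and is usually faster, but the activity asks for apply.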
Use apply to count the number of superpowers each superhero has and add it as a new column num_superpowers.
Create a histogram to visualize the distribution of overall_score. Store the plot in a variable named overall_score_histogram.
Generate a correlation matrix to explore the relationships between different scores, such as intelligence_score, speed_score, and durability_score. Store the correlation matrix in correlation_matrix.
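A sketch using corr(), which computes pairwise Pearson correlations by default. The values below are invented so that the result is easy to check by eye.

```python
import pandas as pd

# Toy data: speed rises with intelligence, durability falls with it.
df = pd.DataFrame({
    "intelligence_score": [10, 20, 30, 40],
    "speed_score": [20, 40, 60, 80],
    "durability_score": [40, 30, 20, 10],
})

score_cols = ["intelligence_score", "speed_score", "durability_score"]

# The result is a square dataframe with the score names on both axes.
correlation_matrix = df[score_cols].corr()
```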
Create a scatter plot of height against power_score to explore any potential relationship. Store the plot in height_vs_power_plot.
Group the dataset by gender and analyze the average number of superpowers each gender possesses. Store the result in gender_superpowers.
Create a pie chart showing the distribution of alignments (Good, Bad, etc.) across the dataset. Store the chart in alignment_pie_chart.
Answer this question by looking at the pie chart plotted in the 25th activity.
Visualize the distribution using a boxplot. Store the result in height_boxplot.