All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Use a vectorized operation to compute the mean age of all passengers in df.
Enter the exact value returned.
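One possible approach, sketched on a tiny made-up DataFrame rather than the project's real data (the four ages below are invented for illustration, so the printed value is not the answer to the activity):

```python
import pandas as pd

# Stand-in for the project's df: invented rows, not the real passenger data.
df = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0]})

# Series.mean() is vectorized: it aggregates the whole column at once,
# with no explicit Python loop, and skips NaN values by default.
mean_age_value = df["Age"].mean()
print(mean_age_value)  # 30.25 for this stand-in data
```

On the real dataset the same call returns the value the activity asks for.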
Create a boolean series named is_survivor using the Survived column, indicating whether each passenger survived.
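A minimal sketch of one way to build the mask. It assumes Survived is coded as 1 for survivors and 0 otherwise, which the activity text does not state, and it uses invented rows in place of the real df:

```python
import pandas as pd

# Stand-in for the project's df: invented rows.
df = pd.DataFrame({"Survived": [0, 1, 1, 0]})

# Assumption: 1 means the passenger survived, 0 means they did not.
# The comparison is applied element-wise and yields a boolean Series.
is_survivor = df["Survived"] == 1
print(is_survivor.tolist())  # [False, True, True, False]
```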
Your result should look something like this:
df['Age'] = df['Age'].fillna(df['Age'].mean())
Calculate the mean age from the Age column and store it in a variable named mean_age.
Create a series named age_difference to store each passenger's age difference from mean_age.
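The two steps might look like the sketch below. The direction of the subtraction (age minus mean) is an assumption, since the task only says "difference from the mean", and the ages are invented:

```python
import pandas as pd

# Stand-in for the project's df: invented rows.
df = pd.DataFrame({"Age": [20.0, 30.0, 40.0]})

mean_age = df["Age"].mean()  # 30.0 for this stand-in data

# Subtracting a scalar from a Series broadcasts it to every row.
# Assumption: positive values mean "older than average".
age_difference = df["Age"] - mean_age
print(age_difference.tolist())  # [-10.0, 0.0, 10.0]
```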
Your result would look something like this:
Calculate the minimum and maximum values from the Fare column and store them in variables named fare_min and fare_max.
Next, create a series named normalized_fare to store the normalized fare values, calculated using the formula:
(fare - fare_min) / (fare_max - fare_min).
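As a sketch, min-max normalization with the variable names from the task could be written as follows (the fares are invented stand-ins):

```python
import pandas as pd

# Stand-in for the project's df: invented rows.
df = pd.DataFrame({"Fare": [10.0, 20.0, 50.0]})

fare_min = df["Fare"].min()
fare_max = df["Fare"].max()

# Min-max scaling: the smallest fare maps to 0 and the largest to 1.
normalized_fare = (df["Fare"] - fare_min) / (fare_max - fare_min)
print(normalized_fare.tolist())  # [0.0, 0.25, 1.0]
```

If every fare were identical, fare_max - fare_min would be zero and the result would be NaN, so a guard may be worth adding in that case.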
Your result would look something like this:
Create a series named family_size by calculating the total family size for each passenger.
This is done by summing the values from the Siblings/Spouses Aboard and Parents/Children Aboard columns, and adding 1 (to include the passenger themselves).
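The sum described above can be sketched like this, again on invented rows rather than the real df:

```python
import pandas as pd

# Stand-in for the project's df: invented rows.
df = pd.DataFrame({
    "Siblings/Spouses Aboard": [1, 0, 3],
    "Parents/Children Aboard": [0, 0, 2],
})

# Column-wise addition is element-wise; the + 1 counts the passenger themselves.
family_size = df["Siblings/Spouses Aboard"] + df["Parents/Children Aboard"] + 1
print(family_size.tolist())  # [2, 1, 6]
```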
Your result would look something like this:
Calculate the fare per family member by dividing the Fare column by family_size.
Store the result in a series named fare_per_family_member.
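A possible sketch of the division. It recomputes family_size so the snippet runs on its own, and the rows are invented:

```python
import pandas as pd

# Stand-in for the project's df: invented rows.
df = pd.DataFrame({
    "Fare": [20.0, 7.5, 60.0],
    "Siblings/Spouses Aboard": [1, 0, 3],
    "Parents/Children Aboard": [0, 0, 2],
})

# Recomputed here only to keep the snippet self-contained.
family_size = df["Siblings/Spouses Aboard"] + df["Parents/Children Aboard"] + 1

# Series / Series divides row by row, aligned on the index.
fare_per_family_member = df["Fare"] / family_size
print(fare_per_family_member.tolist())  # [10.0, 7.5, 10.0]
```

Because family_size is always at least 1, this division cannot hit a zero denominator.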
Your result would look something like this:
Create a series named fare_weight by dividing the Fare values by the maximum fare value.
Then, calculate the weighted age by multiplying the Age column by fare_weight and store the result in a series named weighted_age.
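One way this could look, using invented ages and fares in place of the real df:

```python
import pandas as pd

# Stand-in for the project's df: invented rows.
df = pd.DataFrame({
    "Age": [20.0, 30.0, 40.0],
    "Fare": [10.0, 20.0, 40.0],
})

# Each fare as a fraction of the largest fare, so weights fall in [0, 1].
fare_weight = df["Fare"] / df["Fare"].max()

# Element-wise product of two Series.
weighted_age = df["Age"] * fare_weight
print(fare_weight.tolist())   # [0.25, 0.5, 1.0]
print(weighted_age.tolist())  # [5.0, 15.0, 40.0]
```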
Your result would look something like this:
Sort the Fare column in ascending order and store it in a series named sorted_fares.
Then, calculate the cumulative fare percentage by taking the cumulative sum of sorted_fares and dividing it by the total fare sum.
Multiply the result by 100 to express it as a percentage.
Store the final result in a series named cumulative_fare_percentage.
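The steps above can be sketched as follows. The fares are invented, and the sketch keeps the original index labels after sorting, which is an assumption about what the activity expects:

```python
import pandas as pd

# Stand-in for the project's df: invented rows.
df = pd.DataFrame({"Fare": [4.0, 1.0, 3.0]})

# sort_values() is ascending by default and keeps the original index labels.
sorted_fares = df["Fare"].sort_values()

# Running total divided by the grand total, scaled to a percentage.
cumulative_fare_percentage = sorted_fares.cumsum() / sorted_fares.sum() * 100
print(cumulative_fare_percentage.tolist())  # [12.5, 50.0, 100.0]
```

The last value is always 100, since the running total ends at the grand total.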
Your result would look something like this:
Calculate the first (Q1) and third (Q3) quartiles of the Fare column.
Then, compute the interquartile range (IQR) as Q3 - Q1.
Identify any outliers where the fare is less than Q1 - 1.5 * IQR or greater than Q3 + 1.5 * IQR.
Store the result as a boolean series named is_fare_outlier.
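A minimal sketch of the IQR rule, on five invented fares chosen so that one value is clearly extreme. The names q1, q3, and iqr are illustrative and not required by the activity:

```python
import pandas as pd

# Stand-in for the project's df: invented rows.
df = pd.DataFrame({"Fare": [10.0, 12.0, 14.0, 16.0, 200.0]})

q1 = df["Fare"].quantile(0.25)  # 12.0 for this stand-in data
q3 = df["Fare"].quantile(0.75)  # 16.0 for this stand-in data
iqr = q3 - q1

# A fare is flagged when it falls outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
is_fare_outlier = (df["Fare"] < q1 - 1.5 * iqr) | (df["Fare"] > q3 + 1.5 * iqr)
print(is_fare_outlier.tolist())  # [False, False, False, False, True]
```

The parentheses around each comparison are required, because | binds more tightly than < and > in Python.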
Your result would look something like this:
Sort the dataset by its index and then calculate the rolling average of the Fare over a window of 10 rows.
The minimum number of periods required for calculation is 1.
Store the result in a series named rolling_average_fare.
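As a sketch, the rolling average could be computed like this. The rows and their shuffled index are invented to show the effect of sorting first:

```python
import pandas as pd

# Stand-in for the project's df: invented rows with an out-of-order index.
df = pd.DataFrame({"Fare": [30.0, 10.0, 20.0]}, index=[2, 0, 1])

# sort_index() restores index order; rolling(10, min_periods=1) averages
# up to the 10 most recent rows and emits a value from the first row on.
rolling_average_fare = (
    df.sort_index()["Fare"].rolling(window=10, min_periods=1).mean()
)
print(rolling_average_fare.tolist())  # [10.0, 15.0, 20.0]
```

Without min_periods=1, the first nine rows would be NaN, because a full window of 10 values would not yet be available.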
Your result would look something like this: