All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Modify the tracks_df DataFrame directly by replacing the missing values in the Duration_ms, Loudness, Speechiness, Energy, and Tempo columns with their respective means.
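One possible approach is sketched below. The sample rows are made up purely for illustration and stand in for the real tracks_df; only the column names come from the activity text.

```python
import numpy as np
import pandas as pd

# Made-up sample rows standing in for the real tracks_df (assumption).
tracks_df = pd.DataFrame({
    "Duration_ms": [200000.0, np.nan, 240000.0],
    "Loudness": [-6.0, -8.0, np.nan],
    "Speechiness": [0.05, np.nan, 0.15],
    "Energy": [np.nan, 0.6, 0.8],
    "Tempo": [120.0, 100.0, np.nan],
})

cols = ["Duration_ms", "Loudness", "Speechiness", "Energy", "Tempo"]

# DataFrame.mean() skips NaN by default and returns one mean per column.
# fillna() with that Series fills each column with its own mean (matched by label).
# Assigning back to tracks_df[cols] modifies the DataFrame directly.
tracks_df[cols] = tracks_df[cols].fillna(tracks_df[cols].mean())
```

Assigning the result back to the same columns keeps the change on tracks_df itself rather than on a copy.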
Using the Duration_ms column, we want to discretize the durations of our tracks into three categories:
Short tracks: between 0 and 180000 ms.
Medium tracks: between 180000 and 300000 ms.
Long tracks: above 300000 ms.
Create the categories and store them in a new column, Duration_Category. It should look similar to the example output.
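A minimal sketch of this binning with pandas.cut follows, using made-up durations. The activity text does not say which category a boundary value such as exactly 180000 ms belongs to; this sketch assumes pandas' default right-closed intervals, so 180000 counts as Short.

```python
import numpy as np
import pandas as pd

# Made-up sample durations standing in for the real tracks_df (assumption).
tracks_df = pd.DataFrame({"Duration_ms": [150000, 180000, 240000, 300000, 420000]})

# np.inf leaves the last bin open-ended so that "above 300000" has no upper cap.
bins = [0, 180000, 300000, np.inf]
labels = ["Short", "Medium", "Long"]

# Default right=True gives the intervals (0, 180000], (180000, 300000], (300000, inf].
tracks_df["Duration_Category"] = pd.cut(
    tracks_df["Duration_ms"], bins=bins, labels=labels
)
```

With these defaults a duration of exactly 0 would fall outside the first interval and become NaN; passing include_lowest=True would include it.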
Using the Tempo column, we want to discretize the tempo of our tracks into three categories:
Slow tracks: between 0 and 100 bpm.
Medium tracks: between 100 and 140 bpm.
Fast tracks: above 140 bpm.
Create the categories and store them in a new column, Tempo_Category. It should look similar to the example output.
Using the Views column, we want to discretize the views of our tracks into two categories:
Non-Viral tracks: between 0 and 1,000,000 views.
Viral tracks: above 1,000,000 views.
Create the categories and store them in a new column, Viral_Category. It should look similar to the example output.
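Because there are only two categories, a simple threshold test is one alternative to interval binning. The sketch below uses numpy.where on made-up view counts; it is not necessarily the approach the activity's own solution takes, and it assumes that exactly 1,000,000 views counts as Non-Viral.

```python
import numpy as np
import pandas as pd

# Made-up sample view counts standing in for the real tracks_df (assumption).
tracks_df = pd.DataFrame({"Views": [5000, 1000000, 1000001, 250000000]})

# Strictly more than 1,000,000 views is labeled "Viral"; everything else "Non-Viral".
# Caveat: a missing (NaN) view count compares as False and would also be
# labeled "Non-Viral" here, whereas pd.cut would leave it as NaN.
tracks_df["Viral_Category"] = np.where(
    tracks_df["Views"] > 1_000_000, "Viral", "Non-Viral"
)
```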
Store the generated chart in a variable named viral_tempo_bar_chart and the organized data in another variable named viral_tempo_counts.
Notes:
Please complete the previous activities before attempting this one.
Your chart should be a stacked bar chart and have a figure size of (10, 6).
Your chart should resemble the example chart.
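A hedged sketch of one way to produce both variables is shown below. The sample rows are made up, and putting Tempo_Category on the x-axis with Viral_Category as the stacked segments is an assumption, since the activity text does not state the orientation.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Made-up sample rows; the two category columns come from the earlier activities.
tracks_df = pd.DataFrame({
    "Viral_Category": ["Viral", "Viral", "Non-Viral", "Non-Viral", "Viral"],
    "Tempo_Category": ["Slow", "Fast", "Medium", "Slow", "Fast"],
})

# Count tracks per (tempo, viral) pair, then pivot the viral labels into columns.
# fill_value=0 keeps combinations that never occur as zero counts instead of NaN.
viral_tempo_counts = (
    tracks_df.groupby(["Tempo_Category", "Viral_Category"])
    .size()
    .unstack(fill_value=0)
)

# DataFrame.plot returns a matplotlib Axes; stacked=True stacks the columns per bar.
viral_tempo_bar_chart = viral_tempo_counts.plot(
    kind="bar", stacked=True, figsize=(10, 6)
)
viral_tempo_bar_chart.set_xlabel("Tempo category")
viral_tempo_bar_chart.set_ylabel("Number of tracks")
```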
Store the resulting chart in the variable duration_viral_bar_chart and the grouped data in the variable duration_viral_counts.
Notes:
Ensure completion of previous activities before attempting this one.
Construct a stacked bar chart with a figure size of (10, 6).
Your resulting chart should resemble the example chart.
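The same kind of count table can also be built with pandas.crosstab, sketched below on made-up rows. Using Duration_Category for the bars and Viral_Category for the stacked segments is an assumption, as is the explicit Short, Medium, Long row order.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Made-up sample rows; the two category columns come from the earlier activities.
tracks_df = pd.DataFrame({
    "Duration_Category": ["Short", "Medium", "Medium", "Long", "Medium"],
    "Viral_Category": ["Non-Viral", "Viral", "Non-Viral", "Viral", "Viral"],
})

# crosstab counts how often each (duration, viral) pair occurs.
duration_viral_counts = pd.crosstab(
    tracks_df["Duration_Category"], tracks_df["Viral_Category"]
)

# Plain string labels sort alphabetically, so restore the natural order explicitly.
duration_viral_counts = duration_viral_counts.reindex(["Short", "Medium", "Long"])

duration_viral_bar_chart = duration_viral_counts.plot(
    kind="bar", stacked=True, figsize=(10, 6)
)
```

If Duration_Category is already an ordered categorical (as pd.cut produces), the reindex step may be unnecessary.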
Store the resulting dummy variables in the album_type_dummies variable. Be sure to prefix each variable with Track. Don't forget to convert the dtype of each column in the dummy variables to bool.
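A minimal sketch with pandas.get_dummies follows. The source column name Album_type and the sample values are assumptions made for illustration; the activity text names only the output variable and the Track prefix.

```python
import pandas as pd

# "Album_type" is an assumed column name, and the values are made up.
tracks_df = pd.DataFrame({"Album_type": ["album", "single", "album", "compilation"]})

# prefix="Track" with the default separator "_" yields columns such as "Track_album".
# astype(bool) makes the dtype explicit, since older pandas versions return uint8.
album_type_dummies = pd.get_dummies(tracks_df["Album_type"], prefix="Track").astype(bool)
```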
Using the Loudness column, we want to discretize the loudness of our tracks into five categories:
Very Low tracks: between -50 and -35 dB.
Low tracks: between -35 and -20 dB.
Moderate tracks: between -20 and -5 dB.
High tracks: between -5 and 10 dB.
Very High tracks: above 10 dB.
Create the categories and store them in a new column, Loudness_Category. It should look similar to the example output.
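As a sketch under the same assumptions as before (made-up sample values, pandas' default right-closed intervals), the five loudness bands can be expressed as bin edges for pandas.cut:

```python
import numpy as np
import pandas as pd

# Made-up sample loudness values in dB (assumption).
tracks_df = pd.DataFrame({"Loudness": [-40.0, -25.0, -7.5, -3.0, 12.0]})

# Six edges define five bins; np.inf keeps "above 10 dB" open-ended.
bins = [-50, -35, -20, -5, 10, np.inf]
labels = ["Very Low", "Low", "Moderate", "High", "Very High"]

tracks_df["Loudness_Category"] = pd.cut(
    tracks_df["Loudness"], bins=bins, labels=labels
)

# pd.cut returns an ordered categorical, so sorting follows the bin order
# (Very Low < Low < ... < Very High) rather than alphabetical order.
sorted_categories = tracks_df["Loudness_Category"].sort_values(ascending=False)
```

Values below -50 dB would fall outside every bin and become NaN with these edges.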
Write your answer in the input box below as an integer.
Generate dummy variables for the 'Artist' column. Each of these variables must be prefixed with 'Genre', using a colon ':' as the separator. Ensure the resultant data is stored in a new variable named genres_dummies. Remember to convert the dtype of each column in the dummy variables to bool.
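The separator is controlled by the prefix_sep parameter of pandas.get_dummies. A minimal sketch, with placeholder artist names invented for illustration:

```python
import pandas as pd

# Placeholder artist names (assumption); the real tracks_df has its own values.
tracks_df = pd.DataFrame({"Artist": ["Artist B", "Artist A", "Artist B"]})

# prefix="Genre" and prefix_sep=":" produce column names like "Genre:Artist A".
genres_dummies = pd.get_dummies(
    tracks_df["Artist"], prefix="Genre", prefix_sep=":"
).astype(bool)
```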
Store the categorized results in a new column titled Speechiness_Quantile. The quantiles should be labeled Q1, Q2, Q3, Q4, Q5.
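Five labels suggest five equal-frequency bins. The sketch below assumes pandas.qcut on the Speechiness column, which the column name implies but the surviving text does not state outright; the sample values are made up.

```python
import pandas as pd

# Made-up sample speechiness values (assumption).
tracks_df = pd.DataFrame({
    "Speechiness": [0.03, 0.04, 0.05, 0.06, 0.08, 0.10, 0.15, 0.20, 0.30, 0.45]
})

# q=5 splits the values at the 20th, 40th, 60th and 80th percentiles,
# so each bin holds roughly the same number of tracks.
tracks_df["Speechiness_Quantile"] = pd.qcut(
    tracks_df["Speechiness"], q=5, labels=["Q1", "Q2", "Q3", "Q4", "Q5"]
)
```

Unlike pd.cut, the bin edges here come from the data itself. If many tracks share the same value, qcut may raise a ValueError about non-unique bin edges.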
Using the Energy column, we want to discretize the energy of our tracks into five categories:
Very Low tracks: between 0 and 0.2.
Low tracks: between 0.2 and 0.4.
Moderate tracks: between 0.4 and 0.6.
High tracks: between 0.6 and 0.8.
Very High tracks: above 0.8.
Create the categories and store them in a new column, Energy_Category. It should look similar to the example output.