All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
The data preview shows two additional columns that serve as an index. Drop them from the dataset.
Determine the number of unique genres in the dataset.
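A minimal sketch of one way to count the genres, assuming the column is named Genre; the sample rows are hypothetical and only stand in for the real dataset.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real dataset.
df = pd.DataFrame({"Genre": ["Rock", "Pop", "Rock", "Jazz", None]})

# nunique() counts distinct values and ignores missing values by default.
n_genres = df["Genre"].nunique()
print(n_genres)
```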
Add a new column called Album Length that contains the length (number of characters) of each album title.
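One possible approach, assuming the titles live in a column named Album (the sample titles are made up for illustration):

```python
import pandas as pd

# Hypothetical album titles.
df = pd.DataFrame({"Album": ["Thriller", "Back in Black", "21"]})

# .str.len() returns the number of characters in each string.
df["Album Length"] = df["Album"].str.len()
print(df)
```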
Start by creating a dictionary that maps French weekday abbreviations to their full English names. Then use the .apply() function along with a lambda function to replace the values in the WeekDay column with their English equivalents. Following this, use the .applymap() function to convert all string elements in the DataFrame to title case, ensuring a consistent and clean format across all textual data. Finally, display the first few rows of the updated DataFrame with df.head() to verify the changes.
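The steps above could be sketched as follows. The French abbreviations in the dictionary are an assumption, so check the actual values in the WeekDay column first. Because .applymap() was renamed to DataFrame.map() in pandas 2.1, the sketch picks whichever method the installed version provides.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Artist": ["daft punk", "stromae"],
    "WeekDay": ["lun", "ven"],
})

# Assumed abbreviations; the real dataset may spell them differently.
french_to_english = {
    "lun": "Monday", "mar": "Tuesday", "mer": "Wednesday",
    "jeu": "Thursday", "ven": "Friday", "sam": "Saturday", "dim": "Sunday",
}

# Replace each abbreviation; unknown values are left unchanged.
df["WeekDay"] = df["WeekDay"].apply(lambda d: french_to_english.get(d, d))

# Use DataFrame.map where available, otherwise fall back to applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap

# Title-case every string cell and leave non-string cells alone.
df = elementwise(lambda v: v.title() if isinstance(v, str) else v)
print(df.head())
```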
Create a new column by extracting the initials of each artist's name. Use the .apply() function on the Artist column to apply a custom lambda function that splits each artist's name into individual words, takes the first letter of each word, and then joins these letters together to form the initials.
For example, if the artist's name is John Doe, the new Artist Initials column should contain JD.
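A short sketch of the lambda described above, using made-up artist names. Note that it keeps each first letter in its original case.

```python
import pandas as pd

# Hypothetical artist names.
df = pd.DataFrame({"Artist": ["John Doe", "Florence and the Machine", "Adele"]})

# Split on whitespace, take the first character of each word, join them.
df["Artist Initials"] = df["Artist"].apply(
    lambda name: "".join(word[0] for word in name.split())
)
print(df)
```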
Define a function that categorizes each genre into one of the following groups: Urban, Rock, Pop, and Other. The function checks whether the genre belongs to a predefined category (Hip hop, R&B, or Electronic/EDM for Urban; Rock for Rock; Pop for Pop; anything else for Other) and returns the corresponding group name. Then use the .apply() function to apply this function to the Genre column, creating a new column called Genre Type that contains the broader genre classifications.
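One way this function might look. The name categorize_genre and the sample rows are illustrative choices, and the exact genre spellings in the real data should be checked.

```python
import pandas as pd

# Genres treated as Urban, as listed in the task description.
URBAN_GENRES = {"Hip hop", "R&B", "Electronic/EDM"}

def categorize_genre(genre):
    """Map a genre to Urban, Rock, Pop, or Other."""
    if genre in URBAN_GENRES:
        return "Urban"
    if genre == "Rock":
        return "Rock"
    if genre == "Pop":
        return "Pop"
    return "Other"

# Hypothetical sample rows.
df = pd.DataFrame({"Genre": ["Hip hop", "Rock", "Pop", "Jazz", "R&B"]})
df["Genre Type"] = df["Genre"].apply(categorize_genre)
print(df)
```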
Group the DataFrame by the Artist column and use .apply() with a lambda function to count the number of albums for each artist. Then sort this Series in descending order, so the artists with the most albums appear first. Store your final output in the variable: album_count_series.
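A possible sketch with hypothetical data. It selects the Album column before calling .apply(), which is an assumption about where the album titles live and keeps the lambda working on a single Series per artist.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Artist": ["A", "B", "A", "C", "A", "B"],
    "Album": ["a1", "b1", "a2", "c1", "a3", "b2"],
})

# Count albums per artist, then sort so the largest counts come first.
album_count_series = (
    df.groupby("Artist")["Album"]
    .apply(lambda albums: albums.count())
    .sort_values(ascending=False)
)
print(album_count_series)
```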
Define a function that checks if a given day name belongs to the weekend (Saturday or Sunday). Then use the .apply() function on the WeekDay column to apply this function element-wise. This results in a boolean mask where each row in the DataFrame is marked True if the corresponding WeekDay is a weekend day, and False otherwise. Use this mask to create a new DataFrame, filtered_df, that contains only the rows where albums were released on weekends.
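A minimal sketch, assuming WeekDay already holds full English day names. The function name is_weekend and the sample rows are illustrative.

```python
import pandas as pd

def is_weekend(day):
    """Return True for Saturday or Sunday."""
    return day in ("Saturday", "Sunday")

# Hypothetical sample rows.
df = pd.DataFrame({
    "Album": ["X", "Y", "Z"],
    "WeekDay": ["Friday", "Saturday", "Sunday"],
})

# Element-wise check produces a boolean Series aligned with df.
weekend_mask = df["WeekDay"].apply(is_weekend)

# Boolean indexing keeps only the rows where the mask is True.
filtered_df = df[weekend_mask]
print(filtered_df)
```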
Use the .apply() function with a lambda function to capitalize the first letter of each artist's name in the Artist column of the DataFrame.
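One way to do this is sketched below with made-up names. It uppercases only the first character; Python's str.capitalize() would also lowercase the rest of the string, which may not be wanted for names such as AC/DC.

```python
import pandas as pd

# Hypothetical artist names.
df = pd.DataFrame({"Artist": ["adele", "daft Punk", "AC/DC"]})

# Uppercase the first character and leave the remainder untouched.
df["Artist"] = df["Artist"].apply(lambda name: name[:1].upper() + name[1:])
print(df)
```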
Start by defining a function that takes a string of artists separated by commas and checks if there are multiple artists, indicating a collaboration. This function returns True if there is more than one artist in the list and False otherwise. Next, use the .apply() method to apply this function to the Artist column of the DataFrame df. This will create a new column, Collaboration, that holds True for rows where there is a collaboration and False otherwise. Finally, create a new DataFrame, collaborations_df, that contains only the rows from df where the Collaboration column is True.
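A sketch of these steps under the stated assumption that artists are separated by commas. The helper name is_collaboration and the sample rows are hypothetical.

```python
import pandas as pd

def is_collaboration(artists):
    """Return True if the comma-separated string names more than one artist."""
    names = [name for name in artists.split(",") if name.strip()]
    return len(names) > 1

# Hypothetical sample rows.
df = pd.DataFrame({"Artist": ["Jay-Z, Kanye West", "Adele", "Metallica"]})

df["Collaboration"] = df["Artist"].apply(is_collaboration)

# Keep only the rows flagged as collaborations.
collaborations_df = df[df["Collaboration"]]
print(collaborations_df)
```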
Create a mask using the .where() function on the Artist column of the DataFrame. The mask marks entries as True where the artist's name starts with the letter A, and False otherwise. Then filter the DataFrame using this mask to create a subset named a_artists containing only the rows where the mask is True, that is, the artists whose names start with A.
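Series.where() keeps values where a condition holds and replaces the rest with NaN, so it does not return True/False by itself. One possible reading of the task, shown here as an assumption rather than the intended solution, is to apply .where() and then turn the result into a boolean mask with .notna().

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Artist": ["Adele", "Beck", "Arcade Fire"],
    "Album": ["21", "Odelay", "Funeral"],
})

# Condition: the name starts with "A" (missing names count as False).
starts_with_a = df["Artist"].str.startswith("A", na=False)

# where() keeps matching names and puts NaN everywhere else.
kept = df["Artist"].where(starts_with_a)

# notna() converts that result into a True/False mask.
mask = kept.notna()

a_artists = df[mask]
print(a_artists)
```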
Modify the Label column in the DataFrame based on a condition using the .where() function. Specifically, apply .where() to check if each row in the Genre column does not equal Jazz. For rows where this condition is True (i.e., the Genre is not Jazz), the original Label value is retained. For rows where the condition is False (i.e., the Genre is Jazz), the Label is replaced with Ipecac.
For example, if an album's genre is Jazz and the label is Blue Note, the label will be replaced with Ipecac.
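A minimal sketch of this replacement with hypothetical rows. The second argument to .where() is the value used wherever the condition is False.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Genre": ["Jazz", "Rock"],
    "Label": ["Blue Note", "Atlantic"],
})

# Keep Label where Genre is not Jazz; otherwise substitute "Ipecac".
df["Label"] = df["Label"].where(df["Genre"] != "Jazz", "Ipecac")
print(df)
```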
Use the .where() function to conditionally modify the Album column in the DataFrame. For rows where the WeekDay is not Sunday, the original Album value is retained. For rows where the WeekDay is Sunday, the Album value is replaced with Special Release.
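The same pattern applies here. This sketch assumes WeekDay holds full English day names and uses made-up album titles.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Album": ["First", "Second", "Third"],
    "WeekDay": ["Sunday", "Monday", "Sunday"],
})

# Keep Album where the day is not Sunday; otherwise use "Special Release".
df["Album"] = df["Album"].where(df["WeekDay"] != "Sunday", "Special Release")
print(df)
```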
Create a new DataFrame that contains only the rows where the Genre of the album is Rock and the album was released in July. Use the existing DataFrame to filter for these criteria, drop any rows that contain NaN values resulting from the filtering process, and drop the Collaboration column. Store your output in the variable rock_df and show the first 5 rows of the new DataFrame once you've created it.
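A sketch of one possible solution. The format of the Release date column is not given, so ISO-style date strings are assumed here, and the sample rows are hypothetical. DataFrame.where() blanks out non-matching rows with NaN, which is why .dropna() follows it.

```python
import pandas as pd

# Hypothetical sample rows; the date format is an assumption.
df = pd.DataFrame({
    "Album": ["A", "B", "C"],
    "Genre": ["Rock", "Rock", "Pop"],
    "Release date": ["2010-03-01", "2011-07-15", "2012-07-20"],
    "Collaboration": [False, True, False],
})

# Extract the month number without modifying the original column.
months = pd.to_datetime(df["Release date"]).dt.month

# Rows that are Rock and released in July.
mask = (df["Genre"] == "Rock") & (months == 7)

# where() replaces non-matching rows with NaN; dropna() removes them.
rock_df = df.where(mask).dropna().drop(columns="Collaboration")
print(rock_df.head())
```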
Create a function that extracts the month from the release date and checks whether it falls in the first six months or the last six months of the year, returning FIRST HALF or SECOND HALF as appropriate. Apply this function to the Release date column of the DataFrame to generate a new column named Release Half. Then conditionally change the Genre value to SECOND HALF ROCK for rows that are in the Rock genre and have a release date in the second half of the year, leaving the other Genre values intact.
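These steps might be sketched as follows. The function name release_half is illustrative, and the date strings are assumed to be parseable by pd.to_datetime, which may not match the real column format.

```python
import pandas as pd

def release_half(date_string):
    """Return FIRST HALF for January-June, SECOND HALF for July-December."""
    month = pd.to_datetime(date_string).month
    return "FIRST HALF" if month <= 6 else "SECOND HALF"

# Hypothetical sample rows; the date format is an assumption.
df = pd.DataFrame({
    "Genre": ["Rock", "Rock", "Pop"],
    "Release date": ["2010-03-01", "2011-07-15", "2012-09-20"],
})

df["Release Half"] = df["Release date"].apply(release_half)

# Rows that are Rock and were released in the second half of the year.
second_half_rock = (df["Genre"] == "Rock") & (df["Release Half"] == "SECOND HALF")

# Keep Genre where the row is NOT second-half Rock; relabel the rest.
df["Genre"] = df["Genre"].where(~second_half_rock, "SECOND HALF ROCK")
print(df)
```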
Start by defining a list of major record labels (Columbia, Sony Music, Atlantic, Nuclear Blast, Rca, and Island), and a function to check if a given label is in that list. Then create a new column, Major Label, to flag records associated with these major labels. Finally, use the .where() function to create a new Series that retains the Genre values only if the record is from a major label and the genre is Pop, replacing all other values with NaN. Store your final output in the variable: series.
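A possible sketch with hypothetical rows. The helper name is_major_label is an illustrative choice, and the label spellings follow the list in the task description.

```python
import pandas as pd

# Major labels as listed in the task description.
MAJOR_LABELS = ["Columbia", "Sony Music", "Atlantic", "Nuclear Blast", "Rca", "Island"]

def is_major_label(label):
    """Return True if the label is one of the major labels."""
    return label in MAJOR_LABELS

# Hypothetical sample rows.
df = pd.DataFrame({
    "Genre": ["Pop", "Pop", "Rock"],
    "Label": ["Columbia", "Ipecac", "Atlantic"],
})

df["Major Label"] = df["Label"].apply(is_major_label)

# Keep Genre only for Pop records on a major label; NaN everywhere else.
series = df["Genre"].where(df["Major Label"] & (df["Genre"] == "Pop"))
print(series)
```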
Your task is to calculate the average album length by genre for all major labels. This will involve selecting only records associated with major labels, grouping the data by Genre, and then calculating the average Album Length for each group. Store your final output in the variable: average_lengths_major_labels.
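A minimal sketch, assuming the boolean Major Label column and the numeric Album Length column from the earlier activities already exist. The sample values are made up.

```python
import pandas as pd

# Hypothetical sample rows with the columns built in earlier activities.
df = pd.DataFrame({
    "Genre": ["Pop", "Pop", "Rock", "Rock"],
    "Album Length": [10, 20, 8, 30],
    "Major Label": [True, True, True, False],
})

# Select major-label records, group by genre, and average the lengths.
average_lengths_major_labels = (
    df[df["Major Label"]].groupby("Genre")["Album Length"].mean()
)
print(average_lengths_major_labels)
```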