All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
The data preview shows two extra columns that serve only as an index. Drop them from the dataset.
Determine the number of unique genres in the dataset.
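As a minimal sketch, assuming the genres live in a column named Genre, the count of distinct values could be obtained like this (the rows are invented for illustration):

```python
import pandas as pd

# Toy stand-in for the albums dataset; these rows are made up.
df = pd.DataFrame({"Genre": ["Rock", "Pop", "Rock", "Jazz", None]})

# nunique() counts distinct values and ignores missing ones by default.
n_genres = df["Genre"].nunique()
print(n_genres)
```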
Add a new column called Album Length that contains the length (number of characters) of each album title.
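One way this might look, assuming the titles are stored in the Album column (the sample titles are illustrative only):

```python
import pandas as pd

# Toy data; the real dataset has many more columns.
df = pd.DataFrame({"Album": ["Thriller", "Back in Black", "21"]})

# The .str accessor applies len() to every title; spaces count as characters.
df["Album Length"] = df["Album"].str.len()
print(df)
```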
Start by creating a dictionary that maps French weekday abbreviations to their full English names. Then use the .apply() function along with a lambda function to replace the values in the WeekDay column with their English equivalents. Following this, use the .applymap() function to convert all string elements in the DataFrame to title case, ensuring a consistent and clean format across all textual data. Finally, display the first few rows of the updated DataFrame with df.head() to verify the changes.
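The steps above can be sketched as follows. The abbreviations used as dictionary keys are assumptions, since the actual values in the WeekDay column are not shown here, and the sample rows are invented. Note that recent pandas releases (2.1 and later) deprecate .applymap() in favor of DataFrame.map(), so the sketch uses whichever one is available.

```python
import pandas as pd

# Toy data with inconsistent casing; rows are made up for illustration.
df = pd.DataFrame({
    "Artist": ["daft punk", "ARCADE FIRE"],
    "WeekDay": ["ven", "dim"],
})

# Assumed abbreviations: adjust the keys to match the real WeekDay values.
french_to_english = {
    "lun": "Monday", "mar": "Tuesday", "mer": "Wednesday", "jeu": "Thursday",
    "ven": "Friday", "sam": "Saturday", "dim": "Sunday",
}

# Keep the original value if an abbreviation is missing from the mapping.
df["WeekDay"] = df["WeekDay"].apply(lambda day: french_to_english.get(day, day))


def to_title(value):
    """Title-case strings and leave every other type untouched."""
    return value.title() if isinstance(value, str) else value


# DataFrame.map is the newer name for the element-wise applymap.
if hasattr(pd.DataFrame, "map"):
    df = df.map(to_title)
else:
    df = df.applymap(to_title)

print(df.head())
```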
Create a new column by extracting the initials of each artist's name. Use the .apply() function on the Artist column to apply a custom lambda function that splits each artist's name into individual words, takes the first letter of each word, and then joins these letters together to form the initials.
For example, if the Artist's name is John Doe, the new Artist Initials column should contain JD.
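A possible sketch of the initials lambda, using invented artist names. It keeps each first letter in its original case, which is an assumption about the expected output.

```python
import pandas as pd

df = pd.DataFrame({"Artist": ["John Doe", "Florence and the Machine", "Adele"]})

# split() breaks the name on whitespace; word[0] is the first letter of each word.
df["Artist Initials"] = df["Artist"].apply(
    lambda name: "".join(word[0] for word in name.split())
)
print(df)
```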
Define a function that sorts each genre into one of four groups: Urban, Rock, Pop, or Other. The function returns Urban for Hip hop, R&B, and Electronic/EDM, Rock for Rock, Pop for Pop, and Other for everything else. Then use the .apply() function to apply it to the Genre column, creating a new column called Genre Type that contains the broader genre classifications.
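A minimal sketch of such a function; the name categorize_genre and the sample rows are illustrative choices, not part of the activity.

```python
import pandas as pd


def categorize_genre(genre):
    """Map a specific genre to one of the four broader groups."""
    if genre in ("Hip hop", "R&B", "Electronic/EDM"):
        return "Urban"
    if genre == "Rock":
        return "Rock"
    if genre == "Pop":
        return "Pop"
    return "Other"


df = pd.DataFrame({"Genre": ["Hip hop", "Rock", "Pop", "Jazz", "R&B"]})
df["Genre Type"] = df["Genre"].apply(categorize_genre)
print(df)
```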
Group the DataFrame by the Artist column and use .apply() with a lambda function to count the number of albums for each artist. Then sort this Series in descending order, so the artists with the most albums appear first. Store your final output in the variable: album_count_series.
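One way to sketch this, with invented rows. Selecting the Album column before calling .apply() is an assumption made here so the lambda receives one Series per artist.

```python
import pandas as pd

df = pd.DataFrame({
    "Artist": ["A", "B", "A", "C", "A", "B"],
    "Album": ["a1", "b1", "a2", "c1", "a3", "b2"],
})

album_count_series = (
    df.groupby("Artist")["Album"]
    .apply(lambda albums: albums.count())  # non-missing album titles per artist
    .sort_values(ascending=False)          # most prolific artists first
)
print(album_count_series)
```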
Define a function that checks if a given day name belongs to the weekend (Saturday or Sunday). Then use the .apply() function on the WeekDay column to apply this function element-wise. This results in a boolean mask where each row in the DataFrame is marked True if the corresponding WeekDay is a weekend day, and False otherwise. Finally, use this mask to create a new DataFrame, filtered_df, that contains only the rows where albums were released on weekends.
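A short sketch of the mask-then-filter pattern, assuming WeekDay already holds full English day names. The helper name is_weekend and the rows are illustrative.

```python
import pandas as pd


def is_weekend(day):
    """Return True for Saturday or Sunday."""
    return day in ("Saturday", "Sunday")


df = pd.DataFrame({
    "Album": ["w", "x", "y", "z"],
    "WeekDay": ["Friday", "Saturday", "Sunday", "Monday"],
})

weekend_mask = df["WeekDay"].apply(is_weekend)  # boolean Series, one entry per row
filtered_df = df[weekend_mask]                  # keep only rows where the mask is True
print(filtered_df)
```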
Use the .apply() function with a lambda function to capitalize the first letter of each artist's name in the Artist column of the DataFrame.
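A sketch using Python's str.capitalize(), which is one plausible reading of this task. Be aware that it also lowercases every character after the first; if the intent is to capitalize each word instead, str.title() would be the alternative.

```python
import pandas as pd

df = pd.DataFrame({"Artist": ["adele", "the beatles", "AC/DC"]})

# capitalize() uppercases the first character and lowercases the rest.
df["Artist"] = df["Artist"].apply(lambda name: name.capitalize())
print(df)
```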
Start by defining a function that takes a string of artists separated by commas and checks if there are multiple artists, indicating a collaboration. This function returns True if there is more than one artist in the list and False otherwise. Next, use the .apply() method to apply this function to the Artist column of the DataFrame df. This will create a new column, Collaboration, that holds True for rows where there is a collaboration and False otherwise. Finally, create a new DataFrame, collaborations_df, that contains only the rows from df where the Collaboration column is True.
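The steps above might be sketched like this; the function name is_collaboration and the sample artists are illustrative.

```python
import pandas as pd


def is_collaboration(artists):
    """True when the comma-separated string names more than one artist."""
    return len(artists.split(",")) > 1


df = pd.DataFrame({"Artist": ["Jay-Z, Kanye West", "Adele", "Santana, Rob Thomas"]})

df["Collaboration"] = df["Artist"].apply(is_collaboration)
collaborations_df = df[df["Collaboration"]]  # boolean column used directly as a filter
print(collaborations_df)
```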
Create a mask using the .where() function on the Artist column of the DataFrame. The mask is constructed to mark entries as True where the artist's name starts with the letter A, and False otherwise. Then filter the DataFrame using this mask to create a subset named a_artists containing only those rows where the mask is True, effectively displaying artists whose names start with A.
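Because Series.where() returns values rather than booleans, this sketch shows one possible reading: keep the names that start with A, let .where() blank out the rest, and convert the result to a True/False mask with .notna(). The rows are invented.

```python
import pandas as pd

df = pd.DataFrame({"Artist": ["Adele", "Beck", "Arcade Fire", "Coldplay"]})

starts_with_a = df["Artist"].str.startswith("A")

# where() keeps names that satisfy the condition and replaces the others
# with a missing value, so notna() yields the boolean mask.
mask = df["Artist"].where(starts_with_a).notna()

a_artists = df[mask]
print(a_artists)
```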
Modify the Label column in the DataFrame based on a condition using the .where() function. Specifically, apply .where() to check if each row in the Genre column does not equal Jazz. For rows where this condition is True (i.e., the Genre is not Jazz), the original Label value is retained. For rows where the condition is False (i.e., the Genre is Jazz), the Label is replaced with Ipecac.
For example, if an album's genre is Jazz and the label is Blue Note, it will be replaced with Ipecac.
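A minimal sketch of this conditional replacement with made-up rows. The second argument to .where() is the value used wherever the condition is False.

```python
import pandas as pd

df = pd.DataFrame({
    "Genre": ["Jazz", "Rock", "Jazz"],
    "Label": ["Blue Note", "Atlantic", "Verve"],
})

# Keep Label where Genre is not Jazz; otherwise substitute "Ipecac".
df["Label"] = df["Label"].where(df["Genre"] != "Jazz", "Ipecac")
print(df)
```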
Use the .where() function to conditionally modify the Album column in the DataFrame. For rows where the WeekDay is not Sunday, the original Album value is retained. For rows where the WeekDay is Sunday, the Album is replaced with Special Release.
In this activity, you are required to create a new DataFrame that contains only the rows where the Genre of the album is Rock and the album was released in July. Filter the existing DataFrame on these criteria, drop any rows that contain NaN values resulting from the filtering process, and drop the Collaboration column. Store your output in the variable rock_df and show the first 5 rows of the new DataFrame once you've created it.
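One way this could be approached, sketched under two assumptions: the Release date column can be parsed by pd.to_datetime, and the filtering is done with DataFrame.where(), which is what leaves NaN rows behind to drop. The rows are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "Album": ["A", "B", "C", "D"],
    "Genre": ["Rock", "Rock", "Pop", "Rock"],
    "Release date": ["2019-07-05", "2019-03-01", "2019-07-12", "2019-07-26"],
    "Collaboration": [False, True, False, False],
})

release_month = pd.to_datetime(df["Release date"]).dt.month
is_rock_in_july = (df["Genre"] == "Rock") & (release_month == 7)

rock_df = (
    df.where(is_rock_in_july)         # non-matching rows become all-NaN
    .dropna()                         # remove those NaN rows
    .drop(columns=["Collaboration"])
)
print(rock_df.head(5))
```

Note that .dropna() would also discard matching rows that already had missing values in the real data, so plain boolean indexing (df[is_rock_in_july]) may be preferable when that matters.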
Create a function that extracts the month from the release date and checks whether it falls in the first six months or the last six months of the year, returning FIRST HALF or SECOND HALF as appropriate. Apply this function to the Release date column of the DataFrame to generate a new column named Release Half. Then conditionally change the Genre value to SECOND HALF ROCK for rows whose genre is Rock and whose release date is in the second half of the year, leaving all other Genre values intact.
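A sketch of both steps, assuming the Release date values are strings that pd.to_datetime can parse. The function name release_half and the rows are illustrative.

```python
import pandas as pd


def release_half(release_date):
    """Return which half of the year a release date falls in."""
    month = pd.to_datetime(release_date).month
    return "FIRST HALF" if month <= 6 else "SECOND HALF"


df = pd.DataFrame({
    "Genre": ["Rock", "Rock", "Pop"],
    "Release date": ["2020-02-14", "2020-09-18", "2020-11-06"],
})

df["Release Half"] = df["Release date"].apply(release_half)

is_second_half_rock = (df["Genre"] == "Rock") & (df["Release Half"] == "SECOND HALF")

# where() keeps Genre when the condition is True, so negate the flag.
df["Genre"] = df["Genre"].where(~is_second_half_rock, "SECOND HALF ROCK")
print(df)
```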
Start by defining a list of major record labels (Columbia, Sony Music, Atlantic, Nuclear Blast, Rca, and Island) and a function to check if a given label is in that list. Then create a new column, Major Label, to flag records associated with these major labels. Finally, use the .where() function to create a new Series that retains the Genre values only if the record is from a major label and the genre is Pop, replacing all other values with NaN. Store your final output in the variable: series.
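These steps might be sketched as follows, assuming the labels are stored in a column named Label. The helper name is_major_label and the rows are illustrative.

```python
import pandas as pd

major_labels = ["Columbia", "Sony Music", "Atlantic", "Nuclear Blast", "Rca", "Island"]


def is_major_label(label):
    """True when the label appears in the major_labels list."""
    return label in major_labels


df = pd.DataFrame({
    "Genre": ["Pop", "Rock", "Pop"],
    "Label": ["Columbia", "Atlantic", "Ipecac"],
})

df["Major Label"] = df["Label"].apply(is_major_label)

# With no replacement value, where() fills non-matching rows with NaN.
series = df["Genre"].where(df["Major Label"] & (df["Genre"] == "Pop"))
print(series)
```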
Your task is to calculate the average album length by genre for all major labels. This will involve selecting only records associated with major labels, grouping the data by Genre, and then calculating the average Album Length for each group. Store your final output in the variable: average_lengths_major_labels.
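A minimal sketch, assuming the Major Label and Album Length columns from the earlier activities already exist. The numbers are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "Genre": ["Rock", "Rock", "Pop", "Pop"],
    "Album Length": [10, 20, 8, 30],
    "Major Label": [True, True, True, False],
})

average_lengths_major_labels = (
    df[df["Major Label"]]         # keep only major-label records
    .groupby("Genre")["Album Length"]
    .mean()                       # average title length per genre
)
print(average_lengths_major_labels)
```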