Data Transformation: A Comparative Analysis of .apply(), .applymap(), and .where()

Remove these columns from the dataset: `Unnamed: 0`, `Filter`.

The data preview shows two extra columns that act as redundant indexes. Drop both from the dataset.
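
One possible way to do this is sketched below. The tiny DataFrame is a hypothetical stand-in for the real dataset; only the two column names to drop come from the task.

```python
import pandas as pd

# Hypothetical sample standing in for the real dataset.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Filter": [0, 1],
    "Artist": ["Miles Davis", "Nirvana"],
    "Album": ["Kind of Blue", "Nevermind"],
})

# Drop the two redundant index-like columns named in the task.
df = df.drop(columns=["Unnamed: 0", "Filter"])
print(df.columns.tolist())
```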

How many unique genres are in the dataset?

Determine the number of unique genres in the dataset.
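
A minimal sketch, assuming the genres live in a column named Genre (the sample values are made up):

```python
import pandas as pd

# Hypothetical sample; the real dataset has many more rows.
df = pd.DataFrame({"Genre": ["Rock", "Pop", "Rock", "Jazz"]})

# nunique() counts distinct non-missing values in the column.
unique_genres = df["Genre"].nunique()
print(unique_genres)
```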

Calculate the Length of Album Titles.

Add a new column called Album Length that contains the length (number of characters) of each album title.
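
A short sketch of one approach, assuming every title in the Album column is a string with no missing values (the sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({"Album": ["Kind of Blue", "Nevermind"]})

# Apply Python's built-in len to each title to count its characters.
df["Album Length"] = df["Album"].apply(len)
print(df)
```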

Convert the French weekday abbreviations in the WeekDay column to their corresponding English names.

Start by creating a dictionary that maps French weekday abbreviations to their full English names. Then use the .apply() function along with a lambda function to replace the values in the WeekDay column with their English equivalents. Following this, use the .applymap() function to convert all string elements in the DataFrame to title case, ensuring a consistent and clean format across all textual data. Finally, display the first few rows of the updated DataFrame with df.head() to verify the changes.
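
The steps above might look like the following sketch. The French abbreviations in the dictionary are an assumption, since the real dataset may spell them differently. Recent pandas releases (2.1 and later) deprecate DataFrame.applymap() in favor of DataFrame.map(), so the sketch picks whichever one is available.

```python
import pandas as pd

# Assumed abbreviations; check them against the values in the real WeekDay column.
fr_to_en = {
    "lun.": "Monday", "mar.": "Tuesday", "mer.": "Wednesday",
    "jeu.": "Thursday", "ven.": "Friday", "sam.": "Saturday",
    "dim.": "Sunday",
}

# Hypothetical sample rows.
df = pd.DataFrame({
    "WeekDay": ["ven.", "dim."],
    "Album": ["kind of blue", "nevermind"],
    "Album Length": [12, 9],
})

# Replace each abbreviation; unknown values are left unchanged.
df["WeekDay"] = df["WeekDay"].apply(lambda day: fr_to_en.get(day, day))

# Use DataFrame.map where it exists, otherwise fall back to applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap

# Title-case string cells only; numbers pass through untouched.
df = elementwise(lambda value: value.title() if isinstance(value, str) else value)
print(df.head())
```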

Create a new column: `Artist Initials` that contains the initials of each artist's name.

Create a new column by extracting the initials of each artist's name. Use the .apply() function on the Artist column to apply a custom lambda function that splits each artist's name into individual words, takes the first letter of each word, and then joins these letters together to form the initials. For example, if the Artist's name is John Doe, the new Artist Initials column should contain JD.
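
A minimal sketch of the lambda described above, using made-up artist names and assuming names are separated by whitespace:

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({"Artist": ["John Doe", "Nirvana"]})

# Split on whitespace, take the first letter of each word, and join them.
df["Artist Initials"] = df["Artist"].apply(
    lambda name: "".join(word[0] for word in name.split())
)
print(df)
```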

Create a new column that classifies genres into different categories.

Define a function that assigns each genre to one of four groups: Urban, Rock, Pop, or Other. The function returns Urban for Hip hop, R&B, and Electronic/EDM, Rock for Rock, Pop for Pop, and Other for everything else. Then use the .apply() function to apply it to the Genre column, creating a new column called Genre Type that contains the broader genre classifications.
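
One way to write such a function is sketched here. The name classify_genre and the sample rows are hypothetical, and the comparison assumes the genre strings match the spellings given in the task exactly.

```python
import pandas as pd

def classify_genre(genre):
    """Map a genre to Urban, Rock, Pop, or Other."""
    if genre in ("Hip hop", "R&B", "Electronic/EDM"):
        return "Urban"
    if genre == "Rock":
        return "Rock"
    if genre == "Pop":
        return "Pop"
    return "Other"

# Hypothetical sample rows.
df = pd.DataFrame({"Genre": ["Hip hop", "Rock", "Pop", "Jazz", "R&B"]})

df["Genre Type"] = df["Genre"].apply(classify_genre)
print(df)
```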

Compute the Number of Albums per Artist

Group the DataFrame by the Artist column and use .apply() with a lambda function to count the number of albums for each artist. Then sort the resulting Series in descending order, so the artists with the most albums appear first. Store your final output in the variable: album_count_series.
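
A possible sketch follows. Selecting the Album column before calling .apply() is a choice made here so the lambda receives one Series per artist; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Artist": ["Adele", "Nirvana", "Adele", "Mr. Bungle", "Adele", "Nirvana"],
    "Album": ["19", "Bleach", "21", "California", "25", "Nevermind"],
})

# Count the albums in each artist's group, then sort with the largest first.
album_count_series = (
    df.groupby("Artist")["Album"]
    .apply(lambda albums: len(albums))
    .sort_values(ascending=False)
)
print(album_count_series)
```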

Identify Albums Released Over the Weekend (Saturday and Sunday).

Define a function that checks if a given day name belongs to the weekend (Saturday or Sunday). Then use the .apply() function on the WeekDay column to apply this function element-wise. This produces a boolean mask in which each row is marked True if the corresponding WeekDay is a weekend day, and False otherwise. Finally, use the mask to create a new DataFrame, filtered_df, that contains only the rows where albums were released on weekends.
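
The mask-then-filter pattern can be sketched as follows, assuming WeekDay already holds English day names. The function name is_weekend and the sample rows are hypothetical.

```python
import pandas as pd

def is_weekend(day):
    """Return True for Saturday or Sunday."""
    return day in ("Saturday", "Sunday")

# Hypothetical sample rows.
df = pd.DataFrame({
    "Album": ["A", "B", "C"],
    "WeekDay": ["Friday", "Saturday", "Sunday"],
})

# Boolean mask: True where the album was released on a weekend day.
weekend_mask = df["WeekDay"].apply(is_weekend)

# Keep only the rows where the mask is True.
filtered_df = df[weekend_mask]
print(filtered_df)
```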

Capitalize All Artists' Names in the Dataset

Use the .apply() function with a lambda function to capitalize the first letter of each artist's name in the Artist column of the DataFrame.
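
A minimal sketch, assuming "each name" means every word in the artist string, which is why str.title() is used here. If only the very first character should be capitalized, str.capitalize() would be the alternative. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({"Artist": ["miles davis", "nirvana"]})

# title() upper-cases the first letter of every word and lower-cases the rest.
df["Artist"] = df["Artist"].apply(lambda name: name.title())
print(df)
```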

Identify Collaborations (Multiple Artists in Album)

Start by defining a function that takes a string of artists separated by commas and checks if there are multiple artists, indicating a collaboration. This function returns True if there is more than one artist in the list and False otherwise. Next, use the .apply() method to apply this function to the Artist column of the DataFrame df. This will create a new column Collaboration that holds True for rows where there is a collaboration and False otherwise. Finally, create a new DataFrame collaborations_df that contains only the rows from df where the Collaboration column is True, effectively filtering the original DataFrame to include only the rows with collaborations.
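
These steps could be sketched as below. The function name is_collaboration and the sample rows are hypothetical, and the check assumes a comma never appears inside a single artist's name.

```python
import pandas as pd

def is_collaboration(artists):
    """Return True if the comma-separated string names more than one artist."""
    names = [name for name in artists.split(",") if name.strip()]
    return len(names) > 1

# Hypothetical sample rows.
df = pd.DataFrame({"Artist": ["Jay-Z, Kanye West", "Nirvana"]})

df["Collaboration"] = df["Artist"].apply(is_collaboration)

# Keep only the rows flagged as collaborations.
collaborations_df = df[df["Collaboration"]]
print(collaborations_df)
```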

Identify albums where the artist's name starts with an `A` in the `Artist` column.

Build a boolean mask on the Artist column that is True where the artist's name starts with the letter A and False otherwise, and pass it to the .where() function, which keeps the matching entries and replaces the rest with NaN. Then filter the DataFrame to create a subset named a_artists containing only the rows where the mask is True, that is, the albums by artists whose names start with A.
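
One way to combine a boolean condition with .where() is sketched here. It assumes the Artist column has no missing values, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Artist": ["Adele", "Nirvana", "Arcade Fire"],
    "Album": ["21", "Nevermind", "Funeral"],
})

# True where the name starts with "A", False otherwise.
starts_with_a = df["Artist"].str.startswith("A")

# .where() keeps values where the condition holds and puts NaN elsewhere.
masked_artists = df["Artist"].where(starts_with_a)

# Keep only the rows whose artist survived the .where() call.
a_artists = df[masked_artists.notna()]
print(a_artists)
```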

Use .where() to filter albums where the genre is `Jazz` and replace their Labels with `Ipecac`

Modify the Label column in the DataFrame based on a condition using the .where() function. Specifically, apply .where() to check if each row in the Genre column does not equal Jazz. For rows where this condition is True (i.e., the Genre is not Jazz), the original Label value is retained. For rows where the condition is False (i.e., the Genre is Jazz), the Label is replaced with Ipecac. For example, if an album's genre is Jazz and its label is Blue Note, the label is replaced with Ipecac.
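
A minimal sketch of this conditional replacement, using hypothetical sample rows:

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Genre": ["Jazz", "Rock"],
    "Label": ["Blue Note", "DGC"],
})

# Keep the Label where Genre is not Jazz; otherwise substitute "Ipecac".
df["Label"] = df["Label"].where(df["Genre"] != "Jazz", "Ipecac")
print(df)
```

Note that .where() keeps values where the condition is True, which is why the condition is phrased as "not Jazz".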

Change the names of albums released on `Sunday` to `Special Release` and leave others unchanged.

Use the .where() function to conditionally modify the Album column in the DataFrame. For rows where the WeekDay is not Sunday, the original Album value is retained. For rows where the WeekDay is Sunday, the Album is replaced with Special Release.

Create a new DataFrame that only contains rows where the `Genre` is `Rock` and the album was released in `July`.

Create a new DataFrame that contains only the rows where the Genre of the album is Rock and the album was released in July. Filter the existing DataFrame for these criteria, drop any rows that contain NaN values resulting from the filtering process, and drop the Collaboration column. Store your output in the variable rock_df and show the first 5 rows of the new DataFrame once you've created it.
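
A possible sketch is shown here. It assumes the release date is stored in a column named Release date as strings that pd.to_datetime can parse; the sample rows and the date format are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the date format is an assumption.
df = pd.DataFrame({
    "Album": ["A", "B", "C"],
    "Genre": ["Rock", "Rock", "Pop"],
    "Release date": ["2019-07-12", "2019-03-01", "2019-07-20"],
    "Collaboration": [False, True, False],
})

# Month number of each release (7 means July).
release_month = pd.to_datetime(df["Release date"]).dt.month
condition = (df["Genre"] == "Rock") & (release_month == 7)

# .where() blanks non-matching rows to NaN; dropna() then removes them.
rock_df = df.where(condition).dropna().drop(columns="Collaboration")
print(rock_df.head())
```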

Formulate a New Column Indicating First or Second Half Release Date, Replace `Rock` Genres in the Second Half with `SECOND HALF Rock`

Create a function that extracts the month from the release date and checks whether it falls in the first six months or the last six months of the year, returning FIRST HALF or SECOND HALF as appropriate. Apply this function to the Release date column of the DataFrame to generate a new column named Release Half. Then change the Genre value to SECOND HALF ROCK for rows that have the Rock genre and a release date in the second half of the year, leaving the other Genre values intact.
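
The two steps could be sketched as follows. The function name release_half, the sample rows, and the date format are assumptions. The replacement text uses the SECOND HALF ROCK spelling from the description; adjust it if the validator expects the spelling in the activity title.

```python
import pandas as pd

def release_half(date_value):
    """Return FIRST HALF for January to June, SECOND HALF otherwise."""
    month = pd.to_datetime(date_value).month
    return "FIRST HALF" if month <= 6 else "SECOND HALF"

# Hypothetical sample rows; the date format is an assumption.
df = pd.DataFrame({
    "Genre": ["Rock", "Rock", "Pop"],
    "Release date": ["2019-03-01", "2019-09-15", "2019-11-02"],
})

df["Release Half"] = df["Release date"].apply(release_half)

# Rows that are Rock and were released in the second half of the year.
second_half_rock = (df["Genre"] == "Rock") & (df["Release Half"] == "SECOND HALF")

# Keep Genre where the condition is False; replace it where it is True.
df["Genre"] = df["Genre"].where(~second_half_rock, "SECOND HALF ROCK")
print(df)
```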

Create a flag column indicating whether an album is from a major label

Start by defining a list of major record labels (Columbia, Sony Music, Atlantic, Nuclear Blast, Rca, and Island), and a function to check if a given label is in that list. Then create a new column, Major Label, to flag records associated with these major labels. Finally, use the .where() function to create a new Series that retains the Genre values only if the record is from a major label and the genre is Pop, replacing all other values with NaN. Store your final output in the variable: series.
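
A sketch of these steps follows. The function name is_major_label and the sample rows are hypothetical, and the membership test assumes the Label values match the listed spellings exactly.

```python
import pandas as pd

major_labels = ["Columbia", "Sony Music", "Atlantic", "Nuclear Blast", "Rca", "Island"]

def is_major_label(label):
    """Return True if the label appears in the major_labels list."""
    return label in major_labels

# Hypothetical sample rows.
df = pd.DataFrame({
    "Label": ["Columbia", "Indie Co", "Atlantic"],
    "Genre": ["Pop", "Pop", "Rock"],
})

df["Major Label"] = df["Label"].apply(is_major_label)

# Keep Genre only for Pop records on a major label; everything else becomes NaN.
series = df["Genre"].where(df["Major Label"] & (df["Genre"] == "Pop"))
print(series)
```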

Calculate the average album length by Genre for major labels.

Your task is to calculate the average album length by genre for all major labels. This will involve selecting only records associated with major labels, grouping the data by Genre, and then calculating the average Album Length for each group. Store your final output in the variable: average_lengths_major_labels.
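
A minimal sketch, assuming the Major Label and Album Length columns from the earlier activities already exist (the sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical sample rows with the columns built in earlier activities.
df = pd.DataFrame({
    "Genre": ["Pop", "Pop", "Rock", "Rock"],
    "Album Length": [10, 20, 8, 30],
    "Major Label": [True, True, True, False],
})

# Select only the major-label records.
major_df = df[df["Major Label"]]

# Average title length per genre among those records.
average_lengths_major_labels = major_df.groupby("Genre")["Album Length"].mean()
print(average_lengths_major_labels)
```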

Adeyinka Odiaka
