Spotify Data Explorer: Honing DataFrame Mutation Techniques

Get ready to rock the data from the roaring 1920s! Journey back to 1928 and explore a unique dataset of hit songs. Uncover the popular artists, genres, and musical trends of the era using Python and Pandas. Master techniques like renaming columns, adding new features, and updating values. With each line of code, you'll bring the vintage melodies back to life and discover the stories hidden in the data.
Project Created by

Dhrubaraj Roy

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

Rename the `acousticness` column to `acoustic_level`

The acousticness column in the DataFrame represents the acoustic level of each song. To make the column name more descriptive and readable, use the rename() method to change it from acousticness to acoustic_level. Set the inplace parameter to True so that the original DataFrame df is modified directly instead of a renamed copy being returned.
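
As a minimal sketch of this rename, assume a tiny stand-in DataFrame; the rows below are made up for illustration and are not taken from the project's dataset.

```python
import pandas as pd

# Hypothetical stand-in for the project's DataFrame; values are illustrative only.
df = pd.DataFrame({"name": ["Song A", "Song B"], "acousticness": [0.99, 0.64]})

# Rename the column on df itself. With inplace=True the method returns None,
# so the result is not assigned back to df.
df.rename(columns={"acousticness": "acoustic_level"}, inplace=True)

print(df.columns.tolist())
```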

Rename Multiple Columns Using the `rename()` Function

Rename multiple columns in the DataFrame df to make them more descriptive, concise, and easily understandable. Change 'danceability' to 'dance_score', 'duration_ms' to 'duration_milliseconds', 'instrumentalness' to 'instrumental', 'liveness' to 'live_performance', and 'speechiness' to 'speech_presence'. Assign the resulting DataFrame with the renamed columns back to the variable df to update the original DataFrame.
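
One way to sketch the multi-column rename is to pass a single dictionary to rename() and reassign the result. The one-row DataFrame here is a hypothetical stand-in with invented values.

```python
import pandas as pd

# Hypothetical one-row stand-in for the project's DataFrame.
df = pd.DataFrame({
    "danceability": [0.6],
    "duration_ms": [180000],
    "instrumentalness": [0.0],
    "liveness": [0.1],
    "speechiness": [0.05],
})

# Old name -> new name, as listed in the activity description.
new_names = {
    "danceability": "dance_score",
    "duration_ms": "duration_milliseconds",
    "instrumentalness": "instrumental",
    "liveness": "live_performance",
    "speechiness": "speech_presence",
}

# rename() returns a new DataFrame here, so assign it back to df.
df = df.rename(columns=new_names)

print(df.columns.tolist())
```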

Add a new column called `duration_seconds` that converts the `duration_milliseconds` column from milliseconds to seconds

Convert the duration_milliseconds column values to seconds and store the result in a new column named duration_seconds.

Note: The new column is added at the end of df.
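
A possible sketch of the conversion, using a toy DataFrame with invented durations: dividing by 1000 turns milliseconds into seconds, and assigning to a new column name appends that column at the end.

```python
import pandas as pd

# Hypothetical stand-in data; the durations are invented for illustration.
df = pd.DataFrame({"name": ["Song A", "Song B"],
                   "duration_milliseconds": [180000, 210500]})

# 1 second = 1000 milliseconds.
df["duration_seconds"] = df["duration_milliseconds"] / 1000

print(df)
```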

Add a new column called `popularity_score` that multiplies the `popularity` column by 0.01

Rescale the values in the popularity column by multiplying them by 0.01 and store the rescaled values in a new column named popularity_score.

Note: The new column is added at the end of df.
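
The rescaling can be sketched as a single vectorized multiplication. The popularity values below are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in data; popularity values are invented.
df = pd.DataFrame({"name": ["Song A", "Song B", "Song C"],
                   "popularity": [0, 35, 80]})

# Multiply every value by 0.01 and store the result in a new column.
df["popularity_score"] = df["popularity"] * 0.01

print(df)
```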

Add a new column called `is_popular` that assigns 1 to songs with `popularity` greater than 70 and 0 otherwise

Create a new column is_popular that contains 1 for rows where the popularity value is greater than 70, and 0 otherwise. Convert the boolean result to integer values, where True becomes 1 and False becomes 0. This new column will indicate whether a song is popular or not, with 1 representing popular songs and 0 representing non-popular songs.

Note: The new column is added at the end of df.
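
As an illustration of the boolean-to-integer step, the sketch below compares the column against 70 and casts the result with astype(int). The sample values are assumptions chosen to sit on both sides of the threshold.

```python
import pandas as pd

# Hypothetical stand-in data chosen around the threshold of 70.
df = pd.DataFrame({"name": ["Song A", "Song B", "Song C"],
                   "popularity": [70, 71, 12]})

# The comparison yields True/False; astype(int) turns that into 1/0.
# A value of exactly 70 is not "greater than 70", so it maps to 0.
df["is_popular"] = (df["popularity"] > 70).astype(int)

print(df)
```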

Add a new column called `artist_count` that counts the number of artists in the `artists` column

Calculate the number of artists for each row by counting the number of commas in the artists column and adding 1, then store the result in a new column named artist_count.

Note: The new column is added at the end of df.
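
A rough sketch of the comma-counting approach, assuming the artists column holds comma-separated names as plain strings (the exact format in the project's dataset may differ, and an artist name that itself contains a comma would be overcounted):

```python
import pandas as pd

# Hypothetical stand-in data; artist names are placeholders.
df = pd.DataFrame({"artists": ["Artist A",
                               "Artist A, Artist B",
                               "Artist A, Artist B, Artist C"]})

# n commas separate n + 1 names, so count the commas and add 1.
df["artist_count"] = df["artists"].str.count(",") + 1

print(df)
```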

Add a new column called `duration_minutes` that calculates the duration in minutes from the `duration_seconds` column

Convert the duration_seconds column from seconds to minutes and store the result in a new column named duration_minutes.

Note: The new column is added at the end of df.
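
The seconds-to-minutes conversion might look like the following sketch, which starts from a toy DataFrame that already has a duration_seconds column with invented values.

```python
import pandas as pd

# Hypothetical stand-in data; durations are invented.
df = pd.DataFrame({"name": ["Song A", "Song B"],
                   "duration_seconds": [180.0, 90.0]})

# 1 minute = 60 seconds.
df["duration_minutes"] = df["duration_seconds"] / 60

print(df)
```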

Update the `popularity` column by adding 10 to each value

Increase the values in the popularity column by adding 10 to each value.
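
Updating a column in place can be sketched by assigning the shifted values back to the same column name. The values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({"name": ["Song A", "Song B", "Song C"],
                   "popularity": [0, 5, 60]})

# Overwrite the existing column with the shifted values.
df["popularity"] = df["popularity"] + 10

print(df)
```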

Update the `speech_presence` column by multiplying each value by 0.8

Reduce the values in the speech_presence column by multiplying them by 0.8.

Note: Use df.head().T to view your DataFrame. It transposes the first five rows so that each row is displayed as a column, which makes a DataFrame with many columns easier to scan.
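
A small sketch of the scaling step, followed by the transposed preview mentioned in the note. The DataFrame is a hypothetical stand-in with invented values.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({"name": ["Song A", "Song B"],
                   "speech_presence": [0.5, 0.1],
                   "tempo": [120.0, 95.5]})

# Scale every value down to 80% of its original size.
df["speech_presence"] = df["speech_presence"] * 0.8

# Transposed preview: column names become the row labels.
print(df.head().T)
```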

Update the `dance_score` column by subtracting 0.1 from each value

Decrease the values in the dance_score column by subtracting 0.1 from each value.
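
The subtraction follows the same pattern; this sketch uses the augmented assignment operator on a toy column with invented values.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({"dance_score": [0.6, 0.35, 0.1]})

# Subtract 0.1 from every value in the column.
df["dance_score"] -= 0.1

print(df)
```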

Update the `mode` column by replacing 0 with 'Minor' and 1 with 'Major'

Replace the numerical values in the mode column with textual representations, where 0 is replaced with 'Minor' and 1 is replaced with 'Major'.
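
One possible way to do this replacement is Series.map() with a dictionary, shown below on hypothetical data. Series.replace() with the same dictionary is another option; note that map() turns any value missing from the dictionary into NaN.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({"name": ["Song A", "Song B", "Song C"],
                   "mode": [0, 1, 1]})

# Map each numeric code to its label.
df["mode"] = df["mode"].map({0: "Minor", 1: "Major"})

print(df)
```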

Update the `tempo` column by setting values greater than 150 to 150

Limit the maximum value in the tempo column to 150 by clipping any values above 150 to 150.

Note: Use df.head().T to view your DataFrame. It transposes the first five rows so that each row is displayed as a column, which makes a DataFrame with many columns easier to scan.
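
Capping values at a maximum can be sketched with Series.clip() and its upper parameter. The tempo values below are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({"tempo": [80.0, 150.0, 200.5]})

# Values above 150 become 150; everything else is left unchanged.
df["tempo"] = df["tempo"].clip(upper=150)

print(df)
```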

Replace all numerical values in the `key` column with their corresponding note names

Replace the numerical values in the key column with their corresponding note names. The mappings are:

0 → 'C', 1 → 'C#', 2 → 'D', 3 → 'D#', 4 → 'E', 5 → 'F', 6 → 'F#', 7 → 'G', 8 → 'G#', 9 → 'A', 10 → 'A#', 11 → 'B'
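
The mapping above can be sketched as a dictionary built from an ordered list of note names and applied with Series.map(). The sample key values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({"key": [0, 1, 11, 7]})

# Position in the list is the numeric key: 0 -> 'C', 1 -> 'C#', ..., 11 -> 'B'.
note_names = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]
key_to_note = dict(enumerate(note_names))

df["key"] = df["key"].map(key_to_note)

print(df)
```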

Replace the `explicit` column values 0 and 1 with `Not Explicit` and `Explicit`, respectively

Replace the numerical values in the explicit column with textual representations, where 0 is replaced with 'Not Explicit' and 1 is replaced with 'Explicit'.

Note: Use df.head().T to view your DataFrame. It transposes the first five rows so that each row is displayed as a column, which makes a DataFrame with many columns easier to scan.
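
For the explicit column, a dictionary lookup is again one workable approach. This is a sketch on invented rows, and it assumes the column contains only 0 and 1.

```python
import pandas as pd

# Hypothetical stand-in data; assumes explicit holds only 0 and 1.
df = pd.DataFrame({"name": ["Song A", "Song B"],
                   "explicit": [0, 1]})

labels = {0: "Not Explicit", 1: "Explicit"}
df["explicit"] = df["explicit"].map(labels)

print(df)
```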

Replace the `year` column values before 1950 with 1950

For rows where the year value is less than 1950, replace the year value with 1950.
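
A conditional update like this can be sketched with a boolean mask and .loc, which selects the matching rows and assigns to them in one step. The years below are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({"name": ["Song A", "Song B", "Song C"],
                   "year": [1928, 1950, 1999]})

# Select rows where year < 1950 and overwrite only their year value.
df.loc[df["year"] < 1950, "year"] = 1950

print(df)
```

Calling df["year"].clip(lower=1950) would give the same result for this column.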

Replace the `tempo` column values above 150 with 150 and values below 50 with 50

Limit the tempo column values to the range 50 to 150: replace values above 150 with 150 and values below 50 with 50.

Note: Use df.head().T to view your DataFrame. It transposes the first five rows so that each row is displayed as a column, which makes a DataFrame with many columns easier to scan.
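
Bounding values on both sides can be sketched with Series.clip() using both lower and upper. The tempo values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({"tempo": [30.0, 100.0, 180.0]})

# Values below 50 become 50, values above 150 become 150.
df["tempo"] = df["tempo"].clip(lower=50, upper=150)

print(df)
```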

Project Created by

Dhrubaraj Roy

Project Author at DataWars, responsible for leading the development and delivery of innovative machine learning and data science projects.

This project is part of

Intro to Pandas for Data Analysis
