Transforming Video Game Sales Data with Pandas: Apply, Applymap, and Where
Transforming Video Game Sales Data with Pandas: Apply, Applymap, and Where Data Science Project
Data Wrangling with Pandas

Transforming Video Game Sales Data with Pandas: Apply, Applymap, and Where

Explore the video game industry's data treasure! Enhance your Pandas skills by reshaping game sales info. You'll group genres with custom functions, analyze global sales using `.apply()`, and reveal regional patterns with `.transform()`. Learn data wrangling. You will sort games and figure out success rates. You will also spot cross-platform hits. Practice with lambda functions, `.applymap()`, and `.where()`. Boost your data skills while exploring how gaming has changed over time!
Start this project
Transforming Video Game Sales Data with Pandas: Apply, Applymap, and WhereTransforming Video Game Sales Data with Pandas: Apply, Applymap, and Where
Project Created by

Dhrubaraj Roy

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

codevalidated

Calculate Annual Revenue Performance Using `Global_Sales` and `year`

The year column represents the release year of each game, and the Global_Sales column represents the total worldwide sales of the game in millions of units.

Create a new column called Revenue_Per_Year to store the calculated values. For games released before 2024, divide the Global_Sales by the number of years between the release year and 2024. For games released in or after 2024, simply use the Global_Sales value as the revenue per year.

Note : New column added at the end of the df

codevalidated

Calculate `Title_Length` from `Name` Column

Create a new column called Title_Length to store the length of each game's title. Calculate the number of characters in each game's name from the Name column and assign this count to the new Title_Length column.

codevalidated

Categorize `Genre` into Broader `Genre_Category`

Create a new column called Genre_Category to store a simplified categorization of game genres. First, define a function categorize_genre() that takes a genre as input and returns a broader category. This function should map genres to four main categories: 'Action/Adventure', 'Sports/Racing', 'RPG/Strategy', or 'Other'. Then, apply this function to the Genre column to populate the new Genre_Category column.

The resulting Genre_Category column will contain simplified genre categories for each game, providing a more streamlined classification of game types.

codevalidated

Calculate `Total_Sales` from Regional Sales Columns

Create a new column Total_Sales that represents the overall sales performance of each game globally. This column will sum up the individual sales values from the NA_Sales, EU_Sales, JP_Sales, and Other_Sales columns for each game entry.

The Total_Sales column provides a comprehensive view of each game's commercial success across all markets, allowing for easier comparison and analysis of overall game performance.

codevalidated

Calculate Regional Sales Percentages from `Total_Sales`

Calculate regional sales percentages for each game. Use the Total_Sales column to determine the percentage of sales in North America, Europe, Japan, and Other regions. If Total_Sales is zero, set all percentages to 0. The result will be stored in new columns: NA_Percentage, EU_Percentage, JP_Percentage, and Other_Percentage.

Note : New column added at the end of the df

codevalidated

Categorize `Total_Sales` into `Sales_Category`

Create a new column called Sales_Category to categorize each game's sales performance. First, define a function categorize_sales() that takes a sales value as input and returns a category: 'Low' for sales less than 1 million, 'Medium' for sales between 1 and 5 million, and 'High' for sales of 5 million or more. Then, apply this function to the Total_Sales column to populate the new Sales_Category column.

The resulting Sales_Category column will contain the sales category ('Low', 'Medium', or 'High') for each game based on its total sales value.

codevalidated

Generate `Summary` String from `Name`, `Platform`, and `Total_Sales`

Create a new column called Summary to store a concise description of each game. For every row in the DataFrame, generate a string that combines the game's name from the Name column, its platform from the Platform column, and its total sales figure from the Total_Sales column. Format the total sales value to display two decimal places and append "million" to indicate the unit.

The resulting Summary column will contain a formatted string for each game, providing a quick overview of the game's key information in a single field.

codevalidated

Calculate `Years_Since_Release` from `Year`

Create a new column called Years_Since_Release to calculate how long ago each game was released. Use the Year column to determine the time elapsed since each game's release date. For games released in or before the current year (2024), calculate the difference between the current year and the release year. For any games with future release dates, assign a value of 0.

The resulting Years_Since_Release column will show the age of each game in years, with unreleased games marked as 0 years old.

Note : New column added at the end of the df

codevalidated

Format `Global_Sales` as `Formatted_Global_Sales`

Create a new column called Formatted_Global_Sales to present the global sales figures in a more readable format. Transform the values from the Global_Sales column by rounding them to two decimal places, replacing the decimal point with a comma, and adding an "M" suffix to represent millions.

The resulting Formatted_Global_Sales column will display the sales figures in a standardized format, making it easier to read and compare values across different games.

codevalidated

Create `Release_Decade` from `Year`

Create a new column called Release_Decade to group games by their release decade. Use the Year column to determine which decade each game belongs to. Round down each year to the nearest decade and format the result as a string ending with "s".

The resulting Release_Decade column will display the decade of release for each game (e.g., "1980s", "1990s", "2000s"), allowing for easy categorization and analysis of games by era.

Note : New column added at the end of the df

codevalidated

Flag Top 10% of Sales in Each Region

The NA_Sales, EU_Sales, JP_Sales, and Other_Sales columns represent the sales figures for each game in North America, Europe, Japan, and other regions, respectively.

Calculate and flag top 10% selling games for each region. Create new columns Top_NA_Sales, Top_EU_Sales, Top_JP_Sales, and Top_Other_Sales to indicate if a game's sales in that region are in the top 10%. Use the quantile() function to determine the threshold for each region, then apply a lambda function to compare each game's sales against this threshold

Note : New columns are added at the end of the df

codevalidated

Identify Games from Most Recent `Decade`

Create two new columns: Decade and Is_Recent.

For Decade, use the Year column to determine which decade each game belongs to. Round down each year to the nearest decade and format the result as a string ending with "s".

For Is_Recent, identify the most recent decade from the Decade column. Then, mark each game as 'Yes' if it belongs to this most recent decade, or 'No' if it doesn't.

These columns will help categorize games by their release era and highlight those from the most recent decade.

multiplechoice

How many games are recent?

codevalidated

Calculate `Sales_vs_Genre_Avg` Using `Global_Sales`

Compare each game's sales performance to the average sales of its genre. First, calculate the mean global sales for each genre. Then, create a new column Sales_vs_Genre_Avg that shows how each game's global sales compare to its genre's average. For genres with zero average sales, set the ratio to zero to avoid division errors.

The Sales_vs_Genre_Avg column will contain values above 1 for games that outperform their genre average, and below 1 for those that underperform. This metric helps identify standout games within each genre.

codevalidated

Identify Cross-Platform Hits Using `Platform_Count` and `Global_Sales`

Create two new columns: Platform_Count and Cross_Platform_Hit.

For Platform_Count, use the Name and Platform columns to count how many different platforms each game was released on.

For Cross_Platform_Hit, identify games that were released on multiple platforms (more than one) and had global sales exceeding 5 million units. Mark these games as 'Yes', and others as 'No'.

These columns will help analyze the prevalence and success of multi-platform game releases.

codevalidated

Calculate `Publisher_Success_Rate` Based on `Global_Sales`

Create a new column called Publisher_Success_Rate. First, define a function called success_rate that calculates the percentage of a publisher's games with global sales exceeding 1 million units. Then, apply this success_rate function to each publisher's game catalog using the Publisher and Global_Sales columns.

The Publisher_Success_Rate column will show the percentage of successful games for each publisher, providing insight into which publishers consistently produce high-selling games.

codevalidated

Determine `Dominant_Region` from Regional Sales Columns

Create a new column Dominant_Region that identifies the region with the highest sales. Use a dictionary to compare sales across North America, Europe, Japan, and Other regions. If North America is the dominant region, represent it as NaN; otherwise, return the region code.

NA_Sales, EU_Sales, JP_Sales, and Other_Sales columns represent sales figures for each region.

codevalidated

Analyze `Genre` Popularity Trends Across `Decade`

Calculate the performance ranking of each genre within its release decade based on total global sales. Group the data by decade and genre, then rank genres within each decade according to their total sales. Create a new column Genre_Rank_in_Decade to store these rankings.

Next, analyze how each genre's popularity changes over time. Compare a genre's rank in its current decade to its average rank in previous decades. Create a Genre_Trend column, marking a genre as 'Rising' if its current rank is better than its historical average, and 'Falling' otherwise.

This analysis helps track how different game genres perform and evolve in popularity across different decades.

codevalidated

Calculate `NA_EU_Ratio` from `NA_Sales` and `EU_Sales`

Create a new column called NA_EU_Ratio to compare North American and European sales for each game. Use the NA_Sales and EU_Sales columns to calculate this ratio. For each game, divide the North American sales by the European sales, but only if European sales are greater than zero. If European sales are zero, set the ratio to zero to avoid division errors.

The NA_EU_Ratio column will show how North American sales compare to European sales for each game, with higher values indicating stronger performance in North America relative to Europe.

codevalidated

Format Numeric Columns to Two Decimal Places

Format all numeric columns in the dataset to display two decimal places. First, select columns with float or integer data types. Then, apply string formatting to these numeric values, rounding them to two decimal places. Finally, create a new DataFrame df_numeric containing only these formatted numeric columns.

df_numeric will contain all numeric columns from the original dataset, with values formatted as strings with two decimal places.

Transforming Video Game Sales Data with Pandas: Apply, Applymap, and WhereTransforming Video Game Sales Data with Pandas: Apply, Applymap, and Where
Project Created by

Dhrubaraj Roy

Project Author at DataWars, responsible for leading the development and delivery of innovative machine learning and data science projects.

Project Author at DataWars, responsible for leading the development and delivery of innovative machine learning and data science projects.

This project is part of

Data Wrangling with Pandas

Explore other projects