Data Wrangling in Action: Analyzing Kindle Books Data

codevalidated

Perform a Left Join on Two DataFrames Using a Common Column

Merge DataFrames df1 and df2 based on the common column ASIN using a left join. Store the result in the df variable.

multiplechoice

Which of the following column uniquely identify each row in our data?

input

What is the number of unique book categories present in our dataset?

Enter the number of categories of books present in the DataFrame df.

codevalidated

Filter and Sort High-Rated Books by Price

Filter the DataFrame df to include only books with a rating higher than 4.5. After filtering, sort the books by price in descending order and store the top five rows in top_rated_books.

codevalidated

Filter and Sort Recent Bestsellers by Rating

Filter the DataFrame df to include only bestseller books published from 2020 onwards. After filtering, sort the books by rating in descending order and store the top ten rows in recent_bestsellers.

multiplechoice

Which of the following is NOT a valid aggregation function in pandas?

multiplechoice

What does the following code do?

   df.groupby('Category_Name')['Price'].mean()

input

Find the Seller with the Highest Average Price

Enter the name of the seller with the highest average book price.

codevalidated

Group and Analyze Reviews by Author

Group the DataFrame df by Author and calculate the total number of reviews and the average rating for each author. After grouping, sort the result by the total number of reviews in descending order and store the top five authors in top_authors_by_reviews.

codevalidated

Create a Summary DataFrame that shows the following for each category of Books

The number of books
The average rating
The percentage of books that are bestsellers

Store the resultant dataframe in category_summary_df

multiplechoice

What is the primary purpose of binning in data analysis?

codevalidated

Create Dummy Variables for Book Categories

Create a new DataFrame containing dummy variables for the Category_Name column. Convert the categorical values in Category_Name into binary dummy variables. Store the resultant DataFrame in the variable category_dummies.

multiplechoice

Which method would you use to create bins with an equal number of items in each bin?

codevalidated

Categorize Prices into Bins

Create a new column Price_Category by binning the Price column into 5 equal-width bins.

Use the following bins and labels:

0-140: Very Low
140-280: Low
280-420: Medium
420-560: High
560-700: Very High

Make sure to set include_lowest=True

multiplechoice

What's the main difference between `apply()` and `applymap()`?

codevalidated

Extract Year from `Published_Date`

Use a lambda function with the apply() method to extract the year from the Published_Date column. Store the extracted year in published_year_series variable.

codevalidated

Classify Titles by Length

Calculate the length of each book title in terms of the number of words and store it in a new column Title_Length. Determine the median title length. Classify each title as either Long Title or Short Title based on Title_Length. If the Title_Length is greater than or equal to the median length then classify it as Long Title and if the Title_Length is less than median then classify as Short Title. Store this classification in a new column Title_Category.

codevalidated

Normalize Author and Category Names to Lowercase

Create a new DataFrame df_author_category containing only the Author and Category_Name columns from the original DataFrame df. Convert all text in these columns to lowercase using applymap() and store the result in df_lower_case.

codevalidated

Categorize Books Based on Ratings

Use the apply() method with a custom function to categorize books based on their ratings as follows:

5 stars: Excellent
4 to 4.9 stars: Very Good
3 to 3.9 stars: Good
Below 3 stars: Average

And store the categorized rating in Rating_Category column. After that, use groupby() to count how many books fall into each category. Store the result as a series in the rating_counts_series variable.

multiplechoice

Which of the following code snippets would you use to create a histogram with `10` bins for the `Price` column?

codevalidated

Bar Plot of Top 10 Authors by Number of Appearances

Create a bar plot displaying the top 10 authors who appear most frequently in our dataset.

codevalidated

Pie Chart of Bestseller Distribution

Create a pie chart to visualize the distribution of books that are bestsellers versus those that are not.

Lohith Unnam

Project Activities

Perform a Left Join on Two DataFrames Using a Common Column

Which of the following column uniquely identify each row in our data?

What is the number of unique book categories present in our dataset?

Filter and Sort High-Rated Books by Price

Filter and Sort Recent Bestsellers by Rating

Which of the following is NOT a valid aggregation function in pandas?

What does the following code do?

Find the Seller with the Highest Average Price

Group and Analyze Reviews by Author

Create a Summary DataFrame that shows the following for each category of Books

What is the primary purpose of binning in data analysis?

Create Dummy Variables for Book Categories

Which method would you use to create bins with an equal number of items in each bin?

Categorize Prices into Bins

What's the main difference between `apply()` and `applymap()`?

Extract Year from `Published_Date`

Classify Titles by Length

Normalize Author and Category Names to Lowercase

Categorize Books Based on Ratings

Which of the following code snippets would you use to create a histogram with `10` bins for the `Price` column?

Bar Plot of Top 10 Authors by Number of Appearances

Pie Chart of Bestseller Distribution

Lohith Unnam

Data Wrangling with Pandas

Set Operations using Sakila

LIKE Operator using World

Membership and Range Operators with World Database