All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Merge DataFrames df1 and df2 based on the common column ASIN using a left join. Store the result in the df variable.
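A minimal sketch of the left join described above. The two small frames and the Title and Price columns are hypothetical stand-ins for the project's data; only the ASIN key and the df1, df2, and df names come from the activity.

```python
import pandas as pd

# Hypothetical toy data; the real df1 and df2 come from the project's dataset.
df1 = pd.DataFrame({
    "ASIN": ["A1", "A2", "A3"],
    "Title": ["Book One", "Book Two", "Book Three"],
})
df2 = pd.DataFrame({
    "ASIN": ["A1", "A3", "A4"],
    "Price": [9.99, 14.50, 20.00],
})

# Left join: keep every row of df1 and attach matching columns from df2.
# Rows of df1 with no match in df2 get NaN in the df2 columns.
df = pd.merge(df1, df2, on="ASIN", how="left")
print(df)
```

In this toy example, A2 has no match in df2, so its Price is NaN, and A4 is dropped because it appears only in df2.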
Enter the number of categories of books present in the DataFrame df.
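One possible way to count the categories, assuming they live in the Category_Name column used elsewhere in these activities. The sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample; the real df holds the merged book data.
df = pd.DataFrame({
    "Title": ["Book One", "Book Two", "Book Three", "Book Four"],
    "Category_Name": ["Fiction", "Science", "Fiction", "History"],
})

# nunique() counts distinct non-null values in the column.
num_categories = df["Category_Name"].nunique()
print(num_categories)  # 3
```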
Filter the DataFrame df to include only books with a rating higher than 4.5. After filtering, sort the books by price in descending order and store the top five rows in top_rated_books.
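A sketch of the filter, sort, and slice steps. It assumes the rating is stored in a column named Rating (a hypothetical name; check the real column name in your DataFrame) and uses made-up rows.

```python
import pandas as pd

# Hypothetical sample data; "Rating" is an assumed column name.
df = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "Rating": [4.8, 4.2, 4.9, 4.6, 4.7, 4.5, 5.0, 4.9],
    "Price": [10, 50, 30, 20, 40, 60, 5, 25],
})

# 1. Keep only books rated strictly above 4.5.
# 2. Sort the remaining rows by price, highest first.
# 3. Keep the first five rows.
top_rated_books = (
    df[df["Rating"] > 4.5]
    .sort_values("Price", ascending=False)
    .head(5)
)
print(top_rated_books)
```

Note that a rating of exactly 4.5 is excluded, since the activity asks for ratings higher than 4.5.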
Filter the DataFrame df to include only bestseller books published from 2020 onwards. After filtering, sort the books by rating in descending order and store the top ten rows in recent_bestsellers.
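A sketch of combining two conditions in one boolean mask. The Is_Bestseller and Rating column names are assumptions, as is the date format in Published_Date; adjust them to match the actual dataset.

```python
import pandas as pd

# Hypothetical sample data; "Is_Bestseller" and "Rating" are assumed names.
df = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Is_Bestseller": [True, True, False, True],
    "Published_Date": ["2021-03-01", "2019-07-15", "2022-01-10", "2020-01-01"],
    "Rating": [4.1, 4.9, 4.8, 4.6],
})

# Parse the dates so the year can be compared numerically.
published = pd.to_datetime(df["Published_Date"])

# Both conditions must hold: bestseller AND published in 2020 or later.
mask = df["Is_Bestseller"] & (published.dt.year >= 2020)

recent_bestsellers = (
    df[mask]
    .sort_values("Rating", ascending=False)
    .head(10)
)
print(recent_bestsellers)
```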
df.groupby('Category_Name')['Price'].mean()
Enter the name of the seller with the highest average book price.
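The same groupby pattern as the Category_Name expression can answer this question. The sketch below assumes the seller is stored in a column named Seller (a hypothetical name) and uses invented prices.

```python
import pandas as pd

# Hypothetical sample data; "Seller" is an assumed column name.
df = pd.DataFrame({
    "Seller": ["X", "X", "Y", "Z", "Z"],
    "Price": [10.0, 30.0, 25.0, 5.0, 15.0],
})

# Average price per seller, then the index label of the largest average.
avg_price_by_seller = df.groupby("Seller")["Price"].mean()
top_seller = avg_price_by_seller.idxmax()
print(top_seller)  # Y
```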
Group the DataFrame df by Author and calculate the total number of reviews and the average rating for each author. After grouping, sort the result by the total number of reviews in descending order and store the top five authors in top_authors_by_reviews.
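A sketch using named aggregation to compute both statistics in one pass. The Reviews and Rating column names and the output names Total_Reviews and Average_Rating are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data; "Reviews" and "Rating" are assumed column names.
df = pd.DataFrame({
    "Author": ["Ann", "Ann", "Bob", "Cy", "Cy"],
    "Reviews": [100, 50, 300, 20, 30],
    "Rating": [4.0, 5.0, 4.2, 3.5, 4.5],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function).
top_authors_by_reviews = (
    df.groupby("Author")
    .agg(
        Total_Reviews=("Reviews", "sum"),
        Average_Rating=("Rating", "mean"),
    )
    .sort_values("Total_Reviews", ascending=False)
    .head(5)
)
print(top_authors_by_reviews)
```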
Store the resulting DataFrame in category_summary_df.
Create a new DataFrame containing dummy variables for the Category_Name column. Convert the categorical values in Category_Name into binary dummy variables. Store the resultant DataFrame in the variable category_dummies.
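A minimal sketch with pd.get_dummies on a made-up Category_Name column. Depending on the pandas version, the dummy columns may hold booleans or 0/1 integers.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Category_Name": ["Fiction", "Science", "Fiction", "History"],
})

# One column per distinct category; each row is "hot" in exactly one column.
category_dummies = pd.get_dummies(df["Category_Name"])
print(category_dummies)
```

The resulting columns are ordered alphabetically by category name, and each row has exactly one truthy entry.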
Create a new column Price_Category by binning the Price column into 5 equal-width bins.
Use the following bins and labels:
0-140: Very Low
140-280: Low
280-420: Medium
420-560: High
560-700: Very High
Make sure to set include_lowest=True.
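The binning can be sketched with pd.cut. The bin edges and labels are taken from the activity; the sample prices are invented to show how the edge values are assigned.

```python
import pandas as pd

# Hypothetical prices chosen to hit the bin edges.
df = pd.DataFrame({"Price": [0, 140, 141, 300, 500, 700]})

bins = [0, 140, 280, 420, 560, 700]
labels = ["Very Low", "Low", "Medium", "High", "Very High"]

# By default each bin includes its right edge, so 140 falls in "Very Low".
# include_lowest=True also lets the first bin include its left edge (0).
df["Price_Category"] = pd.cut(
    df["Price"], bins=bins, labels=labels, include_lowest=True
)
print(df)
```

Without include_lowest=True, a price of exactly 0 would fall outside every bin and become NaN.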
Use a lambda function with the apply() method to extract the year from the Published_Date column. Store the extracted years in the published_year_series variable.
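A sketch of the lambda approach, assuming Published_Date holds strings that pd.to_datetime can parse. The sample dates are made up.

```python
import pandas as pd

# Hypothetical sample data; the date format is an assumption.
df = pd.DataFrame({"Published_Date": ["2019-07-15", "2021-03-01"]})

# Convert to datetime first so each element exposes a .year attribute.
df["Published_Date"] = pd.to_datetime(df["Published_Date"])

# apply() calls the lambda once per element of the Series.
published_year_series = df["Published_Date"].apply(lambda d: d.year)
print(published_year_series.tolist())  # [2019, 2021]
```

The vectorized accessor df["Published_Date"].dt.year gives the same result; the lambda form is shown because the activity asks for it.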
Calculate the length of each book title as a number of words and store it in a new column Title_Length. Determine the median title length, then classify each title as Long Title if its Title_Length is greater than or equal to the median, or Short Title if it is less than the median. Store this classification in a new column Title_Category.
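The three steps can be sketched as follows. The Title column name is an assumption, and words are counted by splitting on whitespace.

```python
import pandas as pd

# Hypothetical sample data; "Title" is an assumed column name.
df = pd.DataFrame({
    "Title": [
        "Dune",
        "The Great Gatsby",
        "A Brief History of Time",
        "War and Peace",
    ],
})

# Step 1: word count per title (split on whitespace, then count the pieces).
df["Title_Length"] = df["Title"].str.split().str.len()

# Step 2: median word count across all titles.
median_length = df["Title_Length"].median()

# Step 3: titles at or above the median are long, the rest are short.
df["Title_Category"] = df["Title_Length"].apply(
    lambda n: "Long Title" if n >= median_length else "Short Title"
)
print(df)
```

Here the word counts are 1, 3, 5, and 3, so the median is 3 and only "Dune" is classified as Short Title.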
Create a new DataFrame df_author_category containing only the Author and Category_Name columns from the original DataFrame df. Convert all text in these columns to lowercase using applymap() and store the result in df_lower_case.
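A sketch of the element-wise lowercase conversion on made-up rows. Recent pandas releases deprecate applymap() in favor of DataFrame.map(), so the example uses map() when it is available and falls back to applymap() otherwise. It also assumes every cell is a string; missing values would need separate handling.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Author": ["Jane AUSTEN", "Frank Herbert"],
    "Category_Name": ["Fiction", "SCIENCE Fiction"],
    "Price": [9.99, 14.50],
})

# Keep only the two text columns; copy() avoids modifying df by accident.
df_author_category = df[["Author", "Category_Name"]].copy()

# Apply str.lower to every cell. DataFrame.map is the newer name for
# applymap; fall back to applymap on older pandas versions.
if hasattr(pd.DataFrame, "map"):
    df_lower_case = df_author_category.map(str.lower)
else:
    df_lower_case = df_author_category.applymap(str.lower)

print(df_lower_case)
```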
Use the apply() method with a custom function to categorize books based on their ratings as follows:
5 stars: Excellent
4 to 4.9 stars: Very Good
3 to 3.9 stars: Good
Below 3 stars: Average
Store the categorized rating in the Rating_Category column. After that, use groupby() to count how many books fall into each category. Store the result as a Series in the rating_counts_series variable.
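A sketch of the custom function and the follow-up count. It assumes the rating lives in a column named Rating (hypothetical) and that the lowest tier, Average, covers every rating below 3. The function name categorize_rating is also just an illustrative choice.

```python
import pandas as pd

# Hypothetical sample data; "Rating" is an assumed column name.
df = pd.DataFrame({"Rating": [5.0, 4.5, 4.0, 3.2, 2.5, 4.9]})


def categorize_rating(rating):
    """Map a numeric rating to a label (thresholds checked from the top)."""
    if rating >= 5:
        return "Excellent"
    if rating >= 4:
        return "Very Good"
    if rating >= 3:
        return "Good"
    # Assumption: anything below 3 stars counts as "Average".
    return "Average"


df["Rating_Category"] = df["Rating"].apply(categorize_rating)

# size() counts the rows in each group and returns a Series.
rating_counts_series = df.groupby("Rating_Category").size()
print(rating_counts_series)
```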
Create a bar plot displaying the top 10 authors who appear most frequently in our dataset.
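One way to build this chart, assuming matplotlib is installed and that "most frequently" means the number of rows per Author. The author names are generated placeholders, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: "Author 0" appears 12 times, "Author 1" 11 times, etc.
authors = [f"Author {i}" for i in range(12) for _ in range(12 - i)]
df = pd.DataFrame({"Author": authors})

# value_counts() sorts by frequency, so head(10) gives the ten most frequent.
top_authors = df["Author"].value_counts().head(10)

fig, ax = plt.subplots(figsize=(8, 4))
top_authors.plot(kind="bar", ax=ax)
ax.set_xlabel("Author")
ax.set_ylabel("Number of books")
ax.set_title("Top 10 authors by number of books")
fig.tight_layout()
```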
Create a pie chart to visualize the distribution of books that are bestsellers versus those that are not.
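A sketch of the pie chart, assuming matplotlib is installed and that bestseller status is stored as booleans in a column named Is_Bestseller (a hypothetical name). The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; "Is_Bestseller" is an assumed column name.
df = pd.DataFrame({"Is_Bestseller": [True, False, False, True, False]})

# Replace the booleans with readable labels, then count each label.
counts = (
    df["Is_Bestseller"]
    .map({True: "Bestseller", False: "Not a bestseller"})
    .value_counts()
)

fig, ax = plt.subplots()
ax.pie(counts.values, labels=counts.index, autopct="%1.1f%%")
ax.set_title("Bestsellers vs. non-bestsellers")
```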