All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Merge DataFrames df1 and df2 based on the common column ASIN using a left join. Store the result in the df variable.
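A minimal sketch of the left join, assuming small toy stand-ins for df1 and df2. Only the ASIN column comes from the activity; the Title and Price columns and all values are hypothetical.

```python
import pandas as pd

# Toy stand-ins for df1 and df2 (hypothetical data; only ASIN is from the activity).
df1 = pd.DataFrame({"ASIN": ["A1", "A2", "A3"], "Title": ["Book A", "Book B", "Book C"]})
df2 = pd.DataFrame({"ASIN": ["A1", "A3"], "Price": [12.5, 30.0]})

# Left join: keep every row of df1; rows with no match in df2 get NaN in df2's columns.
df = pd.merge(df1, df2, on="ASIN", how="left")
print(df)
```

With how="left", the row for ASIN "A2" is kept even though df2 has no matching entry.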
Enter the number of categories of books present in the DataFrame df.
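One way to count the categories, sketched on hypothetical data and assuming the categories live in the Category_Name column used elsewhere in these activities:

```python
import pandas as pd

# Hypothetical sample; the real df comes from the merged book data.
df = pd.DataFrame({"Category_Name": ["Fiction", "Science", "Fiction", "History"]})

# nunique() counts distinct non-null values in the column.
n_categories = df["Category_Name"].nunique()
print(n_categories)
```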
Filter the DataFrame df to include only books with a rating higher than 4.5. After filtering, sort the books by price in descending order and store the top five rows in top_rated_books.
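The filter-then-sort pattern could look like the sketch below. The column names Rating and Title are assumptions (the activity does not name them), and the data is made up.

```python
import pandas as pd

# Hypothetical sample; "Rating" and "Title" are assumed column names.
df = pd.DataFrame({
    "Title": list("ABCDEFGH"),
    "Rating": [4.8, 4.6, 4.5, 4.9, 4.7, 3.9, 5.0, 4.6],
    "Price": [10, 25, 40, 15, 30, 50, 20, 5],
})

# Keep ratings strictly above 4.5, sort by price (highest first), take five rows.
top_rated_books = (
    df[df["Rating"] > 4.5]
    .sort_values("Price", ascending=False)
    .head(5)
)
print(top_rated_books)
```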
Filter the DataFrame df to include only bestseller books published from 2020 onwards. After filtering, sort the books by rating in descending order and store the top ten rows in recent_bestsellers.
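A possible sketch for combining two conditions. It assumes a boolean flag column named Is_Bestseller (hypothetical), a Rating column (hypothetical), and that Published_Date holds date strings that pd.to_datetime can parse.

```python
import pandas as pd

# Hypothetical sample; "Is_Bestseller" and "Rating" are assumed column names.
df = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Is_Bestseller": [True, True, False, True],
    "Published_Date": ["2019-05-01", "2020-01-01", "2021-03-15", "2022-07-09"],
    "Rating": [4.9, 4.2, 4.8, 4.6],
})

# Parse the dates once, then combine both conditions with &.
published = pd.to_datetime(df["Published_Date"])
mask = df["Is_Bestseller"] & (published.dt.year >= 2020)

recent_bestsellers = df[mask].sort_values("Rating", ascending=False).head(10)
print(recent_bestsellers)
```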
df.groupby('Category_Name')['Price'].mean()
Enter the name of the seller with the highest average book price.
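The same groupby-and-mean pattern can answer this, sketched here on made-up data. The column name Seller is an assumption; the activity does not name it.

```python
import pandas as pd

# Hypothetical sample; "Seller" is an assumed column name.
df = pd.DataFrame({
    "Seller": ["S1", "S2", "S1", "S3"],
    "Price": [10.0, 40.0, 20.0, 25.0],
})

# Average price per seller, then the index label of the largest average.
avg_price_by_seller = df.groupby("Seller")["Price"].mean()
top_seller = avg_price_by_seller.idxmax()
print(top_seller)
```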
Group the DataFrame df by Author and calculate the total number of reviews and the average rating for each author. After grouping, sort the result by the total number of reviews in descending order and store the top five authors in top_authors_by_reviews.
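A sketch of the grouping step using named aggregation. The column names Reviews and Rating, and the output names Total_Reviews and Average_Rating, are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample; "Reviews" and "Rating" are assumed column names.
df = pd.DataFrame({
    "Author": ["Ann", "Bob", "Ann", "Cy"],
    "Reviews": [100, 50, 200, 400],
    "Rating": [4.0, 3.5, 5.0, 4.2],
})

# Named aggregation: one output column per (source column, function) pair.
top_authors_by_reviews = (
    df.groupby("Author")
    .agg(Total_Reviews=("Reviews", "sum"), Average_Rating=("Rating", "mean"))
    .sort_values("Total_Reviews", ascending=False)
    .head(5)
)
print(top_authors_by_reviews)
```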
Store the resultant DataFrame in category_summary_df.
Create a new DataFrame containing dummy variables for the Category_Name column. Convert the categorical values in Category_Name into binary dummy variables. Store the resultant DataFrame in the variable category_dummies.
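A minimal sketch using pd.get_dummies on hypothetical category values:

```python
import pandas as pd

# Hypothetical sample values for Category_Name.
df = pd.DataFrame({"Category_Name": ["Fiction", "Science", "Fiction", "History"]})

# One indicator column per distinct category; each row has exactly one "on" value.
category_dummies = pd.get_dummies(df["Category_Name"])
print(category_dummies)
```

Depending on the pandas version, the indicator columns may be boolean or integer; passing dtype=int is one way to force 0/1 values.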
Create a new column Price_Category by binning the Price column into 5 equal-width bins.
Use the following bins and labels:
0-140: Very Low
140-280: Low
280-420: Medium
420-560: High
560-700: Very High
Make sure to set include_lowest=True.
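The binning can be sketched with pd.cut, using the bin edges and labels listed above. The prices are hypothetical and chosen to land on the edges.

```python
import pandas as pd

# Hypothetical prices, including values on the bin edges.
df = pd.DataFrame({"Price": [0, 140, 141, 300, 500, 700]})

bins = [0, 140, 280, 420, 560, 700]
labels = ["Very Low", "Low", "Medium", "High", "Very High"]

# Bins are right-inclusive by default; include_lowest=True also admits the
# left edge of the first bin, so a price of exactly 0 is not left as NaN.
df["Price_Category"] = pd.cut(df["Price"], bins=bins, labels=labels, include_lowest=True)
print(df)
```

Without include_lowest=True, a price of exactly 0 would fall outside the first interval and become NaN.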
Use a lambda function with the apply() method to extract the year from the Published_Date column. Store the extracted year in the published_year_series variable.
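A sketch of the lambda approach, assuming Published_Date holds ISO-style date strings (the actual format in the dataset is not stated):

```python
import pandas as pd

# Hypothetical sample; the real date format may differ.
df = pd.DataFrame({"Published_Date": ["2019-05-01", "2021-03-15"]})

# Parse to timestamps first, then pull the year out of each one with a lambda.
published_year_series = pd.to_datetime(df["Published_Date"]).apply(lambda d: d.year)
print(published_year_series)
```

Outside the exercise, the vectorized .dt.year accessor gives the same values without a lambda.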
Calculate the length of each book title in terms of the number of words and store it in a new column Title_Length. Determine the median title length. Classify each title as either Long Title or Short Title based on Title_Length. If Title_Length is greater than or equal to the median length, classify it as Long Title; if it is less than the median, classify it as Short Title. Store this classification in a new column Title_Category.
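The three steps (word count, median, classification) might be sketched as follows. The column name Title is an assumption, and the titles are sample values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; "Title" is an assumed column name.
df = pd.DataFrame({"Title": ["Dune", "The Hobbit", "A Tale of Two Cities", "Pride and Prejudice"]})

# Word count: split on whitespace, then take the length of each resulting list.
df["Title_Length"] = df["Title"].str.split().str.len()

median_length = df["Title_Length"].median()

# Titles at or above the median are "Long Title"; the rest are "Short Title".
df["Title_Category"] = np.where(
    df["Title_Length"] >= median_length, "Long Title", "Short Title"
)
print(df)
```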
Create a new DataFrame df_author_category containing only the Author and Category_Name columns from the original DataFrame df. Convert all text in these columns to lowercase using applymap() and store the result in df_lower_case.
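A sketch of the element-wise lowercase step on made-up rows. Note that pandas 2.1 deprecated DataFrame.applymap() in favor of DataFrame.map(), so this version picks whichever method the installed pandas provides; that fallback is an addition for portability, not part of the activity.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Author": ["J.K. Rowling", "George Orwell"],
    "Category_Name": ["Fantasy", "Dystopian Fiction"],
    "Price": [20.0, 15.0],
})

df_author_category = df[["Author", "Category_Name"]]

# DataFrame.map replaces applymap from pandas 2.1 onward; fall back on older versions.
if hasattr(df_author_category, "map"):
    elementwise = df_author_category.map
else:
    elementwise = df_author_category.applymap

# Apply str.lower to every cell of the two text columns.
df_lower_case = elementwise(str.lower)
print(df_lower_case)
```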
Use the apply() method with a custom function to categorize books based on their ratings as follows:
5 stars: Excellent
4 to 4.9 stars: Very Good
3 to 3.9 stars: Good
Below 3 stars: Average
Store the categorized rating in the Rating_Category column. After that, use groupby() to count how many books fall into each category. Store the result as a Series in the rating_counts_series variable.
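A sketch of the custom function and the count. It assumes the ratings live in a column named Rating (hypothetical) and treats anything below 3 as Average, which is an interpretation of the last rule rather than something the activity states outright.

```python
import pandas as pd

def categorize_rating(rating):
    """Map a numeric rating to a label (below 3 -> Average is an assumption)."""
    if rating == 5:
        return "Excellent"
    if rating >= 4:
        return "Very Good"
    if rating >= 3:
        return "Good"
    return "Average"

# Hypothetical sample; "Rating" is an assumed column name.
df = pd.DataFrame({"Rating": [5.0, 4.5, 4.0, 3.2, 2.1, 5.0]})

df["Rating_Category"] = df["Rating"].apply(categorize_rating)

# size() counts the rows in each group and returns a Series indexed by category.
rating_counts_series = df.groupby("Rating_Category").size()
print(rating_counts_series)
```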
Create a bar plot displaying the top 10 authors who appear most frequently in our dataset.
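The counting step behind such a plot can be sketched as below, on hypothetical author names. Only the data preparation is shown.

```python
import pandas as pd

# Hypothetical sample of authors.
df = pd.DataFrame({"Author": ["Ann", "Ann", "Ann", "Bob", "Bob", "Cy"]})

# value_counts() sorts by frequency (highest first); head(10) keeps the top ten.
top_authors = df["Author"].value_counts().head(10)
print(top_authors)
```

If matplotlib is installed, calling top_authors.plot(kind="bar") on this Series would draw the bars.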
Create a pie chart to visualize the distribution of books that are bestsellers versus those that are not.