Practice DataFrame Mutations using Good Reads Books and Reviews Data

In this coding lab, you'll manipulate a dataset of "Best Books Ever" from Goodreads. Tasks include creating new columns, deleting rows and columns, modifying values, adding new rows, and utilizing the inplace parameter.
Project Created by Anurag Verma

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

Calculating the Price-to-Rating Ratio

Create a new column, Price-to-Rating Ratio, that divides each book's price by its average rating. This ratio helps show how a book's price relates to its average rating.
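A minimal sketch, assuming `price` and `rating` can be treated as numeric columns. The toy DataFrame is hypothetical stand-in data, not the Goodreads file. `pd.to_numeric(..., errors="coerce")` guards against prices stored as text, and ratings of 0 are turned into NaN so the division does not produce infinity (that zero-handling is an added choice, not part of the activity).

```python
import pandas as pd

# Hypothetical stand-in for the Goodreads DataFrame
df = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "price": ["10.00", "4.50", "8.00"],  # prices may arrive as strings
    "rating": [4.0, 4.5, 0.0],
})

# Coerce prices to floats; anything unparseable becomes NaN
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Vectorized, row-by-row division; a 0 rating becomes NaN instead of inf
df["Price-to-Rating Ratio"] = df["price"] / df["rating"].replace(0, float("nan"))
print(df)
```

Because the division is vectorized, no explicit loop over rows is needed.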

Remove the `isbn` Column

The "isbn" column is not needed for our analysis. Write a script to remove this column from the dataframe.
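One way to do this is `DataFrame.drop` with the `columns` argument. The small frame below is hypothetical sample data.

```python
import pandas as pd

# Hypothetical three-column sample
df = pd.DataFrame({
    "title": ["Book A", "Book B"],
    "isbn": ["9780000000001", "9780000000002"],
    "rating": [4.1, 3.8],
})

# drop() returns a new DataFrame, so reassign the result to keep the change
df = df.drop(columns=["isbn"])
print(df.columns.tolist())  # ['title', 'rating']
```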

Extract and Add the `YearPublished` Column

Write a script to extract the publication year from the publishDate column and create a new column named YearPublished in the dataframe.

After extracting the year, convert it to a datetime value that holds only the year (e.g., 2000, 2001). pandas stores such a value as January 1 of that year, for example 2000-01-01.
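A minimal sketch of the two steps, assuming `publishDate` holds strings in `YYYY-MM-DD` form (the format the later date-conversion activity uses). The sample dates are hypothetical.

```python
import pandas as pd

# Hypothetical sample; assumes publishDate strings look like "YYYY-MM-DD"
df = pd.DataFrame({"publishDate": ["2000-05-01", "2008-09-14", "1997-06-26"]})

# Step 1: parse the strings and pull out the year as an integer
years = pd.to_datetime(df["publishDate"], format="%Y-%m-%d").dt.year

# Step 2: turn each year back into a datetime; pandas stores it as January 1 of that year
df["YearPublished"] = pd.to_datetime(years.astype(str), format="%Y")
print(df)
```

If some dates cannot be parsed, adding `errors="coerce"` yields NaT, the extracted years become floats, and `astype(str)` would then produce strings like "2000.0" that `%Y` rejects. One option in that case is `dates.dt.to_period("Y").dt.to_timestamp()`, which gives the same January 1 values and keeps NaT.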

Filter Books with Ratings of 4.5 or Above

Create a new dataframe that only includes books with ratings equal to or above 4.5. Name this new dataframe best_books.
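A boolean mask is the usual way to filter rows. The ratings below are hypothetical; note that 4.5 itself is kept because the comparison is `>=`.

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "rating": [4.62, 4.5, 4.21],
})

# Keep rows whose rating is 4.5 or higher; .copy() makes best_books
# independent of df, which avoids SettingWithCopyWarning on later edits
best_books = df[df["rating"] >= 4.5].copy()
print(best_books)
```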

Count and Add the Number of Genres

Each book is associated with multiple genres, stored in the genres column as the string representation of a list of strings. Create a new column GenreCount that stores the number of genres associated with each book.
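A minimal sketch, assuming each `genres` value is a string such as `"['Fiction', 'Classics']"`. It uses `ast.literal_eval` from the standard library to turn the string back into a list before counting; calling `len` on the raw string would count characters, not genres.

```python
import ast
import pandas as pd

# Hypothetical sample; genres are stored as list literals inside strings
df = pd.DataFrame({"genres": ["['Fiction', 'Classics']", "['Fantasy']", "[]"]})

# Parse each string into a real list, then count its elements
df["GenreCount"] = df["genres"].apply(lambda s: len(ast.literal_eval(s)))
print(df)
```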

Split Author Names into First and Last Name Columns

Some analyses might require having the author's first and last names in separate columns. Write a script to create two new columns, FirstName and LastName, from the author column. For simplicity, assume the last word in the author field is the last name and everything before it is the first name.
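A sketch of the stated rule (last word is the last name, everything before it is the first name), assuming `author` has no missing values; a NaN would break `.split()`. The author list is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"author": ["Harper Lee", "F. Scott Fitzgerald", "Plato"]})

# Everything except the last word forms the first name ("" for one-word names)
df["FirstName"] = df["author"].apply(lambda name: " ".join(name.split()[:-1]))

# The last word is treated as the last name
df["LastName"] = df["author"].apply(lambda name: name.split()[-1])
print(df)
```

Under this rule a single-word name such as "Plato" gets an empty FirstName, and "F. Scott Fitzgerald" gets the FirstName "F. Scott".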

Drop Books with Fewer than 100 Pages

Some entries in the dataset might represent short stories or other short works. For this activity, remove all rows from the dataframe where the number of pages is less than 100.
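A minimal sketch using `DataFrame.drop` with the index labels of the short books. It assumes page counts might be stored as text, so they are coerced to numbers first; the titles and counts are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["Short Story", "Novel", "Novella", "Edge Case"],
    "pages": ["45", "320", "99", "100"],
})

# Page counts may be stored as text; coerce them to numbers (bad values -> NaN)
df["pages"] = pd.to_numeric(df["pages"], errors="coerce")

# Find the labels of rows under 100 pages and drop them.
# Rows whose pages value is NaN are kept, because NaN < 100 evaluates to False.
df = df.drop(df[df["pages"] < 100].index)
print(df)
```

Dropping rows leaves gaps in the index (here the remaining labels are 1 and 3), which matters later when rows are added at label `len(df)`.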

Extract the Primary Genre

Each book can belong to multiple genres. Create a new column PrimaryGenre that contains only the first genre listed for each book.

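Note: The genres column contains a string representation of a list of genres. You can use the eval function to convert the string representation back into a list; ast.literal_eval from the standard library does the same for list literals and is safer, because it refuses to execute arbitrary code.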

Also, if the genres column contains an empty list, set the value of PrimaryGenre to None.
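A sketch of this activity using a small helper, `first_genre`, which is a hypothetical name introduced here. It parses the string with `ast.literal_eval` and returns None when the list is empty.

```python
import ast
import pandas as pd

df = pd.DataFrame({"genres": ["['Fiction', 'Classics']", "['Fantasy']", "[]"]})

def first_genre(genres_str):
    """Return the first genre in a list-literal string, or None if the list is empty."""
    genres = ast.literal_eval(genres_str)  # parses literals only, never runs code
    return genres[0] if genres else None

df["PrimaryGenre"] = df["genres"].apply(first_genre)
print(df)
```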

Flag Books with Multiple Awards

Create a new column MultipleAwards that flags books that have won multiple awards. If a book has won more than one award, set the value of MultipleAwards to True; otherwise, set it to False.
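A minimal sketch, assuming (this is not stated above) that the `awards` column is stored like `genres`, as a list literal inside a string. The award names are placeholders.

```python
import ast
import pandas as pd

# Hypothetical data; assumes awards is a list literal stored in a string
df = pd.DataFrame({"awards": [
    "['Award A', 'Award B']",
    "['Award C']",
    "[]",
]})

# True when the parsed list holds more than one award, otherwise False
df["MultipleAwards"] = df["awards"].apply(lambda s: len(ast.literal_eval(s)) > 1)
print(df)
```

If the column turns out to hold a different format (for example a delimited string), only the parsing step inside the lambda would need to change.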

Estimate Reading Time Based on Page Count

Assuming an average reading speed of 250 words per minute and approximately 300 words per page, create a new column ReadingTimeHours that estimates the reading time in hours for each book.
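The estimate is plain unit arithmetic: words = pages × 300, minutes = words / 250, hours = minutes / 60, so hours = pages / 50. A 250-page book, for example, works out to about 5 hours. A sketch with hypothetical page counts:

```python
import pandas as pd

WORDS_PER_PAGE = 300    # assumption stated in the activity
WORDS_PER_MINUTE = 250  # average reading speed from the activity

df = pd.DataFrame({"title": ["Book A", "Book B", "Book C"], "pages": [180, 250, 328]})

# pages -> words -> minutes -> hours
df["ReadingTimeHours"] = df["pages"] * WORDS_PER_PAGE / WORDS_PER_MINUTE / 60
print(df)
```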

Flag 21st Century Publications

Create a new column Published21stCentury that flags (True/False) whether a book was published in the 21st century (year 2000 and onwards).
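A sketch that assumes `YearPublished` is already a datetime column, as built in the earlier year-extraction activity. The years are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"YearPublished": pd.to_datetime(["1999", "2000", "2015"], format="%Y")})

# Compare the year component; 2000 counts as 21st century per the activity's definition.
# For missing dates (NaT), .dt.year is NaN and the comparison yields False.
df["Published21stCentury"] = df["YearPublished"].dt.year >= 2000
print(df)
```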

Simplifying the DataFrame by Dropping Columns

Drop the coverImg, description, and ratingsByStars columns from the dataframe as they will not be used in further analysis. Drop these columns permanently by setting the inplace parameter to True.
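With `inplace=True`, `drop` modifies the DataFrame it is called on and returns None, so the result must not be reassigned. A sketch on a hypothetical one-row frame:

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["Book A"],
    "coverImg": ["https://example.com/a.jpg"],
    "description": ["A short blurb."],
    "ratingsByStars": ["['10', '5', '2', '1', '0']"],
})

# inplace=True changes df directly and returns None,
# so writing df = df.drop(..., inplace=True) would set df to None
df.drop(columns=["coverImg", "description", "ratingsByStars"], inplace=True)
print(df.columns.tolist())  # ['title']
```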

Adding a New Book Entry

Add a new book entry to the dataframe with the following details:

new_book = {
    "bookID": 10000,
    "title": "The Great Gatsby",
    "author": "F. Scott Fitzgerald",
    "rating": 3.9,
    "pages": 180,
    "publishDate": '1925-04-10',
    "publisher": "Scribner",
    "price": 7.99,
    "genres": "['Fiction', 'Classics']",
    "GenreCount": 2,
    "FirstName": "F. Scott",
    "LastName": "Fitzgerald",
    "PrimaryGenre": "Fiction",
    "MultipleAwards": False,
    "ReadingTimeHours": 9.0,
    "Published21stCentury": False
}

Add this new entry at index label len(df), so that it becomes the last row of the dataframe.
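A minimal sketch using `.loc` "setting with enlargement", where assigning to a label that does not exist yet appends a row and the dict keys are matched to column names. One caveat: if rows were dropped earlier (as in the under-100-pages activity), the index can have gaps, `len(df)` may already be an existing label, and `.loc` would then overwrite that row instead of appending. The `reset_index(drop=True)` call is an added safeguard, not part of the activity, and the frame and trimmed `new_book` are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose index has a gap, e.g. after dropping short books
df = pd.DataFrame(
    {"title": ["Book A", "Book B", "Book C"], "rating": [4.1, 3.7, 4.4]},
    index=[0, 2, 3],
)

new_book = {"title": "The Great Gatsby", "rating": 3.9}  # trimmed copy of the entry above

# Renumber rows 0..n-1 so that the label len(df) is guaranteed to be unused
df = df.reset_index(drop=True)

# Assigning to a new label appends a row; dict keys are aligned to columns
df.loc[len(df)] = new_book
print(df)
```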

Transforming Publish Dates into Datetime Format

The publishDate and firstPublishDate columns contain dates stored as strings (object dtype). Convert these columns to datetime objects to enable date-based operations and analyses.

Use the format "%Y-%m-%d" to convert the string dates into datetime objects.
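A sketch that converts both columns in a loop. `errors="coerce"` is an added choice, not required by the activity: it turns strings that do not match the format into NaT instead of raising an error. The sample values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "publishDate": ["1960-07-11", "1949-06-08", "unknown"],
    "firstPublishDate": ["1960-07-11", "1949-06-08", "1932-01-01"],
})

for col in ["publishDate", "firstPublishDate"]:
    # Parse with an explicit format; non-matching strings become NaT
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d", errors="coerce")

print(df.dtypes)
```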

Bulk Adding New Book Entries to the DataFrame

Add multiple new book entries to the DataFrame at once. This activity involves creating a list of dictionaries, where each dictionary represents a new book entry with values for all the relevant columns, and then appending this list to the existing DataFrame.

Below are the details of the new book entries:

new_books = [
    {
        "bookID": 10001,
        "title": "To Kill a Mockingbird",
        "author": "Harper Lee",
        "rating": 4.3,
        "pages": 281,
        "publishDate": pd.to_datetime('1960-07-11'),
        "firstPublishDate": pd.to_datetime('1960-07-11'),
        "publisher": "J.B. Lippincott & Co.",
        "price": 9.99,
        "genres": "['Fiction', 'Classics']",
        "GenreCount": 2,
        "FirstName": "Harper",
        "LastName": "Lee",
        "PrimaryGenre": "Fiction",
        "MultipleAwards": False,
        "ReadingTimeHours": 11.24,
        "Published21stCentury": False
    },
    {
        "bookID": 10002,
        "title": "1984",
        "author": "George Orwell",
        "rating": 4.2,
        "pages": 328,
        "publishDate": pd.to_datetime('1949-06-08'),
        "firstPublishDate": pd.to_datetime('1949-06-08'),
        "publisher": "Secker & Warburg",
        "price": 12.99,
        "genres": "['Fiction', 'Classics']",
        "GenreCount": 2,
        "FirstName": "George",
        "LastName": "Orwell",
        "PrimaryGenre": "Fiction",
        "MultipleAwards": False,
        "ReadingTimeHours": 13.12,
        "Published21stCentury": False
    }
]

Add these new entries to the dataframe at index labels len(df) and len(df) + 1, respectively.
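`DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so a common approach is to build a DataFrame from the list of dicts and combine it with `pd.concat`. In this sketch the new rows are given the labels `len(df)` and `len(df) + 1` explicitly; the existing data and the trimmed `new_books` list are hypothetical.

```python
import pandas as pd

# Hypothetical existing data and a trimmed version of the new_books list above
df = pd.DataFrame({
    "title": ["Book A", "Book B"],
    "rating": [4.1, 3.7],
    "publishDate": pd.to_datetime(["2005-01-01", "2012-03-15"]),
})
new_books = [
    {"title": "To Kill a Mockingbird", "rating": 4.3, "publishDate": pd.to_datetime("1960-07-11")},
    {"title": "1984", "rating": 4.2, "publishDate": pd.to_datetime("1949-06-08")},
]

# Label the new rows len(df), len(df) + 1, ... and stack them under the existing rows
new_rows = pd.DataFrame(new_books, index=range(len(df), len(df) + len(new_books)))
df = pd.concat([df, new_rows])
print(df)
```

Passing `ignore_index=True` to `pd.concat` would also give the new rows those labels, but it renumbers every row, which changes existing labels if the index has gaps.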

What's up, friends! 👋 I'm a computer science student about to finish my last year of college. 🎓 I LOVE writing code! ❤️ It makes me so happy! 😄 Whether I'm goofing in notebooks 📓 or coding in Python 🐍, writing programs is a blast! 💥

This project is part of Intro to Pandas for Data Analysis.
