Practice DataFrame Mutations using Goodreads Books and Reviews Data

Calculating the Price-to-Rating Ratio

Create a new column Price-to-Rating Ratio in the DataFrame that calculates the price-to-rating ratio for each book. This ratio will help us understand how the price of a book relates to its average rating.
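
A minimal sketch of this step, assuming the dataframe has numeric `price` and `rating` columns. The two-row dataframe below is made-up toy data, not the real dataset:

```python
import pandas as pd

# Toy stand-in for the books dataframe (hypothetical values).
df = pd.DataFrame({
    "title": ["A", "B"],
    "price": [10.0, 9.0],
    "rating": [4.0, 4.5],
})

# Dividing one column by another works element-wise, one ratio per row.
df["Price-to-Rating Ratio"] = df["price"] / df["rating"]
```

A rating of 0 would produce `inf` here, so the real data may need a check before dividing.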

Remove the `isbn` Column

The "isbn" column is not needed for our analysis. Write a script to remove this column from the dataframe.
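
One possible way to do this is with `DataFrame.drop`. The dataframe below is toy data used only for illustration:

```python
import pandas as pd

# Toy stand-in for the books dataframe (hypothetical values).
df = pd.DataFrame({"title": ["A", "B"], "isbn": ["111", "222"]})

# drop(columns=...) returns a new dataframe without the listed columns.
df = df.drop(columns=["isbn"])
```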

Extract and Add the `YearPublished` Column

Write a script to extract the publication year from the publishDate column and create a new column named YearPublished in the dataframe.

After extracting the year, convert it to a datetime format with only the year (e.g., 2000, 2001, etc.).
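
A rough sketch, assuming `publishDate` holds strings in `YYYY-MM-DD` form (the real column may use another format). It stores the year as a nullable integer, which is only one reading of "a datetime format with only the year". The sample rows are made up:

```python
import pandas as pd

# Toy stand-in for the books dataframe (hypothetical values).
df = pd.DataFrame({"publishDate": ["2000-09-14", "2001-01-02", "not a date"]})

# Parse the strings; unparseable values become NaT instead of raising.
parsed = pd.to_datetime(df["publishDate"], format="%Y-%m-%d", errors="coerce")

# .dt.year yields floats when NaT is present, so cast to nullable Int64.
df["YearPublished"] = parsed.dt.year.astype("Int64")
```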

Filter Books with Ratings of 4.5 or Above

Create a new dataframe that only includes books with ratings equal to or above 4.5. Name this new dataframe best_books.
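
A short sketch of boolean-mask filtering, assuming a numeric `rating` column. The rows are toy data:

```python
import pandas as pd

# Toy stand-in for the books dataframe (hypothetical values).
df = pd.DataFrame({"title": ["A", "B", "C"], "rating": [4.2, 4.5, 4.8]})

# The comparison builds a boolean mask; .copy() makes best_books an
# independent dataframe, so later edits to it do not affect df.
best_books = df[df["rating"] >= 4.5].copy()
```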

Count and Add the Number of Genres

Each book is associated with multiple genres, stored as a list of strings. Create a new column GenreCount that stores the number of genres associated with each book.
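
A minimal sketch, assuming each `genres` value is a string that looks like a Python list (for example `"['Fiction', 'Classics']"`). It parses the string with `ast.literal_eval`, which is a safer substitute for `eval`. The rows are toy data:

```python
import ast

import pandas as pd

# Toy stand-in for the books dataframe (hypothetical values).
df = pd.DataFrame({"genres": ["['Fiction', 'Classics']", "[]", "['Poetry']"]})

# Parse each string into a real list, then count its elements.
df["GenreCount"] = df["genres"].apply(lambda s: len(ast.literal_eval(s)))
```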

Split Author Names into First and Last Name Columns

Some analyses might require having the author's first and last names in separate columns. Write a script to create two new columns, FirstName and LastName, from the author column. For simplicity, assume the last word in the author field is the last name and everything before it is the first name.
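
One way to sketch the "last word is the last name" rule is to split once from the right. The empty first name for single-word authors is a choice made here, not something the activity specifies. The rows are toy data:

```python
import pandas as pd

# Toy stand-in for the books dataframe (hypothetical values).
df = pd.DataFrame({"author": ["F. Scott Fitzgerald", "Harper Lee", "Homer"]})

# Split at most once, starting from the right-hand side.
parts = df["author"].str.strip().str.rsplit(pat=" ", n=1)

# The last piece is always the last name.
df["LastName"] = parts.str[-1]

# Everything before it is the first name; use "" when there is only one word.
df["FirstName"] = parts.str[0].where(parts.str.len() > 1, "")
```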

Drop Books with Fewer than 100 Pages

Some entries in the dataset might represent short stories or other short works. For this activity, remove all rows from the dataframe where the number of pages is less than 100.
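
A brief sketch, assuming `pages` is already numeric (in the real data it may need converting first). The rows are toy data:

```python
import pandas as pd

# Toy stand-in for the books dataframe (hypothetical values).
df = pd.DataFrame({"title": ["A", "B", "C"], "pages": [48, 100, 320]})

# Keep only rows with at least 100 pages; this drops the shorter ones.
df = df[df["pages"] >= 100]
```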

Extract the Primary Genre

Each book can belong to multiple genres. Create a new column PrimaryGenre that contains only the first genre listed for each book. The genres column contains a list of genres.

Note: The genres column contains a string representation of a list of genres. You can use the eval function to convert the string representation back into a list.

Also, if the genres column contains an empty list, set the value of PrimaryGenre to None.
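
A possible sketch of this step. It uses `ast.literal_eval` in place of the `eval` mentioned in the note, since it only accepts Python literals. The helper name `first_genre` and the sample rows are hypothetical:

```python
import ast

import pandas as pd

# Toy stand-in for the books dataframe (hypothetical values).
df = pd.DataFrame({"genres": ["['Fiction', 'Classics']", "[]"]})


def first_genre(genres_text):
    """Return the first genre in the list, or None for an empty list."""
    genres = ast.literal_eval(genres_text)
    return genres[0] if genres else None


df["PrimaryGenre"] = df["genres"].apply(first_genre)
```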

Flag Books with Multiple Awards

Create a new column MultipleAwards that flags books that have won multiple awards. If a book has won more than one award, set the value of MultipleAwards to True; otherwise, set it to False.
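
A sketch under a loud assumption: the activity does not name the source column, so this supposes a column called `awards` holding a string representation of a list, like `genres`. The rows are toy data:

```python
import ast

import pandas as pd

# Toy stand-in; the "awards" column name and its format are assumptions.
df = pd.DataFrame({
    "awards": ["['Hugo Award', 'Nebula Award']", "['Booker Prize']", "[]"],
})

# True only when the parsed list holds more than one award.
df["MultipleAwards"] = df["awards"].apply(lambda s: len(ast.literal_eval(s)) > 1)
```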

Estimate Reading Time Based on Page Count

Assuming an average reading speed of 250 words per minute and approximately 300 words per page, create a new column ReadingTimeHours that estimates the reading time in hours for each book.
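
Under the stated assumptions, the estimate is pages × 300 words ÷ 250 words per minute ÷ 60 minutes per hour. A minimal sketch, assuming a numeric `pages` column. The constant names and sample rows are made up:

```python
import pandas as pd

WORDS_PER_PAGE = 300    # from the activity text
WORDS_PER_MINUTE = 250  # from the activity text

# Toy stand-in for the books dataframe (hypothetical values).
df = pd.DataFrame({"pages": [250, 100]})

# Total words / reading speed gives minutes; divide by 60 for hours.
df["ReadingTimeHours"] = df["pages"] * WORDS_PER_PAGE / WORDS_PER_MINUTE / 60
```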

Flag 21st Century Publications

Create a new column Published21stCentury that flags (True/False) whether a book was published in the 21st century (year 2000 and onwards).
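
A short sketch that derives the year from `publishDate`, assuming `YYYY-MM-DD` strings. If a year column already exists, the comparison can be applied to it directly. The rows are toy data:

```python
import pandas as pd

# Toy stand-in for the books dataframe (hypothetical values).
df = pd.DataFrame({"publishDate": ["1999-12-31", "2000-01-01", "2015-06-30"]})

# Extract the year; unparseable dates become NaN and compare as False.
year = pd.to_datetime(df["publishDate"], format="%Y-%m-%d", errors="coerce").dt.year

# The activity defines the 21st century as year 2000 and onwards.
df["Published21stCentury"] = year >= 2000
```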

Simplifying the DataFrame by Dropping Columns

Drop the coverImg, description, and ratingsByStars columns from the dataframe as they will not be used in further analysis. Drop these columns permanently by setting the inplace parameter to True.
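
A minimal sketch of an in-place drop on a toy dataframe (the values are made up):

```python
import pandas as pd

# Toy stand-in for the books dataframe (hypothetical values).
df = pd.DataFrame({
    "title": ["A"],
    "coverImg": ["a.jpg"],
    "description": ["..."],
    "ratingsByStars": ["[1, 2, 3, 4, 5]"],
})

# inplace=True modifies df directly and returns None, so do not reassign.
df.drop(columns=["coverImg", "description", "ratingsByStars"], inplace=True)
```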

Adding a New Book Entry

Add a new book entry to the dataframe with the following details:

new_book = {
    "bookID": 10000,
    "title": "The Great Gatsby",
    "author": "F. Scott Fitzgerald",
    "rating": 3.9,
    "pages": 180,
    "publishDate": '1925-04-10',
    "publisher": "Scribner",
    "price": 7.99,
    "genres": "['Fiction', 'Classics']",
    "GenreCount": 2,
    "FirstName": "F.",
    "LastName": "Fitzgerald",
    "PrimaryGenre": "Fiction",
    "MultipleAwards": False,
    "ReadingTimeHours": 9.0,
    "Published21stCentury": True
}

Add this new entry at index len(df), so that it becomes the last row of the dataframe.
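
A sketch of adding a row by label with `.loc`, assuming the dataframe has a default integer index so that `len(df)` is not already in use. The dataframe and the shortened `new_book` dict use only a few of the columns, for brevity:

```python
import pandas as pd

# Toy stand-in for the books dataframe (hypothetical values).
df = pd.DataFrame({"bookID": [1, 2], "title": ["A", "B"], "pages": [100, 200]})

# Shortened version of the entry; the real one has more columns.
new_book = {"bookID": 10000, "title": "The Great Gatsby", "pages": 180}

# Assigning to a label that does not exist yet enlarges the dataframe.
df.loc[len(df)] = pd.Series(new_book)
```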

Transforming Publish Dates into Datetime Format

The publishDate and firstPublishDate columns contain dates in object (string) format. Convert these columns into datetime objects to enable more sophisticated date-based operations and analyses.

Use the format "%Y-%m-%d" to convert the string dates into datetime objects.
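
A minimal sketch of the conversion with `pd.to_datetime` and the stated format. The rows are toy data:

```python
import pandas as pd

# Toy stand-in for the books dataframe (hypothetical values).
df = pd.DataFrame({
    "publishDate": ["2008-09-14", "1960-07-11"],
    "firstPublishDate": ["2008-09-14", "1960-07-11"],
})

# Convert each string column to datetime64 using the given format.
for col in ["publishDate", "firstPublishDate"]:
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d")
```

If the real data contains values that do not match the format, this call raises an error. Passing `errors="coerce"` would turn those values into `NaT` instead.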

Bulk Adding New Book Entries to the DataFrame

Add multiple new book entries to the DataFrame at once. This activity involves creating a list of dictionaries, where each dictionary represents a new book entry with values for all the relevant columns, and then appending this list to the existing DataFrame.

Below are the details of the new book entries:

new_books = [
    {
        "bookID": 10001,
        "title": "To Kill a Mockingbird",
        "author": "Harper Lee",
        "rating": 4.3,
        "pages": 281,
        "publishDate": pd.to_datetime('1960-07-11'),
        "firstPublishDate": pd.to_datetime('1960-07-11'),
        "publisher": "J.B. Lippincott & Co.",
        "price": 9.99,
        "genres": "['Fiction', 'Classics']",
        "GenreCount": 2,
        "FirstName": "Harper",
        "LastName": "Lee",
        "PrimaryGenre": "Fiction",
        "MultipleAwards": False,
        "ReadingTimeHours": 11.24,
        "Published21stCentury": False
    },
    {
        "bookID": 10002,
        "title": "1984",
        "author": "George Orwell",
        "rating": 4.2,
        "pages": 328,
        "publishDate": pd.to_datetime('1949-06-08'),
        "firstPublishDate": pd.to_datetime('1949-06-08'),
        "publisher": "Secker & Warburg",
        "price": 12.99,
        "genres": "['Fiction', 'Classics']",
        "GenreCount": 2,
        "FirstName": "George",
        "LastName": "Orwell",
        "PrimaryGenre": "Fiction",
        "MultipleAwards": False,
        "ReadingTimeHours": 13.12,
        "Published21stCentury": False
    }
]

Add these new entries to the dataframe at positions len(df) and len(df) + 1, respectively.
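
One way to sketch the bulk addition is to build a dataframe from the list of dictionaries and concatenate it. With `ignore_index=True` the new rows land at positions `len(df)` and `len(df) + 1`. The dataframe and the shortened entries use only a few of the columns:

```python
import pandas as pd

# Toy stand-in for the books dataframe (hypothetical values).
df = pd.DataFrame({
    "bookID": [1, 2],
    "title": ["A", "B"],
    "publishDate": pd.to_datetime(["2008-09-14", "2010-01-01"]),
})

# Shortened versions of the entries; the real ones have more columns.
new_books = [
    {"bookID": 10001, "title": "To Kill a Mockingbird",
     "publishDate": pd.to_datetime("1960-07-11")},
    {"bookID": 10002, "title": "1984",
     "publishDate": pd.to_datetime("1949-06-08")},
]

start = len(df)  # position where the first new row will land

# ignore_index=True renumbers the rows 0..n-1 after concatenation.
df = pd.concat([df, pd.DataFrame(new_books)], ignore_index=True)
```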

Anurag Verma
