Pandas Capstone Project: Working with custom data + Titanic bonus

codevalidated

Creating dataframe from dictionary

In this activity, you will create a custom DataFrame from a dictionary containing the data below:

Name	Age	Sex
Alice	25	F
Bob	30	M
Charlie	45	M
Diana	20	F
Emma	28	F
Frank	50	M
Grace	32	F
Henry	37	M
Isabella	23	F
Jack	42	M
Karen	29	F
Liam	31	M
Maria	48	F
Nathan	27	M
Olivia	36	F
Peter	41	M

Don't worry, you don't need to type all the data. Just copy the dictionary below and create a new DataFrame named df from it.

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace', 'Henry', 'Isabella', 'Jack', 'Karen', 'Liam', 'Maria', 'Nathan', 'Olivia', 'Peter'],
    'Age': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42, 29, 31, 48, 27, 36, 41],
    'Sex': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M']
}
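A minimal sketch of this step, assuming pandas is installed and imported under its usual alias pd. The dictionary is the one given above.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace', 'Henry',
             'Isabella', 'Jack', 'Karen', 'Liam', 'Maria', 'Nathan', 'Olivia', 'Peter'],
    'Age': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42, 29, 31, 48, 27, 36, 41],
    'Sex': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M'],
}

# Each dictionary key becomes a column; each list supplies that column's values.
df = pd.DataFrame(data)

print(df.shape)   # (16, 3)
print(df.head())
```

All three lists must have the same length, otherwise pd.DataFrame raises a ValueError.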

multiplechoice

Answering Basic Statistical Questions

Select all the correct answers from the options below.

codevalidated

Creating New Columns

Create a new column named Status in the above DataFrame, with the following data.

Name	Age	Sex	Status
Alice	25	F	Student
Bob	30	M	Worker
Charlie	45	M	Worker
Diana	20	F	Student
Emma	28	F	Student
Frank	50	M	Retiree
Grace	32	F	Worker
Henry	37	M	Worker
Isabella	23	F	Student
Jack	42	M	Worker
Karen	29	F	Student
Liam	31	M	Worker
Maria	48	F	Retiree
Nathan	27	M	Student
Olivia	36	F	Worker
Peter	41	M	Worker

Below is the list of values for the Status column:

['Student', 'Worker', 'Worker', 'Student', 'Student', 'Retiree', 'Worker', 'Worker', 'Student', 'Worker', 'Student', 'Worker', 'Retiree', 'Student', 'Worker', 'Worker']
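One way to add the column is plain assignment from a list. The sketch below uses a shortened four-row sample of the table (an assumption made to keep it small); with the full DataFrame the list above would be used unchanged.

```python
import pandas as pd

# Shortened four-row sample of the table above, to keep the sketch small.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [25, 30, 45, 20],
    'Sex': ['F', 'M', 'M', 'F'],
})

# The list must hold exactly one value per row, in row order.
status = ['Student', 'Worker', 'Worker', 'Student']
df['Status'] = status

print(df)
```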

codevalidated

Renaming Columns

In this activity, you will rename columns in a DataFrame. Rename them as follows (old name: new name):

Name: Full Name
Age: Years Old
Sex: Gender
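The mapping above translates directly into a dictionary passed to rename(). This is a sketch on a two-row sample; whether the activity expects reassignment or inplace=True is not stated, so reassignment is an assumption here.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30], 'Sex': ['F', 'M']})

# rename() returns a new DataFrame, so the result is assigned back to df.
df = df.rename(columns={'Name': 'Full Name', 'Age': 'Years Old', 'Sex': 'Gender'})

print(df.columns.tolist())
```

Columns that are not listed in the mapping keep their current names.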

codevalidated

Drop single row

Drop the row with index 4 from the DataFrame.

codevalidated

Drop multiple rows

Drop the rows with index 7 and 9 from the DataFrame.
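Both index-based drops can be sketched with drop(), which works on index labels rather than positions. The example uses only the first ten rows and a single column, which is an assumption made for brevity.

```python
import pandas as pd

# First ten rows of the table, already using the renamed column.
df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma',
                  'Frank', 'Grace', 'Henry', 'Isabella', 'Jack'],
})

df = df.drop(index=4)        # single row: label 4 (Emma)
df = df.drop(index=[7, 9])   # several rows: labels 7 and 9 (Henry, Jack)

print(df.index.tolist())
```

The remaining rows keep their original labels after a drop, so label 7 still refers to Henry even though an earlier row has been removed.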

codevalidated

Drop row with condition

Drop the rows whose Full Name is 'Bob' from the DataFrame.
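A conditional drop can be sketched by first finding the index labels of the matching rows and then passing them to drop(). The three-row sample is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie'],
    'Years Old': [25, 30, 45],
})

# Look up the index labels of the matching rows, then drop those labels.
bob_rows = df[df['Full Name'] == 'Bob'].index
df = df.drop(index=bob_rows)

print(df)
```

Keeping only the non-matching rows with df[df['Full Name'] != 'Bob'] gives the same result here.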

codevalidated

Add new row to dataframe

Add a new row at the end of the DataFrame with the values below:

'Full Name': 'Emma'
'Years Old': 28
'Gender': 'F'
'Status': 'Student'

codevalidated

Add multiple rows

Add the following rows to the DataFrame:

Full Name	Years Old	Gender	Status
Bob	30	M	Worker
Emma	28	F	Student
Henry	37	M	Worker
Jack	42	M	Worker

Make sure to use ignore_index=True when appending the rows; otherwise, the activity will not pass.
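The wording "appending" suggests DataFrame.append, which was removed in pandas 2.0. On current versions, pd.concat with ignore_index=True does the same job. The sketch below assumes a small stand-in for df whose index already has a gap from the earlier drops; which pandas version the grader runs is not stated.

```python
import pandas as pd

# Small stand-in for df after the earlier drops: note the gap in the index.
df = pd.DataFrame(
    {'Full Name': ['Alice', 'Charlie'], 'Years Old': [25, 45],
     'Gender': ['F', 'M'], 'Status': ['Student', 'Worker']},
    index=[0, 2],
)

new_rows = pd.DataFrame([
    {'Full Name': 'Bob', 'Years Old': 30, 'Gender': 'M', 'Status': 'Worker'},
    {'Full Name': 'Emma', 'Years Old': 28, 'Gender': 'F', 'Status': 'Student'},
    {'Full Name': 'Henry', 'Years Old': 37, 'Gender': 'M', 'Status': 'Worker'},
    {'Full Name': 'Jack', 'Years Old': 42, 'Gender': 'M', 'Status': 'Worker'},
])

# ignore_index=True discards the old labels and renumbers the rows 0..n-1.
df = pd.concat([df, new_rows], ignore_index=True)

print(df.index.tolist())   # [0, 1, 2, 3, 4, 5]
```

Without ignore_index=True the new rows would keep their own labels 0 to 3, producing duplicate index values.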

codevalidated

Filter the data

Filter the data to rows where Gender is F, Status is Student, and Years Old is greater than 20. Store the filtered data in a new DataFrame named Filter_Data.
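The three conditions can be combined into one boolean mask with &. This is a sketch on a six-row sample taken from the table above, so the result is not the answer for the full DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Diana', 'Grace', 'Karen', 'Nathan'],
    'Years Old': [25, 30, 20, 32, 29, 27],
    'Gender': ['F', 'M', 'F', 'F', 'F', 'M'],
    'Status': ['Student', 'Worker', 'Student', 'Worker', 'Student', 'Student'],
})

# Each comparison needs its own parentheses because & binds tighter than == and >.
mask = (
    (df['Gender'] == 'F')
    & (df['Status'] == 'Student')
    & (df['Years Old'] > 20)
)
Filter_Data = df[mask]

print(Filter_Data)
```

Diana is excluded in this sample because the condition is strictly greater than 20.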

multiplechoice

Select all correct options

Analyse the previously created DataFrame Filter_Data and check all the correct options.

multiplechoice

Select all correct options

Analyse the previously created DataFrame Filter_Data and check all the correct options.

multiplechoice

Select all correct options

Analyse the previously created DataFrame Filter_Data and check all the correct options.

multiplechoice

Create Basic Plots: Line Chart

Create a line chart of Years Old against Full Name for the df DataFrame, with Full Name on the x-axis and Years Old on the y-axis.
Based on the line chart, check all the correct options.
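A line chart like this can be sketched with DataFrame.plot, assuming matplotlib is installed alongside pandas. The four-row sample is an assumption for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line in a notebook
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Years Old': [25, 30, 45, 20],
})

# x and y name the columns to put on each axis.
ax = df.plot(x='Full Name', y='Years Old', kind='line')
ax.set_ylabel('Years Old')
```

In a notebook the chart renders inline; in a script, call matplotlib.pyplot.show() or ax.figure.savefig(...) to see it.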

multiplechoice

Create Basic Plots: Bar Chart

Create a bar chart of Gender for the df DataFrame and, based on the chart, check all the correct options.

multiplechoice

Create Basic Plots: Pie Chart

Create a pie chart of 'Gender' for the df DataFrame and choose all the correct options. Also add autopct='%1.1f%%' to show percentages on the pie chart.
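A pie chart needs one number per category, so a common approach is to count the values first and plot the counts. The sketch assumes matplotlib is installed and uses a five-row sample; gender_counts is an illustrative name.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line in a notebook
import pandas as pd

df = pd.DataFrame({'Gender': ['F', 'M', 'M', 'F', 'F']})

# value_counts() returns one count per distinct value.
gender_counts = df['Gender'].value_counts()

# autopct is passed through to matplotlib and formats each slice's percentage.
ax = gender_counts.plot(kind='pie', autopct='%1.1f%%')
ax.set_ylabel('')  # drop the default axis label next to the pie
```

The format string '%1.1f%%' prints one decimal place followed by a literal percent sign.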

codevalidated

Reading the Titanic Dataset

Read the Titanic dataset from a CSV file into a pandas DataFrame. Store the result in a DataFrame named df.
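Reading the file is a single read_csv call. So that the sketch runs without the real file, it reads a made-up three-row stand-in from memory. The column names are the ones used in the activities below, while the rows are invented for illustration.

```python
import io
import pandas as pd

# Made-up three-row stand-in for the real CSV so the sketch runs anywhere.
sample_csv = io.StringIO(
    "Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare\n"
    "0,3,Passenger A,male,22,1,0,7.25\n"
    "1,1,Passenger B,female,38,1,0,71.28\n"
    "1,3,Passenger C,female,26,0,0,7.92\n"
)

# With the real file, pass its path instead, e.g. pd.read_csv('titanic.csv').
df = pd.read_csv(sample_csv)

df.info()             # column names, non-null counts and dtypes
print(df.describe())  # count, mean, std, min, quartiles and max per numeric column
```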

multiplechoice

Getting Information About the Dataset

Use the info() method to get information about the data types in the dataset. Check all the correct answers.

multiplechoice

Getting Basic Statistical Information

Use the describe() method to get basic statistical information about the numeric columns in the dataset. Select the minimum and maximum Fare from the output.

codevalidated

Calculating Basic Statistics for a Column

Calculate the mean, median, and standard deviation of the 'Age' column.

Store the mean in a variable named age_mean
Store the median in a variable named age_median
Store the standard deviation in a variable named age_std_deviation
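The three statistics map onto the Series methods mean(), median() and std(). The sketch uses a made-up Age column with one missing value to show that pandas skips NaN by default.

```python
import pandas as pd

# Made-up ages, including one missing value.
df = pd.DataFrame({'Age': [20, 30, 40, None]})

age_mean = df['Age'].mean()            # NaN values are skipped by default
age_median = df['Age'].median()
age_std_deviation = df['Age'].std()    # sample standard deviation (ddof=1)

print(age_mean, age_median, age_std_deviation)
```

Note that std() divides by n - 1 by default, unlike NumPy's np.std, which divides by n.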

codevalidated

Creating a New Column

Create a new column called 'Family Size' that combines the 'Siblings/Spouses Aboard' and 'Parents/Children Aboard' columns.
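One reading of "combines" is a plain sum of the two columns. That is an assumption, since the source does not say whether the passenger is also counted. A sketch on made-up rows:

```python
import pandas as pd

# Made-up rows; only the two relevant columns are included.
df = pd.DataFrame({
    'Siblings/Spouses Aboard': [1, 0, 3],
    'Parents/Children Aboard': [0, 2, 1],
})

# Column arithmetic is element-wise: row i of the result is the sum of row i of each column.
df['Family Size'] = df['Siblings/Spouses Aboard'] + df['Parents/Children Aboard']

print(df)
```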

codevalidated

Renaming a Column

Rename the 'Fare' column to 'Ticket Price'.

codevalidated

Rename multiple columns.

Rename the columns of the DataFrame as follows:

'Pclass' -> 'Passenger Class'
'Name' -> 'Full Name'

codevalidated

Dropping Rows

Drop the rows in the dataset where 'Age' is less than 18.
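A sketch of this drop on made-up ages, including a missing value. Dropping the rows that match Age < 18 leaves rows with a missing Age in place, whereas keeping only df['Age'] >= 18 would also discard them. Which behaviour the grader expects is not stated.

```python
import pandas as pd

# Made-up ages; the third passenger has no recorded age.
df = pd.DataFrame({'Age': [10.0, 30.0, float('nan'), 17.0]})

# Comparisons with NaN are False, so the NaN row is not selected for dropping.
minors = df[df['Age'] < 18].index
df = df.drop(index=minors)

print(df)
```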

codevalidated

Adding a New Row

Create a new row in the dataset. Add the following values in respective columns:

Survived: 0
Passenger Class: 3
Full Name: 'Harry'
Sex: 'male'
Age: 30
Siblings/Spouses Aboard: 0
Parents/Children Aboard: 2
Ticket Price: 50.00
Family Size: 3

Use the data below to create the new row:

df2 = {'Survived': 0,
       'Passenger Class': 3,
       'Full Name': 'Harry',
       'Sex': 'male',
       'Age': 30,
       'Siblings/Spouses Aboard': 0,
       'Parents/Children Aboard': 2,
       'Ticket Price': 50.00,
       'Family Size': 3}
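Because df2 holds scalar values, pd.DataFrame(df2) on its own raises a ValueError; wrapping the dictionary in a list turns it into a one-row DataFrame that pd.concat can attach. The sketch assumes a single made-up existing row standing in for the real dataset.

```python
import pandas as pd

columns = ['Survived', 'Passenger Class', 'Full Name', 'Sex', 'Age',
           'Siblings/Spouses Aboard', 'Parents/Children Aboard',
           'Ticket Price', 'Family Size']

# One made-up row standing in for the real dataset.
df = pd.DataFrame([[1, 1, 'Passenger A', 'female', 40, 0, 0, 80.0, 0]], columns=columns)

df2 = {'Survived': 0, 'Passenger Class': 3, 'Full Name': 'Harry', 'Sex': 'male',
       'Age': 30, 'Siblings/Spouses Aboard': 0, 'Parents/Children Aboard': 2,
       'Ticket Price': 50.00, 'Family Size': 3}

# [df2] is a list with one record, so it becomes a DataFrame with one row.
df = pd.concat([df, pd.DataFrame([df2])], ignore_index=True)

print(df.tail(1))
```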

codevalidated

Filtering the Dataset

Filter the dataset to only include passengers who were in first class and paid more than $100 for their ticket.
Store the filtered dataset in filtered_df variable.

codevalidated

Creating a Bar Chart

Create a bar chart showing the number of passengers in each class (Pclass).

Note:

Read the titanic.csv file again and store it in the new_df variable.
Store the counts of the different classes in the counts variable.
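The two notes above can be sketched as follows, assuming matplotlib is installed. A made-up six-row Pclass column is read from memory in place of titanic.csv, and sorting the counts by class is an illustrative choice that the grader may not expect.

```python
import io
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line in a notebook
import pandas as pd

# Made-up stand-in for the file; with the real data use pd.read_csv('titanic.csv').
sample_csv = io.StringIO("Pclass\n3\n1\n3\n2\n3\n1\n")
new_df = pd.read_csv(sample_csv)

# value_counts() orders by frequency; sort_index() puts the classes in 1, 2, 3 order.
counts = new_df['Pclass'].value_counts().sort_index()

ax = counts.plot(kind='bar')
ax.set_xlabel('Pclass')
ax.set_ylabel('Number of passengers')
```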

codevalidated

Creating a Pie Chart

Create a pie chart showing the percentage of male and female passengers in the dataset.

Note:

Read the titanic.csv file again and store it in the new_df variable.
Store the counts of male and female passengers in the gender_counts variable.

Anurag Verma

