All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
In this activity, you will create a custom DataFrame from a dictionary containing the data below:
Name | Age | Sex |
---|---|---|
Alice | 25 | F |
Bob | 30 | M |
Charlie | 45 | M |
Diana | 20 | F |
Emma | 28 | F |
Frank | 50 | M |
Grace | 32 | F |
Henry | 37 | M |
Isabella | 23 | F |
Jack | 42 | M |
Karen | 29 | F |
Liam | 31 | M |
Maria | 48 | F |
Nathan | 27 | M |
Olivia | 36 | F |
Peter | 41 | M |
Don't worry, you don't need to type out all the data. Just copy the dictionary below and create a new DataFrame named `df` from it.
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace', 'Henry', 'Isabella', 'Jack', 'Karen', 'Liam', 'Maria', 'Nathan', 'Olivia', 'Peter'],
'Age': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42, 29, 31, 48, 27, 36, 41],
'Sex': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M']
}
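A minimal sketch of this step, assuming pandas is installed and imported as `pd`. Each dictionary key becomes a column, and each list supplies that column's values.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace', 'Henry',
             'Isabella', 'Jack', 'Karen', 'Liam', 'Maria', 'Nathan', 'Olivia', 'Peter'],
    'Age': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42, 29, 31, 48, 27, 36, 41],
    'Sex': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M'],
}

# Keys become column names; all lists must have the same length.
df = pd.DataFrame(data)

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first five rows
```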
Select all the correct answers from the options below.
Create a new column named `Status` in the DataFrame above, with the following data.
Name | Age | Sex | Status |
---|---|---|---|
Alice | 25 | F | Student |
Bob | 30 | M | Worker |
Charlie | 45 | M | Worker |
Diana | 20 | F | Student |
Emma | 28 | F | Student |
Frank | 50 | M | Retiree |
Grace | 32 | F | Worker |
Henry | 37 | M | Worker |
Isabella | 23 | F | Student |
Jack | 42 | M | Worker |
Karen | 29 | F | Student |
Liam | 31 | M | Worker |
Maria | 48 | F | Retiree |
Nathan | 27 | M | Student |
Olivia | 36 | F | Worker |
Peter | 41 | M | Worker |
Below is the list of values for the `Status` column:
['Student', 'Worker', 'Worker', 'Student', 'Student', 'Retiree', 'Worker', 'Worker', 'Student', 'Worker', 'Student', 'Worker', 'Retiree', 'Student', 'Worker', 'Worker']
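One way to add the column is to assign a list to a new column name. The sketch below uses only the first four rows of the data to stay short; with the full DataFrame you would assign the complete 16-value list in the same way.

```python
import pandas as pd

# First four rows only, to keep the example compact.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [25, 30, 45, 20],
    'Sex': ['F', 'M', 'M', 'F'],
})

status = ['Student', 'Worker', 'Worker', 'Student']

# Assigning a list creates the column; its length must equal the number of rows.
df['Status'] = status

print(df)
```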
In this activity, you will rename columns in a DataFrame. Rename as below:
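The target names are not listed in this section. The later activities refer to `Full Name`, `Years Old`, and `Gender`, so the mapping in this sketch is inferred from those and should be treated as an assumption.

```python
import pandas as pd

# Two rows are enough to show the rename.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'Sex': ['F', 'M'],
    'Status': ['Student', 'Worker'],
})

# Assumed mapping (inferred from the column names used in later activities).
new_names = {'Name': 'Full Name', 'Age': 'Years Old', 'Sex': 'Gender'}

# rename returns a new DataFrame; columns not in the mapping are left unchanged.
df = df.rename(columns=new_names)

print(df.columns.tolist())
```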
Drop the row with index 4 from the DataFrame.
Drop the rows with index 7 and 9 from the DataFrame.
Drop the rows from the DataFrame that have 'Bob' as `Full Name`.
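The three drops can be sketched as follows, using the first ten rows of the data and only two columns for brevity. `drop` removes rows by index label, while the value-based drop is written here as a boolean filter that keeps every row whose name is not 'Bob'.

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma',
                  'Frank', 'Grace', 'Henry', 'Isabella', 'Jack'],
    'Years Old': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42],
})

# Drop a single row by index label (Emma).
df = df.drop(index=4)

# Drop several rows at once by passing a list of labels (Henry and Jack).
df = df.drop(index=[7, 9])

# Drop by value: keep only the rows where 'Full Name' is not 'Bob'.
df = df[df['Full Name'] != 'Bob']

print(df)
```

Note that the remaining rows keep their original index labels, so the index now has gaps.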
Add a new row at the end of the DataFrame with the values below:
'Full Name': 'Emma',
'Years Old': 28,
'Gender': 'F',
'Status': 'Student'
Add new rows to the DataFrame using the data below:
Full Name | Years Old | Gender | Status |
---|---|---|---|
Bob | 30 | M | Worker |
Emma | 28 | F | Student |
Henry | 37 | M | Worker |
Jack | 42 | M | Worker |
Make sure to use `ignore_index=True` while appending the rows; otherwise you will not pass the activity.
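A sketch of both append steps using `pd.concat`. This is one possible approach, not necessarily the one the activity's checker expects: `DataFrame.append` was removed in pandas 2.0, so `concat` is used here instead. The starting DataFrame is a two-row stand-in with a gap in its index, as earlier drops would leave.

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Charlie'],
    'Years Old': [25, 45],
    'Gender': ['F', 'M'],
    'Status': ['Student', 'Worker'],
}, index=[0, 2])

# Single row: wrap the dictionary in a list to build a one-row DataFrame.
emma = pd.DataFrame([{'Full Name': 'Emma', 'Years Old': 28,
                      'Gender': 'F', 'Status': 'Student'}])
df = pd.concat([df, emma], ignore_index=True)

# Several rows at once.
new_rows = pd.DataFrame({
    'Full Name': ['Bob', 'Emma', 'Henry', 'Jack'],
    'Years Old': [30, 28, 37, 42],
    'Gender': ['M', 'F', 'M', 'M'],
    'Status': ['Worker', 'Student', 'Worker', 'Worker'],
})

# ignore_index=True discards the old labels and renumbers rows from 0.
df = pd.concat([df, new_rows], ignore_index=True)

print(df)
```

Without `ignore_index=True`, the appended rows would keep their own labels (0, 1, 2, ...), which can produce duplicate index values.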
Filter the data to rows where `Gender` is `F`, `Status` is `Student`, and `Years Old` is greater than 20. Store the filtered data in a new DataFrame named `Filter_Data`.
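The three conditions can be combined into one boolean mask. The sketch below uses five rows taken from the table above. Each condition needs its own parentheses because `&` binds more tightly than the comparison operators.

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Diana', 'Grace', 'Nathan', 'Isabella'],
    'Years Old': [25, 20, 32, 27, 23],
    'Gender': ['F', 'F', 'F', 'M', 'F'],
    'Status': ['Student', 'Student', 'Worker', 'Student', 'Student'],
})

# Element-wise AND of the three conditions.
mask = (
    (df['Gender'] == 'F')
    & (df['Status'] == 'Student')
    & (df['Years Old'] > 20)
)

Filter_Data = df[mask]

print(Filter_Data)
```

Diana is excluded because the comparison is strictly greater than 20.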
Using the previously created DataFrame `Filter_Data`, analyse it and check all correct options.
Create a line chart of `Full Name` against `Years Old` for the `df` DataFrame, with `Full Name` on the x-axis and `Years Old` on the y-axis.
Based on the line chart, check all correct options.
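A minimal sketch of the line chart, assuming matplotlib is available and using four rows of the data. The activity may expect `df.plot(...)` instead; plain matplotlib is used here as one reasonable option.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Years Old': [25, 30, 45, 20],
})

fig, ax = plt.subplots()

# Names on the x-axis, ages on the y-axis.
ax.plot(df['Full Name'], df['Years Old'])
ax.set_xlabel('Full Name')
ax.set_ylabel('Years Old')

# Outside a notebook, call plt.show() to display the figure.
```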
Create a bar chart of `Gender` for the `df` DataFrame and, based on the bar chart, check all correct options.
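A bar chart of a categorical column usually starts from its value counts. The sketch below rebuilds only the `Gender` column from the 16 rows above and assumes matplotlib is available.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Gender': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M',
               'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M'],
})

# Number of rows per gender.
counts = df['Gender'].value_counts()

fig, ax = plt.subplots()
ax.bar(counts.index, counts.values)  # one bar per category
ax.set_xlabel('Gender')
ax.set_ylabel('Count')

# Outside a notebook, call plt.show() to display the figure.
```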
Create a pie chart of 'Gender' for the `df` DataFrame and choose all correct options. Also pass `autopct='%1.1f%%'` to show percentages on the pie chart.
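A sketch of the pie chart under the same assumptions (matplotlib available, only the `Gender` column rebuilt). The `autopct` format string `'%1.1f%%'` prints each share with one decimal place followed by a literal percent sign.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Gender': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M',
               'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M'],
})

counts = df['Gender'].value_counts()

fig, ax = plt.subplots()

# With autopct set, pie returns the wedges, the label texts, and the percentage texts.
wedges, texts, autotexts = ax.pie(counts.values, labels=counts.index,
                                  autopct='%1.1f%%')

# Outside a notebook, call plt.show() to display the figure.
```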
Read the Titanic dataset from a CSV file into a pandas DataFrame. Store the result in a DataFrame named `df`.
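A sketch of the loading step. So that it runs without the real file, it reads three made-up rows from an in-memory string. The column names are taken from the activities in this section, and the file name `titanic.csv` is the one mentioned in the later notes.

```python
import io

import pandas as pd

# Stand-in for the real file: three invented rows with the columns used in these activities.
csv_text = """Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
0,3,Passenger A,male,22,1,0,7.25
1,1,Passenger B,female,38,1,0,71.28
1,3,Passenger C,female,26,0,0,7.92
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# With the real dataset on disk, this would be:
# df = pd.read_csv('titanic.csv')

print(df.head())
```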
Use the `info()` method to get information about the data types in the dataset. Check all the correct answers.
Use the `describe()` method to get basic statistical information about the numeric columns in the dataset. Select the minimum and maximum `Fare` from that output.
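Both methods can be tried on a small stand-in DataFrame; the rows below are invented for illustration. `info()` prints its summary directly, while `describe()` returns a DataFrame whose rows (`count`, `mean`, `std`, `min`, ...) can be indexed by label.

```python
import pandas as pd

# Invented sample rows; the real dataset has many more.
df = pd.DataFrame({
    'Name': ['Passenger A', 'Passenger B', 'Passenger C'],
    'Age': [22.0, 38.0, 26.0],
    'Fare': [7.25, 71.28, 7.92],
})

# Prints column names, non-null counts, and dtypes.
df.info()

# Summary statistics for the numeric columns only.
stats = df.describe()

min_fare = stats.loc['min', 'Fare']
max_fare = stats.loc['max', 'Fare']
print(min_fare, max_fare)
```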
Calculate the mean, median, and standard deviation of the 'Age' column, storing them in `age_mean`, `age_median`, and `age_std_deviation` respectively.
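A sketch with four invented ages. Note that pandas' `std()` computes the sample standard deviation (`ddof=1`) by default, and all three methods skip missing values.

```python
import pandas as pd

# Invented ages for illustration.
df = pd.DataFrame({'Age': [22, 38, 26, 35]})

age_mean = df['Age'].mean()
age_median = df['Age'].median()
age_std_deviation = df['Age'].std()  # sample standard deviation (ddof=1)

print(age_mean, age_median, age_std_deviation)
```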
Create a new column called 'Family Size' that combines the 'Siblings/Spouses Aboard' and 'Parents/Children Aboard' columns.
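The text does not say exactly how the two columns are combined. The sketch below assumes a plain sum. The sample row used in a later activity lists 0 and 2 with a 'Family Size' of 3, which may mean the intended formula adds 1 for the passenger; that is not confirmed here.

```python
import pandas as pd

# Invented counts for illustration.
df = pd.DataFrame({
    'Siblings/Spouses Aboard': [1, 0, 3],
    'Parents/Children Aboard': [0, 2, 1],
})

# Assumption: 'combines' means element-wise addition of the two columns.
df['Family Size'] = df['Siblings/Spouses Aboard'] + df['Parents/Children Aboard']

print(df)
```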
Rename the 'Fare' column to 'Ticket Price'.
Rename the columns of the DataFrame as below:
'Pclass' -> 'Passenger Class'
'Name' -> 'Full Name'
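All three renames from these two activities can be expressed with `rename(columns=...)`. The one-row DataFrame below is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Pclass': [3],
    'Name': ['Passenger A'],
    'Fare': [7.25],
})

# A single-column rename.
df = df.rename(columns={'Fare': 'Ticket Price'})

# Several columns in one call.
df = df.rename(columns={'Pclass': 'Passenger Class', 'Name': 'Full Name'})

print(df.columns.tolist())
```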
Drop the rows in the dataset where the 'Age' is less than 18 years old.
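One way to do this is to select the index labels of the rows under 18 and pass them to `drop`. The ages below are invented.

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 4, 38, 17, 18]})

# Index labels of the rows where Age is below 18.
under_18 = df[df['Age'] < 18].index

# Remove exactly those rows; a passenger aged 18 stays.
df = df.drop(under_18)

print(df)
```

The filter `df[df['Age'] >= 18]` is a common alternative, but it also discards rows whose 'Age' is missing, because comparisons with NaN evaluate to False.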
Create a new row in the dataset using the data below, adding each value to its respective column:
df2 = {'Survived': 0,
'Passenger Class': 3,
'Full Name': 'Harry',
'Sex': 'male',
'Age': 30,
'Siblings/Spouses Aboard': 0,
'Parents/Children Aboard': 2,
'Ticket Price': 50.00,
'Family Size': 3
}
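A sketch of adding this row with `pd.concat`. The starting DataFrame is a single invented row with the same columns, and `ignore_index=True` is an assumption carried over from the earlier append activity rather than something stated for this one.

```python
import pandas as pd

# One invented existing row, using the renamed columns.
df = pd.DataFrame([{
    'Survived': 1, 'Passenger Class': 1, 'Full Name': 'Passenger B',
    'Sex': 'female', 'Age': 38, 'Siblings/Spouses Aboard': 1,
    'Parents/Children Aboard': 0, 'Ticket Price': 71.28, 'Family Size': 1,
}])

df2 = {'Survived': 0, 'Passenger Class': 3, 'Full Name': 'Harry',
       'Sex': 'male', 'Age': 30, 'Siblings/Spouses Aboard': 0,
       'Parents/Children Aboard': 2, 'Ticket Price': 50.00, 'Family Size': 3}

# Wrap the dictionary in a list to get a one-row DataFrame, then concatenate.
df = pd.concat([df, pd.DataFrame([df2])], ignore_index=True)

print(df.tail(1))
```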
Filter the dataset to only include passengers who were in first class and paid more than $100 for their ticket. Store the filtered dataset in the `filtered_df` variable.
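A sketch of the filter, assuming the columns have already been renamed to 'Passenger Class' and 'Ticket Price' by the earlier activities. The rows are invented.

```python
import pandas as pd

df = pd.DataFrame({
    'Passenger Class': [1, 1, 3, 2],
    'Ticket Price': [150.0, 80.0, 120.0, 200.0],
})

# Both conditions must hold: first class AND a price strictly above 100.
filtered_df = df[(df['Passenger Class'] == 1) & (df['Ticket Price'] > 100)]

print(filtered_df)
```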
Create a bar chart showing the number of passengers in each class (`Pclass`).
Note:
Read the `titanic.csv` file again and store it in the `new_df` variable.
Store the per-class passenger counts in the `counts` variable.
Create a pie chart showing the percentage of male and female passengers in the dataset.
Note:
Read the `titanic.csv` file again and store it in the `new_df` variable.
Store the per-gender passenger counts in the `gender_counts` variable.
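Both charts can be sketched from value counts. The example below uses four invented rows in place of a fresh read of `titanic.csv`, assumes matplotlib is available, and sorts the class counts by class number, which is a presentation choice rather than a requirement from the activity.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for: new_df = pd.read_csv('titanic.csv')
new_df = pd.DataFrame({
    'Pclass': [3, 1, 3, 2],
    'Sex': ['male', 'female', 'female', 'male'],
})

# Passengers per class, ordered 1, 2, 3.
counts = new_df['Pclass'].value_counts().sort_index()

# Passengers per gender.
gender_counts = new_df['Sex'].value_counts()

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: class labels are converted to strings so they appear as categories.
ax_bar.bar(counts.index.astype(str), counts.values)
ax_bar.set_xlabel('Pclass')
ax_bar.set_ylabel('Passengers')

# Pie chart with percentage labels.
wedges, texts, autotexts = ax_pie.pie(gender_counts.values,
                                      labels=gender_counts.index,
                                      autopct='%1.1f%%')

# Outside a notebook, call plt.show() to display the figure.
```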