All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
In this activity, you will create a custom DataFrame from a dictionary containing the data below:
| Name | Age | Sex |
|---|---|---|
| Alice | 25 | F |
| Bob | 30 | M |
| Charlie | 45 | M |
| Diana | 20 | F |
| Emma | 28 | F |
| Frank | 50 | M |
| Grace | 32 | F |
| Henry | 37 | M |
| Isabella | 23 | F |
| Jack | 42 | M |
| Karen | 29 | F |
| Liam | 31 | M |
| Maria | 48 | F |
| Nathan | 27 | M |
| Olivia | 36 | F |
| Peter | 41 | M |
Don't worry, you don't need to type all the data. Just copy the dictionary below and create a new DataFrame named df from it.
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace', 'Henry', 'Isabella', 'Jack', 'Karen', 'Liam', 'Maria', 'Nathan', 'Olivia', 'Peter'],
'Age': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42, 29, 31, 48, 27, 36, 41],
'Sex': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M']
}
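As a minimal sketch of this step, assuming pandas is installed and imported as `pd`, the dictionary can be passed straight to `pd.DataFrame`:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace', 'Henry',
             'Isabella', 'Jack', 'Karen', 'Liam', 'Maria', 'Nathan', 'Olivia', 'Peter'],
    'Age': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42, 29, 31, 48, 27, 36, 41],
    'Sex': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M'],
}

# Each key becomes a column name; each list supplies that column's values.
df = pd.DataFrame(data)

print(df.shape)  # (16, 3)
print(df.head())
```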
Select all the correct answers from the options below.
Create a new column named Status in the above DataFrame, with the following data.
| Name | Age | Sex | Status |
|---|---|---|---|
| Alice | 25 | F | Student |
| Bob | 30 | M | Worker |
| Charlie | 45 | M | Worker |
| Diana | 20 | F | Student |
| Emma | 28 | F | Student |
| Frank | 50 | M | Retiree |
| Grace | 32 | F | Worker |
| Henry | 37 | M | Worker |
| Isabella | 23 | F | Student |
| Jack | 42 | M | Worker |
| Karen | 29 | F | Student |
| Liam | 31 | M | Worker |
| Maria | 48 | F | Retiree |
| Nathan | 27 | M | Student |
| Olivia | 36 | F | Worker |
| Peter | 41 | M | Worker |
Below is the list of values for the Status column:
['Student', 'Worker', 'Worker', 'Student', 'Student', 'Retiree', 'Worker', 'Worker', 'Student', 'Worker', 'Student', 'Worker', 'Retiree', 'Student', 'Worker', 'Worker']
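One way to add the column, sketched here under the assumption that `df` was built from the dictionary shown earlier, is to assign the list directly to a new column label:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace', 'Henry',
             'Isabella', 'Jack', 'Karen', 'Liam', 'Maria', 'Nathan', 'Olivia', 'Peter'],
    'Age': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42, 29, 31, 48, 27, 36, 41],
    'Sex': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M'],
})

status = ['Student', 'Worker', 'Worker', 'Student', 'Student', 'Retiree', 'Worker', 'Worker',
          'Student', 'Worker', 'Student', 'Worker', 'Retiree', 'Student', 'Worker', 'Worker']

# The list needs exactly one value per row, in row order.
df['Status'] = status

print(df['Status'].value_counts().to_dict())  # {'Worker': 8, 'Student': 6, 'Retiree': 2}
```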
In this activity, you will rename columns in a DataFrame. Rename as below:
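The rename mapping itself is not shown here. The sketch below assumes it matches the column headers used in the later activities (Full Name, Years Old, Gender), so treat the mapping as an inference rather than the activity's stated answer:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 45],
    'Sex': ['F', 'M', 'M'],
})

# Assumed mapping, inferred from the headers that appear in later activities.
new_names = {'Name': 'Full Name', 'Age': 'Years Old', 'Sex': 'Gender'}

# rename returns a new DataFrame, so assign the result back to df.
df = df.rename(columns=new_names)

print(df.columns.tolist())  # ['Full Name', 'Years Old', 'Gender']
```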
Drop the row with index 4 from the DataFrame.
Drop the rows with index 7 and 9 from the DataFrame.
Drop the rows where Full Name is 'Bob' from the DataFrame.
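The three drop activities could be sketched as follows, assuming the columns have already been renamed. `drop` removes rows by index label, while a boolean mask removes rows by value:

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace', 'Henry',
                  'Isabella', 'Jack', 'Karen', 'Liam', 'Maria', 'Nathan', 'Olivia', 'Peter'],
    'Years Old': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42, 29, 31, 48, 27, 36, 41],
    'Gender': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M'],
})

df = df.drop(index=4)               # one row, by index label
df = df.drop(index=[7, 9])          # several rows, by index label
df = df[df['Full Name'] != 'Bob']   # keep every row whose name is not 'Bob'

print(len(df))            # 12
print(df.index.tolist())  # the original labels are kept, so gaps remain
```

Note that the index is not renumbered after a drop, which is why the later append step asks for `ignore_index=True`.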
Add a new row at the end of the DataFrame with the values below:
'Full Name': 'Emma',
'Years Old': 28,
'Gender': 'F',
'Status': 'Student'
Add new rows to the DataFrame using the data below:
| Full Name | Years Old | Gender | Status |
|---|---|---|---|
| Bob | 30 | M | Worker |
| Emma | 28 | F | Student |
| Henry | 37 | M | Worker |
| Jack | 42 | M | Worker |
Make sure to use ignore_index=True when appending the rows; otherwise you will not pass the activity.
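The activity may have been written for `DataFrame.append`, which was removed in pandas 2.0. The sketch below uses `pd.concat` instead, which also accepts `ignore_index=True`. The three starting rows are just a small excerpt of the table, with a gapped index to mimic the state after the earlier drops:

```python
import pandas as pd

# A few rows as they might look after the drops (note the missing label 1).
df = pd.DataFrame(
    {'Full Name': ['Alice', 'Charlie', 'Diana'],
     'Years Old': [25, 45, 20],
     'Gender': ['F', 'M', 'F'],
     'Status': ['Student', 'Worker', 'Student']},
    index=[0, 2, 3],
)

# Single row: wrap the dictionary in a list to get a one-row DataFrame.
new_row = {'Full Name': 'Emma', 'Years Old': 28, 'Gender': 'F', 'Status': 'Student'}
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

# Several rows at once.
more_rows = pd.DataFrame({
    'Full Name': ['Bob', 'Emma', 'Henry', 'Jack'],
    'Years Old': [30, 28, 37, 42],
    'Gender': ['M', 'F', 'M', 'M'],
    'Status': ['Worker', 'Student', 'Worker', 'Worker'],
})
df = pd.concat([df, more_rows], ignore_index=True)

print(df.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With `ignore_index=True` the result is renumbered from 0, so the appended rows do not reuse or collide with existing index labels.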
Filter the data to rows where Gender is F, Status is Student, and Years Old is greater than 20. Store the filtered data in a new DataFrame named Filter_Data.
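A possible way to express the three conditions is a combined boolean mask. The five rows below are an excerpt of the table, used only to keep the sketch short:

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Diana', 'Emma', 'Nathan'],
    'Years Old': [25, 30, 20, 28, 27],
    'Gender': ['F', 'M', 'F', 'F', 'M'],
    'Status': ['Student', 'Worker', 'Student', 'Student', 'Student'],
})

# Each comparison needs its own parentheses because & binds tighter than == and >.
mask = (df['Gender'] == 'F') & (df['Status'] == 'Student') & (df['Years Old'] > 20)
Filter_Data = df[mask]

print(Filter_Data['Full Name'].tolist())  # ['Alice', 'Emma']
```

Diana is excluded because the condition is strictly greater than 20.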
Using the previously created DataFrame Filter_Data, analyse it and check all correct options.
Create a line chart of Years Old against Full Name for the df DataFrame, with Full Name on the x-axis and Years Old on the y-axis.
Based on the line chart, check all correct options.
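A line chart like this could be drawn with the pandas plotting wrapper around matplotlib. This is only a sketch: it assumes matplotlib is installed, uses four rows from the table, and selects the `Agg` backend so it also runs without a display (omit that line in a notebook):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Charlie', 'Diana', 'Frank'],
    'Years Old': [25, 45, 20, 50],
})

fig, ax = plt.subplots()
df.plot(x='Full Name', y='Years Old', kind='line', ax=ax)
ax.set_xlabel('Full Name')
ax.set_ylabel('Years Old')
```

In an interactive session, `plt.show()` displays the figure.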
Create a bar chart of Gender for the df DataFrame and, based on the bar chart, check all correct options.
Create a pie chart of 'Gender' for the df DataFrame and choose all correct options. Also add autopct='%1.1f%%' to show percentages on the pie chart.
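Both charts can be built from the same `value_counts()` result. The sketch below uses the Gender values of the original 16 rows, so the counts on your own df will differ after the earlier drops and appends; the `Agg` backend line is only there so the code runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Gender': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M'],
})

# Count how many rows fall into each category.
gender_counts = df['Gender'].value_counts()

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))
gender_counts.plot(kind='bar', ax=ax_bar)

# autopct formats each wedge's share as a percentage with one decimal place.
gender_counts.plot(kind='pie', autopct='%1.1f%%', ax=ax_pie)
```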
Read the Titanic dataset from a CSV file into a pandas DataFrame. Store the result in a DataFrame named df.
Use the info() method to get information about the data types in the dataset. Check all the correct answers.
Use the describe() method to get basic statistical information about the numeric columns in the dataset. Select the minimum and maximum Fare from the output.
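The three steps above might look like the sketch below. To keep it runnable without the real file, it reads two made-up rows from an in-memory string; the column names are those the activities refer to, but the values are placeholders, not real Titanic records. In the activity you would pass the file path (later activities call it titanic.csv) to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Stand-in for the real file: made-up rows, real-looking column names.
sample_csv = io.StringIO(
    "Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare\n"
    "0,3,Passenger A,male,22,1,0,7.25\n"
    "1,1,Passenger B,female,38,1,0,71.28\n"
)

df = pd.read_csv(sample_csv)

df.info()             # column names, non-null counts, and dtypes
print(df.describe())  # count, mean, std, min, quartiles, max for numeric columns

# The same extremes that describe() reports in its min and max rows.
print(df['Fare'].min(), df['Fare'].max())
```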
Calculate the mean, median, and standard deviation of the 'Age' column.
Store the results in the variables age_mean, age_median, and age_std_deviation.
Create a new column called 'Family Size' that combines the 'Siblings/Spouses Aboard' and 'Parents/Children Aboard' columns.
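A sketch of both steps on four made-up rows. It assumes "combines" means the sum of the two columns; the sample row in a later activity lists a Family Size of 3 for values 0 and 2, so the activity may also count the passenger, which is not stated here.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    'Age': [22.0, 38.0, 26.0, 35.0],
    'Siblings/Spouses Aboard': [1, 1, 0, 1],
    'Parents/Children Aboard': [0, 0, 0, 2],
})

age_mean = df['Age'].mean()
age_median = df['Age'].median()
age_std_deviation = df['Age'].std()  # sample standard deviation (ddof=1) by default

# Assumption: Family Size is the plain sum of the two columns.
df['Family Size'] = df['Siblings/Spouses Aboard'] + df['Parents/Children Aboard']

print(age_mean, age_median, age_std_deviation)  # 30.25 30.5 7.5
print(df['Family Size'].tolist())               # [1, 1, 0, 3]
```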
Rename the 'Fare' column to 'Ticket Price'.
Rename the columns of a DataFrame as below:
'Pclass' -> 'Passenger Class'
'Name' -> 'Full Name'
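All three renames can be done with `DataFrame.rename`, either one at a time or in a single mapping. A minimal sketch on a one-row placeholder frame:

```python
import pandas as pd

# Placeholder row; only the column names matter here.
df = pd.DataFrame({
    'Pclass': [3],
    'Name': ['Passenger A'],
    'Fare': [7.25],
})

# Columns missing from the mapping are left unchanged.
df = df.rename(columns={
    'Fare': 'Ticket Price',
    'Pclass': 'Passenger Class',
    'Name': 'Full Name',
})

print(df.columns.tolist())  # ['Passenger Class', 'Full Name', 'Ticket Price']
```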
Drop the rows in the dataset where the 'Age' is less than 18 years old.
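One possible approach is to look up the index labels of the matching rows and pass them to `drop`. The rows below are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Passenger A', 'Passenger B', 'Passenger C', 'Passenger D', 'Passenger E'],
    'Age': [22.0, 4.0, 38.0, 17.0, 18.0],
})

# Index labels of every row with Age below 18.
under_18 = df[df['Age'] < 18].index
df = df.drop(index=under_18)

print(df['Age'].tolist())  # [22.0, 38.0, 18.0]
```

With this comparison, rows whose Age is missing are kept, because `NaN < 18` evaluates to False.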
Create a new row in the dataset using the data below, adding each value to its respective column:
df2 = {'Survived': 0,
'Passenger Class': 3,
'Full Name': 'Harry',
'Sex': 'male',
'Age': 30,
'Siblings/Spouses Aboard': 0,
'Parents/Children Aboard': 2,
'Ticket Price': 50.00,
'Family Size': 3
}
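The dictionary can be turned into a one-row DataFrame and concatenated onto the dataset. This sketch uses `pd.concat` (an assumption about the intended method, since `DataFrame.append` no longer exists in pandas 2.x) and a single made-up starting row:

```python
import pandas as pd

# Made-up existing row with the renamed columns.
df = pd.DataFrame([{
    'Survived': 1, 'Passenger Class': 1, 'Full Name': 'Passenger A', 'Sex': 'female',
    'Age': 38.0, 'Siblings/Spouses Aboard': 1, 'Parents/Children Aboard': 0,
    'Ticket Price': 71.28, 'Family Size': 1,
}])

df2 = {'Survived': 0,
       'Passenger Class': 3,
       'Full Name': 'Harry',
       'Sex': 'male',
       'Age': 30,
       'Siblings/Spouses Aboard': 0,
       'Parents/Children Aboard': 2,
       'Ticket Price': 50.00,
       'Family Size': 3}

# Wrap the dict in a list so it becomes one row, then renumber the index.
df = pd.concat([df, pd.DataFrame([df2])], ignore_index=True)

print(df.tail(1))
```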
Filter the dataset to include only passengers who were in first class and paid more than $100 for their ticket.
Store the filtered dataset in the filtered_df variable.
Create a bar chart showing the number of passengers in each class (Pclass).
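A sketch of the filter, assuming the columns carry the renamed labels 'Passenger Class' and 'Ticket Price' at this point. The rows are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Passenger A', 'Passenger B', 'Passenger C', 'Passenger D'],
    'Passenger Class': [1, 1, 3, 2],
    'Ticket Price': [151.55, 71.28, 7.25, 120.00],
})

# Both conditions must hold: first class AND a price above 100.
filtered_df = df[(df['Passenger Class'] == 1) & (df['Ticket Price'] > 100)]

print(filtered_df['Full Name'].tolist())  # ['Passenger A']
```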
Note:
Read the titanic.csv file again and store it in the new_df variable.
Store the class counts in the counts variable.
Create a pie chart showing the percentage of male and female passengers in the dataset.
Note:
Read the titanic.csv file again and store it in the new_df variable.
Store the gender counts in the gender_counts variable.
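The two chart activities could be sketched as below. `new_df` is built from a few made-up rows instead of titanic.csv so the example runs on its own; `autopct` is an assumption carried over from the earlier pie chart activity, and the `Agg` backend line is only needed when no display is available:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for pd.read_csv('titanic.csv').
new_df = pd.DataFrame({
    'Pclass': [3, 1, 3, 2, 3],
    'Sex': ['male', 'female', 'female', 'male', 'male'],
})

# Passengers per class, ordered by class number rather than by frequency.
counts = new_df['Pclass'].value_counts().sort_index()
gender_counts = new_df['Sex'].value_counts()

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))
counts.plot(kind='bar', ax=ax_bar)
gender_counts.plot(kind='pie', autopct='%1.1f%%', ax=ax_pie)
```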