All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Read the covid.csv file into a dataframe named df, using the first column as the index column.
Choose the correct shape for the df dataframe.
Choose the correct data type. There can be multiple correct answers.
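As a rough sketch of the loading and inspection steps above, the snippet below reads a tiny made-up CSV (a hypothetical stand-in for covid.csv, which has many more rows and columns) and then looks at the shape and data types. The sample values are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for covid.csv; values are made up.
csv_text = """,continent,location,total_cases
0,Asia,Afghanistan,5.0
1,Asia,Afghanistan,7.0
2,Africa,Zimbabwe,3.0
"""

# index_col=0 tells pandas to use the first column as the row index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df.shape)   # tuple of (number of rows, number of columns)
print(df.dtypes)  # data type of each column
```

With the real file, you would pass the path "covid.csv" to pd.read_csv instead of the in-memory text.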
Select the minimum and maximum values of the total_cases column in the COVID-19 dataset stored in the df dataframe.
Select the total number of cases in the COVID-19 dataset using the total_cases column in the df dataframe.
Calculate the mean number of new cases per day in the COVID-19 dataset and select the correct answer. The answer is rounded to two decimal places.
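The aggregation activities above can be approached with the Series methods min, max, sum, and mean. This is a minimal sketch on made-up numbers, not values from the real dataset; the variable names min_cases, max_cases, sum_cases, and mean_new are hypothetical.

```python
import pandas as pd

# Made-up values for illustration; the real dataset is much larger.
df = pd.DataFrame({
    "total_cases": [10.0, 40.0, 25.0],
    "new_cases": [1.0, 4.0, 2.0],
})

min_cases = df["total_cases"].min()   # smallest value in the column
max_cases = df["total_cases"].max()   # largest value in the column
sum_cases = df["total_cases"].sum()   # sum of all values in the column

# Mean of new_cases, rounded to two decimal places.
mean_new = round(df["new_cases"].mean(), 2)

print(min_cases, max_cases, sum_cases, mean_new)
```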
Create a new dataframe named df1 which contains only the continent and location columns from the df dataframe.
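One way to pick a subset of columns is to index with a list of column names. The sketch below assumes a small made-up dataframe in place of the real one.

```python
import pandas as pd

# Made-up sample standing in for the COVID-19 dataframe.
df = pd.DataFrame({
    "continent": ["Asia", "Africa"],
    "location": ["India", "Zimbabwe"],
    "total_cases": [100.0, 50.0],
})

# A list of names inside the brackets returns a DataFrame with those columns.
# .copy() makes df1 independent of df, so later edits to df1 do not touch df.
df1 = df[["continent", "location"]].copy()

print(df1)
```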
Drop the iso_code, new_cases_smoothed, new_deaths_smoothed, total_cases_per_million, new_cases_per_million, new_cases_smoothed_per_million, total_deaths_per_million, new_deaths_per_million, and new_deaths_smoothed_per_million columns from the df dataframe.
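Dropping several columns at once can be done by passing a list to drop(columns=...). The sketch below uses a made-up dataframe and a shortened list (cols_to_drop is a hypothetical name); the full list from the activity works the same way.

```python
import pandas as pd

# Made-up sample with only a few of the real columns.
df = pd.DataFrame({
    "iso_code": ["IND", "ZWE"],
    "location": ["India", "Zimbabwe"],
    "new_cases_smoothed": [1.5, 0.5],
    "total_cases": [100.0, 50.0],
})

# Shortened list for illustration; extend it with the remaining column names.
cols_to_drop = ["iso_code", "new_cases_smoothed"]

# drop returns a new DataFrame, so assign the result back to df.
df = df.drop(columns=cols_to_drop)

print(df.columns.tolist())
```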
Add a new row to the df dataframe with the following values:
new_data = {'continent': ['Africa'], 'location': ['Zimbabwe'], 'date': ['2022-12-07'], 'total_cases': [259356.0], 'new_cases': [192.0], 'total_deaths': [5622.0], 'new_deaths': [2.0], 'population_density': [42.729], 'median_age': [19.6], 'aged_65_older': [2.822], 'aged_70_older': [1.845], 'gdp_per_capita': [1899.767], 'cardiovasc_death_rate': [307.846], 'diabetes_prevalence': [1.85], 'life_expectancy': [61.55], 'population': [16320539.0]}
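Because every value in new_data is a one-element list, the dictionary maps directly to a one-row dataframe, which can then be appended with pd.concat. The sketch below uses a made-up starting dataframe and a shortened new_data; using ignore_index=True to renumber the rows is an assumption about how the new row should be indexed.

```python
import pandas as pd

# Made-up starting dataframe with a subset of the real columns.
df = pd.DataFrame({
    "continent": ["Asia"],
    "location": ["India"],
    "total_cases": [100.0],
})

# Shortened version of new_data, in the same dict-of-lists form.
new_data = {
    "continent": ["Africa"],
    "location": ["Zimbabwe"],
    "total_cases": [259356.0],
}
new_row = pd.DataFrame(new_data)

# ignore_index=True renumbers the rows 0..n-1, so the new row gets the next label.
df = pd.concat([df, new_row], ignore_index=True)

print(df)
```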
Update the value of the total_cases column to 259357.0 for the row with index 166620 in the df dataframe.
Update the values of the total_cases column for the rows with index 166620 and 166621 to 259357.0 and 259358.0 respectively.
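Label-based updates like the two above are commonly written with .loc, giving the row label (or labels) and the column name. This is a minimal sketch on a made-up dataframe whose index happens to contain the labels used in the activities.

```python
import pandas as pd

# Made-up dataframe; only the index labels match the activity text.
df = pd.DataFrame(
    {"total_cases": [259356.0, 259356.0, 10.0]},
    index=[166620, 166621, 5],
)

# Single cell: one row label and one column label.
df.loc[166620, "total_cases"] = 259357.0

# Several rows at once: a list of labels and a list of new values in the same order.
df.loc[[166620, 166621], "total_cases"] = [259357.0, 259358.0]

print(df)
```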
Remove the rows with index 166620 and 166621 from the dataframe.
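Rows can be removed by label with drop(index=...). The sketch below assumes a made-up dataframe that contains the two labels from the activity.

```python
import pandas as pd

# Made-up dataframe; only the index labels match the activity text.
df = pd.DataFrame(
    {"total_cases": [259357.0, 259358.0, 10.0]},
    index=[166620, 166621, 5],
)

# drop returns a new DataFrame without the listed row labels.
df = df.drop(index=[166620, 166621])

print(df)
```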
Select all the rows from the dataframe where the total_cases column is greater than 1000000.0. Store the result in a variable named df_1m.
Select the total_cases and total_deaths columns for the rows with index 5168, 5172 and 163703. Store the result in a variable named df_cases_death.
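The two selections above can be sketched with a boolean mask and with .loc. The dataframe below is made up; only the index labels and column names follow the activity text.

```python
import pandas as pd

# Made-up values; the index labels are the ones named in the activity.
df = pd.DataFrame(
    {
        "total_cases": [2000000.0, 500.0, 1500000.0],
        "total_deaths": [30000.0, 10.0, 20000.0],
        "new_cases": [100.0, 5.0, 80.0],
    },
    index=[5168, 5172, 163703],
)

# The comparison yields a boolean Series; indexing with it keeps the True rows.
df_1m = df[df["total_cases"] > 1000000.0]

# .loc takes a list of row labels and a list of column names.
df_cases_death = df.loc[[5168, 5172, 163703], ["total_cases", "total_deaths"]]

print(df_1m)
print(df_cases_death)
```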
Sort the dataframe in ascending order of the total_cases column. Store the result in a variable named df_sorted.
Sort the dataframe in descending order of the total_cases column. Store the result in a variable named df_sorted_desc.
Sort the dataframe in descending order of the total_cases column and then in ascending order of the total_deaths column. Store the result in a variable named df_sorted_multi.
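All three sorts above can be expressed with sort_values. The sketch below uses made-up numbers, with a tie in total_cases so that the second sort key has a visible effect.

```python
import pandas as pd

# Made-up values; rows 1 and 2 tie on total_cases.
df = pd.DataFrame({
    "total_cases": [5.0, 9.0, 9.0, 1.0],
    "total_deaths": [2.0, 4.0, 3.0, 0.0],
})

# Ascending is the default order.
df_sorted = df.sort_values("total_cases")

# ascending=False reverses the order.
df_sorted_desc = df.sort_values("total_cases", ascending=False)

# One ascending flag per column: total_cases descending, ties broken by total_deaths ascending.
df_sorted_multi = df.sort_values(
    ["total_cases", "total_deaths"], ascending=[False, True]
)

print(df_sorted_multi)
```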
Create a new column named total_cases_per_million in the dataframe df by dividing the total_cases column by the population column.
Update the total_cases_per_million column in the dataframe df by multiplying it by 1000.
Remove the total_cases_per_million column from the df dataframe.
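The three column activities above (create, update, remove) can be sketched in sequence on a made-up dataframe. The name scaled_values is hypothetical and only keeps a copy of the column for inspection before it is dropped.

```python
import pandas as pd

# Made-up values chosen so the arithmetic is easy to follow.
df = pd.DataFrame({
    "total_cases": [500.0, 250.0],
    "population": [1000.0, 1000.0],
})

# Assigning to a new column name creates the column; division is element-wise.
df["total_cases_per_million"] = df["total_cases"] / df["population"]

# Update the column in place by multiplying every value by 1000.
df["total_cases_per_million"] *= 1000

# Keep a copy of the values for inspection before removing the column.
scaled_values = df["total_cases_per_million"].tolist()

# Remove the column again.
df = df.drop(columns="total_cases_per_million")

print(scaled_values)
print(df.columns.tolist())
```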
Rename the total_cases column to Total Cases and the total_deaths column to Total Deaths.
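Renaming is typically done with rename(columns=...), which takes a mapping from old names to new names. A minimal sketch on a made-up dataframe:

```python
import pandas as pd

# Made-up sample with the two columns to rename.
df = pd.DataFrame({
    "location": ["India"],
    "total_cases": [100.0],
    "total_deaths": [2.0],
})

# Keys are the existing names, values are the new names; other columns are left alone.
df = df.rename(columns={
    "total_cases": "Total Cases",
    "total_deaths": "Total Deaths",
})

print(df.columns.tolist())
```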
Create three dataframe objects named df_india, df_china, and df_greater_new_cases by filtering the df dataframe object using boolean indexing as follows:
For df_india, select all rows from the COVID-19 DataFrame where the location is either "India" or "China".
For df_china, select all rows from the COVID-19 DataFrame where the number of new_cases is between 100000 and 200000.
For df_greater_new_cases, select all rows from the COVID-19 DataFrame where the number of new_cases per day is greater than or equal to 10000.
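The three filters above can be sketched with boolean masks built from isin, between, and a plain comparison. The data is made up, and treating "between" as inclusive of both endpoints (the default behaviour of Series.between) is an assumption, since the activity does not say whether the bounds are included.

```python
import pandas as pd

# Made-up values for illustration.
df = pd.DataFrame({
    "location": ["India", "China", "Brazil", "India"],
    "new_cases": [150000.0, 50.0, 12000.0, 9000.0],
})

# isin is True for rows whose location matches any value in the list.
df_india = df[df["location"].isin(["India", "China"])]

# between is inclusive of both bounds by default (an assumption for this activity).
df_china = df[df["new_cases"].between(100000, 200000)]

# A plain comparison also yields a boolean mask.
df_greater_new_cases = df[df["new_cases"] >= 10000]

print(df_india)
print(df_china)
print(df_greater_new_cases)
```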
Read the data from the covid.csv file and store it in the df_for_visualization dataframe object. Also parse the date column as a datetime object.
Filter the df_for_visualization dataframe object to select only the rows where the date is in the month of March 2020 and the location is India. Store the filtered dataframe object in the df_for_plot variable.
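The date parsing and filtering steps above might look like the sketch below. It reads a made-up CSV from memory instead of covid.csv, and the intermediate names dates, in_march_2020, and is_india are hypothetical.

```python
import io

import pandas as pd

# Made-up stand-in for covid.csv; values are for illustration only.
csv_text = """location,date,new_cases,total_deaths
India,2020-02-28,0.0,0.0
India,2020-03-01,2.0,0.0
India,2020-03-31,5.0,1.0
China,2020-03-15,20.0,3.0
India,2020-04-01,7.0,1.0
"""

# parse_dates converts the listed columns to datetime while reading.
df_for_visualization = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# The .dt accessor exposes the year and month of each date.
dates = df_for_visualization["date"]
in_march_2020 = (dates.dt.year == 2020) & (dates.dt.month == 3)
is_india = df_for_visualization["location"] == "India"

# Combine the two masks with & so that both conditions must hold.
df_for_plot = df_for_visualization[in_march_2020 & is_india]

print(df_for_plot)
```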
Plot a line plot using the df_for_plot dataframe object. The x-axis should be the date column and the y-axis should be the new_cases column. Based on the plot, which of the following statements is true?
Plot a bar plot using the df_for_plot dataframe object. The x-axis should be the date column and the y-axis should be the total_deaths column. Based on the plot, which of the following statements is true?
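The two plotting activities above can be sketched with the pandas plot method, which draws through matplotlib. The data here is made up, so the resulting charts say nothing about the real answers; placing both plots side by side and selecting the Agg backend (so the script runs without a display) are choices made for this example only.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line to show plots on screen

import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for the filtered dataframe.
df_for_plot = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"]),
    "new_cases": [1.0, 3.0, 2.0],
    "total_deaths": [0.0, 1.0, 1.0],
})

# Two axes side by side: one for the line plot, one for the bar plot.
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot of new_cases against date.
df_for_plot.plot(x="date", y="new_cases", kind="line", ax=ax_line)

# Bar plot of total_deaths against date.
df_for_plot.plot(x="date", y="total_deaths", kind="bar", ax=ax_bar)

fig.tight_layout()
```

In a notebook or interactive session, calling plt.show() after the last line displays the figure.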