All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Read the covid.csv file into a dataframe named df, using the first column as the index column.
Choose the correct shape for the df dataframe.
Choose the correct data type. There can be multiple correct answers.
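As a rough illustration of the loading and inspection steps above, the sketch below reads a CSV with the first column as the index and then checks the shape and data types. The three-row `csv_text` is a hypothetical stand-in for covid.csv, so the printed shape will not match the real file.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for covid.csv (the real file is much larger).
csv_text = """,continent,location,total_cases
0,Asia,Afghanistan,5.0
1,Asia,Afghanistan,7.0
2,Africa,Zimbabwe,9.0
"""

# index_col=0 tells pandas to use the first column as the row index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # one data type per column
```

With the real file, the same call would presumably be `pd.read_csv("covid.csv", index_col=0)`.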
Select the minimum and maximum values of the total_cases column in the COVID-19 dataset stored in the df dataframe.
Select the total number of cases in the COVID-19 dataset using the total_cases column in the df dataframe.
Compute the mean number of new cases per day in the COVID-19 dataset and select the correct answer. The answer is rounded to two decimal places.
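The summary statistics asked for above map onto pandas Series methods. A minimal sketch on a hypothetical three-row frame (the values are made up, not taken from covid.csv):

```python
import pandas as pd

# Hypothetical toy data; column names follow the tasks above.
df = pd.DataFrame({
    "total_cases": [5.0, 7.0, 9.0],
    "new_cases": [1.0, 2.0, 2.0],
})

lowest = df["total_cases"].min()             # minimum of the column
highest = df["total_cases"].max()            # maximum of the column
total = df["total_cases"].sum()              # sum over all rows
mean_new = round(df["new_cases"].mean(), 2)  # mean, rounded to two decimals

print(lowest, highest, total, mean_new)
```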
Create a new dataframe named df1 that contains only the continent and location columns from the df dataframe.
Drop the iso_code, new_cases_smoothed, new_deaths_smoothed, total_cases_per_million, new_cases_per_million, new_cases_smoothed_per_million, total_deaths_per_million, new_deaths_per_million, and new_deaths_smoothed_per_million columns from the df dataframe.
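Selecting a subset of columns and dropping unwanted ones could look like the sketch below. The toy frame is hypothetical and carries only two of the columns to be dropped; with the real data, the full list from the task would go into `columns=`.

```python
import pandas as pd

# Hypothetical toy data with a few of the column names used in the tasks.
df = pd.DataFrame({
    "iso_code": ["AFG", "ZWE"],
    "continent": ["Asia", "Africa"],
    "location": ["Afghanistan", "Zimbabwe"],
    "new_cases_smoothed": [1.5, 2.5],
})

# A list of column names inside [] returns a new dataframe with those columns.
df1 = df[["continent", "location"]]

# drop(columns=...) returns a copy without the listed columns.
df = df.drop(columns=["iso_code", "new_cases_smoothed"])

print(df1.columns.tolist())
print(df.columns.tolist())
```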
Add a new row to the df dataframe with the following values:
new_data = {'continent': ['Africa'], 'location': ['Zimbabwe'], 'date': ['2022-12-07'], 'total_cases': [259356.0], 'new_cases': [192.0], 'total_deaths': [5622.0], 'new_deaths': [2.0], 'population_density': [42.729], 'median_age': [19.6], 'aged_65_older': [2.822], 'aged_70_older': [1.845], 'gdp_per_capita': [1899.767], 'cardiovasc_death_rate': [307.846], 'diabetes_prevalence': [1.85], 'life_expectancy': [61.55], 'population': [16320539.0]}
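One way to append such a row is to wrap the dictionary in a dataframe and concatenate it. This is a sketch on a hypothetical two-column frame, with `new_data` trimmed to match. It assumes the new row should receive the next integer index, which is what `ignore_index=True` produces when the existing index already runs from 0 upward.

```python
import pandas as pd

# Hypothetical toy frame; the real df has many more columns.
df = pd.DataFrame({
    "location": ["Afghanistan", "Zimbabwe"],
    "total_cases": [5.0, 9.0],
})

# Same dict-of-lists layout as in the task, trimmed to the toy columns.
new_data = {"location": ["Zimbabwe"], "total_cases": [259356.0]}

# Build a one-row dataframe and stack it under the existing rows.
# ignore_index=True renumbers the index 0..n-1, so the new row comes last.
df = pd.concat([df, pd.DataFrame(new_data)], ignore_index=True)

print(df.tail(1))
```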
Update the value of the total_cases column for the row with index 166620 to 259357.0 in the df dataframe.
Update the values of the total_cases column for the rows with index 166620 and 166621 to 259357.0 and 259358.0 respectively.
Remove the rows with index 166620 and 166621 from the dataframe.
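Label-based updates and row removal can be sketched with `.loc` and `drop`. The index labels 10 to 13 below are hypothetical stand-ins for 166620 and 166621, and the `updated` variable exists only to show the intermediate state.

```python
import pandas as pd

# Hypothetical toy frame with explicit index labels.
df = pd.DataFrame({"total_cases": [5.0, 7.0, 9.0, 11.0]}, index=[10, 11, 12, 13])

# Update a single cell: row label first, then column name.
df.loc[12, "total_cases"] = 259357.0

# Update several rows at once by passing a list of labels and a list of values.
df.loc[[12, 13], "total_cases"] = [259357.0, 259358.0]
updated = df.loc[[12, 13], "total_cases"].tolist()

# Remove rows by index label; drop returns a new dataframe.
df = df.drop(index=[12, 13])

print(updated)
print(df)
```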
Select all the rows from the dataframe where the total_cases column is greater than 1000000.0. Store the result in a variable named df_1m.
Select the total_cases and total_deaths columns for the rows with index 5168, 5172 and 163703. Store the result in a variable named df_cases_death.
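Both selections above can be sketched as follows: a boolean mask for the threshold filter, and `.loc` with a list of row labels plus a list of column names. The data is hypothetical; only the index labels and column names are borrowed from the tasks.

```python
import pandas as pd

# Hypothetical toy values on the index labels named in the task.
df = pd.DataFrame(
    {
        "total_cases": [500.0, 2_000_000.0, 1_500_000.0],
        "total_deaths": [1.0, 40.0, 30.0],
        "new_cases": [10.0, 20.0, 30.0],
    },
    index=[5168, 5172, 163703],
)

# The comparison yields a boolean Series; indexing with it keeps the True rows.
df_1m = df[df["total_cases"] > 1_000_000.0]

# .loc[row_labels, column_names] selects by label on both axes.
df_cases_death = df.loc[[5168, 5172, 163703], ["total_cases", "total_deaths"]]

print(df_1m)
print(df_cases_death)
```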
Sort the dataframe in ascending order of the total_cases column. Store the result in a variable named df_sorted.
Sort the dataframe in descending order of the total_cases column. Store the result in a variable named df_sorted_desc.
Sort the dataframe in descending order of the total_cases column and then in ascending order of the total_deaths column. Store the result in a variable named df_sorted_multi.
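The three sorts above differ only in the arguments passed to `sort_values`. A minimal sketch on hypothetical data with a tie in total_cases, so the effect of the second sort key is visible:

```python
import pandas as pd

# Hypothetical toy data; rows 1 and 2 tie on total_cases.
df = pd.DataFrame({
    "total_cases": [7.0, 9.0, 9.0, 5.0],
    "total_deaths": [3.0, 4.0, 2.0, 1.0],
})

# Ascending is the default order.
df_sorted = df.sort_values("total_cases")

# ascending=False reverses the order.
df_sorted_desc = df.sort_values("total_cases", ascending=False)

# One flag per key: total_cases descending, ties broken by total_deaths ascending.
df_sorted_multi = df.sort_values(
    ["total_cases", "total_deaths"], ascending=[False, True]
)

print(df_sorted_multi)
```

`sort_values` returns a new dataframe and leaves df unchanged, which is why each result is stored in its own variable.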
Create a new column named total_cases_per_million in the dataframe df by dividing the total_cases column by the population column.
Update the total_cases_per_million column in the dataframe df by multiplying it by 1000.
Remove the total_cases_per_million column from the df dataframe.
Rename the total_cases column to Total Cases and the total_deaths column to Total Deaths.
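The four column operations above can be sketched in sequence on a hypothetical two-row frame. The factor of 1000 follows the task as written; a true per-million figure would multiply the ratio by 1,000,000 instead. The `scaled` variable is only there to show the intermediate values.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "total_cases": [100.0, 300.0],
    "total_deaths": [1.0, 3.0],
    "population": [1000.0, 2000.0],
})

# Create a column from an element-wise division of two existing columns.
df["total_cases_per_million"] = df["total_cases"] / df["population"]

# Update the column in place by assigning the scaled values back to it.
df["total_cases_per_million"] = df["total_cases_per_million"] * 1000
scaled = df["total_cases_per_million"].tolist()

# Remove the column again.
df = df.drop(columns="total_cases_per_million")

# rename(columns=...) takes a mapping from old names to new names.
df = df.rename(columns={"total_cases": "Total Cases", "total_deaths": "Total Deaths"})

print(scaled)
print(df.columns.tolist())
```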
Create three dataframe objects named df_india, df_china, and df_greater_new_cases by filtering the df dataframe object using boolean indexing as follows:
For df_india, select all rows from the COVID-19 DataFrame where the location is either "India" or "China".
For df_china, select all rows from the COVID-19 DataFrame where the number of new_cases is between 100000 and 200000.
For df_greater_new_cases, select all rows from the COVID-19 DataFrame where the number of new_cases per day is greater than or equal to 10000.
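The three filters above can be sketched with `isin`, `between`, and a plain comparison. The data is hypothetical. The sketch assumes "between" includes both endpoints, which is the default behaviour of `Series.between`.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "location": ["India", "China", "Brazil", "India"],
    "new_cases": [150000.0, 50.0, 12000.0, 9000.0],
})

# isin checks membership in a list, equivalent to chaining == with |.
df_india = df[df["location"].isin(["India", "China"])]

# between is inclusive on both ends by default (an assumption about the task).
df_china = df[df["new_cases"].between(100000, 200000)]

# A comparison operator produces the boolean mask directly.
df_greater_new_cases = df[df["new_cases"] >= 10000]

print(df_india.index.tolist())
print(df_china.index.tolist())
print(df_greater_new_cases.index.tolist())
```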
Read the data from the covid.csv file and store it in the df_for_visualization dataframe object. Also parse the date column as a datetime object.
Filter the df_for_visualization dataframe object to keep only the rows where the date falls in March 2020 and the location is India. Store the filtered dataframe object in the df_for_plot variable.
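Parsing the date column at read time makes the month filter straightforward through the `.dt` accessor. In the sketch below, `csv_text` is a hypothetical five-row stand-in for covid.csv, and `is_march_2020` and `is_india` are illustrative helper names.

```python
import io

import pandas as pd

# Hypothetical five-row stand-in for covid.csv.
csv_text = """location,date,new_cases,total_deaths
India,2020-02-28,1.0,0.0
India,2020-03-05,3.0,0.0
India,2020-03-20,50.0,4.0
China,2020-03-05,120.0,3000.0
India,2020-04-01,400.0,40.0
"""

# parse_dates converts the listed columns to datetime64 while reading.
df_for_visualization = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Build one mask per condition, then combine them with & (element-wise AND).
dates = df_for_visualization["date"]
is_march_2020 = (dates.dt.year == 2020) & (dates.dt.month == 3)
is_india = df_for_visualization["location"] == "India"

df_for_plot = df_for_visualization[is_march_2020 & is_india]

print(df_for_plot)
```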
Plot a line plot using the df_for_plot dataframe object. The x-axis should be the date column and the y-axis should be the new_cases column. Based on the plot, which of the following statements is true?
Plot a bar plot using the df_for_plot dataframe object. The x-axis should be the date column and the y-axis should be the total_deaths column. Based on the plot, which of the following statements is true?
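Both plots can be drawn with the dataframe's own `plot` method, which relies on matplotlib. The sketch below uses a hypothetical three-row df_for_plot and places the two charts side by side. The Agg backend is selected only so the code runs without a display; in a notebook that line can be dropped.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for the filtered March 2020 rows.
df_for_plot = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-05", "2020-03-12", "2020-03-20"]),
    "new_cases": [3.0, 10.0, 50.0],
    "total_deaths": [0.0, 1.0, 4.0],
})

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot: date on the x-axis, new_cases on the y-axis.
df_for_plot.plot(x="date", y="new_cases", kind="line", ax=ax_line)

# Bar plot: date on the x-axis, total_deaths on the y-axis.
df_for_plot.plot(x="date", y="total_deaths", kind="bar", ax=ax_bar)
```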