All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Merge the DataFrames df_numeric and df_string on their common column Booking_ID using a left join. Store the resulting DataFrame in the df variable.
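Below is a minimal sketch of the merge. It uses small hypothetical stand-ins for df_numeric and df_string. The real column split comes from the project's data, so the columns shown here are assumptions:

```python
import pandas as pd

# Hypothetical sample data; the real df_numeric and df_string come from the project
df_numeric = pd.DataFrame({
    "Booking_ID": ["INN00001", "INN00002", "INN00003"],
    "no_of_adults": [2, 3, 1],
    "avg_price_per_room": [65.0, 106.68, 60.0],
})
df_string = pd.DataFrame({
    "Booking_ID": ["INN00001", "INN00003"],
    "room_type_reserved": ["Room_Type 1", "Room_Type 4"],
})

# how="left" keeps every row of df_numeric; rows with no match in df_string get NaN
df = df_numeric.merge(df_string, on="Booking_ID", how="left")
print(df)
```

Booking INN00002 has no match in df_string, so its room_type_reserved is NaN. An inner join would have dropped that row instead.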
Filter the DataFrame df to keep only the rows where the number of adults is greater than 2. Store the resulting DataFrame in the df_n_adults variable.
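Here is one way to write this filter on a tiny made-up df. The task does not name the column, so this sketch assumes the adults count lives in a column named no_of_adults:

```python
import pandas as pd

# Hypothetical sample; the column name no_of_adults is an assumption
df = pd.DataFrame({
    "Booking_ID": ["INN00001", "INN00002", "INN00003", "INN00004"],
    "no_of_adults": [2, 3, 1, 4],
})

# The boolean mask keeps only bookings with strictly more than 2 adults
df_n_adults = df[df["no_of_adults"] > 2]
print(df_n_adults)
```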
Filter the DataFrame df to include only bookings with an arrival year of 2018 and a lead time greater than 100. After filtering, sort the results by avg_price_per_room in descending order. Store the resulting DataFrame in the filtered_df variable.
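The sketch below runs on hypothetical rows. Each condition is wrapped in parentheses because & binds more tightly than == and > in Python:

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "arrival_year": [2018, 2018, 2017, 2018],
    "lead_time": [224, 5, 150, 120],
    "avg_price_per_room": [65.0, 106.68, 60.0, 115.0],
})

# Both conditions must hold; & combines the two boolean masks element-wise
mask = (df["arrival_year"] == 2018) & (df["lead_time"] > 100)

# Highest price first
filtered_df = df[mask].sort_values("avg_price_per_room", ascending=False)
print(filtered_df)
```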
Group the DataFrame df by room_type_reserved and calculate the average avg_price_per_room for each room type. Store the result in avg_price_by_room.
Enter the name of the room type that has the highest average price per room.
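This sketch shows the groupby on made-up rows. Series.idxmax() returns the index label of the largest value, which here is the room type. The sample answer says nothing about the real dataset:

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "room_type_reserved": ["Room_Type 1", "Room_Type 1", "Room_Type 4", "Room_Type 6"],
    "avg_price_per_room": [80.0, 100.0, 120.0, 180.0],
})

# One mean price per room type, indexed by room_type_reserved
avg_price_by_room = df.groupby("room_type_reserved")["avg_price_per_room"].mean()

# idxmax gives the index label (the room type) with the largest mean
top_room_type = avg_price_by_room.idxmax()
print(avg_price_by_room)
print(top_room_type)
```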
Group the DataFrame df by arrival_month and calculate the count, mean, and median of avg_price_per_room for each month. Store the result in the monthly_stats variable.
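Passing a list of function names to agg() produces one column per statistic. Here is a sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "arrival_month": [1, 1, 1, 2, 2],
    "avg_price_per_room": [60.0, 80.0, 130.0, 90.0, 110.0],
})

# One row per month; columns "count", "mean" and "median"
monthly_stats = df.groupby("arrival_month")["avg_price_per_room"].agg(["count", "mean", "median"])
print(monthly_stats)
```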
Use groupby() and agg() to calculate the sum of avg_price_per_room and the booking count for each combination of arrival_year and arrival_month. Store the resulting DataFrame in the revenue_summary variable.
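This sketch uses named aggregation on hypothetical rows. The output column names total_revenue and booking_count are illustrative choices. Counting Booking_ID assumes that column is present and never null:

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "Booking_ID": ["A", "B", "C", "D", "E"],
    "arrival_year": [2017, 2017, 2018, 2018, 2018],
    "arrival_month": [10, 10, 1, 1, 2],
    "avg_price_per_room": [100.0, 50.0, 80.0, 70.0, 90.0],
})

# Named aggregation: output_name=(input_column, function)
revenue_summary = (
    df.groupby(["arrival_year", "arrival_month"])
    .agg(
        total_revenue=("avg_price_per_room", "sum"),
        booking_count=("Booking_ID", "count"),
    )
    .reset_index()  # turn the (year, month) index back into ordinary columns
)
print(revenue_summary)
```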
Generate dummy variables for the type_of_meal_plan column in the DataFrame df, setting drop_first=True. After creating the dummy variables, drop the original type_of_meal_plan column if it still exists. Store the result in df.
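The sketch below follows these steps literally on a made-up df. Building the dummies from the single column leaves the original column in place. The drop with errors="ignore" removes it, and skips it silently if it is already gone. The prefix and dtype=int arguments are optional choices, since pandas 2.x returns bool dummies by default:

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "Booking_ID": ["A", "B", "C", "D"],
    "type_of_meal_plan": ["Meal Plan 1", "Not Selected", "Meal Plan 2", "Meal Plan 1"],
})

# drop_first=True drops the first category (alphabetically "Meal Plan 1")
dummies = pd.get_dummies(
    df["type_of_meal_plan"], prefix="type_of_meal_plan", drop_first=True, dtype=int
)

# Attach the dummy columns, then remove the original column if it is still there
df = pd.concat([df, dummies], axis=1)
df = df.drop(columns="type_of_meal_plan", errors="ignore")
print(df)
```

A shorter alternative is to call pd.get_dummies(df, columns=["type_of_meal_plan"], drop_first=True) on the whole DataFrame, which removes the original column automatically.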
Create a new column price_category in the DataFrame df by binning avg_price_per_room into three distinct categories:
Budget: [0, 180]
Standard: (180, 360]
Luxury: (360, 540]
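pd.cut maps these intervals directly. Bins are right-closed by default, and include_lowest=True makes the first interval [0, 180] rather than (0, 180]. This sketch uses hypothetical prices chosen to hit each boundary:

```python
import pandas as pd

# Hypothetical prices sitting on and around the bin edges
df = pd.DataFrame({"avg_price_per_room": [0.0, 180.0, 180.5, 360.0, 540.0]})

df["price_category"] = pd.cut(
    df["avg_price_per_room"],
    bins=[0, 180, 360, 540],                # edges of the three intervals
    labels=["Budget", "Standard", "Luxury"],
    include_lowest=True,                    # include 0 in the Budget bin
)
print(df)
```

Prices above 540 would fall outside every bin and get NaN.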
Use the apply() function to create a new column total_nights in the DataFrame df. This column is the sum of no_of_weekend_nights and no_of_week_nights, giving the total number of nights stayed for each booking.
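This sketch uses apply(axis=1), which calls the lambda once per row. The data is hypothetical:

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "no_of_weekend_nights": [1, 2, 0],
    "no_of_week_nights": [2, 3, 1],
})

# axis=1 passes each row as a Series to the lambda
df["total_nights"] = df.apply(
    lambda row: row["no_of_weekend_nights"] + row["no_of_week_nights"], axis=1
)
print(df)
```

The vectorized df["no_of_weekend_nights"] + df["no_of_week_nights"] gives the same result and is much faster on large frames. apply() is used here only because the task asks for it.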
Create a new DataFrame named string_df that stores all the string columns of df. Use the applymap() function to convert all string values in string_df to lowercase.
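This sketch runs on a hypothetical df. DataFrame.applymap() was deprecated in pandas 2.1 in favour of the equivalent DataFrame.map(), so the code uses whichever exists. Selecting the "string" dtype alongside "object" is an assumption meant to cover pandas versions that store text in a dedicated string dtype:

```python
import pandas as pd

# Hypothetical sample rows: two text columns and one numeric column
df = pd.DataFrame({
    "Booking_ID": ["INN00001", "INN00002"],
    "room_type_reserved": ["Room_Type 1", "Room_Type 4"],
    "lead_time": [224, 5],
})

# Keep only the text columns
string_df = df.select_dtypes(include=["object", "string"])


def to_lower(value):
    # Leave non-strings (e.g. NaN) untouched
    return value.lower() if isinstance(value, str) else value


# Element-wise function: DataFrame.map in pandas >= 2.1, applymap before that
if hasattr(pd.DataFrame, "map"):
    string_df = string_df.map(to_lower)
else:
    string_df = string_df.applymap(to_lower)
print(string_df)
```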
Create a bar plot that visualizes the average lead_time for each market_segment_type in df. Set color='orange' to make the bars orange.
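This sketch uses the pandas plotting API, which draws with matplotlib, on made-up rows. The axis labels are illustrative additions. Passing an explicit ax avoids drawing onto a figure left over from an earlier cell:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "market_segment_type": ["Online", "Offline", "Online", "Corporate"],
    "lead_time": [100, 40, 50, 10],
})

# Mean lead time per segment (index sorted alphabetically)
avg_lead_time = df.groupby("market_segment_type")["lead_time"].mean()

fig, ax = plt.subplots()
avg_lead_time.plot(kind="bar", color="orange", ax=ax)
ax.set_xlabel("Market segment")
ax.set_ylabel("Average lead time")
```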
Create a scatter plot to visualize the relationship between lead_time and avg_price_per_room.
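This sketch plots hypothetical points. alpha=0.5 is an optional choice that helps when many bookings overlap:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "lead_time": [224, 5, 150, 120],
    "avg_price_per_room": [65.0, 106.68, 60.0, 115.0],
})

fig, ax = plt.subplots()
# One point per booking: x = lead_time, y = avg_price_per_room
df.plot(kind="scatter", x="lead_time", y="avg_price_per_room", alpha=0.5, ax=ax)
```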
Plot a stacked bar chart showing the distribution of canceled vs. not canceled bookings across different market segments.
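The task names neither the cancellation column nor its values, so this approach assumes a booking_status column with Canceled and Not_Canceled values. It counts bookings per segment and status with pd.crosstab, then draws the table with stacked=True. The sample data is hypothetical:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows; booking_status and its values are assumptions
df = pd.DataFrame({
    "market_segment_type": ["Online", "Online", "Offline", "Online", "Corporate"],
    "booking_status": ["Canceled", "Not_Canceled", "Not_Canceled", "Canceled", "Not_Canceled"],
})

# Rows: market segments; columns: booking status; values: booking counts
status_counts = pd.crosstab(df["market_segment_type"], df["booking_status"])

fig, ax = plt.subplots()
status_counts.plot(kind="bar", stacked=True, ax=ax)
ax.set_xlabel("Market segment")
ax.set_ylabel("Number of bookings")
```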
Plot a line chart showing the number of bookings for each room type over different months of the year.
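A similar sketch works for the line chart. pd.crosstab counts bookings per (arrival_month, room_type_reserved) pair, and each room-type column becomes one line. The data is hypothetical:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "arrival_month": [1, 1, 2, 2, 3],
    "room_type_reserved": ["Room_Type 1", "Room_Type 4", "Room_Type 1", "Room_Type 1", "Room_Type 4"],
})

# Rows: months; columns: room types; values: booking counts (0 where none)
monthly_bookings = pd.crosstab(df["arrival_month"], df["room_type_reserved"])

fig, ax = plt.subplots()
monthly_bookings.plot(kind="line", marker="o", ax=ax)
ax.set_xlabel("Arrival month")
ax.set_ylabel("Number of bookings")
```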