All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Group the dataset by `Gender` and calculate the average `Age` for each gender group. This will help you understand the average age of male and female customers.
Enter the average age of `Female` customers, using the exact decimal value returned.
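A minimal sketch of the grouping step, using a small made-up DataFrame rather than the real dataset (the values and the variable names `average_age_by_gender` and `female_average_age` are illustrative assumptions):

```python
import pandas as pd

# Hypothetical toy data; the real dataset has many more rows and columns.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Age": [22, 25, 24, 27],
})

# Mean of Age within each Gender group.
average_age_by_gender = df.groupby("Gender")["Age"].mean()

# Pick out the value for one group.
female_average_age = average_age_by_gender["Female"]
print(average_age_by_gender)
```

On the real data, the printed value for `Female` is the number to enter, copied without rounding.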
Group the dataset by `Marital Status` and count the frequency of orders where `Output` is `Yes`. This will help you identify how marital status affects ordering trends.
Store the result in the `orders_by_marital_status` variable.
The result should match the following output:
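One possible way to approach this activity, sketched on a small made-up DataFrame (the rows are invented, and whether the expected result is a Series or a DataFrame is an assumption):

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "Marital Status": ["Single", "Married", "Single", "Married", "Single"],
    "Output": ["Yes", "Yes", "No", "Yes", "Yes"],
})

# Keep only the rows where an order was placed, then count rows per group.
orders_by_marital_status = (
    df[df["Output"] == "Yes"].groupby("Marital Status").size()
)
print(orders_by_marital_status)
```

Filtering before grouping means groups with no `Yes` rows do not appear in the result at all.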
Group the dataset by `Occupation` and count the frequency of orders where `Output` is "Yes". This analysis will show which occupations order more frequently.
Store the result in the `orders_by_occupation` variable.
The result should match the following output:
Group the dataset by `Educational Qualifications` and count the number of orders where `Output` is "Yes".
Use `groupby` followed by `size` to count occurrences, then `reset_index` to convert the grouped result into a DataFrame, which makes later analysis and visualization easier.
This will help you determine whether education level affects ordering behavior.
Store the result in the `orders_by_education` variable.
The result should match the following output:
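The `groupby`, `size`, `reset_index` chain described above could look roughly like this on a made-up DataFrame. The count column name `Order Count` is an assumption; the activity does not state what the column should be called.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "Educational Qualifications": [
        "Graduate", "Post Graduate", "Graduate", "Ph.D", "Graduate",
    ],
    "Output": ["Yes", "Yes", "No", "Yes", "Yes"],
})

orders_by_education = (
    df[df["Output"] == "Yes"]              # orders only
    .groupby("Educational Qualifications")
    .size()                                # Series of counts, indexed by group
    .reset_index(name="Order Count")       # assumed column name for the counts
)
print(orders_by_education)
```

Without `name=`, `reset_index` labels the count column `0`, so passing a name is usually worthwhile.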
Group the dataset by `Family size` and count the number of orders where `Output` is "Yes". This will help you understand whether larger families order more frequently.
Store the result in the `orders_by_family_size` variable.
The result should match the following output:
Split the dataset into subsets based on `Gender` (Male and Female), then concatenate them to compare findings between genders.
Store the result in the `concatenated_data` variable.
The result should match the following output:
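A minimal sketch of splitting and concatenating, on a made-up DataFrame. The order of the subsets (Male first) and the helper names `male_data` and `female_data` are assumptions.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Age": [22, 25, 24, 27],
})

# Boolean masks select one subset per gender.
male_data = df[df["Gender"] == "Male"]
female_data = df[df["Gender"] == "Female"]

# Stack the subsets vertically; original index labels are kept.
concatenated_data = pd.concat([male_data, female_data])
print(concatenated_data)
```

Because the index is not reset here, the rows keep the labels they had in `df`.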
Split the dataset into subsets based on the `No Income` and `Below Rs.10000` values of the `Monthly Income` column, then concatenate them.
After concatenation, reset the index using `reset_index(drop=True)` so that the index is continuous and has no duplicates.
Store the result in the `concatenated_data_income` variable.
The result should match the following output:
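A sketch of the split, concatenate and reset steps on a made-up DataFrame (the rows and the helper names `no_income` and `below_10000` are illustrative assumptions):

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "Monthly Income": [
        "No Income", "Below Rs.10000", "More than 50000", "No Income",
    ],
    "Age": [20, 23, 30, 21],
})

no_income = df[df["Monthly Income"] == "No Income"]
below_10000 = df[df["Monthly Income"] == "Below Rs.10000"]

# drop=True discards the old index instead of keeping it as a column.
concatenated_data_income = pd.concat([no_income, below_10000]).reset_index(drop=True)
print(concatenated_data_income)
```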
Split the `Feedback` data into `Positive` and `Negative` subsets, count each subset by `Occupation`, then merge the two results on `Occupation` to get a comprehensive view of customer sentiment.
After grouping by `Occupation`, use `reset_index` to convert the indices into columns, and name the count column with `name='Positive'` for positive feedback and `name='Negative'` for negative feedback.
Note: the value in the dataset is `'Negative '`, not `'Negative'`; there is a trailing space after the word.
Store the result in the `merged_feedback` variable.
The result should match the following output:
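A rough sketch of the split, group and merge steps on a made-up DataFrame. The default inner join is an assumption: it drops occupations that have only one type of feedback, which may or may not be what the activity expects.

```python
import pandas as pd

# Hypothetical toy data; note the trailing space in "Negative ".
df = pd.DataFrame({
    "Occupation": ["Student", "Employee", "Student", "Employee", "Student"],
    "Feedback": ["Positive", "Positive", "Negative ", "Negative ", "Positive"],
})

positive = (
    df[df["Feedback"] == "Positive"]
    .groupby("Occupation").size().reset_index(name="Positive")
)
negative = (
    df[df["Feedback"] == "Negative "]   # trailing space must match the data
    .groupby("Occupation").size().reset_index(name="Negative")
)

# Inner join on Occupation (pandas default); an assumption, see above.
merged_feedback = pd.merge(positive, negative, on="Occupation")
print(merged_feedback)
```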
Split the dataset into subsets based on the `Post_Graduate`, `Graduate` and `Ph.D` values of the `Educational Qualifications` column, then concatenate them.
After concatenation, reset the index using `reset_index(drop=True)` so that the index is continuous and has no duplicates.
Store the result in the `concatenated_education_data` variable.
The result should match the following output:
Split the dataset into subsets based on `Family size` (1-2, 3-4, and 5 or more), then concatenate them to analyze the effect of family size on feedback.
Store the result in the `concatenated_family_size_data` variable.
The result should match the following output:
Use the `applymap` function to convert all entries in the `Occupation` column to uppercase to standardize the data.
Store the result in the `df` DataFrame with a new column `Occupation Uppercase`.
The result should match the following output:
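A hedged sketch of the element-wise conversion, on made-up data. `applymap` is a DataFrame method, so the column is selected as a one-column DataFrame with double brackets. Recent pandas versions (2.1 and later) deprecate `applymap` in favour of `DataFrame.map`, so the sketch uses whichever is available; the helper name `upper` is an assumption.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({"Occupation": ["Student", "Employee", "House wife"]})

# Double brackets keep a DataFrame (not a Series), which applymap requires.
occupation_frame = df[["Occupation"]]

if hasattr(pd.DataFrame, "map"):
    # pandas >= 2.1: DataFrame.map is the renamed applymap.
    upper = occupation_frame.map(str.upper)
else:
    upper = occupation_frame.applymap(str.upper)

df["Occupation Uppercase"] = upper["Occupation"]
print(df)
```

For a single string column, `df["Occupation"].str.upper()` gives the same values; the element-wise route is shown only because the activity asks for it.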
Use the `where` function to identify orders from families with a size greater than 4. This will help in targeting larger families for marketing campaigns.
Use the `notna()` method to filter out the NaN values created by `where`.
Store the result in the `large_family_orders` variable.
The result should match the following output:
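One way the `where` and `notna()` combination can work, sketched on made-up data. Applying `where` to the single column rather than the whole DataFrame is an assumption, as is the helper name `family_size_masked`.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "Family size": [2, 5, 6, 3],
    "Output": ["Yes", "Yes", "No", "Yes"],
})

# where keeps values that satisfy the condition and replaces the rest with NaN.
family_size_masked = df["Family size"].where(df["Family size"] > 4)

# notna() is True only where a value survived, so it works as a row filter.
large_family_orders = df[family_size_masked.notna()]
print(large_family_orders)
```

`notna()` itself only builds a boolean mask; the rows are removed by indexing `df` with that mask.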
Apply a custom function to derive geographical insights based on `latitude` and `longitude`. This will help in understanding the geographical distribution of orders.
Use the pandas `apply` method to apply this function across the DataFrame. The `apply` method should be used with `axis=1`, which ensures that the function is applied to each row individually.
Store the result in the `df` DataFrame with a new column `Location Insights`.
The result should match the following output:
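The activity does not say what the custom function should compute, so the rule below is purely hypothetical: it labels each row by its position relative to an arbitrary reference point. The function name `location_insight`, the reference coordinates and the data are all assumptions; only the `apply(..., axis=1)` pattern comes from the text.

```python
import pandas as pd

# Hypothetical toy coordinates standing in for the real dataset.
df = pd.DataFrame({
    "latitude": [12.97, 12.90],
    "longitude": [77.59, 77.65],
})

# Arbitrary reference point, chosen only for illustration.
REFERENCE_LATITUDE = 12.95
REFERENCE_LONGITUDE = 77.60

def location_insight(row):
    """Label a row by its position relative to the reference point."""
    north_south = "North" if row["latitude"] >= REFERENCE_LATITUDE else "South"
    east_west = "East" if row["longitude"] >= REFERENCE_LONGITUDE else "West"
    return f"{north_south}-{east_west}"

# axis=1 passes each row (as a Series) to the function.
df["Location Insights"] = df.apply(location_insight, axis=1)
print(df)
```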
Convert the `Marital Status` column into dummy variables for regression or classification analysis. This will help in predictive modeling.
Store the result in the `marital_status_dummies` variable.
The result should match the following output:
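A minimal sketch using `pd.get_dummies` on made-up data. Whether the expected output uses a column prefix or integer rather than boolean values is not stated, so the defaults here are an assumption.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "Marital Status": ["Single", "Married", "Single", "Prefer not to say"],
})

# One indicator column per distinct value; exactly one is set in each row.
marital_status_dummies = pd.get_dummies(df["Marital Status"])
print(marital_status_dummies)
```

The dtype of the indicator columns depends on the pandas version (boolean in 2.x, unsigned integer in older releases).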
Convert the `Occupation` column into dummy variables to analyze how occupation affects ordering habits. This will facilitate regression analysis.
Store the result in the `occupation_dummies` variable.
The result should match the following output:
Filter the dataset to include only rows where the occupation is `Student` and the monthly income is `No Income`, to analyze their ordering patterns. This will help in understanding the behavior of student customers.
Store the result in the `student_orders` variable.
The result should match the following output:
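A sketch of combining two conditions in one filter, on made-up data. It assumes that `No Income` refers to the `Monthly Income` column, as in the earlier income activity.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "Occupation": ["Student", "Employee", "Student"],
    "Monthly Income": ["No Income", "More than 50000", "Below Rs.10000"],
})

# Each condition needs its own parentheses; & combines them element-wise.
student_orders = df[
    (df["Occupation"] == "Student") & (df["Monthly Income"] == "No Income")
]
print(student_orders)
```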
Group by `Educational Qualifications` and `Feedback` to assess whether different educational qualifications correlate with specific types of feedback. This will help in understanding how education level influences customer satisfaction.
Enter the number of `Positive` and `Negative` feedback entries for `Educational Qualifications`: `Graduate`.
Note: Enter the values in comma-separated format, for example: 154, 20
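Grouping by two columns at once could be sketched as follows, on made-up data. The names `feedback_counts` and `graduate_counts` are assumptions, and the trailing space in `'Negative '` follows the note in the feedback activity above.

```python
import pandas as pd

# Hypothetical toy data; note the trailing space in "Negative ".
df = pd.DataFrame({
    "Educational Qualifications": [
        "Graduate", "Graduate", "Graduate", "Post Graduate",
    ],
    "Feedback": ["Positive", "Negative ", "Positive", "Positive"],
})

# Counts for every (qualification, feedback) pair, as a MultiIndex Series.
feedback_counts = df.groupby(["Educational Qualifications", "Feedback"]).size()

# Selecting the first index level leaves a Series indexed by Feedback.
graduate_counts = feedback_counts.loc["Graduate"]
print(graduate_counts)
```

On the real data, the two numbers printed for `Graduate` are the values to enter, positive first.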
Examine whether there is a direct correlation between the size of the family and the frequency of orders.
Group the dataset by `Family size` and count the frequency of orders where `Output` is `Yes`.
Store the result in the `family_size_order_frequency` variable.
The result should match the following output: