Identify the continuous variables (numerical data types int64 or float64) in df and store their column names in continuous_vars.
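A minimal sketch, assuming df is a pandas DataFrame. The small df below is hypothetical sample data standing in for the real dataset. select_dtypes keeps only the int64 and float64 columns.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df
df = pd.DataFrame({
    "age": [30, 45, 61],
    "job": ["admin.", "technician", "retired"],
    "balance": [1200.5, -50.0, 3000.0],
    "duration": [120, 300, 45],
})

# Keep only int64/float64 columns and collect their names in a list
continuous_vars = df.select_dtypes(include=["int64", "float64"]).columns.tolist()
print(continuous_vars)
```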
Create a new categorical column age_group in df by discretizing the age column with bin edges 0, 18, 35, 65, and 100, giving four bins labeled Child (0 to 18), Young (18 to 35), Adult (35 to 65), and Senior (65 to 100).
Note: The new column is added at the end of df.
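One way to do this, sketched with pandas' pd.cut on hypothetical ages. pd.cut builds right-closed intervals by default, so an age of exactly 18 lands in Child, not Young. Assigning a new key appends the column at the end of df.

```python
import pandas as pd

df = pd.DataFrame({"age": [5, 18, 25, 40, 70]})  # hypothetical ages

# Edges and labels from the activity; intervals are (0, 18], (18, 35], ...
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 18, 35, 65, 100],
    labels=["Child", "Young", "Adult", "Senior"],
)
print(df)
```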
Create a new categorical column balance_range in df by binning the balance column into four equal-sized quantile ranges labeled Low, Medium, High, and Very High.
Note: The new column is added at the end of df.
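Quantile binning differs from pd.cut because the edges come from the data's distribution, not from fixed values. The sketch below uses pd.qcut with q=4 on hypothetical balances, which puts two of the eight rows in each range.

```python
import pandas as pd

df = pd.DataFrame({"balance": [-200, 0, 150, 400, 800, 1500, 3000, 10000]})  # hypothetical

# q=4 splits at the 25th, 50th and 75th percentiles of balance
df["balance_range"] = pd.qcut(
    df["balance"], q=4, labels=["Low", "Medium", "High", "Very High"]
)
print(df["balance_range"].value_counts())
```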
Create a new categorical column duration_cat in df by binning the duration column into the following intervals and labels: -1 to 10 (Very Short), 10 to 20 (Short), 20 to 30 (Medium), 30 to 60 (Long), and 60 to the maximum duration value (Very Long).
Note: The new column is added at the end of df.
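A sketch with pd.cut on hypothetical durations. The last edge is taken from the data, and using -1 as the first edge lets a duration of 0 fall inside (-1, 10]. The edges must be strictly increasing, so this assumes the maximum duration is greater than 60.

```python
import pandas as pd

df = pd.DataFrame({"duration": [0, 10, 15, 45, 90, 300]})  # hypothetical

# Upper edge is the column maximum; pd.cut raises ValueError if it is <= 60
max_duration = df["duration"].max()
df["duration_cat"] = pd.cut(
    df["duration"],
    bins=[-1, 10, 20, 30, 60, max_duration],
    labels=["Very Short", "Short", "Medium", "Long", "Very Long"],
)
print(df)
```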
Create a new categorical column campaign_intensity in df by binning the campaign column into the following intervals and labels: -1 to 5 (Low), 5 to 10 (Medium), 10 to 20 (High), and 20 to the maximum campaign value (Very High).
Note: The new column is added at the end of df.
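This follows the same pattern as the duration example. It is a pd.cut sketch on hypothetical campaign counts and assumes the maximum exceeds 20, so that the edges stay strictly increasing.

```python
import pandas as pd

df = pd.DataFrame({"campaign": [1, 5, 6, 12, 25, 40]})  # hypothetical

df["campaign_intensity"] = pd.cut(
    df["campaign"],
    bins=[-1, 5, 10, 20, df["campaign"].max()],  # right-closed: 5 -> Low
    labels=["Low", "Medium", "High", "Very High"],
)
print(df)
```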
Create a new categorical column recency_cat in df by binning the pdays column into intervals representing the recency of the previous contact, in days: -1 to 7 (Very Recent), 7 to 30 (Recent), 30 to 60 (Moderate), and 60 to the maximum pdays value (Old).
Note: The new column is added at the end of df.
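A sketch with pd.cut on hypothetical pdays values. It shows one edge case: because pd.cut's intervals are right-closed by default, a pdays value of exactly -1 is outside (-1, 7] and becomes NaN. Passing include_lowest=True would place it in Very Recent instead. Which behaviour the activity expects is an assumption left open here.

```python
import pandas as pd

df = pd.DataFrame({"pdays": [-1, 3, 7, 20, 45, 100]})  # hypothetical

df["recency_cat"] = pd.cut(
    df["pdays"],
    bins=[-1, 7, 30, 60, df["pdays"].max()],
    labels=["Very Recent", "Recent", "Moderate", "Old"],
    # default right=True: -1 itself is excluded and becomes NaN;
    # add include_lowest=True to count it as Very Recent
)
print(df)
```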
Create a new categorical column prev_contact_level in df by binning the previous column into the following intervals and labels: -1 to 0 (None), 0 to 2 (Low), 2 to 5 (Medium), and 5 to the maximum previous value (High).
Note: The new column is added at the end of df.
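Here the first bin, (-1, 0], holds only zero contacts. In the sketch below, "None" is a string label, not Python's None, so those rows are not treated as missing. The previous counts are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"previous": [0, 1, 2, 3, 7]})  # hypothetical

df["prev_contact_level"] = pd.cut(
    df["previous"],
    bins=[-1, 0, 2, 5, df["previous"].max()],
    labels=["None", "Low", "Medium", "High"],  # "None" is a string label
)
print(df)
```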
Create a new column age_bins in df by binning the age column into 5 equal-width bins without assigning labels to the bins.
Note: The new column is added at the end of df.
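When pd.cut receives an integer for bins, it splits the range from min to max into that many equal-width intervals. This sketch assumes "without labels" means keeping pandas' default Interval categories. labels=False would return integer bin codes instead. The ages are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"age": [20, 30, 40, 50, 60, 70]})  # hypothetical

# 5 equal-width intervals spanning min..max; categories are Interval objects
df["age_bins"] = pd.cut(df["age"], bins=5)
print(df["age_bins"].cat.categories)
```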
Create a new column pdays_bins in df by binning the pdays column into 5 equal-width bins without assigning labels to the bins.
Note: The new column is added at the end of df.
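Equal-width bins follow the raw range, so a sentinel value such as -1 ends up in the same lowest interval as genuinely small values. The sketch below shows this with hypothetical pdays data. It inspects the integer codes behind the Interval categories through .cat.codes.

```python
import pandas as pd

df = pd.DataFrame({"pdays": [-1, -1, 5, 100, 200, 400]})  # hypothetical

df["pdays_bins"] = pd.cut(df["pdays"], bins=5)
# Codes 0..4 index the five equal-width intervals
print(df["pdays_bins"].cat.codes.tolist())
```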
Create a new column previous_bins in df by binning the previous column into 4 equal-width bins without assigning labels to the bins.
Note: The new column is added at the end of df.
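The same pd.cut call with bins=4 is sketched below on hypothetical, right-skewed previous counts. Most rows fall into the first interval, and the third interval stays empty. That is typical of equal-width binning on skewed data.

```python
import pandas as pd

df = pd.DataFrame({"previous": [0, 0, 1, 3, 8]})  # hypothetical, skewed

df["previous_bins"] = pd.cut(df["previous"], bins=4)
print(df["previous_bins"].value_counts(sort=False))
```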
Create a new column age_custom_bins in df by binning the age column into the custom bins 0 to 20, 20 to 30, 30 to 40, 40 to 60, and 60 to 100, without assigning labels.
Note: The new column is added at the end of df.
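Passing a list of edges without labels gives Interval categories such as (0, 20]. The sketch below uses hypothetical ages and relies on the default right-closed intervals.

```python
import pandas as pd

df = pd.DataFrame({"age": [18, 20, 25, 35, 50, 80]})  # hypothetical

df["age_custom_bins"] = pd.cut(df["age"], bins=[0, 20, 30, 40, 60, 100])
print(df)
```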
Create a new column balance_labeled_bins in df by binning the balance column into 5 equal-width bins labeled, in order, Very Low, Low, Medium, High, and Very High.
Note: The new column is added at the end of df.
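The sketch below combines an integer bins with a labels list. The list must have exactly as many entries as there are bins. The balances are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"balance": [-1000, -500, 500, 1500, 2500, 4000]})  # hypothetical

df["balance_labeled_bins"] = pd.cut(
    df["balance"],
    bins=5,  # equal-width over min..max
    labels=["Very Low", "Low", "Medium", "High", "Very High"],
)
print(df)
```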
Create a new column balance_bins in df by binning the balance column into 4 equal-sized quantile bins without assigning labels, dropping any duplicate bin edges.
Note: The new column is added at the end of df.
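When many rows share the same value, several quantiles can coincide. pd.qcut then raises a ValueError unless duplicates="drop" is passed, and with that option it merges the repeated edges into fewer bins. The hypothetical data below is mostly zeros, so the four requested bins collapse to two.

```python
import pandas as pd

df = pd.DataFrame({"balance": [0, 0, 0, 0, 0, 0, 100, 500]})  # hypothetical

# The 0%, 25% and 50% quantiles are all 0; duplicates="drop" removes the
# repeated edges instead of raising ValueError
df["balance_bins"] = pd.qcut(df["balance"], q=4, duplicates="drop")
print(df["balance_bins"].cat.categories)
```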
Create a new column duration_bins in df by binning the duration column into 10 equal-width bins without assigning labels to the bins.
Note: The new column is added at the end of df.
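A sketch on hypothetical durations. It also passes retbins=True, which makes pd.cut return the computed edges together with the binned values so you can inspect where the 10 intervals start and end.

```python
import pandas as pd

df = pd.DataFrame({"duration": list(range(0, 1000, 50))})  # hypothetical

# retbins=True returns (binned values, array of 11 edges)
df["duration_bins"], edges = pd.cut(df["duration"], bins=10, retbins=True)
print(edges)
```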
Create a new column campaign_bins in df by binning the campaign column into 3 equal-sized quantile bins without assigning labels to the bins.
Note: The new column is added at the end of df.
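A pd.qcut sketch on hypothetical campaign counts. With tied values, the "equal-sized" bins can hold different numbers of rows, because rows with the same value always go to the same bin. Here the counts come out as 4, 2 and 3. If two quantiles coincide, the call fails unless duplicates="drop" is added.

```python
import pandas as pd

df = pd.DataFrame({"campaign": [1, 1, 2, 2, 3, 4, 6, 10, 15]})  # hypothetical

df["campaign_bins"] = pd.qcut(df["campaign"], q=3)
print(df["campaign_bins"].value_counts(sort=False))
```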
Create a new set of dummy variables (one-hot encoded) for the job column in df and store them in a new DataFrame job_dummies.
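With pandas, pd.get_dummies on a single Series creates one column per distinct value, in sorted order. It adds no prefix unless you ask for one. The job values below are hypothetical. Recent pandas versions return bool columns, and older ones return uint8.

```python
import pandas as pd

df = pd.DataFrame({"job": ["admin.", "technician", "admin.", "retired"]})  # hypothetical

job_dummies = pd.get_dummies(df["job"])  # one column per job value
print(job_dummies)
```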
Create a new set of dummy variables (one-hot encoded) for the marital column in df, dropping the first dummy variable to avoid the dummy variable trap. Store the result in a new DataFrame marital_dummies.
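The dummy variable trap means the full set of dummies is perfectly collinear, since exactly one column is 1 in every row. drop_first=True removes the first category in sorted order, and that category becomes the implicit baseline. The sketch uses hypothetical marital values.

```python
import pandas as pd

df = pd.DataFrame({"marital": ["married", "single", "divorced", "married"]})  # hypothetical

# "divorced" sorts first, so it is dropped and acts as the baseline
marital_dummies = pd.get_dummies(df["marital"], drop_first=True)
print(marital_dummies)
```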
Create a new set of dummy variables (one-hot encoded) for the education column in df, handling missing values by creating a separate dummy variable for them. Store the result in a new DataFrame education_dummies, and rename the column that represents missing values to education_nan for readability.
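With dummy_na=True, pd.get_dummies adds a final column whose label is NaN. Matching a NaN label in a rename dictionary can be unreliable. The sketch below therefore rebuilds the column names and uses pd.isna to find the missing-value column. The education values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"education": ["primary", "secondary", None, "tertiary"]})  # hypothetical

education_dummies = pd.get_dummies(df["education"], dummy_na=True)

# Replace the NaN column label with a readable name; keep the others as-is
education_dummies.columns = [
    "education_nan" if pd.isna(col) else col for col in education_dummies.columns
]
print(education_dummies)
```

Passing prefix="education" would also produce an education_nan column, but it would prefix every other column as well.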
Create a new set of dummy variables (one-hot encoded) for the categorical columns job, marital, and education in df. Prefix the dummy column names with job, marital, and education, respectively, to distinguish them from other dummy variables. Store the result in categorical_dummies.
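When pd.get_dummies receives a DataFrame, it encodes each column and accepts a list of prefixes, one per encoded column. Prefixes are joined to the category value with "_". The sketch selects only the three categorical columns, so the numeric age column (hypothetical, like the rest of the data) stays out of the result.

```python
import pandas as pd

df = pd.DataFrame({  # hypothetical sample data
    "age": [30, 45],
    "job": ["admin.", "technician"],
    "marital": ["married", "single"],
    "education": ["primary", "secondary"],
})

categorical_dummies = pd.get_dummies(
    df[["job", "marital", "education"]],
    prefix=["job", "marital", "education"],
)
print(categorical_dummies.columns.tolist())
```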
Create a new set of dummy variables (one-hot encoded) for the job column in df, prefixed with job, and store the result in job_dummies. Then concatenate df with the new job dummy variables along the column axis to form a new DataFrame df_with_dummies that contains all the original columns plus the job dummy columns.
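pd.concat with axis=1 aligns the two frames on their index and places the dummy columns to the right of the original ones. The sketch assumes df and job_dummies share the same index, which holds when the dummies are built directly from df. The data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"age": [30, 45, 61], "job": ["admin.", "technician", "admin."]})  # hypothetical

job_dummies = pd.get_dummies(df["job"], prefix="job")
# Column-wise concatenation; rows are matched by index
df_with_dummies = pd.concat([df, job_dummies], axis=1)
print(df_with_dummies)
```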