All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Identify the continuous variables (numerical data types: int64 or float64) in the df and store their column names in continuous_vars.
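One way to approach this is with `select_dtypes`. The sketch below uses a small toy DataFrame as a stand-in for the real df; the column names and values are assumptions made for illustration only.

```python
import pandas as pd

# Toy stand-in for the real df (hypothetical values).
df = pd.DataFrame({
    "age": [25, 40, 61],                          # int64
    "balance": [100.5, -20.0, 3500.0],            # float64
    "job": ["admin.", "technician", "retired"],   # non-numeric
})

# Keep only int64/float64 columns and collect their names in a list.
continuous_vars = df.select_dtypes(include=["int64", "float64"]).columns.tolist()
print(continuous_vars)
```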
Create a new categorical column age_group in the df by discretizing the age column into the following bins and labels: bin 0, 18 with label Child, bin 18, 35 with label Young, bin 35, 65 with label Adult, and bin 65, 100 with label Senior.
Note: The new column is added at the end of df.
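A minimal sketch of this kind of discretization with `pd.cut`, on toy ages. The activity does not say which bin edge is closed; the sketch assumes pandas' default (`right=True`), so an age of exactly 18 lands in Child.

```python
import pandas as pd

# Toy stand-in for the real df (hypothetical values).
df = pd.DataFrame({"age": [10, 18, 30, 50, 70]})

bins = [0, 18, 35, 65, 100]
labels = ["Child", "Young", "Adult", "Senior"]

# With the default right=True the intervals are (0, 18], (18, 35], (35, 65], (65, 100].
df["age_group"] = pd.cut(df["age"], bins=bins, labels=labels)
print(df)
```

Passing `right=False` would instead produce left-closed intervals such as [18, 35), which moves the boundary ages into the next group.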
Create a new categorical column balance_range in the DataFrame df by binning the balance column into four equal-sized quantile ranges, with labels Low, Medium, High, and Very High.
Note: The new column is added at the end of df.
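Quantile binning can be sketched with `pd.qcut`, which picks edges so that each bin holds roughly the same number of rows. The balances below are toy values, not taken from the real data.

```python
import pandas as pd

# Toy stand-in for the real df (hypothetical values).
df = pd.DataFrame({"balance": [-50, 0, 120, 300, 800, 1500, 4000, 9000]})

labels = ["Low", "Medium", "High", "Very High"]

# q=4 splits the column at its quartiles, so each label covers about 25% of rows.
df["balance_range"] = pd.qcut(df["balance"], q=4, labels=labels)
print(df["balance_range"].value_counts())
```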
Create a new categorical column duration_cat in the df by binning the duration column into the following intervals and labels:
-1, 10, Label: Very Short
10, 20, Label: Short
20, 30, Label: Medium
30, 60, Label: Long
60, maximum duration value, Label: Very Long
Note: The new column is added at the end of df.
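When the last edge is "the maximum value", it can be read from the column itself. The sketch below assumes toy durations and pandas' default right-closed intervals; it also assumes the maximum is greater than 60, since `pd.cut` requires strictly increasing edges.

```python
import pandas as pd

# Toy stand-in for the real df (hypothetical values).
df = pd.DataFrame({"duration": [0, 5, 10, 15, 25, 45, 60, 300]})

# The final edge comes from the data, so the top bin always reaches the maximum.
bins = [-1, 10, 20, 30, 60, df["duration"].max()]
labels = ["Very Short", "Short", "Medium", "Long", "Very Long"]

# Starting at -1 means a duration of 0 still falls inside the first interval (-1, 10].
df["duration_cat"] = pd.cut(df["duration"], bins=bins, labels=labels)
print(df)
```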
Create a new categorical column campaign_intensity in the df by binning the campaign column into the following intervals and labels:
-1, 5, Label: Low
5, 10, Label: Medium
10, 20, Label: High
20, maximum campaign value, Label: Very High
Note: The new column is added at the end of df.
Create a new categorical column recency_cat in the df by binning the pdays column into intervals representing the recency of previous contact days, with the following labels: Very Recent for the interval -1, 7, Recent for 7, 30, Moderate for 30, 60, and Old for 60, maximum pdays value.
Note: The new column is added at the end of df.
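One detail worth checking here is what happens to values that sit exactly on the lowest edge. With the default right-closed intervals, the first bin is (-1, 7], so a pdays value of exactly -1 is left unbinned. The activity does not state whether -1 should count as Very Recent; the sketch below, on toy values, shows both behaviours so the difference is visible.

```python
import pandas as pd

# Toy stand-in for the real df (hypothetical values).
df = pd.DataFrame({"pdays": [-1, 3, 7, 20, 45, 200]})

bins = [-1, 7, 30, 60, df["pdays"].max()]
labels = ["Very Recent", "Recent", "Moderate", "Old"]

# Default: the first interval is (-1, 7], so a value of exactly -1 becomes NaN.
default_cut = pd.cut(df["pdays"], bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left, so -1 is kept.
df["recency_cat"] = pd.cut(df["pdays"], bins=bins, labels=labels, include_lowest=True)
print(default_cut.isna().sum(), df["recency_cat"].isna().sum())
```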
Create a new categorical column prev_contact_level in the df by binning the previous column into the following intervals and labels: bin -1, 0 with label None, bin 0, 2 with label Low, bin 2, 5 with label Medium, and bin 5, maximum previous value with label High.
Note: The new column is added at the end of df.
Create a new column age_bins in the df by binning the age column into 5 equal-width bins without assigning labels to the bins.
Note: The new column is added at the end of df.
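Passing an integer to `bins` asks `pd.cut` for that many equal-width intervals spanning the column's range. A minimal sketch on toy ages, assuming "without assigning labels" means keeping the default interval categories:

```python
import pandas as pd

# Toy stand-in for the real df (hypothetical values).
df = pd.DataFrame({"age": [20, 30, 40, 50, 60, 70]})

# bins=5 divides the range [min, max] into 5 equal-width intervals.
# With no labels given, each row is tagged with its pd.Interval.
df["age_bins"] = pd.cut(df["age"], bins=5)
print(df["age_bins"].cat.categories)
```

pandas widens the lowest edge slightly so that the minimum value is included in the first interval.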
Create a new column pdays_bins in the df by binning the pdays column into 5 equal-width bins without assigning labels to the bins.
Note: The new column is added at the end of df.
Create a new column previous_bins in the df by binning the previous column into 4 equal-width bins without assigning labels to the bins.
Note: The new column is added at the end of df.
Create a new column age_custom_bins in the DataFrame df by binning the age column into the following custom-defined age bins without assigning labels: bin 0, 20, bin 20, 30, bin 30, 40, bin 40, 60, and bin 60, 100.
Note: The new column is added at the end of df.
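"Without assigning labels" can be read two ways in pandas: keep the default interval categories, or pass `labels=False` to get integer bin codes. The activity does not say which is intended, so the sketch below shows both on toy ages; the variable names `age_edges` and `age_codes` are made up for this example.

```python
import pandas as pd

# Toy stand-in for the real df (hypothetical values).
df = pd.DataFrame({"age": [18, 20, 25, 35, 59, 61]})

age_edges = [0, 20, 30, 40, 60, 100]

# Default: each row is tagged with its interval, e.g. (0, 20].
df["age_custom_bins"] = pd.cut(df["age"], bins=age_edges)

# Alternative reading: labels=False returns the integer position of each bin.
age_codes = pd.cut(df["age"], bins=age_edges, labels=False)
print(df["age_custom_bins"].tolist())
print(age_codes.tolist())
```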
Create a new column balance_labeled_bins in the df by binning the balance column into 5 equal-width bins and assigning the following custom labels to the bins: Very Low for the first bin, Low for the second bin, Medium for the third bin, High for the fourth bin, and Very High for the fifth bin.
Note: The new column is added at the end of df.
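Equal-width bins and custom labels can be combined by passing both an integer `bins` and a `labels` list of the same length. A sketch on toy balances:

```python
import pandas as pd

# Toy stand-in for the real df (hypothetical values).
df = pd.DataFrame({"balance": [0, 100, 250, 500, 750, 1000]})

labels = ["Very Low", "Low", "Medium", "High", "Very High"]

# Five equal-width intervals across the balance range, named in ascending order.
df["balance_labeled_bins"] = pd.cut(df["balance"], bins=5, labels=labels)
print(df)
```

Unlike quantile bins, equal-width bins can end up with very uneven row counts when the column is skewed.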
Create a new column balance_bins in the df by binning the balance column into 4 equal-sized quantile bins without assigning labels to the bins, dropping any duplicate bin edges.
Note: The new column is added at the end of df.
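This sketch interprets "drop any duplicate values" as `duplicates="drop"` in `pd.qcut`, which is an assumption about the intended solution. When many rows share the same value, several quantile edges coincide and `pd.qcut` raises a ValueError by default; dropping the repeated edges lets it proceed with fewer bins. The balances below are toy values chosen to trigger that case.

```python
import pandas as pd

# Toy stand-in for the real df: most balances are 0, so several quartiles coincide.
df = pd.DataFrame({"balance": [0, 0, 0, 0, 0, 10, 20, 30]})

# duplicates="drop" removes repeated edges instead of raising an error,
# so the result may contain fewer than the 4 requested bins.
df["balance_bins"] = pd.qcut(df["balance"], q=4, duplicates="drop")
print(df["balance_bins"].cat.categories)
```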
Create a new column duration_bins in the df by binning the duration column into 10 equal-width bins without assigning labels to the bins.
Note: The new column is added at the end of df.
Create a new column campaign_bins in the df by binning the campaign column into 3 equal-sized quantile bins without assigning labels to the bins.
Note: The new column is added at the end of df.
Create a new set of dummy variables (one-hot encoded) for the job column in the df and store them in a new DataFrame job_dummies.
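A minimal sketch of one-hot encoding with `pd.get_dummies`, using a toy job column (the category values are assumptions for illustration):

```python
import pandas as pd

# Toy stand-in for the real df (hypothetical values).
df = pd.DataFrame({"job": ["admin.", "technician", "admin.", "retired"]})

# One indicator column per distinct job; each row has exactly one indicator set.
job_dummies = pd.get_dummies(df["job"])
print(job_dummies)
```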
Create a new set of dummy variables (one-hot encoded) for the marital column in the df, while dropping the first dummy variable to avoid the dummy variable trap. Store the resulting dummy variables in a new DataFrame marital_dummies.
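With `drop_first=True`, the first category (in sorted order) gets no column of its own and is represented by all remaining indicators being zero. A sketch on toy marital values:

```python
import pandas as pd

# Toy stand-in for the real df (hypothetical values).
df = pd.DataFrame({"marital": ["married", "single", "divorced", "married"]})

# Categories sort as divorced, married, single; "divorced" is dropped
# and becomes the implicit baseline.
marital_dummies = pd.get_dummies(df["marital"], drop_first=True)
print(marital_dummies)
```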
Create a new set of dummy variables (one-hot encoded) for the education column in the df, while handling missing values by creating a separate dummy variable for them. Store the resulting dummy variables in a new DataFrame education_dummies. Rename the dummy variable column representing missing values to education_nan for better readability.
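A sketch of the missing-value case using `dummy_na=True`, on a toy education column. When no prefix is given, the extra column's label is NaN itself, so the rename below matches it with `pd.isna` rather than by name; this is one possible approach, not necessarily the activity's own solution.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real df, with one missing education value.
df = pd.DataFrame({"education": ["primary", np.nan, "tertiary", "primary"]})

# dummy_na=True adds an indicator column for missing values.
education_dummies = pd.get_dummies(df["education"], dummy_na=True)

# The new column's label is NaN, so detect it with pd.isna and give it a readable name.
education_dummies.columns = [
    "education_nan" if pd.isna(col) else col for col in education_dummies.columns
]
print(education_dummies)
```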
Create a new set of dummy variables (one-hot encoded) for the categorical columns job, marital, and education in the df. Prefix the dummy variable column names with job, marital, and education respectively to distinguish them from other dummy variables. Store the resulting dummy variables in categorical_dummies.
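Several columns can be encoded in one call by passing a DataFrame and a matching list of prefixes. The sketch below uses toy rows; the helper name `cols` is made up for this example.

```python
import pandas as pd

# Toy stand-in for the real df (hypothetical values).
df = pd.DataFrame({
    "job": ["admin.", "retired"],
    "marital": ["single", "married"],
    "education": ["primary", "tertiary"],
    "age": [30, 62],
})

cols = ["job", "marital", "education"]

# Each output column is named <prefix>_<category>, e.g. job_retired.
categorical_dummies = pd.get_dummies(df[cols], prefix=cols)
print(categorical_dummies.columns.tolist())
```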
Create a new set of dummy variables (one-hot encoded) for the job column in the df, prefixed with job. Store the result in job_dummies. Then, concatenate the original df with the newly created job dummy variables along the column axis to form a new DataFrame df_with_dummies containing all the original columns and the new job dummy variable columns.
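The two steps can be sketched as follows on a toy DataFrame. `pd.concat` with `axis=1` aligns rows on the index, so this assumes df and job_dummies share the same index, which holds when the dummies are built directly from df.

```python
import pandas as pd

# Toy stand-in for the real df (hypothetical values).
df = pd.DataFrame({"age": [30, 62, 45], "job": ["admin.", "retired", "admin."]})

# Step 1: prefixed indicator columns for job.
job_dummies = pd.get_dummies(df["job"], prefix="job")

# Step 2: place the indicators alongside the original columns.
df_with_dummies = pd.concat([df, job_dummies], axis=1)
print(df_with_dummies.columns.tolist())
```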