Practice Data Wrangling and Feature Engineering for Bank Marketing Data Using Pandas

Identify Continuous Variable Columns

Identify the continuous variables (numerical data types: int64 or float64) in the df and store their column names in continuous_vars.
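One possible approach is `DataFrame.select_dtypes`. The sketch below uses a small made-up `df` as a stand-in for the bank marketing data; the sample values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample standing in for the real bank marketing df
df = pd.DataFrame({
    "age": [25, 40, 67],
    "balance": [100.5, -20.0, 3000.0],
    "job": ["admin.", "technician", "retired"],
})

# Keep only the columns whose dtype is int64 or float64
continuous_vars = df.select_dtypes(include=["int64", "float64"]).columns.tolist()
print(continuous_vars)
```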

Discretize Age Column into Age Groups

Create a new categorical column age_group in the df by discretizing the age column into the following bins and labels: bin 0, 18 with label Child, bin 18, 35 with label Young, bin 35, 65 with label Adult, and bin 65, 100 with label Senior.
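A minimal sketch with `pd.cut`, assuming the pandas default of right-closed intervals (so an age of exactly 18 lands in Child); the task text does not say which edge is closed. The sample ages are made up.

```python
import pandas as pd

# Hypothetical sample ages
df = pd.DataFrame({"age": [10, 18, 30, 50, 70]})

age_edges = [0, 18, 35, 65, 100]
age_labels = ["Child", "Young", "Adult", "Senior"]

# Default right=True gives intervals (0, 18], (18, 35], (35, 65], (65, 100]
df["age_group"] = pd.cut(df["age"], bins=age_edges, labels=age_labels)
print(df)
```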

Create Balance Ranges by Quantile Binning

Create a new categorical column balance_range in the DataFrame df by binning the balance column into four equal-sized quantile ranges, with labels Low, Medium, High, and Very High.
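Quantile binning can be sketched with `pd.qcut`, which picks the edges so each bin holds roughly the same number of rows. The balances below are made-up values chosen so that the quartile edges are distinct.

```python
import pandas as pd

# Hypothetical sample balances (eight rows, so each quartile gets two)
df = pd.DataFrame({"balance": [-100, 0, 50, 200, 500, 1000, 2000, 5000]})

range_labels = ["Low", "Medium", "High", "Very High"]

# q=4 splits the data at the 25th, 50th and 75th percentiles
df["balance_range"] = pd.qcut(df["balance"], q=4, labels=range_labels)
print(df["balance_range"].value_counts())
```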

Create Duration Categories by Binning Duration Column

Create a new categorical column duration_cat in the df by binning the duration column into the following intervals and labels:

Interval: -1, 10, Label: Very Short
Interval: 10, 20, Label: Short
Interval: 20, 30, Label: Medium
Interval: 30, 60, Label: Long
Interval: 60, maximum duration value, Label: Very Long
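The intervals above could be passed to `pd.cut` as a list of edges, with the last edge taken from the data. This sketch assumes the maximum duration is greater than 60 (otherwise the edges would not be increasing and `pd.cut` would raise an error) and uses made-up durations.

```python
import pandas as pd

# Hypothetical sample durations; the max must exceed 60 for the edges to be valid
df = pd.DataFrame({"duration": [5, 10, 15, 25, 45, 300]})

duration_edges = [-1, 10, 20, 30, 60, df["duration"].max()]
duration_labels = ["Very Short", "Short", "Medium", "Long", "Very Long"]

# Right-closed intervals: (-1, 10], (10, 20], (20, 30], (30, 60], (60, max]
df["duration_cat"] = pd.cut(df["duration"], bins=duration_edges, labels=duration_labels)
print(df)
```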

Create Campaign Intensity Categories by Binning Campaign Column

Create a new categorical column campaign_intensity in the df by binning the campaign column into the following intervals and labels:

Interval: -1, 5, Label: Low
Interval: 5, 10, Label: Medium
Interval: 10, 20, Label: High
Interval: 20, maximum campaign value, Label: Very High
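The same pattern applies here. The sketch assumes the maximum campaign value is above 20 and uses made-up contact counts.

```python
import pandas as pd

# Hypothetical sample of contact counts in the current campaign
df = pd.DataFrame({"campaign": [1, 5, 8, 15, 40]})

campaign_edges = [-1, 5, 10, 20, df["campaign"].max()]
campaign_labels = ["Low", "Medium", "High", "Very High"]

df["campaign_intensity"] = pd.cut(
    df["campaign"], bins=campaign_edges, labels=campaign_labels
)
print(df)
```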

Create Recency Categories by Binning Previous Contact Days

Create a new categorical column recency_cat in the df by binning the pdays column into intervals representing the recency of previous contact days, with the following labels: Very Recent for the interval -1, 7, Recent for 7, 30, Moderate for 30, 60, and Old for 60, maximum pdays value.
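A sketch under the pandas default of right-closed intervals. One consequence worth noting: with a lower edge of -1, a `pdays` value of exactly -1 falls outside `(-1, 7]` and comes back as NaN unless `include_lowest=True` is passed. The sample values are made up, and the sketch assumes the maximum exceeds 60.

```python
import pandas as pd

# Hypothetical sample; -1 is included to show how the lower edge behaves
df = pd.DataFrame({"pdays": [-1, 3, 20, 45, 200]})

recency_edges = [-1, 7, 30, 60, df["pdays"].max()]
recency_labels = ["Very Recent", "Recent", "Moderate", "Old"]

# The row with pdays == -1 is not inside (-1, 7], so it becomes NaN
df["recency_cat"] = pd.cut(df["pdays"], bins=recency_edges, labels=recency_labels)
print(df)
```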

Create Previous Contact Level Categories by Binning Previous Column

Create a new categorical column prev_contact_level in the df by binning the previous column into the following intervals and labels: bin -1, 0 with label None, bin 0, 2 with label Low, bin 2, 5 with label Medium, and bin 5, maximum previous value with label High.
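A possible sketch, treating None as an ordinary string label and assuming the maximum of `previous` is above 5. The sample counts are made up.

```python
import pandas as pd

# Hypothetical sample of previous contact counts
df = pd.DataFrame({"previous": [0, 1, 2, 4, 9]})

previous_edges = [-1, 0, 2, 5, df["previous"].max()]
previous_labels = ["None", "Low", "Medium", "High"]  # "None" is a string label here

# (-1, 0] captures exactly the rows with no previous contact
df["prev_contact_level"] = pd.cut(
    df["previous"], bins=previous_edges, labels=previous_labels
)
print(df)
```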

Create Equal-Width Age Bins

Create a new column age_bins in the df by binning the age column into 5 equal-width bins without assigning labels to the bins.
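Passing an integer to `pd.cut` asks for that many equal-width bins. This sketch reads "without assigning labels" as leaving `labels` at its default, so each row gets an interval such as `(19.95, 30.0]`; passing `labels=False` instead would return integer bin codes. The ages are made up.

```python
import pandas as pd

# Hypothetical sample ages
df = pd.DataFrame({"age": [20, 30, 40, 50, 60, 70]})

# bins=5 splits the range from min(age) to max(age) into 5 equal-width intervals
df["age_bins"] = pd.cut(df["age"], bins=5)
print(df)
```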

Create Equal-Width Bins for Previous Contact Days

Create a new column pdays_bins in the df by binning the pdays column into 5 equal-width bins without assigning labels to the bins.
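As a sketch, the same call works for `pdays`. Here `retbins=True` is added (an optional extra, not required by the task) to return the computed edges for inspection. The sample values are made up.

```python
import pandas as pd

# Hypothetical sample of days since the previous contact
df = pd.DataFrame({"pdays": [-1, -1, 10, 90, 180, 400]})

# retbins=True also returns the array of bin edges (6 edges for 5 bins)
df["pdays_bins"], pdays_edges = pd.cut(df["pdays"], bins=5, retbins=True)
print(pdays_edges)
print(df)
```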

Create Equal-Width Bins for Previous Column

Create a new column previous_bins in the df by binning the previous column into 4 equal-width bins without assigning labels to the bins.
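A short sketch for `previous`, again with made-up values. Because equal-width bins ignore how the data is distributed, a skewed column like this one can leave most rows in the first bin, which the per-bin counts make visible.

```python
import pandas as pd

# Hypothetical, deliberately skewed sample
df = pd.DataFrame({"previous": [0, 0, 0, 1, 2, 8]})

df["previous_bins"] = pd.cut(df["previous"], bins=4)

# Count rows per bin, ordered by interval
print(df["previous_bins"].value_counts().sort_index())
```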

Create Custom Age Bins Without Labels

Create a new column age_custom_bins in the DataFrame df by binning the age column into the following custom-defined age bins without assigning labels: bin 0, 20, bin 20, 30, bin 30, 40, bin 40, 60, and bin 60, 100.
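With explicit edges and no `labels` argument, `pd.cut` labels each row with its interval. A minimal sketch on made-up ages, assuming the default right-closed intervals:

```python
import pandas as pd

# Hypothetical sample ages
df = pd.DataFrame({"age": [18, 25, 35, 50, 70]})

custom_age_edges = [0, 20, 30, 40, 60, 100]

# No labels given, so values are Interval objects such as (0, 20]
df["age_custom_bins"] = pd.cut(df["age"], bins=custom_age_edges)
print(df)
```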

Create Labeled Bins for Balance Column

Create a new column balance_labeled_bins in the df by binning the balance column into 5 equal-width bins and assigning the following custom labels to the bins: Very Low for the first bin, Low for the second bin, Medium for the third bin, High for the fourth bin, and Very High for the fifth bin.
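Equal-width bins and custom labels can be combined in one `pd.cut` call, as long as the number of labels matches the number of bins. The balances below are made up.

```python
import pandas as pd

# Hypothetical sample balances spanning 0 to 5000
df = pd.DataFrame({"balance": [0, 500, 2500, 3500, 5000]})

balance_labels = ["Very Low", "Low", "Medium", "High", "Very High"]

# 5 equal-width bins over [min, max], labelled from lowest to highest
df["balance_labeled_bins"] = pd.cut(df["balance"], bins=5, labels=balance_labels)
print(df)
```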

Create Quantile Bins for Balance Column

Create a new column balance_bins in the df by binning the balance column into 4 equal-sized quantile bins without assigning labels to the bins, dropping any duplicate bin edges.
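This sketch assumes the task refers to the `duplicates="drop"` option of `pd.qcut`. When many rows share one value, several quantile edges coincide and `pd.qcut` raises a ValueError by default; dropping the repeated edges avoids that at the cost of fewer bins. The sample is made up to trigger this case.

```python
import pandas as pd

# Hypothetical sample with many zero balances, so the lower quantile edges repeat
df = pd.DataFrame({"balance": [0, 0, 0, 0, 0, 100, 500, 2000]})

# Quantile edges are 0, 0, 0, 200, 2000; repeated edges are dropped,
# leaving 2 bins instead of the 4 requested
df["balance_bins"] = pd.qcut(df["balance"], q=4, duplicates="drop")
print(df["balance_bins"].value_counts())
```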

Create Equal-Width Bins for Duration Column

Create a new column duration_bins in the df by binning the duration column into 10 equal-width bins without assigning labels to the bins.
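A brief sketch on made-up durations; only the column name and the bin count differ from the other equal-width examples.

```python
import pandas as pd

# Hypothetical sample durations from 0 to 1000 in steps of 100
df = pd.DataFrame({"duration": list(range(0, 1001, 100))})

# 10 equal-width intervals, each about 100 wide for this sample
df["duration_bins"] = pd.cut(df["duration"], bins=10)
print(df)
```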

Create Quantile Bins for Campaign Column

Create a new column campaign_bins in the df by binning the campaign column into 3 equal-sized quantile bins without assigning labels to the bins.
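A sketch with `pd.qcut` and `q=3`. The made-up sample is chosen so the tercile edges are distinct; on real data with many repeated values the call may need `duplicates="drop"`.

```python
import pandas as pd

# Hypothetical sample of contact counts
df = pd.DataFrame({"campaign": [1, 1, 2, 2, 3, 4, 6, 10, 15]})

# q=3 splits at the 1/3 and 2/3 quantiles; values are intervals, not labels
df["campaign_bins"] = pd.qcut(df["campaign"], q=3)
print(df["campaign_bins"].value_counts().sort_index())
```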

Create Dummy Variables for Job Column

Create a new set of dummy variables (one-hot encoded) for the job column in the df and store them in a new DataFrame job_dummies.
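One way to do this is `pd.get_dummies`, which returns one indicator column per distinct value. The job values below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample of job categories
df = pd.DataFrame({"job": ["admin.", "technician", "retired", "admin."]})

# One indicator column per job value, in sorted order
job_dummies = pd.get_dummies(df["job"])
print(job_dummies)
```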

Create Dummy Variables for Marital Column with One Dummy Dropped

Create a new set of dummy variables (one-hot encoded) for the marital column in the df, while dropping the first dummy variable to avoid the dummy variable trap. Store the resulting dummy variables in a new DataFrame marital_dummies.
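A minimal sketch using `drop_first=True`, which drops the first category in sorted order (divorced in this made-up sample). That category becomes the implicit baseline: a row with all zeros.

```python
import pandas as pd

# Hypothetical sample of marital statuses
df = pd.DataFrame({"marital": ["married", "single", "divorced", "married"]})

# k categories produce k-1 columns; "divorced" is the dropped baseline here
marital_dummies = pd.get_dummies(df["marital"], drop_first=True)
print(marital_dummies)
```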

Create Dummy Variables for Education Column with Missing Values Handled

Create a new set of dummy variables (one-hot encoded) for the education column in the df, while handling missing values by creating a separate dummy variable for them. Store the resulting dummy variables in a new DataFrame education_dummies. Rename the dummy variable column representing missing values to education_nan for better readability.
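A sketch using `dummy_na=True`, which adds an indicator column whose label is NaN. The rename below is one possible way to replace that label; it assumes no prefix is applied, and the sample values are made up.

```python
import pandas as pd

# Hypothetical sample with one missing education value
df = pd.DataFrame({"education": ["primary", None, "tertiary", "secondary"]})

# dummy_na=True adds an extra indicator column for missing values
education_dummies = pd.get_dummies(df["education"], dummy_na=True)

# The missing-value column is labelled NaN; give it a readable name
education_dummies.columns = [
    "education_nan" if pd.isna(col) else col for col in education_dummies.columns
]
print(education_dummies)
```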

Create Dummy Variables for Multiple Categorical Columns

Create a new set of dummy variables (one-hot encoded) for the categorical columns job, marital, and education in the df. Prefix the dummy variable column names with job, marital, and education respectively to distinguish them from other dummy variables. Store the resulting dummy variables in categorical_dummies.
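When `pd.get_dummies` receives several columns, `prefix` can take one prefix per column. A sketch on a made-up two-row sample:

```python
import pandas as pd

# Hypothetical sample with the three categorical columns
df = pd.DataFrame({
    "job": ["admin.", "technician"],
    "marital": ["married", "single"],
    "education": ["primary", "tertiary"],
})

cat_columns = ["job", "marital", "education"]

# Each output column is named <prefix>_<value>, e.g. job_admin.
categorical_dummies = pd.get_dummies(df[cat_columns], prefix=cat_columns)
print(categorical_dummies.columns.tolist())
```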

Concatenate Job Dummy Variables with Original DataFrame

Create a new set of dummy variables (one-hot encoded) for the job column in the df, prefixed with job. Store the result in job_dummies. Then, concatenate the original df with the newly created job dummy variables along the column axis to form a new DataFrame df_with_dummies containing all the original columns and the new job dummy variable columns.
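The two steps can be sketched with `pd.get_dummies` followed by `pd.concat` along `axis=1`, which aligns rows on the index. The sample rows are made up.

```python
import pandas as pd

# Hypothetical sample
df = pd.DataFrame({
    "age": [25, 40, 67],
    "job": ["admin.", "technician", "retired"],
})

# Step 1: prefixed dummy columns such as job_admin.
job_dummies = pd.get_dummies(df["job"], prefix="job")

# Step 2: append them as new columns; the original job column is kept
df_with_dummies = pd.concat([df, job_dummies], axis=1)
print(df_with_dummies)
```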

Dhrubaraj Roy
