Practice Data Wrangling and Feature Engineering for Bank Marketing Data Using Pandas

Wrangling bank marketing data just got real! Flex your Pandas skills by discretizing age with `pd.cut()`, binning balances with `pd.qcut()`, and creating dummy variables for job with `pd.get_dummies()`. Practice discretization, binning, and one-hot encoding to preprocess continuous and categorical columns, and build new features along the way: a practical way to sharpen your data munging skills.
Project Created by

Dhrubaraj Roy

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

Identify Continuous Variable Columns

Identify the continuous variables (columns with numerical data type int64 or float64) in `df` and store their column names in `continuous_vars`.
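One way to approach this is `DataFrame.select_dtypes`. The sketch below runs on a small made-up DataFrame (the values are illustrative and not taken from the project's dataset), so treat it as an outline rather than the project's reference solution.

```python
import pandas as pd

# Toy stand-in for the bank marketing data (illustrative values only).
df = pd.DataFrame({
    "age": [25, 41, 67],
    "job": ["admin.", "technician", "retired"],
    "balance": [120.5, -30.0, 2500.0],
})

# Keep only int64/float64 columns and collect their names as a list.
continuous_vars = df.select_dtypes(include=["int64", "float64"]).columns.tolist()
print(continuous_vars)
```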

Discretize Age Column into Age Groups

Create a new categorical column `age_group` in `df` by discretizing the `age` column into the following bins: 0 to 18 (Child), 18 to 35 (Young), 35 to 65 (Adult), and 65 to 100 (Senior).

Note: The new column is added at the end of `df`.
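A minimal sketch of label-based discretization with `pd.cut`, using made-up ages. It assumes the pandas default of right-closed intervals, so an age of exactly 18 lands in Child; the activity text does not say which side of each bin is closed.

```python
import pandas as pd

# Toy ages, not taken from the real dataset.
df = pd.DataFrame({"age": [17, 18, 30, 35, 50, 70]})

# With the default right=True the bins are (0, 18], (18, 35], (35, 65], (65, 100].
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 18, 35, 65, 100],
    labels=["Child", "Young", "Adult", "Senior"],
)
print(df)
```

Assigning with `df["age_group"] = ...` appends the column at the end of the DataFrame, which matches the note above.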

Create Balance Ranges by Quantile Binning

Create a new categorical column `balance_range` in `df` by binning the `balance` column into four quantile bins (each holding roughly the same number of rows), with labels Low, Medium, High, and Very High.

Note: The new column is added at the end of `df`.
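Quantile binning can be sketched with `pd.qcut`, which places bin edges at quantiles so each bin receives about the same number of rows. The balances below are invented for illustration.

```python
import pandas as pd

# Eight toy balances, so each quartile bin ends up with two rows.
df = pd.DataFrame({"balance": [-100, 0, 50, 200, 500, 1000, 3000, 8000]})

# q=4 splits at the 25th, 50th and 75th percentiles of the column.
df["balance_range"] = pd.qcut(
    df["balance"],
    q=4,
    labels=["Low", "Medium", "High", "Very High"],
)
print(df)
```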

Create Duration Categories by Binning Duration Column

Create a new categorical column `duration_cat` in `df` by binning the `duration` column into the following intervals:

  • -1 to 10: Very Short
  • 10 to 20: Short
  • 20 to 30: Medium
  • 30 to 60: Long
  • 60 to the maximum duration value: Very Long

Note: The new column is added at the end of `df`.
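The intervals above could be passed to `pd.cut` with the column maximum as the last edge. This sketch uses made-up durations and assumes the maximum is greater than 60; otherwise the bin edges would not be strictly increasing and `pd.cut` would reject them.

```python
import pandas as pd

# Toy call durations (illustrative values only).
df = pd.DataFrame({"duration": [5, 10, 15, 25, 45, 300]})

# The last edge is computed from the data so the top bin covers every row.
edges = [-1, 10, 20, 30, 60, df["duration"].max()]
labels = ["Very Short", "Short", "Medium", "Long", "Very Long"]

df["duration_cat"] = pd.cut(df["duration"], bins=edges, labels=labels)
print(df)
```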

Create Campaign Intensity Categories by Binning Campaign Column

Create a new categorical column `campaign_intensity` in `df` by binning the `campaign` column into the following intervals:

  • -1 to 5: Low
  • 5 to 10: Medium
  • 10 to 20: High
  • 20 to the maximum campaign value: Very High

Note: The new column is added at the end of `df`.

Create Recency Categories by Binning Previous Contact Days

Create a new categorical column `recency_cat` in `df` by binning the `pdays` column into intervals representing how recently the client was previously contacted: -1 to 7 (Very Recent), 7 to 30 (Recent), 30 to 60 (Moderate), and 60 to the maximum `pdays` value (Old).

Note: The new column is added at the end of `df`.
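One detail worth checking when the lowest edge is -1: with the `pd.cut` defaults the first interval is open on the left, so a value of exactly -1 falls outside every bin and becomes NaN. The sketch below, on made-up `pdays` values, shows the default next to `include_lowest=True`. Which behaviour the activity expects is not stated here, so both variants are shown as assumptions.

```python
import pandas as pd

# Toy pdays values, including the -1 edge case.
df = pd.DataFrame({"pdays": [-1, 3, 7, 20, 45, 200]})

edges = [-1, 7, 30, 60, df["pdays"].max()]
labels = ["Very Recent", "Recent", "Moderate", "Old"]

# Default: intervals are (-1, 7], (7, 30], ... so -1 itself is not binned.
df["recency_cat"] = pd.cut(df["pdays"], bins=edges, labels=labels)

# Variant: include_lowest=True makes the first interval include -1.
with_lowest = pd.cut(df["pdays"], bins=edges, labels=labels, include_lowest=True)

print(df)
print(with_lowest)
```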

Create Previous Contact Level Categories by Binning Previous Column

Create a new categorical column `prev_contact_level` in `df` by binning the `previous` column into the following intervals: -1 to 0 (None), 0 to 2 (Low), 2 to 5 (Medium), and 5 to the maximum `previous` value (High).

Note: The new column is added at the end of `df`.

Create Equal-Width Age Bins

Create a new column `age_bins` in `df` by binning the `age` column into 5 equal-width bins, without assigning labels.

Note: The new column is added at the end of `df`.
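Passing an integer to `bins` asks `pd.cut` to split the observed range into that many equal-width intervals. A small sketch on made-up ages follows; without `labels`, each row is tagged with the interval it falls in.

```python
import pandas as pd

# Toy ages spanning 20 to 70 (illustrative values only).
df = pd.DataFrame({"age": [20, 30, 40, 50, 60, 70]})

# bins=5 divides the range [min, max] into five intervals of equal width.
df["age_bins"] = pd.cut(df["age"], bins=5)

print(df)
print(df["age_bins"].cat.categories)
```

pandas widens the lowest edge slightly so that the minimum value is included in the first bin.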

Create Equal-Width Bins for Previous Contact Days

Create a new column `pdays_bins` in `df` by binning the `pdays` column into 5 equal-width bins, without assigning labels.

Note: The new column is added at the end of `df`.

Create Equal-Width Bins for Previous Column

Create a new column `previous_bins` in `df` by binning the `previous` column into 4 equal-width bins, without assigning labels.

Note: The new column is added at the end of `df`.

Create Custom Age Bins Without Labels

Create a new column `age_custom_bins` in `df` by binning the `age` column into the following custom-defined bins, without assigning labels: 0 to 20, 20 to 30, 30 to 40, 40 to 60, and 60 to 100.

Note: The new column is added at the end of `df`.

Create Labeled Bins for Balance Column

Create a new column `balance_labeled_bins` in `df` by binning the `balance` column into 5 equal-width bins with the following labels, from the first bin to the fifth: Very Low, Low, Medium, High, and Very High.

Note: The new column is added at the end of `df`.

Create Quantile Bins for Balance Column

Create a new column `balance_bins` in `df` by binning the `balance` column into 4 quantile bins, without assigning labels, and dropping any duplicate bin edges.

Note: The new column is added at the end of `df`.
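When many rows share the same value, several quantiles can coincide, and `pd.qcut` raises a `ValueError` about non-unique bin edges unless `duplicates="drop"` is passed. The sketch below assumes that this is what dropping duplicates refers to, and uses made-up balances chosen to trigger the situation.

```python
import pandas as pd

# Five zeros make the 25th and 50th percentiles collapse onto the minimum.
df = pd.DataFrame({"balance": [0, 0, 0, 0, 0, 10, 20, 30]})

# duplicates="drop" removes repeated edges, so fewer than 4 bins may result.
df["balance_bins"] = pd.qcut(df["balance"], q=4, duplicates="drop")

print(df)
print(df["balance_bins"].cat.categories)
```

With this toy data only two bins survive, which is a reminder that `q` is an upper bound on the number of bins once duplicate edges are dropped.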

Create Equal-Width Bins for Duration Column

Create a new column `duration_bins` in `df` by binning the `duration` column into 10 equal-width bins, without assigning labels.

Note: The new column is added at the end of `df`.

Create Quantile Bins for Campaign Column

Create a new column `campaign_bins` in `df` by binning the `campaign` column into 3 quantile bins, without assigning labels.

Note: The new column is added at the end of `df`.

Create Dummy Variables for Job Column

Create dummy variables (one-hot encoding) for the `job` column in `df` and store them in a new DataFrame `job_dummies`.
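A minimal sketch of one-hot encoding with `pd.get_dummies`, on a made-up `job` column. Each distinct category becomes its own indicator column, and every row has exactly one indicator set.

```python
import pandas as pd

# Toy job values (illustrative only).
df = pd.DataFrame({"job": ["admin.", "technician", "admin.", "retired"]})

# One indicator column per distinct job, in sorted category order.
job_dummies = pd.get_dummies(df["job"])
print(job_dummies)
```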

Create Dummy Variables for Marital Column with One Dummy Dropped

Create dummy variables (one-hot encoding) for the `marital` column in `df`, dropping the first dummy variable to avoid the dummy variable trap. Store the result in a new DataFrame `marital_dummies`.
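Dropping one level can be sketched with the `drop_first` parameter of `pd.get_dummies`. With k categories this leaves k - 1 columns, and the dropped category is implied when all remaining indicators are zero. The data below is made up.

```python
import pandas as pd

# Toy marital values (illustrative only).
df = pd.DataFrame({"marital": ["married", "single", "divorced", "married"]})

# drop_first=True removes the first category in sorted order ("divorced" here).
marital_dummies = pd.get_dummies(df["marital"], drop_first=True)
print(marital_dummies)
```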

Create Dummy Variables for Education Column with Missing Values Handled

Create dummy variables (one-hot encoding) for the `education` column in `df`, handling missing values by creating a separate dummy variable for them. Store the result in a new DataFrame `education_dummies`, and rename the column representing missing values to `education_nan` for readability.
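A sketch of one possible approach, assuming missing values are stored as NaN: `dummy_na=True` adds an indicator column whose name is NaN itself, which is then renamed. The rename here uses a list comprehension with `pd.isna`, one of several ways to do it and not necessarily the one the project's solution uses. The data is made up.

```python
import numpy as np
import pandas as pd

# Toy education values with one missing entry (illustrative only).
df = pd.DataFrame({"education": ["primary", "secondary", np.nan, "primary"]})

# dummy_na=True appends an extra indicator column for missing values.
education_dummies = pd.get_dummies(df["education"], dummy_na=True)

# The extra column is labelled NaN; give it a readable name instead.
education_dummies.columns = [
    "education_nan" if pd.isna(col) else col
    for col in education_dummies.columns
]
print(education_dummies)
```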

Create Dummy Variables for Multiple Categorical Columns

Create dummy variables (one-hot encoding) for the categorical columns `job`, `marital`, and `education` in `df`. Prefix the dummy column names with job, marital, and education respectively so the source column of each dummy is clear. Store the result in `categorical_dummies`.
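Several columns can be encoded in one call by passing a DataFrame slice and a list of prefixes to `pd.get_dummies`. The sketch below uses a two-row made-up frame to keep the output readable.

```python
import pandas as pd

# Toy categorical columns (illustrative only).
df = pd.DataFrame({
    "job": ["admin.", "retired"],
    "marital": ["married", "single"],
    "education": ["primary", "secondary"],
})

cols = ["job", "marital", "education"]

# One prefix per encoded column; names become "<prefix>_<category>".
categorical_dummies = pd.get_dummies(df[cols], prefix=cols)
print(categorical_dummies.columns.tolist())
```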

Concatenate Job Dummy Variables with Original DataFrame

Create dummy variables (one-hot encoding) for the `job` column in `df`, prefixed with job, and store the result in `job_dummies`. Then concatenate `df` with `job_dummies` along the column axis to form a new DataFrame `df_with_dummies` containing all the original columns plus the new job dummy columns.
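The two steps can be sketched with `pd.get_dummies` followed by `pd.concat` along `axis=1`. The DataFrame below is made up; because both frames share the same index, rows line up one to one.

```python
import pandas as pd

# Toy data (illustrative only).
df = pd.DataFrame({"age": [30, 65], "job": ["admin.", "retired"]})

# Step 1: prefixed indicator columns for job.
job_dummies = pd.get_dummies(df["job"], prefix="job")

# Step 2: append them to the right of the original columns.
df_with_dummies = pd.concat([df, job_dummies], axis=1)
print(df_with_dummies)
```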

Project Created by

Dhrubaraj Roy

Project Author at DataWars, responsible for leading the development and delivery of innovative machine learning and data science projects.

This project is part of

Data Wrangling with Pandas