All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Which column is the predictor/target variable in this dataset? The value that we'll try to predict:
Is it binary or multiclass classification?
Store the features in X
and the target y
.
Despite there being various ways to solve this exercise, the results must be dataframes in order to be considered correct.
Use the train_test_split
function to split the data into training and testing. Use a proportion of 80% for the training set, and 20% for the testing set, and random_state=0
.
Store the values in the variables in X_train
,X_test
,y_train
and y_test
.
Instantiate the model and store it in the variable dt
. Use random_state=0 in the argument of the model.
It's time to train the decision tree using the training dataset.
Use the trained model to make predictions on the test data. Store the prediction in y_pred
.