Examine the test and train datasets above and choose the correct statements.
Make sure you have run the code that previews the train dataset above, and have written the same code for the test dataset by replacing the name training_dataset with testing_dataset.
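As a rough sketch of that substitution, the snippet below previews both datasets, assuming the preview is done with pandas `head()`. The tiny DataFrames and their column names are hypothetical stand-ins for the project's real data.

```python
import pandas as pd

# Hypothetical toy data; the real project loads these from its own files.
training_dataset = pd.DataFrame({"feature_a": [1, 2, 3], "y": [0, 1, 0]})
testing_dataset = pd.DataFrame({"feature_a": [4, 5]})

# Preview the train dataset (assumed to be done with head()).
train_preview = training_dataset.head()
print(train_preview)

# Same call, with training_dataset replaced by testing_dataset.
test_preview = testing_dataset.head()
print(test_preview)
```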
As we can see from the result above and from the EDA, some columns of the train dataset are redundant and should be removed. The test dataset does not contain them either. Hence, let's remove them from our train dataset.
You can do this in several different ways, but here let's follow the steps defined below:
columns_to_retain
y
of train dataset to columns_to_retain
columns_to_retain
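One way these steps might look in pandas is sketched below. It assumes that columns_to_retain starts as the list of columns in the test dataset, that the target column of the train dataset is named y, and that the train dataset is then filtered down to that list. The toy data and the feature column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for the project's datasets.
training_dataset = pd.DataFrame({
    "feature_a": [1, 2, 3],
    "feature_b": [0.5, 0.1, 0.9],
    "redundant_col": ["p", "q", "r"],  # not present in the test dataset
    "y": [0, 1, 0],                    # target column (name assumed)
})
testing_dataset = pd.DataFrame({
    "feature_a": [4, 5],
    "feature_b": [0.3, 0.7],
})

# Start from the columns the test dataset has.
columns_to_retain = list(testing_dataset.columns)

# Add the target column of the train dataset to columns_to_retain.
columns_to_retain.append("y")

# Keep only the columns listed in columns_to_retain.
training_dataset = training_dataset[columns_to_retain]
print(training_dataset.columns.tolist())
```

Building the list from the test dataset's columns keeps the two datasets aligned without naming each redundant column by hand.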
Having separated our independent variables into x_training_data above, we now need the corresponding target values from training_dataset. We will store these values in y_training_data and use them later when training our models.
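A minimal sketch of this split, assuming the target column is named y and that x_training_data was obtained by dropping it. The toy DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical toy train dataset with a target column named "y".
training_dataset = pd.DataFrame({
    "feature_a": [1, 2, 3],
    "feature_b": [0.5, 0.1, 0.9],
    "y": [0, 1, 0],
})

# Independent variables: every column except the target.
x_training_data = training_dataset.drop(columns=["y"])

# Target values, row-aligned with x_training_data.
y_training_data = training_dataset["y"]

print(x_training_data.shape, y_training_data.shape)
```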
The steps are the same as for the XGBoost model above.
We are using accuracy_score(y_pred, y_valid) to find the accuracy score in the code above.
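To make clear what that score measures, here is a small plain-Python sketch of accuracy as the fraction of predictions that match the true labels. The helper name `accuracy` and the sample labels are hypothetical; this is an illustration of the metric, not scikit-learn's implementation. scikit-learn documents the argument order as accuracy_score(y_true, y_pred), but because plain accuracy only counts matches, swapping the two arguments gives the same value.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy of empty inputs")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Hypothetical validation labels and predictions.
y_valid = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]

score = accuracy(y_valid, y_pred)
swapped = accuracy(y_pred, y_valid)  # same value: matching is symmetric
print(score, swapped)
```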
Store the predicted values in the variable y_pred_rnd_frst.
You have to insert the missing values where indicated: