Simple Model Tests: performance evaluations

codevalidated

Separate the target and the features into two variables.

Store the features in X and the target in y.
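A minimal sketch of this step, assuming the data lives in a pandas DataFrame. The toy values and the column names (`feature_a`, `feature_b`, `target`) are hypothetical stand-ins, since the project's dataset is not shown here.

```python
import pandas as pd

# Hypothetical toy data; the real project dataset and its column names
# are not given in the text, so "target" is an assumed label column.
df = pd.DataFrame({
    "feature_a": [0.1, 0.4, 0.35, 0.8],
    "feature_b": [1, 0, 1, 0],
    "target": [0, 1, 0, 1],
})

X = df.drop(columns=["target"])  # every column except the target
y = df["target"]                 # the target column as a Series
```

With a real dataset, only the name passed to `drop` and used to index `df` should need to change.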

codevalidated

Use train_test_split to split the data into training and testing sets: 80% for training and 20% for testing, with random_state=0.

Store the results in the variables X_train, X_test, y_train, and y_test.
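The split described above could look like the sketch below. The data comes from `make_classification`, which is an assumption used only to keep the example runnable; the project's own X and y would take its place.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project's dataset (an assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# test_size=0.2 keeps 20% of the rows for testing and 80% for training;
# random_state=0 makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```

Note that `train_test_split` returns the four arrays in the order features-train, features-test, target-train, target-test.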

codevalidated

Train a Random Forest with the following parameters: n_estimators=100 and random_state=42, and calculate the accuracy for the testing set.

Train a Random Forest Classifier using the training data, and store the model in rf. You can specify the model parameters such as the maximum depth of the tree or the minimum number of samples required to split an internal node.

Calculate the accuracy on both the training and testing sets, running the code in a Jupyter Notebook.

Store the results in the variables train_accuracy and test_accuracy.

The expected accuracy depends on the specifics of the problem and the data. For a simple, well-defined problem with a large and diverse training dataset, a well-trained machine learning model can exceed 80% accuracy in some cases.
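One way to carry out this activity is sketched below, using the parameters named in the task. The synthetic data from `make_classification` is an assumption, so the accuracy values it produces say nothing about the project's real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project's dataset (an assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Parameters taken from the activity description.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Accuracy is the fraction of correctly predicted labels.
train_accuracy = accuracy_score(y_train, rf.predict(X_train))
test_accuracy = accuracy_score(y_test, rf.predict(X_test))
```

A training accuracy much higher than the testing accuracy usually hints at overfitting, which is where parameters such as `max_depth` or `min_samples_split` can help.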

codevalidated

Compute precision, recall, and f1-score using the test dataset.

Store the precision, recall, and f1-score of the positive class in the variables precision, recall, and f_1_score.
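The three metrics can be computed with scikit-learn's metric functions, as in this sketch. The label lists are hypothetical: `y_test` stands for the true test labels and `y_pred` for what `rf.predict(X_test)` would return, with class 1 assumed to be the positive class.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical true labels and predictions for eight test samples.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# pos_label=1 marks class 1 as the positive class (an assumption here).
precision = precision_score(y_test, y_pred, pos_label=1)
recall = recall_score(y_test, y_pred, pos_label=1)
# Stored as f_1_score so the variable does not shadow the f1_score function.
f_1_score = f1_score(y_test, y_pred, pos_label=1)
```

In this toy case there are 3 true positives, 1 false positive, and 1 false negative, so all three metrics come out to 0.75.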

multiplechoice

If the Precision-Recall curve of a binary classification model has a steep slope in the beginning and becomes less steep towards the end, it means that it has high precision at the beginning, but as the recall increases, the precision decreases.
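To inspect such a curve in practice, the precision and recall pairs can be obtained with `precision_recall_curve`, as in the sketch below. The labels and scores are hypothetical; in the project, the scores would more likely come from something like `rf.predict_proba(X_test)[:, 1]`.

```python
from sklearn.metrics import precision_recall_curve

# Hypothetical true labels and predicted scores for the positive class.
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# One (precision, recall) pair per candidate threshold, plus a final
# point at recall 0 and precision 1 that scikit-learn appends.
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# Print the points so the trade-off can be read directly.
for p, r in zip(precision, recall):
    print(f"recall={r:.2f}  precision={p:.2f}")
```

Plotting `recall` on the x-axis against `precision` on the y-axis gives the curve the statement refers to.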

multiplechoice

Best model performance

Which model presents the worst performance in the test dataset?

Verónica Barraza
