All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
First, separate the target and the features into two variables. Store the features in X and the target in y.
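As a rough sketch, assuming the data has already been loaded into a pandas DataFrame named df with the label in a column named target (both names are hypothetical and depend on the actual dataset), the separation could look like this:

X = df.drop(columns=['target'])  # every column except the (hypothetical) 'target' label
y = df['target']                 # the label column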
Then, use train_test_split to split the data into training and testing sets: 80% training and 20% testing, with random_state=0. Store the results in the variables X_train, X_test, y_train, and y_test.
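A minimal sketch of this step, assuming X and y from the previous step:

from sklearn.model_selection import train_test_split

# 80% training / 20% testing split with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)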
Train a linear SVM (import LinearSVC) using the training data, and store the model in svm. You can specify model parameters such as C. A training sketch follows the standardization code below.
Remember to standardize the dataset (code provided below), and store the results in X_train_sd and X_test_sd.
from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()
X_train_sd = sc_X.fit_transform(X_train)  # fit the scaler on the training data only
X_test_sd = sc_X.transform(X_test)        # apply the same scaling to the test data
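One way to fit the linear SVM on the standardized features; the C value below is only an illustration, not a required setting:

from sklearn.svm import LinearSVC

# C controls the strength of regularization; 1.0 is scikit-learn's default
svm = LinearSVC(C=1.0, random_state=0)
svm.fit(X_train_sd, y_train)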
Calculate the f1-score of both the training and testing sets, and run the code in a Jupyter Notebook. Store the results in the variables f1_score_train and f1_score_test.
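A sketch of the evaluation, assuming a binary target (for multiclass data the average argument of f1_score would need to change):

from sklearn.metrics import f1_score

# Predict on the standardized features and compare against the true labels
f1_score_train = f1_score(y_train, svm.predict(X_train_sd))
f1_score_test = f1_score(y_test, svm.predict(X_test_sd))

print(f1_score_train, f1_score_test)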
The expected score for a simple problem varies with the specifics of the problem and the data. However, for a well-defined, simple problem with a large and diverse training dataset, a well-trained machine learning model can achieve an f1-score above 85% in some cases.