All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Replace the original dataset df
Run the completed code in your notebook.
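import numpy as np
from sklearn.impute import SimpleImputer

# "COMPLETE" is a placeholder: replace it with a valid SimpleImputer strategy before running.
sm = SimpleImputer(strategy="COMPLETE", missing_values=np.nan)
# Single column: reshape to 2-D for the imputer, then flatten the result back to 1-D.
df.iloc[:, 1] = sm.fit_transform(df.iloc[:, 1].values.reshape(-1, 1)).ravel()
df.iloc[:, 6:8] = sm.fit_transform(df.iloc[:, 6:8])
df.iloc[:, 33:36] = sm.fit_transform(df.iloc[:, 33:36])
# These columns are rounded after imputation (presumably integer-valued columns).
df.iloc[:, 16:20] = np.round(sm.fit_transform(df.iloc[:, 16:20]))
df.iloc[:, 36:37] = np.round(sm.fit_transform(df.iloc[:, 36:37]))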
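As an illustration of how the imputer behaves, here is a minimal sketch on a toy dataframe. The column names and the "mean" strategy are assumptions made for this example only; they are not necessarily the ones the activity expects.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frame standing in for df; the column names are hypothetical.
toy = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "visits": [1.0, 2.0, np.nan],
})

# "mean" is an assumed choice; SimpleImputer also accepts
# "median", "most_frequent" and "constant".
imputer = SimpleImputer(strategy="mean", missing_values=np.nan)

# fit_transform returns a 2-D NumPy array with the same shape as its input,
# so it is wrapped back into a dataframe with the original labels.
imputed = imputer.fit_transform(toy)
toy_imputed = pd.DataFrame(imputed, columns=toy.columns, index=toy.index)

# Count-like columns can be rounded after imputation.
toy_imputed["visits"] = np.round(toy_imputed["visits"])

print(toy_imputed)
```

Each missing value is replaced by a statistic computed from the observed values of the same column, so the choice of strategy should match the type of data in that column.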
Sometimes there is more than one possible solution. If the result is not as expected, think of another way to solve this activity.
Apply the OneHotEncoder from scikit-learn to encode the categorical columns.
First, store the names of the categorical columns in a variable categorical_columns.
Concatenate the result with the numerical variables in a new dataframe called data_preprocessed.
For this task
Normalize the dataset to ensure that all features are on a similar scale. This step is crucial for logistic regression, as it helps prevent certain features from dominating the others in the model's learning process.
You should use StandardScaler to standardize the features and store the results in the variables X_train_scaler and X_test_scaler.
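A minimal sketch of the scaling step, assuming synthetic data and a train/test split created here only for illustration (the project defines its own X_train and X_test):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales; purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(loc=[0.0, 100.0], scale=[1.0, 25.0], size=(200, 2))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scaler = StandardScaler()
# Learn the mean and standard deviation from the training data only...
X_train_scaler = scaler.fit_transform(X_train)
# ...and reuse those statistics on the test data to avoid leakage.
X_test_scaler = scaler.transform(X_test)
```

Fitting the scaler on the training set alone keeps information about the test set out of the preprocessing, which is why the test data goes through transform rather than fit_transform.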
Train a logistic regression model with regularization using the normalized dataset. Regularization helps prevent overfitting and improves the model's generalization ability.
You should use the LogisticRegression class from scikit-learn and set the parameter C to control the regularization strength. In scikit-learn, C is the inverse of the regularization strength, so smaller values regularize more strongly. Store the trained model in the variable logreg_model.
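The training step might look like the sketch below. The synthetic dataset, the split, and the value C=0.5 are assumptions chosen for illustration; the activity may expect a different value.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for the project dataset.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Standardize using statistics from the training set only.
scaler = StandardScaler()
X_train_scaler = scaler.fit_transform(X_train)
X_test_scaler = scaler.transform(X_test)

# C is the inverse of the regularization strength: smaller C means
# stronger regularization. The default penalty is L2.
logreg_model = LogisticRegression(C=0.5, max_iter=1000)
logreg_model.fit(X_train_scaler, y_train)

print(logreg_model.coef_)
```

Trying a few values of C (for example 0.01, 1 and 100) and comparing the size of the coefficients is a quick way to see the effect of regularization.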
Evaluate the performance of the trained logistic regression model using the testing dataset.
You should use the predict method of the trained logreg_model to make predictions on the normalized testing data. Calculate and store the predictions in the variable y_pred. Then, utilize appropriate evaluation metrics to assess the model's performance, such as accuracy, precision, recall, and F1-score. Store the results in their respective variables: accuracy, precision, recall, and f1_score.
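The evaluation could be sketched as follows, again on synthetic data assumed for illustration. Note that a variable named f1_score would shadow scikit-learn's function of the same name if that function were imported directly, so this sketch calls the metrics through the sklearn.metrics module instead.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data and model, repeated here so the sketch is self-contained.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
scaler = StandardScaler()
X_train_scaler = scaler.fit_transform(X_train)
X_test_scaler = scaler.transform(X_test)
logreg_model = LogisticRegression(C=0.5, max_iter=1000).fit(X_train_scaler, y_train)

# Predict on the scaled test data, never on the unscaled features.
y_pred = logreg_model.predict(X_test_scaler)

# Metric functions take the true labels first, then the predictions.
accuracy = metrics.accuracy_score(y_test, y_pred)
precision = metrics.precision_score(y_test, y_pred)
recall = metrics.recall_score(y_test, y_pred)
f1_score = metrics.f1_score(y_test, y_pred)

print(accuracy, precision, recall, f1_score)
```

For a binary target these functions report precision, recall and F1 for the positive class (label 1) by default; a multiclass target would need an explicit average argument.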
Read the following statements and determine whether each statement is true or false.