All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Identify invalid values in the `season` column and replace them with the string `Unknown season` (data imputation).
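A minimal pandas sketch of this step. The toy dataframe and the validity rule (a season looks like `2006-2007`, with consecutive years) are assumptions for illustration, not the dataset's documented format. Inspect the real values first, for example with `df["season"].value_counts(dropna=False)`, and adjust the rule to what you find.

```python
import pandas as pd

# Toy data: the real dataset's season format is an assumption here.
df = pd.DataFrame(
    {"season": ["2006-2007", "2007-2008", "?", None, "2007/08", "2008-2009"]}
)


def is_valid_season(value):
    """Assumed rule: 'YYYY-YYYY' where the second year follows the first."""
    if not isinstance(value, str):
        return False  # missing or non-string values are invalid
    parts = value.split("-")
    if len(parts) != 2 or not all(p.isdigit() and len(p) == 4 for p in parts):
        return False
    return int(parts[1]) == int(parts[0]) + 1


valid = df["season"].apply(is_valid_season)
print(df.loc[~valid, "season"].unique())  # inspect before replacing
df.loc[~valid, "season"] = "Unknown season"
```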
IMPORTANT: If for any reason you think you have incorrectly modified the original dataframe, just read it again.
Analyze the `home_goals` and `away_goals` columns and answer: how many invalid values does each one contain?
Hint: Use a visualization to help you in the process!
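One way to count invalid entries, sketched with pandas on a made-up dataframe. The rule used here (missing, non-numeric, negative or non-integer values are invalid) is an assumption. The frequency table printed by `value_counts` is a text stand-in for the suggested visualization; a bar chart of the same counts serves the same purpose.

```python
import pandas as pd

# Toy data; the invalid-value rule below is an assumption.
df = pd.DataFrame(
    {
        "home_goals": [1, 2, -1, "x", 3, None],
        "away_goals": [0, 1, 2, 4, -3, 1],
    }
)


def invalid_goals_mask(series):
    """True where a value is missing, non-numeric, negative or non-integer."""
    numeric = pd.to_numeric(series, errors="coerce")  # non-numeric -> NaN
    return numeric.isna() | (numeric < 0) | (numeric % 1 != 0)


for col in ["home_goals", "away_goals"]:
    # Frequency table of the raw values: odd entries stand out quickly.
    print(df[col].value_counts(dropna=False))
    print(col, "invalid:", int(invalid_goals_mask(df[col]).sum()))
```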
Replace all the invalid goal values in `home_goals` and `away_goals` with `0` (data imputation).
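A sketch of the imputation on toy data, assuming that missing, non-numeric, negative or non-integer values count as invalid. `Series.mask` replaces the entries where the condition is true.

```python
import pandas as pd

# Toy data; assumed rule: missing, non-numeric, negative or
# non-integer goal values are invalid.
df = pd.DataFrame(
    {
        "home_goals": [1, -1, "x", 3, None],
        "away_goals": [0, 2, 1, -3, 1],
    }
)

for col in ["home_goals", "away_goals"]:
    numeric = pd.to_numeric(df[col], errors="coerce")  # non-numeric -> NaN
    invalid = numeric.isna() | (numeric < 0) | (numeric % 1 != 0)
    # Put 0 where the value is invalid, then store the column as integers.
    df[col] = numeric.mask(invalid, 0).astype(int)
```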
The `result` column contains a "summary" of the match result: `H` indicates a home win, `A` an away win, and `D` a draw.
Identify the invalid values and clean them by assigning the correct result.
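Since the score already determines the outcome, one approach is to recompute the expected letter from `home_goals` and `away_goals` and overwrite every row where `result` disagrees. This sketch assumes the goal columns have already been cleaned. Comparing against the recomputed value catches letters that are valid but wrong, not just unexpected symbols.

```python
import numpy as np
import pandas as pd

# Toy data; assumes the goal columns are already clean integers.
df = pd.DataFrame(
    {
        "home_goals": [2, 0, 1, 3],
        "away_goals": [1, 2, 1, 0],
        "result": ["H", "?", "D", "A"],
    }
)

# Derive the result implied by the score: H (home win), A (away win), D (draw).
expected = pd.Series(
    np.select(
        [df["home_goals"] > df["away_goals"], df["home_goals"] < df["away_goals"]],
        ["H", "A"],
        default="D",
    ),
    index=df.index,
)

mismatch = df["result"] != expected
print(df.loc[mismatch, ["home_goals", "away_goals", "result"]])  # rows to fix
df.loc[mismatch, "result"] = expected[mismatch]
```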
Calculate the average number of goals per match. Enter the value truncated to 2 decimals: for example, if you find the value to be 1.8857, enter just 1.88.
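A quick check of the arithmetic on toy data. Because the example (1.8857 becomes 1.88) truncates rather than rounds, this sketch truncates with `math.floor`; note that `round(1.8857, 2)` would give 1.89 instead.

```python
import math

import pandas as pd

# Toy data: 7 matches with 13 goals in total.
df = pd.DataFrame(
    {
        "home_goals": [2, 0, 1, 3, 1, 0, 2],
        "away_goals": [1, 2, 1, 0, 0, 0, 0],
    }
)

avg = (df["home_goals"] + df["away_goals"]).mean()  # 13 / 7 = 1.857...

# Truncate (not round) to 2 decimals, as in the 1.8857 -> 1.88 example.
# Caveat: floating-point error can push avg * 100 just below an integer,
# so double-check values that sit exactly on a 2-decimal boundary.
truncated = math.floor(avg * 100) / 100
print(avg, truncated)
```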
For the previous activity, it would have been convenient to have a `total_goals` column with the sum of `home_goals` and `away_goals`. Create the column now.
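In pandas, adding two columns is vectorized, so the new column takes one line (shown here on toy data):

```python
import pandas as pd

# Toy data.
df = pd.DataFrame({"home_goals": [2, 0, 1], "away_goals": [1, 2, 1]})

# Element-wise sum of the two columns, one value per match.
df["total_goals"] = df["home_goals"] + df["away_goals"]
print(df["total_goals"].mean())  # average goals per match
```

With this column in place, the previous activity reduces to `df["total_goals"].mean()`.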
Calculate the average number of goals per season. The result should be a Series ordered by season. Store it in the variable `goals_per_season`.
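A sketch with `groupby`, assuming a `total_goals` column like the one from the previous activity. `groupby` sorts its keys by default; the explicit `sort_index()` just makes the ordering requirement visible. The season labels and scores are toy values.

```python
import pandas as pd

# Toy data with a total_goals column (home_goals + away_goals).
df = pd.DataFrame(
    {
        "season": ["2007-2008", "2006-2007", "2007-2008", "2006-2007"],
        "home_goals": [2, 0, 1, 3],
        "away_goals": [1, 2, 1, 3],
    }
)
df["total_goals"] = df["home_goals"] + df["away_goals"]

# Mean goals per match within each season, ordered by season label.
goals_per_season = df.groupby("season")["total_goals"].mean().sort_index()
print(goals_per_season)
```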
What was the biggest goal difference in a single match in the dataset?
Note: the goal difference can come from either a home win or an away win. For example, a 10-1 result and a 1-10 result have the same difference: 9 goals for the winning team.
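Taking the absolute value of `home_goals - away_goals` treats home and away wins the same way, as the note requires. A sketch on toy data:

```python
import pandas as pd

# Toy data: the second match is an away win by 4.
df = pd.DataFrame({"home_goals": [3, 1, 2], "away_goals": [1, 5, 2]})

# abs() makes a 10-1 and a 1-10 count the same (difference 9).
goal_diff = (df["home_goals"] - df["away_goals"]).abs()
biggest = goal_diff.max()
print(biggest)
print(df.loc[goal_diff.idxmax()])  # the match with that difference
```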
Find the team that has won the most matches away from home.
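A sketch that keeps only away wins (`result == "A"`) and counts them per away team. The `home_team` and `away_team` column names, and the team names, are hypothetical; use whatever the real dataset calls its team columns.

```python
import pandas as pd

# Toy data; "home_team"/"away_team" are hypothetical column names.
df = pd.DataFrame(
    {
        "home_team": ["Arsenal", "Chelsea", "Everton", "Arsenal"],
        "away_team": ["Chelsea", "Everton", "Arsenal", "Everton"],
        "result": ["A", "A", "H", "A"],
    }
)

# Keep only away wins, then count them per away team.
away_wins = df.loc[df["result"] == "A", "away_team"].value_counts()
print(away_wins.idxmax(), away_wins.max())
```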
This is a tricky activity, because we're not looking for the "total" of goals received, but for the "ratio" of goals received to games played. For example, Charlton Athletic is LITERALLY the team with the fewest goals received at home, only 20, but that's because they played just 38 matches in total, only 19 of them at home.
Which team has the lowest ratio of goals received at home to home matches played? Defined as: `goals_received / home_games`.
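A sketch of the ratio, assuming a hypothetical `home_team` column. When a team plays at home, the goals it receives are the visitors' goals (`away_goals`), so they are summed per home team and divided by that team's number of home games. Team names and scores are toy values.

```python
import pandas as pd

# Toy data; "home_team" is a hypothetical column name.
df = pd.DataFrame(
    {
        "home_team": ["Arsenal", "Arsenal", "Chelsea", "Everton", "Everton"],
        "away_goals": [1, 0, 2, 1, 2],
    }
)

# At home, a team receives the visitors' goals, i.e. away_goals.
home_stats = df.groupby("home_team").agg(
    goals_received=("away_goals", "sum"),
    home_games=("away_goals", "size"),
)
home_stats["ratio"] = home_stats["goals_received"] / home_stats["home_games"]
print(home_stats.sort_values("ratio"))
best = home_stats["ratio"].idxmin()
```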
Which team scored the most goals playing away from home?
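Grouping by a hypothetical `away_team` column and summing `away_goals` gives each team's goals scored as the visiting side. A sketch on toy data:

```python
import pandas as pd

# Toy data; "away_team" is a hypothetical column name.
df = pd.DataFrame(
    {
        "away_team": ["Chelsea", "Everton", "Chelsea", "Arsenal"],
        "away_goals": [2, 1, 3, 4],
    }
)

# Total goals each team scored as the visiting side.
away_goals_by_team = df.groupby("away_team")["away_goals"].sum()
top_team = away_goals_by_team.idxmax()
print(top_team, away_goals_by_team.max())
```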