Practicing Python Collections using Soccer players data

Practice the fundamentals of data analysis using nothing more than nested and combined Python collections. Using the 2022-23 Football Players Stats dataset, you will analyze the best players, the teams and their average goals, and assists. You will work step by step: first answering analytical questions, then transforming the data, and finally combining both skills to transform the data with aggregate functions.
Project Created by

Anurag Verma

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

input

Who's the player with the most assists in the Premier League?

Find the player with the highest assists value among those who have played AT LEAST 1000 minutes. The assists value is computed "per minute". Provide the full name of the player, followed by their assists value.

Example of the expected input: Harry Kane, 0.33

Important: the space between the comma and the number is required.
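
One way to approach this kind of question is to filter first and then take the maximum with a key function. The sketch below is only an illustration: the rows are made-up sample data, and the column layout (player, competition, minutes, assists) is an assumption, not the real dataset's layout.

```python
# Hypothetical sample rows: [player, competition, minutes, assists].
# The real dataset has more columns and a different order.
players = [
    ["Player A", "Premier League", 2500, 0.33],
    ["Player B", "Premier League", 400, 0.90],   # excluded: under 1000 minutes
    ["Player C", "La Liga", 3000, 0.50],         # excluded: other competition
    ["Player D", "Premier League", 1000, 0.21],
]

# Assumed column positions for the sample rows above.
PLAYER, COMPETITION, MINUTES, ASSISTS = 0, 1, 2, 3

# Keep only Premier League players with at least 1000 minutes.
eligible = [
    row for row in players
    if row[COMPETITION] == "Premier League" and row[MINUTES] >= 1000
]

# max() with a key function returns the whole row, not just the value.
best = max(eligible, key=lambda row: row[ASSISTS])

# Format as "name, value" with a single space after the comma.
answer = f"{best[PLAYER]}, {best[ASSISTS]}"
print(answer)  # Player A, 0.33
```

Filtering before calling max() keeps the two conditions (competition and minutes) separate from the ranking step, which makes each part easier to check.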

input

Find the player with the highest goals-to-shots ratio in the Premier League.

Only consider players who:

  • have taken at least 10 shots
  • played in the Premier League

Hint: The goals-to-shots ratio is computed by dividing the number of goals by the number of shots.

Example of expected input: Robert Lewandowski, 0.56

Important: the space between the comma and the number is required.
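
The ratio described in the hint can be sketched as follows. The sample rows and their column layout (player, competition, goals, shots) are hypothetical, and rounding to two decimals is an assumption based on the example answer above.

```python
# Hypothetical sample rows: [player, competition, goals, shots].
players = [
    ["Player A", "Premier League", 10, 40],
    ["Player B", "Premier League", 3, 5],    # excluded: fewer than 10 shots
    ["Player C", "Bundesliga", 9, 10],       # excluded: other competition
    ["Player D", "Premier League", 6, 12],
]

# Assumed column positions for the sample rows above.
PLAYER, COMPETITION, GOALS, SHOTS = 0, 1, 2, 3

# Build (ratio, player) pairs for eligible players only. Requiring at
# least 10 shots also rules out division by zero.
ratios = [
    (row[GOALS] / row[SHOTS], row[PLAYER])
    for row in players
    if row[COMPETITION] == "Premier League" and row[SHOTS] >= 10
]

# Tuples compare element by element, so max() picks the highest ratio.
best_ratio, best_player = max(ratios)
answer = f"{best_player}, {round(best_ratio, 2)}"
print(answer)  # Player D, 0.5
```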

codevalidated

Determine the average age of players in each squad.

Create a dictionary named average_age_per_squad where the keys are the squad names, and the values are the average ages rounded to the nearest whole number.

Example of expected output:

{'Liverpool': 26, 'Manchester City': 27, 'Chelsea': 25, ...}

Hint: To calculate the average age, sum up the ages of all players in each squad and divide by the number of players in that squad.
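
The hint above can be sketched with a `defaultdict` that collects ages per squad. The rows and the column layout (player, squad, age) are made up for illustration and do not reflect the real dataset.

```python
from collections import defaultdict

# Hypothetical sample rows: [player, squad, age].
players = [
    ["Player A", "Squad X", 22],
    ["Player B", "Squad X", 27],
    ["Player C", "Squad Y", 30],
    ["Player D", "Squad Y", 31],
    ["Player E", "Squad Y", 25],
]

# Assumed column positions for the sample rows above.
SQUAD, AGE = 1, 2

# Collect every age under its squad name.
ages_by_squad = defaultdict(list)
for row in players:
    ages_by_squad[row[SQUAD]].append(row[AGE])

# Average = sum of ages / number of players, rounded to a whole number.
average_age_per_squad = {
    squad: round(sum(ages) / len(ages))
    for squad, ages in ages_by_squad.items()
}
print(average_age_per_squad)  # {'Squad X': 24, 'Squad Y': 29}
```

Note that Python's built-in `round()` rounds exact halves to the nearest even integer, which is why the 24.5 average of Squad X becomes 24 rather than 25.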

input

Which player has the highest playing time in the Premier League?

Find the player who has played the most minutes in the Premier League.

Example of input: Milan

input

What is the highest number of minutes played by a player in `La Liga`?

Find the maximum minutes played in La Liga. Here, La Liga is the name of the league (competition).

multiplechoice

Which team, on average, scores the highest number of goals per game in the `Bundesliga`?

Consider only teams that:

  • Have played at least 10 matches.
  • Play in the Bundesliga

Average Goals per Game is defined as: the total goals scored divided by the number of games played by the team.

Find the team with the highest average goals per game, along with that average, and select the correct answer from the options below.

codevalidated

Transform the list of lists into a list of dictionaries

Create a new variable named players_dict that contains ALL the players, with each player represented as a dictionary containing only the keys:

  • player
  • nation
  • position
  • squad
  • competition
  • age

The resulting variable should have this structure:

[{'player': 'Brenden Aaronson',
  'nation': 'USA',
  'position': 'MFFW',
  'squad': 'Leeds United',
  'competition': 'Premier League',
  'age': 22},
 {'player': 'Yunis Abdelhamid',
  'nation': 'MAR',
  'position': 'DF',
  'squad': 'Reims',
  'competition': 'Ligue 1',
  'age': 35}]
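
A minimal sketch of this transformation pairs each column name with its value using `zip`, then keeps only the wanted keys. The header, the extra `goals` column, and the sample rows are hypothetical; the real dataset has its own columns and order.

```python
# Hypothetical header and rows; the real dataset has more columns.
header = ["player", "nation", "position", "squad", "competition", "age", "goals"]
rows = [
    ["Player A", "Country A", "MF", "Squad A", "Competition A", 22, 3],
    ["Player B", "Country B", "DF", "Squad B", "Competition B", 35, 0],
]

# The keys requested by the activity.
wanted = ["player", "nation", "position", "squad", "competition", "age"]

players_dict = []
for row in rows:
    # Pair each column name with the value in the same position.
    record = dict(zip(header, row))
    # Keep only the requested keys, dropping everything else.
    players_dict.append({key: record[key] for key in wanted})

print(players_dict[0])
```
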
codevalidated

Create a new variable named `players_by_nation` that groups players by their respective nations.

The variable should be a dictionary where the keys are the nation names, and the values are lists of players belonging to that nation.

Each player should be represented as a dictionary containing the keys:

  • player
  • position
  • squad
  • competition
  • age

Example of expected output:


{'USA': [{'player': 'Brenden Aaronson',
          'position': 'MFFW',
          'squad': 'Leeds United',
          'competition': 'Premier League',
          'age': 22},
         {'player': 'Tyler Adams',
          'position': 'MF',
          'squad': 'Leeds United',
          'competition': 'Premier League',
          'age': 23},
          ...
        ],
        ...
}
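
Grouping like this can be sketched with `defaultdict(list)`, where the grouping field becomes the dictionary key and is left out of each player entry. The input list below is made-up sample data standing in for a list of player dictionaries.

```python
from collections import defaultdict

# Hypothetical list of player dictionaries (sample data only).
players = [
    {"player": "Player A", "nation": "Country A", "position": "MF",
     "squad": "Squad A", "competition": "Competition A", "age": 22},
    {"player": "Player B", "nation": "Country A", "position": "DF",
     "squad": "Squad B", "competition": "Competition A", "age": 23},
    {"player": "Player C", "nation": "Country B", "position": "FW",
     "squad": "Squad C", "competition": "Competition B", "age": 30},
]

players_by_nation = defaultdict(list)
for p in players:
    # Copy every key except 'nation', which becomes the grouping key.
    entry = {key: value for key, value in p.items() if key != "nation"}
    players_by_nation[p["nation"]].append(entry)

# Convert to a plain dict for the final result.
players_by_nation = dict(players_by_nation)
print(list(players_by_nation))  # ['Country A', 'Country B']
```

The same pattern applies to any other grouping field; only the key used for the lookup changes.
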
codevalidated

Create a new variable named `players_by_squad` that groups players by their respective squads.

The variable should be a dictionary where the keys are the squad names, and the values are lists of players belonging to that squad.

Each player should be represented as a dictionary containing the keys:

  • player
  • position
  • squad
  • competition
  • age

Example of expected output:

{'Leeds United': [{'player': 'Brenden Aaronson',
                    'position': 'MFFW',
                    'squad': 'Leeds United',
                    'competition': 'Premier League',
                    'age': 22},
                  {'player': 'Tyler Adams',
                   'position': 'MF',
                   'squad': 'Leeds United',
                   'competition': 'Premier League',
                   'age': 23},
                    ...
                ],
                ...
}
codevalidated

Calculate the average age of players for each competition

Create a new variable named average_age_by_competition that contains the average age for each competition. The competitions should be the keys, and the values should be the average ages rounded to one decimal place.

Example of expected output:

{
    'Premier League': 27.5,
    'La Liga': 26.8,
    'Bundesliga': 25.9,
    ...
}

Note that the age values are rounded to one decimal place.

codevalidated

Transform the players data to dict-of-dict

Transform the dataset to create a new variable named average_stats_by_position. This variable should contain the average values of goals, assists, and shots on target (SoT) for each position across all players. The positions should be the keys, and the values should be dictionaries with the keys goals, assists, and sot, representing the average values for each statistic.

Example of expected output:


{
    'FW': {'goals': 12.5, 'assists': 5.3, 'sot': 8.1},
    'MF': {'goals': 6.2, 'assists': 8.7, 'sot': 3.9},
    'DF': {'goals': 2.8, 'assists': 3.1, 'sot': 1.6},
    ...
}

Note that the values for goals, assists, and sot are rounded to one decimal place.
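
A two-pass approach is one way to build this nested structure: accumulate totals and a count per position, then divide. The rows and the column layout (player, position, goals, assists, sot) below are hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical sample rows: [player, position, goals, assists, sot].
players = [
    ["Player A", "FW", 10, 4, 20],
    ["Player B", "FW", 5, 1, 11],
    ["Player C", "DF", 1, 2, 3],
]

# Assumed column positions for the sample rows above.
POSITION, GOALS, ASSISTS, SOT = 1, 2, 3, 4

# First pass: running totals plus a player count per position.
totals = defaultdict(lambda: {"goals": 0, "assists": 0, "sot": 0, "count": 0})
for row in players:
    t = totals[row[POSITION]]
    t["goals"] += row[GOALS]
    t["assists"] += row[ASSISTS]
    t["sot"] += row[SOT]
    t["count"] += 1

# Second pass: divide each total by the count and round to one decimal.
average_stats_by_position = {
    position: {
        stat: round(t[stat] / t["count"], 1)
        for stat in ("goals", "assists", "sot")
    }
    for position, t in totals.items()
}
print(average_stats_by_position)
```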

codevalidated

Calculate total goals per Competition.

Create a dictionary containing each competition as key, and the sum of all the goals scored as a value. Store the result in the variable goals_per_comp. It should look something like:

{
    'Premier League': ...,
    'Serie A': ...,
    ...
    'Bundesliga': ...
}
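
A running sum per key is enough here. The sketch below uses `dict.get` with a default of 0; the rows and the column layout (player, competition, goals) are made up for illustration.

```python
# Hypothetical sample rows: [player, competition, goals].
players = [
    ["Player A", "Competition A", 10],
    ["Player B", "Competition A", 4],
    ["Player C", "Competition B", 7],
]

# Assumed column positions for the sample rows above.
COMPETITION, GOALS = 1, 2

goals_per_comp = {}
for row in players:
    comp = row[COMPETITION]
    # dict.get supplies 0 the first time a competition is seen.
    goals_per_comp[comp] = goals_per_comp.get(comp, 0) + row[GOALS]

print(goals_per_comp)  # {'Competition A': 14, 'Competition B': 7}
```
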
codevalidated

Calculate the total number of minutes played by each squad across all competitions.

Create a dictionary named total_minutes_by_squad where the keys are the squad names, and the values are the total minutes played by that squad.

Example of expected output:


{
    'Manchester United': 4578,
    'Real Madrid': 5123,
    'Bayern Munich': 3984,
    ...
}


codevalidated

Determine the average number of starts for players in each competition.

Create a dictionary named average_starts_per_comp where the keys are the competition names, and the values are the average number of starts rounded to the nearest whole number.

Example of expected output:

{
    'Premier League': 23,
    'La Liga': 19,
    'Bundesliga': 21,
    ...
}

codevalidated

Calculate the max scorers per competition.

Find the top scorers of each competition (maximum number of goals scored). Store your results in the variable top_scorers_per_comp. Attention! There might be more than one top scorer in a league, so your result should be a dictionary that maps each competition to a list of tuples, each tuple containing a player and their goals. Example:

# this is not real data or real result
# just to demonstrate the structure
{
    "Ligue 1": [
        ("Lionel Messi", 14),
        ("Kylian Mbappe", 14),
    ],
    'La Liga': [
        ('Robert Lewandowski', 18)
    ]
}
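
Handling ties is the subtle part of this activity. One possible single-pass sketch keeps a list per competition and either resets it on a new maximum or extends it on a tie. The rows and the column layout (player, competition, goals) are hypothetical sample data.

```python
# Hypothetical sample rows: [player, competition, goals].
players = [
    ["Player A", "Competition A", 14],
    ["Player B", "Competition A", 14],
    ["Player C", "Competition A", 9],
    ["Player D", "Competition B", 18],
]

# Assumed column positions for the sample rows above.
PLAYER, COMPETITION, GOALS = 0, 1, 2

top_scorers_per_comp = {}
for row in players:
    comp, entry = row[COMPETITION], (row[PLAYER], row[GOALS])
    current = top_scorers_per_comp.get(comp)
    if current is None or row[GOALS] > current[0][1]:
        # New competition or a new maximum: start a fresh list.
        top_scorers_per_comp[comp] = [entry]
    elif row[GOALS] == current[0][1]:
        # Tie with the current maximum: keep both players.
        current.append(entry)

print(top_scorers_per_comp)
```
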
codevalidated

Convert the dataset into a new format - List of Dictionaries.

Create a list named goals_and_assists_by_player which is a list of dictionaries. Each dictionary should contain the keys player, competition, goals, and assists.

Example of expected output:

[
    {'player': 'Harry Kane', 'competition': 'Premier League', 'goals': 25, 'assists': 12},
    {'player': 'Lionel Messi', 'competition': 'La Liga', 'goals': 30, 'assists': 18},
    {'player': 'Cristiano Ronaldo', 'competition': 'Serie A', 'goals': 27, 'assists': 10},
    ...
]

codevalidated

Calculate the total number of goals and assists for each player in the dataset

The expected output is a dictionary whose keys are player names and whose values are dictionaries with the keys goals and assists and their respective totals.

{
    'Harry Kane': {'goals': 25, 'assists': 12},
    'Lionel Messi': {'goals': 30, 'assists': 18},
    'Cristiano Ronaldo': {'goals': 27, 'assists': 10},
    ...
}
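
Totals per player can be accumulated with `dict.setdefault`, which matters if the same player appears in more than one row. Whether that actually happens in the dataset is an assumption here; the records below are made-up sample data, and the variable name `totals_by_player` is hypothetical because the activity does not specify one.

```python
# Hypothetical records in the list-of-dictionaries format.
goals_and_assists_by_player = [
    {"player": "Player A", "competition": "Competition A", "goals": 5, "assists": 2},
    {"player": "Player A", "competition": "Competition B", "goals": 3, "assists": 1},
    {"player": "Player B", "competition": "Competition A", "goals": 7, "assists": 0},
]

totals_by_player = {}  # hypothetical name; the activity does not give one
for record in goals_and_assists_by_player:
    # setdefault creates the inner dictionary the first time a player is seen.
    totals = totals_by_player.setdefault(
        record["player"], {"goals": 0, "assists": 0}
    )
    totals["goals"] += record["goals"]
    totals["assists"] += record["assists"]

print(totals_by_player)
```
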
codevalidated

Group players into different age groups

Create a new variable named players_by_age_group that groups players into different age groups. The variable should be a dictionary where the keys are the age group names (e.g., 'Under 20', '20-25', '26-30', 'Over 30'), and the values are lists of players belonging to that age group. Each player should be represented as a dictionary containing the keys player, nation, position, squad, competition, and age.

The age groups are defined as:

  • Under 20, i.e. age < 20
  • 20-25, i.e. 20 <= age <= 25
  • 26-30, i.e. 26 <= age <= 30
  • Over 30, i.e. age > 30

Example of expected output:

{
    'Under 20': [
        {'player': 'Player A', 'nation': 'Country A', 'position': 'Position A', 'squad': 'Squad A', 'competition': 'Competition A', 'age': 19},
        {'player': 'Player B', 'nation': 'Country B', 'position': 'Position B', 'squad': 'Squad B', 'competition': 'Competition B', 'age': 18},
        ...
    ],
    '20-25': [
        {'player': 'Player C', 'nation': 'Country C', 'position': 'Position C', 'squad': 'Squad C', 'competition': 'Competition C', 'age': 22},
        {'player': 'Player D', 'nation': 'Country D', 'position': 'Position D', 'squad': 'Squad D', 'competition': 'Competition D', 'age': 25},
        ...
    ],
    ...
}
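
The age boundaries above can be captured in a small helper function. This is only a sketch: the helper name `age_group` and the sample players are hypothetical.

```python
def age_group(age):
    """Return the age group label for a given age (hypothetical helper)."""
    if age < 20:
        return "Under 20"
    if age <= 25:
        return "20-25"
    if age <= 30:
        return "26-30"
    return "Over 30"


# Hypothetical sample players in the dictionary format described above.
players = [
    {"player": "Player A", "nation": "Country A", "position": "FW",
     "squad": "Squad A", "competition": "Competition A", "age": 19},
    {"player": "Player B", "nation": "Country B", "position": "MF",
     "squad": "Squad B", "competition": "Competition B", "age": 25},
    {"player": "Player C", "nation": "Country C", "position": "DF",
     "squad": "Squad C", "competition": "Competition C", "age": 31},
]

# Start with every group present so that empty groups still appear.
players_by_age_group = {"Under 20": [], "20-25": [], "26-30": [], "Over 30": []}
for p in players:
    players_by_age_group[age_group(p["age"])].append(p)

print({group: len(members) for group, members in players_by_age_group.items()})
```

Pre-filling the four keys is a design choice: it keeps groups with no players in the result as empty lists instead of omitting them.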

Project Created by

Anurag Verma

What's up, friends! 👋 I'm a computer science student about to finish my last year of college. 🎓 I LOVE writing code! ❤️ It makes me so happy! 😄 Whether I'm goofing in notebooks 📓 or coding in Python 🐍, writing programs is a blast! 💥


This project is part of

Python Collections
