Ebenezer Adebiyi

Data-Driven Insights and Performance Metrics of EPL 2021/2022

Updated: Jun 30

Learning without practice is like walking without shoes; you will not be confident in your steps. Since I started working on this project, I have learned better and more effective ways to analyze and handle data. This has deepened my knowledge and my curiosity to learn more, and it has been the major motivation that keeps me going in my data career. After working on the Spotify audio dataset, I felt compelled to work on a sports dataset, because sport is entertainment that promotes harmony and unity, has a direct impact on culture, helps us maintain a healthy lifestyle, and boosts our morale. Football is one of the most popular sports and attracts a great deal of attention, so I felt that working on a football dataset would give me more insight into the game. I headed to Kaggle to source my data. While browsing for datasets, I stumbled on the English Premier League 2021/2022 season dataset and decided to lay my hands on it to gain more insight into the 2021/2022 season. The dataset consists of two files with different content. I downloaded the open-source dataset in comma-separated values (CSV) format and imported it into Excel.

The first dataset is the players' statistics. It consists of 624 rows and 10 columns: the team name, jersey number, player name, position played, appearances (the total number of times a player appears in a match), substitution (the number of times a player is substituted), goals scored (the total goals scored by each player), penalties (the number of penalties played by each player), yellow cards (the number of yellow cards received by the player during the season), and red cards (the number of red cards received by the player during the season).

The second dataset is the English Premier League table. It consists of 20 rows and 9 columns. The columns include the team, GF (the number of goals scored), GA (the number of goals conceded), GD (the goal difference, that is, goals scored minus goals conceded), and Ptds (the total points).

My goals for this analysis are to determine the team that concedes the most goals, the players with the most yellow cards, the team with the fairest play (low yellow and red cards), and the team with the most penalties.


For data cleaning, I removed the duplicates. Some of the data contained special characters, so I used Find and Replace to eliminate them. Also, there was no space between the first name and last name in the player statistics dataset. Inserting that space was the major challenge I encountered when cleaning the data. Thank God for my good friend, YouTube: through her I learned how to add the space using Kutools for Excel. With the names separated, I did the other necessary cleaning and transformed the data into an organized form. I used Power Pivot in Excel to build a data model that combines the two datasets (player statistics and the English Premier League table), joining the two tables on the team column, which is common to both. I then created a pivot table from the data model, which gave me access to all of the columns in both datasets, and analyzed the data to my liking. According to my analysis, Wolverhampton Wanderers is the fairest team with 29%, followed by Watford with 34%.
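For readers who prefer code to Excel, the cleaning and joining steps described above can be roughly sketched in plain Python. This is not what I actually did (I used Find and Replace, Kutools, and Power Pivot); it is only an illustration. The sample rows, the column names (`Team`, `Player`, `Yellow`, `GA`), and the helper functions are all made up for the example.

```python
import re

# Invented sample rows; the column names are assumptions, not the Kaggle headers.
players = [
    {"Team": "Team A", "Player": "JohnSmith", "Yellow": 3},
    {"Team": "Team A", "Player": "JohnSmith", "Yellow": 3},  # exact duplicate
    {"Team": "Team B", "Player": "Ana*Lopez", "Yellow": 1},  # stray character
]
table = [
    {"Team": "Team A", "GA": 40},
    {"Team": "Team B", "GA": 55},
]


def strip_special(text):
    """Keep letters, spaces, hyphens and apostrophes; drop everything else."""
    return "".join(ch for ch in text if ch.isalpha() or ch in " -'")


def split_fused_name(name):
    """Insert a space wherever a lowercase letter is followed by a capital."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)


def clean_players(rows):
    """Fix the player names, then drop rows that are exact duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        fixed = dict(row, Player=split_fused_name(strip_special(row["Player"])))
        key = tuple(sorted(fixed.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(fixed)
    return cleaned


def join_on_team(player_rows, table_rows):
    """Attach each team's league-table columns to its players (inner join)."""
    by_team = {row["Team"]: row for row in table_rows}
    return [{**p, **by_team[p["Team"]]} for p in player_rows if p["Team"] in by_team]


cleaned = clean_players(players)
joined = join_on_team(cleaned, table)
for row in joined:
    print(row)
```

One caveat with the capital-letter rule: a surname that has a capital in the middle (for example, one starting with "Mc") would be split in the wrong place, so the result still needs a manual check.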

Norwich City conceded the most goals, with a total of 84, followed by Leeds United with 79. Chelsea has the most penalties with a total of 13, followed by Manchester City with 11. James Tarkowski of Burnley, Antonio Rüdiger of Chelsea, and Conor Gallagher of Crystal Palace have the most yellow cards, with 12 each. To communicate my findings in a unique and attractive way, I reached out to my friend YouTube again. I found a beautiful dashboard there, learned from it, and tried to replicate parts of it, but I designed mine in my own special way. Gbam! I arrived at my dashboard. Excel, without a doubt, is underrated. I published this project on my GitHub.
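The pivot-table questions above (which team has the most penalties, which has the fewest cards, which players share the most yellow cards) boil down to group-and-sum followed by a max or min. Here is a small Python sketch of that idea. The rows and column names are invented, and ranking fair play by yellow plus red cards is my assumption for the example, not the exact percentage measure from my dashboard.

```python
from collections import defaultdict

# Invented sample rows; not values from the real dataset.
rows = [
    {"Team": "Team A", "Player": "P1", "Penalties": 2, "Yellow": 5, "Red": 0},
    {"Team": "Team A", "Player": "P2", "Penalties": 0, "Yellow": 7, "Red": 1},
    {"Team": "Team B", "Player": "P3", "Penalties": 3, "Yellow": 7, "Red": 0},
    {"Team": "Team B", "Player": "P4", "Penalties": 1, "Yellow": 2, "Red": 0},
]


def totals_by_team(player_rows, columns):
    """Sum the chosen numeric columns per team, like a pivot table would."""
    totals = defaultdict(lambda: dict.fromkeys(columns, 0))
    for row in player_rows:
        for col in columns:
            totals[row["Team"]][col] += row[col]
    return dict(totals)


def top_players(player_rows, column):
    """Return every player tied for the highest value in a column."""
    best = max(row[column] for row in player_rows)
    return [row["Player"] for row in player_rows if row[column] == best]


team_totals = totals_by_team(rows, ["Penalties", "Yellow", "Red"])

# Team with the most penalties.
most_penalties = max(team_totals, key=lambda t: team_totals[t]["Penalties"])

# Assumed fair-play measure: fewest yellow plus red cards.
fewest_cards = min(
    team_totals, key=lambda t: team_totals[t]["Yellow"] + team_totals[t]["Red"]
)

# Players sharing the highest yellow-card count (ties are all kept).
most_booked = top_players(rows, "Yellow")

print(most_penalties, fewest_cards, most_booked)
```

Keeping ties in `top_players` matters here, because in my analysis three players shared the top spot for yellow cards.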


