{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Como analisar dados de Futebol\n", "\n", "Neste vídeo eu trago uma introdução sobre como, no geral, você irá encontrar os dados de futebol. Os tipos de tabelas e como unir todas elas.\n", "\n", "Espero que ajude a sua jornada de conhecimento.\n", "\n", "Link para download da base de dados: https://www.kaggle.com/tiagodatascience/campeonato-paulista-a1\n", "\n", "Link para o vídeo do Youtube: https://www.youtube.com/watch?v=5AwqRsMtkCo&t=637s\n", "\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "import datetime\n", "import pandas as pd\n", "import betfairlightweight\n", "import time\n", "pd.set_option('display.max_columns',50)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "df_jogos = pd.read_csv('../dados/jogos_paulista.csv')\n", "df_stats = pd.read_csv('../dados/stats_paulista.csv')\n", "df_eventos = pd.read_csv('../dados/events_paulista.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tabela de Jogos\n", "\n", "- 1 jogo por linha\n", "- Dados como o estádio, juiz, resultado, id do jogo, nome dos times, etc.\n", "- Base para a união dos dados para as demais tabelas" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['idEvent', 'dateEvent', 'eventCity', 'eventStatus', 'eventElapsed',\n", " 'eventLeague', 'eventRound', 'idHomeTeam', 'strHomeTeam',\n", " 'logoHomeTeam', 'idAwayTeam', 'strAwayTeam', 'logoAwayTeam',\n", " 'intHomeScore', 'intAwayScore', 'intHomeHt', 'intAwayHt', 'date',\n", " 'hour', 'idWinner', 'winner', 'winOrDraw', 'underOver', 'predGoalsHome',\n", " 'predGoalsAway', 'advice', 'formHome', 'formAway', 'attHome', 'attAway',\n", " 'defHome', 'defAway', 'poissonHome', 'poissonAway', 'h2hHome',\n", " 'h2hAway', 'goalsHome', 'goalsAway', 'totalHome', 'totalAway',\n", " 'homeOdd', 'drawOdd', 'awayOdd', 'over05', 'over05Ht', 'under15',\n", " 'under15Ht'],\n", " dtype='object')" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_jogos.columns" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idEventdateEventeventCityeventStatuseventElapsedeventLeagueeventRoundidHomeTeamstrHomeTeamlogoHomeTeamidAwayTeamstrAwayTeamlogoAwayTeamintHomeScoreintAwayScoreintHomeHtintAwayHtdatehouridWinnerwinnerwinOrDrawunderOverpredGoalsHomepredGoalsAwayadviceformHomeformAwayattHomeattAwaydefHomedefAwaypoissonHomepoissonAwayh2hHomeh2hAwaygoalsHomegoalsAwaytotalHometotalAwayhomeOdddrawOddawayOddover05over05Htunder15under15Ht
03178882020-02-02 19:00:00-03:00Campinas, São PauloMatch Finished90Paulista - A1Regular Season - 4138Guarani Campinashttps://media.api-sports.io/football/teams/138...10003Santo Andréhttps://media.api-sports.io/football/teams/100...21112020-02-02 00:00:0019:00:0010003.0Santo AndréFalse1.5-2.5-4.5Combo Winner : Santo André and +1.5 goals3169505050502476000038.861.30.00.00.00.00.00.00.0
13178902020-02-03 20:00:00-03:00Ribeirão Preto, São PauloMatch Finished90Paulista - A1Regular Season - 42618Botafogo SPhttps://media.api-sports.io/football/teams/261...7848Mirassolhttps://media.api-sports.io/football/teams/784...06032020-02-03 00:00:0020:00:007848.0MirassolTrue0.0-1.5-1.5Double chance : draw or Mirassol2575505030700100000035.065.00.00.00.00.00.00.00.0
23178912020-02-03 20:00:00-03:00São Paulo, São PauloMatch Finished90Paulista - A1Regular Season - 4126Sao Paulohttps://media.api-sports.io/football/teams/126...7834Novorizontinohttps://media.api-sports.io/football/teams/783...11002020-02-03 00:00:0020:00:007834.0NovorizontinoTrue-3.5-2.5-2.5Combo Double chance : draw or Novorizontino an...50505050010000000050.050.00.00.00.00.00.00.00.0
33178922020-02-07 19:15:00-03:00Barueri, São PauloMatch Finished90Paulista - A1Regular Season - 5143Oestehttps://media.api-sports.io/football/teams/143...138Guarani Campinashttps://media.api-sports.io/football/teams/138...11112020-02-07 00:00:0019:15:00138.0Guarani CampinasFalse1.5-1.5-4.5Combo Winner : Guarani Campinas and +1.5 goals3070208027738927129693137.562.50.00.00.00.00.00.00.0
43179072020-02-21 16:00:00-03:00Ribeirão Preto, São PauloMatch Finished90Paulista - A1Regular Season - 72618Botafogo SPhttps://media.api-sports.io/football/teams/261...1201Inter De Limeirahttps://media.api-sports.io/football/teams/120...21012020-02-21 00:00:0016:00:001201.0Inter De LimeiraTrue0.0-1.5-2.5Double chance : draw or Inter De Limeira1882435725751090000024.076.00.00.00.00.00.00.00.0
\n", "
" ], "text/plain": [ " idEvent dateEvent eventCity \\\n", "0 317888 2020-02-02 19:00:00-03:00 Campinas, São Paulo \n", "1 317890 2020-02-03 20:00:00-03:00 Ribeirão Preto, São Paulo \n", "2 317891 2020-02-03 20:00:00-03:00 São Paulo, São Paulo \n", "3 317892 2020-02-07 19:15:00-03:00 Barueri, São Paulo \n", "4 317907 2020-02-21 16:00:00-03:00 Ribeirão Preto, São Paulo \n", "\n", " eventStatus eventElapsed eventLeague eventRound \\\n", "0 Match Finished 90 Paulista - A1 Regular Season - 4 \n", "1 Match Finished 90 Paulista - A1 Regular Season - 4 \n", "2 Match Finished 90 Paulista - A1 Regular Season - 4 \n", "3 Match Finished 90 Paulista - A1 Regular Season - 5 \n", "4 Match Finished 90 Paulista - A1 Regular Season - 7 \n", "\n", " idHomeTeam strHomeTeam \\\n", "0 138 Guarani Campinas \n", "1 2618 Botafogo SP \n", "2 126 Sao Paulo \n", "3 143 Oeste \n", "4 2618 Botafogo SP \n", "\n", " logoHomeTeam idAwayTeam \\\n", "0 https://media.api-sports.io/football/teams/138... 10003 \n", "1 https://media.api-sports.io/football/teams/261... 7848 \n", "2 https://media.api-sports.io/football/teams/126... 7834 \n", "3 https://media.api-sports.io/football/teams/143... 138 \n", "4 https://media.api-sports.io/football/teams/261... 1201 \n", "\n", " strAwayTeam logoAwayTeam \\\n", "0 Santo André https://media.api-sports.io/football/teams/100... \n", "1 Mirassol https://media.api-sports.io/football/teams/784... \n", "2 Novorizontino https://media.api-sports.io/football/teams/783... \n", "3 Guarani Campinas https://media.api-sports.io/football/teams/138... \n", "4 Inter De Limeira https://media.api-sports.io/football/teams/120... \n", "\n", " intHomeScore intAwayScore intHomeHt intAwayHt date \\\n", "0 2 1 1 1 2020-02-02 00:00:00 \n", "1 0 6 0 3 2020-02-03 00:00:00 \n", "2 1 1 0 0 2020-02-03 00:00:00 \n", "3 1 1 1 1 2020-02-07 00:00:00 \n", "4 2 1 0 1 2020-02-21 00:00:00 \n", "\n", " hour idWinner winner winOrDraw underOver predGoalsHome \\\n", "0 19:00:00 10003.0 Santo André False 1.5 -2.5 \n", "1 20:00:00 7848.0 Mirassol True 0.0 -1.5 \n", "2 20:00:00 7834.0 Novorizontino True -3.5 -2.5 \n", "3 19:15:00 138.0 Guarani Campinas False 1.5 -1.5 \n", "4 16:00:00 1201.0 Inter De Limeira True 0.0 -1.5 \n", "\n", " predGoalsAway advice formHome \\\n", "0 -4.5 Combo Winner : Santo André and +1.5 goals 31 \n", "1 -1.5 Double chance : draw or Mirassol 25 \n", "2 -2.5 Combo Double chance : draw or Novorizontino an... 50 \n", "3 -4.5 Combo Winner : Guarani Campinas and +1.5 goals 30 \n", "4 -2.5 Double chance : draw or Inter De Limeira 18 \n", "\n", " formAway attHome attAway defHome defAway poissonHome poissonAway \\\n", "0 69 50 50 50 50 24 76 \n", "1 75 50 50 30 70 0 100 \n", "2 50 50 50 0 100 0 0 \n", "3 70 20 80 27 73 8 92 \n", "4 82 43 57 25 75 10 90 \n", "\n", " h2hHome h2hAway goalsHome goalsAway totalHome totalAway homeOdd \\\n", "0 0 0 0 0 38.8 61.3 0.0 \n", "1 0 0 0 0 35.0 65.0 0.0 \n", "2 0 0 0 0 50.0 50.0 0.0 \n", "3 71 29 69 31 37.5 62.5 0.0 \n", "4 0 0 0 0 24.0 76.0 0.0 \n", "\n", " drawOdd awayOdd over05 over05Ht under15 under15Ht \n", "0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "1 0.0 0.0 0.0 0.0 0.0 0.0 \n", "2 0.0 0.0 0.0 0.0 0.0 0.0 \n", "3 0.0 0.0 0.0 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 0.0 0.0 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_jogos.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tabela Estatísticas\n", "\n", "- Pode variar, neste caso são 2 linhas por jogo (1 para o time da casa e 1 para o visitante)\n", "- Trazem os dados como posse de bola, chutes a gol, escanteios, cartões, etc.\n", "- Com os tratamentos, transformando 1 linha por jogo, geralmente é o input para análises e modelos estatísticos" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idEventidTeamstrTeamisHomeshots_on_goalshots_off_goaltotal_shotsblocked_shotsshots_insideboxshots_outsideboxfoulscorner_kicksoffsidesball_possessionyellow_cardsred_cardsgoalkeeper_savestotal_passespasses_accuratepasses_%
031795210018Água SantaTrue4.03.011.04.06.05.015.07.01.042%1.0NaN3.0312.0242.078%
13179527826FerroviáriaFalse2.07.017.08.08.09.013.08.0NaN58%3.0NaN3.0418.0334.080%
23179532618Botafogo SPTrue5.03.012.04.010.02.018.07.01.060%4.0NaN4.0478.0394.082%
331795310018Água SantaFalse5.04.013.04.08.05.010.05.0NaN40%3.0NaN4.0307.0239.078%
431795410018Água SantaTrue2.02.05.01.03.02.020.01.03.035%6.0NaN5.0305.0240.079%
\n", "
" ], "text/plain": [ " idEvent idTeam strTeam isHome shots_on_goal shots_off_goal \\\n", "0 317952 10018 Água Santa True 4.0 3.0 \n", "1 317952 7826 Ferroviária False 2.0 7.0 \n", "2 317953 2618 Botafogo SP True 5.0 3.0 \n", "3 317953 10018 Água Santa False 5.0 4.0 \n", "4 317954 10018 Água Santa True 2.0 2.0 \n", "\n", " total_shots blocked_shots shots_insidebox shots_outsidebox fouls \\\n", "0 11.0 4.0 6.0 5.0 15.0 \n", "1 17.0 8.0 8.0 9.0 13.0 \n", "2 12.0 4.0 10.0 2.0 18.0 \n", "3 13.0 4.0 8.0 5.0 10.0 \n", "4 5.0 1.0 3.0 2.0 20.0 \n", "\n", " corner_kicks offsides ball_possession yellow_cards red_cards \\\n", "0 7.0 1.0 42% 1.0 NaN \n", "1 8.0 NaN 58% 3.0 NaN \n", "2 7.0 1.0 60% 4.0 NaN \n", "3 5.0 NaN 40% 3.0 NaN \n", "4 1.0 3.0 35% 6.0 NaN \n", "\n", " goalkeeper_saves total_passes passes_accurate passes_% \n", "0 3.0 312.0 242.0 78% \n", "1 3.0 418.0 334.0 80% \n", "2 4.0 478.0 394.0 82% \n", "3 4.0 307.0 239.0 78% \n", "4 5.0 305.0 240.0 79% " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_stats.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tabela Eventos\n", "\n", "- 1 linha por tipo de evento/time/jogador/tempo\n", "- Geralmente traz os principais eventos do jogos: Substituições, cartões, gols\n", "- Interessante para montar uma linha do tempo do jogo e entender o efeito destes eventos no resultado. Será que um cartão vermelho acima dos 30 minutos garante que o time com 1 a mais será vencedor? Fica a questão para análises...hehe" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idEventtimeElapsedidTeamstrTeamidPlayerstrPlayertipodetailcomments
0317952327826Ferroviária15278CarlãoCardYellow CardNaN
1317952407826Ferroviária9457RayansubstSubstitution 1NaN
23179525510018Água Santa55298UedersonsubstSubstitution 1NaN
3317952597826Ferroviária80600TonyCardYellow CardNaN
43179526510018Água Santa9852Felipe AzevedosubstSubstitution 2NaN
\n", "
" ], "text/plain": [ " idEvent timeElapsed idTeam strTeam idPlayer strPlayer tipo \\\n", "0 317952 32 7826 Ferroviária 15278 Carlão Card \n", "1 317952 40 7826 Ferroviária 9457 Rayan subst \n", "2 317952 55 10018 Água Santa 55298 Uederson subst \n", "3 317952 59 7826 Ferroviária 80600 Tony Card \n", "4 317952 65 10018 Água Santa 9852 Felipe Azevedo subst \n", "\n", " detail comments \n", "0 Yellow Card NaN \n", "1 Substitution 1 NaN \n", "2 Substitution 1 NaN \n", "3 Yellow Card NaN \n", "4 Substitution 2 NaN " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_eventos.head()" ] } ], "metadata": { "interpreter": { "hash": "108a3afc346c8e3d5ef6dd858082a2a21016866ea5e9f39881138b1a93059fd6" }, "kernelspec": { "display_name": "Python 3.9.7", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }