Football League Key Player Statistics (Summary of Core Player Data)

Clarifying football statistics request

The user wrote in Chinese about "key player data statistics in a football league," but it's a bit vague. I should ask clarifying questions to understand their needs better, like which league or season they're referring to. I can propose possible deliverables, such as data models, scraping, and analytics metrics. I could also suggest using Python with pandas for computations if they provide CSV data, and offer to generate tables or visualizations based on the data.

Planning statistical analysis

I need to consider sandbox/approval, but I don’t need to run any commands just yet. I'll propose a plan while following the instruction about using the planning tool for more significant tasks. I'll ask clarifying questions and provide a sample in Chinese. The plan will involve choosing a league or season and defining metrics while normalizing per 90 minutes. I’ll include a sample schema for player statistics, calculations for various metrics, and deliverables like Excel or PDF dashboards. I also need to identify key players and propose a scoring system for ranking them.

Suggesting visualizations and analysis

I should provide visualization suggestions like beeswarm plots, radar charts, and percentiles while keeping it concise. Including a code block is okay, and I’ll create a sample script named analyze_league_players.py. I need to remember to ask if the user prefers Chinese or English. I’ll ask four questions: 1) Which league/season, 2) Data source or existing files, 3) Metrics needed, and 4) Output format? I’ll also include a pandas snippet that reads 'players_match_stats.csv' and outputs 'key_players.csv', along with how to define a key player.

Writing the analysis code

I'm getting ready to write some code. First, I'll import pandas and numpy and load the player match statistics from 'players_match_stats.csv'. I need to clean the data by dropping rows where minutes are missing and ensuring minutes are greater than zero. Then I’ll compute a per 90 minutes factor and group the data by player while aggregating several stats, like goals, assists, and shots. I’ll also compute rates, including pass completion and shot conversion. For the composite score, I'll use weighted percentiles based on player positions.

Simplifying composite score calculations

To keep things straightforward, I’ll compute a general composite score as the average of percentiles for metrics like goals+assists per 90 minutes and defensive actions, standardized by position weight. I'll calculate percentiles for players who have over 600 minutes and then take the mean of these available percentiles to identify the top N players for each position. I’ll export the results to CSV and ensure my code accounts for any missing columns by filling them with zeros. Finally, I'll ask for the user's dataset to continue.

What kind of "key player" statistics do you want? Give me the following four items and I can get started quickly:

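League and season: for example, Chinese Super League 2024 / Premier League 2023-24
Data source: whether you already have CSV/Excel/SQL data, or need me to scrape it or connect to an API
Metric scope: traditional stats only (goals/assists), or advanced stats as well (xG/xA, key passes, progressive passes, pressures, etc.)
Output format: CSV/Excel summary, leaderboard, visualizations (radar charts/distribution plots), or a generated report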

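If you already have per-match player data, I can run the script below directly to produce per-90 normalization, percentiles, and a key player leaderboard. You only need to match the file name and column names.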

Example: one-step analysis of per-match player stats (pandas)

Expected input: players_match_stats.csv (one row per player per match)
Required columns: player_id, player_name, team, position, minutes, goals, assists
Recommended columns (used if present): shots, xg, xa, key_passes, progressive_passes, carries, touches, passes_completed, passes_attempted, pressures, tackles, interceptions
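
Before running the full script, it can help to confirm that a file has the required columns. The sketch below is a minimal check and is not part of the script itself; the helper name `missing_required` and the sample rows (Player A, Team X, and so on) are hypothetical and invented purely for illustration.

```python
import pandas as pd

# Required columns, as listed above.
REQUIRED = ["player_id", "player_name", "team", "position", "minutes", "goals", "assists"]

def missing_required(df):
    """Return the required columns that are absent from df (hypothetical helper)."""
    return [c for c in REQUIRED if c not in df.columns]

# Hypothetical per-match rows: one row per player per match.
sample = pd.DataFrame(
    [
        {"player_id": 1, "player_name": "Player A", "team": "Team X",
         "position": "ST", "minutes": 90, "goals": 1, "assists": 0},
        {"player_id": 1, "player_name": "Player A", "team": "Team X",
         "position": "ST", "minutes": 75, "goals": 0, "assists": 1},
        {"player_id": 2, "player_name": "Player B", "team": "Team Y",
         "position": "CB", "minutes": 90, "goals": 0, "assists": 0},
    ]
)

print(missing_required(sample))                            # []
print(missing_required(sample.drop(columns=["minutes"])))  # ['minutes']
```

An empty list means the file is ready for the script; otherwise, rename or add the listed columns first.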

# analyze_league_players.py
import pandas as pd
import numpy as np

# 1) Load the data
df = pd.read_csv("players_match_stats.csv")

# 2) Make sure the required columns exist
required = ["player_id","player_name","team","position","minutes","goals","assists"]
for col in required:
    if col not in df.columns:
        raise ValueError(f"Missing required column: {col}")

optional = {
    "shots":0,"xg":0.0,"xa":0.0,"key_passes":0,"progressive_passes":0,
    "carries":0,"touches":0,"passes_completed":0,"passes_attempted":0,
    "pressures":0,"tackles":0,"interceptions":0
}
for col,fill in optional.items():
    if col not in df.columns:
        df[col] = fill

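# 3) Clean: coerce stat columns to numbers and drop rows with no playing time
df = df.copy()
num_cols = ["minutes", "goals", "assists"] + list(optional)
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors="coerce").fillna(0)
df = df[df["minutes"] > 0]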

# 4) Aggregate per player (across matches)
agg_dict = {
    "team":"last","position":"last",
    "minutes":"sum","goals":"sum","assists":"sum","shots":"sum","xg":"sum","xa":"sum",
    "key_passes":"sum","progressive_passes":"sum","carries":"sum","touches":"sum",
    "passes_completed":"sum","passes_attempted":"sum",
    "pressures":"sum","tackles":"sum","interceptions":"sum"
}
g = df.groupby(["player_id","player_name"], as_index=False).agg(agg_dict)

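# 5) Compute per-90 values and ratios
m = g["minutes"].replace(0, np.nan)  # guard against division by zero

def per90(x):
    """Scale a season total to a per-90-minutes rate."""
    return x / m * 90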

g["g_plus_a"] = g["goals"] + g["assists"]
g["g_plus_a90"] = per90(g["g_plus_a"])
g["xg90"] = per90(g["xg"])
g["xa90"] = per90(g["xa"])
g["shots90"] = per90(g["shots"])
g["kp90"] = per90(g["key_passes"])
g["prog_pass90"] = per90(g["progressive_passes"])
g["pressures90"] = per90(g["pressures"])
g["tack_int90"] = per90(g["tackles"] + g["interceptions"])

g["pass_comp"] = np.where(g["passes_attempted"]>0, g["passes_completed"]/g["passes_attempted"], np.nan)
g["shot_conv"] = np.where(g["shots"]>0, g["goals"]/g["shots"], np.nan)

# 6) Set a minimum-minutes threshold (avoids small-sample bias)
MIN_MINUTES = 600
pool = g[g["minutes"] >= MIN_MINUTES].copy()
if pool.empty:
    raise ValueError("Empty sample: lower MIN_MINUTES or check the data.")

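# 7) Percentiles and composite score (generic version)
metrics = ["g_plus_a90","xg90","xa90","shots90","kp90","prog_pass90","tack_int90","pressures90"]
for col in metrics:
    if pool[col].fillna(0).eq(0).all():
        # All zeros (e.g. the source column was missing and zero-filled):
        # the metric carries no information, so score_row will skip it
        pool[col + "_pct"] = np.nan
        continue
    # Percentile rank (0-100)
    pool[col + "_pct"] = pool[col].rank(pct=True) * 100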

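# Map raw positions to groups so weights can differ by position (adjust as needed)
def pos_group(p):
    if pd.isna(p): return "OTH"
    p = str(p).upper()
    # Check wing-backs first: "LWB"/"RWB" contain "LW"/"RW" and would otherwise match ATT
    if "WB" in p: return "DEF"
    if any(k in p for k in ["ST","CF","FW","LW","RW","WF"]): return "ATT"
    if any(k in p for k in ["AM","CM","DM","MID"]): return "MID"
    if any(k in p for k in ["CB","FB","LB","RB","DEF"]): return "DEF"
    return "OTH"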

pool["pos_group"] = pool["position"].apply(pos_group)

weights = {
    "ATT": {"g_plus_a90":3,"xg90":2,"xa90":2,"shots90":1,"kp90":1,"prog_pass90":1,"tack_int90":0.5,"pressures90":0.5},
    "MID": {"g_plus_a90":1.5,"xg90":1,"xa90":2,"shots90":0.5,"kp90":2,"prog_pass90":2,"tack_int90":1.5,"pressures90":1.5},
    "DEF": {"g_plus_a90":0.5,"xg90":0.5,"xa90":0.5,"shots90":0.2,"kp90":0.8,"prog_pass90":1.5,"tack_int90":3,"pressures90":2},
    "OTH": {"g_plus_a90":1,"xg90":1,"xa90":1,"shots90":1,"kp90":1,"prog_pass90":1,"tack_int90":1,"pressures90":1}
}

def score_row(r):
    w = weights.get(r["pos_group"], weights["OTH"])
    s = 0.0
    tot = 0.0
    for mcol, wt in w.items():
        v = r[mcol + "_pct"]
        if pd.notna(v) and wt>0:
            s += v * wt
            tot += wt
    return s / tot if tot>0 else np.nan

pool["key_player_score"] = pool.apply(score_row, axis=1)

# 8) Write the leaderboards
cols_out = [
    "player_id","player_name","team","position","minutes",
    "g_plus_a","g_plus_a90","xg90","xa90","shots90","kp90","prog_pass90",
    "tack_int90","pressures90","pass_comp","shot_conv",
    "key_player_score"
]
# Keep only the summary columns in the overall file (cols_out was previously unused)
pool.sort_values("key_player_score", ascending=False)[cols_out].to_csv("key_players_overall.csv", index=False)

# Top 10 per position group
topk = []
for grp in ["ATT","MID","DEF","OTH"]:
    part = pool[pool["pos_group"]==grp].nlargest(10, "key_player_score")
    topk.append(part)
pd.concat(topk, ignore_index=True).to_csv("key_players_by_position_top10.csv", index=False)

print("Generated: key_players_overall.csv, key_players_by_position_top10.csv")
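
To make the core steps of the script concrete, here is a small worked sketch of per-90 normalization, percentile ranking, and the weighted average of percentiles. The four players, their numbers, and the helper `weighted_score` are hypothetical; only the formulas mirror the script.

```python
import math
import pandas as pd

# Hypothetical season totals, invented for illustration only.
season = pd.DataFrame({
    "player_name": ["A", "B", "C", "D"],
    "minutes": [900, 1800, 450, 2700],
    "goals": [5, 5, 3, 9],
})

# Per-90 rate: total / minutes * 90 (same formula as per90 in the script).
season["goals90"] = season["goals"] / season["minutes"] * 90

# Percentile rank on a 0-100 scale; the best rate in the pool gets 100.
season["goals90_pct"] = season["goals90"].rank(pct=True) * 100

def weighted_score(pcts, weights):
    """Weighted mean of percentiles, skipping metrics that are missing (NaN)."""
    total = 0.0
    weight_sum = 0.0
    for metric, weight in weights.items():
        value = pcts.get(metric, float("nan"))
        if weight > 0 and not math.isnan(value):
            total += value * weight
            weight_sum += weight
    return total / weight_sum if weight_sum > 0 else float("nan")

print(season[["player_name", "goals90", "goals90_pct"]])
print(weighted_score(
    {"g_plus_a90": 80.0, "xg90": 50.0, "kp90": float("nan")},
    {"g_plus_a90": 3, "xg90": 2, "kp90": 1},
))  # 68.0
```

Players A and B score the same number of goals, but A does so in half the minutes, so A's percentile is 75 against B's 25. Player C tops the list on only 450 minutes, which is the kind of small-sample result the MIN_MINUTES threshold is meant to filter out.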

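Possible extensions

Non-penalty metrics: if you have np_goals / pen_goals columns, you can replace goals with non-penalty goals or add np_g90.
Percentile scope: percentiles can be computed within each position group instead, to avoid cross-position bias.
Visualization: radar charts (single player), beeswarm plots (by position), cumulative distributions (sensitivity to the minutes threshold).
Small-sample control: raise the minutes threshold, or apply Empirical Bayes shrinkage to the per-90 values.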
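
Two of the extensions above can be sketched briefly. The first function ranks a metric within each position group rather than across the whole pool. The second pulls per-90 rates toward the league average by adding pseudo-minutes played at the league rate; this is a simple pseudo-count shrinkage standing in for a full Empirical Bayes fit, and `prior_minutes=900` is an arbitrary assumed value, not something estimated from data. The function names and the toy data are hypothetical.

```python
import pandas as pd

def pct_within_group(df, metric, group_col="pos_group"):
    """Percentile rank (0-100) of a metric, computed separately inside each group."""
    return df.groupby(group_col)[metric].rank(pct=True) * 100

def shrunk_per90(events, minutes, prior_minutes=900.0):
    """Per-90 rate shrunk toward the league-wide rate.

    Each player is credited with prior_minutes extra minutes played at the
    league-average rate, so low-minute players move most toward the average.
    """
    league_rate = events.sum() / minutes.sum() * 90
    prior_events = league_rate * prior_minutes / 90
    return (events + prior_events) / (minutes + prior_minutes) * 90

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "pos_group": ["ATT", "ATT", "DEF", "DEF"],
    "g_plus_a90": [0.80, 0.40, 0.10, 0.05],
    "goals": [2, 10, 1, 0],
    "minutes": [90, 1800, 900, 900],
})

toy["g_plus_a90_pct_pos"] = pct_within_group(toy, "g_plus_a90")
toy["goals90_raw"] = toy["goals"] / toy["minutes"] * 90
toy["goals90_shrunk"] = shrunk_per90(toy["goals"], toy["minutes"])
print(toy)
```

With within-group ranking, the best defender reaches the 100th percentile in the DEF group even though his attacking output is far below any forward's. The first player's raw rate of 2.0 goals per 90 from a single match shrinks to a value much closer to the league average.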

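Tell me your league and season and whether you already have a data file (send me a sample of the column names), and I will produce the finished leaderboards and charts to your specification.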