
FIFA Soccer DS

End-to-end computer vision pipeline for soccer video analysis. Processes footage at 22 FPS with YOLOv8 detection, ByteTrack multi-object tracking, and GraphSAGE for tactical pattern recognition.

Tech stack: Python · YOLOv8 · ByteTrack · PyTorch Geometric · FastAPI · MLflow

Pipeline Architecture

Video Input (MP4/RTSP) → Frame Extraction (30 FPS) → Detection (YOLOv8) → Tracking (ByteTrack) → Graph Build (spatial) → GNN Analysis (GraphSAGE)
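
A minimal sketch of the frame-extraction stage, assuming OpenCV as the video reader; extract_frames and the module path are illustrative, not taken from the repository:

src/ingest/frames.py (hypothetical)
import cv2

def extract_frames(source):
    """Yield (frame_id, frame) from an MP4 file or RTSP URL."""
    cap = cv2.VideoCapture(source)  # cv2 handles both file paths and RTSP URLs
    if not cap.isOpened():
        raise IOError(f"Cannot open video source: {source}")
    frame_id = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of file or dropped stream
            break
        yield frame_id, frame
        frame_id += 1
    cap.release()
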
1. Detection — YOLOv8

Each frame is processed by a fine-tuned YOLOv8 model that identifies three object classes: players, ball, and referees. The model achieves 95%+ precision at a confidence threshold of 0.35, chosen to balance recall against false positives for distant players.

src/detect/inference.py
from ultralytics import YOLO

model = YOLO("models/yolov8n_soccer.pt")

def detect_frame(frame, conf=0.35):
    results = model(frame, conf=conf, verbose=False)
    boxes = results[0].boxes

    return {
        "boxes": boxes.xyxy.cpu().numpy(),  # [x1, y1, x2, y2]
        "scores": boxes.conf.cpu().numpy(),
        "classes": boxes.cls.cpu().numpy()  # 0=player, 1=ball, 2=referee
    }

Example

Frame 001 → 22 detections (players: 20, ball: 1, referee: 1)

Output Format

{
  "frame_id": 1,
  "detections": [
    {"box": [120, 340, 180, 520], "class": "player", "conf": 0.92},
    {"box": [540, 280, 560, 300], "class": "ball", "conf": 0.87},
    ...
  ]
}
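
Converting the raw arrays from detect_frame into the JSON records above takes one small formatting step. A minimal sketch; CLASS_NAMES and to_json_record are illustrative helpers, not part of the repository:

CLASS_NAMES = {0: "player", 1: "ball", 2: "referee"}

def to_json_record(frame_id, det):
    # Zip the parallel arrays returned by detect_frame() into per-detection dicts
    return {
        "frame_id": frame_id,
        "detections": [
            {"box": [int(v) for v in box],
             "class": CLASS_NAMES[int(cls)],
             "conf": round(float(score), 2)}
            for box, score, cls in zip(det["boxes"], det["scores"], det["classes"])
        ],
    }
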
2. Tracking — ByteTrack

ByteTrack maintains consistent player IDs across frames using Kalman filter prediction and Hungarian algorithm matching. Handles occlusions (passes, tackles) with a max_age of 20 frames, allowing track recovery after brief disappearances.

src/track/bytetrack.py
import numpy as np

from src.track.byte_tracker import BYTETracker

tracker = BYTETracker(
    track_thresh=0.5,    # High-confidence detection threshold
    match_thresh=0.8,    # IoU matching threshold
    track_buffer=30,     # Frames to keep lost tracks alive
    frame_rate=30
)

def track_frame(detections, frame_id):
    # Stack boxes and scores into [x1, y1, x2, y2, score] rows
    dets = np.hstack([detections["boxes"],
                      detections["scores"][:, None]])

    tracks = tracker.update(dets, frame_id)
    return tracks  # Each track carries a persistent ID

Example

Player #7 tracked across 847 frames with 3 occlusion recoveries

Output Format

{
  "track_id": 7,
  "history": [
    {"frame": 1, "box": [120, 340, 180, 520]},
    {"frame": 2, "box": [125, 342, 185, 522]},
    ...
  ],
  "total_frames": 847,
  "occlusions_recovered": 3
}
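
At the heart of the matching stage is a bipartite assignment between existing tracks and new detections. A simplified, illustrative sketch of that idea using an IoU cost matrix and SciPy's Hungarian solver; the real BYTETracker also matches against Kalman-predicted boxes and runs a second pass over low-confidence detections:

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(track_boxes, det_boxes):
    """Pairwise IoU between track boxes [T, 4] and detection boxes [D, 4]."""
    ious = np.zeros((len(track_boxes), len(det_boxes)))
    for i, t in enumerate(track_boxes):
        for j, d in enumerate(det_boxes):
            x1, y1 = np.maximum(t[:2], d[:2])   # intersection top-left
            x2, y2 = np.minimum(t[2:], d[2:])   # intersection bottom-right
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            union = ((t[2] - t[0]) * (t[3] - t[1])
                     + (d[2] - d[0]) * (d[3] - d[1]) - inter)
            ious[i, j] = inter / union if union > 0 else 0.0
    return ious

def match(track_boxes, det_boxes, match_thresh=0.8):
    """Hungarian assignment on cost = 1 - IoU; reject matches above threshold."""
    cost = 1.0 - iou_matrix(track_boxes, det_boxes)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= match_thresh]
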
3. Graph Construction

For each frame, we build a spatial graph where nodes are tracked players and edges connect players within a distance threshold (120 px ≈ 8-10 meters on the pitch). This captures the tactical structure — who is near whom, potential passing lanes, defensive coverage.

src/graph/builder.py
import numpy as np
import torch
from torch_geometric.data import Data

def build_frame_graph(tracks, distance_threshold=120.0):
    positions = get_centroids(tracks)  # [N, 2] box centers in pixels
    n_players = len(positions)

    # Connect every pair of players closer than the threshold
    edges = []
    for i in range(n_players):
        for j in range(i + 1, n_players):
            dist = np.linalg.norm(positions[i] - positions[j])
            if dist < distance_threshold:
                edges.append([i, j])
                edges.append([j, i])  # Undirected: store both directions

    edge_index = (torch.tensor(edges, dtype=torch.long).T
                  if edges else torch.empty((2, 0), dtype=torch.long))
    x = torch.tensor(positions, dtype=torch.float)

    return Data(x=x, edge_index=edge_index)

Example

Frame 150: 22 nodes, 47 edges (avg degree: 4.3)

Output Format

Graph(
  x=[22, 2],           # 22 players, (x, y) position
  edge_index=[2, 94],  # 47 bidirectional edges
)
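
For 22 players the O(N²) loop is cheap, but PyTorch Geometric also offers a vectorized builder. A sketch assuming the optional torch-cluster dependency that backs radius_graph is installed:

import torch
from torch_geometric.data import Data
from torch_geometric.nn import radius_graph  # requires torch-cluster

def build_frame_graph_fast(positions, distance_threshold=120.0):
    # Same graph as the loop version, built in a single vectorized call
    x = torch.tensor(positions, dtype=torch.float)      # [N, 2]
    edge_index = radius_graph(x, r=distance_threshold)  # edges in both directions
    return Data(x=x, edge_index=edge_index)
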
4. GraphSAGE Analysis

A 2-layer GraphSAGE model learns 64-dimensional embeddings for each player based on their position and neighborhood. Players in similar tactical roles (e.g., wide midfielders) cluster together in embedding space, enabling formation detection and role classification.

src/models/graphsage.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class TacticalGNN(torch.nn.Module):
    def __init__(self, in_dim=2, hidden_dim=32, out_dim=64):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, out_dim)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index

        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, p=0.2, training=self.training)
        x = self.conv2(x, edge_index)

        return x  # [N, 64] player embeddings

Example

Similar embeddings → similar tactical roles (clustering accuracy: 89%)

Output Format

Player embeddings [22, 64]:
  Player #7 (LW):  [0.12, -0.34, 0.89, ...]
  Player #11 (RW): [0.15, -0.31, 0.85, ...]  # Similar!
  Player #4 (CB):  [-0.67, 0.23, -0.12, ...]  # Different

Cosine similarity LW-RW: 0.94
Cosine similarity LW-CB: 0.12
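
The clustering that produces role labels can be sketched in a few lines. scikit-learn's KMeans with four role groups is an illustrative choice here, not the repository's configuration:

import torch.nn.functional as F
from sklearn.cluster import KMeans

def cluster_roles(embeddings, n_roles=4):
    """Group player embeddings [N, 64] into tactical role clusters."""
    emb = F.normalize(embeddings, dim=1)   # unit length, so dot product = cosine
    sim = emb @ emb.T                      # [N, N] cosine similarity matrix
    roles = KMeans(n_clusters=n_roles, n_init=10).fit_predict(
        emb.detach().cpu().numpy())
    return sim, roles

# Usage: emb = TacticalGNN()(graph); sim, roles = cluster_roles(emb)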

Full Pipeline Example

Processing a 10-second La Liga clip (Barcelona vs Real Madrid) through the complete pipeline:

Input: 10s clip @ 30fps (300 frames, 1920x1080)
Detection: 6,600 bounding boxes (~22 detections/frame)
Tracking: 22 unique tracks, 0 ID switches
Graphs: 300 frame graphs (avg 45 edges/frame)
Output: Formation 4-3-3 vs 4-4-2, role predictions for all 22 players

Run the full pipeline:
# Full pipeline execution
python -m src.pipeline_full \
    --video data/raw/barca_vs_real.mp4 \
    --output-dir outputs/match_analysis \
    --confidence 0.35 \
    --distance-threshold 120.0 \
    --max-age 20

# Output structure:
# outputs/match_analysis/
# ├── detections/          # Per-frame detection JSON
# ├── tracklets.json       # Complete track histories
# ├── graphs/              # PyG graph objects
# ├── embeddings.npy       # Player embeddings [N, 64]
# └── pipeline_summary.json
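
The artifacts can then be consumed directly from the documented output layout. A brief sketch; the summary's keys are not specified above, so only shapes and top-level fields are inspected:

import json
import numpy as np

# Paths follow the output structure shown above
embeddings = np.load("outputs/match_analysis/embeddings.npy")  # [N, 64]
with open("outputs/match_analysis/pipeline_summary.json") as f:
    summary = json.load(f)

print(embeddings.shape, list(summary))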

Performance

22 FPS: real-time inference on an RTX 3070 (8 GB)
95%+ detection precision at 0.35 confidence
0 ID switches on test clips