
FIFA Soccer DS

End-to-end computer vision pipeline for soccer video analysis. Processes footage at 22 FPS with YOLOv8 detection, ByteTrack multi-object tracking, and GraphSAGE for tactical pattern recognition.

Tech stack: Python · YOLOv8 · ByteTrack · PyTorch Geometric · FastAPI · MLflow

Pipeline Architecture

Video Input (MP4/RTSP) → Frame Extraction (30 FPS) → Detection (YOLOv8) → Tracking (ByteTrack) → Graph Build (spatial) → GNN Analysis (GraphSAGE)
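
A minimal sketch of the frame-extraction stage, assuming OpenCV as the video reader; extract_frames and the module path are illustrative, not taken from the repository:

src/ingest/frames.py (hypothetical)
import cv2

def extract_frames(source):
    """Yield (frame_id, frame) from an MP4 file or RTSP URL."""
    cap = cv2.VideoCapture(source)  # cv2 handles both file paths and RTSP URLs
    if not cap.isOpened():
        raise IOError(f"Cannot open video source: {source}")
    frame_id = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of file or dropped stream
            break
        yield frame_id, frame
        frame_id += 1
    cap.release()
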
1. Detection — YOLOv8

Each frame is processed by a fine-tuned YOLOv8 model that identifies three object classes: players, ball, and referees. The model achieves 95%+ precision at a confidence threshold of 0.35, chosen to balance recall against false positives for distant players.

src/detect/inference.py
from ultralytics import YOLO

model = YOLO("models/yolov8n_soccer.pt")

def detect_frame(frame, conf=0.35):
    results = model(frame, conf=conf, verbose=False)
    boxes = results[0].boxes

    return {
        "boxes": boxes.xyxy.cpu().numpy(),  # [x1, y1, x2, y2]
        "scores": boxes.conf.cpu().numpy(),
        "classes": boxes.cls.cpu().numpy()  # 0=player, 1=ball, 2=referee
    }

Example

Frame 001 → 22 detections (players: 20, ball: 1, referee: 1)

Output Format

{
  "frame_id": 1,
  "detections": [
    {"box": [120, 340, 180, 520], "class": "player", "conf": 0.92},
    {"box": [540, 280, 560, 300], "class": "ball", "conf": 0.87},
    ...
  ]
}
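
Converting the raw arrays from detect_frame into the JSON records above takes one small formatting step. A minimal sketch; CLASS_NAMES and to_json_record are illustrative helpers, not part of the repository:

CLASS_NAMES = {0: "player", 1: "ball", 2: "referee"}

def to_json_record(frame_id, det):
    # Zip the parallel arrays returned by detect_frame() into per-detection dicts
    return {
        "frame_id": frame_id,
        "detections": [
            {"box": [int(v) for v in box],
             "class": CLASS_NAMES[int(cls)],
             "conf": round(float(score), 2)}
            for box, score, cls in zip(det["boxes"], det["scores"], det["classes"])
        ],
    }
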
2. Tracking — ByteTrack

ByteTrack maintains consistent player IDs across frames using Kalman filter prediction and Hungarian algorithm matching. Handles occlusions (passes, tackles) with a max_age of 20 frames, allowing track recovery after brief disappearances.

src/track/bytetrack.py
import numpy as np

from src.track.byte_tracker import BYTETracker

tracker = BYTETracker(
    track_thresh=0.5,    # High-confidence detection threshold
    match_thresh=0.8,    # IoU matching threshold
    track_buffer=30,     # Frames to keep lost tracks alive
    frame_rate=30
)

def track_frame(detections, frame_id):
    # Stack boxes and scores into [x1, y1, x2, y2, score] rows
    dets = np.hstack([detections["boxes"],
                      detections["scores"][:, None]])

    tracks = tracker.update(dets, frame_id)
    return tracks  # Each track carries a persistent ID

Example

Player #7 tracked across 847 frames with 3 occlusion recoveries

Output Format

{
  "track_id": 7,
  "history": [
    {"frame": 1, "box": [120, 340, 180, 520]},
    {"frame": 2, "box": [125, 342, 185, 522]},
    ...
  ],
  "total_frames": 847,
  "occlusions_recovered": 3
}
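
At the heart of the matching stage is a bipartite assignment between existing tracks and new detections. A simplified, illustrative sketch of that idea using an IoU cost matrix and SciPy's Hungarian solver; the real BYTETracker also matches against Kalman-predicted boxes and runs a second pass over low-confidence detections:

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(track_boxes, det_boxes):
    """Pairwise IoU between track boxes [T, 4] and detection boxes [D, 4]."""
    ious = np.zeros((len(track_boxes), len(det_boxes)))
    for i, t in enumerate(track_boxes):
        for j, d in enumerate(det_boxes):
            x1, y1 = np.maximum(t[:2], d[:2])   # intersection top-left
            x2, y2 = np.minimum(t[2:], d[2:])   # intersection bottom-right
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            union = ((t[2] - t[0]) * (t[3] - t[1])
                     + (d[2] - d[0]) * (d[3] - d[1]) - inter)
            ious[i, j] = inter / union if union > 0 else 0.0
    return ious

def match(track_boxes, det_boxes, match_thresh=0.8):
    """Hungarian assignment on cost = 1 - IoU; reject matches above threshold."""
    cost = 1.0 - iou_matrix(track_boxes, det_boxes)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= match_thresh]
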
3. Graph Construction

For each frame, we build a spatial graph where nodes are tracked players and edges connect players within a distance threshold (120 px ≈ 8-10 meters on the pitch). This captures the tactical structure — who is near whom, potential passing lanes, defensive coverage.

src/graph/builder.py
import numpy as np
import torch
from torch_geometric.data import Data

def build_frame_graph(tracks, distance_threshold=120.0):
    positions = get_centroids(tracks)  # [N, 2] box centers in pixels
    n_players = len(positions)

    # Connect every pair of players closer than the threshold
    edges = []
    for i in range(n_players):
        for j in range(i + 1, n_players):
            dist = np.linalg.norm(positions[i] - positions[j])
            if dist < distance_threshold:
                edges.append([i, j])
                edges.append([j, i])  # Undirected: store both directions

    edge_index = (torch.tensor(edges, dtype=torch.long).T
                  if edges else torch.empty((2, 0), dtype=torch.long))
    x = torch.tensor(positions, dtype=torch.float)

    return Data(x=x, edge_index=edge_index)

Example

Frame 150: 22 nodes, 47 edges (avg degree: 4.3)

Output Format

Graph(
  x=[22, 2],           # 22 players, (x, y) position
  edge_index=[2, 94],  # 47 bidirectional edges
)
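
For 22 players the O(N²) loop is cheap, but PyTorch Geometric also offers a vectorized builder. A sketch assuming the optional torch-cluster dependency that backs radius_graph is installed:

import torch
from torch_geometric.data import Data
from torch_geometric.nn import radius_graph  # requires torch-cluster

def build_frame_graph_fast(positions, distance_threshold=120.0):
    # Same graph as the loop version, built in a single vectorized call
    x = torch.tensor(positions, dtype=torch.float)      # [N, 2]
    edge_index = radius_graph(x, r=distance_threshold)  # edges in both directions
    return Data(x=x, edge_index=edge_index)
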
4. GraphSAGE Analysis

A 2-layer GraphSAGE model learns 64-dimensional embeddings for each player based on their position and neighborhood. Players in similar tactical roles (e.g., wide midfielders) cluster together in embedding space, enabling formation detection and role classification.

src/models/graphsage.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class TacticalGNN(torch.nn.Module):
    def __init__(self, in_dim=2, hidden_dim=32, out_dim=64):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, out_dim)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index

        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, p=0.2, training=self.training)
        x = self.conv2(x, edge_index)

        return x  # [N, 64] player embeddings

Example

Similar embeddings → similar tactical roles (clustering accuracy: 89%)

Output Format

Player embeddings [22, 64]:
  Player #7 (LW):  [0.12, -0.34, 0.89, ...]
  Player #11 (RW): [0.15, -0.31, 0.85, ...]  # Similar!
  Player #4 (CB):  [-0.67, 0.23, -0.12, ...]  # Different

Cosine similarity LW-RW: 0.94
Cosine similarity LW-CB: 0.12
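
The clustering that produces role labels can be sketched in a few lines. scikit-learn's KMeans with four role groups is an illustrative choice here, not the repository's configuration:

import torch.nn.functional as F
from sklearn.cluster import KMeans

def cluster_roles(embeddings, n_roles=4):
    """Group player embeddings [N, 64] into tactical role clusters."""
    emb = F.normalize(embeddings, dim=1)   # unit length, so dot product = cosine
    sim = emb @ emb.T                      # [N, N] cosine similarity matrix
    roles = KMeans(n_clusters=n_roles, n_init=10).fit_predict(
        emb.detach().cpu().numpy())
    return sim, roles

# Usage: emb = TacticalGNN()(graph); sim, roles = cluster_roles(emb)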

Full Pipeline Example

Processing a 10-second La Liga clip (Barcelona vs Real Madrid) through the complete pipeline:

Input: 10s clip @ 30fps (300 frames, 1920x1080)
Detection: 6,600 bounding boxes (~22 detections/frame)
Tracking: 22 unique tracks, 0 ID switches
Graphs: 300 frame graphs (avg 45 edges/frame)
Output: Formation 4-3-3 vs 4-4-2, role predictions for all 22 players

Run the full pipeline:
# Full pipeline execution
python -m src.pipeline_full \
    --video data/raw/barca_vs_real.mp4 \
    --output-dir outputs/match_analysis \
    --confidence 0.35 \
    --distance-threshold 120.0 \
    --max-age 20

# Output structure:
# outputs/match_analysis/
# ├── detections/          # Per-frame detection JSON
# ├── tracklets.json       # Complete track histories
# ├── graphs/              # PyG graph objects
# ├── embeddings.npy       # Player embeddings [N, 64]
# └── pipeline_summary.json
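
The artifacts can then be consumed directly from the documented output layout. A brief sketch; the summary's keys are not specified above, so only shapes and top-level fields are inspected:

import json
import numpy as np

# Paths follow the output structure shown above
embeddings = np.load("outputs/match_analysis/embeddings.npy")  # [N, 64]
with open("outputs/match_analysis/pipeline_summary.json") as f:
    summary = json.load(f)

print(embeddings.shape, list(summary))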

Performance

22 FPS: real-time inference on an RTX 3070 (8 GB)
95%+ detection precision at 0.35 confidence
0 ID switches on test clips