Overview
GeneralsAI is an experiment in applying large language models to real-time strategy games — specifically, building an AI agent that can reason about complex game states, form multi-step strategic plans, and explain its decisions in plain language.
Design
State encoding: The game map is serialized into a compact structured representation that fits in a context window — unit positions, resource counts, terrain, known enemy positions, and recent game history. This encoding is the hardest design problem: the same facts, presented in a different order or notation, can noticeably change the model's decisions.
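A minimal sketch of what such an encoder could look like. The `GameState`/`Unit` types and the line-oriented layout are illustrative assumptions, not the project's actual API — the point is a stable, labeled, compact text format:

```python
from dataclasses import dataclass, field

@dataclass
class Unit:
    kind: str  # e.g. "infantry"
    x: int
    y: int

@dataclass
class GameState:
    turn: int
    gold: int
    units: list[Unit] = field(default_factory=list)
    known_enemies: list[Unit] = field(default_factory=list)

def encode_state(state: GameState) -> str:
    """Flatten the map into labeled lines the LLM sees in a fixed order."""
    fmt = lambda us: "; ".join(f"{u.kind}@({u.x},{u.y})" for u in us)
    return "\n".join([
        f"TURN {state.turn} | GOLD {state.gold}",
        "UNITS: " + (fmt(state.units) or "none"),
        "ENEMY: " + (fmt(state.known_enemies) or "none sighted"),
    ])
```

Keeping field order and labels fixed across turns also makes the recent-history portion of the prompt diff-friendly.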
Hierarchical planning: The agent operates at two levels. A high-level planner (one LLM call per strategic turn) sets objectives: “expand north”, “defend base”, “attack enemy flank”. A low-level executor (one call per unit action) translates objectives into concrete moves. This two-level structure keeps context windows manageable and produces more coherent strategies than flat prompting.
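The two-level loop can be sketched as follows. `call_llm` and the prompt wording are stand-ins for whatever client and templates the project actually uses — the structure (one planner call, then one executor call per unit) is the point:

```python
def plan_turn(call_llm, state_text: str, units: list[str]) -> dict[str, str]:
    """One strategic turn: a single planner call, then one executor call per unit."""
    # High level: one call sets the objective for the whole turn.
    objective = call_llm(
        f"Game state:\n{state_text}\n"
        "Pick one objective (expand / defend / attack) and name a target."
    )
    # Low level: per-unit prompts stay small because they only carry
    # the objective plus the shared state encoding.
    moves = {}
    for unit in units:
        moves[unit] = call_llm(
            f"Objective: {objective}\nUnit: {unit}\n"
            f"State:\n{state_text}\nReply with one concrete move."
        )
    return moves
```

Because the executor never re-derives strategy, its outputs stay consistent with a single objective for the whole turn, which is where the coherence gain over flat prompting comes from.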
Chain-of-thought: Both planner and executor use CoT prompting. The planner’s reasoning trace is logged and displayed alongside each decision — you can watch the AI argue with itself about whether to rush or expand.
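Logging the trace alongside the decision only needs the reply to follow a fixed contract. A sketch, assuming a `Reasoning:` / `Decision:` format (the actual prompt contract may differ):

```python
def parse_cot(reply: str) -> tuple[str, str]:
    """Split an LLM reply into (reasoning trace, final decision)."""
    reasoning, _, decision = reply.partition("Decision:")
    return reasoning.removeprefix("Reasoning:").strip(), decision.strip()
```

The trace half is what gets displayed next to each move in the UI; the decision half is what the executor actually consumes.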
Evaluation: Agents play round-robin tournaments against each other and against scripted bots. Win rate and average game length are tracked per model/prompt variant.
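The tournament bookkeeping is simple in essence. A minimal sketch (the real harness presumably records more), where each result is `(winner_variant, loser_variant, game_length_in_turns)`:

```python
from collections import defaultdict

def tally(results):
    """Per-variant win rate and average game length from round-robin results."""
    games = defaultdict(int)
    wins = defaultdict(int)
    turns = defaultdict(int)
    for winner, loser, n_turns in results:
        for variant in (winner, loser):
            games[variant] += 1
            turns[variant] += n_turns
        wins[winner] += 1
    return {v: {"win_rate": wins[v] / games[v],
                "avg_turns": turns[v] / games[v]} for v in games}
```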
Interesting Findings
- GPT-4-class models significantly outperform smaller models, particularly in late-game resource management
- CoT prompting improves win rate by ~15% vs. direct prompting (the model reasons through tradeoffs it would otherwise miss)
- The biggest failure mode is over-commitment: LLMs are bad at recognizing when a strategy is failing and cutting their losses
Tech Stack
Python · FastAPI · LangChain · TypeScript · WebSocket (live state streaming)