Gaming

Game-Based Data Capture for Real-World Simulation

10,000+ hours of synchronized gameplay data

Challenge: Training agents to behave like humans in 3D environments requires paired observation-action data (synchronized video of what the player sees with a precise record of what they do), and no commercial capture tool provides this.

Solution: We designed and built a custom capture application from scratch.

Result: The synchronized dataset enabled the lab to train agents that predict human actions from visual observations with significantly higher fidelity than models trained on video-only data.

// THE CHALLENGE

Training agents to behave like humans in 3D environments requires paired observation-action data — synchronized video of what the player sees with a precise record of what they do — and no commercial capture tool provides this. Existing screen-recording software captures pixels but discards control inputs entirely. Game telemetry APIs expose some state variables but not raw keystrokes at frame-level resolution. The lab needed a purpose-built solution that could record both streams with sub-frame temporal alignment, scale across different game engines and input devices, and sustain long capture sessions (4+ hours) without data loss, clock drift, or performance degradation on consumer hardware.

// OUR APPROACH

We designed and built a custom capture application from scratch. The system performs simultaneous screen recording at native resolution and raw input logging, capturing every keystroke, mouse movement, and controller input as structured data with microsecond-precision timestamps. Frame-level alignment between the video and control streams is maintained via a shared monotonic clock, with periodic sync markers to detect and correct any drift.
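
As a rough illustration of frame-level alignment on a shared clock, the sketch below maps each input event to the latest video frame whose timestamp is at or before the event. It assumes both streams carry microsecond timestamps from the same monotonic clock. The function name and the sample values are hypothetical and are not taken from the capture application itself.

```python
from bisect import bisect_right

def align_events_to_frames(frame_ts_us, event_ts_us):
    """Map each event to the index of the latest frame at or before it.

    frame_ts_us must be sorted ascending. Events that occur before the
    first frame map to -1.
    """
    return [bisect_right(frame_ts_us, t) - 1 for t in event_ts_us]

# Hypothetical timestamps in microseconds, roughly 60 fps frame spacing.
frames = [0, 16_667, 33_333, 50_000]
events = [5_000, 16_667, 40_000, 60_000]
print(align_events_to_frames(frames, events))
```

Because both lists share one time base, no cross-clock conversion is needed. Drift detection via sync markers would sit on top of this and is not shown.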

The application was engineered for robustness during sustained sessions. Memory management, disk I/O buffering, and CPU scheduling were tuned to prevent frame drops or input lag during 4+ hour recording windows. We validated capture fidelity by replaying logged inputs against recorded footage and measuring temporal alignment error, which was consistently under 16 ms (less than one frame period of about 16.7 ms at 60 fps).
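
The arithmetic behind the one-frame threshold can be sketched as follows. This is only an illustration: it assumes the validation step yields pairs of logged and observed timestamps in microseconds, and the helper names and sample numbers are hypothetical.

```python
def frame_period_ms(fps):
    """Duration of one frame in milliseconds."""
    return 1000.0 / fps

def max_alignment_error_ms(logged_us, observed_us):
    """Largest absolute gap between paired timestamps, in milliseconds."""
    return max(abs(a - b) for a, b in zip(logged_us, observed_us)) / 1000.0

# Hypothetical pairs: when an input was logged vs. when it was observed in footage.
logged = [0, 1_000_000]
observed = [4_000, 990_000]

error_ms = max_alignment_error_ms(logged, observed)
print(error_ms, error_ms < frame_period_ms(60))
```

At 30 fps the frame period is about 33.3 ms, so an error under 16 ms is less than half a frame at that rate.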

The solution scaled across multiple game titles spanning first-person and third-person 3D environments. Player profiles were diverse by skill level and playstyle. Output format was standardized: per-frame JPEG streams paired with CSV control logs, each row containing timestamp, input device, key/axis, and value. A master manifest mapped each session to game title, player demographics, and session metadata.
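
A minimal sketch of reading one such control log is shown below. The column names (`timestamp_us`, `device`, `key`, `value`) are assumptions chosen to match the four fields described above, and the sample rows are made up. The real header names and units may differ.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class ControlEvent:
    timestamp_us: int   # assumed unit: microseconds
    device: str         # e.g. "keyboard", "mouse", "controller"
    key: str            # key name or axis name
    value: float        # 1/0 for key down/up, or an axis reading

def read_control_log(fileobj):
    """Parse a CSV control log with the assumed header into events."""
    reader = csv.DictReader(fileobj)
    return [
        ControlEvent(int(r["timestamp_us"]), r["device"], r["key"], float(r["value"]))
        for r in reader
    ]

# Hypothetical two-row log: D pressed, then released.
sample = "timestamp_us,device,key,value\n750000,keyboard,D,1\n960000,keyboard,D,0\n"
events = read_control_log(io.StringIO(sample))
print(events[1].timestamp_us - events[0].timestamp_us)
```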

01 Design: Build custom capture application
02 Record: Simultaneous video + raw input logging
03 Sync: Frame-level timestamp alignment
04 Scale: 10,000+ hours across game types
// RESULTS
10,000+ hours of synchronized gameplay data
<16 ms video-to-input temporal alignment error
Custom capture solution built from scratch
0 data loss incidents across all sessions
// IMPACT

The synchronized dataset enabled the lab to train agents that predict human actions from visual observations with significantly higher fidelity than models trained on video-only data. The frame-level action labels eliminated the need for inverse dynamics models to infer intent from pixel changes — a noisy and lossy intermediate step that had been a primary error source in prior work. The dataset also served as a benchmark for evaluating action-prediction architectures, with the control stream providing ground-truth supervision.

Sample Capture Data

Real gameplay with synchronized input telemetry. Each clip includes the raw keystroke and mouse data captured alongside the video at microsecond precision.

Red Dead Redemption 2
input-stream.jsonl
0.75s keydown D
0.83s mousemove 599:1110
0.86s mousemove 599:1110
0.88s mousemove 597:1110
0.90s mousemove 597:1110
0.90s mousemove 596:1110
0.91s mousemove 594:1110
0.92s mousemove 593:1110
0.93s mousemove 591:1110
0.94s mousemove 588:1110
0.94s mousemove 585:1110
0.95s mousemove 582:1110
0.96s keyup D
0.96s mousemove 579:1110
0.97s mousemove 578:1110
Resolution: 1280x720 | FPS: 30 | Sync Error: <16ms | Events: 630 | Events/sec: 53 | Cost/sec: $0.004
PUBG: Battlegrounds
input-stream.jsonl
0.02s mousemove 1914:214
0.06s mousemove 1913:214
0.11s mousemove 1913:214
0.13s mousemove 1912:214
0.15s mousemove 1912:214
0.16s mousemove 1911:214
0.17s mousemove 1911:213
0.18s mousemove 1910:213
0.19s mousemove 1910:213
0.19s mousemove 1909:213
0.20s mousemove 1909:213
0.21s mousemove 1908:213
0.22s mousemove 1908:213
0.26s mousemove 1907:213
0.27s mousemove 1907:212
Resolution: 1280x720 | FPS: 30 | Sync Error: <16ms | Events: 978 | Events/sec: 82 | Cost/sec: $0.004
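
The event lines in the clips above follow a simple "time, event type, payload" shape, which the sketch below parses. It covers only the displayed text form, not necessarily the underlying input-stream.jsonl schema. Reading the `599:1110` payload as an x:y pair is an assumption, and the function name is hypothetical.

```python
def parse_event_line(line):
    """Parse a displayed event line such as '0.75s keydown D'."""
    time_part, kind, payload = line.split(maxsplit=2)
    t = float(time_part.rstrip("s"))  # seconds from clip start
    if kind == "mousemove":
        # Assumed to be x:y coordinates; the source does not label them.
        x, y = (int(v) for v in payload.split(":"))
        return {"t": t, "type": kind, "x": x, "y": y}
    return {"t": t, "type": kind, "key": payload}

print(parse_event_line("0.75s keydown D"))
print(parse_event_line("0.83s mousemove 599:1110"))
```

The mouse values exceed the 1280x720 clip resolution, so they are presumably recorded in the capture machine's own coordinate space rather than in clip pixels.
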

Ready to build your next dataset?

Tell us about your project and we will scope a plan within 48 hours.