Turing at ICML 2026 | Booth #B406
Explore off-the-shelf data packs, coding benchmarks, and RL environments for post-training, evaluation, reward modeling, and production.

Talk to a Researcher ⟶
Discuss your evaluation or post-training goals with Turing’s technical team
Work with Turing ⟶
Contribute as a domain expert or explore full-time roles
Off-the-shelf (OTS) data packs, benchmarks, and RL environments
OTS data packs
Ready to deploy. Validated and calibrated for frontier evaluation, RLVR, reward modeling, benchmarking, fine-tuning, and failure-mode analysis.
- Open-MM-RL
- MM-STEM-HLE++
- HLE++
- Advanced Reasoning Rubrics
Coding & SWE evaluation
Deterministic benchmarks built on real-repository tasks and verified outcomes. Reusable across evaluation, SFT, and RL.
- CyberBench
- SWE-bench++
- Code Review Bench
- Terminal-Bench
- LiveCodeBench
RL environments
Controlled environments for computer-use and MCP agents. Each run includes prompts, tools, workflows, verifiers, reward logic, leaderboards, and full tool-environment traces.
Expert-verified datasets
Human-in-the-loop datasets across enterprise, STEM, and multimodal domains. Domain-precise, fully traceable, and ready for SFT, RL, and evaluation.
Built with teams advancing frontier AI
Turing partners with AI labs and enterprises to build evaluation-safe data, post-training systems, and deployment-ready workflows. We connect research progress to real-world model performance.
Apply your expertise to frontier AI
Contribute as a researcher
Flexible, project-based work. Design hard tasks, evaluate model outputs, and identify reasoning gaps.
Join the Researcher Network
Join Turing full-time
Build post-training systems, evaluation loops, production infrastructure, and real-world model improvement workflows.
Explore full-time roles







