Turing at ICML 2026 | Booth #B406

Explore off-the-shelf data packs, coding benchmarks, and RL environments for post-training, evaluation, reward modeling, and production.

Talk to a Researcher ⟶

Discuss your evaluation or post-training goals with Turing’s technical team

Work with Turing ⟶

Contribute as a domain expert or explore full-time roles

Trusted by the labs building frontier AI.

Off-the-shelf (OTS) data packs, benchmarks, and RL environments

OTS data packs
Ready to deploy. Validated and calibrated for frontier evaluation, RLVR, reward modeling, benchmarking, fine-tuning, and failure-mode analysis.
  1. Open-MM-RL
  2. MM-STEM-HLE++
  3. HLE++
  4. Advanced Reasoning Rubrics
Coding & SWE evaluation
Deterministic benchmarks built on real-repository tasks and verified outcomes. Reusable across evaluation, SFT, and RL.
  1. CyberBench
  2. SWE-bench++
  3. Code Review Bench
  4. Terminal-Bench
  5. LiveCodeBench
RL environments
Controlled environments for computer-use and MCP agents. Each run includes prompts, tools, workflows, verifiers, reward logic, leaderboards, and full tool-environment traces.
Expert-verified datasets
Human-in-the-loop datasets across enterprise, STEM, and multimodal domains. Domain-precise, fully traceable, and ready for SFT, RL, and evaluation.

Built with teams advancing frontier AI

Turing partners with AI labs and enterprises to build evaluation-safe data, post-training systems, and deployment-ready workflows. We connect research progress to real-world model performance.

Apply your expertise to frontier AI

Contribute as a researcher
Flexible, project-based work. Design hard tasks, evaluate model outputs, and identify reasoning gaps.
Join the Researcher Network
Join Turing full-time
Build post-training systems, evaluation loops, production infrastructure, and real-world model improvement workflows.
Explore full-time roles