

Turing at ICLR 2026 | Booth #301
Explore RL environments, evaluation infrastructure, benchmarks, and real-world data systems for post-training research.
RL environments
Production-grade UI and MCP environments for agent training and evaluation. Each environment includes prompts, verifiers, and reward logic for controlled experimentation.
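To make that structure concrete, here is a minimal Python sketch of a prompt/verifier/reward environment. The names (`TaskEnvironment`, `verifier`, `reward_fn`) are illustrative assumptions, not Turing's actual API.

```python
# Minimal sketch of a prompt/verifier/reward environment interface.
# All names here are illustrative assumptions, not Turing's actual API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TaskEnvironment:
    prompt: str                         # task shown to the agent
    verifier: Callable[[str], bool]     # checks the agent's final answer
    reward_fn: Callable[[bool], float]  # maps verifier outcome to a scalar reward

    def score(self, agent_output: str) -> float:
        """Run the verifier on the agent's output and return its reward."""
        return self.reward_fn(self.verifier(agent_output))

# Example: a toy arithmetic task with a binary reward.
env = TaskEnvironment(
    prompt="What is 17 * 3?",
    verifier=lambda answer: answer.strip() == "51",
    reward_fn=lambda passed: 1.0 if passed else 0.0,
)
print(env.score("51"))  # -> 1.0
```

Keeping the verifier and reward separate from the prompt is what makes controlled experimentation possible: the same task can be re-scored under different reward shapings without touching the environment itself.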
Benchmarks and evaluation
Reproducible scoring across unified execution environments, built on real defects and tasks, with semantic-aware tests and versioned runs for full auditability (a sketch of one such run record follows the benchmark list below).
End-to-end evaluation for software engineering agents: 500 public and 7,000+ commercial tasks.
700+ open-ended multimodal reasoning tasks across STEM and business domains.
Evaluating agentic coding partners on difficult code-review tasks: 1,200 public and 6,296 commercial tasks.
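As an illustration of what a versioned, auditable run record can look like, here is a minimal sketch. The field names and hashing scheme are assumptions for illustration, not Turing's internal tooling.

```python
# Minimal sketch of a versioned run record for reproducible scoring.
# Field names are assumptions for illustration only.
import hashlib
import json
import time

def run_manifest(benchmark: str, benchmark_version: str,
                 model: str, scores: dict) -> dict:
    """Bundle a scoring run with enough metadata to replay or audit it."""
    payload = {
        "benchmark": benchmark,
        "benchmark_version": benchmark_version,  # pin the exact task set
        "model": model,
        "scores": scores,
        "timestamp": time.time(),
    }
    # A content hash lets later readers detect any post-hoc edits.
    payload["digest"] = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    return payload

print(run_manifest("code-review", "v1.2", "model-x", {"pass@1": 0.41}))
```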
Off-the-shelf data packs
Calibrated, ready-to-deploy datasets built for frontier model evaluation. Each pack ships in standard formats and is compatible with your existing harness (an illustrative loading sketch follows this list).
Terminal-bench-style reasoning tasks in Harbor format for hill-climbing, with ~33–40% resolution rates on frontier models.
Deterministic algorithmic evaluation for frontier coding models. 1K+ non-public samples in LCB-native JSON format.
Graduate-to-PhD headroom sets to preserve measurable pass@k separation after HLE saturation. Get 1k–5k+ OTS packs within 24–48 hours.
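As a rough illustration of harness compatibility, the sketch below loads a hypothetical JSON-formatted pack and scores a model callable against it. The file name and field names ("prompt", "expected") are assumptions, not a documented pack schema.

```python
# Rough sketch of feeding a JSON-formatted data pack into an existing
# harness. The field names are assumptions, not a documented schema.
import json

def load_pack(path: str) -> list[dict]:
    """Read a pack of evaluation samples from a JSON file."""
    with open(path) as f:
        return json.load(f)

def evaluate(samples: list[dict], model_fn) -> float:
    """Score a model callable against the pack's expected outputs."""
    passed = sum(
        1 for s in samples
        if model_fn(s["prompt"]).strip() == s["expected"]
    )
    return passed / len(samples)

# Usage with any callable mapping a prompt string to an answer string:
# accuracy = evaluate(load_pack("pack.json"), my_model)
```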
Expert-verified data
Human-in-the-loop datasets for SFT, RL, and evaluation, built from real enterprise workflows with domain precision and full traceability.
- Coding: real-world repo tasks and verified patches
- STEM: advanced math, chemistry, physics, and biology
- Multimodality: audio, image, and GUI reasoning
- Domain-specific: finance, legal, healthcare, and retail
- Robotics & Embodied AI: imitation learning and embodied reasoning
- Trust & Safety: policy-grounded tasks and adversarial prompts
- Infrastructure-as-Code: cloud infrastructure evaluation in real environments

Case studies & collaborations
Turing has partnered with leading AI labs and enterprises to build governed post-training systems that close the gap between research benchmarks and production deployment.
Contribute as a researcher
Join Turing's network of PhDs and Olympiad-level researchers contributing to post-training research in coding, STEM, multimodal evaluation, robotics, and more.
Work for Turing’s internal team
Join our internal research and engineering teams building RL environments, benchmarks, and post-training systems.

LLM Researchers Happy Hour During ICLR
Join us for an invite-only gathering bringing together AI researchers and enterprise leaders driving real-world AI innovation.
📅 April 23, 2026 (6:00–9:00 PM)







