
Turing at NeurIPS 2025
Explore RL environments, benchmarks, and expert-verified datasets built for post-training, reinforcement learning, and structured model evaluation across STEM, multimodality, tool use, and coding.
RL environments
UI and MCP environments for agent training and evaluation. Each environment includes prompts, verifiers, and reward logic for controlled experimentation and validated results.
Transactional environments
Test agents in realistic ordering, cart, and fulfillment workflows with embedded verifiers and step logic.
Support-resolution environments
Evaluate multi-step reasoning for ticket triage, routing, and knowledge retrieval in helpdesk-style tasks.
Project-management environments
Run MCP-based agent evaluations with live schema validation and verifier-scored task logic.
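To picture how prompts, verifiers, and reward logic fit together in environments like those above, here is a minimal Python sketch. Everything in it is illustrative: the EnvTask record, the verifier callback, and the sparse step-penalized reward are assumptions for exposition, not Turing's actual environment API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EnvTask:
    """One task in a hypothetical UI/MCP agent environment (illustrative only)."""
    prompt: str                       # instruction shown to the agent
    verifier: Callable[[dict], bool]  # checks the final workflow state
    max_steps: int = 20               # step budget enforced by the environment

def reward(task: EnvTask, final_state: dict, steps_taken: int) -> float:
    """Sparse reward with a small step penalty: one simple way to turn a verifier into reward logic."""
    if not task.verifier(final_state):
        return 0.0
    return max(0.0, 1.0 - 0.01 * steps_taken)

# Example: a transactional (cart-and-fulfillment) task.
checkout = EnvTask(
    prompt="Add two units of SKU-1042 to the cart and complete checkout.",
    verifier=lambda s: bool(s.get("order_placed")) and s.get("cart", {}).get("SKU-1042") == 2,
)

print(reward(checkout, {"order_placed": True, "cart": {"SKU-1042": 2}}, steps_taken=7))  # 0.93
```

Because the verifier inspects the end state rather than the agent's transcript, the same task definition can score very different action sequences consistently.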
Coding and benchmarking
Deterministic systems for measuring model reasoning, synthesis, and code understanding on verifiable tasks.
SWE-bench++
Software reasoning benchmark using real GitHub issues and validated fixes.
VLM-bench
Multimodal reasoning benchmark across STEM and business tasks using vision-language inputs.
CodeBench
Deterministic evaluation for code models with structured prompts and ideal responses.
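As a rough illustration of how "structured prompts and ideal responses" can be scored deterministically, the sketch below runs a candidate solution against fixed unit tests. The record fields and the score function are assumptions made for illustration, not the benchmark's actual schema or harness.

```python
# Hypothetical benchmark record: a structured prompt paired with an "ideal" reference
# response and deterministic unit tests. Field names are illustrative assumptions.
record = {
    "task_id": "example-001",
    "prompt": "Write a function is_even(n) that returns True for even integers.",
    "ideal_response": "def is_even(n):\n    return n % 2 == 0",
    "tests": [("is_even(4)", True), ("is_even(7)", False)],
}

def score(candidate_code: str, tests) -> float:
    """Deterministically score a candidate: execute it, then check each test expression."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)  # define the candidate's function(s)
        passed = sum(eval(expr, namespace) == expected for expr, expected in tests)
        return passed / len(tests)
    except Exception:
        return 0.0

print(score(record["ideal_response"], record["tests"]))  # 1.0 for the reference solution
```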

Data
Expert-verified datasets for post-training and evaluation, built from auditable pipelines with human-in-the-loop QA.
Catalog highlights
- Coding: Real-world repo tasks and reasoning traces.
- STEM: Research-grade STEM and bioinformatics tasks with executable reasoning and code.
- Multimodality: Audio, image, and GUI reasoning datasets.
- Domain-specific: Finance, medical, legal, and economics.
- Robotics & Embodied AI: Imitation learning and embodied reasoning.
- Custom: Scoped experiments, edge cases, or novel-modality datasets.
NVIDIA Data Filtering Challenge award
Evening discussion with NVIDIA and Turing leadership on model maturity and frontier evaluation, followed by the NVIDIA Data Filtering Challenge award ceremony.