Rubric-based reasoning data pack

Advanced PhD Reasoning Rubrics

Rubric-based reasoning data pack for frontier model improvement

PhD-level reasoning data built for frontier AI teams working on RL, post-training, evaluation, reward modeling, and reasoning-failure analysis.

Get access to 1,100+ multi-domain scientific tasks across computer science, data science, and chemistry, paired with weighted atomic rubrics and golden answers.

Get the full data pack

Explore sample data on Hugging Face

Turning expert evaluation into machine-verifiable training signal

Turing builds evaluation-safe, expert-authored datasets for frontier model improvement. Our rubric-based dataset extends that work from final-answer benchmarking to granular, criterion-level evaluation and reward signals.

Weighted atomic rubrics

Per-criterion scoring for intermediate reasoning
Rubrics aligned to prompt and golden answer
Supports RL, reward modeling, and evaluation harnesses
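
As a rough illustration of how per-criterion scoring with weights might be consumed, the sketch below combines pass/fail judgments on atomic criteria into a single normalized score. The `Criterion` class, its field names, and the example rubric items are hypothetical and are not taken from the data pack's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One atomic rubric item (hypothetical schema, not the data pack's format)."""
    description: str
    weight: float


def weighted_score(criteria, satisfied):
    """Return the weight-normalized share of satisfied criteria, in [0, 1]."""
    if len(criteria) != len(satisfied):
        raise ValueError("need exactly one judgment per criterion")
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("total rubric weight must be positive")
    earned = sum(c.weight for c, ok in zip(criteria, satisfied) if ok)
    return earned / total


# Illustrative rubric for a single task; descriptions and weights are made up.
rubric = [
    Criterion("States the recurrence relation", 2.0),
    Criterion("Derives the closed-form bound", 3.0),
    Criterion("Reports the final complexity class", 1.0),
]

# One judgment per criterion, e.g. from an expert grader or an automated judge.
score = weighted_score(rubric, [True, False, True])
print(score)
```

Normalizing by total weight keeps scores comparable across tasks with different numbers of criteria, which is one plausible choice when the score is used as a reward.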

Doctoral-level task design

Authored by subject-matter experts
Designed for advanced scientific and technical reasoning
Built around self-contained, non-retrievable inputs

Human-led validation

Independent expert review
Problem statement, rubric set, and golden answers checked for consistency, domain correctness, ambiguity, and rubric atomicity

Frontier-model calibration

Tested across 16 evaluation rounds
Pass rates between 0% and 50%
Designed to remain discriminative for current state-of-the-art systems
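
One way to read the calibration figures is as a per-task pass rate measured over repeated evaluation rounds. The sketch below assumes that reading, which the page itself does not spell out, and uses hypothetical function names and made-up outcomes.

```python
def pass_rate(outcomes):
    """Fraction of evaluation rounds in which the model passed the task."""
    if not outcomes:
        raise ValueError("at least one evaluation round is required")
    return sum(outcomes) / len(outcomes)


def is_discriminative(outcomes, max_rate=0.5):
    """True when the pass rate stays at or below the assumed 50% ceiling."""
    return pass_rate(outcomes) <= max_rate


# Made-up results for one task over 16 rounds: 4 passes, 12 failures.
rounds = [True] * 4 + [False] * 12
rate = pass_rate(rounds)
print(rate, is_discriminative(rounds))
```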

Multi-domain coverage for advanced reasoning models

Custom expansions can be scoped for additional subdomains, difficulty levels, and specific model-improvement workflows.

Computer Science

Algorithms, systems, machine learning, programming languages, formal methods, databases, and data engineering

Data Science

Business analytics, finance, healthcare, supply chain, HR, IT support, and research workflows

Chemistry

Organic, inorganic, organometallic, polymer, physical, and analytical chemistry

Built for RL, evaluation, and failure analysis

Each task measures more than final-answer correctness. Rubrics evaluate visible derivations, mechanism and structure identification, quantitative computation with explicit units, methodological choices, edge-case handling, and executable multi-step pipelines.

Each task is designed to support:

Reinforcement learning with per-criterion reward signal
Post-training and reward modeling
Model comparison and regression testing
Reasoning-trace quality analysis
Scientific and engineering QA evaluation
Failure-mode diagnosis across intermediate reasoning steps
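
Failure-mode diagnosis from criterion-level judgments could look something like the sketch below, which groups judgments by a category label and reports the failure rate for each. The category names, the `(category, passed)` pair format, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def failure_rates(judgments):
    """Map each category to its share of failed criteria.

    `judgments` is an iterable of (category, passed) pairs, a hypothetical
    format chosen for this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [failures, total]
    for category, passed in judgments:
        counts[category][1] += 1
        if not passed:
            counts[category][0] += 1
    return {cat: fails / total for cat, (fails, total) in counts.items()}


# Made-up criterion-level judgments pooled across several tasks.
judgments = [
    ("derivation", True),
    ("derivation", False),
    ("units", False),
    ("units", False),
    ("edge_cases", True),
]

rates = failure_rates(judgments)
worst = max(rates, key=rates.get)  # category with the highest failure rate
print(rates, worst)
```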

548 Market Street, PMB 18282, San Francisco, CA 94104

Request advanced reasoning rubrics access