Made a Python failure dataset for DPO/RLHF — how do you source negative examples?

namakoo · April 26, 2026, 7:08am

Hi everyone,

Over the past couple of weeks I’ve been quietly building a Python
failure dataset for DPO / RLHF training. The generation pipeline runs
24/7 on a single RTX 4060.

The basic idea: an autopilot pipeline generates Python code attempts
for various CS domains (FFT, Monte Carlo, ZKP, etc.), runs each in a
sandboxed pytest container, and keeps the genuine failures with
error logs as rejected-side training data.
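For anyone who wants a concrete picture of that loop, here is a minimal sketch of the "run it, keep the failure plus its log" step. It is not the actual pipeline: a plain child process stands in for the sandboxed pytest container, and the function names (`run_attempt`, `to_rejected_row`) and the row fields (`domain`, `prompt`, `rejected`, `error_log`) are hypothetical choices made for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional


def run_attempt(code: str, timeout_s: float = 10.0) -> dict:
    """Run one generated attempt in a child process and capture the outcome.

    Assumption: a bare subprocess replaces the sandboxed pytest container,
    so this offers no real isolation from untrusted code.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "attempt.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            # A hang counts as a failure; there is no traceback to keep.
            return {"passed": False, "error_log": f"timeout after {timeout_s}s"}
        return {"passed": proc.returncode == 0, "error_log": proc.stderr}


def to_rejected_row(prompt: str, code: str, domain: str) -> Optional[dict]:
    """Return a rejected-side row for a failing attempt, or None if it passed."""
    result = run_attempt(code)
    if result["passed"]:
        return None
    # Hypothetical row layout: the failing code plus the log that proves it failed.
    return {
        "domain": domain,
        "prompt": prompt,
        "rejected": code,
        "error_log": result["error_log"],
    }
```

Passing attempts return `None` and are dropped, so only code that actually failed when executed ends up on the rejected side.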

Quick stats:

- ~2K failure rows shipped (v1, v2)
- 19 CS domains covered
- 146 downloads since launch

Two questions for DPO / RLHF practitioners here:

1. How are you currently sourcing negative examples for DPO?
Do you have your own pipeline, or rely on synthetic data from larger
models? Curious about the trade-offs you’ve found.

2. What domains do you most need failure data for?
I can pivot the autopilot’s domain priority in a few days, so
concrete requests directly shape what gets generated next.

Free sample (100 rows):

Even one-line replies help calibrate the next release.

-- namakoo
