Built a lane-based dataset bundle explorer for LLM training — would love feedback from the HF community

Hi everyone! I’ve been building DinoDS, a modular dataset system for LLM training built around lane-based dataset bundles.

The idea is simple: instead of treating training data like one giant premade dump, I’m organizing it into capability-focused bundles that map to specific assistant behaviors and failure types — things like:

  • retrieval grounding

  • workflow / tool routing

  • memory and continuity

  • structured outputs

  • identity and behavior shaping

I’ve started publishing previews of some of these bundles on Hugging Face, and I also made a Space that helps you figure out which bundle is likely to be useful for your use case.

So the current flow is:

  • explore the DinoDS concept

  • identify what kind of assistant behavior you want to improve

  • see which bundle / lane family fits

  • check out the related dataset previews
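
To make the "behavior → bundle" step concrete, here is a minimal Python sketch of the lookup idea. The lane names come from the list above; the `Lane` class, the keyword lists, and `suggest_lanes` are hypothetical and purely illustrative, not the actual DinoDS or Space implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    """A capability-focused lane (hypothetical structure for illustration)."""
    name: str
    keywords: tuple[str, ...]  # assumed trigger words, not taken from DinoDS


# Lane names mirror the post; the keywords are made-up examples.
LANES = (
    Lane("retrieval grounding", ("hallucinat", "citation", "retrieval", "grounding")),
    Lane("workflow / tool routing", ("tool", "routing", "workflow")),
    Lane("memory and continuity", ("memory", "forget", "continuity")),
    Lane("structured outputs", ("json", "schema", "structured")),
    Lane("identity and behavior shaping", ("persona", "tone", "identity")),
)


def suggest_lanes(description: str) -> list[str]:
    """Return lane names whose keywords appear in a free-text failure description."""
    text = description.lower()
    return [lane.name for lane in LANES if any(k in text for k in lane.keywords)]


if __name__ == "__main__":
    print(suggest_lanes("My assistant returns invalid JSON and picks the wrong tool"))
```

A plain keyword match like this is only a stand-in for whatever matching the Space really does, but it shows the shape of the mapping: a description of a failure goes in, and a short list of candidate lanes comes out.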

I’d really love feedback from the HF community on a few things:

  1. Does this bundle-first / lane-based way of presenting datasets make sense?

  2. Is the Space + dataset bundle flow intuitive?

  3. What would make these previews more useful for people evaluating training data?

  4. Would you rather explore by failure type, capability, or use case?

You can check out the bundles, the Space, and the website here:

Would love thoughts, criticism, and suggestions — especially from people building assistants, copilots, routing systems, or structured-output workflows.
