Train a fully open SmolLM4-750M model

Tralalabs · May 11, 2026, 1:36pm

Hi Hugging Face Team,

I wanted to suggest a possible new small language model for the HuggingFaceTB / SmolLM family: a fully open 750M-class model designed to sit between the smaller SmolLM2 models (135M and 360M) and the larger 1.7B and 3B models.

The proposed model name is:

SmolLM4-750M

The goal would be a compact, useful, public model that runs on modest hardware while still performing well at chat, coding help, math, summarization, and English/Spanish use.

Suggested high-level settings:

Size class: around 750M parameters
Context window: 16,384 tokens
Model type: causal language model
Main languages: English + Spanish
FineWeb-2 language subset: Spanish / spa_Latn
License target: Apache-2.0
Goal: fully open weights, data recipe, training details, and evaluation details
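
To make the settings above easy to quote or diff, here is a minimal sketch that writes them down as a Python dataclass. The class name `ProposedModelSpec` and its field names are hypothetical, made up for illustration; only the values come from the list above, and nothing here reflects an actual Hugging Face config format.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ProposedModelSpec:
    """Hypothetical container for the settings suggested in this post."""
    name: str = "SmolLM4-750M"
    target_params: int = 750_000_000        # "around 750M parameters"
    context_window: int = 16_384            # tokens
    model_type: str = "causal language model"
    languages: tuple = ("English", "Spanish")
    fineweb2_subset: str = "spa_Latn"       # FineWeb-2 language subset
    license_target: str = "Apache-2.0"


spec = ProposedModelSpec()

# Print each setting on its own line.
for key, value in asdict(spec).items():
    print(f"{key}: {value}")
```

Architecture details such as layer count or hidden size are deliberately left out, since those are not part of the suggestion.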

Suggested dataset stack:

HuggingFaceTB/smollm-corpus — core small-model pretraining mix
HuggingFaceFW/fineweb-edu — high-quality educational web data
HuggingFaceTB/finemath — math and problem-solving
HuggingFaceTB/stack-edu — educational code
HuggingFaceTB/smoltalk2 — main chat / post-training data
HuggingFaceTB/cosmopedia — synthetic textbooks, blogs, and stories
HuggingFaceFW/fineweb-2, Spanish subset spa_Latn — multilingual expansion
open-thoughts/OpenThoughts-114k — compact reasoning traces
HuggingFaceTB/smol-smoltalk — final small-model instruction polish
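
As a rough sketch, the stack above could be written down as plain data and grouped by training stage. The repository IDs and roles are copied from the list; the split into "pretraining" and "post-training" is my own assumption about where each dataset would most likely be used, and `group_by_stage` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

# (repo id, assumed stage, role as described in this post)
DATASET_STACK = [
    ("HuggingFaceTB/smollm-corpus", "pretraining", "core small-model pretraining mix"),
    ("HuggingFaceFW/fineweb-edu", "pretraining", "high-quality educational web data"),
    ("HuggingFaceTB/finemath", "pretraining", "math and problem-solving"),
    ("HuggingFaceTB/stack-edu", "pretraining", "educational code"),
    ("HuggingFaceTB/cosmopedia", "pretraining", "synthetic textbooks, blogs, and stories"),
    ("HuggingFaceFW/fineweb-2", "pretraining", "multilingual expansion (Spanish subset spa_Latn)"),
    ("HuggingFaceTB/smoltalk2", "post-training", "main chat / post-training data"),
    ("open-thoughts/OpenThoughts-114k", "post-training", "compact reasoning traces"),
    ("HuggingFaceTB/smol-smoltalk", "post-training", "final small-model instruction polish"),
]


def group_by_stage(stack):
    """Return {stage: [repo ids]} while keeping the original order."""
    grouped = defaultdict(list)
    for repo_id, stage, _role in stack:
        grouped[stage].append(repo_id)
    return dict(grouped)


stages = group_by_stage(DATASET_STACK)
for stage, repos in stages.items():
    print(f"{stage} ({len(repos)} datasets)")
    for repo_id in repos:
        print(f"  {repo_id}")
```

Mixing weights and token budgets are intentionally missing here, since those would be for the training team to decide.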

Why I think this would be useful:

A 750M-class model would be small enough for local and low-resource users, but stronger than ultra-tiny models.
A 16K context window would keep it in line with current models without overreaching for its size.
English + Spanish support would make it more useful globally while keeping the language scope focused.
The dataset stack follows the SmolLM style: public web, educational data, math, code, synthetic educational text, multilingual data, reasoning data, and small-model-focused instruction tuning.
It would be great for students, hobbyists, researchers, local inference users, and people who want a practical open model below 1B parameters.
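
To put a number on "small enough for local and low-resource users", here is a back-of-the-envelope estimate of how much memory the weights alone would need at 750M parameters. This is only arithmetic under stated assumptions: exactly 750,000,000 parameters, one fixed bit width for every parameter, and no allowance for the KV cache, activations, or runtime overhead. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 750_000_000  # assumed exact size for the 750M class

# Common storage precisions; actual quantized formats add some overhead.
for label, bits in [("fp32", 32), ("bf16/fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>9}: {weight_memory_gb(NUM_PARAMS, bits):.3f} GB")
```

At 16-bit precision this works out to about 1.5 GB for the weights, which is the kind of footprint that makes the sub-1B range attractive for laptops and small GPUs.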

I’m leaving the exact internal architecture choices to the Hugging Face team, since you would know best how to design and train it properly.

Thanks for reading,
Erik
