Hi Hugging Face Team,
I wanted to suggest a possible new small language model for the HuggingFaceTB / SmolLM family: a fully open 750M-class model designed to sit between the smaller SmolLM2 models and the larger 1.7B / 3B models.
The proposed model name is:
SmolLM4-750M
The goal would be a compact, useful, public model that can run on modest hardware while still being strong for chat, coding help, math, summarization, and English/Spanish use.
Suggested high-level settings:
- Size class: around 750M parameters
- Context window: 16,384 tokens
- Model type: causal language model
- Main languages: English + Spanish
- FineWeb-2 language subset: Spanish / spa_Latn
- License target: Apache-2.0
- Goal: fully open weights, data recipe, training details, and evaluation details
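To make the "modest hardware" point concrete, here is a rough back-of-the-envelope sketch of weight memory at the proposed size. It assumes exactly 750M parameters and idealized bytes-per-parameter figures; it covers weights only, so KV cache, activations, and runtime overhead are not included.

```python
# Rough weight-memory estimate for a 750M-parameter model.
# Weights only: KV cache, activations, and runtime overhead are excluded.

PARAMS = 750_000_000  # "around 750M parameters" from the proposal (assumed exact here)

# Idealized bytes per parameter (assumption: real quantized formats add a
# small overhead for scales and zero points).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return params * bytes_per_param / 1e9


estimates = {
    fmt: weight_memory_gb(PARAMS, size) for fmt, size in BYTES_PER_PARAM.items()
}

for fmt, gb in estimates.items():
    print(f"{fmt}: {gb:.3f} GB")
```

Under these assumptions the weights come to about 1.5 GB in bf16 and well under 1 GB with 4-bit quantization, which is the range I have in mind for laptops and small GPUs.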
Suggested dataset stack:
- HuggingFaceTB/smollm-corpus: core small-model pretraining mix
- HuggingFaceFW/fineweb-edu: high-quality educational web data
- HuggingFaceTB/finemath: math and problem-solving
- HuggingFaceTB/stack-edu: educational code
- HuggingFaceTB/smoltalk2: main chat / post-training data
- HuggingFaceTB/cosmopedia: synthetic textbooks, blogs, and stories
- HuggingFaceFW/fineweb-2, Spanish subset spa_Latn: multilingual expansion
- open-thoughts/OpenThoughts-114k: compact reasoning traces
- HuggingFaceTB/smol-smoltalk: final small-model instruction polish
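As a minimal sketch, the stack above could be written down as a small structured config. The dataset IDs and role descriptions come from the list; the `DatasetEntry` type, the stage names, and the assignment of each dataset to a stage are my own illustrative assumptions, not an actual Hugging Face recipe, and no mixing weights are implied.

```python
# Hypothetical config sketch of the proposed dataset stack.
# Repo IDs and roles are from the proposal; the stage split is an assumption.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DatasetEntry:
    repo_id: str                  # Hugging Face Hub dataset ID
    role: str                     # purpose, as described in the proposal
    subset: Optional[str] = None  # language subset, if any


DATASET_STACK: Dict[str, List[DatasetEntry]] = {
    # Assumed stage: large-scale pretraining data
    "pretraining": [
        DatasetEntry("HuggingFaceTB/smollm-corpus", "core small-model pretraining mix"),
        DatasetEntry("HuggingFaceFW/fineweb-edu", "high-quality educational web data"),
        DatasetEntry("HuggingFaceTB/finemath", "math and problem-solving"),
        DatasetEntry("HuggingFaceTB/stack-edu", "educational code"),
        DatasetEntry("HuggingFaceTB/cosmopedia", "synthetic textbooks, blogs, and stories"),
        DatasetEntry("HuggingFaceFW/fineweb-2", "multilingual expansion", subset="spa_Latn"),
    ],
    # Assumed stage: chat, reasoning, and instruction tuning
    "post_training": [
        DatasetEntry("HuggingFaceTB/smoltalk2", "main chat / post-training data"),
        DatasetEntry("open-thoughts/OpenThoughts-114k", "compact reasoning traces"),
        DatasetEntry("HuggingFaceTB/smol-smoltalk", "final small-model instruction polish"),
    ],
}


def repo_ids(stage: str) -> List[str]:
    """Return the dataset IDs assigned to one stage."""
    return [entry.repo_id for entry in DATASET_STACK[stage]]


for stage, entries in DATASET_STACK.items():
    print(stage)
    for entry in entries:
        suffix = f" [{entry.subset}]" if entry.subset else ""
        print(f"  {entry.repo_id}{suffix}: {entry.role}")
```

Whether reasoning traces belong in post-training or in a mid-training stage is exactly the kind of decision I would leave to the team; the split here is only to show how the pieces might fit together.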
Why I think this would be useful:
- A 750M-class model would be small enough for local and low-resource users, but stronger than ultra-tiny models.
- A 16K context window would be long enough to feel modern and useful without overreaching for a model of this size.
- English + Spanish support would make it more useful globally while keeping the language scope focused.
- The dataset stack follows the SmolLM style: public web, educational data, math, code, synthetic educational text, multilingual data, reasoning data, and small-model-focused instruction tuning.
- It would be great for students, hobbyists, researchers, local inference users, and anyone who wants a practical open model below 1B parameters.
I’m leaving the exact internal architecture choices to the Hugging Face team, since you would know best how to design and train it properly.
Thanks for reading,
Erik