How to use mrdbourke/food-not-food-classifier-siglip2-v2 with timm:
import timm
model = timm.create_model("hf_hub:mrdbourke/food-not-food-classifier-siglip2-v2", pretrained=True)

Binary image classifier: `food_or_drink` vs `not_food_or_drink`.
Part of the Nutrify pipeline. Role: Highest accuracy.
v2 adds 560,836 human-labeled FoodVision images to the 2,952,644 DataComp training set. FoodVision samples use hard cross-entropy loss; DataComp samples use KL distillation from SigLIP2-so400m soft labels.
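The mixed objective described above can be sketched per sample in plain Python. This is a minimal, framework-free illustration and not the author's training code: the function names, the `source` strings, the KL direction (teacher to student), and the absence of any temperature or loss weighting are all assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hard_cross_entropy(student_logits, label):
    """Cross-entropy against a single hard (human-assigned) class index."""
    probs = softmax(student_logits)
    return -math.log(probs[label])

def kl_distillation(student_logits, teacher_probs, eps=1e-12):
    """KL(teacher || student) against a soft teacher distribution.

    Terms where the teacher probability is zero contribute nothing.
    """
    probs = softmax(student_logits)
    return sum(
        t * (math.log(t) - math.log(max(p, eps)))
        for t, p in zip(teacher_probs, probs)
        if t > 0.0
    )

def sample_loss(student_logits, target, source):
    """Route a sample to the loss matching its data source (hypothetical names)."""
    if source == "foodvision":  # human-labeled: target is a class index
        return hard_cross_entropy(student_logits, target)
    if source == "datacomp":    # teacher-labeled: target is a probability list
        return kl_distillation(student_logits, target)
    raise ValueError(f"unknown source: {source}")
```

In a real training loop the same routing would typically be applied per batch with tensor operations; the scalar version here only shows which loss each kind of sample receives.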
| Version | FoodVision Acc | FoodVision F1 | Training Data |
|---|---|---|---|
| v2 | 98.21% | 0.9883 | DataComp 2,952,644 + FoodVision 560,836 |
| v1 | 0.00% | 0.0000 | DataComp only |
| Δ | +98.21% | +0.9883 | |

| Model | Params | FV Accuracy | FV F1 | Role |
|---|---|---|---|---|
| **SigLIP2 Base 256** | 92.9M | 98.21% | 0.9883 | Highest accuracy |
| CSATv2 11M | 10.7M | 97.99% | 0.9869 | Fastest throughput |
| NextViT Small 384 | 30.7M | 97.84% | 0.9859 | CoreML deployable |
import timm
from PIL import Image
import torch
# Load model
model = timm.create_model("vit_base_patch16_siglip_256.v2_webli", pretrained=False, num_classes=2)
# Load weights
from safetensors.torch import load_file
from huggingface_hub import hf_hub_download
weights_path = hf_hub_download("mrdbourke/food-not-food-classifier-siglip2-v2", "model.safetensors")
model.load_state_dict(load_file(weights_path))
model.eval()
# Get transforms
data_cfg = timm.data.resolve_data_config(model.pretrained_cfg)
transform = timm.data.create_transform(**data_cfg, is_training=False)
# Predict
img = Image.open("your_image.jpg").convert("RGB")
x = transform(img).unsqueeze(0)
with torch.no_grad():
    logits = model(x)
pred = logits.argmax(dim=1).item()
labels = {0: "food_or_drink", 1: "not_food_or_drink"}
print(f"Prediction: {labels[pred]}")
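The snippet above reports only the argmax class. When a confidence score is useful, for example to filter images more or less strictly, the two logits can be converted to a probability first. The following is a small sketch in plain Python; the helper names and the `food_threshold` parameter are illustrative assumptions, while the label mapping is the one used above.

```python
import math

LABELS = {0: "food_or_drink", 1: "not_food_or_drink"}

def food_probability(logits):
    """Softmax probability of class 0 (food_or_drink) from two raw logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return exps[0] / sum(exps)

def classify(logits, food_threshold=0.5):
    """Return (label, P(food_or_drink)); the threshold is a hypothetical knob."""
    p_food = food_probability(logits)
    label = LABELS[0] if p_food >= food_threshold else LABELS[1]
    return label, p_food

# With the `logits` tensor from the prediction code, the call would look like:
#   label, p_food = classify(logits[0].tolist())
```

With the default threshold of 0.5 this agrees with the argmax prediction except on exact ties; raising the threshold trades recall on food images for fewer false positives.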
Architecture: `vit_base_patch16_siglip_256.v2_webli` (92.9M parameters)

Three weight files are included, each optimized for a different metric:
| File | Selects by | FV Acc | DC Acc | Blended | Epoch | Use case |
|---|---|---|---|---|---|---|
| `model.safetensors` (default) | Best blended (50/50) | 98.21% | 92.42% | 95.31% | 3 | Balanced: good at everything |
| `model_best_fv.safetensors` | Best FoodVision test | 98.34% | 92.28% | 95.31% | 5 | On-device Nutrify deployment |
| `model_best_dc.safetensors` | Best DataComp val | 98.21% | 92.42% | 95.31% | 3 | Scale-up filtering (menus, panels, recipes) |
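To make the three selection rules concrete, here is a short sketch that picks an epoch per rule from the accuracies listed in the table. It assumes "blended (50/50)" means the simple mean of FoodVision and DataComp accuracy; that definition, the function names, and the dictionary layout are assumptions, and only the two epochs shown in the table are included.

```python
# Accuracies (%) copied from the table above; other epochs are not listed there.
EPOCH_METRICS = {
    3: {"fv_acc": 98.21, "dc_acc": 92.42},
    5: {"fv_acc": 98.34, "dc_acc": 92.28},
}

def blended(metrics, fv_weight=0.5):
    """Weighted mean of FoodVision and DataComp accuracy (assumed definition)."""
    return fv_weight * metrics["fv_acc"] + (1.0 - fv_weight) * metrics["dc_acc"]

# One scoring function per weight file.
SELECTORS = {
    "model.safetensors": blended,
    "model_best_fv.safetensors": lambda m: m["fv_acc"],
    "model_best_dc.safetensors": lambda m: m["dc_acc"],
}

def select_epoch(selector):
    """Return the epoch whose metrics maximize the given scoring function."""
    return max(EPOCH_METRICS, key=lambda epoch: selector(EPOCH_METRICS[epoch]))

chosen = {name: select_epoch(fn) for name, fn in SELECTORS.items()}
```

Under this assumed definition the blended and DataComp rules both pick epoch 3, and the FoodVision rule picks epoch 5, which matches the Epoch column. The two blended scores differ by only about 0.005 points, which is why they display identically at two decimal places.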
To load a specific variant:
# Default (blended)
weights_path = hf_hub_download("mrdbourke/food-not-food-classifier-siglip2-v2", "model.safetensors")
# Best for Nutrify on-device
weights_path = hf_hub_download("mrdbourke/food-not-food-classifier-siglip2-v2", "model_best_fv.safetensors")
# Best for scale-up filtering
weights_path = hf_hub_download("mrdbourke/food-not-food-classifier-siglip2-v2", "model_best_dc.safetensors")
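If the variant is chosen at runtime, a small lookup keeps the file names in one place. This is only a convenience sketch: the variant keys (`"blended"`, `"foodvision"`, `"datacomp"`) and the helper name are hypothetical, while the repository ID and file names are the ones listed above.

```python
REPO_ID = "mrdbourke/food-not-food-classifier-siglip2-v2"

# Hypothetical short names mapped to the weight files listed above.
WEIGHT_FILES = {
    "blended": "model.safetensors",
    "foodvision": "model_best_fv.safetensors",
    "datacomp": "model_best_dc.safetensors",
}

def weights_filename(variant="blended"):
    """Return the weight file name for a variant, failing loudly on typos."""
    try:
        return WEIGHT_FILES[variant]
    except KeyError:
        options = ", ".join(sorted(WEIGHT_FILES))
        raise ValueError(f"unknown variant {variant!r}; expected one of: {options}") from None

# Usage with the download call shown above:
#   weights_path = hf_hub_download(REPO_ID, weights_filename("foodvision"))
```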
| Version | Repo |
|---|---|
| v2 (this) | mrdbourke/food-not-food-classifier-siglip2-v2 |
| v1 | mrdbourke/food-not-food-classifier-siglip2-v1 |
Training images come from the DataComp-1B-food-and-drink-3M dataset and the Nutrify FoodVision dataset (714K human-labeled images).
Apache 2.0