nanovdr/NanoVDR-L

@@ -51,6 +51,12 @@ model-index:
 NanoVDR-L is a 151M-parameter text-only query encoder for visual document retrieval, trained via asymmetric cross-modal distillation from [Qwen3-VL-Embedding-2B](https://huggingface.co/Qwen/Qwen3-VL-Embedding-2B). It uses ModernBERT-base + a 2-layer MLP projector and achieves the highest v1 score (82.4) among all NanoVDR variants.
+### Highlights
+- **Single-vector retrieval**: queries and documents share the 2048-dim embedding space of [Qwen3-VL-Embedding-2B](https://huggingface.co/Qwen/Qwen3-VL-Embedding-2B), so retrieval is a plain dot product, FAISS-compatible, at **4 KB per page** (float16)
+- **Lightweight on storage**: 612 MB model; the per-page document index is 64× smaller than ColPali's multi-vector patch embeddings
+- **Asymmetric setup**: a small 151M-parameter text encoder runs at query time; the large VLM indexes documents once, offline
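
The storage figures in the highlights can be checked with a little arithmetic, and the dot-product retrieval step can be sketched without any library. This is a minimal illustration, not the model's actual pipeline. The ColPali shape used here (about 1030 patch vectors of 128 dimensions per page) is an assumption that does not come from this model card, and the helper names `index_bytes_per_page`, `dot` and `top_k` are hypothetical.

```python
import heapq

BYTES_PER_FLOAT16 = 2


def index_bytes_per_page(num_vectors, dim, bytes_per_value=BYTES_PER_FLOAT16):
    """Bytes needed to store one page's embedding(s) in the document index."""
    return num_vectors * dim * bytes_per_value


def dot(a, b):
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, doc_vectors, k=3):
    """Return the k best (score, doc_index) pairs, highest score first."""
    scored = [(dot(query, d), i) for i, d in enumerate(doc_vectors)]
    return heapq.nlargest(k, scored)


# Single 2048-dim float16 vector per page, as stated in the highlights.
single_vector_bytes = index_bytes_per_page(num_vectors=1, dim=2048)

# ASSUMPTION: a ColPali-style index with ~1030 vectors of 128 dims per page.
multi_vector_bytes = index_bytes_per_page(num_vectors=1030, dim=128)

storage_ratio = multi_vector_bytes / single_vector_bytes

# Toy 3-dim vectors standing in for real 2048-dim embeddings.
toy_docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
toy_query = [1.0, 0.0, 0.0]
ranking = top_k(toy_query, toy_docs, k=2)

print(single_vector_bytes, round(storage_ratio), ranking)
```

Under these assumptions the single-vector index comes to 4096 bytes per page, and the multi-vector index is roughly 64 times larger. In practice the exhaustive scan in `top_k` would be replaced by an inner-product index such as FAISS.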
 ## Results
 | Model | Params | ViDoRe v1 | ViDoRe v2 | ViDoRe v3 | Avg Retention |