How to use nanovdr/NanoVDR-L with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nanovdr/NanoVDR-L")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```
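The usage snippet compares sentences with each other, whereas the README describes retrieval as a plain dot product between a query embedding and precomputed 2048-dim page embeddings. The following is a minimal sketch of that scoring step only. It assumes random arrays as stand-ins for real embeddings, and the helper name `top_k` and the corpus size are hypothetical; in practice the page vectors would come from the teacher VLM offline and the query vector from NanoVDR-L.

```python
import numpy as np

DIM = 2048  # embedding size stated in the README

# Stand-in data (assumption): random vectors instead of real embeddings.
rng = np.random.default_rng(0)
doc_embeddings = rng.standard_normal((1000, DIM)).astype(np.float16)  # stored as float16
query_embedding = rng.standard_normal(DIM).astype(np.float32)


def top_k(query: np.ndarray, docs: np.ndarray, k: int = 5):
    """Return indices and scores of the k pages with the largest dot product."""
    # Upcast the float16 index to float32 before the matrix-vector product.
    scores = docs.astype(np.float32) @ query
    order = np.argsort(-scores)[:k]
    return order, scores[order]


indices, scores = top_k(query_embedding, doc_embeddings)
print(indices, scores)
```

For larger corpora the same inner-product scoring could be delegated to a FAISS inner-product index, which is what "FAISS-compatible" in the README suggests.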
Fix ColPali claim; add single-vector + storage highlights
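The storage figures in the highlights this commit adds can be checked with simple arithmetic. The sketch below reproduces the 4 KB per page figure from the README's stated 2048 dimensions at float16. The ColPali numbers (about 1030 patch vectors of 128 dimensions per page) are an assumption that does not appear in the README; they are used here only to show how a ratio near 64× could arise.

```python
BYTES_PER_FLOAT16 = 2


def bytes_per_page(num_vectors: int, dim: int, bytes_per_value: int = BYTES_PER_FLOAT16) -> int:
    """Index size for one page: vectors x dimensions x bytes per value."""
    return num_vectors * dim * bytes_per_value


# From the README: one 2048-dim vector per page, stored as float16.
single_vector_bytes = bytes_per_page(num_vectors=1, dim=2048)

# Assumption (not in the README): ~1030 patch vectors of 128 dims per page.
multi_vector_bytes = bytes_per_page(num_vectors=1030, dim=128)

ratio = multi_vector_bytes / single_vector_bytes

print(f"single-vector: {single_vector_bytes / 1024:.1f} KB per page")
print(f"multi-vector:  {multi_vector_bytes / 1024:.1f} KB per page")
print(f"ratio:         {ratio:.1f}x")
```

Under these assumptions the ratio comes out slightly above 64, consistent with the rounded figure in the highlight.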
README.md CHANGED
@@ -51,6 +51,12 @@ model-index:
 NanoVDR-L is a 151M-parameter text-only query encoder for visual document retrieval, trained via asymmetric cross-modal distillation from [Qwen3-VL-Embedding-2B](https://huggingface.co/Qwen/Qwen3-VL-Embedding-2B). It uses ModernBERT-base + a 2-layer MLP projector and achieves the highest v1 score (82.4) among all NanoVDR variants.

+### Highlights
+
+- **Single-vector retrieval**: queries and documents share the same 2048-dim embedding space as [Qwen3-VL-Embedding-2B](https://huggingface.co/Qwen/Qwen3-VL-Embedding-2B); retrieval is a plain dot product, FAISS-compatible, **4 KB per page** (float16)
+- **Lightweight on storage**: 612 MB model; doc index costs 64× less than ColPali's multi-vector patches
+- **Asymmetric setup**: tiny 151M text encoder at query time; large VLM indexes documents offline once
+
 ## Results

 | Model | Params | ViDoRe v1 | ViDoRe v2 | ViDoRe v3 | Avg Retention |