---
license: apache-2.0
base_model: Qwen/Qwen3-VL-4B-Instruct
tags:
- vision
- ocr
- arabic
- islamic
- qwen3
- fine-tuned
language:
- ar
datasets:
- seemorg/books-ocr
metrics:
- wer
- cer
- bleu
library_name: peft
pipeline_tag: image-to-text
---
# Qari-OCR-0.4.0-VL-4B-Instruct
Qari-OCR-0.4.0-VL-4B-Instruct is a vision-language model fine-tuned for OCR on Islamic books and Arabic manuscripts. It is based on [Qwen/Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) and was trained on 45,000 image-text pairs from the [seemorg/books-ocr](https://huggingface.co/datasets/seemorg/books-ocr) dataset.
## Results
| Model | CER ↓ | WER ↓ | BLEU ↑ |
|-------|-------|-------|--------|
| **Qari-OCR-0.4.0** | **0.1222** | **0.2562** | **68.41** |
| Qwen/Qwen3-VL-4B-Instruct | 0.4922 | 0.6966 | 34.61 |
| Qwen/Qwen3-VL-8B-Instruct | 0.6876 | 0.8954 | 23.89 |
| NAMAA/Qari-0.2.2.1 | 0.6448 | 0.5126 | 21.97 |
| MBZUAI/AIN | 1.2843 | 1.2697 | 3.50 |
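
For readers unfamiliar with the metrics in the table, the sketch below shows the standard definitions of CER and WER as edit distance divided by reference length. It is an illustration only: the card does not state which tool or text normalization produced the reported numbers, and the function names `edit_distance`, `cer`, and `wer` are chosen here for the example. Because insertions count as errors, both rates can exceed 1.0 when a model emits far more text than the reference contains.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits per reference word (whitespace split)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Assumed example pair: the hypothesis misspells one of four words.
    reference = "بسم الله الرحمن الرحيم"
    hypothesis = "بسم الله الرحمان الرحيم"
    print(f"CER = {cer(reference, hypothesis):.4f}")
    print(f"WER = {wer(reference, hypothesis):.4f}")
```

Scores computed this way are sensitive to preprocessing choices such as stripping diacritics or collapsing whitespace, so they may not match the table exactly.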
## Usage
```python
from transformers import AutoProcessor, AutoModelForVision2Seq
import torch

model_name = "NAMAA-Space/Qari-OCR-0.4.0-VL-4B-Instruct"
image_path = "page.png"  # placeholder: path to the page image to transcribe

processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# A single user turn: the page image followed by the OCR instruction
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": "Free OCR."},
        ],
    }
]

# Preparation for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=2048)

# Drop the prompt tokens so that only the newly generated text is decoded
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
result = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(result)
```
## Training
- **Base model:** Qwen/Qwen3-VL-4B-Instruct
- **Dataset:** [seemorg/books-ocr](https://huggingface.co/datasets/seemorg/books-ocr)
- **Training samples:** 45,000 image-text pairs
- **Domain:** Islamic books and Arabic religious texts
## Limitations
- Optimized for printed Islamic texts; performance may vary on modern Arabic fonts or handwritten text.
- Requires reasonable image quality (300+ DPI recommended).
- Arabic script only.
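
As a rough way to check a scan against the 300 DPI recommendation, the sketch below estimates pixel density from the image size in pixels and the physical page size in inches. It assumes the page dimensions are known and that the image is cropped to the page; the helper names `effective_dpi` and `meets_recommendation` are hypothetical and not part of this model or any library.

```python
def effective_dpi(width_px, height_px, page_width_in, page_height_in):
    """Return the lower of the horizontal and vertical pixel densities."""
    if min(width_px, height_px, page_width_in, page_height_in) <= 0:
        raise ValueError("all dimensions must be positive")
    return min(width_px / page_width_in, height_px / page_height_in)


def meets_recommendation(width_px, height_px, page_width_in, page_height_in,
                         min_dpi=300):
    """True if the scan reaches min_dpi along both axes."""
    dpi = effective_dpi(width_px, height_px, page_width_in, page_height_in)
    return dpi >= min_dpi


if __name__ == "__main__":
    # Assumed example: a 1240 x 1754 pixel scan of an A4 page (8.27 x 11.69 in)
    dpi = effective_dpi(1240, 1754, 8.27, 11.69)
    print(f"Estimated density: {dpi:.0f} DPI")
    print("Meets recommendation:", meets_recommendation(1240, 1754, 8.27, 11.69))
```

DPI metadata embedded in an image file can be missing or wrong, which is why this estimate works from physical dimensions instead.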
## Citation
```bibtex
@misc{qari-ocr-0.4.0,
author = {NAMAA-Space},
title = {Qari-OCR-0.4.0-VL-4B-Instruct},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/NAMAA-Space/Qari-OCR-0.4.0-VL-4B-Instruct}}
}
```