---
license: apache-2.0
base_model: Qwen/Qwen3-VL-4B-Instruct
tags:
- vision
- ocr
- arabic
- islamic
- qwen3
- fine-tuned
language:
- ar
datasets:
- seemorg/books-ocr
metrics:
- wer
- cer
- bleu
library_name: peft
pipeline_tag: image-to-text
---

# Qari-OCR-0.4.0-VL-4B-Instruct

A vision-language model fine-tuned for OCR on Islamic books and Arabic manuscripts. It is based on [Qwen/Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) and was trained on 45,000 image-text pairs from the [seemorg/books-ocr](https://huggingface.co/datasets/seemorg/books-ocr) dataset.

## Results

| Model | CER ↓ | WER ↓ | BLEU ↑ |
|-------|-------|-------|--------|
| **Qari-OCR-0.4.0** | **0.1222** | **0.2562** | **68.41** |
| Qwen/Qwen3-VL-4B-Instruct | 0.4922 | 0.6966 | 34.61 |
| Qwen/Qwen3-VL-8B-Instruct | 0.6876 | 0.8954 | 23.89 |
| NAMAA/Qari-0.2.2.1 | 0.6448 | 0.5126 | 21.97 |
| MBZUAI/AIN | 1.2843 | 1.2697 | 3.50 |
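
The card does not say how CER and WER were computed (tooling, text normalization, handling of diacritics), so the following is only a minimal sketch of the conventional definitions: edit distance divided by reference length, at character level for CER and word level for WER. The function names are illustrative and not taken from the authors' evaluation code. Under this definition a rate can exceed 1.0 when the output contains many inserted characters or words, which may explain values above 1 in a table like this one.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference item
                curr[j - 1] + 1,     # insert a hypothesis item
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Lower is better for both rates, matching the arrows in the table header.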

## Usage

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
import torch

model_name = "NAMAA-Space/Qari-OCR-0.4.0-VL-4B-Instruct"
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Placeholder: file name of the page image to transcribe.
src = "page.png"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": f"./{src}"},
            {"type": "text", "text": "Free OCR."},
        ],
    }
]

# Preparation for inference: apply the chat template and tokenize.
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference: generate the output, then drop the prompt tokens from each sequence.
generated_ids = model.generate(**inputs, max_new_tokens=2048)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
result = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(result)
```
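
To run the same prompt over several pages, the message structure from the usage example can be wrapped in a small helper. This is a sketch only: `build_ocr_messages`, `list_page_images`, and the `IMAGE_SUFFIXES` set are hypothetical names introduced here, and the sketch assumes, as the usage example does, that the processor accepts a local file path in the `"image"` field.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the files you actually have.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def build_ocr_messages(image_path, prompt="Free OCR."):
    """Build the single-turn chat message list used in the usage example."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": str(image_path)},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def list_page_images(folder):
    """Return the image files in `folder`, sorted by path for a stable page order."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Each message list returned by `build_ocr_messages` can be passed to `processor.apply_chat_template` in place of the hard-coded `messages` variable.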

## Training

- **Base model:** Qwen/Qwen3-VL-4B-Instruct
- **Dataset:** [seemorg/books-ocr](https://huggingface.co/datasets/seemorg/books-ocr)
- **Training samples:** 45,000 image-text pairs
- **Domain:** Islamic books and Arabic religious texts

## Limitations

- Optimized for printed Islamic texts; performance may vary on modern Arabic fonts or handwritten text.
- Requires reasonable image quality (300+ DPI recommended).
- Supports Arabic script only.
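
The 300 DPI recommendation can be checked before running OCR when the physical page size is known. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the scan covers the full page width with no cropping or margins added.

```python
RECOMMENDED_DPI = 300  # threshold taken from the limitation above


def effective_dpi(pixel_width, page_width_inches):
    """Estimate scan resolution from the image width and the physical page width."""
    if pixel_width <= 0 or page_width_inches <= 0:
        raise ValueError("both dimensions must be positive")
    return pixel_width / page_width_inches


def meets_dpi_recommendation(pixel_width, page_width_inches, minimum=RECOMMENDED_DPI):
    """Return True when the estimated resolution reaches the recommended minimum."""
    return effective_dpi(pixel_width, page_width_inches) >= minimum
```

For example, an A4 page is about 8.27 inches wide, so a scan roughly 2,480 pixels wide corresponds to approximately 300 DPI, while a 1,240-pixel scan of the same page is closer to 150 DPI.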

## Citation

```bibtex
@misc{qari-ocr-0.4.0,
  author = {NAMAA-Space},
  title = {Qari-OCR-0.4.0-VL-4B-Instruct},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/NAMAA-Space/Qari-OCR-0.4.0-VL-4B-Instruct}}
}
```