---
license: apache-2.0
base_model: Qwen/Qwen3-VL-4B-Instruct
tags:
- vision
- ocr
- arabic
- islamic
- qwen3
- fine-tuned
language:
- ar
datasets:
- seemorg/books-ocr
metrics:
- wer
- cer
- bleu
library_name: peft
pipeline_tag: image-to-text
---

# Qari-OCR-0.4.0-VL-4B-Instruct

A vision-language model fine-tuned for OCR on Islamic books and Arabic manuscripts. It is based on [Qwen/Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) and was trained on 45,000 image-text pairs from the [seemorg/books-ocr](https://huggingface.co/datasets/seemorg/books-ocr) dataset.

## Results

| Model | CER ↓ | WER ↓ | BLEU ↑ |
|-------|-------|-------|--------|
| **Qari-OCR-0.4.0** | **0.1222** | **0.2562** | **68.41** |
| Qwen/Qwen3-VL-4B-Instruct | 0.4922 | 0.6966 | 34.61 |
| Qwen/Qwen3-VL-8B-Instruct | 0.6876 | 0.8954 | 23.89 |
| NAMAA/Qari-0.2.2.1 | 0.6448 | 0.5126 | 21.97 |
| MBZUAI/AIN | 1.2843 | 1.2697 | 3.50 |

## Usage

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
import torch

model_name = "NAMAA-Space/Qari-OCR-0.4.0-VL-4B-Instruct"

processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Placeholder file name: replace with the path to your own page image
src = "page.png"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": f"./{src}"},
            {"type": "text", "text": "Free OCR."},
        ],
    }
]

# Preparation for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
)
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=2048)

# Drop the prompt tokens so only the newly generated text is decoded
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
result = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(result)
```

## Training

- **Base model:** Qwen/Qwen3-VL-4B-Instruct
- **Dataset:** [seemorg/books-ocr](https://huggingface.co/datasets/seemorg/books-ocr)
- **Training samples:** 45,000 image-text pairs
- **Domain:** Islamic books and Arabic religious texts

## Limitations

- Optimized for printed Islamic texts; performance may vary on modern Arabic fonts or handwritten text.
- Requires reasonable image quality (300+ DPI recommended).
- Arabic script only.

## Citation

```bibtex
@misc{qari-ocr-0.4.0,
  author = {NAMAA-Space},
  title = {Qari-OCR-0.4.0-VL-4B-Instruct},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/NAMAA-Space/Qari-OCR-0.4.0-VL-4B-Instruct}}
}
```
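
## Metric sketch

The CER and WER columns in the Results table are edit-distance metrics. This card does not say which library or text normalization produced the reported numbers, so the snippet below is only a minimal sketch assuming the standard definitions: Levenshtein distance between reference and prediction, divided by the reference length in characters (CER) or whitespace-separated words (WER). The function names `edit_distance`, `cer`, and `wer` are illustrative and not part of this model's code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference item
                curr[j - 1] + 1,     # insert a hypothesis item
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference string."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Illustrative strings only, not taken from the evaluation set
reference = "بسم الله الرحمن الرحيم"
prediction = "بسم الله الرحمان الرحيم"
print("CER:", cer(reference, prediction))
print("WER:", wer(reference, prediction))
```

Because the denominator is the reference length, a prediction with many spurious insertions can score above 1.0 under these definitions, which is consistent with the MBZUAI/AIN row in the Results table.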