
	---
	license: apache-2.0
	base_model: Qwen/Qwen3-VL-4B-Instruct
	tags:
	- vision
	- ocr
	- arabic
	- islamic
	- qwen3
	- fine-tuned
	language:
	- ar
	datasets:
	- seemorg/books-ocr
	metrics:
	- wer
	- cer
	- bleu
	library_name: peft
	pipeline_tag: image-to-text
	---

	# Qari-OCR-0.4.0-VL-4B-Instruct

	A vision-language model fine-tuned for OCR on Islamic books and Arabic manuscripts. It is based on [Qwen/Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) and was trained on 45,000 image-text pairs from the [seemorg/books-ocr](https://huggingface.co/datasets/seemorg/books-ocr) dataset.

	## Results

	| Model | CER ↓ | WER ↓ | BLEU ↑ |
	|-------|-------|-------|--------|
	| Qari-OCR-0.4.0 | 0.1222 | 0.2562 | 68.41 |
	| Qwen/Qwen3-VL-4B-Instruct | 0.4922 | 0.6966 | 34.61 |
	| Qwen/Qwen3-VL-8B-Instruct | 0.6876 | 0.8954 | 23.89 |
	| NAMAA/Qari-0.2.2.1 | 0.6448 | 0.5126 | 21.97 |
	| MBZUAI/AIN | 1.2843 | 1.2697 | 3.50 |
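
The card does not say which tool or text normalization produced these scores. As a rough illustration only, the sketch below assumes the common Levenshtein-based definitions: CER is character-level edit distance divided by reference length, and WER is the same over whitespace-separated words. The function names and the sample strings are hypothetical. Under these definitions a score can exceed 1.0 when the hypothesis is much longer than the reference, which would be consistent with the MBZUAI/AIN row.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra item in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical example: the hypothesis has one extra letter in the third word.
reference = "بسم الله الرحمن الرحيم"
hypothesis = "بسم الله الرحمان الرحيم"
print(f"CER={cer(reference, hypothesis):.4f} WER={wer(reference, hypothesis):.4f}")
```

Real evaluations often normalize text first (for example, stripping diacritics or unifying whitespace), so numbers from this sketch are not directly comparable to the table.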

	## Usage

	```python
	from transformers import AutoProcessor, AutoModelForVision2Seq
	import torch

	model_name = "NAMAA-Space/Qari-OCR-0.4.0-VL-4B-Instruct"
	processor = AutoProcessor.from_pretrained(model_name)
	model = AutoModelForVision2Seq.from_pretrained(
	    model_name,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	# Path to the page image to transcribe (placeholder; replace with your own file).
	src = "page.png"

	# Single-turn chat message: one image plus the OCR prompt.
	messages = [
	    {
	        "role": "user",
	        "content": [
	            {"type": "image", "image": f"./{src}"},
	            {"type": "text", "text": "Free OCR."},
	        ],
	    }
	]

	# Preparation for inference
	inputs = processor.apply_chat_template(
	    messages,
	    tokenize=True,
	    add_generation_prompt=True,
	    return_dict=True,
	    return_tensors="pt",
	)
	inputs = inputs.to(model.device)

	# Inference: Generation of the output
	generated_ids = model.generate(**inputs, max_new_tokens=2048)

	# Drop the prompt tokens so that only the newly generated text is decoded.
	generated_ids_trimmed = [
	    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
	]
	result = processor.batch_decode(
	    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
	)[0]
	print(result)
	```
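
To transcribe a folder of page scans, the message payload above can be built once per image. The following is a minimal sketch and not part of the original card: `build_messages`, `list_page_images`, and the `IMAGE_SUFFIXES` set are hypothetical helpers, and the folder layout is an assumption.

```python
from pathlib import Path

# Assumed set of page-image formats; adjust to match your scans.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_messages(image_path, prompt="Free OCR."):
    """Build the single-turn chat payload from the usage snippet for one image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": str(image_path)},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def list_page_images(folder):
    """Return the image files in `folder`, sorted by path so pages stay in order."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Each result of `build_messages(path)` can then be passed to `processor.apply_chat_template` in place of `messages`. Sorting is lexicographic, so zero-padded file names (`page_002.png` rather than `page_2.png`) keep the pages in reading order.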

	## Training

	- Base model: Qwen/Qwen3-VL-4B-Instruct
	- Dataset: [seemorg/books-ocr](https://huggingface.co/datasets/seemorg/books-ocr)
	- Training samples: 45,000 image-text pairs
	- Domain: Islamic books and Arabic religious texts

	## Limitations

	- Optimized for printed Islamic texts; performance may vary on modern Arabic fonts or handwritten text.
	- Requires reasonable image quality (300+ DPI recommended).
	- Arabic script only.
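
To put the 300 DPI recommendation in concrete terms, the pixel size of a scan follows from the physical page size: pixels = millimetres / 25.4 × DPI. The sketch below is illustrative arithmetic only. The helper names are hypothetical, and the A4 page size is an assumption, since the card does not specify page dimensions.

```python
MM_PER_INCH = 25.4


def pixels_at_dpi(width_mm, height_mm, dpi=300):
    """Pixel dimensions of a page of the given physical size scanned at `dpi`."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def effective_dpi(pixel_width, page_width_mm):
    """Approximate DPI of an existing scan from its pixel width and page width."""
    return pixel_width / (page_width_mm / MM_PER_INCH)


# Assumed example: an A4 page (210 mm x 297 mm).
print(pixels_at_dpi(210, 297))          # target size at 300 DPI
print(round(effective_dpi(1240, 210)))  # DPI of a 1240-pixel-wide A4 scan
```

A scan that falls well below the target may benefit from being rescanned rather than upscaled, since upscaling does not restore lost detail.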

	## Citation

	```bibtex
	@misc{qari-ocr-0.4.0,
	author = {NAMAA-Space},
	title = {Qari-OCR-0.4.0-VL-4B-Instruct},
	year = {2025},
	publisher = {HuggingFace},
	howpublished = {\url{https://huggingface.co/NAMAA-Space/Qari-OCR-0.4.0-VL-4B-Instruct}}
	}
	```