Egyptian Arabic Qwen3-TTS — Custom Voice
A fine-tuned Qwen3-TTS 1.7B model specialized in generating Egyptian Arabic (Masri) speech with a natural, conversational tone.
The model was trained on Egyptian Arabic speech data to better capture the dialectal prosody, pronunciation, and conversational rhythm that are not well represented in the base multilingual Qwen3-TTS model.
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-TTS-12Hz-1.7B-Base |
| Task | Text-to-Speech (Speech Synthesis) |
| Architecture | Transformer TTS |
| Parameters | 1.7B |
| Speech Codec | 12Hz |
| Voice Type | Custom Voice |
| Speaker Name | egyptian_speaker |
| Primary Language | Egyptian Arabic (عربي مصري) |
| Training Data | ~25 hours clean Egyptian speech |
Motivation
While the base Qwen3-TTS model supports multiple languages, the Egyptian Arabic dialect is significantly underrepresented in it. On Egyptian text, the base model produced speech that sounded foreign, with inconsistent pronunciation and an unnatural conversational rhythm.
This fine-tune directly addresses those issues:
- Natural Egyptian dialect pronunciation
- Conversational prosody and tone
- Clear, clean speech output
- Retains the original model's multilingual capability
Example Usage
Installation
pip install qwen-tts soundfile torch
Basic Inference
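from qwen_tts import Qwen3TTSModel
import soundfile as sf
import torch

model_id = "itshamdi404/Egy_Arabic_Qwen3-TTS-12Hz-1.7B-Base"

# Load the fine-tuned model in half precision.
tts = Qwen3TTSModel.from_pretrained(
    model_id,
    device_map={"": 0},         # place the whole model on GPU 0
    torch_dtype=torch.float16,
)

# Synthesize one utterance with the fine-tuned speaker.
wavs, sr = tts.generate_custom_voice(
    text="إزيك يا صاحبي عامل إيه النهاردة",
    speaker="egyptian_speaker",
    language="auto",
)

# Save the first generated waveform at the returned sample rate.
sf.write("speech.wav", wavs[0], sr)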
Optional Sampling Parameters
wavs, sr = tts.generate_custom_voice(
    text="النهاردة الجو جميل جدا في القاهرة",
    speaker="egyptian_speaker",
    language="auto",
    temperature=0.8,  # Controls variation (lower = more consistent)
    top_p=0.9,        # Nucleus sampling threshold
)

sf.write("speech.wav", wavs[0], sr)
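To give a feel for what `temperature` and `top_p` do, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over a toy list of logits. It illustrates the general technique only. The helper names (`apply_temperature`, `top_p_filter`, `sample`) are hypothetical, and the source does not confirm that `qwen-tts` implements sampling in exactly this way.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Softmax over logits divided by temperature (lower = sharper)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total
    probability reaches top_p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


def sample(logits, temperature=0.8, top_p=0.9, rng=random):
    """Draw one token index using temperature + nucleus sampling."""
    filtered = top_p_filter(apply_temperature(logits, temperature), top_p)
    ids = list(filtered)
    weights = [filtered[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0, -3.0]  # made-up values for illustration
    print(sample(toy_logits, rng=random.Random(0)))
```

In this sketch, lowering `temperature` concentrates probability on the most likely token, while lowering `top_p` shrinks the set of candidates that can be sampled at all. Both changes push the output toward more consistent, less varied speech.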
Dataset
The model was fine-tuned on approximately 90 hours of clean Egyptian Arabic speech collected from real spoken Egyptian sources.
The dataset includes a variety of speakers and natural conversational language covering a wide range of topics.
Limitations
- The model is specialized for Egyptian Arabic and may perform worse on other Arabic dialects.
- Performance may degrade on:
  - Uncommon or rare vocabulary
  - Regional Egyptian sub-dialect variations
- Only a single Egyptian speaker voice is currently available.
- Like most TTS systems, performance may vary on very long or complex sentences.
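One common workaround for the long-sentence limitation is to split the input into shorter chunks at sentence boundaries and synthesize each chunk separately. The sketch below shows one possible way to do that in plain Python. The function name `split_into_chunks`, the `max_chars` budget of 120, and the punctuation set are all assumptions chosen for illustration, not settings recommended by the model author.

```python
import re

# Split after Latin sentence punctuation or the Arabic question mark.
# This punctuation set is an assumption and may need extending.
_SENTENCE_END = re.compile(r"(?<=[.!?؟])\s+")


def split_into_chunks(text, max_chars=120):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own
    chunk rather than being cut mid-sentence.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    long_text = "إزيك يا صاحبي؟ عامل إيه النهاردة؟ الجو جميل جدا في القاهرة."
    for chunk in split_into_chunks(long_text, max_chars=30):
        print(chunk)
```

Each chunk could then be passed to `tts.generate_custom_voice` as in the examples above, and the resulting waveforms joined before saving. Assuming the returned waveforms are NumPy arrays, `numpy.concatenate` would be one way to join them.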
Future Improvements
Possible future improvements include:
- Adding more speaker voices and diversity
- Training on larger Egyptian Arabic datasets
- Improving robustness across regional Egyptian sub-dialects
- Evaluating across multiple Arabic dialects
Author
Hamdi Mohamed — AI Engineer specializing in:
- Large Language Models (LLMs)
- Speech AI
- Computer Vision
Citation
If you use this model in your research or project, please cite:
@misc{hamdi2026egyptianqwen3tts,
  author = {Hamdi Mohamed},
  title  = {Egyptian Arabic Qwen3-TTS: Fine-tuning Large TTS Models for Regional Dialects},
  year   = {2026},
  url    = {https://huggingface.co/itshamdi404/Egy_Arabic_Qwen3-TTS-12Hz-1.7B-Base}
}