Hybrid Multilingual RAG + Fine-Tuned Economic Sentiment + Economic Forecast
ENSSEA | Master's Thesis | Si Tayeb Houari | 2025–2026
🏗️ Architecture Overview
This application is an end-to-end research pipeline that combines:
- Hybrid Retrieval-Augmented Generation (RAG)
- Multilingual Economic Sentiment Analysis
- Supervised Fine-Tuned Sentiment Model
- Time-Series Economic Forecasting
- Retrieval Quality Evaluation
🔍 Tab-by-Tab Explanation
1 · Upload Files
Upload PDF, TXT, CSV, or DOCX files. The system extracts text, splits it into overlapping chunks (≈320 characters with 90-char overlap), embeds every chunk with a multilingual sentence transformer, and stores them in a FAISS flat inner-product index. Metadata (filename, page, year, language) is preserved per chunk.
- Similarity Threshold: chunks scoring below this cosine similarity are filtered out before reranking.
- Strict RAG Mode: when ON, the chatbot answers only from the index and never falls back to general knowledge.
- Save / Load Index: serialises the FAISS index and its metadata to `/tmp` so they survive a restart.
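The overlapping-chunk step can be illustrated with a minimal sketch. It assumes a plain fixed-width character window (320 characters, 90 shared with the previous window); the function name `chunk_text` is hypothetical, and since the application's sizes are approximate, its splitter may well respect sentence or word boundaries.

```python
def chunk_text(text: str, size: int = 320, overlap: int = 90) -> list:
    """Split text into character windows that share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each window starts `step` characters after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


pieces = chunk_text("x" * 1000)
print([len(p) for p in pieces])  # [320, 320, 320, 310]
```

With these defaults each new window begins 230 characters after the previous one, so any sentence shorter than 90 characters appears whole in at least one chunk.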
2 · Sentiment / Search
Enter a word or phrase. The system:
- Searches the index for exact keyword hits (regex, case-insensitive).
- Runs the full hybrid retrieval pipeline to find the most semantically relevant chunks.
- Computes an ensemble sentiment score using 4 signals:
| Signal | Weight (general) | Weight (economic) |
|---|---|---|
| FinBERT | 40% | 15% |
| XLM-RoBERTa | 20% | — |
| Fine-Tuned Econ Model | 25% | 60% |
| Lexicon | 15% | 25% |
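As a rough illustration of how the weights in the table could be applied, the sketch below combines the four signals as a weighted sum. It assumes each signal is already mapped to a score in [-1, 1] and that "—" means a weight of zero; the dictionary keys and the function name `ensemble_sentiment` are hypothetical, and the application may combine or normalise the signals differently.

```python
# Weights copied from the table; a dash is treated as a weight of zero.
WEIGHTS = {
    "general":  {"finbert": 0.40, "xlm_roberta": 0.20, "fine_tuned": 0.25, "lexicon": 0.15},
    "economic": {"finbert": 0.15, "xlm_roberta": 0.00, "fine_tuned": 0.60, "lexicon": 0.25},
}


def ensemble_sentiment(scores: dict, mode: str = "general") -> float:
    """Weighted sum of per-model scores; a missing signal contributes zero."""
    weights = WEIGHTS[mode]
    return sum(weights[name] * scores.get(name, 0.0) for name in weights)


# Made-up scores for a single passage, each assumed to lie in [-1, 1].
scores = {"finbert": 0.6, "xlm_roberta": 0.2, "fine_tuned": -0.4, "lexicon": 0.0}
print(round(ensemble_sentiment(scores, "general"), 2))   # 0.18
print(round(ensemble_sentiment(scores, "economic"), 2))  # -0.15
```

The example shows why the mode matters: the same passage reads as mildly positive under the general weights and mildly negative once the fine-tuned economic model dominates.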
3 · Smart Chatbot
A grounded conversational agent. For each user message:
- Intent detection — "explain the PDF" routes to document summarisation mode.
- RAG retrieve — retrieves top-4 chunks using the pipeline described below.
- Evidence check — decides whether retrieved evidence is strong, partial, or absent.
- LLM (Llama-3.3-70B via Groq) — generates the answer with a strict grounding system prompt.
- Cites source filenames and page numbers inline.
4 · Voice Interface
Record your question via microphone:
- ASR — OpenAI Whisper-Small transcribes the audio (mono float32, any sample rate).
- The transcript is passed to the Smart Chatbot pipeline.
- The text answer is converted to speech with gTTS (Google Text-to-Speech), supporting English and Arabic.
- Common issues (too-short audio, ASR load failure) are caught and shown as readable error messages.
5 · Analytics
Session and corpus statistics:
- Word Cloud — frequency-weighted scatter of the most common tokens.
- Language Distribution — pie chart of Arabic vs English chunks.
- Top Keywords — horizontal bar chart of the 15 most frequent English words (stop-words removed).
- Live session counters (questions asked, RAG hits, fallbacks, build time).
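The Top Keywords chart boils down to a stop-word-filtered frequency count. The following is a minimal sketch under stated assumptions: the tokeniser (lowercase ASCII letters only), the minimum token length, and the tiny `STOP_WORDS` set are all hypothetical stand-ins for whatever the application uses.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def top_keywords(chunks, k=15):
    """Return the k most frequent English tokens across all chunks."""
    counts = Counter()
    for chunk in chunks:
        for token in re.findall(r"[a-z]+", chunk.lower()):
            if token not in STOP_WORDS and len(token) > 2:
                counts[token] += 1
    return counts.most_common(k)


print(top_keywords(["Inflation in the economy", "The inflation rate is rising"], k=3))
```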
6 · Forecast
Fetches macroeconomic time-series from the World Bank API (GDP growth, CPI inflation, unemployment, exchange rate) and fits two competing models:
| Model | Description |
|---|---|
| ARIMA(1,1,1) | Baseline autoregressive integrated moving-average |
| SARIMAX+Ensemble | ARIMA augmented with the document-derived ensemble sentiment index as an exogenous variable |
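For reference, the two specifications can be written out. The notation is introduced here for illustration: $y_t$ is the economic target, $s_t$ the ensemble sentiment index, and $\varepsilon_t$ white noise. The sketch assumes no constant term and that the SARIMAX model keeps the same (1,1,1) order, which the table suggests but does not state. The baseline ARIMA(1,1,1) is

$$
\Delta y_t = \phi_1 \, \Delta y_{t-1} + \varepsilon_t + \theta_1 \, \varepsilon_{t-1}
$$

and the sentiment-augmented model, written as a regression with ARIMA(1,1,1) errors (the way statsmodels' SARIMAX treats exogenous regressors by default), is

$$
y_t = \beta \, s_t + u_t, \qquad \Delta u_t = \phi_1 \, \Delta u_{t-1} + \varepsilon_t + \theta_1 \, \varepsilon_{t-1}
$$

Setting $\beta = 0$ recovers the baseline, so the comparison between the two models amounts to asking whether the sentiment term improves the forecasts.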
Statistical validation:
- ADF test: checks each series for stationarity before Granger testing.
- Granger causality: tests whether past values of the sentiment index help predict the economic target.
- Diebold-Mariano test: tests whether the difference in forecast accuracy between SARIMAX and ARIMA is statistically significant.
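To make the Diebold-Mariano step concrete, here is a minimal pure-Python sketch of the statistic for one-step-ahead forecasts under squared-error loss. It assumes no autocorrelation correction and no small-sample adjustment, and uses a normal approximation for the p-value; the function name and the error values are made up for illustration and are not results from the application.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """DM statistic and two-sided p-value for two series of forecast errors."""
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equally long error series with at least 2 points")
    # Loss differential: squared error of model A minus squared error of model B.
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = sum(d) / n
    variance = sum((x - d_bar) ** 2 for x in d) / n
    if variance == 0:
        raise ValueError("loss differential is constant; the statistic is undefined")
    stat = d_bar / math.sqrt(variance / n)
    # Two-sided p-value under the standard normal approximation.
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value


# Illustrative forecast errors only.
arima_errors = [1.0, -2.0, 1.5, -1.0, 2.0, -1.5]
sarimax_errors = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
stat, p = diebold_mariano(arima_errors, sarimax_errors)
print(f"DM = {stat:.2f}, p = {p:.4f}")
```

A positive statistic means the second model (here SARIMAX) has the lower average squared error. With only a handful of annual observations, a small-sample correction would be advisable before reading much into the p-value.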
7 · Auto RAG Eval
Automatically constructs a held-out evaluation set from the indexed chunks and measures:
| Metric | Definition |
|---|---|
| Context Recall | Token-F1 between ground-truth and the best retrieved chunk |
| Faithfulness | Token-F1 between the generated answer and the top retrieved chunk |
| Answer Relevancy | Token-F1 between the generated answer and the ground truth |
| Context Precision | Mean token-F1 across all retrieved chunks vs. ground truth |
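All four metrics rest on the same token-F1 overlap measure. The sketch below shows one common way to compute it, assuming lowercase whitespace tokenisation (the application's tokeniser may differ); the helper names are hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def context_recall(ground_truth: str, retrieved: list) -> float:
    """Token-F1 between the ground truth and the best retrieved chunk."""
    return max((token_f1(chunk, ground_truth) for chunk in retrieved), default=0.0)


def context_precision(ground_truth: str, retrieved: list) -> float:
    """Mean token-F1 across all retrieved chunks."""
    if not retrieved:
        return 0.0
    return sum(token_f1(chunk, ground_truth) for chunk in retrieved) / len(retrieved)


chunks = ["gdp growth rose", "inflation fell"]
print(round(context_recall("gdp growth fell", chunks), 3))
print(round(context_precision("gdp growth fell", chunks), 3))
```

Faithfulness and Answer Relevancy follow the same pattern, with `token_f1` applied to the generated answer against the top chunk and the ground truth respectively.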
8 · Retrieval Ranking Eval (new)
Upload a JSON or CSV gold-labelled evaluation set and measure how well the production retrieval pipeline ranks relevant documents:
| Metric | Definition |
|---|---|
| Recall@K | Was a relevant chunk found in the top-K results? (K = 1, 3, 5) |
| Precision@3 | Fraction of top-3 results that are relevant |
| MRR | Mean Reciprocal Rank — average of 1/rank of first relevant hit |
| Avg Latency | Wall-clock time per query in milliseconds |
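The ranking metrics in the table can be sketched from a per-query list of relevance flags, ordered by rank. This is an illustrative implementation under one assumption worth flagging: Precision@3 divides by 3 even when fewer than three results come back. The function names and the example flags are hypothetical.

```python
def recall_at_k(relevance, k):
    """1.0 if any of the top-k results is relevant, else 0.0."""
    return 1.0 if any(relevance[:k]) else 0.0


def precision_at_k(relevance, k):
    """Fraction of the top-k slots filled by relevant results."""
    return sum(relevance[:k]) / k


def reciprocal_rank(relevance):
    """1/rank of the first relevant result, or 0.0 if there is none."""
    for rank, hit in enumerate(relevance, start=1):
        if hit:
            return 1.0 / rank
    return 0.0


def evaluate(runs):
    """Average each metric over all queries; `runs` holds one flag list per query."""
    n = len(runs)
    return {
        "recall@1": sum(recall_at_k(r, 1) for r in runs) / n,
        "recall@3": sum(recall_at_k(r, 3) for r in runs) / n,
        "recall@5": sum(recall_at_k(r, 5) for r in runs) / n,
        "precision@3": sum(precision_at_k(r, 3) for r in runs) / n,
        "mrr": sum(reciprocal_rank(r) for r in runs) / n,
    }


# Made-up relevance flags for three queries, five results each.
runs = [
    [False, True, False, False, False],
    [True, False, True, False, False],
    [False, False, False, False, False],
]
print(evaluate(runs))
```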
Relevance matching (in priority order):
1. `file == gold_doc_id`: exact filename match
2. `chunk_id in gold_chunk_ids`: exact chunk-level match
3. `page == gold_page`: page-level match (fallback)
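One plausible reading of this priority order is sketched below: the criteria are tried in sequence, gold fields that are absent are skipped, and the first match decides. The function name `match_level` and the exact combination logic are assumptions; the result keys `file`, `chunk_id`, and `page` follow the expressions listed above.

```python
from typing import Optional


def match_level(result: dict, gold: dict) -> Optional[str]:
    """Return the name of the first criterion that matches, or None."""
    if gold.get("gold_doc_id") and result.get("file") == gold["gold_doc_id"]:
        return "file"
    if gold.get("gold_chunk_ids") and result.get("chunk_id") in gold["gold_chunk_ids"]:
        return "chunk"
    if gold.get("gold_page") is not None and result.get("page") == gold["gold_page"]:
        return "page"
    return None  # nothing matched, or the gold entry carries only a query


gold = {"query": "What is the GDP growth rate?", "gold_doc_id": "report_2023.pdf",
        "gold_chunk_ids": [12, 13], "gold_page": 3}
print(match_level({"file": "report_2023.pdf", "chunk_id": 40, "page": 9}, gold))
print(match_level({"file": "other.pdf", "chunk_id": 40, "page": 3}, gold))
```

Under this reading a filename match alone is enough to count a result as relevant, so chunk-level labels only matter when the filename differs or is missing from the gold entry.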
Gold dataset format (JSON):
[
  {
    "query": "What is the GDP growth rate?",
    "gold_doc_id": "report_2023.pdf",
    "gold_chunk_ids": [12, 13],
    "gold_page": 3
  }
]
All fields except query are optional; missing fields are skipped. A CSV file uses the same field names as column headers: query, gold_doc_id, gold_chunk_ids, gold_page.
🛠️ Core Retrieval Pipeline
Every query goes through this exact pipeline (shared by chatbot, voice, and both evaluation tabs):
Query
│
├─ 1. Dense retrieval (FAISS inner product, top-48 candidates)
│ └── paraphrase-multilingual-MiniLM-L12-v2
│
├─ 2. Filter (cosine similarity < threshold → drop)
│
├─ 3. Hybrid scoring
│ ├── Semantic score × 0.45
│ ├── BM25 score × up to 0.25
│ ├── Keyword overlap × 0.20
│ └── Exact match bonus + 0.10
│
├─ 4. Cross-encoder reranker (ms-marco-MiniLM-L-6-v2)
│ └── final = hybrid × 0.45 + CE_norm × 0.55
│
├─ 5. Deduplicate (max 2 chunks per file)
│
└─ 6. Return top-N candidates with metadata
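Steps 3 to 6 of the diagram can be sketched in a few lines. The weights are taken from the diagram; everything else is an assumption made for illustration: all component scores are taken to be normalised to [0, 1] (so BM25 contributes "up to" 0.25), the cross-encoder score is already normalised, and the function names, dictionary keys, and candidate values are hypothetical.

```python
from collections import defaultdict


def hybrid_score(semantic, bm25_norm, keyword_overlap, exact_match):
    """Step 3: weighted mix of the retrieval signals plus the exact-match bonus."""
    bonus = 0.10 if exact_match else 0.0
    return 0.45 * semantic + 0.25 * bm25_norm + 0.20 * keyword_overlap + bonus


def final_score(hybrid, ce_norm):
    """Step 4: blend the hybrid score with the normalised cross-encoder score."""
    return 0.45 * hybrid + 0.55 * ce_norm


def dedupe_per_file(ranked, max_per_file=2):
    """Step 5: keep at most `max_per_file` chunks per file, preserving rank order."""
    kept, seen = [], defaultdict(int)
    for cand in ranked:
        if seen[cand["file"]] < max_per_file:
            kept.append(cand)
            seen[cand["file"]] += 1
    return kept


def rerank(candidates, top_n=4):
    """Steps 3 to 6 applied to candidates that already passed the threshold filter."""
    scored = []
    for cand in candidates:
        hybrid = hybrid_score(cand["semantic"], cand["bm25_norm"],
                              cand["keyword_overlap"], cand["exact_match"])
        scored.append({**cand, "final": final_score(hybrid, cand["ce_norm"])})
    scored.sort(key=lambda c: c["final"], reverse=True)
    return dedupe_per_file(scored)[:top_n]


# Made-up candidates: three chunks from one file, one from another.
candidates = [
    {"file": "a.pdf", "chunk_id": 1, "semantic": 0.9, "bm25_norm": 0.8,
     "keyword_overlap": 0.5, "exact_match": True, "ce_norm": 0.9},
    {"file": "a.pdf", "chunk_id": 2, "semantic": 0.8, "bm25_norm": 0.6,
     "keyword_overlap": 0.4, "exact_match": False, "ce_norm": 0.7},
    {"file": "a.pdf", "chunk_id": 3, "semantic": 0.7, "bm25_norm": 0.5,
     "keyword_overlap": 0.3, "exact_match": False, "ce_norm": 0.6},
    {"file": "b.pdf", "chunk_id": 7, "semantic": 0.6, "bm25_norm": 0.4,
     "keyword_overlap": 0.2, "exact_match": False, "ce_norm": 0.5},
]
print([(c["file"], c["chunk_id"]) for c in rerank(candidates)])
# [('a.pdf', 1), ('a.pdf', 2), ('b.pdf', 7)]
```

The third chunk from `a.pdf` outranks the `b.pdf` chunk on score but is dropped by the per-file cap, which is how the deduplication step keeps a single long document from crowding out other sources.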
🧰 Key Libraries
| Library | Role |
|---|---|
| `sentence-transformers` | Dense embedding + cross-encoder reranking |
| `faiss-cpu` | Vector similarity search (flat inner-product index) |
| `transformers` | FinBERT, XLM-RoBERTa, Whisper ASR |
| `groq` | Llama-3.3-70B LLM inference |
| `gTTS` | Text-to-speech for voice output |
| `langdetect` | Language detection per chunk |
| `statsmodels` | ARIMA, SARIMAX, Granger, ADF |
| `gradio` | Web UI |
| `pypdf` / `PyPDF2` | PDF text extraction |
| `python-docx` | DOCX text extraction |
Developed as part of a Master's thesis in Economics & Data Science at ENSSEA, Algeria.