🎙️ How to use: Click the microphone below, speak your question clearly, then press Ask by Voice. The system will transcribe your speech, search the documents, and read the answer aloud.

Tips: Speak for at least 1 second · Avoid background noise · English and Arabic supported.

🎯 Retrieval Ranking Evaluation

Upload a JSON or CSV gold dataset to measure how well the retrieval pipeline finds the right chunks.

JSON format:

[{"query": "...", "gold_doc_id": "file.pdf", "gold_chunk_ids": [12, 13], "gold_page": 3}]

CSV format (columns: query, gold_doc_id, gold_chunk_ids, gold_page). All fields except query are optional — missing ones are safely ignored.

🤖 Hybrid Multilingual RAG + Economic Sentiment + Forecast

ENSSEA | Master's Thesis | Si Tayeb Houari | 2025–2026

🏗️ Architecture Overview

This application is a full end-to-end research pipeline that combines:

Hybrid Retrieval-Augmented Generation (RAG)
Multilingual Economic Sentiment Analysis
Supervised Fine-Tuned Sentiment Model
Time-Series Economic Forecasting
Retrieval Quality Evaluation

🔍 Tab-by-Tab Explanation

1 · Upload Files

Upload PDF, TXT, CSV, or DOCX files. The system extracts text, splits it into overlapping chunks (≈320 characters with 90-char overlap), embeds every chunk with a multilingual sentence transformer, and stores them in a FAISS flat inner-product index. Metadata (filename, page, year, language) is preserved per chunk.

Similarity Threshold — chunks scoring below this cosine similarity are filtered before reranking.
Strict RAG Mode — when ON, the chatbot only answers from the index and never falls back to general knowledge.
Save / Load Index — serialises the FAISS index + metadata to /tmp so it survives a restart.
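The chunking step described above can be sketched as follows. This is a minimal illustration, not the application's code: it assumes fixed-size character windows, whereas the "≈320" in the description suggests the real splitter may adjust boundaries. The function name `chunk_text` and its parameters are hypothetical.

```python
def chunk_text(text: str, size: int = 320, overlap: int = 90) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Assumption: plain character windows; the real splitter may snap
    to sentence or word boundaries.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap  # with the defaults, a new window starts every 230 characters
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk.strip():  # skip windows that contain only whitespace
            chunks.append(chunk)
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    lengths = [len(c) for c in chunk_text("x" * 700)]
    print(lengths)  # [320, 320, 240]
```

With these defaults, the last 90 characters of each chunk reappear as the first 90 characters of the next one, so a sentence cut at a window boundary is still seen whole in at least one chunk.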

2 · Sentiment / Search

Enter a word or phrase. The system:

Searches the index for exact keyword hits (regex, case-insensitive).
Runs the full hybrid retrieval pipeline to find the most semantically relevant chunks.
Computes an ensemble sentiment score using 4 signals:

| Signal | Weight (general) | Weight (economic) |
|---|---|---|
| FinBERT | 40% | 15% |
| XLM-RoBERTa | 20% | not used |
| Fine-Tuned Econ Model | 25% | 60% |
| Lexicon | 15% | 25% |
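As a rough sketch, the weighting in the table amounts to a weighted sum of the four signals. The code below assumes each signal has already been mapped to a score in [-1, 1] and that a missing signal counts as 0. Both are assumptions, as are the names `WEIGHTS` and `ensemble_score`.

```python
# Weights copied from the table above; the dictionary keys are hypothetical names.
WEIGHTS = {
    "general": {
        "finbert": 0.40,
        "xlm_roberta": 0.20,
        "finetuned_econ": 0.25,
        "lexicon": 0.15,
    },
    "economic": {
        "finbert": 0.15,
        "xlm_roberta": 0.00,  # not used for economic text
        "finetuned_econ": 0.60,
        "lexicon": 0.25,
    },
}


def ensemble_score(signals: dict[str, float], mode: str = "general") -> float:
    """Weighted sum of per-model sentiment scores (each assumed in [-1, 1])."""
    weights = WEIGHTS[mode]
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())


if __name__ == "__main__":
    signals = {"finbert": 0.5, "xlm_roberta": -0.2, "finetuned_econ": 0.8, "lexicon": 0.0}
    print(ensemble_score(signals, "general"))
    print(ensemble_score(signals, "economic"))
```

Because each weight column sums to 100%, the ensemble score stays in the same range as the individual signals.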

3 · Smart Chatbot

A grounded conversational agent. For each user message:

Intent detection — "explain the PDF" routes to document summarisation mode.
RAG retrieve — retrieves top-4 chunks using the pipeline described below.
Evidence check — decides whether retrieved evidence is strong, partial, or absent.
LLM (Llama-3.3-70B via Groq) — generates the answer with a strict grounding system prompt.
Cites source filenames and page numbers inline.

4 · Voice Interface

Record your question via microphone:

ASR — OpenAI Whisper-Small transcribes the audio (mono float32, any sample rate).
The transcript is passed to the Smart Chatbot pipeline.
The text answer is converted to speech with gTTS (Google Text-to-Speech), supporting English and Arabic.
Common issues (too-short audio, ASR load failure) are caught and shown as readable error messages.

5 · Analytics

Session and corpus statistics:

Word Cloud — frequency-weighted scatter of the most common tokens.
Language Distribution — pie chart of Arabic vs English chunks.
Top Keywords — horizontal bar chart of the 15 most frequent English words (stop-words removed).
Live session counters (questions asked, RAG hits, fallbacks, build time).

6 · Forecast

Fetches macroeconomic time series from the World Bank API (GDP growth, CPI inflation, unemployment, exchange rate) and fits two competing models:

| Model | Description |
|---|---|
| ARIMA(1,1,1) | Baseline autoregressive integrated moving-average |
| SARIMAX+Ensemble | ARIMA augmented with the document-derived ensemble sentiment index as an exogenous variable |

Statistical validation:

ADF test: checks each series for stationarity before Granger testing.
Granger causality: tests whether sentiment Granger-causes the economic target.
Diebold-Mariano test: formally tests whether the forecast errors of SARIMAX are significantly smaller than those of ARIMA.
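For illustration, a simplified Diebold-Mariano statistic can be computed from the two models' forecast errors using only the standard library. This sketch assumes squared-error loss, one-step-ahead forecasts (so no autocorrelation correction), and a normal approximation for the p-value. The application may use a different variant, and the function name `diebold_mariano` is hypothetical.

```python
import math
from statistics import mean, variance


def diebold_mariano(errors_a: list[float], errors_b: list[float]) -> tuple[float, float]:
    """Return (DM statistic, two-sided p-value) under squared-error loss.

    A positive statistic means model A has the larger squared errors,
    i.e. model B forecasts better on this sample.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("error series must have the same length")
    # Loss differential per forecast period
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    if n < 2:
        raise ValueError("need at least two forecast errors per model")
    var_d = variance(d)  # sample variance of the loss differential
    if var_d == 0:
        raise ValueError("loss differential has zero variance")
    dm = mean(d) / math.sqrt(var_d / n)
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|dm|))
    p_value = math.erfc(abs(dm) / math.sqrt(2))
    return dm, p_value


if __name__ == "__main__":
    arima_errors = [1.0, -1.2, 0.9, -1.1, 1.3, -0.8]    # made-up illustration values
    sarimax_errors = [0.2, -0.1, 0.3, -0.2, 0.1, -0.3]  # made-up illustration values
    print(diebold_mariano(arima_errors, sarimax_errors))
```

With short annual macroeconomic series, the normal approximation is coarse, so a small-sample correction may be worth adding before reporting the result.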

7 · Auto RAG Eval

Automatically constructs a held-out evaluation set from the indexed chunks and measures:

| Metric | Definition |
|---|---|
| Context Recall | Token-F1 between ground-truth and the best retrieved chunk |
| Faithfulness | Token-F1 between the generated answer and the top retrieved chunk |
| Answer Relevancy | Token-F1 between the generated answer and the ground truth |
| Context Precision | Mean token-F1 across all retrieved chunks vs. ground truth |
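All four metrics are built on token-level F1. A minimal sketch of that measure is shown below. It assumes lowercase word tokens extracted with a `\w+` regular expression, which may differ from the application's tokeniser, and the name `token_f1` is hypothetical.

```python
import re
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two strings."""
    pred_tokens = re.findall(r"\w+", prediction.lower())
    ref_tokens = re.findall(r"\w+", reference.lower())
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection: a repeated token only counts as often as it appears in both
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("GDP grew by 3 percent", "GDP grew 3 percent in 2023"))
```

In the example, four tokens overlap, giving precision 4/5 and recall 4/6, so the F1 is 8/11. Since `\w` matches Unicode word characters in Python 3, the same function handles Arabic text.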

8 · Retrieval Ranking Eval (new)

Upload a JSON or CSV gold-labelled evaluation set and measure how well the production retrieval pipeline ranks relevant documents:

| Metric | Definition |
|---|---|
| Recall@K | Was a relevant chunk found in the top-K results? (K = 1, 3, 5) |
| Precision@3 | Fraction of top-3 results that are relevant |
| MRR | Mean Reciprocal Rank: average of 1/rank of first relevant hit |
| Avg Latency | Wall-clock time per query in milliseconds |
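The ranking metrics can be sketched from a list of per-result relevance flags, one list per query and ordered by rank. The helper names below are hypothetical, and dividing Precision@K by K even when fewer than K results come back is one common convention, assumed here.

```python
def recall_at_k(flags: list[bool], k: int) -> float:
    """1.0 if any of the top-k results is relevant, else 0.0."""
    return 1.0 if any(flags[:k]) else 0.0


def precision_at_k(flags: list[bool], k: int) -> float:
    """Fraction of the top-k slots filled by relevant results."""
    return sum(flags[:k]) / k


def reciprocal_rank(flags: list[bool]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, hit in enumerate(flags, start=1):
        if hit:
            return 1.0 / rank
    return 0.0


def summarise(per_query_flags: list[list[bool]]) -> dict[str, float]:
    """Average each metric over all queries."""
    n = len(per_query_flags)
    summary = {
        f"recall@{k}": sum(recall_at_k(f, k) for f in per_query_flags) / n
        for k in (1, 3, 5)
    }
    summary["precision@3"] = sum(precision_at_k(f, 3) for f in per_query_flags) / n
    summary["mrr"] = sum(reciprocal_rank(f) for f in per_query_flags) / n
    return summary


if __name__ == "__main__":
    flags = [
        [False, True, False, False, False],  # first relevant hit at rank 2
        [True, True, False],                 # first relevant hit at rank 1
        [False] * 5,                         # nothing relevant retrieved
    ]
    print(summarise(flags))
```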

Relevance matching (in priority order):

file == gold_doc_id — exact filename match
chunk_id in gold_chunk_ids — exact chunk-level match
page == gold_page — page-level match (fallback)

Gold dataset format (JSON):

[
  {
    "query": "What is the GDP growth rate?",
    "gold_doc_id": "report_2023.pdf",
    "gold_chunk_ids": [12, 13],
    "gold_page": 3
  }
]

All fields except query are optional — missing fields are safely skipped.
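One way to read such a file so that missing optional fields are tolerated is sketched below. The function name `load_gold_json` and the choice to drop records without a query are assumptions, not the application's documented behaviour.

```python
import json


def load_gold_json(raw: str) -> list[dict]:
    """Parse the gold dataset and normalise optional fields."""
    records = []
    for item in json.loads(raw):
        query = str(item.get("query", "")).strip()
        if not query:
            continue  # query is the only required field
        records.append({
            "query": query,
            "gold_doc_id": item.get("gold_doc_id") or None,
            "gold_chunk_ids": [int(c) for c in item.get("gold_chunk_ids") or []],
            "gold_page": item.get("gold_page"),  # None when absent
        })
    return records


if __name__ == "__main__":
    raw = """[
      {"query": "What is the GDP growth rate?", "gold_doc_id": "report_2023.pdf",
       "gold_chunk_ids": [12, 13], "gold_page": 3},
      {"query": "What is the inflation outlook?"}
    ]"""
    for record in load_gold_json(raw):
        print(record)
```

A record with only a query still loads; its matching criteria are simply empty, so a relevance check has nothing to compare against for that query.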

🛠️ Core Retrieval Pipeline

Every query goes through this exact pipeline (shared by chatbot, voice, and both evaluation tabs):

Query
  │
  ├─ 1. Dense retrieval  (FAISS inner product, top-48 candidates)
  │       └── paraphrase-multilingual-MiniLM-L12-v2
  │
  ├─ 2. Filter  (cosine similarity < threshold → drop)
  │
  ├─ 3. Hybrid scoring
  │       ├── Semantic score  × 0.45
  │       ├── BM25 score      × up to 0.25
  │       ├── Keyword overlap × 0.20
  │       └── Exact match bonus + 0.10
  │
  ├─ 4. Cross-encoder reranker  (ms-marco-MiniLM-L-6-v2)
  │       └── final = hybrid × 0.45 + CE_norm × 0.55
  │
  ├─ 5. Deduplicate  (max 2 chunks per file)
  │
  └─ 6. Return top-N candidates with metadata
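Steps 3 to 5 of the diagram can be sketched as plain functions. The weights come from the diagram. Everything else is an assumption: that all component scores are normalised to [0, 1], that "up to 0.25" means a normalised BM25 score multiplied by 0.25, and the function and field names themselves.

```python
from collections import defaultdict


def hybrid_score(semantic: float, bm25_norm: float,
                 keyword_overlap: float, exact_match: bool) -> float:
    """Step 3: weighted blend of the retrieval signals (inputs assumed in [0, 1])."""
    return (0.45 * semantic
            + 0.25 * bm25_norm
            + 0.20 * keyword_overlap
            + (0.10 if exact_match else 0.0))


def final_score(hybrid: float, ce_norm: float) -> float:
    """Step 4: fuse the hybrid score with the normalised cross-encoder score."""
    return 0.45 * hybrid + 0.55 * ce_norm


def dedupe_per_file(ranked: list[dict], max_per_file: int = 2, top_n: int = 4) -> list[dict]:
    """Step 5: walk the ranked list, keeping at most max_per_file chunks per file."""
    kept = []
    seen = defaultdict(int)
    for cand in ranked:
        if seen[cand["file"]] < max_per_file:
            kept.append(cand)
            seen[cand["file"]] += 1
        if len(kept) == top_n:
            break
    return kept


if __name__ == "__main__":
    h = hybrid_score(semantic=0.8, bm25_norm=0.5, keyword_overlap=0.5, exact_match=False)
    print(h, final_score(h, ce_norm=0.9))
    ranked = [{"file": f, "chunk_id": i} for i, f in enumerate("aaabbc")]
    print(dedupe_per_file(ranked))
```

With these weights, the cross-encoder contributes slightly more than half of the final score, and the per-file cap keeps a single long document from filling every slot in the returned context.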

🧰 Key Libraries

| Library | Role |
|---|---|
| `sentence-transformers` | Dense embedding + cross-encoder reranking |
| `faiss-cpu` | Exact (flat) inner-product vector search |
| `transformers` | FinBERT, XLM-RoBERTa, Whisper ASR |
| `groq` | Llama-3.3-70B LLM inference |
| `gTTS` | Text-to-speech for voice output |
| `langdetect` | Language detection per chunk |
| `statsmodels` | ARIMA, SARIMAX, Granger, ADF |
| `gradio` | Web UI |
| `pypdf` / `PyPDF2` | PDF text extraction |
| `python-docx` | DOCX text extraction |

Developed as part of a Master's thesis in Economics & Data Science at ENSSEA, Algeria.
