---
base_model: bert-base-uncased
library_name: peft
tags:
- peft
- lora
- bert
- masked-lm
- fmri-encoding
- neuroscience
- stat214
license: apache-2.0
---

# stat214-lab3-bert-lora-r4-maxlen256

LoRA adapter for `bert-base-uncased`, fine-tuned on transcripts from the Huth Lab fMRI story-listening dataset for the **Stat 214 (Spring 2026)** final project at UC Berkeley. The adapter is used to extract context-aware word embeddings, which are then fed into a per-voxel ridge regression to predict whole-brain BOLD signal from spoken-story stimuli.

## Configuration

| Hyperparameter | Value |
|---|---|
| Base model | `bert-base-uncased` |
| LoRA rank `r` | 4 |
| LoRA alpha | 8 |
| LoRA dropout | 0.1 |
| Target modules | `query`, `value` |
| Training objective | Masked Language Modeling (MLM, 15%) |
| Training stories | 86 (Huth Lab podcast transcripts) |
| MLM max sequence length | 256 |
| Epochs | 3 |
| Optimizer | AdamW, lr=2e-4 |
| Batch size | 16 |
| Final MLM training loss | not reported |

## Encoding-model performance

Per-word embeddings were extracted from this adapter (±10-word context windows, Lanczos downsampling, 4 TR delays) and used to fit a per-voxel ridge regression for Subjects 2 and 3. CC denotes the correlation coefficient between predicted and measured voxel responses.

| Subject | Mean CC | Top 5% CC | Top 1% CC | Top-1 voxel |
|---|---|---|---|---|
| Subject 2 | 0.0643 | 0.2143 | 0.2906 | 0.4736 |
| Subject 3 | 0.0660 | 0.2176 | 0.3043 | 0.5159 |

See the full project repository for ridge weights, evaluation code, and SHAP / LIME word-importance analyses.
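
The encoding step described above (delayed features, per-voxel ridge regression, correlation scoring) can be sketched roughly as follows. This is a minimal NumPy illustration on synthetic data, not the project's actual pipeline: the function names (`make_delayed`, `fit_ridge`, `voxelwise_cc`), the delay set `(1, 2, 3, 4)`, the single shared `alpha`, and the train/test split are all assumptions made for the example.

```python
import numpy as np


def make_delayed(X, delays=(1, 2, 3, 4)):
    """Stack time-shifted copies of X (n_TRs, n_features) along the feature axis.

    The delay values are an assumption; the card only states "4 TR delays".
    """
    n, d = X.shape
    out = np.zeros((n, d * len(delays)), dtype=X.dtype)
    for i, k in enumerate(delays):
        # Row t of block i holds the features from k TRs earlier (zeros at the start).
        out[k:, i * d:(i + 1) * d] = X[:n - k]
    return out


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge, W = (X^T X + alpha I)^{-1} X^T Y, one column per voxel."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


def voxelwise_cc(Y_true, Y_pred):
    """Pearson correlation between true and predicted responses, per voxel (column)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    # Guard against constant columns to avoid division by zero.
    return (yt * yp).sum(axis=0) / np.where(denom == 0, 1.0, denom)


# Synthetic stand-ins for downsampled embeddings (X) and BOLD responses (Y).
rng = np.random.default_rng(0)
n_trs, n_feat, n_vox = 300, 8, 5
X = rng.standard_normal((n_trs, n_feat))
X_del = make_delayed(X)
W_true = rng.standard_normal((X_del.shape[1], n_vox))
Y = X_del @ W_true + 0.1 * rng.standard_normal((n_trs, n_vox))

# Hypothetical split: fit on the first 200 TRs, score on the remaining 100.
train, test = slice(0, 200), slice(200, 300)
W = fit_ridge(X_del[train], Y[train], alpha=1.0)
cc = voxelwise_cc(Y[test], X_del[test] @ W)
print(cc.round(3))
```

Summary statistics like those in the table would be aggregates of a per-voxel `cc` vector of this kind; in practice `alpha` is usually chosen by cross-validation rather than fixed.
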
## Loading the adapter

```python
from transformers import BertForMaskedLM, BertTokenizerFast
from peft import PeftModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
base = BertForMaskedLM.from_pretrained("bert-base-uncased")
model = PeftModel.from_pretrained(base, "RheaTinghe/stat214-lab3-bert-lora-r4-maxlen256")
model.eval()

# Extract per-word embeddings via ±10 word context windows
# (see scripts/run_bert_pretrained.py in the project repo for the
# complete extraction pipeline)
```

## Citation

```bibtex
@misc{stat214lab3,
  author = {Galloro, Drew and Wang, Ruihang and Khothsombath, Benjamin and Zhang, Rhea},
  title  = {Stat 214 Lab 3: BERT-LoRA encoding model for fMRI},
  year   = {2026},
  note   = {UC Berkeley Spring 2026},
}
```