---
base_model: bert-base-uncased
library_name: peft
tags:
- peft
- lora
- bert
- masked-lm
- fmri-encoding
- neuroscience
- stat214
license: apache-2.0
---

# stat214-lab3-bert-lora-r4-maxlen256

LoRA adapter for `bert-base-uncased`, fine-tuned on transcripts from the
Huth Lab fMRI story-listening dataset for the **Stat 214 (Spring 2026)**
final project at UC Berkeley.

The adapter is used to extract context-aware word embeddings that are then
fed into a per-voxel ridge regression to predict whole-brain BOLD signal
from spoken-story stimuli.

## Configuration

| Hyperparameter | Value |
|---|---|
| Base model | `bert-base-uncased` |
| LoRA rank `r` | 4 |
| LoRA alpha | 8 |
| LoRA dropout | 0.1 |
| Target modules | `query`, `value` |
| Training objective | Masked language modeling (MLM), 15% masking probability |
| Training stories | 86 (Huth Lab podcast transcripts) |
| MLM max sequence length | 256 |
| Epochs | 3 |
| Optimizer | AdamW, lr=2e-4 |
| Batch size | 16 |
| Final MLM training loss | Not reported |
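
To make the rank and alpha settings concrete, the following minimal sketch works through the standard LoRA arithmetic for this configuration: the scaling factor `alpha / r` applied to the low-rank update, and the number of trainable adapter parameters when only `query` and `value` are targeted. It uses plain NumPy rather than the training code, and the hidden size (768) and layer count (12) are the published `bert-base-uncased` dimensions, not values read from this repository. In `peft`, the table would presumably correspond to `LoraConfig(r=4, lora_alpha=8, lora_dropout=0.1, target_modules=["query", "value"])`.

```python
import numpy as np

# Values from the configuration table.
R, ALPHA = 4, 8
HIDDEN = 768           # bert-base-uncased hidden size
NUM_LAYERS = 12        # bert-base-uncased encoder layers
TARGETS_PER_LAYER = 2  # the `query` and `value` projections


def lora_delta(A, B, alpha, r):
    """Low-rank update added to a frozen weight matrix: (alpha / r) * B @ A."""
    return (alpha / r) * (B @ A)


rng = np.random.default_rng(0)
A = rng.normal(size=(R, HIDDEN))  # down-projection, shape (r, in_features)
B = np.zeros((HIDDEN, R))         # up-projection, zero at init so the update starts at 0
delta = lora_delta(A, B, ALPHA, R)

scaling = ALPHA / R                        # 8 / 4 = 2.0
params_per_module = R * (HIDDEN + HIDDEN)  # entries of A plus entries of B
trainable_params = params_per_module * TARGETS_PER_LAYER * NUM_LAYERS

print(f"scaling={scaling}, delta shape={delta.shape}, adapter params={trainable_params}")
```

Under these assumptions the adapter adds 147,456 trainable parameters, a small fraction of the roughly 110M parameters in the base model.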

## Encoding-model performance

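Per-word embeddings are extracted from this adapter using ±10-word context
windows, Lanczos-downsampled to the fMRI sampling times, and expanded with
4 TR delays. A per-voxel ridge regression is then fit for Subjects 2 and 3.
CC is the correlation coefficient between predicted and measured voxel
responses: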

| Subject | Mean CC | Top 5% CC | Top 1% CC | Top-1 voxel CC |
|---|---|---|---|---|
| Subject 2 | 0.0643 | 0.2143 | 0.2906 | 0.4736 |
| Subject 3 | 0.0660 | 0.2176 | 0.3043 | 0.5159 |

See the full project repository for ridge weights, evaluation code, and
SHAP/LIME word-importance analyses.
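
The encoding steps described in this section (Lanczos downsampling, TR delays, ridge regression, per-voxel CC) can be sketched in NumPy as follows. This is an illustrative sketch on synthetic data, not the project's evaluation code. The function names, the 2-second TR, the delay set `(1, 2, 3, 4)`, the Lanczos window of 3, and the single shared ridge penalty are all assumptions made for the example; the project may choose these differently, for instance by selecting the penalty per voxel.

```python
import numpy as np


def lanczos_downsample(data, old_times, new_times, window=3):
    """Resample word-level features (n_words, d) onto TR times (n_trs, d)."""
    cutoff = 1.0 / np.mean(np.diff(new_times))  # cutoff frequency = TR sampling rate
    t = (new_times[:, None] - old_times[None, :]) * cutoff
    # Lanczos kernel: sinc(t) * sinc(t / window) inside the window, 0 outside.
    kernel = np.sinc(t) * np.sinc(t / window)
    kernel[np.abs(t) > window] = 0.0
    return kernel @ data


def add_delays(X, delays=(1, 2, 3, 4)):
    """Stack time-shifted copies of X: (n_trs, d) -> (n_trs, d * len(delays))."""
    n, d = X.shape
    out = np.zeros((n, d * len(delays)))
    for i, k in enumerate(delays):
        out[k:, i * d:(i + 1) * d] = X[:n - k]  # row t holds the features from t - k
    return out


def ridge_fit(X, Y, alpha):
    """Closed-form ridge weights for all voxels at once."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


def voxelwise_cc(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom


# Synthetic stand-ins for word embeddings and BOLD responses (hypothetical sizes).
rng = np.random.default_rng(0)
n_words, n_trs, dim, n_vox = 600, 200, 8, 5
word_times = np.sort(rng.uniform(0, 400, n_words))  # word onsets in seconds
tr_times = np.arange(n_trs) * 2.0                   # assumed TR of 2 s
emb = rng.normal(size=(n_words, dim))

X = add_delays(lanczos_downsample(emb, word_times, tr_times))
Y = X @ rng.normal(size=(X.shape[1], n_vox)) + rng.normal(scale=0.5, size=(n_trs, n_vox))

train, test = slice(0, 150), slice(150, 200)
W = ridge_fit(X[train], Y[train], alpha=1.0)
cc = voxelwise_cc(Y[test], X[test] @ W)
print("mean CC:", cc.mean(), "best voxel CC:", cc.max())
```

The synthetic responses are generated from the features themselves, so the correlations printed here are far higher than the real-data values in the table and say nothing about model quality.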

## Loading the adapter

```python
from transformers import BertForMaskedLM, BertTokenizerFast
from peft import PeftModel

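tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# Load the frozen base model, then attach the LoRA adapter weights to it.
base = BertForMaskedLM.from_pretrained("bert-base-uncased")
model = PeftModel.from_pretrained(base, "RheaTinghe/stat214-lab3-bert-lora-r4-maxlen256")
model.eval()  # disable dropout (including LoRA dropout) for inference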

# Extract per-word embeddings via ±10 word context windows
# (see scripts/run_bert_pretrained.py in the project repo for the
# complete extraction pipeline)
```
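
For readers without access to the project repository, here is a rough sketch of what ±10-word context-window extraction might look like. It is not the code in `scripts/run_bert_pretrained.py`. The helper names `context_window` and `embed_words` are hypothetical, and taking the last hidden layer and mean-pooling the subword tokens of the center word are assumptions; the project may use a different layer or pooling rule.

```python
def context_window(words, i, radius=10):
    """Return the words within `radius` of position i, and i's index inside that window."""
    start = max(0, i - radius)
    end = min(len(words), i + radius + 1)
    return words[start:end], i - start


def embed_words(model, tokenizer, words, radius=10):
    """Return one vector per word: the mean of its subword states in the last hidden layer."""
    import torch  # imported here so context_window has no dependencies

    vectors = []
    for i in range(len(words)):
        window, center = context_window(words, i, radius)
        enc = tokenizer(window, is_split_into_words=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc, output_hidden_states=True).hidden_states[-1][0]
        # word_ids() maps each token to its source word (None for [CLS] and [SEP]).
        rows = [t for t, w in enumerate(enc.word_ids(0)) if w == center]
        vectors.append(hidden[rows].mean(dim=0))
    return torch.stack(vectors)


# The windowing logic alone, on a toy word list:
toy = [f"w{k}" for k in range(30)]
window, center = context_window(toy, 15)
print(len(window), window[center])
```

With the `model` and `tokenizer` loaded as shown earlier, `embed_words(model, tokenizer, transcript_words)` would return a tensor with one 768-dimensional row per word, where `transcript_words` is a hypothetical list of transcript words. A word that produces no tokens (for example an empty string) would yield a NaN row in this sketch, so inputs should be cleaned first.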

## Citation

```bibtex
@misc{stat214lab3,
  author = {Galloro, Drew and Wang, Ruihang and Khothsombath, Benjamin and Zhang, Rhea},
  title  = {Stat 214 Lab 3: BERT-LoRA encoding model for fMRI},
  year   = {2026},
  note   = {UC Berkeley Spring 2026},
}
```