For anyone else looking, this can be done, and it’s answered in this question: