Hi everyone,
I’ve been experimenting with embedding a local LLM directly into an indie game called HexJudge.
It’s a medieval witch-trial interrogation game where players type free-form questions to NPC villagers instead of selecting dialogue options.
NPC dialogue runs fully offline without external APIs, and villagers dynamically reference relationships, suspicions, and previous conversations.
One unexpected challenge is that the AI model itself became larger than most of the actual game content — over 5GB of the build size is currently just model weights.
Recent development has focused on:
- reducing model size
- improving load times
- optimizing inference on lower-end PCs
- making conversations feel believable
The Steam demo recently entered open beta, and it’s been fascinating seeing players create completely different interrogation stories.
I’m curious if anyone else here has experimented with embedding local models directly into gameplay systems.
Have you considered having your game load the model weights from a folder? Then someone with less VRAM could drop in a smaller model, while someone with more VRAM could choose a beefier one. Players also wouldn’t have to redownload the weights every time you update the game or rebuild the .exe (I’m assuming that by “build size” you mean everything is packed into a single compiled file/application). You’d probably want the game to ship with a small LLM already in the folder, in case people want to jump right in or can’t figure out how to download model weights.
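Here is a rough sketch of the folder idea, not a drop-in implementation. It assumes the weights are `.gguf` files in a `models` folder next to the game, that a small model named `default-small.gguf` ships with the build, and that file size times a fudge factor is a usable stand-in for memory needs. All of those names and numbers are made up for illustration, so swap in whatever your runtime actually uses.

```python
from pathlib import Path
from typing import Optional

# Everything below is hypothetical: folder name, file extension, default
# model name, and the headroom factor are illustrative assumptions.
MODEL_DIR = Path("models")
DEFAULT_MODEL = "default-small.gguf"
HEADROOM = 1.2  # assumed extra memory beyond the raw file size (context, buffers)


def pick_model(
    model_dir: Path,
    vram_budget_bytes: int,
    default_name: str = DEFAULT_MODEL,
) -> Optional[Path]:
    """Return the largest model file that fits the budget.

    Falls back to the bundled default if nothing fits, and returns None
    if the default is missing as well.
    """
    # Largest files first, so the first one that fits is the best candidate.
    candidates = sorted(
        (p for p in model_dir.glob("*.gguf") if p.is_file()),
        key=lambda p: p.stat().st_size,
        reverse=True,
    )
    for path in candidates:
        if path.stat().st_size * HEADROOM <= vram_budget_bytes:
            return path

    fallback = model_dir / default_name
    return fallback if fallback.is_file() else None
```

The game would call something like `pick_model(MODEL_DIR, budget)` at startup and hand the resulting path to its inference runtime. Detecting the VRAM budget is left out because it depends on the platform and graphics API. A settings option to override the automatic pick would probably be worth having too.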
Yeah, I’ve been thinking about separating the model from the main game build for exactly those reasons.
So far I’ve prioritized a “just launch and play” approach with everything bundled together, but it definitely creates problems for updates and lower-end PCs.
Supporting multiple model options depending on hardware would also be really interesting in the future.
Still experimenting with the balance between accessibility and flexibility.