Embedding a local LLM directly into an indie interrogation game

CheetahTownGames · May 17, 2026, 2:06pm

Hi everyone,

I’ve been experimenting with embedding a local LLM directly into an indie game called HexJudge.

The game is a medieval witch trial interrogation game where players freely type questions to NPC villagers instead of selecting dialogue options.

NPC dialogue runs fully offline without external APIs, and villagers dynamically reference relationships, suspicions, and previous conversations.

One unexpected challenge is that the AI model itself became larger than most of the actual game content — over 5GB of the build size is currently just model weights.

Recent development has focused on:

reducing model size
improving load times
optimizing inference on lower-end PCs
making conversations feel believable

The Steam demo recently entered open beta, and it’s been fascinating seeing players create completely different interrogation stories.

I’m curious if anyone else here has experimented with embedding local models directly into gameplay systems.

EclipsedStar · May 17, 2026, 11:04pm

Have you considered having your game load the model weights from a folder? Then someone with less VRAM could drop in a smaller model, while someone with more VRAM could pick a beefier one. Players also wouldn’t have to redownload the weights every time you update the game or rebuild the .exe (I’m assuming that by “build size” you mean everything is packed into a single compiled file/application). You’d probably want the game to ship with a small LLM already in that folder, in case people want to jump right in or can’t figure out how to download model weights.
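A rough sketch of that folder idea, in Python purely for illustration (the thread doesn't say what engine or language HexJudge uses). The `.gguf` extension, the folder layout, and the helper names `find_models` and `choose_model` are all assumptions, not anything from the actual game:

```python
from pathlib import Path
from typing import List, Optional

# Assumption: weights are stored as .gguf files; swap in whatever
# format the game's inference runtime actually expects.
MODEL_EXTENSIONS = {".gguf"}


def find_models(model_dir: Path) -> List[Path]:
    """Return model files found in model_dir, smallest first."""
    if not model_dir.is_dir():
        return []
    files = [
        p for p in model_dir.iterdir()
        if p.is_file() and p.suffix.lower() in MODEL_EXTENSIONS
    ]
    # Sort by size, then name, so the order is stable.
    return sorted(files, key=lambda p: (p.stat().st_size, p.name))


def choose_model(model_dir: Path, preferred: Optional[str] = None) -> Optional[Path]:
    """Pick the player's preferred file if present, else the smallest model.

    Returns None when the folder has no usable model, so the caller
    can show a "no model found" message instead of crashing.
    """
    models = find_models(model_dir)
    if not models:
        return None
    if preferred is not None:
        for path in models:
            if path.name == preferred:
                return path
    # Fall back to the smallest file, i.e. the small default model.
    return models[0]
```

With something like this, the game update only touches the executable and assets, and the weights folder is left alone between updates.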

CheetahTownGames · May 17, 2026, 11:30pm

Yeah, I’ve been thinking about separating the model from the main game build for exactly those reasons.

So far I’ve prioritized a “just launch and play” approach with everything bundled together, but it definitely creates problems for updates and lower-end PCs.

Supporting multiple model options depending on hardware would also be really interesting in the future.

Still experimenting with the balance between accessibility and flexibility.
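For the multiple-model idea, here is one way the selection could look. It is only a sketch: `ModelOption`, `pick_for_budget`, and the 1.2x headroom factor are made up for illustration, and how the game would measure the available VRAM or RAM budget is left out entirely.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str
    size_bytes: int


def pick_for_budget(
    options: Sequence[ModelOption],
    budget_bytes: int,
    headroom: float = 1.2,
) -> ModelOption:
    """Pick the largest model that fits the memory budget.

    headroom is an assumed safety factor for memory needed beyond the
    raw weights; the real overhead depends on the runtime and settings.
    If nothing fits, fall back to the smallest option.
    """
    if not options:
        raise ValueError("no model options available")
    fitting = [o for o in options if o.size_bytes * headroom <= budget_bytes]
    if fitting:
        return max(fitting, key=lambda o: o.size_bytes)
    return min(options, key=lambda o: o.size_bytes)
```

Falling back to the smallest model keeps the “just launch and play” behavior on low-end machines, while bigger models remain opt-in for players with more memory.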
