Description
Somnium is a magic-focused Minecraft mod in which game mechanics and code logic are generated dynamically at runtime. This is done using a chain-of-agents framework (a series of sequential LLMs) whose inputs and outputs are constrained by a preexisting application interface.
Disclaimer: Just because an item has a certain description does not mean it actually does what the description says. Only singleplayer is officially supported for now.
Issues can be posted either on GitHub or on Discord.
Gameplay (as of current version)
A Goat Horn can be used to initiate a ritual, provided it is used in the right place. Burning items may also do something. What else happens is unknown.
Hardware Requirements
To keep the LLMs from taking an excruciatingly long time to process, a GPU with at least 8 GB of VRAM is recommended. Alternatively, at least 16 GB of system RAM is recommended.
The models require a total of 16.11 GB of storage.
Installation
The mod requires an Ollama service to be running on the host machine (when using the default configuration). To download Ollama, check the official website. After installing Ollama, the following four models need to be installed:
- qwen2.5vl:7b
- ALIENTELLIGENCE/gamemasterroleplaying:latest
- qwen3:8b
- snowflake-arctic-embed:110m
A model can be installed by running `ollama pull <model name>`. To check that GPU drivers are set up correctly and that Ollama can make use of them, run any model using `ollama run <model name>` and, in a separate CLI instance, run `ollama ps`. If the GPU can be used, it should show that the model is running either fully or partially on the GPU. For more details, check the Ollama FAQ.
If everything was set up correctly, the message `Ollama server is online!` should appear in chat when joining a world.
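For reference, a full setup (assuming a default Ollama installation) looks roughly like this; the commands below simply restate the ones described above:

```
# Download the four required models (about 16 GB in total)
ollama pull qwen2.5vl:7b
ollama pull ALIENTELLIGENCE/gamemasterroleplaying:latest
ollama pull qwen3:8b
ollama pull snowflake-arctic-embed:110m

# Sanity check: run a model, then inspect where it loaded from a second terminal
ollama run qwen3:8b
ollama ps
```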
Technical Details
If you do not wish to know the exact mechanisms of the mod before trying it, consider this a spoiler warning.
The mod's event handler listens for certain player activities that act as triggers for a ritual. As of now, these activities include:
- Using a goat horn
When any of the above actions is performed, a chain of agents composed of four LLMs is used to generate the required outcome. The four LLMs are classified as:
- Descriptor Model
- Filter Model
- Context Generation Model
- Coder Model
The process takes the following steps:
- Render data plus a priming prompt is processed by the descriptor model, generating a natural-language description of the image.
- The generated description plus a priming prompt is processed by the filter model, which is tasked with determining whether the environment is relevant. In this case, a relevant environment is one where the player appears to be inside a structure resembling a temple of sorts.
- If the description is deemed relevant, a log of player activities (what items the player has burned recently) is attached and prompted to the context generation model. This model acts as the game master, being in charge of determining the narrative of the game. It generates responses containing descriptions of what should happen in the game.
- The coder model is tasked with implementing the generated descriptions as code, using a predefined API.
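To make the flow concrete, here is a minimal sketch of what such a chain could look like against Ollama's standard REST endpoint. The class name, prompts, and the model-to-stage mapping are assumptions for illustration; only the model names and the four-stage order come from the description above, and this is not the mod's actual code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RitualChainSketch {
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    // Naive JSON construction for brevity; a real implementation would use a JSON
    // library and parse the "response" field out of Ollama's reply.
    static String generate(String model, String prompt) throws Exception {
        String body = "{\"model\":\"" + model + "\",\"prompt\":\""
                + prompt.replace("\"", "\\\"") + "\",\"stream\":false}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        return HTTP.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // 1. Descriptor model: turn render data into a natural-language description
        //    (qwen2.5vl also accepts base64 screenshots via an "images" field, omitted here).
        String description = generate("qwen2.5vl:7b",
                "Describe the structure the player is standing in.");
        // 2. Filter model: decide whether the environment resembles a temple.
        String verdict = generate("qwen3:8b",
                "Does this describe a temple-like structure? Answer yes or no. " + description);
        if (!verdict.toLowerCase().contains("yes")) return;
        // 3. Context generation model: the game master decides what should happen.
        String narrative = generate("ALIENTELLIGENCE/gamemasterroleplaying:latest",
                "The player performed a ritual. Recently burned items: <log>. " + description);
        // 4. Coder model: implement the narrative as Python against the mod's API.
        String pythonCode = generate("qwen3:8b",
                "Implement the following using the documented API: " + narrative);
        System.out.println(pythonCode);
    }
}
```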
Python Code, Interpretation & Linking
After the Ollama-hosted code generator is done, the generated code needs to be run on the server. This essentially amounts to arbitrary code execution from a remote location. Even if there is no bad actor involved, the received code is generated by an LLM, so it should generally not be trusted. To ameliorate this issue, an as-simple-as-possible code interpreter was chosen. This improves the situation because the interpreter can be restricted to a fixed set of functions that generated code is allowed to call from the original code base. The only question left is which language should be interpreted.
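As a rough illustration of that restriction, the bridge between interpreted code and Java can be a plain registry rather than reflection, so only explicitly exposed functions are reachable. All names below are hypothetical; this is a sketch of the idea, not the mod's implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class LinkedFunctionRegistry {
    private final Map<String, Function<Object[], Object>> linked = new HashMap<>();

    public void register(String name, Function<Object[], Object> impl) {
        linked.put(name, impl);
    }

    public Object call(String name, Object[] args) {
        Function<Object[], Object> impl = linked.get(name);
        if (impl == null) {
            // Unknown names are rejected instead of being resolved reflectively,
            // so generated code can never reach arbitrary Java methods.
            throw new IllegalArgumentException("Function not linked: " + name);
        }
        return impl.apply(args);
    }
}
```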
Even though Minecraft is written in Java, Python was chosen as the language the LLMs generate in order to implement features in the game. This was done for the following reasons:
- Its light syntax makes the interpreter easier to implement.
- Python code is more common on the internet than most other programming languages, giving it a large footprint in the datasets the LLMs were trained on.
- It is dynamically rather than statically typed, so there is a lower chance of the LLM making a type error.
The Python interpreter that is part of the mod only implements the following features:
- operators
- variables
- functions
- loops
- if statements
- ternary operators
- lambda expressions
- lists
- tuples
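As a rough example, a generated snippet confined to this subset might look like the following. The `PythonInterpreter` entry point is a hypothetical name used only to frame the snippet:

```java
// Hypothetical entry point; the embedded snippet is the point here. It sticks to the
// supported subset: variables, functions, a loop, a ternary, a lambda, and a list.
String script = """
        double_it = lambda x: x * 2

        def ritual_power(levels, strength):
            total = 0
            for level in levels:
                total = total + (double_it(strength) if level > 1 else strength)
            return total

        power = ritual_power([1, 2, 3], 5)
        print(power)
        """;
new PythonInterpreter().run(script);
```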
In order for interpreted Python code to be able to call Java methods, a function-linking feature was added. This allows Python function calls and their parameters to be routed to predefined Java methods, with the return value of the Java method routed back as the return value of the Python call. A set of functions that are part of the interpreter are linked by default (such as the implementation of the print function). Additional functions that are part of the mod's functionality (creating effects, items, healing entities, etc.) are linked by annotating static methods with @PythonMethodLink. This annotation also takes a docs parameter, which should hold the documentation of the function. This documentation is important, as it is what is prompted to the coder model.
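For example, a new linked function might be exposed like this; @PythonMethodLink and its docs parameter are the mod's real mechanism as described above, but the specific method and signature below are made up for illustration:

```java
public final class SomniumPythonApi {

    // Hypothetical linked method: the docs string is what gets prompted to the coder model.
    @PythonMethodLink(docs = "heal_entity(entity_id, amount): heals the entity with the given id by `amount` half-hearts")
    public static void heal_entity(int entityId, float amount) {
        // Look up the entity on the server and apply the healing ...
    }
}
```

Generated Python could then call heal_entity(some_id, 4.0) directly, with the interpreter routing the call and its return value through the link.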



