llama.jar 🦙
llama.jar is a lightweight, high-performance Minecraft mod that brings the power of local Large Language Models (LLMs) directly into Minecraft using llama.cpp.
It runs completely offline on your own hardware, with no API keys, no subscriptions, and no external network requests.
[!WARNING] This is an alpha release and has not been fully tested in all environments. Use with caution and please report any bugs or issues you encounter.
Key Features
- Offline Local LLM Inference: Run your favorite open-source models (such as LLaMA-3, Qwen-2.5, Phi-3, Mistral, and more) directly inside the Minecraft process.
- Hardware Acceleration & GPU Support: Supports multi-core CPU execution alongside GPU offloading (CUDA, Metal, OpenCL/CLBlast) for hardware-accelerated generation. Supports multi-GPU configurations.
- Built-in Game Integration: Prompt your loaded models in real-time using in-game commands or listen to interactions.
- Modding Library Base: Fully architected to serve as a library mod. You can import llama.jar in your custom mods to easily add AI-driven NPCs, dynamic quest generation, or intelligent automated assistants.
- Wide Version Compatibility: Supports both Forge and Fabric platforms across multiple Minecraft versions: 1.20.1, 1.21.1, and 1.21.11.
How to Set Up
- Download the Mod: Download the correct jar matching your mod loader (Forge or Fabric) and Minecraft version.
- Add GGUF Models: Download any GGUF format model (e.g. llama-3-8b-instruct.Q4_K_M.gguf or smaller models like qwen-2.5-1.5b-instruct) and place the .gguf file inside your .minecraft/models/ folder.
- Launch Minecraft: The mod will automatically set up the workspace on startup.
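If a model does not show up in game, it can help to check what is actually sitting in the models folder. The following is a minimal standalone sketch, not code from llama.jar: the class name GgufModelLister is hypothetical, and it assumes that models are matched purely by a .gguf file extension directly inside the folder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of llama.jar: lists GGUF files in a models folder.
final class GgufModelLister {
    private GgufModelLister() {}

    /** Returns the sorted names of regular files ending in ".gguf" (case-insensitive) directly inside modelsDir. */
    static List<String> listModels(Path modelsDir) {
        if (!Files.isDirectory(modelsDir)) {
            return List.of(); // folder does not exist yet
        }
        try (Stream<Path> entries = Files.list(modelsDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .map(path -> path.getFileName().toString())
                    .filter(name -> name.toLowerCase(Locale.ROOT).endsWith(".gguf"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling GgufModelLister.listModels(Path.of(".minecraft", "models")) from the directory that contains your .minecraft folder should return the file names you placed there. An empty list suggests the file is in the wrong folder or has a different extension.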
In-Game Commands
Manage models and run prompts on the fly with the commands below, subject to command permissions:
- /model list: Lists all available GGUF files in your models/ directory.
- /model load <filename>: Loads the selected model into memory.
- /model status: Displays information about the currently active model.
- /llama <prompt>: Submits a prompt to the loaded model and streams the output directly to chat.
- /llama stop: Immediately halts the current text generation.
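To illustrate how streaming to chat and stopping mid-generation can fit together, here is a minimal sketch. It is not llama.jar's implementation: the class ChatStreamSketch, its Consumer-based chat sink, and the line-length limit are all assumptions made for the example. Tokens are buffered into chat lines, and a flag that a stop command could set is checked between tokens.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical sketch, not llama.jar's actual code: turns a token stream into chat lines.
final class ChatStreamSketch {
    private final Consumer<String> chat;   // stand-in for "send one line to in-game chat"
    private final int maxLineLength;       // assumed limit; flush once a line grows this long
    private final StringBuilder pending = new StringBuilder();
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    ChatStreamSketch(Consumer<String> chat, int maxLineLength) {
        this.chat = chat;
        this.maxLineLength = maxLineLength;
    }

    /** What a stop command handler might call; safe to call from another thread. */
    void requestStop() {
        stopRequested.set(true);
    }

    /** Consumes tokens until the source ends or a stop is requested, then flushes what is left. */
    void stream(Iterable<String> tokens) {
        for (String token : tokens) {
            if (stopRequested.get()) {
                break;
            }
            accept(token);
        }
        flush();
    }

    private void accept(String token) {
        for (char c : token.toCharArray()) {
            if (c == '\n') {
                flush(); // a newline from the model ends the current chat line
                continue;
            }
            pending.append(c);
            if (pending.length() >= maxLineLength) {
                flush();
            }
        }
    }

    private void flush() {
        if (pending.length() > 0) {
            chat.accept(pending.toString());
            pending.setLength(0);
        }
    }
}
```

Checking the flag between tokens means a stop takes effect at the next token boundary rather than interrupting the native call itself, which is one simple way to get prompt cancellation.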
Configuration
A common configuration file (llamajar-common.toml or llamajar.json) is generated in your config directory. It exposes the following options:
- modelName: Name of the GGUF model to load automatically on startup (leave empty to load manually).
- systemPrompt: Set a default system prompt to customize model behavior, personality, or guidelines (leave empty for no system prompt).
- gpuLayers: The number of model layers to offload to the GPU (higher offloads more computation to GPU VRAM).
- threads: Number of CPU threads to allocate to model inference (defaults to matching CPU physical cores).
- contextLength: The maximum context window size (tokens) for conversations.
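The semantics of these options can be summarized in a small sketch. The class LlamaJarConfigSketch is hypothetical and only mirrors the option names documented above. The mod's real config class, its defaults, and its validation rules may differ. Note that the JVM reports logical processors only, so the thread fallback here is an approximation of a physical-core default.

```java
// Hypothetical mirror of the documented options; llama.jar's real config handling may differ.
final class LlamaJarConfigSketch {
    final String modelName;     // empty means: do not auto-load, use the load command instead
    final String systemPrompt;  // empty means: no system prompt
    final int gpuLayers;        // 0 keeps all layers on the CPU
    final int threads;          // CPU threads used for inference
    final int contextLength;    // context window in tokens

    LlamaJarConfigSketch(String modelName, String systemPrompt,
                         int gpuLayers, int threads, int contextLength) {
        if (contextLength <= 0) {
            throw new IllegalArgumentException("contextLength must be positive");
        }
        this.modelName = modelName == null ? "" : modelName.trim();
        this.systemPrompt = systemPrompt == null ? "" : systemPrompt.trim();
        this.gpuLayers = Math.max(0, gpuLayers); // negative values are treated as CPU only (assumption)
        // Assumption: a non-positive value means "pick a default". availableProcessors()
        // counts logical processors, which can exceed the number of physical cores.
        this.threads = threads > 0 ? threads : Runtime.getRuntime().availableProcessors();
        this.contextLength = contextLength;
    }

    boolean autoLoadsModel() {
        return !modelName.isEmpty();
    }

    boolean hasSystemPrompt() {
        return !systemPrompt.isEmpty();
    }
}
```

For example, new LlamaJarConfigSketch("", "", 0, 0, 4096) describes a CPU-only setup with no auto-loaded model, no system prompt, a default thread count, and a 4096-token context.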
Developer Integration (Library Mod Usage)
To use llama.jar as a foundation for your own AI-enabled mod, add it to your development environment.
Gradle Setup
repositories {
    mavenLocal() // After publishing llama.jar locally
}

dependencies {
    // For Fabric development
    modImplementation "com.popr4x.llamajar:llamajar-fabric-1.20.1:alpha-1.0"
    // For Forge development
    implementation "com.popr4x.llamajar:llamajar-forge-1.20.1:alpha-1.0"
}
Accessing llama.cpp context in Java
import com.llamajar.LlamaJar;
import de.kherud.llama.LlamaModel;
import de.kherud.llama.LlamaIterator;

// Check if a model is currently loaded
if (LlamaJar.isModelLoaded()) {
    LlamaModel model = LlamaJar.getModel();
    // Perform custom inferences, register custom listeners, or manage model state
}
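Generation can take seconds, so a mod that builds on this check will probably want to keep inference off the game thread. The sketch below shows one possible shape for that. It is self-contained and does not call llama.jar: the class NpcReplySketch is hypothetical, the BooleanSupplier stands in for LlamaJar.isModelLoaded(), and the Function stands in for whatever completion call you make on the model returned by LlamaJar.getModel().

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BooleanSupplier;
import java.util.function.Function;

// Hypothetical sketch: runs a completion on a worker thread and falls back to a canned line.
final class NpcReplySketch {
    private final BooleanSupplier modelLoaded;        // stand-in for LlamaJar.isModelLoaded()
    private final Function<String, String> complete;  // stand-in for a completion call on the loaded model
    private final ExecutorService worker = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "npc-llm-worker");
        thread.setDaemon(true); // do not keep the JVM alive when the game shuts down
        return thread;
    });

    NpcReplySketch(BooleanSupplier modelLoaded, Function<String, String> complete) {
        this.modelLoaded = modelLoaded;
        this.complete = complete;
    }

    /** Returns immediately; the reply is computed on the worker thread, never on the caller's thread. */
    CompletableFuture<String> replyTo(String playerMessage, String fallback) {
        if (!modelLoaded.getAsBoolean()) {
            return CompletableFuture.completedFuture(fallback);
        }
        return CompletableFuture
                .supplyAsync(() -> complete.apply(playerMessage), worker)
                .exceptionally(error -> fallback); // inference failed: use the canned line
    }
}
```

A single worker thread serializes requests, which fits a setup where one loaded model is shared by every caller. The returned future can then be handed back to the game thread to apply the reply to the world.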
License
This project is licensed under the AGPLv3. Under the hood, it uses llama.cpp and the JNI bindings from de.kherud:llama licensed under Apache-2.0.


