Chunker – Major Performance & Memory Optimization Update & Support for Folia
This version of Chunker brings significant improvements in memory efficiency, GC behavior, and compatibility — especially under high parallel load configurations.
Improvements
- Coordinate Caching
  - Introduced a lightweight cache for coordinates to reduce frequent object creation and Java GC pressure.
  - This dramatically reduces heap usage during high-throughput pre-generation.
- Memory Leak Fixes
  - Resolved an issue where tracked chunks for players were not being cleared correctly, causing memory to grow over time even after chunks were unloaded. This only happened when players were on the server while pre-generation was running in the background.
- Smarter Memory Footprint
  - Optimized several internal variables and data structures to further reduce overall memory usage, especially when running across all 3 dimensions.
- Added Support for Folia
  - Chunker is now fully compatible with Folia servers, alongside all existing supported server types.
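These notes do not show how the coordinate cache is built, so the following is only a rough illustration of the general technique. It assumes a hypothetical `ChunkCoordCache` class (not part of Chunker's actual code) that packs chunk X/Z into a single `long` and reuses one immutable coordinate object per slot instead of allocating a new one on every lookup.

```java
/** Hypothetical illustration only; not Chunker's actual implementation. */
final class ChunkCoordCache {
    /** Immutable chunk coordinate; instances are shared through the cache. */
    record ChunkCoord(int x, int z) {}

    private final ChunkCoord[] slots;
    private final int mask;

    ChunkCoordCache(int capacityPowerOfTwo) {
        if (Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        this.slots = new ChunkCoord[capacityPowerOfTwo];
        this.mask = capacityPowerOfTwo - 1;
    }

    /** Packs two 32-bit chunk coordinates into one 64-bit primitive key. */
    static long pack(int x, int z) {
        return ((long) x << 32) | (z & 0xFFFFFFFFL);
    }

    static int unpackX(long key) { return (int) (key >> 32); }

    static int unpackZ(long key) { return (int) key; }

    /**
     * Returns a shared instance for (x, z). A new object is allocated only
     * when the slot is empty or currently holds a different coordinate.
     * Not thread-safe; a real plugin would need its own synchronization.
     */
    ChunkCoord get(int x, int z) {
        int idx = ((x * 73856093) ^ (z * 19349663)) & mask;
        ChunkCoord cached = slots[idx];
        if (cached != null && cached.x() == x && cached.z() == z) {
            return cached; // cache hit: no allocation
        }
        ChunkCoord created = new ChunkCoord(x, z);
        slots[idx] = created; // overwrite whatever was in the slot
        return created;
    }
}
```

Repeated lookups of the same coordinate, for example `cache.get(3, -7)` called twice in a row, return the same object, which is the effect that lowers GC pressure. The fixed-size array also keeps the cache's own memory bounded.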
Benchmark Results – Memory Usage Comparison
Two tests were performed to compare memory usage between the old and new versions of Chunker, using the same world seed and identical start points.
| Test | Old Version | New Version | Improvement |
|---|---|---|---|
| parallel_tasks_multiplier = 16 (on all dimensions) | 7.2 GB | 3.3 GB | −54.2% |
| parallel_tasks_multiplier = 200 (on End dimension only) | 26.7 GB (crashed) | 11.3 GB | −57.7% |
- Memory usage dropped by more than half in both tests (54.2% and 57.7%).
- The new version remained stable under extreme chunk rates (about 3200 chunks per second in the End) that caused the old version to crash.
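For anyone re-running the benchmark, the improvement figures are plain percentage reductions relative to the old version's memory usage. A minimal sketch of that arithmetic, using a hypothetical `MemoryReduction` helper with the numbers from the table:

```java
import java.util.Locale;

/** Hypothetical helper that reproduces the percentage arithmetic from the table. */
final class MemoryReduction {
    /** Percentage by which newGb is lower than oldGb. */
    static double percentReduction(double oldGb, double newGb) {
        return (oldGb - newGb) / oldGb * 100.0;
    }

    public static void main(String[] args) {
        // Values taken from the benchmark table.
        System.out.println(String.format(Locale.ROOT, "%.1f%%", percentReduction(7.2, 3.3)));
        System.out.println(String.format(Locale.ROOT, "%.1f%%", percentReduction(26.7, 11.3)));
    }
}
```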
Benchmark Test Setup (Reproducible)
JVM Launch Parameters (start.bat):
@echo off
java -Xms1G -Xmx30G -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=30 -XX:G1MaxNewSizePercent=40 -XX:G1HeapRegionSize=8M -XX:G1ReservePercent=20 -XX:G1HeapWastePercent=5 -XX:G1MixedGCCountTarget=4 -XX:InitiatingHeapOccupancyPercent=15 -XX:G1MixedGCLiveThresholdPercent=90 -XX:G1RSetUpdatingPauseTimePercent=5 -XX:SurvivorRatio=32 -XX:+PerfDisableSharedMem -XX:MaxTenuringThreshold=1 -XX:+OptimizeStringConcat -XX:+UseCompressedOops -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -XX:+ParallelRefProcEnabled -XX:+UseNUMA -XX:ParallelGCThreads=16 -XX:ConcGCThreads=16 -XX:MaxGCPauseMillis=50 -Dusing.aikars.flags=https://mcflags.emc.gs -Daikars.new.flags=true -jar server.jar --nogui
pause
- Xms1G and Xmx30G should be updated to match the minimum (Xms) and maximum (Xmx) memory you want for your own server.
- Update both XX:ParallelGCThreads and XX:ConcGCThreads to match your number of threads.
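If you are unsure how many threads the JVM sees on your machine, a small standalone check like the one below can help. It is only a convenience sketch. The class name `JvmSizingHint` is made up, and one GC thread per logical CPU thread simply mirrors the benchmark setup above rather than a general recommendation.

```java
/** Hypothetical helper; prints values you can copy into start.bat. */
final class JvmSizingHint {
    /** Number of logical CPU threads available to this JVM. */
    static int suggestedGcThreads() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        int threads = suggestedGcThreads();
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Logical CPU threads: " + threads);
        System.out.println("-XX:ParallelGCThreads=" + threads + " -XX:ConcGCThreads=" + threads);
        // maxMemory() roughly reflects the -Xmx value the JVM was started with.
        System.out.println("Current max heap: " + maxHeapMiB + " MiB");
    }
}
```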
Paper Configuration (paper-global.yml):
chunk-loading-advanced:
  auto-config-send-distance: true
  player-max-concurrent-chunk-generates: -1 # removes per-player generation limits
  player-max-concurrent-chunk-loads: -1 # removes per-player load limits
chunk-loading-basic:
  player-max-chunk-generate-rate: -1.0 # removes chunk generation rate cap
  player-max-chunk-load-rate: -1.0 # removes chunk load rate cap
  player-max-chunk-send-rate: -1.0 # removes chunk send rate cap
chunk-system:
  gen-parallelism: default # allows Paper to scale based on system threads
  io-threads: 16 # matches CPU thread count for optimal throughput
  worker-threads: 16 # matches CPU thread count for chunk processing
misc:
  region-file-cache-size: 16 # reduced cache size to lower memory usage while pre-generating with no players online; 16 is a balanced middle ground (lowest allowed is 4)
Testing Methodology:
- All world data was reset between tests, except level.dat, to keep the same world seed.
- Chunker plugin data was wiped to start fresh from (0, 0) in each test.
- The parallel_tasks_multiplier = 16 test ran for 1 hour and the parallel_tasks_multiplier = 200 test ran for 15 minutes, allowing memory to stabilize.
- Console print statements may still cause slight memory increases due to terminal buffer growth.
Tips for Best Performance
- Use Paper or Folia with chunk-loading limits removed (as shown above).
- Tune parallel_tasks_multiplier and task_queue_timer based on your system’s CPU core count and available memory.
- Use print_update_delay of 10s+ to avoid excessive log output during long runs.
- Lower region-file-cache-size if running pregen on a fresh world with no players online.
- I’ve improved how the plugin schedules chunk generation. Previously, each task started its own timer, causing many timers to run at once and creating high CPU usage.
- Now, Chunker runs a single repeating process that handles multiple chunks per cycle, which lowers the initial CPU load when generating large numbers of chunks (there will still be an initial spike, but it is shorter). This should prevent the runaway increase in CPU usage when setting parallelTasksMultiplier higher than 16.
- I was able to run parallelTasksMultiplier at 16 for all worlds on a 9800X3D with some headroom left over.
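The scheduling change described above can be pictured roughly as follows. This is a sketch under assumptions, not Chunker's code: `BatchDispatcher`, its `batchSize`, and the use of a plain `ScheduledExecutorService` are all hypothetical stand-ins for whatever scheduler the plugin actually uses. The point is that one repeating tick drains a shared queue in batches instead of every task owning a timer.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: a single repeating tick handles several queued items per cycle. */
final class BatchDispatcher<T> {
    private final Queue<T> pending = new ConcurrentLinkedQueue<>();
    private final int batchSize;
    private final Consumer<T> handler;

    BatchDispatcher(int batchSize, Consumer<T> handler) {
        this.batchSize = batchSize;
        this.handler = handler;
    }

    /** Queues an item (for example a chunk to generate) for a later cycle. */
    void submit(T item) {
        pending.add(item);
    }

    /** One cycle: handles at most batchSize queued items and returns how many ran. */
    int tick() {
        int handled = 0;
        T item;
        while (handled < batchSize && (item = pending.poll()) != null) {
            handler.accept(item);
            handled++;
        }
        return handled;
    }

    /** Drives tick() from one repeating timer instead of one timer per task. */
    ScheduledFuture<?> start(ScheduledExecutorService scheduler, long periodMillis) {
        return scheduler.scheduleAtFixedRate(this::tick, 0, periodMillis, TimeUnit.MILLISECONDS);
    }
}
```

With a batch size of 4 and 10 queued items, successive calls to `tick()` handle 4, 4, and 2 items. The number of timers stays at one no matter how many chunks are queued.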
Update to 1.21.4 and bug fix
- Updated to 1.21.4 API
- Fixed a bug that prevented new tasks from running because the code reported a task as already enabled when it was not.
- Tasks are now removed from the tasks map when the pre-generation completes or is disabled, preventing "already enabled/disabled" console errors.
- Fixed a bug where enabling autorun caused the parallel_tasks_multiplier to be ignored, always setting an equal number of tasks across all threads.
- Updated Chunker to improve performance and memory efficiency, resulting in a ~5-7% uplift in performance, with chunk processing increasing from 198-200 to 208-215 chunks per second.
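The "already enabled" fix comes down to keeping the tasks map in sync with what is actually running. A minimal sketch of that bookkeeping, assuming a hypothetical `TaskRegistry` keyed by world name (the real field and method names in Chunker may differ):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of per-world task bookkeeping; not Chunker's actual class. */
final class TaskRegistry {
    private final Map<String, Runnable> tasks = new ConcurrentHashMap<>();

    /** Registers a task for the world; returns false if one is already enabled. */
    boolean enable(String world, Runnable task) {
        return tasks.putIfAbsent(world, task) == null;
    }

    /**
     * Removes the entry when pre-generation completes or is disabled.
     * Without this step a finished task would still look "already enabled".
     */
    boolean remove(String world) {
        return tasks.remove(world) != null;
    }

    boolean isEnabled(String world) {
        return tasks.containsKey(world);
    }
}
```

After `remove("world")`, a later `enable("world", ...)` succeeds again, which is the behavior the fix restores.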
New:
- Can now run multiple generation tasks at the same time, all with their own settings.
- New print to tell you how long tasks took to complete.
- Added settings.yml that includes more in-depth settings as well as the option to run tasks automatically when there are no players on the server. Tasks are terminated when any players connect.
# Configuration
# auto_run: Set to true if you want pre-generation to start automatically when no players are on the server.
#   Acceptable values: true or false
# task_queue_timer: Determines how fast chunks are queued up. A value between 50-70 is recommended for modern AMD 5000 series and Intel 13th Gen CPUs in the Overworld.
#   Adjust based on performance needs.
# parallel_tasks_multiplier: Sets the number of async tasks running concurrently. 'auto' will distribute the tasks based on your thread count.
#   You can also set a specific integer value (e.g., 2, 4). It's recommended to stay below your total thread count.
#   Example with 'auto' and 12 threads:
#     world:
#       parallel_tasks_multiplier: 4
#     world_nether:
#       parallel_tasks_multiplier: 4
#     world_the_end:
#       parallel_tasks_multiplier: 4
# print_update_delay: How often to print information (s-Seconds, m-Minutes, h-Hours). Default is 5s (5 seconds).
# radius: Defines how far the pre-generator should run (b-Blocks, c-Chunks, r-Regions) or 'default' to pre-generate until the world border.
# Settings
world:
  auto_run: false # Acceptable values: true or false
  task_queue_timer: 60 # Acceptable range: positive integer
  parallel_tasks_multiplier: auto # 'auto' or a positive integer value
  print_update_delay: 5s # Format: [value][s|m|h]. Example: 5s, 10m, 2h
  radius: default # Format: [value][b|c|r]. Example: 100b, 1c, 10r, or 'default'
world_nether:
  auto_run: false
  task_queue_timer: 60
  parallel_tasks_multiplier: auto
  print_update_delay: 5s
  radius: default
world_the_end:
  auto_run: false
  task_queue_timer: 60
  parallel_tasks_multiplier: auto
  print_update_delay: 5s
  radius: default
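To make the value formats in settings.yml concrete, here is a hedged sketch of how such strings could be interpreted. Everything in it is an assumption for illustration: the `SettingsValues` class, rounding block radii up to whole chunks, returning -1 for 'default', and splitting threads evenly across worlds for 'auto' (which matches the 12-thread example in the comments). The plugin's real parser may behave differently. The 16-blocks-per-chunk and 32-chunks-per-region conversions are standard Minecraft dimensions.

```java
import java.time.Duration;
import java.util.Locale;

/** Hypothetical parser for the value formats documented in settings.yml. */
final class SettingsValues {
    /** Splits "[value][unit]" and returns the numeric part; the unit is the last character. */
    private static long amountOf(String v, String original) {
        if (v.length() < 2) {
            throw new IllegalArgumentException("Expected [value][unit], got: " + original);
        }
        return Long.parseLong(v.substring(0, v.length() - 1));
    }

    /** Parses print_update_delay values such as "5s", "10m", or "2h". */
    static Duration parseDelay(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        long amount = amountOf(v, value);
        return switch (v.charAt(v.length() - 1)) {
            case 's' -> Duration.ofSeconds(amount);
            case 'm' -> Duration.ofMinutes(amount);
            case 'h' -> Duration.ofHours(amount);
            default -> throw new IllegalArgumentException("Unknown time unit in: " + value);
        };
    }

    /** Parses radius values into chunks; returns -1 for 'default' (use the world border). */
    static long parseRadiusChunks(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.equals("default")) {
            return -1;
        }
        long amount = amountOf(v, value);
        return switch (v.charAt(v.length() - 1)) {
            case 'b' -> (amount + 15) / 16; // 16 blocks per chunk, rounded up (assumption)
            case 'c' -> amount;
            case 'r' -> amount * 32;        // a region is 32 chunks wide
            default -> throw new IllegalArgumentException("Unknown radius unit in: " + value);
        };
    }

    /** Assumed 'auto' rule: split threads evenly across worlds, at least one task each. */
    static int autoTasksPerWorld(int threads, int worlds) {
        return Math.max(1, threads / worlds);
    }
}
```

For example, `parseRadiusChunks("10r")` yields 320 chunks and `autoTasksPerWorld(12, 3)` yields 4, in line with the commented example for 12 threads.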
Changes:
- /pregenoff has been updated to allow you to shut off specific generation tasks per world by using /pregenoff [world]. With no arguments, /pregenoff shuts down all pre-generation tasks.
- The parallel task multiplier now increases load more linearly, allowing for better control of the load each task puts on the server. This means you won't have to push the parallel task multiplier past your thread count anymore.
- Moved away from the PaperLib implementation and now use Paper's methods directly. This improved performance slightly for Paper server forks.
- Improved printing for Bukkit/Spigot as well as performance. Default Bukkit/Spigot behavior with PaperLib was quite slow, only loading about 20 chunks per second. The new custom implementation for pre-generating chunks with Bukkit/Spigot is about 2-3x faster.
Fixes:
- Reworked printing to make it more accurate and removed unnecessary/redundant prints that serve no use.
Old Approach:
- Utilizes the BukkitScheduler to run chunk unload tasks.
- Tasks are scheduled within the game loop, which introduces some overhead and can limit concurrency.
- Uses a scheduler to repeatedly check and unload chunks.
New Approach:
- Uses virtual threads to handle chunk unload tasks.
- Virtual threads are lightweight, allowing for many more concurrent tasks without significant performance penalties.
- Runs tasks asynchronously, removing the dependency on the game loop scheduler.
Why the New Approach is Better:
- The new approach using virtual threads offers lightweight concurrency and reduced overhead compared to the old scheduler-based method, simplifying the code and improving scalability for handling chunk unloading in Minecraft.
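As a rough picture of the virtual-thread approach, the sketch below runs one lightweight job per chunk using the standard Java 21 `Executors.newVirtualThreadPerTaskExecutor()`. The class `VirtualThreadUnloadSketch` and the placeholder job body are hypothetical. The notes do not show how Chunker performs the actual unload or how it interacts with the server thread, so none of that is modeled here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch (requires Java 21+); not Chunker's actual unload code. */
final class VirtualThreadUnloadSketch {
    /** Runs one stand-in unload job per chunk key and returns how many completed. */
    static int runUnloadJobs(long[] chunkKeys) {
        AtomicInteger completed = new AtomicInteger();
        // close() at the end of try-with-resources waits for all submitted jobs.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (long key : chunkKeys) {
                executor.submit(() -> {
                    // Placeholder: the real unload work for `key` would go here.
                    completed.incrementAndGet();
                });
            }
        }
        return completed.get();
    }
}
```

Each job gets its own virtual thread, so thousands of pending unloads do not need a matching number of platform threads or a repeating scheduler check.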