MineRL BASALT Competition 2022
Pixel art tile textures generated by a Dreambooth model I trained.
My strategy was inspired by this keynote by Josh Tenenbaum.
My top priority for the contest was to build a prior for the intuitive physics of voxel terrain, ignoring dynamic geometry. My theory was that if the input to VPT fine-tuning were a symbolic world model instead of raw pixels, then training any objective would become much easier, perhaps even possible at all.
This involved:
- Creating a neural network to predict a voxel heightmap relative to a camera view
- Having the agent run a headless game engine simulation of its actions in the context of a scene prediction
I was only able to complete two milestones within the competition's time limit. I used Blender to headlessly render tens of thousands of synthetically generated voxel heightmaps and their corresponding occlusion renders (each voxel index was represented by a distinct color).
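The distinct-color labeling can be sketched roughly like this: each voxel column index gets a unique, exactly invertible RGB color, so a rendered pixel can be mapped back to the voxel it shows. The encoding scheme and grid size below are my own illustrative assumptions, not the exact scheme I used.

```python
import numpy as np

def index_to_color(idx, grid_w=16):
    """Map a voxel column index to a distinct RGB color (hypothetical scheme).
    Encodes the (x, z) grid position into the red and green channels."""
    x, z = idx % grid_w, idx // grid_w
    return (x * 255 // (grid_w - 1), z * 255 // (grid_w - 1), 128)

def color_to_index(color, grid_w=16):
    """Invert the encoding: recover the voxel index from a rendered pixel color."""
    r, g, _ = color
    x = round(r * (grid_w - 1) / 255)
    z = round(g * (grid_w - 1) / 255)
    return z * grid_w + x

# A synthetic heightmap for a 16x16 voxel grid, like the Blender ground truth
rng = np.random.default_rng(0)
heights = rng.integers(0, 8, size=(16, 16))

# Round-trip check: every index survives the color encoding
assert all(color_to_index(index_to_color(i)) == i for i in range(16 * 16))
```

Because 255 is divisible by 15, the encoding round-trips exactly for a 16-wide grid; a real pipeline would also need to tolerate slight color shifts from anti-aliasing in the render.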
Then, I trained a neural network on this purely synthetic data: voxel_sight GitHub repo
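As a rough sketch of what such a network could look like (this is a toy stand-in, not the actual voxel_sight architecture), a small CNN can map a rendered view to two per-cell outputs: voxel-existence logits and voxel heights. All layer sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VoxelHeightNet(nn.Module):
    """Toy sketch (not the real voxel_sight model): a small CNN that maps a
    rendered RGB view to a 16x16 grid of existence logits and heights."""
    def __init__(self, grid=16):
        super().__init__()
        self.grid = grid
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.exist_head = nn.Linear(32 * 16, grid * grid)   # per-cell logits
        self.height_head = nn.Linear(32 * 16, grid * grid)  # per-cell heights

    def forward(self, x):
        feats = self.backbone(x)
        g = self.grid
        return (self.exist_head(feats).view(-1, g, g),
                self.height_head(feats).view(-1, g, g))

net = VoxelHeightNet()
exist_logits, heights = net(torch.randn(2, 3, 64, 64))  # batch of 2 views
```

A natural training objective for this kind of two-headed output is binary cross-entropy on the existence logits plus a regression loss on heights, with the height loss masked to cells that actually contain voxels.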
I originally tried to visualize the outputs by recreating the 3D scenes, but I found it was easier to view predicted heightmaps as 2D images!
From left to right: (1) actual scene data, (2) predicted voxel existence, (3) predicted voxel heights, (4) predicted composite.
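The 2D-image trick above amounts to something like the following sketch: treat column height as pixel brightness, and build the composite by masking out cells the network predicts as empty. The scaling constants are illustrative assumptions.

```python
import numpy as np

def heightmap_to_image(heights, exists, max_h=8):
    """Render a predicted heightmap as a 2D grayscale image (sketch):
    brighter pixels mean taller voxel columns."""
    img = (heights.astype(np.float32) / max_h * 255).clip(0, 255).astype(np.uint8)
    img[~exists] = 0  # composite: black out cells predicted to have no voxel
    return img

heights = np.array([[0, 4], [8, 2]])
exists = np.array([[True, True], [True, False]])
img = heightmap_to_image(heights, exists)
```

Viewed this way, a predicted scene is just a grayscale thumbnail, which makes eyeballing many predictions side by side far easier than rebuilding each 3D scene.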
One thing I wish I had had time for was training on a larger dataset of textures and scenes, along with more data augmentation, to allow better generalization to the non-voxel entities of Minecraft.
I was sponsored to virtually attend NeurIPS, which was super exciting!
My experiments here helped me make progress on procedural infinite world generation with diffusion models, and start to design a seed curation interface for texture generation.