
# MiniDXNN Examples

Example applications for GPU-accelerated MLP inference and training with DirectX 12 Cooperative Vector.

## Table of Contents

- [Example 01: Texture Inference](#example-01-texture-inference)
- [Example 02: Texture Training](#example-02-texture-training)
- [MLP Model Format](#mlp-model-format)
- [License](#license)

## Example 01: Texture Inference

Loads a pre-trained MLP binary and reconstructs a texture on the GPU.

Workflow: Train & export (Python) → Load binary → GPU inference → Save PPM image

### Step-by-Step

#### 1. Train a model (Python)

```
pip install torch numpy matplotlib
cd scripts/pyreference
python texture_reconstruction_mlp.py            # defaults: 3 hidden layers, 64 neurons, leaky_relu
```

**Training options**

| Option | Default | Description |
| --- | --- | --- |
| `--backbone-layers` | 3 | Number of hidden layers |
| `--hidden-dim` | 64 | Neurons per hidden layer |
| `--activation` | leaky_relu | `identity`, `sigmoid`, `tanh`, `relu`, `leaky_relu` |
| `--texture-pattern` | checkerboard | `gradient`, `checkerboard`, `stripes`, `circle`, `perlin` |
| `--epochs` | 30 | Training iterations |
| `--learning-rate` | 0.01 | Optimizer learning rate |
| `--optimizer` | adam | `sgd`, `adam`, `lion` |
| `--dtype` | float | `float`, `half` |
| `--samples` | 50000 | Number of training samples |
| `--batch-size` | 500 | Training batch size |
| `--no-display` | false | Skip matplotlib display and exit automatically |
| `--seed` | 987654321 | Random seed |
| `--texture-width` | 1024 | Texture width |
| `--texture-height` | 1024 | Texture height |

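For example, to train on a Perlin-noise pattern with a wider network and skip the matplotlib preview (combining flags from the table above):

```
python texture_reconstruction_mlp.py --texture-pattern perlin --hidden-dim 128 --epochs 100 --no-display
```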
#### 2. Run GPU inference

```
cd build/example
Release/01-texture-inference.exe ../../scripts/pyreference/texture-mlp-data.bin output.ppm
```

| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| `mlp-binary` | Yes | | Path to MLP binary file |
| `output` | No | `mlp-inference-output.ppm` | Output PPM image |
| `--texture-width` | No | 1024 | Image width |
| `--texture-height` | No | 1024 | Image height |
| `--cpu` | No | false | Use CPU reference ML operations instead of GPU |
| `--cpp-fallback` | No | false | Use C++ fallback (`mlp.hlsl` compiled as C++) |
| `--software-linalg` | No | false | Use software linear algebra instead of Cooperative Vector |
| `--debug` | No | false | Verbose output |

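For example, a smaller 512×512 reconstruction with verbose output (flags combined from the table above; flag ordering after the positional arguments is assumed):

```
Release/01-texture-inference.exe ../../scripts/pyreference/texture-mlp-data.bin output.ppm --texture-width 512 --texture-height 512 --debug
```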
## Example 02: Texture Training

Trains an MLP entirely on the GPU to learn a 2D texture pattern, then reconstructs the texture using the trained model.

Workflow: Init weights → GPU mini-batch training (SGD/Adam/Lion) → Reconstruct texture → Save PPM image

### Run

```
cd build/example
Release/02-texture-training.exe --output-image training-result.ppm
```

| Option | Default | Description |
| --- | --- | --- |
| `--backbone-layers` | 3 | Number of hidden layers |
| `--hidden-dim` | 64 | Neurons per hidden layer |
| `--activation` | leaky_relu | `identity`, `sigmoid`, `tanh`, `relu`, `leaky_relu` |
| `--bias` / `--no-bias` | true | Enable/disable bias in MLP layers |
| `--epochs` | 30 | Training epochs |
| `--batch-size` | 500 | Mini-batch size |
| `--learning-rate` | 0.01 | Optimizer learning rate |
| `--optimizer` | adam | `sgd`, `adam`, `lion` |
| `--samples` | 50000 | Number of training samples |
| `--texture-width` | 1024 | Texture width |
| `--texture-height` | 1024 | Texture height |
| `--texture-pattern` | checkerboard | `gradient`, `checkerboard`, `stripes`, `circle`, `perlin` |
| `--output-image` | `mlp-training-output.ppm` | Output PPM image |
| `--cpu` | false | Use CPU reference ML operations instead of GPU |
| `--cpp-fallback` | false | Use C++ fallback (`mlp.hlsl` compiled as C++) |
| `--software-linalg` | false | Use software linear algebra instead of Cooperative Vector |
| `--debug` | false | Enable debug mode for detailed output |
| `--seed` | 987654321 | Random seed |

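For example, to train a stripes pattern with the Lion optimizer and write a custom output file (flags combined from the table above):

```
Release/02-texture-training.exe --texture-pattern stripes --optimizer lion --output-image stripes.ppm
```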
### GPU Kernel Details

The training compute shaders (`kernel/02_texture_training.comp`) demonstrate:

- Using `mininn::TrainingLayerDataRef` to bind weight, bias, gradient, and logits cache buffers
- Calling `mininn::forward()` followed by `mininn::backward()` for a full training step
- GPU-side optimizer kernels (SGD, Adam, Lion) that read gradients and update weights directly on the GPU; the standard update rules are sketched below
- Shared optimizer implementations in `kernel/optimizer.hlsl` that work on packed byte buffers

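For intuition only, here is a minimal NumPy sketch of the standard SGD, Adam, and Lion update rules that optimizer kernels of this kind compute. It is not the HLSL code in `kernel/optimizer.hlsl`; the function names and hyperparameter defaults are illustrative.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Plain gradient-descent step
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its square,
    # with bias correction for the first t steps
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def lion_step(w, grad, m, lr=0.01, beta1=0.9, beta2=0.99):
    # Lion updates with the sign of an interpolated momentum,
    # then refreshes the momentum buffer
    update = np.sign(beta1 * m + (1 - beta1) * grad)
    m = beta2 * m + (1 - beta2) * grad
    return w - lr * update, m
```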
## MLP Model Format

Binary format used by Example 01 (produced by the Python training script):

```
Header (12 bytes):
  int32  numHiddenLayers
  int32  hiddenLayerDim
  int32  activationType    (0=Identity, 1=Sigmoid, 2=Tanh, 3=ReLU, 4=LeakyReLU)

Per layer (numHiddenLayers + 1 layers):
  float32[outputDim × inputDim]   Weight matrix (row-major)
  float32[outputDim]              Bias vector
```

Layer ordering: Input(2)→Hidden₁→…→Hiddenₙ→Output(2). Hidden layers use the header's activation; the output layer always uses Sigmoid.

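As a reading aid, here is a minimal Python sketch that parses this layout. `load_mlp_binary` is a hypothetical helper, not part of the repository; it assumes little-endian data and the 2-in/2-out layer ordering described above.

```python
import struct
import numpy as np

def load_mlp_binary(path, input_dim=2, output_dim=2):
    """Parse the MLP binary exported by the Python training script."""
    with open(path, "rb") as f:
        # 12-byte header: hidden layer count, hidden width, activation enum
        num_hidden, hidden_dim, activation = struct.unpack("<iii", f.read(12))

        # Layer dimensions: Input -> Hidden_1 .. Hidden_n -> Output
        dims = [input_dim] + [hidden_dim] * num_hidden + [output_dim]

        layers = []
        for in_dim, out_dim in zip(dims[:-1], dims[1:]):
            # Row-major float32 weight matrix, then the float32 bias vector
            w = np.frombuffer(f.read(4 * out_dim * in_dim), dtype=np.float32)
            b = np.frombuffer(f.read(4 * out_dim), dtype=np.float32)
            layers.append((w.reshape(out_dim, in_dim), b))

    return num_hidden, hidden_dim, activation, layers
```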

For HLSL API details, see MLP HLSL API Reference.

## License

MIT License — see LICENSE.

Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved.