
# MiniDXNN Examples

Example applications for GPU-accelerated MLP inference and training with DirectX 12 Cooperative Vector.

## Table of Contents

- [Example 01: Texture Inference](#example-01-texture-inference)
- [Example 02: Texture Training](#example-02-texture-training)
- [MLP Model Format](#mlp-model-format)
- [License](#license)

## Example 01: Texture Inference

Loads a pre-trained MLP binary and reconstructs a texture on the GPU.

Workflow: Train & export (Python) → Load binary → GPU inference → Save PPM image

### Step-by-Step

#### 1. Train a model (Python)

```
pip install torch numpy matplotlib
cd scripts/pyreference
python texture_reconstruction_mlp.py            # defaults: 3 hidden layers, 64 neurons, leaky_relu
```

**Training options**

| Option | Default | Description |
| --- | --- | --- |
| `--backbone-layers` | 3 | Number of hidden layers |
| `--hidden-dim` | 64 | Neurons per hidden layer |
| `--activation` | leaky_relu | `identity`, `sigmoid`, `tanh`, `relu`, `leaky_relu` |
| `--texture-pattern` | checkerboard | `gradient`, `checkerboard`, `stripes`, `circle`, `perlin` |
| `--epochs` | 30 | Training iterations |
| `--learning-rate` | 0.01 | Optimizer learning rate |
| `--optimizer` | adam | `sgd`, `adam`, `lion` |
| `--dtype` | float | `float`, `half` |
| `--samples` | 50000 | Number of training samples |
| `--batch-size` | 500 | Training batch size |
| `--no-display` | false | Skip matplotlib display and exit automatically |
| `--seed` | 987654321 | Random seed |
| `--texture-width` | 1024 | Texture width |
| `--texture-height` | 1024 | Texture height |

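For example, to train on a Perlin-noise pattern with a wider network and skip the matplotlib preview (combining flags from the table above):

```
python texture_reconstruction_mlp.py --texture-pattern perlin --hidden-dim 128 --epochs 100 --no-display
```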
#### 2. Run GPU inference

```
cd build/example
Release/01-texture-inference.exe ../../scripts/pyreference/texture-mlp-data.bin output.ppm
```

| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| `mlp-binary` | Yes | | Path to MLP binary file |
| `output` | No | `mlp-inference-output.ppm` | Output PPM image |
| `--texture-width` | No | 1024 | Image width |
| `--texture-height` | No | 1024 | Image height |
| `--cpu` | No | false | Use CPU reference ML operations instead of GPU |
| `--cpp-fallback` | No | false | Use C++ fallback (`mlp.hlsl` compiled as C++) |
| `--software-linalg` | No | false | Use software linear algebra instead of Cooperative Vector |
| `--debug` | No | false | Verbose output |

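For example, a smaller 512×512 reconstruction with verbose output (flags combined from the table above; flag ordering after the positional arguments is assumed):

```
Release/01-texture-inference.exe ../../scripts/pyreference/texture-mlp-data.bin output.ppm --texture-width 512 --texture-height 512 --debug
```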
## Example 02: Texture Training

Trains an MLP entirely on the GPU to learn a 2D texture pattern, then reconstructs the texture using the trained model.

Workflow: Init weights → GPU mini-batch training (SGD/Adam/Lion) → Reconstruct texture → Save PPM image

### Run

```
cd build/example
Release/02-texture-training.exe --output-image training-result.ppm
```

| Option | Default | Description |
| --- | --- | --- |
| `--backbone-layers` | 3 | Number of hidden layers |
| `--hidden-dim` | 64 | Neurons per hidden layer |
| `--activation` | leaky_relu | `identity`, `sigmoid`, `tanh`, `relu`, `leaky_relu` |
| `--bias` / `--no-bias` | true | Enable/disable bias in MLP layers |
| `--epochs` | 30 | Training epochs |
| `--batch-size` | 500 | Mini-batch size |
| `--learning-rate` | 0.01 | Optimizer learning rate |
| `--optimizer` | adam | `sgd`, `adam`, `lion` |
| `--samples` | 50000 | Number of training samples |
| `--texture-width` | 1024 | Texture width |
| `--texture-height` | 1024 | Texture height |
| `--texture-pattern` | checkerboard | `gradient`, `checkerboard`, `stripes`, `circle`, `perlin` |
| `--output-image` | `mlp-training-output.ppm` | Output PPM image |
| `--cpu` | false | Use CPU reference ML operations instead of GPU |
| `--cpp-fallback` | false | Use C++ fallback (`mlp.hlsl` compiled as C++) |
| `--software-linalg` | false | Use software linear algebra instead of Cooperative Vector |
| `--debug` | false | Enable debug mode for detailed output |
| `--seed` | 987654321 | Random seed |

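For example, to train a stripes pattern with the Lion optimizer and write a custom output file (flags combined from the table above):

```
Release/02-texture-training.exe --texture-pattern stripes --optimizer lion --output-image stripes.ppm
```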
### GPU Kernel Details

The training compute shaders (`kernel/02_texture_training.comp`) demonstrate:

- Using `mininn::TrainingLayerDataRef` to bind weight, bias, gradient, and logits cache buffers
- Calling `mininn::forward()` followed by `mininn::backward()` for a full training step
- GPU-side optimizer kernels (SGD, Adam, Lion) that read gradients and update weights directly on the GPU; the standard update rules are sketched below
- Shared optimizer implementations in `kernel/optimizer.hlsl` that work on packed byte buffers

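For intuition only, here is a minimal NumPy sketch of the standard SGD, Adam, and Lion update rules that optimizer kernels of this kind compute. It is not the HLSL code in `kernel/optimizer.hlsl`; the function names and hyperparameter defaults are illustrative.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Plain gradient-descent step
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its square,
    # with bias correction for the first t steps
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def lion_step(w, grad, m, lr=0.01, beta1=0.9, beta2=0.99):
    # Lion updates with the sign of an interpolated momentum,
    # then refreshes the momentum buffer
    update = np.sign(beta1 * m + (1 - beta1) * grad)
    m = beta2 * m + (1 - beta2) * grad
    return w - lr * update, m
```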
## MLP Model Format

Binary format used by Example 01 (produced by the Python training script):

```
Header (12 bytes):
  int32  numHiddenLayers
  int32  hiddenLayerDim
  int32  activationType    (0=Identity, 1=Sigmoid, 2=Tanh, 3=ReLU, 4=LeakyReLU)

Per layer (numHiddenLayers + 1 layers):
  float32[outputDim × inputDim]   Weight matrix (row-major)
  float32[outputDim]              Bias vector
```

Layer ordering: Input(2)→Hidden₁→…→Hiddenₙ→Output(2). Hidden layers use the header's activation; the output layer always uses Sigmoid.

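As a reading aid, here is a minimal Python sketch that parses this layout. `load_mlp_binary` is a hypothetical helper, not part of the repository; it assumes little-endian data and the 2-in/2-out layer ordering described above.

```python
import struct
import numpy as np

def load_mlp_binary(path, input_dim=2, output_dim=2):
    """Parse the MLP binary exported by the Python training script."""
    with open(path, "rb") as f:
        # 12-byte header: hidden layer count, hidden width, activation enum
        num_hidden, hidden_dim, activation = struct.unpack("<iii", f.read(12))

        # Layer dimensions: Input -> Hidden_1 .. Hidden_n -> Output
        dims = [input_dim] + [hidden_dim] * num_hidden + [output_dim]

        layers = []
        for in_dim, out_dim in zip(dims[:-1], dims[1:]):
            # Row-major float32 weight matrix, then the float32 bias vector
            w = np.frombuffer(f.read(4 * out_dim * in_dim), dtype=np.float32)
            b = np.frombuffer(f.read(4 * out_dim), dtype=np.float32)
            layers.append((w.reshape(out_dim, in_dim), b))

    return num_hidden, hidden_dim, activation, layers
```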

For HLSL API details, see MLP HLSL API Reference.

## License

MIT License — see LICENSE.

Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved.