Example applications for GPU-accelerated MLP inference and training with DirectX 12 Cooperative Vector.
## Example 01: Texture Inference

Loads a pre-trained MLP binary and reconstructs a texture on the GPU.

Workflow: Train & export (Python) → Load binary → GPU inference → Save PPM image

```shell
pip install torch numpy matplotlib
cd scripts/pyreference
python texture_reconstruction_mlp.py  # defaults: 3 hidden layers, 64 neurons, leaky_relu
```

### Training options
| Option | Default | Description |
|---|---|---|
| `--backbone-layers` | 3 | Number of hidden layers |
| `--hidden-dim` | 64 | Neurons per hidden layer |
| `--activation` | leaky_relu | identity, sigmoid, tanh, relu, leaky_relu |
| `--texture-pattern` | checkerboard | gradient, checkerboard, stripes, circle, perlin |
| `--epochs` | 30 | Training iterations |
| `--learning-rate` | 0.01 | Optimizer learning rate |
| `--optimizer` | adam | sgd, adam, lion |
| `--dtype` | float | float, half |
| `--samples` | 50000 | Number of training samples |
| `--batch-size` | 500 | Training batch size |
| `--no-display` | false | Skip matplotlib display and exit automatically |
| `--seed` | 987654321 | Random seed |
| `--texture-width` | 1024 | Texture width |
| `--texture-height` | 1024 | Texture height |
```shell
cd build/example
Release/01-texture-inference.exe ../../scripts/pyreference/texture-mlp-data.bin output.ppm
```

| Argument | Required | Default | Description |
|---|---|---|---|
| `mlp-binary` | Yes | — | Path to MLP binary file |
| `output` | No | mlp-inference-output.ppm | Output PPM image |
| `--texture-width` | No | 1024 | Image width |
| `--texture-height` | No | 1024 | Image height |
| `--cpu` | No | false | Use CPU reference ML operations instead of GPU |
| `--cpp-fallback` | No | false | Use C++ fallback (mlp.hlsl compiled as C++) |
| `--software-linalg` | No | false | Use software linear algebra instead of Cooperative Vector |
| `--debug` | No | false | Verbose output |
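The examples save their results as PPM images. If your image viewer does not handle PPM, a minimal Python reader is easy to write. The sketch below is illustrative only (not part of the repository) and assumes the common binary P6 variant with a comment-free header:

```python
import re

def read_ppm_p6(data: bytes):
    """Parse a binary PPM (P6) image into (width, height, maxval, pixels).

    Minimal parser: assumes a comment-free header, which is what simple
    writers typically emit.
    """
    # Header: "P6", then width, height, maxval, each followed by whitespace.
    m = re.match(rb"P6\s+(\d+)\s+(\d+)\s+(\d+)\s", data)
    if m is None:
        raise ValueError("not a binary PPM (P6) file")
    width, height, maxval = (int(g) for g in m.groups())
    # Pixel data is raw interleaved RGB bytes immediately after the header.
    pixels = data[m.end():m.end() + width * height * 3]
    return width, height, maxval, pixels

# A tiny 2x1 red/green image round-trips through the parser.
sample = b"P6\n2 1\n255\n" + bytes([255, 0, 0, 0, 255, 0])
w, h, mx, px = read_ppm_p6(sample)
```

From here the pixel bytes can be reshaped into a `(height, width, 3)` array or handed to an image library for conversion.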
## Example 02: Texture Training

Trains an MLP entirely on the GPU to learn a 2D texture pattern, then reconstructs the texture using the trained model.

Workflow: Init weights → GPU mini-batch training (SGD/Adam/Lion) → Reconstruct texture → Save PPM image
```shell
cd build/example
Release/02-texture-training.exe --output-image training-result.ppm
```

| Option | Default | Description |
|---|---|---|
| `--backbone-layers` | 3 | Number of hidden layers |
| `--hidden-dim` | 64 | Neurons per hidden layer |
| `--activation` | leaky_relu | identity, sigmoid, tanh, relu, leaky_relu |
| `--bias` / `--no-bias` | true | Enable/disable bias in MLP layers |
| `--epochs` | 30 | Training epochs |
| `--batch-size` | 500 | Mini-batch size |
| `--learning-rate` | 0.01 | Optimizer learning rate |
| `--optimizer` | adam | sgd, adam, lion |
| `--samples` | 50000 | Number of training samples |
| `--texture-width` | 1024 | Texture width |
| `--texture-height` | 1024 | Texture height |
| `--texture-pattern` | checkerboard | gradient, checkerboard, stripes, circle, perlin |
| `--output-image` | mlp-training-output.ppm | Output PPM image |
| `--cpu` | false | Use CPU reference ML operations instead of GPU |
| `--cpp-fallback` | false | Use C++ fallback (mlp.hlsl compiled as C++) |
| `--software-linalg` | false | Use software linear algebra instead of Cooperative Vector |
| `--debug` | false | Enable debug mode for detailed output |
| `--seed` | 987654321 | Random seed |
The training compute shaders (`kernel/02_texture_training.comp`) demonstrate:

- Using `mininn::TrainingLayerDataRef` to bind weight, bias, gradient, and logits cache buffers
- Calling `mininn::forward()` followed by `mininn::backward()` for a full training step
- GPU-side optimizer kernels (SGD, Adam, Lion) that read gradients and update weights directly on the GPU
- Shared optimizer implementations in `kernel/optimizer.hlsl` that work on packed byte buffers
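Conceptually, one training step is a forward pass that caches logits, a backward pass that produces gradients, and an optimizer kernel that consumes them. The NumPy sketch below mirrors that sequence for a single leaky_relu layer with an Adam update; the names and dimensions are illustrative, not the `mininn` API, and the real kernels work on packed byte buffers instead of arrays:

```python
import numpy as np

rng = np.random.default_rng(0)

# One dense layer: y = leaky_relu(x @ W.T + b), trained against a fixed
# target with MSE. Tiny dims keep the sketch readable.
in_dim, out_dim, batch = 2, 4, 8
W = rng.normal(0.0, 0.5, (out_dim, in_dim))
b = np.zeros(out_dim)

# Adam state; the GPU version keeps this next to the gradients in buffers.
mW, vW = np.zeros_like(W), np.zeros_like(W)
mb, vb = np.zeros_like(b), np.zeros_like(b)
lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8

x = rng.normal(size=(batch, in_dim))
target = rng.normal(size=(batch, out_dim))

def forward(x):
    z = x @ W.T + b                           # logits, cached for backward
    return z, np.where(z > 0, z, 0.01 * z)    # leaky_relu

_, y0 = forward(x)
loss0 = float(((y0 - target) ** 2).mean())

for t in range(1, 31):                        # 30 steps, like the default epochs
    z, y = forward(x)
    dy = 2.0 * (y - target) / batch           # MSE gradient
    dz = dy * np.where(z > 0, 1.0, 0.01)      # through leaky_relu
    gW, gb = dz.T @ x, dz.sum(axis=0)
    # Per-element Adam update with bias correction, applied in place
    for p, g, m, v in ((W, gW, mW, vW), (b, gb, mb, vb)):
        m[...] = beta1 * m + (1 - beta1) * g
        v[...] = beta2 * v + (1 - beta2) * g * g
        p -= lr * (m / (1 - beta1 ** t)) / (np.sqrt(v / (1 - beta2 ** t)) + eps)

_, y = forward(x)
loss = float(((y - target) ** 2).mean())
```

On the GPU the same three stages run as separate dispatches, with the logits cache and gradient buffers carrying state between them.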
Binary format used by Example 01 (produced by the Python training script):

```
Header (12 bytes):
  int32 numHiddenLayers
  int32 hiddenLayerDim
  int32 activationType   (0=Identity, 1=Sigmoid, 2=Tanh, 3=ReLU, 4=LeakyReLU)

Per layer (numHiddenLayers + 1 layers):
  float32[outputDim × inputDim]  Weight matrix (row-major)
  float32[outputDim]             Bias vector
```

Layer ordering: Input(2)→Hidden₁→…→Hiddenₙ→Output(2). Hidden layers use the header's activation; the output layer always uses Sigmoid.
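To make the layout concrete, here is a minimal Python loader for this format. It is a sketch, not part of the repository: `load_mlp` is a hypothetical name, and little-endian byte order is assumed (matching what NumPy/PyTorch emit on common platforms):

```python
import struct

def load_mlp(data: bytes):
    """Parse the MLP binary laid out above (little-endian assumed).

    Returns (activation_type, layers); layers is a list of
    (weights, biases) tuples with weights flattened row-major.
    """
    num_hidden, hidden_dim, activation = struct.unpack_from("<3i", data, 0)
    offset = 12
    # Layer dims follow the ordering note: 2 -> hidden ... hidden -> 2
    dims = [2] + [hidden_dim] * num_hidden + [2]
    layers = []
    for in_dim, out_dim in zip(dims, dims[1:]):
        n = out_dim * in_dim
        w = struct.unpack_from(f"<{n}f", data, offset)
        offset += 4 * n
        b = struct.unpack_from(f"<{out_dim}f", data, offset)
        offset += 4 * out_dim
        layers.append((w, b))
    return activation, layers

# Round-trip a tiny synthetic network: 1 hidden layer of width 2, LeakyReLU.
blob = struct.pack("<3i", 1, 2, 4)
for _ in range(2):                                    # numHiddenLayers + 1 layers
    blob += struct.pack("<4f", 1.0, 0.0, 0.0, 1.0)    # 2x2 identity weights
    blob += struct.pack("<2f", 0.0, 0.0)              # zero biases
activation, layers = load_mlp(blob)
```

Because each layer's dimensions are implied by the header, the loader needs no per-layer size fields; it simply walks the fixed `2 → hidden → … → hidden → 2` shape.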
For HLSL API details, see MLP HLSL API Reference.
MIT License — see LICENSE.
Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved.