feat(cuda): add attention forward and backward kernel declarations by Eamon2009 · Pull Request #64 · Eamon2009/Quadtrix.cpp

Eamon2009 · 2026-05-31T18:42:07Z

Summary

Adds CUDA header declarations (guarded by #pragma once) for the forward and backward passes of the core attention mechanism, inside the quadtrix::cuda namespace. This sets up the interface for the upcoming GPU kernel implementations.

Key Additions

attention_forward: Computes the attention mechanism given a combined QKV tensor (input_qkv), storing intermediate states in preatt and att, and writing the final result to output.
attention_backward: Runs the backward pass, computing grad_input_qkv, grad_preatt, and grad_att from the incoming grad_output.
Configuration Flexibility: Both functions accept an explicit number of attention heads (num_heads) and an optional cudaStream_t for non-blocking asynchronous execution.
Return Types: Both functions return the internal Status type for unified error handling.
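
The PR only declares the GPU entry points, so the following is a minimal CPU reference sketch of what a forward pass with these buffers might compute, not the project's implementation. It assumes a packed (B, T, 3*C) layout for input_qkv, (B, num_heads, T, T) layouts for preatt and att, causal masking, and 1/sqrt(head_size) scaling; none of these are confirmed by the PR. The names `sketch::Status` and `attention_forward_ref` are hypothetical stand-ins.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>

namespace sketch {

// Hypothetical stand-in for the project's internal Status type.
struct Status {
    bool ok;
    std::string message;
};

// CPU reference for a causal multi-head attention forward pass.
// Assumed layouts (not confirmed by the PR):
//   input_qkv   : (B, T, 3*C), Q, K and V concatenated per token
//   preatt, att : (B, num_heads, T, T)
//   output      : (B, T, C)
inline Status attention_forward_ref(const float* input_qkv, float* preatt,
                                    float* att, float* output,
                                    int B, int T, int C, int num_heads) {
    if (num_heads <= 0 || C % num_heads != 0) {
        return Status{false, "C must be divisible by num_heads"};
    }
    const int hs = C / num_heads;  // channels per head
    const float scale = 1.0f / std::sqrt(static_cast<float>(hs));
    const float neg_inf = -std::numeric_limits<float>::infinity();

    for (int b = 0; b < B; ++b) {
        for (int h = 0; h < num_heads; ++h) {
            for (int t = 0; t < T; ++t) {
                const std::size_t tok = static_cast<std::size_t>(b) * T + t;
                const std::size_t row =
                    ((static_cast<std::size_t>(b) * num_heads + h) * T + t) * T;
                const float* q = input_qkv + tok * 3 * C + h * hs;
                float* pre_row = preatt + row;
                float* att_row = att + row;
                float* out = output + tok * C + h * hs;

                // 1) Scaled dot-product scores against keys at positions <= t.
                float max_score = neg_inf;
                for (int t2 = 0; t2 <= t; ++t2) {
                    const std::size_t tok2 = static_cast<std::size_t>(b) * T + t2;
                    const float* k = input_qkv + tok2 * 3 * C + C + h * hs;
                    float dot = 0.0f;
                    for (int i = 0; i < hs; ++i) dot += q[i] * k[i];
                    pre_row[t2] = dot * scale;
                    if (pre_row[t2] > max_score) max_score = pre_row[t2];
                }

                // 2) Numerically stable softmax over the unmasked positions.
                float sum = 0.0f;
                for (int t2 = 0; t2 <= t; ++t2) {
                    att_row[t2] = std::exp(pre_row[t2] - max_score);
                    sum += att_row[t2];
                }
                for (int t2 = 0; t2 < T; ++t2) {
                    if (t2 <= t) {
                        att_row[t2] /= sum;
                    } else {  // future positions are masked out
                        pre_row[t2] = neg_inf;
                        att_row[t2] = 0.0f;
                    }
                }

                // 3) Attention-weighted sum of the value vectors.
                for (int i = 0; i < hs; ++i) out[i] = 0.0f;
                for (int t2 = 0; t2 <= t; ++t2) {
                    const std::size_t tok2 = static_cast<std::size_t>(b) * T + t2;
                    const float* v = input_qkv + tok2 * 3 * C + 2 * C + h * hs;
                    for (int i = 0; i < hs; ++i) out[i] += att_row[t2] * v[i];
                }
            }
        }
    }
    return Status{true, ""};
}

}  // namespace sketch
```

A reference like this could serve as a correctness oracle once the GPU kernels exist: run both on the same small input and compare preatt, att, and output within a tolerance. Keeping preatt and att as explicit outputs mirrors the description above, where the intermediate states are stored for the backward pass.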

Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900

Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning.

Eamon2009 · 2026-05-31T19:27:01Z

/run-checks

github-actions · 2026-05-31T19:27:49Z

✅ All checks passed!

* feat(cuda): add attention forward backward kernel declarations (#64)
* docs: report [run_20260530_165216] (~791 tok/s)
  Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900
* docs:report [run_20260530_165216](~791 tok/s) (#61)
  Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900
  Co-authored-by: Max <eamon5174@gmail.com>
* feat(cuda): add attention forward and backward kernel declarations
  Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning.
  ---------
  Co-authored-by: Max <eamon5174@gmail.com>
* feat(cuda): add checkpoint metadata struct and stub functions
* feat(cuda): introduce core type definitions and error handling utilities
  - Defines `DType` and `DeviceKind` enums supporting standard types (F32, F16, BF16, I32, U8).
  - Implements `dtype_name` and `dtype_size` metadata helper functions.
  - Adds an explicit `Status` struct for non-throwing error propagation alongside `checked_mul` for safe allocation size computation.
  - Introduces `check_cuda` and `abort_on_cuda` error macros and handling mechanisms, exposed via the `QUADTRIX_CUDA_CHECK` macro.
* feat(cuda): add TokenBatchView struct and DataLoader stub class
* feat(cuda): add GeLU activation forward and backward declarations
  - Introduces the `GeluMode` enum to toggle between `Exact` and `Approximate` mathematical variants.
  - Declares the `gelu_forward` and `gelu_backward` kernel entrypoints.
  - Configures both signatures with optional stream execution and a default mode of `GeluMode::Approximate`.
* feat(cuda): add gradient norm calculation and clipping interfaces
---------
Co-authored-by: Max <eamon5174@gmail.com>
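
The commit message above mentions a `Status` struct for non-throwing error propagation, a `dtype_size` helper, and `checked_mul` for safe allocation size computation, but does not show their signatures. The sketch below illustrates one way such helpers might fit together; every signature and field here is an assumption made for illustration, not the project's actual API.

```cpp
#include <cstddef>
#include <limits>
#include <string>

namespace sketch {

// Element types named in the commit message.
enum class DType { F32, F16, BF16, I32, U8 };

// Size in bytes of one element of the given type.
inline std::size_t dtype_size(DType t) {
    switch (t) {
        case DType::F32:  return 4;
        case DType::F16:  return 2;
        case DType::BF16: return 2;
        case DType::I32:  return 4;
        case DType::U8:   return 1;
    }
    return 0;  // unreachable for valid enum values
}

// Hypothetical non-throwing result type: callers inspect `ok`
// instead of catching exceptions.
struct Status {
    bool ok;
    std::string message;
};

// Multiplies a * b into *out, failing instead of wrapping around when
// the product does not fit in std::size_t. *out is left untouched on
// failure. (Assumed signature.)
inline Status checked_mul(std::size_t a, std::size_t b, std::size_t* out) {
    if (a != 0 && b > std::numeric_limits<std::size_t>::max() / a) {
        return Status{false, "size computation overflowed"};
    }
    *out = a * b;
    return Status{true, ""};
}

}  // namespace sketch
```

Under these assumptions, a caller sizing a buffer would chain the two helpers, for example `checked_mul(num_elements, dtype_size(DType::F16), &bytes)`, and only proceed to allocation when the returned status is ok. Checking before multiplying avoids silently allocating a too-small buffer when the element count is very large.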

codeaddict-119 and others added 2 commits May 31, 2026 19:36

docs: report [run_20260530_165216] (~791 tok/s)

125ddfa

Includes metrics for generalization gap, throughput (~791 tok/s), and gradient norms. Parameters: 6.68M | lr: 1e-3 | batch: 16 | steps: 6000 - Achieved best validation loss of 4.1319 at step 3900

Eamon2009 self-assigned this May 31, 2026

Eamon2009 requested a review from codeaddict-119 May 31, 2026 18:42

feat(cuda): add attention forward and backward kernel declarations

212b311

Introduces the header declarations for `attention_forward` and `attention_backward` operations inside the `quadtrix::cuda` namespace. Configured with support for custom CUDA streams and head partitioning.

Repository owner deleted a comment from github-actions Bot May 31, 2026

Eamon2009 added 2 commits June 1, 2026 00:49

Merge branch 'master' into codeaddict-master

62c52aa

Merge branch 'master' into codeaddict-master

f1cd13d

Repository owner deleted a comment from github-actions Bot May 31, 2026

codeaddict-119 approved these changes May 31, 2026

Eamon2009 merged commit 40b8bd9 into master May 31, 2026
6 checks passed

Eamon2009 added the cuda label May 31, 2026

Eamon2009 merged 5 commits into master from codeaddict-master
