diff --git a/packages/examples/src/examples/platformer/createGame.ts b/packages/examples/src/examples/platformer/createGame.ts
index 860b09a6e..02926358a 100644
--- a/packages/examples/src/examples/platformer/createGame.ts
+++ b/packages/examples/src/examples/platformer/createGame.ts
@@ -28,7 +28,7 @@ export const createGame = () => {
 		renderer: video.AUTO,
 		preferWebGL1: false,
 		subPixel: false,
-		highPrecisionShader: !device.isMobile,
+		highPrecisionShader: false,
 	});
 
 	// register the debug plugin
diff --git a/packages/examples/src/examples/tiledMapLoader/ExampleTiledMapLoader.tsx b/packages/examples/src/examples/tiledMapLoader/ExampleTiledMapLoader.tsx
index 122e38acc..e58c92b76 100644
--- a/packages/examples/src/examples/tiledMapLoader/ExampleTiledMapLoader.tsx
+++ b/packages/examples/src/examples/tiledMapLoader/ExampleTiledMapLoader.tsx
@@ -53,6 +53,7 @@ const createGame = () => {
 		!video.init(1024, 768, {
 			parent: "screen",
 			scaleMethod: "flex",
+			preferWebGL1: false,
 		})
 	) {
 		alert("Your browser does not support HTML5 canvas.");
diff --git a/packages/melonjs/CHANGELOG.md b/packages/melonjs/CHANGELOG.md
index c06404647..0757965f3 100644
--- a/packages/melonjs/CHANGELOG.md
+++ b/packages/melonjs/CHANGELOG.md
@@ -2,8 +2,24 @@
 
 ## [19.4.0] (melonJS 2) - _unreleased_
 
+**Highlights:** rendering-focused release. The headline is GPU-accelerated WebGL 2 tile rendering for orthogonal TMX maps: eligible layers now render as a single quad through a fragment shader instead of one draw per tile. Combined with the new per-shader uniform cache, the single-cell fragment-shader fast path, and the flat `Uint16Array`-backed tile data, a typical 3-layer 800×600 game on mid-tier mobile reclaims roughly **1.5–3.5 ms per frame** (~10–20% of the 16.7 ms budget at 60 fps). Dense large maps should see ~5–8× speedups in the tile-rendering portion of the frame.
+
+### Added
+- GPU-accelerated WebGL 2 tile rendering for orthogonal TMX maps. Each visible layer renders as a single quad whose fragment shader walks the per-layer GID index texture and samples the tileset atlas, with no per-tile draw loop. Supports animated tiles, flip bits (H/V/AD), per-layer opacity/tint, per-layer blend mode, and oversized bottom-aligned tiles up to 5× the map cell's width or height. Enabled by default via `Application.settings.gpuTilemap`. The engine falls back transparently to the legacy CPU renderer on isometric/staggered/hexagonal layers, collection-of-image tilesets, `tilerendersize` other than `"tile"`, non-zero `tileoffset`, tiles larger than the shader's overflow window, or non-WebGL-2 contexts. Rough win on a mid-tier mobile GPU with a 3-layer 800×600 viewport: ~2–4 ms down to ~0.3–0.8 ms per frame; up to ~5–8× on dense large maps; negligible cost on desktop GPUs.
+- WebGL: custom shaders can now be written in GLSL ES 3.00 (`#version 300 es`). Construct a `GLShader` with both vertex and fragment source in 3.00 form. The precision injector and attribute extractor handle both versions. **Note:** `ShaderEffect` is still 1.00-only since WebGL requires both stages of a program to share a version, and it pairs the user's fragment with the built-in 1.00 quad vertex shader.
+- `TextureResource` / `BufferTextureResource`: a renderer-agnostic source for textures synthesized from raw byte buffers rather than loaded from an image. Flows through the standard `TextureCache` and batcher path. Supports `rgba8` and `rgba8ui` (WebGL 2) formats. Used internally by the GPU TMX renderer.
+
+### Fixed
+- WebGL: `MaterialBatcher.uploadTexture` passed its `w` and `h` parameters (the destination quad size, not the texture's) to the `isPOT` check. That check gates both the wrap-mode fallback and the `generateMipmap` call. On WebGL 1 the bug surfaced as a `GL_INVALID_OPERATION` from `gl.generateMipmap`; on WebGL 2 it silently wasted work (unnecessary mipmaps, wrong `isPOT`-derived state). Texture dimensions are now derived from the source itself.
+
 ### Changed
-- `throttle(fn, wait)` is now generic over its argument tuple — `throttle<T extends unknown[]>((...args: T) => void, wait)` preserves the wrapped function's parameter types. Drops the `as unknown as () => void` cast that the pointer-event handler used to need.
+- WebGL 1: removed the unconditional `[Texture] ... is not a POT texture` warning. The engine handles NPOT correctly (clamp wrap, non-mipmapped filters). A targeted warning now fires only when `repeat: "repeat*"` is requested on an NPOT texture under WebGL 1, the one case where the user's intent is silently downgraded.
+- `throttle(fn, wait)` is now generic over its argument tuple. `throttle<T extends unknown[]>((...args: T) => void, wait)` preserves the wrapped function's parameter types.
+
+### Performance
+- TMX tile layers now back `layerData` with a flat `Uint16Array` and the orientation renderers read directly from it, with no `Tile` allocations during map parse or per-frame rendering. Per-layer memory drops ~25× (40 KB vs ~1 MB on a 100×100 layer); modest FPS gain on Canvas (~2–5% in tile-heavy scenes). Public API is unchanged.
+- WebGL: every shader the engine builds (sprite batchers, light effects, post-effect chains, the TMX GPU renderer, user-authored `GLShader` / `ShaderEffect`) now caches the last value sent for each uniform and skips redundant `gl.uniform*` calls. Vec/mat values compare element-wise, so a reused scratch `Float32Array` that was mutated in place is still detected. The biggest beneficiaries are the per-frame projection-matrix upload (skipped after the first frame unless the projection changes) and the TMX GPU renderer's layer-lifetime constants. The gain is modest on its own (typically ~0.1–0.5 ms per frame on mid-tier mobile, more in scenes with many custom shaders or post-effect chains), but it stacks with the other rendering wins.
+- TMX GPU renderer: fragment shader branches on `uOverflow == (0, 0)` and uses a single-cell fast path for tilesets whose tiles fit the cell exactly (the common case), skipping the worst-case 25-iteration candidate-cell loop entirely. The slow path (oversized bottom-aligned tiles) is unchanged. Roughly 10–25% fragment-shader cost reduction for the common case (~0.05–0.2 ms per frame on mid-tier mobile, lost in the noise on desktop GPUs); the win compounds with viewport size since fragment work scales with pixel count.
 
 ## [19.3.0] (melonJS 2) - _2026-05-08_
 
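The uniform-cache entry above boils down to one rule. Scalars compare by value, while vectors and matrices compare element-wise against a private copy of the last upload. The following is a minimal TypeScript sketch of that rule. `UniformCache`, `shouldUpload` and the `uProjectionMatrix` uniform name are hypothetical and are not melonJS's actual API. The real `gl.uniform*` call is replaced by a counter so the sketch runs without a WebGL context.

```typescript
// Hypothetical sketch: per-shader cache that skips redundant uniform uploads.
type UniformValue = number | Float32Array | Int32Array;

class UniformCache {
  // last value sent for each uniform; array values are private copies
  private readonly last = new Map<string, UniformValue>();

  // true if `value` differs from the cached value (and records it),
  // false if the gl.uniform* call can be skipped
  shouldUpload(name: string, value: UniformValue): boolean {
    const prev = this.last.get(name);
    if (typeof value === "number") {
      if (prev === value) return false;
      this.last.set(name, value);
      return true;
    }
    if (
      prev !== undefined &&
      typeof prev !== "number" &&
      prev.constructor === value.constructor &&
      prev.length === value.length
    ) {
      // element-wise compare: a scratch array mutated in place keeps its
      // identity but has new contents, so a reference check would miss it
      let same = true;
      for (let i = 0; i < value.length; i++) {
        if (prev[i] !== value[i]) {
          same = false;
          break;
        }
      }
      if (same) return false;
      prev.set(value); // refresh the cached copy in place
      return true;
    }
    // first upload (or a type/length change): store a copy, never the caller's array
    this.last.set(name, value.slice());
    return true;
  }
}

// usage: count how many uploads a reused scratch matrix actually triggers
const cache = new UniformCache();
const scratch = new Float32Array(16);
let uploads = 0;
const setMatrix = (name: string, m: Float32Array): void => {
  // a real shader would call gl.uniformMatrix4fv(location, false, m) here
  if (cache.shouldUpload(name, m)) uploads++;
};
scratch[0] = 1;
setMatrix("uProjectionMatrix", scratch); // uploaded
setMatrix("uProjectionMatrix", scratch); // skipped: same contents
scratch[0] = 2;
setMatrix("uProjectionMatrix", scratch); // uploaded: same array, new contents
```

Comparing contents instead of references lets callers keep reusing one scratch `Float32Array`. The cost is an O(n) compare per call, which is at most 16 elements for a `mat4`.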
diff --git a/packages/melonjs/src/application/application.ts b/packages/melonjs/src/application/application.ts
index e88db0fd8..91f1f7571 100644
--- a/packages/melonjs/src/application/application.ts
+++ b/packages/melonjs/src/application/application.ts
@@ -379,6 +379,24 @@ export default class Application {
 		this.world.app = this;
 		// set the reference to this application instance
 		this.world.physic = this.settings.physic;
+		this.world.gpuTilemap = this.settings.gpuTilemap;
+
+		// The GPU tilemap path needs a WebGL 2 renderer. Warn once at app
+		// startup when the user asked for it but the active renderer
+		// can't honor it (Canvas mode, WebGL 1 driver, `preferWebGL1`
+		// override, etc.) — individual layers will silently fall through
+		// to the legacy renderer, but the user gets one heads-up that
+		// the feature they enabled isn't actually in effect.
+		if (
+			this.settings.gpuTilemap &&
+			// duck-type rather than `instanceof WebGLRenderer` to avoid a
+			// runtime import; only the WebGL renderer carries `WebGLVersion`
+			(this.renderer as unknown as { WebGLVersion?: number }).WebGLVersion !== 2
+		) {
+			console.warn(
+				"melonJS: gpuTilemap is enabled but the active renderer is not WebGL 2 — falling back to the legacy tile renderer for every tile layer",
+			);
+		}
 
 		// app starting time
 		this.lastUpdate = globalThis.performance.now();
diff --git a/packages/melonjs/src/application/defaultApplicationSettings.ts b/packages/melonjs/src/application/defaultApplicationSettings.ts
index 4d7067ba9..b50cd3247 100644
--- a/packages/melonjs/src/application/defaultApplicationSettings.ts
+++ b/packages/melonjs/src/application/defaultApplicationSettings.ts
@@ -13,6 +13,7 @@ export const defaultApplicationSettings = {
 	consoleHeader: true,
 	blendMode: "normal",
 	physic: "builtin",
+	gpuTilemap: true,
 	failIfMajorPerformanceCaveat: true,
 	highPrecisionShader: true,
 	subPixel: false,
diff --git a/packages/melonjs/src/application/settings.ts b/packages/melonjs/src/application/settings.ts
index 50015d5d1..324c64985 100644
--- a/packages/melonjs/src/application/settings.ts
+++ b/packages/melonjs/src/application/settings.ts
@@ -79,6 +79,19 @@ export type ApplicationSettings = {
 	 * @default "builtin"
 	 */
 	physic: PhysicsType;
+
+	/**
+	 * Enable the WebGL2 procedural shader path for orthogonal tile layers.
+	 * When `true` (default), eligible layers render via a single quad per
+	 * tileset + a fragment shader doing per-fragment GID lookup, bypassing
+	 * the per-tile draw loop entirely. Layers that don't qualify
+	 * (Canvas/WebGL1, non-orthogonal, collection-of-image tilesets,
+	 * tilerendersize "grid", non-zero tileoffset, tiles more than 5× the
+	 * map cell's width or height) fall back to the legacy path automatically.
+	 * Set to `false` to disable globally.
+	 * @default true
+	 */
+	gpuTilemap: boolean;
 	/**
 	 * if true, the renderer will fail if the browser reports a major performance caveat
 	 * (e.g. software WebGL). Set to false to allow WebGL on machines with
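The eligibility rules in the `gpuTilemap` doc comment can be summarized as a pure function. The TypeScript sketch below is modelled on `TMXLayer._checkShaderEligibility` from this patch, but it is simplified. The `LayerInfo` / `TilesetInfo` shapes and the `webglVersion` parameter are assumptions made for illustration. The sketch also makes the overflow window concrete: with up to 4 extra cells per axis, a tile may be at most 5× the map cell's width or height.

```typescript
// Hypothetical, simplified model of the shader-path eligibility check.
interface TilesetInfo {
  isCollection: boolean;
  tilerendersize: "tile" | "grid";
  tileoffset: { x: number; y: number };
  tilewidth: number;
  tileheight: number;
}

interface LayerInfo {
  orientation: string;
  tilewidth: number; // map cell width
  tileheight: number; // map cell height
  tilesets: TilesetInfo[];
}

// the shader scans (MAX_OVERFLOW_CELLS + 1)^2 = 25 candidate cells per fragment
const MAX_OVERFLOW_CELLS = 4;

type Eligibility = { ok: true } | { ok: false; reason: string };

function checkShaderEligibility(layer: LayerInfo, gpuTilemap: boolean, webglVersion: number): Eligibility {
  if (!gpuTilemap) return { ok: false, reason: "gpuTilemap disabled" };
  if (webglVersion !== 2) return { ok: false, reason: "no-webgl2-renderer" };
  if (layer.orientation !== "orthogonal") {
    return { ok: false, reason: `unsupported orientation "${layer.orientation}"` };
  }
  if (layer.tilesets.length === 0) return { ok: false, reason: "no tilesets" };
  for (const ts of layer.tilesets) {
    if (ts.isCollection) return { ok: false, reason: "collection-of-image tileset" };
    if (ts.tilerendersize !== "tile") return { ok: false, reason: "tilerendersize not supported" };
    if (ts.tileoffset.x !== 0 || ts.tileoffset.y !== 0) return { ok: false, reason: "non-zero tileoffset" };
    // extra cells a bottom-aligned tile spills into beyond its own cell
    const overflowX = Math.ceil(ts.tilewidth / layer.tilewidth) - 1;
    const overflowY = Math.ceil(ts.tileheight / layer.tileheight) - 1;
    if (overflowX > MAX_OVERFLOW_CELLS || overflowY > MAX_OVERFLOW_CELLS) {
      return { ok: false, reason: "tile overflow exceeds shader limit" };
    }
  }
  return { ok: true };
}

// usage: a 32x32 grid with tall tiles, oversized tiles, and a WebGL 1 context
const tall: TilesetInfo = {
  isCollection: false,
  tilerendersize: "tile",
  tileoffset: { x: 0, y: 0 },
  tilewidth: 32,
  tileheight: 64,
};
const huge: TilesetInfo = { ...tall, tilewidth: 200, tileheight: 200 };
const grid = (tilesets: TilesetInfo[]): LayerInfo => ({ orientation: "orthogonal", tilewidth: 32, tileheight: 32, tilesets });
const a = checkShaderEligibility(grid([tall]), true, 2);
const b = checkShaderEligibility(grid([huge]), true, 2);
const c = checkShaderEligibility(grid([tall]), true, 1);
```

A 32×64 tile on a 32×32 grid spills one cell upward and stays on the shader path. A 200 px tile needs 6 extra cells, so it falls back.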
diff --git a/packages/melonjs/src/level/tiled/TMXLayer.js b/packages/melonjs/src/level/tiled/TMXLayer.js
index 7f572cb14..0a5e2f18c 100644
--- a/packages/melonjs/src/level/tiled/TMXLayer.js
+++ b/packages/melonjs/src/level/tiled/TMXLayer.js
@@ -2,24 +2,61 @@ import { vector2dPool } from "../../math/vector2d.ts";
 import Renderable from "../../renderable/renderable.js";
 import CanvasRenderer from "../../video/canvas/canvas_renderer";
 import { createCanvas } from "../../video/video.js";
+import {
+	TMX_CLEAR_BIT_MASK,
+	TMX_FLIP_AD,
+	TMX_FLIP_H,
+	TMX_FLIP_V,
+} from "./constants.js";
 import Tile from "./TMXTile.js";
 import * as TMXUtils from "./TMXUtils.js";
 
+// flip-mask bit layout for the second (flip) slot of each layerData cell
+const FLIP_H_BIT = 1 << 0;
+const FLIP_V_BIT = 1 << 1;
+const FLIP_AD_BIT = 1 << 2;
+
 /**
- * Create required arrays for the given layer object
+ * extract a 3-bit flip mask from a raw 32-bit GID (Tiled's H, V and diagonal
+ * flip flags live in bits 31, 30 and 29)
  * @ignore
  */
-function initArray(rows, cols) {
-	const array = new Array(cols);
-	for (let col = 0; col < cols; col++) {
-		// fill with null in one call — avoids per-element loop
-		array[col] = new Array(rows).fill(null);
-	}
-	return array;
+function flipMaskFromGid(gid) {
+	return (
+		(gid & TMX_FLIP_H ? FLIP_H_BIT : 0) |
+		(gid & TMX_FLIP_V ? FLIP_V_BIT : 0) |
+		(gid & TMX_FLIP_AD ? FLIP_AD_BIT : 0)
+	);
 }
 
 /**
- * Set a tiled layer Data
+ * extract a 3-bit flip mask from a Tile object's boolean flip flags
+ * @ignore
+ */
+function flipMaskFromTile(tile) {
+	return (
+		(tile.flippedX ? FLIP_H_BIT : 0) |
+		(tile.flippedY ? FLIP_V_BIT : 0) |
+		(tile.flippedAD ? FLIP_AD_BIT : 0)
+	);
+}
+
+/**
+ * reconstruct a legacy 32-bit GID (with Tiled's high flip bits set) from the
+ * cleaned GID and a 3-bit flip mask, for passing to the Tile constructor
+ * @ignore
+ */
+function gidWithFlips(gid, flipMask) {
+	return (
+		gid |
+		(flipMask & FLIP_H_BIT ? TMX_FLIP_H : 0) |
+		(flipMask & FLIP_V_BIT ? TMX_FLIP_V : 0) |
+		(flipMask & FLIP_AD_BIT ? TMX_FLIP_AD : 0)
+	);
+}
+
+/**
+ * Decode a tiled layer's data blob directly into the typed-array layerData
  * @ignore
  */
 function setLayerData(layer, bounds, data) {
@@ -35,22 +72,37 @@ function setLayerData(layer, bounds, data) {
 		width = bounds.cols;
 		height = bounds.rows;
 	}
-	// set everything
+
+	const cols = layer.cols;
+	const layerData = layer.layerData;
+	const offsetX = bounds.x;
+	const offsetY = bounds.y;
+	// One-shot warning when a layer's flip-stripped GID won't fit in the
+	// `Uint16Array` cell — silent truncation would render the wrong tile.
+	// 65535 is far above any realistic map's total tile count, but flag it loudly
+	// so users with degenerate maps notice immediately.
+	let overflowedGid = 0;
 	for (let y = 0; y < height; y++) {
 		for (let x = 0; x < width; x++) {
-			// get the value of the gid
-			const gid = data[idx++];
-			// fill the array
-			if (gid !== 0) {
-				// add a new tile to the layer
-				layer.layerData[x + bounds.x][y + bounds.y] = layer.getTileById(
-					gid,
-					x + bounds.x,
-					y + bounds.y,
-				);
+			const rawGid = data[idx++];
+			if (rawGid !== 0) {
+				const flatIdx = ((y + offsetY) * cols + (x + offsetX)) * 2;
+				const cleanGid = rawGid & TMX_CLEAR_BIT_MASK;
+				if (cleanGid > 0xffff && overflowedGid === 0) {
+					overflowedGid = cleanGid;
+				}
+				layerData[flatIdx] = cleanGid;
+				layerData[flatIdx + 1] = flipMaskFromGid(rawGid);
 			}
 		}
 	}
+	if (overflowedGid !== 0) {
+		console.warn(
+			"melonJS: TMX layer contains GID " +
+				overflowedGid +
+				" which exceeds the 16-bit cell capacity (max 65535). Tiles will be truncated and render incorrectly.",
+		);
+	}
 }
 
 /**
@@ -184,8 +236,56 @@ export default class TMXLayer extends Renderable {
 		// set a renderer
 		this.setRenderer(map.getRenderer());
 
-		// initialize the data array
-		this.layerData = initArray(this.rows, this.cols);
+		/**
+		 * The raw tile data for this layer. Each cell occupies two consecutive
+		 * `Uint16` slots: the GID (with flip bits stripped) and a 3-bit flip
+		 * mask. Cell `(x, y)` is at `layerData[(y * cols + x) * 2]` (row-major).
+		 *
+		 * The 16-bit GID slot caps GIDs (global across all of the map's
+		 * tilesets) at 65535, matching the WebGL2 shader path's `RG16UI` index
+		 * texture; a `Uint32Array` would force a truncating copy at GPU upload.
+		 * @type {Uint16Array}
+		 */
+		this.layerData = new Uint16Array(this.cols * this.rows * 2);
+
+		/**
+		 * Lazy view cache of Tile objects, indexed by `y * cols + x` (row-major).
+		 * Allocated lazily on the first `cellAt` / `getTile` call — the renderer
+		 * hot path reads `layerData` directly and never touches this cache, so
+		 * for games that never call `getTile`/`cellAt` from user code, this
+		 * stays `null` for the layer's lifetime. `setTile` updates the matching
+		 * entry and `clearTile` resets it to `null`. The raw bytes in
+		 * `layerData` are the source of truth; this cache exists only to
+		 * preserve stable Tile identity across repeated user-facing reads.
+		 * @type {Array<Tile|null>|null}
+		 * @ignore
+		 */
+		this.cachedTile = null;
+
+		/**
+		 * Monotonically-increasing counter bumped by `setTile` and `clearTile`.
+		 * Renderers can compare against a stashed value to detect mutations and
+		 * decide whether to re-upload the layer data to the GPU.
+		 * @type {number}
+		 */
+		this.dataVersion = 0;
+
+		/**
+		 * How this layer is rendered. Resolved by `onActivateEvent` to one of:
+		 *   - `"shader"`: WebGL2 procedural shader path (single quad per tileset, fragment GID lookup)
+		 *   - `"prerender"`: offscreen-canvas bake at activation, blitted as one drawImage per frame
+		 *   - `"perTile"`: per-frame loop, one drawImage per visible tile
+		 *
+		 * User code may set this to one of the above values (or the special
+		 * `"auto"`) before the layer is activated to override the engine's
+		 * default choice; Tiled custom properties named `renderMode` are
+		 * applied automatically via `applyTMXProperties`. If a forced mode
+		 * is ineligible (e.g. `"shader"` on Canvas), a one-shot warning is
+		 * emitted at activation and the layer falls back to the legacy path.
+		 * @type {string}
+		 * @default "auto"
+		 */
+		this.renderMode = "auto";
 
 		if (map.infinite === 0) {
 			// initialize and set the layer data
@@ -221,16 +321,13 @@ export default class TMXLayer extends Renderable {
 
 		this.isAnimated = this.animatedTilesets.length > 0;
 
-		// check for the correct rendering method
-		if (typeof this.preRender === "undefined" && this.isAnimated === false) {
-			this.preRender = this.ancestor.getRootAncestor().preRender;
-		} else {
-			// Force pre-render off when tileset animation is used
-			this.preRender = false;
-		}
+		// resolve renderMode: shader > prerender > perTile, taking into
+		// account user-forced values, Application/world settings, and the
+		// per-layer preRender hint
+		this._resolveRenderMode();
 
-		// if pre-rendering method is use, create an offline canvas/renderer
-		if (this.preRender === true && !this.canvasRenderer) {
+		// if pre-rendering method is in use, create an offline canvas/renderer
+		if (this.renderMode === "prerender" && !this.canvasRenderer) {
 			this.canvasRenderer = new CanvasRenderer({
 				canvas: createCanvas(this.width, this.height),
 				width: this.width,
@@ -240,13 +337,135 @@ export default class TMXLayer extends Renderable {
 			// pre render the layer on the canvas
 			this.getRenderer().drawTileLayer(this.canvasRenderer, this, this);
 		}
+		// keep `preRender` boolean in sync with the resolved mode (legacy
+		// callers still read it; `Renderer.drawTileLayer` itself reads
+		// `layer.canvasRenderer`)
+		this.preRender = this.renderMode === "prerender";
 
 		this.isDirty = true;
 	}
 
+	/**
+	 * Resolve `this.renderMode` to one of "shader" / "prerender" / "perTile"
+	 * based on eligibility checks and user/world hints. Emits a single
+	 * `console.warn` at activation when a forced mode is ineligible, or
+	 * when an auto-eligible mode falls back due to a layer feature the GPU
+	 * path doesn't support (orientation, collection-of-image tileset, etc.).
+	 * @ignore
+	 */
+	_resolveRenderMode() {
+		const root = this.ancestor?.getRootAncestor?.();
+		const renderer = this.parentApp?.renderer;
+		const gpuAllowed = root?.gpuTilemap !== false;
+		const preRenderHint =
+			typeof this.preRender === "boolean" ? this.preRender : root?.preRender;
+
+		const elig = this._checkShaderEligibility(renderer, gpuAllowed);
+
+		const requested = this.renderMode;
+		// explicit "shader" — honor if eligible, warn otherwise
+		if (requested === "shader") {
+			if (elig.ok) {
+				return; // already "shader"
+			}
+			console.warn(
+				`melonJS: layer "${this.name}" forced renderMode "shader" not available (${elig.reason}) — falling back to perTile`,
+			);
+			this.renderMode = "perTile";
+			return;
+		}
+		// explicit "prerender" — honor unless animated (cache would go stale)
+		if (requested === "prerender") {
+			if (this.isAnimated) {
+				console.warn(
+					`melonJS: layer "${this.name}" forced renderMode "prerender" disabled (layer has animated tiles) — falling back to perTile`,
+				);
+				this.renderMode = "perTile";
+			}
+			return;
+		}
+		// explicit "perTile" — pass through
+		if (requested === "perTile") {
+			return;
+		}
+		// auto-resolve: shader > prerender > perTile
+		if (elig.ok) {
+			this.renderMode = "shader";
+			return;
+		}
+		// only emit an info warning when the user enabled gpuTilemap and the
+		// fallback is due to layer-specific limitations (not a missing
+		// WebGL2 context, which is a renderer-wide condition)
+		if (gpuAllowed && elig.reason !== "no-webgl2-renderer") {
+			console.warn(
+				`melonJS: layer "${this.name}" using legacy tile renderer (${elig.reason})`,
+			);
+		}
+		if (preRenderHint && !this.isAnimated) {
+			this.renderMode = "prerender";
+			return;
+		}
+		this.renderMode = "perTile";
+	}
+
+	/**
+	 * Check whether this layer is eligible for the WebGL2 shader path.
+	 * @param {object} renderer
+	 * @param {boolean} gpuAllowed - whether `gpuTilemap` is enabled at the world level
+	 * @returns {{ok: boolean, reason?: string}}
+	 * @ignore
+	 */
+	_checkShaderEligibility(renderer, gpuAllowed) {
+		if (!gpuAllowed) {
+			return { ok: false, reason: "gpuTilemap disabled" };
+		}
+		if (!renderer || renderer.WebGLVersion !== 2) {
+			return { ok: false, reason: "no-webgl2-renderer" };
+		}
+		if (this.orientation !== "orthogonal") {
+			return {
+				ok: false,
+				reason: `GPU path does not support "${this.orientation}" orientation yet`,
+			};
+		}
+		if (!this.tilesets || this.tilesets.tilesets.length === 0) {
+			return { ok: false, reason: "no tilesets" };
+		}
+		// the shader iterates a fixed-size loop over candidate cells to
+		// support oversized tiles (tile dim > cell dim) drawn bottom-aligned.
+		// Loop bound is MAX_OVERFLOW_CELLS + 1 per axis; anything beyond would silently clip,
+		// so refuse the shader path for layers with extreme oversampling.
+		const MAX_OVERFLOW_CELLS = 4;
+		for (const ts of this.tilesets.tilesets) {
+			if (ts.isCollection) {
+				return { ok: false, reason: "collection-of-image tileset" };
+			}
+			if (ts.tilerendersize !== "tile") {
+				return {
+					ok: false,
+					reason: `tilerendersize "${ts.tilerendersize}" not supported`,
+				};
+			}
+			if (ts.tileoffset.x !== 0 || ts.tileoffset.y !== 0) {
+				return { ok: false, reason: "non-zero tileoffset" };
+			}
+			const overflowX = Math.ceil(ts.tilewidth / this.tilewidth) - 1;
+			const overflowY = Math.ceil(ts.tileheight / this.tileheight) - 1;
+			if (overflowX > MAX_OVERFLOW_CELLS || overflowY > MAX_OVERFLOW_CELLS) {
+				return {
+					ok: false,
+					reason: `tile overflow exceeds shader limit (${MAX_OVERFLOW_CELLS} cells)`,
+				};
+			}
+		}
+		return { ok: true };
+	}
+
 	// called when the layer is removed from the game world or a container
 	onDeactivateEvent() {
-		// clear all allocated objects
+		// renderer-side caches keyed by this layer (e.g. the WebGL2 shader
+		// path's per-layer GID index texture) are cleared from the renderer's
+		// own `reset()` path — tile layers only come and go on game reset.
 		this.animatedTilesets = undefined;
 		// keep canvasRenderer for reuse — dropping the reference would leak
 		// event listeners registered by CanvasRenderer's constructor
@@ -313,12 +532,34 @@ export default class TMXLayer extends Renderable {
 	/**
 	 * assign the given Tile object to the specified position
 	 * @param {Tile} tile - the tile object to be assigned
-	 * @param {number} x - x coordinate (in world/pixels coordinates)
-	 * @param {number} y - y coordinate (in world/pixels coordinates)
+	 * @param {number} x - x coordinate (in tile/column coordinates)
+	 * @param {number} y - y coordinate (in tile/row coordinates)
 	 * @returns {Tile} the tile object
 	 */
 	setTile(tile, x, y) {
-		this.layerData[x][y] = tile;
+		if (x < 0 || x >= this.cols || y < 0 || y >= this.rows) {
+			return tile;
+		}
+		const slot = y * this.cols + x;
+		const idx = slot * 2;
+		const cleanGid = tile.tileId & TMX_CLEAR_BIT_MASK;
+		// `layerData` is a Uint16Array; writes silently truncate above
+		// 0xFFFF. Warn once per layer so a runtime `setTile` with a
+		// GID >= 65536 doesn't corrupt the cell undetected.
+		if (cleanGid > 0xffff && !this._truncationWarned) {
+			this._truncationWarned = true;
+			console.warn(
+				"melonJS: setTile received GID " +
+					cleanGid +
+					" which exceeds the 16-bit cell capacity (max 65535). Tile will be truncated and render incorrectly.",
+			);
+		}
+		this.layerData[idx] = cleanGid;
+		this.layerData[idx + 1] = flipMaskFromTile(tile);
+		if (this.cachedTile !== null) {
+			this.cachedTile[slot] = tile;
+		}
+		this.dataVersion++;
 		this.isDirty = true;
 		return tile;
 	}
@@ -352,16 +593,42 @@ export default class TMXLayer extends Renderable {
 		const _x = ~~x;
 		const _y = ~~y;
 
-		const renderer = this.getRenderer();
 		// boundsCheck only used internally by the tiled renderer, when the layer bound check was already done
 		if (
-			boundsCheck === false ||
-			(_x >= 0 && _x < renderer.cols && _y >= 0 && _y < renderer.rows)
+			boundsCheck !== false &&
+			(_x < 0 || _x >= this.cols || _y < 0 || _y >= this.rows)
 		) {
-			return this.layerData[_x][_y];
-		} else {
 			return null;
 		}
+
+		const slot = _y * this.cols + _x;
+		const idx = slot * 2;
+		const gid = this.layerData[idx];
+		// `cellAt(x, y, false)` skips the bounds check for speed. A read past
+		// either end of the typed array returns `undefined` (an out-of-range x
+		// with an in-range y aliases a neighboring row instead, so callers must
+		// pass valid coords); treat `undefined` and an empty cell (gid 0) as
+		// "no tile" so we never push a bogus GID into the tileset lookup path
+		if (!gid) {
+			return null;
+		}
+
+		// lazy-allocate the view cache on first user-facing query — the
+		// renderer hot loop bypasses this method and reads layerData directly,
+		// so games that never call cellAt/getTile keep cachedTile null forever
+		if (this.cachedTile === null) {
+			this.cachedTile = new Array(this.cols * this.rows).fill(null);
+		} else {
+			const cached = this.cachedTile[slot];
+			if (cached !== null) {
+				return cached;
+			}
+		}
+
+		const flipMask = this.layerData[idx + 1];
+		const tile = this.getTileById(gidWithFlips(gid, flipMask), _x, _y);
+		this.cachedTile[slot] = tile;
+		return tile;
 	}
 
 	/**
@@ -375,8 +642,17 @@ export default class TMXLayer extends Renderable {
 	 * });
 	 */
 	clearTile(x, y) {
+		if (x < 0 || x >= this.cols || y < 0 || y >= this.rows) {
+			return;
+		}
 		// clearing tile
-		this.layerData[x][y] = null;
+		const slot = y * this.cols + x;
+		const idx = slot * 2;
+		this.layerData[idx] = 0;
+		this.layerData[idx + 1] = 0;
+		if (this.cachedTile !== null) {
+			this.cachedTile[slot] = null;
+		}
 		// erase the corresponding area in the canvas
 		if (this.preRender) {
 			this.canvasRenderer.clearRect(
@@ -386,6 +662,7 @@ export default class TMXLayer extends Renderable {
 				this.tileheight,
 			);
 		}
+		this.dataVersion++;
 		this.isDirty = true;
 	}
 
@@ -408,28 +685,8 @@ export default class TMXLayer extends Renderable {
 	 * @ignore
 	 */
 	draw(renderer, rect) {
-		// use the offscreen canvas
-		if (this.preRender) {
-			const width = Math.min(rect.width, this.width);
-			const height = Math.min(rect.height, this.height);
-
-			// draw using the cached canvas
-			renderer.drawImage(
-				this.canvasRenderer.getCanvas(),
-				rect.pos.x,
-				rect.pos.y, // sx,sy
-				width,
-				height, // sw,sh
-				rect.pos.x,
-				rect.pos.y, // dx,dy
-				width,
-				height, // dw,dh
-			);
-		}
-		// dynamically render the layer
-		else {
-			// draw the layer
-			this.getRenderer().drawTileLayer(renderer, this, rect);
-		}
+		// dispatch to the active renderer — picks shader / preRender / perTile
+		// based on `this.renderMode` and the renderer's capabilities
+		renderer.drawTileLayer(this, rect);
 	}
 }
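The flat `layerData` layout can be modelled in isolation. Each cell uses two `Uint16` slots in row-major order: the flip-stripped GID, then a 3-bit flip mask. This TypeScript sketch is not melonJS code, and `FlatLayer` and `setRaw` are hypothetical names. The flip constants are Tiled's documented high bits (H = `0x80000000`, V = `0x40000000`, diagonal = `0x20000000`), assumed to match `constants.js`.

```typescript
// Hypothetical standalone model of the [gid, flipMask] Uint16Array layout.
const TMX_FLIP_H = 0x80000000;
const TMX_FLIP_V = 0x40000000;
const TMX_FLIP_AD = 0x20000000;
// bitwise ops run on int32: ~(H | V | AD) is 0x1fffffff, the low 29 bits
const TMX_CLEAR_BIT_MASK = ~(TMX_FLIP_H | TMX_FLIP_V | TMX_FLIP_AD);

const FLIP_H_BIT = 1 << 0;
const FLIP_V_BIT = 1 << 1;
const FLIP_AD_BIT = 1 << 2;

class FlatLayer {
  readonly cols: number;
  readonly rows: number;
  // [gid0, flip0, gid1, flip1, ...], row-major
  readonly data: Uint16Array;

  constructor(cols: number, rows: number) {
    this.cols = cols;
    this.rows = rows;
    this.data = new Uint16Array(cols * rows * 2);
  }

  // decode one raw Tiled GID into cell (x, y); returns false when the
  // flip-stripped GID does not fit in 16 bits and was truncated
  setRaw(x: number, y: number, rawGid: number): boolean {
    const idx = (y * this.cols + x) * 2;
    const gid = rawGid & TMX_CLEAR_BIT_MASK;
    this.data[idx] = gid; // Uint16Array stores gid modulo 65536
    this.data[idx + 1] =
      (rawGid & TMX_FLIP_H ? FLIP_H_BIT : 0) |
      (rawGid & TMX_FLIP_V ? FLIP_V_BIT : 0) |
      (rawGid & TMX_FLIP_AD ? FLIP_AD_BIT : 0);
    return gid <= 0xffff;
  }

  // read back [gid, flipMask]; a gid of 0 means an empty cell
  get(x: number, y: number): [number, number] {
    const idx = (y * this.cols + x) * 2;
    return [this.data[idx], this.data[idx + 1]];
  }
}

// usage
const layer = new FlatLayer(100, 100);
layer.setRaw(3, 2, 0xa0000005); // GID 5 with the H and diagonal flip bits set
const fitsInCell = layer.setRaw(0, 0, 70000); // too large for a 16-bit slot
```

A 100×100 layer occupies 100 × 100 × 2 slots × 2 bytes = 40000 bytes, which is the ~40 KB quoted in the changelog. A GID of 70000 wraps to 4464, the silent truncation that the new warnings guard against.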
diff --git a/packages/melonjs/src/level/tiled/TMXTile.js b/packages/melonjs/src/level/tiled/TMXTile.js
index 59b31e32b..1f9ab6610 100644
--- a/packages/melonjs/src/level/tiled/TMXTile.js
+++ b/packages/melonjs/src/level/tiled/TMXTile.js
@@ -9,6 +9,51 @@ import {
 	TMX_FLIP_V,
 } from "./constants.js";
 
+// flip-mask bit layout shared with the flip slot of each TMXLayer.layerData cell
+const FLIP_H_BIT = 1 << 0;
+const FLIP_V_BIT = 1 << 1;
+const FLIP_AD_BIT = 1 << 2;
+
+/**
+ * Apply a flip-mask transform to a Matrix2d in-place. Resets the matrix to
+ * identity first, then applies the H / V / AD operations to flip / rotate a
+ * tile of `width × height` pixels around its center.
+ *
+ * This is the typed-array equivalent of `Tile.setTileTransform()` — both share
+ * the same math, but this variant is driven by a packed 3-bit mask (H=1, V=2,
+ * AD=4) so callers that have raw layerData bytes can build a flip transform
+ * without constructing a Tile.
+ *
+ * @param {Matrix2d} transform - the matrix to fill (mutated in place)
+ * @param {number} flipMask - 3-bit packed flip mask
+ * @param {number} width - tile width in pixels
+ * @param {number} height - tile height in pixels
+ * @returns {Matrix2d} the same matrix, for chaining
+ * @ignore
+ */
+export function buildFlipTransform(transform, flipMask, width, height) {
+	const halfW = width / 2;
+	const halfH = height / 2;
+	const flippedH = (flipMask & FLIP_H_BIT) !== 0;
+	const flippedV = (flipMask & FLIP_V_BIT) !== 0;
+	const flippedAD = (flipMask & FLIP_AD_BIT) !== 0;
+
+	transform.identity();
+	transform.translate(halfW, halfH);
+	if (flippedAD) {
+		transform.rotate(degToRad(-90));
+		transform.scale(-1, 1);
+	}
+	if (flippedH) {
+		transform.scale(flippedAD ? 1 : -1, flippedAD ? -1 : 1);
+	}
+	if (flippedV) {
+		transform.scale(flippedAD ? -1 : 1, flippedAD ? 1 : -1);
+	}
+	transform.translate(-halfW, -halfH);
+	return transform;
+}
+
 /**
  * a basic tile object
  * @category Tilemap
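`buildFlipTransform` relies on post-multiplication: the last operation appended acts on a point first. Each pixel is therefore moved so the tile center sits at the origin, flipped (and, for AD, rotated), then moved back. The TypeScript sketch below ports the function onto `Affine2d` so the resulting pixel mapping can be checked. `Affine2d` is a hypothetical minimal stand-in for melonJS's `Matrix2d`, assumed to compose `translate` / `rotate` / `scale` in the same canvas-style order.

```typescript
// Hypothetical minimal affine matrix: x' = a*x + c*y + e, y' = b*x + d*y + f.
class Affine2d {
  a = 1;
  b = 0;
  c = 0;
  d = 1;
  e = 0;
  f = 0;

  identity(): this {
    this.a = 1; this.b = 0; this.c = 0; this.d = 1; this.e = 0; this.f = 0;
    return this;
  }
  // M = M · T(x, y)
  translate(x: number, y: number): this {
    this.e += this.a * x + this.c * y;
    this.f += this.b * x + this.d * y;
    return this;
  }
  // M = M · S(sx, sy)
  scale(sx: number, sy: number): this {
    this.a *= sx; this.b *= sx; this.c *= sy; this.d *= sy;
    return this;
  }
  // M = M · R(angle)
  rotate(angle: number): this {
    const cos = Math.cos(angle);
    const sin = Math.sin(angle);
    const { a, b, c, d } = this;
    this.a = a * cos + c * sin;
    this.b = b * cos + d * sin;
    this.c = c * cos - a * sin;
    this.d = d * cos - b * sin;
    return this;
  }
  apply(x: number, y: number): [number, number] {
    return [this.a * x + this.c * y + this.e, this.b * x + this.d * y + this.f];
  }
}

const FLIP_H_BIT = 1 << 0;
const FLIP_V_BIT = 1 << 1;
const FLIP_AD_BIT = 1 << 2;

// same steps as buildFlipTransform in the patch, on the stand-in matrix
function buildFlipTransform(m: Affine2d, flipMask: number, width: number, height: number): Affine2d {
  const halfW = width / 2;
  const halfH = height / 2;
  const flippedH = (flipMask & FLIP_H_BIT) !== 0;
  const flippedV = (flipMask & FLIP_V_BIT) !== 0;
  const flippedAD = (flipMask & FLIP_AD_BIT) !== 0;

  m.identity();
  m.translate(halfW, halfH);
  if (flippedAD) {
    m.rotate(-Math.PI / 2); // degToRad(-90) in the original
    m.scale(-1, 1);
  }
  if (flippedH) m.scale(flippedAD ? 1 : -1, flippedAD ? -1 : 1);
  if (flippedV) m.scale(flippedAD ? -1 : 1, flippedAD ? 1 : -1);
  m.translate(-halfW, -halfH);
  return m;
}

// usage: where do pixels of a 16x16 tile land for each flag?
const hFlip = buildFlipTransform(new Affine2d(), FLIP_H_BIT, 16, 16);
const vFlip = buildFlipTransform(new Affine2d(), FLIP_V_BIT, 16, 16);
const adFlip = buildFlipTransform(new Affine2d(), FLIP_AD_BIT, 16, 16);
```

On a 16×16 tile the H flip sends the top-left corner (0, 0) to (16, 0). The AD flip maps (x, y) to (y, x), a transpose, which matches Tiled's definition of the diagonal flag.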
diff --git a/packages/melonjs/src/level/tiled/TMXTileset.js b/packages/melonjs/src/level/tiled/TMXTileset.js
index e18adb110..dba99e538 100644
--- a/packages/melonjs/src/level/tiled/TMXTileset.js
+++ b/packages/melonjs/src/level/tiled/TMXTileset.js
@@ -1,10 +1,17 @@
 import { game } from "../../application/application.ts";
 import { getImage, getTMX } from "../../loader/loader.js";
+import { Matrix2d } from "../../math/matrix2d.ts";
 import { Vector2d } from "../../math/vector2d.ts";
 import timer from "../../system/timer.ts";
 import { getBasename, getExtension } from "../../utils/file.ts";
+import { buildFlipTransform } from "./TMXTile.js";
 import { resolveEmbeddedImage } from "./TMXUtils.js";
 
+// shared scratch matrix for flip-transform construction in drawTileRaw — avoids
+// per-call allocation in the hot path. Single-threaded JS means it's safe to
+// share across all tilesets without contention.
+const SCRATCH_MATRIX = new Matrix2d();
+
 /**
  * a TMX Tile Set Object
  * @category Tilemap
@@ -552,4 +559,112 @@ export default class TMXTileset {
 			renderer.restore();
 		}
 	}
+
+	/**
+	 * draw a tile at the specified position from raw (gid, flipMask) data
+	 *
+	 * Like {@link drawTile} but bypasses the {@link Tile} object entirely:
+	 * the renderer hot loop can pass the GID and flip mask straight from
+	 * `layer.layerData` without ever allocating a Tile instance.
+	 *
+	 * @param {CanvasRenderer|WebGLRenderer} renderer - a renderer instance
+	 * @param {number} dx - destination x position
+	 * @param {number} dy - destination y position
+	 * @param {number} gid - the tile's global id (with flip bits already stripped)
+	 * @param {number} flipMask - 3-bit packed flip mask (H=1, V=2, AD=4)
+	 * @ignore
+	 */
+	drawTileRaw(renderer, dx, dy, gid, flipMask) {
+		let dw, dh;
+		let tileImage;
+
+		if (this.isCollection) {
+			// collection tiles can have varying sizes; compute scale per-tile
+			tileImage = this.imageCollection.get(gid);
+			const tileWidth = tileImage.width;
+			const tileHeight = tileImage.height;
+
+			if (this.tilerendersize === "grid") {
+				let scaleX = this.mapTilewidth / tileWidth;
+				let scaleY = this.mapTileheight / tileHeight;
+
+				if (this.fillmode === "preserve-aspect-fit") {
+					const scale = Math.min(scaleX, scaleY);
+					scaleX = scale;
+					scaleY = scale;
+				}
+
+				dw = tileWidth * scaleX;
+				dh = tileHeight * scaleY;
+
+				// bottom-align against tileset baseline (renderer uses tileset.tileheight)
+				dy += this.tileheight - dh;
+
+				if (this.fillmode === "preserve-aspect-fit") {
+					dx += (this.mapTilewidth - dw) / 2;
+					dy -= (this.mapTileheight - dh) / 2;
+				}
+			} else {
+				dw = tileWidth;
+				dh = tileHeight;
+			}
+		} else {
+			// spritesheet: use precomputed values
+			dw = this._renderDw;
+			dh = this._renderDh;
+			dy += this._renderDyOffset;
+			dx += this._renderDxCenter;
+			dy += this._renderDyCenter;
+		}
+
+		// check if any transformation is required
+		if (flipMask !== 0) {
+			renderer.save();
+			renderer.translate(dx, dy);
+			// rebuild the flip transform into the shared scratch matrix
+			// (size driven by the tileset's tile dims for spritesheets, the
+			// tile image's dims for collections)
+			renderer.transform(
+				buildFlipTransform(
+					SCRATCH_MATRIX,
+					flipMask,
+					this.isCollection ? tileImage.width : this.tilewidth,
+					this.isCollection ? tileImage.height : this.tileheight,
+				),
+			);
+			dx = dy = 0;
+		}
+
+		// draw the tile image
+		if (this.isCollection) {
+			renderer.drawImage(
+				tileImage,
+				0,
+				0,
+				tileImage.width,
+				tileImage.height,
+				dx,
+				dy,
+				dw,
+				dh,
+			);
+		} else {
+			const offset = this.atlas[this.getViewTileId(gid)].offset;
+			renderer.drawImage(
+				this.image,
+				offset.x,
+				offset.y,
+				this.tilewidth,
+				this.tileheight,
+				dx,
+				dy,
+				dw + renderer.uvOffset,
+				dh + renderer.uvOffset,
+			);
+		}
+
+		if (flipMask !== 0) {
+			renderer.restore();
+		}
+	}
 }
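The collection-tile branch of `drawTileRaw` above resolves the destination rectangle in three steps. First it scales to the map grid when `tilerendersize` is `"grid"`. Then it bottom-aligns against the tileset's `tileheight`. Finally it centers the tile for `"preserve-aspect-fit"`. The sketch below restates that arithmetic as a pure function so it can be checked in isolation. `placeCollectionTile` and its options object are hypothetical names, not melonJS API.

```javascript
// Hypothetical helper (not melonJS API) restating the collection-tile
// branch of drawTileRaw: returns the destination rect for one tile image.
function placeCollectionTile(dx, dy, image, opts) {
	const { mapTilewidth, mapTileheight, tileheight, tilerendersize, fillmode } =
		opts;
	if (tilerendersize !== "grid") {
		// native size: draw the image at its own dimensions
		return { dx, dy, dw: image.width, dh: image.height };
	}
	let scaleX = mapTilewidth / image.width;
	let scaleY = mapTileheight / image.height;
	const fit = fillmode === "preserve-aspect-fit";
	if (fit) {
		// uniform scale so the whole image fits inside one map cell
		scaleX = scaleY = Math.min(scaleX, scaleY);
	}
	const dw = image.width * scaleX;
	const dh = image.height * scaleY;
	// bottom-align against the tileset baseline (tileset.tileheight)
	let x = dx;
	let y = dy + tileheight - dh;
	if (fit) {
		// center horizontally, lift by half the unused vertical space
		x += (mapTilewidth - dw) / 2;
		y -= (mapTileheight - dh) / 2;
	}
	return { dx: x, dy: y, dw, dh };
}

// a 64x32 image in a 32x32 grid, tileset tileheight 32
console.log(
	placeCollectionTile(0, 0, { width: 64, height: 32 }, {
		mapTilewidth: 32,
		mapTileheight: 32,
		tileheight: 32,
		tilerendersize: "grid",
		fillmode: "preserve-aspect-fit",
	}),
);
// { dx: 0, dy: 8, dw: 32, dh: 16 }
```

With `"preserve-aspect-fit"` the 64×32 image shrinks to 32×16 and sits vertically centered in its cell (rows 8 to 24). With `"stretch"` it is scaled non-uniformly to fill the whole 32×32 cell.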
diff --git a/packages/melonjs/src/level/tiled/renderer/TMXHexagonalRenderer.js b/packages/melonjs/src/level/tiled/renderer/TMXHexagonalRenderer.js
index eb92b7389..911b270c0 100644
--- a/packages/melonjs/src/level/tiled/renderer/TMXHexagonalRenderer.js
+++ b/packages/melonjs/src/level/tiled/renderer/TMXHexagonalRenderer.js
@@ -317,7 +317,7 @@ export default class TMXHexagonalRenderer extends TMXRenderer {
 	}
 
 	/**
-	 * draw the tile map
+	 * draw the tile map (legacy entry point — accepts a fully-constructed Tile)
 	 * @ignore
 	 */
 	drawTile(renderer, x, y, tmxTile) {
@@ -335,13 +335,30 @@ export default class TMXHexagonalRenderer extends TMXRenderer {
 		vector2dPool.release(point);
 	}
 
+	/**
+	 * draw a tile from raw (gid, flipMask, tileset) data — used by the hot
+	 * rendering loop to bypass Tile construction
+	 * @ignore
+	 */
+	drawTileRaw(renderer, x, y, gid, flipMask, tileset) {
+		const point = this.tileToPixelCoords(x, y, vector2dPool.get());
+
+		tileset.drawTileRaw(
+			renderer,
+			tileset.tileoffset.x + point.x,
+			tileset.tileoffset.y + point.y + (this.tileheight - tileset.tileheight),
+			gid,
+			flipMask,
+		);
+
+		vector2dPool.release(point);
+	}
+
 	/**
 	 * draw the tile map
 	 * @ignore
 	 */
 	drawTileLayer(renderer, layer, rect) {
-		let tile;
-
 		// get top-left and bottom-right tile position
 		const startTile = this.pixelToTileCoords(
 			rect.pos.x,
@@ -380,6 +397,24 @@ export default class TMXHexagonalRenderer extends TMXRenderer {
 		const endX = layer.cols;
 		const endY = layer.rows;
 
+		// shared hot-loop state for both stagger branches — read (gid, flipMask)
+		// straight from the typed array and resolve the tileset with a
+		// short-circuit cache.
+		// NOTE: the hex hot loop bypasses `this.drawTileRaw` and calls
+		// `tilesetCache.drawTileRaw` directly because it maintains pre-computed
+		// pixel coords (`rowPos.x`, `rowPos.y`) incrementally across the
+		// staggered iteration; going through drawTileRaw would re-derive them
+		// from layer-cell coords via tileToPixelCoords and lose the gain.
+		const layerCols = layer.cols;
+		const data = layer.layerData;
+		const tilesets = layer.tilesets;
+		let tilesetCache = layer.tileset;
+		if (tilesetCache === null) {
+			vector2dPool.release(startTile);
+			vector2dPool.release(startPos);
+			return;
+		}
+
 		if (this.staggerX) {
 			//ensure we are in the valid tile range
 			startTile.x = Math.max(0, startTile.x);
@@ -399,10 +434,22 @@ export default class TMXHexagonalRenderer extends TMXRenderer {
 				rowPos.setV(startPos);
 
 				for (; rowPos.x < rect.right && rowTile.x < endX; rowTile.x += 2) {
-					tile = layer.cellAt(rowTile.x, rowTile.y, false);
-					if (tile) {
-						// draw the tile
-						tile.tileset.drawTile(renderer, rowPos.x, rowPos.y, tile);
+					if (rowTile.x >= 0 && rowTile.y >= 0 && rowTile.y < layer.rows) {
+						const idx = (rowTile.y * layerCols + rowTile.x) * 2;
+						const gid = data[idx];
+						if (gid) {
+							const flipMask = data[idx + 1];
+							if (!tilesetCache.contains(gid)) {
+								tilesetCache = tilesets.getTilesetByGid(gid);
+							}
+							tilesetCache.drawTileRaw(
+								renderer,
+								rowPos.x,
+								rowPos.y,
+								gid,
+								flipMask,
+							);
+						}
 					}
 					rowPos.x += this.tilewidth + this.sidelengthx;
 				}
@@ -448,10 +495,22 @@ export default class TMXHexagonalRenderer extends TMXRenderer {
 				}
 
 				for (; rowPos.x < rect.right && rowTile.x < endX; rowTile.x++) {
-					tile = layer.cellAt(rowTile.x, rowTile.y, false);
-					if (tile) {
-						// draw the tile
-						tile.tileset.drawTile(renderer, rowPos.x, rowPos.y, tile);
+					if (rowTile.x >= 0 && rowTile.y >= 0 && rowTile.y < layer.rows) {
+						const idx = (rowTile.y * layerCols + rowTile.x) * 2;
+						const gid = data[idx];
+						if (gid) {
+							const flipMask = data[idx + 1];
+							if (!tilesetCache.contains(gid)) {
+								tilesetCache = tilesets.getTilesetByGid(gid);
+							}
+							tilesetCache.drawTileRaw(
+								renderer,
+								rowPos.x,
+								rowPos.y,
+								gid,
+								flipMask,
+							);
+						}
 					}
 					rowPos.x += this.tilewidth + this.sidelengthx;
 				}
diff --git a/packages/melonjs/src/level/tiled/renderer/TMXIsometricRenderer.js b/packages/melonjs/src/level/tiled/renderer/TMXIsometricRenderer.js
index 2a9aec67e..d1fc2add6 100644
--- a/packages/melonjs/src/level/tiled/renderer/TMXIsometricRenderer.js
+++ b/packages/melonjs/src/level/tiled/renderer/TMXIsometricRenderer.js
@@ -85,7 +85,7 @@ export default class TMXIsometricRenderer extends TMXRenderer {
 	}
 
 	/**
-	 * draw the tile map
+	 * draw the tile map (legacy entry point — accepts a fully-constructed Tile)
 	 * @ignore
 	 */
 	drawTile(renderer, x, y, tmxTile) {
@@ -99,6 +99,21 @@ export default class TMXIsometricRenderer extends TMXRenderer {
 		);
 	}
 
+	/**
+	 * draw a tile from raw (gid, flipMask, tileset) data — used by the hot
+	 * rendering loop to bypass Tile construction
+	 * @ignore
+	 */
+	drawTileRaw(renderer, x, y, gid, flipMask, tileset) {
+		tileset.drawTileRaw(
+			renderer,
+			((this.cols - 1) * tileset.tilewidth + (x - y) * tileset.tilewidth) >> 1,
+			(-tileset.tilewidth + (x + y) * tileset.tileheight) >> 2,
+			gid,
+			flipMask,
+		);
+	}
+
 	/**
 	 * draw the tile map
 	 * @ignore
@@ -159,7 +174,25 @@ export default class TMXIsometricRenderer extends TMXRenderer {
 		// initialize the columnItr vector
 		const columnItr = vector2dPool.get().setV(rowItr);
 
-		// main drawing loop
+		// main drawing loop — read (gid, flipMask) straight from the typed
+		// array and resolve the tileset with a short-circuit cache.
+		// NOTE: the iso hot loop bypasses `this.drawTileRaw` because it has
+		// already computed pixel coords (`x`, `y/2`) from the staggered scan;
+		// going through drawTileRaw would re-derive them from layer-cell coords
+		// and lose the precomputation.
+		const layerCols = layer.cols;
+		const layerRows = layer.rows;
+		const data = layer.layerData;
+		const tilesets = layer.tilesets;
+		let tilesetCache = tileset;
+		if (tilesetCache === null) {
+			vector2dPool.release(columnItr);
+			vector2dPool.release(rowItr);
+			vector2dPool.release(tileEnd);
+			vector2dPool.release(rectEnd);
+			vector2dPool.release(startPos);
+			return;
+		}
 		for (
 			let y = startPos.y * 2;
 			y - this.tileheight * 2 < rectEnd.y * 2;
@@ -167,19 +200,26 @@ export default class TMXIsometricRenderer extends TMXRenderer {
 		) {
 			columnItr.setV(rowItr);
 			for (let x = startPos.x; x < rectEnd.x; x += this.tilewidth) {
-				const tmxTile = layer.cellAt(columnItr.x, columnItr.y);
-				// render if a valid tile position
-				if (tmxTile) {
-					tileset = tmxTile.tileset;
-					// offset could be different per tileset
-					const offset = tileset.tileoffset;
-					// draw our tile
-					tileset.drawTile(
-						renderer,
-						offset.x + x,
-						offset.y + y / 2 - tileset.tileheight,
-						tmxTile,
-					);
+				const cx = columnItr.x;
+				const cy = columnItr.y;
+				// explicit bounds check: the old path relied on cellAt()'s default one
+				if (cx >= 0 && cx < layerCols && cy >= 0 && cy < layerRows) {
+					const idx = (cy * layerCols + cx) * 2;
+					const gid = data[idx];
+					if (gid !== 0) {
+						const flipMask = data[idx + 1];
+						if (!tilesetCache.contains(gid)) {
+							tilesetCache = tilesets.getTilesetByGid(gid);
+						}
+						const offset = tilesetCache.tileoffset;
+						tilesetCache.drawTileRaw(
+							renderer,
+							offset.x + x,
+							offset.y + y / 2 - tilesetCache.tileheight,
+							gid,
+							flipMask,
+						);
+					}
 				}
 
 				// Advance to the next column
diff --git a/packages/melonjs/src/level/tiled/renderer/TMXObliqueRenderer.js b/packages/melonjs/src/level/tiled/renderer/TMXObliqueRenderer.js
index 20ba15ac2..ce4a87148 100644
--- a/packages/melonjs/src/level/tiled/renderer/TMXObliqueRenderer.js
+++ b/packages/melonjs/src/level/tiled/renderer/TMXObliqueRenderer.js
@@ -110,7 +110,7 @@ export default class TMXObliqueRenderer extends TMXOrthogonalRenderer {
 	}
 
 	/**
-	 * draw the tile map
+	 * draw the tile map (legacy entry point — accepts a fully-constructed Tile)
 	 * @ignore
 	 */
 	drawTile(renderer, x, y, tmxTile) {
@@ -125,6 +125,21 @@ export default class TMXObliqueRenderer extends TMXOrthogonalRenderer {
 		tileset.drawTile(renderer, dx, dy, tmxTile);
 	}
 
+	/**
+	 * draw a tile from raw (gid, flipMask, tileset) data — used by the hot
+	 * rendering loop to bypass Tile construction
+	 * @ignore
+	 */
+	drawTileRaw(renderer, x, y, gid, flipMask, tileset) {
+		const dx = tileset.tileoffset.x + x * this.tilewidth + this.skewX * y;
+		const dy =
+			tileset.tileoffset.y +
+			(y + 1) * this.tileheight -
+			tileset.tileheight +
+			this.skewY * x;
+		tileset.drawTileRaw(renderer, dx, dy, gid, flipMask);
+	}
+
 	/**
 	 * draw the given TMX Layer for the given area
 	 * @ignore
@@ -205,13 +220,26 @@ export default class TMXObliqueRenderer extends TMXOrthogonalRenderer {
 				break;
 		}
 
-		// main drawing loop
+		// main drawing loop — direct typed-array reads, short-circuit tileset cache
+		const cols = layer.cols;
+		const data = layer.layerData;
+		const tilesets = layer.tilesets;
+		let tilesetCache = layer.tileset;
+		if (tilesetCache === null) {
+			return;
+		}
 		for (let y = startY; y !== endY; y += incY) {
 			for (let x = startX; x !== endX; x += incX) {
-				const tmxTile = layer.cellAt(x, y, false);
-				if (tmxTile) {
-					this.drawTile(renderer, x, y, tmxTile);
+				const idx = (y * cols + x) * 2;
+				const gid = data[idx];
+				if (!gid) {
+					continue;
+				}
+				const flipMask = data[idx + 1];
+				if (!tilesetCache.contains(gid)) {
+					tilesetCache = tilesets.getTilesetByGid(gid);
 				}
+				this.drawTileRaw(renderer, x, y, gid, flipMask, tilesetCache);
 			}
 		}
 	}
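The oblique `drawTileRaw` is the orthogonal placement plus a shear. Each row shifts `dx` by `skewX`, and each column shifts `dy` by `skewY`. The term `(y + 1) * tileheight - tileset.tileheight` bottom-aligns tiles that are taller than a map cell. Here is a minimal sketch of that arithmetic. `cellToPixel` and its plain argument objects are hypothetical stand-ins for the renderer and `TMXTileset`.

```javascript
// Hypothetical helper (not melonJS API): pixel position of cell (x, y),
// mirroring TMXObliqueRenderer#drawTileRaw. With skewX = skewY = 0 it
// reduces to the TMXOrthogonalRenderer#drawTileRaw formula.
function cellToPixel(x, y, map, tileset, skewX = 0, skewY = 0) {
	return {
		dx: tileset.tileoffset.x + x * map.tilewidth + skewX * y,
		// the tile's bottom edge lands on the bottom edge of row y
		dy:
			tileset.tileoffset.y +
			(y + 1) * map.tileheight -
			tileset.tileheight +
			skewY * x,
	};
}

const map = { tilewidth: 32, tileheight: 32 };
// tiles two map cells tall
const tall = { tileoffset: { x: 0, y: 0 }, tileheight: 64 };
console.log(cellToPixel(2, 3, map, tall)); // { dx: 64, dy: 64 }
console.log(cellToPixel(2, 3, map, tall, 16, 0)); // { dx: 112, dy: 64 }
```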
diff --git a/packages/melonjs/src/level/tiled/renderer/TMXOrthogonalRenderer.js b/packages/melonjs/src/level/tiled/renderer/TMXOrthogonalRenderer.js
index 7c65bed1e..a516805c1 100644
--- a/packages/melonjs/src/level/tiled/renderer/TMXOrthogonalRenderer.js
+++ b/packages/melonjs/src/level/tiled/renderer/TMXOrthogonalRenderer.js
@@ -40,7 +40,7 @@ export default class TMXOrthogonalRenderer extends TMXRenderer {
 	}
 
 	/**
-	 * draw the tile map
+	 * draw the tile map (legacy entry point — accepts a fully-constructed Tile)
 	 * @ignore
 	 */
 	drawTile(renderer, x, y, tmxTile) {
@@ -54,6 +54,21 @@ export default class TMXOrthogonalRenderer extends TMXRenderer {
 		);
 	}
 
+	/**
+	 * draw a tile from raw (gid, flipMask, tileset) data — used by the hot
+	 * rendering loop to bypass Tile construction
+	 * @ignore
+	 */
+	drawTileRaw(renderer, x, y, gid, flipMask, tileset) {
+		tileset.drawTileRaw(
+			renderer,
+			tileset.tileoffset.x + x * this.tilewidth,
+			tileset.tileoffset.y + (y + 1) * this.tileheight - tileset.tileheight,
+			gid,
+			flipMask,
+		);
+	}
+
 	/**
 	 * draw the tile map
 	 * @ignore
@@ -102,13 +117,34 @@ export default class TMXOrthogonalRenderer extends TMXRenderer {
 				break;
 		}
 
-		// main drawing loop
+		// main drawing loop — read (gid, flipMask) straight from the typed
+		// array and resolve the tileset with a short-circuit cache (the common
+		// single-tileset case never enters the lookup branch)
+		const cols = layer.cols;
+		const data = layer.layerData;
+		const tilesets = layer.tilesets;
+		let tilesetCache = layer.tileset;
+		if (tilesetCache === null) {
+			// no tilesets attached — nothing to draw
+			vector2dPool.release(start);
+			vector2dPool.release(end);
+			return;
+		}
 		for (let y = start.y; y !== end.y; y += incY) {
 			for (let x = start.x; x !== end.x; x += incX) {
-				const tmxTile = layer.cellAt(x, y, false);
-				if (tmxTile) {
-					this.drawTile(renderer, x, y, tmxTile);
+				const idx = (y * cols + x) * 2;
+				const gid = data[idx];
+				// `!gid` covers both empty cells (0) and out-of-range reads
+				// (Uint16Array returns undefined for idx beyond length — happens
+				// when a non-default renderorder swap pushes start past `cols`)
+				if (!gid) {
+					continue;
+				}
+				const flipMask = data[idx + 1];
+				if (!tilesetCache.contains(gid)) {
+					tilesetCache = tilesets.getTilesetByGid(gid);
 				}
+				this.drawTileRaw(renderer, x, y, gid, flipMask, tilesetCache);
 			}
 		}
 
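The hot loops above all follow one pattern. Cells are stored as interleaved `(gid, flipMask)` pairs in a flat `Uint16Array`, so cell `(x, y)` lives at index `(y * cols + x) * 2`. The tileset lookup only runs when the cached tileset does not `contains()` the current GID. The sketch below reproduces that scan. `Tileset`, `findTileset` and `scanLayer` are hypothetical stand-ins for `TMXTileset`, `getTilesetByGid` and the renderer loop. The sketch counts lookups to show that a layer using a single tileset never takes the lookup branch.

```javascript
// Hypothetical stand-in for TMXTileset: a contiguous GID range.
class Tileset {
	constructor(name, firstgid, tilecount) {
		this.name = name;
		this.firstgid = firstgid;
		this.lastgid = firstgid + tilecount - 1;
	}
	contains(gid) {
		return gid >= this.firstgid && gid <= this.lastgid;
	}
}

// Hypothetical stand-in for TMXTilesetGroup#getTilesetByGid.
function findTileset(tilesets, gid) {
	return tilesets.find((ts) => ts.contains(gid));
}

// Walk a layer stored as interleaved (gid, flipMask) pairs.
function scanLayer(data, cols, rows, tilesets) {
	const tiles = [];
	let lookups = 0;
	let cache = tilesets[0];
	for (let y = 0; y < rows; y++) {
		for (let x = 0; x < cols; x++) {
			const idx = (y * cols + x) * 2;
			const gid = data[idx];
			if (!gid) continue; // 0 = empty cell
			const flipMask = data[idx + 1];
			if (!cache.contains(gid)) {
				// only reached when the GID leaves the cached range
				cache = findTileset(tilesets, gid);
				lookups++;
			}
			tiles.push({ x, y, gid, flipMask, tileset: cache.name });
		}
	}
	return { tiles, lookups };
}

const tilesets = [new Tileset("ground", 1, 4), new Tileset("props", 5, 4)];
// 3x2 layer: row 0 uses "ground"; row 1 has an empty cell and a flipped prop
const data = new Uint16Array([1, 0, 2, 0, 3, 0, 0, 0, 5, 1, 6, 0]);
const { tiles, lookups } = scanLayer(data, 3, 2, tilesets);
console.log(tiles.length, lookups); // 5 1
```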
diff --git a/packages/melonjs/src/physics/world.js b/packages/melonjs/src/physics/world.js
index cbd6c022c..71e78239e 100644
--- a/packages/melonjs/src/physics/world.js
+++ b/packages/melonjs/src/physics/world.js
@@ -81,6 +81,23 @@ export default class World extends Container {
 		 */
 		this.preRender = false;
 
+		/**
+		 * Enable the WebGL2 procedural shader path for orthogonal tile
+		 * layers. When `true` (default), eligible layers render via a
+		 * single quad per tileset + a fragment shader doing per-pixel GID
+		 * lookup — bypassing the per-tile drawImage loop entirely.
+		 * Supported features on the shader path: animated tiles, all
+		 * three flip bits (H/V/AD), per-layer opacity/tint/blend mode,
+		 * and oversized bottom-aligned tiles up to 4 cells of overflow.
+		 * Layers that don't qualify (Canvas/WebGL1, non-orthogonal,
+		 * collection-of-image tilesets, non-zero `tileoffset`, or tile
+		 * overflow beyond the shader's 4-cell limit) fall back to the
+		 * legacy path automatically. Set to `false` to disable globally.
+		 * @type {boolean}
+		 * @default true
+		 */
+		this.gpuTilemap = true;
+
 		/**
 		 * the active physic bodies in this simulation
 		 * @type {Set<Body>}
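The `gpuTilemap` doc comment lists the conditions under which a layer stays on the shader path. Below they are expressed as a predicate. The predicate rests on two assumptions. First, `overflowCells` (how many cells an oversized tile spills past its own) is computed elsewhere. Second, a Canvas renderer reports WebGL version `0`. `usesShaderPath` is a hypothetical helper, not the engine's actual eligibility check.

```javascript
// Limit quoted in the World#gpuTilemap doc comment.
const MAX_OVERFLOW_CELLS = 4;

// Hypothetical predicate (not melonJS API) restating the documented
// eligibility rules for the WebGL 2 tile shader path.
function usesShaderPath({
	gpuTilemap,
	webglVersion,
	orientation,
	tilesets,
	overflowCells,
}) {
	if (!gpuTilemap) return false; // globally disabled
	if (webglVersion < 2) return false; // Canvas (assumed 0) or WebGL 1
	if (orientation !== "orthogonal") return false;
	for (const ts of tilesets) {
		if (ts.isCollection) return false; // collection-of-image tileset
		if (ts.tileoffset.x !== 0 || ts.tileoffset.y !== 0) return false;
	}
	return overflowCells <= MAX_OVERFLOW_CELLS;
}

const base = {
	gpuTilemap: true,
	webglVersion: 2,
	orientation: "orthogonal",
	tilesets: [{ isCollection: false, tileoffset: { x: 0, y: 0 } }],
	overflowCells: 1,
};
console.log(usesShaderPath(base)); // true
console.log(usesShaderPath({ ...base, orientation: "isometric" })); // false
console.log(usesShaderPath({ ...base, webglVersion: 1 })); // false
```

Any layer for which the predicate is false keeps rendering through the per-tile loop, so disabling the flag never changes what is drawn, only how.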
diff --git a/packages/melonjs/src/video/renderer.js b/packages/melonjs/src/video/renderer.js
index b5d8a7a03..0fb3b32c9 100644
--- a/packages/melonjs/src/video/renderer.js
+++ b/packages/melonjs/src/video/renderer.js
@@ -465,6 +465,47 @@ export default class Renderer {
 		// base no-op; concrete renderers override
 	}
 
+	/**
+	 * Draw a TMX tile layer. Default behavior:
+	 *   - if `layer.canvasRenderer` is set (preRender bake), blit the cached
+	 *     offscreen canvas in a single `drawImage` call;
+	 *   - otherwise delegate to the layer's TMX orientation renderer for
+	 *     the per-tile loop.
+	 *
+	 * `WebGLRenderer` overrides this to add the procedural shader fast
+	 * path on top (when `layer.renderMode === "shader"`) and fall through
+	 * to this base behavior for all other layers.
+	 * @param {object} layer - the TMXLayer to draw
+	 * @param {Rect} rect - the visible region in world coords
+	 */
+	drawTileLayer(layer, rect) {
+		if (layer.canvasRenderer) {
+			// clamp the source rect to the cached canvas bounds: the
+			// visible region can start inside the layer (camera scrolled
+			// away from the origin) and extend past the layer's right or
+			// bottom edge, so the source width and height must be reduced
+			// by `rect.pos.*` to avoid reading past the canvas
+			const width = Math.min(rect.width, layer.width - rect.pos.x);
+			const height = Math.min(rect.height, layer.height - rect.pos.y);
+			if (width <= 0 || height <= 0) {
+				return;
+			}
+			this.drawImage(
+				layer.canvasRenderer.getCanvas(),
+				rect.pos.x,
+				rect.pos.y,
+				width,
+				height,
+				rect.pos.x,
+				rect.pos.y,
+				width,
+				height,
+			);
+			return;
+		}
+		layer.getRenderer().drawTileLayer(this, layer, rect);
+	}
+
 	/**
 	 * Set the current fill & stroke style color.
 	 * By default, or upon reset, the value is set to #000000.
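The preRender branch of `drawTileLayer` clips the source rectangle so it never reads past the cached canvas. The function below performs the same arithmetic as a standalone step. A plain `{ x, y, width, height }` object stands in for melonJS's `Rect` (`rect.pos.x` / `rect.pos.y`), and `clampBlitRect` is a hypothetical name.

```javascript
// Hypothetical helper (not melonJS API): clamp a visible rect to a cached
// layer canvas of size layerWidth x layerHeight. Returns null when nothing
// of the layer is visible, matching the early `return` in drawTileLayer.
function clampBlitRect(rect, layerWidth, layerHeight) {
	const width = Math.min(rect.width, layerWidth - rect.x);
	const height = Math.min(rect.height, layerHeight - rect.y);
	if (width <= 0 || height <= 0) {
		return null;
	}
	// source and destination share the same origin and size
	return { x: rect.x, y: rect.y, width, height };
}

// an 800x600 view scrolled to (900, 100) over a 1000x800 layer
console.log(
	clampBlitRect({ x: 900, y: 100, width: 800, height: 600 }, 1000, 800),
);
// { x: 900, y: 100, width: 100, height: 600 }
```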
diff --git a/packages/melonjs/src/video/texture/cache.js b/packages/melonjs/src/video/texture/cache.js
index ec1fbdf24..145470190 100644
--- a/packages/melonjs/src/video/texture/cache.js
+++ b/packages/melonjs/src/video/texture/cache.js
@@ -1,4 +1,3 @@
-import { isPowerOfTwo } from "./../../math/math.ts";
 import { ArrayMultimap } from "../../utils/array-multimap.js";
 import { getBasename } from "../../utils/file.ts";
 import { createAtlas, TextureAtlas } from "./atlas.js";
@@ -113,26 +112,6 @@ class TextureCache {
 	 * cache the textureAltas for the given image
 	 */
 	set(image, textureAtlas) {
-		const width = image.width || image.videoWidth;
-		const height = image.height || image.videoHeight;
-
-		// warn if a non POT texture is added to the cache when using WebGL1
-		if (
-			this.renderer.WebGLVersion === 1 &&
-			(!isPowerOfTwo(width) || !isPowerOfTwo(height))
-		) {
-			const src = typeof image.src !== "undefined" ? image.src : image;
-			console.warn(
-				"[Texture] " +
-					src +
-					" is not a POT texture " +
-					"(" +
-					width +
-					"x" +
-					height +
-					")",
-			);
-		}
 		return this.cache.put(image, textureAtlas);
 	}
 
diff --git a/packages/melonjs/src/video/texture/resource.js b/packages/melonjs/src/video/texture/resource.js
new file mode 100644
index 000000000..c68955dcb
--- /dev/null
+++ b/packages/melonjs/src/video/texture/resource.js
@@ -0,0 +1,155 @@
+/**
+ * A texture data source that knows how to upload itself to a WebGL
+ * texture. Subclasses provide the actual upload logic for their kind
+ * of source (raw buffer, image, compressed data, etc.).
+ *
+ * Resources flow through the same `TextureCache` / batcher machinery
+ * as image-backed `TextureAtlas` instances: they expose the minimal
+ * shape (`sources`, `activeAtlas`, `getTexture()`, plus `width` /
+ * `height` / `premultipliedAlpha` / `repeat` / `filter`) the cache
+ * uses for unit allocation and the batcher uses for `boundTextures`
+ * bookkeeping. The cache therefore owns every `gl.bindTexture` call,
+ * which keeps the JS-side binding state in lockstep with the actual
+ * GL state across all texture kinds — image atlases included.
+ *
+ * Subclasses MUST implement `upload(gl, target)`. The framework calls
+ * it once per texture on first use (and again on forced re-upload via
+ * `batcher.uploadTexture(resource, w, h, true)`).
+ *
+ * @category Rendering
+ */
+export class TextureResource {
+	/**
+	 * @param {object} options
+	 * @param {number} options.width   - pixel width of the texture
+	 * @param {number} options.height  - pixel height of the texture
+	 * @param {boolean} [options.premultipliedAlpha=false] - premultiply RGB by alpha at upload time
+	 * @param {string}  [options.repeat="no-repeat"] - "no-repeat" | "repeat" | "repeat-x" | "repeat-y"
+	 * @param {number}  [options.filter] - `gl.NEAREST` or `gl.LINEAR`; when
+	 *   omitted the batcher falls back to the renderer's `antiAlias` setting
+	 */
+	constructor({
+		width,
+		height,
+		premultipliedAlpha = false,
+		repeat = "no-repeat",
+		filter,
+	} = {}) {
+		/** @type {number} */
+		this.width = width;
+		/** @type {number} */
+		this.height = height;
+		/** @type {boolean} */
+		this.premultipliedAlpha = premultipliedAlpha;
+		/** @type {string} */
+		this.repeat = repeat;
+		/** @type {number|undefined} */
+		this.filter = filter;
+
+		// minimal `TextureAtlas`-shaped surface for the cache + batcher
+		this.sources = new Map([["default", this]]);
+		this.activeAtlas = "default";
+	}
+
+	/**
+	 * Returns the upload "source" the batcher hands to `createTexture2D`.
+	 * For a resource this is the resource itself — `createTexture2D`
+	 * dispatches to `resource.upload(gl, target)`.
+	 * @ignore
+	 */
+	getTexture() {
+		return this;
+	}
+
+	/**
+	 * Issue the `gl.texImage2D` (or equivalent) call that uploads this
+	 * resource's data into the currently-bound `TEXTURE_2D` slot.
+	 * Subclasses MUST override.
+	 * @abstract
+	 * @param {WebGLRenderingContext|WebGL2RenderingContext} gl
+	 * @param {number} target - `gl.TEXTURE_2D` (or future cube-map targets)
+	 */
+	// eslint-disable-next-line no-unused-vars, @typescript-eslint/no-unused-vars
+	upload(gl, target) {
+		throw new Error("TextureResource subclasses must implement upload()");
+	}
+}
+
+/**
+ * A texture sourced from a raw byte buffer. Used for synthesized
+ * textures (TMX layer GID index, font atlases, color LUTs, signed-
+ * distance fields, palette tables, etc.) — any case where the texture
+ * data isn't an image file.
+ *
+ * The buffer is uploaded as-is; the resource's `premultipliedAlpha`
+ * flag is applied at upload time so a raw-data texture (typical:
+ * `premultipliedAlpha = false`) doesn't get its RGB wiped by the
+ * driver when the alpha channel is zero.
+ *
+ * @category Rendering
+ */
+export class BufferTextureResource extends TextureResource {
+	/**
+	 * @param {ArrayBufferView} data - the pixel data; size must be
+	 *   `width * height * 4` bytes for the default RGBA / UNSIGNED_BYTE
+	 *   format
+	 * @param {object} options
+	 * @param {number}  options.width
+	 * @param {number}  options.height
+	 * @param {boolean} [options.premultipliedAlpha=false]
+	 * @param {string}  [options.repeat="no-repeat"]
+	 * @param {number}  [options.filter]
+	 * @param {"rgba8"|"rgba8ui"} [options.format="rgba8"] - storage format.
+	 *   `"rgba8"` (default): normalized RGBA, sampled via `sampler2D` /
+	 *   `texture()`. `"rgba8ui"`: unsigned-integer RGBA, sampled via
+	 *   `usampler2D` / `texelFetch()`; requires WebGL2 and a `gl.NEAREST`
+	 *   filter (integer textures are not filterable). Use it for raw-data
+	 *   lookups (GID tables, palette indices) to skip the float decode.
+	 */
+	constructor(data, options) {
+		super(options);
+		/** @type {ArrayBufferView} */
+		this.data = data;
+		/** @type {string} */
+		this.format = options.format || "rgba8";
+	}
+
+	/** @ignore */
+	upload(gl, target) {
+		if (this.format === "rgba8ui") {
+			// `RGBA8UI` / `RGBA_INTEGER` are WebGL 2-only enums. On a
+			// WebGL 1 context they're `undefined`, which would otherwise
+			// silently invoke `texImage2D` with bogus values and corrupt
+			// the texture — surface a clear error so callers know to
+			// either drop down to `rgba8` or guard their construction.
+			if (typeof gl.RGBA8UI === "undefined") {
+				throw new Error(
+					'BufferTextureResource: format "rgba8ui" requires a WebGL 2 context',
+				);
+			}
+			gl.texImage2D(
+				target,
+				0,
+				gl.RGBA8UI,
+				this.width,
+				this.height,
+				0,
+				gl.RGBA_INTEGER,
+				gl.UNSIGNED_BYTE,
+				this.data,
+			);
+		} else {
+			gl.texImage2D(
+				target,
+				0,
+				gl.RGBA,
+				this.width,
+				this.height,
+				0,
+				gl.RGBA,
+				gl.UNSIGNED_BYTE,
+				this.data,
+			);
+		}
+	}
+}
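A `BufferTextureResource` expects `width * height * 4` bytes for the default `rgba8` format. The animation-lookup doc comment in `orthogonal.js` says the GID index texture stores each value as `R = lo byte, G = hi byte`. The sketch below packs a layer's interleaved `(gid, flipMask)` pairs that way and decodes one texel back. This excerpt does not show where the real index texture stores the flip bits, so the sketch leaves `B` and `A` at zero. `packGidTexture` and `unpackGid` are hypothetical names.

```javascript
// Hypothetical packer (not melonJS API): interleaved (gid, flipMask)
// Uint16 pairs -> RGBA8 bytes with R = low byte, G = high byte of the GID.
// B and A are left at 0 here; the real texture's use of them is not shown.
function packGidTexture(layerData, cols, rows) {
	const bytes = new Uint8Array(cols * rows * 4);
	for (let i = 0; i < cols * rows; i++) {
		const gid = layerData[i * 2];
		bytes[i * 4] = gid & 0xff; // R: low byte
		bytes[i * 4 + 1] = gid >> 8; // G: high byte
	}
	return bytes;
}

// inverse of the packing, e.g. to check a texel read back from the GPU
function unpackGid(bytes, cols, x, y) {
	const i = (y * cols + x) * 4;
	return bytes[i] | (bytes[i + 1] << 8);
}

const layer = new Uint16Array([1, 0, 300, 0]); // 2x1 layer: gids 1 and 300
const bytes = packGidTexture(layer, 2, 1);
console.log(bytes.join(",")); // 1,0,0,0,44,1,0,0
console.log(unpackGid(bytes, 2, 1, 0)); // 300
```

Such a buffer would be wrapped as `new BufferTextureResource(bytes, { width: cols, height: rows })`, with `premultipliedAlpha` left at its `false` default. Because `A = 0`, premultiplying at upload would multiply `R` and `G` by zero and erase the GID.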
diff --git a/packages/melonjs/src/video/webgl/batchers/material_batcher.js b/packages/melonjs/src/video/webgl/batchers/material_batcher.js
index cad4d063b..ee2451155 100644
--- a/packages/melonjs/src/video/webgl/batchers/material_batcher.js
+++ b/packages/melonjs/src/video/webgl/batchers/material_batcher.js
@@ -81,17 +81,34 @@ export class MaterialBatcher extends Batcher {
 	) {
 		const gl = this.gl;
 		const isPOT = isPowerOfTwo(w) && isPowerOfTwo(h);
+		const wantsRepeat = repeat !== "no-repeat";
+		const canRepeat = isPOT || this.renderer.WebGLVersion > 1;
 		const rs =
-			repeat.search(/^repeat(-x)?$/) === 0 &&
-			(isPOT || this.renderer.WebGLVersion > 1)
+			repeat.search(/^repeat(-x)?$/) === 0 && canRepeat
 				? gl.REPEAT
 				: gl.CLAMP_TO_EDGE;
 		const rt =
-			repeat.search(/^repeat(-y)?$/) === 0 &&
-			(isPOT || this.renderer.WebGLVersion > 1)
+			repeat.search(/^repeat(-y)?$/) === 0 && canRepeat
 				? gl.REPEAT
 				: gl.CLAMP_TO_EDGE;
 
+		// Warn (only when actually downgrading) — the caller asked for tiling
+		// but we have to clamp because WebGL 1 does not allow `REPEAT` on
+		// non-power-of-two textures. Their `repeat: "repeat*"` setting will
+		// have no visible effect. Either resize the source to POT or run on
+		// a WebGL 2 context.
+		if (wantsRepeat && !canRepeat) {
+			console.warn(
+				"melonJS: repeat wrap (" +
+					repeat +
+					") requested on a non-power-of-two texture (" +
+					w +
+					"x" +
+					h +
+					") under WebGL 1 — downgrading to clamp-to-edge",
+			);
+		}
+
 		let currentTexture = texture;
 		if (!currentTexture) {
 			currentTexture = gl.createTexture();
@@ -106,7 +123,12 @@ export class MaterialBatcher extends Batcher {
 
 		gl.pixelStorei(gl.UNPACK_PREMULTIPLY_ALPHA_WEBGL, premultipliedAlpha);
 
-		if (pixels !== null && pixels.compressed === true) {
+		if (pixels !== null && typeof pixels.upload === "function") {
+			// `TextureResource` path: the resource owns its upload (raw
+			// buffer, future synthesized sources, etc.). Keeps every
+			// `gl.texImage2D` variant in one place per source type.
+			pixels.upload(gl, gl.TEXTURE_2D);
+		} else if (pixels !== null && pixels.compressed === true) {
 			const mipmaps = pixels.mipmaps;
 			for (let i = 0; i < mipmaps.length; i++) {
 				gl.compressedTexImage2D(
@@ -174,9 +196,13 @@ export class MaterialBatcher extends Batcher {
 		if (
 			isPOT &&
 			mipmap === true &&
-			(pixels === null || pixels.compressed !== true)
+			pixels !== null &&
+			pixels.compressed !== true &&
+			typeof pixels.upload !== "function"
 		) {
 			gl.generateMipmap(gl.TEXTURE_2D);
+		} else if (pixels === null && isPOT && mipmap === true) {
+			gl.generateMipmap(gl.TEXTURE_2D);
 		}
 
 		return currentTexture;
@@ -260,19 +286,51 @@ export class MaterialBatcher extends Batcher {
 
 	/**
 	 * @ignore
+	 * @param {TextureAtlas|TextureResource} texture
+	 * @param {number} [w] - ignored when the source has its own `width` (the
+	 *   common case); kept for the legacy signature where callers passed a
+	 *   destination size. Forwarded only as a last-resort default.
+	 * @param {number} [h] - same as `w`.
+	 * @param {boolean} [force=false]
+	 * @param {boolean} [flush=true]
 	 */
 	uploadTexture(texture, w, h, force = false, flush = true) {
 		const unit = this.renderer.cache.getUnit(texture);
 		const texture2D = this.boundTextures[unit];
 
 		if (typeof texture2D === "undefined" || force) {
+			// honor a resource-specified filter (e.g. tilemap index textures
+			// need NEAREST regardless of the global antiAlias setting),
+			// otherwise fall back to the renderer-wide preference
+			const filter =
+				typeof texture.filter !== "undefined"
+					? texture.filter
+					: this.renderer.settings.antiAlias
+						? this.gl.LINEAR
+						: this.gl.NEAREST;
+			// `w`/`h` historically came from callers (e.g. `addQuad`) that
+			// passed the DESTINATION quad size, not the texture size. That
+			// broke the downstream POT check — a 480×1216 atlas drawn into
+			// a 256×256 quad reported `isPOT=true` and tripped
+			// `gl.generateMipmap` on WebGL 1. Always derive the actual
+			// texture dimensions from the source, falling back to the
+			// passed-in values only when the source has none.
+			const source = texture.getTexture();
+			// `HTMLVideoElement` exposes its real pixel dimensions through
+			// `videoWidth`/`videoHeight`; `width`/`height` default to 0
+			// until the element is explicitly sized. Prefer the regular
+			// width/height when non-zero, otherwise fall back to the
+			// video-specific properties, and finally to the caller-supplied
+			// w/h for sources that have neither.
+			const texW = source.width || source.videoWidth || w;
+			const texH = source.height || source.videoHeight || h;
 			this.createTexture2D(
 				unit,
-				texture.getTexture(),
-				this.renderer.settings.antiAlias ? this.gl.LINEAR : this.gl.NEAREST,
+				source,
+				filter,
 				texture.repeat,
-				w,
-				h,
+				texW,
+				texH,
 				texture.premultipliedAlpha,
 				undefined,
 				texture2D,
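`uploadTexture` now takes the texture size from the source instead of the destination quad, and `createTexture2D` derives the wrap mode from that size. `createTexture2D` also warns only when a requested `repeat*` has to be downgraded. The two decisions are restated below as pure functions, so the 480×1216 case from the comment can be checked. `resolveWrap` and `textureSize` are hypothetical names. They return GL enum names as strings rather than `gl.*` constants.

```javascript
function isPowerOfTwo(n) {
	return n > 0 && (n & (n - 1)) === 0;
}

// Hypothetical helper mirroring uploadTexture: prefer the source's own size,
// then HTMLVideoElement's videoWidth/videoHeight, then the caller's w/h.
function textureSize(source, w, h) {
	return {
		width: source.width || source.videoWidth || w,
		height: source.height || source.videoHeight || h,
	};
}

// Hypothetical helper mirroring createTexture2D's wrap selection.
function resolveWrap(repeat, width, height, webglVersion) {
	const isPOT = isPowerOfTwo(width) && isPowerOfTwo(height);
	// WebGL 1 only allows REPEAT on power-of-two textures
	const canRepeat = isPOT || webglVersion > 1;
	const wrap = (re) =>
		re.test(repeat) && canRepeat ? "REPEAT" : "CLAMP_TO_EDGE";
	return {
		s: wrap(/^repeat(-x)?$/),
		t: wrap(/^repeat(-y)?$/),
		// the case that now triggers the console warning
		downgraded: repeat !== "no-repeat" && !canRepeat,
	};
}

// a 480x1216 atlas drawn into a 256x256 quad under WebGL 1
const size = textureSize({ width: 480, height: 1216 }, 256, 256);
console.log(resolveWrap("repeat", size.width, size.height, 1));
// { s: 'CLAMP_TO_EDGE', t: 'CLAMP_TO_EDGE', downgraded: true }
```

If the quad size (256×256) were passed instead, `isPOT` would wrongly be `true`. That is the mis-detection the comment describes.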
diff --git a/packages/melonjs/src/video/webgl/renderers/tmxlayer/orthogonal.js b/packages/melonjs/src/video/webgl/renderers/tmxlayer/orthogonal.js
new file mode 100644
index 000000000..6704af861
--- /dev/null
+++ b/packages/melonjs/src/video/webgl/renderers/tmxlayer/orthogonal.js
@@ -0,0 +1,403 @@
+import { BufferTextureResource } from "../../../texture/resource.js";
+import GLShader from "../../glshader.js";
+import orthogonalTMXLayerFragment from "../../shaders/orthogonal-tmxlayer.frag";
+import orthogonalTMXLayerVertex from "../../shaders/orthogonal-tmxlayer.vert";
+
+/**
+ * additional imports for TypeScript
+ * @import { default as TMXLayer } from "../../../../level/tiled/TMXLayer.js";
+ * @import { default as WebGLRenderer } from "../../webgl_renderer.js";
+ */
+
+// Default `uTint` when the layer has no tint set. Module-scoped so the
+// fallback path doesn't allocate per frame.
+const DEFAULT_TINT = new Float32Array([1, 1, 1, 1]);
+
+/**
+ * GPU-accelerated renderer for orthogonal TMX tile layers (WebGL2). Draws
+ * the visible region of a layer as one screen-aligned quad per tileset
+ * referenced by the layer — the fragment shader samples a per-layer GID
+ * index texture and the tileset atlas, eliminating the per-tile draw
+ * loop. The visible rect, GID range, tile size, opacity, and tint are
+ * pushed as uniforms; the index texture is uploaded once at activation
+ * and re-uploaded only when `layer.dataVersion` changes (mutations from
+ * `setTile`/`clearTile`).
+ *
+ * The per-layer index texture is a `BufferTextureResource` flowing
+ * through the standard `TextureCache` / batcher path — same lane as
+ * every other texture in the engine. Dynamic unit allocation, correct
+ * `boundTextures` bookkeeping, and per-resource premultiplied-alpha /
+ * filter all come for free; nothing here pokes `gl.bindTexture` or
+ * `gl.activeTexture` directly.
+ *
+ * Cache lifecycle: one `BufferTextureResource` per `TMXLayer`. Tile
+ * layers don't come and go individually — they only churn on game reset
+ * — so the cache is freed in bulk via `reset()`, called from
+ * `WebGLRenderer.reset()` (which the `GAME_RESET` event already
+ * triggers).
+ *
+ * @ignore
+ */
+export default class OrthogonalTMXLayerGPURenderer {
+	/**
+	 * @param {WebGLRenderer} renderer - the WebGL renderer instance
+	 */
+	constructor(renderer) {
+		this.renderer = renderer;
+		this.gl = renderer.gl;
+		// Standalone `GLShader` (not `ShaderEffect`) so we own both the
+		// vertex and the fragment source — they're paired in GLSL ES 3.00,
+		// which lets the fragment shader use `texelFetch` for byte-exact
+		// reads from the index / animation lookup textures (samplers stay
+		// `sampler2D` and decode the 8-bit channels as normalized floats;
+		// `usampler2D` would conflict with the engine's multi-texture
+		// batching cache). `setBatcher("quad", this.shader)` integrates
+		// the program with the quad batcher just like a `ShaderEffect`;
+		// the only gain is cleaner per-fragment lookup code.
+		this.shader = new GLShader(
+			renderer.gl,
+			orthogonalTMXLayerVertex,
+			orthogonalTMXLayerFragment,
+			renderer.shaderPrecision,
+		);
+		// per-layer `BufferTextureResource`. `Map` (not `WeakMap`) so we
+		// can iterate on `reset()` to walk through the global texture
+		// cache and free GL handles. The world's container holds the
+		// strong ref to each layer anyway, so this doesn't change layer
+		// lifetime.
+		this.resources = new Map();
+		// per-tileset animation-lookup resources, keyed by `TMXTileset`.
+		// Only allocated for tilesets that actually have animated tiles
+		// (`tileset.isAnimated === true`).
+		this.animLookups = new Map();
+		// pre-allocated scratch for uniform uploads — avoids per-frame
+		// allocation in the hot path
+		this._v2 = new Float32Array(2);
+		this._v4 = new Float32Array(4);
+	}
+
+	/**
+	 * Free every cached per-layer index texture and empty the local
+	 * resource map. Called from `WebGLRenderer.reset()` (which
+	 * `GAME_RESET` triggers) so each level transition starts clean.
+	 * @ignore
+	 */
+	reset() {
+		const batcher = this.renderer.currentBatcher;
+		const cache = this.renderer.cache;
+		const drop = (resource) => {
+			// route through the batcher so its `boundTextures` bookkeeping
+			// stays in sync. When no batcher is active (e.g. during context
+			// teardown) there is no clean GL deletion path, but the unit
+			// assignment must still be released: `cache.delete()` only
+			// touches the image→atlas map and would otherwise leave the
+			// unit slot held forever, so call `freeTextureUnit()` too.
+			if (batcher !== undefined) {
+				batcher.deleteTexture2D(resource);
+			} else {
+				cache.freeTextureUnit(resource);
+				cache.delete(resource);
+			}
+		};
+		for (const resource of this.resources.values()) {
+			drop(resource);
+		}
+		for (const entry of this.animLookups.values()) {
+			drop(entry.resource);
+		}
+		this.resources.clear();
+		this.animLookups.clear();
+	}
+
+	/**
+	 * Write a `vec2` uniform without allocating a fresh Float32Array per
+	 * call. Both components flow into the shared `_v2` scratch buffer,
+	 * which `setUniform` reads synchronously and forwards to
+	 * `gl.uniform2fv` — so reusing the buffer across calls is safe.
+	 * @param {string} name
+	 * @param {number} x
+	 * @param {number} y
+	 * @private
+	 */
+	_setV2(name, x, y) {
+		this._v2[0] = x;
+		this._v2[1] = y;
+		this.shader.setUniform(name, this._v2);
+	}
+
+	/**
+	 * `vec4` counterpart to {@link _setV2}.
+	 * @param {string} name
+	 * @param {number} x
+	 * @param {number} y
+	 * @param {number} z
+	 * @param {number} w
+	 * @private
+	 */
+	_setV4(name, x, y, z, w) {
+		this._v4[0] = x;
+		this._v4[1] = y;
+		this._v4[2] = z;
+		this._v4[3] = w;
+		this.shader.setUniform(name, this._v4);
+	}
+
+	/**
+	 * Get-or-create the per-tileset animation-lookup entry. Returns
+	 * `undefined` for tilesets that have no animated tiles (the shader's
+	 * `uAnimEnabled` uniform is then set to 0 and the lookup texture is
+	 * not bound).
+	 *
+	 * The entry holds a `tileCount × 1` RGBA8 `BufferTextureResource`
+	 * where texel `localId` encodes the CURRENT frame's local id as
+	 * `R = lo byte, G = hi byte` (same encoding as the GID index
+	 * texture). Each call walks `tileset.animations`, rewrites every texel
+	 * whose current frame changed, and flags the entry `dirty`;
+	 * `tileset.update(dt)` (driven by the layer) advances `anim.cur.tileid`
+	 * independently of this renderer.
+	 *
+	 * @param {object} tileset
+	 * @param {number} tileCount - tiles in the tileset's atlas grid
+	 *   (`atlasCols * atlasRows`)
+	 * @returns {{resource: BufferTextureResource, data: Uint8Array,
+	 *   tileCount: number, dirty: boolean}|undefined}
+	 */
+	_getOrUpdateAnimLookup(tileset, tileCount) {
+		if (!tileset.isAnimated || tileset.animations.size === 0) {
+			return undefined;
+		}
+		let entry = this.animLookups.get(tileset);
+		if (entry === undefined) {
+			// allocate the lookup texture and initialize it to identity
+			// (localId → localId); animated entries get overwritten below
+			const data = new Uint8Array(tileCount * 4);
+			for (let id = 0; id < tileCount; id++) {
+				data[id * 4 + 0] = id & 0xff;
+				data[id * 4 + 1] = (id >> 8) & 0xff;
+			}
+			const resource = new BufferTextureResource(data, {
+				width: tileCount,
+				height: 1,
+				premultipliedAlpha: false,
+				repeat: "no-repeat",
+				// NEAREST: `texelFetch` already bypasses filtering, so this
+				// just guarantees no sampling path can blend neighboring ids
+				filter: this.gl.NEAREST,
+				format: "rgba8",
+			});
+			// new entry → first upload happens unconditionally via the
+			// batcher's `boundTextures[unit] === undefined` path
+			entry = { resource, data, tileCount, dirty: false };
+			this.animLookups.set(tileset, entry);
+		}
+		// walk animations and rewrite the current-frame ids. The dirty
+		// flag flips true only when at least one texel actually changed
+		// — the batcher then force-reuploads on the next bind, otherwise
+		// reuses the existing GL texture untouched.
+		const data = entry.data;
+		for (const [localId, anim] of tileset.animations) {
+			const off = localId * 4;
+			const cur = anim.cur.tileid;
+			const lo = cur & 0xff;
+			const hi = (cur >> 8) & 0xff;
+			if (data[off] !== lo || data[off + 1] !== hi) {
+				data[off] = lo;
+				data[off + 1] = hi;
+				entry.dirty = true;
+			}
+		}
+		return entry;
+	}
+
+	/**
+	 * Get-or-create the per-layer index `BufferTextureResource`.
+	 * @param {TMXLayer} layer
+	 * @returns {BufferTextureResource}
+	 */
+	_getResource(layer) {
+		let resource = this.resources.get(layer);
+		if (resource === undefined) {
+			// reinterpret the layer's `Uint16Array` payload as RGBA bytes
+			// (a zero-copy view; the low-byte-first R/G order the shader
+			// decodes assumes a little-endian host, which in practice
+			// covers every browser platform)
+			resource = new BufferTextureResource(
+				new Uint8Array(layer.layerData.buffer),
+				{
+					width: layer.cols,
+					height: layer.rows,
+					// raw GID bytes — must NOT have alpha pre-multiplied
+					// into RGB, otherwise A=0 cells wipe their R/G/B
+					premultipliedAlpha: false,
+					repeat: "no-repeat",
+					// NEAREST: `texelFetch` already bypasses filtering, so this
+					// just guarantees no sampling path can blend neighboring
+					// GID/flip bytes before the decode below
+					filter: this.gl.NEAREST,
+					format: "rgba8",
+				},
+			);
+			// track the last-uploaded version so we know when to force a
+			// re-upload after `setTile` / `clearTile` mutations
+			resource._uploadedVersion = -1;
+			this.resources.set(layer, resource);
+		}
+		return resource;
+	}
+
+	/**
+	 * Draw an orthogonal TMX layer through the shader path.
+	 * @param {TMXLayer} layer
+	 * @param {object} rect - the visible viewport rect (world coords)
+	 */
+	draw(layer, rect) {
+		const renderer = this.renderer;
+		const tileWidth = layer.tilewidth;
+		const tileHeight = layer.tileheight;
+		const cols = layer.cols;
+		const rows = layer.rows;
+
+		// compute the visible region in tile-coord space
+		const startTileX = Math.max(0, Math.floor(rect.pos.x / tileWidth));
+		const startTileY = Math.max(0, Math.floor(rect.pos.y / tileHeight));
+		const endTileX = Math.min(
+			cols,
+			Math.ceil((rect.pos.x + rect.width) / tileWidth),
+		);
+		const endTileY = Math.min(
+			rows,
+			Math.ceil((rect.pos.y + rect.height) / tileHeight),
+		);
+
+		// visible rect entirely outside the layer — nothing to draw
+		if (endTileX <= startTileX || endTileY <= startTileY) {
+			return;
+		}
+
+		// quad world coords
+		const worldX = startTileX * tileWidth;
+		const worldY = startTileY * tileHeight;
+		const worldW = (endTileX - startTileX) * tileWidth;
+		const worldH = (endTileY - startTileY) * tileHeight;
+
+		// visible region size in tile units (matches the quad's [0..1] UV
+		// span — shader recovers the per-fragment tile coord as
+		// `uVisibleStart + uv * uVisibleSize`)
+		const visStartX = startTileX;
+		const visStartY = startTileY;
+		const visSizeX = endTileX - startTileX;
+		const visSizeY = endTileY - startTileY;
+
+		// switch the quad batcher to our custom shader. This flushes any
+		// queued sprite vertices first so they render under their original
+		// shader, then re-binds for our pass.
+		const batcher = renderer.setBatcher("quad", this.shader);
+
+		// upload (or rebind) the index texture through the standard cache
+		// path: dynamic unit allocation, correct `boundTextures` tracking,
+		// per-resource premultiply / filter applied automatically
+		const resource = this._getResource(layer);
+		const indexUnit = batcher.uploadTexture(
+			resource,
+			cols,
+			rows,
+			resource._uploadedVersion !== layer.dataVersion,
+		);
+		resource._uploadedVersion = layer.dataVersion;
+
+		this.shader.setUniform("uTileIndex", indexUnit);
+		this._setV2("uMapSize", cols, rows);
+		this._setV2("uVisibleStart", visStartX, visStartY);
+		this._setV2("uVisibleSize", visSizeX, visSizeY);
+		this._setV2("uCellSize", tileWidth, tileHeight);
+		this.shader.setUniform("uOpacity", layer.getOpacity());
+		this.shader.setUniform(
+			"uTint",
+			layer.tint ? layer.tint.toArray() : DEFAULT_TINT,
+		);
+
+		// one pass per tileset — the shader's uGidRange uniform + discard
+		// hides cells outside the active tileset's GID range
+		const tilesets = layer.tilesets.tilesets;
+		for (let i = 0; i < tilesets.length; i++) {
+			const tileset = tilesets[i];
+			// skip collection-of-image tilesets (caller's eligibility check
+			// should have downgraded these layers to the legacy path, but
+			// guard defensively — we can't compute uTilesetCols for them)
+			if (tileset.isCollection || tileset.image === undefined) {
+				continue;
+			}
+
+			const tsW = tileset.tilewidth;
+			const tsH = tileset.tileheight;
+			const margin = tileset.margin;
+			const spacing = tileset.spacing;
+			const atlasW = tileset.image.width;
+			const atlasH = tileset.image.height;
+
+			// atlas geometry (margin + spacing) — number of tiles per row/col
+			// in the atlas image
+			const atlasCols = Math.max(
+				1,
+				Math.floor((atlasW - margin * 2 + spacing) / (tsW + spacing)),
+			);
+			const atlasRows = Math.max(
+				1,
+				Math.floor((atlasH - margin * 2 + spacing) / (tsH + spacing)),
+			);
+
+			// per-tileset animation lookup. Only allocated/bound for
+			// tilesets that actually have animated tiles — static tilesets
+			// pay neither a texture-unit nor an upload.
+			const tileCount = atlasCols * atlasRows;
+			const animEntry = this._getOrUpdateAnimLookup(tileset, tileCount);
+			if (animEntry !== undefined) {
+				const animUnit = batcher.uploadTexture(
+					animEntry.resource,
+					animEntry.tileCount,
+					1,
+					animEntry.dirty,
+				);
+				animEntry.dirty = false;
+				this.shader.setUniform("uAnimLookup", animUnit);
+				this.shader.setUniform("uAnimSize", animEntry.tileCount);
+			} else {
+				// shader skips the animation lookup when uAnimSize == 0,
+				// but every declared sampler must still point at a valid
+				// unit — reuse the index unit so WebGL's draw-time
+				// validation is happy (the sampler is never read because
+				// of the uAnimSize guard)
+				this.shader.setUniform("uAnimLookup", indexUnit);
+				this.shader.setUniform("uAnimSize", 0);
+			}
+
+			this._setV2("uTileSize", tsW, tsH);
+			this._setV2("uTilesetCols", atlasCols, atlasRows);
+			this._setV2("uInvTilesetSize", 1 / atlasW, 1 / atlasH);
+			this._setV4("uTilesetMargin", margin, margin, spacing, spacing);
+			this._setV2(
+				"uOverflow",
+				Math.max(0, Math.ceil(tsW / tileWidth) - 1),
+				Math.max(0, Math.ceil(tsH / tileHeight) - 1),
+			);
+			this._setV2("uGidRange", tileset.firstgid, tileset.lastgid);
+
+			// emit one screen-quad. The batcher binds the tileset's
+			// TextureAtlas to uSampler (the single-texture fallback path is
+			// active because a custom `GLShader` is bound).
+			batcher.addQuad(
+				tileset.texture,
+				worldX,
+				worldY,
+				worldW,
+				worldH,
+				0,
+				0,
+				1,
+				1,
+				0xffffffff,
+			);
+
+			// flush per-pass so each tileset's draw call uses its own
+			// uniforms (uGidRange / uTileSize / uTilesetCols differ)
+			batcher.flush();
+		}
+	}
+}
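The clamping and atlas-geometry arithmetic in `draw()` can be checked in isolation. A minimal sketch, assuming plain `{ x, y, width, height }` rects instead of the engine's `rect.pos`; `visibleTileWindow`, `atlasGrid`, and `overflowCells` are hypothetical helper names, not melonJS API:

```javascript
// Hypothetical helpers mirroring the arithmetic in `draw()`; not melonJS API.

// Clamp a world-space viewport rect to the layer grid, in tile units.
// Returns null when the viewport misses the layer entirely.
function visibleTileWindow(rect, tileWidth, tileHeight, cols, rows) {
	const startX = Math.max(0, Math.floor(rect.x / tileWidth));
	const startY = Math.max(0, Math.floor(rect.y / tileHeight));
	const endX = Math.min(cols, Math.ceil((rect.x + rect.width) / tileWidth));
	const endY = Math.min(rows, Math.ceil((rect.y + rect.height) / tileHeight));
	if (endX <= startX || endY <= startY) {
		return null;
	}
	return {
		// uVisibleStart / uVisibleSize, in tiles
		startX,
		startY,
		sizeX: endX - startX,
		sizeY: endY - startY,
		// the single quad, in world pixels, snapped to whole tiles
		worldX: startX * tileWidth,
		worldY: startY * tileHeight,
		worldW: (endX - startX) * tileWidth,
		worldH: (endY - startY) * tileHeight,
	};
}

// Tiles per row/column of an atlas image with Tiled-style margin/spacing
function atlasGrid(atlasW, atlasH, tsW, tsH, margin, spacing) {
	const cols = Math.max(1, Math.floor((atlasW - margin * 2 + spacing) / (tsW + spacing)));
	const rows = Math.max(1, Math.floor((atlasH - margin * 2 + spacing) / (tsH + spacing)));
	return { cols, rows, tileCount: cols * rows };
}

// Extra cells a tile spills over its grid cell (the `uOverflow` uniform)
function overflowCells(tsW, tsH, cellW, cellH) {
	return [
		Math.max(0, Math.ceil(tsW / cellW) - 1),
		Math.max(0, Math.ceil(tsH / cellH) - 1),
	];
}
```

For example, a 100×50 viewport at (40, 10) over a 10×10 map of 32 px tiles yields a 4×2-tile window starting at tile (1, 0), drawn as a 128×64 quad at world (32, 0). Because the quad is snapped to whole tiles, `uVisibleStart + uv * uVisibleSize` lands exactly on tile boundaries at the quad edges.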
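The animation lookup is a plain byte table. Texel `localId` holds the current frame's local id split into low/high bytes, and a dirty flag tracks whether a re-upload is needed. A simplified sketch, assuming `animations` is a `Map` from local id straight to the current frame id (the real tileset stores `anim.cur.tileid` on an animation object); `createAnimLookup`, `updateAnimLookup`, and `readLookup` are hypothetical names:

```javascript
// Hypothetical stand-in for the entry built by `_getOrUpdateAnimLookup`:
// one RGBA8 texel per local tile id, initialized to the identity mapping.
function createAnimLookup(tileCount) {
	const data = new Uint8Array(tileCount * 4);
	for (let id = 0; id < tileCount; id++) {
		data[id * 4] = id & 0xff; // R = low byte
		data[id * 4 + 1] = (id >> 8) & 0xff; // G = high byte
	}
	return { data, tileCount, dirty: false };
}

// Simplified input: Map<localId, currentFrameLocalId>
function updateAnimLookup(entry, animations) {
	const data = entry.data;
	for (const [localId, cur] of animations) {
		const off = localId * 4;
		const lo = cur & 0xff;
		const hi = (cur >> 8) & 0xff;
		if (data[off] !== lo || data[off + 1] !== hi) {
			data[off] = lo;
			data[off + 1] = hi;
			entry.dirty = true; // only real changes force a re-upload
		}
	}
	return entry;
}

// What the shader's `texelFetch` + byte decode recovers for a local id
function readLookup(entry, localId) {
	const off = localId * 4;
	return entry.data[off] | (entry.data[off + 1] << 8);
}
```

Because the flag only flips on an actual byte change, frames in which no animation advanced skip the texture upload entirely.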
diff --git a/packages/melonjs/src/video/webgl/shaders/orthogonal-tmxlayer.frag b/packages/melonjs/src/video/webgl/shaders/orthogonal-tmxlayer.frag
new file mode 100644
index 000000000..aa4f23dae
--- /dev/null
+++ b/packages/melonjs/src/video/webgl/shaders/orthogonal-tmxlayer.frag
@@ -0,0 +1,190 @@
+#version 300 es
+
+// Fragment shader for the orthogonal TMX layer GPU renderer (WebGL2 /
+// GLSL ES 3.00).
+//
+// Per fragment the shader:
+//   1. recovers the world-pixel position from the host UV,
+//   2. walks candidate cells (geometric cell + cells whose oversized,
+//      bottom-aligned tiles could reach this fragment),
+//   3. fetches GIDs from the per-layer index texture, and
+//   4. samples the tileset atlas at the correct sub-region.
+//
+// Fast path: when the tileset has no oversized tiles (`uOverflow == (0, 0)`),
+// the common case for grid-aligned maps, only the geometric cell can
+// hold this fragment's tile, so we skip the candidate loop entirely and
+// make a single `tryRenderCell` call. The branch depends only on uniforms,
+// so it is coherent across the wave, and it replaces up to 25 loop
+// iterations (plus their guard checks) with that one call.
+//
+// Slow path: tiles larger than the cell are bottom-aligned
+// vertically and left-aligned horizontally. Render order is "right-down":
+// later cells end up on top, so the candidate loop scans dy high→low,
+// dx low→high and picks the FIRST match.
+//
+// Index texture encoding (`RGBA8`, one cell per texel):
+//   R = GID low byte
+//   G = GID high byte    (combined: R | (G << 8) = 16-bit GID)
+//   B = flip mask        (bit 0 = H, bit 1 = V, bit 2 = AD)
+//   A = unused
+//
+// Animation lookup (`RGBA8`, 1 row, `tileCount` texels wide): per local
+// tile id, the CURRENT frame's local id, same R/G byte-pair encoding.
+//
+// Why `sampler2D` + float decode rather than `usampler2D`: the engine's
+// multi-texture default shader declares `uSampler0..uSamplerN-1` as
+// `sampler2D` — all of them are active for WebGL's draw-time validation.
+// A `usampler2D`-backed `RGBA8UI` texture cached at any of those units
+// (units 0..15 on a typical 16-unit fragment stage) would mismatch when
+// the default shader next draws sprites, killing every quad with
+// `GL_INVALID_OPERATION`. Staying on regular RGBA8 keeps the cache
+// path coherent — the cost is one `floor(c * 255 + 0.5)` per fetch.
+//
+// `texelFetch` is still used (vs `texture()`) for byte-exact reads —
+// it bypasses interpolation, so the integer byte values come out
+// unmolested even on a normalized-float sampler.
+
+// The engine's `setPrecision` step injects precision declarations for
+// float and int after the `#version` line, using whatever precision the
+// renderer was configured with (`highPrecisionShader` setting on the
+// Application). Individual shader files don't hardcode precision so the
+// engine-wide preference applies.
+
+in vec2 vRegion;
+in vec4 vColor;
+
+uniform sampler2D uSampler;      // tileset atlas (RGBA)
+uniform sampler2D uTileIndex;    // per-layer GID index (RGBA8)
+uniform sampler2D uAnimLookup;   // per-tileset animation table (RGBA8)
+
+uniform vec2 uMapSize;
+uniform vec2 uCellSize;
+uniform vec2 uTileSize;
+uniform vec2 uOverflow;
+uniform vec2 uTilesetCols;
+uniform vec2 uInvTilesetSize;
+uniform vec4 uTilesetMargin;     // (marginX, marginY, spacingX, spacingY)
+uniform vec2 uGidRange;          // (firstgid, lastgid)
+uniform vec2 uVisibleStart;
+uniform vec2 uVisibleSize;
+uniform int  uAnimSize;          // number of entries in uAnimLookup, 0 if disabled
+uniform float uOpacity;
+uniform vec4 uTint;
+
+out vec4 fragColor;
+
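+// Candidate search radius, in cells. Tiles more than MAX_OVERFLOW + 1
+// cells wide or tall are clipped to that extent on the slow path.
+const int MAX_OVERFLOW = 4;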
+
+// Try to render the tile at cell (cx, cy) for the current fragment.
+// Returns true and writes the sampled color to `outColor` when the cell
+// contains a visible, in-range tile whose pixel covers this fragment.
+// Identical logic for both fast and slow paths; GLSL ES compilers
+// generally inline small helpers like this, so sharing it should add no
+// function-call overhead.
+bool tryRenderCell(int cx, int cy, vec2 worldPx, out vec4 outColor) {
+	int mapW = int(uMapSize.x);
+	int mapH = int(uMapSize.y);
+	if (cx < 0 || cx >= mapW || cy < 0 || cy >= mapH) {
+		return false;
+	}
+
+	// `texelFetch` skips filtering — the 8-bit channel values come back
+	// as normalized floats, decoded to byte ints below
+	vec4 cellF = texelFetch(uTileIndex, ivec2(cx, cy), 0);
+	uvec4 cell = uvec4(cellF * 255.0 + 0.5);
+	int firstGid = int(uGidRange.x);
+	int gid = int(cell.r) | (int(cell.g) << 8);
+	if (gid < firstGid || gid > int(uGidRange.y)) {
+		return false;
+	}
+
+	vec2 tileWorldOrigin = vec2(
+		float(cx) * uCellSize.x,
+		(float(cy) + 1.0) * uCellSize.y - uTileSize.y
+	);
+	vec2 inTile = (worldPx - tileWorldOrigin) / uTileSize;
+	if (inTile.x < 0.0 || inTile.x >= 1.0 || inTile.y < 0.0 || inTile.y >= 1.0) {
+		return false;
+	}
+
+	// flip mask + axis-swap trick: see TMX shader-path flip spec.
+	// AD performs a transpose; with AD set, H and V swap their effective
+	// axes (matches the legacy `buildFlipTransform`).
+	int flipMask = int(cell.b);
+	float flipH = float(flipMask & 1);
+	float flipV = float((flipMask >> 1) & 1);
+	float flipAD = float((flipMask >> 2) & 1);
+	inTile = mix(inTile, inTile.yx, flipAD);
+	float effH = mix(flipH, flipV, flipAD);
+	float effV = mix(flipV, flipH, flipAD);
+	inTile.x = mix(inTile.x, 1.0 - inTile.x, effH);
+	inTile.y = mix(inTile.y, 1.0 - inTile.y, effV);
+
+	int localId = gid - firstGid;
+
+	// animation: if the tileset has animated tiles, swap the local id for
+	// its current frame's id via the lookup texture (CPU updates the
+	// lookup in lockstep with `tileset.update(dt)`).
+	if (uAnimSize > 0) {
+		vec4 animF = texelFetch(uAnimLookup, ivec2(localId, 0), 0);
+		uvec4 animTexel = uvec4(animF * 255.0 + 0.5);
+		localId = int(animTexel.r) | (int(animTexel.g) << 8);
+	}
+
+	float row = floor(float(localId) / uTilesetCols.x);
+	float col = float(localId) - row * uTilesetCols.x;
+	vec2 tileOriginPx = uTilesetMargin.xy
+		+ vec2(col, row) * (uTileSize + uTilesetMargin.zw);
+	vec2 texelPx = tileOriginPx + inTile * uTileSize;
+	vec2 texelUV = texelPx * uInvTilesetSize;
+
+	vec4 sampled = texture(uSampler, texelUV);
+	if (sampled.a <= 0.0) {
+		return false;
+	}
+	outColor = sampled;
+	return true;
+}
+
+void main(void) {
+	vec2 tileCoord = uVisibleStart + vRegion * uVisibleSize;
+	vec2 geomCell = floor(tileCoord);
+	vec2 worldPx = tileCoord * uCellSize;
+
+	int gx = int(geomCell.x);
+	int gy = int(geomCell.y);
+	int overflowX = int(uOverflow.x + 0.5);
+	int overflowY = int(uOverflow.y + 0.5);
+
+	vec4 result;
+
+	// Fast path: tiles fit the cell exactly (the common case) — only the
+	// geometric cell can contain this fragment's tile.
+	if (overflowX == 0 && overflowY == 0) {
+		if (!tryRenderCell(gx, gy, worldPx, result)) {
+			discard;
+		}
+		result.a *= uOpacity;
+		fragColor = result * uTint;
+		return;
+	}
+
+	// Slow path: oversized tiles. Walk candidates dy high→low, dx low→high
+	// and pick the FIRST match — render order is "right-down" so later
+	// cells go on top.
+	bool found = false;
+	for (int idy = 0; idy <= MAX_OVERFLOW; idy++) {
+		int dy = MAX_OVERFLOW - idy;
+		if (dy > overflowY) continue;
+		for (int dx = 0; dx <= MAX_OVERFLOW; dx++) {
+			if (dx > overflowX) break;
+			if (tryRenderCell(gx - dx, gy + dy, worldPx, result)) {
+				found = true;
+				break;
+			}
+		}
+		if (found) break;
+	}
+	if (!found) discard;
+	result.a *= uOpacity;
+	fragColor = result * uTint;
+}
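For debugging, the shader's cell decode can be mirrored on the CPU. A sketch, assuming the layout the RGBA8 view implies: two `Uint16` entries per cell, GID first and flip mask second, so on a little-endian host the bytes land as R/G = GID and B = flip bits. `decodeCell` and `isLittleEndian` are hypothetical helpers. The `/ 255` then `* 255 + 0.5` round trip imitates what the normalized `sampler2D` fetch does:

```javascript
// True when the host stores multi-byte integers low byte first, which the
// R = low / G = high decode below depends on.
function isLittleEndian() {
	return new Uint8Array(new Uint16Array([1]).buffer)[0] === 1;
}

// Hypothetical CPU mirror of the shader's index-texture decode.
// layerData: Uint16Array with [gid, flipMask] per cell (assumed layout).
function decodeCell(layerData, cols, cx, cy) {
	const bytes = new Uint8Array(layerData.buffer, layerData.byteOffset, layerData.byteLength);
	const off = (cy * cols + cx) * 4;
	// RGBA8 texels reach the shader as byte / 255; recover the byte exactly
	const roundTrip = (byte) => Math.floor((byte / 255) * 255 + 0.5);
	const r = roundTrip(bytes[off]);
	const g = roundTrip(bytes[off + 1]);
	const b = roundTrip(bytes[off + 2]);
	return {
		gid: r | (g << 8),
		flipH: (b & 1) !== 0,
		flipV: ((b >> 1) & 1) !== 0,
		flipAD: ((b >> 2) & 1) !== 0,
	};
}
```

The sketch passes `byteOffset`/`byteLength` so it also works on a `subarray` view. The diff's `new Uint8Array(layer.layerData.buffer)` appears to assume the layer owns its whole buffer. In the shader, the `uGidRange` check follows this decode, and GID 0 (an empty cell) is always below any tileset's `firstgid`.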
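The in-tile flip and atlas lookup at the end of `tryRenderCell` translate directly to JavaScript, which makes the H/V/AD interplay easy to check. A sketch with hypothetical names (`applyFlip`, `atlasTexelPx`). `params` bundles what the shader receives as `uTilesetCols.x`, `uTileSize`, and `uTilesetMargin`:

```javascript
// Mirror of the shader's flip handling: AD transposes first, then H and V
// swap which axis they mirror when AD is set.
function applyFlip(u, v, flipMask) {
	const h = flipMask & 1;
	const vf = (flipMask >> 1) & 1;
	const ad = (flipMask >> 2) & 1;
	let x = ad ? v : u;
	let y = ad ? u : v;
	const effH = ad ? vf : h;
	const effV = ad ? h : vf;
	if (effH) x = 1 - x;
	if (effV) y = 1 - y;
	return [x, y];
}

// Atlas pixel for a point inside tile `localId` (after animation remap).
function atlasTexelPx(localId, inTile, params) {
	const { tilesetCols, tileW, tileH, margin, spacing } = params;
	const row = Math.floor(localId / tilesetCols);
	const col = localId - row * tilesetCols;
	return [
		margin + col * (tileW + spacing) + inTile[0] * tileW,
		margin + row * (tileH + spacing) + inTile[1] * tileH,
	];
}
```

Multiplying the returned pixel position by the inverse atlas size (`uInvTilesetSize` in the shader) gives the UV used to sample `uSampler`.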
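The slow path's candidate ordering is the subtle part. With row-major ("right-down") draw order, the first hit must be the latest-drawn cell. The sketch below enumerates candidates in the same order as the nested GLSL loops (`candidateCells` and `drawIndex` are hypothetical names), so one can check that their row-major draw indices strictly decrease:

```javascript
const MAX_OVERFLOW = 4; // same cap as the shader constant

// Cells whose bottom-aligned, left-aligned tile could cover a fragment in
// cell (gx, gy), listed in the order the shader tests them.
function candidateCells(gx, gy, overflowX, overflowY) {
	const out = [];
	for (let idy = 0; idy <= MAX_OVERFLOW; idy++) {
		const dy = MAX_OVERFLOW - idy;
		if (dy > overflowY) continue;
		for (let dx = 0; dx <= MAX_OVERFLOW; dx++) {
			if (dx > overflowX) break;
			out.push([gx - dx, gy + dy]);
		}
	}
	return out;
}

// Row-major draw index: a higher index is drawn later, i.e. on top.
function drawIndex(cell, mapCols) {
	return cell[1] * mapCols + cell[0];
}
```

For example, `candidateCells(5, 5, 1, 1)` yields `[5,6], [4,6], [5,5], [4,5]`. With `overflowX = 6`, only five horizontal candidates are tested, so tiles wider than five cells lose their far portion.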
diff --git a/packages/melonjs/src/video/webgl/shaders/orthogonal-tmxlayer.vert b/packages/melonjs/src/video/webgl/shaders/orthogonal-tmxlayer.vert
new file mode 100644
index 000000000..879fa4a40
--- /dev/null
+++ b/packages/melonjs/src/video/webgl/shaders/orthogonal-tmxlayer.vert
@@ -0,0 +1,27 @@
+#version 300 es
+
+// Vertex shader for the orthogonal TMX layer GPU renderer.
+//
+// Matches the quad batcher's vertex layout (`aVertex`, `aRegion`, `aColor`,
+// `uProjectionMatrix`) so the standard `setBatcher("quad", this.shader)` +
+// `addQuad()` flow drives it like any other quad. Same attribute names,
+// same uniforms — just expressed in GLSL ES 3.00 (`in`/`out` in place of
+// `attribute`/`varying`) so the program can pair with the 3.00 fragment
+// shader, which reads its lookup textures with `texelFetch` on plain
+// `sampler2D` samplers (not `usampler2D`).
+
+in vec2 aVertex;
+in vec2 aRegion;
+in vec4 aColor;
+
+uniform mat4 uProjectionMatrix;
+
+out vec2 vRegion;
+out vec4 vColor;
+
+void main(void) {
+	gl_Position = uProjectionMatrix * vec4(aVertex, 0.0, 1.0);
+	// premultiplied-alpha + bgra → rgba swap, same convention the batcher
+	// uses for its default sprite shader
+	vColor = vec4(aColor.bgr * aColor.a, aColor.a);
+	vRegion = aRegion;
+}
diff --git a/packages/melonjs/src/video/webgl/utils/attributes.js b/packages/melonjs/src/video/webgl/utils/attributes.js
index 1534f623a..ca5b476bd 100644
--- a/packages/melonjs/src/video/webgl/utils/attributes.js
+++ b/packages/melonjs/src/video/webgl/utils/attributes.js
@@ -1,13 +1,27 @@
 /**
+ * Pick out every vertex attribute name from a shader source, regardless of
+ * GLSL version. GLSL 1.00 marks attributes with the `attribute` storage
+ * qualifier; GLSL ES 3.00 reuses `in` at file scope for the same purpose
+ * (and `in` inside function parameter lists, which we exclude by requiring
+ * the qualifier to start at the beginning of a line). Missing the 3.00
+ * form leaves the program with no bound vertex data, so its triangles
+ * silently degenerate.
  * @ignore
  */
 export function extractAttributes(gl, shader) {
 	const attributes = {};
-	const attrRx = /attribute\s+\w+\s+(\w+)/g;
+	// Match either `attribute <type> <name>` (GLSL 1.00) or `in <type>
+	// <name>` (GLSL 3.00), with an optional precision / aux qualifier in
+	// front of the type (`highp vec3`, `mediump vec2`, etc.). Whitespace
+	// inside the declaration is restricted to `[ \t]+` (horizontal only)
+	// rather than `\s+` — letting `\s` also match `\n` made the regex
+	// ambiguous around line boundaries and CodeQL flagged it as a
+	// polynomial-time ReDoS risk on shader sources with many newlines.
+	// Attribute declarations fit on one line in practice, so the
+	// single-line restriction loses nothing.
+	const attrRx = /(?:^|\n)[ \t]*(?:attribute|in)[ \t]+(?:\w+[ \t]+)+(\w+)/g;
 	let match;
 	let i = 0;
 
-	// Detect all attribute names
 	while ((match = attrRx.exec(shader.vertex))) {
 		attributes[match[1]] = i++;
 	}
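The updated pattern can be exercised directly on both shader dialects. A small sketch that wraps the same regular expression in a hypothetical `attributeNames` helper. The helper builds a fresh `RegExp` per call, so the `/g` flag's `lastIndex` never carries over between sources:

```javascript
// Same pattern as `extractAttributes`; built per call to reset lastIndex.
function attributeNames(src) {
	const attrRx = /(?:^|\n)[ \t]*(?:attribute|in)[ \t]+(?:\w+[ \t]+)+(\w+)/g;
	const names = [];
	let match;
	while ((match = attrRx.exec(src))) {
		names.push(match[1]);
	}
	return names;
}
```

On `"in vec2 aVertex;\nin highp vec2 aRegion;"` this returns `["aVertex", "aRegion"]`, and a parameter list such as `void f(in vec2 p)` contributes nothing. One limitation: a prose line inside a `/* ... */` block that begins with the word `in` would also match, while `//` comments are safe.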
diff --git a/packages/melonjs/src/video/webgl/utils/precision.js b/packages/melonjs/src/video/webgl/utils/precision.js
index 52c6918fc..c106e2dd8 100644
--- a/packages/melonjs/src/video/webgl/utils/precision.js
+++ b/packages/melonjs/src/video/webgl/utils/precision.js
@@ -1,13 +1,39 @@
 /**
- * set precision for the fiven shader source
+ * set precision for the given shader source
  * won't do anything if the precision is already specified
  * @ignore
  */
 export function setPrecision(src, precision) {
-	if (src.substring(0, 9) !== "precision") {
-		return "precision " + precision + " float;\n" + src;
+	// Skip injection when the shader already declares precision near the
+	// top — either at byte 0 (GLSL 1.00) or right after the `#version`
+	// directive (GLSL 3.00). The substring check we used to do only caught
+	// the byte-0 case, so a 3.00 shader with its own `precision highp
+	// float;` after `#version` was getting a second precision line injected
+	// (still legal GLSL but at best wasteful and at worst overriding user
+	// intent).
+	if (/^\s*(?:#version[^\n]*\n)?\s*precision\b/.test(src)) {
+		return src;
 	}
-	return src;
+	// WebGL2 GLSL 3.00 requires `#version 300 es` as the very first line,
+	// before any other directive. Preserve it and insert precisions after.
+	// For 3.00 shaders we inject precisions for float + int so individual
+	// shader files don't have to hardcode them; the engine's chosen
+	// precision applies uniformly. GLSL 1.00 shaders only need a float
+	// precision declaration (samplers and ints have default precisions
+	// there).
+	if (src.substring(0, 8) === "#version") {
+		const inject =
+			"\nprecision " + precision + " float;\nprecision " + precision + " int;";
+		const nl = src.indexOf("\n");
+		// A single-line shader (just `#version 300 es` with no trailing
+		// newline) has no `\n` to anchor the insert on; append at end —
+		// the `#version` directive must remain on its own first line.
+		if (nl < 0) {
+			return src + inject;
+		}
+		return src.substring(0, nl) + inject + src.substring(nl);
+	}
+	return "precision " + precision + " float;\n" + src;
 }
 
 /**
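The three branches of `setPrecision` are easiest to see on concrete inputs. The sketch below restates the function under a hypothetical name (`injectPrecision`) so it runs standalone. The behavior follows the logic in the diff:

```javascript
function injectPrecision(src, precision) {
	// already declared at the top (GLSL 1.00) or right after #version (3.00)
	if (/^\s*(?:#version[^\n]*\n)?\s*precision\b/.test(src)) {
		return src;
	}
	if (src.substring(0, 8) === "#version") {
		// GLSL ES 3.00: #version must stay first; insert after its line
		const inject = "\nprecision " + precision + " float;\nprecision " + precision + " int;";
		const nl = src.indexOf("\n");
		return nl < 0 ? src + inject : src.substring(0, nl) + inject + src.substring(nl);
	}
	// GLSL ES 1.00: float is the type that needs a declaration
	// (fragment shaders have no default float precision)
	return "precision " + precision + " float;\n" + src;
}
```

For instance, `"#version 300 es\nvoid main(){}"` becomes `"#version 300 es\nprecision highp float;\nprecision highp int;\nvoid main(){}"` with `"highp"`, while a source that already declares precision right after `#version` is returned untouched.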
diff --git a/packages/melonjs/src/video/webgl/utils/uniforms.js b/packages/melonjs/src/video/webgl/utils/uniforms.js
index 043c99722..7051d9747 100644
--- a/packages/melonjs/src/video/webgl/utils/uniforms.js
+++ b/packages/melonjs/src/video/webgl/utils/uniforms.js
@@ -15,13 +15,93 @@ const fnHash = {
 	ivec2: "2iv",
 	ivec3: "3iv",
 	ivec4: "4iv",
+	uvec2: "2uiv",
+	uvec3: "3uiv",
+	uvec4: "4uiv",
 	mat2: "Matrix2fv",
 	mat3: "Matrix3fv",
 	mat4: "Matrix4fv",
 	sampler2D: "1i",
+	// WebGL2 integer-typed samplers — bound to a unit just like a
+	// `sampler2D`; the GLSL `usampler2D` / `isampler2D` types let the
+	// shader read raw integer values via `texelFetch` instead of the
+	// normalized-float `texture()` path.
+	usampler2D: "1i",
+	isampler2D: "1i",
 };
 
 /**
+ * Compare a freshly-incoming uniform value to the last value we sent for
+ * the same uniform. Scalars compare by `===`; vec/mat values compare
+ * element-wise (callers commonly reuse a scratch `Float32Array`, so
+ * reference equality would miss every change).
+ * @ignore
+ */
+function valuesMatch(cached, val) {
+	if (cached === undefined) {
+		return false;
+	}
+	if (
+		val !== null &&
+		typeof val === "object" &&
+		typeof val.length === "number"
+	) {
+		if (cached.length !== val.length) {
+			return false;
+		}
+		for (let i = 0; i < val.length; i++) {
+			if (cached[i] !== val[i]) {
+				return false;
+			}
+		}
+		return true;
+	}
+	return cached === val;
+}
+
+/**
+ * Capture the current value into the cache slot so future `setUniform`
+ * calls can short-circuit. Reuses the existing slot (and its allocation)
+ * when the length matches — only a length change or a first capture
+ * allocates a fresh array.
+ * @ignore
+ */
+function captureValue(prev, val) {
+	if (
+		val === null ||
+		typeof val !== "object" ||
+		typeof val.length !== "number"
+	) {
+		return val;
+	}
+	if (
+		prev !== undefined &&
+		typeof prev === "object" &&
+		typeof prev.length === "number" &&
+		prev.length === val.length
+	) {
+		for (let i = 0; i < val.length; i++) {
+			prev[i] = val[i];
+		}
+		return prev;
+	}
+	return typeof val.slice === "function" ? val.slice() : Array.from(val);
+}
+
+/**
+ * Build the `uniforms` proxy object for a compiled shader program.
+ *
+ * Each detected uniform gets a defineProperty getter (returns its
+ * `WebGLUniformLocation`) and a setter that pushes the value to GL. The
+ * setter caches the last value it sent and skips the underlying
+ * `gl.uniform*` call when the incoming value matches — uniform writes are
+ * cheap individually, but a typical per-frame draw pass sets a dozen of
+ * them per shader, and most are layer-lifetime constants (`uMapSize`,
+ * `uCellSize`, `uOpacity`, projection matrix on idle frames, etc.).
+ *
+ * Cache scope is per-shader: each `GLShader` calls `extractUniforms` once
+ * and gets its own closure-captured `cache` map, so caches don't leak
+ * across programs.
  * @ignore
  */
 export function extractUniforms(gl, shader) {
@@ -30,6 +110,10 @@ export function extractUniforms(gl, shader) {
 	const uniformsData = {};
 	const descriptor = {};
 	const locations = {};
+	// last value sent to GL for each uniform, keyed by name. Filled lazily
+	// on first set; reused (in place) on subsequent sets of the same
+	// length to avoid steady-state allocation.
+	const cache = {};
 	let match;
 
 	// Detect all uniform names and types
@@ -59,6 +143,10 @@ export function extractUniforms(gl, shader) {
 					 * A generic setter for uniform matrices
 					 */
 					return function (val) {
+						if (valuesMatch(cache[name], val)) {
+							return;
+						}
+						cache[name] = captureValue(cache[name], val);
 						gl[fn](locations[name], false, val);
 					};
 				} else {
@@ -70,6 +158,10 @@ export function extractUniforms(gl, shader) {
 						if (val.length && !/v$/.test(fn)) {
 							fnv += "v";
 						}
+						if (valuesMatch(cache[name], val)) {
+							return;
+						}
+						cache[name] = captureValue(cache[name], val);
 						gl[fnv](locations[name], val);
 					};
 				}
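The reason `captureValue` copies instead of keeping a reference shows up with the scratch-buffer pattern used by `_setV2` / `_setV4`, where the caller mutates the same `Float32Array` between calls. A self-contained sketch reusing the two helpers' logic from the diff, plus a hypothetical `makeCachedUniform` wrapper with a counting stub in place of `gl.uniform*`:

```javascript
function isArrayLike(val) {
	return val !== null && typeof val === "object" && typeof val.length === "number";
}

// Element-wise for vectors/matrices, identity for scalars (as in the diff)
function valuesMatch(cached, val) {
	if (cached === undefined) return false;
	if (isArrayLike(val)) {
		if (cached.length !== val.length) return false;
		for (let i = 0; i < val.length; i++) {
			if (cached[i] !== val[i]) return false;
		}
		return true;
	}
	return cached === val;
}

// Copy into the existing slot when lengths match; otherwise allocate once
function captureValue(prev, val) {
	if (!isArrayLike(val)) return val;
	if (isArrayLike(prev) && prev.length === val.length) {
		for (let i = 0; i < val.length; i++) prev[i] = val[i];
		return prev;
	}
	return typeof val.slice === "function" ? val.slice() : Array.from(val);
}

// Hypothetical wrapper: `upload` stands in for a gl.uniform* call
function makeCachedUniform(upload) {
	let cached;
	return function set(val) {
		if (valuesMatch(cached, val)) return false; // skipped
		cached = captureValue(cached, val);
		upload(val);
		return true;
	};
}
```

Had the cache stored `val` itself, a call made after mutating the scratch buffer in place would compare the buffer with itself and wrongly skip the upload.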
diff --git a/packages/melonjs/src/video/webgl/webgl_renderer.js b/packages/melonjs/src/video/webgl/webgl_renderer.js
index 290d33726..fd60e4a98 100644
--- a/packages/melonjs/src/video/webgl/webgl_renderer.js
+++ b/packages/melonjs/src/video/webgl/webgl_renderer.js
@@ -27,6 +27,7 @@ import PrimitiveBatcher from "./batchers/primitive_batcher";
 import QuadBatcher from "./batchers/quad_batcher";
 import RadialGradientEffect from "./effects/radialGradient.js";
 import { createLightUniformScratch, packLights } from "./lighting/pack.ts";
+import OrthogonalTMXLayerGPURenderer from "./renderers/tmxlayer/orthogonal.js";
 import { getMaxShaderPrecision } from "./utils/precision.js";
 
 /**
@@ -348,6 +349,11 @@ export default class WebGLRenderer extends Renderer {
 		// clear gl context
 		this.clear();
 
+		// drop every per-layer GPU tilemap texture — tile layers churn on
+		// game reset and we'd otherwise leak a WebGLTexture per layer
+		// across each level transition
+		this._orthogonalTMXGPURenderer?.reset();
+
 		// initial viewport size
 		this.setViewport();
 
@@ -399,6 +405,70 @@ export default class WebGLRenderer extends Renderer {
 			});
 			this._lightAtlas = undefined;
 		}
+
+		// Context-loss-only cleanup for the TMX GPU renderer: the cached
+		// `GLShader` and per-layer GL textures reference the OLD context
+		// and are invalid. On a regular `GAME_RESET` (context still
+		// valid) we already dropped per-layer textures via `.reset()`
+		// above and keep the renderer instance so its compiled shader
+		// program survives across level transitions instead of leaking a
+		// `WebGLProgram` per reset and re-paying the compile cost.
+		if (this.isContextValid === false) {
+			this._orthogonalTMXGPURenderer = undefined;
+		}
+	}
+
+	/**
+	 * Draw a TMX tile layer through whichever path the layer's `renderMode`
+	 * resolves to. WebGL2-eligible layers (`renderMode === "shader"`) take
+	 * the procedural shader path — one quad per tileset, GID lookup in a
+	 * per-layer data texture. All other layers fall through to the base
+	 * `Renderer.drawTileLayer` (preRender blit or per-tile loop).
+	 * @param {object} layer - the TMXLayer to draw
+	 * @param {object} rect - the visible region in world coords
+	 */
+	drawTileLayer(layer, rect) {
+		if (layer.renderMode === "shader") {
+			const gpu = this._getTMXGPURendererFor(layer.orientation);
+			if (gpu !== undefined) {
+				gpu.draw(layer, rect);
+				return;
+			}
+		}
+		super.drawTileLayer(layer, rect);
+	}
+
+	/**
+	 * Lazy-init the orientation-specific GPU tilemap renderer.
+	 * @param {string} orientation
+	 * @returns {object|undefined}
+	 * @ignore
+	 */
+	_getTMXGPURendererFor(orientation) {
+		if (orientation === "orthogonal") {
+			if (this._orthogonalTMXGPURenderer === undefined) {
+				try {
+					this._orthogonalTMXGPURenderer = new OrthogonalTMXLayerGPURenderer(
+						this,
+					);
+				} catch (err) {
+					// shader compile / link failure on this driver — disable
+					// the GPU path permanently for the rest of the session
+					// (the next `drawTileLayer` call falls through to the
+					// legacy renderer). Stored as `null` so subsequent
+					// attempts short-circuit without re-trying the failing
+					// compile every frame.
+					console.warn(
+						"melonJS: GPU tilemap shader failed to compile, falling back to legacy renderer",
+						err,
+					);
+					this._orthogonalTMXGPURenderer = null;
+				}
+			}
+			return this._orthogonalTMXGPURenderer || undefined;
+		}
+		// isometric / staggered / hexagonal: no GPU path yet (planned as a
+		// later phase); the caller falls back to `super.drawTileLayer`
+		return undefined;
 	}
 
 	/**
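`_getTMXGPURendererFor` uses a three-state slot: `undefined` (not tried yet), an instance, or `null` (construction failed, so stop retrying). Context loss resets it to `undefined`. A stripped-down sketch of that pattern with hypothetical names (`LazyGPUPath`, `factory`, `drawLayer`), independent of melonJS:

```javascript
class LazyGPUPath {
	constructor(factory) {
		this.factory = factory; // e.g. compiles the shader; may throw
		this.slot = undefined; // undefined = untried, null = failed
		this.attempts = 0;
	}

	get() {
		if (this.slot === undefined) {
			this.attempts++;
			try {
				this.slot = this.factory();
			} catch (err) {
				console.warn("GPU path unavailable, using fallback", err);
				this.slot = null; // remember the failure; don't retry per frame
			}
		}
		return this.slot || undefined;
	}

	// after context loss the cached program is invalid: allow a fresh attempt
	onContextLost() {
		this.slot = undefined;
	}
}

// Dispatch in the spirit of `drawTileLayer`: GPU when available, else fallback.
function drawLayer(path, layer, drawFallback) {
	if (layer.renderMode === "shader") {
		const gpu = path.get();
		if (gpu !== undefined) {
			return gpu.draw(layer);
		}
	}
	return drawFallback(layer);
}
```

Storing `null` rather than leaving the slot `undefined` is what keeps a failing compile from being retried on every frame.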
diff --git a/packages/melonjs/tests/texture-resource.spec.js b/packages/melonjs/tests/texture-resource.spec.js
new file mode 100644
index 000000000..e808ed11c
--- /dev/null
+++ b/packages/melonjs/tests/texture-resource.spec.js
@@ -0,0 +1,256 @@
+import { afterAll, beforeAll, describe, expect, it } from "vitest";
+import { boot, video } from "../src/index.js";
+import {
+	BufferTextureResource,
+	TextureResource,
+} from "../src/video/texture/resource.js";
+import WebGLRenderer from "../src/video/webgl/webgl_renderer.js";
+
+describe("TextureResource", () => {
+	it("requires subclasses to implement upload()", () => {
+		const r = new TextureResource({ width: 4, height: 4 });
+		expect(() => {
+			r.upload(null, 0);
+		}).toThrow();
+	});
+
+	it("exposes a TextureAtlas-shaped surface for the cache", () => {
+		const r = new TextureResource({
+			width: 16,
+			height: 8,
+			premultipliedAlpha: false,
+			repeat: "repeat",
+			filter: 9728, // gl.NEAREST
+		});
+		expect(r.width).toBe(16);
+		expect(r.height).toBe(8);
+		expect(r.premultipliedAlpha).toBe(false);
+		expect(r.repeat).toBe("repeat");
+		expect(r.filter).toBe(9728);
+		// minimal `TextureAtlas`-shaped surface the cache walks via
+		// `sources.get(activeAtlas)`
+		expect(r.sources).toBeInstanceOf(Map);
+		expect(r.sources.size).toBe(1);
+		expect(r.sources.get(r.activeAtlas)).toBe(r);
+		expect(r.getTexture()).toBe(r);
+	});
+
+	it("defaults to safe values when options are omitted", () => {
+		const r = new TextureResource({ width: 4, height: 4 });
+		expect(r.premultipliedAlpha).toBe(false);
+		expect(r.repeat).toBe("no-repeat");
+		expect(r.filter).toBeUndefined();
+	});
+});
+
+describe("BufferTextureResource", () => {
+	it("stores the data buffer alongside the resource metadata", () => {
+		const data = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]);
+		const r = new BufferTextureResource(data, {
+			width: 2,
+			height: 1,
+			premultipliedAlpha: false,
+			filter: 9728,
+		});
+		expect(r.data).toBe(data);
+		expect(r.width).toBe(2);
+		expect(r.height).toBe(1);
+		expect(r.premultipliedAlpha).toBe(false);
+		// participates in the cache like any other texture-shaped object
+		expect(r.getTexture()).toBe(r);
+		expect(r.sources.get("default")).toBe(r);
+	});
+});
+
+describe("BufferTextureResource — WebGL2 integration", () => {
+	let renderer;
+
+	beforeAll(async () => {
+		await boot();
+		try {
+			video.init(64, 64, {
+				parent: "screen",
+				renderer: video.WEBGL,
+			});
+			if (
+				video.renderer instanceof WebGLRenderer &&
+				video.renderer.WebGLVersion === 2
+			) {
+				renderer = video.renderer;
+			}
+		} catch {
+			// CI runners without GL acceleration can't construct a WebGL2
+			// renderer; tests below mark themselves skipped at runtime
+		}
+	});
+
+	afterAll(() => {
+		try {
+			video.init(64, 64, {
+				parent: "screen",
+				renderer: video.AUTO,
+			});
+		} catch {
+			// nothing to restore if boot/init never succeeded
+		}
+	});
+
+	const requireWebGL2 = (ctx) => {
+		if (renderer === undefined) {
+			ctx.skip("WebGL2 renderer not available in this environment");
+		}
+	};
+
+	/**
+	 * Regression for the previously-reserved fixed texture unit: a
+	 * `BufferTextureResource` must receive its unit from the standard
+	 * cache allocator, not from a hardcoded slot. With dynamic allocation
+	 * the unit number depends on what else is in the cache, but it must
+	 * (a) be a valid non-negative integer, and (b) not collide with the
+	 * unit of any other live cached texture.
+	 */
+	it("gets a dynamically-allocated unit from the texture cache", (ctx) => {
+		requireWebGL2(ctx);
+		const batcher = renderer.setBatcher("quad");
+
+		const r = new BufferTextureResource(new Uint8Array([1, 0, 0, 0]), {
+			width: 1,
+			height: 1,
+			premultipliedAlpha: false,
+			filter: renderer.gl.NEAREST,
+		});
+
+		const unit = batcher.uploadTexture(r, 1, 1);
+		expect(unit).toBeGreaterThanOrEqual(0);
+		expect(Number.isInteger(unit)).toBe(true);
+
+		// the cache must report the same unit for the resource — i.e. it's
+		// really tracked, not just a one-shot bind
+		expect(renderer.cache.getUnit(r)).toBe(unit);
+
+		// cleanup
+		batcher.deleteTexture2D(r);
+	});
+
+	/**
+	 * Two resources must receive distinct units (no collision). Catches
+	 * a hypothetical regression where the cache key collapses or where
+	 * a resource is treated as "the same" as another for unit purposes.
+	 */
+	it("assigns distinct units to distinct resources", (ctx) => {
+		requireWebGL2(ctx);
+		const batcher = renderer.setBatcher("quad");
+
+		const a = new BufferTextureResource(new Uint8Array([1, 0, 0, 0]), {
+			width: 1,
+			height: 1,
+			premultipliedAlpha: false,
+		});
+		const b = new BufferTextureResource(new Uint8Array([2, 0, 0, 0]), {
+			width: 1,
+			height: 1,
+			premultipliedAlpha: false,
+		});
+
+		const ua = batcher.uploadTexture(a, 1, 1);
+		const ub = batcher.uploadTexture(b, 1, 1);
+		expect(ua).not.toBe(ub);
+
+		batcher.deleteTexture2D(a);
+		batcher.deleteTexture2D(b);
+	});
+
+	/**
+	 * The resource path must apply its own `premultipliedAlpha` setting
+	 * at upload time. With premultiply enabled and A=0 the driver would
+	 * multiply RGB by zero and wipe the data; the resource opts out via
+	 * its constructor flag and the GIDs survive the round-trip.
+	 *
+	 * Read-back via a tiny FBO: write 2 texels with R=1 and R=255, A=0,
+	 * confirm the bytes come back intact.
+	 */
+	it("respects the resource's premultipliedAlpha=false flag on upload", (ctx) => {
+		requireWebGL2(ctx);
+		const gl = renderer.gl;
+		const batcher = renderer.setBatcher("quad");
+
+		// cell 0: R=1, A=0  /  cell 1: R=255, A=0
+		const data = new Uint8Array([1, 0, 0, 0, 255, 0, 0, 0]);
+		const r = new BufferTextureResource(data, {
+			width: 2,
+			height: 1,
+			premultipliedAlpha: false,
+			filter: gl.NEAREST,
+		});
+
+		batcher.uploadTexture(r, 2, 1);
+		// the cache parked the resource at its assigned unit; fetch that unit's
+		// GL texture handle so it can be attached to an FBO for read-back
+		const texture = batcher.getTexture2D(renderer.cache.getUnit(r));
+
+		const fbo = gl.createFramebuffer();
+		gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
+		gl.framebufferTexture2D(
+			gl.FRAMEBUFFER,
+			gl.COLOR_ATTACHMENT0,
+			gl.TEXTURE_2D,
+			texture,
+			0,
+		);
+		expect(gl.checkFramebufferStatus(gl.FRAMEBUFFER)).toBe(
+			gl.FRAMEBUFFER_COMPLETE,
+		);
+
+		const pixels = new Uint8Array(2 * 1 * 4);
+		gl.readPixels(0, 0, 2, 1, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
+		expect(pixels[0]).toBe(1);
+		expect(pixels[4]).toBe(0xff);
+
+		gl.bindFramebuffer(gl.FRAMEBUFFER, null);
+		gl.deleteFramebuffer(fbo);
+		batcher.deleteTexture2D(r);
+	});
+
+	/**
+	 * Re-uploading the same resource via the `force` argument must
+	 * preserve its allocated unit (so the cache stays consistent) and
+	 * the second upload must reflect the latest contents of the buffer.
+	 */
+	it("force-reuploads on demand without changing the allocated unit", (ctx) => {
+		requireWebGL2(ctx);
+		const gl = renderer.gl;
+		const batcher = renderer.setBatcher("quad");
+
+		const data = new Uint8Array([0, 0, 0, 0]);
+		const r = new BufferTextureResource(data, {
+			width: 1,
+			height: 1,
+			premultipliedAlpha: false,
+			filter: gl.NEAREST,
+		});
+
+		const unit1 = batcher.uploadTexture(r, 1, 1);
+		// mutate the underlying buffer and force a re-upload
+		data[0] = 77;
+		const unit2 = batcher.uploadTexture(r, 1, 1, true);
+		expect(unit2).toBe(unit1);
+
+		const texture = batcher.getTexture2D(unit2);
+		const fbo = gl.createFramebuffer();
+		gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
+		gl.framebufferTexture2D(
+			gl.FRAMEBUFFER,
+			gl.COLOR_ATTACHMENT0,
+			gl.TEXTURE_2D,
+			texture,
+			0,
+		);
+		const pixels = new Uint8Array(4);
+		gl.readPixels(0, 0, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
+		expect(pixels[0]).toBe(77);
+
+		gl.bindFramebuffer(gl.FRAMEBUFFER, null);
+		gl.deleteFramebuffer(fbo);
+		batcher.deleteTexture2D(r);
+	});
+});
diff --git a/packages/melonjs/tests/tmxlayer-data.spec.js b/packages/melonjs/tests/tmxlayer-data.spec.js
new file mode 100644
index 000000000..11d337447
--- /dev/null
+++ b/packages/melonjs/tests/tmxlayer-data.spec.js
@@ -0,0 +1,774 @@
+import { beforeAll, describe, expect, it } from "vitest";
+import { boot, TMXTileMap, video } from "../src/index.js";
+import {
+	TMX_CLEAR_BIT_MASK,
+	TMX_FLIP_AD,
+	TMX_FLIP_H,
+	TMX_FLIP_V,
+} from "../src/level/tiled/constants.js";
+import Tile from "../src/level/tiled/TMXTile.js";
+import { imgList } from "../src/loader/cache.js";
+
+// flip-mask bit layout in layerData's G channel (mirrors TMXLayer.js)
+const FLIP_H_BIT = 1 << 0;
+const FLIP_V_BIT = 1 << 1;
+const FLIP_AD_BIT = 1 << 2;
+
+function fakeImage(name, w = 64, h = 64) {
+	const canvas = document.createElement("canvas");
+	canvas.width = w;
+	canvas.height = h;
+	imgList[name] = canvas;
+	return canvas;
+}
+
+const tilesetData = {
+	firstgid: 1,
+	name: "testtiles",
+	tilewidth: 32,
+	tileheight: 32,
+	spacing: 0,
+	margin: 0,
+	tilecount: 4,
+	columns: 2,
+	image: "testtiles.png",
+};
+
+// 4 cols x 3 rows = 12 cells; `data` supplies the 12 GIDs in row-major order
+function buildMapJSON(data) {
+	return {
+		width: 4,
+		height: 3,
+		tilewidth: 32,
+		tileheight: 32,
+		orientation: "orthogonal",
+		renderorder: "right-down",
+		infinite: false,
+		version: "1.10",
+		tiledversion: "1.12.0",
+		tilesets: [tilesetData],
+		layers: [
+			{
+				type: "tilelayer",
+				name: "Background",
+				width: 4,
+				height: 3,
+				data,
+				visible: true,
+				opacity: 1,
+			},
+		],
+	};
+}
+
+function makeLayer(data) {
+	const map = new TMXTileMap("test", buildMapJSON(data));
+	const groups = map.getLayers();
+	return groups[0];
+}
+
+const ALL_ZERO_4x3 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
+
+describe("TMXLayer.layerData (Uint16Array refactor)", () => {
+	beforeAll(() => {
+		boot();
+		video.init(128, 128, {
+			parent: "screen",
+			scale: "auto",
+			renderer: video.CANVAS,
+		});
+		fakeImage("testtiles", 64, 64);
+	});
+
+	describe("Allocation & shape", () => {
+		it("layerData is a Uint16Array (not a 2D Array)", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			expect(layer.layerData).toBeInstanceOf(Uint16Array);
+			expect(Array.isArray(layer.layerData)).toBe(false);
+		});
+
+		it("layerData.length === cols * rows * 2 exactly", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			expect(layer.layerData.length).toBe(layer.cols * layer.rows * 2);
+			expect(layer.layerData.length).toBe(4 * 3 * 2);
+			expect(layer.layerData.length).toBe(24);
+		});
+
+		it("all-zero map fills layerData with zeros (empty cells)", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			for (let i = 0; i < layer.layerData.length; i++) {
+				expect(layer.layerData[i]).toBe(0);
+			}
+		});
+
+		it("cachedTile starts as null (lazy-allocated on the first cellAt hit on a populated cell)", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			expect(layer.cachedTile).toBe(null);
+		});
+
+		it("cachedTile is allocated on first cellAt call that hits a populated cell", () => {
+			const data = [...ALL_ZERO_4x3];
+			data[5] = 1;
+			const layer = makeLayer(data);
+			expect(layer.cachedTile).toBe(null);
+			layer.cellAt(1, 1, false);
+			expect(Array.isArray(layer.cachedTile)).toBe(true);
+			expect(layer.cachedTile.length).toBe(layer.cols * layer.rows);
+		});
+
+		it("cachedTile stays null if cellAt only hits empty cells", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			layer.cellAt(0, 0, false);
+			layer.cellAt(2, 1, false);
+			expect(layer.cachedTile).toBe(null);
+		});
+
+		it("setTile does NOT allocate cachedTile (it writes layerData only; the renderer hot loop bypasses the cache)", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			const tile = new Tile(0, 0, 1, layer.tileset);
+			layer.setTile(tile, 1, 1);
+			expect(layer.cachedTile).toBe(null);
+		});
+
+		it("clearTile does NOT allocate cachedTile", () => {
+			const data = [...ALL_ZERO_4x3];
+			data[0] = 1;
+			const layer = makeLayer(data);
+			layer.clearTile(0, 0);
+			expect(layer.cachedTile).toBe(null);
+		});
+
+		it("dataVersion starts at 0", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			expect(layer.dataVersion).toBe(0);
+		});
+
+		it("row-major layout: cell (x, y) lives at layerData[(y * cols + x) * 2]", () => {
+			// place a unique GID at (3, 2) — the last cell of a 4x3 map
+			const data = [...ALL_ZERO_4x3];
+			data[2 * 4 + 3] = 1; // y=2, x=3 → flat input idx 11 → GID=1
+			const layer = makeLayer(data);
+			const expectedIdx = (2 * layer.cols + 3) * 2; // = 22
+			expect(layer.layerData[expectedIdx]).toBe(1);
+			expect(layer.layerData[expectedIdx + 1]).toBe(0);
+		});
+	});
+
+	describe("Round-trip every flip combination", () => {
+		// 8 combinations of (H, V, AD)
+		const cases = [
+			[false, false, false, 0],
+			[true, false, false, FLIP_H_BIT],
+			[false, true, false, FLIP_V_BIT],
+			[false, false, true, FLIP_AD_BIT],
+			[true, true, false, FLIP_H_BIT | FLIP_V_BIT],
+			[true, false, true, FLIP_H_BIT | FLIP_AD_BIT],
+			[false, true, true, FLIP_V_BIT | FLIP_AD_BIT],
+			[true, true, true, FLIP_H_BIT | FLIP_V_BIT | FLIP_AD_BIT],
+		];
+
+		for (const [flipH, flipV, flipAD, expectedMask] of cases) {
+			const label = `H=${flipH} V=${flipV} AD=${flipAD}`;
+			it(`encodes/decodes flip mask correctly: ${label}`, () => {
+				const layer = makeLayer(ALL_ZERO_4x3);
+				const tileset = layer.tileset;
+				const gid =
+					1 |
+					(flipH ? TMX_FLIP_H : 0) |
+					(flipV ? TMX_FLIP_V : 0) |
+					(flipAD ? TMX_FLIP_AD : 0);
+				const tile = new Tile(0, 0, gid, tileset);
+				layer.setTile(tile, 1, 1);
+
+				// raw byte assertion — the encoding contract
+				const idx = (1 * layer.cols + 1) * 2;
+				expect(layer.layerData[idx]).toBe(1);
+				expect(layer.layerData[idx + 1]).toBe(expectedMask);
+
+				// round-trip through cellAt — returns a view rebuilt from
+				// layerData bytes (not necessarily the same object as `tile`)
+				const got = layer.cellAt(1, 1, false);
+				expect(got).not.toBeNull();
+				expect(got.tileId).toBe(1);
+				expect(got.flippedX).toBe(flipH);
+				expect(got.flippedY).toBe(flipV);
+				expect(got.flippedAD).toBe(flipAD);
+				expect(got.flipped).toBe(flipH || flipV || flipAD);
+
+				// flipped tiles get a currentTransform; unflipped don't
+				if (flipH || flipV || flipAD) {
+					expect(got.currentTransform).not.toBeNull();
+				} else {
+					expect(got.currentTransform).toBeNull();
+				}
+			});
+
+			it(`reconstitutes a Tile from raw bytes correctly (no cache hit): ${label}`, () => {
+				// write raw bytes into layerData without going through setTile,
+				// so cachedTile stays null and cellAt rebuilds from the bytes
+				const layer = makeLayer(ALL_ZERO_4x3);
+				const idx = (2 * layer.cols + 0) * 2;
+				layer.layerData[idx] = 3; // tileId = 3
+				layer.layerData[idx + 1] = expectedMask;
+
+				const got = layer.cellAt(0, 2, false);
+				expect(got).not.toBeNull();
+				expect(got.tileId).toBe(3);
+				expect(got.flippedX).toBe(flipH);
+				expect(got.flippedY).toBe(flipV);
+				expect(got.flippedAD).toBe(flipAD);
+				expect(got.col).toBe(0);
+				expect(got.row).toBe(2);
+			});
+
+			it(`parser path: legacy GID with flip bits decodes correctly: ${label}`, () => {
+				const legacyGid =
+					2 |
+					(flipH ? TMX_FLIP_H : 0) |
+					(flipV ? TMX_FLIP_V : 0) |
+					(flipAD ? TMX_FLIP_AD : 0);
+				// JS bitwise OR yields a signed int32 when bit 31 (H flip) is set,
+				// while Tiled data arrays hold unsigned 32-bit GIDs: `>>> 0` converts
+				const data = [...ALL_ZERO_4x3];
+				data[0] = legacyGid >>> 0; // unsigned form
+				const layer = makeLayer(data);
+
+				expect(layer.layerData[0]).toBe(2);
+				expect(layer.layerData[1]).toBe(expectedMask);
+
+				const got = layer.cellAt(0, 0, false);
+				expect(got.tileId).toBe(2);
+				expect(got.flippedX).toBe(flipH);
+				expect(got.flippedY).toBe(flipV);
+				expect(got.flippedAD).toBe(flipAD);
+			});
+		}
+	});
+
+	describe("GID range adversarial", () => {
+		it("GID = 1 (minimum non-empty) works end-to-end", () => {
+			const data = [...ALL_ZERO_4x3];
+			data[0] = 1;
+			const layer = makeLayer(data);
+			expect(layer.layerData[0]).toBe(1);
+			expect(layer.cellAt(0, 0, false).tileId).toBe(1);
+		});
+
+		it("GID = 0 stays an empty cell (cellAt returns null)", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			expect(layer.cellAt(0, 0, false)).toBeNull();
+			expect(layer.cellAt(3, 2, false)).toBeNull();
+			expect(layer.layerData[0]).toBe(0);
+		});
+
+		it("GID with all flip bits set is correctly masked at parse time", () => {
+			// raw Tiled GID = 4 with all 3 flip bits set
+			const legacyGid = (4 | TMX_FLIP_H | TMX_FLIP_V | TMX_FLIP_AD) >>> 0;
+			const data = [...ALL_ZERO_4x3];
+			data[5] = legacyGid;
+			const layer = makeLayer(data);
+			const idx = (1 * layer.cols + 1) * 2;
+			expect(layer.layerData[idx]).toBe(4); // gid channel: flip bits stripped
+			expect(layer.layerData[idx + 1]).toBe(
+				FLIP_H_BIT | FLIP_V_BIT | FLIP_AD_BIT,
+			); // flip channel only
+		});
+
+		it("flip bits never leak into the GID channel", () => {
+			// cell k holds GID (k + 1) with a per-cell H/V flip pattern; a Uint16Array
+			// cannot hold bits 29-31 anyway, so check values: every GID slot must be
+			// the masked GID (k + 1) and every flip slot exactly the H/V mask bits
+			const data = [...ALL_ZERO_4x3];
+			for (let i = 0; i < data.length; i++) {
+				const flipPattern = (i & 1 ? TMX_FLIP_H : 0) | (i & 2 ? TMX_FLIP_V : 0);
+				data[i] = ((i + 1) | flipPattern) >>> 0;
+			}
+			const layer = makeLayer(data);
+			for (let k = 0; k < data.length; k++) {
+				const expectedMask =
+					(k & 1 ? FLIP_H_BIT : 0) | (k & 2 ? FLIP_V_BIT : 0);
+				// GID slot: flip bits stripped, GID value preserved
+				expect(layer.layerData[k * 2]).toBe(data[k] & TMX_CLEAR_BIT_MASK);
+				// flip slot: only this cell's H/V bits
+				expect(layer.layerData[k * 2 + 1]).toBe(expectedMask);
+			}
+		});
+	});
+
+	describe("Empty / cleared cell semantics", () => {
+		it("fresh layer: every cell returns null via cellAt", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			for (let y = 0; y < layer.rows; y++) {
+				for (let x = 0; x < layer.cols; x++) {
+					expect(layer.cellAt(x, y, false)).toBeNull();
+				}
+			}
+		});
+
+		it("fresh layer: getTileId returns null for every cell", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			expect(layer.getTileId(0, 0)).toBeNull();
+			expect(layer.getTileId(50, 50)).toBeNull();
+		});
+
+		it("clearTile after setTile zeros both channels", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			const tile = new Tile(0, 0, 1 | TMX_FLIP_H, layer.tileset);
+			layer.setTile(tile, 2, 1);
+			const idx = (1 * layer.cols + 2) * 2;
+			expect(layer.layerData[idx]).toBe(1);
+			expect(layer.layerData[idx + 1]).toBe(FLIP_H_BIT);
+
+			layer.clearTile(2, 1);
+			expect(layer.layerData[idx]).toBe(0);
+			expect(layer.layerData[idx + 1]).toBe(0);
+			expect(layer.cellAt(2, 1, false)).toBeNull();
+			expect(layer.getTileId(2 * 32, 1 * 32)).toBeNull();
+		});
+
+		it("clearTile invalidates cachedTile slot when cache is allocated", () => {
+			const data = [...ALL_ZERO_4x3];
+			data[0] = 1;
+			const layer = makeLayer(data);
+			// trigger lazy alloc + populate the slot
+			const tile = layer.cellAt(0, 0, false);
+			expect(layer.cachedTile[0]).toBe(tile);
+			// clearing should null the slot
+			layer.clearTile(0, 0);
+			expect(layer.cachedTile[0]).toBeNull();
+		});
+
+		it("empty-cell guard: cellAt never returns a Tile with tileId === 0", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			// directly write a 0 to a slot then read it
+			const idx = (1 * layer.cols + 1) * 2;
+			layer.layerData[idx] = 0;
+			layer.layerData[idx + 1] = 0;
+			expect(layer.cellAt(1, 1, false)).toBeNull();
+		});
+	});
+
+	describe("Identity stability via cachedTile", () => {
+		it("repeated cellAt for the same cell returns the SAME Tile object", () => {
+			const data = [...ALL_ZERO_4x3];
+			data[5] = 2;
+			const layer = makeLayer(data);
+			const a = layer.cellAt(1, 1, false);
+			const b = layer.cellAt(1, 1, false);
+			expect(a).not.toBeNull();
+			expect(a).toBe(b); // strict identity equality
+		});
+
+		it("cellAt for distinct populated cells returns distinct Tiles", () => {
+			const data = [...ALL_ZERO_4x3];
+			data[0] = 1;
+			data[1] = 2;
+			const layer = makeLayer(data);
+			const a = layer.cellAt(0, 0, false);
+			const b = layer.cellAt(1, 0, false);
+			expect(a).not.toBe(b);
+		});
+
+		it("setTile invalidates the cache: next cellAt returns the new tile's data", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			const tile1 = new Tile(0, 0, 1, layer.tileset);
+			const tile2 = new Tile(0, 0, 2, layer.tileset);
+
+			layer.setTile(tile1, 0, 0);
+			// trigger cache allocation by calling cellAt
+			const got1 = layer.cellAt(0, 0, false);
+			expect(got1.tileId).toBe(1);
+
+			// overwriting should null the cached view; next cellAt rebuilds
+			layer.setTile(tile2, 0, 0);
+			const got2 = layer.cellAt(0, 0, false);
+			expect(got2.tileId).toBe(2);
+			expect(got2).not.toBe(got1);
+		});
+
+		it("clearTile then setTile produces a fresh view (no resurrection)", () => {
+			const data = [...ALL_ZERO_4x3];
+			data[0] = 1;
+			const layer = makeLayer(data);
+			const got1 = layer.cellAt(0, 0, false);
+			layer.clearTile(0, 0);
+			expect(layer.cellAt(0, 0, false)).toBeNull();
+			// re-populate the cell
+			const tile2 = new Tile(0, 0, 2, layer.tileset);
+			layer.setTile(tile2, 0, 0);
+			const got2 = layer.cellAt(0, 0, false);
+			expect(got2.tileId).toBe(2);
+			expect(got2).not.toBe(got1);
+		});
+
+		it("isometricRpg-style usage: stable identity across pointerMove calls", () => {
+			// regression for examples/isometricRpg/play.ts line 49:
+			//   if (tile && tile !== this.currentTile) { ... }
+			const data = [...ALL_ZERO_4x3];
+			data[6] = 3;
+			const layer = makeLayer(data);
+			const tile1 = layer.cellAt(2, 1, false);
+			const tile2 = layer.cellAt(2, 1, false);
+			const tile3 = layer.cellAt(2, 1, false);
+			expect(tile1).toBe(tile2);
+			expect(tile2).toBe(tile3);
+		});
+	});
+
+	describe("dataVersion", () => {
+		it("initial value is 0", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			expect(layer.dataVersion).toBe(0);
+		});
+
+		it("setTile increments dataVersion by exactly 1", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			const tile = new Tile(0, 0, 1, layer.tileset);
+			layer.setTile(tile, 0, 0);
+			expect(layer.dataVersion).toBe(1);
+			layer.setTile(tile, 0, 1);
+			expect(layer.dataVersion).toBe(2);
+		});
+
+		it("clearTile increments dataVersion by exactly 1", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			const tile = new Tile(0, 0, 1, layer.tileset);
+			layer.setTile(tile, 0, 0);
+			expect(layer.dataVersion).toBe(1);
+			layer.clearTile(0, 0);
+			expect(layer.dataVersion).toBe(2);
+		});
+
+		it("cellAt does NOT bump dataVersion", () => {
+			const data = [...ALL_ZERO_4x3];
+			data[0] = 1;
+			const layer = makeLayer(data);
+			const v0 = layer.dataVersion;
+			layer.cellAt(0, 0, false);
+			layer.cellAt(0, 0, false); // cached
+			layer.cellAt(1, 0, false); // empty
+			expect(layer.dataVersion).toBe(v0);
+		});
+
+		it("getTile / getTileId do NOT bump dataVersion", () => {
+			const data = [...ALL_ZERO_4x3];
+			data[0] = 1;
+			const layer = makeLayer(data);
+			const v0 = layer.dataVersion;
+			layer.getTile(0, 0);
+			layer.getTileId(0, 0);
+			expect(layer.dataVersion).toBe(v0);
+		});
+
+		it("after N mutations dataVersion === N (monotone)", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			const tile = new Tile(0, 0, 1, layer.tileset);
+			for (let i = 1; i <= 10; i++) {
+				layer.setTile(tile, i % layer.cols, Math.floor(i / layer.cols));
+				expect(layer.dataVersion).toBe(i);
+			}
+		});
+	});
+
+	describe("Bounds & coordinate edge cases", () => {
+		it("cellAt with boundsCheck on rejects negative coords", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			expect(layer.cellAt(-1, 0)).toBeNull();
+			expect(layer.cellAt(0, -1)).toBeNull();
+		});
+
+		it("cellAt with boundsCheck on rejects out-of-range coords", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			expect(layer.cellAt(layer.cols, 0)).toBeNull();
+			expect(layer.cellAt(0, layer.rows)).toBeNull();
+			expect(layer.cellAt(100, 100)).toBeNull();
+		});
+
+		it("cellAt truncates fractional coordinates via ~~", () => {
+			const data = [...ALL_ZERO_4x3];
+			data[5] = 1; // y=1, x=1
+			const layer = makeLayer(data);
+			const a = layer.cellAt(1, 1, false);
+			const b = layer.cellAt(1.9, 1.9, false);
+			const c = layer.cellAt(1.0001, 1.99999, false);
+			expect(a).toBe(b);
+			expect(b).toBe(c);
+		});
+
+		it("cellAt with NaN coordinates resolves to (0, 0)", () => {
+			// ~~NaN === 0
+			const data = [...ALL_ZERO_4x3];
+			data[0] = 1;
+			const layer = makeLayer(data);
+			expect(layer.cellAt(Number.NaN, Number.NaN, false)).toBe(
+				layer.cellAt(0, 0, false),
+			);
+		});
+
+		it("getTile in world coords outside the layer returns null", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			expect(layer.getTile(-1, 0)).toBeNull();
+			expect(layer.getTile(layer.width + 1, 0)).toBeNull();
+		});
+	});
+
+	describe("setTile / clearTile bounds validation", () => {
+		it("setTile at out-of-bounds coords is a no-op (does not allocate cachedTile or mutate layerData)", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			const tile = new Tile(0, 0, 1, layer.tileset);
+
+			layer.setTile(tile, -1, 0);
+			layer.setTile(tile, 0, -1);
+			layer.setTile(tile, layer.cols, 0);
+			layer.setTile(tile, 0, layer.rows);
+			layer.setTile(tile, 100, 100);
+
+			expect(layer.cachedTile).toBe(null); // no allocation triggered
+			expect(layer.dataVersion).toBe(0); // no successful writes
+			for (let i = 0; i < layer.layerData.length; i++) {
+				expect(layer.layerData[i]).toBe(0);
+			}
+		});
+
+		it("setTile at out-of-bounds returns the tile unchanged (preserves return contract)", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			const tile = new Tile(0, 0, 1, layer.tileset);
+			expect(layer.setTile(tile, -1, 0)).toBe(tile);
+			expect(layer.setTile(tile, 100, 100)).toBe(tile);
+		});
+
+		it("clearTile at out-of-bounds coords is a no-op (does not allocate cachedTile or mutate layerData)", () => {
+			const data = [...ALL_ZERO_4x3];
+			data[0] = 1;
+			const layer = makeLayer(data);
+
+			layer.clearTile(-1, 0);
+			layer.clearTile(0, -1);
+			layer.clearTile(layer.cols, 0);
+			layer.clearTile(0, layer.rows);
+			layer.clearTile(100, 100);
+
+			expect(layer.cachedTile).toBe(null); // no allocation triggered
+			expect(layer.dataVersion).toBe(0); // no successful clears
+			// the populated cell at (0, 0) is untouched
+			expect(layer.layerData[0]).toBe(1);
+		});
+
+		it("setTile bounds check rejects exactly on the edge (cols, rows)", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			const tile = new Tile(0, 0, 1, layer.tileset);
+			// cols=4, rows=3 — these specific indices must be rejected
+			layer.setTile(tile, 4, 0);
+			layer.setTile(tile, 0, 3);
+			expect(layer.dataVersion).toBe(0);
+			// but cols-1, rows-1 are valid
+			layer.setTile(tile, 3, 2);
+			expect(layer.dataVersion).toBe(1);
+		});
+	});
+
+	describe("Cross-cell isolation", () => {
+		it("setTile at one cell does not mutate any other layerData byte", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			const snapshot = new Uint16Array(layer.layerData);
+			const tile = new Tile(0, 0, 1 | TMX_FLIP_H, layer.tileset);
+			layer.setTile(tile, 1, 1);
+			const idx = (1 * layer.cols + 1) * 2;
+			for (let i = 0; i < layer.layerData.length; i++) {
+				if (i === idx || i === idx + 1) {
+					continue;
+				}
+				expect(layer.layerData[i]).toBe(snapshot[i]);
+			}
+		});
+
+		it("clearTile at one cell does not mutate any other cachedTile slot", () => {
+			const data = [...ALL_ZERO_4x3];
+			data[0] = 1;
+			data[5] = 2;
+			data[11] = 3;
+			const layer = makeLayer(data);
+			const t1 = layer.cellAt(0, 0, false);
+			const t2 = layer.cellAt(1, 1, false);
+			const t3 = layer.cellAt(3, 2, false);
+			// cache is now allocated and populated for these three cells
+			expect(layer.cachedTile).not.toBeNull();
+
+			layer.clearTile(1, 1);
+			expect(layer.cachedTile[0]).toBe(t1);
+			expect(layer.cachedTile[1 * 4 + 1]).toBeNull();
+			expect(layer.cachedTile[2 * 4 + 3]).toBe(t3);
+			expect(t2).not.toBeNull(); // local ref still valid
+		});
+	});
+
+	describe("Parser path (setLayerData)", () => {
+		it("decodes a mix of empty + populated + flipped cells correctly", () => {
+			const data = [
+				1,
+				0,
+				2,
+				0,
+				0,
+				(3 | TMX_FLIP_H) >>> 0,
+				0,
+				0,
+				4,
+				0,
+				0,
+				(1 | TMX_FLIP_V | TMX_FLIP_AD) >>> 0,
+			];
+			const layer = makeLayer(data);
+
+			// (0, 0) = gid 1, no flip
+			expect(layer.layerData[0]).toBe(1);
+			expect(layer.layerData[1]).toBe(0);
+
+			// (2, 0) = gid 2, no flip
+			expect(layer.layerData[4]).toBe(2);
+			expect(layer.layerData[5]).toBe(0);
+
+			// (1, 1) = gid 3, flipped H
+			expect(layer.layerData[(1 * 4 + 1) * 2]).toBe(3);
+			expect(layer.layerData[(1 * 4 + 1) * 2 + 1]).toBe(FLIP_H_BIT);
+
+			// (0, 2) = gid 4
+			expect(layer.layerData[(2 * 4 + 0) * 2]).toBe(4);
+			expect(layer.layerData[(2 * 4 + 0) * 2 + 1]).toBe(0);
+
+			// (3, 2) = gid 1, flipped V + AD
+			expect(layer.layerData[(2 * 4 + 3) * 2]).toBe(1);
+			expect(layer.layerData[(2 * 4 + 3) * 2 + 1]).toBe(
+				FLIP_V_BIT | FLIP_AD_BIT,
+			);
+
+			// empty cells stay zero
+			expect(layer.layerData[2]).toBe(0); // (1, 0) gid
+			expect(layer.layerData[3]).toBe(0); // (1, 0) flip
+		});
+
+		it("setLayerData allocates ZERO TMXTile objects during parse", () => {
+			// count Tile constructions during the parse via a proxy hook
+			let constructorCalls = 0;
+			const origSetMinMax = Tile.prototype.setMinMax;
+			// Tile's constructor calls setMinMax, so instrumenting it counts constructions
+			Tile.prototype.setMinMax = function (...args) {
+				constructorCalls++;
+				return origSetMinMax.apply(this, args);
+			};
+			try {
+				// dense data: 8 of the 12 cells populated
+				const data = [1, 2, 3, 4, 1, 0, 3, 0, 0, 2, 0, 4];
+				makeLayer(data);
+				expect(constructorCalls).toBe(0);
+			} finally {
+				Tile.prototype.setMinMax = origSetMinMax;
+			}
+		});
+
+		it("does not allocate cachedTile during parse", () => {
+			const data = [1, 2, 3, 4, 1, 0, 3, 0, 0, 2, 0, 4];
+			const layer = makeLayer(data);
+			expect(layer.cachedTile).toBe(null);
+		});
+	});
+
+	describe("Tile.col / Tile.row consistency", () => {
+		it("cellAt(x, y) returns a Tile with col === x and row === y", () => {
+			const data = [...ALL_ZERO_4x3];
+			data[0] = 1;
+			data[2 * 4 + 3] = 2; // (3, 2)
+			const layer = makeLayer(data);
+
+			const a = layer.cellAt(0, 0, false);
+			expect(a.col).toBe(0);
+			expect(a.row).toBe(0);
+
+			const b = layer.cellAt(3, 2, false);
+			expect(b.col).toBe(3);
+			expect(b.row).toBe(2);
+		});
+	});
+
+	describe("Hot-path encoding round-trip stress", () => {
+		it("dense map with mixed GIDs + all flip combinations round-trips byte-for-byte", () => {
+			// build a 4x3 map covering all 8 flip combinations across GIDs 1..4
+			const flipPatterns = [
+				0,
+				TMX_FLIP_H,
+				TMX_FLIP_V,
+				TMX_FLIP_AD,
+				TMX_FLIP_H | TMX_FLIP_V,
+				TMX_FLIP_H | TMX_FLIP_AD,
+				TMX_FLIP_V | TMX_FLIP_AD,
+				TMX_FLIP_H | TMX_FLIP_V | TMX_FLIP_AD,
+				0,
+				TMX_FLIP_H,
+				TMX_FLIP_V,
+				TMX_FLIP_AD,
+			];
+			const data = [];
+			for (let i = 0; i < 12; i++) {
+				const gid = (i % 4) + 1; // GIDs 1..4
+				data.push((gid | flipPatterns[i]) >>> 0);
+			}
+
+			const layer = makeLayer(data);
+
+			// verify every cell decodes correctly via cellAt
+			for (let y = 0; y < 3; y++) {
+				for (let x = 0; x < 4; x++) {
+					const i = y * 4 + x;
+					const expectedGid = (i % 4) + 1;
+					const flip = flipPatterns[i];
+					const expectedH = (flip & TMX_FLIP_H) !== 0;
+					const expectedV = (flip & TMX_FLIP_V) !== 0;
+					const expectedAD = (flip & TMX_FLIP_AD) !== 0;
+
+					const t = layer.cellAt(x, y, false);
+					expect(t).not.toBeNull();
+					expect(t.tileId).toBe(expectedGid);
+					expect(t.flippedX).toBe(expectedH);
+					expect(t.flippedY).toBe(expectedV);
+					expect(t.flippedAD).toBe(expectedAD);
+				}
+			}
+		});
+	});
+
+	describe("setTile / clearTile parity with cellAt", () => {
+		it("setTile writes the same encoding the parser would produce", () => {
+			// build two layers: one via parser, one via setTile after the fact
+			const data = [...ALL_ZERO_4x3];
+			data[0] = (2 | TMX_FLIP_H | TMX_FLIP_AD) >>> 0;
+			const parsed = makeLayer(data);
+
+			const built = makeLayer(ALL_ZERO_4x3);
+			const tile = new Tile(
+				0,
+				0,
+				(2 | TMX_FLIP_H | TMX_FLIP_AD) >>> 0,
+				built.tileset,
+			);
+			built.setTile(tile, 0, 0);
+
+			// raw layerData should match byte-for-byte
+			expect(built.layerData[0]).toBe(parsed.layerData[0]);
+			expect(built.layerData[1]).toBe(parsed.layerData[1]);
+		});
+
+		it("repeated setTile/clearTile cycles leave layerData in a clean state", () => {
+			const layer = makeLayer(ALL_ZERO_4x3);
+			const tile = new Tile(0, 0, 1 | TMX_FLIP_V, layer.tileset);
+
+			for (let i = 0; i < 5; i++) {
+				layer.setTile(tile, 2, 1);
+				layer.clearTile(2, 1);
+			}
+			const idx = (1 * layer.cols + 2) * 2;
+			expect(layer.layerData[idx]).toBe(0);
+			expect(layer.layerData[idx + 1]).toBe(0);
+			expect(layer.cellAt(2, 1, false)).toBeNull();
+		});
+	});
+});
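The `setTile` / `clearTile` parity tests above index `layerData` as two `Uint16Array` slots per cell: `(y * cols + x) * 2` holds the GID and the next slot holds a compact flip mask. The mask uses bit 0 for H, bit 1 for V and bit 2 for AD, the same layout the drawraw spec mirrors with `FLIP_H_BIT` / `FLIP_V_BIT` / `FLIP_AD_BIT`. The sketch below shows that encoding under a few stated assumptions. `TMX_FLIP_*` is assumed to use Tiled's documented flag values (`0x80000000` H, `0x40000000` V, `0x20000000` AD), and stripped GIDs are assumed to fit in 16 bits. `encodeCell`, `writeCell` and `readCell` are hypothetical helpers, not melonJS API.

```javascript
// Tiled's documented flag bits in a raw 32-bit GID (assumed to match TMX_FLIP_*)
const TILED_FLIP_H = 0x80000000;
const TILED_FLIP_V = 0x40000000;
const TILED_FLIP_AD = 0x20000000;

// compact per-cell mask, same bit order as FLIP_H_BIT / FLIP_V_BIT / FLIP_AD_BIT
const MASK_H = 1 << 0;
const MASK_V = 1 << 1;
const MASK_AD = 1 << 2;

/** Hypothetical: split a raw Tiled GID into [gid, flipMask]. */
function encodeCell(raw) {
	const u = raw >>> 0; // treat as unsigned 32-bit
	const gid = u & 0x0fffffff; // strip the flip flags and Tiled's hex-rotation bit
	if (gid > 0xffff) {
		throw new RangeError(`GID ${gid} does not fit in a Uint16Array slot`);
	}
	const mask =
		(u & TILED_FLIP_H ? MASK_H : 0) |
		(u & TILED_FLIP_V ? MASK_V : 0) |
		(u & TILED_FLIP_AD ? MASK_AD : 0);
	return [gid, mask];
}

/** Hypothetical: write one cell into a flat [gid, mask, gid, mask, ...] array. */
function writeCell(layerData, cols, x, y, raw) {
	const idx = (y * cols + x) * 2;
	const [gid, mask] = encodeCell(raw);
	layerData[idx] = gid;
	layerData[idx + 1] = mask;
}

/** Hypothetical: read a cell back; returns null for an empty cell (GID 0). */
function readCell(layerData, cols, x, y) {
	const idx = (y * cols + x) * 2;
	const gid = layerData[idx];
	if (gid === 0) {
		return null;
	}
	const mask = layerData[idx + 1];
	return {
		gid,
		flippedX: (mask & MASK_H) !== 0,
		flippedY: (mask & MASK_V) !== 0,
		flippedAD: (mask & MASK_AD) !== 0,
	};
}

// usage: a 4x3 layer where cell (2, 1) holds GID 2 flipped H + AD
const cols = 4;
const rows = 3;
const layerData = new Uint16Array(cols * rows * 2);
writeCell(layerData, cols, 2, 1, (2 | TILED_FLIP_H | TILED_FLIP_AD) >>> 0);
console.log(readCell(layerData, cols, 2, 1));
// { gid: 2, flippedX: true, flippedY: false, flippedAD: true }
```

If that layout holds, `new Uint8Array(layerData.buffer)` on a little-endian host presents each cell as one RGBA8 texel. R and G carry the GID low and high bytes, and B and A carry the mask low and high bytes. That fits the shader spec's remark that A is always 0.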
diff --git a/packages/melonjs/tests/tmxlayer-drawraw.spec.js b/packages/melonjs/tests/tmxlayer-drawraw.spec.js
new file mode 100644
index 000000000..73f293d89
--- /dev/null
+++ b/packages/melonjs/tests/tmxlayer-drawraw.spec.js
@@ -0,0 +1,499 @@
+import { beforeAll, describe, expect, it } from "vitest";
+import { boot, TMXTileMap, video } from "../src/index.js";
+import {
+	TMX_FLIP_AD,
+	TMX_FLIP_H,
+	TMX_FLIP_V,
+} from "../src/level/tiled/constants.js";
+import Tile, { buildFlipTransform } from "../src/level/tiled/TMXTile.js";
+import { imgList } from "../src/loader/cache.js";
+import { Matrix2d } from "../src/math/matrix2d.ts";
+
+// flip-mask bit layout (mirrors TMXLayer / TMXTile)
+const FLIP_H_BIT = 1 << 0;
+const FLIP_V_BIT = 1 << 1;
+const FLIP_AD_BIT = 1 << 2;
+
+function paintGradientCanvas(name, w, h, colors) {
+	// build a canvas with distinct colored tile cells so we can detect
+	// flips / wrong tile selection by reading pixels
+	const canvas = document.createElement("canvas");
+	canvas.width = w;
+	canvas.height = h;
+	const ctx = canvas.getContext("2d");
+	// 2x2 grid of distinct flat colors for a 2-column tileset
+	const cellW = w / 2;
+	const cellH = h / 2;
+	for (let i = 0; i < colors.length; i++) {
+		const col = i % 2;
+		const row = Math.floor(i / 2);
+		ctx.fillStyle = colors[i];
+		ctx.fillRect(col * cellW, row * cellH, cellW, cellH);
+	}
+	imgList[name] = canvas;
+	return canvas;
+}
+
+const tilesetData = {
+	firstgid: 1,
+	name: "rawtest",
+	tilewidth: 32,
+	tileheight: 32,
+	spacing: 0,
+	margin: 0,
+	tilecount: 4,
+	columns: 2,
+	image: "rawtest.png",
+};
+
+function buildOrthogonalMap(data, cols = 4, rows = 3) {
+	return {
+		width: cols,
+		height: rows,
+		tilewidth: 32,
+		tileheight: 32,
+		orientation: "orthogonal",
+		renderorder: "right-down",
+		infinite: false,
+		version: "1.10",
+		tiledversion: "1.12.0",
+		tilesets: [tilesetData],
+		layers: [
+			{
+				type: "tilelayer",
+				name: "Background",
+				width: cols,
+				height: rows,
+				data,
+				visible: true,
+				opacity: 1,
+			},
+		],
+	};
+}
+
+function makeOrthogonalLayer(data, cols = 4, rows = 3) {
+	const map = new TMXTileMap("rawtest", buildOrthogonalMap(data, cols, rows));
+	return map.getLayers()[0];
+}
+
+// minimal renderer mock that records every drawImage call — sufficient to
+// detect tileset selection, source-rect lookup, and destination position
+function makeRecordingRenderer() {
+	const calls = [];
+	const transformStack = [];
+	let current = { tx: 0, ty: 0 };
+	return {
+		uvOffset: 0,
+		drawImage(image, sx, sy, sw, sh, dx, dy, dw, dh) {
+			calls.push({
+				image,
+				sx,
+				sy,
+				sw,
+				sh,
+				dx: dx + current.tx,
+				dy: dy + current.ty,
+				dw,
+				dh,
+			});
+		},
+		save() {
+			transformStack.push({ tx: current.tx, ty: current.ty });
+		},
+		restore() {
+			current = transformStack.pop() ?? { tx: 0, ty: 0 };
+		},
+		translate(x, y) {
+			current.tx += x;
+			current.ty += y;
+		},
+		transform() {
+			// the orientation-renderer flips multiply the matrix into the
+			// renderer's transform stack; we don't need to model the matrix
+			// in detail for this test — we already verify the math separately
+		},
+		__calls: () => {
+			return calls;
+		},
+	};
+}
+
+describe("Tile rendering raw path (drawTileRaw)", () => {
+	beforeAll(() => {
+		boot();
+		video.init(128, 128, {
+			parent: "screen",
+			scale: "auto",
+			renderer: video.CANVAS,
+		});
+		paintGradientCanvas("rawtest", 64, 64, [
+			"#ff0000",
+			"#00ff00",
+			"#0000ff",
+			"#ffff00",
+		]);
+	});
+
+	describe("buildFlipTransform helper", () => {
+		// Matrix2d.val is column-major: [a, c, e, b, d, f, 0, 0, 1]
+		// For a pure axis-aligned scale-around-center transform, only the scale
+		// components (a, d) are asserted; the translation components (e, f)
+		// depend on the tile size and are not checked here.
+		const cases = [
+			[0, "identity", { a: 1, d: 1 }],
+			[FLIP_H_BIT, "H", { a: -1, d: 1 }],
+			[FLIP_V_BIT, "V", { a: 1, d: -1 }],
+			[FLIP_H_BIT | FLIP_V_BIT, "H+V", { a: -1, d: -1 }],
+		];
+
+		for (const [mask, label, expected] of cases) {
+			it(`builds matrix for flipMask=${label}`, () => {
+				const m = new Matrix2d();
+				buildFlipTransform(m, mask, 32, 32);
+				// val layout: val[0]=a, val[3]=b, val[1]=c, val[4]=d, val[2]=e, val[5]=f
+				expect(Math.round(m.val[0] * 1e6) / 1e6).toBe(expected.a);
+				expect(Math.round(m.val[4] * 1e6) / 1e6).toBe(expected.d);
+			});
+		}
+
+		it("AD-flipped matrix has non-zero off-diagonal terms (rotation present)", () => {
+			const m = new Matrix2d();
+			buildFlipTransform(m, FLIP_AD_BIT, 32, 32);
+			// AD flip = rotate(-90) + scale(-1, 1) — off-diagonal terms must be non-zero
+			expect(Math.abs(m.val[3]) + Math.abs(m.val[1])).toBeGreaterThan(0.5);
+		});
+
+		it("matches Tile.setTileTransform output on the legacy path", () => {
+			const layer = makeOrthogonalLayer([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]);
+			for (const flipBits of [
+				0,
+				TMX_FLIP_H,
+				TMX_FLIP_V,
+				TMX_FLIP_AD,
+				TMX_FLIP_H | TMX_FLIP_V,
+				TMX_FLIP_H | TMX_FLIP_AD,
+				TMX_FLIP_V | TMX_FLIP_AD,
+				TMX_FLIP_H | TMX_FLIP_V | TMX_FLIP_AD,
+			]) {
+				const tile = new Tile(0, 0, (1 | flipBits) >>> 0, layer.tileset);
+
+				const fromFlags = new Matrix2d();
+				tile.setTileTransform(fromFlags);
+
+				const fromMask = new Matrix2d();
+				const flipMask =
+					(tile.flippedX ? FLIP_H_BIT : 0) |
+					(tile.flippedY ? FLIP_V_BIT : 0) |
+					(tile.flippedAD ? FLIP_AD_BIT : 0);
+				buildFlipTransform(fromMask, flipMask, 32, 32);
+
+				// every component of val[] must match (within float precision)
+				for (let i = 0; i < 9; i++) {
+					expect(fromMask.val[i]).toBeCloseTo(fromFlags.val[i], 5);
+				}
+			}
+		});
+	});
+
+	describe("TMXTileset.drawTileRaw", () => {
+		it("emits the same source rect + destination as drawTile for a non-flipped tile", () => {
+			const layer = makeOrthogonalLayer([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]);
+			const tileset = layer.tileset;
+			const rawRenderer = makeRecordingRenderer();
+			const oldRenderer = makeRecordingRenderer();
+
+			tileset.drawTileRaw(rawRenderer, 100, 200, 1, 0);
+			tileset.drawTile(oldRenderer, 100, 200, new Tile(0, 0, 1, tileset));
+
+			const rawCall = rawRenderer.__calls()[0];
+			const oldCall = oldRenderer.__calls()[0];
+			expect(rawCall.sx).toBe(oldCall.sx);
+			expect(rawCall.sy).toBe(oldCall.sy);
+			expect(rawCall.sw).toBe(oldCall.sw);
+			expect(rawCall.sh).toBe(oldCall.sh);
+			expect(rawCall.dx).toBe(oldCall.dx);
+			expect(rawCall.dy).toBe(oldCall.dy);
+			expect(rawCall.dw).toBe(oldCall.dw);
+			expect(rawCall.dh).toBe(oldCall.dh);
+			expect(rawCall.image).toBe(oldCall.image);
+		});
+
+		it("selects the correct atlas source rect for each GID", () => {
+			const layer = makeOrthogonalLayer([1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0]);
+			const tileset = layer.tileset;
+
+			for (const gid of [1, 2, 3, 4]) {
+				const renderer = makeRecordingRenderer();
+				tileset.drawTileRaw(renderer, 0, 0, gid, 0);
+				const call = renderer.__calls()[0];
+				const expectedOffset = tileset.atlas[tileset.getViewTileId(gid)].offset;
+				expect(call.sx).toBe(expectedOffset.x);
+				expect(call.sy).toBe(expectedOffset.y);
+			}
+		});
+
+		it("non-flipped tile produces no save/restore (no transform pass)", () => {
+			const layer = makeOrthogonalLayer([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]);
+			const tileset = layer.tileset;
+			let saved = 0;
+			let restored = 0;
+			const renderer = {
+				uvOffset: 0,
+				drawImage() {},
+				save() {
+					saved++;
+				},
+				restore() {
+					restored++;
+				},
+				translate() {},
+				transform() {},
+			};
+			tileset.drawTileRaw(renderer, 0, 0, 1, 0);
+			expect(saved).toBe(0);
+			expect(restored).toBe(0);
+		});
+
+		it("flipped tile wraps drawImage with save/restore", () => {
+			const layer = makeOrthogonalLayer([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]);
+			const tileset = layer.tileset;
+			let saved = 0;
+			let restored = 0;
+			const renderer = {
+				uvOffset: 0,
+				drawImage() {},
+				save() {
+					saved++;
+				},
+				restore() {
+					restored++;
+				},
+				translate() {},
+				transform() {},
+			};
+			tileset.drawTileRaw(renderer, 0, 0, 1, FLIP_H_BIT);
+			expect(saved).toBe(1);
+			expect(restored).toBe(1);
+		});
+
+		it("matches drawTile for every flip combination", () => {
+			const layer = makeOrthogonalLayer([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]);
+			const tileset = layer.tileset;
+
+			const flipCases = [
+				[0, 0],
+				[TMX_FLIP_H, FLIP_H_BIT],
+				[TMX_FLIP_V, FLIP_V_BIT],
+				[TMX_FLIP_AD, FLIP_AD_BIT],
+				[TMX_FLIP_H | TMX_FLIP_V, FLIP_H_BIT | FLIP_V_BIT],
+				[TMX_FLIP_H | TMX_FLIP_AD, FLIP_H_BIT | FLIP_AD_BIT],
+				[TMX_FLIP_V | TMX_FLIP_AD, FLIP_V_BIT | FLIP_AD_BIT],
+				[
+					TMX_FLIP_H | TMX_FLIP_V | TMX_FLIP_AD,
+					FLIP_H_BIT | FLIP_V_BIT | FLIP_AD_BIT,
+				],
+			];
+
+			for (const [legacyFlips, mask] of flipCases) {
+				const rawRenderer = makeRecordingRenderer();
+				const oldRenderer = makeRecordingRenderer();
+
+				tileset.drawTileRaw(rawRenderer, 50, 50, 1, mask);
+				tileset.drawTile(
+					oldRenderer,
+					50,
+					50,
+					new Tile(0, 0, (1 | legacyFlips) >>> 0, tileset),
+				);
+
+				const r = rawRenderer.__calls()[0];
+				const o = oldRenderer.__calls()[0];
+				expect(r.sx).toBe(o.sx);
+				expect(r.sy).toBe(o.sy);
+				expect(r.dx).toBeCloseTo(o.dx, 6);
+				expect(r.dy).toBeCloseTo(o.dy, 6);
+				expect(r.dw).toBeCloseTo(o.dw, 6);
+				expect(r.dh).toBeCloseTo(o.dh, 6);
+			}
+		});
+	});
+
+	describe("TMXOrthogonalRenderer.drawTileRaw", () => {
+		it("matches drawTile for every flip combination", () => {
+			const layer = makeOrthogonalLayer([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]);
+			const tmxRenderer = layer.getRenderer();
+			const tileset = layer.tileset;
+
+			const cases = [
+				[0, 0],
+				[TMX_FLIP_H, FLIP_H_BIT],
+				[TMX_FLIP_V, FLIP_V_BIT],
+				[TMX_FLIP_AD, FLIP_AD_BIT],
+			];
+			for (const [legacyFlips, mask] of cases) {
+				const rawRenderer = makeRecordingRenderer();
+				const oldRenderer = makeRecordingRenderer();
+
+				tmxRenderer.drawTileRaw(rawRenderer, 2, 1, 1, mask, tileset);
+				tmxRenderer.drawTile(
+					oldRenderer,
+					2,
+					1,
+					new Tile(2, 1, (1 | legacyFlips) >>> 0, tileset),
+				);
+
+				const r = rawRenderer.__calls()[0];
+				const o = oldRenderer.__calls()[0];
+				expect(r.dx).toBeCloseTo(o.dx, 6);
+				expect(r.dy).toBeCloseTo(o.dy, 6);
+				expect(r.sx).toBe(o.sx);
+				expect(r.sy).toBe(o.sy);
+			}
+		});
+	});
+
+	describe("drawTileLayer hot loop bypasses Tile construction", () => {
+		it("Orthogonal: renders a populated layer without constructing any Tile", () => {
+			const data = [1, 2, 3, 4, 0, 1, 0, 2, 3, 0, 4, 0];
+			const layer = makeOrthogonalLayer(data);
+
+			let tileConstructorCalls = 0;
+			const origSetMinMax = Tile.prototype.setMinMax;
+			Tile.prototype.setMinMax = function (...args) {
+				tileConstructorCalls++;
+				return origSetMinMax.apply(this, args);
+			};
+
+			try {
+				const renderer = makeRecordingRenderer();
+				const rect = {
+					pos: { x: 0, y: 0 },
+					width: 4 * 32,
+					height: 3 * 32,
+					right: 4 * 32,
+					bottom: 3 * 32,
+				};
+				layer.getRenderer().drawTileLayer(renderer, layer, rect);
+				expect(tileConstructorCalls).toBe(0);
+				// every non-zero cell produced a drawImage call (8 cells set in `data`)
+				expect(renderer.__calls().length).toBe(8);
+			} finally {
+				Tile.prototype.setMinMax = origSetMinMax;
+			}
+		});
+
+		it("Orthogonal: hot loop yields the same drawImage sequence as the legacy drawTile path", () => {
+			const data = [
+				1,
+				(2 | TMX_FLIP_H) >>> 0,
+				(3 | TMX_FLIP_V) >>> 0,
+				(4 | TMX_FLIP_H | TMX_FLIP_V) >>> 0,
+				0,
+				1,
+				0,
+				2,
+				3,
+				0,
+				4,
+				0,
+			];
+			const layerNew = makeOrthogonalLayer(data);
+			const layerOld = makeOrthogonalLayer(data);
+
+			const tmxNew = layerNew.getRenderer();
+			const tmxOld = layerOld.getRenderer();
+
+			// run the new hot loop
+			const newRenderer = makeRecordingRenderer();
+			const rect = {
+				pos: { x: 0, y: 0 },
+				width: 4 * 32,
+				height: 3 * 32,
+				right: 4 * 32,
+				bottom: 3 * 32,
+			};
+			tmxNew.drawTileLayer(newRenderer, layerNew, rect);
+
+			// emulate the legacy hot loop manually (cellAt → drawTile)
+			const oldRenderer = makeRecordingRenderer();
+			for (let y = 0; y < 3; y++) {
+				for (let x = 0; x < 4; x++) {
+					const t = layerOld.cellAt(x, y, false);
+					if (t) {
+						tmxOld.drawTile(oldRenderer, x, y, t);
+					}
+				}
+			}
+
+			const newCalls = newRenderer.__calls();
+			const oldCalls = oldRenderer.__calls();
+			expect(newCalls.length).toBe(oldCalls.length);
+			for (let i = 0; i < newCalls.length; i++) {
+				const n = newCalls[i];
+				const o = oldCalls[i];
+				expect(n.sx).toBe(o.sx);
+				expect(n.sy).toBe(o.sy);
+				expect(n.sw).toBe(o.sw);
+				expect(n.sh).toBe(o.sh);
+				expect(n.dx).toBeCloseTo(o.dx, 6);
+				expect(n.dy).toBeCloseTo(o.dy, 6);
+				expect(n.dw).toBeCloseTo(o.dw, 6);
+				expect(n.dh).toBeCloseTo(o.dh, 6);
+				expect(n.image).toBe(o.image);
+			}
+		});
+
+		it("Orthogonal: empty cells produce no drawImage calls", () => {
+			const data = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
+			const layer = makeOrthogonalLayer(data);
+			const renderer = makeRecordingRenderer();
+			const rect = {
+				pos: { x: 0, y: 0 },
+				width: 4 * 32,
+				height: 3 * 32,
+				right: 4 * 32,
+				bottom: 3 * 32,
+			};
+			layer.getRenderer().drawTileLayer(renderer, layer, rect);
+			expect(renderer.__calls().length).toBe(0);
+		});
+
+		it("Orthogonal: tileset short-circuit cache works (single-tileset layer = no per-cell lookup)", () => {
+			const data = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4];
+			const layer = makeOrthogonalLayer(data);
+			const tilesets = layer.tilesets;
+
+			let lookupCalls = 0;
+			const origGetTilesetByGid = tilesets.getTilesetByGid.bind(tilesets);
+			tilesets.getTilesetByGid = function (gid) {
+				lookupCalls++;
+				return origGetTilesetByGid(gid);
+			};
+
+			try {
+				const renderer = makeRecordingRenderer();
+				const rect = {
+					pos: { x: 0, y: 0 },
+					width: 4 * 32,
+					height: 3 * 32,
+					right: 4 * 32,
+					bottom: 3 * 32,
+				};
+				layer.getRenderer().drawTileLayer(renderer, layer, rect);
+				// all GIDs (1..4) are in the same tileset — short-circuit cache hits, no lookups
+				expect(lookupCalls).toBe(0);
+				expect(renderer.__calls().length).toBe(12);
+			} finally {
+				tilesets.getTilesetByGid = origGetTilesetByGid;
+			}
+		});
+
+		// note: renderorder="left-up" / "right-up" / "left-down" hit a
+		// pre-existing bound-clamping quirk in the orientation renderer (the
+		// swap can drive start.x to cols, which is out of layerData range).
+		// The quirk predates this refactor; only the failure mode differs (the
+		// old 2D-array path would have thrown a TypeError, while the new
+		// typed-array path reads undefined and skips). Out of scope to fix here.
+	});
+});
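The short-circuit test above expects `drawTileLayer` to avoid `getTilesetByGid` entirely as long as every GID stays inside the layer's tileset. One way to get that behaviour is to cache the last tileset together with its GID range and fall back to the lookup only when a GID leaves that range. The sketch below makes two assumptions: each tileset exposes inclusive `firstgid` / `lastgid` bounds, and the layer's default tileset seeds the cache. `makeTilesetResolver` is hypothetical and is not the melonJS implementation.

```javascript
/**
 * Hypothetical resolver that memoizes the last tileset hit, so a run of
 * GIDs from the same tileset never reaches the (slower) lookup function.
 */
function makeTilesetResolver(lookup, initialTileset) {
	let cached = initialTileset;
	return (gid) => {
		if (cached && gid >= cached.firstgid && gid <= cached.lastgid) {
			return cached; // fast path: no lookup
		}
		cached = lookup(gid); // slow path: refresh the cache
		return cached;
	};
}

// usage: two tilesets, a counting lookup, and a resolver seeded with the first
const tilesets = [
	{ name: "ground", firstgid: 1, lastgid: 4 },
	{ name: "props", firstgid: 5, lastgid: 8 },
];
let lookups = 0;
const lookup = (gid) => {
	lookups++;
	return tilesets.find((ts) => {
		return gid >= ts.firstgid && gid <= ts.lastgid;
	});
};
const resolve = makeTilesetResolver(lookup, tilesets[0]);

for (const gid of [1, 2, 3, 4, 1, 2]) {
	resolve(gid);
}
console.log(lookups); // 0: every GID stayed inside "ground"
resolve(6); // leaves the cached range
console.log(lookups); // 1
```

Seeding the cache is what makes a single-tileset layer cost zero lookups. A layer that alternates between tilesets cell by cell would pay one lookup per switch.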
diff --git a/packages/melonjs/tests/tmxlayer-flips.spec.js b/packages/melonjs/tests/tmxlayer-flips.spec.js
new file mode 100644
index 000000000..4d5f2ed92
--- /dev/null
+++ b/packages/melonjs/tests/tmxlayer-flips.spec.js
@@ -0,0 +1,226 @@
+import { describe, expect, it } from "vitest";
+import { Matrix2d } from "../src/math/matrix2d.ts";
+
+// Local copy of the legacy `buildFlipTransform` from `TMXTile.js`.
+// Inlined here to avoid a circular import via the full TMX module tree
+// (TMXTile pulls in Sprite, which isn't safe to load in isolation). If
+// the legacy implementation ever changes, update this copy to match —
+// the surrounding tests are precisely the regression net that catches
+// the shader drifting from it.
+const FLIP_H_BIT_LOCAL = 1;
+const FLIP_V_BIT_LOCAL = 2;
+const FLIP_AD_BIT_LOCAL = 4;
+const buildFlipTransform = (transform, flipMask, width, height) => {
+	const halfW = width / 2;
+	const halfH = height / 2;
+	const flippedH = (flipMask & FLIP_H_BIT_LOCAL) !== 0;
+	const flippedV = (flipMask & FLIP_V_BIT_LOCAL) !== 0;
+	const flippedAD = (flipMask & FLIP_AD_BIT_LOCAL) !== 0;
+
+	transform.identity();
+	transform.translate(halfW, halfH);
+	if (flippedAD) {
+		transform.rotate((-90 * Math.PI) / 180);
+		transform.scale(-1, 1);
+	}
+	if (flippedH) {
+		transform.scale(flippedAD ? 1 : -1, flippedAD ? -1 : 1);
+	}
+	if (flippedV) {
+		transform.scale(flippedAD ? -1 : 1, flippedAD ? 1 : -1);
+	}
+	transform.translate(-halfW, -halfH);
+	return transform;
+};
+
+/**
+ * The shader's atlas-sampling code applies the INVERSE of the legacy
+ * CPU `buildFlipTransform` to derive an atlas UV from the fragment's
+ * position within the destination tile. The mapping must be identical
+ * across all 8 Tiled flip combinations (AD × H × V), otherwise rotated
+ * tiles render visibly wrong (as they did once before — bottom-left vs
+ * top-left corner ending up on the wrong side).
+ *
+ * The shader does this in 5 lines:
+ *   inTile = mix(inTile, inTile.yx, flipAD);
+ *   effH   = mix(flipH, flipV, flipAD);
+ *   effV   = mix(flipV, flipH, flipAD);
+ *   inTile.x = mix(inTile.x, 1.0 - inTile.x, effH);
+ *   inTile.y = mix(inTile.y, 1.0 - inTile.y, effV);
+ *
+ * `shaderFlip` below is the JS port — bug-for-bug identical so the
+ * tests live or die with the shader's formula. The two correctness
+ * checks below cross-validate it: against the explicit table, and
+ * against the inverse of `buildFlipTransform` (the legacy renderer's
+ * authoritative source of flip semantics).
+ */
+
+const FLIP_H_BIT = 1;
+const FLIP_V_BIT = 2;
+const FLIP_AD_BIT = 4;
+
+/** JS twin of the GLSL flip block — must stay in lockstep. */
+const shaderFlip = (x, y, mask) => {
+	const flipH = mask & FLIP_H_BIT ? 1 : 0;
+	const flipV = mask & FLIP_V_BIT ? 1 : 0;
+	const flipAD = mask & FLIP_AD_BIT ? 1 : 0;
+	let u = x;
+	let v = y;
+	if (flipAD) {
+		// transpose around y = x
+		const tx = u;
+		u = v;
+		v = tx;
+	}
+	// when AD is set, H and V swap their effective axes
+	const effH = flipAD ? flipV : flipH;
+	const effV = flipAD ? flipH : flipV;
+	if (effH) {
+		u = 1 - u;
+	}
+	if (effV) {
+		v = 1 - v;
+	}
+	return [u, v];
+};
+
+/**
+ * Reference table: (dest fragment in [0, 1]²) → (source atlas UV).
+ * Derived by inverting the matrix the legacy `buildFlipTransform`
+ * composes for each flip combination.
+ */
+const EXPECTED = [
+	// mask=0 (no flip): identity
+	[
+		0b000,
+		(x, y) => {
+			return [x, y];
+		},
+	],
+	// mask=1 (H): mirror X
+	[
+		0b001,
+		(x, y) => {
+			return [1 - x, y];
+		},
+	],
+	// mask=2 (V): mirror Y
+	[
+		0b010,
+		(x, y) => {
+			return [x, 1 - y];
+		},
+	],
+	// mask=3 (H+V): 180° rotation
+	[
+		0b011,
+		(x, y) => {
+			return [1 - x, 1 - y];
+		},
+	],
+	// mask=4 (AD): transpose (reflection over y = x)
+	[
+		0b100,
+		(x, y) => {
+			return [y, x];
+		},
+	],
+	// mask=5 (AD+H): 90° CW rotation
+	[
+		0b101,
+		(x, y) => {
+			return [y, 1 - x];
+		},
+	],
+	// mask=6 (AD+V): 90° CCW rotation
+	[
+		0b110,
+		(x, y) => {
+			return [1 - y, x];
+		},
+	],
+	// mask=7 (AD+H+V): anti-diagonal reflection (line y = 1 - x)
+	[
+		0b111,
+		(x, y) => {
+			return [1 - y, 1 - x];
+		},
+	],
+];
+
+// Sample points cover the four corners + interior + edge-midpoints, so
+// any sign / axis / off-by-one bug in the shader port shows up.
+const SAMPLES = [
+	[0, 0],
+	[1, 0],
+	[1, 1],
+	[0, 1],
+	[0.5, 0.5],
+	[0.25, 0.75],
+	[0.75, 0.25],
+	[0, 0.5],
+	[0.5, 0],
+];
+
+describe("TMX shader-path flip math", () => {
+	for (const [mask, expected] of EXPECTED) {
+		const label = [
+			mask & FLIP_AD_BIT ? "AD" : null,
+			mask & FLIP_H_BIT ? "H" : null,
+			mask & FLIP_V_BIT ? "V" : null,
+		]
+			.filter(Boolean)
+			.join("+");
+
+		it(`mask=${mask} (${label || "identity"}) matches the expected table`, () => {
+			for (const [x, y] of SAMPLES) {
+				const [su, sv] = shaderFlip(x, y, mask);
+				const [eu, ev] = expected(x, y);
+				expect(su).toBeCloseTo(eu, 6);
+				expect(sv).toBeCloseTo(ev, 6);
+			}
+		});
+	}
+});
+
+describe("TMX shader-path flip math vs legacy buildFlipTransform", () => {
+	// The shader's formula must compose to the exact INVERSE of the
+	// legacy CPU transform for every flip combination. This is the
+	// load-bearing assertion: when the local `buildFlipTransform` copy is
+	// synced to a change in TMXTile.js, this test fails until the shader follows.
+	const W = 70;
+	const H = 70;
+
+	const samplesInPixelSpace = [
+		[0, 0],
+		[W * 0.25, H * 0.75],
+		[W * 0.5, H * 0.5],
+		[W * 0.75, H * 0.25],
+		[W - 1, H - 1],
+	];
+
+	for (let mask = 0; mask < 8; mask++) {
+		const label =
+			[
+				mask & FLIP_AD_BIT ? "AD" : null,
+				mask & FLIP_H_BIT ? "H" : null,
+				mask & FLIP_V_BIT ? "V" : null,
+			]
+				.filter(Boolean)
+				.join("+") || "identity";
+
+		it(`mask=${mask} (${label}): shader inverse round-trips legacy forward`, () => {
+			const fwd = buildFlipTransform(new Matrix2d(), mask, W, H);
+			for (const [sx, sy] of samplesInPixelSpace) {
+				// LEGACY forward: source-pixel position → destination-pixel position
+				const dest = { x: sx, y: sy };
+				fwd.apply(dest);
+				// SHADER inverse: destination position (normalized) → source UV
+				const [su, sv] = shaderFlip(dest.x / W, dest.y / H, mask);
+				// Round-trip must land back on the original source pixel
+				expect(su * W).toBeCloseTo(sx, 3);
+				expect(sv * H).toBeCloseTo(sy, 3);
+			}
+		});
+	}
+});
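The doc comment in this spec writes the shader's flip block with GLSL `mix`, while `shaderFlip` uses branches. The sketch below is a literal port of those five `mix` lines, checked against a branching version for all eight masks. It assumes the flip flags reach the shader as 0.0 / 1.0 floats, which is what the `mix` calls imply. `mix`, `shaderFlipMix` and `shaderFlipBranching` are hypothetical JS helpers. `mix` reproduces GLSL's definition `mix(x, y, a) = x * (1 - a) + y * a`.

```javascript
// GLSL mix(x, y, a) = x * (1 - a) + y * a, for one component
const mix = (x, y, a) => {
	return x * (1 - a) + y * a;
};

/** Literal port of the five-line GLSL flip block (flags are 0 or 1). */
function shaderFlipMix(x, y, mask) {
	const flipH = mask & 1 ? 1 : 0;
	const flipV = mask & 2 ? 1 : 0;
	const flipAD = mask & 4 ? 1 : 0;
	// inTile = mix(inTile, inTile.yx, flipAD): swap axes when AD is set
	let u = mix(x, y, flipAD);
	let v = mix(y, x, flipAD);
	// effH / effV: with AD set, H and V act on the swapped axes
	const effH = mix(flipH, flipV, flipAD);
	const effV = mix(flipV, flipH, flipAD);
	u = mix(u, 1 - u, effH);
	v = mix(v, 1 - v, effV);
	return [u, v];
}

/** Branching reference, equivalent to `shaderFlip` in this spec. */
function shaderFlipBranching(x, y, mask) {
	let [u, v] = mask & 4 ? [y, x] : [x, y];
	const effH = mask & 4 ? mask & 2 : mask & 1;
	const effV = mask & 4 ? mask & 1 : mask & 2;
	if (effH) {
		u = 1 - u;
	}
	if (effV) {
		v = 1 - v;
	}
	return [u, v];
}

let maxErr = 0;
for (let mask = 0; mask < 8; mask++) {
	for (const [x, y] of [[0, 0], [1, 0], [0.25, 0.75], [0.5, 0.5]]) {
		const [a, b] = shaderFlipMix(x, y, mask);
		const [c, d] = shaderFlipBranching(x, y, mask);
		maxErr = Math.max(maxErr, Math.abs(a - c), Math.abs(b - d));
	}
}
console.log(maxErr); // 0
```

With 0 / 1 weights, `mix` selects one operand exactly, so the branchless form agrees bit-for-bit rather than merely within tolerance. That property is why the shader can drop branches without changing results.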
diff --git a/packages/melonjs/tests/tmxlayer-shader.spec.js b/packages/melonjs/tests/tmxlayer-shader.spec.js
new file mode 100644
index 000000000..3fc5150e5
--- /dev/null
+++ b/packages/melonjs/tests/tmxlayer-shader.spec.js
@@ -0,0 +1,324 @@
+import { afterAll, beforeAll, describe, expect, it } from "vitest";
+import { boot, video } from "../src/index.js";
+import { BufferTextureResource } from "../src/video/texture/resource.js";
+import OrthogonalTMXLayerGPURenderer from "../src/video/webgl/renderers/tmxlayer/orthogonal.js";
+import WebGLRenderer from "../src/video/webgl/webgl_renderer.js";
+
+describe("TMXLayer shader path", () => {
+	let renderer;
+
+	beforeAll(async () => {
+		await boot();
+		try {
+			video.init(64, 64, {
+				parent: "screen",
+				renderer: video.WEBGL,
+			});
+			if (
+				video.renderer instanceof WebGLRenderer &&
+				video.renderer.WebGLVersion === 2
+			) {
+				renderer = video.renderer;
+			}
+		} catch {
+			// CI runners without GL acceleration can't construct a WebGL2
+			// renderer; tests below mark themselves skipped at runtime
+		}
+	});
+
+	afterAll(() => {
+		try {
+			video.init(64, 64, {
+				parent: "screen",
+				renderer: video.AUTO,
+			});
+		} catch {
+			// ignore — nothing to restore if boot/init never succeeded
+		}
+	});
+
+	// Runtime skip helper. Produces a real "skipped" test status (visible
+	// in the reporter and CI summaries) instead of silently no-op'ing,
+	// which would hide a regression behind a green check.
+	const requireWebGL2 = (ctx) => {
+		if (renderer === undefined) {
+			ctx.skip("WebGL2 renderer not available in this environment");
+		}
+	};
+
+	it("lazily constructs the GPU renderer when first asked", (ctx) => {
+		requireWebGL2(ctx);
+		// `_getTMXGPURendererFor` is the lazy factory on WebGLRenderer
+		const r1 = renderer._getTMXGPURendererFor("orthogonal");
+		const r2 = renderer._getTMXGPURendererFor("orthogonal");
+		expect(r1).toBeInstanceOf(OrthogonalTMXLayerGPURenderer);
+		expect(r2).toBe(r1);
+	});
+
+	/**
+	 * Regression: the GPU renderer used to bind its index texture to a
+	 * hardcoded unit (7) without telling the batcher. That left
+	 * `boundTextures[7]` stale and any other texture allocated to unit 7
+	 * later collided silently — every subsequent atlas draw on that unit
+	 * sampled the index texture and went invisible.
+	 *
+	 * Now the index texture flows through `cache.getUnit` /
+	 * `batcher.uploadTexture` like everything else. This test confirms
+	 * the unit it receives is whatever the cache allocator hands out
+	 * (not a fixed magic number) and matches what the cache reports.
+	 */
+	it("allocates the index texture's unit through the standard cache", (ctx) => {
+		requireWebGL2(ctx);
+		const gpu = renderer._getTMXGPURendererFor("orthogonal");
+
+		const cols = 4;
+		const rows = 3;
+		const layer = {
+			cols,
+			rows,
+			layerData: new Uint16Array(cols * rows * 2),
+			dataVersion: 0,
+		};
+		layer.layerData[0] = 1;
+		layer.layerData[2] = 17;
+		layer.layerData[4] = 42;
+
+		const resource = gpu._getResource(layer);
+		expect(resource).toBeInstanceOf(BufferTextureResource);
+		expect(resource.premultipliedAlpha).toBe(false);
+		expect(resource.filter).toBe(renderer.gl.NEAREST);
+
+		const batcher = renderer.setBatcher("quad");
+		const unit = batcher.uploadTexture(resource, cols, rows);
+
+		// unit comes from the dynamic cache allocator — not a hardcoded slot
+		expect(Number.isInteger(unit)).toBe(true);
+		expect(unit).toBeGreaterThanOrEqual(0);
+		expect(renderer.cache.getUnit(resource)).toBe(unit);
+
+		gpu.reset();
+	});
+
+	/**
+	 * Regression: the index data has A=0 on every texel (the high byte of
+	 * the flip mask is unused). The standard texture pipeline keeps
+	 * `UNPACK_PREMULTIPLY_ALPHA_WEBGL = true`, which would have the driver
+	 * multiply RGB by A/255 = 0 and silently wipe every GID. The resource
+	 * declares `premultipliedAlpha: false` and the batcher reconciles GL
+	 * state per upload — bytes round-trip intact.
+	 */
+	it("preserves GID bytes through upload (no premultiply alpha)", (ctx) => {
+		requireWebGL2(ctx);
+		const gl = renderer.gl;
+		const gpu = renderer._getTMXGPURendererFor("orthogonal");
+
+		const cols = 2;
+		const rows = 1;
+		const layer = {
+			cols,
+			rows,
+			layerData: new Uint16Array(cols * rows * 2),
+			dataVersion: 0,
+		};
+		// cell (0,0): GID = 0x0001 — bytes 01 00 00 00
+		layer.layerData[0] = 1;
+		// cell (1,0): GID = 0x00FF — bytes FF 00 00 00
+		layer.layerData[2] = 0xff;
+
+		const batcher = renderer.setBatcher("quad");
+		const resource = gpu._getResource(layer);
+		batcher.uploadTexture(resource, cols, rows);
+		const texture = batcher.getTexture2D(renderer.cache.getUnit(resource));
+
+		const fbo = gl.createFramebuffer();
+		gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
+		gl.framebufferTexture2D(
+			gl.FRAMEBUFFER,
+			gl.COLOR_ATTACHMENT0,
+			gl.TEXTURE_2D,
+			texture,
+			0,
+		);
+		expect(gl.checkFramebufferStatus(gl.FRAMEBUFFER)).toBe(
+			gl.FRAMEBUFFER_COMPLETE,
+		);
+
+		const pixels = new Uint8Array(cols * rows * 4);
+		gl.readPixels(0, 0, cols, rows, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
+		expect(pixels[0]).toBe(1);
+		expect(pixels[4]).toBe(0xff);
+
+		gl.bindFramebuffer(gl.FRAMEBUFFER, null);
+		gl.deleteFramebuffer(fbo);
+		gpu.reset();
+	});
+
+	/**
+	 * `reset()` (invoked from `WebGLRenderer.reset()` on `GAME_RESET`)
+	 * must drop every cached per-layer resource so each level transition
+	 * starts with a clean cache.
+	 */
+	it("drops every cached layer resource on reset()", (ctx) => {
+		requireWebGL2(ctx);
+		const gpu = renderer._getTMXGPURendererFor("orthogonal");
+		const batcher = renderer.setBatcher("quad");
+
+		const makeLayer = (cols, rows) => {
+			return {
+				cols,
+				rows,
+				layerData: new Uint16Array(cols * rows * 2),
+				dataVersion: 0,
+			};
+		};
+
+		const l1 = makeLayer(2, 2);
+		const l2 = makeLayer(3, 3);
+		batcher.uploadTexture(gpu._getResource(l1), l1.cols, l1.rows);
+		batcher.uploadTexture(gpu._getResource(l2), l2.cols, l2.rows);
+		expect(gpu.resources.size).toBe(2);
+
+		gpu.reset();
+		expect(gpu.resources.size).toBe(0);
+	});
+
+	/**
+	 * Animation lookup: non-animated tilesets don't allocate a lookup
+	 * entry at all (saves a texture unit + a GL texture); animated
+	 * tilesets get a `tileCount × 1` RGBA8 texture initialized to
+	 * identity (localId → localId).
+	 */
+	it("only allocates an animation lookup for animated tilesets", (ctx) => {
+		requireWebGL2(ctx);
+		const gpu = renderer._getTMXGPURendererFor("orthogonal");
+
+		const staticTileset = {
+			isAnimated: false,
+			animations: new Map(),
+		};
+		const animatedTileset = {
+			isAnimated: true,
+			animations: new Map([[5, { cur: { tileid: 5 } }]]),
+		};
+
+		expect(gpu._getOrUpdateAnimLookup(staticTileset, 16)).toBeUndefined();
+
+		const entry = gpu._getOrUpdateAnimLookup(animatedTileset, 16);
+		expect(entry).toBeDefined();
+		expect(entry.tileCount).toBe(16);
+		// identity initialization: texel 7 encodes localId 7 → (R=7, G=0)
+		expect(entry.data[7 * 4 + 0]).toBe(7);
+		expect(entry.data[7 * 4 + 1]).toBe(0);
+		// the animated entry at localId 5 has been written too — same as
+		// its current frame, so still 5
+		expect(entry.data[5 * 4 + 0]).toBe(5);
+
+		gpu.reset();
+	});
+
+	/**
+	 * When an animation ticks (the tileset's `anim.cur.tileid` changes),
+	 * the lookup data is rewritten and the `dirty` flag flips so the
+	 * batcher knows to force-reupload on the next bind.
+	 */
+	it("marks the animation lookup dirty when a frame changes", (ctx) => {
+		requireWebGL2(ctx);
+		const gpu = renderer._getTMXGPURendererFor("orthogonal");
+
+		const anim = { cur: { tileid: 10 } };
+		const tileset = { isAnimated: true, animations: new Map([[10, anim]]) };
+
+		// initial: identity at slot 10 → (10, 0)
+		const entry = gpu._getOrUpdateAnimLookup(tileset, 32);
+		expect(entry.data[10 * 4 + 0]).toBe(10);
+		expect(entry.dirty).toBe(false);
+
+		// advance the frame to a value spanning the lo/hi byte boundary
+		anim.cur.tileid = 258; // 258 = 0x0102  → lo=2, hi=1
+		gpu._getOrUpdateAnimLookup(tileset, 32);
+		expect(entry.data[10 * 4 + 0]).toBe(2);
+		expect(entry.data[10 * 4 + 1]).toBe(1);
+		expect(entry.dirty).toBe(true);
+
+		gpu.reset();
+	});
+
+	/**
+	 * The animation lookup must round-trip through GL intact. Its texels
+	 * have A=0, so an upload with UNPACK_PREMULTIPLY_ALPHA_WEBGL = true
+	 * would multiply RGB by 0 and silently wipe every frame index: the
+	 * same trap the index texture had to navigate.
+	 */
+	it("preserves animation lookup bytes through upload", (ctx) => {
+		requireWebGL2(ctx);
+		const gl = renderer.gl;
+		const gpu = renderer._getTMXGPURendererFor("orthogonal");
+		const batcher = renderer.setBatcher("quad");
+
+		// tile 0 → frame 3 (lo=3, hi=0), tile 1 → frame 256 (lo=0, hi=1)
+		const tileset = {
+			isAnimated: true,
+			animations: new Map([
+				[0, { cur: { tileid: 3 } }],
+				[1, { cur: { tileid: 256 } }],
+			]),
+		};
+		const entry = gpu._getOrUpdateAnimLookup(tileset, 4);
+		batcher.uploadTexture(entry.resource, entry.tileCount, 1, true);
+		const texture = batcher.getTexture2D(
+			renderer.cache.getUnit(entry.resource),
+		);
+
+		const fbo = gl.createFramebuffer();
+		gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
+		gl.framebufferTexture2D(
+			gl.FRAMEBUFFER,
+			gl.COLOR_ATTACHMENT0,
+			gl.TEXTURE_2D,
+			texture,
+			0,
+		);
+		expect(gl.checkFramebufferStatus(gl.FRAMEBUFFER)).toBe(
+			gl.FRAMEBUFFER_COMPLETE,
+		);
+
+		const pixels = new Uint8Array(4 * 4);
+		gl.readPixels(0, 0, 4, 1, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
+		// tile 0 → 3
+		expect(pixels[0 * 4 + 0]).toBe(3);
+		expect(pixels[0 * 4 + 1]).toBe(0);
+		// tile 1 → 256 = 0x0100
+		expect(pixels[1 * 4 + 0]).toBe(0);
+		expect(pixels[1 * 4 + 1]).toBe(1);
+		// tile 2 / 3 stay identity
+		expect(pixels[2 * 4 + 0]).toBe(2);
+		expect(pixels[3 * 4 + 0]).toBe(3);
+
+		gl.bindFramebuffer(gl.FRAMEBUFFER, null);
+		gl.deleteFramebuffer(fbo);
+		gpu.reset();
+	});
+
+	/**
+	 * `reset()` must drop the animation-lookup map too, not just the
+	 * per-layer index-texture map. Otherwise the lookup textures leak
+	 * across level transitions.
+	 */
+	it("drops every animation lookup on reset()", (ctx) => {
+		requireWebGL2(ctx);
+		const gpu = renderer._getTMXGPURendererFor("orthogonal");
+
+		gpu._getOrUpdateAnimLookup(
+			{ isAnimated: true, animations: new Map([[0, { cur: { tileid: 0 } }]]) },
+			4,
+		);
+		gpu._getOrUpdateAnimLookup(
+			{ isAnimated: true, animations: new Map([[1, { cur: { tileid: 1 } }]]) },
+			8,
+		);
+		expect(gpu.animLookups.size).toBe(2);
+
+		gpu.reset();
+		expect(gpu.animLookups.size).toBe(0);
+	});
+});
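The animation-lookup specs above pin down a byte layout: texel *i* holds the current frame id of tile *i*, low byte in R and high byte in G, and non-animated tiles map to themselves. The following is a minimal JavaScript sketch of that packing, not melonJS's `_getOrUpdateAnimLookup`. `buildAnimLookup`, `decodeFrame`, and `premultiply` are hypothetical helpers, and writing 0 to B and 255 to A is an assumption (the specs only check R and G). `premultiply` approximates what `UNPACK_PREMULTIPLY_ALPHA_WEBGL = true` does to a texel, which shows why an alpha of 0 would wipe the packed bytes.

```javascript
// Hypothetical sketch: build an RGBA8 lookup where texel i holds the
// current frame id of tile i (identity for non-animated tiles).
// `animations` mirrors the stub shape used in the specs:
// Map<tileIndex, { cur: { tileid: number } }>.
function buildAnimLookup(animations, tileCount) {
  const data = new Uint8Array(tileCount * 4);
  for (let i = 0; i < tileCount; i++) {
    const frame = animations.has(i) ? animations.get(i).cur.tileid : i;
    data[i * 4 + 0] = frame & 0xff; // low byte  -> R
    data[i * 4 + 1] = (frame >> 8) & 0xff; // high byte -> G
    data[i * 4 + 2] = 0; // unused (assumption)
    data[i * 4 + 3] = 255; // opaque, so premultiplication leaves R/G intact
  }
  return data;
}

// Inverse of the packing, reading the raw bytes back.
function decodeFrame(data, tileId) {
  return data[tileId * 4] + data[tileId * 4 + 1] * 256;
}

// Approximation of UNPACK_PREMULTIPLY_ALPHA_WEBGL = true applied to one
// texel: each color channel is scaled by alpha / 255.
function premultiply(r, g, b, a) {
  return [r, g, b].map((c) => Math.round((c * a) / 255)).concat(a);
}

const lookup = buildAnimLookup(
  new Map([
    [0, { cur: { tileid: 3 } }],
    [1, { cur: { tileid: 256 } }],
  ]),
  4,
);
console.log(Array.from(lookup.subarray(0, 8))); // [3, 0, 0, 255, 0, 1, 0, 255]
console.log(decodeFrame(lookup, 1)); // 256
console.log(premultiply(3, 0, 0, 0)); // [0, 0, 0, 0]: the frame id is lost
```

Assuming the lookup is uploaded as a normalized `RGBA8` texture, a fragment shader reading it with `texelFetch` sees each channel divided by 255, so the same decode becomes `round(r * 255.0) + round(g * 255.0) * 256.0`.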
diff --git a/packages/melonjs/tests/uniform-caching.spec.js b/packages/melonjs/tests/uniform-caching.spec.js
new file mode 100644
index 000000000..06ddc3ed0
--- /dev/null
+++ b/packages/melonjs/tests/uniform-caching.spec.js
@@ -0,0 +1,151 @@
+import { describe, expect, it } from "vitest";
+import { extractUniforms } from "../src/video/webgl/utils/uniforms.js";
+
+/**
+ * Builds a minimal stub GL context that records every `uniform*` call.
+ * `extractUniforms` only needs `getUniformLocation` and the `uniformX`
+ * methods it actually invokes; everything else can be left out.
+ */
+function makeStubGL() {
+	const calls = [];
+	function record(name) {
+		return function (...args) {
+			calls.push({ name, args });
+		};
+	}
+	return {
+		calls,
+		getUniformLocation(program, name) {
+			return { name };
+		},
+		uniform1i: record("uniform1i"),
+		uniform1f: record("uniform1f"),
+		uniform2fv: record("uniform2fv"),
+		uniform4fv: record("uniform4fv"),
+		uniformMatrix4fv: record("uniformMatrix4fv"),
+	};
+}
+
+function makeShader(fragmentSrc) {
+	return {
+		vertex: "",
+		fragment: fragmentSrc,
+		program: {},
+	};
+}
+
+function countCalls(gl, name) {
+	let n = 0;
+	for (const c of gl.calls) {
+		if (c.name === name) {
+			n++;
+		}
+	}
+	return n;
+}
+
+describe("uniform caching", () => {
+	it("skips the GL call when the same scalar is set twice", () => {
+		const gl = makeStubGL();
+		const uniforms = extractUniforms(
+			gl,
+			makeShader("uniform float uOpacity;\nuniform int uMode;"),
+		);
+
+		uniforms.uOpacity = 0.5;
+		uniforms.uOpacity = 0.5;
+		uniforms.uOpacity = 0.5;
+		expect(countCalls(gl, "uniform1f")).toBe(1);
+
+		uniforms.uMode = 7;
+		uniforms.uMode = 7;
+		expect(countCalls(gl, "uniform1i")).toBe(1);
+	});
+
+	it("emits a fresh GL call when the scalar changes", () => {
+		const gl = makeStubGL();
+		const uniforms = extractUniforms(gl, makeShader("uniform float uOpacity;"));
+
+		uniforms.uOpacity = 0.5;
+		uniforms.uOpacity = 0.75;
+		uniforms.uOpacity = 0.5;
+		expect(countCalls(gl, "uniform1f")).toBe(3);
+	});
+
+	it("compares vec values element-wise, not by reference", () => {
+		const gl = makeStubGL();
+		const uniforms = extractUniforms(gl, makeShader("uniform vec2 uPos;"));
+
+		// Same values in three different arrays: only the first write reaches GL.
+		uniforms.uPos = new Float32Array([10, 20]);
+		uniforms.uPos = new Float32Array([10, 20]);
+		uniforms.uPos = [10, 20];
+		expect(countCalls(gl, "uniform2fv")).toBe(1);
+	});
+
+	it("detects in-place mutation of a reused scratch buffer", () => {
+		const gl = makeStubGL();
+		const uniforms = extractUniforms(gl, makeShader("uniform vec2 uPos;"));
+
+		// Realistic hot path: the caller keeps a scratch Float32Array and
+		// rewrites it before each assignment, so the cache must store a copy
+		// of the values (not the reference) to detect the in-place change.
+		const scratch = new Float32Array(2);
+		scratch[0] = 1;
+		scratch[1] = 2;
+		uniforms.uPos = scratch;
+
+		scratch[0] = 3;
+		scratch[1] = 4;
+		uniforms.uPos = scratch;
+
+		scratch[0] = 3;
+		scratch[1] = 4;
+		uniforms.uPos = scratch;
+
+		expect(countCalls(gl, "uniform2fv")).toBe(2);
+	});
+
+	it("treats different uniforms with the same value as independent", () => {
+		const gl = makeStubGL();
+		const uniforms = extractUniforms(
+			gl,
+			makeShader("uniform vec2 uA;\nuniform vec2 uB;"),
+		);
+
+		uniforms.uA = new Float32Array([1, 2]);
+		uniforms.uB = new Float32Array([1, 2]);
+		// Different uniforms — both must emit.
+		expect(countCalls(gl, "uniform2fv")).toBe(2);
+	});
+
+	it("caches matrix uploads element-wise", () => {
+		const gl = makeStubGL();
+		const uniforms = extractUniforms(gl, makeShader("uniform mat4 uProj;"));
+
+		const m = new Float32Array(16);
+		m[0] = 1;
+		m[5] = 1;
+		m[10] = 1;
+		m[15] = 1;
+		uniforms.uProj = m;
+		uniforms.uProj = new Float32Array(m);
+		expect(countCalls(gl, "uniformMatrix4fv")).toBe(1);
+
+		m[12] = 100;
+		uniforms.uProj = m;
+		expect(countCalls(gl, "uniformMatrix4fv")).toBe(2);
+	});
+
+	it("keeps caches independent across shaders sharing a GL context", () => {
+		const gl = makeStubGL();
+		const a = extractUniforms(gl, makeShader("uniform float uOpacity;"));
+		const b = extractUniforms(gl, makeShader("uniform float uOpacity;"));
+
+		a.uOpacity = 0.5;
+		b.uOpacity = 0.5;
+		// Each shader has its own cache — shader B's first write must
+		// reach GL or its uniform would be left at the default value.
+		expect(countCalls(gl, "uniform1f")).toBe(2);
+	});
+});
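The uniform-caching specs describe a contract: state kept per shader, repeated scalars skipped, vectors and matrices compared element by element, and in-place mutation of a reused buffer detected. A minimal sketch of a cache meeting that contract follows, assuming the caller passes the actual `gl.uniform*` call in as a callback. `UniformCache`, `set`, and `sameElements` are hypothetical names; this is not necessarily how `extractUniforms` is structured internally.

```javascript
// Hypothetical sketch of a per-shader, value-compared uniform cache.
// `upload` stands in for the real gl.uniform* / gl.uniformMatrix* call.
class UniformCache {
  constructor() {
    this.last = new Map(); // uniform name -> snapshot of last uploaded value
  }

  // Returns true when the upload was issued, false on a cache hit.
  set(name, value, upload) {
    const prev = this.last.get(name);
    if (typeof value === "number") {
      if (prev === value) return false;
      this.last.set(name, value);
    } else {
      if (prev !== undefined && sameElements(prev, value)) return false;
      // Store a copy, never the caller's array: a reused scratch buffer
      // would otherwise always compare equal to itself.
      this.last.set(name, Array.from(value));
    }
    upload(value);
    return true;
  }
}

// Element-wise comparison for arrays and typed arrays.
function sameElements(a, b) {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

const cache = new UniformCache();
let uploads = 0;
const upload = () => {
  uploads++;
};
const scratch = new Float32Array([1, 2]);
cache.set("uPos", scratch, upload);
scratch[0] = 3;
cache.set("uPos", scratch, upload); // mutated in place: uploads again
cache.set("uPos", [3, 2], upload); // same values, new array: skipped
console.log(uploads); // 2
```

The key choice is snapshotting with `Array.from(value)` instead of keeping the caller's reference; comparing a scratch buffer against itself would always report a hit, which is the failure the in-place mutation spec guards against. One cache instance per shader keeps two programs that share a GL context from suppressing each other's first upload.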