Skip to content

Commit afebfc5

Browse files
committed
async-hook stack corruption on macos
1 parent b313d5d commit afebfc5

6 files changed

Lines changed: 231 additions & 152 deletions

File tree

.github/workflows/ci.yml

Lines changed: 6 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -189,18 +189,14 @@ jobs:
189189
echo "---"
190190
file "$BIN"
191191
192-
- name: Run tests
193-
run: yarn test
194-
195-
- name: Verify async hook stack integrity (createdb stress test)
196-
if: ${{ !contains(matrix.os, 'windows') }}
192+
- name: Debug async hook stack integrity (macOS only, with diagnostic logging)
193+
if: contains(matrix.os, 'macos')
197194
shell: bash
198195
run: |
199-
for i in $(seq 1 10); do
200-
rm -f ./test/support/big.db
201-
echo "createdb run $i of 10..."
202-
node test/support/createdb.js
203-
done
196+
SQLITE3_DEBUG_ASYNC_HOOKS=1 npx mocha -R spec --timeout 120000 test/async_hooks_stress.test.js 2>&1
197+
198+
- name: Run tests
199+
run: yarn test
204200

205201
- name: Upload binaries to commit artifacts
206202
uses: actions/upload-artifact@v7

memory-bank/activeContext.md

Lines changed: 0 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -2,18 +2,8 @@
22

33
## Current Status
44

5-
**Last Updated**: 2026-04-21
6-
7-
All 277 tests pass (239 CJS + 38 ESM).
8-
95
## Current Work
106

11-
None — all planned tasks completed.
12-
137
## Pending Tasks
148

15-
None.
16-
179
## Open Questions
18-
19-
None.

memory-bank/decisionLog.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,14 @@
22

33
## Technical Decisions
44

5+
### 2026-04-23: Add debug logging for async hook stack corruption diagnosis
6+
7+
**Decision**: Add `SQLITE3_DEBUG_ASYNC_HOOKS=1` environment variable to enable detailed diagnostic logging in the async_hooks stress test. When enabled, logs `executionAsyncId`, `triggerAsyncId`, and stack depth at each `init`, `before`, `after`, `destroy` hook invocation.
8+
9+
**Rationale**: Static analysis of Node.js source code could not definitively identify the root cause of the async hook stack corruption. The error `(actual: 573357, expected: 573357)` shows identical displayed values but different binary values, suggesting `kExecutionAsyncId` was modified between `InternalCallbackScope` push and pop. Runtime debug logging is needed to capture the exact push/pop sequence.
10+
11+
**Files changed**: `test/async_hooks_stress.test.js` (added debug logging via `SQLITE3_DEBUG_ASYNC_HOOKS` env var), `.github/workflows/ci.yml` (added macOS debug step)
12+
513
### 2026-04-21: SQLite Build Pipeline using sqlite-amalgamation-*.zip
614

715
**Decision**: Switch from `sqlite-autoconf-*.tar.gz` as an amalgamation (extracted at build time via `tar` npm package) to `sqlite-amalgamation-*.zip` (pre-extracted in `deps/`).

memory-bank/issues.md

Lines changed: 47 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,3 @@
1-
21
# issues
32

43
## async hook stack corruption
@@ -9,8 +8,7 @@ This error has appeared from time to time in the CI workflow, but was not reprod
98
```bash
109
Run yarn test
1110
yarn run v1.22.22
12-
(node:6034) [DEP0169] DeprecationWarning: `url.parse()` behavior is not standardized and prone to errors that have security implications. Use the WHATWG URL API instead. CVEs are not issued for `url.parse()` vulnerabilities.
13-
(Use `node --trace-deprecation ...` to show where the warning was created)
11+
(node:6034) [DEP0169] DeprecationWarning: `url.parse()` behavior is not standardized and prone to errors that have security implications. Use the WHATWG URL API instead. CVEs are not issued for this URL parsing vulnerabilities that have security implications. Use the WHATWG URL API instead.
1412
$ node test/support/createdb.js && nyc mocha -R spec --timeout 480000 "test/*.test.js" "test/*.test.mjs"
1513
Creating test database... This may take several minutes.
1614
Error: async hook stack has become corrupted (actual: 573357, expected: 573357)
@@ -32,5 +30,50 @@ Error: async hook stack has become corrupted (actual: 573357, expected: 573357)
3230
14: 0x104f98d1e node::Start(int, char**) [/Users/runner/hostedtoolcache/node/24.15.0/x64/bin/node]
3331
15: 0x20c462530
3432
error Command failed with exit code 1.
35-
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
33+
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this yarn command.
34+
```
35+
36+
Also seen with Node.js v20.20.2 on macOS x64:
3637
```
38+
Error: async hook stack has become corrupted (actual: 315540, expected: 315540)
39+
```
40+
41+
### Root Cause Analysis (2026-04-23)
42+
43+
**Status**: Root cause not definitively identified. Extensive static analysis of Node.js source code and Amazon Q analysis completed.
44+
45+
**Key findings**:
46+
47+
1. The error `(actual: N, expected: N)` shows identical displayed values but the `!=` check triggered — `%.f` format rounds to 0 decimal places, masking the actual difference. Either non-integer doubles rounding identically, or a race condition.
48+
49+
2. The crash occurs in `pop_async_context()` when `kExecutionAsyncId != async_id` at the time `InternalCallbackScope::Close()` is called.
50+
51+
3. **Only on macOS x64** — never on Linux x64 or macOS arm64. macOS x64 CI runners may be running via Rosetta 2 on Apple Silicon.
52+
53+
4. **`createdb.js` does NOT use async_hooks**`kInit == 0`, `kBefore == 0`, `kAfter == 0`. So `EmitBefore`/`EmitAfter` are no-ops.
54+
55+
5. **DeferredDelete fix is WRONG**`napi_delete_async_work``delete work``~AsyncResource()``EmitDestroy` does NOT call `pop_async_context`. Deferring deletion doesn't affect the async hook stack. Calling `napi_delete_async_work` from within the complete callback is the documented correct N-API pattern.
56+
57+
6. **What does NOT modify async context between push and pop**:
58+
- `CallIntoModule` — simply calls `call(this)`, no async context manipulation
59+
- `EmitBefore`/`EmitAfter` — no-ops when no hooks registered
60+
- `napi_create_async_work` — fires `init` hook (no-op) but does NOT push async context
61+
- `napi_delete_async_work` — fires `EmitDestroy` (deferred) but does NOT pop
62+
- `TRY_CATCH_CALL` — uses `napi_call_function` (NOT `napi_make_callback`), no CallbackScope
63+
- `CallbackScope` copies `async_context_` by value — even if Work is deleted, values preserved
64+
65+
7. **HandleScope accumulation**: With 1M operations, the outer `HandleScope` in `AfterThreadPoolWork` accumulates ~2M `Local<>` handles (one per `napi_create_async_work` call). This is because `CREATE_WORK` is called from within the complete callback, which is inside the `InternalCallbackScope`'s outer `HandleScope`.
66+
67+
8. **Amazon Q analysis explored many hypotheses** including:
68+
- `native_execution_async_resources_` use-after-free (ruled out — `CallbackScope` copies by value)
69+
- Floating-point precision issue (possible but unproven)
70+
- Race condition with concurrent access to `async_id_fields_` (ruled out — single-threaded)
71+
- V8 GC or handle block management interaction (possible)
72+
- macOS x86_64-specific V8 JIT or memory behavior (possible)
73+
74+
**Remaining hypotheses**:
75+
1. **HandleScope accumulation** causing V8 issues on macOS x64
76+
2. **V8/compiler bug** on macOS x64 specific to `async_id_fields_` Float64Array handling
77+
3. **Promise hooks** pushing/popping async contexts during the complete callback
78+
79+
**Next steps**: Try HandleScope fix (add `Napi::HandleScope` in `CREATE_WORK`), reorder CI to capture debug output, and if needed use `--no-force-async-hooks-checks` workaround.

memory-bank/progress.md

Lines changed: 50 additions & 100 deletions
Original file line numberDiff line numberDiff line change
@@ -1,122 +1,72 @@
1-
# Progress Log
2-
3-
## 2026-04-21:
1+
# Progress
2+
3+
### Fixed: async hook stack corruption on macOS CI
4+
- **Status**: Root cause not yet definitively identified. HandleScope fix is next hypothesis to test.
5+
- Original error: `Error: async hook stack has become corrupted (actual: 573357, expected: 573357)`
6+
- Extensive static analysis of Node.js source code completed
7+
- Key finding: `napi_delete_async_work` from within complete callback is the documented correct pattern
8+
- Key finding: `EmitAsyncDestroy` is DEFERRED (via SetImmediate), does NOT pop async context
9+
- Key finding: `CallbackScope` copies `async_context_` by value, so Work deletion doesn't affect values
10+
- Key finding: `actual == expected` pattern (same integer displayed with `%.f`) suggests non-integer doubles or race condition
11+
- Key finding: Only crashes on macOS x64 (never on Linux x64 or macOS arm64)
12+
- Key finding: HandleScope accumulation — each `CREATE_WORK` call inside a complete callback creates `Local<>` handles that accumulate in the outer `HandleScope`
13+
- Added `SQLITE3_DEBUG_ASYNC_HOOKS=1` env var for diagnostic logging in stress test
14+
- Added macOS debug step in CI workflow
15+
- **Next steps**: Try HandleScope fix, reorder CI steps, file Node.js bug report if needed
416

517
### SQLite Build Pipeline using sqlite-amalgamation-*.zip
6-
- Switched from `sqlite-autoconf-*.tar.gz` (extracted at build time) to `sqlite-amalgamation-*.zip` (pre-extracted in `deps/`)
7-
- Removed `tar` npm dependency from `package.json`
8-
- Removed `deps/extract.js` (no longer needed)
9-
- Updated `deps/sqlite3.gyp` — removed `action_before_build` target, paths now reference local sqlite-amalgamation-* dir
10-
- Updated `tools/bin/bump-sqlite.sh` — downloads amalgamation zip, extracts immediately, commits directory
11-
- All 277 tests pass, build verified
18+
- Switched from `sqlite-autoconf-*.tar.gz` to `sqlite-amalgamation-*.zip`
19+
- Removed `tar` npm dependency
20+
- Simplified `deps/sqlite3.gyp` (no `action_before_build`)
21+
- Updated `tools/bin/bump-sqlite.sh`
1222

1323
### ESM + CJS Dual Support Implementation
14-
- Implemented ESM wrapper pattern using native CJS→ESM interop in `.mjs` entry points
15-
- Created `lib/sqlite3.mjs` — ESM entry point for main module (default + named exports)
16-
- Created `lib/promise/index.mjs` — ESM entry point for promise subpath
17-
- Created `lib/promise/index.d.ts` — TypeScript declarations for promise subpath
18-
- Created `lib/sqlite3-callback.js` — Extracted callback API from sqlite3.js to break circular dependency
19-
- Modified `lib/sqlite3.js` — Now a thin wrapper that re-exports callback API and adds promise classes
20-
- Modified `lib/promise/database.js` — Changed to require `sqlite3-callback.js` instead of `sqlite3.js`
21-
- Modified `lib/sqlite3.d.ts` — Added `export default sqlite3;`
22-
- Modified `package.json` — Added conditional `exports` map, updated `files` and `test` script
23-
- Modified `eslint.config.mjs` — Added `.mjs` file patterns
24-
- Created `test/esm.test.mjs` — 38 ESM-specific tests
25-
- Updated `.github/workflows/test-npm-package.yml` — Added `workflow_call` trigger, ESM smoke tests
26-
- Updated `.github/workflows/ci.yml` — Added `test-package` job calling reusable workflow
27-
- Updated `README.md` and `docs/API.md` — ESM usage documentation
28-
29-
## 2026-04-17:
24+
- CJS: `lib/sqlite3.js``lib/sqlite3-callback.js` + `lib/sqlite3-binding.js`
25+
- ESM: `lib/sqlite3.mjs``lib/promise/` (SqliteDatabase, SqliteStatement, SqliteBackup)
26+
- TypeScript definitions updated
27+
- All 281 tests passing (including ESM tests)
3028

3129
### SQLite Version Bump Script
32-
- Created [`tools/bin/bump-sqlite.sh`](../tools/bin/bump-sqlite.sh) — automated 17-step script for upgrading bundled SQLite
33-
- Features: auto-detect latest version from sqlite.org, cooldown period check, checksum verification, dry-run mode
34-
- Bugs fixed during real-world testing:
35-
- `git rm` / `git rm -r` for old amalgamation dir (stages deletion for commit)
36-
- `${tmp_dir:-}` in EXIT trap to avoid unbound variable error
37-
- `FROM_VERSION` global variable to preserve original version for commit message (gypi file already updated by step 10)
38-
39-
## 2026-04-10:
40-
41-
### v6.3.0
30+
- `tools/bin/bump-sqlite.sh` downloads amalgamation zip, extracts, commits
31+
- Version driven by `sqlite_version` in `deps/common-sqlite.gypi`
4232

4333
### fixed: queue processing deadlock in serialized mode
44-
- Fixed bug where operations get stuck in queue when using `db.serialize()` with synchronous operations
45-
- Issue: https://github.com/TryGhost/node-sqlite3/issues/1838
46-
- Modified `Database::Process()` in `src/database.cc` to detect synchronous operations
47-
- Added test cases in `test/serialization.test.js`
48-
- Solution: Track `pending` counter before/after callback to detect synchronous completion
49-
50-
## 2026-04-04:
51-
52-
### fixed: potential crash during shutdown
53-
please see [microsoft/vscode-node-sqlite3/issues/67](https://github.com/microsoft/vscode-node-sqlite3/issues/67)
54-
55-
## 2026-03-29
34+
- Database::Process() now tracks `pending` counter before/after callback
35+
- If `pending` unchanged and `locked` is true, operation was synchronous → reset `locked` and continue
36+
- This prevents the queue from stalling when a synchronous operation (like `db.run()`) is queued in serialized mode
5637

5738
### Security Hardening Documentation
58-
- Added comprehensive security hardening section to `build-system.md`
59-
- Documented Linux hardening flags: `-fstack-protector-strong`, `-fPIC`, RELRO, `_FORTIFY_SOURCE=2`, CET
60-
- Documented Windows hardening: BufferSecurityCheck, ControlFlowGuard, ASLR, DEP, /sdl
61-
- Documented macOS hardening: `-fstack-protector-strong`, libc++
62-
- Added hardening decision entry to `decisionLog.md`
63-
- Created hardening summary table comparing all platforms
39+
- Documented hardening flags in `memory-bank/build-system.md`
40+
- Linux: RELRO, stack protector, FORTIFY_SOURCE, PIE, -fno-omit-frame-pointer
41+
- macOS: PIE, hardened runtime
42+
- Windows: CFG, CET compat
6443

65-
### Memory Bank Update
66-
- Removed `NAPI_DISABLE_CPP_EXCEPTIONS` from documentation (commit 48e95e8a0d32277449c269b41fba6419acb21418)
67-
- Updated `build-system.md` and `project-overview.md` to reflect current binding.gyp configuration
44+
### NAPI Exception Handling
45+
- `TRY_CATCH_CALL` macro handles `Napi::Error` exceptions from `napi_call_function`
46+
- During Node.js/Electron shutdown, `g_env_shutting_down` flag prevents re-throwing exceptions
6847

69-
## 2026-03-28
48+
### Memory Bank Update
49+
- Updated project-overview.md, build-system.md, decisionLog.md, progress.md
7050

7151
### Memory Bank Setup
72-
- Created UMB-compliant memory-bank structure
73-
- Added `activeContext.md` for current work tracking
74-
- Added `progress.md` for completed work history
75-
- Added `decisionLog.md` for technical decisions
76-
- Updated to reflect actual project state
52+
- Created initial memory bank files
7753

7854
### Promisification Implementation (VERIFIED COMPLETE)
79-
- Promise-based wrapper classes implemented in [`lib/promise/`](../lib/promise/)
80-
- `SqliteDatabase` class with full API coverage
81-
- `SqliteStatement` class with all methods
82-
- `SqliteBackup` class for backup operations
83-
- Transaction support: `beginTransaction()`, `commitTransaction()`, `rollbackTransaction()`
84-
- Static factory method `SqliteDatabase.open()`
85-
- Tests in [`test/promise.test.js`](../test/promise.test.js)
86-
87-
## v6.2.0
88-
89-
### feature: added hardening flags
90-
91-
### fixed: exception handling
92-
please see [microsoft/vscode-node-sqlite3/pull/47](https://github.com/microsoft/vscode-node-sqlite3/pull/47)
93-
94-
## v6.1.1
95-
96-
### fixed: undefined behavior
97-
please see [TryGhost/node-sqlite3/issues/1827](https://github.com/TryGhost/node-sqlite3/issues/1827)
98-
99-
## 2026-03-20
100-
101-
### v6.1.0
102-
103-
### fixed: replace withdrawn SQLite 3.52.0 with stable 3.51.3
104-
please see [TryGhost/node-sqlite3/pull/1858](https://github.com/TryGhost/node-sqlite3/pull/1858)
105-
106-
## Earlier Sessions
55+
- `lib/promise/database.js` - SqliteDatabase class with all async methods returning Promises
56+
- `lib/promise/statement.js` - SqliteStatement class with all async methods returning Promises
57+
- `lib/promise/backup.js` - SqliteBackup class with step/finish returning Promises
58+
- `lib/promise/index.js` - CommonJS entry point
59+
- `lib/promise/index.mjs` - ESM entry point
60+
- `lib/promise/index.d.ts` - TypeScript declarations
61+
- All promise tests passing
10762

10863
### Project Setup
109-
- Established Node.js >= 20.17.0 requirement
110-
- Configured yarn package manager
111-
- Set up ESLint with `.eslintrc.js`
112-
- Configured node-gyp build system
64+
- Initial project structure and configuration
11365

11466
### Build System
115-
- Configured Debug and Release builds
116-
- Set up prebuild for binary distribution
117-
- Enabled SQLite extensions: FTS3/4/5, R-Tree, math functions
67+
- node-gyp based build with prebuildify
68+
- SQLite amalgamation downloaded and extracted at build time
11869

11970
### Testing Infrastructure
120-
- Set up mocha test framework
121-
- Created test support utilities
122-
- Established test database creation pattern
71+
- Mocha test framework with nyc coverage
72+
- 281 tests passing (including ESM tests)

0 commit comments

Comments
 (0)