
feat(swiftbuddy): MemPalace v1, native macOS theming, HF model manage…#18

Open
solderzzc wants to merge 65 commits into main from
feature/swiftbuddy-mempalace-v1

Conversation

@solderzzc
Member

…ment, TDD harness

SwiftBuddy App:

  • Adaptive macOS theming via native NSColor semantics (Light/Dark mode)
  • Removed confusing segmented picker from ModelPickerView
  • Embedded HuggingFace search directly into ModelManagementView
  • Added 'Strict MLX Formatting Only' toggle for HF search filtering
  • Fixed ServerManager ghost isOnline state (deferred until NIO bind succeeds)
  • Fixed missing 'pickaxe' SF Symbol → 'hammer.fill'
  • Bulletproof JSON extraction via boundary scanning (cleanJSON)
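The boundary-scanning idea behind cleanJSON can be sketched as follows. This is an illustrative assumption, not the PR's actual implementation: scan to the first `{`, then walk forward tracking brace depth and string/escape state, and return the first balanced object, which discards any chatter an LLM wraps around its JSON.

```swift
import Foundation

/// Extract the first balanced JSON object from noisy LLM output.
/// (Sketch only; the PR's cleanJSON may differ in details.)
func cleanJSON(_ raw: String) -> String? {
    guard let start = raw.firstIndex(of: "{") else { return nil }
    var depth = 0
    var inString = false
    var escaped = false
    var index = start
    while index < raw.endIndex {
        let ch = raw[index]
        if escaped {
            escaped = false            // previous char was a backslash
        } else if ch == "\\" {
            escaped = true
        } else if ch == "\"" {
            inString.toggle()          // braces inside strings don't count
        } else if !inString {
            if ch == "{" { depth += 1 }
            if ch == "}" {
                depth -= 1
                if depth == 0 {
                    return String(raw[start...index])
                }
            }
        }
        index = raw.index(after: index)
    }
    return nil  // unbalanced input: no complete object found
}
```

Scanning for a balanced boundary is more robust than a regex here because JSON nests arbitrarily, and string-aware depth tracking survives braces embedded in values.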

MemPalace Core:

  • Wing → Room → MemoryEntry architecture (SwiftData)
  • Apple NLEmbedding vector search with cosine similarity
  • ExtractionService for LLM-based memory mining
  • 3 tool-calling schemas (save_fact, search, list_rooms)
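A minimal sketch of the vector-search step, assuming the straightforward pairing of Apple's `NLEmbedding` with a hand-rolled cosine similarity (the ranking loop and query text below are illustrative, not taken from the PR):

```swift
import NaturalLanguage

/// Cosine similarity between two equal-length embedding vectors.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    guard a.count == b.count, !a.isEmpty else { return 0 }
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denom = normA.squareRoot() * normB.squareRoot()
    return denom == 0 ? 0 : dot / denom
}

// Rank stored memory texts against a query (hypothetical usage):
if let embedding = NLEmbedding.sentenceEmbedding(for: .english),
   let queryVec = embedding.vector(for: "where is the car parked?") {
    let memories = ["the car is on level 3", "dinner is at eight"]
    let ranked = memories
        .compactMap { text in
            embedding.vector(for: text).map { (text, cosineSimilarity(queryVec, $0)) }
        }
        .sorted { $0.1 > $1.1 }
    print(ranked.first?.0 ?? "no match")
}
```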

Testing Infrastructure:

  • SwiftBuddyTests target in Package.swift
  • ExtractionServiceTests (3 adversarial JSON parsing tests)
  • HFModelSearchTests (strict/loose MLX filter network tests)
  • Persistent TDD harness system at .agents/harness/
    • Level 1: Memory handling (9 features, 3 passing)
    • Level 2: Model management (10 features, 2 passing)
    • Level 3: MemPalace parity audit vs milla-jovovich/mempalace (34 features, ~22%)

Build:

  • generate_xcodeproj.py updated with Hummingbird + SwiftSoup deps
  • Network entitlements for HF API access
  • arm64-only architecture enforcement

solderzzc added 30 commits April 7, 2026 18:53
- Injected export pipeline guaranteeing MLX metal library initialization hooks bypass Github Action test environments natively
- Introduced currentWing target on ChatViewModel for persona routing
- Intercepted userText explicitly searching SwiftData native memories
- Prepended retrieved factual context invisibly inside system prompts so that even small, weak models retain stable context without added user-visible latency
…rference and make downloaded models directly tappable to load
…o prevent macOS layout recursion crashes resulting in blank models
…cursive background querying for HF Hub discovery
…skeleton constraints for HuggingFace Hub modal layout
…ng to RegistryService to trace GitHub API access drops
…hed persona.json and statically request known room txt files
… WAL transaction flooding during massive persona corpus ingestion
…oops by converting TextEditor blocks to vertical TextFields inside iOS/macOS active ScrollViews
…ine and introduce Native graphical Map hierarchy for memory rooms
…natively into ChatView toolbars for RAG identity mapping
…tly reflect the currently selected memory persona wing
…try and pivot root Navigation to a primary Friends List model
…ectures by forcefully prepending RAG variables linearly against raw User instructions rather than allocating hostile System Role bounds
solderzzc and others added 28 commits April 8, 2026 16:48
…cks and trap silent HF snapshot failures to guarantee observable developer console logs
…serve KV Prefix caching continuity across MLX generations, and patch RPG Thought UI aesthetics
…r to reject raw boilerplate text and prevent small parameter LLM line-by-line regurgitation
…ridging and append Persona deletion traps in UI
…ap to prevent multiline Persona RAG directives from leaking into user UI bubbles
…he ModelPicker sheets to display real-time global download speeds and ETA dynamically
… Qwen 3 and Qwen 3.5 exclusivity as requested
…mpalace-v1

# Conflicts:
#	.github/workflows/build.yml
#	scripts/profiling/profile_runner.py
… and DMG packaging pipeline via Github Actions
…d() calls during MoE streaming

MoE routing exhibits strong temporal locality — adjacent tokens frequently
route to the same experts (60-70% overlap). This cache stores recently-loaded
quantized expert weight matrices in a bounded LRU (default 2048 entries) keyed
by (safetensorsPath, tensorName, expertIndex).

On cache hits, the entire pread() → allocator::malloc → eval cycle is skipped,
yielding zero I/O latency for repeated expert accesses. Cache hit/miss metrics
are logged to stderr every 10 seconds alongside existing SSD stream stats.

The cache is automatically cleared on model unload to prevent stale weights
and free unified memory.
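The cache described above can be sketched as a bounded LRU keyed by (safetensorsPath, tensorName, expertIndex). The key shape, default capacity of 2048, and clear-on-unload behavior follow the PR text; the Swift types and the simple array-backed recency list are illustrative assumptions (the real cache presumably lives alongside the MLX/Metal runtime code):

```swift
import Foundation

/// Cache key for one quantized expert weight matrix, per the PR description.
struct ExpertKey: Hashable {
    let safetensorsPath: String
    let tensorName: String
    let expertIndex: Int
}

/// Bounded LRU: on a hit, the pread() -> malloc -> eval cycle is skipped.
final class HotExpertCache<Value> {
    private let capacity: Int
    private var storage: [ExpertKey: Value] = [:]
    private var order: [ExpertKey] = []   // front = least recently used
    private(set) var hits = 0
    private(set) var misses = 0

    init(capacity: Int = 2048) { self.capacity = capacity }

    func value(for key: ExpertKey, load: () -> Value) -> Value {
        if let cached = storage[key] {
            hits += 1
            order.removeAll { $0 == key }  // O(n); a linked list would make this O(1)
            order.append(key)
            return cached
        }
        misses += 1
        let loaded = load()                // stands in for pread() + dequantization
        storage[key] = loaded
        order.append(key)
        if storage.count > capacity {
            let evicted = order.removeFirst()
            storage.removeValue(forKey: evicted)
        }
        return loaded
    }

    /// Called on model unload to drop stale weights and free unified memory.
    func removeAll() {
        storage.removeAll()
        order.removeAll()
    }
}
```

With 60-70% of adjacent tokens routing to the same experts, most lookups take the hit path, which is why warm-run throughput improves so sharply over cold SSD streaming.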
…enchmark

Results with Hot Expert LRU Cache active:
- SSD + 16-Worker Prefetch: 3.8 tok/s, 5.95s TTFT, 34.9 GB GPU
- SSD + TurboQuant: 3.0 tok/s, 9.46s TTFT, 34.9 GB GPU
- SSD Stream (cold): 0.01 tok/s, 299.66s TTFT, 88.2 GB GPU

The expert cache eliminates ~60-70% of redundant pread() calls on warm
runs, delivering a 300x+ improvement over cold SSD streaming.
@solderzzc force-pushed the feature/swiftbuddy-mempalace-v1 branch from 102ef78 to 278ea04 on April 10, 2026 03:55