Unity is the runtime behind Unify's persistent AI colleagues. It is built for assistants you can interrupt mid-task, redirect without restarting, run in parallel, and grow over time through typed, long-lived state.
Unity and Hermes Agent overlap in ambition, but they optimize for different layers. Hermes is a broad personal-agent product with a large end-user surface. Unity is the cognitive runtime for a different bet: steerable nested execution, code-first plans over typed primitives, dual-brain voice, and distributed state managers that consolidate memory into queryable tables instead of keeping everything in one ever-growing loop.
Start here: Overview • Quickstart • Demos • ARCHITECTURE.md
| Capability | What it means |
|---|---|
| Steerable nested execution | Every operation returns a live handle. Pause, resume, interject, or ask questions at any depth without restarting the work. |
| Code plans, not flat tool menus | The Actor writes Python programs over typed primitives with variables, loops, and control flow, so multi-step work becomes one coherent plan. |
| Dual-brain voice | A real-time voice process handles sub-second conversation while a slower orchestration layer keeps planning and using tools in the background. |
| Distributed state managers | Contacts, knowledge, tasks, transcripts, guidance, files, images, and more each live behind a dedicated manager with its own async LLM tool loop. |
| Structured memory consolidation | Documents, calls, screenshares, tasks, and follow-up corrections get distilled into typed, queryable state instead of one opaque transcript summary. |
| Concurrent steerable actions | Multiple actions can run at once, each with its own inspection and steering surface. |
| Persistent identity across channels | Messages, SMS, email, phone calls, and meetings all feed the same long-term memory and task state. |
If you're comparing the two directly, the clearest distinction is the layer each project emphasizes:
| If you want... | Better fit |
|---|---|
| A polished personal-agent product with a wide messaging/gateway surface, remote deployment paths, and a large end-user docs surface out of the box | Hermes Agent |
| A runtime you can study, embed, or extend around interruptible execution, nested steering, code-first planning, and typed long-lived state | Unity |
Unity's distinctive bets are:
- Every public operation returns a live handle. Steering is a first-class protocol, not an afterthought.
- The Actor writes Python over `primitives.*`. Multi-step composition happens inside one generated program, not one JSON tool call at a time.
- Memory is split across managers. Contacts, knowledge, tasks, transcripts, guidance, files, and images remain queryable and inspectable as separate domains.
- Voice runs as two coordinated brains. A fast real-time process keeps up with speech while a slower brain continues reasoning and tool use.
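To make the first bet concrete, here is a minimal sketch of what a steerable-handle contract could look like. Everything below (`ToyHandle`, `checkpoint`) is invented for illustration; the real protocol lives in `unity/common/async_tool_loop.py` and is not reproduced here.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ToyHandle:
    """Illustrative stand-in for a steerable handle: a running task
    holds one of these, draining steering messages between steps."""
    _steering: asyncio.Queue = field(default_factory=asyncio.Queue)
    _running: asyncio.Event = field(default_factory=asyncio.Event)

    def __post_init__(self):
        self._running.set()  # not paused initially

    async def interject(self, message: str):
        # Queue a mid-flight correction; the worker picks it up
        # at its next checkpoint without restarting.
        await self._steering.put(message)

    async def pause(self):
        self._running.clear()

    async def resume(self):
        self._running.set()

    async def checkpoint(self) -> list[str]:
        """Called by the worker between steps: block while paused,
        then return any steering messages that arrived."""
        await self._running.wait()
        messages = []
        while not self._steering.empty():
            messages.append(self._steering.get_nowait())
        return messages
```

A worker loop that calls `await handle.checkpoint()` between steps gets pause/resume and interjection for free; nesting falls out of each child operation returning its own handle.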
By default, Unity's open-core quickstart is fully local: the runtime, the LLM client, and the persistence backend (Orchestra, via Docker) all run on your machine. Hosted backend at unify.ai is optional.
Prerequisites:
- Python 3.12+ (the installer will fetch it with `uv` if needed)
- Docker (runs the local Orchestra backend)
- PortAudio for audio support
  - macOS: `brew install portaudio`
  - Ubuntu/Debian: `sudo apt-get install portaudio19-dev python3-dev`
- One LLM provider key – OpenAI or Anthropic are the simplest paths from this README
Install:
```bash
curl -fsSL https://raw.githubusercontent.com/unifyai/unity/main/scripts/install.sh | bash
```

The installer clones unity, unify, unillm, and orchestra as siblings under `~/.unity/`, installs dependencies, creates a `unity` CLI shim in `~/.local/bin/`, boots a local Orchestra in Docker, and writes the local `UNIFY_KEY` / `ORCHESTRA_URL` into `~/.unity/unity/.env`.
Then add one model provider key to `~/.unity/unity/.env`:
```bash
OPENAI_API_KEY=sk-...
# or
ANTHROPIC_API_KEY=...
```

Run the sandbox:

```bash
unity --project_name Sandbox --overwrite
```

At the configuration prompt:
| Option | What it gives you |
|---|---|
| 1 | ConversationManager orchestration without CodeAct – useful if you want to isolate the top-level brain |
| 2 | The full runtime: ConversationManager + CodeAct + simulated managers |
| 3 | Option 2 plus desktop/browser control through `agent-service` |
If you're evaluating Unity as a runtime, start with option 2.
Example session:

```text
> msg Hey, can you help me organize my upcoming week?
> sms I need to reschedule my meeting with Sarah to Thursday
> email Project Update | Here are the Q3 numbers you asked for...
```
Other `unity` subcommands:

- `unity setup` – re-bootstrap local Orchestra
- `unity status` – show local Orchestra status
- `unity stop` – stop local Orchestra
- `unity restart` – stop + start (wipes the local DB)
- `unity help`
The local open-core path is the default. You do not need a Unify account to run Unity locally with Orchestra in Docker.
If you want to install the code without starting a local Orchestra, use:
```bash
curl -fsSL https://raw.githubusercontent.com/unifyai/unity/main/scripts/install.sh | bash -s -- --skip-setup
```

That leaves the code installed but expects you to point Unity at either:

- your own Orchestra deployment via `ORCHESTRA_URL`, or
- Unify's hosted backend via `UNIFY_KEY` + `ORCHESTRA_URL`
Manual install
```bash
git clone https://github.com/unifyai/unity.git ~/.unity/unity
git clone https://github.com/unifyai/unify.git ~/.unity/unify
git clone https://github.com/unifyai/unillm.git ~/.unity/unillm
git clone https://github.com/unifyai/orchestra.git ~/.unity/orchestra

cd ~/.unity/unity
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync

cd ~/.unity/orchestra
poetry install
ORCHESTRA_INACTIVITY_TIMEOUT_SECONDS=0 scripts/local.sh start
# Copy the ORCHESTRA_URL and UNIFY_KEY it prints into ~/.unity/unity/.env
```

The installer copies `.env.example` to `.env`. That file is intentionally minimal for the public quickstart.
- Use `.env.example` if you want the smallest working sandbox config.
- Use `.env.advanced.example` if you want local comms, hosted comms, LiveKit, Tavily, voice integrations, visual caching, or test-infra settings.
For the full sandbox matrix – voice mode, live voice calls, local comms, hosted comms, and GUI mode – see `sandboxes/conversation_manager/README.md`.
Unity is the open core of the Unify platform. This repository contains the agent runtime: the managers, async tool loops, CodeAct actor, dual-brain voice coordination, event backbone, and memory consolidation.
The persistence backend is open-source too: Orchestra runs locally by default in the quickstart. The supporting client libraries Unify and UniLLM are open-source as well.
Not open-sourced is the managed platform layer around the runtime: hosted communication routing, telephony and SIP infrastructure, Microsoft 365 tenant integration, assistant session control plane, billing, and identity. You can point Unity at Unify's hosted backend instead of a local Orchestra, but features that depend on the managed platform layer only work against the hosted service.
Every operation in Unity returns a live handle you can steer. These handles nest: the user steers the ConversationManager, the ConversationManager steers the Actor, the Actor steers the managers. Corrections, pauses, and queries propagate through the full depth.
In practice:
- "Also include Q2 numbers" mid-way through a report → the agent adjusts without restarting
- "Pause that, something urgent" → work freezes and resumes exactly where it left off
- "How's the flight search going?" → you get a status update without disrupting the work
- Three tasks running at once, each independently steerable
`SteerableToolHandle` is the universal return type. Every manager's `ask`, `update`, and `execute` methods return one.
```python
handle = await actor.act("Research flights to Tokyo and draft an itinerary")

# Twenty seconds later, while it's still working:
await handle.interject("Also check train options from Tokyo to Osaka")

# Or if something urgent comes up:
await handle.pause()
# ... deal with the urgent thing ...
await handle.resume()
```

When the Actor calls `primitives.contacts.ask(...)`, the ContactManager starts its own tool loop and returns its own handle – nested inside the Actor's handle, which is nested inside the ConversationManager's. Steering at any level propagates.
```python
contacts = await primitives.contacts.ask(
    "Who was involved in the Henderson project?"
)
for contact in contacts:
    history = await primitives.knowledge.ask(
        f"What was {contact} last working on?"
    )
    await primitives.contacts.update(
        f"Send {contact} a catch-up email referencing {history}"
    )
```

This runs in a sandboxed execution session with the full `primitives.*` API available – the same typed interfaces the rest of the system uses. One program per turn, with variables, loops, and real control flow. Contact lookup → knowledge retrieval → outbound communication becomes one plan, not three separate tool-selection turns.
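The shape of that dispatch can be pictured with a toy facade. This is a sketch under stated assumptions: `ToyManager` and `ToyPrimitives` are invented names, and the canned answers stand in for real manager tool loops.

```python
import asyncio


class ToyManager:
    """Stand-in for a state manager: each call runs as its own
    coroutine (in the real system, an async LLM tool loop)."""
    def __init__(self, name: str, answer: str):
        self.name, self.answer = name, answer

    async def ask(self, question: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real tool loop would
        return f"[{self.name}] {self.answer}"


class ToyPrimitives:
    """The typed facade the generated plan programs against."""
    def __init__(self):
        self.contacts = ToyManager("contacts", "Sarah, Dev")
        self.knowledge = ToyManager("knowledge", "Henderson rollout")


async def plan(primitives: ToyPrimitives) -> list[str]:
    # One generated program per turn: variables, sequencing, control flow.
    people = await primitives.contacts.ask("Who is on the project?")
    facts = await primitives.knowledge.ask("What are they working on?")
    return [people, facts]
```

The point of the facade is that the plan composes results across managers in ordinary Python instead of round-tripping one JSON tool call per step.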
Slow brain – the ConversationManager. Sees the full picture: all conversations, notifications, in-flight actions. Makes deliberate decisions. Runs in the main process.

Fast brain – a real-time voice agent on LiveKit, running as a separate subprocess. Sub-second latency. Handles the conversation autonomously.
They talk over IPC. When the slow brain wants to guide the conversation, it sends:
- SPEAK – "say exactly this" (bypasses the fast brain's LLM entirely)
- NOTIFY – "here's some context, decide what to do with it"
- BLOCK – nothing; the fast brain keeps going on its own
A speech urgency evaluator can preempt the slow brain when the user says something that needs immediate attention.
Every 50 messages, the MemoryManager runs a background extraction pass. It pulls out:
- Contact profiles – who people are, their roles, relationships
- Per-contact summaries – what you've been discussing, sentiment, themes
- Response policies – how each person prefers to communicate
- Domain knowledge – project details, preferences, long-term facts
- Tasks – things you committed to, deadlines, follow-ups
Structured, queryable state in typed tables rather than freeform transcript summaries.
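The every-50-messages cadence reduces to a counter on the message stream. A sketch, assuming invented names (`ToyMemoryManager`, toy table keys) and a placeholder where the real LLM extraction pass would run:

```python
CONSOLIDATION_INTERVAL = 50


class ToyMemoryManager:
    """Observes messages and fires an extraction pass on every
    50th one, distilling the window into typed tables."""
    def __init__(self):
        self.seen = 0
        self.window: list[str] = []
        self.tables: dict[str, list[str]] = {
            "contacts": [], "knowledge": [], "tasks": []
        }

    def observe(self, message: str) -> bool:
        """Returns True when this message triggered consolidation."""
        self.seen += 1
        self.window.append(message)
        if self.seen % CONSOLIDATION_INTERVAL == 0:
            self._consolidate()
            return True
        return False

    def _consolidate(self):
        # Real system: an LLM extraction pass over self.window that
        # writes contact profiles, knowledge, and tasks as typed rows.
        self.tables["knowledge"].append(
            f"distilled {len(self.window)} messages"
        )
        self.window.clear()
```

Because consolidation writes rows rather than one rolling summary, later queries hit typed tables instead of re-reading transcripts.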
```text
┌─ In-Flight Actions ─────────────────────────────────┐
│                                                     │
│  [0] research_flights ───────────── In progress     │
│      └ ask, interject, stop, pause                  │
│                                                     │
│  [1] draft_summary ──────────────── In progress     │
│      └ ask, interject, stop, pause                  │
│                                                     │
│  [2] find_restaurants ───────────── Starting        │
│      └ ask, interject, stop, pause                  │
│                                                     │
└─────────────────────────────────────────────────────┘
```
Each action gets its own dynamically-generated steering tools. You can inspect, interject into, pause, resume, or stop one action without affecting the others.
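Per-action isolation comes down to keeping one handle per task. A minimal asyncio sketch (names invented; a real action would be a tool loop, not a step counter):

```python
import asyncio


class ToyAction:
    """One in-flight action with its own pause switch."""
    def __init__(self, name: str):
        self.name = name
        self.running = asyncio.Event()
        self.running.set()
        self.steps_done = 0

    async def run(self, steps: int):
        for _ in range(steps):
            await self.running.wait()  # pausing this action stalls only it
            self.steps_done += 1
            await asyncio.sleep(0)


async def demo() -> tuple[int, int]:
    a = ToyAction("research_flights")
    b = ToyAction("draft_summary")
    task_a = asyncio.create_task(a.run(5))
    task_b = asyncio.create_task(b.run(5))

    a.running.clear()       # pause action [0] without touching [1]
    await task_b            # [1] finishes while [0] is frozen
    frozen_steps = a.steps_done

    a.running.set()         # resume [0]; it picks up where it left off
    await task_a
    return frozen_steps, a.steps_done
```

Because each action owns its own event and queue-like surface, steering one never blocks or reorders the others.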
```text
ConversationManager (dual-brain orchestration, event-driven scheduling)
  │
  │   Slow Brain ◄── IPC ──► Fast Brain (real-time voice, LiveKit)
  │
  ▼
CodeActActor (generates Python plans, calls primitives.* APIs)
  │
  ▼
State Managers (each runs its own async LLM tool loop)
  │
  ├── ContactManager    – people and relationships
  ├── KnowledgeManager  – domain facts, structured knowledge
  ├── TaskScheduler     – durable tasks, execution with live handles
  ├── TranscriptManager – conversation history and search
  ├── GuidanceManager   – procedures, SOPs, how-to knowledge
  ├── FileManager       – file parsing and registry
  ├── ImageManager      – image storage, vision queries
  ├── FunctionManager   – user-defined functions, primitives registry
  ├── WebSearcher       – web research orchestration
  ├── SecretManager     – encrypted secret storage
  ├── BlacklistManager  – blocked contact details
  ├── DataManager       – low-level data operations
  │
  ├── EventBus          – typed pub/sub backbone (Pydantic events)
  └── MemoryManager     – offline consolidation every 50 messages
```
- User message arrives. The slow brain renders a full state snapshot and makes a single-shot tool decision.
- It starts an action via `actor.act(...)` and gets back a `SteerableToolHandle`, registered in `in_flight_actions`.
- The Actor generates a Python plan calling typed primitives. Each primitive dispatches to a manager running its own LLM tool loop, returning its own steerable handle.
- Meanwhile, the slow brain can start more work, steer existing work, or guide the fast brain during voice calls.
- The MemoryManager observes message events and periodically distills conversations into structured knowledge.
- The EventBus carries typed events with hierarchy labels aligned to tool-loop lineage, making everything observable.
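A typed pub/sub backbone of that shape fits in a few lines. This sketch uses plain dataclasses instead of Pydantic to stay self-contained, and the event type and hierarchy label are invented examples:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class MessageReceived:
    hierarchy: str   # e.g. "conversation_manager/actor/contacts"
    text: str


class ToyEventBus:
    """Subscribers register per event type; publish fans out
    synchronously to every handler for that type."""
    def __init__(self):
        self._subs: dict[type, list[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._subs[event_type].append(handler)

    def publish(self, event) -> None:
        for handler in self._subs[type(event)]:
            handler(event)
```

Keying subscriptions on the event's type keeps the bus typed end to end, and a hierarchy label on each event lets observers reconstruct which nested tool loop emitted it.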
| Repo | Role |
|---|---|
| unity (this) | The agent runtime – managers, tool loops, CodeAct, voice, orchestration |
| orchestra | Persistence backend – FastAPI + Postgres + pgvector. Installer spins it up locally in Docker |
| unify | Python SDK – the client Unity uses to talk to Orchestra |
| unillm | LLM access layer – OpenAI, Anthropic, or any compatible endpoint |
All MIT-licensed. The managed product layer (communication routing, telephony, the assistant session control plane, the web dashboard, billing, identity) runs on Unify's platform and is not part of this open core. You can point Unity at Unify's hosted Orchestra instead of a local one, but managed-service features only work against the hosted backend.
Tests exercise the real system (steerable handles, CodeAct, manager composition, nested tool loops) against simulated backends with cached LLM responses:
```bash
uv sync --all-groups
source .venv/bin/activate
tests/parallel_run.sh tests/                  # everything
tests/parallel_run.sh tests/actor/            # one module
tests/parallel_run.sh tests/contact_manager/  # another
```

See `tests/README.md` for the full philosophy – responses are cached, not mocked.
| File | What's there |
|---|---|
| `unity/common/async_tool_loop.py` | `SteerableToolHandle` – the protocol everything returns |
| `unity/common/_async_tool/loop.py` | The async tool loop engine – nesting, steering, context propagation |
| `unity/actor/code_act_actor.py` | CodeAct – plan generation, sandbox, primitives |
| `unity/conversation_manager/conversation_manager.py` | Dual-brain orchestration, debouncing, in-flight actions |
| `unity/conversation_manager/domains/brain_action_tools.py` | How the brain starts, steers, and tracks concurrent work |
| `unity/function_manager/primitives/registry.py` | How primitives are assembled into the typed API surface |
| `unity/events/event_bus.py` | Typed event backbone |
| `unity/memory_manager/memory_manager.py` | Offline consolidation pipeline |
```text
unity/
├── unity/
│   ├── actor/                   # CodeActActor
│   ├── conversation_manager/    # Dual-brain orchestration
│   │   └── domains/             # Brain tools, action tracking, rendering
│   ├── common/
│   │   ├── async_tool_loop.py   # SteerableToolHandle
│   │   └── _async_tool/         # Tool loop internals
│   ├── contact_manager/
│   ├── knowledge_manager/
│   ├── task_scheduler/
│   ├── transcript_manager/
│   ├── guidance_manager/
│   ├── memory_manager/
│   ├── function_manager/
│   ├── file_manager/
│   ├── image_manager/
│   ├── web_searcher/
│   ├── secret_manager/
│   ├── events/
│   └── manager_registry.py
├── sandboxes/                   # Interactive playgrounds
│   └── conversation_manager/    # Full ConversationManager sandbox (start here)
├── tests/
├── agent-service/               # Node.js desktop/browser automation
└── deploy/                      # Dockerfile, Cloud Build, virtual desktop
```
No regex or substring matching for routing user intent. Everything goes through LLM reasoning, guided by prompts and tool docstrings. If the system handles something wrong, we fix the prompt, not add a hardcoded rule.
No mocked LLMs in tests. Every test uses real inference, cached for speed. Delete the cache and you're re-evaluating against live models.
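"Cached, not mocked" can be implemented as a thin keyed cache in front of the real completion call. A sketch under stated assumptions: `CachedLLM` and the `complete(prompt, **params)` signature are invented here, not the repo's actual caching layer.

```python
import hashlib
import json
from pathlib import Path


class CachedLLM:
    """Wraps a real completion function; replays cached responses
    keyed by a hash of the full request, calls through on a miss."""
    def __init__(self, complete, cache_dir: str = ".llm_cache"):
        self.complete = complete
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def __call__(self, prompt: str, **params) -> str:
        key = hashlib.sha256(
            json.dumps({"prompt": prompt, **params}, sort_keys=True).encode()
        ).hexdigest()
        path = self.dir / f"{key}.txt"
        if path.exists():
            return path.read_text()        # cache hit: no live call
        response = self.complete(prompt, **params)
        path.write_text(response)          # first run stores real inference
        return response
```

First runs hit live models and populate the cache; subsequent runs are fast and deterministic, and deleting the cache directory re-evaluates everything against live models.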
No defensive coding. No try/except around things that shouldn't fail. No null checks for things that shouldn't be null. The system fails loud when assumptions break.
English as an API. Managers communicate through natural-language interfaces. The Actor orchestrates through English-language primitives. The whole system stays inspectable without reading implementation code.
MIT β see LICENSE.
Built by the team at Unify.
