Mesh V2 global
Source: docs/MESH_V2_GLOBAL.md
Product goal: allow agents (and Hive nodes) to communicate beyond a single tenant / a single OS, with discovery, trust, and transport suited for WAN — without abandoning the local v1 (Redis + A2A on a single instance).
v1 is the copper ring: same instance, same Redis, shared protocol and observability. v2 is the Internet of agents: federation, global addressability, and delivery semantics for cases where pub/sub is no longer sufficient.
1. Principles
- Open but verifiable: not every peer is trusted; identity and policy come before sensitive traffic (cf. Layer 5 in TECH_VISION.md).
- Interop: A2A / Agent Card as lingua franca when possible; bridges to other stacks if needed.
- Degradation: an isolated node remains usable (local v1 mode); the WAN mesh is additive.
- No network magic: NAT, firewalls, and data jurisdiction remain the operator's responsibility; this document describes patterns, not a promise of illegal circumvention.
2. Technical pillars (logical order)
3. Proposed phases (deliverables)
Phase V2.0 — "Controlled" federation
- Two (or N) Hive instances mutually approved (admin pairing + shared secret or PKI).
- Outbound: A2A proxy to the other domain (fixed URL, mTLS).
- Inbound: dedicated ingress POST /api/a2a/federated/jsonrpc (alias of the A2A JSON-RPC handler, same federation auth) + optional IP allowlist (MESH_FEDERATION_INBOUND_ALLOWLIST).
- Deliverable: runbook + config; not yet "anyone on Earth".
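The optional inbound allowlist gate can be sketched as follows. The sketch assumes MESH_FEDERATION_INBOUND_ALLOWLIST holds a comma-separated list of literal IP addresses compared by exact string match. It does not cover CIDR ranges, IPv6 normalization, or how the client IP is derived behind a reverse proxy. The function names are hypothetical, not Hive's actual code.

```typescript
// Hypothetical helpers. Assumption: the allowlist is a comma-separated list of
// literal IPs matched exactly (no CIDR ranges, no address normalization).

/** Parses the allowlist; returns null when unset or empty, meaning the gate is inactive. */
function parseInboundAllowlist(raw: string | undefined): Set<string> | null {
  if (raw === undefined) return null;
  const ips = raw
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
  return ips.length > 0 ? new Set(ips) : null;
}

/**
 * Intended to run before any JWT or secret verification: when the allowlist is
 * active, requests from unlisted IPs never reach the federation auth code.
 */
function isFederationSourceAllowed(allowlist: Set<string> | null, clientIp: string): boolean {
  return allowlist === null || allowlist.has(clientIp);
}
```

Checking the source IP before verification keeps unlisted callers from consuming JWT verification or Redis work.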
V2.0.1 — Implemented in code (config + visibility)
- Variables: MESH_FEDERATION_ENABLED, MESH_FEDERATION_PEERS (comma-separated list of HTTPS origins).
- GET /api/mesh/federation/status (viewer+): JSON { meshV2, federation } without secrets.
- GET /api/a2a/status now includes federation (same payload as above) for the Settings → A2A / mesh dashboard.
- Production warning if federation is enabled without any valid parsed peer (empty list or only invalid URLs).
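Below is a minimal sketch of how MESH_FEDERATION_PEERS could be parsed into HTTPS origins and how the production-warning condition could be evaluated. The function names, the returned shape, and the rule that only the URL origin is kept are assumptions, not Hive's actual parser. A caller would emit the warning only in production.

```typescript
// Hypothetical parser for MESH_FEDERATION_PEERS; names and rules are illustrative.
interface ParsedPeers {
  origins: string[]; // "https://host[:port]" origins, in configured order
  invalid: string[]; // entries that are not parseable HTTPS URLs
}

function parseFederationPeers(raw: string | undefined): ParsedPeers {
  const origins: string[] = [];
  const invalid: string[] = [];
  for (const entry of (raw ?? "").split(",")) {
    const value = entry.trim();
    if (value === "") continue; // tolerate trailing commas and blanks
    try {
      const url = new URL(value);
      if (url.protocol !== "https:") {
        invalid.push(value);
        continue;
      }
      // Keep only the origin: path, query and default port are dropped.
      origins.push(url.origin);
    } catch {
      invalid.push(value); // new URL() throws on unparseable input
    }
  }
  return { origins, invalid };
}

/** Returns a warning when federation is enabled but no valid peer was parsed. */
function federationConfigWarning(enabled: boolean, peers: ParsedPeers): string | null {
  if (!enabled || peers.origins.length > 0) return null;
  return peers.invalid.length > 0
    ? "federation enabled but every MESH_FEDERATION_PEERS entry is invalid"
    : "federation enabled but MESH_FEDERATION_PEERS is empty";
}
```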
V2.0.2 — Operator probe (reachability)
- GET /api/mesh/federation/status?probe=1 (operator+): the server performs a GET to /.well-known/agent-card.json on each origin in the effective list (static peers merged with the signed roster manifest, see V2.2.1). Clients cannot supply arbitrary paths, so the endpoint is not an open SSRF vector. Response: { meshV2, federation, probe: [...] } with the HTTP status, latency, and any error per peer. If federation is disabled or has no valid peers, probe is an empty array.
- GET /api/mesh/federation/status?debug_manifest=1 (operator+): adds manifestDebug (manifestLastError, effectivePeerCount). manifestLastError is a stable token (snake_case or http_NNN), or unknown if the value is not recognized (never a raw exception message). It can be combined with ?probe=1. When probe or debug output is requested, the peer resolution is shared with the federation body, so resolveFederationPeers is not called a second time in the same request.
- UI: Settings → A2A / mesh, when federation is enabled: a Probe peers button (same ?probe=1 endpoint, session cookie).
- Tests: app/src/lib/mesh-federation-probe.test.ts (fetch mock); app/src/app/api/mesh/federation/status/route.test.ts (auth + probe mock).
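The probe loop can be sketched as follows. The fetch function is injected so the sketch runs without network access. The error tokens timeout, fetch_failed and http_NNN are hypothetical, chosen to match the "stable token, never a raw exception message" rule. Only the origin comes from the peer list, and the path is a fixed constant, which is what keeps the endpoint from being an open SSRF. Names are illustrative, not Hive's API.

```typescript
// Hypothetical probe sketch; fetchImpl stands in for the global fetch.
type FetchLike = (url: string, init?: { signal?: AbortSignal }) => Promise<{ status: number }>;

interface ProbeResult {
  origin: string;
  url: string;
  status: number | null; // HTTP status, or null when the request failed
  latencyMs: number;
  error: string | null; // stable token, never a raw exception message
}

// Fixed server-side path: callers pick peers (origins), never paths.
const AGENT_CARD_PATH = "/.well-known/agent-card.json";

async function probePeers(origins: string[], fetchImpl: FetchLike, timeoutMs = 5000): Promise<ProbeResult[]> {
  return Promise.all(
    origins.map(async (origin): Promise<ProbeResult> => {
      const url = new URL(AGENT_CARD_PATH, origin).toString();
      const started = Date.now();
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), timeoutMs);
      try {
        const res = await fetchImpl(url, { signal: controller.signal });
        const ok = res.status >= 200 && res.status < 300;
        return { origin, url, status: res.status, latencyMs: Date.now() - started, error: ok ? null : `http_${res.status}` };
      } catch (err) {
        const aborted = typeof err === "object" && err !== null && (err as { name?: unknown }).name === "AbortError";
        return { origin, url, status: null, latencyMs: Date.now() - started, error: aborted ? "timeout" : "fetch_failed" };
      } finally {
        clearTimeout(timer); // do not keep the process alive after completion
      }
    }),
  );
}
```

Probing all peers concurrently keeps the operator request bounded by the slowest peer (capped by timeoutMs) rather than the sum of latencies.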
V2.1 — Federated JSON-RPC transport (short-lived JWT + optional secret)
- Variable MESH_FEDERATION_SHARED_SECRET (≥ 32 characters, same value on each paired node): serves as the HS256 signing key for the JWT and, if the legacy header is enabled, for hash + timing-safe comparison of that header.
- Inbound: POST /api/a2a/jsonrpc with exactly one of the headers X-Hive-Federation-JWT (preferred; lifetime MESH_FEDERATION_JWT_TTL_SECONDS) or X-Hive-Federation-Secret (legacy). Both at once → 400. exp/iat are verified with a configurable clock-skew tolerance (MESH_FEDERATION_JWT_CLOCK_SKEW_LEEWAY_SECONDS, default 60 s, 0 = strict). Each JWT minted by Hive includes a unique jti; the receiver consumes that jti once in Redis (TTL until exp), so replaying the same token → 401. If Redis is unavailable during this step → 503. MESH_FEDERATION_JWT_REQUIRE_JTI (default true) requires a jti; MESH_FEDERATION_JWT_REQUIRE_AUDIENCE (default true) requires an aud claim equal to the public origin of this instance; MESH_FEDERATION_INBOUND_ALLOW_LEGACY_SECRET (default true): if false, only JWT is accepted (secret alone → 403). MESH_FEDERATION_PROXY_SEND_SECRET (default false): the proxy sends the legacy secret on the wire only if explicitly enabled. Authenticated calls run as A2A identity hive-federated (equivalent to operator for the JSON-RPC RBAC ceiling). Audit: after the handler runs, Postgres action mesh.federation.inbound_jsonrpc (method, HTTP status, JWT/legacy mode, iss when useful, correlation).
- Outbound (operator proxy): POST /api/mesh/federation/proxy/jsonrpc (operator+, session or API token), JSON body { "peerIndex": 0, "rpc": { ... } }, where peerIndex follows the order of the parsed origins in MESH_FEDERATION_PEERS (with a signed manifest, the merged list with static peers first; see V2.2.1). The server relays to https://<peer>/api/a2a/jsonrpc with a fresh JWT whose aud claim is the target peer's origin, so the token cannot be replayed on another instance. If MESH_FEDERATION_PROXY_SEND_SECRET=true, it also sends X-Hive-Federation-Secret for backward compatibility. The response (including SSE) is forwarded to the client. Audit: action mesh.federation.proxy_jsonrpc.
- Status: GET /api/mesh/federation/status exposes meshV2: "2.10.0". The federation block includes wanMesh (public descriptor, peer ceiling, manifest), phase, sharedSecretConfigured, and jsonRpcProxy (paths, headers, JWT TTL, audience, jwtAlg HS256/Ed25519, the local Ed25519 public key if applicable, jwtRequireAudience, jwtRequireJti, proxyOperatorTokenRequired, the proxy secret flag, inboundAllowLegacySecret). The raw secret is never exposed.
- Ed25519 (optional): MESH_FEDERATION_JWT_ALG=Ed25519. JWTs are signed with a local seed (MESH_FEDERATION_ED25519_SEED_HEX, 64 hex characters); each peer registers the others' public keys in MESH_FEDERATION_PEER_ED25519_PUBLIC_KEYS (same order as MESH_FEDERATION_PEERS). The iss claim is the issuer's public origin, and the signature is verified with that peer's key. mTLS between instances remains recommended at the reverse proxy / load balancer level (outside the scope of the application-level fetch).
- Rate limit: a dedicated Redis window (MESH_FEDERATION_RATE_LIMIT_MAX, MESH_FEDERATION_RATE_LIMIT_WINDOW_MS, default 100 requests / 60 s), keyed in:<IP> for inbound JSON-RPC (federation header) and proxy:<operatorId> for POST /api/mesh/federation/proxy/jsonrpc, independent of the generic A2A rate limit.
- Directory (no HTTP fetch from the endpoint): GET /api/mesh/federation/directory (viewer+) lists peers[] with peerIndex, origin, hostname, agentCardUrl for the effective static + manifest list (same order as the proxy). Also exposed as federation.directoryPath in GET /api/a2a/status. Runbook: MESH_FEDERATION_RUNBOOK.md.
- Inbound IP allowlist: MESH_FEDERATION_INBOUND_ALLOWLIST (optional). If set, only listed IPs can use inbound federated auth (JWT or secret); the check runs before verification. Public status: federation.federationInboundAllowlistActive.
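The HS256 path of the rules above can be sketched with node:crypto alone.

- The proxy side mints a token whose aud is the target origin and whose jti is fresh.
- The receiving side pins the algorithm, checks the signature in constant time, applies the clock-skew leeway to exp/iat, checks aud, and consumes the jti once.

The function names and error codes are hypothetical. The Redis jti store is replaced by an injected callback; in Redis this would typically be a SET with NX and EX. Several parts are not shown: the Ed25519 variant, header selection, the legacy secret, and the REQUIRE_* toggles (both checks are always enforced here).

```typescript
import { createHmac, randomUUID, timingSafeEqual } from "node:crypto";

// Hypothetical sketch of the V2.1 federation JWT (HS256 variant).
interface FederationClaims {
  iss: string; // public origin of the issuing instance
  aud: string; // public origin of the target peer
  iat: number; // issued-at, seconds since epoch
  exp: number; // expiry, seconds since epoch
  jti: string; // unique token id, consumed once by the receiver
}

/** Returns true only the first time a jti is seen (stand-in for the Redis store). */
type ConsumeJti = (jti: string, ttlSeconds: number) => boolean;

const b64url = (buf: Buffer): string => buf.toString("base64url");
const hs256 = (data: string, secret: string): Buffer => createHmac("sha256", secret).update(data).digest();
const nowSeconds = (): number => Math.floor(Date.now() / 1000);

function mintFederationJwt(secret: string, iss: string, aud: string, ttlSeconds: number, now: number = nowSeconds()): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const claims: FederationClaims = { iss, aud, iat: now, exp: now + ttlSeconds, jti: randomUUID() };
  const payload = b64url(Buffer.from(JSON.stringify(claims)));
  return `${header}.${payload}.${b64url(hs256(`${header}.${payload}`, secret))}`;
}

function verifyFederationJwt(
  token: string,
  secret: string,
  expectedAud: string, // public origin of this instance
  consumeJti: ConsumeJti,
  leewaySeconds = 60, // clock-skew tolerance, 0 = strict
  now: number = nowSeconds(),
): FederationClaims {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("malformed");
  const [header, payload, signature] = parts;
  const decode = (part: string) => JSON.parse(Buffer.from(part, "base64url").toString("utf8"));
  // Pin the algorithm before anything else: no negotiation via the header.
  if (decode(header).alg !== "HS256") throw new Error("bad_alg");
  const expected = hs256(`${header}.${payload}`, secret);
  const given = Buffer.from(signature, "base64url");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) throw new Error("bad_signature");
  const claims: FederationClaims = decode(payload);
  if (now > claims.exp + leewaySeconds) throw new Error("expired");
  if (claims.iat > now + leewaySeconds) throw new Error("issued_in_future");
  if (claims.aud !== expectedAud) throw new Error("bad_audience");
  if (typeof claims.jti !== "string" || claims.jti === "") throw new Error("missing_jti");
  // Remember the jti for as long as the token could still be accepted.
  if (!consumeJti(claims.jti, claims.exp + leewaySeconds - now + 1)) throw new Error("replay");
  return claims;
}
```

A caller would map replay to 401 and a failing jti store to 503, as described above. Other verification errors would presumably also map to 401. The jti is consumed last, so a token rejected for another reason does not burn its jti.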
Phase V2.2 — Optional public directory
- Publishing Agent Cards (or pointers) in a registry (self-hosted Hive or trusted third party).
- Schema: tenant id, capabilities, endpoints, Noise/signing public keys.
- Deliverable: OpenAPI / JSON schema spec + minimal reference server.
V2.2.1 — Implemented (discovery + signed roster)
- GET /.well-known/hive-mesh.json (public, no auth): a hive-mesh-descriptor-v1 descriptor with the instance origin, the Agent Card / A2A JSON-RPC URLs, and a federation + wanMesh block (peer ceiling, configured manifest, static/manifest counters, last manifest issue if fetch/verify fails, as a category rather than a raw error string). Short Cache-Control; Access-Control-Allow-Origin: * for indexing / tooling.
- MESH_FEDERATION_MAX_PEERS (default 512, max 8192): ceiling applied after static + manifest deduplication.
- Signed manifest: MESH_FEDERATION_PEERS_MANIFEST_URL + MESH_FEDERATION_PEERS_MANIFEST_PUBLIC_KEY_HEX (Ed25519, 64 hex). JSON body { "payload": { "v": 1, "peers": [...] }, "sigHex": "128 hex" }, where the Ed25519 signature covers stableStringify(payload) (same canonicalization as mesh-envelope). Static peers remain first. In MESH_FEDERATION_JWT_ALG=Ed25519 mode, manifest entries without a valid Ed25519 key are ignored. Security: in production (NODE_ENV ≠ development), only https: is accepted for the manifest URL, and the HTTP response is capped at 2 MiB (checked on the stream and on Content-Length) to prevent OOM.
- Resolution: resolveFederationPeers caches the merged roster in Redis (hive:cache:federation_peers:v1:*) with TTL = MESH_FEDERATION_PEERS_MANIFEST_REFRESH_SECONDS (seconds, clamped to 30–86400), plus an in-memory L1 cache per worker. The logical key includes the manifest URL, the signing public key, and the static parameters. If Redis is unavailable (read or write), it falls back to L1 + fetch. Manifest HTTP fetch: a distributed lock (SET … NX under hive:lock: + the same suffix as the cache key) gives single-flight across workers; the other workers wait for the cache to be populated and fall back to their own fetch if the wait exceeds the timeout. buildMeshFederationPublicAsync reuses the same snapshot for wanMesh and the transport-ready boolean (no second resolve in this path). Inbound JSON-RPC in Ed25519 mode: a single resolve is shared between the "verification ready" gate and JWT verification. Directory, probe, proxy: merged list. Structured logs: module mesh.federation.resolve (manifest success/failure, Redis errors; no secrets).
- Public status: federation.wanMesh exposes manifestLastSyncOk and manifestIssueCategory (safe values: fetch / verify / size / protocol / unknown). No raw error string appears on /.well-known/hive-mesh.json.
- meshV2 is exposed as 2.10.0 on the federation status / directory routes and on GET /.well-known/hive-mesh.json (public mesh bootstrap + identity / reputation / public API keys + per-key scopes + opt-in reputation blocking + public tier OTel metrics; see V2.3).
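Manifest verification and the static-first merge can be sketched with node:crypto. The sketch rests on several assumptions:

- Each peers entry is an object with an origin and an optional ed25519PublicKeyHex. This shape is hypothetical; the real payload schema is not reproduced here.
- stableStringify below (recursively sorted object keys, no whitespace) is only a stand-in for the mesh-envelope canonicalization, whose exact rules may differ.
- The 2 MiB download cap and the Redis caching are not shown.

The raw 64-hex key is wrapped in the standard Ed25519 SubjectPublicKeyInfo DER prefix so createPublicKey can load it.

```typescript
import { createPublicKey, verify } from "node:crypto";

// Hypothetical entry shape; the real manifest schema may differ.
interface ManifestPeer { origin: string; ed25519PublicKeyHex?: string }
interface SignedManifest { payload: { v: 1; peers: ManifestPeer[] }; sigHex: string }

/** Stand-in canonicalization: object keys sorted recursively, no whitespace. */
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
}

// DER prefix of an Ed25519 SubjectPublicKeyInfo; the raw 32-byte key follows it.
const ED25519_SPKI_PREFIX = Buffer.from("302a300506032b6570032100", "hex");
const HEX64 = /^[0-9a-f]{64}$/i;

function verifyManifest(manifest: SignedManifest, publicKeyHex: string): boolean {
  if (!HEX64.test(publicKeyHex) || !/^[0-9a-f]{128}$/i.test(manifest.sigHex)) return false;
  try {
    const key = createPublicKey({
      key: Buffer.concat([ED25519_SPKI_PREFIX, Buffer.from(publicKeyHex, "hex")]),
      format: "der",
      type: "spki",
    });
    const data = Buffer.from(stableStringify(manifest.payload), "utf8");
    return verify(null, data, key, Buffer.from(manifest.sigHex, "hex")); // null algorithm for Ed25519
  } catch {
    return false; // unloadable key: treat as a verification failure
  }
}

/** Static peers first, then manifest peers; deduplicated, capped at maxPeers. */
function mergePeers(staticOrigins: string[], manifestPeers: ManifestPeer[], maxPeers: number, requireEd25519Key: boolean): string[] {
  const merged: string[] = [];
  const seen = new Set<string>();
  const add = (origin: string): void => {
    if (!seen.has(origin) && merged.length < maxPeers) {
      seen.add(origin);
      merged.push(origin);
    }
  };
  staticOrigins.forEach((origin) => add(origin));
  for (const peer of manifestPeers) {
    // In Ed25519 JWT mode, entries without a usable key are skipped.
    if (requireEd25519Key && !HEX64.test(peer.ed25519PublicKeyHex ?? "")) continue;
    add(peer.origin);
  }
  return merged;
}
```

Placing static peers first keeps their peerIndex stable regardless of manifest content.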
Phase V2.3 — Open mesh (Internet)
- DHT-style discovery / public relays (libp2p model or equivalent): outside the current application scope. Delivered alternative: MESH_PUBLIC_MESH_BOOTSTRAP_URLS (third-party hive-mesh.json URLs published in GET /.well-known/hive-mesh.json → publicMesh.bootstrapMeshDescriptorUrls).
- Bootstrapped (MVP transport): JSON-RPC without auth for explicitly allowlisted methods (A2A_PUBLIC_JSONRPC_*, off by default), Redis rate limit hive:rl:public_a2a per IP, identity hive-public-a2a; see MESH_PUBLIC_A2A.md and GET /api/a2a/status → publicJsonRpc / publicMesh. Alias POST /api/a2a/jsonrpc/public (same handler); /.well-known/hive-mesh.json exposes publicJsonRpcUrl when public mode is active.
- Per-identity rate limit: a configurable header, hashed with SHA-256 → hive:rl:public_a2a_id (in addition to the IP bucket).
- Public API keys: A2A_PUBLIC_JSONRPC_API_KEYS + bucket hive:rl:public_a2a_apikey; option A2A_PUBLIC_JSONRPC_REQUIRE_API_KEY (see MESH_PUBLIC_A2A.md).
- Reputation (opt-in): Redis counters hive:mesh:pub_rep:* per API key hash or identity; optional threshold-based blocking on rate_limited + rpc_error (HTTP 429); see MESH_PUBLIC_A2A.md.
- Immediate follow-up (doc / ops): advanced reputation (beyond Redis thresholds) and richer threat handling (MESH_PUBLIC_A2A.md).
- Planetary follow-up: see §3.1, Planetary mesh trajectory (DHT/relay, global directory, WAN gateway).
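The layered public buckets can be sketched as a fixed-window counter. The bucket prefixes hive:rl:public_a2a, hive:rl:public_a2a_id and hive:rl:public_a2a_apikey come from the text above. Several other choices are assumptions:

- The key suffix layout.
- Hashing the API key as well as the identity.
- Fixed-window semantics.
- The in-memory Map, which stands in for Redis INCR + EXPIRE.

The same window shape could plausibly serve the federation buckets (in:<IP>, proxy:<operatorId>, default 100 / 60 s), but that is not confirmed here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch; only the bucket prefixes come from the documentation.
const sha256Hex = (value: string): string => createHash("sha256").update(value).digest("hex");

/** Bucket keys for one public JSON-RPC call; identity and API key are hashed before use. */
function publicRateLimitKeys(ip: string, identity?: string, apiKey?: string): string[] {
  const keys = [`hive:rl:public_a2a:${ip}`];
  if (identity) keys.push(`hive:rl:public_a2a_id:${sha256Hex(identity)}`);
  if (apiKey) keys.push(`hive:rl:public_a2a_apikey:${sha256Hex(apiKey)}`);
  return keys;
}

/** In-memory fixed window standing in for Redis INCR + EXPIRE. */
class FixedWindowLimiter {
  private readonly counters = new Map<string, { count: number; resetAt: number }>();
  private readonly max: number;
  private readonly windowMs: number;

  constructor(max: number, windowMs: number) {
    this.max = max;
    this.windowMs = windowMs;
  }

  /** Increments every bucket; the request passes only if no bucket exceeds max. */
  hit(keys: string[], now: number = Date.now()): boolean {
    let allowed = true;
    for (const key of keys) {
      let entry = this.counters.get(key);
      if (!entry || now >= entry.resetAt) {
        entry = { count: 0, resetAt: now + this.windowMs }; // start a new window
        this.counters.set(key, entry);
      }
      entry.count += 1;
      if (entry.count > this.max) allowed = false;
    }
    return allowed;
  }
}
```

Counting every bucket on each call means a caller who rotates identities still hits the per-IP bucket, and a caller who rotates IPs with one API key still hits the key bucket.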
§3.1 — Planetary mesh trajectory (target: "Internet of agents")
Goal: beyond static bootstrap (MESH_PUBLIC_MESH_BOOTSTRAP_URLS, signed manifests, paired federation), enable discovery and transport that do not depend on a closed list of URLs manually maintained by each operator, while remaining verifiable and revocable.
This is not a single iteration: it spans multiple teams and quarters and involves infrastructure choices (self-hosted vs public network).
Recommended milestones (logical order)
References in this repository (stubs): P4 "federation pull" registry (MESH_PLANETARY_P4_FEDERATED_SYNC.md), P5 proof hook (MESH_PLANETARY_P5_TRUST_PROOF.md), P6 W3C propagation on the bridge → ingress path (MESH_PLANETARY_P6_WAN_TRACE.md); see MESH_PLANETARY_TRACE.md.
Risks to explicitly accept
- Abuse: an open mesh attracts spam and enumeration — reputation, quotas, and relay cost must be budgeted from P2–P3.
- Jurisdiction: a "planetary" relay may transit data outside the region; the operator remains responsible (cf. principle §1 "no network magic").
- Governance: who signs the global manifests / registries? Without governance, the DHT quickly becomes unreliable for trust and remains useful only for reachability.
Additional success criteria (planetary)
- An agent can resolve a stable handle (e.g., DID or agent:// + registry) to an Agent Card without knowing the instance URL in advance (outside a local static list).
- Two nodes without mutual public IPs can establish an application channel (relay + auth) with documented latency and cost.
- Revocation of a peer or a certificate propagates within the chosen trust model (TTL, CRL, registry, or rotating key).
Phase V2.4 — UX & Pencil
- Implemented (partial): the Settings → Federation tab shows link status (phase, effective peers, secret, allowlist), a WAN / manifest card (static + manifest counters, sync, /.well-known/hive-mesh.json link), the directory (GET /api/mesh/federation/directory) with Agent Card links, a transport summary, and probe + manifest diagnostics (operator). The A2A / mesh tab links to Federation for details; Settings → A2A also displays public identity / API keys / reputation / bootstrap mesh (V2.3) when configured.
- Full visual alignment with pencil-new.pen (v1 deviation accepted in MESH_V1_DONE) remains a separate design effort.
4. Relationship with the current code
- To reuse: @hive/a2a-sdk, mesh-events / mesh-envelope, mesh.a2a.* logs, MESH_BUS_HMAC_SECRET, HTTP correlation.
- To add: a gateway process (or sidecar service) for WAN traffic; this rarely belongs in the Next.js process alone.
- Not to break: v1-only deployments (one VM, one Redis) must continue without V2 config.
5. V2 success criteria (draft)
- Two teams on two continents can establish a verified A2A channel without sharing the same Postgres database.
- An agent can discover (via directory) the card of a third-party agent without prior access to the other's Redis.
- Operators can revoke a federated link without redeploying the entire mesh.
- Audit: every cross-domain delegation leaves a trace in the Postgres audit log (audit_logs): actions mesh.federation.proxy_jsonrpc (operator outbound proxy) and mesh.federation.inbound_jsonrpc (authenticated inbound federation JSON-RPC, after handler execution).
6. References
- TECH_VISION.md: Layer 2 Agent Mesh, Layer 4 protocols, Layer 5 security.
- MESH_V1_DONE.md: v1 closed status (local promise).
- A2A_INTEGRATION.md: current entry point.
- MESH_PLANETARY_TRACE.md: traceability doc ↔ schemas ↔ code ↔ milestones P1–P6.
- MESH_PLANETARY_P1_GLOBAL_DIRECTORY.md: global directory draft spec (P1).
- MESH_PLANETARY_P2_WAN_GATEWAY.md: WAN gateway draft ADR (P2).
- schemas/hive-registry-record-v1.schema.json: P1 JSON Schema.
- openapi/registry-v1.yaml: registry read OpenAPI (draft).
- openapi/gateway-v1.yaml: P2 ingress gateway OpenAPI (draft).
- schemas/hive-mesh-descriptor-v1.schema.json: JSON Schema for the /.well-known/hive-mesh.json public descriptor.
- openapi/hive-mesh-well-known.yaml: OpenAPI for this public GET.
- MESH_PLANETARY_P3_TRANSPORT.md: WAN transport P3 ADR (NATS JetStream MVP).
- openapi/transport-bridge-v1.yaml: HTTP → bus bridge OpenAPI (P3).
Living document — prioritize phases based on your first use case (multi-region enterprise vs public mesh). To explicitly target planetary scope, chain P1→P3 (directory + gateway + transport) before investing heavily in P4 DHT.