A2A ops audit

Source: docs/A2A_OPS_AUDIT.md

Review document: quality, scalability, maintainability, and alignment with a same-OS deployment (single host or single container). Last update: includes Phase C (configurable RBAC, GET /api/a2a/status, mesh.a2a.* logs, UI Settings, mesh Redis docs).


1. What is solid

| Area | Verdict |
| --- | --- |
| Coupling | Few dedicated files (src/lib/a2a/*, two routes). Typed imports from the SDK, no protocol fork. |
| Runtime | runtime = "nodejs" on A2A routes, consistent with the SDK's crypto / server handler (not Edge). |
| Auth | authorize(A2A_JSONRPC_MIN_ROLE) on JSON-RPC (viewer, operator, or admin; default viewer); same session / Bearer / HIVE_INTERNAL_TOKEN stack as the rest of the app. |
| Discovery | GET /.well-known/agent-card.json is public and carries metadata only; JSON-RPC sits behind auth. Standard A2A usage. |
| Build | prebuild compiles the SDK; next dev / next build run with Webpack to avoid the Turbopack failure on file: packages. |
| A2A tasks | Default RedisTaskStore (A2A_TASK_STORE=redis), shared across workers; TTL set by A2A_TASK_TTL_SECONDS (0 = no expiration). |
| A2A quota | Redis sliding window (hive:rl:a2a:*), same logic as the rest of the app; the SDK's in-memory rate limit is disabled (rateLimit: false). |
| Card keys | A2A_SIGNING_SECRET_KEY_HEX + A2A_NOISE_STATIC_SECRET_KEY_HEX (64 hex characters each) give stable keys; otherwise keys are generated on the fly (dev). |
| Upstream SDK | protocolKey rename (formerly static name) on transport factories: avoids the Webpack/Function#name crash in prod. |
| HTTP secrets | Secrets encryption moved out of route.ts into secrets-crypto.ts: Next routes only export handlers. |
| Phase C: RBAC | A2A_JSONRPC_MIN_ROLE validated by Zod; aligned with Role on the authorize side (redundant TypeScript cast, but harmless as long as the schema remains the source of truth). |
| Phase C: Identity | authSource === "internal" maps to the A2A label hive-internal (no more collision with a literal user.id). |
| Phase C: Status API | buildA2APublicStatus + A2APublicStatusPayload; enabled follows A2A_ENABLED (Agent Card 404 / JSON-RPC 503 when disabled). |
| Phase C: Observability | mesh.a2a.rpc.request + mesh.a2a.rpc.complete (durationMs, outcome: ok / jsonrpc_error / exception / invalid_json); SSE stream: stream_start / stream_end. Caller fields: callerKind + callerRef via a2aCallerLogFields (no plaintext email). |
| Phase C: UI | Settings → A2A imports A2APublicStatusPayload; banner if enabled: false. |
| Phase C: Docs | MESH_V1_REDIS_BUS.md, PRODUCTION.md §10–11, MESH_V1_DONE.md checkboxes updated; some mesh checkboxes remain intentionally open (hard agent→agent transport, dev README, mock bus tests). |
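
The A2A quota row describes a sliding window kept in Redis under hive:rl:a2a:*. As a rough illustration of that logic only, here is an in-memory sketch of a sliding-window limiter. It is not the app's limiter: the class name, the option names, and the key suffix are assumptions, and a Redis-backed version would keep the same timestamps in shared storage so that every worker sees one counter.

```typescript
// Sliding-window log limiter: remembers the timestamps of accepted requests
// per key and admits a new request only while fewer than `limit` of them
// fall inside the window. In-memory stand-in, for illustration only.

interface SlidingWindowOptions {
  limit: number; // max accepted requests per window (hypothetical name)
  windowMs: number; // window length in milliseconds (hypothetical name)
}

interface RateLimitDecision {
  allowed: boolean;
  remaining: number;
}

class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly options: SlidingWindowOptions;

  constructor(options: SlidingWindowOptions) {
    this.options = options;
  }

  check(key: string, nowMs: number = Date.now()): RateLimitDecision {
    const { limit, windowMs } = this.options;
    const cutoff = nowMs - windowMs;
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < limit;
    // Only accepted requests are recorded; rejected ones do not extend the window.
    if (allowed) recent.push(nowMs);
    this.hits.set(key, recent);
    return { allowed, remaining: Math.max(0, limit - recent.length) };
  }
}

// The prefix comes from the document; the suffix format is an assumption.
function a2aRateLimitKey(callerRef: string): string {
  return `hive:rl:a2a:${callerRef}`;
}
```

With limit: 2 and windowMs: 1000, two calls for the same key are admitted, a third inside the same second is rejected, and a call made after the oldest timestamp has left the window is admitted again.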

2. Remaining limitations (multi-worker)

Everything runs in the same Node process per replica: suitable for "single OS" / one VM / one pod. With multiple workers or replicas:

| Topic | Status |
| --- | --- |
| Tasks | Resolved by default: RedisTaskStore + REDIS_URL. A2A_TASK_STORE=memory for local tests without Redis. |
| A2A rate limit | Resolved: Redis (hive:rl:a2a:*, A2A_RATE_LIMIT_*). |
| Card keys | Resolved in prod: A2A_SIGNING_SECRET_KEY_HEX + A2A_NOISE_STATIC_SECRET_KEY_HEX (64 hex characters each); otherwise a warning is logged (rotating keys). |
| SDK audit | Per process: A2A_SDK_AUDIT_ENABLED=false recommended on multiple workers until a distributed audit store is available. |
| SDK circuit breaker | Per process: A2A_SDK_CIRCUIT_BREAKER_ENABLED=false if per-instance state is not desired. |
| Server singleton | One HiveA2AServer per worker, which is normal; shared state goes through Redis. |
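
The card-key rule (64 hex characters give a stable key, anything missing falls back to a generated key with a warning) can be sketched as follows. This is a minimal illustration under stated assumptions, not the app's loader: loadSecretKey and LoadedKey are hypothetical names, and rejecting a malformed value instead of silently falling back is a design choice made here, not something the audit states.

```typescript
import { randomBytes } from "node:crypto";

interface LoadedKey {
  key: Uint8Array; // 32 raw bytes
  stable: boolean; // false when the key was generated for this process only
}

const HEX_64 = /^[0-9a-fA-F]{64}$/;

// Hypothetical helper: reads a 64-hex-character secret from `env`, or
// generates an ephemeral 32-byte key and warns when the variable is unset.
function loadSecretKey(
  envName: string,
  env: Record<string, string | undefined>,
  warn: (message: string) => void = console.warn,
): LoadedKey {
  const raw = env[envName]?.trim();
  if (raw === undefined || raw === "") {
    // Each worker would generate a different key, so this is dev-only.
    warn(`${envName} is not set: using an ephemeral key that changes on restart`);
    return { key: new Uint8Array(randomBytes(32)), stable: false };
  }
  if (!HEX_64.test(raw)) {
    // Assumption: fail fast on a malformed value rather than rotate silently.
    throw new Error(`${envName} must be exactly 64 hex characters`);
  }
  return { key: new Uint8Array(Buffer.from(raw, "hex")), stable: true };
}
```

Calling it once per variable, for example loadSecretKey("A2A_SIGNING_SECRET_KEY_HEX", process.env), makes the dev fallback visible in logs, which matters on multiple workers because each process would otherwise advertise different keys.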

3. Maintainability

| Point | Detail |
| --- | --- |
| Stub executor | The text [Hive platform A2A — stub executor] is intentional; any business replacement must remain in server.ts (or a dedicated module imported from there). |
| A2A logs | Logger a2a.jsonrpc + event mesh.a2a.rpc.request: stable fields for filtering; extend with exit/duration if SOC requires it. |
| Operator status | A single buildA2APublicStatus function prevents drift between the API and future consumers. |
| URL in the card | Based on AUTH_URL from env(). Behind a reverse proxy, AUTH_URL must be the public URL seen by clients (not http://localhost:3000 in prod). |
| file: dependency | The SDK lives in the monorepo; to publish the app without the full repo, you will need npm pack / private registry / workspace CI. Plan for this in the release pipeline. |
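
The AUTH_URL point lends itself to a startup check. The sketch below derives the agent card URL from a base URL and flags a loopback host in production. checkPublicBaseUrl and PublicUrlCheck are hypothetical names, and the app may build the card URL differently; only the card path and the localhost concern come from this document.

```typescript
// Path of the public agent card, as given in this document.
const AGENT_CARD_PATH = "/.well-known/agent-card.json";

interface PublicUrlCheck {
  agentCardUrl: string;
  warnings: string[];
}

// Hypothetical helper: `authUrl` stands for the AUTH_URL value and
// `nodeEnv` for NODE_ENV. Throws if `authUrl` is not a valid absolute URL.
function checkPublicBaseUrl(authUrl: string, nodeEnv: string): PublicUrlCheck {
  const base = new URL(authUrl);
  const warnings: string[] = [];
  // WHATWG URL keeps the brackets in `hostname` for IPv6 literals.
  const loopbackHosts = new Set(["localhost", "127.0.0.1", "[::1]"]);
  if (nodeEnv === "production" && loopbackHosts.has(base.hostname)) {
    warnings.push(
      `AUTH_URL points at ${base.host}; clients behind the proxy cannot reach it`,
    );
  }
  // An absolute path replaces any path prefix present in AUTH_URL.
  const agentCardUrl = new URL(AGENT_CARD_PATH, base).toString();
  return { agentCardUrl, warnings };
}
```

For example, checkPublicBaseUrl("http://localhost:3000", "production") returns one warning, while the same value with nodeEnv set to "development" returns none.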

4. Security (brief reminder)

  • JSON-RPC requires at minimum the A2A_JSONRPC_MIN_ROLE role (default viewer): in sensitive prod environments, switching to operator reduces the attack surface (dashboard viewers can no longer call JSON-RPC).
  • GET /api/a2a/status requires viewer or higher. It exposes configuration (TTL, rate limit, SDK flags) but no secrets, which is acceptable for any user who can open Settings; do not make it public without review.
  • Public agent card: no secrets; sensitive endpoints remain behind auth.
  • Ongoing review: A2A extensions (HTTP_EXTENSION_HEADER), SDK quotas, and audit, checked against THREAT_MODEL if present.
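
The minimum-role rule in the first bullet can be pictured as an ordered comparison. The sketch below is a plain-TypeScript stand-in, not the app's Zod schema or its authorize function: parseMinRole and meetsMinRole are hypothetical names, and the ordering viewer < operator < admin is inferred from this document.

```typescript
// Roles in ascending order of privilege (ordering inferred from the document).
const ROLES = ["viewer", "operator", "admin"] as const;
type Role = (typeof ROLES)[number];

// Hypothetical parser for the A2A_JSONRPC_MIN_ROLE value.
function parseMinRole(raw: string | undefined): Role {
  if (raw === undefined || raw === "") return "viewer"; // documented default
  const match = ROLES.find((role) => role === raw);
  if (match === undefined) {
    throw new Error(`A2A_JSONRPC_MIN_ROLE must be one of: ${ROLES.join(", ")}`);
  }
  return match;
}

// True when `actual` is at least as privileged as `minRole`.
function meetsMinRole(actual: Role, minRole: Role): boolean {
  return ROLES.indexOf(actual) >= ROLES.indexOf(minRole);
}
```

Under this ordering, raising the minimum to operator rejects a viewer session but still admits a caller treated as operator, and raising it to admin rejects both.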

5. Summary: "perfect / scalable / maintainable"

  • Perfect: no — v1 (stub executor); SDK audit/circuit breaker remain optional on the cluster side.
  • Vertically scalable: yes.
  • Horizontally scalable: yes for A2A tasks + quota + stable keys (Redis + env); strict audit still requires a distributed audit store, or A2A_SDK_AUDIT_ENABLED=false in the meantime.
  • Maintainable: yes — A2A modules + public-status.ts + api/a2a/* routes.
  • Phase C: good trade-off — policy and observability without over-engineering; the MESH_V1_DONE checklist still reflects honest gaps (business transport, DX compose, bus tests).

For the mesh product checklist, cross-reference with MESH_V1_DONE.md, A2A_INTEGRATION.md, and MESH_V1_REDIS_BUS.md.


6. Addendum — Phase C targeted audit (review "as before")

| Criterion | Note |
| --- | --- |
| Same-OS consistency | Yes: status and JSON-RPC remain in Next + Redis already used by the app. |
| Scalability | Unchanged on the data side; status is an env() read per request, which is negligible. |
| RBAC | Consistent with authorize; HIVE_INTERNAL_TOKEN is still treated as operator (so it passes if minRole ≤ operator). |
| Confidentiality | JSON-RPC logs: callerKind / callerRef (hash for email-like values, UUID pass-through, service labels). |
| Documentation vs code | MESH_V1_DONE: some checked boxes assume that A2A logs = mesh v1 observability. That is honest for A2A, not yet for Redis pub/sub without a dedicated business handler. |
| Minimal debt | Unit tests: a2a-log-privacy, buildA2APublicStatus + A2A_ENABLED; no E2E on /api/a2a/status. |
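
The confidentiality row (hash for email-like values, UUID pass-through, service labels) could look roughly like the sketch below. It is not the app's a2aCallerLogFields: the function name, the callerKind values, the sha256: prefix, and the 16-character truncation are all assumptions, and this version hashes every non-UUID identifier, not only email-like ones.

```typescript
import { createHash } from "node:crypto";

type CallerKind = "user" | "service"; // hypothetical values

interface CallerLogFields {
  callerKind: CallerKind;
  callerRef: string;
}

const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical stand-in for a2aCallerLogFields.
function sketchCallerLogFields(authSource: string, userId: string): CallerLogFields {
  if (authSource === "internal") {
    // Service label taken from this document.
    return { callerKind: "service", callerRef: "hive-internal" };
  }
  if (UUID_RE.test(userId)) {
    // Opaque identifiers carry no personal data, so they pass through.
    return { callerKind: "user", callerRef: userId.toLowerCase() };
  }
  // Anything else (including emails) is replaced by a truncated SHA-256 digest.
  const digest = createHash("sha256")
    .update(userId.trim().toLowerCase())
    .digest("hex");
  return { callerKind: "user", callerRef: `sha256:${digest.slice(0, 16)}` };
}
```

Normalizing case before hashing keeps the reference stable for the same mailbox, so log lines can still be correlated per caller without storing the address. An unsalted hash of an email can be guessed by anyone holding a candidate list; a keyed hash (HMAC) would be a stronger option if that matters for the threat model.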