Agent Nodes and Remote Execution
AgentHub can bind an agent to either the local Main Node or a registered remote Agent Node. This enables distributed execution while keeping a single control plane.
Why Agent Nodes Matter
Agent Nodes are not just a deployment detail. They let AgentHub keep one control plane while moving execution closer to:
- Data: Large datasets that shouldn't move over the network
- Compute: GPU/TPU resources or specialized hardware
- Network: Required network boundaries or VPN segments
- Ownership: Machines that should own local worktrees
This is also where the actor p2p model matters: mailbox and control traffic can stay consistent even when the process runs remotely.
Architecture Overview
┌──────────────────────────────────────────────────────────────┐
│                      Main Control Plane                       │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐      │
│   │    Web UI    │   │     API      │   │   Registry   │      │
│   └──────────────┘   └──────────────┘   └──────────────┘      │
│                                                               │
│   ┌────────────────────────────────────────────────────┐      │
│   │            Internal gRPC Control Plane             │      │
│   │         (mTLS/TLS encrypted, authenticated)        │      │
│   └────────────────────────────────────────────────────┘      │
└───────────────────────────────────────────────────────────────┘
                               │
               ┌───────────────┼───────────────┐
               │               │               │
               ▼               ▼               ▼
       ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
       │  Agent Node  │ │  Agent Node  │ │  Agent Node  │
       │  (node-gpu)  │ │ (node-east)  │ │ (node-west)  │
       └──────────────┘ └──────────────┘ └──────────────┘
What Agent Nodes Control
Each registered node stores:
| Field | Required | Description |
|---|---|---|
| id | Yes | Stable unique identifier (e.g., node-gpu-01) |
| name | Yes | Human-readable name |
| grpc_target | Yes | gRPC endpoint (e.g., https://node1.internal:50051) |
| tls_server_name | No | TLS SNI override for certificate validation |
| default_worktree_root | No | Default base for create_worktree mode |
The node registry is a control-plane view. Runtime state still lives on the selected execution node.
Deployment Prerequisites
Before registering remote nodes, ensure:
Main Control Plane
[internal_grpc]
enabled = true
listen = "0.0.0.0:50051"
[internal_grpc.security]
mode = "tls" # or "mtls" for mutual TLS
cert_dir = "/etc/agenthub/certs"
[internal_grpc.auth]
shared_secret = "your-256-bit-secret-here"
issuer = "agenthub"
audience = "agenthub-internal"
[internal_grpc.bootstrap]
token = "bootstrap-token-for-remote-nodes"
Root operators can copy the bootstrap token from Agents -> Join node with token.
Agent Node onboarding is token-based; QR is not part of the node join path.
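Both values should be high-entropy strings. One way to generate them (openssl assumed available; any equivalent random source works):
# 256-bit shared secret, hex-encoded
openssl rand -hex 32
# Bootstrap token for node joins
openssl rand -base64 32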
Remote Node
[server]
role = "node"
node_id = "node-01"
[internal_grpc]
enabled = true
listen = "0.0.0.0:50051"
[internal_grpc.security]
mode = "tls"
cert_dir = "/etc/agenthub/certs"
[internal_grpc.auth]
shared_secret = "same-secret-as-main" # Must match!
issuer = "agenthub"
audience = "agenthub-internal"
[internal_grpc.bootstrap]
token = "bootstrap-token-from-main-control-plane"
node_id must match the node you register on the main control plane. In node
mode, AgentHub only starts the internal gRPC execution/control surface; it does
not serve the public web UI or HTTP API.
The bootstrap token is only the join handshake. TLS material, JWT issuer/audience, and the shared secret must still match the main control plane configuration today.
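Because these values must match exactly, a quick comparison before starting the node can save a debugging round trip. A minimal sketch, assuming the config lives at /etc/agenthub/config.toml on both machines:
# Run on the main control plane and on the node, then compare the output
grep -A3 '\[internal_grpc.auth\]' /etc/agenthub/config.toml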
Network Requirements
- Main control plane can reach remote node on gRPC port
- Firewalls allow bidirectional gRPC traffic
- DNS resolution (or IP addresses) configured
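These can be checked from the main control plane before registering anything; a minimal sketch (dig and nc assumed installed):
# DNS resolves to the expected address?
dig +short node1.internal
# gRPC port reachable?
nc -zv node1.internal 50051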
TLS Configuration
Mode: tls (Server Authentication)
Remote node presents certificate; main control plane verifies.
Generate certificates:
# On main control plane
cd /etc/agenthub/certs
# Generate CA
openssl req -x509 -newkey rsa:4096 -keyout ca-key.pem -out ca-cert.pem \
-days 365 -nodes -subj "/CN=agenthub-ca"
# Generate server cert for remote node
openssl req -newkey rsa:4096 -keyout node-key.pem -out node-csr.pem \
-nodes -subj "/CN=node1.internal"
openssl x509 -req -in node-csr.pem -CA ca-cert.pem -CAkey ca-key.pem \
-out node-cert.pem -days 365 -CAcreateserial
# Copy to remote node
scp ca-cert.pem node-cert.pem node-key.pem node1.internal:/etc/agenthub/certs/
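It is worth confirming that the chain verifies and that the CN/SAN matches the hostname you will register, for example:
# Verify the node certificate against the CA
openssl verify -CAfile ca-cert.pem node-cert.pem
# Inspect subject and validity window
openssl x509 -in node-cert.pem -noout -subject -dates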
Register node:
ID: node-01
Name: GPU Node 01
gRPC Target: https://node1.internal:50051
TLS Server Name: node1.internal
Default Worktree Root: /data/agenthub/worktrees
Mode: mtls (Mutual Authentication)
Both sides present and verify certificates. This is more secure, but more complex to set up and operate.
Benefits:
- Node cannot be impersonated
- Main control plane identity verified by nodes
- No shared secret needed (certificates only)
Setup:
- Generate client certificates for main control plane
- Distribute CA to all nodes
- Configure mode = "mtls" on both sides
Deployment Examples
Example 1: Single Remote GPU Node
Scenario: Machine learning workloads on GPU server.
Main Control Plane (hub.example.com):
[server]
listen = "0.0.0.0:8080"
[internal_grpc]
enabled = true
listen = "0.0.0.0:50051"
[internal_grpc.security]
mode = "tls"
cert_dir = "/etc/agenthub/certs"
[internal_grpc.auth]
shared_secret = "CHANGE-ME-256-BIT-SECRET"
GPU Node (gpu01.internal):
[server]
role = "node"
node_id = "gpu-01"
[internal_grpc]
enabled = true
listen = "0.0.0.0:50051"
[internal_grpc.security]
mode = "tls"
cert_dir = "/etc/agenthub/certs"
[internal_grpc.auth]
shared_secret = "CHANGE-ME-256-BIT-SECRET"
[worktree]
default_root = "/data/agenthub/worktrees"
Bootstrap and registration:
# 1. Copy the bootstrap token from Agents -> Join node with token.
# 2. Start the remote node with the matching [internal_grpc] config above.
# 3. Register the reachable node route via UI or API.
curl -X POST http://hub.example.com:8080/api/admin/agent-nodes \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"id": "gpu-01",
"name": "GPU Server 01",
"grpc_target": "https://gpu01.internal:50051",
"tls_server_name": "gpu01.internal",
"default_worktree_root": "/data/agenthub/worktrees"
}'
Test:
# Create agent targeting GPU node
# UI: Agents → Create → Execution Node: "GPU Server 01"
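The registration can also be confirmed from the command line via the node health endpoint described later on this page:
curl http://hub.example.com:8080/api/admin/agent-nodes/gpu-01/health \
  -H "Authorization: Bearer $TOKEN"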
Example 2: Multi-Region Deployment
Scenario: Teams in US and EU with local execution.
┌─────────────────┐          ┌─────────────────┐
│  Main Control   │◄────────►│  Main Control   │
│  (us-central)   │          │   (eu-west)     │
│   (passive)     │   sync   │    (active)     │
└────────┬────────┘          └────────┬────────┘
         │                            │
    ┌────┴────┐                  ┌────┴────┐
    ▼         ▼                  ▼         ▼
 ┌──────┐  ┌──────┐           ┌──────┐  ┌──────┐
 │us-01 │  │us-02 │           │eu-01 │  │eu-02 │
 └──────┘  └──────┘           └──────┘  └──────┘
Example 3: Kubernetes Deployment
Main Control Plane (Deployment + Service):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agenthub-control
spec:
  replicas: 1
  selector:
    matchLabels:
      app: agenthub-control
  template:
    metadata:
      labels:
        app: agenthub-control
    spec:
      containers:
        - name: agenthub
          image: agenthub:latest
          ports:
            - containerPort: 8080
              name: http
            - containerPort: 50051
              name: grpc
          volumeMounts:
            - name: config
              mountPath: /etc/agenthub
            - name: data
              mountPath: /data
      volumes:
        - name: config
          configMap:
            name: agenthub-config
        - name: data
          persistentVolumeClaim:
            claimName: agenthub-data
---
apiVersion: v1
kind: Service
metadata:
  name: agenthub-control
spec:
  selector:
    app: agenthub-control
  ports:
    - port: 8080
      name: http
    - port: 50051
      name: grpc
Agent Node (DaemonSet for node-local execution):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: agenthub-node
spec:
  selector:
    matchLabels:
      app: agenthub-node
  template:
    metadata:
      labels:
        app: agenthub-node
    spec:
      hostNetwork: true
      containers:
        - name: agenthub
          image: agenthub:latest
          command: ["agenthub", "--node-mode"]
          ports:
            - containerPort: 50051
          volumeMounts:
            - name: workdir
              mountPath: /workdirs
      volumes:
        - name: workdir
          hostPath:
            path: /var/agenthub/workdirs
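Applying and verifying these manifests follows the usual kubectl flow (manifest file names are illustrative):
kubectl apply -f agenthub-control.yaml
kubectl apply -f agenthub-node.yaml
# Confirm the control plane and per-node pods are running
kubectl get pods -l app=agenthub-control
kubectl get pods -l app=agenthub-node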
Internal gRPC Also Powers agenthub actor ...
The same internal gRPC control plane is used by the actor CLI:
# These commands require internal gRPC:
agenthub actor team-members --actor-id <id>
agenthub actor inbox --actor-id <id> --run-id <run_id>
agenthub actor ack --actor-id <id> --message-id <msg_id>
agenthub actor send --actor-id <id> --to <recipient> --payload '{}'
agenthub actor time-trigger-list --actor-id <id>
agenthub actor permission-review-respond --request-id <id> --decision allow
Important: internal_grpc.enabled = true is required even on single-machine setups for the actor CLI to work.
Local Loopback Configuration
For local actor CLI usage:
[internal_grpc]
enabled = true
listen = "127.0.0.1:50051"
[internal_grpc.auth]
shared_secret = "local-dev-secret"
The CLI reads shared_secret from the config to mint tokens. Auto-generated secrets (stored in cert_dir/auth_secret.txt) are not automatically picked up by the CLI.
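If you rely on an auto-generated secret, copy it into the config explicitly so the CLI can mint tokens; a sketch assuming cert_dir = "/etc/agenthub/certs":
# Read the auto-generated secret...
cat /etc/agenthub/certs/auth_secret.txt
# ...then set it as shared_secret under [internal_grpc.auth] and restart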
Actor CLI Batch Operations
The authority-side actor CLI also supports small client-side batch workflows for high-frequency operator actions:
agenthub actor ack --message-id 101 --message-id 102
agenthub actor permission-review-respond --permission-id req-1 --permission-id req-2 --option-id approved
Batch handling is still sequential on the client side. AgentHub does not expose a separate batch internal gRPC protocol for these operations.
Single-item calls keep their original JSON object output. Multi-item calls return a JSON array of per-item responses.
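Because multi-item calls return an array, they compose cleanly with jq; for example (jq assumed installed):
# Count how many of the batched acks produced a response
agenthub actor ack --actor-id <id> --message-id 101 --message-id 102 | jq 'length'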
For permission review responses, any session or persistent approval path must
use the request-provided --option-id value. --outcome currently supports
only cancelled.
Register and Edit Nodes
Via Web UI
- Go to Agents page
- Click "Register Node" (root only)
- Fill in node details
- Test connection
Via API
Register node:
curl -X POST http://localhost:8080/api/admin/agent-nodes \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"id": "node-01",
"name": "Production Node 01",
"grpc_target": "https://node01.prod.internal:50051",
"tls_server_name": "node01.prod.internal",
"default_worktree_root": "/data/agenthub/worktrees"
}'
List nodes:
curl http://localhost:8080/api/admin/agent-nodes \
-H "Authorization: Bearer $TOKEN"
Update node:
curl -X PUT http://localhost:8080/api/admin/agent-nodes/node-01 \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Updated Name",
"default_worktree_root": "/new/path"
}'
Delete node:
curl -X DELETE http://localhost:8080/api/admin/agent-nodes/node-01 \
-H "Authorization: Bearer $TOKEN"
Note: Nodes with active agents cannot be deleted.
Default Worktree Root
Default worktree root is optional and applies to remote create_worktree agents.
With default root:
- Leaving Workdir blank in create_worktree mode is allowed
- AgentHub derives the workdir under the node root
- Example: /data/worktrees/myproject-goal-abc123/
Without default root:
- Must provide an explicit Workdir
- Full path must exist on the remote node
- Must be within the node's safe_paths
Per-Node Worktree Strategy
| Node Type | Recommended Default Root | Use Case |
|---|---|---|
| GPU nodes | /data/agenthub/worktrees | ML workloads with large datasets |
| Build nodes | /var/lib/agenthub/builds | CI/CD compilation |
| Dev nodes | /home/agenthub/worktrees | Development environments |
Execution Behavior
Main Node
- Local safe-path and worktree policies apply directly
- No network overhead
- Filesystem access is direct
Remote Node
- AgentHub proxies lifecycle control over encrypted gRPC
- Execution data stays on remote node
- UI/Control stays on main control plane
- Output streamed via gRPC to main plane, then to UI
Network Flow
User → Main Control Plane → Remote Node
               ↓                 ↓
           gRPC call      Process spawned
               ↓                 ↓
         Status check     Output captured
               ↓                 ↓
         Stream to UI  ←  Output via gRPC
Actor P2P And Mailbox Delivery
Remote execution preserves the actor model:
- Actor control: Relayed over internal gRPC
- Mailbox delivery: Targets remote recipients through same path
- Local state: Remote nodes keep execution data
- Central view: Main node is primary control plane
This ensures remote execution feels like AgentHub, not a different product.
Health Checking
Manual Health Check
# Check gRPC endpoint
curl -v telnet://node01.internal:50051
# Check TLS
echo | openssl s_client -connect node01.internal:50051 2>/dev/null | openssl x509 -noout -text
# Check via AgentHub API
curl http://main-hub:8080/api/admin/agent-nodes/node-01/health \
-H "Authorization: Bearer $TOKEN"
Automated Monitoring
Monitor these metrics:
| Metric | Warning Threshold | Critical Threshold |
|---|---|---|
| gRPC latency | > 100ms | > 500ms |
| Connection failures | > 1% | > 10% |
| Agent start time | > 30s | > 60s |
| Disk usage (node) | > 80% | > 95% |
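A minimal poll of the health endpoint can feed these thresholds, using curl's built-in timing (endpoint as shown in the manual check above):
# Report HTTP status and total request time for the node health endpoint
curl -s -o /dev/null -w 'node-01 health: %{http_code} in %{time_total}s\n' \
  -H "Authorization: Bearer $TOKEN" \
  http://main-hub:8080/api/admin/agent-nodes/node-01/health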
Troubleshooting
"failed to connect to remote node"
Causes:
- Network unreachable
- Firewall blocking
- Node not running
- TLS certificate mismatch
Diagnostic:
# From main control plane
grpcurl -insecure node01.internal:50051 list
# Check certificates
echo | openssl s_client -connect node01.internal:50051 -servername node01.internal
"certificate verify failed"
Solutions:
- Verify tls_server_name matches the certificate CN/SAN
- Check the CA certificate is in the trust store
- Regenerate certificates if expired
"unauthorized"
Causes:
- shared_secret mismatch
- Token expired
- Clock skew between nodes
Solutions:
- Ensure the same shared_secret on all nodes
- Check system clocks are synchronized (NTP)
- Restart nodes after secret changes
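Clock skew is easy to rule out by comparing UTC time on both machines (SSH access assumed):
# Should print near-identical timestamps
date -u; ssh node01.internal date -u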
Slow Agent Start on Remote Node
Possible causes:
- Network latency to node
- Slow worktree creation
- Resource constraints on node
Solutions:
- Use use_existing mode for faster starts
- Pre-warm worktree directories
- Monitor node resource usage
Operator Rollout Flow
New Node Onboarding
1. Prepare node:
   # Install AgentHub on remote machine
   # Copy TLS certificates
   # Create config file
2. Start node:
   agenthub
   # Verify gRPC port listening
   ss -tlnp | grep 50051
3. Verify connectivity:
   # From main control plane
   curl http://main-hub:8080/api/admin/agent-nodes \
     -H "Authorization: Bearer $TOKEN"
4. Register node (via UI or API)
5. Set default worktree root (optional)
6. Test:
   - Create remote-target agent
   - Verify card shows node:<id>
   - Start agent and confirm output visible
7. Production readiness:
   - Configure monitoring
   - Set up log aggregation
   - Document node capabilities
Security Best Practices
- Use mTLS for production multi-node deployments
- Rotate shared_secret periodically (see the rotation sketch below)
- Limit safe_paths on each node to minimum required
- Use dedicated service accounts for AgentHub processes
- Enable audit logging for node registration/changes
- Network segmentation: Place nodes in appropriate network zones
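For the rotation practice above, a minimal flow, assuming the config paths from the prerequisites section:
# Generate a replacement secret
openssl rand -hex 32
# Update shared_secret under [internal_grpc.auth] on the main control plane
# and on every node, then restart each AgentHub process (nodes must be
# restarted after secret changes; see troubleshooting above)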
Operational Tips
- Stable IDs: Use environment-oriented IDs like node-east or build-fleet-a
- Naming: Include region/purpose in names: us-east-gpu-01
- Capacity planning: Monitor node resource usage
- Gradual rollout: Validate one node before full deployment
- Documentation: Maintain a node capability matrix (GPU, memory, etc.)