Troubleshooting

This page helps you diagnose and resolve common AgentHub issues.

Quick Diagnostic Commands

# Check if AgentHub is running
curl -i http://localhost:8080/

# Check authentication
curl -H "Authorization: Bearer $TOKEN" http://localhost:8080/api/me

# Verify internal gRPC (if enabled)
agenthub actor inbox --actor-id test --limit 1

# Check disk space for event databases
df -h ~/.agenthub

# View recent logs
journalctl -u agenthub -n 100 --no-pager

Login Issues

Cannot Access Login Page

Symptoms: Browser cannot connect to http://localhost:8080

Diagnostic Steps:

Check if AgentHub is running:
```
pgrep -a agenthub
```

Verify the server is listening:

netstat -tlnp | grep 8080
# or
ss -tlnp | grep 8080

Check config for correct listen address:

[server]
listen = "127.0.0.1:8080"  # or "0.0.0.0:8080" for remote access

Solutions:

Start AgentHub: agenthub
Check firewall rules if accessing remotely
Verify port is not in use by another process

Invalid Credentials

Symptoms: "Invalid username or password" error

Solutions:

Verify Caps Lock is off
Check that the root user exists by inspecting ~/.agenthub/agenthub.db with sqlite3
If the root password is lost, you may need to reset the database (this causes data loss)

Join Token Issues

Symptoms: "Invalid join token" or "Join token expired"

Solutions:

Generate new join challenge via admin API
Complete join within token expiration window
Verify PIN is entered correctly

Agent Startup Issues

Agent Cannot Start

Symptoms: Clicking Start shows an error or produces no response

Common Causes:

| Cause | Diagnostic | Solution |
| --- | --- | --- |
| Invalid workdir | Check path exists | Create directory or choose different path |
| Path outside safe_paths | Check `safe_paths` config | Add path to config or use allowed path |
| Missing ACP binary | Check configured `codex_acp.binary` (or startup logs showing `config codex_acp_binary`) and verify that exact path exists and is executable | Install the bundled ACP adapter or set `codex_acp.binary` to a valid executable |
| Permission denied | Check directory permissions | `chmod 755 /path/to/workdir` |
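
The checks in the table above can be scripted before you click Start. This is a minimal sketch: `check_workdir` and `check_binary` are hypothetical helper names, not AgentHub commands, and they only test what the filesystem reports, not what AgentHub itself validates.

```shell
# Hypothetical helpers; names and messages are illustrative, not part of AgentHub.

# Report why a workdir might be rejected before AgentHub tries to use it.
check_workdir() {
  dir=$1
  if [ ! -e "$dir" ]; then
    echo "missing: $dir"; return 1
  elif [ ! -d "$dir" ]; then
    echo "not a directory: $dir"; return 1
  elif [ ! -r "$dir" ] || [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
    echo "insufficient permissions: $dir"; return 1
  fi
  echo "ok: $dir"
}

# Confirm that the configured ACP binary path exists and is executable.
check_binary() {
  bin=$1
  if [ ! -f "$bin" ]; then
    echo "missing: $bin"; return 1
  elif [ ! -x "$bin" ]; then
    echo "not executable: $bin"; return 1
  fi
  echo "ok: $bin"
}
```

Run `check_workdir /path/to/workdir` as the same user that runs AgentHub, since permissions are evaluated per user. Pass `check_binary` the exact value of `codex_acp.binary`.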

"Workdir path must be within allowed safe paths"

Solutions:

Add path to safe_paths in config:

safe_paths = [
  "/home/you/projects",
  "/new/path/here",
]

Use create_worktree mode (uses configured default_root)
Restart AgentHub after config changes
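
To see in advance whether a workdir would fall under one of your safe_paths entries, a prefix check like the one below can help. It is a sketch under assumptions: `path_is_allowed` is a hypothetical helper, and it compares plain strings, so AgentHub's real check may differ (for example in how it resolves symlinks or `..` segments).

```shell
# Hypothetical helper: succeeds if $1 equals or sits beneath one of the remaining arguments.
# Pure string comparison; it does not resolve symlinks or "..".
path_is_allowed() {
  candidate=${1%/}          # ignore one trailing slash on the candidate
  shift
  for root in "$@"; do
    root=${root%/}          # ignore one trailing slash on each allowed root
    case "$candidate" in
      "$root"|"$root"/*) return 0 ;;
    esac
  done
  return 1
}
```

For example, `path_is_allowed /home/you/projects/app /home/you/projects /new/path/here` succeeds, while `/home/you/projects-old` is rejected because the match requires a `/` boundary after the allowed root.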

ACP Binary Starts But Fails Immediately

Symptoms: Agent starts and exits quickly, or the server logs show ACP handshake / startup errors.

Diagnostic Steps:

Check which binary AgentHub is launching by inspecting the configured codex_acp.binary value first.

Verify that exact adapter binary is the expected one:

/path/to/your-configured-agenthub-codex-acp --version

Compare the binary path and reported adapter version to the deployed AgentHub build, or rebuild/replace the adapter from the same repository revision if you are unsure they match.

Solutions:

Rebuild from the current repository state if the binary is stale
Avoid mixing an older fork-pinned ACP binary into a newer AgentHub deployment
If you intentionally use a custom ACP binary, set codex_acp.binary to that exact path and verify protocol compatibility first

"Agent is already running"

Solutions:

Wait for current run to complete
Stop the agent first, then restart
Check if zombie process exists:
```
ps aux | grep '[a]genthub-codex-acp'  # the bracket keeps grep from matching itself
```

Session and Output Issues

No Output or Stale Output

Symptoms: Agent shows "running" but no new output appears

Diagnostic Steps:

Check connection badge in UI header
- connected: SSE connection active
- connecting / reconnecting: Connection issues

Check browser console for SSE errors:

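// In browser console (replace {agent_id} with the real agent ID)
const es = new EventSource('/api/agents/{agent_id}/events');
es.onerror = (e) => console.error('SSE error', e);    // logs connection failures
es.onmessage = (e) => console.log('SSE message', e.data);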

Verify event database is writable:

ls -la ~/.agenthub/agent-events/{agent_id}.db

Check server logs for ACP event sink errors
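
A listing with `ls -la` shows the permission bits but not whether the current user can actually write. The sketch below checks both the database file and its directory, since SQLite creates journal or WAL files next to the database. `check_db_writable` is a hypothetical helper name, not an AgentHub command.

```shell
# Hypothetical helper: SQLite needs write access to the database file and to its
# directory, because journal or WAL files are created alongside the database.
check_db_writable() {
  db=$1
  db_dir=$(dirname "$db")
  [ -f "$db" ] || { echo "missing: $db"; return 1; }
  [ -w "$db" ] || { echo "file not writable: $db"; return 1; }
  [ -w "$db_dir" ] || { echo "directory not writable: $db_dir"; return 1; }
  echo "ok: $db"
}
```

Run it as the user that owns the AgentHub process, for example `check_db_writable ~/.agenthub/agent-events/{agent_id}.db` with the real agent ID substituted.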

Recovery Actions:

Keep the current session for evidence
Create a fresh session with same prompt
Compare behavior to isolate differences
Check Connection Status and Recovery

History Replay Is Slow

Symptoms: Opening a completed session takes a long time to load

Solutions:

Large sessions (>10k events) naturally take time
Reduce event_retention_days in config

Delete old agent event databases:

rm ~/.agenthub/agent-events/{old_agent_id}.db
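
Before removing anything, it may help to list which event databases have not been written to recently. The following is a sketch only: `list_stale_event_dbs` is a hypothetical helper, and file modification time is just a proxy for "old", so confirm that an agent is no longer needed before deleting its database.

```shell
# Hypothetical helper: print event databases not modified in the last N days.
# It only lists candidates; nothing is deleted.
list_stale_event_dbs() {
  events_dir=$1
  days=$2
  find "$events_dir" -maxdepth 1 -type f -name '*.db' -mtime +"$days" | sort
}
```

For example, `list_stale_event_dbs ~/.agenthub/agent-events 30` prints databases untouched for more than 30 days. Review the list manually and then remove individual files with `rm` as shown above.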

Events Missing From History

Symptoms: Recent events don't appear in session view

Causes:

Events are persisted asynchronously (a small delay is normal)
Database locked during cleanup (if vacuum_on_cleanup = true)
Event database corruption

Solutions:

Wait a few seconds and refresh
Check server logs for SQLite errors
Restart AgentHub if database appears stuck

Team Issues

Team Runtime Won't Start

Symptoms: "Start Team" fails or hangs

Diagnostic Steps:

Check Team spec is valid JSON
Verify all member_id references are valid
Ensure leader_member_id references existing member
Check entrypoint is defined

Common Errors:

| Error | Cause | Solution |
| --- | --- | --- |
| `spec.members must be an array` | Invalid spec format | Ensure members is a JSON array |
| `spec.leader_member_id must reference a defined member` | Leader not in members list | Add leader to members or fix reference |
| `step already exists for run` | Duplicate step key | Use unique step keys |

Permission Review Not Routing

Symptoms: Tool permission requests not reaching reviewers

Solutions:

Check Team has leader defined
Verify requester_role is set correctly

Ensure agenthub actor ... commands work:

agenthub actor team-members --actor-id <leader_actor_id>

Team Messages Not Delivered

Symptoms: Messages sent but not received by members

Diagnostic Steps:

Check actor inbox:

agenthub actor inbox --actor-id <member_id> --run-id <run_id>

Verify mailbox routing is correct
Check internal gRPC connectivity

Internal gRPC and Actor Issues

"internal grpc client not available"

Solutions:

Enable internal gRPC in config:

[internal_grpc]
enabled = true
listen = "127.0.0.1:50051"

Restart AgentHub
Verify shared_secret is explicitly configured
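
To confirm what the config file actually contains after editing, you can read a single value back out. This is a line-based sketch, not a full TOML parser: `toml_get` is a hypothetical helper that handles simple `key = value` lines inside a named section and would truncate a quoted value that contains whitespace followed by `#`.

```shell
# Hypothetical helper: print the value of KEY inside [SECTION] of a TOML file.
# Usage: toml_get FILE SECTION KEY
toml_get() {
  awk -v section="[$2]" -v key="$3" '
    /^[ \t]*\[/ {                       # a section header such as [internal_grpc]
      gsub(/[ \t]/, "")
      in_section = ($0 == section)
      next
    }
    in_section {
      line = $0
      sub(/[ \t]+#.*$/, "", line)       # drop a trailing comment
      eq = index(line, "=")
      if (eq == 0) next
      k = substr(line, 1, eq - 1)
      v = substr(line, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)    # trim whitespace around key and value
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      gsub(/^"|"$/, "", v)              # strip surrounding double quotes
      if (k == key) { print v; exit }
    }
  ' "$1"
}
```

With the config shown above, `toml_get ~/.agenthub/config.toml internal_grpc enabled` should print `true`. An empty result means the key is absent from that section, which is a quick way to spot a setting that was never written.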

Actor CLI Commands Fail

Symptoms: agenthub actor inbox returns error

Diagnostic Steps:

Verify internal gRPC is enabled
Check shared_secret matches between config and CLI context
Ensure authority process is running
Verify --actor-id is correct

Solutions:

# Test basic connectivity
agenthub actor team-members --actor-id <id>

# Check inbox with explicit run scope
agenthub actor inbox --actor-id <id> --run-id <run_id> --limit 10

Agent Node Issues

Cannot Register Remote Node

Symptoms: "failed to connect to remote node" error

Diagnostic Steps:

Verify remote AgentHub is running with internal gRPC enabled
Check network connectivity:
```
telnet remote-host 50051
```
Verify TLS certificates are valid
Check firewall rules
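
On hosts where telnet is not installed, the connectivity check above can be approximated with bash alone. This is a sketch that assumes bash was built with `/dev/tcp` support and that the coreutils `timeout` command is available; `tcp_open` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens within 3 seconds.
# Relies on bash's /dev/tcp redirection and the coreutils `timeout` command.
tcp_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

Usage: `tcp_open remote-host 50051 && echo reachable || echo unreachable`. A successful connection only shows that the port accepts TCP; it does not validate TLS certificates or the shared secret.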

Remote Agent Won't Start

Symptoms: Agent assigned to remote node but doesn't start

Solutions:

Verify node is reachable from main control plane
Check that the node's Default worktree root is configured, or provide an explicit Workdir
Review node logs on remote machine
Ensure same shared_secret across cluster

Performance Issues

High CPU Usage

Diagnostic Steps:

Check active agent count
Review event database sizes:
```
du -sh ~/.agenthub/agent-events/*.db
```
Monitor cleanup operations

Solutions:

Reduce event_retention_days
Lower delete_batch_size for gentler cleanup
Delete old/unused agents

High Disk Usage

Diagnostic Steps:

# Check AgentHub data directory
du -sh ~/.agenthub/*

# Find largest event databases
ls -lhS ~/.agenthub/agent-events/*.db | head -10

Solutions:

Enable vacuum_on_cleanup = true
Manually delete old agent databases
Reduce retention period
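
When deciding which databases to delete manually, exact byte counts can be easier to compare and script against than human-readable sizes. A minimal sketch, using a hypothetical helper named `largest_event_dbs`:

```shell
# Hypothetical helper: list the N largest *.db files in a directory as "bytes<TAB>path".
# Usage: largest_event_dbs DIR [N]   (N defaults to 10)
largest_event_dbs() {
  events_dir=$1
  count=${2:-10}
  for f in "$events_dir"/*.db; do
    [ -f "$f" ] || continue             # skip the unexpanded glob when no files match
    size=$(wc -c < "$f")
    size=$((size))                      # normalise any padding in wc output
    printf '%s\t%s\n' "$size" "$f"
  done | sort -rn | head -n "$count"
}
```

For example, `largest_event_dbs ~/.agenthub/agent-events 5` prints the five largest event databases, largest first.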

Slow Query Performance

Causes:

Large event databases without cleanup
Missing indexes (should be automatic)
Concurrent cleanup operations

Solutions:

Regular cleanup via event_retention_days
Schedule maintenance window for VACUUM
Monitor with: sqlite3 agent.db "PRAGMA integrity_check;"

Notification Issues

Push Notifications Not Received

Diagnostic Steps:

Check browser notification permission
Verify VAPID keys exist:
```
ls -l ~/.agenthub/vapid.json  # confirms the file exists without printing key material
```
Check subscription status in UI
Verify subject is configured

Solutions:

Re-subscribe in browser
Rotate VAPID keys if corrupted
Serve AgentHub over HTTPS in production; browsers only allow push subscriptions in secure contexts

Database Issues

SQLite Lock/Timeout Errors

Symptoms: "database is locked" errors in logs

Solutions:

Reduce concurrent operations
Check for long-running queries
Ensure proper connection pooling

Database Corruption

Symptoms: SQLite integrity check failures

Recovery:

# Backup first
cp ~/.agenthub/agenthub.db ~/.agenthub/agenthub.db.backup

# Check integrity
sqlite3 ~/.agenthub/agenthub.db "PRAGMA integrity_check;"

# For event databases
sqlite3 ~/.agenthub/agent-events/{agent_id}.db "PRAGMA integrity_check;"
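
A plain `cp` of the `.db` file can miss recent writes if the database runs in WAL mode, because SQLite then keeps them in `-wal` and `-shm` files next to the database. Whether AgentHub uses WAL mode is an assumption here, so the sketch below simply copies those side files when they exist. `backup_sqlite_db` is a hypothetical helper; stop AgentHub first so the files are not changing during the copy.

```shell
# Hypothetical helper: copy a SQLite database plus any -wal/-shm side files
# into a timestamped backup directory and print that directory's path.
# Usage: backup_sqlite_db DB_FILE [DEST_PARENT_DIR]
backup_sqlite_db() {
  db=$1
  dest="${2:-$(dirname "$db")}/backup-$(date +%Y%m%d-%H%M%S)"
  [ -f "$db" ] || { echo "missing: $db" >&2; return 1; }
  mkdir -p "$dest" || return 1
  for f in "$db" "$db-wal" "$db-shm"; do
    if [ -f "$f" ]; then
      cp -p "$f" "$dest/" || return 1
    fi
  done
  echo "$dest"
}
```

For example, `backup_sqlite_db ~/.agenthub/agenthub.db` creates a `backup-<timestamp>` directory under `~/.agenthub` and prints its path. Run the integrity checks against the original afterwards, keeping the copy untouched.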

Getting Help

When reporting issues, include:

AgentHub version: agenthub --version
Config (sanitized): cat ~/.agenthub/config.toml
Logs at debug level:
```
# Add to config
[logging]
level = "debug"
```
System info:
```
uname -a
df -h
free -h
```
Reproduction steps
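
Sanitizing the config by hand is easy to get wrong, so a filter that masks secret-looking values before you paste the file can help. This is a sketch under assumptions: `redact_config` is a hypothetical helper, and the key patterns it masks (`secret`, `password`, `token`) are guesses that you should extend to match your own config. Always review the output before sharing it.

```shell
# Hypothetical helper: print a config file with secret-looking values masked.
# Any key whose name contains secret, password, or token gets its value replaced.
redact_config() {
  sed -E 's/^([[:space:]]*[A-Za-z0-9_.-]*(secret|password|token)[A-Za-z0-9_.-]*[[:space:]]*=[[:space:]]*).*/\1"<redacted>"/' "$1"
}
```

Usage: `redact_config ~/.agenthub/config.toml`. A line such as `shared_secret = "..."` is printed as `shared_secret = "<redacted>"`, while other lines pass through unchanged.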

Recovery Strategy

For serious issues:

Keep evidence: Don't delete failed sessions
Create fresh environment: New workdir/worktree
Isolate variables: Test with minimal config
Incremental changes: Add complexity gradually
Monitor: Watch logs during recovery
