Grafana-Style Terminal Dashboard
Overview
The agent metrics grafana command provides a comprehensive, Grafana-inspired dashboard directly in your terminal. View real-time metrics, performance indicators, and system health without leaving the command line.
No External Dependencies
The terminal dashboard is completely self-contained. No Grafana installation required!
Features
🎨 Rich Visualization
- Tables: Formatted data with headers and borders
- Progress Bars: Visual representation of resource usage
- Color Coding: Green/yellow/red indicators for status
- Sections: Organized layout with clear separators
- Summary Box: At-a-glance statistics
Real-Time Updates
- Watch Mode: Auto-refresh at configurable intervals
- Live Metrics: See changes as tasks execute
- Screen Clearing: Clean updates without scrolling
Comprehensive Metrics
- Agent information and build details
- System resources (memory, goroutines)
- Task execution statistics
- Performance metrics (P50, P99 latencies)
- gRPC request statistics
- Error tracking
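Quick Start
Basic Usage
View dashboard once:
./sloth-runner agent metrics grafana my-agent
Watch Mode
Auto-refresh every 5 seconds (default):
./sloth-runner agent metrics grafana my-agent --watch
Custom refresh interval (e.g., every 2 seconds):
./sloth-runner agent metrics grafana my-agent --watch --interval 2
Stop watching: Press Ctrl+C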
Dashboard Sections
1. 🔧 Agent Information
Displays agent metadata and configuration.
Fields:
Field | Description | Example |
---|---|---|
Version | Agent build version | v1.2.3, dev |
OS | Operating system | linux, darwin |
Architecture | CPU architecture | arm64, amd64 |
Uptime | Time since agent start | 2h 34m, 5d 12h 45m |
Last Updated | Metrics snapshot timestamp | 2025-10-05 15:42:30 |
Example Output:
🔧 Agent Information
┌──────────────┬─────────────────────┐
│ Version      │ v1.2.3              │
│ OS           │ linux               │
│ Architecture │ arm64               │
│ Uptime       │ 2h 34m              │
│ Last Updated │ 2025-10-05 15:42:30 │
└──────────────┴─────────────────────┘
2. 💻 System Resources
Visual progress bars showing resource utilization.
Metrics:
Goroutines
- Current count vs. threshold (default: 1000)
- Color coding:
  - 🟢 Green: < 60% (healthy)
  - 🟡 Yellow: 60-80% (moderate)
  - 🔴 Red: > 80% (high)
Memory (MB)
- Allocated memory in megabytes
- Threshold: 512MB default
- Same color coding as goroutines
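The color bands above reduce to a simple percentage check. A minimal sketch in POSIX shell with awk, assuming a value of exactly 80% still counts as yellow (the list does not say on which side the boundaries fall); usage_color is a hypothetical helper, not part of sloth-runner:

```shell
# usage_color CURRENT THRESHOLD
# Hypothetical helper: maps utilization to the color band used by the resource bars.
# Assumption: < 60% is green, 60-80% inclusive is yellow, anything above 80% is red.
usage_color() {
    awk -v cur="$1" -v max="$2" 'BEGIN {
        pct = cur * 100 / max
        if (pct < 60)       print "green"
        else if (pct <= 80) print "yellow"
        else                print "red"
    }'
}

usage_color 342 1000   # 34.2% of the goroutine threshold, prints "green"
usage_color 412 512    # about 80.5% of the memory threshold, prints "red"
```

The threshold is passed in rather than hard-coded because the defaults (1000 goroutines, 512 MB) differ per metric.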
Example Output:
💻 System Resources
Goroutines:  [██████░░░░░░░░░░░░░░] 342/1000 (34.2%)
Memory (MB): [███░░░░░░░░░░░░░░░░░] 78/512 (15.2%)
Interpretation:
- Green bars: System is healthy
- Yellow bars: Monitor closely, may need attention
- Red bars: Resource pressure, investigate
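To make the bar format concrete, here is a rough sketch of how such a line could be rendered. It assumes a 20-character bar by default, █ for the filled part and ░ for the remainder, and a bar that is capped once the threshold is exceeded. render_bar is a hypothetical helper; the dashboard's real rendering is done by the pterm library and may differ:

```shell
# render_bar LABEL CURRENT THRESHOLD [WIDTH]
# Hypothetical helper: prints "LABEL: [bar] current/threshold (percent%)".
render_bar() {
    awk -v label="$1" -v cur="$2" -v max="$3" -v width="${4:-20}" 'BEGIN {
        filled = int(cur * width / max)
        if (filled > width) filled = width      # cap bars that exceed the threshold
        bar = ""
        for (i = 0; i < width; i++)
            bar = bar (i < filled ? "█" : "░")
        printf "%s: [%s] %d/%d (%.1f%%)\n", label, bar, cur, max, cur * 100 / max
    }'
}

render_bar "Running Tasks" 2 10
```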
3. Task Metrics
Summary of task execution results.
Status Table:
Status | Icon | Color | Description |
---|---|---|---|
Success | ✓ | Green | Tasks completed successfully |
Failed | ✗ | Red | Tasks that failed |
Skipped | ⊘ | Yellow | Tasks skipped (conditions not met) |
Running Tasks Bar:
- Current concurrent tasks
- Threshold: 10 (configurable)
Example Output:
Task Metrics
┌───────────┬───────┐
│ Status    │ Count │
├───────────┼───────┤
│ ✓ Success │ 145   │
│ ✗ Failed  │ 3     │
│ ⊘ Skipped │ 12    │
└───────────┴───────┘
Running Tasks: [████░░░░░░░░░░░░░░░░] 2/10 (20.0%)
Use Cases:
- Quick health check: Low failure rate is good
- Capacity monitoring: High running tasks may indicate bottleneck
- Audit trail: Total tasks executed
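The quick health check comes down to the share of failed tasks. A small sketch, assuming the failure rate is computed over all counted tasks including skipped ones (the dashboard itself does not print this figure); failure_rate is a hypothetical helper:

```shell
# failure_rate SUCCESS FAILED SKIPPED
# Hypothetical helper: percentage of failed tasks among all counted tasks.
failure_rate() {
    awk -v ok="$1" -v failed="$2" -v skipped="$3" 'BEGIN {
        total = ok + failed + skipped
        if (total == 0) { print "0.0"; exit }   # no tasks yet, avoid dividing by zero
        printf "%.1f\n", failed * 100 / total
    }'
}

failure_rate 145 3 12   # counts from the example output: 3 of 160 tasks, prints 1.9
```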
4. ⏱️ Task Performance
Detailed latency metrics for executed tasks.
Columns:
Column | Description |
---|---|
Task | Task name from .sloth file |
P50 (ms) | Median execution time (50th percentile) |
P99 (ms) | 99th percentile latency |
Status | Performance indicator |
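For readers unfamiliar with percentiles, the sketch below computes P50 and P99 from raw durations using the nearest-rank method. It is only an illustration: how the agent actually derives these values is not described here, and percentile is a hypothetical helper.

```shell
# percentile P < durations (one value in ms per line)
# Nearest-rank method: the value at position ceil(P/100 * N) in the sorted list.
percentile() {
    sort -n | awk -v p="$1" '
        { values[NR] = $1 }
        END {
            if (NR == 0) exit
            rank = p * NR / 100
            idx = int(rank)
            if (idx < rank) idx++        # round up to the next whole rank
            if (idx < 1) idx = 1
            print values[idx]
        }'
}

printf '%s\n' 12 15 11 240 14 | percentile 50   # prints 14
printf '%s\n' 12 15 11 240 14 | percentile 99   # prints 240
```

The single 240 ms outlier leaves P50 untouched but dominates P99, which is why the Status column is based on P99.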
Performance Indicators:
Indicator | Criteria | Meaning |
---|---|---|
🟢 Fast | P99 < 1000ms | Excellent performance |
🟡 Normal | 1000ms <= P99 < 5000ms | Acceptable performance |
🔴 Slow | P99 >= 5000ms | Needs optimization |
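The criteria in the table translate directly into a threshold check on P99. A minimal sketch: the thresholds are taken from the table, while perf_status is a hypothetical helper and not the agent's own implementation.

```shell
# perf_status P99_MS
# Hypothetical helper: maps a P99 latency in milliseconds to the Status label.
perf_status() {
    awk -v p99="$1" 'BEGIN {
        if (p99 < 1000)      print "Fast"
        else if (p99 < 5000) print "Normal"
        else                 print "Slow"
    }'
}

perf_status 45.67     # prints Fast
perf_status 5678.90   # prints Slow
```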
Example Output:
⏱️ Task Performance
┌──────────────────┬──────────┬──────────┬───────────┐
│ Task             │ P50 (ms) │ P99 (ms) │ Status    │
├──────────────────┼──────────┼──────────┼───────────┤
│ install_packages │ 234.56   │ 567.89   │ 🟢 Fast   │
│ check_service    │ 12.34    │ 45.67    │ 🟢 Fast   │
│ deploy_app       │ 1234.56  │ 5678.90  │ 🔴 Slow   │
└──────────────────┴──────────┴──────────┴───────────┘
Action Items:
- 🟢 Fast tasks: No action needed
- 🟡 Normal tasks: Monitor trends
- 🔴 Slow tasks: Investigate and optimize
5. gRPC Metrics
Statistics for master-agent communication.
Columns:
Column | Description |
---|---|
Method | gRPC method name |
Requests | Total requests for this method |
Avg Latency (ms) | Median (P50) latency in milliseconds, shown under the Avg heading |
Common Methods:
Method | Description |
---|---|
ExecuteTask | Task execution requests |
ExecuteCommand | Direct command execution |
GetAgentInfo | Agent info queries |
RegisterAgent | Agent registration |
Example Output:
gRPC Metrics
┌────────────────┬──────────┬──────────────────┐
│ Method         │ Requests │ Avg Latency (ms) │
├────────────────┼──────────┼──────────────────┤
│ ExecuteTask    │ 156      │ 234.56           │
│ ExecuteCommand │ 45       │ 12.34            │
└────────────────┴──────────┴──────────────────┘
Interpretation:
- High request count: Agent is actively used
- High latency: Network or master performance issues
- Low latency (<50ms): Excellent connectivity
6. ⚠️ Errors
Error tracking by type (only shown if errors exist).
Columns:
Column | Description |
---|---|
Error Type | Category of error |
Count | Number of occurrences (red) |
Common Error Types:
Type | Description |
---|---|
task_execution | Errors during task execution |
grpc_timeout | gRPC request timeouts |
module_error | Module-specific errors |
network_error | Network connectivity issues |
Example Output:
⚠️ Errors
┌────────────────┬───────┐
│ Error Type     │ Count │
├────────────────┼───────┤
│ task_execution │ 12    │
│ grpc_timeout   │ 3     │
│ module_error   │ 5     │
└────────────────┴───────┘
Action Items:
- Investigate errors with highest counts
- Check logs for error details
- Review failing tasks in Task Metrics section
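One way to act on the first item is to rank the Errors table by count. The sketch below reads dashboard output from standard input. It assumes the table uses │ as the column separator and that piped output carries no ANSI color codes, neither of which is confirmed here. top_errors is a hypothetical helper.

```shell
# top_errors < dashboard output
# Prints "COUNT TYPE" for each row of the Errors table, highest count first.
top_errors() {
    awk -F'│' '
        /Errors/        { in_table = 1; next }   # heading line of the Errors section
        in_table && /└/ { in_table = 0 }         # bottom border ends the table
        in_table && NF >= 3 && $3 + 0 > 0 {
            gsub(/ /, "", $2); gsub(/ /, "", $3)
            print $3, $2
        }' | sort -rn
}

# Usage, assuming the agent is named my-agent:
#   ./sloth-runner agent metrics grafana my-agent | top_errors
```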
7. Summary
Consolidated overview in a highlighted box.
Metrics:
Metric | Description | Color |
---|---|---|
Total Tasks | All tasks executed | Cyan |
Running | Currently executing | Yellow |
Memory | Current allocation (MB) | Green |
Goroutines | Active goroutines | Magenta |
Example Output:
╔═════════════════════════════════════════════════╗
║ Summary                                         ║
╠═════════════════════════════════════════════════╣
║ Total Tasks: 160 | Running: 2 | Memory: 78 MB | ║
║ Goroutines: 342                                 ║
╚═════════════════════════════════════════════════╝
Use Cases
Development Workflow
Monitor tasks during development:
You're developing a deployment script and want to see metrics in real-time.
- See task counts increment
- Monitor latency changes
- Catch errors immediately
Performance Tuning
Identify bottlenecks:
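One way to do this from the command line is to filter the Task Performance table for rows whose P99 is at or above the 5000 ms Slow threshold. The sketch assumes │ column separators, a four-column table, and uncolored output when piped; slow_tasks is a hypothetical helper.

```shell
# slow_tasks < dashboard output
# Prints the name and P99 of every task at or above the Slow threshold (5000 ms).
slow_tasks() {
    awk -F'│' '
        # Four-column rows split into six fields (the outer borders add empty ones),
        # which keeps the three-column gRPC table out of the result.
        NF == 6 && $4 + 0 >= 5000 {
            gsub(/^ +| +$/, "", $2)     # trim padding around the task name
            gsub(/ /, "", $4)
            print $2, $4
        }'
}

# Usage, assuming the agent is named my-agent:
#   ./sloth-runner agent metrics grafana my-agent | slow_tasks
```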
Production Monitoring
Quick health checks:
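A health check like this can also be scripted. The sketch below reads the Failed count from the Task Metrics table and returns a non-zero status when it exceeds a limit. The │ separator, the uncolored piped output, and the limit in the usage line are assumptions; failed_count and check_health are hypothetical helpers.

```shell
# failed_count < dashboard output
# Prints the number in the Failed row of the Task Metrics table (0 if absent).
failed_count() {
    awk -F'│' '
        $2 ~ /Failed/ {
            gsub(/[ ,]/, "", $3)        # drop padding and thousands separators
            print $3
            found = 1
            exit
        }
        END { if (!found) print 0 }'
}

# check_health MAX_FAILED < dashboard output
# Succeeds only when at most MAX_FAILED tasks have failed.
check_health() {
    hc_failed=$(failed_count)
    [ "$hc_failed" -le "$1" ]
}

# Usage, assuming the agent is named my-agent:
#   ./sloth-runner agent metrics grafana my-agent | check_health 10 || echo "too many failures"
```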
Capacity Planning
Determine if you need more agents:
Use the resource bars to decide whether the current agent fleet is sufficient.
- 🔴 Red resource bars: Add more agents
- 🟡 Yellow consistently: Monitor closely
- 🟢 Green: Current capacity is good
Advanced Features
Watch Mode
Continuous monitoring with auto-refresh:
# Refresh every 5 seconds (default)
./sloth-runner agent metrics grafana my-agent --watch
# Fast refresh (1 second) for development
./sloth-runner agent metrics grafana my-agent --watch --interval 1
# Slow refresh (30 seconds) for overview
./sloth-runner agent metrics grafana my-agent --watch --interval 30
Features:
- Clears screen between updates for clean display
- Press Ctrl+C to stop
- Ideal for monitoring during task execution
Comparison
Compare metrics across multiple agents:
#!/bin/bash
# compare-agents.sh: show the dashboard for each agent in turn
agents=("agent1" "agent2" "agent3")
for agent in "${agents[@]}"; do
    echo "========================================="
    echo "Agent: $agent"
    echo "========================================="
    # Quote the name so it is passed to the command as a single argument
    ./sloth-runner agent metrics grafana "$agent"
    echo ""
    read -r -p "Press Enter for next agent..."
done
Scripting
Extract specific metrics for automation:
# Get current running tasks
./sloth-runner agent metrics grafana my-agent | grep "Running Tasks"
# Check for errors
./sloth-runner agent metrics grafana my-agent | grep -A10 "⚠️ Errors"
# Extract memory usage
./sloth-runner agent metrics grafana my-agent | grep "Memory (MB)"
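grep returns the whole line; to get numbers a script can compare, the progress-bar lines can be split further. A sketch assuming the "LABEL: [bar] current/threshold (percent%)" layout shown in the examples and uncolored output when piped; bar_values is a hypothetical helper.

```shell
# bar_values LABEL < dashboard output
# Prints "CURRENT THRESHOLD" for the progress-bar line that contains LABEL.
bar_values() {
    awk -v label="$1" '
        index($0, label) > 0 {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9]+\/[0-9]+$/) {      # the "78/512" token
                    split($i, parts, "/")
                    print parts[1], parts[2]
                }
        }'
}

# Usage, assuming the agent is named my-agent:
#   ./sloth-runner agent metrics grafana my-agent | bar_values "Memory (MB)"
```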
Color Reference
Status Colors
Color | Hex | Usage |
---|---|---|
🟢 Green | #4CAF50 | Success, healthy, fast |
🟡 Yellow | #FFC107 | Warning, moderate, skipped |
🔴 Red | #F44336 | Error, high, slow |
🔵 Cyan | #00BCD4 | Information, totals |
🟣 Magenta | #9C27B0 | Secondary metrics |
Visual Indicators
Symbol | Meaning |
---|---|
✓ | Success |
✗ | Failure |
⊘ | Skipped |
🟢 | Fast/Healthy |
🟡 | Normal/Warning |
🔴 | Slow/Critical |
Troubleshooting
Dashboard Shows "No Data"
Symptoms: All metrics are zero or empty
Causes:
1. Agent just started (no tasks executed yet)
2. Telemetry disabled
3. Metrics endpoint unreachable
Solutions:
# Check if agent has telemetry enabled
./sloth-runner agent list
# Verify metrics endpoint
./sloth-runner agent metrics prom my-agent --snapshot
# Execute a test task to generate metrics
./sloth-runner agent run my-agent "echo test"
# Try dashboard again
./sloth-runner agent metrics grafana my-agent
Connection Refused
Symptoms: "Failed to fetch metrics: connection refused"
Causes:
1. Agent is down
2. Metrics port is blocked
3. Wrong agent name
Solutions:
# Verify agent is running
./sloth-runner agent list
# Check metrics endpoint (replace agent-ip with the agent's address)
curl http://agent-ip:9090/health
# Check that the metrics port is reachable through the firewall
telnet agent-ip 9090
Incomplete Dashboard
Symptoms: Some sections missing
Causes:
1. No data for that metric category (e.g., no errors = no Errors section)
2. Old agent version without all metrics
Solutions:
- This is usually normal: sections only appear when data exists.
- Errors section: Only shown when errors > 0
- Task Performance: Only shown when tasks have been executed
Watch Mode Not Updating
Symptoms: Dashboard frozen in watch mode
Causes:
1. Terminal doesn't support ANSI escape codes
2. Very long refresh interval
Solutions:
# Use shorter interval
./sloth-runner agent metrics grafana my-agent --watch --interval 2
# Try different terminal
# (e.g., iTerm2, modern Terminal.app, Windows Terminal)
# Fallback: Run without watch mode
./sloth-runner agent metrics grafana my-agent
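If watch mode does not redraw correctly in your terminal, a plain polling loop avoids screen clearing altogether. A minimal sketch; poll_dashboard is a hypothetical helper, and printing each snapshot below the previous one instead of clearing the screen is a deliberate simplification:

```shell
# poll_dashboard COUNT INTERVAL COMMAND [ARGS...]
# Runs COMMAND COUNT times, sleeping INTERVAL seconds between runs.
poll_dashboard() {
    pd_count=$1
    pd_interval=$2
    shift 2
    pd_i=0
    while [ "$pd_i" -lt "$pd_count" ]; do
        "$@"
        pd_i=$((pd_i + 1))
        if [ "$pd_i" -lt "$pd_count" ]; then
            sleep "$pd_interval"
        fi
    done
}

# Usage, assuming the agent is named my-agent: ten snapshots, two seconds apart
#   poll_dashboard 10 2 ./sloth-runner agent metrics grafana my-agent
```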
Best Practices
Refresh Intervals
Use Case | Recommended Interval |
---|---|
Active development | 1-2 seconds |
Task execution monitoring | 5 seconds (default) |
Background monitoring | 10-30 seconds |
Overview checks | Single run (no watch) |
When to Use
✓ Use Dashboard For:
- Quick health checks
- Real-time task monitoring
- Performance troubleshooting
- Development feedback
✗ Don't Use Dashboard For:
- Historical analysis (use Grafana web UI)
- Alerting (use Prometheus alerts)
- Long-term trends (use time-series visualization)
- Multi-agent comparison (the dashboard shows one agent at a time, so it must be run once per agent)
Complementary Tools
Tool | When to Use |
---|---|
Terminal Dashboard | Quick checks, development |
Prometheus | Historical queries, alerting |
Grafana Web UI | Long-term trends, dashboards |
agent metrics prom | Get endpoint URL, raw metrics |
Examples
Example 1: Healthy Agent
Sloth Runner Metrics Dashboard - Agent: production-1
🔧 Agent Information
┌──────────────┬─────────────────────┐
│ Version      │ v1.2.3              │
│ OS           │ linux               │
│ Architecture │ amd64               │
│ Uptime       │ 7d 14h 23m          │
│ Last Updated │ 2025-10-05 10:30:15 │
└──────────────┴─────────────────────┘
💻 System Resources
Goroutines:  [██░░░░░░░░░░░░░░░░░░] 125/1000 (12.5%)
Memory (MB): [█░░░░░░░░░░░░░░░░░░░] 45/512 (8.8%)
Task Metrics
┌───────────┬───────┐
│ Status    │ Count │
├───────────┼───────┤
│ ✓ Success │ 1,234 │
│ ✗ Failed  │ 5     │
└───────────┴───────┘
Running Tasks: [░░░░░░░░░░░░░░░░░░░░] 0/10 (0.0%)
⏱️ Task Performance
┌──────────────┬──────────┬──────────┬───────────┐
│ Task         │ P50 (ms) │ P99 (ms) │ Status    │
├──────────────┼──────────┼──────────┼───────────┤
│ health_check │ 5.23     │ 12.45    │ 🟢 Fast   │
│ deploy       │ 456.78   │ 892.34   │ 🟢 Fast   │
└──────────────┴──────────┴──────────┴───────────┘
╔═══════════════════════════════════╗
║ Summary                           ║
╠═══════════════════════════════════╣
║ Total Tasks: 1,239 | Running: 0 | ║
║ Memory: 45 MB | Goroutines: 125   ║
╚═══════════════════════════════════╝
Example 2: Agent Under Load
Sloth Runner Metrics Dashboard - Agent: worker-3
💻 System Resources
Goroutines:  [█████████████████░░░] 857/1000 (85.7%)
Memory (MB): [████████████████░░░░] 412/512 (80.5%)
Task Metrics
Running Tasks: [████████████████░░░░] 8/10 (80.0%)
⏱️ Task Performance
┌────────────┬──────────┬──────────┬───────────┐
│ Task       │ P50 (ms) │ P99 (ms) │ Status    │
├────────────┼──────────┼──────────┼───────────┤
│ big_deploy │ 3456.78  │ 8932.12  │ 🔴 Slow   │
└────────────┴──────────┴──────────┴───────────┘
⚠️ Errors
┌──────────────┬───────┐
│ Error Type   │ Count │
├──────────────┼───────┤
│ task_timeout │ 23    │
└──────────────┴───────┘
Interpretation: This agent is under heavy load. Consider:
- Reducing concurrent tasks
- Optimizing slow tasks
- Adding more agents to distribute load
- Investigating task timeouts
Next Steps
- Prometheus Metrics Reference - Detailed metric documentation
- Deployment Guide - Set up production monitoring
- Telemetry Overview - Back to overview
Further Reading
- pterm Library Documentation - Terminal visualization library used
- Prometheus Best Practices - Metric naming and usage