AgentTel

Agent-Ready Telemetry

Enrich OpenTelemetry spans — backend and frontend — with the structured context AI agents need to autonomously diagnose and resolve production incidents.


What is AgentTel?

Standard observability answers "What happened?" AgentTel adds "What does an AI agent need to know to act on this?"

Modern observability tools generate massive volumes of telemetry — traces, metrics, logs — optimized for human consumption through dashboards and alert rules. AI agents tasked with autonomous incident response face critical gaps:

No behavioral context — Spans lack baselines, so agents can't distinguish normal from anomalous
No topology awareness — Agents don't know which services are critical, who owns them, or what depends on what
No decision metadata — Is this operation retryable? Is there a fallback? What's the runbook?
No actionable interface — Agents can read telemetry but can't query live system state or execute remediation

AgentTel closes these gaps at the instrumentation layer — enriching every span across the full stack (JVM, Go, Node.js, Python backends and browser frontends) with baselines, topology, and decision metadata so AI agents can reason and act autonomously.

What changes on a span:

Aspect	Standard OTel Span	With AgentTel	Why It Matters
Identity	`http.method=POST`, `http.route=/api/payments`	+ `topology.team=payments-platform`, `tier=critical`	Agent knows who owns this, how critical it is, and who to page
Baselines	(none)	+ `baseline.latency_p50_ms=45`, `baseline.error_rate=0.001`	Agent can tell if current behavior is normal or anomalous
Decisions	(none)	+ `decision.retryable=true`, `decision.runbook_url=...`	Agent knows what it's allowed to do without asking a human
Anomaly	(none)	+ `anomaly.detected=true`, `anomaly.score=0.92`	Agent gets alerted in real time, not after a threshold breach
Causality	(none)	+ `cause.category=dependency`, `cause.hint=stripe-api timeout`	Agent starts from a likely root cause instead of investigating from scratch
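
To make the table concrete, the sketch below merges static context onto a plain attribute dictionary. It is illustrative only: it does not call the AgentTel or OpenTelemetry APIs, the `enrich` helper is hypothetical, and the attribute keys simply reuse the abbreviated names from the table.

```python
from typing import Any, Dict

# Hypothetical per-operation context, keyed by the abbreviated names in the table above.
TOPOLOGY = {"topology.team": "payments-platform", "tier": "critical"}
BASELINES = {"baseline.latency_p50_ms": 45, "baseline.error_rate": 0.001}
DECISIONS = {"decision.retryable": True}


def enrich(span_attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the span attributes with agent-facing context added.

    Keys already on the span win, so enrichment never overwrites what the
    instrumentation recorded.
    """
    enriched = {**TOPOLOGY, **BASELINES, **DECISIONS}
    enriched.update(span_attributes)
    return enriched


standard_span = {"http.method": "POST", "http.route": "/api/payments"}
enriched_span = enrich(standard_span)
```

Here `enriched_span` keeps the two original HTTP attributes and gains five context attributes, while `standard_span` is left unchanged.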

How It Works

AgentTel enriches telemetry at six levels across the full stack. Each level is configurable via YAML or code, with no manual instrumentation required:

%%{init: {'theme': 'base', 'themeVariables': {'lineColor': '#6366f1'}}}%%
graph LR
    B1["Your Backend<br/>(JVM / Go / Node.js / Python)"] --> AT1["AgentTel SDK"]
    B2["Your Frontend<br/>(Browser)"] --> AT2["AgentTel Web SDK"]
    AT1 --> C["OpenTelemetry SDK"]
    AT2 --> C
    C --> D["OTel Collector / Backend"]
    D --> E["AI Agent"]

    AT1 -->|"Topology + Baselines<br/>+ Decisions"| C
    AT2 -->|"Journeys + Anomalies<br/>+ Correlation"| C
    E -->|"MCP Tools<br/>(15 tools)"| AT1
    B2 -->|"W3C Trace Context"| B1

    style B1 fill:#a78bfa,stroke:#7c3aed,color:#1e1b4b
    style B2 fill:#a78bfa,stroke:#7c3aed,color:#1e1b4b
    style AT1 fill:#a78bfa,stroke:#7c3aed,color:#1e1b4b
    style AT2 fill:#a78bfa,stroke:#7c3aed,color:#1e1b4b
    style C fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style D fill:#a5b4fc,stroke:#6366f1,color:#1e1b4b
    style E fill:#818cf8,stroke:#6366f1,color:#1e1b4b
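
The frontend-to-backend edge in the diagram relies on W3C Trace Context, which carries trace identity in a `traceparent` HTTP header. The sketch below generates and parses that header with the standard library only. The function and type names are hypothetical and are not part of any AgentTel SDK; only the header layout comes from the W3C specification.

```python
import re
import secrets
from typing import NamedTuple, Optional

# traceparent layout (version 00) per the W3C Trace Context spec:
# version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" trace-flags (2 hex)
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def new_traceparent(sampled: bool = True) -> str:
    """Build a header value for a new root span, as a browser SDK might send it."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse a header value; return None if it is malformed or uses all-zero IDs."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # the spec treats all-zero IDs as invalid
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

A backend that receives the header can reuse `trace_id` for its own spans, which is what lets frontend and backend spans appear in the same trace.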

Level	Where	What It Adds	Example
Topology	OTel Resource (once per service)	Service identity, ownership, dependencies	team, tier, on-call channel
Baselines	Span attributes (per operation)	What "normal" looks like — backend and frontend	P50/P99 latency, error rate, page load time
Decisions	Span attributes (per operation)	What an agent is allowed to do	retryable, runbook URL, escalation level
Journeys	Frontend spans (per user flow)	Multi-step funnel tracking	checkout completion rate, step abandonment
Anomalies	Both backend and frontend spans	Real-time deviation detection	z-score spikes, rage clicks, error loops
Correlation	Cross-stack span linking	Frontend-to-backend trace linking	W3C Trace Context, backend trace IDs
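
The "z-score spikes" in the Anomalies row can be pictured as follows. This is a minimal sketch, not AgentTel's detector: it assumes the baseline stores a mean and standard deviation (the table only mentions P50/P99 and error rate), and the threshold of 3.0 and all names are chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LatencyBaseline:
    """Assumed baseline shape: mean and standard deviation of latency in ms."""
    mean_ms: float
    stddev_ms: float


def z_score(observed_ms: float, baseline: LatencyBaseline) -> float:
    """Number of standard deviations the observation sits from the baseline mean."""
    if baseline.stddev_ms <= 0:
        return 0.0  # no spread recorded, so no meaningful score
    return (observed_ms - baseline.mean_ms) / baseline.stddev_ms


def is_anomalous(observed_ms: float, baseline: LatencyBaseline, threshold: float = 3.0) -> bool:
    """Flag observations whose absolute z-score exceeds the assumed threshold."""
    return abs(z_score(observed_ms, baseline)) > threshold


# Hypothetical baseline for an operation that normally takes about 45 ms.
baseline = LatencyBaseline(mean_ms=45.0, stddev_ms=10.0)
```

With this baseline, a 312 ms request scores well above the threshold and is flagged, while a 50 ms request is not.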

Module Architecture

%%{init: {'theme': 'base', 'themeVariables': {'lineColor': '#6366f1'}}}%%
graph TB
    subgraph App["Your Application"]
        YML["application.yml / agenttel.yml"]
        ANN["@AgentOperation (optional)"]
    end

    subgraph Frontend["Frontend"]
        WEB["agenttel-web<br/><small>Browser SDK (TypeScript)<br/>Auto-instrumentation, Journeys,<br/>Anomaly Detection, Correlation</small>"]
    end

    subgraph Integration["Integration Layer (JVM)"]
        SBS["agenttel-spring-boot-starter<br/><small>Auto-config, BPP, AOP</small>"]
        JAE["agenttel-javaagent<br/><small>Zero-code OTel extension</small>"]
    end

    subgraph MultiLang["Multi-Language SDKs"]
        GOSDK["agenttel-go<br/><small>Go SDK — net/http, Gin, gRPC<br/>Baselines, Anomaly, SLO, GenAI</small>"]
        NODESDK["agenttel-node<br/><small>Node.js SDK — Express, Fastify<br/>Baselines, Anomaly, SLO, GenAI</small>"]
        PYSDK["agenttel-python<br/><small>Python SDK — FastAPI<br/>Baselines, Anomaly, SLO, GenAI</small>"]
    end

    subgraph Core["Core Libraries (JVM)"]
        COR["agenttel-core<br/><small>SpanProcessor, Baselines,<br/>Anomaly Detection, SLO Tracking</small>"]
        GEN["agenttel-genai<br/><small>LangChain4j, Spring AI,<br/>Anthropic, OpenAI, Bedrock</small>"]
        AGC["agenttel-agentic<br/><small>Agent Tracing, Orchestration,<br/>Cost, Guardrails, Quality</small>"]
        AGT["agenttel-agent<br/><small>MCP Server, Health, Incidents,<br/>Remediation, Reporting</small>"]
    end

    subgraph Tooling["IDE Tooling"]
        INS["agenttel-instrument<br/><small>MCP Server (Python)<br/>Codebase Analysis, Config Gen,<br/>Validation, Auto-Improvements</small>"]
    end

    subgraph Foundation["Foundation"]
        API["agenttel-api<br/><small>Annotations, Attributes, Models</small>"]
        TYPES["agenttel-types<br/><small>Shared TypeScript types</small>"]
        OTEL["OpenTelemetry SDK"]
    end

    App --> Integration
    App --> MultiLang
    SBS --> COR
    SBS --> GEN
    SBS --> AGC
    SBS --> AGT
    JAE --> COR
    COR --> API
    GEN --> API
    AGC --> API
    AGT --> COR
    GOSDK --> OTEL
    NODESDK --> TYPES
    NODESDK --> OTEL
    PYSDK --> OTEL
    API --> OTEL
    WEB --> TYPES
    WEB --> OTEL
    INS -.->|"generates config"| App

    style App fill:none,stroke:#818cf8,color:#818cf8
    style Frontend fill:none,stroke:#818cf8,color:#818cf8
    style Integration fill:none,stroke:#818cf8,color:#818cf8
    style MultiLang fill:none,stroke:#818cf8,color:#818cf8
    style Core fill:none,stroke:#818cf8,color:#818cf8
    style Tooling fill:none,stroke:#818cf8,color:#818cf8
    style Foundation fill:none,stroke:#818cf8,color:#818cf8
    style SBS fill:#a78bfa,stroke:#7c3aed,color:#1e1b4b
    style JAE fill:#a78bfa,stroke:#7c3aed,color:#1e1b4b
    style WEB fill:#a78bfa,stroke:#7c3aed,color:#1e1b4b
    style GOSDK fill:#a78bfa,stroke:#7c3aed,color:#1e1b4b
    style NODESDK fill:#a78bfa,stroke:#7c3aed,color:#1e1b4b
    style PYSDK fill:#a78bfa,stroke:#7c3aed,color:#1e1b4b
    style INS fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style COR fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style GEN fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style AGC fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style AGT fill:#818cf8,stroke:#6366f1,color:#1e1b4b
    style API fill:#818cf8,stroke:#a5b4fc,color:#1e1b4b
    style TYPES fill:#818cf8,stroke:#a5b4fc,color:#1e1b4b
    style OTEL fill:#818cf8,stroke:#a5b4fc,color:#1e1b4b

What an Agent Sees

When an incident occurs, an AI agent gets structured context via MCP:

=== INCIDENT inc-a3f2b1c4 ===
SEVERITY: HIGH
SUMMARY: POST /api/payments experiencing elevated error rate (5.2%)

## WHAT IS HAPPENING
Error Rate: 5.2% (baseline: 0.1%)
Latency P50: 312ms (baseline: 45ms)
Patterns: ERROR_RATE_SPIKE
Error Breakdown: dependency_timeout=62%, connection_error=31%, unknown=7%
Baseline Confidence: high (1,250 samples)

## WHAT CHANGED
Last Deploy: v2.1.0 at 2025-01-15T14:30:00Z
CHANGE CORRELATION:
  Likely cause: DEPLOYMENT (deploy-v2.1.0) — confidence: 0.85

## WHAT IS AFFECTED
Scope: operation_specific
User-Facing: YES
Affected Deps: stripe-api

## SUGGESTED ACTIONS
  - [HIGH] rollback_deployment: Rollback to previous version (NEEDS APPROVAL)
  - [MEDIUM] toggle_circuit_breaker: Circuit break stripe-api
    Spec: failureThreshold=5, halfOpenAfterMs=30000, successThreshold=3

## PLAYBOOK: error-rate-spike-response
  [1] CHECK: Classify error types → step 2
  [2] DECISION: Mostly dependency errors? → step 3 (yes) / step 4 (no)
  [3] ACTION: Enable circuit breaker → step 5
  [4] ACTION: Rollback deployment (NEEDS APPROVAL) → step 5
  [5] CHECK: Verify error rate decreasing
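
The playbook at the end of the incident context is a small decision graph, so an agent can follow it mechanically. The sketch below encodes the five steps shown above and walks them. The data layout and the `walk` function are assumptions made for illustration, not the format AgentTel returns over MCP.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str, Tuple[int, ...]]

# (kind, description, next steps); DECISION steps list their (yes, no) targets.
PLAYBOOK: Dict[int, Step] = {
    1: ("CHECK", "Classify error types", (2,)),
    2: ("DECISION", "Mostly dependency errors?", (3, 4)),
    3: ("ACTION", "Enable circuit breaker", (5,)),
    4: ("ACTION", "Rollback deployment (NEEDS APPROVAL)", (5,)),
    5: ("CHECK", "Verify error rate decreasing", ()),
}


def walk(playbook: Dict[int, Step], decide: Callable[[str], bool], start: int = 1) -> List[int]:
    """Return the ordered step numbers visited, asking `decide` at each DECISION."""
    path: List[int] = []
    step: Optional[int] = start
    while step is not None:
        if step in path:
            raise ValueError(f"playbook loops back to step {step}")
        kind, description, targets = playbook[step]
        path.append(step)
        if not targets:
            step = None  # terminal step
        elif kind == "DECISION":
            yes, no = targets
            step = yes if decide(description) else no
        else:
            step = targets[0]
    return path


# Error breakdown from the incident above; "mostly" is assumed to mean over 50%.
error_breakdown = {"dependency_timeout": 0.62, "connection_error": 0.31, "unknown": 0.07}
path = walk(PLAYBOOK, lambda question: error_breakdown["dependency_timeout"] > 0.5)
```

For the incident shown, the walk goes through steps 1, 2, 3, and 5, choosing the circuit breaker over the rollback that needs approval.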

Compatibility

Backend (JVM)

Component	Supported Versions
Java	17, 21
OpenTelemetry SDK	1.59.0+
Spring Boot	3.4.x
Spring AI	1.0.0+ (optional)
LangChain4j	1.0.0+ (optional)
Anthropic Java SDK	2.0.0+ (optional)
OpenAI Java SDK	4.0.0+ (optional)
AWS Bedrock SDK	2.30.0+ (optional)

Backend (Go)

Component	Supported Versions
Go	1.22+
OpenTelemetry SDK	1.33.0+
net/http, Gin, gRPC	Latest

Backend (Node.js)

Component	Supported Versions
Node.js	18+
TypeScript	5.0+
OpenTelemetry SDK	1.30.0+
Express, Fastify	Latest

Backend (Python)

Component	Supported Versions
Python	3.11+
FastAPI	0.100+
OpenTelemetry SDK	1.20.0+
Django, Flask	Coming soon

Frontend (Browser)

Component	Supported Versions
TypeScript	4.7+
Modern browsers	Chrome, Firefox, Safari, Edge (ES2020+)

Tooling

Component	Supported Versions
Python (instrument agent)	3.11+
