"Can You See What
Your AI Saw?"

GenAI Endpoint Observability for Detection Engineers

Mika Ayenson, Ph.D. · Detection Engineering @ Elastic

$ whoami_

About Me

Mika Ayenson, Ph.D.

TRaDE Team Lead @ Elastic

▸ Leads Threat Research & Detection Engineering (TRaDE)

▸ 10+ years in security research & cyber experimentation

▸ Automates everything, family-motivated efficiency nerd

        ayenson.com
        github.com/Mikaayenson
        linkedin.com/in/mika-ayenson
        x.com/stryker0x
      

The Landscape Won't Stop Growing

New GenAI tools, agents, and frameworks ship daily. Detection engineering can't keep up with enumeration.

Cursor · Claude Code · GitHub Copilot · ChatGPT · Windsurf · Devin · Gemini Code Assist · Amazon Q · Tabnine · Codeium · Replit Agent · Sourcegraph Cody · JetBrains AI · Aider · Continue · Sweep · OpenHands · Codium · LangChain · LangGraph · CrewAI · AutoGen · OpenAI Agents SDK · Semantic Kernel · DSPy · Instructor · Pydantic AI · Agno · SmolAgents · Mastra · Anthropic Computer Use · Browser Use · Playwright MCP · Stagehand · Browserbase · Steel · AgentQL · Lovable · Bolt.new · v0 · Tempo · Softgen · Firebase Studio · Figma Make · vLLM · Ollama · SGLang · TensorRT-LLM · llama.cpp · LM Studio · Jan · Msty · GPT4All · MemO · Letta · Zep · LangMem · Cognee · Graphiti · Hypergol · Keywords AI · OpenRouter · Portkey · Helicone · LiteLLM · E2B · Daytona · Modal · Runloop · Morph · Cognition · Poolside · Magic · Factory AI · and counting...

190+ tools across 14 categories — and this was last week's count.

Source: Morph AI Coding Agent Dev Tools map · LLMDevs community

REAL WORLD · r/vscode · 2026

"I have no idea which extension it could be"

A VS Code user discovers one of their extensions is silently connecting to livefootballtickets.com. Extensions from Microsoft, Anthropic, GitHub — all trusted publishers. No new installs. No idea which one.

[Screenshot: Reddit post about a suspicious VS Code extension]

[Screenshot: VS Code firewall prompt for livefootballtickets.com]

The detection gap

Users install tools faster than they can audit them. Extensions run with ambient authority. When something goes wrong — they can't even identify the source.

Your Devs Are Running These.
Right Now. On Your Endpoints.

85%

of devs regularly use AI coding tools

22%

of merged code is AI-authored

55%

are "heavy users" (76%+ of coding time)

These tools spawn shells, write files, and make network calls. And your EDR has no idea an AI is driving.

JetBrains Dev Ecosystem 2025 · DX AI-Assisted Engineering Q4 2025 · ACTI Agentic Coding Survey Jan 2026

TELEMETRY INSIGHTS · DNS | START HERE | CHEAPEST TELEMETRY

Top LLM Endpoints by Host Count

Source: Events.dns.question.name · Metric: unique host.id · DNS/network is the fastest path to visibility

Domain	Share of unique hosts
api.anthropic.com	27.77%
chatgpt.com	23.86%
claude.ai	20.61%
api.openai.com	10.63%
gemini.googleapis.com	3.47%
api-inference.huggingface.co	2.17%
copilot.microsoft.com	1.74%
perplexity.ai	1.52%
openai.azure.com	1.30%
others (groq, deepseek, ...)	6.93%
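A minimal sketch of how a ranking like this could be computed from raw DNS events. It assumes ECS-style nested dicts with `dns.question.name` and `host.id`, and it assumes each percentage is a domain's unique-host count divided by the total across all domains (the figures above sum to 100%). The function name is illustrative, not part of any Elastic API.

```python
from collections import defaultdict


def host_share_by_domain(events: list[dict]) -> dict[str, float]:
    """Percent share of unique hosts per queried domain, highest first."""
    hosts_by_domain: dict[str, set] = defaultdict(set)
    for ev in events:
        domain = ev.get("dns", {}).get("question", {}).get("name")
        host_id = ev.get("host", {}).get("id")
        if domain and host_id:
            # A set de-duplicates repeated lookups from the same host.
            hosts_by_domain[domain.lower()].add(host_id)

    total = sum(len(hosts) for hosts in hosts_by_domain.values())
    if total == 0:
        return {}
    shares = {d: 100.0 * len(h) / total for d, h in hosts_by_domain.items()}
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))
```

Counting hosts rather than events keeps one chatty machine from dominating the ranking.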

TELEMETRY INSIGHTS · CODE SIGNATURE

Signed vs Unsigned Binaries → LLM Endpoints

Source: process.code_signature.exists + process.code_signature.status · Metric: unique host.id

98.5% signed + trusted

CODE SIGNATURE · 131 UNIQUE HOSTS

Signed + Trusted: 129 hosts · 98.47%

Unsigned (no signature): 2 hosts · 1.53%

Only 2 out of 131 hosts ran unsigned binaries

Nearly every process connecting to LLM APIs is signed and trusted. An unsigned binary reaching an LLM endpoint is an immediate anomaly worth investigating.

Detection rule: Alert on any process where process.code_signature.exists == false connecting to known LLM domains.
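The rule logic can be sketched outside the SIEM as a simple filter. This is an illustration under assumptions: events are ECS-style nested dicts, the domain list is a small subset of the table above, and a missing `code_signature` block is treated the same as `exists == false`.

```python
# Illustrative subset of LLM domains; extend per policy.
LLM_DOMAINS = {
    "api.anthropic.com",
    "chatgpt.com",
    "claude.ai",
    "api.openai.com",
    "api-inference.huggingface.co",
}


def is_llm_domain(name: str) -> bool:
    """True if the name equals a known LLM domain or is a subdomain of one."""
    name = name.lower().rstrip(".")
    return any(name == d or name.endswith("." + d) for d in LLM_DOMAINS)


def unsigned_llm_lookups(events: list[dict]) -> list[dict]:
    """DNS events from processes without a code signature that query LLM domains."""
    hits = []
    for ev in events:
        signature = ev.get("process", {}).get("code_signature", {})
        question = ev.get("dns", {}).get("question", {}).get("name", "")
        # Assumption: no signature block at all counts as unsigned.
        if not signature.get("exists", False) and is_llm_domain(question):
            hits.append(ev)
    return hits
```

Matching on the suffix with a leading dot avoids flagging look-alike names such as `api.openai.com.evil.net`.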

TELEMETRY INSIGHTS · PROCESS ANCESTRY

Top Processes Spawned by GenAI Tools

Mostly signed, mostly legitimate. The unsigned ones are buried in the noise — and that's the problem.

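[Bar chart: top child processes spawned by GenAI tools, by count. Labeled bars: git, node, bash, python, kubectl. Values shown: 3,400+, 720+, 490+, 400+, 240+, 194, 170+.]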

AVG COMMAND LINE LENGTH

48K

chars · zsh via claude · macOS

Top by OS:
macOS zsh→claude: 48K
Linux bash→cursor: 3K
Windows pwsh→codex: 2.3K

⚠ Caveat

Long cmdlines = indicator, but LLMs generate multi-pipe chains that defeat entropy-based detection
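A small sketch of the features in play, to make the caveat concrete. Character-level Shannon entropy measures how evenly characters are distributed, so a long pipeline built from ordinary words and flags scores like ordinary text; length and pipe-stage count capture the shape that entropy misses. The feature names are illustrative, and the pipe count is deliberately naive.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def cmdline_features(command_line: str) -> dict:
    """Cheap structural features for a command line."""
    return {
        "length": len(command_line),
        "entropy": shannon_entropy(command_line),
        # Naive: counts every "|", ignoring quoting and "||".
        "pipe_stages": command_line.count("|") + 1,
    }
```

Comparing `cmdline_features` for a base64 blob and for a 20-stage `grep | awk | sed` chain of similar length is a quick way to see why an entropy threshold alone separates the first but not the second.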

TELEMETRY INSIGHTS · FILE MODIFICATION

GenAI File Write Activity

What GenAI tools and their descendants write to disk, by unique host.id

Process → extension	Unique hosts
omni → .md	9
go → .md	3
codex → .py	3
claude → .txt	2
claude → .tmp	2
Claude → .json	2
go → .yaml	2
git → .json	2
Python → .jsonl	2

Autonomous file writes

GenAI tools write code, configs, temp files, and data to disk without explicit user action. The subset targeting .json, .yaml, .md configs is a persistence surface.

Notable patterns

▸ omni (Cursor) writes .md across 9 hosts

▸ codex writes .py files directly

▸ claude creates .tmp staging files

▸ copilot-language-server writes to .db

Detection angle: Monitor file writes to AI config paths by unexpected processes.
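One way to express this detection angle, as a sketch only. The path patterns and the allow list of expected writers below are hypothetical examples, not a vetted inventory; a real rule would be built from the config locations your tools actually use.

```python
from fnmatch import fnmatchcase

# Hypothetical watch list of AI tool config locations (illustrative only).
AI_CONFIG_PATTERNS = [
    "*/.cursor/mcp.json",
    "*/.claude/settings.json",
    "*/.claude/*.md",
]

# Hypothetical allow list: processes expected to write their own config.
EXPECTED_WRITERS = {"cursor", "Cursor Helper (Plugin)", "claude"}


def is_unexpected_config_write(event: dict) -> bool:
    """True when a process outside the allow list writes to a watched config path."""
    path = event.get("file", {}).get("path", "")
    writer = event.get("process", {}).get("name", "")
    touches_config = any(fnmatchcase(path, pat) for pat in AI_CONFIG_PATTERNS)
    return touches_config and writer not in EXPECTED_WRITERS
```

This is the same shape as a new_terms rule: the interesting event is a writer you have not seen touching that path before.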

TELEMETRY INSIGHTS · DETECTION SIGNALS

Real Signals from Production Telemetry

Credential Access via GenAI

credentials.db via Python: 15 hosts

credentials via claude: 4 hosts

Cookies via claude: 2 hosts

logins.json via claude.exe: 1 host

azureProfile.json via jq: 4 hosts

Suspicious URLs via GenAI Shell

webhook.site · zsh via claude

api.telegram.org · bash via node

polymarket.com · 30 events via claude

hackerone.com · zsh via claude

Raw IPs: 35.178.x, 52.56.x, 162.243.x

Persistence via GenAI Tool

PlistBuddy → Terminal.plist: 6 hosts

defaults → Terminal.plist: 6 hosts

watchman → watchman.plist: 1 host

powershell → AutoMuteGames.lnk: 1 host

Unusual DNS from GenAI Processes

Cursor Helper → hcaptcha.com

Cursor Helper → supabase.co

Cursor Helper → intercom

Claude → microsoft365.mcp.claude.com

codex → grafana-mcp.osend.io

The Problem

Your alert fired. A suspicious process ran. Good luck figuring out if a human or an LLM did it.

...the problem, again

From the presentation right before this one.

"Find evil in base-rd01-cdrive.E01 and write a comprehensive report..."

THE CORE QUESTION

Intent Attribution Is Broken

Same PID, same user, same command line. Nearly identical telemetry. Looking at raw events, it's extremely difficult to tell them apart.

Threat Model

OWASP MCP Top 10 (2025) · VirusTotal: From Automation to Infection (Feb 2026)

Attack in Action

A poisoned repo file tricks the AI agent into exfiltrating data — all through legitimate-looking process chains.

The entire chain looks like a developer running curl. Without prompt-level telemetry, the poisoned context is invisible.

Current Telemetry

What does endpoint telemetry actually give us when AI tools run commands?

What EDR Sees Today

Visible

▸ Process spawns with ancestry chains

▸ File events: creates, writes, deletes

▸ Network / DNS: cheapest, most actionable

▸ User context: uid, session, root = suspicious

▸ Runtime scrutiny: node, npm, deno spawns

New Observability Opportunities

✗ The prompt that triggered it

✗ Model reasoning for the command

✗ Shell builtins: no child process

✗ MCP tool identity: which server?

✗ Human vs. AI intent boundary

Where Detection Breaks Down

Human or AI?

No field distinguishes "Cursor Agent spawned this" from "user typed it." process.parent.name is all there is. How would you enrich that?

Sandbox bypass

AI agents can self-escalate from sandboxed to unsandboxed execution. How would you even detect that privilege transition?

Cross-session context loss

AI tools are session-based: a new session starts from a clean slate. If an attacker starts a fresh session, your forensic chain breaks. How do you correlate across sessions?

Which MCP server ran that?

MCP servers spawn processes through node or python3. The server identity isn't in process metadata. How do you attribute the tool?

Process Ancestry Chain

Detection Opportunities

Process telemetry gives you more than you'd think. Here's where the signal is.

Why `parent.name` Isn't Enough

Direct child only

// Catches 1 level, misses grandchildren
process where event.type == "start" and
  process.parent.name in ("claude", "cursor", "Cursor Helper (Plugin)") and
  process.name in ("curl", "wget", "nc")

Full ancestry walk

// Walks entire tree, any depth
process where event.type == "start" and
  descendant of [
    process where process.parent.name in ("claude", "cursor", "Cursor Helper (Plugin)")
  ] and
  process.name in ("curl", "wget", "nc")

Source: production Elastic Defend rules
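For intuition, the difference between the two queries can be sketched as a walk over parent links. This is not how Elastic Defend implements `descendant of`; it is a minimal illustration that assumes a dict mapping each process entity ID to its name and parent entity ID (the key names are made up for the example).

```python
from typing import Optional

GENAI_PARENTS = {"claude", "cursor", "Cursor Helper (Plugin)"}


def genai_ancestor(entity_id: str, processes: dict) -> Optional[str]:
    """Walk parent links upward; return the first GenAI ancestor's name, or None."""
    seen = set()
    current = processes.get(entity_id)
    while current is not None:
        parent_id = current.get("parent_entity_id")
        # Stop at the root, on a broken link, or if the links form a loop.
        if parent_id is None or parent_id in seen:
            return None
        seen.add(parent_id)
        parent = processes.get(parent_id)
        if parent is None:
            return None
        if parent["name"] in GENAI_PARENTS:
            return parent["name"]
        current = parent
    return None
```

A `parent.name` check only inspects the first hop of this loop, which is why `claude → zsh → bash → curl` slips past it.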

PRODUCTION ES|QL | THE KEY TECHNIQUE

Entity ID Intersection

// detection-rules: Suspicious Network or Recon Activity via GenAI Descendant
FROM logs-endpoint.events.process-*
| WHERE event.type == "start" AND process.Ext.ancestry IS NOT NULL
| EVAL
    is_genai_spawn = process.parent.name IN ("claude", "cursor", "Cursor Helper", "copilot", "windsurf" /* ... */),
    genai_entity_id = CASE(is_genai_spawn, process.parent.entity_id, NULL)
| INLINE STATS all_genai_entity_ids = VALUES(genai_entity_id) WHERE genai_entity_id IS NOT NULL
| EVAL genai_ancestor_ids = MV_INTERSECTION(all_genai_entity_ids, process.Ext.ancestry)  // ← entity_id ∩ entity_id
| WHERE genai_ancestor_ids IS NOT NULL
    AND NOT is_genai_spawn  // exclude direct children (grandchildren+ only)
    AND process.name IN ("curl", "wget", "nc", "nmap", "whoami", "ssh" /* ... */)
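The same idea in plain set operations, as a sketch for readers less familiar with ES|QL. It assumes each event is a nested dict with `process.name`, `process.parent.name`, `process.parent.entity_id`, and an `ancestry` list of ancestor entity IDs standing in for `process.Ext.ancestry`. The name lists are shortened for illustration.

```python
GENAI_PARENTS = {"claude", "cursor", "Cursor Helper", "copilot", "windsurf"}
TARGETS = {"curl", "wget", "nc", "nmap", "whoami", "ssh"}


def genai_descendant_hits(events: list[dict]) -> list[dict]:
    """Target processes with a GenAI ancestor, excluding direct children."""
    # Pass 1 (the INLINE STATS step): entity IDs of every GenAI parent in the batch.
    genai_ids = {
        ev["process"]["parent"]["entity_id"]
        for ev in events
        if ev["process"]["parent"]["name"] in GENAI_PARENTS
    }

    hits = []
    for ev in events:
        proc = ev["process"]
        is_genai_spawn = proc["parent"]["name"] in GENAI_PARENTS
        # Pass 2 (the MV_INTERSECTION step): which GenAI IDs appear in this ancestry?
        ancestors = genai_ids & set(proc.get("ancestry", []))
        if ancestors and not is_genai_spawn and proc["name"] in TARGETS:
            hits.append({"name": proc["name"], "genai_ancestor_ids": sorted(ancestors)})
    return hits
```

The two passes matter: a GenAI entity ID is only known once some event names it as a parent, so the set has to be built before any ancestry list is checked against it.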

ELASTIC DEFEND | THE MCP PROBLEM

MCP Server Detection

// Catches MCP servers spawning shells or utilities
process where event.type == "start" and
  process.parent.name in ("node", "python3", "deno") and
  process.parent.command_line like~ ("*mcp*", "*mcp-server*", "*mcp_server*") and
  (
    (process.name in ("bash", "zsh", "sh") and process.args == "-c") or
    process.name in ("curl", "git", "powershell.exe")
  )

What's Detectable

MCP server spawns a shell with -c
Pattern match on *mcp* in parent cmdline

What's still missing

Which MCP tool was invoked
MCP server identity or trust level

EQL · CREDENTIAL ACCESS + COLLECTION

GenAI Process Accessing Sensitive Files

file where event.action in ("open", "creation", "modification") and
  event.outcome == "success"

  // GenAI process by name
  and process.name in (
    "claude", "cursor", "copilot", "codex",
    "ollama", "gpt4all", "gemini-cli", "llama-server" /* ... */
  )

  // Sensitive credential & config files
  and file.name like~ (
    "logins.json", "Login Data", "cookies.sqlite", "credentials",
    "accessTokens.json", "login.keychain-db", "known_hosts",
    ".bashrc", ".zshrc" /* ... */
  )

What's Shippable Today

Tactic	Rule	Type
Execution	GenAI/MCP Child Process Execution	BBR · EQL
Execution	Suspicious Activity via GenAI Descendant	ES|QL + EQL
C2	GenAI Connection to Unusual Domain	new_terms
Credential Access	Sensitive File Access + auto `kill_process`	EQL sequence
Persistence	LaunchAgents / rc.local / Startup modification	EQL sequence
Defense Evasion	Unusual Process Modifying GenAI Config	new_terms

Cross-platform: macOS, Linux, Windows · Elastic Defend rules include automated prevention responses

Detection Rules vs. Hunting

Detection Rules

Real-time

EQL descendant of · maxspan=1m

Auto-response: kill_process

Threat Hunting

Batch / Historical

ES|QL INLINE STATS + MV_INTERSECTION

Hours of history · surface anomalies

Use both. Elastic Defend kills in real-time. SIEM detection rules find what slipped past.

Simplest starting point: Flag known LLM process names (cursor, claude, copilot) via process metadata and mark every subprocess as AI-descended.

Where the Industry Needs to Go

Today's best option is process.parent.name and command_line heuristics. That's not good enough.

The Gaps

Every production rule today is a workaround for telemetry that should exist but doesn't.

Risks Still on the Table

PII & API Key Exposure

Capturing prompts means logging what devs type: PII, secrets, API keys
Prompt telemetry creates a new exfil surface
gen_ai.content.prompt needs redaction before SIEM
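A minimal sketch of a redaction pass that could run before prompt text is shipped to the SIEM. The three patterns are illustrative assumptions (an AWS access key ID shape, an email shape, and `key=value` style secrets); a production pipeline would need a vetted secret scanner and PII policy, not this list.

```python
import re

# Illustrative patterns only, applied in order.
REDACTION_PATTERNS = [
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED_AWS_KEY_ID]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[REDACTED_EMAIL]"),
    (re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S+"), r"\1=[REDACTED]"),
]


def redact_prompt(text: str) -> str:
    """Replace matches of each pattern with a placeholder before logging."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Redacting at the collection point, before the event leaves the endpoint, keeps the raw prompt out of every downstream store rather than just the final index.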

Cross-Session Blind Spots

New session = clean slate. Forensic trail is gone
No session correlation ID in endpoint telemetry
Can't reconstruct the AI's full decision chain

More telemetry is good. But it introduces new compliance and forensic challenges that need to be solved simultaneously.

EMERGING CAPABILITY

Agent Hooks = Telemetry Source

All major AI coding tools now expose lifecycle hooks: shell commands that fire at key agent events. These are a direct telemetry pipeline for what EDR can't see.

Claude Code

PreToolUse · PostToolUse
SessionStart / End
UserPromptSubmit
PermissionRequest
SubagentStart / Stop

Cursor

beforeShellExecution
afterFileEdit
beforeMCPExecution
sessionStart / End
afterAgentResponse

GitHub Copilot

preToolUse · postToolUse
sessionStart / End
userPromptSubmitted
errorOccurred
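A sketch of what a hook-side collector might look like. It assumes the hook receives a JSON payload with `session_id`, `hook_event_name`, `tool_name`, `tool_input`, and `cwd`; check each tool's hook documentation for the exact payload, since field names differ between vendors. The output field names borrow from ECS and from the proposed schema later in this deck and are not an established standard.

```python
import json
import time


def build_record(payload: dict) -> dict:
    """Flatten a hook payload into one log record (field names are assumptions)."""
    tool_input = payload.get("tool_input") or {}
    return {
        "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "event.action": payload.get("hook_event_name"),
        "gen_ai.session.id": payload.get("session_id"),
        "gen_ai.tool.name": payload.get("tool_name"),
        "process.command_line": tool_input.get("command"),
        "process.working_directory": payload.get("cwd"),
    }


def append_record(record: dict, log_path: str) -> None:
    """Append the record as one JSON line for a log shipper to pick up."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

To use it as an actual hook, the script's entry point would parse the payload from standard input with `json.loads(sys.stdin.read())`, then call `build_record` and `append_record`. Keep the hook fast and side-effect free so it never blocks the agent.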

APPLICATION MONITORING · OTEL

OTel Application Monitoring Bridges the Gap

EDR sees processes. APM sees intent. gen_ai.* semantic conventions connect them.

gen_ai.* Semantic Conventions

▸ gen_ai.client.token.usage: token consumption

▸ gen_ai.client.operation.duration: latency

▸ gen_ai.server.time_to_first_token

▸ gen_ai.client.inference.operation.details

▸ gen_ai.evaluation.result: guardrails

MCP Trace Context

▸ params._meta propagates trace IDs across tools

▸ Correlate: prompt → tool decision → process spawn

▸ Link AI agent session to endpoint alert
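To make the propagation step concrete, here is a sketch that attaches a W3C `traceparent` value to an MCP JSON-RPC request under `params._meta`, and reads the trace ID back out. The helper names are illustrative, and in practice an OTel SDK propagator would generate and inject the header rather than hand-rolled code like this.

```python
import secrets
from typing import Optional


def new_traceparent() -> str:
    """Build a W3C traceparent: version, 16-byte trace ID, 8-byte span ID, flags."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def with_trace_context(request: dict, traceparent: str) -> dict:
    """Return a copy of a JSON-RPC request carrying traceparent in params._meta."""
    params = dict(request.get("params", {}))
    meta = dict(params.get("_meta", {}))
    meta["traceparent"] = traceparent
    params["_meta"] = meta
    return {**request, "params": params}


def trace_id_of(message: dict) -> Optional[str]:
    """Extract the trace ID from params._meta.traceparent, if present and well formed."""
    traceparent = message.get("params", {}).get("_meta", {}).get("traceparent", "")
    parts = traceparent.split("-")
    return parts[1] if len(parts) == 4 else None
```

If the MCP server logs the same trace ID when it spawns a process, that ID becomes the join key between the agent session and the endpoint alert.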

Upstream Work Happening Now

▸ OTel Python SDK → gen_ai conventions v1.37+

▸ Elastic ECS RFC 0052: gen_ai.* field mapping

▸ Endpoint native GenAI observability requested

▸ Hook → OTel → SIEM pipeline prototyping

The Vision

Hook captures PreToolUse → OTel gen_ai.client.inference → enriches endpoint alert with AI attribution → detection rule correlates both
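Until a shared correlation ID exists, the enrichment step has to be heuristic. The sketch below assumes hook records and endpoint alerts are flat dicts with `@timestamp` (ISO 8601 with an explicit UTC offset) and `process.command_line`, and it matches an alert to the most recent hook record whose command appears in the alert's command line within a short window. The window size and field names are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def attribute_alert(
    alert: dict,
    hook_records: list[dict],
    window: timedelta = timedelta(seconds=5),
) -> Optional[dict]:
    """Return the hook record that most plausibly explains the alert, or None."""
    alert_time = datetime.fromisoformat(alert["@timestamp"])
    best = None
    for rec in hook_records:
        command = rec.get("process.command_line")
        if not command or command not in alert["process.command_line"]:
            continue
        delta = alert_time - datetime.fromisoformat(rec["@timestamp"])
        # The hook fires before the tool runs, so it should precede the process start.
        if timedelta(0) <= delta <= window and (best is None or delta < best[0]):
            best = (delta, rec)
    return best[1] if best else None
```

Substring-plus-time matching will produce false links on busy hosts, which is the argument for a real spawn correlation ID in the telemetry.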

OpenTelemetry GenAI Conventions

Claude Code already ships OTel natively: 8 metrics + 5 event types via claude_code.* namespace. The gen_ai.* spec provides a standard. Most teams aren't ingesting either yet.

OTel `gen_ai.*` Spec

gen_ai.agent.id

gen_ai.operation.name

gen_ai.provider.name

MCP via params._meta

Industry standard, not yet widely adopted

Claude Code `claude_code.*`

claude_code.token.usage

claude_code.cost.usage

claude_code.session.count

+ tool_result, api_request events

Own namespace, ships today

Still missing for security

Spawn correlation ID

Guardrail decision context

Cursor, Copilot: no OTel

Gap is narrowing

Where the Industry Should Head

// One JSON blob transforms a generic alert into an AI-attributable event
{
  "process.parent.ai_tool.name": "cursor",
  "process.parent.ai_tool.mode": "agent",  // agent | ask | edit | manual
  "gen_ai.session.id": "sess_abc123",
  "gen_ai.tool.name": "Shell",
  "gen_ai.tool.mcp_server": "filesystem-server",
  "gen_ai.tool.user_approved": true,
  "gen_ai.guardrail.decision": "allowed",
  "gen_ai.model.provider": "anthropic",
  "gen_ai.process.correlation_id": "proc_corr_001"
}
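If fields like these existed, triage could branch on them directly. The sketch below is hypothetical throughout: it reads the proposed field names from this slide, which no endpoint agent emits today, and the bucket labels are invented for the example.

```python
def triage(event: dict) -> str:
    """Bucket an alert by AI attribution, using the proposed (hypothetical) fields."""
    tool = event.get("process.parent.ai_tool.name")
    if tool is None:
        # No attribution fields at all: today's default for every alert.
        return "human-or-unknown"
    mode = event.get("process.parent.ai_tool.mode")
    approved = event.get("gen_ai.tool.user_approved") is True
    if mode == "agent" and not approved:
        return "ai-unapproved"
    return "ai-approved" if approved else "ai-assisted"
```

The point is the first branch: without attribution fields, every alert lands in the same bucket regardless of who or what issued the command.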

Provider Logs = Security Telemetry

Guardrail blocks, tool-use decisions, refusal events: these are security signals. Most are trapped in vendor dashboards.

Claude Code is the exception. It exports OTel metrics and events natively: session duration, token usage, tool results, API errors. Proof providers can do it.

Takeaways

Reduce the Attack Surface

Before writing rules, reduce what you need to detect.

Asset Inventory First

▸ Know which GenAI tools are on your fleet

▸ Track: Cursor, Claude Code, Copilot, Windsurf, Codex

▸ ps aux | grep -E "cursor|claude|copilot"

▸ If you don't know it's running, you can't secure it
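The one-liner above scales poorly past a single host. A slightly more structured sketch, assuming process listings are collected as text in `pid command` form (for example from `ps -A -o pid=,comm=`) and that a simple name match is good enough for a first inventory:

```python
import re

# Same idea as the grep above, with the tracked tools from this slide.
GENAI_NAME = re.compile(r"cursor|claude|copilot|windsurf|codex", re.IGNORECASE)


def find_genai_processes(ps_output: str) -> list[tuple[int, str]]:
    """Return (pid, command) pairs whose command matches a tracked GenAI tool name."""
    found = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip headers and malformed lines
        pid, command = int(parts[0]), parts[1]
        if GENAI_NAME.search(command):
            found.append((pid, command))
    return found
```

Like the grep, this matches substrings, so it will also flag unrelated binaries whose names happen to contain a tracked word. Treat the result as a starting list to verify, not as ground truth.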

Browser vs. Native Client

▸ Browser-based AI (Gemini web, ChatGPT web) = sandboxed

▸ Native clients (Claude Code, Cursor) = full endpoint access

▸ Pushing users to browser reduces shell/file risk

▸ Attacker must bypass browser sandbox first

Scrutinize Runtimes

▸ Monitor node, npm, deno execution as GenAI descendants

▸ GenAI running as root = immediate flag

▸ Apply network controls: block unexpected LLM domains

DNS/Network First

▸ Cheapest, most actionable telemetry source

▸ Faster than parsing 48K-char command lines

▸ Block or alert on suspicious domains per policy

▸ NGFWs already classify GenAI traffic (FortiGate, Palo Alto App-ID) — detect & block without an agent

▸ Then layer process ancestry and file monitoring

Recap: The Path Forward

A maturity progression. Start where you are, build toward full observability.

1. Visibility

▸ Inventory GenAI tools on your fleet

▸ DNS queries to LLM endpoints

▸ Process ancestry from AI parents

▸ Who is using what, and how much?

TODAY: use existing telemetry

2. Guardrails

▸ Detection rules for intent, not just tools

▸ Network controls on LLM domains

▸ NGFW app control: GenAI category for policy enforcement

▸ Browser/sandbox policies where possible

▸ Scrutinize root, node, unsigned binaries

NEXT: active detection & policy

3. Full Observability

▸ OTel gen_ai.* semantic conventions

▸ Agent hooks (PreToolUse, SessionStart)

▸ Anomaly detection on metrics (CPU, tokens)

▸ Prompt → tool → process correlation

LONG TERM: OTel, hooks, APM

Go Get The Rules

All rules shown today are open source. Click to open on GitHub.

SIEM DETECTION RULES

elastic/detection-rules

            repo:elastic/detection-rules

            "Domain: LLM"

ES|QL & EQL rules for GenAI process detection, DNS telemetry, ancestry chain analysis

ENDPOINT BEHAVIOR RULES

elastic/protections-artifacts

            repo:elastic/protections-artifacts

            path:/^behavior// genai

Elastic Defend behavioral rules with real-time kill_process prevention responses

Thanks

Mika Ayenson, Ph.D.

TRaDE Team Lead · Elastic

linkedin.com/in/mika-ayenson

References & Links

Go Read The Rules

github.com/elastic/detection-rules

Search for domain: llm in the repo

All SIEM rules shown today are open source

Standards

OTel GenAI Semantic Conventions

OTel MCP Conventions

OWASP MCP Top 10 (2025)

Research

SpecterOps: An Evening with Claude Code

Unit42: MCP Attack Vectors

Snyk: Prompt Injection Meets MCP

Elastic: Advances in LLM Security