GenAI Security Research & Development

Applied research at the intersection of generative AI and cybersecurity since the GPT-2 and NVIDIA Jarvis era (2019-2020 at JHUAPL), years before the current wave. Two focus areas: using AI to detect threats, and detecting threats to AI systems.

Continuous learner: Coursera AI Engineer Agentic Track • HackTheBox AI Red Teamer

GenAI Security

Prompt Injection Detection via ES|QL

Hybrid detection combining regex pre-filters with LLM-as-judge classification. Handles evasion via tokenization, encoding, and context manipulation.

/* Rule-based pre-filter (RLIKE must match the whole string) */
| EVAL txt = TO_LOWER(txt)
| EVAL r_ignore = CASE(
    txt RLIKE ".*(ignore previous|disregard).*",
    "ignore_previous", NULL)
| EVAL r_override = CASE(
    txt RLIKE ".*override.*(rules|policy).*",
    "override_phrasing", NULL)

/* LLM classification (judge_prompt column assembled upstream) */
| COMPLETION llm_out = judge_prompt
    WITH { "inference_id": "bedrock" }
| DISSECT llm_out "label=%{label} score=%{score}"
| WHERE label == "override" AND TO_DOUBLE(score) >= 0.70

Prompt Injection Taxonomy

Tested and documented 16+ attack categories:

• Direct Override
• Priority Manipulation
• Indirect via RAG/Tickets
• Tool-Call Coercion
• Tokenization Confusion
• Payload Splitting
• Encoded (Base64/Hex)
• Context Poisoning
• Code Injection
• Chain/Recursive Injection
• System Prompt Exfil
• DAN-Style Jailbreaks
• Zero-Width Unicode
• Mask Completion
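
Several of these categories (encoded payloads, zero-width Unicode, payload splitting) evade naive pattern matching unless input is canonicalized first. A minimal Python sketch of that normalization step, assuming it runs before the rule-based pre-filter; normalize_prompt and its heuristics are illustrative, not the production pipeline:

import base64
import re
import unicodedata

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
B64_RUN = re.compile(r"[A-Za-z0-9+/=]{24,}")

def normalize_prompt(text: str) -> str:
    """Best-effort canonicalization before rule-based matching."""
    # Fold fullwidth/compatibility characters used for tokenization confusion
    text = unicodedata.normalize("NFKC", text)
    # Strip zero-width characters used to split trigger phrases
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    # Append decoded Base64 runs so encoded payloads are also scanned
    for run in B64_RUN.findall(text):
        try:
            text += "\n" + base64.b64decode(run, validate=True).decode("utf-8")
        except Exception:
            continue
    return text.lower()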

Adversarial Guardrail Testing

Systematic testing of cloud provider content filters across Azure, AWS Bedrock, and GCP. Scenario-based attack generation, replay testing, and tool-call coercion via agent builders.
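
A minimal replay-harness sketch: invoke_with_guardrail stands in for the provider-specific call (Azure content filters, Bedrock Guardrails, GCP safety settings) and is an assumption, not a real SDK function; the harness simply replays scenario prompts and records filter outcomes for comparison across providers.

from dataclasses import dataclass
from typing import Callable

@dataclass
class ReplayResult:
    scenario: str
    prompt: str
    blocked: bool
    raw_response: str

def replay_scenarios(
    scenarios: dict[str, list[str]],
    invoke_with_guardrail: Callable[[str], tuple[bool, str]],  # placeholder for provider SDK call
) -> list[ReplayResult]:
    """Replay every scenario prompt and record whether the content filter blocked it."""
    results = []
    for name, prompts in scenarios.items():
        for prompt in prompts:
            blocked, raw = invoke_with_guardrail(prompt)
            results.append(ReplayResult(name, prompt, blocked, raw))
    return results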

GenAI Telemetry Standards

Contributing to ECS field schemas for GenAI observability (gen_ai.* fields). Aligning with OpenTelemetry conventions for input/output messages and tool invocations.
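
As a rough illustration of the target shape, assuming the gen_ai.* names from the OpenTelemetry GenAI semantic conventions (the ECS proposal may settle on different fields, and the model ID is illustrative):

# Example enriched document; gen_ai.* names follow the OTel GenAI semantic conventions
genai_event = {
    "@timestamp": "2025-01-15T12:00:00Z",
    "gen_ai": {
        "system": "aws.bedrock",
        "operation": {"name": "chat"},
        "request": {"model": "anthropic.claude-3-sonnet"},
        "usage": {"input_tokens": 412, "output_tokens": 97},
    },
    "event": {"action": "model-invocation", "outcome": "success"},
}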

MITRE ATLAS Detection Rules

First public detection rules mapped to the MITRE ATLAS framework for AI/ML threats, covering prompt injection, model poisoning, and supply chain attacks.
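
A sketch of how a rule's threat metadata can carry the ATLAS mapping; the structure and technique ID are illustrative and should be checked against the current ATLAS matrix rather than read as the published rules:

# Illustrative threat-mapping block for a prompt injection rule
rule_threat_mapping = {
    "rule_name": "Potential Prompt Injection via Direct Override",
    "severity": "medium",
    "threat": [
        {
            "framework": "MITRE ATLAS",
            "technique": {
                "id": "AML.T0051",  # LLM Prompt Injection (verify against current matrix)
                "name": "LLM Prompt Injection",
            },
        }
    ],
}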

AWS Bedrock Detection & Hunting

Initial GenAI detection rules and hunting queries for AWS Bedrock model invocation logs. Token anomalies, latency spikes, guardrail triggers.
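
A minimal hunting sketch for the token-anomaly case: records are assumed to be pre-parsed invocation log entries with identity_arn and output_tokens already extracted (the raw field names, e.g. output.outputTokenCount, should be verified against the Bedrock log schema):

from collections import defaultdict
from statistics import mean, pstdev

def flag_token_spikes(records: list[dict], z_threshold: float = 3.0) -> list[dict]:
    """Flag invocations whose output token count sits far above the caller's baseline."""
    by_identity = defaultdict(list)
    for r in records:
        by_identity[r["identity_arn"]].append(r)

    flagged = []
    for rows in by_identity.values():
        counts = [r["output_tokens"] for r in rows]
        mu, sigma = mean(counts), pstdev(counts)
        if sigma == 0:
            continue  # constant usage, nothing to score
        flagged += [r for r in rows if (r["output_tokens"] - mu) / sigma >= z_threshold]
    return flagged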

Multimodal Threat Vectors

Image-based prompt injection, audio transcript manipulation, cross-modal payload delivery. As LLMs become multimodal, detection must adapt.

AI Pipeline Supply Chain

Model poisoning, training data contamination, dependency risks. How do you detect a compromised embedding model?

Adversarial Robustness

How do attackers evade AI-augmented detection? Poisoning feedback loops, gaming confidence scores, crafting deceptive inputs.

GenAI Development

Agentic Alert Triage

Multi-agent system using hypothesis testing (H0: benign vs H1: malicious) with evidence layers:

  • Context-1: Alert fields, rule metadata
  • Context-2: Internal signals (24h aggregates, burst detection)
  • Context-3: External enrichment (TI, VT verdicts)

Key insight: missing evidence lowers confidence; it does not escalate verdicts.
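
A minimal sketch of that principle, with illustrative names: each available layer votes, a missing layer casts no vote, and a coverage penalty shrinks confidence instead of flipping the verdict toward malicious.

from dataclasses import dataclass

@dataclass
class Verdict:
    hypothesis: str   # "H0-benign" or "H1-malicious"
    confidence: float

def combine_evidence(layers: dict[str, dict | None]) -> Verdict:
    """Missing evidence layers reduce confidence; they never escalate the verdict."""
    votes, available = 0.0, 0
    for evidence in layers.values():
        if evidence is None:
            continue  # absent layer: no vote, only a coverage penalty below
        available += 1
        votes += 1.0 if evidence["malicious"] else -1.0
    hypothesis = "H1-malicious" if votes > 0 else "H0-benign"
    coverage = available / max(len(layers), 1)
    strength = abs(votes) / max(available, 1)
    return Verdict(hypothesis, round(strength * coverage, 2))

With Context-3 enrichment unavailable, the same benign-leaning evidence produces the same verdict at lower confidence, never an automatic escalation.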

Context over Prompt Engineering

Prompts should teach reasoning principles, not memorize patterns. Separation of concerns: Prompt = epistemic policy, Context = evidence supply.
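
A small sketch of that separation, with illustrative names: the system prompt stays a fixed statement of reasoning policy, while per-alert evidence is assembled separately and passed in as context.

# Static prompt = epistemic policy; evidence is injected separately as context
TRIAGE_PROMPT = (
    "You are an alert triage analyst. Weigh evidence for H0 (benign) versus "
    "H1 (malicious). If evidence is missing, lower your confidence rather "
    "than escalating the verdict."
)

def build_messages(alert_evidence: dict) -> list[dict]:
    return [
        {"role": "system", "content": TRIAGE_PROMPT},
        {"role": "user", "content": f"Evidence layers:\n{alert_evidence}"},
    ]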

Automated Exception Generation

LLM-driven rule tuning from alert clustering. Produces exception candidates that reduce noise while preserving detection fidelity.
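
One hedged sketch of the clustering half: group repetitive alerts by shared fields and emit exception candidates for the high-volume clusters; the LLM-drafted rationale and analyst review are out of scope here, and the field names and entry format are illustrative.

from collections import Counter

def exception_candidates(alerts: list[dict], min_count: int = 50) -> list[dict]:
    """Propose exception entries for high-volume alert clusters; analyst review still required."""
    clusters = Counter(
        (a["rule_id"], a["host_name"], a["process_path"]) for a in alerts
    )
    candidates = []
    for (rule_id, host, path), count in clusters.items():
        if count < min_count:
            continue
        candidates.append({
            "rule_id": rule_id,
            "entries": [
                {"field": "host.name", "operator": "included", "value": host},
                {"field": "process.executable", "operator": "included", "value": path},
            ],
            "rationale": f"{count} identical alerts in the tuning window",
        })
    return candidates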

OpenAI Agents SDK Patterns

Building security agents with structured tool definitions and typed outputs:

from pydantic import BaseModel
from agents import function_tool  # OpenAI Agents SDK

class AlertContext(BaseModel):
    """Typed tool output: prevalence, burst metrics, related alerts."""
    prevalence: int
    burst_score: float
    related_alerts: list[str]

@function_tool
def enrich_alert(
    alert_id: str,
    lookback_hours: int = 24
) -> AlertContext:
    """Fetch internal context for alert triage.

    Returns prevalence, burst metrics, and
    related alerts within the time window.
    """
    # query_context_layer_2 wraps the Context-2 internal-signal queries
    return query_context_layer_2(
        alert_id,
        hours=lookback_hours
    )
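
Wiring the tool into an agent, assuming the openai-agents package layout (module imported as agents); the instructions string and alert ID are placeholders.

from agents import Agent, Runner

triage_agent = Agent(
    name="alert-triage",
    instructions=(
        "Triage the alert. Call enrich_alert for internal context before "
        "deciding between benign and malicious."
    ),
    tools=[enrich_alert],
)

result = Runner.run_sync(triage_agent, "Triage alert id ABC123 from the SIEM queue")
print(result.final_output)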

Threat Coverage Gap Analysis

Ingest threat reports (PDF, HTML, URL), extract TTPs, compare against detection repos. Automatic chunking for large reports with merged analysis.
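
A minimal sketch of the chunk-and-merge step; extract_ttps stands in for the LLM extraction call, and the sizes are arbitrary rather than the real configuration.

def chunk_text(text: str, size: int = 8000, overlap: int = 500) -> list[str]:
    """Split a long report into overlapping character chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def coverage_gaps(report_text: str, covered_techniques: set[str]) -> set[str]:
    """Merge per-chunk TTP extractions, then diff against existing detection coverage."""
    ttps: set[str] = set()
    for chunk in chunk_text(report_text):
        ttps |= extract_ttps(chunk)  # placeholder for the LLM extraction step
    return ttps - covered_techniques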

LLM Evaluation & Tracing

Phoenix and LangSmith for agent tracing. Ground truth datasets, calibration metrics, and automated regression testing for security workflows.
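
The tracing side lives in Phoenix and LangSmith; the calibration side reduces to scoring predicted confidences against a labeled triage set. A minimal sketch using a Brier score (the example pairs are made up):

def brier_score(examples: list[tuple[bool, float]]) -> float:
    """Mean squared error between predicted confidence in 'malicious' and ground truth;
    lower means better calibrated."""
    return sum((conf - (1.0 if truth else 0.0)) ** 2
               for truth, conf in examples) / len(examples)

# (ground_truth_malicious, model_confidence) pairs from a labeled triage dataset
print(brier_score([(True, 0.92), (False, 0.15), (False, 0.60)]))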

Human-in-the-Loop Design

Confidence thresholds for escalation, explainable verdicts, feedback loops for continuous calibration improvement.

Areas of Exploration

Security-Specialized Models

Domain-specific vs general-purpose trade-offs

Synthetic Security Data

AI-generated attack telemetry for testing

Threat Hunting Assistants

Conversational hypothesis exploration

Auto Incident Response

LLM-generated playbooks with safety constraints

Detection Translation

YARA ↔ Sigma ↔ ES|QL ↔ KQL

LLM Evaluation Pipelines

Tracing, ground truth, calibration metrics