RedSheep Security
Advanced — Lesson 26

Adversary Emulation & Purple Teaming

10 min read

Adversary emulation is the practice of simulating real-world threat actor behavior to test and validate an organization's defenses. Unlike generic penetration testing, which seeks to find any path to compromise, adversary emulation follows a specific threat actor's known playbook — their tactics, techniques, and procedures (TTPs) — to answer the question: "Can we detect and respond to this specific threat?" This lesson covers emulation planning, tooling, purple teaming, and how CTI drives the entire process.

Learning Objectives

  • Distinguish adversary emulation from penetration testing and red teaming
  • Understand how MITRE ATT&CK and CTI reporting inform emulation planning
  • Identify key adversary emulation tools and frameworks
  • Describe the purple teaming methodology and its benefits
  • Measure defensive coverage and conduct gap analysis using emulation results

What Is Adversary Emulation?

Definition: Adversary emulation is a type of offensive security assessment where the red team replicates the specific tactics, techniques, and procedures of a known threat actor, based on cyber threat intelligence, to evaluate an organization's ability to detect and respond to that particular threat.

The key distinction from traditional offensive security activities:

| Activity | Objective | Approach | CTI Role |
|---|---|---|---|
| Vulnerability Assessment | Find vulnerabilities | Automated scanning | Minimal |
| Penetration Testing | Prove exploitability | Any path to objective | Limited |
| Red Teaming | Test overall security posture | Stealthy, objective-based | Moderate — scenario-driven |
| Adversary Emulation | Validate defense against specific actor | Follow actor's known playbook | Central — drives the entire plan |

Adversary emulation is inherently CTI-driven. Without quality intelligence about how a specific threat actor operates, there is nothing to emulate.

MITRE ATT&CK's Role in Emulation Planning

MITRE ATT&CK serves as the common language and structural framework for emulation planning. When a CTI team builds a threat actor profile mapped to ATT&CK techniques (as discussed in Lesson 24), that profile becomes the blueprint for an emulation plan.

From CTI to Emulation Plan

  1. Identify the priority threat actor based on your organization's threat landscape, sector, and intelligence requirements
  2. Compile the actor's TTP profile from published reporting, ATT&CK entries, and internal intelligence
  3. Map the attack flow — determine the sequence of techniques the actor typically uses, from initial access through actions on objectives
  4. Identify procedures — determine the specific tools, commands, and methods the actor employs at each step
  5. Build the emulation plan — create a step-by-step playbook that replicates the actor's operational sequence
  6. Define success criteria — what detection and response outcomes are you measuring?
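The six planning steps above can be sketched as a minimal data structure. This is an illustrative sketch only — the actor name, sources, techniques, and detection criteria below are placeholders, not a real plan.

```python
from dataclasses import dataclass, field

@dataclass
class EmulationStep:
    """One step in the actor's operational sequence (steps 3-5 of the plan)."""
    phase: str               # e.g. "Initial Access"
    technique_id: str        # ATT&CK technique, e.g. "T1566.001"
    procedure: str           # the specific method the actor uses (step 4)
    expected_detection: str  # success criterion for this step (step 6)

@dataclass
class EmulationPlan:
    actor: str                        # priority actor identified in step 1
    intel_sources: list[str]          # reporting compiled in step 2
    steps: list[EmulationStep] = field(default_factory=list)

# Illustrative plan fragment -- all values are placeholders
plan = EmulationPlan(
    actor="Example APT",
    intel_sources=["vendor report 2024-01", "ATT&CK group entry"],
    steps=[
        EmulationStep("Initial Access", "T1566.001",
                      "macro-enabled document via email",
                      "email gateway alert"),
        EmulationStep("Execution", "T1059.001",
                      "encoded PowerShell launched by macro",
                      "script block logging (EID 4104)"),
    ],
)
print(len(plan.steps))  # 2
```

Capturing the plan as structured data (rather than prose alone) makes steps 5 and 6 directly executable and scoreable later in the exercise.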

MITRE CTID Adversary Emulation Library

The MITRE Center for Threat-Informed Defense (CTID) maintains the Adversary Emulation Library, a collection of publicly available emulation plans based on real threat actors. As of 2025, the library includes plans for:

  • APT29 (Cozy Bear) — SVR-linked espionage operations
  • FIN6 — Financially motivated group targeting payment card data
  • menuPass (APT10) — Chinese espionage targeting managed service providers
  • Sandworm — GRU Unit 74455, destructive operations
  • Turla — Russian FSB-linked espionage group

Each emulation plan in the library includes a detailed operational flow, the specific ATT&CK techniques at each step, the tools and commands to execute, and the expected detection opportunities. These plans are freely available on the CTID GitHub repository.

Adversary Emulation Tools and Frameworks

MITRE Caldera

Caldera is an open-source adversary emulation platform developed by MITRE. It provides an automated framework for running adversary emulation operations using "abilities" (individual ATT&CK techniques implemented as executable plugins) organized into "adversary profiles" (sequences of abilities that model a specific actor).

Key features:

  • Agent-based execution on target systems
  • Ability to chain techniques into multi-step operations
  • Built-in ATT&CK technique coverage
  • Automated reporting of which techniques succeeded or were detected
  • Extensible plugin architecture
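Caldera operations can be driven through its REST API. The following is a hedged sketch of preparing an operation-start request: the server address, the /api/v2/operations endpoint, the "KEY" auth header, and the body fields are assumptions based on Caldera v4 and should be verified against your instance's API documentation.

```python
import json

CALDERA_URL = "http://localhost:8888/api/v2/operations"  # assumed endpoint
API_KEY = "ADMIN123"                                     # placeholder key

def build_operation_request(name: str, adversary_id: str) -> tuple[dict, str]:
    """Return (headers, body) for a request that starts an operation
    running a given adversary profile. Field names are assumptions."""
    headers = {"KEY": API_KEY, "Content-Type": "application/json"}
    body = json.dumps({
        "name": name,                                 # operation name
        "adversary": {"adversary_id": adversary_id},  # profile to run
        "auto_close": True,                           # close when abilities finish
    })
    return headers, body

headers, body = build_operation_request("apt29-validation", "adversary-uuid")
# Send with e.g. requests.post(CALDERA_URL, headers=headers, data=body)
print(json.loads(body)["name"])  # apt29-validation
```

Driving operations through the API (rather than the UI) is what makes emulation runs repeatable across exercises.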

Atomic Red Team

Atomic Red Team, developed by Red Canary, is a library of small, focused tests (called "atomics") mapped to individual ATT&CK techniques. Each atomic test is a self-contained procedure that executes a single technique on a target system.

Atomic Red Team differs from Caldera in philosophy:

  • Atomics are granular — each test covers one technique, executed independently
  • No agent required — tests are typically executed via command line (PowerShell, bash, cmd)
  • Low barrier to entry — tests can be run by any team member, not just red team specialists
  • Testing, not operations — atomics validate detection, they do not simulate full attack chains

The Atomic Red Team repository on GitHub contains tests for hundreds of ATT&CK techniques across Windows, macOS, and Linux.
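Because atomics run from the command line, they are easy to script. A sketch of wrapping the Invoke-AtomicTest cmdlet (from the Invoke-AtomicRedTeam module) for programmatic execution — this assumes a lab host with PowerShell and the module installed, and only builds the command here rather than running it:

```python
import subprocess

def atomic_test_command(technique: str, test_numbers: str = "1") -> list[str]:
    """Build a PowerShell invocation of Invoke-AtomicTest for one technique.

    -TestNumbers selects which atomic(s) under the technique to execute;
    consult the module's documentation for prerequisite and cleanup flags.
    """
    ps = f"Invoke-AtomicTest {technique} -TestNumbers {test_numbers}"
    return ["powershell", "-NoProfile", "-Command", ps]

cmd = atomic_test_command("T1059.001")
print(cmd[-1])  # Invoke-AtomicTest T1059.001 -TestNumbers 1
# On a lab host with the module installed: subprocess.run(cmd, check=True)
```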

SCYTHE

SCYTHE is a commercial adversary emulation platform that enables security teams to build and execute realistic attack scenarios. SCYTHE provides:

  • A visual campaign builder for designing multi-step emulation plans
  • Pre-built threat actor campaigns based on published intelligence
  • Integration with ATT&CK for technique mapping
  • Automated detection validation and reporting

Other Notable Tools

  • Infection Monkey (Guardicore/Akamai): Open-source breach and attack simulation focusing on network propagation
  • Invoke-AtomicRedTeam: PowerShell module for executing Atomic Red Team tests programmatically
  • Prelude Operator: Lightweight adversary emulation agent with community and commercial versions

Building an Emulation Plan from CTI

A practical emulation plan translates intelligence into action. Here is the structure of an emulation plan document:

1. Threat Actor Overview

Summarize the actor: name/aliases, attributed sponsor, known targets, motivation, and operational history. Cite the intelligence sources used.

2. Scope and Objectives

Define what will be emulated (which phases of the kill chain), what systems are in scope, and what specific questions the emulation aims to answer.

3. Attack Flow

Document the step-by-step operational sequence:

| Phase | ATT&CK Technique | Procedure | Tool/Command | Expected Detection |
|---|---|---|---|---|
| Initial Access | T1566.001 Spear-phishing Attachment | Macro-enabled Word doc delivered via email | Crafted .docm with VBA macro | Email gateway alert, endpoint macro execution |
| Execution | T1059.001 PowerShell | Macro launches encoded PowerShell | powershell -enc [base64] | Process creation (Sysmon EID 1), script block logging (EID 4104) |
| Persistence | T1053.005 Scheduled Task | Create scheduled task for callback | schtasks /create ... | Scheduled task creation (EID 4698), Sysmon EID 1 |
| C2 | T1071.001 Web Protocols | HTTPS beacon to C2 server | Custom C2 over HTTPS | Network anomaly detection, DNS query logging |
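The execution step's powershell -enc payload has a specific format worth knowing when building the emulation: the argument is Base64 over the UTF-16LE bytes of the command, not plain UTF-8. A short sketch (the command string is an arbitrary benign example):

```python
import base64

def powershell_encoded(command: str) -> str:
    """Reproduce the payload format expected by powershell -enc:
    Base64 over the UTF-16LE encoding of the command."""
    return base64.b64encode(command.encode("utf-16-le")).decode("ascii")

encoded = powershell_encoded("Write-Output hello")
# The emulated macro would then launch: powershell -enc <encoded>
# Detection opportunity: script block logging (EID 4104) records the
# decoded command even when the process command line shows only Base64.
decoded = base64.b64decode(encoded).decode("utf-16-le")
print(decoded)  # Write-Output hello
```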

4. Detection Expectations

For each step, document what detection should fire, in which tool (SIEM, EDR, NDR), and what the expected alert looks like. This becomes the scorecard.

5. Safety Controls

Define abort criteria, deconfliction procedures (how to distinguish emulation from a real incident), and rollback procedures for any changes made to target systems.

Purple Teaming

Definition: Purple teaming is a collaborative security exercise where red team (offensive) and blue team (defensive) personnel work together in real-time, with the explicit goal of improving detection and response capabilities.

Purple Team vs. Traditional Red Team

In a traditional red team engagement, the red team operates covertly and the blue team is unaware. Success is measured by whether the red team achieves its objective undetected.

In a purple team exercise, both teams work together:

  • The red team executes a technique and announces it
  • The blue team checks whether the activity was detected
  • If detected: validate the alert quality and response procedures
  • If not detected: collaboratively investigate why and build the detection
  • Move to the next technique and repeat

Purple Team Workflow

  1. Plan: CTI team identifies the priority threat actor and builds the TTP profile
  2. Prepare: Red team builds the emulation plan; blue team identifies existing detections for each technique
  3. Execute: Technique by technique, the red team executes while the blue team monitors
  4. Assess: For each technique, record: Was it detected? By what tool? How long until detection? Was the alert actionable?
  5. Remediate: For gaps, collaboratively build new detections, tune existing ones, or identify telemetry collection needs
  6. Report: Document findings, new detections created, gaps remaining, and prioritized recommendations

Benefits of Purple Teaming

  • Immediate feedback loop: Detections are validated or created in real-time, not weeks after a report
  • Knowledge transfer: Blue team learns attack techniques; red team learns defensive blind spots
  • Measurable outcomes: Clear metrics on detection coverage before and after the exercise
  • Prioritized by threat: Focus is on the actors and techniques most relevant to the organization

Measuring Defensive Coverage

Emulation exercises produce measurable data about defensive capability:

| Metric | Description |
|---|---|
| Detection Rate | Percentage of emulated techniques that triggered an alert |
| Mean Time to Detect | Average time between technique execution and alert generation |
| Alert Quality | Was the alert specific enough for an analyst to understand and act on? |
| Response Effectiveness | Did the SOC respond correctly when alerted? |
| Coverage Delta | Change in ATT&CK coverage before vs. after the exercise |
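The first two metrics fall straight out of per-technique exercise records. A sketch of the computation — the records below are made-up values for illustration, not real results:

```python
from statistics import mean

# Illustrative records: (technique, detected?, seconds until alert or None)
results = [
    ("T1566.001", True, 45.0),
    ("T1059.001", True, 12.0),
    ("T1053.005", False, None),
    ("T1071.001", True, 300.0),
]

detected = [r for r in results if r[1]]
detection_rate = len(detected) / len(results)  # Detection Rate
mttd = mean(r[2] for r in detected)            # Mean Time to Detect

print(f"detection rate: {detection_rate:.0%}")  # 75%
print(f"MTTD: {mttd:.1f}s")                     # 119.0s
```

Note that MTTD is computed only over detected techniques; the undetected ones are what the gap analysis below addresses.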

Gap Analysis

After an emulation exercise, gaps fall into three categories:

  1. Telemetry gaps: The necessary data is not being collected (e.g., no PowerShell script block logging enabled)
  2. Detection logic gaps: The data is available but no rule exists to detect the behavior
  3. Response gaps: The detection fired but the SOC did not respond effectively (training, process, or tooling issue)

Each gap type requires a different remediation approach: telemetry gaps need engineering changes, detection gaps need rule development, and response gaps need process improvement or training.
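Because each category implies a different owner and fix, it helps to classify gaps consistently during the assess step. A minimal sketch using the three categories above (the category names are the lesson's; the function itself is illustrative):

```python
def classify_gap(telemetry_collected: bool, rule_exists: bool,
                 response_effective: bool) -> str:
    """Map an emulated technique's outcome to one of the three gap types.

    Checks are ordered: without telemetry, no rule can fire; without a
    rule, response effectiveness cannot be evaluated.
    """
    if not telemetry_collected:
        return "telemetry gap"        # remediation: engineering changes
    if not rule_exists:
        return "detection logic gap"  # remediation: rule development
    if not response_effective:
        return "response gap"         # remediation: process/training
    return "no gap"

print(classify_gap(False, False, False))  # telemetry gap
print(classify_gap(True, False, False))   # detection logic gap
print(classify_gap(True, True, False))    # response gap
```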

How CTI Drives the Entire Process

CTI is the engine of adversary emulation and purple teaming:

  • Before: CTI identifies which actors matter, what their TTPs are, and what the emulation should replicate
  • During: CTI provides context during the exercise — "this is how APT29 actually does this in the wild"
  • After: CTI validates whether the exercise was realistic, identifies gaps in the actor profile, and updates intelligence based on defensive findings

Organizations without a CTI function driving emulation default to generic penetration testing. With CTI integration, every emulation exercise directly improves resilience against the specific threats that matter most.

Key Takeaways

  • Adversary emulation replicates specific threat actor behavior, making it fundamentally different from generic penetration testing
  • MITRE ATT&CK provides the structural framework; CTI provides the content that drives emulation planning
  • Tools like Caldera, Atomic Red Team, and SCYTHE enable automated and repeatable emulation execution
  • Purple teaming maximizes value by combining offensive execution with real-time defensive validation
  • Emulation results produce measurable metrics on detection coverage, response capability, and defensive gaps
  • CTI is the central driver throughout the process — without quality intelligence, emulation becomes untargeted testing

Practical Exercise

Using the MITRE CTID Adversary Emulation Library (available at https://github.com/center-for-threat-informed-defense/adversary_emulation_library):

  1. Select one emulation plan (APT29 is recommended as it is the most detailed)
  2. Review the operational flow and identify the ATT&CK techniques used at each phase
  3. For five techniques in the plan, document:
    • What telemetry source would be needed to detect it (e.g., Sysmon, Windows Security logs, network capture)
    • Whether a Sigma rule exists in the SigmaHQ repository for this technique
    • What a realistic false positive scenario would look like for that detection
  4. Create a detection scorecard (table format): Technique | Detection Exists (Y/N) | Data Source Required | Gap Type (if N)
  5. Prioritize the gaps: If you could only close three detection gaps, which three would provide the most defensive value and why?

Further Reading