Automating CTI Workflows — CTI Academy

The volume of threat data available to CTI analysts has grown far faster than the teams that process it, which have stayed relatively flat in size. Automation is therefore not optional for modern CTI programs: it is essential for keeping pace with the data, reducing analyst fatigue on repetitive tasks, and ensuring consistent processing. However, automation is a tool, not a replacement for analysis. This lesson covers what to automate, what to keep human, and how to build practical automation workflows.

Learning Objectives

Identify CTI tasks that benefit from automation versus those requiring human judgment
Design an indicator enrichment pipeline using common threat intelligence APIs
Understand the role of TIP and SOAR platforms in CTI automation
Use Python and common libraries to build basic CTI automation scripts
Plan for API rate limiting, error handling, and long-term maintenance of automated workflows

Why Automate CTI Workflows

Three pressures drive CTI automation:

Volume: A mid-sized organization might ingest tens of thousands of indicators daily from commercial feeds, open-source feeds, ISAC sharing, and internal detections. No human team can manually review, enrich, and triage that volume.

Speed: When a new vulnerability or campaign is disclosed, organizations need to determine relevance within hours, not days. Automated enrichment can provide initial context in seconds.

Consistency: Manual processes introduce variation. One analyst might check five sources for enrichment; another might check three. Automation ensures every indicator passes through the same enrichment pipeline with the same quality checks.

Key Principle: Automate the collection and processing phases of the intelligence cycle. Protect the analysis and dissemination phases for human judgment.

What to Automate

Indicator Enrichment

Indicator enrichment is the single highest-value automation target. When a new indicator (hash, IP, domain, URL) enters your pipeline, automatically query:

Reputation services: Is this indicator known to be malicious?
Context services: What is this indicator associated with? (campaigns, threat actors, malware families)
Technical metadata: When was this domain registered? What does this IP host? What does this file do?

Feed Aggregation and Deduplication

Most organizations consume multiple threat intelligence feeds. Automated aggregation normalizes indicators into a common format, deduplicates across sources, and tracks which sources reported each indicator (source confidence stacking).
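The aggregation step described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production design: the `Indicator` class, the `normalize` rules, and the feed names are all hypothetical, and a real pipeline would need normalization rules for every indicator type it handles.

```python
from dataclasses import dataclass, field


@dataclass
class Indicator:
    """One deduplicated indicator plus every feed that reported it."""
    value: str
    ioc_type: str
    sources: set = field(default_factory=set)


def normalize(value, ioc_type):
    """Put a raw feed value into canonical form so duplicates compare equal."""
    value = value.strip()
    if ioc_type in ("domain", "md5", "sha1", "sha256"):
        value = value.lower()
    if ioc_type == "domain":
        value = value.rstrip(".")  # drop a trailing root dot
    return value


def aggregate(feeds):
    """Merge {feed_name: [(value, ioc_type), ...]} into one deduplicated map."""
    merged = {}
    for feed_name, entries in feeds.items():
        for raw_value, ioc_type in entries:
            key = (ioc_type, normalize(raw_value, ioc_type))
            indicator = merged.setdefault(key, Indicator(key[1], ioc_type))
            indicator.sources.add(feed_name)  # track who reported it
    return merged


# Hypothetical feeds using documentation-reserved names and addresses.
feeds = {
    "feed_a": [("Evil.Example.", "domain"), ("203.0.113.7", "ip")],
    "feed_b": [("evil.example", "domain")],
}
merged = aggregate(feeds)
for indicator in merged.values():
    print(indicator.ioc_type, indicator.value, sorted(indicator.sources))
```

Keeping the set of reporting sources on each record is what makes source confidence stacking possible later: the scoring step can count or weight the sources without re-reading the feeds.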

Report Parsing and IOC Extraction

Threat intelligence reports from vendors, ISACs, and government agencies arrive as PDFs, emails, and web pages. Automated extraction can pull structured indicators (IPs, domains, hashes, URLs) from unstructured text using regex patterns and refanging logic, which reverses the defanging that report authors apply for safety (converting hxxps:// back to https://, [.] back to ., etc.).
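As a rough sketch of this extraction step, the following restores defanged text (often called refanging) and then applies simple regular expressions. The patterns and the function names `refang` and `extract_iocs` are illustrative assumptions; they cover only a few common conventions, and dedicated libraries handle far more edge cases.

```python
import re

# A few common defanging conventions mapped back to their live form.
REFANG_RULES = [
    (re.compile(r"hxxp", re.IGNORECASE), "http"),
    (re.compile(r"\[\.\]|\(\.\)|\[dot\]", re.IGNORECASE), "."),
    (re.compile(r"\[:\]"), ":"),
]

# Deliberately simple patterns; they match shape, not validity.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def refang(text):
    """Undo defanging so the regex patterns below can match."""
    for pattern, replacement in REFANG_RULES:
        text = pattern.sub(replacement, text)
    return text


def extract_iocs(text):
    """Return deduplicated, sorted indicators found in free text."""
    clean = refang(text)
    return {
        "urls": sorted(set(URL_RE.findall(clean))),
        "ipv4": sorted(set(IPV4_RE.findall(clean))),
        "sha256": sorted({h.lower() for h in SHA256_RE.findall(clean)}),
    }


report = "C2 at hxxps://bad[.]example/login and 198.51.100[.]23."
print(extract_iocs(report))
```

The IPv4 pattern accepts impossible values such as 999.1.1.1, so extracted candidates should still pass through a validation step before enrichment. Treat refanged output as live data: store it, but do not render it as clickable links.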

Indicator Scoring

Combine enrichment results into a composite score that helps analysts prioritize which indicators warrant deeper investigation. Scoring models typically weight factors like: number of sources reporting the indicator, age, association with known threat actors, relevance to your industry, and confidence level of each source.
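One way to combine these factors is a weighted sum with an age decay. The sketch below is only an example of the idea: the weights, the record fields (`source_confidence`, `tracked_actor`, `sector_match`, `last_seen`), and the decay schedule are all assumptions that a team would tune against its own triage outcomes.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical weights; the maximum possible score is 100.
WEIGHTS = {
    "per_source": 20,         # points per reporting source at full confidence
    "max_source_points": 60,  # cap so many weak sources cannot dominate
    "tracked_actor": 30,
    "sector_match": 10,
}


def score_indicator(record, now=None):
    """Return a 0-100 priority score for an enriched indicator record."""
    now = now or datetime.now(timezone.utc)
    # Each source contributes in proportion to its confidence (0.0 to 1.0).
    source_points = sum(
        WEIGHTS["per_source"] * conf
        for conf in record["source_confidence"].values()
    )
    score = min(source_points, WEIGHTS["max_source_points"])
    if record.get("tracked_actor"):
        score += WEIGHTS["tracked_actor"]
    if record.get("sector_match"):
        score += WEIGHTS["sector_match"]
    # Age decay: lose 10% per full 30 days since last seen, floored at 50%.
    age_days = max(0, (now - record["last_seen"]).days)
    decay = max(0.5, 1.0 - 0.1 * (age_days // 30))
    return round(min(100, score * decay))


example = {
    "source_confidence": {"feed_a": 1.0, "feed_b": 0.5},
    "tracked_actor": True,
    "sector_match": False,
    "last_seen": datetime.now(timezone.utc) - timedelta(days=10),
}
print(score_indicator(example))
```

Capping the source contribution is a design choice worth noting: without it, an indicator repeated across many low-quality feeds that copy each other could outrank one reported by a single reliable source with actor attribution.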

TIP Ingestion

Automated ingestion into your Threat Intelligence Platform ensures that enriched, scored indicators are available for detection, hunting, and analysis without manual data entry.

Alerting on Priority Indicators

Configure automated alerts when indicators matching specific criteria appear — for example, any indicator associated with threat actors in your PIR list, any indicator involving your organization's domain names, or any indicator with a score above a defined threshold.
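The three example criteria can be expressed as a small rule function. This is a sketch under assumed field names (`value`, `actors`, `score`) and a hypothetical organization domain; the substring match on domains is intentionally crude and will need tightening to control false positives.

```python
def alert_reasons(record, pir_actors, org_domains, score_threshold=70):
    """Return the reasons an indicator should alert; an empty list means no alert."""
    reasons = []

    # Criterion 1: associated with a threat actor on the PIR list.
    matched_actors = sorted(set(record.get("actors", [])) & set(pir_actors))
    if matched_actors:
        reasons.append("PIR actor: " + ", ".join(matched_actors))

    # Criterion 2: the indicator mentions one of the organization's domains.
    value = record["value"].lower()
    matched_domains = [d for d in org_domains if d.lower() in value]
    if matched_domains:
        reasons.append("references org domain: " + ", ".join(matched_domains))

    # Criterion 3: composite score at or above the alerting threshold.
    if record.get("score", 0) >= score_threshold:
        reasons.append(f"score {record['score']} >= {score_threshold}")

    return reasons


record = {"value": "login.corp.example.bad.test", "actors": ["APT29"], "score": 40}
print(alert_reasons(record, {"APT29", "FIN7"}, ["corp.example"]))
```

Returning the reasons rather than a bare boolean lets the notification explain why it fired, which helps analysts triage the alert and helps you tune noisy rules.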

What NOT to Automate

Analysis and Assessment

Automation can tell you that an IP address has been reported as malicious by four sources and is associated with APT29. It cannot tell you what that means for your organization, whether the assessment is credible, or how it changes your risk posture. Analytical judgment must remain human.

Strategic Intelligence Production

Automated tools cannot produce strategic intelligence. Understanding geopolitical trends, assessing how a new regulation affects your threat landscape, or evaluating whether a threat actor's motivations are shifting — these require human reasoning, contextual knowledge, and critical thinking.

Confidence Assessment

While automation can aggregate data, the confidence level assigned to an intelligence assessment must reflect human evaluation of source reliability, information quality, and analytical rigor. Automated scoring is a starting point, not a final determination.

Dissemination Decisions

Deciding who needs specific intelligence, at what classification level, with what caveats, and through which channel requires understanding your stakeholders, their needs, and the sensitivity of the information.

The Automation Boundary: If a task requires judgment, context, or creativity, keep it human. If a task is repetitive, rule-based, and high-volume, automate it. The goal is to free analyst time for the work that requires expertise.

Enrichment APIs and Services

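Building an enrichment pipeline requires integrating with multiple data sources via their APIs. Free-tier limits change often, so confirm the figures below against each provider's current documentation before you design around them.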

Common Enrichment Sources

Service	Data Provided	Free Tier
VirusTotal	File analysis, URL scanning, domain/IP context	4 lookups/minute, 500/day
Shodan	Internet-connected device data, banners, ports	1 lookup/minute (API)
AbuseIPDB	IP reputation based on community reports	1,000 checks/day
GreyNoise	Classification of internet scanners (benign vs. malicious)	50/day (Community API)
URLScan.io	URL rendering and analysis	100 scans/day (private)
AlienVault OTX	Threat indicators and pulses (community-contributed intelligence)	Generous free tier
CIRCL Passive DNS	Historical DNS resolution data	Free for authorized users

Commercial Sources

Recorded Future: Risk scores, entity relationships, dark web monitoring
Mandiant Advantage: Threat actor profiles, campaign tracking, indicator context
DomainTools: WHOIS history, passive DNS, domain risk scoring
CrowdStrike Falcon Intelligence: Adversary profiles, indicator enrichment

TIP and SOAR Platforms

Threat Intelligence Platforms (TIPs)

TIPs are purpose-built for managing the intelligence lifecycle.

MISP (Malware Information Sharing Platform): Open-source TIP widely used in the CTI community. MISP stores indicators as "events" with attributes, supports STIX/TAXII for sharing, provides correlation across events, and offers a robust API for automation. The pymisp Python library enables programmatic interaction.

OpenCTI: Open-source platform built on STIX 2.1 as its native data model. OpenCTI provides entity relationship mapping, integration with MITRE ATT&CK, and connectors for dozens of data sources. It offers a GraphQL API for automation.

Commercial TIPs: Platforms like ThreatConnect, Anomali ThreatStream, and Recorded Future provide managed infrastructure, commercial feed integrations, and support. They reduce operational overhead but add licensing cost.
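Because the open-source platforms above exchange data as STIX, it helps to see what a minimal STIX 2.1 Indicator looks like. The sketch below builds one as a plain dictionary using only the standard library, so the required properties are visible; the helper name `stix_indicator_for_ipv4` is hypothetical, and in practice the `stix2` library would construct and validate the object for you.

```python
import ipaddress
import json
import uuid
from datetime import datetime, timezone


def stix_indicator_for_ipv4(ip, description):
    """Build a minimal STIX 2.1 Indicator object for one IPv4 address."""
    ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    # STIX timestamps are RFC 3339 in UTC; millisecond precision is used here.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "description": description,
        "pattern": f"[ipv4-addr:value = '{ip}']",
        "pattern_type": "stix",
        "valid_from": now,
    }


indicator = stix_indicator_for_ipv4("203.0.113.7", "Example C2 address")
print(json.dumps(indicator, indent=2))
```

How such an object is submitted differs by platform (for example through a TAXII server, a connector, or the platform's own API), so this sketch stops at producing the JSON.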

SOAR Platforms

Security Orchestration, Automation, and Response (SOAR) platforms extend automation beyond CTI into the broader security operations workflow.

TheHive + Cortex: TheHive is an open-source incident response platform; Cortex is its companion analysis engine. Cortex provides "analyzers" — modular enrichment scripts that query services like VirusTotal, Shodan, and MISP — and "responders" that take automated actions. This combination is popular for CTI-driven IR automation.

Splunk SOAR (formerly Phantom): Commercial SOAR platform with visual playbook design. Widely used for automating enrichment, triage, and response actions.

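Cortex XSOAR (Palo Alto Networks, formerly Demisto): Commercial SOAR with an extensive integration marketplace and playbook library. Despite the shared name, it is unrelated to TheHive's Cortex analysis engine.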

Python for CTI Automation

Python is the dominant language for CTI automation due to its extensive library ecosystem and readability.

Essential Libraries

Library	Purpose
`requests`	HTTP requests to APIs
`stix2`	Creating and parsing STIX 2.1 objects
`pymisp`	Interacting with MISP instances
`pandas`	Data manipulation and analysis
`defang` / `iocextract`	Defanging, refanging, and extracting indicators from text
`ipaddress`	IP address validation and network operations
`json`	Parsing API responses
`logging`	Tracking script execution and errors
`ratelimit`	Managing API rate limits

Example: Basic Enrichment Workflow Structure

A practical enrichment pipeline follows this pattern:

Ingest: Read indicators from a file, API, or message queue
Validate: Confirm the indicator is properly formatted (valid IP, valid hash length, etc.)
Deduplicate: Check whether this indicator has been enriched recently to avoid redundant API calls
Enrich: Query multiple sources, handling rate limits and errors gracefully
Score: Apply a scoring model to the aggregated enrichment results
Store: Write enriched indicators to your TIP or database
Alert: Notify analysts if the indicator meets priority thresholds
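The seven stages above can be wired together in a short skeleton. This is a sketch rather than a reference implementation: the enrichers are passed in as plain callables (stubs here, real API clients in practice), the scoring rule is a placeholder, and the "store" is just a dictionary standing in for a TIP or database.

```python
import ipaddress


def is_valid_ip(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def run_pipeline(raw_values, enrichers, seen, alert_threshold=50):
    """Process raw indicator strings; return (stored_records, alerted_values)."""
    stored, alerts = {}, []
    for raw in raw_values:                       # Ingest
        value = raw.strip()
        if not is_valid_ip(value):               # Validate
            continue
        if value in seen:                        # Deduplicate
            continue
        seen.add(value)
        results = {}
        for name, lookup in enrichers.items():   # Enrich
            try:
                results[name] = lookup(value)
            except Exception as exc:             # one failing source must not stop the run
                results[name] = {"error": str(exc)}
        hits = sum(1 for r in results.values() if r.get("malicious"))
        score = 25 * hits                        # Score (placeholder rule)
        stored[value] = {"enrichment": results, "score": score}  # Store
        if score >= alert_threshold:             # Alert
            alerts.append(value)
    return stored, alerts


# Stub enrichers standing in for real API clients.
enrichers = {
    "rep_a": lambda ip: {"malicious": ip == "203.0.113.7"},
    "rep_b": lambda ip: {"malicious": ip == "203.0.113.7"},
}
stored, alerts = run_pipeline(["203.0.113.7", "not-an-ip", "192.0.2.55"], enrichers, seen=set())
print(alerts, {ip: rec["score"] for ip, rec in stored.items()})
```

Injecting the enrichers as callables keeps the pipeline testable offline and makes it easy to add or remove a source without touching the control flow.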

API Rate Limiting and Management

Every API has rate limits. Exceeding them results in blocked requests, and repeated violations can lead to API key revocation. Best practices:

Track your limits: Document the rate limit for every API you use. Build rate limiting into your code, not as an afterthought.
Use exponential backoff: When you receive a 429 (Too Many Requests) response, wait progressively longer before retrying.
Cache results: Store enrichment results with a TTL (time-to-live). If the same indicator is queried again within the TTL, return cached results instead of making a new API call.
Prioritize queries: If you have 10,000 indicators to enrich and a daily limit of 500 API calls, score or prioritize indicators before enrichment so the most important ones are enriched first.
Use bulk endpoints: Some APIs (notably VirusTotal) offer batch query endpoints that are more efficient than individual lookups.
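Two of these practices, exponential backoff and TTL caching, fit naturally into one wrapper around any lookup function. The sketch below assumes the lookup raises a hypothetical `RateLimited` exception when the API answers 429; the class and function names are illustrative, and the `sleep` and `clock` parameters exist only so the behavior can be tested without waiting.

```python
import random
import time


class RateLimited(Exception):
    """Raised by a lookup function when the API answers HTTP 429."""


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._entries[key]  # expired: force a fresh lookup
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self.clock(), value)


def lookup_with_backoff(lookup, indicator, cache, max_retries=5,
                        base_delay=1.0, sleep=time.sleep):
    """Return a cached result, or call lookup with exponential backoff on 429."""
    cached = cache.get(indicator)
    if cached is not None:
        return cached
    for attempt in range(max_retries):
        try:
            result = lookup(indicator)
        except RateLimited:
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
            continue
        cache.set(indicator, result)
        return result
    raise RuntimeError(f"gave up on {indicator} after {max_retries} rate-limited attempts")
```

A typical call would look like `lookup_with_backoff(query_reputation, "203.0.113.7", TTLCache(ttl_seconds=86400))`, where `query_reputation` is your own client function. An in-memory cache is lost on restart, so a scheduled script would usually back the same interface with a database or key-value store.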

Building an Enrichment Workflow

A practical enrichment workflow for a small CTI team might look like this:

Feed ingestion script runs hourly, pulling indicators from RSS feeds, TAXII servers, and email-based advisories into a staging database
Deduplication checks each indicator against the existing database; only new indicators proceed
Enrichment pipeline queries VirusTotal, AbuseIPDB, and GreyNoise for each new indicator, respecting rate limits and caching results
Scoring engine applies weights: multiple source reporting (+), associated with tracked threat actor (+), only seen by one low-confidence source (-), known benign infrastructure (-)
TIP ingestion pushes scored indicators into MISP with appropriate tags and context
SIEM push exports high-confidence indicators to Splunk/Elastic lookup tables for real-time detection
Analyst notification sends a daily digest email summarizing new high-priority indicators requiring human review
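The SIEM push in this workflow often amounts to writing a CSV file that the SIEM loads as a lookup table. The following sketch shows that export under assumed column names and record fields; the score threshold and the `export_lookup` helper are hypothetical, and how the file reaches the SIEM (file drop, API upload, scheduled job) depends on your environment.

```python
import csv
import io


def export_lookup(indicators, out, min_score=70):
    """Write indicators scoring at least min_score as CSV rows; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["indicator", "type", "score", "sources"])
    written = 0
    # Highest scores first so a truncated file still holds the best indicators.
    for item in sorted(indicators, key=lambda i: i["score"], reverse=True):
        if item["score"] < min_score:
            continue
        writer.writerow([
            item["value"],
            item["type"],
            item["score"],
            ";".join(sorted(item["sources"])),
        ])
        written += 1
    return written


indicators = [
    {"value": "203.0.113.7", "type": "ip", "score": 85, "sources": {"feed_a", "feed_b"}},
    {"value": "192.0.2.55", "type": "ip", "score": 20, "sources": {"feed_a"}},
]
buffer = io.StringIO()  # stands in for an open file handle
count = export_lookup(indicators, buffer)
print(count)
print(buffer.getvalue())
```

Exporting only high-confidence indicators matters here because detection rules fire on every match: a low threshold turns feed noise directly into alert noise for the SOC.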

Maintaining Automation Over Time

Automation is not "set and forget." Common maintenance requirements:

API changes: Services update their APIs, deprecate endpoints, or change response formats. Monitor changelogs for services you depend on.
Feed quality drift: A feed that was high-quality six months ago may have degraded. Periodically review feed false positive rates and relevance.
Infrastructure updates: Scripts break when the underlying infrastructure changes (Python version, library updates, TIP upgrades). Pin dependency versions and test updates before deploying.
Scaling: As your indicator volume grows, scripts that worked with 100 indicators per day may fail with 10,000. Design for growth from the beginning.
Documentation: Document every automated workflow — what it does, what it depends on, how to troubleshoot it, and who maintains it. When the person who built it leaves, the documentation is all that remains.
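One lightweight guard against silent API changes is a contract check that verifies each response still contains the fields your pipeline reads. The sketch below is an assumption about how a team might do this, not a standard tool; the function names and the dotted-path convention are hypothetical, and the field names in the example are invented for illustration.

```python
import logging

logger = logging.getLogger("enrichment.contract")


def missing_fields(response, expected_paths):
    """Return the dotted paths from expected_paths that are absent in response."""
    missing = []
    for path in expected_paths:
        node = response
        for key in path.split("."):
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                missing.append(path)
                break
    return missing


def check_contract(service, response, expected_paths):
    """Log a warning and return False if the response shape has drifted."""
    missing = missing_fields(response, expected_paths)
    if missing:
        logger.warning("%s response is missing expected fields: %s", service, missing)
    return not missing


# Invented response shape, for illustration only.
response = {"data": {"score": 80, "last_seen": "2024-01-01"}}
print(check_contract("example_service", response, ["data.score", "data.verdict"]))
```

Running a check like this on every response, or on a daily canary query, turns a format change into a logged warning instead of a pipeline that quietly stores empty enrichment results.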

Key Takeaways

Automate collection and processing tasks; preserve analysis and judgment for human analysts
Indicator enrichment is the highest-value automation target for most CTI teams
Build enrichment pipelines that validate, deduplicate, enrich, score, store, and alert
API rate limiting must be designed into automation from the start, not bolted on later
Open-source tools (MISP, OpenCTI, TheHive/Cortex) provide powerful automation capabilities without licensing costs
All automation requires ongoing maintenance — budget time for it or it will degrade

Practical Exercise

Build a Basic Enrichment Script

Using Python (or pseudocode if you prefer), design an enrichment pipeline for IP addresses:

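Input: Create a text file with 10 IP addresses. Use well-known public IPs such as 8.8.8.8 and 1.1.1.1 alongside fictional indicators, and include at least one malformed entry and one private address so the validation step has something to reject.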
Validation: Write a function that validates each IP address format and rejects invalid entries. Use Python's ipaddress module and filter out private (RFC 1918) and reserved addresses.
Enrichment Design: For each valid IP, document which APIs you would query and what data you would extract. Create a data structure (dictionary/JSON) that stores the combined enrichment results.
Scoring Model: Define a simple scoring algorithm. For example: base score of 0; +20 for each source reporting it as malicious; +30 if associated with a known threat actor; -20 if GreyNoise identifies it as a benign scanner; -50 if it belongs to a known CDN or cloud provider.
Output: Design the output format — what would you store in your TIP, what would you push to your SIEM, and what threshold would trigger an analyst notification?
Rate Limit Plan: Given free-tier API limits (see the table above), calculate how many indicators per day your pipeline could process and identify the bottleneck API.