Open Source Intelligence (OSINT) for CTI — CTI Academy

Open Source Intelligence (OSINT) is one of the most accessible and valuable intelligence disciplines available to cyber threat intelligence analysts. According to the Office of the Director of National Intelligence (ODNI), OSINT is intelligence produced from publicly available information that is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement. This lesson covers the types of open sources relevant to CTI, the tools used to collect and analyze them, and the critical operational security and ethical considerations that every analyst must understand.

Learning Objectives

Define OSINT and explain its role within the cyber threat intelligence lifecycle
Identify the major categories of open sources used in CTI collection
Describe key OSINT tools and their primary use cases
Apply operational security (OPSEC) principles during OSINT collection
Evaluate source reliability and navigate ethical and legal boundaries

What Is OSINT?

OSINT (Open Source Intelligence): Intelligence produced from publicly available information that is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement. — Office of the Director of National Intelligence (ODNI)

OSINT is often the first intelligence discipline a CTI analyst will use. Unlike signals intelligence (SIGINT) or human intelligence (HUMINT), OSINT relies on information that anyone can legally access. However, "publicly available" does not mean "easy to find" or "easy to interpret." The skill of an OSINT analyst lies in knowing where to look, how to collect efficiently, how to validate what is found, and how to synthesize raw data into actionable intelligence.

In the CTI context, OSINT supports work across the intelligence cycle, including identifying emerging threats, tracking threat actor infrastructure, discovering exposed credentials or data leaks, mapping attack surfaces, and validating indicators of compromise (IOCs).

Types of Open Sources

Social Media and Online Communities

Social media platforms (Twitter/X, LinkedIn, Telegram, Discord) are frequently used by threat actors for communication, recruitment, and even selling access to compromised networks. Analysts monitor these platforms to track threat actor personas, identify new campaigns, and gather early warnings about emerging threats. Telegram channels in particular have become a major channel for ransomware group communications and data leak announcements.

Paste Sites and Data Dumps

Sites like Pastebin, GitHub Gists, and various paste alternatives are commonly used to share stolen credentials, configuration files, and proof-of-compromise data. Monitoring these sites can provide early warning of breaches and expose indicators that feed into detection efforts.
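
A minimal sketch of the paste-monitoring idea, assuming the paste text has already been retrieved by some other collection process. The function name `find_exposed_accounts`, the `user@domain:password` line format, and the monitored domain are illustrative assumptions, not part of any paste site's API. The function returns only the affected addresses and deliberately discards the passwords.

```python
import re

# Matches "user@domain<sep>password" lines, a layout often seen in credential dumps.
# The separator set (: ; |) is an assumption; real dumps vary widely.
CRED_LINE = re.compile(
    r"^\s*([A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,}))[:;|](\S+)\s*$"
)


def find_exposed_accounts(paste_text, monitored_domains):
    """Return sorted, unique addresses from credential-like lines on monitored domains."""
    monitored = {d.lower() for d in monitored_domains}
    hits = set()
    for line in paste_text.splitlines():
        match = CRED_LINE.match(line)
        if match and match.group(2).lower() in monitored:
            # Keep the address only; the password is never stored or returned.
            hits.add(match.group(1).lower())
    return sorted(hits)
```

A hit from a sketch like this is a lead, not a confirmed breach: the dump may be old, recycled, or fabricated, so it still needs validation.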

Code Repositories

GitHub, GitLab, and Bitbucket often contain accidentally exposed secrets (API keys, passwords, internal configurations), malware source code, and proof-of-concept exploit code. Threat actors also use repositories to host tooling and command-and-control (C2) infrastructure components.
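
As a rough illustration of how exposed secrets can be spotted in repository content, the sketch below scans text for a few well-known patterns. The pattern set and the `scan_text` helper are assumptions chosen for brevity; dedicated secret scanners use far larger rule sets and entropy checks.

```python
import re

# Illustrative patterns only. AKIA/ASIA are the documented prefixes of AWS
# access key IDs; the other two rules are simple heuristics.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "generic_assignment": re.compile(
        r"\b(?:api_key|secret|password)\s*[:=]\s*['\"]([^'\"]{8,})['\"]",
        re.IGNORECASE,
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every line matching a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

The function reports line numbers and rule names rather than the matched values, so the output can be shared without copying the secret itself.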

DNS Records and WHOIS Data

DNS records (A, AAAA, MX, TXT, CNAME, NS) reveal how domains resolve and can expose relationships between infrastructure components. WHOIS data — though increasingly redacted due to GDPR — can still provide registration dates, registrar information, and occasionally registrant details that help cluster related infrastructure. Historical WHOIS (via services like DomainTools) is particularly valuable for tracking infrastructure changes over time.
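
The clustering idea can be sketched as follows, assuming DNS records have already been collected (for example from a passive DNS export) as `(domain, record_type, value)` tuples. The tuple layout and the `cluster_by_shared_value` name are hypothetical and chosen for this example.

```python
from collections import defaultdict


def cluster_by_shared_value(records):
    """Group domains that share a record value, such as an A record IP or NS host.

    `records` is an iterable of (domain, record_type, value) tuples.
    Returns {(record_type, value): sorted domains} for values seen on 2+ domains.
    """
    index = defaultdict(set)
    for domain, rtype, value in records:
        # Normalize case and trailing dots so equivalent names compare equal.
        key = (rtype.upper(), value.lower().rstrip("."))
        index[key].add(domain.lower().rstrip("."))
    return {key: sorted(doms) for key, doms in index.items() if len(doms) > 1}
```

A shared value is only a hint. Domains on shared hosting or a popular DNS provider will cluster together without being related, so overlaps should be weighed alongside registration dates and other evidence.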

Certificate Transparency Logs

Certificate Transparency (CT) logs are publicly auditable records of TLS/SSL certificates issued by certificate authorities. Tools like crt.sh allow analysts to search these logs to discover subdomains, identify infrastructure provisioning patterns, and detect typosquatting or phishing domains that have obtained certificates.
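
A short sketch of subdomain discovery from CT data, assuming crt.sh's JSON output (`output=json`) and its `name_value` field, which can hold several newline-separated names per certificate. The helper names are illustrative, and the field layout should be checked against a live response before relying on it.

```python
import json
import urllib.parse
import urllib.request


def crtsh_url(domain):
    """Build a crt.sh query URL matching certificates for subdomains of `domain`."""
    query = urllib.parse.quote(f"%.{domain}")  # "%" is the crt.sh wildcard
    return f"https://crt.sh/?q={query}&output=json"


def extract_hostnames(records, domain):
    """Return sorted, unique hostnames under `domain` from crt.sh-style records."""
    domain = domain.lower()
    suffix = "." + domain
    names = set()
    for rec in records:
        for name in rec.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # treat a wildcard entry as its parent name
            if name == domain or name.endswith(suffix):
                names.add(name)
    return sorted(names)


def fetch_hostnames(domain, timeout=30):
    """Query crt.sh over the network and return the hostnames it lists."""
    with urllib.request.urlopen(crtsh_url(domain), timeout=timeout) as resp:
        return extract_hostnames(json.load(resp), domain)
```

Calling `fetch_hostnames("example.com")` sends a request to crt.sh only, not to the domain under investigation, which keeps the lookup passive with respect to the target.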

Government and Regulatory Sources

CISA advisories, FBI FLASH alerts, NIST's National Vulnerability Database (NVD), and international CERT publications provide vetted, authoritative intelligence. These are high-reliability sources that should be incorporated into any CTI program's collection plan.

News and Industry Reports

Vendor threat reports (Mandiant, CrowdStrike, Recorded Future, Cisco Talos, Microsoft Threat Intelligence), security news outlets, and academic research papers provide analyzed intelligence that can be cross-referenced against internal observations.

Key OSINT Tools for CTI

Tool	Primary Use	What It Provides
Shodan	Internet-connected device search	Exposed services, banners, vulnerabilities, device metadata
Censys	Internet asset discovery	Certificate data, host configurations, protocol information
VirusTotal	File and URL analysis	Multi-engine scan results, behavioral analysis, community comments, relationship graphs
urlscan.io	URL investigation	Screenshots, DOM snapshots, network requests, resource hashes
AlienVault OTX	Threat intelligence sharing	Community-sourced IOC pulses, related indicators, adversary tracking
crt.sh	Certificate transparency search	Issued certificates by domain, subdomain discovery
Wayback Machine	Historical web content	Archived versions of web pages, useful for tracking changes to threat actor sites
WHOIS lookup tools	Domain registration data	Registrant info, registration dates, name servers
SpiderFoot	Automated OSINT collection	Multi-source correlation across dozens of data sources

Using These Tools Effectively

No single tool provides a complete picture. Effective OSINT collection requires combining multiple sources and cross-referencing findings. For example, when investigating a suspicious domain:

Check WHOIS for registration details and dates
Search crt.sh for related certificates and subdomains
Query Shodan or Censys for services running on the resolved IP
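Search urlscan.io for existing scans of the URL, or submit it for a rendered preview without visiting the site yourself (a new scan actively contacts the site and is public by default)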
Check VirusTotal for community detections and behavioral data
Search AlienVault OTX for related threat intelligence pulses

This layered approach builds a more complete and reliable picture than any single query.
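
Cross-referencing can be sketched as a simple tally of which sources reported each indicator. The `(source_name, indicator)` input format and the `corroborated` helper are assumptions for illustration; the findings themselves would come from the manual or automated lookups listed above.

```python
from collections import defaultdict


def corroborated(findings, min_sources=2):
    """Return {indicator: sorted source names} for indicators reported by
    at least `min_sources` distinct sources.

    `findings` is an iterable of (source_name, indicator) pairs.
    """
    seen = defaultdict(set)
    for source, indicator in findings:
        # Normalize so "Evil.example" and "evil.example " count as one indicator.
        seen[indicator.strip().lower()].add(source)
    return {
        ioc: sorted(sources)
        for ioc, sources in seen.items()
        if len(sources) >= min_sources
    }
```

Counting sources is only a starting point. Two feeds that republish the same original report are not independent, so an analyst still has to judge whether the agreement is genuine corroboration.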

OPSEC Considerations During Collection

Operational security during OSINT collection is critical. Careless collection can alert adversaries, compromise ongoing investigations, or expose your organization's intelligence interests.

Key OPSEC Principles

Do not interact directly with threat actor infrastructure from attributable networks. Use VPNs, Tor, or dedicated research infrastructure to avoid exposing your organization's IP addresses.
Use dedicated research accounts that are not linked to your real identity or your organization. Never use personal or corporate accounts for threat actor engagement.
Be aware of tracking pixels, canary tokens, and honeypots. Threat actors sometimes plant these in leaked documents or paste site dumps to identify who is investigating them.
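Use passive collection methods when possible. Services like Shodan and Censys already scan infrastructure, and VirusTotal retains results from earlier submissions, so querying their databases does not generate new traffic to the target. Submitting a new URL or file for analysis is not passive: it can trigger a fresh request to the target or share the sample with other users of the service.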
Document your collection methods. If intelligence may be used in legal proceedings or shared with law enforcement, the collection methodology matters.
Separate research environments. Use virtual machines or dedicated hardware for OSINT research to prevent accidental cross-contamination with operational systems.
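
One way to reduce exposure to tracking pixels is to inspect a leaked HTML document as plain text before rendering it. The sketch below lists resources a viewer would fetch automatically. The `RemoteResourceFinder` class and its tag list are illustrative assumptions and do not cover every technique, such as CSS `url()` references or remote templates in office documents.

```python
from html.parser import HTMLParser


class RemoteResourceFinder(HTMLParser):
    """Collect URLs that a viewer would load automatically when rendering."""

    # (tag, attribute) pairs that typically trigger a request on render.
    AUTO_LOAD_ATTRS = {
        ("img", "src"),
        ("script", "src"),
        ("iframe", "src"),
        ("link", "href"),
    }
    REMOTE_PREFIXES = ("http://", "https://", "//")

    def __init__(self):
        super().__init__()
        self.remote_urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) not in self.AUTO_LOAD_ATTRS or not value:
                continue
            if value.lower().startswith(self.REMOTE_PREFIXES):
                self.remote_urls.append((tag, value))


def find_remote_resources(html_text):
    """Return (tag, url) pairs for remote resources referenced in the HTML."""
    finder = RemoteResourceFinder()
    finder.feed(html_text)
    return finder.remote_urls
```

Ordinary hyperlinks are ignored because they are only fetched when clicked. Even with an empty result, opening untrusted files is best done inside the isolated research environment described above.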

Source Reliability Evaluation

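Not all open sources are equally trustworthy. Analysts must evaluate both the reliability of the source and the credibility of the information it provides. The NATO Admiralty System (also called the 6x6 system) is a widely used framework. It rates source reliability with a letter (A to F) and information credibility with a number (1 to 6); the two are judged independently, so a rating such as B3 or C1 is valid: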

Rating	Source Reliability	Information Credibility
A / 1	Completely reliable	Confirmed by other sources
B / 2	Usually reliable	Probably true
C / 3	Fairly reliable	Possibly true
D / 4	Not usually reliable	Doubtful
E / 5	Unreliable	Improbable
F / 6	Cannot be judged	Cannot be judged
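
The rating scheme in the table can be encoded as two lookup tables, which makes it easy to attach a rating such as "B2" to each finding in a report. This is a minimal sketch; the `describe_rating` helper is a hypothetical name, and the labels are copied from the table above.

```python
SOURCE_RELIABILITY = {
    "A": "Completely reliable",
    "B": "Usually reliable",
    "C": "Fairly reliable",
    "D": "Not usually reliable",
    "E": "Unreliable",
    "F": "Cannot be judged",
}

INFO_CREDIBILITY = {
    1: "Confirmed by other sources",
    2: "Probably true",
    3: "Possibly true",
    4: "Doubtful",
    5: "Improbable",
    6: "Cannot be judged",
}


def describe_rating(code):
    """Expand a two-character rating such as 'B2' into its two judgments."""
    code = code.strip().upper()
    valid = (
        len(code) == 2
        and code[0] in SOURCE_RELIABILITY
        and code[1].isdigit()
        and int(code[1]) in INFO_CREDIBILITY
    )
    if not valid:
        raise ValueError(f"not a valid Admiralty rating: {code!r}")
    return SOURCE_RELIABILITY[code[0]], INFO_CREDIBILITY[int(code[1])]
```

For example, `describe_rating("B2")` returns `("Usually reliable", "Probably true")`. Because the letter and number are looked up separately, any combination such as "A4" or "E1" is accepted.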

When evaluating OSINT sources, consider:

Provenance: Where did this information originate? Is the source a known security vendor, an anonymous paste, or a social media post?
Corroboration: Can the information be confirmed through independent sources?
Timeliness: How current is the information? Stale IOCs can generate false positives.
Motivation: Does the source have a reason to mislead? Threat actors sometimes plant false flags or disinformation.
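
The timeliness check can be sketched as a simple age filter. The 90-day default and the `(indicator, last_seen)` input format are assumptions for illustration; a suitable cutoff depends on the indicator type, since IP addresses usually go stale faster than file hashes.

```python
from datetime import datetime, timedelta, timezone


def split_by_age(iocs, max_age_days=90, now=None):
    """Split indicators into (fresh, stale) lists based on when they were last seen.

    `iocs` is an iterable of (indicator, last_seen) pairs, where `last_seen`
    is a timezone-aware datetime.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    fresh, stale = [], []
    for indicator, last_seen in iocs:
        (fresh if last_seen >= cutoff else stale).append(indicator)
    return fresh, stale
```

Stale indicators are separated rather than discarded, so they remain available for historical analysis while being kept out of live detection rules.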

Ethical and Legal Considerations

OSINT collection operates within legal boundaries, but those boundaries vary by jurisdiction and context.

Legality: Just because information is publicly accessible does not mean collecting it is legal in all jurisdictions. Privacy laws such as GDPR (EU), CCPA (California), and various computer fraud statutes may restrict what you can collect and how you can use it.
Terms of Service: Many platforms prohibit scraping or automated collection. Violating ToS may not be criminal, but it can result in account bans and, in some cases, civil liability.
Ethical boundaries: Avoid collecting information that is not relevant to your intelligence requirement. Social engineering, impersonation, and deception are not part of passive OSINT collection, so do not engage in them.
Data handling: Collected OSINT may contain personally identifiable information (PII). Handle this data according to your organization's data protection policies and applicable regulations.
Attribution caution: OSINT alone rarely provides sufficient evidence for definitive attribution. Present findings with appropriate confidence language and avoid making claims that exceed what the evidence supports.

Key Takeaways

OSINT is intelligence derived from publicly available information and is foundational to CTI operations
Open sources span social media, paste sites, code repos, DNS/WHOIS, certificate transparency logs, government advisories, and vendor reports
Effective OSINT requires combining multiple tools and cross-referencing findings — no single source is sufficient
OPSEC during collection is essential: use passive methods, dedicated infrastructure, and research personas
Evaluate every source for reliability and credibility using structured frameworks like the Admiralty System
Legal and ethical boundaries must be understood and respected — "publicly available" does not mean "anything goes"

Practical Exercise

Infrastructure Investigation Exercise:

Pick a known-malicious domain from a recent CISA advisory (visit cisa.gov/news-events/cybersecurity-advisories) and perform a passive investigation:

Look up the domain's WHOIS registration data — note the registration date, registrar, and any available registrant information
Search crt.sh for any certificates issued to the domain or its subdomains
Query VirusTotal for the domain — review detections, passive DNS history, and any associated files
Search Shodan for the IP address the domain resolves to — document open ports and services
Check AlienVault OTX for any threat intelligence pulses that reference the domain

Compile your findings into a brief summary (one paragraph) that answers: What can you assess about this infrastructure based solely on open sources? Rate the reliability of each source you used.