Open Source Intelligence (OSINT) is one of the most accessible and valuable intelligence disciplines available to cyber threat intelligence (CTI) analysts. According to the Office of the Director of National Intelligence (ODNI), OSINT is intelligence produced from publicly available information that is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement. This lesson covers the types of open sources relevant to CTI, the tools used to collect and analyze them, and the critical operational security and ethical considerations that every analyst must understand.
Learning Objectives
- Define OSINT and explain its role within the cyber threat intelligence lifecycle
- Identify the major categories of open sources used in CTI collection
- Describe key OSINT tools and their primary use cases
- Apply operational security (OPSEC) principles during OSINT collection
- Evaluate source reliability and navigate ethical and legal boundaries
What Is OSINT?
OSINT (Open Source Intelligence): Intelligence produced from publicly available information that is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement. — Office of the Director of National Intelligence (ODNI)
OSINT is often the first intelligence discipline a CTI analyst will use. Unlike signals intelligence (SIGINT) or human intelligence (HUMINT), OSINT relies on information that anyone can legally access. However, "publicly available" does not mean "easy to find" or "easy to interpret." The skill of an OSINT analyst lies in knowing where to look, how to collect efficiently, how to validate what is found, and how to synthesize raw data into actionable intelligence.
In the CTI context, OSINT supports work across the intelligence cycle, including identifying emerging threats, tracking threat actor infrastructure, discovering exposed credentials or data leaks, mapping attack surfaces, and validating indicators of compromise (IOCs).
Types of Open Sources
Social Media and Online Communities
Social media platforms (Twitter/X, LinkedIn, Telegram, Discord) are frequently used by threat actors for communication, recruitment, and even selling access to compromised networks. Analysts monitor these platforms to track threat actor personas, identify new campaigns, and gather early warning of emerging threats. Telegram in particular has become a major outlet for ransomware group communications and data leak announcements.
Paste Sites and Data Dumps
Sites like Pastebin, GitHub Gists, and various paste alternatives are commonly used to share stolen credentials, configuration files, and proof-of-compromise data. Monitoring these sites can provide early warning of breaches and expose indicators that feed into detection efforts.
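As a rough illustration of turning a paste into detection input, the sketch below pulls candidate indicators out of raw text. The patterns, the `refang` helper, and the sample values (documentation-range addresses and example.com) are assumptions chosen for illustration; real dumps usually need stricter validation and more indicator types.

```python
import re

# Patterns are deliberately simple; real pastes need more careful validation.
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def refang(text: str) -> str:
    """Undo common defanging such as hxxp:// and [.] so patterns can match."""
    return (
        text.replace("hxxp", "http")
        .replace("[.]", ".")
        .replace("(.)", ".")
        .replace("[@]", "@")
    )


def extract_indicators(paste: str) -> dict[str, list[str]]:
    """Return de-duplicated, sorted candidate indicators found in a paste."""
    clean = refang(paste)
    return {
        "ipv4": sorted(set(IPV4_RE.findall(clean))),
        "sha256": sorted({h.lower() for h in SHA256_RE.findall(clean)}),
        "email": sorted({e.lower() for e in EMAIL_RE.findall(clean)}),
    }
```

Everything this returns is a candidate only; each value still needs validation and a reliability rating before it feeds detection.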
Code Repositories
GitHub, GitLab, and Bitbucket often contain accidentally exposed secrets (API keys, passwords, internal configurations), malware source code, and proof-of-concept exploit code. Threat actors also use repositories to host tooling and command-and-control (C2) infrastructure components.
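To show how exposed secrets are typically spotted, here is a minimal pattern-matching sketch. The three rules are illustrative assumptions, not a complete rule set: the access key pattern follows the widely documented `AKIA`/`ASIA` prefix plus 16 characters, and the password rule is a simple heuristic that will produce false positives.

```python
import re

# Illustrative patterns only; production scanners use far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "hardcoded_password": re.compile(
        r"(?i)\bpassword\s*[:=]\s*['\"][^'\"]{4,}['\"]"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for each line that matches a rule."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, rule))
    return hits
```

The function reports only the line number and rule name, not the matched value, so scan results do not become a second copy of the secret.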
DNS Records and WHOIS Data
DNS records (A, AAAA, MX, TXT, CNAME, NS) reveal how domains resolve and can expose relationships between infrastructure components. WHOIS data, though increasingly redacted since GDPR took effect, can still provide registration dates, registrar information, and occasionally registrant details that help cluster related infrastructure. Historical WHOIS (via services like DomainTools) is particularly valuable for tracking infrastructure changes over time.
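Clustering related infrastructure can be sketched as grouping domains that share registration attributes. The `DomainRecord` type and its fields are hypothetical and assume the WHOIS and DNS data has already been collected and parsed. Matching on registrar, name servers, and creation date is one possible heuristic among many.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainRecord:
    """Hypothetical, already-parsed registration data for one domain."""
    domain: str
    registrar: str
    name_servers: frozenset[str]
    created: str  # ISO date string, e.g. "2024-03-01"


def cluster_by_registration(records: list[DomainRecord]) -> list[list[str]]:
    """Group domains that share registrar, name servers, and creation date."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for rec in records:
        key = (rec.registrar.lower(), rec.name_servers, rec.created)
        groups[key].append(rec.domain)
    # Only groups with more than one domain suggest a possible relationship.
    return [sorted(domains) for domains in groups.values() if len(domains) > 1]
```

A shared registrar and name server set is weak evidence on its own, since large providers host many unrelated customers. Treat a cluster as a lead to corroborate, not as a conclusion.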
Certificate Transparency Logs
Certificate Transparency (CT) logs are publicly auditable records of TLS/SSL certificates issued by certificate authorities. Tools like crt.sh allow analysts to search these logs to discover subdomains, identify infrastructure provisioning patterns, and detect typosquatting or phishing domains that have obtained certificates.
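A subdomain search against CT logs might look like the following sketch. It assumes crt.sh's JSON output (the `output=json` query parameter and the `name_value` field), so verify both against the live service before relying on them. `fetch_subdomains` needs network access, while the parsing function can be exercised offline.

```python
import json
import urllib.parse
import urllib.request


def crtsh_url(domain: str) -> str:
    """Build a crt.sh JSON query for certificates covering *.domain."""
    query = urllib.parse.quote(f"%.{domain}")
    return f"https://crt.sh/?q={query}&output=json"


def subdomains_from_records(records: list[dict], domain: str) -> list[str]:
    """Collect unique host names under `domain` from crt.sh-style records."""
    names = set()
    for rec in records:
        # name_value may hold several names separated by newlines.
        for name in rec.get("name_value", "").splitlines():
            name = name.strip().lower().removeprefix("*.")
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


def fetch_subdomains(domain: str, timeout: float = 30.0) -> list[str]:
    """Query crt.sh (network access required) and return discovered names."""
    with urllib.request.urlopen(crtsh_url(domain), timeout=timeout) as resp:
        return subdomains_from_records(json.load(resp), domain)
```

The suffix check requires a leading dot, so a lookalike such as `evil-example.com` is not mistaken for a subdomain of `example.com`.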
Government and Regulatory Sources
CISA advisories, FBI flash alerts, NIST's National Vulnerability Database (NVD), and international CERT publications provide vetted, authoritative intelligence. These are high-reliability sources that should be incorporated into any CTI program's collection plan.
News and Industry Reports
Vendor threat reports (Mandiant, CrowdStrike, Recorded Future, Cisco Talos, Microsoft Threat Intelligence), security news outlets, and academic research papers provide analyzed intelligence that can be cross-referenced against internal observations.
Key OSINT Tools for CTI
| Tool | Primary Use | What It Provides |
|---|---|---|
| Shodan | Internet-connected device search | Exposed services, banners, vulnerabilities, device metadata |
| Censys | Internet asset discovery | Certificate data, host configurations, protocol information |
| VirusTotal | File and URL analysis | Multi-engine scan results, behavioral analysis, community comments, relationship graphs |
| urlscan.io | URL investigation | Screenshots, DOM snapshots, network requests, resource hashes |
| AlienVault OTX | Threat intelligence sharing | Community-sourced IOC pulses, related indicators, adversary tracking |
| crt.sh | Certificate transparency search | Issued certificates by domain, subdomain discovery |
| Wayback Machine | Historical web content | Archived versions of web pages, useful for tracking changes to threat actor sites |
| WHOIS lookup tools | Domain registration data | Registrant info, registration dates, name servers |
| SpiderFoot | Automated OSINT collection | Multi-source correlation across dozens of data sources |
Using These Tools Effectively
No single tool provides a complete picture. Effective OSINT collection requires combining multiple sources and cross-referencing findings. For example, when investigating a suspicious domain:
- Check WHOIS for registration details and dates
- Search crt.sh for related certificates and subdomains
- Query Shodan or Censys for services running on the resolved IP
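- Search urlscan.io for existing scans of the URL; if none exist, submit it with restricted visibility to get a rendered preview without loading the page yourself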
- Check VirusTotal for community detections and behavioral data
- Search AlienVault OTX for related threat intelligence pulses
This layered approach builds a more complete and reliable picture than any single query.
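Cross-referencing can be made concrete by recording which sources reported each indicator and keeping only those seen by more than one. This is a minimal sketch: the `findings` structure, the source names used as keys, and the default threshold of two are assumptions for illustration.

```python
from collections import defaultdict


def corroborated(
    findings: dict[str, set[str]], minimum: int = 2
) -> dict[str, list[str]]:
    """Map each indicator to the sources that reported it, keeping only
    indicators seen by at least `minimum` sources."""
    seen_by: dict[str, set[str]] = defaultdict(set)
    for source, indicators in findings.items():
        for indicator in indicators:
            seen_by[indicator].add(source)
    return {
        indicator: sorted(sources)
        for indicator, sources in sorted(seen_by.items())
        if len(sources) >= minimum
    }
```

Counting sources only helps when they are independent. Two services that republish the same upstream feed count as one observation.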
OPSEC Considerations During Collection
Operational security during OSINT collection is critical. Careless collection can alert adversaries, compromise ongoing investigations, or expose your organization's intelligence interests.
Key OPSEC Principles
- Do not interact directly with threat actor infrastructure from attributable networks. Use VPNs, Tor, or dedicated research infrastructure to avoid exposing your organization's IP addresses.
- Use dedicated research accounts that are not linked to your real identity or your organization. Never use personal or corporate accounts for threat actor engagement.
- Be aware of tracking pixels, canary tokens, and honeypots. Threat actors sometimes plant these in leaked documents or paste site dumps to identify who is investigating them.
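- Use passive collection methods when possible. Services like Shodan, Censys, and VirusTotal have already scanned the infrastructure, so searching their existing records does not generate new traffic to the target. Submitting a new URL or file for analysis is different: it triggers a fresh scan and may make the sample visible to other users of the service.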
- Document your collection methods. If intelligence may be used in legal proceedings or shared with law enforcement, the collection methodology matters.
- Separate research environments. Use virtual machines or dedicated hardware for OSINT research to prevent accidental cross-contamination with operational systems.
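One way to enforce the first principle is a pre-flight check that refuses to proceed when the current egress address belongs to the organization. The sketch below uses Python's standard `ipaddress` module. The ranges are hypothetical placeholders taken from documentation address space, and how the egress address is obtained is left to the research environment.

```python
import ipaddress

# Hypothetical corporate ranges, using addresses reserved for documentation.
ATTRIBUTABLE_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_attributable(egress_ip: str, ranges=ATTRIBUTABLE_RANGES) -> bool:
    """Return True if the egress address falls inside an organizational range."""
    address = ipaddress.ip_address(egress_ip)
    return any(address in network for network in ranges)
```

A research script could call `is_attributable` before any active step and stop when it returns `True`.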
Source Reliability Evaluation
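Not all open sources are equally trustworthy. Analysts must evaluate both the reliability of the source and the credibility of the information it provides. The NATO Admiralty System (also called the 6x6 system) is a widely used framework. Each rating pairs a letter for source reliability with a number for information credibility, and the two are assessed independently, so a usually reliable source reporting a doubtful claim is rated B4: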
| Rating | Source Reliability | Information Credibility |
|---|---|---|
| A / 1 | Completely reliable | Confirmed by other sources |
| B / 2 | Usually reliable | Probably true |
| C / 3 | Fairly reliable | Possibly true |
| D / 4 | Not usually reliable | Doubtful |
| E / 5 | Unreliable | Improbable |
| F / 6 | Cannot be judged | Cannot be judged |
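The table above can be encoded as two independent lookups, which makes ratings easy to validate and expand in reports. This is a minimal sketch; the function name and output wording are illustrative choices, while the labels are taken directly from the table.

```python
SOURCE_RELIABILITY = {
    "A": "Completely reliable", "B": "Usually reliable", "C": "Fairly reliable",
    "D": "Not usually reliable", "E": "Unreliable", "F": "Cannot be judged",
}
INFO_CREDIBILITY = {
    "1": "Confirmed by other sources", "2": "Probably true", "3": "Possibly true",
    "4": "Doubtful", "5": "Improbable", "6": "Cannot be judged",
}


def describe_rating(code: str) -> str:
    """Expand a two-character rating such as 'B2' into its two labels."""
    code = code.strip().upper()
    if (
        len(code) != 2
        or code[0] not in SOURCE_RELIABILITY
        or code[1] not in INFO_CREDIBILITY
    ):
        raise ValueError(f"not a valid Admiralty rating: {code!r}")
    source = SOURCE_RELIABILITY[code[0]].lower()
    info = INFO_CREDIBILITY[code[1]].lower()
    return f"{code}: source {source}, information {info}"
```

For example, `describe_rating("b2")` yields `B2: source usually reliable, information probably true`.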
When evaluating OSINT sources, consider:
- Provenance: Where did this information originate? Is the source a known security vendor, an anonymous paste, or a social media post?
- Corroboration: Can the information be confirmed through independent sources?
- Timeliness: How current is the information? Stale IOCs can generate false positives.
- Motivation: Does the source have a reason to mislead? Threat actors sometimes plant false flags or disinformation.
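The timeliness check lends itself to a simple age rule. In the sketch below, the review windows per indicator type are assumed values for illustration and should be tuned to your own environment. The reasoning behind them is that IP addresses tend to change hands faster than domains, and file hashes do not change at all.

```python
from datetime import date

# Assumed review windows in days; tune these to your own environment.
MAX_AGE_DAYS = {"ipv4": 30, "domain": 90, "sha256": 365}


def is_stale(indicator_type: str, last_seen: date, today: date) -> bool:
    """Return True when the last sighting is older than the type's window."""
    age_days = (today - last_seen).days
    return age_days > MAX_AGE_DAYS[indicator_type]
```

A stale result is a prompt to re-validate the indicator before alerting on it, not a reason to discard it automatically.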
Ethical and Legal Considerations
OSINT collection operates within legal boundaries, but those boundaries vary by jurisdiction and context.
- Legality: Just because information is publicly accessible does not mean collecting it is legal in all jurisdictions. Privacy laws such as GDPR (EU), CCPA (California), and various computer fraud statutes may restrict what you can collect and how you can use it.
- Terms of Service: Many platforms prohibit scraping or automated collection. Violating ToS may not be criminal, but it can result in account bans and, in some cases, civil liability.
- Ethical boundaries: Avoid collecting information that is not relevant to your intelligence requirement. Social engineering, impersonation, and deception are active engagement techniques and have no place in passive OSINT collection.
- Data handling: Collected OSINT may contain personally identifiable information (PII). Handle this data according to your organization's data protection policies and applicable regulations.
- Attribution caution: OSINT alone rarely provides sufficient evidence for definitive attribution. Present findings with appropriate confidence language and avoid making claims that exceed what the evidence supports.
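For the data handling point, one common mitigation is to mask personal identifiers before collected material is stored or shared. The sketch below redacts only the local part of email addresses and keeps the domain, which is often the part relevant to the investigation. The pattern and the `[redacted]` placeholder are assumptions, and email is only one of several PII types a real policy would cover.

```python
import re

# Capture the domain so it can be kept while the local part is masked.
EMAIL_RE = re.compile(r"\b[\w.+-]+@([\w-]+(?:\.[\w-]+)+)\b")


def redact_emails(text: str) -> str:
    """Replace the local part of each email address, keeping the domain."""
    return EMAIL_RE.sub(r"[redacted]@\1", text)
```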
Key Takeaways
- OSINT is intelligence derived from publicly available information and is foundational to CTI operations
- Open sources span social media, paste sites, code repos, DNS/WHOIS, certificate transparency logs, government advisories, and vendor reports
- Effective OSINT requires combining multiple tools and cross-referencing findings — no single source is sufficient
- OPSEC during collection is essential: use passive methods, dedicated infrastructure, and research personas
- Evaluate every source for reliability and credibility using structured frameworks like the Admiralty System
- Legal and ethical boundaries must be understood and respected — "publicly available" does not mean "anything goes"
Practical Exercise
Infrastructure Investigation Exercise:
Pick a known-malicious domain from a recent CISA advisory (visit cisa.gov/news-events/cybersecurity-advisories) and perform a passive investigation:
- Look up the domain's WHOIS registration data — note the registration date, registrar, and any available registrant information
- Search crt.sh for any certificates issued to the domain or its subdomains
- Query VirusTotal for the domain — review detections, passive DNS history, and any associated files
- Search Shodan for the IP address the domain resolves to (take the address from passive DNS history rather than resolving the domain yourself) and document open ports and services
- Check AlienVault OTX for any threat intelligence pulses that reference the domain
Compile your findings into a brief summary (one paragraph) that answers: What can you assess about this infrastructure based solely on open sources? Rate the reliability of each source you used.
Further Reading
- Open Source Intelligence Techniques by Michael Bazzell — comprehensive reference on OSINT methodology and tools (regularly updated editions)
- NIST SP 800-150: Guide to Cyber Threat Information Sharing — framework for sharing and consuming threat intelligence, including OSINT (https://csrc.nist.gov/publications/detail/sp/800-150/final)
- ODNI Open Source Center — the U.S. Intelligence Community's primary OSINT organization and its publicly available guidance
- NATO Open Source Intelligence Handbook — foundational doctrine on OSINT collection, processing, and dissemination within a military/intelligence context