Cloudflare Mythos Preview Experience — 8-Stage AI Vulnerability Discovery Pipeline — AI Pathfinder by Jason Fleagle

Cloudflare just published one of the clearest signals yet about where AI-powered cybersecurity is heading.

Not toward a chatbot. Not toward a generic coding agent. Not toward a single magic model pointed at a repository with the instruction: “find vulnerabilities.”

Cloudflare’s Project Glasswing results point to something more important: The model is not the product. The harness is the product. And speed alone is not the strategy.

On May 18, 2026, Cloudflare shared what it learned after testing Anthropic’s Mythos Preview across more than fifty of its own repositories as part of Project Glasswing. The results were not just about speed. They were about architecture.

Mythos Preview showed a meaningful jump in two areas that matter deeply for real security work: exploit chain construction and proof generation. The model was not only finding suspicious code — it was reasoning across multiple small primitives, connecting them into working exploit paths, writing proof code, compiling it, running it, learning from failure, and trying again. That is a very different kind of security tool. And it changes the job of the defender.

Breaking This Down: Why Generic Agents Are the Wrong Shape

Cloudflare’s biggest lesson was simple: pointing a generic coding agent at a large repository does not work at enterprise scale. It will produce findings. It may even produce useful findings. But it will not produce meaningful coverage across a real codebase.

The reason is structural. Vulnerability research is not one long stream of work. It is many narrow investigations running in parallel. A strong human researcher does not ask, “Is this entire hundred-thousand-line repository secure?” They ask narrower questions:

  • Can attacker-controlled input reach this parser?
  • Does this trust boundary actually hold?
  • Can this memory safety bug become reachable from the network?
  • Does this low-severity issue become high-severity when chained with another primitive?

Then they do that again. And again. Across thousands of files, functions, attack classes, boundaries, libraries, and consumers. That is not a chat session. That is a system. Cloudflare built that system.
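
To make the "many narrow investigations" idea concrete, here is a minimal sketch of how a harness might expand a repository into small, independent tasks. The `HuntTask` type, the `build_tasks` helper, and the file paths are hypothetical names chosen for illustration; nothing here is taken from Cloudflare's implementation.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class HuntTask:
    target: str        # file or component under review (hypothetical paths below)
    attack_class: str  # the single vulnerability class this task looks for
    question: str      # the narrow question handed to one agent


def build_tasks(targets, attack_classes):
    """Expand every (target, attack class) pair into one narrow task."""
    return [
        HuntTask(t, a, f"Can attacker-controlled input trigger {a} in {t}?")
        for t, a in product(targets, attack_classes)
    ]


tasks = build_tasks(
    ["parser/http.c", "auth/session.c"],
    ["memory corruption", "trust boundary bypass"],
)
for task in tasks:
    print(task.question)
```

Two targets and two attack classes already produce four separate questions. Each one is small enough for a single agent to investigate in isolation, which is what makes parallel execution possible.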

The Harness Is the Product: Cloudflare’s 8-Stage Pipeline

Cloudflare described a vulnerability discovery harness with multiple stages, each with a specific job:

Stage      Function
Recon      Gives downstream agents shared context about the codebase.
Hunt       Runs many narrow vulnerability tasks in parallel.
Validate   Brings in an independent agent to try to disprove the finding.
Gapfill    Re-queues weakly covered areas for additional investigation.
Dedupe     Collapses duplicate root causes into single findings.
Trace      Decides whether attacker-controlled input can actually reach the flaw.
Feedback   Turns reachable traces into new tasks for the next cycle.
Report     Turns results into structured, queryable data, not free-form prose.

That last point matters more than it sounds. Security teams do not need more impressive paragraphs. They need queryable, validated, prioritized, reproducible findings. They need a pipeline. They need controls. They need evidence. They need triage that can survive scale.
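
One rough way to picture the harness is as an ordered set of stage handlers that pass shared state along. The sketch below assumes the stages run strictly in the order listed, which is a simplification. The `run_cycle` function and the `passthrough` placeholder are hypothetical and stand in for real agent or tool calls.

```python
# Stage names follow the order in which the article lists them.
STAGES = ["recon", "hunt", "validate", "gapfill",
          "dedupe", "trace", "feedback", "report"]


def run_cycle(state, handlers):
    """Pass shared state through each stage in order and record what ran."""
    state = dict(state, log=[])
    for name in STAGES:
        state = handlers[name](state)
        state["log"].append(name)
    return state


def passthrough(state):
    """Placeholder stage: a real handler would call agents or tools here."""
    return dict(state)


handlers = {name: passthrough for name in STAGES}
result = run_cycle({"repo": "example-service"}, handlers)
print(result["log"])
```

Keeping each stage behind its own handler means one stage, such as Validate, can be swapped or tightened without touching the rest of the cycle.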

Shortly after Cloudflare published its post, Simone Margaritelli released a public GitHub project called evilsocket/audit — a from-scratch reimplementation of the eight-stage vulnerability-discovery pipeline described by Cloudflare. The repo is not Mythos. It is not magic. And it should not be run casually against sensitive code without a controlled environment. But it matters because it turns the Cloudflare architecture into something security teams can study, test, and adapt.

What Mythos Changed: Chaining Low-Severity Bugs Into Real Exploits

Cloudflare said Mythos Preview was a real step forward from previous general-purpose frontier models. The model could do something earlier models often struggled to finish: it could stitch pieces together.

Other models might identify an interesting bug, explain why it mattered, and then stop before proving exploitability. Mythos Preview could take low-severity bugs that might otherwise sit invisible in a backlog and chain them into a more severe exploit. That is the scary part — not because defenders can do this, but because attackers eventually will.

The same pattern is showing up across the industry:

  • Anthropic’s Project Glasswing: Anthropic reported that Mythos Preview found thousands of zero-day vulnerabilities across major operating systems, browsers, and critical software, working at a level that could surpass all but the most skilled human researchers.
  • Palo Alto Networks: After using Anthropic’s Mythos and OpenAI’s frontier cyber models, Palo Alto reported 26 CVEs representing 75 issues across more than 130 products — compared with its usual volume of fewer than 5 CVEs in a month.
  • Microsoft’s MDASH: A multi-model agentic scanning harness with more than 100 specialized agents across preparation, scanning, validation, deduplication, proof, and remediation.

Different companies. Same pattern. The future of AI security is not one model doing everything. It is specialized agents, structured workflows, independent validation, evidence generation, and human-controlled remediation.

The Signal-to-Noise Problem: Why More Findings Isn’t the Goal

AI vulnerability discovery has a noise problem. Cloudflare called this out directly. Models are biased toward finding something. Ask a model to find bugs, and it will often return speculative findings wrapped in language like “possibly,” “potentially,” or “could in theory.” That is useful for exploration. It is expensive for triage.

Every false positive consumes human attention. Every speculative report burns time. Every low-confidence finding adds friction to a security team that is already overloaded. Cloudflare’s answer was not to trust the model more — it was to design a better system around it:

  • Add adversarial review.
  • Separate “is this code buggy?” from “can an attacker actually reach it?”
  • Run narrow tasks in parallel.
  • Use independent validation.
  • Collapse duplicates.
  • Trace reachability.
  • Turn reports into structured data.
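
As an illustration of the "collapse duplicates" step, the sketch below groups findings by a root-cause key. Using file, function, and bug class as the key is an assumption made for this example; a real harness would likely need a richer notion of root cause.

```python
from collections import defaultdict


def dedupe(findings):
    """Collapse findings that share a root-cause key into one entry."""
    groups = defaultdict(list)
    for f in findings:
        # Assumed root-cause key: same file, same function, same bug class.
        key = (f["file"], f["function"], f["bug_class"])
        groups[key].append(f)
    return [
        {"file": k[0], "function": k[1], "bug_class": k[2], "reports": len(v)}
        for k, v in groups.items()
    ]


raw = [
    {"file": "parser.c", "function": "read_len", "bug_class": "overflow"},
    {"file": "parser.c", "function": "read_len", "bug_class": "overflow"},
    {"file": "auth.c", "function": "check", "bug_class": "bypass"},
]
unique = dedupe(raw)
print(len(raw), "raw findings ->", len(unique), "unique root causes")
```

Three raw reports become two entries, and the `reports` count preserves how many agents independently hit the same root cause.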

The most important stage may be Trace — which asks whether attacker-controlled input can actually reach the bug from outside the system. That is the difference between “there is a flaw” and “there is a reachable vulnerability.” That distinction is going to matter more as AI-generated findings explode.
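
A simplified way to think about Trace is as a reachability question over a call graph. The sketch below uses a breadth-first search from externally exposed entry points to the flawed function. The graph and function names are invented for illustration, and real tracing would also have to follow data flow rather than calls alone.

```python
from collections import deque


def is_reachable(call_graph, entry_points, sink):
    """Breadth-first search from exposed entry points to the flawed function."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        if fn == sink:
            return True
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "handle_request": ["parse_headers", "route"],
    "parse_headers": ["copy_header_value"],
    "admin_cli": ["rebuild_index"],
}
print(is_reachable(call_graph, ["handle_request"], "copy_header_value"))  # True
print(is_reachable(call_graph, ["handle_request"], "rebuild_index"))      # False
```

In this toy graph a bug in `copy_header_value` is reachable from the network-facing handler, while one in `rebuild_index` is only reachable from a local admin tool. The two would deserve very different priority.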

Speed Alone Will Not Save You

The natural reaction to Mythos-class models is to move faster. Scan faster. Patch faster. Compress the response cycle. Cloudflare pushed back on that.

Faster matters. But faster is not enough. If regression testing takes a day, you cannot honestly claim a two-hour patch SLA without skipping something important. And when you skip regression testing, you can create a worse failure than the one you were trying to fix. Cloudflare said it saw a version of this when model-written patches fixed the original bug but quietly broke something else the code depended on.
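
The SLA point is simple arithmetic, sketched below. Only the one-day regression run and the two-hour SLA come from the example above; the other step durations are assumed values for illustration.

```python
from datetime import timedelta


def minimum_patch_time(steps):
    """Sum the duration of every step that cannot be skipped."""
    return sum(steps.values(), timedelta())


steps = {
    "write_fix": timedelta(hours=1),          # assumed
    "review": timedelta(hours=1),             # assumed
    "regression_tests": timedelta(hours=24),  # "regression testing takes a day"
    "rollout": timedelta(hours=2),            # assumed
}
sla = timedelta(hours=2)
total = minimum_patch_time(steps)
print("minimum end-to-end time:", total)
print("fits the SLA:", total <= sla)
```

With these numbers the only way to hit the two-hour target is to drop the regression run, which is exactly the shortcut the paragraph above warns against.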

The harder question is architectural: How do you make exploitation harder even when a bug exists? That means controls in front of the application, segmentation inside the application, least privilege, faster global rollout, better isolation, and better blast-radius reduction.

AI-speed attackers require more than AI-speed patching. They require architecture that reduces the value of the bug.

The Operator Action Plan

Here is what security, infrastructure, and AI leaders should do this week:

1. Inventory your current vulnerability workflow.
Map how findings move from discovery to validation, ownership, patching, testing, rollout, monitoring, and closure. Find the slowest handoffs.

2. Separate discovery from reachability.
Do not let every AI-generated finding become an engineering fire drill. Build a process that asks whether attacker-controlled input can actually reach the flaw.

3. Build an adversarial validation step.
Use a second reviewer — human, agentic, or both — to disprove findings before they hit the remediation queue.
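
A validation gate can be sketched as a set of reviewers that each try to disprove a finding, with only the survivors entering the queue. The two checks below (no proof attached, speculative wording) are deliberately simple stand-ins, and all names are hypothetical. In practice a reviewer could be a human or a second agent.

```python
HEDGES = ("possibly", "potentially", "could in theory")


def lacks_proof(finding):
    """Disproves a finding that ships without proof-of-concept code."""
    return not finding.get("proof_of_concept")


def is_speculative(finding):
    """Disproves a finding whose summary relies on hedged language."""
    return any(h in finding["summary"].lower() for h in HEDGES)


def survives_review(finding, reviewers):
    """A finding survives only if no reviewer manages to disprove it."""
    return not any(review(finding) for review in reviewers)


reviewers = [lacks_proof, is_speculative]
solid = {"summary": "Heap overflow in header copy", "proof_of_concept": "poc.c"}
weak = {"summary": "This could in theory overflow", "proof_of_concept": ""}
queue = [f for f in (solid, weak) if survives_review(f, reviewers)]
print(len(queue), "finding(s) reach the remediation queue")
```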

4. Start narrow.
Do not point an agent at your whole codebase. Pick one attack class, one boundary, one service, or one high-risk component.

5. Turn outputs into structured data.
AI-generated prose is not enough. Require schema-based findings with reproduction steps, evidence, reachability, severity, owner, and remediation status.
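
A minimal sketch of such a schema, using the fields named above, might look like the following. The field names, types, and example values are assumptions for illustration, not a standard format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    title: str
    reproduction_steps: list[str]
    evidence: str
    reachable: bool
    severity: str
    owner: str
    remediation_status: str = "open"


finding = Finding(
    title="Length check missing in header copy",
    reproduction_steps=["Send a request with an oversized header", "Observe the crash"],
    evidence="poc/header_overflow.py",  # hypothetical path to proof code
    reachable=True,
    severity="high",
    owner="edge-team",
)
record = json.dumps(asdict(finding), sort_keys=True)
print(record)
```

Once findings are stored in a shape like this, questions such as "which reachable, high-severity findings are still open?" become a query rather than a reading exercise.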

6. Strengthen architecture before the exploit arrives.
Reduce blast radius. Limit standing privilege. Put controls in front of exposed services. Make vulnerabilities harder to reach, not just faster to patch.

7. Define your AI Assurance layer.
Decide how AI-generated security work is reviewed, tested, logged, monitored, escalated, and improved over time.

If you are looking for help building a secure AI deployment strategy for your organization, reach out to the team at Netsync. We can help you determine the right architecture, governance, and action plan to get you where you need to go.


Frequently Asked Questions

What is Project Glasswing?

Project Glasswing is the Anthropic program under which Cloudflare tested the Mythos Preview model for AI-powered vulnerability discovery on its own codebases. In May 2026, Cloudflare published results from running the model across more than fifty repositories, with key architectural lessons about how AI security tools must be structured to be effective at enterprise scale.

What is Anthropic’s Mythos model?

Mythos is Anthropic’s frontier AI model designed for advanced reasoning tasks, including cybersecurity research. Mythos Preview demonstrated the ability to perform exploit chain construction and proof generation — reasoning across multiple code primitives to build working exploit paths, not just identify suspicious code.

Why can’t you just point an AI agent at your codebase?

Generic AI agents are the wrong shape for high-coverage vulnerability discovery. They hold one stream of work, chase one hypothesis, and eventually hit context limits. Real security coverage requires parallelism — many agents running narrow, independent tasks simultaneously. Without a structured harness, you get noise, not coverage.

What is the Trace stage in Cloudflare’s pipeline?

The Trace stage determines whether attacker-controlled input can actually reach a discovered flaw from outside the system. This is the critical distinction between “there is a bug” and “there is a reachable, exploitable vulnerability.” It prevents security teams from wasting remediation effort on findings that are not actually accessible to attackers.

What is evilsocket/audit?

evilsocket/audit is an open-source GitHub project by Simone Margaritelli that reimplements the eight-stage vulnerability discovery pipeline described by Cloudflare. It allows security teams to study and adapt the architecture, but should only be run in disposable, controlled environments due to its ability to compile and execute proof-of-concept code.

How does AI change the role of the security team?

AI does not replace security teams — it changes their job. The model can find more. The harness decides whether the organization can use what it finds. Security leaders must now design the operating model: the workflow, validation steps, structured outputs, and governance layer that turns AI-generated findings into actionable, trustworthy remediation.



About Jason Fleagle

Jason Fleagle is the Head of AI for Netsync and an AI and Growth Consultant working with global brands to help with their successful AI adoption and management. He helps humanize data — so every growth decision an organization makes is rooted in clarity, not confusion. He has overseen the development and delivery of over $50M in digital solutions, driving significant revenue growth and operational efficiency for his clients.

Jason is also the Founder of Catalyst Brand Group, where he blends AI software development, digital marketing, and automation to deliver revenue-first, real-world deployments. He is the creator of Growth OS and Personify, focusing on measurable ROI and clear deliverables.

Read the original post on LinkedIn: Cloudflare Published Their Mythos Preview Experience
