Introducing SandyClaw - The First Dynamic Sandbox for AI Agent Skills and Prompts

[GET ACCESS]
Close Icon
Linkedin
Linkedin
Illustration Cloud

Introducing SandyClaw - The First Dynamic Sandbox for AI Agent Skills and Prompts

TL;DR

  • AI agent skill marketplaces are the new software supply chain, and attackers are already exploiting them.
  • The current generation of skill scanners relies on static code checks or LLM-based evaluation. Neither executes the skill, which means neither detects malicious behavior that only reveals itself at runtime.
  • SandyClaw is the first platform to apply dynamic sandbox detonation to AI agent skills.
  • It runs the skill in a controlled environment, records every action at the LLM and OS level, and delivers a verdict backed by behavioral evidence rather than inference.
  • Available now at sandyclaw.permiso.io.

The supply chain nobody is securing

AI agents need skills to do useful work. Skills are downloadable capability packages that teach agents how to interact with tools, APIs, and services. They are distributed through open marketplaces, installed with a single command, and executed with whatever permissions the agent's identity already has.

That last part is the problem. A skill does not bring its own credentials. It inherits the agent's IAM roles, API tokens, and service connections. A malicious skill does not need to break in. It runs with the keys the agent already holds.

Attackers figured this out quickly. Malicious skills have been found on major agent marketplaces: credential stealers, reverse shells, data exfiltration routines disguised as productivity tools. One industry audit found over 1,100 malicious skills on a single marketplace. The agent skill ecosystem is the software supply chain problem all over again, except the packages run with broader access, less oversight, and almost no security validation.

What scanning gets wrong

The market responded with skill scanners. Dozens of them. The approaches fall into two categories: static code analysis (pattern matching against known-bad signatures) and LLM-based evaluation (sending the skill's code to another model and asking whether it looks safe).

Both share the same fundamental limitation: neither runs the skill. They analyze what the code looks like and attempt to predict what it will do. A skill that looks clean in source but makes unauthorized network calls, accesses environment variables, or exfiltrates data when it actually executes will pass both approaches undetected.

LLM-based evaluation adds a second problem. The same class of technology that can be manipulated through prompt injection is being used as the security gate. A skill designed to evade an LLM evaluator can include content that influences the evaluating model's judgment. The judge and the defendant are built from the same architecture.

We saw this coming

Permiso's threat research team, whose detection capabilities recently won the SC Award 2026 for Best Threat Detection Technology, was documenting malicious agent skills in the wild while the rest of the industry was still debating whether the risk was real. That recognition was built on years of detecting threats across human, non-human, and AI identities, the same detection philosophy that now powers SandyClaw.

When that team turned its attention to the agent skill ecosystem, the findings confirmed what they suspected: the scanning approaches being adopted by the market had an obvious ceiling. They could see what a skill's code looked like. They could not see what a skill does when it runs.

That research led directly to SandyClaw.

Introducing SandyClaw

SandyClaw is the first platform to bring dynamic sandbox detonation to AI agent skills. Unlike static scanning or runtime containment approaches, SandyClaw executes the skill in a controlled, instrumented environment and records everything that happens.

The concept is borrowed from a methodology the cybersecurity industry has relied on for over a decade in other contexts. Static analysis of executable files was the dominant approach to malware detection until attackers learned to write code that looked benign and behaved maliciously. The industry's answer was sandbox detonation: run the file in a controlled environment and watch what it does. SandyClaw applies that same principle to agent skills.

SandyClaw homepage

How it works

SandyClaw follows a four-step process: submit, detonate, analyze, and decide.

Step 1: Submit a skill. Upload a skill from any agent framework or marketplace. Select a model and provide a test prompt. SandyClaw accepts skills from OpenClaw, Cursor, Codex, and other frameworks that use downloadable skill packages.

Step 2: Sandbox execution. SandyClaw spins up an instrumented sandbox environment and runs the skill with a live agent. This is not a simulation. The skill executes with real LLM interactions in a controlled environment where every action is captured at both the LLM and operating system level.

Step 3: Behavioral recording. While the skill runs, SandyClaw records everything: every LLM action and decision, all outbound network calls and domain resolutions, SSL-encrypted traffic (intercepted and decrypted inside the sandbox), every file read and write attempt, environment variable and credential access attempts, and DNS resolution and system-level events. Nothing the skill does goes unobserved.

Step 4: Verdict and report. Detection analysis runs against multiple engines: Sigma, YARA, Nova, and Snort, augmented with custom detection rules written by the Permiso security research team. The output is a clear verdict with a complete behavioral record. Security teams can inspect every action the skill took, review every detection engine match, and verify the determination themselves. No black boxes. No confidence scores. Evidence.

How SandyClaw compares

The table below summarizes the difference between the two existing approaches and what SandyClaw introduces.

Capability Static scanning LLM-based evaluation Runtime containment Permiso SandyClaw
Executes the skill No No No (restricts agent, not skill) Yes
Detects runtime behavior No No Monitors agent messages, not skill behavior Yes
Network monitoring No No Controls egress via policy Yes + SSL intercept
Detection engines Rule-based signatures LLM judgment Static scanners + policy Sigma, YARA, Nova, Snort + custom
Verdict evidence Code references LLM reasoning Scan findings + telemetry Full behavioral log
Evasion resistance Low (obfuscation bypasses) Medium (prompt manipulation) Medium (static scanning limits) High (behavior is observable)
Skill intelligence repository No No No Yes (searchable database)
Cross-framework Varies Varies Platform-specific OpenClaw, Cursor, Codex+

What makes SandyClaw different

Dynamic detonation is the core differentiator, but it is not the only one. SandyClaw was designed as a complete skill security platform, not just a sandbox.

Dynamic behavioral analysis. SandyClaw does not read skill code and guess at intent. It runs the skill and watches what happens. Every LLM action, every network call, every file write, every environment variable access attempt is captured as it occurs. If a skill tries to exfiltrate data, phone home to a command-and-control server, or access sensitive files, SandyClaw sees it happen in real time. This is the fundamental difference between SandyClaw and every other tool on the market: observed behavior versus inferred intent.

Multi-engine detection stack. SandyClaw does not rely on a single detection methodology. Analysis runs against Sigma, YARA, Nova, and Snort, augmented with proprietary detection rules written by the team that was finding malicious skills in the wild before anyone else was looking. Multiple engines catch what a single approach would miss.

SandyClaw Analysis

Full traffic visibility with SSL intercept. Most sandbox environments cannot decrypt outbound traffic. SandyClaw performs SSL interception inside the sandbox, which means exfiltration attempts over HTTPS are fully visible and inspectable. A skill that sends stolen credentials to an attacker-controlled endpoint over an encrypted connection gets caught.

Searchable skill intelligence repository. SandyClaw maintains a database of every skill it has analyzed. Before you run your own analysis, you can check whether someone has already submitted that skill and what the verdict was. This turns SandyClaw from a one-off analysis tool into a continuous reference for the agent skill ecosystem.

Agent-native integration. Agents themselves can be configured to submit skills to SandyClaw for validation before installation. This enables automated security gates directly within agentic workflows. One Fortune 500 customer is already running their entire internal skill registry through SandyClaw using this capability. SandyClaw also integrates with the Permiso platform, enabling automatic skill analysis as part of the broader agent monitoring workflow.

Full verdict transparency. SandyClaw does not return a confidence score. It returns the complete behavioral record behind every determination: every file written, every domain resolved, every network call made. Security teams can verify the finding themselves rather than trusting an opaque label.

Get started

SandyClaw is available now. Permiso platform customers receive unrestricted access. Security teams can sign up at sandyclaw.permiso.io to get started.

If you are evaluating agent skills for your organization and want to see what your current scanning approach is missing, submit a skill and see what SandyClaw finds.

Want to see SandyClaw in action? Watch Ian Ahl, CTO at Permiso Security, walk through the platform, detonate a malicious skill, and explain the detection
methodology in the full demo -> Full episode here

SandyClaw v1 (728x90) (1)

 

Frequently Asked Questions 

What is SandyClaw?

SandyClaw is a dynamic analysis platform built by Permiso Security that applies sandbox detonation to AI agent skills. Instead of scanning a skill's source code, SandyClaw executes the skill in a controlled environment, records every action at the LLM and operating system level, and delivers a verdict backed by observed behavior. It runs detection analysis against Sigma, Yara, Nova, and Snort engines augmented with custom rules written by the Permiso security research team.

How is SandyClaw different from static skill scanners?

Static scanners analyze a skill's code without executing it. They catch known-bad patterns but miss any behavior that only manifests at runtime, such as unauthorized network calls, environment variable exfiltration, or data sent to external webhooks. LLM-based scanners send the code to another model for evaluation, which introduces the limitation of asking the same technology that can be manipulated through prompt injection to serve as the security gate. SandyClaw detonates the skill and records what it actually does, closing the gap both approaches leave open.

What does SandyClaw detect during a skill analysis?

SandyClaw records every LLM action, network request, domain resolution, file write, and environment variable access attempt during execution. It also performs SSL interception inside the sandbox, decrypting encrypted outbound traffic so that exfiltration attempts over HTTPS are fully visible. The result is a complete behavioral record that security teams can verify independently.

Which agent frameworks does SandyClaw support?

SandyClaw works across all major agent frameworks, including OpenClaw, Cursor, Codex, and others that use skills or plugins. Framework choice does not change the analysis. If an agent uses downloadable skills, SandyClaw can analyze them.

How do I get access to SandyClaw?

SandyClaw is available now at sandyclaw.permiso.io. Anyone can sign up and start analyzing skills. Permiso platform customers receive unrestricted access. Agents can also be configured to submit skills to SandyClaw for validation before installation, enabling automated security gates within agentic workflows.

Illustration Cloud

Related Articles

8 Critical AI Security Challenges and How Permiso Solves Them

AI adoption is accelerating faster than security teams can track it. While organizations rush to deploy AI agents and integrate AI services across their operations, they're creating massive blind spots in their security posture. The problem isn't

Comprehensive Identity Visibility and Intelligence with Permiso Discover

In cybersecurity, we often rush to solutions, implementing detection tools and response platforms, without first answering a fundamental question: What identities do we actually have?

Permiso Delivers Complete AI Security Through Unified Identity Platform

The speed of artificial intelligence adoption is unprecedented. From automating code generation to powering customer service, AI is no longer a strategic option but a business imperative. Yet, with this rapid integration comes a new, expansive

View more posts