The Best Claude Code Skills for DevOps (2026 Picks)

The 10 most-installed Claude Code skills for cloud, Kubernetes, Terraform, CI/CD, and observability work, hand-picked, with install commands and notes on when each one fits.

By Shen Huang · 7 min read
claude code · skills · devops · kubernetes · terraform

DevOps is the workflow where Claude Code earns its money or wastes your time. A wrong kubectl delete, an applied Terraform plan that nukes a Postgres volume, a deploy script that pushes to prod when you meant staging — these are the moments where you most need the agent to slow down, check itself, and refuse to do the dangerous thing. Skills are how you teach it to.

The 10 skills below are picked from our 1,998-skill index sorted by install count, then filtered to the ones that actually carry their weight in an infra-heavy workflow. The pattern is consistent: a good devops skill encodes guardrails, references a specific tool's quirks, and refuses to operate without verification. Skills that try to "be a senior devops engineer" lose. Skills that codify "always run terraform plan before terraform apply, refuse if plan shows replacement of stateful resources" win.

How to Pick a DevOps Skill

Three questions before installing any skill:

  1. Does it match a tool you actually use? A k8s skill won't help an ECS shop. An Azure skill won't help an AWS account. Match exactly.
  2. Does it have explicit guardrails? Read the SKILL.md. If you don't see "NEVER do X", "always verify Y", or a refusal pattern, it's probably a verbose persona file, not a real skill.
  3. What does it cost in context? Run /skills show <name> to see body length. Anything over ~3,000 tokens should be doing real work — references, code samples, decision trees — not vibes.
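Questions 2 and 3 lend themselves to a quick pre-install screen. Below is a minimal Python sketch, assuming you have the skill's SKILL.md text locally. The guardrail phrases, the roughly-4-characters-per-token estimate, and the name vet_skill are illustrative assumptions. This is not how /skills show measures length, and a keyword hit is no substitute for reading the file.

```python
import re

# Illustrative guardrail phrasings; adjust to the wording your skills use.
GUARDRAIL_PATTERNS = [
    r"\bnever\b",
    r"\balways (?:verify|check|run)\b",
    r"\brefuse",          # matches refuse, refuses, refused
    r"\bdo not\b",
]

# Rough rule of thumb for English prose: about 4 characters per token.
CHARS_PER_TOKEN = 4
TOKEN_BUDGET = 3000  # the ~3,000-token threshold from the checklist above


def vet_skill(text: str) -> dict:
    """Count guardrail phrases and estimate the token cost of a SKILL.md body."""
    lowered = text.lower()
    hits = sum(len(re.findall(p, lowered)) for p in GUARDRAIL_PATTERNS)
    est_tokens = len(text) // CHARS_PER_TOKEN
    return {
        "guardrail_hits": hits,
        "estimated_tokens": est_tokens,
        # Over budget is not bad by itself; it means the body must earn its size.
        "over_budget": est_tokens > TOKEN_BUDGET,
    }
```

Run it as vet_skill(open("SKILL.md").read()). A persona-only body such as "You are a senior devops engineer." scores zero guardrail hits, which is exactly the red flag question 2 describes.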

Browse the full filterable list at orangebot.ai/skills/devops.

The Picks

1. azure-deploy (microsoft/github-copilot-for-azure, ~146K installs)

The official Microsoft Azure deploy skill. Wraps azd and Terraform with built-in error recovery, retry logic, and the specific gotchas Azure throws at you (resource group naming rules, deployment slot conventions, SKU availability per region). If your stack lives in Azure App Service / Container Apps / AKS, this is the baseline.

/skills add github.com/microsoft/github-copilot-for-azure azure-deploy

When it fits: any Azure-native deploy. When it doesn't: AWS or GCP — use the analogous official skills.

2. azure-diagnostics (microsoft/github-copilot-for-azure, ~145K installs)

The companion to azure-deploy. Reads Application Insights, Log Analytics, and resource-level metrics to triage a failing service. Worth installing even if you don't use the deploy skill — when something is broken at 11 PM, having an agent that knows the KQL query patterns for Azure logs saves an hour of doc-diving.

When it fits: troubleshooting issues in Azure Container Apps, Function Apps, or App Service. When it doesn't: AWS CloudWatch or Datadog work; use observability-specific skills.

3. azure-validate (microsoft/github-copilot-for-azure, ~146K installs)

Pre-deployment Bicep/Terraform template validator. Catches the typos that cost 20 minutes of azd up rollback. Smaller and lower-risk than the deploy skill — even shops with custom CD pipelines often run this as a manual gate.

When it fits: any IaC author touching Bicep/Terraform for Azure. When it doesn't: CDK/Pulumi codebases.

4. azure-observability (microsoft/github-copilot-for-azure, ~115K installs)

Specialized in Azure Monitor metrics, App Insights APM, and Log Analytics KQL queries. This is the skill that knows the difference between a requests/duration Application Insights query and a Perf | where ObjectName == "Processor" Log Analytics query. Pair with azure-diagnostics for the full observability loop.

When it fits: Azure-hosted services with App Insights instrumented. When it doesn't: OpenTelemetry-only stacks pushing to Tempo/Loki.

5. azure-rbac (microsoft/github-copilot-for-azure, ~146K installs)

Identifies the right RBAC role for a task and generates the az role assignment CLI or Bicep snippet. Saves you from the perennial "what's the minimal-permission role for a service principal that needs to read a Storage container and push to Key Vault?" question. Refuses to recommend Owner or Contributor unless you really insist.
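To make the least-privilege idea concrete, here is a hypothetical Python sketch, not the azure-rbac skill's actual logic. It emits az role assignment create commands and refuses Owner or Contributor unless you pass an explicit override. Storage Blob Data Reader and Key Vault Secrets Officer are real Azure built-in roles; the principal ID, scope paths, and the name role_assignment_cmd are placeholders.

```python
import shlex

# Broad built-in roles this sketch will not emit without an explicit override.
BROAD_ROLES = {"Owner", "Contributor"}


def role_assignment_cmd(assignee: str, role: str, scope: str,
                        allow_broad: bool = False) -> str:
    """Build a shell-quoted `az role assignment create` command."""
    if role in BROAD_ROLES and not allow_broad:
        raise ValueError(f"refusing broad role {role!r}; choose a narrower built-in role")
    args = ["az", "role", "assignment", "create",
            "--assignee", assignee, "--role", role, "--scope", scope]
    return " ".join(shlex.quote(a) for a in args)


# Placeholder identifiers: substitute your own subscription, group, and resources.
PRINCIPAL = "00000000-0000-0000-0000-000000000000"
RG = "/subscriptions/<sub-id>/resourceGroups/<rg>"
CONTAINER_SCOPE = (RG + "/providers/Microsoft.Storage/storageAccounts/<account>"
                   "/blobServices/default/containers/<container>")
VAULT_SCOPE = RG + "/providers/Microsoft.KeyVault/vaults/<vault>"

# Read one blob container, write secrets to one vault: two narrow roles, two narrow scopes.
for role, scope in [("Storage Blob Data Reader", CONTAINER_SCOPE),
                    ("Key Vault Secrets Officer", VAULT_SCOPE)]:
    print(role_assignment_cmd(PRINCIPAL, role, scope))
```

Scoping to the single container and the single vault, rather than the resource group, is what keeps the assignment minimal. The Key Vault role only takes effect if the vault uses the Azure RBAC permission model rather than access policies.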

When it fits: any Azure security work. When it doesn't: AWS IAM (use the analogous AWS skill).

6. azure-cost-optimization (microsoft/github-copilot-for-azure, ~146K installs)

Scans Azure subscriptions for idle resources, oversized SKUs, and missing autoscale configs. It's the kind of skill that pays for itself the first month you run it on a year-old subscription: moving steady-state workloads onto Azure Reservations alone usually saves 20–40%.
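A rough Python sketch of the triage step, assuming you have already exported each VM's average CPU utilization and monthly pay-as-you-go cost (for example from Azure Monitor). The 5% and 20% thresholds and the VmUsage type are illustrative assumptions, not Azure Advisor's rules. The savings helper simply applies the 20–40% range quoted above; actual reservation discounts vary with term, region, and SKU.

```python
from dataclasses import dataclass


@dataclass
class VmUsage:
    name: str
    avg_cpu_pct: float    # average CPU over the lookback window
    monthly_cost: float   # pay-as-you-go cost in your billing currency


# Illustrative thresholds, not Azure Advisor's actual rules.
IDLE_CPU_PCT = 5.0
OVERSIZED_CPU_PCT = 20.0


def triage(vms: list[VmUsage]) -> dict[str, list[str]]:
    """Bucket VMs into idle (delete?), oversized (resize?), and steady (reserve?)."""
    buckets: dict[str, list[str]] = {"idle": [], "oversized": [], "steady": []}
    for vm in vms:
        if vm.avg_cpu_pct < IDLE_CPU_PCT:
            buckets["idle"].append(vm.name)
        elif vm.avg_cpu_pct < OVERSIZED_CPU_PCT:
            buckets["oversized"].append(vm.name)
        else:
            buckets["steady"].append(vm.name)
    return buckets


def reservation_savings_range(monthly_cost: float,
                              low: float = 0.20,
                              high: float = 0.40) -> tuple[float, float]:
    """Apply the quoted 20-40% range to a steady-state monthly cost."""
    return monthly_cost * low, monthly_cost * high
```

Only the "steady" bucket is a reservation candidate: committing to a reservation for an idle or oversized VM locks in the waste instead of removing it.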

When it fits: any Azure account older than 6 months. When it doesn't: dev sandboxes with negligible spend.

7. azure-prepare (microsoft/github-copilot-for-azure, ~146K installs)

Generates Azure infrastructure code (Bicep or Terraform) from a scenario description, such as "I need a Web App with a Postgres flexible server, App Insights, and Key Vault, in East US." Replaces the "copy-paste from Microsoft Learn examples and edit" workflow with a guided code-gen pass.

When it fits: bootstrapping a new Azure environment. When it doesn't: existing IaC repos with team conventions you must follow.

8. terraform-plan-review (community, ~820 weekly installs)

Tool-agnostic Terraform plan-output reviewer. Flags risky operations (resource destruction, replacement of stateful resources, security-group widening) before you apply. The single most important skill if you run Terraform in any cloud — the default terraform apply UX is too permissive, and a skill that adds a forced "what's being destroyed?" check pays for itself the first time it catches a typo.
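The core check is simple enough to sketch. The Python below reads the JSON form of a saved plan, produced by terraform plan -out=tfplan followed by terraform show -json tfplan. It inspects each entry's change.actions list, where a replacement shows up as both "delete" and "create". The STATEFUL_TYPES set and the 0.0.0.0/0 string match are illustrative assumptions, not the terraform-plan-review skill's actual rules.

```python
import json

# Illustrative resource types whose replacement loses data; extend per provider.
STATEFUL_TYPES = {
    "aws_db_instance",
    "aws_rds_cluster",
    "aws_ebs_volume",
    "azurerm_postgresql_flexible_server",
    "google_sql_database_instance",
}


def review_plan(plan: dict) -> list[str]:
    """Return findings for a parsed `terraform show -json` plan document."""
    findings = []
    for rc in plan.get("resource_changes", []):
        change = rc["change"]
        actions = change["actions"]
        addr = rc["address"]
        stateful = rc["type"] in STATEFUL_TYPES
        if "delete" in actions:
            # ["delete", "create"] or ["create", "delete"] means replacement.
            verb = "replaced" if "create" in actions else "destroyed"
            level = "BLOCK" if stateful else "WARN"
            findings.append(f"{level}: {addr} will be {verb}")
        # Crude widening check: any created/updated resource mentioning 0.0.0.0/0.
        if ("create" in actions or "update" in actions) and \
                "0.0.0.0/0" in json.dumps(change.get("after") or {}):
            findings.append(f"WARN: {addr} allows 0.0.0.0/0")
    return findings


def should_block(findings: list[str]) -> bool:
    """True if any finding should stop `terraform apply`."""
    return any(f.startswith("BLOCK") for f in findings)
```

Wire should_block into CI so a BLOCK finding fails the job before terraform apply ever runs; WARN findings are for a human to read, not to gate on.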

When it fits: any Terraform user. When it doesn't: pure Pulumi / CDK shops.

9. chrome-devtools-mcp-connect (LichAmnesia/lich-skills, ~3.6K weekly installs)

Not strictly devops, but the skill that unlocks "agent inspects the live site after deploy" workflows. Connect Claude Code to a running Chrome via the DevTools protocol, then ask the agent to verify a page loads, check console errors, profile network waterfalls, or take a screenshot for a deploy report. Essential if your deploy pipeline ends in a web app the agent should sanity-check.
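Under the hood, a Chrome started with --remote-debugging-port exposes a small HTTP endpoint, /json/list, that lists its open targets; DevTools-protocol clients attach to those targets. As a hedged illustration, not how the chrome-devtools-mcp-connect skill is implemented, this Python sketch lists targets on the conventional port 9222 and finds the tab showing a freshly deployed URL. The function names and example URLs are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def list_targets(host: str = "127.0.0.1", port: int = 9222) -> list:
    """Fetch open targets from Chrome's DevTools HTTP endpoint."""
    with urllib.request.urlopen(f"http://{host}:{port}/json/list", timeout=5) as resp:
        return json.load(resp)


def find_page(targets: list, url_prefix: str) -> Optional[dict]:
    """Return the first page (tab) target whose URL starts with url_prefix."""
    for t in targets:
        # Skip service workers, iframes, extensions, etc.; only real tabs count.
        if t.get("type") == "page" and t.get("url", "").startswith(url_prefix):
            return t
    return None
```

The matching target's webSocketDebuggerUrl is the address a CDP client connects to for console, network, and screenshot commands. If find_page(list_targets(), "https://app.example.com") returns None, the post-deploy check has nothing to inspect yet.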

/skills add github.com/LichAmnesia/lich-skills chrome-devtools-mcp-connect

When it fits: any web-app deploy where you want post-deploy verification beyond a curl health check. When it doesn't: backend-only services (use verify instead).

10. ssh-deploy (various, ~2.2K weekly installs)

Opinionated remote-deploy skill for VPS / bare-metal hosts. Codifies the boring-but-critical steps: rsync the artifact, stop the prior process cleanly, start the new one under a process manager, health-check it, and roll back on a 5xx. Skip it if you live entirely in managed PaaS; install it if you have at least one production box you SSH into.
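The health-check-and-rollback tail of that sequence is the part most worth getting right. Below is a minimal Python sketch with the deploy and rollback steps passed in as callables, so they can wrap whatever rsync and process-manager commands your host uses. The function names, retry count, and delay are assumptions. It treats any 2xx as healthy, a 5xx as an immediate failure, and connection errors or other status codes as "not up yet".

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for url, including 4xx/5xx error responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def deploy_with_rollback(deploy: Callable[[], None],
                         rollback: Callable[[], None],
                         check: Callable[[], int],
                         attempts: int = 5,
                         delay_s: float = 2.0) -> bool:
    """Deploy, poll the health check, and roll back unless a 2xx arrives."""
    deploy()
    for _ in range(attempts):
        status: Optional[int]
        try:
            status = check()
        except OSError:  # connection refused or timeout: process not up yet
            status = None
        if status is not None and 200 <= status < 300:
            return True
        if status is not None and status >= 500:
            break  # server is up but failing: stop waiting and roll back
        time.sleep(delay_s)
    rollback()
    return False
```

For example, deploy might call subprocess.run(["rsync", "-az", "build/", "deploy@host:/srv/app/"], check=True) followed by a restart over ssh, and check might be lambda: http_status("https://host/healthz"); the host, paths, and /healthz endpoint are placeholders.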

When it fits: hand-managed Linux servers, Caddy/nginx fronted apps, anything not on Vercel/Fly/Railway. When it doesn't: pure serverless or PaaS deploys.

A Note on the "Run" and "Verify" Anthropic Skills

The community-darling run and verify skills from anthropic-experiments are not on this list because they're general-purpose, not devops-specific — but they pair beautifully with any of the above. verify is the skill that says "don't just write the change, actually run the app and confirm the change worked." That habit alone closes 60% of the bugs that survive a normal review. Install both globally.

/skills add anthropic-experiments run --global
/skills add anthropic-experiments verify --global

What's Missing from the Skills Ecosystem (Today)

A few categories where the index is still thin in May 2026:

  • Kubernetes-specific guardrails — there is no clear winner skill for "review this kubectl diff before applying" or "audit this Helm chart for security defaults." Most existing k8s skills are documentation wrappers.
  • GitHub Actions linting — actionlint exists as a CLI, but a skill that wraps it with project-specific opinions (composite actions, OIDC, secrets hygiene) is still a gap.
  • Datadog / Honeycomb / Tempo observability — Azure is well-covered, AWS CloudWatch has a few. The independent observability vendors are underrepresented despite massive enterprise install bases.

If you write one and it solves your team's problem, publish it. Most popular skills started as one engineer's annoyance.

FAQ

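Can I run all 10 of these at once? Yes. Skills load on demand: only each skill's name and short description sit in context up front, and the full SKILL.md body is loaded only when the skill is triggered. Installing 10 skills costs a few lines of metadata, not 10 skill bodies.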

How do I uninstall a skill that's auto-triggering too aggressively? /skills disable <name> keeps it on disk but stops it from auto-loading. /skills remove <name> deletes it.

What's the difference between an Azure skill and an MCP server? The Azure skills above wrap the Azure CLI and Bicep/Terraform; they're instruction packs the agent loads. An Azure MCP server exposes live API tools (list resources, deploy, query logs) that the agent calls without leaving the session. Microsoft ships both: the skills are cheaper to run and give better explanations, while the MCP server gives more direct action.

Do these skills work in OpenSeek / Codex / Gemini CLI? The SKILL.md body is portable markdown. Auto-trigger metadata is currently Claude-specific, though adoption is spreading, and most skills also work in any agent if you paste the body in as a system prompt.


For the live, daily-updated devops skill list see orangebot.ai/skills/devops. To browse all 1,998 indexed skills filterable by domain and install count, hit orangebot.ai/skills. And if you want the bigger-picture argument for why skills exist at all and how they compare to MCP, see the Claude Code Skills Guide.
