About DevOps & Infra

DevOps covers infrastructure-as-code (Terraform, Pulumi), container orchestration (Kubernetes, Nomad), CI/CD (GitHub Actions, Argo), observability (Datadog, Honeycomb, Grafana), and cloud platforms (AWS, GCP, Azure). The feed surfaces tool releases, postmortems, and operational deep-dives.

DevOps & Infra

Infrastructure, platforms, pipelines, and operational stories.

3 unique stories from the last 14 days across 8 sources.

GitHub Trending (1)

  1. NousResearch / hermes-agent

Hugging Face (1)

  1. Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

    General-purpose agents such as OpenClaw are increasingly used as autonomous tool users, but their coding ability is difficult to measure under SWE-bench: a generic agent does not by itself satisfy the clean Docker workspace, patch, and prediction contract required for scoring. We introduce Claw-SWE-Bench, a multilingual SWE-bench-style benchmark and adapter protocol that makes heterogeneous agent harnesses, or claws, comparable under fair settings including a fixed prompt, runtime budget, workspace contract, patch extraction procedure, and evaluator. The full benchmark contains 350 GitHub issue-resolution instances across 8 languages and 43 repositories, drawn from SWE-bench-Multilingual and SWE-bench-Verified-Mini after future-commit cleanup. We also release Claw-SWE-Bench Lite for faster validation, which is an 80-instance subset selected by a cost-aware, rank-aware procedure over 17 calibration columns. On the full benchmark, OpenClaw with a minimal direct-diff adapter scores only 19.1% Pass@1, whereas the full adapter reaches 73.4% with the same GLM 5.1 backbone, showing that adapter design is essential for enabling OpenClaw-style harnesses to perform coding tasks effectively. Across an OpenClaw × nine-model sweep and a five-claw × two-model sweep, model choice changes Pass@1 by 29.4 pp and harness choice by 27.4 pp under fixed models; systems with similar accuracy can differ substantially in total API cost. Claw-SWE-Bench therefore treats harness and cost accounting as first-class axes of SWE-style coding-agent evaluation, providing both a full benchmark and a low-cost reference set for reproducible comparison. The data is available at https://github.com/opensquilla/claw-swe-bench and https://huggingface.co/datasets/TokenRhythm/Claw-SWE-Bench.

Techmeme (1)

  1. Q&A with Google DeepMind's Director of AGI Economics Alex Imas and Epoch AI's Phil Trammell on what remains scarce after AGI, redistributing AI wealth, and more (Dwarkesh Patel/Dwarkesh Podcast)

    “One robot now turns into many robots next year, but the number of ballerinas is the same.” From the Economics of AGI episode with Alex Imas and Phil Trammell.
