Curated by Shen Huang · 86 stories · ~13 min read
DIGEST · 2026-06-02

OrangeBot.AI Digest — 2026-06-02

86 headlines across 8 sources, aggregated for this day.

Hacker News(15)

  1. MAI-Thinking-1 (microsoft.ai)
  2. Gmail thinks I'm stupid, so I left (moddedbear.com)
  3. MAI-Code-1-Flash (microsoft.ai)
  4. Morningstar values SpaceX at $780B, half its IPO target (www.reuters.com)
  5. Larry Ellison: "Citizens will be on their best behavior because we’re recording" (www.techradar.com)
  6. Three Ways to Get Paid (2018) (jasonzweig.com)
  7. Coreutils for Windows (github.com)
  8. Please don't spam people looking for employment. It's just cruel
  9. A walking tour of surveillance infrastructure in Seattle (2020) (coveillance.org)
  10. Show HN: Eyeball (eyeball.rory.codes)
  11. Stop Ruining It (seths.blog)
  12. Apple rejected my dictation app for using the accessibility API (www.mitmllc.com)
  13. Love systemd timers (blog.tjll.net)
  14. Adafruit receives demand letter from Fenwick legal counsel on behalf of Flux.ai (blog.adafruit.com)
  15. Why Janet? (2023) (ianthehenry.com)

GitHub Trending(11)

  1. chopratejas / headroom
  2. microsoft / markitdown
  3. affaan-m / ECC
  4. D4Vinci / Scrapling
  5. nesquena / hermes-webui
  6. reconurge / flowsint
  7. OpenBMB / VoxCPM
  8. stefan-jansen / machine-learning-for-trading
  9. jamwithai / production-agentic-rag-course
  10. supermemoryai / supermemory
  11. Open-LLM-VTuber / Open-LLM-VTuber

Product Hunt(15)

  1. Gusto Cofounder

    If Gusto, OpenClaw, and Claude Cowork had a baby...

  2. Knock agent for Slack

    Build, manage, and ship customer messaging from Slack

  3. Branda

    A fun new way to create & manage brands.

  4. choclift

    Use iPhone to open apps, Apple Shortcuts and websites on Mac

  5. GlowPulse

    Your Mac's camera is now a heart-rate sensor

  6. findloc.ai

    Make your business citable by ChatGPT, Claude & Perplexity

  7. Vokal

    A collaboration space for 10x teammates with their AI agents

  8. Rodeo by TwelveLabs

    Describe your shot. Rodeo builds your first cut.

  9. Kompassify 2.0

    User onboarding now with an AI copilot

  10. Trovelo

    Plan and track your trips privately

  11. Mirowl

    Search all your screenshots via a local OCR-powered AI

  12. Overline

    Real-time AI captions and translation for any browser video

  13. Paste MCP & AI Tools

    Infinite clipboard for Claude, Codex and other AI tools

  14. PawPause

    Lock your keyboard and prevent cats from causing chaos

  15. Enshittifier

    Chrome extension that replaces "AI" with 💩

Hugging Face(15)

  1. A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

    As agent capabilities advance, existing benchmarks, such as τ^2-Bench, are becoming increasingly saturated. Yet constructing new benchmark tasks remains complex, costly, and labor-intensive. Moreover, the standard approach, in which scenarios are first written in natural language and then mapped to tool sequences, captures only a narrow subset of the tool-use patterns agents exercise. In this paper, we address these problems by reversing the task construction process. We propose TASTE: Task Synthesis from Tool Sequence Evolution, an automatic method that generates challenging tasks with broader tool-use coverage. TASTE utilizes an Adaptive Contrastive n-gram model trained on LLM-judged validity signals. This enables sampling valid tool sequences that cover a vast range of tool combinations. TASTE then selects representative sequences from the pool via clustering, instantiates them into complete benchmark tasks, and refines them through iterative difficulty evolution. Using TASTE, we construct τ^c-Bench, a challenging extension of the three domains of τ^2-Bench. We evaluate 11 agent/user LLM pairs and find that models nearly saturating τ^2-Bench suffer severe performance drops on our tasks (e.g., Gemini-3-Flash falls from 0.82-0.94 to 0.28-0.61). Beyond increasing difficulty, our generated tasks more than double the number of unique tool combinations agents must execute. Our results suggest high scores on existing benchmarks often reflect saturation rather than robust task-solving ability. By automating the generation of difficult, high-coverage benchmarks, TASTE enables continuous, scalable evaluation of future agents.

  2. Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

    Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which constraints remain open, and which claims have actually been checked. We argue that this formulation puts too much routine state management inside the policy: reinforcement learning is forced to optimize both semantic search decisions and recoverable bookkeeping that the environment can maintain more reliably. We introduce Harness-1, a 20B search agent (retrieval subagent) trained with reinforcement learning inside a stateful search harness. The harness maintains environment-side working memory, including a candidate pool, an importance-tagged curated set, compact evidence links, verification records, compressed and deduplicated observations, and budget-aware context rendering. The policy retains the semantic decisions: what to search, which documents to keep or discard, what to verify, and when to stop. Across eight retrieval benchmarks spanning web, finance, patents, and multi-hop QA, Harness-1 achieves 0.730 average curated recall, outperforming the next strongest open search subagent by +11.4 points and remaining competitive with much larger frontier-model searchers. Its gains are especially strong on held-out transfer benchmarks, suggesting that reinforcement learning over explicit search state can produce retrieval behaviors that generalize beyond the training domains. Our code is available at https://github.com/pat-jj/harness-1.

  3. Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding

    Speculative decoding accelerates LLM inference by drafting multiple tokens and verifying them in parallel with the target model. However, its practical speedup is constrained by the trade-off between draft quality and drafting cost: autoregressive drafters model causal dependencies among draft tokens but incur sequential overhead, while parallel drafters reduce drafting cost but weaken intra-block dependency modeling. In this paper, we propose Domino, a speculative decoding framework that decouples causal dependency modeling from expensive autoregressive draft execution. Domino first uses a parallel draft backbone to produce preliminary draft distributions for the entire block, and then applies a lightweight Domino head to refine them with prefix-dependent causal information. To stabilize teacher-forced causal encoding, we further introduce a base-anchored training curriculum that first strengthens the parallel backbone and then gradually shifts optimization toward the causally corrected final distribution. Experiments on Qwen3 models show that Domino achieves up to 5.49x end-to-end speedup under the Transformers backend and up to 5.8x throughput speedup under SGLang serving.
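
    The draft-then-verify loop that the abstract starts from can be sketched as below. This is a generic greedy speculative decoder with toy stand-in models, not Domino itself: `target_next`, `draft_next`, and the block size are hypothetical, and a real system would verify the whole block in one batched target pass rather than a Python loop.

```python
def target_next(prefix):
    # Hypothetical "target model": a deterministic toy rule over the prefix.
    return (sum(prefix) * 31 + len(prefix)) % 7


def draft_next(prefix):
    # Hypothetical cheap drafter: agrees with the target except when the
    # prefix length is a multiple of 3, where it guesses wrong on purpose.
    guess = target_next(prefix)
    return guess if len(prefix) % 3 else (guess + 1) % 7


def greedy_decode(prompt, n):
    # Baseline: one target call per generated token.
    out = list(prompt)
    for _ in range(n):
        out.append(target_next(out))
    return out


def speculative_decode(prompt, n, block=4):
    out = list(prompt)
    goal = len(prompt) + n
    while len(out) < goal:
        # 1) Draft a block of tokens with the cheap model.
        draft = []
        for _ in range(block):
            draft.append(draft_next(out + draft))
        # 2) Verify: keep the longest prefix the target agrees with.
        accepted = []
        for tok in draft:
            expected = target_next(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # target's correction ends the block
                break
        else:
            # Whole block accepted: the target contributes one bonus token.
            accepted.append(target_next(out + accepted))
        out.extend(accepted)
    return out[:goal]


if __name__ == "__main__":
    print(speculative_decode([1, 2, 3], 20) == greedy_decode([1, 2, 3], 20))
```

    Because every kept token is one the target would have produced greedily, the output matches plain greedy decoding. Any speedup comes only from how many drafted tokens survive verification per block, which is the draft-quality versus drafting-cost trade-off the abstract describes.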

  4. Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

    Watermarking embeds statistical signatures in AI-generated text for detection and attribution. We reveal a fundamental vulnerability: when users access multiple models (today's reality), watermarks trivially fail. Watermarks perturb output distributions away from the original, and in competitive markets, these perturbations are typically independent across providers. We theoretically prove that averaging output probability distributions recovers the unwatermarked distribution with up to a second-order error term. Empirically, simply averaging 3-5 models cancels out these perturbations. We introduce WASH (Watermark Attenuation via Statistical Hybridisation), which solves practical challenges in ensemble generation: vocabulary misalignment and tokenisation differences across heterogeneous models. Experiments across six watermarking schemes and three LLMs show that averaging across 3 models suppresses detection z-scores from 5-300 to below 2 (below the detection threshold of 4) and reduces TPR at 5% FPR to below 50%, while improving quality by 27.5% and running 6 times faster than the best baseline on the long sequence generation. Our results suggest that robust AI-text detection via watermarking requires either accepting this fundamental vulnerability or unprecedented coordination among model providers.
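
    A toy numeric sketch of the averaging claim follows. It assumes every provider shares the same base distribution and applies an independent "green list" style logit boost. Both are simplifications chosen for illustration, and the boost is not necessarily one of the six schemes in the paper. All function names are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def watermarked(logits, rng, boost=2.0):
    # Toy watermark (an assumption): boost a random half of the vocabulary.
    # Each call draws its own partition, so perturbations are independent.
    green = set(rng.sample(range(len(logits)), len(logits) // 2))
    return softmax([x + boost if i in green else x for i, x in enumerate(logits)])


def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def average(dists):
    # Token-wise mean of several next-token distributions.
    k = len(dists)
    return [sum(col) / k for col in zip(*dists)]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(50)]
clean = softmax(logits)                      # unwatermarked distribution
providers = [watermarked(logits, rng) for _ in range(5)]

individual = [l1(clean, q) for q in providers]
ensemble = l1(clean, average(providers))

if __name__ == "__main__":
    print("mean single-provider L1 distance:", sum(individual) / len(individual))
    print("ensemble L1 distance:", ensemble)
```

    The ensemble distance can never exceed the mean single-provider distance, by convexity of the L1 norm. How much smaller it gets depends on how independent the perturbations are, which is the condition the abstract ties to competitive markets.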

  5. VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

    The recent "Reasoning with Video" paradigm utilizes Video Generation Models (VGMs) to generate temporally coherent visual trajectories to complete reasoning tasks. Although state-of-the-art VGMs excel at visual quality, they often struggle to understand and follow task-specific rules, leading to logical failures across diverse reasoning scenarios. Existing efforts try to utilize Vision-Language Models (VLMs) as problem pre-solvers to produce or refine textual guidance for the VGM. However, textual descriptions fail to capture intricate spatiotemporal details, and VGMs often struggle to faithfully execute fine-grained or long-tail instructions even with a valid plan. While VLMs struggle as solvers, they possess strong perception capabilities to evaluate process-constraint satisfaction and final-goal achievement. Leveraging this strength, we introduce a paradigm shift that transitions the role of VLMs to "teachers". Specifically, a VLM teacher extracts task-specific rules to formulate differentiable rewards, guiding a VGM Reasoner via test-time online optimization of a lightweight LoRA module. This strategy enables adaptive test-time optimization and extends the reasoning capabilities beyond the VGM's intrinsic boundaries. Evaluations on symbolic (VBVR-Bench) and general-purpose (RULER-Bench) video reasoning benchmarks show that the proposed method yields a 16.7-point average performance gain, outperforming the VLM-as-Solver paradigm (+0.4 points) and Best-of-N scaling (+2.2 points) by a large margin at comparable test-time cost. These findings reveal that integrating VLMs as test-time teachers offers a promising paradigm for achieving generalizable video reasoning. Project Page: https://VLM-as-Teacher.github.io/

  6. When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs

    Multi-agent LLM workflows route inference through specialized roles to lift end-task accuracy, but jointly training those roles with reinforcement learning is unstable in ways that are poorly understood. We study when end-to-end RL training of multi-agent LLM workflows improves over their base models, comparing Shared-Policy training, where all roles update one policy, with Isolated-Policy training, where each role has its own parameters. Our experimental matrix spans Eval-Opt, Voting, and Orch-Workers workflows, math and code tasks, and three model scales (0.6B, 1.7B, 4B). We find that multi-agent RL usually improves over base models, but gains depend jointly on workflow, task, and scale, not on policy sharing alone. Isolated-Policy tends to reach higher peak accuracy yet more often falls off a terminal accuracy cliff, while Shared-Policy training does not eliminate failure; it redistributes failure into qualitatively different patterns. We then explain the strongest of these patterns through role-level gradient dynamics induced by workflow topology and policy routing: under Isolated-Policy, parallel same-role agents on shared prompts amplify per-role gradients and drive terminal degradation in Voting and Orch-Workers workflows; under Shared-Policy, asymmetric per-step gradient mass causes the shared policy to be captured by the dominant role, producing different failure signatures by task and workflow. Together, the empirical map and its underlying mechanisms show that policy sharing routes training pressure through different channels rather than offering uniform stability, making it a design choice with workflow- and task-conditional tradeoffs.

  7. LVSA: Training-Free Sparse Attention for Long Video Diffusion

    Dense self-attention is the compute and quality bottleneck of long-video diffusion inference: cost grows quadratically with the sequence length, and beyond the training horizon the model converges to near-static output, that is, "frozen" repetitive video. State of the art approaches are either too costly, e.g., they require retraining, or fail to satisfy both performance and quality objectives in a scalable manner. To this end, we introduce Long Video Sparse Attention (LVSA), a training-free model-agnostic block-sparse attention for video diffusion transformers that combines a structured window pattern with rotating global anchors, thus removing the fixed-grid bias which causes long-range temporal artifacts. LVSA, combined with a FlashInfer kernel, reduces compute up to 3.17x on Wan 2.1 1.3B at a 6x horizon, 2.98x on Wan 2.1 14B at a 6x horizon, and 3.33x on HunyuanVideo 1.5 at a 1.5x horizon, compared to dense attention. Beyond reducing compute, LVSA enables HunyuanVideo 1.5 generation at a 2x horizon, which is otherwise out-of-memory on a single GPU. Moreover, LVSA provides speedups up to 2.41x compared to RIFLEx and 3.27x compared to UltraViCo on Wan 2.1 1.3B. To demonstrate applicability across diverse platforms, we apply LVSA on NPUs and achieve speedups up to 2.71x on Wan 2.2 A14B and 3.24x on Wan 2.1 1.3B compared to dense attention. To evaluate quality in a fair way, we introduce VQeval, a tool properly scoring loopy video failures, which instead are rewarded in state of the art evaluators like VBench-Long. LVSA is quality-neutral for generation at training horizon length and quality-positive at extended lengths.

  8. MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation

    The Model Context Protocol (MCP) has emerged as a transformative standard for connecting large language models (LLMs) with external data sources and tools, and has been rapidly adopted across personal applications and development platforms. However, existing benchmarks predominantly focus on generic information-seeking tools and fail to capture the practical challenges posed by personal social applications, where tools interact with individual accounts or local databases. To bridge this critical gap, we introduce MCP-Persona, the first benchmark specifically designed for evaluating agent performance on real-world, personalized MCP tools. MCP-Persona encompasses a diverse set of widely-used applications, ranging from social media platforms like Reddit and Xiaohongshu (Rednote) to enterprise collaboration suites such as Lark (Feishu) and Slack. Our extensive experiments on various state-of-the-art (SOTA) agents demonstrate their significant struggles with personalized tool use, thereby highlighting the benchmark's crucial role in identifying and addressing these limitations. MCP-Persona is publicly available at https://github.com/wwh0411/MCP-Persona.

  9. LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

    Autoregressive (AR) video diffusion enables variable-length synthesis, but long-horizon generation often suffers from accumulated errors and identity drift. For efficiency, existing methods commonly adopt sliding-window attention during generation. This creates an irreversible generation trajectory: once the active window accumulates appearance errors, subsequent generations can only condition on this degraded trajectory and drift further away. We address this limitation by formulating long video generation as a retrieval-augmented generation (RAG) problem. Rather than relying solely on the recent window, we treat previously generated latents as a dynamic, searchable history. We propose LongLive-RAG, a general retrieval framework for AR video generation. At each new block, LongLive-RAG uses a query embedding to retrieve relevant historical latents. This lightweight retrieval step adds only a small overhead relative to generation and lets the generator condition on non-local context instead of only the recent window. To make retrieval more discriminative, we introduce the Window Temporal Delta Loss that suppresses redundant local similarity and encourages embeddings to capture meaningful temporal changes. Together, these components help reduce error accumulation caused by sliding-window attention. Experiments across multiple AR backbones and generation lengths show improved long-video quality and the best average VBench-Long rank. To our knowledge, among open-ended AR long video generation methods, LongLive-RAG is the first to formulate self-generated latent history as content-addressable retrieval memory. Code is available at https://github.com/qixinhu11/LongLive-RAG.
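
    The retrieval step described above can be pictured as a top-k similarity lookup over stored block embeddings. The sketch below assumes cosine similarity and a flat in-memory history. The abstract specifies neither, and `retrieve`, `history`, and the vectors are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, history, k=2):
    # history: list of (block_index, embedding) for previously generated blocks.
    # Returns the indices of the k blocks most similar to the query embedding.
    ranked = sorted(history, key=lambda item: cosine(query, item[1]), reverse=True)
    return [idx for idx, _ in ranked[:k]]


# Toy history of four earlier blocks with 3-dimensional embeddings.
history = [
    (0, [1.0, 0.0, 0.0]),
    (1, [0.9, 0.1, 0.0]),
    (2, [0.0, 1.0, 0.0]),
    (3, [0.0, 0.0, 1.0]),
]
query = [1.0, 0.05, 0.0]

if __name__ == "__main__":
    print(retrieve(query, history, k=2))
```

    In this toy setup the generator would condition on blocks 0 and 1 regardless of how far back they sit, rather than only on the most recent window.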

  10. Joint Agent Memory and Exploration Learning via Novelty Signals

    In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle with this. Effective exploration requires memory, but retaining raw interaction histories is computationally expensive over long trajectories. While latent memory offers a solution to compress interaction histories, its training lacks reliable supervisory signals. We introduce Joint Agent Memory and Exploration Learning (JAMEL), a framework that trains agentic memory and exploration policy together through novelty-driven interaction. We observe that memory and exploration form a mutually dependent loop: sustained exploration requires memory to distinguish exhausted behaviors from unseen ones, while novelty-seeking interaction provides the supervision needed to make memory useful for future exploration. By utilizing deterministic and persistent novelty signals such as code coverage in the GUI domain, we provide natural, annotation-free supervision for the memory module. Empirical evaluations demonstrate that JAMEL successfully generalizes to unseen environments. Its exploration capability outperforms open-weight baselines and rivals the exploration depth of a closed-source model while reducing token consumption. Our code and model are open-sourced at https://github.com/MobileLLM/JAMEL.

  11. OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

    Building capable visual web agents requires long-horizon reasoning, precise grounding, and robust interaction with dynamic real-world websites. Despite rapid progress, the strongest systems remain largely proprietary, while open agents still depend heavily on supervised post-training over large collections of curated web trajectories. This dependence creates a major scalability bottleneck: high-quality demonstrations are expensive to collect, and static datasets offer limited coverage of the diverse, ever-changing open web. Although online RL has shown promise for text-based agents, its potential for training visual web agents directly on live websites remains largely underexplored. In this paper, we introduce OpenWebRL, an open framework for training visual web agents with online multi-turn RL on real websites. OpenWebRL covers the full training pipeline, including scalable live-browser infrastructure, supervised initialization, multimodal context management, trajectory-level success judging, and efficient multi-turn policy optimization. Using this framework, we train OpenWebRL-4B, which establishes a new open-source state of the art on challenging live-web benchmarks. With only 0.4K initialization trajectories and 2.2K open-ended RL training tasks, OpenWebRL-4B achieves 67.0% success on Online-Mind2Web and 64.0% on DeepShop, outperforming prior open agents of similar or larger scale and remaining competitive with proprietary systems including OpenAI CUA and Gemini CUA. Beyond strong benchmark performance, we systematically study the key design choices that make online RL effective for visual web agents, and analyze how RL improves agentic reasoning. Overall, our work offers a practical path toward building more capable, reproducible, and cost-efficient open web agents. We will release our training data, models, and code to support future research.

  12. Policy and World Modeling Co-Training for Language Agents

    Reinforcement learning (RL) improves large language model (LLM) agents by teaching them which actions lead to high rewards, but provides little supervision on what those actions do to the environment. World modeling (WM) can fill this gap, yet existing approaches often require separate simulators, extra training stages, or additional inference-time computation. We observe that on-policy RL rollouts already contain the needed signal: each transition pairs an action with its resulting next observation. Based on this observation, we propose PaW, a Policy and World modeling co-training framework that adds auxiliary WM supervision to the same policy during RL, without changing the inference paradigm. To make auxiliary WM supervision informative and stable, PaW introduces three components: action-entropy-based WM data selection, noise-tolerant WM loss, and reward-adaptive loss balancing. Experiments on three agentic task benchmarks show consistent improvements over strong RL baselines across models and RL algorithms. These results suggest that standard RL rollouts are a practical source of WM supervision for language-agent training.

  13. AFUN: Towards an Affordance Foundation Model for Functionality Understanding

    Affordance understanding bridges visual perception and physical action, serving as an explainable interface for robot manipulation in open and unstructured real-world environments. Yet, building an affordance foundation model that not only understands where and how the interaction should happen, but also generalizes across diverse environments, objects, and tasks, remains a long-standing research challenge. Existing methods typically address only part of this challenge, either localizing task-relevant regions without specifying executable motion, or predicting motion but with limited scalability. In this paper, we present AFUN, a step towards an affordance foundation model for functionality understanding. From a single RGB-D observation and a language task description, AFUN predicts a task-conditional functional mask (where to interact) and a 3D post-contact motion curve (how to interact). To support open-world generalization, we build a large-scale standardized data pipeline that converts heterogeneous robot, human, simulation, and real-world scan data into a shared affordance schema with language, masks, and object-centric 3D motion labels. We evaluate AFUN from three aspects: for affordance segmentation, AFUN outperforms all baselines by a large margin across 8 test sets from 4 benchmarks, improving mean gIoU/cIoU by +23.9/+26.3; for contact-point prediction, it predicts substantially more accurate points, with a 12.7-61.3% hit-rate gain over the best baseline; and for 3D motion, it achieves the best performance on all three test sets. AFUN can be deployed for real-world robot manipulation without finetuning for robot embodiment or using task-specific heuristics, demonstrating the ability to adapt to open-world affordance tasks. Project page: https://www.zhaoningwang.com/AFUN

  14. Agent Skills Should Go Beyond Text: The Case for Visual Skills

    Reusable skills are a key mechanism for extending agent capabilities, allowing agents to accumulate experience and solve increasingly complex tasks. Yet most existing skill-learning methods store reusable experience as text-only assets, such as instructions, reasoning traces, or summarized trajectories. We argue that this text-only paradigm creates a fundamental bottleneck for visual-centric tasks, where reusable knowledge often depends on spatial layout, visual grounding, fine-grained appearance, and localized state changes. To address this limitation, we propose a multimodal skill paradigm that combines declarative textual logic with explicit visual support. We distinguish three reusable forms: static priors for stable spatial conventions, dynamic priors for in-situ visual working memory, and interleaved visual skills that bind ordered text steps to the source frames, screenshots, or page regions that justify them. Rather than only describing what to do, visual skills also encode where to look, how to inspect, and how to verify visual outcomes. To scale visual-skill construction, we introduce an automatic system that converts agent experience into reusable multimodal skills by preserving textual reasoning, spatial references, visual boundaries, and interaction patterns from task trajectories. Experiments on GUI and other visual-centric tasks show that visual skills consistently outperform text-only skills, particularly when success requires spatial correspondence, visual evidence, and state-aware interaction. These results support our central position: reusable agent skills should go beyond text and become multimodal assets for future multimodal agents.

  15. MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

    Multimodal large language models (MLLMs) have shown strong capabilities in perception, reasoning, and action generation. However, their ability to sustain exploration in dynamic open worlds remains unclear. Existing embodied and game-based benchmarks often compress interaction into short-horizon tasks or entangle success with domain-specific game mechanics. In this paper, we introduce the MineExplorer benchmark for evaluating open-world exploration capabilities of MLLM agents in Minecraft. We first filter atomic tasks whose solutions rely heavily on Minecraft-specific knowledge to better reflect general open-world reasoning. Then we organize the benchmark around a ReAct-style capability formulation and compose atomic tasks into implicit multi-hop tasks. To further construct reliable instances, MineExplorer uses a multi-agent synthesis workflow that jointly designs task graphs, sandbox scenes, and rule-based milestone evaluators. Human evaluation shows that the multi-agent synthesis workflow produces significantly more reliable instances than a single-agent baseline. Experiments with advanced MLLM agents show that open-world exploration remains challenging, as strong models can handle many single-hop tasks but degrade sharply when hidden prerequisites must be coordinated over longer trajectories. Further analysis finds that task difficulty tracks agent completion, and larger models or thinking modes do not consistently translate into better performance. Code and dataset are available at https://github.com/Jometeorie/MineExplorer.

Techmeme(15)

  1. The US sanctions Nobitex, Iran's largest crypto exchange, accusing it of helping Iran's government and blacklisted state institutions evade Western sanctions (Gavin Finch/Reuters)

    Gavin Finch / Reuters: The United States announced sanctions on Iran's biggest cryptocurrency exchange on Tuesday, accusing it of enabling the Iranian government …

  2. Palo Alto Networks reports Q3 revenue up 31% YoY to $3B, including $388M from CyberArk and Chronosphere, vs. $2.94B est., and forecasts Q4 revenue above est. (Samantha Subin/CNBC)

    Samantha Subin / CNBC: Palo Alto Networks topped Wall Street's third-quarter estimates, citing AI advancements

  3. Meta is testing a Series feature, letting select creators make episodic Reels that are placed in a dedicated hub on their profile, using both old and new Reels (Aisha Malik/TechCrunch)

    Aisha Malik / TechCrunch: Meta is testing a new “Series” feature for Reels that's designed to make it easier to keep up with serialized content on Instagram and Facebook, the company told TechCrunch.

  4. Source: a CoreWeave-tied data center raised $900M via five-year junk bonds, priced at par to yield 7.5%, as the sector increasingly turns to high-yield bonds (Gowri Gurumurthy/Bloomberg)

    Gowri Gurumurthy / Bloomberg: A data center tied to CoreWeave Inc. raised $900 million from a high-yield note offering, joining a wave of junk issuers tapping debt markets …

  5. Marvell shares closed up 32.52% on Tuesday after Jensen Huang hailed the chipmaker as the "next trillion-dollar company" at Computex (Sawdah Bhaimiya/CNBC)

    Sawdah Bhaimiya / CNBC: Nvidia's CEO Jensen Huang hailed Marvell Technology as the next trillion-dollar firm, sending its shares up 32% on Tuesday.

  6. Microsoft releases ASSERT, an open-source framework that lets developers generate and run AI behavior tests using natural-language descriptions (Ram Iyer/TechCrunch)

    Ram Iyer / TechCrunch: AI researchers and labs have advanced by leaps and bounds in evaluating AI models for everything from safety and compliance to sycophancy and alignment.

  7. Internal memo: Meta is scaling back elements of its employee tracking tool, launched in April to help train its AI models, after staff raised concerns (Jyoti Mann/The Information)

    Jyoti Mann / The Information: Meta Platforms is scaling back elements of its employee tracking tool after staff raised concerns about the tool, according to an internal memo reviewed by The Information.

  8. Microsoft and Mayo Clinic partner for an AI model trained on Mayo's medical data, with plans to build an AI healthcare assistant and AI tools for clinicians (Clare Duffy/CNN)

    Clare Duffy / CNN: People have been seeking out health information online since the dawn of the internet. And now, tens of millions of people …

  9. An interview with Microsoft AI CEO Mustafa Suleyman about its models catching up to the state of the art from months ago, refusing to distill models, and more (Reed Albergotti/Semafor)

    Reed Albergotti / Semafor: Not long ago, Microsoft leapt ahead of competitors by funding OpenAI when it was a fledgling research lab.

  10. Microsoft debuts MAI-Thinking-1, its first advanced reasoning AI model, trained "from the ground up on clean data, without distillation from third-party models" (Jay Peters/The Verge)

    Jay Peters / The Verge: MAI-Thinking-1 is one of seven new models announced at Build 2026. … Microsoft announced a bunch of new in-house AI models …

  11. Microsoft unveils Majorana 2, a quantum chip that it developed using AI tools for materials science, and says it will have commercial quantum machines by 2029 (Stephen Nellis/Reuters)

    Stephen Nellis / Reuters : Microsoft unveils Majorana 2, a quantum chip that it developed using AI tools for materials science, and says it will have commercial quantum machines by 2029 —  Microsoft (MSFT.O) on Tuesday unveiled a new quantum computing chip that it redesigned with the help of AI …

  12. Google adds a scam-detection feature, built on RCS, to Android 12 and later that verifies whether a call is coming from the caller's actual Android smartphone (Lily Hay Newman/Wired)

    Lily Hay Newman / Wired : Google adds a scam-detection feature, built on RCS, to Android 12 and later that verifies whether a call is coming from the caller's actual Android smartphone —  Available for Android 12 and later, the anti-scam feature is baked into Google Dialer, which sends a silent “confirmation signal” …

  13. Microsoft announces the Agent Control Specification, an open-source standard that gives developers a granular, consistent way to control what AI agents can do (Ram Iyer/TechCrunch)

    Ram Iyer / TechCrunch : Microsoft announces the Agent Control Specification, an open-source standard that gives developers a granular, consistent way to control what AI agents can do —  As AI agents grow ever more capable, enterprises racing to put them to work across applications, workflows, and products face a new challenge …

  14. GitHub unveils a GitHub Copilot desktop app in technical preview, which introduces a new feature called canvases for bidirectional work between users and agents (Mario Rodriguez/The GitHub Blog)

    Mario Rodriguez / The GitHub Blog : GitHub unveils a GitHub Copilot desktop app in technical preview, which introduces a new feature called canvases for bidirectional work between users and agents —  At Microsoft Build 2026, GitHub introduced new tools, updates, and surfaces so agents can work the way you already work.

  15. Microsoft announces seven AI models, including a reasoning one and an "ultra efficient" coding one fine-tuned for GitHub, for businesses and to lower its costs (Rafe Rosner-Uddin/Financial Times)

    Rafe Rosner-Uddin / Financial Times : Microsoft announces seven AI models, including a reasoning one and an “ultra efficient” coding one fine-tuned for GitHub, for businesses and to lower its costs —  Software giant's AI chief Mustafa Suleyman says focus is on developing products for business users

Solidot(15)

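  1. The soil that refuses to stop breathing

    French biochemist Sébastien Fontaine has spent 15 years trying to kill soil in order to learn how much carbon soil releases when it contains no life at all. His team sealed soil in jars and sterilized it with gamma radiation, then waited for the soil's carbon dioxide output, a sign of ongoing microbial respiration, to fall. They waited weeks, then months. Under the microscope the irradiated soil showed no signs of life, yet it kept releasing carbon dioxide. The soil refused to stop breathing. Fontaine's lab repeated the experiment and got the same result, and the researchers began looking for the source of respiration in lifeless soil. The team now reports that its soil samples kept consuming oxygen and releasing carbon dioxide for six years. They propose that the metabolic processes that power life may also occur outside living cells. Their experiments suggest that this metabolism can operate in soil even without the biological proteins that normally organize it. If the hypothesis is correct, some biochemical reactions, such as those that release energy from carbon-rich sugar molecules, may not be unique to living organisms. Such reactions may even have existed before life appeared on Earth.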

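  2. Blue octopus is a new species

    In 2015, scientists on a deep-sea expedition in the Galápagos Islands were reviewing footage from a remotely operated vehicle when they spotted a small, entirely blue octopus at a depth of about 1,773 meters. They collected the octopus for further analysis. Researchers have now concluded that the animal, small enough to sit in the palm of a hand, belongs to a new species. The study was published in the journal Zootaxa. The octopus has been kept in storage. Because the specimen is unique and a second one is extremely unlikely to be collected, the scientists were unwilling to dissect it for a full species-identification analysis. The team instead chose a mini-CT scan, which showed that the animal has short arms with few suckers, no ink sac, smooth skin, and one very large rachidian tooth. They named the species Microeledone galapagensis.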

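  3. Iron-rich immune cells help homing pigeons navigate

    Migratory birds, sea turtles, and other animals appear able to sense Earth's magnetic field and use it to navigate. According to a study published in Science, iron-rich immune cells in the livers of homing pigeons may give the birds a magnetic compass. Analysis of thin sections of pigeon tissue found that liver macrophages are rich in ferritin, which is scarce in the spleen and entirely absent from the beak and brain. Electron microscopy further showed that the macrophages sit right next to neurons, all of which connect to the central nervous system. To test whether iron-rich macrophages can guide pigeons like a magnetic compass, the researchers suppressed macrophage activity with a drug called clodronate liposomes. The team trained 34 pigeons. By day, pigeons orient themselves by the position of the sun. When the sky is overcast or the sun is fully hidden by cloud, they rely on magnetoreception. The team injected 18 pigeons with clodronate and released them one at a time 24 hours later, when clouds completely blocked the sun. Each pigeon carried a GPS transmitter, so the team could track its flight path in real time. All 18 pigeons got lost and did not return until the sky cleared. None of the 16 control pigeons got lost. The researchers say that if the ferritin-assisted navigation mechanism is confirmed, it may be universal, applying to animals from bees to bats to whales and sharks.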

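  4. NASA's low-boom supersonic aircraft X-59 to make its first attempt at breaking the sound barrier

    NASA has announced that the X-59 QueSST low-boom supersonic aircraft, designed by Lockheed Martin's Skunk Works, will make its first attempt to break the sound barrier this month. The X-59 is designed to exceed the speed of sound without the sonic boom that supersonic aircraft normally produce. Instead it makes a quieter "thump," similar to the sound of a car door closing as heard from indoors. It has no forward-facing window. Cameras and displays give the pilot an augmented-reality external vision system that shows what is ahead of the aircraft. If the X-59 succeeds, it could revolutionize supersonic flight and the aviation industry and lead to the lifting of current restrictions on supersonic flight. The X-59 completed its first flight in October 2025 and has made 14 test flights since March 2026. This month's supersonic flight is planned to reach Mach 1.4 at an altitude of 16.7 km.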

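  5. China cracks down on "ghost takeout" in the fast-food industry

    China is cracking down on "ghost takeout" operations that have caused food safety problems. The term refers to merchants that offer delivery on takeout platforms but have no physical store. Under new rules that took effect on Monday, merchant information on delivery platforms must match a physical store, and merchants must state whether they offer dine-in service. Last year a man in Beijing complained that a cake he ordered through a delivery platform was of poor quality and decorated with inedible flowers. The incident drew attention to ghost takeout. An investigation found that the chain he ordered from listed nearly 380 outlets across major e-commerce platforms but did not have a single physical store. Its online shops also used forged business licenses. Further investigation showed that cakes ordered from the online shops were outsourced to an order-forwarding platform, which assigned each order to the third-party merchant that bid the lowest price. Authorities found a total of 3.6 million cake orders on two order-forwarding platforms. They also found 67,000 "ghost shops" on seven major delivery platforms, which had "colluded with order-forwarding sites to form an illegal supply chain." In April this year, the State Administration for Market Regulation announced fines totaling 3.6 billion yuan in the ghost takeout cases against seven e-commerce platforms: Pinduoduo, Meituan, JD.com, Ele.me, Douyin, Taobao, and Tmall.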

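  6. China brings data and algorithms under trade secret protection

    China has widened the scope of trade secret protection to include data and algorithms, in a move to guard against the outflow of technology. The revised Provisions on the Protection of Trade Secrets, issued by China's State Administration for Market Regulation, took effect on Monday (June 1). This is the first time Chinese law has explicitly placed digital assets such as data and algorithms within the scope of trade secret protection. The new rules also set stricter security requirements for remote work and cross-border business cooperation. Companies must adopt protective measures, including restricting file access by employee rank, masking sensitive information, and logging user activity. The provisions also cover trade secret infringement carried out overseas, but they do not specify an enforcement mechanism. To accompany the new rules, the regulator on Monday launched a month-long special enforcement campaign focused on key sectors such as biopharmaceuticals, semiconductors, and artificial intelligence. It targets "malicious poaching" of staff and employees who take trade secrets with them when they change jobs.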

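  7. Energy crisis pushes EV sales to record highs in 37 countries

    Global electric vehicle sales are growing quickly as the Middle East crisis drives up fuel prices. According to S&P Global Mobility data, of the 150 countries for which data is available, 28 countries, including Australia and the UK, set all-time monthly records for EV sales in March. Another 9 countries, including Brazil and the Philippines, set new highs in April. Across March and April, EV sales rose in 91% of countries. In South Korea, which depends heavily on the Middle East for crude oil imports, EV sales in March and April rose to 2.4 times the year-earlier level, and the EV share of new car sales climbed 14 percentage points to 26%. In Southeast Asia, EV sales grew 40% and market share rose to 16%. The EU market also emerged from a period of stagnation, with sales up 40% year over year. In China, EV sales fell 8%, but because overall new car demand dropped at the same time, the EV share of new car sales still rose 5 percentage points to 42%. In a report published in May, the International Energy Agency said the response to this energy crisis "will shape the global car market for years to come."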

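  8. The Pirate Bay, 20 years after the police raid

    On May 31, 2006, less than three years after The Pirate Bay was founded, 65 Swedish police officers entered a data center in Stockholm. Acting under pressure from the US government and as part of a criminal investigation, they had orders to take The Pirate Bay's servers offline. Before the police entered the data center, Pirate Bay co-founders Gottfrid Svartholm and Fredrik Neij already sensed that something was wrong. They had noticed undercover agents tailing them. This time, though, the police were after their servers. At around 10 a.m., Gottfrid told Fredrik that police had arrived at the office and asked him to go to the hosting facility and destroy the "incriminating evidence." As Fredrik left, he realized the problem might be related to their torrent tracker, and as a precaution he decided to make a full backup of the site. When he reached the hosting facility, his fears were confirmed. Dozens of officers carried off dozens of servers, most of which belonged to customers unrelated to The Pirate Bay. Over the following days, it became clear that Fredrik's decision to back up the site was the most pivotal moment in The Pirate Bay's history. Because of the backup, the team was able to restore the site within three days. The way they handled the incident also continued the site's tradition of pranks. They renamed the site "The Police Bay" and designed a new logo that fired cannonballs at Hollywood. A few days later the logo was replaced with a phoenix, symbolizing the site's rebirth from its digital ashes. Far from shutting The Pirate Bay down, the raid put it in the mainstream media spotlight, thanks in large part to the site's quick recovery. The coverage also triggered a surge in traffic, the opposite of what Hollywood had hoped for. Twenty years on, The Pirate Bay is still The Pirate Bay.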

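  9. Red Hat's official NPM account compromised, packages implanted with malware

    Red Hat's official NPM account, @redhat-cloud-services, was compromised, and several packages associated with the account were implanted with credential-stealing malware. The malware is designed to steal GitHub Actions secrets as well as credentials for AWS, GCP, Azure, Kubernetes, HashiCorp Vault, npm, and CircleCI. It is also a self-propagating worm that uses stolen npm tokens and npm's bypass_2fa parameter to automatically republish backdoored versions of other packages. Red Hat said in a statement that the malicious packages have been removed and that its investigation is ongoing. Its initial analysis found no impact on customer or partner environments or on Red Hat production systems.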

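  10. Anthropic files for IPO

    Anthropic has confidentially submitted an IPO prospectus to the US Securities and Exchange Commission (SEC). The company said it will choose when to list, based on market conditions and other factors, after the SEC completes its review. Anthropic's valuation has grown explosively this year, reaching $965 billion in its latest funding round last week and surpassing OpenAI's valuation of $852 billion from late March. The US stock market is about to see three trillion-dollar companies go public: SpaceX is expected to list this month, and Anthropic's rival OpenAI is expected to file soon. The three companies' combined market value is expected to reach $4 trillion.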

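  11. Hackers use Meta AI bot to take over celebrity Instagram accounts

    Pro-Iran hackers tricked the Meta AI bot into handing over control of several high-profile Instagram accounts for a short time, including those of Barack Obama and the Chief Master Sergeant of the US Space Force, and then posted pro-Iran images and messages on them. The attack was very simple. The attacker used a VPN to connect from a location near where the target usually lives, requested a password reset, asked to talk to Meta AI support, and instructed the AI to link the target account to a new email address. Once the AI sent a one-time verification code to that address as instructed, the attacker could reset the password and take over the account. A large number of channels trading hijacked accounts have already appeared on Telegram. Meta's Andy Stone said the company has taken action to fix the problem.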

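  12. Three Ebola vaccines in development

    The International AIDS Vaccine Initiative (IAVI), the University of Oxford, and Moderna are each developing vaccines against the Ebola virus. IAVI said the Ebola outbreak now under way in the Democratic Republic of the Congo may be the most serious to date. The outbreak is in a conflict zone, with more than a thousand suspected cases reported and 9 confirmed cases in neighboring Uganda. Six Ebola virus strains are known, and only three of them cause outbreaks. A targeted vaccine already exists for the most common strain, Zaire, but this outbreak involves the rarer Bundibugyo strain, for which there is no vaccine yet. Moderna has announced that it will use mRNA technology to develop a vaccine against the Bundibugyo strain.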

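  13. Brazilian Amazon sees longer dry seasons and shifting rainfall patterns

    Two recently published studies show that the Brazilian Amazon is beginning to experience conditions that had been predicted to arrive only decades from now, including longer dry seasons and changed rainfall patterns. Without countermeasures, the situation could deteriorate quickly, threatening biodiversity, the recharge of natural reservoirs, and forest function. One study shows that the dry season in the Amazon is lengthening from four months to six, with precipitation during that period falling by more than 150 mm. The second study analyzed drought in the Amazon between 2023 and 2024. It found that burned area increased by 9% and forest degradation alerts rose by 19%, and that at the peak of the drought as much as 4.2 million hectares of forest were affected by fire. The results indicate that the cycle of drought, fire, and degradation is intensifying and weakening the ecosystem's resilience. The area of the Amazon rainforest may also shrink.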

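  14. After China approved its first invasive brain-computer interface chip

    One day last October, Dong Hui suddenly decided to see whether he could hold a pen and write. Six years earlier, a spinal cord injury from a car accident had left him paralyzed from the neck down. Slowly but steadily, he wrote his name, "thank you," and the date. He could do this because of the brain-computer interface (BCI) chip trial he had joined. In November 2024, Dong Hui became one of the first patients in China to undergo brain surgery to implant an invasive BCI chip. In March this year, the implantable BCI product he uses was approved for commercial use. His device, called NEO, was developed jointly by Shanghai startup Neuracle Technology and Tsinghua University. The surgery took about 1.5 hours, and the sensors that collect brain signals were placed on his dura mater. The implant transmits signals to a computer, which translates them into commands that control a soft robotic glove he wears during 2.5 hours of daily training to help him learn to grasp. He began rehabilitation about a week after surgery. "On the ninth day of training, my right hand managed to grab a ball without the glove. It was truly a miracle." Avinash Singh, a BCI researcher at the University of Technology Sydney, said one reason NEO was approved so quickly is that it is relatively less invasive. Its 8 sensors sit on top of the brain's protective membrane, whereas the N1 chip developed by Elon Musk's Neuralink penetrates the cerebral cortex directly. NEO carries lower risks of bleeding, glial scarring, and long-term signal degradation. China is also moving to cover BCIs under medical insurance and has named the technology, alongside quantum technology and humanoid robots, as one of six key industries critical to the country's future technological competitiveness. Information scientist Meicen Sun said one of China's major advantages is that patients are willing to embrace new technology. US startup Axoft is working with a Chinese company to test its BCI on four patients in China and plans to scale up.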

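  15. Experimental drug markedly extends survival for patients with the deadliest cancer

    Pancreatic cancer is the deadliest cancer, and most existing therapies have little effect. Now phase III clinical trial results have been released for a drug called daraxonrasib. The trial enrolled 500 patients whose pancreatic cancer had spread: 248 took daraxonrasib daily and the other 252 received chemotherapy. Median survival was 13.2 months in the drug group and 6.6 months in the chemotherapy group, meaning the drug doubled patients' survival, and with fewer side effects. The results were presented at the American Society of Clinical Oncology annual meeting in Chicago, where experts said the drug could lead a revolution in treatment. Daraxonrasib works by targeting a protein called KRAS, which drives almost all pancreatic cancers. The drug uses a molecular glue to capture and inhibit the KRAS protein, blocking tumor growth.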
