Curated by Shen Huang · 90 stories · ~14 min read
DIGEST · 2026-05-13

OrangeBot.AI Digest — 2026-05-13

90 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. Linux gaming is faster because Windows APIs are becoming Linux kernel features (www.xda-developers.com)
  2. Twin brothers wipe 96 government databases minutes after being fired (arstechnica.com)
  3. A History of IDEs at Google (laurent.le-brun.eu)
  4. Open Source Resistance: keep OSS alive on company time (ossresistance.com)
  5. Kickstarter is forced to ban adult content by payment processors (kotaku.com)
  6. Setting up a free *.city.state.us locality domain (2025) (fredchan.org)
  7. Reverting the incremental GC in Python 3.14 and 3.15 (discuss.python.org)
  8. Dutch suicide prevention website shares data with tech companies without consent (nltimes.nl)
  9. Leaving GitHub for Forgejo (jorijn.com)
  10. I moved my digital stack to Europe (monokai.com)
  11. Preserving Fisher-Price Pixter (dmitry.gr)
  12. New stainless steel can survive conditions for hydrogen production in seawater (www.sciencedaily.com)
  13. SecurityBaseline.eu (internetcleanup.foundation)
  14. Deterministic Fully-Static Whole-Binary Translation Without Heuristics (arxiv.org)
  15. The vi family (lpar.ath0.com)

GitHub Trending (15)

  1. tinyhumansai / openhuman
  2. rohitg00 / agentmemory
  3. obra / superpowers
  4. yikart / AiToEarn
  5. influxdata / telegraf
  6. millionco / react-doctor
  7. K-Dense-AI / scientific-agent-skills
  8. danielmiessler / Personal_AI_Infrastructure
  9. supertone-inc / supertonic
  10. CloakHQ / CloakBrowser
  11. Greedeks / GTweak
  12. mattpocock / skills
  13. ArthurBrussee / brush
  14. imthenachoman / How-To-Secure-A-Linux-Server
  15. apernet / hysteria

Product Hunt (15)

  1. BossHogg

    Agent-first CLI for PostHog analytics and feature flags

  2. Claudy

    A proper home for Claude Code — multi-session, multi-account

  3. Apideck MCP Server

    Give AI agents access to real-time data across 200+ apps

  4. Liminary

    Ground your AI in saved knowledge as you work

  5. Blaze 2.0

    AI marketer for SMBs complete w/ strategy, content, and ads

  6. Googlebook

    A new kind of laptop designed for Gemini Intelligence

  7. LayerProof Matte 2.0

    Create high-quality social content at the speed of trends

  8. CraftBot with Living UI

    Grow your own software that is alive.

  9. SideNotes

    Take notes on your screen side

  10. Memoket Gem

    An AI wearable that remembers your conversations all day

  11. Pipali

    An AI coworker for any computer work

  12. Vibespace

    Workspace for multi-agent collaboration

  13. IndexedAI

    Your site scores X/100 for AI agents with next steps

  14. AI meeting notes by Snaply

    Free & Private AI meeting notes for your Mac

  15. Mi

    30-line zero-config CLI agent for bug fixes + refactoring

Hugging Face (15)

  1. Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics

    World models enable agents to anticipate the effects of their actions by internalizing environment dynamics. In enterprise systems, however, these dynamics are often defined by tenant-specific business logic that varies across deployments and evolves over time, making models trained on historical transitions brittle under deployment shift. We ask a question the world-models literature has not addressed: when the rules can be read at inference time, does an agent still need to learn them? We argue, and demonstrate empirically, that in settings where transition dynamics are configurable and readable, runtime discovery complements offline training by grounding predictions in the active system instance. We propose enterprise discovery agents, which recover relevant transition dynamics at runtime by reading the system's configuration rather than relying solely on internalized representations. We introduce CascadeBench, a reasoning-focused benchmark for enterprise cascade prediction that adopts the evaluation methodology of World of Workflows on diverse synthetic environments, and use it together with deployment-shift evaluation to show that offline-trained world models can perform well in-distribution but degrade as dynamics change, whereas discovery-based agents are more robust under shift by grounding their predictions in the current instance. Our findings suggest that, in configurable enterprise environments, agents should not rely solely on fixed internalized dynamics, but should incorporate mechanisms for discovering relevant transition logic at runtime.

  2. World Action Models: The Next Frontier in Embodied AI

    Vision-Language-Action (VLA) models have achieved strong semantic generalization for embodied policy learning, yet they learn reactive observation-to-action mappings without explicitly modeling how the physical world evolves under intervention. A growing body of work addresses this limitation by integrating world models, predictive models of environment dynamics, into the action generation pipeline. We term this emerging paradigm World Action Models (WAMs): embodied foundation models that unify predictive state modeling with action generation, targeting a joint distribution over future states and actions rather than actions alone. However, the literature remains fragmented across architectures, learning objectives, and application scenarios, lacking a unified conceptual framework. We formally define WAMs and disambiguate them from related concepts, and trace the foundations and early integration of VLA and world model research that gave rise to this paradigm. We organize existing methods into a structured taxonomy of Cascaded and Joint WAMs, with further subdivision by generation modality, conditioning mechanism, and action decoding strategy. We systematically analyze the data ecosystem fueling WAMs development, spanning robot teleoperation, portable human demonstrations, simulation, and internet-scale egocentric video, and synthesize emerging evaluation protocols organized around visual fidelity, physical commonsense, and action plausibility. Overall, this survey provides the first systematic account of the WAMs landscape, clarifies key architectural paradigms and their trade-offs, and identifies open challenges and future opportunities for this rapidly evolving field.

  3. AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

    In this paper, we propose AlphaGRPO, a novel framework that applies Group Relative Policy Optimization (GRPO) to AR-Diffusion Unified Multimodal Models (UMMs) to enhance multimodal generation capabilities without an additional cold-start stage. Our approach unlocks the model's intrinsic potential to perform advanced reasoning tasks: Reasoning Text-to-Image Generation, where the model actively infers implicit user intents, and Self-Reflective Refinement, where it autonomously diagnoses and corrects misalignments in generated outputs. To address the challenge of providing stable supervision for real-world multimodal generation, we introduce the Decompositional Verifiable Reward (DVReward). Unlike holistic scalar rewards, DVReward utilizes an LLM to decompose complex user requests into atomic, verifiable semantic and quality questions, which are then evaluated by a general MLLM to provide reliable and interpretable feedback. Extensive experiments demonstrate that AlphaGRPO yields robust improvements across multimodal generation benchmarks, including GenEval, TIIF-Bench, DPG-Bench and WISE, while also achieving significant gains in editing tasks on GEdit without training on editing tasks. These results validate that our self-reflective reinforcement approach effectively leverages inherent understanding to guide high-fidelity generation. Project page: https://huangrh99.github.io/AlphaGRPO/

  4. Efficient Pre-Training with Token Superposition

    Pre-training of Large Language Models is often prohibitively expensive and inefficient at scale, requiring complex and invasive modifications in order to achieve high data throughput. In this work, we present Token-Superposition Training (TST), a simple drop-in method that significantly improves the data throughput per FLOPs during pre-training without modifying the parallelism, optimizer, tokenizer, data, or model architecture. TST is done in two phases: (i) A highly efficient superposition phase where we combine many contiguous tokens into one bag and train using a multi-hot cross-entropy (MCE) objective, and (ii) a recovery phase where we revert back to standard training. We extensively evaluate TST on the scale of 270M and 600M parameters and validate on 3B and a 10B A1B mixture of experts model, demonstrating that it is highly robust in different settings. Ultimately, TST consistently outperforms baseline loss and downstream evaluations, and under equal-loss settings, TST yields up to a 2.5x reduction in total pre-training time at the 10B A1B scale.
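
    The superposition phase described above can be illustrated with a toy multi-hot cross-entropy. This is a minimal sketch, not the paper's implementation: the helper names `make_bags` and `multi_hot_cross_entropy` are hypothetical, and it assumes each token in a bag receives equal target mass, which the abstract does not specify.

```python
import math

def make_bags(tokens, bag_size):
    """Split a token-id sequence into contiguous bags of `bag_size` tokens."""
    return [tokens[i:i + bag_size] for i in range(0, len(tokens), bag_size)]

def multi_hot_cross_entropy(logits, bag):
    """Cross-entropy between softmax(logits) and a normalized multi-hot target.

    Assumption: every token occurrence in the bag gets target mass 1/len(bag);
    the paper may weight or normalize the target differently.
    """
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    return -sum(log_probs[t] for t in bag) / len(bag)

tokens = [3, 1, 4, 7, 5, 9, 2, 6]
bags = make_bags(tokens, bag_size=4)      # two bags of four token ids each
uniform_logits = [0.0] * 10               # toy vocabulary of 10 tokens
loss = multi_hot_cross_entropy(uniform_logits, bags[0])
```

    With uniform logits over a 10-token vocabulary the loss equals log(10), the same value a standard cross-entropy would give, which is a quick sanity check on the normalization.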

  5. ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

    Computer Use Agents (CUAs) can act through both atomic GUI actions, such as click and type, and high-level tool calls, such as API-based file operations, but this hybrid action space often leaves them uncertain about when to continue with GUI actions or switch to tools, leading to suboptimal execution paths. This difficulty stems from the scarcity of high-quality interleaved GUI-Tool trajectories, the cost and brittleness of collecting real tool trajectories, and the lack of trajectory-level supervision for GUI-Tool path selection. In this paper, we propose ToolCUA, an end-to-end agent designed to learn optimal GUI-Tool path selection through a staged training paradigm. We first introduce an Interleaved GUI-Tool Trajectory Scaling Pipeline that repurposes abundant static GUI trajectories and synthesizes a grounded tool library, enabling diverse GUI-Tool trajectories without manual engineering or real tool-trajectory collection. We then perform Tool-Bootstrapped GUI RFT, combining warmup SFT with single-turn RL to improve decisions at critical GUI-Tool switching points. Finally, we optimize ToolCUA with Online Agentic RL in a high-fidelity GUI-Tool environment, guided by a Tool-Efficient Path Reward that encourages appropriate tool use and shorter execution paths. Experiments on OSWorld-MCP show that ToolCUA achieves 46.85% accuracy, a relative improvement of approximately 66% over the baseline, establishing a new state of the art among models of comparable scale. It also improves by 3.9% over GUI-only settings, demonstrating effective GUI-Tool orchestration. The results further suggest that training in a hybrid action space is a promising paradigm for real-world digital agents. Open-sourced here: https://x-plug.github.io/ToolCUA/

  6. MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments

    The Model Context Protocol (MCP) has unified the interface between Large Language Models (LLMs) and external tools, yet a fundamental gap remains in how agents conceptualize the environments within which they operate. Current paradigms are bifurcated: task-level planning often ignores execution-time dynamics, while reactive execution lacks long-horizon foresight. We present MCP-Cosmos, a framework that infuses generative World Models (WM) into the MCP ecosystem to enable predictive task automation. By unifying three disparate technologies, namely MCP, World Model, and Agent, we demonstrate that a "Bring Your Own World Model" (BYOWM) strategy allows agents to simulate state transitions and refine plans in a latent space before execution. We conducted experiments using two strategies, ReAct and SPIRAL, with 2 planning models and 3 representative world models over 20+ MCP-Bench tasks. We observed improvements in the agent's environment-interaction KPIs, such as tool success rate and tool parameter accuracy. The framework also offers new metrics, such as Execution Quality, that give new insight into the effectiveness of world models compared to the baseline.

  7. L2P: Unlocking Latent Potential for Pixel Generation

    Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel (L2P) transfer paradigm, an efficient framework that directly harnesses the rich knowledge of pre-trained LDMs to build powerful pixel-space models. Specifically, L2P discards the VAE in favor of large-patch tokenization and freezes the source LDM's intermediate layers, exclusively training shallow layers to learn the latent-to-pixel transformation. By utilizing LDM-generated synthetic images as the sole training corpus, L2P fits an already smooth data manifold, enabling rapid convergence with zero real-data collection. This strategy allows L2P to seamlessly migrate massive latent priors to the pixel space using only 8 GPUs. Furthermore, eliminating the VAE memory bottleneck unlocks native 4K ultra-high resolution generation. Extensive experiments across mainstream LDM architectures show that L2P incurs negligible training overhead, yet performs on par with the source LDM on DPG-Bench and reaches 93% performance on GenEval.

  8. Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents

    Multimodal deep search requires an agent to solve open-world problems by chaining search, tool use, and visual reasoning over evolving textual and visual context. Two bottlenecks limit current systems. First, existing tool-use harnesses treat images returned by search, browsing, or transformation as transient outputs, so intermediate visual evidence cannot be re-consumed by later tools. Second, training data is usually built by fixed curation recipes that cannot track the target agent's evolving capability. To address these challenges, we first introduce a visual-native agent harness centered on an image bank reference protocol, which registers every tool-returned image as an addressable reference and makes intermediate visual evidence reusable by later tools. On top of this harness, On-policy Data Evolution (ODE) runs a closed-loop data generator that refines itself across rounds from rollouts of the policy being trained. This per-round refinement makes each round's data target what the current policy still needs to learn. The same framework supports both diverse supervised fine-tuning data and policy-aware reinforcement learning data curation, covering the full training lifecycle of the target agent. Across 8 multimodal deep search benchmarks, ODE improves the Qwen3-VL-8B agent from 24.9% to 39.0% on average, surpassing Gemini-2.5 Pro in standard agent-workflow setting (37.9%). At 30B, ODE raises the average score from 30.6% to 41.5%. Further analyses validate the effectiveness of image-bank reuse, especially on complex tasks requiring iterative visual refinement, while rollout-feedback evolution yields more grounded SFT traces and better policy-matched RL tasks than static synthesis.
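
    The image bank idea, registering every tool-returned image as an addressable reference so later tools can reuse it, might look roughly like the sketch below. The class name `ImageBank`, its methods, and the `img_N` reference format are all hypothetical; the abstract does not describe the actual protocol.

```python
class ImageBank:
    """Hypothetical registry giving every tool-returned image a stable reference id."""

    def __init__(self):
        self._images = {}

    def register(self, image_bytes, source_tool):
        """Store an image and return a reference that later tool calls can cite."""
        ref = f"img_{len(self._images) + 1}"
        self._images[ref] = {"data": image_bytes, "source": source_tool}
        return ref

    def resolve(self, ref):
        """Return the raw image bytes behind a reference."""
        return self._images[ref]["data"]

bank = ImageBank()
ref = bank.register(b"\x89PNG...", source_tool="web_search")
cropped = bank.resolve(ref)[:4]                   # a later tool consumes the image by reference
crop_ref = bank.register(cropped, source_tool="crop")
```

    The point of the design is that intermediate images stop being transient outputs: the cropped result is itself registered, so a further tool can address it by `crop_ref`.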

  9. Relit-LiVE: Relight Video by Jointly Learning Environment Video

    Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decomposition, which remains highly unreliable for real-world videos and often leads to distorted appearances, broken materials, and accumulated temporal artifacts during relighting. In this work, we present Relit-LiVE, a novel video relighting framework that produces physically consistent, temporally stable results without requiring prior knowledge of camera pose. Our key insight is to explicitly introduce raw reference images into the rendering process, enabling the model to recover critical scene cues that are inevitably lost or corrupted in intrinsic representations. Furthermore, we propose a novel environment video prediction formulation that simultaneously generates relit videos and per-frame environment maps aligned with each camera viewpoint in a single diffusion process. This joint prediction enforces strong geometric-illumination alignment and naturally supports dynamic lighting and camera motion, significantly improving physical consistency in video relighting while easing the requirement of known per-frame camera pose. Extensive experiments demonstrate that Relit-LiVE consistently outperforms state-of-the-art video relighting and neural rendering methods across synthetic and real-world benchmarks. Beyond relighting, our framework naturally supports a wide range of downstream applications, including scene-level rendering, material editing, object insertion, and streaming video relighting. The Project is available at https://github.com/zhuxing0/Relit-LiVE.

  10. Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States

    Reinforcement learning with verifiable rewards (RLVR) for Large Reasoning Models hinges on baseline estimation for variance reduction, but existing approaches pay a heavy price: PPO requires a policy-model scale critic, while GRPO needs multiple rollouts per prompt to keep its empirical group mean stable. We introduce Policy Optimization with Internal State Value Estimation (POISE), which obtains a baseline at negligible cost by using the policy model's internal signals already computed during the policy forward pass. A lightweight probe predicts the expected verifiable reward from the hidden states of the prompt and generated trajectory, as well as token-entropy statistics, and is trained online alongside the policy. To preserve gradient unbiasedness despite using trajectory-conditioned features, we introduce a cross-rollout construction that predicts each rollout's value from an independent rollout's internal states. Because POISE estimates prompt value using only a single rollout, it enables higher prompt diversity for a fixed compute budget during training. This reduces gradient variance for more stable learning and also eliminates the compute overhead of sampling costs for detecting zero-advantage prompts. On Qwen3-4B and DeepSeek-R1-Distill-Qwen-1.5B across math reasoning benchmarks, POISE matches DAPO while requiring less compute. Moreover, its value estimator shows similar performance to a separate LLM-scale value model and generalizes to various verifiable tasks. By leveraging the model's own internal representations, POISE enables more stable and efficient policy optimization.
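
    The cross-rollout construction can be illustrated with a toy example: the baseline for one rollout is predicted from a different rollout's features, so it never depends on the trajectory whose gradient it scales. This is a sketch under stated assumptions, not POISE itself: the linear `probe_value`, the cyclic pairing of rollouts, and the feature vectors are all hypothetical stand-ins.

```python
def probe_value(weights, bias, features):
    """Hypothetical linear probe: predicted reward from internal-state features."""
    return bias + sum(w * f for w, f in zip(weights, features))

def cross_rollout_advantages(rewards, features, weights, bias):
    """Advantage of rollout i uses a baseline predicted from rollout j != i.

    Assumption: rollouts of one prompt are paired cyclically (i -> i + 1);
    the abstract does not say how independent rollouts are matched.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two rollouts per prompt")
    advantages = []
    for i in range(n):
        j = (i + 1) % n  # index of the independent rollout supplying the baseline
        baseline = probe_value(weights, bias, features[j])
        advantages.append(rewards[i] - baseline)
    return advantages

rewards = [1.0, 0.0]                      # verifiable rewards for two rollouts
features = [[0.5, 1.0], [1.0, 0.0]]       # stand-ins for hidden-state features
advantages = cross_rollout_advantages(rewards, features, weights=[0.5, 0.25], bias=0.0)
```

    Here both rollouts receive a predicted baseline of 0.5, so the successful rollout gets a positive advantage and the failed one a negative advantage.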

  11. Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs

    The continued improvements in language model capability have unlocked their widespread use as drivers of autonomous agents, for example in coding or computer use applications. However, the core of these systems has not changed much since early instruction-tuned models like ChatGPT. Even advanced AI agents function on message exchange formats, successively exchanging messages with users, systems, themselves (i.e., chain-of-thought), and tools in a single stream of computation. This bottleneck to a single stream in chat models leads to a number of limitations: the agent cannot act (generate output) while reading, and in reverse, cannot react to new information while writing. Similarly, the agent cannot act while thinking and cannot think while reading or acting on information. In this work, we show that models can be unblocked by switching from instruction-tuning for sequential message formats to instruction-tuning for multiple, parallel streams of computation, splitting each role into a separate stream. Every forward pass of the language model then simultaneously reads from multiple input streams and generates tokens in multiple output streams, all of which causally depend on earlier timesteps. We argue that this data-driven change remedies a number of usability limitations as outlined above, improves model efficiency through parallelization, improves model security through better separation of concerns and can further improve model monitorability.

  12. On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

    Tool-using LLM agents fail through trajectories rather than only final responses, as they may execute unsafe tool calls, follow injected instructions, comply with harmful requests, or over-refuse benign tasks despite producing a seemingly safe answer. Existing safety-alignment signals are largely response-level or off-policy, and often incur a safety-utility trade-off: improving agent safety comes at the cost of degraded task performance. Such sparse and single-objective rewards severely limit real-world usability. To bridge this gap, we propose FATE, an on-policy self-evolving framework that transforms verifier-scored failures into repair supervision without expert demonstrations. For each failure, the same policy proposes repair candidates, which are then re-scored by verifiers and filtered across security, utility, over-refusal control, and trajectory validity. This dense trajectory-level information is then used as a supervision signal for agent self-evolution. During this process, we further introduce Pareto-Front Policy Optimization (PFPO), combining supervised warmup with Pareto-aware policy optimization to preserve safety-utility trade-offs. Experiments on AgentDojo, AgentHarm, and ATBench show that FATE improves safety across different models and scales while preserving useful behavior. Compared with strong baselines, FATE reduces attack success rate by 33.5%, harmful compliance by 82.6%, and improves external trajectory-safety diagnosis by 6.5%. These results suggest that failed trajectories can provide structured repair supervision for safer self-evolving agents.

  13. Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction

    Asynchronous reinforcement learning improves rollout throughput for large language model agents by decoupling sample generation from policy optimization, but it also introduces a critical failure mode for PPO-style off-policy correction. In heterogeneous training systems, the total importance ratio should ideally be decomposed into two semantically distinct factors: a training--inference discrepancy term that aligns inference-side and training-side distributions at the same behavior-policy version, and a policy-staleness term that constrains the update from the historical policy to the current policy. We show that practical asynchronous pipelines with delayed updates and partial rollouts often lose the required historical training-side logits, or old logits. This missing-old-logit problem entangles discrepancy repair with staleness correction, breaks the intended semantics of decoupled correction, and makes clipping and masking thresholds interact undesirably. To address this issue, we study both exact and approximate correction routes. We propose three exact old-logit acquisition strategies: snapshot-based version tracking, a dedicated old-logit model, and synchronization via partial rollout interruption, and compare their system trade-offs. From the perspective of approximate correction, we focus on preserving the benefits of decoupled correction through a more appropriate approximate policy when exact old logits cannot be recovered at low cost, without incurring extra system overhead. Following this analysis, we adopt a revised PPO-EWMA method, which achieves significant gains in both training speed and optimization performance. Code at https://github.com/millioniron/ROLL.
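
    The two-factor decomposition of the importance ratio can be sketched per token as follows. This is an illustrative sketch, not the paper's code: the function names, the clipping threshold, and the truncation of the discrepancy term are assumptions, and it covers only the case where the old training-side log-probabilities are available.

```python
import math

def decoupled_ratios(logp_infer, logp_old_train, logp_new_train):
    """Split the total importance ratio into discrepancy and staleness factors.

    logp_infer:     log-prob of the sampled token under the inference engine
    logp_old_train: log-prob under the training engine at the same (old) policy version
    logp_new_train: log-prob under the current policy being optimized
    """
    discrepancy = math.exp(logp_old_train - logp_infer)     # training-inference mismatch
    staleness = math.exp(logp_new_train - logp_old_train)   # old policy -> current policy
    return discrepancy, staleness

def clipped_objective(advantage, discrepancy, staleness, clip_eps=0.2, max_discrepancy=2.0):
    """PPO-style clipped surrogate on the staleness term only.

    Assumption: the discrepancy term is applied as a truncated weight;
    both thresholds are illustrative values.
    """
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, staleness))
    surrogate = min(staleness * advantage, clipped * advantage)
    return min(discrepancy, max_discrepancy) * surrogate

d, s = decoupled_ratios(math.log(0.5), math.log(0.4), math.log(0.6))
total = d * s                              # equals exp(logp_new_train - logp_infer)
objective = clipped_objective(1.0, d, s)
```

    If `logp_old_train` is missing, only `total` can be computed, which is the entanglement the abstract describes: the clip then acts on mismatch and staleness together rather than on staleness alone.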

  14. SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning

    We introduce SeePhys Pro, a fine-grained modality transfer benchmark that studies whether models preserve the same reasoning capability when critical information is progressively transferred from text to image. Unlike standard vision-essential benchmarks that evaluate a single input form, SeePhys Pro features four semantically aligned variants for each problem with progressively increasing visual elements. Our evaluation shows that current frontier models are far from representation-invariant reasoners: performance degrades on average as information moves from language to diagrams, with visual variable grounding as the most critical bottleneck. Motivated by this inference-time fragility, we further develop large training corpora for multimodal RLVR and use blind training as a diagnostic control, finding that RL with all training images masked can still improve performance on unmasked validation sets. To analyze this effect, text-deletion, image-mask-rate, and format-saturation controls suggest that such gains can arise from residual textual and distributional cues rather than valid visual evidence. Our results highlight the need to evaluate multimodal reasoning not only by final-answer accuracy, but also by robustness under modality transfer and by diagnostics that test whether improvements rely on task-critical visual evidence.

  15. World Model for Robot Learning: A Comprehensive Survey

    World models, which are predictive representations of how environments evolve under actions, have become a central component of robot learning. They support policy learning, planning, simulation, evaluation, data generation, and have advanced rapidly with the rise of foundation models and large-scale video generation. However, the literature remains fragmented across architectures, functional roles, and embodied application domains. To address this gap, we present a comprehensive review of world models from a robot-learning perspective. We examine how world models are coupled with robot policies, how they serve as learned simulators for reinforcement learning and evaluation, and how robotic video world models have progressed from imagination-based generation to controllable, structured, and foundation-scale formulations. We further connect these ideas to navigation and autonomous driving, and summarize representative datasets, benchmarks, and evaluation protocols. Overall, this survey systematically reviews the rapidly growing literature on world models for robot learning, clarifies key paradigms and applications, and highlights major challenges and future directions for predictive modeling in embodied agents. To facilitate continued access to newly emerging works, benchmarks, and resources, we will maintain and regularly update the accompanying GitHub repository alongside this survey.

Techmeme (15)

  1. Cisco reports Q3 revenue up 12% YoY to $15.84B, vs. $15.56B est., forecasts Q4 revenue above est., and is cutting almost 4,000 jobs; CSCO jumps 17%+ after hours (Jordan Novet/CNBC)

    Jordan Novet / CNBC: Cisco shares soared 15% in extended trading on Wednesday after the networking company issued results and guidance that topped Wall Street's projections.

  2. Mythos Preview is the first AI model to complete both of AISI's cyber ranges, which measure models' cyberattack capabilities; GPT-5.5 solved only one of them (AI Security Institute)

    AI Security Institute: In February 2026, we internally estimated that the length of cyber tasks AI models could complete had doubled every 4.7 months since late 2024 …

  3. Q&A with Anthropic CFO Krishna Rao on the "cone of uncertainty" in AI, allocating compute, returns to frontier intelligence, platform vs. application, and more (Invest Like The Best on YouTube)

    Invest Like The Best on YouTube: In this episode of Invest Like The Best, Patrick O'Shaughnessy sits down with Anthropic CFO Krishna Rao …

  4. Musk v. Altman: Microsoft executive Michael Wetter testifies that Microsoft has spent $100B+ on OpenAI, including its investments and spending to build infrastructure (Bloomberg)

    Bloomberg: Microsoft Corp. has spent more than $100 billion on its partnership with OpenAI, a sum that underscores the significance …

  5. LinkedIn says it has "implemented organizational changes"; source: LinkedIn plans to cut ~5% of its 17,500 full-time workers and focus on business growth areas (Reuters)

    Reuters: LinkedIn planned to inform staff of layoffs on Wednesday, two people familiar with the matter told Reuters, in a widening of technology sector cuts this year.

  6. Rivian CEO RJ Scaringe's Mind Robotics, which is building AI-powered robots for manufacturing tasks, raised $400M at a $3.4B valuation, a source says (Sean McLain/Wall Street Journal)

    Sean McLain / Wall Street Journal: Funding for AI-powered industrial robot project now exceeds $1 billion. Mind Robotics, the startup founded by Rivian Chief Executive RJ Scaringe …

  7. Instagram rolls out Instants, which lets users share ephemeral photos, as an in-app feature and as a standalone Android and iOS app in select countries (Zac Hall/9to5Mac)

    Zac Hall / 9to5Mac: Meta just launched a brand new iPhone app called Instants. Built around ephemeral photo sharing, the new social media app is also the latest Instagram feature.

  8. Adaption, co-founded by ex-Cohere VP of AI research Sara Hooker, unveils AutoScientist, which can automate the research loop behind model training and alignment (Russell Brandom/TechCrunch)

    Russell Brandom / TechCrunch: For years, AI researchers have anticipated the moment when AI systems will be able to improve themselves better than humans could.

  9. Anthropic launches Claude for Small Business, featuring a host of automated services like bookkeeping functions, business insights, and tools for ad campaigns (Lucas Ropek/TechCrunch)

    Lucas Ropek / TechCrunch: Anthropic is looking to court smaller companies. To that end, the company announced Wednesday the launch of Claude for Small Business …

  10. Microsoft unveils MDASH, a security system that orchestrates 100+ AI agents to find vulnerabilities, and says it identified 16 previously unknown Windows flaws (Gyana Swain/CSO)

    Gyana Swain / CSO: The agentic tool, codenamed MDASH, will open to enterprise customers in private preview in June.

  11. Sources: Mistral has been developing a cybersecurity-focused AI model and has held discussions about it with European banks, which don't have access to Mythos (Bloomberg)

    Bloomberg : Sources: Mistral has been developing a cybersecurity-focused AI model and has held discussions about it with European banks, which don't have access to Mythos —  French artificial intelligence startup Mistral AI is in discussions with European banks about deploying its answer to Anthropic PBC's Mythos …

  12. Sources: Arm and its parent company SoftBank expressed preliminary interest in acquiring Cerebras weeks before its expected IPO; Cerebras rebuffed them (Bloomberg)

    Bloomberg : Sources: Arm and its parent company SoftBank expressed preliminary interest in acquiring Cerebras weeks before its expected IPO; Cerebras rebuffed them —  Arm Holdings Plc and its majority owner SoftBank Group Corp. made an approach to acquire Cerebras Systems Inc., the AI computing firm …

  13. The US FTC says Shutterstock will pay $35M to settle charges that Shutterstock misled consumers about its subscription plans and made it too difficult to cancel (Jonathan Stempel/Reuters)

    Jonathan Stempel / Reuters : The US FTC says Shutterstock will pay $35M to settle charges that Shutterstock misled consumers about its subscription plans and made it too difficult to cancel —  Shutterstock (SSTK.N) will pay $35 million to settle U.S. Federal Trade Commission charges that the online provider of stock photography …

  14. OpenAI endorses the Kids Online Safety Act and Illinois SB 315, an AI safety bill to establish requirements around transparency, incident reporting, and more (OpenAI Global Affairs)

    OpenAI Global Affairs : OpenAI endorses the Kids Online Safety Act and Illinois SB 315, an AI safety bill to establish requirements around transparency, incident reporting, and more —  Welcome (back) to The Prompt.  For the first time, we're using it to endorse legislation.  The bills we're supporting today …

  15. Q&A with Alexandr Wang on rebuilding Meta's AI stack, Muse Spark, personal superintelligence, Meta acquiring Assured Robot Intelligence, Sam Altman, and more (Ashlee Vance/Core Memory)

    Ashlee Vance / Core Memory : Q&A with Alexandr Wang on rebuilding Meta's AI stack, Muse Spark, personal superintelligence, Meta acquiring Assured Robot Intelligence, Sam Altman, and more —  Last June, Meta pried Alex Wang away from Scale AI, the company he co-founded and ran, in a deal valued at $14 billion.

Solidot(15)

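  1. Fired brothers delete 96 databases

    Companies normally revoke an employee's credentials before a dismissal takes effect, because a fired employee is a security risk. Twin brothers Muneeb and Sohaib Akhter, fired by the same company, deleted 96 US government databases within minutes. Both had criminal records and had previously been sentenced for their offenses, yet the same company hired them one after the other in 2023 and 2024. The employer learned of their past in February 2025 and fired them immediately. Sohaib's account was disabled, but his brother Muneeb's account was overlooked. Muneeb promptly set about sabotaging the company: he deleted the US government databases it maintained, and he also asked an AI tool how to clear the system logs on a server after deleting databases. The two were arrested on December 3. Muneeb pleaded guilty on April 15, 2026, while Sohaib was tried by a jury and convicted of conspiracy.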

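  2. Kickstarter bans adult content

    In recent days the crowdfunding platform Kickstarter revised its rules to widen the range of prohibited adult content. Previously it banned only "pornographic content"; the ban now covers a much broader range, including but not limited to implied sexual acts, MILF/DILF content, implied nudity, and any content showing female nipples/areolae, genitals, or anuses. It even bans buttocks. It is not yet clear why Kickstarter made the change. Media reports speculate that pressure from the payment company Stripe may be behind it. In March, Kickstarter emailed project creators to warn that Stripe would review any project containing "adult/NSFW content" and might close the project's fundraising account.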

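  3. Both the American left and right voice concern about AI

    Steve Bannon on the right and Bernie Sanders on the left disagree on most issues, but both believe AI is a disaster for the working class. Sanders says the AI oligarchs want more than to replace particular jobs; they want to replace workers. Bannon says Silicon Valley does not care about ordinary people. Polls show the US is among the countries most concerned about AI: it is both the world's leading AI developer and a leading source of opposition to AI. Anti-AI sentiment in the US keeps rising. Maine passed the country's first statewide ban on data center construction, although the governor vetoed it. A firebomb was thrown at the home of OpenAI CEO Sam Altman. A quarter of Americans accept violence as a means of opposition. Silicon Valley likes to compare AI to the Industrial Revolution and to stress the enormous wealth that revolution unleashed. The Industrial Revolution did drive economic growth, but living through it was another matter: entrepreneurs amassed vast fortunes while workers' wages stagnated and working conditions worsened. Most Americans already feel that the economic system is rigged in favor of the rich. Polling finds that the group of Americans most optimistic about AI's role in daily life is people earning more than $200,000 a year. Wealth and power are increasingly concentrated in the hands of a few.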

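  4. Arts and cultural activities linked to slower aging

    According to a study published in the journal Innovation in Aging, singing, painting, and visiting art galleries or museums help slow aging; the study links active participation in arts and cultural activities to better health. A slower pace of aging does not necessarily mean a longer life. The study assessed biological aging with "epigenetic clocks," which can predict future morbidity and mortality. By one of the measures used, people who took part in an arts activity at least once a week aged 4% more slowly, and those who did so once a month aged 3% more slowly. Another test showed that people who took part in an arts activity at least once a week were on average one year younger biologically than those who rarely did. By the same measure, people who exercised once a week were only six months younger. The researchers said the effect of the arts on the pace of aging is so marked that it is comparable to the difference between smokers and people who have quit.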

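  5. Foxconn confirms it was hit by a cyberattack

    After the ransomware group Nitrogen announced that it had stolen 8 TB of data comprising more than 11 million files, Foxconn confirmed that some of its factories in North America had suffered a cyberattack. The company said its "cybersecurity team immediately activated its response mechanism and took a range of operational measures to ensure continuity of production and delivery. The affected factories are currently resuming normal production." Nitrogen claims the stolen files include technical drawings related to projects for Intel, Apple, Google, Dell, Nvidia, and other companies. This is not the first time a ransomware group has attacked Foxconn: LockBit attacked Foxconn subsidiaries in 2022 and again in 2024.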

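  6. Amazon employees use an AI tool to inflate their usage numbers

    Amazon employees are using an internal AI tool to automate unnecessary work so that management sees them using the technology more often. In recent weeks Amazon began a large-scale rollout of its in-house product "MeshClaw," which lets employees create AI agents that connect to office software and carry out tasks on the user's behalf. Some employees say colleagues are using the software to automate extra AI operations that serve no real purpose, purely to raise their own token consumption. Amazon has set a target of more than 80% of developers using AI every week, and earlier this year it began tracking AI token usage on an internal leaderboard, which put employees under pressure to adopt the technology. As a result, some employees use MeshClaw to maximize their token consumption. Amazon has told employees that AI token statistics will not be used in performance reviews, but employees believe management is monitoring the data.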

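  7. The EU's browser choice screen brought Firefox millions of users

    The EU's Digital Markets Act (DMA) requires Apple and Google to show consumers a browser choice screen that lets them pick a browser other than the default (Safari or Chrome). Mozilla estimates that the choice screen has brought it about 6 million users. Its user base on iOS grew 113%, while on Android it grew only 12%. The gap may come from how Apple and Google implemented the choice screen: Apple users see it the first time they open Safari, whereas Android devices show it on first boot or after a factory reset. Mozilla says user retention rose fivefold compared with before the DMA took effect. Browser makers Aloha, Brave, Opera, and Vivaldi have also previously reported marked growth in users in the first days and weeks after the DMA was enforced. Mozilla wants the DMA to apply to desktop operating systems as well, and it accuses Microsoft of using deceptive design tactics to promote its Edge browser.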

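  8. FCC lets foreign routers keep receiving updates until 2029

    In March this year the FCC added all foreign-made consumer routers to its regulatory list, banning their sale in the US. The FCC cited national security, saying that "malicious actors exploit security vulnerabilities in foreign-made routers to attack American households, disrupt networks, conduct espionage, and steal intellectual property." US residents may keep using the foreign-made routers they already own, and foreign manufacturers were permitted to provide limited maintenance and security patches to US customers until March 2027. The FCC has now extended that deadline to January 2029, and in addition to security patches, foreign router makers may also ship major software and firmware updates. The move is intended to keep the foreign routers used by US residents secure for longer and to give people more time to replace their devices.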

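  9. South Korean presidential aide proposes an "AI dividend" for all citizens

    Kim Yong-beom, policy chief at South Korea's presidential office, proposed in a Facebook post that part of the surging profits of South Korean chipmakers be redistributed so that the profits generated by the AI boom are shared with all citizens. Kim argues that the economic gains from AI rest at least in part on the industrial infrastructure South Korea has built over the past fifty years. He noted that excess profits in the AI era are concentrated, with a handful of chip companies and a small number of people reaping huge gains. Last month tens of thousands of people gathered outside Samsung's main chip plant to demand a larger share of AI profits for employees. The Samsung union is demanding that 15% of operating profit be distributed to employees of the chip division, and it has threatened an 18-day strike starting May 21.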

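  10. Google announces Googlebook, a new AI-centric laptop

    Google has announced Googlebook, a new laptop that runs the Android operating system in desktop mode, but beyond repeating "AI" over and over it disclosed little else. It did not say whether the computation behind the AI features relies on the cloud, or whether everything a user does on the laptop is uploaded to the cloud. Apart from highlighting AI, the search giant did not explain whether the product raises privacy problems similar to Microsoft's Recall. Like Chromebooks, which run Chrome OS, Googlebooks are made and sold by third-party companies: Acer, Asus, Dell, HP, and Lenovo. Google says Googlebook is deeply integrated with Gemini Intelligence. Simply moving the cursor activates an AI feature called "Magic Pointer," in which the AI analyzes what is on screen, offers suggestions based on context, and can pull data from multiple apps. For example, pointing the cursor at a date in an email creates a calendar appointment. Googlebook can also interact closely with a user's Android phone: it can stream apps from the phone to the laptop and transfer files from the phone to the computer.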

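  11. Toxicity on social media

    In December 2025, researchers at Stanford University analyzed 2.2 billion social media posts, looking for patterns that identify the share of users who post toxic content. Toxic content here means hate-filled extremist content. So how large is the share of users who post it? Probably much lower than you think, but recommendation algorithms amplify such content and lead many people to believe it is mainstream. On Twitter/X, toxic tweets are retweeted about 86% more than non-toxic tweets and get about 27% more exposure; 0.3% of users shared 80% of contested news; and 6% of users posted about 73% of political tweets. On TikTok, 25% of users posted 98% of public videos. The exact figures differ, but the essence is the same: a small number of active users drown out the vast majority. The pattern the researchers found is a silent majority. Fearing social isolation for voicing dissent, most users either stay quiet or leave the platform, ceding the space to users who express extreme views, while the minority who post actively fall into a cognitive bias and believe they are the majority.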

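  12. Saturn's icy rings may have come from one of its moons

    How Saturn's rings formed has long been a matter of debate. The latest numerical simulations suggest that the magnificent ring system was not born together with Saturn but formed only about 100 million years ago. The hypothesis, proposed by a joint US-China research team, attributes the rings' origin to the structural destruction of an ancient moon named Chrysalis under powerful gravitational forces. The moon was physically similar to Iapetus, today Saturn's third-largest moon, with a diameter of about 1,469 km, and it had a layered interior consisting of a rocky core and an outer ice shell. According to the study, Chrysalis originally traveled on a highly elongated elliptical orbit whose closest approach fell within 1 to 1.5 Saturn radii, which is exactly the critical range of the Roche limit for icy bodies. In that region Saturn's strong tidal forces overcame the moon's own structural strength, and the gravitational pull tore it apart completely. Most of the debris from the breakup was captured by Saturn's gravity and evolved over time into the broad ring system, while the rest escaped into space. The study indicates that the early rings may have been far larger than what is observed today, but that the gravity of Titan and other large moons later removed or redistributed much of the material.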

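  13. EU prepares to act against addictive design on TikTok and Instagram

    European Commission President Ursula von der Leyen said on Tuesday that the EU will take action later this year against addictive design features on platforms such as TikTok and Instagram. Such features include infinite scrolling, autoplay, and push notifications. The European Commission will publish a legislative proposal as early as this summer and is currently awaiting the report of the Special Panel of experts on Child Safety Online.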

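  14. Study finds shorter working hours are associated with lower obesity rates

    A study presented at the European Congress on Obesity compared working patterns and obesity rates in 33 OECD countries between 1990 and 2022. It found that countries with longer annual working hours, such as the US, Mexico, and Colombia, also had higher obesity rates, even though average energy and fat intake in northern European countries is higher than in Latin American countries. A 1% reduction in annual working hours was associated with a 0.16% drop in the obesity rate. The researchers believe that work stress and a lack of time for exercise may explain why people who work longer hours gain weight more easily. The study's lead author, Dr. Pradeepa Korale-Gedara of the University of Queensland in Australia, said that increased stress raises levels of the hormone cortisol, leading people to store more fat in jobs where they cannot burn energy. The researchers stressed that the finding is a correlation and does not imply causation. It has nonetheless prompted experts to renew calls for a four-day work week, which they say helps people make healthier choices about diet, exercise, and sleep and benefits the health of society as a whole.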

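  15. Digg attempts another relaunch, pivoting to AI news aggregation

    In early January this year Digg launched a Reddit clone offering similar interest-based communities, but two months later it announced the site would close, citing a flood of bot accounts. Digg is now preparing another relaunch, this time returning to what it once was: a news aggregator. Digg showed beta testers a preview of the new site, which aims to track the most influential voices in a given field and surface the news that genuinely deserves attention. AI is the field Digg is currently testing; if that succeeds, it will expand to other topics. Digg will scrape content from X in real time to gauge which discussions are trending, and it will also run sentiment analysis, clustering, and signal detection to judge which content matters most.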
