OrangeBot.AI Digest — 2026-05-03

80 headlines across 8 sources, aggregated for the day.

Hacker News (15)

  1. Statue of a man blinded by a flag put up by Banksy in central London (www.smithsonianmag.com)
  2. OpenAI's o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors (www.theguardian.com)
  3. I built my own hair electrolysis machine (www.scd31.com)
  4. Denuvo has been cracked in all single-player games it previously protected (www.tomshardware.com)
  5. Why TUIs Are Back (wiki.alcidesfonseca.com)
  6. BYOMesh – New LoRa mesh radio offers 100x the bandwidth (partyon.xyz)
  7. Metal Gear Solid 2's source code has been leaked on 4chan (www.thegamer.com)
  8. Southwest Headquarters Tour (katherinemichel.github.io)
  9. How far behind is each major Chromium browser? (chromium-drift.pages.dev)
  10. For thirty years I programmed with Phish on, every day (christophermeiklejohn.com)
  11. A desktop made for one (isene.org)
  12. Mercedes-Benz commits to bringing back physical buttons (www.drive.com.au)
  13. Utah to hold websites liable for users who mask their location with VPNs (www.tomshardware.com)
  14. Specsmaxxing – On overcoming AI psychosis, and why I write specs in YAML (acai.sh)
  15. Windows quality update: Progress we've made since March (blogs.windows.com)

GitHub Trending (9)

  1. ruvnet / ruflo
  2. TauricResearch / TradingAgents
  3. soxoj / maigret
  4. Hmbown / DeepSeek-TUI
  5. AIDC-AI / Pixelle-Video
  6. browserbase / skills
  7. czlonkowski / n8n-mcp
  8. 1jehuang / jcode
  9. openwrt / openwrt

Product Hunt (15)

  1. Mockin 2.0

    Ultimate career toolkit for UX/UI & Product designers

  2. Rosentic

    Catch when coding agents break each other before merge

  3. Radar

    The missing open-source Kubernetes UI

  4. PandaProbe

    Open source agent engineering platform

  5. Aximote In-Car App

    The fitness tracker for your car

  6. Huddle01 VMs

    Virtual Machines for Your Agents

  7. Scholé

    Turn everyday work into personalized AI learning

  8. Filect

    Organize Your Files With AI

  9. Microsoft Copilot Health

    Dedicated space to bring your personal health data together

  10. YouTube TV Custom Multiview

    Mix and match up to 4 live streams at once

  11. Breaks

    A quiet Pomodoro that lives in your menu bar

  12. Ara

    Build an entire business by texting

  13. Feather

    Photo editor with local AI

  14. Cloud Computer by Manus

    A dedicated cloud machine for bots and software

  15. Zed 1.0

    High-performance, open source, multiplayer code editor

Hugging Face (15)

  1. Heterogeneous Scientific Foundation Model Collaboration

    Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally limits their applicability to many real-world problems, especially in scientific domains where domain-specific foundation models have been developed to address specialized tasks beyond natural language. In this work, we introduce Eywa, a heterogeneous agentic framework designed to extend language-centric systems to a broader class of scientific foundation models. The key idea of Eywa is to augment domain-specific foundation models with a language-model-based reasoning interface, enabling language models to guide inference over non-linguistic data modalities. This design allows predictive foundation models, which are typically optimized for specialized data and tasks, to participate in higher-level reasoning and decision-making processes within agentic systems. Eywa can serve as a drop-in replacement for a single-agent pipeline (EywaAgent) or be integrated into existing multi-agent systems by replacing traditional agents with specialized agents (EywaMAS). We further investigate a planning-based orchestration framework in which a planner dynamically coordinates traditional agents and Eywa agents to solve complex tasks across heterogeneous data modalities (EywaOrchestra). We evaluate Eywa across a diverse set of scientific domains spanning physical, life, and social sciences. Experimental results demonstrate that Eywa improves performance on tasks involving structured and domain-specific data, while reducing reliance on language-based reasoning through effective collaboration with specialized foundation models.

  2. Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

    Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, long-horizon consistency, and causal understanding. We argue that the field should move beyond appearance synthesis toward intelligent visual generation: plausible visuals grounded in structure, dynamics, domain knowledge, and causal relations. To frame this shift, we introduce a five-level taxonomy: Atomic Generation, Conditional Generation, In-Context Generation, Agentic Generation, and World-Modeling Generation, progressing from passive renderers to interactive, agentic, world-aware generators. We analyze key technical drivers, including flow matching, unified understanding-and-generation models, improved visual representations, post-training, reward modeling, data curation, synthetic data distillation, and sampling acceleration. We further show that current evaluations often overestimate progress by emphasizing perceptual quality while missing structural, temporal, and causal failures. By combining benchmark review, in-the-wild stress tests, and expert-constrained case studies, this roadmap offers a capability-centered lens for understanding, evaluating, and advancing the next generation of intelligent visual generation systems.

  3. Co-Evolving Policy Distillation

    Reinforcement learning with verifiable rewards (RLVR) and on-policy distillation (OPD) have become standard paradigms for post-training. We provide a unified analysis of these two paradigms in consolidating multiple expert capabilities into a single model, identifying capability loss in different ways: mixed RLVR suffers from inter-capability divergence cost, while the pipeline of first training experts and then performing OPD, though avoiding divergence, fails to fully absorb teacher capabilities due to large behavioral pattern gaps between teacher and student. We propose Co-Evolving Policy Distillation (CoPD), which encourages parallel training of experts and introduces OPD during each expert's ongoing RLVR training rather than after complete expert training, with experts serving as mutual teachers (making OPD bidirectional) to co-evolve. This enables more consistent behavioral patterns among experts while maintaining sufficient complementary knowledge throughout. Experiments validate that CoPD achieves all-in-one integration of text, image, and video reasoning capabilities, significantly outperforming strong baselines such as mixed RLVR and MOPD, and even surpassing domain-specific experts. The model parallel training pattern offered by CoPD may inspire a novel training scaling paradigm.

  4. Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists

    Existing research infrastructure is fundamentally document-centric, providing citation links between papers but lacking explicit representations of methodological evolution. In particular, it does not capture the structured relationships that explain how and why research methods emerge, adapt, and build upon one another. With the rise of AI-driven research agents as a new class of consumers of scientific knowledge, this limitation becomes increasingly consequential, as such agents cannot reliably reconstruct method evolution topologies from unstructured text. We introduce Intern-Atlas, a methodological evolution graph that automatically identifies method-level entities, infers lineage relationships among methodologies, and captures the bottlenecks that drive transitions between successive innovations. Built from 1,030,314 papers spanning AI conferences, journals, and arXiv preprints, the resulting graph comprises 9,410,201 semantically typed edges, each grounded in verbatim source evidence, forming a queryable causal network of methodological development. To operationalize this structure, we further propose a self-guided temporal tree search algorithm for constructing evolution chains that trace the progression of methods over time. We evaluate the quality of the resulting graph against expert-curated ground-truth evolution chains and observe strong alignment. In addition, we demonstrate that Intern-Atlas enables downstream applications in idea evaluation and automated idea generation. We position methodological evolution graphs as a foundational data layer for the emerging field of automated scientific discovery.

  5. ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control

    Humanoid control systems have made significant progress in recent years, yet modeling fluent interaction-rich behavior between a robot, its surrounding environment, and task-relevant objects remains a fundamental challenge. This difficulty arises from the need to jointly capture spatial context, temporal dynamics, robot actions, and task intent at scale, which is a poor match to conventional supervision. We propose ExoActor, a novel framework that leverages the generalization capabilities of large-scale video generation models to address this problem. The key insight in ExoActor is to use third-person video generation as a unified interface for modeling interaction dynamics. Given a task instruction and scene context, ExoActor synthesizes plausible execution processes that implicitly encode coordinated interactions between robot, environment, and objects. Such video output is then transformed into executable humanoid behaviors through a pipeline that estimates human motion and executes it via a general motion controller, yielding a task-conditioned behavior sequence. To validate the proposed framework, we implement it as an end-to-end system and demonstrate its generalization to new scenarios without additional real-world data collection. Furthermore, we conclude by discussing limitations of the current implementation and outlining promising directions for future research, illustrating how ExoActor provides a scalable approach to modeling interaction-rich humanoid behaviors, potentially opening a new avenue for generative models to advance general-purpose humanoid intelligence.

  6. Efficient Training on Multiple Consumer GPUs with RoundPipe

    Fine-tuning Large Language Models (LLMs) on consumer-grade GPUs is highly cost-effective, yet constrained by limited GPU memory and slow PCIe interconnects. Pipeline parallelism (PP) combined with CPU offloading mitigates these hardware bottlenecks by reducing communication overhead. However, existing PP schedules suffer from an inherent limitation termed the weight binding issue. Binding uneven model stages (e.g., the LM head is large) to GPUs limits the pipeline's throughput to that of the GPU with the heaviest load, leading to severe pipeline bubbles. In this paper, we propose RoundPipe, a novel pipeline schedule that breaks the weight binding constraint on consumer GPU servers. RoundPipe treats GPUs as a pool of stateless execution workers and dynamically dispatches computation stages across devices in a round-robin manner, achieving a near-zero-bubble pipeline. To ensure training correctness and system efficiency, RoundPipe integrates a priority-aware transfer scheduling engine, a fine-grained distributed event-based synchronization protocol, and an automated layer partitioning algorithm. Evaluations on an 8x RTX 4090 server demonstrate that RoundPipe achieves 1.48-2.16x speedups over state-of-the-art baselines when fine-tuning 1.7B to 32B models. Remarkably, RoundPipe enables LoRA fine-tuning of the Qwen3-235B model with 31K sequence length on a single server. RoundPipe is publicly available as an open-source Python library with comprehensive documentation.

  7. Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

    LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces. Yet many agent benchmarks freeze a curated task set at release time and grade mainly the final response, making it difficult to evaluate agents against evolving workflow demand or verify whether a task was executed. We introduce Claw-Eval-Live, a live benchmark for workflow agents that separates a refreshable signal layer, updated across releases, from a reproducible, time-stamped release snapshot. Each release is constructed from public workflow-demand signals (ClawHub Top-500 skills in the current release) and materialized as controlled tasks with fixed fixtures, services, workspaces, and graders. For grading, Claw-Eval-Live records execution traces, audit logs, service state, and post-run workspace artifacts, using deterministic checks when evidence is sufficient and structured LLM judging only for semantic dimensions. The release contains 105 tasks spanning controlled business services and local workspace repair, and evaluates 13 frontier models under a shared public pass rule. Experiments reveal that reliable workflow automation remains far from solved: the leading model passes only 66.7% of tasks and no model reaches 70%. Failures are structured by task family and execution surface, with HR, management, and multi-system business workflows as persistent bottlenecks and local workspace repair comparatively easier but unsaturated. Leaderboard rank alone is insufficient because models with similar pass rates can diverge in overall completion, and task-level discrimination concentrates in a middle band of tasks. Claw-Eval-Live suggests that workflow-agent evaluation should be grounded twice: in fresh external demand and in verifiable agent action.

  8. Leveraging Verifier-Based Reinforcement Learning in Image Editing

    While Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm for text-to-image generation, its application to image editing remains largely unexplored. A key bottleneck is the lack of a robust general reward model for all editing tasks. Existing edit reward models usually give overall scores without detailed checks, ignoring different instruction requirements and causing biased rewards. To address this, we argue that the key is to move from a simple scorer to a reasoning verifier. We introduce Edit-R1, a framework that builds a chain-of-thought (CoT) verifier-based reasoning reward model (RRM) and then leverages it for downstream image editing. The Edit-RRM breaks instructions into distinct principles, evaluates the edited image against each principle, and aggregates these checks into an interpretable, fine-grained reward. To build such an RRM, we first apply supervised fine-tuning (SFT) as a "cold-start" to generate CoT reward trajectories. Then, we introduce Group Contrastive Preference Optimization (GCPO), a reinforcement learning algorithm that leverages human pairwise preference data to reinforce our pointwise RRM. After building the RRM, we use GRPO to train editing models with this non-differentiable yet powerful reward model. Extensive experiments demonstrate that our Edit-RRM surpasses powerful VLMs such as Seed-1.5-VL and Seed-1.6-VL as an editing-specific reward model, and we observe a clear scaling trend, with performance consistently improving from 3B to 7B parameters. Moreover, Edit-R1 delivers gains to editing models like FLUX.1-kontext, highlighting its effectiveness in enhancing image editing.

  9. Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

    The token serves as the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches lack fine-grained length modeling, operating primarily at the coarse-grained sequence level. We introduce the Length Value Model (LenVM), a token-level framework that models the remaining generation length. By formulating length modeling as a value estimation problem and assigning a constant negative reward to each generated token, LenVM predicts a bounded, discounted return that serves as a monotone proxy for the remaining generation horizon. This formulation yields supervision that is annotation-free, dense, unbiased, and scalable. Experiments on LLMs and VLMs demonstrate that LenVM provides a highly effective signal at inference time. On the LIFEBench exact length matching task, applying LenVM to a 7B model improves the length score from 30.9 to 64.8, significantly outperforming frontier closed-source models. Furthermore, LenVM enables continuous control over the trade-off between performance and efficiency. On GSM8K at a budget of 200 tokens, LenVM maintains 63% accuracy compared to 6% for the token-budget baseline. It also accurately predicts total generation length from the prompt boundary. Finally, LenVM's token-level values offer an interpretable view of generation dynamics, revealing how specific tokens shift reasoning toward shorter or longer regimes. Results demonstrate that LenVM supports a broad range of applications and that token length can be effectively modeled as a token-level value signal, highlighting the potential of LenVM as a general framework for length modeling and as a length-specific value signal that could support future RL training. Code is available at https://github.com/eric-ai-lab/Length-Value-Model.

  10. Representation Fréchet Loss for Visual Generation

    We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality. Under the Inception feature space, a one-step generator achieves 0.72 FID on ImageNet 256x256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite worse Inception FID. This motivates FDr^k, a multi-representation metric. We hope this work will encourage further exploration of distributional distances in diverse representation spaces as both training objectives and evaluation metrics for generative models.

  11. Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

    We introduce Nemotron 3 Nano Omni, the latest model in the Nemotron multimodal series and the first to natively support audio inputs alongside text, images, and video. Nemotron 3 Nano Omni delivers consistent accuracy improvements over its predecessor, Nemotron Nano V2 VL, across all modalities, enabled by advances in architecture, training data and recipes. In particular, Nemotron 3 delivers leading results in real-world document understanding, long audio-video comprehension, and agentic computer use. Built on the highly efficient Nemotron 3 Nano 30B-A3B backbone, Nemotron 3 Nano Omni further incorporates innovative multimodal token-reduction techniques to deliver substantially lower inference latency and higher throughput than other models of similar size. We are releasing model checkpoints in BF16, FP8, and FP4 formats, along with portions of the training data and codebase to facilitate further research and development.

  12. Synthetic Computers at Scale for Long-Horizon Productivity Simulation

    Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and organized through directory structures and content-rich artifacts. To scale synthetic data creation for such productivity scenarios, we introduce Synthetic Computers at Scale, a scalable methodology for creating such environments with realistic folder hierarchies and content-rich artifacts (e.g., documents, spreadsheets, and presentations). Conditioned on each synthetic computer, we run long-horizon simulations: one agent creates productivity objectives that are specific to the computer's user and require multiple professional deliverables and about a month of human work; another agent then acts as that user and keeps working across the computer (for example, navigating the filesystem for grounding, coordinating with simulated collaborators, and producing professional artifacts) until these objectives are completed. In preliminary experiments, we create 1,000 synthetic computers and run long-horizon simulations on them; each run requires over 8 hours of agent runtime and spans more than 2,000 turns on average. These simulations produce rich experiential learning signals, whose effectiveness is validated by significant improvements in agent performance on both in-domain and out-of-domain productivity evaluations. Given that personas are abundant at billion scale, this methodology can in principle scale to millions or even billions of synthetic user worlds with sufficient compute, enabling broader coverage of diverse professions, roles, contexts, environments, and productivity needs. We argue that scalable synthetic computer creation, together with at-scale simulations, is highly promising as a foundational substrate for agent self-improvement and agentic reinforcement learning in long-horizon productivity scenarios.

  13. Step-level Optimization for Efficient Computer-use Agents

    Computer-use agents provide a promising path toward general software automation because they can interact directly with arbitrary graphical user interfaces instead of relying on brittle, application-specific integrations. Despite recent advances in benchmark performance, strong computer-use agents remain expensive and slow in practice, since most systems invoke large multimodal models at nearly every interaction step. We argue that this uniform allocation of compute is fundamentally inefficient for long-horizon GUI tasks. Such trajectories are highly heterogeneous: many steps are routine and can be handled reliably by smaller, cheaper policies, while errors tend to concentrate at a relatively small number of high-risk moments. Across computer-use benchmarks, these failures repeatedly take two forms: progress stalls, where the agent loops, repeats ineffective actions, or fails to make meaningful progress, and silent semantic drift, where the agent continues taking locally plausible actions after already deviating from the user's true goal. To address this inefficiency, we propose an event-driven, step-level cascade for computer-use agents that runs a small policy by default and escalates to a stronger model only when lightweight learned monitors detect elevated risk. Our framework combines two complementary signals: a Stuck Monitor that detects degraded progress from recent reasoning-action history and triggers recovery, and a Milestone Monitor that identifies semantically meaningful checkpoints where sparse verification is most informative for catching drift. This design turns always-on frontier-model inference into adaptive, on-demand compute allocation over the course of an evolving interaction. The framework is modular and deployment-oriented: it can be layered on top of existing computer-use agents without changing the underlying agent architecture or retraining the large model.

  14. The Last Human-Written Paper: Agent-Native Research Artifacts

    Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way. This compilation imposes two structural costs: a Storytelling Tax, where failed experiments, rejected hypotheses, and the branching exploration process are discarded to fit a linear narrative; and an Engineering Tax, where the gap between reviewer-sufficient prose and agent-sufficient specification leaves critical implementation details unwritten. Tolerable for human readers, these costs become critical when AI agents must understand, reproduce, and extend published work. We introduce the Agent-Native Research Artifact (ARA), a protocol that replaces the narrative paper with a machine-executable research package structured around four layers: scientific logic, executable code with full specifications, an exploration graph that preserves the failures compilation discards, and evidence grounding every claim in raw outputs. Three mechanisms support the ecosystem: a Live Research Manager that captures decisions and dead ends during ordinary development; an ARA Compiler that translates legacy PDFs and repos into ARAs; and an ARA-native review system that automates objective checks so human reviewers can focus on significance, novelty, and taste. On PaperBench and RE-Bench, ARA raises question-answering accuracy from 72.4% to 93.7% and reproduction success from 57.4% to 64.4%. On RE-Bench's five open-ended extension tasks, preserved failure traces in ARA accelerate progress, but can also constrain a capable agent from stepping outside the prior-run box depending on the agent's capabilities.

  15. InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?

    With the advancement of multimodal large language models (MLLMs) and coding agents, website development has shifted from manual programming to agent-based project-level code synthesis. Existing benchmarks rely on idealized assumptions, especially well-structured, information-rich inputs and static execution settings. In contrast, real-world development is constrained by a critical bottleneck: the semantic misalignment between ambiguous, low-quality instructions from non-expert users and model understanding, which results in a failure mode that we term blind execution. To address this gap, we introduce InteractWeb-Bench, the first multimodal interactive benchmark for website generation under non-expert low-code user conditions. InteractWeb-Bench introduces four types of user agents and persona-driven instruction perturbations to systematically simulate diverse user behaviors, including ambiguity, redundancy, and contradiction, grounded in requirement engineering defect taxonomies. We develop an interactive execution environment for agents, featuring a unified action space comprising Clarify, Implement, Verify, and Submit, enabling iterative intent refinement, code synthesis, and visual feedback-based validation. Extensive experiments and analysis reveal that frontier MLLM-based agents remain trapped in blind execution, exposing limitations in intent recognition and adaptive interaction.

Techmeme (15)

  1. A look at Atlassian and Twilio earnings beats, with early signs of Atlassian's AI response success and Twilio becoming a picks-and-shovels layer for AI agents (Jason Lemkin/SaaStr)

    So this week two of the more important bellwether names in B2B software reported earnings. And neither of them just “beat.”

  2. Analysis: Asian suppliers account for ~90% of Nvidia's production costs, up from 65% in 2025, as latest wave of collaborations shifts from chips to physical AI (Abhishek Vishnoi/Bloomberg)

    The list of Asian stocks that benefit from business partnership with Nvidia Corp. is getting longer, as the region further integrates …

  3. JLL: Japan's $23B data center market is set to grow ~50% by 2030, with 90% of sites concentrated in densely populated regions, prompting pushback from residents (Financial Times)

    Japan is getting ready for a huge surge in AI facilities — and complaints from nearby residents

  4. How Amazon's expansion into fashion helped Jeff Bezos enter fashion's inner circle, as he and Lauren Sánchez Bezos become underwriters for this year's Met Gala (Chavie Lieber/Wall Street Journal)

    The Amazon founder and Lauren Sánchez Bezos have become front-row fixtures through business expansion and charitable giving

  5. Nintendo's share price has fallen by ~45% since August 2025, as rising memory chip costs drive investor concerns over profit margins for the Switch 2 (David Keohane/Financial Times)

    Higher memory chip costs fuel fears of price rise for Switch 2 and cast shadow over console's success — The latest Super Mario movie released …

  6. A profile of BlackBerry's QNX division, whose operating system controls safety features in 275M cars and accounts for half of BlackBerry's revenue (Ben Cohen/Wall Street Journal)

    John Wall has spent nearly his entire career working for the same company. And when he tells people where he works, nobody has any clue what he's talking about.

  7. An evaluation by NIST's CAISI says DeepSeek V4 Pro lags behind leading US AI models by about eight months and is the most capable Chinese AI model to date (NIST)

    In April 2026, the Center for AI Standards and Innovation (CAISI) evaluated the open-weight AI model DeepSeek V4 Pro ("DeepSeek V4").

  8. A slew of top Boston Dynamics execs have left the Hyundai-owned company in recent months, as sources say it faces pressure to speed the delivery of humanoids (Rachyl Jones/Semafor)

    In recent months, a slew of top executives at Boston Dynamics have left the company, which Hyundai bought a majority stake in back in 2021.

  9. Sources: OpenAI employees have raised alarms internally over failures to alert law enforcement when users describe plans for real-world violence to ChatGPT (Georgia Wells/Wall Street Journal)

    OpenAI's chatbot dispenses advice on weapons and role-plays mass shootings. The carnage is raising scrutiny on when and how companies intervene.

  10. Palo Alto Networks agrees to acquire Portkey, which develops AI gateway tech to manage and secure AI agents; sources say the deal values Portkey at $120M-$140M (The Economic Times)

    The Economic Times : Palo Alto Networks agrees to acquire Portkey, which develops AI gateway tech to manage and secure AI agents; sources say the deal values Portkey at $120M-$140M —  Cybersecurity giant Palo Alto Networks is acquiring AI infrastructure startup Portkey to bolster its defences for autonomous AI systems.

  11. Amadeus IT Group, which operates the world's largest travel booking system, plans to acquire French biometrics company Idemia Public Security, for €1.2B in cash (Javi West Larrañaga/Reuters)

    Javi West Larrañaga / Reuters : Amadeus IT Group, which operates the world's largest travel booking system, plans to acquire French biometrics company Idemia Public Security, for €1.2B in cash —  Spanish travel technology firm Amadeus (AMA.MC) on Wednesday announced a plan to acquire French biometrics company Idemia Public Security …

  12. Ask.com shutters, as its owner IAC "continues to sharpen its focus"; a dot-com era icon, Ask Jeeves launched in 1997, a year before Google (Chase DiBenedetto/Mashable)

    Chase DiBenedetto / Mashable : Ask.com shutters, as its owner IAC “continues to sharpen its focus”; a dot-com era icon, Ask Jeeves launched in 1997, a year before Google —  “As IAC continues to sharpen its focus, we have made the decision to discontinue our search business, which includes Ask.com.

  13. Maryland becomes the first US state to ban surveillance pricing in grocery stores, as other states including CO, CA, MA, IL, and NJ consider similar bills (Sanya Mansoor/The Guardian)

    Sanya Mansoor / The Guardian : Maryland becomes the first US state to ban surveillance pricing in grocery stores, as other states including CO, CA, MA, IL, and NJ consider similar bills —  Critics say Maryland's new law banning rapidly changing product prices based on consumer data is full of carveouts

  14. Analysis: after Trump's World Liberty raised $550M from investors, tokens worth hundreds of millions in USD were privately sold in "white glove" transactions (Olga Kharif/Bloomberg)

    Olga Kharif / Bloomberg : Analysis: after Trump's World Liberty raised $550M from investors, tokens worth hundreds of millions in USD were privately sold in “white glove” transactions —  The pitch was straightforward: Invest in the cryptocurrency venture of Donald Trump and his family …

  15. Investigation: Nobitex was founded by two brothers from Iran's elite Kharrazi family; the crypto exchange processed hundreds of millions beyond US sanctions (Reuters)

    Reuters : Investigation: Nobitex was founded by two brothers from Iran's elite Kharrazi family; the crypto exchange processed hundreds of millions beyond US sanctions —  Two brothers from the elite Kharrazi family, using an alternative surname, started up Nobitex in 2018.

Solidot(11)

  1. VS Code inserts a Co-Authored-by Copilot trailer into commits by default

    Microsoft's VS Code editor was found to insert a Co-Authored-by Copilot trailer into commits by default, regardless of whether the user had actually used its AI assistant Copilot. The behavior once again drew heavy criticism from users. Microsoft developers responded that the default-on behavior will be fixed in the next release, saying that if a user did not use the AI assistant, the code should not be described as co-authored by Copilot.
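    For context, Co-authored-by is an ordinary git commit-message trailer: a Key: Value line in the message's final paragraph, which GitHub parses to credit additional authors. A minimal shell sketch of the trailer VS Code was appending (the name and email below are hypothetical placeholders, not the exact identity VS Code uses):

    ```shell
    # Build a commit message whose final block carries a co-author trailer.
    # git (and GitHub) treat "Key: Value" lines in the last paragraph as trailers.
    printf 'Fix null check in parser\n\nCo-authored-by: Copilot <copilot@example.com>\n' > msg.txt
    cat msg.txt
    ```

    Committing with `git commit -F msg.txt` would attach the trailer; the complaint was that VS Code injected such a line even when Copilot played no part in the change.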

  2. China's green-technology exports grew 70% in March

    Amid the latest energy crisis triggered by the blockade of the Strait of Hormuz, countries worldwide are accelerating their transition to clean energy. China, the largest exporter of green technology, saw its combined March exports of solar products, batteries, and electric vehicles rise 70% year over year: exported solar capacity reached 68 GW, battery exports hit $10 billion, and exports of electric and hybrid vehicles grew 140% year over year. As many as 50 countries imported record amounts of solar equipment from China.

  3. Linux reaches 4.52% of Steam users

    In March 2026 the share of Steam players on Linux hit an unprecedented 5.33%, more than double the previous month. According to Valve's April 2026 Steam hardware and software survey, the Linux share has fallen back to 4.52%, down 0.81 percentage points, but still roughly double the figure from a year earlier. Windows rose to 93.47% and macOS stood at 2.01%. There is ample evidence that gaming on Linux has improved dramatically, and one notable trait of Linux gaming is that it needs fewer resources than Windows, which is especially attractive while memory prices are soaring. Other figures: Simplified Chinese users account for 23.41% and English users 36.77%; 55.81% of users run Intel CPUs and 44.18% AMD, almost unchanged from the previous month.

  4. UK NHS prepares to close all its open-source repositories, citing AI

    Last month the scheduling platform Cal.com announced it was going from open source to closed source, arguing that AI tools make it easier to find vulnerabilities in open code, that security depends on obscurity, and that closing the source therefore improves security. Now the UK's National Health Service (NHS) is preparing to close nearly all of its open-source repositories for the same reason, a decision that has drawn broad controversy and criticism. Critics point out that most of the repositories the NHS publishes are datasets, internal tools, guidelines, research tools, and front-end designs, none of which are affected by advances in security-scanning techniques. Moreover, whether code is open source makes no difference to AI tools like Anthropic Mythos, which can also analyze binaries for vulnerabilities. Critics have published an open letter calling on the NHS to keep its code public.

  5. Hangzhou court rules that layoffs justified by replacing humans with AI are unlawful

    The Hangzhou Intermediate People's Court published a ruling in a case about "AI taking over from human employees," holding that dismissing an employee because "AI costs less than labor" is unlawful and ordering the company involved to pay 260,000 yuan in compensation. In this case, Xiao Zhou (a pseudonym), now 35, joined a Hangzhou tech company in 2022 as a "quality inspector" for a large AI model, responsible for judging the correctness of answers the model produced in interactions with users. In 2025 the company, arguing that "the AI model has been upgraded, and the inspection work that used to require humans can now be done by the AI itself," tried to transfer and demote Xiao Zhou: from supervisor to ordinary employee, with monthly pay cut from 25,000 yuan to 15,000 yuan. Xiao Zhou refused the arrangement and was then terminated. Xiao Zhou filed for labor arbitration, and the arbitration panel ruled that the company should pay over 260,000 yuan in compensation for unlawful termination. The company disagreed and took the case to court. The Hangzhou Intermediate People's Court found that the termination was not driven by negative factors such as discontinuing a business line, poor operations, or cutting losses, but by AI's cost advantage, which does not qualify as the "major change in objective circumstances" that renders a labor contract impossible to perform. Moreover, the transfer-and-demotion plan the company had offered amounted to a sharp drop in compensation and was not a reasonable, negotiated arrangement. The court therefore found the termination unlawful, upheld the arbitration result, and ordered the company to compensate Xiao Zhou at the 2N standard. Ding Ye, presiding judge of the court's Fifth Civil Division, told the media that from a company's perspective, using AI to raise efficiency and cut costs is an inevitable choice in market competition; from a worker's perspective, losing a job or taking a pay cut because of technological change means the company is shifting the ordinary risks of technological iteration onto its workers.

  6. People can communicate and learn while dreaming

    Many people have had the experience of finding inspiration in a dream, a phenomenon that prompted scientists to study sleep learning. In 1954, Charles W. Simon and William H. Emmons argued that the participants in most sleep-learning studies were in fact awake, rendering such research meaningless. They classed sleep learning with science fiction and pseudoscience, and for decades afterward few researchers touched the topic. In recent years, however, scientists have taken it up again. The new research focuses mainly on lucid dreamers: people who remain conscious during sleep and are aware that they are dreaming. In a study published in Neuroscience of Consciousness, 20 lucid dreamers in a laboratory attempted to solve puzzles in their dreams. Each puzzle was paired with a specific sound intended to cue them to resume working on the corresponding puzzle. Back in the lab, participants solved 42% of the puzzles that had appeared in their dreams, versus only 17% of the puzzles that had not. Most people do not have lucid dreams, so the study's subjects are not representative. The researchers suggest one explanation: while asleep, we are more likely to link unrelated stimuli. They do not recommend disrupting sleep for the sake of sleep learning, since sleep is a vital physiological process and interfering with it may cost more than it gains.

  7. Ask.com shuts down

    The 30-year-old search engine Ask.com shut down on May 1, 2026. Founded in June 1996 as AskJeeves.com, it dropped the Jeeves name in 2006 and became Ask.com, a search engine with its own crawler and algorithms. In 2010, facing competition from the major search engines, it outsourced its web-search technology and reverted to being a question-and-answer site. Although Ask.com has closed, AskJeeves.com continues to operate. "Jeeves" means a valet; the name comes from British author P. G. Wodehouse's Jeeves stories, in which Jeeves is the valet of the gentleman Bertie Wooster.

  8. Why OpenAI's system prompt specifically restricts goblins

    OpenAI's Codex CLI system prompt includes a rule specifically restricting words like "goblins": "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query". OpenAI's official explanation is that starting with GPT-5.1, its models began invoking words like "goblin" in metaphors far more often: goblin usage in ChatGPT rose 175% and gremlin usage rose 52%. An investigation found the cause: the Nerdy personality had inadvertently rewarded such metaphors, spreading the high-frequency goblin behavior. To fix the problem, OpenAI retired the Nerdy personality, removed the goblin-friendly reward signal, and filtered the relevant examples out of the training data to keep the behavior from reappearing inappropriately.

  9. Switzerland to vote in June on capping its population at ten million

    Switzerland will hold a referendum on June 14 on whether to cap its permanent resident population at ten million by 2050. The Swiss birth rate is 1.29 children per woman, far below the replacement rate of 2.1, so the country's population growth is driven mainly by immigration. Switzerland's population has already passed nine million, and official figures show that foreign citizens made up more than 27% of the total in 2024. The proposal, backed by the right-wing Swiss People's Party, demands that "Switzerland's permanent resident population must not exceed 10 million before 2050, and Switzerland should abandon its free-movement agreement with the EU." In the latest poll of 16,176 Swiss respondents, 52% supported or leaned toward supporting the proposal, 46% opposed it, and the rest expressed no position.

  10. Copy Fail: kernel root privilege-escalation vulnerability disclosed

    The Xint Code team reported a kernel root privilege-escalation vulnerability dubbed Copy Fail. The flaw is very easy to exploit and affects almost every kernel version since 2017. The kernel security team's failure to notify distributions ahead of disclosure has also drawn controversy. The kernel does not mark the corrupted pages for writeback, so the file's contents on disk stay unchanged while the page cache in memory has been tampered with. Because the system reads from the page cache when a file is accessed, the corrupted data immediately affects the whole system. A local unprivileged user can gain root by corrupting the page cache of a setuid binary, and because the page cache is shared between the host and containers, an attacker can exploit the flaw across container boundaries. The vulnerability affects nearly all distributions; the major ones have released or are preparing patches.

  11. Mozilla opposes Chrome's Prompt API

    In 2025 Google Chrome proposed the Prompt API, a unified JavaScript API for a local model integrated into the browser (the model must be downloaded before use). Google also intends to make the API a W3C standard. The large model integrated into desktop Chrome is Gemini Nano; using it requires a local device with at least 4 GB of VRAM, 16 GB of RAM, and at least 22 GB of free space on the drive hosting the browser. Mozilla developers have published a statement opposing Chrome's Prompt API. They argue the API has enormous interoperability problems: every model has its own quirks, so system prompts must be tuned to a specific model, and a tweak made for one model may be an overcorrection for another. To achieve interoperability, Mozilla and Apple might be forced to license Google's model or to ship a model compatible with the quirks of Google's. Another major problem is the lack of model neutrality.
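    A hedged sketch of what calling such an API from page script might look like, written defensively around the interoperability gap Mozilla describes. The global `LanguageModel` name and its `availability()`/`create()` methods follow Chrome's public explainer as currently drafted, but the API is experimental and unstandardized, so all of these names should be treated as assumptions:

    ```typescript
    // Assumed shape of Chrome's experimental Prompt API (not standardized).
    interface PromptSession {
      prompt(input: string): Promise<string>;
    }
    interface LanguageModelStatic {
      availability(): Promise<string>; // e.g. "available" once the model is downloaded
      create(): Promise<PromptSession>;
    }

    // Feature-detect so pages degrade gracefully in browsers that do not
    // ship the API (Mozilla's objection in the article): return null and
    // let the caller fall back to a server-side model instead.
    async function askLocalModel(question: string): Promise<string | null> {
      const lm = (globalThis as { LanguageModel?: LanguageModelStatic }).LanguageModel;
      if (!lm) return null;                         // API absent: not Chrome
      if ((await lm.availability()) !== "available") return null; // model not ready
      const session = await lm.create();
      return session.prompt(question);
    }
    ```

    In Firefox, Safari, or Node, `LanguageModel` is simply undefined, so the function resolves to `null`; this is exactly the kind of per-engine branching Mozilla argues a web standard should not require.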