DIGEST · 2026-05-07

OrangeBot.AI Digest — 2026-05-07

88 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. Building for the Future (blog.cloudflare.com)
  2. Dirtyfrag: Universal Linux LPE (www.openwall.com)
  3. AI slop is killing online communities (rmoff.net)
  4. DeepSeek 4 Flash local inference engine for Metal (github.com)
  5. Agents need control flow, not more prompts (bsuh.bearblog.dev)
  6. Chrome removes claim of On-device AI not sending data to Google Servers (old.reddit.com)
  7. Motherboard sales 'collapse' amid unprecedented shortages fueled by AI (www.tomshardware.com)
  8. I want to live like Costco people (tastecooking.com)
  9. AlphaEvolve: Gemini-powered coding agent scaling impact across fields (deepmind.google)
  10. Child marriages plunged when girls stayed in school in Nigeria (www.nature.com)
  11. The Burning Man MOOP Map (www.not-ship.com)
  12. Grand Theft Oil Futures: Insider traders keep making a killing at our expense (paulkrugman.substack.com)
  13. LinkedIn profile visitor lists belong to the people, says Noyb (www.theregister.com)
  14. Boris Cherny: TI-83 Plus Basic Programming Tutorial (2004) (www.ticalc.org)
  15. Chevrolet Performance eCrate package (400v/200hp) (www.chevrolet.com)

GitHub Trending (13)

  1. anthropics / financial-services
  2. Hmbown / DeepSeek-TUI
  3. z-lab / dflash
  4. InsForge / InsForge
  5. LearningCircuit / local-deep-research
  6. addyosmani / agent-skills
  7. VectifyAI / PageIndex
  8. vercel-labs / open-agents
  9. docusealco / docuseal
  10. decolua / 9router
  11. PriorLabs / TabPFN
  12. aaif-goose / goose
  13. Augani / openreel-video

Product Hunt (15)

  1. DevPass by LLM Gateway

    One key to access every coding model at 3 flat prices

  2. Memory Tags

    Scan text to make flashcards and improve your memory

  3. Saydi

    Live voice translation to hear the conversation instantly

  4. GetThis

    Turn voice, text, or screenshots into tasks.

  5. Forge

    A complete React toolkit made for AI

  6. Askmeety

    The best meeting notes you never wrote and 100% on your Mac

  7. Lingo.dev v1

    Localization engineering platform for consistent translation

  8. LikeTony.ai

    Make your landing page sound like Elon, Jobs or Yoda

  9. Phrony

    Ship AI agents without the operational burden

  10. Contextual Moderation for Chat

    AI-powered moderation for safer chat experiences

  11. GPT‑5.5 Instant

    Smarter, more personal answers as ChatGPT's new default

  12. reMarkable Paper Pure

    The reMarkable 2 successor goes back to basics

  13. ExploreYC

    Your data layer for Y Combinator's startup ecosystem

  14. FlowMarket

    A social network of AI agents generating B2B deals

  15. Google Pomelli Catalog

    Turn a product catalog into branded campaign assets

Hugging Face (15)

  1. Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation

    Distillation-based acceleration has become foundational for making autoregressive streaming video diffusion models practical, with distribution matching distillation (DMD) as the de facto choice. Existing methods, however, train the student to match the teacher's output indiscriminately, treating every rollout, frame, and pixel as equally reliable supervision. We argue that this caps distilled quality, since it overlooks two complementary axes of variance in DMD supervision: Inter-Reliability across student rollouts whose supervision varies in reliability, and Intra-Perplexity across spatial regions and temporal frames that contribute unequally to where quality can still be improved. The objective thus conflates two questions under a uniform weight: whether to learn from each rollout, and where to concentrate optimization within it. To address this, we propose Stream-R1, a Reliability-Perplexity Aware Reward Distillation framework that adaptively reweights the distillation objective at both rollout and spatiotemporal-element levels through a single shared reward-guided mechanism. At the Inter-Reliability level, Stream-R1 rescales each rollout's loss by an exponential of a pretrained video reward score, so that rollouts with reliable supervision dominate optimization. At the Intra-Perplexity level, it back-propagates the same reward model to extract per-pixel gradient saliency, which is factored into spatial and temporal weights that concentrate optimization pressure on regions and frames where refinement yields the largest expected gain. An adaptive balancing mechanism prevents any single quality axis from dominating across visual quality, motion quality, and text alignment. Stream-R1 attains consistent improvements on all three dimensions over distillation baselines on standard streaming video generation benchmarks, without architectural modification or additional inference cost.
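
As a rough illustration of the Inter-Reliability reweighting the abstract describes, a per-rollout distillation loss can be rescaled by an exponential of a reward score so that reliable rollouts dominate optimization. The function name, the `beta` temperature, and the batch normalization below are illustrative assumptions, not the paper's exact formulation:

```python
import math

def reward_weighted_loss(losses, rewards, beta=1.0):
    """Rescale each rollout's distillation loss by exp(beta * reward),
    normalized over the batch, so rollouts with reliable supervision
    carry more weight. `beta` (hypothetical knob) sharpens the weighting."""
    weights = [math.exp(beta * r) for r in rewards]
    total = sum(weights)
    weights = [w / total for w in weights]   # normalize across rollouts
    return sum(w * l for w, l in zip(weights, losses))
```

With equal rewards this reduces to a plain batch mean; a rollout with a much higher reward score effectively dominates the objective.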

  2. Stream-T1: Test-Time Scaling for Streaming Video Generation

    While Test-Time Scaling (TTS) offers a promising direction to enhance video generation without the surging costs of training, current test-time video generation methods based on diffusion models suffer from exorbitant candidate exploration costs and lack temporal guidance. To address these structural bottlenecks, we propose shifting the focus to streaming video generation. We identify that its chunk-level synthesis and few denoising steps are intrinsically suited for TTS, significantly lowering computational overhead while enabling fine-grained temporal control. Driven by this insight, we introduce Stream-T1, a pioneering comprehensive TTS framework exclusively tailored for streaming video generation. Specifically, Stream-T1 is composed of three units: (1) Stream-Scaled Noise Propagation, which actively refines the initial latent noise of the generating chunk using historically proven, high-quality previous chunk noise, effectively establishing temporal dependency and utilizing the historical Gaussian prior to guide the current generation; (2) Stream-Scaled Reward Pruning, which comprehensively evaluates generated candidates to strike an optimal balance between local spatial aesthetics and global temporal coherence by integrating immediate short-term assessments with sliding-window-based long-term evaluations; (3) Stream-Scaled Memory Sinking, which dynamically routes the context evicted from the KV-cache into distinct updating pathways guided by reward feedback, ensuring that previously generated visual information effectively anchors and guides the subsequent video stream. Evaluated on both 5s and 30s comprehensive video benchmarks, Stream-T1 demonstrates profound superiority, significantly improving temporal consistency, motion smoothness, and frame-level visual quality.

  3. RLDX-1 Technical Report

    While Vision-Language-Action models (VLAs) have shown remarkable progress toward human-like generalist robotic policies through the versatile intelligence (i.e. broad scene understanding and language-conditioned generalization) inherited from pre-trained Vision-Language Models, they still struggle with complex real-world tasks requiring broader functional capabilities (e.g. motion awareness, memory-aware decision making, and physical sensing). To address this, we introduce RLDX-1, a general-purpose robotic policy for dexterous manipulation built on the Multi-Stream Action Transformer (MSAT), an architecture that unifies these capabilities by integrating heterogeneous modalities through modality-specific streams with cross-modal joint self-attention. RLDX-1 further combines this architecture with system-level design choices, including synthesizing training data for rare manipulation scenarios, learning procedures specialized for human-like manipulation, and inference optimizations for real-time deployment. Through empirical evaluation, we show that RLDX-1 consistently outperforms recent frontier VLAs (e.g. π_{0.5} and GR00T N1.6) across both simulation benchmarks and real-world tasks that require broad functional capabilities beyond general versatility. In particular, RLDX-1 shows superiority in ALLEX humanoid tasks by achieving success rates of 86.8% while π_{0.5} and GR00T N1.6 achieve around 40%, highlighting the ability of RLDX-1 to control a high-DoF humanoid robot under diverse functional demands. Together, these results position RLDX-1 as a promising step toward reliable VLAs for complex, contact-rich, and dynamic real-world dexterous manipulation.

  4. OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

    Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the absence of open high-quality training data, transparent trajectory synthesis pipelines, or detailed training recipes. To this end, we introduce OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning. First, we curate a dedicated pipeline to construct high-quality training data through Wikipedia path sampling, fuzzy entity rewriting, and source-anchor visual grounding, which jointly reduce shortcuts and one-step retrieval collapse. Based on this pipeline, we curate two training datasets, SearchVL-SFT-36k for SFT and SearchVL-RL-8k for RL. In addition, we design a diverse tool environment that unifies text search, image search, OCR, cropping, sharpening, super-resolution, and perspective correction, enabling agents to combine active perception with external knowledge acquisition. Finally, we propose a multi-turn fatal-aware GRPO training algorithm that handles cascading tool failures by masking post-failure tokens while preserving useful pre-failure reasoning through one-sided advantage clamping. Built on this recipe, OpenSearch-VL delivers substantial performance gains, with over 10-point average improvements across seven benchmarks, and achieves results comparable to proprietary commercial models on several tasks. We will release all data, code, and models to support open research on multimodal deep search agents.

  5. HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

    Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predominantly focus on future scene generation, often overlooking comprehensive 3D scene understanding. Conversely, while Large Language Models (LLMs) demonstrate impressive reasoning capabilities, they lack the capacity to predict future geometric evolution, creating a significant disparity between semantic interpretation and physical simulation. To bridge this gap, we propose HERMES++, a unified driving world model that integrates 3D scene understanding and future geometry prediction within a single framework. Our approach addresses the distinct requirements of these tasks through synergistic designs. First, a BEV representation consolidates multi-view spatial information into a structure compatible with LLMs. Second, we introduce LLM-enhanced world queries to facilitate knowledge transfer from the understanding branch. Third, a Current-to-Future Link is designed to bridge the temporal gap, conditioning geometric evolution on semantic context. Finally, to enforce structural integrity, we employ a Joint Geometric Optimization strategy that integrates explicit geometric constraints with implicit latent regularization to align internal representations with geometry-aware priors. Extensive evaluations on multiple benchmarks validate the effectiveness of our method. HERMES++ achieves strong performance, outperforming specialist approaches in both future point cloud prediction and 3D scene understanding tasks. The model and code will be publicly released at https://github.com/H-EmbodVis/HERMESV2.

  6. PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World

    Synthesizing physics-grounded 3D assets is a critical bottleneck for interactive virtual worlds and embodied AI. Existing methods predominantly focus on static geometry, overlooking the functional properties essential for interaction. We propose that interactive asset generation must be rooted in functional logic and hierarchical physics. To bridge this gap, we introduce PhysForge, a decoupled two-stage framework supported by PhysDB, a large-scale dataset of 150,000 assets with four-tier physical annotations. First, a VLM acts as a "physical architect" to plan a "Hierarchical Physical Blueprint" defining material, functional, and kinematic constraints. Second, a physics-grounded diffusion model realizes this blueprint by synthesizing high-fidelity geometry alongside precise kinematic parameters via a novel KineVoxel Injection (KVI) mechanism. Experiments demonstrate that PhysForge produces functionally plausible, simulation-ready assets, providing a robust data engine for interactive 3D content and embodied agents.

  7. Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems

    Reasoning-intensive retrieval aims to surface evidence that supports downstream reasoning rather than merely matching topical similarity. This capability is increasingly important for agentic search systems, where retrievers must provide complementary evidence across iterative search and synthesis. However, existing work remains limited on both evaluation and training: benchmarks such as BRIGHT provide narrow gold sets and evaluate retrievers in isolation, while synthetic training corpora often optimize single-passage relevance rather than evidence portfolio construction. We introduce BRIGHT-Pro, an expert-annotated benchmark that expands each query with multi-aspect gold evidence and evaluates retrievers under both static and agentic search protocols. We further construct RTriever-Synth, an aspect-decomposed synthetic corpus that generates complementary positives and positive-conditioned hard negatives, and use it to LoRA fine-tune RTriever-4B from Qwen3-Embedding-4B. Experiments across lexical, general-purpose, and reasoning-intensive retrievers show that aspect-aware and agentic evaluation expose behaviors hidden by standard metrics, while RTriever-4B substantially improves over its base model.

  8. D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

    The landscape of high-performance image generation models is currently shifting from inefficient multi-step models to efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models present significant challenges for continued supervised fine-tuning: applying the commonly used fine-tuning techniques directly would compromise their inherent few-step inference capability. To address this, we propose D-OPSD, a novel training paradigm for step-distilled diffusion models that enables on-policy learning during supervised fine-tuning. We first find that a modern diffusion model whose encoder is an LLM/VLM can inherit that encoder's in-context capabilities. This enables us to cast training as an on-policy self-distillation process. Specifically, during training, the model acts as both teacher and student under different contexts: the student is conditioned only on the text feature, while the teacher is conditioned on the multimodal feature of both the text prompt and the target image. Training minimizes the divergence between the two predicted distributions over the student's own roll-outs. By optimizing on the model's own trajectory under its own supervision, D-OPSD enables the model to learn new concepts, styles, etc. without sacrificing the original few-step capacity.
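
The teacher/student trick the abstract describes (one set of weights, two conditioning contexts) can be sketched with a toy stand-in for the denoiser; the names `predict`, `opsd_loss`, and the linear model are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))  # stand-in for the shared model weights

def predict(noise, cond):
    """Toy stand-in for one denoising step conditioned on a context vector."""
    return noise @ W + cond

def opsd_loss(noise, text_cond, mm_cond):
    """The same network plays student (text-only context) and teacher
    (text + target-image context); the loss pulls the student's prediction
    toward the teacher's on the student's own rollout."""
    student = predict(noise, text_cond)
    teacher = predict(noise, mm_cond)   # treated as fixed supervision
    return float(np.mean((student - teacher) ** 2))
```

Because both roles share weights, the loss is zero whenever the two contexts coincide, and gradients only flow from the gap the richer (multimodal) context opens up.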

  9. Lightning Unified Video Editing via In-Context Sparse Attention

    Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention costs create a critical computational bottleneck. In this work, we propose In-context Sparse Attention (ISA), the first near-lossless empirical sparse framework tailored for ICL video editing. Our design is grounded in two key insights: first, context tokens exhibit significantly lower saliency than source tokens; second, we theoretically prove and empirically validate that query sharpness correlates with approximation error. Motivated by these findings, ISA implements an efficient pre-selection strategy to prune redundant context, followed by a dynamic query grouping mechanism that routes high-error queries to full attention and low-error ones to a computationally efficient 0-th order Taylor sparse attention. Furthermore, we build LIVEditor, a novel lightning video editing model built on ISA and a proposed video-editing data pipeline that curates a 1.7M-sample high-quality dataset. Extensive experiments demonstrate that LIVEditor achieves a ~60% reduction in attention-module latency while surpassing state-of-the-art methods across EditVerseBench, IVE-Bench, and VIE-Bench, delivering near-lossless acceleration without compromising visual fidelity.

  10. Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

    We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. JoyAI-Image couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), allowing perception and generation to interact through a shared multimodal interface. Around this architecture, we build a scalable training recipe that combines unified instruction tuning, long-text rendering supervision, spatially grounded data, and both general and spatial editing signals. This design gives the model broad multimodal capability while strengthening geometry-aware reasoning and controllable visual synthesis. Experiments across understanding, generation, long-text rendering, and editing benchmarks show that JoyAI-Image achieves state-of-the-art or highly competitive performance. More importantly, the bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables the model to move beyond general visual competence toward stronger spatial intelligence. These results suggest a promising path for unified visual models in downstream applications such as vision-language-action systems and world models.

  11. MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

    Recent progress in multimodal large language models (MLLMs) has brought AI capabilities from static offline data processing to real-time streaming interaction, yet they still remain far from human-level multimodal interaction. The key bottlenecks are no longer modality coverage or latency alone, but the interaction paradigm itself. First, perception and response are still separated into alternating phases, preventing models from incorporating new inputs for timely adjustment during generation. Second, most current models remain reactive, responding only to explicit user requests instead of acting proactively in the evolving multimodal environment. We present MiniCPM-o 4.5, our latest effort towards human-like multimodal interaction, which mitigates these gaps by real-time full-duplex omni-modal interaction. It can see, listen, and speak simultaneously in real-time, while also exhibiting proactive behaviors such as issuing reminders or comments based on its continuous understanding of the live scene. The key technique behind MiniCPM-o 4.5 is Omni-Flow, a unified streaming framework that aligns omni-modal inputs and outputs along a shared temporal axis. This formulation converts conventional turn-based interaction into a full-duplex, time-aligned process, enabling simultaneous perception and response and allowing proactive behavior to arise within the same framework. With a total of 9B parameters, MiniCPM-o 4.5 approaches Gemini 2.5 Flash in vision-language capabilities, delivering state-of-the-art open-source performance at its scale. It also surpasses Qwen3-Omni-30B-A3B in omni-modal understanding and delivers better speech generation, with significantly higher computation efficiency. Driven by its efficient architecture design and inference optimization, the model can perform real-time full-duplex omni-modal interaction on edge devices with less than 12GB RAM cost.

  12. ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning

    Reinforcement Learning with Verifiable Rewards (RLVR) enhances reasoning of Large Language Models (LLMs) but usually exhibits limited generation diversity due to the over-incentivization of positive rewards. Although methods like Negative Sample Reinforcement (NSR) mitigate this issue by upweighting penalty from negative samples, they may suppress the semantic distributions shared between positive and negative responses. To boost reasoning ability without losing diversity, this paper proposes negative sample projection Residual Reinforcement Learning (ResRL) that decouples similar semantic distributions among positive and negative responses. We theoretically link Lazy Likelihood Displacement (LLD) to negative-positive head-gradient interference and derive a single-forward proxy that upper-bounds representation alignment to guide conservative advantage reweighting. ResRL then projects negative-token hidden representations onto an SVD-based low-rank positive subspace and uses projection residuals to modulate negative gradients, improving reasoning while preserving diversity and outperforming strong baselines on average across twelve benchmarks spanning Mathematics, Code, Agent Tasks, and Function Calling. Notably, ResRL surpasses NSR on mathematical reasoning by 9.4% in Avg@16 and 7.0% in Pass@128. Code is available at https://github.com/1229095296/ResRL.git.
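
The core projection step the abstract names (project negative-sample hidden states onto an SVD-based low-rank positive subspace, keep the residual) can be sketched as follows; the function name, shapes, and `rank` are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def projection_residual(pos_hidden, neg_hidden, rank=4):
    """Project negative-sample hidden states onto the low-rank subspace
    spanned by positive-sample representations and return the residual.
    pos_hidden: (n_pos, d), neg_hidden: (n_neg, d)."""
    # Top-`rank` right singular vectors span the positive subspace in R^d.
    _, _, vt = np.linalg.svd(pos_hidden, full_matrices=False)
    basis = vt[:rank]                          # (rank, d), orthonormal rows
    projected = neg_hidden @ basis.T @ basis   # component inside the subspace
    return neg_hidden - projected              # component orthogonal to it
```

A small residual means the negative token's representation largely overlaps the positive subspace (shared semantics), so its penalty gradient could be damped; a large residual marks genuinely negative directions.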

  13. Parameter-Efficient Multi-View Proficiency Estimation: From Discriminative Classification to Generative Feedback

    Estimating how well a person performs an action, rather than which action is performed, is central to coaching, rehabilitation, and talent identification. This task is challenging because proficiency is encoded in subtle differences in timing, balance, body mechanics, and execution, often distributed across multiple views and short temporal events. We discuss three recent contributions to multi-view proficiency estimation on Ego-Exo4D. SkillFormer introduces a parameter-efficient discriminative architecture for selective multi-view fusion; PATS improves temporal sampling by preserving locally dense excerpts of fundamental movements; and ProfVLM reformulates proficiency estimation as conditional language generation, producing both a proficiency label and expert-style feedback through a gated cross-view projector and a compact language backbone. Together, these methods achieve state-of-the-art accuracy on Ego-Exo4D with up to 20x fewer trainable parameters and up to 3x fewer training epochs than video-transformer baselines, while moving from closed-set classification toward interpretable feedback generation. These results highlight a shift toward efficient, multi-view systems that combine selective fusion, proficiency-aware sampling, and actionable generative feedback.

  14. SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies

    The emergence of "vibe coding" platforms, where users describe applications in natural language and AI agents autonomously generate full-stack software, has created a need for rigorous evaluation beyond code-level benchmarks. In order to assess them as virtual software development agencies on understanding business requirements, making architectural decisions, writing production code, handling iterative modifications, and maintaining business readiness, we introduce SWE-WebDev Bench, a 68-metric evaluation framework spanning 25 primary and 43 diagnostic metrics across seven groups, organized along three dimensions: Interaction Mode (App Creation Request (ACR) vs. App Modification Request (AMR)), Agency Angle (Product Manager (PM), Engineering, Ops), and Complexity Tier (T4 multi-role SaaS, T5 AI-native). Our evaluation (six platforms, three domains, 18 evaluation cells) reveals four recurring shortcomings in the current generation of AI app builders: (1) A specification bottleneck, where platforms compress rich business requirements into oversimplified technical plans, (2) A pervasive frontend-backend decoupling, where visually polished UIs mask absent or broken backend infrastructure, (3) A steep production-readiness cliff, where no platform scores above 60% on engineering quality and post-generation human effort varies substantially across platforms and (4) Widespread security and infrastructure failures, with no platform exceeding 65% Security Score against a 90% target and concurrency handling as low as 6%. These observations are descriptive of our sample and require larger-scale replication to establish generality. We release SWE-WebDev Bench as a community benchmark to enable such replication and help platform builders identify and address these gaps. Code and benchmark resources are available at: https://github.com/snowmountainAi/webdevbench and https://webdevbench.com/.

  15. Diffusion Model as a Generalist Segmentation Learner

    Diffusion models are primarily trained for image synthesis, yet their denoising trajectories encode rich, spatially aligned visual priors. In this paper, we demonstrate that these priors can be utilized for text-conditioned semantic and open-vocabulary segmentation, and this approach can be generalized to various downstream tasks to make a general-purpose diffusion segmentation framework. Concretely, we introduce DiGSeg (Diffusion Models as a Generalist Segmentation Learner), which repurposes a pretrained diffusion model into a unified segmentation framework. Our approach encodes the input image and ground-truth mask into the latent space and concatenates them as conditioning signals for the diffusion U-Net. A parallel CLIP-aligned text pathway injects language features across multiple scales, enabling the model to align textual queries with evolving visual representations. This design transforms an off-the-shelf diffusion backbone into a universal interface that produces structured segmentation masks conditioned on both appearance and arbitrary text prompts. Extensive experiments demonstrate state-of-the-art performance on standard semantic segmentation benchmarks, as well as strong open-vocabulary generalization and cross-domain transfer to medical, remote sensing, and agricultural scenarios, without domain-specific architectural customization. These results indicate that modern diffusion backbones can serve as generalist segmentation learners rather than pure generators, narrowing the gap between visual generation and visual understanding.

Techmeme (15)

  1. Nvidia and data center operator IREN announce a deal to deploy up to 5 GW of AI infrastructure; Nvidia can invest $2.1B into IREN; IREN jumps 9%+ after hours (Jonathan Vanian/CNBC)

    Jonathan Vanian / CNBC: IREN shares popped 13% in extended trading on Thursday after the data center operator announced a partnership with semiconductor giant Nvidia.

  2. Cloudflare reports Q1 revenue up 34% YoY to $639.8M, plans to cut 1,100+ jobs as it shifts to an "agentic AI-first operating model"; NET drops 13%+ after hours (Ignacio Gonzalez/Bloomberg)

    Ignacio Gonzalez / Bloomberg: Cloudflare Inc. plans to cut more than 1,100 jobs globally as it accelerates its shift to an agentic AI-first operating model.

  3. Airbnb reports Q1 revenue up 18% YoY to $2.68B, vs. $2.62B est., Nights and Seats Booked up 9% to 156.2M, vs. 155.77M est., lifts 2026 revenue growth guidance (Samantha Subin/CNBC)

    Samantha Subin / CNBC: Airbnb reported mixed first-quarter results after the bell on Thursday and warned of regional weakness spurred by the war in Iran.

  4. Coinbase reports Q1 revenue down 31% YoY to $1.41B, vs. $1.52B est., and a loss of $1.49 per share, vs. a $0.27 profit est.; COIN drops 4%+ after hours (CNBC)

    CNBC: Coinbase posted lower-than-expected results for the first quarter as crypto prices fell, weighing on one of the company's major revenue drivers: spot trading in digital assets.

  5. Goldman Sachs: Alphabet and Amazon generated "other income" totaling $53B in Q1, or ~60% of their Q1 income; $49B was due to equity stakes in private companies (Robin Wigglesworth/Financial Times)

    Robin Wigglesworth / Financial Times: The AI “hyperscalers” reported bumper earnings for the first quarter, with both sales and profits beating expectations.

  6. Sources: OpenAI and Broadcom discuss terms for Broadcom to finance initial custom chip production for ~$18B, conditioned on Microsoft buying ~40% of the chips (Anissa Gardizy/The Information)

    Anissa Gardizy / The Information: When OpenAI and chip designer Broadcom announced last fall that they would make custom artificial intelligence chips together, they positioned it as a done deal.

  7. Anthropic researchers detail "natural language autoencoders", which convert LLM activations, the numbers encoding a model's thoughts, into natural language text (Anthropic)

    Anthropic: When you talk to an AI model like Claude, you talk to it in words. Internally, Claude processes those words …

  8. Sources: Ramp told investors it is raising $750M co-led by Iconiq Capital and GIC at a valuation of $40B+ before the investment, up from $32B in November 2025 (Kate Clark/Wall Street Journal)

    Kate Clark / Wall Street Journal: The financing target represents a more than 30% increase from six months ago. The corporate card and expense management startup Ramp …

  9. EU legislators reach a deal to postpone restrictions on high-risk AI until December 2027 and to exempt the use of AI in industrial applications from the AI Act (Pieter Haeck/Politico)

    Pieter Haeck / Politico: Deal marks first significant delay of digital rules amid pressure from the U.S. Restrictions on high-risk uses …

  10. While Anthropic will use the Colossus 1 data center, which has a really bad environmental record, xAI retains the larger Colossus 2 for its own AI training (Simon Willison/Simon Willison's Weblog)

    Simon Willison / Simon Willison's Weblog : While Anthropic will use the Colossus 1 data center, which has a really bad environmental record, xAI retains the larger Colossus 2 for its own AI training —  There weren't a lot of big new announcements from Anthropic at yesterday's Code w/ Claude event, but the biggest by far …

  11. Sources: AirPods with cameras reached an advanced testing stage; the cameras will feed data to Siri to help answer questions, rather than take photos or video (Mark Gurman/Bloomberg)

    Mark Gurman / Bloomberg : Sources: AirPods with cameras reached an advanced testing stage; the cameras will feed data to Siri to help answer questions, rather than take photos or video —  Apple Inc. has reached the late stages of development for new AirPods with built-in cameras, a significant milestone for what will likely …

  12. OpenAI launches three voice models in the API: GPT-Realtime-2 with GPT-5-class reasoning, GPT-Realtime-Whisper for transcription, and GPT-Realtime-Translate (Zac Hall/9to5Mac)

    Zac Hall / 9to5Mac : OpenAI launches three voice models in the API: GPT-Realtime-2 with GPT-5-class reasoning, GPT-Realtime-Whisper for transcription, and GPT-Realtime-Translate —  OpenAI has just released three new realtime voice models that it says will “unlock a new class of voice apps for developers.”

  13. OpenAI launches Trusted Contact, an optional safety feature for ChatGPT that lets adult users assign an emergency contact for mental health and safety concerns (Jess Weatherbed/The Verge)

    Jess Weatherbed / The Verge : OpenAI launches Trusted Contact, an optional safety feature for ChatGPT that lets adult users assign an emergency contact for mental health and safety concerns —  The feature expands existing teenage safety options to anyone over 18. … OpenAI is launching an optional safety feature …

  14. Elon Musk says SpaceX reserves "the right to reclaim the compute" from Anthropic if its "AI engages in actions that harm humanity" (Elon Musk/@elonmusk)

    Elon Musk / @elonmusk : Elon Musk says SpaceX reserves “the right to reclaim the compute” from Anthropic if its “AI engages in actions that harm humanity” —  @MobofJoggers @nottombrown Just as SpaceX launches hundreds of satellites for competitors with fair terms and pricing, we will provide compute to AI companies that are taking the right steps to ensure it is good for humanity. We reserve the right to reclaim the compute if their AI engages in actions that

  15. Filing: OnlyFans owner Leonid Radvinsky's wife assumed significant control of OnlyFans holding company Fenix in March following the death of Radvinsky (Bloomberg)

    Bloomberg : Filing: OnlyFans owner Leonid Radvinsky's wife assumed significant control of OnlyFans holding company Fenix in March following the death of Radvinsky —  Leonid Radvinsky's widow has surfaced as a beneficiary of the business empire behind adult-content platform OnlyFans, the latest example of …

Solidot(15)

  1. Global smartphone prices hit a record first-quarter high in Q1 2026

    According to a Counterpoint Research report, even though soaring memory and storage prices pushed handset prices up and shipments down, global smartphone market revenue in Q1 2026 grew 8% year over year to $117 billion, while the average selling price rose 12% to $399, both first-quarter records. Apple's revenue grew 22% year over year in Q1 2026, the fastest among the top five smartphone brands and likewise a first-quarter record, and Apple topped global smartphone shipments in a first quarter for the first time, with a 21% share. Samsung ranked second in both revenue and shipments. Xiaomi's Q1 shipments fell 19% and its revenue fell 18% year over year. OPPO and vivo ranked fourth and fifth by Q1 2026 revenue.
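The report's revenue and ASP figures let you back out what they imply for unit shipments, which it says slipped. A quick check, using only the numbers given above (the derived unit counts are an implication, not a reported figure):

```python
# Q1 2026: $117B revenue (+8% YoY) at a $399 average selling price (+12% YoY).
rev, asp = 117e9, 399.0
units_2026 = rev / asp                     # implied ~293M units shipped
units_2025 = (rev / 1.08) / (asp / 1.12)   # back out the prior year
drop = 1 - units_2026 / units_2025
print(f"implied Q1 unit decline: {drop:.1%}")  # ~3.6%
```

The ratio reduces to 1 − 1.08/1.12: when ASP grows faster than revenue, unit volume must fall, which is consistent with the "shipments down" framing.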

  2. Motherboard sales collapse

    The AI boom has driven up prices of key PC components such as memory, dragging down sales of parts largely untouched by AI, such as motherboards, as consumers postpone upgrades over high prices. All four major motherboard makers have cut their sales targets. Asus sold 15 million boards in 2025 but will ship just over 5 million in the first half of 2026; its full-year total may fall below 10 million, a 33% year-over-year decline. Gigabyte and MSI sold 11.5 million and 11 million boards last year; both have lowered their internal 2026 forecasts, to 9 million (Gigabyte) and 8.4 million (MSI), declines of 22% and 24%. ASRock is hit hardest, with shipments expected to drop 37%, from 4.3 million in 2025 to 2.7 million this year. Taken together, these figures imply the overall motherboard market (at least among the top four vendors) has shrunk by 28%.
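The headline 28% figure follows from the per-vendor numbers in the item. One caveat: Asus's 2026 total is only given as "may fall below 10 million", so the 10M used here is an assumed placeholder at that bound:

```python
# Units in millions of boards; 2025 figures are reported, 2026 are forecasts.
sold_2025 = {"Asus": 15.0, "Gigabyte": 11.5, "MSI": 11.0, "ASRock": 4.3}
fcst_2026 = {"Asus": 10.0, "Gigabyte": 9.0, "MSI": 8.4, "ASRock": 2.7}

total_2025 = sum(sold_2025.values())   # 41.8M boards
total_2026 = sum(fcst_2026.values())   # 30.1M boards
decline = 1 - total_2026 / total_2025
print(f"top-four market decline: {decline:.0%}")  # 28%
```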

  3. Pollinating insects are closely tied to farmers' health and income

    According to a study published in Nature, pollinating insects are closely tied to smallholder farmers' health and income. UK researchers tracked the diets, nutritional status, farming practices, and socioeconomic conditions of 776 people across 10 smallholder communities in Nepal, and recorded the pollinator species supporting their nutrition and livelihoods. They found that insect pollinators directly contributed 44% of farm income and more than 20% of intake of vitamin A, folate, and vitamin E. The study shows that the relationship between pollinators and humans is essential to sustaining both environmental and human health: pollinator decline was associated with lower nutrient intake and household income.

  4. Valve releases Steam Controller CAD files under a CC license

    Valve has released CAD files for its Steam Controller, which sold out at launch, under a Creative Commons (CC) license. Users are free to use the files to make docks, decorative shells, or anything else they want. Valve says the controller belongs to its owner, who may do with it as they please, but recommends leaving modifications to professionals, since accidental damage is not covered by the warranty.

  5. CRISPR-Cas12a2 can selectively destroy cells

    Researchers at Utah State University, in a paper published in Nature, report a breakthrough with the gene-editing technology CRISPR-Cas12a2: it can selectively destroy cells, with major implications for disease treatment. The hard part of treating diseases, cancer included, is clearing malignant tumors or infections without damaging healthy tissue. The better-known CRISPR-Cas9 uses a guide RNA to bind complementary DNA; CRISPR-Cas12a2 uses a guide RNA to bind complementary RNA. The researchers report that Cas12a2 is highly target-specific with almost no off-target effects: it selectively killed cancer cells carrying a single point mutation while leaving cells without the mutation unharmed, and no side effects were observed. In mouse experiments, the new therapy shrank tumors by about 50% after a single treatment.

  6. SpaceX IPO gives Musk unchecked power and bars investors from suing

    According to SpaceX's IPO registration filing, a combination of super-voting shares, mandatory arbitration, and other provisions grants Elon Musk and his allies broad control. Musk holds nearly unchecked executive power, while investors' and shareholders' ability to challenge management, file lawsuits, or vote on governance matters is restricted; in effect, the only person who can fire Musk is Musk himself. Musk currently holds 42.5% of SpaceX's equity and 83.8% of its voting power, and will retain more than 50% of the voting power after listing. SpaceX's IPO filing is confidential, allowing the company to advance the IPO without disclosing detailed financial information.

  7. SARS-CoV-2-like virus found in Thai bats

    University of Tokyo researchers, in a paper published in Cell, report finding a virus genetically similar to SARS-CoV-2 in bats in Thailand; the virus may be capable of infecting humans. SARS-CoV-2, which emerged in 2019 and triggered a global pandemic, is widely believed to have come from bats. In recent years multiple viruses genetically similar to SARS-CoV-2 have been detected in wild bats across Southeast Asia, but unlike SARS-CoV-2 they could not bind the proteins on the surface of human cells and were essentially incapable of infecting humans. Surveying viruses in bats in Thailand, the team identified a new virus that can bind human cell proteins. Lab tests showed that existing COVID-19 vaccines and therapeutics are all effective against it, and its replication capacity and pathogenicity are lower than those of SARS-CoV-2.

  8. SpaceX slows its Falcon 9 launch cadence

    Falcon 9 is SpaceX's workhorse rocket; the company also flies the heavy-lift Falcon Heavy and is still developing Starship. SpaceX's Falcon rockets flew 96 missions in 2023, 134 in 2024, and 165 in 2025. Early this year SpaceX president Gwynne Shotwell said the company planned 140 to 145 Falcon launches for the year, with the cadence declining as Starship enters service. The slowdown is already visible in launch-pad activity: Kennedy Space Center's LC-39A pad is no longer used for Falcon 9, having shifted to a small number of Falcon Heavy flights and to Starship. Falcon 9 is unlikely to retire soon, and is expected to keep flying into the 2030s and beyond, but its launch rate is coming down.

  9. Future reCAPTCHA verification will require a phone

    How do you curb the deceptive automated traffic that agents bring to websites? Google's next-generation reCAPTCHA will distinguish humans from machines by showing a QR code that a human must scan with a phone. Google's phone requirements: Apple devices running iOS v15.0-16.4 must download a dedicated reCAPTCHA app, while Android devices must run a Google Play Services version newer than 25.41.30, which was released in October 2025. In the future, you may need a reasonably new phone just to browse the web normally.
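The Android requirement above amounts to a version gate. A minimal sketch of that check, assuming a dotted-numeric version string; the function name and comparison logic are illustrative, not Google's actual API:

```python
def meets_min_play_services(version: str, minimum: str = "25.41.30") -> bool:
    """Return True if `version` is strictly newer than `minimum`.

    Dotted version strings are compared numerically, segment by
    segment, via Python's lexicographic tuple ordering.
    """
    def parse(v: str) -> tuple[int, ...]:
        return tuple(int(part) for part in v.split("."))
    # "Must be higher than 25.41.30" implies a strict comparison.
    return parse(version) > parse(minimum)

print(meets_min_play_services("25.41.31"))  # True: one patch newer
print(meets_min_play_services("25.41.30"))  # False: equal is not newer
print(meets_min_play_services("24.50.99"))  # False: major segment too old
```

Comparing parsed integer tuples avoids the classic string-comparison bug where "9" sorts after "25".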

  10. Zuckerberg accused of personally authorizing and encouraging copyright infringement

    Five major publishers, Hachette, Macmillan, McGraw Hill, Elsevier, and Cengage, along with author Scott Turow, have sued Meta and its CEO Mark Zuckerberg, alleging that Zuckerberg personally authorized and actively encouraged mass copyright infringement by training Meta's Llama AI systems on pirated books, journal articles, and scraped web content. Meta denies any wrongdoing and says it will fight the suit, noting that courts have found training AI on copyrighted material to be fair use. Training AI on copyrighted material may indeed be fair use, but the plaintiffs argue Meta obtained that material illegally. The complaint alleges that, to win the AI arms race and build a fully capable generative AI model, Meta and Zuckerberg followed their "move fast and break things" credo: first illegally downloading millions of copyrighted books and journal articles from pirate sites, then scraping nearly the entire internet without authorization, amounting to one of the largest copyright infringements in history.

  11. Atmospheric CO2 concentration sets a new record

    NOAA's observatory at Mauna Loa, Hawaii recorded a record monthly average CO2 concentration of 431 ppm in April. Zachary Labe, a climate scientist at the climate nonprofit Climate Central, said the record is depressing but not surprising, showing that atmospheric CO2 keeps climbing as the planet continues to warm. He explained that atmospheric CO2 typically peaks each April, because plants decaying after winter release greenhouse gases, some of which are reabsorbed as plants grow during the warmer months. But NOAA's data show a worrying trend: the monthly average CO2 concentration keeps rising.

  12. CNN founder Ted Turner dies at 87

    CNN founder Ted Turner has died at 87. The network he founded, famous for 24-hour live coverage of global news, revolutionized television journalism. CNN launched on June 1, 1980 as the first 24-hour cable news network. Turner sold CNN to Time Warner in 1995 and left the television business, but he always called CNN the greatest achievement of his life.

  13. Study: eating eggs may help lower Alzheimer's risk

    Researchers found that eating one egg a day, at least five days a week, can lower the risk of Alzheimer's disease by up to 27%. Eating eggs one to three times a month lowered the risk by 17%, and two to four times a week by 20%. The researchers say eggs supply key nutrients for brain health. Eggs provide choline, a precursor of acetylcholine and phosphatidylcholine, both critical for memory and synaptic function. They also contain lutein and zeaxanthin, carotenoids associated with better cognition and lower oxidative stress, as well as important omega-3 fatty acids; the yolk is especially rich in phospholipids, which make up nearly 30% of an egg's total lipids and are critical to the function of neurotransmitter receptors. The study was funded by the American Egg Board.

  14. OpenAI's president forced to read his own diary on the witness stand

    Elon Musk testified in court last week, accusing fellow OpenAI co-founders Greg Brockman and Sam Altman of abandoning the nonprofit mission on which the company was founded in order to enrich themselves. This week Brockman took the stand and was made to read his personal diary in front of the jury, which seemed to corroborate Musk's accusations. Brockman said he has kept a diary since his student days, using it to think through major decisions across his career. The diaries were submitted as evidence last October and unsealed in January. In 2017 Musk gave OpenAI an ultimatum: either he would take full control of OpenAI's for-profit arm, or OpenAI would remain a nonprofit. Around the same time, Brockman was musing in his diary about the upside of making money. After OpenAI created a for-profit arm outside Musk's control, Brockman's personal stake grew to a present value of $30 billion. He also agonized in his diary over whether voting against Musk's plan, or voting to remove Musk from the board, was morally wrong, writing: "Taking this nonprofit away from him is wrong. It is morally corrupt."

  15. The Oscars bar AI actors and AI-written scripts

    The Academy of Motion Picture Arts and Sciences, which runs the Oscars, announced that only human performances and human-written screenplays are eligible for Oscar nominations. The Oscars will not ban AI tools outright, but films will be judged on whether humans still play the central role in the creative work. The Academy said that if filmmakers use AI tools in a work, those tools will neither help nor hurt its chances of a nomination. This is the first time the Academy has made explicit that its awards go only to human performances and human-written screenplays.