OrangeBot.AI Digest — 2026-05-06
90 headlines across 8 sources, aggregated for this day.
Hacker News (15)
- BYD overtakes Tesla and Kia as the best-selling EV brand in key overseas markets (electrek.co)
- From Supabase to Clerk to Better Auth (blog.val.town)
- Appearing productive in the workplace (nooneshappy.com)
- Higher usage limits for Claude and a compute deal with SpaceX (www.anthropic.com)
- Valve releases Steam Controller CAD files under Creative Commons license (www.digitalfoundry.net)
- Vibe coding and agentic engineering are getting closer than I'd like (simonwillison.net)
- Ted Turner has died (www.cnn.com)
- The bottleneck was never the code (www.thetypicalset.com)
- Red Squares – GitHub outages as contributions (red-squares.cian.lol)
- Batteries Not Included, or Required, for These Smart Home Sensors (coe.gatech.edu)
- Multi-stroke text effect in CSS (yuanchuan.dev)
- Knitting bullshit (katedaviesdesigns.com)
- Reverse-engineering the 1998 Ultima Online demo server (draxinar.github.io)
- CARA 2.0 – “I Built a Better Robot Dog” (www.aaedmusa.com)
- 245TB Micron 6600 ION Data Center SSD Now Shipping (investors.micron.com)
GitHub Trending (15)
- Hmbown / DeepSeek-TUI
- addyosmani / agent-skills
- PriorLabs / TabPFN
- docusealco / docuseal
- LearningCircuit / local-deep-research
- LadybirdBrowser / ladybird
- InsForge / InsForge
- virattt / dexter
- anthropics / financial-services
- ruvnet / ruflo
- cheahjs / free-llm-api-resources
- shiyu-coder / Kronos
- bwya77 / vscode-dark-islands
- bytedance / deer-flow
- D4Vinci / Scrapling
Product Hunt (15)
- moar
Your documents. AI ready.
- Alumni Founder
The tool that maps founder networks for any company
- Shadow 2.0
The work your meetings create, done before they end
- WOZCODE
Cut Claude Code costs by up to 50%
- Realtime TTS-2
Voice AI that feels as good as it sounds
- Gas City 1.0
build your own software factory
- Magic
Blend your content into real-world locations
- Ajelix AI Agent for Work
The first truly agentic AI sidebar for Google Workspace™
- Open Finance MCP
Access your bank data in ChatGPT & Claude via Open Finance
- Kanwas
An open-source brain for your team
- Knowly 1.0
LLM Wiki + NotebookLM, in one closed-loop Proactive AI
- pay.sh
Discover, access, and pay for any API autonomously
- Magic Studio by Once UI
Turn Once UI into a $10k agency
- Contrario
The AI recruiting platform powered by expert recruiters
- Superset 2.0
Run 100s of coding agents on any machine from anywhere
Hugging Face (15)
- ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration
This report describes ARIS (Auto-Research-in-sleep), an open-source research harness for autonomous research, including its architecture, assurance mechanisms, and early deployment experience. The performance of agent systems built on LLMs depends on both the model weights and the harness around them, which governs what information to store, retrieve, and present to the model. For long-horizon research workflows, the central failure mode is not a visible breakdown but a plausible unsupported success: a long-running agent can produce claims whose evidential support is incomplete, misreported, or silently inherited from the executor's framing. Therefore, we present ARIS as a research harness that coordinates machine-learning research workflows through cross-model adversarial collaboration as a default configuration: an executor model drives forward progress while a reviewer from a different model family is recommended to critique intermediate artifacts and request revisions. ARIS has three architectural layers. The execution layer provides more than 65 reusable Markdown-defined skills, model integrations via MCP, a persistent research wiki for iterative reuse of prior findings, and deterministic figure generation. The orchestration layer coordinates five end-to-end workflows with adjustable effort settings and configurable routing to reviewer models. The assurance layer includes a three-stage process for checking whether experimental claims are supported by evidence: integrity verification, result-to-claim mapping, and claim auditing that cross-checks manuscript statements against the claim ledger and raw evidence, as well as a five-pass scientific-editing pipeline, mathematical-proof checks, and visual inspection of the rendered PDF. A prototype self-improvement loop records research traces and proposes harness improvements that are adopted only after reviewer approval.
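The cross-model executor/reviewer collaboration the ARIS abstract describes could be sketched roughly as follows. This is a minimal illustration, not ARIS itself; `executor` and `reviewer` are hypothetical callables standing in for models from two different families:

```python
def review_loop(executor, reviewer, task, max_rounds=3):
    """Cross-model adversarial collaboration sketch: an executor
    drafts an artifact, a reviewer from a different model family
    critiques it, and the executor revises until the reviewer
    approves (returns None) or the round budget runs out."""
    artifact = executor(task)
    for _ in range(max_rounds):
        critique = reviewer(task, artifact)
        if critique is None:          # reviewer approves
            return artifact, True
        artifact = executor(f"{task}\nRevise to address: {critique}")
    return artifact, False            # unresolved after budget
```

Returning an unresolved flag instead of a silent result mirrors the abstract's concern with "plausible unsupported success": the harness surfaces unapproved artifacts rather than passing them off as finished.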
- OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet their development remains dominated by industrial giants. The typical industry recipe involves a highly resource-intensive pipeline spanning pre-training, continual pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL). In this report, we show that when fueled with informative and high-difficulty trajectories, a simple SFT approach can be surprisingly powerful for training frontier search agents. By introducing three simple data synthesis modifications: scaling knowledge graph size for richer exploration, expanding the tool set size for broader functionality, and strict low-step filtering, we establish a stronger baseline. Trained on merely 10.6k data points, our OpenSeeker-v2 achieves state-of-the-art performance across 4 benchmarks (30B-sized agents with ReAct paradigm): 46.0% on BrowseComp, 58.1% on BrowseComp-ZH, 34.6% on Humanity's Last Exam, and 78.0% on xbench, surpassing even Tongyi DeepResearch, trained with a heavy CPT+SFT+RL pipeline, which achieves 43.4%, 46.7%, 32.9%, and 75.0%, respectively. Notably, OpenSeeker-v2 represents the first state-of-the-art search agent within its model scale and paradigm to be developed by a purely academic team using only SFT. We are excited to open-source the OpenSeeker-v2 model weights and share our simple yet effective findings to make frontier search agent research more accessible to the community.
- Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
The standard post-training recipe for large multimodal models (LMMs) applies supervised fine-tuning (SFT) on curated demonstrations followed by reinforcement learning with verifiable rewards (RLVR). However, SFT introduces distributional drift that neither preserves the model's original capabilities nor faithfully matches the supervision distribution. This problem is further amplified in multimodal reasoning, where perception errors and reasoning failures follow distinct drift patterns that compound during subsequent RL. We introduce PRISM, a three-stage pipeline that mitigates this drift by inserting an explicit distribution-alignment stage between SFT and RLVR. Building on the principle of on-policy distillation (OPD), PRISM casts alignment as a black-box, response-level adversarial game between the policy and a Mixture-of-Experts (MoE) discriminator with dedicated perception and reasoning experts, providing disentangled corrective signals that steer the policy toward the supervision distribution without requiring access to teacher logits. While 1.26M public demonstrations suffice for broad SFT initialization, distribution alignment demands higher-fidelity supervision; we therefore curate 113K additional demonstrations from Gemini 3 Flash, featuring dense visual grounding and step-by-step reasoning on the hardest unsolved problems. Experiments on Qwen3-VL show that PRISM consistently improves downstream RLVR performance across multiple RL algorithms (GRPO, DAPO, GSPO) and diverse multimodal benchmarks, improving average accuracy by +4.4 and +6.0 points over the SFT-to-RLVR baseline on 4B and 8B, respectively. Our code, data, and model checkpoints are publicly available at https://github.com/XIAO4579/PRISM.
- X2SAM: Any Segmentation in Images and Videos
Multimodal Large Language Models (MLLMs) have demonstrated strong image-level visual understanding and reasoning, yet their pixel-level perception across both images and videos remains limited. Foundation segmentation models such as the SAM series produce high-quality masks, but they rely on low-level visual prompts and cannot natively interpret complex conversational instructions. Existing segmentation MLLMs narrow this gap, but are usually specialized for either images or videos and rarely support both textual and visual prompts in one interface. We introduce X2SAM, a unified segmentation MLLM that extends any-segmentation capabilities from images to videos. Given conversational instructions and visual prompts, X2SAM couples an LLM with a Mask Memory module that stores guided vision features for temporally consistent video mask generation. The same formulation supports generic, open-vocabulary, referring, reasoning, grounded conversation generation, interactive, and visual grounded segmentation across image and video inputs. We further introduce the Video Visual Grounded (V-VGD) segmentation benchmark, which evaluates whether a model can segment object tracks in videos from interactive visual prompts. With a unified joint training strategy over heterogeneous image and video datasets, X2SAM delivers strong video segmentation performance, remains competitive on image segmentation benchmarks, and preserves general image and video chat ability.
- HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness
Recent advances in agentic harnesses, orchestration frameworks that coordinate multiple agents with memory, skills, and tool use, have achieved remarkable success in complex reasoning tasks. However, the underlying mechanism that truly drives performance remains obscured behind intricate system designs. In this paper, we propose HeavySkill, a perspective that views heavy thinking not only as a minimal execution unit in an orchestration harness but also as an inner skill internalized within the model's parameters that drives the orchestrator to solve complex tasks. We identify this skill as a two-stage pipeline, i.e., parallel reasoning then summarization, which can operate beneath any agentic harness. We present a systematic empirical study of HeavySkill across diverse domains. Our results show that this inner skill consistently outperforms traditional Best-of-N (BoN) strategies; notably, stronger LLMs can even approach Pass@N performance. Crucially, we demonstrate that the depth and width of heavy thinking, as a learnable skill, can be further scaled via reinforcement learning, offering a promising path toward self-evolving LLMs that internalize complex reasoning without relying on brittle orchestration layers.
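The two-stage "parallel reasoning then summarization" pipeline could look like this in outline. A toy sketch only: `llm` is a hypothetical text-in/text-out callable, and the prompt wording is illustrative, not from the paper:

```python
from concurrent.futures import ThreadPoolExecutor

def heavy_think(llm, question, width=4):
    """Two-stage heavy-thinking sketch: sample several independent
    reasoning traces in parallel (the 'width'), then ask the model
    to summarize them into one answer. Unlike Best-of-N, no single
    trace is picked; all traces feed the summarization step."""
    prompts = [f"Reason step by step, attempt {i + 1}:\n{question}"
               for i in range(width)]
    with ThreadPoolExecutor(max_workers=width) as pool:
        traces = list(pool.map(llm, prompts))
    joined = "\n---\n".join(traces)
    return llm(f"Given these {width} independent attempts:\n{joined}\n"
               f"Synthesize the single best final answer to: {question}")
```

The depth knob the abstract mentions would correspond to iterating this function on its own output; that extension is omitted here for brevity.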
- Video Generation with Predictive Latents
Video Variational Autoencoders (VAEs) enable latent video generative modeling by mapping the visual world into compact spatiotemporal latent spaces, improving training efficiency and stability. While existing video VAEs achieve commendable reconstruction quality, continued optimization of reconstruction does not necessarily translate into improved generative performance. How to enhance the diffusability of video latents remains a critical and unresolved challenge. In this work, inspired by principles of predictive world modeling, we investigate the potential of predictive learning to improve video generative modeling. To this end, we introduce a simple and effective predictive reconstruction objective that unifies predictive learning with video reconstruction. Specifically, we randomly discard future frames and encode only partial past observations, while training the decoder to reconstruct the observed frames and predict future ones simultaneously. This design encourages the latent space to encode temporally predictive structures and build a more coherent understanding of video dynamics, thereby improving generation quality. Our model, termed Predictive Video VAE (PV-VAE), achieves superior performance on video generation, with 52% faster convergence and a 34.42 FVD improvement over the Wan2.2 VAE on UCF101. Furthermore, comprehensive analyses demonstrate that PV-VAE not only exhibits favorable scalability, with generative performance improving alongside VAE training, but also yields consistent gains in downstream video understanding, underscoring a latent space that effectively captures temporal coherence and motion priors.
- PatRe: A Full-Stage Office Action and Rebuttal Generation Benchmark for Patent Examination
Patent examination is a complex, multi-stage process requiring both technical expertise and legal reasoning, increasingly challenged by rising application volumes. Prior benchmarks predominantly view patent examination as discriminative classification or static extraction, failing to capture its inherently interactive and iterative nature, similar to the peer review and rebuttal process in academic publishing. In this paper, we introduce PatRe, the first benchmark that models the full patent examination lifecycle, including Office Action generation and applicant rebuttal. PatRe comprises 480 real-world cases and supports both oracle and retrieval-simulated evaluation settings. Our benchmark reframes patent examination as a dynamic, multi-turn process of justification and response. Extensive experiments across various LLMs reveal critical insights into model performance, including differences between proprietary and open-source models, as well as task asymmetries between examiner analysis and applicant-side rebuttal. These findings highlight both the potential and current limitations of LLMs in modeling complex, real-world legal reasoning and technical novelty judgment in patent examination. We release our code and dataset to facilitate future research on patent examination modeling.
- SVGS: Enhancing Gaussian Splatting Using Primitives with Spatially Varying Colors
Gaussian Splatting demonstrates impressive results in multi-view reconstruction based on Gaussian explicit representations. However, the current Gaussian primitives only have a single view-dependent color and an opacity to represent the appearance and geometry of the scene, resulting in a non-compact representation. In this paper, we introduce a new method called SVGS (Spatially Varying Gaussian Splatting) that utilizes spatially varying colors and opacity in a single Gaussian primitive to improve its representation ability. We have implemented bilinear interpolation, movable kernels, and tiny neural networks as spatially varying functions. SVGS employs 2D Gaussian surfels as primitives, which significantly enhances novel-view synthesis while maintaining high-quality geometric reconstruction. This approach is particularly effective in practical applications, as scenes combining complex textures with relatively simple geometry occur frequently in real-world environments. Quantitative and qualitative experimental results demonstrate that all three functions outperform the baseline, with the best movable kernels achieving superior novel view synthesis performance on multiple datasets, highlighting the strong potential of spatially varying functions. Project page: https://ruixu.me/html/SuperGaussians/index.html
- SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment
Language models excel at diagnostic assessments on curated medical case studies and vignettes, performing on par with, or better than, clinical professionals. However, existing studies focus on complex scenarios with rich context, making it difficult to draw conclusions about how these systems perform for patients reporting symptoms in everyday life. We deployed SymptomAI, a set of conversational AI agents for end-to-end patient interviewing and differential diagnosis (DDx), via the Fitbit app in a study that randomized participants (N=13,917) to interact with five AI agents. This corpus captures diverse communication and a realistic distribution of illnesses from a real-world population. A subset of 1,228 participants reported a clinician-provided diagnosis, and 517 of these were further evaluated by a panel of clinicians during over 250 hours of annotation. SymptomAI DDx were significantly more accurate (OR = 2.47, p < 0.001) than those from independent clinicians given the same dialogue in a blinded randomized comparison. Moreover, agentic strategies that conduct a dedicated symptom interview, eliciting additional symptom information before providing a diagnosis, perform substantially better than baseline, user-guided conversations (p < 0.001). An auxiliary analysis on 1,509 conversations from a general US population panel validated that these results generalize beyond wearable device users. We used SymptomAI diagnoses as labels for all 13,917 participants to analyze over 500,000 days of wearable metrics across nearly 400 unique conditions. We identified strong associations between acute infections and physiological shifts (e.g., OR > 7 for influenza). While limited by self-reported ground truth, these results demonstrate the benefits of a dedicated and complete symptom interview compared to a user-guided symptom discussion, which is the default of most consumer LLMs.
- Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces
As large language model (LLM) agents evolve from isolated tool users into coordinated teams, reinforcement learning (RL) must optimize not only individual actions but also how work is spawned, delegated, communicated, aggregated, and stopped. This paper studies RL for LLM-based multi-agent systems through orchestration traces: temporal interaction graphs whose events include sub-agent spawning, delegation, communication, tool use, return, aggregation, and stopping decisions. Using this lens, we identify three technical axes. First, reward design spans eight families, including orchestration rewards for parallelism speedup, split correctness, and aggregation quality. Second, reward and credit signals attach to eight credit- or signal-bearing units from token to team; explicit counterfactual message-level credit remains especially sparse in our curated pool. Third, orchestration learning decomposes into five sub-decisions: when to spawn, whom to delegate to, how to communicate, how to aggregate, and when to stop. In our curated pool as of May 4, 2026, we found no explicit RL training method for the stopping decision. We connect academic methods to public industrial evidence from Kimi Agent Swarm, OpenAI Codex, and Anthropic Claude Code. The resulting scale gap is a gap between publicly reported deployment envelopes and open academic evaluation regimes, not independent verification of industrial training traces. We release the artifact at https://github.com/xxzcc/awesome-llm-mas-rl, including an 84-entry tagged paper pool, a 32-record exclusion log, scripted corpus statistics, and a minimal JSON schema for replayable orchestration traces.
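A replayable orchestration trace with the event types listed above might be modeled minimally like this. An illustrative schema sketch only; the paper's actual JSON schema (in the linked artifact) may differ in field names and structure:

```python
from dataclasses import dataclass, field, asdict
from typing import Optional
import json
import time

# the seven event kinds named in the abstract
EVENT_TYPES = {"spawn", "delegate", "communicate", "tool_use",
               "return", "aggregate", "stop"}

@dataclass
class TraceEvent:
    """One timestamped node in a temporal interaction graph
    recording a multi-agent run."""
    event: str                      # one of EVENT_TYPES
    agent: str                      # acting agent id
    target: Optional[str] = None    # e.g. spawned child or delegatee
    payload: dict = field(default_factory=dict)
    ts: float = field(default_factory=time.time)

    def __post_init__(self):
        if self.event not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event}")

def to_jsonl(events):
    """Serialize a trace as JSON Lines so it can be replayed later."""
    return "\n".join(json.dumps(asdict(e)) for e in events)
```

Credit assignment at the eight signal-bearing units the abstract mentions would then amount to attaching reward annotations at different granularities of this event stream.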
- StateSMix: Online Lossless Compression via Mamba State Space Models and Sparse N-gram Context Mixing
We present StateSMix, a fully self-contained lossless compressor that couples an online-trained Mamba-style State Space Model (SSM) with sparse n-gram context mixing and arithmetic coding. The model is initialised from scratch and trained token-by-token on the file being compressed, requiring no pre-trained weights, no GPU, and no external dependencies. The SSM (DM=32, NL=2, approximately 120K active parameters per file) provides a continuously-updated probability estimate over BPE tokens, while nine sparse n-gram hash tables (bigram through 32-gram, 16M slots each) add exact local and long-range pattern memorisation via a softmax-invariant logit-bias mechanism that updates only non-zero-count tokens. An entropy-adaptive scaling mechanism modulates the n-gram contribution based on the SSM's predictive confidence, preventing over-correction when the neural model is already well-calibrated. On the standard enwik8 benchmark, StateSMix achieves 2.123 bpb on 1 MB, 2.149 bpb on 3 MB, and 2.162 bpb on 10 MB, beating xz -9e (LZMA2) by 8.7%, 5.4%, and 0.7% respectively. Ablation experiments establish the SSM as the dominant compression engine: it alone accounts for a 46.6% size reduction over a frequency-count baseline and beats xz without any n-gram component, while n-gram tables provide a complementary 4.1% gain through exact context memorisation. OpenMP parallelisation of the training loop yields 1.9x speedup on 4 cores. The system is implemented in pure C with AVX2 SIMD and processes approximately 2,000 tokens per second on commodity x86-64 hardware.
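The logit-bias mixing with entropy-adaptive scaling described above could be approximated as follows. A toy sketch under stated assumptions: `ngram_counts` maps token ids to context counts, and the log1p bias is an illustrative choice, not necessarily the paper's exact formula. The bias touches only tokens with non-zero counts, which is the softmax-invariant sparse update the abstract refers to:

```python
import math

def mix_logits(ssm_logits, ngram_counts, alpha=1.0):
    """Mix neural-model logits with sparse n-gram evidence, scaled
    by the neural model's normalized entropy so the n-gram
    contribution shrinks when the model is already confident."""
    # softmax of the SSM logits
    m = max(ssm_logits)
    exps = [math.exp(x - m) for x in ssm_logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # normalized entropy in [0, 1]: 1 = maximally uncertain
    v = len(probs)
    ent = -sum(p * math.log(p) for p in probs if p > 0) / math.log(v)
    # bias only tokens actually seen in this context (sparse update)
    mixed = list(ssm_logits)
    for tok, cnt in ngram_counts.items():
        mixed[tok] += alpha * ent * math.log1p(cnt)
    return mixed
```

The mixed logits would then feed the arithmetic coder's probability model; here only the mixing step is shown.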
- SplAttN: Bridging 2D and 3D with Gaussian Soft Splatting and Attention for Point Cloud Completion
Although multi-modal learning has advanced point cloud completion, the theoretical mechanisms remain unclear. Recent works attribute success to the connection between modalities, yet we identify that standard hard projection severs this connection: projecting a sparse point cloud onto the image plane yields an extremely sparse support, which hinders visual prior propagation, a failure mode we term Cross-Modal Entropy Collapse. To address this practical limitation, we propose SplAttN, which replaces hard projection with Differentiable Gaussian Splatting to produce a dense, continuous image-plane representation. By reformulating projection as continuous density estimation, SplAttN avoids collapsed sparse support, facilitates gradient flow, and improves cross-modal connection learnability. Extensive experiments show that SplAttN achieves state-of-the-art performance on PCN and ShapeNet-55/34. Crucially, we utilize the real-world KITTI benchmark as a stress test for multi-modal reliance. Counter-factual evaluation reveals that while baselines degenerate into unimodal template retrievers insensitive to visual removal, SplAttN maintains a robust dependency on visual cues, validating that our method establishes an effective cross-modal connection. Code is available at https://github.com/zay002/SplAttN.
- Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies
Workspace learning requires AI agents to identify, reason over, exploit, and update explicit and implicit dependencies among heterogeneous files in a worker's workspace, enabling them to complete both routine and advanced tasks effectively. Despite its importance, existing relevant benchmarks largely evaluate agents on pre-specified or synthesized files with limited real-world dependencies, leaving workspace-level evaluation underexplored. To this end, we introduce Workspace-Bench, a benchmark for evaluating AI agents on Workspace Learning invOlving Large-Scale File Dependencies. We construct realistic workspaces with 5 worker profiles, 74 file types, 20,476 files (up to 20GB) and curate 388 tasks, each with its own file dependency graph, evaluated across 7,399 total rubrics that require cross-file retrieval, contextual reasoning, and adaptive decision-making. We further provide Workspace-Bench-Lite, a 100-task subset that preserves the benchmark distribution while reducing evaluation costs by about 70%. We evaluate 4 popular agent harnesses and 7 foundation models. Experimental results show that current agents remain far from reliable workspace learning, where the best reaches only 68.7%, substantially below the human result of 80.7%, and the average performance across agents is only 47.4%.
- Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning
Reinforcement learning (RL) has become a central post-training tool for improving the reasoning abilities of large language models (LLMs). In these systems, the rollout, the trajectory sampled from a prompt to termination, including intermediate reasoning steps and optional tool or environment interactions, determines the data the optimizer learns from, yet rollout design is often underreported. This survey provides an optimizer-agnostic view of rollout strategies for RL-based post-training of reasoning LLMs. We formalize rollout pipelines with unified notation and introduce Generate-Filter-Control-Replay (GFCR), a lifecycle taxonomy that decomposes rollout pipelines into four modular stages: Generate proposes candidate trajectories and topologies; Filter constructs intermediate signals via verifiers, judges, critics; Control allocates compute and makes continuation/branching/stopping decisions under budgets; and Replay retains and reuses artifacts across rollouts without weight updates, including self-evolving curricula that autonomously generate new training tasks. We complement GFCR with a criterion taxonomy of reliability, coverage, and cost sensitivity that characterizes rollout trade-offs. Using this framework, we synthesize methods spanning RL with verifiable rewards, process supervision, judge-based gating, guided and tree/segment rollouts, adaptive compute allocation, early-exit and partial rollouts, throughput optimization, and replay/recomposition for self-improvement. We ground the framework with case studies in math, code/SQL, multimodal reasoning, tool-using agents, and agentic skill benchmarks that evaluate skill induction, reuse, and cross-task transfer. Finally, we provide a diagnostic index that maps common rollout pathologies to GFCR modules and mitigation levers, alongside open challenges for building reproducible, compute-efficient, and trustworthy rollout pipelines.
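The four GFCR stages can be illustrated with a minimal loop. This is a sketch of the taxonomy only, not any surveyed system; `policy` and `verifier` are hypothetical callables:

```python
def gfcr_rollout(policy, verifier, prompts, budget=8, replay=None):
    """Minimal Generate-Filter-Control-Replay loop: Generate
    candidate trajectories, Filter them with a verifier, Control
    per-prompt compute (stop on first success or budget exhaustion),
    and Replay by retaining passing trajectories for reuse."""
    replay = replay if replay is not None else []
    kept = []
    for prompt in prompts:
        spent = 0
        while spent < budget:                  # Control: budget cap
            traj = policy(prompt)              # Generate
            spent += 1
            if verifier(prompt, traj):         # Filter
                kept.append((prompt, traj))
                replay.append((prompt, traj))  # Replay buffer
                break                          # Control: early stop
    return kept, replay
```

Richer Control policies (branching, partial rollouts, adaptive compute) and Replay uses (curricula, recomposition) slot into the same skeleton, which is the modularity the GFCR taxonomy is meant to expose.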
- The TTS-STT Flywheel: Synthetic Entity-Dense Audio Closes the Indic ASR Gap Where Commercial and Open-Source Systems Fail
Niche-domain Indic ASR -- digit strings, currency amounts, addresses, brand names, English/Indic codemix -- is under-served by both open-source SOTA and commercial systems. On a synthesised entity-dense Telugu test set (held-out by synthesis system), vasista22/whisper-telugu-large-v2 (open SOTA) achieves Entity-Hit-Rate (EHR) 0.027 and Deepgram Nova-3 (commercial) 0.16. We close this gap with a self-contained TTS<->STT flywheel: an open-source Indic TTS pipeline synthesises ~22,000 entity-dense Indic-English code-mix utterances at <$50 marginal cost, and a LoRA fine-tune on top of vasista22 achieves EHR 0.473 on the held-out test (17x over open SOTA, 3x over commercial), with read-prose regression bounded to +6.6 pp WER on FLEURS-Te. Cross-language: beta-Hi 0.337 (7x vs vasista22) and beta-Ta 0.543 (22x vs vasista22, 22x vs Deepgram); on Hindi where Deepgram has substantial entity coverage, the flywheel underperforms commercial. All three beta models fall below pre-registered EHR targets (0.75 for Te, 0.65 for Hi/Ta); we report honestly. A native-human-recorded sanity check (n=20 Telugu) confirms transfer to real speech (beta-Te EHR 0.516 on native vs 0.473 on synth). An EDSA-isolation ablation (LoRA on FLEURS-Te alone) yields EHR 0.020 on the same held-out, attributing ~100% of the gain to the EDSA corpus. We additionally report a language-conditional finding: vanilla Whisper-large-v3 has Telugu-specific Script Collapse (SFR 0.46-0.71) that a per-language LoRA corrects (SFR 0.81-0.97), but the recipe is contraindicated on Hindi and Tamil where vanilla SFR >= 0.98. Code, holdouts, predictions, EDSA corpus, and entity dictionaries are released open-source.
Techmeme (15)
- Filing: Meta asks a judge to overturn the jury's verdict in the Los Angeles social media addiction trial or order a new trial, citing Section 230 protections (Diana Novak Jones/Reuters)
Diana Novak Jones / Reuters : Filing: Meta asks a judge to overturn the jury's verdict in the Los Angeles social media addiction trial or order a new trial, citing Section 230 protections — Meta Platforms (META.O) has asked a Los Angeles judge to throw out a jury's verdict finding the company liable for a woman's depression …
- Snap reports Q1 revenue up 12% YoY to $1.53B, in line with est., and says it ended its $400M Perplexity deal announced in November; SNAP drops 4%+ after hours (Jonathan Vanian/CNBC)
Jonathan Vanian / CNBC : Snap reports Q1 revenue up 12% YoY to $1.53B, in line with est., and says it ended its $400M Perplexity deal announced in November; SNAP drops 4%+ after hours — Snap shares dropped about 4% in extended trading after the company reported first-quarter earnings on Wednesday …
- DoorDash reports Q1 revenue up 33% YoY to $4.04B, vs. $4.14B est., and forecasts Q2 marketplace gross order value above estimates; DASH jumps 11%+ after hours (Koyena Das/Reuters)
Koyena Das / Reuters : DoorDash reports Q1 revenue up 33% YoY to $4.04B, vs. $4.14B est., and forecasts Q2 marketplace gross order value above estimates; DASH jumps 11%+ after hours — DoorDash (DASH.O) on Wednesday forecast second-quarter marketplace gross order value above analysts' estimates …
- Arm reports Q4 revenue up 20% YoY to $1.5B, says AGI CPU demand will drive $2B in sales in 2027 and 2028, over 2x its prior guidance; ARM jumps 11%+ after hours (Michael Acton/Financial Times)
Michael Acton / Financial Times : Arm reports Q4 revenue up 20% YoY to $1.5B, says AGI CPU demand will drive $2B in sales in 2027 and 2028, over 2x its prior guidance; ARM jumps 11%+ after hours — SoftBank-backed UK group says its first in-house semiconductor has drawn strong demand — Arm shares jumped 10 per cent …
- Google Chrome silently installs a ~4GB Gemini Nano model on desktop devices; Google says it has been offered since 2024 and users can remove it via settings (Ben Schoon/9to5Google)
Ben Schoon / 9to5Google : Google Chrome silently installs a ~4GB Gemini Nano model on desktop devices; Google says it has been offered since 2024 and users can remove it via settings — The ongoing march of AI features continues, whether you want it to or not, and a recent update to Google Chrome probably installed …
- Corgi, which provides insurance for startups and uses AI to generate quotes, manage claims, and more, raised a $160M Series B led by TCV at a $1.3B valuation (Richard Nieva/Forbes)
Richard Nieva / Forbes : Corgi, which provides insurance for startups and uses AI to generate quotes, manage claims, and more, raised a $160M Series B led by TCV at a $1.3B valuation — Named after the company-owned dog, Corgi embodies the sometimes-absurd San Francisco startup scene. Now it's worth $1.3 billion, after raising $160 million.
- Musk v. Altman: Mira Murati testifies that Sam Altman lied to her about the safety standards for a new OpenAI model and that he made her work more difficult (Jay Peters/The Verge)
Jay Peters / The Verge : Musk v. Altman: Mira Murati testifies that Sam Altman lied to her about the safety standards for a new OpenAI model and that he made her work more difficult — OpenAI's former CTO testified under oath that Altman lied to her. … Mira Murati, OpenAI's former CTO, has testified under oath …
- Instacart reports Q1 revenue up 14% YoY to $1.02B, GTV up 13% to $10.29B, and orders up 10%, compared with 16% growth a year earlier; CART drops 8.18% (Neil J Kanatt/Reuters)
Neil J Kanatt / Reuters : Instacart reports Q1 revenue up 14% YoY to $1.02B, GTV up 13% to $10.29B, and orders up 10%, compared with 16% growth a year earlier; CART drops 8.18% — Instacart (CART.O) on Wednesday forecast second-quarter gross transaction value above Wall Street expectations and said shoppers …
- Anthropic says it signed a deal with SpaceX to use "all of the compute capacity" at Colossus 1, giving it access to over 300 MW of new capacity within the month (Axios)
Axios : Anthropic says it signed a deal with SpaceX to use “all of the compute capacity” at Colossus 1, giving it access to over 300 MW of new capacity within the month — Anthropic said Wednesday it has struck a deal to gain access to compute capacity from Elon Musk's SpaceX …
- Google releases Multi-Token Prediction drafters for its Gemma 4 models, which use a form of speculative decoding to guess future tokens for faster inference (Ryan Whitwam/Ars Technica)
Ryan Whitwam / Ars Technica : Google releases Multi-Token Prediction drafters for its Gemma 4 models, which use a form of speculative decoding to guess future tokens for faster inference — Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI.
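Speculative decoding, the general technique behind such drafters, can be illustrated with a greedy toy version. This is not Google's implementation: a real system verifies all draft tokens in one batched target forward pass, whereas this sketch calls the target per token for clarity. `target` and `draft` are hypothetical next-token functions:

```python
def speculative_decode(target, draft, prefix, k=4, steps=3):
    """Toy greedy speculative decoding: a cheap draft model proposes
    k tokens ahead; the target model checks them, keeping the longest
    agreeing prefix plus one corrected token at the first mismatch.
    Output always matches plain greedy decoding with the target."""
    out = list(prefix)
    for _ in range(steps):
        # draft proposes k tokens autoregressively (cheap model)
        proposal = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # target verifies the proposal position by position
        accepted = []
        for tok in proposal:
            t = target(out + accepted)
            if t == tok:
                accepted.append(tok)   # draft guessed correctly
            else:
                accepted.append(t)     # emit target's own token, stop
                break
        out += accepted
    return out
```

The payoff is throughput: when the draft agrees with the target, multiple tokens are accepted per target evaluation, and when it disagrees the output falls back to exactly what the target alone would have produced.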
- Sources: Microsoft is considering delaying or dropping its 2030 goal of matching its hourly power use with renewable energy purchases, amid the data center boom (Bloomberg)
Bloomberg : Sources: Microsoft is considering delaying or dropping its 2030 goal of matching its hourly power use with renewable energy purchases, amid the data center boom — Microsoft Corp. may shelve one of the industry's most ambitious clean-energy targets as it tries to remove hurdles that could hold …
- Following its SpaceX deal, Anthropic doubles Claude Code's five-hour rate limits for paid plans and removes peak hours limit reduction for Pro and Max plans (Anthropic)
Anthropic : Following its SpaceX deal, Anthropic doubles Claude Code's five-hour rate limits for paid plans and removes peak hours limit reduction for Pro and Max plans — We've agreed to a partnership with SpaceX that will substantially increase our compute capacity.
- SpaceX signs an agreement with Anthropic to provide access to Colossus 1, and says Anthropic expressed interest in partnering for orbital compute capacity (xAI)
xAI : SpaceX signs an agreement with Anthropic to provide access to Colossus 1, and says Anthropic expressed interest in partnering for orbital compute capacity — SpaceX has signed an agreement with Anthropic to provide access to Colossus 1, one of the world's largest and fastest-deployed AI supercomputers.
- Anthropic updates Claude Managed Agents with "dreaming", a scheduled process that reviews recent work and updates memory, available in research preview (Frederic Lardinois/The New Stack)
Frederic Lardinois / The New Stack : Anthropic updates Claude Managed Agents with “dreaming”, a scheduled process that reviews recent work and updates memory, available in research preview — In April, Anthropic launched the public beta of Managed Agents, its platform for running AI agents on its infrastructure.
- Sources: DeepSeek is in talks to raise $3B to $4B led by China's national AI fund, valuing DeepSeek at up to $50B (Reuters)
Reuters : Sources: DeepSeek is in talks to raise $3B to $4B led by China's national AI fund, valuing DeepSeek at up to $50B — Chinese AI startup DeepSeek could be valued at up to $50 billion in its maiden fundraising drive, three sources said, as the large language model builder seeks to reverse …
Solidot(15)
- CNN founder Ted Turner dies at 87
CNN founder Ted Turner has died at 87. The CNN he founded was known for 24-hour live coverage of global news and revolutionized television journalism. CNN launched on June 1, 1980 as the first 24-hour cable news network. In 1995 CNN was sold to Time Warner and Turner left the television industry; he always called CNN the greatest achievement of his life.
- Study finds eating eggs may help lower Alzheimer's risk
Researchers found that eating one egg a day, at least five days a week, can reduce the risk of Alzheimer's disease by up to 27%. Eating eggs 1-3 times a month lowered risk by 17%, and 2-4 times a week by 20%. The researchers say eggs supply key nutrients for brain health: choline, a precursor of acetylcholine and phosphatidylcholine, both critical for memory and synaptic function; lutein and zeaxanthin, carotenoids linked to better cognition and lower oxidative stress; and important omega-3 fatty acids. Egg yolk is especially rich in phospholipids, which make up nearly 30% of an egg's total lipids and are critical to the function of neurotransmitter receptors. The study was funded by the American Egg Board.
- OpenAI's president forced to read his personal diary while testifying in court
Last week Elon Musk testified in court, accusing OpenAI co-founders Greg Brockman and Sam Altman of abandoning the company's founding nonprofit mission for personal gain. This week Brockman took the stand and was made to read his personal diary in front of the jury, which appeared to corroborate Musk's accusations. Brockman said he has kept a diary since his student days and has used it throughout his career to think through major decisions. The diaries were submitted as evidence last October and unsealed in January. In 2017 Musk gave OpenAI an ultimatum: either he would take full control of OpenAI's for-profit arm, or OpenAI would remain a nonprofit. Around the same time, Brockman was writing in his diary about the upside of making money. After OpenAI created a for-profit arm not controlled by Musk, Brockman's personal stake came to be worth $30 billion. He also agonized in his diary over whether it would be morally wrong to vote against Musk's plan, or to vote to remove Musk from the board, writing: "Taking the nonprofit away from him would be wrong. Morally corrupt."
- The Oscars reject AI actors and AI-written screenplays
The Academy of Motion Picture Arts and Sciences, which selects the Oscar winners, announced that only human performances and human-written screenplays are eligible for Oscar nominations. The Oscars will not ban AI tools outright but will judge films according to whether humans still play the central role in the creative work. The Academy said that if filmmakers use AI tools in a work, those tools will neither help nor hurt its chances of a nomination. This is the first time the Academy has made explicit that its awards go only to human performances and human-written screenplays.
- Dark-colored microplastics may be accelerating global warming
According to a study published in Nature Climate Change, microplastics, especially dark-colored ones, may be accelerating global warming by absorbing more heat. A team at Fudan University analyzed how microplastics of different colors and sizes interact with light, finding that black, yellow, blue, and red particles absorb light far better than white ones, with black and colored particles absorbing up to 76 times more light than colorless ones. They also found that particles of different sizes absorb light with different intensity, and that absorption changes as the particles age. The researchers estimate that microplastics suspended in the atmosphere contribute roughly one-sixth as much to global warming as black carbon, which comes mainly from fossil fuel combustion. While microplastics are not a major source of warming, their effect is far from negligible: the team roughly estimates that a year's worth of global microplastic emissions has an impact comparable to running 200 coal-fired power plants for a year.
- Google Chrome found silently downloading Gemini Nano on eligible devices
Google Chrome has been found silently downloading the 4GB Gemini Nano model on eligible devices, and re-downloading it after users delete it. Gemini Nano is the local model targeted by Google's controversial Prompt API; running it requires at least 4GB of VRAM, 16GB of RAM, and at least 22GB of free space on the partition where the browser is installed. Chrome has 3.8 billion users and the largest market share of any browser, so the number of devices meeting Gemini Nano's requirements is at least in the hundreds of millions; even ignoring repeated downloads, silently pushing 4GB of data to that many devices is an unimaginable waste of resources. It is also worth noting that the Chrome installer is about 1GB, so the silently downloaded model is four times the size of the browser itself, well beyond what most users would expect an extra feature to cost. Gemini Nano is downloaded into a folder named OptGuideOnDeviceModel, short for "OptimizationGuide on-device model storage."
- Study reveals how eating boosts immunity
A study published in Nature reveals how diet boosts immunity and helps us fight off infection. The team drew blood samples from participants before their first meal of the day and again six hours later; in between, participants could eat freely. The researchers then assessed the metabolic state of T cells in the blood. They found that, compared with an overnight fast, T cells after a meal had easier access to the nutrients needed to complete their energy-intensive activation process: the cells took up sugar more efficiently, carried more fat, and had more efficient mitochondria, all of which ultimately strengthened their ability to respond to threats. Mouse experiments showed these cells proliferated more readily and provided better protection against infection. The exact mechanism behind the effect is unclear; the researchers suspect that an increased capacity to produce proteins lets T cells activate faster in the early stages of an infection.
- Google's Pixel 11 lineup will offer reduced-memory versions
Because of the tight global memory supply, the Pixel 11 lineup Google launches later this year will offer reduced-memory versions. For comparison, in the previous generation the Pixel 10a had 8GB of RAM, the Pixel 10 had 12GB, and the Pro models had 16GB. The base Pixel 11 will drop from 12GB to 8GB, and the Pro will add a 12GB variant. All models will still offer a 16GB version, but prices are expected to rise. The memory shortage has already pushed most phone makers worldwide to raise prices.
- Apple agrees to pay US iPhone users $250M to settle lawsuit over delayed AI features
Apple has agreed to pay US iPhone users $250 million to settle a lawsuit over delayed delivery of AI features. Apple announced Apple Intelligence and upgraded Siri capabilities in June 2024, but availability of the AI features was postponed repeatedly, with the AI-upgraded Siri not expected until later this year. The lawsuit alleged that Apple's advertising misled consumers. The settlement covers US users who bought iPhone 16-series and iPhone 15 Pro models between June 10, 2024 and March 29, 2025, with up to $95 in compensation per iPhone.
- OpenAI, Google, and Microsoft push to add AI literacy to school curricula
California Democratic Senator Adam Schiff has introduced a new bill with bipartisan support, The Literacy in Future Technologies Artificial Intelligence Act (LIFT AI Act), which would amend K-12 curricula to add AI literacy classes and fund AI coursework, related teaching materials, and teacher training. The bill defines AI literacy as the use of AI, specifically "age-appropriate knowledge and abilities to use AI effectively, critically interpret outputs, solve problems in an AI world, and mitigate potential risks." The bill is backed by major AI companies including OpenAI, Google, and Microsoft, as well as the American Federation of Teachers, the Information Technology Industry Council, the Software & Information Industry Association, and HP.
- Notepad++ for Mac sparks a trademark dispute
The Notepad++ for Mac project maintained by Andrey Letov has sparked a trademark dispute. Notepad++'s original author, Don Ho, argued that the project's name is misleading: calling the macOS port Notepad++ gives the impression that it is maintained by the Notepad++ team or is an officially endorsed macOS release, which is not the case; this confuses users and risks trademark infringement. Letov has since renamed the project NextPad++ and adopted an icon distinct from Notepad++'s. Letov also made heavy use of Anthropic's Claude CLI AI coding assistant while developing the port, which raises questions about future maintenance of the project and poses potential security concerns.
- NASA releases over 12,000 photos from the Artemis II lunar flyby mission
NASA has released 12,217 photos from the Artemis II lunar flyby mission. Artemis II launched on April 1, 2026 and returned to Earth on April 10; during the mission the astronauts broke the record for the farthest distance humans have traveled, set by the Apollo 13 crew in 1970. The photos are offered in two resolutions, 2592 x 2048 and 640 x 506, with cursor zoom supported in the online viewer.
- Roku and TCL sued over a software update that bricked TVs
Roku and TCL are being sued after a software update left some TV models unresponsive, stuck in reboot loops, or unable to power on at all, i.e. bricked. Roku and TCL are partners; the complaint, filed in federal court in California, alleges the companies failed to adequately test the update before release, after which consumers' TV screens simply went black. Affected models include the Roku Select series, the Roku Plus series, and TCL TVs running the Roku OS such as the 3-, 4-, 5-, and 6-series. The problems began appearing one to two years after purchase, far sooner than most owners would expect; both companies promised to fix the software but have yet to provide a complete solution.
- NASA administrator says Pluto is a planet
In 2006 the International Astronomical Union (IAU) revised the definition of a planet and demoted Pluto to dwarf planet status, distinguishing it from the other eight classical planets, after astronomers had found Kuiper Belt objects of similar or greater mass that challenged Pluto's planetary standing. Now billionaire NASA administrator Jared Isaacman has told a Senate hearing that he firmly supports restoring Pluto's revoked planetary status, saying NASA "is currently drafting documents to push the scientific community to revisit the question." His remarks were seen as a distraction from real scientific issues, and his support for a proposal to cut NASA's budget in half has drawn criticism from astronomers.
- Signal is building a standalone desktop app that needs no phone
Signal, the end-to-end encrypted messaging app, is developing a standalone desktop app that does not require a phone. Until now, the Signal desktop app has needed a smartphone for initial setup and management. The relevant patches have been merged into the Signal Desktop source code but have not yet shipped. Registering for Signal on the desktop will still require a phone number, which can be a mobile or landline number, on a feature phone or a smartphone. In the future, users without smartphones will also be able to use Signal.