OrangeBot.AI Digest — 2026-05-24
90 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- Australia Four-Day Work Week Study Data Shows Boosted Productivity (scienceaim.com)
- Claude is not your architect. Stop letting it pretend (www.hollandtech.net)
- Memory has grown to nearly two-thirds of AI chip component costs (epoch.ai)
- Omarchy Is Not A Distro (abyss.fish)
- Usborne 1980s Computer Books (usborne.com)
- 'AI washing': firms are scrambling to rebrand themselves as tech-focused (www.theguardian.com)
- DeepSeek to Make Permanent 75% Discount on Flagship AI Model (www.bloomberg.com)
- The seed oil panic is hurting my cardiac patients (www.statnews.com)
- Constraint Decay: The Fragility of LLM Agents in Back End Code Generation (arxiv.org)
- DeepSeek reasonix, DeepSeek native coding agent with high caching and low cost (esengine.github.io)
- Childhood Computing (susam.net)
- Greg Brockman interview [video] (fs.blog)
- I spent 50 hours drawing a line graph (www.dougmacdowell.com)
- Amazon Web Services – Four Years and Out (www.adventuresinoss.com)
- Why is Vivado 2026.1 dropping Linux support for free tier? (adaptivesupport.amd.com)
GitHub Trending(15)
- Lum1104 / Understand-Anything
- rohitg00 / ai-engineering-from-scratch
- anthropics / claude-plugins-official
- anthropics / knowledge-work-plugins
- multica-ai / andrej-karpathy-skills
- earendil-works / pi
- Alishahryar1 / free-claude-code
- colbymchenry / codegraph
- multica-ai / multica
- shiyu-coder / Kronos
- manaflow-ai / cmux
- 666ghj / MiroFish
- codecrafters-io / build-your-own-x
- dotnet / skills
- blakeblackshear / frigate
Product Hunt(15)
- DynamicNotch
Dynamic island for macOS
- Stitch 3.0 by Google
Generate and iterate UI screens with AI on a live canvas
- WhatCable
Know what your USB-C cable can really do
- Freu AI
Automate any Mac app with $0 recurring run cost
- Edgee Fallback Models
Claude Code that never stops
- Runway Agent
Generate edited, sound-designed videos via chat
- ModelHub
The missing menu bar app for local LLMs on Mac.
- DockFlow
Save, switch, and automate Dock layouts for every workflow
- Command A+
Cohere’s open enterprise workhorse
- Kosshi
Simple, fast outliner for Mac and iPhone.
- Memdex
Turn every AI conversation into reusable local memory
- SignalLEMO - Ai Outreach Made Simple
AI-powered lead outreach for field service contractors
- note.md
Local-first markdown based workspace for research writings
- RetroMac
Turn your Mac into a time machine.
- Coca 2.0
Keep Your Mac and Apps Awake!
Hugging Face(15)
- DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards
Reinforcement learning from verifiable rewards (RLVR) has emerged as a central technique for improving the reasoning capabilities of large language models. Despite its effectiveness, how response-level rewards translate into token-level probability changes remains poorly understood. We introduce a discriminator view of RLVR updates, showing that the policy-gradient update direction implicitly acts as a linear discriminator over token-gradient vectors and thereby determines which token probabilities are increased or decreased during learning. Under standard sequence-level RLVR, this discriminator is constructed from positive- and negative-side centroids formed by advantage-weighted averaging of token-gradient vectors. However, such centroid construction can be dominated by shared high-frequency patterns, such as formatting tokens, diluting sparse yet discriminative directions that better distinguish high-reward responses from low-reward ones. To address this limitation, we propose DelTA, a discriminative token credit assignment method that estimates token coefficients to amplify side-specific token-gradient directions and downweight shared or weakly discriminative ones. These coefficients reweight a self-normalized RLVR surrogate, making the effective side-wise centroids more contrastive and thereby reshaping the RLVR update direction. On seven mathematical benchmarks, DelTA outperforms the strongest same-scale baselines by 3.26 and 2.62 average points on Qwen3-8B-Base and Qwen3-14B-Base, respectively. Additional results on code generation, a different backbone, and out-of-domain evaluations further demonstrate the generalization ability of DelTA.
- TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation
Public transit route planning traditionally depends on structured map infrastructure and complex routing engines, and no existing dataset supports training models to bypass this dependency. We present TransitLM, a large-scale dataset of over 13 million transit route planning records from four Chinese cities covering 120,845 stations and 13,666 lines, released as a continual pre-training corpus and benchmark data for three evaluation tasks with complementary metrics. Experiments show that an LLM trained on TransitLM produces structurally valid routes at high accuracy and implicitly grounds arbitrary GPS coordinates to appropriate stations without any explicit mapping. These results demonstrate that transit route planning can be learned entirely from data, enabling end-to-end, map-free route generation directly from origin-destination information. The dataset and benchmark are available at https://huggingface.co/datasets/GD-ML/TransitLM, with evaluation code at https://github.com/HotTricker/TransitLM.
- Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?
Multimodal Large Language Models (MLLMs) are increasingly deployed in human-facing roles where personality perception is critical, yet existing benchmarks evaluate this capability solely on numerical Big Five score prediction, leaving open whether models truly perceive personality through behavioral understanding or merely prejudge through superficial pattern matching. We address this gap with three contributions. (i) A new task: we formalize Grounded Personality Reasoning (GPR), which requires MLLMs to anchor each Big Five rating in observable evidence through a chain of rating, reasoning, and grounding. (ii) A new dataset: we release MM-OCEAN (1,104 videos, 5,320 MCQs), produced by a multi-agent pipeline with human verification, with timestamped behavioral observations, evidence-grounded trait analyses, and seven categories of cue-grounding MCQs. (iii) Benchmark and analysis: we design a three-tier evaluation (rating, reasoning, grounding) plus four sample-level failure-mode metrics: Prejudice Rate (PR), Confabulation Rate (CR), Integration-failure Rate (IR), and Holistic-grounding Rate (HR), and benchmark 27 MLLMs (13 closed, 14 open). The analysis uncovers a striking Prejudice Gap: across the field, 51% of correct ratings are not grounded in retrieved cues, and the Holistic-Grounding Rate spans only 0-33.5%. These findings expose a disconnect between getting the right score and reasoning for the right reason, charting a roadmap for grounded social cognition in MLLMs.
- π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows
The rise of personal assistant agents, e.g., OpenClaw, highlights the growing potential of large language models to support users across everyday life and work. A core challenge in these settings is proactive assistance, since users often begin with underspecified requests and leave important needs, constraints, or preferences unstated. However, existing benchmarks rarely evaluate whether agents can identify and act on such hidden intents before they are explicitly stated, especially in sustained multi-turn interactions where user needs emerge gradually. To address this gap, we introduce π-Bench, a benchmark for proactive assistance comprising 100 multi-turn tasks across 5 domain-specific user personas. By incorporating hidden user intents, inter-task dependencies, and cross-session continuity, π-Bench evaluates agents' ability to anticipate and address user needs over extended interactions, jointly measuring proactivity and task completion in long-horizon trajectories that better reflect real-world use. Experiments show (1) proactive assistance remains challenging, (2) a clear distinction between task completion and proactivity, and (3) the value of prior interaction for proactive intent resolution in later tasks.
- Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps
Long-context inference in large language models is bottlenecked by the quadratic cost of full attention. Existing efficient alternatives often rely either on native sparse training or on heuristic token eviction, creating an undesirable trade-off among efficiency, training cost, and accuracy. In this work, we show that full-attention LLMs are already intrinsically sparse and can be transformed into highly sparse models with only minimal adaptation. Our approach is built on three observations: (1) only a small subset of attention heads truly requires full long-context processing; (2) long-range retrieval is governed primarily by a low-dimensional subspace, allowing relevant tokens to be retrieved efficiently with a 16-dimensional indexer; and (3) the useful token budget is strongly query-dependent, making dynamic top-p selection more suitable than fixed top-k sparsification. Based on these insights, we propose RTPurbo, which retains the full KV cache only for retrieval heads and introduces a lightweight token indexer for sparse attention. By exploiting the model's intrinsic sparsity, RTPurbo achieves sparsification with only a few hundred training steps. Experiments on long-context benchmarks and reasoning tasks show that RTPurbo preserves near-lossless accuracy while delivering substantial efficiency gains, including up to a 9.36× prefill speedup at 1M context and about a 2.01× decode speedup. These results suggest that strong sparse inference can be obtained from standard full-attention training without expensive native sparse pretraining.
- ACC: Compiling Agent Trajectories for Long-Context Training
Recent development of agents has renewed demand for long-context reasoning capacity of LLMs. However, training LLMs for this capacity requires costly long-document curation or heuristic context synthesis. We observe that agents produce massive trajectories when solving problems, invoking tools and receiving environment observations across many turns. The evidence needed to answer the original question is thus scattered throughout these turns, requiring integration of distant context segments. Nevertheless, standard agent SFT masks tool responses and only trains turn-level tool selection, creating a supervision blind spot where these scattered signals go unused. We propose Agent Context Compilation (ACC), which converts trajectories from search, software engineering, and database querying agents into long-context QA pairs that combine the original question with tool responses and environment observations gathered across multiple turns, training the model to answer directly without tool use. This makes the dependencies between the question and the evidence explicit, enabling direct supervision of long-context reasoning over distant segments without additional annotation. ACC is a simple but effective approach that can be combined with any existing long-context extension or training method, providing scalable supervised fine-tuning data. We validate ACC on long-range dependency modeling tasks through MRCR and GraphWalks, challenging benchmarks requiring cross-turn coreference resolution and graph traversal over extended contexts. Training Qwen3-30B-A3B with ACC achieves 68.3 on MRCR (+18.1) and 77.5 on GraphWalks (+7.6), results comparable to Qwen3-235B-A22B, while preserving general capabilities on GPQA, MMLU-Pro, AIME, and IFEval. Further mechanism analysis reveals that the ACC-trained model exhibits task-adaptive attention restructuring and expert specialization.
- PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects
Simulation-ready physical 3D assets have emerged as a promising direction owing to their broad applicability in downstream tasks. However, most existing 3D generation methods either neglect physical properties or are limited to a single asset category, e.g., rigid, deformable, or articulated objects. To address these limitations, we introduce PhysX-Omni, a unified framework for simulation-ready physical 3D generation across diverse asset types. Specifically, we develop a novel and efficient geometry representation tailored for Vision-Language Models, which directly encodes high-resolution 3D structures without compression, significantly improving generation performance. In addition, we construct the first general simulation-ready 3D dataset, PhysXVerse, covering diverse indoor and outdoor categories. Furthermore, to comprehensively and flexibly evaluate both generative and understanding capabilities in the wild, we propose PhysX-Bench, which encompasses six key attributes: geometry, absolute scale, material, affordance, kinematics, and function description. Extensive experiments with conventional metrics and PhysX-Bench show that PhysX-Omni performs strongly in both generation and understanding. Moreover, additional studies further validate the potential of PhysX-Omni for applications in simulation-ready scene generation and robotic policy learning. We believe PhysX-Omni can significantly advance a wide range of downstream applications, particularly in embodied AI and physics-based simulation.
- LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
Joint audio-visual reasoning is essential for omnimodal understanding, yet current multimodal large language models (MLLMs) still struggle when reasoning requires fine-grained evidence from both modalities. A central limitation is that explicit text-based chain-of-thought (CoT) compresses continuous audio-visual signals into discrete tokens, weakening temporal grounding and shifting intermediate reasoning toward language priors. We argue that a unified latent space is a better medium for such reasoning because it preserves dense sensory information while remaining compatible with autoregressive generation. Based on this insight, we propose LatentOmni, a cross-modal reasoning framework that interleaves textual reasoning with audio-visual latent states. LatentOmni introduces feature-level supervision to align latent reasoning states with task-relevant sensory features and uses Omni-Sync Position Embedding (OSPE) to maintain temporal consistency between latent audio and visual states. We further construct LatentOmni-Instruct-35K, a dataset of audio-visual interleaved reasoning trajectories for supervising latent-space reasoning. Comprehensive evaluation across multiple audio-visual reasoning benchmarks demonstrates that LatentOmni achieves the best performance among the evaluated open-source models and consistently outperforms the Explicit Text CoT baseline, supporting latent-space joint reasoning as a promising path toward stronger omnimodal understanding.
- Forecasting Scientific Progress with Artificial Intelligence
Artificial intelligence (AI) is increasingly embedded in scientific discovery, yet whether it can anticipate scientific progress remains unclear. To study this question, we introduce a temporally grounded evaluation framework for forecasting scientific progress under controlled knowledge constraints. We present CUSP (Cutoff-conditioned Unseen Scientific Progress), a multi-disciplinary and event-level benchmark that evaluates scientific forecasting in AI systems through feasibility assessment, mechanistic reasoning, generative solution design, and temporal prediction. Across 4,760 scientific events, we observe systematic and domain-dependent limitations in current frontier models. While models can identify plausible research directions from competing candidates, they fail to reliably predict whether scientific advances will be realized and systematically misestimate when they will occur. Performance is highly heterogeneous across domains, with the timing of AI progress more predictable than advances in biology, chemistry, and physics. Performance is largely insensitive to whether events occur before or after the training cutoff, suggesting these limitations cannot be explained solely by knowledge exposure in training data. Under controlled information access, additional pre-cutoff knowledge improves performance but does not close the gap to full-information settings, which becomes more pronounced for high-citation advances. Models also exhibit systematic overconfidence and strong response biases, indicating unreliable uncertainty estimation. Taken together, current AI systems fall short as predictive tools for scientific progress. Access to prior knowledge does not translate into reliable forecasting, and performance benefits more from post-event information than from forward-looking prediction.
- SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers
Diffusion transformers (DiTs) have emerged as a dominant architecture for text-to-image generation, yet their performance drops when generating at resolutions beyond their training range. Existing training-free approaches mitigate this by modifying inference-time attention behavior, often through Rotary Position Embeddings (RoPE) extrapolation combined with attention scaling. However, these strategies apply a uniform and content-agnostic scaling across RoPE components with distinct frequency characteristics, inducing a trade-off between preserving global structure and recovering fine detail. We introduce SEGA, a training-free method that dynamically scales attention across RoPE components according to the latent's spatial-frequency structure at each denoising step. This adaptive scaling improves both structural coherence and fine-detail fidelity. Experiments show that SEGA consistently improves high-resolution synthesis across multiple target resolutions, outperforming state-of-the-art training-free baselines.
- WorldKV: Efficient World Memory with World Retrieval and Compression
Autoregressive video diffusion models have enabled real-time, action-conditioned world generation. However, sustaining a persistent world, where revisiting a previously seen viewpoint yields consistent content, remains an open problem. Full KV-cache attention preserves this consistency but breaks real-time constraints: memory footprint and attention cost grow linearly with rollout length. Sliding window inference restores throughput but discards long-term consistency. We propose WorldKV, a training-free framework with two components: World Retrieval and World Compression. World Retrieval stores evicted KV-cache chunks in GPU/CPU memory and selectively retrieves scene-relevant chunks via camera/action correspondence, inserting them back into the native attention window without re-encoding. World Compression prunes redundant tokens within each chunk via key-key similarity to an anchor frame, halving per-chunk storage to fit 2x more history under a fixed budget. On Matrix-Game-2.0 and LingBot-World-Fast, WorldKV matches or exceeds full-KV memory fidelity at roughly 2x the throughput, and is competitive with memory-trained baselines without any fine-tuning. Project Page: https://cvlab-kaist.github.io/WorldKV/
- Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning
Spreadsheet systems (e.g., Microsoft Excel, Google Sheets) play a central role in modern data-centric workflows. As AI agents grow increasingly capable of automating complex tasks, such as controlling computers and generating presentations, building an AI-driven spreadsheet agent has emerged as a promising research direction. Most existing spreadsheet agents rely on specialized prompting over general-purpose LLMs; while this design shows potential on simple spreadsheet operations, it struggles to manage the complex, multi-step workflows typical of real-world applications. We introduce Spreadsheet-RL, a reinforcement learning (RL) fine-tuning framework designed to train specialized spreadsheet agents within a realistic Microsoft Excel environment. Spreadsheet-RL features an automated pipeline for scalable collection of paired start-goal spreadsheets from online forums, as well as domain-specific evaluation tasks in areas such as finance and supply chain management, which we compile into the new Domain-Spreadsheet benchmark dataset. It also includes a Spreadsheet Gym environment designed for multi-turn RL: Spreadsheet Gym exposes extensive Excel functionality through a Python sandbox, along with a refined harness that incorporates a comprehensive tool set and carefully designed tool-routing rules for spreadsheet tasks. Through comprehensive experiments, we show that Spreadsheet-RL substantially enhances AI agents' performance on both general and domain-specific spreadsheet tasks: it improves Qwen3-4B-Thinking-2507's Pass@1 on SpreadsheetBench from 12.0% to 23.4%, and raises Pass@1 from 8.4% to 17.2% on our curated Domain-Spreadsheet dataset. These results highlight Spreadsheet-RL's strong potential for generalization and real-world adoption in spreadsheet automation, and broadly, its promise for advancing LLM-based interactions with data interfaces in everyday work.
- SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation
Multimodal Large Language Models (MLLMs) have made rapid progress in spatial intelligence, yet existing spatial reasoning benchmarks largely assume pristine visual inputs and overlook the degradations that commonly occur in real-world deployment, such as motion blur, low light, adverse weather, lens distortion, and compression artifacts. This raises a fundamental question: how robust is the spatial intelligence of current MLLMs when visual observations are imperfect? To answer this question, we introduce SpaceDG, the first large-scale dataset for degradation-aware spatial understanding. It is constructed with a physically grounded degradation synthesis engine that embeds the degradation formation process into 3D Gaussian Splatting (3DGS) rendering, enabling realistic simulation of nine degradation types. The resulting dataset contains approximately 1M QA pairs from nearly 1,000 indoor scenes. We further introduce SpaceDG-Bench, a human-verified benchmark with 1,102 questions spanning 11 reasoning categories and 9 visual degradation types, yielding over 10K VQA instances. Evaluating 25 open- and closed-source MLLMs reveals that visual degradations consistently and substantially impair spatial reasoning, exposing a critical robustness gap. Finally, we show that finetuning on SpaceDG markedly improves degradation robustness and can even surpass human performance under degraded conditions without any performance drop on clean images, highlighting the promise of degradation-aware training for robust spatial intelligence.
- FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching
Extending the generation horizon of video diffusion models to long sequences remains a long-standing and important challenge. Existing training-free approaches fall into two categories: extensions of bidirectional models, which are tightly coupled to specific architectures and suffer from quality degradation over long horizons, and autoregressive models, which accumulate drift errors due to exposure bias and tend to produce repetitive motion patterns. To address these issues, we propose a novel but simple inference-time approach for long video generation that is architecture-agnostic and requires no additional training. Our method generates long videos via overlapping sliding windows, where predicted clean samples from adjacent windows are blended via Tweedie matching to enforce both manifold constraint and temporal consistency across overlap regions. Stochastic early-phase sampling then synchronizes per-window trajectories by injecting fresh noise after each Tweedie matching correction in the high-noise phase, before transitioning to deterministic ODE sampling to preserve fine-grained visual fidelity. Applied to various video generation models, our method generates videos several times longer than the native window length while outperforming both training-free and autoregressive baselines in temporal consistency and visual quality, and further extends to audio-video joint generation and text-to-3DGS without any fine-tuning.
- Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving
Robust training and validation of Autonomous Driving Systems (ADS) require massive, diverse datasets. Proprietary data collected by Autonomous Vehicle (AV) fleets, while high-fidelity, are limited in scale, diversity of sensor configurations, as well as geographic and long-tail-behavioral coverage. In contrast, in-the-wild data from sources like dashcams offers immense scale and diversity, capturing critical long-tail scenarios and novel environments. However, this unstructured, in-the-wild video data is incompatible with ADS expecting structured, multi-modal sensor inputs for validation and training. To bridge this data gap, we propose Sensor2Sensor, a novel generative modeling paradigm that translates in-the-wild monocular dashcam videos into a high-fidelity, multi-modal sensor suite (AV logs) comprising multi-view camera images and LiDAR point clouds. A core challenge is the lack of paired training data. We address this by converting real AV logs into dashcam-style videos via 4D Gaussian Splatting (4DGS) reconstruction and novel-view rendering. Sensor2Sensor then utilizes a diffusion architecture to perform the generative conversion. We perform comprehensive quantitative evaluations on the fidelity and realism of the generated sensor data. We demonstrate Sensor2Sensor's practical utility by converting challenging in-the-wild internet and dashcam footage into realistic, multi-modal data formats, further unlocking vast external data sources for AV development.
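The RTPurbo abstract above ("Full Attention Strikes Back") contrasts dynamic top-p token selection with fixed top-k sparsification. The following is a minimal sketch of that contrast only, not the paper's implementation: the function names, the toy score lists, and the use of a plain softmax over raw scores are all assumptions made for illustration, and the 16-dimensional indexer described in the abstract is not modeled.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_indices(scores, k):
    """Fixed budget: always keep the k highest-scoring tokens."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])

def top_p_indices(scores, p):
    """Dynamic budget: keep the smallest set of tokens whose
    softmax mass reaches p, so the count depends on the query."""
    probs = softmax(scores)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return sorted(kept)

# Hypothetical per-token relevance scores for two different queries.
peaked = [8.0, 0.1, 0.0, 0.2, 0.1, 0.0]   # one token dominates
flat = [1.0, 1.1, 0.9, 1.0, 1.2, 1.0]     # relevance is spread out

print(len(top_k_indices(peaked, 3)), len(top_k_indices(flat, 3)))
print(len(top_p_indices(peaked, 0.9)), len(top_p_indices(flat, 0.9)))
```

With a fixed k the budget is the same for both queries, while top-p keeps a single token for the peaked query and every token for the flat one, which is the query dependence the abstract points to.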
Techmeme(15)
- Sources: Uber weighs a higher bid after it approached a major Delivery Hero shareholder with a €38-per-share bid, valuing the group at €11.5B+, but was rebuffed (Financial Times)
Financial Times : Sources: Uber weighs a higher bid after it approached a major Delivery Hero shareholder with a €38-per-share bid, valuing the group at €11.5B+, but was rebuffed — San Francisco-based group approached major shareholder in German food group — Uber's board met on Saturday …
- The ECB summons Eurozone banks to a meeting on Tuesday to discuss risks posed by the latest AI models and hopes US banks with Mythos access will share lessons (Martin Arnold/Financial Times)
Martin Arnold / Financial Times : The ECB summons Eurozone banks to a meeting on Tuesday to discuss risks posed by the latest AI models and hopes US banks with Mythos access will share lessons — Supervisor to stress seriousness of risks to financial system at hastily arranged meeting — The European Central Bank …
- A look at the scourge of smartphone thefts in London, as victims describe receiving texts threatening them into unlinking Apple IDs from stolen iPhones (New York Times)
New York Times : A look at the scourge of smartphone thefts in London, as victims describe receiving texts threatening them into unlinking Apple IDs from stolen iPhones — Tens of thousands of smartphones were reported stolen in the British capital in recent years. For some victims, losing their phone was only the beginning.
- A deep dive into Apple Watch and Health efforts; sources: iOS 27 to get a revamped AirPods control panel and default support for AirPlay rivals like Google Cast (Mark Gurman/Bloomberg)
Mark Gurman / Bloomberg : A deep dive into Apple Watch and Health efforts; sources: iOS 27 to get a revamped AirPods control panel and default support for AirPlay rivals like Google Cast — Also: iOS 27 AirPods, Genmoji and AirPlay details. — Now over a decade old, the Apple Watch is in need of a shake-up as the health and fitness wearables market pivots.
- How the AI boom is transforming global M&A, now dominated by the AI-driven race to control the world's energy, fiber networks, and computing capacity (Financial Times)
Financial Times : How the AI boom is transforming global M&A, now dominated by the AI-driven race to control the world's energy, fiber networks, and computing capacity — Deals hit record highs, unloved companies turn sexy and PE finds a new gold mine. Up until this week, NextEra Energy …
- Tether buys out SoftBank's ~26% stake in bitcoin treasury company Twenty One Capital, taking Tether's stake to ~71%; SoftBank's stake was worth ~$679M (Emily Nicolle/Bloomberg)
Emily Nicolle / Bloomberg : Tether buys out SoftBank's ~26% stake in bitcoin treasury company Twenty One Capital, taking Tether's stake to ~71%; SoftBank's stake was worth ~$679M — Tether has bought out SoftBank Group Inc.'s ownership in the digital-asset treasury company Twenty One Capital Inc. …
- Rising DRAM prices are driving "forced premiumization" in the smartphone market in India and Africa, as consumers in the sub-$200 segment risk being priced out (David Oks)
David Oks : Rising DRAM prices are driving “forced premiumization” in the smartphone market in India and Africa, as consumers in the sub-$200 segment risk being priced out — The global memory crunch and the great repricing of consumer electronics — One of the most remarkable things …
- Moment, which develops AI tools for automating fixed-income and equities trading tech, raised a $78M Series C led by Index Ventures, with a16z participating (Paige Smith/Bloomberg)
Paige Smith / Bloomberg : Moment, which develops AI tools for automating fixed-income and equities trading tech, raised a $78M Series C led by Index Ventures, with a16z participating — Moment, the financial-technology company founded by a cohort of former Citadel Securities quantitative traders and researchers …
- Hands-on with a pre-release build of Google Docs Live, an AI-powered voice tool for drafting documents, rolling out this summer to AI Pro and Ultra subscribers (Nicole Nguyen/Wall Street Journal)
Nicole Nguyen / Wall Street Journal : Hands-on with a pre-release build of Google Docs Live, an AI-powered voice tool for drafting documents, rolling out this summer to AI Pro and Ultra subscribers — An exclusive look at Docs Live, Google's new speech-powered AI project manager and writing partner — Microsoft Word brought word processing to the masses.
- DeepSeek says it will lower V4 Pro API prices by 75% to $0.435/1M input and $0.87/1M output tokens, making permanent the discount prices set to expire on May 31 (Bloomberg)
Bloomberg : DeepSeek says it will lower V4 Pro API prices by 75% to $0.435/1M input and $0.87/1M output tokens, making permanent the discount prices set to expire on May 31 — DeepSeek said it will make permanent a steep discount on its flagship V4-Pro model, maintaining prices for developers at a quarter of their original level.
- Sources: Princeton Digital Group to sell its Chinese data center assets for as much as $1B, as global buyout firms retreat from China's data center market (Financial Times)
Financial Times : Sources: Princeton Digital Group to sell its Chinese data center assets for as much as $1B, as global buyout firms retreat from China's data center market — Princeton Digital Group's sale process caps foreign retreat from the country's sensitive digital infrastructure.
- Salesforce's promotional videos showcased Agentforce mock-ups and features that are not widely available, as CEO Marc Benioff defends forward-looking marketing (Brody Ford/Bloomberg)
Brody Ford / Bloomberg : Salesforce's promotional videos showcased Agentforce mock-ups and features that are not widely available, as CEO Marc Benioff defends forward-looking marketing — Patients at the University of Chicago Medicine featured in a promotional video seamlessly refill prescriptions …
- Crypto companies prepare for the threat that quantum computers could hack core industry security, including breaking the critical code underpinning Bitcoin (Financial Times)
Financial Times : Crypto companies prepare for the threat that quantum computers could hack core industry security, including breaking the critical code underpinning Bitcoin — Threat to code that underpins bitcoin has moved ‘from theoretical to credible’, industry figures warn
- For publishers, pirated audiobooks made with AI on YouTube are a growing issue: removal is cumbersome, and some are hiring tech companies to take them down (Alexandra Alter/New York Times)
Alexandra Alter / New York Times : For publishers, pirated audiobooks made with AI on YouTube are a growing issue: removal is cumbersome, and some are hiring tech companies to take them down — Illegal, synthetically narrated copies of “The Hunger Games,” hit self-help books and everything in between are increasingly common on the platform.
- How Anthropic's ongoing discussions with the Vatican about ethics and AI led to Christopher Olah being invited to Pope Leo's unveiling of an encyclical on AI (Jack Jenkins/RNS)
Jack Jenkins / RNS : How Anthropic's ongoing discussions with the Vatican about ethics and AI led to Christopher Olah being invited to Pope Leo's unveiling of an encyclical on AI — (RNS) — Pope Leo XIV's new encyclical on AI is set to be released Monday (May 27), with Chris Olah, a co-founder of Anthropic, at his side.
Solidot(15)
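- Report says individuals bear at least 80% of the responsibility for their health in old age
The Oxford Longevity Project has published a report, "Living Longer, Better," arguing that individuals bear at least 80% of the responsibility for their health in old age. The report says people have far more control over their own lifespan than is commonly assumed. Its conclusion rests on several studies which hold that at least 75% of human lifespan is determined by environmental and modifiable lifestyle factors. One of those studies used data from nearly 500,000 UK Biobank participants and found that environmental exposures and habits affect premature death and biological aging far more than genetics does. The report recommends avoiding processed food, giving up alcohol entirely, getting enough sleep, not eating after 6:30 p.m., and cultivating what it calls a "non-meat mindset." On alcohol it is blunter still, saying alcohol is toxic and should not be drunk. Critics say the conclusions are oversimplified, since individuals have limited control over their choices when it comes to poverty, pollution, and health care.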
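- Trump administration requires green card applicants to leave the US to apply
After illegal immigration, the Trump administration is now targeting legal immigrants. The US immigration agency announced that green card applicants must leave the United States to apply. Leaving means applicants must interrupt their studies or jobs and may be refused re-entry, which in effect amounts to a form of self-deportation. It is unclear whether the new policy will follow the path of the $100,000 H-1B visa fee announced in September 2025, which was initially said to apply to all new visa applicants but was later narrowed substantially.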
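- Caltech may lose control of JPL
NASA plans to put JPL's operating contract out to open competition for the first time. JPL has been managed by Caltech since its founding in the 1930s. Its current contract with NASA expires in 2028, at which point Caltech could lose control of the lab for the first time. Caltech has been preparing for this possible transition since last summer and is not surprised by it. JPL has long been responsible for uncrewed exploration of Mars and other deep-space regions. It operates as a US Federally Funded Research and Development Center (FFRDC), which keeps it somewhat independent of NASA's other centers. If an institution other than Caltech wins the bid to operate it, the impact could be significant, because JPL and Caltech are very closely tied.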
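- Zuckerberg defends monitoring employees
The labor advocacy group More Perfect Union has released a six-minute recording of Mark Zuckerberg answering employee questions about device monitoring at the end of last month. Meta told employees last month that it would use a monitoring tool called the Model Capability Initiative (MCI) to track their mouse clicks and keystrokes, in order to collect data for training AI models. In his answer, Zuckerberg defended the monitoring. He said that if you want to train a model's coding ability, having internal employees build tools or solve tasks to teach the model how to write code can produce a leap in that ability, at a pace rivals cannot match because they do not have thousands of top engineers. He called this only one example: the systems also need to be very good at operating a computer, and the most effective way to teach that is to have them observe how extremely smart people operate computers, which he described as essentially the core of what Meta is doing. Zuckerberg said the company would not surveil employees' work behavior and that MCI data would not be used for performance reviews. Because of the EU's GDPR, Meta employees in Europe reportedly do not have to take part. Meta is not the only tech company obtaining AI training data from its employees: Microsoft and xAI also use internal staff to generate and refine training datasets.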
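- Valorant's anti-cheat tool blocks cheaters from using DMA cheats
Non-gamers may not know that today's high-end cheating tools have moved into hardware and are expensive, potentially costing far more than the entire PC. Known as DMA hardware cards or DMA cheats, they use hardware to bypass traditional game anti-cheat systems. Game developers are working on countermeasures, and the latest example is Riot Games. After its latest update, Vanguard, the kernel-level anti-cheat system used by Riot's online FPS Valorant, can force-enable the IOMMU to block DMA cheats, causing the DMA hardware to stop working. Vanguard can now block most DMA card firmware that disguises itself as a SATA or NVMe device. It abruptly triggers an IOMMU restart warning in-game, after which the DMA firmware is completely unusable, even when the game is no longer running or has been uninstalled. The only fix is to reinstall Windows. Riot Games mocked cheaters on social media, saying their $6,000 DMA cheats had turned into junk.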
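- Wozniak tells graduates they have actual intelligence
Apple co-founder Steve Wozniak managed something other commencement speakers have not: he drew cheers rather than boos from graduates when he talked about AI, telling them that they have AI, meaning "actual intelligence." He said that going into his views on AI in depth would take a long time, but that people have long been trying to create a brain, asking whether a program copied a trillion times could work like one, and that AI is one such attempt. Wozniak looked back on his time at Apple and offered advice to graduates about to start their careers, urging them to think differently, not to follow the same well-worn path as everyone else, and to ask whether they could do something different.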
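- Linus Torvalds on AI
Linux creator Linus Torvalds talked about AI at Open Source Summit North America. He thinks AI tools are reshaping kernel development but insists AI is just a good tool that will not fully replace programmers. Torvalds said the number of commits in the last two kernel releases rose by 20%. He first assumed developers were excited by the version number jumping from 6.x to 7.x, but it turned out to be because AI-assisted coding tools had improved markedly over the past six months. He acknowledged that AI tools lower the barrier for contributors, but said their real impact is social rather than technical. One example is the flood of duplicate bug reports hitting the security mailing list, and the kernel has adopted new rules in response. Torvalds also urged security researchers not to disclose exploits prematurely: four privilege-escalation vulnerabilities were recently found in the kernel, but researchers went public before maintainers had been notified, and he said these people like the attention. He does not think closed source solves security problems. Closed source is actually worse, he said, because AI cannot help you fix the bugs. Torvalds said maintenance depends on people rather than code. As the top-level maintainer, his job is not writing code but working with people. He does not use AI to work with people and advises others not to either. His own career, he said, shows how better tools raise programmer productivity: he started out entering machine code by hand, then used an assembler, then compilers, and now AI-assisted programming. He believes AI is changing programming but not its essence. Developers still need to understand what the tool generated. For any long-running system, he said, you need to understand not just the instructions but the end result, because that is the only way to maintain it over the long term. AI cannot replace human judgment, community norms, or a deep understanding of the system being built. He argued that software is very complex and that the only truly effective way to manage the complexity of complex infrastructure is open source, with AI being just one more tool in the programmer's toolbox.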
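- GitHub faces a fight for survival
Eight years after being acquired by Microsoft, GitHub, the largest code hosting platform, is facing a fight for survival, with frequent outages and security problems and growing pressure from competitors. In the past few weeks GitHub has suffered several serious outages, and 3,800 internal repositories were stolen after a malicious extension was installed in an employee's VS Code. Current and former GitHub employees described in interviews a company struggling under a lack of leadership and competitive pressure. After CEO Thomas Dohmke left in the summer of 2025, Microsoft did not appoint a new CEO and instead had the leadership team report to CoreAI, which is run by former Meta engineering chief Jay Parikh. Parikh was personally recruited by CEO Satya Nadella to help the company's shift to AI. He is unpopular inside the company, and it was he who decided not to appoint a new GitHub CEO. Many GitHub employees have left to join Dohmke's new startup, Entire. GitHub has also kept losing executives in recent months: senior vice president Jared Palmer and former chief revenue officer Elizabeth Pemmerl have both departed. Current GitHub employees say the company exists in name only and that everything now belongs to Microsoft.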
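- Sergey Brin donates $500,000 to oppose a tax on overpaid CEOs
Google co-founder Sergey Brin, who has moved from Silicon Valley to Nevada, donated $500,000 to a San Francisco political action committee to oppose a proposal known as the "Overpaid CEO Tax," which San Francisco voters will decide on June 2. He had previously donated tens of millions of dollars to oppose a California proposal to tax billionaires, which is expected to go before California voters this November. The "Overpaid CEO Tax" would calculate the ratio of executive pay to ordinary employee pay based on the compensation of a company's global workforce. The Chinese Progressive Association, which supports the proposal, says it is needed to ensure that the wealthiest corporations pay the taxes they owe.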
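- Meta restricts dissidents' accounts at Saudi Arabia's request
Starting April 30, 2026, Meta, at the request of the Saudi government, blocked within Saudi Arabia the Facebook accounts of the NGOs ALQST for Human Rights and Democratic Diwan, as well as those of Saudi researcher Abdullah Alaoudh and human rights activist Yahya Assiri. Meta also geo-blocked a scholar's account at the request of the United Arab Emirates. Since March 2026, more than 100 Facebook pages and Instagram accounts have been restricted. Saudi Arabia has also asked X to geo-block the accounts of prominent Saudi activists, and X has not yet complied.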
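- Disembodied brains are being used for drug testing
A day earlier, this brain was inside a living person. Now, hours after its owner's death, it rests quietly on a small cart. The cart is covered in tubing that pumps liters of blood substitute and other fluids into the organ, delivering oxygen and removing metabolic waste. Most of its core functions are intact, but its electrical activity has been suppressed with anesthetics, leaving the brain in a limbo between life and death. As it metabolizes experimental drugs, sensors record its responses in real time, capturing hundreds of data points on cells, proteins, and physiology. After 24 hours it will be cut into hundreds of pieces for further study. It is one of more than 700 brains that the biotech startup Bexorg has maintained and studied using its brain-support device, BrainEx, to better understand how potential therapies act in brains affected by neurodegenerative diseases such as Parkinson's, Alzheimer's, or amyotrophic lateral sclerosis. Bexorg can biopsy the brain to learn how long a drug stays in cells, whether it hits its molecular target, and whether there are side effects. Bexorg believes its system offers drug-testing conditions closer to reality than lab animals or cells in a dish. The company had kept a low profile but has recently been scaling up and invited journalists to tour its lab, seeking to reassure the public that disembodied brains do not cross ethical lines and carry no risk of regaining consciousness.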
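- Waymo pauses Atlanta service after a driverless car drives into floodwater
Because its driverless cars cannot yet handle flooded roads, Waymo has paused its robotaxi service in Atlanta. One of Waymo's driverless taxis drove onto a flooded road on Wednesday and was stranded for about an hour. The vehicle has been towed away. Waymo said it is pausing service in Atlanta while it works on a solution. Waymo had earlier paused service in San Antonio, Dallas, and Houston, Texas, because of severe weather. Waymo said the rainfall in Atlanta was so heavy that flooding had already occurred before the National Weather Service issued a flash flood warning, watch, or advisory.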
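- Phone cases may accumulate drug-resistant bacteria and PFAS
People today are almost never apart from their phones, and the skin of their hands and faces is in frequent, prolonged contact with phones and phone cases. You may have noticed that a phone case used for more than half a year at some point quietly turns yellow and sticky, and no amount of wiping restores its original clear shine. According to a study published in the Journal of Hazardous Materials, scientists have confirmed that poor hygiene habits and frequent makeup use accelerate the aging of thermoplastic polyurethane (TPU) phone cases, gradually turning them into a breeding ground where per- and polyfluoroalkyl substances (PFAS) and opportunistic pathogens accumulate together. A real-world tracking report by the user research firm Dscout found that smartphone users touch their phones 2,617 times a day on average, and heavy users more than 5,400 times. The research team recruited 30 university student volunteers for a 285-day controlled cohort study in real-world conditions. The team observed two typical groups: volunteers with good hygiene habits who rarely used cosmetics, and volunteers who used cosmetics frequently and had poor hand hygiene. PFAS accumulation on the surface of phone cases was significantly higher in the second group than in the first. In some of the more heavily contaminated samples, surface accumulation of perfluorooctanoic acid (PFOA) reached as much as 9.39 micrograms per square centimeter and perfluorooctane sulfonic acid (PFOS) as much as 0.164 micrograms per square centimeter, suggesting that everyday contact may be quietly increasing human exposure to emerging contaminants and potentially pathogenic microbes.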
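- Genetic kinship found among Europe's megalithic societies
In the Late Neolithic (about 4500 to 2800 BCE), megalithic monuments, that is, large stone structures, appeared across Europe. These constructions reflect local traditions while also hinting at far-reaching social, cultural, or ancestral ties between widely separated populations. According to a study published in Science, researchers analyzed genomic data from individuals at several widely separated megalithic sites in central Europe and found deep and persistent biological connections among them, indicating occasional population movement, intermarriage, or cultural exchange across large geographic areas. However, the central European megalithic groups lacked close genetic ties with megalithic populations in present-day Britain and northern Europe, which suggests that the megalithic tradition most likely spread through culture rather than through biological networks.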
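- Trump administration does not want Americans infected with Ebola brought home for treatment
Ebola has broken out again in the Congo, and those confirmed infected or exposed include American doctors, but last week the Trump administration refused to let them return home for treatment. Peter Stafford, a 39-year-old surgeon, tested positive on Sunday. On Wednesday, Satish Pillai, the US CDC's incident manager for the Ebola response, said Stafford had been taken to Germany and was in stable condition. His wife, Rebekah Stafford, also a doctor, was exposed to the virus but has not shown symptoms. The couple and their four children were all taken to Germany. Another doctor, Patrick LaRochelle, belongs to the same mission organization, Serge, as the Staffords. He was exposed, is currently asymptomatic, and has been taken to Prague for monitoring and treatment. His wife and children had been with him in the Congo, but the CDC determined they had not been exposed, so they have returned to the United States. According to the latest WHO figures released Wednesday, there are currently 528 suspected Ebola cases and 132 deaths.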
OrangeBot Weekly
5 Claude Code skills worth using each week — with my verdict on what’s actually good. No hype.