OrangeBot.AI Digest — 2026-06-04
89 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- Anthropic's open-source framework for AI-powered vulnerability discovery (github.com)
- Meta ships facial recognition on smart glasses (www.buchodi.com)
- The desperation of NYTimes (rozumem.xyz)
- Sagrada Família Lego set (www.lego.com)
- Retro-Tech Parenting (havenweb.org)
- When AI Builds Itself: Our progress toward recursive self-improvement (www.anthropic.com)
- Wind and solar generated more power than gas globally in April 2026 (electrek.co)
- VoidZero Is Joining Cloudflare (blog.cloudflare.com)
- Ian's Secure Shoelace Knot (www.fieggen.com)
- French-Iranian author Marjane Satrapi, author of 'Persepolis', dies at 56 (www.france24.com)
- U.S. Army Corps of Engineers Bay Model (en.wikipedia.org)
- Gaussian Point Splatting (momentsingraphics.de)
- Show HN: Uruky (EU-based Kagi alternative) now has Image Search and URL Rewrites (uruky.com)
- UK media fails to disclose defence sector links in nearly 60% of cases (aoav.org.uk)
- Learn SQL Once, Use It for 30 Years (fagnerbrack.com)
GitHub Trending(14)
- chopratejas / headroom
- NousResearch / hermes-agent
- affaan-m / ECC
- PaddlePaddle / PaddleOCR
- github / spec-kit
- NVIDIA / cosmos
- lfnovo / open-notebook
- Open-LLM-VTuber / Open-LLM-VTuber
- jwasham / coding-interview-university
- github / copilot-sdk
- aquasecurity / trivy
- openclaw / openclaw-windows-node
- reconurge / flowsint
- mvanhorn / last30days-skill
Product Hunt(15)
- Sun
Collaborative voice API for agents
- Extella.AI
Agentic platform that evolves & builds reusable systems
- ChatPilot
Bulk delete, archive & timestamp your ChatGPT conversations
- Build Club Campus
Virtual AI School: Upskill in AI and Become Great at it Fast
- Empromptu AI
Train Fine Tuned Models With AI Apps You're Already Building
- TimeTuna.com
If Calendly had gorgeous video backgrounds
- PlugTalk
Your Mac talks back when you plug things in
- Koji by Brilliant
A world-class personal tutor for every home
- AppWizzy
Rent a private VM with Codex to build production apps
- Boxes.dev
Run Claude Code and Codex in your own cloud environment
- Curata
A shared workspace for AI agents and humans.
- Mailwarm 2.0
The email warmup tool, upgraded for deliverability.
- Google Gemma 4 12B
Run multimodal AI locally with an encoder-free architecture
- Gather
Save it once, never lose it again
- Keen Code
A context-efficient CLI coding agent built by agents
Hugging Face(15)
- Audio Interaction Model
Audio is an inherently interactive modality, yet today's Large Audio Language Models (LALMs) are offline, and streaming audio models each handle only a single task such as streaming ASR or voice chatting. It is time to unify them into one online LALM: a model that, through an always-on perceive-decide-respond loop, listens to sound, environment, and instructions in real time and reacts on the fly. We formalize this regime as the Audio Interaction Model, and realize it with Audio-Interaction, a unified streaming model that retains offline task execution while adding online general audio instruction following, from dialogue to full voice chatting, deciding when to respond from the semantics of the stream. To enable this, we propose SoundFlow, a framework that instantiates the perceive-decide-respond loop end to end, from data to training to deployment, through streaming-native data construction, comprehension-aware training, and asynchronous low-latency inference for stable real-time interaction. We further construct StreamAudio-2M, a 2.6M-item streaming corpus spanning 7 fundamental abilities and 28 sub-tasks, and Proactive-Sound-Bench for evaluating proactive audio intervention. Across 8 benchmarks, Audio-Interaction preserves competitive performance on mainstream audio tasks while unlocking capabilities inaccessible to offline LALMs, including real-time ASR, streaming audio instruction following, and proactive help.
- Cosmos 3: Omnimodal World Models for Physical AI
We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI, effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License (https://openmdw.ai/license/1-1/) at https://github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3. The project website is available at https://research.nvidia.com/labs/cosmos-lab/cosmos3.
- Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories
Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent succeeds, but not which parts of the trajectory make the answer unreliable. We study span-level error localization for deep-research agents. We collect 2,790 real trajectories from two agent frameworks, three backbone models, and three benchmarks, convert raw logs into semantic spans, and annotate harmful error spans through LLM-assisted expert review. From these annotations, we build TELBench, a 1,000-instance benchmark for identifying error spans among normal exploration, failed searches, tentative hypotheses, and harmless noise. We further propose DRIFT, a claim-centric auditing framework that tracks agent claims, checks their support in trajectory evidence, and marks spans where unsupported or conflicting claims affect the answer path. Experiments across model families and auditing frameworks show that DRIFT improves span-level error localization and first-error accuracy by up to 30 percentage points. Our work provides a process-level view of reliability in deep-research agents.
- Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning
Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to reward hacking and ineffective or unsafe training outcomes. In real-world rubric-based RL, such hacking behaviors are often subtle and entangled with multiple judge biases, making them difficult to analyze, detect, and mitigate. In this paper, we introduce CHERRL, a controllable hacking environment for rubric-based RL. By injecting known biases into LaaJ, CHERRL enables stable reproduction of reward hacking, explicit observation of reward divergence, and precise identification of hacking onset. This provides a clean experimental testbed for studying the mechanisms and mitigations of reward hacking in rubric-based RL. To demonstrate its utility, we analyze different judge biases from the perspectives of discoverability and exploitability, and explore an agent-based system for automatically detecting reward hacking onset from training logs. The code and environment are publicly available at https://github.com/THUAIS-Lab/CHERRL.
- OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs
Multimodal agents in robotics, AR, and autonomous driving must reason about places and layouts from continuous egocentric streams, often using evidence outside the current view. Existing benchmarks either evaluate offline over full videos or target events rather than spatial structure. We introduce OVO-S-Bench, a fully human-annotated benchmark for streaming spatial intelligence, comprising 1,680 questions over 348 source videos. Annotation involves 12 trained annotators, each also serving as a blind cross-reviewer, across roughly 804 person-hours of multi-round quality assurance. Each question carries a query timestamp and an evidence interval, and at evaluation, the model sees only the prefix preceding the query. Questions span four levels of increasing abstraction: instantaneous egocentric perception, spatiotemporal context tracking, spatial simulation and reasoning, and allocentric mapping. Across 38 proprietary and open-source MLLMs, Gemini-3.1-Pro trails human experts by 27 points, 59.2 vs. 86.6, with allocentric mapping as the dominant bottleneck. Notably, streaming and spatially fine-tuned MLLMs underperform their own backbones. We further find that chain-of-thought reasoning amplifies spatial errors when ungrounded in the stream. By exposing these limitations, OVO-S-Bench establishes a demanding testbed for next-generation streaming spatial MLLMs.
- Qwen-Image-Flash: Beyond Objective Design
Few-step distillation has become an effective strategy for accelerating advanced visual generative models, yet prior work has largely focused on distillation objectives. In this work, we revisit few-step distillation from a complementary perspective, focusing on the training recipe that critically shapes student performance. Using Qwen-Image-2.0 as a representative case, we systematically investigate three factors in unified text-to-image generation and instruction-guided image editing distillation: data composition, teacher guidance, and task mixture. Our empirical analysis reveals several non-obvious behaviors, which motivate the development of Qwen-Image-Flash. Overall, our results suggest that effective few-step distillation requires not only carefully designed objectives, but also principled organization of the broader training pipeline.
- ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning
Large Reasoning Models (LRMs) have achieved remarkable progress thanks to Reinforcement Learning with Verifiable Rewards (RLVR) on Chain-of-Thoughts (CoTs). However, since long CoTs naturally contain trial and error and mainstream RLVR approaches choose outcome-correct CoT trajectories for memorization, the redundant explorations in long CoTs are inevitably reinforced, which results in the over-thinking issues of LRMs. Previous attempts to resolve this issue mainly give more advantage to shorter trajectories, yet their learning signals are still outcome-based and cannot reduce the memorization of redundant explorations in long CoTs. Therefore, we propose ThoughtFold, a framework that leverages fine-grained preference learning to mitigate redundant explorations for efficient reasoning. ThoughtFold employs an introspective strategy to identify redundancy within each correct trajectory, which yields a spectrum of candidate sub-trajectories. Leveraging this spectrum, we introduce a masked preference optimization objective that explicitly penalizes redundant explorations and encourages the model to directly bridge essential reasoning segments, effectively folding its reasoning chains into a more concise path. Extensive experiments show that ThoughtFold significantly enhances efficiency. It reduces the token usage of DeepSeek-R1-Distill-Qwen-7B by approximately 56% while maintaining state-of-the-art accuracy.
- M^3Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks
As multi-modal models advance towards long-form video understanding, memory emerges as a critical capability. Despite substantial efforts in developing video datasets and benchmarks, existing works primarily focus on perception and reasoning, without systematically evaluating memory: what models retain, how faithfully information is preserved, and how robust memory remains under interference. To address this gap, we introduce M^3Eval, the first comprehensive evaluation framework and benchmark for probing different memory dimensions in multi-modal models. Grounded in cognitive psychology, our design features carefully constructed tasks that isolate key aspects of memory. Leveraging M^3Eval, we conduct extensive experiments across representative multi-modal models, revealing consistent weaknesses and distinctive behaviors. We find that models struggle to maintain disentangled representations when processing parallel video streams, exhibit interference patterns differing substantially from those observed in human memory, ground memory sources more reliably in the spatial domain than the temporal domain, and demonstrate limited symbolic memory. Collectively, our benchmark provides a valuable resource for future research, while our findings highlight memory as a fundamental yet underexplored capability and offer insights for designing more effective memory mechanisms in multi-modal models. Our code and dataset are available at https://pku-value-lab.github.io/m3eval-homepage.
- Streaming Communication in Multi-Agent Reasoning
Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent agents and thus reducing latency. Surprisingly, this pipelining also improves effectiveness: because multi-step reasoning quality is non-uniform and early steps are more reliable than later ones, working with these reliable early steps instead of the full chain prevents error-prone late steps from misleading downstream agents. We formalize both advantages with the first closed-form joint analysis of stream, serial, and single protocols, deriving the effectiveness ordering, speedup upper bound, and cost ratio. Across eight reasoning benchmarks spanning mathematics, science, and code, two frontier LLMs (Claude Opus 4.6 and GPT-5.4), and three topologies (Chain, Tree, Graph), StreamMA outperforms both baselines (avg. +7.3 pp, max +22.4 pp on HMMT 2026; Claude Opus 4.6-high). Beyond these contributions, we discover a "step-level scaling law": increasing per-agent steps consistently improves both effectiveness and efficiency, a new scaling dimension orthogonal to and composable with agent-count scaling.
- Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation
We present Echo-Infinity, an autoregressive (AR) framework towards real-time infinite video generation that employs a learnable evolving memory to dynamically filter, abstract, and compress any-length history at constant cost. Existing methods mainly curate memory with predefined KV-cache schedules, fixed-ratio heuristic compression, or inference-time RoPE adaptation. These designs inevitably lose historical information and amplify compounding errors due to their limited cache window and ignorance of autoregressive generation noise. Inspired by human memory consolidation, Echo-Infinity replaces handcrafted memory curation with learnable Memory Queries, which are updated by attention and a gating mechanism when past frames are evicted from the local window. The queries are optimized end-to-end with the video diffusion transformers (DiTs), forming an evolving memory that supports arbitrary compression ratios with constant computation independent of video length. They also act as a generalizable generation prior, improving quality even when only the optimized initial state is used. We further introduce the Unified Relative RoPE Recipe, which anchors the sink frames to start from id 0 and lets the newest frame id grow at most to the DiTs' pretrained maximum temporal RoPE id throughout training and inference, freeing the model from the finite RoPE constraint and closing the train-test RoPE extrapolation gap. In long and short video generation, Echo-Infinity achieves state-of-the-art performance, and, to our knowledge, demonstrates promising 24-hour (>1.3 M frames) real-time rollouts for the first time, suggesting a practical path toward infinite video generation.
- Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems
LLM agents are rapidly evolving from coding assistants into autonomous software engineering systems. However, existing evaluation methodologies remain largely centered on static, isolated, and short-horizon benchmarks that fail to capture the dynamic complexity of real-world production workflows. As a result, benchmark performance may poorly reflect practical capability under realistic runtime environments involving long execution chains, tool interactions, dependency management, and iterative feedback loops. We thus present RAMP, a production-grounded infrastructure for assessing long-horizon software engineering agents. Built upon the YatCC integrated platform, RAMP provides a unified runtime assessment architecture through standardized orchestration and execution interfaces. RAMP introduces realistic compiler-construction workloads with serial dependencies and complex toolchain interactions, together with a staged recovery mechanism for analyzing execution behavior under partial workflow failure. The framework further incorporates utility-oriented multi-dimensional metrics that jointly evaluate outcome quality and process efficiency. We conduct runtime assessments across 15 mainstream models and observe substantial capability degradation that remains largely invisible to conventional isolated benchmarks. Task completion rates progressively collapse across serial workflows, dropping from 100% in the initial stage to only 20% in the final stage, while none of the evaluated models successfully completes the entire pipeline. Runtime analysis reveals systematic failure propagation and significant resource inefficiencies, with computational costs differing by up to three orders of magnitude among comparable models. These findings suggest RAMP advances agentic model evaluation toward continuous, runtime-observable, and production-grounded assessment.
- MemTrain: Self-Supervised Context Memory Training
Memory is an indispensable capability for long-horizon LLM agents, enabling them to preserve and utilize information accumulated across extended interactions. Existing memory-agent approaches are typically trained end-to-end with reinforcement learning on downstream tasks. However, collecting high-quality annotated problems for memory-intensive scenarios is costly, and the resulting training data often lack sufficient diversity to cover general memory behaviors. In this work, we propose MemTrain, a self-supervised training framework for generally enhancing the context-memory capability of LLM agents for more effective downstream post-training. MemTrain introduces two coupled proxy tasks over unlabeled Wikipedia corpora: (1) an end-to-end masked reconstruction objective, which requires the model to recover masked entities after multiple rounds of memory updates, thereby encouraging memory maintenance from the final outcome perspective; and (2) an intermediate memory recall objective, which requires the model to reconstruct masked historical information using intermediate memory states, encouraging faithful compression and memory completeness throughout the interaction process. The two objectives are jointly optimized using GRPO. Extensive experiments on long-text QA and search-based QA benchmarks demonstrate that MemTrain consistently improves downstream memory-intensive reasoning performance across different models, achieving gains of up to 17.67 points over direct task-specific post-training.
- Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching
Wide-baseline matching (WBM) requires integrating geometric understanding, viewpoint changes, fine-grained perception, and occlusion reasoning, making it a challenging testbed for spatial reasoning in multimodal large language models (MLLMs) deployed in physical environments. However, current MLLMs lack systematic evaluation and training frameworks for these capabilities. We introduce ReasonMatch-Bench, a benchmark stratified by viewpoint displacement and matching granularity across indoor, outdoor, and object-centric scenarios, and show that current MLLMs still struggle with fine-grained wide-baseline correspondence: on a difficult 90-sample subset, human annotators achieve 84.0 F1, while the best existing baseline reaches 37.2. To bridge this gap, we build a scalable data-generation pipeline that automatically extracts wide-baseline view pairs from large-scale video-3D corpora, including RGB-D videos and SfM reconstructions, yielding diverse and verifiable supervision. We further propose Dynamic Correspondence Reinforcement Learning (DCRL), which combines Image-Level Viewpoint Progression and Point-Level Correspondence Curriculum to improve WBM training through verifiable rewards without explicit CoT supervision. Extensive experiments show that DCRL substantially improves ReasonMatch-Bench and transfers to related spatial benchmarks, while maintaining general visual understanding performance with modest gains on several benchmarks.
- MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?
Abundant procedural knowledge on the Web holds great potential for helping agents solve long-horizon tasks. However, such knowledge is often multimodal, heterogeneous, noisy, and implicitly assumes human executors, making it difficult to use directly as the skills required by agents. To bridge the gap between human-oriented guides and agent-executable skills, we formalize this problem as guide-to-skill learning: converting in-the-wild guides into executable skills and continuously improving them from trajectories observable to the agent. To evaluate the capability of existing agents on this task, we introduce MMG2Skill-Bench, the first benchmark designed for this problem. We further propose MMG2Skill, a closed-loop framework that compiles guides into editable skills, conditions a fixed vision-language model (VLM) agent on these skills during execution, and revises the skills from trajectory-level root-cause feedback without using benchmark scores. Across GUI control, open-ended gameplay, and strategic card play with six VLM backbones, MMG2Skill consistently outperforms vanilla baseline agents in every model-domain setting, achieving macro-average gains of +12.8 to +25.3 percentage points across backbones. Ablation studies show that directly prompting agents with raw guides can degrade performance, while both structured skill construction and trajectory-driven revision are necessary for the observed improvements. On success-inferable tasks, analyzer-based early stopping further prevents late-stage performance regressions and saves 25%-53% of attempts when the success signal is properly calibrated.
- Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation
On-policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward more selective training paradigms. Recent OPD methods increasingly focus on selecting which trajectories to learn from, which tokens are most informative, and which supervision signals are most reliable. Motivated by this trend, we rethink optimization granularity of OPD and propose FiRe-OPD (Filter, then Reweight), which jointly adjusts supervision signals at both trajectory and token levels. In detail, FiRe-OPD first filters trajectories to remove low-quality rollout samples, and then applies soft reweighting within the retained trajectories to emphasize informative tokens. Compared with hard token selection, FiRe-OPD leverages a soft-weighting mechanism to effectively mitigate information loss and enhance optimization stability, thereby achieving finer-grained OPD optimization. We validate the effectiveness of FiRe-OPD across strong-to-weak, single-teacher, and multi-teacher settings, and demonstrate its superiority over recent token-level OPD methods (e.g., +6.25 on AIME 2024 in strong-to-weak, +18.81 on Miner in multi-teacher). Our code is available at https://github.com/YuYingLi0/FiRe-OPD.
Techmeme(15)
- Internal docs from lawsuits by 1,400 school districts show how social media companies targeted kids: Meta paid "teen ambassadors", Snap sent school-hour alerts (Jennifer Valentino-DeVries/New York Times)
Internal documents show how tech giants grabbed children's attention throughout the day, a strategy that schools say has undermined education.
- Poke, which lets users access AI agents via text message, becomes the first AI agent approved for Apple's Messages for Business platform (Sarah Perez/TechCrunch)
Poke, a startup that turns using AI agents into something as simple as sending a text message, has become the first AI agent approved to run on Apple's Messages for Business platform.
- Unsealed 2020 lawsuit: ex-IBM VP of threat intelligence alleges that IBM and AT&T concealed foreign cyber breaches to maintain eligibility for federal contracts (Bloomberg)
International Business Machines Corp. and AT&T Inc.'s computer systems were repeatedly breached by foreign hackers …
- Sources: Anthropic has embedded around half a dozen forward-deployed engineers within the NSA to help the agency deploy Mythos for offensive cyber operations (Financial Times)
Arrangement comes as AI lab is locked in legal battle with Pentagon over Claude model
- Sources: Brian Chesky is starting a new AI lab and considering a focus on user interaction and design; he will remain Airbnb CEO and will not be the lab's CEO (Bloomberg)
Airbnb Inc. Chief Executive Officer Brian Chesky is starting a new artificial intelligence lab, according to several people familiar …
- In a letter to Utah Senate President, Kevin O'Leary says he will cut a 40,000-acre AI data center project in Utah by roughly half, after backlash from lawmakers (John Ainger/Bloomberg)
A proposed Utah data center that would have been almost three times the size of Manhattan will be drastically scaled back after pressure from lawmakers.
- Supabase, which provides backend tools for building AI apps, raised a $500M Series F led by GIC at a $10B pre-money valuation, up from $5B in October 2025 (CJ Haddad/CNBC)
Vibe coding needs infrastructure. Supabase is trying to provide it. The startup, which makes back-end tools …
- Google now lets big creators and publishers in the US claim and customize dedicated Search profiles to aggregate their content from multiple platforms (Jay Peters/The Verge)
Big creators and publishers can claim dedicated Search profiles that let them highlight everything they do online.
- Founders Fund launches a TV-style game show featuring A-list founders and investors, including Sam Altman and Palmer Luckey, playing a game of Mafia (Tom Dotan/Newcomer)
Do people want to watch the likes of Sam Altman, Trae Stephens & Palmer Luckey compete in a party game? We'll soon find out.
- Arizona Public Service, the state's largest utility, proposes a 45% electricity-rate increase for data centers to ensure "that they are paying their fair share" (Jennifer Hiller/Wall Street Journal)
State's largest utility is proposing a 45% electricity-rate increase for data centers and a 14.5% hike for households.
- Two House lawmakers unveil bipartisan AI legislation that would override some state AI laws and require top AI developers to implement risk-management plans (Politico)
But it's the proposal to preempt state rules on AI developers that has drawn the fiercest attacks from AI safety advocates and tech critics in both parties.
- Meta's Oversight Board says Meta's account deactivations lack due process, violations are flagged without clarity, and there's little support for appeals (Sarah Perez/TechCrunch)
Meta's Oversight Board, the independent governing body that makes policy recommendations to the tech company, said Thursday …
- Cloudflare acquires VoidZero, the company behind Vite, Vitest, Rolldown, Oxc, and Vite+ frameworks, and says the projects will stay open source (Cloudflare)
VoidZero, the company behind Vite, Vitest, Rolldown, Oxc, and Vite+, is joining Cloudflare. As part of this change, all team members of VoidZero are joining Cloudflare, too.
- Anthropic details its progress toward recursive self-improvement, and its implications, and says 80%+ of the code merged into its codebase is authored by Claude (Anthropic)
Our progress toward recursive self-improvement, and its implications. For most of AI's history, humans drove every step in its development cycle.
- Coinbase and Better fund the first Fannie Mae-backed mortgage that uses bitcoin as collateral, with a nationwide rollout planned in the coming months (Yogita Khatri/The Block)
Coinbase and Better have funded the first Fannie Mae-backed mortgage using bitcoin as collateral, with a nationwide rollout planned in the coming months.
Solidot(15)
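- Mars MAVEN mission declared over after six months of lost contact
After six months of radio silence, MAVEN's mission has been officially declared over. The probe, launched in 2013, mysteriously lost contact in late December 2025 during a routine pass behind Mars. According to the last data it returned, the spacecraft had entered an abnormally fast spin, which pushed it off its orbit and drained its onboard batteries. A review board convened by NASA recently concluded that it cannot be recovered. Although it is expected to linger in orbit for another 50 to 100 years before crashing onto the Martian surface, its scientific life has ended. NASA has three orbiters at Mars: Mars Odyssey, launched in 2001; Mars Reconnaissance Orbiter (MRO), launched in 2005; and Mars Atmosphere and Volatile Evolution (MAVEN), launched in 2013. MAVEN was the shortest-serving of the three, and the other two are both nearing the end of their lives. Two European orbiters are also in Mars orbit and rovers remain on the surface, so Mars research will continue.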
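- Share of Steam users on Linux falls to 3.99%
Valve has published the Steam Hardware & Software Survey for May 2026. After the share of Steam players on Linux hit a record 5.33% in March, it has fallen for two consecutive months: 4.52% in April and 3.99% in May, a drop of 0.53 percentage points, though still double the figure from the same period last year. Windows accounts for 93.85% and OSX for 2.16%. Among the languages players use, English accounts for 39.48%, up 2.71 percentage points, and Simplified Chinese for 21.85%, down 1.56 percentage points. Intel CPUs account for 53.94% of users and AMD for 46.03%; Intel's share is slowly shrinking while AMD's is slowly growing.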
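- Microsoft creates Coreutils for Windows, a fork of Rust Coreutils
At this week's Build 2026 conference, Microsoft announced Coreutils for Windows, a fork of Rust Coreutils (uutils) maintained by the software giant. It is not a hard fork but a downstream version. Coreutils for Windows includes tools such as uutils/coreutils, findutils, and grep. Its goal is to make switching development between platforms such as Windows, WSL, macOS, and Linux more seamless: commands, flags, and pipelines are unified and work the same way, so existing scripts can be used directly without conversion. One wonders whether Steve Ballmer still remembers what he once said.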
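- Any level of drinking increases health risks
A large-scale study shows that even drinking less than one standard drink per day increases the risk of several cancers. The research team analyzed 843 cohort and case-control studies published through 2023 and systematically assessed the association between alcohol and a range of diseases. For all 10 cancers examined, drinking was associated with elevated risk, and the risk kept rising with the amount consumed. Even an intake of less than 10 grams of pure alcohol per day was associated with increased risk of pharyngeal, colorectal, esophageal, breast, liver, pancreatic, and prostate cancer. The increase was most pronounced for pharyngeal cancer, where risk can more than double. Beyond cancer, drinking is also associated with a higher risk of pancreatitis and of chronic liver diseases such as cirrhosis. The study shows that the risk of chronic liver disease rises by at least 40% and that of pancreatitis by at least 22%. The results clearly indicate that cancer risk increases with any level of alcohol intake, while the evidence that "moderate drinking is good for health" is concentrated mainly in some non-cancer diseases, and the associations there are weak.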
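- American capitalism turns to apocalypticism
Apocalypticism is the most powerful driving force of American capitalism today. Elon Musk's rocket company SpaceX publicly declares that its mission is to establish a colony on Mars so that humanity does not go extinct on Earth. Part of the reason Musk became the richest person in the United States is that he is the country's loudest apocalypticist. Musk is racing to take SpaceX public ahead of two other prophets who hold a similar millenarian worldview. Anthropic's Dario Amodei and OpenAI's Sam Altman, as well as Palantir CEO Alex Karp and Anduril founder Palmer Luckey, are all telling some kind of end-times story. An economy that believes in millenarianism is bound to be paranoid. Peter Thiel says AI will summon the Antichrist in the form of authoritarian rule. Pope Leo XIV has called for disarming AI. A new song by British pop singer Charli XCX captures the mood of both the public and the Pope, roughly: spring, summer '26 / when the world is about to end, there is no hope / yes, we are on a runway to hell.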
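- Germany's Bavaria cancels Microsoft contract and switches to open-source software
The Bavarian State Ministry for Digital Affairs in Germany has officially announced the cancellation of its contract with Microsoft, which would have cost nearly 1 billion euros over five years. Bavaria will switch to open-source software. State finance minister Albert Füracker had argued for seeking discounts on the basis of the existing contract, while digital minister Fabian Mehring pushed for open-source software. Mehring said the move to open source will keep services usable in times of crisis, protect Bavaria from price increases, and prioritize data security. Bavaria's shift is part of a broader European trend in which local and federal governments across Europe are gradually reducing their dependence on Microsoft and other US technology.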
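- EU unveils plan to reduce dependence on US tech companies
On Wednesday the EU unveiled the European Technological Sovereignty Package, which aims to strengthen technological sovereignty and reduce dependence on US tech companies. Microsoft's compliance with an order from US President Trump to shut down the account of the International Criminal Court's chief prosecutor was a wake-up call for all of Europe. The new plan aims to support European companies and requires that public services in highly sensitive areas not use services from foreign tech companies. The European Commission is asking member states to carry out a "sovereignty risk assessment" for every digital service they depend on, covering foreign control, potential access to sensitive data, and the risk of operational disruption. European Commission President Ursula von der Leyen said Europe cannot depend on others' technology to keep hospitals running, power grids stable, and services secure, and that this is about protecting its citizens, defending its interests, and making its own choices.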
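- Apple doubles MacBook Neo production capacity amid surging demand
Because demand has far exceeded expectations, Apple is doubling production capacity for its entry-level MacBook Neo from 5 million to 10 million units. The MacBook Neo has only 8GB of memory and costs $599, or $499 with the student discount. Apple CEO Tim Cook said he was very optimistic about the MacBook Neo's prospects even before its launch, but the company still underestimated consumer enthusiasm. Driven by the MacBook Neo, the number of new Mac users hit an all-time high last quarter. The Windows PC industry is also watching the whirlwind the MacBook Neo has stirred up in the entry-level market: Dell has just launched an XPS 13 laptop starting at $699 ($599 with the student discount), though 8GB of memory is only barely usable for Windows 11.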
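- Google releases Gemma 4 12B, an open-source model that runs locally on laptops
Google has released Gemma 4 12B, an open-source model that can run locally on a laptop. Gemma 4 12B has 12 billion parameters and can run locally on a laptop with 16GB of video memory. That rules out the vast majority of low-end and mid-range laptops, since only high-end models are likely to have 16GB or more of video memory. Gemma 4 is a multimodal model that handles text, images, and audio; it can understand visual content, process audio input, and perform advanced reasoning tasks, which gives it a wider range of applications. Gemma 4 12B is released under the Apache 2.0 license, which imposes few restrictions.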
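- Trump administration to dismantle ocean current observation system
Starting this month, the Trump administration will dismantle the $368 million Ocean Observatories Initiative. The initiative consists of more than 900 deep-sea instruments that monitor ocean currents, marine ecosystems, carbon uptake, heat waves, fisheries, coastal flooding, and climate change. The US National Science Foundation (NSF) said it will send ships to begin removing instruments moored off Oregon, Washington, Alaska, and North Carolina, and in the Irminger Sea between Greenland and Iceland. The Ocean Observatories Initiative began operating in 2016 and was originally planned to run for 25 years. Jim Edson, the ocean meteorologist who leads the program, calls it the world's most advanced continuously operating ocean observing system. Removing the instruments could take 15 months. Seismometers around an active volcano off Oregon will keep running until 2028. Each observatory site consists of multiple moorings. The equipment measures currents and chemical and biological conditions from the surface down to depths of thousands of feet. The instruments are hardened to withstand deep-sea pressure, corrosive seawater, and marine plants and animals that can damage electronics. Remotely operated robots and gliders around the moorings collect data and transmit it to research laboratories. The system costs $48 million a year to operate. The Trump administration has repeatedly tried to shut the program down, proposing to cut 80% of its funding in both 2025 and 2026. Congress ultimately rejected the proposal and restored the funding. Even so, NSF has pressed ahead with decommissioning the observing network.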
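- The genetic trade-off between youth and longevity
Scientists have found that the gene vgll3 is directly linked to growth, development, and reproductive success early in life, and to accelerated aging and increased cancer risk late in life. The new study provides experimental evidence for the antagonistic pleiotropy hypothesis, which holds that certain genes confer advantages early in life but have adverse effects later. The researchers worked with the African turquoise killifish, a very short-lived fish, and used CRISPR gene editing to modify the gene. The results showed that fish with the modified vgll3 gene grew faster and reached sexual maturity earlier, giving them a reproductive advantage in natural environments. The cost was a shorter lifespan and a higher chance of developing age-related cancers. The researchers note that nature prioritizes not lifespan but the continuation of the lineage. Humans also carry the vgll3 gene, so the study should also improve understanding of human development, aging, and age-related disease.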
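- Meta lets employees opt out of tracking for up to 30 minutes at a time
Meta recently began installing tracking software on US employees' computers that captures mouse movements, clicks, and keystrokes to train AI models, part of the company's larger plan to build AI agents that can carry out work tasks automatically. The tool, called the Model Capability Initiative (MCI), has triggered a strong backlash inside the company, and some employees launched a petition that has gathered more than 1,500 signatures. One anonymous employee described the company's behavior as very dystopian. According to an internal memo sent to employees on Tuesday, Meta has backed off slightly and now lets employees opt out of tracking for up to 30 minutes at a time; employees can also apply to opt out of the tracking program permanently.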
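- Mathematicians warn of AI's threat to the mathematics profession
Mathematicians have jointly published the Leiden Declaration, a manifesto backed by the International Mathematical Union, warning that AI is undermining mathematics by producing large numbers of plausible-looking but unreliable or even wrong proofs, weakening attribution, changing incentives, and giving tech companies outsized influence over research priorities. Hundreds of people have signed the declaration, which warns that the development of AI threatens the intrinsic value of mathematical research. First, the declaration notes that telling AI-generated proofs apart from correct mathematical proofs is very difficult, which puts growing pressure on reviewers; generating AI papers is cheap while verifying them is expensive, and errors compound if later research builds on false premises. Second, AI is trained on existing mathematical papers but often fails to cite them properly in its output, and copyright infringement is widespread in the training of AI models. Third, AI's incentives run counter to the values of the mathematics profession. The declaration urges mathematicians to treat AI as a tool rather than a substitute for human responsibility. Individual mathematicians should disclose their use of AI and take responsibility for the correctness of their work. The declaration also warns that mathematics can be used for war, oppression, mass surveillance, and the undermining of democracy, so mathematicians should carefully weigh the ethics of working with the tech industry.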
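- Microsoft's quantum chip has fundamental problems
Microsoft has announced its second-generation quantum chip, Majorana 2. But experts say Microsoft's quantum chips lack a solid research foundation and simply cannot work. Microsoft announced its first-generation quantum computing chip, Majorana 1, in early 2025; it uses what the company calls a topoconductor to observe and control Majorana particles, with the aim of producing more reliable and scalable qubits. The first-generation topoconductor used an indium arsenide semiconductor and an aluminum superconductor; for the second generation Microsoft switched to a lead superconductor and claims that qubit lifetime has increased from 20 seconds to 1 minute. Scientists are strongly skeptical of Microsoft's claims. Its latest preprint has not yet passed peer review, and physicist Henry Legg believes the data in the preprint come from random artifacts. Microsoft's previous preprint has still not passed peer review and has very likely been rejected by top journals.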
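- The 4,000-year-old city of Mohenjo-daro became more equal as its economy grew
Researchers at the University of York analyzed housing patterns in the ancient city of Mohenjo-daro. The city, located in present-day Pakistan, flourished between 2600 and 1900 BCE and was one of the largest cities of the Indus Valley Civilization. The researchers found that the gap between rich and poor in Mohenjo-daro was smaller than in other ancient cities, and that it even narrowed over time. The city differs markedly from the ancient cities of other civilizations: it has no palaces, no giant statues of rulers, and no lavish tombs, but it does have orderly streets and an advanced drainage system, and its public infrastructure extended across the whole city rather than serving only the elite. Ancient Egypt built pyramids for its rulers and Bronze Age Greece built palaces for its elites, while Mohenjo-daro invested in public services for the entire population. Mohenjo-daro challenges the long-standing view that economic growth leads to greater inequality: as the city developed and productivity rose, resources were also distributed more fairly.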