OrangeBot.AI Digest — 2026-06-05
90 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- Gov.uk has replaced Stripe with Dutch provider Adyen (www.theregister.com)
- Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency (blog.google)
- New method turns ocean water into drinking water, without waste (www.rochester.edu)
- pg_durable: Microsoft open sources in-database durable execution (github.com)
- Conventional Commits encourages focus on the wrong things (sumnerevans.com)
- Dutch gov't will only allow European company to operate DigiD platform (nltimes.nl)
- I tested every IP KVM in my Homelab (www.jeffgeerling.com)
- Astronauts told to return to ISS after sheltering over air leak repairs (www.bbc.com)
- Mouseless – keyboard-driven control of macOS/Linux/Windows (mouseless.click)
- Did Claude increase bugs in rsync? (alexispurslane.github.io)
- Ultra-processed foods in the global food system: The role of tobacco companies (ajph.aphapublications.org)
- Redis 8.8: New array data structure, rate limiter, performance improvements (redis.io)
- Tracing a powerful GNSS interference source over Europe (arxiv.org)
- Changing how we develop Ladybird (ladybird.org)
- C++: The Documentary (herbsutter.com)
GitHub Trending(15)
- NousResearch / hermes-agent
- chopratejas / headroom
- CopilotKit / CopilotKit
- lfnovo / open-notebook
- affaan-m / ECC
- Panniantong / Agent-Reach
- NVIDIA / cosmos
- 666ghj / MiroFish
- mvanhorn / last30days-skill
- PaddlePaddle / PaddleOCR
- openai / plugins
- MemPalace / mempalace
- withastro / flue
- openclaw / openclaw-windows-node
- aquasecurity / trivy
Product Hunt(15)
- Agent Browser Shield
Block prompt inject & cut token costs for AI browser agents
- Nemotron 3 Ultra by NVIDIA
Powers faster, efficient reasoning for long-running agents
- Leni
The world’s most accurate AI for investors
- Lumo Studios
Build Decks that Speak for Themselves
- Microsoft MAI-Voice-2
Expressive TTS with voice cloning in 15 languages
- Veltrix AI
AI finance copilot for cash flow, margins, and growth
- Agent Mode on Arena
Get real-world tasks done with autonomous AI agents
- Ideogram 4.0
Generate design-ready image with open weight, layout control
- SellerClaw
A team of AI agents that runs your stores across channels
- LocalClicky
Control your Mac with your voice locally
- Minimi
Your ambient memory for Claude
- Moodloom
Ad-free Pinterest Alternative with AI content filtering
- Recursi
Self improving vibe coding env with no API fees
- Clarafy
Type messy and have it instantly polished
- Treadmill Pro
Control your treadmill from your iPhone, wirelessly
Hugging Face(15)
- ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?
Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story progresses, not maintain a fixed persona. Existing benchmarks measure factual recall at a given chapter, not whether responses align with the character's psychological trajectory, especially in scenarios the source text never explores. We introduce ArcANE (Arc-Aware Narrative Evaluation), an automatically constructed benchmark spanning 17 novels and 80 principal characters. A Character Arc segments the narrative into phases along a psychological axis, and each probe poses the same scenario across phases, spanning both situations within the source text and situations beyond it. Across six models and six context modes, conditioning on the Character Arc tops every other context strategy on every model, and the gap is largest on scenarios outside the source text where retrieval has nothing to find. We further fine-tune open-weight models on the same data to obtain ArcANE-8B/32B, which widen the Arc advantage even more on scenarios outside the source text.
- TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration
Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on explicit user requests, which surface only the problems the user has noticed, while many other important problems coexist, hidden in plain sight, within the broader user context, with their total number unknown in advance. We frame this as the task of discovering multiple hidden problems from context, in which coexisting problems should be uncovered, grounded in supporting evidence, and paired with concrete actions. To this end, we introduce TIDE, a template-guided iterative framework with two complementary mechanisms. Specifically, motivated by the observation that single-pass prediction anchors on the most salient cases and yields generic claims, we propose iterative discovery, which surfaces a small batch of candidates per round while conditioning on what has already been found, so subsequent rounds extend coverage; and thought templates, reusable schemas distilled from previously solved cases that specify what contextual signals to attend to and how to connect them, anchoring each prediction in a recognizable problem class. We validate TIDE on two realistic settings, personal workspaces and software repositories, across four model backbones, showing substantial gains over single-shot and parallel multi-agent baselines on task coverage, identification, and resolution.
- Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution
Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA -- costly at repository scale and brittle to evolving codebases. We introduce Code2LoRA, a hypernetwork framework that generates repository-specific LoRA adapters, effectively injecting repository knowledge with zero inference-time token overhead. Code2LoRA supports two usage scenarios: Code2LoRA-Static converts a single repository snapshot into an adapter, suitable for comprehension of stable codebases; while Code2LoRA-Evo maintains an adapter backed by a GRU hidden state updated per code diff, suitable for active development of evolving codebases. To evaluate Code2LoRA against parameter-efficient fine-tuning baselines, we build RepoPeftBench, a benchmark of 604 Python repositories with two tracks: a static track with 40K training and 12K test assertion-completion tasks, and an evolution track with 215K commit-derived training and 87K commit-derived test tasks. On the static track, Code2LoRA-Static achieves 63.8% cross-repo and 66.2% in-repo exact match, matching the per-repository LoRA upper bound; on the evolution track, Code2LoRA-Evo achieves 60.3% cross-repo exact match (+5.2 pp over a single shared LoRA). Code2LoRA's code can be found at https://anonymous.4open.science/r/code2lora-6857; the model checkpoints and RepoPeftBench datasets can be found at https://huggingface.co/code2lora.
- AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints
Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. However, existing benchmarks still underexplore adaptive planning under such progressively revealed dual constraints. To address this gap, we introduce AdaPlanBench, a dynamic interactive benchmark for evaluating whether Large Language Model (LLM) agents can adaptively plan and re-plan under progressively revealed world and user constraints. AdaPlanBench is built on 307 household tasks, with a scalable constraint construction pipeline that augments each task with dual constraints. At runtime, agents interact with the environment in a multi-turn protocol where hidden constraints are revealed only when the agent proposes a plan that violates them, requiring iterative plan revision under accumulating feedback. This makes planning challenging, as agents must infer and track constraints from feedback while re-planning effectively. Experiments on ten leading LLMs show that adaptive planning under dual constraints remains challenging, with the best model reaching only 67.75% accuracy. We further observe that performance degrades as more constraints accumulate, with user constraints posing a particularly large challenge and failures often stemming from weaker physical grounding and reduced effectiveness. These results establish AdaPlanBench as a testbed for dual-constrained interactive planning and highlight the challenge of reliable adaptation to dynamically revealed constraints in LLM agents.
- VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding
We introduce VideoKR, the first large-scale training corpus specifically designed to strengthen knowledge- and reasoning-intensive video understanding. It comprises 315K video reasoning examples over 145K newly collected, CC-licensed, expert-domain videos. We develop a human-in-the-loop, skill-oriented example generation pipeline that targets progressively deeper video reasoning capabilities while ensuring the difficulty, diversity, and reliability of both the examples and their CoT rationales. We also curate VideoKR-Eval, a new expert-annotated benchmark where questions require genuine video understanding and knowledge-intensive reasoning rather than textual shortcuts. Our experiments show that, under a standard SFT→GRPO pipeline, models post-trained on VideoKR outperform prior post-training approaches on knowledge-intensive video reasoning while remaining competitive on general video reasoning, highlighting data design as a key driver of progress in video reasoning. We further conduct comprehensive ablations to isolate the contributions of VideoKR, providing actionable insights for future work.
- Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to unseen language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite the lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, leading to better translations on completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks like math and coding to serve as a recipe for language learning from context.
- RobotValues: Evaluating Household Robots When Human Values Conflict
While household robots are often evaluated based on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize values other than task success, such as human autonomy, efficiency, or social appropriateness. Yet, there are no benchmarks for evaluating robots' value preferences in such scenarios. We introduce RobotValues, a benchmark to evaluate household robot planners in 10K value-conflict scenarios. Each instance consists of a realistic household image with multiple plausible robot actions that prioritize different human values. We construct RobotValues through LLM-assisted scenario generation, stakeholder-grounded value extraction, image generation and automatic quality control. Using RobotValues, we evaluate VLMs used in robotics and find that models exhibit default value preferences, including safety and accommodation, while underselecting privacy-prioritizing actions. When the models are instructed to prioritize specific values that conflict with their own preferences, they often fail to override their default actions, choosing incorrect actions 80% of the time. These findings suggest that household robot evaluation should measure not only task completion or safety compliance, but also whether robots can choose among plausible actions when human values conflict.
- Personal AI Agent for Camera Roll VQA
We study the personal camera roll visual question answering setting. In this setting, a conversational AI assistant can access a user's personal camera roll and retrieve relevant photos to answer queries, ranging from simple factual questions (e.g., "Name of the food I tried yesterday?") to more open-ended ones (e.g., "Recommend some dishes I have never eaten before"). Given the vast nature of the personal camera roll (i.e., multiple years, hundreds to thousands of photos), a successful AI assistant needs to understand a long-horizon, highly personalized visual content stream in order to navigate and locate the correct and/or relevant information. To support this, we collect and manually annotate questions that mimic real-world usage. The final dataset, camroll, contains 50 users, 31,476 images, and 2,500 QA pairs. We further design camroll-agent, a conversational AI agent equipped with hierarchical memory and a minimal set of tools for efficient navigation over large, personalized visual memory. Experimental results show that camroll-agent outperforms numerous baselines and existing methods for long-context understanding in AI agent systems. Together, the camroll dataset and camroll-agent highlight the gap in AI agents' long-context reasoning: personalized visual memory requires different approaches from standard long-context textual memory, especially when consistency, visual details, and user-specific context are present.
- LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing
Developing unified video generation and editing models capable of interpreting interleaved multimodal inputs is a promising yet challenging frontier. Existing unified frameworks predominantly rely on massive models (typically 13B parameters or more) and incorporate source video conditions for editing by concatenating sequence tokens. This concatenation inevitably doubles the sequence length, quadrupling the computational complexity of the self-attention mechanism and introducing prohibitive overhead. To address these bottlenecks, we present LoomVideo, a highly efficient 5B-parameter unified architecture for both video generation and editing. LoomVideo replaces the standard text encoder with a Multimodal Large Language Model (MLLM) and employs a Deepstack injection mechanism to align multi-layer MLLM features with the Diffusion Transformer (DiT). Crucially, we introduce a zero-overhead Scale-and-Add conditioning approach for video editing. By scaling and directly adding the clean source video latent to the noised target latent, this elegant design eliminates the need for token concatenation, drastically reducing computational cost while maintaining robust capabilities for complex, non-rigid edits. Furthermore, a Negative Temporal RoPE strategy is seamlessly integrated to handle multiple reference images. Extensive experiments demonstrate that our compact 5B model achieves state-of-the-art or highly competitive performance across comprehensive benchmarks, exhibiting exceptional superiority in e-commerce and fashion generation scenarios. Benefiting from the zero-overhead conditioning mechanism, LoomVideo achieves at least a 5.41x acceleration in inference speed compared to models of similar capabilities, paving the way for highly practical and efficient video foundation models.
- Complexity-Balanced Diffusion Splitting
Standard continuous-time generative models rely on monolithic architectures that must navigate vastly different signal regimes, from isotropic noise to intricate data distributions. While scaling model capacity improves performance, deploying a massive network uniformly across the entire generative timeline is inherently inefficient. In this work, we propose Complexity-Balanced Splitting (CBS), a principled framework for temporal capacity allocation that distributes the generative workload across multiple specialized sub-networks. Grounded in function approximation theory and de Boor's equidistribution principle, CBS partitions the diffusion timeline into segments of equal approximation burden, allocating more representational capacity to regions where the generative dynamics are more difficult to model. To estimate this local complexity, we introduce two complementary and tractable monitor functions: a spatial measure based on the flow's Dirichlet energy, and a geometric measure based on the acceleration of the sampling trajectories. Using a lightweight auxiliary model to estimate these complexity profiles, our approach eliminates the need for heuristic temporal splits or computationally expensive search procedures. Extensive evaluation across multiple architectures (SiT, JiT, and UNet) and datasets demonstrates that CBS consistently improves synthesis quality without increasing per-step inference cost. In particular, CBS improves FID by ~35% on SiT-XL with CFG relative to naive temporal partitioning. Project page is available at https://noamissachar.github.io/CBS/.
- Rethinking Continual Experience Internalization for Self-Evolving LLM Agents
Experience internalization converts contextual experience from past interactions into reusable parametric capability, offering a promising path toward continual learning in large language models (LLMs). While prior work has predominantly focused on single-iteration transfer, we discover that under multi-iteration experience learning, existing methods suffer from a progressive capability collapse rather than compounding improvement. We systematically examine this failure through three vital dimensions of experience internalization: (1) Experience Granularity: We find that principle-level experience is more durable than instance-level experience, as it effectively abstracts transferable strategies away from trajectory-specific details. (2) Experience Injection Pattern: Our analysis reveals that step-wise injection significantly outperforms global injection by aligning experience with intermediate decision states, a property that is critical for long-horizon tool use. (3) Internalization Regime: We demonstrate that off-policy context-distillation on high-quality teacher trajectories provides a substantially more stable training signal than on-policy context-distillation, which is inherently limited by local corrections on student-induced flawed states. Together, these insights yield a simple yet robust recipe for stable and sustainable experience internalization, providing concrete guidance for engineering self-evolving and continually learning LLMs.
- Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?
Video generation models have made impressive strides in synthesizing visually compelling content, yet their outputs remain confined to the virtual domain. A natural question follows: how well do these models reflect the physical world when their generated videos leave the screen and enter reality? We propose robotic manipulation as a concrete, measurable window onto this question: if a model has truly internalized physical laws, the motion it depicts should translate into executable robot behavior. We introduce Dream.exe, an evaluation framework that operationalizes this criterion through a video-to-execution pipeline. Given a scene image and a task description, Dream.exe synthesizes a manipulation video, converts the generated motion into robot trajectories, and executes them in a physics simulator, yielding a grounding signal that purely visual metrics cannot offer. Using this pipeline, we evaluate 8 models spanning frontier closed-source generators, open-source generators, and robot-specific models. Our benchmark covers 101 manually curated manipulation tasks at three levels of physical complexity, measured across visual quality, trajectory fidelity, and execution success. Encouragingly, several models achieve measurable execution success, suggesting that generative priors learned from internet-scale data already encode meaningful physical knowledge. Yet visual quality proves a poor predictor of executability, exposing a dimension of model capability that standard visual evaluations do not capture. Dream.exe will be open-sourced at https://github.com/showlab/Dream.exe.
- The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset
Existing autonomous driving datasets have enabled major progress, but fall short in sensor fidelity, map completeness, or geographic diversity. We present KITScenes Multimodal, a European dataset built around high-fidelity sensors and maps. Our fully synchronized sensor suite combines high-resolution global-shutter cameras, long-range lidar beyond 400m, 4D imaging radar, and redundant GNSS/INS localization. Our HD maps are, to our knowledge, the most complete of any sensor dataset, validated through autonomous driving trials on open-source software. For the first time in a public dataset, all driving-relevant traffic elements, such as traffic lights, are mapped in 3D to a reprojection-accurate level with full topological connectivity. Recorded in cities with irregular street layouts and mixed traffic modes, our dataset complements existing datasets by broadening the available geographic diversity. We also introduce four benchmarks, each advancing spatial learning for embodied AI: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving. Project page: https://kitscenes.com/
- Unsupervised Skill Discovery for Agentic Data Analysis
Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reusable procedural knowledge without updating model parameters. However, discovering effective skills for data analysis remains challenging, as reliable supervision is expensive and success criteria vary across analytical formats. This raises the key question of how to discover reusable data-analysis skills from unlabeled exploration alone. We propose DataCOPE, an unsupervised verifier-guided skill discovery framework for data-analytic agents. DataCOPE derives verifier signals from the exploration trajectories and uses them to characterize relative quality or agreement among trajectories. It iteratively coordinates a Data-Analytic Agent for trajectory generation, an Unsupervised Verifier for signal extraction, and a Skill Manager for contrastive skill distillation. For report-style analysis, we instantiate the verifier as an Adaptive Checklist Verifier that derives task-specific criteria, scores reports by verifiable coverage, and iteratively refines the checklist. For reasoning-style analysis, we instantiate it as an Answer Agreement Verifier that groups trajectories by answer agreement and uses self-consistency as an auxiliary signal. We evaluate DataCOPE on report-style analysis from Deep Data Research and reasoning-style analysis from DABStep. Across both settings, DataCOPE consistently improves held-out performance over baselines. Averaged across four model settings, DataCOPE improves the mean score by 9.71% and 32.30% on report-style and reasoning-style tasks respectively.
- LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs
Large language models can reproduce training data, but existing memorization evaluations mostly measure whether models can be forced to do so, rather than whether they do so under ordinary use. We introduce PropMe, a propensity-aware framework for memorization evaluation that contrasts prefix-based capability attacks with non-adversarial evaluations. We propose a metric transformation that, when applied to existing functions, yields propensity metrics. We further introduce SimpleTrace, a lightweight tracing pipeline built on infini-gram that deterministically attributes model generations to large-scale training corpora and computes verbatim, near-verbatim, and propensity-transformed memorization metrics. Evaluating two fully open models (Comma and DFM Decoder) on two datasets (Common Pile and Dynaword) in two languages, we find a consistent gap between capability and propensity: prefix attacks elicit substantially stronger memorization signals than generic or dataset-specific prompts, while propensity scores remain low overall. Thus, the models can reveal training data when directly elicited, but rarely do so in more common non-adversarial settings. We also find that DFM Decoder, which is continually pre-trained from Comma, exhibits reduced memorization and memorization propensity for Common Pile, confirming that memorization capability can decrease when later training emphasizes partially different data. Our results suggest, and we encourage, that memorization audits should report both worst-case extractability and ordinary leakage propensity in order to have a more comprehensive view of this phenomenon.
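The LoomVideo entry above says that concatenating source-video tokens doubles the sequence length and quadruples self-attention cost, while Scale-and-Add avoids this. The arithmetic can be sketched as below. This is an illustration only, not the paper's implementation: the function names, the scalar stand-ins for latent tokens, and the scale value 0.5 are all assumptions.

```python
# Toy illustration: scalars stand in for latent tokens (assumption).

def attention_pair_count(seq_len: int) -> int:
    """Query-key pairs scored by one self-attention head: L * L."""
    return seq_len * seq_len


def concat_condition(target: list[float], source: list[float]) -> list[float]:
    """Condition by appending source tokens; the sequence length doubles."""
    return target + source


def scale_and_add(target: list[float], source: list[float],
                  scale: float = 0.5) -> list[float]:
    """Condition by adding a scaled source to the target, token by token.

    The sequence length is unchanged. `scale` is a hypothetical value.
    """
    return [t + scale * s for t, s in zip(target, source)]


target = [0.1, -0.3, 0.7, 0.2]   # stand-in for the noised target latent
source = [1.0, 0.5, -0.5, 0.0]   # stand-in for the clean source latent

concat = concat_condition(target, source)
added = scale_and_add(target, source)

base_cost = attention_pair_count(len(target))
concat_cost = attention_pair_count(len(concat))
added_cost = attention_pair_count(len(added))

print(base_cost, concat_cost, added_cost)  # 16 64 16
```

With 4 target tokens, concatenation gives 8 tokens and 64 pairs instead of 16, a 4x increase; the additive variant stays at 16.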
Techmeme(15)
- Marvell and Flex, a contract manufacturer for electronics, will join the S&P 500; MRVL jumps 6%+ after hours after closing down 16.74% amid a broader sell-off (Kif Leswing/CNBC)
Marvell Technology, which makes parts and products needed for the AI infrastructure boom, is joining the S&P 500
- Source: OpenAI and White House are discussing a government stake in the company, to seed something like the "Public Wealth Fund" that OpenAI outlined earlier (CNBC)
OpenAI CEO Sam Altman and the White House are in ongoing talks about a possible government stake in the artificial intelligence company, CNBC confirmed on Friday.
- Sources: Apollo and Blackstone finalized a $35B package for Anthropic to lease TPUs; Broadcom is backstopping payments on the debt's largest senior portions (Bloomberg)
Apollo Global Management Inc. and Blackstone Inc. have finalized a $35 billion financing package for Anthropic PBC to expand its AI infrastructure …
- Trump signs a national security memorandum seeking to "accelerate the use of AI across intelligence and warfighting domains in line with American values" (Reuters)
The White House said on Friday it would accelerate the development and use of AI for national security applications …
- US-traded chipmakers plunged on Friday, after Broadcom missed expectations; Nvidia fell 6.19%, Micron fell 13.25%, AMD fell 10.86%, and Broadcom 7.92% (Noel Randewich/Reuters)
U.S.-traded chipmakers plunged on Friday, losing over $1 trillion in market value, with deep losses in AI heavy hitters including Nvidia …
- Bitcoin falls below $60K, its lowest level since October 2024, amid a record streak of bitcoin ETF outflows following Strategy's bitcoin sale (CNBC)
Bitcoin extended its losses on Friday, dropping to October 2024 lows to cap an already bruising week for crypto investors.
- Sources: AI coding startup Lovable is in talks to raise funding at a $12B valuation, up from $6.6B in December 2025 (Rashi Shrivastava/Forbes)
The less than two-year-old startup crossed $400 million in annual recurring revenue earlier this year. The new fundraise would almost double its valuation.
- Filing: Google has agreed to pay SpaceX $920M per month for access to Nvidia chips as part of a cloud-services deal that runs through mid-2029 (Bloomberg)
Alphabet Inc.'s Google has agreed to pay Elon Musk's SpaceX $920 million a month for computing power as part of a cloud services deal that runs through mid-2029 …
- Sources: Revolut is looking to run a secondary share sale that would value it at $115B, after receiving a UK bank license and applying for a charter in the US (Bloomberg)
Revolut Ltd. is looking to run a secondary share sale that would value the digital bank at $115 billion, on the heels of receiving …
- President Trump says he is weighing proposals for US government to hold equity stakes in leading AI labs, and will soon discuss the idea with their executives (Bloomberg)
President Donald Trump expressed interest in the US government holding equity stakes in leading artificial intelligence developers …
- Sources: Meta explores a stock offering to raise tens of billions to fund AI capital expenditures, following Google's record $85B share deal; META drops 5%+ (Financial Times)
Facebook parent could sell tens of billions of dollars in new stock as it seeks to finance AI infrastructure
- Sources say xAI used Claude models for distillation and training, including using personal accounts and the intermediary service Blackbox AI after being cut off (Grace Kay/The Information)
SpaceX's AI lab has had a chaotic year so far, churning through leaders and staff before bringing in outside help in an effort to catch up to Anthropic in coding.
- The largest US banks plan to launch a tokenized deposit network in 2027 to connect traditional payment rails with the infrastructure that digital assets run on (Wall Street Journal)
The new network could help banks contend with a wave of new competition from stablecoins and crypto firms
- Over 20 EU news publishers file a claim seeking €640M+ from Google following an EU decision that lets anyone harmed by Google's ad market abuse seek damages (Charlotte Tobitt/Press Gazette)
Small publishers enabled to take on Google for damages via group claim. More than 20 European news publishers …
- The NY State legislature passes a one-year moratorium on new large data centers, the first statewide ban of its kind if Governor Kathy Hochul signs it into law (Lauren Feiner/The Verge)
Lauren Feiner / The Verge : The NY State legislature passes a one-year moratorium on new large data centers, the first statewide ban of its kind if Governor Kathy Hochul signs it into law — It could become the first ban of its kind in a state. … The New York State legislature passed a one-year moratorium …
Solidot(15)
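- ISS astronauts told to prepare for emergency evacuation over air leak
After the air leak in the Russian segment of the International Space Station grew over the past few days from one pound of air per day to two pounds (0.9 kg), NASA ordered astronauts aboard the station to stay inside their spacecraft and prepare for an emergency evacuation. The four astronauts of NASA's Crew-12 mission (two Americans, one French astronaut, and one Russian cosmonaut) received the order from NASA mission control at 9:04 a.m. Eastern Time on Friday to enter the Crew Dragon docked to the station and put on their spacesuits in case the leak required an emergency evacuation. The leak is in the PrK module, located between the Progress airlock and the Zvezda service module, and is caused by tiny structural cracks. In recent months NASA and the Russian space agency have been discussing the cause of the leak and possible repair options.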
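- Brave sells a stripped-down version for $60
Over the past few years the Brave browser has accumulated unpopular features such as a cryptocurrency wallet, an AI assistant, a news feed, and a rewards program. In response to user complaints about bloat, Brave has launched a stripped-down browser, Brave Origin. It is free on Linux but paid on other platforms, and not cheap. Brave Origin removes Brave Rewards, the wallet, Leo AI, the news feed, Talk, VPN, Tor, and other features, while keeping Brave Shields, the built-in ad and tracker blocker. A one-time license costs $59.99 and covers up to 10 devices. Whether it is worth $60 is up to the user.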
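- The processing of ultra-processed foods may itself be linked to health risks
A growing body of research links ultra-processed foods to heart disease, diabetes, and premature death. But scientists still debate what drives the health risk: the nutritional quality of the food itself, or the industrial processing and additives used in production. According to a study in the American Journal of Public Health, processing itself may play an important role. Processing alters the cellular structure of food, strips out beneficial compounds, and introduces additives as well as compounds from packaging. An analysis of 20 years of US health and nutrition data found that health indicators worsened with every 10% increase in calories from ultra-processed foods. People who ate ultra-processed foods had higher body weight, poorer blood sugar control, higher blood pressure, and worse cholesterol levels. They were more likely to develop diabetes, metabolic syndrome, and cancer, and had a higher risk of death during the study period. The association persisted after accounting for the nutritional quality of the ultra-processed foods and their saturated fat, added sugar, and sodium content.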
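- Bumblebees can use tools to solve problems
According to a study published in Science, bumblebees can use tools to solve problems. Insects thus join the ranks of animals able to solve the "box-and-banana" problem, showing basic intelligence. In the box-and-banana problem, chimpanzees stack boxes to reach bananas that were previously out of reach. In the new study, researchers adapted the problem for bumblebees: a bee had to roll a polystyrene ball to a specific spot and then climb onto it to reach an artificial flower on a low ceiling. The bumblebees in the experiment were only a few weeks old, and the researchers trained them to associate artificial flowers with a sugar-water reward. In the basic test, 75% of the bumblebees reached the flower; in a more complex test, 23 of 30 succeeded. The researchers note that even with very small brains, insects can flexibly solve a variety of new problems.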
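- Bots now make more HTTP requests than humans
According to Cloudflare statistics, bot traffic measured by HTTP requests has overtaken human traffic, though noisy data makes it unclear exactly when the crossover happened. Bots currently account for 57.5% of traffic and humans for 42.5%. The bots Cloudflare counts are AI agents that browse the web on behalf of people: reading product pages, checking prices, carrying out multi-step tasks such as comparing flights, crawling and indexing web content (for large AI models rather than search engines), and acting as personal assistants that compare prices for food orders, shop, and handle customer service. Measured by total time spent using apps, streaming media, and scrolling infinite feeds, human users are still the dominant group. By country or region, Gibraltar has the highest share of bot traffic (92.1%), followed by Singapore (76.4%) and Iran (76.4%); Iran's figure may reflect heavy VPN use.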
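- Apple says the App Store ecosystem has passed $1.4 trillion
Apple announced that the global App Store ecosystem facilitated more than $1.4 trillion in developer billings and sales in 2025. More than 90% of those billings and sales went entirely to developers, with no commission paid to Apple. Apple does not disclose App Store revenue separately and instead folds it into its Services business. Services contributed $109.1 billion in fiscal 2025, about 26% of Apple's total revenue of $416.1 billion. The iPhone business brought in the most revenue, at $209.5 billion. According to an analysis by Analysis Group, of the $1.4 trillion, $149 billion came from digital goods and services and $1.1 trillion from physical goods and services. China contributed the largest sales at $562 billion, followed by the United States at $453 billion, Europe at $184 billion, and Japan at $52 billion.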
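- Google seeks to release tens of millions of sterile male mosquitoes in California and Florida
Debug, a company under the Google umbrella, is seeking government permission to release 32 million male mosquitoes in California and Florida. The males carry Wolbachia bacteria, which cause cytoplasmic incompatibility, meaning the males' sperm cannot fertilize the eggs of wild females. In theory this shrinks the mosquito population generation by generation. Male mosquitoes do not bite people; only females do, so Debug would not be releasing large numbers of blood-sucking insects. Debug is awaiting approval from the US Environmental Protection Agency, and the public comment period closes on June 5. Comments submitted so far show that many people hold conspiracy-theory views, declaring that "people are not lab rats."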
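- Japan plans to rebuild 2 to 5 nuclear reactors by 2049
The Japanese government plans to rebuild 2 to 5 nuclear reactors already slated for decommissioning by 2049, rising to 11 to 14 by 2059. The backdrop is the expected growth in electricity demand driven by the spread of AI. Japan's national nuclear policy has shifted from reducing dependence, the stance adopted after the 2011 accident at Tokyo Electric Power's Fukushima Daiichi plant, to maximum utilization. The Basic Energy Plan revised in 2025 sets a target of nuclear power supplying 20% of the domestic power mix in fiscal 2040. Nuclear plants may operate for at most 60 years, and some Japanese reactors have already run for more than 50. Restarting existing reactors alone can no longer meet the target, so rebuilding or new construction is needed. Currently 24 reactors at 11 nuclear plants in Japan are being decommissioned. Among them, Kansai Electric Power's Mihama plant (Fukui Prefecture) and Kyushu Electric Power's Sendai plant (Kagoshima Prefecture) are seen as leading candidates for rebuilding.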
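- rsync project in dispute over AI-assisted programming
A recent release of rsync, the widely used backup tool, caused incremental backups to fail for some users. Users inspecting the code found that maintainer Andrew Tridgell has recently relied heavily on AI-assisted programming: dozens of commits in the project are authored by tridge and claude, where tridge is Andrew Tridgell and claude is Claude, Anthropic's AI assistant. The discovery immediately sparked a controversy over AI-generated code. Tridgell then responded on his personal blog, acknowledging his recent heavy use of AI for programming. He rejected the criticism, saying critics were jumping to conclusions without understanding how the AI tools were actually being used. He said he designs the framework and manually reviews AI-generated code, handing only the tedious work to the AI, and described himself as a software engineer with 40 years of experience. Tridgell said he will keep using AI tools.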
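- Apple introduces age verification in Texas
Starting Thursday, June 4, Apple is introducing age verification in Texas to comply with the state's App Store Accountability Act (SB 2420). A judge blocked the law from taking effect last December, but an appeals court overturned that ruling. Apple has long tried to avoid verifying ages in its App Store, but it has announced plans to implement age verification to comply with laws in Utah, Louisiana, Brazil, Australia, Singapore, and the United Kingdom. Google is also required to make similar changes to the Play Store. Texas users creating a new Apple account must verify that they are at least 18 using a credit card or government-issued ID. Apple may also verify a user's age automatically based on signals such as how long the account has existed and whether a credit card is linked.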
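- AI is not conscious
Writing in The Atlantic, the well-known science fiction author Ted Chiang argues that AI is not conscious and is merely playing a role-playing game. Anthropic is regarded as an AI giant, but what it may really excel at is anthropomorphization. That large models can generate fluent text does not mean they are conscious, although companies selling them have long encouraged that misconception. Every word a model outputs is generated in exactly the same way. Deepfakes usually refer to photos, audio, and video, but when discussing consciousness we also need to treat text as a deepfake medium. The main difference between a deepfake photo and a conversation with a large model is that the former deliberately deceives others, while the latter is more a matter of self-deception. Chiang argues that consciousness requires subjective experience, and that large models' lack of it has no bearing on whether they can be useful tools or have significant economic impact. Their inherent detachment from reality and their probabilistic nature mean they will never match the reliability of traditional software, though they may be good enough to change how work is done in some fields.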
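- Mars MAVEN mission declared over after six months out of contact
After six months of radio silence, the MAVEN mission has been formally declared over. The probe, launched in 2013, mysteriously lost contact in late December 2025 during a routine pass behind Mars. According to the last data it returned, the spacecraft had entered an abnormal rapid spin, which knocked it off its orbit and drained its onboard batteries. A review board convened by NASA recently concluded that it cannot be recovered. It is expected to linger in orbit for another 50 to 100 years before crashing onto the Martian surface, but its scientific life is over. NASA's three Mars orbiters are Mars Odyssey, launched in 2001; Mars Reconnaissance Orbiter (MRO), launched in 2005; and Mars Atmosphere and Volatile Evolution (MAVEN), launched in 2013. MAVEN was the youngest of the three, and the other two are both nearing the end of their lives. Two European probes also orbit Mars and rovers remain on the surface, so Mars research will continue.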
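- Linux share among Steam users falls to 3.99%
Valve has published the Steam Hardware & Software Survey for May 2026. After Linux use among Steam players hit a record 5.33% in March, its share fell for two consecutive months: 4.52% in April and 3.99% in May, down 0.53 percentage points but still double the figure from a year earlier. Windows accounts for 93.85% and OSX for 2.16%. Among languages, English stands at 39.48%, up 2.71 points, and Simplified Chinese at 21.85%, down 1.56 points. Intel CPUs account for 53.94% of users and AMD for 46.06%; Intel's share is slowly shrinking while AMD's slowly grows.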
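- Microsoft creates Coreutils for Windows, a fork of Rust Coreutils
At this week's Build 2026 conference, Microsoft announced Coreutils for Windows, a fork of Rust Coreutils (uutils) maintained by the software giant. It is not a hard fork but a downstream version. Coreutils for Windows includes tools from uutils/coreutils, findutils, and grep. Its goal is to make switching development between platforms such as Windows, WSL, macOS, and Linux more seamless: commands, flags, and pipelines are unified and behave the same way, so existing scripts run as-is without conversion. One wonders whether Steve Ballmer still remembers what he once said.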
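- Any amount of drinking raises health risks
A large-scale study shows that even less than one standard drink per day increases the risk of several cancers. The research team analyzed 843 cohort and case-control studies published through 2023 and systematically assessed the association between alcohol and a range of diseases. For all 10 cancers examined, drinking was associated with elevated risk, and the risk kept rising with the amount consumed. Even a daily intake of less than 10 grams of pure alcohol was associated with increased risk of pharyngeal, colorectal, esophageal, breast, liver, pancreatic, and prostate cancer. The increase was largest for pharyngeal cancer, where risk more than doubled. Beyond cancer, drinking was also linked to higher risk of chronic liver diseases such as cirrhosis and of pancreatitis. The study found that chronic liver disease risk rose by at least 40% and pancreatitis risk by at least 22%. The results clearly indicate that cancer risk increases with any level of alcohol intake, while evidence for the claim that "moderate drinking is good for health" is concentrated in some non-cancer diseases and the associations are weak.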