OrangeBot.AI Digest — 2026-05-20
90 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- GitHub confirms breach of 3,800 repos via malicious VSCode extension (www.bleepingcomputer.com)
- An OpenAI model has disproved a central conjecture in discrete geometry (openai.com)
- How fast is N tokens per second really? (mikeveerman.github.io)
- Apparently Google hates us now (twitter.com)
- Tennessee man jailed 37 days for Trump meme wins settlement after lawsuit (www.fire.org)
- Goodbye Visa and Mastercard: 130M Europeans switching to sovereign payment (www.lesnumeriques.com)
- Meta blocks human rights accounts from reaching audiences in Saudi Arabia, UAE (www.alqst.org)
- Saying Goodbye to Asm.js (spidermonkey.dev)
- Google's AI is being manipulated. The search giant is quietly fighting back (www.bbc.com)
- College students drown out AI-praising commencement speeches with boos (www.tomshardware.com)
- Qwen3.7-Max: The Agent Frontier (qwen.ai)
- Map of Metal (mapofmetal.com)
- Incident Report: May 19, 2026 – GCP Account Suspension (blog.railway.com)
- Japan is gripped by mass allergies. A 1950s project is to blame (www.bbc.com)
- Everything in C is undefined behavior (blog.habets.se)
GitHub Trending(15)
- colbymchenry / codegraph
- Imbad0202 / academic-research-skills
- tinyhumansai / openhuman
- multica-ai / andrej-karpathy-skills
- rohitg00 / ai-engineering-from-scratch
- HKUDS / CLI-Anything
- can1357 / oh-my-pi
- obra / superpowers
- anthropics / claude-plugins-official
- msitarzewski / agency-agents
- rmyndharis / OpenWA
- truelockmc / streambert
- opentoonz / opentoonz
- zakirullin / files.md
- rohitg00 / agentmemory
Product Hunt(15)
- Chromtuner
A chromatic tuner for macOS. ±1¢ accuracy
- Insta360 Mic Pro
Pro audio with a customizable color E-Ink face
- Tophat by Shopify
Test mobile CI builds on any device without building locally
- Supercut for Agents
Permission-aware AI access to recordings and metadata
- Retina
Screen recorder w/ auto-zoom, smooth cursors, + AI graphics
- Skilled
Dashboard to find agent skills you no longer need
- LayerProof Kraft
Co-write insightful long form content
- Re_gent
Version Control for AI agent Activity
- Contextberg
Turn your work into AI agent memory, served over MCP
- StoreClaw
Grow your store profits with agents that know how to sell
- Glia
Local-first AI memory bridge between browser chats and IDEs
- GhostSnap
Multiple screenshots - Single paste - Auto compressed for AI
- Owlish
Reduce support volume with AI agents trained on your docs
- Viberia
Command AI agents like you're playing Civilization
- Manus Scheduled Tasks 2.0
Run recurring Manus work inside the same task context
Hugging Face(15)
- When Vision Speaks for Sound
Despite rapid progress in video-capable MLLMs, we find that their apparent audio understanding in videos is often vision-driven: models rely on visual cues to infer or hallucinate acoustic information, rather than verifying the audio stream. This issue appears across both state-of-the-art open-source omni models and leading closed-source models from providers such as Google and OpenAI. We characterize this failure mode as an audio-visual Clever Hans effect, in which models appear (falsely) audio-grounded, but actually exploit visual-acoustic correlations without verifying whether the audio and visual streams are truly aligned. To systematically study this behavior, we introduce Thud, an intervention-driven probing framework based on three counterfactual audio edits: Shift, which tests temporal synchronization; Mute, which tests sound existence; and Swap, which tests audio-visual consistency. Beyond diagnosis, we further study a two-stage alignment recipe: intervention-derived preference pairs teach audio verification, while event-level general video preferences regularize the model against over-specialization. Our best 10K-sample recipe improves average performance across the three intervention dimensions by 28 percentage points, while slightly improving performance on general video and audio-visual QA benchmarks.
- Active Learners as Efficient PRP Rerankers
Pairwise Ranking Prompting (PRP) elicits pairwise preference judgments from an LLM, which are then aggregated into a ranking, usually via classical sorting algorithms. However, judgments are noisy, order-sensitive, and sometimes intransitive, so sorting assumptions do not match the setting. Because sorting aims to recover a full permutation, truncating it to meet a call budget does not produce a dependable top-K. We thus reframe PRP reranking as active learning from noisy pairwise comparisons and show that active rankers are drop-in replacements that improve NDCG@10 per call in the call-constrained regime. Our noise-robust framework also introduces a randomized-direction oracle that uses a single LLM call per pair. This approach converts systematic position bias into zero-mean noise, enabling unbiased aggregate ranking without the cost of bidirectional calls.
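To illustrate the randomized-direction idea in the abstract above, here is a minimal simulation sketch. It is not the authors' implementation: `biased_judge` is a hypothetical stand-in for an LLM call, and the linear preference model and the `position_bias` parameter are assumptions made only for this example.

```python
import random


def biased_judge(first_score, second_score, position_bias, rng):
    """Hypothetical stand-in for one LLM comparison call.

    Returns True when the judge prefers the item shown first.
    A positive position_bias tilts the verdict toward the first slot.
    """
    p_first = 0.5 + 0.5 * (first_score - second_score) + position_bias
    p_first = min(1.0, max(0.0, p_first))
    return rng.random() < p_first


def fixed_order_oracle(a, b, position_bias, rng):
    """Always shows `a` first, so the position bias always favors `a`."""
    return biased_judge(a, b, position_bias, rng)


def randomized_direction_oracle(a, b, position_bias, rng):
    """One judge call per pair: pick the presentation order with a coin flip,
    then map the verdict back to the question 'does a beat b?'."""
    if rng.random() < 0.5:
        return biased_judge(a, b, position_bias, rng)
    return not biased_judge(b, a, position_bias, rng)


def win_rate(oracle, a, b, position_bias, trials=20000, seed=0):
    """Fraction of trials in which the oracle reports that `a` beats `b`."""
    rng = random.Random(seed)
    wins = sum(oracle(a, b, position_bias, rng) for _ in range(trials))
    return wins / trials
```

For two equally good items (scores 0.5 and 0.5) and a bias of 0.2, the fixed-order win rate in this toy model sits near 0.7, while the randomized-direction win rate sits near 0.5: the bias still adds variance, but it no longer shifts the average.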
- Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
On-policy self-distillation, where a student is pulled toward a copy of itself conditioned on privileged context (e.g., a verified solution or feedback), offers a promising direction for advancing reasoning capability without a stronger external teacher. Yet in math reasoning the gains are inconsistent, even when the same approach succeeds elsewhere. A pointwise mutual information analysis traces the failure to the privileged context itself: it inflates the teacher's confidence on tokens already implied by the solution (structural connectives, verifiable claims) and deflates it on deliberation tokens ("Wait", "Let", "Maybe") that drive multi-step search. We propose Anti-Self-Distillation (AntiSD), which ascends a divergence between student and teacher rather than descending it: this reverses the per-token sign and yields a naturally bounded advantage in one step. An entropy-triggered gate disables the term once the teacher entropy collapses, completing a drop-in replacement for default self-distillation. Across five models from 4B to 30B parameters on math reasoning benchmarks, AntiSD reaches the GRPO baseline's accuracy in 2 to 10x fewer training steps and improves final accuracy by up to 11.5 points. AntiSD opens a path to scalable self-improvement, where a language model bootstraps its own reasoning through its training signal.
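The pointwise mutual information analysis mentioned above can be sketched as a per-token difference of log-probabilities, since the teacher is the same model conditioned on privileged context. The function name and the example probabilities below are illustrative assumptions, not values from the paper.

```python
import math


def pointwise_mutual_information(teacher_logprobs, student_logprobs):
    """Per-token PMI between a token and the privileged context.

    teacher_logprobs[i] = log p(token_i | context, privileged)
    student_logprobs[i] = log p(token_i | context)
    A positive value means the privileged context inflates the token's
    probability; a negative value means it deflates it.
    """
    if len(teacher_logprobs) != len(student_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    return [t - s for t, s in zip(teacher_logprobs, student_logprobs)]


# Made-up probabilities for two tokens: a connective and a deliberation token.
tokens = ["Therefore", "Wait"]
teacher = [math.log(0.90), math.log(0.05)]
student = [math.log(0.60), math.log(0.20)]
pmi = pointwise_mutual_information(teacher, student)
```

With these made-up numbers the connective gets a positive PMI and the deliberation token a negative one, which is the pattern the abstract describes.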
- GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment
We present GoLongRL, a fully open-source, capability-oriented post-training recipe for long-context reinforcement learning with verifiable rewards (RLVR). Existing long-context RL methods often treat data construction as a matter of designing increasingly complex retrieval paths, leading to homogeneous task coverage and reward formulations that inadequately reflect practical long-context requirements. Our work offers two contributions. (1) Capability-oriented data construction with full open release. We openly release a dataset of 23K RLVR samples, the complete construction pipeline, and all training code. Guided by a taxonomy of long-context capabilities, the dataset spans 9 task types, each paired with its natural evaluation metric. It comprises curated open-source samples from established corpora and synthetic samples whose QA pairs are generated from real source documents such as books, academic papers, and multi-turn dialogues. Under the same vanilla GRPO setup, our dataset alone outperforms the closed-source QwenLong-L1.5 dataset. Moreover, our Qwen3-30B-A3B model trained on this data delivers long-context performance comparable to DeepSeek-R1-0528 and Qwen3-235B-A22B-Thinking-2507, suggesting that broader coverage and greater reward diversity substantially benefit long-context capability improvement. (2) TMN-Reweight for heterogeneous multitask optimization. To address optimization challenges from heterogeneous rewards, we propose TMN-Reweight, which combines task-level mean normalization for cross-task reward scale alignment with difficulty-adaptive weighting for more reliable advantage estimation. TMN-Reweight further improves average performance over vanilla GRPO, with general capabilities preserved or improved across reported evaluations.
- OpenComputer: Verifiable Software Worlds for Computer-Use Agents
We present OpenComputer, a verifier-grounded framework for constructing verifiable software worlds for computer-use agents. OpenComputer integrates four components: (1) app-specific state verifiers that expose structured inspection endpoints over real applications, (2) a self-evolving verification layer that improves verifier reliability using execution-grounded feedback, (3) a task-generation pipeline that synthesizes realistic and machine-checkable desktop tasks, and (4) an evaluation harness that records full trajectories and computes auditable partial-credit rewards. In its current form, OpenComputer covers 33 desktop applications and 1,000 finalized tasks spanning browsers, office tools, creative software, development environments, file managers, and communication applications. Experiments show that OpenComputer's hard-coded verifiers align more closely with human adjudication than LLM-as-judge evaluation, especially when success depends on fine-grained application state. Frontier agents struggle with end-to-end completion despite partial progress, and open-source models exhibit sharp drops from their OSWorld-Verified scores, exposing a persistent gap in robust computer automation.
- AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration
Automating scientific discovery requires more than generating papers from ideas. Real research is iterative: hypotheses are challenged from multiple perspectives, experiments fail and inform the next attempt, and lessons accumulate across cycles. Existing autonomous research systems often model this process as a linear pipeline: they rely on single-agent reasoning, stop when execution fails, and do not carry experience across runs. We present AutoResearchClaw, a multi-agent autonomous research pipeline built on five mechanisms: structured multi-agent debate for hypothesis generation and result analysis, a self-healing executor with a Pivot/Refine decision loop that transforms failures into information, verifiable result reporting that prevents fabricated numbers and hallucinated citations, human-in-the-loop collaboration with seven intervention modes spanning full autonomy to step-by-step oversight, and cross-run evolution that converts past mistakes into future safeguards. On ARC-Bench, a 25-topic experiment-stage benchmark, AutoResearchClaw outperforms AI Scientist v2 by 54.7%. A human-in-the-loop ablation across seven intervention modes reveals that precise, targeted collaboration at high-leverage decision points consistently outperforms both full autonomy and exhaustive step-by-step oversight. We position AutoResearchClaw as a research amplifier that augments rather than replaces human scientific judgment. Code is available at https://github.com/aiming-lab/AutoResearchClaw.
- Process Rewards with Learned Reliability
Process Reward Models (PRMs) provide step-level feedback for reasoning, but current PRMs usually output only a single reward score for each step. Downstream methods must therefore treat imperfect step-level reward predictions as reliable decision signals, with no indication of when these predictions should be trusted. We propose BetaPRM, a distributional PRM that predicts both a step-level success probability and the reliability of that prediction. Given step-success supervision from Monte Carlo continuations, BetaPRM learns a Beta belief that explains the observed number of successful continuations through a Beta-Binomial likelihood, rather than regressing to the finite-sample success ratio as a point target. This learned reliability signal indicates when a step reward should be trusted, enabling downstream applications to distinguish reliable rewards from uncertain ones. As one application, we introduce Adaptive Computation Allocation (ACA) for PRM-guided Best-of-N reasoning. ACA uses the learned reliability signal to stop when a high-reward solution is reliable and to spend additional computation on uncertain candidate prefixes. Experiments across four backbones and four reasoning benchmarks show that BetaPRM improves PRM-guided Best-of-N selection while preserving standard step-level error detection. Built on this signal, ACA improves the accuracy-token tradeoff over fixed-budget Best-of-16, reducing token usage by up to 33.57% while improving final-answer accuracy.
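As a rough sketch of the Beta-Binomial likelihood named in the abstract above, the following computes the negative log-likelihood of observing `k` successful continuations out of `n` under a Beta(`alpha`, `beta`) belief. The function names are hypothetical, and reading reliability off the concentration `alpha + beta` is an assumption for illustration; the abstract does not say how BetaPRM derives its reliability signal.

```python
import math


def log_beta(a, b):
    """log of the Beta function B(a, b), computed with log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_nll(k, n, alpha, beta):
    """Negative log-likelihood of k successes in n trials when the success
    probability follows Beta(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_lik = log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    return -log_lik


def beta_mean_and_concentration(alpha, beta):
    """Mean of the Beta belief and its concentration alpha + beta.

    Assumption for this sketch: a larger concentration is read as a more
    reliable success estimate.
    """
    return alpha / (alpha + beta), alpha + beta
```

A quick sanity check on the formula: with a uniform belief (`alpha = beta = 1`), every count from 0 to `n` is equally likely, so the likelihood is `1 / (n + 1)`.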
- EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
Equipping LLMs with tool-use capabilities via Agentic Reinforcement Learning (Agentic RL) is bottlenecked by two challenges: the lack of scalable, robust execution environments and the scarcity of realistic training data that captures implicit human reasoning. Existing approaches depend on costly real-world APIs, hallucination-prone LLM simulators, or synthetic environments that are often single-turn or depend on pre-collected documents. Moreover, synthetic trajectories are frequently over-specified, resembling instruction sequences rather than natural human intents, reducing their effectiveness for RL training. We introduce EnvFactory, a fully automated framework that addresses both challenges. EnvFactory autonomously explores and verifies stateful, executable tool environments from authentic resources, and synthesizes natural multi-turn trajectories through topology-aware sampling and calibrated refinement, producing grounded queries with implicit intents. Using only 85 verified environments across 7 domains, EnvFactory generates 2,575 SFT and RL trajectories. Despite using significantly fewer environments than prior work, which often uses 5 times more, EnvFactory achieves superior training efficiency and downstream performance, improving Qwen3-series models by up to +15% on BFCLv3, +8.6% on MCP-Atlas, and +6% on conversational benchmarks including τ^2-Bench and VitaBench. By fully automating both environment construction and trajectory synthesis, EnvFactory provides a scalable, extensible, and robust foundation for Agentic RL.
- CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition
Recent diffusion models achieve strong photorealism and fluency in video generation, yet remain fragile under abstract, sparse or complex conditions, leading to poor performance in professional production workflows such as storyboard sketches and clay render conditions. Existing video generation models either inject conditions through adapters or couple a generic vision-language model (VLM) within a diffusion backbone, leaving a capability gap and failing to produce videos that align with the user's creative intent. We present CogOmniControl, a reasoning-driven framework that factorizes controllable video generation into creative intent cognition and generation. Specifically, we train a specialized CogVLM using authentic anime production data. Compared to generic VLMs, it generates more professional and clear outputs, accurately cognizing the user's creative intent from sparse and abstract conditions and turning these cues into dense reasoning output. In addition, CogOmniDiT unifies the controls from various conditions through in-context generation and is aligned to the CogVLM reasoning outputs via reinforcement learning. Furthermore, leveraging CogVLM's robust capability in guiding video generation, we unlock its potential for planning specific evaluators and enable Best-of-N selection for the generated videos. This integration transforms the entire framework into a closed-loop "harness-like" architecture. We further introduce CogReasonBench and CogControlBench, built from professional workflow data that carries genuine rather than simulated creative intent. Experiments on the two benchmarks show that CogOmniControl surpasses existing open-source models. Project website: https://um-lab.github.io/CogOmniControl/
- Harnessing LLM Agents with Skill Programs
Equipping LLM agents with reusable skills derived from past experience has become a popular and successful approach for tackling complex and long-horizon tasks. However, such lessons are often encoded as textual guidance that remains largely advisory, lacking explicit mechanisms for when and how to intervene in the agent loop. To bridge the gap, we introduce HASP (Harnessing LLM Agents with Skill Programs), a new framework that upgrades skills into executable Program Functions (PFs). Rather than offering passive advice, PFs act as executable guardrails that activate on failure-prone states and modify the next action or inject corrective context. HASP is highly modular: it can be applied at inference time for direct agent-loop intervention, during post-training to provide structured supervision, or for self-improvement by evolving validated, teacher-reviewed PFs. Empirically, HASP drives substantial gains compared to both training-free and training-based methods on web-search, math reasoning, and coding tasks. For example, on web-search reasoning, inference-time PFs alone improve the average performance by 25% compared to (multi-loop) ReAct Agent, while post-training and controlled evolution achieve a 30.4% gain over Search-R1. To provide deeper insights into HASP, our mechanism analysis reveals how PFs trigger and intervene, how skills are internalized, and the requirement for stable skill library evolution.
- Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos
Recent video generative models have greatly improved the realism of AI-generated videos, yet their outputs still exhibit artifacts such as temporal inconsistencies, structural distortions, and semantic incoherence. While Multimodal Large Language Models (MLLMs) show strong visual understanding capabilities, their ability to perceive and reason about such artifacts remains unclear. Existing benchmarks often lack systematic evaluation of artifact-aware perception and fine-grained diagnostic reasoning, especially across diverse AI-generated video domains beyond photorealistic content. To address this gap, we introduce Artifact-Bench, a comprehensive benchmark for evaluating MLLMs on AI-generated video artifact detection and analysis. We first establish a three-level hierarchical taxonomy of realism artifacts, covering photorealistic, animated, and CG-style videos. Based on this taxonomy, Artifact-Bench defines three complementary tasks: real vs. AI-generated video classification, pairwise realism comparison, and fine-grained artifact identification. Experiments on 19 leading MLLMs reveal substantial limitations in artifact perception and reasoning, with many models approaching random or even below-random performance in challenging settings. We further observe significant misalignment between MLLM judgments and human perceptual preferences, highlighting their limited reliability as general evaluators for AI-generated video realism.
- Aurora: Unified Video Editing with a Tool-Using Agent
Recent video editing models have converged on a unified conditioning design: a single diffusion transformer jointly consumes text, source video, and reference images, and one set of weights covers replacement, removal, style transfer, and reference-driven insertion. The design is flexible, but it assumes that the user already provides model-ready text, reference images, and spatial grounding for local edits, which real requests often omit. We present Aurora, an agentic video editing framework that pairs a tool-augmented vision-language model (VLM) agent with a unified video diffusion transformer. The VLM agent maps a raw user request to a structured edit plan aligned with the transformer's conditioning channels, thereby resolving textual and visual underspecification before generation. We train the VLM agent with supervised data for complete edit planning and reference-image selection, together with preference pairs for robust tool use and instruction refinement. We introduce AgentEdit-Bench to evaluate agent-enhanced video editing under textual and visual underspecification. Experiments on AgentEdit-Bench and two existing video editing benchmarks show that Aurora improves over instruction-only baselines and that the VLM agent transfers to compatible frozen video editing models. Project page: https://yeates.github.io/Aurora-Page
- CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization
When a model produces a correct solution under reinforcement learning with verifiable rewards (RLVR), every token receives the same reward signal regardless of whether it was a decisive reasoning step or a grammatical filler. A natural fix is to condition the model on the correct answer as a teacher, identifying tokens it would have generated differently had it known the answer. Prior work shows this either corrupts training by leaking the answer into the gradient, or produces a weak signal that cannot distinguish decisive steps from filler, since both look equally surprising relative to the model's baseline. We propose Contrastive Evidence Policy Optimization (CEPO), which asks a sharper question at every token: not just "does the correct answer favor this token?" but "does the correct answer favor it while the wrong answer disfavors it?" A token satisfying both is a genuine reasoning step; one satisfying neither is filler. The wrong-answer teacher is constructed from rejected rollouts already in the training batch, incurring no additional sampling cost. We prove CEPO inherits all structural safety guarantees of the prior state of the art while strictly sharpening credit at decisive tokens, with the improvement vanishing exactly at filler positions. Empirically, CEPO achieves 43.43% and 60.56% average accuracy across five multimodal mathematical reasoning benchmarks at 2B and 4B scale, respectively, versus 41.17% and 57.43% for GRPO under identical training budgets. Distribution-matching self-distillation methods (OPSD, SDPO) fall below the untrained baseline, empirically confirming the information leakage our theory predicts. Our code is available at https://github.com/ahmedheakl/CEPO.
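The two-part question in the abstract above ("does the correct answer favor this token while the wrong answer disfavors it?") can be illustrated with a small sketch. This is not the CEPO objective: the function names, the use of a baseline log-probability as the reference point, and the "ambiguous" label for mixed cases are all assumptions made for this example.

```python
def contrastive_evidence(logp_base, logp_correct, logp_wrong):
    """Return (favored, disfavored) margins for one token.

    favored    > 0: the correct-answer teacher raises the token's log-prob
                    relative to the unconditioned baseline.
    disfavored > 0: the wrong-answer teacher lowers it relative to baseline.
    """
    favored = logp_correct - logp_base
    disfavored = logp_base - logp_wrong
    return favored, disfavored


def label_token(logp_base, logp_correct, logp_wrong):
    """Label a token using the two conditions from the abstract.

    Both conditions hold -> "decisive"; neither holds -> "filler";
    exactly one holds -> "ambiguous" (a label invented for this sketch).
    """
    favored, disfavored = contrastive_evidence(logp_base, logp_correct, logp_wrong)
    if favored > 0 and disfavored > 0:
        return "decisive"
    if favored <= 0 and disfavored <= 0:
        return "filler"
    return "ambiguous"
```

A token whose log-probability is the same under the baseline and both teachers comes out as filler here, which matches the intuition that grammatical glue carries no evidence about the answer.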
- OmniGUI: Benchmarking GUI Agents in Omni-Modal Smartphone Environments
Current benchmarks for graphical user interface (GUI) agents predominantly rely on static screenshots. However, real-world smartphone interaction routinely requires agents to process transient audio cues and temporal video dynamics that are tightly coupled with the moment of action. To bridge this gap, we introduce OmniGUI, the first step-level benchmark designed to evaluate GUI agents in omni-modal smartphone environments. OmniGUI provides continuous, interleaved multimodal inputs comprising static images, synchronous audio, and video clips at every action step. The dataset encompasses 709 expert-demonstrated episodes (2,579 action steps) across 29 applications, systematically annotated with objective multimodal dependency levels. Because dedicated omni-modal GUI agent frameworks are currently in their nascent stage, we select foundational omni-modal models capable of natively processing interleaved inputs to serve as agent proxies for our initial baselines. Our empirical evaluation reveals that while current models exhibit competency on visually static tasks, their action prediction performance degrades significantly in environments requiring synchronous temporal and auditory signals. Furthermore, ablation studies isolate specific operational bottlenecks, notably cross-modal interference when processing task-irrelevant environmental noise. The complete dataset, evaluation pipeline, and baseline prompts are provided in the supplementary material. Project page: https://omni-gui.github.io.
- MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation
Video generation is rapidly evolving from single-shot synthesis to complex multi-shot audio-video (MSAV) narratives to meet real-world demands. However, evaluating such frontier models remains a fundamental challenge. Existing benchmarks are limited in scope and data diversity, and rely on rigid evaluation pipelines, preventing systematic and reliable assessment of modern MSAV models. To bridge these gaps, we introduce MSAVBench, the first comprehensive benchmark and adaptive hybrid evaluation framework for multi-shot audio-video generation. Our benchmark spans four key dimensions, video, audio, shot, and reference, covering diverse task settings, varying shot counts of up to 15, and challenging non-realistic scenarios. Our evaluation framework improves robustness through an adaptive self-correction mechanism for shot segmentation, instance-wise rubrics for subjective metrics, and tool-grounded evidence extraction for complex judgments. Furthermore, MSAVBench achieves high alignment with human judgments, reaching a Spearman rank correlation of 91.5%. Our systematic evaluation of 19 state-of-the-art closed- and open-source models shows that current systems still struggle with director-level control and fine-grained audio-visual synchronization, while modular or agentic generation pipelines offer a promising path toward narrowing the gap between open- and closed-source models. We will release the benchmark data and evaluation code to facilitate future research.
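For readers unfamiliar with the Spearman rank correlation cited above as the human-alignment measure, here is a minimal pure-Python sketch. It is a generic textbook computation (Pearson correlation on average ranks), not MSAVBench's evaluation code, and the example scores are made up.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Made-up example: human scores versus an automatic metric for five videos.
human_scores = [1, 2, 3, 4, 5]
metric_scores = [2, 1, 3, 5, 4]
rho = spearman(human_scores, metric_scores)
```

In this made-up example two adjacent pairs are swapped, and the correlation works out to 0.8; a reported 91.5% corresponds to a coefficient of 0.915 on the same scale.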
Techmeme(15)
- A study conducted in the Phoenix metro area finds that waste heat from data centers can raise air temperatures in downwind neighborhoods by as much as 4°F (Tech Xplore)
Tech Xplore: Waste heat from data centers can boost air temperatures in downwind neighborhoods by as much as 4 degrees Fahrenheit …
- Paris-based Pivot, which develops AI tools for procurement and financial workflows, raised a $40M Series B co-led by Forestay Capital and Notion Capital (Tamara Djurickovic/Tech.eu)
Tamara Djurickovic / Tech.eu: With new funding, Pivot plans to expand its AI-powered procurement and finance platform, accelerate development of agentic AI capabilities …
- Bengaluru-based Scapia, which offers travel-focused co-branded credit cards and an app, raised $63M led by GC, a source says at a $500M+ post-money valuation (Jagmeet Singh/TechCrunch)
Jagmeet Singh / TechCrunch: Scapia, an Indian startup that combines travel booking with co-branded credit cards and mobile payments, has raised $63 million …
- X resolved a three-year dispute with Australia's eSafety regulator after a court upheld a fine against the company for inadequate disclosures on combating CSAM (Byron Kaye/Reuters)
Byron Kaye / Reuters: An Australian court upheld a regulator's fine against Elon Musk's social media company X Corp after it admitted violating the law …
- Commure, which offers AI, revenue cycle management, and workflow automation tools for healthcare providers, raised $70M led by General Catalyst at a $7B post-money valuation (Paige Minemyer/Fierce Healthcare)
Paige Minemyer / Fierce Healthcare: Healthcare AI company Commure has banked $70 million in fresh funding, reaching a $7 billion valuation.
- Ofcom says Meta, Snap, and Roblox will adopt stronger anti-grooming measures, while TikTok and YouTube "failed to commit to any significant changes" (Ofcom)
Ofcom: Snap, Meta and Roblox to adopt further safety measures to protect children from stranger danger online
- AMD says its Mac Mini-sized Ryzen AI Halo PC starts at $3,999 with Ryzen AI Max 300 chips, for pre-order in June, and unveils AI Max 400 chips, available in Q3 (Devindra Hardawar/Engadget)
Devindra Hardawar / Engadget: They're direct shots at NVIDIA's AI systems. AMD's big pitch for 2026 seems to be: “Who needs cloud AI processing when you can do it all locally?”
- Cohere releases Command A+, a sparse MoE open model built for agentic tasks, with 218B total and 25B active parameters, its first under the Apache 2.0 license (Carl Franzen/VentureBeat)
Carl Franzen / VentureBeat: Canadian AI lab Cohere made waves recently by announcing a merger with German AI startup Aleph Alpha, but now it has even more in store …
- Jensen Huang said that Nvidia has "largely conceded" China's AI chip market to Huawei and should "expect nothing" regarding chip sale approvals to China (Lee Ying Shan/CNBC)
Lee Ying Shan / CNBC: Nvidia CEO Jensen Huang said the company has “largely conceded” China's artificial intelligence chip market to Huawei …
- Filing: SpaceX set aside $530M for potential litigation losses, including lawsuits involving Grok's "Spicy" mode, which it described as a "heightened risk" (Wired)
Wired: The rocket company has set aside more than $500 million for potential litigation losses …
- SpaceX S-1: Starlink had 10.3M subscribers in Q1 2026, a 105% increase YoY; SpaceX's "Connectivity" business, which is primarily Starlink, made $11.3B in 2025 (Michael Kan/PCMag)
Michael Kan / PCMag: For the first time, we have hard numbers on Starlink's paid subscriber base …
- Sources: the Pentagon is launching a task force to study how to safely deploy leading AI tools with hacking capabilities across Cyber Command and NSA missions (Politico)
Politico: These people, like others in this report, were granted anonymity because they were not authorized to speak publicly about the sensitive effort.
- SpaceX S-1: xAI plans to buy another $2.8B worth of turbines for its data centers, including a $2B deal for mobile gas turbines, the type it's being sued over (Tim De Chant/TechCrunch)
Tim De Chant / TechCrunch: Elon Musk's xAI has gotten itself in hot water over its use of polluting generators at its data center near Memphis, Tennessee.
- SpaceX S-1: xAI had a $6.4B operating loss on $3.2B in revenue in 2025; Grok and X had 550M MAUs combined as of March 2026, and 117M used Grok's AI features (Rebecca Bellan/TechCrunch)
Rebecca Bellan / TechCrunch : SpaceX S-1: xAI had a $6.4B operating loss on $3.2B in revenue in 2025; Grok and X had 550M MAUs combined as of March 2026, and 117M used Grok's AI features — Elon Musk's xAI lost $6.4 billion from operations on just $3.2 billion in revenue in 2025, according to SpaceX's IPO filings.
- SpaceX S-1: Anthropic is paying SpaceX $1.25B/mo. until May 2029 under their compute deal; Anthropic says it's expanding the deal to include Colossus 2 capacity (Ina Fried/Axios)
Ina Fried / Axios : SpaceX S-1: Anthropic is paying SpaceX $1.25B/mo. until May 2029 under their compute deal; Anthropic says it's expanding the deal to include Colossus 2 capacity — Anthropic is paying SpaceX $1.25 billion per month through May 2029 as part of the massive compute deal the companies signed earlier this month.
Solidot(15)
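- Firefox to remove asm.js code
Mozilla has announced that a future Firefox release will remove its asm.js code, because asm.js has long had a successor in WebAssembly, and maintaining both takes time and enlarges the attack surface. asm.js was Mozilla's answer to NaCl and PNaCl: by choosing a strictly static subset of JavaScript, it achieved performance comparable to NaCl/PNaCl while the code could still run directly in web content. asm.js shipped in 2013 with Firefox 22 and was a great success, proving that code could run on the web at near-native speed using web technologies alone. It paved the way for WebAssembly, which became a W3C standard in 2019. Starting with Firefox 148, the SpiderMonkey JS engine disables asm.js optimizations by default, and a future version will remove the code entirely. Sites that use asm.js will not be affected, since asm.js code is still valid JavaScript. The developers recommend that sites wishing to keep shipping asm.js content recompile to WebAssembly, which runs faster and produces smaller binaries.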
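- Google Cloud (GCP) mistakenly suspends the account of its major customer Railway
In 2024, a GCP misconfiguration led to the complete deletion of data belonging to UniSuper, an Australian pension fund manager; fortunately, UniSuper had backups with another provider. That incident kept UniSuper offline for more than a week. On May 19, 2026, GCP had a similarly serious incident: its automated systems suspended the production account of a major customer, the PaaS platform Railway.com, taking Railway's services offline. According to the incident report on Railway's official blog, the outage lasted about 8 hours. The suspension occurred at 22:10 UTC on the 19th and cost Railway its GCP-hosted infrastructure, which supports the dashboard, the API, and part of its network infrastructure. Railway immediately contacted its GCP account manager, and the account was restored at 22:29 UTC, but compute instances, disks, and networking had to be recovered one by one, and the incident was not fully resolved until 07:58 UTC the next day. Railway says it will reduce its dependence on GCP, planning to remove GCP from the hot path and keep it as a backup/failover service.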
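- Why hay fever is so severe in Japan
Pollen allergy is a nationwide health problem in Japan, where an estimated 43% of people have moderate to severe symptoms, compared with 26% in the UK and 12%-18% in the US. Every spring, people on city streets across Japan put on masks because of pollen-induced allergic rhinitis. Why is the problem so severe in Japan? The cause has little to do with poor health, pollution, or even the natural environment; it traces back to decisions made by Japanese politicians after World War II. During the war, shortages of oil and natural gas forced Japan to turn to its most abundant natural resource, its forests, as a fuel source for homes and industry. Natural forests were cut down over large areas, and the hills around cities such as Tokyo, Osaka, and Kobe were stripped bare. After the war, because bare mountains are prone to landslides and floods, the government decided on a large-scale reforestation program. It chose two fast-growing species: Japanese cedar (sugi) and Japanese cypress (hinoki). Today these cedar and cypress plantations cover one fifth of the country's land area. The problem is that cedar and cypress produce large amounts of lightweight pollen once they mature at about 30 years of age, and almost all of the plantations are now more than 30 years old. To ease the allergies, the Japanese government now plans to cut down one fifth of the cedar forests and replace them with other species.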
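- Fedora removes the Deepin desktop environment packages
Following openSUSE, the Fedora distribution has removed its Deepin Desktop packages. In early 2025, during a routine review, the SUSE security team found that the Deepin desktop environment included a package named deepin-feature-enable. The package had been added in April 2021 without consulting or notifying SUSE. It contained a "license agreement dialog" that essentially said that, because of openSUSE's security policy, all the dbus and polkit functionality required by deepin-api and deepin-daemon had been disabled, which could leave Deepin Desktop unable to work properly, with some features unavailable. Users who did not mind these security issues could click to confirm, after which the missing dbus and polkit components would be installed automatically. The security team's investigation found that core components of deepin-daemon had never been submitted for security review and had been quietly introduced into openSUSE. Given the Deepin community's repeated violations over the past few years, openSUSE decided to remove Deepin Desktop. The Fedora project subsequently ran its own security review of the Deepin desktop packages, during which developers found it hard to reach some of the Deepin package maintainers. Citing security concerns and the lack of package maintenance, Fedora ultimately decided to remove the Deepin desktop environment.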
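- OpenAI, Nvidia, and others add SynthID watermark support to their models
Google introduced SynthID, a digital watermarking technology for labeling AI images, three years ago, and says it has so far been used to mark 100 billion images and videos. Last year Google added SynthID detection to the Gemini app: users upload suspicious content and ask the chatbot whether it was generated by AI. Google says no one has yet managed to break SynthID, and it announced partnerships with several AI companies to add support for the watermark. Nvidia's Cosmos, OpenAI's GPT 2 image model, Kakao, and ElevenLabs will all add SynthID support to their AI-generated content.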
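- Global vaccination rates are falling
Global vaccination rates are falling. Since the COVID-19 pandemic, which threw health systems into disarray, vaccination rates have still not recovered to their earlier levels. In 2024, measles outbreaks spread to 59 countries. The measles virus is extremely contagious: if an infected person is in the same space, almost 100% of people without immunity will be infected. Complications include pneumonia and otitis media, and the disease can even lead to encephalitis and become severe. Preventing measles depends on vaccination. To maintain herd immunity and stop outbreaks from spreading, vaccination coverage needs to reach 95% or higher. During the pandemic, travel restrictions led many people to postpone other vaccinations, and at medical institutions, vaccination and treatment staff were focused on the COVID-19 response. In addition, because the spread of other infectious diseases was suppressed, more and more people came to believe that vaccination was unnecessary, and global vaccination rates kept falling. Other infectious diseases show a similar trend. In 2024, coverage of the combined diphtheria, pertussis, and tetanus vaccine was below its post-2010 peak in every region of the world.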
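The 95% coverage figure can be related to the textbook herd-immunity threshold, 1 - 1/R0. The sketch below is only illustrative: the R0 values (12 to 18, a range commonly cited for measles) and the 97% vaccine effectiveness are assumptions that do not come from the article, and the function names are hypothetical.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Fraction of the population that must be immune: 1 - 1/R0."""
    if r0 <= 1:
        return 0.0  # with R0 <= 1 an outbreak dies out without herd immunity
    return 1.0 - 1.0 / r0


def critical_coverage(r0: float, effectiveness: float) -> float:
    """Vaccination coverage needed when the vaccine is not 100% effective."""
    return herd_immunity_threshold(r0) / effectiveness


if __name__ == "__main__":
    # Assumed, illustrative inputs (not from the article).
    for r0 in (12, 15, 18):
        immune = herd_immunity_threshold(r0)
        coverage = critical_coverage(r0, effectiveness=0.97)
        print(f"R0={r0}: immune fraction {immune:.1%}, coverage {coverage:.1%}")
```

Under these assumed inputs the required coverage falls in the mid-90s percent range, which is in line with the 95% target quoted in the item.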
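- The most efficient route between Earth and the Moon
Scientists have developed a mathematical method that computes the most economical travel routes between celestial orbits more precisely. For the Earth-Moon case, the new route needs 58.80 m/s less velocity change (delta-v, the usual measure of fuel cost) than the previously most fuel-efficient route. Against the trip's estimated total cost of 3342.96 m/s, the difference looks small, but it has a large effect on mission cost. The team notes that in space travel every 1 m/s of velocity change means a large amount of fuel. Based on this result, the team plotted a spacecraft trajectory from Earth orbit to lunar orbit and split it into two stages. First, the spacecraft leaves Earth orbit and enters an orbit around the L1 Lagrange point. L1 lies between Earth and the Moon, where the gravitational pulls of the two bodies balance out in the rotating Earth-Moon frame. With the help of a control system, the spacecraft can stay in this intermediate orbit indefinitely until the mission is ready, and then carry out the second stage of entering lunar orbit.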
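To see why a 58.80 m/s saving matters, the figures can be put through the Tsiolkovsky rocket equation, delta-v = Isp * g0 * ln(m0 / mf). This is a minimal sketch, not the researchers' method: the specific impulse of 320 s is a hypothetical engine value chosen for illustration, and the function names are made up here.

```python
import math

G0 = 9.80665  # standard gravity in m/s^2


def relative_saving(saved_dv: float, total_dv: float) -> float:
    """Saved delta-v as a fraction of the total delta-v budget."""
    return saved_dv / total_dv


def mass_ratio_factor(saved_dv: float, isp_s: float) -> float:
    """Factor by which the required initial-to-final mass ratio shrinks
    when the delta-v budget drops by saved_dv (Tsiolkovsky equation)."""
    exhaust_velocity = isp_s * G0  # effective exhaust velocity in m/s
    return math.exp(saved_dv / exhaust_velocity)


if __name__ == "__main__":
    saved, total = 58.80, 3342.96  # m/s, figures quoted in the item
    isp = 320.0                    # seconds, assumed for illustration only
    print(f"relative saving: {relative_saving(saved, total):.2%}")
    print(f"mass ratio factor: {mass_ratio_factor(saved, isp):.4f}")
```

With these assumed numbers the saving is roughly 1.8% of the total delta-v, and the required mass ratio shrinks by a similar proportion, which compounds into launch mass and cost.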
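- GitHub confirms hackers stole its internal repositories
GitHub has confirmed through its official account on X that hackers stole its internal repositories, and says it is investigating. Earlier, the hacking group TeamPCP claimed on the Breached forum that it had gained access to GitHub's internal source code and internal organizations and had stolen about 3,800 repositories, and it set an asking price of $50,000 for anyone wanting access to the source code. TeamPCP insists this is not extortion: if someone offers at least $50,000, the group says it will destroy the data after being paid, and if there is no buyer it will release the data for free. GitHub says its investigation shows that an employee's computer was compromised through a malicious VS Code extension the employee had installed. It has removed the extension and isolated the device, and the investigation is continuing. GitHub says there is currently no evidence that customer data was affected.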
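- Kickstarter reverses its blanket ban on adult content
The crowdfunding platform Kickstarter revised its rules last week to widen the range of prohibited adult content. Previously it banned only "pornographic content"; the updated rules greatly expanded the scope to include, among other things, implied sexual acts, MILF/DILF content, suggestive nudity, and any content showing female nipples/areolae, genitals, or the anus. After the change caused controversy, Kickstarter confirmed that it had revised the rules under pressure from its payment processor Stripe, which is in turn constrained by the wider financial system. Over the past few months, many projects raising money on Kickstarter had their funding accounts suspended by Stripe, so Kickstarter changed its rules to meet Stripe's restrictions on adult content. The move was criticized by the community, and Kickstarter has now decided to withdraw the new rules and return to the old ones, while adding links to Stripe's relevant policies.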
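- Google announces a redesigned search box
At the Google I/O developer conference on Tuesday, Google announced a redesign of its iconic 25-year-old search box, turning it into an AI-powered "intelligent search box". It is essentially a chatbot prompt, and its function shifts from running a search to asking Google (Ask Google). Google says that since it integrated AI Mode into search, monthly active users have passed 1 billion and search volume has hit a record high, so it is now preparing to go further and make AI Mode the default for search. Like an AI chatbot, the intelligent search box can take text, images, files, videos, or Chrome tabs as search input. Google will also offer agentic digital assistants that search automatically on the user's behalf; someone looking for an apartment can be notified of new listings without opening sites such as Zillow. The move has again drawn widespread criticism. Critics argue that LLM-based AI features do not treat accuracy as a core goal, so search quality will decline further and the line between ads and search results will blur even more.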
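- Samsung Electronics labor talks collapse; 18-day general strike to begin on the 21st
On the 20th, Samsung Electronics management and its union held a third round of post-mediation talks on issues including the cap on bonus payments, but the two sides failed to reach an agreement and the talks collapsed. The union said it had accepted the mediation proposal put forward by the National Labor Relations Commission under the Ministry of Employment and Labor, but Samsung Electronics refused to accept it. Samsung Electronics only repeated that it had "not yet made a decision" and did not state a position. The union will begin the general strike as scheduled on the 21st, and says it will keep working toward an agreement with management during the strike. The general strike is expected to cost Samsung Electronics as much as $2 billion per day. South Korea's presidential office expressed regret over the collapse of the talks. The government is considering invoking its "emergency mediation power" to restrict the strike, and will support a new round of mediation between labor and management.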
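- Bug bounty programs are being flooded with AI reports
Companies use bug bounty programs to pay white-hat hackers for finding bugs, but these programs are now being flooded with low-quality AI-generated reports, forcing some companies to shut them down. Bugcrowd, whose customers include OpenAI, T-Mobile, and Motorola, says the number of reports it received more than quadrupled over three weeks in March, and most of them turned out to be wrong. The Curl project suspended its bug bounty program in January. Ross McKerchar, chief information security officer at the cybersecurity company Sophos, says low-quality AI reports are quickly becoming a major problem, and that bug bounties will continue to exist but must change. Nextcloud suspended its bug bounty in April. The bug bounty platform HackerOne has also begun using AI agents to screen submitted bug reports, and CEO Kara Sprague says high-quality AI reports have also increased slightly of late.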
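- pgBackRest author announces he will keep maintaining the project
At the end of last month, David Steele, maintainer of the PostgreSQL backup and restore project pgBackRest, announced that the project would be archived and no longer maintained. pgBackRest is widely regarded as one of the most popular operations tools in the PostgreSQL ecosystem. Steele explained that pgBackRest had been his passion project for the past 13 years, and that he had been fortunate to have corporate backing for most of that time. His long-term sponsor was Crunchy Data, but that company was acquired by Snowflake, and the new owner had no intention of funding his continued work on it. He spent the past several months looking for a position that would let him continue the work, without success, and the sponsorship he received fell far short of what was needed to keep the project running, so he had no choice but to announce the end of maintenance. A few weeks after that announcement, he posted an update saying he will continue developing pgBackRest: a coalition of sponsors has agreed to fund the project on an ongoing basis, giving pgBackRest the long-term stability its development needs, and he thanked them for it.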
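- Sony cancels plans to port PS-exclusive single-player games to PC
Hermen Hulst, the executive in charge of Sony's PlayStation studios business, confirmed earlier rumors on Monday: Sony is canceling plans to port PS-exclusive single-player games to PC. Over the past few years Sony brought formerly PS-exclusive single-player games such as the God of War series, the Spider-Man series, Ghost of Tsushima, The Last of Us series, and the Horizon Zero Dawn series to PC, but the pace of ports has slowed recently, fueling rumors that Sony was changing its porting strategy. Hulst announced the strategic shift at an all-hands staff meeting on Monday. Sony is reportedly worried about diluting the PlayStation brand. The decision means Sony's recent single-player releases Ghost of Yotei and Saros will not come to PC. The change applies to single-player games from first-party studios; multiplayer games and single-player games from third-party studios will still come to PC.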
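- Why humans are right-handed
Most humans are right-handed, and left-handers make up about one tenth of the population. Why does this bias exist? Researchers analyzed data on 2,025 monkeys and apes from 41 primate species, examining one by one a range of factors including tool use, diet, habitat, body size, social structure, brain size, and mode of locomotion. Human hand preference differs markedly from that of other primates, but the picture changed when the researchers added two key traits to the model: brain size and the ratio of arm length to leg length, a ratio often used as an indicator of bipedal walking ability. Once these factors were included, humans no longer appeared to be an evolutionary outlier. The results suggest that the combined effect of upright walking and increased brain size may be the core reason humans developed a strong right-hand preference. The researchers believe right-handedness evolved in two stages. First, upright walking freed the hands from locomotion, favoring more specialized and asymmetric hand use. Second, as the human brain grew larger and more complex, the preference for the right hand became stronger and more widespread.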