TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0876
MON, MAY 25, 2026
Discover the best information organized by OrangeBot.AI
TODAY · MON, MAY 25, 2026

The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK

AI News Summary

May 25, 2026

Here is a summary of today's main news events.

Markets React Strongly to Progress in U.S.-Iran Negotiations

Reports of significant progress in U.S.-Iran negotiations, aimed at a deal that could reopen the Strait of Hormuz, caused major reactions in global markets today. Brent crude oil prices fell by over 5% on the prospect of increased supply. In response, U.S. stock futures rallied, the U.S. dollar weakened, and gold prices rose.

AI Development Accelerates Amid Growing Ethical Concerns

Artificial intelligence was a major focus today. Meta's Chief Technology Officer, Andrew Bosworth, revealed a mission to reshape the company's workforce using AI. Separately, the Pope issued a strong warning about the dangers of a technology revolution driven by "the idolatry of profit." This comes amid reports of AI software designed to bypass safety protocols, yielding systems that can provide instructions on dangerous topics such as biological weapons.

Chinese Tech Giant Sets Goal to Rival Intel's Top Chips

A major Chinese technology company announced an ambitious goal to produce semiconductors that can match Intel's most advanced chips by the year 2031. This declaration underscores the escalating global competition for technological supremacy in the critical chip-making industry.

UK Political Figures Face Legal and Security Issues

In UK news, the former chief executive of the Scottish National Party and ex-husband of former leader Nicola Sturgeon will be detained until his sentencing on June 23. Separately, senior government figure John Healey was reportedly affected by a suspected Russian electronic attack, raising national security concerns.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS


Hacker News - May 25, 2026

Hacker News Feed: Highlighting key posts and discussions.

Leave Me Behind

(androidessence.com)

9860
The Eternal Sloptember

(geohot.github.io)

385312
Building Pi with Pi

(lucumr.pocoo.org)

134110
I spent 50 hours drawing a line graph

(www.dougmacdowell.com)

58296
Wake up! 16b

(hellmood.111mb.de)

42334
03

HUGGINGFACE


HuggingFace - May 25, 2026

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state of a frozen agent, with the same discipline that makes weight-space optimization reproducible. SkillOpt is, to our knowledge, the first systematic controllable text-space optimizer for agent skills: a separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, rejected-edit buffer, and epoch-wise slow/meta update make skill training stable while adding zero inference-time model calls at deployment. Across six benchmarks, seven target models, and three execution harnesses (direct chat, Codex, Claude Code), SkillOpt is best or tied on all 52 evaluated (model, benchmark, harness) cells and beats every per-cell competitor among human, one-shot LLM, Trace2Skill, TextGrad, GEPA, and EvoSkill skills. On GPT-5.5, it raises average accuracy over the no-skill baseline by 23.5 points in direct chat, 24.8 points inside the Codex agentic loop, and 19.1 points inside Claude Code. Transfer experiments further show that optimized skill artifacts retain value when moved across model scales, between Codex and Claude Code execution environments, and to a nearby math benchmark without further optimization.

137
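The SkillOpt acceptance rule (bounded add/delete/replace edits on one skill document, kept only when they strictly improve a held-out validation score, with rejected edits buffered) can be sketched as a greedy loop. This is an illustrative sketch under stated assumptions, not SkillOpt's implementation: the skill is modeled as a list of lines, and `propose_edits`, `validate`, and `max_edits_per_step` are hypothetical stand-ins for the optimizer model, the validation harness, and the textual learning-rate budget.

```python
from typing import Callable, List, Tuple

# An edit is (op, line_index, text) with op in {"add", "delete", "replace"}.
Edit = Tuple[str, int, str]


def apply_edit(skill: List[str], edit: Edit) -> List[str]:
    """Return a copy of the skill document (a list of lines) with one edit applied."""
    op, i, text = edit
    new = list(skill)
    if op == "add":
        new.insert(i, text)
    elif op == "delete":
        del new[i]
    elif op == "replace":
        new[i] = text
    else:
        raise ValueError(f"unknown op {op!r}")
    return new


def optimize_skill(
    skill: List[str],
    propose_edits: Callable[[List[str], List[Edit]], List[Edit]],
    validate: Callable[[List[str]], float],
    steps: int = 10,
    max_edits_per_step: int = 2,  # hypothetical stand-in for a "textual learning rate"
) -> Tuple[List[str], float, List[Edit]]:
    """Greedy edit loop: keep an edit only if the validation score strictly improves."""
    best = validate(skill)
    rejected: List[Edit] = []  # rejected-edit buffer, passed back to the proposer
    for _ in range(steps):
        for edit in propose_edits(skill, rejected)[:max_edits_per_step]:
            try:
                candidate = apply_edit(skill, edit)
            except (IndexError, ValueError):
                rejected.append(edit)  # malformed edits are rejected too
                continue
            score = validate(candidate)
            if score > best:  # strict improvement only
                skill, best = candidate, score
            else:
                rejected.append(edit)
    return skill, best, rejected


# Toy usage: the "validator" scores the fraction of required phrases present.
REQUIRED = {"check units", "cite sources"}


def toy_validate(skill: List[str]) -> float:
    return sum(phrase in skill for phrase in REQUIRED) / len(REQUIRED)


def toy_propose(skill: List[str], rejected: List[Edit]) -> List[Edit]:
    pool = [("add", 0, "check units"), ("add", 0, "be verbose"), ("add", 0, "cite sources")]
    return [e for e in pool if e not in rejected and e[2] not in skill]


final_skill, final_score, rejected_edits = optimize_skill(["answer briefly"], toy_propose, toy_validate)
```

In the toy run, adding "be verbose" leaves the score unchanged, so it lands in the rejected buffer and is never proposed again, while the two edits that add required phrases are accepted.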
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

We introduce Lens, a 3.8B-parameter T2I model that achieves performance competitive with, and in several cases surpassing, state-of-the-art models with more than 6B parameters across various benchmarks, while requiring significantly less training compute. For example, Lens requires only about 19.3% of the training compute used by Z-Image. The training efficiency of Lens stems from two key strategies beyond its compact model size. First, we maximize data information density per training batch by (i) training on Lens-800M, a dataset of 800M densely captioned image-text pairs whose captions are generated by GPT-4.1 and contain approximately 109 words on average, providing richer semantic supervision than conventional short captions, and (ii) constructing each batch from images with multiple resolutions and diverse aspect ratios, thereby enlarging the effective visual coverage of each optimization step. Second, we improve convergence speed through careful architectural choices, including adopting a semantic VAE that provides better latent representations and employing a strong language encoder that accelerates optimization while enabling multilingual generalization from English-only training data. After pre-training, we apply RL with taxonomy-driven prompts (Lens-RL-8K) and structured reward rubrics to suppress artifacts and improve visual quality, a reasoner module with training-free system prompt search to better align user requests with the model, and distillation-based acceleration for 4-step inference. Through efficient training and systematic optimization, Lens generalizes to arbitrary aspect ratios from 1:2 to 2:1 and resolutions up to 1440×1440, and supports prompts in several commonly used languages. Thanks to its compact size, Lens generates a 1024×1024 image in 3.15 seconds on a single NVIDIA H100 GPU, while its distilled turbo version performs 4-step generation in 0.84 seconds.

87
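Lens is described as handling aspect ratios from 1:2 to 2:1 at up to 1440×1440 pixels. One common way to turn a requested aspect ratio into a concrete resolution is to clamp the ratio and solve for width and height under a fixed pixel budget. The sketch below does that; rounding down to a multiple of 16 (a typical latent patch stride) is an assumption, not a detail from the Lens abstract.

```python
import math
from typing import Tuple


def resolution_for_ratio(aspect: float, max_side_px: int = 1440, multiple: int = 16) -> Tuple[int, int]:
    """Pick (width, height) for a width/height ratio under a max_side_px**2 pixel budget.

    The ratio is clamped to [1/2, 2] (the 1:2 to 2:1 range) and both sides are
    rounded down to a multiple of `multiple` (an assumed patch stride), so the
    result never exceeds the pixel budget.
    """
    aspect = min(max(aspect, 0.5), 2.0)
    budget = max_side_px * max_side_px
    height = math.sqrt(budget / aspect)  # solve w * h = budget with w = aspect * h
    width = height * aspect
    return int(width // multiple) * multiple, int(height // multiple) * multiple
```

A square request returns the full 1440×1440 budget, while an out-of-range ratio such as 3:1 is first clamped to 2:1 and then fitted under the same pixel count.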
Rethinking Cross-Layer Information Routing in Diffusion Transformers

Diffusion Transformers (DiTs) have become a de facto backbone of modern visual generation, and nearly every major axis of their design (tokenization, attention, conditioning, objectives, and latent autoencoders) has been extensively revisited. The residual stream that governs how information accumulates across layers, however, has been directly inherited from the original Transformer. In this paper, we present a systematic empirical analysis of cross-layer information flow in DiTs, jointly along depth and denoising timestep, and identify three concrete symptoms of traditional residual addition, namely monotonic forward magnitude inflation, sharp backward gradient decay, and pronounced block-wise redundancy. Motivated by this diagnosis, we propose Diffusion-Adaptive Routing (DAR), a drop-in residual replacement that performs learnable, timestep-adaptive, and non-incremental aggregation over the history of sublayer outputs. Moreover, the proposed DAR is compatible with many modern Transformer enhancement methods, such as REPA. On ImageNet 256×256, DAR improves SiT-XL/2 by 2.11 FID (7.56 vs. 9.67) and matches the baseline's converged quality with 8.75× fewer training iterations. Stacked on top of REPA, it yields a 2× training acceleration in the early stage, suggesting cross-layer information routing as an underexplored design axis in diffusion modeling, one that operates orthogonally to existing representation-alignment objectives. Beyond pretraining, DAR can also be applied during the fine-tuning stage of large-scale T2I models and preserves high-frequency details during Distribution Matching Distillation.

80
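DAR replaces the fixed residual sum with learnable, timestep-adaptive, non-incremental aggregation over earlier sublayer outputs. A toy version of that idea, assuming a simple linear score per history entry (`slope * t + bias`, hypothetical; the abstract does not give DAR's parameterization), computes softmax weights from the timestep and returns a fresh weighted mixture of the history instead of `x + f(x)`.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(history: List[List[float]], t: float,
          slopes: Sequence[float], biases: Sequence[float]) -> List[float]:
    """Mix earlier sublayer outputs with weights that depend on the timestep t.

    history: k equal-length vectors (outputs of earlier sublayers, oldest first).
    slopes, biases: one hypothetical learnable pair per history entry.
    """
    if not (len(history) == len(slopes) == len(biases)):
        raise ValueError("need one slope and one bias per history entry")
    weights = softmax([a * t + b for a, b in zip(slopes, biases)])
    dim = len(history[0])
    # Non-incremental: the result is a fresh convex combination, not x + f(x).
    return [sum(w * h[d] for w, h in zip(weights, history)) for d in range(dim)]


def residual_sum(history: List[List[float]]) -> List[float]:
    """Plain residual accumulation for comparison: every output is added with weight 1."""
    return [sum(h[d] for h in history) for d in range(len(history[0]))]
```

Because the weights sum to 1, each coordinate of the routed output stays within the range of the corresponding history values, whereas `residual_sum` grows with every added block; this illustrates one way to avoid the forward magnitude inflation the authors report, not their exact mechanism.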
SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research

The exponential growth of global academic output has confronted researchers and AI agents with an unprecedented "information explosion," where fragmented and unstructured knowledge organization impedes deep interdisciplinary integration. Current academic retrieval tools predominantly rely on superficial keyword matching or vector-space semantic retrieval, which lack the topological reasoning capabilities required to navigate complex logical connections. Agentic deep-research frameworks, meanwhile, are often prone to logical hallucinations and incur high inference costs. To bridge this gap, in this report, we introduce SciAtlas, a large-scale, multi-disciplinary, heterogeneous academic resource knowledge graph designed as a panoramic scientific evolution network. By integrating over 43M papers from 26 disciplines, and a total of 157M entities and 3B triplets, SciAtlas provides a structured topological cognitive substrate that dismantles disciplinary barriers and furnishes AI agents with a global perspective. Furthermore, we develop a neuro-symbolic retrieval algorithm featuring tri-path collaborative recall and graph reranking, achieving a seamless transition from simple semantic matching to deterministic association discovery. We also present key application directions of SciAtlas, including literature review, automated research trend synthesis, idea positioning, and academic trajectory exploration, to demonstrate that SciAtlas can serve as an effective "cognitive map" to empower the full loop of automated scientific research while significantly reducing reasoning costs. We have released the interfaces for KG retrieval and various downstream tasks in our GitHub repo.

44
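The SciAtlas abstract names tri-path collaborative recall followed by graph reranking but does not spell out the fusion rule. As a hedged illustration only, the sketch below swaps in reciprocal rank fusion (a standard way to merge ranked lists) for the three recall paths, then gives a small bonus to candidates that the knowledge graph links to the top fused results. The three example paths, the `bonus` value, and the adjacency-dict graph are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return dict(scores)


def graph_rerank(scores: Dict[str, float], graph: Dict[str, Set[str]],
                 seeds: int = 2, bonus: float = 0.01) -> List[str]:
    """Boost candidates that the graph links to the top `seeds` fused results."""
    top = sorted(scores, key=scores.get, reverse=True)[:seeds]
    boosted = dict(scores)
    for doc in boosted:
        links = sum(1 for s in top if s != doc and doc in graph.get(s, set()))
        boosted[doc] += bonus * links
    return sorted(boosted, key=lambda d: (-boosted[d], d))


# Hypothetical recall paths: keyword match, dense retrieval, graph traversal.
keyword_hits = ["p1", "p2", "p3"]
dense_hits = ["p2", "p1", "p4"]
graph_hits = ["p2", "p5"]
kg_edges = {"p2": {"p4"}}  # p2 cites or is otherwise linked to p4

ranked = graph_rerank(rrf_fuse([keyword_hits, dense_hits, graph_hits]), kg_edges)
```

Here `p4` appears in only one recall path, but its graph link to the top-ranked `p2` lifts it above `p3` and `p5`.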
StepAudio 2.5 Technical Report

Unified audio-language modeling has emerged as a prominent trend in modern speech systems, promising to bring the reasoning capabilities of large language models to auditory tasks. However, existing unified foundation models often struggle to match the depth of specialized systems across automatic speech recognition (ASR), text-to-speech synthesis (TTS), and real-time spoken interaction. Bridging this gap remains an open challenge. This report presents StepAudio 2.5, a unified audio-language foundation model that matches or exceeds specialized systems across all three capabilities. Rather than treating these tasks as architecturally distinct, we operate on the premise that once text and audio share a multimodal representational space, task specialization becomes a matter of operational regimes: data construction, optimization targets, and decoding constraints. Guided by this insight, we advance the post-training paradigm from standard supervised learning to task-tailored Reinforcement Learning from Human Feedback (RLHF), using it as the primary mechanism to define complex optimization targets. We leverage this RLHF-centric alignment, alongside specialized decoding, to shape a shared backbone into three distinct operational modes. Concretely, the ASR branch advances transcription efficiency via verifiable multi-token decoding; the TTS branch achieves controllable, expressive synthesis through preference-based RLHF and context-rich supervision; and the Realtime branch realizes low-latency, persona-consistent dialogue via generative reward modeling within an RLHF framework. On standard benchmarks, StepAudio 2.5 achieves state-of-the-art results across ASR, TTS, and Realtime, demonstrating that a single audio-language foundation model can successfully internalize the distinct deployment objectives of speech understanding, generation, and live interaction.

35
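The StepAudio 2.5 abstract credits the ASR branch's efficiency to "verifiable multi-token decoding" without giving details. Assuming it resembles draft-then-verify decoding (as in speculative decoding), one step can be sketched as follows: the model drafts several tokens at once, a verifier checks them left to right, and the longest agreeing prefix is kept along with one token from the verifier. `greedy_next` is a hypothetical verifier; a real implementation would score all draft positions in a single forward pass rather than in a Python loop.

```python
from typing import Callable, List, Sequence


def verify_draft(prefix: List[int], draft: Sequence[int],
                 greedy_next: Callable[[List[int]], int]) -> List[int]:
    """Return the tokens accepted in one draft-and-verify step.

    Keeps the longest draft prefix the verifier agrees with, then appends one
    verifier token: either the correction at the first mismatch or, if every
    drafted token matched, one bonus token.
    """
    accepted: List[int] = []
    context = list(prefix)
    for tok in draft:
        expected = greedy_next(context)
        if tok != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(tok)
        context.append(tok)
    accepted.append(greedy_next(context))  # all drafts matched: one bonus token
    return accepted


# Toy verifier that always predicts the next token of a fixed reference transcript.
REFERENCE = [1, 2, 3, 4, 5, 6]


def toy_greedy_next(context: List[int]) -> int:
    return REFERENCE[len(context)] if len(context) < len(REFERENCE) else 0
```

With the prefix `[1]`, a draft of `[2, 3, 9]` yields `[2, 3, 4]`: two drafted tokens pass verification, and the wrong third token is replaced by the verifier's choice, so one verification step still advances three tokens.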
See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding

We present SWIM (See What I Mean), a novel training strategy that aligns vision and language representations to enable fine-grained object understanding solely from textual prompts. Unlike existing approaches that require explicit visual prompts, such as masks or points, SWIM leverages mask supervision only during training to guide cross-modal attention, allowing the model to automatically attend to the user-specified object at inference. Our cross-attention analysis of pretrained multimodal large language models (MLLMs) reveals a systematic discrepancy: attribute words produce sharp, localized activations in the visual modality, whereas object nouns yield diffuse and scattered patterns due to semantic reference bias and distributed high-level representations. To address this misalignment, we construct NL-Refer, an enriched dataset in which each object mask is paired with a precise natural language referring expression. SWIM extracts multi-layer cross-attention maps from object nouns and enforces spatial consistency with ground-truth masks. Experimental results demonstrate that SWIM substantially improves text-visual alignment and achieves superior performance over visual-prompt-based methods on fine-grained object understanding benchmarks. The code and data are available at https://github.com/HumanMLLM/SWIM.

28
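SWIM enforces spatial consistency between the cross-attention maps of object nouns and ground-truth masks across several layers. The abstract does not name the loss, so the sketch below assumes a soft Dice loss on max-normalized attention maps, averaged over layers; grids are plain nested lists to keep it dependency-free. A diffuse map (the failure mode the authors observe for object nouns) scores worse than a sharp one on the same mask.

```python
from typing import List

Grid = List[List[float]]


def soft_dice_loss(attn: Grid, mask: Grid, eps: float = 1e-6) -> float:
    """1 - soft Dice between a max-normalized, non-negative attention map and a binary mask."""
    peak = max(max(row) for row in attn) or 1.0  # avoid dividing by zero on an all-zero map
    a = [v / peak for row in attn for v in row]
    m = [v for row in mask for v in row]
    inter = sum(x * y for x, y in zip(a, m))
    return 1.0 - (2.0 * inter + eps) / (sum(a) + sum(m) + eps)


def multi_layer_loss(attn_maps: List[Grid], mask: Grid) -> float:
    """Average the loss over cross-attention maps taken from several layers."""
    return sum(soft_dice_loss(a, mask) for a in attn_maps) / len(attn_maps)


object_mask = [[0.0, 1.0], [0.0, 0.0]]
sharp_map = [[0.0, 1.0], [0.0, 0.0]]    # attention concentrated on the object
diffuse_map = [[1.0, 1.0], [1.0, 1.0]]  # attention spread over the whole frame
```

For the 2×2 example, the sharp map gives a loss of 0 and the diffuse map gives 0.6, so minimizing this loss pushes noun attention toward the masked region.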
PhotoFlow: Agentic 3D Virtual Photography Missions

Virtual photography asks an agent to enter a prepared 3D scene with no preselected camera pose or reference image, infer a suitable shot from scene information and a language intent, choose executable camera parameters, and render the final photograph. Recent progress in vision-language models makes this kind of spatial agent increasingly plausible, but the task stresses two capabilities that remain hard to evaluate together: complex 3D spatial understanding and abstract aesthetic judgment. We introduce PhotoFlow, a Director-Reviewer-Reflector agent for closed-loop camera search. The Director builds a soft photographic blueprint and proposes diverse candidate cameras; the Reviewer combines rule checks, visual critique, and pairwise incumbent selection; and the Reflector converts failures into region memory, dead-zone suppression, and high-explore relocation. We also introduce VPhotoBench, a benchmark of 47 open-license Blender scenes and 141 language-conditioned photography missions spanning subject placement, relational composition, and atmosphere/style. On held-out experiments, PhotoFlow achieves the strongest external quality-alignment composite and success rate among one-shot prediction, single-chain reflection, anchor-bank selection, and random search under a six-round rendering budget. To our knowledge, this is the first work to make language-conditioned virtual photography in arbitrary Blender scenes an executable agent task, and our results show that an LLM-centered spatial agent can already produce strong photographs in a setting designed to challenge both 3D reasoning and aesthetic choice.

20
From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

Language agents increasingly improve by reusing skills: structured procedural artifacts distilled from past experience. Domain-level and model-generated skills are especially promising. They offer fast adaptation within a domain by encoding domain-specific recurring procedures, and they scale beyond labor-intensive hand-crafting. However, while extraction methods continue to proliferate, understanding remains limited: no comprehensive study spans the full skill lifecycle (experience generation, skill extraction, and skill consumption) to ask whether such skills actually work, when they work, and what makes them succeed or fail. To close this gap, we build a utility-grounded evaluation framework that provides systematic experimental results across extractors and target agents, covering five diverse agentic task domains. We find that model-generated skills are beneficial on average but exhibit non-trivial negative transfer, and that neither extractors nor targets behave uniformly. A model can be a strong extractor yet a weak consumer, or vice versa, with skill utility independent of model scale or baseline task strength. To explain these patterns, we then dissect each lifecycle stage in depth, analyzing how experience composition shapes skill quality, what properties characterize useful skills, and how the same skill transfers across different consumers. Finally, we translate these findings into a concrete meta-skill that guides skill extraction toward the features tied to actual utility, which consistently improves skill quality across domains and substantially reduces negative transfer.

20
VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis

Spatio-temporal reasoning is a core capability for Multimodal Large Language Models (MLLMs) operating in the real world. As such, evaluating it precisely has become an essential challenge. However, existing spatio-temporal reasoning benchmark datasets primarily rely on static image sets or passively curated video data, which limits the evaluation of fine-grained reasoning capabilities. In this paper, we introduce VGenST-Bench, a video benchmark that employs generative models to actively synthesize highly controlled and diverse evaluation scenarios. To construct VGenST-Bench, we propose a multi-agent pipeline incorporating a human quality control stage, ensuring the quality of all generated videos and QA pairs. We establish a comprehensive 3×2×2 video taxonomy, encompassing Spatial Scale, Perspective, and Scene Dynamics to span diverse scenarios. Furthermore, we design a hierarchical task suite that decouples low-level visual perception from high-level spatio-temporal reasoning. By shifting the paradigm from passive curation to active synthesis, VGenST-Bench enables fine-grained diagnosis of spatio-temporal understanding in MLLMs.

18
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

Most practical high-resolution text-to-image systems, including latent diffusion and autoregressive models, perform generation in a compact latent space, and a decoder maps the generated latents back to pixels. Yet the latent-to-pixel decoder is reconstruction-oriented, optimized to invert the encoder rather than synthesize more details, and becomes increasingly costly at megapixel scale. This drawback calls for a more expressive and efficient decoding paradigm. Motivated by recent progress in scalable pixel-space diffusion, we introduce PiD, a Pixel diffusion Decoder that reformulates latent decoding as conditional pixel diffusion, unifying decoding and upsampling into one generative module. By denoising directly in high-resolution pixel space, PiD synthesizes 4× and even 8× upscaled images with low latency. For latent conditioning, a lightweight sigma-aware adapter injects noise-corrupted latents into the pixel diffusion backbone, enabling PiD to decode partially denoised latents and terminate the latent diffusion process early. To further improve efficiency, we distill the model using DMD2, reducing inference to just 4 steps. PiD applies to both conventional VAE latents and semantic latents (e.g., SigLIP, DINOv2) used in recent RAE-based models. PiD decodes latents of 512×512 images into 2048×2048 pixels in under 1 second with 13 GB peak memory on a consumer RTX 5090, and as fast as 210 ms on a GB200 GPU, about 6× faster than cascaded diffusion-based super-resolution pipelines with better visual fidelity.

15
RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution

Discrete autoregressive (AR) text-to-image (T2I) models pair a VQ tokenizer with an AR policy, and current post-training pipelines optimize only the policy while keeping the VQ decoder frozen. Recent diffusion T2I work, exemplified by REPA-E, has shown that the VAE itself constitutes a key alignment bottleneck, yet no analogous investigation exists for discrete AR models. We show that policy-only optimization induces Latent Covariate Shift: as the policy evolves, the resulting token distribution diverges from the ground-truth distribution on which the decoder was trained, such that reward scores improve while decoded image quality degrades. To address this mismatch, we propose RankE, the first end-to-end post-training framework for discrete T2I generation. Rather than optimizing the policy against a fixed decoder, RankE co-evolves both components through alternating optimization: each module maximizes a ranking-based alignment objective while being regularized by a stability-preserving anchor suited to its parameter space. This co-evolution breaks the fidelity-alignment trade-off that plagues frozen-decoder approaches: on LlamaGen-XL (775M), standard RL improves CLIP but degrades FID, whereas RankE improves both simultaneously (FID 15.21, CLIP 33.76 on MS-COCO 30K). Consistent gains on Janus-Pro (1B) confirm that decoder co-evolution reliably converts reward optimization into pixel-space quality improvements.

13
ETCHR: Editing To Clarify and Harness Reasoning

Multimodal Large Language Models have advanced visual reasoning, yet a purely textual chain of thought remains a bottleneck for questions that require fine-grained focus or view transformations. The "think with images" paradigm narrows this gap, but existing approaches are either constrained by fixed predefined toolkits or produce noisy intermediate images from unified multimodal methods. We pursue a third option: a dedicated image-editing model decoupled from the understanding model. However, off-the-shelf image editors fail as reasoning assistants because of two complementary gaps: a language-side gap, where editors trained as passive instruction-followers cannot map an abstract question to an appropriate visual transformation, and a generation-side gap, where edit correctness degrades as reasoning depth grows. Guided by this analysis, we introduce ETCHR (Editing To Clarify and Harness Reasoning), a question-conditioned, reasoning-aware image editor decoupled from the downstream understanding model and trained with a two-stage recipe targeted at the two gaps: Reasoning Imitation via supervised fine-tuning on edit trajectories, followed by Reasoning Enhancement with VLM-derived rewards for edit correctness and downstream reasoning accuracy. Since the editor is decoupled, ETCHR plugs into different open- and closed-source MLLMs in a training-free manner. Across five task families (fine-grained perception, chart understanding, logic reasoning, jigsaw restoration, and 3D understanding), ETCHR raises average Pass@1 from 55.95 to 60.77 (+4.82) with Qwen3-VL-8B, from 65.08 to 70.55 (+5.47) with Gemini-3.1-Flash-Lite, and from 76.55 to 81.16 (+4.61) with the 1T-parameter MoE model Kimi K2.5.

9
SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models

Interactive world models for first-person shooter (FPS) games must resolve high-frequency overlapping control signals at every frame without disrupting unaffected regions. Existing methods inject actions globally and train on single titles, failing under dense FPS inputs. We observe that FPS actions are spatially selective: discrete events such as firing or reloading affect only a localized region around the weapon (the scope), while continuous camera and movement signals govern stable surroundings. We propose SCOPE, which inserts a conditioning module into each transformer block of a pretrained video diffusion model. It reshapes features into per-pixel temporal sequences so that each position computes its action response from local visual content. This separates in-scope effects from out-of-scope generation without segmentation labels. We also introduce CrossFPS, the first multi-game FPS dataset with frame-aligned action telemetry. It comprises 69K clips from 7 titles with 10-DoF controller signals, curated to remove gameplay bias. The model learns general visual-to-action mappings rather than game-specific patterns, enabling zero-shot transfer to unseen scenes. Experiments confirm strong action responsiveness, precise scope separation, and effective cross-game generalization.

8
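SCOPE reshapes video features into per-pixel temporal sequences so that each position computes its action response from its own content. The sketch below shows that reshape on nested lists ([T][H][W][C] to [H*W][T][C] and back) plus a hypothetical gated response in which a per-position sigmoid gate decides how strongly a per-frame action signal is injected. The gate weights `w` and the additive injection are assumptions for illustration, not SCOPE's conditioning module.

```python
import math
from typing import List, Sequence

Video = List[List[List[List[float]]]]   # [T][H][W][C]
PixelSeqs = List[List[List[float]]]     # [H*W][T][C]


def to_pixel_sequences(feats: Video) -> PixelSeqs:
    """Reshape [T][H][W][C] features into [H*W][T][C]: one time series per position."""
    T, H, W = len(feats), len(feats[0]), len(feats[0][0])
    return [[feats[t][y][x] for t in range(T)] for y in range(H) for x in range(W)]


def from_pixel_sequences(seqs: PixelSeqs, T: int, H: int, W: int) -> Video:
    """Inverse reshape back to [T][H][W][C]."""
    return [[[seqs[y * W + x][t] for x in range(W)] for y in range(H)] for t in range(T)]


def gated_action_response(seqs: PixelSeqs, actions: Sequence[float],
                          w: Sequence[float]) -> PixelSeqs:
    """Hypothetical response: each position adds gate * action[t], with the gate
    computed from that position's own features, so unrelated regions barely change."""
    out = []
    for seq in seqs:
        new_seq = []
        for t, feat in enumerate(seq):
            gate = 1.0 / (1.0 + math.exp(-sum(f * wi for f, wi in zip(feat, w))))
            new_seq.append([f + gate * actions[t] for f in feat])
        out.append(new_seq)
    return out
```

Working per position keeps the separation local: a position whose features do not match the gate (for example, background far from the weapon) receives almost none of the action signal, without any segmentation labels.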
LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Existing scaling laws for Large Language Models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propose the Shannon Scaling Law, a unified theoretical framework that models LLM training as information transmission over a noisy channel, grounded in the Shannon-Hartley theorem. By mapping model parameters to channel bandwidth and training tokens to signal power, our formulation explicitly captures the interaction between learning signal and intrinsic noise. This perspective reveals a fundamental Shannon capacity for LLMs: scaling model size or data without preserving a sufficient signal-to-noise ratio (SNR) inevitably amplifies noise, inducing a transition from monotonic improvement to U-shaped performance degradation. We validate our theory through experiments on Pythia and OLMo2 under perturbations, including Gaussian noise, quantization and supervised fine-tuning on math, QA and code tasks. The Shannon Scaling Law consistently outperforms classical scaling laws and recent perturbation-aware laws, achieving strong R² scores and accurately capturing loss basins missed by prior approaches. It also extrapolates: fitted on Pythia models of ≤6.9B parameters with ≤180B tokens, it predicts the unseen 12B model up to 307B tokens at a pooled R² of 0.847, while monotonic baselines collapse.

7
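The Shannon Scaling Law builds on the Shannon-Hartley theorem, C = B·log2(1 + S/N). The helper below computes that capacity directly and shows the asymmetry the analogy relies on: capacity grows linearly in bandwidth B (mapped to parameters in the paper) but only logarithmically in signal power S (mapped to training tokens), and only while noise N stays in check. The paper's own parameterization of N and its fitted law are not reproduced here.

```python
import math


def shannon_hartley_capacity(bandwidth_hz: float, signal_power: float, noise_power: float) -> float:
    """Channel capacity C = B * log2(1 + S/N), in bits per second."""
    if bandwidth_hz < 0 or signal_power < 0 or noise_power <= 0:
        raise ValueError("need B >= 0, S >= 0, and N > 0")
    return bandwidth_hz * math.log2(1.0 + signal_power / noise_power)


base = shannon_hartley_capacity(1000.0, 15.0, 1.0)           # SNR 15 -> log2(16) = 4
double_bandwidth = shannon_hartley_capacity(2000.0, 15.0, 1.0)
double_snr = shannon_hartley_capacity(1000.0, 31.0, 1.0)     # SNR 31 -> log2(32) = 5
noisier = shannon_hartley_capacity(1000.0, 15.0, 3.0)        # same signal, more noise
```

Doubling B doubles capacity (4000 to 8000 bits/s), while roughly doubling the SNR adds only one bit per hertz (4000 to 5000 bits/s), and raising N lowers capacity even though the signal is unchanged.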
Geo-Align: Video Generation Alignment via Metric Geometry Reward

Camera-controlled video generation has achieved remarkable progress in recent years. However, existing video-to-video re-rendering methods primarily rely on supervised fine-tuning on synthetic datasets. At present, synchronized, multi-view real-world video data is extremely scarce. Consequently, the prevailing paradigm often generalizes poorly to out-of-distribution real-world videos, with models struggling to adhere accurately to physical scales and camera trajectories. To bridge this gap, we propose Geo-Align, the first reinforcement learning framework specifically designed for camera-controlled video re-rendering. Starting from a pretrained model, we optimize it with a scale-aware perceptual reward: a metric 3D estimator extracts precise camera trajectories from generated videos, and deviations in rotation and translation are explicitly penalized. Furthermore, we design a data pipeline that pairs real-world conditioning videos with target camera trajectories derived from synthetic data, eliminating the reliance on paired data. Extensive experiments demonstrate that Geo-Align consistently outperforms existing supervised learning baselines in both precise camera controllability and visual fidelity.

5
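Geo-Align's reward penalizes rotation and translation deviations between the camera trajectory recovered from a generated video and the target trajectory. A minimal sketch of such a reward, assuming per-frame poses given as 3×3 rotation matrices plus translation vectors already at metric scale (the job of the metric 3D estimator), uses the geodesic rotation angle and the Euclidean translation distance. The weights `w_rot` and `w_trans` and the per-frame averaging are hypothetical choices, not the paper's reward.

```python
import math
from typing import List, Sequence, Tuple

Mat3 = List[List[float]]
Pose = Tuple[Mat3, Sequence[float]]  # (rotation matrix, translation vector)


def rotation_error_deg(r_pred: Mat3, r_ref: Mat3) -> float:
    """Geodesic angle between rotations: acos((trace(R_pred^T R_ref) - 1) / 2), in degrees."""
    trace = sum(r_pred[k][i] * r_ref[k][i] for i in range(3) for k in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for float safety
    return math.degrees(math.acos(cos_theta))


def translation_error(t_pred: Sequence[float], t_ref: Sequence[float]) -> float:
    """Euclidean distance between camera positions (metric units assumed)."""
    return math.dist(t_pred, t_ref)


def trajectory_reward(preds: List[Pose], refs: List[Pose],
                      w_rot: float = 1.0, w_trans: float = 1.0) -> float:
    """Negative mean weighted pose error over frames: 0 is perfect, lower is worse."""
    total = 0.0
    for (r_p, t_p), (r_r, t_r) in zip(preds, refs):
        total += w_rot * rotation_error_deg(r_p, r_r) + w_trans * translation_error(t_p, t_r)
    return -total / len(refs)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ROT_Z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```

Measuring translation in metric units is what makes the reward scale-aware: a trajectory with the right shape but the wrong physical scale still pays a translation penalty.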
LatentUMM: Dual Latent Alignment for Unified Multimodal Models

Unified multimodal models (UMMs) achieve strong performance in both understanding and generation by learning a shared latent space, yet they often exhibit functional inconsistency between these two capabilities. We observe that this issue does not stem from a lack of shared representations, but from the absence of explicit alignment between the transformations that map into and out of the latent space. As a result, generation and re-encoding can follow inconsistent trajectories, leading to semantic drift under modality transitions. In this work, we propose LatentUMM, a framework that constructs an enhanced shared latent space to explicitly align these transformations and improve cross-modal consistency. LatentUMM consists of two stages. First, dual latent alignment enforces consistency at both the modality and capacity levels: cross-modal alignment uses a stronger embedding model to impose structured cross-modal semantics, while dual capacity alignment enforces bidirectional consistency under generation and re-encoding. Second, latent dynamics stabilization improves robustness via stochastic latent rollouts and preference optimization, favoring trajectories that better preserve semantic consistency. Experiments show that LatentUMM consistently improves multimodal consistency across diverse architectures. Code is available at: https://github.com/AIFrontierLab/TorchUMM/tree/main/src/umm/post_training/LatentUMM.

4
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

Recent advances in vision-language models (VLMs) emphasize long chain-of-thought reasoning, yet we find that their performance on visual tasks is limited primarily by weak visual perception rather than by reasoning itself. In this work, we systematically study the interplay between perception and reasoning in VLM post-training by decomposing their capabilities into three separate training stages: visual perception, visual reasoning, and textual reasoning, incorporating specialized training data. We demonstrate that visual perception (a) requires targeted optimization with specialized data; (b) serves as a fundamental scaffold that should be solidified through staged training before refining visual reasoning; and (c) is more effectively learned via RL than caption-based SFT. Our experiments across multiple VLMs demonstrate that staged training consistently improves both visual perception and reasoning performance over merged training. Notably, models trained with our approach achieve 1.5% higher reasoning accuracy with 20.8% shorter reasoning traces, suggesting that superior perception reduces the need for excessive reasoning. Furthermore, we show that this capability-based staging represents a new curriculum dimension orthogonal to traditional difficulty-based curricula, and combining both yields further additive gains. Our staged-training models achieve superior performance among open-weight VLMs, with clear gains over their base counterparts on several visual math and perception tasks (e.g., +5.2% on WeMath and +3.7% on RealWorldQA).

4
Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pretraining, we show it can lead to fundamental limitations beyond pretraining in two regimes: (i) cross-modality vision-language-action (VLA) training, where inherently low-rank action-module gradients cause amplification of noisy tail directions, and (ii) reinforcement learning with verifiable rewards (RLVR), where low-SNR gradients and the need to preserve per-head specialization from prior training make whitening unstable. To address these challenges, we propose Pion, a drop-in replacement for Muon that preserves its computational efficiency while replacing uniform spectral whitening with a two-stage Promotion+Suppression mechanism, which we call the high-pass NS iteration. This design induces a sharp spectral high-pass effect, anchoring dominant singular values at 1 while suppressing noisy tail components toward 0, with controllable filter strength. To preserve pretrained per-head heterogeneity, Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape, at no extra cost. In VLA training on LIBERO and LIBERO-Plus, Pion consistently outperforms both baselines (Muon and AdamW) across ℓ1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures, e.g., reaching 100% success rate on LIBERO Object after 1,500 training steps with VLA-Adapter, vs. 97.0% for Muon and only 32.2% for AdamW. The advantage of Pion further extends to a real Franka Research 3 robot with a π0.5 backbone under the DROID setup on three grasp-and-place tasks. In RLVR post-training on Qwen3-1.7B/4B with GRPO and GMPO, Pion also outperforms AdamW on MATH and GSM8K while Muon collapses to zero.

HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

Training long-horizon LLM agents with reinforcement learning is challenging because sparse outcome rewards reveal whether a task succeeds, but not which intermediate actions caused the outcome or how they should be corrected. Recent methods alleviate this issue by generating rewards or textual hints from turn-level action-output signals, or by using feedback-conditioned self-distillation. However, generating feedback at every turn is inefficient when many intermediate turns are already successful or neutral, and applying feedback at a fixed or misaligned turn often fails to supervise the actions that actually contributed to the failure. To bridge this gap, we propose HINT-SD, a targeted self-distillation framework that uses full-trajectory hindsight to select failure-relevant actions and applies feedback-conditioned distillation only to the targeted action spans. Experiments on BFCL v3 and AppWorld show that our method improves over the dense per-turn feedback baseline by up to 18.80% while requiring 2.26× less time per training step, suggesting that selecting where to distill is a key factor for both effective and efficient long-horizon agent training.
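Restricting distillation to targeted action spans amounts to averaging the distillation loss under a token mask. The sketch below assumes per-token distillation losses are already computed and that hindsight selection returns half-open (start, end) token ranges; `targeted_distill_loss` and its inputs are hypothetical, and HINT-SD's actual span-selection step is not reproduced.

```python
def targeted_distill_loss(token_losses, spans):
    """Average per-token distillation loss over selected spans only (hypothetical helper).

    token_losses: per-token losses for one trajectory.
    spans: half-open (start, end) token ranges chosen by hindsight selection.
    """
    mask = [False] * len(token_losses)
    for start, end in spans:
        for i in range(max(start, 0), min(end, len(token_losses))):
            mask[i] = True  # overlapping spans count each token once
    selected = [loss for loss, keep in zip(token_losses, mask) if keep]
    # Tokens outside the targeted spans contribute nothing to the loss.
    return sum(selected) / len(selected) if selected else 0.0


losses = [0.2, 0.4, 1.5, 2.5, 0.1, 0.3]
print(targeted_distill_loss(losses, [(2, 4)]))  # prints 2.0
```

Returning 0.0 for an empty selection means a trajectory with no targeted span adds no distillation signal.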

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

Visual geometry transformers have become powerful architectures for multi-view 3D reconstruction, enabling joint prediction of multiple 3D attributes in a feed-forward manner. However, their computational cost grows quadratically with the input sequence length due to the global attention layers inside these models, which limits both their scalability and efficiency. In this work, we address this challenge with a simple yet general strategy: restricting the number of key/value tokens that each query interacts with during global attention. To achieve effective token selection, we introduce a two-stage framework. First, an inter-frame selection step operates at the frame level to identify frames that should be preserved. Second, an intra-frame selection step further discards redundant tokens within the selected frames. Our analysis highlights the advantage of a diversity-based strategy for inter-frame selection, which ensures broad coverage of the scene. For intra-frame selection, we show that layer-aware sparsification is necessary, with the selection process guided by the entropy of the global attention pattern. Our approach offers a better speed-accuracy trade-off than existing solutions. Extensive experiments show that it accelerates visual geometry transformers by over 85% for scenes with 500 images while maintaining, or even improving, baseline performance, suggesting that our token selection strategy can play a crucial role in future applications of visual geometry transformers. Our project website is available at https://zsh2000.github.io/good-token-hunting.github.io.
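A diversity-based inter-frame step needs some rule for picking frames that cover the scene broadly. Greedy farthest-point sampling over per-frame descriptors is one common such rule; the sketch below uses it as an assumed stand-in (the paper's exact criterion and frame descriptors are not given in the abstract), with hypothetical 2-D feature vectors.

```python
import math


def farthest_point_frames(features, k, start=0):
    """Greedy farthest-point sampling: pick up to k mutually distant frames."""
    chosen = [start]
    # Distance from each frame to its nearest already-chosen frame.
    dist = [math.dist(f, features[start]) for f in features]
    while len(chosen) < min(k, len(features)):
        candidates = [i for i in range(len(features)) if i not in chosen]
        nxt = max(candidates, key=lambda i: dist[i])
        chosen.append(nxt)
        dist = [min(d, math.dist(f, features[nxt])) for d, f in zip(dist, features)]
    return chosen


# Frames 0-2 are near-duplicates; frames 3 and 4 see different parts of the scene.
frames = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (10.0, 0.0)]
print(farthest_point_frames(frames, 3))  # prints [0, 4, 3]
```

The near-duplicate frames 1 and 2 are skipped in favor of frames that add new coverage, which is the behavior a diversity-based selector is meant to provide.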

The Expense of Seeing: Attaining Trustworthy Multimodal Reasoning Within the Monolithic Paradigm

The rapid proliferation of Vision-Language Models (VLMs) is often framed as enabling unified multimodal knowledge discovery but rests on an under-examined assumption: that current VLMs faithfully synthesise multimodal data. We argue they often do not, and this gap reflects a trustworthiness problem in the dominant Vision Encoder-Projector-LLM paradigm. Rather than extracting grounded knowledge from visual inputs, state-of-the-art models frequently exhibit functional blindness, i.e., exploiting strong language priors to bypass severe visual representation bottlenecks. In this work, we challenge the conventional methodology of multimodal evaluation, which relies on data ablation or new dataset creation and therefore conflates dataset biases with architectural incapacity. We propose an information-theoretic departure: the Modality Translation Protocol, designed to quantify what we call the Expense of Seeing. By translating semantic payloads rather than ablating them, we formulate three novel metrics -- the Toll (ToS), Curse (CoS), and Fallacy (FoS) of Seeing -- culminating in the Semantic Sufficiency Criterion (SSC). Furthermore, we hypothesise a Divergence Law of Multimodal Scaling: as the underlying language engines scale to unprecedented reasoning capabilities, the penalty of the visual knowledge bottleneck may increase rather than diminish. We argue the community should move beyond "multimodal gain" as a primary evaluation target. By elevating the SSC from a passive diagnostic constraint to an active architectural blueprint, we provide a foundation for guiding the next generation of AI systems toward genuine multimodal reasoning.

GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

We introduce a new approach to high-fidelity 3D scene reconstruction from multi-view RGB images that tightly couples reconstruction with a strong generative 3D prior. We cast scene reconstruction as conditional 3D generation over a set of spatially-localized, overlapping chunks that together tile the scene, scaling generation to large scene extents. Crucially, we inherit the fidelity and completeness of state-of-the-art generative shape models -- we use Trellis.2 as an example -- which we generalize to the scene level. To this end, we propose a projection-based conditioning mechanism that lifts posed multi-view image features into a coherent 3D representation aligned with the generative model, independent of view ordering and spatially anchored to the scene, yielding high-fidelity, multi-view consistent generated geometry. This enables lifting the strong object-level prior of Trellis.2 to multi-view, scene-scale generation, producing faithful, editable PBR mesh reconstructions of indoor environments. As a result, we obtain high-fidelity results that outperform cutting-edge reconstruction methods by 16%.
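Tiling a large scene with spatially localized, overlapping chunks reduces, along each axis, to choosing chunk origins that cover the scene extent with some overlap. A minimal 1-D sketch, assuming a fixed chunk size and a minimum overlap (both hypothetical parameters; the abstract does not specify the paper's chunking scheme):

```python
import math


def chunk_origins(extent, chunk, overlap):
    """Start positions of equal-size chunks covering [0, extent] on one axis.

    Adjacent chunks overlap by at least `overlap` (hypothetical parameters).
    """
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be in [0, chunk)")
    if extent <= chunk:
        return [0.0]
    stride = chunk - overlap  # largest step that still keeps the requested overlap
    n = math.ceil((extent - chunk) / stride) + 1
    # Spread the n chunks evenly so the last one ends exactly at the boundary;
    # the actual step is <= stride, so the actual overlap is >= the requested one.
    step = (extent - chunk) / (n - 1)
    return [i * step for i in range(n)]


print(chunk_origins(10.0, 4.0, 1.0))  # prints [0.0, 3.0, 6.0]
```

For a 3-D scene, the same origins can be computed per axis and combined with `itertools.product`.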

05

PRODUCT HUNT


Product Hunt - May 25, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

Pi Coding Agent
The coding-agent harness you can make your own

Forum
Dedicated space for Facebook groups

Yansu
AI that learns how you work and turns it into software

Rixx
The Perplexity alternative that organizes your research

Orchestria
AI music engine with granular stem control

MashuPack
Turn codebases into a clean file for Claude and ChatGPT

LLMTest
Use the right LLMs in your apps. Set up fallbacks. Be happy.

Fred
AI-orchestrated UX research with behavioural tracking

The Incident Challenge
Production Debugging Games for Software Engineers

own.page
Make your own personal website with bento tiles

Tiny CV
Resume builder that fits on one page

tweet.md
X posts as clean Markdown

tldx
Fast CLI to bulk-check domains via RDAP & MCP

Databerry
Track all your business data in a single dashboard

Unabyss
MCP-native self-updating context layer for your AI

Supaboard 3.0
AI data analysts that understand your business

ModelHub
The missing menu bar app for local LLMs on Mac.

Stitch 3.0 by Google
Generate and iterate UI screens with AI on a live canvas

Freu AI
Automate any Mac app with $0 recurring run cost

DockFlow
Save, switch, and automate Dock layouts for every workflow

DynamicNotch
Dynamic island for macOS

Runway Agent
Generate edited, sound-designed videos via chat

Edgee Fallback Models
Claude Code that never stops

WhatCable
Know what your USB-C cable can really do

Google Antigravity CLI
Run coding agents directly from your terminal

note.md
Local-first, Markdown-based workspace for research writing

Bulkmark
Transform your Twitter/X Bookmarks into real knowledge

Forsy
Authentic human signal from real agent workflows

Vibedock
Toggle Claude Code MCP servers from your menu bar

Spantop
Turn any Mac into a real second monitor

Finderlock
Lock Mac files in Finder with Touch ID & AES-256

RetroMac
Turn your Mac into a time machine.

SignalLEMO - Ai Outreach Made Simple
AI-powered lead outreach for field service contractors

Coca 2.0
Keep Your Mac and Apps Awake!

Area Contrast Checker
Drag, Select, Know. A new way to check A11y contrast

Memdex
Turn every AI conversation into reusable local memory

Kosshi
Simple, fast outliner for Mac and iPhone.

Command A+
Cohere’s open enterprise workhorse

Shroomie
AI-powered news made fun and habit-forming

Smart Miles
Automatic trip tracking for tax-ready exports

Faby
Your virtual coworker with its own computer living in Slack

iPromise
Bring "Body Doubling" to your Mac notch

TestSprite 3.0
Let a fleet of parallel agents test your app in minutes

DCP
Give your AI agents encrypted permissions and keys

Auto Posts
Schedule social posts, Telegram messages + more

Nugget AI
Turn customer interviews into your product roadmap

Prosed
Go from newsletters & podcasts to published manuscript

General Compute
AI models that run on an inference cloud optimized for speed

buildpipe
Compose, run and automate multi-step AI developer workflows

Zero Assist
Real-time AI cheating detection for technical interviews
06

TECHMEME


Techmeme - May 25, 2026

Techmeme Digest: Major tech headlines and industry conversations.

SoftBank stock jumped 4.6% to a record high on Monday, spurred by hopes of big returns from the company's stakes in OpenAI and SB Energy Corp if they go public (Bloomberg)
Source: Techmeme · Published: May 25, 2026

Bloomberg: SoftBank Group Corp. shares climbed to a record high, spurred by hopes of big returns from the Japanese investor's stakes in OpenAI …

Star Citizen, a video game in development since 2012, has reached $1B in lifetime funding; it remains in alpha and does not have a confirmed release date (Jennifer Maas/Variety)
Source: Techmeme · Published: May 25, 2026

Jennifer Maas / Variety: Cloud Imperium Games marked a major milestone Sunday (May 24) as the game developer's open world massively multiplayer online space game …

Tether plans to launch GELT, an "official" stablecoin representing the Georgian lari, with the support of Georgia's government in an unusual partnership (Reuters)
Source: Techmeme · Published: May 25, 2026

Reuters: Tether, the world's biggest stablecoin issuer, plans to launch a crypto token representing the Georgian lari with the support …

Sources: Wix is expected to cut ~1,000 jobs in the coming months, or ~20% of its workforce, after weak Q1 earnings and a ~50% collapse of its stock in 2026 (Sophie Shulman/CTech)
Source: Techmeme · Published: May 25, 2026

Sophie Shulman / CTech: The company will reduce roughly 20% of its workforce after a steep stock decline and rising AI-related costs.

A section of the Pope's encyclical describing AI's unpredictability suggests influence from Anthropic, whose co-founder Christopher Olah attended the unveiling (Washington Post)
Source: Techmeme · Published: May 25, 2026

Washington Post: In “Magnifica humanitas,” he fires a broadside against AI companies, warning of the technology's dangers in the same way Pope Francis did about climate change.

In his ~43,000-word encyclical, the Pope urged governments to slow down AI development and decried "new forms of slavery" in AI and tech supply chains (Joshua McElwee/Reuters)
Source: Techmeme · Published: May 25, 2026

Joshua McElwee / Reuters: Pope Leo urged governments to slow down the development of AI systems in his first major document, released on Monday …

More than 5,500 GitHub repositories were infected with malware in a supply chain attack, dubbed Megalodon, on May 18 that relies on automated commits (Ionut Arghire/SecurityWeek)
Source: Techmeme · Published: May 25, 2026

Ionut Arghire / SecurityWeek: Fake automated commits injected GitHub Actions workflows containing payloads to steal credentials, CI secrets, keys, and tokens.

Pope Leo XIV presents Magnifica humanitas, his encyclical on AI, calling for AI regulation, protection for children against hypersexualized AI images, and more (New York Times)
Source: Techmeme · Published: May 25, 2026

New York Times: The document marks a powerful foray by the leader of the Roman Catholic church into the debate about the misuse or overuse of artificial intelligence.

Sources: Meta, Google, and Amazon execs met Vatican officials on April 29, as part of a quiet lobbying push ahead of Pope Leo XIV's first AI encyclical (Océane Herrero/Politico)
Source: Techmeme · Published: May 25, 2026

Océane Herrero / Politico: As Leo XIV prepares his first encyclical, technology firms and Western diplomats have worked to make their case inside the Vatican.

A growing number of execs are creating AI digital twins to manage tasks; Reid Hoffman says "Reid AI" has delivered 75+ addresses and presentations since 2024 (Joann S. Lublin/Wall Street Journal)
Source: Techmeme · Published: May 25, 2026

Joann S. Lublin / Wall Street Journal: In a glimpse into the future, a small number of executives have created AI replicas to take over some of their responsibilities

A look at Xiaomi's AI push to future-proof its hardware and EV ecosystem, as it recently committed ~$8.8B in AI investments over the next three years (Ben Jiang/South China Morning Post)
Source: Techmeme · Published: May 25, 2026

Ben Jiang / South China Morning Post: Xiaomi is betting on artificial intelligence to future-proof its sprawling hardware empire, pouring massive resources into open-source models …

A profile of Meta CTO Andrew "Boz" Bosworth, a top lieutenant of Mark Zuckerberg who is leading the gargantuan effort to transform Meta into an AI-first company (Meghan Bobrowsky/Wall Street Journal)
Source: Techmeme · Published: May 25, 2026

Meghan Bobrowsky / Wall Street Journal: Andrew Bosworth, Meta's outspoken chief technology officer, has a new mission: transforming the company's workforce using AI

A look at DeepSeek's model optimization to reduce HBM use, potentially enabling domestic memory, ASIC, and CPU makers to create a Chinese AI hardware ecosystem (@bookwormengr)
Source: Techmeme · Published: May 25, 2026

@bookwormengr: Have you ever wondered, how DeepSeek may make money, and lot of it? They didn't come up with competitive coding plans like GLM, MoonShot and MiniMax.

The UK, outpaced by the US and China in AI, is turning to experimental technologies like neuromorphic computing in search of computing sovereignty (Charles Clover/Financial Times)
Source: Techmeme · Published: May 25, 2026

Charles Clover / Financial Times: Military capability depends increasingly on data centres. Now governments outpaced in AI are looking to experimental technologies.

Huawei says it aims to make 1.4nm chips by 2031 using its "LogicFolding" tech, which is based on its new Tau Scaling Law intended to bypass Moore's Law limits (Nikkei Asia)
Source: Techmeme · Published: May 25, 2026

Nikkei Asia (TAIPEI): Huawei Technologies on Monday said it has found a new way to design chips to bring its semiconductor capabilities close …

07

STARTUP ARCHIVE


Startup News - May 25, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction.”

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

“Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot.”

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice.”

Paul Graham on why starting with a “small, intense fire” is the key to startup growth
Source: Startup · Published: Mar 20, 2026

"You have to know who those first users are and how you're going to get them."

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth.”

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?”

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations”

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over.”

Spencer Rascoff: “I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

“If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive.”

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right”

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT


Solidot News - May 25, 2026

Solidot Feed: Highlighting essential tech & open-source news.

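Scientists overturn a fundamental principle of aerodynamics

For decades, a key principle for reducing air drag has been that surfaces must be smooth. A research team at Tohoku University in Japan has shown for the first time that simply applying distributed micro-roughness (DMR) can reduce air drag by up to 43.6%. DMR is an extremely fine, irregular surface roughness invisible to the naked eye. Using the 1m-MSBS system, the team precisely measured the drag coefficients of smooth and DMR-coated surfaces and found that the DMR-coated surfaces had lower drag coefficients than the smooth ones.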
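Why a lower drag coefficient translates directly into lower drag: in the standard drag equation, force scales linearly with the coefficient, so at the same air density, speed, and reference area, a 43.6% lower drag coefficient means 43.6% less drag. This assumes the reported figure is a reduction in the drag coefficient under otherwise identical test conditions, which the summary implies but does not state outright.

```latex
F_D = \tfrac{1}{2}\,\rho\,v^2\,C_D\,A
\quad\Longrightarrow\quad
\frac{F_D^{\mathrm{DMR}}}{F_D^{\mathrm{smooth}}}
  = \frac{C_D^{\mathrm{DMR}}}{C_D^{\mathrm{smooth}}}
  = 1 - 0.436 = 0.564
```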

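Scientists solve the mystery of how tobacco makes nicotine

Nicotine is the compound that makes tobacco addictive, and humans have used it for more than 10,000 years. Yet after decades of research, scientists still did not fully understand how the tobacco plant synthesizes the nicotine molecule. According to a study published in Nature Communications, scientists have now solved this puzzle. The team found that nicotine is initially bound to a glucose molecule, which supplies energy to speed the assembly of nicotine's basic building blocks; the glucose is then removed at the final step. The paper's first author, Benjamin Schwabe, also determined the precise structures of two plant enzymes, NaGR and NicG, which help assemble the nicotine molecule from smaller fragments. The findings open the possibility of using tobacco plants to produce safer drugs and vaccines.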

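Japanese voice actor sues to have TikTok remove videos that imitate his voice with AI

Popular Japanese voice actor Kenjiro Tsuda has filed a lawsuit in the Tokyo District Court demanding that TikTok's operator remove videos that someone made and published using generative AI to imitate his voice without permission. This may be the first lawsuit over the unauthorized use of a person's voice by generative AI. Tsuda's "magnetic bass voice" is considered his trademark; he is known for voicing Kento Nanami in the anime Jujutsu Kaisen and Hyakunosuke Ogata in Golden Kamuy, among other roles. According to the complaint, the person who posted the videos has not been identified. Between July 2024 and September 2025, the account posted 188 videos narrated in a voice imitating Tsuda's, covering urban legends, mysteries, and trivia. Under TikTok's payout program, the account earned 500,000 to 750,000 yen per month. The defendant argued that the narration was an "ordinary male voice" with no distinctive manner of speaking and did not resemble Tsuda's. The account holder explained that the videos were made by having an AI learn a friend's voice, and maintained that this was not illegal.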

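Climate change threatens plant species worldwide

According to a study published in Science, climate change is increasing the extinction risk of plant species. The researchers analyzed more than 67,000 species of vascular plants, that is, plants with complex internal conducting tissues that transport water and nutrients; roughly 300,000 to 400,000 vascular plant species are known worldwide. The study found that 7% to 16% of vascular plants could lose more than 90% of their habitat, putting them at very high risk of extinction. A plant's habitat is not a single spot on a map but the full set of conditions it needs to survive: temperature, rainfall, soil, land use, and geographic features such as shade. The study shows that climate change is shrinking the combinations of conditions suitable for plants, leaving fewer and fewer places where everything a plant needs exists at once. Plants underpin most terrestrial ecosystems: they store carbon, stabilize soil, provide habitat for wildlife, and supply food, timber, and medicine. Changes in plant diversity have knock-on effects for both nature and people.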

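Firefox adds support for the Web Serial API and partners with Adafruit

The newly released Firefox 151 adds support for the Web Serial API, which lets websites use JavaScript to read from and write to serial devices such as USB and Bluetooth devices. Mozilla says most people will never use the API; its main audience is developers, who can use the browser to communicate directly with compatible hardware. Mozilla also announced a partnership with Adafruit, the well-known open-source hardware platform, whose browser-based hardware workflows now run directly in Firefox. With an Adafruit ESP32-S development board, for example, Web Serial can show messages sent by web-page code directly on the device, or let the handheld device change a web page's CSS properties directly.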

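Global wind and solar generation surpassed gas-fired generation in April

In April, global wind and solar generation exceeded gas-fired generation. According to an analysis by the energy think tank Ember, wind and solar supplied 22% of global electricity in April, versus 20% for gas. Combined wind and solar output reached a record 531 TWh, 54 TWh more than the 477 TWh generated from gas. Five years earlier, in April 2021, gas generation was 476 TWh, almost exactly the same as today, but wind and solar together produced only 245 TWh, less than half of today's level. April is spring in the Northern Hemisphere, when winds are typically strong, so wind generation generally rises in April. Ember's Global Electricity Review concluded that in 2025 wind and solar were enough to meet the growth in global electricity demand.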
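The April figures can be cross-checked with a few lines of arithmetic. Only the 531, 477, and 245 TWh values and the 22% and 20% shares come from the summary; the implied-total calculation is an illustrative assumption and is only approximate, because the shares are rounded.

```python
# Monthly generation figures quoted above (TWh), from Ember's analysis as summarized here.
wind_solar_apr_2026 = 531
gas_apr_2026 = 477
wind_solar_apr_2021 = 245

gap = wind_solar_apr_2026 - gas_apr_2026          # lead of wind+solar over gas
growth = wind_solar_apr_2026 / wind_solar_apr_2021  # five-year growth factor
# Total generation implied by each rounded share; the two should roughly agree.
total_from_ws = wind_solar_apr_2026 / 0.22
total_from_gas = gas_apr_2026 / 0.20
print(gap, round(growth, 2), round(total_from_ws), round(total_from_gas))  # prints 54 2.17 2414 2385
```

The growth factor above 2 confirms that the 2021 output was less than half of today's, and the two implied totals (about 2,400 TWh) agree to within the rounding of the shares.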

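Star Citizen crowdfunding passes $1 billion

Star Citizen, in development for 14 years with no release date, has raised more than $1 billion, reaching $1,003,408,183. Led by Chris Roberts, creator of Wing Commander, Star Citizen aims to revive the space flight simulation genre, letting players explore, trade, and fight across a vast universe. It was successfully crowdfunded on Kickstarter in 2012, with delivery originally planned for 2014. After the Kickstarter campaign ended, however, the team kept raising money on its official website, much of it through sales of in-game virtual items such as various ship models. It reached $200 million in 2018, passed $600 million five years later, passed $700 million in May 2024, and passed $1 billion two years after that. Star Citizen is arguably the highest-budget AAA game ever made.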

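Study recommends about 10 hours of exercise per week to protect cardiovascular health

Researchers at Macao Polytechnic University published a report in the British Journal of Sports Medicine concluding that adults should aim for about 560 to 610 minutes (roughly 10 hours) of exercise per week to substantially lower the risk of cardiovascular disease such as heart attack, stroke, or heart failure. The researchers analyzed data on more than 17,000 participants in the UK Biobank health database. Participants wore accelerometers for a week to record their daily activity levels, and their maximal oxygen uptake was measured and estimated through a cycling test. The participants were then followed for about 8 years to track disease incidence. The core finding: following the WHO recommendation (at least 150 minutes of exercise per week) lowered cardiovascular risk by about 8% to 9%, while 560 to 610 minutes per week lowered it by more than 30%. The study notes that people with lower baseline fitness often need more exercise to gain the same health benefits as fitter people. The authors stress that although 150 minutes a week remains an important entry-level target, people seeking the best health outcomes should aim for more.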
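The recommended range is easier to compare with the WHO baseline once converted to hours and to multiples of the 150-minute minimum. A short sketch of that arithmetic; only the 150, 560, and 610 minute figures come from the summary, and `weekly_target` is a hypothetical helper name.

```python
WHO_MINIMUM_MIN = 150  # WHO minimum, minutes of exercise per week


def weekly_target(minutes):
    """Return (hours per week, multiple of the WHO minimum), rounded to 0.1."""
    return round(minutes / 60, 1), round(minutes / WHO_MINIMUM_MIN, 1)


for minutes in (560, 610):
    print(minutes, weekly_target(minutes))
# prints:
# 560 (9.3, 3.7)
# 610 (10.2, 4.1)
```

So the study's target is roughly 9 to 10 hours per week, about 3.7 to 4.1 times the WHO minimum.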

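Report says individuals bear at least 80% of the responsibility for their health in old age

The Oxford Longevity Project published a report, Living Longer, Better, arguing that individuals bear at least 80% of the responsibility for their health in old age. The report says people have far more control over their own lifespan than is commonly believed. Its conclusion draws on multiple studies finding that at least 75% of human lifespan is determined by environmental and modifiable lifestyle factors. One of those studies used data from nearly 500,000 UK Biobank participants and found that environmental exposures and habits affect premature death and biological aging far more than genetics. The report recommends avoiding processed foods, giving up alcohol entirely, getting enough sleep, not eating after 6:30 p.m., and cultivating a so-called "non-meat mindset". On alcohol it is especially blunt: alcohol is toxic, so don't drink it. Critics say the conclusions are oversimplified, since individuals have limited control over their choices when it comes to poverty, pollution, and health care.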

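Trump administration requires green card applicants to leave the U.S. to apply

After targeting illegal immigration, the Trump administration has turned to legal immigration. U.S. Citizenship and Immigration Services announced that green card applicants must leave the United States to apply. Leaving, however, means interrupting study or work and risking being denied re-entry, which in effect amounts to a form of self-deportation. It is unclear whether the new policy will follow the path of the $100,000 H-1B visa fee announced in September 2025, which was initially said to apply to all new visa applicants but was later sharply narrowed in scope.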

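Caltech could lose control of JPL

NASA plans to put the contract to operate JPL out to open competition for the first time. JPL has been managed by Caltech since its founding in the 1930s; its current contract with NASA expires in 2028, at which point Caltech could lose control of the lab for the first time. Caltech has been preparing for a possible transition since last summer and is not surprised by the move. JPL has long been responsible for robotic exploration of Mars and other deep-space destinations. It operates as a federally funded research and development center (FFRDC) and enjoys a degree of independence from other NASA centers. Operation by an organization other than Caltech could have major consequences, because JPL and Caltech are closely intertwined.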

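Zuckerberg defends monitoring employees

The labor advocacy group More Perfect Union released a six-minute recording of Mark Zuckerberg answering employee questions about device monitoring at the end of last month. Meta told employees last month that it would use a monitoring tool called the Model Capability Initiative to track their mouse clicks and keystrokes, with the aim of collecting data to train AI models. In his answer, Zuckerberg defended the monitoring: if you want to train a model's coding ability, having internal employees build tools or solve tasks to teach the model how to write code can produce a leap in its coding capability, at a pace rivals cannot match because they do not have thousands of top engineers. "That's just one example. Another thing our systems need to be really good at is 'using a computer'. And the most effective way to get a system to use a computer well is to have it watch extremely smart people use a computer. That's basically the core of what we're doing right now." Zuckerberg said Meta would not be monitoring employees' work behavior and that MCI data would not be used in performance reviews. Because of the EU's GDPR, Meta employees in Europe reportedly do not take part in the program. Meta is not the only tech company sourcing AI training data from its staff; Microsoft and xAI also use internal employees to generate and refine training datasets.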

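Valorant's anti-cheat now shuts down cheaters' DMA hardware

Non-players may not know that today's high-end cheating tools have moved into hardware and are expensive, sometimes costing far more than an entire PC. These tools, known as DMA cards or DMA cheats, use hardware to bypass traditional game anti-cheat systems. Game developers are working to counter them, the latest example being Riot Games. After its latest update, Vanguard, the kernel-level anti-cheat system used by Riot's online FPS Valorant, can force-enable the IOMMU to block DMA cheats and make the DMA hardware stop working. Vanguard can now block most DMA card firmware disguised as SATA or NVMe devices: it abruptly triggers an IOMMU restart warning in-game, after which the DMA firmware becomes completely unusable, even if the game is no longer running or has been uninstalled. The only fix is to reinstall Windows. Riot Games mocked cheaters on social media, saying their $6,000 DMA cheats had been turned into junk.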

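Woz tells graduates they have real intelligence

Apple co-founder Steve Wozniak did something other commencement speakers have not managed: he talked about AI and drew cheers from graduates rather than boos. "You have AI: actual intelligence," Wozniak said. "Going into my views on AI in depth would take a long time, but we have always been trying to create a brain. Can we copy a program a trillion times and make it work like a brain? AI is one of those attempts." Looking back on his time at Apple, Wozniak offered advice to graduates about to start their careers: "Try to think differently. Don't follow the rules and take the same path as everyone else. Ask yourself: can I do something different?"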

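Linus Torvalds on AI

Speaking at Open Source Summit North America, Linux creator Linus Torvalds said AI tools are reshaping kernel development, but insisted that AI is just a good tool and will not fully replace programmers. Torvalds said the number of commits in the last two kernel releases rose by 20%. He first assumed developers were excited by the version number jumping from 6.x to 7.x, but it turned out the cause was the marked improvement in AI-assisted coding tools over the past six months. He acknowledged that AI tools lower the barrier for contributors, but said their real impact is social rather than technical; one example is the flood of duplicate bug reports on the security mailing list, to which the kernel has responded with new rules. Torvalds also urged security researchers not to disclose exploits prematurely: the kernel recently had four privilege-escalation bugs that researchers published before maintainers had even been notified, which he attributed to a desire for attention. He does not believe closed source solves security problems; it is actually worse, because AI cannot help you fix the bugs. Torvalds said maintenance depends on people rather than code. As the top-level maintainer, his job is not writing code but working with people; he does not use AI for that and advises others not to either. His own career shows how better tools have raised programmer productivity: he started by typing in machine code by hand, then moved to assemblers, then compilers, and now AI-assisted programming. AI is changing programming, he said, but not its essence. Developers still need to understand what their tools produce. For any long-running system, "you have to understand not just the instructions but the end result, because that's the only way you can maintain it long term." AI cannot replace human judgment, community norms, or a deep understanding of the systems being built; "software is very complex, and the only really effective way to manage the complexity of complex infrastructure is open source," while AI is just one more tool in the programmer's toolbox.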

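GitHub fights for survival

Eight years after its acquisition by Microsoft, GitHub, the largest code-hosting platform, is fighting for survival amid frequent outages, security problems, and mounting pressure from competitors. In recent weeks GitHub has suffered several serious outages, and 3,800 internal repositories were stolen after an employee installed a malicious extension in VS Code. Current and former GitHub employees described in interviews a company struggling with a lack of leadership and competitive pressure. After CEO Thomas Dohmke left in summer 2025, Microsoft did not appoint a new CEO; instead, the leadership team reports to CoreAI, run by former Meta engineering chief Jay Parikh, whom CEO Satya Nadella personally recruited to help lead the company's shift to AI. Parikh is unpopular internally, and it was he who decided not to name a new GitHub CEO. Many GitHub employees have followed Dohmke to his new startup, Entire. Executives have also kept leaving over the past few months, including senior vice president Jared Palmer and former chief revenue officer Elizabeth Pemmerl. Current employees say GitHub now exists in name only and that everything belongs to Microsoft.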

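Sergey Brin donates $500,000 to fight a tax on overpaid CEOs

Google co-founder Sergey Brin, who has moved from Silicon Valley to Nevada, donated $500,000 to a San Francisco political action committee opposing a proposal known as the "Overpaid CEO Tax", which San Francisco voters will decide on June 2. He has previously given tens of millions of dollars to oppose a proposed California billionaire tax that is expected to go before California voters this November. The Overpaid CEO Tax would compute a pay ratio between executives and ordinary employees based on the compensation of a company's employees worldwide. The Chinese Progressive Association, which supports the measure, says it is needed to "make sure the richest corporations pay their fair share of taxes".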
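The summary says the measure computes an executive-to-employee pay ratio from a company's worldwide payroll, but not which statistic it uses. The sketch below assumes, purely for illustration, the ratio of the top executive's pay to the median employee's pay; `pay_ratio` and the sample payroll are hypothetical, and the measure's actual formula, thresholds, and tax rates are not given here.

```python
from statistics import median


def pay_ratio(top_exec_pay, employee_pays):
    """Top-executive pay divided by median employee pay (assumed definition)."""
    if not employee_pays:
        raise ValueError("need at least one employee")
    return top_exec_pay / median(employee_pays)


# Hypothetical company: CEO paid $12M; worldwide staff pay in dollars.
staff = [45_000, 60_000, 80_000, 120_000, 250_000]
print(pay_ratio(12_000_000, staff))  # prints 150.0
```

Using the median rather than the mean keeps a few highly paid staff from pulling the "typical employee" figure up and shrinking the ratio.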

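Meta censors dissidents' accounts at Saudi Arabia's request

Since April 30, 2026, Meta has, at the Saudi government's request, blocked within Saudi Arabia the Facebook accounts of the NGOs ALQST for Human Rights and Democratic Diwan, Saudi researcher Abdullah Alaoudh, and human rights activist Yahya Assiri. Meta also geo-blocked an academic's account at the UAE's request. Since March 2026, more than 100 Facebook pages and Instagram accounts have been restricted. Saudi Arabia has also asked X to geo-block the accounts of prominent Saudi activists; X has not yet complied.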

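Brains removed from the body are being used for drug testing

A day ago this brain was inside a living person. Now, hours after its owner's death, it rests on a cart covered in tubing that pumps liters of blood substitute and other fluids into the organ, delivering oxygen and removing metabolic waste. Most of its core functions are intact, but its electrical activity has been suppressed with anesthetics, leaving the brain suspended somewhere between life and death. As it metabolizes an experimental drug, sensors record its responses in real time, capturing hundreds of data points on cells, proteins, and physiology. After 24 hours it will be cut into hundreds of pieces for deeper study. It is one of more than 700 brains that the biotech startup Bexorg has maintained and studied with its brain-support device, BrainEx, to understand how potential therapies act in brains affected by neurodegenerative diseases such as Parkinson's, Alzheimer's, or amyotrophic lateral sclerosis (ALS). Bexorg can biopsy the brains to learn how long a drug stays in cells, whether it reaches its molecular target, and whether it causes side effects. Bexorg argues that its system offers drug-testing conditions closer to reality than lab animals or cells in a dish. The company had kept a low profile but is now scaling up and has invited journalists to tour its lab, seeking to reassure the public that brains kept outside the body do not cross ethical red lines and carry no risk of regaining consciousness.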

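Waymo pauses Atlanta service after a driverless car drove into floodwater

Because its driverless cars cannot yet handle flooded roads, Waymo has paused its robotaxi service in Atlanta. On Wednesday, one of its driverless taxis drove onto a flooded road and was stuck for about an hour before being towed away. Waymo said it had paused service in Atlanta while it works on a fix. It had earlier paused service in San Antonio, Dallas, and Houston, Texas, because of severe weather. Waymo said the rain in Atlanta was so heavy that flooding began before the National Weather Service had issued any flash flood watch, warning, or advisory.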

09

APP STORE RANK
