TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0873
FRI, MAY 22, 2026
Discover the best information organized by OrangeBot.AI
The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK
AI News Summary

May 22, 2026

Here is a summary of today's main news events.

Global Stock Markets Rally, Fueled by AI and Tech Sector

Stock markets in the U.S., Europe, and Asia saw gains, largely driven by continued investor enthusiasm for the technology sector, particularly artificial intelligence. In the U.S., the Dow Jones reached a record high, with companies like IBM leading the way after news of government investment in the industry.

AI Generates Excitement and Increasing Concern

While AI is fueling market growth and major companies like SpaceX and OpenAI are preparing for public offerings, concerns are mounting. The White House is considering more oversight of the industry, and California's governor is responding to public fears about AI's impact on jobs. In a notable development, some AI creators have warned that their systems are beginning to produce flawed or dangerous code.

Commodity Prices Mixed Amid Geopolitical and Economic Uncertainty

Oil prices were volatile, fluctuating as markets awaited a breakthrough in U.S.-Iran peace talks. Natural gas prices fell after a report showed a larger-than-expected inventory build. Meanwhile, gold prices were on track for a weekly loss, pressured by a strong U.S. dollar, which held steady near multi-week highs.

Major Corporate and Legal Developments Unfold

Major social media companies reached a settlement with a Kentucky school district, avoiding a trial over accusations that their platforms were intentionally designed to be addictive to young people. Separately, three individuals involved in a pre-IPO fund fraud case were ordered to pay nearly $189 million in forfeiture and restitution.

International Focus on Turkish Markets and European Politics

In Turkey, markets experienced panic and capital outflows after a court ousted a top political rival of President Erdoğan. In Europe, Paris signaled support for a joint defense project to address a capabilities gap with Russia, while in the U.K., the ruling Labour Party is navigating internal leadership challenges and strains on public finances.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS

Hacker News - May 22, 2026

Hacker News Feed: Highlighting key posts and discussions.

Slumber a TUI HTTP Client

(slumber.lucaspickering.me)

12141
Cleve Moler has died

(www.mathworks.com)

17915
The IBM-ification of Google?

(zeroshot.bearblog.dev)

185141
BBEdit 16

(www.barebones.com)

311100
Shunning AI is the human choice

(www.thehandbasket.co)

361521
Vivaldi 8.0

(vivaldi.com)

361241
03

HUGGINGFACE

HuggingFace - May 22, 2026

TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation

Public transit route planning traditionally depends on structured map infrastructure and complex routing engines, and no existing dataset supports training models to bypass this dependency. We present TransitLM, a large-scale dataset of over 13 million transit route planning records from four Chinese cities covering 120,845 stations and 13,666 lines, released as a continual pre-training corpus and benchmark data for three evaluation tasks with complementary metrics. Experiments show that an LLM trained on TransitLM produces structurally valid routes at high accuracy and implicitly grounds arbitrary GPS coordinates to appropriate stations without any explicit mapping. These results demonstrate that transit route planning can be learned entirely from data, enabling end-to-end, map-free route generation directly from origin-destination information. The dataset and benchmark are available at https://huggingface.co/datasets/GD-ML/TransitLM, with evaluation code at https://github.com/HotTricker/TransitLM.

107
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

Reinforcement learning from verifiable rewards (RLVR) has emerged as a central technique for improving the reasoning capabilities of large language models. Despite its effectiveness, how response-level rewards translate into token-level probability changes remains poorly understood. We introduce a discriminator view of RLVR updates, showing that the policy-gradient update direction implicitly acts as a linear discriminator over token-gradient vectors and thereby determines which token probabilities are increased or decreased during learning. Under standard sequence-level RLVR, this discriminator is constructed from positive- and negative-side centroids formed by advantage-weighted averaging of token-gradient vectors. However, such centroid construction can be dominated by shared high-frequency patterns, such as formatting tokens, diluting sparse yet discriminative directions that better distinguish high-reward responses from low-reward ones. To address this limitation, we propose DelTA, a discriminative token credit assignment method that estimates token coefficients to amplify side-specific token-gradient directions and downweight shared or weakly discriminative ones. These coefficients reweight a self-normalized RLVR surrogate, making the effective side-wise centroids more contrastive and thereby reshaping the RLVR update direction. On seven mathematical benchmarks, DelTA outperforms the strongest same-scale baselines by 3.26 and 2.62 average points on Qwen3-8B-Base and Qwen3-14B-Base, respectively. Additional results on code generation, a different backbone, and out-of-domain evaluations further demonstrate the generalization ability of DelTA.
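The centroid view described in this abstract can be illustrated with a small sketch. It assumes every token has a gradient vector and inherits the advantage of its response; the function names and toy vectors are hypothetical, and this is not the DelTA reweighting itself. It only checks the identity that an advantage-weighted sum of token gradients equals a weighted difference between a positive-side and a negative-side centroid, so its dot product with a token gradient acts as a linear score.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def weighted_centroid(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted average of vectors (weights are assumed positive and non-empty)."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total for d in range(dim)]


def side_centroids(grads: Sequence[Vector],
                   advantages: Sequence[float]) -> Tuple[float, Vector, float, Vector]:
    """Split tokens by advantage sign; return (W_pos, c_pos, W_neg, c_neg).

    Assumes at least one token on each side.
    """
    pos = [(g, a) for g, a in zip(grads, advantages) if a > 0]
    neg = [(g, -a) for g, a in zip(grads, advantages) if a < 0]
    w_pos = sum(a for _, a in pos)
    w_neg = sum(a for _, a in neg)
    c_pos = weighted_centroid([g for g, _ in pos], [a for _, a in pos])
    c_neg = weighted_centroid([g for g, _ in neg], [a for _, a in neg])
    return w_pos, c_pos, w_neg, c_neg


def update_direction(grads: Sequence[Vector], advantages: Sequence[float]) -> Vector:
    """Advantage-weighted sum of token-gradient vectors."""
    dim = len(grads[0])
    return [sum(a * g[d] for g, a in zip(grads, advantages)) for d in range(dim)]


def token_score(direction: Vector, grad: Vector) -> float:
    """Linear score: positive means the update raises this token's probability."""
    return sum(d * g for d, g in zip(direction, grad))


# Toy data (hypothetical): dimension 0 is shared by all tokens,
# dimension 1 separates high-reward from low-reward tokens.
grads = [[1.0, 0.5], [1.0, 0.0], [1.0, -0.5], [1.0, 0.0]]
advantages = [1.0, 1.0, -1.0, -1.0]

w_pos, c_pos, w_neg, c_neg = side_centroids(grads, advantages)
direction = update_direction(grads, advantages)
rebuilt = [w_pos * p - w_neg * n for p, n in zip(c_pos, c_neg)]
```

In this toy case the shared dimension cancels because both sides carry equal total weight; with unbalanced weights it would not, which is the kind of dilution the abstract describes.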

96
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

The rise of personal assistant agents, e.g., OpenClaw, highlights the growing potential of large language models to support users across everyday life and work. A core challenge in these settings is proactive assistance, since users often begin with underspecified requests and leave important needs, constraints, or preferences unstated. However, existing benchmarks rarely evaluate whether agents can identify and act on such hidden intents before they are explicitly stated, especially in sustained multi-turn interactions where user needs emerge gradually. To address this gap, we introduce π-Bench, a benchmark for proactive assistance comprising 100 multi-turn tasks across 5 domain-specific user personas. By incorporating hidden user intents, inter-task dependencies, and cross-session continuity, π-Bench evaluates agents' ability to anticipate and address user needs over extended interactions, jointly measuring proactivity and task completion in long-horizon trajectories that better reflect real-world use. Experiments show (1) proactive assistance remains challenging, (2) a clear distinction between task completion and proactivity, and (3) the value of prior interaction for proactive intent resolution in later tasks.

74
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

Multimodal Large Language Models (MLLMs) are increasingly deployed in human-facing roles where personality perception is critical, yet existing benchmarks evaluate this capability solely on numerical Big Five score prediction, leaving open whether models truly perceive personality through behavioral understanding or merely prejudge through superficial pattern matching. We address this gap with three contributions. (i) A new task: we formalize Grounded Personality Reasoning (GPR), which requires MLLMs to anchor each Big Five rating in observable evidence through a chain of rating, reasoning, and grounding. (ii) A new dataset: we release MM-OCEAN (1,104 videos, 5,320 MCQs), produced by a multi-agent pipeline with human verification, with timestamped behavioral observations, evidence-grounded trait analyses, and seven categories of cue-grounding MCQs. (iii) Benchmark and analysis: we design a three-tier evaluation (rating, reasoning, grounding) plus four sample-level failure-mode metrics: Prejudice Rate (PR), Confabulation Rate (CR), Integration-failure Rate (IR), and Holistic-grounding Rate (HR), and benchmark 27 MLLMs (13 closed, 14 open). The analysis uncovers a striking Prejudice Gap: across the field, 51% of correct ratings are not grounded in retrieved cues, and the Holistic-Grounding Rate spans only 0-33.5%. These findings expose a disconnect between getting the right score and reasoning for the right reason, charting a roadmap for grounded social cognition in MLLMs.

71
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Long-context inference in large language models is bottlenecked by the quadratic cost of full attention. Existing efficient alternatives often rely either on native sparse training or on heuristic token eviction, creating an undesirable trade-off among efficiency, training cost, and accuracy. In this work, we show that full-attention LLMs are already intrinsically sparse and can be transformed into highly sparse models with only minimal adaptation. Our approach is built on three observations: (1) only a small subset of attention heads truly requires full long-context processing; (2) long-range retrieval is governed primarily by a low-dimensional subspace, allowing relevant tokens to be retrieved efficiently with a 16-dimensional indexer; and (3) the useful token budget is strongly query-dependent, making dynamic top-p selection more suitable than fixed top-k sparsification. Based on these insights, we propose RTPurbo, which retains the full KV cache only for retrieval heads and introduces a lightweight token indexer for sparse attention. By exploiting the model's intrinsic sparsity, RTPurbo achieves sparsification with only a few hundred training steps. Experiments on long-context benchmarks and reasoning tasks show that RTPurbo preserves near-lossless accuracy while delivering substantial efficiency gains, including up to a 9.36x prefill speedup at 1M context and about a 2.01x decode speedup. These results suggest that strong sparse inference can be obtained from standard full-attention training without expensive native sparse pretraining.
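The contrast between fixed top-k and dynamic top-p token selection in observation (3) can be sketched as follows. This is a generic illustration over one query's attention scores, not RTPurbo's indexer; the function names and the toy score lists are assumptions.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(scores: List[float], k: int) -> List[int]:
    """Keep the k highest-scoring tokens, regardless of how peaked the scores are."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def top_p_indices(scores: List[float], p: float) -> List[int]:
    """Keep the smallest set of tokens whose softmax mass reaches p."""
    probs = softmax(scores)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return sorted(kept)


# Two hypothetical queries over six cached tokens.
peaked = [10.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # one token dominates
flat = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]      # attention is spread evenly

peaked_k = top_k_indices(peaked, k=3)
flat_k = top_k_indices(flat, k=3)
peaked_p = top_p_indices(peaked, p=0.9)
flat_p = top_p_indices(flat, p=0.9)
```

With these inputs top-k always keeps three tokens, while top-p keeps a single token for the peaked query and all six for the flat one, so the budget follows the query.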

69
ACC: Compiling Agent Trajectories for Long-Context Training

Recent development of agents has renewed demand for long-context reasoning capacity of LLMs. However, training LLMs for this capacity requires costly long-document curation or heuristic context synthesis. We observe that agents produce massive trajectories when solving problems, invoking tools and receiving environment observations across many turns. The evidence needed to answer the original question is thus scattered throughout these turns, requiring integration of distant context segments. Nevertheless, standard agent SFT masks tool responses and only trains turn-level tool selection, creating a supervision blind spot where these scattered signals go unused. We propose Agent Context Compilation (ACC), which converts trajectories from search, software engineering, and database querying agents into long-context QA pairs that combine the original question with tool responses and environment observations gathered across multiple turns, training the model to answer directly without tool use. This makes the dependencies between the question and the evidence explicit, enabling direct supervision of long-context reasoning over distant segments without additional annotation. ACC is a simple but effective approach that can be combined with any existing long-context extension or training method, providing scalable supervised fine-tuning data. We validate ACC on long-range dependency modeling tasks through MRCR and GraphWalks, challenging benchmarks requiring cross-turn coreference resolution and graph traversal over extended contexts. Training Qwen3-30B-A3B with ACC achieves 68.3 on MRCR (+18.1) and 77.5 on GraphWalks (+7.6), results comparable to Qwen3-235B-A22B, while preserving general capabilities on GPQA, MMLU-Pro, AIME, and IFEval. Further mechanism analysis reveals that the ACC-trained model exhibits task-adaptive attention restructuring and expert specialization.
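The conversion this abstract describes, from a multi-turn agent trajectory to a single long-context QA pair, might look roughly like the sketch below. The `Turn` fields, the prompt layout, and the toy trajectory are hypothetical; the abstract does not specify ACC's actual data format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    tool_call: str      # what the agent asked a tool to do
    observation: str    # what the tool or environment returned


def compile_trajectory(question: str, turns: List[Turn], answer: str) -> Dict[str, str]:
    """Build one long-context QA example from a trajectory.

    Tool calls are dropped; observations from every turn are kept as context,
    so the model is trained to answer directly from the scattered evidence.
    """
    evidence = [f"[Observation {i}]\n{t.observation}" for i, t in enumerate(turns, start=1)]
    prompt = "\n\n".join(evidence + [f"Question: {question}"])
    return {"prompt": prompt, "answer": answer}


# Toy trajectory (hypothetical): the answer needs evidence from both turns.
turns = [
    Turn("open('notes.txt')", "The access code is stored in locker B."),
    Turn("open('lockers.txt')", "Locker B contains code 4417."),
]
example = compile_trajectory("What is the access code?", turns, "4417")
```

A real pipeline would also need to handle context-length limits and filtering of failed trajectories, which this sketch leaves out.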

50
PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects

Simulation-ready physical 3D assets have emerged as a promising direction owing to their broad applicability in downstream tasks. However, most existing 3D generation methods either neglect physical properties or are limited to a single asset category, e.g., rigid, deformable, or articulated objects. To address these limitations, we introduce PhysX-Omni, a unified framework for simulation-ready physical 3D generation across diverse asset types. Specifically, we develop a novel and efficient geometry representation tailored for Vision-Language Models, which directly encodes high-resolution 3D structures without compression, significantly improving generation performance. In addition, we construct the first general simulation-ready 3D dataset, PhysXVerse, covering diverse indoor and outdoor categories. Furthermore, to comprehensively and flexibly evaluate both generative and understanding capabilities in the wild, we propose PhysX-Bench, which encompasses six key attributes: geometry, absolute scale, material, affordance, kinematics, and function description. Extensive experiments with conventional metrics and PhysX-Bench show that PhysX-Omni performs strongly in both generation and understanding. Moreover, additional studies further validate the potential of PhysX-Omni for applications in simulation-ready scene generation and robotic policy learning. We believe PhysX-Omni can significantly advance a wide range of downstream applications, particularly in embodied AI and physics-based simulation.

35
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

Joint audio-visual reasoning is essential for omnimodal understanding, yet current multimodal large language models (MLLMs) still struggle when reasoning requires fine-grained evidence from both modalities. A central limitation is that explicit text-based chain-of-thought (CoT) compresses continuous audio-visual signals into discrete tokens, weakening temporal grounding and shifting intermediate reasoning toward language priors. We argue that a unified latent space is a better medium for such reasoning because it preserves dense sensory information while remaining compatible with autoregressive generation. Based on this insight, we propose LatentOmni, a cross-modal reasoning framework that interleaves textual reasoning with audio-visual latent states. LatentOmni introduces feature-level supervision to align latent reasoning states with task-relevant sensory features and uses Omni-Sync Position Embedding (OSPE) to maintain temporal consistency between latent audio and visual states. We further construct LatentOmni-Instruct-35K, a dataset of audio-visual interleaved reasoning trajectories for supervising latent-space reasoning. Comprehensive evaluation across multiple audio-visual reasoning benchmarks demonstrates that LatentOmni achieves the best performance among the evaluated open-source models and consistently outperforms the Explicit Text CoT baseline, supporting latent-space joint reasoning as a promising path toward stronger omnimodal understanding.

30
WorldKV: Efficient World Memory with World Retrieval and Compression

Autoregressive video diffusion models have enabled real-time, action-conditioned world generation. However, sustaining a persistent world, where revisiting a previously seen viewpoint yields consistent content, remains an open problem. Full KV-cache attention preserves this consistency but breaks real-time constraints: memory footprint and attention cost grow linearly with rollout length. Sliding window inference restores throughput but discards long-term consistency. We propose WorldKV, a training-free framework with two components: World Retrieval and World Compression. World Retrieval stores evicted KV-cache chunks in GPU/CPU memory and selectively retrieves scene-relevant chunks via camera/action correspondence, inserting them back into the native attention window without re-encoding. World Compression prunes redundant tokens within each chunk via key-key similarity to an anchor frame, halving per-chunk storage to fit 2x more history under a fixed budget. On Matrix-Game-2.0 and LingBot-World-Fast, WorldKV matches or exceeds full-KV memory fidelity at roughly 2x the throughput, and is competitive with memory-trained baselines without any fine-tuning. Project Page: https://cvlab-kaist.github.io/WorldKV/
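The World Compression step can be pictured with a minimal sketch. It assumes that redundancy is measured as the maximum cosine similarity between a token's key and the anchor frame's keys, and that the most similar half is dropped; both choices, along with the function names and toy vectors, are assumptions rather than details taken from the abstract.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def compress_chunk(chunk_keys: List[Vector], anchor_keys: List[Vector],
                   keep_ratio: float = 0.5) -> List[int]:
    """Return indices of the tokens to keep in a cached chunk.

    Tokens whose keys look most like an anchor-frame key are treated as
    redundant and pruned first (an assumed criterion).
    """
    redundancy = [max(cosine(k, a) for a in anchor_keys) for k in chunk_keys]
    n_keep = max(1, int(len(chunk_keys) * keep_ratio))
    order = sorted(range(len(chunk_keys)), key=lambda i: redundancy[i])
    return sorted(order[:n_keep])


# Toy keys (hypothetical): tokens 0 and 2 nearly duplicate the anchor key.
anchor = [[1.0, 0.0]]
chunk = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
kept = compress_chunk(chunk, anchor)
```

With `keep_ratio=0.5` the stored chunk is half its original size, matching the halving the abstract mentions.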

26
Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning

Spreadsheet systems (e.g., Microsoft Excel, Google Sheets) play a central role in modern data-centric workflows. As AI agents grow increasingly capable of automating complex tasks, such as controlling computers and generating presentations, building an AI-driven spreadsheet agent has emerged as a promising research direction. Most existing spreadsheet agents rely on specialized prompting over general-purpose LLMs; while this design shows potential on simple spreadsheet operations, it struggles to manage the complex, multi-step workflows typical of real-world applications. We introduce Spreadsheet-RL, a reinforcement learning (RL) fine-tuning framework designed to train specialized spreadsheet agents within a realistic Microsoft Excel environment. Spreadsheet-RL features an automated pipeline for scalable collection of paired start-goal spreadsheets from online forums, as well as domain-specific evaluation tasks in areas such as finance and supply chain management, which we compile into the new Domain-Spreadsheet benchmark dataset. It also includes a Spreadsheet Gym environment designed for multi-turn RL: Spreadsheet Gym exposes extensive Excel functionality through a Python sandbox, along with a refined harness that incorporates a comprehensive tool set and carefully designed tool-routing rules for spreadsheet tasks. Through comprehensive experiments, we show that Spreadsheet-RL substantially enhances AI agents' performance on both general and domain-specific spreadsheet tasks: it improves Qwen3-4B-Thinking-2507's Pass@1 on SpreadsheetBench from 12.0% to 23.4%, and raises Pass@1 from 8.4% to 17.2% on our curated Domain-Spreadsheet dataset. These results highlight Spreadsheet-RL's strong potential for generalization and real-world adoption in spreadsheet automation, and broadly, its promise for advancing LLM-based interactions with data interfaces in everyday work.

22
FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching

Extending the generation horizon of video diffusion models to long sequences remains a long-standing and important challenge. Existing training-free approaches fall into two categories: extensions of bidirectional models, which are tightly coupled to specific architectures and suffer from quality degradation over long horizons, and autoregressive models, which accumulate drift errors due to exposure bias and tend to produce repetitive motion patterns. To address these issues, we propose a novel but simple inference-time approach for long video generation that is architecture-agnostic and requires no additional training. Our method generates long videos via overlapping sliding windows, where predicted clean samples from adjacent windows are blended via Tweedie matching to enforce both manifold constraint and temporal consistency across overlap regions. Stochastic early-phase sampling then synchronizes per-window trajectories by injecting fresh noise after each Tweedie matching correction in the high-noise phase, before transitioning to deterministic ODE sampling to preserve fine-grained visual fidelity. Applied to various video generation models, our method generates videos several times longer than the native window length while outperforming both training-free and autoregressive baselines in temporal consistency and visual quality, and further extends to audio-video joint generation and text-to-3DGS without any fine-tuning.

21
SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

Multimodal Large Language Models (MLLMs) have made rapid progress in spatial intelligence, yet existing spatial reasoning benchmarks largely assume pristine visual inputs and overlook the degradations that commonly occur in real-world deployment, such as motion blur, low light, adverse weather, lens distortion, and compression artifacts. This raises a fundamental question: how robust is the spatial intelligence of current MLLMs when visual observations are imperfect? To answer this question, we introduce SpaceDG, the first large-scale dataset for degradation-aware spatial understanding. It is constructed with a physically grounded degradation synthesis engine that embeds the degradation formation process into 3D Gaussian Splatting (3DGS) rendering, enabling realistic simulation of nine degradation types. The resulting dataset contains approximately 1M QA pairs from nearly 1,000 indoor scenes. We further introduce SpaceDG-Bench, a human-verified benchmark with 1,102 questions spanning 11 reasoning categories and 9 visual degradation types, yielding over 10K VQA instances. Evaluating 25 open- and closed-source MLLMs reveals that visual degradations consistently and substantially impair spatial reasoning, exposing a critical robustness gap. Finally, we show that finetuning on SpaceDG markedly improves degradation robustness and can even surpass human performance under degraded conditions without any performance drop on clean images, highlighting the promise of degradation-aware training for robust spatial intelligence.

19
Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

The proliferation of large language models (LLMs) and modular skills has endowed autonomous agents with increasingly powerful capabilities. Existing frameworks typically rely on monolithic LLMs and fixed logic to interface with these skills. This gives rise to a critical bottleneck: different LLMs offer distinct advantages across diverse domains, yet current frameworks fail to exploit the complementary strengths of models and skills, thereby limiting their performance on downstream tasks. In this paper, we present Maestro (Multimodal Agent for Expert-Skill Targeted Reinforced Orchestration), a Reinforcement Learning (RL)-driven orchestration framework that reframes heterogeneous multimodal tasks as a sequential decision-making process over a hierarchical model-skill registry. Rather than consolidating all knowledge into a single model, Maestro trains a lightweight policy to dynamically compose ensembles of frozen expert models and a two-tier skill library, deciding at each step whether to invoke an external expert, which model-skill pair to select, and when to terminate. The policy is optimized via outcome-based RL, requiring no step-level supervision. We evaluate Maestro across ten representative multimodal benchmarks spanning mathematical reasoning, chart understanding, high-resolution perception, and domain-specific analysis. With only a 4B orchestrator, Maestro achieves an average accuracy of 70.1%, surpassing both GPT-5 (69.3%) and Gemini-2.5-Pro (68.7%). Crucially, the learned coordination policy generalizes to unseen models and skills without retraining: augmenting the registry with out-of-domain experts yields a 59.5% average on four challenging benchmarks, outperforming all closed-source baselines. Maestro further maintains high computational efficiency with low latency. The source code is available at https://github.com/jinyangwu/Maestro.

17
Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving

Robust training and validation of Autonomous Driving Systems (ADS) require massive, diverse datasets. Proprietary data collected by Autonomous Vehicle (AV) fleets, while high-fidelity, are limited in scale, diversity of sensor configurations, as well as geographic and long-tail-behavioral coverage. In contrast, in-the-wild data from sources like dashcams offers immense scale and diversity, capturing critical long-tail scenarios and novel environments. However, this unstructured, in-the-wild video data is incompatible with ADS expecting structured, multi-modal sensor inputs for validation and training. To bridge this data gap, we propose Sensor2Sensor, a novel generative modeling paradigm that translates in-the-wild monocular dashcam videos into a high-fidelity, multi-modal sensor suite (AV logs) comprising multi-view camera images and LiDAR point clouds. A core challenge is the lack of paired training data. We address this by converting real AV logs into dashcam-style videos via 4D Gaussian Splatting (4DGS) reconstruction and novel-view rendering. Sensor2Sensor then utilizes a diffusion architecture to perform the generative conversion. We perform comprehensive quantitative evaluations on the fidelity and realism of the generated sensor data. We demonstrate Sensor2Sensor's practical utility by converting challenging in-the-wild internet and dashcam footage into realistic, multi-modal data formats, further unlocking vast external data sources for AV development.

17
Q-ARVD: Quantizing Autoregressive Video Diffusion Models

Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation, paving the way for real-time interactive video generation and world modeling. Despite their potential, the substantial inference cost of ARVDs remains a major obstacle to practical deployment, making model quantization a natural direction for improving efficiency. However, quantization for ARVDs remains largely unexplored. Our empirical analysis shows that directly applying existing quantization schemes developed for standard diffusion transformers to ARVDs leads to suboptimal performance, revealing quantization behaviors that differ from those observed in bidirectional diffusion models. In this paper, we identify two critical challenges in quantizing ARVDs: (C1) Highly unbalanced frame-wise quantization sensitivity. Error accumulation during autoregressive generation can induce severely skewed quantization sensitivity across frames, following an exponential-like decay pattern. (C2) Prominent and heterogeneous outlier patterns in weights. Weight distributions exhibit pronounced outlier channels, whose patterns vary substantially across layer types and block depths. To address these issues, we propose Q-ARVD, a novel framework for accurate ARVD quantization. (S1) To tackle the highly unbalanced frame-wise sensitivity, Q-ARVD incorporates a final-quality aware frame-weighting mechanism into the quantization objective. (S2) To prevent heterogeneous outliers from degrading performance, Q-ARVD introduces an outlier-aware adaptive dual-scale quantization, which automatically detects the presence and quantity of outlier channels for an arbitrary layer, and isolates them to protect normal channels. Extensive experiments demonstrate the superiority of Q-ARVD.

13
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Linear attention replaces the unbounded cache of softmax attention with a fixed-size recurrent state, reducing sequence mixing to linear time and decoding to constant memory. The hard part is not just what to forget, but how to edit this compressed memory without scrambling existing associations. Delta-rule models subtract the current read before writing a new value, and Kimi Delta Attention (KDA) sharpens forgetting with channel-wise decay. But the active edit still uses a single scalar gate to control two different things: how much old content to erase on the key side and how much new content to commit on the value side. We introduce Gated DeltaNet-2, which generalizes both Gated DeltaNet and KDA by inheriting adaptive forgetting and channel-wise decay while addressing their shared limitation, the scalar tie between erasing and writing. Gated Delta Rule-2 separates these roles with a channel-wise erase gate b_t and a channel-wise write gate w_t, reducing to KDA when both gates collapse to the same scalar and to Gated DeltaNet when the decay also collapses. We derive a fast-weight update view, a chunkwise WY algorithm with channel-wise decay absorbed into asymmetric erase factors, and a gate-aware backward pass that preserves efficient parallel training. At 1.3B parameters trained on 100B FineWeb-Edu tokens, Gated DeltaNet-2 achieves the strongest overall results among Mamba-2, Gated DeltaNet, KDA, and Mamba-3 variants across language modeling, commonsense reasoning, and retrieval. Its advantage is most pronounced on long-context RULER needle-in-a-haystack benchmarks, where it improves the evaluated multi-key retrieval setting and remains strong in both recurrent and hybrid settings. Code is available at https://github.com/NVlabs/GatedDeltaNet-2.
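The idea of decoupling erase and write can be made concrete with a toy delta-rule step. The abstract does not give the update equation, so the gate placement below (erase gate on key channels, write gate on value channels) is an assumption, and all names are hypothetical. The sketch only demonstrates the property the abstract states: when both gates collapse to the same scalar and decay is 1, the step matches the usual scalar-gated delta rule.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # state S with shape (d_k, d_v)


def read(S: Matrix, k: Vector) -> Vector:
    """Value currently associated with key k: k^T S."""
    return [sum(k[i] * S[i][j] for i in range(len(k))) for j in range(len(S[0]))]


def decoupled_step(S: Matrix, k: Vector, v: Vector,
                   decay: Vector, erase: Vector, write: Vector) -> Matrix:
    """One recurrent update with separate erase and write gates (assumed form)."""
    d_k, d_v = len(k), len(v)
    # Channel-wise decay over key channels.
    S = [[decay[i] * S[i][j] for j in range(d_v)] for i in range(d_k)]
    old = read(S, k)
    # Erase old content along k (gated per key channel),
    # then commit new content (gated per value channel).
    return [[S[i][j] - erase[i] * k[i] * old[j] + k[i] * write[j] * v[j]
             for j in range(d_v)] for i in range(d_k)]


def scalar_delta_step(S: Matrix, k: Vector, v: Vector, beta: float) -> Matrix:
    """Scalar-gated delta rule: S + beta * k (v - k^T S)^T."""
    old = read(S, k)
    return [[S[i][j] + beta * k[i] * (v[j] - old[j]) for j in range(len(v))]
            for i in range(len(k))]


# Toy state and inputs (hypothetical values).
S0 = [[0.5, 0.0], [0.0, 0.5]]
k, v = [1.0, 0.0], [2.0, -1.0]
ones = [1.0, 1.0]

tied = decoupled_step(S0, k, v, decay=ones, erase=[0.5, 0.5], write=[0.5, 0.5])
reference = scalar_delta_step(S0, k, v, beta=0.5)

# Erase fully without writing: the slot addressed by k is cleared.
erased = decoupled_step(S0, k, v, decay=ones, erase=ones, write=[0.0, 0.0])
```

Setting `erase` and `write` independently lets a step clear a slot without committing anything, or add content without removing what was there, which a single scalar gate cannot express.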

11
SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

Diffusion transformers (DiTs) have emerged as a dominant architecture for text-to-image generation, yet their performance drops when generating at resolutions beyond their training range. Existing training-free approaches mitigate this by modifying inference-time attention behavior, often through Rotary Position Embeddings (RoPE) extrapolation combined with attention scaling. However, these strategies apply a uniform and content-agnostic scaling across RoPE components with distinct frequency characteristics, inducing a trade-off between preserving global structure and recovering fine detail. We introduce SEGA, a training-free method that dynamically scales attention across RoPE components according to the latent's spatial-frequency structure at each denoising step. This adaptive scaling improves both structural coherence and fine-detail fidelity. Experiments show that SEGA consistently improves high-resolution synthesis across multiple target resolutions, outperforming state-of-the-art training-free baselines.

10
KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

LLMs are widely adopted in production, pushing inference systems to their limits. Disaggregated LLM serving (e.g., PD separation and KV state disaggregation) improves scalability and cost efficiency, but it also turns KV into an explicit payload crossing network and storage boundaries, making KV a dominant end-to-end bottleneck. Existing KV compression methods are typically static runtime configurations, even though the production service context varies over time in workload mix, bandwidth, and SLO/quality budgets. As a result, a fixed choice can be suboptimal or even increase latency. We present KVServe, the first service-aware and adaptive KV communication compression framework for disaggregated LLM serving: KVServe (1) unifies KV compression into a modular strategy space with new components and cross-method recomposition; (2) introduces a Bayesian Profiling Engine that efficiently searches this space and distills a 3D Pareto candidate set, reducing offline search overhead by 50x; and (3) deploys a Service-Aware Online Controller that combines an analytical latency model with a lightweight bandit to select profiles under constraints and correct offline-to-online mismatch. Integrated into vLLM and evaluated across datasets, models, GPUs and networks, KVServe achieves up to 9.13x JCT speedup in PD-separated serving and up to 32.8x TTFT reduction in KV-disaggregated serving.

10
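
The KVServe abstract above describes an online controller that picks a compression profile from an analytical latency model and corrects it with observed latencies. A minimal sketch of that idea follows. It is not the paper's controller: the `Profile` fields, the latency formula, and the greedy running-mean correction (used here in place of the paper's bandit) are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    """Hypothetical compression profile; every field is an illustrative assumption."""
    name: str
    ratio: float              # compressed size / original size
    codec_ms: float           # offline estimate of compress + decompress time
    quality: float            # offline quality score in [0, 1]
    residual_ms: float = 0.0  # running mean of (observed - modeled) latency
    pulls: int = 0

def modeled_ms(p: Profile, kv_mb: float, bandwidth_mb_s: float) -> float:
    """Analytical latency: codec time plus transfer time of the compressed KV."""
    return p.codec_ms + 1000.0 * kv_mb * p.ratio / bandwidth_mb_s

def select(profiles, kv_mb, bandwidth_mb_s, min_quality):
    """Pick the feasible profile with the lowest corrected latency estimate."""
    feasible = [p for p in profiles if p.quality >= min_quality]
    if not feasible:
        raise ValueError("no profile satisfies the quality budget")
    return min(feasible,
               key=lambda p: modeled_ms(p, kv_mb, bandwidth_mb_s) + p.residual_ms)

def update(p: Profile, observed_ms, kv_mb, bandwidth_mb_s):
    """Fold an observed latency into the profile's running-mean residual."""
    error = observed_ms - modeled_ms(p, kv_mb, bandwidth_mb_s)
    p.pulls += 1
    p.residual_ms += (error - p.residual_ms) / p.pulls

profiles = [
    Profile("none", ratio=1.0, codec_ms=0.0, quality=1.00),
    Profile("fp8", ratio=0.5, codec_ms=2.0, quality=0.99),
    Profile("int4", ratio=0.25, codec_ms=6.0, quality=0.95),
]
slow_link = select(profiles, kv_mb=100, bandwidth_mb_s=1_000, min_quality=0.97)
fast_link = select(profiles, kv_mb=100, bandwidth_mb_s=100_000, min_quality=0.97)
```

With these made-up numbers the selector prefers `fp8` on the slow link and no compression on the fast link, which illustrates the abstract's point that a fixed choice can increase latency. A real controller would also need exploration, which this greedy sketch omits.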
GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

Open-ended image generation is no longer a simple prompt-to-image problem. High-quality generation often requires an agent to combine a model's internal generative ability with external resources. As requests become more diverse and demanding, we aim to develop a general image-generation agent that can self-evolve through trajectories and use tools more effectively across varied generation challenges. To this end, we propose GenEvolve, a self-evolving framework based on Tool-Orchestrated Visual Experience Distillation. In GenEvolve, each generation attempt is modeled as a tool-orchestrated trajectory, where the agent gathers evidence, selects references, invokes generation skills, and composes them into a prompt-reference program. Unlike existing agentic generation methods that mainly rely on image-level scalar rewards, GenEvolve compares multiple trajectories for the same request and abstracts best-worst differences into structured visual experience, provided only to a privileged teacher branch. Inspired by on-policy self-distillation, Visual Experience Distillation provides dense token-level supervision, helping the student internalize better search, knowledge activation, reference selection, and prompt construction. We further construct GenEvolve-Data and GenEvolve-Bench. Experiments on public benchmarks and GenEvolve-Bench show substantial gains over strong baselines, achieving state-of-the-art performance among current image-generation frameworks. Our website is as follows: https://ephemeral182.github.io/GenEvolve/

9
Unsupervised Process Reward Models

Process Reward Models (PRMs) are a powerful mechanism for steering large language model reasoning by providing fine-grained, step-level supervision. However, this effectiveness comes at a significant cost: PRMs require expert annotations for every reasoning step, making them costly and difficult to scale. Here, we propose a method for training unsupervised PRMs (uPRM) that requires no human supervision, neither at the level of step-by-step annotations nor through ground-truth verification of final answers. The key idea behind our approach is to define a scoring function, derived from LLM next-token probabilities, that jointly assesses candidate positions of first erroneous steps across a batch of reasoning trajectories. We demonstrate the effectiveness of uPRM across diverse scenarios: (i) uPRM achieves up to 15% absolute accuracy improvements over the LLM-as-a-Judge in identifying first erroneous steps on the ProcessBench dataset; (ii) as a verifier for test-time scaling, uPRM performs comparably to supervised PRMs and outperforms the majority voting baseline by up to 6.9%, and (iii) when used as a reward signal in reinforcement learning, uPRM enables more robust policy optimization throughout training compared to a supervised PRM trained using ground-truth labels. Overall, our results open a path toward scalable reward modeling for complex reasoning tasks.

7
One Sentence, One Drama: Personalized Short-Form Drama Generation via Multi-Agent Systems

Existing approaches for digital short-drama production typically rely on one-shot LLM-generated scripts and loosely coupled pipelines, which fail to satisfy three key requirements of short-drama generation: (1) narrative pacing, resulting in weak hooks, insufficient escalation, and unattractive endings; (2) spatial consistency, leading to drifting scene layouts and inconsistent character positions across clips; and (3) production-level quality control, requiring extensive manual review and correction across script and visual stages. We present One Sentence, One Drama, a hierarchical multi-agent framework that transforms a user's single-sentence idea into a fully produced short drama through structured intermediate modules and iterative refinement. Our approach is built upon three key components: (1) a multi-agent debate-based story generation module that enforces short-drama pacing and narrative coherence; (2) a 3D-grounded first-frame generation mechanism that establishes a shared spatial reference for consistent character positioning and scene layout across clips; and (3) multi-stage reviewer loops that perform comprehensive error detection and targeted revision across script, visual, and video generation stages. We also introduce scene-level BGM matching and scene transition planning to improve the audience's immersive experience. To systematically evaluate this task, we introduce Short-Drama-Bench, a benchmark that extends standard video quality metrics with short-drama-specific criteria. Experimental results demonstrate that our method significantly outperforms existing pipelines in narrative quality, cross-clip consistency, and overall viewing experience.

7
Swift Sampling: Selecting Temporal Surprises via Taylor Series

While most frames in long-form video are redundant, the critical information resides in temporal surprises: moments where the actual visual features deviate from their predicted evolution. Inspired by the human brain's predictive coding, we introduce Swift Sampling, an elegant, training-free frame selection algorithm that automatically identifies high-information moments in a video. Specifically, we model a video as a differentiable trajectory in the visual latent space and compute the velocity and acceleration of its features. Then, we apply Taylor expansion to project the expected path of subsequent frames. Frames that diverge sharply from this predicted manifold are identified as temporally surprising frames and selected for sampling. Unlike prior training-free methods that rely on auxiliary networks or video-specific hyperparameter tuning, Swift Sampling is lightweight, adding only 0.02x computational cost over the baseline, an overhead 30x cheaper than that of leading baselines. Across three long-video question answering benchmarks and 10 different downstream tasks, Swift Sampling outperforms uniform sampling and prior query-agnostic baselines. It is especially powerful for long videos with limited frame budgets, improving accuracy by up to +12.5 points.

5
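
The Swift Sampling abstract above describes predicting each frame's features with a Taylor expansion and keeping the frames that deviate most from the prediction. The sketch below illustrates that idea under stated assumptions: velocity and acceleration are approximated with finite differences over the three preceding frames, the expansion is second order, and surprise is the Euclidean distance to the prediction. The paper's exact formulation may differ, and the function names are hypothetical.

```python
import math

def taylor_predict(prev3):
    """Second-order Taylor prediction of the next feature vector.

    prev3 holds the three most recent feature vectors, oldest first.
    Velocity and acceleration are finite-difference estimates (an assumption).
    """
    f3, f2, f1 = prev3
    pred = []
    for a3, a2, a1 in zip(f3, f2, f1):
        velocity = a1 - a2
        acceleration = a1 - 2 * a2 + a3
        pred.append(a1 + velocity + 0.5 * acceleration)
    return pred

def surprise_scores(features):
    """Distance between each frame's features and their Taylor prediction.

    The first three frames have no full history and get a score of 0.0.
    """
    scores = [0.0] * len(features)
    for t in range(3, len(features)):
        pred = taylor_predict(features[t - 3:t])
        scores[t] = math.sqrt(sum((x - p) ** 2 for x, p in zip(features[t], pred)))
    return scores

def select_frames(features, budget):
    """Return the indices of the `budget` most surprising frames, in time order."""
    scores = surprise_scores(features)
    top = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)[:budget]
    return sorted(top)

# Toy 1-D feature trajectory: steady motion with a jump at frame 5.
frames = [[0.0], [1.0], [2.0], [3.0], [4.0], [10.0], [11.0], [12.0]]
picked = select_frames(frames, budget=2)
```

In this toy trajectory the frames with steady motion score zero, while the jump and the frame right after it score highest, because the jump also contaminates the finite differences used for the next prediction.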
ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

Large language models (LLMs) and agentic systems have shown promise for clinical decision support, but existing works largely assume that evidence has already been curated and handed to the model. Real-world clinical workflows instead require agents to actively seek, iteratively plan, and synthesize multimodal evidence from heterogeneous sources. In this paper, we introduce ClinSeekAgent, an automated agentic framework for dynamic multimodal evidence seeking that shifts the paradigm from passive evidence consumption to active evidence acquisition. Given only a clinical query and access to raw data sources, ClinSeekAgent gathers evidence by querying medical knowledge bases, navigating raw EHRs, and invoking medical imaging tools; refines its hypotheses as new information emerges; and integrates the collected evidence into grounded clinical decisions. ClinSeekAgent serves both as an inference-time agent for frontier LLMs and as a training-time pipeline for distilling high-quality agent trajectories into compact open-source models. To validate its inference-time effectiveness, we construct ClinSeek-Bench, which pairs Curated Input reasoning from fixed pre-selected evidence with Automated Evidence-Seeking over raw clinical data. On text-only EHR tasks, ClinSeekAgent improves Claude Opus 4.6 from 60.0 to 63.2 overall F1 and MiniMax M2.5 from 43.1 to 47.3, with positive risk-prediction gains in 7 out of 9 evaluated host models. On multimodal tasks, ClinSeekAgent improves Claude Opus 4.6 from 47.5 to 62.6 (+15.1); all evaluated models improve across the three CXR-related task groups. We further validate ClinSeekAgent as a training pipeline by distilling agentic evidence-seeking trajectories into ClinSeek-35B-A3B, which achieves 34.0 average F1 on existing AgentEHR-Bench, improving over its Qwen3.5-35B-A3B baseline by +11.9 points and approaching Claude Opus 4.6.

4
Segment Anything with Motion, Geometry, and Semantic Adaptation for Complex Nonlinear Visual Object Tracking

Traditional visual object tracking (VOT) methods typically rely on task-specific supervised training, limiting their generalization to unseen objects and challenging scenarios with distractors, occlusion, and nonlinear motion. Recent vision foundation models, exemplified by SAM 2, learn strong video understanding priors from large-scale pretraining and offer a promising foundation for building more robust and generalizable trackers. However, directly applying SAM 2 to VOT remains suboptimal, as it does not explicitly model target motion dynamics or enforce geometric and semantic consistency across frames, both of which are essential for reliable tracking. To address this issue, we propose SAMOSA, a new tracking framework that adapts SAM 2 to complex VOT scenarios by explicitly leveraging motion, geometry, and semantic cues. Specifically, we introduce a lightweight nonlinear motion predictor to model target dynamics and guide mask selection as well as memory filtering. We further exploit semantic cues to detect target shifts and recover from tracking failures, while geometric cues are incorporated as structural constraints to improve tracking stability. In this way, SAMOSA bridges the gap between the implicit video understanding prior of SAM 2 and explicit tracking-oriented modeling. Extensive experiments show that SAMOSA consistently outperforms state-of-the-art SAM 2-based approaches on general benchmarks, demonstrates stronger generalization than supervised VOT methods, and achieves substantial gains on anti-UAV datasets, which typify complex nonlinear motion scenarios. Our code is available at https://github.com/DurYi/SAMOSA.

4
Diversed Model Discovery via Structured Table Discovery

Model cards describe model behavior through a mixture of textual descriptions and structured artifacts, including performance, configuration, and dataset tables. Existing model search systems rely predominantly on semantic similarity over text, which can produce homogeneous result sets and limit exploration of alternatives. We argue that model search is inherently comparative: users want models that are task-aligned yet differentiated in measurable ways. We hypothesize that this balance requires retrieval over condensed, high-quality evidence rather than verbose descriptions, and much of that evidence is concentrated in structured tables. We present StructuredSemanticSearch, a table-driven model search framework built on the ModelTables benchmark. Given a query, StructuredSemanticSearch combines a semantic baseline for task alignment with a structure-aware pipeline that discovers query-related model-card tables using table discovery operators such as unionability, joinability, and keyword search. Retrieved tables are mapped back to model cards under a controlled top-k budget, enabling fair comparison between text-based and table-based retrieval. Beyond retrieval, StructuredSemanticSearch adapts table integration to the model-table domain through orientation-aware integration, producing compact integrated views of tables from partially overlapping and sometimes transposed evidence tables. For evaluation, we introduce a nugget-based, auditable protocol that extracts compact evidence items from model cards, matches queries to condition- or intent-specific nuggets, and measures evidence coverage and diversity over retrieved model-card candidate sets. This protocol also provides a scalable path toward approximate, evidence-based labeling in dynamic model lakes. Experiments on 597 model-recommendation queries show higher nugget coverage for the structure-aware pipeline than for the semantic baseline.

4
SceneAligner: 3D-Grounded Floorplan Localization in the Wild

Many public buildings provide floorplans with a "you are here" indicator to help visitors orient themselves. Floorplan localization seeks to computationally replicate this capability by determining where visual observations were captured within a floorplan. However, existing methods typically assume controlled small-scale environments and precise vectorized floorplans, limiting their ability to operate in large-scale buildings and rasterized floorplans. In this work, we present an approach for performing floorplan localization in the wild by grounding the task in a reconstructed 3D representation of the scene. Given an unconstrained image collection, our method reconstructs a gravity-aligned 3D scene and projects it into a 2D density map that serves as a floorplan proxy. Floorplan localization is then formulated as aligning this proxy with the input floorplan via a 2D similarity transform. To bridge the appearance gap between density maps and architectural floorplans, we adapt a 2D foundation model to learn cross-modal correspondences, introducing a fine-tuning scheme that encourages semantically aligned matches while preserving structural consistency. Extensive experiments demonstrate substantial improvements over prior methods, including in extremely sparse settings with as little as a single input image. Our code and data will be publicly available.

3
Training Large Language Models to Predict Clinical Events

Longitudinal clinical notes contain rich evidence of how patients evolve over time, but converting this signal into training supervision for clinical prediction remains challenging. We extend Foresight Learning to clinical prediction by converting time-ordered MIMIC-III notes into examples consisting of past patient context, a natural-language question about a possible future event, and a label resolved from later documentation. This process yields 6,900 prediction examples from 702 admissions across medications, procedures, organ support, microbiology, and mortality. A small LoRA adapter trained on these examples improves over the prompted base model, reducing expected calibration error from 0.1269 to 0.0398 and Brier score from 0.199 to 0.145, while slightly outperforming GPT-5 point estimates on held-out questions. The approach enables reusable clinical prediction supervision from longitudinal notes without hand-engineered structured features or endpoint-specific classifiers.

3
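
The clinical-prediction abstract above reports gains in expected calibration error (ECE) and Brier score. For readers unfamiliar with the two metrics, here is a minimal sketch of one common way to compute them for binary events. The equal-width binning and the bin count of 10 are assumptions; the abstract does not state which ECE variant the authors used.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width bins over the predicted positive-class probability.

    For each non-empty bin, compare the mean predicted probability with the
    observed rate of positives, and weight the gap by the bin's share of samples.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(mean_p - positive_rate)
    return ece

# Toy predictions: slightly under-confident on both classes.
probs = [0.9, 0.9, 0.1, 0.1]
labels = [1, 1, 0, 0]
toy_brier = brier_score(probs, labels)
toy_ece = expected_calibration_error(probs, labels)
```

Lower is better for both metrics. Brier score rewards predictions that are both calibrated and sharp, while ECE measures calibration alone.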
Bernini: Latent Semantic Planning for Video Diffusion

Multimodal large language models (MLLMs) and diffusion models have each reached remarkable maturity: MLLMs excel at reasoning over heterogeneous multimodal inputs with strong semantic grounding, while diffusion models synthesize images and videos with photorealistic fidelity. We argue that these two families can be unified through a simple division of labor: MLLMs perform semantic planning, while diffusion models render pixels from high-level semantic guidance and low-level visual features. Building on this idea, we propose Bernini, a unified framework for video generation and editing. An MLLM-based planner predicts the target semantic representation directly in the ViT embedding space, and a DiT-based renderer synthesizes pixels conditioned on this plan, augmented by text features and, for editing, source VAE features for detail preservation. Because semantics serve as the interface, the planner and renderer can be trained separately and only lightly co-trained, preserving the pretrained strengths of both components while keeping training efficient. To better handle multiple visual inputs, we introduce Segment-Aware 3D Rotary Positional Embedding (SA-3D RoPE), and further incorporate chain-of-thought reasoning in the planner to better transfer understanding into generation. Bernini achieves state-of-the-art performance across a wide range of video generation and editing benchmarks, with the MLLM's pretrained understanding translating into strong generalization on challenging editing tasks.

3
TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks

We introduce TerminalWorld, a scalable data engine that automatically reverse-engineers high-fidelity evaluation tasks from "in-the-wild" terminal recordings. Processing 80,870 terminal recordings, the engine yields a full benchmark of 1,530 validated tasks, spanning 18 real-world categories, ranging from short everyday operations to workflows exceeding 50 steps, and covering 1,280 unique commands. From these, we curate a Verified subset of 200 representative, manually reviewed tasks. Comprehensive benchmarking on TerminalWorld-Verified across eight frontier models and six agents reveals that current systems still struggle with authentic terminal workflows, achieving a maximum pass rate of only 62.5%. Moreover, TerminalWorld captures real-world terminal capabilities distinct from existing expert-curated benchmarks (e.g., Terminal-Bench), with only a weak correlation to their scores (Pearson r=0.20). The automated engine makes TerminalWorld authentic and scalable by construction, enabling it to evaluate agents in real-world terminal environments as developer practices evolve. Data and code are available at https://github.com/EuniAI/TerminalWorld.

3
Forecasting Downstream Performance of LLMs With Proxy Metrics

Progress in language model development is often driven by comparative decisions: which architecture to adopt, which pretraining corpus to use, or which training recipe to apply. Making these decisions well requires reliable performance forecasts, yet the two commonly used signals are fundamentally limited. Cross-entropy loss is poorly aligned with downstream capabilities, and direct downstream evaluation is expensive, sparse, and often uninformative at early training stages. Instead, we propose to construct proxy metrics by aggregating token-level statistics, such as entropy, top-k accuracy, and expert token rank, from a candidate model's next token distribution over expert-written solutions. Across three settings, our proxies consistently outperform loss- and compute-based baselines: 1) for cross-family model selection, they rank a heterogeneous population of reasoning models with mean Spearman rho = 0.81 (vs. rho = 0.36 for cross-entropy loss); 2) for pretraining data selection, they reliably rank 25 candidate corpora for a target model at roughly 10,000× less compute than direct evaluation, pushing the Pareto frontier beyond existing methods; and 3) for training-time forecasting, they extrapolate downstream accuracy across an 18× compute horizon with roughly half the error of existing alternatives. Together, these results suggest that expert trajectories are a broadly useful source of signal for assessing model capabilities, enabling reliable performance forecasting throughout the model development life cycle.

2
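
The proxy-metrics abstract above names three token-level statistics: entropy, top-k accuracy, and expert token rank. The sketch below shows how each could be computed from a next-token distribution and averaged over an expert-written solution. Representing a distribution as a token-to-probability dict, measuring entropy in bits, and aggregating by a plain mean are all assumptions; the abstract does not specify the aggregation.

```python
import math

def token_entropy(dist):
    """Shannon entropy (in bits) of one next-token distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def expert_rank(dist, expert_token):
    """Rank of the expert's token: 1 + number of tokens with strictly higher probability."""
    p_expert = dist.get(expert_token, 0.0)
    return 1 + sum(1 for p in dist.values() if p > p_expert)

def proxy_stats(dists, expert_tokens, k=1):
    """Average the token-level statistics over every position of an expert solution."""
    n = len(expert_tokens)
    ranks = [expert_rank(d, tok) for d, tok in zip(dists, expert_tokens)]
    return {
        "mean_entropy": sum(token_entropy(d) for d in dists) / n,
        f"top{k}_acc": sum(1 for r in ranks if r <= k) / n,
        "mean_rank": sum(ranks) / n,
    }

# Toy example: two positions, the expert wrote "a" at both.
dists = [{"a": 0.5, "b": 0.25, "c": 0.25}, {"a": 0.1, "b": 0.9}]
stats = proxy_stats(dists, ["a", "a"], k=1)
```

In practice the distributions would come from running the candidate model over the expert text with teacher forcing, so no generation or answer grading is needed.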
From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning

Reinforcement learning from verifiable rewards (RLVR) has shown strong promise for LLM reasoning, but outcome-based RLVR remains inefficient on hard problems because correct final-answer rollouts are rare and sample-level credit assignment cannot use partial progress in failed attempts. We introduce SCRL (Subproblem Curriculum Reinforcement Learning), a curriculum RL framework that derives verifiable subproblems from reference reasoning chains and fixes the final subproblem as the original problem. This turns partial progress on hard problems into verifiable learning signals. Algorithmically, SCRL uses subproblem-level normalization, which normalizes rewards independently at each subproblem position and assigns the resulting advantages to the corresponding answer spans, enabling finer-grained credit assignment without external rubrics or reward models. Our analysis shows that subproblem curricula lift hard problems out of gradient dead zones, with larger relative gains as the original problem becomes harder. Across seven mathematical reasoning benchmarks, SCRL outperforms strong curriculum-learning baselines, improving average accuracy over GRPO by +4.1 points on Qwen3-4B-Base and +1.9 points on Qwen3-14B-Base. On AIME24, AIME25, and IMO-Bench, SCRL further improves pass@1 by +3.7 points and pass@64 by +4.6 points on Qwen3-4B-Base, indicating better exploration on hard reasoning problems.

2
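
The SCRL abstract above describes subproblem-level normalization: rewards are normalized independently at each subproblem position, and the resulting advantages are assigned to the matching answer spans. A minimal sketch of the normalization step follows. It assumes a GRPO-style mean and standard-deviation normalization across rollouts of the same prompt and a small `eps` for stability; the function name and the reward layout are hypothetical.

```python
import math

def subproblem_advantages(rewards, eps=1e-8):
    """Normalize rewards per subproblem position across a group of rollouts.

    rewards[i][j] is the reward of rollout i on subproblem j (the last column
    being the original problem). Each column is normalized on its own, so a
    subproblem that every rollout solves contributes no gradient signal while
    a harder one still separates good rollouts from bad ones.
    """
    n_rollouts = len(rewards)
    n_positions = len(rewards[0])
    advantages = [[0.0] * n_positions for _ in range(n_rollouts)]
    for j in range(n_positions):
        column = [rewards[i][j] for i in range(n_rollouts)]
        mean = sum(column) / n_rollouts
        std = math.sqrt(sum((r - mean) ** 2 for r in column) / n_rollouts)
        for i in range(n_rollouts):
            advantages[i][j] = (column[i] - mean) / (std + eps)
    return advantages

# Four rollouts, two subproblems: everyone solves the first, half solve the second.
rewards = [[1, 0], [1, 1], [1, 0], [1, 1]]
adv = subproblem_advantages(rewards)
```

Each advantage `adv[i][j]` would then be applied only to the tokens of rollout `i` that answer subproblem `j`, which is what makes the credit assignment finer than one scalar per sample.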
"I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration

As large language models (LLMs) increasingly shape how users form, refine, and extend their goals, attributing contributions in human-AI collaboration becomes critical for users calibrating their own reliance and for evaluators assessing AI-assisted work. Yet existing methods focus on final artifacts, missing the process through which goals themselves are jointly shaped. We introduce a goal-level attribution framework, CoTrace, that decomposes explicit goals into verifiable requirements and traces both direct contributions and indirect influences across dialogue turns. Applying CoTrace to 638 real-world collaboration logs, we find that while models account for only 11-26% of goal-shaping contribution, they contribute substantially more to introducing lower-level concrete requirements and make various kinds of indirect contributions. Through controlled simulations, we show that interaction design choices significantly affect model goal-shaping behavior. In a user study, exposing participants to goal-level analyses shifts their perceived contributions by nearly 2 points on a 5-point scale, revealing systematic miscalibration in how users understand their own AI-assisted work.

1
Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search

We present Lean Refactor, a plug-and-play retrieval-augmented agentic framework for multi-objective, controllable, and version-robust refactoring of Lean proofs. LLM-generated proofs are notoriously correct-but-verbose and brittle across library versions, yet existing refactoring works overlook three practical challenges: 1) Lean refactoring is natively multi-objective (proof length, compilation cost, and version compatibility are often in tension); 2) Lean repositories have fragile compatibility, whereas LLM releases are unaware of Lean/Mathlib versions; 3) Training-based pipelines require repeated fine-tuning with each new LLM release, scaling neither with model churn nor with Lean's release cycle. Lean Refactor steers a frozen agentic LLM with retrievals from a curated database of multi-objective refactoring strategies, each densely annotated with metadata such as supported Lean/Mathlib versions and expected compilation-cost reduction. Experiments show over 70% token-level compression on competition benchmarks, over 20% on research repositories, and up to 60% compilation-time reduction, outperforming prior work and Claude Code. Version-filtered retrieval further improves compression on the target Lean version, and refactored miniF2F proofs exhibit stronger zero-shot version transfer to future Lean releases than their unrefactored counterparts.

1
DecQ: Detail-Condensing Queries for Enhanced Reconstruction and Generation in Representation Autoencoders

Representation Autoencoders (RAEs) leverage frozen vision foundation models (VFMs) as tokenizer encoders, providing robust high-level representations that facilitate fast convergence and high-quality generation in latent diffusion models. However, freezing the VFM inherently constrains its spatial reconstruction capacity, limiting fine-grained generation and image editing; in contrast, incorporating reconstruction-oriented signals via fine-tuning disrupts the pretrained semantic space and degrades generative fidelity. To address this trade-off, we propose DecQ, a simple yet effective framework for RAEs. Specifically, DecQ introduces lightweight detail-condensing queries that extract fine-grained information from intermediate VFM features through condenser modules. These queries are incorporated into the decoder to support reconstruction and are jointly generated with patch tokens during generative modeling. By aggregating information from both shallow and deep layers, DecQ effectively mitigates the reconstruction-generation trade-off, improving both reconstruction quality and generative performance. Our experiments demonstrate that: (1) with only 8 additional queries and 3.9% extra computation, DecQ improves reconstruction over the frozen DINOv2-based RAE, increasing PSNR from 19.13 dB to 22.76 dB; and (2) for generative modeling, DecQ achieves 3.3× faster convergence than RAE, attaining an FID of 1.41 without guidance and 1.05 with guidance.

1
OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding

Omni-proactive streaming video understanding, i.e., autonomously deciding when to speak and what to say from continuous audio-visual streams, is an emerging capability of omni-modal large language models. Existing benchmarks fall short in three key aspects: they rely primarily on visual signals, adopt polling or fixed-timestamp protocols instead of true proactive evaluation, and cover only a limited range of tasks, preventing reliable assessment and differentiation of omni-proactive streaming models. We present OmniPro, the first benchmark to jointly evaluate omni-modal perception, proactive responding, and diverse video understanding tasks. It comprises 2,700 human-verified samples spanning 9 sub-tasks and 3 cognitive levels, covering 6 basic video understanding capabilities. Notably, 84% of samples require audio signals (speech or non-speech), and each sample is annotated with modality-isolation labels to enable fine-grained multimodal analysis. We further introduce a dual-mode evaluation protocol: Probe mode assesses content understanding by querying the model before and after each ground-truth trigger, while Online mode evaluates full proactive ability by requiring models to autonomously decide when to respond in streaming input. Evaluating 11 representative models reveals three key findings: (1) audio provides consistent gains but with highly variable utilization across models, (2) performance degrades significantly over time, indicating limited long-horizon robustness, and (3) non-speech audio perception remains the weakest dimension.

1
More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

Detecting Schwartz values in political text is difficult because implicit cues often depend on surrounding arguments and fine-grained distinctions between neighboring values. We study when context and explicit moral knowledge help sentence-level value detection. Using the ValuesML/Touché ValueEval format, we compare sentence, window, and full-document inputs; no-RAG and retrieval-augmented settings with a curated moral knowledge base; supervised DeBERTa-v3-base/large encoders; and zero-shot LLMs from 12B to 123B parameters. The results show that more context is not uniformly better: full-document context improves supervised DeBERTa encoders by 3.8-4.8 macro-F1 points over sentence-only input, but does not consistently help zero-shot LLMs. Retrieved moral knowledge is more consistently useful in matched comparisons, improving each tested model family and context condition under early fusion. However, scaling from DeBERTa-v3-base to large and from 12B to larger LLMs does not guarantee gains, and simple early fusion outperforms the tested late-fusion and cross-attention RAG variants for encoders. Per-value analyses show that context and retrieval help most for socially situated or conceptually confusable values. These findings suggest that value-sensitive NLP should evaluate context, knowledge, and model family jointly rather than treating longer inputs or larger models as universal improvements.

1
Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws

Scaling laws have made language-model performance predictable from model size, data, and compute, but they typically treat the optimizer as a fixed training detail. We show that this assumption misses a fundamental axis of representation scaling: how effectively the optimizer converts added FFN width into utilized spectral capacity. Using eigenspectra of feed-forward network representations, measured through soft and hard spectral-ranks, we find that the same Transformer architecture realizes markedly different spectral scaling laws when trained with different optimizers. Holding architecture and width schedule fixed, AdamW exhibits weak hard-rank scaling (β=0.44) on rare-token (TAIL) representations where learning is known to be hardest, whereas Muon achieves linear scaling (β=1.02) in the same regimes, a 2.3× increase in the scaling exponent. This difference is not reducible to validation loss: AdamW configurations can match low-rank Dion variants in perplexity, under extended training, while exhibiting sharply different spectral geometry, demonstrating that matched loss does not imply matched representation structure. Hard-soft rank asymmetry further reveals that optimizers differ not only in how much capacity is realized, but also in how that capacity is structured across eigenmodes. To disentangle optimizer effects from architectural ones, we compare against architectural interventions (e.g., attention rank and positional encoding), and find that optimizer-induced spectral shifts often exceed the architectural effects. These results suggest optimization as a first-class axis of representation scaling, motivating optimizer-architecture co-design.

1
FashionLens: Toward Versatile Fashion Image Retrieval via Task-Adaptive Learning

Fashion image retrieval is a cornerstone of modern e-commerce systems. A unified framework that supports diverse query formats and search intentions is highly desired in practice. However, existing approaches focus on narrow retrieval tasks and do not fully capture such diversity. Therefore, in this work, we aim to develop a unified framework capable of handling diverse realistic fashion retrieval scenarios, achieving truly versatile fashion image retrieval. To establish a data foundation, we first introduce U-FIRE, a comprehensive benchmark that consolidates fragmented fashion datasets into a unified collection, supplemented by two manually curated datasets for testing generalization. Building upon this, we propose FashionLens, a unified framework based on Multimodal Large Language Models. To handle divergent matching objectives, we design a Proposal-Guided Spherical Query Calibrator that dynamically shifts query representations into task-aligned metric spaces via adaptive spherical linear interpolation. Additionally, to mitigate the optimization imbalance caused by varying task complexities and data scales, we develop a Gradient-Guided Adaptive Sampling strategy that automatically re-weights tasks based on realtime learning difficulty and the data scale prior. Experiments on U-FIRE show that FashionLens achieves state-of-the-art performance across diverse retrieval scenarios and generalizes robustly to unseen tasks. The data and code are publicly released at https://github.com/haokunwen/FashionLens.

0
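
The FashionLens abstract above mentions shifting query representations via adaptive spherical linear interpolation. The interpolation itself is standard, and a minimal sketch is given below. In the paper the interpolation weight is presumably predicted per query; here `t` is simply passed in, and the function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def slerp(a, b, t):
    """Spherical linear interpolation between the directions of a and b.

    t = 0 returns a's direction and t = 1 returns b's; values in between move
    along the great-circle arc, so the result stays on the unit sphere.
    Antipodal inputs (angle near pi) are ill-defined and not handled here.
    """
    a, b = normalize(a), normalize(b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)
    if omega < 1e-6:
        # Nearly parallel vectors: fall back to linear interpolation.
        return normalize([(1 - t) * x + t * y for x, y in zip(a, b)])
    s = math.sin(omega)
    wa = math.sin((1 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

# Halfway between two orthogonal unit vectors.
midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Unlike plain linear interpolation, the result keeps unit norm, which matters when retrieval scores are cosine similarities.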
Minimalist Visual Inertial Odometry

Visual-Inertial Odometry (VIO), which is critical to mobile robot navigation, uses cameras with a large number of pixels. Capturing and processing camera images requires significant resources. This work presents a minimalist approach to planar odometry, demonstrating that just four visual measurements and an IMU can provide robust motion estimation for differential-drive robots. Our key insight is that four downward-facing photodiodes that sense the world through optical Gabor masks produce signals that encode speed. Based on this, we jointly optimize the mask parameters alongside a Temporal Convolutional Network (TCN) using a physically grounded simulator. The resulting model decodes speed from just the four measurements produced by the photodiodes. Pairing these estimates with the angular speed from an IMU yields a continuous planar trajectory. We validate our approach with a prototype sensor mounted on a differential-drive robot. Across diverse indoor and outdoor terrains, our system closely tracks the reference ground truth without any real-world fine-tuning. Our work shows that minimalist sensing enables efficient and accurate planar odometry.

0
05

PRODUCT HUNT

05.00
PRODUCT HUNT

Product Hunt - May 22, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

motionvid.ai

Your AI Motion & Video Editor

0
WordPress 7.0

WordPress Armstrong is here

0
buildpipe

Compose, run and automate multi step AI developer workflows

0
AGG Identify

A lightweight, secure streamlined OIDC and OAuth2 provider

0
Reader Alive

Translate, listen to, and ask questions about your ebooks

0
moop

The social network of taste.

0
TestSprite 3.0

Let a fleet of parallel agents test your app in minutes.

0
JAMtime.ai

Just tell your guitar pedal how to sound

0
iPromise

Bring "Body Doubling" to your Mac notch.

0
Cleo

The AI PM that runs your team

0
Shuffle Design CLI

Multi-AI CLI for building and redesigning websites

0
General Compute

AI models that run on an inference cloud optimized for speed

0
Nota: AI Notes & Voice

Nota chatbot. Nota blank page. Just your ideas, sharper.

0
SuprSend AI

AI-first platform for multi-channel notifications

0
whosthere

Local Area Network discovery tool with an interactive TUI

0
Training Data - AI Microgames

As an AI you must gather training data by playing microgames

0
Auto Posts

Your Socials. Finally on Autopilot

0
DecisionBox for Databricks

Connect DecisionBox to your Databricks to validate findings

0
HelioPeak

Solar monitoring for pvoutput on every Apple device.

0
DCP

Give your AI agents encrypted permission and keys

0
Nugget AI

Turn customer interviews into your product roadmap

0
Prosed

Go from newsletters & podcasts to published manuscript

0
Our Stories

A storytelling tool for raising bilingual kids

0
Zero Assist

Real-time AI cheating detection for technical interviews.

0
Smart Miles

Automatic trip tracking for tax-ready exports

0
Faby

Your virtual coworker with its own computer living in Slack

0
TongueType for macOS

Local dictation for macOS without the subscription

0
Tycoon AI

Run one-person companies entirely with AI agents

0
WeWeb 3.0

Vibe-code apps with the safety net of a no-code editor

0
Framed

Turn screenshots, videos, and code into polished visuals

0
WarmIntro

Free tool to find your warmest path into any company

0
Google Antigravity 2.0

Orchestrate multi-agent workflows from a desktop app

0
Basedash Skills

Reusable AI instructions for every Basedash surface.

0
AutoSubtitles 2.0

AI subtitles & animated captions with faster editing

0
Vivaldi 8.0

New unified look for full customization

0
Novi Notes 1.1

A local AI memory layer for your Mac

0
CatchAll by NewsCatcher

Build any dataset from the web. Filtered to your criteria.

0
Tacet

The brain monitor for cognitive health scores

0
Visual Usability Checker

Validate your design decisions instantly with AI insights

0
Mintlify Workflows

Self-updating knowledge bases

0
InstaVM

Instant computers for AI agents

0
AlliHat

Claude AI in your Safari sidebar

0
Ente Locker

Shared vault for your most important documents

0
Slideshot

Product demo videos, recorded by your AI agent

0
Mixpanel Headless

Programmatic access to product analytics for agents and devs

0
Invenio

Local AI search for Mac video & photo libraries

0
Emdash

One app. Every coding agent. Open-source.

0
Contextberg

Turn your work into AI agent memory, served over MCP

0
Re_gent

Version Control for AI agent Activity

0
StoreClaw

Grow your store profits with agents that know how to sell

0
06

TECHMEME

06.00
TECHMEME

Techmeme - May 22, 2026

Techmeme Digest: Major tech headlines and industry conversations.

Zero2IPO Research: Chinese AI startups raised $16.2B in Q1 2026, up 185% YoY, led by top AI labs including Moonshot, Z.ai, and MiniMax (Karen Tian/South China Morning Post)
Source: Techmeme · Published: May 22, 2026

Karen Tian / South China Morning Post : Zero2IPO Research: Chinese AI startups raised $16.2B in Q1 2026, up 185% YoY, led by top AI labs including Moonshot, Z.ai, and MiniMax —  Funding for China's artificial-intelligence-related start-ups jumped nearly threefold year on year in the first quarter, as investors poured capital …

Memo: Yusuf Mehdi, a 35-year Microsoft veteran who has been its commercial chief marketing officer since 2023, will leave the company after the next fiscal year (Ashley Stewart/Business Insider)
Source: Techmeme · Published: May 22, 2026

Ashley Stewart / Business Insider : Memo: Yusuf Mehdi, a 35-year Microsoft veteran who has been its commercial chief marketing officer since 2023, will leave the company after the next fiscal year —  Yusuf Mehdi, a 35-year Microsoft veteran who has been its commercial chief marketing officer since 2023, will leave the company …

Sources: DeepSeek execs told potential investors in its ongoing $10B round that it will prioritize groundbreaking AI research over short-term commercialization (Lulu Yilun Chen/Bloomberg)
Source: Techmeme · Published: May 22, 2026

Lulu Yilun Chen / Bloomberg : Sources: DeepSeek execs told potential investors in its ongoing $10B round that it will prioritize groundbreaking AI research over short-term commercialization —  DeepSeek's senior management has told potential investors in its ongoing 70 billion yuan ($10 billion) funding round that the startup …

Italian police dismantle a piracy network that streamed paid content from Netflix and others via an app called Cinemagoal with annual subscriptions of €40-€130 (Ana-Maria Stanciuc/The Next Web)
Source: Techmeme · Published: May 22, 2026

Ana-Maria Stanciuc / The Next Web : Italian police dismantle a piracy network that streamed paid content from Netflix and others via an app called Cinemagoal with annual subscriptions of €40-€130 —  Italy's Guardia di Finanza said on Friday it had dismantled an audiovisual piracy operation that streamed paid content from Sky …

Republican House Oversight Committee Chairman James Comer says he requested information from Kalshi and Polymarket on their efforts to prevent insider trading (Justin Papp/CNBC)
Source: Techmeme · Published: May 22, 2026

Justin Papp / CNBC : Republican House Oversight Committee Chairman James Comer says he requested information from Kalshi and Polymarket on their efforts to prevent insider trading —  Rep. James Comer, R-Ky., chair of the House Oversight and Government Reform Committee, announced Friday on CNBC's “Squawk Box” …

Driverless car startup Bliq.ai says it has received approval for fully driverless road operations in Estonia, the first authorization of its kind in the EU (Cate Lawrence/Tech.eu)
Source: Techmeme · Published: May 22, 2026

Cate Lawrence / Tech.eu : Driverless car startup Bliq.ai says it has received approval for fully driverless road operations in Estonia, the first authorization of its kind in the EU —  The authorisation is the first of its kind in the EU, allowing its remotely supervised autonomous vehicles to operate on public roads without a driver behind the wheel.

Sources and documents detail Satya Nadella's effort to revamp Microsoft's senior leadership, creating a startup-style operating model to compete in the AI race (Ashley Stewart/Business Insider)
Source: Techmeme · Published: May 22, 2026

Ashley Stewart / Business Insider : Sources and documents detail Satya Nadella's effort to revamp Microsoft's senior leadership, creating a startup-style operating model to compete in the AI race —  CEO Satya Nadella just dismantled the senior leadership structure that has run Microsoft for decades.

US and Canadian authorities arrest 23-year-old Jacob Butler, known online as "Dort", for allegedly operating the Kimwolf DDoS botnet, which infected ~2M devices (Sergiu Gatlan/BleepingComputer)
Source: Techmeme · Published: May 22, 2026

Sergiu Gatlan / BleepingComputer : US and Canadian authorities arrest 23-year-old Jacob Butler, known online as “Dort”, for allegedly operating the Kimwolf DDoS botnet, which infected ~2M devices —  U.S. and Canadian authorities arrested and charged a Canadian man with operating the KimWolf distributed denial-of-service …

Meta releases Forum, a Reddit-like standalone app for Facebook Groups, with a feed showing Group conversations and an AI-powered "Ask" feature, on iOS (Mariella Moon/Engadget)
Source: Techmeme · Published: May 22, 2026

Mariella Moon / Engadget : Meta releases Forum, a Reddit-like standalone app for Facebook Groups, with a feed showing Group conversations and an AI-powered “Ask” feature, on iOS —  It's a new dedicated app for Facebook Groups.  —  Meta has launched a new app called Forum without fanfare or even an official announcement.

Internal Binance reports and sources reveal Iran moved billions through Binance to fund entities like the IRGC, with some transactions as recent as this month (Wall Street Journal)
Source: Techmeme · Published: May 22, 2026

Wall Street Journal : Internal Binance reports and sources reveal Iran moved billions through Binance to fund entities like the IRGC, with some transactions as recent as this month —  Transactions on world's largest crypto exchange took place despite repeated red flags; Binance says it has ‘zero-tolerance for illicit activity’

AMD CEO Lisa Su projects the CPU market will grow over 35% annually through 2031, up from 3% to 4% historically, driven by AI inference and agentic AI demand (Cheng Ting-Fang/Nikkei Asia)
Source: Techmeme · Published: May 22, 2026

Cheng Ting-Fang / Nikkei Asia : AMD CEO Lisa Su projects the CPU market will grow over 35% annually through 2031, up from 3% to 4% historically, driven by AI inference and agentic AI demand —  TAIPEI — AMD CEO Lisa Su is predicting the market for central processing units (CPUs) will grow massively over the next five years …

Lenovo reports Q4 revenue up 27% YoY to $21.6B, above $18.7B est., net profit up 479% to $521M, above $271M est., as the PC maker pushes into AI server markets (Reuters)
Source: Techmeme · Published: May 22, 2026

Reuters : Lenovo reports Q4 revenue up 27% YoY to $21.6B, above $18.7B est., net profit up 479% to $521M, above $271M est., as the PC maker pushes into AI server markets —  Lenovo Group (0992.HK) reported a better-than-expected 27% jump in quarterly revenue on Friday, as strong consumer demand …

Sources: Nintendo asked partners and suppliers to assemble ~20M Switch 2 consoles by March 2027, ~20% above the 16.5M public sales outlook issued in early May (Takashi Mochizuki/Bloomberg)
Source: Techmeme · Published: May 22, 2026

Takashi Mochizuki / Bloomberg : Sources: Nintendo asked partners and suppliers to assemble ~20M Switch 2 consoles by March 2027, ~20% above the 16.5M public sales outlook issued in early May —  Nintendo Co. has asked partners and suppliers to assemble about 20 million Switch 2 consoles in the year through March …

Meta, Broadcom, Applied Materials, GlobalFoundries, and Synopsys launch a $125M "Semiconductor Hub" at UCLA to advance AI chip research and more (CJ Haddad/CNBC)
Source: Techmeme · Published: May 22, 2026

CJ Haddad / CNBC : Meta, Broadcom, Applied Materials, GlobalFoundries, and Synopsys launch a $125M “Semiconductor Hub” at UCLA to advance AI chip research and more —  Broadcom, Meta, Applied Materials, GlobalFoundries and Synopsys are joining forces to launch a $125 million “Semiconductor Hub” at the UCLA Samueli School of Engineering.

Sources: David Sacks told President Trump that federal reviews of AI models before release would slow down innovation and hurt the US in its AI race with China (Politico)
Source: Techmeme · Published: May 22, 2026

Politico : Sources: David Sacks told President Trump that federal reviews of AI models before release would slow down innovation and hurt the US in its AI race with China —  But during a conversation with Trump, Sacks told the president that companies were already cooperating, and that having …

07

STARTUP ARCHIVE

07.00
STARTUP ARCHIVE

Startup News - May 22, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction.”

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

“Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot.”

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice.”

Paul Graham on why starting with a “small, intense fire” is the key to startup growth
Source: Startup · Published: Mar 20, 2026

"You have to know who those first users are and how you're going to get them."

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth.”

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?”

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations.”

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over.”

Spencer Rascoff: “I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

“If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive.”

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right.”

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT

08.00
SOLIDOT

Solidot News - May 22, 2026

Solidot Feed: Highlighting essential tech & open-source news.

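Russian segment of the International Space Station is leaking again

NASA has confirmed that the Russian segment of the International Space Station is leaking air again. For the past five years Roscosmos and NASA have been tracking an air leak in the Russian segment. The leak is in the PrK module, located between the Progress airlock and the Zvezda service module, and is caused by tiny structural cracks. In January this year NASA announced that, after repeated inspections and sealing work, the internal pressure of the PrK module had stabilized and the leak had stopped. Three weeks ago, however, the PrK leak reappeared. NASA says it is coordinating next steps with Roscosmos. The incident has renewed concerns about the long-term viability of the ISS.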

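Amazon spent $26.6 million on union-busting consultants last year

According to a report from the Economic Policy Institute (EPI), US employers spend more than $1.5 billion a year on anti-union activity. Employers hire consultants and law firms that specialize in union avoidance to provide legal advice, representation, and litigation services during union elections and campaigns. US companies spend as much as $442 million a year on anti-union consulting services; according to filings Amazon submitted to the Department of Labor, it spent $26.6 million on anti-union consultants in 2025. Union coverage in the US now stands at only 10%, down from 20.3% in 1983, even though Gallup polling shows that nearly 70% of Americans support unions. Because of delaying tactics and appeals, US workers need an average of 465 days to reach a first union contract, and in many cases far longer: Starbucks workers still have no first contract, although the first US store won its union election in 2021.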

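Google announces more ads in AI Mode

On Tuesday Google announced that the search box will become the conversation box of an AI chatbot, so its time-tested business model, search advertising, naturally follows into AI Mode. On Wednesday Google announced that it will bring more "helpful ads" to AI Mode. The search giant says it is testing two new ad types that provide details about relevant products and useful guidance. As part of the ad, each will include an independent AI explainer, and each will be labeled "Sponsored". The first type is called "conversational discovery ads", in which the ad is the answer; the second is called "Highlighted Answers", in which highly relevant ads are offered to users as part of a list of recommendations.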

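NASA expects China to fly a crewed mission around the Moon in 2027

NASA Administrator Jared Isaacman says he expects China to carry out a crewed flight around the Moon in 2027, and he is using that as grounds to call for changes to the Artemis program to speed up America's return to the Moon. Isaacman said that the next time the world watches astronauts fly around the Moon, most likely sometime in 2027, they will be Chinese astronauts, and the United States will no longer be the only country able to send humans into the lunar environment. China has not published a timetable for crewed lunar flights. To date every crewed lunar flyby, orbital, or landing mission has been carried out by NASA: the nine Apollo missions between 1968 and 1972 and the Artemis 2 mission in April this year.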

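Vivaldi 8.0 released

The Chromium-based browser Vivaldi has released version 8.0. Vivaldi was founded by Opera co-founder Jon von Tetzchner. New in Vivaldi 8.0 are a look called Unified, in which all elements sit on a single visual plane, and six preset layouts. Two of them are vertical-tab layouts (vertical left and vertical right); the other four are classic, minimal, auto-hide, and bottom.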

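SpaceX's largest revenue source is its data center deal with Anthropic

SpaceX filed its prospectus with the US Securities and Exchange Commission (SEC) on Wednesday evening, disclosing its finances for the first time. According to the prospectus, after merging with Elon Musk's xAI and X/Twitter, SpaceX's largest source of revenue is a three-year data center deal struck with Anthropic in May this year, under which Anthropic rents compute at the Colossus 1 campus for $1.25 billion a month. The deal is not guaranteed, however: either party can terminate it with 90 days' notice. Other figures: 2025 revenue of $18.7 billion, an operating loss of $2.6 billion, and a net loss of $4.9 billion. Within that, the Starlink / Connectivity satellite broadband business had revenue of $11.4 billion and operating profit of $4.4 billion; the space launch business had revenue of $4.1 billion and an operating loss of $657 million; and the AI and social media business had revenue of $3.2 billion and an operating loss of $6.4 billion. The prospectus mentions AI hundreds of times. Musk holds 12.3% of the Class A shares and 93.6% of the Class B shares; Class B shares carry ten times the votes of Class A shares, so Musk controls 85.1% of the company's voting power in total. If he sells any Class B shares, they automatically convert to Class A shares.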

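Google's AI search is easy to manipulate

Google's AI search is very easy to manipulate. Where the old search results gave users ten links on the first page and let them judge, AI search gives a single answer, and that answer may come from a single source. A BBC technology reporter demonstrated the manipulation with an article about hot dogs on a personal website. Experts say this kind of manipulation is happening systematically and at scale. Manipulating AI search into giving users biased or inaccurate information could have serious consequences, and it is not a trivial problem: worldwide, more than 1 billion people use AI chatbots daily, and 2.5 billion people view Google's AI Overviews each month. Anyone who can control these tools gains enormous power. Google and other companies have noticed the problem. Last week Google updated its policies to treat attempts to manipulate AI responses as a violation of company rules, and it threatens to remove companies or websites suspected of manipulation from search results or to lower their ranking.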

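RTX 5090DV2 graphics card added to ban list

Last Friday, Chinese customs added the RTX 5090DV2, the graphics card Nvidia launched last August to comply with US export control rules, to its ban list. The list originally included the H200 and the H20; the H20 is another China-specific chip that Nvidia previously sold in the Chinese market. The RTX 5090DV2 is still on sale on major e-commerce platforms such as JD.com and Taobao at prices between 18,000 and 22,000 yuan, which means existing stock can still be sold normally, but supply will dwindle as imports dry up.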

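Google accidentally publishes exploit code for an unpatched Chromium vulnerability

On Wednesday Google made public the exploit code for an unpatched Chromium vulnerability that affects all users of Chromium-based browsers. Independent security researcher Lyra Rebane reported the vulnerability to Google in late 2022, but 29 months later it is still unfixed. On Wednesday morning Google disclosed the vulnerability in Chromium's bug tracker; Rebane at first assumed it had been fixed, only to find it had not. Google later deleted the post, but its contents had already been archived by other sites. The vulnerability abuses Chromium's Browser Fetch API to open a persistently active Service Worker. A malicious website can use JavaScript to trigger the Service Worker to create connections and monitor some of the user's activity, and it can also act as a proxy for visiting websites and launching DDoS attacks. Security researchers consider it a serious vulnerability: in effect a limited backdoor that turns the browser into part of a botnet.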

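Samsung Electronics reaches tentative labor deal, strike called off

At 23:00 on the 20th, with just one hour to go before a general strike was due to begin, the Samsung Electronics union reached a dramatic agreement with the company, and the strike was called off. Under the tentative agreement on the performance bonus scheme, employees of the Device Solutions (DS) division, which runs the semiconductor business, could receive performance bonuses of up to about 600 million won (about 2.723 million yuan) this year. Labor and management agreed to keep the existing year-end performance bonus (OPI) system while creating a new special semiconductor performance bonus for the DS division. The company will set aside 10.5% of its results as the funding pool for the special bonus, with no cap. 40% of the pool will be allocated to the DS division and the remaining 60% to its sub-divisions; the bonus paid uniformly to administrative departments will be set at 70% of the level for the memory chip business, a DS sub-division. Per-capita bonuses are expected to reach 600 million won.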

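Anna's Archive ordered to pay book publishers $19.5 million

In March this year, 13 major book publishers including Penguin Random House, Elsevier, and HarperCollins jointly sued Anna's Archive, accusing the shadow library of facilitating book piracy. The publishers' aim was a court injunction that would put pressure on Anna's Archive's domain registrars. Anna's Archive is already mired in several lawsuits: late last year a suit by streaming giant Spotify and record labels cost it its main .org domain. This week US District Judge Jed S. Rakoff signed a default judgment granting the publishers everything they asked for, ordering Anna's Archive to pay them $19.5 million. The judge also issued a sweeping permanent injunction requiring more than twenty domain registrars, hosting companies, and service providers worldwide to shut down Anna's Archive's remaining domains immediately. Because the site's operators are anonymous, the damages are unlikely ever to be paid, so the main impact comes from the injunction, which US companies such as Cloudflare and OwnRegistrar will have to obey.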

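Firefox to remove asm.js code

Mozilla has announced that Firefox will remove its asm.js-related code in the future, because asm.js has long had a successor in WebAssembly, and maintaining both costs time and enlarges the attack surface. asm.js was Mozilla's response to NaCl and PNaCl: by choosing a strict, static subset of JavaScript it achieved NaCl/PNaCl-like performance while the code could still run directly in web content. asm.js shipped in Firefox 22 in 2013 and was a great success, proving that code could run on the web at near-native speed using web technologies alone. It paved the way for WebAssembly, which became a W3C standard in 2019. Starting with Firefox 148, the SpiderMonkey JS engine disables asm.js optimizations by default, and a future version will remove the code entirely. Websites that use asm.js will not be affected, since asm.js code still runs as ordinary JavaScript. The developers recommend that sites that want to keep shipping asm.js content recompile to WebAssembly, which runs faster and produces smaller binaries.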

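Google Cloud Platform accidentally suspends the account of its major customer Railway

In 2024 a misconfiguration at Google Cloud Platform (GCP) caused the data of Australian pension fund manager UniSuper to be completely deleted; fortunately UniSuper had backups with another provider. That incident took UniSuper offline for more than a week. On May 19, 2026 GCP suffered a similarly serious incident: its automated systems suspended the production account of a major customer, the PaaS platform Railway.com, taking Railway's services offline. According to the incident report on Railway's official blog, the outage lasted about 8 hours. The suspension occurred at 22:10 UTC on the 19th and cut Railway off from its GCP infrastructure, which backed the dashboard, the API, and part of its network infrastructure. Railway immediately contacted its GCP account manager, and the account was restored at 22:29 UTC, but compute instances, disks, and networking had to be recovered slowly one by one, and the incident was not fully resolved until 07:58 UTC the next day. Railway has announced that it will reduce its dependence on GCP, planning to remove GCP from the hot path and keep it as a backup/failover service.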

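Why pollen allergy is so severe in Japan

Pollen allergy is a national health problem in Japan, where an estimated 43% of people have moderate to severe symptoms, compared with 26% in the UK and 12%-18% in the US. Every spring people on city streets across Japan put on masks because of allergic rhinitis triggered by pollen. Why is the problem so severe in Japan? The cause has little to do with poor health, pollution, or even the natural environment, and a lot to do with decisions made by Japanese politicians after World War II. During the war, shortages of oil and natural gas forced Japan to turn to its most abundant natural resource, forests, as a fuel source for homes and industry. Natural forests were cut down on a large scale, and the mountain forests around cities such as Tokyo, Osaka, and Kobe were stripped bare. After the war, because bare mountains are prone to landslides and flooding, the government decided on large-scale reforestation and chose two fast-growing species: Japanese cedar (sugi) and Japanese cypress (hinoki). Today these cedar and cypress plantations cover one fifth of the country's land area. The problem is that cedar and cypress produce large amounts of lightweight pollen once they mature at 30 years, and almost all of the planted forests are now more than 30 years old. To ease the allergy problem, the Japanese government now plans to cut down one fifth of the cedar forests and replace them with new species.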

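Fedora removes Deepin desktop environment packages

Following openSUSE, the Fedora distribution has removed its Deepin Desktop packages. In early 2025 the SUSE security team found, during a routine review, that the Deepin desktop environment included a package named deepin-feature-enable. The package had been added in April 2021 without consulting or notifying SUSE. It contained a "license agreement dialog" which essentially said that, because of openSUSE's security rules, all the dbus and polkit functionality required by deepin-api and deepin-daemon had been disabled, which could stop Deepin Desktop from working properly and leave some features unavailable. Users who did not mind these security issues could click to confirm, after which the missing dbus and polkit components would be installed automatically. The security team's investigation found that core components of deepin-daemon had never been submitted for security review and had been quietly introduced into openSUSE. Given the Deepin community's repeated violations over the past few years, openSUSE decided to remove Deepin Desktop. The Fedora project then opened its own security review of the Deepin desktop packages, during which developers found it hard to reach some of the packages' maintainers. Citing security concerns and the lack of maintenance, Fedora ultimately decided to remove the Deepin desktop environment.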

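OpenAI, Nvidia, and others add SynthID watermark support to their models

Three years ago Google launched SynthID, a digital watermarking technology for marking AI images, and it says SynthID has so far been used to mark 100 billion images and videos. Last year Google added SynthID detection to the Gemini app: users upload suspicious content and ask the chatbot whether it is AI-generated. Google says no one has yet managed to crack SynthID, and it has announced partnerships with several AI companies to add support for the watermark. Nvidia's Cosmos, OpenAI's GPT 2 image, Kakao, and ElevenLabs will all add SynthID support to their AI-generated content.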

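Global vaccination rates are declining

Global vaccination rates are falling. After the COVID-19 pandemic, which threw healthcare systems into chaos, vaccination rates have still not recovered to their previous levels. In 2024 measles outbreaks spread to 59 countries. The measles virus is extremely contagious: if an infected person is in the same space, almost 100% of people without immunity will be infected. Complications include pneumonia and otitis media, and the disease can even lead to encephalitis and become severe. Preventing measles depends on vaccination, and maintaining herd immunity and stopping outbreaks from spreading requires vaccination coverage above 95%. During the pandemic, travel restrictions led people to postpone other vaccinations, and on the healthcare side vaccination and treatment staff were focused on COVID-19. In addition, with other infectious diseases suppressed, more and more people came to believe vaccination was unnecessary, and global vaccination rates kept falling. Diseases other than measles show a similar trend: in 2024, coverage of the combined diphtheria, pertussis, and tetanus vaccine was below its post-2010 peak in every region of the world.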

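The most efficient route between Earth and the Moon

Scientists have developed a mathematical method that more precisely calculates the most economical travel routes between the orbits of celestial bodies. For the Earth-Moon case, the new route needs 58.80 m/s less velocity change than the previous most fuel-efficient route. Compared with the journey's estimated total cost of 3342.96 m/s the difference looks small, but it has a large effect on mission cost: the team says that in space travel every 1 m/s of velocity change means a large amount of fuel. Based on this result, the team plotted a spacecraft trajectory from Earth orbit to lunar orbit and split it into two phases. First, the spacecraft leaves Earth orbit and enters an orbit around the L1 Lagrange point, which lies between the Earth and the Moon, where the gravitational pulls of the two bodies exactly cancel. With the help of a control system, the spacecraft can stay in this intermediate orbit indefinitely until the mission is ready, and then execute the second phase, entering lunar orbit.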
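To put the two figures in proportion, here is a rough sketch that computes the relative saving and, via the Tsiolkovsky rocket equation, what it could mean for propellant. Two things are assumptions made here and not stated in the item: that 3342.96 m/s is the total for the new route (so the older route is taken as 58.80 m/s more), and the specific impulse `ASSUMED_ISP`, which is a hypothetical engine value.

```python
import math

G0 = 9.80665  # standard gravity in m/s^2

def propellant_fraction(delta_v, isp_seconds):
    """Tsiolkovsky rocket equation: share of initial mass that must be propellant."""
    return 1.0 - math.exp(-delta_v / (isp_seconds * G0))

total_dv = 3342.96   # m/s, estimated total cost quoted in the item
saving_dv = 58.80    # m/s, reduction quoted in the item

# Relative saving in velocity change (about 1.76%).
relative_saving = saving_dv / total_dv

# Hypothetical engine; the item does not name a propulsion system.
ASSUMED_ISP = 320.0  # seconds
old_fraction = propellant_fraction(total_dv + saving_dv, ASSUMED_ISP)
new_fraction = propellant_fraction(total_dv, ASSUMED_ISP)

print(f"relative delta-v saving: {relative_saving:.2%}")
print(f"propellant fraction: {old_fraction:.4f} -> {new_fraction:.4f}")
```

Because propellant mass grows exponentially with velocity change, even a saving of under 2% shifts the mass budget in favor of payload.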

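GitHub confirms hackers stole its internal repositories

GitHub confirmed through its official account on X that hackers stole its internal repositories, and it is investigating. Earlier, the hacking group TeamPCP claimed on the Breached forum that it had gained access to GitHub's internal source code and internal organizations and had stolen about 3,800 repositories, and it set an asking price of $50,000 for anyone wanting access to the source code. TeamPCP insists this is not extortion: if someone offers at least $50,000, it will destroy the data after being paid, and if there is no buyer it will release the data for free. GitHub says its investigation shows that an employee's computer was compromised through a malicious VS Code extension that had been installed; it has removed the extension and isolated the device, and the investigation is continuing. GitHub says there is currently no evidence that customer data was affected.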

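Kickstarter reverses its blanket ban on adult content

Crowdfunding platform Kickstarter changed its rules last week to widen the range of prohibited adult content. Previously it banned only "pornographic content"; the updated rules significantly broadened the scope to include, but not be limited to, implied sexual acts, MILF/DILF content, suggestive nudity, and any content showing female nipples/areolae, genitals, or the anus. After the change caused controversy, Kickstarter confirmed that it had revised the rules under pressure from its payment processor Stripe, which is itself constrained by the wider financial system. Over the past few months many projects raising money on Kickstarter had their fundraising accounts suspended by Stripe, so Kickstarter changed its rules to meet Stripe's requirements for restricting adult content. The move drew criticism from the community, and Kickstarter has now decided to withdraw the new rules and return to the old ones, while adding links to Stripe's policies.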

09

APP STORE RANK

09.00
APP STORE RANK