TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0874
SAT, MAY 23, 2026
Discover the best information organized by OrangeBot.AI

The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK

AI News Summary

May 23, 2026

Here is a summary of today's main news events.

AI Enthusiasm Drives Stock Market to New Records

U.S. stock markets, including the S&P 500, continued to reach record highs, marking the eighth straight week of gains. This surge is primarily driven by strong investor confidence and excitement surrounding advancements in artificial intelligence (AI), which has overshadowed broader economic concerns like inflation and global conflicts.

Intense Diplomatic Efforts to Prevent Renewed Conflict with Iran

International mediators, including officials from Qatar and Pakistan, are engaged in urgent negotiations to maintain a ceasefire and prevent a return to full-scale war involving Iran. These diplomatic efforts aim to de-escalate tensions and establish humanitarian corridors, as the outcome holds significant implications for global stability and energy markets.

AI Race Heats Up for Funding as U.S. Delays Regulation

Competition among leading AI firms, including those led by Elon Musk and Sam Altman, is intensifying as they battle for massive capital investments from Wall Street. At the same time, the U.S. government has postponed signing a new executive order on AI safety amid concerns that strict regulations could slow American innovation and give a competitive edge to China.

SpaceX Launches New Starship Ahead of Record-Breaking IPO

Elon Musk’s company, SpaceX, successfully launched a new version of its massive Starship rocket. The event showcases key technology as the company prepares for what is expected to be the largest Initial Public Offering (IPO) in history, with investor demand so high it may delay the start of trading.

Mixed Day for Global Commodities

Key commodity markets experienced varied movements. Crude oil prices rose due to geopolitical uncertainty in the Middle East, while U.S. natural gas prices fell on forecasts of cooler weather reducing demand. In precious metals, both gold and silver prices declined, ending the week with losses.

Europe Braces for First Major Heatwave of the Summer

Authorities across several European countries have issued public health warnings as the continent prepares for its first significant heatwave of the season. Citizens are being urged to take precautions as temperatures are expected to rise sharply in the coming days.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS


Hacker News - May 23, 2026

Hacker News Feed: Highlighting key posts and discussions.

I Miss Terry Pratchett

(www.mahl.me)

12665
Is AI Profitable Yet?

(isaiprofitable.com)

227175
CISA tries to contain data leak

(krebsonsecurity.com)

22752
Deno 2.8

(deno.com)

380159
Cleve Moler has died

(www.mathworks.com)

28527
03

HUGGINGFACE


HuggingFace - May 23, 2026

TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation

Public transit route planning traditionally depends on structured map infrastructure and complex routing engines, and no existing dataset supports training models to bypass this dependency. We present TransitLM, a large-scale dataset of over 13 million transit route planning records from four Chinese cities covering 120,845 stations and 13,666 lines, released as a continual pre-training corpus and benchmark data for three evaluation tasks with complementary metrics. Experiments show that an LLM trained on TransitLM produces structurally valid routes at high accuracy and implicitly grounds arbitrary GPS coordinates to appropriate stations without any explicit mapping. These results demonstrate that transit route planning can be learned entirely from data, enabling end-to-end, map-free route generation directly from origin-destination information. The dataset and benchmark are available at https://huggingface.co/datasets/GD-ML/TransitLM, with evaluation code at https://github.com/HotTricker/TransitLM.

167
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

Multimodal Large Language Models (MLLMs) are increasingly deployed in human-facing roles where personality perception is critical, yet existing benchmarks evaluate this capability solely on numerical Big Five score prediction, leaving open whether models truly perceive personality through behavioral understanding or merely prejudge through superficial pattern matching. We address this gap with three contributions. (i) A new task: we formalize Grounded Personality Reasoning (GPR), which requires MLLMs to anchor each Big Five rating in observable evidence through a chain of rating, reasoning, and grounding. (ii) A new dataset: we release MM-OCEAN (1,104 videos, 5,320 MCQs), produced by a multi-agent pipeline with human verification, with timestamped behavioral observations, evidence-grounded trait analyses, and seven categories of cue-grounding MCQs. (iii) Benchmark and analysis: we design a three-tier evaluation (rating, reasoning, grounding) plus four sample-level failure-mode metrics: Prejudice Rate (PR), Confabulation Rate (CR), Integration-failure Rate (IR), and Holistic-grounding Rate (HR), and benchmark 27 MLLMs (13 closed, 14 open). The analysis uncovers a striking Prejudice Gap: across the field, 51% of correct ratings are not grounded in retrieved cues, and the Holistic-Grounding Rate spans only 0-33.5%. These findings expose a disconnect between getting the right score and reasoning for the right reason, charting a roadmap for grounded social cognition in MLLMs.

157
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

Reinforcement learning from verifiable rewards (RLVR) has emerged as a central technique for improving the reasoning capabilities of large language models. Despite its effectiveness, how response-level rewards translate into token-level probability changes remains poorly understood. We introduce a discriminator view of RLVR updates, showing that the policy-gradient update direction implicitly acts as a linear discriminator over token-gradient vectors and thereby determines which token probabilities are increased or decreased during learning. Under standard sequence-level RLVR, this discriminator is constructed from positive- and negative-side centroids formed by advantage-weighted averaging of token-gradient vectors. However, such centroid construction can be dominated by shared high-frequency patterns, such as formatting tokens, diluting sparse yet discriminative directions that better distinguish high-reward responses from low-reward ones. To address this limitation, we propose DelTA, a discriminative token credit assignment method that estimates token coefficients to amplify side-specific token-gradient directions and downweight shared or weakly discriminative ones. These coefficients reweight a self-normalized RLVR surrogate, making the effective side-wise centroids more contrastive and thereby reshaping the RLVR update direction. On seven mathematical benchmarks, DelTA outperforms the strongest same-scale baselines by 3.26 and 2.62 average points on Qwen3-8B-Base and Qwen3-14B-Base, respectively. Additional results on code generation, a different backbone, and out-of-domain evaluations further demonstrate the generalization ability of DelTA.

138
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

The rise of personal assistant agents, e.g., OpenClaw, highlights the growing potential of large language models to support users across everyday life and work. A core challenge in these settings is proactive assistance, since users often begin with underspecified requests and leave important needs, constraints, or preferences unstated. However, existing benchmarks rarely evaluate whether agents can identify and act on such hidden intents before they are explicitly stated, especially in sustained multi-turn interactions where user needs emerge gradually. To address this gap, we introduce π-Bench, a benchmark for proactive assistance comprising 100 multi-turn tasks across 5 domain-specific user personas. By incorporating hidden user intents, inter-task dependencies, and cross-session continuity, π-Bench evaluates agents' ability to anticipate and address user needs over extended interactions, jointly measuring proactivity and task completion in long-horizon trajectories that better reflect real-world use. Experiments show (1) proactive assistance remains challenging, (2) a clear distinction between task completion and proactivity, and (3) the value of prior interaction for proactive intent resolution in later tasks.

89
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Long-context inference in large language models is bottlenecked by the quadratic cost of full attention. Existing efficient alternatives often rely either on native sparse training or on heuristic token eviction, creating an undesirable trade-off among efficiency, training cost, and accuracy. In this work, we show that full-attention LLMs are already intrinsically sparse and can be transformed into highly sparse models with only minimal adaptation. Our approach is built on three observations: (1) only a small subset of attention heads truly requires full long-context processing; (2) long-range retrieval is governed primarily by a low-dimensional subspace, allowing relevant tokens to be retrieved efficiently with a 16-dimensional indexer; and (3) the useful token budget is strongly query-dependent, making dynamic top-p selection more suitable than fixed top-k sparsification. Based on these insights, we propose RTPurbo, which retains the full KV cache only for retrieval heads and introduces a lightweight token indexer for sparse attention. By exploiting the model's intrinsic sparsity, RTPurbo achieves sparsification with only a few hundred training steps. Experiments on long-context benchmarks and reasoning tasks show that RTPurbo preserves near-lossless accuracy while delivering substantial efficiency gains, including up to a 9.36× prefill speedup at 1M context and about a 2.01× decode speedup. These results suggest that strong sparse inference can be obtained from standard full-attention training without expensive native sparse pretraining.
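
To make the third observation concrete, here is a minimal sketch of why a top-p token budget adapts to the query while a fixed top-k budget does not. It is not RTPurbo's implementation: the function names, the toy score vectors, and the use of a plain softmax over raw scores are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw relevance scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_top_p(scores, p=0.9):
    """Keep the smallest set of tokens whose softmax mass reaches p (hypothetical helper)."""
    probs = softmax(scores)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return sorted(kept)

def select_top_k(scores, k):
    """Keep exactly k tokens regardless of how the scores are distributed."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])

# Toy queries (assumed data): one attends sharply to a single token, one is diffuse.
peaked_query = [8.0, 0.1, 0.0, -0.2, 0.3, 0.1]
flat_query = [1.0, 0.9, 1.1, 1.0, 0.95, 1.05]

peaked_budget = len(select_top_p(peaked_query, p=0.9))
flat_budget = len(select_top_p(flat_query, p=0.9))
```

With these toy inputs the peaked query needs far fewer tokens than the flat one to reach the same probability mass, whereas `select_top_k` would spend the same budget on both.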

82
ACC: Compiling Agent Trajectories for Long-Context Training

Recent development of agents has renewed demand for long-context reasoning capacity of LLMs. However, training LLMs for this capacity requires costly long-document curation or heuristic context synthesis. We observe that agents produce massive trajectories when solving problems, invoking tools and receiving environment observations across many turns. The evidence needed to answer the original question is thus scattered throughout these turns, requiring integration of distant context segments. Nevertheless, standard agent SFT masks tool responses and only trains turn-level tool selection, creating a supervision blind spot where these scattered signals go unused. We propose Agent Context Compilation (ACC), which converts trajectories from search, software engineering, and database querying agents into long-context QA pairs that combine the original question with tool responses and environment observations gathered across multiple turns, training the model to answer directly without tool use. This makes the dependencies between the question and the evidence explicit, enabling direct supervision of long-context reasoning over distant segments without additional annotation. ACC is a simple but effective approach that can be combined with any existing long-context extension or training method, providing scalable supervised fine-tuning data. We validate ACC on long-range dependency modeling tasks through MRCR and GraphWalks, challenging benchmarks requiring cross-turn coreference resolution and graph traversal over extended contexts. Training Qwen3-30B-A3B with ACC achieves 68.3 on MRCR (+18.1) and 77.5 on GraphWalks (+7.6), results comparable to Qwen3-235B-A22B, while preserving general capabilities on GPQA, MMLU-Pro, AIME, and IFEval. Further mechanism analysis reveals that the ACC-trained model exhibits task-adaptive attention restructuring and expert specialization.

56
PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects

Simulation-ready physical 3D assets have emerged as a promising direction owing to their broad applicability in downstream tasks. However, most existing 3D generation methods either neglect physical properties or are limited to a single asset category, e.g., rigid, deformable, or articulated objects. To address these limitations, we introduce PhysX-Omni, a unified framework for simulation-ready physical 3D generation across diverse asset types. Specifically, we develop a novel and efficient geometry representation tailored for Vision-Language Models, which directly encodes high-resolution 3D structures without compression, significantly improving generation performance. In addition, we construct the first general simulation-ready 3D dataset, PhysXVerse, covering diverse indoor and outdoor categories. Furthermore, to comprehensively and flexibly evaluate both generative and understanding capabilities in the wild, we propose PhysX-Bench, which encompasses six key attributes: geometry, absolute scale, material, affordance, kinematics, and function description. Extensive experiments with conventional metrics and PhysX-Bench show that PhysX-Omni performs strongly in both generation and understanding. Moreover, additional studies further validate the potential of PhysX-Omni for applications in simulation-ready scene generation and robotic policy learning. We believe PhysX-Omni can significantly advance a wide range of downstream applications, particularly in embodied AI and physics-based simulation.

44
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

Joint audio-visual reasoning is essential for omnimodal understanding, yet current multimodal large language models (MLLMs) still struggle when reasoning requires fine-grained evidence from both modalities. A central limitation is that explicit text-based chain-of-thought (CoT) compresses continuous audio-visual signals into discrete tokens, weakening temporal grounding and shifting intermediate reasoning toward language priors. We argue that a unified latent space is a better medium for such reasoning because it preserves dense sensory information while remaining compatible with autoregressive generation. Based on this insight, we propose LatentOmni, a cross-modal reasoning framework that interleaves textual reasoning with audio-visual latent states. LatentOmni introduces feature-level supervision to align latent reasoning states with task-relevant sensory features and uses Omni-Sync Position Embedding (OSPE) to maintain temporal consistency between latent audio and visual states. We further construct LatentOmni-Instruct-35K, a dataset of audio-visual interleaved reasoning trajectories for supervising latent-space reasoning. Comprehensive evaluation across multiple audio-visual reasoning benchmarks demonstrates that LatentOmni achieves the best performance among the evaluated open-source models and consistently outperforms the Explicit Text CoT baseline, supporting latent-space joint reasoning as a promising path toward stronger omnimodal understanding.

37
Forecasting Scientific Progress with Artificial Intelligence

Artificial intelligence (AI) is increasingly embedded in scientific discovery, yet whether it can anticipate scientific progress remains unclear. To study this question, we introduce a temporally grounded evaluation framework for forecasting scientific progress under controlled knowledge constraints. We present CUSP (Cutoff-conditioned Unseen Scientific Progress), a multi-disciplinary and event-level benchmark that evaluates scientific forecasting in AI systems through feasibility assessment, mechanistic reasoning, generative solution design, and temporal prediction. Across 4,760 scientific events, we observe systematic and domain-dependent limitations in current frontier models. While models can identify plausible research directions from competing candidates, they fail to reliably predict whether scientific advances will be realized and systematically misestimate when they will occur. Performance is highly heterogeneous across domains, with the timing of AI progress more predictable than advances in biology, chemistry, and physics. Performance is largely insensitive to whether events occur before or after the training cutoff, suggesting these limitations cannot be explained solely by knowledge exposure in training data. Under controlled information access, additional pre-cutoff knowledge improves performance but does not close the gap to full-information settings, which becomes more pronounced for high-citation advances. Models also exhibit systematic overconfidence and strong response biases, indicating unreliable uncertainty estimation. Taken together, current AI systems fall short as predictive tools for scientific progress. Access to prior knowledge does not translate into reliable forecasting, and performance benefits more from post-event information than from forward-looking prediction.

33
WorldKV: Efficient World Memory with World Retrieval and Compression

Autoregressive video diffusion models have enabled real-time, action-conditioned world generation. However, sustaining a persistent world, where revisiting a previously seen viewpoint yields consistent content, remains an open problem. Full KV-cache attention preserves this consistency but breaks real-time constraints: memory footprint and attention cost grow linearly with rollout length. Sliding window inference restores throughput but discards long-term consistency. We propose WorldKV, a training-free framework with two components: World Retrieval and World Compression. World Retrieval stores evicted KV-cache chunks in GPU/CPU memory and selectively retrieves scene-relevant chunks via camera/action correspondence, inserting them back into the native attention window without re-encoding. World Compression prunes redundant tokens within each chunk via key-key similarity to an anchor frame, halving per-chunk storage to fit 2x more history under a fixed budget. On Matrix-Game-2.0 and LingBot-World-Fast, WorldKV matches or exceeds full-KV memory fidelity at roughly 2x the throughput, and is competitive with memory-trained baselines without any fine-tuning. Project Page: https://cvlab-kaist.github.io/WorldKV/
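
The compression step can be pictured with a small sketch. The abstract only says tokens are pruned by key-key similarity to an anchor frame and that storage is halved, so everything else here is an assumption: the cosine metric, taking the maximum over anchor keys as the redundancy score, and the `compress_chunk` name.

```python
import math

def cosine(a, b):
    """Cosine similarity between two key vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def compress_chunk(chunk_keys, anchor_keys, keep_ratio=0.5):
    """Return indices of chunk tokens to keep (hypothetical helper).

    A token counts as redundant when its key closely matches some anchor-frame
    key, so the tokens with the lowest maximum similarity are kept.
    """
    redundancy = [max(cosine(k, a) for a in anchor_keys) for k in chunk_keys]
    n_keep = max(1, math.ceil(len(chunk_keys) * keep_ratio))
    order = sorted(range(len(chunk_keys)), key=lambda i: redundancy[i])
    return sorted(order[:n_keep])

# Toy 2-D keys (assumed data): tokens 0 and 2 nearly duplicate the anchor key.
anchor = [[1.0, 0.0]]
chunk = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
kept = compress_chunk(chunk, anchor)
```

With `keep_ratio=0.5` half of the tokens in each chunk survive, which matches the halving described above; the values cached alongside the kept keys would be retained in the same way.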

32
Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning

Spreadsheet systems (e.g., Microsoft Excel, Google Sheets) play a central role in modern data-centric workflows. As AI agents grow increasingly capable of automating complex tasks, such as controlling computers and generating presentations, building an AI-driven spreadsheet agent has emerged as a promising research direction. Most existing spreadsheet agents rely on specialized prompting over general-purpose LLMs; while this design shows potential on simple spreadsheet operations, it struggles to manage the complex, multi-step workflows typical of real-world applications. We introduce Spreadsheet-RL, a reinforcement learning (RL) fine-tuning framework designed to train specialized spreadsheet agents within a realistic Microsoft Excel environment. Spreadsheet-RL features an automated pipeline for scalable collection of paired start-goal spreadsheets from online forums, as well as domain-specific evaluation tasks in areas such as finance and supply chain management, which we compile into the new Domain-Spreadsheet benchmark dataset. It also includes a Spreadsheet Gym environment designed for multi-turn RL: Spreadsheet Gym exposes extensive Excel functionality through a Python sandbox, along with a refined harness that incorporates a comprehensive tool set and carefully designed tool-routing rules for spreadsheet tasks. Through comprehensive experiments, we show that Spreadsheet-RL substantially enhances AI agents' performance on both general and domain-specific spreadsheet tasks: it improves Qwen3-4B-Thinking-2507's Pass@1 on SpreadsheetBench from 12.0% to 23.4%, and raises Pass@1 from 8.4% to 17.2% on our curated Domain-Spreadsheet dataset. These results highlight Spreadsheet-RL's strong potential for generalization and real-world adoption in spreadsheet automation, and broadly, its promise for advancing LLM-based interactions with data interfaces in everyday work.

32
SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

Diffusion transformers (DiTs) have emerged as a dominant architecture for text-to-image generation, yet their performance drops when generating at resolutions beyond their training range. Existing training-free approaches mitigate this by modifying inference-time attention behavior, often through Rotary Position Embeddings (RoPE) extrapolation combined with attention scaling. However, these strategies apply a uniform and content-agnostic scaling across RoPE components with distinct frequency characteristics, inducing a trade-off between preserving global structure and recovering fine detail. We introduce SEGA, a training-free method that dynamically scales attention across RoPE components according to the latent's spatial-frequency structure at each denoising step. This adaptive scaling improves both structural coherence and fine-detail fidelity. Experiments show that SEGA consistently improves high-resolution synthesis across multiple target resolutions, outperforming state-of-the-art training-free baselines.

29
FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching

Extending the generation horizon of video diffusion models to long sequences remains a long-standing and important challenge. Existing training-free approaches fall into two categories: extensions of bidirectional models, which are tightly coupled to specific architectures and suffer from quality degradation over long horizons, and autoregressive models, which accumulate drift errors due to exposure bias and tend to produce repetitive motion patterns. To address these issues, we propose a novel but simple inference-time approach for long video generation that is architecture-agnostic and requires no additional training. Our method generates long videos via overlapping sliding windows, where predicted clean samples from adjacent windows are blended via Tweedie matching to enforce both manifold constraint and temporal consistency across overlap regions. Stochastic early-phase sampling then synchronizes per-window trajectories by injecting fresh noise after each Tweedie matching correction in the high-noise phase, before transitioning to deterministic ODE sampling to preserve fine-grained visual fidelity. Applied to various video generation models, our method generates videos several times longer than the native window length while outperforming both training-free and autoregressive baselines in temporal consistency and visual quality, and further extends to audio-video joint generation and text-to-3DGS without any fine-tuning.

24
Unsupervised Process Reward Models

Process Reward Models (PRMs) are a powerful mechanism for steering large language model reasoning by providing fine-grained, step-level supervision. However, this effectiveness comes at a significant cost: PRMs require expert annotations for every reasoning step, making them costly and difficult to scale. Here, we propose a method for training unsupervised PRMs (uPRM) that requires no human supervision, neither at the level of step-by-step annotations nor through ground-truth verification of final answers. The key idea behind our approach is to define a scoring function, derived from LLM next-token probabilities, that jointly assesses candidate positions of first erroneous steps across a batch of reasoning trajectories. We demonstrate the effectiveness of uPRM across diverse scenarios: (i) uPRM achieves up to 15% absolute accuracy improvements over the LLM-as-a-Judge in identifying first erroneous steps on the ProcessBench dataset; (ii) as a verifier for test-time scaling, uPRM performs comparably to supervised PRMs and outperforms the majority voting baseline by up to 6.9%, and (iii) when used as a reward signal in reinforcement learning, uPRM enables more robust policy optimization throughout training compared to a supervised PRM trained using ground-truth labels. Overall, our results open a path toward scalable reward modeling for complex reasoning tasks.

23
SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

Multimodal Large Language Models (MLLMs) have made rapid progress in spatial intelligence, yet existing spatial reasoning benchmarks largely assume pristine visual inputs and overlook the degradations that commonly occur in real-world deployment, such as motion blur, low light, adverse weather, lens distortion, and compression artifacts. This raises a fundamental question: how robust is the spatial intelligence of current MLLMs when visual observations are imperfect? To answer this question, we introduce SpaceDG, the first large-scale dataset for degradation-aware spatial understanding. It is constructed with a physically grounded degradation synthesis engine that embeds the degradation formation process into 3D Gaussian Splatting (3DGS) rendering, enabling realistic simulation of nine degradation types. The resulting dataset contains approximately 1M QA pairs from nearly 1,000 indoor scenes. We further introduce SpaceDG-Bench, a human-verified benchmark with 1,102 questions spanning 11 reasoning categories and 9 visual degradation types, yielding over 10K VQA instances. Evaluating 25 open- and closed-source MLLMs reveals that visual degradations consistently and substantially impair spatial reasoning, exposing a critical robustness gap. Finally, we show that finetuning on SpaceDG markedly improves degradation robustness and can even surpass human performance under degraded conditions without any performance drop on clean images, highlighting the promise of degradation-aware training for robust spatial intelligence.

23
Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving

Robust training and validation of Autonomous Driving Systems (ADS) require massive, diverse datasets. Proprietary data collected by Autonomous Vehicle (AV) fleets, while high-fidelity, are limited in scale, diversity of sensor configurations, as well as geographic and long-tail-behavioral coverage. In contrast, in-the-wild data from sources like dashcams offers immense scale and diversity, capturing critical long-tail scenarios and novel environments. However, this unstructured, in-the-wild video data is incompatible with ADS expecting structured, multi-modal sensor inputs for validation and training. To bridge this data gap, we propose Sensor2Sensor, a novel generative modeling paradigm that translates in-the-wild monocular dashcam videos into a high-fidelity, multi-modal sensor suite (AV logs) comprising multi-view camera images and LiDAR point clouds. A core challenge is the lack of paired training data. We address this by converting real AV logs into dashcam-style videos via 4D Gaussian Splatting (4DGS) reconstruction and novel-view rendering. Sensor2Sensor then utilizes a diffusion architecture to perform the generative conversion. We perform comprehensive quantitative evaluations on the fidelity and realism of the generated sensor data. We demonstrate Sensor2Sensor's practical utility by converting challenging in-the-wild internet and dashcam footage into realistic, multi-modal data formats, further unlocking vast external data sources for AV development.

22
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Linear attention replaces the unbounded cache of softmax attention with a fixed-size recurrent state, reducing sequence mixing to linear time and decoding to constant memory. The hard part is not just what to forget, but how to edit this compressed memory without scrambling existing associations. Delta-rule models subtract the current read before writing a new value, and Kimi Delta Attention (KDA) sharpens forgetting with channel-wise decay. But the active edit still uses a single scalar gate to control two different things: how much old content to erase on the key side and how much new content to commit on the value side. We introduce Gated DeltaNet-2, which generalizes both Gated DeltaNet and KDA by inheriting adaptive forgetting and channel-wise decay while addressing their shared limitation, the scalar tie between erasing and writing. Gated Delta Rule-2 separates these roles with a channel-wise erase gate b_t and a channel-wise write gate w_t, reducing to KDA when both gates collapse to the same scalar and to Gated DeltaNet when the decay also collapses. We derive a fast-weight update view, a chunkwise WY algorithm with channel-wise decay absorbed into asymmetric erase factors, and a gate-aware backward pass that preserves efficient parallel training. At 1.3B parameters trained on 100B FineWeb-Edu tokens, Gated DeltaNet-2 achieves the strongest overall results among Mamba-2, Gated DeltaNet, KDA, and Mamba-3 variants across language modeling, commonsense reasoning, and retrieval. Its advantage is most pronounced on long-context RULER needle-in-a-haystack benchmarks, where it improves the evaluated multi-key retrieval setting and remains strong in both recurrent and hybrid settings. Code is available at https://github.com/NVlabs/GatedDeltaNet-2.
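
The abstract gives no equations, so the following is only one plausible reading of "decoupling erase and write", not the paper's formulation. It assumes a state matrix of shape d_k by d_v, a decay and an erase gate indexed by key channel, and a write gate indexed by value channel; the function name and the toy vectors are hypothetical. The sketch is built so that it collapses to the classic delta rule when both gates equal the same scalar and the decay is 1.

```python
def gated_delta2_step(state, k, v, decay, erase, write):
    """One recurrent update of a d_k x d_v fast-weight state (assumed form).

    decay[i]: per-key-channel forgetting factor
    erase[i]: per-key-channel gate on how much old content is removed
    write[j]: per-value-channel gate on how much new content is committed
    """
    d_k, d_v = len(k), len(v)
    # 1) Channel-wise decay of the old memory.
    decayed = [[decay[i] * state[i][j] for j in range(d_v)] for i in range(d_k)]
    # 2) Read the value currently associated with key k.
    read = [sum(k[i] * decayed[i][j] for i in range(d_k)) for j in range(d_v)]
    # 3) Erase the old read (key-side gate), then write the new value (value-side gate).
    return [
        [decayed[i][j] - erase[i] * k[i] * read[j] + k[i] * write[j] * v[j]
         for j in range(d_v)]
        for i in range(d_k)
    ]

# Toy example (assumed data): store v1 under key k, then overwrite it with v2.
zeros = [[0.0, 0.0], [0.0, 0.0]]
ones = [1.0, 1.0]
k = [1.0, 0.0]
s1 = gated_delta2_step(zeros, k, [2.0, 3.0], ones, ones, ones)
s2_overwrite = gated_delta2_step(s1, k, [5.0, 7.0], ones, ones, ones)
# With the erase gate closed, the new value is added on top of the old one.
s2_accumulate = gated_delta2_step(s1, k, [5.0, 7.0], ones, [0.0, 0.0], ones)
```

Under these assumptions a single shared scalar gate can only overwrite or ignore the slot in proportion, while separate gates also allow accumulation without erasure, which is the kind of freedom the abstract attributes to b_t and w_t.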

20
Q-ARVD: Quantizing Autoregressive Video Diffusion Models

Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation, paving the way for real-time interactive video generation and world modeling. Despite their potential, the substantial inference cost of ARVDs remains a major obstacle to practical deployment, making model quantization a natural direction for improving efficiency. However, quantization for ARVDs remains largely unexplored. Our empirical analysis shows that directly applying existing quantization schemes developed for standard diffusion transformers to ARVDs leads to suboptimal performance, revealing quantization behaviors that differ from those observed in bidirectional diffusion models. In this paper, we identify two critical challenges in quantizing ARVDs: (C1) Highly unbalanced frame-wise quantization sensitivity. Error accumulation during autoregressive generation can induce severely skewed quantization sensitivity across frames, following an exponential-like decay pattern. (C2) Prominent and heterogeneous outlier patterns in weights. Weight distributions exhibit pronounced outlier channels, whose patterns vary substantially across layer types and block depths. To address these issues, we propose Q-ARVD, a novel framework for accurate ARVD quantization. (S1) To tackle the highly unbalanced frame-wise sensitivity, Q-ARVD incorporates a final-quality aware frame-weighting mechanism into the quantization objective. (S2) To prevent heterogeneous outliers from degrading performance, Q-ARVD introduces an outlier-aware adaptive dual-scale quantization, which automatically detects the presence and quantity of outlier channels for an arbitrary layer, and isolates them to protect normal channels. Extensive experiments demonstrate the superiority of Q-ARVD.
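
As a rough illustration of the dual-scale idea in (S2), the sketch below flags outlier channels and gives them their own quantization scale so they do not stretch the scale used by normal channels. The detection rule (peak magnitude above a multiple of the median peak), the symmetric 8-bit format, and all names are assumptions; the abstract does not specify Q-ARVD's actual criteria.

```python
import statistics

def dual_scale_quantize(weight, ratio=4.0, bits=8):
    """Quantize a list of channels with separate scales for outliers (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    peaks = [max(abs(x) for x in channel) for channel in weight]
    median_peak = statistics.median(peaks)
    # A channel is treated as an outlier when its peak dwarfs the typical peak.
    is_outlier = [p > ratio * median_peak for p in peaks]

    def scale_for(flag):
        group = [p for p, o in zip(peaks, is_outlier) if o == flag]
        top = max(group) if group else 0.0
        return top / qmax if top > 0 else 1.0

    scales = {False: scale_for(False), True: scale_for(True)}
    quantized = [
        [round(x / scales[o]) for x in channel]
        for channel, o in zip(weight, is_outlier)
    ]
    return quantized, is_outlier, scales

def dequantize(quantized, is_outlier, scales):
    """Map integer codes back to floats using each channel's group scale."""
    return [[q * scales[o] for q in channel] for channel, o in zip(quantized, is_outlier)]

# Toy layer (assumed data): the third channel is roughly 50x larger than the others.
weight = [[0.1, -0.2], [0.05, 0.15], [10.0, -8.0]]
quantized, is_outlier, scales = dual_scale_quantize(weight)
restored = dequantize(quantized, is_outlier, scales)
```

With a single shared scale, the large channel would force a step size of about 0.08 and the small weights would collapse to one or two integer levels; isolating the outlier keeps the step size for normal channels near 0.0016.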

19
Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

The proliferation of large language models (LLMs) and modular skills has endowed autonomous agents with increasingly powerful capabilities. Existing frameworks typically rely on monolithic LLMs and fixed logic to interface with these skills. This gives rise to a critical bottleneck: different LLMs offer distinct advantages across diverse domains, yet current frameworks fail to exploit the complementary strengths of models and skills, thereby limiting their performance on downstream tasks. In this paper, we present Maestro (Multimodal Agent for Expert-Skill Targeted Reinforced Orchestration), a Reinforcement Learning (RL)-driven orchestration framework that reframes heterogeneous multimodal tasks as a sequential decision-making process over a hierarchical model-skill registry. Rather than consolidating all knowledge into a single model, Maestro trains a lightweight policy to dynamically compose ensembles of frozen expert models and a two-tier skill library, deciding at each step whether to invoke an external expert, which model-skill pair to select, and when to terminate. The policy is optimized via outcome-based RL, requiring no step-level supervision. We evaluate Maestro across ten representative multimodal benchmarks spanning mathematical reasoning, chart understanding, high-resolution perception, and domain-specific analysis. With only a 4B orchestrator, Maestro achieves an average accuracy of 70.1%, surpassing both GPT-5 (69.3%) and Gemini-2.5-Pro (68.7%). Crucially, the learned coordination policy generalizes to unseen models and skills without retraining: augmenting the registry with out-of-domain experts yields a 59.5% average on four challenging benchmarks, outperforming all closed-source baselines. Maestro further maintains high computational efficiency with low latency. The source code is available at https://github.com/jinyangwu/Maestro.

18
Training Large Language Models to Predict Clinical Events

Longitudinal clinical notes contain rich evidence of how patients evolve over time, but converting this signal into training supervision for clinical prediction remains challenging. We extend Foresight Learning to clinical prediction by converting time-ordered MIMIC-III notes into examples consisting of past patient context, a natural-language question about a possible future event, and a label resolved from later documentation. This process yields 6,900 prediction examples from 702 admissions across medications, procedures, organ support, microbiology, and mortality. A small LoRA adapter trained on these examples improves over the prompted base model, reducing expected calibration error from 0.1269 to 0.0398 and Brier score from 0.199 to 0.145, while slightly outperforming GPT-5 point estimates on held-out questions. The approach enables reusable clinical prediction supervision from longitudinal notes without hand-engineered structured features or endpoint-specific classifiers.
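
For readers unfamiliar with the two calibration metrics reported above, the following sketch shows how they are commonly computed for binary event probabilities. The equal-width binning with `n_bins=10` and the function names are assumptions; the abstract does not state which ECE variant the authors use.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE: weighted gap between mean confidence and event rate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / len(probs)) * abs(mean_p - event_rate)
    return ece
```

Lower is better for both: the Brier score penalizes every individual miss, while ECE only measures whether predicted probabilities match observed frequencies on average within each bin.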

14
Forecasting Downstream Performance of LLMs With Proxy Metrics

Progress in language model development is often driven by comparative decisions: which architecture to adopt, which pretraining corpus to use, or which training recipe to apply. Making these decisions well requires reliable performance forecasts, yet the two commonly used signals are fundamentally limited. Cross-entropy loss is poorly aligned with downstream capabilities, and direct downstream evaluation is expensive, sparse, and often uninformative at early training stages. Instead, we propose to construct proxy metrics by aggregating token-level statistics, such as entropy, top-k accuracy, and expert token rank, from a candidate model's next token distribution over expert-written solutions. Across three settings, our proxies consistently outperform loss- and compute-based baselines: 1) for cross-family model selection, they rank a heterogeneous population of reasoning models with mean Spearman ρ = 0.81 (vs. ρ = 0.36 for cross-entropy loss); 2) for pretraining data selection, they reliably rank 25 candidate corpora for a target model at roughly 10,000× less compute than direct evaluation, pushing the Pareto frontier beyond existing methods; and 3) for training-time forecasting, they extrapolate downstream accuracy across an 18× compute horizon with roughly half the error of existing alternatives. Together, these results suggest that expert trajectories are a broadly useful source of signal for assessing model capabilities, enabling reliable performance forecasting throughout the model development life cycle.
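
The token-level statistics named above can be sketched as follows. This is an illustrative sketch only: representing each next-token distribution as a `dict`, averaging as the aggregation step, and the helper names are assumptions, and `spearman_rho` uses the simple closed form that is valid only when there are no ties.

```python
import math


def token_stats(dist, expert_token, k=5):
    """Statistics for one position; dist maps token -> probability."""
    entropy = -sum(p * math.log(p) for p in dist.values() if p > 0)
    ranked = sorted(dist, key=dist.get, reverse=True)
    rank = ranked.index(expert_token) + 1  # 1 = expert token is the top guess
    return {"entropy": entropy, "rank": rank, "topk_hit": rank <= k}


def aggregate(positions, k=5):
    """Average the per-token statistics over an expert-written solution."""
    stats = [token_stats(dist, tok, k) for dist, tok in positions]
    n = len(stats)
    return {
        "mean_entropy": sum(s["entropy"] for s in stats) / n,
        "topk_accuracy": sum(s["topk_hit"] for s in stats) / n,
        "mean_rank": sum(s["rank"] for s in stats) / n,
    }


def spearman_rho(xs, ys):
    """Spearman rank correlation, assuming no tied values."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for pos, i in enumerate(order):
            r[i] = pos + 1
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

A proxy of this kind would be scored by computing one aggregate per candidate model and comparing the resulting ordering against measured downstream accuracy with `spearman_rho`.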

10
GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

Open-ended image generation is no longer a simple prompt-to-image problem. High-quality generation often requires an agent to combine a model's internal generative ability with external resources. As requests become more diverse and demanding, we aim to develop a general image-generation agent that can self-evolve through trajectories and use tools more effectively across varied generation challenges. To this end, we propose GenEvolve, a self-evolving framework based on Tool-Orchestrated Visual Experience Distillation. In GenEvolve, each generation attempt is modeled as a tool-orchestrated trajectory, where the agent gathers evidence, selects references, invokes generation skills, and composes them into a prompt-reference program. Unlike existing agentic generation methods that mainly rely on image-level scalar rewards, GenEvolve compares multiple trajectories for the same request and abstracts best-worst differences into structured visual experience, provided only to a privileged teacher branch. Inspired by on-policy self-distillation, Visual Experience Distillation provides dense token-level supervision, helping the student internalize better search, knowledge activation, reference selection, and prompt construction. We further construct GenEvolve-Data and GenEvolve-Bench. Experiments on public benchmarks and GenEvolve-Bench show substantial gains over strong baselines, achieving state-of-the-art performance among current image-generation frameworks. Our website is as follows: https://ephemeral182.github.io/GenEvolve/

10
KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

LLMs are widely adopted in production, pushing inference systems to their limits. Disaggregated LLM serving (e.g., PD separation and KV state disaggregation) improves scalability and cost efficiency, but it also turns KV into an explicit payload crossing network and storage boundaries, making KV a dominant end-to-end bottleneck. Existing KV compression schemes are typically static runtime configurations, even though the production service context varies over time in workload mix, bandwidth, and SLO/quality budgets. As a result, a fixed choice can be suboptimal or even increase latency. We present KVServe, the first service-aware and adaptive KV communication compression framework for disaggregated LLM serving. KVServe (1) unifies KV compression into a modular strategy space with new components and cross-method recomposition; (2) introduces a Bayesian Profiling Engine that efficiently searches this space and distills a 3D Pareto candidate set, reducing offline search overhead by 50×; and (3) deploys a Service-Aware Online Controller that combines an analytical latency model with a lightweight bandit to select profiles under constraints and correct offline-to-online mismatch. Integrated into vLLM and evaluated across datasets, models, GPUs and networks, KVServe achieves up to 9.13× JCT speedup in PD-separated serving and up to 32.8× TTFT reduction in KV-disaggregated serving.

10
ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

Large language models (LLMs) and agentic systems have shown promise for clinical decision support, but existing works largely assume that evidence has already been curated and handed to the model. Real-world clinical workflows instead require agents to actively seek, iteratively plan, and synthesize multimodal evidence from heterogeneous sources. In this paper, we introduce ClinSeekAgent, an automated agentic framework for dynamic multimodal evidence seeking that shifts the paradigm from passive evidence consumption to active evidence acquisition. Given only a clinical query and access to raw data sources, ClinSeekAgent gathers evidence by querying medical knowledge bases, navigating raw EHRs, and invoking medical imaging tools; refines its hypotheses as new information emerges; and integrates the collected evidence into grounded clinical decisions. ClinSeekAgent serves both as an inference-time agent for frontier LLMs and as a training-time pipeline for distilling high-quality agent trajectories into compact open-source models. To validate its inference-time effectiveness, we construct ClinSeek-Bench, which pairs Curated Input reasoning from fixed pre-selected evidence with Automated Evidence-Seeking over raw clinical data. On text-only EHR tasks, ClinSeekAgent improves Claude Opus 4.6 from 60.0 to 63.2 overall F1 and MiniMax M2.5 from 43.1 to 47.3, with positive risk-prediction gains in 7 out of 9 evaluated host models. On multimodal tasks, ClinSeekAgent improves Claude Opus 4.6 from 47.5 to 62.6 (+15.1); all evaluated models improve across the three CXR-related task groups. We further validate ClinSeekAgent as a training pipeline by distilling agentic evidence-seeking trajectories into ClinSeek-35B-A3B, which achieves 34.0 average F1 on existing AgentEHR-Bench, improving over its Qwen3.5-35B-A3B baseline by +11.9 points and approaching Claude Opus 4.6.

7
One Sentence, One Drama: Personalized Short-Form Drama Generation via Multi-Agent Systems

Existing approaches for digital short-drama production typically rely on one-shot LLM generated scripts and loosely coupled pipelines, which fail to satisfy three key requirements of short-drama generation: (1) narrative pacing, resulting in weak hooks, insufficient escalation, and unattractive endings; (2) spatial consistency, leading to drifting scene layouts and inconsistent character positions across clips; and (3) production-level quality control, requiring extensive manual review and correction across script and visual stages. We present One Sentence, One Drama, a hierarchical multi-agent framework that transforms a user's single-sentence idea into a fully produced short drama through structured intermediate modules and iterative refinement. Our approach is built upon three key components: (1) a multi-agent debate-based story generation module that enforces short-drama pacing and narrative coherence; (2) a 3D-grounded first-frame generation mechanism that establishes a shared spatial reference for consistent character positioning and scene layout across clips; and (3) multi-stage reviewer loops that perform comprehensive error detection and targeted revision across script, visual, and video generation stages. We also introduce scene-level BGM matching and scene transition planning to improve the audience's immersive experience. To systematically evaluate this task, we introduce Short-Drama-Bench, a benchmark that extends standard video quality metrics with short-drama-specific criteria. Experimental results demonstrate that our method significantly outperforms existing pipelines in narrative quality, cross-clip consistency, and overall viewing experience.

7
LoREnc: Low-Rank Encryption for Securing Foundation Models and LoRA Adapters

Foundation models and low-rank adapters enable efficient on-device generative AI but raise risks such as intellectual property leakage and model recovery attacks. Existing defenses are often impractical because they require retraining or access to the original dataset. We propose LoREnc, a training-free framework that secures both FMs and adapters via spectral truncation and compensation. LoREnc suppresses dominant low-rank components of FM weights, compensates for the missing information in authorized adapters, and further applies orthogonal reparameterization to obscure structural fingerprints of the protected adapter. Unauthorized users produce structurally collapsed outputs, while authorized users recover exact performance. Experiments demonstrate that LoREnc provides strong protection against model recovery with under 1% computational overhead.

6
Swift Sampling: Selecting Temporal Surprises via Taylor Series

While most frames in long-form video are redundant, the critical information resides in temporal surprises: moments where the actual visual features deviate from their predicted evolution. Inspired by the human brain's predictive coding, we introduce Swift Sampling, an elegant, training-free frame selection algorithm that automatically identifies high-information moments in a video. Specifically, we model a video as a differentiable trajectory in the visual latent space and compute the velocity and acceleration of its features. Then, we apply Taylor expansion to project the expected path of subsequent frames. Frames that diverge sharply from this predicted manifold are identified as temporally surprising frames and selected for sampling. Unlike prior training-free methods that rely on auxiliary networks or video-specific hyperparameter tuning, Swift Sampling is incredibly lightweight, adding only 0.02x additional computational cost over the baseline, which makes its overhead 30x cheaper than that of leading baselines. Across three long-video question answering benchmarks and 10 different downstream tasks, Swift Sampling outperforms uniform sampling and prior query-agnostic baselines. It is especially powerful for long videos with limited frame budgets, improving accuracy by up to 12.5 points.
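
The velocity, acceleration, and Taylor-projection steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes backward finite differences with a unit time step, a second-order expansion, Euclidean distance as the surprise score, and top-`budget` selection; the function names are hypothetical.

```python
import math


def surprise_scores(feats):
    """Score each frame by its distance from a second-order Taylor prediction.

    feats: list of per-frame feature vectors (lists of floats).
    """
    scores = [0.0] * len(feats)  # the first three frames lack a full history
    for t in range(2, len(feats) - 1):
        pred = []
        for x2, x1, x0 in zip(feats[t - 2], feats[t - 1], feats[t]):
            vel = x0 - x1            # first difference (velocity)
            acc = x0 - 2 * x1 + x2   # second difference (acceleration)
            pred.append(x0 + vel + 0.5 * acc)
        scores[t + 1] = math.dist(feats[t + 1], pred)
    return scores


def select_frames(feats, budget):
    """Return the indices of the `budget` most surprising frames, in time order."""
    scores = surprise_scores(feats)
    top = sorted(range(len(feats)), key=lambda i: scores[i], reverse=True)[:budget]
    return sorted(top)
```

Under these assumptions, features that move at constant velocity are predicted exactly and score zero, so only frames that break the local trend compete for the frame budget.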

6
Segment Anything with Motion, Geometry, and Semantic Adaptation for Complex Nonlinear Visual Object Tracking

Traditional visual object tracking (VOT) methods typically rely on task-specific supervised training, limiting their generalization to unseen objects and challenging scenarios with distractors, occlusion, and nonlinear motion. Recent vision foundation models, exemplified by SAM 2, learn strong video understanding priors from large-scale pretraining and offer a promising foundation for building more robust and generalizable trackers. However, directly applying SAM 2 to VOT remains suboptimal, as it does not explicitly model target motion dynamics or enforce geometric and semantic consistency across frames, both of which are essential for reliable tracking. To address this issue, we propose SAMOSA, a new tracking framework that adapts SAM 2 to complex VOT scenarios by explicitly leveraging motion, geometry, and semantic cues. Specifically, we introduce a lightweight nonlinear motion predictor to model target dynamics and guide mask selection as well as memory filtering. We further exploit semantic cues to detect target shifts and recover from tracking failures, while geometric cues are incorporated as structural constraints to improve tracking stability. In this way, SAMOSA bridges the gap between the implicit video understanding prior of SAM 2 and explicit tracking-oriented modeling. Extensive experiments show that SAMOSA consistently outperforms state-of-the-art SAM 2-based approaches on general benchmarks, demonstrates stronger generalization than supervised VOT methods, and achieves substantial gains on anti-UAV datasets, which typify complex nonlinear motion scenarios. Our code is available at https://github.com/DurYi/SAMOSA.

5
SceneAligner: 3D-Grounded Floorplan Localization in the Wild

Many public buildings provide floorplans with a "you are here" indicator to help visitors orient themselves. Floorplan localization seeks to computationally replicate this capability by determining where visual observations were captured within a floorplan. However, existing methods typically assume controlled small-scale environments and precise vectorized floorplans, limiting their ability to operate in large-scale buildings and rasterized floorplans. In this work, we present an approach for performing floorplan localization in the wild by grounding the task in a reconstructed 3D representation of the scene. Given an unconstrained image collection, our method reconstructs a gravity-aligned 3D scene and projects it into a 2D density map that serves as a floorplan proxy. Floorplan localization is then formulated as aligning this proxy with the input floorplan via a 2D similarity transform. To bridge the appearance gap between density maps and architectural floorplans, we adapt a 2D foundation model to learn cross-modal correspondences, introducing a fine-tuning scheme that encourages semantically aligned matches while preserving structural consistency. Extensive experiments demonstrate substantial improvements over prior methods, including in extremely sparse settings with as little as a single input image. Our code and data will be publicly available.

5
AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment

Aligning Text-to-Image (T2I) generation models with human preferences increasingly relies on image reward models that score or rank generated images according to prompt alignment and perceptual quality. Existing reward models are commonly trained as Bradley-Terry (BT) preference models on large-scale human preference corpora, making them costly to train, difficult to adapt, and opaque in their evaluation criteria. Meanwhile, Vision-Language Model (VLM) judges can provide more fine-grained assessments through textual rubrics, but their manually designed or heuristically generated scoring rules may fail to reliably reflect human preferences. In this paper, we propose AutoRubric-T2I, the first rubric learning framework in T2I that automatically synthesizes and selects explicit rubrics for guiding VLM judges. AutoRubric-T2I first synthesizes reasoning traces from preference pairs into candidate rubrics, then uses a VLM judge to score paired images under each rubric, producing pairwise rubric-score differences for preference learning. To remove noisy and redundant rules, we further employ an ℓ1-Regularized Logistic Regression Refiner, which selects the Top-N most discriminative rubrics. Extensive evaluations show that AutoRubric-T2I produces high-quality, interpretable reward signals using less than 0.01% of the annotated preference data, substantially reducing the need for large-scale reward-model training. On image reward benchmarks such as MMRB2, AutoRubric-T2I outperforms strong reward model baselines. We further validate AutoRubric-T2I as an RL reward on downstream T2I tasks, including TIIF and UniGenBench++, where it improves generation quality over scalar reward models using the Flow-GRPO pipeline on diffusion models.
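
The rubric-refinement step can be illustrated with a small sparse logistic regression fitted on pairwise rubric-score differences. This is a sketch under stated assumptions rather than the paper's refiner: it uses plain proximal gradient descent (ISTA) without an intercept, and the hyperparameters `lam`, `lr`, `steps`, and all function names are hypothetical.

```python
import math


def soft_threshold(x, t):
    """Proximal operator of the L1 penalty: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def fit_l1_logistic(X, y, lam=0.1, lr=0.5, steps=500):
    """X[i][j]: score difference of pair i under rubric j; y[i] is 1 or 0."""
    n, m = len(X), len(X[0])
    w = [0.0] * m
    for _ in range(steps):
        grad = [0.0] * m
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(m):
                grad[j] += (p - yi) * xi[j] / n
        # Gradient step on the logistic loss, then L1 shrinkage.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w


def select_rubrics(w, top_n):
    """Keep rubrics with non-zero weight, ordered by absolute weight."""
    kept = [j for j in range(len(w)) if w[j] != 0.0]
    return sorted(kept, key=lambda j: abs(w[j]), reverse=True)[:top_n]
```

The L1 penalty drives the weights of uninformative rubrics exactly to zero, which is what makes the surviving set short enough to read and audit.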

4
Rule2DRC: Benchmarking LLM Agents for DRC Script Synthesis with Execution-Guided Test Generation

Manufacturable chip layouts must satisfy thousands of geometry-based design rules, and design rule checking (DRC) enforces them by running executable DRC scripts on layouts. Translating natural language rules into correct DRC scripts is labor-intensive and requires specialized expertise, motivating LLM agents for DRC script synthesis and debugging. However, existing benchmarks have small evaluation sets and often evaluate scripts by code similarity rather than execution correctness, and prior machine learning-based methods either ignore execution feedback or require labeled test layouts as the agent's input. To this end, we introduce Rule2DRC, a large-scale benchmark for DRC script coding agents with 1,000 rule-to-script tasks and 13,921 evaluation chip layouts for execution-based scoring. Rule2DRC provides an evaluation pipeline that measures functional correctness via DRC execution outcomes without requiring evaluation layouts as input to the agent. We also propose SplitTester, a tester agent for program selection that uses execution feedback to generate discriminative test cases and separate previously indistinguishable candidate scripts, substantially improving Best-of-N selection performance in this domain. We release the code at https://github.com/snu-mllab/Rule2DRC.

4
Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

How should an agent decide when and how to plan? A dominant approach builds agents as reactive policies with adaptive computation (e.g., chain-of-thought), trained end-to-end expecting planning to emerge implicitly. Without control over the presence, structure, or horizon of planning, these systems dramatically increase reasoning length, yielding inefficient token use without reliable accuracy gains. We argue efficient agentic reasoning benefits from decomposing decision-making into three systems: simulative reasoning (System II) grounding deliberation in future-state prediction via a world model; self-regulation (System III) deciding when and how deeply to plan via a learned configurator; and reactive execution (System I) handling fine-grained action. Simulative reasoning provides unified planning across diverse tasks without per-domain engineering, while self-regulation ensures the planner is invoked only when needed. To test this, we develop SR²AM (Self-Regulated Simulative Reasoning Agentic LLM), realizing both as distinct stages within an LLM's chain-of-thought, with the LLM as world model. We explore two instantiations: recording decisions from a prompted multi-module system (v0.1) and reconstructing structured plans from traces of pretrained reasoning LLMs (v1.0), trained via supervised then reinforcement learning (RL). Across math, science, tabular analysis, and web information seeking, v0.1-8B and v1.0-30B achieve Pass@1 competitive with 120-355B and 685B-1T parameter systems respectively, while v1.0-30B uses 25.8-95.3% fewer reasoning tokens than comparable agentic LLMs. RL increases average planning horizon by 22.8% while planning frequency grows only 2.0%, showing it learns to plan further ahead rather than more often. More broadly, learned self-regulation instantiates a principle we expect to extend beyond planning to how agents govern their own learning and adaptation.

4
Bernini: Latent Semantic Planning for Video Diffusion

Multimodal large language models (MLLMs) and diffusion models have each reached remarkable maturity: MLLMs excel at reasoning over heterogeneous multimodal inputs with strong semantic grounding, while diffusion models synthesize images and videos with photorealistic fidelity. We argue that these two families can be unified through a simple division of labor: MLLMs perform semantic planning, while diffusion models render pixels from high-level semantic guidance and low-level visual features. Building on this idea, we propose Bernini, a unified framework for video generation and editing. An MLLM-based planner predicts the target semantic representation directly in the ViT embedding space, and a DiT-based renderer synthesizes pixels conditioned on this plan, augmented by text features and, for editing, source VAE features for detail preservation. Because semantics serve as the interface, the planner and renderer can be trained separately and only lightly co-trained, preserving the pretrained strengths of both components while keeping training efficient. To better handle multiple visual inputs, we introduce Segment-Aware 3D Rotary Positional Embedding (SA-3D RoPE), and further incorporate chain-of-thought reasoning in the planner to better transfer understanding into generation. Bernini achieves state-of-the-art performance across a wide range of video generation and editing benchmarks, with the MLLM's pretrained understanding translating into strong generalization on challenging editing tasks.

4
Diversed Model Discovery via Structured Table Discovery

Model cards describe model behavior through a mixture of textual descriptions and structured artifacts, including performance, configuration, and dataset tables. Existing model search systems rely predominantly on semantic similarity over text, which can produce homogeneous result sets and limit exploration of alternatives. We argue that model search is inherently comparative: users want models that are task-aligned yet differentiated in measurable ways. We hypothesize that this balance requires retrieval over condensed, high-quality evidence rather than verbose descriptions, and much of that evidence is concentrated in structured tables. We present StructuredSemanticSearch, a table-driven model search framework built on the ModelTables benchmark. Given a query, StructuredSemanticSearch combines a semantic baseline for task alignment with a structure-aware pipeline that discovers query-related model-card tables using table discovery operators such as unionability, joinability, and keyword search. Retrieved tables are mapped back to model cards under a controlled top-k budget, enabling fair comparison between text-based and table-based retrieval. Beyond retrieval, StructuredSemanticSearch adapts table integration to the model-table domain through orientation-aware integration, producing compact integrated views of tables from partially overlapping and sometimes transposed evidence tables. For evaluation, we introduce a nugget-based, auditable protocol that extracts compact evidence items from model cards, matches queries to condition- or intent-specific nuggets, and measures evidence coverage and diversity over retrieved model-card candidate sets. This protocol also provides a scalable path toward approximate, evidence-based labeling in dynamic model lakes. Experiments on 597 model-recommendation queries show improved nugget coverage for the structure-aware pipeline compared with the semantic baseline.

4
"I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration

As large language models (LLMs) increasingly shape how users form, refine, and extend their goals, attributing contributions in human-AI collaboration becomes critical for users calibrating their own reliance and for evaluators assessing AI-assisted work. Yet existing methods focus on final artifacts, missing the process through which goals themselves are jointly shaped. We introduce a goal-level attribution framework, CoTrace, that decomposes explicit goals into verifiable requirements and traces both direct contributions and indirect influences across dialogue turns. Applying CoTrace to 638 real-world collaboration logs, we find that while models account for only 11-26% of goal-shaping contribution, they contribute substantially more on introducing lower-level concrete requirements, and make various kinds of indirect contributions. Through controlled simulations, we show that interaction design choices significantly affect model goal-shaping behavior. In a user study, exposing participants to goal-level analyses shifts their perceived contributions by nearly 2 points on a 5-point scale, revealing systematic miscalibration in how users understand their own AI-assisted work.

3
TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks

We introduce TerminalWorld, a scalable data engine that automatically reverse-engineers high-fidelity evaluation tasks from "in-the-wild" terminal recordings. Processing 80,870 terminal recordings, the engine yields a full benchmark of 1,530 validated tasks, spanning 18 real-world categories, ranging from short everyday operations to workflows exceeding 50 steps, and covering 1,280 unique commands. From these, we curate a Verified subset of 200 representative, manually reviewed tasks. Comprehensive benchmarking on TerminalWorld-Verified across eight frontier models and six agents reveals that current systems still struggle with authentic terminal workflows, achieving a maximum pass rate of only 62.5%. Moreover, TerminalWorld captures real-world terminal capabilities distinct from existing expert-curated benchmarks (e.g., Terminal-Bench), with only a weak correlation to their scores (Pearson r=0.20). The automated engine makes TerminalWorld authentic and scalable by construction, enabling it to evaluate agents in real-world terminal environments as developer practices evolve. Data and code are available at https://github.com/EuniAI/TerminalWorld.

3
Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators

Interactive streaming music generation promises the use of generative models for live performance and co-creation that is impossible with offline models. However, SOTA models exist in the discrete-AR regime, requiring industrial levels of compute for both training and inference. In this work, we investigate whether audio diffusion models, with their wide support in the open-source community but non-streaming bidirectional nature, can be repurposed efficiently into interactive models accessible on consumer hardware. By taking a critical look at the modern pipeline for block-wise outpainting diffusion, we identify critical inefficiencies during inference that result in strictly worse computational efficiency than their discrete-AR counterparts. We propose Live Music Diffusion Models (LMDMs), a simple modification of the generative diffusion process that recovers, and then outperforms, the inference complexity of the discrete Live Music Models (LMMs) through block-wise KV Caching. Unlike LMMs, LMDMs further enable stable post-training alignment through our novel ARC-Forcing paradigm, reducing error accumulation without any explicit RL or reward models. We demonstrate the application of LMDMs in a number of creative domains, including text-conditioned generation, sketch-based music synthesis, and jamming. We finally show how LMDMs can be used as a generative instrument in a real artist-AI collaboration, utilizing LMDMs as a "generative delay" to transform musicians' improvisation live for variable timbral effects while running locally on a consumer gaming laptop.

2
SAM 3D Animal: Promptable Animal 3D Reconstruction from Images in the Wild

3D animal reconstruction in the wild remains challenging due to large species variation, frequent occlusions, and the prevalence of multi-animal scenes, while existing methods predominantly focus on single-animal settings. We present SAM 3D Animal, the first promptable framework for multi-animal 3D reconstruction from a single image. Built on the SMAL+ parametric animal model, our method jointly reconstructs multiple instances and supports flexible prompts in the form of keypoints and masks which enable more reliable disambiguation in crowded and occluded scenes. To train such a model, we further introduce Herd3D, a multi-animal 3D dataset containing over 5K images, designed to increase diversity in species, interactions, and occlusion patterns. Experiments on the Animal3D, APTv2, and Animal Kingdom datasets show that our framework achieves state-of-the-art results over both existing model-based and model-free methods, demonstrating a scalable and effective solution for prompt-driven animal 3D reconstruction in the wild.

2
Platonic Representations in the Human Brain: Unsupervised Recovery of Universal Geometry

The Strong Platonic Representation Hypothesis suggests that representational convergence in artificial neural networks can be harnessed constructively: embeddings can be translated across models through a universal latent space without paired data. We ask whether an analogous geometry can be recovered across human brains. Using fMRI data from the Natural Scenes Dataset, we propose a self-supervised encoder that learns subject-specific embeddings from brain data alone by exploiting repeated stimulus presentations. We show that these independently learned spaces can be translated across subjects using unsupervised orthogonal rotations, without paired cross-subject samples or intermediate model representations. Synchronizing pairwise rotations into a single shared latent space further improves cross-subject retrieval, indicating that subject-specific spaces are mutually compatible with a common coordinate system. These results provide evidence for a shared neural geometry in the human visual cortex: subject-specific fMRI representations are approximately isometric across individuals and can be translated through purely geometric transformations.

2
AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild

As wearable and mobile devices become increasingly embedded in daily life, they offer a practical way to continuously sense human motion in the wild. But inertial signals are highly dependent on the sensing setup, including body location, mounting position, sensor orientation, device hardware, and sampling protocol. This setup dependence makes it difficult to learn motion representations that transfer across devices and datasets, and limits the broader use of wearable IMUs beyond closed-set recognition. We introduce AnyMo, a geometry-aware framework for setup-agnostic human motion modeling. AnyMo uses physics-grounded IMU simulation over dense body-surface placements to generate diverse and plausible synthetic signals, pre-trains a graph encoder from paired synthetic placement views and masked partial observations, tokenizes multi-position IMU into full-body motion tokens, and aligns these tokens with an LLM for motion-language understanding. We evaluate AnyMo on three complementary tasks: zero-shot activity recognition across 14 unseen downstream datasets, cross-modal retrieval, and wearable IMU motion captioning, where it improves average Accuracy/F1/R@2 by 11.7%/11.6%/22.6% on HAR, increases zero-shot IMU-to-text and text-to-IMU retrieval MRR by 15.9% and 28.6%, respectively, and improves zero-shot captioning BERT-F1 by 18.8%. These results support AnyMo as a generalist model for wearable motion understanding in the wild. Project page: https://baiyuchen.com/project/AnyMo.

2
Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search

We present Lean Refactor, a plug-and-play retrieval-augmented agentic framework for multi-objective, controllable, and version-robust refactoring of Lean proofs. LLM-generated proofs are notoriously correct-but-verbose and brittle across library versions, yet existing refactoring works overlook three practical challenges: 1) Lean refactoring is natively multi-objective (proof length, compilation cost, and version compatibility are often in tension); 2) Lean repositories have fragile compatibility, whereas LLM releases are unaware of Lean/Mathlib versions; 3) Training-based pipelines require repeated fine-tuning with each new LLM release, scaling neither with model churn nor with Lean's release cycle. Lean Refactor steers a frozen agentic LLM with retrievals from a curated database of multi-objective refactoring strategies, each densely annotated with metadata such as supported Lean/Mathlib versions and expected compilation-cost reduction. Experiments show over 70% token-level compression on competition benchmarks, over 20% on research repositories, and up to 60% compilation-time reduction, outperforming prior work and Claude Code. Version-filtered retrieval further improves compression on the target Lean version, and refactored miniF2F proofs exhibit stronger zero-shot version transfer to future Lean releases than their unrefactored counterparts.

2
DecQ: Detail-Condensing Queries for Enhanced Reconstruction and Generation in Representation Autoencoders

Representation Autoencoders (RAEs) leverage frozen vision foundation models (VFMs) as tokenizer encoders, providing robust high-level representations that facilitate fast convergence and high-quality generation in latent diffusion models. However, freezing the VFM inherently constrains its spatial reconstruction capacity, limiting fine-grained generation and image editing; in contrast, incorporating reconstruction-oriented signals via fine-tuning disrupts the pretrained semantic space and degrades generative fidelity. To address this trade-off, we propose DecQ, a simple yet effective framework for RAEs. Specifically, DecQ introduces lightweight detail-condensing queries that extract fine-grained information from intermediate VFM features through condenser modules. These queries are incorporated into the decoder to support reconstruction and are jointly generated with patch tokens during generative modeling. By aggregating information from both shallow and deep layers, DecQ effectively mitigates the reconstruction-generation trade-off, improving both reconstruction quality and generative performance. Our experiments demonstrate that: (1) with only 8 additional queries and 3.9% extra computation, DecQ improves reconstruction over the frozen DINOv2-based RAE, increasing PSNR from 19.13 dB to 22.76 dB; and (2) for generative modeling, DecQ achieves 3.3× faster convergence than RAE, attaining an FID of 1.41 without guidance and 1.05 with guidance.

2
OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding

Omni-proactive streaming video understanding, i.e., autonomously deciding when to speak and what to say from continuous audio-visual streams, is an emerging capability of omni-modal large language models. Existing benchmarks fall short in three key aspects: they rely primarily on visual signals, adopt polling or fixed-timestamp protocols instead of true proactive evaluation, and cover only a limited range of tasks, preventing reliable assessment and differentiation of omni-proactive streaming models. We present OmniPro, the first benchmark to jointly evaluate omni-modal perception, proactive responding, and diverse video understanding tasks. It comprises 2,700 human-verified samples spanning 9 sub-tasks and 3 cognitive levels, covering 6 basic video understanding capabilities. Notably, 84% of samples require audio signals (speech or non-speech), and each sample is annotated with modality-isolation labels to enable fine-grained multimodal analysis. We further introduce a dual-mode evaluation protocol: Probe mode assesses content understanding by querying the model before and after each ground-truth trigger, while Online mode evaluates full proactive ability by requiring models to autonomously decide when to respond in streaming input. Evaluating 11 representative models reveals three key findings: (1) audio provides consistent gains but with highly variable utilization across models, (2) performance degrades significantly over time, indicating limited long-horizon robustness, and (3) non-speech audio perception remains the weakest dimension.

2
More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

Detecting Schwartz values in political text is difficult because implicit cues often depend on surrounding arguments and fine-grained distinctions between neighboring values. We study when context and explicit moral knowledge help sentence-level value detection. Using the ValuesML/Touché ValueEval format, we compare sentence, window, and full-document inputs; no-RAG and retrieval-augmented settings with a curated moral knowledge base; supervised DeBERTa-v3-base/large encoders; and zero-shot LLMs from 12B to 123B parameters. The results show that more context is not uniformly better: full-document context improves supervised DeBERTa encoders by 3.8–4.8 macro-F1 points over sentence-only input, but does not consistently help zero-shot LLMs. Retrieved moral knowledge is more consistently useful in matched comparisons, improving each tested model family and context condition under early fusion. However, scaling from DeBERTa-v3-base to large and from 12B to larger LLMs does not guarantee gains, and simple early fusion outperforms the tested late-fusion and cross-attention RAG variants for encoders. Per-value analyses show that context and retrieval help most for socially situated or conceptually confusable values. These findings suggest that value-sensitive NLP should evaluate context, knowledge, and model family jointly rather than treating longer inputs or larger models as universal improvements.

2
Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws

Scaling laws have made language-model performance predictable from model size, data, and compute, but they typically treat the optimizer as a fixed training detail. We show that this assumption misses a fundamental axis of representation scaling: how effectively the optimizer converts added FFN width into utilized spectral capacity. Using eigenspectra of feed-forward network representations, measured through soft and hard spectral-ranks, we find that the same Transformer architecture realizes markedly different spectral scaling laws when trained with different optimizers. Holding architecture and width schedule fixed, AdamW exhibits weak hard-rank scaling (β=0.44) on rare-token (TAIL) representations where learning is known to be hardest, whereas Muon achieves linear scaling (β=1.02) in the same regimes, a 2.3× increase in the scaling exponent. This difference is not reducible to validation loss: AdamW configurations can match low-rank Dion variants in perplexity, under extended training, while exhibiting sharply different spectral geometry, demonstrating that matched loss does not imply matched representation structure. Hard-soft rank asymmetry further reveals that optimizers differ not only in how much capacity is realized, but also in how that capacity is structured across eigenmodes. To disentangle optimizer effects from architectural ones, we compare against architectural interventions (e.g., attention rank and positional encoding), and find that optimizer-induced spectral shifts often exceed the architectural effects. These results suggest optimization as a first-class axis of representation scaling, motivating optimizer-architecture co-design.
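
One way to read the reported exponents: if hard rank is assumed to follow a power law in FFN width, rank ≈ c · width^β, then β is the slope of a straight-line fit in log-log space. The sketch below fits that slope with ordinary least squares. The power-law form, the name `fit_exponent`, the widths, and the synthetic rank values are illustrative assumptions, not the paper's measurement procedure; only the two exponents (0.44 and 1.02) come from the abstract.

```python
import math

def fit_exponent(widths, ranks):
    """Least-squares slope of log(rank) against log(width).

    Assumes rank ~ c * width**beta, so the slope estimates beta.
    """
    xs = [math.log(w) for w in widths]
    ys = [math.log(r) for r in ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical FFN widths; the rank values are synthetic, generated from
# the exponents quoted in the abstract purely to exercise the fit.
widths = [256, 512, 1024, 2048]
ranks_adamw = [3.0 * w ** 0.44 for w in widths]
ranks_muon = [0.5 * w ** 1.02 for w in widths]

beta_adamw = fit_exponent(widths, ranks_adamw)
beta_muon = fit_exponent(widths, ranks_muon)
print(f"AdamW beta={beta_adamw:.2f}, Muon beta={beta_muon:.2f}, "
      f"ratio={beta_muon / beta_adamw:.1f}x")
```

On real measurements the points would not fall exactly on a line, so the fit residuals would also be worth inspecting.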

2
From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning

Reinforcement learning from verifiable rewards (RLVR) has shown strong promise for LLM reasoning, but outcome-based RLVR remains inefficient on hard problems because correct final-answer rollouts are rare and sample-level credit assignment cannot use partial progress in failed attempts. We introduce SCRL (Subproblem Curriculum Reinforcement Learning), a curriculum RL framework that derives verifiable subproblems from reference reasoning chains and fixes the final subproblem as the original problem. This turns partial progress on hard problems into verifiable learning signals. Algorithmically, SCRL uses subproblem-level normalization, which normalizes rewards independently at each subproblem position and assigns the resulting advantages to the corresponding answer spans, enabling finer-grained credit assignment without external rubrics or reward models. Our analysis shows that subproblem curricula lift hard problems out of gradient dead zones, with larger relative gains as the original problem becomes harder. Across seven mathematical reasoning benchmarks, SCRL outperforms strong curriculum-learning baselines, improving average accuracy over GRPO by +4.1 points on Qwen3-4B-Base and +1.9 points on Qwen3-14B-Base. On AIME24, AIME25, and IMO-Bench, SCRL further improves pass@1 by +3.7 points and pass@64 by +4.6 points on Qwen3-4B-Base, indicating better exploration on hard reasoning problems.
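
The subproblem-level normalization described above can be sketched as follows. This is a minimal illustration that assumes a GRPO-style mean/standard-deviation normalization within each subproblem position; the abstract does not give the exact formula, and the function name, reward layout, and example values are hypothetical.

```python
from statistics import mean, pstdev

def subproblem_advantages(rewards, eps=1e-6):
    """Normalize rewards independently at each subproblem position.

    rewards[i][k] is the verifiable reward of rollout i on subproblem k
    (hypothetical layout); the last position is the original problem.
    Returns a same-shaped table of advantages.
    """
    n_positions = len(rewards[0])
    advantages = [[0.0] * n_positions for _ in rewards]
    for k in range(n_positions):
        column = [row[k] for row in rewards]
        mu = mean(column)
        sigma = pstdev(column)
        for i, r in enumerate(column):
            # Assumed GRPO-style normalization within this position only.
            advantages[i][k] = (r - mu) / (sigma + eps)
    return advantages

# Four rollouts on a hard problem split into three subproblems.
# No rollout solves the final (original) problem.
rewards = [
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]
adv = subproblem_advantages(rewards)
for row in adv:
    print([round(a, 2) for a in row])
```

In this toy batch the final column is all zeros, so an outcome-only signal would vanish, while the earlier positions still separate better rollouts from worse ones. Each per-position advantage would then be assigned to the tokens of the corresponding answer span.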

2
Disentangling Sampling from Training Budget in Class-Imbalanced CT Body Composition Segmentation

Class imbalance is a fundamental challenge in medical image segmentation, where frequent classes typically dominate training at the expense of rare classes. Loss-based approaches mitigate imbalance by reweighting the per-pixel loss within the batch, while sampling strategies control which images enter the batch. Yet neither explicitly controls which classes appear within the batch, leaving rare-class exposure only partially rebalanced. In this work, we adopt episodic sampling from few-shot learning to promote class-balanced batch construction in a fully supervised setting. We decouple episodic sampling from its conventional metric-learning context and evaluate it in body composition segmentation in CT. We compare episodic sampling against random and weighted sampling on nine muscle and adipose tissues, derived from 210 scans of the public SAROS dataset. Training is performed under full- and low-data regimes, with additional comparisons under matched training iteration budgets. Under full-data training, all three strategies performed comparably (mean Dice 0.882 for episodic, 0.878 for random and weighted). Under low-data training, episodic sampling outperformed random and weighted (0.787 vs. 0.758 and 0.762), driven by a 12-fold difference in training iterations. Under matched training budgets, random and weighted overfit earlier, while episodic improved for approximately three times more iterations before plateauing. Our findings identify the training iteration budget as an under-recognized confound in sampling strategies, motivating iteration-aware evaluation protocols for small datasets. Furthermore, the residual advantage of episodic sampling is consistent with an implicit regularization effect of class-balanced batches, offering a low-cost, model-agnostic strategy for class-imbalanced medical image segmentation. Code is available at https://github.com/iasonsky/episodic-sampling.
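
As a rough illustration of class-balanced batch construction, the sketch below builds one episode by first choosing classes and then drawing a fixed number of images per class. It is not the authors' implementation (see their repository for that); the class names, scan identifiers, and the `images_by_class` index are hypothetical, and the N-way/K-shot terminology is borrowed from few-shot learning.

```python
import random

def episodic_batch(images_by_class, n_way, k_shot, rng):
    """Draw n_way classes, then k_shot images containing each class.

    images_by_class maps a class name to the images in which that class
    appears (hypothetical index built ahead of training).
    """
    classes = rng.sample(sorted(images_by_class), n_way)
    batch = []
    for cls in classes:
        pool = images_by_class[cls]
        if len(pool) >= k_shot:
            picks = rng.sample(pool, k_shot)
        else:
            # Rare class with too few images: sample with replacement.
            picks = [rng.choice(pool) for _ in range(k_shot)]
        batch.extend((cls, image) for image in picks)
    return batch

# Hypothetical index: a frequent, a moderately frequent, and a rare class.
images_by_class = {
    "muscle": [f"scan_{i:03d}" for i in range(100)],
    "visceral_fat": [f"scan_{i:03d}" for i in range(40)],
    "rare_tissue": ["scan_007", "scan_042"],
}
rng = random.Random(0)
batch = episodic_batch(images_by_class, n_way=3, k_shot=2, rng=rng)
for cls, image in batch:
    print(cls, image)
```

Every selected class contributes the same number of samples regardless of how often it occurs in the dataset, which is the property that random and weighted image-level sampling do not guarantee.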

1
FashionLens: Toward Versatile Fashion Image Retrieval via Task-Adaptive Learning

Fashion image retrieval is a cornerstone of modern e-commerce systems. A unified framework that supports diverse query formats and search intentions is highly desired in practice. However, existing approaches focus on narrow retrieval tasks and do not fully capture such diversity. Therefore, in this work, we aim to develop a unified framework capable of handling diverse realistic fashion retrieval scenarios, achieving truly versatile fashion image retrieval. To establish a data foundation, we first introduce U-FIRE, a comprehensive benchmark that consolidates fragmented fashion datasets into a unified collection, supplemented by two manually curated datasets for testing generalization. Building upon this, we propose FashionLens, a unified framework based on Multimodal Large Language Models. To handle divergent matching objectives, we design a Proposal-Guided Spherical Query Calibrator that dynamically shifts query representations into task-aligned metric spaces via adaptive spherical linear interpolation. Additionally, to mitigate the optimization imbalance caused by varying task complexities and data scales, we develop a Gradient-Guided Adaptive Sampling strategy that automatically re-weights tasks based on realtime learning difficulty and the data scale prior. Experiments on U-FIRE show that FashionLens achieves state-of-the-art performance across diverse retrieval scenarios and generalizes robustly to unseen tasks. The data and code are publicly released at https://github.com/haokunwen/FashionLens.
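
For readers unfamiliar with spherical linear interpolation (slerp), the sketch below shows the standard formula for moving a unit vector along the great circle toward another unit vector. How FashionLens chooses the interpolation weight adaptively is not described in the abstract, so the fixed `t`, the function name, and the example vectors are assumptions for illustration only.

```python
import math

def slerp(p, q, t):
    """Spherical linear interpolation between unit vectors p and q.

    t=0 returns p, t=1 returns q; intermediate values stay on the unit sphere.
    """
    dot = sum(a * b for a, b in zip(p, q))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    omega = math.acos(dot)          # angle between the two vectors
    s = math.sin(omega)
    if s < 1e-8:
        # Nearly identical (or antipodal, where slerp is ill-defined):
        # fall back to the starting vector.
        return list(p)
    w_p = math.sin((1.0 - t) * omega) / s
    w_q = math.sin(t * omega) / s
    return [w_p * a + w_q * b for a, b in zip(p, q)]

# Hypothetical 2-D example: a query embedding and a task-specific anchor.
query = [1.0, 0.0]
task_anchor = [0.0, 1.0]
shifted = slerp(query, task_anchor, 0.5)
norm = math.sqrt(sum(x * x for x in shifted))
print(shifted, norm)
```

Unlike plain linear interpolation, the result keeps unit norm, which matters when retrieval scores are cosine similarities.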

0
Minimalist Visual Inertial Odometry

Visual-Inertial Odometry (VIO), which is critical to mobile robot navigation, uses cameras with a large number of pixels. Capturing and processing camera images requires significant resources. This work presents a minimalist approach to planar odometry, demonstrating that just four visual measurements and an IMU can provide robust motion estimation for differential-drive robots. Our key insight is that four downward-facing photodiodes that sense the world through optical Gabor masks produce signals that encode speed. Based on this, we jointly optimize the mask parameters alongside a Temporal Convolutional Network (TCN) using a physically-grounded simulator. The resulting model decodes speed from just the four measurements produced by the photodiodes. Pairing these estimates with the angular speed from an IMU yields a continuous planar trajectory. We validate our approach with a prototype sensor mounted on a differential-drive robot. Across diverse indoor and outdoor terrains, our system closely tracks the reference ground truth without any real-world fine-tuning. Our work shows that minimalist sensing enables efficient and accurate planar odometry.
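
The final step, combining a forward speed estimate with an angular rate to obtain a planar trajectory, is standard dead reckoning. The sketch below uses simple Euler integration under assumed conditions (fixed time step, speed along the robot's heading); the function name and sample values are hypothetical, and the paper's actual integration scheme is not specified in the abstract.

```python
import math

def integrate_planar(speeds, yaw_rates, dt, x=0.0, y=0.0, heading=0.0):
    """Dead-reckon a planar pose from forward speed and yaw rate.

    speeds: forward speed per step (m/s), e.g. decoded from the photodiodes.
    yaw_rates: angular speed per step (rad/s), e.g. from the IMU gyroscope.
    Returns the list of (x, y, heading) poses, including the start pose.
    """
    trajectory = [(x, y, heading)]
    for v, w in zip(speeds, yaw_rates):
        heading += w * dt                  # integrate angular speed
        x += v * math.cos(heading) * dt    # advance along current heading
        y += v * math.sin(heading) * dt
        trajectory.append((x, y, heading))
    return trajectory

# Hypothetical input: 1 m/s straight ahead for one second at 10 Hz.
straight = integrate_planar([1.0] * 10, [0.0] * 10, dt=0.1)
final_x, final_y, final_heading = straight[-1]
print(final_x, final_y, final_heading)
```

Because errors in speed and yaw rate accumulate over time, dead reckoning of this kind drifts without an external correction, which is why accurate speed decoding matters.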

0
05

PRODUCT HUNT

Product Hunt - May 23, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

note.md

Local-first markdown based workspace for research writings

0
Spantop

Turn any Mac into a real second monitor

0
Kosshi

Simple, fast outliner for Mac and iPhone.

0
Memdex

Turn every AI conversation into reusable local memory

0
Command A+

Cohere’s open enterprise workhorse

0
Forsy

Capture and sell your AI agent workflow data

0
Vibedock

Toggle Claude Code MCP servers from your menu bar

0
Google Antigravity CLI

Run coding agents directly from your terminal

0
Bulkmark

Transform your Twitter/X Bookmarks into real knowledge

0
SignalLEMO - Ai Outreach Made Simple

AI-powered lead outreach for field service contractors

0
RetroMac

Turn your Mac into a time machine.

0
Area Contrast Checker

Drag, Select, Know. A new way to check A11y contrast

0
Finderlock

Lock Mac files in Finder with Touch ID & AES-256

0
Coca 2.0

Keep Your Mac and Apps Awake!

0
Shroomie

AI-powered news made fun and habit-forming

0
WordPress 7.0

Introducing AI tools, new admin experience & design controls

0
motionvid.ai

Your AI Motion & Video Editor

0
Reader Alive

Translate, listen to, and ask questions about your ebooks

0
Nota: AI Notes & Voice

Turn voice, scans, sketches, and text into notes with AI

0
whosthere

Local Area Network discovery tool with an interactive TUI

0
HelioPeak

Solar monitoring for pvoutput on every Apple device.

0
AGG Identify

A lightweight, secure streamlined OIDC and OAuth2 provider

0
Shuffle Design CLI

Multi-AI CLI for building and redesigning websites

0
Our Stories

A storytelling tool for raising bilingual kids

0
DCP

Give your AI agents encrypted permission and keys

0
Nugget AI

Turn customer interviews into your product roadmap

0
Prosed

Go from newsletters & podcasts to published manuscript

0
buildpipe

Compose, run and automate multi step AI developer workflows

0
Zero Assist

Real-time AI cheating detection for technical interviews

0
DecisionBox for Databricks

Connect DecisionBox to your Databricks to validate findings

0
SuprSend AI

AI-first platform for multi-channel notifications

0
JAMtime.ai

Just tell your guitar pedal how to sound

0
Auto Posts

Schedule social post, Telegram messages + more

0
Faby

Your virtual coworker with its own computer living in Slack

0
iPromise

Bring "Body Doubling" to your Mac notch

0
moop

A social network without media

0
Smart Miles

Automatic trip tracking for tax-ready exports

0
Cleo

The AI PM that runs your team

0
TestSprite 3.0

Let a fleet of parallel agents test your app in minutes

0
General Compute

AI models that run on an inference cloud optimized for speed

0
Tycoon AI

Run one-person companies entirely with AI agents

0
WeWeb 3.0

Vibe-code apps with the safety net of a no-code editor

0
Basedash Skills

Reusable AI instructions for every Basedash surface.

0
Framed

Turn screenshots, videos, and code into polished visuals

0
WarmIntro

Free tool to find your warmest path into any company

0
Visual Usability Checker

Validate your design decisions instantly with AI insights

0
Mixpanel Headless

Programmatic access to product analytics for agents and devs

0
Google Antigravity 2.0

Orchestrate multi-agent workflows from a desktop app

0
Mintlify Workflows

Self-updating knowledge bases

0
AlliHat

Claude AI in your Safari sidebar

0
06

TECHMEME

Techmeme - May 23, 2026

Techmeme Digest: Major tech headlines and industry conversations.

A profile of TP-Link, whose share of the US consumer router market grew from 10% to 60%+ between 2019 and 2025, as it seeks to rebut national security concerns (Noah Berman/The Wire China)
Source: Techmeme · Published: May 23, 2026

Noah Berman / The Wire China : A profile of TP-Link, whose share of the US consumer router market grew from 10% to 60%+ between 2019 and 2025, as it seeks to rebut national security concerns —  When Jeffrey Chao was studying for his master's in computer science in the early 1990s, he would sometimes spend so many hours …

As the US House probes Airbnb's use of Chinese AI models, CEO Brian Chesky says the company is not sharing data with Chinese firms and uses open-source models (Natalie Lung/Bloomberg)
Source: Techmeme · Published: May 23, 2026

Natalie Lung / Bloomberg : As the US House probes Airbnb's use of Chinese AI models, CEO Brian Chesky says the company is not sharing data with Chinese firms and uses open-source models —  Airbnb Inc. Chief Executive Officer Brian Chesky defended his company's use of Chinese artificial intelligence models …

Q&A with Sundar Pichai on the future of Google Search, Google's place in the AI race, public skepticism toward AI, AI agents, AI safety, TPUs, and more (New York Times)
Source: Techmeme · Published: May 23, 2026

New York Times : Q&A with Sundar Pichai on the future of Google Search, Google's place in the AI race, public skepticism toward AI, AI agents, AI safety, TPUs, and more —  After a busy Google I/O, the company's chief executive sits down with the hosts of “Hard Fork” to discuss the future of Google Search …

FOIA lawsuit documents show hackers who breached SolarWinds potentially had access to all "treasury.gov" email addresses from July 6, 2020 to October 12, 2020 (Jordan Robertson/Bloomberg)
Source: Techmeme · Published: May 23, 2026

Jordan Robertson / Bloomberg : FOIA lawsuit documents show hackers who breached SolarWinds potentially had access to all “treasury.gov” email addresses from July 6, 2020 to October 12, 2020 —  New details about the 2020 incident.  —  Six years after hackers allegedly backed by Russia's intelligence services broke …

Cloudflare CEO Matthew Prince says AI won't replace builders or sellers, but it will affect middle managers, operations jobs, and other "measuring" positions (Matthew Prince/Wall Street Journal)
Source: Techmeme · Published: May 23, 2026

Matthew Prince / Wall Street Journal : Cloudflare CEO Matthew Prince says AI won't replace builders or sellers, but it will affect middle managers, operations jobs, and other “measuring” positions —  The company has less need for middle managers, operations jobs and other ‘measuring’ positions.

Samsung's bonus deal is fueling employee resentment over a 100x payout gap between memory division staff and those making smartphones, TVs, and home appliances (Yoolim Lee/Bloomberg)
Source: Techmeme · Published: May 23, 2026

Yoolim Lee / Bloomberg : Samsung's bonus deal is fueling employee resentment over a 100x payout gap between memory division staff and those making smartphones, TVs, and home appliances —  Samsung Electronics Co. staved off a potentially catastrophic strike this week, reaching a tentative deal with leaders …

Fresha, a London-based beauty and wellness booking marketplace, raised $80M from KKR's growth equity arm at a $1B+ valuation, bringing its total raised to $285M (Dominic-Madori Davis/TechCrunch)
Source: Techmeme · Published: May 23, 2026

Dominic-Madori Davis / TechCrunch : Fresha, a London-based beauty and wellness booking marketplace, raised $80M from KKR's growth equity arm at a $1B+ valuation, bringing its total raised to $285M —  Beauty and wellness booking marketplace Fresha has announced an $80 million investment from KKR's Next Generation Technology Growth fund …

Filing: Zoom's stake in Anthropic is worth ~$1.27B based on a February round which valued Anthropic at $380B; Zoom invested an additional $46M in recent months (Brody Ford/Bloomberg)
Source: Techmeme · Published: May 23, 2026

Brody Ford / Bloomberg : Filing: Zoom's stake in Anthropic is worth ~$1.27B based on a February round which valued Anthropic at $380B; Zoom invested an additional $46M in recent months —  Zoom Communications Inc., the videoconferencing company, has netted about $1 billion on an investment it made in artificial intelligence startup Anthropic PBC in early 2023.

The US NTSB suspends access to its database of civil transportation accidents after people re-created voices of pilots killed in a 2025 UPS plane crash using AI (Jeremy Hsu/Ars Technica)
Source: Techmeme · Published: May 23, 2026

Jeremy Hsu / Ars Technica : The US NTSB suspends access to its database of civil transportation accidents after people re-created voices of pilots killed in a 2025 UPS plane crash using AI —  Pilots' voices from the last seconds of a fatal cargo plane crash have been re-created by Internet sleuths using software and AI tools.

Anthropic says Claude Mythos Preview has been used to find more than 10,000 high- or critical-severity vulnerabilities since the launch of Project Glasswing (Anthropic)
Source: Techmeme · Published: May 23, 2026

Anthropic : Anthropic says Claude Mythos Preview has been used to find more than 10,000 high- or critical-severity vulnerabilities since the launch of Project Glasswing —  Last month, we launched Project Glasswing, our collaborative effort to secure the world's most critical software before increasingly capable AI models can be turned against it.

The Trump administration plans to require most green card applicants to return to their home countries to apply, which could adversely impact tech workers (Peter Wells/Financial Times)
Source: Techmeme · Published: May 22, 2026

Peter Wells / Financial Times : The Trump administration plans to require most green card applicants to return to their home countries to apply, which could adversely impact tech workers —  Move to tighten permanent residency requirements could have significant implications for businesses

Google files its appeal of the US federal ruling deeming it an illegal search monopolist, arguing it "prevailed in the marketplace fair and square" (Lauren Feiner/The Verge)
Source: Techmeme · Published: May 22, 2026

Lauren Feiner / The Verge : Google files its appeal of the US federal ruling deeming it an illegal search monopolist, arguing it “prevailed in the marketplace fair and square” —  It wants to throw out the original decision, as well as an order to share data with rivals.

China's Wingtech files a lawsuit against Nexperia, saying its control over the Dutch chipmaker remained restricted, and seeks ~$1.2B for economic losses (Reuters)
Source: Techmeme · Published: May 22, 2026

Reuters : China's Wingtech files a lawsuit against Nexperia, saying its control over the Dutch chipmaker remained restricted, and seeks ~$1.2B for economic losses —  China's Wingtech Technology (600745.SS) has filed a lawsuit with a subsidiary against Nexperia B.V. and five other entities …

The US Ambassador to Canada accuses Canada of imposing trade barriers by requiring that US streamers contribute 15% of Canadian revenue to local programming (Paul Vieira/Wall Street Journal)
Source: Techmeme · Published: May 22, 2026

Paul Vieira / Wall Street Journal : The US Ambassador to Canada accuses Canada of imposing trade barriers by requiring that US streamers contribute 15% of Canadian revenue to local programming —  Remarks from U.S. Ambassador Pete Hoekstra add to harshly critical statements from lobby groups representing Hollywood and tech companies

President Emmanuel Macron says France will invest an additional €1B for quantum computing research and a further €550M in state funding for semiconductors (France 24)
Source: Techmeme · Published: May 22, 2026

France 24 : President Emmanuel Macron says France will invest an additional €1B for quantum computing research and a further €550M in state funding for semiconductors —  Bruyères-le-Châtel (France) (AFP) - France will pump one billion euros of new funding into quantum computing …

07

STARTUP ARCHIVE

Startup News - May 23, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction.”

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

“Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot.”

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice.”

Paul Graham on why starting with a “small, intense fire” is the key to startup growth
Source: Startup · Published: Mar 20, 2026

"You have to know who those first users are and how you're going to get them."

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth.”

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?”

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations”

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over.”

Spencer Rascoff: “I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

“If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive.”

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right”

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT

08.00
SOLIDOT

Solidot News - May 23, 2026

Solidot Feed: Highlighting essential tech & open-source news.

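Zuckerberg defends monitoring employees

The labor advocacy group More Perfect Union has released a six-minute recording of Mark Zuckerberg answering employee questions about device monitoring late last month. Meta notified employees last month that it would use a monitoring tool called Model Capability Initiative (MCI) to track their mouse clicks and keystrokes, with the aim of collecting data to train AI models. In his answer, Zuckerberg defended the monitoring, saying that if you want to train a model's coding ability, having internal employees build tools or solve tasks to teach the model how to write code lets the model make a leap in that ability. He said competitors cannot match that pace because their companies do not have thousands of top engineers. “That's just one example. Another thing our systems need to be very good at is ‘operating a computer.’ And the most effective way to get a system to operate a computer proficiently is to have it watch how extremely smart people operate computers. That is basically the core of what we are doing right now.” Zuckerberg said the company would not surveil employees' work behavior and that MCI data would not be used for performance evaluations. Because of the EU's GDPR, Meta's employees in Europe reportedly do not have to take part in the program. Meta is not the only tech company obtaining AI training data from its employees; Microsoft and xAI are also using internal staff to generate and refine training datasets.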

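Valorant's anti-cheat tool shuts out DMA cheats

Non-players may not know that today's high-end cheating tools have moved into hardware and are expensive, potentially costing far more than an entire PC. These tools are known as DMA hardware cards or DMA cheats, and they use hardware to bypass traditional game anti-cheat systems. Game developers are working to counter DMA cheats, and the latest example is Riot Games. Vanguard, the kernel-level anti-cheat system used by its online FPS Valorant, can now force IOMMU on after its latest update to block DMA cheats, which stops the DMA hardware from working. Vanguard can now block most DMA card firmware that disguises itself as a SATA or NVMe device. It abruptly triggers an IOMMU restart warning in-game, after which the DMA firmware becomes completely unusable, even when the game is no longer running or has been uninstalled. The only fix is to reinstall Windows. Riot Games mocked cheaters on social media, saying their $6,000 DMA cheats had turned into junk.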

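Wozniak tells graduates they have actual intelligence

Apple co-founder Steve Wozniak did what other commencement speakers have not managed: when he talked about AI, he drew cheers from graduates rather than boos. “You have AI: actual intelligence,” Wozniak said. He continued, “Going deep into my views on AI would take a long time, but we have always been trying to create a brain. Can we replicate a program a trillion times so that it works like a brain? AI is one such attempt.” Wozniak looked back on his time at Apple and offered some advice to graduates about to start their careers: “You should try to think differently. Don't stick to convention and follow the same path as everyone else. Ask yourself: can I do something different?”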

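Linus Torvalds on AI

Linux creator Linus Torvalds discussed AI at the Open Source Summit North America. He believes AI tools are reshaping kernel development, but he insists that AI is just a good tool and will not fully replace programmers. Torvalds said the number of commits in the last two kernel releases rose by 20%. At first he thought developers were excited about the kernel version number jumping from 6.x to 7.x, but it turned out to be because AI-assisted coding tools had improved markedly over the past six months. He acknowledged that AI tools lower the barrier for contributors, but said their real impact is social rather than technical; one example is the flood of duplicate bug reports on the security mailing list. The kernel has adopted new rules in response. Torvalds also urged security researchers not to disclose exploits early: four privilege-escalation vulnerabilities were recently found in the kernel, but researchers went public before maintainers had been notified, and he said these people like the attention. He does not think closed source solves the security problem; closed source is actually worse, because AI cannot help you fix the bugs. Torvalds said maintenance depends on people, not code. As the top-level maintainer, his job is not writing code but working with people; he does not use AI to work with people and advises others not to either. His own career illustrates how better tools raise programmer productivity: he started out entering machine code by hand, then used an assembler, then a compiler, and now AI-assisted coding. He thinks AI is changing programming without changing its essence. Developers still need to understand what the tool generated. For any long-running system, “you need to understand not only the instructions but also the end result, because that is the only way you can maintain it over the long term.” AI cannot replace human judgment, community norms, or a deep understanding of the system being built. “Software is very complex, and the only truly effective way to manage the complexity of complex infrastructure is open source,” he said, and AI is just one more tool in the programmer's toolbox.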

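GitHub faces a fight for survival

Eight years after being acquired by Microsoft, GitHub, the largest code hosting platform, is facing a fight for survival, with frequent outages and security problems and mounting pressure from competitors. Over the past few weeks GitHub suffered several serious outages, and 3,800 internal repositories were stolen after a malicious extension was installed in an employee's VS Code. In interviews, current and former GitHub employees described a company struggling with a leadership vacuum and competitive pressure. After CEO Thomas Dohmke left in the summer of 2025, Microsoft did not appoint a new CEO and instead had members of the leadership team report to CoreAI. CoreAI is run by Jay Parikh, a former Meta engineering chief whom CEO Satya Nadella personally recruited to help the company's shift to AI. He is not popular inside the company, and it was he who decided not to appoint a new GitHub CEO. Many GitHub employees followed Dohmke to his new startup, Entire. GitHub has also kept losing executives over the past few months: senior vice president Jared Palmer and former chief revenue officer Elizabeth Pemmerl have both left. Current GitHub employees say the company exists in name only and that everything now belongs to Microsoft.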

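Sergey Brin donates $500,000 to oppose a tax on overpaid CEOs

Google co-founder Sergey Brin, who has moved from Silicon Valley to Nevada, donated $500,000 to a San Francisco political action committee to oppose a measure known as the “Overpaid CEO Tax,” which San Francisco voters will decide on June 2. He had previously donated tens of millions of dollars to oppose a proposed California tax on billionaires, which is expected to go before California voters this November. The “Overpaid CEO Tax” would calculate the ratio of executive pay to ordinary employee pay based on the compensation of a company's global workforce. The Chinese Progressive Association, which supports the measure, says it is necessary to “ensure the wealthiest corporations pay the taxes they owe.”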

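Meta censors dissidents' accounts at Saudi Arabia's request

Since April 30, 2026, Meta has, at the Saudi government's request, blocked within Saudi Arabia the Facebook accounts of the NGOs ALQST for Human Rights and Democratic Diwan, as well as those of Saudi researcher Abdullah Alaoudh and human rights activist Yahya Assiri. Meta also geo-blocked a scholar's account at the request of the United Arab Emirates. Since March 2026, more than 100 Facebook pages and Instagram accounts have been restricted. Saudi Arabia has also asked X to geo-block the accounts of prominent Saudi activists; X has not complied so far.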

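Brains outside the body are being used for drug testing

A day earlier, this brain was inside a living person. Now, hours after its owner's death, it rests quietly on a small cart. The cart is covered in tubing that pumps liters of blood substitute and other fluids into the organ, delivering oxygen and removing metabolic waste. Most of its core functions are intact, but its electrical activity has been suppressed with anesthetics, leaving the brain in a limbo between life and death. As it metabolizes experimental drugs, sensors record its responses in real time, capturing hundreds of data points on cells, proteins, and physiology. After 24 hours it will be cut into hundreds of pieces for deeper study. It is one of more than 700 brains that the biotech startup Bexorg has sustained and studied using its brain-maintenance device BrainEx, in order to better understand how potential therapies act in brains affected by neurodegenerative diseases such as Parkinson's, Alzheimer's, or amyotrophic lateral sclerosis. Bexorg can biopsy the brain to learn how long a drug stays in cells, whether it hits its molecular target, and whether there are any side effects. Bexorg believes its system offers drug-testing conditions closer to reality than lab animals or cells in a dish. Bexorg had kept a low profile but has recently been scaling up, and it invited reporters to tour its lab in an effort to reassure the public that brains outside the body do not cross ethical lines and carry no risk of regaining consciousness.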

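Waymo pauses Atlanta service after a driverless car drives into floodwater

Because its driverless cars cannot yet handle flooded roads, Waymo has suspended its robotaxi service in Atlanta. On Wednesday a Waymo driverless taxi drove onto a flooded road and was stuck for about an hour. The car has been towed away. Waymo said it is pausing service in Atlanta while it looks for a solution. Waymo had earlier suspended service in San Antonio, Dallas, and Houston, Texas, because of severe weather. Waymo said the rainfall in Atlanta was so heavy that flooding occurred before the National Weather Service had issued a flash flood watch, warning, or advisory.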

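Phone cases may accumulate drug-resistant bacteria and PFAS

People today are almost never apart from their phones, and the skin of the hands and face is in frequent, prolonged contact with phones and phone cases. You may have noticed that a phone case used for more than half a year at some point quietly turns yellow and sticky, and no amount of wiping restores its original clear shine. According to a study published in the Journal of Hazardous Materials, scientists confirmed that poor hygiene habits and frequent makeup use accelerate the aging of thermoplastic polyurethane (TPU) phone cases, gradually turning them into a “hotbed” where per- and polyfluoroalkyl substances (PFAS) and opportunistic pathogens accumulate together. A real-world tracking report from the user research firm Dscout found that smartphone users touch their phones 2,617 times a day on average, and heavy users more than 5,400 times. The research team recruited 30 university student volunteers for a 285-day controlled cohort study in real-world conditions. The team observed two typical groups: volunteers with good hygiene habits who rarely used cosmetics, and the opposite group, who used cosmetics frequently and had poor hand hygiene. The results showed that PFAS accumulation on the phone case surface was significantly higher in the second group than in the first. In some of the more heavily contaminated samples, surface accumulation of perfluorooctanoic acid (PFOA) reached as much as 9.39 micrograms per square centimeter and perfluorooctane sulfonate (PFOS) as much as 0.164 micrograms per square centimeter, suggesting that everyday contact may be quietly increasing human exposure to emerging contaminants and potentially pathogenic microbes.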

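Europe's megalithic societies were genetically related

In the Late Neolithic (roughly 4500 to 2800 BCE), megalithic monuments, meaning large stone structures, appeared across Europe. These structures reflect local traditions while also hinting at far-reaching social, cultural, or ancestral ties between distant populations. According to a study published in the journal Science, researchers analyzed genomic data from individuals at several widely separated megalithic sites in Central Europe and found deep, lasting biological connections among them, indicating occasional population movement, intermarriage, or cultural exchange across large geographic areas. However, the Central European megalithic groups lacked close genetic ties with the megalithic populations of what are now Britain and Northern Europe. This suggests the megalithic tradition most likely spread through culture rather than through biological networks.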

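Trump administration does not want Americans infected with Ebola brought home for treatment

Ebola has broken out again in Congo, and those confirmed infected or exposed include American doctors, but last week the Trump administration refused to let them return home for treatment. Peter Stafford, a 39-year-old surgeon, was diagnosed on Sunday. On Wednesday, Satish Pillai, the U.S. CDC's incident manager for the Ebola response, said Stafford had been sent to Germany and was in stable condition. His wife, Rebekah Stafford, is also a doctor and was exposed to the virus but has no symptoms so far; the couple and their four children have all been sent to Germany. Another doctor, Patrick LaRochelle, belongs to the same Serge mission as the Staffords. He was exposed to the virus and is currently asymptomatic, and has been sent to Prague for monitoring and treatment. His wife and children had been with him in Congo, but the CDC determined they had not been exposed, so they have returned to the United States. According to the latest figures the WHO published on Wednesday, there are currently 528 suspected Ebola cases and 132 deaths.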

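Russian segment of the International Space Station leaks again

NASA confirmed that the Russian segment of the International Space Station is leaking air again. For the past five years Roscosmos and NASA have been tracking an air leak in the Russian segment. The leak is in the PrK module between the Progress airlock and the Zvezda service module, and is caused by tiny structural cracks. In January this year NASA announced that after repeated inspections and sealing work the internal pressure of the PrK had stabilized and the leak had stopped. However, the PrK leak reappeared three weeks ago. NASA said it is coordinating next steps with Roscosmos. The incident has again raised concerns about the long-term viability of the ISS.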

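Amazon spent $26.6 million on union-busting consultants last year

According to a report from the Economic Policy Institute (EPI), U.S. employers spend more than $1.5 billion a year on anti-union activity. Employers hire consultants and law firms that specialize in union avoidance to provide legal advice, representation, and litigation services during union elections and campaigns. U.S. companies spend as much as $442 million a year on anti-union consulting services; according to filings Amazon submitted to the Department of Labor, it spent $26.6 million on anti-union consultants in 2025. Union coverage in the U.S. currently stands at just 10%, compared with 20.3% in 1983, even though Gallup polling shows nearly 70% of Americans support unions. Because of delay tactics and appeals, U.S. workers need an average of 465 days to reach a first union contract, and in many cases even longer: at Starbucks, workers still have no first contract since the first U.S. store won its union election in 2021.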

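Google announces more ads in AI Mode

Google announced on Tuesday that the search box will become an AI chatbot's conversation box, so its time-tested business model, search ads, will naturally follow into AI Mode. On Wednesday Google announced it would introduce more “helpful ads” in AI Mode. The search giant said it is testing two new ad types that provide details on relevant products and useful guidance. As part of the ad, each will include an independent AI explainer, and each will be labeled “Sponsored.” The first of the two is called “conversational discovery ads,” where the ad is the answer; the second is called “Highlighted Answers,” which presents highly relevant ads to users as part of a list of recommendations.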

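NASA expects China to fly a crewed mission around the Moon in 2027

NASA Administrator Jared Isaacman said he expects China to fly a crewed mission around the Moon in 2027, and he is citing this as grounds to revise the Artemis program and speed up America's return to the Moon. Isaacman said that the next time the world watches astronauts fly around the Moon, most likely sometime in 2027, they will be Chinese astronauts, and the United States will no longer be the only country able to send humans into the lunar environment. China has not published a timetable for crewed lunar flights. To date, every crewed lunar flyby, orbital, or landing mission has been carried out by NASA: the nine Apollo missions between 1968 and 1972, and the Artemis II mission this April.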

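Vivaldi 8.0 released

The Chromium-based browser Vivaldi has released version 8.0. Vivaldi was founded by Opera co-founder Jon von Tetzchner. New features in Vivaldi 8.0 include a new look called Unified, in which all elements sit on a single visual plane, and six preset layouts. Two of the presets use vertical tabs (on the left or on the right); the other four are classic, simple, auto-hide, and bottom.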

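SpaceX's largest revenue source is a data center deal with Anthropic

SpaceX filed its prospectus with the U.S. Securities and Exchange Commission (SEC) on Wednesday evening, disclosing its finances for the first time. According to the prospectus, after merging with Elon Musk's xAI and X/Twitter, SpaceX's largest revenue source is a three-year data center deal reached with Anthropic this May, under which Anthropic rents compute at the Colossus 1 campus for $1.25 billion a month. The deal is not guaranteed, however: either party can terminate it with 90 days' notice. Other figures include 2025 revenue of $18.7 billion, an operating loss of $2.6 billion, and a net loss of $4.9 billion. Within that, the Starlink / Connectivity satellite broadband business had revenue of $11.4 billion and operating income of $4.4 billion; the space launch business had revenue of $4.1 billion and an operating loss of $657 million; and the AI and social media business had revenue of $3.2 billion and an operating loss of $6.4 billion. The prospectus mentions AI hundreds of times. Musk holds 12.3% of Class A shares and 93.6% of Class B shares. Class B shares carry ten times the votes of Class A shares, and Musk controls 85.1% of the company's voting power in total. If he sells any Class B shares, they automatically convert to Class A shares.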

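Google's AI search is easy to manipulate

Google's AI search is very easy to manipulate. Search used to give you ten links on the first page and leave the judgment to the user; AI search now gives you a single answer, and that answer may rest on a single source. A BBC technology reporter demonstrated the manipulation with an article about hot dogs on a personal website. Experts say this kind of manipulation is happening systematically and at scale. Manipulating AI search into giving users biased or inaccurate information could have serious consequences, and this is not a trivial problem: worldwide, more than 1 billion people use AI chatbots daily, and 2.5 billion people view Google's AI Overviews each month. Whoever can control such tools gains enormous power. Google and other companies have noticed the problem. Google updated its policies last week to treat attempts to manipulate AI responses as a violation of company rules, and it threatens to remove companies or websites suspected of manipulation from search results or lower their rankings.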

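RTX 5090DV2 graphics card added to the ban list

Last Friday, Chinese customs added the RTX 5090DV2 graphics card, which Nvidia launched last August to comply with U.S. export controls, to its ban list. The list originally included the H200 and H20. The H20 is another China-specific chip that Nvidia previously sold in the Chinese market. On major e-commerce platforms such as JD.com and Taobao, the RTX 5090DV2 is still on sale at prices between 18,000 and 22,000 yuan, which means existing stock can still be sold normally, but with imports gone, supply will steadily shrink.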

09

APP STORE RANK

09.00
APP STORE RANK