TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0877
TUE, MAY 26, 2026
Discover the best information organized by OrangeBot.AI

The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK

AI News Summary

May 26, 2026

Here is a summary of today's main news events.

U.S. Strikes Iran Amidst Tense Negotiations

The United States military conducted "defensive" strikes against targets in southern Iran, causing a temporary spike in oil prices. This military action comes as U.S. and Iranian officials are reportedly engaged in negotiations to de-escalate tensions, possibly concerning the Strait of Hormuz. The conflicting signals have created volatility and uncertainty in global markets.

U.S. Stocks Rise While Global Inflation Concerns Mount

U.S. stock markets moved higher, led by a rally in technology and artificial intelligence-related stocks. However, economic anxieties are growing globally. Officials from the European Central Bank warned that rising energy costs are contributing to broader inflation, and data shows that U.S. inflation is also trending upward, increasing concerns about higher interest rates.

AI Expansion Accelerates, Prompting Both Investment and Concern

The artificial intelligence sector continues to expand rapidly across industries. A major French AI firm announced a partnership to bring its technology to the legal sector, while companies are competing fiercely for AI talent. This rapid integration is also drawing scrutiny and backlash, with prominent figures, including the head of the Catholic Church, voicing moral and societal concerns about the technology's impact.

Europe Faces Economic Pressures and Setback in Ukraine Aid

A Czech-led initiative to supply millions of artillery shells to Ukraine has reportedly been halved since December, highlighting challenges in providing sustained military support. On the economic front, British households now owe a record amount to energy suppliers, reflecting significant cost-of-living pressures facing consumers across the continent.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS


Hacker News - May 26, 2026

Hacker News Feed: Highlighting key posts and discussions.

GitHub Actions down again today

(www.githubstatus.com)

339181
Ferrari Luce

(www.ferrari.com)

351645
Leave Me Behind

(androidessence.com)

341303
Magnifica Humanitas

(www.vatican.va)

1507844
The Eternal Sloptember

(geohot.github.io)

467361
03

HUGGINGFACE


HuggingFace - May 26, 2026

DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning

Reinforcement Learning has become a standard paradigm for aligning Large Language Models with human intent and task requirements. While Group Relative Policy Optimization offers an efficient, value-model-free alternative to Proximal Policy Optimization, adapting it to real-world multi-reward settings remains challenging. Standard scalarization practices, such as Reward Combination and Advantage Combination, suffer from significant drawbacks: Reward Combination frequently generates advantages with excessively large squared magnitudes that lead to training instability, while Advantage Combination relies on static hyperparameters and ignores cross-objective correlations. To address these limitations, we propose Dynamic Variance-adaptive Advantage Optimization (DVAO), which dynamically adjusts combination weights based on the empirical reward variance of each objective within a rollout group, effectively up-weighting objectives with a stronger learning signal while suppressing noisy ones. We mathematically prove that DVAO maintains bounded advantage magnitudes for stable training and introduces a self-adaptive cross-objective regularization mechanism. Extensive experiments on mathematical reasoning and tool-use benchmarks using Qwen3 and Qwen2.5 models demonstrate that DVAO significantly outperforms baseline methods, achieving a superior multi-objective Pareto frontier and robust training stability.

115
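
The DVAO abstract describes weighting each objective by its empirical reward variance within a rollout group. The sketch below illustrates that general idea only. The weighting rule (weights proportional to the within-group standard deviation) and the function names are assumptions for illustration, not the paper's formula.

```python
from statistics import mean, pstdev


def group_normalize(rewards, eps=1e-8):
    """GRPO-style advantage: center and scale rewards within one rollout group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def variance_weighted_advantages(rewards_by_objective, eps=1e-8):
    """Combine per-objective advantages with variance-dependent weights.

    ASSUMPTION: weights are proportional to each objective's within-group
    standard deviation. The paper's actual rule may differ.
    """
    stds = [pstdev(r) for r in rewards_by_objective]
    total = sum(stds)
    n_obj = len(rewards_by_objective)
    # Fall back to uniform weights when no objective varies within the group.
    weights = [s / total for s in stds] if total > eps else [1.0 / n_obj] * n_obj
    per_objective = [group_normalize(r, eps) for r in rewards_by_objective]
    n_rollouts = len(rewards_by_objective[0])
    combined = [
        sum(w * adv[i] for w, adv in zip(weights, per_objective))
        for i in range(n_rollouts)
    ]
    return weights, combined


# Four rollouts, two objectives: correctness varies, formatting is constant.
correctness = [1.0, 0.0, 1.0, 0.0]
formatting = [1.0, 1.0, 1.0, 1.0]
weights, advantages = variance_weighted_advantages([correctness, formatting])
```

In this toy group the constant formatting reward carries no learning signal, so all of the weight goes to correctness. Because the weights are non-negative and sum to one, the combined advantage is never larger in magnitude than the largest per-objective advantage.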
Macaron-A2UI: A Model for Generative UI in Personal Agents

As personal agents evolve to handle complex, user-centric tasks, static plain-text chat is rapidly becoming a bottleneck. Generative UI emerges as the necessary new interface layer, dynamically synthesizing the right controls, options, and state from the interaction context in real time. We present Macaron-A2UI, a model for Generative UI in personal agents. Our goal is to move beyond text-only interaction by enabling agents to generate natural language together with lightweight, executable UI actions for information collection, preference refinement, confirmation, and multi-goal organization. We build a large-scale Generative UI corpus from heterogeneous dialogue sources, introduce A2UI-Bench for controlled evaluation, and train 30B, 235B and 754B models with parameter-efficient LoRA-based supervised fine-tuning followed by reward-driven reinforcement learning. The best Macaron-A2UI model reaches 75.6 overall on A2UI-Bench without explicit schema hints, surpassing the strongest full-schema frontier baseline. We release the models, benchmark, and evaluation protocol to support future work on Generative UI for personal agents.

59
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

Interactive world models are advancing rapidly, yet existing benchmarks cover only part of the required competencies, leaving no unified standard for systematic evaluation. To fill this gap, we introduce WBench, a comprehensive multi-turn benchmark for interactive world model evaluation along five dimensions, namely video quality, setting adherence, interaction adherence, consistency, and physics compliance. WBench contains 289 test cases and 1,058 interaction turns, where each case specifies a world setting and a multi-turn interaction sequence, covering diverse scenes, styles, subjects, and both first- and third-person perspectives, together with four interaction types, including navigation, subject action, event editing, and perspective switching. For navigation, WBench unifies text, 6-DoF pose, and discrete-action control, enabling evaluation of models with different native input interfaces. Evaluation uses 22 automatic sub-metrics that combine specialist vision models with large multimodal models, and all metrics are validated against human judgments. Across 20 state-of-the-art models, we find that no single model performs strongly across all dimensions. We provide detailed diagnostic insights into the characteristic strengths, weaknesses, and open challenges of each model. Code and data are available at https://github.com/meituan-longcat/WBench.

56
Foundation Protocol: A Coordination Layer for Agentic Society

Autonomous agents are moving from tools into a layer of social infrastructure: they browse, purchase, deploy software, manage systems, and increasingly interact with one another. As these systems scale, the bottleneck shifts away from raw model capability toward coordination. Agents need to form reliable relationships, organize multi-agent work, exchange value, support an AI economy, and stay safe and accountable under real-world oversight. This paper introduces the Foundation Protocol (FP), a graph-first coordination layer for an emerging human-AI society. FP unifies heterogeneous entities, including agents, tools, resources, humans, institutions, and organizations, and supports native multi-party organization and event-based collaboration. It also provides economic primitives for metering, receipts, and settlement, and treats policy, provenance, and audit as first-class concerns. FP is designed to wrap and bridge existing protocols rather than replace them, enabling incremental adoption while reducing integration and governance overhead. The aim is to keep autonomous agency composable while keeping accountability non-negotiable, so that coordination itself can become shared infrastructure for a human-AI society that is open, pluralistic, and governable.

53
TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction

Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images. Yet most existing methods remain centered on Gaussian primitives and expose surfaces only indirectly: extracting a usable mesh for downstream simulation, physics reasoning, or embodied interaction still requires expensive post-hoc steps that break the feed-forward promise. This limitation is especially pronounced in pose-free settings, where scene structure and camera parameters must be estimated jointly from sparse observations. We present TriSplat, a feed-forward reconstruction network that represents scenes with oriented triangle primitives and directly exports simulation-ready mesh scenes from a single forward pass. Given input images, the network predicts local 3D point maps, triangle attributes, camera poses, and optional intrinsics. Rather than regressing triangle orientation as an unconstrained latent variable, our approach constructs geometry normals from the predicted point maps, refines them with an image-conditioned normal head, and converts them into stable local frames for triangle parameterization. A mono-normal bootstrap schedule further stabilizes early training, while opacity and blur scheduling progressively sharpens the learned surface representation for direct mesh extraction. Experiments on RealEstate10K and DL3DV show that this representation produces more geometry-faithful reconstructions than Gaussian feed-forward baselines while maintaining competitive novel-view rendering quality. Because the rendering primitives are themselves surface triangles, the output can be directly ingested by physics engines, collision detectors, and standard rendering pipelines without any conversion, making it a practical simulation-ready solution for feed-forward 3D scene reconstruction.

33
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

Training large multimodal models (LMMs) via reinforcement learning (RL) to natively invoke video-processing tools (e.g., cropping) has become a promising route to long-video understanding. However, existing native-RL methods dispatch tool calls sequentially (i.e., one per turn): a single wrong crop propagates errors without peer correction, multi-turn tool calls corrupt context, and inference cost scales linearly with the number of turns. We introduce ParaVT, the first multi-agent end-to-end RL-trained framework for Parallel Video Tool calling, dispatching multiple time-window crops in a single turn for cleaner context and better fault tolerance. Yet applying standard RL to ParaVT reveals an obstacle we term the Tool Prior Paradox: the pretrained tool priors that enable tool exploration also destabilize cold-started structural format and expose the skip-tool reward shortcut under temperature sampling. A cross-model contrast on a weaker-prior LMM supports this claim: format stays stable but RL elicits zero tool calls, indicating that prior strength is the shared driver of both format collapse and tool exploration. We propose PARA-GRPO (Parseability-Anchored and Ratio-gAted GRPO), which augments standard RL with two complementary mechanisms: (i) a targeted format reward applied only at the structural-token positions most prone to collapse, and (ii) a per-prompt frame-budget randomization that creates training prompts where calling the tool yields a measurable reward signal over skipping it. Across six long-video understanding benchmarks, ParaVT improves over the Qwen3-VL baseline by +7.9% on average, with PARA-GRPO lifting training-time format compliance from 0.13 to 0.64. As tool capabilities become increasingly internalized in modern LMMs, RL must cooperate with the resulting priors, and ParaVT offers a general recipe for agentic RL. Code, data, and model weights are publicly available.

29
Toward Native Multimodal Modeling: A Roadmap

Multimodal modeling represents a vital step from modality-agnostic reasoning toward world modeling. While early approaches predominantly rely on late-fusion that assembles encoders and frozen language backbones with output heads, recent efforts have shifted the paradigm toward native multimodal modeling (NMM) with the intrinsic integration of modalities for superior multimodal performance. Despite its potential, the design space of native architectures remains insufficiently defined. In this paper, we present the community with a formalized roadmap for this transition. Specifically, we formally define architectural nativity, distinguishing mid-fusion and early-fusion from non-native paradigms. We further organize the existing native models through the lens of input-output duality into three categories: (i) Multi-to-Text for cross-modal comprehension with text-only output; (ii) Multi-to-Target for scenario-oriented generation, e.g., image, audio and video generation; and (iii) Multi-to-Multi for unified modeling with symmetric input-output. We deliver a comprehensive and industrial-grade investigation into the transition toward the definitive NMM framework, where understanding and generation seamlessly coexist within a unified transformer paradigm. We systematically unpack the end-to-end pipeline from an industrial perspective, from architectural coordination and massive data curation to full-stack training recipes, inference and deployment, and comprehensive evaluation for truly native modeling.

27
AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery

Scientific research is being reshaped by AI systems that move beyond isolated assistance toward longer-horizon workflows spanning literature grounding, hypothesis generation, experimentation, validation, reporting, and revision. This shift marks a transition from task-level AI for science to workflow-level research automation. Yet current systems remain fragmented, differing in autonomy, domain scope, execution environment, validation mechanism, and human oversight, while still struggling with evidence preservation, reproducibility, weak-direction rejection, provenance tracking, cross-domain robustness, and accountable scientific closure. This survey examines these developments through AutoResearch, defined as the developmental spectrum of AI-powered scientific workflow automation. Within it, Vibe Research denotes the human-steered region of prompt-based assistance and human-verified execution, whereas emerging AI-led systems coordinate larger portions of the discovery loop without achieving robust autonomy. We analyze how research systems redistribute control, evidence, execution, validation, and accountability across workflows and organize the field around five workflow conditions: literature and research grounding; hypothesis formation and planning; experimentation and tool use; feedback, validation, and review; and reporting and knowledge communication. We further synthesize AI scientist systems, mixed-initiative co-research frameworks, benchmarks, domain deployments, and open-source infrastructures. Finally, we propose five evaluation dimensions--novelty, validity, impact, reliability, and provenance--and show that AutoResearch autonomy is domain-conditioned, being more credible in structured, executable, and rapidly verifiable settings but limited in embodied, delayed, heterogeneous, ethical, or institutionally accountable contexts.

21
ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention

Efficient attention algorithms are critical to mitigate the quadratic cost of attention in long-context workloads. Prior work utilises block-scaled quantisation techniques on Blackwell GPUs to move attention computation to 4-bit precision to accelerate inference. However, these techniques result in significant quality degradation in long-context settings. We show that the output impact of quantisation error is highly non-uniform and increases with the importance of each query-key interaction, concentrating functionally relevant error in a small number of attention blocks that contain the most important tokens. We propose ThriftAttention, a low-bit attention variant that delivers near-FP16 long-context quality at FP4 inference efficiency. This approach proceeds in two stages. First, a heuristic rapidly selects a small number of important query-key block pairs for FP16 precision. Second, the selected blocks are computed in FP16 and the remaining blocks in FP4, with both paths merged via online softmax into a single output. We demonstrate across long-context benchmarks and model families that by computing only 5% of query-key blocks in FP16, ThriftAttention recovers on average 89.1% of the FP4-to-FP16 performance gap. We show ThriftAttention's advantage grows with sequence length, mitigating the systematic FP4 quality degradation observed at longer contexts. The code is available at https://github.com/joesharratt1229/ThriftAttention.

19
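
The ThriftAttention abstract says the FP16 and FP4 paths are "merged via online softmax into a single output". The sketch below shows that merge step for a single query in plain Python. Both paths run in ordinary floating point here, so it demonstrates only why the merge is exact, not the quantisation itself. The split into "important" and "remaining" keys is a hypothetical choice, not the paper's selection heuristic.

```python
import math


def partial_attention(scores, values):
    """Softmax-attention statistics over one subset of keys.

    Returns (max score m, normalizer l, unnormalized output o).
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    l = sum(exps)
    dim = len(values[0])
    o = [sum(e * v[d] for e, v in zip(exps, values)) for d in range(dim)]
    return m, l, o


def merge(part_a, part_b):
    """Online-softmax merge of two partial results into one normalized output."""
    (m_a, l_a, o_a), (m_b, l_b, o_b) = part_a, part_b
    m = max(m_a, m_b)
    # Rescale both parts to the shared maximum before adding them.
    scale_a, scale_b = math.exp(m_a - m), math.exp(m_b - m)
    l = scale_a * l_a + scale_b * l_b
    return [(scale_a * x + scale_b * y) / l for x, y in zip(o_a, o_b)]


scores = [2.0, -1.0, 0.5, 3.0]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]

# Hypothetical split: keys 0 and 3 are "important", keys 1 and 2 are the rest.
high = partial_attention([scores[0], scores[3]], [values[0], values[3]])
low = partial_attention([scores[1], scores[2]], [values[1], values[2]])
merged = merge(high, low)

# Reference: ordinary softmax attention over all four keys at once.
_, l_full, o_full = partial_attention(scores, values)
reference = [x / l_full for x in o_full]
```

Apart from floating-point rounding, the merged output matches attention computed over all keys at once. This is what allows two precision paths to be computed separately and combined afterwards.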
QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks

Deep research agents extend the role of search engines from retrieving keyword-matched pages to synthesizing knowledge, fundamentally changing how humans interact with information. However, frontier systems remain proprietary, while existing open agents often generalize poorly across different task types, leaving unclear how to train a broadly capable deep research agent. We release QUEST, a family of open models (ranging from 2B to 35B) that serve as general-purpose deep research agents designed to handle a wide range of long-horizon search tasks, with strong capabilities in fact seeking, citation grounding, and report synthesis. To build QUEST, we propose an effective training recipe combining mid-training, supervised fine-tuning, and reinforcement learning. Central to this recipe is a curated data synthesis pipeline based on unified rubric trees, which applies to different task types and enables synthesizing training data with verifiable rewards without human annotation. In addition, QUEST incorporates a built-in context management mechanism that enables effective long-horizon reasoning and knowledge synthesis. Using only 8K synthesized tasks, QUEST approaches or even surpasses frontier closed-source agents across eight deep research benchmarks spanning diverse task types, and achieves the best overall performance among recent open-weight agents. We released everything: models, data, and training scripts.

19
Your Embedding Model is SMARTer Than You Think

Multimodal retrieval relies heavily on single-vector retrievers, which compress rich, sequential token sequences into one single global representation. While efficient, they discard fine-grained, local evidence critical for dense retrieval tasks. Multi-vector approaches were introduced as a solution, but they strictly require training and many ignore the necessity of a globally summarizing representation. To address this, we introduce SMART, a framework that unlocks the latent multi-vector capabilities of standard single-vector models. We first demonstrate that standard contrastive training on the pooled embedding implicitly shapes the retrieval geometry of preceding hidden states via gradient flow. By applying direct late-interaction over these frozen hidden states during inference, SMART acts as a plug-and-play upgrade that consistently improves performance across diverse modalities, improving even the state-of-the-art models further on MMEB-V2. We also reveal SMART's superior performance, as simple lightweight post-training not only saves time and compute, but also brings forth further improvement on Visual Document retrieval, allowing a single-vector model to outperform SoTA multi-vector counterparts. Ultimately, SMART offers both a highly efficient inference enhancement and a powerful finetuning technique for multimodal retrieval. We open source our code and weights at https://github.com/HanSolo9682/SMART.

17
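
The SMART abstract contrasts a single pooled embedding with late interaction over per-token hidden states. Below is a minimal sketch of the standard late-interaction (MaxSim) score next to a mean-pooled score, using toy 2-D vectors. The vectors and function names are illustrative assumptions, and the paper's exact scoring may differ.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_pool(tokens):
    """Collapse token vectors into one global vector (single-vector retrieval)."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: each query token keeps its best-matching document
    token, and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.9]]  # tokens align closely with the query tokens
doc_b = [[0.5, 0.5], [0.5, 0.5]]  # same average, no token-level alignment

pooled_a = dot(mean_pool(query), mean_pool(doc_a))
pooled_b = dot(mean_pool(query), mean_pool(doc_b))
late_a = maxsim_score(query, doc_a)
late_b = maxsim_score(query, doc_b)
```

The two documents have the same pooled score, while the late-interaction score separates them. This is the kind of local evidence that pooling discards.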
Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

Generating complete digital twins from videos requires precise camera control, global scene coverage, and strict spatial-temporal consistency constraints that remain challenging for perspective video generators due to their limited field of view (FoV). Their narrow FoV forces long or multi-view trajectories, amplifying cross-view inconsistency and temporal drift. We argue that 360° video generation offers a natural solution: panoramic coverage simplifies trajectory design and provides a strong global context for maintaining coherence. We introduce Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion, a controllable 360° video generation framework that synthesizes high-fidelity videos from sparse 360° inputs. The key idea is an explicit 3D Cache, reconstructed from the input, which serves as a geometric scaffold for any user-defined camera path. This allows the diffusion model to focus on photorealistic texture refinement while the 3D Cache enforces global geometric consistency. Experiments show that Pantheon360 achieves superior visual quality and unmatched geometric coherence, enabling reliable and flexible 360° scene generation for downstream simulation and digital-twin applications.

15
ControlLight: Towards Controllable, Consistent, and Generalizable Low-Light Enhancement

Existing deep learning-based low-light enhancement methods are typically trained on limited datasets with single enhancement targets, which restricts their generalization ability and controllability in real-world applications. To overcome these limitations, we propose ControlLight, a controllable, consistent, and generalizable framework for low-light enhancement. We first construct a large-scale dataset of real-world degraded images with continuous illumination-strength supervision. To further ensure consistent outputs under different control strengths, we introduce a misalignment-aware weighted flow matching loss that preserves image structure across continuous enhancement strengths. ControlLight allows users to edit real-world degraded low-light images toward satisfactory enhancement results by flexibly controlling the strength while preserving visual consistency and realism. Extensive experiments show that ControlLight achieves state-of-the-art performance against existing low-light enhancement approaches while demonstrating strong continuous controllability and generalization to real-world scenarios.

14
Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents

While AI agents demonstrate remarkable capabilities in reasoning and tool use, they remain fundamentally reactive: they compute responses only after explicit user prompts. This paradigm ignores a critical opportunity: the idle time between interactions is largely wasted, leaving agents unable to prepare for future user needs. To bridge this gap, we introduce ProAct, a proactive agent architecture that leverages idle-time compute to anticipate and fulfill likely upcoming user needs. By analyzing evolving dialogue history together with persistent memory, ProAct predicts upcoming needs and iteratively acquires information, allowing the agent to resolve knowledge gaps and prepare evidence before the user initiates a query. To rigorously evaluate proactive capabilities, we also introduce ProActEval, a comprehensive benchmark comprising 200 scenarios across 40 domains, featuring predictable need chains and diverse user cognitive profiles. Empirical results demonstrate significant advantages over reactive baselines. ProAct accelerates task completion by reducing required turns by 14.8%, decreases user effort by 11.7%, and cuts hallucination rates by 28.1% on ProActEval. Furthermore, MemBench evaluations confirm that ProAct achieves state-of-the-art reflective accuracy, underscoring its sustained and robust performance.

12
On-Policy Adversarial Flow Distillation for Autoregressive Video Generation

Autoregressive video generators are attractive for streaming, long-horizon, and interactive applications, but distilling strong black-box teachers into causal students remains difficult. The student must learn under its own rollout distribution, whereas practical teachers may expose only prompt-conditioned completed videos and may differ in architecture, capacity, temporal design, and sampling schedule. This interface makes supervised fine-tuning off-policy, score-based distillation inapplicable, and direct adversarial imitation too sparse for denoising-time credit assignment. We propose Adversarial Flow Distillation (AFD), an on-policy framework for heterogeneous black-box video distillation. AFD queries the teacher and rolls out the current student on the same prompts, trains a prompt-paired Bradley-Terry discriminator to estimate clean-sample teacher-student discrepancy, and converts the resulting on-policy advantage into forward-process flow-matching updates on the student's own noised states. Thus, AFD provides dense velocity-field supervision while requiring no teacher scores, latents, denoising trajectories, step alignment, or reverse-chain reinforcement learning. Experiments across two causal AR student families show that AFD consistently improves motion- and physics-sensitive generation while preserving general video quality, and ablations validate the importance of adaptive on-policy feedback and forward-process credit assignment. The method requires only clean teacher videos and student rollouts, providing a practical route for distilling proprietary or heterogeneous video generators into efficient autoregressive students.

12
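
The AFD abstract mentions a prompt-paired Bradley-Terry discriminator. The sketch below shows the standard Bradley-Terry preference probability and its negative log-likelihood for one teacher/student pair. The scalar scores stand in for discriminator outputs and are hypothetical. How AFD turns the resulting advantage into flow-matching updates is not shown, since the abstract does not specify it.

```python
import math


def bt_preference_prob(score_teacher, score_student):
    """Bradley-Terry probability that the teacher sample beats the student one."""
    return 1.0 / (1.0 + math.exp(-(score_teacher - score_student)))


def bt_discriminator_loss(score_teacher, score_student):
    """Negative log-likelihood of preferring the teacher for one prompt pair."""
    return -math.log(bt_preference_prob(score_teacher, score_student))


# Hypothetical discriminator scores for two videos generated from one prompt.
p_equal = bt_preference_prob(0.0, 0.0)
p_teacher_ahead = bt_preference_prob(2.0, 0.0)
loss_equal = bt_discriminator_loss(0.0, 0.0)
loss_teacher_ahead = bt_discriminator_loss(2.0, 0.0)
```

When the two scores are equal the discriminator is undecided, so the probability is 0.5 and the loss is log 2. The loss shrinks as the teacher's score pulls ahead.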
Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World

Large language model agents are increasingly envisioned as always-on personal assistants with access to anything relevant in the user's digital world. Yet current systems operate over only narrow slices of that world, limiting context-sensitive reasoning and effective assistance. Existing benchmarks similarly provide only partial user state and therefore fail to capture performance in such a broad, always-on setting. To address this gap, we introduce Claw-Anything, a benchmark that expands agent context along three dimensions: long-horizon activity histories, interdependent backend services, and integrated GUI and CLI interaction across multiple devices. To instantiate this setting, we simulate months of user activity through multi-round event injection, producing complex world states and realistic noise, including irrelevant events and conflicting signals. Agents must reason over rich contextual environments while remaining robust to such noise. This expanded scope also enables the evaluation of proactive assistance, requiring agents to anticipate user needs and deliver timely recommendations. Experiments show that GPT-5.5 achieves only 34.5% pass@1, substantially below prior benchmarks, underscoring a gap between current agent capabilities and the demands of always-on personal assistance. Alongside the benchmark, we release an automated data-generation pipeline that yields 2,000 training environments and improves the base model by 23.7%, demonstrating its utility as scalable data infrastructure.

11
SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills

Large language model (LLM) agents accumulate rich episodic trajectories while solving real-world tasks, but it remains unclear whether such experience can be distilled into reusable procedural skills. We introduce SkillEvolBench, a diagnostic benchmark for evaluating this step from experience reuse to skill formation. It contains 180 tasks across six real-world agent environments, organized into role-conditioned task families with shared latent procedures. Agents learn from acquisition tasks, update an external skill library using compacted trajectories and verifier feedback, and then face frozen deployment tasks testing context shift, adversarial shortcuts, and composition. By comparing self-generated and curated-start skill evolution against no-skill and raw-trajectory controls, SkillEvolBench separates procedural abstraction from base capability, curated prior knowledge, and direct reuse of episodic traces. Across ten model configurations and three agent harnesses, we find that current agents often adapt locally but rarely form robust reusable skills. Skill-based conditions can improve acquisition or replay, and individual models sometimes gain on specific deployment axes, but these gains are unstable under frozen deployment. Raw-trajectory reuse frequently outperforms distilled skills, suggesting that current abstraction procedures discard contextual and procedural cues that remain useful for future tasks. Capacity and cost analyses further show that writing more skills or larger Tier-3 resource libraries is not sufficient: additional updates can improve coverage while introducing episode-specific drift and procedural clutter. These findings position SkillEvolBench as a testbed for measuring when one-off experience becomes durable procedural knowledge rather than task-local memory.

10
MemForest: An Efficient Agent Memory System with Hierarchical Temporal Indexing

Memory is a fundamental component for enabling long-context LLM agents, supporting persistent state across interactions through a continuous serve-and-update lifecycle. Despite substantial prior work, existing systems suffer from significant maintenance overhead due to two key limitations: coarse-grained state management and inherently sequential update pipelines. In particular, updates are often tightly coupled with LLM inference and require full-state rewrites, leading to poor scalability and growing latency as memory accumulates. To address these challenges, we present MemForest, a memory framework that reformulates agent memory as a write-efficient temporal data management problem. MemForest breaks the sequential bottleneck via parallel chunk extraction, decoupling memory construction into concurrent, independent operations. To further eliminate coarse-grained maintenance, we introduce MemTree, a hierarchical temporal index that organizes memory as time-ordered trees rather than flat global summaries. This design replaces full-state rewrites with localized per-node updates, reducing maintenance cost to the affected tree paths while naturally preserving temporally evolving states. We evaluate MemForest on two long-context memory benchmarks, LongMemEval-S and LoCoMo. On LongMemEval-S, MemForest achieves the best overall performance among stateful baselines, reaching 79.8% pass@1 accuracy while sustaining a memory construction throughput approximately 6x higher than state-of-the-art approaches including EverMemOS.

8
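
The MemForest abstract describes replacing full-state rewrites with localized per-node updates in a time-ordered tree. The toy index below illustrates that property only. The day/hour bucketing, the `Node` class, and the cached `count` field are assumptions made for this sketch and are not MemTree's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    entries: list = field(default_factory=list)   # raw memories (leaf level)
    children: dict = field(default_factory=dict)  # child label -> Node
    count: int = 0                                # cached summary of the subtree


def insert(root, timestamp, text):
    """Insert one memory under root/day/hour, updating only that path.

    `timestamp` is an ISO-like string such as "2026-05-26T14:02".
    """
    day, hour = timestamp[:10], timestamp[:13]
    touched = [root]
    node = root
    for label in (day, hour):
        node = node.children.setdefault(label, Node(label))
        touched.append(node)
    node.entries.append((timestamp, text))
    # Only nodes on the affected path refresh their cached summary.
    for n in touched:
        n.count += 1
    return [n.label for n in touched]


root = Node("root")
insert(root, "2026-05-25T09:15", "user prefers dark mode")
path = insert(root, "2026-05-26T14:02", "user booked a flight")
```

The second insert touches three nodes and leaves the 2026-05-25 subtree untouched. Maintenance cost therefore scales with tree depth, not with the total amount of stored memory.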
InstructSAM: Segment Any Instance with Any Instructions

In this paper, we introduce InstructSAM, a unified and streamlined framework designed for multi-instance segmentation under arbitrary instructions. We formulate instruction-driven instance segmentation as a set-structured query prediction problem and propose an explicit reasoning-to-instance query interface that elegantly bridges a vision-language model (VLM) and SAM3. Specifically, a bank of learnable instance queries is injected into the VLM and contextualized with instruction and visual information, enabling each query to serve as an instance-aware slot. A hybrid-attention mechanism further promotes interaction among these queries, visual tokens, and instruction tokens, improving instance enumeration and reducing duplicate predictions. The resulting LLM-conditioned queries are projected into SAM3's detector query space to drive accurate multi-instance segmentation in a single forward pass. This design equips SAM3 with high-level instruction understanding, compositional reasoning, and instance-level set prediction without modifying its core architecture. To support training and evaluation, we further construct Inst2Seg, a high-quality and large-scale instruction-based instance segmentation dataset and benchmark that couples free-form instructions with instance-level masks. Extensive experiments show that InstructSAM at only 2B scale achieves strong results across complex instruction-driven and phrase-level referring segmentation benchmarks, outperforming prior end-to-end methods and SAM3's agentic pipeline while enabling efficient single-pass multi-instance prediction.

7
Channel-wise Vector Quantization

We present Channel-wise Vector Quantization (CVQ), a novel image tokenization paradigm that replaces patch-wise tokens with channel-wise tokens. Unlike conventional vector quantization, which assigns a discrete token to each patch feature vector, CVQ quantizes each channel of the feature map. This formulation represents an image as discrete levels of visual details, rather than as a grid of spatial patches. Based on CVQ, we introduce a new visual autoregressive framework with "next-channel prediction". Instead of rendering images patch by patch in raster order, our Channel-wise Autoregressive (CAR) model predicts image channels sequentially, producing progressively enriched visual details. Specifically, it first sketches global structure and then refines fine-grained attributes, akin to a human artist's workflow. Empirically, we show that: (1) CVQ achieves 100% codebook utilization with a 16K+ codebook size without any bells and whistles, and substantially improves reconstruction quality over conventional VQ; and (2) CAR attains a DPG score of 86.7 and a GenEval score of 0.79, demonstrating strong effectiveness for text-to-image generation.

7
Geometry-Aware Image Flow Matching

Recent advances in generative models highlight the power of geometry-aware modeling in manifold-constrained settings. Yet, for natural images, the field remains confined to Euclidean assumptions, failing to exploit the potential of intrinsic geometric structures within the data. In this work, we investigate the geometry of natural images and observe that semantic information is predominantly encoded in directional components, while norm components can be approximated by the global average. This property holds across both RGB and latent spaces, suggesting that natural images can be effectively modeled on a hypersphere. Building on this finding, we introduce Spherical Optimal Transport Flow Matching (SOT-CFM), which utilizes angular distance, and Spherical Flow Matching (SFM), which constrains dynamics directly on the manifold. Our experiments demonstrate that these geometry-aware methods achieve superior performance against Euclidean baselines. Ultimately, this work provides a novel perspective that bridges the gap between Riemannian manifold-based modeling and natural image generation.
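
The norm/direction split and the angular distance described above can be illustrated with a small standard-library sketch. This is not the authors' SOT-CFM or SFM code; the function names are hypothetical, and spherical linear interpolation is used here only as a generic example of a path that stays on the unit hypersphere.

```python
import math

def split_norm_direction(x):
    """Split a vector into its norm and its unit-length direction."""
    norm = math.sqrt(sum(v * v for v in x))
    return norm, [v / norm for v in x]

def angular_distance(u, v):
    """Great-circle distance between two unit vectors, in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot)))

def slerp(u, v, t):
    """Point at fraction t along the great-circle arc from u to v.

    Not well defined for antipodal inputs, where the arc is not unique.
    """
    theta = angular_distance(u, v)
    if theta < 1e-12:
        return list(u)
    s = math.sin(theta)
    a = math.sin((1 - t) * theta) / s
    b = math.sin(t * theta) / s
    return [a * p + b * q for p, q in zip(u, v)]

norm, direction = split_norm_direction([3.0, 4.0])
midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Unlike a straight-line (Euclidean) interpolation, every point returned by `slerp` keeps unit norm, which is the property a sphere-constrained flow relies on.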

7
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Reinforcement learning with verifiable rewards (RLVR) has driven breakthroughs in domains such as math, tool-use, and software engineering, yet its extension to computer-use agents (CUAs) has been bottlenecked by the scarcity of scalable training data with deterministic rewards. Constructing such data for CUAs requires a consistent task instruction, an executable environment, and a verifiable reward. However, hand-curated benchmarks achieve high reward fidelity but cover few applications, while LLM-as-judge-based datasets scale broadly but lack reliable verification. We present CUA-Gym, a scalable pipeline that co-generates task instructions, environment states, and reward functions. Concretely, a Generator agent constructs the initial and golden environment states, and a separate Discriminator agent writes the reward function from the task specification. An orchestrator agent drives the two through iterative rounds upon execution. Generated tuples then pass a final filter combining LLM majority voting and agent rollouts, ensuring quality beyond the per-task adversarial loop. To address the scarcity of training environments, we further synthesize CUA-Gym-Hub, a broad suite of high-fidelity mock web applications grounded in real-world software-use distributions, expanding the scale of CUA RLVR data by magnitude. Using this pipeline, we construct CUA-Gym, a dataset of 32,112 verified RLVR training tuples grounded in 110 environments. Trained with GSPO on CUA-Gym, our CUA-Gym-A3B and CUA-Gym-A17B achieve 62.1% and 72.6% on OSWorld-Verified, outperforming prior open-source CUAs at comparable scales, with performance scaling smoothly in both data volume and environment diversity. The same checkpoints also improve on the held-out WebArena benchmark, indicating transfer beyond the training environments. We will open-source the full synthesis pipeline, dataset, CUA-Gym-Hub environments, and models.

5
CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models

Video prediction is increasingly viewed as a path toward generalizable world models, yet it remains unclear whether these systems learn underlying causal structure or merely exploit superficial visual correlations for future prediction. We introduce CRONOS, an intervention-based benchmark designed to evaluate counterfactual physical consistency: whether a model's predictions of physical events respond appropriately to controlled changes in the visual input, such as variations of scene context, viewpoint, object appearance, and object category. Built in a photorealistic Unreal Engine environment, CRONOS enables controlled, high-fidelity generation of videos across diverse scenes and dynamics. In contrast to previous benchmarks, CRONOS systematically intervenes on four key factors - viewpoint, scene, object category, and object appearance - while keeping the underlying physical event type, such as a collision, occlusion, or fall, fixed. Our evaluation of recent open-source video generators reveals substantial failures in counterfactual physical consistency: prediction quality for the same physical event type is affected by appearance, environment, and, particularly, by viewpoint changes. CRONOS provides a controlled and reproducible testbed for diagnosing how the quality of generated videos changes for different interventions, establishing a concrete target for developing models that perform consistently across changes of multiple conditions. The dataset and code are available at our project page.

5
Towards Customized Multimodal Role-Play

Unified multimodal understanding and generation models enable richer human-AI interaction. Yet jointly customizing a character's persona, dialogue style, and visual identity while maintaining output consistency across modalities remains largely unexplored. To mitigate this gap, we introduce a new task, Customized Multimodal Role-Play (CMRP). We construct the RoleScape-20 dataset comprising 20 characters, including training and evaluation data that cover persona, stylistic descriptions, visual/expressive cues, and text-image interactions. Building on a unified model, we devise UniCharacter, a two-stage training framework containing Unified Supervised Finetuning (Unified-SFT) and character-specific group relative policy optimization (Character-GRPO). Given only 10 images plus corresponding interaction examples, the model acquires the target character and exhibits coherent persona, style, and visual identity in both generated text and images. This process takes about 100 GPU hours. Experiments on the RoleScape-20 dataset show that the proposed method substantially outperforms prior approaches. Ablation studies further validate the effectiveness of our cross-modal consistency design and few-shot customization strategy. We argue that CMRP, coupled with unified modeling, provides a basis for next-generation characterful and immersive interactive agents.

4
Helix4D: Complex 4D Mesh Generation

Current video-to-4D methods struggle with complex topology changes, transparent materials, thin structures, and inner surfaces. We present Helix4D, a dynamic mesh generation framework that inherits the expressive representation of Trellis2 and adapts it from image-to-3D to video-conditioned 4D generation. Our design arises from two key questions: (a) how to enable Trellis2's frame-local attention to share information across frames while preserving its pretrained quality on rare cases such as transparent objects and inner surfaces, and (b) how to inject temporal information into a purely 3D positional encoding without breaking pretrained capabilities. We address (a) with sliding-window cross-frame attention and an anchor on the first frame. The first frame is generated by the base Trellis2 model and injected into our model, letting it inherit Trellis2's quality in rare cases through cross-frame attention. We address (b) with a 4D temporal encoding that repurposes redundant low-frequency spatial RoPE bands for time, extending the encoding from 3D with no additional parameters. Extensive experiments show the effectiveness of Helix4D for high-quality dynamic mesh generation on ActionBench and our own challenging complex dynamics set.

4
MetaphorVU: Towards Metaphorical Video Understanding

Metaphorical videos are prevalent across various real-world scenarios to convey complex ideas, and understanding them typically requires high-order cognitive capabilities. The lack of systematic studies on metaphorical video understanding not only constrains the real-world applicability of MLLMs but also impedes the thorough assessment of their high-order cognitive capabilities. To bridge this gap, we propose MetaphorVU-Bench, the first systematic and comprehensive benchmark dedicated to metaphorical video understanding. Through experiments, we find current MLLMs struggle with accurate metaphorical video understanding, lagging far behind human level, primarily due to defective cross-domain mapping. Motivated by this finding, we construct a metaphor knowledge graph as mapping augmentation and propose MetaphorBoost, an inference-time enhancement framework achieving consistent performance improvement. Our benchmark, analysis, and method provide useful insights and a foundation for future research on advancing MLLMs.

4
Injecting Image Guidance into Text-Conditioned Diffusion Models at Inference

Text-to-image diffusion models like Stable Diffusion generate high-quality images from text, but lack a way to inject visual guidance (e.g. sketches, styles) at inference without retraining. Existing methods either require computationally expensive fine-tuning or rely on style transfer techniques that risk semantic misalignment with textual prompts. We introduce Visual Concept Fusion (VCF), the first method offering dual conditioning on both an image and text prompt at inference time without any concept-specific training. VCF enables visual concept injection into Stable Diffusion by aligning CLIP image features with the text embedding space. VCF consists of three components: (1) a lightweight aligner that maps image tokens to the text embedding manifold using InfoNCE and cross-attention reconstruction losses, (2) a fusion strategy that preserves both textual and visual semantics, and (3) an optional Prompt-Noise Optimization (PNO) module for test-time refinement. Our experiments demonstrate that VCF successfully transfers visual attributes including style, composition, and color palette from reference images while maintaining prompt adherence. Quantitative results show a trade-off between text alignment (CLIP score) and visual correspondence (LPIPS), with VCF outperforming baselines in reference fidelity.

3
Coloring the Noise: Adversarial Sobolev Alignment for Faithful Image Super Resolution

Generative priors in Image Super-Resolution (SR) often compromise faithful restoration. We attribute this limitation to a fundamental spectral misalignment between isotropic objectives and the intrinsic natural image manifold. While Direct Preference Optimization offers a path to alignment, its reliance on spectrally flat Gaussian noise fails to distinguish authentic high-frequency details from hallucinations. To bridge this geometric gap, we propose ASASR, a theoretically grounded framework that recasts the generative flow into a Sobolev-induced Riemannian geometry by explicitly coloring the noise transition kernel to mirror natural spectral decay. Driving this geometric alignment, we integrate a parametric adversary grounded in the Riesz Representation Theorem, which synthesizes targeted negative samples equivalent to worst-case Sobolev gradients to direct optimization along the tangent space of plausible structural failures. Extensive evaluations demonstrate that ASASR outperforms leading generative baselines, particularly in preserving spectral consistency and structural fidelity, offering a robust solution that effectively mitigates artifacts.

3
SEAL: Synergistic Co-Evolution of Agents and Learning Environments

Large Language Model (LLM) agents are increasingly improved through interaction, yet most self-evolution methods adapt either the policy or the learning environment in isolation. We identify this structural gap as Agent-Environment Misalignment: the agent's capability frontier changes during training, while the environment that provides supervision remains static or only weakly coupled to the agent's revealed failures. We propose SEAL, a closed-loop co-evolution framework for interactive tool-use agents. SEAL collects on-policy trajectories under executable verification, diagnoses failed rollouts into turn-level failure labels, and uses these diagnoses as a shared signal for both environment-side adaptation and model-side policy optimization. The environment evolves its training-time learning interface by exposing clearer tool affordance cues, constraint information, and recovery-oriented feedback, while the policy is updated with diagnosis-guided advantage reweighting. Extensive experiments across in-distribution and out-of-distribution multi-turn tool-use evaluations show that SEAL improves low-resource agent learning: with only 400 training samples, it yields +8.25 to +26.25 average-point gains across three backbones and exhibits positive out-of-distribution transfer. These results demonstrate the value of jointly adapting the learner and its training-time learning substrate for robust self-improving LLM agents.

3
Reinforcing Few-step Generators via Reward-Tilted Distribution Matching

Recent advances in few-step diffusion distillation have enabled efficient image generation, yet aligning these models with human preferences remains challenging. We propose Reward-Tilted Distribution Matching Distillation (RTDMD), a two-stage framework that unifies distribution matching distillation with reward-guided reinforcement learning for few-step flow generators. We show that minimizing the KL divergence to a reward-tilted teacher distribution naturally decomposes into a distribution matching term and a reward maximization term. In the first stage, we introduce Ambient-Consistent Distribution Matching Distillation (AC-DMD), which performs subinterval-wise distribution matching and augments the fake score objective with a consistency regularizer to help the fake score model track the shifting generator distribution under limited updates. In the second stage, we jointly optimize both terms: for the reward maximization term, we derive a hybrid policy gradient that combines a GRPO-style estimator for the stochastic intermediate transitions with direct reward backpropagation through the deterministic final step, and further introduce step-subset GRPO (SubGRPO) to reduce variance. Experiments on SD3, SD3.5, and FLUX.2 demonstrate that RTDMD establishes new state-of-the-art results across preference, aesthetic, and compositional metrics with only 4 inference steps, outperforming previous few-step text-to-image generation methods. Code and models are available at https://github.com/Harahan/RTDMD.
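
The decomposition mentioned above can be written out in a few lines. The exponential form of the tilt and the temperature symbol \beta are assumptions made for illustration (the abstract does not state the exact form), and any conditioning on the text prompt is omitted.

```latex
% Assumed reward-tilted teacher, with normalizer Z:
p^{*}(x) = \frac{p_{T}(x)\,\exp\!\big(r(x)/\beta\big)}{Z},
\qquad
Z = \mathbb{E}_{x \sim p_{T}}\!\left[\exp\!\big(r(x)/\beta\big)\right].

% KL divergence from the generator p_\theta to the tilted teacher:
\mathrm{KL}\!\left(p_{\theta} \,\|\, p^{*}\right)
  = \mathbb{E}_{x \sim p_{\theta}}\!\left[\log p_{\theta}(x) - \log p_{T}(x)
      - \frac{r(x)}{\beta} + \log Z\right]
  = \underbrace{\mathrm{KL}\!\left(p_{\theta} \,\|\, p_{T}\right)}_{\text{distribution matching}}
    \;-\; \underbrace{\frac{1}{\beta}\,\mathbb{E}_{x \sim p_{\theta}}\!\left[r(x)\right]}_{\text{reward maximization}}
    \;+\; \log Z .
```

Since \log Z does not depend on \theta, minimizing the left-hand side under this assumed tilt amounts to matching the teacher distribution while maximizing the expected reward.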

2
Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth

Chains of thought (CoTs) have become central in interpreting and auditing behaviors of large language models. Yet growing evidence suggests that these traces often fail to faithfully represent the computations behind a model's predictions. Several faithfulness metrics have been proposed, but whether they indeed measure faithfulness remains unknown. Answering this requires ground-truth labels, which are hard to obtain since internal computations are not directly observable. Consequently, most works proposing metrics report only absolute scores or comparisons to prior metrics, and the few existing benchmarks rely on proxies like plausibility or importance, properties orthogonal to faithfulness that can mislead about whether a CoT can be trusted. We address this challenge by constructing tasks whose outputs reveal which intermediate computations must have produced them, and developing an automated labeling pipeline that yields ground-truth faithfulness labels at both the step and CoT level. Building on this methodology, we present BonaFide, a benchmark of 3,066 labeled CoTs across 13 tasks and 10 models, and use it to conduct the first systematic evaluation of prominent faithfulness metrics. Our experiments show that most metrics perform near chance, exhibit strong prediction biases and degrade on longer CoTs. The best metric reaches only 0.70 AUROC at the CoT level while another reaches 0.59 at the step level, with neither transferring across settings, while entailing prohibitively high computational cost. Our results expose fundamental gaps in current faithfulness evaluation and call for the development of more reliable and efficient metrics.

2
Broadening Access to Transportation Safety Data with Generative AI: A Schema-Grounded Framework for Spatial Natural Language Queries

Transportation safety analysis requires integrating crash records, roadway attributes, and geospatial data through GIS-based workflows, but access remains uneven across agencies and community stakeholders. Technical prerequisites create a gap between analytical tools central to safety planning and the practitioners able to use them. Local agencies, school committees, and residents may have safety concerns but limited capacity to retrieve, filter, map, and analyze relevant data. Generative AI offers a way to narrow this divide, but its public-sector use raises questions about reliability, reproducibility, and governance. This paper presents a schema-grounded natural language interface for transportation safety analysis, using a large language model (LLM) to interpret user intent while preserving deterministic, reviewable execution against an authoritative database. User queries are translated into structured semantic frames, validated by a rule-based layer, compiled into a typed directed acyclic graph of spatial operations, and executed against a PostGIS database. This bounded design separates language interpretation from deterministic execution, keeping results reproducible and schema-grounded while removing access barriers. The framework is evaluated using a statewide Massachusetts transportation safety database integrating crash records, roadway attributes, and geospatial layers including schools, bus stops, crosswalks, and municipal boundaries. All queries executed successfully; the validation layer corrects errors in 29% of evaluation queries, reflecting the gap between flexible natural language and strict schema-grounded requirements. The results suggest that combining natural language accessibility with deterministic execution is a practical direction for broadening access to transportation safety data, with implications for trustworthy AI in public-sector planning.

1
Language Models Need Sleep

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs N offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. During inference, this shifts extra computation to sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which a regular transformer as well as SSM-attention hybrid models fail. We then show that increasing sleep duration N for our models improves performance, with the largest gains on examples that require deeper reasoning.

1
Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO

Temporal credit assignment in reinforcement learning has long been a central challenge. Inspired by the multi-timescale encoding of the dopamine system in neurobiology, recent research has sought to introduce multiple discount factors into Actor-Critic architectures, such as Proximal Policy Optimization (PPO), to balance short-term responses with long-term planning. However, this paper reveals that blindly fusing multi-timescale signals in complex delayed-reward tasks can lead to severe algorithmic pathologies. We systematically demonstrate that exposing a temporal attention routing mechanism to policy gradients results in surrogate objective hacking, while adopting gradient-free uncertainty weighting triggers irreversible myopic degeneration, a phenomenon we term the Paradox of Temporal Uncertainty. To address these issues, we propose a Target Decoupling architecture: on the Critic side, we retain multi-timescale predictions to enforce auxiliary representation learning, while on the Actor side, we strictly isolate short-term signals and update the policy based solely on long-term advantages. Rigorous empirical evaluations across multiple independent random seeds in the LunarLander-v2 environment demonstrate that our proposed architecture achieves statistically significant performance improvements. Without relying on hyperparameter hacking, it consistently surpasses the "Environment Solved" threshold with minimal variance, completely eliminates policy collapse, and escapes the hovering local optima that trap single-timescale baselines. The source code to reproduce our experiments is publicly available at https://github.com/ben-dlwlrma/Representation-Over-Routing.
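
The split between multi-timescale critic targets and a long-horizon-only actor signal can be sketched roughly as follows. This is an illustrative reading of the abstract, not the released code: the function names are hypothetical, and plain Monte Carlo returns stand in for whatever advantage estimator the authors use with PPO.

```python
def discounted_returns(rewards, gamma):
    """Discounted return G_t = r_t + gamma * G_{t+1} for every step."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def multi_timescale_targets(rewards, gammas):
    """Critic side: one return target per discount factor (auxiliary heads)."""
    return {gamma: discounted_returns(rewards, gamma) for gamma in gammas}

def actor_advantages(rewards, values_long, gamma_long):
    """Actor side: advantages from the long-horizon discount only."""
    returns = discounted_returns(rewards, gamma_long)
    return [g - v for g, v in zip(returns, values_long)]

rewards = [0.0, 0.0, 1.0]            # a delayed reward at the last step
gammas = [0.5, 0.99]                 # hypothetical short and long timescales
critic_targets = multi_timescale_targets(rewards, gammas)
advantages = actor_advantages(rewards, [0.5, 0.5, 0.5], max(gammas))
```

With the short discount factor the delayed reward is heavily attenuated at the first step, which shows why a policy driven by short-timescale signals can become myopic.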

1
Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints

Communication enables coordination in multi-agent reinforcement learning (MARL), but many real-world applications, e.g., search-and-rescue with drone swarms, operate under severe bandwidth constraints. Many communication architectures still expose a coupled bottleneck in which a shared latent representation is used for both policy execution and inter-agent communication. Consequently, reducing message size directly limits the policy's latent space, often leading to significant performance degradation. We address this with two contributions. First, we introduce β, a normalised per-agent bandwidth budget that unifies sparsity, rounds, and message dimension into a single comparable constraint. Second, we provide SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, allowing us to isolate the effect of bandwidth from the effect of policy capacity while benefiting from in-step communication. We evaluate our method on several partially-observable MARL benchmarks, where communication is essential. Our approach achieves state-of-the-art performance and exhibits scalability and robustness under limited communication, with only marginal degradation as bandwidth is reduced.

1
HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction

Online 3D reconstruction requires estimating camera pose and scene geometry under strict causal and bounded-memory constraints. Existing methods often suffer from drift, jitter, or collapse on long sequences. We trace these failures to a fundamental mismatch. Streaming geometry is inherently temporally heterogeneous, with evidence ranging from short-lived correspondences to persistent global scale. However, current architectures impose uniform and pathological influence patterns. For example, sliding windows enforce hard cutoffs, while ungated recurrence and causal attention cause cache saturation and spike-like attention sinks. To resolve this, we formalize geometric propagation as an evidence influence kernel and propose HorizonStream, a long-horizon Transformer that explicitly factorizes this kernel. For the long-range temporal factor, Geometric Linear Attention learns channel-wise decay rates to enable bounded, multi-timescale propagation of geometric evidence. For the short-range spatial factor, Geometric Local Attention with Spatiotemporal RoPE performs reliable 3D matching while suppressing attention sinks. Finally, Metric Readout Tokens recover stable scale and rigid pose directly from the persistent geometric state. Extensive experiments show that HorizonStream, trained on only 48-frame clips, generalizes stably to sequences exceeding 10,000 frames with constant memory and linear time, achieving state-of-the-art streaming 3D reconstruction performance. Project Page: https://3dagentworld.github.io/horizonstream/
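
To make the idea of channel-wise decay in linear attention concrete, here is a generic recurrent sketch. It is not HorizonStream's Geometric Linear Attention; it assumes the common gated linear-attention recurrence in which a fixed-size state is decayed per key channel and then updated with the outer product of the current key and value. In a trained model the decay rates would be learned; here they are passed in by hand.

```python
def decayed_linear_attention(queries, keys, values, decay):
    """Causal linear attention with one decay rate per key channel.

    state[i][j] <- decay[i] * state[i][j] + k[i] * v[j]
    output[j]    = sum_i q[i] * state[i][j]

    The state has a fixed size (d_k x d_v), so memory does not grow
    with sequence length and each step costs the same amount of work.
    """
    d_k, d_v = len(decay), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = decay[i] * state[i][j] + k[i] * v[j]
        outputs.append(
            [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs

# One piece of evidence is written at step 0 into both channels.
keys = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
values = [[1.0], [0.0], [0.0]]
decay = [1.0, 0.5]  # channel 0 persists, channel 1 forgets quickly
slow = decayed_linear_attention([[1.0, 0.0]] * 3, keys, values, decay)
fast = decayed_linear_attention([[0.0, 1.0]] * 3, keys, values, decay)
```

Reading through channel 0 keeps returning the stored value, while reading through channel 1 sees it halve at each step, which is the multi-timescale behavior the abstract describes.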

1
SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking

Mobile GUI agents powered by large language models have progressed rapidly, creating urgent needs for realistic and comprehensive evaluation. Existing benchmarks prioritize reproducibility but are often limited to open-source apps or file-operation tasks because of the difficulty of constructing rewards on real applications, leaving a gap between benchmark settings and real-world usage. Moreover, most benchmarks focus on basic grounding and navigation, with limited coverage of complex, long-horizon interactions. To address these limitations, we introduce SimuWoB, a fully synthetic benchmark for mobile GUI agents with 120 challenging tasks spanning diverse types and difficulty levels. We build a robust virtual environment generation framework that synthesizes high-fidelity tasks and environments, and automatically provides valid rewards for each task. Each environment is deployed as a backend-free webpage accessible via URL, enabling efficient and reproducible evaluation. We conduct comprehensive experiments on several state-of-the-art mobile GUI agents. The average success rate is only 27.92%, dropping to 17.82% on long-horizon tasks, which reveals substantial weaknesses in current agents under complex scenarios. A comparison of evaluation results with real-world sample tasks demonstrates that agent assessments based on our synthetic environment generalize well. We further provide diagnostic insights across key capability dimensions and discuss implications for future mobile GUI agent development.

1
SemBridge: Language Transfer in Sparse Encoders via Multilingual Semantic Bridges

Sparse encoders offer high-precision retrieval by representing term importance within a vocabulary space, yet their English-centric structures pose a critical impediment to language transfer for non-English languages. To overcome this structural limitation, we propose SemBridge, a novel embedding initialization method designed for cross-lingual adaptation in sparse encoders by leveraging multilingual bridge models. SemBridge establishes semantic alignments between source and target vocabularies using multilingual dense embeddings as a bridge. Rather than directly relying on all source tokens, SemBridge selects a small set of semantically related source-language tokens and uses them to initialize each target-language token, effectively filtering out semantic noise and reconstructing target tokens as precise linear combinations of core synonyms. This accelerates convergence during fine-tuning and improves training efficiency. Extensive experiments across five languages and four sparse architectures demonstrate that SemBridge achieves superior zero-shot retrieval performance and consistently improves retrieval performance after fine-tuning compared to existing baselines. These results validate SemBridge as a practical solution for deploying high-performance sparse retrieval systems in diverse linguistic environments.
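
The initialization idea, selecting a few semantically close source tokens and combining their embeddings, might look roughly like the sketch below. All names, the toy vectors, the top-k size, and the softmax weighting with a temperature are assumptions for illustration; the abstract does not specify how the combination weights are computed.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def init_target_embedding(target_bridge_vec, source_bridge_vecs,
                          source_embeddings, k=2, temperature=0.1):
    """Initialize one target-token embedding from its k nearest source tokens.

    Similarity is measured in the multilingual bridge space; the output is a
    convex combination of the selected source tokens' encoder embeddings.
    """
    scored = [(cosine(target_bridge_vec, vec), token)
              for token, vec in source_bridge_vecs.items()]
    top = sorted(scored, reverse=True)[:k]  # drop unrelated tokens
    weights = [math.exp(sim / temperature) for sim, _ in top]
    total = sum(weights)
    dim = len(next(iter(source_embeddings.values())))
    embedding = [0.0] * dim
    for weight, (_, token) in zip(weights, top):
        for d in range(dim):
            embedding[d] += (weight / total) * source_embeddings[token][d]
    return embedding

# Toy data: two related source tokens and one unrelated token.
bridge = {"dog": [1.0, 0.0], "puppy": [1.0, 0.0], "car": [0.0, 1.0]}
encoder = {"dog": [2.0, 0.0, 0.0], "puppy": [0.0, 2.0, 0.0], "car": [0.0, 0.0, 2.0]}
new_embedding = init_target_embedding([1.0, 0.0], bridge, encoder, k=2)
```

Restricting the combination to the top-k neighbours is what filters out unrelated source tokens such as `car` in this toy example.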

1
Seeing the Needle in the Haystack: Towards Weakly-Supervised Log Instance Anomaly Localization via Counterfactual Perturbation

Log anomaly detection is a critical task for system operations and security assurance. However, in networked systems at scale, log data are generated at massive scale while instance-level annotations are prohibitively expensive, posing great difficulties to fine-grained anomaly localization. To address this challenge, we propose LogMILP (Log anomaly localization based on Multi-Instance Learning enhanced by prototypes and Perturbation), a weakly supervised framework that enables both bag-level anomaly detection and instance-level anomaly localization using only bag-level labels. Our method guides the model to pinpoint the critical log entries using prototype-guided structural modeling with counterfactual perturbation consistency regularization, thereby improving localization reliability and interpretability under coarse-grained supervision. Experimental results on three public datasets demonstrate that LogMILP achieves competitive detection performance while yielding significantly more reliable instance-level localization. Our code is open-sourced at https://github.com/YUK1207/LogMILP.

0
Decoding the Critique Mechanism in Large Reasoning Models

Large Reasoning Models (LRMs) exhibit backtracking and self-verification mechanisms that enable them to revise intermediate steps and reach correct solutions, yielding strong performance on complex logical benchmarks. We hypothesize that such behaviors are beneficial only when the model has sufficiently strong "critique" ability to detect its own mistakes. This work systematically investigates how current LRMs recover from errors by inserting arithmetic mistakes in their intermediate reasoning steps. Notably, we discover a peculiar yet important phenomenon: despite the error propagating throughout the entire chain-of-thought (CoT) without any verbalized correction, the model still reaches the correct final answer after the thinking process finishes. This recovery implies the existence of an internal mechanism helping the model to detect errors and trigger self-correction, which we refer to as the hidden critique ability. Building on feature space analysis, we identify a highly interpretable critique vector representing this behavior. Extensive experiments across multiple model scales and families demonstrate that steering latent representations with this vector improves the model's error detection capability and enhances the performance of test-time scaling at no extra training cost. Our findings provide a valuable understanding of LRMs' critique behavior, suggesting a promising direction to control and improve their self-verification mechanism. Our code is available at: https://github.com/mail-research/lrm-critique-vectors.

0
Pixel-Level Pavement Distress Assessment Using Instance Segmentation

Automated pavement distress assessment requires more than image-level classification or coarse bounding box detection, demanding precise localization of thin, branching, and irregular cracks to achieve the geometric precision necessary for maintenance-relevant quantification. This paper presents a vision-based pavement distress analysis system based on Mask R-CNN instance segmentation and evaluates it on UWGB-StreetCrack, a custom field-collected roadway image dataset acquired with a vehicle-mounted smartphone and manually annotated with polygon labels for longitudinal cracks, transverse cracks, alligator cracks, and potholes. Five Detectron2-based Mask R-CNN backbone variants were considered under a consistent fine-tuning protocol. The best-performing model, Mask R-CNN with a ResNet-101 FPN backbone, achieved 84.23% precision, 90.04% recall, and an F1 score of 87.04% under the project-specific bounding-box matching protocol. The same model produced an aggregate predicted crack-area fraction of 2.164%, closely matching the 2.170% ground-truth crack-area fraction. To contextualize the segmentation system against a detector-oriented alternative, a CSPDarknet53-based YOLO detector was also adapted and retrained on the dataset, reaching 27.5% precision and 20.7% recall on the validation protocol. The results show that instance segmentation is a practical direction for field pavement imagery and aggregate crack-area estimation, while also exposing open challenges in annotation consistency, class imbalance, confounder rejection, and mask-level benchmarking.
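
As a quick consistency check on the reported figures, the F1 score can be recomputed from precision and recall as their harmonic mean, and the gap between predicted and ground-truth crack-area fractions can be expressed as a relative error. The helper names are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)

def relative_error(predicted, reference):
    """Absolute difference as a fraction of the reference value."""
    return abs(predicted - reference) / reference

# Reported Mask R-CNN (ResNet-101 FPN) precision and recall, in percent.
f1 = f1_score(84.23, 90.04)

# Reported aggregate crack-area fractions, in percent.
area_error = relative_error(2.164, 2.170)
```

Rounded to two decimals, `f1` agrees with the reported 87.04%, and `area_error` comes out below one percent of the ground-truth area fraction.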

ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison

Long-form image captioning exposes a reward granularity problem in RL: captions are judged as whole sequences, while the important errors occur at the level of individual visual claims. A good dense caption should be both faithful and informative, avoiding hallucination without omitting salient details. Yet pairwise preferences, reference-based metrics, and holistic scalar rewards compress these local errors into a single sequence-level signal, obscuring the tradeoff between factuality and coverage. We introduce ClaimDiff-RL, a framework that uses reference-conditioned atomic claim differences as the reward unit for caption RL. Given an image, an actor caption, and a reference caption, a multimodal judge enumerates visually grounded differences, verifies each difference against the image, assigns open-vocabulary error types and severity levels, and produces per-difference statistics for reward composition. This makes hallucinated claims and omitted salient facts separately measurable and tunable. Experiments show that holistic scalar rewards can reduce hallucination by increasing missing facts, while ClaimDiff-RL exposes this faithfulness-coverage tradeoff and enables more balanced operating points. On a 160-image human-labeled diagnostic benchmark, public captioning benchmarks, and VQA benchmarks, ClaimDiff-RL improves the balance between hallucinations and missing facts, preserves general capability, and even surpasses Gemini-3-Pro-Preview on several fine-grained capability dimensions such as object counting, spatial relations, and scene recognition. These results suggest that typed, verifiable claim differences are an effective reward unit for fine-grained and diagnosable caption RL.
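
To make the idea of composing a reward from per-difference statistics concrete, here is a minimal sketch. Everything in it is hypothetical: the `ClaimDiff` record, the two error kinds, the 1-to-3 severity scale, and the linear weighting are assumptions for illustration, not the framework's actual reward definition (the abstract describes open-vocabulary error types).

```python
from dataclasses import dataclass

@dataclass
class ClaimDiff:
    kind: str       # hypothetical: "hallucination" or "missing"
    severity: int   # hypothetical scale: 1 (minor) to 3 (severe)
    verified: bool  # True if the judge confirmed the difference on the image

def compose_reward(diffs, w_halluc=1.0, w_missing=1.0):
    """Negative weighted sum of verified severities, split by error kind."""
    halluc = sum(d.severity for d in diffs
                 if d.verified and d.kind == "hallucination")
    missing = sum(d.severity for d in diffs
                  if d.verified and d.kind == "missing")
    return -(w_halluc * halluc + w_missing * missing)

diffs = [
    ClaimDiff("hallucination", 2, True),
    ClaimDiff("missing", 1, True),
    ClaimDiff("missing", 3, False),  # unverified, so it is ignored
]
balanced = compose_reward(diffs)
strict_on_halluc = compose_reward(diffs, w_halluc=2.0, w_missing=0.5)
print(balanced, strict_on_halluc)
```

Keeping the two weights separate is what lets hallucination and omission be traded off explicitly instead of being folded into one scalar.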

05

PRODUCT HUNT

Product Hunt - May 26, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

Brew

Like Claude design for email marketing

DNSimple CLI

Manage Your DNS from the Command Line with DNSimple CLI

Hayley

Share your thoughts & gain insights in thinking patterns

DodoForm

Turn talking, pics, or scribbles into clean, structured data

AVTR-1 Real-Time Open Weights Model

Generating uncanny AI avatars is now open source

Ferrari Luce

The first electric Ferrari designed by LoveFrom

QuakPit

Meeting reminders that actually make you smile.

blokdots 3.0

Prototype hardware visually, export real C++ for engineering

Willow Scribe

Tell Scribe what to say. It writes the rest.

crunr

Launch and run any compute job on AWS with 1 command

Ajar

Lid Angle Sync & Keep Awake for AI Agents on Mac

ReplylessAI Sequences

Outbound email sequences without the sales-tool bloat

Parrot Speech-to-text API

Fast, accurate STT for production-grade voice agents

LikePulse

See exactly where YouTube audiences react — instantly

Tesserac

A spatial alternative to Cmd+Tab for macOS

Trace

No-frills offline meeting transcripts with context

AI Shadowing

Turn any YouTube video into a language shadowing lesson

Bond

Outbound campaigns powered by real buying signals

Rezonant

Talk, spec, ship: get your product ideas into production

marpy.io

AI coding platform built specifically for the Python stack

Kept

Your AI chats, saved as Markdown locally with no cloud

SelectPrism

Agents that screen and interview so you can hire faster

Parsewise API

API for agentic multi-document processing

LangPanda

Learn languages from watching your favorite shows

MiniCPM5-1B

A new SOTA for compact open models on the edge

NoteCove

Notes, tasks, and AI — offline-first, no SaaS bill.

Ormedo

Let AI agents handle your entire outbound pipeline

The Incident Challenge

Production Debugging Games for Software Engineers

Yansu

AI that learns how you work and turns it into software

MashuPack

Turn codebases into a clean file for Claude and ChatGPT

Fred

AI-orchestrated UX research with behavioural tracking

tldx

Fast CLI to bulk-check domains via RDAP & MCP

Forum

Dedicated space for Facebook groups

Pi Coding Agent

The coding-agent harness you can make your own

Orchestria

AI music engine with granular stem control

LLMTest

Use the right LLMs in your apps. Setup fallbacks. Be happy.

Tiny CV

Resume builder that fits on one page

own.page

Make your own personal website with bento tiles

Databerry

Track all your business data in a single dashboard

tweet.md

X posts as clean Markdown

Unabyss

MCP-native self-updating context layer for your AI

Rixx

The Perplexity alternative that organizes your research

Supaboard 3.0

AI data analysts that understand your business

Stitch 3.0 by Google

Generate and iterate UI screens with AI on a live canvas

Runway Agent

Generate edited, sound-designed videos via chat

Freu AI

Automate any Mac app with $0 recurring run cost

ModelHub

The missing menu bar app for local LLMs on Mac.

WhatCable

Know what your USB-C cable can really do

Edgee Fallback Models

Claude Code that never stops

DynamicNotch

Dynamic island for macOS

06

TECHMEME

Techmeme - May 26, 2026

Techmeme Digest: Major tech headlines and industry conversations.

How AI startups like Altur are using chatbots to help automate debt collection; YC incubated six debt collection and settlement startups in the past six years (Kate Knibbs/Wired)
Source: Techmeme · Published: May 26, 2026

Kate Knibbs / Wired: There's a mad dash to automate the world's most hated calls. Have an unpaid bill? You'll hear from an AI debt collector sometime soon.

Dropbox founder Drew Houston is stepping down as CEO after 19 years to become executive chairman, replaced by Ashraf Alkarmi, who is SVP and GM of Dropbox Core (Jonathan Vanian/CNBC)
Source: Techmeme · Published: May 26, 2026

Jonathan Vanian / CNBC: Drew Houston founded Dropbox nearly two decades ago at age 24, eventually becoming a household name in Silicon Valley …

Google Fitbit Air review: slim, comfortable, and stylish, robust tracking, seven-day battery life, and cheaper than Whoop, but can only be worn on the wrist (Max Buondonno/The Shortcut)
Source: Techmeme · Published: May 26, 2026

Max Buondonno / The Shortcut: Rating: 4/5. Pros: slim design that fades into the background; comfortable and stylish band options

The Dutch government blocks the acquisition of authentication IT supplier Solvinity by US-based Kyndryl, citing "a possible risk to the public interest" (Pieter Haeck/Politico)
Source: Techmeme · Published: May 26, 2026

Pieter Haeck / Politico (Brussels): The Dutch government is blocking a United States-based company's attempts to acquire a key online identification IT supplier.

Spain says it is blocking Polymarket and Kalshi as a precautionary measure while it probes possible gambling law violations over the next three to four months (Mauro Orru/Wall Street Journal)
Source: Techmeme · Published: May 26, 2026

Mauro Orru / Wall Street Journal: The government said companies seeking to provide clients with gambling services need a license

Spotify launches a library of over 650 narrated long-form magazine articles in English for Premium users; free users can buy articles "individually for $1.99" (Jess Weatherbed/The Verge)
Source: Techmeme · Published: May 26, 2026

Jess Weatherbed / The Verge: More than 650 long-form articles are available starting today as part of Spotify's audiobook library.

OpenRouter raised $113M led by CapitalG, a source says at a $1.3B valuation, and now processes 25T tokens across 400+ models weekly, up from 5T six months ago (Michael J. de la Merced/New York Times)
Source: Techmeme · Published: May 26, 2026

Michael J. de la Merced / New York Times: An investment arm of Alphabet is backing OpenRouter, which helps companies choose among hundreds of models for different software tasks.
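
The headline figures (25T tokens per week now, 5T six months ago) imply a fivefold increase. As a rough illustration, the sketch below converts that into an average monthly growth rate, assuming smooth compounding over exactly six months. That assumption is ours; the article gives only the two endpoints.

```python
# Implied average monthly growth from the two weekly token volumes.
tokens_now = 25e12              # 25T tokens per week, from the headline
tokens_six_months_ago = 5e12    # 5T tokens per week, from the headline
months = 6                      # assumed to be exactly six months

growth_multiple = tokens_now / tokens_six_months_ago
# Assumption: steady compounding month over month.
monthly_rate = growth_multiple ** (1 / months) - 1

print(f"Growth multiple: {growth_multiple:.1f}x")
print(f"Implied monthly growth: {monthly_rate:.1%}")
```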

Atlanta-based e-commerce logistics company Stord raised a $250M Series F led by Strike at a $3B valuation, up from $1.5B after a $200M Series E in May 2025 (Julie Bort/TechCrunch)
Source: Techmeme · Published: May 26, 2026

Julie Bort / TechCrunch: E-commerce logistics company Stord has raised a $250 million round at a $3 billion valuation, it announced Tuesday.

SEC filing: Quantinuum is seeking to raise $1.05B in its US IPO, marketing ~21M shares for $45 to $50 each, giving it a $12.7B valuation at the top of the range (Carmen Reinicke/Bloomberg)
Source: Techmeme · Published: May 26, 2026

Carmen Reinicke / Bloomberg: Quantinuum Inc., a quantum computing company backed by Honeywell International Inc., is seeking to raise $1.05 billion …
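
The offering figures in the headline are internally consistent, which a few lines of arithmetic can confirm. The sketch below treats the share count as exactly 21 million (the headline says roughly 21M) and assumes the valuation is simply price times shares outstanding; both are simplifying assumptions.

```python
# Check the IPO arithmetic from the headline figures.
shares_offered = 21_000_000          # "~21M shares", treated as exact here
price_low, price_high = 45, 50       # marketed price range in dollars
valuation_at_top = 12_700_000_000    # $12.7B at the top of the range

proceeds_low = shares_offered * price_low
proceeds_high = shares_offered * price_high

# Assumption: valuation = price x shares outstanding.
implied_shares_outstanding = valuation_at_top / price_high
offered_fraction = shares_offered / implied_shares_outstanding

print(f"Proceeds range: ${proceeds_low:,} to ${proceeds_high:,}")
print(f"Implied shares outstanding: {implied_shares_outstanding:,.0f}")
print(f"Share of company offered: {offered_fraction:.2%}")
```

At the top of the range the proceeds match the $1.05B target in the filing.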

Sources: SpaceX successfully pressured the Pentagon to raise Starlink fees for LUCAS kamikaze drones amid increasing tensions over Starlink's pricing (David Jeans/Reuters)
Source: Techmeme · Published: May 26, 2026

David Jeans / Reuters: As U.S. kamikaze drones guided by Elon Musk's Starlink network began to make visible gains in the war against Iran, senior SpaceX officials reached …

US law enforcement documents: the DHS, FBI, and other agencies introduce a novel domestic threat category termed "anti-tech violent extremism" amid the AI boom (Daniel Boguslaw/Wired)
Source: Techmeme · Published: May 26, 2026

Daniel Boguslaw / Wired: As Americans stew over the looming risk of job-stealing AI and data centers in their back yards …

Pony AI reports Q1 revenue up 145% YoY to ~$34.3M, above $21.7M est., and increases its 2026 robotaxi fleet target by 500 to 3,500 vehicles on fast growth (Linda Lew/Bloomberg)
Source: Techmeme · Published: May 26, 2026

Linda Lew / Bloomberg: Pony AI Inc. raised its robotaxi fleet target for this year by 500 vehicles to 3,500 after reporting stronger-than-expected first-quarter revenue.

Spotify co-CEO Alex Norström defends the company's expansion into AI-generated music, arguing that "controlled" products are superior to unregulated AI "slop" (Financial Times)
Source: Techmeme · Published: May 26, 2026

Financial Times: Streaming app strikes deal with Universal allowing subscribers to create ‘controlled’ covers and remixes

Xiaomi reports Q1 revenue down 11% YoY to ~$14.6B, its first quarterly decline in three years, and net income down 57% to $695M amid a global memory price jump (Bloomberg)
Source: Techmeme · Published: May 26, 2026

Bloomberg: Xiaomi Corp.'s quarterly profit tanked more than anticipated after sharp increases in memory prices exacted a heavy toll on the Chinese firm's smartphone business.

Sources: ByteDance is offering low-priced stock options linked to growth in its Seed AI division to staff of the unit, a first, to fend off poaching from rivals (Financial Times)
Source: Techmeme · Published: May 26, 2026

Financial Times: TikTok owner issues shares tied to AI business unit as China's tech talent war heats up. ByteDance is offering special stock …

07

STARTUP ARCHIVE

Startup News - May 26, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction.”

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

“Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don’t chase the hot new thing. It’s so hard to catch something that everybody already knows is hot.”

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice.”

Paul Graham on why starting with a “small, intense fire” is the key to startup growth
Source: Startup · Published: Mar 20, 2026

"You have to know who those first users are and how you're going to get them."

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth.”

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?”

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations.”

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over.”

Spencer Rascoff: “I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

“If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive.”

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right.”

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT

Solidot News - May 26, 2026

Solidot Feed: Highlighting essential tech & open-source news.

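UK Academy of Medical Royal Colleges says social media is as bad for young people's health as cigarettes

In a consultation submission to the government, the UK's Academy of Medical Royal Colleges said that social media use threatens young people's health just as smoking does. Doctors seeing young patients should routinely ask about their screen time and social media use. One measure the UK government is considering is a ban on social media for children under 16, similar to Australia's approach. Other possible restrictions include curfews or disabling features such as autoplay and infinite scroll. Child psychiatrist Emily Sehmer believes excessive social media use is far more harmful than smoking, because children can encounter harmful content within seconds.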

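Uber COO says it is getting harder to justify spending on maximizing tokens

Uber executives say spending on AI has not delivered commensurate returns. In an interview last Saturday, Uber COO Andrew Macdonald said it is increasingly hard to justify the money spent on maximizing AI token usage. In an interview last month, Uber CTO Praveen Neppalli Naga told The Information that the company had already used up its 2026 Claude Code budget. Macdonald said that, from conversations with engineering leads, he has learned that higher AI token usage has not translated into a corresponding increase in consumer-facing features. He said the trade-offs involved make the spending increasingly hard to justify.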

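JAXA and partners successfully test a Mach 5 ramjet engine

A team of engineers from JAXA, Waseda University, the University of Tokyo, and Keio University has completed a ground combustion test of a ramjet engine designed for a Mach 5 hypersonic aircraft. A ramjet uses the engine's forward motion to compress incoming air instead of a compressor with rotating blades; it cannot produce thrust at zero airspeed and must first be accelerated to supersonic speed. In the test, an experimental aircraft was mounted in a wind tunnel at JAXA's Kakuda Space Center, simulating conditions at an altitude of about 25 km. At Mach 5, the air around the nose and leading edges exceeds 1,000 °C. To cope with the heat, the engineers designed an advanced thermal protection system that keeps the aircraft's interior close to normal operating temperature, so that the onboard avionics and control electronics keep working. JAXA next plans to attempt a real flight with the experimental vehicle carried on a sounding rocket. Its goal is commercial hypersonic passenger service in the 2040s.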

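BepiColombo scheduled to enter Mercury orbit on November 21

BepiColombo, the joint Mercury exploration mission of Europe's ESA and Japan's JAXA, is named after the Italian mathematician Giuseppe Colombo. The spacecraft launched in October 2018 and was originally due to enter Mercury orbit in December 2025 after six Mercury flybys. A thruster fault ahead of the fourth flyby forced mission planners to revise the schedule. JAXA announced the new date through its social media account: BepiColombo is scheduled to enter Mercury orbit on November 21. The spacecraft has three components: ESA's Mercury Transfer Module and Mercury Planetary Orbiter, and JAXA's Mercury Magnetospheric Orbiter. Separation of the JAXA orbiter is set for December 10. BepiColombo is the third mission to explore Mercury, after Mariner 10 (1973) and Messenger (2004). Mercury is the smallest planet in the solar system and one of the densest. Because of the extreme heat there, ESA's orbiter carries more than 100 kg of thermal insulation.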

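California age verification law to exempt most Linux distributions

An amendment to California's age verification law would exempt most Linux distributions and free and open-source software. The law requires operating system providers to ask for the user's age during setup. The amended bill, AB-1856, narrows the scope of the operating system providers and applications it covers: (2) "Operating system provider" does not include a person or entity that distributes an operating system or application under license terms that permit the recipient to copy, redistribute, and modify the software. (2) "Application" does not include a software component that is not itself offered to consumers as a standalone executable application through a regulated application store. Valve's SteamOS platform is still affected, because its Steam client is a regulated application store.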

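Central Asia saw record glacier loss in 2025

Central Asia's glaciers are a vital water source for millions of people living downstream. A new study finds that the region experienced record glacier loss in 2025. Accelerated melting may increase meltwater in the short term, but meltwater will eventually decline as ice volume shrinks. Using field observations from 16 glaciers in the Tien Shan and the Pamirs combined with modeling, the researchers estimate that Central Asia's glaciers lost about 30 cubic kilometers of ice in one year, nearly 2% of the region's total glacier volume. The loss resulted from unusually warm spring and summer temperatures and a sharp drop in snowfall frequency. Nine of the 16 glaciers recorded their worst mass loss on record. Melting was most severe in the western Pamirs and the western Tien Shan, where some glaciers lost 2% to 4% of their total ice in a single year. 64% of glaciers experienced their worst mass loss since 1991. The researchers warn that this could become the norm as the planet warms.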

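Motorola phones hijack the Amazon app to inject an affiliate marketing code

Users report on social media that Smart Feed, an app preinstalled on Motorola phones, began hijacking the Amazon app after an update and injecting an affiliate marketing code to earn commissions. Oddly, the code, sramz-kff-008-20, points to a fashion blogger, "@kirasfashionfinds", so the commission goes to the blogger rather than to Smart Feed. It is not yet clear what happened. Affected users can stop the code from being applied by disabling Smart Feed: Settings > Apps > search for "Smart Feed" > Disable.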

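Pope urges that AI not be used for evil

Pope Leo XIV has issued his first encyclical, calling on the world not to use artificial intelligence for evil and never to treat it as "a tool of domination, exclusion, or death". The Church has long supported nuclear disarmament, calling it "a service to the peace and dignity of the human family". Likewise, artificial intelligence must not be used for evil today; "like nuclear energy, it must be used to serve all people and the common good". "Decisions about technology are absolutely inseparable from conscience and responsibility." "Peace is not merely the absence of war, but the flourishing of justice. Yet when technology weakens our critical awareness, peace itself is put in danger. In any case, disarmament alone is not enough; we must also build."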

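European law enforcement hacks VPN service to identify ransomware group users

Europol disclosed that it hacked "First VPN", a VPN service used by cybercriminals, accessed its user database, and identified thousands of users. First VPN's website now shows a law enforcement seizure notice. The service had advertised on Russian-language cybercrime forums, claiming to hide users' IP addresses, encrypt all communications, and keep no logs. It also claimed that it would refuse to cooperate with judicial authorities, that its service fell under no jurisdiction, and that it stored no user data. First VPN began operating in 2014 and offered 32 exit node servers across 27 countries and regions. At least 25 ransomware groups used its infrastructure for network reconnaissance and intrusion. Police searched the service administrator's home in Ukraine and took down 33 servers.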

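HBM accounts for roughly two-thirds of AI chip component costs

An analysis of AI chips from Nvidia, AMD, Google, and Amazon shows that HBM memory accounts for roughly two-thirds (63%) of AI chip component costs, with logic chips at 13%, advanced packaging at 15%, and auxiliary components at 9%. The four companies' spending on HBM rose from about $12 billion in 2024 to $32 billion in 2025, far outpacing other chip components. With memory supply still tight and prices rising, HBM's share is likely to grow further in 2026. Hyperscale data center operators have already factored this into their capital expenditure forecasts: about $25 billion of Microsoft's $190 billion capex forecast for fiscal 2026 comes from higher component prices, and Meta raised its 2026 capex forecast by $10 billion, also citing higher component prices.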
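
The cost breakdown and spending figures above can be sanity-checked with simple arithmetic. The sketch below confirms that the four component shares sum to 100% and derives the year-over-year growth in HBM spending and the portion of Microsoft's capex forecast attributed to component prices. The derived percentages are computed here and do not appear in the original item.

```python
# Sanity-check the component cost shares and spending figures.
cost_share_pct = {
    "HBM": 63,
    "logic": 13,
    "advanced packaging": 15,
    "auxiliary": 9,
}
total_share = sum(cost_share_pct.values())

hbm_spend_2024 = 12e9   # about $12 billion
hbm_spend_2025 = 32e9   # $32 billion
hbm_growth = (hbm_spend_2025 - hbm_spend_2024) / hbm_spend_2024

# Portion of Microsoft's fiscal 2026 capex forecast tied to component prices.
msft_price_share = 25e9 / 190e9

print(f"Shares sum to {total_share}%")
print(f"HBM spending growth 2024 to 2025: {hbm_growth:.0%}")
print(f"Microsoft capex from higher prices: {msft_price_share:.1%}")
```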

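HP investigates laptop failures caused by BIOS updates

Over the past few months, HP laptop users have reported on forums and elsewhere that their devices developed problems after a BIOS update, including failure to boot, abnormal fan noise, and blue screens of death. One user of the ZBook Ultra G1a mobile workstation said the device hangs during startup after the update. Affected products include the ZBook Ultra G1a, with problematic BIOS versions 01.04.03 and 01.04.05, and the EliteBook X G1a, with problematic BIOS versions 01.03.11 and 01.05.00. HP says it is investigating and advises affected users to contact its technical support team. This is not the first time a faulty BIOS update has caused HP devices to fail.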

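Russia postpones plan to charge mobile VPN users

The Russian government has postponed its plan to charge mobile internet users who use VPNs. Russia's Ministry of Digital Development said in March that it would crack down on VPN use. It initially required mobile operators to start charging users whose international data traffic exceeds 15 GB per month from May 1. Because of difficulties in tracking VPN use and billing for it, the deadline was pushed back to June 1. The plan may be postponed again and might take effect only after the State Duma and regional elections at the end of September, because a fully functional payment system for international traffic would take three to four months to build. In the run-up to the policy, mobile internet in Russia has suffered frequent outages.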

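Political emotions differ from ordinary emotions

According to a study published in PNAS, the bodily responses to political emotions differ from those of the ordinary emotions of everyday life. The researchers asked nearly 1,000 US participants to use a body-mapping tool called emBODY to mark where in the body they felt ordinary and political emotions. The study found that political emotions have distinctive bodily patterns. Political depression, for example, produces broader and more intense sensations across the body, unlike the numbness of ordinary depression. This suggests that political despair motivates people to act rather than leaving them apathetic. Political disgust also differs from ordinary disgust. Pathogen-related disgust, such as the gag reflex, is felt strongly in the stomach and throat, whereas political disgust feels more like anger. This suggests that politics turns disgust into a more moralized, anger-laden emotion, changing how political disgust should be understood. The study also found that people of different ideologies experience political emotions differently: Democratic-leaning participants reported stronger bodily sensations than Republican-leaning participants for negative political emotions such as anger, anxiety, depression, and disgust.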

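Scientists overturn a basic principle of aerodynamics

For decades, a core rule for reducing air resistance has been that surfaces must be smooth. A research team at Japan's Tohoku University is the first to demonstrate that simply applying distributed micro-roughness (DMR) can cut air drag by as much as 43.6%. DMR is an extremely fine, irregular surface roughness that is invisible to the naked eye. Using the 1m-MSBS system, the team precisely measured the drag coefficients of smooth and DMR-coated surfaces, and the DMR-coated surface showed the lower drag coefficient.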

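Scientists solve the mystery of how tobacco synthesizes nicotine

Nicotine is the compound that makes tobacco addictive, and humans have used it for more than ten thousand years. Yet after decades of research, scientists still did not fully understand how the tobacco plant synthesizes the nicotine molecule. According to a study published in Nature Communications, scientists have now solved the puzzle. The team found that nicotine starts out bound to a glucose molecule, which supplies energy to speed the assembly of nicotine's basic building blocks, and that the glucose is removed at the end. First author Benjamin Schwabe also determined the precise structures of two plant enzymes, NaGR and NicG, which help assemble the nicotine molecule from smaller fragments. The findings make it possible to use tobacco plants to produce safer drugs and vaccines.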

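Japanese voice actor sues to make TikTok remove videos that imitate his voice with AI

Popular Japanese voice actor Kenjiro Tsuda has filed suit in the Tokyo District Court, demanding that TikTok's operator delete videos in which someone used generative AI to imitate his voice without permission and published them. It may be the first lawsuit over the unauthorized use of a voice by generative AI. Tsuda's "magnetic low voice" is considered his signature, and he is known for voicing characters such as Kento Nanami in the anime Jujutsu Kaisen and Hyakunosuke Ogata in Golden Kamuy. According to the complaint, the name of the person who posted the videos is unknown. Between July 2024 and September 2025, the account posted 188 videos with narration imitating Tsuda's voice, on topics including urban legends, mysterious incidents, and trivia. Under TikTok's payment scheme, the account earned 500,000 to 750,000 yen per month. The defendant argues that the narration is "an ordinary male voice" with no distinctive way of speaking and does not resemble Tsuda's voice. The account's poster explained that the videos were made by training an AI on a friend's voice and believes this is not illegal.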

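Climate change threatens plant species worldwide

According to a study published in Science, climate change is raising the extinction risk for plant species. The researchers analyzed more than 67,000 species of vascular plants, which are plants with complex internal tissues for transporting water and nutrients; roughly 300,000 to 400,000 vascular plant species are known worldwide. The study found that 7% to 16% of vascular plants could lose more than 90% of their habitat, putting them at very high risk of extinction. A plant's habitat is not just a location on a map but the full set of conditions it needs to survive: temperature, rainfall, soil, land use, and geographic features such as shade. The study shows that climate change is shrinking the combinations of conditions suitable for plants, so that areas where all of a plant's required conditions coexist are becoming scarcer. Plants underpin most terrestrial ecosystems. They store carbon, stabilize soil, provide habitat for wildlife, and supply food, timber, and medicine. Changes in plant diversity have knock-on effects on both nature and people.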

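Firefox adds Web Serial API support and partners with Adafruit

The newly released Firefox 151 adds support for the Web Serial API. The Web Serial API lets websites use JavaScript to read from and write to serial devices, such as those connected over USB or Bluetooth. Mozilla says most people will never use the API; its main audience is developers, who will be able to communicate directly with compatible hardware from the browser. Mozilla also announced a partnership with Adafruit, the well-known open-source hardware platform. Adafruit's browser-based hardware workflows now run directly in Firefox. Taking an Adafruit ESP32-S development board as an example, Web Serial can display messages sent by web page code directly on the device, or let the handheld device change the page's CSS properties directly.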

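Global wind and solar generation exceeded gas generation in April

Global wind and solar generation exceeded natural gas generation in April. According to analysis by the energy think tank Ember, wind and solar supplied 22% of global electricity in April, against 20% for gas. Combined wind and solar generation reached a record 531 TWh, 54 TWh more than the 477 TWh from gas. Five years earlier, in April 2021, gas generated 476 TWh, almost exactly the same as today, while wind and solar together produced only 245 TWh, less than half of today's figure. April is spring in the northern hemisphere, when winds are typically strong, so wind generation tends to rise in April. Ember's Global Electricity Review report concludes that in 2025 wind and solar were enough to meet the growth in global electricity demand.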
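
The generation figures above are easy to cross-check. The sketch below recomputes the gap between wind-plus-solar and gas for April, and the five-year change for each source, using only the TWh values quoted in the item. The growth ratios are derived here and are not stated by Ember.

```python
# Cross-check the April generation figures (all values in TWh).
wind_solar_now = 531
gas_now = 477
wind_solar_2021 = 245
gas_2021 = 476

gap = wind_solar_now - gas_now
wind_solar_growth = wind_solar_now / wind_solar_2021
gas_growth = gas_now / gas_2021
less_than_half = wind_solar_2021 < wind_solar_now / 2

print(f"Wind and solar lead over gas: {gap} TWh")
print(f"Wind and solar vs April 2021: {wind_solar_growth:.2f}x")
print(f"Gas vs April 2021: {gas_growth:.3f}x")
print(f"2021 wind and solar under half of today: {less_than_half}")
```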

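Star Citizen fundraising passes one billion dollars

Star Citizen, 14 years in development and still without a release date, has passed one billion dollars in funding, reaching $1,003,408,183. Development is led by Wing Commander creator Chris Roberts, who aims to revive the space flight sim genre by letting players explore, trade, and fight across a vast universe. The game was successfully crowdfunded on Kickstarter in 2012, with delivery originally planned for 2014. After the Kickstarter campaign ended, the team kept raising money on its official website, much of it through sales of in-game virtual items such as ships of various models. It had raised $200 million by 2018, passed $600 million five years later, passed $700 million in May 2024, and passed $1 billion two years after that. Star Citizen is arguably the AAA game with the largest development budget in history.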

09

APP STORE RANK
