TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

Startup Archive(0)

No items yet for today.

App Store Rankings(0)

No items yet for today.

ISSUE 0884
TUE, JUN 2, 2026
Discover the best information organized by OrangeBot.AI
TODAY · TUE, JUN 2, 2026

The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

NEWChrome extension: save posts from Twitter/X in one click.Install →
01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK
01.00
AI DIGEST

AI新闻摘要

June 2, 2026

Here is a summary of today's key news events.

Mixed Day for Markets as Tech Lifts S&P 500 to Record High

The S&P 500 reached a new record, but the rally was narrow, driven almost entirely by gains in technology and AI-related stocks while most other sectors declined. Commodities like oil, gold, and natural gas, along with the U.S. dollar, fluctuated as investors weighed ongoing inflation concerns against geopolitical uncertainty in the Middle East.

Major Financial Moves and Legal Challenges Shake AI Sector

The AI industry saw significant financial activity as Alphabet announced an $80 billion share sale to fund data center expansion. At the same time, Anthropic, the company behind the AI model Claude, filed to go public. However, the sector also faces new scrutiny, with a lawsuit filed against OpenAI alleging that its ChatGPT product is unsafe.

U.S.-Iran Talks Stall, Increasing Middle East Tensions

Iran has reportedly suspended back-channel negotiations with the U.S., dimming hopes for a quick de-escalation of conflict and the reopening of the crucial Strait of Hormuz shipping lane. The stalled talks and exchange of military strikes have increased regional instability and are causing volatility in global oil markets.

Russia Unleashes Major Air Attack on Ukrainian Cities

Russia launched another large-scale overnight missile and drone attack on Kyiv and other cities across Ukraine. This marks the third major aerial bombardment in the past month, continuing its assault on civilian and infrastructure targets.

Genco Shipping Board Rejects Unsolicited Buyout Offer

The board of Genco Shipping & Trading has unanimously rejected a revised takeover bid from rival Diana Shipping. Genco stated that the offer continues to significantly undervalue the company and its assets.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS

02.00
HACKER NEWS

Hacker News - June 2, 2026

Hacker News Feed: Highlighting key posts and discussions.

Why Janet? (2023)

(ianthehenry.com)

248116
Fooling around with encrypted reasoning blobs

(blog.cryptographyengineering.com)

11328
macOS needs its grid back

(blog.hopefullyuseful.com)

317183
Chipotlai Max

(github.com)

28848
Debug Project

(debug.com)

24797
KDE at 30

(kde.org)

242121
Nvidia RTX Spark

(www.nvidia.com)

402396
03

HUGGINGFACE

03.00
HUGGINGFACE

huggingface.title - June 2, 2026

huggingface.description

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Parameter-efficient fine-tuning (PEFT) is usually treated as a cheaper alternative to full fine-tuning. We study a broader role: small trainable adapters as persistent local state on top of strong shared foundation models. In this framing, the base model provides shared competence while adapters carry instance-specific behavior such as preferences, skills, tool habits, and memory-like updates. We organize the problem around three scaling axes: Scale Up, where stronger shared priors make small local updates more useful; Scale Down, where we study how small adapters can be while remaining reliable; and Scale Out, where many persistent adapted instances coexist. MinT provides one infrastructure example for managing adapter identity, revision, provenance, evaluation, and serving residency. Together, the results suggest that PEFT can be a compact substrate for persistent personal models rather than only a budget substitute for full fine-tuning.

53
A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

As agent capabilities advance, existing benchmarks, such as τ^2-Bench, are becoming increasingly saturated. Yet constructing new benchmark tasks remains complex, costly, and labor-intensive. Moreover, the standard approach, in which scenarios are first written in natural language and then mapped to tool sequences, captures only a narrow subset of the tool-use patterns agents exercise. In this paper, we address these problems by reversing the task construction process. We propose TASTE: Task Synthesis from Tool Sequence Evolution, an automatic method that generates challenging tasks with broader tool-use coverage. TASTE utilizes an Adaptive Contrastive n-gram model trained on LLM-judged validity signals. This enables sampling valid tool sequences that cover a vast range of tool combinations. TASTE then selects representative sequences from the pool via clustering, instantiates them into complete benchmark tasks, and refines them through iterative difficulty evolution. Using TASTE, we construct τ^c-Bench, a challenging extension of the three domains of τ^2-Bench. We evaluate 11 agent/user LLM pairs and find that models nearly saturating τ^2-Bench suffer severe performance drops on our tasks (e.g., Gemini-3-Flash falls from 0.82!-!0.94 to 0.28!-!0.61). Beyond increasing difficulty, our generated tasks more than double the number of unique tool combinations agents must execute. Our results suggest high scores on existing benchmarks often reflect saturation rather than robust task-solving ability. By automating the generation of difficult, high-coverage benchmarks, TASTE enables continuous, scalable evaluation of future agents.

50
K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Frontier model evaluations are shifting from foundational capabilities (e.g., instruction following and reasoning) toward compositional, agentic ones, but Korean agentic benchmarks remain scarce. We introduce K-BrowseComp, a web-browsing agent benchmark grounded in Korean contexts, consisting of 400 problems. The 300-problem K-BrowseComp-Verified subset is manually constructed and validated by native Korean speakers. On this subset, frontier LLMs, including GPT-5.5, DeepSeek-V4-Pro, and GLM-5.1, reach only 30.00--45.67\%, a substantial drop from BrowseComp, while Korean LLMs released through Korea's Proprietary AI Foundation Model program obtain only 0.00--10.33\%. We further construct a 100-problem synthetic split using hard few-shot exemplars and failure-mode-targeted generation to exploit the asymmetry between solving and creating web browsing problems. On the adversarially filtered synthetic diagnostic split, the strongest model reaches only 26.00\%, and we report this split separately as a targeted stress test. We publicly release our data and code.

41
Draft-OPD: On-Policy Distillation for Speculative Draft Models

Speculative decoding accelerates large language model inference by pairing a target model with a lightweight draft model whose proposed tokens are verified in parallel. A common way to build draft models, like EAGLE3 or DFlash is supervised fine-tuning (SFT) on target-generated trajectories. However, we observe that SFT quickly plateaus: the draft model's acceptance length on test data stops improving. The reason is an offline-to-inference mismatch: In SFT, the drafter learns from fixed target-generated trajectories, whereas during speculative decoding it is evaluated on blocks proposed under its own policy. This motivates on-policy distillation (OPD), where the target model supervises the drafter on draft-induced states. Yet OPD remains difficult for draft models, as they cannot reliably roll out complete sequences independently, whereas target-assisted generation makes the collected sequences follow the target distribution and thus eliminates the on-policy signal. We therefore propose Draft-OPD, which uses target-assisted rollout for stable continuations and replays drafting from the verification-exposed error positions. This allows the drafter to learn from target feedback on both accepted and rejected proposals, focusing training on the draft-induced errors that limit speculative acceptance. Experiments show that Draft-OPD achieves over 5times lossless acceleration for thinking models across diverse tasks, improving over EAGLE-3 and DFlash by 23\% and 13\%.

24
VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

The recent "Reasoning with Video" paradigm utilizes Video Generation Models (VGMs) to generate temporally coherent visual trajectories to complete reasoning tasks. Although state-of-the-art VGMs excel at visual quality, they often struggle to understand and follow task-specific rules, leading to logical failures across diverse reasoning scenarios. Existing efforts try to utilize Vision-Language Models (VLMs) as problem pre-solvers to produce or refine textual guidance for the VGM. However, textual descriptions fail to capture intricate spatiotemporal details, and VGMs often struggle to faithfully execute fine-grained or long-tail instructions even with a valid plan. While VLMs struggle as solvers, they possess strong perception capabilities to evaluate process-constraint satisfaction and final-goal achievement. Leveraging this strength, we introduce a paradigm shift that transitions the role of VLMs to "teachers". Specifically, a VLM teacher extracts task-specific rules to formulate differentiable rewards, guiding a VGM Reasoner via test-time online optimization of a lightweight LoRA module. This strategy enables adaptive test-time optimization and extends the reasoning capabilities beyond the VGM's intrinsic boundaries. Evaluations on symbolic (VBVR-Bench) and general-purpose (RULER-Bench) video reasoning benchmarks show that the proposed method yields a 16.7-point average performance gain, outperforming the VLM-as-Solver paradigm (+0.4 points) and Best-of-N scaling (+2.2 points) by a large margin at comparable test-time cost. These findings reveal that integrating VLMs as test-time teachers offers a promising paradigm for achieving generalizable video reasoning. Project Page: https://VLM-as-Teacher.github.io/

20
SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories

Large language model (LLM) agents increasingly rely on reusable external skills to solve long-horizon interactive tasks. Existing training-free skill adaptation pipelines usually update skills from full trajectories or session-level feedback, which makes failure attribution coarse and often produces unstable or overly broad revisions. We propose SkillAdaptor, a training-free step-level skill adaptation framework with explicit failure attribution, and it can plug into OpenClaw-class agent harnesses. Given a failed trajectory, SkillAdaptor identifies a first actionable fault step, links responsibility to candidate skills, and applies targeted updates under explicit acceptance checks while keeping the backbone frozen. We evaluate on WebShop, PinchBench, and Claw-Eval with Kimi-K2.5, GLM-5, and GPT-5.2. SkillAdaptor improves over no-skill and skill-adaptation baselines on all three suites, with the largest single-metric improvements of +1.5 points on PinchBench Avg Score%, +1.8 on Claw-Eval Avg Score, and +1.7 on WebShop success rate. These results indicate that step-level attribution supports more stable and auditable training-free skill maintenanceThe code will be released at https://github.com/zjunlp/SkillAdaptor..

20
X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

While video streaming understanding has made significant strides, real-world applications, such as live sports broadcasting, autonomous driving, and multi-screen collaboration, inherently demand continuous, multi-stream interactions. However, existing benchmarks are confined to single-stream paradigms, leaving a critical gap in evaluating online, cross-stream reasoning. To bridge this, we introduce X-Stream, the first benchmark dedicated to multi-stream streaming understanding. Comprising 4,220 rigorously curated QA pairs across 932 videos, X-Stream evaluates 11 subtasks across multi-window, multi-view, and multi-device scenarios. Crucially, our dataset is constructed using a novel dual-verification pipeline that prevents over-reliance on a single stream. Furthermore, we pioneer the conceptualization of multi-modal large language models (MLLMs) as naive multiplexers, systematically evaluating their performance through the lens of Signal Multiplexing Theory. Our extensive online inference experiments reveal a stark reality: state-of-the-art MLLMs struggle significantly with concurrent streams, achieving only about 50% score and exhibiting poor proactive ability. Ultimately, X-Stream exposes the trade-off of current multiplexing schemes, providing both a practical evaluation protocol and empirical guidance for next-generation multi-stream agents.

20
VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

Long-rollout causal video diffusion has converged on a fixed-size sliding-window KV cache, with recent progress innovating within this layout by changing which tokens occupy the window or how their positions are encoded. The per-head KV layout itself, a dominant contributor to streaming memory and latency, has been mostly left unchanged. In this paper, we present the first study of Multi-Head Latent Attention (MLA) in video diffusion. VideoMLA replaces per-head keys and values with a shared low-rank content latent and a shared decoupled 3D-RoPE positional key, reducing per-token KV memory by 92.7% at every cached layer. We further investigate why MLA succeeds in video diffusion even though the spectral assumption often used to motivate it in language models does not hold: pretrained video attention is not low-rank, with 99%-energy effective rank far above any practical latent dimension. VideoMLA retains quality at compression ratios where direct spectral approximation would predict large reconstruction error. We show that the MLA bottleneck, rather than the pretrained spectrum, determines the effective rank: both spectral and random initialization occupy nearly the full rank budget from initialization, and training preserves this budget while adapting within it. On VBench, VideoMLA matches short-horizon streaming video diffusion baselines, achieves the best overall score at long horizons among evaluated methods, and improves throughput by 1.23x on a single B200.

20
Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

Humans can reproduce the viewpoint specified by a target image through active head and body motion, yet spatial intelligence in foundation models has largely been studied as passive understanding of pre-collected observations. We introduce Target Viewpoint Reproduction (TVR) -- an active task where an agent adjusts its viewpoint in a 3D environment until its observation matches a given target image -- and TVRBench, an indoor-simulation benchmark spanning scene scale and target-view visual richness. TVR is far from solved: on the evaluation split, the strongest open-source and closed-source models reach only 7.8% and 12.0% success. Fine-grained analysis identifies two consistent bottlenecks: off-the-shelf models struggle with multi-turn visual history, and performance drops sharply when viewpoint reproduction requires body translation rather than in-place rotation, exposing a gap in mapping spatial discrepancies to embodied movement. To study reducing this gap, we build a unified TVR post-training framework covering expert-trajectory SFT, rationale-supervised CoT-SFT, offline Single-turn GRPO, and on-policy Multi-turn GRPO from live simulator rollouts. Visual-action SFT supplies the main gain, raising a 9B open-source model to 50.8% success; Multi-turn GRPO provides targeted multi-room refinement and reaches 51.4% overall, while CoT supervision and Single-turn GRPO degrade closed-loop performance. These results establish TVRBench as a testbed for measuring and training foundation models that actively perceive and act in 3D environments. Our code, data, and models are available at https://github.com/aim-uofa/TVRBench.

17
NITP: Next Implicit Token Prediction for LLM Pre-training

Standard next-token prediction (NTP) supervises language models solely through discrete labels in the output logit space. We argue that this sparse one-hot supervision leaves the latent representation space under-constrained, allowing hidden states to drift into degenerate and anisotropic configurations that can limit generalization. To address this issue, we propose Next Implicit Token Prediction (NITP), which augments discrete prediction with dense continuous supervision directly in the representation space. NITP trains the model to predict the implicit semantic content of the next token, using shallow-layer representations from the same model as stable self-supervised targets. We provide theoretical analysis showing that NITP regularizes the optimization landscape by mitigating under-constrained degrees of freedom and encouraging a compact, structured representation geometry. Empirically, across dense and MoE models ranging from 0.5B to 9B parameters, NITP consistently improves downstream performance with negligible computational overhead. On a 9B MoE model, NITP achieves a 5.7% absolute improvement on MMLU-Pro, along with gains of 6.4% on C3 and 4.3% on CommonsenseQA, with approximately 2% additional training FLOPs and no additional inference cost. Our implementation is available at https://github.com/aHapBean/NITP.

17
When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs

Multi-agent LLM workflows route inference through specialized roles to lift end-task accuracy, but jointly training those roles with reinforcement learning is unstable in ways that are poorly understood. We study when end-to-end RL training of multi-agent LLM workflows improves over their base models, comparing Shared-Policy training, where all roles update one policy, with Isolated-Policy training, where each role has its own parameters. Our experimental matrix spans Eval-Opt, Voting, and Orch-Workers workflows, math and code tasks, and three model scales (0.6B, 1.7B, 4B). We find that multi-agent RL usually improves over base models, but gains depend jointly on workflow, task, and scale, not on policy sharing alone. Isolated-Policy tends to reach higher peak accuracy yet more often falls off a terminal accuracy cliff, while Shared-Policy training does not eliminate failure; it redistributes failure into qualitatively different patterns. We then explain the strongest of these patterns through role-level gradient dynamics induced by workflow topology and policy routing: under Isolated-Policy, parallel same-role agents on shared prompts amplify per-role gradients and drive terminal degradation in Voting and Orch-Workers workflows; under Shared-Policy, asymmetric per-step gradient mass causes the shared policy to be captured by the dominant role, producing different failure signatures by task and workflow. Together, the empirical map and its underlying mechanisms show that policy sharing routes training pressure through different channels rather than offering uniform stability, making it a design choice with workflow- and task-conditional tradeoffs.

13
LVSA: Training-Free Sparse Attention for Long Video Diffusion

Dense self-attention is the compute and quality bottleneck of long-video diffusion inference: cost grows quadratically with the sequence length, and beyond the training horizon the model converges to near-static output, that is, "frozen" repetitive video. State of the art approaches are either too costly, e.g., they require retraining, or fail to satisfy both performance and quality objectives in a scalable manner. To this end, we introduce Long Video Sparse Attention (LVSA), a training-free model-agnostic block-sparse attention for video diffusion transformers that combines a structured window pattern with rotating global anchors, thus removing the fixed-grid bias which causes long-range temporal artifacts. LVSA, combined with a FlashInfer kernel, reduces compute up to 3.17x on Wan 2.1 1.3B at a 6x horizon, 2.98x on Wan 2.1 14B at a 6x horizon, and 3.33x on HunyuanVideo 1.5 at a 1.5x horizon, compared to dense attention. Beyond reducing compute, LVSA enables HunyuanVideo 1.5 generation at a 2x horizon, which is otherwise out-of-memory on a single GPU. Moreover, LVSA provides speedups up to 2.41x compared to RIFLEx and 3.27x compared to UltraViCo on Wan 2.1 1.3B. To demonstrate applicability across diverse platforms, we apply LVSA on NPUs and achieve speedups up to 2.71x on Wan 2.2 A14B and 3.24x on Wan 2.1 1.3B compared to dense attention. To evaluate quality in a fair way, we introduce VQeval, a tool properly scoring loopy video failures, which instead are rewarded in state of the art evaluators like VBench-Long. LVSA is quality-neutral for generation at training horizon length and quality-positive at extended lengths.

12
MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation

The Model Context Protocol (MCP) has emerged as a transformative standard for connecting large language models (LLMs) with external data sources and tools, and has been rapidly adopted across personal applications and development platforms. However, existing benchmarks predominantly focus on generic information-seeking tools and fail to capture the practical challenges posed by personal social applications, where tools interact with individual accounts or local databases. To bridge this critical gap, we introduce MCP-Persona, the first benchmark specifically designed for evaluating agent performance on real-world, personalized MCP tools. MCP-Persona encompasses a diverse set of widely-used applications, ranging from social media platforms like Reddit and Xiaohongshu (Rednote) to enterprise collaboration suites such as Lark (Feishu) and Slack. Our extensive experiments on various state-of-the-art (SOTA) agents demonstrate their significant struggles with personalized tool use, thereby highlighting the benchmark's crucial role in identifying and addressing these limitations. MCP-Persona is publicly available at https://github.com/wwh0411/MCP-Persona}{https://github.com/wwh0411/MCP-Persona.

12
Joint Agent Memory and Exploration Learning via Novelty Signals

In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle with this. Effective exploration requires memory, but retaining raw interaction histories is computationally expensive over long trajectories. While latent memory offers a solution to compress interaction histories, its training lacks reliable supervisory signals. We introduce Joint Agent Memory and Exploration Learning (JAMEL), a framework that trains agentic memory and exploration policy together through novelty-driven interaction. We observe that memory and exploration form a mutually dependent loop: sustained exploration requires memory to distinguish exhausted behaviors from unseen ones, while novelty-seeking interaction provides the supervision needed to make memory useful for future exploration. By utilizing deterministic and persistent novelty signals such as code coverage in the GUI domain, we provide natural, annotation-free supervision for the memory module. Empirical evaluations demonstrate that \ours successfully generalizes to unseen environments. Its exploration capability outperforms open-weight baselines and rivals the exploration depth of a closed-source model while reducing token consumption. Our code and model are open-sourced at https://github.com/MobileLLM/JAMEL.

11
LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

Autoregressive (AR) video diffusion enables variable-length synthesis, but long-horizon generation often suffers from accumulated errors and identity drift. For efficiency, existing methods commonly adopt sliding-window attention during generation. This creates an irreversible generation trajectory: once the active window accumulates appearance errors, subsequent generations can only condition on this degraded trajectory and drift further away. We address this limitation by formulating long video generation as a retrieval-augmented generation (RAG) problem. Rather than relying solely on the recent window, we treat previously generated latents as a dynamic, searchable history. We propose LongLive-RAG, a general retrieval framework for AR video generation. At each new block, LongLive-RAG uses a query embedding to retrieve relevant historical latents. This lightweight retrieval step adds only a small overhead relative to generation and lets the generator condition on non-local context instead of only the recent window. To make retrieval more discriminative, we introduce the Window Temporal Delta Loss that suppresses redundant local similarity and encourages embeddings to capture meaningful temporal changes. Together, these components help reduce error accumulation caused by sliding-window attention. Experiments across multiple AR backbones and generation lengths show improved long-video quality and the best average VBench-Long rank. To our knowledge, among open-ended AR long video generation methods, LongLive-RAG is the first to formulate self-generated latent history as content-addressable retrieval memory. Code is available at https://github.com/qixinhu11/LongLive-RAG.

9
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

Building capable visual web agents requires long-horizon reasoning, precise grounding, and robust interaction with dynamic real-world websites. Despite rapid progress, the strongest systems remain largely proprietary, while open agents still depend heavily on supervised post-training over large collections of curated web trajectories. This dependence creates a major scalability bottleneck: high-quality demonstrations are expensive to collect, and static datasets offer limited coverage of the diverse, ever-changing open web. Although online RL has shown promise for text-based agents, its potential for training visual web agents directly on live websites remains largely underexplored. In this paper, we introduce OpenWebRL, an open framework for training visual web agents with online multi-turn RL on real websites. OpenWebRL covers the full training pipeline, including scalable live-browser infrastructure, supervised initialization, multimodal context management, trajectory-level success judging, and efficient multi-turn policy optimization. Using this framework, we train OpenWebRL-4B, which establishes a new open-source state of the art on challenging live-web benchmarks. With only 0.4K initialization trajectories and 2.2K open-ended RL training tasks, OpenWebRL-4B achieves 67.0% success on Online-Mind2Web and 64.0% on DeepShop, outperforming prior open agents of similar or larger scale and remaining competitive with proprietary systems including OpenAI CUA and Gemini CUA. Beyond strong benchmark performance, we systematically study the key design choices that make online RL effective for visual web agents, and analyze how RL improves agentic reasoning. Overall, our work offers a practical path toward building more capable, reproducible, and cost-efficient open web agents. We will release our training data, models, and code to support future research.

9
Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism

Long-horizon search agents accumulate large amounts of retrieved content across many tool calls, making context-budget efficiency increasingly important. A minimal intervention is to mask stale observations from the context as the trajectory progresses, but it remains unclear when this form of context management helps and why. We study observation masking through a systematic sweep over various agent backbones (4B to 284B parameters) and three retrievers on offline and live-web agentic search benchmarks. We find that the accuracy gain from masking follows an asymmetric inverted-U shape when plotted against the model's accuracy without context management: a plateau under weak retrievers, a peak when a strong retriever meets a mid-capacity model, and a sharp collapse when the model is saturated. This pattern reflects the interaction between retriever recall and the model's implicit filtering capacity, rather than either factor in isolation. Mechanistically, masking implements a token-for-turn trade-off: it removes observations the model has largely stopped attending to and pages the agent rarely re-opens. The added turns help when they convert failures into successes, but they fail when masking removes evidence the model would otherwise have used. We therefore reframe context management as a regime-dependent intervention and provide a holistic perspective for analyzing context use in agentic deep search. We release our scaffold and trajectories here (https://github.com/i-DeepSearch/observation-masking) to support future research.

9
Skill is Not One-Size-Fits-All: Model-Aware Skill Alignment for LLM Agents

LLM agents increasingly retrieve externally curated skills-procedural instructions retrieved at decision time-to improve performance on long-horizon interactive tasks. Existing skill libraries are typically treated as model-agnostic, reusing the same skill formulations across backbones with substantially different capacities and behaviors. However, our controlled experiments across multiple model scales show that skill effectiveness is strongly model-dependent: a skill that benefits one backbone can harm another. Motivated by this observation, we propose MASA Model-Aware Skill Alignment, a framework that adapts skills to each target backbone without modifying agent weights. MASA operates in two stages: (1) a hierarchical skill evolution pipeline that iteratively rewrites general and task-specific skills using hill climbing and UCB-driven tree search, guided by environment feedback and model capability profiles; and (2) a lightweight model-conditioned skill rewriter trained on evolution trajectories to reproduce the adaptation in a single forward pass. Experiments across three interactive environments and four backbones show that MASA consistently achieves the best overall performance, with gains of up to 25.8 points over the strongest baseline. The learned rewriter further generalizes to unseen tasks and environments without additional search, consistently outperforming a much larger teacher LLM at a fraction of the inference cost.

9
Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding

Speculative decoding accelerates LLM inference by drafting multiple tokens and verifying them in parallel with the target model. However, its practical speedup is constrained by the trade-off between draft quality and drafting cost: autoregressive drafters model causal dependencies among draft tokens but incur sequential overhead, while parallel drafters reduce drafting cost but weaken intra-block dependency modeling. In this paper, we propose Domino, a speculative decoding framework that decouples causal dependency modeling from expensive autoregressive draft execution. Domino first uses a parallel draft backbone to produce preliminary draft distributions for the entire block, and then applies a lightweight Domino head to refine them with prefix-dependent causal information. To stabilize teacher-forced causal encoding, we further introduce a base-anchored training curriculum that first strengthens the parallel backbone and then gradually shifts optimization toward the causally corrected final distribution. Experiments on Qwen3 models show that Domino achieves up to \(5.49\times\) end-to-end speedup under the Transformers backend and up to \(5.8\times\) throughput speedup under SGLang serving.

8
MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft

Multimodal large language models (MLLMs) have shown strong capabilities in perception, reasoning, and action generation. However, their ability to sustain exploration in dynamic open worlds remains unclear. Existing embodied and game-based benchmarks often compress interaction into short-horizon tasks or entangle success with domain-specific game mechanics. In this paper, we introduce MineExplorer benchmark for evaluating open-world exploration capabilities of MLLM agents in Minecraft. We first filter atomic tasks whose solutions rely heavily on Minecraft-specific knowledge to better reflect general open-world reasoning. Then we organize the benchmark around a ReAct-style capability formulation and compose atomic tasks into implicit multi-hop tasks. To further construct reliable instances, MineExplorer uses a multi-agent synthesis workflow that jointly designs task graphs, sandbox scenes, and rule-based milestone evaluators. Human evaluation shows that the multi-agent synthesis workflow produces significantly more reliable instances than a single-agent baseline. Experiments with advanced MLLM agents show that open-world exploration remains challenging, as strong models can handle many single-hop tasks but degrade sharply when hidden prerequisites must be coordinated over longer trajectories. Further analysis finds that task difficulty tracks agent completion, and larger models or thinking modes do not consistently translate into better performance. Code and dataset are available at https://github.com/Jometeorie/MineExplorer.

6
RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes

Vision-Language Models (VLMs) have shown strong visual understanding and are increasingly deployed in embodied AI systems, where reliable perception under real conditions is essential. However, existing benchmarks assess VLMs using clean images or isolated perturbations rather than stresses caused by physical scene formation. This design has two limitations: it covers only a narrow subset of everyday visual stresses, and some perturbations rarely appear in realistic embodied scenes. This gap raises a fundamental question: how can we define visual stress in a principled way that captures the diverse factors encountered in physical environments? To address this question, we formulate visual perception from an inverse graphics perspective and introduce RoboStressBench, a benchmark for evaluating VLM robustness to physical visual stress in embodied scenes. Inspired by the physical rendering equation, RoboStressBench decomposes visual stress into four physically grounded dimensions: Material (M), Viewpoint (V), Lighting (L), and Geometry (G). This design enables RoboStressBench to cover a broad range of visual stresses in real-world environments, while allowing controlled analysis of their effects on VLM capabilities such as visual recognition, reasoning, and planning. Through comprehensive evaluations of state-of-the-art VLMs, we identify stress-specific failure modes and reveal that different physical factors degrade different embodied capabilities, which are often obscured by aggregate accuracy. We further introduce a stress-aware agentic solver that detects visual stressors and invokes visual-editing skills before reasoning, improving robustness in high-stress scenarios. Overall, RoboStressBench provides a principled evaluation framework for diagnosing and improving VLM perception under real-world physical stress, supporting the development of more reliable embodied AI systems.

6
Agent Skills Should Go Beyond Text: The Case for Visual Skills

Reusable skills are a key mechanism for extending agent capabilities, allowing agents to accumulate experience and solve increasingly complex tasks. Yet most existing skill-learning methods store reusable experience as text-only assets, such as instructions, reasoning traces, or summarized trajectories. We argue that this text-only paradigm creates a fundamental bottleneck for visual-centric tasks, where reusable knowledge often depends on spatial layout, visual grounding, fine-grained appearance, and localized state changes. To address this limitation, we propose \NAME, a multimodal skill paradigm that combines declarative textual logic with explicit visual support. We distinguish three reusable forms: static priors for stable spatial conventions, dynamic priors for in-situ visual working memory, and interleaved visual skills that bind ordered text steps to the source frames, screenshots, or page regions that justify them. Rather than only describing what to do, visual skills also encode where to look, how to inspect, and how to verify visual outcomes. To scale visual-skill construction, we introduce \SYSTEM, an automatic system that converts agent experience into reusable multimodal skills by preserving textual reasoning, spatial references, visual boundaries, and interaction patterns from task trajectories. Experiments on GUI and other visual-centric tasks show that visual skills consistently outperform text-only skills, particularly when success requires spatial correspondence, visual evidence, and state-aware interaction. These results support our central position: reusable agent skills should go beyond text and become multimodal assets for future multimodal agents.

5
SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models

Measuring structured object understanding in vision foundation models remains challenging due to inconsistent evaluation protocols and limited part-level supervision. Semantic correspondence (SC) evaluates this capability by testing whether object parts can be matched across instances and categories under large variations in appearance, viewpoint, and geometry. To enable a systematic SC evaluation, we introduce SOCO, a new benchmark for Semantic Object Correspondence that introduces a taxonomy of correspondence types and provides consistent, functionally meaningful keypoint annotations across 100 categories and over 1M correspondence pairs. In addition, SOCO includes keypoint language descriptions, enabling the evaluation of large vision-language models (LVLMs) and their fine-grained part-level understanding. Comprehensive experiments reveal that (i) vision foundation backbones encode strong semantic structure but transfer correspondences poorly across related categories and only partially capture object-part position, (ii) LVLMs are stronger at text-prompted part localization than at visual-reference cross-image matching, exposing a gap between language-grounded localization and fine-grained visual correspondence, and (iii) correspondence performance predicts performance on dense downstream tasks, including segmentation, tracking, 3D pose estimation, and 3D detection, more strongly than ImageNet classification. Together, these findings position SOCO as a benchmark for structured, part-level representation quality in vision and multimodal foundation models.

5
PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding

Large Vision-Language Models (LVLMs) map visual inputs into dense token sequences, imposing a quadratic computational bottleneck for inference. Elastic visual-token compression addresses this by training a single model that can run at multiple visual-token budgets. However, existing approaches struggle under aggressive compression. Spatial-only compression, as in nested pooling, behaves as an imperfect low-pass filter and induces spectral aliasing that obscures fine-grained detail. Query-only compression, as in nested query resampling, replaces explicit grid-aligned tokens with non-local summaries and substantially degrades spatial grounding. To resolve this representational conflict, we introduce PARCEL (Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding), a visual tokenization architecture that dynamically partitions the labor of feature extraction. PARCEL establishes spatial pool tokens as low-frequency layout anchors and conditions elastic query tokens on these anchors through Pool-Conditioned Query Resampling. This encourages query tokens to focus on complementary visual features rather than redundant spatial mapping. Extensive evaluations across 27 benchmarks show that PARCEL improves the performance-efficiency Pareto frontier, consistently outperforming existing matryoshka baselines across visual-token budgets while preserving the "train once, deploy anywhere" paradigm.

5
Multi-Agent Computer Use

Computer use agents (CUAs) today are primarily deployed as single serial agents. This setup is suboptimal for complex long-horizon tasks that benefit from task decomposition, parallel execution, and consistent re-planning based on new information. In this paper, we argue that we should instead move towards evaluating and building multi-agent computer use (MACU) systems. These systems, which emphasize planning and parallel execution, alleviate many of the shortcomings of single-agent CUAs. We propose a general multi-agent setup in which a manager model decomposes computer use tasks as a directed acyclic graph (DAG), encoding relevant dependencies and goals for subagents. At each iteration, the manager dispatches parallel CUA subagents to carry out nodes on the ready frontier of the DAG, and continuously revises the DAG (adding, canceling, or rewriting nodes) as new findings arrive from subagents. This design treats the partially observable environment of computer use as a first class challenge: information that downstream agents may not be able to re-observe are retained and passed forward through the manager and DAG structure. We demonstrate that MACU consistently improves over strong single-agent baselines by 3.4-25.5% on desktop (OSWorld) and web navigation (Online-Mind2Web, WebTailBench, Odysseys) benchmarks, exhibits more favorable test-time scaling, and solves complex long-horizon tasks where single-agent CUAs get stuck. On Odysseys, a long-horizon web navigation benchmark, MACU improves average task completion wall-clock time by {sim} 1.5 times, demonstrating its efficacy in speeding up traditionally slow CUA pipelines. Our findings highlight that multi-agent coordination is a promising axis for scaling computer use agents to work productively for longer and more effectively. We release all code and interactive visualizations at https://jykoh.com/multi-agent-computer-use.

5
RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models

Vision-language-action (VLA) models are built on the premise that semantic understanding from pretrained language or vision-language backbones should guide robot action prediction. Yet robot fine-tuning is optimized as imitation over task-specific action distributions, and many evaluations can be solved through visual or instruction-action shortcuts. We introduce RoboSemanticBench (RSB), an embodied benchmark for diagnosing semantic grounding in action prediction: whether post-trained VLA models can use complex instruction semantics to select and manipulate the correct physical target. In each episode, a robot receives a multiple-choice math or general-knowledge question, observes candidate answer blocks, and must grasp the block corresponding to the correct answer. RSB covers controlled arithmetic, grade-school mathematical understanding, and commonsense or factual understanding under four-choice and ten-choice suites. Across representative VLA models, we find that many policies learn to grasp candidate blocks but select the semantically correct block at near-random or below-random rates after controlling for grasp success, revealing a persistent gap between backbone-level semantic competence and action prediction.

5
Measuring the Depth of LLM Unlearning via Activation Patching

Large language model (LLM) unlearning has emerged as a crucial post-hoc mechanism for privacy protection and AI safety, yet auditing whether target knowledge is truly erased remains challenging. Existing output-level metrics fail to detect when this knowledge remains recoverable from internal representations. Recent white-box studies reveal such residual knowledge but often rely on auxiliary training or dataset-specific adaptations, leaving no generalizable metric. To address these limitations, we propose the Unlearning Depth Score (UDS), a metric that quantifies the mechanistic depth of unlearning via activation patching. UDS first identifies layers that encode the target knowledge using a retain model baseline, then measures how much of it is erased in the unlearned model on a 0-1 scale. In a meta-evaluation across 20 metrics on 150 unlearned models spanning 8 methods, UDS achieves the highest faithfulness and robustness, confirming our causal approach as the most reliable for unlearning evaluation. Case studies further reveal that white-box metrics can disagree at the layer level and that erasure depth varies across examples. We provide guidelines for integrating UDS into existing benchmarking frameworks and streamlining the evaluation pipeline. Code and data are available at https://github.com/gnueaj/unlearning-depth-score

4
FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search

Agentic search requires language model agents to explore many sources and answer complex information-seeking questions. Scaling test-time compute is a promising way to improve these agents, but current approaches can fail, because correct answers are often sparse and score-based selection depends on model calibration. We propose FineVerify, a fine-grained self-verification framework that decomposes each question into checkable sub-questions, verifies sampled candidates against each sub-question, and selects the candidate with the highest aggregated score. This per-check structure turns selection into simpler local judgments and produces scores under the same explicit criteria. Across four agentic search benchmarks and two models, FineVerify consistently outperforms standard scaling baselines. With only four sampled trajectories, it improves GPT-5-mini by 8.2 accuracy points and Gemini-3-flash by 5.6% on average. With 12 samples, FineVerify enables GPT-5-mini to surpass frontier GPT-5 on BrowseComp-Plus. Beyond accuracy, FineVerify produces interpretable verification traces that help audit benchmark errors, suggesting broader applications for inspecting agentic search systems. Code and data are available at https://github.com/XuZhao0/fineverify

4
HakushoBench: A Japanese Chart and Table VQA Benchmark from Governmental White Papers

Understanding chart and table images is essential for applying vision-language models (VLMs) to real-world document understanding. While English benchmarks have advanced rapidly, non-English counterparts remain scarce, leaving it unclear whether this progress generalizes across languages. A key obstacle is the difficulty of collecting realistic and diverse non-English chart and table images at scale. To address this, we leverage governmental white papers as a scalable source for benchmark construction beyond English, as they contain naturally occurring charts and tables across diverse formats and domains and are freely accessible in many countries. As a first instantiation, we introduce HakushoBench, a challenging Japanese chart and table VQA benchmark built from 33 governmental white papers. HakushoBench contains 2,053 images spanning over 10 image types, with manually annotated QA pairs, designed to assess deep and holistic understanding of charts and tables, rather than local visual cues alone. Experiments across a broad range of VLMs demonstrate that HakushoBench remains challenging for open-weight models: the best open-weight model achieves only 58.6% accuracy, and a 34.9-point gap between open-weight and proprietary models highlights substantial room for improvement in complex chart and table understanding. We release our dataset and code.

4
Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems

Physical AI systems increasingly map multimodal observations, language instructions, and learned world representations into physically consequential actions. Robotics foundation models, vision-language-action models, and world-model-based autonomous systems can condition decisions that move vehicles, robots, drones, and industrial machines. This transition exposes a safety problem that is not fully captured by conventional AI content moderation or by classical robot safety alone: a black-box model may issue a physically consequential action while appearing confident, plausible, and semantically aligned. The resulting failure can be silent, arising from sensor drift, occlusion, state-estimation error, distribution shift, hallucinated affordances, or invalid physical assumptions before downstream hardware controllers detect a violation. Across embodied foundation models, world models, robotics simulation, embodied safety benchmarks, safe control, runtime assurance, uncertainty estimation, verification, and guardrail evaluation, model capability and safety mechanisms have advanced along largely separate technical tracks. A recurring gap synthesized here is that no single stream surveyed in this review supplies a complete runtime authorization boundary between black-box Physical AI models and physical execution. The resulting analysis develops a bounded problem formulation, a definition of silent physical-action failure, a taxonomy of runtime guardrail functions, and evaluation requirements for comparing guardrails as Physical AI assurance mechanisms.

3
Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation

Sentence embeddings are a foundational component for semantic search, clustering, classification, and retrieval-augmented generation. This paper presents embeddingmagibu-200m, a Turkish-focused sentence embedding model that produces 768-dimensional L2-normalized vectors and supports an 8,192-token context window, far exceeding the 512-token limit of earlier BERT-based Turkish encoders. Instead of full pretraining, an efficient three-stage adaptation pipeline is introduced: (1) construct a Turkish-optimized multilingual tokenizer with a 131,072 vocabulary by pruning redundant tokens from the teacher's vocabulary and incorporating multilingual tokens via frequency analysis on a 40-language corpus, (2) clone a teacher embedding model while preserving transformer backbone weights and initializing a compatible embedding table for the new vocabulary via mean-composition token mapping, and (3) perform offline embedding distillation from precomputed teacher vectors using a cosine similarity objective over a balanced 40-language Wikipedia corpus. The resulting student model contains approximately 200M parameters and trains in roughly four hours on a single GPU by avoiding online teacher inference during training, at a total cost of 5-20. Empirically, Pearson/Spearman correlations of 77.55%/77.45% are obtained on STSbTR, surpassing the 300M-parameter teacher model (73.84%/72.92%). On TR-MTEB (26 tasks), a mean score of 63.9% is achieved (7th out of 26 models), providing a competitive cost-quality trade-off with 33% fewer parameters than the teacher. To facilitate reproducibility and downstream use, all artifacts are released including model weights, tokenizer files, precomputed embedding datasets, and open-source cloning and distillation tooling.

3
EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers

This paper addresses the challenge of integrating 3D meshes as a native modality within Multimodal Large Language Models (MLLMs). Diffusion-based large reconstruction models decouple semantic understanding from geometric reasoning, operating as stateless reconstructors conditioned on dense 2D pixel priors. Recent MLLM-based methods treat the 3D modality as an external output rather than a native component of the multimodal sequence, making incremental adaptations without a systematic analysis of how geometric manifolds align with MLLM feature spaces. We introduce EVA01, a unified framework that extends the modality boundary of MLLMs to natively incorporate 3D mesh understanding, generation, and context-aware editing. Built upon a Mixture-of-Transformers (MoT) architecture, EVA01 decouples the model into a pre-trained Understanding Expert (E_{und}) and a structurally mirrored Generation Expert (E_{gen}), coupled through shared global self-attention with hard modality routing. This design aligns the semantic latent space of the MLLM backbone with the geometric manifold, enabling direct transfer of multimodal priors without intermediate 2D representations. Results show that EVA01 achieves state-of-the-art native text-to-3D generation fidelity and unlocks robust long-context multi-turn geometric editing with identity preservation, a capability fundamentally inaccessible to stateless reconstruction pipelines. Our findings further offer architectural insights for integrating 2D foundation models with 3D tasks, informing the design of 3D-native multimodal systems. Project Page: https://www.seeles.ai/research/pages/EVA01

3
Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning

Selecting the best response from multiple small-model samples using a stronger scorer is a simple inference-time strategy, but fails when the small model has already committed to incorrect reasoning paths. PRM guided search avoids this by scoring candidate continuations during generation, but requires a reward model trained with step-level labels. We propose Chunk-Level Guided Generation, a training-free alternative that uses an off-the-shelf large language model as a process scorer. At each step, a small model samples k fixed-length candidate chunks, while the larger model scores the candidates using likelihoods without generating any text. The selected chunk is committed before the next step, steering generation before errors can propagate. We instantiate this framework with two selection rules: Likelihood-Guided Selection (LGS), which selects the chunk with the highest length-normalized large-model log-probability, and Contrastive-Guided Selection (CGS), which subtracts the small model's log-probability to favor chunks where the large model's preference diverges from the small model's. We show that scoring variable-length reasoning steps with large-model likelihoods is unreliable due to a systematic length bias that persists even after length normalization, and that fixed-length chunks avoid this confound. On GSM8K, MATH, Minerva Math, AMC23, and AIME24 with Qwen2.5-1.5B guided by Qwen2.5-32B and Llama-3.2-1B guided by Llama-3.1-70B, CGS outperforms majority voting by up to 28 pp and, under matched guidance budgets, matches or outperforms Qwen2.5-Math-PRM-72B guided search on most benchmarks without reward-model training. With Qwen2.5-7B guided by Qwen2.5-72B, CGS reaches 81.8% on MATH and 63.6% on Minerva Math at k=16, surpassing majority voting by 4--6 pp. Finally, Chunk-Level Guided Generation produces substantially shorter reasoning traces than PRM guided search.

3
3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code

Procedural 3D modeling through code is emerging as a versatile paradigm, offering deterministic, engine-ready, and precisely editable assets that neural 3D generators inherently lack. Authoring such procedural content, however, demands deep expertise in 3D software APIs, parametric design, and code-level geometric reasoning. In this paper, we propose 3DCodeBench, a systematic benchmark for evaluating vision-language model (VLM) agents for procedural 3D generation in 3D modeling software. Specifically, 3DCodeBench evaluates how effectively 12 advanced VLMs can serve as procedural 3D modelers by translating text and image references into procedural code for 3D modeling software. Recognizing that automated metrics may not fully capture the perceptual quality of 3D shapes, we build 3DCodeArena, a ranking platform based on pairwise human preferences over generated 3D outputs. From extensive evaluations and results, we observe that: (1) Failures mostly arise from API mismatches, while successful renders still suffer from disconnected or floating 3D geometric components. (2) Test-time scaling, such as higher thinking budgets and multi-turn refinement, improves performance overall. Our findings highlight a critical need for high-quality procedural coding data to advance commercial VLMs. Furthermore, effective procedural 3D modeling requires a robust execution environment that provides high-fidelity feedback for iterative refinement. We release 3DCodeBench, including the curated large-scale dataset of multimodal (text/image) prompts, procedural code, 3D object triplets, evaluation protocol, and the public 3DCodeArena platform as a foundational toolkit for exploring VLM-based procedural 3D modelers.

3
Not only where, But when: Temporal Scheduling for RLVR

Reinforcement learning with verifiable rewards (RLVR) has become a core technique for post-training of Large Language Models (LLMs). While policy optimization is driven by all sampled tokens under a globally broadcast scalar reward, the heterogeneous policy behaviors exhibited along trajectories are largely overlooked without differentiation. Existing works address this by credit allocation, including token-level advantage reweighting, and selective token optimization, however, the allocation criterion are principally stagnant throughout training, limiting resilient policy evolution. In this work, we argue that when learning signals are scheduled can be as important as where they are allocated across tokens, and introduce the temporal dimension that scheduling the credit allocation criteria over the course of RLVR optimization. We find that prioritizing targeted tokens emphasized with specific policy behaviors, and gradually attenuating toward general optimization leads to more stable and efficient learning dynamics. Furthermore, we show that simple trajectory percentiles provide a natural perspective for distinguishing policy behaviors, and works effectively with temporal scheduling. Our analysis reveals that standard optimization substantially sacrifices policy entropy when simultaneously accommodating heterogeneous behaviors, whereas temporal scheduling yields healthier policy evolution dynamics. Experiments across mathematical and general reasoning benchmarks demonstrate consistent improvements, suggesting that temporal scheduling constitutes a promising optimization dimension.

3
Confidence-Adaptive SwiGLU for Mixture-of-Experts

SwiGLU has become a standard gated activation in modern Transformer MLPs, yet its gate sharpness -- the smoothness and selectivity of the gating function -- is typically fixed throughout training. In this work, we propose Confidence-Aware SwiGLU (κ-SwiGLU), a variant of SwiGLU for Mixture-of-Experts (MoE) models that adjusts expert gate sharpness according to token-level routing confidence. Specifically, κ-SwiGLU parameterizes the SiLU gate sharpness coefficient as a learnable function of the router logit, enabling each expert gate unit to interpolate between smooth, broadly active gating and sharp, selective gating. We evaluate κ-SwiGLU on the FineWeb-Edu dataset across MoE Transformer models ranging from 8 to 28 layers. Across these settings, κ-SwiGLU improves mean CORE performance while adding negligible parameters and incurring only a small computational overhead, demonstrating that confidence-aware gate sharpness is a promising mechanism for improving MoE MLPs. The code is available at https://github.com/askerlee/kappa-swiglu.

2
Can Predicted Dynamics Exist in the Physical World?

Predictive Physical AI systems output state rollouts, action chunks, and latent plans, yet a low root-mean-square error (RMSE) does not imply that a particular proposal is physically executable. We formulate physical admissibility as a prediction-control interface: before execution, a decoded proposal is treated as candidate dynamics and evaluated using kinematic, dynamic, and direct-to-composed horizon conditions. Passing is not a certificate of task success; rejection identifies violation of the specified physical envelope and gives a component-level reason. On Hugging Face LeRobot PushT, controlled falsification shows that one-step prediction-RMSE and standardized dynamics residuals reach area under the receiver operating characteristic curve (AUC) 0.982 and 0.972, kinematic-only conditions reach AUC 0.592, and the full gate reaches AUC 0.957 with condition-level attribution. In replay-based intervention experiments, residual-based filters and the full physical-admissibility gate prevent 87-$89% of invalid proposals while preserving mean progress near 0.998.

2
StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement

Video world models (WMs) have shown promise for policy evaluation and improvement by imagining realistic future observations conditioned on ego-robot actions. While WMs can model distributions over futures, policy evaluation and improvement typically rely on nominal imaginations, which can miss high-impact outcomes of robot actions unless prohibitively many samples are drawn. To enable robust policy evaluation and improvement over WM imaginations, we propose StressDream, which steers imaginations toward high-impact yet plausible outcomes specified at inference time by optimizing the initial noise of diffusion-based WMs. However, optimizing high-dimensional noise is challenging: the optimization must reason about nuanced, scene-dependent target events in generated videos while avoiding out-of-distribution (OOD) noise that yields implausible imaginations. We address this with two complementary objectives: a semantic objective with a Vision-Language Model that provides informative gradients by reasoning about the generated video, and a plausibility objective that prevents the optimized noise from drifting OOD. With state-of-the-art video world models for autonomous driving and robotic manipulation, we show that StressDream effectively steers imaginations toward high-impact yet plausible outcomes specified by text at inference time, such as task failures, enabling robust policy evaluation and improvement by identifying actions whose plausible futures include undesirable outcomes. Video results are available at https://junwon.me/StressDream/.

2
LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning

As real-world applications increasingly require processing inputs of 100k+ tokens, the gap between context length and inference efficiency has become a critical bottleneck. Context compression offers a way to reduce prefill costs while preserving task accuracy. However, existing training-free attention-based methods leave substantial gaps in demanding long-context tasks such as code reasoning. We present LongAttnComp, a long-context adaptation of AttnComp that fine-tunes a lightweight cross-attention scoring layer and introduces tokenlevel chunking, a token-budget top-p algorithm, positional reordering, and a formatagnostic query parser. We further design a two-stage fine-tuning recipe for the compressor: Stage 1 builds a general retrieval foundation from NIAH-style data, and Stage 2 extends it with multi-hop and reasoning data for broader long-context task coverage. On InfiniteBench Code-Debug, LongAttnComp matches or exceeds full-context accuracy, substantially outperforms training-free baselines, and transfers across four target models from three families. On LongBench v2, the two-stage recipe largely closes the Stage 1 gap on multi-document reasoning while preserving Code-Debug performance.

2
Compositional Text-to-Image Generation Via Region-aware Bimodal Direct Preference Optimization

Despite the rapid progress of text-to-image (T2I) models, generating images that accurately reflect complex compositional prompts (covering attribute bindings, object relationships, counting) still remains challenging. To address this, we propose BiDPO, a framework to enhance T2I model's capability of compositional text-to-image generation. We begin by introducing an carefully designed pipeline to construct a large-scale preference dataset, BiComp, with strictly quality control. Then, we extend Diffusion DPO to jointly optimize image and text preferences, which is shown to greatly effective in improving the models to follow complex text prompt in generation. To further enhance the models for fine-grained alignment, we employ a region-level guidance method to focus on regions relevant to compositional concepts. Experimental results demonstrate that our BiDPO substantially improves compositional fidelity, consistently outperforming prior methods across multiple benchmarks. Our approach highlights the potential of preference-based fine-tuning for complex text-to-image tasks, offering a flexible and scalable alternative to existing techniques.

2
ACL-Verbatim: hallucination-free question answering for research

Academic researchers need efficient and reliable methods for collecting high-quality information from trusted sources, but modern tools for AI-assisted research still suffer from the tendency of Large Language Models (LLMs) to produce factually inaccurate or nonsensical output, commonly referred to as hallucinations. We apply the extractive question answering system VerbatimRAG to research papers in the ACL Anthology, directly mapping user queries to verbatim text spans in retrieved documents. We contribute a novel ground truth dataset for the task of mapping user queries to relevant text spans in research papers, and use it to train and evaluate a variety of extractive models. Human annotation is performed by NLP researchers and is based on synthetic user queries generated using a custom pipeline based on the ScIRGen methodology, paired with chunks of research papers retrieved by VerbatimRAG. On this benchmark, a 150M-parameter ModernBERT token classifier trained on silver supervision from our pipeline achieves the best word-level F1 (53.6), ahead of the strongest evaluated LLM extractor (48.7).

1
Review Arcade: On the Human Alignment and Gameability of LLM Reviews

LLM-generated reviews for scientific papers are gaining considerable traction and are even being officially piloted by major conferences. We have to assume that not only reviewers are using LLM-assistance, but also that authors use LLMs to revise their papers before submitting. In this work, we perform empirical experiments on papers from the 2025 ACL Rolling Review (ARR) to evaluate LLM reviews from both the author and the reviewer perspective. First, we identify a limited alignment of LLM reviews with human ones. In the best-case scenario, the alignment is reasonable. However, we also find that LLM-human alignment varies substantially across prompts and models. Finally, we investigate the scenario in which the author uses an iterative draft-revise workflow to improve the submission according to the LLM review. We find that this "gaming" of LLM reviews can be effective in specific scenarios, leading to a statistically significant increase of overall scores for up to 35\% of papers. We publish our code: https://github.com/uhh-hcds/reviewarcade.

1
AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering?

AI systems are fallible, and humans can make mistakes in deciding whether to trust AI over their own judgment. Thus, improving human-AI collaboration requires understanding when, why, and how humans decide to rely on AI. We study two distinct reliance decisions: the delegation choice -- deciding when to let AI act autonomously without knowing its output, and the adoption choice -- evaluating AI suggestions and deciding how to use them. Both of these decoupled reliance patterns shape collaboration, but prior work rarely studies them together in realistic settings with the same users. We address this gap by studying collaborative human--AI teams competing in a question-answering game in which humans can choose when and how to work with AI agents to win. Our 24 matches pair 23 expert humans with 16 AI agents, capturing 387 delegation and 1440 adoption decisions. While human--AI collaboration performs better than either AI or humans alone, humans make suboptimal collaboration decisions, both under-relying on correct AI suggestions (3.9% of opportunities missed) and over-relying when AI misleads them (1.7%). Both parties contribute wrong answers: reported model confidence is near chance when humans and AI disagree, while confirmation bias drives higher under-reliance (64.5%) when an AI suggestion agrees with humans' initial incorrect answer. To close this gap, we recommend calibrated confidence, evidence-grounded explanations, and mechanisms that help users refine trust.

1
A Formally Verified Library of Mathematical Finance in Lean 4

We describe a library of mathematical finance built in the Lean 4 proof assistant, on top of Mathlib and the BrownianMotion package. It is broad: more than two hundred sorry-free theorems across eleven areas, from the measure-theoretic foundations of continuous-time stochastic calculus through derivative pricing to applied risk, portfolio, and fixed-income theory, and, to our knowledge, the most comprehensive machine-checked development of mathematical finance to date. Breadth is the setting, not the point. Two things make it more than a catalogue. It reaches into the continuous theory far enough to construct the L2 Itô integral as a bounded linear isometry and to derive, rather than assume, the risk-neutral pricing measure. And it audits its own faithfulness: every result is classified by how its Lean statement relates to the mathematics it claims, and a build-enforced gate pins the axioms each proof actually uses, so a reader can see precisely what has been proved and what has only been proved under added hypotheses. We close with a candid finding: a formal base over classical financial mathematics yields certified unification of known results rather than new financial theory. The contribution is therefore methodological and infrastructural, reusable verified foundations for mathematical finance, together with the faithfulness audit.

1
The Hamilton-Jacobi Theory of Deep Learning

In this paper, training a neural network is identified, exactly, as a search through Hamilton--Jacobi initial-value problems: each gradient step selects the initial data of a viscous Hamilton--Jacobi equation whose Hopf--Cole propagator best fits the observations; at inference, the input is the spatial point at which that solution is evaluated and the initial condition is already encoded in the weights. The correspondence is exact for log-sum-exp layers and structural for broader architectures: residual networks, transformers, and recurrent architectures (RNNs, LSTMs, SSMs) each discretize the same class of Hamilton--Jacobi equations, with architecture-dependent Hamiltonian and viscosity. A single deformation parameter varepsilon unifies all four perspectives (network, tropical algebra, viscous PDE, convex optimization) in a commutative diagram closed under Lipschitz conditions. Quantitative consequences include: the minimax optimal generalization rate O(n^{-1/(d+2)}) for fixed t; adversarial robustness controlled by varepsilon; backpropagation as the co-state equation of the Hamiltonian system for residual networks (Pontryagin Maximum Principle); scaling exponents consistent with data intrinsic dimension via PDE quadrature; and a closed-form O(N) influence function (softmax attribution weights π_j) whose entropy landscape undergoes fold bifurcations as varepsilon increases, each merging attribution basins.

1
Geometric Latent Reasoning Induces Shorter Generations in LLMs

Large language models solve complex problems by generating lengthy chains of explicit reasoning tokens. While effective, this makes reasoning expensive, length-sensitive, and constrained to (discrete) natural language. While latent reasoning offers a continuous alternative, determining useful structures for intermediate latent states is an open challenge. In this paper, we formulate latent reasoning as a geometric path-approximation problem within the model's pretrained token-embedding space. We introduce Geometric Latent Reasoning (GLR), which uses a lightweight transition head to predict iterative direction updates in embedding space. Using textual chain-of-thought traces as anchors, GLR learns to approximate discrete reasoning trajectories while permitting continuous deviations from exact token embeddings. Evaluations on mathematical reasoning benchmarks using Qwen3 models reveal an emergent phenomenon: geometric latent reasoning induces substantially shorter generations without an explicit length objective. By replacing early explicit reasoning with continuous latent steps, models often reach correct answers using substantially fewer total generation steps. These findings suggest that continuous trajectories act as compact intermediate reasoning states, exposing a new tradeoff between latent computation budget, output length, and accuracy.

1
ChartArena: Benchmarking Chart Parsing across Languages, Scenarios, and Formats

Charts are a primary medium for conveying quantitative and relational information, yet systematically evaluating chart parsing models remains difficult. Existing benchmarks focus on narrow chart types and leave diagrammatic structures such as flowcharts and mind maps largely unaddressed, while models produce outputs in incompatible formats, and datasets rarely include the printed or hand-drawn images encountered in practice. To address these issues, we introduce ChartArena, a comprehensive bilingual benchmark covering eight chart families spanning both numeric charts and diagrammatic structures, each evaluated across three visual scenarios: digital renderings, printed photos, and hand-drawn photos. The dataset is built via a human-agent collaborative annotation pipeline with multi-stage human verification to ensure annotation reliability. To enable fair cross-model comparison, we further design a format-agnostic evaluation protocol that maps heterogeneous outputs into two canonical semantic spaces, a normalized triple view and a directed graph view, and scores them with structure-aware metrics. Through extensive evaluation of 26 leading MLLMs, we observe three consistent findings: (i) frontier proprietary models such as Gemini 3.1 Pro lead overall, yet the strongest open-source systems are rapidly closing the gap; (ii) document parsing models handle numeric charts reasonably but fall sharply behind on diagrammatic structures; and (iii) expert chart parsers remain limited to narrow chart families. Across all models, radar charts and hand-drawn scenarios stay especially challenging. These findings show that ChartArena exposes clear capability gaps and provides a unified foundation for future progress. ChartArena is publicly available at https://github.com/pspdada/ChartArena.

1
Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

Inverse graphics is a longstanding and highly underconstrained problem that seeks to reconstruct images as editable 3D scenes which can be rendered, relit, and manipulated. In this work, we investigate whether pretrained vision-language models (VLMs) can perform executable inverse graphics directly from a single image by reconstructing a scene as an editable Blender program, without relying on specialized 2D or 3D foundation models, differentiable rendering, or multi-view supervision. We introduce Staged Executable Inverse Graphics (SEIG), an agentic framework that reconstructs a 3D scene from a single image by progressively refining scene factors including geometry, materials, composition, and lighting directly in executable Blender code space. We evaluate our framework across diverse scenes using a range of reconstruction metrics spanning pixel-level, perceptual, and semantic fidelity. Our experiments show that staged reconstruction substantially improves reconstruction fidelity, highlighting the importance of task decomposition for executable inverse graphics with general-purpose VLMs. Finally, we showcase various downstream applications enabled by the reconstructed editable Blender scenes.

1
Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan

The diachronic evolution from Latin to the Romance languages involved a restructuring of the grammatical gender system from a tripartite configuration (masculine, feminine, neuter) to a bipartite one (masculine, feminine) in most Romance languages. In this work, we introduce an interpretable deep learning framework to investigate this phenomenon at both lexical and contextual levels. First, we show that conventional tokenization strategies are insufficiently robust for this low-resource historical setting, and that our proposed tokenizer improves performance over these baselines. At the lexical level, we evaluate the contribution of morphological features to gender prediction. At the contextual level, we quantify the contributions of different part-of-speech categories to grammatical gender prediction. Together, these analyses characterize the distribution of gender information between the lemma and its sentential context. We make our codebase, datasets, and results publicly available at https://github.com/ahan-2000/Lost-in-Translation-{https://github.com/ahan-2000/Lost-in-Translation-}.

0
Who Annotates in NLP? A Large-scale Assessment of Human Annotation Reporting between 2018 and 2025

Human annotation is the empirical foundation of much NLP research, from dataset construction to model evaluation, but papers often leave unclear who produced the annotations and how the annotation process was controlled. We provide the first large-scale, task-level audit of human annotation reporting across major NLP venues, asking which annotation details are documented, which are missing, and how reporting varies across time, topic, venue, and intended use of human judgment. We introduce a unified taxonomy of annotation-reporting practices and validate an LLM-assisted extraction pipeline against Annotated-gold, a human-adjudicated gold standard of 41 papers and 72 annotation tasks, where the best model reaches human-comparable agreement with adjudicated labels, with Krippendorff's alpha of 0.606 versus 0.585 for human-human agreement. Using this pipeline, we construct Annotated-llm, a dataset covering ACL-venue papers from 2018-2025, with 2,667 extracted annotation tasks from 1,603 papers, and find that papers frequently report operational details such as recruitment strategies, annotator expertise, and annotation volume, but often omit details needed to assess annotation validity, including training, language proficiency, compensation, socio-demographics, adjudication, and agreement values, especially in model-evaluation studies. Our results show that annotation reporting in NLP has improved over time but remains uneven, and they establish a scalable framework and bare-minimum reporting recommendations for making human annotation more reliable, reproducible, and interpretable.

0
05

PRODUCT HUNT

05.00
PRODUCT HUNT

Product Hunt - June 2, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

Brief icon
Brief

Navigate your agents to product-market fit

0
GlowPulse icon
GlowPulse

Your Mac's camera is now a heart-rate sensor

0
Branda icon
Branda

A fun new way to create & manage brands.

0
Rodeo by TwelveLabs icon
Rodeo by TwelveLabs

Describe your shot. Rodeo builds your first cut.

0
Knock agent for Slack icon
Knock agent for Slack

Build, manage, and ship customer messaging from Slack

0
Co-Invest icon
Co-Invest

Trade 500+ markets directly from ChatGPT & Claude

0
Kompassify 2.0 icon
Kompassify 2.0

User onboarding now with an AI copilot

0
Gigacatalyst icon
Gigacatalyst

Give your Sales and CS teams engineering superpowers

0
Enshittifier icon
Enshittifier

Chrome extension that replaces "AI" with 💩

0
Vokal icon
Vokal

A collaboration space for 10x teammates with their Al agents

0
Moxie Docs icon
Moxie Docs

Living docs + MCP context for your GitHub repos

0
HumToBeats icon
HumToBeats

Turn humming into AI-generated beats

0
choclift icon
choclift

Use iPhone to open apps, Apple Shortcuts and websites on Mac

0
Sortail icon
Sortail

Self-learning one-click inbox cleanup for Apple Mail

0
Mirowl icon
Mirowl

Search all your screenshots via a local OCR-powered AI

0
Galleroo icon
Galleroo

Turn your Google Drive into a stunning client gallery

0
MartinLoop icon
MartinLoop

Control AI coding agents with limits, proof, + run receipts

0
RingDisk icon
RingDisk

Brighten your video calls with different toned ring lights

0
Trovelo icon
Trovelo

Plan and track your trips privately

0
Paste MCP & AI Tools icon
Paste MCP & AI Tools

Infinite clipboard for Claude, Codex and other AI tools

0
Fundraisly icon
Fundraisly

AI fundraising agent that finds investors and books meetings

0
PawPause icon
PawPause

Lock your keyboard and prevent cats from causing chaos

0
Overline icon
Overline

Real-time AI captions and translation for any browser video

0
ConnectWizard icon
ConnectWizard

Unlock hidden App Store Connect analytics

0
findloc.ai icon
findloc.ai

Make your business citable by ChatGPT, Claude & Perplexity

0
folk icon
folk

the AI in your texts that gets stuff done

0
Dune Keypad icon
Dune Keypad

Context-aware Mac keypad, w/ Claude + community extensions

0
Mina Meeting Assistant icon
Mina Meeting Assistant

Your AI Teammate now responds and executes during your calls

0
Emily by Co-Desk icon
Emily by Co-Desk

Voice AI copilot for coworking & coliving operators

0
SocialEcho 2.0 icon
SocialEcho 2.0

AI social media copilot for teams and agents

0
R0Y OMNI 1.0 icon
R0Y OMNI 1.0

Generate more accurate investment dashboards and reports

0
Tabstack Web Research icon
Tabstack Web Research

Run a research agent with cited answers in a single API call

0
Mistral Vibe icon
Mistral Vibe

I agent for long-running, multi-step work and coding

0
Skylive icon
Skylive

Never miss a celestial event, anywhere on Earth

0
Databox MCP icon
Databox MCP

Chat with your business data inside Claude, ChatGPT and more

0
Open Caffeine icon
Open Caffeine

Keep your Mac awake

0
Joanium icon
Joanium

Local AI workspace to build and work with your computer

0
Trippple Club icon
Trippple Club

Advertise together on Meta Ads and pay 3x less

0
Typeahead icon
Typeahead

AI autocomplete for every app on your Mac

0
NetworkSpy icon
NetworkSpy

HTTP(s) proxy debugger with custom viewer

0
Paint By JSON | Figma API Client icon
Paint By JSON | Figma API Client

Real API data in your mockups made as easy as lorem ipsum.

0
Sentinel icon
Sentinel

Control your robots from anywhere in the world

0
Stella icon
Stella

Local natural language search across all your files

0
Tokenwise icon
Tokenwise

A smart LLM proxy that shows where you're overpaying

0
Presentify icon
Presentify

Take your presentation skills to the next level

0
Web Clipper for NotebookLM icon
Web Clipper for NotebookLM

Your ultimate NotebookLM's Chrome Extension

0
TabTasker icon
TabTasker

Zero servers. Total privacy. Your new favorite toolbox.

0
Second Brain for AI icon
Second Brain for AI

Persistent memory for Claude, ChatGPT & Cursor. Free.

0
Oura Ring 5 icon
Oura Ring 5

The world’s smallest smart ring, now even better

0
Marqly 5.0 icon
Marqly 5.0

Your AI-powered bookmark manager

0
06

TECHMEME

06.00
TECHMEME

Techmeme - June 2, 2026

Techmeme Digest: Major tech headlines and industry conversations.

Anthropic says it will extend Project Glasswing to organizations in 15+ countries, sources say giving Mythos access to Five Eyes, NATO, Samsung, SK, and others (Financial Times)
Source: TechmemePublished: Jun 2, 2026

Financial Times : Anthropic says it will extend Project Glasswing to organizations in 15+ countries, sources say giving Mythos access to Five Eyes, NATO, Samsung, SK, and others —  About 150 organisations will be given advanced cyber security model following requests from around the world

Analysis: Palo Alto Networks shareholders have voted to reject pay packages for top executives seven times since 2015, more than any other S&P 500 company (Andrew Martin/Bloomberg)
Source: TechmemePublished: Jun 2, 2026

Andrew Martin / Bloomberg : Analysis: Palo Alto Networks shareholders have voted to reject pay packages for top executives seven times since 2015, more than any other S&P 500 company —  A majority of Palo Alto Networks investors have voted against the cybersecurity company's executive pay seven times in 11 years.

Meta expands Teen Accounts safety features to limit harmful content on Instagram, Facebook, and Messenger, including about nutrition, weightlifting, and anxiety (Eli Tan/New York Times)
Source: TechmemePublished: Jun 2, 2026

Eli Tan / New York Times : Meta expands Teen Accounts safety features to limit harmful content on Instagram, Facebook, and Messenger, including about nutrition, weightlifting, and anxiety —  The changes, after Meta's legal losses in two child safety cases, are aimed at limiting harmful content shown to teenagers on Instagram, Facebook and Messenger.

Didi reports Q1 revenue up 10% YoY to ~$8.7B and a ~$177M net loss, several times larger than its Q4 loss, as it expands globally in markets like Latin America (Luz Ding/Bloomberg)
Source: TechmemePublished: Jun 2, 2026

Luz Ding / Bloomberg : Didi reports Q1 revenue up 10% YoY to ~$8.7B and a ~$177M net loss, several times larger than its Q4 loss, as it expands globally in markets like Latin America —  Didi Global Inc. reported its second straight quarterly loss, after ratcheting up investment to expand globally and defend …

Russia's Federal Security Service claims it uncovered a large-scale spyware operation by "foreign intelligence operatives" on senior officials' mobile phones (Bloomberg)
Source: TechmemePublished: Jun 2, 2026

Bloomberg : Russia's Federal Security Service claims it uncovered a large-scale spyware operation by “foreign intelligence operatives” on senior officials' mobile phones —  Russia's Federal Security Service said it uncovered a large-scale operation by foreign intelligence agencies to implant spyware on senior officials' mobile phones.

Sources: after Trump nixed an AI EO on May 21, US officials are navigating internal strife and chaotic talks; early model access was the most contentious issue (Wired)
Source: TechmemePublished: Jun 2, 2026

Wired : Sources: after Trump nixed an AI EO on May 21, US officials are navigating internal strife and chaotic talks; early model access was the most contentious issue —  Donald Trump killed an executive order to regulate AI.  Now, administration officials and AI executives are trying to figure …

Amazon schedules Prime Day for June 23 to June 26, shifting the four-day sales event from July to avoid the end of the FIFA World Cup and US Independence Day (Arriana McLymore/Reuters)
Source: TechmemePublished: Jun 2, 2026

Arriana McLymore / Reuters : Amazon schedules Prime Day for June 23 to June 26, shifting the four-day sales event from July to avoid the end of the FIFA World Cup and US Independence Day —  Amazon.com (AMZN.O) will host its annual Prime Day sales event from June 23 to June 26 after launching the event in July for the past five years …

OpenAI releases a new knowledge work report: Codex now has 5M+ weekly active users, up 6x+ since February, and knowledge workers represent ~20% of Codex users (OpenAI)
Source: TechmemePublished: Jun 2, 2026

OpenAI : OpenAI releases a new knowledge work report: Codex now has 5M+ weekly active users, up 6x+ since February, and knowledge workers represent ~20% of Codex users —  OpenAI today released a new report, The Next Era of Knowledge Work, showing how Codex is no longer just a coding tool.

Analysis: 22 of 24 US executive agencies saw a YoY increase in their average X account engagement during the first year of Trump's second term; @DOGE dominated (Pew Research Center)
Source: TechmemePublished: Jun 2, 2026

Pew Research Center : Analysis: 22 of 24 US executive agencies saw a YoY increase in their average X account engagement during the first year of Trump's second term; @DOGE dominated —  Federal agencies are getting far more audience engagement on X (formerly Twitter) in the second Trump administration …

SK Hynix Chair Chey Tae-won says the company plans to double its memory chip capacity over five years, responding to a global shortage that could last till 2030 (Debby Wu/Bloomberg)
Source: TechmemePublished: Jun 2, 2026

Debby Wu / Bloomberg : SK Hynix Chair Chey Tae-won says the company plans to double its memory chip capacity over five years, responding to a global shortage that could last till 2030 —  SK Hynix Inc. plans to double its memory chip capacity over the coming half-decade, a major expansion that should help ease …

China adds data and algorithms to its trade secret rules, as part of Beijing's efforts to prevent tech leaks amid intensifying strategic competition with the US (Nectar Gan/Bloomberg)
Source: TechmemePublished: Jun 2, 2026

Nectar Gan / Bloomberg : China adds data and algorithms to its trade secret rules, as part of Beijing's efforts to prevent tech leaks amid intensifying strategic competition with the US —  China expanded its trade secret rules to include data and algorithms, as Beijing steps up efforts to prevent technology leaks …

Computex 2026: ARM CEO Rene Haas says Oracle and ByteDance are among the customers of Arm's new AGI CPUs for data centers (Max Cherney/Reuters)
Source: TechmemePublished: Jun 2, 2026

Max Cherney / Reuters : Computex 2026: ARM CEO Rene Haas says Oracle and ByteDance are among the customers of Arm's new AGI CPUs for data centers —  Chinese tech company ByteDance and U.S. data centre firm Oracle (ORCL.N) are among the customers of Arm's (O9Ty.F) AI data centre chips, the head of the chip designing firm said on Tuesday.

Zhipu AI says it plans to apply for a listing in Shanghai; Zhipu's Hong Kong-listed shares are up 10x+ since its January IPO, giving it an ~$83B market cap (Reuters)
Source: TechmemePublished: Jun 2, 2026

Reuters : Zhipu AI says it plans to apply for a listing in Shanghai; Zhipu's Hong Kong-listed shares are up 10x+ since its January IPO, giving it an ~$83B market cap —  Knowledge Atlas Technology JSC (2513.HK), also known as Zhipu AI, intends to apply for a domestic listing on the Shanghai Stock …

A profile of Valve, which PrivCo estimates generated $5.2B in revenue and $1.5B in net income in 2025, as lawsuits allege its Steam store abuses market power (Bloomberg)
Source: TechmemePublished: Jun 2, 2026

Bloomberg : A profile of Valve, which PrivCo estimates generated $5.2B in revenue and $1.5B in net income in 2025, as lawsuits allege its Steam store abuses market power —  Lawsuits in the US and the UK allege the company's Steam store is abusing its market power.  Valve disagrees.

SpaceX's, Anthropic's, and OpenAI's IPOs could add up to $4T in US stock market value within months, fueling concerns they could herald more capital raising (The Economist)
Source: TechmemePublished: Jun 2, 2026

The Economist : SpaceX's, Anthropic's, and OpenAI's IPOs could add up to $4T in US stock market value within months, fueling concerns they could herald more capital raising —  They promise to be the biggest stockmarket debuts ever.  On June 11th SpaceX reportedly hopes to raise $75bn from investors …

07

STARTUP ARCHIVE

07.00
STARTUP ARCHIVE

Startup News - June 2, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: StartupPublished: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: StartupPublished: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: StartupPublished: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction."

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: StartupPublished: Mar 26, 2026

"Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: StartupPublished: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: StartupPublished: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot."

Elad Gil: “Things that work tend to work pretty fast”
Source: StartupPublished: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice."

Paul Graham on why starting with a “small, intense fire" is the key to startup growth
Source: StartupPublished: Mar 20, 2026

"You have to know who those first users are and how you're going to get them."

Keith Rabois on how to identify great talent
Source: StartupPublished: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: StartupPublished: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth."

Eric Schmidt on why most companies get strategy wrong
Source: StartupPublished: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?"

Mark Zuckerberg: “You can’t 80/20 everything”
Source: StartupPublished: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: StartupPublished: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations"

Sam Altman explains how to come up with a great startup idea
Source: StartupPublished: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: StartupPublished: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: StartupPublished: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over."

Spencer Rascoff: "I will never invest in a consumer startup with paid marketing”
Source: StartupPublished: Mar 9, 2026

"If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes make sense to quit
Source: StartupPublished: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive."

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: StartupPublished: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right"

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: StartupPublished: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT

08.00
SOLIDOT

Solidot News - June 2, 2026

Solidot Feed: Highlighting essential tech & open-source news.

中国打击快餐行业的幽灵外卖

中国正在打击引发食品安全问题的幽灵外卖。幽灵外卖指的是在外卖平台上提供外卖服务但没有实体店的商家。根据周一生效的新规,外卖平台上的商家信息必须与实体店相符,商家还必须注明是否提供堂食服务。去年北京一男子投诉称他通过外卖平台订购的蛋糕质量不佳,上面装饰着不可食用的花朵。此事引发了对“幽灵外卖”的关注。调查发现,他订购蛋糕的连锁店在各大电商平台上列出了近 380 家门店,但实际上却没有一家实体店。其网店还使用了伪造的营业执照。进一步调查显示,从网店订购的蛋糕实际上外包给一个订单转运平台,该平台会将订单分配给出价最低的第三方商家。当局在两个订单转运平台上共查获了 360 万份蛋糕订单。当局还在七大外卖平台上发现了 6.7 万家“幽灵店铺”,这些店铺与订单转运网站“相互勾结,形成非法供应链”。今年四月,市场监管总局宣布对拼多多、美团、京东、饿了么、抖音、淘宝、天猫 7 家电商平台“幽灵外卖”系列案罚款 36 亿元。

中国将数据和算法纳入商业秘密保护

中国扩大商业秘密保护范围,将数据和算法纳入其中,以加强防范技术外流。中国国家市场监督管理总局修订的《商业秘密保护规定》在星期一(6月1日)正式施行。这是中国法律首次明确将数据、算法等数字资产纳入商业秘密保护范围。新规也对远程办公和跨境企业合作提出更严格的安全要求。企业必须采取保护措施,包括按照员工职级限制文件访问权限、隐藏敏感信息,以及记录用户操作行为等。规定还将境外实施的侵犯商业秘密行为纳入规制范围,但未明确具体执法机制。配合新规实施,中国国家市场监管总局星期一启动为期一个月的专项执法行动,重点针对生物医药、半导体和人工智能等关键领域,严厉打击“恶意挖角”以及员工跳槽时携带商业秘密等行为。

能源危机推动 37 个国家的电动汽车销量创新高

受中东危机导致燃料价格上涨的影响,全球电动汽车销量快速增长。根据标普全球汽车数据统计,在可获取数据的 150 个国家中,3 月有包括澳大利亚和英国在内的 28 个国家刷新了电动汽车单月销量历史纪录。4 月则有包括巴西和菲律宾在内的 9 个国家创下新高。3 月和 4 月期间,91% 的国家电动汽车销量实现增长。在原油进口高度依赖中东的韩国,3~4 月的电动汽车销量同比增长至 2.4 倍。电动汽车在新车销售中的占比提高14个百分点达到 26%。东南亚地区电动汽车销量增长 4 成,市场占比升至 16%。欧盟市场也摆脱了一度停滞的局面,销量同比增长 4 成。中国市场虽然电动汽车销量下降 8%,但由于整体新车需求同步下滑,电动汽车在新车销售中的占比反而提高5个百分点达到 42%。国际能源署在 5 月发布的报告中指出,此次能源危机的应对方式“将在未来几年塑造全球汽车市场”。

海盗湾在被警方搜查 20 年后

2006 年 5 月 31 日,海盗湾成立不到三年后,65 名瑞典警察进入了斯德哥尔摩的一个数据中心。在美国政府的压力下,作为刑事调查的一部分,他们奉命下线海盗湾的服务器。在警察进入数据中心前,海盗湾联合创始人Gottfrid Svartholm 和 Fredrik Neij 就感觉到情况不妙。他们注意到有密探跟踪他们。不过这一次警方的目标是他们的服务器。上午 10 点左右,Gottfrid 告诉 Fredrik 办公室来了警察。他让同事去托管机房销毁“罪证”。Fredrik 离开时,他意识到问题可能与他们的 torrent tracker 相关。为以防万一他决定对网站进行完整备份。当他到达托管机房时,他的担忧得到了证实。数十名警察带走了数十台服务器,其中大部分属于与海盗湾无关的客户。接下来几天,Fredrik 备份网站的决定显然是海盗湾历史上最关键的时刻。正因为有了备份,海盗湾团队才得以在三天内恢复网站。事件的处理方式也延续了海盗湾一贯的恶搞。他们将网站更名为“警察湾”(The Police Bay),设计了一个向好莱坞发射炮弹的新标志。几天后网站的标志被凤凰图案取代,象征着它从数字灰烬里重生。这次突击搜查非但没有让海盗湾倒闭,反而让它成为主流媒体关注的焦点,而很大程度上这要归功于网站的快速恢复。媒体的报道也引发了网站流量的激增,与好莱坞的预期结果相反。20 年后,海盗湾仍然还是那个海盗湾。

Red Hat 官方 NPM 账号被入侵,软件包被植入恶意程序

Red Hat 官方 NPM 账号 @redhat-c​​loud-services 被入侵,该账号相关联的多个软件包植入了窃取凭证的恶意程序。恶意程序旨在窃取 GitHub Action Secret、以及 AWS、GCP、Azure、Kubernetes、HashiCorp Vault、npm 和 CircleCI 等的凭证,它还是一种能自我传播的蠕虫,会利用窃取的 npm 令牌和 npm 的 bypass_2fa 参数,自动重新发布其它软件包的后门版本。Red Hat 在一份声明中表示,恶意软件包已经移除,它仍然在进行调查,初步分析未发现对客户或合作伙伴环境或 Red Hat 生产系统造成任何影响。

Anthropic 申请 IPO

Anthropic 已向美国证券交易委员会(SEC)秘密提交了 IPO 招股说明书。该公司表示在 SEC 完成审查之后,将根据市场状况等因素选择上市。Anthropic 的估值今年以来出现了爆炸式增长,在上周的最新一轮融资中估值达到了 9650 亿美元,超过了 OpenAI 在 3 月下旬的 8520 亿美元估值。美国股市即将迎来三家万亿市值公司的上市,SpaceX 预计本月上市,Anthropic 竞争对手 OpenAI 预计会很快递交上市申请,三家公司的市值预计将达到 4 万亿美元。

黑客利用 Meta AI 机器人接管 Instagram 名人账号

亲伊朗黑客诱骗 Meta AI 机器人短时间内接管了多个 Instagram 名人账号,其中包括奥巴马和美国太空军总军士长(Chief Master Sergeant),之后在账号上发表了亲伊朗的图片和信息。攻击方法非常简单:首先使用 VPN 连接到目标用户常住地附近,然后请求重置账号密码,要求与 Meta AI 客服对话,指示 AI 将目标账户关联到一个新邮箱地址,AI 按指示向该邮箱地址发送一次性验证码后,攻击者就可以重置密码接管账号。目前 Telegram 上已经出现了大量交易被接管账号的频道。Meta 的 Andy Stone 声称该公司已经采取行动解决了问题。

三种埃博拉疫苗在研发中

The International Aids Vaccine Initiative(IAVI)、牛津大学以及 Moderna 公司正在研发针对埃博拉病毒的疫苗。IAVI 表示正在刚果民主共和国爆发的埃博拉疫情可能是至今最严重的。疫情发生在冲突地区,已经报告了逾千例疑似病例,邻国乌干达已确诊 9 例。目前已知有六种埃博拉病毒株,只有三种会引发疫情。最常见的 Zaire 毒株已有针对性的疫苗,但此次爆发的是比较罕见的 Bundibugyo 毒株,目前还没有针对它的疫苗。Moderna 公司宣布将利用 mRNA 技术研发针对 Bundibugyo 毒株的疫苗。

巴西亚马逊出现旱季延长和降雨模式改变

最近发表的两项研究显示,巴西亚马逊地区开始出现此前预测几十年后才会出现的情景,包括旱季延长和降雨模式改变。如果没有采取应对措施,情况可能会迅速恶化,对生物多样性、天然水库的补充以及森林功能构成威胁。其中一项研究表明,亚马逊地区的旱季正从四个月延长至六个月,期间降水量减少逾 150 毫米。第二项研究分析了 2023 年至 2024 年间亚马逊地区的干旱情况。研究结果显示,过火面积增加了 9%,森林退化预警增加了 19%,在干旱高峰期,多达 420 万公顷的森林受到火灾影响。结果表明,干旱、火灾和退化的循环在加剧,削弱了生态系统的恢复能力。亚马逊雨林的面积也可能会减少。

中国批准首例侵入式脑机接口芯片之后

去年 10 月的一天,Dong Hui 突然决定试试能不能握笔写字。6 年前他因为车祸导致的脊髓损伤而颈部以下瘫痪。他缓慢而坚定的写下了自己的名字、谢谢和日期。他能做到这一切来自他参加的脑机接口芯片试验。2024 年 11 月 Dong Hui 成为中国首批接受脑部手术植入侵入式脑机接口芯片的患者之一。今年三月他使用的植入式脑机接口产品获得了商业使用批准。他植入的脑机接口设备被称为 NEO,由上海初创公司 Neuracle Technology 和清华大学合作研发。手术历时约 1.5 小时,收集脑电信号的传感器植入放置在他的硬脑膜上。植入物会将信号传输到计算机。计算机将信号翻译成指令,控制他每天 2.5 小时训练期间佩戴的软体机器人手套,帮助他学习抓握。手术后大约一周他开始康复训练,“训练的第九天,我的右手成功不用手套抓住了一个球,那真是个奇迹。”悉尼科技大学的脑机接口研究员 Avinash Singh 表示,NEO 迅速获得批准的原因之一是其侵入性相对较小,它的 8 个传感器放置在大脑保护膜之上,相比下马斯克(Elon Musk)所创办的 Neuralink 公司开发的 N1 脑机芯片直接穿透了大脑皮层。NEO 的出血、胶质瘢痕形成和长期信号衰减的风险较低。中国还着手将脑机接口列入医保,将其与量子技术、人形机器人等列为对中国未来科技竞争力至关重要的六大关键产业之一。信息科学家 Meicen Sun 表示,中国一大优势是患者乐于接受新技术。美国初创公司 Axoft 正与中国公司合作在中国对四名患者进行脑机接口测试,并计划扩大规模。

实验性药物显著延长了最致命癌症患者的生存期

胰腺癌是最致命的癌症,大部分现有疗法的效果甚微。现在名为 daraxonrasib 的药物公布了 III 期临床试验结果,有 500 名胰腺癌已扩散的患者参与了试验,其中 248 名患者每日服用 daraxonrasib,其余 252 名接受化疗。结果显示,服药组的中位生存期为 13.2 个月,化疗组为 6.6 个月,也就是药物将患者的生存期延长了一倍,而且副作用更少。研究报告公布在芝加哥举行的美国临床肿瘤学会年会上,专家认为这种药物有望引领一场治疗革命。Daraxonrasib 的作用机制是靶向名为 Kras 的蛋白质,这种蛋白质驱动了几乎所有胰腺癌。药物通过粘合分子去捕获并抑制 Kras 蛋白,从而阻止肿瘤的生长。

AOMedia 发布 AV2 规范

由 Amazon、Cisco, Google、Intel、Microsoft、Mozilla 和 Netflix 等联合组建的开放媒体联盟 AOMedia 正式发布了 AV1 的后继者 AV2 编解码器。AV2 在 AV1 继续上提高了压缩效率,以更低的比特率实现高质量视频传输,为流媒体、广播和实时视频会议不断变化的需求进行了优化。AV2 增强了对 AR/VR 应用的支持,支持多节目分屏播放,改进屏幕内容处理,能在更宽的视觉质量范围内运行。

马来西亚禁止未满 16 岁青少年使用社媒禁令生效

马来西亚新网络安全法规星期一(6 月 1 日)生效,要求各大社交媒体平台验证用户年龄,并禁止 16 岁以下儿童注册账户。这项新法规适用于在马来西亚拥有至少 800 万用户的社媒供应商,包括 Facebook、Instagram、TikTok、YouTube 等。该国通信监管机构表示将给予社媒平台一段宽限期实施这些措施,但未说明宽限期的截止日期。新《网络安全法》的相关规定包括新的《儿童保护法》和《风险缓解法》,并要求社媒平台“加强内容管理”。通信与多媒体委员会说,未能遵守这两项守则的公司可面临最高 1000 万令吉的罚款。

研究认为玩家群体总体上的价值观更包容

过去几年,玩家群体中反 DEI 和拥抱保守派价值观的声音在社媒上非常突出,他们究竟只是代表了少数人的声音但被社媒的算法放大,还是代表了大多数玩家?研究人员利用 MRI-Simmons 的数据分析了 2012 年、2016 年和2020 年这三个特定年份在美国进行的全国消费者调查,追踪了受访者过去十二个月是否玩过网络游戏或单机游戏,观察了游戏行为与价值观之间的相关性。结果显示,玩家群体相比美国普通民众总体上持有更包容性的价值观。研究人员认为对 DEI 等包容价值观的敌意来自少数活跃玩家。

地球熔心在 2010 年突然逆转方向

根据卫星对地磁场的测量,太平洋一区域下的地球熔融核心在 2010 年突然逆转了流动方向,从西向流动转为东向流动。爱丁堡大学地球科学家 Frederik Dahl Madsen 说,“科学家现在想了解,这种逆转究竟代表着短暂的波动、周期性振荡的一部分,还是地核环流的一种新的稳定平衡。持续监测对确定未来几年这一流动如何演变至关重要。”Madsen 团队分析了 1997-2025 年间 27 年的卫星数据,拼凑出可能发生的变化。外地核大部分运动都受被称为偏心行星环流(eccentric planetary gyre)的环流模式支配。2010 年太平洋下方的区域,部分外核突然偏离了这种模式,从 2010 年之前的微弱西向流动转变为 2012 年之后的强劲东向流动。这种流动持续增强至 2020 年。根据最新的测量结果,它又开始减弱了。这一发现表明地球内部可能比我们想象的更动态多变。

Paint.net 项目通过诉讼拿回 Paint.net 域名

流行图像编辑软件 Paint.net 的官方域名是 www.getpaint.net,因为域名 Paint.net 掌握在第三方手中。现在你可以直接通过 Paint.net 域名获取该软件了。过去 22 年 Paint.net 域名原所有者一直拒绝出售域名,除非项目开发者 Rick Brewster 支付巨额费用。但域名所有者犯下了一个严重错误,他们创建了一个模仿 Paint.net 项目下载页的网站,通过恶意链接和广告获利。Brewster 提起了诉讼,主张利用他人作品牟利构成了侵犯版权和域名抢注。他赢得了诉讼,没有花钱就拿回了 Paint.net 域名。Paint.net 未来将成为主站,GetPaint.net 将重定向到主站。

维基媒体基金会否认以组织工会理由解雇员工

维基媒体基金会的员工正在组建工会,但本月有多名参与组织工会的员工离职或解雇,此事在社区引发了强烈反应,有人呼吁罢工,或者暂停将破坏性编辑恢复到正确版本的工作。维基媒体基金会证实它解散了负责 Community Wishlist 的团队,但否认此事与组建工会相关。基金会称,它的内部评估认为依靠单一团队处理社区请求不再运作良好。因为基金会支持的软件众多,接收社区请求的渠道众多,很难靠一个专门的团队去满足社区的所有愿望。在新架构下 Community Wishlist 请求的处理职责将由更大的产品和技术部门承担。受影响的员工目前仍在职,他们正在考虑安排其他内部岗位。未被安排到其他岗位的员工将于下个月离职,将获得遣散费。基金会称,如果员工最终投票决定成立工会,基金会将尊重法律程序。

16 岁男孩命名蓝牙设备为 BOMB,客机被迫返航

2026年 5 月 30 日下午 5:58,美联航 UA236 航班波音 767-400ER 客机从纽瓦克自由国际机场起飞,飞往西班牙马略卡岛帕尔玛机场(Palma de Mallorca Airport)。在跨大西洋飞行约一个半小时后,原本平静的飞行却让机上乘客陷入了混乱。据乘客在社媒上分享的经历,乘务员突然通过广播发出紧急指令:所有乘客必须立即关闭蓝牙连接。机组人员多次发出语气越来越紧张的广播,声称该指令直接来自美联航位于芝加哥的总部。机组人员警告说,如果蓝牙信号不被关闭,飞机将被迫返航。尽管收到了警告,至少还有两台蓝牙设备处于开启状态。飞行员最终决定中止飞行。根据社媒上的消息,原因是一名 16 岁男孩将其个人蓝牙音箱的网络名称改为 BOMB,男孩据说是几年前改的。蓝牙信号会广播给附近任何试图配对的智能手机或笔记本电脑,因此该名称会立即出现在机舱内乘客和机组人员的屏幕上,触发标准的炸弹威胁应对流程。

微软以证书过期为借口让 Mac 版 Office 2019 进入只读模式

微软于 2018 年 9 月 24 日宣布推出 Windows 和 Mac 版本的 Office 2019,售价 149.99 美元,可永久使用,但不会引入新功能。但到了 2026 年 5 月 15 日微软更新了支持文档,不再保证 Office 2019 能正常运行。Mac 版本的 Office 2019 的支持于 2023 年 10 月 10 日结束,微软使用数字证书去验证 Mac 版本的许可,该证书将于 2026 年 7 月 13 日到期。微软不打算更新证书,而是就让证书过期,而证书过期之后软件将无法正常使用,进入只读模式。微软向受影响用户提供了三种选择:继续以只读模式使用 Mac 版 Office 2019、切换到免费的 Microsoft 365 Web 应用,或者付费订阅 Microsoft 365 或购买新的 Office 家庭版 2024 永久许可证。微软此举招致了广泛批评,认为其做法涉嫌违法。Windows 版本未受影响。

高温会扰乱动物大脑

大量证据表明,动物大脑会受到高温的影响。天气炎热时,鸟类学习能力下降,狗咬人的次数增多,羚羊等体型较大的动物更容易挑衅打架。西澳大利亚大学的行为生态学家 Amanda Ridley 说,如果动物无法保持足够的警觉去寻找食物或躲避天敌,它们的生存几率会急剧下降。随着气候变化导致热浪日益频繁,动物王国的认知障碍可能会波及整个生态系统,本已脆弱的物种会面临更大的风险。如果授粉昆虫忘记该拜访哪些花朵,农作物和野生植物可能会歉收。如果鸟类难以觅食,其幼鸟可能无法存活。在一个气候暖化的行星上,敏锐的思维尤为重要。Ridley 指出气候变化意味着适应能力变得更重要。高温影响人类的大脑,有研究发现,对于在无空调学校学习的学生,学年气温每升高华氏 1 度,考试成绩会下降 1 %。对美国近 7 万起狗咬人报告的分析发现,32 摄氏度的天气狗咬人的风险比 16 摄氏度的天气高 10%,但研究人员并不确定是天热的条件下狗变得更具有攻击性,还是人类更暴躁而容易引发攻击,很可能是两个因素的组合。中国的一项研究发现,蛇和猫在天气变热时也更可能咬人。

09

APP STORE RANK

09.00
APP STORE RANK
FETCHING · APP STORE RANK