TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0912
TUE, JUN 30, 2026
OrangeBot.AI intelligently curates and filters daily tech trends and news to save you time.

The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

New feature! We have launched a Chrome extension for saving tweets and Reddit posts.
01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK

AI News Digest

June 30, 2026

Here is a summary of today's key news events.

U.S. Stock Markets Post Strongest Quarterly Gains in Years

Wall Street is concluding one of its best quarters since 2020, with major indexes like the S&P 500 showing significant growth. The strong performance comes despite ongoing concerns about global conflicts and volatility in the tech sector, indicating resilience in the U.S. economy.

Japan Monitors Yen's Slide, Hinting at Market Intervention

The Japanese yen has fallen to a new low against the dollar, putting financial authorities on high alert. Japan's Finance Minister has reiterated a commitment to prevent excessive volatility, leading to speculation that the government may soon intervene by buying yen to strengthen its currency.

Taiwan Probes Alleged Smuggling of AI Chips to China

Taiwanese authorities have launched an investigation into the potential illegal diversion of advanced AI servers, including those with powerful Nvidia chips. The probe focuses on cracking down on smuggling networks that may be helping China bypass U.S. export restrictions on high-end technology.

JPMorgan Chase to Increase Investment in National Security Industries

CEO Jamie Dimon announced that the bank is dedicating its own capital to invest in defense, energy, and other industries vital to national security. The move signals a strategic expansion for the financial giant into sectors it deems critical for the country's stability.

Oil and Gold Prices Fluctuate Amid Economic Uncertainty

Oil prices saw a slight increase today but are on track for their largest quarterly drop since early 2020. Meanwhile, gold futures rose but are set for a significant monthly loss, as investors weigh geopolitical risks against expectations of future interest rate hikes by the Federal Reserve.

02

ON THE WIRE

6 SOURCES

HACKER NEWS

Hacker News - June 30, 2026

Hacker News Feed: Highlighting key posts and discussions.

Open Source Low Tech

(opensourcelowtech.org)

41088
Popping the GPU Bubble

(moondream.ai)

17540
Free the Icons

(weblog.rogueamoeba.com)

551190
Dark Sky Lighting

(www.savingourstars.org)

23244
Rocketlab acquires Iridium

(investors.rocketlabcorp.com)

432286
Tidal AI Policy

(tidal.com)

302341
03

HUGGINGFACE

HuggingFace News - June 30, 2026

HuggingFace Feed: The latest AI models, datasets, and community updates.

Agentic Abstention: Do Agents Know When to Stop Instead of Act?

LLM agents are expected to act over multiple turns, using search, browsing interfaces, and terminal tools to complete user goals. Yet not every goal is well specified or achievable in the available environment. In such cases, a reliable agent should recognize that further interaction is unlikely to help and abstain from additional tool calls. We define Agentic Abstention, the problem of deciding when an agent should stop acting under uncertainty. Unlike standard LLM abstention, which is usually evaluated as a single-turn answer-or-abstain decision, agentic abstention is a sequential decision problem: an agent can answer, abstain, or gather more information at each turn, and the need to abstain may only become clear after interacting with the environment. We study this problem across web shopping, terminal environments, and question answering, evaluating 13 LLM-as-agent systems and 2 agent scaffolds on more than 28,000 tasks. Our results show that the main challenge is not only whether agents can abstain, but also when they abstain. Some agents never abstain when they should, while others do so only after many unnecessary interactions. This gap is especially large on tasks where the instruction appears feasible until the environment reveals otherwise (e.g., no valid result matches the instruction). We further find that model scale, reasoning, and agent scaffolding affect abstention in different ways, where larger or more capable models sometimes perform worse at timely abstention. Finally, we introduce CONVOLVE, a context engineering method for improving agentic abstention that distills full interaction trajectories into reusable stopping rules. On WebShop, CONVOLVE substantially improves timely abstention without updating model parameters, raising Llama-3.3-70B's timely recall rate from 26.7 to 57.4. Our dataset and code are available at https://lhannnn.github.io/agentic-abstention

109
LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing

Streaming video editing has made rapid progress, yet practical deployment is still limited by two core issues: maintaining stable backgrounds and non-edited regions over time, and achieving the low latency required for real-time interactive scenarios. Meanwhile, recent streaming video generation methods are mostly developed for synthesis and cannot be directly applied to editing due to the strict preservation requirement and region-specific control. In this work, we present a novel streaming video editing framework that performs causal, frame-by-frame editing with strong content preservation and real-time responsiveness. Our key design is a three-stage distillation pipeline that progressively transfers editing capability from a powerful bidirectional foundation model to an efficient unidirectional streaming editor, enabling stable long-horizon edits without sacrificing visual fidelity. To further support real-time deployment, we introduce an AR-oriented mask cache that reuses region-related computation across frames, substantially reducing redundant processing and accelerating inference. Finally, we establish a dedicated benchmark for streaming video editing. Extensive evaluations demonstrate that our method achieves state-of-the-art visual quality among streaming baselines while drastically boosting inference speed to 12.66 FPS, making it suitable for interactive and augmented reality applications.

63
Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent

We introduce Agents-A1, a 35B Mixture-of-Experts Agentic Model that reaches trillion-parameter-level performance by scaling the agent horizon. We investigate agent-horizon scaling from two perspectives: scaling long-horizon trajectories and scaling heterogeneous agent abilities. To support this goal, we build a long-horizon knowledge-action infrastructure that connects external knowledge, actions, observations, and verifier outcomes, producing agentic trajectories with an average length of 45K tokens. Based on this, we train Agents-A1 with a three-stage recipe. First, we perform full-domain supervised fine-tuning to align the base model with broad agentic behaviors. Second, we train domain-level teacher models to capture specialized expertise in each domain. Third, we propose a multi-teacher domain-routed on-policy distillation with salient vocabulary alignment to improve knowledge transfer efficiency across different domains, unifying six heterogeneous domains into one deployable student model. Agents-A1 achieves strong and broad performance on long-horizon agent benchmarks. Compared with 1T-parameter models such as Kimi-K2.6 and DeepSeek-V4-pro, Agents-A1 achieves leading results on SEAL-0 (56.4), IFBench (80.6), HiPhO (46.4), FrontierScience-Olympiad (79.0), and MolBench-Bind (56.8), and remains highly competitive on SciCode (44.3), HLE (47.6), and BrowseComp (75.5). We hope this work provides the community with a practical path for scaling the horizon, using a 35B agent that can match the performance of 1T models on long-horizon tasks.

59
TUA-Bench: A Benchmark for General-Purpose Terminal-Use Agents

As large language models and harness frameworks continue to advance, agents operating in terminals are increasingly capable of performing a broader range of general computer-use tasks beyond coding. However, existing benchmarks do not adequately evaluate general-purpose terminal computer-use agents (TUAs): general computer-use benchmarks primarily target graphical user interfaces (GUIs), whereas terminal-based benchmarks largely emphasize technical and programming-centric workflows historically native to the shell. We introduce TUA-Bench, a general-purpose benchmark for terminal-use agents. TUA-Bench includes 120 real-world tasks across five task families, covering routine digital activities (including document editing, email management, and live-web information seeking) as well as scientific and engineering workflows co-designed with PhD-level domain experts that require specialized software. This breadth distinguishes TUA-Bench from prior shell-focused or domain-specific benchmarks. Each task is manually designed, runs in a real terminal with a deterministic setup script, and is evaluated by an execution-based scoring protocol. We find that the strongest frontier agent, Claude Code with Claude Opus 4.8 at max reasoning effort, achieves 65.8% overall performance, with substantial gaps across both tracks. By providing a broad and realistic evaluation of terminal-use capabilities, TUA-Bench aims to accelerate the transition from narrow, task-specific assistants to general-purpose agents capable of operating reliably across diverse digital environments.

39
ReFreeKV: Towards Threshold-Free KV Cache Compression

To reduce memory consumption during LLM inference, a number of methods have been proposed for KV cache pruning. While these techniques can achieve lossless memory reduction on many datasets, they often hinge on an under-emphasized condition: an input- or domain-specific threshold for the KV cache budget must be pre-determined to achieve optimal performance. However, such an input-sensitive design may be considerably limited in real-world scenarios, as open-domain inputs span diverse domains, lengths, and difficulty levels, without clear boundaries for threshold selection. As a result, the dependence on such input-sensitive thresholds can be a fundamental limitation that causes large degradation on arbitrary inputs. In this work, we propose a new objective that lifts the threshold constraints for robust KV compression, advocating for "threshold-free" methods that adaptively adjust budget allocation while preserving full-cache performance. We then propose a novel method, ReFreeKV, serving as the first instantiation of this objective. Extensive experiments across 13 datasets with diverse context lengths, task types, and model sizes demonstrate its efficacy and efficiency. Our code is publicly released at https://github.com/Patrick-Ni/ReFreeKV.
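The abstract's criticism targets pruning schemes that need a pre-set budget. As a minimal sketch of that kind of baseline (not ReFreeKV itself, whose algorithm the abstract does not describe), the Python below keeps the `budget` cache entries with the highest accumulated attention and measures how much attention mass a fixed budget retains. The names `prune_kv` and `retained_mass` are hypothetical.

```python
def prune_kv(cache, attn_scores, budget):
    """Fixed-budget eviction: keep the `budget` entries with the most attention.

    cache: list of per-token (key, value) entries.
    attn_scores: accumulated attention each cached token has received.
    budget: number of entries to keep (the pre-determined threshold).
    Returns the retained entries in their original token order.
    """
    if budget >= len(cache):
        return list(cache)
    ranked = sorted(range(len(cache)), key=lambda i: attn_scores[i], reverse=True)
    keep = sorted(ranked[:budget])  # restore positional order
    return [cache[i] for i in keep]


def retained_mass(attn_scores, budget):
    """Fraction of total attention mass covered by the top-`budget` entries."""
    top = sorted(attn_scores, reverse=True)[:budget]
    return sum(top) / sum(attn_scores)


# The same budget covers very different fractions of attention on different inputs.
peaked = [0.70, 0.20, 0.04, 0.03, 0.02, 0.01]
flat = [1.0] * 8
print(retained_mass(peaked, 2))  # about 0.9
print(retained_mass(flat, 2))    # 0.25
```

A budget of 2 is nearly lossless on the peaked input but discards three quarters of the attention mass on the flat one, which is the kind of input sensitivity the authors argue a threshold-free method should avoid.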

35
Beyond IID: How General Are Tabular Foundation Models, Really?

Foundation models for predictive machine learning on tabular data have recently gained significant traction in academia and industry. Research communities across disciplines are increasingly evaluating tabular foundation models on diverse datasets and tasks. However, these task- and discipline-specific evaluations remain largely inaccessible to model researchers because benchmark software and evaluation protocols are fragmented. As a result, model researchers rely on standard benchmarks, which are mostly defined for tasks where tabular foundation models already excel. The most challenging scenarios are excluded, limiting meaningful progress in the field by focusing on marginal improvements on IID data rather than on broader, more demanding challenges. To overcome this, we introduce BeyondArena, the first unified holistic benchmark for tabular data that supports diverse task types (IID, temporal, grouped), across sample size and feature dimensionality scales, with diverse feature types (with text, with high cardinality) from a broad range of disciplines. To enable unified benchmarking beyond standard benchmarks, we introduce Data Foundry, a Python framework and metadata schema for curating tabular datasets for predictive machine learning. Our results across 11 models and 142 curated datasets show that existing tabular foundation models excel on tiny- to medium-sized IID data, while traditional tree-based and deep learning models still dominate on non-IID, large, and high-dimensional datasets. BeyondArena guides model research for the most demanding challenges in tabular data, enabling progress towards truly foundational tabular models.
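The abstract distinguishes IID, temporal, and grouped task types. The Python sketch below shows the three corresponding train/test split schemes in their generic textbook form; how BeyondArena or Data Foundry actually implement them is not stated, so the function names and the `test_frac` parameter are illustrative assumptions.

```python
import random


def iid_split(n_rows, test_frac, seed=0):
    """Random row-level split: rows are treated as exchangeable."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_test = round(test_frac * n_rows)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def temporal_split(timestamps, test_frac):
    """Train on the earliest rows, test on the latest ones."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    n_train = len(order) - round(test_frac * len(order))
    return order[:n_train], order[n_train:]


def grouped_split(groups, test_frac, seed=0):
    """Hold out whole groups so no group appears in both train and test."""
    rng = random.Random(seed)
    unique = sorted(set(groups))
    rng.shuffle(unique)
    test_groups = set(unique[:max(1, round(test_frac * len(unique)))])
    train = [i for i, g in enumerate(groups) if g not in test_groups]
    test = [i for i, g in enumerate(groups) if g in test_groups]
    return train, test


groups = ["p1"] * 5 + ["p2"] * 5 + ["p3"] * 5 + ["p4"] * 5
train, test = grouped_split(groups, 0.25)  # one whole group ends up in test
```

In a grouped split, all rows sharing a group key (for example, a patient or a store) land on the same side, so a model cannot simply reuse group-specific patterns it saw during training; in a temporal split, the model must extrapolate forward in time. Both break the exchangeability an IID split assumes.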

34
Trimming the Long-Tail of Visual World Modeling Evaluation

Physical interactions follow a long-tailed distribution: a set of common and regular interactions dominates human experience and visual data, while a broad spectrum of rare and irregular interactions remains underrepresented. Although recent visual world models, including image and video generation models, achieve impressive realism on existing benchmarks, they primarily focus on simulating common physical interactions. This raises a central question: Do current visual world models internalize and generalize physical principles? In this work, we introduce Tailor-Bench, a benchmark that challenges world models to simulate irregular physical interactions. To enable systematic evaluation, we design three scenario modes that progressively challenge model reasoning: Regular scenarios reflect common tool-task pairs, Unconventional scenarios replace conventional tools with attribute-compatible substitutes to test affordance generalization, and Impossible scenarios introduce attribute-violating tools to probe constraint awareness. Additionally, we design two complementary settings under a unified evaluation protocol: predictive generation requires inferring outcomes without guidance, while descriptive generation specifies the target outcome for faithful realization. Our experimental results reveal a clear long-tail gap in physical world modeling: performance degrades from Regular to Unconventional and Impossible scenarios, indicating limited generalization beyond common interactions. Failure analysis further shows that models rely on superficial visual patterns: image models fail to realize correct state changes, while video models further suffer from temporal inconsistencies.

31
Video-MME-Logical: A Controlled Diagnostic Benchmark for Video Temporal-Logical Reasoning

Recent interest in multimodal large language models (MLLMs) raises a central question: can they reason over dynamic visual evidence rather than merely recognize objects or events in individual frames? This ability, which we refer to as video temporal-logical reasoning, requires models to maintain, update, and compose evidence as visual states evolve across frames. Existing video benchmarks often conflate this capability with scene complexity, static recognition, or uncontrolled temporal variation. To isolate this capability, we introduce Video-MME-Logical, a controlled benchmark organized around five temporal-logical operations: state tracking, sequential counting, temporal ordering, dynamic spatiality, and structural composition. The benchmark contains 25 fine-grained task categories generated with controlled object states, transitions, temporal dependencies, and logical compositions. It enables difficulty-controlled final-answer evaluation by varying temporal horizon and reasoning complexity, and supports intermediate-state diagnostics by verifying whether models recover the required logical reasoning trace before producing the final answer. Experiments with state-of-the-art MLLMs reveal a substantial human-model gap, especially as temporal-logical complexity increases. Supervised fine-tuning on up to 500K generated samples improves performance but remains insufficient to close the reasoning gap, positioning Video-MME-Logical as a scalable testbed for analyzing and improving temporal-logical reasoning in MLLMs.

23
AsyncOPD: How Stale Can On-Policy Distillation Be?

On-policy distillation (OPD) trains a student on its own rollouts guided by teacher feedback and is becoming increasingly important for large language model (LLM) post-training. Like reinforcement learning (RL), however, OPD faces an on-policy systems bottleneck, as rollouts can dominate training time for reasoning workloads. Asynchronous training pipelines can alleviate this bottleneck by decoupling rollout generation from learner updates, but doing so introduces stale-policy data. While prior work has studied stale data in asynchronous RL, its effects in OPD remain underexplored. We present the first systematic study of staleness in asynchronous OPD, focusing on a practical setting where teacher feedback is implemented through local KL losses and full-vocabulary teacher logits are too expensive to store or transfer, necessitating finite teacher-score caches. We first show that KL direction changes the stale-data problem: teacher-weighted forward KL is more robust to stale rollouts, whereas student-weighted reverse KL is vulnerable. Second, for this vulnerable reverse-KL case, we study whether methods designed to stabilize asynchronous RL can mitigate OPD staleness. In our experiments, they do not improve over a simpler OPD-specific surrogate: recomputing the reverse-KL signal under the current student at learner time. Third, we analyze how finite teacher-score caches create a bias-variance tradeoff for sparse and sampled reverse-KL OPD estimators. This motivates multi-sample Monte Carlo (MC), which preserves MC correctability while reducing one-sample variance. Finally, we present and open-source AsyncOPD, a fully asynchronous OPD training pipeline built from these estimator choices. Experiments show that AsyncOPD improves training throughput by 1.6× to 3.8× over strict synchronous training while reaching comparable accuracy.
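The distinction between teacher-weighted forward KL and student-weighted reverse KL, and the multi-sample Monte Carlo estimator, can be illustrated at a single token position. The Python below is a minimal sketch over a toy four-token vocabulary; the helper names are hypothetical, and it builds full distributions only so the estimate can be checked against the exact value. A sampled estimator itself needs only the teacher's log-probabilities at the sampled tokens, which is presumably why it pairs well with a finite teacher-score cache.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def forward_kl(teacher_logits, student_logits):
    # Teacher-weighted: the expectation is taken under the teacher distribution.
    return kl(softmax(teacher_logits), softmax(student_logits))


def reverse_kl(teacher_logits, student_logits):
    # Student-weighted: the expectation is taken under the student distribution.
    return kl(softmax(student_logits), softmax(teacher_logits))


def mc_reverse_kl(teacher_logits, student_logits, n_samples, rng):
    """Monte Carlo reverse KL from tokens sampled from the student.

    Each sample contributes log q(x) - log p(x); averaging n_samples draws
    keeps the estimate unbiased while shrinking its variance by about 1/n.
    """
    q = softmax(student_logits)
    p = softmax(teacher_logits)
    tokens = rng.choices(range(len(q)), weights=q, k=n_samples)
    return sum(math.log(q[t]) - math.log(p[t]) for t in tokens) / n_samples


rng = random.Random(0)
teacher = [2.0, 0.5, -1.0, 0.0]
student = [1.0, 1.0, 0.0, -0.5]
exact = reverse_kl(teacher, student)
one_sample = mc_reverse_kl(teacher, student, 1, rng)   # unbiased but noisy
many = mc_reverse_kl(teacher, student, 64, rng)        # same mean, lower variance
```

The two KL directions give different values on the same pair of distributions, and they weight tokens differently: forward KL emphasizes tokens the teacher likes, reverse KL emphasizes tokens the (possibly stale) student actually sampled.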

23
Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction

Video understanding is a fundamental capability for multimodal intelligence, and recent Multimodal Large Language Models (MLLMs) have achieved remarkable performance on Video Question Answering (VideoQA) benchmarks. However, existing benchmarks primarily evaluate whether models can perceive shallow visual cues, while rarely examining whether MLLMs can learn deeper knowledge or procedural skills from video tutorials and generalize them to downstream long-horizon agentic tasks. To address this gap, we introduce VG-GUIBench (Video-Guided GUI Benchmark), a new benchmark designed to evaluate whether MLLM-based GUI agents can follow video tutorials to complete corresponding GUI interactive tasks. Furthermore, we observe that the performance of models on both VideoQA and video-guided agentic tasks critically depends on effective keyframe extraction. Based on this observation, we propose TASKER (Task-driven And Scene-aware Keyframe searchER), a keyframe extraction algorithm that jointly considers task relevance and scene dynamics to identify informative frames. Experimental results demonstrate that TASKER achieves significant performance improvements on both VideoQA and video-guided agentic task benchmarks, outperforming the best baseline by 2.0% on the EgoSchema fullset and 1.8% on the NExT-QA dataset, respectively. These results further highlight the potential of generalized keyframe extraction methods for video understanding tasks. Our code and data are available at https://github.com/VG-GUI-TASKER/VG-GUI-TASKER.

16
Monte Carlo Energy Aggregation for Mobile 3D Gaussian Splatting

Recent advances in 3D Gaussian Splatting have demonstrated unprecedented success in novel view synthesis. However, the substantial inference and storage overheads driven by high-order Spherical Harmonics (SH) are primary bottlenecks for mobile platforms. In this paper, we present Flux-GS, a real-time Gaussian Splatting method designed to achieve high-fidelity rendering with significantly reduced overhead for resource-constrained mobile platforms. We first propose a Monte Carlo Specular Energy Aggregator, sampling third-order radiance residuals and aggregating specular energy into a compact latent space. In this way, our method effectively preserves visually salient lighting features in lower-order bands without expensive distillation or pre-training. To mitigate the loss of high-frequency details during compression, we introduce an Attribute-Conditioned SH Enhancement module. This module predicts Gaussian-aware offsets based on intrinsic Gaussian attributes, which enhance the first-order SH representation prior to inference, without extra inference costs. Furthermore, the original single-view gradient-based densification is prone to producing excessive Gaussians and overfitting to a certain view. We address these limitations by proposing a Multi-view Alpha-based Densification and Pruning strategy. By leveraging multi-view guidance, we ensure multi-view structure consistency and the precise removal of redundant primitives. Extensive experiments demonstrate that Flux-GS achieves substantial parameter reduction while maintaining competitive visual quality, offering a robust and scalable solution for real-time mobile rendering. Code: https://xiaobiaodu.github.io/flux-gs-project/

15
TACO: Tool-Augmented Credit Optimization for Agentic Tool Use

Agentic multimodal models perform diverse operations on an image via code and reason over the returned view, an effective paradigm for fine-grained visual question answering. However, code operations can be useful, redundant, or misleading. Outcome-only rewards cannot precisely distinguish these cases, and existing process rewards either fail to attribute final correctness to individual tool calls, or require an external judge model. To address this, we introduce Tool-Augmented Credit Optimization (TACO), a GRPO variant for code-tool agents built on two coupled advantage channels. The first, Differential Answer-Probe Reward (DAPR), is a self-supervised, judge-free tool-contribution advantage that credits each tool call by its own effect on answering correctly. Probe tokens inserted into the model's reasoning elicit its predictions with and without the tool, and the difference in outcome reward is taken as the call's value: positive for a useful call, negative for a misleading one, and zero for one that changes nothing. This reuses the existing answer checker with no auxiliary judge, and, being a difference rather than an absolute probe score, is naturally robust to probe-hacking. The second is the outcome advantage from the final answer, distributed by Outcome-Gated Advantage Routing (OGAR): a parameter-free rule that, conditioned on the call's outcome, delivers this credit only to the responsible segments, suppressing wasted tool calls without any cost term. We train TACO through a two-stage SFT+RL pipeline. Extensive experiments across perception, reasoning, and general multimodal benchmarks show that it yields consistent accuracy gains and learns to invoke its tools only when they help.
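The differential reward can be written down directly from the abstract: a call's value is the outcome reward with the tool minus the outcome reward without it. The Python sketch below assumes the probe answers have already been elicited and uses a hypothetical exact-match `checker`; how the probe tokens are inserted, and the OGAR routing step, are not modeled.

```python
def exact_match(answer, reference):
    """Hypothetical outcome checker: 1.0 for a case-insensitive exact match."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0


def dapr_values(probe_pairs, reference, checker=exact_match):
    """Differential answer-probe value for each tool call.

    probe_pairs: one (answer_without_tool, answer_with_tool) pair per call,
        i.e. what the model answers when probed just before and just after it.
    Returns checker(with) - checker(without) per call; with a binary checker
    that is +1 for a useful call, 0 for no effect, -1 for a misleading one.
    """
    return [checker(after, reference) - checker(before, reference)
            for before, after in probe_pairs]


calls = [("cat", "dog"), ("dog", "dog"), ("dog", "cat")]
print(dapr_values(calls, "dog"))  # [1.0, 0.0, -1.0]
```

Because each value is a difference of two checker outputs on the same question, a probe that is uniformly optimistic or pessimistic shifts both terms alike, which appears to be the intuition behind the robustness claim.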

14
Interleaved Speech Language Models Latently Work In Text

Speech language models (SLMs) have been extensively studied, with the common paradigm incorporating text data and pre-trained text LMs. A leading approach is speech-text interleaving, in which models are trained on sequences containing both speech and text tokens, aiming to boost even speech-only capabilities. Yet how these two modalities interact in the model's latent space remains unclear. In this work, we analyze interleaved speech-text LMs from different model families and sizes using the logit lens to provide such insight. We reveal that these models go through an implicit transcription phase in which the text token of the spoken word becomes decodable in intermediate layers, despite not being trained for speech recognition. The transcription of the word appears among the top candidate words for as much as 77% of the data. Following this stage, the models proceed to predict the next word in the text space before transforming back to the speech domain. Finally, we analyze the role of interleaving data and of initializing from text LMs in eliciting this behavior, and examine how this correlates with spoken knowledge abilities. Our analysis sheds light on the internal mechanisms underlying the relationship between speech and text modalities and could shape SLM optimization.
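The logit lens decodes an intermediate hidden state by multiplying it with the model's output (unembedding) matrix, as if that layer were the last one. A minimal Python sketch on toy vectors follows; it skips the final layer norm that real implementations usually apply first, and the `transcription_hit_rate` helper is a hypothetical analogue of the top-candidate statistic reported in the abstract.

```python
def logit_lens(hidden_states, unembed, k=5):
    """Decode each layer's hidden state with the output embedding matrix.

    hidden_states: one vector per layer for a single sequence position.
    unembed: one output-embedding row per vocabulary token.
    Returns the top-k token ids per layer. Real models usually apply the
    final layer norm before this projection; it is omitted here.
    """
    tops = []
    for h in hidden_states:
        logits = [sum(w * x for w, x in zip(row, h)) for row in unembed]
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        tops.append(ranked[:k])
    return tops


def transcription_hit_rate(cases, k=5):
    """Fraction of cases where the spoken word's text token is a top-k
    candidate at some layer. cases: (hidden_states, unembed, word_id) triples."""
    hits = sum(
        any(word in top for top in logit_lens(states, unembed, k))
        for states, unembed, word in cases
    )
    return hits / len(cases)


unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # 3-token toy vocabulary
layers = [[0.1, 1.0], [1.0, 0.0]]                # hidden state at two layers
print(logit_lens(layers, unembed, k=1))          # [[1], [0]]
```

In the paper's setting the hidden states would come from a speech position, and a "hit" means the text token of the spoken word surfaces in some intermediate layer even though the input was audio.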

9
OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks

Existing computer-use benchmarks fail to capture the realism, complexity, and long-horizon demands of real-world computer use, limiting their ability to reveal the limitations of frontier agents. We introduce OSWorld 2.0, a benchmark of 108 long-horizon computer-use workflows across everyday and professional tasks, designed to capture complex and challenging real-world phenomena. Each task represents a realistic end-to-end workflow that takes human users a median of about 1.6 hours to complete and requires an average of 318 tool calls with Claude Opus 4.7 using maximum thinking, compared with about 30 in OSWorld 1.0. OSWorld 2.0 targets challenge phenomena that are common in real workflows yet underrepresented in prior benchmarks, spanning interaction-design challenges such as streaming interaction and dynamic environments, as well as agent-pattern challenges such as cross-source reasoning, implicit-state inference, and visual-spatial precision. Tasks are grounded in authentic input artifacts and cross-referenced against realistic stateful user profile data, and include separate safety reports auditing safety-sensitive execution. Under our primary binary-completion metric at 500 steps, Claude Opus 4.8 with maximum thinking and batched tool calls scores best but still completes only 20.6% of tasks, with a 54.8% partial score; GPT-5.5 is far more token-efficient yet plateaus near 13%. These results show that current agents are still far from professional-level computer use: rather than stumbling on basic GUI control or coding, they lose track of constraints, miss information that arrives mid-task, guess rather than ask the user, and skip verification, struggling most when a task hinges on hidden state they must recover.

8
GUICrafter: Weakly-Supervised GUI Agent Leveraging Massive Unannotated Screenshots

Data, as the fundamental substrate of modern intelligence, has greatly driven the development of current foundation models. Naturally, researchers aim to extend this paradigm to GUI agents. However, GUI agent data cannot be directly harvested from the internet, making it costly and difficult to collect at scale. As a result, current GUI agents suffer from poor cross-device generalization and limited visual grounding ability for fine-grained GUI elements. To address the data challenge in GUI agents, we propose GUICrafter, a weakly-supervised GUI agent leveraging massive unannotated screenshots to substantially reduce the reliance on expensive human annotations. GUICrafter adopts a curriculum learning framework that trains GUI agents in two progressive stages. In Stage 1, the model learns visual grounding from large-scale unannotated screenshots and webpages, leveraging the rich contextual signals inherent in GUI interactions without human annotations. In Stage 2, we leverage a small amount of high-quality data to calibrate the model via reinforcement learning. Experiments show that GUICrafter achieves performance competitive with, or even superior to, advanced systems like UI-TARS while using only 0.1% of its data. Furthermore, under the same amount of annotated data, GUICrafter surpasses all previous methods such as GUI-R1. Code, data, and models are available at https://github.com/fansunqi/GUICrafter.

8
One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining

Modern large-scale LLM pretraining benefits from utilizing Pipeline Parallelism; however, synchronous implementations leave GPUs idle during pipeline bubbles, wasting computational resources. Asynchronous Pipeline Parallelism eliminates these bubbles, maximizing throughput at the cost of gradient staleness. Among asynchronous schedules, PipeDream-2BW is particularly appealing: unlike the original PipeDream schedule, it ensures a constant one-step gradient delay regardless of pipeline depth. However, its adoption remains limited due to the common belief that optimizing under staleness is fundamentally unstable. In this work, we challenge this assumption, demonstrating that degradation under one-step delay depends strongly on optimizer choice rather than being an intrinsic limitation. We provide the first comprehensive empirical analysis showing that while AdamW, the predominant optimizer at the time when PipeDream-2BW was introduced, indeed suffers from severe degradation, recent methods like Muon exhibit strong robustness under a one-step delay. We introduce an optimizer-agnostic Error Feedback-inspired correction to further mitigate delay effects. We provide supporting theoretical analysis demonstrating convergence for Muon with and without this correction. Extensive evaluation on models up to 10B parameters confirms that our strategies bridge the performance gap with synchronous training, highlighting the practical potential of asynchronous pipeline parallelism at scale.
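Why staleness is feared, and why it need not be fatal, can be seen on the simplest possible problem. The Python sketch below runs plain gradient descent on f(w) = a * w**2 / 2 with and without a one-step gradient delay. It does not model AdamW, Muon, or the authors' correction; it only shows that, for plain gradient descent on a quadratic, a one-step delay shrinks the stable step-size range rather than making optimization unstable outright. The function name `final_distance` is illustrative.

```python
def final_distance(lr, delay, a=1.0, w0=1.0, steps=50):
    """|w| after gradient descent on f(w) = 0.5 * a * w**2.

    delay=0 is ordinary synchronous descent; delay=1 applies the gradient
    evaluated at the previous iterate, mimicking a one-step-stale pipeline.
    """
    history = [w0] * (delay + 1)  # pad so early steps have a "stale" iterate
    for _ in range(steps):
        stale_w = history[-1 - delay]
        history.append(history[-1] - lr * a * stale_w)
    return abs(history[-1])


for lr in (0.5, 1.5):
    print(lr, final_distance(lr, delay=0), final_distance(lr, delay=1))
```

With `lr * a = 1.5` the synchronous run converges while the delayed run blows up; at 0.5 both converge. For the synchronous recursion the stability condition is `lr * a < 2`; for the delayed recursion w[t+1] = w[t] - lr*a*w[t-1], the characteristic roots satisfy z**2 - z + lr*a = 0 and stay inside the unit circle only when `lr * a < 1`.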

7
DreamForge-World 0.1 Preview: A Low-Compute Real-Time Controllable World Model

We present DreamForge-World 0.1 Preview, a preview foundational world model for real-time interactive world simulation. The system adapts the LongLive 1 autoregressive video stack, itself derived from Wan2.1-T2V-1.3B, with a residual action pathway inspired by the Matrix-Game family. DreamForge-World 0.1 Preview focuses on a complementary axis to frontier-scale world simulators: low-compute adaptation, consumer-GPU runtime, and broad interactive capability coverage. It supports live keyboard and mouse control, multimodal initialization, mid-stream reprompting, dual-view operation, and minute-scale interactive rollouts at native 480p resolution, reaching 14 to 15 FPS on a single RTX 4090 with a low memory footprint. By leveraging open video backbones and applying targeted adaptation runs, we build the preview system with high cost-efficiency. DF-World 0.1 Preview is not yet a memory-complete or frontier-quality world simulator, but it demonstrates a practical low-compute route toward real-time controllable world-model previews on consumer GPUs.

7
How Good Can Linear Models Be for Time-Series Forecasting?

Time-series forecasting research has been moving steadily toward larger architectures, from specialized transformers to general-purpose foundation models, on the assumption that capacity is what unlocks accuracy. We take the opposite position: most of the gap can be closed at far lower cost by tuning preprocessing rather than scaling models. We use Ridge regression as the testbed, since it has a closed-form solution and interpretable weights, which let the optimal hyperparameters be read off the search directly. We search over context length, local normalization, regularization, and augmentation on eight standard benchmarks and find three patterns. (1) Optimal lookback is strongly series-specific and often non-monotonic in forecast horizon, with fitted power-law exponents ranging from +0.46 on ETTm2 to -0.19 on Exchange and Traffic, challenging the convention that longer horizons need longer history. (2) Normalizing over a learned trailing fraction of the context, rather than its entirety, is almost universally preferred. (3) Series within the same dataset often disagree on hyperparameters; the optimal degree of cross-series sharing varies from fully shared to fully per-series. The resulting models beat prior linear forecasters on most dataset-horizon entries and exceed Transformer, MLP, and CNN baselines on six of eight benchmarks. The optimized hyperparameters also serve as a diagnostic on the data itself, revealing structures that larger models absorb silently into their learned parameters.
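The core recipe (a lookback window, normalization over a trailing fraction of the context, and a closed-form Ridge fit) fits in a short, dependency-free Python sketch. It forecasts one step ahead; the parameter names `lookback`, `norm_frac`, and `lam` are illustrative, and the exact normalization and multi-horizon setup used in the paper are assumptions here.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def trailing_mean(ctx, norm_frac):
    """Mean of the last norm_frac share of the context (at least one point)."""
    k = max(1, round(norm_frac * len(ctx)))
    return sum(ctx[-k:]) / k


def make_windows(series, lookback, norm_frac):
    """(context, next value) pairs, both shifted by the context's trailing mean."""
    X, y = [], []
    for t in range(lookback, len(series)):
        ctx = series[t - lookback:t]
        mu = trailing_mean(ctx, norm_frac)
        X.append([v - mu for v in ctx])
        y.append(series[t] - mu)
    return X, y


def fit_ridge(X, y, lam):
    """Closed form w = (X^T X + lam * I)^(-1) X^T y."""
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


def forecast(weights, ctx, norm_frac):
    """One-step forecast: undo the normalization after the linear map."""
    mu = trailing_mean(ctx, norm_frac)
    return mu + sum(w * (v - mu) for w, v in zip(weights, ctx))


series = [2.0 * t + 1.0 for t in range(40)]  # a clean linear trend
X, y = make_windows(series, lookback=4, norm_frac=0.5)
w = fit_ridge(X, y, lam=1e-3)
print(forecast(w, series[-4:], norm_frac=0.5))  # close to 81.0, the next value
```

Because the fit has a closed form, sweeping `lookback`, `norm_frac`, and `lam` is cheap, and the fitted weights can be read directly, which is the property the authors exploit to treat the chosen hyperparameters as a diagnostic of the data.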

6
MIMFlow: Integrating Masked Image Modeling with Normalizing Flows for End-to-End Image Generation

Normalizing Flows (NFs) are powerful generative models capable of exact density estimation and sampling. However, their strict invertibility often forces the model to exhaust its capacity on low-level pixel details, hindering the capture of high-level semantic structure. While Masked Image Modeling (MIM) has excelled in representation learning, its integration into generative pipelines has remained largely modular and disjointed. In this paper, we propose MIMFlow, a unified end-to-end framework that jointly optimizes latent semantics, pixel reconstruction, and generative flow. By employing a VAE encoder to infer semantic latents from masked images, MIMFlow achieves a principled decoupling of the generative task: the Normalizing Flow focuses on modeling a simplified, low-frequency semantic manifold, while a specialized decoder handles high-frequency synthesis. This design resolves the inherent capacity bottleneck of NFs, allowing the model to prioritize global structural coherence over redundant noise. Empirical results on ImageNet 256×256 show that MIMFlow-L reaches 71.3% linear probing accuracy and an FID of 2.50. Despite using only 128 tokens (50% fewer than standard models), it yields a 32.8% performance gain over NF baselines of similar scale. Our code is available at https://github.com/MCG-NJU/MIMFlow.
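Normalizing flows get exact densities from the change-of-variables formula, log p(x) = log p_z(f(x)) + log|det ∂f/∂x|. The sketch below illustrates this with a single RealNVP-style affine coupling layer on a 2-D input. It is illustrative only: the conditioner `scale_shift` is a hypothetical fixed function standing in for a neural network, and nothing here reflects MIMFlow's actual architecture.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def scale_shift(x1: float) -> Tuple[float, float]:
    """Hypothetical conditioner; a real flow would use a neural network here."""
    return 0.5 * math.tanh(x1), 2.0 * x1


def coupling_forward(x: Vec2) -> Tuple[Vec2, float]:
    """Affine coupling: z1 = x1, z2 = x2 * exp(s(x1)) + t(x1)."""
    x1, x2 = x
    s, t = scale_shift(x1)
    z2 = x2 * math.exp(s) + t
    # The Jacobian is triangular, so log|det J| is just s.
    return (x1, z2), s


def coupling_inverse(z: Vec2) -> Vec2:
    """Exact inverse: x2 = (z2 - t(z1)) * exp(-s(z1))."""
    z1, z2 = z
    s, t = scale_shift(z1)
    return (z1, (z2 - t) * math.exp(-s))


def log_prob(x: Vec2) -> float:
    """Exact log-density under a standard-normal base via change of variables."""
    z, log_det = coupling_forward(x)
    log_base = sum(-0.5 * v * v - 0.5 * math.log(2 * math.pi) for v in z)
    return log_base + log_det


x = (0.3, -1.2)
z, _ = coupling_forward(x)
x_rec = coupling_inverse(z)  # recovers x up to floating-point error
```

Invertibility is what makes the density exact, and it is also the constraint the abstract says forces NFs to spend capacity on pixel detail.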

6
Nemotron-Labs-Diffusion-Image: Advancing Masked Discrete Diffusion for High-Resolution Image Synthesis

We propose Nemotron-Labs-Diffusion-Image, a state-of-the-art masked discrete diffusion model (MDM) for high-resolution text-to-image synthesis. Compared with prior work on masked image generation, Nemotron-Labs-Diffusion-Image addresses two key challenges. First, unlike continuous diffusion models, which progressively refine latent representations across the entire image, standard MDMs cannot self-correct, because discrete tokens cannot be modified once they are unmasked. Second, although increasing the vocabulary size of discrete image tokenizers improves reconstruction fidelity, it makes generative modeling harder to optimize, because the per-token training signal becomes increasingly sparse. To address the first challenge, Nemotron-Labs-Diffusion-Image incorporates a token-editing mechanism that lets the model revise already-unmasked tokens during inference, much as a sculptor iteratively refines a work. To tackle the second challenge, we propose a Grouped Cross-Entropy (GCE) objective that assigns positive learning signals to tokens neighboring the ground truth in embedding space, thereby alleviating signal sparsity. To further improve training efficiency, we implement a custom fused operator for GCE that significantly reduces VRAM usage in large-vocabulary settings. Experimental results demonstrate that these innovations substantially improve both the training efficiency and the image fidelity of masked discrete image generators, achieving scores of 0.90 on GenEval, 86.9 on DPG, and 10.76 on HPSv3.
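The abstract describes GCE only at a high level. One plausible reading, sketched here purely as an assumption, is a cross-entropy against a soft target that keeps most of the mass (`alpha`) on the ground-truth token and spreads the rest evenly over its `k` nearest codebook neighbors by squared Euclidean distance. The neighbor rule, `alpha`, and `k` are hypothetical, and the fused VRAM-saving operator is not reproduced.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def neighbor_ids(codebook: List[List[float]], target: int, k: int) -> List[int]:
    """Indices of the k codebook entries closest to the target (squared L2)."""
    tv = codebook[target]
    cand = [(sum((a - b) ** 2 for a, b in zip(e, tv)), i)
            for i, e in enumerate(codebook) if i != target]
    return [i for _, i in sorted(cand)[:k]]


def grouped_cross_entropy(logits: List[float], codebook: List[List[float]],
                          target: int, k: int = 2, alpha: float = 0.8) -> float:
    """Cross-entropy against a soft target: `alpha` on the ground truth,
    the remainder split evenly over its k nearest neighbors (assumed rule)."""
    neighbors = neighbor_ids(codebook, target, k)
    q = [0.0] * len(logits)
    q[target] = alpha if neighbors else 1.0
    for i in neighbors:
        q[i] += (1.0 - alpha) / len(neighbors)
    logp = log_softmax(logits)
    return -sum(qi * lp for qi, lp in zip(q, logp) if qi > 0)


# Hypothetical 4-entry codebook of 2-D token embeddings
codebook = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.2]]
logits = [2.0, 1.0, 0.5, -1.0]
loss = grouped_cross_entropy(logits, codebook, target=0, k=2)
```

With `k=0` the loss reduces to ordinary cross-entropy on the ground-truth token, which is a convenient sanity check.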

5
PolicyGuard: A Dialogue-Grounded Sub-Agent Verifier for Policy Adherence in LLM Agents

LLM agents handle user requests on behalf of organizations through tool calls and must follow the company policies stated in their system prompts. Prior work treats this as a safeguarding problem: external checks that block non-compliant agent actions. We argue that policy adherence is a broader problem: real workflows unfold across many turns, require explicit user confirmation and prerequisite reads, and hinge on the content of the dialogue rather than on any single argument value. Meeting this bar requires (i) full conversation context, (ii) self-reasoning over the policy and the current dialogue, and (iii) conversation-specific remediation that guides the agent's next turn, three capabilities that prior safeguard work has often underestimated. We introduce PolicyGuard, a sub-agent verifier that shares the agent's view of the dialogue, reasons over the policy in context, and provides actionable feedback for the agent's next turn. On the τ²-bench airline domain, with agents from three vendors (GPT-5.4, Claude Sonnet 4.6, Gemini 2.5 Pro) and four trials per setting, PolicyGuard improves pass^4 by +12.0, +6.0, and +12.0 pp, respectively. Per-call analyses show that PolicyGuard achieves higher policy-violation recall while blocking roughly half as often as argument-level guards.
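Assuming the reported metric is the pass^k reliability measure used in τ-bench, it estimates the probability that all k i.i.d. trials of a task succeed, averaged over tasks: pass^k = E_task[C(c, k) / C(n, k)], where c of n trials succeed. With four trials and k = 4, this is simply the share of tasks solved in every trial. The success counts below are made up for illustration.

```python
from math import comb
from typing import List


def pass_hat_k(successes: List[int], n: int, k: int) -> float:
    """pass^k estimate: mean over tasks of C(c, k) / C(n, k),
    where c is the number of successful trials out of n for that task."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return sum(comb(c, k) / comb(n, k) for c in successes) / len(successes)


# Hypothetical per-task success counts out of n = 4 trials
counts = [4, 4, 3, 0]
print(pass_hat_k(counts, n=4, k=4))  # 0.5: only two tasks succeed in all four trials
print(pass_hat_k(counts, n=4, k=1))  # 0.6875: the plain average success rate
```

Under this metric, a +12.0 pp gain means the pass^4 fraction rose by 0.12 in absolute terms; it rewards consistency across trials, not just occasional success.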

5
Walking in the Implicit: Interactive World Exploration via Neural Scene Representation

Interactive video generation systems for camera-controlled world exploration roll out growing sequences of latent video frames, entangling state transition with high-frequency observation synthesis. We propose Walking in the Implicit, a scene-centric paradigm that changes the rollout variable from frame latents to a fixed-length, renderable implicit state, termed Neural Implicit Scene (NIS). This factorizes interactive generation into stochastic transition of a compact scene state and deterministic pose-conditioned rendering given the sampled state. We instantiate this paradigm as NeuWorld: a transformer VAE learns locally anchored NIS from sparse posed frames, and a diffusion transformer evolves NIS conditioned on future camera trajectories and geometry-aware retrieved history. By reusing the VAE encoder as a unified conditioner, NeuWorld maps camera, reference-image, and history cues into the same NIS modality, avoiding external heterogeneous encoders. Trained from scratch on public posed-view data without pretrained video backbones or auxiliary 3D reconstructors, NeuWorld achieves strong long-horizon consistency with favorable inference efficiency.

4
Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction

Predicting human item difficulty is central to educational assessment, where reliable estimates support fairness and effective test construction. Existing methods often depend on costly human calibration or item-level textual representations, providing limited evidence about the cognitive processes that make items difficult. We argue that difficulty should be viewed not only as a property of item text, but also as an observable consequence of the problem-solving burden an item induces. Large Reasoning Models (LRMs) offer scalable process evidence through reasoning traces, but such evidence must be structured to support interpretable modeling. To this end, we introduce Epi2Diff (Episode to Difficulty), a framework that maps LRM reasoning traces into cognitively grounded episode sequences. These episodes group trace segments into functional problem-solving states, enabling difficulty to be modeled through reasoning scale, effort allocation, and state transitions. Epi2Diff extracts compact episode-dynamic features and combines them with semantic item representations for human difficulty prediction. Experiments on four real-world human difficulty datasets show that Epi2Diff consistently outperforms strong baselines, including fine-tuned small language models, LLM in-context learning, and supervised LLM adaptation. On SAT-derived classification benchmarks, Epi2Diff achieves an 8.1% average relative gain over supervised LLM fine-tuning baselines. Further analyses show that harder items induce more effortful, iterative, and implementation-centered episode dynamics, rather than merely longer responses. These results demonstrate that cognitive episodes in LRM reasoning traces provide a predictive and interpretable process representation for human item difficulty, offering a new lens for educational measurement with reasoning models.

4
ZooClaw-FashionSigLIP2: Distilled Fine-tuning for Robust Fashion Retrieval

Adapting a foundation vision-language encoder to a specialized retrieval task creates a fundamental tradeoff: gains on the target distribution come at the cost of the foundation model's broad generalization, and fashion retrieval is a stringent instance of this problem. We present ZooClaw-FashionSigLIP2, a fashion-specialized SigLIP2-base model that resolves this tradeoff with a simple recipe: full fine-tuning with knowledge distillation on curated in-domain data, followed by WiSE-FT (Wortsman et al., 2022) weight interpolation with the base model. This recipe outperforms LoRA, larger backbones (up to 1B parameters), and the addition of external training data. Under fair evaluation, ZooClaw-FashionSigLIP2 outperforms all baselines on every benchmark in our suite. In addition, we release ZooClaw-Fashion, a new high-quality fashion retrieval benchmark, together with a systematic quality analysis of widely used benchmarks that exposes and mitigates structural biases in their public ground truth. We open-source the model weights and all evaluation artifacts to facilitate future research.
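WiSE-FT ensembles in weight space: every parameter is interpolated linearly, θ_α = (1 − α)·θ_base + α·θ_ft, so α = 0 recovers the base model and α = 1 the fine-tuned one. The sketch below applies that rule to parameter dictionaries of plain Python lists; the dictionary layout, the parameter name `proj.weight`, and the restriction of α to [0, 1] are assumptions, and the distillation stage that precedes the interpolation is not shown.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def wise_ft(base: Params, finetuned: Params, alpha: float) -> Params:
    """Interpolate every parameter: (1 - alpha) * base + alpha * finetuned."""
    if base.keys() != finetuned.keys():
        raise ValueError("parameter names differ")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    merged = {}
    for name, b in base.items():
        f = finetuned[name]
        if len(b) != len(f):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [(1.0 - alpha) * bi + alpha * fi for bi, fi in zip(b, f)]
    return merged


base = {"proj.weight": [0.0, 2.0]}
tuned = {"proj.weight": [1.0, 0.0]}
print(wise_ft(base, tuned, 0.5))  # {'proj.weight': [0.5, 1.0]}
```

In practice α is chosen on held-out data to balance in-domain gains against the base model's broader generalization, which is the tradeoff the abstract describes.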

3
Focusing on What Matters: Saliency-Harnessing Accurate Routing for Diffusion MoE

Mixture-of-Experts (MoE) architectures have emerged as a powerful paradigm for scaling diffusion models in visual generation. Recent advances have focused on adaptively allocating computational resources across diverse tokens to improve efficiency and performance. However, we identify a routing assignment problem in existing diffusion MoE frameworks: the router fails to allocate more computational resources to salient tokens. Our analysis attributes this failure to the router's reliance on noise-corrupted latent features throughout the denoising process. Such stochastic noise obscures critical structural and textural information, preventing the router from distinguishing salient tokens. To address this, we propose SharpMoE, a post-training framework with a saliency-harnessing accurate routing mechanism, which uses clean latent features as a noise-free guidance signal for routing. By bypassing the noise-distorted inputs, SharpMoE provides the router with clear saliency guidance, enabling it to identify salient tokens even in high-noise stages. Furthermore, we introduce a trajectory routing loss that constrains compute allocation throughout the multi-step denoising trajectory, ensuring precise resource allocation along the generation rollout. Extensive experiments demonstrate that SharpMoE serves as a versatile, plug-and-play solution that further enhances pretrained, converged MoE models, achieving state-of-the-art performance in visual generation.
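Two ingredients of the idea above can be sketched under stated assumptions. First, a clean-latent estimate can be recovered from a noisy latent with the standard DDPM identity x̂₀ = (x_t − √(1 − ᾱ_t)·ε̂) / √ᾱ_t; the abstract does not say whether SharpMoE uses such an estimate or ground-truth clean latents during post-training. Second, expert-choice routing, in which each expert picks its top-`capacity` tokens, lets salient tokens receive more experts. The router form, the token and expert vectors, and all names are hypothetical, and the trajectory routing loss is not shown.

```python
import math
from typing import List

Vec = List[float]


def estimate_clean(x_t: Vec, eps_hat: Vec, alpha_bar: float) -> Vec:
    """DDPM identity: x0_hat = (x_t - sqrt(1 - alpha_bar) * eps_hat) / sqrt(alpha_bar)."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(x_t, eps_hat)]


def expert_choice_route(tokens: List[Vec], experts: List[Vec], capacity: int) -> List[List[int]]:
    """Each expert scores all tokens (dot product) and takes its top-`capacity` tokens.
    Returns, per token, the experts that selected it; more experts means more compute."""
    assignments: List[List[int]] = [[] for _ in tokens]
    for e, w in enumerate(experts):
        scores = [sum(wi * xi for wi, xi in zip(w, tok)) for tok in tokens]
        chosen = sorted(range(len(tokens)), key=lambda t: scores[t], reverse=True)[:capacity]
        for t in chosen:
            assignments[t].append(e)
    return assignments


# Route on (estimated) clean features rather than noisy latents; values are hypothetical.
clean_tokens = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
experts = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.5]]
print(expert_choice_route(clean_tokens, experts, capacity=1))  # [[0, 2], [1], []]
```

Here the first token is picked by two experts and the low-magnitude third token by none, which is the kind of uneven, saliency-driven allocation the abstract argues noisy inputs obscure.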

3
Beyond Drug Discovery: The Nanotechnology Molecular Optimization (NMO) Benchmark

Generative molecular design is shaped by simple proxy benchmarks for drug-like properties and by models pretrained on large pharmaceutical datasets. This combination yields strong benchmark metrics but limits transferability to domains structurally distinct from drug discovery. To overcome this limitation and drive discovery toward real, scientifically grounded targets, we introduce the Nanotechnology Molecular Optimization (NMO) Benchmark, which bridges machine learning (ML) and quantum materials science. NMO acts simultaneously as a rigorous testbed for the ML community and a discovery engine for nanotechnology research. The suite replaces proxy oracles with quantum simulations and introduces strict protocols that prioritize scientific utility over leaderboard-oriented overfitting. The physics-based NMO tasks impose hard structural constraints and rugged fitness landscapes, posing fundamentally new requirements on generative models. Notably, advanced molecular optimization methods underperform much simpler approaches on the NMO tasks. We develop a new baseline method that identifies the critical components for solving the NMO tasks, including a novel representation for modeling structural constraints and a domain-agnostic pretraining strategy that eliminates pharmaceutical dataset bias. Our results surpass the state of the art on the target physical properties and reveal previously unknown structural motifs, offering new insights for the nanotechnology community and demonstrating that ML can drive genuine scientific discovery.

2
TheoremGraph: Bridging Formal and Informal Mathematics

Mathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly: informal papers cite mostly at the document level, while formal libraries record fine-grained dependencies over a much smaller body of mathematics. We introduce TheoremGraph, a unified statement-level dependency graph spanning both informal and formal mathematics. On the informal side, we parse 11.7M theorem-like environments from mathematics arXiv and recover 18.3M candidate directed dependencies, each labeled by the extractor that proposed it so downstream users can trade coverage for precision. On the formal side, we release LeanGraph, a Lean 4 elaborator-level extractor producing 388,105 declaration nodes and 11.3M typed edges across 25 Lean projects. We bridge the two graphs by embedding generated natural-language slogans into a shared semantic space, linking related statements across papers and across the informal/formal divide; an LLM judge affirms 47,952 such matches above a 0.8 cosine floor, with the judge-acceptance rate rising from 48% across the floor to 87% in the >=0.9 tier. On formal concept retrieval, our name-and-signature representation with graph expansion comes within 0.5pp of LeanSearch v2's reranked Recall@10 (0.775 vs. 0.780) without an LM reranker. We release the dataset, extractors, HTTP API, and MCP interface as infrastructure for mathematical search, attribution, and retrieval-augmented reasoning, available at theoremsearch.com and huggingface.co/datasets/uw-math-ai/theorem-matching.
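The cross-divide linking step reduces to nearest-neighbor search with a similarity floor. A minimal sketch, assuming slogan embeddings are already computed: it keeps informal/formal pairs whose cosine similarity is at least 0.8 and buckets them into the two tiers reported above. The statement names and embedding vectors are hypothetical, and the LLM-judge verification is not modeled.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def candidate_links(informal: Dict[str, List[float]], formal: Dict[str, List[float]],
                    floor: float = 0.8) -> List[Tuple[str, str, float]]:
    """All informal/formal pairs at or above the cosine floor, best first."""
    links = [(i, f, cosine(ui, vf))
             for i, ui in informal.items() for f, vf in formal.items()]
    return sorted((link for link in links if link[2] >= floor), key=lambda link: -link[2])


def tier(sim: float) -> str:
    return ">=0.9" if sim >= 0.9 else "0.8-0.9"


informal = {"arxiv:thm-a": [1.0, 0.0]}
formal = {"Lean.x": [1.0, 0.1], "Lean.y": [0.0, 1.0], "Lean.z": [1.0, 0.62]}
for i, f, s in candidate_links(informal, formal):
    print(i, f, round(s, 3), tier(s))
```

At the scale reported (millions of statements), the brute-force double loop would be replaced by an approximate nearest-neighbor index; the floor and tier logic stay the same.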

2
The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction

4D hand motion reconstruction from egocentric video is bottlenecked by clear limitations of existing methods: image-based pipelines depend on a detector that fails under heavy occlusion, while video-based methods rely on temporal modules learned only from scarce hand-pose annotations, a narrow signal insufficient to model motion dynamics, occlusion reasoning, and hand-object interaction. These capabilities, however, are exactly what video generative models must implicitly acquire when trained to synthesize coherent video at internet scale. Motivated by this, we present ViDiHand, which leverages the representations of a pretrained video diffusion model to reconstruct 4D two-hand pose. We adapt it via a hand-overlay rendering objective that specializes its features for hands while preserving its world priors. A decoder then recovers metric-scale pose from the adapted features. The whole pipeline operates directly on full frames, with no detector, no infiller, and no test-time optimization. On ARCTIC, HOT3D, and HOI4D, ViDiHand substantially outperforms prior methods, establishing video diffusion models as a powerful new foundation for hand motion reconstruction and a promising route to scalable in-the-wild data collection for embodied AI. Project page: https://vidihand.github.io.

2
SafePyramid: A Hierarchical Benchmark for In-context Policy Guardrailing

In real-world applications, guardrails are often expected to identify unsafe user-model interactions according to application-specific safety policies, rather than relying on predefined risk taxonomies. In this work, we study this setting under the paradigm of in-context policy guardrailing, where guardrails predict safety violations based on policy specifications provided in context. To systematically evaluate this capability, we introduce SafePyramid, a safety benchmark comprising 1,000 multi-turn conversations across 10 domains and 3,000 corresponding application-specific policies, which together contain 61,699 distinct natural-language rules. SafePyramid organizes the evaluation into three difficulty levels: L0 evaluates individual-rule understanding, L1 evaluates reasoning over rule dependencies, and L2 evaluates adaptation to full novel policy frameworks defined in context. To ensure benchmark quality, we employ a rigorous multi-stage pipeline to construct and validate the benchmark. Using SafePyramid, we evaluate 10 frontier LLMs and 5 policy-configurable guardrails and find that in-context policy guardrailing remains highly challenging: even the best-performing model, GPT-5.5, exactly identifies the full set of violated rules in only 54.0%, 35.3%, and 12.9% of cases on L0, L1, and L2, respectively. These results highlight the limitations of current guardrails and call for stronger in-context policy guardrails that can reliably execute policies, resolve rule dependencies, and adapt to novel policy frameworks.
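The headline numbers above use a strict criterion: a case counts only if the predicted set of violated rules equals the gold set exactly. A minimal sketch of that metric with hypothetical rule IDs; ignoring rule order and accepting an empty set as a valid answer are assumptions.

```python
from typing import Dict, List, Sequence


def exact_set_accuracy(predicted: Sequence[Sequence[str]], gold: Sequence[Sequence[str]]) -> float:
    """Fraction of cases whose predicted violated-rule set equals the gold set exactly."""
    if len(predicted) != len(gold):
        raise ValueError("prediction/gold length mismatch")
    hits = sum(set(p) == set(g) for p, g in zip(predicted, gold))
    return hits / len(gold)


def accuracy_by_level(levels: List[str], predicted, gold) -> Dict[str, float]:
    """Group cases by difficulty level (e.g. "L0", "L1", "L2") and score each group."""
    out = {}
    for lvl in sorted(set(levels)):
        idx = [i for i, lv in enumerate(levels) if lv == lvl]
        out[lvl] = exact_set_accuracy([predicted[i] for i in idx], [gold[i] for i in idx])
    return out


pred = [["r1", "r2"], ["r3"], []]
gold = [["r2", "r1"], ["r3", "r4"], []]
print(exact_set_accuracy(pred, gold))  # 2 of 3 cases match exactly
```

Under this criterion, missing one violated rule or flagging one extra rule both make the whole case a failure.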

2
ReasoningLens: Hierarchical Visualization and Diagnostic Auditing for Large Reasoning Models

The emergence of Large Reasoning Models has introduced exceptionally long Chain-of-Thought traces, creating a transparency burden where critical logic is often buried under massive procedural text. To address this, we present ReasoningLens, an open-source framework designed for the hierarchical visualization and diagnostic auditing of complex reasoning chains. ReasoningLens addresses information necropsy by: (1) structuring traces into interactive hierarchies that separate high-level strategy from low-level execution; (2) leveraging an agentic auditor for automated error detection and tool-augmented verification; and (3) synthesizing systemic reasoning profiles to reveal model-specific blind spots. By transforming unstructured walls of text into actionable insights, ReasoningLens provides a modular foundation for interpreting, debugging, and optimizing the next generation of reasoning-centric AI.

2
One Scene, Two Depths: Probing Geometric Ambiguity in Monocular Foundation Models

A faithful 3D world representation should account for layered geometry, where a single camera ray may contain multiple visible and geometrically valid surfaces. Monocular depth estimation, however, reduces this structure to one scalar depth per pixel. Transparent scenes make this ambiguity measurable: the same ray can pass through foreground glass and observe the background, turning the supervised target into a convention of annotation, data, and training rather than a scene-intrinsic truth. A learned predictor exposes this convention as its depth-layer preference. We introduce MultiDepth-3k (MD-3k), a sparse two-layer ordinal benchmark for measuring depth-layer preference and multi-layer spatial relationship accuracy (ML-SRA). On MD-3k, leading depth foundation models exhibit diverse layer preferences under standard RGB input, showing that the same layered geometry can be resolved differently across models. We further find that Laplacian Visual Prompting (LVP), a training-free spectral input transformation, can substantially change the reported layer for certain frozen models. The strongest RGB/LVP pair, DAv2-L, reaches 75.5% ML-SRA. These results suggest that depth foundation models may express complementary geometric hypotheses that standard RGB inference leaves unexpressed. We invite the community to rethink depth supervision and evaluation through an ambiguity-aware lens, where multiple valid 3D interpretations are treated as geometric structure to be measured, preserved, and expressed.

1
RaysUp: Ultra-light Universal Feature Upsampling via Geometry-Aware Ray Representation

Pre-trained Vision Foundation Models (VFMs) have become central to modern computer vision due to their powerful semantic representations and strong generalization ability. However, their patchified or pooled outputs are inherently low-resolution, limiting their effectiveness in tasks requiring fine-grained, pixel-level reasoning. Existing feature upsampling approaches either degrade semantic fidelity or rely on VFM-specific retraining and heavy architectures, hindering efficiency and scalability. To address these challenges, we propose RaysUp, an ultra-lightweight, task-agnostic, and VFM-agnostic feature upsampling framework that reconstructs high-resolution feature maps at arbitrary resolutions. Unlike conventional 2D interpolation or attention-based schemes, RaysUp lifts feature reconstruction into a geometry-aware ray domain. Specifically, we introduce a Spatially Decoupled Guidance Encoder for direction-aware guidance encoding, an Any-Resolution Cross-Attention mechanism for resolution-flexible reconstruction, and a novel Ray Positional Encoding (RayPE) that injects implicit 3D geometric priors via 6D Plücker ray coordinates. Finally, a Geometry-Aware Neighborhood Attention module further ensures content-adaptive bilateral aggregation while preserving geometric consistency. Extensive experiments across diverse dense prediction tasks demonstrate that RaysUp achieves state-of-the-art performance while using only 16% of the parameters of AnyUp and delivering approximately 7x faster inference. These results highlight a substantially improved accuracy-efficiency trade-off and establish RaysUp as a practical and scalable solution for universal feature upsampling. Code is available at https://github.com/MAP-RaysUp/RaysUp.
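Plücker coordinates represent a 3D line by its unit direction d and its moment m = o × d, where o is any point on the line. The sketch below computes the 6D (d, m) vector for a pixel ray of a pinhole camera. The intrinsics, the camera-to-world rotation `R`, and the camera center `c` are assumed inputs, and how RayPE turns these coordinates into a positional encoding is not specified in the abstract.

```python
import math
from typing import List

Vec3 = List[float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def plucker(origin: Vec3, direction: Vec3) -> List[float]:
    """6D Plücker coordinates (d, m) with unit direction d and moment m = o x d."""
    n = math.sqrt(sum(v * v for v in direction))
    d = [v / n for v in direction]
    return d + cross(origin, d)


def pixel_ray(u: float, v: float, fx: float, fy: float, cx: float, cy: float,
              R: List[Vec3], c: Vec3) -> List[float]:
    """Back-project pixel (u, v) through a pinhole camera and return its Plücker ray.
    R is the camera-to-world rotation (row-major), c the camera center in world space."""
    d_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]
    d_world = [sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    return plucker(c, d_world)


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ray = pixel_ray(320, 240, 500.0, 500.0, 320.0, 240.0, I3, [1.0, 0.0, 0.0])
print(ray)  # [0.0, 0.0, 1.0, 0.0, -1.0, 0.0]
```

The moment is unchanged when the origin slides along the ray, so the six numbers identify the ray itself rather than a particular point on it.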

1
Illuminating Unified Multimodal Model for Free-form Interleaved Text-Image Generation

The advancement of generative AI models capable of producing text and images marks a critical step forward for multimodal intelligence, particularly for tasks that interleave both modalities. To advance this intelligence to the next stage, it is crucial for models to autonomously generate free-form interleaved text-image sequences. In this paper, we introduce ILLUME-X, an advanced unified multimodal paradigm that enables high-quality, free-form interleaved text-image generation by improving multimodal data efficiency and stabilizing the multimodal training process. ILLUME-X comprises three key components: (i) an expanded training data pipeline optimized for interleaved text-image generation, (ii) a progressive training strategy with self-adaptive objectives for free-length multimodal token sequences, and (iii) ILScore, an objective and comprehensive evaluation method for interleaved text-image sequences. Notably, ILLUME-X outperforms previous unified models across multiple interleaved text-image generation tasks such as style transfer, image decomposition, and storytelling.

1
Large-Scale Tunnel Air-Ground Collaboration With FLISP: Fast LiDAR-IMU Synchronized Path Planner

Hydropower tunnel inspection is critical for infrastructure integrity yet remains inefficient and hazardous when done manually. We propose FLISP (Fast LiDAR-IMU Synchronized Path Planner), a mapless planning framework for cooperative UGV-UAV inspection. Departing from traditional map-based paradigms, FLISP makes three core contributions: (1) a unified architecture in which a single UGV-mounted LiDAR-IMU suite drives synchronized path generation for both platforms; (2) platform-specific solvers, using an enhanced Firefly Algorithm for UGV obstacle avoidance and a dynamic iterative optimizer for UAV flight; and (3) a hierarchical refinement strategy that ensures kinematic feasibility without state estimation drift. Benchmarks in a 1.2 km operational tunnel demonstrate that FLISP circumvents structural bottlenecks of map-based methods, eliminating map rasterization overhead (Fast-LIO2 + A*) and sampling instability (LIO-SAM + RRT*). FLISP achieves a 100% success rate with 7 ms latency, representing a 7-fold speedup over grid-based baselines and a three-order-of-magnitude improvement over sampling-based baselines. Validated in operational hydropower tunnels, this approach offers a scalable solution for robotic inspection in feature-degraded linear infrastructure. A demonstration video is available at https://youtu.be/Y_ezs1PfLJ4, and the code at https://github.com/ArchibaldGuo/FLISP.git.
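For reference, the textbook Firefly Algorithm introduced by Xin-She Yang moves each firefly toward every brighter one with attractiveness β₀·exp(−γr²), plus a small random step. The sketch below minimizes a generic cost function with that update. It is the standard algorithm, not FLISP's enhanced variant, and the parameter defaults are arbitrary illustrative choices.

```python
import math
import random
from typing import Callable, List, Tuple


def firefly_minimize(f: Callable[[List[float]], float], dim: int,
                     bounds: Tuple[float, float], n: int = 15, iters: int = 60,
                     beta0: float = 1.0, gamma: float = 1.0, alpha: float = 0.2,
                     seed: int = 0) -> Tuple[List[float], float]:
    """Standard Firefly Algorithm for minimization (lower cost = brighter)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    cost = [f(x) for x in pop]
    b = min(range(n), key=lambda i: cost[i])
    best_x, best_f = pop[b][:], cost[b]
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if cost[j] < cost[i]:  # firefly j is brighter, so i moves toward it
                    r2 = sum((a - c) ** 2 for a, c in zip(pop[i], pop[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    pop[i] = [min(hi, max(lo, a + beta * (c - a) + alpha * (rng.random() - 0.5)))
                              for a, c in zip(pop[i], pop[j])]
                    cost[i] = f(pop[i])
                    if cost[i] < best_f:
                        best_x, best_f = pop[i][:], cost[i]
        alpha *= 0.97  # gradually shrink the random walk
    return best_x, best_f


def sphere(x: List[float]) -> float:
    return sum(v * v for v in x)


best_x, best_f = firefly_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

In a planner, `f` would score a candidate waypoint or steering choice by criteria such as path length and obstacle clearance; that cost design, and whatever enhancements FLISP adds, are not reproduced here.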

1
Learning Transferable Dynamics Priors from Action to World Modeling

We study action-conditioned world modeling as a scalable way to learn transferable dynamics priors for robot learning. By pretraining a model to predict how actions drive visual scene evolution, the resulting world model captures reusable interaction dynamics beyond appearance-level video generation. Concretely, we pretrain a multi-view interactive base diffusion world model, A2World, on large-scale robot manipulation data with real action annotations. We validate the learned dynamics priors from two complementary perspectives. First, we adapt A2World into a task- or scene-specialized real-world simulator, A2World-sim, whose long-horizon rollouts support simulator-based policy evaluation and scalable what-if analysis by replacing real-robot rollouts with world model rollouts. Second, starting from the same pretrained weights, we adapt A2World into a video-action joint prediction model, A2World-policy, that predicts actions under visual and instruction conditioning. Experiments across simulation benchmarks and real-robot settings demonstrate that action-conditioned world model pretraining yields transferable dynamics priors that benefit both simulator-centric and policy-centric robot learning.

1
Geometric Stability of Neural Population Codes: Regional Variation, Behavioral Relevance, and Circuit Dependence

Current models of representational reliability in neural populations focus on temporal stability: whether population centroids are preserved across sessions and days. This framing leaves a fundamental question unanswered: how reliably does the pairwise distance structure among stimuli reproduce across independent observations within a session? We argue that this property, geometric stability, constitutes an independent axis of representational analysis that existing frameworks do not capture. We formalize geometric stability as the Spearman rank correlation between split-half representational dissimilarity matrices (Shesha) and show that it is empirically dissociable from both temporal stability and decoding accuracy. Across 229 area-session observations spanning 68 brain regions in a visual discrimination task (Steinmetz et al. 2019), geometric stability predicts trial-by-trial neural-behavioral coupling (ρ = 0.18, p = 0.005) while centroid drift does not (ρ = 0.002, p = 0.976). The regional hierarchy, with striatum most stable (S = 0.44) and hippocampus least (S = 0.19), runs roughly opposite to the temporal stability hierarchy. Directionally consistent olfactory data (Bolding & Franks 2018) motivate an attractor network model in which recurrent excitatory coupling amplifies split-half RDM consistency by completing stimulus patterns from sparse feedforward input (ρ = +0.64, p = 0.010), providing a circuit-level account of how geometric stability emerges. These results establish geometric stability as a functionally relevant, circuit-dependent property of neural population codes, orthogonal to temporal drift measures and complementary to recent accounts of how recurrent connectivity balances representational stability with sequential dynamics in hippocampal circuits.
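The Shesha measure as defined above can be sketched directly: average each stimulus's trials within two random halves, build a representational dissimilarity matrix (RDM) from each half, and take the Spearman correlation of their upper triangles. Euclidean distance as the dissimilarity and a single random split are assumptions; the paper may use a different distance or average over many splits.

```python
import math
import random
from typing import List

Vec = List[float]


def ranks(v: List[float]) -> List[float]:
    """1-based ranks with ties sharing their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(a: List[float], b: List[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(a: List[float], b: List[float]) -> float:
    return pearson(ranks(a), ranks(b))


def mean_vec(vs: List[Vec]) -> Vec:
    return [sum(col) / len(vs) for col in zip(*vs)]


def rdm_upper(patterns: List[Vec]) -> List[float]:
    """Upper triangle of the Euclidean-distance RDM (assumed dissimilarity)."""
    return [math.dist(patterns[i], patterns[j])
            for i in range(len(patterns)) for j in range(i + 1, len(patterns))]


def split_half_stability(trials_by_stim: List[List[Vec]], seed: int = 0) -> float:
    """Spearman correlation between RDMs built from two random halves of trials."""
    rng = random.Random(seed)
    half_a, half_b = [], []
    for trials in trials_by_stim:
        idx = list(range(len(trials)))
        rng.shuffle(idx)
        mid = len(idx) // 2
        half_a.append(mean_vec([trials[i] for i in idx[:mid]]))
        half_b.append(mean_vec([trials[i] for i in idx[mid:]]))
    return spearman(rdm_upper(half_a), rdm_upper(half_b))


# Hypothetical noiseless responses: 4 stimuli, 4 identical trials each
patterns = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [4.0, 4.0]]
score = split_half_stability([[p] * 4 for p in patterns])  # 1 up to rounding: halves agree
```

Because only rank order enters, the score reflects whether the distance structure among stimuli reproduces, independent of where the population centroids sit, which is what separates it from centroid-drift measures.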

1
One Forward Beats Two: InnerZoom for Accurate and Efficient GUI Grounding

MLLM-based GUI grounding methods commonly formulate target localization as autoregressive coordinate generation, enabling models to leverage the strong instruction-following and semantic understanding capabilities of MLLMs. However, this formulation requires the model to retain region-level target evidence while decoding coordinate tokens with the spatial precision demanded by GUI clicking. Our diagnostic analysis reveals that target-region awareness emerges in intermediate decoder layers but is neither retained nor translated into the final coordinate prediction. Existing ZoomIn-style methods address this issue through an external crop-and-rerun pass, which improves localization but increases end-to-end latency and computational cost. To retain the accuracy benefits of two-pass zooming without this extra cost, we propose InnerZoom, a single-forward framework for cross-layer evidence bridging. InnerZoom transforms target-related cues from the original forward pass into a compact cross-layer evidence state, then preserves, refines, and reinjects this state throughout later decoding layers to guide coordinate prediction. Extensive experimental results suggest that InnerZoom-4B achieves state-of-the-art performance on all six GUI grounding benchmarks, obtaining 64.7 on OSWorld-G, 40.2 on UI-Vision, 73.1 on OSWorld-GR, and 87.6 on MMBench-GUI, surpassing the previous best results by 4.1, 3.2, 2.9, and 2.3 points, respectively. Under a controlled 4B setting, InnerZoom improves the same SFT+RL baseline by 5.3 points on average and outperforms two-pass ZoomIn by 1.3 points on average, while reducing end-to-end latency by up to 31.8% and TFLOPs by about 29%. Code and models will be publicly available.

1
PoseShield: Neural Collision Fields for Human Self-Collision Resolution

Self-collision remains a persistent challenge in SMPL-based human pose estimation and motion generation. Under extreme articulations or stochastic motion synthesis, generated meshes frequently exhibit self-penetrations, leading to physically implausible results. We propose PoseShield, a neural collision constraint defined directly in SMPL pose space. We formulate collision correction as a constrained optimization problem and connect the learned constraint with the Eikonal equation. Enforcing Eikonal regularization ensures non-vanishing gradients near the collision boundary, improving numerical stability and robustness of the optimization process. Unlike prior methods that operate in the mesh space or rely on heuristic penalties, our approach operates directly in the low-dimensional space of human poses and is theoretically grounded. The same learned constraint extends to human motion sequences, providing a generator-agnostic post-hoc collision corrector without retraining the underlying motion model. Experiments on a newly constructed SMPL pose benchmark show that our method achieves a 95.8% success rate and outperforms state-of-the-art baselines.
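A toy version of the two ideas above, under heavy assumptions: a constraint function f with f ≥ 0 meaning collision-free, an Eikonal penalty (‖∇f‖ − 1)² that keeps gradients from vanishing, and a correction step that pushes a colliding point onto the constraint boundary. A hand-written 2-D signed distance function, `toy_sdf`, stands in for PoseShield's learned network over SMPL pose parameters; `margin` and the Newton-style step are hypothetical choices.

```python
import math
from typing import Callable, List

Vec = List[float]


def num_grad(f: Callable[[Vec], float], x: Vec, h: float = 1e-5) -> Vec:
    """Central finite-difference gradient (a learned model would use autograd)."""
    g = []
    for i in range(len(x)):
        xp, xm = x[:], x[:]
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def eikonal_penalty(f: Callable[[Vec], float], x: Vec) -> float:
    """(||grad f|| - 1)^2: zero when f behaves like a distance function at x."""
    g = num_grad(f, x)
    return (math.sqrt(sum(v * v for v in g)) - 1.0) ** 2


def resolve_collision(f: Callable[[Vec], float], x: Vec,
                      margin: float = 1e-3, max_steps: int = 50) -> Vec:
    """Push x toward the level set f = margin with Newton-style steps along grad f."""
    x = x[:]
    for _ in range(max_steps):
        val = f(x)
        if val >= margin:
            break
        g = num_grad(f, x)
        gn2 = sum(v * v for v in g)
        if gn2 == 0.0:  # no usable direction (e.g. at a symmetry point)
            break
        x = [xi + (margin - val) * gi / gn2 for xi, gi in zip(x, g)]
    return x


def toy_sdf(x: Vec) -> float:
    """Hypothetical stand-in: negative inside the unit disk ("colliding" poses)."""
    return math.hypot(x[0], x[1]) - 1.0


fixed = resolve_collision(toy_sdf, [0.5, 0.0])  # moved to just outside the disk
```

Because the toy function satisfies the Eikonal condition exactly, one step lands on the boundary; with a vanishing or distorted gradient the same step would overshoot or stall, which is the failure mode the Eikonal regularization is meant to prevent.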

1
05

PRODUCT HUNT


Product Hunt - June 30, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

v0 Design Systems 2.0

Build with your components, colors, fonts, and patterns

Load Nova

An AI co-pilot and dashboard built for dispatcher speed

Dayflow

Open source tools that help you get promoted

Foresight by Lightning Rod

Predict anything with AI

Oakamo

Your quiet space for reading articles later.

Supafax

Email-native assistant that learns how you work

Skills Marketplace by Databox

Ready-made AI analytics skills for your business data

Bilt.me - Figma

Get a real mobile app from your Figma design

iVox

The first app dedicated to 1980s tape-edit effects.

Midway Chat

Real-time member chat for Memberstack and Webflow sites

DropK

The tray that doesn't pretend

Tinkerfont

Free font playground for live websites

Brain2Qwerty v2

Decode sentences directly from non-invasive brain signals

Justwrite

A private, local-first writing space that works offline

AgentPeek

Claude Code & Codex in your Mac notch

Clade

AI COO that runs your team in tools you already use

Pluno

Browser agent that’s 10x faster than Claude

Cursor for iOS

Build with coding agents from anywhere

Akiflow

Manage tasks and calendars from Claude, ChatGPT or Cursor

VisibAI

Are you in AI answers? Find out and fix it in minutes

Agent Mode by Receiptor AI

Bookkeeping assistant that runs receipt workflows end-to-end

Outpaint - Ad Reframe

AI to turn vertical UGC into widescreen ads

0
ReadHere icon
ReadHere

Lightweight PDF & EPUB reader in your browser

0
Upstream FTP icon
Upstream FTP

A fast, beautiful, and native FTP/SFTP client for macOS

0
Intelli icon
Intelli

Convert leads into customers with AI conversations

0
PMB icon
PMB

Stop re-explaining your project to AI coding agents

0
Crest icon
Crest

System stats and translation on your Mac's notch

0
Spira for Product Hunt Makers icon
Spira for Product Hunt Makers

Social media growth agents that build your momentum

0
Sami icon
Sami

Automate ad budgets across Google, LinkedIn & Meta ads

0
ClinePass icon
ClinePass

Run the best open-weights models in Cline

0
discode.ai icon
discode.ai

100+ AI models, one interface. ECO friendly.

0
Persona.js icon
Persona.js

Add WebMCP-native AI chat to any Frontend

0
GetCompress icon
GetCompress

Lossless media compression without context switching

0
Lyto icon
Lyto

"One AI agent across your browser, tools, and messages "

0
Dotient icon
Dotient

Your local semantic search app

0
RetroMac icon
RetroMac

Turn your Mac into a time machine.

0
Nada icon
Nada

Compose music with just your voice

0
Supra Player icon
Supra Player

Compare & Sync Videos Fast

0
QApilot's CoWork icon
QApilot's CoWork

3x Mobile Automation. Same QE Team.

0
Cloud World Model icon
Cloud World Model

Simulate AWS, GCP & DigitalOcean without paying the bill

0
Epilogue. Write novels, scripts & poetry icon
Epilogue. Write novels, scripts & poetry

The professional book writing app built for serious authors

0
Folio AI icon
Folio AI

Claude for PowerPoint, on steroids

0
Gemini Spark icon
Gemini Spark

Your 24/7 personal AI agent

0
LockIn MCP icon
LockIn MCP

Let AI block distractions for you when you need to lock in

0
Atlas icon
Atlas

Every AI tool you use should know how your company works

0
ModuleX icon
ModuleX

AI workspace that’s already connected to everything

0
Aurora Notch icon
Aurora Notch

A private notch workspace for every Mac

0
SquidHub icon
SquidHub

Multiplayer mode for humans and AI

0
Agent Arena icon
Agent Arena

The first public arena for AI agents

0
Sleek Analytics icon
Sleek Analytics

See who's on your site. Right now.

0
06

TECHMEME

06.00
TECHMEME

Techmeme - June 30, 2026

Techmeme Digest: Major tech headlines and industry conversations.

An Indonesian court sentences Gojek co-founder and ex-education minister Nadiem Makarim to 10 years in prison for power abuses over a Chromebook contract (New York Times)
Source: Techmeme · Published: Jun 30, 2026

New York Times: The case against Nadiem Makarim, a co-founder of Gojek, has fueled concerns about judicial fairness in a nation where foreign investors were already growing wary.

Q&A with Grindr CEO George Arison on turning Grindr into an "AI-native company", "impos[ing]" AI on staff, facing "opposition", using AI as a CEO, and more (Jordyn Holman/New York Times)
Source: Techmeme · Published: Jun 30, 2026

Jordyn Holman / New York Times: George Arison, the gay dating app's chief executive, is aiming for all code to be eventually written …

How Spain's LaLiga uses nationwide IP blocking to combat illegal sports streams, impacting an estimated 550K+ domains, putting it in conflict with Cloudflare (Bloomberg)
Source: Techmeme · Published: Jun 30, 2026

Bloomberg: From small businesses to government agencies, legitimate websites are getting caught in the crossfire of LaLiga's fight against illegal sports streams
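Why does IP-level blocking catch legitimate sites? A minimal sketch, assuming a simplified model in which many unrelated domains resolve to the same shared address (as is common behind CDNs such as Cloudflare) and blocking an IP makes every domain on it unreachable. The function name, domain names, and addresses below are hypothetical (the IPs come from reserved documentation ranges); none of it is data from the Bloomberg report.

```python
from __future__ import annotations


def ip_block_impact(domain_to_ip: dict[str, str],
                    infringing: set[str]) -> tuple[set[str], set[str]]:
    """Return (blocked IPs, legitimate domains caught as collateral).

    Simplified model: blocking an IP makes every domain that resolves to it
    unreachable, regardless of what that domain serves.
    """
    blocked_ips = {domain_to_ip[d] for d in infringing if d in domain_to_ip}
    unreachable = {d for d, ip in domain_to_ip.items() if ip in blocked_ips}
    return blocked_ips, unreachable - infringing


if __name__ == "__main__":
    # Hypothetical DNS view: three sites share one CDN edge address.
    dns = {
        "pirate-stream.example": "192.0.2.10",
        "bakery.example": "192.0.2.10",
        "school.example": "192.0.2.10",
        "news.example": "203.0.113.5",
    }
    ips, collateral = ip_block_impact(dns, {"pirate-stream.example"})
    print(sorted(ips), sorted(collateral))
    # prints ['192.0.2.10'] ['bakery.example', 'school.example']
```

Blocking by domain name instead would spare the two bystanders in this toy model, which is the crux of the dispute with Cloudflare.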

As OpenAI and Anthropic prepare to go public, San Francisco tech workers making six figures say they cannot compete with the new AI elite and may have to leave (Emmy Martin/New York Times)
Source: Techmeme · Published: Jun 30, 2026

Emmy Martin / New York Times: As OpenAI and Anthropic prepare to go public, tech workers making six figures are grousing that they cannot compete with the new A.I. elite.

Meituan open-sources LongCat-2.0, a 1.6T-parameter model that it says was trained on a 50K-chip cluster of domestic Chinese processors, without giving details (Reuters)
Source: Techmeme · Published: Jun 30, 2026

Reuters: China's food delivery giant Meituan (3690.HK) said on Tuesday it had released and would open-source its next-generation LongCat …

AI and vibe coding fuel a surge in game releases; ATTN Economy says 181K mobile games launched in six months to May, up 118% on iOS and 73% on Android YoY (Orlando Crowcroft/Financial Times)
Source: Techmeme · Published: Jun 30, 2026

Orlando Crowcroft / Financial Times: More instinctive technology is accelerating production amid concerns it risks losing gamers' trust

The UK CMA proposes requiring Apple and Google to relax UK developer payment "steering" rules, and Apple to open NFC; Google says it "already made the changes" (Sam Tabahriti/Reuters)
Source: Techmeme · Published: Jun 30, 2026

Sam Tabahriti / Reuters: Britain's competition regulator on Tuesday proposed allowing app developers to steer users …

Hotels, tour operators, and travel agencies rush to launch proprietary online tools and loyalty schemes to fend off future competition from AI travel agents (Stephanie Stacey/Financial Times)
Source: Techmeme · Published: Jun 30, 2026

Stephanie Stacey / Financial Times: Chatbots could help to find, filter and book customers' destinations. Hotels, tour operators and travel agencies are rushing …

AI may help transform air traffic control by processing vast amounts of data, spotting collision risks early, and easing staff shortages as air travel grows (Peter Campbell/Financial Times)
Source: Techmeme · Published: Jun 30, 2026

Peter Campbell / Financial Times: The technology has potential to assist controllers with an increasing flight load but many are wary

Dubai- and London-based 1001, which uses AI to improve aviation, port, and energy infrastructure efficiency in the Gulf, raised $30M led by Lux Capital (Matthew Martin/Semafor)
Source: Techmeme · Published: Jun 30, 2026

Matthew Martin / Semafor (THE SCOOP): Dubai- and London-based startup 1001 has raised $30 million from investors including US venture firm Lux Capital …

US political campaign managers and consultants are using AI to analyze voter data, create campaign materials, and more; survey: 87% of campaigners use AI daily (Stuart A. Thompson/New York Times)
Source: Techmeme · Published: Jun 30, 2026

Stuart A. Thompson / New York Times: A.I.-generated images are the public face of this election overhaul. Behind the scenes, campaigns are using the technology …

Five Chinese tech and advanced manufacturing companies launch Hong Kong listings, seeking to raise up to $5.6B, led by Apple supplier Luxshare's $3.15B offering (Reuters)
Source: Techmeme · Published: Jun 30, 2026

Reuters: Five Chinese technology and advanced manufacturing companies launched Hong Kong listings on Tuesday to raise up to HK$44.1 billion …

Sources: Chinese smartphone makers like Xiaomi, Oppo, and Vivo told suppliers they will again cut 2026 shipment targets, with Xiaomi cutting 30% to ~95M units (Nikkei Asia)
Source: Techmeme · Published: Jun 30, 2026

Nikkei Asia (Taipei): Major Chinese smartphone makers including Xiaomi, Oppo and Vivo have told suppliers they will again cut shipment targets …

Source: the Trump administration has talked with SpaceX about donating shares to Trump Accounts, which provide children with tax-advantaged savings accounts (Semafor)
Source: Techmeme · Published: Jun 30, 2026

Semafor (THE SCOOP): The Trump administration has spoken with SpaceX about donating stock to the children's savings accounts known …

Nebex, a fintech startup aiming to act as a broker platform connecting US space tech suppliers, foreign governments, and investors, raised a $30M seed led by GV (Bloomberg)
Source: Techmeme · Published: Jun 30, 2026

Bloomberg: Alphabet Inc.'s GV venture arm led a funding round to provide $30 million in seed money to a fledgling space fintech company founded …

07

STARTUP ARCHIVE

07.00
STARTUP ARCHIVE

Startup News - June 30, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction.”

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

“Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In a video clip, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot.”

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice.”

Paul Graham on why starting with a “small, intense fire” is the key to startup growth
Source: Startup · Published: Mar 20, 2026

“You have to know who those first users are and how you're going to get them.”

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth.”

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?”

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

“There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at.”

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations”

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

“If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well.”

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over.”

Spencer Rascoff: “I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

“If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive.”

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right”

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT

08.00
SOLIDOT

Solidot News - June 30, 2026

Solidot Feed: Highlighting essential tech & open-source news.

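Chinese universities stop enrolling students in many language majors

A survey published last month by MyCOS Research found that, according to the latest lists of discontinued majors published by 70 undergraduate universities, 8 universities have stopped admitting students to Japanese, 5 to German, and 5 to translation studies. MyCOS said that for many years foreign-language majors were a major direction of university expansion, but with changes in the international exchange environment and the rapid development of AI translation tools, the training model for traditional language majors has begun to change. How should universities respond to the impact of AI? Chinese universities need government approval to add new majors. According to reports, the Ministry of Education approved 38 new majors for the next academic year, most of them focused on technology or digital fields. The new majors include embodied intelligence, business AI, and data intelligence, as well as low-altitude economy and management, semiconductor equipment engineering, and rare-earth science and engineering.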

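BIS warns that a bursting AI bubble would raise the risk of a global recession

The Bank for International Settlements (BIS) published its annual economic report, warning that a bursting AI bubble would increase the risk of a global recession. The report notes that although AI investment can boost productivity, a reversal once overinvestment ebbs could throw the financial system into turmoil. According to earlier reports, Amazon expects capital expenditure of $200 billion in 2026, Microsoft $190 billion, Google about $180 billion, and Meta $140 billion. Oracle is also investing heavily. The five largest data center operators' AI-related capital spending will exceed $1 trillion in 2026. The report states: “Investment commitments have grown faster than these companies' profits and free cash flow, forcing some of them to issue bonds to raise additional funds. Part of the reason for this investment race may be the belief that only a few firms with superior technology will ultimately dominate the market.” It adds: “Disappointing returns could trigger a sudden contraction in financing, turning the capex boom into a protracted investment bust, with possible knock-on effects on financial conditions.” The report also notes that problems such as power supply, chip shortages, and grid-connection bottlenecks have raised concerns about “supply-side obstacles.”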

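Abandoned goldfish damage ecosystems

A new study finds that pet goldfish released or escaped into the wild have a major impact on freshwater ecosystems. The study used large outdoor freshwater mesocosms designed to mimic real lake environments. The researchers introduced goldfish into these experimental ecosystems and observed their effects on different types of lakes over a long period. The team examined two common freshwater environments: nutrient-poor (oligotrophic) waters and nutrient-rich (eutrophic) waters. In both, goldfish caused substantial ecological damage. One of the most important findings was a rapid deterioration in water quality. In the nutrient-rich systems, goldfish caused a sharp drop in water clarity along with a significant increase in suspended particulate matter, indicating a major shift in the state of the ecosystem. Second, native aquatic species declined. Populations of snails, amphipods, and zooplankton fell significantly. These small organisms play key roles in healthy freshwater food webs and were hit both by predation and by habitat disturbance. Native fish were also harmed: goldfish competed with them for food and other resources, lowering their overall body condition, which scientists regard as an important indicator of long-term population health. The researchers say goldfish should be classified as a high-priority invasive species, and they recommend that natural-resource agencies focus on prevention, early detection, and control before wild populations become established.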

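Scientists find molecular-level evidence that liquid water has two structures

According to a study published in Nature Physics, scientists have found molecular-level evidence that liquid water exists in two microscopic structures. The idea that water may exist in two distinct structural states is not new. For decades scientists have speculated that liquid water consists of two interconvertible local structures: one denser and more disordered, the other less dense and more ordered. The two-state model has been used to explain many of water's anomalous properties, such as why water becomes more compressible as it cools and why its density peaks at 4°C rather than at the freezing point. But because direct molecular-level evidence has been hard to obtain, the model has remained controversial. At the core of the two-state model is a hypothesized phenomenon called the liquid-liquid phase transition. The basic idea is that in the deeply supercooled state, water separates into two macroscopically distinct liquid phases: a high-density liquid and a low-density liquid.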

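Paper on the timing of cancer treatment retracted

Earlier this year an article in a medical journal drew the attention of cancer patients and doctors worldwide with a striking conclusion: simply changing the time of day immunotherapy was given appeared to bring lung cancer patients unexpectedly large benefits. According to the results of a clinical trial conducted in China, patients who received intravenous infusions in the morning had their cancer controlled for twice as long as those infused in the afternoon. The study also reported that these patients survived nearly twice as long. Several oncologists said that in recent months they and their hospitals had received a flood of calls from patients asking whether they could switch to morning infusions. Last week Nature Medicine retracted the study, citing a series of contradictions and irregularities in the trial design and results. Problems listed in the journal's retraction notice include: records that should have been locked before the study began were modified partway through; the Chinese version of the study protocol differed from the translated version; every patient received treatment and follow-up throughout the study's first year, with no one withdrawing because of side effects, which is extremely rare in oncology research; and the timing of follow-up scans showed anomalous patterns. Other studies have also found some association between when patients receive cancer immunotherapy and their outcomes, but the reason remains unclear. Doctors say it may be that more energetic, healthier patients choose morning slots, while poorer or rural patients who live far from infusion centers, and who often have worse prognoses, may ask for afternoon slots because they spend the whole morning traveling to their appointments.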

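Samsung, SK Hynix, and Micron again accused of colluding to manipulate memory prices

On the 25th, 14 consumers and 3 small businesses filed a lawsuit in federal court in California alleging that the world's three largest memory suppliers, Samsung, SK Hynix, and Micron, have colluded since 2022 to manipulate memory prices and supply, driving memory prices up about 700% over the past four years. The plaintiffs say the three companies used the transition to HBM as a pretext to cut the supply of DDR memory: “The DDR memory oligopolists systematically coordinated the transition to HBM and the discontinuation of DDR3 and DDR4.” Apple's recent steep product price increases were what triggered the lawsuit. Although the suit is small, it could grow if the court accepts the plaintiffs' claims and formally certifies it as a class action. Bathaee Dunne, the antitrust law firm representing the plaintiffs, aims to bring a class action on behalf of all ordinary consumers and businesses that have bought products containing DRAM. Samsung Electronics and SK Hynix were previously found guilty of conspiracy in the US, which led to heavy fines and prison sentences for executives.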
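A rise of about 700% over four years is easier to compare with other price moves as an annual rate. A minimal sketch, assuming "up about 700%" means prices ended at roughly 8 times their starting level and that the increase compounded evenly each year (real memory prices moved far less smoothly). The function name is hypothetical.

```python
def implied_annual_growth(total_increase_pct: float, years: float) -> float:
    """Constant yearly growth rate that compounds to the given total increase."""
    if years <= 0:
        raise ValueError("years must be positive")
    multiple = 1.0 + total_increase_pct / 100.0  # +700% means an 8x multiple
    return multiple ** (1.0 / years) - 1.0


if __name__ == "__main__":
    rate = implied_annual_growth(700, 4)
    print(f"{rate:.1%} per year")  # prints 68.2% per year
```

Under these assumptions, a steady increase of roughly 68% a year for four years would produce the eightfold rise the plaintiffs describe.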

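US Supreme Court rules that phone location data is protected by the Fourth Amendment

The US Supreme Court has ruled that smartphone location data is protected by the Fourth Amendment. In Chatrie v. US, the Court ruled 6-3 against the government. Justice Elena Kagan, writing for the majority, held that the sensitive data obtained through geofence warrants falls within the scope of a search under the Fourth Amendment, and that people have a “reasonable expectation of privacy” even in public places: “Individuals have a reasonable expectation of privacy in their phone location records, and police requests for that information, even for a limited time frame and from a third-party technology company, infringe that constitutionally protected right.” On May 20, 2019, Okello Chatrie robbed a bank at gunpoint and fled with $195,000. Local police used a geofence warrant to make Google provide account information for every device within 150 meters of the bank during the 30 minutes before and after the robbery. One of those accounts was Chatrie's. He had opted in to Google's “Location History” feature, which recorded his location every few minutes. After pleading guilty, he was sentenced to 12 years in prison. His lawyers argued that the geofence warrant was overly broad and violated his Fourth Amendment rights. The US government argued that law enforcement obtained only a small amount of phone location information, which did not amount to a search under the Fourth Amendment and therefore did not merit the same privacy protection. The justices sided against the government. Georgetown University law professor Paul Ohm said the Court reaffirmed that police need a warrant to turn private services such as Google location tracking into tools of state surveillance.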
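The Chatrie warrant asked, in effect, for every account whose recorded location fell inside a 150-meter circle around the bank during a window of 30 minutes on either side of the robbery. The sketch below expresses that filter in Python. It is illustrative only: the record layout, coordinates, and timestamps are hypothetical and do not reflect how Google actually stores Location History. Distances use the haversine great-circle formula with a mean Earth radius of 6,371 km.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


@dataclass(frozen=True)
class LocationRecord:
    """Hypothetical location-history entry: one account at one place and time."""
    account_id: str
    lat: float
    lon: float
    timestamp: datetime


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geofence_accounts(records, center_lat: float, center_lon: float,
                      event_time: datetime, radius_m: float = 150.0,
                      window: timedelta = timedelta(minutes=30)) -> set:
    """Accounts with at least one record inside the radius and time window."""
    hits = set()
    for r in records:
        in_window = abs(r.timestamp - event_time) <= window
        if in_window and haversine_m(center_lat, center_lon, r.lat, r.lon) <= radius_m:
            hits.add(r.account_id)
    return hits


if __name__ == "__main__":
    t0 = datetime(2019, 5, 20, 16, 50)  # hypothetical time of the event
    records = [
        LocationRecord("A", 37.5, -77.5, t0 + timedelta(minutes=10)),    # inside
        LocationRecord("B", 37.51, -77.5, t0),                           # ~1.1 km away
        LocationRecord("C", 37.5, -77.5, t0 - timedelta(minutes=45)),    # too early
        LocationRecord("D", 37.501, -77.5, t0 - timedelta(minutes=20)),  # ~111 m away
    ]
    print(sorted(geofence_accounts(records, 37.5, -77.5, t0)))  # prints ['A', 'D']
```

The breadth problem raised by Chatrie's lawyers is visible here: the filter returns every account that happened to be nearby, not only suspects.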

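Rocket Lab to acquire Iridium

Launch company Rocket Lab has announced that it will acquire satellite operator Iridium. Under a definitive agreement, Rocket Lab will acquire all outstanding Iridium common shares for $54 per share in a mix of cash and stock, valuing Iridium at about $8 billion. The deal still requires approval from Iridium shareholders and regulators and is expected to close in mid-2027. Iridium currently operates a constellation of 80 satellites: 66 active and 14 spares.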

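Students around the world use smart glasses to cheat on exams

Students around the world are using AI-powered smart glasses to cheat on exams, and countries are starting to tighten checks on examinees' glasses. More than 10 million students sat China's gaokao earlier this month, and the government required every student's glasses to be inspected. The head of England's exam regulator warned that devices such as AI glasses and smart earbuds could make exam cheating worse. South Korea reported its first case of cheating with AI glasses. Experts fear these cases may signal that cheating with smart glasses is becoming a more widespread problem. Thomas Corbin, who has studied smart-glasses applications, said: “If we're seeing some cases reported, there are certainly more that haven't been.”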

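Ford rehires human engineers after AI falls short of expectations

Ford is one of many companies embracing AI and has deployed it in operations such as quality inspection. Ford CEO Jim Farley said last June that AI would leave many white-collar workers behind. Chief Operating Officer Kumar Galhotra said last October that the company was deploying AI across its industrial system. Charles Poon, Ford's vice president of hardware engineering, now says AI-driven quality inspection has fallen short of expectations. The automated tools lacked the training and knowledge of senior technicians, who, he said, had left the company before it could use their expertise to improve the technology. Those employees have been rehired to train the systems and mentor younger staff.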

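New AMS findings challenge existing models of the universe

Supernova explosions scatter carbon, nitrogen, oxygen, and other elements that make up life across the universe. After traveling through space as cosmic rays for millions of years, these particles continually strike Earth's atmosphere. New findings from data collected by the Alpha Magnetic Spectrometer (AMS-02) aboard the International Space Station shed light on these particles. Since it began operating in 2011, AMS-02 has collected more than 230 billion cosmic-ray events. The research team analyzed 20 elements between helium and iron on the periodic table and, surprisingly, found that these cosmic rays are not randomly distributed but fall into four distinct classes: two classes of “primary cosmic rays” (pristine rays from deep space that have not been broken up) and two classes of “secondary cosmic rays” (mixed particles produced by collisions with interstellar gas along the way). Whether a particle's proton number is odd or even is closely tied to its evolutionary path, showing that nucleosynthesis inside stars strongly shapes how these particles behave in space. The new findings directly challenge current cosmic-ray models. Scientists cannot yet fully explain the observations, which suggests that existing theories may be missing some key physical mechanism.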

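The end of the Boeing 747

The Boeing 747 was once a symbol of American power, invention, progress, and populism. Today it has become an emblem of the decline of all those values. From the first 747's entry into service in 1970 to the end of production in 2023, Boeing built 1,574 of the aircraft, including the two Air Force One jets still in service. Like most 20th-century technological innovations, the 747 program was driven by the military. In the early 1960s, Boeing designed a large military transport at the government's request. Lockheed won the competition and built the C-5 Galaxy. After losing, Boeing let its engineers build on that work, and they developed the largest commercial airliner. The 747 carried far more than passengers and cargo: NASA used a modified 747 to ferry the Space Shuttle to Kennedy Space Center. These two great American symbols seemed to promise that 20th-century progress would never end, but both eventually bowed out. The Space Shuttle program ended in 2011, and the 747 is gradually disappearing from the skies. Seeing a 747 in person is getting harder, especially in the US. Atlas Air and Kalitta still operate some 747s. Lufthansa operates the most 747 passenger routes, Korean Air still flies the 747 on international routes, and China, Iran, and Russia use it on bus-like domestic routes.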

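The LX2 processor used in the Lingsheng supercomputer

Top500 published its latest supercomputer list last week, and Lingsheng, at the National Supercomputing Center in Shenzhen, took first place in its debut. Lingsheng is 22% faster than the second-ranked El Capitan at Lawrence Livermore National Laboratory on the Linpack benchmark and 26% faster on HPCG. It is the first supercomputer to sustain more than 2 exaflops of double-precision floating-point performance using CPUs alone; the US systems use GPU accelerators. According to a Chips and Cheese report based on related slides and an arXiv paper, Lingsheng's LX2 CPU is based on the ARMv9.2 architecture and supports the Scalable Matrix Extension (SME) instruction set. By comparison, Japan's ARM-based Fugaku supercomputer is built on the ARMv8 architecture, which is now quite old. Each LX2 core has a 32 KB L1 instruction cache and a 32 KB L1 data cache. The chip consists of two compute dies, each containing four 40-core clusters. Two cores per cluster are disabled, leaving 38 active cores per cluster and 152 active cores per die. Each cluster has 28.5 MB of L2 cache, giving 114 MB of L2 per die; the whole LX2 package has 304 active cores and 228 MB of total L2 cache. Running at 1.55 GHz, the 304 cores give each LX2 CPU 60.3 TFLOP/s of FP64 compute at 690 W. The LX2 has eight stacks of “high-bandwidth memory” with 4 TB/s of bandwidth (another report says 4 TB/s per chiplet and 8 TB/s per socket). This so-called high-bandwidth memory may not be HBM. The Lingsheng system comprises more than 22,000 nodes and 13.79 million CPU cores.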
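The per-chip figures above can be cross-checked with a few lines of arithmetic. The derived values (FP64 operations per core per cycle, the number of CPUs in the system, and the implied theoretical peak) are our own back-of-the-envelope calculations from the quoted numbers, not figures from the Chips and Cheese report, and they inherit its rounding; in particular, 13.79 million cores is a rounded total.

```python
def lx2_checks() -> dict:
    """Recompute derived LX2 / Lingsheng figures from the numbers quoted above."""
    dies_per_package = 2
    clusters_per_die = 4
    cores_per_cluster = 40
    disabled_per_cluster = 2
    l2_per_cluster_mb = 28.5
    clock_hz = 1.55e9
    fp64_flops_per_cpu = 60.3e12
    system_cores = 13.79e6  # rounded figure

    active_per_cluster = cores_per_cluster - disabled_per_cluster   # 38
    active_per_die = active_per_cluster * clusters_per_die           # 152
    active_per_package = active_per_die * dies_per_package           # 304
    l2_per_package_mb = l2_per_cluster_mb * clusters_per_die * dies_per_package  # 228

    # FP64 operations each active core must complete per clock to reach 60.3 TFLOP/s.
    flops_per_core_cycle = fp64_flops_per_cpu / (active_per_package * clock_hz)
    cpus_in_system = system_cores / active_per_package
    implied_peak_exaflops = cpus_in_system * fp64_flops_per_cpu / 1e18

    return {
        "active_cores": active_per_package,
        "l2_mb": l2_per_package_mb,
        "flops_per_core_cycle": flops_per_core_cycle,
        "cpus_in_system": cpus_in_system,
        "implied_peak_exaflops": implied_peak_exaflops,
    }


if __name__ == "__main__":
    for name, value in lx2_checks().items():
        print(f"{name}: {value:,.2f}")
```

The core and cache counts match the report exactly. The quoted throughput works out to roughly 128 FP64 operations per core per cycle, which is plausible for wide SME units but is not stated in the source. About 45,000 CPUs fit the 13.79 million cores (roughly two per node if every node is populated alike). Their implied peak of about 2.7 exaflops sits above the sustained figure of more than 2 exaflops, as expected.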

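Gartner predicts developers' AI token costs will exceed their salaries within two years

Gartner predicts that within two years developers' AI token costs will exceed their salaries. The prediction assumes a developer salary of $2,000 a month, so it does not mean token costs will reach or exceed developer salaries everywhere; developers in the US typically earn six-figure annual salaries. Nitish Tyagi, a senior principal analyst at Gartner, noted that in extreme cases US developers' token costs could also exceed their salaries, with some developers spending tens of thousands of dollars a month on AI. Some US tech companies have already begun asking employees to limit their token usage. Tyagi said enterprises must govern and control AI token usage; otherwise the cost of AI tools may grow faster than the productivity gains they deliver.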
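Gartner's $2,000 monthly salary becomes a concrete token budget once a price is assumed. In the sketch below, the blended price of $10 per million tokens is a purely hypothetical placeholder, not a figure from Gartner or any provider; real prices vary widely by model and between input and output tokens. The function names are also hypothetical.

```python
def monthly_token_cost(tokens_per_month: float, usd_per_million_tokens: float) -> float:
    """Spend in USD for a month of usage at a flat (blended) per-token price."""
    return tokens_per_month * usd_per_million_tokens / 1_000_000


def breakeven_tokens(monthly_salary_usd: float, usd_per_million_tokens: float) -> float:
    """Tokens per month at which token spend equals the monthly salary."""
    if usd_per_million_tokens <= 0:
        raise ValueError("price must be positive")
    return monthly_salary_usd / usd_per_million_tokens * 1_000_000


if __name__ == "__main__":
    HYPOTHETICAL_PRICE = 10.0  # USD per million tokens, illustrative only
    tokens = breakeven_tokens(2_000, HYPOTHETICAL_PRICE)
    print(f"{tokens:,.0f} tokens/month")                    # prints 200,000,000 tokens/month
    print(monthly_token_cost(tokens, HYPOTHETICAL_PRICE))   # prints 2000.0
```

The break-even volume scales inversely with price: halving the assumed price doubles the number of tokens a developer can use before spending matches salary.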

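China's obesity crisis

China's population has fallen for nearly four years in a row, yet its overweight and obesity rate is rising at an accelerating pace and affecting ever-younger people, which means the healthy workforce may shrink in the future. Public data, including the Blue Book on Obesity Prevention and Control in China from the National Health Commission and the Chinese Nutrition Society, show that the adult overweight-and-obesity rate was 27.2% in 1992, rose to 29.9% in 2002, accelerated to 42.3% in 2005, reached 50.7% in 2020, and hit 57% in 2023. Research projects that if the trend is not curbed, the overweight-and-obesity rates of Chinese adults and children will reach 70.5% and 31.8%, respectively, by 2030. A report in The Lancet shows that the number of overweight and obese adults aged 25 and over in China had already reached 402 million by 2021.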

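Microsoft accused of encouraging copyright infringement by building a new supercomputer for OpenAI

In Cox Communications, Inc. v. Sony Music Entertainment, Sony and other record labels alleged that Cox was complicit in its users' infringement and should be held liable. In March the US Supreme Court sided with Cox, ruling 9-0 that Cox was not contributorily liable for its users' conduct. The case set a new standard for secondary infringement: plaintiffs must now prove that the defendant intentionally induced others to act unlawfully. The New York Times has amended its lawsuit in light of the ruling, alleging that Microsoft actively encouraged OpenAI to steal its copyrighted works. The suit claims that Microsoft's new supercomputer was custom-built specifically to help OpenAI infringe, with the goal of training AI on copyrighted works without permission, and that the system gave New York Times articles extra weight. According to the Times, by building the new supercomputer Microsoft not only helped OpenAI select infringing works but also provided a means of obtaining copyrighted works without permission.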

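Linux 7.2-rc1 released

Linus Torvalds announced the release of Linux 7.2-rc1 on the kernel mailing list. Major changes include Cache Aware Scheduling; performance optimizations and fixes for the new NTFS driver introduced in Linux 7.1; complete removal of the strncpy API; a new ARCTIC Fan Controller driver; the AMD ISP4 driver; initial support for AMDGPU HDMI 2.1 FRL; and more.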

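China's AI short-drama ecosystem

In recent months AI short dramas have become one of the hottest trends in China's film and TV industry. In this shift, creators use low-barrier AI generation tools to quickly produce short dramas with visuals, dialogue, and sound effects, drastically cutting the labor and time that traditional production requires. Breakout hits such as the period drama Huo Qubing have convinced many in the industry that there is a commercial blue ocean. According to media reports, more than 2,100 AI short-drama companies have registered in China this year. But hits remain rare: of the more than 120,000 AI short dramas released as of February this year, fewer than 150 passed 100 million views, a hit rate of roughly one in a thousand. Chen Jian, chairman of Xihongshi Pictures, said in an interview: “For companies producing AI short dramas on commission, there's not much profit left after compute and staff costs. The money basically goes to the big platforms; we're more like sweatshops.” Chen added: “Sometimes you issue 100 prompts and only one gives you the effect you want… We thought we could make money, but it turned out we were losing it, because so many revisions drove up labor and compute costs.” AI has raised productivity, but it may not change the short-drama industry's established logic of cutthroat competition and profit distribution. Many in the business find that the real money goes to platform companies and a handful of leading short-drama firms, while most small and mid-sized production companies struggle to survive.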

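Binance halts services in the EU after failing to obtain a license

Binance has told EU customers that it will stop serving them from next week because it could not obtain the license required to operate in the EU, a major setback for the world's largest cryptocurrency exchange. From July 1, every crypto company operating in the EU must be licensed under the EU's Markets in Crypto-Assets Regulation (MiCA) or face penalties. Binance had applied in Greece for a license covering the whole EU, but the application was rejected last week, less than two weeks before the deadline. The exchange now plans to apply for a license in France, where it had previously held talks about licensing.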

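Apple seeks to buy memory from CXMT

Apple is lobbying the Trump administration for permission to buy memory from ChangXin Memory Technologies (CXMT), which the Pentagon has blacklisted. The aim is to ease the financial pressure that rising memory chip prices are putting on the company. The lobbying highlights the dilemma facing US tech giants: soaring memory chip prices collide with Washington's national-security restrictions on Chinese chipmakers. CXMT was added to the US Commerce Department's Entity List in 2025; US companies may not export goods, software, or technology to listed companies without a license, and applications for such licenses are likely to be denied.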

09

APP STORE RANK

09.00
APP STORE RANK