TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0907
THU, JUN 25, 2026
OrangeBot.AI intelligently curates and filters each day's tech trends and news to save you time.
TODAY · THU, JUN 25, 2026

The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK

AI News Digest

June 25, 2026

A summary of today's main news events.

JPMorgan Announces Major Leadership Shake-Up

JPMorgan Chase named Doug Petno and Troy Rohrbaugh as co-presidents in a significant management restructuring. The move is widely seen as part of the succession plan for longtime CEO Jamie Dimon and coincides with the departure of prominent executive Marianne Lake.

Micron's Strong Earnings Report Boosts Tech Stocks

Chipmaker Micron reported better-than-expected financial results, driven by surging demand for its memory chips used in AI systems. The positive news helped calm investor fears after a recent market sell-off, lifting semiconductor stocks and Nasdaq futures.

Oil Prices Drop as Persian Gulf Supply Reopens

Global oil prices fell to levels seen before the recent conflict as a large supply of oil that was trapped in the Persian Gulf began moving through the Strait of Hormuz. This increase in available supply has eased market concerns and driven prices lower.

AI Firm Anthropic Accuses Alibaba of Illicitly Accessing Its Technology

U.S. artificial intelligence lab Anthropic has accused Chinese e-commerce giant Alibaba of illegally accessing its advanced "Claude" AI model. Anthropic claims Alibaba did this to harvest the model's capabilities, a dispute that highlights growing tensions and competition in the global AI race.

Merck Announces Its Largest Acquisition in Over a Decade

German life sciences company Merck announced the acquisition of a U.S. medical tools manufacturer. The deal is the company's biggest in more than 10 years and represents a major strategic move by its new CEO to expand its business.

Deadly Earthquakes Devastate Venezuela

A series of powerful earthquakes has struck Venezuela, resulting in widespread devastation. The country's acting president, Delcy Rodríguez, confirmed that at least 164 people have died, warning that the final death toll is expected to rise sharply.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS


Hacker News - June 25, 2026

Hacker News Feed: Highlighting key posts and discussions.

Half-Life 2 in a Browser

(hl2.slqnt.dev)

421168
Ending respiratory infections

(blog.interceptfund.com)

177100
Dostoyevsky isn't difficult

(www.autodidacts.io)

179221
The Xteink X4 E-Ink Reader

(blog.omgmog.net)

286165
Stealing Is a Skill

(ben-mini.com)

239141
03

HUGGINGFACE


HuggingFace News - June 25, 2026

HuggingFace Feed: the latest AI models, datasets, and community updates.

Are We Ready For An Agent-Native Memory System?

Memory for large language model (LLM) agents has rapidly evolved from simple retrieval-augmented mechanisms into a data management system that supports persistent information storage, retrieval, update, consolidation, and dynamic lifecycle governance throughout agent execution. Despite this evolution, existing evaluations still benchmark agent memory mainly through end-to-end task success metrics (e.g., F1, BLEU), while treating the underlying system as a monolithic black box. As a result, critical system-level concerns, including operational costs, architectural trade-offs across memory modules, and robustness under dynamic knowledge updates, remain insufficiently explored. In this paper, we present a systematic experimental study of agent memory from a data management perspective. We propose an analytical framework that decomposes agent memory into four core modules: memory representation and storage, extraction, retrieval and routing, and maintenance. Under this framework, we evaluate 12 representative memory systems and two reference baselines across five benchmark workloads spanning 11 datasets. Our extensive end-to-end evaluation shows that no single architecture dominates across all scenarios; instead, effectiveness depends heavily on how well the memory structure aligns with the workload bottleneck. Furthermore, through fine-grained ablation studies, we quantify the individual effect of each module on representation fidelity, retrieval precision, update correctness, and long-horizon stability. Finally, we reveal cost-performance trade-offs under realistic workloads, showing that localized maintenance is more cost-efficient than global reorganization. Based on these findings, we identify promising directions towards building truly agent-native memory systems. The code is publicly available at https://github.com/OpenDataBox/MemoryData.

82
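The four-module decomposition can be pictured as a single interface. The following is a minimal, hypothetical Python sketch (the class and method names are illustrative and are not taken from the MemoryData repository). It shows a toy memory in which a knowledge update rewrites only the affected entry, a "localized" maintenance step, while a global reorganization touches every entry.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgentMemory:
    """Hypothetical toy memory split into the four modules named in the abstract."""

    # 1. Representation and storage: a flat key -> fact mapping.
    store: dict[str, str] = field(default_factory=dict)

    # 2. Extraction: turn a raw "key: value" observation into a (key, fact) pair.
    def extract(self, observation: str) -> tuple[str, str]:
        key, _, value = observation.partition(":")
        return key.strip().lower(), value.strip()

    # 3. Retrieval and routing: return facts whose key appears in the query.
    def retrieve(self, query: str) -> list[str]:
        q = query.lower()
        return [v for k, v in self.store.items() if k in q]

    # 4. Maintenance (localized): only the affected entry is rewritten.
    def maintain(self, observation: str) -> int:
        key, value = self.extract(observation)
        self.store[key] = value
        return 1  # number of entries rewritten

    def global_reorganize(self) -> int:
        # Contrast case: rebuild every entry, so cost grows with store size.
        self.store = dict(sorted(self.store.items()))
        return len(self.store)


mem = ToyAgentMemory()
for obs in ["city: Paris", "employer: Acme", "pet: cat"]:
    mem.maintain(obs)
mem.maintain("city: Berlin")  # knowledge update rewrites one entry
print(mem.retrieve("Which city do I live in?"))  # ['Berlin']
```

The substring-based retrieval is deliberately naive. It is only there so that each of the four modules has a concrete, testable stand-in.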
DomainShuttle: Freeform Open Domain Subject-driven Text-to-video Generation

Open domain subject-driven text-to-video (S2V) generation has drawn significant interest in academia and industry. Open domain S2V mainly involves two scenarios: in-domain, which requires retaining the reference subject features as much as possible, and cross-domain, which preserves the intrinsic features of the subject while allowing subject-irrelevant properties to vary flexibly according to the text prompt. Existing methods primarily focus on maximizing subject fidelity in in-domain scenarios, which limits their editability and adaptability in cross-domain scenarios, such as novel styles, semantic combinations, or domain attributes. In this study, we argue that an ideal S2V method should flexibly shuttle between different domains, achieving strong performance in both in-domain and cross-domain scenarios. To this end, we propose DomainShuttle, which achieves high fidelity and generative flexibility for open domain video personalization. Specifically, we introduce Domain-MoT, which decouples video and reference features and introduces a domain-aware AdaLN for domain-specific modeling of reference images. We then introduce the Video-Reference DualRoPE scheme, which places reference image tokens and video tokens in separate RoPE spaces to enable precise subject-level spatial modeling, and Cross-Pair Consistent Loss, which aims to extract intrinsic subject features unaffected by irrelevant features. Extensive experiments demonstrate that DomainShuttle achieves significant performance improvements over existing methods, exhibiting high subject fidelity and generative flexibility across diverse open domain application scenarios.

50
ShutterMuse: Capture-Time Photography Guidance with MLLMs

Real-world photography requires capture-time guidance for both camera framing and subject pose. Yet existing aesthetic cropping benchmarks mainly evaluate post-hoc crop prediction and overlook subject-side recommendations, leaving the capture-time guidance capabilities of multimodal large language models (MLLMs) underexplored. To address this gap, we introduce CaptureGuide-Bench, a benchmark with two complementary tasks: photographer-side composition decision and refinement, and subject-side scene-conditioned pose recommendation. Our evaluation reveals limitations: general-purpose MLLMs can make composition decisions but lack precise refinement localization, while specialized aesthetic cropping models localize crops effectively but are limited to refinement; neither provides actionable pose guidance. To support model development, we further construct CaptureGuide-Dataset, comprising 130K samples with textual rationales and structured visual annotations, and develop ShutterMuse, a unified MLLM trained with supervised and reinforcement fine-tuning. Experiments on CaptureGuide-Bench show that ShutterMuse achieves the best overall photographer-side performance among evaluated baselines and competitive subject-side pose recommendation with substantially lower inference cost, demonstrating the potential of MLLMs as interactive assistants for photography during image capture.

36
Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models

We present Wan-Streamer, a native-streaming, end-to-end interactive foundation model designed from the ground up for real-time, low-latency, full-duplex audio-visual interaction. Wan-Streamer models language, audio, and video as both input and output within a single Transformer, where the sequence is represented as interleaved visual, audio, and text input tokens together with visual, audio, and text output tokens, coordinated by block-causal attention for incremental streaming. Unlike cascaded interactive systems that chain separate VAD, ASR, language, TTS, audio-driven animation, or video-generation modules, Wan-Streamer relies on no external modules: perception, reasoning, generation, response timing, turn management, and cross-modal synchronization are learned jointly within one unified model, reducing pipeline latency and error accumulation. To support natural audio-visual responsiveness, we redesign the entire stack around streamability, including causal encoders, causal decoders, block-causal attention, and low-latency multimodal token scheduling, enabling streaming units as short as 160 ms at 25 fps. Wan-Streamer achieves approximately 200 ms model-side response latency and approximately 550 ms total interaction latency when combined with 350 ms bidirectional network latency, supporting sub-second duplex audio-visual communication. These results position Wan-Streamer as a unified, end-to-end, multimodal interactive foundation model for low-latency streaming interaction.

36
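Block-causal attention, as the abstract describes it, lets every token attend to all tokens in its own streaming block and in earlier blocks, but to nothing in later blocks. Below is a minimal pure-Python sketch of such a mask. It assumes a fixed number of tokens per block; the token counts are illustrative and do not reflect Wan-Streamer's actual token layout. The frame arithmetic uses only the 160 ms and 25 fps figures from the abstract. The latency figures combine the same simple way: 200 ms model-side plus 350 ms network gives the quoted 550 ms.

```python
def block_causal_mask(num_tokens: int, block_size: int) -> list[list[bool]]:
    """mask[i][j] is True when query token i may attend to key token j.

    Tokens are grouped into consecutive blocks of block_size; attention is
    full inside a block and causal across blocks.
    """
    return [
        [j // block_size <= i // block_size for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


def frames_per_unit(unit_ms: int, fps: int) -> float:
    # A 160 ms streaming unit at 25 fps spans 0.160 * 25 = 4 video frames.
    return unit_ms * fps / 1000


mask = block_causal_mask(num_tokens=6, block_size=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# xxx...
# xxx...
# xxx...
# xxxxxx
# xxxxxx
# xxxxxx
print(frames_per_unit(160, 25))  # 4.0
```

Compared with a strictly token-causal mask, this mask lets the model process a whole block in parallel while still never looking ahead into future blocks, and that property is what makes incremental streaming possible.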
Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence

While Large Language Models (LLMs) have substantially advanced text-to-code synthesis, many real programming tasks specify intent through visual artifacts such as screenshots, charts, vector drawings, videos, and interactive states. These tasks require models to connect visual perception to executable programs, because correctness depends not only on syntax but also on layout, data semantics, interaction behavior, and domain-specific constraints that apply after execution. This survey examines Multimodal Code Intelligence, covering systems that generate, edit, refine, or reason with code under visually grounded inputs and outputs. We first formulate the field by the role that code plays in each task, distinguishing code as a rendered artifact, an editable symbolic structure, a scientific representation, an intermediate reasoning trace, or an executable policy or tool interface. We then organize benchmarks and methods into four domains: Graphical User Interface, Scientific Visualization, Structured Graphics, and Frontier Tasks and Frameworks. This taxonomy connects mature artifact-generation problems to emerging agentic and unified settings and allows us to compare how different tasks treat evidence of correctness. Looking ahead, we argue that future research may benefit from four verification-centered directions. Multi-signal validation can combine complementary evidence of correctness, multi-state verification can test behavior across execution trajectories, cross-task transfer testing can probe reusable visual-code skills, and verifiable agent traces can reveal whether agent actions are grounded in visual evidence. Together, these directions may move this field from single-output imitation toward evidence-grounded executable systems. An ongoing project and resources are available at https://github.com/xjywhu/Awesome-Multimodal-LLM-for-Code.

26
Improved Large Language Diffusion Models

Modern large language models are predominantly trained with autoregressive factorization and causal attention. We present iLLaDA, an 8B masked diffusion language model trained from scratch with fully bidirectional attention. iLLaDA keeps the masked diffusion objective throughout pre-training and supervised fine-tuning (SFT), scaling pre-training to 12T tokens and fine-tuning on a 25B-token instruction corpus for 12 epochs. We further use variable-length generation for efficiency and introduce confidence-based scoring for multiple-choice evaluation. Compared with LLaDA, iLLaDA improves broadly across general, mathematical, and code benchmarks; for example, iLLaDA-Base improves by 21.6 points on BBH and 14.9 points on ARC-Challenge, while iLLaDA-Instruct improves by 14.5 points on MATH and 16.5 points on HumanEval. Despite its non-autoregressive training, iLLaDA also remains competitive with Qwen2.5 7B on several benchmarks. These results show that fully bidirectional diffusion training from scratch is a competitive path toward strong language models. Model weights and code: https://github.com/ML-GSAI/LLaDA.

25
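The masked diffusion objective that iLLaDA keeps through pre-training and SFT follows the LLaDA recipe. Draw a masking ratio t, mask each token independently with probability t, and train the model to recover the masked tokens with a cross-entropy weighted by 1/t. The sketch below covers only the forward (masking) process and the loss estimate. It assumes the per-token log-probabilities come from some model, represented here by a hypothetical stand-in list. The mask token string and the normalization by sequence length are also illustrative assumptions.

```python
import random

MASK = "[MASK]"  # hypothetical mask token


def forward_mask(tokens: list[str], t: float, rng: random.Random) -> list[str]:
    """Forward process: mask each token independently with probability t (0 <= t <= 1)."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def masked_diffusion_loss(noisy: list[str], token_logprobs: list[float], t: float) -> float:
    """1/t-weighted negative log-likelihood over masked positions only,
    normalized by sequence length. Requires t > 0.

    token_logprobs[i] is the model's log-probability of the original token
    at position i given the noisy sequence.
    """
    masked_nll = sum(-lp for tok, lp in zip(noisy, token_logprobs) if tok == MASK)
    return masked_nll / (t * len(noisy))


rng = random.Random(0)
seq = ["the", "cat", "sat", "on", "the", "mat"]
noisy = forward_mask(seq, t=0.5, rng=rng)  # roughly half the tokens masked
fake_logprobs = [-0.7] * len(seq)          # hypothetical model outputs
loss = masked_diffusion_loss(noisy, fake_logprobs, t=0.5)
```

The 1/t weight compensates for the fact that a small t masks only a few tokens. Because of it, every noise level contributes on a comparable scale, and the attention over the noisy sequence can stay fully bidirectional.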
MVTrack4Gen: Multi-View Point Tracking as Geometric Supervision for 4D Video Generation

Synthesizing a novel-view video from a monocular reference video along a target camera trajectory requires both geometric consistency and motion fidelity with respect to the reference video. Existing methods based on explicit 3D representations are limited by the accuracy of off-the-shelf reconstruction modules, which often produce inaccurate geometry for dynamic objects in monocular videos. In contrast, camera-conditioning-only methods can achieve high visual quality but often struggle to preserve geometric and motion consistency. In this work, we introduce MVTrack4Gen (Multi-View point Tracking for Novel-View Generation), a motion-aware training framework that leverages multi-view point tracking as an additional geometric and motion supervision signal for camera-conditioning-only novel-view video diffusion models. Our key finding is that specific attention layers encode strong correspondence cues, where query features attend to key features at geometrically corresponding locations across views and over time, and the misalignment of these correspondences causes motion inconsistency. Based on this observation, we route these features into an auxiliary multi-view tracking head and jointly train the diffusion model with a point-tracking objective. By explicitly strengthening these motion-aware correspondences, MVTrack4Gen improves existing models to better follow the motion in the reference view and maintain cross-view geometric consistency. Across diverse benchmarks, our method achieves state-of-the-art geometric consistency and competitive camera accuracy.

24
V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning

Fine-grained visual reasoning requires multimodal large language models (MLLMs) to identify task-relevant visual evidence and ground their reasoning in local image regions. Existing agentic methods typically rely on reinforcement learning with verifiable rewards or supervised fine-tuning on large-scale annotated reasoning traces, leading to costly exploration, hand-designed verification rules, or heavy dependence on textual supervision. A natural way to avoid such external answer labels is to learn from trajectories sampled by the student itself, which points to On-Policy Distillation (OPD). To understand what OPD can and cannot provide for visual reasoning, we revisit it as negative-free stop-gradient alignment. This perspective shows that, although OPD provides effective token-level correction, its ceiling is constrained by the absence of trajectory-level discrimination. Motivated by these observations, we propose V-Zero, an answer-label-free framework for visual reasoning with contrastive evidence gating. V-Zero uses no annotated textual answer labels; instead, during training it pairs a question-relevant regional crop with a negative visual view to evaluate student-sampled trajectories and gate dense token-level distillation. Experiments on multiple visual reasoning benchmarks show that V-Zero consistently improves fine-grained visual reasoning while preserving strong generalization. Notably, V-Zero is more than 5× faster than previous supervised fine-tuning methods and more than 10× faster than reinforcement learning baselines. Code and dataset will be released at https://github.com/eVI-group-SCU/V-Zero

19
UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating

Generating a coherent multi-shot video requires structured cross-shot memory. Subject appearance, scene context, and speaker identity must persist across cuts. Existing approaches either train end-to-end over fixed-length sequences and cannot scale, generate shot-by-shot with memory banks that grow linearly, or orchestrate pretrained generators under an LLM planner without a multi-shot-aware backbone. We present UnityShots, a memory-driven multi-shot audio-video generation system built on LTX-2.3, trained on annotated cinematic and music-video shots. The video stream maintains two fixed-size slots, a long-term memory (LTM) slot anchored to the opening shot and a short-term memory (STM) slot holding the immediately preceding tail, both updated at every cut by a boundary-conditioned gate that fuses visual cut probability and beat-tracker signals. The audio stream injects a reference speaker token at every shot to preserve vocal timbre without a sliding audio bank. A discrete cut-type prior, learned through AdaLN, becomes an inference-time control knob over transition strength. We release a benchmark of 200 multi-cultural multi-shot sequences spanning six ethnic regions and ten or more languages, with per-shot reference identities, reference audio, and per-boundary transition labels. Evaluated across I2V, T2V, and R2V conditioning modes, UnityShots leads open-source baselines on every cross-shot coherence metric and matches the strongest closed-source system on the multi-shot axes.

17
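The abstract describes two fixed-size video memory slots that a boundary-conditioned gate updates at every cut. The sketch below is a hypothetical illustration of that idea using plain vectors. The long-term slot stays anchored to the opening shot. The short-term slot is blended toward the newest shot tail by a sigmoid gate over a visual cut probability and a beat-tracker signal. The gate weights, the linear blending rule, and the vector features are assumptions, not UnityShots' learned module. The point it shows is that memory size stays constant however many shots are generated, unlike a memory bank that grows linearly.

```python
import math
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class TwoSlotMemory:
    ltm: list[float]  # long-term slot: opening-shot features, never overwritten
    stm: list[float]  # short-term slot: tail of the preceding shot
    w_cut: float = 4.0  # hypothetical gate weights
    w_beat: float = 2.0
    bias: float = -3.0

    def update_at_cut(self, tail: list[float], cut_prob: float, beat: float) -> float:
        """Blend the new shot tail into the short-term slot; return the gate value."""
        # A confident visual cut that lands on a beat pushes g toward 1.
        g = sigmoid(self.w_cut * cut_prob + self.w_beat * beat + self.bias)
        self.stm = [g * new + (1.0 - g) * old for new, old in zip(tail, self.stm)]
        return g

    def size(self) -> int:
        # Constant, independent of how many cuts have been processed.
        return len(self.ltm) + len(self.stm)


mem = TwoSlotMemory(ltm=[1.0, 0.0], stm=[1.0, 0.0])
for _ in range(100):
    mem.update_at_cut(tail=[0.0, 1.0], cut_prob=1.0, beat=1.0)
print(mem.size())  # 4
```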
EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies

We present EBench, a simulation benchmark that diagnoses generalist mobile manipulation policies beyond a single success-rate scalar. EBench comprises 26 diverse and challenging manipulation tasks annotated along 5 capability dimensions and 4 generalization dimensions. We evaluate state-of-the-art generalist manipulation models, including π0, π0.5, XVLA, and InternVLA-A1, and reveal that models with similar success rates exhibit strikingly different capability profiles: π0.5 achieves the highest test success rate and the best train-test retention, whereas InternVLA-A1 dominates mobile manipulation but collapses on dexterous tasks, and XVLA shows strengths on a set of atomic skills disjoint from those of the other policies. Beyond capability profiling, EBench analyzes generalization ability from 4 representative perspectives, identifying the impact of different distribution shift factors. The results reveal strengths and weaknesses of models hidden behind an overall score. We hope this benchmark offers a broad set of diagnostic signals to guide iteration on generalist manipulation models.

13
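EBench's central point is that a single success rate hides differences across dimensions. The sketch below shows how per-dimension profiles and a train-test retention ratio might be computed from episode results. Several details are assumptions made for illustration, not EBench's published protocol: the dimension labels, the episode format, and the definition of retention as test success divided by train success. The two hypothetical policies have the same overall success rate (50%) but opposite profiles.

```python
from collections import defaultdict


def capability_profile(episodes: list[dict]) -> dict[str, float]:
    """Success rate per capability dimension.

    Each episode is a dict like {"dims": ["dexterity"], "success": True};
    an episode counts toward every dimension it is tagged with.
    """
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for ep in episodes:
        for dim in ep["dims"]:
            totals[dim] += 1
            hits[dim] += int(ep["success"])
    return {dim: hits[dim] / totals[dim] for dim in totals}


def retention(train_success: float, test_success: float) -> float:
    # Hypothetical definition: fraction of train-split success kept on the test split.
    return test_success / train_success if train_success > 0 else 0.0


policy_a = [
    {"dims": ["mobility"], "success": True},
    {"dims": ["mobility"], "success": True},
    {"dims": ["dexterity"], "success": False},
    {"dims": ["dexterity"], "success": False},
]
policy_b = [
    {"dims": ["mobility"], "success": False},
    {"dims": ["mobility"], "success": True},
    {"dims": ["dexterity"], "success": True},
    {"dims": ["dexterity"], "success": False},
]
print(capability_profile(policy_a))  # {'mobility': 1.0, 'dexterity': 0.0}
print(capability_profile(policy_b))  # {'mobility': 0.5, 'dexterity': 0.5}
```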
IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation

Unified multi-modal large language models (MLLMs) have achieved strong text-to-image generation quality, but still struggle with structure-aware prompt following, where object counts, spatial relations, attribute bindings, and coarse layouts must be preserved. We attribute this limitation in part to the entanglement of structural planning and appearance rendering within a single conditioning stream. To address this issue, we propose Implicit Visual Chain-of-Thought (IV-CoT), a latent visual reasoning framework for query-conditioned image generation. IV-CoT decomposes the visual conditioning queries into a structural-to-semantic cascade, where structural queries first form a latent visual plan and semantic queries then render appearance conditioned on this plan. To guide the structural queries, we introduce training-only sketch supervision, which encourages them to capture structure from sketches without requiring sketch extraction or intermediate decoding at inference time. IV-CoT performs implicit CoT reasoning in a single forward pass and achieves superior results on GenEval and T2I-CompBench. Visualizations and analyses demonstrate that the learned structural and semantic queries play complementary roles in structure-aware generation.

13
Causal-rCM: A Unified Teacher-Forcing and Self-Forcing Open Recipe for Autoregressive Diffusion Distillation in Streaming Video Generation and Interactive World Models

Autoregressive video diffusion with causal diffusion transformers has emerged as a major paradigm for real-time streaming video generation and action-conditioned interactive world models. In this work, we extend rCM, an advanced diffusion distillation framework, to autoregressive video diffusion. The core philosophy of rCM lies in the complementarity between forward and reverse divergences, represented by consistency models (CMs) and distribution matching distillation (DMD), respectively, in diffusion distillation. This philosophy naturally carries over to the autoregressive setting, where teacher-forcing (TF) provides an offline, forward-divergence causal training paradigm, while self-forcing (SF) corresponds to an on-policy, reverse-divergence refinement. Our contributions are: (1) through extensive experiments, we show that teacher-forcing CM is currently the best complement to self-forcing DMD as an initialization strategy; (2) we present the first implementation of teacher-forcing-based continuous-time CMs (e.g., sCM/MeanFlow) for autoregressive video diffusion, enabled by our custom-mask FlashAttention-2 JVP kernel, achieving 10× faster convergence than discrete-time CMs (dCMs); (3) we introduce Causal-rCM, a leading, unified, and scalable algorithm-infrastructure open recipe for diffusion distillation and causal training; and (4) we achieve state-of-the-art streaming video generation performance in both frame-wise and chunk-wise settings, using only synthetic data for training. Notably, our distilled 2-step causal Wan2.1-1.3B model achieves a VBench-T2V score of 84.63 with only 1 or 2 sampling steps. We further apply Causal-rCM to Cosmos 3, an advanced omnimodal world foundation model for physical AI with action-conditioned generation capability, enabling an interactive world model.

10
The Hitchhiker's Guide to Agentic AI: From Foundations to Systems

The Hitchhiker's Guide to Agentic AI is a comprehensive practitioner's reference for building autonomous AI systems. The book covers the full stack from first principles to production deployment, organized around a central thesis: building great agentic systems requires understanding every layer of the pipeline, not just one. The book opens with the LLM substrate (transformer architecture, GPU systems, training and fine-tuning (SFT, LoRA, MoE), model compression, and inference optimization), treated as essential foundations rather than the primary focus. It then develops the alignment and reasoning layer: reinforcement learning from human feedback (RLHF), PPO, DPO and its variants, GRPO, reward modeling, and RL for large reasoning models, including chain-of-thought and test-time scaling. The second half is devoted to agentic AI proper. Topics include agentic training and trajectory-based RL, retrieval-augmented generation (RAG and Agentic RAG), memory systems (in-context, external, episodic, and semantic), agent harness design and context management, and a taxonomy of agent design patterns. Inter-agent coordination is covered in depth: the Model Context Protocol (MCP), agent skills and tool use, the Agent-to-Agent (A2A) communication protocol, and multi-agent architectures spanning centralized, decentralized, and hierarchical topologies. The book concludes with agent development frameworks, agentic UI design, evaluation methodology for agentic tasks, and production deployment. Each chapter pairs rigorous theoretical foundations with implementation guidance, code examples, and references to the primary literature.

8
Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do

Chain-of-Thought (CoT) has become a standard method for improving reasoning capabilities in large language models (LLMs) by eliciting step-by-step thinking, but its effectiveness in multimodal tasks remains unclear. In this paper, we systematically investigate a key question: what can multimodal Chain-of-Thought reasoning do, and where and why does it fall short? To this end, we evaluate 12 multimodal tasks across perception and reasoning categories using 14 non-reasoning models and 8 reasoning models. Our analysis reveals several important findings: (1) CoT is not a free lunch and should be used selectively depending on the specific requirements of each task. For perception tasks, CoT can lead to undesirable side effects, such as reduced performance in visual grounding and object counting. In contrast, it proves effective for reasoning tasks involving mathematical, scientific, and multi-image reasoning; (2) compared to the original models, existing open-source multimodal reasoning models often yield only marginal overall improvements, possibly due to an overemphasis on mathematical reasoning at the expense of broader capabilities; (3) visual reasoning remains a key bottleneck for current multimodal CoT, as models exhibit a "Look Light, Think Heavy" pattern in which verbal reflection rises and falls during reasoning, whereas visual reflection consistently diminishes. These findings suggest that while multimodal CoT handles verbal reflection relatively well, it lacks the ability to maintain deep visual introspection throughout the reasoning process.

7
Autodata: An agentic data scientist to create high quality synthetic data

We introduce Autodata, a general method that enables AI agents to act as data scientists who build high quality training and evaluation data. We show how to train (meta-optimize) such a data scientist agent, so that it learns to create even stronger data. We describe the overall formulation, and a specific practical implementation, Agentic Self-Instruct. We conduct experiments on computer science research tasks, legal reasoning tasks and reasoning with mathematical objects, where we obtain improved results compared to classical synthetic dataset creation methods. Further, meta-optimizing the data scientist agent itself delivers an even larger performance uplift. Agentic data creation provides a way to convert increased inference compute into higher quality model training. Overall, we believe this direction has the potential to change the way we build AI data.

7
TryOnCrafter: Unleashing Camera Trajectories for Realistic Video Virtual Try-on via a Renderable 4D Try-on Proxy

While Video Virtual Try-on (VVT) has achieved remarkable progress in synthesizing realistic garment overlays on dynamic subjects, existing paradigms remain fundamentally constrained by a passive dependency on source camera trajectories and cannot provide the interactive freedom that omnidirectional viewpoint exploration requires. To address this limitation, we define a new research task: Camera-controllable Video Virtual Try-on (CaM-VVT). Unlike conventional VVT, CaM-VVT requires not only viewpoint-agnostic texture hallucination but also strict structural synchronization between non-rigid human dynamics and background contexts under arbitrary, unconstrained camera movements. To tackle these challenges, we present TryOnCrafter, the first unified DiT-based framework specifically designed for the CaM-VVT task. Departing from implicit pixel-space manipulation, we introduce a Renderable 4D Try-on Proxy that explicitly decouples the human subject from the environment. This is achieved by distilling high-fidelity 2D try-on priors into a clothed 3DGS-based avatar, which is subsequently animated via SMPL-X sequences and metric-aligned into a reconstructed background point cloud. This proxy establishes a robust structural foundation with superior texture density and motion integrity. Our Proxy-Anchored Video DiT uses this structural foundation as a primary geometric anchor, ensuring that the synthesized photorealistic videos are strictly constrained by prescribed trajectories and physically plausible deformations. Benefiting from the inherent editability of the 4D proxy, TryOnCrafter supports diverse downstream applications, including human relocalization, "bullet time" effects, and 360-degree orbital viewing.

6
Advancing WordArt-Oriented Scene Text Recognition: Datasets and Methods

WordArt (artistic text) features highly customized fonts, textures, and layouts, making WordArt-oriented scene TExt Recognition (WATER) substantially more challenging than general Scene Text Recognition (STR). Existing STR datasets and methods, typically built around regular scene text and fixed-template inputs, struggle to scale to WATER. We therefore aim to advance this task from both the data and the model perspectives. On the data side, we construct WATER-S, a synthetic dataset of 2M samples, hundreds of times larger than existing artistic text data. WATER-S consists of two complementary subsets. One is rendered by an upgraded rendering pipeline (SynthWordArt), which provides highly accurate and controllable synthetic WordArt data. The other is generated by combining Qwen3-VL for prompt mining and Z-Image for image synthesis, which improves the coverage of realistic and diverse data. On the model side, we propose WATERec. It adopts a visual encoder that supports arbitrary-shaped inputs and an autoregressive decoder to model complex layouts, structurally breaking the bottleneck of fixed-template STR on WordArt. Experiments show that this architecture outperforms prior STR methods, achieving state-of-the-art performance on irregular texts such as WordArt. Together with WATER-R, carefully reorganized from existing real STR data, our strong baseline with the new synthetic data and model design reaches 90.40% accuracy on WordArt-Bench, surpassing both general-purpose and OCR-specialized vision-language models by a large margin. Code and data are available at https://github.com/YesianRohn/WATER.

6
ReNIO: Reweighting Negative Trajectory Importance for LLM On-Policy Distillation

On-policy distillation (OPD) improves LLM reasoning by training a student model on its own generated outputs, but standard OPD treats all student-generated outputs (SGOs) equally regardless of their informativeness. We observe a consistent asymmetry in controlled filtering experiments: in both OPD and on-policy self-distillation (OPSD), training only on incorrect SGOs outperforms training only on correct ones. Our further analysis suggests that models trained on correct-only SGOs tend to generate shorter reasoning traces and show weaker reflection behavior, while incorrect SGOs better preserve exploratory reasoning near the model's capability boundary. To exploit this signal without requiring full answer-containing rollouts, we introduce ReNIO, which Reweights Negative trajectory Importance for LLM On-policy distillation. Using the student-to-teacher probability ratio, ReNIO identifies pivotal tokens leading to wrong reasoning traces and aggregates their information into a normalized sample weight, inherently assigning larger weights to likely negative trajectories without observing final-answer correctness. Since ReNIO only uses prefix-conditioned token probabilities, it preserves OPD's prefix training advantage over full-rollout reinforcement learning. Across both mathematical reasoning and code generation tasks, ReNIO improves both OPD and OPSD, with representative relative gains of up to 8.90% for Qwen3-1.7B and 10.00% for R1-Distill-Qwen-7B on mathematical reasoning benchmarks. Code repo: https://github.com/BDML-lab/ReNIO.

5
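The abstract describes turning a student-to-teacher probability ratio into a normalized per-sample weight. The sketch below is an illustration of that idea, not ReNIO's published formula: it assumes per-token log-probabilities from both models on the student's own sample, treats a token as "pivotal" when log(p_student / p_teacher) exceeds a hypothetical threshold `tau`, scores each sample by its summed excess log-ratio, and normalizes the resulting weights to mean 1 across the batch.

```python
import numpy as np

def pivotal_token_weights(student_logp, teacher_logp, tau=1.0):
    """Illustrative per-sample weights (not ReNIO's exact formula).

    student_logp / teacher_logp: one 1-D array per student-generated sample,
    holding log p(token | prefix) under the student and the teacher.
    A token counts as "pivotal" here if log(p_student / p_teacher) > tau.
    """
    scores = []
    for s, t in zip(student_logp, teacher_logp):
        log_ratio = np.asarray(s, float) - np.asarray(t, float)
        excess = np.clip(log_ratio - tau, 0.0, None)  # ignore non-pivotal tokens
        scores.append(excess.sum())
    scores = np.asarray(scores)
    w = np.exp(scores - scores.max())  # positive and overflow-safe
    return w * len(w) / w.sum()        # normalize to mean 1 over the batch
```

Multiplying each sample's distillation loss by its weight keeps the batch-average loss scale unchanged while shifting emphasis toward samples containing tokens the teacher finds much less plausible than the student does.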
RL-Index: Reinforcement Learning for Retrieval Index Reasoning

Retrieving external knowledge is essential for solving real-world tasks, yet it remains challenging when the relationship between a query and its relevant knowledge involves implicit and complex reasoning beyond surface-level semantic or lexical matching (e.g., math problems that rely on the same theorem, or coding tasks that require deep reasoning). Existing approaches primarily rely on query-side reasoning (e.g., query rewriting), which introduces significant online latency and underutilizes the opportunity to perform reasoning over the knowledge corpus itself (i.e., index-side reasoning). In this paper, we propose RL-Index, an agentic indexing framework that formulates retrieval index reasoning as a reinforcement learning problem. Instead of performing reasoning at query time, RL-Index shifts reasoning to the indexing stage by augmenting documents with LLM-generated rationales that explicitly encode the latent query-knowledge relationship. To optimize the quality of these rationales, we employ Group Relative Policy Optimization (GRPO) and use retrieval similarity as a verifiable reward signal, enabling direct optimization of indexing decisions for retrieval effectiveness. Extensive experiments on the BRIGHT benchmark demonstrate that RL-Index consistently improves both retrieval and downstream question-answering performance, while significantly reducing online inference latency. Moreover, the learned rationale augmentation generalizes across diverse retrievers and generators, highlighting its robustness as a plug-and-play indexing strategy across different retrieval systems.

5
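GRPO scores a group of sampled outputs for the same input and trains on each output's reward relative to its group (reward minus group mean, divided by group standard deviation). A minimal sketch of that step for rationale augmentation, assuming each candidate rationale is appended to its document, query and augmented documents are embedded by some retriever (the vectors here are placeholders), and the reward is cosine similarity between the training query and the augmented document:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def group_relative_advantages(query_vec, augmented_doc_vecs, eps=1e-8):
    """GRPO-style advantages for a group of rationales written for one document."""
    rewards = np.array([cosine(query_vec, d) for d in augmented_doc_vecs])
    # Each rationale is judged only against its own group.
    return (rewards - rewards.mean()) / (rewards.std() + eps)
```

Because the reward is computed by the retriever itself, rationales that pull the document closer to the queries it should answer get positive advantages, with no human labels required.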
CAVEWOMAN: How Large Language Models Behave Under Linguistic Input and Output Compression

"Talk short. Drop grammar. Save token." This caveman style is widely promoted as a way to cut inference cost, but whether it actually saves anything depends on which channel (the user's prompt or the model's response) is being compressed. We present Cavewoman, a two-channel evaluation protocol that scores every generation on task accuracy, realized per-item cost, and reference-text agreement against the model's unconstrained reference. We evaluate eight models on five datasets at five reduction levels, with both channels measured on the same items. Output compression cuts realized cost on most API models (1.4-2.4x per model, up to 3x in the best case) and on all four open-weight models under public-tier pricing. Input compression has the opposite effect, a strict lose-lose: it raises net cost rather than lowering it (~1.15x on the five-benchmark mean, up to 1.8x on the worst dataset and 2.7x under stronger compression), because models compensate with longer responses even as accuracy collapses. Under the same setting, surface text diverges from the unconstrained reference: on the non-reasoning models, roughly half of all generations are correct yet their surface text no longer entails the model's own unconstrained baseline generation. The divergence survives length-controlled re-scoring, multiple-comparisons correction, and replication under complementary semantic measures. Code and data are available at https://github.com/danielle34/cavewoman.

4
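Realized cost depends on both channels because API providers typically bill prompt and response tokens at different rates. A toy calculation with hypothetical prices and token counts (none taken from the paper) shows how halving the prompt can still raise cost when the response grows:

```python
from dataclasses import dataclass

@dataclass
class Pricing:
    usd_per_m_input: float   # USD per million prompt tokens
    usd_per_m_output: float  # USD per million response tokens

def item_cost(prompt_tokens: int, response_tokens: int, p: Pricing) -> float:
    """Realized cost of one item: each channel billed at its own rate."""
    return (prompt_tokens * p.usd_per_m_input
            + response_tokens * p.usd_per_m_output) / 1_000_000

p = Pricing(usd_per_m_input=1.0, usd_per_m_output=4.0)  # hypothetical rates
baseline = item_cost(400, 300, p)           # uncompressed prompt and answer
compressed_input = item_cost(200, 420, p)   # terse prompt, model writes more
compressed_output = item_cost(400, 150, p)  # normal prompt, terse answer

print(f"input compression:  {compressed_input / baseline:.3f}x baseline cost")
print(f"output compression: {compressed_output / baseline:.3f}x baseline cost")
```

With these made-up numbers the terse prompt costs 1.175x the baseline while the terse answer costs 0.625x, mirroring the direction (not the magnitude) of the paper's findings.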
When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents

As LLM agents increasingly select tools autonomously, their choices among tools with different privileges become safety-relevant. However, prior tool-selection studies focus on safety-agnostic metadata preferences, leaving privilege-sensitive choices underexplored. To address this gap, we study over-privileged tool selection, in which an agent selects or escalates to a higher-privilege tool despite a sufficient lower-privilege alternative. We introduce ToolPrivBench to evaluate whether agents choose higher-privilege tools despite sufficient lower-privilege alternatives, measuring both initial selection and escalation after transient tool failures. Across eight domains and five recurring risk patterns, we find that over-privileged tool selection is common among mainstream LLM agents and is further amplified by transient failures. We further find that general safety alignment does not reliably transfer to least-privilege tool choice, while prompt-level controls provide only limited mitigation under transient failures. We therefore introduce a privilege-aware post-training defense that teaches agents to prefer sufficient lower-privilege tools and escalate only when necessary. Our mitigation experiments show that this defense substantially reduces unnecessary high-privilege tool use while preserving general capabilities.

4
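The least-privilege behavior the proposed defense trains for can be written down as a selection rule. A minimal sketch, assuming each tool carries a numeric privilege level and a set of capabilities (hypothetical annotations, not ToolPrivBench's schema) and treating `TimeoutError` as the transient failure: pick the lowest-privilege tool that covers the task, retry it on transient errors, and escalate only when it keeps failing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tool:
    name: str
    privilege: int            # higher = more powerful / riskier
    capabilities: frozenset   # operations the tool can perform

def least_privilege_order(tools, required):
    """Tools that cover every required capability, lowest privilege first."""
    sufficient = [t for t in tools if required <= t.capabilities]
    return sorted(sufficient, key=lambda t: t.privilege)

def run_with_escalation(tools, required, call, max_retries=1):
    """Use the least-privileged sufficient tool; escalate only if it keeps failing."""
    for tool in least_privilege_order(tools, required):
        for _ in range(max_retries + 1):
            try:
                return tool.name, call(tool)
            except TimeoutError:  # transient: retry the same tool before escalating
                continue
    raise RuntimeError("no sufficient tool succeeded")
```

Retrying the same tool first is the key design choice: the abstract reports that agents tend to escalate to higher-privilege tools right after transient failures.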
RoPE-Aware Bit Allocation for KV-Cache Quantization

Existing low-bit KV-cache quantizers often treat each cached key as a flat vector. Under RoPE, however, a key's contribution to a future attention logit decomposes into a position-dependent sum over two-dimensional frequency blocks. This makes key-cache quantization a block-wise bit-allocation problem: high-energy RoPE blocks are more sensitive to quantization error and should receive more bits. We introduce Block-GTQ, a RoPE-aware bit allocator for key-cache quantization built on TurboQuant-MSE (TQ-MSE). For each layer and KV head, Block-GTQ computes a label-free energy score for each RoPE block and greedily allocates integer bit widths by marginal gain. Under matched K/V bit budgets, Block-GTQ better preserves RoPE query-key logits on a ten-model diagnostic panel, cutting per-layer MAE by 32-80% at 2 and 3 b/dim K-only quantization and winning all 367 layer comparisons against uniform TQ-MSE. These fidelity gains translate to stronger downstream long-context retrieval, understanding, and reasoning. At K2V2 on Llama-3.1-8B-Instruct, Block-GTQ raises the six-task NIAH average from 70.6 to 97.4, and the LongBench-EN average from 36.87 to 53.31. On AIME 2024/2025 with DeepSeek-R1-Distill-Qwen-7B, without an fp16 recent-key buffer, Block-GTQ at K3V2 scores 51.7/37.5, close to fp16's 54.2/37.9, whereas uniform TQ-MSE collapses to 0.0/0.0. We further implement a packed-cache serving path. On a single H800 GPU with Qwen2.5-3B-Instruct, packed K3V3 achieves 3.24x KV-cache compression with fp16-comparable quality, runs 1.34x faster than fp16 FlashAttention2 at 128K context, reduces peak memory from 56.31 GB to 19.85 GB, and remains feasible at 256K and 512K, where fp16 runs out of memory. Code is available at https://github.com/JIA-Lab-research/blockgtq.

4
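Block-GTQ's exact energy score and gain function are defined in the paper and repo; the sketch below only illustrates greedy bit allocation by marginal gain under two stated assumptions: (1) a block's energy is the mean squared norm of its two key coordinates over cached tokens, with RoPE pairing adjacent dims (2j, 2j+1), and (2) the classical high-rate model in which b bits per dim shrink a block's error to energy · 4^(-b). Bits are handed out one at a time to whichever block would gain the most.

```python
import heapq
import numpy as np

def rope_block_energy(keys):
    """Per-RoPE-block mean energy; keys: (num_tokens, head_dim).

    Assumes RoPE pairs adjacent dims (2j, 2j+1); some implementations
    pair dim j with j + head_dim/2 instead.
    """
    k = np.asarray(keys, dtype=float)
    pairs = k.reshape(k.shape[0], -1, 2)             # (tokens, blocks, 2)
    return (pairs ** 2).sum(axis=-1).mean(axis=0)   # (blocks,)

def greedy_bits(energy, avg_bits, max_bits=8):
    """Integer bits per block (used for both of its dims), mean(bits) = avg_bits.

    Distortion model (assumption): D_i(b) = energy_i * 4**(-b), so one more
    bit at b bits reduces D_i by 0.75 * energy_i * 4**(-b).
    """
    bits = [0] * len(energy)
    budget = int(round(avg_bits * len(energy)))
    heap = [(-0.75 * e, i) for i, e in enumerate(energy)]  # max-heap via negation
    heapq.heapify(heap)
    for _ in range(budget):
        if not heap:
            break
        _, i = heapq.heappop(heap)
        bits[i] += 1
        if bits[i] < max_bits:
            heapq.heappush(heap, (-0.75 * energy[i] * 4.0 ** (-bits[i]), i))
    return bits
```

For example, `greedy_bits([16.0, 1.0, 1.0, 0.0625], avg_bits=2)` gives the dominant block 4 bits and the weakest block none, whereas a uniform quantizer would give every block 2.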
What Intermediate Layers Know: Detecting Jailbreaks from Entropy Dynamics

Jailbreak attacks reveal a persistent weakness in aligned Large Language Models: carefully crafted prompts can elicit policy-violating responses despite safety training. While most defenses operate at the prompt or output level, it remains unclear how harmful intent is encoded within the model's internal representations. We investigate this question by analyzing token-level predictive entropy trajectories across layers of a frozen LLM using the logit lens. We find that static aggregate statistics of prompt-level entropy (e.g., mean, variance) carry little discriminative signal, whereas features capturing how entropy evolves across token positions, such as monotonic rank-based trend scores, are substantially more informative. Importantly, this signal is not uniform across model depth: it is concentrated in intermediate layers and degrades at the final layer, indicating that jailbreak-relevant structure is most pronounced in mid-network representations rather than at the output head. Across multiple models (Llama, Qwen, Gemma) and adversarial benchmarks, these entropy dynamics provide architecture-consistent separation without additional training. Together, our findings show that jailbreak behavior is reflected in structured intermediate uncertainty dynamics, clarifying both which entropy-derived features encode harmful intent and where in the network that signal is most pronounced.

3
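The logit lens decodes an intermediate layer's hidden states through the model's final normalization and unembedding to obtain per-token next-token distributions. Given such per-layer logits (assumed already computed; the projection itself is omitted), the two kinds of features the abstract contrasts can be sketched as per-token entropy and a rank-based trend of entropy over token position. The Spearman-style correlation below is one plausible trend score and may differ from the paper's exact feature.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (nats) of softmax(logits) for each row; logits: (T, V)."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(np.exp(logp) * logp).sum(axis=-1)

def rank_trend(values):
    """Spearman-style monotonic trend: rank correlation of value vs. position."""
    v = np.asarray(values, dtype=float)
    ranks = np.empty(len(v))
    ranks[np.argsort(v, kind="mergesort")] = np.arange(len(v), dtype=float)
    for val in np.unique(v):            # average ranks across ties
        ranks[v == val] = ranks[v == val].mean()
    if ranks.std() == 0:                # flat sequence: no trend
        return 0.0
    pos = np.arange(len(v), dtype=float)
    return float(np.corrcoef(pos, ranks)[0, 1])
```

Computing `rank_trend(token_entropy(layer_logits))` for every layer yields a depth profile; per the abstract, the discriminative signal should peak in the middle layers rather than at the output head.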
Distill Once, Adapt Life-Long: Exploring Dataset Distillation for Continual Test-Time Adaptation

Continual Test-Time Adaptation (CTTA) aims to maintain model performance under evolving target domains by adapting online without labeled data. However, practical deployments often cannot retain the source dataset due to privacy or licensing constraints, and purely source-free CTTA methods tend to become unstable under long-term distribution shift, suffering from compounding self-training errors and catastrophic forgetting. We introduce DO-ALL (Distill Once, Adapt Life-Long), a plug-and-play framework that revisits source information in a compact and privacy-conscious form via Dataset Distillation (DD). Before deployment, DO-ALL performs DD to produce a small set of synthetic distilled anchors that summarize the source distribution. During adaptation, each target sample is matched with its most semantically aligned anchor, which provides a stable reference for various CTTA methods via source replay, representation alignment, and manifold-smoothing regularization. DO-ALL can be seamlessly integrated into existing CTTA algorithms, consistently improving long-term robustness across CIFAR100-C, ImageNet-C, and the CCC benchmark. This demonstrates the potential of leveraging DD to enable stable and continuous adaptation without retaining raw source data. The code is available at https://github.com/blue-531/DOALL.

1
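The matching step pairs each incoming test sample with one distilled anchor. A minimal sketch, assuming "most semantically aligned" means highest cosine similarity between features from the model being adapted (an assumption; the paper may use a different criterion):

```python
import numpy as np

def nearest_anchor(target_feats, anchor_feats):
    """Most cosine-similar distilled anchor for each target sample.

    target_feats: (N, D) features of the incoming test batch.
    anchor_feats: (M, D) features of the distilled anchors.
    Returns (anchor index per sample, its similarity).
    """
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    a = anchor_feats / np.linalg.norm(anchor_feats, axis=1, keepdims=True)
    sims = t @ a.T  # (N, M) cosine similarities
    return sims.argmax(axis=1), sims.max(axis=1)
```

The matched anchors can then feed whichever regularizer the host CTTA method uses, for example as replay samples or as targets for a feature-alignment loss.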
Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints

Tool Calling and Structured Output are two core capabilities of modern Agent systems, yet their interaction under joint deployment conditions remains insufficiently understood. This paper reports a reproducible phenomenon observed in a production Agent system: when Tool Calling and JSON Schema constraints are simultaneously enabled, multiple open-weight models cease invoking tools despite maintaining high schema compliance. We refer to this behavior as Tool Suppression. Through controlled experiments across multiple model families and deployment settings, we consistently reproduce Tool Suppression under joint constraints, while tool execution and schema compliance remain functional when evaluated independently. Further analysis reveals that JSON Schema constraints are compiled into grammar-based token masks, causing tool-call tokens to become unreachable during decoding. This provides an implementation-level explanation for the observed behavior. To interpret the phenomenon, we formulate the Constraint Priority Inversion (CPI) hypothesis, which suggests that schema satisfaction may dominate action-selection behavior under multiple simultaneous constraints. We present CPI as a behavioral hypothesis consistent with the observed evidence rather than a verified internal mechanism. To mitigate the problem, we propose Transparent Two-Pass Execution, an inference-time strategy that decouples tool execution from schema-constrained response generation. Experimental results show that this approach restores tool invocation while preserving structured output guarantees without requiring model retraining. These findings suggest that evaluating tool use and structured output separately may overlook important reliability issues in production Agent systems. Code, data, and docs will be released at https://github.com/Fzsama/Constrain-Tax-26-06.git.

0
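One way to picture Transparent Two-Pass Execution is as a thin wrapper around two model calls. The sketch assumes two hypothetical callables supplied by the serving stack: `run_with_tools` decodes without a JSON Schema, so no grammar mask hides the tool-call tokens, and returns the transcript including tool results; `format_with_schema` decodes under the schema with tools disabled. Neither name is the paper's API.

```python
import json
from typing import Callable

def two_pass(user_msg: str,
             run_with_tools: Callable[[str], str],
             format_with_schema: Callable[[str], str]) -> dict:
    """Decouple tool use from schema-constrained output (illustrative only)."""
    # Pass 1: tools enabled, no schema, so tool-call tokens stay reachable.
    transcript = run_with_tools(user_msg)
    # Pass 2: schema-constrained decoding with tools disabled; the model only
    # restates the pass-1 findings in the required structure.
    prompt = ("Answer using only the information below, in the required "
              "JSON format.\n\n" + transcript)
    return json.loads(format_with_schema(prompt))
```

The cost is one extra generation per request, which buys back tool invocation without retraining and without giving up the schema guarantee on the final answer.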
05

PRODUCT HUNT

05.00
PRODUCT HUNT

Product Hunt - June 25, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

SendTidings icon
SendTidings

Turn your analytics into beautiful monthly email reports

0
Polygraph icon
Polygraph

Let AI agents see cross repo and maintain session memory.

0
Zaro icon
Zaro

Build agents & apps on top of your context with one prompt.

0
BrowserBash icon
BrowserBash

CLI that turns plain-English into real browser tests

0
Tough Tongue AI for Sales icon
Tough Tongue AI for Sales

Live AI teammate for every tough sales conversation

0
Paybond CLI icon
Paybond CLI

Safe agent spend from the terminal

0
Signspell icon
Signspell

Real-time ASL alphabet recognition in Python; pip install and go

0
Blop icon
Blop

Describe your app and Blop tests it and repairs broken tests

0
BrowserAct icon
BrowserAct

Web browser automation for AI agents

0
Nashra icon
Nashra

Turn followers into clients.

0
Milestones icon
Milestones

Native project planning app, now on Mac & with an MCP server

0
Oxlo.ai icon
Oxlo.ai

Scale across AI models without scaling your bill

0
Figma Motion icon
Figma Motion

Your Figma canvas now has a timeline

0
Brain² by ClickUp icon
Brain² by ClickUp

One AI that knows your entire company and acts on it

0
Grass 2.0 icon
Grass 2.0

The always-on computer for your coding agents

0
Sidegent icon
Sidegent

Learn to build AI agents by actually building them

0
Postproxy - Engagement API icon
Postproxy - Engagement API

Publish, reply, and analyze social media via API

0
Dub Ninja icon
Dub Ninja

Live autonomous AI DJ that digs, mixes & explains 24/7

0
Genspark Design icon
Genspark Design

Generate UI prototypes, videos, and posters with AI

0
SayCraft icon
SayCraft

Build a web app by talking through a meeting

0
Papermark Agents icon
Papermark Agents

Let AI agents run your next deal, fundraise or data room

0
VTT for Mac icon
VTT for Mac

Voice-to-text for macOS with a fully on-device option

0
QuickMaker icon
QuickMaker

State of the art AI models in Blender under one subscription

0
MeetPoint icon
MeetPoint

Find the city where everyone's flights are cheapest

0
Heron icon
Heron

Wireshark for AI Agents: passive eBPF observability

0
Samepage Signals icon
Samepage Signals

Your second brain for product management

0
Tencent EdgeOne Makers icon
Tencent EdgeOne Makers

Ship AI agents like web apps, in minutes.

0
Ruby icon
Ruby

Ask better questions, live on every call

0
React UI Kit V7 icon
React UI Kit V7

All the chat components you need. None of the complexity

0
Swimio icon
Swimio

AI swim coach with Apple Watch tracking & smart workouts

0
Prospector by Synter icon
Prospector by Synter

Your outbound agent, right inside Slack

0
Nimt icon
Nimt

Your AI Search Coworker in Slack

0
StaleMate PR icon
StaleMate PR

Your menu bar turns red when PRs pile up

0
Customer Relationship Agents by Clarify icon
Customer Relationship Agents by Clarify

The M in CRM shouldn't be you

0
Crewdle AI icon
Crewdle AI

Use every business AI tool without every subscription

0
FUTO Swipe icon
FUTO Swipe

Open models for on-device swipe typing

0
Stripe.Directory icon
Stripe.Directory

New way for you & agents to search for businesses on Stripe

0
Buy by Agentcard icon
Buy by Agentcard

Order DoorDash from Claude

0
Mindstone Rebel icon
Mindstone Rebel

AI workspace for agents that know your work and ask first

0
Propane icon
Propane

Automatic customer context for product teams and agents

0
Thumbmagic icon
Thumbmagic

AI thumbnail generator trained on top-performing thumbnails

0
Bluerails Discovery icon
Bluerails Discovery

The rails AI agents use to find and pay you

0
Blazly SEO icon
Blazly SEO

Dominate SEO with an AI content operating system

0
OpenArt Director icon
OpenArt Director

Direct cinematic videos through chat

0
prepros icon
prepros

Run your brand shoots from start to finish

0
wildbirds icon
wildbirds

Birdwatchers app to share and discover birds socially

0
Amnesia icon
Amnesia

A Mac app that asks why you opened that tab

0
BestDefense.io icon
BestDefense.io

Pentest and patch every deploy with AI

0
Deckwise icon
Deckwise

AI presentation agent for editable decks

0
Conduit icon
Conduit

Fix the tool-list bloat slowing your AI agent

0
06

TECHMEME

06.00
TECHMEME

Techmeme - June 25, 2026

Techmeme Digest: Major tech headlines and industry conversations.

Sources: Google expands the scope of its months-old AI coding strike team to "midtraining" to better compete with Anthropic, after key executive departures (Erin Woo/The Information)
Source: Techmeme · Published: Jun 25, 2026

Erin Woo / The Information : Sources: Google expands the scope of its months-old AI coding strike team to “midtraining” to better compete with Anthropic, after key executive departures —  Google is reorganizing its recently launched strike team working on AI coding tools to try to catch up with Anthropic …

Sail, whose software optimizes how AI models run on existing chips, emerges from stealth with $80M in seed and Series A led by Kleiner at a $450M valuation (Lily Mae Lazarus/Fortune)
Source: Techmeme · Published: Jun 25, 2026

Lily Mae Lazarus / Fortune : Sail, whose software optimizes how AI models run on existing chips, emerges from stealth with $80M in seed and Series A led by Kleiner at a $450M valuation —  For months, Kleiner Perkins partner Aditya Naganath had been mulling over his investing thesis that the next wave of AI wasn't going to be a chatbot …

Apple says "the rapid expansion of AI data centers has created an extraordinary surge in demand for memory and storage"; the MacBook Neo rises to $699 from $599 (Mark Gurman/Bloomberg)
Source: Techmeme · Published: Jun 25, 2026

Mark Gurman / Bloomberg : Apple says “the rapid expansion of AI data centers has created an extraordinary surge in demand for memory and storage”; the MacBook Neo rises to $699 from $599 —  Apple Inc. took the extreme measure of raising prices of Macs, iPads, home devices and the Vision Pro on Thursday …

Apple raises its Mac and iPad prices 15% to 25%, saying it's "never seen a component price increase this much, this quickly", but keeps iPhone prices unchanged (Rolfe Winkler/Wall Street Journal)
Source: Techmeme · Published: Jun 25, 2026

Rolfe Winkler / Wall Street Journal : Apple raises its Mac and iPad prices 15% to 25%, saying it's “never seen a component price increase this much, this quickly”, but keeps iPhone prices unchanged —  Increases come a week after Tim Cook said higher memory costs made them ‘unavoidable’

Australian payments startup Airwallex raised $320M led by Lee Fixel's Addition at an $11B valuation, and is incubating blockchain payments startup Metal (Jeff Kauflin/Forbes)
Source: Techmeme · Published: Jun 25, 2026

Jeff Kauflin / Forbes : Australian payments startup Airwallex raised $320M led by Lee Fixel's Addition at an $11B valuation, and is incubating blockchain payments startup Metal —  Amid allegations of sharing customer data with China, Australian payments startup Airwallex, now valued at $11 billion, is gunning for Stripe and Ramp.

Scaled Cognition, a reliability-focused lab that develops the Agentic Pretrained Transformer model, raised a $100M Series A led by Khosla at a $750M valuation (Steven Rosenbush/Wall Street Journal)
Source: Techmeme · Published: Jun 25, 2026

Steven Rosenbush / Wall Street Journal : Scaled Cognition, a reliability-focused lab that develops the Agentic Pretrained Transformer model, raised a $100M Series A led by Khosla at a $750M valuation —  AI models can be ‘like schizophrenic geniuses,’ says CEO who raised $100 million in round led by Khosla Ventures

Exponential View: global AI sales, excluding China, hit $25B in Q1, exceeding an estimated $21B in data center and chip depreciation costs; margins remain thin (Jacob Reid/Bloomberg)
Source: Techmeme · Published: Jun 25, 2026

Jacob Reid / Bloomberg : Exponential View: global AI sales, excluding China, hit $25B in Q1, exceeding an estimated $21B in data center and chip depreciation costs; margins remain thin —  Revenue from artificial intelligence has reached a tipping point, showing that the hundreds of billions of dollars tech companies …

OpenAI, Anthropic, Amazon, Microsoft, and others launch Raise Us, a new non-profit led by ex-Commerce Secretary Gina Raimondo to help US workers adapt to AI (Lydia DePillis/New York Times)
Source: Techmeme · Published: Jun 25, 2026

Lydia DePillis / New York Times : OpenAI, Anthropic, Amazon, Microsoft, and others launch Raise Us, a new non-profit led by ex-Commerce Secretary Gina Raimondo to help US workers adapt to AI —  OpenAI, Anthropic, Amazon and Microsoft have signed on to an effort led by Gina Raimondo, a former commerce secretary.

Preliminary findings: the EU says Azure and AWS are "the largest and second largest" cloud services in the EU, as the bloc weighs up tougher DMA oversight (Samuel Stolton/Bloomberg)
Source: Techmeme · Published: Jun 25, 2026

Samuel Stolton / Bloomberg : Preliminary findings: the EU says Azure and AWS are “the largest and second largest” cloud services in the EU, as the bloc weighs up tougher DMA oversight —  Microsoft Corp.'s Azure and Amazon Web Services face tough European Union guardrails after regulators said they should be targeted by the bloc's Digital Markets Act.

How risk modelers like Fathom and Verisk are using AI and diffusion models to bypass the limits of physics-based "cat" models to predict natural disasters (Financial Times)
Source: Techmeme · Published: Jun 25, 2026

Financial Times : How risk modelers like Fathom and Verisk are using AI and diffusion models to bypass the limits of physics-based “cat” models to predict natural disasters —  Catastrophe scientists are pushing past the limits of physics-based models, improving how insurers calculate risk

IBM details a 0.7nm chip manufacturing process that utilizes a "nanostack" 3D transistor architecture, which it says could continue chip innovation for 10 years (Don Clark/New York Times)
Source: Techmeme · Published: Jun 25, 2026

Don Clark / New York Times : IBM details a 0.7nm chip manufacturing process that utilizes a “nanostack” 3D transistor architecture, which it says could continue chip innovation for 10 years —  Industry leaders had worried that innovations in chip miniaturization were no longer possible.

A profile of incoming WhatsApp CEO Kunal Shah, who studied philosophy, advised Sequoia India, and joins a growing pool of Indian-born executives at US companies (Financial Times)
Source: Techmeme · Published: Jun 25, 2026

Financial Times : A profile of incoming WhatsApp CEO Kunal Shah, who studied philosophy, advised Sequoia India, and joins a growing pool of Indian-born executives at US companies —  Kunal Shah has experience in building transaction platforms and insights into emerging markets

Alibaba's Hong Kong stock closed down 4.43% after Anthropic accused Alibaba of "illicitly" accessing its AI models, down 36.24% YTD; Xiaomi and Baidu fell ~2% (Jeanny Yu/Bloomberg)
Source: Techmeme · Published: Jun 25, 2026

Jeanny Yu / Bloomberg : Alibaba's Hong Kong stock closed down 4.43% after Anthropic accused Alibaba of “illicitly” accessing its AI models, down 36.24% YTD; Xiaomi and Baidu fell ~2% —  Alibaba Group Holding Ltd. shares slid to a 16-month low in Hong Kong after Anthropic PBC accused the Chinese technology giant of …

Arm EVP Mohamed Awad says its chip architecture now accounts for 50%+ of the hyperscale cloud computing market, as AI demand transforms the data center industry (Cheng Ting-Fang/Nikkei Asia)
Source: Techmeme · Published: Jun 25, 2026

Cheng Ting-Fang / Nikkei Asia : Arm EVP Mohamed Awad says its chip architecture now accounts for 50%+ of the hyperscale cloud computing market, as AI demand transforms the data center industry —  TAIPEI — SoftBank-backed chip designer Arm says it has hit a milestone in its decades-long challenge to Intel and AMD …

Qualcomm CEO Cristiano Amon says the company is designing data center chips specifically for Chinese customers that are in compliance with US export controls (Yifan Yu/Nikkei Asia)
Source: Techmeme · Published: Jun 25, 2026

Yifan Yu / Nikkei Asia : Qualcomm CEO Cristiano Amon says the company is designing data center chips specifically for Chinese customers that are in compliance with US export controls —  NEW YORK — Qualcomm unveiled its data center chip lineup on Wednesday, becoming the latest chipmaker to enter the AI processor race …

07

STARTUP ARCHIVE

07.00
STARTUP ARCHIVE

Startup News - June 25, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction."

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

"Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot."

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice."

Paul Graham on why starting with a “small, intense fire" is the key to startup growth
Source: Startup · Published: Mar 20, 2026

"You have to know who those first users are and how you're going to get them."

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth."

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?"

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations"

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over."

Spencer Rascoff: "I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

"If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive."

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right"

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT

08.00
SOLIDOT

Solidot News - June 25, 2026

Solidot Feed: Highlighting essential tech & open-source news.

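After menopause, the ovaries may become an organ with immune functions

Reproductive specialists once believed that after menopause the ovaries become as useless as the appendix. Examining the ovaries of women aged 50 to 75, researchers found that the organ's cells produce different proteins with age. To study age-related changes in the ovary in more depth, the researchers turned to laboratory mice. Although mice do not show features unique to human menopause, such as a sharp drop in estrogen, their ovaries also stop functioning late in their two-year lifespan. The researchers removed ovaries from young mice, from mice at the end of their reproductive period, and from "post-menopausal" mice. For each animal, they sequenced the RNA of one ovary to measure gene expression. The other ovary was examined under a microscope to identify distinct cell populations and to measure the extent of fibrosis, the buildup of stiffened tissue that occurs naturally with age. Analysis of the "post-menopausal" ovaries, however, showed higher levels of many types of immune cells than is typical in young mice. In addition, genes encoding various pro-inflammatory compounds were more active in the ovaries of old mice; these immune molecules may be secreted into the bloodstream and carried to other parts of the body. It is not yet clear whether aging ovaries genuinely play a role in immune signaling or are merely an incidental gathering place for immune cells. The finding may help explain why women, despite living longer, tend to be in poorer health than men as they age: post-menopausal ovaries may secrete molecules that cause chronic inflammation in women during menopause.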

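Chinese scientists develop rice with reduced cadmium uptake

Cadmium is not an essential element for plant growth, but when it enters the human body through the soil-rice-food chain, long-term intake can cause serious health problems such as kidney damage, cancer, and osteoporosis. OsNramp5 is the key transporter protein that moves cadmium from the roots to the stems in rice, but it also transports manganese and other metal ions essential for plant growth. Knocking out OsNramp5 effectively reduces cadmium transport but also causes deficiencies in other essential metals, sharply cutting rice yields. According to a study published in PNAS, the Institute of Genetics and Developmental Biology of the Chinese Academy of Sciences and collaborators used base editing to target OsNramp5, the core transporter gene responsible for cadmium uptake in rice, and created superior artificial allelic variants. They uncovered a new mechanism that specifically reduces cadmium uptake without affecting the uptake of manganese and other key metal ions, resolving the long-standing difficulty of achieving both low cadmium and high yield and offering a practical new breeding approach for safely producing staple grain on cadmium-contaminated farmland.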

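OpenAI announces Jalapeño, an in-house AI chip built for inference

OpenAI announced Jalapeño, its first in-house chip, designed and manufactured together with Broadcom specifically for AI inference. OpenAI did not disclose technical details, saying only that preliminary tests show performance per watt significantly better than today's most advanced comparable products. OpenAI and Broadcom formally announced their partnership last October, and OpenAI says it used its own models to speed up the chip's design. Developing in-house AI chips is meant to reduce dependence on Nvidia; Google and Amazon have also developed their own chips.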

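Wikipedia staff in the UK seek to unionize

UK Wikipedia staff are the first to seek union representation. On Wednesday, June 24, Wikimedia Foundation employees in the UK wrote to management asking that United Tech and Allied Workers (UTAW), a branch of the Communication Workers Union (CWU), represent them. The employees called on the Wikimedia Foundation, as the de facto steward of this global nonprofit, to honor its leadership's recent public commitment to protect employees' right to organize and form unions. More than a thousand Wikipedia volunteers and community members have signed a petition supporting the staff. The UK is the Wikimedia Foundation's second-largest source of employees after the US.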

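Microsoft says 8GB of RAM is enough for Windows 11

Microsoft has updated its Surface buying guide to say that 8GB of RAM is sufficient for everyday Windows 11 use, such as browsing, video streaming, homework, and productivity apps. It also says 16GB or more is required to unlock Copilot+ PC features. With memory in short supply and prices up several-fold, PC makers have had to start offering 8GB devices, yet 8GB is a tight fit for Windows 11, and for the past two years Microsoft has promoted 16GB as necessary for a good Windows 11 experience. As a major AI infrastructure provider, Microsoft is, of course, one of the parties responsible for the current situation.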

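White House app auto-installs on government-issued phones and cannot be removed

In May, the White House announced that its White House app would be automatically downloaded onto government-issued phones. The app cannot be uninstalled: even when government employees try to remove it, it is quickly reinstalled. Employees of the Department of Agriculture, the State Department, and the Department of Labor, speaking anonymously, said they were unsettled when the app appeared on their phones; some tried to delete it and failed. "I deleted it as a test, and it immediately came back," one USDA employee said. The app includes a button that lets users "text President Trump"; tapping it automatically opens a text box reading "greatest president ever." The app's social section shows posts from the White House X account and Trump's Truth Social account, along with videos shared by official accounts on platforms such as TikTok and Instagram. The news section contains White House press releases, briefings, and fact sheets, plus curated articles from outlets such as Fox, Breitbart, Reuters, and The New York Post that either praise the administration's policies or attack Democrats. One government employee called it blatant propaganda.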

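The person who gave misspelled words their squiggly underline

Every detail of the graphical UIs we take for granted, however small, was thought up by someone at some point. Take the little red squiggly line under misspelled words. It has become so commonplace in every text-editing field that no one gives it a second thought. Yet someone did invent it, and according to veteran Microsoft programmer Raymond Chen, that person was Tony Krueger. In early versions of Word, spell checking had to be invoked manually; the user then waited while the program found every possibly misspelled word and presented them one by one, deciding how to handle each. Word later introduced automatic spell checking, which ran the check while the user was idle so that results were ready when the user clicked the spell-check button. Automatic spell checking was still a blocking operation, however. Many users turned it off because it would suddenly decide that "now is a good time to spell-check the document" just when you wanted to do something else, such as save and exit, forcing you to wait for the check to finish. Tony made the spell checker unobtrusive so that it no longer interrupted the user's current work. When it found a problem, rather than launching a spell-check pass, it immediately drew a red squiggly line under the possibly misspelled word, and later a green squiggly line under possible grammatical errors.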

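One-third of LG and Samsung smart TV apps embed residential proxy SDKs

A scan of LG and Samsung smart TV apps found that 2,058 of 6,038 TV apps embed a residential proxy SDK, meaning they sell users' home IP addresses for use as proxies. Smart TVs make ideal proxy hosts: they are essentially always plugged in and connected to home Wi-Fi, yet unlike PCs, nobody checks them for suspicious background activity. Ads in TV apps may annoy users, but a silently running residential proxy earns revenue for operators while keeping user annoyance to a minimum. Residential proxies carry a risk of abuse, however: the Kimwolf botnet abused them to propagate and spread.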

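Anthropic accuses Alibaba of distilling its models

Anthropic has accused Alibaba's Qwen AI lab of illicitly distilling its AI models. In a letter to US lawmakers, Anthropic said the activity took place between April 22 and June 5 this year, with Alibaba using nearly 25,000 fraudulent accounts to conduct more than 28.8 million interactions with Claude. The letter, dated June 10, was sent ahead of a hearing on AI to Tim Scott, chairman of the US Senate Banking Committee, and Elizabeth Warren, the committee's ranking member.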

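Scientists push back the earliest known human use of fire to 1.8 million years ago

Scientists have found new evidence at Wonderwerk Cave in South Africa indicating that human ancestors were using fire between 1.07 and 1.79 million years ago, the earliest known record of human fire use. Researchers found traces of repeated fire use about 30 meters inside the cave, far beyond the reach of natural wildfires, which suggests that early humans deliberately carried naturally occurring fire into the cave and kept it burning. Early humans could not make fire at will; they most likely collected it from lightning-sparked fires or grassland wildfires.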

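China's Q1 PC shipments fall 2%

According to market research firm Omdia, PC shipments in China fell 2% in the first quarter, and tablet shipments fell 5%. PC shipments dropped to 8.9 million units and tablets to 8.3 million units. Notebook shipments (including mobile workstations) fell 19% year over year to 5.3 million units, while desktop shipments (including desktop workstations) rose 41% to 3.6 million units. Omdia attributes the market's weakness to higher device prices driven by rising component costs, as well as reduced consumer subsidies. Omdia forecasts that full-year 2026 PC shipments will fall 14% to 36 million units and tablet shipments will fall 11% to 32 million units. The leading PC makers include Lenovo, Huawei, Apple, iSoftStone, and HP.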
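The notebook and desktop figures can be cross-checked against the headline number. This short Python sketch uses only the rounded figures above and backs out last year's Q1 volumes from the year-over-year rates. The implied prior-year figure is our estimate, not Omdia's own number, but the resulting decline of roughly 2% is consistent with the reported total.

```python
# Omdia Q1 figures for China as reported: millions of units and YoY change.
notebooks, notebook_yoy = 5.3, -0.19
desktops, desktop_yoy = 3.6, 0.41

total = notebooks + desktops  # should match the reported 8.9M PC total

# Back out last year's Q1 volumes from the growth rates. Inputs are rounded,
# so these are estimates rather than Omdia's published numbers.
prev_notebooks = notebooks / (1 + notebook_yoy)
prev_desktops = desktops / (1 + desktop_yoy)
prev_total = prev_notebooks + prev_desktops

change = total / prev_total - 1
print(f"Q1 total: {total:.1f}M units")
print(f"implied prior-year Q1: {prev_total:.2f}M units")
print(f"implied change: {change:+.1%}")
```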

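Early childhood screen use linked to poorer academic performance and weaker working memory

As screens have become nearly ubiquitous in young children's lives, a new study examined their effect on learning. It followed 502 children from infancy to middle childhood, tracking screen use between the ages of 1 and 8, and found that more screen time was associated with poorer academic performance at age 9 and weaker working memory at age 10.5. The WHO and the American Academy of Pediatrics recommend that young children not be exposed to screens before 18–24 months of age and that children aged 2–5 spend no more than one hour a day on screens, but many young children exceed these limits. Children with more screen time during specific developmental stages later had poorer academic performance and weaker working memory. The association was strongest in infancy and the early school years, suggesting these stages may be especially sensitive windows for cognitive development. Children with higher total screen exposure across childhood also generally performed worse academically. The findings suggest that the timing of screen use may matter as much as total exposure, and they support a "less is better" principle: the less screen time children have, the better.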

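Europe is the fastest-warming continent

The UK, France, Italy, and Spain all issued red extreme-heat warnings this week as Europe goes through its second heatwave since May. Global temperatures are about 1.4°C above the pre-industrial (1850–1900) level, while Europe is about 2.4°C above pre-industrial levels, according to the EU's Copernicus Climate Change Service. The continued rise in global average temperature is driven mainly by greenhouse gas emissions from burning oil, gas, and coal, but different regions warm at different rates because of a combination of factors. Land warms faster than the ocean, because water can absorb more heat and is cooled by evaporation. The Copernicus Climate Change Service says changes in atmospheric circulation have made summer heatwaves in Europe more frequent and more intense. Another major factor is geography: Europe borders the Arctic, where temperatures are 3.2°C above pre-industrial levels. Arctic warming is driven partly by albedo: bright ice and snow reflect most of the sun's heat back into space, but when they melt they expose darker land that absorbs heat. In parts of Europe that usually get frequent winter snow, snow cover is shrinking and exposing dark ground.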

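Only about 2,000 IPs could reach the outside internet during Iran's shutdown

Earlier this year Iran imposed a nationwide internet shutdown that lasted for months. During the shutdown it enforced a whitelist: only a very small number of IP addresses on the list could reach the outside internet. Using a VPS inside Iran together with VPSes in Hungary, the United States, and Japan, researchers took the roughly 11,766,454 IP addresses in the prefixes announced via BGP by Iranian autonomous systems, spoofed each of these addresses exhaustively, and observed which ones could reach the outside internet. About 2,000 could. The researchers also found that even these IPs could not reach arbitrary websites but were subject to SNI-based filtering. Not every whitelisted address was SNI-filtered, however: at least half of the tested IPs faced no SNI filtering at all. This means different whitelisted IPs were subject to different access policies.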
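To put the whitelist in proportion, the two reported counts can be compared directly. A short Python calculation; because the 2,000 figure is approximate, the result indicates only the order of magnitude.

```python
# Figures as reported by the researchers.
announced_ips = 11_766_454  # addresses in prefixes Iranian ASes announced via BGP
reachable_ips = 2_000       # approximate number that could reach the outside internet

share = reachable_ips / announced_ips
ratio = announced_ips // reachable_ips

print(f"whitelisted share: {share:.4%}")   # about 0.017%
print(f"roughly 1 in {ratio:,} addresses")
```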

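Alibaba sues the U.S. Department of Defense

Alibaba and its U.S. subsidiaries have jointly filed a complaint against the U.S. Department of Defense in federal district court in California, asking the court to void the designation of Alibaba that the DoD announced on June 8. Alibaba has U.S. branches and runs e-commerce and cloud computing businesses in the United States. In theory, a company on the list can still work with U.S. businesses, but the DoD has the power to impose restrictions, such as contract terminations, on U.S. companies that work with listed Chinese firms. In its complaint, Alibaba argues that the designation lacks a factual basis and was made through an improper process. It says the designation has left the company unable to keep retaining lobbying firms, among other harms to its lawful rights and interests, and that it also violates the U.S. Constitution.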

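Heart-rate synchrony can indicate social engagement

According to a study published in PNAS Nexus, the degree of heart-rate synchrony between people can indicate how socially engaged they are. When people are physically and emotionally close, their heart rates gradually synchronize. The research team based the study on a dataset collected from 72 students during a trip to New York City for an audio-engineering competition. The students carried hearing aids that recorded ambient noise, wristbands that monitored heart rate, and phones that logged location. The study defined physical proximity as being within 20 meters of each other. Participants' heart rates were more synchronized when they were together, especially during close interaction and when they attended to the same external stimulus, such as listening to a lecture together. Participants who already knew each other before the trip showed significantly higher heart-rate synchrony.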

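GCC adds support for Hygon's Suzhou x86 CPU

GCC has merged a patch adding support for Hygon's Model 8 c86-4g-m8 processor, codenamed Suzhou. Hygon started out as a semiconductor company partnering with AMD and was licensed to offer localized versions of AMD's Zen 1 CPU; its products are sold only in the Chinese domestic market. A few months ago GCC merged a patch supporting Hygon's C86-4G CPUs. The Model 8 Suzhou CPU succeeds the previous-generation Model 7 Chengdu CPU. Little is known about the new processor so far; its instruction set architecture is nearly identical to the previous generation's, and it supports instruction sets including AVX-512.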

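German rail network halted by IT failure

Germany's rail network came to a nationwide standstill on Tuesday evening because of an IT failure. At 1 a.m., national rail operator Deutsche Bahn announced that the problem had been resolved and that service was gradually resuming. The company said the outage was caused by a nationwide failure of GSM-R, the digital communication system used for internal communication on the rail network; it said it had identified the cause but did not give details. During the outage the company issued taxi and hotel vouchers to passengers and, where possible, made trains available at stations for travelers. The company apologized for the incident.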

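Extreme heat conference in London canceled because of extreme heat warning

A conference on extreme heat, "Extreme Heat: Improving governance and strengthening action around the world", scheduled for this week in London, was canceled because of a red extreme-heat warning issued by the UK Met Office. The rare red warning covers London, parts of central England, southeast Wales, and southern England, and runs from 09:00 BST Wednesday to 21:00 BST Thursday; the Met Office warns that the heat could cause serious illness or death. Temperatures in southern England are expected to reach around 37–38°C and could hit 39°C on Wednesday.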

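Valve says it cannot negotiate prices with memory makers

Valve has announced the Steam Machine, with a starting price above $1,000, saying the price reflects what it had to pay over the past six months to secure memory and storage components. DDR5 and SSD prices have risen several-fold over the past six months. In an interview, Valve said it has no choice when buying memory: it can only accept the manufacturer's quote, negotiation is impossible, and trying to negotiate would mean getting no supply at all. One Valve employee said memory makers "give us a quote every month and say, 'you can buy this much.' It's take it or leave it. If we say no, they never talk to us again." The Steam Machine comes in two memory configurations, either two 8GB DDR5 modules or a single 16GB DDR5 module; Valve says its testing shows the two perform almost identically.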

09

APP STORE RANK

09.00
APP STORE RANK