TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0909
SAT, JUN 27, 2026
OrangeBot.AI intelligently curates and filters daily tech trends and news to save you time.
TODAY · SAT, JUN 27, 2026

The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK

June 27, 2026

Here is a summary of today's key news events:

U.S. Stock Markets End the Week with Losses

Major U.S. stock indices, including the S&P 500, Nasdaq, and Dow Jones Industrial Average, all declined, marking a negative end to the week for investors. The slide in stocks occurred as Treasury yields also fell.

U.S. and Iran Tensions Escalate in Strait of Hormuz

Oil prices climbed above $70 a barrel after the U.S. launched a new attack on Iran. The strike was a response to Tehran attacking a commercial ship in the Strait of Hormuz, an action Washington condemned as a ceasefire violation.

Debate on AI Regulation and Use Intensifies

The U.S. government began allowing trusted partners to access its powerful Mythos 5 AI model after a temporary restriction. The move comes amid a broader debate, with tech companies arguing against routine White House reviews of AI releases and local communities clashing over the use of AI surveillance for public safety.

Germany Proposes Major Pension Reform Amid Economic Pressure

Germany's coalition government has unveiled a significant pension reform plan. This bold move aims to address structural economic problems as the nation's powerful industrial sector faces mounting threats and challenges.

Progressive Wins Highlight Democratic Party Divisions

Recent sweeping election victories by progressive candidates are exposing internal divisions within the Democratic party. The results highlight a growing debate over the most effective strategy to compete against Donald Trump in the November presidential election.

High-Profile Business Figures Face Scrutiny

Two U.S. senators have called for a regulatory investigation into the betting platform Polymarket following reports of it promoting fake bets. In a separate event, a co-founder of Apollo Global Management publicly denied any involvement in the Jeffrey Epstein sex-trafficking scheme.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS

Hacker News - June 27, 2026

Hacker News Feed: Highlighting key posts and discussions.

OpenRA

(www.openra.net)

8927
OpenTTD 16.0-Beta1

(www.openttd.org)

20035
Om

(daringfireball.net)

44419
Jolla Phone (October 2026)

(commerce.jolla.com)

298170
My Steam Machine is a 50ft HDMI cable

(blog.matthewbrunelle.com)

199184
Libre Barcode Project

(graphicore.github.io)

28262
03

HUGGINGFACE

HuggingFace News - June 27, 2026

HuggingFace Feed: The latest AI models, datasets, and community updates.

DanceOPD: On-Policy Generative Field Distillation

Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I), local editing, and global editing. However, these capabilities are rarely naturally aligned and often conflict. For instance, editing tends to degrade T2I performance, while global and local editing interfere with each other. Consequently, effectively composing these capabilities has become a central challenge for image generation model training. To tackle this, we introduce DanceOPD, an on-policy generative field distillation framework for flow-matching models that routes each sample to one capability field, queries one low-noise student-induced state, and trains with a simple velocity MSE objective. With each capability source defined as a velocity field over the shared flow state space, the student learns from fields queried on its own rollout states to compose expert capabilities. This formulation also absorbs operator-defined fields such as classifier-free guidance. Comprehensive experiments on T2I, editing, realism-field absorption, and CFG absorption show that our approach improves multi-capability composition, strengthening target capabilities while preserving anchor generation quality. We believe this work establishes a practical route for generative field distillation in flow-matching models.

63
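
The velocity MSE objective and the absorption of classifier-free guidance (CFG) mentioned in the DanceOPD abstract can be illustrated with a small sketch. It assumes the standard CFG combination of a conditional and an unconditional velocity. The function names and the guidance scale are hypothetical, and this is not the authors' implementation.

```python
from typing import List


def cfg_velocity(v_cond: List[float], v_uncond: List[float], scale: float) -> List[float]:
    """Standard CFG combination (assumed form): v_uncond + scale * (v_cond - v_uncond)."""
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def velocity_mse(v_student: List[float], v_target: List[float]) -> float:
    """Mean squared error between the student's velocity and a target field's velocity."""
    if len(v_student) != len(v_target):
        raise ValueError("velocity vectors must have the same length")
    return sum((s - t) ** 2 for s, t in zip(v_student, v_target)) / len(v_student)


# The guided velocity is the regression target; the student is trained to match
# it with a single forward pass, so guidance is "absorbed" into the student.
target = cfg_velocity(v_cond=[1.0, 2.0], v_uncond=[0.0, 1.0], scale=3.0)
loss = velocity_mse(v_student=[0.0, 0.0], v_target=target)
```

In the paper's setting the target would be queried at a state from the student's own rollout; here the vectors are plain placeholder lists.
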
In-Context World Modeling for Robotic Control

Modern Vision-Language-Action (VLA) models often fail to generalize to novel setups, such as altered camera viewpoints or robot morphologies, because they are typically conditioned only on current observations and language instructions. By ignoring the underlying system configuration as a variable, these models implicitly assume a fixed execution context encountered during training, necessitating data-intensive fine-tuning for any new environment. In this work, we introduce In-Context World Modeling (ICWM), a framework that treats system identification as an in-context adaptation problem. ICWM enables robot policies to autonomously infer essential system variables from a short history of self-generated, task-agnostic interactions. Unlike traditional In-Context Learning that uses demonstrations to specify what task to perform, ICWM leverages the context window to understand how the system operates. By processing these interactions before task execution, the model implicitly captures the world dynamics of the current system, enabling adaptation to novel configurations without parameter updates. Extensive experiments in simulation and on real-world robot platforms demonstrate that ICWM significantly outperforms standard VLA baselines on novel camera viewpoints.

42
OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

Outcome-based reinforcement learning provides a stable optimization backbone for language agents, but its sparse trajectory-level rewards provide little guidance on which intermediate decisions should be reinforced or suppressed. On-policy self-distillation offers dense token-level supervision, yet existing skill-conditioned variants often rely on external skill memories or retrieved privileged context, which are costly to maintain and can be mismatched with the state distribution induced by the current policy in multi-turn interaction. We propose OPID (On-Policy Skill Distillation), a framework that extracts skill supervision directly from completed on-policy trajectories. OPID represents trajectory hindsight as hierarchical skills: episode-level skills capture global workflows or failure-avoidance rules, while step-level skills capture local decision knowledge at critical timesteps. A critical-first routing mechanism uses step-level skills when critical decisions are identified and falls back to episode-level skills as default guidance otherwise. The selected skill is injected into the interaction history, allowing the old policy to re-score the same sampled response under both original and skill-augmented contexts. The resulting log-probability shift yields a token-level self-distillation advantage, which is combined with the outcome advantage for policy optimization. OPID thus preserves RL as the primary training objective while introducing dense, distribution-matched hindsight supervision. Experiments on ALFWorld, WebShop and Search-based QA demonstrate that OPID generally improves agent performance, sample efficiency, and robustness over outcome-only RL and existing skill-distillation baselines. Our code is available at https://github.com/jinyangwu/OPID/tree/main.

40
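
The re-scoring step in the OPID abstract can be sketched as follows: the same sampled tokens are scored by the old policy with and without the injected skill, and the per-token log-probability difference serves as a dense signal. The abstract only says this signal is "combined with" the outcome advantage, so the additive combination and the `weight` parameter below are assumptions, and the function names are hypothetical.

```python
from typing import List


def self_distillation_shift(logp_with_skill: List[float],
                            logp_original: List[float]) -> List[float]:
    """Per-token log-probability shift of one response under the two contexts."""
    if len(logp_with_skill) != len(logp_original):
        raise ValueError("both scorings must cover the same tokens")
    return [a - b for a, b in zip(logp_with_skill, logp_original)]


def combined_advantage(outcome_advantage: float,
                       token_shift: List[float],
                       weight: float = 0.1) -> List[float]:
    """Assumed combination: trajectory-level advantage plus a weighted token-level shift."""
    return [outcome_advantage + weight * s for s in token_shift]


shift = self_distillation_shift([-1.0, -0.5], [-1.5, -0.5])
advantages = combined_advantage(outcome_advantage=1.0, token_shift=shift, weight=0.2)
```

A positive shift marks a token that the skill-augmented context makes more likely. A zero shift leaves the outcome advantage unchanged for that token.
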
Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation

While text-to-image (T2I) models have achieved remarkable progress, they struggle with real-world requests that are often underspecified, implicit, or dependent on up-to-date knowledge. We identify this challenge as the Context Gap: the mismatch between the user context and the sufficient generation context for T2I models. To bridge this gap, we propose Qwen-Image-Agent, a unified agentic framework that integrates plan, reason, search, memory and feedback in a context-centric manner. Qwen-Image-Agent treats user input as partial context and progressively constructs the generation context through Context-Aware Planning and Context Grounding. Specifically, Context-Aware Planning identifies missing context and plans how it should be acquired and used, while Context Grounding gathers this context from reason, search, memory, and feedback. To evaluate agentic image generation, we further introduce Image Agent Bench (IA-Bench), a benchmark covering four core image agent capabilities: Plan, Reason, Search, and Memory. Experiments on IA-Bench, Mindbench and WISE-Verified show that Qwen-Image-Agent outperforms strong baselines and achieves state-of-the-art performance.

38
The Verification Horizon: No Silver Bullet for Coding Agent Rewards

A classical intuition holds that verifying a solution is easier than producing one. For today's coding agents, this intuition is being inverted: as foundation models develop stronger reasoning capabilities and engineering harnesses grow more sophisticated, generating complex candidate solutions is no longer difficult -- reliably verifying them has become the harder problem. Every verifier we can build is only a proxy for human intent, never the intent itself. This makes verification subject to a twofold difficulty: first, intent is underspecified by nature, making it inherently hard to faithfully check whether it has been fulfilled; second, during model training, optimization widens the gap between proxy and intent -- manifesting as reward hacking or signal saturation. To address this, we characterize the quality of verification signals along three dimensions -- scalability, faithfulness, and robustness -- and argue that achieving all three simultaneously is the central challenge. We further study four reward constructions: a test verifier for general coding tasks, a rubric verifier for frontend tasks, the user as verifier for real-world agent tasks, and an automated agent verifier for long-horizon tasks. Across different task types and policy capability levels, we conduct in-depth analysis and experiments on the core challenges of reward design and how to more effectively leverage reward signals. Experiments show that targeted verification design can effectively suppress reward hacking, improve task completion quality, and achieve significant gains across multiple internal and public benchmarks. These experiences collectively point to a core observation: no fixed reward function can remain effective as policy capability continues to grow, and verification must co-evolve with the generator.

37
ViQ: Text-Aligned Visual Quantized Representations at Any Resolution

A unified representation for text and vision is a natural pursuit, as it enables simpler multimodal modeling and more efficient training. However, representing images as discrete signals in the same way as text inevitably introduces severe information loss. Existing work struggles to balance low-level details and high-level semantics in discrete representations: reconstruction-oriented representations often lack semantic information, whereas semantically stronger features typically suffer from severe loss of detail. We present ViQ, a Visual Quantized Representations framework, which is designed to balance semantics and details in discrete representations while supporting inputs at native resolutions, thereby enabling it to serve as a unified and general discrete representation for arbitrary visual inputs. Our approach structures quantization learning into two stages: text-aligned pre-training and feature discretization. With text-aligned pre-training, we enhance the visual encoder with semantically rich supervision from the pretrained language model and enable it to process native-resolution visual inputs. During discretization, we propose a proximal representation learning strategy to progressively compact the feature space, along with a position-aware head-wise quantization mechanism that enables flexible processing of arbitrary resolutions. Extensive experiments on multimodal tasks demonstrate that ViQ achieves competitive performance compared to state-of-the-art multimodal vision encoders with continuous and high-dimensional visual features, while maintaining high precision in low-level reconstruction. We also show that multimodal training with visual quantized representations largely improves efficiency, yielding up to 20%-70% acceleration with different base LLMs and training recipes.

37
JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

Speculative decoding (SD) accelerates autoregressive Large Language Models (LLMs) by drafting multiple tokens and verifying them in parallel, but it faces a scaling limitation: increasing the draft budget improves speed only when acceptance remains high and drafting overhead stays low. This ceiling has been difficult to break because prior head-based SD methods face a causality-efficiency dilemma. Autoregressive drafters produce path-conditioned candidates that are effective for tree speculative decoding with higher acceptance length, but their drafting cost grows with tree depth. Bidirectional block-diffusion drafters generate all positions in one pass, but their branch-agnostic marginals can form individually plausible yet mutually inconsistent trees, wasting budget and reducing acceptance. We propose JetSpec, a head-based SD framework that combines one-forward drafting efficiency with branch-wise causal conditioning. JetSpec trains a causal parallel draft head over fused hidden states from the frozen target model, producing candidate trees whose scores align with the target model's autoregressive factorization. This enables JetSpec to convert larger draft budgets into longer accepted prefixes and higher end-to-end speedup. Across math, coding, and chat benchmarks on dense and MoE Qwen3 models, JetSpec consistently outperforms bidirectional-head and tree-based SD baselines. On H100 GPUs, JetSpec achieves up to 9.64x speedup on MATH-500 and 4.58x on open-ended conversational workloads, with further latency gains demonstrated through vLLM integration under realistic serving loads. Our code and models are available at https://github.com/hao-ai-lab/JetSpec.

26
GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks, initial states, verifiers, and permitted actions. We introduce a matched execution-layer benchmark of 440 desktop tasks across 18 applications and 12 workflow categories, where screen-only GUI agents and skill-mediated CLI agents receive identical goals, states, and final-state verifiers while being restricted to modality-native actions. In this controlled setting, the strongest GUI agent reaches a 59.1% full pass rate, outperforming the strongest original-skill CLI agent at 48.2%; however, verifier-guided skill augmentation raises CLI success to 69.3%, showing that much of the CLI deficit comes from incomplete skill coverage rather than model capability alone. These results suggest that GUI and CLI expose different execution bottlenecks: GUI agents are limited by reliable grounded interaction over long-horizon workflows, whereas CLI agents are limited by the coverage and scalability of their skill interfaces.

25
Fast LeWorldModel

Joint-Embedding Predictive Architectures (JEPAs), including the recent LeWorldModel (LeWM), have become a promising foundation for reconstruction-free visual world models. For visual planning, however, LeWM evaluates candidate action sequences by repeatedly applying a local one-step latent transition model. This autoregressive rollout makes planning computationally expensive and exposes the predicted trajectory to accumulated latent errors as the horizon grows. We propose Fast LeWorldModel (Fast-LeWM), a fast latent world model that replaces repeated local rollout with action-prefix prediction. Given the current latent and a candidate action sequence, Fast-LeWM encodes its prefixes and predicts the future latents reached after executing those prefixes in parallel. By making action prefixes the basic prediction unit, Fast-LeWM directly models action effects accumulated to different extents over multiple horizons. This prefix-level supervision forces the model to learn how states continuously evolve under different action prefixes, rather than only fitting one-step state transitions. During planning, the predictor can use the last prefix token from the encoded action sequence to evaluate the corresponding future latent without explicitly rolling through each intermediate imagined state. Across multiple tasks, Fast-LeWM improves average success over LeWM while substantially reducing planning time, achieving lower open-loop latent loss whose growth becomes significantly slower as the rollout horizon increases.

21
Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments

As agentic systems continue to evolve and are widely deployed in real-world scenarios, there is a growing demand to faithfully evaluate their capabilities. However, current benchmarks are typically built on popular applications with relatively simple tasks and focus on a narrow set of capabilities while overlooking broader dimensions, resulting in saturated performance on modern agents and failing to probe their limitations. To this end, we introduce GauntletBench, a web-based benchmark for evaluating agent generalisation in challenging scenarios, focusing on three underexplored capabilities (temporal perception, graphical understanding, and 3D reasoning), across five less-covered professional applications (Video Editor, Workflow Builder, 3D Modeller, Flight Analyser, and Circuit Designer), each with 20 vision-intensive tasks (100 in total). Our benchmark provides a modular pipeline that comprises an environment compatible with both open- and closed-source agent frameworks, a controlled web-based application, a well-structured task suite, and an automated evaluation engine with diverse metrics. Contrary to widespread expectations, our empirical results reveal that frontier agentic systems remain far from achieving human-level performance. Even the state-of-the-art agent achieves only a 19.1% success rate on our GauntletBench, highlighting the limitations in these overlooked capabilities and generalisation. By comparison, non-expert human annotators achieve over 80% success on our challenging yet feasible tasks, revealing the substantial gap between current agent capabilities and those required for complex real-world scenarios.

15
Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It

Tool use enables large language models (LLMs) to perform complex tasks, and recent agentic reinforcement learning (RL) methods show promise for enhancing model capabilities. However, RL alone often leads to instability or limited gains in tool-use tasks. In our experiments, some models exhibit catastrophic collapse, where performance abruptly drops and tool-invocation structures fail. The analysis reveals that these failures stem from unexpected probability spikes in specific control tokens, disrupting structured execution, yet the underlying tool-use capability remains intact, merely obscured by specific formats. To address this, we systematically investigate a diverse set of supervisory signals, including off-policy supervision, hint-based guidance, erroneous example supervision, and others, applied under both synchronous and interleaved training schemes. We find that interleaving supervised fine-tuning (SFT) with RL substantially improves stability, but exhibits degraded performance under format and content out-of-distribution (OOD) evaluation. We also analyze the impact of learning rates and generalization across settings. These results highlight the importance of understanding RL failures and demonstrate how diverse supervisory signals can guide exploratory learning, enabling robust training of LLMs for complex, multi-step tool-use tasks. Our code is available at https://github.com/hypasd-art/Tool-RL-Box.

15
LISA: Likelihood Score Alignment for Visual-condition Controllable Generation

The prevalent dual-branch paradigm, i.e., training a side network to encode visual conditions and fusing its intermediate-layer features to a frozen pretrained main network, has shown remarkable success in visual-condition controllable generation. Despite its widespread adoption, the role of the side branch and its training efficiency remain underexplored. In this paper, we first revisit this mainstream paradigm through the lens of score-based generative modeling: 1) The main network preserves visual perceptual quality by providing a prior unconditional score. 2) The side network steers conditional control by implicitly contributing a likelihood score. Guided by this perspective, we propose LIkelihood Score Alignment (LISA), an effective regularization method that explicitly aligns the intermediate feature of the side network with an approximated likelihood score. Specifically, we first hook features from a designated layer of the side network and project them into the score latent space by a lightweight decoder. Then, we construct an approximated likelihood score target and calculate the distance between the decoder's output and this target as an additional regularization loss. Finally, we jointly optimize the side network and decoder with both standard diffusion loss and our regularization loss. Experiments across various image/video tasks, architectures, and diffusion/flow models demonstrate that LISA not only consistently accelerates training convergence and improves final synthesis results, but also encourages the side network's features to be more disentangled for conditional modeling, with negligible additional training cost and zero extra inference cost.

13
Information-Aware KV Cache Compression for Long Reasoning

Reasoning capability has advanced rapidly in large language models (LLMs), leading to an increasing size of key-value (KV) cache in both prefilling and decoding stages. Existing KV cache compression methods mainly rely on attention weights to estimate token importance. While attention effectively captures contextual relevance, it overlooks complementary information-theoretic signals related to predictive uncertainty and token informativeness. In this paper, we revisit token importance from a forward-looking perspective and introduce Forward Influence, a metric that measures how compressed tokens affect future contexts. Our analysis reveals that tokens selected by attention scores mainly influence nearby contexts, whereas tokens associated with high predictive uncertainty exhibit substantially stronger influence on distant future contexts. Based on this observation, we propose InfoKV, an entropy-aware KV cache compression framework that incorporates information-theoretic signals. It combines token-level predictive uncertainty with layer-wise representation evolution and integrates the resulting entropy scores with attention scores during reasoning. Experiments on long-context reasoning benchmarks with Llama-3.1, Llama-3.2, and DeepSeek-R1 demonstrate that InfoKV consistently outperforms existing attention-based KV compression methods in both long prefilling and decoding scenarios.

9
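
The idea of integrating entropy scores with attention scores when choosing which cached tokens to keep can be sketched in a few lines. The sketch covers only the token-level uncertainty signal and leaves out the layer-wise representation-evolution term. The min-max normalisation, the linear mix, and the `mix` parameter are assumptions made for illustration, and this is not InfoKV's actual scoring rule.

```python
import math
from typing import List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def _min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def select_tokens_to_keep(attn_scores: Sequence[float],
                          entropies: Sequence[float],
                          budget: int,
                          mix: float = 0.5) -> List[int]:
    """Keep the `budget` token positions with the highest mixed score."""
    attn = _min_max(attn_scores)
    ent = _min_max(entropies)
    mixed = [(1.0 - mix) * a + mix * h for a, h in zip(attn, ent)]
    ranked = sorted(range(len(mixed)), key=lambda i: mixed[i], reverse=True)
    return sorted(ranked[:budget])


attn_scores = [0.9, 0.1, 0.2, 0.8]
entropies = [0.1, 2.0, 0.2, 0.1]
kept_mixed = select_tokens_to_keep(attn_scores, entropies, budget=2, mix=0.5)
kept_attention_only = select_tokens_to_keep(attn_scores, entropies, budget=2, mix=0.0)
```

With `mix=0.0` the selection falls back to attention alone and drops the high-uncertainty token at position 1. The mixed score keeps it.
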
Confidence-Aware Tool Orchestration for Robust Video Understanding

Video reasoning language models implicitly assume that every input frame is equally reliable. This leads to what we term the Blind Trust Problem: under realistic perturbations such as motion blur, glare, or occlusion, frontier video reasoning models can suffer 15-30%p accuracy drops on real-world embodied benchmarks, while remaining unaware that their visual evidence has been degraded. To address this challenge, we propose Robust-TO, an agentic video understanding framework that explicitly integrates per-frame trustworthiness into every stage of reasoning. Robust-TO organizes heterogeneous visual perception tools under a unified evidence interface. Each tool receives a sub-query derived from the original question and a set of trustworthy frames selected by the reliability-relevance score. It returns evidence in a shared format: a concrete prediction (e.g., a bounding box, motion trajectory, recognized text, or action label), temporal grounding, and a calibrated reliability score. During reasoning, these calibrated scores guide evidence weighting in a three-tier synthesis process (high/medium/low) and define a confidence-cost GRPO reward that jointly optimizes correctness, evidence reliability, and efficiency. On two video reasoning benchmarks spanning eight tasks, Robust-TO achieves 56.4% average accuracy on clean inputs, surpassing the strongest open-source baseline by 10.6%p and outperforming Gemini-2.5-Pro (46.2%). Under five realistic corruption types, Robust-TO maintains 54.3% average accuracy, 5.8%p above the strongest open-source baseline, while exhibiting the smallest clean-to-corrupted accuracy drop among all compared methods.

9
PhysiFormer: Learning to Simulate Mechanics in World Space

We present PhysiFormer, a diffusion transformer for physically-plausible 3D object motion. Unlike video world models that operate in view-dependent pixel space, PhysiFormer represents objects as 3D meshes expressed in world coordinates. Given the initial vertex positions and velocities, as well as object material type, rigid or elastic, the model samples future vertex trajectories. While related neural physics approaches build on ad-hoc latent spaces or explicitly enforce rigidity and causality, PhysiFormer shows that excellent results can be obtained without any such inductive biases, by casting vertex trajectory prediction as a single denoising diffusion process directly in world coordinates. The probabilistic formulation captures uncertainty in the learned dynamics, enabling diverse plausible futures from initial conditions, making this framework potentially useful for applications with unobserved uncertainty. The model features attention factorised over time, space, and objects for efficiency, enabling permutation-invariant multi-object reasoning without needing explicit object encoding. Trained on over 100k simulated trajectories, PhysiFormer generates rigid and elastic mechanics, and generalises to mixed-material settings, unseen real-world geometries, and larger object counts. It substantially outperforms autoregressive baselines in trajectory accuracy, rigidity preservation, and momentum-based physical consistency. Our results position coordinate-space diffusion as a promising step toward view-invariant, geometry-aware world modelling for robotics, graphics, and physical design. Visualisations, code, and models are available at https://yimingc9.github.io/physiformer.

9
Hallucination in World Models is Predictable and Preventable

Modern generative world models render increasingly realistic action-controllable futures, yet they frequently hallucinate: rollouts remain visually fluent while drifting from the ground-truth dynamics. We hypothesize that hallucination concentrates in low-coverage regions of the state-action space, where lightweight data-centric signals can both detect it and guide mitigation. To test this, we introduce MMBench2, a 427-hour, 210-task dataset for visual world modeling with ground-truth actions, rewards, and live simulators, and train a 350M-parameter world model on it. We identify three distinct hallucination modes: perceptual, action-marginalized, and scene-diverging -- each anchored to a different stage of the pipeline, and develop three signals that accurately predict where the model will fail. To close coverage gaps at training time, we develop a coverage-aware sampling technique; to close them online, our hallucination predictors serve as curiosity rewards for targeted data collection, yielding a data-efficient finetuning recipe that adapts the pretrained world model to entirely unseen environments with as few as 50 real environment trajectories. Overall, our findings reveal that hallucination in world models is inherently a data coverage issue, and that the same signals used to detect it can also be used for mitigation. An interactive web version of our paper is available at https://www.nicklashansen.com/mmbench2

8
CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies

As LLM agents become capable of increasingly long-horizon tasks, evaluating their performance in economic systems is becoming increasingly important. Unlike existing benchmarks that primarily evaluate a single agent interacting with a passive environment, economic systems are inherently multi-agent, requiring autonomous agents to communicate, negotiate, and transact while pursuing their own objectives over extended periods. We introduce CoffeeBench, a benchmark for evaluating LLM agents in a long-horizon multi-agent economy composed of heterogeneous firms. In CoffeeBench, two farmers, two roasters, and two retailers autonomously operate their businesses over a 90-day simulation, each seeking to maximize cumulative net income through communication and transactions while managing cash, inventory, and pricing. The evaluated model controls one coffee roaster, while the remaining firms are controlled by fixed reference agents. Across several recent open-weight and proprietary LLMs, all models outperform a passive baseline that takes no actions, with most achieving positive net income. Analysis of agent behavior reveals substantial differences in long-horizon economic interaction: higher-performing models communicate more actively with other firms, whereas Claude Haiku 4.5 exhibits an idle-drift failure mode, repeatedly choosing inaction despite producing coherent assessments and plans. We release our code and agent trajectories to support future research.

7
Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents

Process reward models enable fine-grained, step-level evaluation of LLMs, yet building them for agentic settings remains prohibitively difficult: long-horizon interactions, irreversible actions, and stochastic environment feedback make both human annotation and Monte Carlo estimation infeasible at scale. In this work, we show that reinforcement learning (RL) post-training already provides the ingredients for effective step-level scoring, eliminating the need for dedicated reward model training altogether. Concretely, we derive an implicit advantage under a general stochastic Markov decision process, which we term progress advantage: the log-probability ratio between the RL-trained policy and its reference policy exactly recovers the optimal advantage function. This formulation makes the resulting signal annotation-free, domain-agnostic, and available as a byproduct of the standard RL post-training pipeline. We validate the effectiveness of the progress advantage across three different applications: test-time scaling, uncertainty quantification, and failure attribution on five benchmarks and four model families. Across all settings, it consistently outperforms confidence-based baselines and, despite requiring no task-specific training, surpasses dedicated trained reward models. We complement these results with deeper analyses on characteristics of progress advantage, offering practical guidance for adoption in real-world agentic systems.

6
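
The log-probability ratio described in the progress-advantage abstract is simple to compute once both policies have scored the same trajectory. The sketch below assumes a step's score is the sum of its token log-ratios, scaled by a hypothetical coefficient `beta`. The abstract specifies neither detail, and the function names are illustrative.

```python
from typing import List, Tuple

# One step = (token log-probs under the RL-trained policy,
#             token log-probs under the reference policy)
Step = Tuple[List[float], List[float]]


def step_scores(steps: List[Step], beta: float = 1.0) -> List[float]:
    """Per-step log-probability ratio between the trained and reference policies."""
    scores = []
    for logp_policy, logp_reference in steps:
        if len(logp_policy) != len(logp_reference):
            raise ValueError("both policies must score the same tokens")
        scores.append(beta * (sum(logp_policy) - sum(logp_reference)))
    return scores


def most_suspect_step(scores: List[float]) -> int:
    """Failure attribution, simplified: index of the step with the lowest score."""
    return min(range(len(scores)), key=lambda i: scores[i])


trajectory = [
    ([-0.2, -0.1], [-0.5, -0.4]),  # trained policy prefers this step more than the reference does
    ([-2.0], [-0.5]),              # trained policy prefers this step less
]
scores = step_scores(trajectory)
suspect = most_suspect_step(scores)
```

Because the score needs only two forward passes over an existing trajectory, no separate reward model is trained. That is the "free lunch" the title refers to.
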
Discretizing Reward Models

Despite their widespread use, the role of reward models in shaping reinforcement learning is poorly understood. Reward models offer a tempting promise: they automatically estimate response quality in the absence of verifiers or human judges. Unlike "verifiable rewards" which typically produce binary scores, reward models typically produce continuous scores, allowing them to be sensitive to fine-grained differences in responses. However, we show this apparent strength is a serious weakness: many popular reward models are oversensitive, assigning different scores to equally good responses. Theoretically, we show that seemingly perfect reward models can be highly oversensitive; empirically, this oversensitivity can lead to bad policies. In place of existing notions of "reward model accuracy," we propose evaluating reward models using distinct measures of "discriminative ability" and "specificity" (the complement of oversensitivity). As a solution, we describe a training-free algorithm that uses Monte Carlo dropout on any neural reward model to produce discrete reward clusters. Theoretically, we prove there exist discretizations that reduce oversensitivity at minimal expense of discriminative ability; empirically we show, in both controlled and natural RL settings, that discretizing rewards leads to less reward hacking and better policies than training on the original rewards.

6
COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami

While generative AI has achieved remarkable success in solving problems with verifiable solutions, generating physical art that satisfies both strict geometric constraints and subjective visual aesthetics remains a challenge. This paper presents an approach to tackle these difficulties in the domain of computational origami, a mathematically rigid environment that grounds artistic design within the equations of flat foldability. We present COrigami, an end-to-end AI-driven pipeline that assists the design cycle by generating crease patterns from natural language. Our pipeline involves generating a semantic stick figure, computing a base packing, solving for a flat-foldable crease pattern, shaping the flat-folded crease pattern, and refining the generated model using reinforcement learning driven by an autonomous aesthetic evaluation loop. Our system acts as a highly effective collaborative assistant, generating structural starting points that human artists can further expand and shape. By integrating algorithmic optimisation with autonomous aesthetic critique, this work demonstrates how AI systems can satisfy multi-objective physical constraints to enable reliable, mathematically grounded co-creativity.

4
When Does Combining Language Models Help? A Co-Failure Ceiling on Routing, Voting, and Mixture-of-Agents Across 67 Frontier Models

Multi-model LLM systems such as routing, voting, cascades, fusion, and mixture-of-agents are used to beat single-model accuracy. We show that their gain is capped by a quantity the field rarely reports. For any policy whose output is one member model's answer, accuracy cannot exceed one minus beta, where beta is the rate at which every model is wrong on the same query. In contrast, the usual diagnostic, average pairwise error correlation rho, cannot identify beta: error laws with identical marginals and pairwise correlations can have different all-wrong rates. A Clopper-Pearson bound on beta gives a finite-sample certificate on the largest gain any router, vote, or cascade could deliver before training a router. Across 67 models from 21 providers, a tetrachoric-calibrated single-factor model still underprices the all-wrong tail: on open-ended mathematics, observed beta is 0.052 versus 0.023 under the full 67-model Gaussian copula, about 2.5 times underpricing, with 90 percent CI 1.7 to 3.4 and k equals 17. The effect recurs on execution-graded code, where beta is 0.079. Re-asking the same GPQA-Diamond questions in free-response rather than multiple-choice form reopens the tail, with beta 0.127 and a five-judge panel with kappa 0.73 to 0.92, locating co-failure in answer format rather than subject. At matched quality, low-rho heterogeneous ensembles beat high-rho Self-MoA, but on checkable tasks in our pool, combining models rarely beats the single best model without a strong query-level routing signal. Gains come from models failing on different questions, not from adding more models.
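
The ceiling described above can be sketched in a few lines. The example below computes beta from a toy 0/1 correctness matrix, derives the ceiling 1 - beta, and adds a one-sided Clopper-Pearson lower bound on beta found by bisection. The abstract does not say which side of the interval the authors use; a lower bound is assumed here because a lower bound on beta yields an upper bound on achievable accuracy. The matrix, function names, and alpha = 0.05 are illustrative assumptions.

```python
from math import comb
from typing import Sequence


def all_wrong_count(correct: Sequence[Sequence[int]]) -> int:
    """Number of queries (rows) on which every model (column) is wrong."""
    return sum(1 for row in correct if not any(row))


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson_lower(k: int, n: int, alpha: float = 0.05) -> float:
    """One-sided lower Clopper-Pearson bound on a binomial rate.

    Finds p such that P(X >= k) = alpha by bisection.
    """
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        # P(X >= k) grows with p; move up while it is still below alpha.
        if 1.0 - binom_cdf(k - 1, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Toy data: 4 queries x 3 models, 1 = correct, 0 = wrong.
correct = [
    [1, 0, 0],
    [0, 0, 0],  # every model fails on this query
    [0, 1, 1],
    [1, 1, 1],
]
n = len(correct)
k = all_wrong_count(correct)
beta = k / n
ceiling = 1.0 - beta  # no pick-one-member policy can exceed this
best_single = max(sum(col) / n for col in zip(*correct))
headroom = ceiling - best_single  # largest gain any router could add
certified_ceiling = 1.0 - clopper_pearson_lower(k, n)
```

With only four queries the certified ceiling is close to 1, which shows why a finite-sample bound matters: a small evaluation set cannot rule out much headroom.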

3
How Post-Training Shapes Biological Reasoning Models

Scientific reasoning models for biology combine language models with foundation models trained on multimodal biological data, including DNA, RNA, and proteins. These models are built through post-training, yet how each stage shapes reasoning and generalization remains poorly understood. We study when post-training improves performance and when it induces over-specialization. Across genomics, transcriptomics, and proteins, we train and evaluate more than 100 biological reasoning models under controlled variation in backbone, continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL), measuring both in-domain (ID) and out-of-domain (OOD) performance. We find that each post-training stage reshapes generalization in a distinct way rather than contributing uniform gains. CPT improves downstream performance by aligning models with biological language. SFT consistently increases ID performance but causes OOD performance to peak early and decline as models fit the training distribution. RL, when applied to strong SFT checkpoints with aligned rewards, improves OOD performance and partially recovers generalization. These results show that biological reasoning does not improve monotonically with additional supervision or compute. Instead, performance depends on how training stages are composed. Under fixed post-training budgets, the strongest ID-OOD trade-off comes from brief SFT, larger RL allocations, and asymmetric adaptation capacity across stages.

3
OpenBioRQ: Unsolved Biomedical Research Questions for Agents

A working citation looks like proof, but the fact that a link resolves does not mean the cited paper supports the claim. I find that current agentic models rarely fabricate citations (over 99% resolve), yet roughly 15.9% link to the wrong paper. Existing benchmarks miss this failure mode: when a question has a fixed answer key, a model can reproduce the expected source from that key rather than independently verifying that the source supports the claim. I introduce OpenBioRQ, a retrieval-grounded agentic benchmark of 12,553 unsolved biomedical research questions across 12 domains that treats open questions as a faithfulness-and-abstention probe. To my knowledge, this is the first biomedical benchmark to combine an agentic setting, where the model must issue multiple tool calls, with unsolved questions that have no answer key. Openness is verified against real follow-up evidence rather than a model's parametric knowledge. Difficulty is empirical: I anchor it on questions that three open-weight reference models fail to answer, rather than on subjective hardness labels. On this hardest subset, held-out models from the same lineage as the difficulty anchors solve only ~17%, while three independent frontier agents (Gemini-3-Pro, Opus-4.7, GPT-5.5) span a wide 29-60% range. The benchmark is thus hard, non-saturating (the best agent still leaves ~33-40% unsolved), and discriminating across capability tiers. Beyond difficulty, I observe agentic collapse on the hardest questions, where agents stop using their tools. For the most collapse-prone model, blocking tool access entirely barely changes its score, so tools stop paying off exactly where they are needed most. A frozen per-question checklist raises inter-judge agreement from Spearman 0.35 to 0.82.

3
ABACUS: Adapting Unified Foundation Model for Bridging Image Count Understanding and Generation

ABACUS is a unified vision-language model that handles object counting, crowd counting, referring-expression counting, and count-faithful image generation without benchmark-specific training. Our model is built on an existing 3B-parameter unified foundation model and is adapted for object localization tasks using three key innovations: density-aware adaptive zooming with objectness maps for spatial grounding; a boundary-aware count policy via GRPO to eliminate crop-boundary errors; and a cycle-consistent GRPO strategy where the understanding branch self-critiques generated outputs, closing the understanding-generation gap without any external annotations. ABACUS achieves state-of-the-art results across seven benchmarks, outperforming both task-specific specialists and larger generalist models.

2
EO-WM: A Physically Informed World Model for Probabilistic Earth Observation Forecasting

Earth Observation (EO) forecasting aims to predict future Earth surface dynamics from satellite observations under changing meteorological conditions. In this paper, we view this task as a partially observed, weather-driven world modeling problem, in which weather acts as a conditioning signal, while forecasting remains uncertain due to sparse observations and unobserved land-surface states. However, existing methods do not fully capture this setting: deterministic models collapse uncertainty into a single future prediction, while diffusion-based methods typically treat weather variables as undifferentiated conditioning signals, and existing benchmarks focus mainly on reconstruction accuracy rather than whether forecasts respond correctly to changed weather forcing. We introduce EO-WM, a video diffusion transformer for multispectral EO forecasting. EO-WM incorporates a physically informed conditioning framework that represents meteorological forcing through a climatological baseline, weather anomalies, and cumulative physical stress signals. Specifically, it separates baseline and anomaly through distinct conditioning pathways, and accumulates anomalous forcing over time to capture sustained heat and drought stress. To evaluate weather-response behavior beyond standard metrics, we introduce two diagnostic benchmarks: an Extreme Summer Benchmark for severity-aware prediction of vegetation degradation under extreme weather, and a Seasonal Matched-Pair Benchmark for testing response fidelity under changed weather forcing. Experiments show that EO-WM reduces the error in predicted Normalized Difference Vegetation Index (NDVI) decline amplitude by a relative 5.63% and improves directional hit rate by a relative 7.80%, while remaining competitive on standard pixel-level metrics. The benchmarks and model will be made open-source at https://github.com/Luo-Z13/EO-WM.
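
To make the baseline, anomaly, and cumulative-stress split more concrete, here is a minimal sketch on a toy temperature series, together with the standard NDVI formula. How EO-WM actually encodes these signals is not described in the abstract; accumulating only the positive (hotter than normal) part of the anomaly is an assumption chosen for illustration, and all names and numbers are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    Assumes nir + red is non-zero.
    """
    return (nir - red) / (nir + red)


def decompose_forcing(
    weather: Sequence[float],
    climatology: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Split a weather series into anomaly and cumulative stress.

    anomaly[t] = weather[t] - climatology[t]
    stress[t]  = running sum of the positive part of the anomaly
                 (an illustrative choice, not the paper's definition).
    """
    anomaly = [w - c for w, c in zip(weather, climatology)]
    stress = list(accumulate(max(a, 0.0) for a in anomaly))
    return anomaly, stress


# Toy daily maximum temperatures (degrees C) against a climatological baseline.
temps = [30.0, 33.0, 35.0, 29.0]
baseline = [30.0, 31.0, 31.0, 30.0]
anomaly, stress = decompose_forcing(temps, baseline)
```

The stress series keeps rising through the hot spell and then stays flat on the cooler day, which is the kind of memory of sustained heat that a per-day anomaly alone does not carry.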

2
05

PRODUCT HUNT


Product Hunt - June 27, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

Nada

Compose music with just your voice

0
Epilogue. Write novels, scripts & poetry

The professional book writing app built for serious authors

0
Cloud World Model

Simulate AWS, GCP & DigitalOcean without paying the bill

0
RetroMac

Turn your Mac into a time machine.

0
Folio AI

Claude for PowerPoint, on steroids

0
Supra Player

Compare & Sync Videos Fast

0
QApilot's CoWork

3x Mobile Automation. Same QE Team.

0
Cewsco

All-in-one AI assistant — chat, images, voice & market data

0
ModuleX

AI workspace that’s already connected to everything

0
AI Slide Editor by CubeOne

The editor PowerPoint should've shipped

0
Animdock Motion Templates in the Browser

Create trend motions in your browser!

0
Sleek Analytics

See who's on your site. Right now.

0
SquidHub

Multiplayer mode for humans and AI

0
Aurora Notch

A private notch workspace for every Mac

0
Gemini Spark

Your 24/7 personal AI agent

0
Basedash for Excel

Turn any Excel file into a live dashboard

0
note.md

your notes and research documentation now a local LLM Memory

0
Agent Arena

The first public arena for AI agents

0
Group Subscriptions by beehiiv

Sell subscriptions to teams, companies, and organizations.

0
Atlas

Every AI tool you use should know how your company works

0
LockIn MCP

Let AI block distractions for you when you need to lock in

0
DMV by Agent Community

A community-governed namespace for AI agents

0
Genspark Design

Generate UI prototypes, videos, and posters with AI

0
SayCraft

Build a web app by talking through a meeting

0
BrowserBash

CLI that turns plain-English into real browser tests

0
Grass 2.0

The always-on computer for your coding agents

0
Tough Tongue AI for Sales

Live AI teammate for every tough sales conversation

0
Polygraph

Let AI agents see cross repo and maintain session memory.

0
SendTidings

Turn your analytics into beautiful monthly email reports

0
Papermark Agents

Let AI agents run your next deal, fundraise or data room

0
Blop

Describe your app and Blop tests it and repairs broken tests

0
Signspell

Real-time ASL alphabet recognition in py, pip install and go

0
Postproxy - Engagement API

Publish, reply, and analyze social media via API

0
Nashra

Turn followers into clients.

0
Dub Ninja

Live autonomous AI DJ that digs, mixes & explains 24/7

0
Oxlo.ai

Scale across AI models without scaling your bill

0
Milestones

Native project planning app, now on Mac & with an MCP server

0
Zaro

Build agents & apps on top of your context with one prompt.

0
BrowserAct

Web browser automation for AI agents

0
Samepage Signals

Your second brain for product management

0
MeetPoint

Find the city where everyone's flights are cheapest

0
VTT for Mac

Voice-to-text for macOS with a fully on-device option

0
Sidegent

Learn to build AI agents by actually building them

0
Figma Motion

Your Figma canvas now has a timeline

0
Heron

Wireshark for AI Agents: passive eBPF observability

0
Paybond CLI

Safe agent spend from the terminal

0
QuickMaker

State of the art AI models in Blender under one subscription

0
Brain² by ClickUp

One AI that knows your entire company and acts on it

0
Stripe.Directory

New way for you & agents to search for businesses on Stripe

0
Tencent EdgeOne Makers

Ship AI agents like web apps, in minutes.

0
06

TECHMEME


Techmeme - June 27, 2026

Techmeme Digest: Major tech headlines and industry conversations.

Polymarket says its annualized revenue is now $1B+; Dune Analytics: daily volume on Polymarket's US platform rose from ~$50M in mid-May to $200M+ by June 20 (Davis Giangiulio/CNBC)
Source: Techmeme · Published: Jun 27, 2026

Davis Giangiulio / CNBC: Prediction market platform Polymarket's annualized revenue is now well above $1 billion, the company shared exclusively with CNBC on Friday.

Streaming services must comply with a California law that bans playing ads louder than the content being watched from July 1, but its implementation is unclear (Scharon Harding/Ars Technica)
Source: Techmeme · Published: Jun 27, 2026

Scharon Harding / Ars Technica: On July 1, it will be illegal for streaming platforms to play ads louder than the content being watched in California.

A look at advanced chip packaging, now more reliant on TSMC and its partners in Taiwan than ever, and the efforts to address this bottleneck in the US (Don Clark/New York Times)
Source: Techmeme · Published: Jun 27, 2026

Don Clark / New York Times: A silicon wafer reflecting Subramanian Iyer, a specialist at the University of California, Los Angeles, in a technology called advanced chip packaging.

Sources: Meituan, Baidu, Xiaomi, and other Chinese tech giants have been trimming their workforces, fueling Chinese workers' concerns of being replaced by AI (Wency Chen/South China Morning Post)
Source: Techmeme · Published: Jun 27, 2026

Wency Chen / South China Morning Post: When a friend checked in on a Meituan employee late last month to see if he had survived the latest round of corporate culling …

Japanese financial giant SBI agrees to acquire Bitbank, a top 10 Japanese crypto exchange by trading activity, for ~$289M, with the deal set to close in October (Jamie Crawley/CoinDesk)
Source: Techmeme · Published: Jun 27, 2026

Jamie Crawley / CoinDesk: Japanese financial services giant SBI Holdings said it agreed to buy cryptocurrency exchange Bitbank for around $289 million.

Source: Intel has promised to deliver SpaceX and Apple a toolkit this fall to test its 14A node before they make final commitments to produce chips with Intel (Tripp Mickle/New York Times)
Source: Techmeme · Published: Jun 27, 2026

Tripp Mickle / New York Times: At a tech conference in San Francisco this week, admirers surrounded Lip-Bu Tan, the chief executive of Intel …

Insurance tech startup Corgi denies accusations that it used Papermark's open source software code to develop its software and present it as its own (Julie Bort/TechCrunch)
Source: Techmeme · Published: Jun 27, 2026

Julie Bort / TechCrunch: Y Combinator-backed insurance tech startup Corgi became embroiled in yet another controversy earlier this week when Papermark …

Sources: the CFTC began an extensive investigation of Polymarket earlier this year; the agency's former acting head killed a separate investigation in July 2025 (New York Times)
Source: Techmeme · Published: Jun 27, 2026

New York Times: Last year, the Commodity Futures Trading Commission overruled its enforcement attorneys and killed a separate inquiry into whether …

Sources: Apple is lobbying the Trump admin for clearance to buy memory chips from US-blacklisted Chinese company CXMT to ease pressure from rising chip prices (Financial Times)
Source: Techmeme · Published: Jun 27, 2026

Financial Times: The iPhone maker wants the Trump administration to sign off on purchases to ease pressure from rising chip prices.

Sources: SpaceX and Charter have held executive-level talks on a consumer mobile phone offering, which would help SpaceX become a DTC mobile phone provider (Kelcee Griffis/Bloomberg)
Source: Techmeme · Published: Jun 27, 2026

Kelcee Griffis / Bloomberg: SpaceX and Charter Communications Inc. have held executive-level talks about partnering on a consumer mobile phone offering, according to people familiar with the matter.

Anthropic says the US government is allowing Mythos 5 to be redeployed to critical infrastructure operators, and is working to restore general access to Fable 5 (@anthropicai)
Source: Techmeme · Published: Jun 27, 2026

@anthropicai: Since June 12, we've been working closely with the US government to restore access to Claude Mythos 5 and Fable 5. Today, the government notified us that Mythos 5, our strongest cybersecurity model, can be redeployed to a set of US organizations that operate and defend critical

Sources: Meta lobbyists are urging California lawmakers to exempt social media platforms from legislation that would increase penalties in child-harm cases (Tyler Katzenberger/Politico)
Source: Techmeme · Published: Jun 26, 2026

Tyler Katzenberger / Politico: Meta's plea comes as it faces hundreds of lawsuits accusing the company of failing to protect kids' safety on its platforms.

Letter: the US lifts its block on Mythos 5, allowing Anthropic to release it to more than 100 US institutions; sources: talks about Fable 5 are ongoing (Semafor)
Source: Techmeme · Published: Jun 26, 2026

Semafor: The US government Friday lifted its block on Anthropic's powerful Claude Mythos 5 AI model, allowing the company to release …

Anthropic Moves Toward Deal With US to Lift Curbs on AI Models (Bloomberg)
Source: Techmeme · Published: Jun 26, 2026

Bloomberg: Anthropic PBC and the Trump administration are moving closer to an agreement that would lift US restrictions on the company's top two artificial intelligence models after weeks of talks between the two sides over security of the systems …

How AI-native law firms use "management services organization" structures to access capital historically barred from US law firms, including PE and VC funds (Stephen Foley/Financial Times)
Source: Techmeme · Published: Jun 26, 2026

Stephen Foley / Financial Times: Interest in a model that separates legal casework from other operations exploded alongside the new tech

07

STARTUP ARCHIVE


Startup News - June 27, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction.”

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

“Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot.”

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice.”

Paul Graham on why starting with a “small, intense fire” is the key to startup growth
Source: Startup · Published: Mar 20, 2026

"You have to know who those first users are and how you're going to get them."

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth.”

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?”

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations”

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over.”

Spencer Rascoff: “I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

“If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive.”

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right”

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT


Solidot News - June 27, 2026

Solidot Feed: Highlighting essential tech & open-source news.

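Airbus ordered to carry out urgent inspections of A380 wings

The European Union Aviation Safety Agency (EASA) has ordered emergency inspections of 16 Airbus A380 aircraft operated by Emirates and Qantas after cracks were found in wing components on some A380s. EASA said the cracks were discovered during earlier inspections of the wing spar structure and determined that they could reduce the structural integrity of the wing. To address the potential safety hazard, Airbus must carry out additional detailed inspections. Of the 16 A380s, 15 are operated by Emirates and 1 by Qantas. The A380 is the highest-capacity passenger airliner; 254 were built, and production has ended.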

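From praising virtue to celebrating degradation

Researchers at Queen Mary University of London analyzed the lyrics of more than 380,000 songs released between 1960 and 2023 and found that the emotional and moral language of popular music has changed markedly. Words expressing moral virtues such as care and decency have become rarer over time, while language associated with harm, cheating, subversion, and degradation has increased. The researchers noted that music is more than entertainment: it is one of the ways a society tells stories about itself, and analyzing decades of lyrics makes it possible to see how emotional expression and moral narratives evolve over time. The study also found that female artists were more often associated with virtues such as care and loyalty, while male artists and mixed-gender groups were more often associated with negative themes such as harm, subversion, and degradation.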

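Great ape laughter shares its rhythm with human laughter and has existed for 15 million years

According to a study published in the journal Communications Biology, the rhythm of great ape laughter may be similar to that of modern humans, and this pattern has persisted for at least 15 million years. The results also suggest that over the course of great ape evolution, laughter became faster, more variable, and increasingly shaped by context. All great apes (hominids) laugh, including species closely related to humans, such as bonobos, and more distantly related species, such as Bornean orangutans. How the rhythm of laughter evolved over time, and how it might relate to the evolution of human language, was previously unclear. In the study, researchers at the University of Warwick in the UK analyzed laughter recordings from 4 Bornean orangutans (Pongo pygmaeus), 2 gorillas (Gorilla gorilla), 3 bonobos (Pan paniscus), 4 chimpanzees (Pan troglodytes), and 4 humans, all aged between 6 months and 7 years. The scientists examined 140 laughter sequences and measured the time interval between successive vocalizations. They found that laughter in every species follows a regular rhythmic pattern, with evenly spaced intervals between consecutive vocalizations. Because the pattern appears in all the species studied, the researchers infer that this rhythmic laughter may already have been present in their common ancestor 15 million years ago. They also infer that laughter became faster and more varied over time: humans change the speed of their laughter with context, for example laughing faster when tickled than when playing, whereas other apes do not. In addition, the more closely related an ape species is to humans, the more variable its laughter rhythm. These findings suggest that vocal flexibility and control may have gradually increased during the evolution of great apes and humans, which the authors speculate may have contributed to the emergence of language. Studies with larger samples are needed to confirm the findings.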

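Walking five minutes every hour helps offset the harms of prolonged sitting

Prolonged sitting is a health risk, but interventions targeting sedentary behavior need to balance feasibility and effectiveness. In a study published in the British Journal of Sports Medicine, researchers evaluated interventions in which participants stood up and walked for 5 minutes every 30, 60, or 120 minutes. A total of 19,342 adults took part, of whom 11,484 were split into three groups that followed the three schedules. Participants in all intervention groups reported significantly less fatigue and negative mood and significantly more positive mood. After weighing feasibility and effectiveness, the researchers concluded that standing up and walking for 5 minutes every hour strikes the best balance between the two.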

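Micron signs five-year supply agreements with its major customers

On the company's latest earnings call, Micron CEO Sanjay Mehrotra disclosed that Micron has signed "strategic customer agreements" with 16 major customers. Most of the agreements run from 2026 through 2030: customers commit to buying a set volume of products at prices that fall within a band bounded by a floor and a ceiling. This means customers would be largely unaffected if memory prices rise further. According to Mehrotra, customers recognize that the shortage of memory and storage will take considerable time to ease. Micron expects supply to improve gradually in 2028 but cannot yet predict when memory supply will catch up with steadily growing demand. He said customers have agreed to make prepayments, which the company will use to expand its fabs.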

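Nighttime phone use linked to increased risk of eye disease

A research team at Shanghai First People's Hospital, affiliated with the Shanghai Jiao Tong University School of Medicine, used data from the UK Biobank and ultimately included 82,826 participants who had no eye disease at baseline. Each participant wore a wrist-worn accelerometer fitted with a high-resolution light sensor for 7 consecutive days to objectively record personal light exposure. The results show that average ambient light intensity above 1,000 lux during the evening (20:00 to 23:30) was associated with a significantly higher subsequent risk of degenerative eye disease: the risk of age-related macular degeneration rose by 31%, cataract by 18%, and primary open-angle glaucoma by 47%. The researchers also observed a significant time-dose-response relationship: the longer the exposure to very high intensity light (for example, above 2,250 lux), the higher the risk of overall age-related eye disease and of glaucoma.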

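Arma: Cold War Assault remaster goes open source

Bohemia Interactive has released the source code of the remastered Arma: Cold War Assault under the GPL v3.0 license, with the project hosted on GitHub. Arma: Cold War Assault was originally released in 2001 as Operation Flashpoint: Cold War Crisis. The game offered a 12.5 km × 12.5 km open-world map, and its realistic simulation of modern combined-arms field combat won it a large following among military game enthusiasts. Its openness and powerful scripting capabilities also gave rise to a large number of mods. The remaster's code has been modernized to C++20, builds with CMake and Clang, and supports platforms including Windows x64 and Linux x64. Bohemia Interactive says the game code is free software, but the name and trademarks may not be used freely, and game data such as models, textures, sound effects, missions, and voice acting has not been released and must be purchased separately.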

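Microsoft again extends free Windows 10 security updates by a year

Windows 10 reached end of support on October 14, 2025. Microsoft originally planned to stop providing free security updates after that date, but with a large number of users still on Windows 10, the software giant announced last year that it would provide free security updates for one more year. With a few months left before that extension expires, Microsoft has now extended free security updates by another year, and Windows 10 users do not need to do anything to receive them. The latest Extended Security Updates will expire on October 12, 2027. According to StatCounter, 26% of PCs still run Windows 10, and because Microsoft raised the hardware requirements for Windows 11, most Windows 10 PCs cannot upgrade to it.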

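Trump administration asks OpenAI to release new model in stages

Citing security concerns, the Trump administration has asked OpenAI to release its new GPT-5.6 model in stages. The Information reports that the model will initially be offered to a small group of partners, and that during the preview period the government will approve customer access on a case-by-case basis. According to the report, the request grew out of conversations between the Office of the National Cyber Director and the Office of Science and Technology Policy.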

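US Department of Defense reinstates vaccine mandate

After more than 200 recruits caught influenza at a US Air Force base, the US Army, Navy, and Air Force have reinstated vaccination requirements for recruits. Two months ago, Defense Secretary Pete Hegseth rescinded the flu vaccine mandate that had been in place for decades, arguing that it was unreasonable and that ending it would restore service members' "freedom." History has long shown, however, that enclosed environments such as barracks readily breed pathogens and that infectious disease has always been a major threat to military readiness. Lackland Air Force Base in Texas recently reported 222 confirmed flu cases and 4 hospitalizations. One recruit, Keon McDaniel, died, though it is not yet clear whether his death was related to the flu. Only about 40% of recruits at the base had been vaccinated, and the outbreak began in early June. A Pentagon spokesperson said the Pentagon has approved exemptions from Hegseth's voluntary flu vaccination policy for the Army, Navy, Air Force, National Security Agency, and Defense Health Agency.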

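LastPass discloses another user data breach

Password manager LastPass has disclosed another user data breach, this time caused by its third-party partner Klue, through which attackers accessed customer information and support case data. LastPass said the accessed data includes customer names, phone numbers, email addresses, and physical addresses, as well as support case data and sales-related data. The company said that after learning of the breach it immediately revoked employee access to Klue, rotated the exposed API tokens, and notified law enforcement. LastPass warned customers to be alert to phishing and social engineering attacks and published IP addresses and email domains associated with the attackers.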

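Apple officially raises product prices

A few days after Apple CEO Tim Cook signaled the move, Apple raised prices across its product line, with increases ranging from as little as $50 to more than $1,000. Even Apple can no longer absorb the high cost of memory and storage on its own. In a statement, Apple said it had never seen component prices rise this fast or this much, that it had so far done its best to shield customers from the increases, and that it had now reached the point where it has to start raising prices on some products, including today's increases for iPad and Mac. Apple added that it knows this is not good news and that it is sparing no effort to find solutions.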

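Ovaries may become an organ with immune functions after menopause

Reproductive specialists once believed that after menopause the ovaries become useless, much like the appendix. When examining the ovaries of women aged 50 to 75, researchers found that the organ's cells produce different proteins with age. To study age-related ovarian changes in more depth, the researchers turned to laboratory mice. Although mice do not show features specific to human menopause, such as a sharp drop in estrogen, their ovaries also stop functioning late in their 2-year lifespan. The researchers removed ovaries from young mice, mice at the end of their reproductive period, and "post-menopausal" mice. For each animal, they sequenced RNA from one ovary to measure gene expression and examined tissue from the other ovary under a microscope to identify distinct cell populations and to measure fibrosis, the buildup of stiffened tissue that occurs naturally with age. Analysis of the "post-menopausal" ovaries showed that levels of every type of immune cell were higher than is typical in young mice. In addition, genes encoding various pro-inflammatory compounds were more active in the ovaries of old mice, and these immune molecules may be secreted into the blood and carried to other parts of the body. It remains unclear whether the aging ovary genuinely plays a role in immune signaling or is merely an incidental gathering place for immune cells. The finding may help explain why women, despite living longer, tend to be in poorer health than men as they age: the post-menopausal ovary may secrete molecules that cause chronic inflammation during menopause.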

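Chinese scientists develop rice with reduced cadmium uptake

Cadmium is not essential for plant growth, but long-term intake through the soil-rice-food chain can cause serious health problems in humans, including kidney damage, cancer, and osteoporosis. OsNramp5 is the key transporter protein in rice responsible for moving cadmium from the roots to the stem, but it also transports metal ions essential for plant growth, such as manganese. Knocking out OsNramp5 effectively reduces cadmium transport but also causes deficiencies in other essential metals and sharply reduces yield. According to a study published in PNAS, the Institute of Genetics and Developmental Biology of the Chinese Academy of Sciences and collaborators used base-substitution editing to target OsNramp5, the core transporter gene responsible for cadmium uptake in rice. They created a superior artificial allelic variant and identified a new mechanism that specifically lowers cadmium uptake without affecting uptake of manganese and other key metal ions. This resolves the difficulty of combining low cadmium with high yield and offers a practical breeding approach for safely growing staple crops on cadmium-contaminated farmland.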

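OpenAI announces Jalapeño, an in-house AI chip dedicated to inference

OpenAI has announced Jalapeño, its first in-house chip, designed and manufactured in partnership with Broadcom and built specifically for AI inference. OpenAI did not disclose technical details, saying only that early tests show performance per watt significantly better than the most advanced comparable products available today. OpenAI and Broadcom formally announced their partnership last October, and OpenAI claims it used its own models to accelerate the chip's design. The in-house chip is intended to reduce reliance on Nvidia; Google and Amazon have also developed their own chips.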

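Wikipedia staff in the UK seek to form a union

Wikipedia staff in the UK are the first to seek union recognition. On Wednesday, June 24, UK employees of the Wikimedia Foundation wrote to management asking to be represented by United Tech and Allied Workers (UTAW), a branch of the Communication Workers Union (CWU). The employees called on the Wikimedia Foundation, as the body that effectively runs the global nonprofit, to honor its leadership's recent public commitment to protect employees' right to organize and form unions. More than a thousand Wikipedia volunteers and community members have signed a petition in support of the staff. The UK is the Wikimedia Foundation's second-largest source of employees after the United States.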

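Microsoft says 8GB of RAM is enough for Windows 11

Microsoft has updated its Surface buying guide to state that 8GB of RAM is enough for everyday Windows 11 use, such as browsing, video streaming, schoolwork, and productivity apps. It also says that 16GB or more is needed to unlock Copilot+ PC features. With memory in short supply and prices up several-fold, PC makers have had to start offering 8GB machines, but 8GB is a tight fit for Windows 11, and for the past two years Microsoft's messaging has been that 16GB is required for a good Windows 11 experience. As a major AI infrastructure provider, Microsoft is of course also one of the main culprits behind the current situation.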

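White House app auto-installs on government-issued phones and cannot be uninstalled

In May, the White House announced that its White House app would be automatically downloaded onto government-issued phones. The app cannot be uninstalled: even if a government employee tries to remove it, it is quickly reinstalled. Employees of the Department of Agriculture, the State Department, and the Department of Labor, speaking anonymously, said they were uneasy when the app appeared on their phones; some tried to delete it and failed. One Department of Agriculture employee said they deleted it as a test and it immediately reappeared. The app has a button that lets users send a text message to President Trump; tapping it opens a text box pre-filled with a message calling him the greatest president in history. The app's social section shows posts from the White House X account, posts from Trump's Truth Social account, and videos shared by official accounts on platforms such as TikTok and Instagram. The news section contains White House press releases, briefings, and fact sheets, along with selected articles from outlets including Fox, Breitbart, Reuters, and The New York Post that either praise the administration's policies or attack Democrats. One government employee called it blatant propaganda.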

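The person who gave misspelled words their squiggly underline

Every small detail of the graphical UIs we take for granted, however minor, was thought up by someone at some point. Take the little red squiggly line under misspelled words. The design has become such a commonplace part of every text field that nobody gives it a second thought. Yet someone did invent it, and according to veteran Microsoft programmer Raymond Chen, that person was Tony Krueger. In early versions of Word, the user had to invoke the spell checker manually, wait for the program to find every possibly misspelled word, and then decide how to handle each one as it was presented. Word then introduced automatic spell checking, which ran the check while the user was idle so that results were ready when the user clicked the spell check button. But automatic spell checking was still a blocking operation. Many users turned it off because it would suddenly decide that now was a good time to check the document just when they wanted to do something else, such as save and exit, forcing them to wait for it to finish. Tony made the spell checker less obtrusive so that it no longer interrupted the user's work. When it finds a problem, it does not launch a spell check session; it immediately draws a red squiggly line under the possibly misspelled word, and later versions drew a green squiggly line under possible grammar errors.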

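One third of LG and Samsung smart TV apps embed a residential proxy SDK

A scan of LG and Samsung smart TV apps found that 2,058 of 6,038 TV apps embed a residential proxy SDK, meaning they sell the user's home IP address for use as a proxy. Smart TVs make ideal proxy hosts: they are almost always plugged in and connected to home Wi-Fi, but unlike a PC, nobody checks them for suspicious background activity. Ads in a TV app may annoy users, whereas a silently running residential proxy brings the operator revenue while minimizing user dissatisfaction. Residential proxies do carry a risk of abuse, however: the Kimwolf botnet abused residential proxies to spread.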

09

APP STORE RANK
