TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0908
FRI, JUN 26, 2026
OrangeBot.AI intelligently curates and filters daily tech trends and news to save you time.
TODAY · FRI, JUN 26, 2026

The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK

AI News Summary

June 26, 2026

Tech Stocks Decline Amid AI Valuation Concerns

U.S. stock markets were mixed, with the tech-focused Nasdaq falling as investors grew concerned that the recent rally in artificial intelligence (AI) stocks was overextended. This sentiment caused a major sell-off in global markets, with South Korean shares dropping more than 8%.

Oil Prices Fall, Easing Interest Rate Fears

Oil prices continued to drop, nearing pre-war levels. The decline has eased concerns about inflation, causing U.S. Treasury yields to fall on expectations that the Federal Reserve may not need to raise interest rates as aggressively as previously anticipated.

JPMorgan Announces New Co-Presidents in Succession Plan

JPMorgan Chase named Doug Petno and Troy Rohrbaugh as co-presidents of the company. The appointments are a significant step in the bank's succession plan to eventually replace longtime Chief Executive Jamie Dimon.

Chipmaker Micron Warns of Continued Shortages

Technology company Micron announced that the global shortage of memory chips is expected to continue. The company noted that while the supply chain issues persist, its customers are actively developing workarounds.

Former Facebook Executive Sues Meta Over Book

Sarah Wynn-Williams, a former policy executive at Facebook, is suing the parent company, Meta. She claims the company is attempting to silence her from speaking about her bestselling book, "Careless People," which is critical of the social media giant.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS

Hacker News - June 26, 2026

Hacker News Feed: Highlighting key posts and discussions.

My Steam Machine Is a 50ft HDMI Cable

(blog.matthewbrunelle.com)

7376
Libre Barcode Project

(graphicore.github.io)

21335
Oxide computer 3D rack guided tour

(explorer.oxide.computer)

412170
You can't unit test for taste

(dev.karltryggvason.com)

292129
Half-Life 2 in a Browser

(hl2.slqnt.dev)

658265
OAuth for all

(blog.cloudflare.com)

371160
03

HUGGINGFACE

HuggingFace News - June 26, 2026

HuggingFace Feed: The latest AI models, datasets, and community updates.

DanceOPD: On-Policy Generative Field Distillation

Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I), local editing, and global editing. However, these capabilities are rarely naturally aligned and often conflict. For instance, editing tends to degrade T2I performance, while global and local editing interfere with each other. Consequently, effectively composing these capabilities has become a central challenge for image generation model training. To tackle this, we introduce DanceOPD, an on-policy generative field distillation framework for flow-matching models that routes each sample to one capability field, queries one low-noise student-induced state, and trains with a simple velocity MSE objective. With each capability source defined as a velocity field over the shared flow state space, the student learns from fields queried on its own rollout states to compose expert capabilities. This formulation also absorbs operator-defined fields such as classifier-free guidance. Comprehensive experiments on T2I, editing, realism-field absorption, and CFG absorption show that our approach improves multi-capability composition, strengthening target capabilities while preserving anchor generation quality. We believe this work establishes a practical route for generative field distillation in flow-matching models.

51
ViQ: Text-Aligned Visual Quantized Representations at Any Resolution

A unified representation for text and vision is a natural pursuit, as it enables simpler multimodal modeling and more efficient training. However, representing images as discrete signals in the same way as text inevitably introduces severe information loss. Existing work struggles to balance low-level details and high-level semantics in discrete representations: reconstruction-oriented representations often lack semantic information, whereas semantically stronger features typically suffer from severe loss of detail. We present ViQ, a Visual Quantized Representations framework, which is designed to balance semantics and details in discrete representations while supporting inputs at native resolutions, thereby enabling it to serve as a unified and general discrete representation for arbitrary visual inputs. Our approach structures quantization learning into two stages: text-aligned pre-training and feature discretization. With text-aligned pre-training, we enhance the visual encoder with semantic-rich supervision from the pretrained language model and enable it to process native-resolution visual inputs. During discretization, we propose a proximal representation learning strategy to progressively compact the feature space, along with a position-aware head-wise quantization mechanism that enables flexible processing of arbitrary resolutions. Extensive experiments on multimodal tasks demonstrate that ViQ achieves competitive performance compared to state-of-the-art multimodal vision encoders with continuous and high-dimensional visual features, while maintaining high precision in low-level reconstruction. We also show that multimodal training with visual quantized representations largely improves efficiency, yielding up to 20%-70% acceleration with different base LLMs and training recipes.

34
OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

Outcome-based reinforcement learning provides a stable optimization backbone for language agents, but its sparse trajectory-level rewards provide little guidance on which intermediate decisions should be reinforced or suppressed. On-policy self-distillation offers dense token-level supervision, yet existing skill-conditioned variants often rely on external skill memories or retrieved privileged context, which are costly to maintain and can be mismatched with the state distribution induced by the current policy in multi-turn interaction. We propose OPID (On-Policy Skill Distillation), a framework that extracts skill supervision directly from completed on-policy trajectories. OPID represents trajectory hindsight as hierarchical skills: episode-level skills capture global workflows or failure-avoidance rules, while step-level skills capture local decision knowledge at critical timesteps. A critical-first routing mechanism uses step-level skills when critical decisions are identified and falls back to episode-level skills as default guidance otherwise. The selected skill is injected into the interaction history, allowing the old policy to re-score the same sampled response under both original and skill-augmented contexts. The resulting log-probability shift yields a token-level self-distillation advantage, which is combined with the outcome advantage for policy optimization. OPID thus preserves RL as the primary training objective while introducing dense, distribution-matched hindsight supervision. Experiments on ALFWorld, WebShop and Search-based QA demonstrate that OPID generally improves agent performance, sample efficiency, and robustness over outcome-only RL and existing skill-distillation baselines. Our code is available at https://github.com/jinyangwu/OPID/tree/main.
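
The log-probability shift mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `token_advantages`, the additive combination, and the weight `beta` are all assumptions made for illustration, since the abstract only says the two advantages are "combined".

```python
from typing import List


def token_advantages(
    logp_original: List[float],
    logp_skill: List[float],
    outcome_adv: float,
    beta: float = 0.5,  # hypothetical mixing weight, not taken from the paper
) -> List[float]:
    """Combine a trajectory-level outcome advantage with a per-token shift.

    logp_original[t]: log-prob of sampled token t under the original context.
    logp_skill[t]:    log-prob of the same token with the skill injected.
    """
    if len(logp_original) != len(logp_skill):
        raise ValueError("both scorings must cover the same sampled tokens")
    # Token-level signal: how much the skill context raises or lowers
    # the probability of each token the policy actually sampled.
    shifts = [s - o for o, s in zip(logp_original, logp_skill)]
    # Assumed combination: outcome advantage plus a weighted shift.
    return [outcome_adv + beta * shift for shift in shifts]


advantages = token_advantages([-1.0, -2.0], [-0.5, -2.5], outcome_adv=1.0)
print(advantages)
```

Tokens that become more likely under the skill-augmented context receive a larger advantage than the outcome signal alone would give, and tokens that become less likely receive a smaller one.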

31
Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation

While text-to-image (T2I) models have achieved remarkable progress, they struggle with real-world requests that are often underspecified, implicit, or dependent on up-to-date knowledge. We identify this challenge as the Context Gap: the mismatch between the user context and the sufficient generation context for T2I models. To bridge this gap, we propose Qwen-Image-Agent, a unified agentic framework that integrates plan, reason, search, memory and feedback in a context-centric manner. Qwen-Image-Agent treats user input as partial context and progressively constructs the generation context through Context-Aware Planning and Context Grounding. Specifically, Context-Aware Planning identifies missing context and plans how it should be acquired and used, while Context Grounding gathers this context from reason, search, memory, and feedback. To evaluate agentic image generation, we further introduce Image Agent Bench (IA-Bench), a benchmark covering four core image agent capabilities: Plan, Reason, Search, and Memory. Experiments on IA-Bench, Mindbench and WISE-Verified show that Qwen-Image-Agent outperforms strong baselines and achieves state-of-the-art performance.

30
The Verification Horizon: No Silver Bullet for Coding Agent Rewards

A classical intuition holds that verifying a solution is easier than producing one. For today's coding agents, this intuition is being inverted: as foundation models develop stronger reasoning capabilities and engineering harnesses grow more sophisticated, generating complex candidate solutions is no longer difficult -- reliably verifying them has become the harder problem. Every verifier we can build is only a proxy for human intent, never the intent itself. This makes verification subject to a twofold difficulty: first, intent is underspecified by nature, making it inherently hard to faithfully check whether it has been fulfilled; second, during model training, optimization widens the gap between proxy and intent -- manifesting as reward hacking or signal saturation. To address this, we characterize the quality of verification signals along three dimensions -- scalability, faithfulness, and robustness -- and argue that achieving all three simultaneously is the central challenge. We further study four reward constructions: a test verifier for general coding tasks, a rubric verifier for frontend tasks, the user as verifier for real-world agent tasks, and an automated agent verifier for long-horizon tasks. Across different task types and policy capability levels, we conduct in-depth analysis and experiments on the core challenges of reward design and how to more effectively leverage reward signals. Experiments show that targeted verification design can effectively suppress reward hacking, improve task completion quality, and achieve significant gains across multiple internal and public benchmarks. These experiences collectively point to a core observation: no fixed reward function can remain effective as policy capability continues to grow; and verification must co-evolve with the generator.

22
JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

Speculative decoding (SD) accelerates autoregressive Large Language Models (LLMs) by drafting multiple tokens and verifying them in parallel, but it faces a scaling limitation: increasing the draft budget improves speed only when acceptance remains high and drafting overhead stays low. This ceiling has been difficult to break because prior head-based SD methods face a causality-efficiency dilemma. Autoregressive drafters produce path-conditioned candidates that are effective for tree speculative decoding with higher acceptance length, but their drafting cost grows with tree depth. Bidirectional block-diffusion drafters generate all positions in one pass, but their branch-agnostic marginals can form individually plausible yet mutually inconsistent trees, wasting budget and reducing acceptance. We propose JetSpec, a head-based SD framework that combines one-forward drafting efficiency with branch-wise causal conditioning. JetSpec trains a causal parallel draft head over fused hidden states from the frozen target model, producing candidate trees whose scores align with the target model's autoregressive factorization. This enables JetSpec to convert larger draft budgets into longer accepted prefixes and higher end-to-end speedup. Across math, coding, and chat benchmarks on dense and MoE Qwen3 models, JetSpec consistently outperforms bidirectional-head and tree-based SD baselines. On H100 GPUs, JetSpec achieves up to 9.64x speedup on MATH-500 and 4.58x on open-ended conversational workloads, with further latency gains demonstrated through vLLM integration under realistic serving loads. Our code and models are available at https://github.com/hao-ai-lab/JetSpec.
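
For readers new to the draft-and-verify idea, here is a minimal sketch of greedy verification for a single linear draft. It is a generic illustration of speculative decoding, not JetSpec's parallel tree drafting, and the name `accept_prefix` is hypothetical. It assumes `target_tokens[i]` is the target model's greedy choice at position `i` given the context plus the first `i` drafted tokens, all obtained in one parallel forward pass.

```python
from typing import List


def accept_prefix(draft_tokens: List[int], target_tokens: List[int]) -> List[int]:
    """Return the tokens emitted by one greedy draft-and-verify step."""
    emitted: List[int] = []
    for drafted, verified in zip(draft_tokens, target_tokens):
        if drafted != verified:
            # First mismatch: keep the target's token and discard the rest.
            emitted.append(verified)
            return emitted
        emitted.append(drafted)
    # Every drafted token matched; the target's extra prediction, if
    # present, is a free bonus token.
    if len(target_tokens) > len(draft_tokens):
        emitted.append(target_tokens[len(draft_tokens)])
    return emitted


print(accept_prefix([1, 2, 3], [1, 2, 4, 9]))
print(accept_prefix([1, 2], [1, 2, 7]))
```

The speedup comes from the length of the emitted list: each verification pass costs roughly one target forward, so longer accepted prefixes mean fewer passes per generated token.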

19
Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It

Tool use enables large language models (LLMs) to perform complex tasks, and recent agentic reinforcement learning (RL) methods show promise for enhancing model capabilities. However, RL alone often leads to instability or limited gains in tool-use tasks. In our experiments, some models exhibit catastrophic collapse, where performance abruptly drops and tool-invocation structures fail. The analysis reveals that these failures stem from unexpected probability spikes in specific control tokens, disrupting structured execution, yet the underlying tool-use capability remains intact, merely obscured by specific formats. To address this, we systematically investigate a diverse set of supervisory signals, including off-policy supervision, hint-based guidance, erroneous example supervision, and others, applied under both synchronous and interleaved training schemes. We find that interleaving supervised fine-tuning (SFT) with RL substantially improves stability, but exhibits degraded performance under format and content out-of-distribution (OOD) evaluation. We also analyze the impact of learning rates and generalization across settings. These results highlight the importance of understanding RL failures and demonstrate how diverse supervisory signals can guide exploratory learning, enabling robust training of LLMs for complex, multi-step tool-use tasks. Our code is available at https://github.com/hypasd-art/Tool-RL-Box.

10
Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments

As agentic systems continue to evolve and are widely deployed in real-world scenarios, there is a growing demand to faithfully evaluate their capabilities. However, current benchmarks are typically built on popular applications with relatively simple tasks and focus on a narrow set of capabilities while overlooking broader dimensions, resulting in saturated performance on modern agents and failing to probe their limitations. To this end, we introduce GauntletBench, a web-based benchmark for evaluating agent generalisation in challenging scenarios, focusing on three underexplored capabilities (temporal perception, graphical understanding, and 3D reasoning), across five less-covered professional applications (Video Editor, Workflow Builder, 3D Modeller, Flight Analyser, and Circuit Designer), each with 20 vision-intensive tasks (100 in total). Our benchmark provides a modular pipeline that comprises an environment compatible with both open- and closed-source agent frameworks, a controlled web-based application, a well-structured task suite, and an automated evaluation engine with diverse metrics. Contrary to widespread expectations, our empirical results reveal that frontier agentic systems remain far from achieving human-level performance. Even the state-of-the-art agent achieves only a 19.1% success rate on our GauntletBench, highlighting the limitations in these overlooked capabilities and generalisation. By comparison, non-expert human annotators achieve over 80% success on our challenging yet feasible tasks, revealing the substantial gap between current agent capabilities and those required for complex real-world scenarios.

9
GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks, initial states, verifiers, and permitted actions. We introduce a matched execution-layer benchmark of 440 desktop tasks across 18 applications and 12 workflow categories, where screen-only GUI agents and skill-mediated CLI agents receive identical goals, states, and final-state verifiers while being restricted to modality-native actions. In this controlled setting, the strongest GUI agent reaches a 59.1% full pass rate, outperforming the strongest original-skill CLI agent at 48.2%; however, verifier-guided skill augmentation raises CLI success to 69.3%, showing that much of the CLI deficit comes from incomplete skill coverage rather than model capability alone. These results suggest that GUI and CLI expose different execution bottlenecks: GUI agents are limited by reliable grounded interaction over long-horizon workflows, whereas CLI agents are limited by the coverage and scalability of their skill interfaces.

6
Confidence-Aware Tool Orchestration for Robust Video Understanding

Video reasoning language models implicitly assume that every input frame is equally reliable. This leads to what we term the Blind Trust Problem: under realistic perturbations such as motion blur, glare, or occlusion, frontier video reasoning models can suffer 15-30%p accuracy drops on real-world embodied benchmarks, while remaining unaware that their visual evidence has been degraded. To address this challenge, we propose Robust-TO, an agentic video understanding framework that explicitly integrates per-frame trustworthiness into every stage of reasoning. Robust-TO organizes heterogeneous visual perception tools under a unified evidence interface. Each tool receives a sub-query derived from the original question and a set of trustworthy frames selected by the reliability-relevance score. It returns evidence in a shared format: a concrete prediction (e.g., a bounding box, motion trajectory, recognized text, or action label), temporal grounding, and a calibrated reliability score. During reasoning, these calibrated scores guide evidence weighting in a three-tier synthesis process (high/medium/low) and define a confidence-cost GRPO reward that jointly optimizes correctness, evidence reliability, and efficiency. On two video reasoning benchmarks spanning eight tasks, Robust-TO achieves 56.4% average accuracy on clean inputs, surpassing the strongest open-source baseline by 10.6%p and outperforming Gemini-2.5-Pro (46.2%). Under five realistic corruption types, Robust-TO maintains 54.3% average accuracy, 5.8%p above the strongest open-source baseline, while exhibiting the smallest clean-to-corrupted accuracy drop among all compared methods.

6
In-Context World Modeling for Robotic Control

Modern Vision-Language-Action (VLA) models often fail to generalize to novel setups, such as altered camera viewpoints or robot morphologies, because they are typically conditioned only on current observations and language instructions. By ignoring the underlying system configuration as a variable, these models implicitly assume a fixed execution context encountered during training, necessitating data-intensive fine-tuning for any new environment. In this work, we introduce In-Context World Modeling (ICWM), a framework that treats system identification as an in-context adaptation problem. ICWM enables robot policies to autonomously infer essential system variables from a short history of self-generated, task-agnostic interactions. Unlike traditional In-Context Learning that uses demonstrations to specify what task to perform, ICWM leverages the context window to understand how the system operates. By processing these interactions before task execution, the model implicitly captures the world dynamics of the current system, enabling adaptation to novel configurations without parameter updates. Extensive experiments in simulation and on real-world robot platforms demonstrate that ICWM significantly outperforms standard VLA baselines on novel camera viewpoints.

5
CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies

As LLM agents become capable of increasingly long-horizon tasks, evaluating their performance in economic systems is becoming increasingly important. Unlike existing benchmarks that primarily evaluate a single agent interacting with a passive environment, economic systems are inherently multi-agent, requiring autonomous agents to communicate, negotiate, and transact while pursuing their own objectives over extended periods. We introduce CoffeeBench, a benchmark for evaluating LLM agents in a long-horizon multi-agent economy composed of heterogeneous firms. In CoffeeBench, two farmers, two roasters, and two retailers autonomously operate their businesses over a 90-day simulation, each seeking to maximize cumulative net income through communication and transactions while managing cash, inventory, and pricing. The evaluated model controls one coffee roaster, while the remaining firms are controlled by fixed reference agents. Across several recent open-weight and proprietary LLMs, all models outperform a passive baseline that takes no actions, with most achieving positive net income. Analysis of agent behavior reveals substantial differences in long-horizon economic interaction: higher-performing models communicate more actively with other firms, whereas Claude Haiku 4.5 exhibits an idle-drift failure mode, repeatedly choosing inaction despite producing coherent assessments and plans. We release our code and agent trajectories to support future research.

4
Fast LeWorldModel

Joint-Embedding Predictive Architectures (JEPAs), including recent LeWorldModel (LeWM), have become a promising foundation for reconstruction-free visual world models. For visual planning, however, LeWM evaluates candidate action sequences by repeatedly applying a local one-step latent transition model. This autoregressive rollout makes planning computationally expensive and exposes the predicted trajectory to accumulated latent errors as the horizon grows. We propose Fast LeWorldModel (Fast-LeWM), a fast latent world model that replaces repeated local rollout with action-prefix prediction. Given the current latent and a candidate action sequence, Fast-LeWM encodes its prefixes and predicts the future latents reached after executing those prefixes in parallel. By making action prefixes the basic prediction unit, Fast-LeWM directly models action effects accumulated to different extents over multiple horizons. This prefix-level supervision forces the model to learn how states continuously evolve under different action prefixes, rather than only fitting one-step state transitions. During planning, the predictor can use the last prefix token from the encoded action sequence to evaluate the corresponding future latent without explicitly rolling through each intermediate imagined state. Across multiple tasks, Fast-LeWM improves average success over LeWM while substantially reducing planning time, achieving lower open-loop latent loss whose growth becomes significantly slower as the rollout horizon increases.

3
Discretizing Reward Models

Despite their widespread use, the role of reward models in shaping reinforcement learning is poorly understood. Reward models offer a tempting promise: they automatically estimate response quality in the absence of verifiers or human judges. Unlike "verifiable rewards" which typically produce binary scores, reward models typically produce continuous scores, allowing them to be sensitive to fine-grained differences in responses. However, we show this apparent strength is a serious weakness: many popular reward models are oversensitive, assigning different scores to equally good responses. Theoretically, we show that seemingly perfect reward models can be highly oversensitive; empirically, this oversensitivity can lead to bad policies. In place of existing notions of "reward model accuracy," we propose evaluating reward models using distinct measures of "discriminative ability" and "specificity" (the complement of oversensitivity). As a solution, we describe a training-free algorithm that uses Monte Carlo dropout on any neural reward model to produce discrete reward clusters. Theoretically, we prove there exist discretizations that reduce oversensitivity at minimal expense of discriminative ability; empirically we show, in both controlled and natural RL settings, that discretizing rewards leads to less reward hacking and better policies than training on the original rewards.

2
PhysiFormer: Learning to Simulate Mechanics in World Space

We present PhysiFormer, a diffusion transformer for physically-plausible 3D object motion. Unlike video world models that operate in view-dependent pixel space, PhysiFormer represents objects as 3D meshes expressed in world coordinates. Given the initial vertex positions and velocities, as well as object material type, rigid or elastic, the model samples future vertex trajectories. While related neural physics approaches build on ad-hoc latent spaces or explicitly enforce rigidity and causality, PhysiFormer shows that excellent results can be obtained without any such inductive biases, by casting vertex trajectory prediction as a single denoising diffusion process directly in world coordinates. The probabilistic formulation captures uncertainty in the learned dynamics, enabling diverse plausible futures from initial conditions, making this framework potentially useful for applications with unobserved uncertainty. The model features attention factorised over time, space, and objects for efficiency, enabling permutation-invariant multi-object reasoning without needing explicit object encoding. Trained on over 100k simulated trajectories, PhysiFormer generates rigid and elastic mechanics, and generalises to mixed-material settings, unseen real-world geometries, and larger object counts. It substantially outperforms autoregressive baselines in trajectory accuracy, rigidity preservation, and momentum-based physical consistency. Our results position coordinate-space diffusion as a promising step toward view-invariant, geometry-aware world modelling for robotics, graphics, and physical design. Visualisations, code, and models are available at https://yimingc9.github.io/physiformer.

2
When Does Combining Language Models Help? A Co-Failure Ceiling on Routing, Voting, and Mixture-of-Agents Across 67 Frontier Models

Multi-model LLM systems such as routing, voting, cascades, fusion, and mixture-of-agents are used to beat single-model accuracy. We show that their gain is capped by a quantity the field rarely reports. For any policy whose output is one member model answer, accuracy cannot exceed one minus beta, where beta is the rate at which every model is wrong on the same query. In contrast, the usual diagnostic, average pairwise error correlation rho, cannot identify beta: error laws with identical marginals and pairwise correlations can have different all-wrong rates. A Clopper-Pearson bound on beta gives a finite-sample certificate on the largest gain any router, vote, or cascade could deliver before training a router. Across 67 models from 21 providers, a tetrachoric-calibrated single-factor model still underprices the all-wrong tail: on open-ended mathematics, observed beta is 0.052 versus 0.023 under the full 67-model Gaussian copula, about 2.5 times underpricing, with 90 percent CI 1.7 to 3.4 and k equals 17. The effect recurs on execution-graded code, where beta is 0.079. Re-asking the same GPQA-Diamond questions in free-response rather than multiple-choice form reopens the tail, with beta 0.127 and a five-judge panel with kappa 0.73 to 0.92, locating co-failure in answer format rather than subject. At matched quality, low-rho heterogeneous ensembles beat high-rho Self-MoA, but on checkable tasks in our pool, combining models rarely beats the single best model without a strong query-level routing signal. Gains come from models failing on different questions, not from adding more models.
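
The ceiling in this abstract is simple to compute from a per-query correctness table. The sketch below uses made-up toy data and hypothetical function names; it is not the paper's code. It also assumes one particular reading of the certificate: a one-sided exact (Clopper-Pearson) lower bound on beta, which turns into an upper bound on the accuracy any select-one-answer policy could reach.

```python
from math import comb
from typing import List, Tuple


def all_wrong_rate(correct: List[List[bool]]) -> Tuple[int, int, float]:
    """correct[q][m] is True when model m answered query q correctly."""
    n = len(correct)
    k = sum(1 for row in correct if not any(row))  # queries every model missed
    return k, n, k / n


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson_lower(k: int, n: int, alpha: float = 0.05) -> float:
    """One-sided exact lower confidence bound on a binomial proportion."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on P(X >= k | p) = alpha
        mid = (lo + hi) / 2
        if 1 - binom_cdf(k - 1, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Toy table: 4 queries, 3 models (illustrative values only).
toy = [
    [True, False, False],
    [False, False, False],  # every model is wrong here
    [False, True, True],
    [True, True, False],
]
k, n, beta = all_wrong_rate(toy)
print("observed beta:", beta)
print("accuracy ceiling 1 - beta:", 1 - beta)
print("certified ceiling:", 1 - clopper_pearson_lower(k, n))
```

On the toy table, no router or vote that returns one member model's answer can exceed 75% accuracy, because one of the four queries defeats every model.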

1
Hallucination in World Models is Predictable and Preventable

Modern generative world models render increasingly realistic action-controllable futures, yet they frequently hallucinate: rollouts remain visually fluent while drifting from the ground-truth dynamics. We hypothesize that hallucination concentrates in low-coverage regions of the state-action space, where lightweight data-centric signals can both detect it and guide mitigation. To test this, we introduce MMBench2, a 427-hour, 210-task dataset for visual world modeling with ground-truth actions, rewards, and live simulators, and train a 350M-parameter world model on it. We identify three distinct hallucination modes: perceptual, action-marginalized, and scene-diverging -- each anchored to a different stage of the pipeline, and develop three signals that accurately predict where the model will fail. To close coverage gaps at training time, we develop a coverage-aware sampling technique; to close them online, our hallucination predictors serve as curiosity rewards for targeted data collection, yielding a data-efficient finetuning recipe that adapts the pretrained world model to entirely unseen environments with as few as 50 real environment trajectories. Overall, our findings reveal that hallucination in world models is inherently a data coverage issue, and that the same signals used to detect it can also be used for mitigation. An interactive web version of our paper is available at https://www.nicklashansen.com/mmbench2

1
OpenBioRQ: Unsolved Biomedical Research Questions for Agents

A working citation looks like proof -- but the fact that a link resolves does not mean the cited paper supports the claim. I find that current agentic models rarely fabricate citations (over 99% resolve), yet roughly 15.9% link to the wrong paper. Existing benchmarks miss this failure mode: when a question has a fixed answer key, a model can reproduce the expected source from that key rather than independently verifying that the source supports the claim. I introduce OpenBioRQ, a retrieval-grounded agentic benchmark of 12,553 unsolved biomedical research questions across 12 domains that treats open questions as a faithfulness-and-abstention probe. To my knowledge, this is the first biomedical benchmark to combine an agentic setting -- where the model must issue multiple tool calls -- with unsolved questions that have no answer key. Openness is verified against real follow-up evidence rather than a model's parametric knowledge. Difficulty is empirical: I anchor it on questions that three open-weight reference models fail to answer, rather than on subjective hardness labels. On this hardest subset, held-out models from the same lineage as the difficulty anchors solve only ~17%, while three independent frontier agents (Gemini-3-Pro, Opus-4.7, GPT-5.5) span a wide 29-60% range. The benchmark is thus hard, non-saturating (the best agent still leaves ~33-40% unsolved), and discriminating across capability tiers. Beyond difficulty, I observe agentic collapse on the hardest questions, where agents stop using their tools. For the most collapse-prone model, blocking tool access entirely barely changes its score -- so tools stop paying off exactly where they are needed most. A frozen per-question checklist raises inter-judge agreement from Spearman 0.35 to 0.82.

1
COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami

While generative AI has achieved remarkable success in solving problems with verifiable solutions, generating physical art that satisfies both strict geometric constraints and subjective visual aesthetics remains a challenge. This paper presents an approach to tackle these difficulties in the domain of computational origami, a mathematically rigid environment that grounds artistic design within the equations of flat foldability. We present COrigami, an end-to-end AI-driven pipeline that assists the design cycle by generating crease patterns from natural language. Our pipeline involves generating a semantic stick figure, computing a base packing, solving for a flat-foldable crease pattern, shaping the flat-folded crease pattern, and refining the generated model using reinforcement learning driven by an autonomous aesthetic evaluation loop. Our system acts as a highly effective collaborative assistant, generating structural starting points that human artists can further expand and shape. By integrating algorithmic optimisation with autonomous aesthetic critique, this work demonstrates how AI systems can satisfy multi-objective physical constraints to enable reliable, mathematically grounded co-creativity.

1
How Post-Training Shapes Biological Reasoning Models

Scientific reasoning models for biology combine language models with foundation models trained on multimodal biological data, including DNA, RNA, and proteins. These models are built through post-training, yet how each stage shapes reasoning and generalization remains poorly understood. We study when post-training improves performance and when it induces over-specialization. Across genomics, transcriptomics, and proteins, we train and evaluate more than 100 biological reasoning models under controlled variation in backbone, continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL), measuring both in-domain (ID) and out-of-domain (OOD) performance. We find that each post-training stage reshapes generalization in a distinct way rather than contributing uniform gains. CPT improves downstream performance by aligning models with biological language. SFT consistently increases ID performance but causes OOD performance to peak early and decline as models fit the training distribution. RL, when applied to strong SFT checkpoints with aligned rewards, improves OOD performance and partially recovers generalization. These results show that biological reasoning does not improve monotonically with additional supervision or compute. Instead, performance depends on how training stages are composed. Under fixed post-training budgets, the strongest ID-OOD trade-off comes from brief SFT, larger RL allocations, and asymmetric adaptation capacity across stages.

0
05

PRODUCT HUNT

Product Hunt - June 26, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

Group Subscriptions by beehiiv

Sell subscriptions to teams, companies, and organizations.

0
ModuleX

AI workspace that’s already connected to everything

0
Atlas

Every AI tool you use should know how your company works

0
Animdock Motion Templates in the Browser

Create trend motions in your browser!

0
Gemini Spark

Your 24/7 personal AI agent

0
SquidHub

Multiplayer mode for humans and AI

0
Cewsco

All-in-one AI assistant — chat, images, voice & market data

0
Aurora Notch

A private notch workspace for every Mac

0
DMV by Agent Community

A community-governed namespace for AI agents

0
LockIn MCP

Let AI block distractions for you when you need to lock in

0
Basedash for Excel

Turn any Excel file into a live dashboard

0
Sleek Analytics

See who's on your site. Right now.

0
note.md

your notes and research documentation now a local LLM Memory

0
Agent Arena

The first public arena for AI agents

0
AI Slide Editor by CubeOne

The editor PowerPoint should've shipped

0
SayCraft

Build a web app by talking through a meeting

0
Brain² by ClickUp

One AI that knows your entire company and acts on it

0
Nashra

Turn followers into clients.

0
Postproxy - Engagement API

Publish, reply, and analyze social media via API

0
Milestones

Native project planning app, now on Mac & with an MCP server

0
Papermark Agents

Let AI agents run your next deal, fundraise or data room

0
Tough Tongue AI for Sales

Live AI teammate for every tough sales conversation

0
Paybond CLI

Safe agent spend from the terminal

0
Blop

Describe your app and Blop tests it and repairs broken tests

0
Sidegent

Learn to build AI agents by actually building them

0
Signspell

Real-time ASL alphabet recognition in py, pip install and go

0
BrowserBash

CLI that turns plain-English into real browser tests

0
Samepage Signals

Your second brain for product management

0
Zaro

Build agents & apps on top of your context with one prompt.

0
Oxlo.ai

Scale across AI models without scaling your bill

0
BrowserAct

Web browser automation for AI agents

0
Figma Motion

Your Figma canvas now has a timeline

0
Heron

Wireshark for AI Agents: passive eBPF observability

0
Polygraph

Let AI agents see cross repo and maintain session memory.

0
Dub Ninja

Live autonomous AI DJ that digs, mixes & explains 24/7

0
MeetPoint

Find the city where everyone's flights are cheapest

0
Grass 2.0

The always-on computer for your coding agents

0
VTT for Mac

Voice-to-text for macOS with a fully on-device option

0
SendTidings

Turn your analytics into beautiful monthly email reports

0
QuickMaker

State of the art AI models in Blender under one subscription

0
Genspark Design

Generate UI prototypes, videos, and posters with AI

0
Ruby

Ask better questions, live on every call

0
Prospector by Synter

Your outbound agent, right inside Slack

0
React UI Kit V7

All the chat components you need. None of the complexity

0
Tencent EdgeOne Makers

Ship AI agents like web apps, in minutes.

0
Stripe.Directory

New way for you & agents to search for businesses on Stripe

0
Buy by Agentcard

Order DoorDash from Claude

0
Propane

Automatic customer context for product teams and agents

0
Mindstone Rebel

AI workspace for agents that know your work and ask first

0
Swimio

AI swim coach with Apple Watch tracking & smart workouts

0
06

TECHMEME

Techmeme - June 26, 2026

Techmeme Digest: Major tech headlines and industry conversations.

Sources: Donald Trump Jr. got ~$300K in Kalshi equity after becoming a strategic adviser in 2025, when Kalshi was valued at ~$2B; now it's worth $22B+ (George Steer/Financial Times)
Source: Techmeme · Published: Jun 26, 2026

George Steer / Financial Times: Privately owned company's valuation has soared as US administration has adopted a light-touch approach to the sector

Q&A with Tim Sweeney on his vision for a cross-platform gaming social system, Unreal Engine 6, AI's PR challenges, the state of AAA game development, and more (Tim Clark/PC Gamer)
Source: Techmeme · Published: Jun 26, 2026

Tim Clark / PC Gamer: The Epic Games CEO discusses his vision for “Team Open,” his objections to Steam's AI disclosure requirement, and the huge problems facing AAA game development.

Italy is investigating Microsoft 365's price hike, saying Microsoft failed to inform users that AI tools like Copilot were being integrated into the service (Giulia Segreti/Reuters)
Source: Techmeme · Published: Jun 26, 2026

Giulia Segreti / Reuters: Italy's antitrust authority said on Friday it had opened an investigation into Microsoft (MSFT.O) over alleged unfair commercial practices linked …

A study of 408 teens in Australia finds 80%+ were still using social media three months after a ban came into force, citing inadequate age verification checks (Anna Bawden/The Guardian)
Source: Techmeme · Published: Jun 26, 2026

Anna Bawden / The Guardian: Experts say law not enough to stop children accessing harmful content online and more ‘convincing strategy is required’

Binance tells EU customers that it will stop providing services for them from July 1, after Greece rejected its application for a bloc-wide license last week (Financial Times)
Source: Techmeme · Published: Jun 26, 2026

Financial Times: World's biggest crypto exchange tells customers how to withdraw their money as MiCA rules set to come into force

Despite the "DeepMind mafia" pulling billions into London AI startups, none of Demis Hassabis' former lieutenants is building a homegrown UK frontier AI model (Tim Bradshaw/Financial Times)
Source: Techmeme · Published: Jun 26, 2026

Tim Bradshaw / Financial Times: The tech sector is buzzing in Britain. But can it ever be more than a US outpost? Two decades ago, King's Cross was central London's most neglected district.

Shares of Japanese NAND flash maker Kioxia slid 12% on Friday after a report that OpenAI was considering delaying its IPO sparked a selloff in AI-related shares (Sam Nussey/Reuters)
Source: Techmeme · Published: Jun 26, 2026

Sam Nussey / Reuters: Shares of Japanese chipmaker Kioxia (285A.T) slid 12% on Friday after a report that ChatGPT maker OpenAI was considering delaying …

Sources: SpaceX COO Gwynne Shotwell told investors during an IPO roadshow SpaceX may launch a Starlink mobile product and build its own terrestrial US network (Financial Times)
Source: Techmeme · Published: Jun 26, 2026

Financial Times: Move would test whether group can turn sky-high ambition into a mass-market phone business. Elon Musk's SpaceX has told investors …

Swiss watchmaker Swatch seeks $170M from Samsung in a London trial over 26 digital watch face apps that allegedly cloned luxury brand designs, including Omega (Alistair Gray/Financial Times)
Source: Techmeme · Published: Jun 26, 2026

Alistair Gray / Financial Times: Swiss watchmaker accuses technology group of ‘large-scale appropriation’ of luxury designs in London lawsuit

OpenAI says 97.9% of its employees are now using Codex, up from ~40% in August 2025; non-developer usage of Codex has risen 137x for individual users (Thomas Claburn/The Register)
Source: Techmeme · Published: Jun 26, 2026

Thomas Claburn / The Register: Codex, it's not just for developers, really. A company can learn a lot about the market by looking at its own employees.

Warp, a startup using AI to automate payroll compliance and employee management, raised a $60M Series B led by Battery, bringing its total funding to $85M (Lucinda Shen/Axios)
Source: Techmeme · Published: Jun 26, 2026

Lucinda Shen / Axios: Warp, a startup using AI to automate payroll compliance and employee management, raised $60 million in Series B funding, CEO Ayush Sharma tells Axios exclusively.

Utah-based ecommerce tech company Redo raised an $81M Series B at a $1.25B valuation led by Smash Capital, with participation from Pelion and Cervin (David Politis/Utah Money Watch)
Source: Techmeme · Published: Jun 26, 2026

David Politis / Utah Money Watch: The Utah eCommerce-technology company started with returns. Now it is raising growth capital to build a broader AI-powered platform …

SoftBank's stock fell 12.53% after a report that OpenAI may delay its IPO until 2027; expectations of a windfall from OpenAI's debut had buoyed SoftBank's stock (Aya Wagatsuma/Bloomberg)
Source: Techmeme · Published: Jun 26, 2026

Aya Wagatsuma / Bloomberg: SoftBank Group Corp.'s stock fell as much as 13% on concerns that OpenAI may hold off on an initial public offering until next year …

California launches a tool to serve as an "early warning system" for widespread AI-driven job loss, linking AI exposure with unemployment insurance claims (Jo Constantz/Bloomberg)
Source: Techmeme · Published: Jun 26, 2026

Jo Constantz / Bloomberg: Politicians like California Governor Gavin Newsom are under pressure to appear proactive in the face of the technology's threat to the labor market

Upside, which provides social workers with AI-powered housing insights, raised a $20M Series A led by Aquiline with Flare Capital, 645, and others participating (Cailey Gleeson/Fierce Healthcare)
Source: Techmeme · Published: Jun 26, 2026

Cailey Gleeson / Fierce Healthcare: Upside, a scalable housing stability platform, banked a $20 million series …

07

STARTUP ARCHIVE

Startup News - June 26, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction.”

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

“Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot.”

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice.”

Paul Graham on why starting with a “small, intense fire” is the key to startup growth
Source: Startup · Published: Mar 20, 2026

"You have to know who those first users are and how you're going to get them."

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth.”

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?”

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations.”

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over.”

Spencer Rascoff: “I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

“If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive.”

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right.”

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT

Solidot News - June 26, 2026

Solidot Feed: Highlighting essential tech & open-source news.

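Evening phone scrolling linked to higher risk of eye disease

A research team at Shanghai First People's Hospital, affiliated with Shanghai Jiao Tong University School of Medicine, used UK Biobank data and ultimately included 82,826 participants who had no eye disease at baseline. Each participant wore a wrist-worn accelerometer fitted with a high-resolution light sensor for 7 consecutive days, giving an objective record of personal light exposure. The results show that an average ambient light level above 1,000 lux during the evening (20:00 to 23:30) was associated with a significantly higher subsequent risk of degenerative eye disease: the risk of age-related macular degeneration rose by 31%, cataract by 18%, and primary open-angle glaucoma by 47%. The researchers also observed a significant time-dose-response relationship: the longer the exposure to very intense light (for example, above 2,250 lux), the higher the risk of age-related eye disease overall and of glaucoma.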

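Remastered Arma: Cold War Assault released as open source

Bohemia Interactive has published the source code of the remastered Arma: Cold War Assault under the GPL v3.0 license, with the project hosted on GitHub. Arma: Cold War Assault was released in 2001 under the name Operation Flashpoint: Cold War Crisis. The game offered a 12.5 km × 12.5 km open-world map, and its realistic simulation of modern combined-arms field warfare won it a large following among military game enthusiasts. Its openness and powerful scripting capabilities also produced a large number of mods. The remastered code has been modernized to C++20, builds with CMake and Clang, and supports platforms including Windows x64 and Linux x64. Bohemia Interactive says the game code is free software, but the name and trademarks cannot be used freely, and game data such as models, textures, sound effects, missions, and voice acting has not been released and must be purchased separately.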

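Microsoft extends free Windows 10 security updates by another year

Windows 10 reached end of support on October 14, 2025, after which Microsoft originally planned to stop providing free security updates. Because Windows 10 still has a large user base, the company announced last year that it would provide free security updates for one year. With several months still to go before that period expires, Microsoft has extended the free security updates by another year, and Windows 10 users do not need to do anything to receive them. The latest Extended Security Updates will expire on October 12, 2027. According to StatCounter, 26% of PCs still run Windows 10, and because Microsoft raised the hardware requirements for Windows 11, most Windows 10 PCs cannot be upgraded to Windows 11.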

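Trump administration asks OpenAI to release new model in stages

Citing security concerns, the Trump administration has asked OpenAI to release its new GPT-5.6 model in stages. The Information reports that the new model will initially be offered to a small group of partners, and that during the preview period the government will approve customer access one by one. The report says the request grew out of conversations between the Office of the National Cyber Director and the Office of Science and Technology Policy.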

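US Department of Defense reinstates vaccine mandate

After more than 200 recruits at a US Air Force base caught influenza, the US Army, Navy, and Air Force have reinstated vaccination requirements for recruits. Two months ago, Defense Secretary Pete Hegseth rescinded the decades-old mandatory flu vaccination order, calling it unreasonable and saying its removal would restore service members' "freedom." But history has long shown that enclosed environments such as barracks breed disease, and infectious disease has always been a major threat to military fighting strength. Lackland Air Force Base in Texas recently reported 222 confirmed flu cases and 4 hospitalizations; one recruit, Keon McDaniel, died, though it is not yet clear whether his death was related to the flu. Only about 40% of recruits at the base had been vaccinated, and the outbreak began in early June. A Pentagon spokesperson said the Pentagon has approved exemptions from Hegseth's voluntary flu vaccination policy for the Army, Navy, Air Force, National Security Agency, and Defense Health Agency.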

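LastPass discloses another user data breach

Password manager LastPass has disclosed another user data breach, this time caused by its external partner Klue; attackers accessed customer information and support case data. LastPass says the accessed data includes customer names, phone numbers, email addresses, and physical addresses, as well as support case data and sales-related data. It says that after learning of the breach it immediately revoked employee access to Klue, rotated the exposed API tokens, and notified law enforcement. LastPass warned customers to be alert to phishing and social engineering attacks, and published IP addresses and email domains associated with the attackers.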

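Apple officially raises product prices

Days after Apple CEO Tim Cook hinted at the move, Apple raised prices across its product line, with increases ranging from $50 to more than a thousand dollars. Even Apple can no longer absorb the high cost of memory and storage on its own. Apple said in a statement that it had never seen a component price rise so quickly and by so much, that it had done its best until now to shield customers from the increases, and that it had reached the point where it had to start raising prices on some products, including the iPad and Mac increases announced today. The company acknowledged that this is not good news and said it is sparing no effort to find solutions.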

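Ovaries may turn into an organ with immune function after menopause

Reproductive specialists once believed that after menopause the ovaries become as useless as the appendix. Examining ovaries from women aged 50 to 75, researchers found that the organ's cells produce different proteins with age. To study age-related ovarian changes in more depth, the researchers turned to laboratory mice. Although mice do not show features specific to human menopause, such as a sharp drop in estrogen, their ovarian function also ceases late in their 2-year lifespan. The researchers removed ovaries from young mice, mice at the end of their reproductive period, and "post-menopausal" mice. For each animal, they sequenced RNA from one ovary to measure gene expression, and performed visual microscopic analysis of tissue from the other ovary to identify distinct cell populations and measure the extent of fibrosis, the build-up of stiffened tissue that occurs naturally with age. Analysis of the "post-menopausal" ovaries showed that levels of every type of immune cell were higher than is typical in young mice. In addition, genes encoding various pro-inflammatory compounds were more active in the ovaries of old mice, and these immune molecules may be secreted into the blood and carried to other parts of the body. It is not yet clear whether the aging ovary genuinely plays a role in immune signaling or is merely an incidental gathering place for immune cells. The finding may help explain why women, despite living longer, tend to be in poorer health than men as they age: post-menopausal ovaries may secrete molecules that cause chronic inflammation in women during menopause.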

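Chinese scientists develop rice with reduced cadmium uptake

Cadmium is not an essential element for plant growth, but when it enters the human body through the soil-rice-food chain and is ingested over the long term, it can cause serious health problems such as kidney damage, cancer, and osteoporosis. OsNramp5 is the key transporter protein in rice responsible for moving cadmium from the roots to the stem, but it also transports metal ions essential for plant growth, such as manganese. Knocking out OsNramp5 effectively reduces cadmium transport but also causes a deficiency of other essential metal elements, sharply reducing rice yield. According to a study published in PNAS, the Institute of Genetics and Developmental Biology of the Chinese Academy of Sciences and collaborators used base-substitution technology to precisely edit OsNramp5, the core transporter gene responsible for cadmium uptake in rice, creating superior artificial allelic variants. They identified a new mechanism that specifically lowers cadmium uptake without affecting the uptake of manganese and other key metal ions. This resolves the difficulty of combining low cadmium with high yield and offers a practical new breeding approach for safely producing staple grain on cadmium-contaminated farmland.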

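OpenAI announces Jalapeño, an in-house AI chip dedicated to inference

OpenAI has announced its first in-house chip, Jalapeño, designed and manufactured in partnership with Broadcom and built specifically for AI inference. OpenAI did not disclose technical details, saying only that preliminary tests show performance per watt significantly better than the most advanced comparable products available today. OpenAI and Broadcom formally announced their partnership last October, and OpenAI claims it used its own models to speed up the chip's design. In-house AI chips are intended to reduce dependence on Nvidia; Google and Amazon have also developed their own chips.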

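UK Wikipedia staff seek to form a union

UK Wikipedia staff are the first to seek to form a union. On Wednesday, June 24, UK employees of the Wikimedia Foundation wrote to management asking to be represented by United Tech and Allied Workers (UTAW), a branch of the Communication Workers Union (CWU). The employees called on the Wikimedia Foundation, as the de facto manager of the global non-profit, to honor a recent public commitment by its leadership to protect employees' right to organize and form unions. More than a thousand Wikipedia volunteers and community members signed a petition in support of the employees. The UK is the Wikimedia Foundation's second-largest source of employees after the United States.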

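Microsoft says 8GB of RAM is enough for Windows 11

Microsoft has updated its Surface buying guide, stating that 8GB of RAM is enough for everyday Windows 11 use such as browsing, video streaming, schoolwork, and productivity apps. It also says that 16GB or more is needed to unlock Copilot+ PC features. With memory in short supply and prices up several-fold, PC makers have had to start offering devices with 8GB of RAM, but 8GB is a tight fit for Windows 11, and for the past two years Microsoft's messaging has been that 16GB is necessary for a good Windows 11 experience. As a major AI infrastructure provider, Microsoft is of course also one of the main culprits behind the current situation.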

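White House app auto-downloads to government-issued phones and cannot be uninstalled

The White House announced in May that its White House app would be automatically downloaded to government-issued phones. The app cannot be uninstalled; even when government employees try to remove it, it quickly reinstalls itself. Employees of the US Department of Agriculture, the State Department, and the Department of Labor, speaking anonymously, said the app's appearance on their phones made them uneasy; some tried to delete it but failed. "I deleted it as a test, and it came right back," one USDA employee said. The app has a button that lets users "text President Trump"; tapping it automatically opens a text box pre-filled with "Greatest President Ever." The app's social section shows posts from the White House X account, posts from Trump's Truth Social account, and videos shared by official accounts on platforms such as TikTok and Instagram. The news section contains White House press releases, briefings, and fact sheets, along with selected articles from outlets such as Fox, Breitbart, Reuters, and The New York Post, content that either lavishes praise on the administration's policies or attacks Democrats. One government employee called it blatant propaganda.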

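The person who introduced the squiggly line under misspelled words

Every little detail of the graphical UIs we take for granted, however small, was thought up by someone at some point. Take the little red squiggly line under misspelled words. The design has become such a commonplace element of every text-editing field that nobody gives it a thought. Yet someone did invent it, and according to veteran Microsoft programmer Raymond Chen, that person was Tony Krueger. In early versions of Word, the user had to invoke the spell checker manually and then wait while the program found every possibly misspelled word and presented them one by one for the user to decide how to handle each. Word then introduced automatic spell checking, which ran the check while the user was idle so that results were ready when the user clicked the spell-check button. But automatic spell checking was still a blocking operation. Many users turned it off because it would suddenly decide that "now is a good time to spell-check the document" just when you wanted to do something else, such as save and exit, forcing you to wait for the check to finish. Tony made the spell checker less obtrusive so that it did not interfere with the user's current work. When it found a problem, instead of launching a spell check it immediately drew a red squiggly line under the possibly misspelled word, and later a green squiggly line under possible grammar errors.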

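One third of LG and Samsung smart TV apps embed residential proxy SDKs

A scan of LG and Samsung smart TV apps found that 2,058 of 6,038 TV apps embed a residential proxy SDK, meaning they sell the user's home IP address for use as a proxy service. Smart TVs are ideal proxy hosts: they are plugged in almost all the time and connected to home Wi-Fi, but unlike a PC, nobody checks them for suspicious background activity. Ads in TV apps may annoy users, whereas a silently running residential proxy brings revenue to the operator while minimizing user annoyance. Residential proxies carry a risk of abuse, however: the Kimwolf botnet abused residential proxies to spread.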

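Anthropic accuses Alibaba of distilling its models

Anthropic has accused Alibaba's Qwen AI lab of illegally distilling its AI models. In a letter to US lawmakers, Anthropic said the activity took place between April 22 and June 5 this year and involved more than 28.8 million interactions with Claude through nearly 25,000 fraudulent accounts. The letter, dated June 10, was sent to Senate Banking Committee Chairman Tim Scott and Ranking Member Elizabeth Warren ahead of a hearing on AI.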

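Scientists push back early human use of fire to 1.8 million years ago

Scientists have found new evidence at Wonderwerk Cave in South Africa indicating that human ancestors were using fire 1.07 to 1.79 million years ago, the earliest known record of human fire use. About 30 meters deep inside the cave, researchers found traces of repeated fire use at locations well beyond the reach of natural wildfires, indicating that early humans deliberately brought naturally occurring fire into the cave and kept it burning. Early humans could not make fire at will; they most likely collected it from lightning-sparked fires or grassland wildfires.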

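China's Q1 PC shipments fall 2%

According to market analysis firm Omdia, China's first-quarter PC shipments fell 2% and tablet shipments fell 5%. PC shipments dropped to 8.9 million units and tablet shipments to 8.3 million units. Notebook shipments (including mobile workstations) fell 19% year over year, while desktop shipments (including desktop workstations) rose 41%, reaching 5.3 million and 3.6 million units respectively. Omdia attributes the weak market to higher device prices driven by rising component costs and to weaker consumer subsidies. Omdia forecasts that full-year 2026 PC shipments will fall 14% to 36 million units and tablet shipments will fall 11% to 32 million units. The leading PC makers include Lenovo, Huawei, Apple, iSoftStone, and HP.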

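Early childhood screen use linked to poorer academic performance and weaker working memory

With screens now almost ubiquitous in young children's lives, a study examined their effect on academic performance. It followed children from ages 1 to 8 and found that more screen time was associated with poorer academic performance at age 9 and weaker working memory at age 10.5. The WHO and the American Academy of Pediatrics recommend that children avoid screens before 18 to 24 months and that children aged 2 to 5 use screens for no more than 1 hour a day, but many young children exceed these limits. The study tracked the development of 502 children from infancy to middle childhood and found that those with more screen time at particular developmental stages later had poorer academic performance and weaker working memory. The association was strongest in infancy and the early school years, suggesting that these stages may be especially sensitive windows for cognitive development. Children with higher total screen exposure across childhood also generally performed worse academically. The results suggest that the timing of screen exposure may matter as much as the total amount, and they support the "less is better" principle for children's screen time.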

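Europe is the fastest-warming continent

The UK, France, Italy, and Spain all issued red heat warnings this week as Europe experiences its second heatwave since May. Global temperatures are about 1.4°C above pre-industrial (1850-1900) levels, while according to the EU's Copernicus Climate Change Service, European temperatures are about 2.4°C above pre-industrial levels. The continued rise in the global average temperature is driven mainly by greenhouse gas emissions from burning oil, gas, and coal, but a combination of factors means different regions warm by different amounts. Land warms faster than the ocean because water can absorb more heat and cools through evaporation. The Copernicus Climate Change Service says changes in atmospheric circulation are making European summer heatwaves more frequent and more intense. Another major reason is geography: Europe adjoins the Arctic, where temperatures are 3.2°C above pre-industrial levels. Arctic warming is partly due to albedo: bright ice and snow reflect most of the sun's heat back into space, but as they melt they expose darker, heat-absorbing land. In parts of Europe that see frequent winter snowfall, snow cover is shrinking and exposing dark land.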

09

APP STORE RANK
