TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0916
SAT, JUL 4, 2026
OrangeBot.AI curates and filters each day's tech trends and news to save you time.
The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK
AI News Summary

July 4, 2026


President Frames Election Using US 250th Anniversary

The U.S. President kicked off events for the nation's 250th anniversary by previewing a political line of attack against the Democratic party. The speech framed the upcoming election as a pivotal moment for the country's future, using the historical milestone as a backdrop for the campaign.

Severe Weather Causes Widespread Power Outages

Extreme weather, including severe storms and temperatures near 40°C (104°F), has strained utility grids. The conditions have left more than 800,000 households without electricity as power systems struggle to keep up with high demand and storm-related damage.

Safety Concerns Grow Over Humanoid Robot Development

Recent viral incidents involving robot malfunctions are highlighting major safety challenges for robotics companies. The mishaps have raised public and industry concerns about how to ensure advanced humanoid robots can operate safely around people without causing harm.

Companies to Increase Clean Energy Deals as Subsidies End

Major tech companies like Google and Meta are expected to sharply increase their direct purchases of clean energy. This trend is driven by the anticipated end of Biden-era government support, pushing corporations to secure renewable power through private agreements.

Small-Cap Stocks Surge in First Half of the Year

The Russell 2000 index, a key benchmark for smaller U.S. companies, has performed strongly, climbing 22% in the first six months of the year. The rally comes as investors continue to show strong interest in new trades related to the Artificial Intelligence (AI) sector.

Farmers Adapt Agricultural Methods to Combat Extreme Heat

In response to rising temperatures, farmers are adopting new techniques to improve soil quality and its ability to retain water. These methods include reducing chemical use and intensive ploughing to better protect crops and land from the effects of extreme heat.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS

Hacker News - July 4, 2026

Hacker News Feed: Highlighting key posts and discussions.

Synthesis is harder than analysis

(surfingcomplexity.blog)

11527
Costco is the anti-Amazon

(phenomenalworld.org)

457415
Factories are just rooms

(interconnected.org)

259109
Half-Baked Product

(weli.dev)

1293387
CarPlay Is Additive

(www.caseyliss.com)

562699
Protect your right to run local AI

(righttointelligence.org)

524186
An American Privacy Emergency

(scottaaronson.blog)

402131
03

HUGGINGFACE

HuggingFace News - July 4, 2026

HuggingFace Feed: The latest AI models, datasets, and community updates.

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Many everyday programming tasks resist clean rule-based implementation, such as alerting on important log lines, repairing malformed JSON, or ranking search results by intent, and are increasingly outsourced to large language model APIs at the cost of locality, reproducibility, and price. We propose fuzzy-function programming: compiling such a function from a natural-language specification into a compact, locally-executable neural artifact. We instantiate this paradigm with Program-as-Weights (PAW), in which a 4B compiler trained on FuzzyBench, a 10M-example dataset we release, emits parameter-efficient adapters for a frozen, lightweight interpreter. A 0.6B Qwen3 interpreter executing PAW programs matches the performance of direct prompting of Qwen3-32B, while using roughly one fiftieth of the inference memory and running at 30 tokens/s on a MacBook M3. PAW reframes the foundation model from a per-input problem solver into a tool builder: invoked once per function definition, it produces a small reusable artifact whose subsequent calls per function application are cheap and offline.

67
EvoPolicyGym: Evaluating Autonomous Policy Evolution in Interactive Environments

Autonomous agents are increasingly expected to improve executable policies through feedback, yet existing evaluations often collapse this process into a final score or confound it with open-ended software-engineering progress. We introduce Autonomous Policy Evolution, a controlled evaluation setting in which a harness-model agent repeatedly edits an executable policy system under a fixed interaction budget. We instantiate this setting in EvoPolicyGym, a benchmark built from compact interactive RL environments that evaluates how agents iteratively improve explored policies. On the EvoPolicyGym suite, GPT-5.5 achieves the strongest aggregate rank score and top-two performance on all 16 environments. Beyond leaderboard results, EvoPolicyGym also provides trajectory-level diagnostics that distinguish how agents allocate budget and convert feedback into parametric tuning. These analyses show that strong autonomous policy evolution depends not only on isolated task wins, but on discovering task-appropriate mechanisms and refining policies under bounded feedback.

41
AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents

Memory for a long-horizon LLM agent is a contract about what each future decision is allowed to see. The simplest contract appends past observations, tool calls, and reflections to every prompt, which makes prior context easy to access but also turns it into a jumbled mixture in which the effect of any single memory component is hard to isolate. We introduce and instrument an alternative bounded contract: every decision is made from a fresh user message assembled by typed retrieval, with no raw cross-decision transcript appended. The prompt thus stays bounded across runs of any length, and any single layer can be ablated in isolation. We instantiate the contract in Slay the Spire 2, a closed-rule stochastic deck-building game whose runs require hundreds of tactical and strategic decisions. A public online benchmark of frontier LLMs on the same game reports zero wins at the lowest difficulty across five configurations, and the developer-reported human win rate at the same difficulty is 16%; the task is hard but not saturated. Within our harness, a fixed-A0 ablation shows the largest observed difference when triggered strategic skills are enabled: the no-store baseline wins 3/10 games, and adding the skill layer raises this to 6/10. At this sample size the comparison is directional rather than statistically decisive (Fisher exact p ≈ 0.37); a cross-backbone probe and public accumulating-context baselines are reported as operational comparisons rather than controlled tests of the contract variable itself. We release a reproducible testbed: 298 completed trajectories with condition tags, frozen memory/skill snapshots, prompt records, and analysis scripts -- an agent design and a validated, reusable methodology for studying how explicit memory layers shape long-horizon LLM-agent decisions.

41
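
The p-value quoted in the AgenticSTS abstract above can be checked by hand. The sketch below assumes the comparison is a two-sided Fisher exact test on a 2x2 table of 3/10 versus 6/10 wins; the abstract does not state the sidedness, and the function name is illustrative rather than taken from the paper's analysis scripts.

```python
from math import comb

def fisher_exact_two_sided(wins_a, n_a, wins_b, n_b):
    """Two-sided Fisher exact p-value for a 2x2 win/loss table."""
    total_wins = wins_a + wins_b
    total = n_a + n_b
    denom = comb(total, n_a)

    def prob(k):
        # Hypergeometric probability that arm A holds exactly k of the wins.
        return comb(total_wins, k) * comb(total - total_wins, n_a - k) / denom

    observed = prob(wins_a)
    lo = max(0, total_wins - n_b)
    hi = min(n_a, total_wins)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= observed * (1 + 1e-9))

# Assumption: 3/10 wins (no-store baseline) vs 6/10 wins (skill layer).
p_value = fisher_exact_two_sided(3, 10, 6, 10)
print(round(p_value, 2))  # 0.37
```

With only ten games per arm, even a doubling of the win rate is compatible with chance, which is consistent with the abstract calling the result directional.
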
Morphing into Hybrid Attention Models

Hybrid attention models improve long-context efficiency by retaining only a subset of full-attention layers and replacing the remaining layers with linear attention. However, the effectiveness of Transformer-to-hybrid conversion critically depends on which layers preserve full attention. Existing hybrid layer selection methods typically rely on heuristic strategies such as fixed placement patterns or layerwise scoring, implicitly treating layer importance as isolated and overlooking the interdependent layer effect under a global hybrid configuration. In this work, we formulate hybrid layer selection as a budget-constrained subset optimization problem. We further propose FlashMorph (Fast LAyer Selection for Hybrid MORPHing), an effective, efficient and scalable layer selection method for Transformer-to-hybrid conversion. FlashMorph first constructs a morphable model by equipping each full-attention layer with a converted linear-attention branch. It then freezes all model weights and jointly optimizes layerwise gates on synthetic long-context retrieval data, with a linearization regularization that encourages the model to rely on linear attention for efficiency. The learned gates are discretized under a preset full-attention budget to instantiate the hybrid architecture, followed by standard logits distillation and long-context finetuning. Extensive experiments show that FlashMorph discovers more effective hybrid configurations, preserves strong long-context recall and general benchmark performance while substantially reducing layer selection cost compared with existing layer selection methods, demonstrating its effectiveness, efficiency, and scalability.

34
AgenticDataBench: A Comprehensive Benchmark for Data Agents

Data science aims to derive actionable insights from heterogeneous raw data, unlocking the value of the massive amounts of data generated in modern society. Automating this process is essential to reducing labor-intensive efforts for data scientists and enabling scalable data-driven applications. Recently, large language model (LLM)-based data agents have emerged as a promising solution to automate data science workflows. However, the field lacks comprehensive benchmarks to rigorously evaluate these agents across diverse scenarios at fine granularity. To address this gap, we propose AgenticDataBench, a comprehensive benchmark featuring realistic tasks spanning diverse domains with fine-grained ground-truth labels. This enables evaluations to capture the diversity and complexity of data science workflows and the detailed performance of agents. First, to cover diverse domains, we collect real datasets and tasks from 15 vertical domains, including 5 real-world B2B use cases from a leading fintech company. Second, to remove redundancy in real-world tasks and generate high-quality tasks for domains lacking real data, we introduce data science skills, recurring data-centric operational patterns, and quantify benchmark coverage by the number of skills included. Representative skills are extracted from large-scale task solutions on Stack Overflow using skill-aligned hierarchical clustering. Third, for real-world business tasks, we select task-solution pairs that maximize diversity in skill composition, ensuring broad coverage of practical scenarios. Fourth, to generate realistic tasks for diverse domains without real tasks, we propose a systematic LLM-based task generation approach to create workflows and tasks based on these skills. Finally, we evaluate state-of-the-art data agents using our annotated benchmark and open-sourced testbed, providing detailed skill-level insights.

25
Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling

Hardware-agnostic strategies for accelerating text-to-image diffusion, such as timestep distillation and feature caching, can reduce inference time without custom kernels or system-level optimization. Among them, multi-resolution generation strategies have recently received broad attention, attaining more than 5x speedup without any training. However, the design of performing upsampling in the latent space, together with the selective modification of partial regions, causes these methods to exhibit noticeable blurring or artifacts. To this end, we propose MrFlow, a training-free multi-resolution acceleration strategy for pretrained flow-matching models built upon a staged low-to-high-resolution pipeline. MrFlow first rapidly generates the main structure at low resolution, then performs super-resolution in the pixel space using a lightweight pretrained GAN-based model, subsequently injects low-strength noise to enable high-frequency resampling, and finally refines the details at high resolution. Quantitative and qualitative results on FLUX.1-dev and Qwen-Image show that MrFlow exploits the quadratic token reduction and reduced step requirement of low-resolution sampling to achieve 10x end-to-end acceleration while keeping OneIG within a 1% gap relative to that before acceleration, significantly surpassing other training-free acceleration strategies, and requiring no training or runtime dynamic identification whatsoever. MrFlow can further be directly combined orthogonally with pre-trained timestep distillation strategies, achieving even higher generation acceleration of up to 25x.

24
WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory

We present WorldDirector, a highly controllable video world model framework designed for persistent dynamic object memory and unrestricted viewpoint exploration. Unlike existing world models that entangle physical dynamics with pixel rendering and rely on continuous visual observation to sustain motion, our framework explicitly decouples semantic motion orchestration from visual generation. By leveraging an LLM to coordinate 3D trajectories with camera movements and subsequently employing these orchestrated trajectories as control signals for video generation, our approach ensures strict physical logic and appearance stability, successfully preserving the exact visual identities of dynamic entities even when they re-enter the scene after prolonged periods out of view. Experimental results demonstrate that our method supports the synthesis of complex and extended events with unprecedented controllability and persistent dynamic object memory. Project Page: https://worlddirector.github.io/

20
Breaking Failure Cascades: Step-Aware Reinforcement Learning for Medical Multimodal Reasoning

Recent multimodal large language models have shown great promise in clinical image reasoning, but existing post-training pipelines remain predominantly outcome-centric, relying on final answer correctness or sequence-level preferences. This suffers from sparse credit assignment, making it difficult to optimize the reasoning process essential for clinical applications. Our analysis reveals that cascading errors from early-stage reasoning failures are a leading cause of incorrect predictions in medical visual question answering (VQA) benchmarks. Motivated by this, we propose Medical Reasoning-aware Policy Optimization (MRPO), an RL algorithm that incorporates step-wise process rewards. When the final answer is incorrect, MRPO assigns exponentially larger penalties to tokens in earlier invalid reasoning steps, breaking failure cascades without compromising successful paths. Across three multimodal LLM backbones, MRPO consistently outperforms standard GRPO and a recent RL baseline, and on Qwen3-VL-8B-Instruct even surpasses substantially larger medical MLLMs such as HuatuoGPT-Vision-34B by 2.79 points. Moreover, MRPO reduces early-stage reasoning failures from 64.0% to 13.0%, showing that targeted mitigation of cascading failures improves both reasoning quality and final answer accuracy. Our code is available at https://github.com/dmis-lab/MRPO

17
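
To make the MRPO idea above concrete, here is a minimal sketch of a penalty schedule in which earlier invalid reasoning steps receive exponentially larger penalties when the final answer is wrong. The abstract does not give the formula, so the geometric schedule, the `base` parameter, and the function name are all assumptions for illustration, not the paper's method.

```python
def step_penalties(step_valid, answer_correct, base=2.0):
    """Return one penalty weight per reasoning step (0.0 means no penalty).

    Hypothetical schedule: an invalid step at index i gets base ** (last - i),
    so the earliest invalid step is penalized most.
    """
    if answer_correct:
        # Successful reasoning paths are left untouched.
        return [0.0] * len(step_valid)
    last = len(step_valid) - 1
    return [0.0 if ok else base ** (last - i)
            for i, ok in enumerate(step_valid)]

# Four reasoning steps; steps 0 and 2 were judged invalid.
penalties = step_penalties([False, True, False, True], answer_correct=False)
print(penalties)  # [8.0, 0.0, 2.0, 0.0]
```

In an actual RL setup these weights would presumably be applied to the tokens of each step; how the steps are segmented and validated is outside this sketch.
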
SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use

Skills are becoming a reusable operational layer for LLM agents, encoding SOPs, domain rules, tool workflows, scripts, and validation routines. In realistic skill repositories, overlapping skills make reliable skill-use difficult. Final verifier success is too coarse for both evaluation and training, since an agent may pass through trial and error while selecting distractor skills, skipping required steps, composing workflows incorrectly or omitting final checks. We introduce SkillCoach, a self-evolving rubric framework for evaluating and enhancing agentic skill-use. SkillCoach derives skill-grounded process rubrics from real rollouts and evaluates trajectories along four dimensions: skill selection, skill following, skill composition, and skill-grounded reflection. It keeps the external verifier as a separate outcome signal, allowing process quality to be distinguished from accidental task success. The evolved rubrics further serve as process supervision for selecting high-quality training trajectories. Experiments show that evolved rubrics substantially improve evaluation quality, expose failures hidden by final accuracy, and provide stronger supervision signals than outcome-only filtering for enhancing agentic skill-use.

14
Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads

In long-context use, large language models frequently synthesize answers from the meaning of a relevant context span rather than literally copy-pasting them. Identifying which attention heads perform this synthesis matters for interpreting long-context model behavior. Yet existing detectors miss these heads by construction: they reward heads whose attended token matches the generated token, a literal-copy criterion that captures where a head reads but not what it writes through its output-value (OV) circuit, the very mechanism that carries non-literal retrieval. We introduce Logit-Contribution Scoring (LOCOS), a write-aware detector that scores each head by the projection of its OV-circuit output onto the answer-token unembedding direction, contrasting needle and off-needle source positions in a single forward pass. Across three model families (Qwen3, Gemma-3, OLMo-3.1), mean-ablating the top LOCOS heads on the NoLiMa non-literal retrieval benchmark collapses ROUGE-L at lower head counts than prior attention-based detections; on Qwen3-8B, ablating 50 heads drives ROUGE-L from 0.401 to 0.000 while the strongest baseline still retains 0.292. The selected heads are retrieval-specific: parametric recall and arithmetic reasoning stay at baseline under the same ablation. On Qwen3-8B, the same ablation also drops MuSiQue from 0.55 to 0.08 and BABI-Long from 0.62 to 0.20, while a random-heads control stays within 0.05 of baseline.

13
AGVBench: A Reliability-Oriented Benchmark of Data Augmentation for Vein Recognition

Vein recognition is a secure biometric technology often constrained by limited annotated data and imaging variations. While data augmentation mitigates this, strategies designed for natural images may disrupt the fine-grained topology and textures essential for identity discrimination. We present AGVBench, which evaluates 30 representative augmentation strategies on five public palm- and finger-vein datasets with seven backbone architectures, covering classic CNNs, vision transformers, and vein-specific recognition models. Our results show that multi-image mixing methods (e.g., MixUp, PuzzleMix, StarMixup) generally provide the strongest recognition performance. However, they are often poorly calibrated and vulnerable to adversarial perturbations, revealing a clear inconsistency between clean accuracy and adversarial security. We also find that severe geometric transformations frequently degrade recognition, which is potentially due to feature misalignment or spatial cropping, and that augmentation effectiveness varies across palm and finger vein datasets. These findings prove that accuracy-centric evaluation is insufficient for biometric augmentation. AGVBench provides standardized protocols to support reproducible research and guide the design of reliable, secure, and robust vein recognition systems. Our codebase is available at https://github.com/Advance-VeinTech-Innovators/AGVBench.

13
Optimizing Visual Generative Models via Distribution-wise Rewards

Conventional reinforcement learning strategies for visual generation typically employ sample-wise reward functions, yet this practice frequently results in reward hacking that degrades image diversity and introduces visual anomalies. To address these limitations, we present a novel framework that finetunes generative models using distribution-wise rewards, ensuring better alignment with real-world data distributions. Unlike rewards that evaluate samples individually, distribution-wise reward accounts for the data distribution of the samples, mitigating the mode collapse problem that occurs when all samples optimize towards the same direction independently. To overcome the prohibitive computational cost of estimating these rewards, we introduce a subset-replace strategy that efficiently provides reward signals by updating only a small subset of a generated reference set. Additionally, we apply RL to optimize post-hoc model merging coefficients, potentially mitigating the train-inference inconsistency caused by introducing stochastic differential equation (SDE) in regular RL practices. Extensive experiments show our approach significantly improves FID-50K across various base models, from 8.30 to 5.77 for SiT and from 3.74 to 3.52 for EDM2. Qualitative evaluation also confirms that our method enhances perceptual quality while preserving sample diversity.

12
From SRA to Self-Flow: Data Augmentation or Self-Supervision?

Representation alignment has become an effective way to accelerate diffusion transformer training and improve generation quality. Recent self-alignment methods, such as SRA and Self-Flow, further remove the dependency on external pretrained encoders by constructing alignment within the diffusion model itself. However, the mechanism behind the improvement from SRA to Self-Flow, dual-time scheduling, remains under-examined: Self-Flow attributes its gain to interactions between tokens at different noise levels, where cleaner tokens help infer noisier ones. In this work, we revisit this explanation and ask whether the gain instead comes from data augmentation along the noise dimension. To disentangle these factors, we introduce Attention Separation, which preserves the same dual-timestep input as Self-Flow while blocking attention between tokens assigned to different noise levels. Surprisingly, removing such interaction does not degrade performance and can even improve it, suggesting that the improvement from SRA to Self-Flow mainly comes from data augmentation. Furthermore, we show that Attention Separation itself provides an augmentation effect by splitting a single image into multiple effective training parts to expand the training data. Based on these observations, we combine self-representation alignment with dual-timestep and attention-separation augmentation, and demonstrate the effectiveness of this design on ImageNet.

10
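
The Attention Separation idea described above, blocking attention between tokens assigned to different noise levels, can be pictured as a boolean attention mask. This is a minimal sketch under the assumption that each token carries a single scalar noise level; the function name and the toy levels are illustrative and do not come from the paper.

```python
def attention_separation_mask(noise_levels):
    """Build a mask where mask[i][j] is True if token i may attend to token j.

    Tokens may only attend to tokens that share their noise level.
    """
    return [[query_level == key_level for key_level in noise_levels]
            for query_level in noise_levels]

# Two tokens at noise level 0.2 and two at 0.9 (made-up values).
mask = attention_separation_mask([0.2, 0.2, 0.9, 0.9])
for row in mask:
    print(row)
```

The resulting mask is block-diagonal when tokens are grouped by level, which matches the abstract's description of one image being split into separate effective training parts.
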
AutoMem: Automated Learning of Memory as a Cognitive Skill

Memory expertise is a learned skill: knowing what to encode, when to retrieve, and how to organize knowledge--a capacity known in cognitive science as metamemory. We bring this perspective to LLMs by treating memory management as a trainable skill. We promote file-system operations to first-class memory actions alongside task actions, letting the model itself decide how to manage its memory. This memory skill improves along two axes: the structure that supports it (prompts, file schemas, action vocabulary), and the proficiency of the model exercising it. Both axes resist manual optimization: episodes in long-horizon tasks run for thousands of steps, and a single memory mistake can hide long before it surfaces, making human review of full trajectories impractical. We introduce AutoMem, a framework that automates both axes. In the first loop, a strong LLM reviews complete agent trajectories and iteratively revises the memory structure that shapes how the agent interacts with its memory files. In the second loop, the agent's own good memory decisions are identified from many episodes and used as training signal to sharpen the model's memory proficiency directly. Across three procedurally generated long-horizon games (Crafter, MiniHack, and NetHack), optimizing memory alone--without modifying the model's task-action behavior--improved the base agent's performance ~2x-4x, making a 32B open-weight model competitive with frontier systems such as Claude Opus 4.5 and Gemini 3.1 Pro Thinking. Our results show that memory management is an independently learnable skill, and a high-leverage objective yielding large gains on long-horizon tasks.

9
InstanceControl: Controllable Complex Image Generation without Instance Labeling

Controllable image generation methods, such as ControlNet, have demonstrated a remarkable capacity to introduce visual conditions (e.g., depth maps) to guide image generation. However, these methods often struggle with complex multi-instance scenes, frequently leading to attribute confusion among instances. While recent approaches attempt to mitigate this via manual instance labeling, such requirements are labor-intensive. In this paper, we propose InstanceControl, a novel multi-instance controllable generation method that eliminates the need for instance labeling. We identify the primary bottleneck in existing methods as the inability to accurately associate instance descriptions with their corresponding regions within visual conditions. To address this, we leverage the Vision-Language Model (VLM) to establish instance-level correspondences between text prompts and visual conditions. Specifically, the VLM automatically parses instance descriptions from the text prompts and simultaneously predicts instance masks based on the visual conditions. Furthermore, since the predicted masks may contain noise, we introduce an adaptive mask refinement strategy that dynamically refines these instance masks during the generation process. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods, achieving superior fidelity and precise instance-level control.

8
When Search Agents Should Ask: DiscoBench for Clarification-Aware Deep Search

Search agents powered by large language models (LLMs) are increasingly used to solve complex information-seeking tasks, requiring multi-step retrieval and reasoning to fulfill user goals. However, existing benchmarks often assume that user queries are complete and explicit, overlooking the fact that real-world search requests are frequently vague, underspecified, or even factually incorrect. In deep search scenarios, such ambiguity can propagate along multi-step reasoning chains and lead agents toward incorrect search trajectories. To address this gap, we introduce DiscoBench, a benchmark for clarification-aware deep search, designed to evaluate whether search agents can proactively identify ambiguity, ask effective clarification questions, and recover correct reasoning paths through user interaction. DiscoBench contains 211 samples and 463 ambiguity instances across 11 real-world domains, covering four ambiguity types. We further design a user simulator for multi-turn interaction and evaluate model performance from four perspectives: task utility, ambiguity detection, interaction strategy, and cost efficiency. Experiments on representative LLMs show that ambiguity detection and effective clarification are distinct capabilities, and that repeatedly searching instead of asking for clarification often performs worse than direct guessing, highlighting a critical gap between retrieval ability and interactive problem-solving in current search agents.

8
AnyGroundBench: A Specialized-Domain Benchmark for Video Grounding in Vision-Language Models

Vision-Language Models (VLMs) have demonstrated immense promise in Spatio-Temporal Video Grounding (STVG). However, current evaluation protocols are largely confined to zero-shot assessments on general, daily-life benchmarks. This creates a critical disconnect from real-world applications in specialized fields, where models inevitably encounter rare visual concepts and complex spatio-temporal dynamics. Since exhaustive pre-training across infinite data distributions is infeasible, the ability to adapt to novel domains is essential. To bridge this gap, we introduce AnyGroundBench, a domain-adaptation benchmark designed to shift the STVG evaluation paradigm from static zero-shot testing to rigorous domain adaptation. Targeting five specialized domains (animal, industry, sports, surgery, and public security), AnyGroundBench pairs newly captured videos such as expert-annotated mouse behaviors with established datasets, unifying them through dense, high-fidelity spatio-temporal annotations. Crucially, the benchmark provides dedicated training subsets to systematically measure domain adaptability. We extensively evaluate 15 state-of-the-art VLMs, assessing their zero-shot generalization and In-Context Learning (ICL) capabilities under practical computational constraints. Ultimately, our findings reveal that current models fail in both zero-shot and ICL-based adaptation when confronted with specialized domains, exposing critical flaws in spatio-temporal reasoning that future research must address.

7
PACE: A Proxy for Agentic Capability Evaluation

Evaluating LLM agents on benchmarks like SWE-Bench and GAIA can be expensive and time-consuming, and it requires complex infrastructure. A single evaluation can cost thousands of dollars and take days to complete. In contrast, non-agentic LLM benchmarks that test individual capabilities (e.g., reasoning, code generation) are fast and cheap to run. In this paper, we investigate whether performance on expensive agentic benchmarks can be accurately predicted by the performance on a small, carefully selected subset of atomic evaluation instances. We introduce PACE, a framework that constructs proxy benchmarks by selecting instances from existing non-agentic evaluations whose aggregate scores most reliably predict model performance on agentic benchmarks. Given a pool of candidate instances spanning atomic capabilities, PACE fits a regression that maps a model's scores on a compact subset of source instances to its score on the target agentic benchmark. The subset itself is curated by combining two complementary instance-selection strategies, target-relevance local selection and globally informative global selection. We apply PACE to the 4 target agentic benchmarks in this paper, which yields PACE-Bench, the concrete proxy benchmark that we evaluate in the paper. Experiments across 14 models, 4 agentic benchmarks, and 19 non-agentic benchmarks show that PACE-Bench predicts agentic scores with leave-one-out cross-validation (LOOCV) mean absolute error (MAE) under 4%, Spearman correlation above 0.80, and pairwise model-ranking accuracy around 85%, all at much less than 1% of the full agentic evaluation cost. We further analyze the selected proxy instances, revealing which skills each agentic benchmark uniquely demands. PACE enables practitioners to obtain reliable estimates of agentic performance during model development, selection, and routing, without the overhead of full agent evaluation.

6
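
The LOOCV MAE metric quoted in the PACE abstract above can be illustrated with a small sketch: hold out one model, fit a regression from proxy scores to agentic scores on the remaining models, and average the absolute prediction errors. The single-feature least-squares fit, the function names, and the toy numbers are assumptions for illustration; the paper's actual regression and instance selection are not specified in the abstract.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x

def loocv_mae(xs, ys):
    """Hold out each model once, fit on the rest, average absolute errors."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        errors.append(abs(a * xs[i] + b - ys[i]))
    return sum(errors) / len(errors)

# Toy numbers, made up for illustration only (not from the paper).
proxy_scores = [0.2, 0.4, 0.6, 0.8]
agentic_scores = [0.1, 0.2, 0.3, 0.4]
mae = loocv_mae(proxy_scores, agentic_scores)
print(f"LOOCV MAE: {mae:.4f}")
```

Because the toy data is exactly linear, the held-out error is essentially zero; real proxy scores would be noisier, and the abstract reports an error under 4%.
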
Discrete Diffusion Language Models for Interactive Radiology Report Drafting

Diffusion language models, which generate text by denoising a token canvas bidirectionally instead of emitting tokens left to right, have become competitive with autoregressive (AR) generation. Medical foundation models, however, remain almost entirely autoregressive. We adapt a mixture-of-experts diffusion language model, DiffusionGemma-26B, and benchmark it against its same-size AR sibling Gemma-4-26B under an identical LoRA recipe on medical visual question answering datasets, scored by a verbosity-robust LLM judge. Diffusion matches or exceeds AR on all of them, and the finetuned model (3.8B active) is competitive with frontier vision-language models; its decoding is also 3.5-4.4x faster. Beyond this parity, the diffusion model offers a drafting capability AR lacks: any-order infill. Because the canvas is denoised bidirectionally, a radiologist can fix report fragments and have the model fill the text between them, an operation inherent to diffusion but not to autoregression, which is subpar at it. This suits real reports, which are often terse or inconsistent across clinicians and institutions.

4
Representation Distribution Matching for One-Step Visual Generation

We elucidate the design space of Representation Distribution Matching (RDM), our name for the paradigm that trains a one-step image generator by matching generated and reference feature distributions under frozen pretrained encoders. We identify two design axes, how the distributions are compared and the representations they are compared in, and controlled studies along them yield three findings. First, the classical MMD, which could not train convincing generators a decade ago, becomes a strong and scalable objective once it is estimated correctly. Second, the generated batch size is then the operative variable, with an optimum above 2048, far beyond customary batch sizes. Third, any single representation can be gamed, driven below the real score while images stay visibly fake, so we match against a balanced battery of encoders and evaluate with SW_r14, a Sliced-Wasserstein distance over 14 encoders that is independent of the training loss and resists gaming. Combining the preferred choices yields improved RDM (iRDM): it sets the one-step state of the art on ImageNet at SW_r14 1.30, corroborated by PickScore, a human-preference proxy our objective never optimizes, which prefers it over the prior best one-step generator on 71.2% of matched samples. The same recipe post-trains the four-step FLUX.2 [klein] into a one-step generator, surpassing the four-step version on GenEval, 0.826 to 0.794, and on PickScore, 22.76 to 22.58, in 90 H200 GPU-hours. Project page: https://alan-lanfeng.github.io/rdm/.
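For readers unfamiliar with MMD, the following is a sketch of the textbook unbiased estimator of squared MMD with an RBF kernel. It is not the paper's estimator, which the abstract does not specify; the kernel choice, the bandwidth, and the function names are assumptions, and the "features" are random vectors standing in for encoder outputs.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth=2.0):
    """Unbiased estimate of squared MMD between samples x and y."""
    n, m = len(x), len(y)
    kxx = rbf_kernel(x, x, bandwidth)
    kyy = rbf_kernel(y, y, bandwidth)
    kxy = rbf_kernel(x, y, bandwidth)
    # Exclude the diagonal (self-pairs) from the within-sample terms.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return float(term_xx + term_yy - 2 * kxy.mean())

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(256, 8))       # stand-in for reference features
same = rng.normal(0.0, 1.0, size=(256, 8))       # same distribution
shifted = rng.normal(1.5, 1.0, size=(256, 8))    # clearly different distribution
mmd_same = mmd2_unbiased(real, same)
mmd_shifted = mmd2_unbiased(real, shifted)
print(mmd_same, mmd_shifted)
```

The estimate sits near zero when both samples come from the same distribution and grows when they differ. Its variance shrinks as the batch grows, which is consistent with the abstract's emphasis on batch size.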

4
Learning to Move Before Learning to Do: Task-Agnostic pretraining for VLAs

Vision-Language-Action (VLA) models are fundamentally bottlenecked by the scarcity of expert demonstrations -- triplets of observations, instructions, and actions that are costly to collect at scale. We argue that this bottleneck stems from conflating two distinct learning objectives: acquiring physical competence (how to move) and acquiring semantic alignment (what to do). Crucially, only the latter requires language supervision. Building on this Decomposition Hypothesis, we propose Task-Agnostic Pretraining (TAP), a two-stage framework that first learns transferable motor priors from cheap, unlabeled interaction data -- including discarded off-task trajectories and autonomous robot play -- via a self-supervised Inverse Dynamics objective. A lightweight second stage then grounds these priors in language using minimal expert data. On the SIMPLER benchmark, TAP matches models trained on over 1M expert trajectories while using orders of magnitude less labeled data, yielding a 10% absolute gain over standard behavior cloning. On a real-world WidowX platform, TAP retains 25% success under camera perturbations where internet-scale baselines collapse to 0%, demonstrating that task-agnostic pretraining produces robust, transferable physical representations and offers a scalable path forward for Embodied AI.

4
Denser ≠ Better: Limits of On-Policy Self-Distillation for Continual Post-Training

Continual post-training enables foundation models to acquire new knowledge while preserving existing capabilities. Recent work suggests that on-policy learning can mitigate forgetting, with on-policy self-distillation emerging as a particularly attractive approach. In this work, we revisit this optimistic view through self-distillation policy optimization (SDPO). Our experiments show that SDPO can accelerate in-domain specialization when teacher signals are stable and well aligned, but it struggles to generalize to out-of-distribution scenarios. In continual post-training, SDPO exhibits stronger forgetting and can even collapse, whereas on-policy reinforcement learning methods such as GRPO adapt more conservatively and better preserve prior capabilities. Further analyses reveal that denser self-distillation induces larger drift in both parameter space and response space, and can amplify high-frequency formatting artifacts through a self-reinforcing teacher-student loop. These findings suggest that on-policy data alone is insufficient for continual learning. Dense self-distillation can accelerate specialization when teacher targets are stable and token-level supervision is reliable, but it should not be treated as a default stabilizer for continual post-training. Our code is available at https://github.com/Moenupa/SDPO-CL.

4
WARP: Weight-Space Analysis for Recovering Training Data Portfolios

Foundation models are routinely released to the public, yet the data recipes used to train them -- such as domain mixture weights that determine how different sources are sampled -- are rarely disclosed. This creates an access asymmetry: researchers study the resulting models but lack visibility into the training distribution that produces them. Prior work on inferring training data, such as membership inference, operates at the level of individual samples and thus cannot characterize the global composition of the training corpus. We introduce WARP, a framework that recovers a fine-tuned model's training mixtures directly from its released weights. WARP interpolates between the base and fine-tuned models using model merging, generating pseudo-checkpoints that approximate the missing training trajectory and expose a geometric footprint of the training data in the weight space. From these simulated footprints, WARP extracts geometric features and maps them to domain proportions using either a parameter-free softmax readout or an MLP projector trained on synthetic mixtures. In controlled experiments with BERT and GPT-2, WARP recovers domain mixtures with an average MAE as low as 0.046 and 0.104, respectively, outperforming membership inference and a variant with access to the true training trajectory.
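Two steps named in this abstract, interpolating between base and fine-tuned weights and a softmax readout, can be sketched as follows. Plain linear interpolation is assumed as the merging method, the per-domain scores fed to the readout are hypothetical placeholders for the paper's geometric features, and none of the function names come from WARP itself.

```python
import numpy as np

def pseudo_checkpoints(base, finetuned, num_steps=5):
    """Linearly interpolate every weight tensor between base and fine-tuned models.

    Assumption: simple linear merging; the paper's merging method may differ.
    """
    alphas = np.linspace(0.0, 1.0, num_steps)
    return [
        {name: (1 - a) * base[name] + a * finetuned[name] for name in base}
        for a in alphas
    ]

def softmax_readout(domain_scores, temperature=1.0):
    """Turn per-domain scores (hypothetical features) into mixture proportions."""
    z = np.asarray(domain_scores, dtype=float) / temperature
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy two-tensor "models" standing in for real checkpoints.
base = {"w": np.array([0.0, 0.0]), "b": np.array([1.0])}
finetuned = {"w": np.array([2.0, 4.0]), "b": np.array([3.0])}
ckpts = pseudo_checkpoints(base, finetuned, num_steps=5)
mixture = softmax_readout([1.0, 1.0, 2.0])
print(ckpts[2]["w"], mixture)
```

The readout has no trainable parameters, which matches the "parameter-free" description, though how WARP computes the scores it normalizes is not given in the abstract.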

3
DuoMem: Towards Capable On-Device Memory Agents via Dual-Space Distillation

Large Language Model (LLM)-based agents can solve complex procedural tasks by interacting with environments over multiple turns, but this ability typically depends on large models, long contexts, and repeated inference calls. This makes advanced memory-augmented agents difficult to deploy on resource-constrained devices. We introduce DuoMem, a dual-space distillation framework that transfers procedural problem-solving ability from a large teacher model to compact student models. DuoMem distils in two complementary spaces: (1) context-space distillation, which replaces student-generated memories with higher-quality teacher-generated procedural memories prepended to the student's input, and (2) parameter-space distillation, which fine-tunes lightweight LoRA adapters on successful teacher trajectories. Evaluated on ALFWorld, a challenging embodied decision-making benchmark, DuoMem boosts a 4B-parameter model from 4.3% to 77.9% task success rate, closing most of the gap to a 72B teacher model (87.1%), while adding fewer than 10M trainable parameters and only a few megabytes of pre-computed teacher memories. Moreover, the DuoMem-enhanced 4B model completes tasks over 3x faster than the 72B teacher in wall-clock time, making it viable for real-time edge deployment, which would be challenging for the teacher. Extensive ablations across eight models spanning 2B-72B parameters reveal that both distillation axes contribute complementary

3
Transferability for General Reasoning: An Automated Curriculum for Multi-Domain RLVR

Reinforcement learning with verifiable rewards (RLVR) has been extended from single-domain training to multi-domain reasoning suites spanning mathematics, programming, and science. However, the training curriculum (how often each domain is sampled) is typically fixed or hand-tuned, even though reasoning skills transfer unevenly across domains. Existing learnability-based curricula adapt to where the policy is currently improving, but are blind to whether a gradient step on the selected domain benefits the remaining domains. In this paper, we propose Transfer-Aware Curriculum (TAC), a bandit-style online curriculum that prioritizes domains whose updates broadly benefit the rest of the training suite. TAC repurposes signals already produced by RL training: per-domain advantages capture local learnability, and projected gradients, taken from the GRPO step being computed, estimate cross-domain transferability via gradient-geometry alignment, at negligible cost (<1% wall-clock overhead). Across a six-domain reasoning suite, TAC achieves the best macro-averaged accuracy on both Qwen3-1.7B and Llama3.2-3B, outperforming proportional random sampling, a hand-designed schedule, and a learnability-only bandit, and improving over the last of these by up to 2.8 points (10% relative). Ablations show performance degrades sharply when the transferability term is removed, and TAC remains robust on imbalanced training mixtures where learnability-only curricula over-commit to dominant domains. Our findings establish cross-domain transferability as a key signal for curriculum design in multi-domain RLVR.
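The idea of scoring a domain by how well its gradient aligns with the other domains' gradients can be sketched as below. This is an illustration under assumptions, not TAC's actual rule: the mean-cosine transfer score, the additive combination with learnability, the weight `lam`, and the softmax sampling are all hypothetical choices.

```python
import numpy as np

def transfer_scores(grads):
    """Mean cosine similarity between each domain's gradient and all others'.

    grads: (n_domains, dim) array of (projected) per-domain gradients.
    """
    unit = grads / np.linalg.norm(grads, axis=1, keepdims=True)
    cos = unit @ unit.T
    np.fill_diagonal(cos, 0.0)                    # ignore self-similarity
    return cos.sum(axis=1) / (len(grads) - 1)

def sampling_probs(learnability, transfer, lam=1.0, temperature=0.5):
    """Combine both signals and convert them to domain sampling probabilities."""
    score = np.asarray(learnability, dtype=float) + lam * np.asarray(transfer)
    z = (score - score.max()) / temperature       # stabilized softmax
    p = np.exp(z)
    return p / p.sum()

# Three toy domains: the first two agree, the third conflicts with both.
grads = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]])
transfer = transfer_scores(grads)
probs = sampling_probs([0.2, 0.2, 0.2], transfer)
print(transfer, probs)
```

With equal learnability, the domain whose gradient opposes the others is sampled least, which is the behaviour a learnability-only curriculum cannot express.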

3
Parameter-Efficient Quantum-Inspired Fast Weight Programmers for Traffic-Matrix Forecasting

Traffic matrices (TMs) capture network-wide origin-destination demand and are central to traffic engineering, yet accurate whole-matrix forecasting remains challenging when prediction must be performed under the memory, update, and training-budget constraints of online network control. This paper investigates whether compact quantum-inspired recurrent models can provide effective TM forecasts without relying on dedicated graph, transformer, or diffusion modules. We adapt gated quantum-inspired Kolmogorov-Arnold network fast-weight programmers (QKAN-FWPs) to direct multi-step Abilene TM forecasting, where each model predicts the next 20 five-minute frames of a 144-channel origin-destination (OD) matrix from a two-hour history. We benchmark three QKAN placement variants against a matched-size long short-term memory (LSTM) network, a larger LSTM, and a classical gated fast-weight programmer under a shared fixed-budget training protocol. Among the evaluated recurrent models, G-QKANFWP achieves the best pooled root-mean-square error (RMSE), while using only 22.4% of the larger LSTM. It also outperforms both the matched-size LSTM and the classical G-FWP baseline, indicating that the gain is not due to the gated fast-weight framework alone. Convergence and channel-wise analyses further show that the quantum-inspired variants obtain lower validation-loss area under the learning curve (AULC) than matched-size recurrent baselines, while G-QKANFWP and GQKAN-FWP achieve substantially more OD-channel wins. These results identify a classical slow programmer with a quantum-inspired fast programmer as a promising accuracy-efficiency design for resource-conscious network traffic-matrix forecasting.

1
Scaling Laws for Grid-Based Approximate Nearest Neighbor Search in High Dimensions

Grid-based approaches to approximate nearest neighbor (ANN) search have been absent from modern scaling analyses. We present a systematic characterization of a multiprobe grid algorithm with respect to dataset size N and dimensionality d. Our experiments reveal a previously unreported d-scaling crossover on the GloVe embedding family, in which multiprobe grid search maintains an approximately constant dimensional scaling exponent while other graph-, tree-, and partitioning-based methods exhibit degrading throughput. The advantage comes with near-linear query scaling in N, but also with lower indexing cost than competing ANN methods. Our results suggest that grid-based methods such as multiprobe grid may be competitive in rebuild-heavy or high-dimensional settings where indexing cost and dimensional robustness dictate performance. More broadly, recent work has formalized self-attention as an ANN operation. Thus, the N- and d-scaling properties of ANN algorithms may guide cost analysis of efficient transformer architectures. Code is available at: https://github.com/weiz345/MultiProbeANN.

0
05

PRODUCT HUNT

Product Hunt - July 4, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

CentryAI

Subscription tracker built by someone who forgot 11 of them

0
Termi Protocol

Watch your AI coding agents build, live in 3D

0
PhoneDeck

Turn your iPhone into a free Mac controller

0
Vida

Clone yourself. Let AI do the work before you ask

0
ChecklistFox

AI Checklist Maker - Beautiful PDFs, Free & Instant

0
Tamamon

A desktop pet that grows as you code with Claude Code

0
Glaze by Raycast

Create your own Mac apps by chatting with AI

0
Goals from Loops

Measure whether a campaign drove the desired outcome

0
Archify

understand software

0
nxt

Talk to your to do list and get what's next

0
Vox

Voice in, voice out — with GitHub Copilot

0
Fypro

Convert your TikTok followers into paying customers

0
Context.dev

One API to scrape, enrich, and extract the internet

0
Macro

Unifies your work into one app with shared memory

0
PieterPost MCP

Connect your AI agent to postal mail

0
Solaris

Your company’s AI adoption and upskilling platform

0
Macuse

Give Your AI Superpowers on macOS

0
scritty

Shared, searchable memory for every AI coding agent

0
Needle

The proactive GTM agent in Slack and Teams

0
PixFit

Turn 1 creative into every ad format, instantly

0
html.contact

A full form backend you can test before paying

0
Gaming Chat SDK by CometChat

Chat drops into Unreal like it was always there

0
Retrace

Debug AI agents by replaying and forking runs

0
Quick Sub 2: Video Subtitling

Quick, creative video subtitling with direct canvas control.

0
Banger Mail

Shared mailboxes for teams and AI agents

0
Basedash Actions

A BI tool that can take action for you

0
Flowly

A personal AI agent that runs on your desktop and iPhone

0
Sidedoor

Paste any job, find who in your network can refer you

0
Humalike

Give your AI agents the social intelligence they're missing

0
Acti

Agentic keyboard for mobile commands and search

0
Loot

Collect your favorite things in real life

0
Claude Sonnet 5

AI that plans, acts, and gets work done

0
Tabstack Browser Automation

Automate the web in your app or agent, no browser to host

0
RunInfra

Describe the AI model you need and get an optimized AI

0
Livinity

Open-source homeserver OS with a built-in AI agent

0
Fuser Apps

Vibecode apps, sites & games on everyone's favorite canvas

0
Mark by Airtop

Vibe automation for solo marketers

0
Claude Science

Your research partner for rigorous science

0
OASIS 1 Ring

Whisper to write and touch to edit

0
Wins 3.4

Snap, switch, and arrange Mac windows from the notch

0
LightTwist

Record & stream your show in a realistic virtual studio

0
Adam CAD Copilot

AI CAD inside Onshape and Fusion

0
Modelence Mobile Builder

Build mobile apps by chatting with AI

0
MailAdept by mailwarm

AI Agents & Email deliverability experts on your team

0
Sequence Agentic

Money movement for AI agents

0
Aruki

The Japanese walking method, coached on your iPhone

0
Saldor

Speed up procurement and AP.

0
N71

Give all your AI agents one shared context

0
Folderly Lens

Domain health analysis for high performance email campaigns

0
Clusy

AI notebook platform for modern data science

0
06

TECHMEME

Techmeme - July 4, 2026

Techmeme Digest: Major tech headlines and industry conversations.

Q&A with Doug Brooks, senior product manager of Apple silicon, about Mac minis becoming preferred AI agent machines, future of on-device AI, and more (Jason Hiner/The Deep View)
Source: Techmeme · Published: Jul 4, 2026

Jason Hiner / The Deep View: Walk into any of the frontier AI labs, and you'll find wall-to-wall Macs. Decisions Apple made years ago …

Official data shows Hong Kong accounted for 50%+ of China's $239B in chip imports in the first five months of 2026, a record share, up from ~33% a decade ago (Bloomberg)
Source: Techmeme · Published: Jul 4, 2026

Bloomberg: Hong Kong has become a vital conduit for high-tech products moving in and out of China, emerging as one node in a $2 trillion network …

A profile of Google DeepMind philosopher Iason Gabriel, whose work has tracked, and in many cases predicted, the ethical challenges posed by the success of LLMs (Robert P Baird/The Guardian)
Source: Techmeme · Published: Jul 4, 2026

Robert P Baird / The Guardian: Since 2017, Iason Gabriel has worked at the tech giant, trying to anticipate - and think through - the impact of AI.

Micron breaks ground on its ~$9.3B Hiroshima factory expansion, part of its global ramp-up to meet AI demand, and plans to start HBM shipments from summer 2028 (Mari Kiyohara/Bloomberg)
Source: Techmeme · Published: Jul 4, 2026

Mari Kiyohara / Bloomberg: Micron Technology Inc. on Saturday broke ground on the expansion of its factory in western Japan, a ¥1.5 trillion ($9.3 billion) …

Singapore-based dConstruct Robotics, which develops spatial tech to let autonomous robots operate in complex, GPS-denied environments, raised a $125M Series A (Duc Dao/TNGlobal)
Source: Techmeme · Published: Jul 4, 2026

Duc Dao / TNGlobal: Singapore-based dConstruct Technologies has closed a $125 million Series A funding round, marking the standout achievement …

Prague-based EquiLibre, which offers AI for quant hedge funds and is founded by three ex-Google DeepMind researchers, raised a Series A at a $500M valuation (Anna Heim/TechCrunch)
Source: Techmeme · Published: Jul 4, 2026

Anna Heim / TechCrunch: Three former DeepMind researchers who created an AI that beat humans at poker have now applied the same technology to trading stocks, and the bet appears to be paying off.

Montreal-based Stathera, a maker of MEMS-based silicon timing components for chips, raised a $55M Series B led by Maverick Silicon, taking total funding to $75M (Madison McLauchlan/BetaKit)
Source: Techmeme · Published: Jul 4, 2026

Madison McLauchlan / BetaKit: Stathera has raised $55 million USD ($78 million CAD) to boost production of its semiconductor clock technology amid an AI-fuelled push for compute power.

Midjourney wants Disney, Universal, and Warner Bros. to reveal in court how they use AI across their companies; studios sued Midjourney in 2025 for infringement (Gene Maddaus/Variety)
Source: Techmeme · Published: Jul 4, 2026

Gene Maddaus / Variety: The studios sued the AI image lab last year, accusing it of enabling massive infringement of their copyrighted characters.

A look at the quant fund frenzy in China, as assets under management have more than doubled to ~$384B in less than a year amid rapid AI adoption (Bloomberg)
Source: Techmeme · Published: Jul 3, 2026

Bloomberg: Quant funds in China have become so popular that they are being deluged with investors' money. Ubiquant, one of the top players …

Meta could use its compute for its own models, ad scaling, SpaceX-like neocloud deals, and hosting 3rd-party models; it may be close to an Anthropic deal (Jeremie Eliahou Ontiveros/SemiAnalysis)
Source: Techmeme · Published: Jul 3, 2026

Jeremie Eliahou Ontiveros / SemiAnalysis: Zuck Takes Plan B? SpaceX 2.0, Bedrock 2.0, MSL Isn't Giving Up, Scaling RecSys by 10x... ClusterMAX ranking coming soon?

Meta getting into the cloud business has been inevitable for a long time, as it seeks to diversify beyond ad revenue and monetize its AI buildout (M.G. Siegler/Spyglass)
Source: Techmeme · Published: Jul 3, 2026

M.G. Siegler / Spyglass: Their need to diversify the business meets the AI build out concerns... Meta has a problem. Well, two of them, actually.

Instagram has been running ads promoting child sexual abuse material in India, with terms like "rape video" and "child video" and linking to Telegram channels (Divya Arya/BBC)
Source: Techmeme · Published: Jul 3, 2026

Divya Arya / BBC: Warning: This story contains descriptions of abuse. Instagram has been running paid adverts promoting …

An interview with Sriram Krishnan, who says "there will not be an FDA for AI" under Trump, blames the AI backlash on the industry's "doomer" messaging, and more (Financial Times)
Source: Techmeme · Published: Jul 3, 2026

Financial Times: Sriram Krishnan tells the FT the president is against a centralised regulator as AI backlash grows

Texas Attorney General Ken Paxton opens an investigation into StubHub following complaints of last-minute ticket cancellations for FIFA World Cup 2026 matches (Giles Turner/Bloomberg)
Source: Techmeme · Published: Jul 3, 2026

Giles Turner / Bloomberg: Texas Attorney General Ken Paxton has opened an investigation into StubHub following reports that some fans who bought World Cup tickets through …

How Jeff Bezos' changing relationship with President Trump has helped Amazon and Blue Origin, such as more federal contracts awarded during Trump's second term (Wall Street Journal)
Source: Techmeme · Published: Jul 3, 2026

Wall Street Journal: Space company has booked rapid growth under this administration, after founder spent president's first term being ‘hated’

07

STARTUP ARCHIVE

Startup News - July 4, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction.”

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

“Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot.”

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice.”

Paul Graham on why starting with a “small, intense fire” is the key to startup growth
Source: Startup · Published: Mar 20, 2026

"You have to know who those first users are and how you're going to get them."

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth.”

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?”

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations”

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over.”

Spencer Rascoff: “I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

“If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive.”

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right”

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT

Solidot News - July 4, 2026

Solidot Feed: Highlighting essential tech & open-source news.

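Extreme heat stress is intensifying worldwide

A study published in Nature Climate Change shows that extreme heat stress is intensifying worldwide. Compared with the 1970s, about 1 billion more people now endure at least one day of "extreme heat stress" each year, meaning a day when the Universal Thermal Climate Index (UTCI) reaches 46°C or higher. The researchers analyzed a 75-year global heat-stress dataset covering 1950-2024, treating daytime, nighttime, and compound day-night periods separately. The conclusion is simple: in every period, heat stress has risen in frequency, intensity, and duration, and nights are worsening faster than days. Since the 1970s, averaged globally, UTCI on the ten hottest nights of each year has risen by 0.32°C per decade, while the ten hottest days have warmed somewhat more slowly, at 0.27°C per decade. The urban heat island effect is one cause, but humidity matters more. Cloud cover and atmospheric water vapor trap the surface's nighttime radiative cooling, and with more calm, windless days, nights that "cannot cool down" are becoming more common. Geographically, the subtropics are hit first: in southern North America, southern Europe, the northern and southern ends of Africa, and South America, the number of days per year with UTCI ≥32°C (strong) and ≥46°C (extreme) has increased by about 50 compared with the 1970s. In other words, some subtropical cities now spend close to half the year at or above the strong heat stress threshold. In southern European countries such as Spain, Portugal, Italy, and France, the felt temperature is now 5°C higher than in the 1970s.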

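Modern life may be mismatched with the human brain

The human brain evolved for a world of familiar faces, immediate threats, and small social groups. Today, however, the world is changing far faster than humans can adapt through evolution. This mismatch may help explain the stress, loneliness, and constant social comparison that people experience. In a paper published in Behavioral Sciences, researchers in Singapore examine how stress, competition, and loneliness can be understood from an evolutionary perspective. Responses that worked well in small groups of familiar people feel out of place in modern life and can even become overwhelming. The mismatch is especially pronounced in the age of social media. Co-author Dr. Jose Yong says competition is nothing new, but modern life has made it ubiquitous. An evolutionary view may explain why people react so strongly to comparison and the fear of falling behind, even when the signals come not from a small group but from strangers or screens.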

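NASA launches spacecraft to rescue the falling Swift observatory

The Neil Gehrels Swift Observatory, launched in 2004, was designed to monitor gamma-ray bursts across the universe. It originally orbited at an altitude of about 600 km, but atmospheric drag has lowered its orbit to about 400 km. Solar activity during the 2024 solar maximum caused Earth's atmosphere to expand, accelerating the decay. Without a rescue, Swift would make an uncontrolled reentry by the end of 2026. The observatory has no propulsion system of its own. Last September, NASA awarded Katalyst Space Technologies a $30 million contract to develop and launch LINK, a spacecraft that will dock with Swift and raise its orbit. LINK launched successfully on July 3. Over the coming months it will try to grab Swift with three robotic arms, fire its thrusters, and return the observatory to a safe orbit at 600 km. It is an ambitious mission, and if it succeeds, the next rescue target could be the Hubble Space Telescope.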

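Queen bees pass their pesticide burden on to their eggs

Honeybees are important pollinators, but their numbers have declined around the world, possibly because of pesticides. A study published in Current Biology tracked the flow of pesticides through a small colony under laboratory conditions. It found that worker bees initially reduced pesticide levels in food by 95% through food filtering and deposition in the hive, but by day 10 filtering efficiency had fallen to 86%. Queens carried significantly lower pesticide levels than workers, but over time they accumulated pesticides in their ovaries and transferred them to developing eggs. The presence of a queen also changed the distribution of chemicals across the colony: pesticide exposure became concentrated in the workers, and more pesticide was deposited in the beeswax.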

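Alibaba bans employees from using Claude Code

Citing concerns about backdoors, Alibaba has banned employees from using Anthropic's Claude Code at work and requires them to use its in-house coding platform, Qoder. Although Anthropic restricts access for Chinese users and entities, Claude Code remains very popular among programmers in China. Last month Anthropic accused Alibaba of distilling its models, and a few days ago Claude Code was found to contain code that detects whether a user is from China.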

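Valve open-sources the e-ink display for the Steam Machine

Valve will not ship a front-facing e-ink display with the Steam Machine console, but it has open-sourced the related technology so that anyone can fit one to a Steam Machine themselves. The files are published on GitLab under the MIT license. Valve calls the project Inkterfac. Users need to prepare:
1 x Adafruit ESP32 Feather with 2MB PSRAM
1 x Adafruit eInk Breakout Friend
1 x Adafruit 5.83" Monochrome eInk Panel
13 x M2.5 x 5mm Pan Head Machine Screws
4 x 1/4" x 1/4" x 3/16" Stepped Magnet SB443-OUT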

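Alibaba to pay $600 million in settlement with the US

The US Department of Justice said that Alibaba Group and a subsidiary of Ant Group will pay $600 million for failing to stop merchants from importing and selling illegal drugs, and that a settlement agreement has been reached. The department will not prosecute the two companies. From January 2016 to December 2024, Alibaba failed to prevent merchants from using the e-commerce platforms it operates to import into and sell in the United States illegal chemicals, drugs, and equipment for making counterfeit pills. Ant subsidiary AUS provided payment services from January 2020 to December 2023. The illegal sales amounted to 80,000 transactions worth more than $200 million in total. Alibaba will forfeit $200 million and AUS will forfeit $190 million. In addition, Alibaba will pay a $125 million fine and AUS an $85 million fine.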

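Japan's Supreme Court rules that AI cannot be named as the inventor on a patent application

Japan's Supreme Court has dismissed an appeal by an American engineer who sought to have an AI listed as the inventor on a patent application. The court upheld the lower courts' rulings that, under the Patent Act, the inventor on a patent application must be a "natural person." The plaintiff filed a patent application in 2020 naming DABUS, an AI the plaintiff created, as the inventor. The patent office asked the plaintiff to provide the name of a natural person as the inventor. The plaintiff refused, and the application was rejected. The Supreme Court said the Patent Act did not anticipate the rapid development of AI, and that the question of whether patent rights should be granted for AI inventions "needs to be discussed in light of its impact on society."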

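Sugar Substitutes Disrupt Gut Health and Metabolism

Sugar substitutes are now ubiquitous in food, and a growing body of research suggests that artificial and non-nutritive sweeteners may disrupt metabolism. In a review and meta-analysis published in the journal Current Atherosclerosis Reports, researchers found that, compared with non-caloric controls such as water or placebo, artificial and low-calorie sweeteners led to higher fasting insulin levels and higher glycated hemoglobin (HbA1c, a marker of long-term blood sugar control), pointing to a trend toward worse insulin sensitivity. The researchers said one explanation involves the gut microbiota. Non-nutritive sweeteners come into direct contact with these microbes as they pass through the gut, and studies show they alter the composition and function of the gut microbiota.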

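Apple Seeks to Buy Memory from CXMT and YMTC

The memory price crisis is pushing U.S. hardware makers to try to strike deals with Chinese memory manufacturers against the wishes of their government. Apple is negotiating memory purchases from ChangXin Memory Technologies (CXMT) and Yangtze Memory Technologies (YMTC) to ease the impact of the global memory shortage. The Pentagon has placed both Chinese companies on its 1260H list, a blacklist that does not by itself legally prohibit transactions. If they were also added to the U.S. Commerce Department's Entity List, however, U.S. companies would face restrictions on dealing with them, and that is what Apple is seeking to prevent. Apple may try to limit the fallout by using Chinese memory chips only in Apple devices sold in China.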

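Google's Electricity Use Grew 37% in 2025

In its latest sustainability report, Google acknowledged that its electricity use has grown by more than 250% since 2019, rising a further 37% in 2025 on top of a 27% increase in 2024. Google attributes the increase to continued growth in Google Cloud, YouTube video streaming, and the building and operation of AI infrastructure. Google's data centers consumed more than 42 million MWh of electricity in 2025, up from 30.6 million MWh in 2024. That puts the energy consumption of Google's data centers on a par with the national electricity consumption of countries such as New Zealand, Denmark, and Nigeria.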
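The reported figures can be cross-checked with a few lines of arithmetic. This is a rough sketch using only the numbers quoted in this item. Treating "more than 42 million MWh" as exactly 42.0 and "more than 250%" as exactly 3.5x are simplifying assumptions, so the implied 2019 and 2023 values are approximations, not reported data.

```python
# Data center electricity use in millions of MWh, as quoted in the item above.
USE_2024 = 30.6
USE_2025 = 42.0  # assumption: "more than 42 million" rounded down to 42.0

# Year-over-year growth for 2025 implied by the two data center figures.
growth_2025 = (USE_2025 - USE_2024) / USE_2024

# Growth of 250% means 3.5x the starting level, so this is the
# approximate 2019 level implied by the 2025 figure (assumption: exactly 250%).
implied_2019 = USE_2025 / 3.5

# A 27% rise in 2024 implies this approximate 2023 level.
implied_2023 = USE_2024 / 1.27

print(f"2025 growth: {growth_2025:.1%}")
print(f"implied 2019 use: {implied_2019:.1f} million MWh")
print(f"implied 2023 use: {implied_2023:.1f} million MWh")
```

The data center figures imply growth of about 37% for 2025, which agrees with the headline percentage.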

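OpenAI in Talks to Give a 5% Stake to the U.S. Government

As AI companies try to ease relations with the Trump administration, OpenAI is in talks to donate 5% of its equity to the U.S. government. OpenAI CEO Sam Altman believes that giving the American public a stake in the company is the best way to share the benefits of AI. The proposal also suggests that other U.S. AI companies donate similar stakes to the government. It is not yet clear whether companies such as Anthropic, Google, and Meta would agree to the plan. OpenAI executives have suggested that U.S. AI companies donate 5% of their equity to a sovereign fund, the Alaska Permanent Fund. The talks are "conceptual" and at an early stage, and any agreement might require Congress to pass legislation before it could take effect.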

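Orbital Data Centers: Hype and Reality

SpaceX founder Elon Musk declared at the World Economic Forum in Davos this January that orbital data centers would be a reality within three years at the latest. SpaceX then filed an application with the FCC to launch 1 million satellites to build an orbital data center constellation. Musk has a habit of overstating things: he said fully self-driving cars would arrive in 2017, that a crewed Mars mission would happen in 2024, and that 10,000 Optimus humanoid robots would be built by the end of 2025. About 14,500 satellites are currently in Earth orbit, two thirds of them in the Starlink constellation. To deploy 1 million satellites, SpaceX would need a large increase in both its launch cadence and its satellite manufacturing capacity. SpaceX's next-generation rocket, Starship, can carry 60 satellites to orbit, so 1 million satellites would take more than 16,666 launches. SpaceX set a record of 165 orbital launches in 2025. Even at 10 times that cadence, the launches would take a decade to complete. Starlink satellites are built at a rate of 4,000 per year, so unless satellite manufacturing is revolutionized, building 1 million of them would take about 250 years. An orbital data center constellation remains a very distant prospect, and that is before considering the enormous radiators such data centers would need, or radiation, maintenance, and orbital debris. So why is SpaceX promoting orbital data centers so heavily? For the money. Dina Genkina of IEEE Spectrum said Musk is close to a genius at paying himself: xAI builds the data centers, SpaceX launches them into space, and Tesla makes the solar panels, so he is in effect writing his own paycheck.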
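The launch arithmetic in this item can be reproduced in a few lines. This is a back-of-the-envelope sketch that uses only the figures quoted above (1 million satellites, 60 per Starship, 165 launches in 2025, a tenfold cadence increase). The constant names are illustrative, and it assumes every launch carries a full load and the cadence holds steady.

```python
import math

# Figures quoted in the item above; constant names are illustrative.
TARGET_SATELLITES = 1_000_000
SATS_PER_STARSHIP = 60
LAUNCHES_2025 = 165
CADENCE_MULTIPLIER = 10  # the hypothetical tenfold increase discussed above

# Launches needed, rounded up because a partial load still needs a launch.
launches_needed = math.ceil(TARGET_SATELLITES / SATS_PER_STARSHIP)

# Launches per year at ten times the 2025 record.
launches_per_year = LAUNCHES_2025 * CADENCE_MULTIPLIER

# Years needed to fly every launch at that steady cadence.
years_to_launch = launches_needed / launches_per_year

print(launches_needed)             # 16667
print(launches_per_year)           # 1650
print(f"{years_to_launch:.1f}")    # 10.1
```

Rounding up gives 16,667 launches, and at 1,650 launches per year that is a little over ten years, in line with the decade cited above.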

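DGX Spark Hackathon Online Bootcamp: 4 Hours of Hands-On Material, from Environment Setup to Embodied AI, Walking You Through Building a Working Agent

The NVIDIA DGX Spark hackathon filled up as soon as registration opened, but alongside the competition there is a technical livestream better suited to anyone who wants to sample it first and then decide whether to compete.
Livestream time: July 12, 10:00 - 12:00
Bootcamp content:
1. Hackathon rules: an explanation of the competition format, judging criteria, and submission process, to help teams set a direction and prepare efficiently.
2. Best practices for building a local Agent Team on DGX Spark and Step 3.7: from environment setup to model inference, how to put the Stepfun3.7 model to work efficiently on DGX Spark.
3. One-click video with an Agent: building a local visual generation agent on DGX Spark. Demonstrates how to build a local Agent with visual understanding and content generation, covering the full path from prompt to finished video.
4. From local AI to embodied AI: building a desktop robot Agent development platform on DGX Spark. Explores how Agents move from software into the physical world and shows DGX Spark development practice in embodied AI scenarios.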

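Sedentary People Show a Marked Decline in Muscle Mitochondrial Function

Researchers found that healthy but sedentary people show a significant and consistent decline in muscle mitochondrial function, which may be a precursor to major disease. Senior author Iñigo San Millan said mitochondrial function is central to metabolic health: if you are 40, healthy, but sedentary, your cells have very likely already developed problems that could cause you trouble in 10 to 15 years. The study covered 9 sedentary men and 10 men who exercised regularly, all aged about 42. The researchers analyzed muscle biopsies to see how efficiently mitochondria burned fuel, and ran exercise tests to measure participants' fitness, fat-burning capacity, and blood lactate levels, a key indicator of how hard the body is working to produce energy. Compared with the men who exercised regularly, the sedentary men had mitochondrial efficiency that was 28% to 36% lower across several categories. Levels of MPC1, a key protein for converting sugar into usable energy, were 49% lower, and activity of the CPT1 enzyme, which transports fat into mitochondria, was about half. Maximal oxygen uptake (VO2max) was 38% lower, and blood lactate during exercise was 60% higher.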

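Android Malware from Google

F-Droid, the free-software Android app store, warns that over the past few months Google has pushed what it describes as malware, called Android Developer Verifier (ADV), to as many as 4 billion Android devices. It runs covertly in the background as a system service with full root privileges, quietly waiting for an activation signal from Google. The ADV service cannot be blocked, disabled, or removed. Once activated, its sole purpose is to stop users from running apps from developers that Google has not approved. Google is pushing through its Android developer verification program in the name of security. According to the Android Developer Console terms of service, if a developer "violates any of the terms, or distributes malicious or otherwise harmful apps, Google may terminate your access to ADC…". Google does not define a malicious or harmful app, which means Google decides whether an app qualifies, and as the largest advertising company it may well regard ad blockers as malicious. Google expects to begin activating ADV gradually from September 30.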

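More and More Children Are Using AI

Based on new data from 10 countries, UNICEF estimates that at least 20 million children have used artificial intelligence, and that adoption among teenagers is growing more than three times as fast as among adults. Most strikingly, an estimated 2 million children, about one in ten, said they turn to AI for advice about their worries, and another 13 million said they use AI to help with schoolwork and homework. UNICEF said: "Artificial intelligence has arrived. It is increasingly part of our lives, and it is already shaping children's upbringing around the world, for better or worse." Although AI offers new opportunities for learning and creativity, UNICEF warned that evidence about its effects on children's development, emotional health, and exposure to harm is only beginning to emerge. The agency said: "In effect, this generation is growing up in a global experiment." It urged governments and technology companies to put children's rights at the center of AI regulation.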

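There May Be 20 Million Insect Species Worldwide

Scientists have long debated the exact number of insect species, with the figure previously put at about 6 million. Over the past three centuries entomologists have described about 1 million insect species, but finding and describing all of them is a daunting and perhaps impossible task. To estimate insect diversity more accurately, researchers studied years of insect survey data from Guanacaste National Park in Costa Rica and applied statistical methods borrowed from epidemiology. They then used another highly diverse group of organisms, trees, to extrapolate the figure to the global scale. If insect diversity follows the same ratio, Earth holds roughly 13.3 million to 24.7 million insect species, with 20.3 million as a reasonable central estimate. The researchers said their estimate is conservative, which means millions of insect species may still be undiscovered.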

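Cloudflare Pushes AI Companies to Pay for Content

Cloudflare announced new controls that give content publishers more say over how AI companies access and use their content. From September 15, new Cloudflare sites will allow indexing by traditional search engines but will by default block AI training bots and AI agents from accessing ad-supported pages. Cloudflare is also expanding its monetization efforts with a Pay-Per-Use model, which aims to compensate publishers when their content contributes to AI-generated answers, rather than only when it is crawled. Cloudflare argues that publishers should not be forced to choose between increasing their online visibility and handing content to AI systems for free.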

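Scientists Build a Cell from Non-Living Components for the First Time

Synthetic biologists at the University of Minnesota have, for the first time, loaded non-living components one by one into a cell-like membrane and watched the molecular sac begin to show lifelike behavior. The synthetic cell can grow, replicate its DNA, and divide, demonstrating the basic functions of the cell cycle. By any definition, the cell is not alive. It depends on a constant supply of nutrients and of ribosomes, the molecular machines that synthesize proteins. It has no defense mechanisms and no complete waste-disposal system. But it is the strongest evidence so far that creating life from non-living matter is possible, a goal synthetic biologists have pursued for decades. About 4 billion years ago, non-living molecules came together to form the first protocells. They absorbed nutrients, grew, and divided. Over time these cells evolved and differentiated into distinct types, filling a once barren world with all manner of strange creatures. Scientists still disagree about how the transition from non-life to life took place, and some have begun trying to reproduce it in the laboratory.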

09

APP STORE RANK

09.00
APP STORE RANK