TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0915
FRI, JUL 3, 2026
OrangeBot.AI intelligently curates and filters daily tech trends and news to save you time.

The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

New: we have launched a Chrome extension for saving tweets and Reddit posts. Click to install.
01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK

AI News Summary

July 3, 2026

Here is a summary of today's main news events:

Weak Jobs Report Boosts Markets

Weaker-than-expected U.S. jobs data fueled hopes that the Federal Reserve will hold off on raising interest rates. In response, global stocks rose and the U.S. dollar weakened as investors shifted to a more risk-on sentiment.

Mixed Signals Emerge in the AI Sector

The artificial intelligence industry is showing signs of both doubt and major investment. While concerns about the long-term profitability of AI caused volatility in tech stocks, Kuaishou Technology's AI video unit raised $2.8 billion from investors, and U.S. company Anthropic moved to block Chinese firms from using its technology.

Global Food Prices Decline, Says UN

The United Nations’ Food and Agriculture Organization reported that global prices for key food commodities, including cereals, sugar, and dairy products, have fallen. However, the agency warned that an intense El Niño weather pattern adds uncertainty to future production outlooks.

China Protests Japan-Philippines Maritime Talks

Beijing condemned ongoing talks between Japan and the Philippines concerning their maritime boundaries, arguing the discussions violate international law. The protest comes as China continues to increase military and political pressure on Taiwan.

Gold Prices Climb on Rate Hike Doubts

Gold prices rose for a third consecutive session as the disappointing U.S. jobs data eased investor concerns about further interest rate hikes from the Federal Reserve. Oil prices, however, remained stable, signaling an ample near-term supply.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS


Hacker News - July 3, 2026

Hacker News Feed: Highlighting key posts and discussions.

CarPlay Is Additive

(www.caseyliss.com)

415560
Right to Local Intelligence

(righttointelligence.org)

350120
An American Privacy Emergency

(scottaaronson.blog)

358106
Immich 3.0

(github.com)

510249
Exapunks (2018)

(www.zachtronics.com)

307106
Podman v6.0.0

(blog.podman.io)

582229
03

HUGGINGFACE


HuggingFace News - July 3, 2026

HuggingFace Feed: The latest AI models, datasets, and community updates.

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Many everyday programming tasks resist clean rule-based implementation, such as alerting on important log lines, repairing malformed JSON, or ranking search results by intent, and are increasingly outsourced to large language model APIs at the cost of locality, reproducibility, and price. We propose fuzzy-function programming: compiling such a function from a natural-language specification into a compact, locally-executable neural artifact. We instantiate this paradigm with Program-as-Weights (PAW), in which a 4B compiler trained on FuzzyBench, a 10M-example dataset we release, emits parameter-efficient adapters for a frozen, lightweight interpreter. A 0.6B Qwen3 interpreter executing PAW programs matches the performance of direct prompting of Qwen3-32B, while using roughly one fiftieth of the inference memory and running at 30 tokens/s on a MacBook M3. PAW reframes the foundation model from a per-input problem solver into a tool builder: invoked once per function definition, it produces a small reusable artifact whose subsequent calls per function application are cheap and offline.

39
AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents

Memory for a long-horizon LLM agent is a contract about what each future decision is allowed to see. The simplest contract appends past observations, tool calls, and reflections to every prompt, which makes prior context easy to access but also turns it into a jumbled mixture in which the effect of any single memory component is hard to isolate. We introduce and instrument an alternative bounded contract: every decision is made from a fresh user message assembled by typed retrieval, with no raw cross-decision transcript appended. The prompt thus stays bounded across runs of any length, and any single layer can be ablated in isolation. We instantiate the contract in Slay the Spire 2, a closed-rule stochastic deck-building game whose runs require hundreds of tactical and strategic decisions. A public online benchmark of frontier LLMs on the same game reports zero wins at the lowest difficulty across five configurations, and the developer-reported human win rate at the same difficulty is 16%; the task is hard but not saturated. Within our harness, a fixed-A0 ablation shows the largest observed difference when triggered strategic skills are enabled: the no-store baseline wins 3/10 games, and adding the skill layer raises this to 6/10. At this sample size the comparison is directional rather than statistically decisive (Fisher exact p ≈ 0.37); a cross-backbone probe and public accumulating-context baselines are reported as operational comparisons rather than controlled tests of the contract variable itself. We release a reproducible testbed: 298 completed trajectories with condition tags, frozen memory/skill snapshots, prompt records, and analysis scripts, which together provide an agent design and a validated, reusable methodology for studying how explicit memory layers shape long-horizon LLM-agent decisions.
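
The reported p-value can be checked by hand. The sketch below computes a two-sided Fisher exact test for the 3/10 versus 6/10 comparison using only the Python standard library. The function name and table layout are illustrative choices, not taken from the paper's analysis scripts.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - total), min(row1, col1)

    def ways(k):
        # Number of tables with k in the top-left cell and the same margins.
        return comb(col1, k) * comb(total - col1, row1 - k)

    observed = ways(a)
    # Sum every table that is no more likely than the observed one.
    extreme = sum(ways(k) for k in range(lo, hi + 1) if ways(k) <= observed)
    return extreme / comb(total, row1)

# Rows: no-store baseline, baseline plus skill layer. Columns: wins, losses.
p_value = fisher_exact_two_sided(3, 7, 6, 4)
print(round(p_value, 2))  # 0.37
```

Comparing integer table counts rather than floating-point probabilities avoids rounding issues when two tables are equally likely.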

37
EvoPolicyGym: Evaluating Autonomous Policy Evolution in Interactive Environments

Autonomous agents are increasingly expected to improve executable policies through feedback, yet existing evaluations often collapse this process into a final score or confound it with open-ended software-engineering progress. We introduce Autonomous Policy Evolution, a controlled evaluation setting in which a harness-model agent repeatedly edits an executable policy system under a fixed interaction budget. We instantiate this setting in EvoPolicyGym, a benchmark built from compact interactive RL environments that evaluates how agents iteratively improve explored policies. On the EvoPolicyGym suite, GPT-5.5 achieves the strongest aggregate rank score and top-two performance on all 16 environments. Beyond leaderboard results, EvoPolicyGym also provides trajectory-level diagnostics that distinguish how agents allocate budget and convert feedback into parametric tuning. These analyses show that strong autonomous policy evolution depends not only on isolated task wins, but on discovering task-appropriate mechanisms and refining policies under bounded feedback.

36
Morphing into Hybrid Attention Models

Hybrid attention models improve long-context efficiency by retaining only a subset of full-attention layers and replacing the remaining layers with linear attention. However, the effectiveness of Transformer-to-hybrid conversion critically depends on which layers preserve full attention. Existing hybrid layer selection methods typically rely on heuristic strategies such as fixed placement patterns or layerwise scoring, implicitly treating layer importance as isolated and overlooking the interdependent layer effect under a global hybrid configuration. In this work, we formulate hybrid layer selection as a budget-constrained subset optimization problem. We further propose FlashMorph (Fast LAyer Selection for Hybrid MORPHing), an effective, efficient and scalable layer selection method for Transformer-to-hybrid conversion. FlashMorph first constructs a morphable model by equipping each full-attention layer with a converted linear-attention branch. It then freezes all model weights and jointly optimizes layerwise gates on synthetic long-context retrieval data, with a linearization regularization that encourages the model to rely on linear attention for efficiency. The learned gates are discretized under a preset full-attention budget to instantiate the hybrid architecture, followed by standard logits distillation and long-context finetuning. Extensive experiments show that FlashMorph discovers more effective hybrid configurations, preserves strong long-context recall and general benchmark performance while substantially reducing layer selection cost compared with existing layer selection methods, demonstrating its effectiveness, efficiency, and scalability.
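
The discretization step can be pictured as a top-k selection over the learned gates. The sketch below is a simplified illustration only: the gate values are hypothetical, and it assumes that a higher gate means the layer relies more on full attention. The abstract does not specify the exact rule FlashMorph uses.

```python
def discretize_gates(gates, full_attention_budget):
    """Return a per-layer list: True keeps full attention, False uses linear attention."""
    # Rank layers by gate value, highest first (assumed to mean "needs full attention").
    ranked = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)
    keep = set(ranked[:full_attention_budget])
    return [i in keep for i in range(len(gates))]

# Hypothetical learned gate values for an 8-layer model.
learned_gates = [0.91, 0.12, 0.40, 0.85, 0.05, 0.33, 0.77, 0.20]
layout = discretize_gates(learned_gates, full_attention_budget=3)
print(layout)  # [True, False, False, True, False, False, True, False]
```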

27
AgenticDataBench: A Comprehensive Benchmark for Data Agents

Data science aims to derive actionable insights from heterogeneous raw data, unlocking the value of the massive amounts of data generated in modern society. Automating this process is essential to reducing labor-intensive efforts for data scientists and enabling scalable data-driven applications. Recently, large language model (LLM)-based data agents have emerged as a promising solution to automate data science workflows. However, the field lacks comprehensive benchmarks to rigorously evaluate these agents across diverse scenarios at fine granularity. To address this gap, we propose AgenticDataBench, a comprehensive benchmark featuring realistic tasks spanning diverse domains with fine-grained ground-truth labels. This enables evaluations to capture the diversity and complexity of data science workflows and the detailed performance of agents. First, to cover diverse domains, we collect real datasets and tasks from 15 vertical domains, including 5 real-world B2B use cases from a leading fintech company. Second, to remove redundancy in real-world tasks and generate high-quality tasks for domains lacking real data, we introduce data science skills, recurring data-centric operational patterns, and quantify benchmark coverage by the number of skills included. Representative skills are extracted from large-scale task solutions on Stack Overflow using skill-aligned hierarchical clustering. Third, for real-world business tasks, we select task-solution pairs that maximize diversity in skill composition, ensuring broad coverage of practical scenarios. Fourth, to generate realistic tasks for diverse domains without real tasks, we propose a systematic LLM-based task generation approach to create workflows and tasks based on these skills. Finally, we evaluate state-of-the-art data agents using our annotated benchmark and open-sourced testbed, providing detailed skill-level insights.

18
Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling

Hardware-agnostic strategies for accelerating text-to-image diffusion, such as timestep distillation and feature caching, can reduce inference time without custom kernels or system-level optimization. Among them, multi-resolution generation strategies have recently received broad attention, attaining more than 5x speedup without any training. However, the design of performing upsampling in the latent space, together with the selective modification of partial regions, causes these methods to exhibit noticeable blurring or artifacts. To this end, we propose MrFlow, a training-free multi-resolution acceleration strategy for pretrained flow-matching models built upon a staged low-to-high-resolution pipeline. MrFlow first rapidly generates the main structure at low resolution, then performs super-resolution in the pixel space using a lightweight pretrained GAN-based model, subsequently injects low-strength noise to enable high-frequency resampling, and finally refines the details at high resolution. Quantitative and qualitative results on FLUX.1-dev and Qwen-Image show that MrFlow exploits the quadratic token reduction and reduced step requirement of low-resolution sampling to achieve 10x end-to-end acceleration while keeping OneIG within a 1% gap relative to that before acceleration, significantly surpassing other training-free acceleration strategies, and requiring no training or runtime dynamic identification whatsoever. MrFlow can further be directly combined orthogonally with pre-trained timestep distillation strategies, achieving even higher generation acceleration of up to 25x.
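
The "quadratic token reduction" of low-resolution sampling is easy to see with a little arithmetic. The sketch below assumes each token covers a 16x16 pixel patch. That patch size is an illustrative assumption, not a figure from the abstract.

```python
def token_count(height, width, pixels_per_token=16):
    """Tokens for an image, assuming each token covers a square pixel patch."""
    return (height // pixels_per_token) * (width // pixels_per_token)

high = token_count(1024, 1024)  # 64 * 64 tokens
low = token_count(512, 512)     # 32 * 32 tokens

# Halving each side quarters the token count.
token_ratio = high / low
# Pairwise self-attention cost grows with the square of the token count.
attention_ratio = (high ** 2) / (low ** 2)
print(high, low, token_ratio, attention_ratio)  # 4096 1024 4.0 16.0
```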

17
WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory

We present WorldDirector, a highly controllable video world model framework designed for persistent dynamic object memory and unrestricted viewpoint exploration. Unlike existing world models that entangle physical dynamics with pixel rendering and rely on continuous visual observation to sustain motion, our framework explicitly decouples semantic motion orchestration from visual generation. By leveraging an LLM to coordinate 3D trajectories with camera movements and subsequently employing these orchestrated trajectories as control signals for video generation, our approach ensures strict physical logic and appearance stability, successfully preserving the exact visual identities of dynamic entities even when they re-enter the scene after prolonged periods out of view. Experimental results demonstrate that our method supports the synthesis of complex and extended events with unprecedented controllability and persistent dynamic object memory. Project Page: https://worlddirector.github.io/

14
Breaking Failure Cascades: Step-Aware Reinforcement Learning for Medical Multimodal Reasoning

Recent multimodal large language models have shown great promise in clinical image reasoning, but existing post-training pipelines remain predominantly outcome-centric, relying on final answer correctness or sequence-level preferences. This suffers from sparse credit assignment, making it difficult to optimize the reasoning process essential for clinical applications. Our analysis reveals that cascading errors from early-stage reasoning failures are a leading cause of incorrect predictions in medical visual question answering (VQA) benchmarks. Motivated by this, we propose Medical Reasoning-aware Policy Optimization (MRPO), an RL algorithm that incorporates step-wise process rewards. When the final answer is incorrect, MRPO assigns exponentially larger penalties to tokens in earlier invalid reasoning steps, breaking failure cascades without compromising successful paths. Across three multimodal LLM backbones, MRPO consistently outperforms standard GRPO and a recent RL baseline, and on Qwen3-VL-8B-Instruct even surpasses substantially larger medical MLLMs such as HuatuoGPT-Vision-34B by 2.79 points. Moreover, MRPO reduces early-stage reasoning failures from 64.0% to 13.0%, showing that targeted mitigation of cascading failures improves both reasoning quality and final answer accuracy. Our code is available at https://github.com/dmis-lab/MRPO
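
One way to picture "exponentially larger penalties for earlier invalid steps" is the toy weighting below. The `base` and `growth` parameters and the exact formula are hypothetical illustrations. The abstract does not give MRPO's actual reward definition.

```python
def step_penalties(step_is_valid, answer_is_correct, base=1.0, growth=2.0):
    """Per-step penalties; earlier invalid steps are penalized exponentially more."""
    if answer_is_correct:
        # Successful trajectories are left untouched.
        return [0.0] * len(step_is_valid)
    last = len(step_is_valid) - 1
    return [
        0.0 if valid else -base * growth ** (last - index)
        for index, valid in enumerate(step_is_valid)
    ]

# Steps 2 and 3 (indices 1 and 2) are invalid and the final answer is wrong.
print(step_penalties([True, False, False, True], answer_is_correct=False))
# [0.0, -4.0, -2.0, 0.0]
```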

12
Optimizing Visual Generative Models via Distribution-wise Rewards

Conventional reinforcement learning strategies for visual generation typically employ sample-wise reward functions, yet this practice frequently results in reward hacking that degrades image diversity and introduces visual anomalies. To address these limitations, we present a novel framework that finetunes generative models using distribution-wise rewards, ensuring better alignment with real-world data distributions. Unlike rewards that evaluate samples individually, distribution-wise reward accounts for the data distribution of the samples, mitigating the mode collapse problem that occurs when all samples optimize towards the same direction independently. To overcome the prohibitive computational cost of estimating these rewards, we introduce a subset-replace strategy that efficiently provides reward signals by updating only a small subset of a generated reference set. Additionally, we apply RL to optimize post-hoc model merging coefficients, potentially mitigating the train-inference inconsistency caused by introducing stochastic differential equation (SDE) in regular RL practices. Extensive experiments show our approach significantly improves FID-50K across various base models, from 8.30 to 5.77 for SiT and from 3.74 to 3.52 for EDM2. Qualitative evaluation also confirms that our method enhances perceptual quality while preserving sample diversity.

11
AGVBench: A Reliability-Oriented Benchmark of Data Augmentation for Vein Recognition

Vein recognition is a secure biometric technology often constrained by limited annotated data and imaging variations. While data augmentation mitigates this, strategies designed for natural images may disrupt the fine-grained topology and textures essential for identity discrimination. We present AGVBench, which evaluates 30 representative augmentation strategies on five public palm- and finger-vein datasets with seven backbone architectures, covering classic CNNs, vision transformers, and vein-specific recognition models. Our results show that multi-image mixing methods (e.g., MixUp, PuzzleMix, StarMixup) generally provide the strongest recognition performance. However, they are often poorly calibrated and vulnerable to adversarial perturbations, revealing a clear inconsistency between clean accuracy and adversarial security. We also find that severe geometric transformations frequently degrade recognition, which is potentially due to feature misalignment or spatial cropping, and that augmentation effectiveness varies across palm and finger vein datasets. These findings prove that accuracy-centric evaluation is insufficient for biometric augmentation. AGVBench provides standardized protocols to support reproducible research and guide the design of reliable, secure, and robust vein recognition systems. Our codebase is available at https://github.com/Advance-VeinTech-Innovators/AGVBench.

9
From SRA to Self-Flow: Data Augmentation or Self-Supervision?

Representation alignment has become an effective way to accelerate diffusion transformer training and improve generation quality. Recent self-alignment methods, such as SRA and Self-Flow, further remove the dependency on external pretrained encoders by constructing alignment within the diffusion model itself. However, the mechanism behind the improvement from SRA to Self-Flow, dual-time scheduling, remains under-examined: Self-Flow attributes its gain to interactions between tokens at different noise levels, where cleaner tokens help infer noisier ones. In this work, we revisit this explanation and ask whether the gain instead comes from data augmentation along the noise dimension. To disentangle these factors, we introduce Attention Separation, which preserves the same dual-timestep input as Self-Flow while blocking attention between tokens assigned to different noise levels. Surprisingly, removing such interaction does not degrade performance and can even improve it, suggesting that the improvement from SRA to Self-Flow mainly comes from data augmentation. Furthermore, we show that Attention Separation itself provides an augmentation effect by splitting a single image into multiple effective training parts to expand the training data. Based on these observations, we combine self-representation alignment with dual-timestep and attention-separation augmentation, and demonstrate the effectiveness of this design on ImageNet.
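
Blocking attention between tokens at different noise levels amounts to a block-structured attention mask. The sketch below builds such a mask from a list of per-token noise levels. The function name and the example levels are hypothetical, and the paper's implementation may differ.

```python
def separation_mask(noise_levels):
    """mask[i][j] is True when token i may attend to token j (same noise level)."""
    return [[a == b for b in noise_levels] for a in noise_levels]

# Four tokens: the first two share one timestep, the last two share another.
levels = [0.2, 0.2, 0.9, 0.9]
mask = separation_mask(levels)
for row in mask:
    print(["x" if allowed else "." for allowed in row])
```

Each noise level then forms its own attention island, so the model still sees the dual-timestep input but no information flows between the cleaner and noisier groups.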

9
SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use

Skills are becoming a reusable operational layer for LLM agents, encoding SOPs, domain rules, tool workflows, scripts, and validation routines. In realistic skill repositories, overlapping skills make reliable skill-use difficult. Final verifier success is too coarse for both evaluation and training, since an agent may pass through trial and error while selecting distractor skills, skipping required steps, composing workflows incorrectly or omitting final checks. We introduce SkillCoach, a self-evolving rubric framework for evaluating and enhancing agentic skill-use. SkillCoach derives skill-grounded process rubrics from real rollouts and evaluates trajectories along four dimensions: skill selection, skill following, skill composition, and skill-grounded reflection. It keeps the external verifier as a separate outcome signal, allowing process quality to be distinguished from accidental task success. The evolved rubrics further serve as process supervision for selecting high-quality training trajectories. Experiments show that evolved rubrics substantially improve evaluation quality, expose failures hidden by final accuracy, and provide stronger supervision signals than outcome-only filtering for enhancing agentic skill-use.

9
AnyGroundBench: A Specialized-Domain Benchmark for Video Grounding in Vision-Language Models

Vision-Language Models (VLMs) have demonstrated immense promise in Spatio-Temporal Video Grounding (STVG). However, current evaluation protocols are largely confined to zero-shot assessments on general, daily-life benchmarks. This creates a critical disconnect from real-world applications in specialized fields, where models inevitably encounter rare visual concepts and complex spatio-temporal dynamics. Since exhaustive pre-training across infinite data distributions is infeasible, the ability to adapt to novel domains is essential. To bridge this gap, we introduce AnyGroundBench, a domain-adaptation benchmark designed to shift the STVG evaluation paradigm from static zero-shot testing to rigorous domain adaptation. Targeting five specialized domains (animal, industry, sports, surgery, and public security), AnyGroundBench pairs newly captured videos such as expert-annotated mouse behaviors with established datasets, unifying them through dense, high-fidelity spatio-temporal annotations. Crucially, the benchmark provides dedicated training subsets to systematically measure domain adaptability. We extensively evaluate 15 state-of-the-art VLMs, assessing their zero-shot generalization and In-Context Learning (ICL) capabilities under practical computational constraints. Ultimately, our findings reveal that current models fail in both zero-shot and ICL-based adaptation when confronted with specialized domains, exposing critical flaws in spatio-temporal reasoning that future research must address.

7
Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads

In long-context use, large language models frequently synthesize answers from the meaning of a relevant context span rather than literally copy-pasting them. Identifying which attention heads perform this synthesis matters for interpreting long-context model behavior. Yet existing detectors miss these heads by construction: they reward heads whose attended token matches the generated token, a literal-copy criterion that captures where a head reads but not what it writes through its output-value (OV) circuit, the very mechanism that carries non-literal retrieval. We introduce Logit-Contribution Scoring (LOCOS), a write-aware detector that scores each head by the projection of its OV-circuit output onto the answer-token unembedding direction, contrasting needle and off-needle source positions in a single forward pass. Across three model families (Qwen3, Gemma-3, OLMo-3.1), mean-ablating the top LOCOS heads on the NoLiMa non-literal retrieval benchmark collapses ROUGE-L at lower head counts than prior attention-based detections; on Qwen3-8B, ablating 50 heads drives ROUGE-L from 0.401 to 0.000 while the strongest baseline still retains 0.292. The selected heads are retrieval-specific: parametric recall and arithmetic reasoning stay at baseline under the same ablation. On Qwen3-8B, the same ablation also drops MuSiQue from 0.55 to 0.08 and BABI-Long from 0.62 to 0.20, while a random-heads control stays within 0.05 of baseline.
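
The scoring idea can be sketched in a few lines: project what a head writes from each source position onto the answer token's unembedding direction, then contrast needle and off-needle positions. The sketch below uses tiny hypothetical vectors and a mean-difference contrast. Both are assumptions for illustration and are not the paper's exact formulation.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def write_aware_score(per_position_outputs, needle_positions, unembed_direction):
    """Mean projection from needle positions minus mean projection from the rest."""
    needle, off_needle = [], []
    for position, output in enumerate(per_position_outputs):
        projection = dot(output, unembed_direction)
        (needle if position in needle_positions else off_needle).append(projection)
    return sum(needle) / len(needle) - sum(off_needle) / len(off_needle)

# Hypothetical attention-weighted OV outputs of one head, one vector per source position.
head_outputs = [[0.1, 0.0], [2.0, 0.5], [0.0, 0.2], [1.0, 1.0]]
answer_direction = [1.0, 0.0]  # hypothetical unembedding direction of the answer token
score = write_aware_score(head_outputs, {1, 3}, answer_direction)
print(round(score, 2))  # 1.45
```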

6
When Search Agents Should Ask: DiscoBench for Clarification-Aware Deep Search

Search agents powered by large language models (LLMs) are increasingly used to solve complex information-seeking tasks, requiring multi-step retrieval and reasoning to fulfill user goals. However, existing benchmarks often assume that user queries are complete and explicit, overlooking the fact that real-world search requests are frequently vague, underspecified, or even factually incorrect. In deep search scenarios, such ambiguity can propagate along multi-step reasoning chains and lead agents toward incorrect search trajectories. To address this gap, we introduce DiscoBench, a benchmark for clarification-aware deep search, designed to evaluate whether search agents can proactively identify ambiguity, ask effective clarification questions, and recover correct reasoning paths through user interaction. DiscoBench contains 211 samples and 463 ambiguity instances across 11 real-world domains, covering four ambiguity types. We further design a user simulator for multi-turn interaction and evaluate model performance from four perspectives: task utility, ambiguity detection, interaction strategy, and cost efficiency. Experiments on representative LLMs show that ambiguity detection and effective clarification are distinct capabilities, and that repeatedly searching instead of asking for clarification often performs worse than direct guessing, highlighting a critical gap between retrieval ability and interactive problem-solving in current search agents.

6
InstanceControl: Controllable Complex Image Generation without Instance Labeling

Controllable image generation methods, such as ControlNet, have demonstrated a remarkable capacity to introduce visual conditions (e.g., depth maps) to guide image generation. However, these methods often struggle with complex multi-instance scenes, frequently leading to attribute confusion among instances. While recent approaches attempt to mitigate this via manual instance labeling, such requirements are labor-intensive. In this paper, we propose InstanceControl, a novel multi-instance controllable generation method that eliminates the need for instance labeling. We identify the primary bottleneck in existing methods as the inability to accurately associate instance descriptions with their corresponding regions within visual conditions. To address this, we leverage the Vision-Language Model (VLM) to establish instance-level correspondences between text prompts and visual conditions. Specifically, the VLM automatically parses instance descriptions from the text prompts and simultaneously predicts instance masks based on the visual conditions. Furthermore, since the predicted masks may contain noise, we introduce an adaptive mask refinement strategy that dynamically refines these instance masks during the generation process. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods, achieving superior fidelity and precise instance-level control.

4
Discrete Diffusion Language Models for Interactive Radiology Report Drafting

Diffusion language models, which generate text by denoising a token canvas bidirectionally instead of emitting tokens left to right, have become competitive with autoregressive (AR) generation. Medical foundation models, however, remain almost entirely autoregressive. We adapt a mixture-of-experts diffusion language model, DiffusionGemma-26B, and benchmark it against its same-size AR sibling Gemma-4-26B under an identical LoRA recipe on medical visual question answering datasets, scored by a verbosity-robust LLM judge. Diffusion matches or exceeds AR on all of them, and the finetuned model (3.8B active) is competitive with frontier vision-language models; its decoding is also 3.5-4.4x faster. Beyond this parity, the diffusion model offers a drafting capability AR lacks: any-order infill. Because the canvas is denoised bidirectionally, a radiologist can fix report fragments and have the model fill the text between them, an operation inherent to diffusion but not to autoregression, which is subpar at it. This suits real reports, which are often terse or inconsistent across clinicians and institutions.

4
Representation Distribution Matching for One-Step Visual Generation

We elucidate the design space of Representation Distribution Matching (RDM), our name for the paradigm that trains a one-step image generator by matching generated and reference feature distributions under frozen pretrained encoders. We identify two design axes, how the distributions are compared and the representations they are compared in, and controlled studies along them yield three findings. First, the classical MMD, which could not train convincing generators a decade ago, becomes a strong and scalable objective once estimated right. Second, the generated batch is then the operative variable, with an optimum above 2048, far beyond customary batch sizes. Third, any single representation can be gamed, driven below the real score while images stay visibly fake, so we match against a balanced battery of encoders and evaluate with SW_r14, a Sliced-Wasserstein distance over 14 encoders that is independent of the training loss and resists gaming. Combining the preferred choices yields improved RDM (iRDM): it sets the one-step state of the art on ImageNet at SW_r14 1.30, corroborated by PickScore, a human-preference proxy our objective never optimizes, which prefers it over the prior best one-step generator on 71.2% of matched samples. The same recipe post-trains the four-step FLUX.2 [klein] into a one-step generator, surpassing the four-step version on GenEval, 0.826 to 0.794, and on PickScore, 22.76 to 22.58, in 90 H200 GPU-hours. Project page: https://alan-lanfeng.github.io/rdm/.
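
For readers unfamiliar with MMD, the sketch below shows the textbook biased estimator of squared MMD with a Gaussian kernel, applied to small hypothetical feature sets. It is a generic illustration of the objective family, not the estimator the paper argues for, and the bandwidth and data are assumptions.

```python
from math import exp

def rbf(x, y, bandwidth=1.0):
    """Gaussian kernel between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-squared_distance / (2 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth):
    return sum(rbf(x, y, bandwidth) for x in xs for y in ys) / (len(xs) * len(ys))

def mmd_squared(generated, reference, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two feature sets."""
    return (
        mean_kernel(generated, generated, bandwidth)
        + mean_kernel(reference, reference, bandwidth)
        - 2 * mean_kernel(generated, reference, bandwidth)
    )

# Hypothetical 2-D encoder features.
reference = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
near = [[0.1, 0.0], [1.0, 0.1], [0.0, 0.9]]
far = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
print(mmd_squared(near, reference) < mmd_squared(far, reference))  # True
```

A generator trained this way lowers the statistic by moving its feature distribution toward the reference set as a whole, rather than by pushing each sample toward a per-sample target.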

4
Denser ≠ Better: Limits of On-Policy Self-Distillation for Continual Post-Training

Continual post-training enables foundation models to acquire new knowledge while preserving existing capabilities. Recent work suggests that on-policy learning can mitigate forgetting, with on-policy self-distillation emerging as a particularly attractive approach. In this work, we revisit this optimistic view through self-distillation policy optimization (SDPO). Our experiments show that SDPO can accelerate in-domain specialization when teacher signals are stable and well aligned, but it struggles to generalize to out-of-distribution scenarios. In continual post-training, SDPO exhibits stronger forgetting and can even collapse, whereas on-policy reinforcement learning methods such as GRPO adapt more conservatively and better preserve prior capabilities. Further analyses reveal that denser self-distillation induces larger drift in both parameter space and response space, and can amplify high-frequency formatting artifacts through a self-reinforcing teacher-student loop. These findings suggest that on-policy data alone is insufficient for continual learning. Dense self-distillation can accelerate specialization when teacher targets are stable and token-level supervision is reliable, but it should not be treated as a default stabilizer for continual post-training. Our code is available at https://github.com/Moenupa/SDPO-CL.

4
Transferability for General Reasoning: An Automated Curriculum for Multi-Domain RLVR

Reinforcement learning with verifiable rewards (RLVR) has been extended from single-domain training to multi-domain reasoning suites spanning mathematics, programming, and science. However, the training curriculum (how often each domain is sampled) is typically fixed or hand-tuned, even though reasoning skills transfer unevenly across domains. Existing learnability-based curricula adapt to where the policy is currently improving, but are blind to whether a gradient step on the selected domain benefits the remaining domains. In this paper, we propose Transfer-Aware Curriculum (TAC), a bandit-style online curriculum that prioritizes domains whose updates broadly benefit the rest of the training suite. TAC repurposes signals already produced by RL training: per-domain advantages capture local learnability, and projected gradients, taken from the GRPO step being computed, estimate cross-domain transferability via gradient-geometry alignment, at negligible cost (<1% wall-clock overhead). Across a six-domain reasoning suite, TAC achieves the best macro-averaged accuracy on both Qwen3-1.7B and Llama3.2-3B, outperforming proportional random sampling, a hand-designed schedule, and a learnability-only bandit, and improving over the last of these by up to 2.8 points (10% relative). Ablations show performance degrades sharply when the transferability term is removed, and TAC remains robust on imbalanced training mixtures where learnability-only curricula over-commit to dominant domains. Our findings establish cross-domain transferability as a key signal for curriculum design in multi-domain RLVR.
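
The combination of local learnability and cross-domain gradient alignment can be illustrated with a toy scoring rule. Everything in the sketch below is hypothetical: the additive score, the `transfer_weight` parameter, and the 2-D "projected gradients". TAC's actual bandit update is not described in the abstract.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def domain_scores(advantages, gradients, transfer_weight=1.0):
    """Score = local learnability + average gradient alignment with other domains."""
    scores = {}
    for name, grad in gradients.items():
        others = [g for other, g in gradients.items() if other != name]
        transfer = sum(cosine(grad, g) for g in others) / len(others)
        scores[name] = advantages[name] + transfer_weight * transfer
    return scores

# Hypothetical per-domain advantages and low-dimensional projected gradients.
advantages = {"math": 0.2, "code": 0.2, "science": 0.2}
gradients = {"math": [1.0, 0.0], "code": [0.8, 0.6], "science": [-1.0, 0.1]}
scores = domain_scores(advantages, gradients)
best = max(scores, key=scores.get)
print(best)  # code
```

With equal learnability across domains, the domain whose gradient points most nearly in the same direction as the others is preferred, which is the intuition behind a transfer-aware curriculum.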

3
PACE: A Proxy for Agentic Capability Evaluation

Evaluating LLM agents on benchmarks like SWE-Bench and GAIA is expensive and time-consuming, and requires complex infrastructure. A single evaluation can cost thousands of dollars and take days to complete. In contrast, non-agentic LLM benchmarks that test individual capabilities (e.g., reasoning, code generation) are fast and cheap to run. In this paper, we investigate whether performance on expensive agentic benchmarks can be accurately predicted from performance on a small, carefully selected subset of atomic evaluation instances. We introduce PACE, a framework that constructs proxy benchmarks by selecting instances from existing non-agentic evaluations whose aggregate scores most reliably predict model performance on agentic benchmarks. Given a pool of candidate instances spanning atomic capabilities, PACE fits a regression that maps a model's scores on a compact subset of source instances to its score on the target agentic benchmark. The subset itself is curated by combining two complementary instance-selection strategies: target-relevance local selection and globally informative global selection. Applying PACE to 4 target agentic benchmarks yields PACE-Bench, the concrete proxy benchmark that we evaluate. Experiments across 14 models, 4 agentic benchmarks, and 19 non-agentic benchmarks show that PACE-Bench predicts agentic scores with leave-one-out cross-validation (LOOCV) mean absolute error (MAE) under 4%, Spearman correlation above 0.80, and pairwise model-ranking accuracy around 85%, all at much less than 1% of the full agentic evaluation cost. We further analyze the selected proxy instances, revealing which skills each agentic benchmark uniquely demands. PACE enables practitioners to obtain reliable estimates of agentic performance during model development, selection, and routing, without the overhead of full agent evaluation.
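The evaluation protocol described above (fit a regression from proxy scores to agentic scores, then report leave-one-out MAE) can be sketched in a few lines. This is a simplified illustration, not the PACE code: it assumes each model is already summarized by a single proxy score (for example, its mean accuracy on the selected instances), uses a one-variable least-squares line, and runs on invented numbers for six hypothetical models. The instance-selection step itself is not shown.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def loocv_mae(proxy, target):
    """Hold out each model in turn, fit on the rest, average the absolute errors."""
    errors = []
    for i in range(len(proxy)):
        train_x = proxy[:i] + proxy[i + 1:]
        train_y = target[:i] + target[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * proxy[i] + intercept
        errors.append(abs(prediction - target[i]))
    return sum(errors) / len(errors)


# Invented scores for six hypothetical models (fractions in [0, 1]).
proxy_scores = [0.42, 0.51, 0.58, 0.66, 0.73, 0.81]
agentic_scores = [0.18, 0.25, 0.29, 0.37, 0.41, 0.49]

mae = loocv_mae(proxy_scores, agentic_scores)
print(f"LOOCV MAE: {mae:.3f}")
```

Leave-one-out is a natural choice when only a handful of models is available, as with the 14 models in the abstract, because every model serves once as the held-out test point.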

2
Learning to Move Before Learning to Do: Task-Agnostic Pretraining for VLAs

Vision-Language-Action (VLA) models are fundamentally bottlenecked by the scarcity of expert demonstrations: triplets of observations, instructions, and actions that are costly to collect at scale. We argue that this bottleneck stems from conflating two distinct learning objectives: acquiring physical competence (how to move) and acquiring semantic alignment (what to do). Crucially, only the latter requires language supervision. Building on this Decomposition Hypothesis, we propose Task-Agnostic Pretraining (TAP), a two-stage framework that first learns transferable motor priors from cheap, unlabeled interaction data (including discarded off-task trajectories and autonomous robot play) via a self-supervised Inverse Dynamics objective. A lightweight second stage then grounds these priors in language using minimal expert data. On the SIMPLER benchmark, TAP matches models trained on over 1M expert trajectories while using orders of magnitude less labeled data, yielding a 10% absolute gain over standard behavior cloning. On a real-world WidowX platform, TAP retains 25% success under camera perturbations where internet-scale baselines collapse to 0%, demonstrating that task-agnostic pretraining produces robust, transferable physical representations and offers a scalable path forward for Embodied AI.

2
05

PRODUCT HUNT

05.00
PRODUCT HUNT

Product Hunt - July 3, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

Osloq

An AI agent that reproduces GitHub issues for you

Tamamon

A desktop pet that grows as you code with Claude Code

Glaze by Raycast

Create your own Mac apps by chatting with AI

Goals from Loops

Measure whether a campaign drove the desired outcome

nxt

Talk to your to-do list and get what's next

Vox

Talk to GitHub Copilot out loud

Archify

Understand software

Macuse

Give Your AI Superpowers on macOS

Sidedoor

Paste any job, find who in your network can refer you

scritty

Shared, searchable memory for every AI coding agent

Solaris

Your company’s AI adoption and upskilling platform

Needle

The proactive GTM agent in Slack and Teams

Fypro

Convert your TikTok followers into paying customers

Context.dev

One API to scrape, enrich, and extract the internet

Basedash Actions

A BI tool that can take action for you

Quick Sub 2: Video Subtitling

Quick, creative video subtitling with direct canvas control.

PieterPost MCP

Connect your AI agent to postal mail

Gaming Chat SDK by CometChat

Chat drops into Unreal like it was always there

PixFit

Turn 1 creative into every ad format, instantly

Macro

Unifies your work into one app with shared memory

Flowly

A personal AI agent that runs on your desktop and iPhone

html.contact

A full form backend you can test before paying

Banger Mail

Shared mailboxes for teams and AI agents

Retrace

Debug AI agents by replaying and forking runs

Claude Sonnet 5

AI that plans, acts, and gets work done

MailAdept by mailwarm

AI Agents & Email deliverability experts on your team

Adam CAD Copilot

AI CAD inside Onshape and Fusion

Tabstack Browser Automation

Automate the web in your app or agent, no browser to host

Acti

Agentic keyboard for mobile commands and search

Sequence Agentic

Money movement for AI agents

Mark by Airtop

Vibe automation for solo marketers

Aruki

The Japanese walking method, coached on your iPhone

Gemini Omni Flash

High-quality video generation and conversational editing

LightTwist

Record & stream your show in a realistic virtual studio

RunInfra

Describe the AI model you need and get an optimized AI

Humalike

Give your AI agents the social intelligence they're missing

OASIS 1 Ring

Whisper to write and touch to edit

Browser Notes

Your ideas, organized - not uploaded

Clusy

AI notebook platform for modern data science

Dump Memory

We fix your memory

Modelence Mobile Builder

Build mobile apps by chatting with AI

Metal

AI-driven operating system for raising venture rounds

Stigg 2.0

The usage runtime for AI products

Folderly Lens

Domain health analysis for high performance email campaigns

Fuser Apps

Vibecode apps, sites & games on everyone's favorite canvas

Bamboo

Markdown notes with AI under your control

N71

Give all your AI agents one shared context

Wins 3.4

Snap, switch, and arrange Mac windows from the notch

Saldor

Speed up procurement and AP.

Loot

Collect your favorite things in real life

06

TECHMEME

06.00
TECHMEME

Techmeme - July 3, 2026

Techmeme Digest: Major tech headlines and industry conversations.

How Jeff Bezos' changing relationship with President Trump has impacted his companies, including increased federal contract awards during Trump's second term (Wall Street Journal)
Source: Techmeme · Published: Jul 3, 2026

Space company has booked rapid growth under this administration, after founder spent president's first term being ‘hated’

Filing: GoDaddy challenges a New Delhi court ruling requiring domain sellers to stop offering privacy by default, saying it could expose website owners globally (Reuters)
Source: Techmeme · Published: Jul 3, 2026

The world's biggest internet domain seller, GoDaddy, has warned that India's crackdown on fake websites impersonating famous brands …

India's IT secretary said the country is investigating a data breach at Apple supplier Tata, which exposed files that included photos of iPhone 18 Pro models (Reuters)
Source: Techmeme · Published: Jul 3, 2026

India is investigating a data breach at Tata Electronics that exposed documents linked to Apple's (AAPL.O) unreleased iPhone 18 Pro …

Pitch document: Chris Larsen and Palmer Luckey invested an undisclosed amount in APEC, a derivatives exchange founded by the son of Senator Kirsten Gillibrand (Declan Harty/Politico)
Source: Techmeme · Published: Jul 3, 2026

Democratic megadonor Chris Larsen has backed Sen. Kirsten Gillibrand's political campaigns several times over the years.

Sources: Alibaba has banned employees from using Claude Code and asked them to remove all Claude models from their work computers, citing security concerns (The Information)
Source: Techmeme · Published: Jul 3, 2026

Alibaba Group has banned employees from using Anthropic's Claude Code, and asked them to remove all Claude models from their work computers …

A look back at BitTorrent, launched by Bram Cohen 25 years ago, and how media piracy fueled its growth while its architecture shielded it from legal liability (Janko Roettgers/The Verge)
Source: Techmeme · Published: Jul 3, 2026

Twenty-five years ago today, a young, little-known programmer by the name of Bram Cohen fired off …

Sources: Anthropic moves to close loopholes that let Chinese firms like Ant use its models via workarounds including cloud providers and overseas subsidiaries (Financial Times)
Source: Techmeme · Published: Jul 3, 2026

Engineers are still finding ways to use AI models despite stringent restrictions. Anthropic is moving to shut loopholes …

Blackstone's QTS abandons plans to build its portion of a 2,100-acre data center campus in Virginia, following years of local opposition and legal challenges (Dawn Lim/Bloomberg)
Source: Techmeme · Published: Jul 3, 2026

Blackstone Inc.'s QTS is walking away from plans to build its portion of a 2,100-acre data center campus in Virginia …

Letter: chip sector group SEMI, including Micron and Samsung, warns Scott Bessent US policies affecting prices or production capacity would worsen the shortage (Maggie Eastland/Bloomberg)
Source: Techmeme · Published: Jul 3, 2026

Government attempts to address the global memory chip shortage by influencing prices or production capacity would worsen …

Sources: Alexandr Wang said Meta's model currently in training, codenamed Watermelon, matches GPT-5.5 and uses an "order of magnitude more compute than Avocado" (Business Insider)
Source: Techmeme · Published: Jul 3, 2026

Meta is making significant progress in the AI model race, its superintelligence chief Alexandr Wang told employees today.

Sources: Anthropic's bankers have hired UK law firm Freshfields to advise on its IPO; it also advised on Google's acquisition of Wiz and ServiceNow's Armis deal (The Information)
Source: Techmeme · Published: Jul 3, 2026

A British law firm has scored a big win: a role in the Anthropic initial public offering, expected to raise tens of billions at a valuation of more than $1 trillion.

Spotify removed 500K+ streams of Malcolm Todd's Earrings after a 70% surge in 24 hours sent it to #1 on Spotify USA and coincided with suspicious Kalshi wagers (Stephanie Stacey/Financial Times)
Source: Techmeme · Published: Jul 3, 2026

Platform removes more than 500,000 streams of Malcolm Todd track ‘Earrings’ on concern traders propelled it to number 1

Finnish quantum computing company IQM closed up 2% in its Nasdaq debut Thursday after going public via a SPAC merger at a ~$1.9B valuation (Anna Heim/TechCrunch)
Source: Techmeme · Published: Jul 2, 2026

IQM, a full-stack quantum company out of Finland, went public on the Nasdaq Thursday via a SPAC merger at a valuation of about $1.9 billion.

Sources: Crusoe is in active talks to raise ~$3B in a funding round expected to value the company in the ~$30B range, up from a ~$10B valuation in October (Bloomberg)
Source: Techmeme · Published: Jul 2, 2026

Crusoe, the data center upstart with contracts to supply AI computing power for the likes of Meta Platforms Inc. and Oracle Corp. …

At a town hall, Mark Zuckerberg said Meta's AI agent development has not accelerated as expected and its reorganization was not as "clean" as it could have been (Katie Paul/Reuters)
Source: Techmeme · Published: Jul 2, 2026

Meta (META.O) Chief Executive Mark Zuckerberg told an internal town hall on Thursday that AI agent development …

07

STARTUP ARCHIVE

07.00
STARTUP ARCHIVE

Startup News - July 3, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction.”

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

“Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot.”

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice.”

Paul Graham on why starting with a “small, intense fire” is the key to startup growth
Source: Startup · Published: Mar 20, 2026

"You have to know who those first users are and how you're going to get them."

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth.”

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?”

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations.”

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over.”

Spencer Rascoff: “I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

“If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive.”

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right.”

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT

08.00
SOLIDOT

Solidot News - July 3, 2026

Solidot Feed: Highlighting essential tech & open-source news.

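Alibaba bans employees from using Claude Code

Citing fears of backdoors, Alibaba has banned employees from using Anthropic's Claude Code at work and requires them to use Qoder, its in-house coding platform, instead. Although Anthropic restricts access for Chinese users and entities, Claude Code remains very popular among Chinese programmers. Last month Anthropic accused Alibaba of distilling its models, and a few days ago Claude Code was found to contain code that checks whether a user is in China.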

Valve open-sources an e-ink display for the Steam Machine

Valve will not ship a front-facing e-ink display for the Steam Machine console, but it has open-sourced the underlying design so that anyone can fit one themselves. The files are published on GitLab under the MIT license. Valve calls the project Inkterfac, and builders need the following parts:
1 x Adafruit ESP32 Feather with 2MB PSRAM
1 x Adafruit eInk Breakout Friend
1 x Adafruit 5.83" Monochrome eInk Panel
13 x M2.5 x 5mm Pan Head Machine Screws
4 x 1/4" x 1/4" x 3/16" Stepped Magnet SB443-OUT

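Alibaba to pay $600 million in settlement with the United States

The U.S. Department of Justice said Alibaba Group and an Ant Group subsidiary will pay $600 million under a settlement over their failure to stop merchants from importing and selling illegal drugs. The department will not prosecute either company. Between January 2016 and December 2024, Alibaba failed to prevent merchants from using its e-commerce platforms to import and sell illegal chemicals, drugs, and equipment for manufacturing counterfeit drugs in the United States. The Ant subsidiary, AUS, provided payment services between January 2020 and December 2023. The illegal sales amounted to 80,000 transactions worth more than $200 million in total. Alibaba will forfeit $200 million and AUS $190 million; in addition, Alibaba will pay a $125 million fine and AUS an $85 million fine.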

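Japan's Supreme Court rules AI cannot be named as inventor on a patent application

Japan's Supreme Court has dismissed an appeal by a U.S. engineer who sought to have an AI named as the inventor on a patent application. The court upheld the lower court's ruling that, under the Patent Act, the inventor on a patent application must be a “natural person.” The plaintiff filed the application in 2020, naming as inventor DABUS, an AI the plaintiff created. The patent office asked for the name of a natural person as inventor; the plaintiff refused, and the application was rejected. The Supreme Court said the Patent Act did not anticipate the rapid development of AI, and that whether patent rights should be granted for AI inventions “needs to be discussed in light of its impact on society.”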

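Sugar substitutes disrupt gut health and metabolism

A growing body of research suggests that artificial and non-nutritive sweeteners may disrupt metabolism. Sugar substitutes are now ubiquitous in food. In a review and meta-analysis published in the journal Current Atherosclerosis Reports, researchers found that, compared with non-caloric controls such as water or placebo, artificial and low-calorie sweeteners led to higher fasting insulin levels and higher glycated hemoglobin (HbA1c, a marker of long-term blood sugar control), pointing to a trend of worsening insulin sensitivity. One explanation, the researchers say, involves the gut microbiota: non-nutritive sweeteners come into direct contact with these microbes as they pass through the gut, and studies show they alter the composition and function of the gut microbiota.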

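Apple seeks to buy memory from CXMT and YMTC

The memory price crisis is pushing U.S. hardware makers to seek deals with Chinese memory manufacturers against the government's wishes. Apple is negotiating memory purchases from ChangXin Memory Technologies (CXMT) and Yangtze Memory Technologies (YMTC) to ease the impact of the global memory shortage. Both Chinese companies are on the Pentagon's 1260H list, a blacklist that does not legally prohibit transactions. If a company is also placed on the U.S. Commerce Department's Entity List, however, U.S. companies' dealings with it are restricted, and that is what Apple is seeking to prevent. Apple may try to limit the fallout by using Chinese memory chips only in devices sold in China.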

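Google's electricity use grew 37% in 2025

In its latest sustainability report, Google acknowledges that its electricity consumption has grown by more than 250% since 2019, rising 37% in 2025 on top of a 27% increase in 2024. Google attributes the growth to the continued expansion of Google Cloud, YouTube video streaming, and the construction and operation of AI infrastructure. Google's data centers consumed more than 42 million MWh of electricity in 2025, up from 30.6 million MWh in 2024. That puts Google's data center energy consumption on par with the national electricity consumption of countries such as New Zealand, Denmark, and Nigeria.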

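OpenAI in talks to give a 5% stake to the U.S. government

As AI companies try to ease relations with the Trump administration, OpenAI is in talks to donate 5% of its equity to the U.S. government. OpenAI CEO Sam Altman believes that giving the American public a stake in the company is the best way to share the benefits of AI. The proposal also suggests that other U.S. AI companies donate similar stakes to the government; it is not yet clear whether companies such as Anthropic, Google, and Meta would agree. OpenAI executives suggest that U.S. AI companies donate 5% of their equity to the Alaska Permanent Fund, a sovereign fund. The talks are “conceptual” and at an early stage, and any agreement may need an act of Congress before it can take effect.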

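Orbital data centers: hype and reality

SpaceX founder Elon Musk declared at the World Economic Forum in Davos this January that orbital data centers would be a reality within three years at the latest. SpaceX then filed with the FCC to launch 1 million satellites to build an orbital data center constellation. Musk has a habit of overpromising: he said fully self-driving cars would arrive in 2017, a crewed Mars mission in 2024, and 10,000 Optimus humanoid robots by the end of 2025. There are currently about 14,500 satellites in Earth orbit, two-thirds of them in the Starlink constellation; deploying 1 million would require a dramatic increase in both SpaceX's launch cadence and its satellite manufacturing capacity. SpaceX's next-generation rocket, Starship, can carry 60 satellites to orbit, so 1 million satellites would take at least 16,666 launches. SpaceX set a record of 165 orbital launches in 2025; even at ten times that cadence, the launches would take ten years. Starlink satellites are built at a rate of 4,000 per year, so unless satellite manufacturing undergoes a revolution, building 1 million satellites would also take about 25 years. An orbital data center constellation remains a distant prospect, and that is before accounting for the massive radiators such data centers would need, or for radiation, maintenance, and orbital debris. So why is SpaceX promoting orbital data centers so heavily? Money. According to Dina Genkina of IEEE Spectrum, Musk is something of a genius at paying himself: xAI builds the data centers, SpaceX launches them into space, and Tesla makes the solar panels, so in effect he is writing his own paycheck.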
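The launch arithmetic in this item can be checked with a short back-of-the-envelope script. It uses only the figures quoted in the article (1 million satellites, 60 per Starship, 165 launches in 2025, a tenfold cadence increase); the variable names are illustrative. Rounding up gives one launch more than the 16,666 quoted, since the final partial load still needs a rocket.

```python
import math

# Figures as quoted in the article.
TARGET_SATELLITES = 1_000_000
SATELLITES_PER_STARSHIP = 60
LAUNCHES_2025 = 165
CADENCE_MULTIPLIER = 10

# Launches required, rounding up because a partial load still needs a launch.
launches_needed = math.ceil(TARGET_SATELLITES / SATELLITES_PER_STARSHIP)

# Hypothetical yearly cadence if the 2025 record were multiplied tenfold.
launches_per_year = LAUNCHES_2025 * CADENCE_MULTIPLIER

# Years needed to fly the whole constellation at that cadence.
years_of_launches = launches_needed / launches_per_year

print(launches_needed, launches_per_year, round(years_of_launches, 1))
```

Even under the generous tenfold assumption, the launch campaign alone spans roughly a decade, which matches the article's estimate.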

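DGX Spark Hackathon online bootcamp: 4 hours of substance, from environment setup to embodied AI, a hands-on guide to building an Agent that actually runs

Registration for the NVIDIA DGX Spark Hackathon filled up as soon as it opened, but alongside the competition there is a hands-on livestream better suited to anyone who wants to "sample it first, then decide whether to compete."
Livestream time: July 12, 10:00 - 12:00
Bootcamp content:
1. Hackathon rules: a walkthrough of the competition format, judging criteria, and submission process, to help teams set their direction and prepare efficiently.
2. Best practices for building a local Agent Team with DGX Spark and Step 3.7: from environment setup to model inference, how to put the Stepfun3.7 model to work efficiently on DGX Spark.
3. One-click video from an Agent: building a local visual generation agent on DGX Spark. A demonstration of how to build a local Agent with visual understanding and content generation capabilities, covering the full pipeline from prompt to finished video.
4. From local AI to embodied AI: building a desktop robot Agent development platform on DGX Spark. An exploration of how Agents move from software into the physical world, showcasing DGX Spark development practice in embodied AI scenarios.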

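Sedentary people show significant decline in muscle mitochondrial function

Researchers found that healthy but sedentary people show a significant and consistent decline in muscle mitochondrial function, which may be a precursor to major disease. Senior author Iñigo San Millan said mitochondrial function is central to metabolic health: if you are 40, healthy but sedentary, your cells are probably already showing problems that could cause you trouble in 10 to 15 years. The study covered 9 sedentary men and 10 regularly active men, all aged about 42. The researchers analyzed muscle biopsies to see how efficiently mitochondria burn fuel, and ran exercise tests to measure participants' fitness, fat-burning capacity, and blood lactate levels, a key measure of the body's energy expenditure. Compared with the active men, the sedentary men's mitochondrial efficiency was 28% to 36% lower across several categories; levels of MPC1, a key protein for converting sugar into usable energy, were 49% lower; activity of the CPT1 enzyme, which transports fat into mitochondria, was about half; maximal oxygen uptake (VO2max) was 38% lower; and blood lactate during exercise was 60% higher.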

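Android malware, courtesy of Google

F-Droid, the free-software app store for Android, warns that over the past few months Google has pushed malware called Android Developer Verifier (ADV) to as many as 4 billion Android devices. It runs covertly in the background as a system service with full root privileges, quietly waiting for an activation signal from Google. The ADV service cannot be blocked, disabled, or removed. Once activated, its sole purpose is to stop users from running apps by developers Google has not approved. Google is forcing through its Android developer verification program in the name of security. Under the Android Developer Console terms of service, if a developer “violates any of the terms, or distributes malicious or otherwise harmful apps, Google may terminate your access to the ADC…” Google does not define a malicious or harmful app, which means Google decides whether an app is malicious; and for the world's largest advertising company, ad blockers may well count as malicious. Google expects to begin gradually activating ADV on September 30.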

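More and more children are using AI

Based on new data from 10 countries, UNICEF estimates that at least 20 million children have used artificial intelligence, and that adoption among teenagers is growing more than three times as fast as among adults. Most strikingly, an estimated 2 million children, about one in ten, say they turn to AI for advice about their worries, while another 13 million say they use AI to help with schoolwork and homework. “Artificial intelligence has arrived. It is increasingly part of our lives, and it is already shaping childhoods around the world, for better or worse,” UNICEF said. While AI offers new opportunities for learning and creativity, UNICEF warned that evidence about its effects on child development, emotional well-being, and potential harms is only beginning to emerge. “In effect, this generation is growing up inside a global experiment,” the agency said. It urged governments and technology companies to put children's rights at the center of AI regulation.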

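There may be 20 million insect species worldwide

Scientists have long debated the exact number of insect species, with earlier estimates generally putting it at about 6 million. Over the past three centuries, entomologists have described about 1 million insect species, but discovering and describing all of them is a daunting, perhaps impossible, task. To estimate insect diversity more accurately, researchers studied years of insect survey data from Guanacaste National Park in Costa Rica and applied statistical methods borrowed from epidemiology. They then used another highly diverse group of organisms, trees, to extrapolate the figure to the whole globe. If insect diversity follows the same proportions, Earth is home to roughly 13.3 million to 24.7 million insect species, with 20.3 million as a reasonable midpoint. The researchers describe their estimate as conservative, meaning millions more insect species may remain undiscovered.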

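Cloudflare pushes AI companies to pay for content

Cloudflare has announced new controls that give content publishers more say over how AI companies access and use their content. Starting September 15, new Cloudflare sites will allow indexing by traditional search engines but will by default block AI training bots and AI agents from accessing ad-supported pages. Cloudflare is also expanding its monetization efforts with a Pay-Per-Use model, intended to compensate publishers when their content contributes to AI-generated answers, rather than simply letting the content be crawled. Cloudflare argues that publishers should not be forced to choose between improving their online visibility and handing content to AI systems for free.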

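Scientists build a cell from non-living components for the first time

Synthetic biologists at the University of Minnesota have, for the first time, loaded non-living components one by one into a cell-like membrane and watched the molecular bag begin to show lifelike behavior. The synthetic cell can grow, replicate its DNA, and divide, demonstrating the basic functions of the cell cycle. By any definition, this cell is not alive. It depends on a constant supply of nutrients and ribosomes, the molecular machines that make proteins. It has no defense mechanisms and no proper waste-disposal system. But it is the strongest evidence yet that creating life from non-living matter is possible, the goal synthetic biologists have pursued for decades. About 4 billion years ago, non-living molecules came together to form the first protocells, which took up nutrients, grew, and divided. Over time these cells evolved and diversified into different types, populating a once-barren world with all manner of strange creatures. Scientists still disagree about how the transition from non-life to life happened, and some have begun trying to reproduce it in the lab.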

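Anthropic to remove secret code that detects Chinese users

Anthropic engineers say a patch due Wednesday will remove hidden code added to Claude Code several months ago that was intended to stop other AI companies from distilling its models. Claude Code engineer Thariq Shihipar said: “It was an experiment we launched in March to prevent account abuse by unauthorized resellers and to prevent model distillation. The team has since put more effective mitigations in place, and we had in fact been meaning to remove this code for a while.” Earlier, developers had found that Claude Code contained secret code that checks the base URL environment variable, which is used to route API requests to a proxy or gateway. If the base URL has been overridden, the code goes on to check the system time zone and whether the hostname matches any entry on a list of domains for known Chinese AI labs, other AI companies, account resellers, and gateways.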

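Swedish court orders Google to pay price-comparison site $1.5 billion

A Swedish court has ordered Google to pay price-comparison site PriceRunner about $1.5 billion (14.3 billion Swedish kronor) for favoring its own shopping service in search results. It is the largest sum a Swedish court has imposed in an antitrust case, but far less than the 78 billion kronor PriceRunner had sought. PriceRunner sued Google in 2022, accusing it of manipulating search results. Google began giving its comparison-shopping service prominent placement in search results in 2008, causing a sharp drop in traffic to rival comparison sites. In 2017, Margrethe Vestager, then the EU competition commissioner, fined Google for using its comparison-shopping service to gain an unfair advantage. Google appealed that decision in 2021, but the appeal was dismissed. Several European price-comparison sites subsequently filed damages claims.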

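Sony PlayStation to stop releasing new games on disc from January 2028

Digital games are the future: Sony has officially announced that, starting in January 2028, new games for its PlayStation consoles will no longer be released on physical disc. This also means future PlayStation consoles will no longer be sold in models with a Blu-ray drive. Sony says physical disc editions of games released, or due for release, before January 2028 are unaffected. Consumers broadly prefer digital media to physical discs, and Sony says it is simply following the trend.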

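Godot refuses AI-generated code

Open-source projects everywhere face a growing volume of AI code. Now the foundation that develops the open-source game engine Godot has announced a revised contributor policy that bans submissions of AI-authored code and pull requests filed by AI agents, and bans AI-generated text in person-to-person communication, with machine translation as the exception. The new policy aims to curb AI slop, encourage maintainers to review code, and help new contributors grow into future maintainers; above all, it requires every contribution to come from a human who takes responsibility for the code and will fix it when problems arise. “AI cannot take responsibility, and we cannot expect heavy users of AI to understand their code well enough to fix it,” the foundation said.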

09

APP STORE RANK

09.00
APP STORE RANK