TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0913
WED, JUL 1, 2026
OrangeBot.AI intelligently curates and filters daily tech trends and news to save you time.

The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK
AI News Digest

July 1, 2026

A summary of today's main news events.


U.S. Stock Markets End Quarter with Major Gains

U.S. stock markets concluded the second quarter with a powerful rally. The Dow Jones Industrial Average closed June at a record high, while the S&P 500 and Nasdaq recorded their best quarterly performances since 2020, rising 15% and 21% respectively over the period.

Japanese Yen Hits 40-Year Low Against the Dollar

The Japanese yen fell to its lowest point against the U.S. dollar in 40 years, putting financial markets on high alert. Traders are now watching closely for potential currency intervention by the Japanese government, which may act to support the struggling yen.

Oil Prices Fall Amid U.S.-Iran Diplomatic Progress

Global oil prices have decreased, with Brent crude trading below $73 a barrel. The drop is attributed to continued diplomatic talks between the U.S. and Iran, which have eased market fears of potential conflict and supply disruptions in the Middle East.

President's Financial Disclosure Shows Over $1 Billion in Earnings

The latest financial disclosure report for the U.S. President reveals earnings of more than $1 billion. The income was generated from a wide range of sources, including digital currency interests, real estate and stock trades, legal settlements, and licensing agreements for various consumer products.

Prominent AI Model Shutdown Extends Over Security Concerns

A major artificial intelligence model has been shut down for nearly three weeks, creating significant disruption within the AI industry. The shutdown was prompted by unspecified security concerns, and its extended duration is raising questions about the stability and safety of advanced AI systems.

European Natural Gas Prices Rise on Supply Uncertainty

The price of natural gas in Europe has increased due to uncertainty over supplies. The rise is linked to disruptions in the Middle East and record-high temperatures across parts of the continent, which are boosting demand for energy.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS

Hacker News - July 1, 2026

Hacker News Feed: Highlighting key posts and discussions.

ArXiv's Next Chapter

(blog.arxiv.org)

18755
Redeploying Fable 5

(www.anthropic.com)

13440
Leanstral 1.5

(docs.mistral.ai)

271113
Claude Sonnet 5

(www.anthropic.com)

1180722
Claude Science

(claude.com)

520151
Nano Banana 2 Lite

(deepmind.google)

409169
Knoppix

(www.knopper.net)

323116
Open Source Low Tech

(opensourcelowtech.org)

642138
03

HUGGINGFACE

HuggingFace News - July 1, 2026

HuggingFace Feed: The latest AI models, datasets, and community updates.

Orca: The World is in Your Mind

We introduce Orca, an initial instantiation of a general world foundation model. Orca learns a unified world latent space from multimodal world signals and exposes it through multimodal readout interfaces. Rather than optimizing isolated next-token, next-frame, or next-action prediction, we are centered on Next-State-Prediction modeling, offering a unified state-transition modeling route toward understanding, predicting, and acting upon the world. Orca learns through two complementary paradigms: unconscious learning captures dense natural state transitions from continuous videos, and conscious learning models sparse meaningful state transitions by language-described events and VQA supervision. For pre-training, we construct a large-scale world-learning data inventory, including 125K hours of video data and 160M event annotations. After pre-training, Orca learns a unified world latent space. To examine whether the learned latent supports downstream tasks, we evaluate it through three representative downstream readouts: text generation, image prediction, and embodied action generation. Orca's backbone is frozen, and only the lightweight modality-specific decoders are trainable. Experiments show the scalability of the proposed paradigm and verify that a stronger world latent enables stronger downstream readouts. Orca outperforms similar-sized specialized baselines. These results show that Orca, as a general world foundation model, presents a promising approach to understanding, predicting, and acting upon the world. Finally, we discuss the current limitations, aiming to provide useful insights and inspiration for the community.

170
Dockerless: Environment-Free Program Verifier for Coding Agents

Program verifiers play a central role in training coding agents, including selecting trajectories for supervised fine-tuning (SFT) and providing rewards for reinforcement learning (RL). Standard execution-based verification requires running unit tests inside per-repository environments such as Docker images, incurring substantial environment setup costs. We propose Dockerless, an environment-free agentic patch verifier that evaluates generated code patches without executing them. Rather than simply matching candidate patches to references, Dockerless judges patch correctness using evidence gathered through agentic repository exploration. On a verifier evaluation benchmark, Dockerless outperforms the strongest open-source verifier by 14.3 AUC points. Using Dockerless as both the SFT trajectory filter and the RL reward enables a fully environment-free post-training pipeline. The resulting model reaches 62.0%, 50.0%, and 35.2% resolve rate on SWE-bench Verified, Multilingual, and Pro, respectively. It surpasses the Qwen3.5-9B baseline by 2.4, 8.7, and 2.9 points, matching environment-based post-training.

79
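
The Dockerless abstract reports verifier quality in AUC points. As a reminder of what that metric measures, here is a minimal sketch of ROC AUC computed by pairwise comparison. The scores and labels are made up for illustration and are not from the paper, and reading an "AUC point" as 0.01 on the 0-to-1 scale is an assumption.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random correct patch outscores a random incorrect one.

    labels: 1 = patch is correct, 0 = patch is incorrect. Ties count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one correct and one incorrect patch")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical verifier scores for six candidate patches.
verifier_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
patch_is_correct = [1, 1, 1, 0, 0, 0]
auc = roc_auc(verifier_scores, patch_is_correct)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means the verifier ranks every correct patch above every incorrect one, and 0.5 is chance level.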
DOPD: Dual On-policy Distillation

On-policy distillation (OPD) offers superior capacity transfer by supervising student-sampled trajectories with dense token-level signals. To furnish high-quality supervision sources and thereby elevate the performance frontier of distillation, an intuitive direction is to infuse privileged information to either teacher or student itself. However, this additional input induces a potential failure mode we dub privilege illusion: a pattern that conflates the transferable capability gap that students are meant to close, and the information asymmetry gap that can only be mimicked but never replicated. This issue is further amplified by the inherent non-uniformity of token-level supervision, where only a small subset of tokens carries pivotal capability-bearing signals. To this end, we propose DOPD, an advantage-aware dual distillation paradigm that dynamically routes token-level supervision between privileged teacher and privileged student policies based on their advantage gap and relative probabilities. Each token receives supervision of different strength, objective, and strategy from either teacher or student itself, which transfers credible capability while simultaneously receiving auxiliary signals, to alleviate privilege illusion. Extensive experiments on both large language model (LLM) and vision-language model (VLM) settings demonstrate that DOPD consistently outperforms Vanilla OPD and other counterparts. Further results on stability, robustness, continual learning, and out-of-distribution tasks validate its superiority.

70
BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding

Speculative decoding accelerates inference by using a lightweight draft model to generate candidate tokens in parallel, which are then verified by the target model, enabling lossless acceleration. Recently, diffusion-based speculative decoding further improves parallelism by generating multiple tokens per forward pass via block-level diffusion, achieving state-of-the-art (SOTA) performance. However, existing methods adopt a fixed inference block size and assume a uniform optimal decoding strategy across all inputs. In this paper, we show that this assumption is suboptimal, as the optimal block size varies across samples and plays a critical role in speculative decoding performance. Moreover, these values exhibit a clear local structure, concentrating around the training block size, which reduces the problem to a low-dimensional and structured decision space. Based on these insights, we propose BlockPilot, a sample-adaptive policy that predicts the optimal block size from the prefilling representation. Specifically, we formulate block size selection as a lightweight policy learning problem and propose an instance-adaptive decision mechanism that predicts the optimal block size based on the representation of the prefilling stage. The prediction is performed only once after prefilling, allowing for seamless integration. Extensive experiments demonstrate that our method is plug-and-play, introduces minimal overhead, and consistently improves efficiency, achieving an acceptance length of 5.92 and a 4.20× speedup on Qwen3-4B under temperature T=1.

65
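
To make the draft-then-verify loop in the BlockPilot abstract concrete, here is a toy sketch of greedy speculative decoding with a configurable block size. Everything in it is an assumption for illustration: `target_next` and `draft_next` are hypothetical stand-ins for real models, the drafter here is sequential rather than diffusion-based, and BlockPilot's learned block-size policy is not modeled.

```python
from typing import List, Tuple


def target_next(prefix: List[int]) -> int:
    """Toy stand-in for the expensive target model (greedy decoding)."""
    return (sum(prefix) * 31 + len(prefix)) % 7


def draft_next(prefix: List[int]) -> int:
    """Toy stand-in for the cheap drafter: it matches the target except
    when the prefix length is 3 mod 4, where it proposes a wrong token."""
    token = target_next(prefix)
    return (token + 1) % 7 if len(prefix) % 4 == 3 else token


def greedy_decode(prompt: List[int], num_tokens: int) -> List[int]:
    """Reference: plain target-only decoding, one token per step."""
    out = list(prompt)
    for _ in range(num_tokens):
        out.append(target_next(out))
    return out[len(prompt):]


def speculative_decode(
    prompt: List[int], num_tokens: int, block_size: int
) -> Tuple[List[int], float]:
    """Return the generated tokens and the mean tokens emitted per verify step."""
    out = list(prompt)
    verify_steps = 0
    while len(out) - len(prompt) < num_tokens:
        # 1) The drafter proposes a block of candidate tokens.
        proposal: List[int] = []
        for _ in range(block_size):
            proposal.append(draft_next(out + proposal))
        # 2) The target checks the block (one batched pass in a real system).
        verify_steps += 1
        accepted: List[int] = []
        for token in proposal:
            expected = target_next(out + accepted)
            if token != expected:
                accepted.append(expected)  # replace the first wrong token
                break
            accepted.append(token)
        else:
            # Every draft token matched, so the target adds one bonus token.
            accepted.append(target_next(out + accepted))
        out.extend(accepted)
    emitted = len(out) - len(prompt)
    return out[len(prompt):len(prompt) + num_tokens], emitted / verify_steps


for k in (1, 2, 4, 8):
    tokens, mean_len = speculative_decode([1], 20, k)
    print(k, round(mean_len, 2), tokens == greedy_decode([1], 20))
```

Because every emitted token is the target's own greedy choice, the output matches target-only decoding for any block size. Only the number of verification steps changes, which is the quantity a block-size policy would try to improve.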
Scenes as Objects, Not Primitives: Instance-Structured 3D Tokenization from Unposed Views

A 3D scene is understood through its objects, not the primitives that compose them. Yet feed-forward reconstruction methods output dense, unstructured sets of points or Gaussians, leaving object-level structure to be recovered after the fact. We propose a feed-forward framework that decomposes a scene into instance-structured 3D token groups directly from unposed multi-view images -- compact object-centric units from which reconstruction, segmentation, and manipulation all follow. Each token group pairs an instance token capturing entity-level identity with anchor tokens that encode local geometry and appearance, which are decoded into a set of 3D Gaussians. This two-level factorization decouples object identity from local appearance, making object instances a native interface of the representation rather than a derived product. The token groups are learned through differentiable rendering with joint reconstruction and segmentation supervision, requiring no 3D annotations. Our feed-forward model surpasses per-scene optimization baselines in class-agnostic instance segmentation while remaining competitive in novel view synthesis. Beyond these metrics, the same token groups directly unlock instance-level scene editing -- removing, translating, or inserting objects by operating on their groups -- as well as efficient open-vocabulary 3D instance retrieval, where retrieval complexity scales with the number of instances rather than primitives.

28
GEAR: Guided End-to-End AutoRegression for Image Synthesis

Visual generative models are typically trained in two stages. A tokenizer is first trained for reconstruction and then frozen, after which a generator is trained on its discrete indices or continuous latents. This decoupling leaves the tokenizer unaware of what the generator finds easy to model. We present GEAR (Guided End-to-end AutoRegression), which trains a vector-quantized (VQ) tokenizer and an autoregressive (AR) generator jointly and end-to-end, guided by representation alignment. The key obstacle is that the VQ index fed to the AR model is non-differentiable, so gradients cannot reach the tokenizer, and a straight-through estimator collapses. GEAR resolves this with a dual read-out of the codebook assignment. A hard, one-hot branch trains the AR with next-token prediction, while a differentiable soft branch carries a representation-alignment loss that flows back to guide only the tokenizer. The AR model thereby steers its tokenizer toward an index distribution it can predict more easily. This shifts the alignment burden from the tokenizer to the AR: the tokenizer's own features become less DINOv2-like while the AR's become more so, the opposite of diffusion-side recipes that make the latent itself semantic. GEAR speeds up ImageNet gFID convergence by up to 10x relative to the strong LlamaGen-REPA baseline, learns markedly better patch-level and spatially-coherent features, and generalizes across quantizers (VQVAE, LFQ, IBQ) and to text-to-image generation.

25
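
The GEAR abstract describes a dual read-out of the codebook assignment: a hard one-hot index for the autoregressive model and a differentiable soft assignment for the alignment loss. The sketch below shows only the forward computation of such a pair. The abstract does not say how the soft branch is computed, so the softmax over negative squared distances, the `temperature` parameter, and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dual_readout(
    feature: Sequence[float],
    codebook: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> Tuple[int, List[float]]:
    """Return (hard_index, soft_assignment) for one encoder feature.

    hard_index: nearest codebook entry, the discrete token fed to the AR model.
    soft_assignment: assumed softmax over negative squared distances. In an
    autodiff framework this branch would stay differentiable with respect to
    the encoder feature, while the hard index would not.
    """
    distances = [squared_distance(feature, code) for code in codebook]
    hard_index = min(range(len(codebook)), key=distances.__getitem__)
    logits = [-d / temperature for d in distances]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    soft_assignment = [e / total for e in exps]
    return hard_index, soft_assignment


# Hypothetical 2-D codebook with three entries and one encoder feature.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
hard_index, soft_assignment = dual_readout([0.9, 0.1], codebook)
print(hard_index, [round(p, 3) for p in soft_assignment])
```

Gradient routing, the part that lets the alignment loss update only the tokenizer, depends on the training framework and is not shown here.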
SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History

Agent skills extend language-model agents with task-specific procedures, scripts, and references, but the tasks and environments they target continually change. Existing methods improve skills in bounded runs and retain only the final artifact, discarding the decision history that later agents need to interpret prior revisions, evaluations, and rejected alternatives. We introduce SkillHone, a harness for continual agent skill evolution grounded in persistent decision history. SkillHone pairs skill revisions with evaluation-side evidence that supplies practice feedback, recording structured histories of diagnoses, revisions, evidence, and outcomes. Role-separated subagents run candidate skills on practice probes with redacted reporting and propose revisions informed by prior decisions, enabling cross-session refinement without rediscovering past rationale. On deep-research benchmarks, SkillHone runs without a pre-integrated search stack and outperforms the commercially backed deep-research agent by 15.8 points on GAIA and 3.2 points on WebWalkerQA-EN, while also exceeding prior skill-evolution methods. We further deploy SkillHone on internal tool-mediated analysis scenarios, where it improves accuracy by an average of 18.8 points across seven settings.

19
Multi-Block Diffusion Language Models

Block Diffusion Language Models (BD-LMs) improve diffusion-based text generation with KV caching and flexible-length generation. A natural next step is to extend them from Single-Block Diffusion (SingleBD) to Multi-Block Diffusion (MultiBD), where a running-set of consecutive blocks is decoded concurrently for inter-block parallelism. However, existing BD-LMs are mostly trained under teacher forcing, where the model observes only one noisy block conditioned on a clean prefix. While the recent diffusion forcing strategy introduces visibility among multiple noisy blocks, its training states still differ from MultiBD inference, where decoding operates on a bounded running-set with heterogeneous slot-wise noise patterns. To bridge this gap, we propose Multi-Block Diffusion Language Models (MBD-LMs), obtained by post-training BD-LMs with Multi-block Teacher Forcing (MultiTF). MultiTF integrates teacher forcing and diffusion forcing by training on bounded noise-groups conditioned on clean prefixes, with randomized noise-schedulers that better match MultiBD inference states. To make MultiBD practically executable, we further introduce an optimized decoding algorithm based on the Block Buffer mechanism that preserves prefix-cache reuse, keeps input shapes static, and translates increased decoding parallelism into wall-clock acceleration. Empirically, MBD-LLaDA2-Mini increases average Tokens Per Forward pass (TPF) from 3.47 to 6.19 and improves average accuracy from 79.95% to 81.03%; when combined with DMax, MBD-LLaDA2-Mini-DMax reaches an average TPF of 9.34 with only a 1.02% accuracy drop on math and code benchmarks.

19
Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks

Would experience designing faster GPU kernels also help close in on a long-standing open mathematical conjecture? Large Language Models (LLMs) integrated into evolutionary search have recently produced state-of-the-art solutions on optimization tasks, including open mathematical conjectures, GPU kernel design, scientific law discovery, and combinatorial puzzles. To achieve this, prior work applied search scaffolds to one target task at a time, so every new problem is approached from scratch and the experience accumulated during search is discarded once the model finishes its attempt. This leaves the capability of iteratively evolving a solution (e.g., knowing which part to mutate and how, deciding when to backtrack) entirely in the scaffold rather than in the model itself. Whether the model itself could acquire this capability and reuse it across different tasks has been largely unexamined. To address this, we introduce Evolution Fine-Tuning (EFT), a mid-training paradigm that teaches LLMs to evolve solutions across tasks by converting evolutionary search trajectories into supervision. We construct Finch Collection, a 156K-trajectory dataset spanning 10 domains and 371 optimization tasks, and fine-tune open-source LLMs from 2B to 9B parameters. Empirically, EFT confers cross-task generalization: across 22 held-out tasks, our models surpass their base counterparts by 10.22% on average. Furthermore, when paired with test-time RL, our model matches state-of-the-art performance on two circle-packing tasks and outperforms its base-model counterpart on the Erdős minimum-overlap problem. EFT thus serves as a "practice phase" for general-purpose discovery agents that do not solve new problems from scratch.

19
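
The Evolution Fine-Tuning abstract contrasts capability that lives in a search scaffold with capability learned by the model. For readers unfamiliar with such scaffolds, here is a minimal sketch of a mutate, evaluate, and select loop on a toy one-dimensional problem. The function names, the Gaussian mutation, and the objective are illustrative assumptions. In the systems the abstract describes, an LLM proposes the mutations.

```python
import random
from typing import Callable, Tuple


def evolve(
    fitness: Callable[[float], float],
    start: float,
    steps: int = 200,
    step_size: float = 0.5,
    seed: int = 0,
) -> Tuple[float, float]:
    """Keep-the-best evolutionary search over a single real number."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    best, best_fitness = start, fitness(start)
    for _ in range(steps):
        candidate = best + rng.gauss(0.0, step_size)  # mutate the incumbent
        candidate_fitness = fitness(candidate)        # evaluate
        if candidate_fitness > best_fitness:          # select
            best, best_fitness = candidate, candidate_fitness
        # Otherwise the candidate is discarded and the next mutation
        # starts again from the incumbent (a simple form of backtracking).
    return best, best_fitness


def toy_fitness(x: float) -> float:
    """Hypothetical objective with its maximum at x = 3."""
    return -((x - 3.0) ** 2)


best_x, best_fitness = evolve(toy_fitness, start=0.0)
print(best_x, best_fitness)
```

The trajectory of accepted and rejected candidates in a loop like this is the kind of search experience the paper proposes converting into supervision.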
MemLearner: Learning to Query Context memory for Video World Models

Video World Models are interactive video generation models that predict future world states based on user actions and history video frames. A critical challenge in video world models is the lack of memory, causing inconsistent generated scenes over extended durations. Previous methods explored rule-based context frame retrieval as memory, but they fail to generalize in scenarios with scene occlusions and dynamic objects. We propose MemLearner, a learning-based adaptive context query method using query tokens to bridge context and predicted tokens. By leveraging the video generation model itself for context querying, MemLearner exploits pre-trained visual priors without training additional modules from scratch, and incorporates efficient strategies for training and inference. We collect a dataset of long videos with scene occlusions and dynamic objects, paired with camera pose annotations, and propose a multi-dataset training strategy leveraging both annotated rendered and unannotated real-world videos. Extensive experiments demonstrate that MemLearner significantly outperforms prior video world models in terms of scene consistency and memory, particularly under challenging occlusion and dynamic scenarios.

16
Managing Procedural Memory in LLM Agents: Control, Adaptation, and Evaluation

Procedural memory is increasingly used to improve LLM agents on recurring workplace tasks, yet its ability to produce reusable skills remains poorly understood. We introduce AFTER, a benchmark of 382 realistic enterprise tasks spanning six professional roles and 22 procedural skills, designed to evaluate how skills transfer across tasks, roles, and model backbones. The benchmark includes controlled evaluation settings for local improvement, cross-task transfer, cross-role transfer, and cross-model generalization. Experiments show that procedural memory delivers consistent gains in industrial workflows: a single refinement round improves aggregate performance by 3.7-6.7 points, while skills evolved from diverse multi-model execution traces achieve 73.1% cross-model test accuracy, outperforming all single-model trace sources. We further find that some skills generalize broadly across tasks and models, whereas others become specialized to role-specific workflows and lose effectiveness under transfer. These results provide practical guidance for building, evaluating, and deploying procedural memory systems in production agent platforms.

15
DataEvolver: Self-Evolving Multi-Agent Data Construction for Text-Rich Image Generation

Text-rich image generation is one of the most challenging settings in image generation, since models must simultaneously produce visually realistic images and render legible, semantically aligned, and layout-consistent text. Existing data pipelines usually follow a static crawl-filter-freeze paradigm. They collect candidate samples, filter them once, and freeze the accepted data for training. However, rejected samples are usually discarded, although they often contain useful failure signals such as OCR errors and semantic mismatches. As a result, later construction rounds may repeat the same failure modes. To address these limitations, we propose DataEvolver, a self-evolving multi-agent framework for text-rich image data construction. DataEvolver treats data construction as feedback-driven construction policy evolution. A Retriever collects candidate samples, a Verifier assigns quality scores and rejection causes, a Critic summarizes round-level feedback into semantic feedback, and a Generator completes under-covered regions through targeted synthesis. The updated feedback memory then guides the next construction round. Experiments on text-rich image generation benchmarks show that DataEvolver produces more useful training data than fixed-dataset baselines under matched data budgets. At the 0.75M scale on PixArt-alpha, DataEvolver improves OCR-F1 over the strongest baseline by 85.3 percent on TextScenesHQ and 35.3 percent on LongTextBench. The improvements are consistent across both evaluated benchmarks and also transfer to Show-o2, indicating that the benefit of DataEvolver is not tied to a single downstream generator. These results suggest that rejected samples can provide actionable feedback for improving text-rich image data construction.

14
RedVox: Safety and Fairness Gaps in Speech Models Across Languages

Speech-capable models are increasingly deployed in real-world applications across languages. Yet their safety and fairness beyond English settings and under naturalistic conditions remain understudied. We survey safety reporting practices across state-of-the-art speech model releases, finding that only 8% document any multilingual analysis. To address this gap, we introduce RedVox, a multilingual safety and fairness benchmark for audio and speech built on real voices, covering unsafe and unfair stereotypical requests across five languages (English, French, Italian, Spanish, and German). Evaluating eight state-of-the-art models, we find that vulnerabilities persist even under non-adversarial conditions, worsen in non-English languages, and are amplified when the request comes from a spoken input. Finally, by surveying the participants who contributed to RedVox, we document the unique personal and privacy challenges of collecting speech data with human participants, pointing to broader sociotechnical challenges in naturalistic speech safety research.

11
Little Brains, Big Feats: Exploring Compact Language Models

While large language models have been dominating the research landscape recently, small language models remain highly relevant across various domains; yet, they receive far less attention. In this study, we investigate how smaller language models perform during the generation stage within a Retrieval-Augmented Generation (RAG) system. To benchmark these models effectively, we utilised both open-source and proprietary datasets covering diverse subject areas and question types. Our findings demonstrate that a RAG system with small language models can be executed directly on-device without requiring any GPU hardware within a reasonable time. The experimental code and links to the supplementary materials can be accessed through the GitHub repository: https://github.com/SibNN/SLM-RAG-EVAL.

9
Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

Metacognition is a critical component of intelligence that describes the ability to monitor and regulate one's own cognitive processes. Yet LLMs exhibit systemic deficiencies in key metacognitive faculties: they hallucinate with high confidence, fail to recognize knowledge boundaries, and misrepresent their internal uncertainty--undermining trustworthiness and reliability. Since monitoring task performance and adapting behavior accordingly are central to metacognition, we posit that models capable of accurately judging their own performance are better positioned to improve it. We operationalize this idea via two novel mechanisms: reinforcement learning with metacognitive feedback (RLMF), a paradigm to refine completion rankings during preference optimization based on the quality of a model's self-judgments of performance, and metacognitive data selection, which uses similar self-judgments to identify high-value training examples, outperforming naive active learning. We apply these innovations to the problem of faithful calibration (FC), a task that is itself fundamentally metacognitive: the goal is to align expressed with intrinsic uncertainty, difficult even for frontier LLMs. We adopt a two-stage, decoupled approach, first using these methods to calibrate the faithfulness of models' self-reported confidence scores, then mapping to natural, context-adaptable linguistic uncertainty via targeted output editing. Extensive experiments show RLMF achieves generalizable, state-of-the-art FC on diverse tasks while preserving accuracy. Further, RLMF surpasses standard RL by up to 63% while enhancing models' ability to assess and express their own capability limits. This positions RLMF as a promising paradigm to enhance LLM metacognition toward improved abilities and alignment, and suggests metacognitive performance as an effective RL signal to overcome limits of prior intrinsic feedback methods.

9
PolyFlow: Continuous Topology Embedding Flow Matching for Artist-style Mesh Generation

Autoregressive Transformers dominate high-quality mesh generation by producing artist-worthy topologies, yet their inherent sequential decoding induces substantial computational overhead, falling orders of magnitude slower than parallel generative models. On the other hand, while continuous diffusion and flow-matching methods support efficient parallel synthesis across a variety of domains, they cannot be directly applied to meshes: mesh connectivity is inherently discrete and incompatible with standard continuous noise injection and denoising operations. To resolve this fundamental incompatibility, we introduce a compact topology embedder that projects discrete mesh vertex positions and normals into continuous per-vertex embeddings, where the original discrete adjacency information can be faithfully recovered via spacetime distance thresholding. After pretraining and freezing this embedder, any raw mesh can be fully converted into a continuous per-vertex state space unifying position, normal, and implicit topological attributes. Built upon this novel continuous mesh representation, we present PolyFlow, a Transformer-based flow-matching framework that achieves fully parallel vertex state denoising conditioned on extracted point-cloud features. During inference, our model completes generation rapidly via an ODE solver, and supports explicit, precise control over output mesh resolution by directly specifying the target vertex count. Extensive evaluations on the Toys4K benchmark demonstrate that PolyFlow surpasses state-of-the-art autoregressive baselines in both Chamfer Distance and Hausdorff Distance.

8
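
PolyFlow is evaluated with Chamfer Distance and Hausdorff Distance. The sketch below shows one common definition of each on small point sets. The abstract does not state which variant the authors use (squared or unsquared, summed or averaged), so the choices here are assumptions, and the point sets are made up.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbor distance
    from a to b plus the same quantity from b to a."""
    def mean_nearest_sq(src: Sequence[Point], dst: Sequence[Point]) -> float:
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return mean_nearest_sq(a, b) + mean_nearest_sq(b, a)


def hausdorff_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the worst-case nearest-neighbor distance."""
    def directed(src: Sequence[Point], dst: Sequence[Point]) -> float:
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


# Hypothetical point sets: the second has one extra point at distance 1.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
generated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
chamfer = chamfer_distance(reference, generated)
hausdorff = hausdorff_distance(reference, generated)
print(chamfer, hausdorff)
```

Chamfer averages over all points and so rewards overall fit, while Hausdorff is set by the single worst outlier.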
Unlocking the Visual Record of Materials Science: A Large-Scale Multimodal Dataset from Scientific Literature

The materials science literature encodes decades of experimental knowledge in figures, yet this visual record remains locked away and inaccessible to AI at scale. The core difficulty is structural: most scientific figures are compound, with a single caption describing multiple sub-panels simultaneously, making direct image-text pairing unreliable. We present MatMMExtract, an end-to-end open-source pipeline that resolves this by decomposing compound figures into individual sub-panels and generating structured, grounded annotations using a large language model guided by a curated materials science taxonomy. Applied to 14,810 open-access articles, MatMMExtract produces MatSciFig: 391,606 panel-level image-text pairs from 180,571 figures, each annotated with a sub-caption, a two-level visualisation category spanning 19 classes and over 100 subtypes, and a scientific summary. To enable accurate panel localisation, we introduce MaterialScope, a domain-specific detection dataset of 2,811 manually annotated materials science figures, on which a fine-tuned YOLO12-m detector achieves mAP_50 of 0.9227. Among six benchmarked language models, Gemini 3.1 Flash Lite delivers the best cost-quality trade-off for annotation generation, with 82% of outputs rated good and a hallucination rate of 4.8%. A dual-encoder retrieval baseline on MatSciFig achieves a 4.4 times improvement in R@1 over zero-shot CLIP, demonstrating the dataset's immediate utility for vision-language learning. All resources are released openly to the community.

7
Xiaomi-GUI-0 Technical Report

Graphical user interface (GUI) agents build on vision-language models to complete user tasks end-to-end in real applications through interface actions such as tapping, swiping, text entry, and navigation. However, existing GUI agents are trained and evaluated largely on offline trajectories, simulated environments, and standardized benchmarks. These differ substantially from real applications in interface layout, interaction logic, and abnormal-state distribution, and cannot faithfully characterize execution stability in real-world use, where account states, permission dialogs, payment authentication, and risk control continually reshape the state distribution and open a persistent gap between benchmark scores and real usability. To close this gap, we propose Xiaomi-GUI-0, a native multimodal GUI agent for real mobile environments, trained and evaluated within a real-device closed loop. At its core is a real-device-dominant hybrid infrastructure, where physical devices are the primary execution environment and sandboxes provide auxiliary support, so that data collection, training, rollout, and evaluation share an execution distribution close to real deployment. We construct multi-source training data spanning high-frequency head tasks, high-generalization data for long-tail intents, and capability-enhancement data for reflection and memory, and introduce an error-driven data flywheel that turns failure trajectories into corrected actions, reflective explanations, and recovery demonstrations. The model is trained through a progressive three-stage pipeline of supervised fine-tuning, step-level reinforcement learning, and agentic reinforcement learning. Evaluated on public benchmarks and our in-house RealMobile, Xiaomi-GUI-0 achieves 72.0% success on RealMobile and 78.9% on AndroidWorld, while substantially improving execution stability and abnormal-state recognition in real-world tasks.

6
QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents

LLM agents increasingly act over long horizons, where a single trajectory can contain hundreds or thousands of actions. In these settings, outcome-only rewards provide too sparse guidance, failing to inform the model about the goodness of intermediate actions. Dense supervision methods aim to solve this problem by scoring intermediate steps, from intrinsic confidence to self-distillation and embedding similarities. However, it is common practice to evaluate them by measuring the downstream performance of a training pipeline that integrates them. This is expensive, conflates supervision quality with training engineering confounders, and renders different methodological families requiring distinct training setups incomparable. As a result, dense supervision methods are rarely benchmarked on common ground. We introduce QVal, a training-free testbed for directly evaluating dense supervision signals. Given a state-action pair, QVal measures how well a method's score is Q-aligned: whether it orders actions according to the Q-values of a strong reference-policy. This lets us compare signals before any training run and separate signal quality from other engineering choices. We instantiate QVal as QVal-v1.0, benchmarking 21 dense supervision methods across four diverse environments and seven methodological families, with over 1.2K evaluation experiments across six open-weight model backbones. We find that simple prompting baselines consistently outperform recent dense supervision methods from the literature, and that performance clusters strongly by family. These findings hold across model sizes, environments, and observation modalities. QVal is designed to be easily extensible to new environments and methods, enabling researchers to iterate on dense supervision methods before any training run.

5
BrainJanus: A Unified Model for Understanding and Generation across Brain, Vision, and Language

Modeling the bidirectional correspondence between external sensory stimuli and internal neural activity has emerged as a critical frontier in neuroscience. However, existing approaches predominantly treat brain encoding and decoding as isolated tasks, relying heavily on unimodal alignment and external priors while overlooking the brain's intrinsic nature as a multimodal integration system. To address these limitations, we propose BrainJanus, the first unified brain model that integrates brain, vision, and language within a single framework. Specifically, we introduce a Unified Brain Tokenizer to quantize continuous neural dynamics into discrete tokens aligned with visual and linguistic representations in a shared Omni space. Building on this, we utilize an All-in-One autoregressive architecture that leverages next-token prediction to enable seamless any-to-any generation, which encompasses image-to-brain and text-to-brain encoding, and brain-to-image and brain-to-text decoding. Extensive experiments demonstrate that BrainJanus achieves superior performance across diverse benchmarks. Furthermore, our framework exhibits zero-shot generalization and preserves interpretable biological topography, highlighting its potential as a general-purpose brain modeling paradigm. The code is available at https://github.com/HaitaoWuTJU/BrainJanus.

4
AVTok: 1D Unified Tokenization for Holistic Audio-Video Generation

Audio-video generation has recently gained unprecedented research attention, aiming to synthesize high-quality sounding video content with fine-grained synchronization and semantic alignment between the auditory and visual components. The preceding methods predominantly adopt a dual-branch design with separate tokenization and generation modules per modality, neglecting the representation gap while necessitating intensive computational resources for proper training. Inspired by recent advancements in one-dimensional visual tokenization, we present AVTok, a novel unified tokenizer designated for holistic audio-video generation. AVTok features a dual-stream transformer-based architecture with shared encoder-decoder and modal-specific learnable queries to efficiently and effectively encode an audio-video pair into a compact one-dimensional latent representation with a unified codebook. To cope with the heterogeneous information imbalance that hinders AVTok from exploiting aligned audio-visual information, we devise a hierarchical training strategy to progressively realize reconstruction capabilities for each modality. Extensive experiments demonstrate that AVTok excels both in audio-video reconstruction and when integrated into downstream pipelines for audio-to-video, video-to-audio, and class-conditional joint audio-video generation. AVTok paves the way for the challenge of joint audio-video tokenization and provides a potential direction to build unified large multimodal models for audio-video generation.

3
MuSViT: A Foundation Vision Model for Sheet Music Representation

Foundation models have transformed vision and language processing by providing rich, reusable representations that transfer across diverse tasks. Sheet music, as a visual encoding of musical language, lacks such a strong domain-specific backbone. We introduce MuSViT (Music Score Vision Transformer): the first foundation vision model for sheet music representation -- a ViT encoder pre-trained via Masked Autoencoders on 9.7 million pages from the IMSLP. To handle the complexity of real-world scores, we adopt a two-stage curriculum: a synthetic warm-up on typeset scores followed by large-scale training on the full IMSLP corpus. We evaluate MuSViT on four downstream tasks -- full-page and staff-level music score recognition, music symbol detection, and score difficulty classification -- under two scenarios: linear probing (frozen encoder) and fine-tuning. Under linear probing, MuSViT consistently outperforms modern vision encoders, revealing that general-purpose representations, regardless of scale, fall systematically short on the structured symbolic properties of musical notation. Under fine-tuning, MuSViT generally improves upon task-specific state-of-the-art methods. An additional embedding-transcription consistency analysis reveals that MuSViT encodes symbolic musical structure directly in its representation space -- unlike other encoders, whose embeddings do not correlate with music notation content. These results establish MuSViT as a foundation backbone for sheet music understanding.

2
PhotoQuilt: Training-Free Arbitrary-Resolution Photomosaics via Bootstrapped Tiled Denoising

Photomosaics are large images whose local regions are seen as independent tiles while their overall arrangement forms a coherent scene. Generating them at high resolution, with every tile convincing in its own right, is computationally expensive, since the canvas must hold many detailed tiles at once. We present PhotoQuilt, a training-free framework that generates photomosaics at arbitrary resolution. Diffusion models struggle to satisfy both scales at once, as direct high-resolution generation is costly and tends toward one smooth image rather than a mosaic, while patch-based tiling keeps local detail but loses global structure. PhotoQuilt resolves this with a bootstrapped tiled denoising procedure. We first produce a global composition at low resolution to fix the layout, then upscale it in latent space and re-inject noise to restore generative capacity. Denoising proceeds within fixed tiles, so each forms its own image while the shared global structure holds them in one layout. Because tile generation is handled separately, PhotoQuilt scales to large canvases without quadratic attention cost. Experiments show that PhotoQuilt outperforms current baselines on both global structure and local realism.

2
LUMOS: A Semantic Operating-System Layer for Accessibility-Grounded AI Agents

Current operating systems expose interfaces optimized for human users but not for AI agents. Humans benefit from pixels, icons, windows, visual grouping, mouse movement, and keyboard shortcuts; AI agents instead need compact semantic state, grounded actions, and reliable feedback. As a result, many computer-use agents are forced to interpret screenshots, OCR output, and visual crops, introducing high token costs, visual ambiguity, latency, and coordinate uncertainty. This paper introduces LUMOS (Language Model Unified Machine-Readable Operating-System Semantics), a semantic interaction layer between AI agents and operating systems. LUMOS converts native accessibility metadata and browser UI structures into machine-readable semantic blueprints with stable identifiers, roles, names, values, bounds, and action affordances. It also supports live semantic pointer grounding by querying the UI element under or near the cursor through operating-system automation APIs. An LLM then acts through an accessibility-grounded observe-act loop using constrained visible-UI primitives rather than application-specific scripts. LUMOS does not claim to replace visual agents; instead, it reduces dependence on screenshots when operating systems already provide semantic structure. These results suggest a path toward AI-native operating systems and machine-readable interaction layers.
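
As an illustration of what a "semantic blueprint" entry could look like, here is a minimal sketch. The abstract lists the fields (identifier, role, name, value, bounds, action affordances) but not a schema, so the class `UIElement`, its field names, and the helpers are hypothetical and are not the LUMOS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UIElement:
    """One blueprint entry; the layout is an assumption, not the LUMOS schema."""
    element_id: str                 # stable identifier the agent can refer to
    role: str                       # e.g. "button", "textbox"
    name: str                       # accessible name
    value: str = ""                 # current value, if any
    bounds: tuple = (0, 0, 0, 0)    # x, y, width, height in screen pixels
    actions: tuple = ()             # action affordances such as "press"


def actionable(blueprint, action):
    """Return the ids of the elements that support the given action."""
    return [e.element_id for e in blueprint if action in e.actions]


def to_prompt_line(e):
    """Render one element as a compact text line for an LLM prompt."""
    return f'[{e.element_id}] {e.role} "{e.name}" value={e.value!r} actions={",".join(e.actions)}'


# Hypothetical blueprint for a two-element search form.
blueprint = [
    UIElement("e1", "textbox", "Search", "", (10, 10, 200, 24), ("focus", "set_value")),
    UIElement("e2", "button", "Go", "", (220, 10, 40, 24), ("press",)),
]

for element in blueprint:
    print(to_prompt_line(element))
```

A few short text lines like these stand in for a screenshot, which is where the reduction in token cost and coordinate uncertainty described above would come from.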

2
Goku: A Million-Scale Universal Dataset and Benchmark for Instruction-Based Video Editing

Existing instruction-based video editing datasets commonly focus on single-task appearance editing, failing to meet the complex creative demands of real-world scenarios. To bridge this gap, we present Goku, a large-scale dataset featuring 2 million high-quality, instruction-aligned video editing pairs, which is the first to extend task boundaries from basic appearance editing to multi-task and structural manipulations (e.g., precise control of subject movement). To tackle the data synthesis challenges inherent in these complex tasks, we design an efficient data synthesis pipeline that decomposes complex edits into controllable sub-problems and introduce a progressive filtering system for data reliability throughout the whole process. Furthermore, we explore the optimal network structures on Goku, and propose Goku-Edit. To deeply comprehend complex editing instructions, Goku-Edit leverages an MLLM as its text encoder and adopts a decoupled dual-branch design: a dedicated mask branch handles structural control, freeing the main branch for appearance rendering. A comprehensive video editing benchmark, Goku-Bench, is also proposed with 1,000 human-verified test cases and 7 novel editing-specific metrics. Evaluated on Goku-Bench, Goku-Edit obtains up to +8% improvement over other open-source models in terms of instruction following.

1
TerraDiT-Ω: Unified Spatial Control for Satellite Image Synthesis with Any Geospatial Primitive

Generative models have achieved remarkable progress, yet applying them to satellite imagery remains challenging. Unlike natural imagery, satellite scenes are structured by spatially complex and semantically distinct geometries. Prior work addresses this complexity by adapting natural image frameworks using dense rasters or sparse prompts, trading off annotation cost and fidelity while breaking compatibility with vector primitives commonly used to represent geographic information. We introduce TerraDiT-Ω, a unified spatial control framework that generates satellite imagery directly from any native geospatial primitive. By jointly leveraging precise annotations (polygons, polylines) and coarser ones (bounding boxes, points), the model supports controllable layouts across varying annotation budgets, broadening applicability to design tasks such as urban planning while remaining naturally compatible with end-to-end GeoAI workflows. To effectively leverage these primitives during generation, we propose Geometry-Aware Local Attention, a conditioning mechanism that injects explicit geometric cues into the attention space. Across all conditioning formats, our approach consistently outperforms both dense-control and sparse-control baselines. Furthermore, this flexibility enables controllable synthetic data augmentation using a single generative model, improving downstream performance on land-cover segmentation, object detection, road graph extraction, and scene classification. Code, data, and weights are available at https://github.com/mvrl/TerraDiT.

1
MOPD: Multi-Teacher On-Policy Distillation for Capability Integration in LLM Post-Training

Modern large language models (LLMs) rely on reinforcement learning during post-training to push specific capabilities, yet integrating multiple capabilities into one model remains hard. Existing methods, such as Off-Policy Finetune and Mix-RL, are either inefficient or lose performance. In this work, we propose Multi-teacher On-Policy Distillation (MOPD), a post-training paradigm for combining the capabilities of multiple domain RL teachers: we first run per-domain specialised RL to obtain a set of domain teachers, then distill these teachers into the student on its own rollouts. This eliminates exposure bias and provides a dense optimization signal. On Qwen3-30B-A3B, MOPD outperforms Mix-RL, Cascade RL, Off-Policy Finetune, and Param-Merge baselines, inheriting nearly all of each teacher's capability. MOPD also enables parallel, independent development of domain teachers, removing the cross-domain coupling typical of multi-domain post-training. MOPD has been deployed in the post-training of MiMo-V2-Flash, an industrial-scale frontier model, demonstrating its practical value for capability integration in frontier-scale LLMs.

0
FlexiSLM: A Dynamic and Controllable Frame Rate Spoken Language Model

Spoken language models (SLMs) extend LLMs to speech input and output. Existing SLMs represent speech at fixed frame rates (e.g., 25 or 12.5 Hz), ignoring the time-varying information density of speech and offering no flexibility to trade off quality for speed at inference time. Recent audio tokenizer research has proposed dynamic frame rate speech coding, which exploits this non-uniformity and enables two new capabilities: very low average frame rates and frame rate controllability. However, this technique has not yet been applied to SLMs. We introduce Flexible Spoken Language Model (FlexiSLM), the first SLM that supports dynamic and controllable frame rates on both speech input and output. Using dynamic frame rate representations, FlexiSLM outperforms fixed-frame-rate 7B models including Qwen2.5-Omni and Kimi-Audio at its high-quality operating points. We further verify that FlexiSLM can be accurately steered down to 4.0 Hz; at 6.25 Hz, it roughly halves inference time relative to 12.5 Hz while retaining strong speech-to-speech quality. Audio samples are available at https://flexislm.github.io.
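
The frame-rate arithmetic behind the speed trade-off can be sketched as follows. This assumes the number of speech frames is simply duration times average frame rate; the helper names are hypothetical, and the link between frame count and inference time is only the rough one the abstract reports.

```python
def frames_for(duration_s: float, frame_rate_hz: float) -> float:
    """Average number of speech frames for a clip at a given frame rate."""
    return duration_s * frame_rate_hz


def relative_length(frame_rate_hz: float, baseline_hz: float) -> float:
    """Sequence length relative to a baseline frame rate."""
    return frame_rate_hz / baseline_hz


# Frame rates mentioned in the abstract, applied to a hypothetical 10-second clip.
for hz in (25.0, 12.5, 6.25, 4.0):
    frames = frames_for(10.0, hz)
    ratio = relative_length(hz, 12.5)
    print(f"{hz:5.2f} Hz -> {frames:6.1f} frames ({ratio:.2f}x the 12.5 Hz length)")
```

At 6.25 Hz the sequence is half as long as at 12.5 Hz, which is consistent with the roughly halved inference time reported above.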

0
05

PRODUCT HUNT

05.00
PRODUCT HUNT

Product Hunt - July 1, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

LightTwist

Record & stream your show in a realistic virtual studio

0
Clusy

AI notebook platform for modern data science

0
Stigg 2.0

The usage runtime for AI products

0
N71

Give all your AI agents one shared context

0
Browser Notes

Your ideas, organized - not uploaded

0
Humalike

Give your AI agents the social intelligence they're missing

0
Aruki

The Japanese walking method, coached on your iPhone

0
Gemini Omni Flash

High-quality video generation and conversational editing

0
Livinity

Open-source homeserver OS with a built-in AI agent

0
Acti

Agentic keyboard for mobile commands and search

0
Fuser Apps

Vibecode apps, sites, & games on everyone's favorite canvas

0
MailAdept by mailwarm

AI Agents & Email deliverability experts on your team

0
Adam CAD Copilot

AI CAD inside Onshape and Fusion

0
Tabstack Browser Automation

Automate the web in your app or agent, no browser to host

0
RunInfra

Describe the AI model you need and get an optimized AI

0
Mark by Airtop

Vibe automation for solo marketers

0
Metal

AI-driven operating system for raising venture rounds

0
Sequence Agentic

Money movement for AI agents

0
Claude Science

Your research partner for rigorous science

0
OASIS 1 Ring

Whisper to write and touch to edit

0
Modelence Mobile Builder

Build mobile apps by chatting with AI

0
Dump Memory

We fix your memory

0
Saldor

Speed up procurement and AP.

0
Wins 3.4 — Snap Island for Mac

Snap, switch, and arrange Mac windows from the notch

0
Ciaro Pro

AI filmmaking for visual storytellers

0
Folderly Lens

Domain health analysis for high performance email campaigns

0
Loot

Collect your favorite things in real life

0
Claude Sonnet 5

AI that plans, acts, and gets work done

0
Bamboo

Markdown notes with AI under your control

0
Get Transparent Pricing on Labs

The right tests. The real price. Nothing extra.

0
Load Nova

An AI co-pilot and dashboard built for dispatcher speed

0
Pluno

Browser agent that’s 10x faster than Claude

0
Cursor for iOS

Build with coding agents from anywhere

0
Foresight by Lightning Rod

Predict anything with AI

0
AgentPeek

Claude Code & Codex in your Mac notch

0
Oakamo

Your quiet space for reading articles later.

0
v0 Design Systems 2.0

Build with your components, colors, fonts, and patterns

0
Midway Chat

Real-time member chat for Memberstack and Webflow sites

0
Brain2Qwerty v2

Decode sentences directly from non-invasive brain signals

0
Tinkerfont

Free font playground for live websites

0
DropK

The tray that doesn't pretend

0
Clade

AI COO that runs your team in tools you already use

0
Akiflow

Manage tasks and calendars from Claude, ChatGPT or Cursor

0
Supafax

Email-native assistant that learns how you work

0
iVox

The first app dedicated to 1980s tape-edit effects.

0
Bilt.me - Figma

Get a real mobile app from your Figma design

0
Justwrite

A private, local-first writing space that works offline

0
Skills Marketplace by Databox

Ready-made AI analytics skills for your business data

0
Dayflow

Open source tools that help you get promoted

0
Crest

System stats and translation on your Mac's notch

0
06

TECHMEME

06.00
TECHMEME

Techmeme - July 1, 2026

Techmeme Digest: Major tech headlines and industry conversations.

Abu Dhabi-based MGX raised a $49B AI-focused fund, exceeding its $45B target, and plans to spend as much as $10B annually over the next few years (Dinesh Nair/Bloomberg)
Source: Techmeme · Published: Jul 1, 2026

Dinesh Nair / Bloomberg : Abu Dhabi-based MGX raised a $49B AI-focused fund, exceeding its $45B target, and plans to spend as much as $10B annually over the next few years —  MGX has raised $49 billion for one of the biggest ever funds dedicated to artificial intelligence deals, propelling the two-year-old Abu Dhabi firm …

Letter: the US says Anthropic "agreed to proactively detect and address security risks" of Fable 5 and Mythos 5; a source says Anthropic developed a "safeguard" (Financial Times)
Source: Techmeme · Published: Jul 1, 2026

Financial Times : Letter: the US says Anthropic “agreed to proactively detect and address security risks” of Fable 5 and Mythos 5; a source says Anthropic developed a “safeguard” —  US government move allows AI start-up to re-release Mythos and Fable models

Sony says it is closing the virtual PlayStation 3 store in select markets in 2026 ahead of global store closures for the PS3 and PS Vita in July 2027 (Jess Weatherbed/The Verge)
Source: Techmeme · Published: Jul 1, 2026

Jess Weatherbed / The Verge : Sony says it is closing the virtual PlayStation 3 store in select markets in 2026 ahead of global store closures for the PS3 and PS Vita in July 2027 —  The virtual PS3 store will close in select markets this year, with global closures for PS3 and PS Vita following in 2027.

Sources: Meta is developing plans for a cloud infrastructure business that will sell access to AI computing power and models, to compete with AWS and Azure (Bloomberg)
Source: Techmeme · Published: Jul 1, 2026

Bloomberg : Sources: Meta is developing plans for a cloud infrastructure business that will sell access to AI computing power and models, to compete with AWS and Azure —  Meta Platforms Inc. is developing plans for a cloud infrastructure business that will sell access to AI computing power and models …

A researcher says a vulnerability in Apple's Hide My Email tool lets anyone discover a real email address; first reported in June 2025, Apple is yet to fix it (Joseph Cox/404 Media)
Source: Techmeme · Published: Jul 1, 2026

Joseph Cox / 404 Media : A researcher says a vulnerability in Apple's Hide My Email tool lets anyone discover a real email address; first reported in June 2025, Apple is yet to fix it —  “Hide My Email users deserve to know that it may be possible for attackers to discover their hidden email addresses,” the person who reported the issue said.

Taiwanese authorities detain two Super Micro staff and an Albatron manager after a raid of Super Micro's local offices this week over Nvidia shipments to China (Bloomberg)
Source: Techmeme · Published: Jul 1, 2026

Bloomberg : Taiwanese authorities detain two Super Micro staff and an Albatron manager after a raid of Super Micro's local offices this week over Nvidia shipments to China —  Taiwanese prosecutors detained two Super Micro Computer Inc. employees following a raid of the US company's local offices earlier …

Sony says all new PlayStation games from both first- and third-party developers will be sold in digital formats from January 2028, ending physical game discs (Stephen Totilo/Game File)
Source: Techmeme · Published: Jul 1, 2026

Stephen Totilo / Game File : Sony says all new PlayStation games from both first- and third-party developers will be sold in digital formats from January 2028, ending physical game discs —  The end of an era, a potential fatal blow to video games as physical media, and a hint of what's in store for the PS6?

Stockholm's Patent and Market Court orders Google to pay nearly $2B to Klarna's PriceRunner in a dispute over abuse of power in the shopping comparison market (Bloomberg)
Source: Techmeme · Published: Jul 1, 2026

Bloomberg : Stockholm's Patent and Market Court orders Google to pay nearly $2B to Klarna's PriceRunner in a dispute over abuse of power in the shopping comparison market —  Alphabet Inc.'s Google was ordered to pay almost $2 billion to Klarna Group Plc's Pricerunner unit in a dispute …

Twelve Labs, which is building AI models to make video searchable and understandable, raised a $100M Series B co-led by NEA and Naver, and signs an AWS deal (Saritha Rai/Bloomberg)
Source: Techmeme · Published: Jul 1, 2026

Saritha Rai / Bloomberg : Twelve Labs, which is building AI models to make video searchable and understandable, raised a $100M Series B co-led by NEA and Naver, and signs an AWS deal —  Twelve Labs Inc. is raising $100 million from investors including Amazon.com Inc., NEA Management Co. and Naver Ventures …

Together AI, which offers access to open-source models, raised $800M led by Saudi Aramco's Prosperity7 at an $8.3B valuation, taking its total funding to $1.3B (Niko Gallogly/New York Times)
Source: Techmeme · Published: Jul 1, 2026

Niko Gallogly / New York Times : Together AI, which offers access to open-source models, raised $800M led by Saudi Aramco's Prosperity7 at an $8.3B valuation, taking its total funding to $1.3B —  Together AI, which specializes in open-source artificial intelligence models, is now worth more than $8 billion.

Wayve files to sell shares on the London Stock Exchange's new Private Securities Market, the first major company to test it, and let staff sell $85M in shares (Bloomberg)
Source: Techmeme · Published: Jul 1, 2026

Bloomberg : Wayve files to sell shares on the London Stock Exchange's new Private Securities Market, the first major company to test it, and let staff sell $85M in shares —  Autonomous driving software company Wayve Technologies Ltd. has filed to sell shares on the London Stock Exchange's new Private Securities Market …

A UN panel co-chaired by Yoshua Bengio warns that AI capabilities are outpacing scientific understanding, the "potential benefits of AI are enormous", and more (Andrea Shalal/Reuters)
Source: Techmeme · Published: Jul 1, 2026

Andrea Shalal / Reuters : A UN panel co-chaired by Yoshua Bengio warns that AI capabilities are outpacing scientific understanding, the “potential benefits of AI are enormous”, and more —  The rapid development of AI offers huge potential benefits to countries and people around the world, but also poses big risks …

How the rapid rise in US- and China-made AI abilities is leading to both a transformation in AI use at work, and sudden lurches in policies and markets (Ethan Mollick/One Useful Thing)
Source: Techmeme · Published: Jul 1, 2026

Ethan Mollick / One Useful Thing : How the rapid rise in US- and China-made AI abilities is leading to both a transformation in AI use at work, and sudden lurches in policies and markets —  How work changes along the exponential  —  If you feel like things are accelerating in AI, you are probably right.

Anthropic says it is rolling back a covert Claude Code tracking feature to identify users based in China or affiliated with Chinese AI labs, after backlash (Juro Osawa/The Information)
Source: Techmeme · Published: Jul 1, 2026

Juro Osawa / The Information : Anthropic says it is rolling back a covert Claude Code tracking feature to identify users based in China or affiliated with Chinese AI labs, after backlash —  Anthropic is backtracking a spyware rolled out covertly to track users' location and whether they are based in China or affiliated …

How a new Amazon-built transatlantic fiber-optic link in Ireland is symbolic of the country's AI ambitions, but also of its chronic lack of defense spending (Bloomberg)
Source: Techmeme · Published: Jul 1, 2026

Bloomberg : How a new Amazon-built transatlantic fiber-optic link in Ireland is symbolic of the country's AI ambitions, but also of its chronic lack of defense spending —  A new Amazon transatlantic fiber-optic link is symbolic of the country's tech economy, but also of its chronic lack of defense spending.

07

STARTUP ARCHIVE

07.00
STARTUP ARCHIVE

Startup News - July 1, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction.”

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

“Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot.”

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice.”

Paul Graham on why starting with a “small, intense fire” is the key to startup growth
Source: Startup · Published: Mar 20, 2026

"You have to know who those first users are and how you're going to get them."

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth.”

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?”

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations”

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over.”

Spencer Rascoff: “I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

“If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive.”

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right”

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT

08.00
SOLIDOT

Solidot News - July 1, 2026

Solidot Feed: Highlighting essential tech & open-source news.

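LHC's Third Long Shutdown

CERN has announced the third long shutdown (Long Shutdown 3) of the LHC. The maintenance and upgrade work will prepare for the next phase of operation, the High-Luminosity LHC (HiLumi LHC). The LHC was built between 1998 and 2008 and began operating in 2009. It first achieved particle collisions at 3.5 TeV in 2010, and the discovery of the Higgs boson was announced in 2012. The first maintenance and upgrade period, from 2013 to 2015, raised the total collision energy to 13.0 TeV; the second ran from late 2018 to April 2022. The third shutdown will be the largest upgrade to date. HiLumi LHC is scheduled to begin operating in 2030 with luminosity increased by up to a factor of ten, which will let researchers collect much larger datasets, study the Higgs boson more precisely, and improve the chances of discovering phenomena beyond the Standard Model.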

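Researchers Turn Stem Cells into Human Primary Oocytes

US biotech startup Conception announced that it has successfully induced stem cells to become human primary oocytes, calling it a major scientific advance. Scientists have already produced eggs from stem cells in mice. Researchers first converted mouse skin cells into induced pluripotent stem cells (iPSCs) and then into usable eggs. Those eggs produced healthy pups that had normal lifespans, reproduced naturally, and had healthy offspring of their own. The process, known as in vitro gametogenesis (IVG), is easier to achieve in mice than in larger animals. IVG has the potential to redefine reproduction: a simple blood draw could yield as many healthy eggs as a family needs. That capability could remove biological and genetic constraints, greatly expand families' options for having healthy children, and allow women to have children at an older age.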

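China's Auto Exports on Track to Reach 10 Million in 2026

AlixPartners forecasts that China's vehicle exports will rise by about 40% from the previous year to 10 million units in 2026. Statistics from the China Association of Automobile Manufacturers show that exports in January through May 2026 grew 63% year over year to 4.05 million units. New-energy vehicles such as battery electric vehicles (EVs) made up a large share of exports at 1.83 million units, 2.1 times the year-earlier level and far ahead of the 36% growth in gasoline vehicles. If the current pace continues, 2026 exports will be 41% higher than in 2025 and reach 10 million units. That would make China the first country in the world to export 10 million vehicles, roughly 2.5 times Japan's export volume. The main reason for the surge in exports is declining domestic sales: China's new-vehicle sales in 2026 are projected at 24.6 million units, down 10% from 2025.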
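
The year-over-year figures above imply prior-year baselines that the item does not state directly. The sketch below backs them out; the derived 2025 numbers are arithmetic consequences of the quoted percentages, not reported statistics, and the helper name `prior_year` is hypothetical.

```python
def prior_year(value: float, pct_change: float) -> float:
    """Back out last year's value from this year's value and the percent change."""
    return value / (1 + pct_change / 100)


# Figures quoted in the item (vehicle units).
exports_2026 = 10_000_000    # forecast, 41% above 2025
sales_2026 = 24_600_000      # forecast domestic sales, down 10% from 2025
exports_jan_may = 4_050_000  # January through May 2026, up 63% year over year

exports_2025 = prior_year(exports_2026, 41)
sales_2025 = prior_year(sales_2026, -10)
exports_jan_may_2025 = prior_year(exports_jan_may, 63)

print(f"Implied 2025 exports: {exports_2025 / 1e6:.2f} million")
print(f"Implied 2025 domestic sales: {sales_2025 / 1e6:.2f} million")
print(f"Implied Jan-May 2025 exports: {exports_jan_may_2025 / 1e6:.2f} million")
```

Under these figures, 2025 exports come out at roughly 7.1 million units and 2025 domestic sales at roughly 27.3 million.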

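arXiv Becomes Independent from Cornell University

The preprint platform arXiv.org left Cornell University on July 1 to become an independent nonprofit organization. arXiv was founded in 1991; its founder, Paul Ginsparg, joined Cornell University in 2001, and the arXiv site was subsequently taken over by the Cornell University Library. Twenty-five years later, arXiv has decided to open a new chapter. The organization is working to make the transition smooth, so that authors, readers, and the community notice almost no change. The official blog states: “arXiv was created by scientists, for scientists, and although our ‘home’ may change, our mission, vision, and values never will. arXiv remains committed to being free to read and free to submit, and to giving scientists worldwide equitable access to new ideas and discoveries. As arXiv becomes independent from Cornell University, arXiv's staff, volunteers, and supporters are working hard to ensure that the vital services arXiv provides are not interrupted.”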

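Henrico County Asks Government Offices and Schools to Save Electricity Because of Data Centers

Henrico County, Virginia, became a data center hub almost overnight thanks to its proximity to Washington, D.C., and its large tracts of land. The county has 37 data centers and plans for 17 more. One side effect is rising electricity prices. On June 26, County Manager John Vithoulkas sent an email to thousands of government employees asking them to help the county save electricity: “Starting July 1, electricity rates for all Henrico County government and school facilities will rise by 25%, which is expected to add $5 million in costs in the next fiscal year. We expect rates to keep rising in the coming years.” “To help offset the impact of higher electricity costs, I am asking everyone to make some adjustments together and conserve electricity in your work areas. Turn off the lights when you leave your work area, including when you leave for the day. Shut down your computer or laptop at the end of each workday. If your work area has windows, adjust the blinds to control heat from sunlight. Unplug appliances, chargers, and other devices when they are not in use. Please limit (or avoid entirely) the use of space heaters. A single typical space heater can cost the county $150 to $300 a year in electricity.”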

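Microsoft releases preview of WSL containers

Microsoft has released a public preview of Windows Subsystem for Linux (WSL) containers. Containers have become a foundational part of modern development, from cloud-native applications and AI workloads to testing and deployment pipelines. WSL containers simplify that experience by providing a built-in, enterprise-grade way to create, run, and manage Linux containers on Windows, with no additional third-party tools required.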

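Why we need sleep

Why do animals need sleep? A review published in the journal Brain Medicine argues that much of the confusion about sleep comes from conflating three concepts. The authors contend that sleep is best understood as neither rest nor housekeeping but as a system-level resilience mechanism that keeps a network of roughly 86 billion neurons from drifting into a state it cannot escape. "We want to move beyond the view of sleep as merely an overnight recharge," said Professor Xiaohui Wang of the Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, one of the corresponding authors. "When you look at the brain as a complex dynamic network, sleep starts to look like something a careful engineer would design on purpose: a scheduled window for the system to repair and reorganize itself." The review treats the two main stages of sleep as a division of labor. In non-REM sleep, especially the slow-wave stage, the brain settles into high-amplitude, low-frequency rhythms below one hertz. Modularity rises and entropy falls. Synaptic strengths raised by a full day of learning are quietly rescaled so that the network does not saturate. REM sleep does nearly the opposite. Electrical signals desynchronize, theta and gamma rhythms increase, and the brain shifts toward global integration and exploration, loosening circuits that have become too rigid. "A network that only knows how to optimize can back itself into a corner," said first author Longwei Yang. "What sleep seems to protect is the ability to get back out."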

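US government lifts export restrictions on Claude Fable 5 and Mythos 5 models

Anthropic announced on Tuesday that the US Department of Commerce has lifted export restrictions on its Claude Fable 5 and Mythos 5 models, and that it will restore access to the new models on Wednesday. In mid-June, the US government ordered a ban on foreign nationals accessing Anthropic's most advanced AI models on national security grounds, a restriction that extended even to Anthropic's own foreign employees. US Commerce Secretary Howard Lutnick posted on X on Tuesday: "Over the past two weeks we have worked closely with Anthropic to analyze and approve Fable 5, to ensure it is aligned with the US government and to strengthen American leadership in AI."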

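Claude Code quietly checks whether the user's system timezone is in China

Claude Code has been found to quietly check the user's system timezone and whether the user appears to belong to a Chinese AI company. An analysis of the local binary for Claude Code (2.1.196) found that it checks whether the system timezone is Asia/Shanghai or Asia/Urumqi, and whether a match exists against the domains of Chinese tech companies, including baidu.com, alibaba-inc.com, alipay.com, antgroup-inc.cn, bytedance.net, kuaishou.com, xiaohongshu.com, jd.com, and bilibili.co, among others. The check may be intended to prevent Chinese AI companies from distilling Anthropic's models.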

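Mouse study: a single injection of DNA instructions produces weight loss lasting ten times longer than GLP-1 drugs

GLP-1 weight-loss drugs such as Ozempic and Wegovy must be taken long term, and weight tends to rebound once patients stop. Researchers now report that in mice, a single injection of DNA instructions reduced body weight and controlled blood sugar, with effects lasting ten times longer than GLP-1 drugs. The approach could eliminate the need for repeated dosing by using the body itself as a factory for long-acting antibodies. It builds on the Weiner lab's intramuscular DNA electroporation platform: a patient receives one injection of plasmid DNA (the genetic instructions), followed by an electrical pulse that delivers those instructions into the nuclei of the patient's cells, where they are read so that the cells keep producing antibodies. The Weiner lab previously used the method to deliver COVID-19 antibody instructions to patients, and in a phase 1 clinical trial antibody expression persisted for more than 72 weeks. In the new study, the researchers designed DNA instructions for long-acting versions of the incretin hormones GLP-1 and GIP, incorporating an antibody fragment that keeps the proteins from breaking down quickly in the body. In mice, a single injection produced detectable incretin levels for 70 days and sustained metabolic improvements over a long period.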

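Data center carbon emissions are worse than previously thought

A report from Allianz Trade finds that data centers pose a greater threat to the climate than previously estimated. The report says the climate impact of digital infrastructure has been underestimated and that, without appropriate measures, carbon emissions driven by demand for compute will rise sharply as AI applications proliferate. The study estimates that global data centers emitted 286 million tonnes of CO2 in 2025, 57% more than earlier research by the International Energy Agency indicated. Allianz Trade attributes the difference to its model's inclusion of indirect emissions across the entire value chain (for example, from manufacturing and construction) as well as losses in electricity transmission. More than 70% of emissions come from the facilities' electricity consumption, while about a quarter come from hardware and infrastructure. A data center's climate impact depends heavily on its location. Data centers in China and the United States together account for about 70% of the global total. Both countries have relatively carbon-intensive power mixes, at 384 grams of CO2 per kilowatt-hour (United States) and 526 grams per kilowatt-hour (China).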
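To make the scale of these numbers concrete, here is a small illustrative Python sketch based only on the figures quoted in this item. The implied IEA baseline is derived from the "57% higher" statement rather than quoted from either report, and the helper `emissions_tonnes` and the 1 GWh example load are assumptions made for illustration.

```python
# Figures quoted in the item above.
total_2025_mt = 286.0      # estimated 2025 data center emissions, million tonnes CO2
excess_over_iea = 0.57     # Allianz Trade estimate is 57% above the earlier IEA-based figure

# Baseline implied by "57% higher" (derived, not a reported number).
implied_iea_mt = total_2025_mt / (1 + excess_over_iea)

# Rough splits using the quoted shares.
electricity_floor_mt = 0.70 * total_2025_mt   # "more than 70%" from electricity use
hardware_mt = 0.25 * total_2025_mt            # "about a quarter" from hardware and infrastructure
china_us_mt = 0.70 * total_2025_mt            # "about 70%" from China and the US combined

# Grid carbon intensity quoted in the item, in grams of CO2 per kWh.
GRID_G_PER_KWH = {"US": 384, "China": 526}

def emissions_tonnes(energy_gwh, country):
    """Tonnes of CO2 emitted for a given electricity use at the quoted grid intensity."""
    kwh = energy_gwh * 1_000_000                           # 1 GWh = 1,000,000 kWh
    return kwh * GRID_G_PER_KWH[country] / 1_000_000       # grams -> tonnes

print(f"Implied earlier estimate: {implied_iea_mt:.1f} Mt")
print(f"Electricity (at least):   {electricity_floor_mt:.1f} Mt")
print(f"Hardware (approx.):       {hardware_mt:.1f} Mt")
print(f"1 GWh in the US:          {emissions_tonnes(1, 'US'):.0f} t CO2")
print(f"1 GWh in China:           {emissions_tonnes(1, 'China'):.0f} t CO2")
```

At the quoted intensities, the same electricity load emits about 37% more CO2 on China's grid than on the US grid, which illustrates why the report stresses location.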

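Trial finds fish oil does not improve brain health

A two-year, placebo-controlled, double-blind clinical trial found that high-dose omega-3 supplements did not improve participants' memory or cognition and did not slow the loss of brain cells in regions associated with Alzheimer's disease. The study enrolled 365 participants aged 55-80 who rarely ate fish, the main dietary source of omega-3. The researchers considered all participants to be at high risk of Alzheimer's disease. Nearly half (47%) carried the APOE4 gene, the strongest known genetic risk factor for late-onset Alzheimer's. Participants were randomly assigned to two groups, one taking a daily fish oil supplement and the other a placebo. Each supplement dose contained 2,000 mg of docosahexaenoic acid (DHA), an omega-3 fatty acid that plays a key role in brain function. One of the study's primary goals was to confirm whether DHA from the supplement actually reaches the brain, which the researchers tested by measuring DHA levels in the cerebrospinal fluid that surrounds the brain and spinal cord. After six months of supplementation, DHA in participants' cerebrospinal fluid had risen by an average of 17%, confirming that the nutrient reached the brain. Even so, it produced no measurable cognitive gain. The researchers assessed participants' memory and thinking at the start of the trial and again two years later, and those taking DHA performed no better on cognitive tests than the placebo group. Brain imaging supported the same conclusion: fish oil supplements did not slow hippocampal atrophy. The hippocampus is the core brain region for memory and a key marker of brain aging and Alzheimer's risk.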

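Chinese universities suspend admissions to many language majors

A survey published last month by MyCOS Research (麦可思研究), based on the latest lists of suspended majors from 70 undergraduate institutions, found that 8 universities have stopped admitting students to Japanese, 5 to German, and 5 to translation studies. According to MyCOS, foreign language majors were for many years a major area of university enrollment expansion, but changes in the climate for international exchange and the rapid development of AI translation tools are pushing traditional language programs to change how they train students. How are universities responding to AI? Chinese universities need government approval to add new majors. The Ministry of Education has reportedly approved 38 new majors for the next academic year, most of them focused on technology or digital fields. The new majors include embodied intelligence, business AI, and data intelligence, along with low-altitude economy and management, semiconductor equipment engineering, and rare earth science and engineering.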

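BIS warns that a bursting AI bubble would raise the risk of global recession

The Bank for International Settlements (BIS) has published its annual economic report, warning that a bursting AI bubble would increase the risk of a global recession. The report says that AI investment can boost productivity, but that a reversal of overinvestment could cause turmoil in the financial system. According to earlier reports, Amazon expects capital expenditure of $200 billion in 2026, Microsoft $190 billion, Google about $180 billion, and Meta $140 billion. Oracle also plans to invest heavily. The five largest data center operators' AI-related capital expenditure in 2026 will exceed $1 trillion. The report notes: "Investment commitments have grown faster than these companies' profits and free cash flow, forcing some of them to issue bonds to raise additional funds. The investment race may be driven in part by the belief that only a few companies with superior technology will ultimately dominate the market." It adds: "Disappointing returns could trigger a sudden contraction in financing, turning the capital expenditure boom into a prolonged investment slump, with possible knock-on effects on financial conditions." The report also says that problems such as power supply, chip shortages, and grid connection bottlenecks have raised concerns about "supply-side obstacles."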

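Abandoned goldfish damage ecosystems

A new study finds that pet goldfish released into the wild, or that escape into it, have a major impact on freshwater ecosystems. The study used large outdoor freshwater mesocosms designed to mimic real lake conditions. Researchers introduced goldfish into the experimental ecosystems and observed their long-term effects on different types of lake. The team examined two common freshwater environments: nutrient-poor (oligotrophic) and nutrient-rich (eutrophic) waters. In both, goldfish caused substantial ecological damage. One of the most important findings was a rapid decline in water quality. In nutrient-rich systems, goldfish caused a sharp drop in water clarity and a significant increase in suspended particles, indicating a major shift in ecosystem state. The second was a decline in native aquatic species. Populations of snails, amphipods, and zooplankton fell significantly. These small organisms play a key role in healthy freshwater food webs and were hit by both predation and habitat disturbance. Native fish were also harmed. Goldfish competed with them for food and other resources, lowering the overall body condition of native fish, which scientists regard as an important indicator of long-term population health. The researchers say goldfish should be listed as a high-priority invasive species. They recommend that natural resource agencies focus on prevention, early detection, and control before wild populations become established.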

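Scientists find molecular-level evidence that liquid water has two structures

According to research published in Nature Physics, scientists have found molecular-level evidence that liquid water has two microscopic structures. The idea that water may exist in two distinct structural states is not new. For decades, scientists have speculated that liquid water consists of two interconvertible local structures, one denser and more disordered, the other less dense and more ordered. The two-state model has been used to explain many of water's anomalous properties, such as why water becomes more compressible as it cools and why its maximum density occurs at 4°C rather than at the freezing point. The model has nonetheless remained controversial because direct molecular-level evidence has been hard to obtain. At the core of the two-state model is a hypothesized phenomenon called the liquid-liquid phase transition. The basic idea is that in a deeply supercooled state, water separates into two macroscopically distinct liquid phases: a high-density liquid and a low-density liquid.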

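Paper on the timing of cancer treatment is retracted

Earlier this year, an article in a medical journal drew the attention of cancer patients and doctors worldwide with a striking conclusion: simply changing the time of day at which immunotherapy is given appeared to bring unexpectedly large benefits to lung cancer patients. According to results from a clinical trial conducted in China, patients who received intravenous infusions in the morning had their cancer under control for twice as long as patients who received infusions in the afternoon. The study also reported that these patients survived nearly twice as long. Several oncologists said that in recent months they and their hospitals had received a flood of calls from patients asking whether their infusions could be moved to the morning. Last week, Nature Medicine retracted the study, citing a series of inconsistencies and irregularities in its trial design and results. The problems listed in the journal's retraction notice include: records that should have been locked before the study began were modified partway through; the Chinese version of the study protocol differed from the translated version; every patient received treatment and follow-up during the first year of the study, with none withdrawing because of side effects, which is extremely rare in oncology research; and the timing of follow-up scans showed anomalous patterns. Other studies have also found some association between the time of day patients receive cancer immunotherapy and their outcomes, but the cause remains unclear. Doctors say one possibility is that more energetic, healthier patients choose morning slots, while poor or rural patients who live far from infusion centers, and who tend to have worse prognoses, may ask for afternoon slots because they need the whole morning to travel to their appointments.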

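Samsung, SK Hynix, and Micron again accused of colluding to fix memory prices

Fourteen consumers and three small businesses filed suit in federal court in California on the 25th, accusing the world's three largest memory suppliers, Samsung, SK Hynix, and Micron, of colluding since 2022 to manipulate memory prices and supply, causing memory prices to rise about 700% over the past four years. The plaintiffs say the three companies used the transition to HBM as a pretext to cut the supply of DDR memory: "The DDR memory oligopolists systematically coordinated the transition to HBM and the discontinuation of DDR3 and DDR4." Apple's recent sharp price increases on its products were the trigger for the lawsuit. The suit is small for now, but it could grow if the court accepts the plaintiffs' claims and formally certifies it as a class action. Bathaee Dunne, the antitrust law firm representing the plaintiffs, aims to bring a class action on behalf of all ordinary consumers and businesses that have bought products containing DRAM. Samsung Electronics and SK Hynix were previously convicted of conspiracy in the United States, resulting in large fines and prison terms for executives.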

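US Supreme Court rules that phone location data is protected by the Fourth Amendment

The US Supreme Court has ruled that smartphone geolocation data is protected by the Fourth Amendment. In Chatrie v. US, the court ruled 6-3 against the government. Justice Elena Kagan wrote the majority opinion, holding that obtaining the sensitive data covered by a geofence warrant is a search under the Fourth Amendment and that individuals retain a "reasonable expectation of privacy" even in public places: "Individuals have a reasonable expectation of privacy in their phone location records, and police requests for that information, even for a limited period of time and even when obtained from a third-party technology company, intrude on this constitutionally protected right." Okello Chatrie robbed a bank at gunpoint on May 20, 2019, and fled with $195,000. Local police used a geofence warrant to have Google provide account information for every device within 150 meters of the bank during the 30 minutes before and after the robbery. One of those accounts was Chatrie's. He had opted in to Google's Location History feature, which recorded his location every few minutes. After pleading guilty, he was sentenced to 12 years in prison. His lawyers argued that the geofence warrant was overbroad and violated his Fourth Amendment rights. The US government argued that law enforcement had obtained only a small amount of phone location information, which did not constitute a search under the Fourth Amendment and so did not warrant the same privacy protection. The justices sided against the government. Georgetown University law professor Paul Ohm said the Supreme Court has reaffirmed that police need a warrant to turn private services such as Google's location tracking into tools of state surveillance.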
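For readers unfamiliar with what a geofence query selects, the following Python sketch filters location records by the two parameters mentioned in this item: a 150-meter radius and a 30-minute window on either side of an event. It is a hypothetical illustration only. The coordinates, timestamps, record layout, and function names are invented for the example and do not describe Google's systems or the actual warrant.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def in_geofence(record, center, event_time,
                radius_m=150, window=timedelta(minutes=30)):
    """True if a (lat, lon, timestamp) record is inside the radius and time window."""
    lat, lon, ts = record
    within_time = abs(ts - event_time) <= window
    within_radius = haversine_m(lat, lon, center[0], center[1]) <= radius_m
    return within_time and within_radius

# Hypothetical center point and event time (the time of day is made up).
center = (37.0, -77.0)
event_time = datetime(2019, 5, 20, 16, 50)

# Hypothetical location records: (latitude, longitude, timestamp).
records = [
    (37.0005, -77.0, event_time + timedelta(minutes=5)),   # about 56 m away, in window
    (37.0020, -77.0, event_time + timedelta(minutes=5)),   # about 222 m away, too far
    (37.0005, -77.0, event_time + timedelta(minutes=45)),  # close, but outside the window
]

matches = [r for r in records if in_geofence(r, center, event_time)]
print(len(matches))
```

Only the first hypothetical record satisfies both conditions, which shows how a query of this kind sweeps in any device that happened to be nearby, whoever it belongs to.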

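Rocket Lab to acquire Iridium

Launch company Rocket Lab has announced that it is acquiring satellite operator Iridium. Under the definitive agreement, Rocket Lab will buy all of Iridium's outstanding common stock for $54 per share in cash and stock, valuing Iridium at about $8 billion. The deal still requires approval from Iridium shareholders and regulators and is expected to close in mid-2027. Iridium's constellation currently comprises 80 satellites, of which 66 are active and 14 are spares.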

09

APP STORE RANK

09.00
APP STORE RANK