TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0885
WED, JUN 3, 2026
Discover the best information organized by OrangeBot.AI

The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK

AI News Summary

June 3, 2026

Here is a summary of today's main news events:

U.S. Stocks Pull Back After Record-Setting Run

Major U.S. stock indexes opened lower, ending a five-day streak of record highs. The dip is linked to new data showing a stronger-than-expected job market, which has pushed the U.S. dollar and Treasury bond yields higher as investors weigh the possibility of delayed interest rate cuts.

Oil Prices Climb Amid Renewed U.S.-Iran Tensions

The price of oil is rising following new exchanges of fire between the U.S. and Iran. These hostilities are raising concerns about the security of major shipping routes like the Strait of Hormuz, increasing uncertainty in the global energy supply and fueling inflation fears.

AI Sector Sees Flurry of Activity from Regulation to IPO Plans

The artificial intelligence industry is experiencing major developments, with the White House issuing new orders requiring companies to provide pre-release access to powerful models. Meanwhile, AI developer Anthropic is preparing for a public offering and companies like Google are raising billions for AI development, signaling strong commercial momentum in the sector.

Ukraine Strikes Russian Oil Terminal in Retaliation

In the ongoing conflict, Ukraine has attacked a Russian oil terminal. The strike follows a series of deadly Russian missile attacks on Ukrainian cities, marking a significant escalation and further targeting of energy infrastructure by both sides.

Eli Lilly Announces Major Kidney Disease Research Deal

Pharmaceutical giant Eli Lilly has entered into a collaboration and licensing agreement with Ascidian Therapeutics worth up to $1.9 billion. The partnership will focus on discovering and developing new treatments for kidney diseases.

Private Equity Firm Limits Investor Withdrawals

Swiss-based firm Partners Group has placed limits on how much money clients can withdraw from its flagship fund. The move is a response to growing anxiety among wealthy investors about the performance and stability of private market investments.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS

Hacker News - June 3, 2026

Hacker News Feed: Highlighting key posts and discussions.

Every Byte Matters

(fzakaria.com)

10729
Agentic Mfw

(agenticmotherfucking.website)

19459
My Students Can't Read

(www.chronicle.com)

114172
CT scans of BYD car parts

(www.lumafield.com)

438290
MAI-Code-1-Flash

(microsoft.ai)

500231
Show HN: Eyeball

(eyeball.rory.codes)

26380
Stop Ruining It

(seths.blog)

294144
Love systemd timers

(blog.tjll.net)

380263
Why Janet? (2023)

(ianthehenry.com)

466254
macOS needs its grid back

(blog.hopefullyuseful.com)

393258
Chipotlai Max

(github.com)

38366
03

HUGGINGFACE

HuggingFace - June 3, 2026

OCC-RAG: Optimal Cognitive Core for Faithful Question Answering

Recent progress in the development of language models has been defined by scale, with each generation absorbing more of the world's knowledge into its weights. However, many practical applications benefit more from robust reasoning than from extensive parametric knowledge. In this setting, task-specialized small language models (SLMs) offer a principled design choice. We introduce Optimal Cognitive Core (OCC), a family of SLMs built around this premise. As a variant of OCC, we present OCC-RAG, optimized for faithful question answering (QA) grounded in the provided context. This task directly aligns with the OCC design approach, requiring multi-hop reasoning over supplied passages while ignoring memorized knowledge. To train OCC-RAG, we implement a novel pipeline for synthesizing multi-context, multi-hop QA data at scale, producing a corpus of over three million examples targeting multi-hop reasoning, strict context faithfulness, and calibrated abstention. We release OCC-RAG-0.6B and OCC-RAG-1.7B, both mid-trained on this corpus. The models produce structured reasoning traces with source citations grounded in literal quotes from the context. Through OCC-RAG, we demonstrate that compact, task-specialized SLMs can match or exceed general-purpose models 2-6x their size across multi-hop reasoning (HotpotQA, MuSiQue, TAT-QA), faithfulness (ConFiQA), and refusal (MuSiQue-Un) benchmarks.

32
Trust Region On-Policy Distillation

On-Policy Distillation (OPD) is a fundamental technique for efficient post-training of large language models (LLMs), with broad applications in agent learning, multi-task enhancement, and model compression. However, OPD training becomes unstable when the teacher and student distributions differ substantially, as teacher supervision on student-generated tokens may yield unreliable policy gradients and even cause optimization failure. This work addresses reliable on-policy token-level supervision through credit assignment strategies, and proposes Trust Region On-Policy Distillation, TrOPD. It features the following characteristics: 1) Trust-Region On-Policy Learning: TrOPD performs OPD only in regions where the teacher provides reliable supervision, mitigating the optimization difficulty of the K1 reverse-KL estimator under distribution mismatch. 2) Outlier Estimation: For outlier regions, we explore gradient clipping, masking, and forward-KL estimation to reduce the adverse effects of unreliable supervision. 3) Off-Policy Guidance: The student continues generation from teacher prefixes and uses forward KL to imitate off-policy guidance, encouraging on-policy exploration toward reliable regions. Experiments show that TrOPD consistently outperforms SoTA OPD baselines, including OPD, EOPD, and REOPOLD, across mathematical reasoning, code generation, and general-domain benchmarks.
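
A rough illustration of the first idea in the abstract above: the K1 estimator of the reverse KL on a student-sampled token is the student log-probability minus the teacher log-probability. The reliability rule below (a cap on the absolute log-ratio, `max_gap`) is a hypothetical stand-in, since the abstract does not say how TrOPD defines its trust region, and the function name is illustrative.

```python
def masked_k1(student_logps, teacher_logps, max_gap=2.0):
    """Mean K1 estimate over tokens inside a log-ratio trust region.

    K1 for KL(student || teacher) on a student-sampled token is
    log p_student(token) - log p_teacher(token).
    """
    gaps = [s - t for s, t in zip(student_logps, teacher_logps)]
    # Hypothetical reliability rule (an assumption, not from the paper):
    # drop tokens where the two models disagree by more than `max_gap` nats.
    trusted = [g for g in gaps if abs(g) <= max_gap]
    kept_fraction = len(trusted) / len(gaps)
    mean_k1 = sum(trusted) / len(trusted) if trusted else 0.0
    return mean_k1, kept_fraction
```

With student log-probs `[-1.0, -0.5, -0.2]` and teacher log-probs `[-1.5, -0.5, -6.2]`, the third token falls outside the region and is masked, so only the first two contribute to the estimate.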

30
Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

We introduce Humanoid-GPT, a GPT-style Transformer with causal attention trained on a billion-scale motion corpus for whole-body control. Unlike prior shallow MLP trackers constrained by scarce data and an agility-generalization trade-off, Humanoid-GPT is pre-trained on a 2B-frame retargeted corpus that unifies all major mocap datasets with large-scale in-house recordings. Scaling both data and model capacity yields a single generative Transformer that tracks highly dynamic behaviors while achieving unprecedented zero-shot generalization to unseen motions and control tasks. Extensive experiments and scaling analyses show that our model establishes a new performance frontier, demonstrating robust zero-shot generalization to unseen tasks while simultaneously tracking highly dynamic and complex motions.

27
A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL

Reinforcement learning (RL) post-training improves large language models (LLMs) on individual domains such as mathematical reasoning, code generation, question answering, and creative writing (CW), but training on one domain often degrades performance on others. Existing explanations based on catastrophic forgetting or global gradient conflict are incomplete: substantial interference can occur even when full-model gradients are nearly orthogonal. We show that single-domain RL produces sparse, small-magnitude parameter edits with weak overlap among top-changed neurons, while different domains still share substantial active computation routes on which update directions determine whether they act synergistically or conflict. Guided by this observation, we prove under a local perturbation model of multi-domain RL that later-domain training harms an earlier domain mainly through a second-order damage term, which under the observed sparse route structure concentrates in a low-dimensional shared conflict subspace. Moreover, a short domain refresh contracts the harmful component on this subspace, enabling selective recovery with limited collateral damage. Consistent with the theory, a brief Re-Math refresh after Code → Math → QA → CW recovers Math from 57.66 to 66.04 while largely preserving performance on the other domains, yielding the best average score of 66.39. Beyond refresh, a training-free rollback on a sparse proxy conflict coordinate set for the Math-QA pair partially restores Math, providing direct proxy-level evidence for localized damage. These results provide a localized mechanistic account of interference and recovery in multi-domain RL.

19
From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain

Identifying which brain regions represent a visual concept in the human brain is a central challenge in neuroscience. Existing approaches have localized coarse functional regions (e.g., faces, places) through activation maximization, identifying regions that activate strongly for a target concept relative to other concepts. Yet strong activation alone does not establish that a region represents the concept itself, as responses may instead be driven by correlated visual or semantic cues. We introduce BrainCause, an automated framework that combines generative and brain models to synthesize controlled stimuli and validate neural representations through targeted causal testing. Given a query specifying a concept of interest, our framework constructs targeted stimulus sets comprising concept images, counterfactual edits that remove the target concept while preserving other image content, and images with candidate correlated distractors. It then uses an image-to-fMRI encoding model to predict brain responses and searches for representations that respond specifically to the target concept over correlated alternatives. BrainCause returns validated candidate representations and proposes follow-up fMRI experiments to further test or extend its discoveries. Our approach successfully recovers known functional localizations and identifies new candidate representations across dozens of concepts, validated on both predicted and measured fMRI data. Critically, we show that without causal validation, a large fraction of localizations would be false positives, confirming that activation alone is insufficient evidence of representation.

17
World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning

World models and multimodal large language models (MLLMs) provide complementary capabilities for predicting future outcomes from static visual observations. World models can generate concrete visual rollouts of possible futures, while MLLMs can reason abstractly over questions, goals, and rules. However, generated rollouts are stochastic and may be visually plausible but task-incorrect, making it necessary to determine when visual simulation is useful, whether a rollout is credible, and how it should influence the final answer. We formulate this problem as controlled concrete reasoning, where a model learns to invoke, verify, and integrate visual future simulation alongside abstract reasoning. To study this setting, we construct two human-verified benchmarks, VRQABench for controllable spatial lookahead and OpenWorldQA for open-domain physical prediction, and propose Privileged-Future On-Policy Self-Distillation (PF-OPSD). During training, PF-OPSD uses ground-truth future videos and answers only as teacher-side privileged context to evaluate on-policy concrete-reasoning trajectories, while the deployable student never observes true futures at test time. Experimental results show that PF-OPSD outperforms the baseline by 10.6% and 10.9% on VRQABench and OpenWorldQA, respectively, while increasing robustness to noisy or conflicting rollouts. Our code and dataset are available at https://github.com/yczhou001/PF-OPSD.

16
AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

Autonomous agents are increasingly expected to support end-to-end medical-AI research workflows, moving beyond isolated prediction tasks or short-form clinical question answering. However, existing medical agent benchmarks primarily evaluate final outputs, providing limited visibility into agent behavior within the research process. To address this gap, we present AutoMedBench, a workflow-aware benchmark for autonomous medical-AI research across diverse medical imaging and multimodal inference tasks, organizing agent execution into a unified five-stage workflow (S1-S5): Plan, Setup, Validate, Inference, and Submit. It comprises long-horizon tasks with each run averaging 33 agent turns, spanning five research tracks: segmentation, image enhancement, visual question answering (VQA), report generation, and lesion detection. Each task is evaluated under two difficulty tiers, Lite and Standard, which use the same data and metrics but differ in the amount of task-brief scaffolding, and each run is scored using both final task performance and S1-S5 stage scores, enabling stage-level analysis from the initial task brief to the final submitted artifact. Across thousands of recorded runs, stage-level scoring reveals that Validate is the weakest workflow stage on average, whereas Setup is the strongest, suggesting that current agents are better at making pipelines executable than at verifying their reliability. Post-run error analysis further shows that verification and submission failures dominate tagged errors, accounting for 37.7% and 38.1% of fired codes respectively, whereas task-understanding errors are rare at 0.9%, and runs with one fired error code have a 48% lower overall score than runs with no error code on average.

15
MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection

Mid-training has become an important stage in modern LLM development, using large-scale curated mixtures to strengthen capabilities before final post-training. Its data selection problem is distinct: the data are optimized under a pretraining-style objective at near-pretraining scale, but are curated toward downstream capabilities and drawn from heterogeneous sources with different formats and training roles. As a result, effective selection requires both scalability and source-adaptive semantic criteria. Existing model-based methods scale well, but provide only implicit quality signals. Semantic selection methods offer stronger judgments, but usually assume fixed rubrics or standardized data formats. To address this mismatch, we propose MIRA, a source-aware filtering framework based on self-anchored rubric discovery. The key idea is to make rubric construction part of data selection: MIRA first discovers what should be evaluated for each source group, then distills those judgments into scalable student scorers for full-corpus filtering. On code-oriented mid-training with 21 sources and 5 source groups, MIRA outperforms selection baselines across nine code benchmarks and matches the full-corpus run while using only half the tokens.

13
TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL

Reinforcement learning (RL) for visual reasoning needs scalable, verifiable, and controllable training signals. Existing visual RL post-training trains on static curated datasets, with fixed image-question-answer samples bounded by their collection budget. In this work, we introduce TRON (Targeted, Rule-verifiable Online eNvironments), an online environment substrate: a training rollout is generated on demand by a controllable generator-verifier program that samples a fresh latent visual state, renders an image, asks a question, and exactly verifies the answer. A single run can therefore draw an unbounded stream of fresh instances at the difficulty level required by the current curriculum. The current TRON suite contains 520 environments organized into five ability buckets (spatial, mathematical, diagram, pattern/logic, and counting); the same substrate supports both a single full model trained on all buckets and per-bucket ability-specialist models, with no additional data collection. We also introduce a substrate analysis covering generation reliability, instance and level diversity, cross-environment near-duplicates, and base-model pass rate by difficulty level. RL post-training with TRON consistently improves performance on ten external multimodal reasoning benchmarks across Qwen3-VL-4B, Qwen2.5-VL-7B, and MiMo-VL-7B-SFT.
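
To make the generator-verifier loop described above concrete, here is a toy counting environment. It is only a sketch of the pattern (sample a latent state, render it, ask a question, verify exactly); the ASCII "image", the `Instance` type, and the level-to-grid-size rule are illustrative assumptions, not part of the TRON suite.

```python
import random
from dataclasses import dataclass

@dataclass
class Instance:
    image: list[str]   # ASCII stand-in for a rendered image
    question: str
    answer: int        # held by the environment for exact verification

def generate(level: int, rng: random.Random) -> Instance:
    """Sample a fresh latent state (a grid of marks) and render it."""
    size = 3 + level   # assumed difficulty rule: harder levels use larger grids
    grid = [[rng.random() < 0.3 for _ in range(size)] for _ in range(size)]
    image = ["".join("#" if cell else "." for cell in row) for row in grid]
    count = sum(cell for row in grid for cell in row)
    return Instance(image, "How many '#' marks are in the image?", count)

def verify(instance: Instance, response: str) -> float:
    """Rule-based reward: 1.0 only for an exact integer match."""
    try:
        return 1.0 if int(response.strip()) == instance.answer else 0.0
    except ValueError:
        return 0.0
```

Because every call to `generate` draws a new latent grid, a training loop can request as many instances as it needs at any level, and `verify` never depends on a stored answer key.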

13
Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation

We propose Decoupled Residual Denoising Diffusion models (DRDD) for unified and data-efficient image-to-image (I2I) translation. While diffusion models have advanced I2I translation in terms of quality and diversity, we uncover a previously under-explored property in diffusion models. Crucially, beyond its conventional role of manifold lifting (i.e., moving data off low-dimensional manifolds), injecting Gaussian noise facilitates domain harmonization by implicitly aligning feature distributions across domains, a property particularly advantageous for unified I2I translation. However, existing diffusion models prematurely erode this harmonization effect, as noise and residuals are simultaneously removed in a single coupled diffusion process. To address this, DRDD decouples the diffusion process into two sequential and independent diffusion stages: (1) a stochastic noise diffusion for domain harmonization and manifold lifting, and (2) a deterministic residual diffusion that learns the core semantic mapping entirely within the fixed-noise domain. This decoupling preserves harmonization and manifold lifting effects throughout the transformation, substantially simplifying the learning of unified mappings across diverse tasks and domains. Notably, the noise diffusion stage is trained exclusively on abundant, unpaired target-domain images, greatly improving data efficiency. Comprehensive theoretical and empirical analysis demonstrates that DRDD is broadly compatible with mainstream diffusion models and consistently delivers robust, unified I2I translation, even under limited paired data. Our code is available at https://github.com/HKU-HealthAI/DRDD.

10
Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learning, existing models lack the ability to continually learn and effectively transfer their temporal in-context knowledge to their long-term parameters. Inspired by the human learning process, we introduce a "Sleep" paradigm that allows models to continually learn, distill their short-term fragile memories into stable long-term knowledge with replay, and recursively improve themselves through a "Dreaming" process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, where the memories of a smaller-self are distilled into a larger network to provide more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase, where the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.

10
Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues

Personalization is a crucial capability of modern language agents. However, current research primarily positions personalized agents as passive responders to user preferences, limiting their ability to interact with users and provide suggestions or guidance proactively. To systematically evaluate such proactive personalization in realistic interactions, we propose Ψ-Bench, a benchmark for assessing LLMs' ability to influence realistic users through conversation. We design three real-world interaction scenarios that involve persuasion in Ψ-Bench, and endow simulated clients with personal characteristics through explicit user profiles derived from dialogue histories. We evaluate 10 frontier LLMs on Ψ-Bench and find that while most models can produce coherent and reasonable arguments, even state-of-the-art models still leave considerable room for improvement in persuasion. We also find that providing access to client profiles yields an average performance gain of 18.24%, highlighting the importance of user-specific information for effective persuasion. Overall, our work highlights persona-sensitive influencing as a challenging yet practical direction for evaluating and developing more proactive personalized LLM agents. Codes are available at: https://github.com/Hanpx20/Psi-Bench.

9
Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging

Instruction tuning aligns large language models, including multimodal ones, with diverse user intents, but scaling to heterogeneous mixtures is hindered by gradient interference and bandwidth-heavy synchronization. We ask whether these two bottlenecks can be addressed jointly by training parts of the mixture independently and reconciling them once in parameter space. We develop a local quadratic theory inside a shared flat basin that yields three results: weight merging produces a curvature-weighted variance reduction; PCA-aligned conflict splitting maximizes this gain along high-curvature directions; and merging additionally acts as spectral filtering with implicit norm regularization. These results directly motivate MERIT, a decentralized merge-ready instruction-tuning pipeline that estimates dataset-level gradient conflicts, partitions the mixture along the top PCA conflict axes, fine-tunes each partition independently with no inter-partition communication, and merges once via token-weighted averaging. On Qwen2.5-VL-3B with 136 Vision-FLAN tasks, MERIT improves the 8-benchmark average from 54.3 (joint training) to 57.0. The same recipe scales to a 7B model on a 1.6M-example, 176-source mixture, matching or exceeding centralized joint training with minimal cost overhead, and transfers to text-only FLAN. Our code is available at https://github.com/naver-ai/merit.
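
The final merge step named above, token-weighted averaging, can be sketched in a few lines. This is a minimal illustration on plain Python lists, assuming each partition yields a checkpoint with identical parameter names and shapes; the function name and data layout are hypothetical and not taken from the MERIT code.

```python
def merge_token_weighted(checkpoints, token_counts):
    """Average parameter dicts, weighting each by its partition's token count."""
    total = sum(token_counts)
    merged = {}
    for name in checkpoints[0]:
        size = len(checkpoints[0][name])
        merged[name] = [
            sum(ckpt[name][i] * n for ckpt, n in zip(checkpoints, token_counts)) / total
            for i in range(size)
        ]
    return merged

# Two partitions trained on 100 and 300 tokens respectively.
merged = merge_token_weighted(
    [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}],
    [100, 300],
)
print(merged)  # {'w': [2.5, 5.0]}
```

The partition that saw three times as many tokens pulls the merged weights three times as hard, which is the only difference from a uniform average.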

7
Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

Test-time scaling improves the reasoning performance of large language models but incurs substantial cost in both total computation and latency. Existing adaptive sampling methods partially mitigate this issue by dynamically deciding when to stop sampling, yet they typically rely on heuristic rules or distributional assumptions. In this work, we formulate adaptive sampling as a Markov decision process (MDP). We train a lightweight sampling controller with reinforcement learning (RL) to jointly balance answer correctness, latency, and computation cost. At each round, the controller decides to stop sampling or to acquire additional samples. Our method is lightweight: it relies only on statistics of final answers and can be trained and deployed on CPU. We further show that the resulting framework admits an interpretation as the Lagrangian relaxation of a constrained optimization problem with explicit budget constraints. Experiments against strong baselines such as ASC and ESC show that our method achieves improved trade-offs among answer correctness, sampling rounds, and total samples required.
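
The round-based stop-or-continue loop described above can be sketched as follows. Note the substitution: a fixed agreement threshold (`stop_share`) stands in for the learned RL controller, so this shows only the control flow and the kind of answer statistics such a controller could observe. All names and default values are illustrative assumptions.

```python
from collections import Counter

def answer_stats(answers):
    """Statistics of final answers: top answer, sample count, top-answer share."""
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, len(answers), top_count / len(answers)

def adaptive_sample(sample_fn, batch=4, max_rounds=5, stop_share=0.75):
    """Sample in rounds; a fixed threshold stands in for the learned policy."""
    answers = []
    for round_idx in range(1, max_rounds + 1):
        answers.extend(sample_fn() for _ in range(batch))
        top_answer, n, share = answer_stats(answers)
        if share >= stop_share:   # "stop" action
            break                 # otherwise acquire another batch
    return top_answer, round_idx, n
```

Here `sample_fn` is any zero-argument callable that returns one final answer, for example a wrapper around a model call; the function returns the majority answer, the number of rounds used, and the total number of samples drawn.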

6
Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

Long chain-of-thought (CoT) traces are widely used as supervision for reasoning-oriented LLM SFT, yet answer-correct traces can still lead to markedly different fine-tuning outcomes. We study post-conclusion continuation in answer-correct long-CoT data: a continuation where the answer appears sufficiently supported, but the trace continues with additional reasoning that remains in the supervised target. To test its training effect, we use a delete-only editor to construct answer-preserving suffix removal and compare CoT-based SFT on the original and processed traces. We observe improved SFT outcomes after removing the editor-identified post-conclusion continuation, suggesting that this continuation is harmful to training in our setting. We therefore refer to this empirically supported phenomenon as harmful continuation. Beyond this intervention, we further characterize the removed post-conclusion continuation through uncertainty and hidden-state progress. We observe persistent local uncertainty together with weakened terminal-directional progress, forming an uncertainty-geometry mismatch. Finally, we instantiate Harmful Continuation Cut (HCC), a lightweight boundary proxy that approximates the editor-identified post-conclusion continuation boundary.

6
PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors concentrate in under-optimized regions where model behavior is unstable, data coverage is sparse, or supervision is unreliable. Rather than expanding the training corpus indiscriminately, PaddleOCR-VL-1.6 introduces a region-aware data optimization framework that identifies weak regions from the previous model, applies targeted enhancement to these regions, and improves the reliability of supervision signals. It further adopts a progressive post-training recipe based on curated data selection and reinforcement learning, pushing model performance to a higher level through staged optimization. PaddleOCR-VL-1.6 achieves a new state-of-the-art score of 96.33% on OmniDocBench v1.6, demonstrates strong competitiveness against top-tier VLMs, and provides a practical post-training recipe for the PaddleOCR-VL series.

5
MERIT: Learning Disentangled Music Representations for Audio Similarity

Current music similarity models typically compute a single, monolithic score, entangling distinct musical dimensions like melody, rhythm, and timbre. This limits user control and interpretability, making it impossible to execute nuanced queries. We introduce MERIT, a framework for learning disentangled, factor-specific music representations tailored to these three core dimensions. To overcome the lack of isolated musical variations in real-world audio, we use a novel training strategy that uses conditional audio generation and source-separated stems to strongly encourage single-factor variation in training data. Our evaluations demonstrate strong factor-wise disentanglement. Each head responds strongly to its intended perceptual dimension while remaining near chance on the others, a representational property that holds across both the synthetic training domain and independent real-world audio.

5
NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation

As autonomous vehicle capabilities advance, the safe evaluation of driving policies in long-tail scenarios remains a critical bottleneck. In closed-loop simulation, the driving policy model actively interacts with the environment, where its actions dynamically update the simulator state and directly influence the next set of generated sensor observations. While recent reconstruction-based neural simulators offer photorealism, they are fundamentally constrained by their initial captured data and struggle to generalize to highly dynamic or novel scenes. To overcome these limitations, we introduce OmniDreams, a foundation generative world model mid- and post-trained from the Cosmos diffusion model to autoregressively generate action-conditioned videos in real time. By leveraging the rich visual priors of Cosmos and mid- and post-training on 21k hours of driving scenarios, OmniDreams synthesizes complex, unobserved phenomena that are hard for traditional simulators to capture, such as extreme weather and unpredictable dynamic agent behaviors. Crucially, it autoregressively conditions its photorealistic sensor generation on past frames, the current simulator state, and immediate driving actions. Deployed in a closed-loop system with the Alpamayo 1 policy model and AlpaSim orchestrator, OmniDreams acts as a highly responsive, reactive environment, providing a scalable and comprehensive solution for training and evaluating next-generation autonomous driving policies. We additionally show preliminary results indicating that a world-action model (WAM) post-trained from OmniDreams achieves strong performance on the Physical AI Autonomous Vehicles NuRec dataset, surpassing the VLA-based Alpamayo 1.5 research policy model while using only 1/5 the total parameters. These results highlight the potential for a real-time world model like OmniDreams to also serve as a backbone for policy architectures.

5
PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps

Embodied visual navigation, where an agent perceives a complex environment and acts to reach a goal from raw sensory input, underpins a wide range of applications such as household service robotics, assistive robotics, and large-scale autonomous exploration. However, recent attempts to unify vision-and-language navigation (VLN) and object goal navigation (ObjNav) remain at the level of architectural fusion, mixed-task training, and large vision-language pretraining, without examining whether independently trained vision and language encoders may already share a common semantic structure. Moreover, even object-centric topological maps still ground language goals through explicit cross-modal supervision such as CLIP or large vision-language models, leaving open whether such grounding is possible from a purely vision-built map. To address these challenges, we extend the Platonic Representation Hypothesis to embodied navigation and recast vision-only ObjNav, cross-modal ObjNav, and VLN as three different interfaces to the same object-centric semantic manifold. We further introduce PlatonicNav, a training-free framework whose Platonic Topological Map fuses geometric and semantic node distances from a self-supervised visual encoder, and grounds language goals via blind matching without any paired vision-language data. Extensive experiments on simulation benchmarks including HM3D-IIN, OVON, and R2R-CE on MP3D, together with deployment on Unitree Go2, demonstrate that PlatonicNav generalizes across tasks, modalities, and embodiments without explicit cross-modal training. Code: https://github.com/AIGeeksGroup/PlatonicNav. Website: https://aigeeksgroup.github.io/PlatonicNav.

5
Benchmarking Visual State Tracking in Multimodal Video Understanding

Understanding a video requires more than recognizing isolated moments, as humans continuously track entities, states, and events over time. This capacity for visual state tracking is fundamental to video understanding, yet remains underexplored in current evaluations of Multimodal Large Language Models (MLLMs). We introduce Visual STAte Tracking benchmark (VSTAT), a video-based benchmark designed to diagnose visual state tracking in MLLMs. VSTAT consists of 834 clips drawn from both synthetic and real-world videos, paired with 1,500 questions that cannot be answered from any single frame or short segment, requiring continuous perception and integration of events across the entire video stream. Despite their strong performance on existing video benchmarks, we find that state-of-the-art MLLMs perform far below humans and only modestly above answer-prior baselines. To analyze this gap, we compare MLLMs' thinking traces with the underlying video stream to understand why and when MLLMs fail on VSTAT. We find that MLLMs reason and track correctly in text, but fail at visually perceiving the events they need to track. Finally, our preliminary evaluation suggests that recent agentic approaches, including MLLM-based video agents and coding agents, do not readily resolve these failures, still falling short on VSTAT.

4
Value-Aware Stochastic KV Cache Eviction for Reasoning Models

Reasoning models improve accuracy through extended chains of thought, but their long outputs create a memory and compute bottleneck. KV cache eviction methods reduce this cost by evicting unimportant key-value pairs from the cache, yet they often yield worse accuracy than selection-based sparse attention alternatives, which keep the full KV cache. We identify two factors crucial to KV cache eviction accuracy. First, a small fraction of value states have abnormally large magnitudes, and evicting them causes catastrophic failure where models enter repetitive reasoning loops. Second, introducing stochasticity during eviction improves accuracy by increasing cache diversity. Based on these findings, we propose Value-aware Stochastic KV Cache Eviction (VaSE), a training-free recipe that protects large-magnitude value states and promotes diverse eviction decisions. Across six reasoning tasks, Qwen3 models using VaSE with 4x KV cache compression achieve higher average accuracy than the SOTA selection method at the same sparsity, while outperforming the strongest eviction method by more than 4%. Overall, VaSE bridges the gap between efficiency and accuracy, supporting FlashAttention2 and enabling a static memory footprint for reasoning models.
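
The two ingredients named in the abstract above (protect large-magnitude value states, and make eviction stochastic) can be illustrated with a short Python sketch. This is not the authors' implementation. The function name `keep_indices`, the per-token importance `scores`, the fixed `n_protect` count, and the use of Gumbel-perturbed top-k sampling are all assumptions made for illustration only.

```python
import math
import random


def keep_indices(value_norms, scores, budget, n_protect=1, temperature=1.0, rng=None):
    """Pick which cache entries to keep (hypothetical sketch, not the VaSE code).

    value_norms: magnitude of each cached value state
    scores:      importance score of each cached token (assumed to be given)
    budget:      number of entries to keep after eviction
    """
    if rng is None:
        rng = random.Random()
    n = len(value_norms)
    if budget >= n:
        return list(range(n))
    n_protect = min(n_protect, budget)

    # 1) Value-aware step: never evict the largest-magnitude value states.
    by_norm = sorted(range(n), key=lambda i: value_norms[i], reverse=True)
    protected = set(by_norm[:n_protect])

    # 2) Stochastic step: perturb scores with Gumbel noise and keep the top-k,
    #    which samples without replacement in proportion to exp(score / T).
    def perturbed(i):
        u = max(rng.random(), 1e-12)  # avoid log(0)
        return scores[i] / temperature - math.log(-math.log(u))

    candidates = [i for i in range(n) if i not in protected]
    keys = {i: perturbed(i) for i in candidates}
    sampled = sorted(candidates, key=keys.get, reverse=True)[: budget - n_protect]
    return sorted(protected | set(sampled))


# Toy example: 16 cached tokens, keep 4 (4x compression).
norms = [1.0] * 16
norms[7] = 50.0                       # one abnormally large value state
scores = [float(i) for i in range(16)]
scores[7] = -100.0                    # a score-only policy would evict it
kept = keep_indices(norms, scores, budget=4, rng=random.Random(0))
```

In this toy run, token 7 survives despite its low score because of its large value norm, and the remaining three slots are filled by sampling rather than by a deterministic top-k.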

3
ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree

Agent skills extend AI agents with reusable instructions, tools, scripts, references, and workflows, establishing a security boundary distinct from both model safety and traditional package-malware detection. ClawHub Security Signals is a sanitized dataset of 67,453 latest public OpenClaw skill versions. Each row pairs redacted SKILL.md content and sanitized bundled files where present with a final ClawScan registry verdict and evidence from three scanner families: VirusTotal, static heuristic analysis, and NVIDIA SkillSpector. Rather than estimating malicious-skill prevalence, we study scanner disagreement. The three scanners rarely flag the same skills: any pair overlaps on at most 10.4% of their combined positives, only 0.69% of skills are flagged by all three, and 81.9% of flagged skills are identified by a single scanner. The disagreement is structured by attack surface. SkillSpector, which raises semantic agentic-risk advisories rather than malware-reputation signals, is positive for 19,209 of 25,504 suspicious rows (75.3%) but only 14 of 206 malicious rows (6.8%). The malicious-verdict region shows the inverse profile: 150 of 206 malicious rows (72.8%) are VirusTotal-positive, consistent with bundled-code malware evidence. These results show that agent-skill security requires layered governance, not single-scanner allow/block decisions. The corpus is released as a sanitized silver-standard dataset: labels are the registry's automated verdicts, not human-annotated ground truth, and the release represents an early, versioned snapshot intended to support the community while a human-annotated subset is developed. Further research is encouraged, including models tailored for skill-security triage.
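
The disagreement figures above are set-overlap statistics, and a small Python sketch shows how such numbers could be computed. It assumes that "overlap on combined positives" means intersection over union (the Jaccard index), which the abstract does not spell out. The function names and the toy skill IDs are invented for illustration and are not taken from the dataset.

```python
from itertools import combinations


def pairwise_overlap(flags):
    """Jaccard overlap |A & B| / |A | B| for every pair of scanners."""
    out = {}
    for a, b in combinations(sorted(flags), 2):
        union = flags[a] | flags[b]
        out[(a, b)] = len(flags[a] & flags[b]) / len(union) if union else 0.0
    return out


def single_scanner_share(flags):
    """Share of flagged items that exactly one scanner flagged."""
    counts = {}
    for items in flags.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# Toy data (invented): which skills each scanner family flagged.
flags = {
    "virustotal": {"s1", "s2", "s3"},
    "static": {"s3", "s4"},
    "skillspector": {"s3", "s5", "s6", "s7"},
}
overlap = pairwise_overlap(flags)
share = single_scanner_share(flags)
```

With real data, each set would hold the skill IDs for which a scanner is positive. A low pairwise overlap together with a high single-scanner share is the pattern the abstract reports.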

2
A Multi-AI-agent Framework Enabling End-to-end Finite Element Analysis for Solid Mechanics Problems

Finite element analysis (FEA) is the most important numerical approach for solid mechanics. Challenges of FEA include a steep learning curve for entry-level users and the risk of invalid simulations caused by incorrectly defined key simulation components, such as boundary conditions, load cases, and solution variables. Years of engineering experience are usually necessary for real-world problem-solving. To address these issues, we present AbaqusAgent, a multi-agent framework grounded in large language models (LLMs) for solid mechanics analyses. AbaqusAgent is developed to facilitate analysis case generation and execution using Abaqus, one of the most widely used FEA packages, by turning users' natural-language instructions into executed FEA analyses and result visualization. AbaqusAgent is composed of six agents (interpreter, architect, input writer, runner, reviewer, and visualizer), covering all the essential pre-processing and post-processing steps of standard FEA analyses. The framework was validated on 50 varied solid mechanics problems, achieving an overall success rate of 86%. Beyond improving the efficiency of FEA for solid mechanics problems and lowering the barrier to computational mechanics education, AbaqusAgent advances the human-simulation interaction paradigm and enables integration with AI-empowered optimization and material characterization workflows. The code is available at https://github.com/LIRAM-LIN/AbaqusAgent.

1
αDepth: Learning Single-Pass Soft Boundary Decomposition for Stereo Conversion

Accurately modeling soft boundaries, e.g., hair and defocus blur, is a fundamental challenge in stereo conversion due to the ambiguous blending of foreground and background. Existing depth models primarily predict single-layer depth, leading to ambiguity in depth correspondence at soft boundaries. While matting techniques can capture opacity for layered modeling, they often struggle in complex scenes with multiple targets and usually require user intervention. This paper introduces αDepth, a layered representation that decomposes soft boundaries for high-fidelity stereo conversion. Specifically, we first resolve mixed color and depth ambiguity by estimating layered color and depth values at soft boundaries. Considering complex multi-target scenes, we design a Circular Alpha Representation (CAR) that shifts the paradigm from global target extraction to local boundary decomposition. Unlike prior matting methods restricted to a single foreground/background, CAR enables efficient scene-level inference without manual guidance. Extensive evaluations demonstrate that αDepth achieves state-of-the-art performance in stereo conversion, eliminating background bleeding and structural distortions at soft boundaries.

1
BA-T: An Iterative Transformer for Two-View Bundle Adjustment

Feed-forward models for 3D reconstruction have achieved strong performance using deep cross-view attention to exchange information across images. However, these approaches often depend on heavy decoder stacks and lack a structured mechanism for geometry refinement, resulting in poor multi-view consistency. We address this by drawing inspiration from classical bundle adjustment (BA), which can be viewed as an iterative information propagation process between poses and local geometry. We propose BA-T, an iterative Transformer that implements BA-style structured updates as a repeatable layer in implicit token space. Instead of relying on deep attention stacks, BA-T refines predictions from a latent residual using a single lightweight layer. Experiments demonstrate that BA-T progressively improves pose and reconstruction accuracy across iterations, achieves stronger cross-view consistency than conventional decoders, and matches or surpasses substantially larger models while using only 16% of their decoder parameters. BA-T provides a compact, efficient, and structural alternative to depth-heavy attention, enabling accurate 3D reconstruction within a lightweight architecture. The code will be made publicly available at https://github.com/zhangganlin/BA-T.

0
Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations

Linear probes trained on LLM activations are increasingly proposed as deception-detection metrics. They report AUROC exceeding 0.96 on clean benchmarks, yet collapse under distributional shift. This paper systematically pressure-tests probe-based metrics across the Gemma 3 model family (1B-27B parameters), diagnosing why they fail rather than merely documenting that they fail. We test four hypotheses about deception encoding: (1) single linear direction, (2) multi-dimensional subspace, (3) convex conic hull, (4) entropy proxy. Our design includes cross-domain transfer matrices, multi-dimensional probe analysis with permutation null baselines, entropy-residualization tests, and distractor evaluations across 8 stylistic shifts. We find that: (a) probes achieve near-perfect AUROC (>=0.998) on clean data but collapse under stylistic shifts; style-augmented probes recover near-perfect detection (mean AUROC 0.979-0.983) on unseen styles; (b) the single-direction hypothesis is rejected (k=1 captures only 0.61-0.80 AUROC), with cross-domain transfer failure confirmed as geometric rather than layer-mismatch-driven; (c) the entropy-proxy hypothesis is rejected (max |rho|=0.454, max Delta-AUROC after residualization=0.004); and (d) deception does not form a significant linear subspace (per-domain k*=0), yet multi-dimensional probes (k>=5) recover the signal through distributed sub-threshold features. Probe fragility reflects distributional narrowness rather than an architectural limitation: style-augmented probes recover near-perfect detection at both 4B and 27B, establishing that the inverse scaling pattern is a training-distribution artifact rather than a genuine scale-dependent phenomenon.

0
05

PRODUCT HUNT

Product Hunt - June 3, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

Carbone Skill for AI
Teach your AI to build document templates

Dispatch
Your app launch hub with ASO audit, keywords, and ads

Devin Desktop
Manage fleets of local and cloud agents from one surface

Handler
Review AI edits like stacked PRs at generation time.

BeerShot
Screen recording studio for Windows

TaskGPT
Voice agent for macOS

BoxBox
File manager for Linux homelab and NAS-style servers

Hermes Desktop
The agent that grows with you

Brand Context API
Ship AI that stays on-brand

Forward
Installs your API into a customer's codebase in one command

Elentaria
Your GTM: from diagnosis to execution

Uselink
Host your HTML, share the link, and get comments

Spectron
Agent memory you can trust

StampCam
Turn any photo into a postage stamp or sticker

Dropstone 1.5
2× Claude Code Pro's usage at $15/mo

Barflare
Cloudflare Tunnels, managed from your menu bar

Franz 6
All your messaging apps in one window, with private AI

Replicas
Run Claude Code and Codex in the cloud

Walkable
Safety-first walking navigation to walk the safest routes

Town
The assistant that learns how you work, then gets to work.

RadianceKit
Turn photos into 3D Gaussian Splats on your Mac

audien.to
Turn recordings into source-linked work

FolderPlus
Peek inside any folder without opening it

InsForge Backend Branching
Git style branching for your backend

superlog
Autonomous observability tool that finds & fixes bugs

Composer
Multiplayer markdown for you, your team, and your agents.

EchoFlow
Native Android AI chat with chats stored locally

Linkeezy
LinkedIn inbox and feeds without the chaos.

Wallie V2
The open-source AI streamer that actually feels alive

Rodeo by TwelveLabs
Describe your shot. Rodeo builds your first cut.

choclift
Use iPhone to open apps, Apple Shortcuts and websites on Mac

Mirowl
Search all your screenshots via a local OCR-powered AI

Branda
A fun new way to create & manage brands.

ConnectWizard
Unlock hidden App Store Connect analytics

Brief
Navigate your agents to product-market fit

Co-Invest
Trade 500+ markets directly from ChatGPT & Claude

PawPause
Lock your keyboard and prevent cats from causing chaos

Fundraisly
AI fundraising agent that finds investors and books meetings

Gusto Cofounder
If Gusto, OpenClaw, and Claude Cowork had a baby...

Sortail
Self-learning one-click inbox cleanup for Apple Mail

HumToBeats
Turn humming into AI-generated beats

findloc.ai
Make your business citable by ChatGPT, Claude & Perplexity

Gigacatalyst
Give your Sales and CS teams engineering superpowers

Enshittifier
Chrome extension that replaces "AI" with 💩

Paste MCP & AI Tools
Infinite clipboard for Claude, Codex and other AI tools

GlowPulse
Your Mac's camera is now a heart-rate sensor

Knock agent for Slack
Build, manage, and ship customer messaging from Slack

Kompassify 2.0
User onboarding now with an AI copilot

MartinLoop
Control AI coding agents with limits, proof, + run receipts

Moxie Docs
Living docs + MCP context for your GitHub repos

06

TECHMEME

Techmeme - June 3, 2026

Techmeme Digest: Major tech headlines and industry conversations.

Anthropic unveils a Services Track for its Claude Partner Network, a ranking based on what companies built with Claude, and releases a Claude Partner Hub portal (Belle Lin/Wall Street Journal)
Source: Techmeme · Published: Jun 3, 2026

AI giant is solidifying its Claude Partner Network to help demonstrate ‘durability of revenue’ as it nears going public

Meta launches Meta Business Agent globally on WhatsApp and Instagram DMs, after a two-year pilot of the customer support AI bot in India, Mexico, and others (Ivan Mehta/TechCrunch)
Source: Techmeme · Published: Jun 3, 2026

For years, WhatsApp has been a communication layer for businesses of all sizes around the world.

Google sold $35B in stock in its equity raise this week, up from its planned $30B and taking its total funding to $85B; a source says it contacted 75 investors (Katherine Blunt/Wall Street Journal)
Source: Techmeme · Published: Jun 3, 2026

$35 billion. That's how much stock Google parent Alphabet ended up selling this week in underwritten public offerings …

Apple plans to open an Apple Developer Center in Berlin in 2026, its first center in Europe and fifth globally, offering in-person developer access and sessions (David Phelan/Forbes)
Source: Techmeme · Published: Jun 3, 2026

In an announcement that is characteristically well-timed before the upcoming Worldwide Developers Conference which kicks off on Monday …

Boston-based software monitoring startup Coralogix raised a $200M Series F at a $1.6B post-money valuation, up from $1B+ after a $115M Series E in June 2025 (Jagmeet Singh/TechCrunch)
Source: Techmeme · Published: Jun 3, 2026

Coralogix, a Boston-headquartered software monitoring startup founded in Israel, has raised $200 million in a new funding round …

Analysis: 10 Trump officials, including special envoy Steve Witkoff, reported holding stakes in SpaceX or xAI collectively worth $9.9M+, ahead of SpaceX's IPO (Bloomberg)
Source: Techmeme · Published: Jun 3, 2026

SpaceX's initial public offering will likely make President Donald Trump's already wealthy administration even richer.

The EU's Cloud and AI Development Act aims to triple data center capacity in five to seven years; the Chips Act 2.0 recommends allowing direct EU investments (Gian Volpicelli/Bloomberg)
Source: Techmeme · Published: Jun 3, 2026

The European Union unveiled a sweeping plan to expand its domestic technology supply chains, targeting greater independence …

The EU Commission unveils its tech sovereignty package, including the EU's Cloud and AI Development Act, in a bid to reduce its reliance on US tech companies (Mathieu Pollet/Politico)
Source: Techmeme · Published: Jun 3, 2026

The European Commission's so-called tech sovereignty plan aims to boost domestic champions rather than shut out American competitors.

Baidu CFO Henry He says the company plans to spin off and list its chip unit Kunlunxin in Hong Kong and Shanghai in 2026, making it more like a "neutral player" (Tracy Qu/Wall Street Journal)
Source: Techmeme · Published: Jun 3, 2026

Plans to spin off and list Kunlunxin Technology in Hong Kong and Shanghai are on track

ESA and YouGov: 212.3M people in the US between the ages of 5 and 90 play video games every week, up 3% from 2025; the average age of players rises to 37 (Jennifer Maas/Variety)
Source: Techmeme · Published: Jun 3, 2026

Two-thirds of Americans play an hour or more of video games per week, according to a new report published Wednesday by the Entertainment Software Association (ESA).

Supercell, King, and Sybo warn that EU's Digital Fairness Act, requiring pop-ups showing real-world values of virtual currencies, could make games "unplayable" (Richard Milne/Financial Times)
Source: Techmeme · Published: Jun 3, 2026

Makers of ‘Clash of Clans’, ‘Candy Crush Saga’ and ‘Subway Surfers’ warn EU proposals could throttle rare tech success

Europol and international law enforcement agencies dismantle nine organized crime groups and arrest 29 in an illegal streaming crackdown, removing 27,000+ URLs (Sergiu Gatlan/BleepingComputer)
Source: Techmeme · Published: Jun 3, 2026

European and international law enforcement agencies have dismantled nine organized crime groups and arrested 29 suspects …

AI market research platform AlphaSense raised $350M from Vitruvian, Accenture, and others at a $7.5B valuation, up from $4B in 2024, ahead of a possible IPO (Ben Glickman/Wall Street Journal)
Source: Techmeme · Published: Jun 3, 2026

Firm raises $350 million from investors including Accenture and JPMorgan's asset-management unit.

Apoha, which is developing a "Liquid State Intelligence" data layer to measure molecules in the real world, emerges from stealth with $36M led by Singular (Mike Butcher/Pathfounders)
Source: Techmeme · Published: Jun 3, 2026

Apoha has emerged from stealth with $36 million to build what it calls “Liquid State Intelligence” …

How OpenAI, Anthropic, and other AI startups are pursuing recursive self-improvement, in a bid to build AI that can improve itself with little to no human input (Financial Times)
Source: Techmeme · Published: Jun 3, 2026

Industry chiefs say technology within their grasp is the key to superintelligence. Safety experts say we're not ready.

07

STARTUP ARCHIVE

Startup News - June 3, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction.”

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

“Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot.”

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice.”

Paul Graham on why starting with a “small, intense fire” is the key to startup growth
Source: Startup · Published: Mar 20, 2026

"You have to know who those first users are and how you're going to get them."

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth.”

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?”

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations.”

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over.”

Spencer Rascoff: “I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

“If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive.”

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right.”

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT

Solidot News - June 3, 2026

Solidot Feed: Highlighting essential tech & open-source news.

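Microsoft's quantum chip has fundamental problems

Microsoft has announced Majorana 2, its second-generation quantum chip, but experts argue that Microsoft's quantum chips lack a solid research foundation and simply cannot work. Microsoft announced its first-generation quantum computing chip, Majorana 1, in early 2025, saying it used what the company calls a topoconductor to observe and control Majorana particles and so produce more reliable and scalable qubits. The first-generation topoconductor combined an indium arsenide semiconductor with an aluminum superconductor. For the second generation Microsoft switched to a lead superconductor and claims that qubit lifetime rose from 20 seconds to 1 minute. Scientists are strongly skeptical of these claims. The latest paper is a preprint that has not passed peer review, and physicist Henry Legg believes the data in it stems from random artifacts. Microsoft's previous preprint has still not passed peer review and has quite possibly been rejected by top journals.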

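The 4,000-year-old city of Mohenjo-daro grew more equal as its economy developed

Researchers at the University of York analyzed housing patterns in the ancient city of Mohenjo-daro. Located in present-day Pakistan, the city flourished between 2600 and 1900 BCE and was one of the largest cities of the Indus Valley Civilization. The researchers found that wealth inequality in Mohenjo-daro was lower than in other ancient cities, and that it even narrowed over time. The city differs markedly from ancient cities of other civilizations: it has no palaces, no giant statues of rulers, and no lavish tombs, but it does have orderly streets and an advanced drainage system, with public infrastructure that reached the whole city rather than serving only the elite. Ancient Egypt built pyramids for its rulers and Bronze Age Greece built palaces for its elites, whereas Mohenjo-daro invested in public services for the entire population. Mohenjo-daro challenges the long-held view that economic growth leads to greater inequality: as the city developed and productivity rose, resources were also distributed more fairly.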

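Qualcomm CEO says resisting AI is futile

Qualcomm CEO Cristiano Amon declared in his keynote at Computex Taipei that resistance is futile: AI agents will become invisible and unavoidable, and will be able to track users across devices. He said agents will fundamentally change the relationship between people and technology. Today the phone is the center of digital life and everything revolves around it. In the near future agents will take the phone's place, and the phone, like a wearable, will become an extension of the agent. “The agent is not confined to a device; it moves with the user. Whatever device you are using, it is with you,” he explained. “Once you understand this change, you can see how the whole mobile industry will be transformed.”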

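Smartphone shipments forecast to fall 13.9% in 2026

According to Counterpoint Research's latest smartphone market outlook tracker, the global smartphone market is entering its most pronounced correction in recent years. Full-year 2026 shipments are forecast to fall 13.9% year over year to about 1.08 billion units, triggered by a memory supply crunch that has worsened in recent weeks, together with the Iran conflict. The data show that mobile LPDDR4/5 prices in Q2 2026 are expected to be up by about 200% from Q4 2025. Given the high capital intensity and long lead times of semiconductor manufacturing, the supply squeeze is expected to last until the second half of 2027. Low-end devices are hit harder. As fabs shift capacity toward AI-driven HBM and server DRAM, LPDDR4 supply is expected to shrink by more than 40% in 2026, steadily eroding the cost-effectiveness of entry-level products. Global smartphone wholesale prices rose 14% year over year in Q1 2026, and the upward trend is expected to continue as earlier inventory is worked through. Some segments below $150 risk being gradually squeezed out of the market.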

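Male bowerbirds woo females with pretty man-made ornaments

Male bowerbirds are known for their intricate courtship rituals. They build tunnels out of twigs and decorate them with bright objects collected from their surroundings. When a female comes to visit, the male tosses his shiniest objects toward her and shows off his gorgeous plumage in the hope of winning her over. According to a new paper in the journal Royal Society Open Science, urbanization, and the growing abundance of bright man-made objects that comes with it, has markedly affected the courtship behavior of male bowerbirds in Australia. The researchers even found a pair of handcuffs. Observations of urban and rural bowerbirds showed that urban birds were more than ten times as likely as rural birds to use man-made ornaments, while rural birds relied more on natural objects. Urban bowerbirds had nearly five times as many ornaments as rural ones, averaging 90 against only 20. One urban male had collected 300 ornaments. Whether they live in the city or the countryside, bowerbirds show a preference for man-made ornaments. The researchers say human activity is changing the natural world in unexpected ways.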

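Trump signs executive order requiring AI companies to let the government evaluate new models first

US President Donald Trump signed an executive order on Tuesday requiring AI companies to let the government evaluate the capabilities of their new models first. The order also asks AI companies to take part, on a voluntary basis, in a benchmarking process that assesses a model's “advanced cyber capabilities” and determines whether it should be treated as a “protected frontier model”. The order requires AI companies to give the government access up to 30 days before a new model's official release.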

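Vim Classic 8.3 released

In December 2025 the Vim project announced its generative AI policy: AI-written code is acceptable as long as the use of a large model is disclosed and the code style is consistent with the existing code. Several long-time contributors objected, not wanting to see the project flooded with AI code, and chose to create forks free of AI code. One of those forks is Drew DeVault's Vim Classic. For the sake of long-term maintenance, Vim Classic is based not on the newer Vim 9 series but on Vim 8.2.0148. DeVault has just released Vim Classic 8.3, which mainly backports a selection of bug fixes and patches from upstream. Owing to limited resources, some Vim plugins are incompatible with Vim Classic.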

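European Parliament switches default search engine from Google to Qwant

According to an internal email, the default search engine on the European Parliament's internal computers will switch from Google to the French search engine Qwant on June 4, a move motivated by digital sovereignty and privacy concerns. Qwant is described as a privacy-focused European search engine that does not track users or collect personal data. Founded in 2013, Qwant emphasizes privacy protection and offers users an alternative to Google. Searches made from the address bar in Firefox and Edge will be routed to Qwant automatically, but Members of the European Parliament remain free to use other search engines or change their default setting. The European Commission is strengthening technological sovereignty, reducing reliance on foreign technology suppliers, and supporting homegrown European technology.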

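The soil that refuses to stop breathing

French biochemist Sébastien Fontaine has spent 15 years trying to kill soil, in order to learn how much carbon soil releases when it contains no life at all. His team sealed soil in jars and sterilized it with gamma radiation, then waited for the soil's carbon dioxide output, a sign of ongoing microbial respiration, to fall. They waited for weeks, then months. Under the microscope the irradiated soil showed no sign of life, yet it kept releasing carbon dioxide. The soil refused to stop breathing. Fontaine's lab repeated the experiment and got the same result, and the researchers began looking for the source of respiration in lifeless soil. Fontaine's team now reports that its soil samples kept consuming oxygen and releasing carbon dioxide for six years. They propose that the metabolic processes that power life can also occur outside living cells. Their experiments suggest that such metabolism can operate in soil even without the biological proteins that normally organize it. If their hypothesis is correct, some biochemical reactions, such as those that release the energy in carbon-rich sugar molecules, may not be unique to living organisms. Such reactions may even have existed before life appeared on Earth.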

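Blue octopus is an entirely new species

Scientists on a 2015 deep-sea expedition to the Galápagos Islands were reviewing footage from a remotely operated vehicle when they spotted a small, entirely blue octopus at a depth of about 1,773 meters. They collected the octopus for further analysis. Researchers have now concluded that the creature, small enough to sit in the palm of a hand, belongs to an entirely new species. The study was published in the journal Zootaxa. The octopus has been kept in a storage room. Because it is one of a kind and another specimen is very unlikely to be collected, the scientists were unwilling to dissect it for a full species identification. The team instead chose mini-CT scanning, which showed that the animal has very short arms with few suckers, no ink sac, smooth skin, and one huge rachidian tooth. They named the species Microeledone galapagensis.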

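Iron-rich immune cells help homing pigeons navigate

Migratory birds, sea turtles, and other animals appear able to sense the Earth's magnetic field and use it to navigate. According to a study published in the journal Science, iron-rich immune cells in the liver of homing pigeons may give the birds their magnetic compass. Analysis of thin sections of pigeon tissue found that liver macrophages are rich in ferritin, which is scarce in the spleen and entirely absent from the beak and brain. Further electron microscopy showed the macrophages sitting right next to neurons that all connect to the central nervous system. The researchers designed an experiment to test whether the iron-rich macrophages can guide pigeons like a magnetic compass, using a drug called clodronate liposomes to suppress macrophage activity. The team trained 34 homing pigeons. By day, pigeons use the position of the sun to orient themselves. When the sky is overcast or the sun is fully hidden by cloud, they rely on magnetic sensing. The team injected 18 pigeons with clodronate and, 24 hours later, released them one by one when clouds completely blocked the sun. All the pigeons wore GPS transmitters, so the team could track their flight paths in real time. All 18 treated pigeons got lost and returned only after the sky cleared. None of the 16 control pigeons got lost. The researchers say that if the ferritin-assisted navigation mechanism is confirmed, it may be widespread, applying to animals from bees and bats to whales and sharks.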

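NASA's low-boom supersonic X-59 to make its first attempt to break the sound barrier

NASA has announced that the X-59 QueSST low-boom supersonic aircraft, designed by Lockheed Martin's Skunk Works, will make its first attempt to break the sound barrier this month. The X-59 is designed to exceed the speed of sound without the sonic boom that supersonic aircraft normally produce. It generates a quieter “thump” instead, similar to a car door closing as heard from indoors. It has no forward-facing window and instead uses cameras and a display to give the pilot an augmented-reality external vision system showing what lies ahead of the aircraft. If the X-59 succeeds, it could have a revolutionary effect on supersonic flight and the aviation industry, leading to the lifting of current restrictions on supersonic flight. The X-59 completed its maiden flight in October 2025 and has made 14 test flights since March 2026. This month's supersonic flight is planned to reach Mach 1.4 at an altitude of 16.7 km.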

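China cracks down on ghost takeout in the fast-food industry

China is cracking down on “ghost takeout” operations that have raised food safety concerns. Ghost takeout refers to merchants that offer delivery on food delivery platforms but have no physical store. Under new rules that took effect on Monday, merchant information on delivery platforms must match a physical store, and merchants must also state whether they offer dine-in service. Last year a man in Beijing complained that a cake he ordered through a delivery platform was of poor quality and decorated with inedible flowers, which drew attention to ghost takeout. An investigation found that the chain he ordered from listed nearly 380 outlets across major e-commerce platforms but did not have a single physical store, and its online shops used forged business licenses. Further investigation showed that cakes ordered from the online shops were in fact outsourced to an order-forwarding platform, which assigned each order to the lowest-bidding third-party merchant. Authorities uncovered 3.6 million cake orders on two order-forwarding platforms. They also found 67,000 “ghost shops” on seven major delivery platforms that “colluded with order-forwarding sites to form an illegal supply chain”. In April this year the State Administration for Market Regulation announced fines totaling 3.6 billion yuan in the ghost takeout cases against seven e-commerce platforms: Pinduoduo, Meituan, JD.com, Ele.me, Douyin, Taobao, and Tmall.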

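China brings data and algorithms under trade secret protection

China has widened the scope of trade secret protection to include data and algorithms, in order to better guard against technology outflows. The revised Provisions on the Protection of Trade Secrets from China's State Administration for Market Regulation formally took effect on Monday (June 1). This is the first time Chinese law has explicitly placed digital assets such as data and algorithms within the scope of trade secret protection. The new rules also impose stricter security requirements on remote work and cross-border business cooperation. Companies must adopt protective measures, including restricting file access by employee rank, masking sensitive information, and logging user actions. The rules also extend to trade secret infringement carried out abroad, but do not specify an enforcement mechanism. Alongside the new rules, the State Administration for Market Regulation on Monday launched a month-long special enforcement campaign focused on key sectors such as biopharmaceuticals, semiconductors, and artificial intelligence, cracking down on “malicious poaching” and on employees taking trade secrets with them when they change jobs.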

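Energy crisis pushes EV sales to record highs in 37 countries

Global electric vehicle sales are growing rapidly as the Middle East crisis pushes up fuel prices. According to S&P Global Mobility data, of the 150 countries with available data, 28, including Australia and the United Kingdom, set all-time monthly EV sales records in March. Another 9 countries, including Brazil and the Philippines, set new highs in April. Across March and April, EV sales rose in 91% of countries. In South Korea, which depends heavily on the Middle East for crude oil imports, EV sales in March and April rose to 2.4 times the year-earlier level, and the EV share of new car sales climbed 14 percentage points to 26%. In Southeast Asia EV sales rose 40% and market share reached 16%. The EU market also broke out of an earlier stall, with sales up 40% year over year. In China EV sales fell 8%, but because overall new car demand fell as well, the EV share of new car sales still rose 5 percentage points to 42%. The International Energy Agency said in a report published in May that the response to this energy crisis “will shape the global car market for years to come”.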

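The Pirate Bay, 20 years after the police raid

On May 31, 2006, less than three years after The Pirate Bay was founded, 65 Swedish police officers entered a data center in Stockholm. Under pressure from the US government, and as part of a criminal investigation, they had orders to take The Pirate Bay's servers offline. Before the police entered the data center, Pirate Bay co-founders Gottfrid Svartholm and Fredrik Neij already sensed that something was wrong. They had noticed undercover agents following them. This time, however, the police were after their servers. At around 10 a.m., Gottfrid told Fredrik that police had arrived at the office, and asked his colleague to go to the hosting facility and destroy the “incriminating evidence”. As Fredrik left, he realized the trouble might be related to their torrent tracker, and as a precaution he decided to make a full backup of the site. When he reached the hosting facility, his fears were confirmed. Dozens of police officers took away dozens of servers, most of which belonged to customers unrelated to The Pirate Bay. In the days that followed, Fredrik's decision to back up the site proved to be the most pivotal moment in The Pirate Bay's history: because of that backup, the team was able to restore the site within three days. The way they handled the incident was also in keeping with The Pirate Bay's habit of mockery. They renamed the site “The Police Bay” and designed a new logo that fired cannonballs at Hollywood. A few days later the logo was replaced by a phoenix, symbolizing the site's rebirth from its digital ashes. Far from shutting The Pirate Bay down, the raid put it in the mainstream media spotlight, thanks in large part to the site's rapid recovery. The coverage also triggered a surge in traffic, the opposite of what Hollywood had hoped for. Twenty years on, The Pirate Bay is still The Pirate Bay.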

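Red Hat's official npm account compromised, packages seeded with malware

Red Hat's official npm account, @redhat-cloud-services, was compromised, and credential-stealing malware was planted in multiple packages associated with it. The malware is designed to steal GitHub Actions secrets as well as credentials for AWS, GCP, Azure, Kubernetes, HashiCorp Vault, npm, CircleCI, and others. It is also a self-propagating worm that uses stolen npm tokens and npm's bypass_2fa parameter to automatically republish backdoored versions of other packages. Red Hat said in a statement that the malicious packages have been removed, that its investigation is ongoing, and that initial analysis has found no impact on customer or partner environments or on Red Hat production systems.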

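Anthropic files for IPO

Anthropic has confidentially submitted an IPO prospectus to the US Securities and Exchange Commission (SEC). The company said that once the SEC completes its review, it will choose when to list based on market conditions and other factors. Anthropic's valuation has grown explosively this year, reaching $965 billion in its latest funding round last week and surpassing OpenAI's $852 billion valuation from late March. The US stock market is about to see three trillion-dollar-class companies go public: SpaceX is expected to list this month, and Anthropic's rival OpenAI is expected to file soon. The three companies' combined market value is expected to reach $4 trillion.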

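Hackers use Meta's AI bot to take over celebrity Instagram accounts

Pro-Iran hackers tricked Meta's AI bot and briefly took over several high-profile Instagram accounts, including those of Obama and the Chief Master Sergeant of the US Space Force, then posted pro-Iran images and messages on them. The attack was very simple: the attacker first used a VPN to connect from near the target's usual location, then requested a password reset, asked to talk to Meta AI support, and instructed the AI to link the target account to a new email address. Once the AI sent a one-time verification code to that address as instructed, the attacker could reset the password and take over the account. Many Telegram channels trading in hijacked accounts have already appeared. Meta's Andy Stone said the company has taken action to resolve the issue.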

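Three Ebola vaccines in development

The International AIDS Vaccine Initiative (IAVI), the University of Oxford, and Moderna are developing vaccines against the Ebola virus. IAVI says the Ebola outbreak now under way in the Democratic Republic of the Congo may be the most serious to date. It is occurring in a conflict zone, more than a thousand suspected cases have been reported, and neighboring Uganda has confirmed 9 cases. Six Ebola virus strains are known, and only three cause outbreaks. A vaccine already exists for the most common strain, Zaire, but this outbreak involves the rarer Bundibugyo strain, for which there is currently no vaccine. Moderna has announced that it will use mRNA technology to develop a vaccine against the Bundibugyo strain.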

09

APP STORE RANK
