TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0900
THU, JUN 18, 2026
OrangeBot.AI intelligently curates and filters daily tech trends and news to save you time.

The web,
read by a bot.

Ten sources (Hacker News, Product Hunt, HuggingFace, Techmeme and more), filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK

AI News Digest

June 18, 2026

Here is a summary of today's main news events.

U.S. and Iran Sign Deal, Easing Geopolitical Tensions

What: The United States and Iran signed an interim agreement to wind down their conflict and reopen the critical Strait of Hormuz shipping lane. Why: The deal aims to de-escalate military tensions. The news immediately caused global oil prices to drop and stock markets to rise, as investors reacted positively to reduced geopolitical risk and the potential for lower inflation.

Federal Reserve Signals Future Interest Rate Hikes

What: Federal Reserve officials indicated that an interest rate increase is likely by the end of the year, even as they held rates steady for now. Why: A rate increase would aim to combat persistent inflation. The "hawkish" signal caused stock prices to fall, bond yields to climb, and the U.S. dollar to strengthen against other currencies.

AI Investment Heats Up with Major Funding and IPO Preparations

What: The boom in Artificial Intelligence continues, with startup Baseten raising $1.5 billion in a new funding round. In parallel, investment banks Goldman Sachs and Morgan Stanley are reportedly forming special teams to manage the highly anticipated IPOs of AI leaders OpenAI and Anthropic. Why: The moves highlight intense investor and corporate interest in capitalizing on the rapidly advancing AI industry.

Mixed News for Consumers as Gas Prices Fall but Electronics Costs Rise

What: Average U.S. gas prices have dipped below $4 a gallon, offering some relief to drivers. However, a persistent global memory-chip shortage is driving up the cost of electronics, with corporate executives stating that price increases for consumers are "unavoidable." Why: The conflicting price trends show how different sectors are being affected by global supply and demand issues, from falling oil prices to a crunch in semiconductor manufacturing.

Deluxe Acquires Celero Commerce for $625 Million

What: Business technology and payments company Deluxe announced its purchase of Celero Commerce, a firm specializing in services for small and midsize businesses. Why: The $625 million deal is a strategic acquisition for Deluxe to expand its portfolio and service offerings for the small-business market.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS

Hacker News - June 18, 2026

Hacker News Feed: Highlighting key posts and discussions.

DeepSeek Introduces Vision

(chat.deepseek.com)

268112
I hate compilers

(xeiaso.net)

11694
SteamOS Linux 3.8 released as stable

(store.steampowered.com)

19262
Midjourney Medical

(www.midjourney.com)

1013701
Clojure Hosted on Go

(github.com)

17722
U.S. science is in chaos

(www.scientificamerican.com)

8421014
03

HUGGINGFACE

HuggingFace News - June 18, 2026

HuggingFace Feed: The latest AI models, datasets, and community updates.

Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games

Deploying multimodal foundation models as closed-loop policies increasingly requires conditioning actions on observations that are no longer visible. However, existing benchmarks either expose the full state, conflate hidden-state reconstruction with other agent skills, or test recall only after an episode has ended. We introduce RNG-Bench (Reconstructive Non-Markov Games), a benchmark suite designed to isolate a base model's ability to reconstruct past observations and act on them during multi-step interaction. RNG-Bench includes two complementary games: Matching Pairs, where card identities briefly revealed at specific locations must later be recalled, and 3D Maze, where egocentric views must be integrated into a spatial map. Both games are evaluated under a unified harness with three controlled difficulty axes: grid size, visual pattern, and observation modality. The benchmark further introduces a head-to-head duel protocol to control for instance-level variance and a Memory Gap metric that disentangles forgetting from poor action selection. The hardest configurations require contexts of roughly 128K tokens and 350 image inputs per episode, and remain far from saturated by frontier MLLMs. Memory Gap analysis shows that most residual errors stem from forgetting earlier observations rather than from suboptimal decision making. Finally, fine-tuning Qwen3.5-9B on optimal-policy rollouts and filtered model demonstrations improves performance on RNG-Bench and transfers to existing benchmarks without degrading general multimodal capability.

35
Guava: An Effective and Universal Harness for Embodied Manipulation

Language models trained on large-scale vision-language data have demonstrated strong potential for embodied agents. Harnessing models through embodied tool use offers a promising alternative to end-to-end vision-language-action systems by combining high-level reasoning with external modules for perception, planning, and control. However, it remains unclear what makes an effective harness for embodied manipulation, and to what extent such a harness can unlock embodied capabilities in a wide range of reasoning models. In this work, we present Guava, a harness framework for embodied tool use developed through systematic exploration of the design space of agent workflows, action spaces, and observation spaces. Our study identifies three key ingredients for effective embodied agents: iterative perception-reasoning-action loops, semantic action abstractions, and multimodal observations. To understand whether these design principles hold even for small models, we develop an end-to-end training pipeline that distills embodied manipulation capabilities into a 4B open-source model using fewer than 2K trajectories collected entirely in simulation. Experimental results in both simulation and real-world environments show performance comparable to frontier proprietary models while exhibiting strong generalization to unseen objects, novel instructions, and long-horizon tasks. The results suggest that a well-designed harness can serve as a scalable, model-agnostic interface for embodied manipulation, enabling strong emergent embodied capabilities in compact open-source models with minimal training data.

21
Kairos: A Native World Model Stack for Physical AI

World models are transitioning from passive visual generators to foundational, operational infrastructure for Physical AI: they must natively acquire world knowledge from heterogeneous experience, maintain persistent states over long horizons, and execute efficiently within real deployment constraints. We introduce Kairos, a native world model stack designed around these requirements. (1) Kairos learns the world by pioneering a Native Pre-training Paradigm governed by a Cross-Embodiment Data Curriculum, which organizes open-world videos, human behavioral data, and robot interactions into a progressive developmental pathway. (2) Kairos maintains the world by unifying world understanding, generation, and prediction within a Native Unified Architecture equipped with Hybrid Linear Temporal Attention, where sliding-window attention captures local dynamics, dilated sliding windows capture mid-range dependencies, and gated linear attention maintains persistent global memory. We establish formal theoretical bounds demonstrating that this temporal factorization strictly limits error accumulation, mathematically guaranteeing state propagation across extended horizons. (3) Kairos runs the world by incorporating a Deployment-Aware System Co-Design to support low-latency rollout generation on server and consumer-grade hardware for real-world observation-action-feedback loops. Experiments on embodied world-model, long-horizon, and action-policy benchmarks show that Kairos achieves top-level performance while offering a strong efficiency-capability trade-off. Together, these results position Kairos as a cohesive operational foundation for future self-evolving physical intelligence.
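The three temporal components named above can be pictured as two attention masks plus a recurrent memory update. This is a minimal sketch, assuming causal attention, a window of `window` positions, a dilation factor `dilation`, and a single scalar forget gate per step for the linear-attention memory; the function names are hypothetical, and the abstract does not give Kairos's actual formulation.

```python
def sliding_window_mask(n: int, window: int) -> list[list[bool]]:
    """Causal local mask: position i may attend to j when 0 <= i - j < window."""
    return [[0 <= i - j < window for j in range(n)] for i in range(n)]


def dilated_window_mask(n: int, window: int, dilation: int) -> list[list[bool]]:
    """Causal dilated mask: attend to j = i, i - d, i - 2d, ... (window positions in total)."""
    return [
        [i - j >= 0 and (i - j) % dilation == 0 and (i - j) // dilation < window for j in range(n)]
        for i in range(n)
    ]


def gated_linear_attention(queries, keys, values, gates):
    """Recurrent form: S_t = g_t * S_{t-1} + k_t v_t^T, output o_t = S_t^T q_t.

    The d_k x d_v state S summarizes the entire past at constant memory cost,
    and the gate g_t in [0, 1] controls how quickly older content decays.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v, g in zip(queries, keys, values, gates):
        state = [[g * state[a][b] + k[a] * v[b] for b in range(d_v)] for a in range(d_k)]
        outputs.append([sum(q[a] * state[a][b] for a in range(d_k)) for b in range(d_v)])
    return outputs
```

In a hybrid layer, the two masks would restrict softmax attention to local and mid-range context while the recurrent state carries long-range memory; how the three outputs are combined is not described in the abstract.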

20
SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior

Sparse Autoencoders (SAEs) decompose residual-stream activations into interpretable features. Recent latent-space defenses increasingly rely on these decompositions, assuming that identified "unsafe" SAE features serve as actionable handles for monitoring and intervention. In this paradigm, clamping a specific harmful feature is expected to reliably prevent model misbehavior. However, we show that this success may hide a recoverable failure mode: the clamp may block one visible route to a behavior without eliminating the behavior itself. We formulate this vulnerability as post-intervention recovery, a constrained residual-space optimization problem. Starting from the post-intervention residual state, we optimize residual perturbations to recover the pre-intervention behavior while preserving the post-intervention values of the targeted SAE features. Even under a strong threat model where the intervention remains active throughout optimization and generation, recovery remains possible. To rule out that recovery simply undoes the intervention, we use encoder-orthogonal updates for single-layer interventions and the corresponding feature-map Jacobian in the cross-layer setting. Across TPP, unlearning, IOI, and refusal steering experiments, this stress test reveals recoverable behavior despite successful feature-level intervention. Especially in the safety-critical refusal-steering setting, we achieve a 95.8% recovery rate on valid samples while keeping defended-feature relative drift to 0.131, substantially below suffix-based baselines. A recovery-path attribution analysis further localizes this recovery to the SAE reconstruction residual, the component left unexplained by the SAE. These results expose a gap between feature-level control and behavioral completeness: SAE features can support causal intervention, but controlling them does not guarantee control over the underlying behavior.
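The single-layer encoder-orthogonal constraint can be illustrated with a linear SAE encoder: if a residual perturbation has no component along the encoder rows of the targeted features, those features' pre-activations, and therefore their activations after any elementwise nonlinearity, stay exactly the same. A minimal sketch, assuming pre-activations of the form w_i · x + b_i and using Gram-Schmidt projection; it is not the paper's optimization procedure, and the cross-layer Jacobian variant is not shown.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonal_to_encoder(delta, encoder_rows, eps=1e-12):
    """Remove from delta every component lying in the span of the targeted encoder rows."""
    basis = []  # orthonormal basis of span(encoder_rows), built by Gram-Schmidt
    for row in encoder_rows:
        v = list(row)
        for b in basis:
            c = dot(v, b)
            v = [x - c * y for x, y in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > eps:  # skip rows that are linearly dependent on earlier ones
            basis.append([x / norm for x in v])
    out = list(delta)
    for b in basis:
        c = dot(out, b)
        out = [x - c * y for x, y in zip(out, b)]
    return out


def sae_preactivations(x, encoder_rows, biases):
    """Linear part of a hypothetical SAE encoder for the targeted features."""
    return [dot(w, x) + b for w, b in zip(encoder_rows, biases)]
```

Any perturbation returned by `orthogonal_to_encoder` can change the residual stream, and hence downstream behavior, while the clamped features read exactly the same values, which is the gap the abstract describes.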

14
Reinforcing Dual-Path Reasoning in Spatial Vision Language Models

Spatial VLMs have made substantial progress in geometric perception, yet complex spatial reasoning requiring multi-step inference over depth, distance, and scene relations remains challenging. Moreover, different spatial queries call for fundamentally different strategies: some are best addressed through purely linguistic, step-by-step deduction, while others require explicit 3D grounding before quantitative inference. We present Dual-Path Spatial Reasoning via Reinforcement Learning for Spatial VLMs (SR-REAL), a unified framework that equips a spatial VLM with two complementary reasoning paths: Language-Only Reasoning (LOR), which performs step-by-step linguistic deduction, and Detect-Then-Reason (DTR), which detects 3D geometric cues (e.g., centers or bounding boxes) via region tokens before explicit geometric inference. SR-REAL begins with a cold-start supervised fine-tuning stage that constructs LOR and DTR chain-of-thought supervision and exposes a region-to-3D interface, followed by RL that optimizes the policy model with accuracy and format rewards; for DTR, a discrete center-based detection reward further refines geometric alignment. Across diverse spatial benchmarks, SR-REAL significantly outperforms spatial VLM baselines: (i) a single RL-trained model supports both reasoning paths, with DTR excelling in region-aware tasks through precise 3D localization and LOR enhancing general spatial reasoning; (ii) jointly training both paths fosters mutual reinforcement; (iii) high-quality, blended cold-start data is crucial for stable RL optimization; and (iv) the model generalizes across datasets and domains without per-task tuning, demonstrating positive transfer between LOR and DTR.

12
Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

Graphical user interface (GUI) grounding requires vision-language models (VLMs) to identify small target elements in high-resolution screenshots and predict precise screen coordinates. On-policy self-distillation (OPSD) is a promising post-training approach for this coordinate-sensitive task, since it provides dense token-level teacher signals beyond hard coordinate labels. However, naive OPSD is not well suited to GUI grounding: because OPSD evaluates the teacher on student-generated prefixes, the quality of coordinate-token teacher signals can degrade once the prefix has already deviated from the target coordinate, making the teacher signal unreliable. To mitigate this, we propose quality-aware self-distillation for VLM-based GUI grounding, which improves coordinate-token teacher-signal quality through soft correctness-aware gating and teacher-probability scaling. The soft correctness-aware gate checks whether the teacher's current coordinate-token prediction can still be completed into the ground-truth box under the student-generated prefix. If not, the corresponding teacher signal is down-weighted. Teacher-probability scaling then uses the teacher's confidence as a lightweight factor to further calibrate the strength of the gated supervision. A key empirical finding is that neither component alone improves overall performance, whereas combining them consistently does. This suggests that the two mechanisms play complementary roles: correctness-aware gating suppresses unreliable coordinate-token supervision, while teacher-probability scaling calibrates the strength of the remaining signals. Experiments across six GUI grounding benchmarks show that our method consistently improves the base model and outperforms strong baselines.
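Both mechanisms can be sketched for a single coordinate axis, assuming coordinates are emitted as fixed-width, zero-padded decimal digits (one digit per token) and that the ground-truth box spans the integer range [box_lo, box_hi] on that axis. The feasibility test, the `floor` down-weight, and the function names are hypothetical; the paper's actual soft gate may be defined differently.

```python
def can_complete(prefix: str, box_lo: int, box_hi: int, width: int) -> bool:
    """True if some `width`-digit zero-padded integer in [box_lo, box_hi] starts with `prefix`."""
    pad = width - len(prefix)
    if pad < 0:
        return False
    smallest = int(prefix + "0" * pad)  # smallest number sharing this prefix
    largest = int(prefix + "9" * pad)   # largest number sharing this prefix
    return smallest <= box_hi and largest >= box_lo


def distill_weight(student_prefix: str, teacher_digit: str, teacher_prob: float,
                   box_lo: int, box_hi: int, width: int, floor: float = 0.1) -> float:
    """Weight applied to one coordinate token's teacher signal.

    Gate: keep full weight if the teacher's digit, appended to the student's prefix,
    can still land inside the box; otherwise down-weight to `floor` instead of zeroing it.
    Scaling: multiply by the teacher's probability for that digit as a confidence factor.
    """
    feasible = can_complete(student_prefix + teacher_digit, box_lo, box_hi, width)
    gate = 1.0 if feasible else floor
    return gate * teacher_prob
```

For a box spanning x = 420 to 460 with 4-digit coordinates, a teacher that proposes `4` after the student's `0` keeps its full signal, while one that proposes `5` is down-weighted, because no coordinate in 0500 to 0599 lies inside the box.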

12
EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts

Reinforcement learning (RL) has become a representative post-training paradigm for LLMs, enabling strong reasoning and agentic capabilities. However, rollout generation remains a dominant latency bottleneck because autoregressive sampling decodes responses sequentially and a small number of long-tailed generations often determine completion time. Speculative decoding (SD) offers a natural way to address this bottleneck, as it is a well-established technique for serving fixed LLMs that reduces latency by rapidly drafting tokens and accepting them through parallel verification while preserving the target-model distribution. However, its practical speedups do not directly carry over to RL rollouts: (i) the evolving target policy makes any fixed drafter increasingly mismatched with the policy's output distribution; and (ii) active batch sizes shrink throughout rollout decoding, shifting decoding from compute-bound to memory-bound regimes where parallel verification can exploit underutilized compute. Therefore, accelerating RL rollouts requires both a drafter that remains effective under long, high-temperature generations from an evolving policy and system-aware use of SD that avoids compute-bound regimes. We present EfficientRollout, a system-aware self-SD framework designed to address this gap for RL rollouts. EfficientRollout induces a quantized drafter from the target model (i.e. self-speculative decoding), keeping it coupled to the evolving policy without separate drafter pretraining or online adaptation. It further coordinates a system-aware SD toggle policy with acceptance-aware draft-length adaptation, enabling speculation only in beneficial regimes while matching the drafting budget to evolving drafter quality. EfficientRollout reduces rollout and end-to-end latency by up to 19.6% and 12.7%, respectively, over an accelerated AR rollout baseline, while preserving final model quality.
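The draft-length adaptation and the on/off toggle can be sketched with the standard speculative-decoding analysis: if each drafted token is accepted independently with probability alpha, one verification step yields (1 - alpha^(k+1)) / (1 - alpha) tokens on average for a draft of length k. This is a minimal sketch, assuming each drafted token costs `draft_cost` target-forward equivalents and that a fixed batch-size `threshold` separates memory-bound from compute-bound regimes; EfficientRollout's actual policy is not specified in the abstract.

```python
def expected_tokens(alpha: float, k: int) -> float:
    """Mean tokens produced per verification step with k drafted tokens (i.i.d. acceptance)."""
    if alpha >= 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


def choose_draft_length(alpha: float, draft_cost: float, max_k: int = 8) -> int:
    """Pick k maximizing tokens per unit cost; k = 0 means plain autoregressive decoding."""
    best_k, best_rate = 0, 1.0  # AR baseline: 1 token per target forward pass
    for k in range(1, max_k + 1):
        rate = expected_tokens(alpha, k) / (1 + k * draft_cost)
        if rate > best_rate:
            best_k, best_rate = k, rate
    return best_k


def use_speculation(active_batch: int, threshold: int) -> bool:
    """Speculate only when the active batch is small enough to be memory-bound (assumed cutoff)."""
    return active_batch <= threshold
```

As the acceptance rate alpha falls, the chosen k shrinks toward zero, which is consistent with the abstract's goal of matching the drafting budget to drafter quality.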

10
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

Reinforcement learning pipelines for Large Language Model (LLM) training often rely on manually redesigned environments between stages, requiring practitioners to heuristically infer which configuration will best improve the current policy. To automate this process, we propose the LLM-as-Environment-Engineer framework in which the current policy model analyzes failure trajectories together with contextual information and proposes modifications to the next-stage training environment configuration. We also introduce MAPF-FrozenLake, a controllable testbed whose generator exposes multi-dimensional environment configurations, making it suitable for studying and benchmarking environment redesign. On this testbed, we condition the environment engineer on structured summaries of policy behavior, failure cases, and environment statistics, from which it produces the configuration for the next training stage. With Qwen3-4B as the backbone, our framework achieves the strongest aggregate performance on our benchmarks, outperforming larger proprietary LLMs (e.g., GPT, Gemini) and fixed-environment training baselines. We further analyze which forms of context are most effective, finding that successful environment updates rely on failure evidence and preserve configurations that already work. Interestingly, the current RL checkpoint serves as a better environment engineer than the original base model, suggesting that policy learning improves the model's ability to diagnose its remaining weaknesses.

9
Native Active Perception as Reasoning for Omni-Modal Understanding

Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive frameworks have emerged, they often rely on global pre-scanning, and their context cost still scales with video length. We propose OmniAgent, the first native omni-modal agent that formulates video understanding as a POMDP-based iterative Observation-Thought-Action cycle. OmniAgent executes on-demand actions to selectively distill audio-visual cues into a persistent textual memory, effectively decoupling reasoning complexity from raw video duration. To operationalize this, we introduce (1) Agentic Supervised Fine-Tuning to bootstrap native active perception via best-of-N trajectory synthesis with dual-stage quality control, and (2) Agentic Reinforcement Learning with TAURA (Turn-aware Adaptive Uncertainty Rescaled Advantage), which leverages turn-level entropy to steer credit assignment toward pivotal discovery turns. Crucially, OmniAgent exhibits positive test-time scaling, where performance improves as the number of reasoning turns increases, validating the efficacy of active perception. Empirical results across ten benchmarks (e.g., VideoMME, LVBench) demonstrate that OmniAgent achieves state-of-the-art performance among open-source models. Notably, on LVBench, our 7B agent outperforms the 10× larger Qwen2.5-VL-72B (50.5% vs. 47.3%).

9
STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

Reinforcement Learning with Verifiable Rewards algorithms like GRPO have emerged as the dominant post-training paradigm for complex reasoning in LLMs, yet commonly suffer from policy entropy collapse during training. We conduct a first-order gradient analysis of token-level entropy dynamics under GRPO and identify a token-level credit assignment mismatch: the per-token entropy variation decomposes into the product of the trajectory-level advantage and an entropy sensitivity function over the next-token distribution, yielding an advantage-surprisal four-quadrant structure and a near-criticality property. Motivated by this analysis, we propose STARE (Surprisal-guided Token-level Advantage Reweighting for policy Entropy stability), which identifies entropy-critical token subsets via batch-internal surprisal quantiles, selectively reweights their effective advantages, and incorporates a target-entropy closed-loop gate for stable entropy regulation. Across model scales from 1.5B to 32B and three task families (Short CoT, Long CoT, and Multi-Turn Tool Use), STARE sustains stable RL training over thousands of steps while maintaining policy entropy within the target band. On AIME24 and AIME25, STARE outperforms DAPO and other competitive baselines by 4%-8% in average accuracy, with reflection tokens and response length growing in tandem, indicating a sustained exploration-exploitation balance that further unlocks RL training potential. Code is available at https://github.com/hp-luo/STARE.
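The quadrant idea can be illustrated with a toy reweighting rule. To first order, reinforcing a sampled token with positive advantage tends to raise policy entropy when that token was unlikely (high surprisal) and to lower it when the token was already likely. The sketch below is a hypothetical illustration, not STARE's published formula: it flags positive-advantage tokens at or above a batch surprisal quantile `q`, then boosts or damps their advantages depending on whether the current entropy sits below or above an assumed target band.

```python
def reweight_advantages(advantages: list[float], logprobs: list[float], entropy: float,
                        band: tuple[float, float], q: float = 0.8,
                        boost: float = 1.5) -> list[float]:
    """Hypothetical surprisal-guided advantage reweighting with a closed-loop entropy gate."""
    low, high = band
    surprisal = [-lp for lp in logprobs]  # surprisal = -log p(token)
    ranked = sorted(surprisal)
    cutoff = ranked[int(q * (len(ranked) - 1))]  # nearest-rank batch quantile
    out = []
    for a, s in zip(advantages, surprisal):
        # Rare token being reinforced: to first order this pushes entropy up.
        entropy_raising = s >= cutoff and a > 0
        if entropy_raising and entropy < low:
            out.append(a * boost)  # entropy too low: amplify entropy-raising updates
        elif entropy_raising and entropy > high:
            out.append(a / boost)  # entropy too high: damp them
        else:
            out.append(a)  # inside the band, or not entropy-critical: leave unchanged
    return out
```

Inside the target band the gate is inactive and advantages pass through untouched, so the rule only intervenes when entropy drifts out of range.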

7
Sumi: Open Uniform Diffusion Language Model from Scratch

Diffusion models have become a promising alternative to autoregressive models. Among these, uniform diffusion language models (UDLMs) permit any token to be updated at any step, in principle enabling more flexible generation. However, no UDLM has yet been pretrained from scratch at both large parameter scale and large token budget. Both autoregressive modeling and masked diffusion modeling already have capable models at scale that the community can study and build on; uniform diffusion has none. A scratch-pretrained UDLM at scale would provide a clean reference point for studying scaling behavior, generation dynamics, controllability, and trade-offs against established autoregressive and masked diffusion models. To this end, we introduce Sumi ("ink" in Japanese), a fully open 7B uniform diffusion language model pretrained from scratch on 1.5T tokens. Sumi performs competitively with autoregressive models trained at comparable token budgets on knowledge, reasoning, and coding benchmarks, while under-performing on commonsense benchmarks, where our education-heavy data mixture is a likely contributor. We release our model weights, checkpoints, and full training recipe, including a complete specification of the data mixture over publicly available corpora. We hope this release enables the community to study native uniform diffusion at scale and catalyzes work on its as-yet poorly understood aspects.
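The difference between uniform and masked discrete diffusion already shows up in the forward corruption process. A minimal sketch, assuming a per-token corruption probability `t` and a hypothetical `MASK_ID`: masked diffusion replaces corrupted tokens with a mask symbol, so the reverse process only needs to fill masked slots, while uniform diffusion replaces them with random vocabulary tokens, so every position looks like an ordinary token and any of them may be revised during denoising.

```python
import random

MASK_ID = -1  # hypothetical mask token id, outside the regular vocabulary


def masked_corrupt(tokens, t, rng):
    """Masked diffusion: each token becomes MASK_ID with probability t."""
    return [MASK_ID if rng.random() < t else tok for tok in tokens]


def uniform_corrupt(tokens, t, vocab_size, rng):
    """Uniform diffusion: each token is resampled uniformly from the vocabulary with probability t.

    The resampled token may coincide with the original, and corrupted positions carry
    no marker, so a denoiser cannot tell which positions are still clean.
    """
    return [rng.randrange(vocab_size) if rng.random() < t else tok for tok in tokens]
```

Usage: `uniform_corrupt([5, 17, 42, 8], 0.5, 100, random.Random(0))` returns a same-length sequence in which roughly half the positions have been resampled from the 100-token vocabulary.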

7
Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

Multicultural multi-agent systems are increasingly deployed in globally diverse settings, where different agents are grounded in different cultural backgrounds. Existing cultural evaluation focuses on value alignment: how closely a single agent matches a target culture. Yet alignment is a per-agent property and cannot reveal whether a system, taken as a whole, preserves the cultural plurality it is meant to represent. We propose value diversity as a system-level evaluation axis for multicultural agent systems, defined through the dissimilarity between culturally conditioned agents' responses on a shared value survey. Using the World Values Survey, we evaluate 19 cultures and 18 backbone models across a wide range of system configurations. We find that diversity is largely uncorrelated with alignment, indicating that the two capture complementary system properties, and that current multicultural agent systems fall substantially below human societies in value diversity. Mixed-backbone systems narrow this gap but do not close it, and the gap persists across culture compositions and agent scales. Social interaction further erodes diversity by driving agents toward consensus, and a participatory budgeting case study shows that this homogenization narrows the breadth of collective decision-making. Together, our results establish value diversity as a distinct evaluation axis for multicultural multi-agent systems and reveal a persistent homogenization tendency in current LLM-based societies. Our code and data are publicly available at https://github.com/iNLP-Lab/MultiAgent-Diversity.
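One simple way to turn "dissimilarity between agents' responses" into a number is the mean pairwise distance between their answer vectors on the shared survey. The measure below is an assumption, not necessarily the paper's metric: it uses the L1 distance per item, normalized by the answer scale's range, so scores lie in [0, 1] and 0 means all agents answer identically.

```python
from itertools import combinations


def value_diversity(responses: list[list[float]], scale_range: float) -> float:
    """Mean pairwise normalized L1 distance between agents' survey answer vectors."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 0.0  # a single agent has no diversity to measure
    total = 0.0
    for a, b in pairs:
        # Average per-item disagreement, scaled by the answer range (e.g. 4 for a 1-5 scale).
        total += sum(abs(x - y) for x, y in zip(a, b)) / (len(a) * scale_range)
    return total / len(pairs)
```

Under this measure, interaction that pulls agents' answers toward consensus lowers the score directly, which mirrors the homogenization effect the abstract reports.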

6
MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction

Motion forecasting is central to visual intelligence: agents must anticipate how objects will move in order to plan actions, reason about physical interactions, and synthesize realistic futures. We argue that 3D points in world coordinates provide a general representation that is class-agnostic, view-stable, compact, and directly useful for downstream tasks. We formalize the task of goal-conditioned 3D point motion forecasting: given a short visual history, a set of 3D query points on an object of interest, and a language description of the intended goal, the model predicts the future 3D trajectory of each point. We introduce a full stack to study this task at scale: (1) MolmoMotion-1M is a large corpus of action-described, object-grounded 3D point trajectories annotated from 1.16M unconstrained videos; (2) PointMotionBench is a human-verified benchmark spanning 111 object categories and 61 motion types; and (3) MolmoMotion is a general motion forecasting model that supports both autoregressive coordinate prediction and flow-matching-based trajectory generation. MolmoMotion accurately predicts diverse motion patterns with different language instructions, and significantly outperforms existing motion prediction baselines on PointMotionBench. Finally, we show that the learned 3D motion prior transfers well to downstream applications: it improves training efficiency and generalization for robot manipulation, and its predicted trajectories provide effective motion guidance for generative models to synthesize videos with more realistic object motion.

5
SciOrch: Learning to Orchestrate Expert LLMs for Solving Frontier Multimodal Scientific Reasoning Tasks

Frontier scientific reasoning remains a major challenge for large language models (LLMs), where even the strongest commercial systems fall short of expert-level performance. A closer look at model behavior reveals substantial complementarity that single-model evaluation hides: different frontier models excel on different question types, and no single model captures the full picture. We present SciOrch, a framework that trains a lightweight 8B model to orchestrate frontier LLMs for scientific reasoning. The orchestrator decomposes each question, delegates sub-problems to selected commercial models through API calls, and synthesizes a final answer. Training such an orchestrator is fundamentally harder than conventional agentic RL: each action triggers an API call that is expensive in both dollar cost and latency, making standard online rollouts infeasible. We address this with an MCTS-based approach that produces diverse orchestration trajectories, extracts per-node single-turn samples, and optimizes the orchestrator with GRPO-style training. On a 240-question test set spanning SGI-Reasoning and Scientists' First Exam, SciOrch reaches 56.66% average accuracy, outperforming the strongest single commercial model by 3.74% and the strongest multi-agent baseline by 3.33%. It also attains the best accuracy on both SGI and SFE with less than half the API cost of typical multi-agent methods.
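MCTS implementations commonly decide which branch to expand with the UCT rule, trading off a child's mean value against how rarely it has been visited. The abstract does not say which selection rule SciOrch uses, so this is a generic sketch, assuming each child records `visits` and total `value` and approximating the parent's visit count by the sum over its children.

```python
import math


def uct_select(children: list[dict], c: float = 1.41) -> dict:
    """Return the child maximizing value/visits + c * sqrt(ln(N_parent) / visits)."""
    for child in children:
        if child["visits"] == 0:
            return child  # try every unvisited action once before scoring
    parent_visits = sum(child["visits"] for child in children)

    def score(child):
        exploit = child["value"] / child["visits"]  # mean reward observed so far
        explore = c * math.sqrt(math.log(parent_visits) / child["visits"])  # bonus for rarely tried actions
        return exploit + explore

    return max(children, key=score)
```

Since every expansion here would trigger a paid API call, the exploration constant `c` effectively sets how much budget goes to untried orchestration choices versus refining known-good ones.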

3
PAIWorld: A 3D-Consistent World Foundation Model for Robotic Manipulation

World foundation models (WFMs) are powerful simulators, yet they predominantly operate in a single-view setting and lack the multi-view 3D consistency required for robotic manipulation. While robotic systems rely on multiple cameras (egocentric, eye-to-hand, and wrist-mounted) for policy learning, current multi-view world models simply concatenate view tokens without explicit geometric reasoning. This causes cross-view object drift, depth inconsistency, and texture misalignment. We trace these failures to two deficiencies: the absence of an explicit inter-view communication mechanism and the lack of a 3D geometric prior. We argue that resolving both simultaneously is necessary and sufficient. To address this, we present PAIWorld, a framework that augments diffusion-transformer world models via three core components: (1) Geometry-Aware Cross-View Attention blocks that establish an explicit pathway across views, (2) Geometric Rotary Position Embedding that encodes camera ray directions and extrinsic poses into the attention mechanism, and (3) Latent 3D-REPA, which distills 3D-aware features from frozen 3D foundation models to ensure 3D consistency. Built upon a DiT-based world foundation model, PAIWorld achieves state-of-the-art multi-view 3D consistency on robotic manipulation benchmarks, ranking 1st on the WorldArena leaderboard and 2nd on the AgiBot-Challenge2026 leaderboard, while enabling downstream applications such as model-based planning, world action models, and multi-view policy post-training.

3
ViT-Up: Faithful Feature Upsampling for Vision Transformers

Vision Transformers (ViTs) have become a dominant architecture for visual representation learning, providing exceptionally strong and broadly reusable backbone features. However, ViTs are commonly operated on relatively small patch-token grids due to the quadratic cost of global self-attention, which creates a persistent bottleneck for dense prediction tasks such as semantic segmentation and depth estimation. This has motivated the development of task-agnostic feature upsamplers. While recent state-of-the-art methods produce visually sharp dense representations, their reliance on shallow image encoders for guided upsampling can introduce feature leakage, fragmentation, and blur. We introduce ViT-Up, an implicit feature upsampling framework that replaces external image guidance with layer-wise query construction from intermediate ViT hidden states. This enables feature prediction at arbitrary continuous image coordinates while preserving alignment with the backbone feature space. Experiments demonstrate that ViT-Up consistently outperforms state-of-the-art image-guided upsamplers across dense prediction and semantic correspondence. On DINOv3-S+, ViT-Up improves over prior methods by up to +2.07 mIoU on Cityscapes and +4.17 PCK on SPair-71k. With the larger DINOv3-B backbone, these gains increase to +3.36 mIoU and +8.09 PCK, demonstrating that ViT-Up scales favorably with backbone capacity.

2
IndustryBench-MIPU: Benchmarking Multi-Image Attribute Value Extraction for Industrial Products

Industrial products such as valves and circuit breakers are defined by dense technical specifications that govern procurement, compatibility, and safety across supply chains. These specifications are scattered across multiple heterogeneous product images, including specification tables, nameplates, and technical drawings, yet whether Multimodal Large Language Models (MLLMs) can reliably recover them remains underexplored. To fill this gap, we introduce IndustryBench-MIPU, the first large-scale benchmark for multi-image industrial product understanding, built around structured attribute extraction: recovering property-value pairs from product images. This task jointly probes text recognition on specification tables and nameplates, visual reasoning over technical drawings, domain knowledge to decode industrial terminology, and cross-image evidence integration to assemble scattered specifications. Concretely, the benchmark comprises 4,559 products across 27,652 images with 103,703 annotations spanning 18 industrial categories, constructed through multi-model consensus and three-tier quality assurance. Evaluating nine MLLMs under both single-image and product-level multi-image settings reveals a stark completeness gap: models achieve high precision (86–94%) but the best recovers only 49.9% of product-level attributes; moving from single-image to multi-image extraction costs 15–34 percentage points of recall. Multi-image completeness, not single-image accuracy, is the core bottleneck. Dataset and code are publicly available.
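The completeness gap is the distance between precision and recall over property-value pairs. A minimal sketch, assuming exact matching after lowercasing and whitespace trimming (the benchmark's actual matching rules are not given here) and treating product-level extraction as the union of pairs extracted from each image; all function names are hypothetical.

```python
def normalize(pairs):
    """Lowercase and strip property names and values so trivial formatting differences match."""
    return {(str(k).strip().lower(), str(v).strip().lower()) for k, v in pairs}


def attribute_precision_recall(predicted, gold) -> tuple[float, float]:
    """Precision and recall of extracted property-value pairs against the gold annotations."""
    pred, ref = normalize(predicted), normalize(gold)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


def product_level(per_image_predictions):
    """Merge per-image extractions into one product-level set (union)."""
    merged = set()
    for pairs in per_image_predictions:
        merged |= normalize(pairs)
    return merged
```

With four gold attributes, a model that reads only the nameplate image might return one correct and one wrong pair (precision 0.5, recall 0.25); adding a correct pair from a second image raises recall to 0.5, which is why cross-image integration drives completeness.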

2
Learning User Simulators with Turing Rewards

Learning to simulate human users in interactive settings could advance the training of agent assistants, the evaluation of personalization systems, research in the social sciences, and more. Existing approaches generally train a large language model (LLM) to match a single ground-truth response, either by maximizing its log probability or by using a similarity reward. We instead propose Turing-RL, a Turing-Test-based reinforcement learning approach for training user simulator models. Turing-RL uses a discriminative Turing reward, in which an LLM judge scores how indistinguishable a generated response is from the real user's response given the user's history; trained with this reward, the user simulator LLM learns to produce responses indistinguishable from what the user could have said. Across two domains, conversational chat and Reddit forum discussion, we find that Turing-RL consistently outperforms baseline methods on both LLM and human evaluation metrics. Our study suggests that optimizing for indistinguishability, rather than response matching, is effective for learning user simulators.
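The abstract does not say how the judge is prompted or how its verdict becomes a scalar reward. The sketch below assumes one plausible form, a pairwise test: the judge sees the user's history and two replies in random order, and the reward is the fraction of trials in which it picks the generated reply as the real one. The `Judge` interface and the toy length-based judge are hypothetical stand-ins for an LLM call.

```python
import random
from typing import Callable, List

# Hypothetical judge interface: given the user's history and two candidate
# replies, return 0 or 1, the index of the reply it believes the real user wrote.
Judge = Callable[[List[str], str, str], int]

def turing_reward(history: List[str], generated: str, real: str,
                  judge: Judge, trials: int = 8, seed: int = 0) -> float:
    """Fraction of trials in which the judge picks the generated reply as real.

    The presentation order is randomized each trial so that a judge with a
    position bias cannot reward or penalize the simulator systematically.
    """
    rng = random.Random(seed)
    fooled = 0
    for _ in range(trials):
        if rng.random() < 0.5:
            if judge(history, generated, real) == 0:
                fooled += 1
        else:
            if judge(history, real, generated) == 1:
                fooled += 1
    return fooled / trials

def longer_is_real(history: List[str], first: str, second: str) -> int:
    """Toy stand-in for an LLM judge: assumes the longer reply is the real one."""
    return 0 if len(first) >= len(second) else 1

history = ["Any plans for Friday?"]
print(turing_reward(history, "ok", "Probably dinner with my sister, you?", longer_is_real))  # 0.0
```

Against a capable judge, a reward near 0.5 means the judge cannot tell the two replies apart, which is the indistinguishability target the abstract describes.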

CEO-Bench: Can Agents Play the Long Game?

Language model agents are becoming proficient executors at isolated, short-horizon tasks such as software engineering and customer service. Yet real-world challenges require a combination of sophisticated skills that remain largely untested in agents: (1) navigating long horizons amid uncertainty; (2) acquiring information in noisy environments; (3) adapting to a changing world; (4) orchestrating multiple moving parts toward a coherent goal. We introduce CEO-Bench, which evaluates these capabilities together by simulating a representative real-world task: operating a startup for 500 days. An agent manages pricing, marketing, budgeting, and many other aspects of a fictional company through a programmable Python interface, operating in the same environment and facing the same challenges as a human CEO. Success demands analyzing noisy, interconnected business databases, translating signals into sound strategy, and coordinating many decisions with programming. The strongest agents write sophisticated code that simulates customer cohorts to forecast future cash and mines negotiation history to uncover hidden customer preferences. Even so, most state-of-the-art models struggle in this environment. Only Claude Opus 4.8 and GPT-5.5 finish above the $1M starting balance, and neither consistently turns a profit. CEO-Bench takes a first step toward measuring the intelligence required to drive sustained, adaptive progress over time.
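The abstract notes that the strongest agents write code that simulates customer cohorts to forecast future cash. As a hypothetical illustration of that idea (not code from CEO-Bench or from any evaluated agent; every number and name below is assumed), a cohort forecast follows each month's new customers through a retention curve and sums their revenue against fixed costs:

```python
from typing import List

def forecast_cash(start_cash: float, new_customers: List[int],
                  retention: List[float], revenue_per_customer: float,
                  monthly_costs: float) -> List[float]:
    """Project month-end cash balances from monthly acquisition cohorts.

    retention[k] is the assumed fraction of a cohort still paying k months
    after it joins (retention[0] == 1.0); it is treated as 0 beyond the list.
    """
    balances = []
    cash = start_cash
    for month in range(len(new_customers)):
        active = 0.0
        for joined in range(month + 1):
            age = month - joined
            if age < len(retention):
                active += new_customers[joined] * retention[age]
        cash += active * revenue_per_customer - monthly_costs
        balances.append(round(cash, 2))
    return balances

# Hypothetical: 100 new customers a month, half of each cohort gone after month one.
print(forecast_cash(1_000_000, [100, 100, 100], [1.0, 0.5, 0.25], 500.0, 100_000))
# [950000.0, 925000.0, 912500.0]
```

The arithmetic is the easy part; in the benchmark, the difficulty presumably lies in estimating inputs such as the retention curve from noisy, interconnected business databases.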

Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness

AI systems can increasingly automate scientific workflows, but the reasoning that links prior evidence, generated ideas, experiments and final claims often remains implicit inside model inference. Here we introduce Xcientist, a research harness that externalizes research synthesis and experimental validation into inspectable, contract-governed processes. Xcientist organizes literature evidence, idea states, implementation plans, ablation records and repair traces as persistent research artifacts, so that generated mechanisms can be grounded, executed, tested and revised without losing their evidential basis. We identify claim drift as a failure mode of automated research, where runnable artifacts no longer support the mechanism originally claimed. Across training-free memory systems, graph-structured traffic forecasting and multi-scale physics-informed neural networks, Xcientist preserves traceable trajectories from problem formulation to mechanism design, validation and bounded revision. These results suggest that AI scientists should be evaluated not only by their final artifacts, but by whether their synthesis and validation processes remain attributable, inspectable and scientifically accountable.

Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns

We show the standard basis of transformer hidden states already provides a training-free, architecture-general feature basis. Individual dimensions encode semantic content via their signs (+/-1) and confidence via their magnitudes, acting as independent binary registers; a feature is a subset of dimensions with a consistent sign pattern, read by counting sign agreements with no learned rotation. We validate this Bag of Dims framework across seven models spanning language (Qwen 3.5-4B, Gemma 3-4B, Mistral 7B, Qwen3-32B), vision (DINOv2, ViT-Base), and audio (AST). Signs alone carry predictive content: unit-magnitude sign patterns preserve 60-93% top-5 next-token accuracy through the LM head, and decoder-free Hamming scoring reaches 80-90% top-4096. From a single-token cache (one forward pass per token, no context, no labels), we detect 175 categories at AUC 0.97-0.99 by sign agreement; a trained probe adds only +0.018 AUC and converges to axis-aligned weights. These features are causally operative: they survive the K/V attention projections, trace to the FFN neuron coalitions that write them (random-weight controls never reproduce this), and flipping a feature's signs during the live forward pass suppresses its concept across four language models, magnitude-matched and concept-specific. Dimensions stay independent throughout (pairwise mutual information below 0.006 bits). The structure is not specific to language: the same per-dimension signs appear in self-supervised vision (DINOv2, 9/12 ImageNet superclasses), supervised vision (ViT-Base, 11/12), and audio (AST, 50/50 ESC-50 categories), so it reflects transformer training in general, not the language-modeling objective. The standard basis already suffices for feature reading at one forward pass, no optimization, no GPU-days. The open problem shifts from finding the right rotation to cataloging what each dimension encodes.
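Reading a feature by sign agreement, as described above, needs no learned weights: a feature is a set of dimension indices with an expected sign for each, and the score is the fraction of those dimensions whose sign matches. A minimal sketch with a hypothetical 8-dimensional hidden state and a made-up feature (real features would be found in a model's much wider hidden states); treating an exact zero as positive is our assumption:

```python
from typing import Dict, List

def sign(v: float) -> int:
    return 1 if v >= 0 else -1  # assumption: exact zero counts as positive

def sign_agreement(hidden: List[float], feature: Dict[int, int]) -> float:
    """Fraction of the feature's dimensions whose sign matches (+1 or -1)."""
    hits = sum(1 for dim, expected in feature.items() if sign(hidden[dim]) == expected)
    return hits / len(feature)

def hamming_distance(a: List[float], b: List[float]) -> int:
    """Number of dimensions on which two vectors disagree in sign."""
    return sum(1 for x, y in zip(a, b) if sign(x) != sign(y))

feature = {0: 1, 3: -1, 5: 1, 6: -1}              # hypothetical dims and signs
h = [0.8, -0.1, 2.3, -0.4, 0.0, 1.1, 0.2, -0.9]  # hypothetical hidden state
print(sign_agreement(h, feature))  # 0.75
```

Magnitudes are ignored entirely here, in line with the abstract's claim that signs alone carry predictive content; a confidence-weighted variant could multiply each match by the magnitude of `hidden[dim]`.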

iOSWorld: A Benchmark for Personally Intelligent Phone Agents

A useful phone agent needs to be personally intelligent. It should reason over a user's identity, history, and preferences as they exist on the device, not just follow isolated instructions in an impersonal sandbox. Existing mobile agent benchmarks lack this kind of personalization. We introduce iOSWorld, the first interactive native iOS simulator benchmark built around a persistent user identity spanning 26 newly built iOS apps. These apps contain connected data such as transactions, messages, travel records, social relationships, and financial activity. iOSWorld includes 133 tasks across three increasingly difficult categories. Single-app tasks (27) test one app, multi-app tasks (60) span 2 to 8 apps, and memory and personalization tasks (46) require agents to infer patterns from personal data. We evaluate frontier and open-source computer-use models in both vision-only and privileged vision+XML settings. The best configuration reaches 52% overall but only 37% on multi-app tasks. Privileged vision+XML access improves frontier models by up to 26 percentage points, while smaller models do not benefit from added accessibility-tree input. We release iOSWorld as an open-source benchmark with all apps, seeded data, tasks, rubrics, and evaluation code.

MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents

Current benchmarks for computer-use agents evaluate models in impersonal environments. This leaves a gap between evaluation and deployment, where personal assistants are expected to work across a user's whole digital life, including their context, historical data, and logged-in accounts. This gap is widest on web tasks, where live web evaluations cannot exercise sites that require logging in or personal information, the kind of site a real personal assistant has to drive. We introduce MyPCBench, which tests computer-use agents as personal assistants on a Linux desktop populated with 17 simulated real-world web applications and a full desktop stack, all seeded for one canonical persona, Michael Scott from The Office. We define 184 tasks in this environment, each inspired by a real request drawn from the OpenClaw community, and benchmark six closed and open-weight models with a uniform computer+bash tool surface. We find that the best model, Claude Opus 4.6, fully solves 55.4% of the tasks and is the only model above 50%. Model failures cluster on tasks that span many applications and on long trajectories, where personalization stresses an assistant the most. We release the environment, task set, and agent harness at https://mypcbench.com.

RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents

Multi-turn tool-use RL is bottlenecked by the rapid depletion of informative samples in static datasets. We observe that the gradient signal in GRPO concentrates on tasks with the highest rollout reward variance, a consequence of the Popoviciu upper bound on variance. Consequently, samples near the agent's capability boundary, where successes and failures are roughly balanced, contribute disproportionately large policy gradients. As training progresses, this boundary continuously shifts, gradually depleting the pool of informative samples in a static dataset. We propose RODS (Reward-driven Online Data Synthesis) to resolve this depletion. RODS closes the loop between RL training and data generation by repurposing the progress reward variance as a practical, zero-cost boundary detector that requires no inference beyond the rollouts already computed for training. It continuously identifies such boundary samples, synthesizes new multi-turn variants matching their structural complexity (e.g., API topology and dependency depth) via a skill-aligned resampling pipeline, and manages a dynamic replay buffer that co-evolves with the policy. Starting from 400 human seeds and maintaining an active training pool of ~800 samples, RODS achieves performance comparable to a 17K-sample offline pipeline while requiring roughly 20x fewer trajectories, and it improves over fixed-data RL and environment augmentation in our controlled setting.
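The variance argument is easy to check numerically. For rewards in [0, 1], Popoviciu's inequality caps the variance at 1/4, reached when half the rollouts succeed; a task that always succeeds or always fails has zero variance and, under GRPO's group-normalized advantages, contributes no gradient. A minimal sketch of variance-based boundary detection over hypothetical rollout rewards; the task names, binary rewards, and simple top-k rule are assumptions (RODS itself uses a progress reward):

```python
from statistics import mean, pstdev, pvariance
from typing import Dict, List

def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: each reward normalized within its task's rollout group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def boundary_tasks(rollouts: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k tasks whose rollout rewards have the highest variance."""
    return sorted(rollouts, key=lambda t: pvariance(rollouts[t]), reverse=True)[:k]

rollouts = {                           # hypothetical groups of 4 binary rewards
    "too_easy": [1.0, 1.0, 1.0, 1.0],  # variance 0: no gradient signal
    "boundary": [1.0, 0.0, 1.0, 0.0],  # variance 0.25, the Popoviciu maximum on [0, 1]
    "too_hard": [0.0, 0.0, 0.0, 0.0],  # variance 0
    "leaning":  [1.0, 1.0, 1.0, 0.0],  # variance 0.1875
}
print(boundary_tasks(rollouts, 2))             # ['boundary', 'leaning']
print(group_advantages(rollouts["too_easy"]))  # [0.0, 0.0, 0.0, 0.0]
```

Because these rewards already exist from the rollouts used for the policy update, computing their variance costs essentially nothing, which matches the sense in which the abstract calls it a zero-cost boundary detector.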

Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish

Turkish is agglutinative: meaning is carried by morphemes, yet the subword tokenizers that drive modern language models split words by corpus statistics, fragmenting semantically loaded suffixes and, in the case of WordPiece and rule-based analyzers, failing to decode their output back to the original text. This paper presents Morpheus, a neural morpheme-boundary model for Turkish that is both a lossless, morphology-aware tokenizer and a word-embedding producer. A differentiable Poisson-binomial dynamic program turns per-character boundary probabilities into soft morpheme memberships during training and exact segments at inference, with no string normalization, so decode(encode(w)) = w holds by construction. Because the model is neural, the same forward pass that tokenizes also emits a structured word embedding. Among reversible tokenizers (the only ones valid for generation), Morpheus attains the lowest bits-per-character (1.425), roughly doubles the gold morphological alignment of the subword family (MorphScore macro-F1 0.61 vs. ~0.32), and uses ~19% less GPU memory than 64K-vocabulary subword tokenizers. As an embedder, frozen Morpheus vectors lead on lexical retrieval (root-family MAP 0.85) and same-root verification (ROC-AUC 1.00), surpassing the multilingual retriever BGE-M3 and BERTurk; on context- and inflection-dependent tasks (NER, case/number probing), the heavier contextual encoders remain ahead, a trade-off we attribute to Morpheus's root-centric geometry. Code: https://github.com/lonewolf-rd/TurkishMorpheus; model: https://huggingface.co/lonewolflab/Morpheus-TR-50K; interactive demo: https://huggingface.co/spaces/lonewolflab/morpheus-tr-demo.
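A Poisson-binomial dynamic program of the kind named above can be sketched under assumptions of our own: `p[i]` is the probability that a morpheme boundary falls after character `i`, boundaries are treated as independent, and a character's soft membership in morpheme `k` is the probability that exactly `k` boundaries precede it. At inference, thresholding `p` gives hard segments, and because segmentation only cuts the unmodified string, joining the pieces returns the input exactly. The word and probabilities are hypothetical; this is not Morpheus's actual code.

```python
from typing import List

def soft_memberships(p: List[float]) -> List[List[float]]:
    """m[i][k] = P(character i lies in morpheme k), assuming independent boundaries.

    p[i] is the probability of a boundary *after* character i, so the number of
    boundaries before character i is Poisson-binomial over p[0..i-1].
    """
    n = len(p)
    dist = [1.0]  # distribution of the boundary count before character 0
    memberships = []
    for i in range(n):
        memberships.append(dist + [0.0] * (n - len(dist)))
        # Fold in boundary i: absent (count unchanged) or present (count + 1).
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1.0 - p[i])
            nxt[k + 1] += prob * p[i]
        dist = nxt
    return memberships

def segment(word: str, p: List[float], threshold: float = 0.5) -> List[str]:
    """Hard segmentation: cut after each character whose boundary prob >= threshold."""
    pieces, start = [], 0
    for i, prob in enumerate(p[:-1]):  # never cut after the final character
        if prob >= threshold:
            pieces.append(word[start:i + 1])
            start = i + 1
    pieces.append(word[start:])
    return pieces

word = "evlerde"  # ev-ler-de, "in the houses"
p = [0.05, 0.9, 0.1, 0.1, 0.8, 0.05, 0.0]  # hypothetical model outputs
print(segment(word, p))                   # ['ev', 'ler', 'de']
assert "".join(segment(word, p)) == word  # lossless: pieces rejoin to the input
```

The recurrence in `soft_memberships` uses only products and sums of the boundary probabilities, so the same computation written in an autodiff framework would pass gradients back to the boundary model; that is one way to read "differentiable" here.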

Physics-IQ Verified

Video generative models (VGMs) have become a new frontier that can be used not just for video generation but for a multitude of downstream tasks, including world modeling. To advance these tasks, a good video model must understand the physical reality of the world. Evaluating this understanding is an emerging field and has led to the Physics-IQ benchmark, which quantifies it explicitly by comparing model-generated videos to real-world videos of physical experiments. In this work, we present a systematic audit of the Physics-IQ benchmark, expose its shortcomings, and propose three solutions that sharpen how we measure the physical understanding of VGMs. Specifically, we improve prompt and ground-truth quality to reduce the influence of confounding factors, and we introduce a sample-level scoring system that weights each sample and metric equally. Our resulting benchmark, Physics-IQ Verified, refines 57.6% of all samples and improves over 34.8% of prompts. In a comparison study using six image-to-video generative models, we observe moderate but meaningful ranking changes (Kendall's τ = 0.46). We hope Physics-IQ Verified advances the community by providing a more reliable signal toward physically accurate VGMs. The code for the benchmark can be accessed at https://github.com/google-deepmind/physics-iq-benchmark.
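Kendall's τ, the statistic used to compare the model rankings, counts how many pairs of models keep their relative order. A minimal sketch of the tie-free version (τ-a) for two orderings of the same items; the model names and orderings are hypothetical, not the paper's results:

```python
from itertools import combinations
from typing import List

def kendall_tau(rank_a: List[str], rank_b: List[str]) -> float:
    """Tau-a between two orderings of the same items (no ties)."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant if both rankings put x and y in the same order.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs

original = ["A", "B", "C", "D", "E", "F"]  # hypothetical model ranking
verified = ["B", "A", "D", "C", "F", "E"]  # three adjacent swaps
print(round(kendall_tau(original, verified), 3))  # 0.6
```

With six models there are only 15 pairs, so each pair that flips order lowers τ by 2/15 ≈ 0.13; a handful of reordered pairs is enough to produce the moderate values reported.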

A Benchmark and Framework for Evaluating Next Action Predictions in Spreadsheets

Predictive code completion greatly accelerates developers' work. Although spreadsheets are much more common than code, comparable auto-completion features are virtually non-existent in them. To address this gap, we introduce a benchmark for systems that observe a sequence of user actions in a spreadsheet and predict future actions. Two challenges are (1) the absence of edit histories in public spreadsheet corpora and (2) the complex space of spreadsheet actions (spatial, temporal, composite). To address (1), we manually curate 52 sequences totaling 12K actions that recreate spreadsheets from public corpora, seeded by parametrized heuristics and LLM refinement. To address (2), we propose an online evaluation that expects a prediction after each user action, accepts or rejects that prediction, updates the future actions upon acceptance, and repeats until the target spreadsheet is obtained. We evaluate multiple baseline predictors (including zero-shot LLMs, fine-tuned SLMs, and classical models) and analyze what the benchmark reveals about them, including but not limited to saved actions and false positives, efficiency, and the effects of user profiles, triggers, and context.
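The online evaluation protocol can be sketched as a replay loop. This is a simplification under stated assumptions: actions are plain strings, a predictor proposes a list of next actions given the history, a proposal is accepted only if it exactly matches the next target actions, accepted actions count as saved, and rejected non-empty proposals count as false positives. The `Predictor` interface and the toy `repeat_last` predictor are hypothetical; the benchmark's own protocol (composite actions, triggers) is richer.

```python
from typing import Callable, List, Tuple

Predictor = Callable[[List[str]], List[str]]  # history -> proposed next actions

def evaluate_online(target: List[str], predictor: Predictor) -> Tuple[int, int]:
    """Replay a target action sequence, asking for a prediction after each step.

    Returns (saved_actions, false_positives).
    """
    history: List[str] = []
    saved = false_positives = 0
    while len(history) < len(target):
        proposal = predictor(list(history))
        remaining = target[len(history):]
        if proposal and remaining[:len(proposal)] == proposal:
            history.extend(proposal)  # accepted: the user skips these actions
            saved += len(proposal)
        else:
            if proposal:
                false_positives += 1  # a wrong suggestion the user had to reject
            history.append(remaining[0])  # the user performs the next action manually
    return saved, false_positives

def repeat_last(history: List[str]) -> List[str]:
    """Toy predictor: propose repeating the previous action (nothing at the start)."""
    return history[-1:]

target = ["select A1", "bold", "bold", "bold", "save"]
print(evaluate_online(target, repeat_last))  # (2, 2)
```

Tracking saved actions and false positives separately reflects the trade-off the abstract lists: an eager predictor can save more actions but interrupts the user with more rejected suggestions.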

LLM-Enabled NWDAF: A Step Toward AI-Native 6G Network Intelligence

The Network Data Analytics Function (NWDAF) is central to enabling zero-touch network management in fifth-generation (5G) networks by supporting real-time analytics and closed-loop automation. Despite its critical role, open-source NWDAF implementations remain limited in scope and accessibility. In this paper, we develop an open-source NWDAF, compatible with the open-source Free5GC core network, that collects network data via subscriptions to Network Functions (NFs) and includes an integrated Large Language Model (LLM) interface that enables natural-language interaction with human operators. The interface processes user intents, encodes them with a semantic embedding model, and maps them to one of seven predefined intent categories to trigger analytics queries or event subscription commands. This architecture abstracts away the complexity of traditional interfaces, allowing non-expert users to manage network analytics and subscriptions with ease. The system supports Access and Mobility Management Function (AMF) and Session Management Function (SMF) event subscriptions, real-time monitoring, and analytics retrieval via Prometheus, all accessible through a conversational interface. By bridging AI-driven intent recognition with standardized network analytics, our implementation improves operator usability and lays a foundation for AI-native 6G networks. The source code and datasets generated during the current study are available in the GitHub repository https://github.com/HenokDanielbfg/testbed.
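The intent-mapping step can be sketched as nearest-prototype classification by cosine similarity. Everything concrete here is hypothetical: the three intent names (the system has seven, which the abstract does not list), the tiny hand-written vectors standing in for a real semantic embedding model, and the confidence threshold.

```python
import math
from typing import Dict, List, Optional

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def map_intent(query_vec: List[float], prototypes: Dict[str, List[float]],
               min_score: float = 0.5) -> Optional[str]:
    """Return the intent whose prototype embedding is most similar, or None."""
    best = max(prototypes, key=lambda name: cosine(query_vec, prototypes[name]))
    return best if cosine(query_vec, prototypes[best]) >= min_score else None

prototypes = {  # hypothetical intent prototypes in a toy 3-d embedding space
    "subscribe_amf_event": [1.0, 0.1, 0.0],
    "subscribe_smf_event": [0.1, 1.0, 0.0],
    "query_analytics":     [0.0, 0.1, 1.0],
}
# Stand-in for the embedding of "notify me when a UE registers" (hypothetical).
print(map_intent([0.9, 0.2, 0.1], prototypes))  # subscribe_amf_event
```

A threshold such as `min_score` gives a conversational interface a way to ask the operator to rephrase instead of triggering the wrong subscription; whether the described system does this is not stated.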

05

PRODUCT HUNT

Product Hunt - June 18, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

Japanly AEO

See if Japanese AI search recommends your brand

Honestly

See what Reddit and TikTok honestly think about your product

Ploy.ai

Ploy turns your website into your company's growth engine.

Upstream

The inbox designed for humans and agents

Tiles: Map Your Adventures

Turn Apple Health workouts into a private route map

Adapt

The company brain that gets work done

Grayscale for Safari

Turn Safari black & white and browse with less distraction

Otty

A Mac native and beautiful terminal emulator

Agentic videos by D-ID

Interactive videos that talk back

CashOut

Block sports betting apps, track how much you saved

Jesse

Stop building Apollo/Clay lists. Search the live internet.

AI‑Native eCommerce Infrastructure

A unified control plane for Magento with Claude Code web

Tine

An AI desktop cursor that does the work for you

Merlin by Encord

Manage your AI data infrastructure in a single conversation

Genie Mentions

AI that gets you *and* the people in your life, together

VELA

Securely execute AI-generated & untrusted code

VoiceOS

A voice assistant that's a real JARVIS for your computer

The City Mesh

Fund your project with a 3D city of sponsors

Buddy

Free Figma agent + Import anything to Figma

Elvin

Proactive AI that finds and finishes work before you ask

Locofy: design-to-code agents

Agentic frontend layer between Figma and Cursor & Claude

Tabstack Dev Tools

Ditch your scraper. Make one API call with any tool.

Splice

Emojis and GIFs, Anywhere on Your Mac

Speed Reader

Read 2–5x faster with zero effort

InstantDelay

Add, remove, or adjust stream delay while already live

DeskArcade

An arcade in your menu bar - playable over anything

Viktor for Microsoft Teams

The most powerful AI employee, now in Microsoft Teams

Juno

Free, local AI powered Voice to Text w/ live transcriptions

Refuse

Block vulnerable package installs for you and your AI

LayerProof Bristol

Agentic reports your clients want to read

CADAM

AI Tinkercad

Tabnxt

AI tab manager that suspends background RAM hogs

Labs AI

Turn any text into natural AI voiceovers on iPhone

Retool

Build anywhere. Govern in Retool.

Cliptop

Clipboard history for Mac, right under the notch.

Snapchat SPECS

Powerful computer built into lightweight see-through glasses

SuperGoal

World cup in your menu bar

memi

The AI agent harness for product design teams

Polygram Coding Agent

AI-native coding assistant that helps developers in any IDE

MCP 2000

AI Drum Machine MPC in your browser

Parano.ai

Never miss a competitor's move

NotchSpace

Turn your Mac notch into an intelligent active workspace

Henji

AI replies that are trained to sound like you

Swytchcode CLI

Give agents reliable access to 2,000+ APIs w/ durable state

Restaurant Menu Visualizer

Share the menu with Ava and see what each dish may look like

Mirlo

Social media for real connections. No likes, no algorithm.

Dualora

Record in both 16:9 and 9:16 at the same time

TapSign

Send, sign & manage documents easily

ClipDone

Automatic short-form video editing

Locus Founder

Text an AI agent and it builds + runs your business

06

TECHMEME

Techmeme - June 18, 2026

Techmeme Digest: Major tech headlines and industry conversations.

Sources: the early Chinese backers of Manus, including HSG and Tencent, plan to buy the AI startup back from Meta at the $2B price Meta paid (The Information)
Source: Techmeme · Published: Jun 18, 2026

The Information: The early Chinese backers of AI firm Manus are planning to buy the firm back from Meta Platforms at the $2 billion price Meta paid …

Architect Labs, which aims to use AI to cheapen and speed up the process of designing custom chips, raised a $24M seed led by Kindred Ventures (Max A. Cherney/Reuters)
Source: Techmeme · Published: Jun 18, 2026

Max A. Cherney / Reuters: Architect Labs said on Thursday it had raised $24 million in seed funding to build a company that will use artificial intelligence to speed and ease the design of custom chips.

Sources: the EU is set to unveil its preliminary findings that AWS and Azure likely meet the criteria for regulation under the DMA as early as next week (Bloomberg)
Source: Techmeme · Published: Jun 18, 2026

Bloomberg: Microsoft Corp.'s Azure and Amazon Web Services are on a collision course with the European Union's tough digital competition rulebook …

As David Sacks steps back and Sriram Krishnan prepares to leave, Commerce Secretary Howard Lutnick and others are leading AI policy in the Trump administration (Maria Curi/Axios)
Source: Techmeme · Published: Jun 18, 2026

Maria Curi / Axios: Like its AI policy, the Trump administration's AI team is taking shape on the fly. … But with Sacks stepping …

Hive stock jumps 5%+ after announcing a $220M, three-year GPU cloud contract with Bell Canada and Cohere, as it pivots from bitcoin mining (James Van Straten/CoinDesk)
Source: Techmeme · Published: Jun 18, 2026

James Van Straten / CoinDesk: HIVE Digital Technologies (HIVE) shares jumped 10% in pre-market trading on Thursday after the company announced a $220 million …

Former Gojek CEO and Indonesian education minister Nadiem Makarim is charged with taking ~$46M in rewards tied to a Chromebook procurement contract for schools (Bloomberg)
Source: Techmeme · Published: Jun 18, 2026

Bloomberg: The trial of one of Southeast Asia's most successful entrepreneurs is adding to concern about governance under President Prabowo Subianto.

An in-depth look at Meta's AI-fueled rampage through its engineering organization, reassignments of engineers on core teams to data labeling, and more (Gergely Orosz/The Pragmatic Engineer)
Source: Techmeme · Published: Jun 18, 2026

Gergely Orosz / The Pragmatic Engineer: Leadership at the social media giant has been on an AI-fueled rampage through its engineering org. We report what's happened

Prem AI, a Swiss startup that lets hedge funds and law firms run AI models on their own infrastructure, is raising a $100M Series A, targeting a $500M valuation (Natalia Kniazhevich/Bloomberg)
Source: Techmeme · Published: Jun 18, 2026

Natalia Kniazhevich / Bloomberg: Prem AI, a Swiss startup that helps companies including hedge funds and law firms run artificial intelligence models …

Accenture says it will buy a majority stake in Dragos and fully acquire runZero and NetRise in a combined deal for the cybersecurity startups valued at $4.18B (Anhata Rooprai/Reuters)
Source: Techmeme · Published: Jun 18, 2026

Anhata Rooprai / Reuters: Consultancy firm Accenture (ACN.N) said on Thursday it will buy a majority stake in Dragos and fully acquire runZero and NetRise in a combined deal valued at $4.18 billion.

Verse Enterprises, which wants to provide energy-management software for 100 data centers by 2027, raised a $54M Series B from Nvidia and others (Summer Maxwell/Bloomberg)
Source: Techmeme · Published: Jun 18, 2026

Summer Maxwell / Bloomberg: Verse Enterprises Inc. is betting batteries and solar can help data centers skip the line. Long queues to get connected to the strained energy grid …

Trump, who entered office opposing AI regulation, is shaping the industry through case-by-case interventions without clear rules, creating major uncertainty (Axios)
Source: Techmeme · Published: Jun 18, 2026

Axios: The Trump administration entered office promising to get government out of the AI industry's way. It hasn't worked out that way.

AI inference startup Baseten is raising $1.5B in a dual-tiered deal, with some investors putting in money at an $11B valuation and others at a $13B valuation (Angel Au-Yeung/Wall Street Journal)
Source: Techmeme · Published: Jun 18, 2026

Angel Au-Yeung / Wall Street Journal: Baseten, part of a growing Silicon Valley ecosystem offering services to enable low-cost AI models, is raising $1.5 billion in a new round

Report: DeepSeek's first external funding round has a non-negotiable term for investors to not poach its staff or encourage them to start their own companies (CNBC)
Source: Techmeme · Published: Jun 18, 2026

CNBC: China's DeepSeek has a precondition for its $7.4 billion maiden fundraise: no poaching the AI lab's talents.

Guardrails Alliance, a super PAC that has raised $5M, debuts to advocate for AI safety legislation and counter pro-industry lobbying, running ads for Alex Bores (New York Times)
Source: Techmeme · Published: Jun 18, 2026

New York Times: The Guardrails Alliance, which has raised $5 million, is positioning itself as a populist effort that will take on the pro …

Dream, co-founded by ex-NSO Group CEO Shalev Hulio with a focus on protecting critical infrastructure, raised $260M at a $3B valuation, up from $1B in 2025 (Galit Altstein/Bloomberg)
Source: Techmeme · Published: Jun 18, 2026

Galit Altstein / Bloomberg: Dream, an Israeli artificial intelligence company that provides AI and cybersecurity services to governments and critical infrastructure operators …

07

STARTUP ARCHIVE

Startup News - June 18, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction.”

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

“Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot.”

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice.”

Paul Graham on why starting with a “small, intense fire” is the key to startup growth
Source: Startup · Published: Mar 20, 2026

“You have to know who those first users are and how you're going to get them.”

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth.”

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?”

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

“There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at.”

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations”

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

“If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well.”

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over.”

Spencer Rascoff: “I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

“If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive.”

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right”

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT


Solidot News - June 18, 2026

Solidot Feed: Highlighting essential tech & open-source news.

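Three Secure Boot certificates are about to expire

Three Secure Boot certificates that Microsoft issued in 2011 will expire on June 24. Secure Boot checks the digital signatures of all firmware loaded during system startup to make sure it comes from a trusted provider. It is designed to block UEFI bootkits, malware that tampers with UEFI; once installed, such malware is very hard to detect and survives even a reinstall of the operating system. Secure Boot uses cryptographic signatures to ensure that every piece of firmware loaded during boot is trusted by the computer's manufacturer, building a chain of trust that prevents attackers from swapping the expected boot firmware for malicious firmware. In 2023, however, researchers discovered LogoFail, a severe vulnerability in the UEFI boot process of almost all Windows and Linux systems. The flaw sits in the software that displays the hardware manufacturer's logo at boot: attackers can exploit bugs in its image parser to bypass Secure Boot and infect UEFI with malicious firmware. Microsoft therefore replaced the three certificates issued in 2011 with new ones issued in 2023. Windows users can check whether their certificates have been updated under Windows Security > Device security > Secure Boot. Linux users should watch for updates to a program called shim.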

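JPMorgan and Goldman Sachs bar Hong Kong staff from using Anthropic models

U.S. investment bank JPMorgan Chase has barred its Hong Kong employees from accessing Anthropic's models, a sign of how intensely the technology's use outside the United States is being scrutinized. Because of specific "terms of use" wording in Anthropic's licensing agreement with JPMorgan, the bank has removed Claude from its internal list of large language models (LLMs) approved for Hong Kong-based staff. Goldman Sachs made a similar decision earlier, removing Claude in April from the list of tools approved for its Hong Kong employees. In April, Anthropic first opened its Mythos model to testing by a small number of companies and institutions, warning that the model can find cybersecurity vulnerabilities and is not suitable for wide release. In early June, Anthropic released Fable 5, the first public version of a Mythos-class model, with numerous restrictions meant to rein in its ability to exploit network vulnerabilities. Washington nonetheless issued an emergency export-control order on national-security grounds, forcing Anthropic to shut down the Mythos 5 and Fable 5 models worldwide.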

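Novo Nordisk has 1.3 TB of internal data stolen, faces $25 million ransom demand

The extortion group FulcrumSec claims to have breached the network of pharmaceutical giant Novo Nordisk and stolen about 1.3 TB of data, including source code, drug research, clinical trial records, employee and physician information, production-system information, and internal AI model data. It demanded a $25 million ransom from Novo Nordisk without success and is therefore considering selling part of the data. FulcrumSec says Novo Nordisk representatives contacted the group on June 3. FulcrumSec also said it is considering releasing data publicly as a way to push back against companies that refuse to pay ransoms. A Novo Nordisk spokesperson said the company is in contact with the relevant authorities.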

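Scientists trace plague back 5,500 years

Scientists have found the oldest known evidence of plague, dating its emergence to about 5,500 years ago, roughly 200 years earlier than previously thought. Researchers searched for traces of the plague bacterium Yersinia pestis at four cemeteries near Lake Baikal in Siberia and found plague DNA in the teeth of 18 ancient hunter-gatherers. Radiocarbon dating of the skeletons showed that the disease struck in two waves, the first about 5,500 years ago. The bacterium was probably spread by marmots, and local people may have been infected by eating raw organ meat or by handling infected hides during butchering. Many of the dead were children aged 8 to 11. Early plague was as deadly as the medieval Black Death, devastating not only densely populated cities but also small, nomadic hunter-gatherer groups.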

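Survey finds one-third of Chinese adolescents sleep poorly

Researchers at Shanxi University have published a paper in PLOS One reporting that adolescents' mental health, body mass index (BMI), and screen time are significantly associated with sleep quality, and that girls and teenagers in rural areas tend to sleep worse. The researchers surveyed 5,713 adolescents aged 13 to 18 in six Chinese cities: Shanghai, Suzhou, Taiyuan, Wuyuan, Xingyi, and Urumqi. They measured sleep quality with the Pittsburgh Sleep Quality Index (PSQI) and also collected data on BMI, physical fitness, sedentary time, screen time, and mental health, along with each participant's place of residence (urban or rural) and sex. Overall, 33.71% of respondents had poor sleep quality, with significant differences by residence and sex. Rural adolescents had a higher rate of poor sleep than urban adolescents (35.78% vs. 31.90%) and did worse on sleep latency, sleep duration, and sleep disturbances. Girls did worse than boys on almost every sleep measure: 38.40% of girls had poor sleep quality, compared with 29.20% of boys. Higher BMI had a markedly stronger negative effect on girls' sleep.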

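French physicist and science popularizer stripped of his doctorate for plagiarism

French physicist and popular-science figure Étienne Klein has been stripped of his doctorate for plagiarism in his thesis. A physicist at the Alternative Energies and Atomic Energy Commission (CEA), he has published more than 30 books and hosts a weekly science program. He has faced accusations of plagiarism in his popular-science writing since 2016, and in August 2024 his doctoral thesis also came under scrutiny. He earned his doctorate in 1999 from a university that has since been merged into Université Paris Cité. An analysis found that about one-fifth of the thesis's pages were suspected of plagiarism, with material taken from the writer Albert Camus, the physicist Louis de Broglie, and even papers by members of his thesis committee. Université Paris Cité then opened an investigation, found that nearly two-thirds of the thesis was plagiarized, and revoked his doctorate. Klein responded that he had read a great many books and may have unknowingly worked material he had absorbed into his thesis.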

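Chinese cars' share of European new-car sales set to exceed 10%

According to figures from the think tank Rhodium Group, China-made cars accounted for 9.3% of new-car sales in the EU as of December 2025, up 7.1 percentage points from January 2023, and the share is expected to exceed 10% in 2026. The share of Chinese-brand cars exported to Europe and elsewhere from third countries outside China also reached 6.2% in December 2025, up 5.5 percentage points. The EU began imposing additional tariffs on China-made battery electric vehicles in autumn 2024. Chinese companies, however, have stepped up exports of plug-in hybrid vehicles (PHVs), which the tariffs do not cover, so their momentum has not slowed. Chinese automakers are also opening European bases one after another for procurement and production.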

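Apple prepares to raise prices

Apple is the latest company to raise prices because of the memory shortage driven by the AI boom. Outgoing Apple CEO Tim Cook said the memory supply situation is “unsustainable” and that price increases are “inevitable.” He did not say when prices would go up, which products would be affected, or whether the next-generation iPhone 18, due in September, would be affected. “Memory supply is shrinking just when consumers urgently need devices, while memory makers have chosen to raise prices sharply,” Cook said. “We urgently need memory prices and supply to return to reasonable levels for consumer products. That is what matters most.” Memory prices have more than doubled since October 2025.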

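U.S. holds off on blacklisting DeepSeek

The United States has held off on adding DeepSeek, ChangXin Memory Technologies (CXMT), and other companies to a trade blacklist, to avoid renewed tension in U.S.-China relations. Once a company is placed on the trade Entity List, U.S. firms may not export goods, software, or technology to it without a license, and such licenses are usually denied. The U.S. has not updated the Entity List since October of last year. Decisions on whether to blacklist an entity are made by an interagency committee whose members include officials from the Departments of Commerce, Defense, Energy, and State, and occasionally the Treasury. The committee has already approved blacklisting some companies, but the Commerce Department has not yet published the list.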

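Epic Games launches open-source version control system Lore

Epic Games has announced Lore, a new version control system whose source code is hosted on GitHub under the MIT license. Git is the most popular version control system, but it was originally designed for Linux, a large decentralized project, and is not optimized for games or for large proprietary software developed in closed environments. Git is poorly suited to collaborative work on the textures, 3D models, audio, and other assets that game studios produce, so the dominant version control system in gaming is the proprietary Perforce, and Perforce is exactly what the open-source Lore is aiming at. According to Epic Games, Lore is a centralized, content-addressed version control system that represents repository state with Merkle trees and immutable version chains, and it is optimized for binary-first storage, deduplication, and sparse/on-demand data hydration at scale.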
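The description of Lore combines three general ideas: content addressing (an object is stored under the hash of its bytes, so identical files are stored only once), Merkle trees (a node's hash covers the hashes of its children, so any change propagates up to the root), and an immutable chain of versions. The following is a minimal toy sketch of those ideas in Python, under our own assumptions; it is not Lore's actual data format or API. The class and method names (`ContentStore`, `put_tree`, `commit`) are hypothetical, the tree is a single flat level, and sparse/on-demand hydration is not modeled.

```python
from __future__ import annotations

import hashlib
import json


class ContentStore:
    """Toy content-addressed object store (hypothetical; not Lore's format)."""

    def __init__(self) -> None:
        # Maps SHA-256 hex digest -> raw object bytes.
        self.objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store bytes under their own hash; identical content is kept once."""
        digest = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(digest, data)  # deduplication happens here
        return digest

    def put_tree(self, files: dict[str, bytes]) -> str:
        """Store each file as a blob, then a node listing (path, blob hash) pairs.

        The node's own hash covers its children's hashes, Merkle-style, so
        changing any file changes the returned root hash.
        """
        entries = sorted((path, self.put(blob)) for path, blob in files.items())
        return self.put(json.dumps(entries).encode("utf-8"))

    def commit(self, files: dict[str, bytes], parent: str | None) -> str:
        """Record a version as (tree root, parent version hash)."""
        record = {"tree": self.put_tree(files), "parent": parent}
        return self.put(json.dumps(record, sort_keys=True).encode("utf-8"))


# Usage: two versions that share an unchanged binary asset.
store = ContentStore()
texture = b"\x89PNG fake texture bytes"
v1 = store.commit({"hero.png": texture, "level.map": b"map-v1"}, parent=None)
v2 = store.commit({"hero.png": texture, "level.map": b"map-v2"}, parent=v1)
print(len(store.objects))  # 7: the texture blob is stored only once
```

Because each version's hash covers both its tree root and its parent's hash, altering any earlier version would change every hash after it; that is what makes such a chain effectively immutable.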

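60% of U.S. consumers are put off by AI in brand content

According to WordPress VIP's Future of the Web Report, 60% of U.S. consumers are put off by AI in brand messaging. In addition, 74% of consumers feel that today's internet is less human than it was 10 years ago; the average person starts to feel that online interactions lack authenticity after about 40 minutes of browsing, a phenomenon dubbed “bot fatigue”; and 16% of consumers believe that no brand has yet used AI effectively.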

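GLP-1 weight-loss drugs may help curb violent impulses

A large body of research suggests that GLP-1 drugs do far more than help people lose weight; they seem able to do almost anything. According to a new study published in the journal Criminology, GLP-1 weight-loss drugs may help curb violent impulses. The researchers stress that this is an observational study and does not prove a causal link. Besides suppressing appetite, GLP-1 drugs affect behavior during weight loss, for example by reducing cravings for alcohol. The result may stem from the drugs' effects on impulse control and reward processing, and impulsivity and drinking are both recognized risk factors for violent behavior. The researchers analyzed survey data from 7,521 U.S. adults, of whom 821 had previously taken GLP-1 weight-loss drugs and 597 were currently taking them; respondents were asked about their drinking and impulsive behavior. Among current GLP-1 users, the association between impulsive behavior and violent behavior was 62% weaker, and the association between drinking and violent behavior was 52% weaker.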

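Malicious wallpapers target Chinese and Russian Steam users to steal their accounts

Russian security firm Kaspersky has warned Steam users in China and Russia that malicious wallpapers designed to hijack their accounts are spreading rapidly through the Steam Workshop. The attackers abuse a weakness in the Workshop sharing feature of the popular wallpaper app Wallpaper Engine, hiding malware inside shared wallpaper packages. Running an infected wallpaper can lead to the theft of the user's Steam account or to the installation of a backdoor or cryptocurrency miner. Security researchers found dozens of malicious wallpapers in the Workshop, each downloaded thousands or even tens of thousands of times. The attackers mainly target Chinese Steam users, with art styles and titles tailored specifically to Chinese players, who account for the largest share of downloads at 89.4% of the total, followed by Russia (5.5%), Singapore (1.4%), Hong Kong (0.9%), Germany (0.9%), Vietnam (0.9%), India (0.5%), and Canada (0.5%). Steam has since removed the wallpapers containing malware.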

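Firefox replaces the C implementation of zlib with a Rust one

Starting with version 151, Firefox relies on the zlib-rs library for gzip compression and decompression. Replacing the C implementation with one written in Rust improved performance and memory safety, but the switch also ran into crashes caused by the instability of Intel's 13th- and 14th-generation Core CPUs. The Trifecta Tech Foundation, a nonprofit dedicated to rewriting critical libraries in Rust, began discussing integrating zlib-rs into the browser with Mozilla in summer 2024, but it took two years to go from testing to shipping, largely because zlib-rs triggered the notorious Intel CPU bug. During testing, some code in zlib-rs caused Intel Raptor Lake CPUs to crash frequently. The developers eventually traced the problem to a specific instruction used when writing Huffman-coded data to memory. Once the problem was identified, it was easy to solve: the developers fixed it by adding a block of “unsafe” code.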

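Leaked financials show OpenAI lost about $8 billion in 2025

Leaked financial data show that OpenAI posted a net loss of about $8 billion in 2025. OpenAI's revenue grew from $3.7 billion in 2024 to $13.07 billion in 2025. R&D spending jumped from $7.81 billion in 2024 to $19.18 billion in 2025, including $10.59 billion in R&D costs paid to Microsoft alone. Production and distribution costs rose from $2.65 billion in 2024 to $7.5 billion in 2025. Sales and marketing spending grew from $1.11 billion in 2024 to $5.73 billion in 2025. The operating loss widened from $8.78 billion in 2024 to $20.92 billion in 2025, and the net loss soared from just over $5 billion in 2024 to nearly $39 billion in 2025. That figure, however, includes a valuation-related accounting charge of about $30 billion tied to the conversion from a nonprofit to a for-profit structure; excluding that charge, OpenAI's 2025 net loss was about $8 billion. OpenAI has disclosed that ChatGPT has more than 900 million weekly active users, but only 50 million paying users.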

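GLP-1 weight-loss drugs may boost men's testosterone levels and sperm quality

According to reports presented at the Endocrine Society's annual meeting, several studies show that GLP-1 weight-loss drugs help raise men's testosterone levels and improve sperm quality. One study analyzed the electronic health records of more than 1,600 men prescribed weight-loss drugs and found that participants' testosterone levels rose by about 30% after treatment with a GLP-1 drug or a dual hormone-receptor agonist. Another retrospective study, of the records of 215 men treated with weight-loss drugs, likewise found that their average testosterone level after treatment was about 20% higher than before. Testosterone is essential for sperm production and for maintaining fertility, and it is well established in medicine that obesity lowers testosterone levels. Fat cells contain high levels of an enzyme that converts testosterone into estradiol, the main female sex hormone. In addition, the metabolic changes and elevated inflammation caused by obesity directly impair testosterone production. As GLP-1 drugs help patients lose weight, these negative factors weaken, allowing the reproductive hormone network to return to normal.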

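Underground fungal networks stretch more than 100 quadrillion kilometers

According to a study published in the journal Science, underground fungal networks reach a total length of about 110 quadrillion kilometers (1.1 × 10^17 km), roughly 750 million times the distance from Earth to the Sun. Arbuscular mycorrhizal fungi form networks of tubular cells called hyphae. They sustain life on Earth by forming symbiotic relationships with more than 70% of plants. These networks have existed for about 475 million years; they supply plants with nutrients and water in exchange for the carbon the plants produce, and they help regulate the climate by drawing carbon into the soil. A research team from the Society for the Protection of Underground Networks (SPUN) used machine-learning models, combined with data from more than 16,000 soil samples from around the world, to produce the first global map of arbuscular mycorrhizal fungal networks. According to the researchers, a single teaspoon of soil may contain up to 10 meters of mycorrhizal network. The study also found that farming damages fungal networks: mycorrhizal network density in cropland is on average 47.3% lower than in wild ecosystems. Grasslands host the densest hyphal systems, but these areas lack protection and are increasingly degraded.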
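As a rough sanity check on the Earth-Sun comparison, the reported network length can be divided by the Earth-Sun distance. This sketch assumes the Earth-Sun distance is 1 astronomical unit (149,597,870.7 km, the IAU 2012 definition) and takes the reported 110 quadrillion km at face value; the variable names are our own.

```python
# Assumption: Earth-Sun distance taken as 1 astronomical unit (IAU 2012 definition), in km.
AU_KM = 149_597_870.7

# Reported total network length: 110 quadrillion km = 1.1e17 km.
network_km = 110e15

ratio = network_km / AU_KM
print(f"{ratio:.3e}")  # prints 7.353e+08, i.e. roughly 735 million
```

The result, about 735 million, is close to the rounded "roughly 750 million times" in the report, given that the length itself is a rounded figure.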

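Mozilla unveils Firefox roadmap

Alongside the announcement of Firefox 152, Mozilla outlined a series of upcoming features, including: Project Nova, a UI refresh; custom keyboard shortcuts; improved PDF editing, with support for splitting, merging, and reordering PDF documents directly in the browser; Multi-Account Containers becoming a native feature instead of an extension; a free built-in VPN in the mobile version (possibly limited to a few countries); Quick Answers, which lets users ask the browser questions by voice and receive AI-generated answers; Smart Window, for private AI browsing; and Power Saving Mode, which identifies the most resource-hungry tabs on a phone and automatically reduces their resource use to extend battery life.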

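ChatGPT's market share falls below 50% for the first time

According to Sensor Tower's State of AI Report for 2026, three and a half years after ChatGPT's launch its market share has fallen below 50% for the first time, as users switch between AI assistants such as Google Gemini and Anthropic Claude. ChatGPT was the fastest app to reach 1 billion monthly active users and now has more than 1.1 billion, followed by Gemini with 662 million and Claude with 245 million. ChatGPT's market share was still above 50% in January but fell to 46.4% by the end of May, with Gemini at 27.7% and Claude at 10.3%; Grok, Perplexity, DeepSeek, and Meta AI were each below 5%. Sensor Tower estimates that AI app downloads in the first half of 2026 will approach 2.3 billion and user spending will exceed $4.2 billion, compared with $1.83 billion in AI spending in the first half of 2025, a sign that the AI industry is shifting its focus from growth to monetization. However, growth in both downloads and spending has slowed, suggesting the market may be maturing even as absolute numbers keep climbing. AI app downloads declined in China and India, and downloads in Asia fell 3.3% in the first quarter of 2026.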

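Microsoft weighs DeepSeek's open-source models to cut costs

Tokenmaxxing (maximizing token usage) has taken a toll on Microsoft's AI tool Copilot, and the software giant is considering DeepSeek's open-source models to reduce costs. Microsoft is looking at a modified, self-hosted version of DeepSeek-V4 as a low-cost option to power Copilot Cowork. Copilot Cowork currently runs on models from Anthropic and OpenAI, both of which keep raising prices, and Copilot has switched from unlimited use to usage-based pricing, a move that drew strong user backlash. A cheaper model could help cut costs and keep users satisfied, but it may not sit well with the Trump administration.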

09

APP STORE RANK
