OrangeBot.AI Digest — 2026-06-18
90 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- Noam Shazeer Joins OpenAI (twitter.com)
- The founder of Craigslist has given away half a billion dollars (www.independent.co.uk)
- A website that lists websites to submit your website to (www.submission.directory)
- Swiss parliament lifts ban on new nuclear power plants (www.bluewin.ch)
- Ubiquiti: Enterprise NAS, Built on ZFS (blog.ui.com)
- .gitignore Isn't the only way to ignore files in Git (nelson.cloud)
- Modos Color Monitor Pushes E-Paper Displays Further (spectrum.ieee.org)
- I found 10k GitHub repositories distributing Trojan malware (orchidfiles.com)
- Microsoft new Outlook takes 10 seconds to do what Outlook Classic does instantly (www.windowslatest.com)
- CS 6120: Advanced Compilers: The Self-Guided Online Course (2020) (www.cs.cornell.edu)
- Emacs 31 is around the corner: The changes I'm daily driving (www.rahuljuliato.com)
- Hospitals and universities repurposing drugs at lower cost (www.kcl.ac.uk)
- AMD silently removes memory encryption from consumer Ryzen CPUs (www.tomshardware.com)
- DeepSeek Introduces Vision (chat.deepseek.com)
- SteamOS Linux 3.8 released as stable (store.steampowered.com)
GitHub Trending(15)
- google-research / timesfm
- n0-computer / iroh
- freeCodeCamp / freeCodeCamp
- obra / superpowers
- zai-org / GLM-5
- DeusData / codebase-memory-mcp
- yifanfeng97 / Hyper-Extract
- alibaba / zvec
- withastro / flue
- Kilo-Org / kilocode
- makeplane / plane
- Kong / insomnia
- Universal-Debloater-Alliance / universal-android-debloater-next-generation
- dotnet / aspnetcore
- owainlewis / awesome-artificial-intelligence
Product Hunt(15)
- VELA
Securely execute AI-generated & untrusted code
- Tine
An AI desktop cursor that does the work for you
- Merlin by Encord
Manage your AI data infrastructure in a single conversation
- LayerProof Bristol
Agentic reports your clients want to read
- Elvin
Proactive AI that finds and finishes work before you ask
- Agentic videos by D-ID
Interactive videos that talk back
- Juno
Free, local AI powered Voice to Text w/ live transcriptions
- Adapt
The AI company brain that does work for you
- Locofy: design-to-code agents
Agentic frontend layer between Figma and Cursor & Claude
- Genie Mentions
AI that gets you *and* the people in your life, together
- Jesse
Stop building Apollo/Clay lists. Search the live internet.
- Retool
Build anywhere. Govern in Retool.
- Tabstack Dev Tools
Ditch your scraper. Make one API call with any tool.
- Buddy
Free Figma agent + Import anything to Figma
- Otty
A Mac native and beautiful terminal emulator
Hugging Face(15)
- Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games
Deploying multimodal foundation models as closed-loop policies increasingly requires conditioning actions on observations that are no longer visible. However, existing benchmarks either expose the full state, conflate hidden-state reconstruction with other agent skills, or test recall only after an episode has ended. We introduce RNG-Bench (Reconstructive Non-Markov Games), a benchmark suite designed to isolate a base model's ability to reconstruct past observations and act on them during multi-step interaction. RNG-Bench includes two complementary games: Matching Pairs, where card identities briefly revealed at specific locations must later be recalled, and 3D Maze, where egocentric views must be integrated into a spatial map. Both games are evaluated under a unified harness with three controlled difficulty axes: grid size, visual pattern, and observation modality. The benchmark further introduces a head-to-head duel protocol to control for instance-level variance and a Memory Gap metric that disentangles forgetting from poor action selection. The hardest configurations require contexts of roughly 128K tokens and 350 image inputs per episode, and remain far from saturated by frontier MLLMs. Memory Gap analysis shows that most residual errors stem from forgetting earlier observations rather than from suboptimal decision making. Finally, fine-tuning Qwen3.5-9B on optimal-policy rollouts and filtered model demonstrations improves performance on RNG-Bench and transfers to existing benchmarks without degrading general multimodal capability.
- Kairos: A Native World Model Stack for Physical AI
World models are transitioning from passive visual generators to foundational, operational infrastructure for Physical AI: they must natively acquire world knowledge from heterogeneous experience, maintain persistent states over long horizons, and execute efficiently within real deployment constraints. We introduce Kairos, a native world model stack designed around these requirements. (1) Kairos learns the world by pioneering a Native Pre-training Paradigm governed by a Cross-Embodiment Data Curriculum, which organizes open-world videos, human behavioral data, and robot interactions into a progressive developmental pathway. (2) Kairos maintains the world by unified world understanding, generation, and prediction within a Native Unified Architecture equipped with Hybrid Linear Temporal Attention, where sliding-window attention captures local dynamics, dilated sliding windows capture mid-range dependencies, and gated linear attention maintains persistent global memory. We establish formal theoretical bounds demonstrating that this temporal factorization strictly limits error accumulation, mathematically guaranteeing state propagation across extended horizons. (3) Kairos runs the world by incorporating a Deployment-Aware System Co-Design to support low-latency rollout generation on server and consumer-grade hardware for real-world observation-action-feedback loops. Experiments on embodied world-model, long-horizon, and action-policy benchmarks show that Kairos achieves top level performance while offering a strong efficiency-capability trade-off. Together, these results position Kairos as a cohesive operational foundation for future self-evolving physical intelligence.
- Guava: An Effective and Universal Harness for Embodied Manipulation
Language models trained on large-scale vision-language data have demonstrated strong potential for embodied agents. Harnessing models through embodied tool use offers a promising alternative to end-to-end vision-language-action systems by combining high-level reasoning with external modules for perception, planning, and control. However, it remains unclear what makes an effective harness for embodied manipulation, and to what extent such a harness can unlock embodied capabilities in a wide range of reasoning models. In this work, we present Guava, a harness framework for embodied tool use developed through systematic exploration of the design space of agent workflows, action spaces, and observation spaces. Our study identifies three key ingredients for effective embodied agents: iterative perception-reasoning-action loops, semantic action abstractions, and multimodal observations. To understand whether these design principles are universal even to small models, we develop an end-to-end training pipeline that distills embodied manipulation capabilities into a 4B open-source model using fewer than 2K trajectories collected entirely in simulation. Experimental results in both simulation and real-world environments show performance comparable to frontier proprietary models while exhibiting strong generalization to unseen objects, novel instructions, and long-horizon tasks. Results suggest that a well-designed harness can serve as a scalable, model-agnostic interface for embodied manipulation, enabling strong emergent embodied capabilities in compact open-source models with minimal training data.
- MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction
Motion forecasting is central to visual intelligence: agents must anticipate how objects will move in order to plan actions, reason about physical interactions, and synthesize realistic futures. We argue that 3D points in world coordinates provide a general representation that is class-agnostic, view-stable, compact, and directly useful for downstream tasks. We formalize the task of goal-conditioned 3D point motion forecasting: given a short visual history, a set of 3D query points on an object of interest, and a language description of the intended goal, the model predicts the future 3D trajectory of each point. We introduce a full stack to study this task at scale: (1) MolmoMotion-1M is a large corpus of action-described, object-grounded 3D point trajectories annotated from 1.16M unconstrained videos; (2) PointMotionBench is a human-verified benchmark spanning 111 object categories and 61 motion types; and (3) MolmoMotion is a general motion forecasting model that supports both autoregressive coordinate prediction and flow-matching-based trajectory generation. MolmoMotion accurately predicts diverse motion patterns with different language instructions, and significantly outperforms existing motion prediction baselines on PointMotionBench. Finally, we show that the learned 3D motion prior transfers well to downstream applications: it improves training efficiency and generalization for robot manipulation, and its predicted trajectories provide effective motion guidance for generative models to synthesize videos with more realistic object motion.
- EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts
Reinforcement learning (RL) has become a representative post-training paradigm for LLMs, enabling strong reasoning and agentic capabilities. However, rollout generation remains a dominant latency bottleneck because autoregressive sampling decodes responses sequentially and a small number of long-tailed generations often determine completion time. Speculative decoding (SD) offers a natural way to address this bottleneck, as it is a well-established technique for serving fixed LLMs that reduces latency by rapidly drafting tokens and accepting them through parallel verification while preserving the target-model distribution. However, its practical speedups do not directly carry over to RL rollouts: (i) the evolving target policy makes any fixed drafter increasingly mismatched with the policy's output distribution; and (ii) active batch sizes shrink throughout rollout decoding, shifting decoding from compute-bound to memory-bound regimes where parallel verification can exploit underutilized compute. Therefore, accelerating RL rollouts requires both a drafter that remains effective under long, high-temperature generations from an evolving policy and system-aware use of SD that avoids compute-bound regimes. We present EfficientRollout, a system-aware self-SD framework designed to address this gap for RL rollouts. EfficientRollout induces a quantized drafter from the target model (i.e. self-speculative decoding), keeping it coupled to the evolving policy without separate drafter pretraining or online adaptation. It further coordinates a system-aware SD toggle policy with acceptance-aware draft-length adaptation, enabling speculation only in beneficial regimes while matching the drafting budget to evolving drafter quality. EfficientRollout reduces rollout and end-to-end latency by up to 19.6% and 12.7%, respectively, over an accelerated AR rollout baseline, while preserving final model quality.
- SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior
Sparse Autoencoders (SAEs) decompose residual-stream activations into interpretable features. Recent latent-space defenses increasingly rely on these decompositions, assuming that identified "unsafe" SAE features serve as actionable handles for monitoring and intervention. In this paradigm, clamping a specific harmful feature is expected to reliably prevent model misbehavior. However, we show that this success may hide a recoverable failure mode: the clamp may block one visible route to a behavior without eliminating the behavior itself. We formulate this vulnerability as post-intervention recovery, a constrained residual-space optimization problem. Starting from the post-intervention residual state, we optimize residual perturbations to recover the pre-intervention behavior while preserving the post-intervention values of the targeted SAE features. Even under a strong threat model where the intervention remains active throughout optimization and generation, recovery remains possible. To rule out that recovery simply undoes the intervention, we use encoder-orthogonal updates for single-layer interventions and the corresponding feature-map Jacobian in the cross-layer setting. Across TPP, unlearning, IOI, and refusal steering experiments, this stress test reveals recoverable behavior despite successful feature-level intervention. Especially in the safety-critical refusal-steering setting, we achieve a 95.8% recovery rate on valid samples while keeping defended-feature relative drift to 0.131, substantially below suffix-based baselines. A recovery-path attribution analysis further localizes this recovery to the SAE reconstruction residual, the component left unexplained by the SAE. These results expose a gap between feature-level control and behavioral completeness: SAE features can support causal intervention, but controlling them does not guarantee control over the underlying behavior.
- Reinforcing Dual-Path Reasoning in Spatial Vision Language Models
Spatial VLMs have made substantial progress in geometric perception, yet complex spatial reasoning requiring multi-step inference over depth, distance, and scene relations remains challenging. Moreover, different spatial queries call for fundamentally different strategies: some are best addressed through purely linguistic, step-by-step deduction, while others require explicit 3D grounding before quantitative inference. We present Dual-Path Spatial Reasoning via Reinforcement Learning for Spatial VLMs (SR-REAL), a unified framework that equips a spatial VLM with two complementary reasoning paths: Language-Only Reasoning (LOR), which performs step-by-step linguistic deduction, and Detect-Then-Reason (DTR), which detects 3D geometric cues (e.g., centers or bounding boxes) via region tokens before explicit geometric inference. SR-REAL begins with a cold-start supervised fine-tuning stage that constructs LOR and DTR chain-of-thought supervision and exposes a region-to-3D interface, followed by RL that optimizes the policy model with accuracy and format rewards; for DTR, a discrete center-based detection reward further refines geometric alignment. Across diverse spatial benchmarks, SR-REAL significantly outperforms spatial VLM baselines: (i) a single RL-trained model supports both reasoning paths, with DTR excelling in region-aware tasks through precise 3D localization and LOR enhancing general spatial reasoning; (ii) jointly training both paths fosters mutual reinforcement; (iii) high-quality, blended cold-start data is crucial for stable RL optimization; and (iv) the model generalizes across datasets and domains without per-task tuning, demonstrating positive transfer between LOR and DTR.
- Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding
Graphical user interface (GUI) grounding requires vision-language models (VLMs) to identify small target elements in high-resolution screenshots and predict precise screen coordinates. On-policy self-distillation (OPSD) is a promising post-training approach for this coordinate-sensitive task, since it provides dense token-level teacher signals beyond hard coordinate labels. However, naive OPSD is not well suited to GUI grounding: because OPSD evaluates the teacher on student-generated prefixes, the quality of coordinate-token teacher signals can degrade when the prefix has already deviated from the target coordinate, leading to unreliable teacher signals. To mitigate this, we propose quality-aware self-distillation for VLM-based GUI grounding, which improves coordinate-token teacher-signal quality through soft correctness-aware gating and teacher-probability scaling. The soft correctness-aware gate checks whether the teacher's current coordinate-token prediction can still be completed into the ground-truth box under the student-generated prefix. If not, the corresponding teacher signal is down-weighted. Teacher-probability scaling then uses the teacher's confidence as a lightweight factor to further calibrate the strength of the gated supervision. A key empirical finding is that neither component alone improves overall performance, whereas combining them consistently improves performance. This suggests that the two mechanisms play complementary roles: correctness-aware gating suppresses unreliable coordinate-token supervision, while teacher-probability scaling calibrates the strength of the remaining signals. Experiments across six GUI grounding benchmarks show that our method consistently improves the base model and outperforms strong baselines.
- The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL
Score- and flow-matching models often rely on preference-based reinforcement learning for two purposes: aligning with subjective preferences and, surprisingly, recovering properties such as visual realism and coherent object structure that matching-based training is intended to learn from the data itself. We argue that this reflects a structural mismatch. Matching losses measure ℓ2 regression error on the velocity or score field under training-time marginals, a proxy poorly aligned with the visual and semantic properties that determine sample quality at inference. Given a reward aligned with these properties, RL sidesteps the mismatch by evaluating the model on its own samples and following the reward landscape directly. The challenge is to obtain such a reward without relying on human preferences, which are expensive and conflate data realism with annotator inclinations. We propose Discriminator-Guided RL (DRL). DRL trains a discriminator to separate data from base-model samples in a pretrained representation space and uses its logit as the reward in KL-regularized RL. The pretrained space restricts the discriminator to perceptually meaningful directions, and the logit estimates the log-likelihood ratio between data and model, which is the optimal reward for targeting the data distribution. Across SiT, JiT, REPA, and RAE, DRL reduces guidance-free FID (e.g., 9.38 to 2.62 on SiT) and semantic-space FD (e.g., 88.2 to 19.3 on DINOv3 for SiT), with consistent gains across all backbones, and improves human-preference rewards without training on them. It also yields a better Pareto frontier between preference reward and image fidelity under subsequent preference-based post-training, increasing alignment while reducing low-level artifacts such as oversaturation and excessive brightness.
- From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
Reinforcement learning pipelines for Large Language Model (LLM) training often rely on manually redesigned environments between stages, requiring practitioners to heuristically infer which configuration will best improve the current policy. To automate this process, we propose the LLM-as-Environment-Engineer framework in which the current policy model analyzes failure trajectories together with contextual information and proposes modifications to the next-stage training environment configuration. We also introduce MAPF-FrozenLake, a controllable testbed whose generator exposes multi-dimensional environment configurations, making it suitable for studying and benchmarking environment redesign. On this testbed, we condition the environment engineer on structured summaries of policy behavior, failure cases, and environment statistics, from which it produces the configuration for the next training stage. With Qwen3-4B as the backbone, our framework achieves the strongest aggregate performance on our benchmarks, outperforming larger proprietary LLMs (e.g., GPT, Gemini) and fixed-environment training baselines. We further analyze which forms of context are most effective, finding that successful environment updates rely on failure evidence and preserve configurations that already work. Interestingly, the current RL checkpoint serves as a better environment engineer than the original base model, suggesting that policy learning improves the model's ability to diagnose its remaining weaknesses.
- Native Active Perception as Reasoning for Omni-Modal Understanding
Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive frameworks have emerged, they often rely on global pre-scanning, and their context cost still scales with video length. We propose OmniAgent, the first native omni-modal agent that formulates video understanding as a POMDP-based iterative Observation-Thought-Action cycle. OmniAgent executes on-demand actions to selectively distill audio-visual cues into a persistent textual memory, effectively decoupling reasoning complexity from raw video duration. To operationalize this, we introduce (1) Agentic Supervised Fine-Tuning to bootstrap native active perception via best-of-N trajectory synthesis with dual-stage quality control, and (2) Agentic Reinforcement Learning with TAURA (Turn-aware Adaptive Uncertainty Rescaled Advantage), which leverages turn-level entropy to steer credit assignment toward pivotal discovery turns. Crucially, OmniAgent exhibits positive test-time scaling, where performance improves as the number of reasoning turns increases, validating the efficacy of active perception. Empirical results across ten benchmarks (e.g., VideoMME, LVBench) demonstrate that OmniAgent achieves state-of-the-art performance among open-source models. Notably, on LVBench, our 7B agent outperforms the 10× larger Qwen2.5-VL-72B (50.5% vs. 47.3%).
- MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model
As an increasing majority of global video content is consumed on social platforms for interactive social purposes, video generation models built for social worlds are important but largely overlooked by previous studies. In this work, we define the position of social world models and build a prototype model as the first step towards this goal. While previous world models successfully simulate physical environments or gaming world exploration, they remain fundamentally detached from human-centric social dynamics. To bridge this gap as the first step to social world models, we present MaineCoon, the first real-time audio-visual autoregressive model that has 22B parameters and is capable of real-time streaming generation and sub-second interaction, with a record-breaking frame rate of up to 47.5 FPS, on a single GPU. To the best of our knowledge, MaineCoon is also the first real-time audio-visual generation model specifically optimized for social-interactive applications. To enable efficient and stable training, we introduce several novel techniques into MaineCoon, including self-resampling, cross-modal representation alignment, domain-aware preference optimization, and reinforced online-policy distillation (ROPD). We also design the first agentic streaming inference framework that supports thousand-second-scale or even longer generation while mitigating drift with agentic cache management and prompt planning. These innovations significantly accelerate training while optimizing real-time inference performance. We believe this work not only sets a new state-of-the-art (SOTA) performance benchmark for high-quality, low-latency, and long-horizon audio-visual autoregressive models, but also points out the paradigm shift desired for next-generation AI-native social platforms.
- STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability
Reinforcement Learning with Verifiable Rewards algorithms like GRPO have emerged as the dominant post-training paradigm for complex reasoning in LLMs, yet commonly suffer from policy entropy collapse during training. We conduct a first-order gradient analysis of token-level entropy dynamics under GRPO and identify a token-level credit assignment mismatch: the per-token entropy variation decomposes into the product of the trajectory-level advantage and an entropy sensitivity function over the next-token distribution, yielding an advantage-surprisal four-quadrant structure and a near-criticality property. Motivated by it, we propose STARE (Surprisal-guided Token-level Advantage Reweighting for policy Entropy stability), which identifies entropy-critical token subsets via batch-internal surprisal quantiles, selectively reweights their effective advantages, and incorporates a target-entropy closed-loop gate for stable entropy regulation. Across model scales from 1.5B to 32B and three task families (Short CoT, Long CoT, and Multi-Turn Tool Use), STARE sustains stable RL training over thousands of steps while maintaining policy entropy within the target band. On AIME24 and AIME25, STARE outperforms DAPO and other competitive baselines by 4%-8% in average accuracy, with reflection tokens and response length growing in tandem, indicating sustained exploration-exploitation balance that further unlocks RL training potential. Code is available at https://github.com/hp-luo/STARE.
- Sumi: Open Uniform Diffusion Language Model from Scratch
Diffusion models have become a promising alternative to autoregressive models. Among these, uniform diffusion language models (UDLMs) permit any token to be updated at any step, in principle enabling more flexible generation. However, no UDLM has yet been pretrained from scratch at both large parameter scale and large token budget. Both autoregressive modeling and masked diffusion modeling already have capable models at scale that the community can study and build on; uniform diffusion has none. A scratch-pretrained UDLM at scale would provide a clean reference point for studying scaling behavior, generation dynamics, controllability, and trade-offs against established autoregressive and masked diffusion models. To this end, we introduce Sumi ("ink" in Japanese), a fully open 7B uniform diffusion language model pretrained from scratch on 1.5T tokens. Sumi performs competitively with autoregressive models trained at comparable token budgets on knowledge, reasoning, and coding benchmarks, while under-performing on commonsense benchmarks, where our education-heavy data mixture is a likely contributor. We release our model weights, checkpoints, and full training recipe, including a complete specification of the data mixture over publicly available corpora. We hope this release enables the community to study native uniform diffusion at scale and catalyzes work on its as-yet poorly understood aspects.
- Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems
Multicultural multi-agent systems are increasingly deployed in globally diverse settings, where different agents are grounded in different cultural backgrounds. Existing cultural evaluation focuses on value alignment: how closely a single agent matches a target culture. Yet alignment is a per-agent property and cannot reveal whether a system, taken as a whole, preserves the cultural plurality it is meant to represent. We propose value diversity as a system-level evaluation axis for multicultural agent systems, defined through the dissimilarity between culturally conditioned agents' responses on a shared value survey. Using the World Values Survey, we evaluate 19 cultures and 18 backbone models across a wide range of system configurations. We find that diversity is largely uncorrelated with alignment, indicating that the two capture complementary system properties, and that current multicultural agent systems fall substantially below human societies in value diversity. Mixed-backbone systems narrow this gap but do not close it, and the gap persists across culture compositions and agent scales. Social interaction further erodes diversity by driving agents toward consensus, and a participatory budgeting case study shows that this homogenization narrows the breadth of collective decision-making. Together, our results establish value diversity as a distinct evaluation axis for multicultural multi-agent systems and reveal a persistent homogenization tendency in current LLM-based societies. Our code and data are publicly available at https://github.com/iNLP-Lab/MultiAgent-Diversity.
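A rough illustration of the value-diversity idea in the abstract above, which defines diversity through the dissimilarity between agents' responses on a shared value survey. The abstract does not state the exact dissimilarity measure, so the sketch below assumes one possible choice: the mean pairwise absolute difference between answer vectors, normalized by the answer scale. The function name `value_diversity`, the 1-10 scale, and the agent labels are hypothetical and are not taken from the paper or its repository.

```python
from itertools import combinations


def value_diversity(responses, scale_min=1, scale_max=10):
    """Mean pairwise dissimilarity between agents' survey answers.

    responses: dict mapping an agent label to a list of numeric answers,
    one per survey question, all on the same [scale_min, scale_max] scale.
    Returns a value in [0, 1]: 0 means every agent answered identically,
    1 means every pair of agents sits at opposite ends of every question.
    This metric is an assumption for illustration, not the paper's definition.
    """
    agents = list(responses)
    if len(agents) < 2:
        return 0.0  # diversity is undefined for one agent; report 0

    n_questions = len(responses[agents[0]])
    if n_questions == 0:
        raise ValueError("each agent needs at least one answer")
    if any(len(responses[a]) != n_questions for a in agents):
        raise ValueError("all agents must answer the same questions")

    span = scale_max - scale_min
    total = 0.0
    n_pairs = 0
    for a, b in combinations(agents, 2):
        # Normalized L1 distance between the two answer vectors.
        diff = sum(abs(x - y) for x, y in zip(responses[a], responses[b]))
        total += diff / (n_questions * span)
        n_pairs += 1
    return total / n_pairs


if __name__ == "__main__":
    # Hypothetical answers from three culturally conditioned agents.
    answers = {
        "agent_a": [1, 10, 5],
        "agent_b": [1, 10, 5],
        "agent_c": [10, 1, 5],
    }
    print(round(value_diversity(answers), 3))
```

Under this assumed metric, a system whose agents converge on the same answers after interacting would score lower than before the interaction, which is the kind of homogenization the abstract reports.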
Techmeme(15)
- Snap plans to spin off an internal generative AI video team into Dotmo, a new company focused on AI models for interactive gaming experiences, citing high costs (Lucas Ropek/TechCrunch)
Lucas Ropek / TechCrunch : Snap plans to spin off an internal generative AI video team into Dotmo, a new company focused on AI models for interactive gaming experiences, citing high costs — Snap will be spinning off an internal generative AI video team into a separate company. The new company — dubbed Dotmo …
- Sources: Meta is under contract to buy roughly 1.6 GW of computing capacity from Crusoe across two data centers in Texas and Missouri (Bloomberg)
Bloomberg : Sources: Meta is under contract to buy roughly 1.6 GW of computing capacity from Crusoe across two data centers in Texas and Missouri — Meta Platforms Inc. has secured new agreements to get AI computing power from data center developer Crusoe, bolstering the infrastructure it needs to support …
- Unsealed docs: Google lost a court fight against a 2023 US warrant in a Jan. 6 pipe bomb probe seeking info of 300+ users who searched for the RNC and DNC HQs (Zoe Tillman/Bloomberg)
Zoe Tillman / Bloomberg : Unsealed docs: Google lost a court fight against a 2023 US warrant in a Jan. 6 pipe bomb probe seeking info of 300+ users who searched for the RNC and DNC HQs — Google waged a secret court fight against a US warrant demanding identities of hundreds of internet users who searched …
- Sources: the White House and Anthropic are working on a framework that would assess the severity of AI security flaws, a sign that negotiations are progressing (Politico)
Politico : Sources: the White House and Anthropic are working on a framework that would assess the severity of AI security flaws, a sign that negotiations are progressing — The attempt to create a standardized method to evaluate this and future such incidents underscores how the administration is racing …
- Sources: APEC, a derivatives exchange founded by the 22-year-old son of pro-crypto Senator Kirsten Gillibrand, raised $30M led by Lux at a $300M valuation (Ben Weiss/Fortune)
Ben Weiss / Fortune : Sources: APEC, a derivatives exchange founded by the 22-year-old son of pro-crypto Senator Kirsten Gillibrand, raised $30M led by Lux at a $300M valuation — The 22-year-old son of a crypto-friendly senator plans to launch his own exchange for a type of derivative popularized by digital asset traders.
- GLM-5.2 is the leading open weights model on Artificial Analysis' Intelligence Index, scoring 51, only behind Fable 5's 60, Opus 4.8's 56, and GPT-5.5's 55 (Artificial Analysis)
Artificial Analysis : GLM-5.2 is the leading open weights model on Artificial Analysis' Intelligence Index, scoring 51, only behind Fable 5's 60, Opus 4.8's 56, and GPT-5.5's 55 — Z ai's GLM-5.2 is the new leading open weights model on the Artificial Analysis Intelligence Index scoring 51 and it sits on the Pareto frontier of Intelligence vs Cost per Task
- New book: Trump mocked Zuckerberg and Bezos by showing associates their fawning texts, saying they were "kissing my ass"; Musk called it "First-class groveling" (Hugo Lowell/Wired)
Hugo Lowell / Wired : New book: Trump mocked Zuckerberg and Bezos by showing associates their fawning texts, saying they were “kissing my ass”; Musk called it “First-class groveling” — “You would not believe the texts I got from these tech guys,” NYT reporters Maggie Haberman …
- Rockstar Games announces that pre-orders for Grand Theft Auto VI will go live on June 25; TTWO closed up 4.93% (Zack Zwiezen/Kotaku)
Zack Zwiezen / Kotaku : Rockstar Games announces that pre-orders for Grand Theft Auto VI will go live on June 25; TTWO closed up 4.93% — If you were worried about another delay, you can probably stop worrying … Some of you might roll your eyes at the seeming certainty of that statement, and understandably so.
- Apple opens iOS to alternative app marketplaces in Brazil and changes App Store commission structure following a settlement with competition watchdog CADE (Marcus Mendes/9to5Mac)
Marcus Mendes / 9to5Mac : Apple opens iOS to alternative app marketplaces in Brazil and changes App Store commission structure following a settlement with competition watchdog CADE — Starting today, app developers will be able to distribute apps through alternative app marketplaces in Brazil, as part of a broader set …
- Former Trump AI adviser Dean Ball is joining OpenAI to lead a new team called Strategic Futures, focused on frontier AI policy and internal governance (Ashley Gold/Axios)
Ashley Gold / Axios : Former Trump AI adviser Dean Ball is joining OpenAI to lead a new team called Strategic Futures, focused on frontier AI policy and internal governance — Dean Ball, who helped shape the Trump administration's early policies on AI, is heading to OpenAI, he told Axios exclusively.
- Sources: General Intuition, which trains AI agents in spatial reasoning, is in talks to raise $300M from Jeff Bezos and others at a $2B+ valuation (Rebecca Bellan/TechCrunch)
Rebecca Bellan / TechCrunch : Sources: General Intuition, which trains AI agents in spatial reasoning, is in talks to raise $300M from Jeff Bezos and others at a $2B+ valuation — General Intuition, the New York-based startup building a foundation model that trains AI agents how to move through space and time …
- Amazon's AI chief Peter DeSantis says the company is in talks to sell its custom Trainium AI chips for use in third-party data centers (Mark Bergen/Bloomberg)
Mark Bergen / Bloomberg : Amazon's AI chief Peter DeSantis says the company is in talks to sell its custom Trainium AI chips for use in third-party data centers — Amazon.com Inc. is in talks to sell its custom-made artificial intelligence chips for use in other companies' data centers, a key expansion of its efforts to cut into Nvidia Corp.'s dominance.
- Filings: Waymo pulls its ~4K robotaxis from highways after finding 13+ instances of the cars driving into highway sections under construction (Sean O'Kane/TechCrunch)
Sean O'Kane / TechCrunch : Filings: Waymo pulls its ~4K robotaxis from highways after finding 13+ instances of the cars driving into highway sections under construction — Waymo has recalled its fleet of nearly 4,000 robotaxis to restrict them from driving on highways while it figures out how to make the vehicles behave around construction zones.
- Seattle-based Gradial, which makes AI agents that automate enterprise marketing workflows, raised a $65M Series C led by Insight Partners at a $675M valuation (Kerry Flynn/Axios)
Kerry Flynn / Axios : Seattle-based Gradial, which makes AI agents that automate enterprise marketing workflows, raised a $65M Series C led by Insight Partners at a $675M valuation — Gradial, a Seattle-based startup that deploys AI agents to automate enterprise marketing workflows, has raised $65 million in Series C funding …
- Sources: the early Chinese backers of Manus, including HSG, ZhenFund, and Tencent, plan to buy the AI startup back from Meta at the $2B price Meta paid (The Information)
The Information : Sources: the early Chinese backers of Manus, including HSG, ZhenFund, and Tencent, plan to buy the AI startup back from Meta at the $2B price Meta paid — The early Chinese backers of AI firm Manus are planning to buy the firm back from Meta Platforms at the $2 billion price Meta paid …
Solidot(15)
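- Where did Earth's oceans come from?
Where did Earth's water come from? Scientists do not really know. There are several hypotheses about its origin. The most mainstream is the comet hypothesis, in which comets striking Earth delivered the water; there is also the asteroid hypothesis, in which impacting asteroids delivered it, and the hypothesis that Earth created its water itself. Observations of Halley's Comet by the Giotto probe in 1986 largely ruled out the comet hypothesis, because the chemical signature of Earth's water is completely different from that of cometary water. Later observations of comet Hale-Bopp, and of comet Churyumov-Gerasimenko by the Rosetta probe, also confirmed that cometary water is markedly different from Earth's. Could Earth's water then have come from asteroids? Scientists have found that the proportions of inert elements on asteroids also differ from Earth's. So did Earth's oceans mainly form on their own? The early Earth's magma ocean was rich in oxygen and its atmosphere rich in hydrogen, but hydrogen and oxygen do not combine spontaneously. Over the past few years scientists have run a series of experiments to explore whether hydrogen and oxygen could react to form water under early-Earth conditions. The experiments confirm that at least part of Earth's water could have formed on its own, but whether that could produce the oceans covering the planet today remains unsettled.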
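- Three Secure Boot certificates are about to expire
Three Secure Boot certificates that Microsoft issued in 2011 will expire on June 24. Secure Boot checks the digital signatures of all firmware loaded during system startup to ensure it comes from a trusted provider. It is designed to block UEFI bootkits, malware that tampers with UEFI; once installed, such malware is hard to detect and survives even an operating system reinstall. Secure Boot uses cryptographic signatures to ensure that every piece of firmware loaded during boot is trusted by the computer manufacturer, establishing a chain of trust that prevents attackers from replacing the expected boot firmware with malicious firmware. In 2023, however, researchers discovered LogoFail, a severe vulnerability in the UEFI boot process of almost all Windows and Linux systems. The flaw lies in the software that displays the hardware manufacturer's logo at startup; attackers can exploit its image-parsing bugs to bypass Secure Boot and infect UEFI with malicious firmware. Microsoft therefore removed the three old certificates issued in 2011 and replaced them with new ones issued in 2023. Windows users can check whether their certificates have been updated under Windows Security > Device security > Secure boot. Linux users should watch for updates to a program called shim.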
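- JPMorgan and Goldman Sachs bar Hong Kong employees from using Anthropic models
US investment bank JPMorgan Chase has barred its Hong Kong employees from accessing Anthropic's models, a sign that use of the technology outside the US faces extremely strict scrutiny. Because of specific wording on "terms of use" in Anthropic's licensing agreement with JPMorgan, the bank has removed the Claude models from its internal list of large language models (LLMs) approved for Hong Kong-based staff. Goldman Sachs made a similar decision earlier, removing Claude from the list of tools approved for its Hong Kong employees in April. In April this year Anthropic first opened testing of its Mythos model to a small number of companies and institutions, warning that the model could discover cybersecurity vulnerabilities and was not suitable for broad release. In early June Anthropic released Fable 5, the first public version of a Mythos-class model, with many restrictions to control its ability to exploit network vulnerabilities. Washington nevertheless issued an emergency export-control order on national security grounds, forcing Anthropic to shut down the Mythos 5 and Fable 5 models worldwide.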
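- 1.3 TB of internal data stolen from Novo Nordisk, with a $25 million ransom demand
The extortion group FulcrumSec claims to have breached the network of pharmaceutical giant Novo Nordisk and stolen about 1.3 TB of data, including source code, drug research, clinical trial records, employee and physician information, production system information, and internal AI model data. It demanded a $25 million ransom from Novo Nordisk without success, and is therefore considering selling part of the data. FulcrumSec says Novo Nordisk representatives contacted the group on June 3. FulcrumSec also says it is considering open-sourcing stolen data as a deterrent against companies that refuse to pay ransoms. A Novo Nordisk spokesperson said the company is in contact with the relevant authorities.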
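- Scientists trace plague back 5,500 years
Scientists have found the oldest known evidence of plague, dating its emergence to about 5,500 years ago, roughly 200 years earlier than previously thought. Researchers searched for traces of Yersinia pestis at four cemeteries near Lake Baikal in Siberia and found residual plague DNA in the teeth of 18 ancient hunter-gatherers. Radiocarbon dating of the skeletons showed that the plague caused two waves of outbreaks, the first about 5,500 years ago. The pathogen may have spread through marmots; local people may have been infected by eating raw offal or by handling infected hides during butchering. Many of the dead were children aged 8 to 11. Early plague was as deadly as the medieval Black Death, devastating not only densely populated cities but also small nomadic hunter-gatherer groups.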
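- Survey finds one third of Chinese adolescents sleep poorly
Researchers at Shanxi University have published a paper in PLOS One reporting that adolescents' mental health, body mass index, and screen time are significantly associated with sleep quality, and that girls and adolescents living in rural areas tend to sleep worse. The researchers surveyed 5,713 adolescents aged 13 to 18 in six Chinese cities: Shanghai, Suzhou, Taiyuan, Wuyuan, Xingyi, and Urumqi. They collected sleep-quality data with the Pittsburgh Sleep Quality Index (PSQI), along with data on BMI, physical fitness, sedentary time, screen time, and mental health. They also recorded each participant's place of residence (urban or rural) and sex. Overall, 33.71% of respondents had poor sleep quality. The researchers found significant differences by residence and by sex. Rural adolescents had a higher rate of poor sleep quality than urban adolescents (35.78% versus 31.90%) and fared worse on sleep latency, sleep duration, and sleep disturbance. Girls scored worse than boys on almost every sleep measure: 38.40% of girls had poor sleep quality, compared with 29.20% of boys. Higher body mass index had a more pronounced negative effect on girls' sleep.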
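- French physicist and science popularizer stripped of doctorate over thesis plagiarism
French physicist and well-known science popularizer Étienne Klein has been stripped of his doctorate over plagiarism in his thesis. He is a physicist at the Alternative Energies and Atomic Energy Commission (CEA), has published more than 30 books, and hosts a weekly popular-science program. He has faced accusations of plagiarism in his popular-science writing since 2016, and in August 2024 his doctoral thesis also came under question. He received his doctorate in 1999; his university has since been merged into Université Paris Cité. An analysis showed that one fifth of the thesis was suspected of being plagiarized, with material taken from the writer Albert Camus, the physicist Louis de Broglie, and even papers by members of his thesis committee. Université Paris Cité then opened an investigation, found that nearly two thirds of the thesis contained plagiarism, and revoked his doctorate. Responding to the accusations, Klein argued that he had read a great many books and may have unknowingly written material he had absorbed into the thesis.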
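- Chinese cars' share of new-car sales in Europe set to exceed 10%
According to figures from the think tank Rhodium Group, cars made in China accounted for 9.3% of new-car sales in the EU as of December 2025, up 7.1 percentage points from January 2023. The share is expected to exceed 10% in 2026. The share of Chinese-brand cars exported to Europe and other markets from third countries outside China also reached 6.2% in December 2025, an increase of 5.5 percentage points. The EU began imposing additional tariffs on China-made battery electric vehicles in autumn 2024. Chinese companies, however, have increased exports of plug-in hybrid vehicles (PHVs), which are not subject to the tariffs, and their momentum has not slowed. Chinese automakers are also successively setting up European bases for procurement and production.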
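- Apple prepares to raise prices
Apple is the latest company to raise prices because of the memory shortage caused by the AI boom. Outgoing Apple CEO Tim Cook said the memory supply situation is "unsustainable" and that price increases are "inevitable." He did not say when prices will rise, which products will be affected, or whether the next-generation iPhone 18, due in September, will be affected. Cook said: "Memory supply is shrinking just when consumers urgently need devices, and memory makers have chosen to raise prices sharply. We urgently need memory prices and supply to return to levels that are reasonable for consumer products. That is the most important thing." Memory prices have more than doubled since October 2025.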
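- US holds off on adding DeepSeek to blacklist
The US is holding off on adding DeepSeek, ChangXin Memory Technologies (CXMT), and other companies to its trade blacklist to avoid renewed tension in US-China relations. Once a company is placed on the Entity List, US companies may not export goods, software, or technology to it without a license, and licenses are usually not granted. The US has not updated the Entity List since last October. Decisions on whether to blacklist an entity are made by an interagency committee whose members include officials from the Commerce, Defense, Energy, and State departments, and occasionally the Treasury. The committee has already approved the blacklisting of some companies, but the Commerce Department has not yet published the list.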
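- Epic Games launches open-source version control system Lore
Epic Games has announced a new version control system, Lore, with source code hosted on GitHub under the MIT license. Git is the most popular version control system, but it was originally designed for Linux, a large decentralized project, and is not optimized for games or for large private software development in closed environments. Git is a poor fit for collaborative work on the textures, 3D models, audio, and similar files that game companies produce, so the version control system popular in the game industry is the proprietary Perforce, which the open-source Lore is targeting. Epic Games says: "Lore is a centralized, content-addressed version control system that uses Merkle trees and immutable revision chains to represent repository state, and is optimized for binary-first storage, deduplication, and sparse/on-demand data hydration at scale."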
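- Six in ten US consumers are put off by AI in brand messaging
According to WordPress VIP's "Future of the Web Report," six in ten US consumers react negatively to AI in brand messaging. 74% of consumers think today's internet feels less human than it did 10 years ago; the average person begins to feel that online interactions lack authenticity after 40 minutes of browsing, a phenomenon called "bot fatigue"; 16% of consumers think no brand is using AI truly effectively; and six in ten consumers say AI in brand messaging is a turn-off.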
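- GLP-1 weight-loss drugs may help curb violent impulses
A large body of research suggests that GLP-1 drugs do more than help with weight loss and seem able to do almost anything. According to a new study published in the journal Criminology, GLP-1 weight-loss drugs help curb violent impulses. The researchers stress that this is an observational study and does not prove a causal link. Besides reducing appetite, GLP-1 drugs affect behavior during weight loss, for example by curbing cravings for alcohol. The result may stem from the drugs' effects on impulse control and reward processing. Impulsivity and alcohol use are both recognized risk factors for violent behavior. The researchers analyzed survey data from 7,521 US adults, of whom 821 had previously taken a GLP-1 weight-loss drug and 597 were currently taking one; respondents were asked about drinking and impulsive behavior. Among people currently taking GLP-1 drugs, the association between impulsive behavior and violent behavior was 62% weaker, and the association between drinking and violent behavior was 52% weaker.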
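- Malicious wallpapers target Chinese and Russian Steam users to steal their accounts
Russian security firm Kaspersky has warned Chinese and Russian Steam users that malicious wallpapers are spreading rapidly on the Steam Workshop with the aim of hijacking their accounts. The attackers exploit a weakness in the Workshop sharing feature of the popular wallpaper app Wallpaper Engine, hiding malware inside shared wallpaper packages. Running an infected wallpaper can lead to a stolen Steam account, or to a backdoor or cryptocurrency miner being planted on the system. Security researchers found dozens of malicious wallpapers on the Workshop, each downloaded thousands or even tens of thousands of times. The attackers mainly target Chinese Steam users: the wallpapers' art style and titles are tailored to Chinese players, who account for the most downloads at 89.4% of the total, followed by Russia at 5.5%, Singapore (1.4%), Hong Kong (0.9%), Germany (0.9%), Vietnam (0.9%), India (0.5%), and Canada (0.5%). Steam has now removed the wallpapers containing malware.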
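- Firefox replaces the C version of zlib with a Rust version
Starting with v151, Firefox relies on the zlib-rs library for gzip compression and decompression. Replacing the C version with one written in Rust improved performance and provided better memory safety, but it also brought crashes caused by the instability of Intel's 13th- and 14th-generation Core CPUs. Trifecta Tech Foundation, a nonprofit dedicated to rewriting critical libraries in Rust, began discussing the integration of zlib-rs into the browser with Mozilla in the summer of 2024, but it took two years to get from testing to release. A major reason was that zlib-rs triggered the notorious Intel CPU bug. In testing, some code in zlib-rs caused Intel Raptor Lake CPUs to crash frequently. Developers eventually found that the problem was tied to one specific instruction used to write Huffman-coded data to memory; once it was identified, the problem was easy to solve, and developers fixed it by adding a piece of "unsafe code."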