Curated by Shen Huang · 90 stories · ~14 min read
DIGEST · 2026-06-09

OrangeBot.AI Digest — 2026-06-09

90 headlines across 8 sources, aggregated for this day.

Hacker News(15)

  1. If Claude Fable stops helping you, you'll never know (jonready.com)
  2. GPT-2: Too Dangerous To Release (2019) (naokishibuya.github.io)
  3. CEOs Who Think AI Replaces Their Employees Are Just Bad CEOs (www.techdirt.com)
  4. Claude Fable 5 (www.anthropic.com)
  5. System Card: Claude Fable 5 and Claude Mythos 5 [pdf] (www-cdn.anthropic.com)
  6. Apple decided not to roll out Siri in EU after denied request for exemption (www.reuters.com)
  7. 'Sloppenheimer:' Amazon employees mock the company's AI on Slack (www.404media.co)
  8. FCC wants to kill burner phones by forcing telecoms to get all customers' IDs (www.404media.co)
  9. Albania Is Not for Sale: Kushner's $4B Resort Triggers 'Flamingo Revolution' (www.yacnews.com)
  10. Cleaning up after AI rockstar developers (www.codingwithjesse.com)
  11. Making Graphics Like it's 1993 (staniks.github.io)
  12. GentleOS – Classic operating system with a lovely retro GUI (github.com)
  13. Microsoft's open source tools were hacked to steal passwords of AI developers (techcrunch.com)
  14. Facebook is paying people overseas promoting Alberta separatism (www.cbc.ca)
  15. OpenCV 5 Is Here: The Biggest Leap in Years for Computer Vision (opencv.org)

GitHub Trending(15)

  1. mvanhorn / last30days-skill
  2. RyanCodrai / turbovec
  3. roboflow / supervision
  4. opencv / opencv
  5. refactoringhq / tolaria
  6. aaif-goose / goose
  7. Andyyyy64 / whichllm
  8. TapXWorld / ChinaTextbook
  9. x1xhlol / system-prompts-and-models-of-ai-tools
  10. yikart / AiToEarn
  11. phuryn / pm-skills
  12. santifer / career-ops
  13. openai / plugins
  14. maziyarpanahi / openmed
  15. francescopace / espectre

Product Hunt(15)

  1. ZeroGPU

    The compute efficient layer for AI inference

  2. Krisp Voice Translation API

    Real-time speech-to-speech translation API

  3. AgentOS

    Manage AI agents, tasks, workspaces from one control layer

  4. Solarch

    Interactive diagrams with AI, and your code always in sync

  5. Whistle

    A fitness coach with personalized plans

  6. TravelMind

    AI-powered city discovery built on taste, not reviews

  7. Uiverse Design

    De-slop your AI generated websites

  8. VC Boom

    Score your deck, meet investors who fit, raise more. Boom!

  9. agmsg

    Stop copy-pasting between your AI coding agents

  10. BooBar

    AI Dynamic Island for your Mac

  11. Reve 2.0

    Generate and edit 4K images through layout-based control

  12. agentcad

    A CAD design tool for coding agents (free + open source)

  13. Signal Recorder SR-7

    On-device voice recorder that transcribes + exports Markdown

  14. Cove for Mac

    Like a save/load game for your work

  15. Pixel Snapper

    Editor to clean up AI-generated pixel art

Hugging Face(15)

  1. On the Geometry of On-Policy Distillation

    On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training dynamics remain poorly understood. We characterize the trajectory of OPD updates in parameter space and compare it with supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR). A suite of parameter-space diagnostics consistently places OPD in a relaxed off-principal regime: compared with SFT, its updates affect fewer weights and avoid principal directions more strongly, while compared with RLVR, they remain less tightly constrained. Beyond this static localization, OPD exhibits subspace locking: its cumulative updates rapidly enter a narrow low-dimensional channel. Constraining training to the update subspace formed early in training preserves OPD performance but substantially degrades SFT, indicating that the locked subspace is functionally sufficient for OPD. Control experiments further show that sparsifying the update tokens and shifting rollout generation off-policy preserve the rank dynamics, whereas mixing the OPD objective with RLVR changes them. Overall, these results suggest that OPD is not merely an intermediate point between SFT and RLVR, but induces its own update geometry in parameter space.

  2. LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents

    Agent systems increasingly use textual skills to encode reusable task procedures, but injecting these skills into the prompt at every step incurs substantial context overhead and exposes skill content as plaintext. We present LatentSkill, a framework that converts textual skills into plug-and-play LoRA adapters through a pretrained hypernetwork. LatentSkill stores skill knowledge in weight space rather than context space, removing per-step skill tokens while preserving modular loading, scaling, and composition. On ALFWorld and Search-QA, LatentSkill outperforms the corresponding in-context skill baseline while using substantially fewer prefill tokens: it improves ALFWorld success by 21.4 and 13.4 points on the seen and unseen splits with 64.1% fewer prefill tokens, and improves Search-QA exact match by 3.0 points with 72.2% lower skill-token overhead. Further analysis shows that generated skill LoRAs form a structured semantic geometry, can be precisely controlled via the LoRA scaling coefficient, and can be composed through parameter-space arithmetic when skill components are aligned. These findings suggest that weight-space skills provide an efficient, modular, and less exposed substrate for extending LLM agents.

  3. Latent Spatial Memory for Video World Models

    Video world models that maintain 3D spatial consistency across generated frames typically rely on explicit point cloud memory constructed in RGB space. This design is both computationally expensive, requiring repeated rendering and VAE encoding, and inherently lossy, as the round trip through pixel space discards rich features of the learned latent representation. In this paper, we introduce latent spatial memory for video world models, a persistent 3D cache that stores scene information directly in the diffusion latent space, avoiding pixel-space reconstruction. Building on this, we propose Mirage, a latent-space spatial memory framework that constructs the memory by lifting latent tokens into 3D via depth-guided back-projection and queries it by synthesizing novel views through direct latent-space warping. This unified formulation eliminates both the information loss of pixel-space reconstruction and the computational burden of repeated encoding and rendering. Experiments show that latent spatial memory achieves up to 10.57× faster end-to-end video generation and 55× reduction in memory footprint relative to explicit 3D baselines. Leveraging the geometric prior of the diffusion model, Mirage attains state-of-the-art performance on WorldScore and strong reconstruction quality on RealEstate10K.

  4. FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

    Conventional LLMs keep the full KV cache loaded during decoding, causing a severe GPU memory bottleneck for ultra-long context serving. In this report, we propose Lookahead Sparse Attention (LSA), a novel inference paradigm powered by a Neural Memory Indexer built upon the DeepSeek-V4 architecture. Rather than passively attending to all historical tokens, LSA proactively predicts future context demands and preserves only the query-critical KV chunks in the GPU memory. Crucially, we instantiate this architecture via a backbone-free decoupled training strategy. By formulating the indexer as a standard dual-encoder architecture, we train it independently using standard retrieval training frameworks without ever loading the massive backbone model into GPU memory. We demonstrate that this "less is more" paradigm significantly maximizes serving efficiency while acting as an effective attention denoiser in tasks that rely on long-term global memory. Across primary long-context evaluation suites (e.g., LongBench-v2, LongMemEval, and RULER), FM-DS-V4 compresses the average physical KV cache footprint down to merely 13.5% of the full-context baseline, while consistently preserving or slightly elevating downstream accuracy (+0.6% absolute margin on average). Crucially, at extreme 500K scales, FlashMemory suppresses the physical KV cache overhead by over 90% without destabilizing the backbone's core reasoning capacities.

  5. Agents' Last Exam

    Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not translated into economically meaningful deployment across many professional domains. We argue that this gap is largely an evaluation problem: widely used benchmarks lack sustained performance measurement on real and economically valuable workflows. This paper introduces Agents' Last Exam (ALE), a benchmark designed to evaluate AI agents on long-horizon, economically valuable, real-world tasks with verifiable outcomes. Developed in collaboration with 250+ industry experts, ALE covers non-physical industries defined with reference to O*NET / SOC 2018 (the U.S. federal occupational taxonomy). It is organized around a task taxonomy with 55 subfields grouped into 13 industry clusters covering 1K+ tasks. Current results show that the hardest tier remains far from saturated: across mainstream harness and backbone configurations, the average full pass rate is 2.6%. ALE is designed as a living benchmark: its task pool grows continuously as new workflows and industries are onboarded. More broadly, ALE is intended not merely as another leaderboard, but as an instrument for closing the gap between benchmark success and GDP-relevant impact.

  6. Echo-Memory: A Controlled Study of Memory in Action World Models

    We present Echo-Memory, a controlled study of memory mechanisms in action-conditioned world models. These models generate multi-segment videos from a first frame, text prompt, and camera-action sequence, but their central failure is often memory rather than local image synthesis: after the camera leaves and returns, the scene or salient object may silently change. Existing memory designs are hard to compare because gains are entangled with backbone, training, retrieval, and evaluation differences. Echo-Memory fixes the action-to-video interface and varies only how history is stored and read by the generator. Under a shared video diffusion backbone, optimizer, camera-action representation, sampler, and evaluation pipeline, we compare raw context, compression-based memory, spatial summaries with different read-out paths, and state-space recurrence. This matched matrix separates four otherwise conflated axes: capacity, compression, read-out, and recurrence. We also evaluate memory through a three-branch protocol: replay quality, in-domain loop revisit, and open-domain return probes. The branches routinely disagree, showing that replay fidelity is not a sufficient proxy for remembering a world. Three findings follow. Raw context is a strong capacity baseline and improves open-domain return far more than it improves replay metrics. Compactness is not a free substitute for capacity: aggressive spatial and hybrid-compression memories lose the salient evidence needed for return. Finally, block-wise state-space recurrence is the strongest open-domain return mechanism in our matrix, showing that the structure of implicit memory matters as much as the decision to use it. These results provide a compact protocol for studying memory in action world models beyond isolated replay metrics.

  7. OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

    Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair, focus on single-agent Solo play, and lack unified protocols for evaluating heterogeneous agent classes (commercial VLMs, open-weight VLMs, and specialized game policies) on the same footing. We address these gaps with OmniGameArena, a real-time benchmark of twelve newly built Unreal Engine 5 games spanning Solo (7), PvP (3), and Coop (2) with unified action interfaces, and the Improvement Dynamics Curve (IDC), an agentic-reflection harness in which a tool-using reflector LLM autonomously refines a bounded skill prompt across multiple rounds. Beyond cold-start leaderboard scores, IDC exposes two additional observables for each (agent, game) pair: how the score evolves across reflection rounds, and how the learned skill behaves on held-out task variants. We report these observables for twelve VLM agents on the cold-start leaderboard and four top agents under IDC.

  8. End-to-End Context Compression at Scale

    Long-context language model inference is bottlenecked by memory, as the KV cache grows with context length. Recent techniques to compress the KV cache fall short: they either degrade model quality substantially or require considerable time and compute to compress a single long prompt. Furthermore, many methods require the input to fit within the target model's context window, and are generally incompatible with modern production inference engines. Encoder-decoder compressors, which map a long token sequence to a shorter sequence of latent embeddings consumed by a decoder, are an appealing alternative in principle. However, existing approaches are not competitive with KV cache compression on the accuracy-efficiency frontier. In this work, we revisit encoder-decoder compression and close this gap. We first perform an architecture search, pre-training many variants from scratch to determine how best to design and train encoder-decoder compressors. Guided by our findings, we continually pre-train a family of 0.6B-encoder, 4B-decoder models on over 350B tokens each, at compression ratios of 1:4, 1:8, and 1:16. We introduce Latent Context Language Models (LCLMs), a family of compressors that improve the Pareto frontier across general-task performance, compression speed, and peak memory usage. We demonstrate that LCLMs serve as efficient backbones for long-horizon agents, letting the agent skim through a compressed long context and adaptively expand relevant segments on demand.

  9. A Geometric Account of Activation Steering through Angle-Norm Decomposition

    Linear activation steering has gained popularity as a simple and empirically effective way to control language model behavior. More recently, spherical steering paradigms have been proposed to address limitations of additive interventions, often motivated by the assumption that hidden-state norm does not carry concept-relevant information. In this work, we revisit this assumption through a controlled empirical study designed to disentangle the roles of angular and radial components. We show that steering methods differ mainly in how they couple two geometric effects: changing a token's angular alignment with a concept direction and changing its hidden-state norm. Across seven language models, we find that concepts are represented primarily in angular structure, supporting the motivation for spherical methods, but that norm remains important for the stability and downstream effects of steering. Our results explain why interventions with similar concept-level effects can behave differently, and suggest that activation steering should be parameterized by interpretable angular and radial components of the intervention, rather than by a single additive coefficient that entangles these two effects.

  10. SwiftVR: Real-Time One-Step Generative Video Restoration

    Real-time video restoration (VR) for live streams requires high-resolution outputs under strict per-frame latency constraints. Existing one-step diffusion-based VR models remain difficult to deploy on consumer-grade GPUs due to two main bottlenecks: quadratic spatial attention at high resolutions and the latency-memory overhead of large video autoencoders. We present SwiftVR, a streaming one-step generative VR framework that reduces both bottlenecks under a causal chunk-wise protocol. For attention, mask-free shifted-window self-attention gathers each spatial window into a dense tensor via deterministic indexing, keeping all attention calls on the dense scaled dot-product attention path without masks, cyclic shifts, padding, or hardware-specific sparse kernels. Because SwiftVR uses only standard dense SDPA calls, the trained model transfers to consumer GPUs without retraining or custom kernels. For autoencoding, a lightweight Restoration-aware Autoencoder enables fast chunk-wise decoding while preserving reconstruction quality. On a single H100, SwiftVR sustains 31 FPS at 2560x1440 and 14 FPS at 3840x2160, whereas all compared diffusion-based VR baselines exceed the memory limit at 4K. On a consumer RTX 5090, SwiftVR reaches 26 FPS at 1920x1080. To our knowledge, SwiftVR is the first generative VR model to achieve real-time 1080p streaming on a consumer-grade GPU, while attaining strong no-reference perceptual quality with lower inference cost. Project is available at https://h-oliday.github.io/SwiftVR.

  11. AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing

    World-action models have emerged as a promising paradigm for robot manipulation, jointly modeling visual scene dynamics and actions to inject physical priors into policy learning. However, existing world-action models couple world prediction and action execution at the same temporal resolution, forcing the world branch to model near-term frame variations that are redundant and weakly informative. We posit that strictly binding world prediction and action execution to the same temporal rhythm may underutilize the potential of the video branch for embodied control. Therefore, we propose AHA-WAM, an Asynchronous Horizon-Adaptive World-Action Model built on a dual Diffusion Transformer (DiT) architecture that reorganizes world-action modeling around this temporal asymmetry. AHA-WAM instantiates the video DiT as a low-frequency world planner that maintains rolling key-value memory over past observations and exposes reusable layerwise latent context encoding long-horizon scene evolution, while a high-frequency action DiT executes short action chunks in closed loop by querying this context through layerwise joint attention. To support asynchronous execution, we introduce horizon-adaptive offset training and Observation-Guided Video-Context Routing (OVCR), which together let the action expert exploit long-horizon world context while remaining responsive to real-time execution state without rerunning the video DiT. Experiments on RoboTwin and real-world manipulation tasks show that AHA-WAM achieves state-of-the-art performance without any robot-data pretraining, attaining 92.80% average success on RoboTwin and 78.3% success across 4 real-world tasks, while reaching 24.17 Hz closed-loop control with a 4.59x speedup over Fast-WAM.

  12. Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses

    LLM agents increasingly rely on external inference conditions: prompts, tools, memory, SOPs, skills, and harness feedback. These assets can improve task execution without changing model weights, but they are often revised by heuristic reflection or by reusing observed successes and failures as if counts alone were reliable belief. We introduce Bayesian-Agent, a native and cross-harness framework that treats reusable skills and SOPs as hypotheses about whether a frozen model will succeed under a particular prompt, context, and harness environment. Bayesian-Agent records verified trajectory evidence, maintains a feature-conditioned categorical posterior over each skill, and maps posterior state into inspectable actions such as patch, split, compress, retire, and explore. Model-facing prompts receive executable guardrails and failure-mode patches, while posterior summaries remain available for audit. With deepseek-v4-flash, incremental repair improves SOP-Bench from 80% to 95%, Lifelong AgentBench from 90% to 100%, and RealFin-Bench from 45% to 65%. We further evaluate Bayesian-Agent's native backend and optional GenericAgent, mini-swe-agent, and Claude Code backends. The results include positive, negative, saturated, and case-study settings, suggesting that agent skill evolution is best viewed as posterior-guided harness optimization rather than uncalibrated prompt accumulation. The source code is available at https://github.com/DataArcTech/Bayesian-Agent.

  13. Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

    Whisper, a widely adopted ASR model, is known to suffer from hallucinations - coherent transcriptions generated for non-speech audio entirely disconnected from the input. We investigate whether hallucinations can be detected and mitigated through Whisper's internal representations. We extract audio encoder activations and evaluate two representation spaces: raw Whisper activations and Sparse AutoEncoder (SAE) latents. We show that both spaces encode linearly separable hallucination-related information, with discriminative power concentrated in a sparse feature subset and increasing toward deeper encoder layers. We propose two steering strategies: activation-space steering and SAE latent-space steering. SAE-based steering reduces hallucination rate from 72.63% to 14.11% for Whisper small and from 86.88% to 27.33% for Whisper large-v3 on the full non-speech test set, with small WER degradation on speech data, approaching the performance of fine-tuning-based methods.

  14. Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

    Reward models (RMs) provide critical feedback signals for LLM post-training, notably in reinforced fine-tuning (RFT) and reinforcement learning (RL) pipelines. However, current reward evaluation relies on heterogeneous criteria such as rule-based verifiers, ground-truth references, procedural checklists, and complex rubrics, where a unified mechanism to integrate all types of evidence remains unexplored. To this end, we propose Skill Reward Model (Skill-RM), a unified framework that reformulates reward modeling as the execution of a reusable Reward-Evaluation Skill. By treating reward computation as a structured agentic task, Skill-RM provides a consistent interface to orchestrate heterogeneous resources, dynamically selecting and aggregating evidence tailored to the specific requirements of each input. This approach enables the reward model to move beyond static evaluation, ensuring consistency and transparency across diverse tasks. Extensive experiments on reward benchmarks and downstream applications, including best-of-N selection and reinforcement learning, demonstrate that Skill-RM consistently outperforms traditional judge baselines. Our findings suggest that Skill-RM not only provides a unified solution for reward modeling but also achieves superior performance through the strategic and dynamic orchestration of evidence. The code is at https://github.com/Qwen-Applications/Skill-RM.

  15. DEI: Diversity in Evolutionary Inference for Quality-Diversity Search

    We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework that assigns heterogeneous large language models (LLMs) as mutation operators across peer nodes communicating with non-blocking collective operations. Unlike homogeneous parallel search, which replicates a single model's inductive biases across all workers, DEI treats each LLM's distinct creative prior as a complementary source of behavioral novelty. Extending the Digital Red Queen framework with DEI, nodes share local optimal solutions at the end of each round to seed the next round's population. This creates cross-model adversarial pressure that drives robustness beyond intra-model self-play. Evaluated on the Core War domain, a competitive programming benchmark in which Redcode warrior programs battle inside a simulated machine, a four-node heterogeneous ensemble (GPT-5.4-mini, Claude Sonnet 4.6, GPT-5.2, and Claude Haiku 4.5) achieves 124 percent higher merged-archive QD-Score (45.90 vs. 20.46) and 28 percent higher coverage (80.6 percent vs. 63.0 percent of cells) than a single-node baseline at equal total LLM-call budget. The heterogeneous ensemble also outperforms an equally-budgeted homogeneous ensemble on QD-Score, coverage, and held-out solution generality across all four model families. These results provide the first empirical evidence that model diversity, not merely parallelism, is the key driver of gain in distributed LLM-based QD search.

Techmeme(15)

  1. Super Micro Computer aims to raise $7B in a series of equity and equity-linked financing transactions to fund its component purchases to satisfy AI orders (Harshita Mary Varghese/Reuters)

    Harshita Mary Varghese / Reuters : Super Micro Computer aims to raise $7B in a series of equity and equity-linked financing transactions to fund its component purchases to satisfy AI orders —  Super Micro Computer (SMCI.O) on Tuesday said it plans to raise $7 billion in a series of equity and equity-linked financing transactions …

  2. Analysis: Trump and his sons profited $2.3B+ from four crypto ventures including $TRUMP since January 2025, while other investors in those projects lost ~$2.3B (Reuters)

    Reuters : Analysis: Trump and his sons profited $2.3B+ from four crypto ventures including $TRUMP since January 2025, while other investors in those projects lost ~$2.3B —  Risking little of their own money, the US president and his sons have added at least $2.3 billion to the family fortune …

  3. Docs: ~34K Instagram accounts, including Obama's White House account, were affected in the breach tied to Meta's AI chatbot; attackers changed 3,500+ usernames (New York Times)

    New York Times : Docs: ~34K Instagram accounts, including Obama's White House account, were affected in the breach tied to Meta's AI chatbot; attackers changed 3,500+ usernames —  The flaw, which Meta said it had fixed, allowed anyone to take over Instagram accounts using a bug in the company's new artificial intelligence software.

  4. Anthropic says Fable 5 has invisible safeguards that use prompt modification, steering vectors, or PEFT to limit its effectiveness for building frontier LLMs (Matthias Bastian/The Decoder)

    Matthias Bastian / The Decoder : Anthropic says Fable 5 has invisible safeguards that use prompt modification, steering vectors, or PEFT to limit its effectiveness for building frontier LLMs —  Both models share the same base model.  Fable 5 ships with conservative safety guardrails for general use.

  5. Hands-on with Claude Fable 5: impressive results with complex projects, including a data analysis tool built in just 9.5 hours and an interactive isochrone map (Ethan Mollick/One Useful Thing)

    Ethan Mollick / One Useful Thing : Hands-on with Claude Fable 5: impressive results with complex projects, including a data analysis tool built in just 9.5 hours and an interactive isochrone map —  Claude Fable represents another big jump in AI  —  I had early access to the first Mythos-class AI model being released to the public, Claude 5 Fable.

  6. Kalshi plans to require users seeking to make bets in some markets linked to material nonpublic information to submit an online form disclosing where they work (Wall Street Journal)

    Wall Street Journal : Kalshi plans to require users seeking to make bets in some markets linked to material nonpublic information to submit an online form disclosing where they work —  The popular predictions market is putting new guardrails in place for certain bets as concerns about suspicious activity rise

  7. Apple says 79% of all iPhones and 86% of iPhones released in the past four years were running iOS 26 as of June 7, vs. 82% and 88% for iOS 18 before WWDC 2025 (Joe Rossignol/MacRumors)

    Joe Rossignol / MacRumors : Apple says 79% of all iPhones and 86% of iPhones released in the past four years were running iOS 26 as of June 7, vs. 82% and 88% for iOS 18 before WWDC 2025 —  Apple has shared updated iOS 26 and iPadOS 26 adoption figures, revealing how many iPhones and iPads were running those software versions …

  8. Sources: Canadian payments company Nuvei is in advanced talks to acquire NYC-based cross-border payments company Payoneer for ~$2.7B and could sign within days (Milana Vinn/Reuters)

    Milana Vinn / Reuters : Sources: Canadian payments company Nuvei is in advanced talks to acquire NYC-based cross-border payments company Payoneer for ~$2.7B and could sign within days —  Canadian payments firm Nuvei is in advanced talks to acquire cross-border payments company Payoneer Global (PAYO.O) …

  9. A US judge cancels a Mississippi trial and disqualifies and fines lawyers on both sides after finding their filings were filled with hallucinated case citations (Jason Koebler/404 Media)

    Jason Koebler / 404 Media : A US judge cancels a Mississippi trial and disqualifies and fines lawyers on both sides after finding their filings were filled with hallucinated case citations —  When two AIs argue against each other, the legal system loses.  —  The lawyers on both sides of a federal court case …

  10. Anthropic says internal and external red team tests of Fable 5 found no universal jailbreaks; it will keep user traffic for 30 days, aligning with Trump's AI EO (Derek B. Johnson/CyberScoop)

    Derek B. Johnson / CyberScoop : Anthropic says internal and external red team tests of Fable 5 found no universal jailbreaks; it will keep user traffic for 30 days, aligning with Trump's AI EO —  Claude Fable 5 offers Mythos-level performance for most tasks with safeguards on sensitive topics.  Anthropic claims testing found no universal jailbreaks.

  11. Anthropic says Claude Fable 5 uses conservative safety classifiers that trigger a fallback to Claude Opus 4.8 in <5% of sessions, in areas like cybersecurity (Anthropic)

    Anthropic : Anthropic says Claude Fable 5 uses conservative safety classifiers that trigger a fallback to Claude Opus 4.8 in <5% of sessions, in areas like cybersecurity —  Today we're launching Claude Fable 5: a Mythos-class model that we've made safe for general use.

  12. Anthropic says Fable 5 is available on Pro, Max, Team, and seat-based Enterprise plans through June 22, after which using Fable 5 will require usage credits (Rebecca Bellan/TechCrunch)

    Rebecca Bellan / TechCrunch : Anthropic says Fable 5 is available on Pro, Max, Team, and seat-based Enterprise plans through June 22, after which using Fable 5 will require usage credits —  Anthropic is bringing its most powerful AI model to the general public for the first time, but it's doing it with guardrails.

  13. Anthropic prices both Claude Fable 5 and Mythos 5 at $10 per 1M input and $50 per 1M output tokens, less than half the price of Claude Mythos Preview (David Gewirtz/ZDNET)

    David Gewirtz / ZDNET : Anthropic prices both Claude Fable 5 and Mythos 5 at $10 per 1M input and $50 per 1M output tokens, less than half the price of Claude Mythos Preview —  ZDNET's key takeaways  — Anthropic is releasing Claude Fable 5 for general users.  — Fable 5 uses Mythos-class power with safety controls.

  14. Anthropic debuts Claude Fable 5, a "safe" Mythos-class model it says can't be used for cyberattacks, to the public, and Claude Mythos 5 to trusted organizations (Wired)

    Wired : Anthropic debuts Claude Fable 5, a “safe” Mythos-class model it says can't be used for cyberattacks, to the public, and Claude Mythos 5 to trusted organizations —  Anthropic is releasing Claude Mythos 5 to trusted organizations and Claude Fable 5 to the public, a version it says can't be used for cyberattacks.

  15. Apple updates its App Store review guidelines to say it may remove apps in saturated categories "if they are not updated, improved, or do not attract customers" (Sarah Perez/TechCrunch)

    Sarah Perez / TechCrunch : Apple updates its App Store review guidelines to say it may remove apps in saturated categories “if they are not updated, improved, or do not attract customers” —  Apple is warning developers that some of their apps may not be able to call the App Store home forever.

Solidot(15)

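  1. Donut Lab's all-solid-state battery found to be an ordinary lithium-ion battery

    At CES 2026, Finnish startup Donut Lab claimed it had developed a solid-state battery with an energy density of 400 Wh/kg, a cycle life of 100,000 cycles, and a five-minute full charge, which retains more than 99% of its capacity across a temperature range of -30℃ to 100℃. An investigation by more than 20 independent industry experts confirmed that the all-solid-state claim was fraudulent and that the cell is an ordinary lithium-ion battery. First, its voltage curve exactly matches the characteristics of existing liquid-electrolyte, high-nickel ternary lithium-ion batteries. Second, when a battery charges, ions intercalate into the anode material and make the cell swell in a regular pattern; in cells with a graphite anode, the swelling curve shows a distinct inflection point between 50% and 70% state of charge, a signature of ions rearranging within graphite's layered structure, and Donut Lab's battery shows exactly this inflection point. The battery's actual energy density is about 298 Wh/kg, which is normal for current ternary lithium batteries. The investigators found that the core motive for the fraudulent marketing was to profit from capital markets: of the company's more than 1,300 shareholders, over 900 hold no more than 50 shares each, with individual investments estimated at between $3,000 and $23,000.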

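  2. iPhone linked to the decline in US fertility

    The US total fertility rate has fallen 22% since 2007, a decline that is hard to explain by economic conditions, contraception, housing, or childcare costs. Smartphone adoption is thought to be linked to the decline, and 2007 was the year the first iPhone launched. In the US, the iPhone was sold only on AT&T's network from June 2007 to February 2011, which provides a natural experiment for studying the effect of smartphones on fertility. Researchers used differences in AT&T mobile network coverage to identify the iPhone's effect on births. The results show that iPhone adoption reduced the birth rate among women aged 15-19 by 4.5%-8.0% and among women aged 20-24 by 3.2%-6.6%. iPhone adoption accelerated the decline in fertility among women under 30 and dampened the rise in fertility among women over 30. The researchers say iPhone adoption can explain 33%-52% of the overall fertility decline among women aged 15-44. The suggested reasons are that smartphones reduced offline face-to-face interaction, increased pornography use, and lowered the frequency of sex.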

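  3. Falcon 9 first stage B1067 has flown 35 missions

    On Monday the Falcon 9 first stage numbered B1067 completed its 35th launch, landing successfully on the droneship A Shortfall of Gravitas after sending 29 Starlink satellites to orbit. B1067 is SpaceX's most-reused first stage. It has been in service for more than five years and once flew two missions in a single month. SpaceX's goal is for a first stage to be reused 40 times, and B1067 is approaching that target. B1067 alone has flown more times than rival United Launch Alliance (ULA) has launched in total over the past five years (ULA completed 29 launches).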

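  4. UN report warns the oceans are under enormous pressure

    The newly released World Ocean Assessment warns that multiple pressures, including climate change, pollution, and overexploitation, continue to erode ocean health, and that the future of the ocean is closely tied to the future of humanity. The report notes that the ocean profoundly affects everyone's life, even far from the coast. The ocean has absorbed most of the planet's excess heat and greenhouse gases and plays a key role in slowing climate change. It also provides food, oxygen, and medicinal resources for billions of people and supports global trade, tourism, and a large number of jobs. The report stresses that ocean degradation will affect not only coastal regions but also food security, supply-chain stability, and global economic development. The assessment shows that ocean warming and sea-level rise are accelerating. Because of melting ice sheets and the thermal expansion of seawater, the rate of global sea-level rise has increased from at most 1.9 mm per year before 2015 to 4.3 mm in 2023. The Arctic is warming at four times the global average rate. Meanwhile, oxygen-depleted zones in the ocean have expanded to about 4.5 million square kilometers, squeezing the habitat of many marine species. Since the 1970s, about 80% of coral reefs in the Caribbean have disappeared. If global warming exceeds 1.5 degrees Celsius above pre-industrial levels, 90% of the world's coral reefs could be at risk of disappearing. The report shows that about 52 million tonnes of plastic waste enter the ocean each year, forming about 24 trillion microplastic particles and affecting more than 4,000 marine species.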

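  5. Microsoft open source tools implanted with credential-stealing malicious code

    Microsoft took down dozens of open source projects hosted on GitHub after a security firm found that the projects had been compromised and implanted with malicious code that steals passwords and other sensitive credentials. Microsoft said in a statement that it is investigating, that some of the removed projects have been restored after review, and that, as part of the investigation, it has notified a small number of users who downloaded affected projects. The investigation shows that at least 73 projects were affected. This is the second time in the past month that Microsoft's open source repositories have been compromised.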

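  6. Heat may affect 97 World Cup matches

    Climate Central released an analysis saying that matches at the World Cup in the United States, Canada, and Mexico will face hot weather driven by global warming, raising the likelihood that player performance suffers. The tournament will stage 104 matches at 16 venues, and 97 of them could see heat that reduces players' ability to recover, among other effects. Health risks for players rise, and match quality may suffer as well. This World Cup is co-hosted by the United States, Mexico, and Canada and runs from June 11 to July 19 local time. Forecasts of temperatures during the tournament, based on historical data, show a high probability of temperatures above 28℃ in 97 matches. Earlier research indicates that temperatures above 28℃ affect players' running speed, distance covered, and recovery time, and also influence tactics and style of play.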

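  7. Company grants employee a religious exemption from using AI

    As US companies push AI hard, public resistance to AI is also growing. Now a 34-year-old software engineer named Erin Maus has found a workaround: an exemption from using AI on religious grounds. She is a Unitarian Universalist, a liberal, inclusive religion that embraces pluralism and interconnectedness and is devoted to fostering individual spiritual growth. Citing AI's environmental and ethical problems, she said that using AI is incompatible with her religious beliefs. Her employer approved the religious exemption last month. Maus says she still writes code by hand and reviews code herself, just as she did two years ago.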

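  8. Cyberspace Administration sets limits on online product reviews

    The Cyberspace Administration of China and the State Administration for Market Regulation jointly issued the Rules for Online Review and Testing Activities (《网络测评活动规范》). The Cyberspace Administration said the rules were drafted because some online reviews involve exaggerated claims, evaluation without testing, and the blending of commercial promotion with testing, which not only undermines consumer trust and the shopping experience but also disrupts the market. The rules require: (3) Samples selected for an online review must be ordinary goods that consumers can buy on the market and whose source is traceable, and must not be special items prepared for the review. Anyone conducting online reviews who accepts a third-party commission or sponsorship, or who has an interest in a party related to the sample, must disclose this prominently. (4) Where an online review involves testing product functions, performance, or similar items, the testing must be commissioned from an inspection and testing institution that holds the legally required qualification and must follow the relevant standards and technical specifications. The standards and specifications relied on must be stated explicitly, and test samples, test data, images, video, and other records must be retained as required so that the data and results are traceable. (5) Where a product is evaluated without testing, based only on subjective impressions such as perception, observation, or experience, this must be stated, and the content must be prominently labeled "personal experience only" or "subjective impression, for reference only."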

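  9. Social networks taken over by trends

    Social networks are no longer about socializing; they are about following trends. Social activity today happens mainly in messaging apps. Social media is turning into a passive, TV-like platform, but instead of needing a remote to change channels, the platform's algorithm has already tailored the content for you. The platform profits from your information and, in return, provides its content for free. The core business model of social platforms is still advertising, and revenue continues to grow. Global social media ad revenue will reach $317 billion in 2026, up from $277 billion in 2025. Meta's ad revenue will reach $243 billion and is expected to surpass Google's for the first time. Large platforms such as Instagram and TikTok increasingly focus on entertainment and content discovery, while apps such as WhatsApp have become the main venue for social activity, though such messaging apps are harder to monetize.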

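  10. Apple announces Siri AI powered by Google Gemini

    At WWDC 2026, Apple announced a new generation of Apple Intelligence and Siri AI powered by Google Gemini. The computation behind the AI features runs on device or in a private cloud. Apple said that Siri can use its understanding of personal context to search Messages, Mail, Photos, and other content, and can complete tasks across apps through more system-wide app actions. Siri AI can answer questions about what is on the user's screen, and can also draw on broad world knowledge and go online for the latest information to generate useful answers. With a dedicated Siri app, users can revisit past conversations or start new ones, and conversation history syncs privately across the user's devices via iCloud. Because of EU privacy and consumer-protection regulation, the AI features will not launch in the EU for now. Apple said that launch timing for Apple Intelligence depends on regulatory approval, and that Siri AI and other new Apple Intelligence features are not yet available in mainland China.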

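  11. OpenAI files for IPO

    OpenAI has confidentially filed for an IPO. A confidential filing lets a company move ahead with its listing plans without publicly disclosing its financials. OpenAI, SpaceX, and Anthropic are the most closely watched upcoming IPOs, and the three companies' combined market value could reach $4 trillion. OpenAI said in a statement that it has not decided on a listing date, and it did not disclose how many shares it will sell. OpenAI said it will go public when the timing is best. OpenAI's most recent funding round was in March this year, when it raised $122 billion at a valuation of $852 billion. Its valuation has since fallen behind that of its main rival, Anthropic.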

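  12. Obesity affects sperm quality and alters epigenetic marks

    According to a study published in the journal Current Obesity Reports, obesity is not simply the result of personal choices: the heritability of obesity risk is as high as 40%-70%, and the risk can be passed down through generations via complex biological and environmental factors. The latest evidence indicates that obesity affects sperm quality and alters epigenetic marks. These changes may affect children's appetite regulation, metabolism, and long-term disease risk. The good news is that the changes are reversible. Lifestyle changes and weight loss can improve sperm health and alter obesity-related epigenetic patterns.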

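  13. Webb measures the mass of a dormant black hole in the early universe for the first time

    Astronomers used the James Webb Space Telescope together with gravitational lensing to measure, for the first time, the mass of a dormant black hole in the early universe. The black hole sits at the center of the galaxy MRG-M0138. The galaxy no longer forms stars, and the black hole no longer devours surrounding matter and is dormant. MRG-M0138 lies behind a massive galaxy cluster and is magnified about 30 times by gravitational lensing. The black hole is about 10 billion light-years from Earth and has a mass 6 billion times that of the Sun. The astronomers determined the mass by combining gravitational lensing with the effect of the black hole's gravity on stellar motions.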

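  14. Platform algorithms pose risks to democracy

    A growing body of evidence shows that social media platform algorithms pose risks to democracy. Because they are opaque and optimized to maximize user engagement and time on the platform, with no regard for the quality of what they push, algorithms are considered a chief culprit behind political polarization. On X, for example, after Elon Musk announced his support for Trump in 2024, the exposure of Republican-leaning accounts rose markedly. Musk's own posts between July and November 2024 accumulated 17.1 billion views, more than all political campaign ads on the platform combined. During Germany's 2025 federal election, half of the party-related content that major social platforms' algorithms recommended to young users involved far-right parties. One analysis found that X's algorithm disproportionately amplified content from politically extreme parties (especially far-right parties) and systematically suppressed centrist parties. Another study found that, compared with users shown chronologically ordered content, users exposed to X's algorithmic feed for seven weeks shifted their political attitudes in a more conservative direction. The shift did not reverse after the algorithm was disabled. These studies show that the way platform algorithms currently operate is harmful to democracy. One result of algorithms amplifying extreme voices is a distorted perception of how opinions are distributed: people who voice fringe views come to believe they are in the mainstream, a form of network homophily known as the "false consensus effect." Without strong safeguards, we will enter an increasingly polarized and divided authoritarian society.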

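  15. GLP-1 weight-loss drugs associated with lower breast cancer risk

    According to a study published in the journal JCO Oncology Practice, taking GLP-1 weight-loss drugs is associated with a lower risk of breast cancer in women. A retrospective analysis of more than 110,000 women aged 45 to 80 found that women taking GLP-1 drugs had about a 30% lower risk of breast cancer than women who did not. Because this is an observational study, whether GLP-1 drugs actually lower breast cancer incidence remains to be established by further research. GLP-1 drugs mimic the body's natural hormone glucagon-like peptide-1, which helps regulate blood sugar and appetite. GLP-1 drugs were first approved for type 2 diabetes and later for weight loss, and they are now being found to possibly help prevent cancer as well. The researchers note that GLP-1 drugs affect many targets and pathways involved in cancer development and therefore merit further study.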
