About LLMs & Generative AI
Large Language Models (LLMs) are the foundation of the current AI wave — transformer-based neural networks trained on massive text corpora that can generate, summarize, translate, and reason. OrangeBot.AI's LLM topic feed pulls news, releases, and research papers about GPT, Claude, Gemini, Llama, Qwen, DeepSeek, and the broader frontier model landscape. Updated daily from 8 sources, deduplicated and ranked.
LLMs & Generative AI
The latest on large language models, foundation models, and generative AI.
129 unique stories from the last 14 days across 8 sources.
Hacker News(19)
- Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?
- CrankGPT (crankgpt.com)
- Rio de Janeiro's "homegrown" LLM appears to be a merge of an existing model (github.com)
- Amazon CEO's talks with U.S. officials triggered crackdown on Anthropic models (www.wsj.com)
- "Don't You Just Upload It to ChatGPT?" (correresmidestino.com)
- Anthropic apologizes for invisible Claude Fable guardrails (www.theverge.com)
- Anthropic's model naming, extrapolated (samwilkinson.io)
- Claude Desktop spawns 1.8 GB Hyper-V VM on every launch, even for chat-only use (github.com)
- If Claude Fable stops helping you, you'll never know (jonready.com)
- GPT-2: Too Dangerous To Release (2019) (naokishibuya.github.io)
- Claude Fable 5 (www.anthropic.com)
- System Card: Claude Fable 5 and Claude Mythos 5 [pdf] (www-cdn.anthropic.com)
GitHub Trending(4)
Product Hunt(11)
- Glint
Claude Code activity, right where you want it.
- Notchcode
Claude Code + Codex agents in your notch
- EmailFlow.AI
Like Claude Design for Email Newsletters
- Conan
A native Mac cockpit for Claude Code
- Memoriq
Your private AI memory for ChatGPT, Claude, Gemini and Grok
- CrustRecruiter
Turn Claude into a recruiter that thinks like you
- Spotlight by Backplanes
Session reports for Claude Code & Codex to improve your code
- Gemini 3.5 Live Translate
Latest audio model for live speech-to-speech translation
- Claude Artifact Player
Run your Claude AI artifacts natively. No browser. No cloud.
- ChatPilot
Bulk delete, archive & timestamp your ChatGPT conversations
- Boxes.dev
Run Claude Code and Codex in your own cloud environment
Hugging Face(52)
- JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence
Many moments in the real world do not wait for a user to ask. A fire starts on a security monitor, an expression flickers across a video call, or a product a viewer wants flashes by in a livestream. Yet today's large models remain mostly turn-based by design: they answer only when addressed, and even video-call apps that appear interactive still operate as question-answer systems, reacting only when polled or prompted. We argue for a different paradigm: a model that is present in the world like a person. It continuously watches what is happening now, decides on its own whether to speak or stay silent, interacts in real time, and delegates to a background model when the problem is hard. To advance interaction models and their adoption across domains, we make two fully open-sourced contributions. First, we release JoyAI-VL-Interaction, an 8B-scale, vision-first VL-interaction model. The model makes the response decision internally, choosing each second to stay silent, respond, or delegate to a background model, and it excels at vision-triggered responsiveness and time awareness. We pair it with a transferable training recipe, from which capabilities we never trained for emerge, such as guiding a shopper through changing app screens or improvising a lecture from a slide deck. Second, we release a complete, deployable system built around that model. The system streams any ongoing video into the model, making it genuinely present in the world. All other components are pluggable, including ASR/TTS modules, memory, visualization UI, and a background brain that can connect to any API or agent. Across six real-world scenarios, human raters prefer JoyAI-VL-Interaction over the in-app video-call assistants of Doubao and Gemini by a wide margin. To our knowledge, this is the first open, vision-driven interaction model released together with its training recipe, data, and complete deployable system.
- FastContext: Training Efficient Repository Explorer for Coding Agents
Large Language Model (LLM) coding agents have achieved strong results on software engineering tasks, yet repository exploration remains a major bottleneck: locating relevant code consumes substantial token budget and pollutes the agent's context with irrelevant snippets. In most agents, the same model explores the repository and solves the task, leaving exploratory reads and searches in the solver's history. We present FastContext, a dedicated exploration subagent that separates repository exploration from solving. Invoked on demand, FastContext issues parallel tool calls and returns concise file paths and line ranges as focused context. FastContext is powered by specialized exploration models spanning 4B to 30B parameters. We bootstrap them from strong reference-model trajectories and refine them with task-grounded rewards for broad first-turn search, multi-turn evidence gathering, and precise citation generation. Across SWE-bench Multilingual, SWE-bench Pro, and SWE-QA, integrating FastContext into Mini-SWE-Agent improves end-to-end resolution rates by up to 5.5% while reducing coding-agent token consumption by up to 60%, with marginal overhead. These results show that repository exploration can be separated from solving and handled effectively by specialized models. Code and data: https://github.com/microsoft/fastcontext
- VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models
This technical report introduces VibeThinker-3B, a compact dense model with 3B parameters developed to investigate how far verifiable reasoning can be pushed within a strictly small-model regime. Building upon the Spectrum-to-Signal post-training paradigm, we systematically enhance the model through an optimized pipeline that includes curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation. Experimental evaluations demonstrate that VibeThinker-3B achieves frontier-level performance on highly demanding verifiable tasks. Specifically, it attains a score of 94.3 on AIME26 (improving to 97.1 with claim-level test-time scaling), an 80.2 Pass@1 on LiveCodeBench v6, and exhibits strong out-of-distribution generalization with a 96.1% acceptance rate on recent unseen LeetCode contests. This effectively places it in the performance band of first-tier reasoning systems, matching or exceeding flagship models that are orders of magnitude larger, such as DeepSeek V3.2, GLM-5, and Gemini 3 Pro. Furthermore, a score of 93.4 on IFEval confirms that this extreme reasoning enhancement does not compromise strict instruction controllability. Extending our previous 1.5B work, these findings motivate the Parametric Compression-Coverage Hypothesis, which views verifiable reasoning as compressible into compact reasoning cores, while open-domain knowledge and general-purpose competence require broad parameter coverage over facts, concepts, and long-tail scenarios. This perspective suggests that compact models are not merely deployment-efficient substitutes, but a complementary path toward frontier-level performance in parameter-dense capability regimes.
- VisualClaw: A Real-Time, Personalized Agent for the Physical World
Vision language models are serving as general-purpose interfaces for complex multimodal tasks. However, deployment still faces three gaps: VLMs typically incur high latency and cost when processing dense video frames and long prompts, the agent scaffold remains static after deployment, and standard video-QA benchmarks do not test whether agents can use visual evidence inside tool-using workspaces. We present VisualClaw, a self-evolving multimodal agent built around two principles. First, hybrid encoding reduces deployment cost by filtering less informative streaming frames with a cascaded gate and compressing the text skill bank through hot/cold top-k injection. Second, skill evolution lets the agent learn from failures: retrieved memories condition an evolver as direct concatenated context or as guided evidence, producing skill-bank updates that help future questions. Across 4 video-QA benchmarks with 2 VLMs, VisualClaw cuts per-question API cost by an average of 98% versus full-frame upload and by 25.9% versus the offline uniform 8-frame baseline, while boosting accuracy in most settings, e.g., an average +3.85% and a peak +15.80% on EgoSchema with Gemini 3 Flash. To address the benchmark gap, we curate VisualClawArena, a 200-scenario multimodal agentic benchmark built through a strict five-stage pipeline; models must use video evidence, documents, dynamic updates, and executable checks inside a workspace. On VisualClawArena, the same framework with computer-use agent backends improves macro accuracy by +2.9% for Codex (GPT-5.5) and +3.2% for Claude Code (Sonnet 4.6) over no-evolution baselines, with a 9.5% cost reduction compared to the uniform-sampled baseline. These properties make VisualClaw a natural fit for edge applications, where the cascade reduces a 1-hour streaming session from ~3,600 API uploads down to only 5-20 calls and the self-evolution makes it a perfect personalized assistant.
- OneRank: Unified Transformer-Native Ranking Architecture for Multi-Task Recommendation
Multi-task learning (MTL) is essential in recommender systems to enable complementary learning among diverse user feedback. While modern industrial practices have shifted from DNNs to Transformer-centric architectures to strengthen sequence modeling and scaling capacity, they still decouple feature encoding from multi-task prediction, treating the Transformer as a task-agnostic encoder. This design fundamentally limits the performance and scalability by (1) creating an information bottleneck under heterogeneous task objectives, (2) inducing gradient interference that leads to the seesaw phenomenon, and (3) forcing a dataflow transition in which attention-based, context-adaptive representation learning is converted to static feed-forward task prediction with incompatible information read-write dynamics. We propose OneRank, a Transformer-native multi-task ranking framework that eliminates encoder-predictor separation and introduces task-private channels for forward representation learning and backward optimization, enabling task-specialized learning while reducing inter-task interference. In the forward pass, OneRank learns task-specific representations bottom-up through task-conditioned information selection, candidate-aware contextualization, and controlled cross-task interaction. In the backward pass, cross-task gradient detachment isolates task-private parameter updates from shared knowledge extraction modules, preventing negative transfer. We further replace static task-specific MLP scorers with dynamic matching-based scoring for context-aware personalized ranking. By internalizing multi-task reasoning within the Transformer stack, OneRank establishes a unified and scalable architectural paradigm. Offline and online experiments on large-scale industrial datasets show that OneRank significantly outperforms state-of-the-art baselines while maintaining computational efficiency.
- OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data
Cloning camera motion from reference videos is an important task in video generation, as videos provide intuitive and precise control. Existing methods either directly use parametric representations that fail to handle multi-shot generation or synthesize cross-paired data, which suffer from data scarcity, resulting in poor performance in complicated camera motion cloning. To address these issues, we introduce a general camera motion representation that encodes cameras as grid motion videos. This camera grid represents the camera parameters visually and supports the integration of diverse trajectories for multi-shot video generation. Building upon this, we propose OmniDirector, a unified framework trained on a million-scale camera grid-video pairs that coordinates characters, actions, and cameras to provide director-level control for multimodal diffusion transformers. Furthermore, we design a novel hierarchical prompt expansion agent that harmoniously integrates different control signals by systematically describing camera motion and visual content through understanding signal relationships. Extensive experiments demonstrate the superior performance and outstanding controllability of our framework. Project page: https://ymlinfeng.github.io/OmniDirector.github.io/
- Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents
Despite recent progress, LLM agents still struggle with reasoning over long interaction histories. While current memory-augmented agents rely on a static retrieve-then-reason paradigm, this rigid pipeline design prevents them from dynamically adapting memory access to intermediate evidence discovered during inference. To bridge this gap, we propose MRAgent, a framework that combines an associative memory graph with an active reconstruction mechanism. We represent memory as a Cue-Tag-Content graph, where associative tags serve as semantic bridges connecting fine-grained cues to memory contents. Operating on this structure, our active reconstruction mechanism integrates LLM reasoning directly into memory access, allowing the agent to iteratively explore and prune retrieval paths based on accumulated evidence. This ensures that memory retrieval is dynamically adapted to the reasoning context while avoiding combinatorial explosion caused by unconstrained expansion. Experiments on the LoCoMo benchmark and LongMemEval benchmark demonstrate significant improvements over strong baselines (up to 23%), while substantially reducing token and runtime cost, highlighting the effectiveness of active and associative reconstruction for long-horizon memory reasoning.
- From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI
Large Language Models (LLMs) are undergoing a fundamental transformation from conversational generators into integrated AI systems capable of reasoning, action, memory, and self-improvement. We conceptualize this transition as a shift from Chatbot to Digital Colleague: from conversational answers to persistent work. We organize this transition along two tightly coupled dimensions. First, at the cognitive core level, LLMs are advancing from Chatbot-era "fast thinking" systems driven by next-token prediction toward Thinking LLMs that leverage inference-time computation, Chain-of-Thought reasoning, reflection, process supervision, and reinforcement learning to support more deliberate and reliable cognition. Second, at the tool-augmented task execution level, LLMs are progressing from tool-calling Agents that invoke external resources in an ad hoc manner toward OpenClaw-style workstation systems (OpenClaw) equipped with persistent Workspaces, skills, verification loops, and governance. The "Workspace + Skill" paradigm makes episodic tool use colleague-like via state persistence, reusable procedures, task closure, and experience reuse. We examine data construction shifts from instruction-response pairs to State-Action-Observation trajectories and evaluation from static benchmarks to sandboxed, auditable, self-evolving AI ecosystems.
- Orchestra-o1: Omnimodal Agent Orchestration
The recent success of agent swarms has shifted the paradigm of large language model (LLM)-based agents from single-agent workflows to multi-agent systems, highlighting the importance of agent orchestration for task decomposition and collaboration. However, existing orchestration frameworks are limited to a narrow set of modalities and struggle to generalize to more complex settings where heterogeneous modalities coexist and interact. This limitation becomes particularly pronounced in omnimodal scenarios, where tasks require the unified understanding and coordination of diverse inputs such as text, image, audio, and video. In this work, we propose Orchestra-o1, an omnimodal agent orchestration framework designed to support efficient agent collaboration across multiple modalities. Orchestra-o1 introduces a unified orchestration mechanism that enables modality-aware task decomposition, online sub-agent specialization, and parallel sub-task execution. This scalable design allows agent systems to effectively tackle complex real-world tasks involving heterogeneous information sources, surpassing the second-best approach by 10.3% accuracy on the OmniGAIA benchmark. Furthermore, we introduce decision-aligned group relative policy optimization (DA-GRPO), an efficient agentic reinforcement learning approach for training Orchestra-o1-8B, which also achieves state-of-the-art performance against all existing open-source omnimodal agents.
- Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO
We identify a new dimension for enhancing rollout diversity in Group Relative Policy Optimization (GRPO) for LLMs. While GRPO relies on diverse rollouts, prevailing strategies primarily increase diversity by injecting more token-level randomness, which may introduce step-wise noise and lead to incoherent trajectories. We uncover that smaller models within the same model family inherently exhibit higher policy-level diversity, indicated by their superior pass@k relative to larger counterparts as sample counts increase. Unlike token-level noise, this diversity is temporally correlated, preserves logical consistency, and provides structured exploration signals for gradient estimation. We thus propose S2L-PO (Small-to-Large Policy Optimization), a framework that leverages fixed small models as natural explorers to train larger models. To balance exploration and exploitation, we design a progressive annealing strategy that transitions from offline small-model rollouts to the large learner's own sampling. This shift elegantly avoids mid-training performance drops caused by the small model's capacity limits, achieving faster convergence and unlocking a higher performance ceiling. S2L-PO improves accuracy on diverse mathematical reasoning benchmarks (e.g., +8.8% on AIME 24 using a 1.7B explorer to guide the 8B model) while reducing rollout compute.
- EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior with changing environments and updated task conditions. To address this gap, we introduce EvoArena, a benchmark suite that models environment changes as sequences of progressive updates across terminal, software, and social domains. We further propose EvoMem, a patch-based memory paradigm that records memory evolution as structured update histories, enabling agents to reason about environmental evolution through changes in their memory. Experiments show that current agents struggle on EvoArena, achieving an average accuracy of 39.6% across evolving terminal, software, and social-preference domains. EvoMem consistently improves performance, yielding an average gain of 1.5% on EvoArena and also improving standard benchmarks such as GAIA and LoCoMo by 6.1% and 4.8%. Beyond individual tasks, EvoMem further improves chain-level accuracy by 3.7% on EvoArena, where success requires completing a consecutive sequence of related evolutionary subtasks. Mechanistic analysis shows that EvoMem improves evidence capture in the memory, indicating better preservation of complete evolving environment states. Our results highlight the importance of modeling evolution in both evaluation and memory for reliable agent deployment.
- MiniMax Sparse Attention
Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundreds of thousands to millions of tokens, yet the quadratic cost of softmax attention makes this untenable at deployment scale. We introduce MiniMax Sparse Attention (MSA), a blockwise sparse attention built upon Grouped Query Attention (GQA). A lightweight Index Branch scores key-value blocks and independently selects a Top-k subset for each GQA group, enabling group-specific sparse retrieval while maintaining efficient block-level execution; the Main Branch then performs exact block-sparse attention over only the selected blocks. Designed around a principle of simplicity and scalability, MSA is deliberately streamlined, making it straightforward to deploy efficiently across a broad range of GPUs. To translate sparsity into practical speedups, we co-design MSA with a GPU execution path that uses exp-free Top-k selection and KV-outer sparse attention to improve tensor-core utilization under block-granular access. On a 109B-parameter model with native multimodal training, MSA performs on par with GQA while reducing per-token attention compute by 28.4x at 1M context. Paired with our co-designed kernel, MSA achieves 14.2x prefill and 7.6x decoding wall-clock speedups on H800. Our inference kernel is available at: https://github.com/MiniMax-AI/MSA. A production-grade natively multimodal model powered by MSA has been publicly released at: https://huggingface.co/MiniMaxAI/MiniMax-M3.
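The block selection described in the MiniMax Sparse Attention abstract above can be illustrated with a small pure-Python sketch of one decode step for a single GQA group. This is not the released kernel. The function names are hypothetical, and the block score used here (mean query dotted with mean block key) is an assumption standing in for the learned Index Branch, whose details the abstract does not give. The sketch only shows the overall shape: score KV blocks once per group, keep the Top-k, then run exact softmax attention over the tokens in those blocks.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(group_queries, keys, block_size, top_k):
    """Score every KV block for one GQA group and return the top_k block ids.

    Assumption: score = (mean query of the group) . (mean key of the block).
    The real Index Branch is a learned component; this is only a stand-in.
    """
    dim = len(keys[0])
    mean_q = [sum(q[d] for q in group_queries) / len(group_queries)
              for d in range(dim)]
    n_blocks = math.ceil(len(keys) / block_size)
    scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        mean_k = [sum(k[d] for k in block) / len(block) for d in range(dim)]
        scores.append(dot(mean_q, mean_k))
    ranked = sorted(range(n_blocks), key=lambda b: scores[b], reverse=True)
    return sorted(ranked[:top_k])


def block_sparse_attention(query, keys, values, block_ids, block_size):
    """Exact softmax attention restricted to tokens in the selected blocks."""
    idx = [i for b in block_ids
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in idx])
    dim_v = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, idx))
            for d in range(dim_v)]


def gqa_sparse_decode(group_queries, keys, values, block_size=4, top_k=2):
    """Select blocks once per group, then reuse them for every query head."""
    block_ids = select_blocks(group_queries, keys, block_size, top_k)
    outputs = [block_sparse_attention(q, keys, values, block_ids, block_size)
               for q in group_queries]
    return block_ids, outputs


# Toy check: only block 2 (tokens 4 and 5) holds keys aligned with the query.
keys = [[0.0, 1.0]] * 4 + [[5.0, 0.0]] * 2 + [[0.0, 1.0]] * 2
values = [[float(i)] for i in range(8)]
block_ids, outputs = gqa_sparse_decode(
    [[1.0, 0.0]], keys, values, block_size=2, top_k=1)
print(block_ids, outputs)  # [2] [[4.5]]
```

With top_k equal to the number of blocks, the sketch reduces to ordinary dense attention, which is a convenient sanity check. The compute saving comes from the attention step touching only top_k * block_size tokens rather than the full context.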
Techmeme(29)
- Sensor Tower: ChatGPT's market share fell to 46.4% by the end of May, as Gemini rose to 27.7% and Claude to 10.3%; Grok, Meta AI, and others have less than 5% (Ivan Mehta/TechCrunch)
Ivan Mehta / TechCrunch: More than three and a half years after ChatGPT's initial release, AI assistants are now used by millions of people worldwide, and the competitive landscape is changing fast.
- A US judge dismisses xAI's lawsuit alleging OpenAI stole trade secrets, saying xAI failed to show OpenAI induced a former xAI engineer to divulge trade secrets (Jonathan Stempel/Reuters)
Jonathan Stempel / Reuters: A federal judge on Monday dismissed a lawsuit by Elon Musk's artificial intelligence company xAI that accused rival Sam Altman's OpenAI …
- Anthropic's belief in its own commitment to safety gives Anthropic the license to aggressively favor its business and even challenge the US government (Ben Thompson/Stratechery)
Ben Thompson / Stratechery: I'm sympathetic to the cynics who consistently characterize Anthropic's public statements, particularly those surrounding their model releases …
- Source: Anthropic was given 90 minutes to comply and was not provided with detailed concerns before the export control order was issued (Financial Times)
Financial Times: Export controls on Fable and Mythos raise doubts over how US will police the most powerful AI systems. The Trump administration's decision …
- Sources: senior Anthropic technical staff are in DC to meet WH officials and try to fix the Mythos 5 dispute; both sides say they are eager to resolve the issue (Maria Curi/Axios)
Maria Curi / Axios: Senior technical Anthropic staff are in Washington to meet with White House officials to try to fix a dispute that has taken …
- Canadian PM says the Anthropic ban shows the dangers of "over-reliance on certain models", and compares the risks to those that led to the 2008 financial crisis (Bloomberg)
Bloomberg: Prime Minister Mark Carney said the US export ban blocking all foreign access to Anthropic PBC's latest artificial …
- EU says it is looking at the practical consequences of US restricting Anthropic's models, notes such measures "should not be discriminatory against partners" (Reuters)
Reuters: The European Commission said on Sunday that it is assessing the practical implications of a U.S. export control directive …
- Siri AI is good enough to ease Apple's AI crisis; sources: the ability to tap third party AI models beyond OpenAI's is already active in internal iOS 27 builds (Mark Gurman/Bloomberg)
Mark Gurman / Bloomberg: The company prepares for the foldable iPhone and touch-screen MacBook. Apple's new Siri AI, despite mainly delivering …
- Source: the White House is unlikely to extend export restrictions to other AI companies (Leo Schwartz/The Information)
Leo Schwartz / The Information: The White House is unlikely to extend export restrictions on Anthropic's advanced models to other AI companies, an official close to the U.S. government said Saturday.
- European political figures say Anthropic disabling access to Fable 5 and Mythos 5 is a "wake-up call" about the risks of depending on the US for AI tech (Nathan Rennolds/Associated Press)
Nathan Rennolds / Associated Press: Anthropic said it believed the US government had become aware of a potential means of jailbreaking Fable 5.
- David Sacks says Dario Amodei refused to "fix the jailbreak or de-deploy the model" after "a highly credible trusted partner" reported a Fable jailbreak (David Sacks/@davidsacks)
David Sacks / @davidsacks: I've had a number of conversations with folks inside and outside government about the current situation with Anthropic, and here is what I believe to be true: As we know, Anthropic publicly released its Mythos class models earlier this week under the commercial name Fable.
- US barring foreign nationals, including Anthropic staffers in the US, from using Fable 5 and Mythos 5 marks a new phase in the US trying to control Anthropic (New York Times)
New York Times: The company said on Friday night that the federal government had ordered limits on its Mythos and Fable 5 A.I. systems, citing national security concerns.
Solidot(14)
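- Can open-source models beat OpenAI?
AI companies in China and the US have adopted different release strategies: Chinese companies favor open-weight models, while US companies such as OpenAI and Anthropic keep their models closed. Tiezhen Wang, a former Asia-Pacific ecosystem executive at Hugging Face, noted that OpenAI and Anthropic accuse Chinese AI companies of distilling their models. He considers distillation neutral: US AI companies trained their models by scraping information from the internet, so they are not the creators of that knowledge, yet they try to stop others from reusing it, which he finds somewhat ironic. In his view, all AI-generated content should be free of copyright; otherwise whoever owns the compute could abuse that power by generating every possible combination of content and claiming copyright on all of it. He also sees a clear difference in how Chinese and US companies approach token usage: because China has many open-weight models, usage costs are lower than in the US, so Chinese internet companies encourage employees to use as many tokens as possible and to become AI-native developers, and some even prohibit them from doing routine work such as writing documentation by hand.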
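- Google sues Chinese group over alleged AI-enabled scams
Google has sued Outsider Enterprise, a Chinese group that offers "scam as a service." The group operates on Telegram and supplies would-be scammers with a full set of templates, such as tutorials on using Google Gemini to create websites that imitate Google, YouTube, and government agencies such as New York's E-ZPass. More than 2.5 million scam text messages received by Android users are linked to Outsider Enterprise, about 55,000 of them sent within a two-week period last month. Google traced 9,000 fake websites and 1 million URLs to the scam network. The identity of those behind Outsider Enterprise is unknown, and Google's lawsuit aims to disrupt the group's operations.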
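- Anthropic takes Fable 5 and Mythos 5 models offline following US government order
Anthropic said in a statement on Friday that it had received an order from the US government barring foreign nationals from accessing its most advanced AI models on national security grounds. The directive applies to all foreign nationals, whether inside or outside the United States, including Anthropic's own foreign employees. To ensure compliance, the company could only suspend access to Fable 5 and Mythos 5 for all users. Access to Anthropic's other models is unaffected. Amazon's cloud unit AWS said late Friday that Anthropic had asked it to block access to the models for "all users in all regions." Several core members of Anthropic, including co-founder Chris Olah, researcher Andrej Karpathy, and philosopher Amanda Askell, were born outside the United States.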
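- Kioxia overtakes Toyota as Japan's most valuable listed company
Thanks to the AI boom, the total market capitalization of Japan's Kioxia Holdings surpassed Toyota's on June 12, putting it at the top of Japan's listed companies for the first time. Kioxia's market capitalization reached 44 trillion yen, above Toyota's roughly 43 trillion yen. The share price rise is supported by growing profitability: NAND flash sales have grown sharply on the back of US tech giants' investment in AI data centers. SoftBank Group (SBG) shares have likewise climbed on AI investment expectations, and its market capitalization briefly overtook Toyota's to take the top spot on June 1. As an investment company, SoftBank Group draws its gains mainly from two sources: the rising valuation of its large investment in US-based OpenAI, and the increasing value of its UK chip design subsidiary Arm Holdings.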
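- OpenAI says China-linked accounts tried to stoke anti-data-center sentiment in the US
OpenAI published a report on Wednesday saying it had found accounts originating in China that used AI to generate English-language social media posts claiming that data centers were driving up electricity bills for US residents. OpenAI said the accounts may be linked to an unnamed private Chinese technology company. It said the posts had limited reach but should draw attention to attempts by foreign actors to undermine strategic US industries. The company added that there is "legitimate debate" in the US about AI and data centers, but that these accounts tried to manipulate it by posing as ordinary Americans and posting contentious AI-generated content.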
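- Visa integrates ChatGPT into its payment network
Visa is integrating ChatGPT into its payment network, allowing AI agents to shop and complete purchases on behalf of users. This means AI agents can not only recommend products but also complete purchases for users at any merchant that accepts Visa. OpenAI will provide the technology that lets agents interact, make decisions, and initiate purchases through ChatGPT. Visa and OpenAI did not disclose the financial terms of the partnership or details of any fees for merchants or customers. Visa said that to protect consumers and minimize fraud, the feature will include safeguards such as spending limits, steps that require approval, and a restriction to authorized merchants.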
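- Apple announces Siri AI powered by Google Gemini
At WWDC 2026, Apple announced a new generation of Apple Intelligence and Siri AI powered by Google Gemini. The computation behind the AI features runs on device or in a private cloud. Apple said, "Siri can use its understanding of personal context to search messages, mail, photos, and more, and can complete tasks across apps through more system-wide app actions. Siri AI can answer questions about what is on the user's screen, and can also draw on broad world knowledge and go online for the latest information to generate useful answers. Through a dedicated Siri app, users can revisit past conversations or start new ones, with conversation history synced privately across their devices via iCloud." Because of EU privacy and consumer protection regulations, the AI features will not launch in the EU for now. Apple said, "The rollout timing of Apple Intelligence depends on regulatory approval; Siri AI and other new Apple Intelligence features are not yet available in mainland China."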
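- AI threatens the natural resources of billions of people
The United Nations University Institute for Water, Environment and Health has published a report whose title translates as "The Environmental Cost of AI's Energy Use: Carbon, Water, and Land Footprints." The report projects that by 2030 the data centers supporting AI worldwide will consume 945 TWh of electricity per year, with associated water use equal to the basic annual domestic water needs of 1.3 billion people and a land footprint exceeding 14,500 square kilometers. The study found that every kilowatt-hour of electricity powering AI carries three environmental footprints: a carbon footprint from energy production, a water footprint from power generation and cooling, and a land footprint from energy infrastructure construction and resource extraction. According to the report, training GPT-5 is estimated to require about 100 GWh of electricity, equivalent to the annual residential electricity use of about 770,000 people in sub-Saharan Africa, with associated water use of about 1 billion liters and land use of about 1.5 square kilometers. Training is only one part of the AI life cycle. Once models are deployed, the ongoing consumer of resources is inference, the process of continually responding to user requests and generating content. The report estimates that inference accounts for 80% to 90% of AI's total energy use. In 2025, data centers worldwide consumed 448 TWh of electricity. If they were a country, they would be the world's 11th-largest electricity consumer, behind France and ahead of Saudi Arabia.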
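- US government considers taking stakes in AI companies
The US government is considering holding shares in AI companies. OpenAI CEO Sam Altman has been in ongoing talks with the White House about the government possibly taking a stake in the company. The discussions have gone on for more than a year, and this week Altman met with several lawmakers and officials in Washington to discuss regulation and the latest developments in AI. As part of a potential agreement, OpenAI might donate equity to the US government to establish some form of public wealth fund. The fund could "invest in diversified long-term assets" so that citizens can share in the "benefits" of AI development. During Trump's second term, the government has already taken stakes in Intel, IBM, and quantum and critical minerals companies.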
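- Failure rates rise in UC Berkeley CS courses
Data show that in spring 2026 the failure rate for UC Berkeley's CS 10 (The Beauty and Joy of Computing) reached 35.3%, and the failure rate for CS 61A (Structure and Interpretation of Computer Programs) reached 10.6%. In spring 2025 and spring 2024, neither course's failure rate exceeded 10%. Professor Dan Garcia, who teaches both courses, attributes the rise to students' use of large language models: students were caught using models such as Claude, ChatGPT, and Google Gemini to cheat on exams, or relied on them so heavily for assignments that they only half understood the material and were unprepared for exams. Other causes include weak math foundations and insufficient teaching staff.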
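- Google to pay SpaceX $920 million a month to rent compute
SpaceX/xAI's chatbot Grok apparently has too few users, leaving many of the Nvidia GPUs that Elon Musk bought at great expense sitting idle. To keep its data centers from running empty, SpaceX recently struck similar compute rental deals with two AI giants, Anthropic and Google: Anthropic agreed to pay SpaceX $1.25 billion a month until 2029 to rent compute in the Colossus 1 data center, and Google will pay SpaceX $920 million a month to rent 110,000 Nvidia GPUs and related computing infrastructure. SpaceX did not disclose whether Google is renting from the Colossus 1 or the Colossus 2 data center. Like the Anthropic agreement, the Google deal includes a termination clause: either SpaceX or Google can end the deal after December 31, 2026 with 90 days' notice.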
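- rsync project faces controversy over AI-assisted programming
A recent release of rsync, the widely used backup tool, caused incremental backups to fail for some users. While inspecting the code, users found that rsync maintainer Andrew Tridgell had recently been making heavy use of AI-assisted programming: dozens of commits in the project are authored by tridge and claude, where tridge is Andrew Tridgell and claude is Anthropic's AI assistant Claude. This immediately sparked a controversy over AI-generated code. Tridgell responded on his personal blog, acknowledging his heavy recent use of AI for programming and rejecting the criticism, saying critics were jumping to conclusions without understanding how the AI tools were actually used. He said that he designed the framework and manually reviews the AI-generated code, that he only hands tedious work to the AI, and that he is a software engineer with 40 years of experience. Tridgell said he will continue to use AI tools.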