LLMs & Generative AI
The latest on large language models, foundation models, and generative AI.
118 unique stories from the last 14 days across 8 sources.
Hacker News(11)
- How Fast Does Claude, Acting as a User Space IP Stack, Respond to Pings? (dunkels.com)
- LLMs corrupt your documents when you delegate (arxiv.org)
- DeepSeek 4 Flash local inference engine for Metal (github.com)
- AlphaEvolve: Gemini-powered coding agent scaling impact across fields (deepmind.google)
- Higher usage limits for Claude and a compute deal with SpaceX (www.anthropic.com)
- OpenAI's o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors (www.theguardian.com)
- Uber torches 2026 AI budget on Claude Code in four months (www.briefs.co)
- Claude Code refuses requests or charges extra if your commits mention "OpenClaw" (twitter.com)
- Claude.ai unavailable and elevated errors on the API (status.claude.com)
- Anthropic Joins the Blender Development Fund as Corporate Patron (www.blender.org)
- OpenAI CEO's Identity Verification Company Announced Fake Bruno Mars Partnership (www.vice.com)
GitHub Trending(5)
Product Hunt(10)
- Grok Connectors
Bring your daily apps into Grok
- AgentPeek
Claude Code and Codex in your Mac notch
- DevPass by LLM Gateway
One key to access every coding model in 3 flat prices
- WOZCODE
Cut Claude Code costs by up to 50%
- Open Finance MCP
Access your bank data in ChatGPT & Claude via Open Finance
- Claude Code & Codex Usage Trading Cards by Rudel
Get your trading card based on your CC & Codex usage
- Zush
Updated: docs support, BYOK, Local AI (Ollama), Windows App
- HiveTerm
One workspace for Claude, Codex, Gemini and your stack
- Gemini Deep Research Agent
Web and MCP research agents, now in Gemini API
- KarmaBox
Run your own Claude Code in your pocket.
Hugging Face(48)
- Mean Mode Screaming: Mean-Variance Split Residuals for 1000-Layer Diffusion Transformers
Scaling Diffusion Transformers (DiTs) to hundreds of layers introduces a structural vulnerability: networks can enter a silent, mean-dominated collapse state that homogenizes token representations and suppresses centered variation. Through mechanistic auditing, we isolate the trigger event of this collapse as Mean Mode Screaming (MMS). MMS can occur even when training appears stable, with a mean-coherent backward shock on residual writers that opens deep residual branches and drives the network into a mean-dominated state. We show this behavior is driven by an exact decomposition of these gradients into mean-coherent and centered components, compounded by the structural suppression of attention-logit gradients through the null space of the Softmax Jacobian once values homogenize. To address this, we propose Mean-Variance Split (MV-Split) Residuals, which combine a separately gained centered residual update with a leaky trunk-mean replacement. On a 400-layer single-stream DiT, MV-Split prevents the divergent collapse that crashes the un-stabilized baseline; it tracks close to the baseline's pre-crash trajectory while remaining substantially better than token-isotropic gating methods such as LayerScale across the full schedule. Finally, we present a 1000-layer DiT as a scale-validation run at boundary scales, establishing that the architecture remains stably trainable at extreme depth.
- MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
With the rise of online dance-video platforms and rapid advances in AI-generated content (AIGC), music-driven dance generation has emerged as a compelling research direction. Despite substantial progress in related domains such as music-driven 3D dance generation, pose-driven image animation, and audio-driven talking-head synthesis, existing methods cannot be directly adapted to this task. Moreover, the limited studies in this area still struggle to jointly achieve high-quality visual appearance and realistic human motion. Accordingly, we present MACE-Dance, a music-driven dance video generation framework with cascaded Mixture-of-Experts (MoE). The Motion Expert performs music-to-3D motion generation while enforcing kinematic plausibility and artistic expressiveness, whereas the Appearance Expert carries out motion- and reference-conditioned video synthesis, preserving visual identity with spatiotemporal coherence. Specifically, the Motion Expert adopts a diffusion model with a BiMamba-Transformer hybrid architecture and a Guidance-Free Training (GFT) strategy, achieving state-of-the-art (SOTA) performance in 3D dance generation. The Appearance Expert employs a decoupled kinematic-aesthetic fine-tuning strategy, achieving state-of-the-art (SOTA) performance in pose-driven image animation. To better benchmark this task, we curate a large-scale and diverse dataset and design a motion-appearance evaluation protocol. Based on this protocol, MACE-Dance also achieves state-of-the-art performance. Code is available at https://github.com/AMAP-ML/MACE-Dance.
- Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
Reinforcement learning with verifiable rewards (RLVR) has become a standard approach for large language models (LLMs) post-training to incentivize reasoning capacity. Among existing recipes, group-based policy gradient is prevalent, which samples a group of responses per prompt and updates the policy via group-relative advantage signals. This work reveals that these optimization strategies share a common geometric structure: each implicitly defines a target distribution on the response simplex and projects toward it via first-order approximation. Building on this insight, we propose Listwise Policy Optimization (LPO) to explicitly conduct the target-projection, which demystifies the implicit target by restricting the proximal RL objective to the response simplex, and then projects the policy via exact divergence minimization. This framework provides (i) monotonic improvement on the listwise objective with bounded, zero-sum, and self-correcting projection gradients, and (ii) flexibility in divergence selection with distinct structural properties through the decoupled projection step. On diverse reasoning tasks and LLM backbones, LPO consistently improves training performance over typical policy gradient baselines under matched targets, while intrinsically preserving optimization stability and response diversity.
- LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
Test-time scaling (TTS) has become an effective approach for improving large language model performance by allocating additional computation during inference. However, existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition, leaving much of the computation-allocation space unexplored. We propose an environment-driven framework, AutoTTS, that changes what researchers design: from individual TTS heuristics to environments where TTS strategies can be discovered automatically. The key to AutoTTS lies in environment construction: the discovery environment must make the control space tractable and provide cheap, frequent feedback for TTS search. As a concrete instantiation, we formulate width-depth TTS as controller synthesis over pre-collected reasoning trajectories and probe signals, where controllers decide when to branch, continue, probe, prune, or stop and can be evaluated cheaply without repeated LLM calls. We further introduce beta parameterization to make the search tractable and fine-grained execution trace feedback to improve discovery efficiency by helping the agent diagnose why a TTS program fails. Experiments on mathematical reasoning benchmarks show that the discovered strategies improve the overall accuracy-cost tradeoff over strong manually designed baselines. The discovered strategies generalize to held-out benchmarks and model scales, while the entire discovery costs only $39.9 and 160 minutes. Our data and code will be open-sourced at https://github.com/zhengkid/AutoTTS.
- Anisotropic Modality Align
Training multimodal large language models has long been limited by the scarcity of high-quality paired multimodal data. Recent studies show that the shared representation space of pretrained multimodal contrastive models can serve as a bridge, enabling models to perform multimodal training with unimodal data. However, the key premise of this paradigm remains insufficiently understood: can representations from different modalities be reliably interchanged? The core obstacle lies in the persistent Modality Gap in the shared space. In this work, we revisit the geometric nature of the modality gap. We find that modality representations already share compatible dominant semantic geometry. What truly hinders modality interchangeability is not a simple global shift, but an anisotropic residual structure concentrated along a small number of dominant directions. Based on this finding, we further propose the principle of anisotropic modality gap alignment: effective modality alignment should align with the target-modality distribution while preserving the semantic structure of the source modality. Guided by this principle, we propose an anisotropic geometric correction framework, AnisoAlign, for unpaired modality alignment. This framework leverages the internal geometric prior of the target modality and performs bounded correction on source-modality representations, thereby constructing substitute representations in the target modality. Experiments confirm its benefits in both geometric diagnostics and text-only MLLM training. Overall, this work recasts the modality gap from an empirical observation into a correctable, structured geometric phenomenon and provides a new representation alignment perspective for training multimodal models with unimodal data.
- TextLDM: Language Modeling with Continuous Latent Diffusion
Diffusion Transformers (DiT) trained with flow matching in a VAE latent space have unified visual generation across images and videos. A natural next step toward a single architecture for both generation (visual synthesis) and understanding (text generation) is to apply this framework to language modeling. We propose TextLDM, which transfers the visual latent diffusion recipe to text generation with minimal architectural modification. A Transformer-based VAE maps discrete tokens to continuous latents, enhanced by Representation Alignment (REPA) with a frozen pretrained language model to produce representations effective for conditional denoising. A standard DiT then performs flow matching in this latent space, identical in architecture to its visual counterpart. The central challenge we address is obtaining high-quality continuous text representations: we find that reconstruction fidelity alone is insufficient, and that aligning latent features with a pretrained language model via REPA is critical for downstream generation quality. Trained from scratch on OpenWebText2, TextLDM substantially outperforms prior diffusion language models and matches GPT-2 under the same settings. Our results establish that the visual DiT recipe transfers effectively to language, taking a concrete step toward unified diffusion architectures for multimodal generation and understanding.
- MiA-Signature: Approximating Global Activation for Long-Context Understanding
A growing body of work in cognitive science suggests that reportable conscious access is associated with global ignition over distributed memory systems, while such activation is only partially accessible as individuals cannot directly access or enumerate all activated contents. This tension suggests a plausible mechanism that cognition may rely on a compact representation that approximates the global influence of activation on downstream processing. Inspired by this idea, we introduce the concept of Mindscape Activation Signature (MiA-Signature), a compressed representation of the global activation pattern induced by a query. In LLM systems, this is instantiated via submodular-based selection of high-level concepts that cover the activated context space, optionally refined through lightweight iterative updates using working memory. The resulting MiA-Signature serves as a conditioning signal that approximates the effect of the full activation state while remaining computationally tractable. Integrating MiA-Signatures into both RAG and agentic systems yields consistent performance gains across multiple long-context understanding tasks.
- RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation
We present our winning system for Task B (generation with reference passages) in SemEval-2026 Task 8: MTRAGEval. Our method is a heterogeneous ensemble of seven LLMs with two prompting variants, where a GPT-4o-mini judge selects the best candidate per instance. We ranked 1st out of 26 teams, achieving a conditioned harmonic mean of 0.7827 and outperforming the strongest baseline (gpt-oss-120b, 0.6390). Ablations show that diversity in model families, scales, and prompting strategies is essential, with the ensemble consistently beating any single model. We also introduce Meno-Lite-0.1, a 7B domain-adapted model with a strong cost-performance trade-off, and analyse MTRAGEval, highlighting annotation limitations and directions for improvement. Our code is publicly available: https://github.com/RaguTeam/ragu_mtrag_semeval
- SkillOS: Learning Skill Curation for Self-Evolving Agents
LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key bottleneck. Existing approaches either rely on manual skill curation, prescribe heuristic skill operations, or train for short-horizon skill operations. However, they still struggle to learn complex long-term curation policies from indirect and delayed feedback. To tackle this challenge, we propose SkillOS, an experience-driven RL training recipe for learning skill curation in self-evolving agents. SkillOS pairs a frozen agent executor that retrieves and applies skills with a trainable skill curator that updates an external SkillRepo from accumulated experience. To provide learning signals for curation, we design composite rewards and train on grouped task streams based on skill-relevant task dependencies, where earlier trajectories update the SkillRepo, and later related tasks evaluate these updates. Across multi-turn agentic tasks and single-turn reasoning tasks, SkillOS consistently outperforms memory-free and strong memory-based baselines in both effectiveness and efficiency, with the learned skill curator generalizing across different executor backbones and task domains. Further analyses show that the learned curator produces more targeted skill use, while the skills in SkillRepo evolve into more richly structured Markdown files that encode higher-level meta-skills over time.
- Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration
Reinforcement learning with verifiable rewards, particularly Group Relative Policy Optimization (GRPO), has significantly advanced the reasoning capabilities of Large Language Models (LLMs). However, in complex tasks, GRPO frequently suffers from the "zero-advantage problem": when all sampled rollouts for a query fail, the relative advantage collapses to zero. Consequently, the model loses effective training signals for these questions, wasting the training data and computational budget. While simply increasing the sampling budget for these questions is a common remedy, the static sampling policy inherently constrains reasoning exploration, limiting the success rate. In this paper, we propose Lorem Perturbation for Exploration (LoPE), a simple yet effective training framework to break this exploration bottleneck. We posit that task-irrelevant prompt-space perturbations can shift the model's output distribution enough to unlock orthogonal reasoning pathways for hard questions. Specifically, LoPE prepends sequences stochastically assembled from Lorem Ipsum vocabulary (a pseudo-Latin placeholder text) to the prompts before resampling. Experiments across 1.7B, 4B, and 7B models demonstrate that LoPE significantly outperforms resampling with the original prompts. Further analysis reveals that other Latin-based random sequences with low perplexity are also effective perturbations. Our results establish LoPE as a strong baseline for broadening exploration in LLM reinforcement learning.
- RLDX-1 Technical Report
While Vision-Language-Action models (VLAs) have shown remarkable progress toward human-like generalist robotic policies through the versatile intelligence (i.e. broad scene understanding and language-conditioned generalization) inherited from pre-trained Vision-Language Models, they still struggle with complex real-world tasks requiring broader functional capabilities (e.g. motion awareness, memory-aware decision making, and physical sensing). To address this, we introduce RLDX-1, a general-purpose robotic policy for dexterous manipulation built on the Multi-Stream Action Transformer (MSAT), an architecture that unifies these capabilities by integrating heterogeneous modalities through modality-specific streams with cross-modal joint self-attention. RLDX-1 further combines this architecture with system-level design choices, including synthesizing training data for rare manipulation scenarios, learning procedures specialized for human-like manipulation, and inference optimizations for real-time deployment. Through empirical evaluation, we show that RLDX-1 consistently outperforms recent frontier VLAs (e.g. π0.5 and GR00T N1.6) across both simulation benchmarks and real-world tasks that require broad functional capabilities beyond general versatility. In particular, RLDX-1 shows superiority in ALLEX humanoid tasks by achieving success rates of 86.8% while π0.5 and GR00T N1.6 achieve around 40%, highlighting the ability of RLDX-1 to control a high-DoF humanoid robot under diverse functional demands. Together, these results position RLDX-1 as a promising step toward reliable VLAs for complex, contact-rich, and dynamic real-world dexterous manipulation.
- HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predominantly focus on future scene generation, often overlooking comprehensive 3D scene understanding. Conversely, while Large Language Models (LLMs) demonstrate impressive reasoning capabilities, they lack the capacity to predict future geometric evolution, creating a significant disparity between semantic interpretation and physical simulation. To bridge this gap, we propose HERMES++, a unified driving world model that integrates 3D scene understanding and future geometry prediction within a single framework. Our approach addresses the distinct requirements of these tasks through synergistic designs. First, a BEV representation consolidates multi-view spatial information into a structure compatible with LLMs. Second, we introduce LLM-enhanced world queries to facilitate knowledge transfer from the understanding branch. Third, a Current-to-Future Link is designed to bridge the temporal gap, conditioning geometric evolution on semantic context. Finally, to enforce structural integrity, we employ a Joint Geometric Optimization strategy that integrates explicit geometric constraints with implicit latent regularization to align internal representations with geometry-aware priors. Extensive evaluations on multiple benchmarks validate the effectiveness of our method. HERMES++ achieves strong performance, outperforming specialist approaches in both future point cloud prediction and 3D scene understanding tasks. The model and code will be publicly released at https://github.com/H-EmbodVis/HERMESV2.
Techmeme(31)
- Musk v. Altman: Satya Nadella says Elon Musk never contacted him with concerns that Microsoft's investments in OpenAI violated any special terms or commitments (CNBC)
CNBC: Microsoft CEO Satya Nadella took the stand in the Musk v. Altman trial on Monday, where he testified that Elon Musk never contacted …
- Musk v. Altman: Ilya Sutskever testifies that his OpenAI stake is worth ~$7B and he had concerns about Altman for a year before Altman's brief ouster as CEO (Rachel Metz/Bloomberg)
Rachel Metz / Bloomberg: OpenAI co-founder and former chief scientist Ilya Sutskever said his stake in the ChatGPT maker is worth roughly $7 billion …
- An Anthropic engineer argues HTML is a better output format for AI agents than Markdown, citing information density, ease of sharing, and two-way interaction (@trq212)
@trq212: Using Claude Code: The Unreasonable Effectiveness of HTML
- curl founder Daniel Stenberg says Mythos identified five vulnerabilities in curl, but a manual review found three were false positives and one was "just a bug" (Daniel Stenberg/daniel.haxx.se)
Daniel Stenberg / daniel.haxx.se: yes, as in singular one. Back in April 2026 Anthropic caused a lot of media noise when they concluded …
- Experian says 40% of the 5,000 data breaches it serviced in 2025 were AI-powered, and predicts agentic AI will be the leading cause of data breaches in 2026 (Jennah Haque/Bloomberg)
Jennah Haque / Bloomberg: A few months ago, I received a congratulations letter on my upcoming enrollment at the Ultimate Medical Academy in Tampa, Florida.
- Anthropic, OpenAI, and other AI firms met with Hindu, Sikh, and Greek Orthodox leaders to draft principles on how to infuse models with ethics and morality (Krysta Fauria/Associated Press)
Krysta Fauria / Associated Press: As concerns mount over artificial intelligence and its rapid integration into society, tech companies are increasingly turning …
- OpenAI, Anthropic, and Google's enterprise push with PE firms poses a new competitive threat to India's IT industry, as services become increasingly automatable (Moneycontrol)
Moneycontrol: On Wall Street, the announcements sounded like the next phase of the artificial intelligence (AI) boom: frontier model companies …
- A profile of Anthropic CFO Krishna Rao, who tends to take a conservative approach to revenue projections and has chosen to raise less money than is available (Kate Clark/Wall Street Journal)
Kate Clark / Wall Street Journal: Krishna Rao is navigating unprecedented growth, compute constraints and the idiosyncratic Amodeis
- Anthropic details how it improved Claude's safety training after finding agentic misalignment in older models, such as Opus 4 blackmailing engineers (Anthropic)
Anthropic: Last year, we released a case study on agentic misalignment. In experimental scenarios, we showed that AI models from many different …
- Impressions of China's AI ecosystem after visiting many leading AI labs there, and the similarities and differences in working on LLMs in China and the West (Nathan Lambert/Interconnects AI)
Nathan Lambert / Interconnects AI: Lessons from my trip to talk to most of the leading AI labs in China. …
- Akamai says it struck a seven-year cloud computing deal with a "leading frontier model provider"; sources: the deal was with Anthropic and is worth $1.8B (Rachel Metz/Bloomberg)
Rachel Metz / Bloomberg: Anthropic PBC has signed a $1.8 billion computing deal with cloud services provider Akamai Technologies Inc. to meet surging demand …
- Sources: OpenAI and Broadcom discuss terms for Broadcom to finance initial custom chip production for ~$18B, conditioned on Microsoft buying ~40% of the chips (Anissa Gardizy/The Information)
Anissa Gardizy / The Information: When OpenAI and chip designer Broadcom announced last fall that they would make custom artificial intelligence chips together, they positioned it as a done deal.
Solidot(13)
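- Mythos found one curl vulnerability
Mythos, the new AI model Anthropic announced last month, drew wide media attention. Anthropic promoted it as able to find security vulnerabilities in source code with extreme precision, a capability supposedly so strong that the company is withholding the model from the public for now and giving it first to a handful of companies so they can get a head start on fixing the vulnerabilities it finds. curl maintainer Daniel Stenberg considers this an extremely successful marketing stunt. Because curl is a widely used open-source project, he was given access to Mythos. curl currently contains 176,000 lines of C code, about 660,000 words in total. Mythos eventually returned a security report claiming five confirmed security vulnerabilities. After careful review, the curl security team found that three were false positives, one was an ordinary bug, and one was a low-severity vulnerability that will be fixed in the release due next month. The report also documented about 20 bugs in detail, and those were mostly correct. Stenberg said he has seen no evidence that Mythos is better at finding security vulnerabilities than earlier tools; it may be slightly better, but not enough to have a significant impact on code analysis.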
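- The Linux Foundation spends 2.95% of its budget on Linux
According to the Linux Foundation's 2025 annual report, it spent $8.41 million on the Linux kernel project last year, 2.95% of its total budget. Of that, Linux kernel creator Linus Torvalds was paid about $1.5 million (including a million dollars of "other" income, which is not clearly defined). The Linux Foundation is in fact a trade association rather than a public-benefit nonprofit. Its funding comes from sponsorship by tech giants, as the makeup of its board shows: its directors come from Sony, Huawei, OpenAI, Qualcomm, Samsung, Microsoft, Oracle, Google, Meta, and others. The foundation hosts about 1,500 open-source projects, and the Linux kernel is not the largest of them; its spending on blockchain accounts for 4% of the total budget.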
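- France opens a criminal investigation into Musk and his X platform
French prosecutors have opened a criminal investigation into Elon Musk and his X platform. Three months ago, French law enforcement searched X's Paris office and summoned Musk for questioning. Prosecutors had planned to interview Musk and former X CEO Linda Yaccarino in April this year, but neither appeared. French authorities are now trying to compel them to attend questioning by threatening criminal charges. Besides sexual imagery of minors, the investigation also covers Grok spreading Holocaust denial, as well as deepfake pornography. Prosecutors say that if Musk and Yaccarino fail to appear again, they will face indictment in absentia.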
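- Zuckerberg accused of personally authorizing and encouraging copyright infringement by his company
Five major publishers (Hachette, Macmillan, McGraw Hill, Elsevier, and Cengage) and author Scott Turow have sued Meta and its CEO Mark Zuckerberg, alleging that Zuckerberg personally authorized and actively encouraged large-scale copyright infringement by using pirated books, journal articles, and web-scraped material to train Meta's Llama AI system. Meta denies any wrongdoing and says it will fight the suit, arguing that courts have already found that training AI on copyrighted material is fair use. Training AI on copyrighted material may be fair use, but Meta obtained the material by illegal means. The complaint says that, to win the AI arms race and build a fully functional generative AI model, Meta and Zuckerberg followed their "move fast and break things" credo, first illegally downloading millions of copyrighted books and journal articles from pirate sites and scraping nearly the entire internet without authorization, in what amounts to one of the largest copyright infringements in history.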
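- OpenAI president made to read his own personal diary while testifying in court
Elon Musk testified in court last week, accusing OpenAI's two other co-founders, Greg Brockman and Sam Altman, of abandoning the nonprofit mission it was founded on for personal gain. Brockman took the stand this week and was made to read from his personal diary before the jury, which appeared to corroborate Musk's accusation. Brockman said he has kept a diary since his student days and has used it throughout his career to think through major decisions. The diaries were submitted to the court as evidence last October and unsealed this January. In 2017, Musk gave OpenAI an ultimatum: either he would take full control of OpenAI's for-profit arm, or OpenAI would remain a nonprofit. At the same time, Brockman was writing in his diary about the benefits of making money. After OpenAI set up a for-profit arm not controlled by Musk, Brockman's personal stake in OpenAI is now worth $30 billion. In the diary he also agonized over whether voting against Musk's plan, or voting to remove Musk from the board, would be morally wrong. He wrote: "It would be wrong to take this nonprofit away from him. It would be morally bankrupt."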
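- Google Chrome found silently downloading Gemini Nano on eligible devices
Google Chrome has been found silently downloading the 4GB Gemini Nano model on eligible devices, and re-downloading it after the user deletes it. Gemini Nano is the local model targeted by Google's controversial Prompt API. Running it requires at least 4GB of VRAM, 16GB of RAM, and at least 22GB of free space on the partition holding the browser installation. Chrome has 3.8 billion users and the largest market share of any browser, so the devices that meet the Gemini Nano requirements number at least in the hundreds of millions. Even ignoring repeat downloads, silently pushing 4GB of data to that many devices is a waste of resources on a scale that is hard to imagine. It is also worth noting that the Chrome installation is about 1GB, so the quietly downloaded model is four times the size of the browser itself, beyond what most users would expect of an add-on feature. Gemini Nano is downloaded into a folder named OptGuideOnDeviceModel, which stands for OptimizationGuide on-device model storage.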
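- OpenAI, Google, and Microsoft push to add AI literacy classes to school curricula
California Democratic Senator Adam Schiff has introduced a new bill with bipartisan support, the Literacy in Future Technologies Artificial Intelligence Act (LIFT AI Act), which would amend K-12 curricula to add AI literacy classes and fund AI courses along with related teaching materials, teacher training, and more. The bill defines AI literacy as the use of AI, specifically "having the age-appropriate knowledge and ability to use AI effectively, critically interpret outputs, solve problems in an AI world, and mitigate potential risks." The bill is backed by major AI companies such as OpenAI, Google, and Microsoft, as well as the American Federation of Teachers, the Information Technology Industry Council, the Software & Information Industry Association, HP Inc., and others.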
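- UK NHS prepares to close all of its open-source repositories, citing AI
Scheduling platform Cal.com announced last month that it was moving from open source to closed source, reasoning that AI tools make it easier to find vulnerabilities in open-source code and that security depends on obscurity, so closing the source improves security. Now the UK's National Health Service (NHS) is preparing to close nearly all of its open-source repositories on the same grounds, a decision that has drawn widespread controversy and criticism. Critics point out that most of the repositories the NHS has published are datasets, internal tools, guides, research tools, front-end designs, and the like, which are not affected by advances in security scanning. Moreover, whether code is open source makes no difference to AI tools such as Anthropic's Mythos, because they can also analyze binaries to look for vulnerabilities. Critics have published an open letter calling on the NHS to keep its code public.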
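- Why OpenAI's system prompt specifically restricts goblins
The OpenAI Codex CLI system prompt includes a specific restriction on words such as goblins: "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query". According to the official explanation, starting with GPT-5.1 the company's models began mentioning goblins and similar words in metaphors far more often: goblin usage in ChatGPT rose 175% and gremlin usage rose 52%. OpenAI investigated and found that the Nerdy personality had inadvertently rewarded such metaphors, which spread the habit of heavy goblin use. To fix the problem, OpenAI retired the Nerdy personality, removed the goblin-friendly reward signal, and filtered related examples out of the training data to keep them from reappearing inappropriately.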
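- Mozilla opposes Chrome's Prompt API
Google Chrome proposed the Prompt API in 2025: a uniform JavaScript API for the local model integrated into the browser, which must be downloaded before use. Google also intends to make the API a W3C standard. The large model integrated into desktop Chrome is Gemini Nano, which requires the local device to have at least 4GB of VRAM, 16GB of RAM, and at least 22GB of free space on the drive holding the browser. Mozilla developers have issued a statement opposing Chrome's Prompt API. They argue that the API has a serious interoperability problem: different models each have their own quirks, so system prompts need to be tuned for a specific model, yet a tweak made for one model may be an overcorrection for another. To achieve interoperability, Mozilla and Apple might have to license Google's model or ship a model compatible with the quirks of Google's. Another major problem is the lack of model neutrality.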
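- Zed editor releases version 1.0
Zed, a text editor project written in Rust, has announced the release of version 1.0. The developers say 1.0 does not mean "finished" or "perfect" but rather that the project has reached a key milestone. They also describe Zed as an AI-native editor that can run multiple AI agents in parallel, including Claude Agent, Codex, OpenCode, and Cursor. AI is built into the editor's underlying architecture rather than being an add-on.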
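- Musk says he founded the nonprofit OpenAI to counter Google
In 2024, Elon Musk sued OpenAI and its co-founders Sam Altman and Greg Brockman in San Francisco Superior Court for violating the company's founding principles and putting commercial interests above the public interest. OpenAI then published Musk's emails to show that, as a co-founder at the time, Musk had agreed to OpenAI creating a for-profit entity and had said he would provide funding, but later suspended that funding; his aim was to obtain a majority stake and control of the board, and the two sides ultimately parted ways over this. The lawsuit formally went to trial this week. Testifying in court, Musk said he founded OpenAI as a nonprofit to counter Google, and that he would not have supported it if its goal had been profit. Musk said the idea of founding a nonprofit AI company came to him after an argument with Google co-founder Larry Page over AI safety. He worried that Page was not taking AI safety seriously, so he wanted to counter Google with a nonprofit, open-source alternative.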