Weekly Digest — 2026-W10
215 unique stories (2026-03-02 → 2026-03-08), aggregated across 8 sources.
Hacker News(42)
- The workers behind Meta's smart glasses can see everything (www.svd.se)
- Welcome (back) to Macintosh (take.surf)
- British Columbia to end time changes, adopt year-round daylight time (www.cbc.ca)
- First in-utero stem cell therapy for fetal spina bifida repair is safe: study (health.ucdavis.edu)
- Anthropic Cowork feature creates 10GB VM bundle on macOS without warning (github.com)
- New iPad Air, powered by M4 (www.apple.com)
- Iran War Cost Tracker (iran-cost-ticker.com)
- GitHub Is Having Issues (www.githubstatus.com)
- Intel's make-or-break 18A process node debuts for data center with 288-core Xeon (www.tomshardware.com)
- GPT‑5.3 Instant (openai.com)
- Physics Girl: Super-Kamiokande – Imaging the sun by detecting neutrinos [video] (www.youtube.com)
- MacBook Air with M5 (www.apple.com)
GitHub Trending(27)
Product Hunt(41)
- Rankfender
AI visibility and automated SEO optimization platform
- ChatWithAds
From Data to AI-Assisted Decision, In One Conversation.
- Mosaic
Zapier for Video Editing
- JDoodleClaw
The most user-friendly OpenClaw. Securely hosted.
- Voca AI
The AI project manager that runs in the background
- WEIR AI
Track your identity online to protect it or earn from it
- getviktor.com
Your AI Coworker that proactively executes tasks
- Krisp Accent Conversion
Understand accented speech in real time
- The Bias
The synthesis engine for multi-perspective news
- Deep Personality
Science-backed personality insights for you and your partner
- Continue (Mission Control)
Quality control for your software factory
- Lavalier AI
Interview Intelligence to make confident hiring decisions
Hugging Face(25)
- dLLM: Simple Diffusion Language Modeling
Although diffusion language models (DLMs) are evolving quickly, many recent models converge on a set of shared components. These components, however, are distributed across ad-hoc research codebases or lack transparent implementations, making them difficult to reproduce or extend. As the field accelerates, there is a clear need for a unified framework that standardizes these common components while remaining flexible enough to support new methods and architectures. To address this gap, we introduce dLLM, an open-source framework that unifies the core components of diffusion language modeling -- training, inference, and evaluation -- and makes them easy to customize for new designs. With dLLM, users can reproduce, finetune, deploy, and evaluate open-source large DLMs such as LLaDA and Dream through a standardized pipeline. The framework also provides minimal, reproducible recipes for building small DLMs from scratch with accessible compute, including converting any BERT-style encoder or autoregressive LM into a DLM. We also release the checkpoints of these small DLMs to make DLMs more accessible and accelerate future research.
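The abstract's recipe for turning a BERT-style encoder into a DLM is not spelled out here; as a rough illustration, a minimal sketch of the masked-diffusion training objective that such frameworks standardize (hypothetical code, not dLLM's actual API) could look like:

```python
# Hypothetical sketch of a masked-diffusion LM training loss, not dLLM's real API.
# Any bidirectional model that predicts token logits at masked positions can act as
# the denoiser, which is the intuition behind converting BERT-style encoders to DLMs.
import torch
import torch.nn.functional as F

def masked_diffusion_loss(model, tokens, mask_id):
    """tokens: (batch, seq) int64 ids; model(ids) -> (batch, seq, vocab) logits."""
    b, n = tokens.shape
    t = torch.rand(b, 1).clamp(min=1e-3)       # diffusion time in (0, 1]; clamp avoids blow-up
    mask = torch.rand(b, n) < t                # mask each token independently w.p. t
    noisy = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    logits = model(noisy)                      # predict the original tokens at every position
    ce = F.cross_entropy(logits.transpose(1, 2), tokens, reduction="none")
    # The 1/t reweighting makes the expected loss a bound on the data negative log-likelihood.
    return ((ce * mask) / t).sum() / mask.sum().clamp(min=1)
```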
- Enhancing Spatial Understanding in Image Generation via Reward Modeling
Recent progress in text-to-image generation has greatly advanced visual fidelity and creativity, but it has also imposed higher demands on prompt complexity, particularly in encoding intricate spatial relationships. In such cases, achieving satisfactory results often requires multiple sampling attempts. To address this challenge, we introduce a novel method that strengthens the spatial understanding of current image generation models. We first construct the SpatialReward-Dataset with over 80k preference pairs. Building on this dataset, we train SpatialScore, a reward model designed to evaluate the accuracy of spatial relationships in text-to-image generation, achieving performance that surpasses even leading proprietary models on spatial evaluation. We further demonstrate that this reward model effectively enables online reinforcement learning for complex spatial generation. Extensive experiments across multiple benchmarks show that our specialized reward model yields significant and consistent gains in spatial understanding for image generation.
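The abstract does not detail how SpatialScore is fit to the 80k preference pairs; the standard construction for such a reward model is a pairwise Bradley-Terry objective, sketched below with hypothetical function names:

```python
# Hypothetical Bradley-Terry reward-model loss over preference pairs; SpatialScore's
# actual training recipe may differ from this standard construction.
import torch.nn.functional as F

def preference_loss(reward_model, prompt, img_chosen, img_rejected):
    """reward_model(prompt, image) -> scalar score tensor per example."""
    r_chosen = reward_model(prompt, img_chosen)
    r_rejected = reward_model(prompt, img_rejected)
    # Maximize P(chosen preferred over rejected) = sigmoid(r_chosen - r_rejected).
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```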
- Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets
The reliability of multilingual Large Language Model (LLM) evaluation is currently compromised by the inconsistent quality of translated benchmarks. Existing resources often suffer from semantic drift and context loss, which can lead to misleading performance metrics. In this work, we present a fully automated framework designed to address these challenges by enabling scalable, high-quality translation of datasets and benchmarks. We demonstrate that adapting test-time compute scaling strategies, specifically Universal Self-Improvement (USI) and our proposed multi-round ranking method, T-RANK, allows for significantly higher quality outputs compared to traditional pipelines. Our framework ensures that benchmarks preserve their original task structure and linguistic nuances during localization. We apply this approach to translate popular benchmarks and datasets into eight Eastern and Southern European languages (Ukrainian, Bulgarian, Slovak, Romanian, Lithuanian, Estonian, Turkish, Greek). Evaluations using both reference-based metrics and LLM-as-a-judge show that our translations surpass existing resources, resulting in more accurate downstream model assessment. We release both the framework and the improved benchmarks to facilitate robust and reproducible multilingual AI development.
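T-RANK's exact procedure is not given in the abstract; one plausible reading of a multi-round ranking selector is a best-of-N knockout tournament judged by an LLM, sketched here with hypothetical translate() and judge() helpers:

```python
# Hypothetical multi-round ranking of candidate translations; T-RANK's real
# algorithm may differ. judge(a, b, src) returns whichever candidate it prefers.
def select_translation(src, translate, judge, n_candidates=4):
    pool = [translate(src) for _ in range(n_candidates)]  # sample N candidates
    while len(pool) > 1:                                  # knockout rounds
        winners = [judge(pool[i], pool[i + 1], src)
                   for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2:                                 # odd candidate advances
            winners.append(pool[-1])
        pool = winners
    return pool[0]
```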
- CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
GPU kernel optimization is fundamental to modern deep learning but remains a highly specialized task requiring deep hardware expertise. Despite strong performance in general programming, large language models (LLMs) remain uncompetitive with compiler-based systems such as torch.compile for CUDA kernel generation. Existing CUDA code generation approaches either rely on training-free refinement or fine-tune models within fixed multi-turn execution-feedback loops, but both paradigms fail to fundamentally improve the model's intrinsic CUDA optimization ability, resulting in limited performance gains. We present CUDA Agent, a large-scale agentic reinforcement learning system that develops CUDA kernel expertise through three components: a scalable data synthesis pipeline, a skill-augmented CUDA development environment with automated verification and profiling to provide reliable reward signals, and reinforcement learning algorithmic techniques enabling stable training. CUDA Agent achieves state-of-the-art results on KernelBench, reaching faster-than-torch.compile rates of 100%, 100%, and 92% on the Level-1, Level-2, and Level-3 splits, and outperforming the strongest proprietary models such as Claude Opus 4.5 and Gemini 3 Pro by about 40% on the hardest Level-3 setting.
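The environment's reliable reward signals combine verification with profiling; a minimal sketch of the general shape such a reward could take (hypothetical, the paper's exact shaping is not stated here):

```python
# Hypothetical verification-gated speedup reward for a kernel-writing RL agent.
def kernel_reward(candidate_time_ms, baseline_time_ms, outputs_match):
    if not outputs_match:                # verification gate: incorrect kernels earn 0
        return 0.0
    speedup = baseline_time_ms / candidate_time_ms
    return max(0.0, speedup - 1.0)       # reward only genuine gains over the baseline
```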
- Mode Seeking meets Mean Seeking for Fast Long Video Generation
Scaling video generation from seconds to minutes faces a critical bottleneck: while short-video data is abundant and high-fidelity, coherent long-form data is scarce and limited to narrow domains. To address this, we propose a training paradigm where Mode Seeking meets Mean Seeking, decoupling local fidelity from long-term coherence based on a unified representation via a Decoupled Diffusion Transformer. Our approach utilizes a global Flow Matching head trained via supervised learning on long videos to capture narrative structure, while simultaneously employing a local Distribution Matching head that aligns sliding windows to a frozen short-video teacher via a mode-seeking reverse-KL divergence. This strategy yields a fast, few-step long-video generator that synthesizes minute-scale videos, learning long-range coherence and motion from limited long videos via supervised flow matching while inheriting local realism by aligning every sliding-window segment of the student to a frozen short-video teacher. Evaluations show that our method effectively closes the fidelity-horizon gap by jointly improving local sharpness, motion, and long-range consistency. Project website: https://primecai.github.io/mmm/.
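The mode-seeking/mean-seeking split comes from the asymmetry of KL divergence; with teacher p and student q (my notation, not the paper's), the two directions are:

```latex
% Mean seeking (forward KL): minimizing forces q to cover all modes of p.
\mathrm{KL}(p \,\|\, q) = \mathbb{E}_{x \sim p}\!\left[\log \tfrac{p(x)}{q(x)}\right]
% Mode seeking (reverse KL, the direction used for sliding-window distribution
% matching): minimizing lets q concentrate on high-density modes of p.
\mathrm{KL}(q \,\|\, p) = \mathbb{E}_{x \sim q}\!\left[\log \tfrac{q(x)}{p(x)}\right]
```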
- LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding
Speculative decoding accelerates autoregressive large language model (LLM) inference by using a lightweight draft model to propose candidate tokens that are then verified in parallel by the target model. The speedup is largely determined by the acceptance rate, yet standard training minimizes Kullback-Leibler (KL) divergence as a proxy objective. While KL divergence and acceptance rate share the same global optimum, small draft models, having limited capacity, typically converge to suboptimal solutions where minimizing KL does not guarantee maximizing acceptance rate. To address this issue, we propose LK losses, training objectives that directly target acceptance rate. Comprehensive experiments across four draft architectures and six target models, ranging from 8B to 685B parameters, demonstrate consistent improvements in acceptance metrics across all configurations compared to standard KL-based training. We evaluate our approach on general, coding, and math domains and report gains of up to 8-10% in average acceptance length. LK losses are easy to implement, introduce no computational overhead, and can be directly integrated into any existing speculator training framework, making them a compelling alternative to existing draft training objectives.
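For context, in standard speculative sampling a drafted token x ~ q is accepted with probability min(1, p(x)/q(x)), so the expected per-token acceptance rate is the overlap between draft and target distributions:

```latex
% Expected per-token acceptance rate of draft q against target p:
\alpha = \sum_{x} \min\bigl(p(x),\, q(x)\bigr) = 1 - \mathrm{TV}(p, q)
% KL(p || q) is only a loose proxy for this overlap, which is why KL-trained
% low-capacity drafts can end up with suboptimal acceptance rates.
```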
- UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?
Unified multimodal models have recently demonstrated strong generative capabilities, yet whether and when generation improves understanding remains unclear. Existing benchmarks lack a systematic exploration of the specific tasks where generation facilitates understanding. To this end, we introduce UniG2U-Bench, a comprehensive benchmark categorizing generation-to-understanding (G2U) evaluation into 7 regimes and 30 subtasks, requiring varying degrees of implicit or explicit visual transformations. Extensive evaluation of over 30 models reveals three core findings: 1) Unified models generally underperform their base Vision-Language Models (VLMs), and Generate-then-Answer (GtA) inference typically degrades performance relative to direct inference. 2) Consistent enhancements emerge in spatial intelligence, visual illusions, or multi-round reasoning subtasks, where enhanced spatial and shape perception, as well as multi-step intermediate image states, prove beneficial. 3) Tasks with similar reasoning structures and models sharing architectures exhibit correlated behaviors, suggesting that generation-understanding coupling induces class-consistent inductive biases over tasks, pretraining data, and model architectures. These findings highlight the necessity for more diverse training data and novel paradigms to fully unlock the potential of unified multimodal modeling.
- Beyond Language Modeling: An Exploration of Multimodal Pretraining
The visual world offers a critical axis for advancing foundation models beyond language. Despite growing interest in this direction, the design space for native multimodal models remains opaque. We provide empirical clarity through controlled, from-scratch pretraining experiments, isolating the factors that govern multimodal pretraining without interference from language pretraining. We adopt the Transfusion framework, using next-token prediction for language and diffusion for vision, to train on diverse data including text, video, image-text pairs, and even action-conditioned video. Our experiments yield four key insights: (i) Representation Autoencoder (RAE) provides an optimal unified visual representation by excelling at both visual understanding and generation; (ii) visual and language data are complementary and yield synergy for downstream capabilities; (iii) unified multimodal pretraining leads naturally to world modeling, with capabilities emerging from general training; and (iv) Mixture-of-Experts (MoE) enables efficient and effective multimodal scaling while naturally inducing modality specialization. Through IsoFLOP analysis, we compute scaling laws for both modalities and uncover a scaling asymmetry: vision is significantly more data-hungry than language. We demonstrate that the MoE architecture harmonizes this scaling asymmetry by providing the high model capacity required by language while accommodating the data-intensive nature of vision, paving the way for truly unified multimodal models.
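Transfusion-style pretraining optimizes one model under two per-modality losses; schematically (notation and the weighting λ are mine, not the paper's):

```latex
% Schematic Transfusion-style joint objective: next-token prediction on text
% plus a diffusion (noise-prediction) loss on visual latents, balanced by \lambda.
\mathcal{L} = \mathbb{E}\!\left[-\log p_\theta(x_t \mid x_{<t})\right]
  + \lambda \, \mathbb{E}_{t,\epsilon}\!\left[\lVert \epsilon - \epsilon_\theta(z_t, t) \rVert_2^2\right]
```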
- Utonia: Toward One Encoder for All Point Clouds
We dream of a future where point clouds from all domains can come together to shape a single model that benefits them all. Toward this goal, we present Utonia, a first step toward training a single self-supervised point transformer encoder across diverse domains, spanning remote sensing, outdoor LiDAR, indoor RGB-D sequences, object-centric CAD models, and point clouds lifted from RGB-only videos. Despite their distinct sensing geometries, densities, and priors, Utonia learns a consistent representation space that transfers across domains. This unification improves perception capability while revealing intriguing emergent behaviors that arise only when domains are trained jointly. Beyond perception, we observe that Utonia representations can also benefit embodied and multimodal reasoning: conditioning vision-language-action policies on Utonia features improves robotic manipulation, and integrating them into vision-language models yields gains on spatial reasoning. We hope Utonia can serve as a step toward foundation models for sparse 3D data, and support downstream applications in AR/VR, robotics, and autonomous driving.
- BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?
Current benchmarks for code agents primarily assess narrow, repository-specific fixes, overlooking critical real-world challenges such as cross-repository reasoning, domain-specialized problem solving, dependency-driven migration, and full-repository generation. To address this gap, we introduce BeyondSWE, a comprehensive benchmark that broadens existing evaluations along two axes - resolution scope and knowledge scope - using 500 real-world instances across four distinct settings. Experimental results reveal a significant capability gap: even frontier models plateau below 45% success, and no single model performs consistently across task types. To systematically investigate the role of external knowledge, we develop SearchSWE, a framework that integrates deep search with coding abilities. Our experiments show that search augmentation yields inconsistent gains and can in some cases degrade performance, highlighting the difficulty of emulating developer-like workflows that interleave search and reasoning during coding tasks. This work offers both a realistic, challenging evaluation benchmark and a flexible framework to advance research toward more capable code agents.
- Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models
Recent advancements in Generative Reward Models (GRMs) have demonstrated that scaling the length of Chain-of-Thought (CoT) reasoning considerably enhances the reliability of evaluation. However, current works predominantly rely on unstructured length scaling, ignoring the divergent efficacy of different reasoning mechanisms: Breadth-CoT (B-CoT, i.e., multi-dimensional principle coverage) and Depth-CoT (D-CoT, i.e., substantive judgment soundness). To address this, we introduce Mix-GRM, a framework that reconfigures raw rationales into structured B-CoT and D-CoT through a modular synthesis pipeline, subsequently employing Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR) to internalize and optimize these mechanisms. Comprehensive experiments demonstrate that Mix-GRM establishes a new state-of-the-art across five benchmarks, surpassing leading open-source RMs by an average of 8.2%. Our results reveal a clear divergence in reasoning: B-CoT benefits subjective preference tasks, whereas D-CoT excels in objective correctness tasks. Consequently, misaligning the reasoning mechanism with the task directly degrades performance. Furthermore, we demonstrate that RLVR acts as a switching amplifier, inducing an emergent polarization where the model spontaneously allocates its reasoning style to match task demands. The synthesized data and models are released at https://huggingface.co/collections/DonJoey/mix-grm, and the code is released at https://github.com/Don-Joey/Mix-GRM.
- How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
Large Language Models (LLMs) are increasingly deployed in socially sensitive domains, yet their unpredictable behaviors, ranging from misaligned intent to inconsistent personality, pose significant risks. We introduce SteerEval, a hierarchical benchmark for evaluating LLM controllability across three domains: language features, sentiment, and personality. Each domain is structured into three specification levels: L1 (what to express), L2 (how to express), and L3 (how to instantiate), connecting high-level behavioral intent to concrete textual output. Using SteerEval, we systematically evaluate contemporary steering methods, revealing that control often degrades at finer-grained levels. Our benchmark offers a principled and interpretable framework for safe and controllable LLM behavior, serving as a foundation for future research.
Techmeme(42)
- Filing: PayPay is seeking to raise up to $1.1B at a valuation of up to $13.4B in its US IPO, selling nearly 55M shares priced between $17 and $20 apiece (Arasu Kannagi Basil/Reuters)
PayPay and a selling shareholder are aiming to raise as much as $1.1 billion in an initial public offering in the United States …
- Source: Cursor's annualized revenue topped $2B in February, doubling from three months earlier, and about 60% of the revenue is coming from corporate customers (Rachel Metz/Bloomberg)
Cursor's annualized revenue topped $2 billion in February, according to a person familiar with the matter …
- The US Treasury Department, State Department, and federal housing agency are ending use of Anthropic products; State Department says it will switch to OpenAI (Reuters)
The U.S. Treasury Department, State Department and the federal housing agency are terminating all use of Anthropic products …
- Sources: US considers limiting Chinese companies to 75K Nvidia H200 chips each, less than half of what some want to buy; AMD MI325 chips also count toward a cap (Bloomberg)
US officials are considering caps on the number of AI accelerators Nvidia Corp. can export to any one Chinese company …
- Iranians turn to Starlink, decentralized messaging apps, and VPNs to circumvent an internet blackout; NetBlocks says connectivity is at 1% of ordinary levels (Bloomberg)
Iranians are finding ways to circumvent a fresh internet blackout imposed by their government and are sharing footage of US and Israeli airstrikes with the world.
- New York-based Ease Health, which is building an AI-native OS for behavioral health providers, emerged from stealth and raised a $41M Series A led by a16z (Vignesh R/Tech Funding News)
Behavioural health providers across the US are struggling with outdated software. Most clinics use separate systems for admissions, patient records, and billing.
- Sources: President Trump met with Coinbase CEO Brian Armstrong on March 3 before publicly admonishing banks over the GENIUS Act, echoing Coinbase's position (Jasper Goodman/Politico)
President Donald Trump met privately on Tuesday with Coinbase CEO Brian Armstrong before publicly backing the company's position …
- An OpenAI spokesperson says Sam Altman misspoke in saying OpenAI was looking to deploy on NATO classified networks, and that it was for "unclassified networks" (Hyunsu Yim/Reuters)
OpenAI is considering a contract to deploy its AI technology on North Atlantic Treaty Organization's (NATO) …
- Asia's smaller chip companies are joining their bigger peers in hiking prices as robust AI demand fuels capex, projected to rise 25% YoY to over $136B in 2026 (Nikkei Asia)
TAIPEI — Asia's smaller chip companies are joining their bigger peers in hiking prices as robust AI demand fuels record levels …
- The UK government commits an initial £40M to an AI research lab, modeled on its DARPA-inspired ARIA, seeking breakthroughs in science, healthcare, and transport (Madhumita Murgia/Financial Times)
New state-backed body seeks AI breakthroughs in science, healthcare and transport. The UK is launching …
- Sea's NY-listed shares fell as much as 27% on March 3, their worst intraday drop since 2023, after reporting Q4 net income up 73% YoY to $410.9M, vs. $442M est. (Olivia Poh/Bloomberg)
Sea Ltd. shares fell by the most in more than two years after its quarterly earnings missed analysts' estimates …
- Lockheed Martin to follow US federal ban on Anthropic, as government contracting attorneys say defense contractors are expected to comply with the DOD's order (Reuters)
U.S. defense contractors, like Lockheed Martin (LMT.N), are expected to follow the Pentagon's order to purge Anthropic's prized AI tools …
Solidot(38)
- NIST restricts foreign scientists' access to its labs
Over the past few weeks, hundreds of foreign scientists working at the US National Institute of Standards and Technology (NIST) have been barred from entering its laboratories unless accompanied by a federal employee, and may not enter on evenings or weekends. Scientists from certain countries will lose access as early as the end of this month. The proposed rules have no written version yet and have been communicated only through meetings. The latest changes build on research-security rules NIST updated in 2025, which classify scientists from China, Russia, Iran, North Korea, Cuba, Venezuela, and Syria as "high-risk." Researchers from China and other listed countries have been told their lab access will be reviewed by March 31, and access will be terminated for anyone deemed "high-risk" for having worked at NIST for more than three years or for working on sensitive projects such as quantum technology or AI. Researchers from low-risk countries also face losing access starting in September or December. NIST researchers do not conduct classified research, and former NIST director Patrick Gallagher said he sees no security benefit in the move.
- Amazon AWS Middle East data centers hit by fire and power outage
Amazon AWS disclosed that one of its Middle East data centers suffered a power outage while another caught fire after being struck by an "object"; it did not disclose what hit the facility. AWS said one of its data centers in the UAE was struck at 7:30 a.m. ET, the impact produced sparks and a fire, and the fire department cut power to the data center and its generators while putting out the blaze.
- Why women's pain lasts longer
Doctors have generally held that the immune system worsens pain by triggering inflammation, which typically shows up as redness and swelling. New research shows that immune cells may also be crucial in helping pain resolve, and that differences in immune-cell function between the sexes may affect how quickly pain subsides. Researchers studied a molecule called IL-10 (interleukin-10), measuring its levels in mice after skin injury and in emergency-room patients injured in traffic accidents. They found that IL-10 does more than dampen inflammation: it communicates directly with pain-sensing nerve cells and switches them off, meaning IL-10 helps extinguish pain. IL-10 is produced by monocytes, a type of white blood cell that circulates in the blood and migrates to injured tissue. Monocytes in males produce IL-10 more readily than those in females, because testosterone influences how much IL-10 monocytes produce and males have higher testosterone levels.
- Mouse study finds organs age in sync, with sex differences
Researchers have built the most detailed atlas yet of how aging affects thousands of cell subtypes across 21 mammalian tissues. By analyzing nearly 7 million single cells from mice of different ages, they identified the cells most vulnerable to damage over time and the factors driving their aging. The team analyzed millions of single cells extracted from 21 organs of 32 mice in three age groups: 1 month (young adult), 5 months (middle-aged), and 21 months (old). They identified more than 1,800 distinct cell subtypes, including many rare types never fully described before, then tracked how the numbers of each cell type changed across the age groups. For decades, scientists believed aging mainly alters cell function rather than cell numbers, but the team's analysis challenges that view: about a quarter of cell types changed significantly in number over time, with some muscle and kidney cells declining sharply while immune cells increased substantially. These changes were synchronized across organs, with similar cell states appearing and disappearing almost simultaneously in different organs, a pattern suggesting that shared signals circulating in the blood may help coordinate aging across the body. About 40% of aging-related changes differed by sex; for example, females showed broader immune activation during aging.
- February 2026 Steam survey shows Simplified Chinese users topping half of all users
Valve published the Steam hardware and software survey for February 2026. The data show an anomaly: Simplified Chinese users surged 30.74 percentage points in a single month to a 54.60% share, while English users fell 14.74 points to 22.27%. Another anomaly: Windows 11 dropped 10.43 points to 56.28% while 64-bit Windows 10 rose 12.46 points to 40.25%. One explanation may be that mid-February fell during the Spring Festival holiday for most Simplified Chinese users. Other figures: Windows as a whole rose 1.99 points to a 96.61% share, Linux fell 1.15 points to 2.23%, and macOS fell 0.85 points to 1.16%.
- Motorola announces GrapheneOS partnership
Lenovo-owned Motorola announced a partnership with GrapheneOS, the security-hardened community Android distribution, which until now has mainly supported Google's Pixel phones. Motorola's press release gave no information on which Motorola models will support GrapheneOS, saying only that details will be announced later. Motorola also launched Moto Analytics, an enterprise analytics platform, and a private image data feature in Moto Secure that, when enabled, automatically strips sensitive metadata such as location from all newly captured images on the device.
- ChatGPT uninstalls surged 295% after Pentagon deal
According to Sensor Tower data, users reacted to OpenAI's deal with the Pentagon: on February 28, US uninstalls of OpenAI's ChatGPT app jumped 295% from the previous day, versus an average daily uninstall rate of 9% over the past 30 days. Meanwhile, downloads of Claude, the app from OpenAI rival Anthropic, which refused the Pentagon's request, rose 37% on February 27 and 51% on February 28. ChatGPT downloads were also affected: they grew 14% day-over-day on February 27, before the Pentagon partnership was announced, then fell 13% on the 28th after the announcement and another 5% on March 1. Claude has also topped the US free-app chart for three consecutive days, a surge that has caused several brief Claude outages.
- Arm Cortex X925 catches up with AMD and Intel in desktop performance
Chips designed by the UK's Arm have long been optimized for low power and small die area, but the company has also kept shipping cores aimed at high-performance workloads. When Arm released its 64-bit Cortex A57 core in 2012, matching the latest AMD and Intel processors was a distant dream; its 2024 high-performance core, the Cortex X925, has made that dream a reality. The Arm cores in Nvidia's GB10 Superchip are based on the Cortex X925, and in desktop performance it has caught up with AMD's Zen 5 and Intel's Lion Cove. The GB10 uses ten X925 cores split into two clusters, one clocked at up to 4 GHz and the other at 3.9 GHz. Tests show its reordering capacity beats AMD Zen 5, and its L2 cache capacity matches the P-cores (performance cores) of Intel processors.
- Antarctica lost 12,000 km² of grounded ice over the past three decades
Glaciologists in Irvine, California mapped three decades of circum-Antarctic grounding-line migration, showing a loss of more than 12,000 square kilometers of grounded ice. Grounded ice is ice in direct contact with the seabed or bedrock, as distinct from floating ice shelves or icebergs, and is generally more stable. Combining data from multiple satellites, the researchers found that 77% of the coastline showed no grounding-line migration, but parts of West Antarctica, the Antarctic Peninsula, and East Antarctica lost 12,820 square kilometers of grounded ice. The ice sheet is retreating from the grounding line at an average rate of 442 square kilometers per year. The most dramatic changes are in West Antarctica's Amundsen Sea and Getz regions, where glaciers retreated roughly 10-40 km. Pine Island Glacier retreated 33 km, Thwaites Glacier 26 km, and Smith Glacier as much as 42 km.
- Xiaomi's Leica phone starts at 16,000 yuan
Xiaomi and Leica announced the Leitzphone, a Leica smartphone aimed at the high-end market and named after Leica founder Ernst Leitz. The Leitzphone features two 50-megapixel lenses and one 200-megapixel lens, offers camera-like dials for adjusting focal length, shutter speed, and exposure, and its hardware is largely identical to the Xiaomi 17 Ultra's. It starts at 1,999 euros, roughly 16,000 yuan.
- CO2 levels in human blood are rising too
As atmospheric carbon dioxide (CO2) concentrations keep climbing, CO2 levels in human blood are quietly rising as well. If the trend continues unchecked, a key blood marker could approach the edge of the healthy range within decades. The paper was published in Air Quality, Atmosphere & Health. To probe how atmospheric change affects the body's interior, the team systematically reviewed more than 20 years of large-scale US population data and found that a quiet shift in blood chemistry tracks the rise of atmospheric CO2 remarkably closely. The data show that average serum bicarbonate, a marker closely tied to the body's CO2 level, rose about 7% over two decades, while average calcium and phosphorus levels fell, coinciding with atmospheric CO2 climbing from 369 ppm in 2000 to over 420 ppm today. The team says this hints that the human body may be silently compensating for pressure from environmental change. Bicarbonate is the buffer that keeps blood pH in balance: when blood CO2 rises, the body retains more bicarbonate to neutralize acidity and stabilize the internal environment. But this compensation is not sustainable, and continual fine-tuning will eventually upset the body's finely balanced physiology. Model projections suggest that if the current trend holds, average serum bicarbonate could reach the upper limit of today's healthy range within 50 years, while calcium and phosphorus concentrations could fall below their lower limits later this century.
- Ars Technica's AI reporter departs
Last month the tech outlet Ars Technica was found to have used AI-generated content as a source in an AI news story, and Ars co-founder and editor-in-chief Ken Fisher issued a public apology. Benj Edwards, the story's co-author and Ars's senior AI reporter, said he took full responsibility and that his co-author had nothing to do with the error. He explained that he had tried to use an experimental AI tool built on Claude Code to extract structured quotes from source material for an outline, but the AI refused to process it, he suspects because the article described a harassment incident (an AI harassing a human). He then pasted the text into ChatGPT and failed to notice that ChatGPT had produced paraphrases of the article's author rather than verbatim quotes, and he did not verify the quotes against the original. Edwards's Ars bio is now written in the past tense, indicating he has left; Ars has not said whether he was fired, and Edwards declined to comment.