OrangeBot.AI Digest — 2026-02-16
52 headlines across 8 sources, aggregated for this day.
Hacker News (15)
- Study: Self-generated Agent Skills are useless (arxiv.org)
- 14-year-old Miles Wu folded an origami pattern that holds 10k times its own weight (www.smithsonianmag.com)
- Show HN: Jemini – Gemini for the Epstein Files (jmail.world)
- Use protocols, not services (notnotp.com)
- Privilege is bad grammar (tadaima.bearblog.dev)
- I guess I kinda get why people hate AI (anthony.noided.media)
- UK Discord users were part of a Peter Thiel-linked data collection experiment (www.rockpapershotgun.com)
- The Sideprocalypse (johan.hal.se)
- What your Bluetooth devices reveal (blog.dmcc.io)
- Ghidra by NSA (github.com)
- Running My Own XMPP Server (blog.dmcc.io)
- Ministry of Justice orders deletion of the UK's largest court reporting database (www.legalcheek.com)
- The Israeli spyware firm that accidentally just exposed itself (ahmedeldin.substack.com)
- Thanks a lot, AI: Hard drives are sold out for the year, says WD (mashable.com)
- Qwen3.5: Towards Native Multimodal Agents (qwen.ai)
GitHub Trending (10)
- alibaba / zvec
A lightweight, lightning-fast, in-process vector database
- nautechsystems / nautilus_trader
A high-performance algorithmic trading platform and event-driven backtester
- rowboatlabs / rowboat
Open-source AI coworker, with memory
- steipete / gogcli
Google Suite CLI: Gmail, GCal, GDrive, GContacts.
- openclaw / openclaw
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
- SynkraAI / aios-core
Synkra AIOS: AI-Orchestrated System for Full Stack Development - Core Framework v4.0
- letta-ai / letta-code
The memory-first coding agent
- ruvnet / wifi-densepose
Production-ready implementation of InvisPose - a revolutionary WiFi-based dense human pose estimation system that enables real-time full-body tracking through walls using commodity mesh routers
- seerr-team / seerr
Open-source media request and discovery manager for Jellyfin, Plex, and Emby.
- hummingbot / hummingbot
Open source software that helps you create and deploy high-frequency crypto trading bots
Hugging Face (15)
- Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs
The diversity of post-training data is critical for effective downstream performance in large language models (LLMs). Many existing approaches to constructing post-training data quantify diversity using text-based metrics that capture linguistic variation, but such metrics provide only weak signals for the task-relevant features that determine downstream performance. In this work, we introduce Feature Activation Coverage (FAC), which measures data diversity in an interpretable feature space. Building upon this metric, we further propose a diversity-driven data synthesis framework, named FAC Synthesis, that first uses a sparse autoencoder to identify missing features from a seed dataset, and then generates synthetic samples that explicitly reflect these features. Experiments show that our approach consistently improves both data diversity and downstream performance on various tasks, including instruction following, toxicity detection, reward modeling, and behavior steering. Interestingly, we identify a shared, interpretable feature space across model families (i.e., LLaMA, Mistral, and Qwen), enabling cross-model knowledge transfer. Our work provides a solid and practical methodology for exploring data-centric optimization of LLMs.
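The abstract does not spell out the exact FAC formula, but the idea (measuring diversity as coverage of sparse-autoencoder features) can be sketched in a few lines. The function name, the activation threshold, and the simulated "SAE activations" below are all illustrative assumptions, not from the paper:

```python
import numpy as np

def feature_activation_coverage(activations, threshold=0.0):
    """Toy coverage-style diversity metric (illustrative, not the paper's
    exact definition). activations: (n_samples, n_features) array of
    sparse-autoencoder feature activations for a dataset. A feature
    counts as covered if it fires above `threshold` on any sample."""
    covered = (activations > threshold).any(axis=0)
    return covered.mean()  # fraction of the feature space the data reaches

rng = np.random.default_rng(0)
# Simulate SAE activations: a narrow dataset lights up few features,
# a diverse one lights up many.
narrow = rng.random((100, 512)) * (rng.random(512) < 0.1)
diverse = rng.random((100, 512)) * (rng.random(512) < 0.6)

assert feature_activation_coverage(narrow) < feature_activation_coverage(diverse)
```

The synthesis step in the paper then targets exactly the uncovered features, i.e. the columns where `covered` is `False`.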
- SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise
Spoken query retrieval is an important interaction mode in modern information retrieval. However, existing evaluation datasets are often limited to simple queries under constrained noise conditions, making them inadequate for assessing the robustness of spoken query retrieval systems under complex acoustic perturbations. To address this limitation, we present SQuTR, a robustness benchmark for spoken query retrieval that includes a large-scale dataset and a unified evaluation protocol. SQuTR aggregates 37,317 unique queries from six commonly used English and Chinese text retrieval datasets, spanning multiple domains and diverse query types. We synthesize speech using voice profiles from 200 real speakers and mix 17 categories of real-world environmental noise under controlled SNR levels, enabling reproducible robustness evaluation from quiet to highly noisy conditions. Under the unified protocol, we conduct large-scale evaluations on representative cascaded and end-to-end retrieval systems. Experimental results show that retrieval performance decreases as noise increases, with substantially different drops across systems. Even large-scale retrieval models struggle under extreme noise, indicating that robustness remains a critical bottleneck. Overall, SQuTR provides a reproducible testbed for benchmarking and diagnostic analysis, and facilitates future research on robustness in spoken query to text retrieval.
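Mixing real-world noise at controlled SNR levels, as the benchmark describes, follows a standard recipe: scale the noise so that the speech-to-noise power ratio hits the target. A minimal sketch (function name and test signals are illustrative):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then mix. Assumes equal-length float arrays."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Target noise power satisfies p_speech / p_noise_scaled = 10^(snr_db/10)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(1)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s tone at 16 kHz
noise = rng.standard_normal(16000)

mixed = mix_at_snr(speech, noise, snr_db=5.0)
# Achieved SNR matches the requested 5 dB
achieved = 10 * np.log10(np.mean(speech**2) / np.mean((mixed - speech)**2))
```

Sweeping `snr_db` from high (quiet) to low or negative values reproduces the quiet-to-highly-noisy conditions the protocol evaluates.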
- MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
We present MedXIAOHE, a medical vision-language foundation model designed to advance general-purpose medical understanding and reasoning in real-world clinical applications. MedXIAOHE achieves state-of-the-art performance across diverse medical benchmarks and surpasses leading closed-source multimodal systems on multiple capabilities. To achieve this, we propose an entity-aware continual pretraining framework that organizes heterogeneous medical corpora to broaden knowledge coverage and reduce long-tail gaps (e.g., rare diseases). For medical expert-level reasoning and interaction, MedXIAOHE incorporates diverse medical reasoning patterns via reinforcement learning and tool-augmented agentic training, enabling multi-step diagnostic reasoning with verifiable decision traces. To improve reliability in real-world use, MedXIAOHE integrates user-preference rubrics, evidence-grounded reasoning, and low-hallucination long-form report generation, with improved adherence to medical instructions. We release this report to document our practical design choices, scaling insights, and evaluation framework, hoping to inspire further research.
- Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception
Multimodal Large Language Models (MLLMs) excel at broad visual understanding but still struggle with fine-grained perception, where decisive evidence is small and easily overwhelmed by global context. Recent "Thinking-with-Images" methods alleviate this by iteratively zooming in and out of regions of interest during inference, but incur high latency due to repeated tool calls and visual re-encoding. To address this, we propose Region-to-Image Distillation, which transforms zooming from an inference-time tool into a training-time primitive, thereby internalizing the benefits of agentic zooming into a single forward pass of an MLLM. In particular, we first zoom in to micro-cropped regions to let strong teacher models generate high-quality VQA data, and then distill this region-grounded supervision back to the full image. After training on such data, the smaller student model improves "single-glance" fine-grained perception without tool use. To rigorously evaluate this capability, we further present ZoomBench, a hybrid-annotated benchmark of 845 VQA items spanning six fine-grained perceptual dimensions, together with a dual-view protocol that quantifies the global-regional "zooming gap". Experiments show that our models achieve leading performance across multiple fine-grained perception benchmarks, and also improve general multimodal cognition on benchmarks such as visual reasoning and GUI agents. We further discuss when "Thinking-with-Images" is necessary versus when its gains can be distilled into a single forward pass. Our code is available at https://github.com/inclusionAI/Zooming-without-Zooming.
- OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
Hypothesis. Artificial general intelligence is, at its core, a compression problem. Effective compression demands resonance: deep learning scales best when its architecture aligns with the fundamental structure of the data. These are the fundamental principles. Yet, modern vision architectures have strayed from these truths: visual signals are highly redundant, while discriminative information, the surprise, is sparse. Current models process dense pixel grids uniformly, wasting vast compute on static background rather than focusing on the predictive residuals that define motion and meaning. We argue that to solve visual understanding, we must align our architectures with the information-theoretic principles of video, i.e., Codecs. Method. OneVision-Encoder encodes video by compressing predictive visual structure into semantic meaning. By adopting Codec Patchification, OV-Encoder abandons uniform computation to focus exclusively on the 3.1%-25% of regions rich in signal entropy. To unify spatial and temporal reasoning under irregular token layouts, OneVision-Encoder employs a shared 3D RoPE and is trained with a large-scale cluster discrimination objective over more than one million semantic concepts, jointly capturing object permanence and motion dynamics. Evidence. The results validate our core hypothesis: efficiency and accuracy are not a trade-off; they are positively correlated. When integrated into LLM, it consistently outperforms strong vision backbones such as Qwen3-ViT and SigLIP2 across 16 image, video, and document understanding benchmarks, despite using substantially fewer visual tokens and pretraining data. Notably, on video understanding tasks, OV-Encoder achieves an average improvement of 4.1% over Qwen3-ViT. Codec-aligned, patch-level sparsity is a foundational principle, enabling OV-Encoder as a scalable engine for next-generation visual generalists.
- CoPE-VideoLM: Codec Primitives For Efficient Video Language Models
Video Language Models (VideoLMs) empower AI systems to understand temporal dynamics in videos. To fit to the maximum context window constraint, current methods use keyframe sampling which can miss both macro-level events and micro-level details due to the sparse temporal coverage. Furthermore, processing full images and their tokens for each frame incurs substantial computational overhead. To address these limitations, we propose to leverage video codec primitives (specifically motion vectors and residuals) which natively encode video redundancy and sparsity without requiring expensive full-image encoding for most frames. To this end, we introduce lightweight transformer-based encoders that aggregate codec primitives and align their representations with image encoder embeddings through a pre-training strategy that accelerates convergence during end-to-end fine-tuning. Our approach reduces the time-to-first-token by up to 86% and token usage by up to 93% compared to standard VideoLMs. Moreover, by varying the keyframe and codec primitive densities we are able to maintain or exceed performance on 14 diverse video understanding benchmarks spanning general question answering, temporal reasoning, long-form understanding, and spatial scene understanding.
- GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
This paper presents GeoAgent, a model capable of reasoning in close alignment with humans and deriving fine-grained address conclusions. Previous RL-based methods have achieved breakthroughs in performance and interpretability, but concerns remain about their reliance on AI-generated chain-of-thought (CoT) data and on training strategies that conflict with geographic characteristics. To address these issues, we first introduce GeoSeek, a new geolocation dataset comprising CoT data annotated by geographic experts and professional players. We then thoroughly explore the inherent characteristics of geographic tasks and propose a geo-similarity reward and a consistency reward, assessed by a consistency agent, to assist training. This encourages the model to converge toward correct answers from a geographic perspective while preserving the integrity and consistency of its reasoning process. Experimental results show that GeoAgent outperforms existing methods and a series of general VLLMs at multiple granularities, while generating reasoning that closely aligns with human reasoning.
- SemanticMoments: Training-Free Motion Similarity via Third Moment Features
Retrieving videos based on semantic motion is a fundamental, yet unsolved, problem. Existing video representation approaches overly rely on static appearance and scene context rather than motion dynamics, a bias inherited from their training data and objectives. Conversely, traditional motion-centric inputs like optical flow lack the semantic grounding needed to understand high-level motion. To demonstrate this inherent bias, we introduce the SimMotion benchmarks, combining controlled synthetic data with a new human-annotated real-world dataset. We show that existing models perform poorly on these benchmarks, often failing to disentangle motion from appearance. To address this gap, we propose SemanticMoments, a simple, training-free method that computes temporal statistics (specifically, higher-order moments) over features from pre-trained semantic models. Across our benchmarks, SemanticMoments consistently outperforms existing RGB, flow, and text-supervised methods. This demonstrates that temporal statistics in a semantic feature space provide a scalable and perceptually grounded foundation for motion-centric video understanding.
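The core trick (temporal statistics, including third moments, computed over pre-trained frame features) is genuinely training-free and fits in a few lines. The sketch below uses made-up toy "features" and illustrative names; the paper's feature extractor and exact moment set may differ:

```python
import numpy as np

def temporal_moment_descriptor(feats):
    """Training-free motion descriptor: per-dimension temporal mean,
    variance, and third central moment of a (T, D) frame-feature
    sequence, concatenated into one vector."""
    mu = feats.mean(axis=0)
    centered = feats - mu
    var = (centered ** 2).mean(axis=0)
    third = (centered ** 3).mean(axis=0)  # captures temporal asymmetry
    return np.concatenate([mu, var, third])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy clips sharing the same "appearance" but differing in motion:
t = np.linspace(0, 1, 32)[:, None]
base = np.ones((1, 8))                        # shared appearance vector
clip_rise = base + t                          # features drift steadily upward
clip_osc = base + np.sin(8 * np.pi * t)       # features oscillate
clip_rise2 = base + 0.9 * t                   # similar rising motion

d_rise = temporal_moment_descriptor(clip_rise)
d_osc = temporal_moment_descriptor(clip_osc)
d_rise2 = temporal_moment_descriptor(clip_rise2)

# Similar motion ranks higher than different motion, despite identical appearance
assert cosine(d_rise, d_rise2) > cosine(d_rise, d_osc)
```

Retrieval then reduces to nearest-neighbor search over these descriptors, which is what makes the approach scalable.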
- What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis
Reinforcement learning (RL) with verifiable rewards has become a standard post-training stage for boosting visual reasoning in vision-language models, yet it remains unclear what capabilities RL actually improves compared with supervised fine-tuning as cold-start initialization (IN). End-to-end benchmark gains conflate multiple factors, making it difficult to attribute improvements to specific skills. To bridge the gap, we propose a Frankenstein-style analysis framework including: (i) functional localization via causal probing; (ii) update characterization via parameter comparison; and (iii) transferability test via model merging. We find that RL does not uniformly enhance visual perception. Instead, RL induces a consistent inference-time shift primarily in mid-to-late layers, and these mid-to-late refinements are both transferable (via merging) and necessary (via freezing) for RL gains. Overall, our results suggest that RL's reliable contribution in visual reasoning is not a uniform enhancement of visual perception, but a systematic refinement of mid-to-late transformer computation that improves vision-to-reasoning alignment and reasoning performance, highlighting the limitations of benchmark-only evaluation for understanding multimodal reasoning improvements.
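The "transferability test via model merging" can be pictured as transplanting the mid-to-late layers of an RL-tuned checkpoint into the SFT-only model and checking whether the gains follow. A toy sketch with dicts of arrays standing in for real checkpoints; the paper's actual merging protocol may differ:

```python
import numpy as np

def merge_late_layers(base, donor, num_layers, cutoff):
    """Transplant layer weights from `donor` (e.g. RL-tuned) into `base`
    (e.g. SFT-only) for layers >= cutoff, keeping early layers from
    `base`. Models are dicts mapping 'layers.{i}.weight' -> array."""
    merged = dict(base)
    for i in range(cutoff, num_layers):
        key = f"layers.{i}.weight"
        merged[key] = donor[key].copy()
    return merged

rng = np.random.default_rng(2)
L = 8
sft = {f"layers.{i}.weight": rng.standard_normal((4, 4)) for i in range(L)}
rl = {f"layers.{i}.weight": rng.standard_normal((4, 4)) for i in range(L)}

merged = merge_late_layers(sft, rl, num_layers=L, cutoff=5)
assert np.array_equal(merged["layers.0.weight"], sft["layers.0.weight"])
assert np.array_equal(merged["layers.7.weight"], rl["layers.7.weight"])
```

Evaluating `merged` then isolates whether the mid-to-late refinement alone carries the RL benchmark gains.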
- RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models
Simulation offers a scalable and low-cost way to enrich vision-language-action (VLA) training, reducing reliance on expensive real-robot demonstrations. However, most sim-real co-training methods rely on supervised fine-tuning (SFT), which treats simulation as a static source of demonstrations and does not exploit large-scale closed-loop interaction. Consequently, real-world gains and generalization are often limited. In this paper, we propose an RL-based sim-real co-training (RL-Co) framework that leverages interactive simulation while preserving real-world capabilities. Our method follows a generic two-stage design: we first warm-start the policy with SFT on a mixture of real and simulated demonstrations, then fine-tune it with reinforcement learning in simulation while adding an auxiliary supervised loss on real-world data to anchor the policy and mitigate catastrophic forgetting. We evaluate our framework on four real-world tabletop manipulation tasks using two representative VLA architectures, OpenVLA and π_{0.5}, and observe consistent improvements over real-only fine-tuning and SFT-based co-training, including +24% real-world success on OpenVLA and +20% on π_{0.5}. Beyond higher success rates, RL co-training yields stronger generalization to unseen task variations and substantially improved real-world data efficiency, providing a practical and scalable pathway for leveraging simulation to enhance real-robot deployment.
- ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning
Building general-purpose embodied agents across diverse hardware remains a central challenge in robotics, often framed as the "one-brain, many-forms" paradigm. Progress is hindered by fragmented data, inconsistent representations, and misaligned training objectives. We present ABot-M0, a framework that builds a systematic data curation pipeline while jointly optimizing model architecture and training strategies, enabling end-to-end transformation of heterogeneous raw data into unified, efficient representations. From six public datasets, we clean, standardize, and balance samples to construct UniACT-dataset, a large-scale dataset with over 6 million trajectories and 9,500 hours of data, covering diverse robot morphologies and task scenarios. Unified pre-training improves knowledge transfer and generalization across platforms and tasks, supporting general-purpose embodied intelligence. To improve action prediction efficiency and stability, we propose the Action Manifold Hypothesis: effective robot actions lie not in the full high-dimensional space but on a low-dimensional, smooth manifold governed by physical laws and task constraints. Based on this, we introduce Action Manifold Learning (AML), which uses a DiT backbone to predict clean, continuous action sequences directly. This shifts learning from denoising to projection onto feasible manifolds, improving decoding speed and policy stability. ABot-M0 supports modular perception via a dual-stream mechanism that integrates VLM semantics with geometric priors and multi-view inputs from plug-and-play 3D modules such as VGGT and Qwen-Image-Edit, enhancing spatial understanding without modifying the backbone and mitigating standard VLM limitations in 3D reasoning. Experiments show components operate independently with additive benefits. We will release all code and pipelines for reproducibility and future research.
- Intelligent AI Delegation
AI agents are able to tackle increasingly complex tasks. To achieve more ambitious goals, AI agents need to be able to meaningfully decompose problems into manageable sub-components and safely delegate their completion to other AI agents and humans alike. Yet existing task decomposition and delegation methods rely on simple heuristics and cannot dynamically adapt to environmental changes or robustly handle unexpected failures. Here we propose an adaptive framework for intelligent AI delegation: a sequence of task-allocation decisions that also incorporates transfer of authority, responsibility, and accountability; clear specifications of roles and boundaries; clarity of intent; and mechanisms for establishing trust between the two (or more) parties. The proposed framework applies to both human and AI delegators and delegatees in complex delegation networks, and aims to inform the development of protocols for the emerging agentic web.
- BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models
Large language model (LLM) inference is often bounded by memory footprint and memory bandwidth in resource-constrained deployments, making quantization a fundamental technique for efficient serving. While post-training quantization (PTQ) maintains high fidelity at 4-bit, it deteriorates at 2-3 bits. Fundamentally, existing methods enforce a shape-invariant quantization grid (e.g., the fixed uniform intervals of UINT2) for each group, severely restricting the feasible set for error minimization. To address this, we propose Bit-Plane Decomposition Quantization (BPDQ), which constructs a variable quantization grid via bit-planes and scalar coefficients, and iteratively refines them using approximate second-order information while progressively compensating quantization errors to minimize output discrepancy. In the 2-bit regime, BPDQ enables serving Qwen2.5-72B on a single RTX 3090 with 83.85% GSM8K accuracy (vs. 90.83% at 16-bit). Moreover, we provide theoretical analysis showing that the variable grid expands the feasible set, and that the quantization process consistently aligns with the optimization objective in Hessian-induced geometry. Code: github.com/KingdalfGoodman/BPDQ.
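The abstract's key construction, a variable quantization grid built from bit-planes and shared scalar coefficients, can be caricatured in a few lines. The grid levels are all subset sums of the coefficients, so a fixed uniform 2-bit grid is just the special case where the coefficients are powers of two times one step size. This sketch refines coefficients with plain alternating least squares rather than the paper's second-order refinement and error compensation; all names are illustrative:

```python
import numpy as np
from itertools import product

def quantize_variable_grid(w, num_planes=2, iters=10):
    """Toy bit-plane quantizer: each weight is a 0/1 combination of
    `num_planes` shared scalar coefficients. Alternates nearest-level
    assignment with least-squares coefficient refits (a crude stand-in
    for BPDQ's second-order refinement)."""
    # Seed the grid with one positive and one negative coefficient so
    # the subset sums span both signs.
    coeffs = np.array([w.max(), w.min()]) / 2
    bit_patterns = np.array(list(product([0.0, 1.0], repeat=num_planes)))
    for _ in range(iters):
        levels = bit_patterns @ coeffs                   # all subset sums
        idx = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
        B = bit_patterns[idx]                            # (n, num_planes) bit assignments
        coeffs, *_ = np.linalg.lstsq(B, w, rcond=None)   # refit to shrink error
    levels = bit_patterns @ coeffs
    idx = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

rng = np.random.default_rng(3)
w = rng.standard_normal(256)      # one weight group
wq = quantize_variable_grid(w)
err_variable = np.mean((w - wq) ** 2)
```

Because the coefficients (and hence the level spacing) are free to move, the feasible set of grids strictly contains the fixed uniform one, which is the expansion the paper's theoretical analysis formalizes.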
- Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions
Universal video understanding requires modeling fine-grained visual and audio information over time in diverse real-world scenarios. However, the performance of existing models is primarily constrained by video-instruction data that represents complex audiovisual content as single, incomplete descriptions, lacking fine-grained organization and reliable annotation. To address this, we introduce: (i) ASID-1M, an open-source collection of one million structured, fine-grained audiovisual instruction annotations with single- and multi-attribute supervision; (ii) ASID-Verify, a scalable data curation pipeline for annotation, with automatic verification and refinement that enforces semantic and temporal consistency between descriptions and the corresponding audiovisual content; and (iii) ASID-Captioner, a video understanding model trained via Supervised Fine-Tuning (SFT) on the ASID-1M. Experiments across seven benchmarks covering audiovisual captioning, attribute-wise captioning, caption-based QA, and caption-based temporal grounding show that ASID-Captioner improves fine-grained caption quality while reducing hallucinations and improving instruction following. It achieves state-of-the-art performance among open-source models and is competitive with Gemini-3-Pro.
- DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels
Diffusion large language models (dLLMs) have emerged as a compelling alternative to autoregressive (AR) LLMs, owing to their capacity for parallel token generation. This paradigm is particularly well-suited for code generation, where holistic structural planning and non-sequential refinement are critical. Despite this potential, tailoring dLLMs for CUDA kernel generation remains challenging, obstructed not only by the high specialization but also by the severe lack of high-quality training data. To address these challenges, we construct CuKe, an augmented supervised fine-tuning dataset optimized for high-performance CUDA kernels. On top of it, we propose a bi-phase curated reinforcement learning (BiC-RL) framework consisting of a CUDA kernel infilling stage and an end-to-end CUDA kernel generation stage. Leveraging this training framework, we introduce DICE, a series of diffusion large language models designed for CUDA kernel generation, spanning three parameter scales, 1.7B, 4B, and 8B. Extensive experiments on KernelBench demonstrate that DICE significantly outperforms both autoregressive and diffusion LLMs of comparable scale, establishing a new state-of-the-art for CUDA kernel generation.
Solidot (12)
- Babylon 5 uploaded to YouTube for free viewing
Warner Bros. Discovery is uploading the classic science-fiction series Babylon 5 to YouTube, one episode per week, free for everyone to watch. The first episode of season one, "The Gathering", went up on January 22 and currently has about 250,000 views; the second episode, "Midnight on the Firing Line", and the third, "Soul Hunter", have also been posted. The weekly cadence follows the show's original broadcast schedule, so viewers can experience the story at the same pace. Babylon 5 premiered on February 22, 1993 and ran for five seasons and 110 episodes. Set between 2257 and 2262, it depicts an Earth Alliance of Earth's nations, Mars, and a colony at Proxima Centauri that has made contact with other alien civilizations and acquired hyperspace technology for faster-than-light travel. Ten years before the story begins, Earth was nearly annihilated in an interstellar war with the Minbari, who surrendered on the eve of victory. To keep the tragedy from recurring, the two sides established channels for peaceful contact, and humanity built the Babylon 5 space station as a hub for diplomacy and trade. The station becomes the focal point of political intrigue, racial conflict, and a great war, while Earth cuts ties with its allies and slides toward fascism.
- Ars Technica's AI reporter apologizes for AI-generated content
The prominent tech site Ars Technica was caught last week using AI-generated material as a source in an AI news story. Ars co-founder and editor-in-chief Ken Fisher published a public apology on Sunday, saying a review of recently published articles found no other pieces containing AI-generated content, so this appears to be an isolated incident. Benj Edwards, the story's co-author and Ars's senior AI reporter, explained that he had tried an experimental tool built on Claude Code to extract structured quotes from source material for his outline, but the AI refused to process it; he suspects this was because the article described a harassment incident (an AI harassing a human). He then pasted the text into ChatGPT, failed to notice that ChatGPT had produced paraphrased versions of the author's words rather than verbatim quotes, and did not check the quotes against the original before citing them. An AI reporter undone by AI hallucination: the irony is hard to miss.
- OpenClaw founder joins OpenAI
Peter Steinberger, founder of the open-source OpenClaw project, announced he is joining OpenAI; OpenClaw itself will be managed by a foundation. OpenClaw is an open-source autonomous AI assistant. It first appeared on GitHub in late 2025 under the name Clawdbot, was renamed Moltbot, and finally settled on its current name. In early 2026 the project drew attention for autonomously carrying out complex tasks across apps and online services on a user's instructions. OpenClaw can be deployed locally on macOS, Windows, and other systems, can call other large AI models and APIs, and receives text commands over messaging platforms such as WhatsApp, Telegram, Signal, and Discord to schedule events, send messages, organize files, write code, and more.
- Vim 9.2 released
The Vim text editor project released v9.2 on Valentine's Day. Highlights include: experimental Wayland support; support for the XDG Base Directory Specification, storing configuration, cache, and user data in separate directories; modern defaults for HiDPI displays; new code-completion features; an improved diff mode; a new vertical tab panel; and native dark-mode support on Windows, among other changes.
- Why global warming is accelerating
An analysis of global mean surface temperatures from 1880 to 2025 shows that warming has accelerated over the past 30 years, reaching nearly 0.27°C per decade over the last 10 years. One explanation for the acceleration is declining aerosol pollution: aerosols reflect sunlight and have a cooling effect that offsets part of the greenhouse-gas warming, and over the past two decades many countries have cracked down on aerosol pollution, reducing that cooling. The researchers argue, however, that the record heat of recent years cannot be fully explained by aerosols and natural variability. They find that Earth's low-cloud cover has shrunk; low clouds reflect sunlight, so their declining extent has driven the acceleration. The decline in low clouds is partly linked to aerosols, but may also be a feedback loop triggered by rising temperatures, since warmer air makes low clouds harder to form. If the current record heat is mainly due to aerosol changes, then once aerosol pollution falls to zero the accelerated warming will stop and Earth will return to its earlier, slower warming. But if it is driven by a cloud feedback loop, the acceleration is likely to continue, bringing more severe heat waves, storms, and droughts.
- Telecom carriers blocked Telnet traffic ahead of a critical vulnerability disclosure
CVE-2026-24061, a critical Telnet vulnerability disclosed on January 20, sits in GNU InetUtils telnetd, has existed for 10 years, carries a CVSS score of 9.8/10, and makes it trivially easy for attackers to gain root. Yet a week before disclosure, global Telnet traffic fell off a cliff. Carriers had evidently received advance warning and acted to prevent exploitation. The data show that on January 14 Telnet session counts dropped 65% within one hour and 83% within two hours. Average daily sessions fell 59%, from 914,000 on December 1 to about 373,000 on January 14. One or more North American Tier 1 transit providers filtered port 23, Telnet's default port. At 18 carriers, including BT, Cox Communications, and Vultr, Telnet sessions dropped from hundreds of thousands to zero on January 15.
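The reported 59% decline in daily sessions checks out against the two figures given:

```python
# Figures reported in the entry: average daily Telnet session counts
before = 914_000   # December 1
after = 373_000    # January 14
drop = (before - after) / before
print(f"{drop:.0%}")  # → 59%
```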
- EU moves to ban infinite scroll
The EU is making its first attempt to act against social-media addiction. Earlier this month it issued a preliminary ruling that TikTok's addictive design features, including infinite scroll, autoplay, and its highly personalized recommendation system, violate the EU's Digital Services Act (DSA), and demanded that TikTok disable infinite scroll, impose strict screen-break times, and modify its recommendation system. The action against TikTok could set a new design standard and end the era of infinite scroll. TikTok may defend its design, but if it fails to satisfy the EU it faces a fine of up to 6% of its global annual revenue. This is the first time a regulator has tried to set legal standards for addictive design on social platforms. Meta's Facebook and Instagram are also under investigation over addictive design.
- Supreme People's Court: drivers remain liable with driver-assistance features engaged
China's Supreme People's Court published its first batch of guiding cases on road-traffic criminal law, stating that drivers remain responsible for traffic safety after activating driver-assistance features. The release notes that as driver-assistance technology spreads, some drivers stop paying attention once the system is engaged, playing with their phones or sleeping, and some even buy and use illegal "smart-driving" cheat accessories to evade the system's safety monitoring and drive hands-free for long stretches, seriously endangering road safety. Guiding Case No. 271, a dangerous-driving case against a driver surnamed Wang, makes clear that an onboard driver-assistance system cannot replace the driver as the operating party: a driver who activates driver assistance is still the person actually performing the driving task and bears responsibility for safe operation. A person who activates driver assistance and uses privately installed accessories to evade the system's monitoring remains legally responsible as the driver, even if they are not physically operating the vehicle from the driver's seat.
- Waymo pays DoorDash gig workers to close its robotaxis' doors
Waymo's robotaxis carry passengers driverlessly in six cities, but if a passenger leaves a door open after getting out, the vehicle is stranded. To address this, Waymo is piloting a program in Atlanta that pays DoorDash gig drivers to come over and close the door. One DoorDash driver posted on Reddit that a task to drive less than a mile and shut a Waymo's door was offered at $6.25, with an extra $5 after completion was confirmed. Waymo and DoorDash confirmed the post is genuine; the partnership began early this year. Waymo also works with Honk, a Los Angeles towing-service app, on the same problem. Future Waymo vehicles will have doors that close automatically.
- OpenAI again says DeepSeek trained its models via distillation
In a memo submitted Thursday to the U.S. House Select Committee on China, OpenAI again warned that its Chinese rival DeepSeek trained its models using distillation, in which one AI model is trained on the outputs of another. OpenAI said DeepSeek has been free-riding on its technology. OpenAI made similar comments after DeepSeek released its R1 model early last year.
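Distillation, as glossed in the entry, means training a student model on a teacher's outputs. The classic soft-label form (Hinton et al.) fits in a few lines, shown here purely to illustrate the term; names and toy logits are illustrative:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T  # temperature softens the distribution
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student against the teacher's softened
    output distribution: minimizing it pulls the student toward the
    teacher's behavior."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    return -np.sum(p_teacher * log_p_student, axis=-1).mean()

rng = np.random.default_rng(4)
teacher = rng.standard_normal((8, 10))           # toy batch of teacher logits
aligned = distillation_loss(teacher.copy(), teacher)          # student matches teacher
mismatched = distillation_loss(rng.standard_normal((8, 10)), teacher)
assert aligned < mismatched  # loss is smaller when the student mimics the teacher
```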
- Ars's AI story contained AI-fabricated quotes
An OpenClaw AI agent going by MJ Rathbun submitted a pull request to matplotlib, the Python charting library, and after maintainer Scott Shambaugh rejected it, the agent published an angry blog post attacking him by name. A human was clearly pulling the strings (the agent did not spontaneously become conscious), but so far no one has publicly claimed responsibility for it. The incident drew wide attention in the open-source community. Ars Technica covered it in an article titled "After a routine code rejection, an AI agent published a hit piece on someone by name", which quoted a comment from Shambaugh. Ironically, the quoted comment was itself an AI fabrication, a hallucination that simply does not exist, and neither the author nor the editors verified it. The episode drew criticism of Ars, a 28-year-old tech outlet whose content policy clearly prohibits using AI-fabricated material. The article has been pulled, and Ars says an investigation is under way, with results expected next week given the holiday.
- NCAR's weather-forecasting supercomputer handed to a third party
The NYT reports that the National Science Foundation (NSF) announced Thursday that management of the National Center for Atmospheric Research (NCAR) supercomputer will be transferred to a third party. The machine runs weather models for forecasting and disaster warnings. NSF gave no further details. The move alarmed climate scientists, who fear NCAR may be broken up and that they may lose access to the supercomputer for running weather models. NCAR's supercomputer, Derecho, ranks 160th on the November 2025 Top500 list; built by HPE, it uses AMD EPYC 7763 64C processors and 328 Nvidia GPUs, with a theoretical peak of 19.87 PFLOPS. The Trump administration announced plans last December to dissolve NCAR, with Office of Management and Budget director Russell Vought calling the center "one of the biggest sources of climate alarmism in America" and saying the federal government would break up the agency.