OrangeBot.AI Digest — 2026-02-17
55 headlines across 4 sources, aggregated for this day.
Hacker News (15)
- Thank HN: You helped save 33k lives
- Stephen Colbert says CBS forbid interview of Democrat because of FCC threat (arstechnica.com)
- Show HN: AsteroidOS 2.0 – Nobody asked, we shipped anyway (asteroidos.org)
- Tesla 'Robotaxi' adds 5 more crashes in Austin in a month – 4x worse than humans (electrek.co)
- Discord Rival Gets Overwhelmed by Exodus of Players Fleeing Age-Verification (kotaku.com)
- Claude Sonnet 4.6 (www.anthropic.com)
- Gentoo on Codeberg (www.gentoo.org)
- Using go fix to modernize Go code (go.dev)
- HackMyClaw (hackmyclaw.com)
- CBS didn't air Rep. James Talarico interview out of fear of FCC (www.nbcnews.com)
- Semantic ablation: Why AI writing is generic and boring (www.theregister.com)
- I converted 2D conventional flight tracking into 3D (aeris.edbn.me)
- A Programmer's Loss of Identity (ratfactor.com)
- Is Show HN dead? No, but it's drowning (www.arthurcnops.blog)
- GrapheneOS – Break Free from Google and Apple (blog.tomaszdunia.pl)
GitHub Trending (13)
- p-e-w / heretic
Fully automatic censorship removal for language models
- seerr-team / seerr
Open-source media request and discovery manager for Jellyfin, Plex, and Emby.
- obra / superpowers
An agentic skills framework & software development methodology that works.
- steipete / gogcli
Google Suite CLI: Gmail, GCal, GDrive, GContacts.
- alibaba / zvec
A lightweight, lightning-fast, in-process vector database
- openclaw / openclaw
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
- SynkraAI / aios-core
Synkra AIOS: AI-Orchestrated System for Full Stack Development - Core Framework v4.0
- ashishps1 / awesome-system-design-resources
Learn System Design concepts and prepare for interviews using free resources.
- steipete / summarize
Point at any URL/YouTube/Podcast or file. Get the gist. CLI and Chrome Extension.
- hummingbot / hummingbot
Open source software that helps you create and deploy high-frequency crypto trading bots
- anthropics / claude-quickstarts
A collection of projects designed to help developers quickly get started with building deployable applications using the Claude API
- davila7 / claude-code-templates
CLI tool for configuring and monitoring Claude Code
- OpenCTI-Platform / opencti
Open Cyber Threat Intelligence Platform
Hugging Face (15)
- DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories
Existing multimodal retrieval systems excel at semantic matching but implicitly assume that query-image relevance can be measured in isolation. This paradigm overlooks the rich dependencies inherent in realistic visual streams, where information is distributed across temporal sequences rather than confined to single snapshots. To bridge this gap, we introduce DeepImageSearch, a novel agentic paradigm that reformulates image retrieval as an autonomous exploration task. Models must plan and perform multi-step reasoning over raw visual histories to locate targets based on implicit contextual cues. We construct DISBench, a challenging benchmark built on interconnected visual data. To address the scalability challenge of creating context-dependent queries, we propose a human-model collaborative pipeline that employs vision-language models to mine latent spatiotemporal associations, effectively offloading intensive context discovery before human verification. Furthermore, we build a robust baseline using a modular agent framework equipped with fine-grained tools and a dual-memory system for long-horizon navigation. Extensive experiments demonstrate that DISBench poses significant challenges to state-of-the-art models, highlighting the necessity of incorporating agentic reasoning into next-generation retrieval systems.
- Experiential Reinforcement Learning
Reinforcement learning has become the central approach for language models (LMs) to learn from environmental reward or feedback. In practice, the environmental feedback is usually sparse and delayed. Learning from such signals is challenging, as LMs must implicitly infer how observed failures should translate into behavioral changes for future iterations. We introduce Experiential Reinforcement Learning (ERL), a training paradigm that embeds an explicit experience-reflection-consolidation loop into the reinforcement learning process. Given a task, the model generates an initial attempt, receives environmental feedback, and produces a reflection that guides a refined second attempt, whose success is reinforced and internalized into the base policy. This process converts feedback into structured behavioral revision, improving exploration and stabilizing optimization while preserving gains at deployment without additional inference cost. Across sparse-reward control environments and agentic reasoning benchmarks, ERL consistently improves learning efficiency and final performance over strong reinforcement learning baselines, achieving gains of up to +81% in complex multi-step environments and up to +11% in tool-using reasoning tasks. These results suggest that integrating explicit self-reflection into policy training provides a practical mechanism for transforming feedback into durable behavioral improvement.
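The experience-reflection-consolidation loop described in the abstract can be sketched in a few lines of toy Python. Everything here (the string-reversal task, the dict-based policy, the exact-match reward) is a hypothetical stand-in for illustration, not the paper's implementation.

```python
# Toy sketch of the ERL experience-reflection-consolidation loop. The task
# (reverse a string), the dict-based policy, and the reward rule are
# hypothetical stand-ins, not the paper's actual components.

def environment_feedback(answer, target):
    """Sparse reward: 1.0 only on an exact match, else 0.0."""
    return 1.0 if answer == target else 0.0

def make_policy():
    """A trivially 'learnable' policy: consolidated answers live in `memory`."""
    memory = {}
    def guess(task, hint=None):
        if task in memory:                   # consolidated behavior wins
            return memory[task]
        return task[::-1] if hint else task  # naive first try; revised on a hint
    return {"guess": guess, "memory": memory}

def erl_step(policy, task, target):
    """One ERL iteration: attempt -> feedback -> reflection -> refined attempt."""
    first = policy["guess"](task)                    # initial attempt
    r1 = environment_feedback(first, target)         # environmental feedback
    reflection = None if r1 == 1.0 else f"'{first}' failed; revise the approach"
    second = policy["guess"](task, hint=reflection)  # refined second attempt
    r2 = environment_feedback(second, target)
    if r2 > r1:                                      # reinforce the successful revision
        policy["memory"][task] = second              # consolidate into the base policy
    return r2
```

The point of the sketch is the control flow: feedback is turned into an explicit reflection that conditions a second attempt, and only the improved attempt is written back into the policy, so deployment needs no extra inference step.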
- REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents
Large language models are transitioning from general-purpose knowledge engines to real-world problem solvers, yet optimizing them for deep search tasks remains challenging. The central bottleneck lies in the extreme sparsity of high-quality search trajectories and reward signals, arising from the difficulty of scalable long-horizon task construction and the high cost of interaction-heavy rollouts involving external tool calls. To address these challenges, we propose REDSearcher, a unified framework that co-designs complex task synthesis, mid-training, and post-training for scalable search-agent optimization. Specifically, REDSearcher introduces the following improvements: (1) We frame task synthesis as a dual-constrained optimization, where task difficulty is precisely governed by graph topology and evidence dispersion, allowing scalable generation of complex, high-quality tasks. (2) We introduce tool-augmented queries to encourage proactive tool use rather than passive recall. (3) During mid-training, we strengthen core atomic capabilities (knowledge, planning, and function calling), substantially reducing the cost of collecting high-quality trajectories for downstream training. (4) We build a local simulated environment that enables rapid, low-cost algorithmic iteration for reinforcement learning experiments. Across both text-only and multimodal search-agent benchmarks, our approach achieves state-of-the-art performance. To facilitate future research on long-horizon search agents, we will release 10K high-quality complex text search trajectories, 5K multimodal trajectories, and a 1K text RL query set, together with code and model checkpoints.
- STATe-of-Thoughts: Structured Action Templates for Tree-of-Thoughts
Inference-Time-Compute (ITC) methods like Best-of-N and Tree-of-Thoughts are meant to produce output candidates that are both high-quality and diverse, but their use of high-temperature sampling often fails to achieve meaningful output diversity. Moreover, existing ITC methods offer limited control over how to perform reasoning, which in turn limits their explainability. We present STATe-of-Thoughts (STATe), an interpretable ITC method that searches over high-level reasoning patterns. STATe replaces stochastic sampling with discrete and interpretable textual interventions: a controller selects actions encoding high-level reasoning choices, a generator produces reasoning steps conditioned on those choices, and an evaluator scores candidates to guide search. This structured approach yields three main advantages. First, action-guided textual interventions produce greater response diversity than temperature-based sampling. Second, in a case study on argument generation, STATe's explicit action sequences capture interpretable features that are highly predictive of output quality. Third, estimating the association between performance and action choices allows us to identify promising yet unexplored regions of the action space and steer generation directly toward them. Together, these results establish STATe as a practical framework for generating high-quality, diverse, and interpretable text. Our framework is available at https://github.com/zbambergerNLP/state-of-thoughts.
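The controller/generator/evaluator split in the abstract can be illustrated with a toy search over discrete actions. The action names and the scoring rule below are invented placeholders, not the paper's actual action space or evaluator.

```python
# Toy sketch of STATe's decomposition: a controller enumerates discrete
# high-level actions, a generator conditions on each action, and an evaluator
# scores candidates to guide search. Actions and scoring are invented here.

ACTIONS = ["give_example", "cite_statistic", "state_principle"]

def generator(prompt, action):
    """Produce a candidate reasoning step conditioned on a discrete action."""
    return f"{prompt} [{action}]"

def evaluator(candidate, preferred):
    """Stand-in quality score: reward candidates tagged with the preferred action."""
    return 1.0 if f"[{preferred}]" in candidate else 0.0

def state_search(prompt, preferred):
    """Search over actions instead of high-temperature sampling; keep the best."""
    scored = [(evaluator(generator(prompt, a), preferred), a) for a in ACTIONS]
    best_score, best_action = max(scored)
    return best_action, best_score
```

Because each action yields a structurally distinct candidate, output diversity comes from the discrete action space rather than from sampling temperature, and the chosen action sequence doubles as an interpretable trace.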
- Query as Anchor: Scenario-Adaptive User Representation via Large Language Model
Industrial-scale user representation learning requires balancing robust universality with acute task-sensitivity. However, existing paradigms primarily yield static, task-agnostic embeddings that struggle to reconcile the divergent requirements of downstream scenarios within unified vector spaces. Furthermore, heterogeneous multi-source data introduces inherent noise and modality conflicts, degrading representation. We propose Query-as-Anchor, a framework shifting user modeling from static encoding to dynamic, query-aware synthesis. To empower Large Language Models (LLMs) with deep user understanding, we first construct UserU, an industrial-scale pre-training dataset that aligns multi-modal behavioral sequences with user understanding semantics, and our Q-Anchor Embedding architecture integrates hierarchical coarse-to-fine encoders into dual-tower LLMs via joint contrastive-autoregressive optimization for query-aware user representation. To bridge the gap between general pre-training and specialized business logic, we further introduce Cluster-based Soft Prompt Tuning to enforce discriminative latent structures, effectively aligning model attention with scenario-specific modalities. For deployment, anchoring queries at sequence termini enables KV-cache-accelerated inference with negligible incremental latency. Evaluations on 10 Alipay industrial benchmarks show consistent SOTA performance, strong scalability, and efficient deployment. Large-scale online A/B testing in Alipay's production system across two real-world scenarios further validates its practical effectiveness. Our code is prepared for public release and will be available at: https://github.com/JhCircle/Q-Anchor.
- Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training
Data quality determines foundation model performance, yet systematic processing frameworks are lacking. We introduce Data Darwinism, a ten-level taxonomy (L0-L9) that conceptualizes data-model co-evolution: advanced models produce superior data for next-generation systems. We validate this on scientific literature by constructing Darwin-Science, a 900B-token corpus (L0-L5). We identify a learnability gap in raw scientific text, which we bridge via L4 (Generative Refinement) and L5 (Cognitive Completion) using frontier LLMs to explicate reasoning and terminology. To ensure rigorous attribution, we pre-trained daVinci-origin-3B/7B models from scratch, excluding scientific content to create contamination-free baselines. After 600B tokens of continued pre-training, Darwin-Science outperforms baselines by +2.12 (3B) and +2.95 (7B) points across 20+ benchmarks, rising to +5.60 and +8.40 points on domain-aligned tasks. Systematic progression to L5 yields a +1.36 total gain, confirming that higher-level processing unlocks latent data value. We release the Darwin-Science corpus and daVinci-origin models to enable principled, co-evolutionary development.
- InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem
The rapid evolution of Large Language Models has catalyzed a surge in scientific idea production, yet this leap has not been accompanied by a matching advance in idea evaluation. Scientific evaluation fundamentally requires knowledgeable grounding, collective deliberation, and multi-criteria decision-making. However, existing idea evaluation methods often suffer from narrow knowledge horizons, flattened evaluation dimensions, and the inherent bias in LLM-as-a-Judge. To address these issues, we regard idea evaluation as a knowledge-grounded, multi-perspective reasoning problem and introduce InnoEval, a deep innovation evaluation framework designed to emulate human-level idea assessment. We apply a heterogeneous deep knowledge search engine that retrieves and grounds dynamic evidence from diverse online sources. We further achieve review consensus with an innovation review board containing reviewers with distinct academic backgrounds, enabling a multi-dimensional decoupled evaluation across multiple metrics. We construct comprehensive datasets derived from authoritative peer-reviewed submissions to benchmark InnoEval. Experiments demonstrate that InnoEval consistently outperforms baselines in point-wise, pair-wise, and group-wise evaluation tasks, exhibiting judgment patterns and consensus highly aligned with human experts.
- BitDance: Scaling Autoregressive Generative Models with Binary Tokens
We present BitDance, a scalable autoregressive (AR) image generator that predicts binary visual tokens instead of codebook indices. With high-entropy binary latents, BitDance lets each token represent up to 2^{256} states, yielding a compact yet highly expressive discrete representation. Sampling from such a huge token space is difficult with standard classification. To resolve this, BitDance uses a binary diffusion head: instead of predicting an index with softmax, it employs continuous-space diffusion to generate the binary tokens. Furthermore, we propose next-patch diffusion, a new decoding method that predicts multiple tokens in parallel with high accuracy, greatly speeding up inference. On ImageNet 256x256, BitDance achieves an FID of 1.24, the best among AR models. With next-patch diffusion, BitDance beats state-of-the-art parallel AR models that use 1.4B parameters, while using 5.4x fewer parameters (260M) and achieving 8.7x speedup. For text-to-image generation, BitDance trains on large-scale multimodal tokens and generates high-resolution, photorealistic images efficiently, showing strong performance and favorable scaling. When generating 1024x1024 images, BitDance achieves a speedup of over 30x compared to prior AR models. We release code and models to facilitate further research on AR foundation models. Code and models are available at: https://github.com/shallowdream204/BitDance.
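The scale of the binary token space claimed above is a simple counting argument: a 256-bit token can take 2^256 values, which is why predicting an explicit codebook index with softmax is infeasible and the paper instead generates bits with a diffusion head. The packing function below only illustrates that counting argument; it is not the paper's tokenizer.

```python
# Counting argument behind binary visual tokens: 256 bits per token give
# 2**256 possible states, far beyond any explicit softmax codebook.
# Illustrative only; this is not BitDance's actual tokenizer.

BITS = 256

def token_to_index(bits):
    """Pack a bit vector into the integer index it would occupy in a codebook."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

# A 3-bit example: [1, 0, 1] -> index 5 of 2**3 = 8 possible states.
assert token_to_index([1, 0, 1]) == 5
# Classifying one 256-bit token over indices would need a 2**256-way softmax;
# predicting the bits directly needs only 256 outputs.
assert token_to_index([1] * BITS) == 2 ** BITS - 1
```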
- Qute: Towards Quantum-Native Database
This paper envisions a quantum database (Qute) that treats quantum computation as a first-class execution option. Unlike prior simulation-based methods that either run quantum algorithms on classical machines or adapt existing databases for quantum simulation, Qute instead (i) compiles an extended form of SQL into gate-efficient quantum circuits, (ii) employs a hybrid optimizer to dynamically select between quantum and classical execution plans, (iii) introduces selective quantum indexing, and (iv) designs fidelity-preserving storage to mitigate current qubit constraints. We also present a three-stage evolution roadmap toward a quantum-native database. Finally, by deploying Qute on a real quantum processor (origin_wukong), we show that it outperforms a classical baseline at scale, and we release an open-source prototype at https://github.com/weAIDB/Qute.
- Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a single model. To improve reasoning and preference alignment, we combine point-wise and pair-wise reward modeling, ensuring high-quality, human-aligned responses. For code generation, we design complexity-aware rewards in Reinforcement Learning, optimizing both correctness and efficiency. In deep search, we perform complex data synthesis and incorporate turn-level supervision during training. This enables stable long-horizon tool interactions, allowing Nanbeige4.1-3B to reliably execute up to 600 tool-call turns for complex problem-solving. Extensive experimental results show that Nanbeige4.1-3B significantly outperforms prior models of similar scale, such as Nanbeige4-3B-2511 and Qwen3-4B, even achieving superior performance compared to much larger models, such as Qwen3-30B-A3B. Our results demonstrate that small models can achieve both broad competence and strong specialization simultaneously, redefining the potential of 3B parameter models.
- UniWeTok: An Unified Binary Tokenizer with Codebook Size 2^{128} for Unified Multimodal Large Language Model
Unified Multimodal Large Language Models (MLLMs) require a visual representation that simultaneously supports high-fidelity reconstruction, complex semantic extraction, and generative suitability. However, existing visual tokenizers typically struggle to satisfy these conflicting objectives within a single framework. In this paper, we introduce UniWeTok, a unified discrete tokenizer designed to bridge this gap using a massive binary codebook (2^{128}). For the training framework, we introduce Pre-Post Distillation and a Generative-Aware Prior to enhance the semantic extraction and generative prior of the discrete tokens. In terms of model architecture, we propose a convolution-attention hybrid architecture with the SigLu activation function. SigLu activation not only bounds the encoder output and stabilizes the semantic distillation process but also effectively addresses the optimization conflict between token entropy loss and commitment loss. We further propose a three-stage training framework designed to enhance UniWeTok's adaptability across various image resolutions and perception-sensitive scenarios, such as those involving human faces and textual content. On ImageNet, UniWeTok achieves state-of-the-art image generation performance (FID: UniWeTok 1.38 vs. REPA 1.42) while requiring remarkably low training compute (Training Tokens: UniWeTok 33B vs. REPA 262B). On general-domain tasks, UniWeTok demonstrates highly competitive capabilities across a broad range of benchmarks, including multimodal understanding, image generation (DPG Score: UniWeTok 86.63 vs. FLUX.1 [Dev] 83.84), and editing (GEdit Overall Score: UniWeTok 5.09 vs. OmniGen 5.06). We release code and models to facilitate community exploration of unified tokenizers and MLLMs.
- Learning to Configure Agentic AI Systems
Configuring LLM-based agent systems involves choosing workflows, tools, token budgets, and prompts from a large combinatorial design space, and is typically handled today by fixed large templates or hand-tuned heuristics. This leads to brittle behavior and unnecessary compute, since the same cumbersome configuration is often applied to both easy and hard input queries. We formulate agent configuration as a query-wise decision problem and introduce ARC (Agentic Resource & Configuration learner), which learns a light-weight hierarchical policy using reinforcement learning to dynamically tailor these configurations. Across multiple benchmarks spanning reasoning and tool-augmented question answering, the learned policy consistently outperforms strong hand-designed and other baselines, achieving up to 25% higher task accuracy while also reducing token and runtime costs. These results demonstrate that learning per-query agent configurations is a powerful alternative to "one size fits all" designs.
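The per-query configuration idea can be sketched as a policy mapping query features to a workflow and budget, instead of one fixed template for every input. The configs below and the length/keyword heuristic standing in for ARC's learned hierarchical policy are invented for illustration.

```python
# Toy sketch of per-query agent configuration (the idea behind ARC, not its
# implementation). The config options and the heuristic standing in for the
# learned reinforcement-learning policy are invented placeholders.

CONFIGS = {
    "light": {"workflow": "direct_answer", "token_budget": 256, "tools": []},
    "heavy": {"workflow": "plan_then_act", "token_budget": 4096,
              "tools": ["search", "calculator"]},
}

def configure(query):
    """Stand-in for the learned policy: choose a configuration per query."""
    looks_hard = len(query.split()) > 12 or "compare" in query.lower()
    return CONFIGS["heavy" if looks_hard else "light"]
```

The payoff described in the abstract falls out of this shape: easy queries get a cheap direct-answer path while hard ones get planning, tools, and a larger budget, cutting average token and runtime cost.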
- BrowseComp-V^3: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents
Multimodal large language models (MLLMs), equipped with increasingly advanced planning and tool-use capabilities, are evolving into autonomous agents capable of performing multimodal web browsing and deep search in open-world environments. However, existing benchmarks for multimodal browsing remain limited in task complexity, evidence accessibility, and evaluation granularity, hindering comprehensive and reproducible assessments of deep search capabilities. To address these limitations, we introduce BrowseComp-V^3, a novel benchmark consisting of 300 carefully curated and challenging questions spanning diverse domains. The benchmark emphasizes deep, multi-level, and cross-modal multi-hop reasoning, where critical evidence is interleaved across textual and visual modalities within and across web pages. All supporting evidence is strictly required to be publicly searchable, ensuring fairness and reproducibility. Beyond final-answer accuracy, we incorporate an expert-validated, subgoal-driven process evaluation mechanism that enables fine-grained analysis of intermediate reasoning behaviors and systematic characterization of capability boundaries. In addition, we propose OmniSeeker, a unified multimodal browsing agent framework integrating diverse web search and visual perception tools. Comprehensive experiments demonstrate that even state-of-the-art models achieve only 36% accuracy on our benchmark, revealing critical bottlenecks in multimodal information integration and fine-grained perception. Our results highlight a fundamental gap between current model capabilities and robust multimodal deep search in real-world settings.
- Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings
Leveraging Multimodal Large Language Models (MLLMs) has become pivotal for advancing Universal Multimodal Embeddings (UME) in addressing diverse cross-modal tasks. Recent studies demonstrate that incorporating generative Chain-of-Thought (CoT) reasoning can substantially enhance task-specific representations compared to discriminative methods. However, the generated reasoning CoTs of existing generative embedding methods are limited to the textual analysis of queries and are irrelevant to the retrieval of the targets. To address these limitations, we propose a reasoning-driven UME framework that integrates Embedder-Guided Reinforcement Learning (EG-RL) to optimize the Reasoner to produce evidential Traceability CoT (T-CoT). Our key contributions are threefold: (1) We design an EG-RL framework where the Embedder provides explicit supervision to the Reasoner, ensuring the generated CoT traces are aligned with embedding tasks. (2) We introduce T-CoT, which extracts critical multimodal cues to focus on retrieval-relevant elements and provides multimodal inputs for the Embedder. (3) With limited computational resources, our framework outperforms the pioneering embedding model on both MMEB-V2 and UVRB benchmarks. The integration of multimodal evidence in structured reasoning, paired with retrieval-oriented alignment, effectively strengthens cross-modal semantic consistency and boosts the fine-grained matching capability of the model as well as the generalization across complex scenarios. Our work demonstrates that targeted reasoning optimization can significantly improve multimodal embedding quality, providing a practical and efficient solution for reasoning-driven UME development.
- WebWorld: A Large-Scale World Model for Web Agent Training
Web agents require massive trajectories to generalize, yet real-world training is constrained by network latency, rate limits, and safety risks. We introduce the WebWorld series, the first open-web simulator trained at scale. While existing simulators are restricted to closed environments with thousands of trajectories, WebWorld leverages a scalable data pipeline to train on 1M+ open-web interactions, supporting reasoning, multi-format data, and long-horizon simulations of 30+ steps. For intrinsic evaluation, we introduce WebWorld-Bench with dual metrics spanning nine dimensions, where WebWorld achieves simulation performance comparable to Gemini-3-Pro. For extrinsic evaluation, Qwen3-14B trained on WebWorld-synthesized trajectories improves by +9.2% on WebArena, reaching performance comparable to GPT-4o. WebWorld enables effective inference-time search, outperforming GPT-5 as a world model. Beyond web simulation, WebWorld exhibits cross-domain generalization to code, GUI, and game environments, providing a replicable recipe for world model construction.
Solidot (12)
- The main audience for false medical information is the elderly
Researchers at the University of Utah tracked the web-browsing activity of over a thousand U.S. adults for four weeks and found that the main audience for false medical information is older adults, especially older adults with right-leaning political views. Over the study period, participants visited roughly 9 million URLs, including 500,000 YouTube videos. Of 1,055 domains in the health category, 78 were judged to spread false health information. Only 13% of participants visited such sites, and most of the visits were concentrated among older participants. The researchers note that their data cannot determine whether participants reached these sites through Google searches or Facebook recommendations.
- Seagate and Western Digital confirm their 2026 hard drive capacity is sold out
Two of the three major hard drive makers, Seagate and Western Digital, have confirmed that their 2026 production capacity is entirely or nearly sold out, and the third, Toshiba, is likely in a similar position. Western Digital CEO Irving Tan said the company has supply agreements with two of its five largest customers running through 2027, and with another through 2028. Seagate CEO William Mosley said the company will begin taking orders for the first half of 2027 in the coming months. The biggest customers of both companies are data-center operators, including Amazon AWS, Google, Microsoft Azure, Meta, and OpenAI. Server drives now account for 87% of Seagate's total drive sales, up from 83% a year ago. Seagate says it has no plans to expand capacity for now.
- Soaring memory prices drive up sales of second-hand laptops
Large-scale purchases by AI companies have left memory and hard drives in short supply, with prices multiplying within months, and the shortage of memory and other key components has boosted sales of refurbished second-hand laptops. According to data from Context, refurbished laptop sales in the five largest European markets (Italy, the UK, Germany, Spain, and France) rose 7% in the fourth quarter of last year. Forty percent of the sales came from budget-constrained buyers purchasing laptops in the $235-355 range. Sales in the $355-475 range are also growing, now accounting for 23% of all second-hand laptop sales, up from 15% a year earlier, suggesting some buyers are willing to pay more for better specifications.
- DNA mutations in the children of Chernobyl cleanup workers
Researchers sequenced the genomes of 130 people whose fathers took part in the cleanup of the Chernobyl nuclear accident. Comparing against a control group, they found the first evidence of a "transgenerational effect" of fathers' prolonged exposure to low-dose ionizing radiation. The study was published in Scientific Reports. Rather than looking for single new mutations, the researchers looked for clustered de novo mutations (cDNMs): two or more mutations at nearby positions that are absent in the parents but appear for the first time in their children. Such clusters arise when radiation exposure breaks the parent's DNA. The researchers found that paternal radiation exposure significantly increased the number of cDNMs in offspring, with counts correlating with the radiation dose received. The increase did not raise the children's disease risk, possibly because most cDNMs fall in non-coding regions of DNA.
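The definition in the item above (two or more nearby mutations absent in both parents) can be illustrated with a toy detector. The sequences, the window size, and the clustering rule are simplified stand-ins, not the study's actual variant-calling pipeline.

```python
# Toy illustration of clustered de novo mutations (cDNMs): positions where a
# child's base differs from both parents, with two or more such sites close
# together. Window size and clustering rule are simplified assumptions.

def de_novo_sites(child, mother, father):
    """Positions where the child differs from both parental sequences."""
    return [i for i, base in enumerate(child)
            if base != mother[i] and base != father[i]]

def cdnm_clusters(sites, window=20):
    """Group de novo sites no more than `window` bases apart; keep groups >= 2."""
    groups, current = [], []
    for pos in sites:
        if current and pos - current[-1] > window:
            if len(current) >= 2:   # only clusters of two or more count
                groups.append(current)
            current = []
        current.append(pos)
    if len(current) >= 2:
        groups.append(current)
    return groups
```

An isolated de novo site is ignored; only groups of nearby sites, the signature attributed to radiation-induced DNA breaks, are reported.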
- Babylon 5 uploaded to YouTube for free viewing
Warner Bros. Discovery is uploading the science-fiction series Babylon 5 to YouTube, one episode per week, free for everyone to watch. The first episode of season one, "The Gathering", was uploaded on January 22 and currently has about 250,000 views; the second episode, "Midnight on the Firing Line", and the third, "Soul Hunter", have also been released. The weekly schedule mirrors the show's original broadcast timetable, letting viewers experience the story at the same pace. Babylon 5 premiered on February 22, 1993, and ran for five seasons totaling 110 episodes. The story is set in 2257-2262: the Earth Alliance, made up of Earth's nations, Mars, and colonies around Proxima Centauri, has made contact with other alien civilizations and acquired hyperspace technology for faster-than-light travel. Ten years before the story begins, Earth was nearly annihilated in an interstellar war with the Minbari, who suddenly surrendered on the eve of victory. To keep such a tragedy from recurring, the two sides established channels for peaceful exchange, and humanity built the Babylon 5 space station as a hub for diplomacy and trade. The station becomes the focal point of political intrigue, racial conflict, and a great war, while Earth cuts ties with its allies and slides toward fascism.
- Ars Technica AI reporter apologizes for AI-generated content
Last week the prominent tech site Ars Technica was found to have used AI-generated content as a source in an AI news story. Ars co-founder and editor-in-chief Ken Fisher published a public apology on Sunday, saying a review of recently published articles found no other pieces containing AI-generated content, so this appears to be an isolated incident. The story's co-author, Benj Edwards, Ars's senior AI reporter, explained that he had tried an experimental Claude Code-based AI tool to extract structured quotes from source material for his outline, but the AI refused to process it; he suspects this was because the article described a harassment incident (an AI harassing a human). He then pasted the text into ChatGPT and failed to notice that ChatGPT had produced paraphrases of the article author's words rather than verbatim quotes, and he did not verify the quotes against the original. An AI reporter tripped up by an AI hallucination: the irony is hard to miss.
- OpenClaw founder joins OpenAI
Peter Steinberger, founder of the open-source OpenClaw project, has announced he is joining OpenAI, with OpenClaw to be managed by a foundation. OpenClaw is an open-source autonomous AI assistant project, first published on GitHub in late 2025 under the name Clawdbot, later renamed Moltbot, and finally OpenClaw. In early 2026 the project drew attention for autonomously carrying out complex tasks across apps and online services on user instructions. OpenClaw can be deployed on local devices running macOS, Windows, and other systems; it can call other large AI models and APIs, and it receives text commands over messaging platforms such as WhatsApp, Telegram, Signal, and Discord to schedule appointments, send messages, organize files, write code, and more.
- Vim 9.2 released
The Vim text editor project released v9.2 on Valentine's Day. Major changes include: experimental Wayland support; support for the XDG Base Directory Specification, which stores configuration files, cached data, and user data in separate directories; modern defaults for HiDPI displays; new code-completion features; an improved diff mode; a new vertical tab panel; native dark-mode support in the Windows build; and more.
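The XDG Base Directory convention mentioned above is easy to sketch: each category of file falls back to a fixed default under $HOME when its environment variable is unset. The defaults below come from the freedesktop.org spec; the "vim/vimrc" relative path is an illustrative assumption about where an XDG-aware Vim would look, not a verified detail of Vim 9.2.

```python
# Sketch of XDG Base Directory resolution per the freedesktop.org spec.
# The "vim/vimrc" path used in the examples is an assumption for illustration.

import os.path

DEFAULTS = {
    "config": ".config",       # XDG_CONFIG_HOME default
    "cache": ".cache",         # XDG_CACHE_HOME default
    "data": ".local/share",    # XDG_DATA_HOME default
}

def xdg_path(env, kind, relative):
    """Resolve `relative` under the XDG base dir for `kind`, with spec defaults."""
    var = f"XDG_{kind.upper()}_HOME"
    base = env.get(var) or os.path.join(env["HOME"], DEFAULTS[kind])
    return os.path.join(base, relative)
```

This separation is the point of the change: configuration, cache, and user data no longer pile up in a single dotfile directory under $HOME.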
- Why Earth's warming is accelerating
An analysis of global mean surface temperatures from 1880-2025 shows that warming has accelerated over the past 30 years, reaching nearly 0.27°C per decade over the past 10 years. One explanation is the decline in aerosol pollution: aerosols reflect sunlight and have a cooling effect that offsets part of the warming from greenhouse gases, and over the past two decades many countries have cracked down on aerosol pollution, reducing that cooling. The researchers argue, however, that the record heat of recent years cannot be fully explained by aerosols and natural variability. They found that Earth's low-cloud cover has shrunk; low clouds reflect sunlight, so their decline has driven the accelerated warming. The reduction in low clouds is partly tied to aerosols, but it may also be a feedback loop driven by rising temperatures, since warmer air makes low clouds harder to form. If the current record heat is mainly due to aerosol changes, the accelerated warming would stop once aerosol pollution fell to zero and Earth would return to its earlier, slower rate of warming. But if it stems from the cloud feedback loop, the acceleration is likely to continue, bringing more severe heat waves, storms, and droughts.
- Telecom carriers blocked Telnet traffic ahead of a critical vulnerability disclosure
CVE-2026-24061, a critical Telnet vulnerability disclosed on January 20, sits in GNU InetUtils telnetd, is roughly 10 years old, scores 9.8/10 on CVSS, and makes it very easy for attackers to gain root. Yet a week before disclosure, global Telnet traffic fell off a cliff; telecom operators apparently received advance warning and acted to prevent exploitation. Data show Telnet session counts dropped 65% within one hour on January 14 and 83% within two hours. Average daily sessions fell 59%, from 914,000 on December 1 to about 373,000 on January 14. One or more North American Tier 1 transit providers filtered port 23, Telnet's default port. At 18 telecom operators, including BT, Cox Communications, and Vultr, Telnet session counts fell from hundreds of thousands to zero on January 15.
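Port-23 filtering of the kind described above is observable from the outside with a simple TCP reachability probe. The helper below is a generic sketch, not the measurement methodology behind the reported session counts; a filtered port typically times out, while a closed one refuses immediately.

```python
# Minimal TCP reachability probe for port 23 (Telnet's default port).
# Generic sketch only; not how the reported traffic data were collected.

import socket

def port_open(host, port=23, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or filtered
        return False
```

Run against a host behind a transit provider that filters port 23, such a probe would report False even when the host's telnetd is listening, which is exactly the effect the traffic data captured at scale.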
- EU moves to ban infinite scroll
The EU has made its first attempt to act against social-media addiction. Earlier this month, the EU preliminarily ruled that TikTok's addictive design features, including infinite scroll, autoplay, and its highly personalized recommendation system, violate the EU's Digital Services Act (DSA). It is demanding that TikTok disable infinite scroll, impose strict screen-break times, and modify its recommendation system. The EU's action against TikTok could set a new design standard and end the era of infinite scroll. TikTok may defend its design, but if it fails to satisfy the EU it faces a fine of up to 6% of its global annual revenue. This is the first time a regulator has tried to set legal standards for addictive design on social-media platforms. Meta's Facebook and Instagram are also under investigation over their addictive designs.
- Supreme People's Court: drivers remain liable after activating driver-assistance features
China's Supreme People's Court has for the first time issued guiding cases on criminal road-traffic-safety matters, stating that drivers remain responsible for traffic safety even after activating driver-assistance features. From the announcement: "Against the backdrop of increasingly widespread driver-assistance technology, some drivers stop concentrating on driving once the assistance system is activated, instead playing with their phones or sleeping; some even purchase and use illegal accessories such as 'smart-driving gadgets' to evade the system's safety monitoring and drive hands-off for extended periods, seriously threatening road traffic safety. Guiding Case No. 271, the dangerous driving case of Wang Mouqun, makes clear that an onboard driver-assistance system cannot replace the driver as the operating party: after activating driver assistance, the driver is still the person actually performing the driving task and bears responsibility for safe driving. Where a person activates driver-assistance features and uses privately installed accessories to evade the assistance system's monitoring, that person still bears legal responsibility as the driver even when not actually controlling the vehicle from the driver's seat."