OrangeBot.AI Digest — 2026-01-29

59 headlines from 4 sources, aggregated for the day.

Hacker News (15)

  1. PlayStation 2 Recompilation Project Is Absolutely Incredible (redgamingtech.com)
  2. County pays $600k to pentesters it arrested for assessing courthouse security (arstechnica.com)
  3. Tesla is committing automotive suicide (electrek.co)
  4. Drug trio found to block tumour resistance in pancreatic cancer (www.drugtargetreview.com)
  5. Project Genie: Experimenting with infinite, interactive worlds (blog.google)
  6. US cybersecurity chief leaked sensitive government files to ChatGPT: Report (www.dexerto.com)
  7. Benchmarking OpenTelemetry: Can AI trace your failed login? (quesma.com)
  8. Waymo robotaxi hits a child near an elementary school in Santa Monica (techcrunch.com)
  9. How to choose colors for your CLI applications (2023) (blog.xoria.org)
  10. Claude Code daily benchmarks for degradation tracking (marginlab.ai)
  11. A lot of population numbers are fake (davidoks.blog)
  12. TÜV Report 2026: Tesla Model Y has the worst reliability of all 2022–2023 cars (2025) (www.autoevolution.com)
  13. The tech market is fundamentally fucked up and AI is just a scapegoat (bayramovanar.substack.com)
  14. Vitamin D and Omega-3 have a larger effect on depression than antidepressants (blog.ncase.me)
  15. Europe’s next-generation weather satellite sends back first images (www.esa.int)

GitHub Trending (14)

  1. moltbot / moltbot

    Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

  2. asgeirtj / system_prompts_leaks

    Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini

  3. MoonshotAI / kimi-cli

    Kimi Code CLI is your next CLI agent.

  4. modelcontextprotocol / ext-apps

    Official repo for the spec & SDK of the MCP Apps protocol - a standard for UIs embedded in AI chatbots, served by MCP servers

  5. NevaMind-AI / memU

    Memory for 24/7 proactive agents like moltbot (clawdbot).

  6. hashicorp / vault

    A tool for secrets management, encryption as a service, and privileged access management

  7. badlogic / pi-mono

    AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods

  8. anomalyco / opencode-anthropic-auth
  9. protocolbuffers / protobuf

    Protocol Buffers - Google's data interchange format

  10. pedroslopez / whatsapp-web.js

    A WhatsApp client library for NodeJS that connects through the WhatsApp Web browser app

  11. TeamNewPipe / NewPipe

    A libre lightweight streaming front-end for Android.

  12. Shubhamsaboo / awesome-llm-apps

    Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.

  13. microsoft / playwright-cli

    CLI for common Playwright actions. Record and generate Playwright code, inspect selectors and take screenshots.

  14. lobehub / lobehub

    The ultimate space for work and life — to find, build, and collaborate with agent teammates that grow with you. We are taking agent harness to the next level — enabling multi-agent collaboration, effortless agent team design, and introducing agents as the unit of work interaction.

Hugging Face (15)

  1. Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation

    Reinforcement Learning with Verifiable Rewards (RLVR) offers a robust mechanism for enhancing mathematical reasoning in large models. However, we identify a systematic lack of emphasis on more challenging questions in existing methods from both algorithmic and data perspectives, despite their importance for refining underdeveloped capabilities. Algorithmically, widely used Group Relative Policy Optimization (GRPO) suffers from an implicit imbalance where the magnitude of policy updates is lower for harder questions. Data-wise, augmentation approaches primarily rephrase questions to enhance diversity without systematically increasing intrinsic difficulty. To address these issues, we propose a two-dual MathForge framework to improve mathematical reasoning by targeting harder questions from both perspectives, which comprises a Difficulty-Aware Group Policy Optimization (DGPO) algorithm and a Multi-Aspect Question Reformulation (MQR) strategy. Specifically, DGPO first rectifies the implicit imbalance in GRPO via difficulty-balanced group advantage estimation, and further prioritizes harder questions by difficulty-aware question-level weighting. Meanwhile, MQR reformulates questions across multiple aspects to increase difficulty while maintaining the original gold answer. Overall, MathForge forms a synergistic loop: MQR expands the data frontier, and DGPO effectively learns from the augmented data. Extensive experiments show that MathForge significantly outperforms existing methods on various mathematical reasoning tasks. The code and augmented data are all available at https://github.com/AMAP-ML/MathForge.
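The abstract's core algorithmic claim, that GRPO implicitly under-weights harder questions and that DGPO counteracts this with difficulty-aware weighting, can be sketched as a toy in a few lines. This is an illustrative reading, not the paper's exact formula: the binary 0/1 reward assumption and the `1 + difficulty` scaling are assumptions made here for clarity.

```python
# Toy sketch of difficulty-aware group advantages in the spirit of DGPO.
# The weighting scheme is illustrative, not the paper's formula.

def group_advantages(rewards, difficulty_power=1.0):
    """Per-rollout advantages for one question's rollout group,
    up-weighted by estimated difficulty (fraction of failed rollouts)."""
    n = len(rewards)
    mean = sum(rewards) / n
    # Mean-centered advantages, as in GRPO (std normalization omitted).
    adv = [r - mean for r in rewards]
    # Estimated difficulty: harder questions have fewer successful rollouts
    # (assumes binary 0/1 rewards).
    difficulty = 1.0 - mean
    weight = difficulty ** difficulty_power
    return [a * (1.0 + weight) for a in adv]
```

With this weighting, the single correct rollout on a mostly-failed (hard) question receives a larger advantage than a correct rollout on a mostly-solved (easy) one, which is the imbalance correction the abstract describes.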

  2. Advancing Open-source World Models

    We present LingBot-World, an open-sourced world simulator stemming from video generation. Positioned as a top-tier world model, LingBot-World offers the following features. (1) It maintains high fidelity and robust dynamics in a broad spectrum of environments, including realism, scientific contexts, cartoon styles, and beyond. (2) It enables a minute-level horizon while preserving contextual consistency over time, which is also known as "long-term memory". (3) It supports real-time interactivity, achieving a latency of under 1 second when producing 16 frames per second. We provide public access to the code and model in an effort to narrow the divide between open-source and closed-source technologies. We believe our release will empower the community with practical applications across areas like content creation, gaming, and robot learning.

  3. Innovator-VL: A Multimodal Large Language Model for Scientific Discovery

    We present Innovator-VL, a scientific multimodal large language model designed to advance understanding and reasoning across diverse scientific domains while maintaining excellent performance on general vision tasks. Contrary to the trend of relying on massive domain-specific pretraining and opaque pipelines, our work demonstrates that principled training design and transparent methodology can yield strong scientific intelligence with substantially reduced data requirements. (i) First, we provide a fully transparent, end-to-end reproducible training pipeline, covering data collection, cleaning, preprocessing, supervised fine-tuning, reinforcement learning, and evaluation, along with detailed optimization recipes. This facilitates systematic extension by the community. (ii) Second, Innovator-VL exhibits remarkable data efficiency, achieving competitive performance on various scientific tasks using fewer than five million curated samples without large-scale pretraining. These results highlight that effective reasoning can be achieved through principled data selection rather than indiscriminate scaling. (iii) Third, Innovator-VL demonstrates strong generalization, achieving competitive performance on general vision, multimodal reasoning, and scientific benchmarks. This indicates that scientific alignment can be integrated into a unified model without compromising general-purpose capabilities. Our practices suggest that efficient, reproducible, and high-performing scientific multimodal models can be built even without large-scale data, providing a practical foundation for future research.

  4. DeepSeek-OCR 2: Visual Causal Flow

    We present DeepSeek-OCR 2 to investigate the feasibility of a novel encoder-DeepEncoder V2-capable of dynamically reordering visual tokens upon image semantics. Conventional vision-language models (VLMs) invariably process visual tokens in a rigid raster-scan order (top-left to bottom-right) with fixed positional encoding when fed into LLMs. However, this contradicts human visual perception, which follows flexible yet semantically coherent scanning patterns driven by inherent logical structures. Particularly for images with complex layouts, human vision exhibits causally-informed sequential processing. Inspired by this cognitive mechanism, DeepEncoder V2 is designed to endow the encoder with causal reasoning capabilities, enabling it to intelligently reorder visual tokens prior to LLM-based content interpretation. This work explores a novel paradigm: whether 2D image understanding can be effectively achieved through two-cascaded 1D causal reasoning structures, thereby offering a new architectural approach with the potential to achieve genuine 2D reasoning. Codes and model weights are publicly accessible at http://github.com/deepseek-ai/DeepSeek-OCR-2.
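The ordering contrast the abstract draws, a rigid raster scan versus a semantically driven reordering of visual tokens, amounts to the following minimal sketch. The per-patch scoring function standing in for DeepEncoder V2's learned causal reordering is entirely hypothetical.

```python
# Minimal illustration of the token-ordering problem: the raster scan used by
# conventional VLMs vs. reordering patches by a (hypothetical) semantic score.

def raster_order(rows, cols):
    """Top-left to bottom-right order used by conventional VLMs."""
    return [(r, c) for r in range(rows) for c in range(cols)]

def semantic_order(rows, cols, score):
    """Reorder patch coordinates by a per-patch relevance score, highest first."""
    return sorted(raster_order(rows, cols), key=lambda rc: -score(*rc))
```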

  5. Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning

    Reinforcement learning has empowered large language models to act as intelligent agents, yet training them for long-horizon tasks remains challenging due to the scarcity of high-quality trajectories, especially under limited resources. Existing methods typically scale up rollout sizes and indiscriminately allocate computational resources among intermediate steps. Such attempts inherently waste substantial computation budget on trivial steps while failing to guarantee sample quality. To address this, we propose Spark (Strategic Policy-Aware exploRation via Key-state dynamic branching), a novel framework that selectively branches at critical decision states for resource-efficient exploration. Our key insight is to activate adaptive branching exploration at critical decision points to probe promising trajectories, thereby achieving precise resource allocation that prioritizes sampling quality over blind coverage. This design leverages the agent's intrinsic decision-making signals to reduce dependence on human priors, enabling the agent to autonomously expand exploration and achieve stronger generalization. Experiments across diverse tasks (e.g., embodied planning), demonstrate that Spark achieves superior success rates with significantly fewer training samples, exhibiting robust generalization even in unseen scenarios.

  6. Linear representations in language models can change dramatically over a conversation

    Language model representations often contain linear directions that correspond to high-level concepts. Here, we study the dynamics of these representations: how representations evolve along these dimensions within the context of (simulated) conversations. We find that linear representations can change dramatically over a conversation; for example, information that is represented as factual at the beginning of a conversation can be represented as non-factual at the end and vice versa. These changes are content-dependent; while representations of conversation-relevant information may change, generic information is generally preserved. These changes are robust even for dimensions that disentangle factuality from more superficial response patterns, and occur across different model families and layers of the model. These representation changes do not require on-policy conversations; even replaying a conversation script written by an entirely different model can produce similar changes. However, adaptation is much weaker from simply having a sci-fi story in context that is framed more explicitly as such. We also show that steering along a representational direction can have dramatically different effects at different points in a conversation. These results are consistent with the idea that representations may evolve in response to the model playing a particular role that is cued by a conversation. Our findings may pose challenges for interpretability and steering -- in particular, they imply that it may be misleading to use static interpretations of features or directions, or probes that assume a particular range of features consistently corresponds to a particular ground-truth value. However, these types of representational dynamics also point to exciting new research directions for understanding how models adapt to context.
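The measurement the abstract relies on, reading a concept off hidden states by projecting them onto a fixed linear direction at each conversation turn, can be sketched as follows. The vectors and the "factuality" direction here are made up for illustration; the point is only that the readout along a fixed direction can drift and even flip sign across turns.

```python
# Toy sketch of probing a linear representation: project each turn's hidden
# state onto a fixed concept direction and watch the readout change.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def readout(hidden_states, direction):
    """Projection of each turn's hidden state onto the unit concept direction."""
    norm = sum(x * x for x in direction) ** 0.5
    unit = [x / norm for x in direction]
    return [dot(h, unit) for h in hidden_states]
```

A sign flip in this readout over turns is the kind of "factual at the start, non-factual at the end" drift the paper reports, and is why a static interpretation of the direction can mislead.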

  7. VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning

    Despite the syntactic fluency of Large Language Models (LLMs), ensuring their logical correctness in high-stakes domains remains a fundamental challenge. We present a neurosymbolic framework that combines LLMs with SMT solvers to produce verification-guided answers through iterative refinement. Our approach decomposes LLM outputs into atomic claims, autoformalizes them into first-order logic, and verifies their logical consistency using automated theorem proving. We introduce three key innovations: (1) multi-model consensus via formal semantic equivalence checking to ensure logic-level alignment between candidates, eliminating the syntactic bias of surface-form metrics, (2) semantic routing that directs different claim types to appropriate verification strategies: symbolic solvers for logical claims and LLM ensembles for commonsense reasoning, and (3) precise logical error localization via Minimal Correction Subsets (MCS), which pinpoint the exact subset of claims to revise, transforming binary failure signals into actionable feedback. Our framework classifies claims by their logical status and aggregates multiple verification signals into a unified score with variance-based penalty. The system iteratively refines answers using structured feedback until acceptance criteria are met or convergence is achieved. This hybrid approach delivers formal guarantees where possible and consensus verification elsewhere, advancing trustworthy AI. With the GPT-OSS-120B model, VERGE demonstrates an average performance uplift of 18.7% at convergence across a set of reasoning benchmarks compared to single-pass approaches.

  8. Reinforcement Learning via Self-Distillation

    Large language models are increasingly post-trained with reinforcement learning in verifiable domains such as code and math. Yet, current methods for reinforcement learning with verifiable rewards (RLVR) learn only from a scalar outcome reward per attempt, creating a severe credit-assignment bottleneck. Many verifiable environments actually provide rich textual feedback, such as runtime errors or judge evaluations, that explain why an attempt failed. We formalize this setting as reinforcement learning with rich feedback and introduce Self-Distillation Policy Optimization (SDPO), which converts tokenized feedback into a dense learning signal without any external teacher or explicit reward model. SDPO treats the current model conditioned on feedback as a self-teacher and distills its feedback-informed next-token predictions back into the policy. In this way, SDPO leverages the model's ability to retrospectively identify its own mistakes in-context. Across scientific reasoning, tool use, and competitive programming on LiveCodeBench v6, SDPO improves sample efficiency and final accuracy over strong RLVR baselines. Notably, SDPO also outperforms baselines in standard RLVR environments that only return scalar feedback by using successful rollouts as implicit feedback for failed attempts. Finally, applying SDPO to individual questions at test time accelerates discovery on difficult binary-reward tasks, achieving the same discovery probability as best-of-k sampling or multi-turn conversations with 3x fewer attempts.
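The self-distillation idea can be sketched in a few lines: the same model, shown the verifier's textual feedback, acts as its own teacher, and the policy (without the feedback in context) is pulled toward the teacher's next-token distribution. `model` here is a hypothetical callable returning a probability distribution; the loss shape is an assumption, not the paper's exact objective.

```python
# Hedged sketch of SDPO-style self-distillation: distill the feedback-
# conditioned self-teacher's next-token distribution into the policy.
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) between two next-token distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def sdpo_loss(model, prompt, attempt, feedback):
    teacher = model(prompt + attempt + feedback)  # feedback-conditioned self-teacher
    student = model(prompt + attempt)             # policy without the feedback
    return kl(teacher, student)
```

When the feedback changes nothing about the model's predictions, the loss is zero; the dense signal comes precisely from the places where seeing the feedback shifts the model's own next-token beliefs.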

  9. SERA: Soft-Verified Efficient Repository Agents

    Open-weight coding agents should hold a fundamental advantage over closed-source systems: they can be specialized to private codebases, encoding repository-specific information directly in their weights. Yet the cost and complexity of training has kept this advantage theoretical. We show it is now practical. We present Soft-Verified Efficient Repository Agents (SERA), an efficient method for training coding agents that enables the rapid and cheap creation of agents specialized to private codebases. Using only supervised finetuning (SFT), SERA achieves state-of-the-art results among fully open-source (open data, method, code) models while matching the performance of frontier open-weight models like Devstral-Small-2. Creating SERA models is 26x cheaper than reinforcement learning and 57x cheaper than previous synthetic data methods to reach equivalent performance. Our method, Soft Verified Generation (SVG), generates thousands of trajectories from a single code repository. Combined with cost-efficiency, this enables specialization to private codebases. Beyond repository specialization, we apply SVG to a larger corpus of codebases, generating over 200,000 synthetic trajectories. We use this dataset to provide detailed analysis of scaling laws, ablations, and confounding factors for training coding agents. Overall, we believe our work will greatly accelerate research on open coding agents and showcase the advantage of open-source models that can specialize to private codebases. We release SERA as the first model in Ai2's Open Coding Agents series, along with all our code, data, and Claude Code integration to support the research community.

  10. OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution

    Graphical User Interface (GUI) agents show great potential for enabling foundation models to complete real-world tasks, revolutionizing human-computer interaction and improving human productivity. In this report, we present OmegaUse, a general-purpose GUI agent model for autonomous task execution on both mobile and desktop platforms, supporting computer-use and phone-use scenarios. Building an effective GUI agent model relies on two factors: (1) high-quality data and (2) effective training methods. To address these, we introduce a carefully engineered data-construction pipeline and a decoupled training paradigm. For data construction, we leverage rigorously curated open-source datasets and introduce a novel automated synthesis framework that integrates bottom-up autonomous exploration with top-down taxonomy-guided generation to create high-fidelity synthetic data. For training, to better leverage these data, we adopt a two-stage strategy: Supervised Fine-Tuning (SFT) to establish fundamental interaction syntax, followed by Group Relative Policy Optimization (GRPO) to improve spatial grounding and sequential planning. To balance computational efficiency with agentic reasoning capacity, OmegaUse is built on a Mixture-of-Experts (MoE) backbone. To evaluate cross-terminal capabilities in an offline setting, we introduce OS-Nav, a benchmark suite spanning multiple operating systems: ChiM-Nav, targeting Chinese Android mobile environments, and Ubu-Nav, focusing on routine desktop interactions on Ubuntu. Extensive experiments show that OmegaUse is highly competitive across established GUI benchmarks, achieving a state-of-the-art (SOTA) score of 96.3% on ScreenSpot-V2 and a leading 79.1% step success rate on AndroidControl. OmegaUse also performs strongly on OS-Nav, reaching 74.24% step success on ChiM-Nav and 55.9% average success on Ubu-Nav.

  11. UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders

    The space of task-agnostic feature upsampling has emerged as a promising area of research to efficiently create denser features from pre-trained visual backbones. These methods act as a shortcut to achieve dense features for a fraction of the cost by learning to map low-resolution features to high-resolution versions. While early works in this space used iterative upsampling approaches, more recent works have switched to cross-attention-based methods, which risk falling into the same efficiency scaling problems of the backbones they are upsampling. In this work, we demonstrate that iterative upsampling methods can still compete with cross-attention-based methods; moreover, they can achieve state-of-the-art performance with lower inference costs. We propose UPLiFT, an architecture for Universal Pixel-dense Lightweight Feature Transforms. We also propose an efficient Local Attender operator to overcome the limitations of prior iterative feature upsampling methods. This operator uses an alternative attentional pooling formulation defined fully locally. We show that our Local Attender allows UPLiFT to maintain stable features throughout upsampling, enabling state-of-the-art performance with lower inference costs than existing pixel-dense feature upsamplers. In addition, we apply UPLiFT to generative downstream tasks and show that it achieves competitive performance with state-of-the-art Coupled Flow Matching models for VAE feature upsampling. Altogether, UPLiFT offers a versatile and efficient approach to creating denser features.

  12. RIR-Mega-Speech: A Reverberant Speech Corpus with Comprehensive Acoustic Metadata and Reproducible Evaluation

    Despite decades of research on reverberant speech, comparing methods remains difficult because most corpora lack per-file acoustic annotations or provide limited documentation for reproduction. We present RIR-Mega-Speech, a corpus of approximately 117.5 hours created by convolving LibriSpeech utterances with roughly 5,000 simulated room impulse responses from the RIR-Mega collection. Every file includes RT60, direct-to-reverberant ratio (DRR), and clarity index (C_{50}) computed from the source RIR using clearly defined, reproducible procedures. We also provide scripts to rebuild the dataset and reproduce all evaluation results. Using Whisper small on 1,500 paired utterances, we measure 5.20% WER (95% CI: 4.69--5.78) on clean speech and 7.70% (7.04--8.35) on reverberant versions, corresponding to a paired increase of 2.50 percentage points (2.06--2.98). This represents a 48% relative degradation. WER increases monotonically with RT60 and decreases with DRR, consistent with prior perceptual studies. While the core finding that reverberation harms recognition is well established, we aim to provide the community with a standardized resource where acoustic conditions are transparent and results can be verified independently. The repository includes one-command rebuild instructions for both Windows and Linux environments.

  13. Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning

    Reinforcement Learning with Verifiable Rewards (RLVR) has substantially improved the reasoning abilities of large language models (LLMs), yet training often stalls as problems become saturated. We identify the core challenge as the poor accessibility of informative failures: learning signals exist but are rarely encountered during standard rollouts. To address this, we propose failure-prefix conditioning, a simple and effective method for learning from saturated problems. Rather than starting from the original question, our approach reallocates exploration by conditioning training on prefixes derived from rare incorrect reasoning trajectories, thereby exposing the model to failure-prone states. We observe that failure-prefix conditioning yields performance gains matching those of training on medium-difficulty problems, while preserving token efficiency. Furthermore, we analyze the model's robustness, finding that our method reduces performance degradation under misleading failure prefixes, albeit with a mild trade-off in adherence to correct early reasoning. Finally, we demonstrate that an iterative approach, which refreshes failure prefixes during training, unlocks additional gains after performance plateaus. Overall, our results suggest that failure-prefix conditioning offers an effective pathway to extend RLVR training on saturated problems.
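The mechanics of failure-prefix conditioning, restarting some rollouts from prefixes of rare incorrect trajectories instead of from the original question, can be sketched as below. The data structures and the fixed truncation fraction are illustrative assumptions, not the paper's procedure.

```python
# Toy sketch of failure-prefix conditioning: reallocate exploration toward
# failure-prone states by conditioning on prefixes of incorrect trajectories.
import random

def failure_prefixes(trajectories, frac=0.5):
    """Take the leading tokens of each incorrect trajectory as a new start state."""
    prefixes = []
    for tokens, correct in trajectories:
        if not correct:
            cut = max(1, int(len(tokens) * frac))
            prefixes.append(tokens[:cut])
    return prefixes

def sample_start(question, prefixes, p_prefix=0.7, rng=random):
    """Condition a training rollout on a failure prefix with probability p_prefix."""
    if prefixes and rng.random() < p_prefix:
        return question + rng.choice(prefixes)
    return question
```

The iterative variant the abstract mentions would simply recompute `failure_prefixes` from fresh rollouts as training progresses.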

  14. SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper

    Speaker-attributed automatic speech recognition (ASR) in multi-speaker environments remains a major challenge. While some approaches achieve strong performance when fine-tuned on specific domains, few systems generalize well across out-of-domain datasets. Our prior work, Diarization-Conditioned Whisper (DiCoW), leverages speaker diarization outputs as conditioning information and, with minimal fine-tuning, demonstrated strong multilingual and multi-domain performance. In this paper, we address a key limitation of DiCoW: ambiguity in Silence-Target-Non-target-Overlap (STNO) masks, where two or more fully overlapping speakers may have nearly identical conditioning despite differing transcriptions. We introduce SE-DiCoW (Self-Enrolled Diarization-Conditioned Whisper), which uses diarization output to locate an enrollment segment anywhere in the conversation where the target speaker is most active. This enrollment segment is used as fixed conditioning via cross-attention at each encoder layer. We further refine DiCoW with improved data segmentation, model initialization, and augmentation. Together, these advances yield substantial gains: SE-DiCoW reduces macro-averaged tcpWER by 52.4% relative to the original DiCoW on the EMMA MT-ASR benchmark.

  15. GDCNet: Generative Discrepancy Comparison Network for Multimodal Sarcasm Detection

    Multimodal sarcasm detection (MSD) aims to identify sarcasm within image-text pairs by modeling semantic incongruities across modalities. Existing methods often exploit cross-modal embedding misalignment to detect inconsistency but struggle when visual and textual content are loosely related or semantically indirect. While recent approaches leverage large language models (LLMs) to generate sarcastic cues, the inherent diversity and subjectivity of these generations often introduce noise. To address these limitations, we propose the Generative Discrepancy Comparison Network (GDCNet). This framework captures cross-modal conflicts by utilizing descriptive, factually grounded image captions generated by Multimodal LLMs (MLLMs) as stable semantic anchors. Specifically, GDCNet computes semantic and sentiment discrepancies between the generated objective description and the original text, alongside measuring visual-textual fidelity. These discrepancy features are then fused with visual and textual representations via a gated module to adaptively balance modality contributions. Extensive experiments on MSD benchmarks demonstrate GDCNet's superior accuracy and robustness, establishing a new state-of-the-art on the MMSD2.0 benchmark.

Solidot (15)

  1. Japan's first cloud-seeding experiment aimed at preventing heavy rain on land

    In January, a research team from Chiba University, the University of Toyama, and other institutions launched an experiment off the coast of Toyama Prefecture that aims to reduce heavy-rain disasters on land by artificially triggering rain or snow at sea. Cloud seeding has mainly been used to relieve drought; an experiment aimed at preventing heavy rain is a first. Flooding caused by heavy rain has been on the rise in recent years, and Chiba University professor Shunji Kotsuki said: "In the future, we hope to establish methods to control when and where rain clouds form." The experiment was run four times between the 7th and the 13th, targeting winter snow clouds over the Sea of Japan, which resemble the summer cumulonimbus clouds that trigger heavy-rain disasters but form at lower altitudes and are easier to predict. A small propeller aircraft dropped a total of about 30 kg of dry ice in several passes at roughly 3,000 meters over Toyama Bay, for up to two hours at a time, while sky and cloud conditions were monitored. The team now plans to analyze the collected data and identify suitable seeding methods.

  2. Webb produces the most detailed dark matter map yet

    Using ultra-high-resolution images from the James Webb Space Telescope, astronomers have for the first time built a wide-area, extremely high-resolution map of the universe's mass distribution, showing how dark matter and ordinary matter weave from the filaments surrounding galaxies into dense galaxy clusters. The resolution is more than double that of earlier maps, and the fainter objects captured let astronomers probe very early stages of cosmic evolution. Dark matter makes up about 85% of the matter in the universe; it neither emits nor absorbs light, so observing it directly is very difficult. Its gravity, however, distorts the images of distant background galaxies. By measuring tiny shear effects in the shapes of roughly 250,000 galaxy images, the team reconstructed the most detailed mass map ever made of a contiguous region and inferred where the dark matter lies. Compared with earlier, largely Hubble-based studies, Webb's images combine high resolution, high sensitivity, and a wide field of view, allowing the faint filaments and low-mass galaxy groups of the cosmic web, not just massive clusters, to be measured and mapped. The measurements are consistent with the standard cosmological model.

  3. Apple TV to adapt Brandon Sanderson's Cosmere series

    Apple TV has acquired the film and television rights to Brandon Sanderson's Cosmere series. Sanderson, one of today's most prolific and popular fantasy authors, negotiated a deal that gives him extensive control over the adaptations, including final approval. The Cosmere is a fictional universe in which the creator Adonalsium was murdered by conspirators; his power shattered into sixteen Shards that scattered across different worlds, spreading various kinds of magic throughout the universe. Apple's first planned adaptations include the Mistborn series and The Stormlight Archive.

  4. GNU C Library to migrate from Sourceware to the Linux Foundation-hosted CTI

    glibc maintainer Carlos O'Donell announced that the project's core services will migrate from Sourceware to the Core Toolchain Infrastructure (CTI) hosted by the Linux Foundation. The move is meant to meet the current and future needs of glibc and the GNU Toolchain: secure, robust, and sustainable infrastructure that supports developer and community collaboration and has reliable long-term funding.

  5. Linux kernel community drafts a plan for Linus Torvalds stepping down

    The Linux kernel community has formally drawn up a plan for finding a successor should Linus Torvalds eventually step down. Drafted by veteran kernel contributor Dan Williams, the plan was discussed at the recent Linux kernel Maintainers Summit in Tokyo. Rather than naming a specific successor, it lays out a process for selecting one or more maintainers to take over Linux in either a worst-case scenario or an orderly transition, including convening a meeting to weigh the options so as to best protect the long-term health of the project. One maintainer in Tokyo joked that, like the conclave that elects a new pope, the selection group should be locked in a room and release a puff of white smoke once a decision is made. The plan is meant to guard against the "bus factor" problem: the number of key members a project can lose (being "hit by a bus" stands in for career and lifestyle changes, marriage and children, accidents, or any other cause of absence) before it descends into chaos and cannot continue. Torvalds' central role in Linux today means the project's bus factor is 1. Next in line after Torvalds is stable-kernel maintainer Greg Kroah-Hartman. Asked about simply designating Greg KH as his successor, Torvalds replied: "The problem is that Greg hasn't always been Greg. Before him there were Andrew Morton and Alan Cox. After Greg there will be Shannon and Steve. The real issue is that you have to find a person or a group of people who can earn the community's trust, and trust comes from people having long enough to see how you work, but long enough doesn't mean thirty years."

  6. Battery-electric cars outsell petrol cars in Europe for the first time

    December 2025 sales data from the European Automobile Manufacturers' Association show that battery-electric cars outsold petrol cars in Europe for the first time. Battery-electric and petrol cars each accounted for 22.5% of sales, diesel 7%, hybrids 33%, and plug-in hybrids 10.7%. Among battery-electric brands, Tesla's sales continued to slide, with its market share eaten up by BYD and Volkswagen. The EU registered 320,812 new electric cars in December, up 46.1% from the same period in 2024, bringing the electric-car market share to 33.3%, a 9.2 percentage-point increase over the previous year.

  7. Vibe coding will kill open source

    Generative AI is reshaping software development. AI coding assistants such as Claude Code, Cursor, and Lovable let users turn their intent into working applications with almost no manual coding, a style of building software known as vibe coding. Vibe coding lowers the cost of software development, but it also changes how users interact with the software ecosystem. In the traditional model, developers choose open-source packages, read the documentation, and interact with maintainers and other users. Under vibe coding, AI agents select, combine, and modify packages directly, and human developers may not even know which upstream components were used. This raises a sustainability problem for open source: projects rely on user participation and interaction (documentation visits, bug reports, public Q&A, and reputation) to sustain maintenance and get paid. If AI replaces interaction between human users, the old open-source development model will change completely, and the availability and quality of open-source software will decline. Researchers at Central European University and Germany's Kiel Institute for the World Economy argue, in a paper posted on arXiv, that vibe coding will kill open source.

  8. How Anthropic built Claude

    According to court documents made public last week in the book authors' copyright suit against Anthropic, the company ran an operation code-named "Project Panama": buying physical books in bulk, cutting off the spines, scanning the pages to train its Claude chatbot, and then sending the remains to a recycling company. Anthropic spent tens of millions of dollars on the effort and hired Tom Turvey, a Google executive who worked on the Google Books project two decades ago. It bought books in bulk, tens of thousands at a time, from retailers including Better World Books and World of Books; supplier documents show Anthropic planned to scan 500,000 to 2 million books. Before Project Panama, Anthropic co-founder Ben Mann spent 11 days in June 2021 downloading books from the shadow library LibGen, sharing links to pirate-library mirror sites with colleagues and writing "this is awesome!!!" The filings also reveal that Meta employees, with Mark Zuckerberg's approval, downloaded books from pirate book torrent platforms, with one engineer remarking that "torrenting (pirated books) on a corporate laptop doesn't feel right." Anthropic settled the copyright case for $1.5 billion last August without admitting wrongdoing.

  9. OpenAI's head of science says large models aren't ready to make new discoveries

    Kevin Weil, OpenAI vice president and head of OpenAI for Science, acknowledged in an interview with MIT Technology Review that large models cannot yet produce genuinely new discoveries, saying that is not their current job. Their output recombines existing results, is frequently wrong, and does not propose fundamentally new approaches. Weil conceded that today's models fall short of the ideal, though they may eventually get there, and he is optimistic. Large models excel at unearthing forgotten solutions and finding connections across fields, and Weil said the bar for accelerating science does not require "completely reinventing an entire field the way Einstein did." GPT-5, he said, has read nearly every paper published in the past 30 years and aggregates analogies from unrelated disciplines. Consolidating existing knowledge, which helps scientists avoid wasting effort on already-solved problems, is itself a form of acceleration.

  10. French government to replace America's Teams and Zoom with a domestic platform

    The French government announced that it will replace the US platforms Microsoft Teams and Zoom with Visio, a domestically developed video-conferencing platform, with all government departments due to use it by 2027. The move is part of France's digital-sovereignty strategy: reducing dependence on foreign, especially American, software vendors and regaining control over critical digital infrastructure. Visio has been in testing for a year and has about 40,000 users.

  11. 67,800-year-old rock art found in Indonesia

    Hand stencils discovered in Sulawesi, Indonesia, date back at least 67,800 years and may be the oldest rock art yet found. The discovery strongly supports the theory that early humans migrated to Sahul (the ancient landmass connecting Australia and New Guinea) via a northern route through Sulawesi, and it offers valuable clues about the creativity and migration patterns of ancient humans. Indonesia is known for the world's earliest cave art. To learn more about the understudied parts of Sulawesi, an Australian-led team surveyed caves in the island's southeast, recording 44 sites, 14 of them previously unknown, and describing the rock-art motifs found in them. The researchers sampled the tiny calcium-carbonate deposits overlying and underlying the art and dated them using high-resolution laser-ablation uranium-series dating. The results show the art is at least 67,800 years old, about 1,100 years older than the oldest rock art previously known.

  12. Doomsday Clock set at 85 seconds to midnight

    The Bulletin of the Atomic Scientists has set the Doomsday Clock at 85 seconds to midnight, the closest it has come to the theoretical moment of annihilation since Cold War-era scientists created the clock in 1947 to gauge how close human civilization is to extinction. The Bulletin cited several factors behind the rising risk of catastrophe: aggressive behavior by the three major nuclear powers, a fragile nuclear arms-control framework, the ongoing conflicts in Ukraine and the Middle East, the unregulated integration of AI into military systems, and climate change.

  13. Microsoft misconfiguration rerouted example.com traffic to a Japanese company's domain

    Microsoft was found to be rerouting traffic for example.com, a domain reserved for testing, to sei.co.jp, the domain of Japan's Sumitomo Electric Industries. The misconfiguration has since been corrected, and Microsoft says it is investigating. example.com, together with example.net and example.org, is reserved for testing; the domains are required to resolve to IANA-designated IPs and should not be accessed by anyone. Yet devices in Azure and other Microsoft networks had been routing some example.com traffic to sei.co.jp subdomains, and setting up a test account test@example.com in Outlook automatically configured mail traffic to route to two sei.co.jp subdomains: imapgms.jnet.sei.co.jp and smtpgms.jnet.sei.co.jp. It remains unclear why Sumitomo Electric is involved. Tinyapps.org reported earlier this month that the misconfiguration had existed for five years.

  14. Over 10,000 STEM PhDs left the US government last year

    According to an analysis by the journal Science, 10,109 PhDs in STEM fields left the federal government last year amid the Trump administration's sharp cuts to the federal workforce. They accounted for only 3% of all departing federal employees but 14% of the government's STEM PhD workforce. The analysis found an 11:1 ratio of departures to new hires, for a net loss of 4,224 STEM PhDs. The CDC lost 519 STEM PhDs, 16% of whom had received layoff notices. Most federal agencies, however, did not lay off STEM PhD employees; most of those who left retired or resigned. More than 1,100 STEM PhDs left the NIH.

  15. Making gasoline from electricity and air is approaching reality

    US startup Aircela is preparing to launch a machine that makes gasoline from electricity and air. It works in three steps: capture carbon dioxide and water vapor from the air; electrolyze the water into hydrogen and oxygen, releasing the oxygen; then turn the remaining hydrogen and carbon dioxide into methanol via direct CO2 hydrogenation. Race cars can run on methanol but ordinary cars cannot, so the machine's final step converts the methanol into gasoline. The machine produces roughly one gallon of gasoline per day, and its tank holds up to 17 gallons, so it can keep an infrequent driver's car fueled. The target price is $15,000-20,000, which the company hopes will fall with mass production. The machine needs about twice as much electricity as the energy in the gasoline it makes: a gallon of gasoline contains about 37 kWh of energy but takes about 75 kWh of electricity to produce, so the machine makes the most sense paired with off-grid solar; otherwise it has little point.
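    The energy figures in the item can be checked back-of-the-envelope; the numbers below are the ones given in the article.

    ```python
    # Sanity check of the article's figures: ~37 kWh of energy per gallon of
    # gasoline, ~75 kWh of electricity consumed per gallon produced (roughly
    # 2x), and ~1 gallon/day filling the 17-gallon tank in about 17 days.

    GALLON_KWH = 37        # energy content of a gallon of gasoline
    INPUT_KWH = 75         # electricity consumed per gallon produced
    TANK_GALLONS = 17
    GALLONS_PER_DAY = 1

    ratio = INPUT_KWH / GALLON_KWH            # electricity in vs. fuel energy out
    days_to_fill = TANK_GALLONS / GALLONS_PER_DAY
    ```

    The roughly 2:1 input-to-output ratio is why the article concludes the machine only makes economic sense with cheap off-grid solar.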