OrangeBot.AI Digest — 2026-02-15

56 headlines across 8 sources, aggregated for the day.

Hacker News (15)

  1. I’m joining OpenAI (steipete.me)
  2. Modern CSS Code Snippets: Stop writing CSS like it's 2015 (modern-css.com)
  3. Editor's Note: Retraction of article containing fabricated quotations (arstechnica.com)
  4. Palantir Gets Millions of Dollars from New York City's Public Hospitals (theintercept.com)
  5. Palantir vs. the "Republik": US analytics firm takes magazine to court (www.heise.de)
  6. Gwtar: A static efficient single-file HTML format (gwern.net)
  7. LT6502: A 6502-based homebrew laptop (github.com)
  8. EU bans the destruction of unsold apparel, clothing, accessories and footwear (environment.ec.europa.eu)
  9. Hideki Sato, designer of all Sega's consoles, has died (www.videogameschronicle.com)
  10. How Is Data Stored? (www.makingsoftware.com)
  11. Amazon's Ring and Google's Nest reveal the severity of U.S. surveillance state (greenwald.substack.com)
  12. I fixed Windows native development (marler8997.github.io)
  13. Two different tricks for fast LLM inference (www.seangoedecke.com)
  14. A practical guide to observing the night sky for real skies and real equipment (stargazingbuddy.com)
  15. Oat – Ultra-lightweight, zero dependency, semantic HTML, CSS, JS UI library (oat.ink)

GitHub Trending (11)

  1. nautechsystems / nautilus_trader

    A high-performance algorithmic trading platform and event-driven backtester

  2. steipete / gogcli

    Google Suite CLI: Gmail, GCal, GDrive, GContacts.

  3. rowboatlabs / rowboat

    Open-source AI coworker, with memory

  4. github / gh-aw

    GitHub Agentic Workflows

  5. ChromeDevTools / chrome-devtools-mcp

    Chrome DevTools for coding agents

  6. alibaba / zvec

    A lightweight, lightning-fast, in-process vector database

  7. openclaw / openclaw

    Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

  8. moonshine-ai / moonshine

    Fast and accurate automatic speech recognition (ASR) for edge devices

  9. brave / brave-browser

    Brave browser for Android, iOS, Linux, macOS, Windows.

  10. SynkraAI / aios-core

    Synkra AIOS: AI-Orchestrated System for Full Stack Development - Core Framework v4.0

  11. ruvnet / wifi-densepose

    Production-ready implementation of InvisPose - a revolutionary WiFi-based dense human pose estimation system that enables real-time full-body tracking through walls using commodity mesh routers

Hugging Face (15)

  1. The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies

    The emergence of multi-agent systems built from large language models (LLMs) offers a promising paradigm for scalable collective intelligence and self-evolution. Ideally, such systems would achieve continuous self-improvement in a fully closed loop while maintaining robust safety alignment, a combination we term the self-evolution trilemma. However, we demonstrate both theoretically and empirically that an agent society satisfying continuous self-evolution, complete isolation, and safety invariance is impossible. Drawing on an information-theoretic framework, we formalize safety as the divergence degree from anthropic value distributions. We theoretically demonstrate that isolated self-evolution induces statistical blind spots, leading to the irreversible degradation of the system's safety alignment. Empirical and qualitative results from an open-ended agent community (Moltbook) and two closed self-evolving systems reveal phenomena that align with our theoretical prediction of inevitable safety erosion. We further propose several solution directions to alleviate the identified safety concern. Our work establishes a fundamental limit on self-evolving AI societies and shifts the discourse from symptom-driven safety patches to a principled understanding of intrinsic dynamical risks, highlighting the need for external oversight or novel safety-preserving mechanisms.

  2. Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models

    Large-scale verifiable prompts underpin the success of Reinforcement Learning with Verifiable Rewards (RLVR), but they contain many uninformative examples and are costly to expand further. Recent studies focus on better exploiting limited training data by prioritizing hard prompts whose rollout pass rate is 0. However, easy prompts with a pass rate of 1 also become increasingly prevalent as training progresses, thereby reducing the effective data size. To mitigate this, we propose Composition-RL, a simple yet useful approach for better utilizing limited verifiable prompts targeting pass-rate-1 prompts. More specifically, Composition-RL automatically composes multiple problems into a new verifiable question and uses these compositional prompts for RL training. Extensive experiments across model sizes from 4B to 30B show that Composition-RL consistently improves reasoning capability over RL trained on the original dataset. Performance can be further boosted with a curriculum variant of Composition-RL that gradually increases compositional depth over training. Additionally, Composition-RL enables more effective cross-domain RL by composing prompts drawn from different domains. Codes, datasets, and models are available at https://github.com/XinXU-USTC/Composition-RL.

  3. DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

    Current unified multimodal models for image generation and editing typically rely on massive parameter scales (e.g., >10B), entailing prohibitive training costs and deployment footprints. In this work, we present DeepGen 1.0, a lightweight 5B unified model that achieves comprehensive capabilities competitive with or surpassing much larger counterparts. To overcome the limitations of compact models in semantic understanding and fine-grained control, we introduce Stacked Channel Bridging (SCB), a deep alignment framework that extracts hierarchical features from multiple VLM layers and fuses them with learnable 'think tokens' to provide the generative backbone with structured, reasoning-rich guidance. We further design a data-centric training strategy spanning three progressive stages: (1) Alignment Pre-training on large-scale image-text pairs and editing triplets to synchronize VLM and DiT representations, (2) Joint Supervised Fine-tuning on a high-quality mixture of generation, editing, and reasoning tasks to foster omni-capabilities, and (3) Reinforcement Learning with MR-GRPO, which leverages a mixture of reward functions and supervision signals, resulting in substantial gains in generation quality and alignment with human preferences, while maintaining stable training progress and avoiding visual artifacts. Despite being trained on only ~50M samples, DeepGen 1.0 achieves leading performance across diverse benchmarks, surpassing the 80B HunyuanImage by 28% on WISE and the 27B Qwen-Image-Edit by 37% on UniREditBench. By open-sourcing our training code, weights, and datasets, we provide an efficient, high-performance alternative to democratize unified multimodal research.

  4. Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

    On-policy distillation (OPD), which aligns the student with the teacher's logit distribution on student-generated trajectories, has demonstrated strong empirical gains in improving student performance and often outperforms off-policy distillation and reinforcement learning (RL) paradigms. In this work, we first theoretically show that OPD is a special case of dense KL-constrained RL where the reward function and the KL regularization are always weighted equally and the reference model can be any model. Then, we propose the Generalized On-Policy Distillation (G-OPD) framework, which extends the standard OPD objective by introducing a flexible reference model and a reward scaling factor that controls the relative weight of the reward term against the KL regularization. Through comprehensive experiments on math reasoning and code generation tasks, we derive two novel insights: (1) Setting the reward scaling factor to be greater than 1 (i.e., reward extrapolation), which we term ExOPD, consistently improves over standard OPD across a range of teacher-student size pairings. In particular, in the setting where we merge the knowledge from different domain experts, obtained by applying domain-specific RL to the same student model, back into the original student, ExOPD enables the student to even surpass the teacher's performance boundary and outperform the domain teachers. (2) Building on ExOPD, we further find that in the strong-to-weak distillation setting (i.e., distilling a smaller student from a larger teacher), performing reward correction by choosing the reference model as the teacher's base model before RL yields a more accurate reward signal and further improves distillation performance. However, this choice assumes access to the teacher's pre-RL variant and incurs more computational overhead. We hope our work offers new insights for future research on OPD.

  5. GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

    Vision-language-action (VLA) models that directly predict multi-step action chunks from current observations face inherent limitations due to constrained scene understanding and weak future anticipation capabilities. In contrast, video world models pre-trained on web-scale video corpora exhibit robust spatiotemporal reasoning and accurate future prediction, making them a natural foundation for enhancing VLA learning. We therefore propose GigaBrain-0.5M*, a VLA model trained via world model-based reinforcement learning. It builds on GigaBrain-0.5, pre-trained on over 10,000 hours of robotic manipulation data, whose intermediate version currently ranks first on the international RoboChallenge benchmark. GigaBrain-0.5M* further integrates world model-based reinforcement learning via RAMP (Reinforcement leArning via world Model-conditioned Policy) to enable robust cross-task adaptation. Empirical results demonstrate that RAMP achieves substantial performance gains over the RECAP baseline, yielding improvements of approximately 30% on challenging tasks including Laundry Folding, Box Packing, and Espresso Preparation. Critically, GigaBrain-0.5M* exhibits reliable long-horizon execution, consistently accomplishing complex manipulation tasks without failure, as validated by real-world deployment videos on our project page: https://gigabrain05m.github.io

  6. MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models

    Discrete audio tokenizers are fundamental to empowering large language models with native audio processing and generation capabilities. Despite recent progress, existing approaches often rely on pretrained encoders, semantic distillation, or heterogeneous CNN-based architectures. These designs introduce fixed inductive biases that limit reconstruction fidelity and hinder effective scaling. In this paper, we argue that discrete audio tokenization should be learned fully end-to-end using a homogeneous and scalable architecture. To this end, we first propose CAT (Causal Audio Tokenizer with Transformer), a purely Transformer-based architecture that jointly optimizes the encoder, quantizer, and decoder from scratch for high-fidelity reconstruction. Building on the CAT architecture, we develop MOSS-Audio-Tokenizer, a large-scale audio tokenizer featuring 1.6 billion parameters, pre-trained on 3 million hours of diverse, general audio data. We show that this simple, fully end-to-end approach built from homogeneous, causal Transformer blocks scales gracefully and supports high-fidelity reconstruction across diverse audio domains. Across speech, sound, and music, MOSS-Audio-Tokenizer consistently outperforms prior codecs over a wide range of bitrates, while exhibiting predictable improvements with increased scale. Notably, leveraging the discrete tokens from our model, we develop the first purely autoregressive TTS model that surpasses prior non-autoregressive and cascaded systems. Furthermore, MOSS-Audio-Tokenizer enables competitive ASR performance without auxiliary encoders. Our findings position the CAT architecture as a unified, scalable interface for the next generation of native audio foundation models.

  7. NarraScore: Bridging Visual Narrative and Musical Dynamics via Hierarchical Affective Control

    Synthesizing coherent soundtracks for long-form videos remains a formidable challenge, currently stalled by three critical impediments: computational scalability, temporal coherence, and, most critically, a pervasive semantic blindness to evolving narrative logic. To bridge these gaps, we propose NarraScore, a hierarchical framework predicated on the core insight that emotion serves as a high-density compression of narrative logic. Uniquely, we repurpose frozen Vision-Language Models (VLMs) as continuous affective sensors, distilling high-dimensional visual streams into dense, narrative-aware Valence-Arousal trajectories. Mechanistically, NarraScore employs a Dual-Branch Injection strategy to reconcile global structure with local dynamism: a Global Semantic Anchor ensures stylistic stability, while a surgical Token-Level Affective Adapter modulates local tension via direct element-wise residual injection. This minimalist design bypasses the bottlenecks of dense attention and architectural cloning, effectively mitigating the overfitting risks associated with data scarcity. Experiments demonstrate that NarraScore achieves state-of-the-art consistency and narrative alignment with negligible computational overhead, establishing a fully autonomous paradigm for long-video soundtrack generation.

  8. Thinking with Drafting: Optical Decompression via Logical Reconstruction

    Existing multimodal large language models have achieved high-fidelity visual perception and exploratory visual generation. However, a precision paradox persists in complex reasoning tasks: optical perception systems transcribe symbols without capturing logical topology, while pixel-based generative models produce visual artifacts lacking mathematical exactness. To bridge this gap, we propose that reasoning over visual inputs be reconceptualized as optical decompression: the process of reconstructing latent logical structures from compressed visual tokens. Guided by the axiom that Parsing is Reasoning, we introduce Thinking with Drafting (TwD), which utilizes a minimalist Domain-Specific Language (DSL) as a grounding intermediate representation. Unlike standard approaches that hallucinate answers directly, TwD forces the model to draft its mental model into executable code, rendering deterministic visual proofs for self-verification. To validate this, we present VisAlg, a visual algebra benchmark. Experiments demonstrate that TwD serves as a superior cognitive scaffold. Our work establishes a closed-loop system where visual generation acts not as a creative output but as a logical verifier, offering a generalizable path for visual reasoning.

  9. LawThinker: A Deep Research Legal Agent in Dynamic Environments

    Legal reasoning requires not only correct outcomes but also procedurally compliant reasoning processes. However, existing methods lack mechanisms to verify intermediate reasoning steps, allowing errors such as inapplicable statute citations to propagate undetected through the reasoning chain. To address this, we propose LawThinker, an autonomous legal research agent that adopts an Explore-Verify-Memorize strategy for dynamic judicial environments. The core idea is to enforce verification as an atomic operation after every knowledge exploration step. A DeepVerifier module examines each retrieval result along three dimensions: knowledge accuracy, fact-law relevance, and procedural compliance, with a memory module for cross-round knowledge reuse in long-horizon tasks. Experiments on the dynamic benchmark J1-EVAL show that LawThinker achieves a 24% improvement over direct reasoning and an 11% gain over workflow-based methods, with particularly strong improvements on process-oriented metrics. Evaluations on three static benchmarks further confirm its generalization capability. The code is available at https://github.com/yxy-919/LawThinker-agent.

  10. Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching

    Visual illusions traditionally rely on spatial manipulations such as multi-view consistency. In this work, we introduce Progressive Semantic Illusions, a novel vector sketching task where a single sketch undergoes a dramatic semantic transformation through the sequential addition of strokes. We present Stroke of Surprise, a generative framework that optimizes vector strokes to satisfy distinct semantic interpretations at different drawing stages. The core challenge lies in the "dual-constraint": initial prefix strokes must form a coherent object (e.g., a duck) while simultaneously serving as the structural foundation for a second concept (e.g., a sheep) upon adding delta strokes. To address this, we propose a sequence-aware joint optimization framework driven by a dual-branch Score Distillation Sampling (SDS) mechanism. Unlike sequential approaches that freeze the initial state, our method dynamically adjusts prefix strokes to discover a "common structural subspace" valid for both targets. Furthermore, we introduce a novel Overlay Loss that enforces spatial complementarity, ensuring structural integration rather than occlusion. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art baselines in recognizability and illusion strength, successfully expanding visual anagrams from the spatial to the temporal dimension. Project page: https://stroke-of-surprise.github.io/

  11. RISE: Self-Improving Robot Policy with Compositional World Model

    Despite the sustained scaling on model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviations can compound into failures. While reinforcement learning (RL) offers a principled path to robustness, on-policy RL in the physical world is constrained by safety risk, hardware cost, and environment reset. To bridge this gap, we present RISE, a scalable framework of robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view futures via a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages for policy improvement. Such compositional design allows state and value to be tailored by best-suited yet distinct architectures and objectives. These components are integrated into a closed-loop self-improving pipeline that continuously generates imaginary rollouts, estimates advantages, and updates the policy in imaginary space without costly physical interaction. Across three challenging real-world tasks, RISE yields significant improvement over prior art, with absolute performance increases of more than +35% in dynamic brick sorting, +45% in backpack packing, and +35% in box closing.

  12. Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning

    Achieving effective test-time scaling requires models to engage in In-Context Exploration: the intrinsic ability to generate, verify, and refine multiple reasoning hypotheses within a single continuous context. Grounded in State Coverage theory, our analysis identifies a critical bottleneck to enabling this capability: while broader state coverage requires longer reasoning trajectories, the probability of sampling such sequences decays exponentially during autoregressive generation, a phenomenon we term the "Shallow Exploration Trap". To bridge this gap, we propose Length-Incentivized Exploration, a simple yet effective recipe that explicitly encourages models to explore more via a length-based reward coupled with a redundancy penalty, thereby maximizing state coverage in a two-step manner. Comprehensive experiments across different models (Qwen3, Llama) demonstrate that the method effectively incentivizes in-context exploration. As a result, it achieves an average improvement of 4.4% on in-domain tasks and a 2.7% gain on out-of-domain benchmarks.

  13. χ0: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies

    High-reliability long-horizon robotic manipulation has traditionally relied on large-scale data and compute to understand complex real-world dynamics. However, we identify that the primary bottleneck to real-world robustness is not resource scale alone, but the distributional shift among the human demonstration distribution, the inductive bias learned by the policy, and the test-time execution distribution, a systematic inconsistency that causes compounding errors in multi-stage tasks. To mitigate these inconsistencies, we propose χ0, a resource-efficient framework with effective modules designed to achieve production-level robustness in robotic manipulation. Our approach builds on three technical pillars: (i) Model Arithmetic, a weight-space merging strategy that efficiently absorbs the diverse distributions of different demonstrations, varying from object appearance to state variations; (ii) Stage Advantage, a stage-aware advantage estimator that provides stable, dense progress signals, overcoming the numerical instability of prior non-stage approaches; and (iii) Train-Deploy Alignment, which bridges the distribution gap via spatio-temporal augmentation, heuristic DAgger corrections, and temporal chunk-wise smoothing. χ0 enables two sets of dual-arm robots to collaboratively orchestrate long-horizon garment manipulation, spanning tasks from flattening and folding to hanging different clothes. The method exhibits high-reliability autonomy: the system can run from an arbitrary initial state for 24 consecutive hours non-stop. Experiments validate that χ0 surpasses the state-of-the-art π0.5 in success rate by nearly 250%, with only 20 hours of data and 8 A100 GPUs. Code, data, and models will be released to facilitate the community.

  14. EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration

    Human demonstrations offer rich environmental diversity and scale naturally, making them an appealing alternative to robot teleoperation. While this paradigm has advanced robot-arm manipulation, its potential for the more challenging, data-hungry problem of humanoid loco-manipulation remains largely unexplored. We present EgoHumanoid, the first framework to co-train a vision-language-action policy using abundant egocentric human demonstrations together with a limited amount of robot data, enabling humanoids to perform loco-manipulation across diverse real-world environments. To bridge the embodiment gap between humans and robots, including discrepancies in physical morphology and viewpoint, we introduce a systematic alignment pipeline spanning from hardware design to data processing. A portable system for scalable human data collection is developed, and we establish practical collection protocols to improve transferability. At the core of our human-to-humanoid alignment pipeline lie two key components. The view alignment reduces visual domain discrepancies caused by camera height and perspective variation. The action alignment maps human motions into a unified, kinematically feasible action space for humanoid control. Extensive real-world experiments demonstrate that incorporating robot-free egocentric data significantly outperforms robot-only baselines by 51%, particularly in unseen environments. Our analysis further reveals which behaviors transfer effectively and the potential for scaling human data.

  15. dVoting: Fast Voting for dLLMs

    Diffusion Large Language Models (dLLMs) represent a new paradigm beyond autoregressive modeling, offering competitive performance while naturally enabling a flexible decoding process. Specifically, dLLMs can generate tokens at arbitrary positions in parallel, endowing them with significant potential for parallel test-time scaling, which was previously constrained by severe inefficiency in autoregressive modeling. In this work, we introduce dVoting, a fast voting technique that boosts reasoning capability without training, with only an acceptable extra computational overhead. dVoting is motivated by the observation that, across multiple samples for the same prompt, token predictions remain largely consistent, whereas performance is determined by a small subset of tokens exhibiting cross-sample variability. Leveraging the arbitrary-position generation capability of dLLMs, dVoting performs iterative refinement by sampling, identifying uncertain tokens via consistency analysis, regenerating them through voting, and repeating this process until convergence. Extensive evaluations demonstrate that dVoting consistently improves performance across various benchmarks. It achieves gains of 6.22%-7.66% on GSM8K, 4.40%-7.20% on MATH500, 3.16%-14.84% on ARC-C, and 4.83%-5.74% on MMLU. Our code is available at https://github.com/fscdc/dVoting
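    dVoting's loop as described above (sample several completions of the same prompt, find the tokens that vary across samples, resolve them, repeat) can be caricatured with plain position-wise majority voting over aligned token sequences. A toy sketch, with made-up token lists; the actual method regenerates uncertain tokens with the dLLM rather than voting on them directly:

```python
from collections import Counter

def vote_refine(samples):
    # samples: equal-length token lists drawn for the same prompt.
    # Tokens that agree across samples are kept as-is; positions that
    # vary are treated as uncertain and settled by majority vote here.
    out = []
    for i in range(len(samples[0])):
        column = [s[i] for s in samples]
        token, _count = Counter(column).most_common(1)[0]
        out.append(token)
    return out

samples = [
    ["The", "answer", "is", "42"],
    ["The", "answer", "is", "41"],
    ["The", "answer", "is", "42"],
]
print(vote_refine(samples))  # ['The', 'answer', 'is', '42']
```

    The observation the paper leans on is visible even in this caricature: most positions are identical across samples, so only the small disagreeing subset needs any extra work.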

Solidot (15)

  1. Telecom operators blocked Telnet traffic ahead of a critical vulnerability disclosure

    The critical Telnet vulnerability CVE-2026-24061, disclosed on January 20, sits in GNU InetUtils telnetd, is 10 years old, and scores 9.8/10 on CVSS; it lets attackers gain root with ease. Yet a week before disclosure, global Telnet traffic fell off a cliff. Telecom operators apparently received advance warning of the vulnerability and acted early to prevent exploitation. The data show that on January 14, Telnet session counts dropped 65% within one hour and 83% within two hours. Average daily sessions fell from 914,000 on December 1 to roughly 373,000 on January 14, a 59% decline. One or more North American Tier 1 transit providers filtered port 23, Telnet's default. At 18 operators, including BT, Cox Communications, and Vultr, Telnet session counts dropped from hundreds of thousands to zero on January 15.

  2. EU moves to ban infinite scroll

    In its first action against social media addiction, the EU issued a preliminary ruling earlier this month that TikTok's addictive design, including infinite scroll, autoplay, and a highly personalized recommendation system, violates the EU's Digital Services Act (DSA). The ruling requires TikTok to disable infinite scroll, enforce strict screen-break times, and modify its recommendation system. The action against TikTok could set a new design standard and end the era of infinite scroll. TikTok may defend its design; if it fails to satisfy the EU, it faces a fine of 6% of its global annual revenue. This is the first attempt by regulators to set legal standards for addictive design on social media platforms. Meta's Facebook and Instagram are also under investigation for addictive design.

  3. China's Supreme People's Court: drivers remain liable after activating driver-assist features

    China's Supreme People's Court released its first batch of guiding criminal cases on road traffic safety, stating that drivers remain responsible for traffic safety after activating driver-assist features. The cases note: "As driver-assist technology becomes widespread, some drivers stop paying attention once the system is activated, playing with their phones or sleeping; some even buy and use illegal accessories ('smart-driving gadgets') to evade the system's safety monitoring and drive hands-off for long stretches, seriously endangering road traffic safety." Guiding Case No. 271, the dangerous-driving case of Wang Mouqun, makes clear that an onboard driver-assist system cannot replace the driver as the operating party: after activating driver assist, the driver is still the person actually performing the driving task and bears responsibility for safe operation. Anyone who activates driver assist and uses privately installed accessories to evade the system's monitoring remains legally liable as the driver, even if not actually operating the vehicle from the driver's seat.

  4. Waymo pays DoorDash gig workers to close its robotaxis' doors

    Waymo's robotaxis carry passengers without a driver in six cities, but if a passenger leaves a door open after getting out, the vehicle is immobilized. To solve this, Waymo is piloting a program in Atlanta that pays DoorDash gig drivers to drive over and shut the doors. A DoorDash driver posted on Reddit about driving less than a mile to a Waymo vehicle to close its door; the task was offered at $6.25, with an extra $5 on confirmed completion. Waymo and DoorDash confirmed the post's authenticity; the partnership began earlier this year. Waymo also works with Honk, a Los Angeles towing-service app, on the same problem. Future Waymo vehicles will come with self-closing doors.

  5. OpenAI again says DeepSeek trained its models via distillation

    According to a memo OpenAI submitted on Thursday to the US House Select Committee on China, the company again warned that its Chinese competitor DeepSeek trained its models using distillation, in which one AI model is trained on another model's outputs. OpenAI says DeepSeek has been free-riding on its technology. OpenAI made similar comments after DeepSeek released its R1 model early last year.
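    The distillation described above, one model training on another's outputs, is typically implemented as a cross-entropy (or KL) loss between the student's predictions and the teacher's softened output distribution. A minimal sketch with made-up logits, not any particular lab's pipeline:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy of the student against the teacher's soft labels.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

teacher = [4.0, 1.0, 0.5]
mimic = [3.9, 1.1, 0.4]    # student that tracks the teacher
naive = [0.0, 0.0, 0.0]    # uninformed student (uniform output)
# A student matching the teacher's distribution incurs a lower loss,
# which is what gradient descent on this objective drives toward.
assert distill_loss(mimic, teacher) < distill_loss(naive, teacher)
```

    In practice this loss is computed over a teacher's sampled outputs or logged probabilities, which is why API outputs alone suffice as training signal.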

  6. An Ars Technica AI story contained AI-fabricated quotes

    An OpenClaw AI agent named MJ Rathbun submitted a pull request to matplotlib, the Python charting library; when maintainer Scott Shambaugh rejected it, the agent published an "angry" blog post attacking him. A human was evidently pulling the strings (the agent did not suddenly become conscious), but so far no one has publicly claimed it. The incident drew wide attention in the open-source community. Ars Technica published "After a routine code rejection, an AI agent published a hit piece on someone by name," quoting Shambaugh. Ironically, the quoted comment was apparently AI-fabricated, an AI hallucination that never existed, and neither the author nor the editors verified it. The episode drew criticism of Ars, a 28-year-old tech outlet whose content policy such AI-fabricated material plainly violates. The article has been retracted, and Ars says it is investigating; because of the holiday, results are expected next week.

  7. NCAR's weather-forecasting supercomputer handed to a third party

    The NYT reports that the US National Science Foundation (NSF) announced on Thursday that management of the National Center for Atmospheric Research (NCAR) supercomputer will be transferred to a third party. The machine runs weather models that power forecasts and disaster warnings. NSF gave no further details. The move has alarmed climate scientists, who fear NCAR may be broken up and that they may lose access to the supercomputer for running weather models. NCAR's supercomputer, Derecho, ranks 160th on the November 2025 Top500 list; built by HPE, it uses AMD EPYC 7763 64C processors and 328 NVIDIA GPUs, with a theoretical peak of 19.87 PFLOPS. The Trump administration announced plans last December to dissolve NCAR; Office of Management and Budget director Russell Vought called the center "one of the largest sources of climate alarmism in America" and said the federal government would "break up" the agency.

  8. China successfully tests a new reusable rocket

    China's Manned Space Agency announced that on February 11 it successfully completed a low-altitude demonstration flight of the Long March 10 launch vehicle and a maximum-dynamic-pressure abort test of the Mengzhou crew spacecraft. After separation, the rocket's first stage relit its engines, decelerated, and touched down softly near a waiting recovery barge; it was recovered on February 13, China's first at-sea search and recovery of a launch vehicle stage and a major step toward rocket reusability. The Long March 10 is intended chiefly for crewed lunar missions; this test flew a scaled-down version. Mengzhou will replace the Shenzhou spacecraft currently in service. Both the Long March 10 first stage and Mengzhou are designed for repeated reuse.

  9. Scientists warn Earth is nearing climate tipping points

    Scientists warn that Earth is approaching climate tipping points beyond which warming could spiral out of control, plunging the world into a hellish hothouse climate utterly unlike the mild conditions human civilization has enjoyed for the past 11,000 years. Temperatures have risen only 1.3 degrees Celsius in recent years, yet extreme weather is already claiming lives and destroying livelihoods worldwide. At 3-4 degrees of warming, economies and societies could no longer function as we know them. When a tipping point will be triggered is hard to predict, the scientists say; what matters most is precaution: cutting fossil fuel consumption sharply.

  10. When superintelligence becomes a faith, we need to talk about pace

    Nala Ginrut writes: In today's technology discourse, spend long enough reading the conversations coming out of Silicon Valley and you quickly notice a near-monolithic ordering of values: bigger scale, more compute, more general models, faster iteration. But shift your gaze from the launch-event spotlights to the ground of real society, and another question grows ever more pressing: if intelligence really does cross a critical threshold rapidly, is our social structure ready?

  11. Measles makes a comeback

    In many countries measles had become so rare that some doctors had never seen a case, but that is changing. The US reported more than 2,000 cases last year, a 30-year record, and 2026 may exceed 2025. In January, six countries including the UK, Spain, and Austria lost their official measles-free certification. Canada lost its measles-free status last November, and the US is expected to follow this April. Measles is extraordinarily contagious; it causes fever, cough, and rash, and can be fatal. In a fully susceptible population, each patient infects 12-18 people on average, and up to 90% of unimmunized people exposed to a case will contract the disease. Fortunately the vaccine is highly effective: one dose confers immunity in 93% of recipients, and two doses raise protection to 97%, lasting a lifetime for most people. When 92%-94% of a population is immune through vaccination or prior infection, the virus can no longer spread, a phenomenon known as herd immunity. In the US, however, kindergarten vaccination rates fell from 95.2% in the 2019-2020 school year to 92.5% in 2024-2025, opening the door to outbreaks.
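    The 92%-94% immunity level quoted above follows from the standard herd-immunity threshold, 1 - 1/R0, evaluated at the reported R0 of 12-18 infections per case (the exact endpoints are 91.7% and 94.4%):

```python
# Herd-immunity threshold: the immune fraction needed so that each
# case infects fewer than one susceptible person on average.
def herd_threshold(r0: float) -> float:
    return 1 - 1 / r0

for r0 in (12, 18):
    print(f"R0={r0}: threshold={herd_threshold(r0):.1%}")
# R0=12: threshold=91.7%
# R0=18: threshold=94.4%
```

    This also shows why the drop in US kindergarten coverage matters: 92.5% sits inside the threshold band, below what the more contagious end of the R0 range requires.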

  12. First observation of a star collapsing directly into a black hole

    Astronomers have for the first time fully observed a massive star reach the end of its life and collapse directly into a black hole without a supernova explosion. The star, M31-2014-DS1, lies in the Andromeda galaxy about 2.5 million light-years from Earth. A Caltech team analyzed observations from 2005-2023 taken by NASA's NEOWISE mission and other ground and space telescopes, finding that the star's infrared emission brightened anomalously starting in 2014 and then plunged in 2016, with the entire dimming taking less than a year. By 2022-2023 the star was essentially invisible at visible and near-infrared wavelengths, at one ten-thousandth of its former brightness, with only a faint signal remaining in the thermally brighter mid-infrared, itself down to a tenth of its previous level. The team argues that the sudden dimming and eventual disappearance strongly indicate the star's core underwent gravitational collapse and formed a black hole. Normally, when a massive star exhausts its nuclear fuel, its core first collapses into a neutron star, and a neutrino burst drives an outward shock that triggers a supernova. Theory predicts, however, that if the shock fails to eject the outer layers, the material falls back onto the neutron star, collapsing it further into a black hole. This observation provides the first direct evidence of that process.

  13. The Yangtze fishing ban is easing ecosystem decline

    According to a study published in Science, researchers report that after a comprehensive ten-year ban on commercial fishing, the Yangtze, whose ecosystem had been deteriorating for decades, is showing early signs of recovery. Since the 1950s, China's rapid economic development has driven a severe decline in freshwater biodiversity in its largest and longest river, caused mainly by decades of overfishing and habitat degradation. Despite heavy government investment in ecological protection and water-quality improvement, biodiversity kept falling, raising doubts about the effectiveness of conventional restoration measures. In response, China imposed an unprecedented ten-year fishing ban across the entire Yangtze basin in 2021, backed by strict enforcement and coordinated environmental management. Analyzing data from 2018 to 2023, the researchers assessed fish communities in the Yangtze before and after the ban. The results show early signs of ecological recovery: fish biomass has more than doubled, and species richness has risen modestly. Larger, higher-trophic-level species have recovered most markedly, in both abundance and condition, compared with before the ban. Populations of several endangered and migratory species, along with the critically endangered Yangtze finless porpoise, are also rebounding.

  14. Linux Mint considers a longer development cycle

    Linux Mint, the Ubuntu-based distribution, is considering slowing its release cycle. Ubuntu ships a new version every six months, and Mint's cadence is similar. Project lead Clem Lefebvre notes that with a new release every six months, plus LMDE on top, the team spends far more time testing, fixing bugs, and releasing than developing. Linux Mint is considering changing this with a longer development cycle and will share more details in the future.

  15. ICE deploys anti-drone laser; FAA abruptly closes airspace

    On Tuesday night the FAA (Federal Aviation Administration) abruptly closed the airspace around El Paso, Texas for 10 days, only to abruptly lift the closure on Wednesday morning. Trump administration officials claimed the move was a response to a sudden incursion of Mexican drug cartel drones. But people familiar with the matter say the real reason was that ICE had deployed an anti-drone laser borrowed from the Defense Department without giving aviation officials enough time to assess the risk to commercial aircraft. ICE used the laser to hit a target it believed was a cartel drone; it turned out to be a balloon.