DIGEST · 2026-02-08

OrangeBot.AI Digest — 2026-02-08

57 headlines across 8 sources, aggregated for this day.

Hacker News(15)

  1. Bun v1.3.9 (bun.com)
  2. Vouch (github.com)
  3. I put a real-time 3D shader on the Game Boy Color (blog.otterstack.com)
  4. Billing can be bypassed using a combo of subagents with an agent definition (github.com)
  5. Omega-3 is inversely related to risk of early-onset dementia (pubmed.ncbi.nlm.nih.gov)
  6. I am happier writing code by hand (www.abhinavomprakash.com)
  7. GitHub Agentic Workflows (github.github.io)
  8. AI fatigue is real and nobody talks about it (siddhantkhare.com)
  9. Show HN: It took 4 years to sell my startup. I wrote a book about it (derekyan.com)
  10. Curating a Show on My Ineffable Mother, Ursula K. Le Guin (hyperallergic.com)
  11. Why E cores make Apple silicon fast (eclecticlight.co)
  12. Dave Farber has died (lists.nanog.org)
  13. Slop Terrifies Me (ezhik.jp)
  14. Matchlock – Secures AI agent workloads with a Linux-based sandbox (github.com)
  15. DoNotNotify is now Open Source (donotnotify.com)

GitHub Trending(12)

  1. KeygraphHQ / shannon

    Fully autonomous AI hacker to find actual exploits in your web apps. Shannon has achieved a 96.15% success rate on the hint-free, source-aware XBOW Benchmark.

  2. pydantic / monty

    A minimal, secure Python interpreter written in Rust for use by AI

  3. openai / skills

    Skills Catalog for Codex

  4. virattt / dexter

    An autonomous agent for deep financial research

  5. microsoft / litebox

    A security-focused library OS supporting kernel- and user-mode execution

  6. google / langextract

    A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

  7. obra / superpowers

    An agentic skills framework & software development methodology that works.

  8. OpenBMB / MiniCPM-o

    A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone

  9. likec4 / likec4

    Visualize, collaborate, and evolve the software architecture with always actual and live diagrams from your code

  10. iOfficeAI / AionUi

    Free, local, open-source 24/7 Cowork and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you like it!

  11. home-assistant / addons

    ➕ Docker add-ons for Home Assistant

  12. gitbutlerapp / gitbutler

    The GitButler version control client, backed by Git, powered by Tauri/Rust/Svelte

Hugging Face(15)

  1. CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

    Existing benchmarks for Large Language Model (LLM) agents focus on task completion under idealistic settings but overlook reliability in real-world, user-facing applications. In domains, such as in-car voice assistants, users often issue incomplete or ambiguous requests, creating intrinsic uncertainty that agents must manage through dialogue, tool use, and policy adherence. We introduce CAR-bench, a benchmark for evaluating consistency, uncertainty handling, and capability awareness in multi-turn, tool-using LLM agents in an in-car assistant domain. The environment features an LLM-simulated user, domain policies, and 58 interconnected tools spanning navigation, productivity, charging, and vehicle control. Beyond standard task completion, CAR-bench introduces Hallucination tasks that test agents' limit-awareness under missing tools or information, and Disambiguation tasks that require resolving uncertainty through clarification or internal information gathering. Baseline results reveal large gaps between occasional and consistent success on all task types. Even frontier reasoning LLMs achieve less than 50% consistent pass rate on Disambiguation tasks due to premature actions, and frequently violate policies or fabricate information to satisfy user requests in Hallucination tasks, underscoring the need for more reliable and self-aware LLM agents in real-world settings.

  2. Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

    As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security challenges. Most existing agent defense mechanisms adopt a mandatory checking paradigm, in which security validation is forcibly triggered at predefined stages of the agent lifecycle. In this work, we argue that effective agent security should be intrinsic and selective rather than architecturally decoupled and mandatory. We propose Spider-Sense framework, an event-driven defense framework based on Intrinsic Risk Sensing (IRS), which allows agents to maintain latent vigilance and trigger defenses only upon risk perception. Once triggered, the Spider-Sense invokes a hierarchical defence mechanism that trades off efficiency and precision: it resolves known patterns via lightweight similarity matching while escalating ambiguous cases to deep internal reasoning, thereby eliminating reliance on external models. To facilitate rigorous evaluation, we introduce S^2Bench, a lifecycle-aware benchmark featuring realistic tool execution and multi-stage attacks. Extensive experiments demonstrate that Spider-Sense achieves competitive or superior defense performance, attaining the lowest Attack Success Rate (ASR) and False Positive Rate (FPR), with only a marginal latency overhead of 8.3\%.

  3. MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

    Most Large Language Model (LLM) agent memory systems rely on a small set of static, hand-designed operations for extracting memory. These fixed procedures hard-code human priors about what to store and how to revise memory, making them rigid under diverse interaction patterns and inefficient on long histories. To this end, we present MemSkill, which reframes these operations as learnable and evolvable memory skills, structured and reusable routines for extracting, consolidating, and pruning information from interaction traces. Inspired by the design philosophy of agent skills, MemSkill employs a controller that learns to select a small set of relevant skills, paired with an LLM-based executor that produces skill-guided memories. Beyond learning skill selection, MemSkill introduces a designer that periodically reviews hard cases where selected skills yield incorrect or incomplete memories, and evolves the skill set by proposing refinements and new skills. Together, MemSkill forms a closed-loop procedure that improves both the skill-selection policy and the skill set itself. Experiments on LoCoMo, LongMemEval, HotpotQA, and ALFWorld demonstrate that MemSkill improves task performance over strong baselines and generalizes well across settings. Further analyses shed light on how skills evolve, offering insights toward more adaptive, self-evolving memory management for LLM agents.

  4. Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

    Recent applications of Reinforcement Learning with Verifiable Rewards (RLVR) to Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated significant success in enhancing reasoning capabilities for complex tasks. During RLVR training, an increase in response length is often regarded as a key factor contributing to the growth of reasoning ability. However, the patterns of change in response length vary significantly across different RLVR algorithms during the training process. To provide a fundamental explanation for these variations, this paper conducts an in-depth analysis of the components of mainstream RLVR algorithms. We present a theoretical analysis of the factors influencing response length and validate our theory through extensive experimentation. Building upon these theoretical findings, we propose the Length-Unbiased Sequence Policy Optimization (LUSPO) algorithm. Specifically, we rectify the length bias inherent in Group Sequence Policy Optimization (GSPO), rendering its loss function unbiased with respect to response length and thereby resolving the issue of response length collapse. We conduct extensive experiments across mathematical reasoning benchmarks and multimodal reasoning scenarios, where LUSPO consistently achieves superior performance. Empirical results demonstrate that LUSPO represents a novel, state-of-the-art optimization strategy compared to existing methods such as GRPO and GSPO.

  5. DFlash: Block Diffusion for Flash Speculative Decoding

    Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding, leading to high inference latency and poor GPU utilization. Speculative decoding mitigates this bottleneck by using a fast draft model whose outputs are verified in parallel by the target LLM; however, existing methods still rely on autoregressive drafting, which remains sequential and limits practical speedups. Diffusion LLMs offer a promising alternative by enabling parallel generation, but current diffusion models typically underperform compared with autoregressive models. In this paper, we introduce DFlash, a speculative decoding framework that employs a lightweight block diffusion model for parallel drafting. By generating draft tokens in a single forward pass and conditioning the draft model on context features extracted from the target model, DFlash enables efficient drafting with high-quality outputs and higher acceptance rates. Experiments show that DFlash achieves over 6x lossless acceleration across a range of models and tasks, delivering up to 2.5x higher speedup than the state-of-the-art speculative decoding method EAGLE-3.

  6. Context Forcing: Consistent Autoregressive Video Generation with Long Context

    Recent approaches to real-time long video generation typically employ streaming tuning strategies, attempting to train a long-context student using a short-context (memoryless) teacher. In these frameworks, the student performs long rollouts but receives supervision from a teacher limited to short 5-second windows. This structural discrepancy creates a critical student-teacher mismatch: the teacher's inability to access long-term history prevents it from guiding the student on global temporal dependencies, effectively capping the student's context length. To resolve this, we propose Context Forcing, a novel framework that trains a long-context student via a long-context teacher. By ensuring the teacher is aware of the full generation history, we eliminate the supervision mismatch, enabling the robust training of models capable of long-term consistency. To make this computationally feasible for extreme durations (e.g., 2 minutes), we introduce a context management system that transforms the linearly growing context into a Slow-Fast Memory architecture, significantly reducing visual redundancy. Extensive results demonstrate that our method enables effective context lengths exceeding 20 seconds -- 2 to 10 times longer than state-of-the-art methods like LongLive and Infinite-RoPE. By leveraging this extended context, Context Forcing preserves superior consistency across long durations, surpassing state-of-the-art baselines on various long video evaluation metrics.

  7. RISE-Video: Can Video Generators Decode Implicit World Rules?

    While generative video models have achieved remarkable visual fidelity, their capacity to internalize and reason over implicit world rules remains a critical yet under-explored frontier. To bridge this gap, we present RISE-Video, a pioneering reasoning-oriented benchmark for Text-Image-to-Video (TI2V) synthesis that shifts the evaluative focus from surface-level aesthetics to deep cognitive reasoning. RISE-Video comprises 467 meticulously human-annotated samples spanning eight rigorous categories, providing a structured testbed for probing model intelligence across diverse dimensions, ranging from commonsense and spatial dynamics to specialized subject domains. Our framework introduces a multi-dimensional evaluation protocol consisting of four metrics: Reasoning Alignment, Temporal Consistency, Physical Rationality, and Visual Quality. To further support scalable evaluation, we propose an automated pipeline leveraging Large Multimodal Models (LMMs) to emulate human-centric assessment. Extensive experiments on 11 state-of-the-art TI2V models reveal pervasive deficiencies in simulating complex scenarios under implicit constraints, offering critical insights for the advancement of future world-simulating generative models.

  8. Accurate Failure Prediction in Agents Does Not Imply Effective Failure Prevention

    Proactive interventions by LLM critic models are often assumed to improve reliability, yet their effects at deployment time are poorly understood. We show that a binary LLM critic with strong offline accuracy (AUROC 0.94) can nevertheless cause severe performance degradation, inducing a 26 percentage point (pp) collapse on one model while affecting another by near zero pp. This variability demonstrates that LLM critic accuracy alone is insufficient to determine whether intervention is safe. We identify a disruption-recovery tradeoff: interventions may recover failing trajectories but also disrupt trajectories that would have succeeded. Based on this insight, we propose a pre-deployment test that uses a small pilot of 50 tasks to estimate whether intervention is likely to help or harm, without requiring full deployment. Across benchmarks, the test correctly anticipates outcomes: intervention degrades performance on high-success tasks (0 to -26 pp), while yielding a modest improvement on the high-failure ALFWorld benchmark (+2.8 pp, p=0.014). The primary value of our framework is therefore identifying when not to intervene, preventing severe regressions before deployment.

  9. Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

    High-quality kernel is critical for scalable AI systems, and enabling LLMs to generate such code would advance AI development. However, training LLMs for this task requires sufficient data, a robust environment, and the process is often vulnerable to reward hacking and lazy optimization. In these cases, models may hack training rewards and prioritize trivial correctness over meaningful speedup. In this paper, we systematically study reinforcement learning (RL) for kernel generation. We first design KernelGYM, a robust distributed GPU environment that supports reward hacking check, data collection from multi-turn interactions and long-term RL training. Building on KernelGYM, we investigate effective multi-turn RL methods and identify a biased policy gradient issue caused by self-inclusion in GRPO. To solve this, we propose Turn-level Reinforce-Leave-One-Out (TRLOO) to provide unbiased advantage estimation for multi-turn RL. To alleviate lazy optimization, we incorporate mismatch correction for training stability and introduce Profiling-based Rewards (PR) and Profiling-based Rejection Sampling (PRS) to overcome the issue. The trained model, Dr.Kernel-14B, reaches performance competitive with Claude-4.5-Sonnet in Kernelbench. Finally, we study sequential test-time scaling for Dr.Kernel-14B. On the KernelBench Level-2 subset, 31.6% of the generated kernels achieve at least a 1.2x speedup over the Torch reference, surpassing Claude-4.5-Sonnet (26.7%) and GPT-5 (28.6%). When selecting the best candidate across all turns, this 1.2x speedup rate further increases to 47.8%. All resources, including environment, training code, models, and dataset, are included in https://www.github.com/hkust-nlp/KernelGYM.

  10. ProAct: Agentic Lookahead in Interactive Environments

    Existing Large Language Model (LLM) agents struggle in interactive environments requiring long-horizon planning, primarily due to compounding errors when simulating future states. To address this, we propose ProAct, a framework that enables agents to internalize accurate lookahead reasoning through a two-stage training paradigm. First, we introduce Grounded LookAhead Distillation (GLAD), where the agent undergoes supervised fine-tuning on trajectories derived from environment-based search. By compressing complex search trees into concise, causal reasoning chains, the agent learns the logic of foresight without the computational overhead of inference-time search. Second, to further refine decision accuracy, we propose the Monte-Carlo Critic (MC-Critic), a plug-and-play auxiliary value estimator designed to enhance policy-gradient algorithms like PPO and GRPO. By leveraging lightweight environment rollouts to calibrate value estimates, MC-Critic provides a low-variance signal that facilitates stable policy optimization without relying on expensive model-based value approximation. Experiments on both stochastic (e.g., 2048) and deterministic (e.g., Sokoban) environments demonstrate that ProAct significantly improves planning accuracy. Notably, a 4B parameter model trained with ProAct outperforms all open-source baselines and rivals state-of-the-art closed-source models, while demonstrating robust generalization to unseen environments. The codes and models are available at https://github.com/GreatX3/ProAct

  11. Privileged Information Distillation for Language Models

    Training-time privileged information (PI) can enable language models to succeed on tasks they would otherwise fail, making it a powerful tool for reinforcement learning in hard, long-horizon settings. However, transferring capabilities learned with PI to policies that must act without it at inference time remains a fundamental challenge. We study this problem in the context of distilling frontier models for multi-turn agentic environments, where closed-source systems typically hide their internal reasoning and expose only action trajectories. This breaks standard distillation pipelines, since successful behavior is observable but the reasoning process is not. For this, we introduce π-Distill, a joint teacher-student objective that trains a PI-conditioned teacher and an unconditioned student simultaneously using the same model. Additionally, we also introduce On-Policy Self-Distillation (OPSD), an alternative approach that trains using Reinforcement Learning (RL) with a reverse KL-penalty between the student and the PI-conditioned teacher. We show that both of these algorithms effectively distill frontier agents using action-only PI. Specifically we find that π-Distill and in some cases OPSD, outperform industry standard practices (Supervised finetuning followed by RL) that assume access to full Chain-of-Thought supervision across multiple agentic benchmarks, models, and forms of PI. We complement our results with extensive analysis that characterizes the factors enabling effective learning with PI, focusing primarily on π-Distill and characterizing when OPSD is competitive.

  12. InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions

    Humans rarely plan whole-body interactions with objects at the level of explicit whole-body movements. High-level intentions, such as affordance, define the goal, while coordinated balance, contact, and manipulation can emerge naturally from underlying physical and motor priors. Scaling such priors is key to enabling humanoids to compose and generalize loco-manipulation skills across diverse contexts while maintaining physically coherent whole-body coordination. To this end, we introduce InterPrior, a scalable framework that learns a unified generative controller through large-scale imitation pretraining and post-training by reinforcement learning. InterPrior first distills a full-reference imitation expert into a versatile, goal-conditioned variational policy that reconstructs motion from multimodal observations and high-level intent. While the distilled policy reconstructs training behaviors, it does not generalize reliably due to the vast configuration space of large-scale human-object interactions. To address this, we apply data augmentation with physical perturbations, and then perform reinforcement learning finetuning to improve competence on unseen goals and initializations. Together, these steps consolidate the reconstructed latent skills into a valid manifold, yielding a motion prior that generalizes beyond the training data, e.g., it can incorporate new behaviors such as interactions with unseen objects. We further demonstrate its effectiveness for user-interactive control and its potential for real robot deployment.

  13. Reinforcement World Model Learning for LLM-based Agents

    Large language models (LLMs) have achieved strong performance in language-centric tasks. However, in agentic settings, LLMs often struggle to anticipate action consequences and adapt to environment dynamics, highlighting the need for world-modeling capabilities in LLM-based agents. We propose Reinforcement World Model Learning (RWML), a self-supervised method that learns action-conditioned world models for LLM-based agents on textual states using sim-to-real gap rewards. Our method aligns simulated next states produced by the model with realized next states observed from the environment, encouraging consistency between internal world simulations and actual environment dynamics in a pre-trained embedding space. Unlike next-state token prediction, which prioritizes token-level fidelity (i.e., reproducing exact wording) over semantic equivalence and can lead to model collapse, our method provides a more robust training signal and is empirically less susceptible to reward hacking than LLM-as-a-judge. We evaluate our method on ALFWorld and τ^2 Bench and observe significant gains over the base model, despite being entirely self-supervised. When combined with task-success rewards, our method outperforms direct task-success reward RL by 6.9 and 5.7 points on ALFWorld and τ^2 Bench respectively, while matching the performance of expert-data training.

  14. Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities

    Despite strong performance on existing benchmarks, it remains unclear whether large language models can reason over genuinely novel scientific information. Most evaluations score end-to-end RAG pipelines, where reasoning is confounded with retrieval and toolchain choices, and the signal is further contaminated by parametric memorization and open-web volatility. We introduce DeR2, a controlled deep-research sandbox that isolates document-grounded reasoning while preserving core difficulties of deep search: multi-step synthesis, denoising, and evidence-based conclusion making. DeR2 decouples evidence access from reasoning via four regimes--Instruction-only, Concepts (gold concepts without documents), Related-only (only relevant documents), and Full-set (relevant documents plus topically related distractors)--yielding interpretable regime gaps that operationalize retrieval loss vs. reasoning loss and enable fine-grained error attribution. To prevent parametric leakage, we apply a two-phase validation that requires parametric failure without evidence while ensuring oracle-concept solvability. To ensure reproducibility, each instance provides a frozen document library (drawn from 2023-2025 theoretical papers) with expert-annotated concepts and validated rationales. Experiments across a diverse set of state-of-the-art foundation models reveal substantial variation and significant headroom: some models exhibit mode-switch fragility, performing worse with the Full-set than with Instruction-only, while others show structural concept misuse, correctly naming concepts but failing to execute them as procedures.

  15. Grounding and Enhancing Informativeness and Utility in Dataset Distillation

    Dataset Distillation (DD) seeks to create a compact dataset from a large, real-world dataset. While recent methods often rely on heuristic approaches to balance efficiency and quality, the fundamental relationship between original and synthetic data remains underexplored. This paper revisits knowledge distillation-based dataset distillation within a solid theoretical framework. We introduce the concepts of Informativeness and Utility, capturing crucial information within a sample and essential samples in the training set, respectively. Building on these principles, we define optimal dataset distillation mathematically. We then present InfoUtil, a framework that balances informativeness and utility in synthesizing the distilled dataset. InfoUtil incorporates two key components: (1) game-theoretic informativeness maximization using Shapley Value attribution to extract key information from samples, and (2) principled utility maximization by selecting globally influential samples based on Gradient Norm. These components ensure that the distilled dataset is both informative and utility-optimized. Experiments demonstrate that our method achieves a 6.1\% performance improvement over the previous state-of-the-art approach on ImageNet-1K dataset using ResNet-18.

Solidot(15)

  1. AI 热导致短缺无处不在

    美国五大科技公司亚马逊、Google、微软、Meta 和甲骨文今年计划在 AI 上投资大约 7000 亿美元,但在可预计的未来 AI 投资获得的回报远低于支出。而在 AI 上的巨额投资已经让整个世界体验到了无处不在的短缺。熟练电工越来越难以找到,非数据中心建筑项目被迫暂停,智能手机价格未来几年会继续上涨,有前景的创新面临资金不足的困境。知名投资人 Roger McNamee 称,自 2022 年中期以来,美国在 AI 领域的投资额可能超过了此前整个科技行业的所有投资总额。苹果上周通知投资者,该公司在采购 iPhone 和 Mac 电脑所需的两种关键芯片上遇到了困难。CEO Tim Cook 不愿意讨论是否会涨价。非 AI 创业公司的融资额降至十年来的最低点。

  2. Waymo 部分远程操作人员位于菲律宾

    在美国国会自动驾驶汽车安全和监管听证会上,Waymo 首席安全官 Mauricio Peña 披露该公司的部分远程操作人员位于菲律宾。当自动驾驶汽车遭遇它无法自己解决的驾驶情况时怎么办?汽车会联络公司,与远程操作人员进行通信,远程操作人员并不会远程驾驶汽车,而是提供指导,动态的驾驶任务仍然由汽车自身负责。Mauricio Peña 博士表示部分远程操作人员位于美国,还有部分位于菲律宾。Waymo 解释说,远程操作员工都是持有驾照的司机,会接受与驾驶相关的犯罪记录和其它交通违规行为的审查,还会“接受随机的毒品检测”。

  3. Steam Machine 定价过高可能导致其缺乏竞争力

    Valve 原本针对入门级 PC 游戏市场的 Steam Machine 因为内存和固态硬盘价格飙升而可能因定价过高缺乏竞争力,乃至无人问津。目前 16GB DDR5 内存条高达 200 美元。在 Valve 去年宣布 Steam Machine 时分析师估计其 512 GB 固态硬盘版本售价在 599 至 629 美元之间,2TB 固态硬盘版本 849 至 899 美元之间。但如今因价格飙升,分析师估计 512GB 版本可能超过 1000 美元,2TB 版本价格可能在 1300-1500美元,这根本不太可能卖出去。Valve 面临的一大问题是,相比现有 PC 厂商它的产量低议价能力更弱,更容易受价格波动的影响。

  4. 丰田开发适合汽车的开源游戏引擎 Fluorite

    Toyota Connected North America 的开发者在 FOSDEM 2026 上宣布了他们正在开发的开源游戏引擎 Fluorite。Toyota Connected North America 是丰田和微软合作成立的公司,致力于开发车载软件、AI 等相关技术。Fluorite 使用了 Flutter、Dart 语言以及 Google 的 Filament 3D 渲染引擎。丰田在寻找适合车载系统的游戏引擎,Unity 和 Unreal Engine 因代码私有、资源占用高以及授权费用昂贵被否决,开源引擎 Godot 存在启动时间过长且资源占用过高等问题,其他引擎被认为不稳定或缺乏稳定的 API。丰田最终选择了 Fluorite,目前公开的信息还不多。

  5. AI.com 域名以 7 千万美元出售

    加密货币交易所 Crypto.com 联合创始人兼 CEO Kris Marszalek 以 7000 万美元的价格收购了域名 AI.com,创下域名最高成交价纪录。款项全部以加密货币形式支付给一位未透露姓名的卖家。Marszalek 计划在本周末的超级碗广告中推出该网站,宣传个人“AI 智能体”,该智能体能帮助用户发送消息、使用应用和交易股票。根据 GoDaddy 的数据,此前的域名成交纪录是 Carinsurance.com,成交价接近 5000 万美元。

  6. 一季度内存价格比去年四季度翻番

    内存价格自去年 10 月之后开始飙升,在京东上 32 GB DDR4 内存条从去年初的 400-500 元涨到了如今的 2500 元左右。根据 Counterpoint Research 的跟踪报告,2026 年第一季度 DRAM、NAND 和 HBM 芯片价格比 2025 年四季度上涨了 80%-90%。64GB RDIMM 内存价格从 2025 年四季度的合同价 450 美元飙升至 900 美元以上,Counterpoint 预计其价格将在第二季度突破 1000 美元。DRAM 运营利润率在 2025 年四季度达到了 60% 左右——这是传统 DRAM 利润率首次超过 HBM——而 2026 年第一季度有望创历史新高。

  7. 美国科技巨头市值蒸发 1.35 万亿美元

    在亚马逊宣布 2026 年高达 2000 亿美元的 AI 资本支出计划后,其股价周五暴跌逾 9%。2000 亿美元比 2025 年的 1250 亿美元增加了 56%,而此前投资者已经对 AI 过热泡沫可能破裂感到担忧。美国主要科技巨头都公布了远超 2025 年的 AI 支出计划:Google 的 AI 支出从 2025 年的 914 亿美元增加到了 2026 年的 1850 亿美元,Meta 和微软也都增加了 AI 支出。这些科技巨头今年的 AI 支出将超过 6600 亿美元,结果是微软、英伟达、甲骨文、Meta、亚马逊和 Alphabet 的市值蒸发了 1.35 万亿美元。

  8. 欧盟裁定 TikTok 的成瘾性设计违法

    TikTok 能称雄短视频市场与其推荐算法密切相关,它的算法能在用户点击的一秒内更新推荐,它的无限滚动让用户沉迷其中。欧盟委员会在对 TikTok 展开长达一年的调查后宣布,初步裁定无限滚动、自动播放、高度定制化推荐系统等成瘾性设计违反了欧盟的《数字服务法(DSA)》,认为成瘾性设计会对未成年人和弱势群体造成伤害,TikTok 未采取有效措施降低成瘾性设计带来的风险,认为 TikTok 需要对其服务的基本设计进行调整。

  9. 微软用 Rust 开发新安全操作系统 LiteBox

    内核开发者、微软 Linux 新兴技术团队负责人 James Morris 宣布了使用 Rust 语言开发的新操作系统项目 LiteBox。项目托管在 GitHub 上,采用 MIT 许可证。LiteBox 设计作为一个安全内核运行,通过虚拟化硬件保护客户机内核,它通过大幅减少与主机的接口缩小攻击面。它可被用于在 Windows 上运行未修改的 Linux 程序;在 Linux 上沙盒化 Linux 应用;运行在 Linux Virtualization Based Security“LVBS”之上;等等。LiteBox 目前在活跃开发之中,还没有发布稳定版本。

  10. 快手被罚 1.191 亿元

    北京网信办宣布对快手处以 1.191 亿元,责令其限期改正,从严处理责任人。声明称:“针对近期快手平台出现大量色情低俗内容直播问题,在国家互联网信息办公室指导下,北京市互联网信息办公室依法对北京快手科技有限公司涉嫌违法行为进行立案调查。经查实,快手平台未履行网络安全保护义务,未及时处置系统漏洞等安全风险,未对用户发布的违法信息立即采取停止传输、消除等处置措施,情节严重,影响恶劣。”去年 12 月 22 日,快手直播间出现大量色情內容,持续超过一个小时。当时快手回应称平台遭到黑灰产攻击已紧急处理。

  11. 科学家在 100 公里光纤上演示了设备无关的量子密钥分发

    中科大的研究人员在《科学》期刊上报告通过长达 100 公里的光纤演示了与设备无关的量子密钥分发。研究结果表明,这种方法可在都市规模保障加密通信安全——这一传输距离远超以往结果——它将帮助缩小原理验证量子网络实验与实际应用之间的差距。量子密钥分发(QKD)是量子技术应用的前沿领域,它能实现格外安全的数字通信。早期形式的 QKD 是通过用可信设备来确保安全性,但它们存在技术限制和漏洞。一种更先进的方法是与器件无关的量子密钥分发(DI-QKD),后者的安全性直接源于量子基本现象,而无需信任量子设备的内在工作机制。

  12. HBO 制作《博德之门》电视剧

    HBO 将制作《博德之门》电视剧,《最后生还者(The Last of Us)》的制作人 Craig Mazin 担任新剧的创作者、编剧和执行制作人,《博德之门》版权所有者 Wizards of the Coast 的前故事总监 Chris Perkins 担任顾问。电视剧将讲述《博德之门3》后发生的故事。Mazin 计划邀请《博德之门3》的声优参与剧集制作,类似他在《最后生还者》中的做法。目前不清楚《博德之门3》开发商 Larian Studios 是否会参与制作,它正在开发《Divinity》系列的新作。Mazin 称他在《博德之门3》中投入了 1000 个小时,能延续其故事是梦想成真。

  13. 日本电视市场中国厂商占六成

    根据调研公司的数据,2025 年日本国内电视机市场份额海信控制 95% 股份的 REGZA 位居首位,海信和 TCL 合占五成。如果索尼品牌转移到 TCL 主导的合资公司,中国系将占到 6 成。在世界电视市场,三星占据榜首,三星、LG 电子、海信和 TCL 四家企业掌握了全球市场份额的一半以上。生产电视的日本大型企业只剩下松下,而松下也在剥离其电视业务,它的低价产品已由 TCL 代工。日本企业在规模和供应链方面处于劣势,很难以硬件为起点开展家电业务。

  14. 农药总体毒性仍然呈上升趋势

    第 15 届联合国生物多样性大会(COP15)设定了到 2030 年时将农药使用量和相关风险减半的目标,鼓励使用有机农药和低毒性农药。但根据发表在《科学》期刊上的最新研究,农药总体毒性仍然呈上升趋势,如无改变 2030 年目标可能难以实现。研究结果显示,全球农药所造成的生态总体毒性正在上升;这一增长趋势在多个国家、多种作物及各类物种群体中均有显现。总体而言,总施用毒性主要由少数剧毒化学品主导,其中果蔬、玉米、大豆、谷物和稻米所用农药毒性占了全球农药毒性的 76-83%。中国、巴西、美国和印度合计贡献了全球总施用毒性的 53-68%。

  15. 更多 Android 设备将支持苹果的 AirDrop

    Google 去年出人意料的实现了 Quick Share 和苹果 AirDrop 的互操作,但目前仅有 Pixel 10 系列 Android 手机支持该功能。Google Android 平台工程副总裁 Eric Kay 表示 2026 年会有更多 Android 设备支持该功能,他表示正与合作伙伴一起将对 AirDrop 的支持扩展到整个 Android 生态系统。目前 Android 厂商中只有 Nothing 确认正致力于支持该功能。Kay 还表示 Google 正加倍努力让苹果 iPhone 用户更容易切换到 Android。