OrangeBot.AI Digest — 2026-02-06
53 headlines across 4 sources, aggregated for this day.
Hacker News (15)
- OpenCiv3: Open-source, cross-platform reimagining of Civilization III (openciv3.org)
- How to effectively write quality code with AI (heidenstedt.org)
- Show HN: I spent 4 years building a UI design tool with only the features I use (vecti.com)
- Sheldon Brown's Bicycle Technical Info (www.sheldonbrown.com)
- The Waymo World Model (waymo.com)
- An Update on Heroku (www.heroku.com)
- Understanding Neural Network, Visually (visualrambling.space)
- Microsoft open-sources LiteBox, a security-focused library OS (github.com)
- Hackers (1995) Animated Experience (hackers-1995.vercel.app)
- I now assume that all ads on Apple news are scams (kirkville.com)
- TikTok's 'addictive design' found to be illegal in Europe (www.nytimes.com)
- DNS Explained – How Domain Names Get Resolved (www.bhusalmanish.com.np)
- A new bill in New York would require disclaimers on AI-generated news content (www.niemanlab.org)
- Invention of DNA "page numbers" opens up possibilities for the bioeconomy (www.caltech.edu)
- Stay Away from My Trash (tldraw.dev)
GitHub Trending (8)
- openai / skills
Skills Catalog for Codex
- bytedance / UI-TARS-desktop
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
- nvm-sh / nvm
Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions
- likec4 / likec4
Visualize, collaborate, and evolve the software architecture with always actual and live diagrams from your code
- aquasecurity / trivy
Find vulnerabilities, misconfigurations, secrets, SBOM in containers, Kubernetes, code repositories, clouds and more
- ZeroTworu / anet
Simple Rust VPN Client / Server
- Flowseal / zapret-discord-youtube
- DataExpert-io / data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
Hugging Face (15)
- CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty
Existing benchmarks for Large Language Model (LLM) agents focus on task completion under idealized settings but overlook reliability in real-world, user-facing applications. In domains such as in-car voice assistants, users often issue incomplete or ambiguous requests, creating intrinsic uncertainty that agents must manage through dialogue, tool use, and policy adherence. We introduce CAR-bench, a benchmark for evaluating consistency, uncertainty handling, and capability awareness in multi-turn, tool-using LLM agents in an in-car assistant domain. The environment features an LLM-simulated user, domain policies, and 58 interconnected tools spanning navigation, productivity, charging, and vehicle control. Beyond standard task completion, CAR-bench introduces Hallucination tasks that test agents' limit-awareness under missing tools or information, and Disambiguation tasks that require resolving uncertainty through clarification or internal information gathering. Baseline results reveal large gaps between occasional and consistent success on all task types. Even frontier reasoning LLMs achieve less than a 50% consistent pass rate on Disambiguation tasks due to premature actions, and they frequently violate policies or fabricate information to satisfy user requests in Hallucination tasks, underscoring the need for more reliable and self-aware LLM agents in real-world settings.
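The gap between occasional and consistent success maps onto a pass@k vs. pass^k distinction; a minimal sketch of computing both rates, assuming k independent trials per task (the function name and toy data are illustrative, not the benchmark's exact protocol):

```python
# Hypothetical sketch: occasional vs. consistent success over k trials per task.
# `results[task]` is a list of booleans, one per independent trial.
def occasional_and_consistent_pass(results: dict[str, list[bool]]) -> tuple[float, float]:
    n = len(results)
    occasional = sum(any(trials) for trials in results.values()) / n   # pass@k: >=1 success
    consistent = sum(all(trials) for trials in results.values()) / n  # pass^k: all k succeed
    return occasional, consistent

demo = {"navigate": [True, True, True], "clarify_request": [True, False, True]}
print(occasional_and_consistent_pass(demo))  # (1.0, 0.5)
```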
- Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening
As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security challenges. Most existing agent defense mechanisms adopt a mandatory checking paradigm, in which security validation is forcibly triggered at predefined stages of the agent lifecycle. In this work, we argue that effective agent security should be intrinsic and selective rather than architecturally decoupled and mandatory. We propose Spider-Sense, an event-driven defense framework based on Intrinsic Risk Sensing (IRS), which allows agents to maintain latent vigilance and trigger defenses only upon risk perception. Once triggered, Spider-Sense invokes a hierarchical defense mechanism that trades off efficiency and precision: it resolves known patterns via lightweight similarity matching while escalating ambiguous cases to deep internal reasoning, thereby eliminating reliance on external models. To facilitate rigorous evaluation, we introduce S^2Bench, a lifecycle-aware benchmark featuring realistic tool execution and multi-stage attacks. Extensive experiments demonstrate that Spider-Sense achieves competitive or superior defense performance, attaining the lowest Attack Success Rate (ASR) and False Positive Rate (FPR), with only a marginal latency overhead of 8.3%.
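The hierarchical screening can be pictured as a two-tier dispatcher; a minimal sketch assuming cosine-similarity matching against a library of known risk patterns (the embedding vectors, thresholds, and handler names are illustrative assumptions):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def screen(event_vec: np.ndarray, known_patterns: list[np.ndarray],
           match_threshold: float = 0.9, suspicion_threshold: float = 0.6) -> str:
    """Tier 1: cheap similarity match; Tier 2: escalate ambiguous cases."""
    best = max((cosine(event_vec, p) for p in known_patterns), default=0.0)
    if best >= match_threshold:
        return "block"            # known risk pattern: resolve cheaply
    if best >= suspicion_threshold:
        return "deep_reasoning"   # ambiguous: escalate to internal reasoning
    return "proceed"              # no risk perceived: stay latent

rng = np.random.default_rng(0)
patterns = [rng.normal(size=8) for _ in range(4)]
print(screen(patterns[0], patterns))        # "block": exact match
print(screen(rng.normal(size=8), patterns)) # depends on the draw: "proceed" or escalate
```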
- Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR
Recent applications of Reinforcement Learning with Verifiable Rewards (RLVR) to Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated significant success in enhancing reasoning capabilities for complex tasks. During RLVR training, an increase in response length is often regarded as a key factor contributing to the growth of reasoning ability. However, the patterns of change in response length vary significantly across different RLVR algorithms during the training process. To provide a fundamental explanation for these variations, this paper conducts an in-depth analysis of the components of mainstream RLVR algorithms. We present a theoretical analysis of the factors influencing response length and validate our theory through extensive experimentation. Building upon these theoretical findings, we propose the Length-Unbiased Sequence Policy Optimization (LUSPO) algorithm. Specifically, we rectify the length bias inherent in Group Sequence Policy Optimization (GSPO), rendering its loss function unbiased with respect to response length and thereby resolving the issue of response length collapse. We conduct extensive experiments across mathematical reasoning benchmarks and multimodal reasoning scenarios, where LUSPO consistently achieves superior performance. Empirical results demonstrate that LUSPO represents a novel, state-of-the-art optimization strategy compared to existing methods such as GRPO and GSPO.
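For readers unfamiliar with GSPO, its sequence-level importance ratio averages per-token log-ratios over the response length, which is where length enters the loss; a toy sketch of that ratio follows (tensor values are illustrative, and LUSPO's actual unbiased reformulation is the paper's contribution, not shown here):

```python
import torch

def gspo_sequence_ratio(logp_new: torch.Tensor, logp_old: torch.Tensor,
                        mask: torch.Tensor) -> torch.Tensor:
    """GSPO's sequence-level importance ratio:
    s = exp((1/|y|) * sum_t [log pi_new(t) - log pi_old(t)]),
    i.e. a geometric mean of token ratios. The 1/|y| factor is where response
    length couples into the loss; LUSPO's unbiased variant is in the paper.
    """
    lengths = mask.sum(dim=-1).clamp(min=1)
    seq_log_ratio = ((logp_new - logp_old) * mask).sum(dim=-1)
    return torch.exp(seq_log_ratio / lengths)

# Toy batch: two responses, four token slots; the second has length 2.
logp_new = torch.tensor([[-1.0, -1.2, -0.8, -1.1], [-0.9, -1.0, 0.0, 0.0]])
logp_old = torch.tensor([[-1.1, -1.1, -0.9, -1.0], [-1.0, -1.1, 0.0, 0.0]])
mask = torch.tensor([[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0]])
print(gspo_sequence_ratio(logp_new, logp_old, mask))  # one ratio per sequence
```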
- Context Forcing: Consistent Autoregressive Video Generation with Long Context
Recent approaches to real-time long video generation typically employ streaming tuning strategies, attempting to train a long-context student using a short-context (memoryless) teacher. In these frameworks, the student performs long rollouts but receives supervision from a teacher limited to short 5-second windows. This structural discrepancy creates a critical student-teacher mismatch: the teacher's inability to access long-term history prevents it from guiding the student on global temporal dependencies, effectively capping the student's context length. To resolve this, we propose Context Forcing, a novel framework that trains a long-context student via a long-context teacher. By ensuring the teacher is aware of the full generation history, we eliminate the supervision mismatch, enabling the robust training of models capable of long-term consistency. To make this computationally feasible for extreme durations (e.g., 2 minutes), we introduce a context management system that transforms the linearly growing context into a Slow-Fast Memory architecture, significantly reducing visual redundancy. Extensive results demonstrate that our method enables effective context lengths exceeding 20 seconds -- 2 to 10 times longer than state-of-the-art methods like LongLive and Infinite-RoPE. By leveraging this extended context, Context Forcing preserves superior consistency across long durations, surpassing state-of-the-art baselines on various long video evaluation metrics.
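The Slow-Fast Memory idea, keeping recent frames verbatim while thinning older context so it grows sublinearly, can be sketched as a buffer policy (the frame representation, subsampling-as-compression, and capacity numbers are illustrative assumptions):

```python
from collections import deque

class SlowFastMemory:
    """Toy context manager: a short 'fast' window of recent frames kept
    verbatim, plus a 'slow' store holding every k-th evicted frame, so the
    context grows sublinearly instead of linearly with video length."""
    def __init__(self, fast_capacity: int = 16, slow_stride: int = 4):
        self.fast: deque = deque(maxlen=fast_capacity)
        self.slow: list = []
        self.stride = slow_stride
        self._evicted = 0

    def append(self, frame) -> None:
        if len(self.fast) == self.fast.maxlen:
            oldest = self.fast[0]                 # about to be evicted
            if self._evicted % self.stride == 0:  # keep a sparse summary
                self.slow.append(oldest)
            self._evicted += 1
        self.fast.append(frame)

    def context(self) -> list:
        return self.slow + list(self.fast)        # slow history + fast recency

mem = SlowFastMemory(fast_capacity=4, slow_stride=2)
for t in range(12):
    mem.append(f"frame_{t}")
print(mem.context())  # sparse old frames followed by the last 4 frames
```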
- MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
Most Large Language Model (LLM) agent memory systems rely on a small set of static, hand-designed operations for extracting memory. These fixed procedures hard-code human priors about what to store and how to revise memory, making them rigid under diverse interaction patterns and inefficient on long histories. To this end, we present MemSkill, which reframes these operations as learnable and evolvable memory skills: structured, reusable routines for extracting, consolidating, and pruning information from interaction traces. Inspired by the design philosophy of agent skills, MemSkill employs a controller that learns to select a small set of relevant skills, paired with an LLM-based executor that produces skill-guided memories. Beyond learning skill selection, MemSkill introduces a designer that periodically reviews hard cases where selected skills yield incorrect or incomplete memories, and evolves the skill set by proposing refinements and new skills. Together, these form a closed-loop procedure that improves both the skill-selection policy and the skill set itself. Experiments on LoCoMo, LongMemEval, HotpotQA, and ALFWorld demonstrate that MemSkill improves task performance over strong baselines and generalizes well across settings. Further analyses shed light on how skills evolve, offering insights toward more adaptive, self-evolving memory management for LLM agents.
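The controller/executor/designer division can be pictured as a small closed loop around an LLM call; a minimal sketch, treating the scoring controller, LLM executor, and skill-proposing designer as injected callables (all names, the top-k selection, and the review period are illustrative assumptions):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Skill:
    name: str
    instructions: str   # reusable routine for extract/consolidate/prune

@dataclass
class MemSkillLoop:
    skills: list[Skill]
    score: Callable[[Skill, str], float]        # controller: skill relevance to a trace
    execute: Callable[[list[Skill], str], str]  # LLM executor: skill-guided memory
    hard_cases: list[tuple[str, str]] = field(default_factory=list)

    def step(self, trace: str, top_k: int = 2) -> str:
        """Select the most relevant skills, then produce a skill-guided memory."""
        chosen = sorted(self.skills, key=lambda s: self.score(s, trace), reverse=True)[:top_k]
        return self.execute(chosen, trace)

    def review(self, trace: str, bad_memory: str,
               propose: Callable[[list], list[Skill]]) -> None:
        """Designer: collect hard cases and periodically evolve the skill set."""
        self.hard_cases.append((trace, bad_memory))
        if len(self.hard_cases) >= 8:             # illustrative review period
            self.skills.extend(propose(self.hard_cases))
            self.hard_cases.clear()
```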
- Accurate Failure Prediction in Agents Does Not Imply Effective Failure Prevention
Proactive interventions by LLM critic models are often assumed to improve reliability, yet their effects at deployment time are poorly understood. We show that a binary LLM critic with strong offline accuracy (AUROC 0.94) can nevertheless cause severe performance degradation, inducing a 26 percentage point (pp) collapse on one model while leaving another nearly unaffected. This variability demonstrates that LLM critic accuracy alone is insufficient to determine whether intervention is safe. We identify a disruption-recovery tradeoff: interventions may recover failing trajectories but also disrupt trajectories that would have succeeded. Based on this insight, we propose a pre-deployment test that uses a small pilot of 50 tasks to estimate whether intervention is likely to help or harm, without requiring full deployment. Across benchmarks, the test correctly anticipates outcomes: intervention degrades performance on high-success tasks (0 to -26 pp), while yielding a modest improvement on the high-failure ALFWorld benchmark (+2.8 pp, p=0.014). The primary value of our framework is therefore identifying when not to intervene, preventing severe regressions before deployment.
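The pilot test amounts to weighing disrupted against recovered trajectories on a small paired sample before deploying the critic; a minimal sketch, assuming paired with/without-intervention outcomes on the same 50 pilot tasks and a simple margin rule (both are illustrative assumptions, not the paper's exact test):

```python
def pilot_decision(base_success: list[bool], with_critic: list[bool],
                   margin: int = 2) -> str:
    """Compare disrupted (succeeded -> failed) vs. recovered (failed -> succeeded)
    trajectories on a small pilot; intervene only if recoveries clearly win."""
    assert len(base_success) == len(with_critic)
    disrupted = sum(b and not c for b, c in zip(base_success, with_critic))
    recovered = sum(c and not b for b, c in zip(base_success, with_critic))
    if recovered - disrupted >= margin:
        return "deploy intervention"
    if disrupted - recovered >= margin:
        return "do NOT intervene"   # the framework's primary value
    return "inconclusive: extend pilot"

base = [True] * 40 + [False] * 10            # high-success regime
critic = [True] * 30 + [False] * 16 + [True] * 4
print(pilot_decision(base, critic))          # -> "do NOT intervene": disruptions outweigh recoveries
```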
- RISE-Video: Can Video Generators Decode Implicit World Rules?
While generative video models have achieved remarkable visual fidelity, their capacity to internalize and reason over implicit world rules remains a critical yet under-explored frontier. To bridge this gap, we present RISE-Video, a pioneering reasoning-oriented benchmark for Text-Image-to-Video (TI2V) synthesis that shifts the evaluative focus from surface-level aesthetics to deep cognitive reasoning. RISE-Video comprises 467 meticulously human-annotated samples spanning eight rigorous categories, providing a structured testbed for probing model intelligence across diverse dimensions, ranging from commonsense and spatial dynamics to specialized subject domains. Our framework introduces a multi-dimensional evaluation protocol consisting of four metrics: Reasoning Alignment, Temporal Consistency, Physical Rationality, and Visual Quality. To further support scalable evaluation, we propose an automated pipeline leveraging Large Multimodal Models (LMMs) to emulate human-centric assessment. Extensive experiments on 11 state-of-the-art TI2V models reveal pervasive deficiencies in simulating complex scenarios under implicit constraints, offering critical insights for the advancement of future world-simulating generative models.
- ProAct: Agentic Lookahead in Interactive Environments
Existing Large Language Model (LLM) agents struggle in interactive environments requiring long-horizon planning, primarily due to compounding errors when simulating future states. To address this, we propose ProAct, a framework that enables agents to internalize accurate lookahead reasoning through a two-stage training paradigm. First, we introduce Grounded LookAhead Distillation (GLAD), where the agent undergoes supervised fine-tuning on trajectories derived from environment-based search. By compressing complex search trees into concise, causal reasoning chains, the agent learns the logic of foresight without the computational overhead of inference-time search. Second, to further refine decision accuracy, we propose the Monte-Carlo Critic (MC-Critic), a plug-and-play auxiliary value estimator designed to enhance policy-gradient algorithms like PPO and GRPO. By leveraging lightweight environment rollouts to calibrate value estimates, MC-Critic provides a low-variance signal that facilitates stable policy optimization without relying on expensive model-based value approximation. Experiments on both stochastic (e.g., 2048) and deterministic (e.g., Sokoban) environments demonstrate that ProAct significantly improves planning accuracy. Notably, a 4B parameter model trained with ProAct outperforms all open-source baselines and rivals state-of-the-art closed-source models, while demonstrating robust generalization to unseen environments. The code and models are available at https://github.com/GreatX3/ProAct
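The MC-Critic idea, calibrating value estimates with a few lightweight environment rollouts instead of a learned value model, can be sketched as a plain Monte-Carlo return average (the environment-cloning interface, rollout count, and discount are illustrative assumptions):

```python
def mc_critic_value(env_clone, policy, state, n_rollouts: int = 4,
                    horizon: int = 20, gamma: float = 0.99) -> float:
    """Monte-Carlo value estimate: average discounted return of short
    rollouts from `state`, usable as a low-variance baseline in PPO/GRPO."""
    returns = []
    for _ in range(n_rollouts):
        env = env_clone(state)          # lightweight copy of the environment
        total, discount, s = 0.0, 1.0, state
        for _ in range(horizon):
            s, reward, done = env.step(policy(s))
            total += discount * reward
            discount *= gamma
            if done:
                break
        returns.append(total)
    return sum(returns) / len(returns)
```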
- Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations
High-quality kernels are critical for scalable AI systems, and enabling LLMs to generate such code would advance AI development. However, training LLMs for this task requires sufficient data and a robust environment, and the process is often vulnerable to reward hacking and lazy optimization, where models hack training rewards and prioritize trivial correctness over meaningful speedup. In this paper, we systematically study reinforcement learning (RL) for kernel generation. We first design KernelGYM, a robust distributed GPU environment that supports reward-hacking checks, data collection from multi-turn interactions, and long-term RL training. Building on KernelGYM, we investigate effective multi-turn RL methods and identify a biased policy-gradient issue caused by self-inclusion in GRPO. To solve this, we propose Turn-level Reinforce-Leave-One-Out (TRLOO) to provide unbiased advantage estimation for multi-turn RL. To alleviate lazy optimization, we incorporate mismatch correction for training stability and introduce Profiling-based Rewards (PR) and Profiling-based Rejection Sampling (PRS). The trained model, Dr.Kernel-14B, reaches performance competitive with Claude-4.5-Sonnet on KernelBench. Finally, we study sequential test-time scaling for Dr.Kernel-14B. On the KernelBench Level-2 subset, 31.6% of the generated kernels achieve at least a 1.2x speedup over the Torch reference, surpassing Claude-4.5-Sonnet (26.7%) and GPT-5 (28.6%). When selecting the best candidate across all turns, this 1.2x speedup rate further increases to 47.8%. All resources, including the environment, training code, models, and dataset, are available at https://www.github.com/hkust-nlp/KernelGYM.
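The leave-one-out construction behind TRLOO is simple to state at the group level: each sample's baseline is the mean reward of the other samples, so its own reward never appears in its baseline; a minimal sketch of the estimator (the per-turn application the paper describes is elided):

```python
def leave_one_out_advantages(rewards: list[float]) -> list[float]:
    """RLOO-style advantage: r_i minus the mean of the other k-1 rewards.
    Using the full group mean (as in GRPO) includes r_i in its own baseline,
    which is the self-inclusion bias the abstract says TRLOO removes."""
    k, total = len(rewards), sum(rewards)
    assert k >= 2, "need at least two samples for a leave-one-out baseline"
    return [r - (total - r) / (k - 1) for r in rewards]

print(leave_one_out_advantages([1.0, 0.0, 0.0, 0.0]))
# [1.0, -0.333..., -0.333..., -0.333...] vs. the full-mean baseline's [0.75, -0.25, ...]
```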
- Steering LLMs via Scalable Interactive Oversight
As Large Language Models increasingly automate complex, long-horizon tasks such as vibe coding, a supervision gap has emerged. While models excel at execution, users often struggle to guide them effectively due to insufficient domain expertise, the difficulty of articulating precise intent, and the inability to reliably validate complex outputs. This presents a critical challenge in scalable oversight: enabling humans to responsibly steer AI systems on tasks that surpass their own ability to specify or verify. To tackle this, we propose Scalable Interactive Oversight, a framework that decomposes complex intent into a recursive tree of manageable decisions to amplify human supervision. Rather than relying on open-ended prompting, our system elicits low-burden feedback at each node and recursively aggregates these signals into precise global guidance. Validated on web-development tasks, our framework enables non-experts to produce expert-level Product Requirement Documents, achieving a 54% improvement in alignment. Crucially, we demonstrate that this framework can be optimized via Reinforcement Learning using only online user feedback, offering a practical pathway for maintaining human control as AI scales.
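The recursive decomposition can be sketched as a tree whose leaves collect low-burden answers and whose internal nodes fold them back into global guidance; a minimal sketch (the node structure and string-concatenation aggregation are illustrative assumptions, not the paper's mechanism):

```python
from dataclasses import dataclass, field

@dataclass
class DecisionNode:
    question: str
    children: list["DecisionNode"] = field(default_factory=list)
    answer: str | None = None  # filled by low-burden user feedback at leaves

    def aggregate(self) -> str:
        """Recursively fold leaf answers into guidance for the parent intent."""
        if not self.children:
            return f"{self.question}: {self.answer or 'unanswered'}"
        parts = [child.aggregate() for child in self.children]
        return f"{self.question} -> [" + "; ".join(parts) + "]"

root = DecisionNode("Build a storefront", [
    DecisionNode("Target users?", answer="hobbyist sellers"),
    DecisionNode("Payments", [DecisionNode("Which providers?", answer="card only")]),
])
print(root.aggregate())
```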
- Semantic Search over 9 Million Mathematical Theorems
Searching for mathematical results remains difficult: most existing tools retrieve entire papers, while mathematicians and theorem-proving agents often seek a specific theorem, lemma, or proposition that answers a query. While semantic search has seen rapid progress, its behavior on large, highly technical corpora such as research-level mathematical theorems remains poorly understood. In this work, we introduce and study semantic theorem retrieval at scale over a unified corpus of 9.2 million theorem statements extracted from arXiv and seven other sources, representing the largest publicly available corpus of human-authored, research-level theorems. We use a short natural-language description of each theorem as its retrieval representation and systematically analyze how representation context, language model choice, embedding model, and prompting strategy affect retrieval quality. On a curated evaluation set of theorem-search queries written by professional mathematicians, our approach substantially improves both theorem-level and paper-level retrieval compared to existing baselines, demonstrating that semantic theorem search is feasible and effective at web scale. The theorem search tool is available at https://huggingface.co/spaces/uw-math-ai/theorem-search, and the dataset is available at https://huggingface.co/datasets/uw-math-ai/TheoremSearch.
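At its core this is embedding search over short theorem descriptions; a minimal sketch with a stubbed embedder (the `embed` stub, toy corpus, and unit-vector cosine scoring are illustrative assumptions standing in for the paper's embedding models):

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stub embedder: hash-seeded unit vector, deterministic within a run.
    Replace with a real sentence-embedding model; only the interface matters."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

corpus = {
    "thm_1": "Every finite group of prime order is cyclic.",
    "thm_2": "A continuous function on a compact set attains its maximum.",
}
matrix = np.stack([embed(d) for d in corpus.values()])  # one row per theorem

def search(query: str, top_k: int = 1) -> list[str]:
    scores = matrix @ embed(query)                      # cosine (unit vectors)
    ids = list(corpus)
    return [ids[i] for i in np.argsort(-scores)[:top_k]]

# With the stub embedder the ranking is arbitrary; a real model ranks thm_2 first.
print(search("extreme value theorem"))
```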
- Privileged Information Distillation for Language Models
Training-time privileged information (PI) can enable language models to succeed on tasks they would otherwise fail, making it a powerful tool for reinforcement learning in hard, long-horizon settings. However, transferring capabilities learned with PI to policies that must act without it at inference time remains a fundamental challenge. We study this problem in the context of distilling frontier models for multi-turn agentic environments, where closed-source systems typically hide their internal reasoning and expose only action trajectories. This breaks standard distillation pipelines, since successful behavior is observable but the reasoning process is not. To address this, we introduce π-Distill, a joint teacher-student objective that trains a PI-conditioned teacher and an unconditioned student simultaneously using the same model. We also introduce On-Policy Self-Distillation (OPSD), an alternative approach that trains with Reinforcement Learning (RL) under a reverse-KL penalty between the student and the PI-conditioned teacher. We show that both algorithms effectively distill frontier agents using action-only PI. Specifically, we find that π-Distill, and in some cases OPSD, outperform the industry-standard practice (supervised fine-tuning followed by RL), which assumes access to full Chain-of-Thought supervision, across multiple agentic benchmarks, models, and forms of PI. We complement our results with extensive analysis that characterizes the factors enabling effective learning with PI, focusing primarily on π-Distill and characterizing when OPSD is competitive.
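The reverse-KL penalty in OPSD can be written directly over next-token distributions; a minimal sketch of that penalty (tensor shapes and the beta coefficient are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def reverse_kl_penalty(student_logits: torch.Tensor,
                       teacher_logits: torch.Tensor,
                       beta: float = 0.1) -> torch.Tensor:
    """KL(student || teacher) per position, averaged: penalizes the student
    for putting mass where the PI-conditioned teacher does not."""
    logp_s = F.log_softmax(student_logits, dim=-1)
    logp_t = F.log_softmax(teacher_logits, dim=-1)
    kl = (logp_s.exp() * (logp_s - logp_t)).sum(dim=-1)  # reverse KL
    return beta * kl.mean()

s = torch.randn(2, 5, 32)   # (batch, seq, vocab) student logits
t = torch.randn(2, 5, 32)   # teacher logits conditioned on privileged info
print(float(reverse_kl_penalty(s, t)))
```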
- Grounding and Enhancing Informativeness and Utility in Dataset Distillation
Dataset Distillation (DD) seeks to create a compact dataset from a large, real-world dataset. While recent methods often rely on heuristic approaches to balance efficiency and quality, the fundamental relationship between original and synthetic data remains underexplored. This paper revisits knowledge-distillation-based dataset distillation within a solid theoretical framework. We introduce the concepts of Informativeness and Utility, capturing crucial information within a sample and essential samples in the training set, respectively. Building on these principles, we define optimal dataset distillation mathematically. We then present InfoUtil, a framework that balances informativeness and utility in synthesizing the distilled dataset. InfoUtil incorporates two key components: (1) game-theoretic informativeness maximization using Shapley Value attribution to extract key information from samples, and (2) principled utility maximization by selecting globally influential samples based on gradient norm. These components ensure that the distilled dataset is both informative and utility-optimized. Experiments demonstrate that our method achieves a 6.1% performance improvement over the previous state-of-the-art approach on the ImageNet-1K dataset using ResNet-18.
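The gradient-norm side of InfoUtil, selecting globally influential samples, can be sketched with per-sample gradient norms; a minimal sketch (the model, loss, and top-k rule are illustrative assumptions, and the Shapley-based informativeness component is not shown):

```python
import torch

def grad_norm_topk(model: torch.nn.Module, samples: list[torch.Tensor],
                   targets: list[torch.Tensor], k: int) -> list[int]:
    """Rank samples by the L2 norm of the loss gradient they induce;
    larger norms mark samples with more influence on training."""
    norms = []
    for x, y in zip(samples, targets):
        model.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x.unsqueeze(0)),
                                                 y.unsqueeze(0))
        loss.backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        norms.append(float(g.norm()))
    return sorted(range(len(samples)), key=lambda i: -norms[i])[:k]

model = torch.nn.Linear(8, 3)
xs = [torch.randn(8) for _ in range(5)]
ys = [torch.tensor(i % 3) for i in range(5)]
print(grad_norm_topk(model, xs, ys, k=2))  # indices of the two most influential samples
```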
- SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers
Large language models (LLMs) are increasingly evaluated in interactive environments to test their social intelligence. However, existing benchmarks often assume idealized communication between agents, limiting our ability to diagnose whether LLMs can maintain and repair interactions in more realistic, imperfect settings. To close this gap, we present SocialVeil, a social learning environment that simulates social interaction under cognitive-difference-induced communication barriers. Grounded in a systematic literature review of communication challenges in human interaction, SocialVeil introduces three representative types of such disruption: semantic vagueness, sociocultural mismatch, and emotional interference. We also introduce two barrier-aware evaluation metrics, unresolved confusion and mutual understanding, to evaluate interaction quality under impaired communication. Experiments across 720 scenarios and four frontier LLMs show that barriers consistently impair performance, with mutual understanding reduced by over 45% on average and confusion elevated by nearly 50%. Human evaluations validate the fidelity of these simulated barriers (ICC ≈ 0.78, Pearson r ≈ 0.80). We further demonstrate that adaptation strategies (repair instructions and interactive learning) have only a modest effect, remaining far from barrier-free performance. This work takes a step toward bringing social interaction environments closer to real-world communication, opening opportunities for exploring the social intelligence of LLM agents.
- Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities
Despite strong performance on existing benchmarks, it remains unclear whether large language models can reason over genuinely novel scientific information. Most evaluations score end-to-end RAG pipelines, where reasoning is confounded with retrieval and toolchain choices, and the signal is further contaminated by parametric memorization and open-web volatility. We introduce DeR2, a controlled deep-research sandbox that isolates document-grounded reasoning while preserving the core difficulties of deep search: multi-step synthesis, denoising, and drawing evidence-based conclusions. DeR2 decouples evidence access from reasoning via four regimes: Instruction-only; Concepts (gold concepts without documents); Related-only (only relevant documents); and Full-set (relevant documents plus topically related distractors). These yield interpretable regime gaps that operationalize retrieval loss vs. reasoning loss and enable fine-grained error attribution. To prevent parametric leakage, we apply a two-phase validation that requires parametric failure without evidence while ensuring oracle-concept solvability. To ensure reproducibility, each instance provides a frozen document library (drawn from 2023-2025 theoretical papers) with expert-annotated concepts and validated rationales. Experiments across a diverse set of state-of-the-art foundation models reveal substantial variation and significant headroom: some models exhibit mode-switch fragility, performing worse with the Full-set than with Instruction-only, while others show structural concept misuse, correctly naming concepts but failing to execute them as procedures.
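The four regimes invite simple difference-based attribution; one plausible and purely illustrative reading of the regime gaps as arithmetic, not necessarily the paper's exact definitions:

```python
def regime_gaps(acc: dict[str, float]) -> dict[str, float]:
    """Attribute errors across DeR2's four evidence regimes.
    Keys: 'instruction', 'concepts', 'related', 'full' (accuracy per regime)."""
    return {
        # drop from gold concepts to raw documents: reasoning-over-evidence cost
        "reasoning_loss": acc["concepts"] - acc["related"],
        # drop from clean documents to documents plus distractors: retrieval/denoising cost
        "retrieval_loss": acc["related"] - acc["full"],
        # mode-switch fragility: evidence actively hurting vs. no evidence at all
        "mode_switch_gap": acc["instruction"] - acc["full"],
    }

print(regime_gaps({"instruction": 0.18, "concepts": 0.71, "related": 0.55, "full": 0.34}))
```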
Solidot (15)
- Scientists demonstrate device-independent quantum key distribution over 100 km of fiber
Researchers at the University of Science and Technology of China report in the journal Science a demonstration of device-independent quantum key distribution over 100 km of optical fiber. The results show the approach can secure encrypted communications at metropolitan scale, a transmission distance far beyond previous results, helping close the gap between proof-of-principle quantum network experiments and practical applications. Quantum key distribution (QKD) is a frontier application of quantum technology that enables exceptionally secure digital communication. Early forms of QKD guaranteed security by relying on trusted devices, but these have technical limitations and vulnerabilities. A more advanced approach is device-independent QKD (DI-QKD), whose security derives directly from fundamental quantum phenomena without having to trust the inner workings of the quantum devices.
- HBO to produce a Baldur's Gate TV series
HBO will produce a Baldur's Gate TV series, with Craig Mazin, producer of The Last of Us, serving as the show's creator, writer, and executive producer, and Chris Perkins, former story lead at Baldur's Gate rights holder Wizards of the Coast, serving as consultant. The series will tell a story set after Baldur's Gate 3. Mazin plans to invite the voice actors of Baldur's Gate 3 to take part, similar to his approach on The Last of Us. It is unclear whether Baldur's Gate 3 developer Larian Studios will be involved; the studio is working on a new entry in its Divinity series. Mazin says he has put 1,000 hours into Baldur's Gate 3 and that continuing its story is a dream come true.
- Chinese makers take 60% of Japan's TV market
According to research-firm data, REGZA, 95% owned by Hisense, led Japan's domestic TV market in 2025, with Hisense and TCL together taking half the market. If the Sony brand moves to a TCL-led joint venture, Chinese-affiliated brands will hold 60%. In the global TV market, Samsung leads, and Samsung, LG Electronics, Hisense, and TCL together hold more than half of worldwide share. Panasonic is the only major Japanese TV maker left, and it too is divesting its TV business; its low-end models are already made under contract by TCL. Japanese companies are at a disadvantage in scale and supply chains, making it hard to build consumer-electronics businesses around hardware.
- Overall pesticide toxicity continues to rise
The 15th UN Biodiversity Conference (COP15) set a target of halving pesticide use and associated risks by 2030, encouraging organic and low-toxicity pesticides. But according to a new study published in Science, the overall toxicity of applied pesticides is still rising, and without change the 2030 target may be unattainable. The results show that the total ecological toxicity imposed by pesticides is increasing globally, a trend visible across many countries, crops, and species groups. Overall, total applied toxicity is dominated by a small number of highly toxic chemicals; pesticides used on fruit and vegetables, maize, soybeans, cereals, and rice account for 76-83% of global pesticide toxicity. China, Brazil, the United States, and India together contribute 53-68% of total applied toxicity.
- More Android devices to support Apple's AirDrop
Google surprised everyone last year by making Quick Share interoperate with Apple's AirDrop, but so far only Pixel 10-series Android phones support the feature. Eric Kay, Google's VP of engineering for the Android platform, says more Android devices will support it in 2026, and that Google is working with partners to extend AirDrop support across the Android ecosystem. Among Android makers, only Nothing has confirmed it is working on the feature. Kay also says Google is redoubling efforts to make it easier for Apple iPhone users to switch to Android.
- European Commission tests Matrix as a Teams replacement
The European Commission is testing Matrix as a replacement for Microsoft's Teams. Matrix is an end-to-end encrypted, decentralized messaging system with no central server that interoperates with other platforms through bridges; the project is maintained by a London-based nonprofit. The French government, German healthcare institutions, and European militaries already use communication tools built on the Matrix protocol. The EU is strengthening its digital sovereignty and reducing dependence on US tech companies. The Commission uses Teams extensively and currently has no plan to replace it with Matrix; rather, Matrix is being treated as a fallback. The other fallback option is Signal, which is considered insufficiently flexible.
- Substack warns users of a data breach
Substack has notified users of a data breach. The breach occurred in October 2025, but Substack only discovered it this week. CEO Chris Best said an unauthorized third party accessed some user data, including email addresses, phone numbers, and other internal metadata; credit card numbers, passwords, and financial information were not accessed. Substack did not disclose how many users were affected. On Monday, a hacker leaked a Substack database containing 697,313 records on the BreachForums forum. Substack is very popular with journalists and content creators, and as of March 2025 had 5 million paid subscribers.
- CIA stops publishing the World Factbook
The CIA has announced it will stop publishing the World Factbook, without explaining why; the decision may be related to the Trump administration's budget cuts to government agencies. The World Factbook is a CIA reference publication covering countries and regions worldwide, with statistics on population, geography, politics, the economy, and more. The CIA first released an unclassified version to the public in 1975, and an online edition has existed since 1997. The statistics, maps, and images in the report are in the public domain, so anyone may quote or reproduce them freely without CIA approval as long as the source is credited, which is why its data is widely cited by journalists and scholars.
- TSMC plans to produce 3 nm chips in Japan
TSMC chairman and CEO C.C. Wei says the company is considering producing Japan's first cutting-edge 3-nanometer semiconductors at the second fab currently under construction in Kumamoto Prefecture. TSMC is building the second fab in Kikuyo, Kumamoto, which was originally slated to produce 6-nanometer chips; talks on changing that plan will now begin. The first fab, adjacent to the second fab's site, currently produces 12- to 28-nanometer chips and began volume production in December 2024.
- Guinea worm disease nears complete eradication
The Carter Center announced that Guinea worm disease is approaching eradication, with preliminary figures showing only 10 human cases worldwide in 2025. If eradicated, it would become the second disease wiped out by humanity, after smallpox. The Guinea worm (Dracunculus medinensis) is a waterborne parasitic nematode. When a person drinks water contaminated with the parasite, it burrows out of the intestine and migrates through the body. Infected people initially have no symptoms. About a year later, the female worm forms a blister on the skin of a lower limb, and roughly eight weeks after that a worm the length of a strand of spaghetti emerges from it. Beyond excruciating pain, the disease can cause complications such as secondary infections and sepsis, leading to temporary or permanent disability. The Guinea worm eradication program launched in 1986, when an estimated 3.5 million cases occurred across 21 countries in Africa and Asia. Cases fell to 15 in 2024, and the 10 cases in 2025 were in Chad (4), Ethiopia (4), and South Sudan (2). Full eradication also requires eliminating infections in animals, which numbered several hundred in 2025: Chad (147), Mali (17), Cameroon (445), Angola (70), Ethiopia (1), and South Sudan (3).
- Microsoft has a big AI problem
Copilot is the core of Microsoft's AI strategy, but compared with OpenAI's ChatGPT or Google's Gemini, Microsoft's big problem is that it cannot retain users: the Copilot experience is poor. Microsoft said last week it has sold 15 million Microsoft 365 Copilot "seats" (i.e., users), while the Microsoft 365 business has more than 450 million total paid seats, meaning only about 3.3% of Microsoft 365 subscribers buy Copilot. Late last year the company said Copilot had more than 150 million monthly active users across its first-party platforms. Google Gemini has more than 650 million monthly active users, and ChatGPT about 900 million weekly active users. Unpublished data show Copilot users increasingly turning to competing products. According to a survey of more than 150,000 US respondents by market research firm Recon Analytics, the share of subscribers naming Copilot as their preferred tool fell from 18.8% to 11.5% between last July and the end of January, while the share of paying users preferring Google Gemini rose from 12.8% to 15.7%. Former Copilot users who switched say other tools are higher quality; Copilot users report a poor experience and many usage restrictions. Both ChatGPT and Gemini users show higher willingness to pay than Copilot users. Even at companies that buy Copilot seats, actual usage is often only around 10%. Copilot also suffers from a confusing array of versions and from interoperability problems.
- TV-box piracy services thrive again in the streaming era
Platform exclusives in streaming have scattered the content users want across different services, and subscribing to all of them is uneconomical and impractical for most users. This has driven a resurgence of one-stop pirate streaming services. Even the US, the largest entertainment market, has seen TV-box-based pirate streaming services emerge. At the core are two TV boxes made by Chinese companies, SuperBox and vSeeBox. The boxes themselves carry no pirate services and can thus be sold legally, but they steer users to pirate streaming apps unavailable in official app stores: vSeeBox points to Heat, and SuperBox to Blue TV. These apps let users watch 6,000-8,000 channels, including paid sports channels and hundreds of local TV stations. US companies are working to crack down on these pirate services.
- Russian hackers rapidly exploited a critical Office flaw patched in an emergency Microsoft update
Microsoft released an emergency update on January 26 to fix the critical Office vulnerability CVE-2026-21509. Within 48 hours, a Russian hacking group had reverse-engineered the patch and begun exploiting the flaw in large-scale phishing attacks, compromising diplomatic, maritime, and transport organizations in multiple countries. Researchers at security firm Trellix found the phishing campaign ran for 72 hours, with the group known as APT28 (aka Fancy Bear, Sednit, Forest Blizzard, and Sofacy) sending at least 29 malicious emails to nine countries, mostly in Eastern Europe. The targeted countries included Poland, Slovenia, Turkey, Greece, the UAE, Ukraine, Romania, and Bolivia, and the targeted organizations included defense ministries (40%), transport/logistics operators (35%), and diplomatic institutions (25%). The attackers used the not-yet-patched flaw to install two new backdoors, BeardShell and NotDoor. BeardShell is mainly used for reconnaissance and runs in memory without leaving traces on disk; NotDoor is a VBA macro that monitors email folders.
- Wealthy men show higher metabolic rates in brain reward and stress regions
According to a study published in the European Journal of Neuroscience, researchers at Pusan National University School of Medicine in South Korea analyzed positron emission tomography (PET) data from 233 healthy men undergoing health checkups; their average age was 43, average annual household income was $61,319, and average education was 13-14 years. Combining household income and education data, the researchers found that men from high-income households showed higher glucose metabolism in the caudate nucleus, putamen, anterior cingulate, hippocampus, and amygdala. These brain regions are directly or indirectly involved in the brain's reward circuitry. This is a correlational, not causal, study. Education level was not associated with regional metabolic patterns.
- Microsoft finally integrates the Sysmon monitoring tool into Windows
Microsoft has made good on its promise to integrate the well-known system monitoring tool Sysmon into Windows. Sysmon is part of the Sysinternals suite used to manage and monitor Windows systems; Microsoft acquired Sysinternals in 2006. Integrating Sysmon directly into the operating system will simplify administrators' work. The feature ships in Windows Insider builds 26300.7733 (Dev channel) and 26220.7752 (Beta channel). It is disabled by default; users must enable it with PowerShell, and any previously installed Sysmon must be uninstalled before enabling it.