OrangeBot.AI Digest — 2026-01-26

49 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. When AI 'builds a browser,' check the repo before believing the hype (www.theregister.com)
  2. JuiceSSH – Give me my pro features back (nproject.io)
  3. Google Books removed all search functions for any books with previews (old.reddit.com)
  4. Fedora Asahi Remix is now working on Apple M3 (bsky.app)
  5. Television is 100 years old today (diamondgeezer.blogspot.com)
  6. France Aiming to Replace Zoom, Google Meet, Microsoft Teams, etc. (twitter.com)
  7. Qwen3-Max-Thinking (qwen.ai)
  8. Google AI Overviews cite YouTube more than any medical site for health queries (www.theguardian.com)
  9. Apple introduces new AirTag with longer range and improved findability (www.apple.com)
  10. After two years of vibecoding, I'm back to writing by hand (atmoio.substack.com)
  11. Vibe coding kills open source (arxiv.org)
  12. MapLibre Tile: a modern and efficient vector tile format (maplibre.org)
  13. The Holy Grail of Linux Binary Compatibility: Musl and Dlopen (github.com)
  14. Things I've learned in my 10 years as an engineering manager (www.jampa.dev)
  15. The browser is the sandbox (simonwillison.net)

GitHub Trending (8)

  1. Blaizzy / mlx-audio

    A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

  2. VectifyAI / PageIndex

    📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG

  3. supermemoryai / supermemory

    Memory engine and app that is extremely fast, scalable. The Memory API for the AI era.

  4. block / goose

    an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM

  5. remotion-dev / remotion

    🎥 Make videos programmatically with React

  6. AI4Finance-Foundation / FinRobot

    FinRobot: An Open-Source AI Agent Platform for Financial Analysis using LLMs 🚀 🚀 🚀

  7. k4yt3x / video2x

    A machine learning-based video super resolution and frame interpolation framework. Est. Hack the Valley II, 2018.

  8. business-science / ai-data-science-team

    An AI-powered data science team of agents to help you perform common data science tasks 10X faster.

Hugging Face (15)

  1. LongCat-Flash-Thinking-2601 Technical Report

    We introduce LongCat-Flash-Thinking-2601, a 560-billion-parameter open-source Mixture-of-Experts (MoE) reasoning model with superior agentic reasoning capability. LongCat-Flash-Thinking-2601 achieves state-of-the-art performance among open-source models on a wide range of agentic benchmarks, including agentic search, agentic tool use, and tool-integrated reasoning. Beyond benchmark performance, the model demonstrates strong generalization to complex tool interactions and robust behavior under noisy real-world environments. Its advanced capability stems from a unified training framework that combines domain-parallel expert training with subsequent fusion, together with an end-to-end co-design of data construction, environments, algorithms, and infrastructure spanning from pre-training to post-training. In particular, the model's strong generalization capability in complex tool use is driven by our in-depth exploration of environment scaling and principled task construction. To optimize long-tailed, skewed generation and multi-turn agentic interactions, and to enable stable training across over 10,000 environments spanning more than 20 domains, we systematically extend our asynchronous reinforcement learning framework, DORA, for stable and efficient large-scale multi-environment training. Furthermore, recognizing that real-world tasks are inherently noisy, we conduct a systematic analysis and decomposition of real-world noise patterns, and design targeted training procedures to explicitly incorporate such imperfections into the training process, resulting in improved robustness for real-world applications. To further enhance performance on complex reasoning tasks, we introduce a Heavy Thinking mode that enables effective test-time scaling by jointly expanding reasoning depth and width through intensive parallel thinking.

  2. SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents

    LLM agents have demonstrated remarkable capabilities in software development, but their performance is hampered by long interaction contexts, which incur high API costs and latency. While various context compression approaches such as LongLLMLingua have emerged to tackle this challenge, they typically rely on fixed metrics such as PPL, ignoring the task-specific nature of code understanding. As a result, they frequently disrupt syntactic and logical structure and fail to retain critical implementation details. In this paper, we propose SWE-Pruner, a self-adaptive context pruning framework tailored for coding agents. Drawing inspiration from how human programmers "selectively skim" source code during development and debugging, SWE-Pruner performs task-aware adaptive pruning for long contexts. Given the current task, the agent formulates an explicit goal (e.g., "focus on error handling") as a hint to guide the pruning targets. A lightweight neural skimmer (0.6B parameters) is trained to dynamically select relevant lines from the surrounding context given the goal. Evaluations across four benchmarks and multiple models validate SWE-Pruner's effectiveness in various scenarios, achieving 23-54% token reduction on agent tasks like SWE-Bench Verified and up to 14.84x compression on single-turn tasks like LongCodeQA with minimal performance impact.
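    The pruning loop the abstract describes can be sketched in a few lines. This is a toy illustration, not the paper's code: the learned 0.6B skimmer is replaced by a naive keyword-overlap score, and the `prune_context` helper, goal string, and threshold are all assumptions.

```python
def relevance(line: str, goal: str) -> float:
    """Toy stand-in for the learned skimmer: share of goal words found in the line."""
    words = goal.lower().split()
    hits = sum(w in line.lower() for w in words)
    return hits / max(len(words), 1)

def prune_context(context: str, goal: str, threshold: float = 0.25) -> str:
    """Keep only the lines the skimmer scores as relevant to the current goal."""
    kept = [ln for ln in context.splitlines() if relevance(ln, goal) >= threshold]
    return "\n".join(kept)

source = """\
def load(path):
    return open(path).read()

def save(path, data):
    try:
        open(path, 'w').write(data)
    except OSError as err:
        log_error(err)
"""

# Explicit goal hint, as in the abstract's example: focus on error handling.
pruned = prune_context(source, "error handling")
print(pruned)
```

    A real system would replace `relevance` with a model call; the surrounding loop (score each line against the goal, keep the winners) is the part the abstract describes.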

  3. TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers

    Standard Vision-Language-Action (VLA) models typically fine-tune a monolithic Vision-Language Model (VLM) backbone explicitly for robotic control. However, this approach creates a critical tension between maintaining high-level general semantic understanding and learning low-level, fine-grained sensorimotor skills, often leading to "catastrophic forgetting" of the model's open-world capabilities. To resolve this conflict, we introduce TwinBrainVLA, a novel architecture that coordinates a generalist VLM retaining universal semantic understanding and a specialist VLM dedicated to embodied proprioception for joint robotic control. TwinBrainVLA synergizes a frozen "Left Brain", which retains robust general visual reasoning, with a trainable "Right Brain", specialized for embodied perception, via a novel Asymmetric Mixture-of-Transformers (AsyMoT) mechanism. This design allows the Right Brain to dynamically query semantic knowledge from the frozen Left Brain and fuse it with proprioceptive states, providing rich conditioning for a Flow-Matching Action Expert to generate precise continuous controls. Extensive experiments on SimplerEnv and RoboCasa benchmarks demonstrate that TwinBrainVLA achieves superior manipulation performance compared to state-of-the-art baselines while explicitly preserving the comprehensive visual understanding capabilities of the pre-trained VLM, offering a promising direction for building general-purpose robots that simultaneously achieve high-level semantic understanding and low-level physical dexterity.

  4. VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

    Modern Vision-Language Models (VLMs) remain poorly characterized in multi-step visual interactions, particularly in how they integrate perception, memory, and action over long horizons. We introduce VisGym, a gymnasium of 17 environments for evaluating and training VLMs. The suite spans symbolic puzzles, real-image understanding, navigation, and manipulation, and provides flexible controls over difficulty, input representation, planning horizon, and feedback. We also provide multi-step solvers that generate structured demonstrations, enabling supervised finetuning. Our evaluations show that all frontier models struggle in interactive settings, achieving low success rates in both the easy (46.6%) and hard (26.0%) configurations. Our experiments reveal notable limitations: models struggle to effectively leverage long context, performing worse with an unbounded history than with truncated windows. Furthermore, we find that several text-based symbolic tasks become substantially harder once rendered visually. However, explicit goal observations, textual feedback, and exploratory demonstrations in partially observable or unknown-dynamics settings for supervised finetuning yield consistent gains, highlighting concrete failure modes and pathways for improving multi-step visual decision-making. Code, data, and models can be found at: https://visgym.github.io/.

  5. Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

    Recent advances in Deep Research Agents (DRAs) are transforming automated knowledge discovery and problem-solving. While the majority of existing efforts focus on enhancing policy capabilities via post-training, we propose an alternative paradigm: self-evolving the agent's ability by iteratively verifying the policy model's outputs, guided by meticulously crafted rubrics. This approach gives rise to the inference-time scaling of verification, wherein an agent self-improves by evaluating its generated answers to produce iterative feedback and refinements. We derive the rubrics based on an automatically constructed DRA Failure Taxonomy, which systematically classifies agent failures into five major categories and thirteen sub-categories. We present DeepVerifier, a rubrics-based outcome reward verifier that leverages the asymmetry of verification and outperforms vanilla agent-as-judge and LLM judge baselines by 12%-48% in meta-evaluation F1 score. To enable practical self-evolution, DeepVerifier integrates as a plug-and-play module during test-time inference. The verifier produces detailed rubric-based feedback, which is fed back to the agent for iterative bootstrapping, refining responses without additional training. This test-time scaling delivers 8%-11% accuracy gains on challenging subsets of GAIA and XBench-DeepResearch when powered by capable closed-source LLMs. Finally, to support open-source advancement, we release DeepVerifier-4K, a curated supervised fine-tuning dataset of 4,646 high-quality agent steps focused on DRA verification. These examples emphasize reflection and self-critique, enabling open models to develop robust verification capabilities.

  6. Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory

    Recent foundational video-to-video diffusion models have achieved impressive results in editing user provided videos by modifying appearance, motion, or camera movement. However, real-world video editing is often an iterative process, where users refine results across multiple rounds of interaction. In this multi-turn setting, current video editors struggle to maintain cross-consistency across sequential edits. In this work, we tackle, for the first time, the problem of cross-consistency in multi-turn video editing and introduce Memory-V2V, a simple, yet effective framework that augments existing video-to-video models with explicit memory. Given an external cache of previously edited videos, Memory-V2V employs accurate retrieval and dynamic tokenization strategies to condition the current editing step on prior results. To further mitigate redundancy and computational overhead, we propose a learnable token compressor within the DiT backbone that compresses redundant conditioning tokens while preserving essential visual cues, achieving an overall speedup of 30%. We validate Memory-V2V on challenging tasks including video novel view synthesis and text-conditioned long video editing. Extensive experiments show that Memory-V2V produces videos that are significantly more cross-consistent with minimal computational overhead, while maintaining or even improving task-specific performance over state-of-the-art baselines. Project page: https://dohunlee1.github.io/MemoryV2V

  7. Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow

    Reinforcement learning (RL) is essential for enhancing the complex reasoning capabilities of large language models (LLMs). However, existing RL training pipelines are computationally inefficient and resource-intensive, with the rollout phase accounting for over 70% of total training time. Quantized RL training, particularly using FP8 precision, offers a promising approach to mitigating this bottleneck. A commonly adopted strategy applies FP8 precision during rollout while retaining BF16 precision for training. In this work, we present the first comprehensive study of FP8 RL training and demonstrate that the widely used BF16-training + FP8-rollout strategy suffers from severe training instability and catastrophic accuracy collapse under long-horizon rollouts and challenging tasks. Our analysis shows that these failures stem from the off-policy nature of the approach, which introduces substantial numerical mismatch between training and inference. Motivated by these observations, we propose Jet-RL, an FP8 RL training framework that enables robust and stable RL optimization. The key idea is to adopt a unified FP8 precision flow for both training and rollout, thereby minimizing numerical discrepancies and eliminating the need for inefficient inter-step calibration. Extensive experiments validate the effectiveness of Jet-RL: our method achieves up to 33% speedup in the rollout phase, up to 41% speedup in the training phase, and a 16% end-to-end speedup over BF16 training, while maintaining stable convergence across all settings and incurring negligible accuracy degradation.
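    The train/rollout numerical mismatch the abstract identifies is easy to illustrate. Below, a crude value-snapping function stands in for FP8 casting (it is not real E4M3 arithmetic, purely an assumption for illustration): quantizing only the rollout makes the two phases disagree on the logits, while applying the same cast in both phases leaves no gap, since quantization to a fixed grid is idempotent.

```python
import numpy as np

def fake_fp8(x, levels=256):
    """Crude stand-in for FP8 casting: snap values to a coarse per-tensor grid."""
    scale = np.abs(x).max() / (levels / 2)
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
logits = rng.normal(size=1000)

# BF16-training + FP8-rollout: the two phases see different numerics,
# so the rollout policy is effectively off-policy w.r.t. the trained one.
rollout_logits = fake_fp8(logits)
mismatch = np.abs(rollout_logits - logits).max()

# Unified precision flow: both phases apply the same cast, so re-casting the
# already-quantized values changes nothing and the numerical gap vanishes.
unified = fake_fp8(logits)
unified_gap = np.abs(fake_fp8(unified) - unified).max()

print(f"mixed-precision gap: {mismatch:.4f}, unified gap: {unified_gap:.6f}")
```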

  8. SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer

    Diffusion Transformers have recently demonstrated remarkable performance in video generation. However, the long input sequences result in high computational latency due to the quadratic complexity of full attention. Various sparse attention mechanisms have been proposed. Training-free sparse attention is constrained by limited sparsity and thus offers modest acceleration, whereas training-based methods can reach much higher sparsity but demand substantial data and computation for training. In this work, we propose SALAD, introducing a lightweight linear attention branch in parallel with the sparse attention. By incorporating an input-dependent gating mechanism to finely balance the two branches, our method attains 90% sparsity and 1.72x inference speedup, while maintaining generation quality comparable to the full attention baseline. Moreover, our finetuning process is highly efficient, requiring only 2,000 video samples and 1,600 training steps with a batch size of 8.
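    The dual-branch design can be sketched as follows. This is an illustrative NumPy toy, not the paper's implementation: the positive feature map, the sigmoid gate parameterization, and the causal mask standing in for a sparsity pattern are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, mask):
    # Full attention restricted by a precomputed sparsity mask (True = keep).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ v

def linear_attention(q, k, v):
    # Kernelized linear attention: phi(q) (phi(k)^T v), avoiding the n x n matrix.
    phi = lambda x: np.maximum(x, 0) + 1e-6  # simple positive feature map
    kv = phi(k).T @ v
    norm = phi(q) @ phi(k).sum(axis=0, keepdims=True).T
    return (phi(q) @ kv) / norm

def salad_block(q, k, v, mask, w_gate):
    # Input-dependent gate in [0, 1] blends, per token, the sparse branch
    # against the cheap linear branch.
    gate = 1 / (1 + np.exp(-(q @ w_gate)))  # (n, 1) sigmoid gate
    return gate * sparse_attention(q, k, v, mask) + (1 - gate) * linear_attention(q, k, v)

n, d = 8, 4
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, n, d))
mask = np.tri(n, dtype=bool)  # causal pattern standing in for learned sparsity
w_gate = rng.normal(size=(d, 1))
out = salad_block(q, k, v, mask, w_gate)
print(out.shape)
```

    The point of the gate is that the linear branch absorbs the attention mass the sparse branch drops, which is what lets sparsity be pushed so high without quality loss.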

  9. GameTalk: Training LLMs for Strategic Conversation

    Strategic decision-making in multi-agent settings is a key challenge for large language models (LLMs), particularly when coordination and negotiation must unfold over extended conversations. While recent work has explored the use of LLMs in isolated decision tasks, little attention has been given to optimizing long-term objectives through dialogue. We introduce GameTalk, a framework for training LLMs to make strategic decisions via multi-turn interactions. Unlike prior work that focuses on single-turn objectives or static action prediction, we train LLMs to optimize a global objective across full conversations. We achieve this by adapting fine-tuning methods like GRPO, DPO, and STaR to incorporate reward signals that depend on the entire interaction. We evaluate this approach on a suite of increasingly complex games, designed to stress different aspects of reasoning, coordination, and opponent modeling. Our results show that GameTalk significantly outperforms untrained models, especially under reward shaping, with DPO consistently yielding the strongest gains. These findings position conversational fine-tuning as a promising path for LLMs to reason, negotiate, and act in interactive environments.

  10. Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain

    This paper presents Mecellem models, a framework for developing specialized language models for the Turkish legal domain through domain adaptation strategies. We make two contributions: (1) Encoder Model Pre-trained from Scratch: ModernBERT-based bidirectional encoders pre-trained on a Turkish-dominant corpus of 112.7 billion tokens. We implement a checkpoint selection strategy that evaluates downstream retrieval performance throughout training, revealing that optimal checkpoints achieve best retrieval scores before pre-training loss reaches its minimum. Our encoder models achieve top-3 rankings on the Turkish retrieval leaderboard, with smaller models (155M parameters) achieving comparable performance to larger reference models (307M-567M parameters). Our approach achieves 92.36% production efficiency compared to state-of-the-art models (embeddinggemma-300m: 100.00%, BAAI/bge-m3: 99.54%, newmindai/bge-m3-stsb: 94.38%), ranking fourth overall despite requiring less computational resources. SOTA models rely on multi-stage, computationally intensive training pipelines, making our single-stage pre-training followed by efficient post-training approach a cost-effective alternative; (2) Decoder Model with Continual Pre-training (CPT): Qwen3-1.7B and Qwen3-4B models adapted to the Turkish legal domain through controlled curriculum learning. Four-phase CPT with optimal sample ratios enables gradual transition from general language knowledge to specialized legal terminology and long-context reasoning. This approach achieves 36.2% perplexity reduction on Turkish legal text, demonstrating domain adaptation gains.

  11. MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences

    Recent advancements have expanded the role of Large Language Models in board games from playing agents to creative co-designers. However, a critical gap remains: current systems lack the capacity to offer constructive critique grounded in the emergent user experience. Bridging this gap is fundamental for harmonizing Human-AI collaboration, as it empowers designers to refine their creations via external perspectives while steering models away from biased or unpredictable outcomes. Automating critique for board games presents two challenges: inferring the latent dynamics connecting rules to gameplay without an explicit engine, and modeling the subjective heterogeneity of diverse player groups. To address these, we curate a dataset of 1,727 structurally corrected rulebooks and 150K reviews selected via quality scoring and facet-aware sampling. We augment this data with Mechanics-Dynamics-Aesthetics (MDA) reasoning to explicitly bridge the causal gap between written rules and player experience. We further distill player personas and introduce MeepleLM, a specialized model that internalizes persona-specific reasoning patterns to accurately simulate the subjective feedback of diverse player archetypes. Experiments demonstrate that MeepleLM significantly outperforms latest commercial models (e.g., GPT-5.1, Gemini3-Pro) in community alignment and critique quality, achieving a 70% preference rate in user studies assessing utility. MeepleLM serves as a reliable virtual playtester for general interactive systems, marking a pivotal step towards audience-aligned, experience-aware Human-AI collaboration.

  12. Endless Terminals: Scaling RL Environments for Terminal Agents

    Environments are the bottleneck for self-improving agents. Current terminal benchmarks were built for evaluation, not training; reinforcement learning requires a scalable pipeline, not just a dataset. We introduce Endless Terminals, a fully autonomous pipeline that procedurally generates terminal-use tasks without human annotation. The pipeline has four stages: generating diverse task descriptions, building and validating containerized environments, producing completion tests, and filtering for solvability. From this pipeline we obtain 3255 tasks spanning file operations, log management, data processing, scripting, and database operations. We train agents using vanilla PPO with binary episode-level rewards and a minimal interaction loop: no retrieval, multi-agent coordination, or specialized tools. Despite this simplicity, models trained on Endless Terminals show substantial gains: on our held-out dev set, Llama-3.2-3B improves from 4.0% to 18.2%, Qwen2.5-7B from 10.7% to 53.3%, and Qwen3-8B-openthinker-sft from 42.6% to 59.0%. These improvements transfer to held-out, human-curated benchmarks: on TerminalBench 2.0, Llama-3.2-3B improves from 0.0% to 2.2%, Qwen2.5-7B from 2.2% to 3.4%, and Qwen3-8B-openthinker-sft from 1.1% to 6.7%, in each case outperforming alternative approaches including models with more complex agentic scaffolds. These results demonstrate that simple RL succeeds when environments scale.

  13. ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch

    Chart reasoning is a critical capability for Vision Language Models (VLMs). However, the development of open-source models is severely hindered by the lack of high-quality training data. Existing datasets suffer from a dual challenge: synthetic charts are often simplistic and repetitive, while the associated QA pairs are prone to hallucinations and lack the reasoning depth required for complex tasks. To bridge this gap, we propose ChartVerse, a scalable framework designed to synthesize complex charts and reliable reasoning data from scratch. (1) To address the bottleneck of simple patterns, we first introduce Rollout Posterior Entropy (RPE), a novel metric that quantifies chart complexity. Guided by RPE, we develop complexity-aware chart coder to autonomously synthesize diverse, high-complexity charts via executable programs. (2) To guarantee reasoning rigor, we develop truth-anchored inverse QA synthesis. Diverging from standard generation, we adopt an answer-first paradigm: we extract deterministic answers directly from the source code, generate questions conditional on these anchors, and enforce strict consistency verification. To further elevate difficulty and reasoning depth, we filter samples based on model fail-rate and distill high-quality Chain-of-Thought (CoT) reasoning. We curate ChartVerse-SFT-600K and ChartVerse-RL-40K using Qwen3-VL-30B-A3B-Thinking as the teacher. Experimental results demonstrate that ChartVerse-8B achieves state-of-the-art performance, notably surpassing its teacher and rivaling the stronger Qwen3-VL-32B-Thinking.

  14. Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation

    Large Language Models (LLMs) face the "knowledge cutoff" challenge, where their frozen parametric memory prevents direct internalization of new information. While Supervised Fine-Tuning (SFT) is commonly used to update model knowledge, it often updates factual content without reliably improving the model's ability to use the newly incorporated information for question answering or decision-making. Reinforcement Learning (RL) is essential for acquiring reasoning skills; however, its high computational cost makes it impractical for efficient online adaptation. We empirically observe that the parameter updates induced by SFT and RL are nearly orthogonal. Based on this observation, we propose Parametric Skill Transfer (PaST), a framework that supports modular skill transfer for efficient and effective knowledge adaptation. By extracting a domain-agnostic Skill Vector from a source domain, we can linearly inject knowledge manipulation skills into a target model after it has undergone lightweight SFT on new data. Experiments on knowledge-incorporation QA (SQuAD, LooGLE) and agentic tool-use benchmarks (ToolBench) demonstrate the effectiveness of our method. On SQuAD, PaST outperforms the state-of-the-art self-editing SFT baseline by up to 9.9 points. PaST further scales to long-context QA on LooGLE with an 8.0-point absolute accuracy gain, and improves zero-shot ToolBench success rates by +10.3 points on average with consistent gains across tool categories, indicating strong scalability and cross-domain transferability of the Skill Vector.
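    The parameter arithmetic behind PaST-style skill transfer can be sketched directly, under the abstract's claim that SFT and RL updates are nearly orthogonal. The dict-of-arrays stand-in for model weights and the scaling factor `alpha` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def skill_vector(theta_rl, theta_base):
    """Domain-agnostic skill vector: RL-tuned weights minus the shared base weights."""
    return {k: theta_rl[k] - theta_base[k] for k in theta_base}

def inject_skill(theta_sft, vector, alpha=1.0):
    """Linearly add the skill vector to a target model after lightweight SFT on new data."""
    return {k: theta_sft[k] + alpha * vector[k] for k in theta_sft}

rng = np.random.default_rng(0)
theta_base = {"layer0": rng.normal(size=(4, 4))}      # shared pre-trained weights
theta_rl   = {"layer0": theta_base["layer0"] + 0.10}  # source domain after RL
theta_sft  = {"layer0": theta_base["layer0"] + 0.05}  # target domain after SFT only

v = skill_vector(theta_rl, theta_base)
theta_target = inject_skill(theta_sft, v)             # new knowledge + RL skill
```

    The transfer is cheap precisely because it is a single vector addition over the weights; no RL is run on the target domain.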

  15. DSGym: A Holistic Framework for Evaluating and Training Data Science Agents

    Data science agents promise to accelerate discovery and insight-generation by turning data into executable analyses and findings. Yet existing data science benchmarks fall short due to fragmented evaluation interfaces that make cross-benchmark comparison difficult, narrow task coverage and a lack of rigorous data grounding. In particular, we show that a substantial portion of tasks in current benchmarks can be solved without using the actual data. To address these limitations, we introduce DSGym, a standardized framework for evaluating and training data science agents in self-contained execution environments. Unlike static benchmarks, DSGym provides a modular architecture that makes it easy to add tasks, agent scaffolds, and tools, positioning it as a live, extensible testbed. We curate DSGym-Tasks, a holistic task suite that standardizes and refines existing benchmarks via quality and shortcut solvability filtering. We further expand coverage with (1) DSBio: expert-derived bioinformatics tasks grounded in literature and (2) DSPredict: challenging prediction tasks spanning domains such as computer vision, molecular prediction, and single-cell perturbation. Beyond evaluation, DSGym enables agent training via an execution-verified data synthesis pipeline. As a case study, we build a 2,000-example training set and train a 4B model in DSGym that outperforms GPT-4o on standardized analysis benchmarks. Overall, DSGym enables rigorous end-to-end measurement of whether agents can plan, implement, and validate data analyses in realistic scientific context.

Solidot (11)

  1. Why don't facial wounds leave noticeable scars?

    Surgeons have long noticed a puzzling phenomenon: identical surgical incisions tend to heal better on the face, leaving less visible scars, while cuts elsewhere on the body often leave prominent ones. A study published in Cell reveals the molecular mechanism behind this. Using mouse models, a Stanford University team found that facial skin's potential for "scarless healing" comes from fibroblasts deep in the skin that suppress the overproduction of scar tissue; a drug based on this discovery could one day let any wound heal without scarring. Fibroblasts are the main cell type in the dermis of human skin. The study notes that fibroblasts in the face and scalp originate from the neural crest in early embryonic development, whereas fibroblasts elsewhere in the body derive from the mesoderm, and this origin determines how the mature cells behave. The team found that neural-crest-derived facial fibroblasts highly express a protein called ROBO2 and its downstream factor EID1. This signaling pathway acts as a molecular brake, effectively blocking the function of a protein called EP300. In fibroblasts elsewhere in the body, EP300 can open up folded DNA, activating the genes that produce collagen and other scar components. In facial cells, however, ROBO2 and EID1 keep EP300 suppressed, and the DNA remains in a relatively "silent", compact state, so pro-fibrotic genes cannot easily be read and expressed. This keeps facial cells closer to their original stem-cell-like form, biasing them toward regenerative repair rather than patch-style scar repair. The researchers note that this makes evolutionary sense: for a wound on the torso, the organism's priority is survival, so it needs to seal the wound quickly against blood loss and infection, even at the cost of poorly functioning scar tissue. The face, by contrast, carries critical functions such as vision, hearing, smell, and eating; stiff scars there would seriously impair basic survival, so evolution gave the face a more refined regenerative capacity.

  2. Spotify lawsuit leads to seizure of Anna's Archive's main domains

    According to unsealed court documents, on December 29 Spotify, together with record labels including Universal Music Group (UMG), Sony, and Warner, filed suit against Anna's Archive in the US District Court for the Southern District of New York, alleging massive copyright infringement. On January 2 the court issued a temporary restraining order barring the hosting of the Anna's Archive site and the provision of domain services to it, and the order led to the seizure of Anna's Archive's .org and .se domains in succession. This followed Anna's Archive's release of 300 TB of scraped Spotify music files and metadata. The operators behind Anna's Archive initially assumed the domain seizures were unrelated to Spotify, but the court documents show otherwise.

  3. Iran is building a tiered internet

    Iran's internet is currently running in a strict whitelist mode, leaving the vast majority of people unable to reach the international internet. The system has been dubbed the "Barracks Internet". The tiering likely dates back to at least 2013, when Iran issued "white" SIM cards to roughly 16,000 people, granting them unrestricted access to the international internet. In November 2025, X/Twitter's location feature revealed that officials, including the Minister of Information and Communications Technology, were accessing the platform directly from inside Iran, even though X/Twitter has been blocked there since 2009. Since the Biden administration exempted Starlink service from sanctions in 2022, activists estimate that some 50,000 Starlink satellite terminals have been smuggled into Iran. The Iranian government says it has cut off 40,000 Starlink connections and jammed some terminals, though some terminals circumvented the blocking after firmware updates and continue to work. Even so, satellite broadband is vulnerable to signal jamming, meaning the Iranian government still holds the ultimate leverage.

  4. Iran's internet blackout reaches 17 days

    According to monitoring by NetBlocks, Iran's internet blackout has lasted 17 days, exceeding 400 hours. In recent days Iran's network has been in strict whitelist mode: only a small number of people can reach the international internet, and while traffic briefly spikes at times, for the vast majority of Iranians the international internet remains out of reach and highly unstable. Traceroutes show that traffic to Iranian websites is routed through autonomous systems in Russia (AS8631) and Azerbaijan (AS29049).

  5. Anticipated losses hit harder emotionally than anticipated gains

    A study finds that anticipating negative future outcomes carries a stronger emotional impact than anticipating positive ones, which helps explain why people tend to avoid uncertainty and make decisions quickly. The researchers found that the dread of a loss has six times the emotional impact of the pleasure from anticipating an equivalent gain. UK researchers analyzed data from nearly 14,000 participants collected between 1991 and 2024, tracking the emotional responses produced by expectations about future financial circumstances and how anticipated emotions shape decisions involving risk and delay. The study confirms that losses loom larger than gains: in short, the pain of anticipating a £10 loss far outweighs the pleasure of anticipating a £10 gain. The researchers also found that emotional responses vary between individuals, with some people reacting more strongly to anticipated outcomes than others.

  6. AI company executives differ sharply on AGI

    Google DeepMind CEO and Nobel laureate Demis Hassabis, who oversees development of Google's Gemini models, and Turing Award winner Yann LeCun both believe that large language models, however prominent, are not the path to AGI (artificial general intelligence). Hassabis says today's AI, while impressive, is still far from AGI; he puts the probability of achieving AGI within a decade at 50%. LeCun is more pessimistic, arguing that today's LLM-based AI will never reach AGI and that an entirely different approach is needed; in his view, LLMs succeed because language is simple. Anthropic CEO Dario Amodei is far more optimistic, predicting that AI models could take over all programming work within a year, produce Nobel-level research within two years, and eliminate half of white-collar jobs within five. OpenAI CEO Sam Altman has previously expressed similar views. For most business leaders, the key question remains when AI will deliver massive economic value.

  7. Microsoft ships a second emergency update to fix problems caused by the first

    Microsoft's routine security update earlier this month was something of a disaster, arguably creating more problems than it solved and forcing the software giant to release two emergency updates within a week. Following the routine security update, Microsoft released an emergency update on January 17 to fix Windows 11 23H2 machines failing to shut down and failing to accept remote logins. On January 24 it released a second emergency update to fix a problem, introduced by the previous one, that prevented cloud-connected apps such as Outlook, OneDrive, and Dropbox from running.

  8. EU wind and solar surpassed fossil fuels for the first time last year

    According to a report from Ember, wind and solar generated more electricity in the EU in 2025 than fossil fuels for the first time. The milestone was driven mainly by rapid growth in solar generation, whose share of EU electricity hit a record 13%; wind and solar together accounted for 30% of EU generation, edging out fossil fuels at 29%. The shift matters all the more because the EU's replacement for Russian LNG, the United States, is becoming increasingly unreliable and increasingly willing to weaponize economic tools. In more than half of EU member states, wind and solar now generate more electricity than fossil fuels.

  9. Tracking the reentry of the Shenzhou-15 orbital module by its sonic booms

    Over the past few years the amount of space debris reentering Earth's atmosphere has grown exponentially, and uncontrolled reentries pose a growing threat to human life, infrastructure, and the environment. In a paper published in Science, researchers report a method for detecting the shock waves (sonic booms) produced by reentering debris using publicly available data from ground-based seismic sensors. They validated the method by monitoring the April 2024 reentry of the Shenzhou-15 orbital module, which had been in orbital decay and regularly overflew major population centers on six continents. Using seismic data from sensors in Southern California and Nevada, the researchers analyzed the sonic booms produced as Shenzhou-15 reentered. The module's final observed reentry point was roughly 8,600 km from the location estimated by tracking and impact predictions. The researchers successfully reconstructed the spacecraft's ground track, speed, and altitude. The sonic-boom pattern also indicated that Shenzhou-15 did not come down in a single explosive event but likely fragmented gradually into smaller pieces, consistent with eyewitness reports and video footage.

  10. After the US withdrawal, California joins WHO disease alert network

    The day after the United States formally withdrew from the WHO, California joined the WHO's Global Outbreak Alert and Response Network (GOARN), becoming the first US state to rejoin the organization. California Governor Gavin Newsom said in a statement: "The Trump administration's withdrawal from the WHO is a reckless decision that will harm all Californians and Americans. California will not stand by and watch the chaos this decision brings. We will continue to strengthen cooperation globally and stay at the forefront of public health preparedness..."

  11. Microsoft gave the FBI BitLocker keys to unlock encrypted drives

    Microsoft recently provided the FBI with BitLocker keys to decrypt data on the hard drives of three laptops. Windows 11 enables BitLocker full-disk encryption by default, and the recovery key is uploaded to the user's Microsoft Account, that is, to Microsoft's cloud, where Microsoft, and by extension law enforcement, can access it to decrypt BitLocker-encrypted drives. The case relates to pandemic unemployment assistance fraud in Guam; the FBI applied for a search warrant six months after seizing the three BitLocker-encrypted laptops. Microsoft declined to comment, though it has previously said it receives on average about 20 requests a year to hand over BitLocker keys.