OrangeBot.AI Digest — 2026-01-18
54 headlines across 8 sources, aggregated for this day.
Hacker News (15)
- Police Invested Millions in Shadowy Phone-Tracking Software, Won't Say How It's Used (www.texasobserver.org)
- Flux 2 Klein pure C inference (github.com)
- Gaussian Splatting – A$AP Rocky "Helicopter" music video (radiancefields.com)
- Statement by Denmark, Finland, France, Germany, Netherlands, Norway, Sweden, UK (www.bundesregierung.de)
- A Social Filesystem (overreacted.io)
- The Nobel Prize and the Laureate Are Inseparable (www.nobelpeaceprize.org)
- Statement by Denmark, Finland, France, Germany, the Netherlands, Norway, Sweden, UK (www.presidentti.fi)
- Predicting OpenAI's ad strategy (ossa-ma.github.io)
- What is Plan 9? (fqa.9front.org)
- A free and open-source rootkit for Linux (lwn.net)
- Command-line Tools can be 235x Faster than your Hadoop Cluster (2014) (adamdrake.com)
- Consent-O-Matic (github.com)
- Iconify: Library of Open Source Icons (icon-sets.iconify.design)
- ThinkNext Design (thinknextdesign.com)
- jQuery 4 (blog.jquery.com)
GitHub Trending (9)
- iOfficeAI / AionUi
Free, local, open-source Cowork for Gemini CLI, Claude Code, Codex, Opencode, Qwen Code, Goose Cli, Auggie, and more | 🌟 Star if you like it!
- yt-dlp / yt-dlp
A feature-rich command-line audio/video downloader
- nautechsystems / nautilus_trader
A high-performance algorithmic trading platform and event-driven backtester
- google / langextract
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
- OpenBMB / VoxCPM
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
- yichuan-w / LEANN
RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
- Flowseal / zapret-discord-youtube
- tobi / try
fresh directories for every vibe
- Mebus / cupp
Common User Passwords Profiler (CUPP)
Hugging Face (15)
- STEP3-VL-10B Technical Report
We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. STEP3-VL-10B is realized through two strategic shifts: first, a unified, fully unfrozen pre-training strategy on 1.2T multimodal tokens that integrates a language-aligned Perception Encoder with a Qwen3-8B decoder to establish intrinsic vision-language synergy; and second, a scaled post-training pipeline featuring over 1k iterations of reinforcement learning. Crucially, we implement Parallel Coordinated Reasoning (PaCoRe) to scale test-time compute, allocating resources to scalable perceptual reasoning that explores and synthesizes diverse visual hypotheses. Consequently, despite its compact 10B footprint, STEP3-VL-10B rivals or surpasses models 10x-20x larger (e.g., GLM-4.6V-106B, Qwen3-VL-235B) and top-tier proprietary flagships like Gemini 2.5 Pro and Seed-1.5-VL. Delivering best-in-class performance, it records 92.2% on MMBench and 80.11% on MMMU, while excelling in complex reasoning with 94.43% on AIME2025 and 75.95% on MathVision. We release the full model suite to provide the community with a powerful, efficient, and reproducible baseline.
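The abstract describes PaCoRe only at a high level: spend test-time compute on several parallel rollouts and synthesize the diverse hypotheses they explore. As a generic illustration of parallel sampling with synthesis (the function name and the majority-vote aggregation are my assumptions, not the paper's actual method):

```python
import itertools
from collections import Counter

def parallel_reasoning(sample_fn, n=8):
    # Draw n independent rollouts, then synthesize them. Here the
    # synthesis step is a simple majority vote; PaCoRe's coordination
    # of perceptual hypotheses is presumably richer than this.
    answers = [sample_fn() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in for a stochastic model that answers "42" six times in eight.
fake = itertools.cycle(["42", "42", "7", "42", "42", "41", "42", "42"])
best = parallel_reasoning(lambda: next(fake), n=8)
print(best)  # prints 42
```

Even this naive aggregation shows why extra test-time samples help: a model that is right only most of the time becomes reliable once its rollouts vote.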
- Urban Socio-Semantic Segmentation with Vision-Language Reasoning
As hubs of human activity, urban surfaces consist of a wealth of semantic entities. Segmenting these various entities from satellite imagery is crucial for a range of downstream applications. Current advanced segmentation models can reliably segment entities defined by physical attributes (e.g., buildings, water bodies) but still struggle with socially defined categories (e.g., schools, parks). In this work, we achieve socio-semantic segmentation by vision-language model reasoning. To facilitate this, we introduce the Urban Socio-Semantic Segmentation dataset named SocioSeg, a new resource comprising satellite imagery, digital maps, and pixel-level labels of social semantic entities organized in a hierarchical structure. Additionally, we propose a novel vision-language reasoning framework called SocioReasoner that simulates the human process of identifying and annotating social semantic entities via cross-modal recognition and multi-stage reasoning. We employ reinforcement learning to optimize this non-differentiable process and elicit the reasoning capabilities of the vision-language model. Experiments demonstrate our approach's gains over state-of-the-art models and strong zero-shot generalization. Our dataset and code are available at https://github.com/AMAP-ML/SocioReasoner.
- Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Reinforcement learning (RL) has become a central paradigm for post-training large language models (LLMs), particularly for complex reasoning tasks, yet it often suffers from exploration collapse: policies prematurely concentrate on a small set of dominant reasoning patterns, improving pass@1 while limiting rollout-level diversity and gains in pass@k. We argue that this failure stems from regularizing local token behavior rather than diversity over sets of solutions. To address this, we propose Uniqueness-Aware Reinforcement Learning, a rollout-level objective that explicitly rewards correct solutions that exhibit rare high-level strategies. Our method uses an LLM-based judge to cluster rollouts for the same problem according to their high-level solution strategies, ignoring superficial variations, and reweights policy advantages inversely with cluster size. As a result, correct but novel strategies receive higher rewards than redundant ones. Across mathematics, physics, and medical reasoning benchmarks, our approach consistently improves pass@k across large sampling budgets and increases the area under the pass@k curve (AUC@K) without sacrificing pass@1, while sustaining exploration and uncovering more diverse solution strategies at scale.
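The reweighting the abstract describes, advantages scaled inversely with strategy-cluster size, can be sketched in a few lines. The exact weighting function and the cluster labels (which the paper obtains from an LLM judge) are assumptions here:

```python
from collections import Counter

def uniqueness_weighted_advantages(advantages, cluster_ids):
    # Rollouts that land in a large strategy cluster split its credit,
    # while a rollout using a rare strategy keeps its full advantage,
    # so correct-but-novel solutions earn more than redundant ones.
    sizes = Counter(cluster_ids)
    return [a / sizes[c] for a, c in zip(advantages, cluster_ids)]

# Four correct rollouts: three share strategy "A", one uses rare strategy "B".
w = uniqueness_weighted_advantages([1.0, 1.0, 1.0, 1.0], ["A", "A", "A", "B"])
print(w)  # the three redundant rollouts get 1/3 each; the rare one keeps 1.0
```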
- Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
Multi-agent systems have evolved into practical LLM-driven collaborators for many applications, gaining robustness from diversity and cross-checking. However, multi-agent RL (MARL) training is resource-intensive and unstable: co-adapting teammates induce non-stationarity, and rewards are often sparse and high-variance. Therefore, we introduce Multi-Agent Test-Time Reinforcement Learning (MATTRL), a framework that injects structured textual experience into multi-agent deliberation at inference time. MATTRL forms a multi-expert team of specialists for multi-turn discussions, retrieves and integrates test-time experiences, and reaches consensus for final decision-making. We also study credit assignment for constructing a turn-level experience pool, then reinjecting it into the dialogue. Across challenging benchmarks in medicine, math, and education, MATTRL improves accuracy by an average of 3.67% over a multi-agent baseline, and by 8.67% over comparable single-agent baselines. Ablation studies examine different credit-assignment schemes and provide a detailed comparison of how they affect training outcomes. MATTRL offers a stable, effective, and efficient path to distribution-shift-robust multi-agent reasoning without tuning.
- VIBE: Visual Instruction Based Editor
Instruction-based image editing is among the fastest developing areas in generative AI. Over the past year, the field has reached a new level, with dozens of open-source models released alongside highly capable commercial systems. However, only a limited number of open-source approaches currently achieve real-world quality. In addition, diffusion backbones, the dominant choice for these pipelines, are often large and computationally expensive for many deployments and research settings, with widely used variants typically containing 6B to 20B parameters. This paper presents a compact, high-throughput instruction-based image editing pipeline that uses a modern 2B-parameter Qwen3-VL model to guide the editing process and the 1.6B-parameter diffusion model Sana1.5 for image generation. Our design decisions across architecture, data processing, training configuration, and evaluation target low-cost inference and strict source consistency while maintaining high quality across the major edit categories feasible at this scale. Evaluated on the ImgEdit and GEdit benchmarks, the proposed method matches or exceeds the performance of substantially heavier baselines, including models with several times as many parameters and higher inference cost, and is particularly strong on edits that require preserving the input image, such as an attribute adjustment, object removal, background edits, and targeted replacement. The model fits within 24 GB of GPU memory and generates edited images at up to 2K resolution in approximately 4 seconds on an NVIDIA H100 in BF16, without additional inference optimizations or distillation.
- Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning
The central challenge of AI for Science is not reasoning alone, but the ability to create computational methods in an open-ended scientific world. Existing LLM-based agents rely on static, pre-defined tool libraries, a paradigm that fundamentally fails in scientific domains where tools are sparse, heterogeneous, and intrinsically incomplete. In this paper, we propose Test-Time Tool Evolution (TTE), a new paradigm that enables agents to synthesize, verify, and evolve executable tools during inference. By transforming tools from fixed resources into problem-driven artifacts, TTE overcomes the rigidity and long-tail limitations of static tool libraries. To facilitate rigorous evaluation, we introduce SciEvo, a benchmark comprising 1,590 scientific reasoning tasks supported by 925 automatically evolved tools. Extensive experiments show that TTE achieves state-of-the-art performance in both accuracy and tool efficiency, while enabling effective cross-domain adaptation of computational tools. The code and benchmark have been released at https://github.com/lujiaxuan0520/Test-Time-Tool-Evol.
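The synthesize-verify-evolve loop the abstract outlines could look roughly like the following. The `ToolSpec` shape, the verification hook, and the stand-in synthesizer are all illustrative assumptions; in TTE the synthesis step would be an LLM call:

```python
from dataclasses import dataclass

@dataclass
class ToolSpec:
    name: str
    doc: str

def evolve_tool(spec, tool_cache, synthesize, verify):
    # If no cached tool matches the spec, synthesize one (in the paper,
    # presumably via the agent's LLM), verify it, and cache it so later
    # tasks can reuse the evolved tool.
    if spec.name not in tool_cache:
        ns = {}
        exec(synthesize(spec), ns)   # materialize the synthesized code
        fn = ns[spec.name]
        if not verify(fn):
            raise ValueError("synthesized tool failed verification")
        tool_cache[spec.name] = fn
    return tool_cache[spec.name]

# Stand-in for an LLM synthesizer: emits a toy Celsius-to-Fahrenheit tool.
def synthesize(spec):
    return f"def {spec.name}(c):\n    return c * 9 / 5 + 32\n"

cache = {}
tool = evolve_tool(ToolSpec("c_to_f", "Celsius to Fahrenheit"),
                   cache, synthesize, lambda f: f(100) == 212.0)
print(tool(37))  # 98.6
```

The verification gate is what separates this from blind code generation: a tool only enters the library after passing checks, which is how a library of problem-driven artifacts stays trustworthy.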
- Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy, the ability to sustain strategic coherence and iterative correction over experimental cycles spanning days or weeks. While Large Language Models (LLMs) have demonstrated prowess in short-horizon reasoning, they are easily overwhelmed by execution details in the high-dimensional, delayed-feedback environments of real-world research, failing to consolidate sparse feedback into coherent long-term guidance. Here, we present ML-Master 2.0, an autonomous agent that masters ultra-long-horizon machine learning engineering (MLE) which is a representative microcosm of scientific discovery. By reframing context management as a process of cognitive accumulation, our approach introduces Hierarchical Cognitive Caching (HCC), a multi-tiered architecture inspired by computer systems that enables the structural differentiation of experience over time. By dynamically distilling transient execution traces into stable knowledge and cross-task wisdom, HCC allows agents to decouple immediate execution from long-term experimental strategy, effectively overcoming the scaling limits of static context windows. In evaluations on OpenAI's MLE-Bench under 24-hour budgets, ML-Master 2.0 achieves a state-of-the-art medal rate of 56.44%. Our findings demonstrate that ultra-long-horizon autonomy provides a scalable blueprint for AI capable of autonomous exploration beyond human-precedent complexities.
- DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset
Vision-Language Pre-training (VLP) models demonstrate strong performance across various downstream tasks by learning from large-scale image-text pairs through contrastive pretraining. The release of extensive English image-text datasets (e.g., COYO-700M and LAION-400M) has enabled widespread adoption of models such as CLIP and SigLIP in tasks including cross-modal retrieval and image captioning. However, the advancement of Chinese vision-language pretraining has substantially lagged behind, due to the scarcity of high-quality Chinese image-text data. To address this gap, we develop a comprehensive pipeline for constructing a high-quality Chinese cross-modal dataset. As a result, we propose DanQing, which contains 100 million image-text pairs collected from Common Crawl. Different from existing datasets, DanQing is curated through a more rigorous selection process, yielding superior data quality. Moreover, DanQing is primarily built from 2024-2025 web data, enabling models to better capture evolving semantic trends and thus offering greater practical utility. We compare DanQing with existing datasets by continual pre-training of the SigLIP2 model. Experimental results show that DanQing consistently achieves superior performance across a range of Chinese downstream tasks, including zero-shot classification, cross-modal retrieval, and LMM-based evaluations. To facilitate further research in Chinese vision-language pre-training, we will open-source the DanQing dataset under the Creative Commons CC-BY 4.0 license.
- CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation
Recent video generation models have revealed the emergence of Chain-of-Frame (CoF) reasoning, enabling frame-by-frame visual inference. With this capability, video models have been successfully applied to various visual tasks (e.g., maze solving, visual puzzles). However, their potential to enhance text-to-image (T2I) generation remains largely unexplored due to the absence of a clearly defined visual reasoning starting point and interpretable intermediate states in the T2I generation process. To bridge this gap, we propose CoF-T2I, a model that integrates CoF reasoning into T2I generation via progressive visual refinement, where intermediate frames act as explicit reasoning steps and the final frame is taken as output. To establish such an explicit generation process, we curate CoF-Evol-Instruct, a dataset of CoF trajectories that model the generation process from semantics to aesthetics. To further improve quality and avoid motion artifacts, we enable independent encoding operation for each frame. Experiments show that CoF-T2I significantly outperforms the base video model and achieves competitive performance on challenging benchmarks, reaching 0.86 on GenEval and 7.468 on Imagine-Bench. These results indicate the substantial promise of video models for advancing high-quality text-to-image generation.
- Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders
Recent progress in text-to-image (T2I) diffusion models (DMs) has enabled high-quality visual synthesis from diverse textual prompts. Yet, most existing T2I DMs, even those equipped with large language model (LLM)-based text encoders, remain text-pixel mappers -- they employ LLMs merely as text encoders, without leveraging their inherent reasoning capabilities to infer what should be visually depicted given the textual prompt. To move beyond such literal generation, we propose the think-then-generate (T2G) paradigm, where the LLM-based text encoder is encouraged to reason about and rewrite raw user prompts; the states of the rewritten prompts then serve as diffusion conditioning. To achieve this, we first activate the think-then-rewrite pattern of the LLM encoder with a lightweight supervised fine-tuning process. Subsequently, the LLM encoder and diffusion backbone are co-optimized to ensure faithful reasoning about the context and accurate rendering of the semantics via Dual-GRPO. In particular, the text encoder is reinforced using image-grounded rewards to infer and recall world knowledge, while the diffusion backbone is pushed to produce semantically consistent and visually coherent images. Experiments show substantial improvements in factual consistency, semantic alignment, and visual realism across reasoning-based image generation and editing benchmarks, achieving 0.79 on WISE score, nearly on par with GPT-4. Our results constitute a promising step toward next-generation unified models with reasoning, expression, and demonstration capacities.
- Transition Matching Distillation for Fast Video Generation
Large video diffusion and flow models have achieved remarkable success in high-quality video generation, but their use in real-time interactive applications remains limited due to their inefficient multi-step sampling process. In this work, we present Transition Matching Distillation (TMD), a novel framework for distilling video diffusion models into efficient few-step generators. The central idea of TMD is to match the multi-step denoising trajectory of a diffusion model with a few-step probability transition process, where each transition is modeled as a lightweight conditional flow. To enable efficient distillation, we decompose the original diffusion backbone into two components: (1) a main backbone, comprising the majority of early layers, that extracts semantic representations at each outer transition step; and (2) a flow head, consisting of the last few layers, that leverages these representations to perform multiple inner flow updates. Given a pretrained video diffusion model, we first introduce a flow head to the model, and adapt it into a conditional flow map. We then apply distribution matching distillation to the student model with flow head rollout in each transition step. Extensive experiments on distilling Wan2.1 1.3B and 14B text-to-video models demonstrate that TMD provides a flexible and strong trade-off between generation speed and visual quality. In particular, TMD outperforms existing distilled models under comparable inference costs in terms of visual fidelity and prompt adherence. Project page: https://research.nvidia.com/labs/genair/tmd
- Alterbute: Editing Intrinsic Attributes of Objects in Images
We introduce Alterbute, a diffusion-based method for editing an object's intrinsic attributes in an image. We allow changing color, texture, material, and even the shape of an object, while preserving its perceived identity and scene context. Existing approaches either rely on unsupervised priors that often fail to preserve identity or use overly restrictive supervision that prevents meaningful intrinsic variations. Our method relies on: (i) a relaxed training objective that allows the model to change both intrinsic and extrinsic attributes conditioned on an identity reference image, a textual prompt describing the target intrinsic attributes, and a background image and object mask defining the extrinsic context. At inference, we restrict extrinsic changes by reusing the original background and object mask, thereby ensuring that only the desired intrinsic attributes are altered; (ii) Visual Named Entities (VNEs) - fine-grained visual identity categories (e.g., ''Porsche 911 Carrera'') that group objects sharing identity-defining features while allowing variation in intrinsic attributes. We use a vision-language model to automatically extract VNE labels and intrinsic attribute descriptions from a large public image dataset, enabling scalable, identity-preserving supervision. Alterbute outperforms existing methods on identity-preserving object intrinsic attribute editing.
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
While LLM-based agents can interact with environments via invoking external tools, their expanded capabilities also amplify security risks. Monitoring step-level tool invocation behaviors in real time and proactively intervening before unsafe execution is critical for agent deployment, yet remains under-explored. In this work, we first construct TS-Bench, a novel benchmark for step-level tool invocation safety detection in LLM agents. We then develop a guardrail model, TS-Guard, using multi-task reinforcement learning. The model proactively detects unsafe tool invocation actions before execution by reasoning over the interaction history. It assesses request harmfulness and action-attack correlations, producing interpretable and generalizable safety judgments and feedback. Furthermore, we introduce TS-Flow, a guardrail-feedback-driven reasoning framework for LLM agents, which reduces harmful tool invocations of ReAct-style agents by 65 percent on average and improves benign task completion by approximately 10 percent under prompt injection attacks.
- Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
Today's strongest video-language models (VLMs) remain proprietary. The strongest open-weight models either rely on synthetic data from proprietary VLMs, effectively distilling from them, or do not disclose their training data or recipe. As a result, the open-source community lacks the foundations needed to improve on the state-of-the-art video (and image) language models. Crucially, many downstream applications require more than just high-level video understanding; they require grounding -- either by pointing or by tracking in pixels. Even proprietary models lack this capability. We present Molmo2, a new family of VLMs that are state-of-the-art among open-source models and demonstrate exceptional new capabilities in point-driven grounding in single image, multi-image, and video tasks. Our key contribution is a collection of 7 new video datasets and 2 multi-image datasets, including a dataset of highly detailed video captions for pre-training, a free-form video Q&A dataset for fine-tuning, a new object tracking dataset with complex queries, and an innovative new video pointing dataset, all collected without the use of closed VLMs. We also present a training recipe for this data utilizing an efficient packing and message-tree encoding scheme, and show that bi-directional attention on vision tokens and a novel token-weight strategy improve performance. Our best-in-class 8B model outperforms others in the class of open weight and data models on short videos, counting, and captioning, and is competitive on long-videos. On video-grounding Molmo2 significantly outperforms existing open-weight models like Qwen3-VL (35.5 vs 29.6 accuracy on video counting) and surpasses proprietary models like Gemini 3 Pro on some tasks (38.4 vs 20.0 F1 on video pointing and 56.2 vs 41.1 J&F on video tracking).
- MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching
Tool-Integrated Reasoning (TIR) empowers large language models (LLMs) to tackle complex tasks by interleaving reasoning steps with external tool interactions. However, existing reinforcement learning methods typically rely on outcome- or trajectory-level rewards, assigning uniform advantages to all steps within a trajectory. This coarse-grained credit assignment fails to distinguish effective tool calls from redundant or erroneous ones, particularly in long-horizon multi-turn scenarios. To address this, we propose MatchTIR, a framework that introduces fine-grained supervision via bipartite matching-based turn-level reward assignment and dual-level advantage estimation. Specifically, we formulate credit assignment as a bipartite matching problem between predicted and ground-truth traces, utilizing two assignment strategies to derive dense turn-level rewards. Furthermore, to balance local step precision with global task success, we introduce a dual-level advantage estimation scheme that integrates turn-level and trajectory-level signals, assigning distinct advantage values to individual interaction turns. Extensive experiments on three benchmarks demonstrate the superiority of MatchTIR. Notably, our 4B model surpasses the majority of 8B competitors, particularly in long-horizon and multi-turn tasks. Our codes are available at https://github.com/quchangle1/MatchTIR.
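The abstract frames credit assignment as bipartite matching between predicted and ground-truth turns. A brute-force toy version over a similarity matrix looks like this; the similarity values and the maximum-score objective are assumptions, since the paper's two assignment strategies are not detailed in the abstract:

```python
from itertools import permutations

def best_turn_matching(sim):
    # Brute-force maximum-weight bipartite matching between predicted
    # turns (rows) and ground-truth turns (columns). Fine for the
    # handful of turns in one trajectory; a Hungarian solver would
    # scale better.
    n_pred, n_gt = len(sim), len(sim[0])
    best, best_score = None, float("-inf")
    for perm in permutations(range(n_gt), n_pred):
        score = sum(sim[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score

# Similarity between 3 predicted tool calls and 3 reference calls.
sim = [[0.9, 0.1, 0.0],
       [0.2, 0.8, 0.3],
       [0.0, 0.4, 0.7]]
match, total = best_turn_matching(sim)
# Each predicted turn then inherits its matched reference's similarity
# as a dense turn-level reward, instead of one trajectory-level scalar.
print(match, total)
```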
Solidot (15)
- Solar met 60% of new US electricity demand in 2025
An analysis by the energy think tank Ember shows that US electricity demand surged by 135 TWh in 2025, while solar generation grew by a record 83 TWh over the same period, covering 61% of the country's new demand. This underscores that solar has become a core component of the US power supply. Texas, the Midwest, and the Mid-Atlantic saw the fastest growth in solar generation last year, and they are also the regions where electricity demand is growing fastest. Solar met 81% of new electricity demand in Texas and the Midwest, and 33% in the Mid-Atlantic.
- Tencent sends DMCA notices to more than 30 WeChat-related GitHub projects
Tencent's WeChat is notorious for ballooning in size over time, which has prompted many developers to analyze it, with some publishing their analysis methods or cleanup tools on GitHub. Early this month, however, Tencent's legal team sent DMCA notices to more than 30 GitHub projects, forcing them offline. Tencent's lawyers accuse the developers of violating the DMCA's anti-circumvention provisions, breaching WeChat's ban on reverse engineering, threatening user privacy and security, and infringing intellectual property.
- Microsoft's latest security update prevents some Windows 11 PCs from shutting down properly
According to an official Microsoft advisory, its latest security update leaves some Windows 11 23H2 PCs unable to shut down properly. Affected machines cannot enter shutdown or hibernation and instead stay awake, stubbornly resisting shutdown commands. The bug is related to Secure Launch, a virtualization-based protection feature designed to ensure that only trusted components are loaded during boot. On systems with Secure Launch enabled, PCs with the latest security update installed cannot shut down, restart, or hibernate. Microsoft says the workaround is to force a shutdown by running "shutdown /s /t 0" from the command line. There is currently no fix.
- Iran restores internet access
According to Cloudflare Radar's monitoring, Iran has gradually restored internet access after a total outage of more than 200 hours. Iran's semi-official Fars news agency said on Saturday that the country would gradually lift the internet and communication restrictions imposed late last month, after an economic crisis triggered protests. Iran first restored SMS service, then access to the domestic internet and domestic apps, with access to the international internet to come last. Update: NetBlocks reports that Iran has not yet reopened access to the international internet.
- Monster Hunter Wilds performance problems linked to DLC checks
Players have complained about poor performance optimization in Capcom's Monster Hunter Wilds, the latest entry in the Monster Hunter series, since its release early last year. After testing, players now believe the problem is tied to continuous background checks of DLC ownership. Monster Hunter Wilds has 190 DLC items, the vast majority of them cosmetic virtual goods such as outfits and trinkets. Using a mod that blocks the DLC checks, players compared frame rates on mid-range hardware with the mod disabled and enabled: just 26 FPS without the mod versus 46 FPS with it. Testing also showed that the more DLC an account has purchased or registered, the smaller the performance hit.
- The Internet Archive's infrastructure
The Internet Archive has archived more than a trillion web pages and 99 PB of unique data, or over 212 PB counting backups and redundancy. How does it do it? At the Archive's core are custom servers called PetaBoxes. Off-the-shelf servers are typically expensive and power-hungry; the PetaBox is designed for high density, low cost, and low power, using JBOD (Just a Bunch of Disks) rather than expensive RAID controllers and handling data redundancy in software instead of hardware. The first-generation PetaBox went into service in June 2004, storing 100 TB per rack at 6 kW. The fourth generation, introduced in 2010, packed 240 2 TB drives per rack and used Intel Xeon processors. The latest generation stores 1.4 PB per rack. The machine rooms are cooled with ambient air rather than conventional air conditioning; the servers are designed to run at higher temperatures, and waste heat from the disks is captured and reused, heating the building in winter.
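The item notes that the PetaBox handles redundancy in software rather than with RAID controllers. The Archive's actual scheme is not described, but software parity in the RAID-4/5 style is a few-line illustration of the idea:

```python
def parity_block(blocks):
    # XOR parity across equal-length data blocks, computed in software
    # rather than by a hardware RAID controller.
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def recover(surviving_blocks, parity):
    # XOR is its own inverse, so XORing the survivors with the parity
    # block rebuilds the one lost block.
    return parity_block(list(surviving_blocks) + [parity])

disks = [b"AAAA", b"BBBB", b"CCCC"]
p = parity_block(disks)
# Lose disk 1 and rebuild it from the survivors plus parity:
rebuilt = recover([disks[0], disks[2]], p)
print(rebuilt)  # b'BBBB'
```

Doing this in software trades some CPU time for much cheaper, denser hardware, which matches the PetaBox design goals described above.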
- German chancellor admits nuclear phase-out was a grave strategic mistake
German Chancellor Friedrich Merz said that Germany's persistently high energy costs should be blamed on former Chancellor Angela Merkel and the subsequent coalition government of the SPD, Greens, and FDP. He said: "If you really intended to do this (exit nuclear power), you should at least have kept Germany's last remaining nuclear plants running three years ago, so as to preserve at least the generating capacity we had at the time." In March 2011, the earthquake and tsunami in Japan triggered the Fukushima nuclear accident, an event that arguably had a greater impact on Germany than on Japan itself: Germany decided to exit nuclear power entirely. On April 15, 2023, Germany's last three nuclear plants, in Bavaria, Baden-Württemberg, and Lower Saxony, were formally taken off the grid.
- Apple and Nvidia vie for TSMC's advanced chip capacity
When TSMC chairman and CEO C.C. Wei visited Apple's headquarters last August, he brought bad news for his biggest customer: Apple would have to accept the largest price increase in years. Tim Cook and his team were caught off guard, but that was not the worst of it. As TSMC's longtime largest customer, Apple now also has to compete with companies such as Nvidia for TSMC's advanced chip capacity. What Wei had not yet told Cook was that Apple may no longer be TSMC's biggest customer: Nvidia was TSMC's largest customer in at least one or two quarters last year. TSMC's revenue grew 36% last year to $122 billion; Nvidia's revenue is expected to grow 62%, while Apple's is expected to grow just 3.6%. The cause, of course, is enormous demand for AI chips, while smartphone demand has flattened. TSMC's high-performance computing sales, which include AI chips, grew 48% last year, after 58% the year before; its smartphone revenue grew just 11%, down from 23%.
- Deltas are sinking all over the world
Researchers at Virginia Tech analyzed 2014-2023 radar data from the European Space Agency's Sentinel-1 satellites to determine subsidence rates for 40 river deltas worldwide, including those of the Mekong, Mississippi, Amazon, Zambezi, Yangtze, and Nile. Half a billion people live in delta regions, which host 10 megacities of more than ten million people each. The analysis shows that in every one of the 40 deltas, more than a third of the land area is sinking; in 38 of them, more than half is. The Chao Phraya delta, home to Bangkok, is the worst affected: it is subsiding at 8 mm per year, twice the current rate of global mean sea-level rise, and 94% of the delta is sinking faster than 5 mm per year. The combined effect of subsidence and sea-level rise means relative sea level in Bangkok and the Chao Phraya delta is rising 12.3 mm per year. Cities such as Alexandria in Egypt and Jakarta and Surabaya in Indonesia also face rapid subsidence. The researchers found that groundwater extraction is the largest overall contributor to the sinking.
- Huawei phone shipments return to No. 1 in 2025
According to IDC, Huawei's smartphone shipments returned to first place in China in 2025. Huawei shipped 46.7 million phones, down 1.9% from 2024, but moved back into the lead because vivo, 2024's leader, saw shipments fall sharply by 6.6%. Apple ranked second, growing 4% to 46.2 million units, driven by strong sales of the iPhone 17 series launched in September 2025. China's overall 2025 phone shipments fell 0.6% to 284.6 million, the first year-on-year decline in two years. IDC forecasts 278 million units for 2026, another year-on-year decline.
- US bus stops are so densely spaced they slow buses down
Buses in US cities such as New York and San Francisco travel very slowly, only slightly faster than a brisk walk. One analysis found the cause is that stops are too close together; simply removing some stops would speed buses up. The average distance between stops in US cities is 313 meters, five stops per mile; older cities such as Philadelphia, Chicago, and San Francisco have even shorter spacing, at 214, 223, and 248 meters respectively. European cities typically space stops 300-450 meters apart. Each stop costs a bus time: passengers boarding and alighting, acceleration and deceleration, kneeling for wheelchair users, missed traffic lights. Buses spend roughly a fifth of their operating time stopping and starting, and since labor makes up most of transit operating costs, slower service means higher costs. Some cities in the Americas have begun testing wider stop spacing: San Francisco raised speeds 4.4%-14% by cutting stops from six per mile to two and a half, and a Vancouver pilot that removed a quarter of stops shortened average trip times by five minutes and saves about $500,000 per year on a single route.
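The stops-per-mile figures convert to spacing with a single division; a quick check of the numbers cited, assuming the 1609.34 m statute mile:

```python
MILE_M = 1609.34  # metres per statute mile

def spacing_m(stops_per_mile):
    # Average distance between stops implied by a stops-per-mile figure.
    return MILE_M / stops_per_mile

print(round(spacing_m(5)))   # 322, close to the 313 m US average cited
# San Francisco's pilot, six stops per mile down to two and a half,
# roughly doubles the spacing:
print(round(spacing_m(6)), round(spacing_m(2.5)))  # 268 644
```

The 322 m figure from five stops per mile is slightly above the cited 313 m average, consistent with many US cities being denser than five stops per mile.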
- Microsoft closes employee library and cuts digital subscriptions
Microsoft lore had it that the books in the employee library were so heavy the building sank. Now the physical library is on its way out, and digital subscriptions are being cut as well, as Microsoft shifts to "AI-driven learning experiences". The company began notifying publishers of cancellations last November; the Strategic News Service (SNS), which Microsoft had subscribed to for 22 years, is no longer among its subscriptions. Microsoft employees say they can no longer access digital publications such as The Information, nor borrow business books from the library.
- Long-term exposure to low pesticide concentrations shortens wild fish lifespans
Chinese researchers report in the journal Science that long-term exposure to the pesticide chlorpyrifos, even at doses regulatory frameworks consider safe, accelerates physiological aging in wild fish and shortens their lifespans, raising concerns about the environmental impact of chronic low-level pesticide pollution. To assess the effects of low-concentration pesticide exposure on wild fish, researchers in Wuhan combined laboratory experiments with field observations of 24,388 Culter dabryi from Chinese lakes in which low concentrations of the common pesticide chlorpyrifos persist. Fish from the contaminated lakes showed telomere shortening and a truncated population structure dominated by young individuals, indicating that chronic low-dose chlorpyrifos exposure is associated with accelerated physiological aging and shortened lifespan. These findings were also confirmed in the laboratory.
- A zero-click exploit chain against the Pixel 9
Google's Project Zero security team published three blog posts analyzing a zero-click exploit chain targeting the Pixel 9. The underlying vulnerabilities were public as early as September 19, 2025, but Google did not patch the phones until January 6, 2026. The researchers note that smartphones have shipped a number of AI-driven features in recent years, and one consequence of these features is an expanded zero-click attack surface. One such feature is audio transcription: audio attachments in SMS and RCS messages received by Google Messages are decoded automatically, with no user interaction, making audio decoders part of the zero-click attack surface. The Pixel 9 exploit chain began in the Dolby Unified Decoder, which provides support for the AC-3 (Dolby Digital) and EAC-3 (Dolby Digital Plus) audio formats.
- Chinese app stores pull the "死了么" app
The recently viral "死了么" ("Are you dead yet?") app has been removed from Chinese app stores, likely because of its name; the developer has spent the past few days soliciting a new Chinese name. On Apple's App Store, for example, the app can no longer be found in the China region, but in other regions it is still searchable, downloads normally, and sits near the top of the paid charts.