OrangeBot.AI Digest — 2025-11-21

60 headlines across 4 sources, aggregated for the day.

Hacker News (15)

  1. Helping Valve to power up Steam devices (www.igalia.com)
  2. How Cops Are Using Flock's ALPR Network to Surveil Protesters and Activists (www.eff.org)
  3. You can make PS2 games in JavaScript (jslegenddev.substack.com)
  4. Arduino published updated terms and conditions: no longer an open commons (www.molecularist.com)
  5. Show HN: Wealthfolio 2.0 - Open-source investment tracker, now with mobile and Docker (wealthfolio.app)
  6. We should all be using dependency cooldowns (blog.yossarian.net)
  7. Making a Small RPG (jslegenddev.substack.com)
  8. How a French judge was digitally cut off by the USA (www.heise.de)
  9. I converted a rotary phone into a meeting handset (www.stavros.io)
  10. FAWK: LLMs can write a language interpreter (martin.janiczek.cz)
  11. HP and Dell disable HEVC support built into their laptops' CPUs (arstechnica.com)
  12. Scientists now know that bees can process time, a first in insects (www.cnn.com)
  13. It's hard to build an oscillator (lcamtuf.substack.com)
  14. Olmo 3: Charting a path through the model flow to lead open-source AI (allenai.org)
  15. WebAssembly from the Ground Up (wasmgroundup.com)

GitHub Trending (15)

  1. sansan0 / TrendRadar

    🎯 Beat information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. Multi-platform trending-topic aggregation plus an MCP-based AI analysis tool. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, and more) with smart filtering, automatic push notifications, and conversational AI analysis (13 tools for mining news in natural language: trend tracking, sentiment analysis, similarity search, etc.). Pushes to WeCom, personal WeChat, Feishu, DingTalk, Telegram, email, and ntfy. 30-second web deployment, phone notifications within 1 minute, no programming required. Docker deployment supported. ⭐ Make the algorithm serve you; use AI to understand the trends.

  2. google / adk-go

    An open-source, code-first Go toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.

  3. TapXWorld / ChinaTextbook

    PDF textbooks for all primary school, middle school, high school, and university levels.

  4. yeongpin / cursor-free-vip

    [Supports 0.49.x] (Reset Cursor AI MachineID & Bypass Higher Token Limit) Automatically resets the machine ID and unlocks Pro features for free, working around messages such as: "You've reached your trial request limit." / "Too many free trial accounts used on this machine. Please upgrade to pro. We have this limit in place to prevent abuse. Please let us know if you believe this is a mistake."

  5. nvm-sh / nvm

    Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions

  6. traefik / traefik

    The Cloud Native Application Proxy

  7. HKUDS / LightRAG

    [EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"

  8. bobeff / open-source-games

    A list of open source games.

  9. volcengine / verl

    verl: Volcano Engine Reinforcement Learning for LLMs

  10. GibsonAI / Memori

    Open-Source Memory Engine for LLMs, AI Agents & Multi-Agent Systems

  11. yangshun / tech-interview-handbook

    Curated coding interview preparation materials for busy software engineers

  12. microsoft / call-center-ai

    Send a phone call from an AI agent with an API call, or call the bot directly from the configured phone number!

  13. MustardChef / WSABuilds

    Run Windows Subsystem For Android on your Windows 10 and Windows 11 PC using prebuilt binaries with Google Play Store (MindTheGapps) and/or Magisk or KernelSU (root solutions) built in.

  14. playcanvas / engine

    Powerful web graphics runtime built on WebGL, WebGPU, WebXR and glTF

  15. iptv-org / iptv

    Collection of publicly available IPTV channels from all over the world

Hugging Face (15)

  1. V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models

    Recent progress in generative video models, such as Veo-3, has shown surprising zero-shot reasoning abilities, creating a growing need for systematic and reliable evaluation. We introduce V-ReasonBench, a benchmark designed to assess video reasoning across four key dimensions: structured problem-solving, spatial cognition, pattern-based inference, and physical dynamics. The benchmark is built from both synthetic and real-world image sequences and provides a diverse set of answer-verifiable tasks that are reproducible, scalable, and unambiguous. Evaluations of six state-of-the-art video models reveal clear dimension-wise differences, with strong variation in structured, spatial, pattern-based, and physical reasoning. We further compare video models with strong image models, analyze common hallucination behaviors, and study how video duration affects Chain-of-Frames reasoning. Overall, V-ReasonBench offers a unified and reproducible framework for measuring video reasoning and aims to support the development of models with more reliable, human-aligned reasoning skills.

  2. Step-Audio-R1 Technical Report

    Recent advances in reasoning models have demonstrated remarkable success in text and vision domains through extended chain-of-thought deliberation. However, a perplexing phenomenon persists in audio language models: they consistently perform better with minimal or no reasoning, raising a fundamental question - can audio intelligence truly benefit from deliberate thinking? We introduce Step-Audio-R1, the first audio reasoning model that successfully unlocks reasoning capabilities in the audio domain. Through our proposed Modality-Grounded Reasoning Distillation (MGRD) framework, Step-Audio-R1 learns to generate audio-relevant reasoning chains that genuinely ground themselves in acoustic features rather than hallucinating disconnected deliberations. Our model exhibits strong audio reasoning capabilities, surpassing Gemini 2.5 Pro and achieving performance comparable to the state-of-the-art Gemini 3 Pro across comprehensive audio understanding and reasoning benchmarks spanning speech, environmental sounds, and music. These results demonstrate that reasoning is a transferable capability across modalities when appropriately anchored, transforming extended deliberation from a liability into a powerful asset for audio intelligence. By establishing the first successful audio reasoning model, Step-Audio-R1 opens new pathways toward building truly multimodal reasoning systems that think deeply across all sensory modalities.

  3. First Frame Is the Place to Go for Video Content Customization

    What role does the first frame play in video generation models? Traditionally, it's viewed as the spatial-temporal starting point of a video, merely a seed for subsequent animation. In this work, we reveal a fundamentally different perspective: video models implicitly treat the first frame as a conceptual memory buffer that stores visual entities for later reuse during generation. Leveraging this insight, we show that it's possible to achieve robust and generalized video content customization in diverse scenarios, using only 20-50 training examples without architectural changes or large-scale finetuning. This unveils a powerful, overlooked capability of video generation models for reference-based video customization.

  4. Scaling Spatial Intelligence with Multimodal Foundation Models

    Despite remarkable progress, multimodal foundation models still exhibit surprising deficiencies in spatial intelligence. In this work, we explore scaling up multimodal foundation models to cultivate spatial intelligence within the SenseNova-SI family, built upon established multimodal foundations including visual understanding models (i.e., Qwen3-VL and InternVL3) and unified understanding and generation models (i.e., Bagel). We take a principled approach to constructing high-performing and robust spatial intelligence by systematically curating SenseNova-SI-8M: eight million diverse data samples under a rigorous taxonomy of spatial capabilities. SenseNova-SI demonstrates unprecedented performance across a broad range of spatial intelligence benchmarks: 68.7% on VSI-Bench, 43.3% on MMSI, 85.6% on MindCube, 54.6% on ViewSpatial, and 50.1% on SITE, while maintaining strong general multimodal understanding (e.g., 84.9% on MMBench-En). More importantly, we analyze the impact of data scaling, discuss early signs of emergent generalization capabilities enabled by diverse data training, analyze the risk of overfitting and language shortcuts, present a preliminary study on spatial chain-of-thought reasoning, and validate the potential downstream application. SenseNova-SI is an ongoing project, and this report will be updated continuously. All newly trained multimodal foundation models are publicly released to facilitate further research in this direction.

  5. SAM 3D: 3Dfy Anything in Images

    We present SAM 3D, a generative model for visually grounded 3D object reconstruction, predicting geometry, texture, and layout from a single image. SAM 3D excels in natural images, where occlusion and scene clutter are common and visual recognition cues from context play a larger role. We achieve this with a human- and model-in-the-loop pipeline for annotating object shape, texture, and pose, providing visually grounded 3D reconstruction data at unprecedented scale. We learn from this data in a modern, multi-stage training framework that combines synthetic pretraining with real-world alignment, breaking the 3D "data barrier". We obtain significant gains over recent work, with at least a 5:1 win rate in human preference tests on real-world objects and scenes. We will release our code and model weights, an online demo, and a new challenging benchmark for in-the-wild 3D object reconstruction.

  6. Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO

    While language models have become impactful in many real-world applications, video generation remains largely confined to entertainment. Motivated by video's inherent capacity to demonstrate physical-world information that is difficult to convey through language alone (e.g., imagine teaching someone to tie a tie using only text), we identify an underutilized opportunity to extend video as a new answer modality for Next-Event Prediction (NEP), formalized as Video-Next-Event Prediction (VNEP). While the established NEP task takes a video with a procedural or predictive question as input to predict the next event in text, VNEP requires dynamic video responses. This shift from telling to showing unlocks more intuitive and customized answers for procedural learning and creative exploration. However, this task remains challenging for existing models, as it demands an understanding of multimodal input, instruction-conditioned reasoning, and the generation of video with visual and semantic consistency. To address this, we introduce VANS, a model that leverages reinforcement learning to align a Vision-Language Model (VLM) with a Video Diffusion Model (VDM) for VNEP. The core of VANS is our proposed Joint-GRPO, which orchestrates the VLM and VDM to function as a unit. Driven by a shared reward on their respective outputs, it optimizes the VLM to produce captions that are both accurate and friendly to visualize, while guiding the VDM to generate videos that are faithful to these captions and the input visual context. To enable this learning, we craft VANS-Data-100K, a dedicated dataset for the VNEP task. Experiments on procedural and predictive benchmarks demonstrate that VANS achieves state-of-the-art performance in both video event prediction and visualization. Code is released at https://github.com/KlingTeam/VANS.
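
    The abstract doesn't spell out Joint-GRPO's update rule, but the group-relative advantage at the core of any GRPO variant is easy to sketch. In this hypothetical illustration, each (caption, video) rollout in a group receives the shared reward described above, and both the VLM and the VDM would be updated in proportion to how the rollout scores relative to its group:

    ```python
    import numpy as np

    def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
        """Standardize rewards within a sampled group (the 'GR' in GRPO)."""
        return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # Hypothetical shared rewards for 4 rollouts of the same prompt, each
    # blending caption accuracy and video faithfulness as the paper describes.
    shared_rewards = np.array([0.2, 0.8, 0.5, 0.9])
    advantages = group_relative_advantages(shared_rewards)
    # The same advantages would weight the policy-gradient updates of both models.
    ```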

  7. MiMo-Embodied: X-Embodied Foundation Model Technical Report

    We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully integrate and achieve state-of-the-art performance in both Autonomous Driving and Embodied AI. MiMo-Embodied sets new records across 17 embodied AI benchmarks in Task Planning, Affordance Prediction and Spatial Understanding, while also excelling in 12 autonomous driving benchmarks across Environmental Perception, Status Prediction, and Driving Planning. Across these tasks, MiMo-Embodied significantly outperforms existing open-source, closed-source, and specialized baselines. Our results indicate that through multi-stage learning, curated data construction, and CoT/RL fine-tuning, these two domains exhibit strong positive transfer and mutually reinforce one another. We provide a detailed analysis of our model design and training methodologies to facilitate further research. Code and models are available at https://github.com/XiaomiMiMo/MiMo-Embodied.

  8. Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

    Training a family of large language models targeting multiple scales and deployment objectives is prohibitively expensive, requiring separate training runs for each different size. Recent work on model compression through pruning and knowledge distillation has reduced this cost; however, this process still incurs hundreds of billions of tokens worth of training cost per compressed model. In this paper, we present Nemotron Elastic, a framework for building reasoning-oriented LLMs, including hybrid Mamba-Attention architectures, that embed multiple nested submodels within a single parent model, each optimized for different deployment configurations and budgets. Each of these submodels shares weights with the parent model and can be extracted zero-shot during deployment without additional training or fine-tuning. We enable this functionality through an end-to-end trained router, tightly coupled to a two-stage training curriculum designed specifically for reasoning models. We additionally introduce group-aware SSM elastification that preserves Mamba's structural constraints, heterogeneous MLP elastification, normalized MSE-based layer importance for improved depth selection, and knowledge distillation enabling simultaneous multi-budget optimization. We apply Nemotron Elastic to the Nemotron Nano V2 12B model, simultaneously producing a 9B and a 6B model using only 110B training tokens; this results in over 360x cost reduction compared to training model families from scratch, and around 7x compared to SoTA compression techniques. Each of the nested models performs on par or better than the SoTA in accuracy. Moreover, unlike other compression methods, the nested capability of our approach allows having a many-in-one reasoning model that has constant deployment memory against the number of models in the family.
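
    A toy sketch of the nested weight-sharing idea follows (the paper's router, elastification, and distillation machinery is not reproduced here). The point is that a child model is a slice of the parent's parameters, so extraction needs no retraining; all names below are hypothetical.

    ```python
    import torch
    import torch.nn as nn

    class ElasticLinear(nn.Module):
        """Toy elastic layer: smaller submodels reuse a slice of the parent's weights."""
        def __init__(self, d_in: int, d_out: int):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)

        def forward(self, x: torch.Tensor, width: int) -> torch.Tensor:
            # A deployment "budget" selects the first `width` output channels.
            return x @ self.weight[:width].T

    layer = ElasticLinear(1024, 1024)
    x = torch.randn(2, 1024)
    full = layer(x, width=1024)    # parent model
    small = layer(x, width=512)    # nested submodel, extracted zero-shot
    assert torch.allclose(small, full[:, :512])  # same weights, no fine-tuning
    ```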

  9. Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation

    Recent advances in visual generation have increasingly explored the integration of reasoning capabilities. They incorporate textual reasoning, i.e., think, either before (as pre-planning) or after (as post-refinement) the generation process, yet they lack on-the-fly multimodal interaction during the generation itself. In this preliminary study, we introduce Thinking-while-Generating (TwiG), the first interleaved framework that enables co-evolving textual reasoning throughout the visual generation process. As visual content is progressively generated, textual reasoning is interleaved to both guide upcoming local regions and reflect on previously synthesized ones. This dynamic interplay produces more context-aware and semantically rich visual outputs. To unveil the potential of this framework, we investigate three candidate strategies: zero-shot prompting, supervised fine-tuning (SFT) on our curated TwiG-50K dataset, and reinforcement learning (RL) via a customized TwiG-GRPO strategy, each offering unique insights into the dynamics of interleaved reasoning. We hope this work inspires further research into interleaving textual reasoning for enhanced visual generation. Code will be released at: https://github.com/ZiyuGuo99/Thinking-while-Generating.
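
    The control flow the abstract describes can be sketched as a loop that alternates reasoning and synthesis. The model interface below is entirely hypothetical, since the paper's actual architecture is not described in the abstract:

    ```python
    def thinking_while_generating(model, prompt: str, num_regions: int):
        """Hypothetical sketch of interleaved think/generate steps."""
        canvas = model.init_canvas(prompt)      # blank or partial image state
        thoughts = []
        for _ in range(num_regions):
            # Reflect on what has been synthesized so far...
            thought = model.reason(prompt, canvas, thoughts)
            thoughts.append(thought)
            # ...and let that reasoning guide the next local region.
            canvas = model.generate_region(canvas, thought)
        return canvas, thoughts
    ```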

  10. Generalist Foundation Models Are Not Clinical Enough for Hospital Operations

    Hospitals and healthcare systems rely on operational decisions that determine patient flow, cost, and quality of care. Despite strong performance on medical knowledge and conversational benchmarks, foundation models trained on general text may lack the specialized knowledge required for these operational decisions. We introduce Lang1, a family of models (100M-7B parameters) pretrained on a specialized corpus blending 80B clinical tokens from NYU Langone Health's EHRs and 627B tokens from the internet. To rigorously evaluate Lang1 in real-world settings, we developed the REalistic Medical Evaluation (ReMedE), a benchmark derived from 668,331 EHR notes that evaluates five critical tasks: 30-day readmission prediction, 30-day mortality prediction, length of stay, comorbidity coding, and predicting insurance claims denial. In zero-shot settings, both general-purpose and specialized models underperform on four of five tasks (36.6%-71.7% AUROC), with mortality prediction being an exception. After finetuning, Lang1-1B outperforms finetuned generalist models up to 70x larger and zero-shot models up to 671x larger, improving AUROC by 3.64%-6.75% and 1.66%-23.66% respectively. We also observed cross-task scaling with joint finetuning on multiple tasks leading to improvement on other tasks. Lang1-1B effectively transfers to out-of-distribution settings, including other clinical tasks and an external health system. Our findings suggest that predictive capabilities for hospital operations require explicit supervised finetuning, and that this finetuning process is made more efficient by in-domain pretraining on EHR. Our findings support the emerging view that specialized LLMs can compete with generalist models in specialized tasks, and show that effective healthcare systems AI requires the combination of in-domain pretraining, supervised finetuning, and real-world evaluation beyond proxy benchmarks.

  11. TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval

    Neural information retrieval systems excel in high-resource languages but remain underexplored for morphologically rich, lower-resource languages such as Turkish. Dense bi-encoders currently dominate Turkish IR, yet late-interaction models -- which retain token-level representations for fine-grained matching -- have not been systematically evaluated. We introduce TurkColBERT, the first comprehensive benchmark comparing dense encoders and late-interaction models for Turkish retrieval. Our two-stage adaptation pipeline fine-tunes English and multilingual encoders on Turkish NLI/STS tasks, then converts them into ColBERT-style retrievers using PyLate trained on MS MARCO-TR. We evaluate 10 models across five Turkish BEIR datasets covering scientific, financial, and argumentative domains. Results show strong parameter efficiency: the 1.0M-parameter colbert-hash-nano-tr is 600× smaller than the 600M-parameter turkish-e5-large dense encoder while preserving over 71% of its average mAP. Late-interaction models that are 3-5× smaller than dense encoders significantly outperform them; ColmmBERT-base-TR yields up to +13.8% mAP on domain-specific tasks. For production readiness, we compare indexing algorithms: MUVERA+Rerank is 3.33× faster than PLAID and offers a +1.7% relative mAP gain. This enables low-latency retrieval, with ColmmBERT-base-TR achieving 0.54 ms query times under MUVERA. We release all checkpoints, configs, and evaluation scripts. Limitations include reliance on moderately sized datasets (≤50K documents) and translated benchmarks, which may not fully reflect real-world Turkish retrieval conditions; larger-scale MUVERA evaluations remain necessary.
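
    For readers unfamiliar with late interaction: ColBERT-style retrievers keep one embedding per token and score with the MaxSim operator, summing each query token's best match over the document's tokens. A minimal NumPy sketch with toy embeddings (not the benchmark's actual models):

    ```python
    import numpy as np

    def maxsim_score(q_emb: np.ndarray, d_emb: np.ndarray) -> float:
        """ColBERT late interaction: per query token, take the best-matching
        document token's cosine similarity, then sum over query tokens."""
        q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
        d = d_emb / np.linalg.norm(d_emb, axis=1, keepdims=True)
        sim = q @ d.T                         # (num_q_tokens, num_d_tokens)
        return float(sim.max(axis=1).sum())   # MaxSim, then sum

    # Toy embeddings: 4 query tokens, 12 document tokens, dim 128.
    rng = np.random.default_rng(0)
    score = maxsim_score(rng.standard_normal((4, 128)),
                         rng.standard_normal((12, 128)))
    ```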

  12. SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

    Vision-Language-Action (VLA) models excel in robotic manipulation but are constrained by their heavy reliance on expert demonstrations, leading to demonstration bias and limiting performance. Reinforcement learning (RL) is a vital post-training strategy to overcome these limits, yet current VLA-RL methods, including group-based optimization approaches, are crippled by severe reward sparsity. Relying on binary success indicators wastes valuable information in failed trajectories, resulting in low training efficiency. To solve this, we propose Self-Referential Policy Optimization (SRPO), a novel VLA-RL framework. SRPO eliminates the need for external demonstrations or manual reward engineering by leveraging the model's own successful trajectories, generated within the current training batch, as a self-reference. This allows us to assign a progress-wise reward to failed attempts. A core innovation is the use of latent world representations to measure behavioral progress robustly. Instead of relying on raw pixels or requiring domain-specific fine-tuning, we utilize the compressed, transferable encodings from a world model's latent space. These representations naturally capture progress patterns across environments, enabling accurate, generalized trajectory comparison. Empirical evaluations on the LIBERO benchmark demonstrate SRPO's efficiency and effectiveness. Starting from a supervised baseline with 48.9% success, SRPO achieves a new state-of-the-art success rate of 99.2% in just 200 RL steps, representing a 103% relative improvement without any extra supervision. Furthermore, SRPO shows substantial robustness, achieving a 167% performance improvement on the LIBERO-Plus benchmark.
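
    The progress-wise reward can be illustrated with a deliberately simplified sketch: embed each state of a failed rollout in some latent space, align it against the batch's own successful rollouts, and reward how deep into a success trajectory the failure got. The paper uses a world model's latent space; everything below is a hypothetical stand-in:

    ```python
    import numpy as np

    def progress_reward(failed: np.ndarray, successes: list) -> float:
        """Score a failed trajectory by how far along any in-batch success it got."""
        best = 0.0
        f = failed / np.linalg.norm(failed, axis=1, keepdims=True)
        for ref in successes:                    # self-reference: same batch
            r = ref / np.linalg.norm(ref, axis=1, keepdims=True)
            sim = f @ r.T                        # cosine similarity, step x step
            deepest = sim.argmax(axis=1).max()   # furthest matched reference step
            best = max(best, deepest / (len(ref) - 1))
        return best                              # 1.0 = reached the final state

    rng = np.random.default_rng(0)
    failed = rng.standard_normal((20, 64))        # latent states, failed rollout
    successes = [rng.standard_normal((30, 64))]   # successful rollouts in batch
    reward = progress_reward(failed, successes)
    ```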

  13. SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking

    Surgical video segmentation is crucial for computer-assisted surgery, enabling precise localization and tracking of instruments and tissues. Interactive Video Object Segmentation (iVOS) models such as Segment Anything Model 2 (SAM2) provide prompt-based flexibility beyond methods with predefined categories, but face challenges in surgical scenarios due to the domain gap and limited long-term tracking. To address these limitations, we construct SA-SV, the largest surgical iVOS benchmark with instance-level spatio-temporal annotations (masklets) spanning eight procedure types (61k frames, 1.6k masklets), enabling comprehensive development and evaluation for long-term tracking and zero-shot generalization. Building on SA-SV, we propose SAM2S, a foundation model enhancing SAM2 for Surgical iVOS through: (1) DiveMem, a trainable diverse memory mechanism for robust long-term tracking; (2) temporal semantic learning for instrument understanding; and (3) ambiguity-resilient learning to mitigate annotation inconsistencies across multi-source datasets. Extensive experiments demonstrate that fine-tuning on SA-SV enables substantial performance gains, with SAM2 improving by 12.99 average J&F over vanilla SAM2. SAM2S further advances performance to 80.42 average J&F, surpassing vanilla and fine-tuned SAM2 by 17.10 and 4.11 points respectively, while maintaining 68 FPS real-time inference and strong zero-shot generalization. Code and dataset will be released at https://jinlab-imvr.github.io/SAM2S.

  14. TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding

    We introduce TimeViper, a hybrid vision-language model designed to tackle challenges of long video understanding. Processing long videos demands both an efficient model architecture and an effective mechanism for handling extended temporal contexts. To this end, TimeViper adopts a hybrid Mamba-Transformer backbone that combines the efficiency of state-space models with the expressivity of attention mechanisms. Through this hybrid design, we reveal the vision-to-text information aggregation phenomenon, where information progressively flows from vision tokens to text tokens across increasing LLM depth, resulting in severe vision token redundancy. Motivated by this observation, we propose TransV, a token information transfer module that transfers and compresses vision tokens into instruction tokens while maintaining multimodal understanding capabilities. This design enables TimeViper to process hour-long videos exceeding 10,000 frames. Extensive experiments across multiple benchmarks demonstrate that TimeViper competes with state-of-the-art models while processing many more frames. We further analyze attention behaviors of both Mamba and Transformer layers, offering new insights into hybrid model interpretability. This work represents an initial step towards developing, interpreting, and compressing hybrid Mamba-Transformer architectures.
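
    The abstract doesn't detail TransV's mechanics, but the general pattern of absorbing redundant vision tokens into a short instruction-token sequence can be sketched with a single cross-attention layer. This is an assumption for illustration, not the paper's design:

    ```python
    import torch
    import torch.nn as nn

    class TokenTransfer(nn.Module):
        """Hypothetical sketch: instruction tokens absorb vision-token content
        via cross-attention, after which the vision tokens are dropped."""
        def __init__(self, dim: int, heads: int = 8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, text_tokens, vision_tokens):
            # Text tokens query the vision tokens and pull information in.
            absorbed, _ = self.attn(text_tokens, vision_tokens, vision_tokens)
            return text_tokens + absorbed

    layer = TokenTransfer(dim=256)
    text = torch.randn(1, 32, 256)       # instruction tokens
    vision = torch.randn(1, 4096, 256)   # redundant vision tokens
    compact = layer(text, vision)        # sequence shrinks from 4128 to 32 tokens
    ```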

  15. FinTRec: Transformer Based Unified Contextual Ads Targeting and Personalization for Financial Applications

    Transformer-based architectures are widely adopted in sequential recommendation systems, yet their application in Financial Services (FS) presents distinct practical and modeling challenges for real-time recommendation. These include: a) long-range user interactions (implicit and explicit) spanning both digital and physical channels, generating temporally heterogeneous context; and b) the presence of multiple interrelated products, which requires coordinated models to support varied ad placements and personalized feeds while balancing competing business goals. We propose FinTRec, a transformer-based framework that addresses these challenges and their associated operational objectives in FS. While tree-based models have traditionally been preferred in FS due to their explainability and alignment with regulatory requirements, our study demonstrates that FinTRec offers a viable and effective shift toward transformer-based architectures. Through historic simulation and live A/B test correlations, we show FinTRec consistently outperforms the production-grade tree-based baseline. The unified architecture, when fine-tuned for product adaptation, enables cross-product signal sharing, reduces training cost and technical debt, and improves offline performance across all products. To our knowledge, this is the first comprehensive study of unified sequential recommendation modeling in FS that addresses both technical and business considerations.

Solidot (15)

  1. US CDC web page now claims vaccines are linked to autism

    Under anti-vaccine health secretary Robert F. Kennedy Jr., the US Centers for Disease Control and Prevention (CDC) has taken down a web page that cited extensive evidence refuting any link between vaccines and autism, replacing it with one claiming that vaccines are associated with autism. The move will no doubt be welcomed by anti-vaxxers, but it will only deepen public distrust, fear, and confusion, further erode America's already precarious vaccination rates, and lead to more illness, suffering, and death from preventable infections, especially among children and vulnerable groups. Anonymous CDC officials said the agency's senior scientists did not know about the page update and were not consulted on its content.

  2. Microsoft open-sources the Zork series

    Microsoft, in collaboration with Team Xbox and Activision, has released the source code of Zork I, II, and III under the MIT license. Each repository contains the original source code and related documentation. Zork is a text adventure series that debuted on the PDP-10 mainframe in 1977; its developers later founded Infocom and expanded the game into a trilogy: Zork I: The Great Underground Empire, Zork II: The Wizard of Frobozz, and Zork III: The Dungeon Master, which reached PCs in 1980. Players type text commands to move their character through hundreds of locations, solving puzzles and collecting treasure, while the game program acts as narrator, describing each location and the outcome of each command. The series is regarded as the most famous work of interactive fiction (text adventure). Activision acquired Infocom in 1986, gaining ownership of the Zork games.

  3. HP and Dell disable HEVC hardware decoding on some PCs

    HP and Dell have been found to disable HEVC hardware decoding on some laptop models, a move likely tied to rising HEVC licensing fees. Nearly all of today's Intel and AMD CPUs support HEVC hardware decoding, yet HP and Dell users have discovered they cannot play HEVC/H.265 content in their browsers. Affected HP models include the HP ProBook 460 G11, ProBook 465 G11, and EliteBook 665 G11. An HP spokesperson advised customers to use third-party software that supports HEVC (H.265) decoding. HEVC licensing fees rise starting next January: in the US, the per-unit royalty for volumes above 100,001 HEVC devices will increase from $0.20 to $0.24. According to Gartner, HP shipped 15,002,000 laptops and desktops in the third quarter of this year, and Dell shipped 10,166,000.
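
    A back-of-envelope calculation puts the fee increase in perspective. Assuming, purely hypothetically, that every unit shipped were a royalty-bearing HEVC device in the above-100,001 tier:

    ```python
    # Incremental HEVC royalty cost per quarter at the new US rate,
    # under the (hypothetical) assumption that all shipped units qualify.
    old_rate, new_rate = 0.20, 0.24                 # USD per device
    shipments = {"HP": 15_002_000, "Dell": 10_166_000}  # Gartner Q3 figures

    for vendor, units in shipments.items():
        delta = units * (new_rate - old_rate)
        print(f"{vendor}: +${delta:,.0f} per quarter at Q3 volume")
    # HP: +$600,080 per quarter; Dell: +$406,640 per quarter
    ```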

  4. Man who cryopreserved his wife draws controversy after finding a new partner

    Eight years ago, as his wife Zhan Wenlian lay dying of lung cancer, Gui Junmin made an unconventional decision: he had her body cryopreserved intact, hoping to one day see her revived. Zhan became China's first domestically cryopreserved person. At the Shandong Yinfeng Life Science Research Institute, the container storing her is labeled "Tank No. 1"; inside, liquid nitrogen at minus 196°C brings time to a near standstill, binding an ordinary family to the dream of technology-enabled immortality. Gui never uses the word "dead" to describe his wife. In his telling, she is merely asleep, and will sleep until medicine can conquer lung cancer. "Otherwise (after revival) she would just suffer through it all again, and there would be no point." Zhan's cryopreservation contract runs for 30 years. While waiting for her revival, Gui's life has changed: he has undergone two surgeries, and he now has a girlfriend. Some see starting a new life as only human; others call him selfish, disrespectful both to his late wife and to his current partner.

  5. Tsinghua won more AI patents last year than MIT, Stanford, Princeton, and Harvard combined

    From 2005 through the end of 2024, Tsinghua University was granted 4,986 AI and machine learning patents. Last year alone it received more than 900, more than MIT, Stanford, Princeton, and Harvard combined over the same period. According to LexisNexis, China now accounts for more than half of the world's active patent families in AI and machine learning. Among the 100 most-cited AI papers, Tsinghua has more than any other university. Even so, the US still holds the most influential AI patents and the most capable models: Harvard and MIT have consistently led Tsinghua in patent influence. Stanford's AI Index Report shows US institutions produced 40 notable AI models in 2024 versus 15 from Chinese institutions. According to the Information Technology & Innovation Foundation, China's share of the world's top AI researchers (the top 2%) rose from 10% in 2019 to 26% in 2022, while the US share fell from 35% to 28% over the same period.

  6. The brain processes the basic sounds of different languages the same way

    A paper published in Nature finds that the human brain responds similarly to the sounds of familiar and unfamiliar languages. The result provides empirical evidence for cross-linguistic commonality in speech perception, helps explain how the brain processes speech, and may inform future approaches to language learning and rehabilitation. All spoken languages share certain acoustic-phonetic features, such as vowels and consonants, but they combine these sounds into words differently. Earlier research established that the superior temporal gyrus plays a key role in speech perception, but it was unclear whether it processes familiar and unfamiliar languages the same way. Researchers from China and the US recruited 34 speakers of Spanish, English, or Mandarin and recorded their brain activity as they listened to sentences in their native language and in unfamiliar foreign languages. Most of the recorded activity came from the superior temporal gyrus and was similar for familiar and unfamiliar languages. However, when listening to a known language, brain signals responded more strongly to word-related features such as word boundaries (word onsets and offsets) and word frequency; in Spanish-English bilingual participants, these signals were enhanced for both languages. The results suggest that while the brain processes the basic sounds of different languages the same way, experience is what assembles those sounds into words. This may explain why learning a new language is hard: one must not only discriminate the sounds but also learn how to combine them.

  7. The US may lose its measles elimination status within two months

    Canada lost its measles elimination status earlier this month after an outbreak persisted there for a full year. The US outbreak has now lasted 10 months, leaving only two months before the country loses its own elimination status. CDC officials confirmed that the recent outbreak on the Arizona-Utah border is a continuation of the West Texas outbreak that began in mid-to-late January, with both traced to the same measles virus subtype. Losing elimination status would mean measles is once again considered endemic in the US, an embarrassing public health setback for a vaccine-preventable disease. The current US health secretary is an anti-vaxxer.

  8. More than 60 police agencies have Boston Dynamics robot dogs

    More than 60 bomb squads and SWAT teams in the US and Canada are equipped with Spot, Boston Dynamics' robot dog. The quadruped weighs about 34 kg, starts at $100,000, entered commercial service five years ago, and is used by police for armed standoffs, hostage rescues, hazardous materials handling, and similar missions. Massachusetts State Police bought two robot dogs, in 2020 and 2022, for about $250,000 each. Last year in Hyannis, a Spot helped police apprehend a suspect holding his mother at knifepoint. Houston police have three Spots; Las Vegas police have one. US Immigration and Customs Enforcement (ICE) recently spent about $78,000 on a similar robot from Canadian manufacturer Icor Technology, one that can also deploy smoke grenades. Civil liberties groups worry that police militarization is becoming normalized and argue that laws are needed to govern appropriate use of such technology. About 2,000 Spot robot dogs are in operation worldwide.

  9. Chinese trucks are rapidly switching from diesel to electric

    In 2020 virtually all new trucks in China ran on diesel, but in the first half of 2025 electric trucks accounted for 22% of new heavy-duty truck sales, up from 9.2% in the same period of 2024. UK research firm BMI forecasts that electric trucks will approach 46% of sales this year and reach 60% next year. China's truck fleet, second in size only to America's, still runs mainly on diesel but is switching to electric quickly, and diesel consumption is expected to fall sharply. Electric truck sales in China have exceeded liquefied natural gas (LNG) truck sales for five consecutive months, while in the rest of the world electric trucks may never reach mass adoption. Although electric trucks cost two to three times as much as diesel trucks and 18% more than LNG trucks, research by Chinese scientists shows that over their full lifetime electric trucks are more energy-efficient and cheaper to run, saving owners 10%-26% in costs.

  10. Chinese talent powers US AI research

    When Meta CEO Mark Zuckerberg announced the Superintelligence Lab in June, he named 11 AI researchers joining the effort. All 11 are immigrants, and 7 were born in China. Two new studies show that researchers born and educated in China play a major role in top US AI labs. Despite tightening US immigration policy and rising anti-China sentiment in Silicon Valley, this talent continues to drive important AI research in both industry and academia. A 2020 Paulson Institute study found that Chinese AI researchers make up nearly a third of the world's top AI talent. Meta's AI work has long depended on Chinese talent: according to people familiar with the culture of its AI teams, new hires are often jokingly told they need to master only two languages, the company's internal programming language "Hack" and Mandarin.

  11. Linus Torvalds thinks AI-assisted coding helps beginners but doesn't belong in production code

    In an interview at the Linux Foundation Open Source Summit in Seoul, South Korea, Linux creator Linus Torvalds expressed a positive view of vibe coding (AI-assisted programming), saying it helps people accomplish computing tasks they otherwise could not, but that from a code-maintenance standpoint, using AI-generated code in production is a very bad idea. Torvalds said today's computers are far more complex than those he learned to program on, and vibe coding gives newcomers a path into computing. He does not use AI-assisted coding himself, but said he has shifted from rejecting new ideas to embracing and championing them, against senior maintainers who cling to the status quo. He noted that Rust is no longer an experimental language in the kernel, and that AI crawlers have had a huge impact on open source infrastructure; some developers have also been caught abusing AI tools to submit bogus bug reports and security warnings to kernel maintainers, though the problem is not yet severe.

  12. The Netherlands returns control of Nexperia to its Chinese parent

    The Dutch government disclosed that it has ended its takeover of European chipmaker Nexperia and returned control of the company to its Chinese parent, Wingtech Technology, signaling that the month-long dispute is winding down. Minister of Economic Affairs Vincent Karremans said on X on Wednesday (November 19) that the order giving the Dutch government the power to block or modify Nexperia's decisions has been withdrawn, calling the move "a gesture of goodwill". On September 30, citing national security, the Dutch government had frozen Wingtech's control of Nexperia for one year, effectively putting the company under government control. In response, China's Ministry of Commerce announced on October 4 a ban on Nexperia China exporting certain finished components. Although Nexperia's chip technology is not high-end and the company operates only one factory in China, the dispute briefly disrupted production schedules at Honda, Volkswagen, and several other automakers.

  13. Physicists build an uncensored version of DeepSeek R1

    Physicists at Spanish company Multiverse Computing have built DeepSeek R1 Slim, a slimmed-down version of the DeepSeek R1 model that is 55% smaller than the original with nearly identical performance, and with the censorship removed. Models from Chinese AI companies are required to comply with the law and socialist values, and ship with multiple built-in layers of censorship. Multiverse used tensor networks, a sophisticated mathematical technique from quantum physics that represents and manipulates large datasets as high-dimensional grid networks. Tensor networks can dramatically shrink model size while efficiently expressing complex AI systems, and they give researchers a map of all the correlations in a model, allowing them to pinpoint and remove specific information.
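
    Multiverse's tensor-network procedure isn't public in detail, but its simplest relative, truncated-SVD low-rank factorization of a weight matrix, conveys the basic idea of trading parameters for a compact factorized form:

    ```python
    import numpy as np

    # Toy illustration of low-rank compression (not Multiverse's method).
    rng = np.random.default_rng(0)
    W = rng.standard_normal((512, 512))   # stand-in for one model weight matrix

    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    k = 64                                # keep only the top-k singular values
    A, B = U[:, :k] * S[:k], Vt[:k]       # W is approximated by A @ B

    params_before = W.size                # 262,144 parameters
    params_after = A.size + B.size        # 65,536 parameters (4x smaller)
    rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    ```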

  14. Nearly half of US children want in-game currency for Christmas

    The Entertainment Software Association (ESA) surveyed more than 700 children aged 5-17: 39% want a game console, 37% want physical games, and 43% want in-game virtual currency. More than half (58%) of US children want to play games with their parents, especially 5-7 year olds (73%). The ESA also surveyed more than 1,100 adults aged 18-65, 539 of whom are parents of children aged 5-17; a third of US adults plan to buy gaming-related products this Christmas.

  15. Nearly half the world's population lives in cities

    According to "World Urbanization Prospects 2025: Summary of Results", released Tuesday by the UN Department of Economic and Social Affairs, 45% of the world's 8.2 billion people now live in urban areas, a share that will keep climbing as the world continues to urbanize. In 1950, only 20% of the global population of 2.5 billion lived in cities. By 2050, two thirds of global population growth is expected to occur in cities, with the rest spread across towns. The number of megacities worldwide has quadrupled since 1975, from 8 to 33, of which 19 are in Asia. Jakarta, the Indonesian capital, tops the list of the world's most populous cities with nearly 42 million residents, followed by Dhaka, Bangladesh (about 40 million) and Tokyo, Japan (33 million). Cairo, Egypt is the only non-Asian city in the top ten. By 2050, Addis Ababa (Ethiopia), Dar es Salaam (Tanzania), Hajipur (India), and Kuala Lumpur (Malaysia) are expected to pass the 10 million mark, bringing the total number of megacities to 37.