OrangeBot.AI Digest — 2025-12-26
44 headlines across 4 sources, aggregated for this day.
Hacker News(15)
- How uv got so fast (nesbitt.io)
- Show HN: Witr – Explain why a process is running on your Linux system (github.com)
- My insulin pump controller uses the Linux kernel. It also violates the GPL (old.reddit.com)
- Rob Pike got spammed with an AI slop "act of kindness" (simonwillison.net)
- Experts explore new mushroom which causes fairytale-like hallucinations (nhmu.utah.edu)
- FFmpeg has issued a DMCA takedown on GitHub (twitter.com)
- Rob Pike goes nuclear over GenAI (skyview.social)
- LearnixOS (www.learnix-os.com)
- Package managers keep using Git as a database, it never works out (nesbitt.io)
- Ask HN: What did you read in 2025?
- ChatGPT conversations still lack timestamps after years of requests (community.openai.com)
- I'm a laptop weirdo and that's why I like my new Framework 13 (blog.matthewbrunelle.com)
- The Algebra of Loans in Rust (nadrieril.github.io)
- Rob Pike Goes Nuclear over GenAI (imgur.com)
- TurboDiffusion: 100–200× Acceleration for Video Diffusion Models (github.com)
GitHub Trending(7)
- tw93 / Mole
🐹 Deep clean and optimize your Mac.
- rendercv / rendercv
CV/resume generator for academics and engineers, YAML to PDF
- langgenius / dify
Production-ready platform for agentic workflow development.
- NanmiCoder / MediaCrawler
Crawler for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments.
- flowsurface-rs / flowsurface
A native desktop charting platform for crypto markets
- yichuan-w / LEANN
RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
- apurvsinghgautam / robin
AI-Powered Dark Web OSINT Tool
Hugging Face(7)
- Latent Implicit Visual Reasoning
While Large Multimodal Models (LMMs) have made significant progress, they remain largely text-centric, relying on language as their core reasoning modality. As a result, they are limited in their ability to handle reasoning tasks that are predominantly visual. Recent approaches have sought to address this by supervising intermediate visual steps with helper images, depth maps, or image crops. However, these strategies impose restrictive priors on what "useful" visual abstractions look like, add heavy annotation costs, and struggle to generalize across tasks. To address this critical limitation, we propose a task-agnostic mechanism that trains LMMs to discover and use visual reasoning tokens without explicit supervision. These tokens attend globally and re-encode the image in a task-adaptive way, enabling the model to extract relevant visual information without hand-crafted supervision. Our approach outperforms direct fine-tuning and achieves state-of-the-art results on a diverse range of vision-centric tasks -- including those where intermediate abstractions are hard to specify -- while also generalizing to multi-task instruction tuning.
- Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL) have achieved unprecedented success on many problem domains. During RL, these models explore by generating new outputs, one token at a time. However, sampling actions token-by-token can result in highly inefficient learning, particularly when rewards are sparse. Here, we show that it is possible to overcome this problem by acting and exploring within the internal representations of an autoregressive model. Specifically, to discover temporally-abstract actions, we introduce a higher-order, non-causal sequence model whose outputs control the residual stream activations of a base autoregressive model. On grid world and MuJoCo-based tasks with hierarchical structure, we find that the higher-order model learns to compress long activation sequence chunks onto internal controllers. Critically, each controller executes a sequence of behaviorally meaningful actions that unfold over long timescales and are accompanied with a learned termination condition, such that composing multiple controllers over time leads to efficient exploration on novel tasks. We show that direct internal controller reinforcement, a process we term "internal RL", enables learning from sparse rewards in cases where standard RL finetuning fails. Our results demonstrate the benefits of latent action generation and reinforcement in autoregressive models, suggesting internal RL as a promising avenue for realizing hierarchical RL within foundation models.
- Spatia: Video Generation with Updatable Spatial Memory
Existing video generation models struggle to maintain long-term spatial and temporal consistency due to the dense, high-dimensional nature of video signals. To overcome this limitation, we propose Spatia, a spatial memory-aware video generation framework that explicitly preserves a 3D scene point cloud as persistent spatial memory. Spatia iteratively generates video clips conditioned on this spatial memory and continuously updates it through visual SLAM. This dynamic-static disentanglement design enhances spatial consistency throughout the generation process while preserving the model's ability to produce realistic dynamic entities. Furthermore, Spatia enables applications such as explicit camera control and 3D-aware interactive editing, providing a geometrically grounded framework for scalable, memory-driven video generation.
- Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Large language models increasingly expose reasoning traces, yet their underlying cognitive structure and steps remain difficult to identify and analyze beyond surface-level statistics. We adopt Schoenfeld's Episode Theory as an inductive, intermediate-scale lens and introduce ThinkARM (Anatomy of Reasoning in Models), a scalable framework that explicitly abstracts reasoning traces into functional reasoning steps such as Analysis, Explore, Implement, Verify, etc. When applied to mathematical problem solving by diverse models, this abstraction reveals reproducible thinking dynamics and structural differences between reasoning and non-reasoning models, which are not apparent from token-level views. We further present two diagnostic case studies showing that exploration functions as a critical branching step associated with correctness, and that efficiency-oriented methods selectively suppress evaluative feedback steps rather than uniformly shortening responses. Together, our results demonstrate that episode-level representations make reasoning steps explicit, enabling systematic analysis of how reasoning is structured, stabilized, and altered in modern language models.
- How Much 3D Do Video Foundation Models Encode?
Videos are continuous 2D projections of 3D worlds. After training on large video data, will global 3D understanding naturally emerge? We study this by quantifying the 3D understanding of existing Video Foundation Models (VidFMs) pretrained on vast video data. We propose the first model-agnostic framework that measures the 3D awareness of various VidFMs by estimating multiple 3D properties from their features via shallow read-outs. Our study presents meaningful findings regarding the 3D awareness of VidFMs on multiple axes. In particular, we show that state-of-the-art video generation models exhibit a strong understanding of 3D objects and scenes, despite not being trained on any 3D data. Such understanding can even surpass that of large expert models specifically trained for 3D tasks. Our findings, together with the 3D benchmarking of major VidFMs, provide valuable observations for building scalable 3D models.
- VA-π: Variational Policy Alignment for Pixel-Aware Autoregressive Generation
Autoregressive (AR) visual generation relies on tokenizers to map images to and from discrete sequences. However, tokenizers are trained to reconstruct clean images from ground-truth tokens, while AR generators are optimized only for token likelihood. This misalignment leads to generated token sequences that may decode into low-quality images, without direct supervision from the pixel space. We propose VA-π, a lightweight post-training framework that directly optimizes AR models with a principled pixel-space objective. VA-π formulates the generator-tokenizer alignment as a variational optimization, deriving an evidence lower bound (ELBO) that unifies pixel reconstruction and autoregressive modeling. To optimize under the discrete token space, VA-π introduces a reinforcement-based alignment strategy that treats the AR generator as a policy and uses pixel-space reconstruction quality as its intrinsic reward. The reward is measured by how well the predicted token sequences can reconstruct the original image under teacher forcing, giving the model direct pixel-level guidance without expensive free-running sampling. The regularization term of the ELBO serves as a natural regularizer, maintaining distributional consistency of tokens. VA-π enables rapid adaptation of existing AR generators, requiring neither tokenizer retraining nor external reward models. With only 1% ImageNet-1K data and 25 minutes of tuning, it reduces FID from 14.36 to 7.65 and improves IS from 86.55 to 116.70 on LlamaGen-XXL, while also yielding notable gains in the text-to-image task on GenEval for both a visual generation model (LlamaGen: from 0.306 to 0.339) and a unified multi-modal model (Janus-Pro: from 0.725 to 0.744). Code is available at https://github.com/Lil-Shake/VA-Pi.
- GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training
Multi-turn reinforcement learning (RL) for multi-modal agents built upon vision-language models (VLMs) is hampered by sparse rewards and long-horizon credit assignment. Recent methods densify the reward by querying a teacher that provides step-level feedback, e.g., Guided Thought Reinforcement (GTR) and On-Policy Distillation, but rely on costly, often privileged models as the teacher, limiting practicality and reproducibility. We introduce GTR-Turbo, a highly efficient upgrade to GTR, which matches the performance without training or querying an expensive teacher model. Specifically, GTR-Turbo merges the weights of checkpoints produced during the ongoing RL training, and then uses this merged model as a "free" teacher to guide the subsequent RL via supervised fine-tuning or soft logit distillation. This design removes dependence on privileged VLMs (e.g., GPT or Gemini), mitigates the "entropy collapse" observed in prior work, and keeps training stable. Across diverse visual agentic tasks, GTR-Turbo improves the accuracy of the baseline model by 10-30% while reducing wall-clock training time by 50% and compute cost by 60% relative to GTR.
Solidot(15)
- Neanderthals may have been absorbed by modern humans rather than going extinct
How and why Neanderthals disappeared remains a subject of debate. Hypotheses for their extinction include population decline, environmental change, losing out in competition with Homo sapiens (modern humans), and genetic assimilation. Researchers in Italy and Switzerland, in a paper published in Scientific Reports, endorse the assimilation view: Neanderthals were absorbed into the modern human population, and in that sense we are Neanderthals, with no clear dividing line. Neanderthal bands were typically an order of magnitude smaller than sapiens groups; as sapiens kept migrating into Neanderthal communities, repeated interbreeding and gene mixing ultimately led the smaller Neanderthal population to be absorbed by sapiens.
- Two ancient humans both carried the carcinogenic virus HPV16
Ötzi, aka the Iceman, is a natural mummy preserved intact by ice, roughly 5,000 years old, discovered on a glacier in the Ötztal Alps. Ötzi is known for his well-preserved clothing, weapons, and tattoos; his likely cause of death was an arrowhead in his shoulder. He was also found to have suffered fractures, intestinal parasites, and smoke-blackened lungs. According to a paper published on bioRxiv, scientists have now identified another ailment: the carcinogenic human papillomavirus HPV16. They report that both Ötzi and a 45,000-year-old Homo sapiens fossil found in western Siberia carried HPV16 DNA fragments. The presence of the carcinogenic HPV16 in humans separated by 5,000 kilometers and 40,000 years shows it has circulated among humans for a very long time, and suggests modern humans may have transmitted it to Neanderthals rather than the other way around. The researchers found that Neanderthals carried the low-risk papillomavirus HPV12, not the highly carcinogenic HPV16. The new finding challenges the view that humans acquired HPV through interbreeding with Neanderthals.
- FSF receives $900,000 in private donations
The Free Software Foundation (FSF) announced it has received two donations totaling roughly $900,000. Both were made in Monero and rank among the largest private donations the FSF has ever received; the donors wish to remain anonymous. The FSF's funding comes mainly from individual donations and membership support. The two gifts put the FSF past its winter fundraising goal ahead of schedule, and it will now shift its focus to membership growth, aiming to recruit 100 associate members by January 16.
- Russia plans to build a nuclear power plant on the Moon within a decade
Roscosmos, the Russian space agency, which has recently suffered several launch mishaps, announced plans to build a nuclear power plant on the Moon by 2036 and has signed a contract with the aerospace firm Lavochkin Association. Roscosmos said the participants also include the state nuclear corporation Rosatom and the Kurchatov Institute nuclear research center. Roscosmos said the plant would power Russia's lunar program, including rovers, an observatory, and infrastructure for the Russian-Chinese International Lunar Research Station.
- Over a quarter of the energy the EU used in 2024 came from renewables
25.2% of the energy the EU used in 2024 came from renewables, up 0.7 percentage points from 2023 but still 17.3 percentage points short of the 2030 target of 42.5%, meaning the renewable share must grow by 2.9 percentage points per year from 2025 to 2030 to reach the goal. Among EU countries, Sweden has the highest renewable share at 62.8%, relying mainly on solid biomass, hydropower, and wind. Finland follows at 52.1%, likewise relying on solid biomass, wind, and hydropower. Denmark is third at 46.8%, with most of its renewables coming from solid biomass, wind, and biogas. Belgium (14.3%), Luxembourg (14.7%), and Ireland (16.1%) have the lowest renewable shares.
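The per-year growth figure in the EU renewables item follows from simple arithmetic; a minimal sketch using the numbers quoted above:

```python
# Gap between the EU's 2024 renewable share and the 2030 target,
# and the average annual growth needed to close it.
share_2024 = 25.2    # % of EU energy from renewables in 2024
target_2030 = 42.5   # % target share for 2030
gap = target_2030 - share_2024       # percentage points still to cover
years = 2030 - 2024                  # six annual increments remain
per_year = gap / years               # average growth needed per year
print(round(gap, 1), round(per_year, 1))  # 17.3 2.9
```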
- Microsoft denies plan to rewrite all C/C++ code in Rust using AI
Microsoft Distinguished Engineer Galen Hunt wrote enthusiastically on LinkedIn about rewriting all C and C++ code in Rust by 2030 with the help of AI and algorithms, targeting a million lines of code per engineer per month. The remarks caused such an uproar that Microsoft issued a clarification and Hunt edited his post. Because he used the word "we" heavily, outsiders took him to be speaking for the company. Microsoft executive Frank X. Shaw clarified that the company has no plan to rewrite Windows 11 using AI, and Hunt added his own clarification that Microsoft is not using AI to rewrite Windows 11 in Rust, saying readers had over-interpreted his post.
- Ruby 4.0.0 released
The Ruby language released v4.0.0 on Christmas Day, continuing its tradition of major Christmas releases. New features in Ruby 4.0.0 include: Ruby Box, a new experimental feature providing definition isolation; ZJIT, a new JIT compiler developed as the successor to YJIT, though it is currently slower than YJIT and not recommended for production use; improvements to the Ractor parallel-execution mechanism; syntax changes; and more.
- Nvidia plans to ship H200 to China before Lunar New Year
After obtaining an export license, Nvidia has notified Chinese customers that it plans to ship the H200 before Lunar New Year. Initial orders will be filled from existing inventory of 5,000-10,000 HGX boards, providing 40,000 to 80,000 GPUs in total. This means Nvidia will prioritize the more powerful SXM version of the H200, which is better suited to training workloads than the PCIe-based NVL cards. Under its agreement with the US government, Nvidia will hand over 25% of the sales revenue. Nvidia also told customers that shipping dates can only be confirmed after approval from the Chinese government.
- US bars five Europeans from entry
The US State Department has barred five Europeans from entering the country, four of them heads of European NGOs and one a former EU commissioner, on the grounds that they pushed for regulation of US tech giants. The five are: Thierry Breton, EU commissioner for the internal market from 2019 to 2024; Imran Ahmed of the Center for Countering Digital Hate; Clare Melford of the UK-based Global Disinformation Index; and Anna-Lena von Hodenberg and Josephine Ballon of the German organization HateAid, which reports far-right hate speech online. US Secretary of State Rubio said: "For too long, ideologues in Europe have orchestrated efforts to coerce American platforms into punishing American viewpoints they oppose. The Trump administration will no longer tolerate this egregious extraterritorial censorship."
- Streaming companies challenge YouTube's dominance of daytime TV
YouTube is the most popular video platform, but its advantage lies mainly in daytime rather than evening prime time. Nielsen data show that at 11 a.m. in October, YouTube averaged 6.3 million TV viewers in the US, versus 2.8 million for Netflix. Amazon drew 1 million in the same slot, while HBO Max, Paramount+, and Peacock each drew under 600,000. In the evening, the viewership gap between other streaming services and YouTube narrows significantly: Netflix's 9 p.m. audience rises above 11 million, just below YouTube's 12 million. To challenge YouTube's daytime dominance, major streaming companies are adding content suited to daytime viewing: Netflix plans to launch at least 34 video podcast shows next year, and Amazon launched the podcast New Heights in September. Data show podcast viewing is concentrated between 6 a.m. and 6 p.m. YouTube says users watched 700 million hours of video podcasts on TVs in October, up 75% year over year.
- Uzbekistan's license-plate surveillance system found online without a password
Security researcher Anurag Sen discovered that Uzbekistan's license-plate tracking surveillance system was exposed on the internet without password protection. The data show the system's database was set up in September 2024, with traffic monitoring beginning in mid-2025. The system is operated by the Public Security Bureau of Uzbekistan's Ministry of Internal Affairs and was developed by the Shenzhen company Maxvision, whose foreign customers include Burkina Faso, Kuwait, Oman, Mexico, Saudi Arabia, and Uzbekistan.
- Even the RTX 5090D struggles at 5K resolution
Asus demonstrated its 5K@180Hz 27-inch ROG Strix 27 Pro gaming monitor. 5K resolution is 5120 x 2880, 78% more pixels than 4K's 3840 x 2160, so a GPU that runs games smoothly at 4K struggles at 5K. Asus tested Nvidia's flagship RTX 5090D (the China-specific model, since superseded by another China-specific model, the 5090Dv2) with Cyberpunk 2077 at ultra ray-tracing settings and got just 51 fps. The test system used an AMD Ryzen 9950X3D CPU, with DLSS set to Balanced and frame generation off. The same configuration ran Cyberpunk 2077 at 4K at 77 fps, about 50% higher than at 5K. The ROG Strix 27 Pro uses an IPS panel and supports dual modes, switching between two resolutions: 5K@180Hz or 2K@330Hz. The monitor is priced around $800.
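The pixel-count and frame-rate percentages in the item above can be checked directly; a quick sketch using the quoted numbers:

```python
# 5K vs 4K pixel counts, and the observed 4K-over-5K frame-rate gain.
pix_5k = 5120 * 2880   # 14,745,600 pixels
pix_4k = 3840 * 2160   #  8,294,400 pixels
print(f"{pix_5k / pix_4k - 1:.0%}")  # 78% more pixels at 5K
print(f"{77 / 51 - 1:.0%}")          # 51% higher fps at 4K (about 50%)
```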
- Top performers rarely showed childhood talent or underwent intense early training
A survey shows that chess grandmasters, Olympic gold medalists, and Nobel laureates were rarely child prodigies, and that neither childhood excellence nor intense early training often leads to the highest achievements in the adult world. The analysis, based on 19 studies covering nearly 35,000 top performers, indicates that the vast majority of adults at the global top of their fields grew up participating in a variety of activities and gradually developed their most refined skills. Across professional fields, early high achievers and later world-class performers are largely different people: only about 10% of those who excel as adults also excelled as minors, and only about 10% of those who excelled as minors went on to outstanding adult achievement. Reducing intense training schedules in childhood and adolescence may help prevent the burnout and injuries that can harm long-term careers.
- Samsung to launch a 6K glasses-free 3D gaming monitor in 2026
Mainstream monitor resolutions are slowly moving from 4K to 6K. Samsung plans to launch the Odyssey 3D G90XH gaming monitor in 2026: a 32-inch IPS display with 6K resolution and glasses-free 3D, a 165Hz refresh rate, and real-time eye tracking that "automatically adjusts depth and perspective" based on the user's position. It can switch between two resolutions: 6K@165Hz or 3K@330Hz. Samsung will also launch the Odyssey G6 G60H, a 27-inch gaming monitor with a 1040Hz refresh rate, likewise supporting two modes; 1040Hz is limited to 720p resolution: 720p@1040Hz or 1440p@600Hz. The monitor is compatible with AMD FreeSync Premium and NVIDIA G-Sync.
- The dictionary's glory days are gone for good
In the late 1980s, Merriam-Webster's Collegiate Dictionary spent 155 consecutive weeks on the New York Times bestseller list and ultimately sold 57 million copies, second only to the Bible in the US. But the dictionary's glory days are long gone; in the internet era, dictionaries are struggling. Twenty-five years ago the US had about 200 full-time lexicographers; today there may be fewer than 30. Merriam-Webster now belongs to Encyclopaedia Britannica, which itself stopped publishing print editions in 2012. Britannica's website draws about a billion page views a year, but the main content is not dictionary entries but word games, trending slang, and advertising. An analysis of a digital library found that English vocabulary grew from about 600,000 words in 1950 to over a million in 2000, and that 52% of English words in printed books are "lexical dark matter" appearing in no standard dictionary.