OrangeBot.AI Digest — 2025-12-28
42 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- What an unprocessed photo looks like (maurycyz.com)
- Stepping down as Mockito maintainer after 10 years (github.com)
- Learn computer graphics from scratch and for free (www.scratchapixel.com)
- Never Use Pixelation to Hide Sensitive Text (2014) (dheera.net)
- Last Year on My Mac: Look Back in Disbelief (eclecticlight.co)
- Building a macOS app to know when my Mac is thermal throttling (stanislas.blog)
- Hungry Fat Cells Could Someday Starve Cancer (www.ucsf.edu)
- One year of keeping a tada list (www.ducktyped.org)
- Growing up in “404 Not Found”: China's nuclear city in the Gobi Desert (substack.com)
- C++ says “We have try...finally at home” (devblogs.microsoft.com)
- AI Slop Report: The Global Rise of Low-Quality AI Videos (www.kapwing.com)
- Rex is a safe kernel extension framework that allows Rust in the place of eBPF (github.com)
- Dialtone – AOL 3.0 Server (dialtone.live)
- Calendar (neatnik.net)
- Liberating Bluetooth on the ESP32 (exquisite.tube)
GitHub Trending(9)
- Flowseal / zapret-discord-youtube
- tw93 / Mole
🐹 Deep clean and optimize your Mac.
- TheAlgorithms / Python
All Algorithms implemented in Python
- Sergeydigl3 / zapret-discord-youtube-linux
(NOW ONLY FOR NFTABLES) Port of zapret-discord-youtube from Flowseal and bol-van for easy use on Linux
- BloopAI / vibe-kanban
Get 10X more out of Claude Code, Codex or any coding agent
- RustPython / RustPython
A Python Interpreter written in Rust
- QuantConnect / Lean
Lean Algorithmic Trading Engine by QuantConnect (Python, C#)
- Shubhamsaboo / awesome-llm-apps
Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.
- sinelaw / fresh
Text editor for your terminal: easy, powerful and fast
Hugging Face(7)
- Latent Implicit Visual Reasoning
While Large Multimodal Models (LMMs) have made significant progress, they remain largely text-centric, relying on language as their core reasoning modality. As a result, they are limited in their ability to handle reasoning tasks that are predominantly visual. Recent approaches have sought to address this by supervising intermediate visual steps with helper images, depth maps, or image crops. However, these strategies impose restrictive priors on what "useful" visual abstractions look like, add heavy annotation costs, and struggle to generalize across tasks. To address this critical limitation, we propose a task-agnostic mechanism that trains LMMs to discover and use visual reasoning tokens without explicit supervision. These tokens attend globally and re-encode the image in a task-adaptive way, enabling the model to extract relevant visual information without hand-crafted supervision. Our approach outperforms direct fine-tuning and achieves state-of-the-art results on a diverse range of vision-centric tasks -- including those where intermediate abstractions are hard to specify -- while also generalizing to multi-task instruction tuning.
- Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL) have achieved unprecedented success on many problem domains. During RL, these models explore by generating new outputs, one token at a time. However, sampling actions token-by-token can result in highly inefficient learning, particularly when rewards are sparse. Here, we show that it is possible to overcome this problem by acting and exploring within the internal representations of an autoregressive model. Specifically, to discover temporally-abstract actions, we introduce a higher-order, non-causal sequence model whose outputs control the residual stream activations of a base autoregressive model. On grid world and MuJoCo-based tasks with hierarchical structure, we find that the higher-order model learns to compress long activation sequence chunks onto internal controllers. Critically, each controller executes a sequence of behaviorally meaningful actions that unfold over long timescales and are accompanied with a learned termination condition, such that composing multiple controllers over time leads to efficient exploration on novel tasks. We show that direct internal controller reinforcement, a process we term "internal RL", enables learning from sparse rewards in cases where standard RL finetuning fails. Our results demonstrate the benefits of latent action generation and reinforcement in autoregressive models, suggesting internal RL as a promising avenue for realizing hierarchical RL within foundation models.
- Spatia: Video Generation with Updatable Spatial Memory
Existing video generation models struggle to maintain long-term spatial and temporal consistency due to the dense, high-dimensional nature of video signals. To overcome this limitation, we propose Spatia, a spatial memory-aware video generation framework that explicitly preserves a 3D scene point cloud as persistent spatial memory. Spatia iteratively generates video clips conditioned on this spatial memory and continuously updates it through visual SLAM. This dynamic-static disentanglement design enhances spatial consistency throughout the generation process while preserving the model's ability to produce realistic dynamic entities. Furthermore, Spatia enables applications such as explicit camera control and 3D-aware interactive editing, providing a geometrically grounded framework for scalable, memory-driven video generation.
- Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Large language models increasingly expose reasoning traces, yet their underlying cognitive structure and steps remain difficult to identify and analyze beyond surface-level statistics. We adopt Schoenfeld's Episode Theory as an inductive, intermediate-scale lens and introduce ThinkARM (Anatomy of Reasoning in Models), a scalable framework that explicitly abstracts reasoning traces into functional reasoning steps such as Analysis, Explore, Implement, Verify, etc. When applied to mathematical problem solving by diverse models, this abstraction reveals reproducible thinking dynamics and structural differences between reasoning and non-reasoning models, which are not apparent from token-level views. We further present two diagnostic case studies showing that exploration functions as a critical branching step associated with correctness, and that efficiency-oriented methods selectively suppress evaluative feedback steps rather than uniformly shortening responses. Together, our results demonstrate that episode-level representations make reasoning steps explicit, enabling systematic analysis of how reasoning is structured, stabilized, and altered in modern language models.
- How Much 3D Do Video Foundation Models Encode?
Videos are continuous 2D projections of 3D worlds. After training on large video data, will global 3D understanding naturally emerge? We study this by quantifying the 3D understanding of existing Video Foundation Models (VidFMs) pretrained on vast video data. We propose the first model-agnostic framework that measures the 3D awareness of various VidFMs by estimating multiple 3D properties from their features via shallow read-outs. Our study presents meaningful findings regarding the 3D awareness of VidFMs on multiple axes. In particular, we show that state-of-the-art video generation models exhibit a strong understanding of 3D objects and scenes, despite not being trained on any 3D data. Such understanding can even surpass that of large expert models specifically trained for 3D tasks. Our findings, together with the 3D benchmarking of major VidFMs, provide valuable observations for building scalable 3D models.
- VA-π: Variational Policy Alignment for Pixel-Aware Autoregressive Generation
Autoregressive (AR) visual generation relies on tokenizers to map images to and from discrete sequences. However, tokenizers are trained to reconstruct clean images from ground-truth tokens, while AR generators are optimized only for token likelihood. This misalignment leads to generated token sequences that may decode into low-quality images, without direct supervision from the pixel space. We propose VA-π, a lightweight post-training framework that directly optimizes AR models with a principled pixel-space objective. VA-π formulates the generator-tokenizer alignment as a variational optimization, deriving an evidence lower bound (ELBO) that unifies pixel reconstruction and autoregressive modeling. To optimize under the discrete token space, VA-π introduces a reinforcement-based alignment strategy that treats the AR generator as a policy and uses pixel-space reconstruction quality as its intrinsic reward. The reward is measured by how well the predicted token sequences can reconstruct the original image under teacher forcing, giving the model direct pixel-level guidance without expensive free-running sampling. The regularization term of the ELBO serves as a natural regularizer, maintaining distributional consistency of tokens. VA-π enables rapid adaptation of existing AR generators, with neither tokenizer retraining nor external reward models. With only 1% of ImageNet-1K data and 25 minutes of tuning, it reduces FID from 14.36 to 7.65 and improves IS from 86.55 to 116.70 on LlamaGen-XXL, while also yielding notable gains in the text-to-image task on GenEval for both a visual generation model (LlamaGen: from 0.306 to 0.339) and a unified multi-modal model (Janus-Pro: from 0.725 to 0.744). Code is available at https://github.com/Lil-Shake/VA-Pi.
- GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training
Multi-turn reinforcement learning (RL) for multi-modal agents built upon vision-language models (VLMs) is hampered by sparse rewards and long-horizon credit assignment. Recent methods densify the reward by querying a teacher that provides step-level feedback, e.g., Guided Thought Reinforcement (GTR) and On-Policy Distillation, but rely on costly, often privileged models as the teacher, limiting practicality and reproducibility. We introduce GTR-Turbo, a highly efficient upgrade to GTR, which matches the performance without training or querying an expensive teacher model. Specifically, GTR-Turbo merges the weights of checkpoints produced during the ongoing RL training, and then uses this merged model as a "free" teacher to guide the subsequent RL via supervised fine-tuning or soft logit distillation. This design removes dependence on privileged VLMs (e.g., GPT or Gemini), mitigates the "entropy collapse" observed in prior work, and keeps training stable. Across diverse visual agentic tasks, GTR-Turbo improves the accuracy of the baseline model by 10-30% while reducing wall-clock training time by 50% and compute cost by 60% relative to GTR.
Solidot(11)
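- Calibre adds an AI "discussion" feature
Early this month Calibre shipped a controversial update that introduces an AI "discussion" feature. The feature was proposed by Amir Tehrani in August this year and readily accepted by Kovid Goyal, Calibre's author and maintainer; the first release with AI features came out in early December. Goyal has promised that Calibre will never use third-party AI services unless the user explicitly opts in. In response to user objections, he has stressed that the feature will not be removed. After updating, users see the AI feature "Discuss selected books with AI" under the "View" menu. Unless the user configures a GitHub AI access token or a Google AI API key, or runs a model locally through LM Studio or Ollama, the feature does nothing useful. For most users it amounts to a few menu items that serve no purpose. For users strongly opposed to AI, Goyal offers an alternative: every Calibre release remains available for download, so they can pick any version from 0.6.x to 8.x. The open-source community currently has no e-book manager that can functionally replace Calibre. Calibre's adoption of AI underscores how AI is slowly seeping into our lives whether users want it or not.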
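- After a two-year wait, FFmpeg sends Rockchip a DMCA takedown notice
After FFmpeg developers filed a DMCA notice, GitHub took down Rockchip's open-source media processing library, Media Process Platform. FFmpeg first accused Rockchip in February 2024 of violating the LGPL license the project uses. Rockchip copied FFmpeg's source code, removed the original author notices, claimed ownership of the code, and then republished it under the Apache license, which is incompatible with the LGPL. The FFmpeg project waited almost two years, but Rockchip's final response showed that it had no intention of resolving the issue. The DMCA notice demands that the infringing files be removed, or that correct attribution be restored and an LGPL-compatible license be used.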
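- MIT scientists synthesize a natural molecule with anticancer potential for the first time
Scientists at MIT, working with the Dana-Farber Cancer Institute, have for the first time synthesized the natural fungal molecule verticillin A in the laboratory. The molecule was first discovered more than 50 years ago and has attracted attention for its notable anticancer potential, but its complex structure had long prevented artificial synthesis. The results, published in the Journal of the American Chemical Society, could open a path to a new class of anticancer drugs. In the new work, the team not only achieved the total synthesis of verticillin A but also used it as a basis for designing several new derivatives. Preliminary tests show that some derivatives have strong antitumor activity against diffuse midline glioma, a rare pediatric brain cancer. Starting from the amino acid derivative β-hydroxytryptophan, the team introduced chemical functional groups such as alcohols, ketones, and amides step by step, precisely controlling the stereochemistry at each step. After 16 carefully controlled reactions, they arrived at the verticillin A molecule.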
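- NUDT maglev test vehicle reaches 700 km/h
China's National University of Defense Technology (NUDT) has disclosed that its maglev team accelerated a tonne-class test vehicle to 700 km/h within two seconds. The test speed broke the world record for platforms of the same type and is the fastest superconducting electrodynamic maglev test speed in the world. Maglev trains are faster, accelerate and decelerate better, and cost less to maintain, but they cost more to build and are incompatible with existing rail infrastructure. To date only seven maglev trains are in operation: four in China, two in South Korea, and one in Japan. Two intercity maglev lines are under construction, one linking Tokyo and Nagoya in Japan and the other linking Changsha and Liuyang in Hunan. Japan's experimental maglev train, the L0 Series, set a speed record of 603 km/h in 2015.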
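- Ozempic is quietly reshaping our shopping habits
Popular GLP-1 weight-loss drugs such as Ozempic not only help people lose weight but also quietly reshape shopping habits, reducing food purchases. According to a study published in the Journal of Marketing Research, households taking GLP-1 drugs cut their grocery spending by 5.3% within six months. The drop was larger for high-income households, at 8.2%. Spending on savory snacks fell the most. Households taking GLP-1 drugs cut their food spending by 10.1% on average. Notably, GLP-1 users increased their spending on healthier foods such as yogurt and fresh fruit. However, the researchers observed that people who stop taking GLP-1 drugs quickly return to their previous purchasing habits and regain most of the lost weight within a few months.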
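- Bitcoin mines are converting into AI data centers
Bitcoin mining difficulty doubled in 2024, and the coin's price has fallen from its peak of $120,000 in October this year to under $90,000. Even so, the bitcoin miners' ETF has surged about 90% this year, and the reason is not bitcoin but the miners' rush to convert into AI data centers. Bitcoin mines happen to own exactly the assets the AI race urgently needs: data centers, cooling systems, land, and power contracts. AI data centers do need more advanced cooling and networking, and the specialized mining rigs have to be replaced with Nvidia GPUs, but by partnering with miners, AI companies can reuse existing facilities faster and more cheaply than building new data centers from scratch. Core Scientific is one example: the company regards the conversion to AI data centers as an extraordinary opportunity and plans to exit bitcoin mining entirely in 2028.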
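- The universe may be asymmetric
Modern cosmology rests on the assumption of the cosmological principle, which holds that on large scales the universe is uniform and symmetric, looking the same in every place and in every direction. Strong new evidence, however, suggests that this basic assumption may be wrong. A puzzle known as the Cosmic Dipole Anomaly is challenging our understanding of the universe. Scientists have long observed a clear dipole in the cosmic microwave background (CMB): one side of the sky is slightly warmer and the other slightly cooler, with a difference of about one part in a thousand. This is generally attributed to a kinematic effect, namely that the Solar System, the Milky Way, and indeed the entire Local Group are moving through the universe at several hundred kilometers per second. In 1984 the astronomers George Ellis and John Baldwin proposed a test. They pointed out that if our motion is the sole cause of the CMB dipole, the same motion should also leave a corresponding dipole in the spatial distribution of distant objects; this is known as the Ellis & Baldwin test. Because of relativistic effects, the dipole signal expected in the matter distribution does not simply match the amplitude of the CMB dipole but is amplified by factors related to the number counts and spectral properties of the objects. Here lies the anomaly: although the observed matter dipole points in the same direction as the CMB dipole, its amplitude is anomalously larger than the already amplified theoretical prediction. This means that matter in the universe is distributed more asymmetrically than our motion alone can explain, and the universe may be intrinsically lopsided.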
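- Judge in Vizio GPL compliance lawsuit rules Vizio need not provide the signing keys required to install modified software
In 2021 the Software Freedom Conservancy (SFC), a nonprofit dedicated to promoting open-source software and defending the free-software GPL license, sued Vizio, alleging that it had repeatedly failed to meet the basic requirements of the GPL. Vizio is a consumer electronics brand whose main products are high-definition TVs; its founder is William Wang (王蔚). SFC says the SmartCast system used in Vizio TVs contains GPL-licensed software, and that under the GPL, consumers who buy Vizio products have the right to access the source code and to modify, study, and, under appropriate conditions, redistribute it. SFC is seeking to have Vizio fulfill its compliance obligations. The judge has now ruled that the GPLv2 license does not require providing the signing keys needed to install modified versions of the software on the device. Linus Torvalds holds that what GPLv2 requires is the source code: it applies to the software and does not extend to the hardware or require access to it. GPLv2 does not force vendors to open up their hardware. Torvalds criticized (or rather, blasted) SFC's approach.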
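- elementary OS 8.1 released
elementary OS, the Ubuntu-based distribution known for its ease of use, has released version 8.1. Major changes include a Wayland session by default and improvements to window management and multitasking. elementary OS 8.1 supports Arm64 devices, which means users can run it on Apple M-series devices or on other devices that can load UEFI firmware.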
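- Neanderthals may have been absorbed by modern humans rather than going extinct
How the Neanderthals disappeared, or why they went extinct, remains a subject of debate. Hypotheses for their extinction include population decline, environmental change, losing out in competition with Homo sapiens (modern humans), and genetic assimilation. Researchers in Italy and Switzerland have published a paper in Scientific Reports supporting the genetic assimilation view: Neanderthals were absorbed by modern humans, so we are the Neanderthals, with no line between the two. Neanderthal tribes were typically an order of magnitude smaller than Homo sapiens tribes, and as Homo sapiens kept migrating into Neanderthal tribes, repeated interbreeding and genetic mixing eventually led to the smaller Neanderthal populations being absorbed by Homo sapiens.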
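- Two ancient humans both carried the cancer-causing virus HPV16
Ötzi, also known as the Iceman, is a natural mummy about 5,000 years old that was preserved intact by ice; he was found on a glacier in the Ötztal Alps. Ötzi is known for his well-preserved clothing, weapons, and tattoos, and his cause of death may have been the arrowhead in his shoulder. He was also found to have suffered from bone fractures, intestinal parasites, and blackened lungs. According to a paper posted on bioRxiv, scientists have now identified another of his ailments: the cancer-causing human papillomavirus HPV16. The scientists report that both Ötzi and a 45,000-year-old Homo sapiens fossil found in western Siberia carried DNA fragments of HPV16. That the cancer-causing virus was present in humans 5,000 kilometers and 40,000 years apart shows that it has been circulating among humans for a very long time, and that modern humans may have passed it to Neanderthals rather than the other way around. The researchers found that Neanderthals carried the low-risk human papillomavirus HPV12 rather than the highly carcinogenic HPV16. The new findings challenge the view that humans acquired HPV by interbreeding with Neanderthals.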