Weekly Digest — 2025-W43
134 unique stories (2025-10-20 → 2025-10-26), aggregated across 8 sources.
Hacker News(42)
- Claude Code on the web (www.anthropic.com)
- Dutch spy services have restricted intelligence-sharing with the United States (intelnews.org)
- AWS outage shows internet users 'at mercy' of too few providers, experts say (www.theguardian.com)
- Chess grandmaster Daniel Naroditsky has died (old.reddit.com)
- Production RAG: what I learned from processing 5M+ documents (blog.abdellatif.io)
- BERT is just a single text diffusion step (nathan.rs)
- Replacing a $3000/mo Heroku bill with a $55/mo server (disco.cloud)
- NASA chief suggests SpaceX may be booted from moon mission (www.cnn.com)
- Build Your Own Database (www.nan.fyi)
- ChatGPT Atlas (chatgpt.com)
- The Programmer Identity Crisis (hojberg.xyz)
- Fallout from the AWS outage: Smart mattresses go rogue (quasa.io)
GitHub Trending(26)
- anthropics / claude-cookbooks
A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.
- SagerNet / sing-box
The universal proxy platform
- DrewThomasson / ebook2audiobook
Generate audiobooks from e-books, voice cloning & 1107+ languages!
- x1xhlol / system-prompts-and-models-of-ai-tools
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus Agent Tools, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae, Traycer AI, VSCode Agent, Warp.dev, Windsurf, Xcode, Z.ai Code, dia & v0. (And other Open Sourced) System Prompts, Internal Tools & AI Models
- huggingface / lerobot
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
- wavetermdev / waveterm
An open-source, cross-platform terminal for seamless workflows
- mountain-loop / yaak
The most intuitive desktop API client. Organize and execute REST, GraphQL, WebSockets, Server Sent Events, and gRPC 🦬
- louislam / uptime-kuma
A fancy self-hosted monitoring tool
- lfnovo / open-notebook
An Open Source implementation of Notebook LM with more flexibility and features
- sharkdp / bat
A cat(1) clone with wings.
- servo / servo
Servo aims to empower developers with a lightweight, high-performance alternative for embedding web technologies in applications.
- emcie-co / parlant
LLM agents built for control. Designed for real-world use. Deployed in minutes.
Hugging Face(30)
- A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
Test-time scaling seeks to improve the reasoning performance of large language models (LLMs) by adding computational resources. A prevalent approach within the field is sampling-based test-time scaling methods, which enhance reasoning by generating multiple reasoning paths for a given input during inference. However, despite their practical success, their theoretical foundations remain underexplored. In this paper, we provide the first theoretical framework for analyzing sampling-based test-time scaling methods, grounded in the perspective of confidence estimation. Based on the framework, we analyze two dominant paradigms: self-consistency and perplexity, and reveal key limitations: self-consistency suffers from high estimation error while perplexity exhibits substantial modeling error and possible degradation of the estimation error convergence. To address these limitations, we introduce RPC, a hybrid method that leverages our theoretical insights through two key components: Perplexity Consistency and Reasoning Pruning. Perplexity Consistency combines the strengths of self-consistency and perplexity, boosting the convergence rate of estimation error from linear to exponential while preserving model error. Reasoning Pruning prevents degradation by eliminating low-probability reasoning paths. Both theoretical analysis and empirical results across seven benchmark datasets demonstrate that RPC has strong potential to reduce reasoning error. Notably, RPC achieves reasoning performance comparable to self-consistency while not only enhancing confidence reliability but also reducing sampling costs by 50%. The code and resources are available at https://wnjxyk.github.io/RPC.
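The contrast between the two confidence estimates the abstract compares can be sketched in a few lines. This is an illustrative toy, not the RPC implementation: each sampled reasoning path is reduced to its final answer plus a summed log-probability, and `probability_weighted` stands in for the general idea of probability-weighted voting rather than the paper's Perplexity Consistency estimator.

```python
import math
from collections import defaultdict

def self_consistency(samples):
    """Majority vote over sampled answers (count-based confidence)."""
    votes = defaultdict(float)
    for answer, _ in samples:
        votes[answer] += 1.0
    return max(votes, key=votes.get)

def probability_weighted(samples):
    """Vote weighted by each path's sequence probability exp(logp)."""
    votes = defaultdict(float)
    for answer, logp in samples:
        votes[answer] += math.exp(logp)
    return max(votes, key=votes.get)

# Three sampled reasoning paths: two low-probability paths agree on "42",
# one high-probability path says "41".
samples = [("42", -5.0), ("42", -5.0), ("41", -0.1)]
```

On this input `self_consistency` picks "42" (two votes to one) while `probability_weighted` picks "41" (exp(-0.1) ≈ 0.90 outweighs 2·exp(-5) ≈ 0.013), showing how the two estimators can disagree on the same set of samples.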
- OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Advancing machine intelligence requires developing the ability to perceive across multiple modalities, much as humans sense the world. We introduce OmniVinci, an initiative to build a strong, open-source, omni-modal LLM. We carefully study the design choices across model architecture and data curation. For model architecture, we present three key innovations: (i) OmniAlignNet for strengthening alignment between vision and audio embeddings in a shared omni-modal latent space; (ii) Temporal Embedding Grouping for capturing relative temporal alignment between vision and audio signals; and (iii) Constrained Rotary Time Embedding for encoding absolute temporal information in omni-modal embeddings. We introduce a curation and synthesis pipeline that generates 24M single-modal and omni-modal conversations. We find that modalities reinforce one another in both perception and reasoning. Our model, OmniVinci, outperforms Qwen2.5-Omni with +19.05 on DailyOmni (cross-modal understanding), +1.7 on MMAR (audio), and +3.9 on Video-MME (vision), while using just 0.2T training tokens - a 6 times reduction compared to Qwen2.5-Omni's 1.2T. We finally demonstrate omni-modal advantages in downstream applications spanning robotics, medical AI, and smart factory.
- NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
3D object editing is essential for interactive content creation in gaming, animation, and robotics, yet current approaches remain inefficient, inconsistent, and often fail to preserve unedited regions. Most methods rely on editing multi-view renderings followed by reconstruction, which introduces artifacts and limits practicality. To address these challenges, we propose Nano3D, a training-free framework for precise and coherent 3D object editing without masks. Nano3D integrates FlowEdit into TRELLIS to perform localized edits guided by front-view renderings, and further introduces region-aware merging strategies, Voxel/Slat-Merge, which adaptively preserve structural fidelity by ensuring consistency between edited and unedited areas. Experiments demonstrate that Nano3D achieves superior 3D consistency and visual quality compared with existing methods. Based on this framework, we construct the first large-scale 3D editing dataset, Nano3D-Edit-100k, which contains over 100,000 high-quality 3D editing pairs. This work addresses long-standing challenges in both algorithm design and data availability, significantly improving the generality and reliability of 3D editing, and laying the groundwork for the development of feed-forward 3D editing models. Project Page: https://jamesyjl.github.io/Nano3D
- Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs
Recent work has shown that narrow finetuning can produce broadly misaligned LLMs, a phenomenon termed emergent misalignment (EM). While concerning, these findings were limited to finetuning and activation steering, leaving out in-context learning (ICL). We therefore ask: does EM emerge in ICL? We find that it does: across three datasets, three frontier models produce broadly misaligned responses at rates between 2% and 17% given 64 narrow in-context examples, and up to 58% with 256 examples. We also examine mechanisms of EM by eliciting step-by-step reasoning (while leaving in-context examples unchanged). Manual analysis of the resulting chain-of-thought shows that 67.5% of misaligned traces explicitly rationalize harmful outputs by adopting a reckless or dangerous "persona", echoing prior results on finetuning-induced EM.
- Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
Instruction-based video editing promises to democratize content creation, yet its progress is severely hampered by the scarcity of large-scale, high-quality training data. We introduce Ditto, a holistic framework designed to tackle this fundamental challenge. At its heart, Ditto features a novel data generation pipeline that fuses the creative diversity of a leading image editor with an in-context video generator, overcoming the limited scope of existing models. To make this process viable, our framework resolves the prohibitive cost-quality trade-off by employing an efficient, distilled model architecture augmented by a temporal enhancer, which simultaneously reduces computational overhead and improves temporal coherence. Finally, to achieve full scalability, this entire pipeline is driven by an intelligent agent that crafts diverse instructions and rigorously filters the output, ensuring quality control at scale. Using this framework, we invested over 12,000 GPU-days to build Ditto-1M, a new dataset of one million high-fidelity video editing examples. We trained our model, Editto, on Ditto-1M with a curriculum learning strategy. The results demonstrate superior instruction-following ability and establish a new state-of-the-art in instruction-based video editing.
- Latent Diffusion Model without Variational Autoencoder
Recent progress in diffusion-based visual generation has largely relied on latent diffusion models with variational autoencoders (VAEs). While effective for high-fidelity synthesis, this VAE+diffusion paradigm suffers from limited training efficiency, slow inference, and poor transferability to broader vision tasks. These issues stem from a key limitation of VAE latent spaces: the lack of clear semantic separation and strong discriminative structure. Our analysis confirms that these properties are crucial not only for perception and understanding tasks, but also for the stable and efficient training of latent diffusion models. Motivated by this insight, we introduce SVG, a novel latent diffusion model without variational autoencoders, which leverages self-supervised representations for visual generation. SVG constructs a feature space with clear semantic discriminability by leveraging frozen DINO features, while a lightweight residual branch captures fine-grained details for high-fidelity reconstruction. Diffusion models are trained directly on this semantically structured latent space to facilitate more efficient learning. As a result, SVG enables accelerated diffusion training, supports few-step sampling, and improves generative quality. Experimental results further show that SVG preserves the semantic and discriminative capabilities of the underlying self-supervised representations, providing a principled pathway toward task-general, high-quality visual representations.
- PICABench: How Far Are We from Physically Realistic Image Editing?
Image editing has achieved remarkable progress recently. Modern editing models can already follow complex instructions to manipulate the original content. However, beyond completing the editing instructions, the accompanying physical effects are the key to generation realism. For example, removing an object should also remove its shadow, reflections, and interactions with nearby objects. Unfortunately, existing models and benchmarks mainly focus on instruction completion but overlook these physical effects. So, at this moment, how far are we from physically realistic image editing? To answer this, we introduce PICABench, which systematically evaluates physical realism across eight sub-dimensions (spanning optics, mechanics, and state transitions) for most common editing operations (add, remove, attribute change, etc.). We further propose PICAEval, a reliable evaluation protocol that uses VLM-as-a-judge with per-case, region-level human annotations and questions. Beyond benchmarking, we also explore effective solutions by learning physics from videos and construct a training dataset, PICA-100K. After evaluating most of the mainstream models, we observe that physical realism remains a challenging problem with considerable room for improvement. We hope that our benchmark and proposed solutions can serve as a foundation for future work moving from naive content editing toward physically consistent realism.
- DeepAnalyze: Agentic Large Language Models for Autonomous Data Science
Autonomous data science, from raw data sources to analyst-grade deep research reports, has been a long-standing challenge, and is now becoming feasible with the emergence of powerful large language models (LLMs). Recent workflow-based data agents have shown promising results on specific data tasks but remain fundamentally limited in achieving fully autonomous data science due to their reliance on predefined workflows. In this paper, we introduce DeepAnalyze-8B, the first agentic LLM designed for autonomous data science, capable of automatically completing the end-to-end pipeline from data sources to analyst-grade deep research reports. To tackle high-complexity data science tasks, we propose a curriculum-based agentic training paradigm that emulates the learning trajectory of human data scientists, enabling LLMs to progressively acquire and integrate multiple capabilities in real-world environments. We also introduce a data-grounded trajectory synthesis framework that constructs high-quality training data. Through agentic training, DeepAnalyze learns to perform a broad spectrum of data tasks, ranging from data question answering and specialized analytical tasks to open-ended data research. Experiments demonstrate that, with only 8B parameters, DeepAnalyze outperforms previous workflow-based agents built on the most advanced proprietary LLMs. The model, code, and training data of DeepAnalyze are open-sourced, paving the way toward autonomous data science.
- Glyph: Scaling Context Windows via Visual-Text Compression
Large language models (LLMs) increasingly rely on long-context modeling for tasks such as document understanding, code analysis, and multi-step reasoning. However, scaling context windows to the million-token level brings prohibitive computational and memory costs, limiting the practicality of long-context LLMs. In this work, we take a different perspective-visual context scaling-to tackle this challenge. Instead of extending token-based sequences, we propose Glyph, a framework that renders long texts into images and processes them with vision-language models (VLMs). This approach substantially compresses textual input while preserving semantic information, and we further design an LLM-driven genetic search to identify optimal visual rendering configurations for balancing accuracy and compression. Through extensive experiments, we demonstrate that our method achieves 3-4x token compression while maintaining accuracy comparable to leading LLMs such as Qwen3-8B on various long-context benchmarks. This compression also leads to around 4x faster prefilling and decoding, and approximately 2x faster SFT training. Furthermore, under extreme compression, a 128K-context VLM could scale to handle 1M-token-level text tasks. In addition, the rendered text data benefits real-world multimodal tasks, such as document understanding. Our code and model are released at https://github.com/thu-coai/Glyph.
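The compression claim rests on simple geometry: rendered text costs image patch tokens rather than BPE tokens. A back-of-envelope sketch, where the column count, glyph cell size, patch size, and chars-per-token ratio are all illustrative assumptions rather than the configuration Glyph's genetic search would actually find:

```python
import math

def text_tokens(n_chars, chars_per_token=4):
    """Rough BPE heuristic: ~4 English characters per text token."""
    return math.ceil(n_chars / chars_per_token)

def visual_tokens(n_chars, cols=120, cell_w=6, cell_h=12, patch=28):
    """Patch-token count if the same text is rendered as a bitmap.

    Assumes a monospaced grid of `cols` columns with cell_w x cell_h px
    glyph cells, encoded by a ViT with square patches of `patch` px.
    """
    rows = math.ceil(n_chars / cols)
    width_px, height_px = cols * cell_w, rows * cell_h
    return math.ceil(width_px / patch) * math.ceil(height_px / patch)
```

For a 40,000-character document these assumptions give 10,000 text tokens versus 3,744 visual tokens, roughly a 2.7x reduction; denser rendering configurations push the ratio toward the 3-4x the paper reports.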
- FineVision: Open Data Is All You Need
The advancement of vision-language models (VLMs) is hampered by a fragmented landscape of inconsistent and contaminated public datasets. We introduce FineVision, a meticulously collected, curated, and unified corpus of 24 million samples - the largest open resource of its kind. We unify more than 200 sources into 185 subsets via a semi-automated, human-in-the-loop pipeline: automation performs bulk ingestion and schema mapping, while reviewers audit mappings and spot-check outputs to verify faithful consumption of annotations, appropriate formatting and diversity, and safety; issues trigger targeted fixes and re-runs. The workflow further applies rigorous de-duplication within and across sources and decontamination against 66 public benchmarks. FineVision also encompasses agentic/GUI tasks with a unified action space; reviewers validate schemas and inspect a sample of trajectories to confirm executable fidelity. Models trained on FineVision consistently outperform those trained on existing open mixtures across a broad evaluation suite, underscoring the benefits of scale, data hygiene, and balanced automation with human oversight. We release the corpus and curation tools to accelerate data-centric VLM research.
- Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm for enhancing large language models (LLMs) by retrieving relevant documents from an external corpus. However, existing RAG systems primarily focus on unimodal text documents, and often fall short in real-world scenarios where both queries and documents may contain mixed modalities (such as text and images). In this paper, we address the challenge of Universal Retrieval-Augmented Generation (URAG), which involves retrieving and reasoning over mixed-modal information to improve vision-language generation. To this end, we propose Nyx, a unified mixed-modal to mixed-modal retriever tailored for URAG scenarios. To mitigate the scarcity of realistic mixed-modal data, we introduce a four-stage automated pipeline for generation and filtering, leveraging web documents to construct NyxQA, a dataset comprising diverse mixed-modal question-answer pairs that better reflect real-world information needs. Building on this high-quality dataset, we adopt a two-stage training framework for Nyx: we first perform pre-training on NyxQA along with a variety of open-source retrieval datasets, followed by supervised fine-tuning using feedback from downstream vision-language models (VLMs) to align retrieval outputs with generative preferences. Experimental results demonstrate that Nyx not only performs competitively on standard text-only RAG benchmarks, but also excels in the more general and realistic URAG setting, significantly improving generation quality in vision-language tasks.
- When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
Ensembling Large Language Models (LLMs) has gained attention as a promising approach to surpass the performance of individual models by leveraging their complementary strengths. In particular, aggregating models' next-token probability distributions to select the next token has been shown to be effective in various tasks. However, while successful for short-form answers, its application to long-form generation remains underexplored. In this paper, we show that using existing ensemble methods in long-form generation requires a careful choice of ensembling positions, since the standard practice of ensembling at every token often degrades performance. We identify two key factors for determining these positions: tokenization mismatch across models and consensus in their next-token probability distributions. Based on this, we propose SAFE (Stable And Fast LLM Ensembling), a framework that selectively ensembles by jointly considering these factors. To further improve stability, we introduce a probability sharpening strategy that consolidates probabilities spread across multiple sub-word tokens representing the same word into a single representative token. Our experiments on diverse benchmarks, including MATH500 and BBH, demonstrate that SAFE outperforms existing methods in both accuracy and efficiency, with gains achieved even when ensembling fewer than 1% of tokens.
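One decoding step of selective ensembling can be sketched as follows. This is an illustrative toy, not SAFE itself: a shared vocabulary is assumed (the tokenization-mismatch factor is ignored), and a minimum-probability consensus check stands in for the paper's position-selection criteria.

```python
def ensemble_step(dists, consensus_threshold=0.5):
    """Pick the next token from several models' distributions.

    dists: one dict of token -> probability per model, over a shared
    vocabulary. Distributions are averaged only when every model puts
    enough mass on the averaged winner; otherwise the step falls back
    to the first model alone (i.e. ensembling is skipped here).
    """
    vocab = set().union(*dists)
    avg = {t: sum(d.get(t, 0.0) for d in dists) / len(dists) for t in vocab}
    candidate = max(avg, key=avg.get)
    consensus = min(d.get(candidate, 0.0) for d in dists)
    if consensus < consensus_threshold:
        # Low consensus: skip ensembling at this position.
        return max(dists[0], key=dists[0].get)
    return candidate
```

When the models agree (e.g. both favor token "a"), the averaged distribution decides; when they disagree sharply, the step defers to a single model, mirroring the finding that ensembling at every token can hurt.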
Solidot(36)
- AWS outage hits Amazon, Fortnite, and many other services
Amazon AWS suffered a major outage affecting millions of websites and services, including Amazon itself, Prime Video, Perplexity AI, Canva, and games such as Fortnite. In a statement on its AWS status page, Amazon confirmed significant error rates for requests to the DynamoDB endpoint in the US-EAST-1 region and said engineers were working to mitigate the issue and fully understand its root cause.
- Xubuntu website compromised to serve cryptocurrency-stealing malware
The official website of Xubuntu, the Ubuntu Linux derivative that uses the Xfce desktop environment, was compromised. On the download page the attackers served a zip file containing a suspicious exe and a tos.txt file whose copyright notice reads Copyright (c) 2026 Xubuntu.org. The embedded malware is designed to steal cryptocurrency: it scans the clipboard for cryptocurrency addresses and replaces them with attacker-controlled wallet addresses. The scanned currencies include Bitcoin, Litecoin, Ethereum, and Dogecoin.
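The replacement trick works because cryptocurrency addresses have easily recognizable formats. A defensive sketch of the same pattern matching, using simplified, illustrative regexes for two of the affected currencies (real validation also checks checksums, which a regex alone cannot do):

```python
import re

# Simplified, illustrative address patterns -- not exhaustive validation.
PATTERNS = {
    "bitcoin": re.compile(
        r"\b(?:bc1[a-z0-9]{25,59}|[13][a-km-zA-HJ-NP-Z1-9]{25,34})\b"),
    "ethereum": re.compile(r"\b0x[a-fA-F0-9]{40}\b"),
}

def find_crypto_addresses(text):
    """Return (currency, address) pairs found in clipboard-like text."""
    return [(coin, match.group())
            for coin, pattern in PATTERNS.items()
            for match in pattern.finditer(text)]
```

A clipboard monitor built on this could warn the user when an address on the clipboard changes without any user action, which is exactly the behavior such stealers exhibit.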
- Windows 11 update breaks the Recovery Environment
Windows Recovery Environment (WinRE) is Windows' recovery environment, used to troubleshoot a computer after boot failures, including booting into the BIOS or starting in Safe Mode. The Windows 11 October update KB5066835 contains a bug that causes USB keyboards and mice to stop responding inside WinRE, rendering WinRE useless for most users. PS/2 keyboards and mice are unaffected, but modern computers almost never use peripherals with that interface. Microsoft says it is working on a fix.
- Japanese e-commerce giant ASKUL hit by ransomware
Japanese e-commerce giant ASKUL announced on the 19th that a cyberattack had caused a system failure and that it had suspended order processing and shipping. The outage was caused by a ransomware infection. Ryohin Keikaku, the operator of MUJI, also disclosed on the 20th that it had suspended its online store on the evening of the 19th; it outsources part of its delivery operations to an ASKUL subsidiary, and logistics were disrupted. ASKUL discovered the infection on the morning of the 19th and has no timeline for recovery. The company is investigating the scope of the impact, including whether personal information or customer data was leaked, and apologized for the trouble and concern caused.
- GIMP now officially available as a Snap
GIMP now officially provides a Snap package, alongside its existing Flatpak package and its MSIX package in the Microsoft Store. Snap is a package format led by Canonical. GIMP's previous, unofficial Snap package was maintained by the downstream Snapcrafters project; GIMP contacted Snapcrafters, who readily agreed to transfer ownership. GIMP 3.0.6 is the first officially maintained Snap release.
- NVIDIA shows off first US-made Blackwell chip
NVIDIA and TSMC unveiled the first US-made Blackwell chip in Phoenix, Arizona. Blackwell is NVIDIA's latest GPU architecture, the one used by the GeForce RTX 5000 series graphics cards. NVIDIA CEO Jensen Huang called it a historic moment for many reasons: for the first time in recent US history, the most important chips are being produced in the United States at TSMC's most advanced fab. Blackwell will be manufactured at Fab 21, TSMC's phase-one Arizona fab, a site planned for volume production of the 2 nm, 3 nm, 4 nm, and A16 advanced process nodes. NVIDIA says that with Blackwell production starting at TSMC's Arizona site, it will be better insulated from shifting tariff conditions and geopolitical tensions.
- With SpaceX behind schedule, NASA may pick another company for its lunar lander
SpaceX previously signed a $2.9 billion contract with NASA to provide the lander that will carry astronauts to the lunar surface. NASA acting administrator Sean Duffy said Monday on CNBC's Squawk Box that SpaceX has pushed back its timeline, and with the US racing to land crew on the Moon before China, NASA is considering letting other companies compete with SpaceX to build the lunar lander. Cancelling or modifying the SpaceX contract would mark a major reversal of NASA's plans.
- US airliner may have collided with a weather balloon
United Airlines flight UA1093 from Denver to Los Angeles suffered a windshield strike last Thursday, October 16. One of the two front windows of the 737 MAX cracked severely but did not shatter completely, and the pilot's arm showed what appeared to be cuts. The captain claimed the object that hit the glass was space debris; the collision occurred at an altitude of around 10,000 meters. The aircraft diverted to Salt Lake City International Airport. The National Transportation Safety Board (NTSB) said it is investigating the incident. The CEO of WindBorne Systems later said the object may have been one of the company's weather balloons, which was flying in the same area at the same altitude at the time.
- KDE Plasma 6.5 released
The KDE Plasma desktop environment has released v6.5. Major changes include rounded bottom window corners, automatic light/dark theme switching, improved System Settings, a hibernate option on the login screen, improved accessibility features, better HDR display support via adjustable tone-mapping curves, experimental support for the Wayland picture-in-picture protocol, enhanced Wayland functionality, a new desktop grayscale option, and more.
- China's southeast coast faces accelerating sea-level rise and land subsidence at once
According to a study published in Nature, researchers from Rutgers University, Sun Yat-sen University, and other institutions report that China's southeastern coast is simultaneously experiencing accelerating sea-level rise and accelerating land subsidence, a combination they say has never before been observed in the Holocene geological record. The region is among the most densely populated in the world. The researchers analyzed sea-level change over the past 11,700 years and identified three phases: an early phase of rapid rise driven by glacial melt, a stable phase from about 4,200 years ago to the mid-19th century, and, since 1900, a rate of rise exceeding that of any century in the past 4,000 years. On top of the accelerating rise, the region faces human-induced land subsidence: large-scale urbanization has produced subsidence rates far exceeding the rate of sea-level rise, with cities such as Chaozhou, Fuzhou, Shaoxing, Shantou, and Hangzhou sinking several times faster than the sea is rising. The combined effect raises the region's flood risk.
- EEG of surgically disconnected brain regions resembles deep sleep
According to a study published in PLoS Biology, EEG recordings from surgically disconnected brain regions resemble those of deep sleep, a finding that deepens our understanding of conscious and unconscious brain states. For children with severe epilepsy that does not respond to medication, surgeons can perform a hemispherotomy, disconnecting the seizure-generating hemisphere from the rest of the brain to stop seizures from spreading; the disconnected tissue remains in the skull with its blood supply intact. Does the disconnected region retain, or can it exhibit, some form of consciousness? The researchers analyzed EEGs from ten children before surgery and from six months to three years afterward. They found that electrical activity in the disconnected regions slowed after surgery, while activity in the rest of the brain was unchanged and resembled that of control children. The disconnected regions produced mainly delta waves, with EEGs resembling those of control children in deep sleep.
- SpaceX launches its 10,000th Starlink satellite
On October 19, SpaceX launched two Falcon 9 rockets, each carrying 28 Starlink broadband satellites to orbit. The first lifted off from Florida, with its first stage setting a reuse record of 31 flights; the second lifted off from Vandenberg Space Force Base in California less than two hours later, carrying the 10,000th Starlink satellite to reach orbit. It was Falcon 9's 132nd launch of the year, matching last year's record with nearly two and a half months still remaining before 2026.