OrangeBot.AI Digest — 2025-12-24

60 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. Tell HN: Merry Christmas
  2. Nvidia buying AI chip startup Groq for about $20B in cash (www.cnbc.com)
  3. Show HN: Minimalist editor that lives in browser, stores everything in the URL (github.com)
  4. Fabrice Bellard: Biography (2009) [pdf] (www.ipaidia.gr)
  5. Show HN: Vibium – Browser automation for AI and humans, by Selenium's creator (github.com)
  6. I'm returning my Framework 16 (yorickpeterse.com)
  7. Why We Abandoned Matrix (2024) (forum.hackliberty.org)
  8. Games’ affordance of childlike wonder and reduced burnout risk in young adults (games.jmir.org)
  9. AMD entered the CPU market with reverse-engineered Intel 8080 clone 50 years ago (www.tomshardware.com)
  10. When Compilers Surprise You (xania.org)
  11. The e-scooter isn't new – London was zooming around on Autopeds a century ago (www.ianvisits.co.uk)
  12. Avoid Mini-Frameworks (laike9m.com)
  13. Google's year in review: areas with research breakthroughs in 2025 (blog.google)
  14. US sanctions EU government officials behind the DSA (mastodon.social)
  15. Don't Become the Machine (armeet.bearblog.dev)

GitHub Trending (15)

  1. rendercv / rendercv

    Typst-based CV/resume generator for academics and engineers

  2. twitter / the-algorithm

    Source code for the X Recommendation Algorithm

  3. google / langextract

    A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

  4. vllm-project / vllm-omni

    A framework for efficient model inference with omni-modality models

  5. stan-smith / FossFLOW

    Make beautiful isometric infrastructure diagrams

  6. davila7 / claude-code-templates

    CLI tool for configuring and monitoring Claude Code

  7. safety-research / bloom

    bloom - evaluate any behavior immediately  🌸🌱

  8. makeplane / plane

    🔥 🔥 🔥 Open Source JIRA, Linear, Monday, and Asana Alternative. Plane helps you track your issues, epics, and cycles the easiest way on the planet.

  9. yichuan-w / LEANN

    RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

  10. danielmiessler / Fabric

    Fabric is an open-source framework for augmenting humans using AI. It provides a modular system for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.

  11. apurvsinghgautam / robin

    AI-Powered Dark Web OSINT Tool

  12. langgenius / dify

    Production-ready platform for agentic workflow development.

  13. anthropics / skills

    Public repository for Agent Skills

  14. etcd-io / etcd

    Distributed reliable key-value store for the most critical data of a distributed system

  15. facebookresearch / dinov3

    Reference PyTorch implementation and models for DINOv3

Hugging Face (15)

  1. SemanticGen: Video Generation in Semantic Space

    State-of-the-art video generative models typically learn the distribution of video latents in the VAE space and map them to pixels using a VAE decoder. While this approach can generate high-quality videos, it suffers from slow convergence and is computationally expensive when generating long videos. In this paper, we introduce SemanticGen, a novel solution to address these limitations by generating videos in the semantic space. Our main insight is that, due to the inherent redundancy in videos, the generation process should begin in a compact, high-level semantic space for global planning, followed by the addition of high-frequency details, rather than directly modeling a vast set of low-level video tokens using bi-directional attention. SemanticGen adopts a two-stage generation process. In the first stage, a diffusion model generates compact semantic video features, which define the global layout of the video. In the second stage, another diffusion model generates VAE latents conditioned on these semantic features to produce the final output. We observe that generation in the semantic space leads to faster convergence compared to the VAE latent space. Our method is also effective and computationally efficient when extended to long video generation. Extensive experiments demonstrate that SemanticGen produces high-quality videos and outperforms state-of-the-art approaches and strong baselines.

  2. Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

    Existing reinforcement learning (RL) approaches treat large language models (LLMs) as a single unified policy, overlooking their internal mechanisms. Understanding how the policy evolves across layers and modules is therefore crucial for enabling more targeted optimization and unraveling complex reasoning mechanisms. In this paper, we decompose the language model policy by leveraging the intrinsic split of the Transformer residual stream and the equivalence between the composition of hidden states with the unembedding matrix and the resulting samplable policy. This decomposition reveals Internal Layer Policies, corresponding to contributions from individual layers, and Internal Modular Policies, which align with the self-attention and feed-forward network (FFN) components within each layer. By analyzing the entropy of internal policies, we find that: (a) Early layers keep high entropy for exploration, while top layers converge to near-zero entropy for refinement, with convergence patterns varying across model series. (b) Llama's prediction space rapidly converges in the final layer, whereas Qwen-series models, especially Qwen3, exhibit a more human-like, progressively structured reasoning pattern. Motivated by these findings, we propose Bottom-up Policy Optimization (BuPO), a novel RL paradigm that directly optimizes the internal layer policy during early training. By aligning the training objective at lower layers, BuPO reconstructs foundational reasoning capabilities and achieves superior performance. Extensive experiments on complex reasoning benchmarks demonstrate the effectiveness of our method. Our code is available at https://github.com/Trae1ounG/BuPO.
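
The layer-wise decomposition the abstract describes can be illustrated with a logit-lens-style readout: unembed each intermediate residual-stream state and measure the entropy of the resulting vocabulary distribution. A minimal sketch with random toy weights (all shapes, values, and names here are illustrative, not taken from the paper):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    # Shannon entropy of a probability vector, in nats.
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

rng = np.random.default_rng(0)
d_model, vocab, n_layers = 64, 1000, 8
W_U = rng.normal(size=(d_model, vocab)) / np.sqrt(d_model)  # unembedding matrix

# Residual stream: the hidden state after layer k is the running sum
# of per-layer contributions (attention + FFN outputs).
contribs = rng.normal(size=(n_layers, d_model)) * 0.3
hidden = np.cumsum(contribs, axis=0)

# "Internal layer policy": unembed the intermediate hidden state and
# softmax over the vocabulary, then inspect its entropy per layer.
for layer, h in enumerate(hidden):
    pi = softmax(h @ W_U)
    print(f"layer {layer}: entropy = {entropy(pi):.3f}")
```

With random weights the entropy profile is meaningless; on a trained model the same readout is what would expose the early-high, late-low entropy pattern the paper reports.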

  3. LongVideoAgent: Multi-Agent Reasoning with Long Videos

    Recent advances in multimodal LLMs and tool-using systems for long-video QA point to the promise of reasoning over hour-long episodes. However, many methods still compress content into lossy summaries or rely on limited toolsets, weakening temporal grounding and missing fine-grained cues. We propose a multi-agent framework in which a master LLM coordinates a grounding agent to localize question-relevant segments and a vision agent to extract targeted textual observations. The master agent plans within a step limit and is trained with reinforcement learning to encourage concise, correct, and efficient multi-agent cooperation. This design helps the master agent focus on relevant clips via grounding, complements subtitles with visual detail, and yields interpretable trajectories. On our proposed LongTVQA and LongTVQA+, episode-level datasets aggregated from TVQA/TVQA+, our multi-agent system significantly outperforms strong non-agent baselines. Experiments also show that reinforcement learning further strengthens reasoning and planning for the trained agent. Code and data will be shared at https://longvideoagent.github.io/.

  4. SpatialTree: How Spatial Abilities Branch Out in MLLMs

    Cognitive science suggests that spatial ability develops progressively, from perception to reasoning and interaction. Yet in multimodal LLMs (MLLMs), this hierarchy remains poorly understood, as most studies focus on a narrow set of tasks. We introduce SpatialTree, a cognitive-science-inspired hierarchy that organizes spatial abilities into four levels: low-level perception (L1), mental mapping (L2), simulation (L3), and agentic competence (L4). Based on this taxonomy, we construct the first capability-centric hierarchical benchmark, thoroughly evaluating mainstream MLLMs across 27 sub-abilities. The evaluation results reveal a clear structure: L1 skills are largely orthogonal, whereas higher-level skills are strongly correlated, indicating increasing interdependency. Through targeted supervised fine-tuning, we uncover a surprising transfer dynamic: negative transfer within L1, but strong cross-level transfer from low- to high-level abilities with notable synergy. Finally, we explore how to improve the entire hierarchy. We find that naive RL that encourages extensive "thinking" is unreliable: it helps complex reasoning but hurts intuitive perception. We propose a simple auto-think strategy that suppresses unnecessary deliberation, enabling RL to consistently improve performance across all levels. By building SpatialTree, we provide a proof-of-concept framework for understanding and systematically scaling spatial abilities in MLLMs.

  5. MemEvolve: Meta-Evolution of Agent Memory Systems

    Self-evolving memory systems are reshaping the evolutionary paradigm of large language model (LLM)-based agents in unprecedented ways. Prior work has predominantly relied on manually engineered memory architectures to store trajectories, distill experience, and synthesize reusable tools, enabling agents to evolve on the fly within environment interactions. However, this paradigm is fundamentally constrained by the static nature of the memory system itself: while memory facilitates agent-level evolution, the underlying memory architecture cannot be meta-adapted to diverse task contexts. To address this gap, we propose MemEvolve, a meta-evolutionary framework that jointly evolves agents' experiential knowledge and their memory architecture, allowing agent systems not only to accumulate experience but also to progressively refine how they learn from it. To ground MemEvolve in prior research and foster openness in future self-evolving systems, we introduce EvolveLab, a unified self-evolving memory codebase that distills twelve representative memory systems into a modular design space (encode, store, retrieve, manage), providing both a standardized implementation substrate and a fair experimental arena. Extensive evaluations on four challenging agentic benchmarks demonstrate that MemEvolve achieves (I) substantial performance gains, improving frameworks such as SmolAgent and Flash-Searcher by up to 17.06%; and (II) strong cross-task and cross-LLM generalization, designing memory architectures that transfer effectively across diverse benchmarks and backbone models.

  6. Step-DeepResearch Technical Report

    As LLMs shift toward autonomous agents, Deep Research has emerged as a pivotal metric. However, existing academic benchmarks like BrowseComp often fail to meet real-world demands for open-ended research, which requires robust skills in intent recognition, long-horizon decision-making, and cross-source verification. To address this, we introduce Step-DeepResearch, a cost-effective, end-to-end agent. We propose a Data Synthesis Strategy Based on Atomic Capabilities to reinforce planning and report writing, combined with a progressive training path from agentic mid-training to SFT and RL. Enhanced by a Checklist-style Judger, this approach significantly improves robustness. Furthermore, to bridge the evaluation gap in the Chinese domain, we establish ADR-Bench for realistic deep research scenarios. Experimental results show that Step-DeepResearch (32B) scores 61.4% on Scale AI Research Rubrics. On ADR-Bench, it significantly outperforms comparable models and rivals SOTA closed-source models like OpenAI and Gemini DeepResearch. These findings prove that refined training enables medium-sized models to achieve expert-level capabilities at industry-leading cost-efficiency.

  7. Reinforcement Learning for Self-Improving Agent with Skill Library

    Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in complex reasoning and multi-turn interactions but struggle to continuously improve and adapt when deployed in new environments. One promising approach is implementing skill libraries that allow agents to learn, validate, and apply new skills. However, current skill library approaches rely primarily on LLM prompting, making consistent skill library implementation challenging. To overcome these challenges, we propose a Reinforcement Learning (RL)-based approach to enhance agents' self-improvement capabilities with a skill library. Specifically, we introduce Skill Augmented GRPO for self-Evolution (SAGE), a novel RL framework that systematically incorporates skills into learning. The framework's key component, Sequential Rollout, iteratively deploys agents across a chain of similar tasks for each rollout. As agents navigate through the task chain, skills generated from previous tasks accumulate in the library and become available for subsequent tasks. Additionally, the framework enhances skill generation and utilization through a Skill-integrated Reward that complements the original outcome-based rewards. Experimental results on AppWorld demonstrate that SAGE, when applied to a supervised fine-tuned model with expert experience, achieves 8.9% higher Scenario Goal Completion while requiring 26% fewer interaction steps and generating 59% fewer tokens, substantially outperforming existing approaches in both accuracy and efficiency.

  8. SAM Audio: Segment Anything in Audio

    General audio source separation is a key capability for multimodal AI systems that can perceive and reason about sound. Despite substantial progress in recent years, existing separation models are either domain-specific, designed for fixed categories such as speech or music, or limited in controllability, supporting only a single prompting modality such as text. In this work, we present SAM Audio, a foundation model for general audio separation that unifies text, visual, and temporal span prompting within a single framework. Built on a diffusion transformer architecture, SAM Audio is trained with flow matching on large-scale audio data spanning speech, music, and general sounds, and can flexibly separate target sources described by language, visual masks, or temporal spans. The model achieves state-of-the-art performance across a diverse suite of benchmarks, including general sound, speech, music, and musical instrument separation in both in-the-wild and professionally produced audio, substantially outperforming prior general-purpose and specialized systems. Furthermore, we introduce a new real-world separation benchmark with human-labeled multimodal prompts and a reference-free evaluation model that correlates strongly with human judgment.

  9. INTELLECT-3: Technical Report

    We present INTELLECT-3, a 106B-parameter Mixture-of-Experts model (12B active) trained with large-scale reinforcement learning on our end-to-end RL infrastructure stack. INTELLECT-3 achieves state-of-the-art performance for its size across math, code, science, and reasoning benchmarks, outperforming many larger frontier models. We open-source the model together with the full infrastructure stack used to create it, including RL frameworks, the complete recipe, and a wide collection of environments, built with the verifiers library, for training and evaluation from our Environments Hub community platform. Built for this effort, we introduce prime-rl, an open framework for large-scale asynchronous reinforcement learning, which scales seamlessly from a single node to thousands of GPUs and is tailored for agentic RL with first-class support for multi-turn interactions and tool use. Using this stack, we run both SFT and RL training on top of the GLM-4.5-Air-Base model, scaling RL training up to 512 H200s with high training efficiency.

  10. Scaling Laws for Code: Every Programming Language Matters

    Code large language models (Code LLMs) are powerful but costly to train, with scaling laws predicting performance from model size, data, and compute. However, different programming languages (PLs) have varying impacts during pre-training that significantly affect base model performance, leading to inaccurate performance prediction. Besides, existing works focus on language-agnostic settings, neglecting the inherently multilingual nature of modern software development. Therefore, it is first necessary to investigate the scaling laws of different PLs, and then consider their mutual influences to arrive at the final multilingual scaling law. In this paper, we present the first systematic exploration of scaling laws for multilingual code pre-training, conducting 1,000+ experiments (equivalent to 336,000+ H800 hours) across multiple PLs, model sizes (0.2B to 14B parameters), and dataset sizes (1T tokens). We establish comprehensive scaling laws for code LLMs across multiple PLs, revealing that interpreted languages (e.g., Python) benefit more from increased model size and data than compiled languages (e.g., Rust). The study demonstrates that multilingual pre-training provides synergistic benefits, particularly between syntactically similar PLs. Further, the parallel-pairing pre-training strategy (concatenating code snippets with their translations) significantly enhances cross-lingual abilities with favorable scaling properties. Finally, a proportion-dependent multilingual scaling law is proposed to optimally allocate training tokens by prioritizing high-utility PLs (e.g., Python), balancing high-synergy pairs (e.g., JavaScript-TypeScript), and reducing allocation to fast-saturating languages (e.g., Rust), achieving superior average performance across all PLs compared to uniform distribution under the same compute budget.
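
The single-variable form of such a scaling law can be illustrated by fitting L(N) = A·N^(−α) in log-log space, where a power law becomes a straight line. The constants and data points below are invented for illustration; they are not the paper's fitted values:

```python
import numpy as np

# Hypothetical per-language validation losses at several model sizes
# (parameter counts), generated from an assumed power law.
N = np.array([0.2e9, 0.6e9, 1.4e9, 3e9, 7e9, 14e9])
true_A, true_alpha = 40.0, 0.08
L = true_A * N ** (-true_alpha)

# Power laws are linear in log-log space:
#   log L = log A - alpha * log N
# so a degree-1 polyfit recovers both constants.
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha_hat, A_hat = -slope, np.exp(intercept)
print(f"alpha = {alpha_hat:.3f}, A = {A_hat:.1f}")

# Extrapolate the fitted law to a larger (hypothetical) model.
L_30b = A_hat * (30e9) ** (-alpha_hat)
```

The paper's per-language and proportion-dependent laws add more terms (data size, language mix), but each reduces to this kind of fit along one axis.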

  11. FaithLens: Detecting and Explaining Faithfulness Hallucination

    Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-4.1 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.

  12. Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems

    Streaming Speech-to-Text Translation (StreamST) requires producing translations concurrently with incoming speech, imposing strict latency constraints and demanding models that balance partial-information decision-making with high translation quality. Research efforts on the topic have so far relied on the SimulEval repository, which is no longer maintained and does not support systems that revise their outputs. In addition, it was designed to simulate the processing of short segments rather than long-form audio streams, and it does not provide an easy way to showcase systems in a demo. As a solution, we introduce simulstream, the first open-source framework dedicated to unified evaluation and demonstration of StreamST systems. Designed for long-form speech processing, it supports not only incremental decoding approaches but also re-translation methods, enabling their comparison within the same framework in terms of both quality and latency. It also offers an interactive web interface to demo any system built within the tool.

  13. Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents

    Temporal reasoning over long, multi-session dialogues is a critical capability for conversational agents. However, existing works and our pilot study have shown that as dialogue histories grow in length and accumulate noise, current long-context models struggle to accurately identify temporally pertinent information, significantly impairing reasoning performance. To address this, we introduce Memory-T1, a framework that learns a time-aware memory selection policy using reinforcement learning (RL). It employs a coarse-to-fine strategy, first pruning the dialogue history into a candidate set using temporal and relevance filters, followed by an RL agent that selects the precise evidence sessions. The RL training is guided by a multi-level reward function optimizing (i) answer accuracy, (ii) evidence grounding, and (iii) temporal consistency. In particular, the temporal consistency reward provides a dense signal by evaluating alignment with the query time scope at both the session-level (chronological proximity) and the utterance-level (chronological fidelity), enabling the agent to resolve subtle chronological ambiguities. On the Time-Dialog benchmark, Memory-T1 boosts a 7B model to an overall score of 67.0%, establishing a new state-of-the-art performance for open-source models and outperforming a 14B baseline by 10.2%. Ablation studies show temporal consistency and evidence grounding rewards jointly contribute to a 15.0% performance gain. Moreover, Memory-T1 maintains robustness up to 128k tokens, where baseline models collapse, proving effectiveness against noise in extensive dialogue histories. The code and datasets are publicly available at https://github.com/Elvin-Yiming-Du/Memory-T1/

  14. Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation

    Qualitative research faces a critical reliability challenge: traditional inter-rater agreement methods require multiple human coders, are time-intensive, and often yield moderate consistency. We present a multi-perspective validation framework for LLM-based thematic analysis that combines ensemble validation with dual reliability metrics: Cohen's Kappa (κ) for inter-rater agreement and cosine similarity for semantic consistency. Our framework enables configurable analysis parameters (1-6 seeds, temperature 0.0-2.0), supports custom prompt structures with variable substitution, and provides consensus theme extraction across any JSON format. As proof-of-concept, we evaluate three leading LLMs (Gemini 2.5 Pro, GPT-4o, Claude 3.5 Sonnet) on a psychedelic art therapy interview transcript, conducting six independent runs per model. Results show Gemini achieves the highest reliability (κ = 0.907, cosine = 95.3%), followed by GPT-4o (κ = 0.853, cosine = 92.6%) and Claude (κ = 0.842, cosine = 92.1%). All three models achieve high agreement (κ > 0.80), validating the multi-run ensemble approach. The framework successfully extracts consensus themes across runs, with Gemini identifying 6 consensus themes (50-83% consistency), GPT-4o identifying 5 themes, and Claude 4 themes. Our open-source implementation provides researchers with transparent reliability metrics, flexible configuration, and structure-agnostic consensus extraction, establishing methodological foundations for reliable AI-assisted qualitative research.
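
The two reliability metrics are simple to compute. A minimal sketch; the theme labels and embedding vectors below are hypothetical stand-ins for actual run outputs, not data from the paper:

```python
import numpy as np

def cohens_kappa(a, b):
    """Cohen's kappa between two raters' categorical labels."""
    a, b = np.asarray(a), np.asarray(b)
    cats = np.union1d(a, b)
    po = np.mean(a == b)  # observed agreement
    # Expected chance agreement from each rater's marginal frequencies.
    pe = sum(np.mean(a == c) * np.mean(b == c) for c in cats)
    return (po - pe) / (1 - pe)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical theme codings from two independent LLM runs.
run1 = ["grief", "growth", "ritual", "growth", "grief", "ritual"]
run2 = ["grief", "growth", "growth", "growth", "grief", "ritual"]
print(f"kappa = {cohens_kappa(run1, run2):.3f}")  # 0.750

# Hypothetical sentence embeddings of the two runs' theme summaries.
e1 = np.array([0.2, 0.7, 0.1, 0.4])
e2 = np.array([0.25, 0.65, 0.15, 0.35])
print(f"cosine = {cosine(e1, e2):.3f}")
```

Kappa corrects raw agreement for chance, which is why it is paired here with cosine similarity: the former checks label-level consistency, the latter semantic closeness of free-text themes.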

  15. Active Intelligence in Video Avatars via Closed-loop World Modeling

    Current video avatar generation methods excel at identity preservation and motion alignment but lack genuine agency: they cannot autonomously pursue long-term goals through adaptive environmental interaction. We address this by introducing L-IVA (Long-horizon Interactive Visual Avatar), a task and benchmark for evaluating goal-directed planning in stochastic generative environments, and ORCA (Online Reasoning and Cognitive Architecture), the first framework enabling active intelligence in video avatars. ORCA embodies Internal World Model (IWM) capabilities through two key innovations: (1) a closed-loop OTAR cycle (Observe-Think-Act-Reflect) that maintains robust state tracking under generative uncertainty by continuously verifying predicted outcomes against actual generations, and (2) a hierarchical dual-system architecture where System 2 performs strategic reasoning with state prediction while System 1 translates abstract plans into precise, model-specific action captions. By formulating avatar control as a POMDP and implementing continuous belief updating with outcome verification, ORCA enables autonomous multi-step task completion in open-domain scenarios. Extensive experiments demonstrate that ORCA significantly outperforms open-loop and non-reflective baselines in task success rate and behavioral coherence, validating our IWM-inspired design for advancing video avatar intelligence from passive animation to active, goal-oriented behavior.

Solidot (15)

  1. Streaming companies challenge YouTube's dominance of daytime TV

    YouTube is the most popular video platform, but its lead is concentrated in the daytime rather than evening prime time. Nielsen data show that at 11 a.m. in October, YouTube averaged 6.3 million TV viewers in the US, versus 2.8 million for Netflix; Amazon drew 1 million in the same slot, while HBO Max, Paramount+, and Peacock each drew fewer than 600,000. In the evening the gap narrows sharply: at 9 p.m. Netflix's audience climbs above 11 million, just below YouTube's 12 million. To challenge YouTube's grip on daytime viewing, the major streamers are adding content suited to daytime hours: Netflix plans to launch at least 34 video podcast shows next year, and Amazon debuted the podcast New Heights in September. Data show podcast viewing is concentrated between 6 a.m. and 6 p.m. YouTube says users watched 700 million hours of video podcasts on TVs in October, up 75% year over year.

  2. Uzbekistan's license-plate surveillance system found online without a password

    Security researcher Anurag Sen discovered that Uzbekistan's license-plate tracking surveillance system was exposed to the internet without password protection. The data show the system's database was set up in September 2024, with traffic monitoring beginning in mid-2025. The system is operated by the Public Safety Bureau of Uzbekistan's Ministry of Internal Affairs and was developed by the Shenzhen company Maxvision, whose foreign customers include Burkina Faso, Kuwait, Oman, Mexico, Saudi Arabia, and Uzbekistan.

  3. Even the RTX 5090D struggles at 5K resolution

    ASUS demonstrated its 5K@180Hz 27-inch ROG Strix 27 Pro gaming monitor. 5K resolution is 5120 x 2880, 78% more pixels than 4K's 3840 x 2160, so a graphics card that runs games smoothly at 4K struggles at 5K. ASUS tested NVIDIA's flagship RTX 5090D (the China-only variant, since superseded by another China-only variant, the 5090Dv2) with Cyberpunk 2077 at ultra ray-tracing settings; it managed only 51 fps. The test system used an AMD Ryzen 9950X3D CPU, with DLSS set to Balanced and frame generation off. The same configuration running Cyberpunk 2077 at 4K reached 77 fps, about 50% higher than at 5K. The ROG Strix 27 Pro uses an IPS panel and supports dual modes, switching between 5K@180Hz and 2K@330Hz. It is priced at about $800.
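
A quick arithmetic check of the two ratios quoted in this item:

```python
# Pixel counts at the two resolutions named above.
pixels_5k = 5120 * 2880   # 14,745,600
pixels_4k = 3840 * 2160   #  8,294,400

more_pixels = pixels_5k / pixels_4k - 1
print(f"5K has {more_pixels:.0%} more pixels than 4K")  # 78%

# Reported Cyberpunk 2077 frame rates on the same test system.
fps_4k, fps_5k = 77, 51
print(f"4K runs {fps_4k / fps_5k - 1:.0%} faster than 5K")  # 51%
```

Both figures match the article: 78% more pixels, and roughly 50% higher frame rate at 4K.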

  4. Top performers rarely showed prodigious talent or trained intensively as children

    A survey shows that chess grandmasters, Olympic gold medalists, and Nobel laureates were rarely child prodigies. Likewise, exceptional childhood performance and intensive early training rarely lead to the highest adult achievements. The analysis, based on 19 studies covering nearly 35,000 high achievers, shows that the vast majority of adults at the global top of their fields grew up sampling a variety of activities, gradually developing their most refined skills. Across fields, early high achievers and later world-class performers are largely different people: only about 10% of those who excel as adults also excelled as minors, and only about 10% of those who excelled as minors went on to exceptional adult achievement. Easing intensive training schedules in childhood and adolescence may help prevent the burnout and injuries that can derail long careers.

  5. Samsung to launch a 6K glasses-free 3D gaming monitor in 2026

    Mainstream monitor resolutions are slowly moving from 4K to 6K. Samsung plans to launch the Odyssey 3D G90XH gaming monitor in 2026: a 32-inch IPS display with 6K resolution, glasses-free 3D, and a 165Hz refresh rate. The monitor supports real-time eye tracking and "automatically adjusts depth and perspective" based on the user's position. It can switch between two modes: 6K@165Hz or 3K@330Hz. Samsung will also launch the Odyssey G6 G60H, a 27-inch gaming monitor with a 1040Hz refresh rate; it likewise supports two modes, with 1040Hz limited to 1080p: 1080p@1040Hz or 1440p@600Hz. The monitors are compatible with AMD FreeSync Premium and NVIDIA G-Sync.

  6. The dictionary's glory days are gone for good

    In the late 1980s, Merriam-Webster's Collegiate Dictionary spent 155 consecutive weeks on the New York Times bestseller list and ultimately sold 57 million copies, second in the US only to the Bible. But the dictionary's heyday is long gone: in the internet era, dictionaries are struggling. Twenty-five years ago the US had about 200 full-time lexicographers; today there may be fewer than 30. Merriam-Webster now belongs to Encyclopaedia Britannica, which itself stopped publishing print editions in 2012. Britannica's website draws about a billion page views a year, but mostly for word games, trending slang, and ads rather than dictionary lookups. An analysis of digital libraries found that the English vocabulary grew from about 600,000 words in 1950 to over a million in 2000, and that 52% of the English words in printed books are "lexical dark matter" that appears in no standard dictionary.

  7. European public institutions slowly shed dependence on US software and cloud services

    The 2018 US CLOUD Act allows American authorities to demand that US tech companies hand over data stored anywhere. The law threatens the digital sovereignty of European companies and public institutions and poses a major, unacceptable risk under the EU's data protection regulation, the GDPR. European public institutions are slowly moving away from US tech companies' software and cloud services. Austria's Federal Ministry of Economy, Energy and Tourism recently completed migrating 1,200 employees to the European open-source collaboration platform Nextcloud, an example that is prompting other Austrian ministries to adopt Nextcloud as well. Europe's digital infrastructure depends almost entirely on non-European vendors; if a major US cloud provider restricted European users' access or ceased operations, the consequences would be severe. Europe is acting to strengthen its digital sovereignty: the International Criminal Court (ICC) in The Hague announced a switch from Microsoft to the German open-source office suite OpenDesk, and 30,000 civil servants in Germany's state of Schleswig-Holstein are migrating to the open-source software LibreOffice, Nextcloud, Open-Xchange, and Thunderbird.

  8. Do LLMs really make programmers faster?

    After interviewing more than 30 developers, tech executives, analysts, and researchers, MIT Technology Review found there is no clear-cut answer to whether LLM-based AI tools speed up programming. As working programmers come to recognize LLMs' limitations, their enthusiasm for AI tools is cooling. Numerous studies suggest the tools' claimed productivity gains may be an illusion. GitClear's data show that since 2022 the durability of engineers' code (code that is not deleted or rewritten within weeks) has improved about 10%, an improvement that may be attributable to AI; at the same time, however, several code-quality metrics are declining rapidly. A Stack Overflow survey showed, for the first time, a marked drop in trust in and favorability toward AI tools. Programmers broadly agree that AI tools excel at generating boilerplate, writing tests, fixing bugs, and explaining unfamiliar code to newcomers. But for experienced programmers such tasks are a small share of the work, and AI tools are of little help with genuinely hard problems. LLM-based tools also inevitably hallucinate, producing code that looks perfect, which makes the errors hard to spot. Using AI tools is thus like playing a slot machine: sometimes a big help, other times completely unreliable.

  9. Chinese regulators issue Internet Platform Pricing Conduct Rules to curb algorithmic price manipulation

    The National Development and Reform Commission, the State Administration for Market Regulation, and the Cyberspace Administration of China jointly issued the Internet Platform Pricing Conduct Rules, which provide that: platform operators must not sell goods or provide services below cost in order to squeeze out competitors or monopolize the market, disrupting normal production and business order...; platform operators must not, without consumers' knowledge, use data, algorithms, platform rules, or similar means to set different prices or fee standards for the same goods or services under identical transaction conditions based on willingness to pay, ability to pay, consumption preferences, or consumption habits; platform operators must not use platform rules, data, algorithms, or other means to collude with one another to manipulate market prices and harm the lawful rights and interests of other operators and consumers; and during emergencies, platform operators and on-platform merchants selling emergency supplies and essential consumer goods must price them reasonably, and must not sharply raise prices when costs have not risen significantly, raise prices by a margin clearly exceeding cost growth, or, while leaving prices unchanged, unreasonably inflate shipping fees or levy other unreasonable charges.

  10. Japan to fund domestic AI development with a trillion yen

    Starting in fiscal 2026, the Japanese government will provide about 1 trillion yen (roughly 44.7 billion RMB) in support for domestic AI development over five years. The money will back a new company that SoftBank and a dozen or so other firms plan to establish as early as next spring, a public-private effort on cutting-edge foundation models meant to compete with the US and China, which lead in AI. The government is also considering supporting AI used in factory-floor robots. The new company will bring together about 100 engineers from SoftBank and the AI developer Preferred Networks (Tokyo). After applying through a public solicitation by the Ministry of Economy, Trade and Industry, it will receive support for construction and other development costs. The company aims for the highest domestic scores on foundation-model performance benchmarks, and the resulting models will be opened up for use by Japanese companies.

  11. Two major Chinese-language darknet markets allegedly launder $2 billion a month

    Wired has again reported on the Chinese-language cryptocurrency darknet markets operating on Telegram. Earlier this year, following media coverage, Telegram banned what was then the largest market, Haowang Guarantee (aka Huione Guarantee), and the second-largest, Xinbi Guarantee. But months later, according to research by the crypto-tracing firm Elliptic, the two largest platforms, Tudou Guarantee and Xinbi Guarantee, are still laundering close to $2 billion a month. Pig-butchering scams are typically run from scam compounds in Southeast Asia, and such fraud has become the world's most profitable form of cybercrime. By selling money-laundering services and related products to scam syndicates, Tudou Guarantee and Xinbi Guarantee have reached staggering scale. AlphaBay, once the largest darknet market at ten times the peak size of Silk Road, handled over $1 billion in its two and a half years of operation; the Russian darknet market Hydra handled over $5 billion in seven years. By comparison, Haowang Guarantee, shut down in the first half of this year, handled over $27 billion in four years. Tudou Guarantee, which is linked to Haowang Guarantee and has taken its place, now moves about $1.1 billion a month; Xinbi Guarantee, back online after the crackdown, moves about $850 million a month. Telegram has declined to act against these markets again.

  12. Skin and internal organs sense cold in different ways

    Different parts of the body perceive cold differently. The skin senses low temperatures mainly through the TRPM8 ion channel, which is specialized for detecting cold environmental conditions, while internal organs such as the lungs and stomach rely mainly on the molecular sensor TRPA1 to detect temperature changes. This explains why cold feels so different on the body's surface and inside it: you may have noticed that a cold wind on the skin feels nothing like inhaling icy air or swallowing a cold drink. Each tissue type registers temperature changes by activating its own biological pathway. The findings show that temperature sensing is closely tied to the specific physiological functions of each body part, and that the molecular mechanism by which internal organs sense cold differs from that of the skin.

  13. Spotify says anti-copyright extremists scraped its music library

    Anna's Archive, the shadow library previously focused on archiving text, has released an archive of the music streaming service Spotify: 300TB comprising 256 million songs and 186 million ISRC codes. Anna's Archive calls it the world's largest public music metadata database. The archive holds 86 million music files, accounting for about 99.6% of total plays, with a cutoff date of July 2025. The data show that over 70% of songs have almost no listeners, and the three most popular tracks have more combined plays than all songs ranked from 20 million to 100 million put together. Spotify called those who released the archive anti-copyright extremists and said it is investigating. The archivists may have used the public Web API to scrape metadata and bypassed DRM to access the audio files. Spotify insists this was not a hack and that user data was unaffected.

  14. Samsung to integrate Google Gemini AI into its refrigerators

    Whether you want it or not, AI is coming into your life, including the kitchen. Samsung is preparing to integrate Google's Gemini AI into its refrigerators to learn customers' eating habits. At CES 2026 next month, Samsung plans to show a new Bespoke AI Refrigerator with built-in cameras that, with Gemini's help, automatically recognizes food, including leftovers in unlabeled containers. The AI fridge will keep its food inventory up to date without manual input, tracking items as they are added and removed and offering suggestions based on what remains. This will be the first integration of Google Gemini AI into a refrigerator, a sign that generative AI is spreading from phones and laptops to smart home appliances.

  15. Microsoft plans to replace all C and C++ code with Rust by 2030

    Microsoft plans to replace all of its C and C++ code with Rust by 2030, using AI-assisted tools to carry out the massive refactoring. Microsoft Distinguished Engineer Galen Hunt said on LinkedIn that the company will combine AI and algorithms to rewrite its largest codebases in Rust, with the expectation that one engineer can complete a million lines of code per month. Hunt said he is hiring a software engineer with at least three years of systems-level development experience to help with the effort, ideally someone with experience implementing compilers, databases, or operating systems.