OrangeBot.AI Digest — 2025-12-08

55 headlines across 4 sources, aggregated for the day.

Hacker News (15)

  1. Deep dive on Nvidia circular funding (philippeoger.com)
  2. Jepsen: NATS 2.12.1 (jepsen.io)
  3. Microsoft has a problem: lack of demand for its AI products (www.windowscentral.com)
  4. Strong earthquake hits northern Japan, tsunami warning issued (www3.nhk.or.jp)
  5. Hunting for North Korean Fiber Optic Cables (nkinternet.com)
  6. Let's put Tailscale on a jailbroken Kindle (tailscale.com)
  7. AMD GPU Debugger (thegeeko.me)
  8. Paramount launches hostile bid for Warner Bros (www.cnbc.com)
  9. Uber is turning data about trips and takeout into insights for marketers (www.businessinsider.com)
  10. IBM to acquire Confluent (www.confluent.io)
  11. Microsoft increases Office 365 and Microsoft 365 license prices (office365itpros.com)
  12. The "confident idiot" problem: Why AI needs hard rules, not vibe checks (steerlabs.substack.com)
  13. Bad Dye Job (daringfireball.net)
  14. Twelve Days of Shell (12days.cmdchallenge.com)
  15. The fuck off contact page (www.nicchan.me)

GitHub Trending (11)

  1. microsoft / VibeVoice

    Open-Source Frontier Voice AI

  2. sinelaw / fresh

    Text editor for your terminal: easy, powerful and fast

  3. winapps-org / winapps

    Run Windows apps such as Microsoft Office/Adobe in Linux (Ubuntu/Fedora) and GNOME/KDE as if they were a part of the native OS, including Nautilus integration. Hard fork of https://github.com/Fmstrat/winapps/

  4. patchy631 / ai-engineering-hub

    In-depth tutorials on LLMs, RAGs and real-world AI agent applications.

  5. slidevjs / slidev

    Presentation Slides for Developers

  6. cloudflare / vibesdk

    An open-source vibe coding platform that helps you build your own vibe-coding platform, built entirely on Cloudflare stack

  7. lfnovo / open-notebook

    An Open Source implementation of Notebook LM with more flexibility and features

  8. anthropics / claude-quickstarts

    A collection of projects designed to help developers quickly get started with building deployable applications using the Claude API

  9. 666ghj / BettaFish

    微舆 (BettaFish): a multi-agent public-opinion analysis assistant for everyone. It breaks information cocoons, reconstructs the full picture of public sentiment, predicts future trends, and supports decision-making; implemented from scratch with no framework dependencies.

  10. microsoft / Foundry-Local
  11. microsoft / ML-For-Beginners

    12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

Hugging Face (15)

  1. TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

    Recent advances in large multi-modal generative models have demonstrated impressive capabilities in multi-modal generation, including image and video generation. These models are typically built on multi-step frameworks such as diffusion and flow matching, which inherently limits their inference efficiency (requiring 40-100 function evaluations (NFEs)). While various few-step methods aim to accelerate inference, existing solutions have clear limitations. Prominent distillation-based methods, such as progressive and consistency distillation, either require an iterative distillation procedure or degrade significantly at very few steps (<4 NFEs). Meanwhile, integrating adversarial training into distillation (e.g., DMD/DMD2 and SANA-Sprint) to enhance performance introduces training instability, added complexity, and high GPU memory overhead due to the auxiliary trained models. To this end, we propose TwinFlow, a simple yet effective framework for training 1-step generative models that bypasses the need for a fixed pretrained teacher model and avoids standard adversarial networks during training, making it ideal for building large-scale, efficient models. On text-to-image tasks, our method achieves a GenEval score of 0.83 at 1 NFE, outperforming strong baselines like SANA-Sprint (a GAN-loss-based framework) and RCGM (a consistency-based framework). Notably, we demonstrate the scalability of TwinFlow by full-parameter training on Qwen-Image-20B, transforming it into an efficient few-step generator. With just 1 NFE, our approach matches the performance of the original 100-NFE model on both the GenEval and DPG-Bench benchmarks, reducing computational cost by 100x with minor quality degradation. Project page is available at https://zhenglin-cheng.com/twinflow.

  2. EditThinker: Unlocking Iterative Reasoning for Any Image Editor

    Instruction-based image editing has emerged as a prominent research area. Benefiting from image-generation foundation models, it has achieved high aesthetic quality, making instruction-following capability the primary challenge. Existing approaches improve instruction adherence via supervised or reinforcement learning, yet single-turn success rates remain limited due to inherent stochasticity and a lack of deliberation. In this work, we propose a deliberative editing framework that lets models 'think' while they edit, simulating the human cognitive loop by iteratively executing a Think-while-Edit cycle: critiquing results and refining instructions, then repeating generation until the result is satisfactory. Specifically, we train a single MLLM, EditThinker, to act as the reasoning engine of this framework, jointly producing the critique score, reasoning process, and refined instructions. We employ reinforcement learning to align EditThinker's thinking with its editing, thereby generating more targeted instruction improvements. Extensive experiments on four benchmarks demonstrate that our approach substantially improves the instruction-following capability of any image editing model. We will release our data construction framework, datasets, and models to benefit the community.
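
The Think-while-Edit cycle described above can be sketched generically. A minimal sketch: the editor, critic, score threshold, and round limit below are illustrative stand-ins, not the paper's actual components.

```python
from typing import Callable, Tuple

def think_while_edit(instruction: str,
                     edit: Callable[[str], str],
                     critique: Callable[[str], Tuple[float, str]],
                     threshold: float = 0.9,
                     max_rounds: int = 4) -> str:
    """Edit, critique the result, refine the instruction, and repeat
    until the critique score clears the threshold (or rounds run out)."""
    result = edit(instruction)
    for _ in range(max_rounds - 1):
        score, refined = critique(result)
        if score >= threshold:
            break
        instruction = refined
        result = edit(instruction)
    return result

# Toy run: the 'editor' echoes the instruction; the 'critic' approves
# only once the instruction mentions a red scarf.
result = think_while_edit(
    "add a scarf",
    edit=lambda ins: f"image[{ins}]",
    critique=lambda img: (1.0, "") if "red" in img else (0.2, "add a red scarf"),
)
print(result)  # image[add a red scarf]
```

In the paper a single MLLM plays the critic role, emitting the score and the refined instruction together; the closure above just stands in for that call.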

  3. From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks

    Reinforcement learning has emerged as a paradigm for post-training large language models, boosting their reasoning capabilities. Such approaches compute an advantage value for each sample, reflecting better or worse performance than expected, thereby yielding both positive and negative signals for training. However, the indiscriminate mixing of the two signals in existing methods, especially in the early stages, may lead to ambiguous guidance and limited gains. To address this issue, we propose CAPO (Curriculum Advantage Policy Optimization), an adaptive curriculum mechanism based on advantage signals. The proposed mechanism bootstraps imitation learning with positive-only advantage samples to establish robust foundations, and subsequently introduces negative signals to cultivate discriminative capabilities, thereby improving generalization across complex scenarios. Compatible with diverse optimization methods including GRPO, PPO, RLOO, and Reinforce++, our method consistently achieves stable and significant improvements in mathematical reasoning tasks, and further generalizes effectively to multimodal Graphical User Interface (GUI) reasoning scenarios, establishing itself as a versatile and robust optimization framework.
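
The curriculum described above (imitation on positive-advantage samples first, negative signals phased in later) amounts to a sign-dependent advantage mask. A minimal sketch under an assumed step-based schedule; the function name and warmup length are illustrative, not from the paper:

```python
from typing import List

def curriculum_advantages(advantages: List[float], step: int,
                          warmup_steps: int = 1000) -> List[float]:
    """Illustrative curriculum-advantage mask: during warmup only
    positive advantages contribute (imitation phase); afterwards
    negative advantages are admitted (discrimination phase)."""
    if step < warmup_steps:
        return [a if a > 0 else 0.0 for a in advantages]
    return list(advantages)

# Early in training, negative signals are zeroed out:
print(curriculum_advantages([0.8, -0.5, 0.3], step=10))    # [0.8, 0.0, 0.3]
# Later, both signals are used:
print(curriculum_advantages([0.8, -0.5, 0.3], step=5000))  # [0.8, -0.5, 0.3]
```

Because the mask only rescales per-sample advantages, a scheme like this can sit in front of GRPO, PPO, RLOO, or Reinforce++ without touching the underlying optimizer, which is consistent with the compatibility claim above.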

  4. EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture

    We propose EMMA, an efficient and unified architecture for multimodal understanding, generation, and editing. Specifically, EMMA consists of: 1) an efficient autoencoder with a 32x compression ratio, which significantly reduces the number of tokens required for generation and, by applying the same compression ratio to images, keeps training balanced between understanding and generation tasks; 2) channel-wise rather than token-wise concatenation of visual understanding and generation tokens, which further reduces the visual token count in unified architectures; 3) a shared-and-decoupled network that enables mutual improvement across tasks while meeting task-specific modeling requirements; and 4) a mixture-of-experts mechanism in the visual understanding encoder, which substantially improves perceptual capabilities with only a small increase in parameters. Extensive experiments show that EMMA-4B significantly outperforms state-of-the-art unified multimodal approaches (e.g., BAGEL-7B) in both efficiency and performance, while achieving competitive results against recent specialized multimodal understanding and generation models (e.g., Qwen3-VL and Qwen-Image). We believe EMMA lays a solid foundation for the future development of unified multimodal architectures.

  5. PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

    Consistent image generation requires faithfully preserving identities, styles, and logical coherence across multiple images, which is essential for applications such as storytelling and character design. Supervised training approaches struggle with this task due to the lack of large-scale datasets capturing visual consistency and the complexity of modeling human perceptual preferences. In this paper, we argue that reinforcement learning (RL) offers a promising alternative by enabling models to learn complex and subjective visual criteria in a data-free manner. To achieve this, we introduce PaCo-RL, a comprehensive framework that combines a specialized consistency reward model with an efficient RL algorithm. The first component, PaCo-Reward, is a pairwise consistency evaluator trained on a large-scale dataset constructed via automated sub-figure pairing. It evaluates consistency through a generative, autoregressive scoring mechanism enhanced by task-aware instructions and CoT rationales. The second component, PaCo-GRPO, leverages a novel resolution-decoupled optimization strategy to substantially reduce RL cost, alongside a log-tamed multi-reward aggregation mechanism that ensures balanced and stable reward optimization. Extensive experiments across two representative subtasks show that PaCo-Reward significantly improves alignment with human perceptions of visual consistency, and PaCo-GRPO achieves state-of-the-art consistency performance with improved training efficiency and stability. Together, these results highlight the promise of PaCo-RL as a practical and scalable solution for consistent image generation. The project page is available at https://x-gengroup.github.io/HomePage_PaCo-RL/.

  6. Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

    Large language model post-training relies on reinforcement learning to improve model capability and alignment quality. However, the off-policy training paradigm introduces distribution shift, which often pushes the policy beyond the trust region, leading to training instabilities manifested as fluctuations in policy entropy and unstable gradients. Although PPO-Clip mitigates this issue through importance-ratio clipping, it still overlooks the global distributional shift of actions. To address these challenges, we propose using the entropy ratio between the current and previous policies as a new global metric that effectively quantifies the relative change in policy exploration throughout updates. Building on this metric, we introduce an Entropy Ratio Clipping (ERC) mechanism that imposes bidirectional constraints on the entropy ratio. This stabilizes policy updates at the global distribution level and compensates for PPO-Clip's inability to regulate probability shifts of unsampled actions. We integrate ERC into both the DAPO and GPPO reinforcement learning algorithms. Experiments across multiple benchmarks show that ERC consistently improves performance.
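
The entropy-ratio metric above is cheap to compute for discrete action distributions. A minimal sketch of the metric and its bidirectional clip band; the band limits and example distributions are illustrative assumptions, and the paper's actual training-time constraint is not reproduced here:

```python
import math
from typing import Sequence

def entropy(p: Sequence[float]) -> float:
    """Shannon entropy of a discrete action distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def erc_violation(p_new: Sequence[float], p_old: Sequence[float],
                  lo: float = 0.9, hi: float = 1.1) -> bool:
    """True if the entropy ratio H(new)/H(old) leaves the clip band,
    i.e. the update changed global exploration too sharply."""
    ratio = entropy(p_new) / entropy(p_old)
    return not (lo <= ratio <= hi)

# Collapsing a near-uniform policy onto one action crushes entropy
# and trips the bidirectional constraint:
print(erc_violation([0.97, 0.01, 0.01, 0.01], [0.25, 0.25, 0.25, 0.25]))  # True
# A mild redistribution stays inside the band:
print(erc_violation([0.3, 0.25, 0.25, 0.2], [0.25, 0.25, 0.25, 0.25]))    # False
```

Note how this differs from PPO-Clip: the per-action importance ratio only sees sampled actions, while the entropy ratio summarizes the whole distribution, including probability mass moved onto actions that were never sampled.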

  7. SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations

    Achieving character animation that meets studio-grade production standards remains challenging despite recent progress. Existing approaches can transfer motion from a driving video to a reference image, but often fail to preserve structural fidelity and temporal consistency in wild scenarios involving complex motion and cross-identity animations. In this work, we present SCAIL (Studio-grade Character Animation via In-context Learning), a framework designed to address these challenges through two key innovations. First, we propose a novel 3D pose representation, providing a more robust and flexible motion signal. Second, we introduce a full-context pose injection mechanism within a diffusion-transformer architecture, enabling effective spatio-temporal reasoning over full motion sequences. To align with studio-level requirements, we develop a curated data pipeline ensuring both diversity and quality, and establish a comprehensive benchmark for systematic evaluation. Experiments show that SCAIL achieves state-of-the-art performance and advances character animation toward studio-grade reliability and realism.

  8. Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

    Generating interactive and dynamic 4D scenes from a single static image remains a core challenge. Most existing generate-then-reconstruct and reconstruct-then-generate methods decouple geometry from motion, causing spatiotemporal inconsistencies and poor generalization. To address these issues, we extend the reconstruct-then-generate framework to jointly perform Motion generation and geometric Reconstruction for 4D Synthesis (MoRe4D). We first introduce TrajScene-60K, a large-scale dataset of 60,000 video samples with dense point trajectories, addressing the scarcity of high-quality 4D scene data. Based on this, we propose a diffusion-based 4D Scene Trajectory Generator (4D-STraG) to jointly generate geometrically consistent and motion-plausible 4D point trajectories. To leverage single-view priors, we design a depth-guided motion normalization strategy and a motion-aware module for effective geometry and dynamics integration. We then propose a 4D View Synthesis Module (4D-ViSM) to render videos with arbitrary camera trajectories from 4D point track representations. Experiments show that MoRe4D generates high-quality 4D scenes with multi-view consistency and rich dynamic details from a single image. Code: https://github.com/Zhangyr2022/MoRe4D.

  9. COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence

    Visual Spatial Reasoning is crucial for enabling Multimodal Large Language Models (MLLMs) to understand object properties and spatial relationships, yet current models still struggle with 3D-aware reasoning. Existing approaches typically enhance either perception, by augmenting RGB inputs with auxiliary modalities such as depth and segmentation, or reasoning, by training on spatial VQA datasets and applying reinforcement learning, and thus treat these two aspects in isolation. In this work, we investigate whether a unified MLLM can develop an intrinsic ability to enhance spatial perception and, through adaptive interleaved reasoning, achieve stronger spatial intelligence. We propose COOPER, a unified MLLM that leverages depth and segmentation as auxiliary modalities and is trained in two stages to acquire auxiliary modality generation and adaptive, interleaved reasoning capabilities. COOPER achieves an average 6.91% improvement in spatial reasoning while maintaining general performance. Moreover, even a variant trained only for auxiliary modality generation attains a 7.92% gain on distance and size estimation, suggesting that learning to generate auxiliary modalities helps internalize spatial knowledge and strengthen spatial understanding.

  10. RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards

    With the continuous advancement of image generation technology, advanced models such as GPT-Image-1 and Qwen-Image have achieved remarkable text-to-image consistency and world knowledge. However, these models still fall short in photorealistic image generation. Even on simple T2I tasks, they tend to produce "fake" images with distinct AI artifacts, often characterized by "overly smooth skin" and "oily facial sheens". To recapture the original goal of "indistinguishable-from-reality" generation, we propose RealGen, a photorealistic text-to-image framework. RealGen integrates an LLM component for prompt optimization and a diffusion model for realistic image generation. Inspired by adversarial generation, RealGen introduces a "Detector Reward" mechanism, which quantifies artifacts and assesses realism using both semantic-level and feature-level synthetic image detectors. We leverage this reward signal with the GRPO algorithm to optimize the entire generation pipeline, significantly enhancing image realism and detail. Furthermore, we propose RealBench, an automated evaluation benchmark employing Detector-Scoring and Arena-Scoring. It enables human-free photorealism assessment, yielding results that are more accurate and aligned with real user experience. Experiments demonstrate that RealGen significantly outperforms general models like GPT-Image-1 and Qwen-Image, as well as specialized photorealistic models like FLUX-Krea, in terms of realism, detail, and aesthetics. The code is available at https://github.com/yejy53/RealGen.

  11. SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling

    Generative methods for 3D assets have recently achieved remarkable progress, yet providing intuitive and precise control over the object geometry remains a key challenge. Existing approaches predominantly rely on text or image prompts, which often fall short in geometric specificity: language can be ambiguous, and images are cumbersome to edit. In this work, we introduce SpaceControl, a training-free test-time method for explicit spatial control of 3D generation. Our approach accepts a wide range of geometric inputs, from coarse primitives to detailed meshes, and integrates seamlessly with modern pre-trained generative models without requiring any additional training. A controllable parameter lets users trade off between geometric fidelity and output realism. Extensive quantitative evaluation and user studies demonstrate that SpaceControl outperforms both training-based and optimization-based baselines in geometric faithfulness while preserving high visual quality. Finally, we present an interactive user interface that enables online editing of superquadrics for direct conversion into textured 3D assets, facilitating practical deployment in creative workflows. Find our project page at https://spacecontrol3d.github.io/

  12. ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning

    Reasoning-centric video object segmentation is an inherently complex task: the query often refers to dynamics, causality, and temporal interactions, rather than static appearances. Yet existing solutions generally collapse these factors into simplified reasoning with latent embeddings, rendering the reasoning chain opaque and essentially intractable. We therefore adopt an explicit decomposition perspective and introduce ReVSeg, which executes reasoning as sequential decisions in the native interface of pretrained vision language models (VLMs). Rather than folding all reasoning into a single-step prediction, ReVSeg executes three explicit operations -- semantics interpretation, temporal evidence selection, and spatial grounding -- aligning with pretrained capabilities. We further employ reinforcement learning to optimize the multi-step reasoning chain, enabling the model to self-refine its decision quality from outcome-driven signals. Experimental results demonstrate that ReVSeg attains state-of-the-art performance on standard video object segmentation benchmarks and yields interpretable reasoning trajectories. Project page is available at https://clementine24.github.io/ReVSeg/.

  13. World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty

    Recent advances in generative video models have led to significant breakthroughs in high-fidelity video synthesis, specifically in controllable video generation where the generated video is conditioned on text and action inputs, e.g., in instruction-guided video editing and world modeling in robotics. Despite these exceptional capabilities, controllable video models often hallucinate - generating future video frames that are misaligned with physical reality - which raises serious concerns in many tasks such as robot policy evaluation and planning. However, state-of-the-art video models lack the ability to assess and express their confidence, impeding hallucination mitigation. To rigorously address this challenge, we propose C3, an uncertainty quantification (UQ) method for training continuous-scale calibrated controllable video models for dense confidence estimation at the subpatch level, precisely localizing the uncertainty in each generated video frame. Our UQ method introduces three core innovations to empower video models to estimate their uncertainty. First, our method develops a novel framework that trains video models for correctness and calibration via strictly proper scoring rules. Second, we estimate the video model's uncertainty in latent space, avoiding training instability and prohibitive training costs associated with pixel-space approaches. Third, we map the dense latent-space uncertainty to interpretable pixel-level uncertainty in the RGB space for intuitive visualization, providing high-resolution uncertainty heatmaps that identify untrustworthy regions. Through extensive experiments on large-scale robot learning datasets (Bridge and DROID) and real-world evaluations, we demonstrate that our method not only provides calibrated uncertainty estimates within the training distribution, but also enables effective out-of-distribution detection.
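
A "strictly proper scoring rule", as referenced above, is one that an honest probability report minimizes in expectation. A minimal illustration with the Brier score; this is a standard example for intuition, not necessarily the paper's actual loss:

```python
def brier(confidence: float, correct: bool) -> float:
    """Brier score: squared error between stated confidence and outcome.
    Lower is better; in expectation it is minimized by reporting the
    true probability, which is what makes it strictly proper."""
    return (confidence - (1.0 if correct else 0.0)) ** 2

# If an event truly occurs 70% of the time, honest p=0.7 beats
# overclaiming p=1.0 in expectation:
honest = 0.7 * brier(0.7, True) + 0.3 * brier(0.7, False)
overconfident = 0.7 * brier(1.0, True) + 0.3 * brier(1.0, False)
print(round(honest, 2), round(overconfident, 2))  # 0.21 0.3
```

Training a video model against such a rule therefore penalizes both wrong predictions and miscalibrated confidence, which is the mechanism the method relies on for calibrated per-subpatch uncertainty.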

  14. M3DR: Towards Universal Multilingual Multimodal Document Retrieval

    Multimodal document retrieval systems have shown strong progress in aligning visual and textual content for semantic search. However, most existing approaches remain heavily English-centric, limiting their effectiveness in multilingual contexts. In this work, we present M3DR (Multilingual Multimodal Document Retrieval), a framework designed to bridge this gap across languages, enabling applicability across diverse linguistic and cultural contexts. M3DR leverages synthetic multilingual document data and generalizes across different vision-language architectures and model sizes, enabling robust cross-lingual and cross-modal alignment. Using contrastive training, our models learn unified representations for text and document images that transfer effectively across languages. We validate this capability on 22 typologically diverse languages, demonstrating consistent performance and adaptability across linguistic and script variations. We further introduce a comprehensive benchmark that captures real-world multilingual scenarios, evaluating models under monolingual, multilingual, and mixed-language settings. M3DR generalizes across both single dense vector and ColBERT-style token-level multi-vector retrieval paradigms. Our models, NetraEmbed and ColNetraEmbed, achieve state-of-the-art performance with ~150% relative improvement on cross-lingual retrieval.

  15. Self-Improving VLM Judges Without Human Annotations

    Effective judges of Vision-Language Models (VLMs) are crucial for model development. Current methods for training VLM judges mainly rely on large-scale human preference annotations. However, such an approach is costly, and the annotations easily become obsolete as models rapidly improve. In this work, we present a framework to self-train a VLM judge model without any human preference annotations, using only self-synthesized data. Our method is iterative and has three stages: (1) generate diverse multimodal instruction-response pairs at varying quality levels, (2) generate reasoning traces and judgments for each pair, removing the ones that do not match our expected quality levels, and (3) train on correct judge answers and their reasoning traces. We evaluate the resulting judge on Multimodal RewardBench and VL-RewardBench across domains: correctness, preference, reasoning, safety, and visual question-answering. Our method improves a Llama-3.2-11B multimodal judge from 0.38 to 0.51 in overall accuracy on VL-RewardBench, often outperforming much larger models including Llama-3.2-90B, GPT-4o, and Claude 3.5 Sonnet, with particularly strong gains in general, hallucination, and reasoning dimensions. The overall strength of these human-annotation-free results suggests the potential for a future self-judge that evolves alongside rapidly improving VLM capabilities.

Solidot (14)

  1. EU fines X €120 million; X blocks EU advertising accounts

    Last Friday the European Commission fined Elon Musk's X/Twitter platform €120 million under the Digital Services Act, citing violations of EU transparency rules, insufficient data access, and the deceptive design of its blue-check verified accounts: the company does not actually verify users' identities; anyone who pays gets the mark. Musk responded that the EU should be abolished, while senior X official Nikita Bier announced a ban on EU advertising accounts, claiming the EU had exploited a "loophole" in X's advertising system to promote its Friday post announcing the fine. A European Commission spokesperson replied that they were simply using the tools X offers to business accounts.

  2. JavaScript turns thirty

    Thirty years ago, on December 4, Netscape Communications and Sun Microsystems issued a press release formally announcing JavaScript, an object scripting language designed for building interactive Web applications. Netscape engineer Brendan Eich built an internal prototype in a 10-day sprint in May 1995, and JavaScript 1.0 shipped in March 1996. Thirty years on, JavaScript runs on 98.9% of websites that use client-side code and is the dominant programming language of the Web. Beyond the browser, JavaScript also powers server backends, mobile apps, desktop software, and even some embedded systems, and has long been among the most widely used languages in the world. Most of the tech companies that backed JavaScript earliest, including Netscape and Sun, have essentially disappeared; JavaScript has outlived them all. The language went through several names: first Mocha, then LiveScript, until Netscape and Sun signed a licensing agreement in December officially naming it JavaScript. The name caused lasting confusion with Sun's Java language, although apart from the name and some syntax the two have essentially nothing in common. Oracle inherited the JavaScript trademark when it acquired Sun but never built a product under the name; Brendan Eich and others argued in an open letter that Oracle abandoned the trademark through non-use, making JavaScript a generic term.

  3. Common antidepressant significantly reduces domestic violence by men

    Domestic violence is a global problem. Australian researchers studied whether sertraline, a widely used antidepressant, reduces domestic violence. From 1,738 men in New South Wales they randomly selected 630 and assigned them either sertraline or a placebo. Most had prior domestic-violence convictions and were recruited through community corrections agencies and courts. Sertraline works by enhancing serotonin function in the brain; serotonin plays an important role in impulse control and emotional regulation, which may help ease a key driver of violent behavior: the inability to calm down and control one's emotions. After 12 months the reoffending rate in the sertraline group (19.1%) was lower than in the placebo group (24.8%); after 24 months it was 28.2% versus 35.7%. Men who took the medication more consistently had a 30% lower reoffending rate at 24 months.
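
As a quick arithmetic check, the relative risk reductions implied by the reported reoffending rates work out as follows (the helper function is purely illustrative):

```python
def relative_reduction(treated: float, control: float) -> float:
    """Relative reduction in reoffending rate vs. the placebo group."""
    return (control - treated) / control

# 12-month rates: 19.1% sertraline vs 24.8% placebo -> ~23% relative reduction
print(round(relative_reduction(0.191, 0.248), 3))  # 0.23
# 24-month rates: 28.2% vs 35.7% -> ~21% relative reduction
print(round(relative_reduction(0.282, 0.357), 3))  # 0.21
```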

  4. Porsches across Russia disabled after satellite connection cut off

    Porsche owners in Russia have been hit by cars that will not start, engines that will not turn over, and dashboards that stay dark, as if the cars had been bricked. The problem was first reported in late November. Rolf, Russia's largest Porsche dealer, confirmed that it stems from the cars' Vehicle Tracking System (VTS) losing its satellite connection entirely. VTS is a satellite-based anti-theft system: when the satellite link is severed, the system assumes the car may have been stolen and activates its anti-theft mode, cutting the fuel supply and fully locking the engine. The issue affects every Porsche model with VTS installed.

  5. South African penguins starving to death en masse amid food shortage

    African penguins molt every year, shedding worn feathers and growing new ones to stay warm and waterproof. During the molt a penguin stays ashore for three weeks and cannot hunt, so the species evolved to store fat and live off its reserves through the molting fast; afterwards it must quickly find food to recover. If penguins cannot find enough food before and after the molt, they struggle to survive. South African researchers report that between 2004 and 2011, sardine stocks along South Africa's west coast fell below a quarter of their peak (mainly due to overfishing), likely causing mass penguin deaths from severe food shortage; an estimated 62,000 penguins died over that period. The African penguin was listed as critically endangered in 2024. The researchers note that penguin populations elsewhere have also collapsed: globally, penguin numbers have fallen by nearly 80% over the past 30 years.

  6. Calibre's latest update adds AI; unhappy users fork it to remove the AI

    Calibre, the well-known open-source e-book manager, released its latest update, v8.16.2, last week. The main change is a set of AI features: you can ask an AI about any book in your Calibre library; right-click in the viewer and choose to discuss the selected book with AI; right-click a book and use the "Similar books" menu to ask the AI what to read next; and a new LM Studio backend lets you run different AI models locally. Users unhappy about the AI promptly created a fork, Clbre, whose main change is removing the AI features. Clbre's code is hosted on Microsoft's GitHub, a platform that is itself aggressively integrating AI.

  7. Why do meetings harm employees' physical and mental health?

    One meeting follows another. On average, corporate managers spend 23 hours a week in meetings. Much of what happens in them is widely seen as low-value or outright counterproductive, and bad meetings breed more meetings to repair the damage done by the last one. Meetings were long neglected as a subject of management research; a study published in 2015 laid the foundations of a "science of meetings". It argued that the real problem is not the number of meetings but their design, the lack of clear goals, and the inequalities meetings unconsciously reinforce. In a series of studies the researchers found that meetings can both support and harm participants' well-being: too many meetings lead to burnout and even thoughts of quitting, yet meetings can also boost engagement. A simple but often forgotten question is: why are we meeting at all? The goal should not be fewer meetings but better ones, meetings that respect everyone's time and energy and let everyone speak up and connect.

  8. Chernobyl confinement lost its radiation-containment capability after Russian drone strike

    The International Atomic Energy Agency has published a report on its team's completed comprehensive safety assessment of the New Safe Confinement at the Chernobyl nuclear plant. The shelter was badly damaged in a drone strike earlier this year that set its outer shell ablaze. The report confirms the shelter has lost its main safety function, including its ability to contain radiation leaks, but also found that its load-bearing structure and monitoring systems suffered no permanent damage. Construction of the shelter began in 2010 and was completed in 2019, with a design life of 100 years. The €2.1 billion project was funded by donations from more than 45 countries and organizations and has been hailed as the largest international collaboration in the history of nuclear safety.

  9. Germany's Schleswig-Holstein has canceled nearly 80% of its Microsoft licenses

    In early 2024 the northern German state of Schleswig-Holstein decided to migrate the 30,000 PCs used by its government agencies from Microsoft Windows and Microsoft Office to Linux and LibreOffice. The move aims to strengthen digital sovereignty: the idea that, compared with closed proprietary software, public administrations have more control over IT solutions built on open-source software. A year and a half later, state digitalization minister Dirk Schrödter said the state will save more than €15 million next year on licensing fees for Windows, Microsoft Office, and similar software, with comparable savings expected in the coming years. Outside the tax administration, nearly 80% of state-government workplaces have already switched to LibreOffice; the tax administration has set its own migration timetable, and specialized applications that depend on MS Word or Excel will also be migrated. In 2026 Schleswig-Holstein will invest a one-off €9 million in workplace software upgrades and further development based on free software.

  10. Jolla's new Linux phone opens for pre-order

    Finnish company Jolla was founded 14 years ago. After Nokia abandoned its Linux mobile operating system MeeGo, former employees built Sailfish OS on its foundations. Jolla launched its first smartphone via crowdfunding in 2013, but in recent years has focused mainly on developing Sailfish OS and licensing it to other phone makers, such as Sony's Xperia line, OnePlus, and Samsung. Now Jolla's new Linux phone is open for pre-order. The pre-order page states that the phone will go into production only if 2,000 units are pre-ordered by January 4, 2026, with delivery in the first half of 2026; pre-orders have already passed 2,100, so that is no longer in question. The phone has a MediaTek 5G SoC, 12 GB of RAM, 256 GB of storage expandable to 2 TB, a 6.36-inch FullHD AMOLED display at 390 ppi, and a replaceable 5,500 mAh battery. The pre-order price is €499, €99 below list. It features user-configurable physical privacy switches that can disable the microphone, Bluetooth, Android apps, or any other function. The operating system will receive at least five years of updates, with a promise not to track users. Initial markets include the EU, the UK, Switzerland, and Norway.

  11. Anti-AI activist missing for two weeks; police warn he may be armed and dangerous

    Sam Kirchner, 27, co-founded the anti-AI group Stop AI in San Francisco. The group follows the standard activist playbook: handing out flyers, holding a monthly protest march, organizing discussions, making T-shirts. Its leaders have explicitly renounced violence as morally unacceptable. But as generative AI took off, Kirchner came to see AI as an imminent existential threat to humanity and grew frustrated and angry at what he saw as the group's slow progress and lack of forcefulness. He eventually broke with the group after attacking its current leader, and then dropped out of sight. He has now been missing for two weeks; friends worry for his safety, while San Francisco police warn that he may be armed and could attack OpenAI employees. Émile P. Torres, a philosopher and historian who knows Kirchner and has taken part in Stop AI activities, observes that it is easy to fall into doomsday thinking, and this brand of AI doomerism is everywhere in Silicon Valley. Stop AI members do not believe he poses a threat to the public; they are more worried about his physical and mental health.

  12. Linus Torvalds defends the Windows Blue Screen of Death

    Linus Sebastian interviewed Linus Torvalds, and the conversation touched on Torvalds's preference for ECC memory. In his answer Torvalds commented on Windows's famous Blue Screen of Death (BSOD): a large share of BSODs are not actually software bugs but the result of unreliable hardware, and overclocking adds further instability on top. He argued that ECC memory improves system reliability and lets users trust their machines; without ECC, memory will sooner or later go bad, and behind many a Microsoft BSOD is a hardware problem rather than a software bug. He also commented in passing on Elon Musk's way of managing programmers.

  13. Modern domestic cats reached China around the Tang dynasty

    According to a genetic analysis published in Cell Genomics, today's domestic cat (Felis catus) arrived in China around the Tang dynasty. Before then, China's resident mouser was the leopard cat (Prionailurus bengalensis). The modern domestic cat descends from the African wildcat of the Near East and spread worldwide after domestication. China's earliest archaeological record of a felid comes from the more than 5,000-year-old Quanhucun site in Shaanxi, where a cat skeleton showed a close relationship with humans and was once thought to be a possible domestic cat, but was later identified as a leopard cat, a native felid of similar size. When and how domestic cats entered China therefore remained open questions. To answer them, Peking University postdoc Han Yu and colleagues collected and analyzed 22 small-felid bone samples from human settlements spanning more than 5,000 years, covering most of China's known ancient cat remains. Using ancient-DNA techniques they obtained mitochondrial genomes for all 22 samples and whole genomes for 7. Seven samples were leopard cats, dating from the late-Neolithic Yangshao culture 5,400 years ago to the end of the Eastern Han 1,800 years ago, documenting a close relationship between leopard cats and humans lasting more than 3,500 years. Fourteen samples were identified as domestic cats, all from the Tang dynasty or later. China's earliest known domestic-cat remains were unearthed at the Tang-era Tongwancheng site in Jingbian, Shaanxi, radiocarbon-dated to 706-883 CE, about 1,200 years ago. Genomic phenotype reconstruction shows this cat was male, probably pure white or white with mackerel-tabby patches, short-haired and long-tailed, and free of the genetic defects common in modern domestic cats. Combining textual records and archaeological imagery, domestic cats must have entered China earlier than the excavated remains, probably in the 6th-7th centuries CE, around the Tang dynasty. Genome analysis further pinned down the route: Tang-era Chinese domestic cats are closely related to contemporaneous domestic cats excavated at the Dzhankent site in Kazakhstan and to African wildcats and domestic cats of the Near Eastern Levant. These three regions sit at key hubs of the overland Silk Road, indicating that domestic cats most likely traveled with merchants, entering China from the eastern Mediterranean via Central Asia along the Silk Road.

  14. Three in ten UK doctors use AI tools during consultations

    According to a study by the Nuffield Trust think tank, three in ten UK general practitioners use AI tools such as ChatGPT during consultations; since AI tools inevitably hallucinate, their use could lead doctors to make mistakes and face litigation. The survey covered 2,108 family doctors, of whom 598 (28%) said they already use AI tools; male doctors (33%) use AI at a higher rate than female doctors (25%), and doctors in affluent areas use it far more than those in deprived areas. The report notes that regardless of AI use, the great majority of GPs worry their practices could face "professional liability and medico-legal issues", "risk of clinical error", and "patient privacy and data security" problems. The survey also found that doctors who use AI tools spend the time saved on rest rather than on seeing more patients.