OrangeBot.AI Digest — 2025-09-22

58 headlines across 4 sources, aggregated for this day.

Hacker News (15)

  1. Qwen3-Omni: Native Omni AI model for text, image and video (github.com)
  2. AI-generated “workslop” is destroying productivity? (hbr.org)
  3. OpenAI and Nvidia announce partnership to deploy 10GW of Nvidia systems (openai.com)
  4. UK millionaire exodus did not occur, study reveals (taxjustice.net)
  5. A New Internet Business Model? (blog.cloudflare.com)
  6. PlanetScale for Postgres is now GA (planetscale.com)
  7. Cap'n Web: a new RPC system for browsers and web servers (blog.cloudflare.com)
  8. Cloudflare is sponsoring Ladybird and Omarchy (blog.cloudflare.com)
  9. Why haven't local-first apps become popular? (marcobambini.substack.com)
  10. Beyond the Front Page: A Personal Guide to Hacker News (hsu.cy)
  11. Tesla coast-to-coast FSD crashes after 60 miles (electrek.co)
  12. Kmart's use of facial recognition to tackle refund fraud unlawful (www.oaic.gov.au)
  13. Tell the EU: Don't Break Encryption with "Chat Control" (www.mozillafoundation.org)
  14. SGI demos from long ago in the browser via WASM (github.com)
  15. You did this with an AI and you do not understand what you're doing here (hackerone.com)

GitHub Trending (15)

  1. Gar-b-age / CookLikeHOC

    🥢 Cook like 老乡鸡 🐔. The main part was completed in 2024; this is not an official 老乡鸡 repository. The text comes from the 《老乡鸡菜品溯源报告》 (the chain's dish-traceability report) and has been summarized, edited, and organized. CookLikeHOC.

  2. bevyengine / bevy

    A refreshingly simple data-driven game engine built in Rust

  3. Alibaba-NLP / DeepResearch

    Tongyi Deep Research, the Leading Open-source Deep Research Agent

  4. tldraw / tldraw

    very good whiteboard SDK / infinite canvas SDK

  5. elastic / elasticsearch

    Free and Open Source, Distributed, RESTful Search Engine

  6. LizardByte / Sunshine

    Self-hosted game stream host for Moonlight.

  7. ytdl-org / youtube-dl

    Command-line program to download videos from YouTube.com and other video sites

  8. mindcraft-bots / mindcraft

    Minecraft AI with LLMs+Mineflayer

  9. eslint / eslint

    Find and fix problems in your JavaScript code.

  10. poteto / hiring-without-whiteboards

    ⭐️ Companies that don't have a broken hiring process

  11. AUTOMATIC1111 / stable-diffusion-webui

    Stable Diffusion web UI

  12. yangshun / tech-interview-handbook

    💯 Curated coding interview preparation materials for busy software engineers

  13. EbookFoundation / free-programming-books

    📚 Freely available programming books

  14. lllyasviel / Fooocus

    Focus on prompting and generating

  15. donnemartin / system-design-primer

    Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

Hugging Face (13)

  1. RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation

    Large language models excel at function- and file-level code generation, yet generating complete repositories from scratch remains a fundamental challenge. This process demands coherent and reliable planning across proposal- and implementation-level stages, while natural language, due to its ambiguity and verbosity, is ill-suited for faithfully representing complex software structures. To address this, we introduce the Repository Planning Graph (RPG), a persistent representation that unifies proposal- and implementation-level planning by encoding capabilities, file structures, data flows, and functions in one graph. RPG replaces ambiguous natural language with an explicit blueprint, enabling long-horizon planning and scalable repository generation. Building on RPG, we develop ZeroRepo, a graph-driven framework for repository generation from scratch. It operates in three stages: proposal-level planning and implementation-level refinement to construct the graph, followed by graph-guided code generation with test validation. To evaluate this setting, we construct RepoCraft, a benchmark of six real-world projects with 1,052 tasks. On RepoCraft, ZeroRepo produces repositories averaging nearly 36K LOC, roughly 3.9× the strongest baseline (Claude Code) and about 64× other baselines. It attains 81.5% functional coverage and a 69.7% pass rate, exceeding Claude Code by 27.3 and 35.8 percentage points, respectively. Further analysis shows that RPG models complex dependencies, enables progressively more sophisticated planning through near-linear scaling, and enhances LLM understanding of repositories, thereby accelerating agent localization.
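
    The abstract treats the planning graph itself as the core artifact, so it is worth fixing the shape of such a structure in code. The paper's exact schema is not given here; the sketch below (every name and relation is hypothetical) only illustrates a graph whose nodes are capabilities, files, and functions and whose edges carry relations:

      from dataclasses import dataclass, field

      # Hypothetical sketch of a repository planning graph: nodes carry a
      # kind (capability, file, or function) and edges carry a relation
      # such as "implemented_in" or "contains". The real RPG may differ.

      @dataclass
      class Node:
          node_id: str
          kind: str                 # "capability" | "file" | "function"
          description: str = ""

      @dataclass
      class PlanningGraph:
          nodes: dict = field(default_factory=dict)
          edges: list = field(default_factory=list)   # (src, dst, relation)

          def add(self, node: Node) -> None:
              self.nodes[node.node_id] = node

          def link(self, src: str, dst: str, relation: str) -> None:
              self.edges.append((src, dst, relation))

      g = PlanningGraph()
      g.add(Node("auth", "capability", "user authentication"))
      g.add(Node("src/auth.py", "file"))
      g.add(Node("login", "function", "validate credentials"))
      g.link("auth", "src/auth.py", "implemented_in")
      g.link("src/auth.py", "login", "contains")
      print(len(g.nodes), len(g.edges))   # 3 2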

  2. Manzano: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

    Unified multimodal Large Language Models (LLMs) that can both understand and generate visual content hold immense potential. However, existing open-source models often suffer from a performance trade-off between these capabilities. We present Manzano, a simple and scalable unified framework that substantially reduces this tension by coupling a hybrid image tokenizer with a well-curated training recipe. A single shared vision encoder feeds two lightweight adapters that produce continuous embeddings for image-to-text understanding and discrete tokens for text-to-image generation within a common semantic space. A unified autoregressive LLM predicts high-level semantics in the form of text and image tokens, with an auxiliary diffusion decoder subsequently translating the image tokens into pixels. The architecture, together with a unified training recipe over understanding and generation data, enables scalable joint learning of both capabilities. Manzano achieves state-of-the-art results among unified models, and is competitive with specialist models, particularly on text-rich evaluation. Our studies show minimal task conflicts and consistent gains from scaling model size, validating our design choice of a hybrid tokenizer.
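
    The hybrid tokenizer is the abstract's central design choice, and it reduces to a small routing pattern: one shared encoder output feeds a continuous adapter and a discrete adapter that snaps features to the nearest codebook entry. The toy sketch below only illustrates that pattern; all module sizes and names are invented stand-ins, not Manzano's actual architecture:

      import torch
      import torch.nn as nn

      class ToyHybridTokenizer(nn.Module):
          """Toy hybrid image tokenizer: a shared encoder feeds a
          continuous adapter (understanding path) and a discrete adapter
          quantized against a codebook (generation path). All sizes are
          arbitrary stand-ins."""
          def __init__(self, dim=64, codebook_size=512):
              super().__init__()
              self.encoder = nn.Linear(3 * 16 * 16, dim)   # stand-in vision encoder
              self.cont_adapter = nn.Linear(dim, dim)      # continuous embeddings
              self.disc_adapter = nn.Linear(dim, dim)      # pre-quantization proj
              self.codebook = nn.Embedding(codebook_size, dim)

          def forward(self, patches):                      # (B, N, 3*16*16)
              h = self.encoder(patches)                    # shared features
              cont = self.cont_adapter(h)                  # for image-to-text
              z = self.disc_adapter(h)
              book = self.codebook.weight.expand(z.size(0), -1, -1)
              ids = torch.cdist(z, book).argmin(dim=-1)    # discrete image tokens
              return cont, ids

      tok = ToyHybridTokenizer()
      cont, ids = tok(torch.randn(2, 8, 3 * 16 * 16))
      print(cont.shape, ids.shape)   # torch.Size([2, 8, 64]) torch.Size([2, 8])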

  3. Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

    Generative modeling, representation learning, and classification are three core problems in machine learning (ML), yet their state-of-the-art (SoTA) solutions remain largely disjoint. In this paper, we ask: Can a unified principle address all three? Such unification could simplify ML pipelines and foster greater synergy across tasks. We introduce Latent Zoning Network (LZN) as a step toward this goal. At its core, LZN creates a shared Gaussian latent space that encodes information across all tasks. Each data type (e.g., images, text, labels) is equipped with an encoder that maps samples to disjoint latent zones, and a decoder that maps latents back to data. ML tasks are expressed as compositions of these encoders and decoders: for example, label-conditional image generation uses a label encoder and image decoder; image embedding uses an image encoder; classification uses an image encoder and label decoder. We demonstrate the promise of LZN in three increasingly complex scenarios: (1) LZN can enhance existing models (image generation): When combined with the SoTA Rectified Flow model, LZN improves FID on CIFAR10 from 2.76 to 2.59, without modifying the training objective. (2) LZN can solve tasks independently (representation learning): LZN can implement unsupervised representation learning without auxiliary loss functions, outperforming the seminal MoCo and SimCLR methods by 9.3% and 0.2%, respectively, on downstream linear classification on ImageNet. (3) LZN can solve multiple tasks simultaneously (joint generation and classification): With image and label encoders/decoders, LZN performs both tasks jointly by design, improving FID and achieving SoTA classification accuracy on CIFAR10. The code and trained models are available at https://github.com/microsoft/latent-zoning-networks. The project website is at https://zinanlin.me/blogs/latent_zoning_networks.html.
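
    The abstract's key claim is that tasks become compositions of encoders and decoders over one shared latent space. Under that reading, a minimal sketch with stand-in modules (all shapes and names invented, and ignoring LZN's disjoint latent zones) looks like this:

      import torch
      import torch.nn as nn

      latent_dim = 32

      # Stand-in encoders/decoders into one shared latent space; real LZN
      # maps each data type to disjoint zones, which this toy code omits.
      image_enc = nn.Linear(784, latent_dim)     # image -> latent
      label_enc = nn.Embedding(10, latent_dim)   # label -> latent
      image_dec = nn.Linear(latent_dim, 784)     # latent -> image
      label_dec = nn.Linear(latent_dim, 10)      # latent -> label logits

      x = torch.randn(4, 784)                    # toy flattened images
      y = torch.randint(0, 10, (4,))             # toy labels

      # Tasks as compositions, mirroring the abstract's examples:
      embedding = image_enc(x)                   # representation learning
      logits = label_dec(image_enc(x))           # classification
      generated = image_dec(label_enc(y))        # label-conditional generation
      print(embedding.shape, logits.shape, generated.shape)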

  4. BaseReward: A Strong Baseline for Multimodal Reward Model

    The rapid advancement of Multimodal Large Language Models (MLLMs) has made aligning them with human preferences a critical challenge. Reward Models (RMs) are a core technology for achieving this goal, but a systematic guide for building state-of-the-art Multimodal Reward Models (MRMs) is currently lacking in both academia and industry. Through exhaustive experimental analysis, this paper aims to provide a clear "recipe" for constructing high-performance MRMs. We systematically investigate every crucial component in the MRM development pipeline, including reward modeling paradigms (e.g., Naive-RM, Critic-based RM, and Generative RM), reward head architecture, training strategies, data curation (covering over ten multimodal and text-only preference datasets), backbone model and model scale, and ensemble methods. Based on these experimental insights, we introduce BaseReward, a powerful and efficient baseline for multimodal reward modeling. BaseReward adopts a simple yet effective architecture, built upon a Qwen2.5-VL backbone, featuring an optimized two-layer reward head, and is trained on a carefully curated mixture of high-quality multimodal and text-only preference data. Our results show that BaseReward establishes a new SOTA on major benchmarks such as MM-RLHF-Reward Bench, VL-Reward Bench, and Multimodal Reward Bench, outperforming previous models. Furthermore, to validate its practical utility beyond static benchmarks, we integrate BaseReward into a real-world reinforcement learning pipeline, successfully enhancing an MLLM's performance across various perception, reasoning, and conversational tasks. This work not only delivers a top-tier MRM but, more importantly, provides the community with a clear, empirically-backed guide for developing robust reward models for the next generation of MLLMs.
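
    The abstract specifies a two-layer reward head on a Qwen2.5-VL backbone but not its exact layout, so the following is only a guess at the general shape, with the backbone replaced by a random tensor; the width, activation, and pooling choices are assumptions:

      import torch
      import torch.nn as nn

      class TwoLayerRewardHead(nn.Module):
          """Guessed shape of a two-layer reward head: hidden state ->
          scalar score. Width, activation, and final-token pooling are
          assumptions, not the paper's exact recipe."""
          def __init__(self, hidden_size=1024):
              super().__init__()
              self.net = nn.Sequential(
                  nn.Linear(hidden_size, hidden_size),
                  nn.GELU(),
                  nn.Linear(hidden_size, 1),
              )

          def forward(self, last_hidden):        # (B, T, H) from the backbone
              pooled = last_hidden[:, -1]        # score at the final token
              return self.net(pooled).squeeze(-1)

      head = TwoLayerRewardHead()
      scores = head(torch.randn(2, 16, 1024))    # stand-in backbone states
      print(scores.shape)                        # torch.Size([2])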

  5. SpatialGen: Layout-guided 3D Indoor Scene Generation

    Creating high-fidelity 3D models of indoor environments is essential for applications in design, virtual reality, and robotics. However, manual 3D modeling remains time-consuming and labor-intensive. While recent advances in generative AI have enabled automated scene synthesis, existing methods often face challenges in balancing visual quality, diversity, semantic consistency, and user control. A major bottleneck is the lack of a large-scale, high-quality dataset tailored to this task. To address this gap, we introduce a comprehensive synthetic dataset, featuring 12,328 structured annotated scenes with 57,440 rooms, and 4.7M photorealistic 2D renderings. Leveraging this dataset, we present SpatialGen, a novel multi-view multi-modal diffusion model that generates realistic and semantically consistent 3D indoor scenes. Given a 3D layout and a reference image (derived from a text prompt), our model synthesizes appearance (color image), geometry (scene coordinate map), and semantics (semantic segmentation map) from arbitrary viewpoints, while preserving spatial consistency across modalities. In our experiments, SpatialGen consistently outperforms previous methods. We are open-sourcing our data and models to empower the community and advance the field of indoor scene understanding and generation.

  6. Lynx: Towards High-Fidelity Personalized Video Generation

    We present Lynx, a high-fidelity model for personalized video synthesis from a single input image. Built on an open-source Diffusion Transformer (DiT) foundation model, Lynx introduces two lightweight adapters to ensure identity fidelity. The ID-adapter employs a Perceiver Resampler to convert ArcFace-derived facial embeddings into compact identity tokens for conditioning, while the Ref-adapter integrates dense VAE features from a frozen reference pathway, injecting fine-grained details across all transformer layers through cross-attention. These modules collectively enable robust identity preservation while maintaining temporal coherence and visual realism. Through evaluation on a curated benchmark of 40 subjects and 20 unbiased prompts, which yielded 800 test cases, Lynx has demonstrated superior face resemblance, competitive prompt following, and strong video quality, thereby advancing the state of personalized video generation.
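
    The ID-adapter's Perceiver Resampler step, compressing variable-length face embeddings into a few fixed identity tokens via learned queries and cross-attention, is a well-known pattern. A single-layer toy version (dimensions and layer count invented, not Lynx's actual adapter) might look like this:

      import torch
      import torch.nn as nn

      class TinyResampler(nn.Module):
          """Minimal Perceiver-Resampler-style module: a fixed set of
          learned query tokens cross-attends to input embeddings (e.g.
          face features) and returns that many compact tokens. One layer
          only, for brevity."""
          def __init__(self, dim=64, num_tokens=4):
              super().__init__()
              self.queries = nn.Parameter(torch.randn(num_tokens, dim))
              self.attn = nn.MultiheadAttention(dim, num_heads=4,
                                                batch_first=True)

          def forward(self, feats):                   # (B, N, dim)
              q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
              tokens, _ = self.attn(q, feats, feats)  # cross-attention
              return tokens                           # (B, num_tokens, dim)

      r = TinyResampler()
      id_tokens = r(torch.randn(2, 10, 64))           # stand-in face embeddings
      print(id_tokens.shape)                          # torch.Size([2, 4, 64])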

  7. A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning

    Robotic real-world reinforcement learning (RL) with vision-language-action (VLA) models is bottlenecked by sparse, handcrafted rewards and inefficient exploration. We introduce VLAC, a general process reward model built upon InternVL and trained on large-scale heterogeneous datasets. Given pairwise observations and a language goal, it outputs a dense progress delta and a done signal, eliminating task-specific reward engineering, and supports one-shot in-context transfer to unseen tasks and environments. VLAC is trained on vision-language datasets that strengthen perception, dialogue, and reasoning capabilities, together with robot and human trajectory data that grounds action generation and progress estimation, and it is additionally strengthened to reject irrelevant prompts and to detect regression or stagnation by constructing large numbers of negative and semantically mismatched samples. With prompt control, a single VLAC model alternately generates reward and action tokens, unifying critic and policy. Deployed inside an asynchronous real-world RL loop, we layer a graded human-in-the-loop protocol (offline demonstration replay, return-and-explore, human-guided explore) that accelerates exploration and stabilizes early learning. Across four distinct real-world manipulation tasks, VLAC lifts success rates from about 30% to about 90% within 200 real-world interaction episodes; incorporating human-in-the-loop interventions yields a further 50% improvement in sample efficiency and achieves up to 100% final success.
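
    The reward interface the abstract describes, pairwise observations plus a language goal in, a dense progress delta and a done signal out, fixes a small contract that the rest of the RL loop can build on. The stub below (all names hypothetical, with the model faked) only makes that contract concrete:

      from dataclasses import dataclass

      @dataclass
      class RewardOutput:
          progress_delta: float   # dense progress change between two frames
          done: bool              # task-completion signal

      def process_reward(obs_before, obs_after, goal: str) -> RewardOutput:
          """Stub with the interface the abstract describes: the real
          VLAC model is a VLM scoring progress toward `goal` between two
          observations. Faked here so the contract is runnable."""
          fake_progress = 0.1     # placeholder; a real model would infer this
          return RewardOutput(progress_delta=fake_progress, done=False)

      r = process_reward("frame_t.png", "frame_t1.png", "open the drawer")
      print(r.progress_delta, r.done)   # 0.1 False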

  8. BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

    In the field of AI-driven human-GUI interaction automation, while rapid advances in multimodal large language models and reinforcement fine-tuning techniques have yielded remarkable progress, a fundamental challenge persists: their interaction logic significantly deviates from natural human-GUI communication patterns. To fill this gap, we propose "Blink-Think-Link" (BTL), a brain-inspired framework for human-GUI interaction that mimics the human cognitive process between users and graphical interfaces. The system decomposes interactions into three biologically plausible phases: (1) Blink - rapid detection and attention to relevant screen areas, analogous to saccadic eye movements; (2) Think - higher-level reasoning and decision-making, mirroring cognitive planning; and (3) Link - generation of executable commands for precise motor control, emulating human action selection mechanisms. Additionally, we introduce two key technical innovations for the BTL framework: (1) Blink Data Generation - an automated annotation pipeline specifically optimized for blink data, and (2) BTL Reward - the first rule-based reward mechanism that enables reinforcement learning driven by both process and outcome. Building upon this framework, we develop a GUI agent model named BTL-UI, which demonstrates consistent state-of-the-art performance across both static GUI understanding and dynamic interaction tasks in comprehensive benchmarks. These results provide conclusive empirical validation of the framework's efficacy in developing advanced GUI Agents.
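
    The Blink-Think-Link decomposition maps naturally onto a three-stage pipeline. The schematic stub below (all function bodies invented for illustration) only makes that control flow explicit:

      def blink(screenshot):
          """Blink: rapid detection of candidate screen regions (stubbed)."""
          return [{"region": (100, 200, 40, 20), "label": "Submit button"}]

      def think(regions, task):
          """Think: higher-level reasoning over attended regions (stubbed)."""
          return {"decision": "click", "target": regions[0]["region"],
                  "task": task}

      def link(plan):
          """Link: emit an executable command for the chosen action."""
          x, y, w, h = plan["target"]
          return f"click({x + w // 2}, {y + h // 2})"

      command = link(think(blink("screen.png"), "submit the form"))
      print(command)   # click(120, 210)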

  9. Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems

    Instruction-guided text-to-speech (ITTS) enables users to control speech generation through natural language prompts, offering a more intuitive interface than traditional TTS. However, the alignment between user style instructions and listener perception remains largely unexplored. This work first presents a perceptual analysis of ITTS controllability across two expressive dimensions (adverbs of degree and graded emotion intensity) and collects human ratings on speaker age and word-level emphasis attributes. To comprehensively reveal the instruction-perception gap, we provide a data collection with large-scale human evaluations, named the Expressive VOice Control (E-VOC) corpus. Furthermore, we find that (1) gpt-4o-mini-tts is the most reliable ITTS model, showing strong alignment between instructions and generated utterances across acoustic dimensions; (2) the five analyzed ITTS systems tend to generate adult voices even when instructed to produce child or elderly voices; and (3) fine-grained control remains a major challenge, indicating that most ITTS systems have substantial room for improvement in interpreting slightly different attribute instructions.

  10. RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes

    Although COLMAP has long remained the predominant method for camera parameter optimization in static scenes, it is constrained by its lengthy runtime and its reliance on ground truth (GT) motion masks for application to dynamic scenes. Many efforts have attempted to improve it by incorporating additional priors as supervision, such as GT focal length, motion masks, 3D point clouds, camera poses, and metric depth, which, however, are typically unavailable in casually captured RGB videos. In this paper, we propose a novel method for more accurate and efficient camera parameter optimization in dynamic scenes supervised solely by a single RGB video. Our method consists of three key components: (1) Patch-wise Tracking Filters, to establish robust and maximally sparse hinge-like relations across the RGB video; (2) Outlier-aware Joint Optimization, for efficient camera parameter optimization by adaptive down-weighting of moving outliers, without reliance on motion priors; and (3) a Two-stage Optimization Strategy, to enhance stability and optimization speed via a trade-off between the Softplus limits and convex minima in the losses. We evaluate our camera estimates visually and numerically. To further validate accuracy, we feed the camera estimates into a 4D reconstruction method and assess the resulting 3D scenes as well as the rendered 2D RGB and depth maps. Experiments on four real-world datasets (NeRF-DS, DAVIS, iPhone, and TUM-dynamics) and one synthetic dataset (MPI-Sintel) demonstrate that our method estimates camera parameters more efficiently and accurately with a single RGB video as the only supervision.
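
    The paper's exact down-weighting rule is not given in this abstract. A standard way to realize outlier-aware joint optimization is iteratively reweighted least squares with a robust kernel, where residuals from moving points earn small weights; the sketch below uses a Huber-style weight purely as a stand-in:

      import numpy as np

      def huber_weights(residuals, scale=1.0):
          """Huber-style IRLS weights: ~1 for small residuals, shrinking
          for outliers. A generic stand-in for the paper's adaptive rule."""
          r = np.abs(residuals) / scale
          return np.where(r <= 1.0, 1.0, 1.0 / np.maximum(r, 1e-12))

      # Toy reprojection residuals: mostly static points plus two movers.
      res = np.array([0.1, -0.2, 0.05, 3.0, -4.5, 0.15])
      w = huber_weights(res, scale=0.5)
      print(np.round(w, 2))   # movers get small weights: [1. 1. 1. 0.17 0.11 1.]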

  11. WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers

    Pretrained automatic speech recognition (ASR) models such as Whisper perform well but still need domain adaptation to handle unseen vocabulary and parlance. In many real-world settings, collecting speech data is impractical, necessitating text-only adaptation. We propose WhisTLE, a deeply supervised, text-only adaptation method for pretrained encoder-decoder ASR models. WhisTLE trains a variational autoencoder (VAE) to model encoder outputs from text and fine-tunes the decoder using the learned text-to-latent encoder, optionally combined with text-to-speech (TTS) adaptation. At inference, the original encoder is restored, incurring no extra runtime cost. Across four out-of-domain datasets and four ASR models, WhisTLE with TTS reduces word error rate (WER) by 12.3% relative to TTS-only adaptation and outperforms all non-WhisTLE baselines in 27 of 32 scenarios.
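
    The adaptation recipe, learn a text-to-latent encoder that mimics the speech encoder's outputs and fine-tune the decoder on those latents, can be sketched as a single training step. All modules below are tiny stand-ins for the real Whisper-style encoder-decoder and the VAE, so this only mirrors the data flow:

      import torch
      import torch.nn as nn

      # Stand-ins for the text-to-latent encoder and the ASR decoder head.
      text_to_latent = nn.Sequential(nn.Embedding(1000, 64),
                                     nn.Linear(64, 64))
      decoder = nn.Linear(64, 1000)

      optim = torch.optim.Adam(list(text_to_latent.parameters()) +
                               list(decoder.parameters()), lr=1e-3)
      tokens = torch.randint(0, 1000, (4, 12))    # domain text, no audio

      latents = text_to_latent(tokens)            # pseudo encoder outputs
      logits = decoder(latents)                   # fine-tune decoder on them
      loss = nn.functional.cross_entropy(logits.transpose(1, 2), tokens)
      loss.backward()
      optim.step()
      print(float(loss))
      # At inference the original speech encoder is restored, so there is
      # no extra runtime cost, as the abstract notes.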

  12. Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents

    Role-playing agents (RPAs) have attracted growing interest for their ability to simulate immersive and interactive characters. However, existing approaches primarily focus on static role profiles, overlooking the dynamic perceptual abilities inherent to humans. To bridge this gap, we introduce the concept of dynamic role profiles by incorporating video modality into RPAs. To support this, we construct Role-playing-Video60k, a large-scale, high-quality dataset comprising 60k videos and 700k corresponding dialogues. Based on this dataset, we develop a comprehensive RPA framework that combines adaptive temporal sampling with both dynamic and static role profile representations. Specifically, the dynamic profile is created by adaptively sampling video frames and feeding them to the LLM in temporal order, while the static profile consists of (1) character dialogues from training videos during fine-tuning, and (2) a summary context from the input video during inference. This joint integration enables RPAs to generate better-grounded responses. Furthermore, we propose a robust evaluation method covering eight metrics. Experimental results demonstrate the effectiveness of our framework, highlighting the importance of dynamic role profiles in developing RPAs.
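
    The abstract does not spell out its adaptive temporal sampling rule. One plausible reading, allocating frames where the video changes most, can be sketched with a frame-difference heuristic (invented here, not the paper's sampler):

      import numpy as np

      def adaptive_sample(frames, k):
          """Pick k frame indices, always keeping the first, preferring
          moments of large change. A guessed heuristic for illustration."""
          frames = np.asarray(frames, dtype=float)
          diffs = np.abs(np.diff(frames, axis=0)).mean(
              axis=tuple(range(1, frames.ndim)))
          idx = np.argsort(diffs)[::-1][: k - 1] + 1   # biggest changes
          return sorted({0, *idx.tolist()})

      video = np.random.rand(30, 8, 8)                 # toy 30-frame video
      video[12] += 5.0                                 # a scene cut
      print(adaptive_sample(video, 4))                 # includes the cut at 12-13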

  13. Ask-to-Clarify: Resolving Instruction Ambiguity through Multi-turn Dialogue

    The ultimate goal of embodied agents is to create collaborators that can interact with humans, not mere executors that passively follow instructions. This requires agents to communicate, coordinate, and adapt their actions based on human feedback. Recent advances in VLAs offer a path toward this goal. However, most current VLA-based embodied agents operate in a one-way mode: they receive an instruction and execute it without feedback. This approach fails in real-world scenarios where instructions are often ambiguous. In this paper, we address this problem with the Ask-to-Clarify framework. Our framework first resolves ambiguous instructions by asking questions in a multi-turn dialogue, then generates low-level actions end-to-end. Specifically, the Ask-to-Clarify framework consists of two components: a VLM for collaboration and a diffusion model for action. We also introduce a connection module that generates conditions for the diffusion model based on the output of the VLM. This module adjusts the observations according to the instructions to create reliable conditions. We train our framework with a two-stage knowledge-insulation strategy. First, we fine-tune the collaboration component on ambiguity-resolving dialogue data. Then, we integrate the action component while freezing the collaboration component. This preserves the interaction abilities while fine-tuning the diffusion model to generate actions. The training strategy guarantees that our framework first asks questions, then generates actions. During inference, a signal detector functions as a router that lets the framework switch between asking questions and taking actions. We evaluate the Ask-to-Clarify framework on eight real-world tasks, where it outperforms existing state-of-the-art VLAs. The results suggest that our proposed framework, along with the training strategy, provides a path toward collaborative embodied agents.
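
    At inference, the signal detector the abstract mentions is structurally just a two-way dispatch between asking and acting. The stub below invents a trivial detector purely to show that routing; the real framework learns this decision:

      def needs_clarification(instruction: str) -> bool:
          """Stub signal detector: flags obviously underspecified
          referents. Crude substring check, for illustration only."""
          return any(w in instruction.lower()
                     for w in ("it", "that one", "there"))

      def step(instruction: str) -> str:
          if needs_clarification(instruction):
              return "ASK: Which object do you mean?"   # dialogue branch
          return "ACT: <low-level action chunk>"        # diffusion branch

      print(step("pick it up"))            # ASK: Which object do you mean?
      print(step("pick up the red cup"))   # ACT: <low-level action chunk>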

Solidot (15)

  1. BlockBlasters game patch found to contain malware

    Valve has pulled the 2D platformer BlockBlasters from the Steam store after a recently released patch was found to contain malware. BlockBlasters launched on July 31; on August 30 it received patch Build 19799326, in which the file game2.bat exhibited malicious behavior: it collects the user's IP address and location, detects installed antivirus software, harvests login credentials, uploads the collected information, and executes a VBS launcher script. It ultimately installs a backdoor and a stealer that exfiltrates data from Google Chrome, Brave Browser, and Microsoft Edge, primarily targeting cryptocurrency. Several hundred players may have been affected by the attack.

  2. Chinese navy successfully tests electromagnetic catapult launches of carrier aircraft

    Xinhua reports that the Chinese navy has announced that three types of carrier-based aircraft, the J-15T, J-35, and KJ-600, have completed their first catapult takeoffs and arrested landings aboard the carrier Fujian. This is the first time China has achieved electromagnetic catapult launches and arrested landings of multiple types of advanced carrier aircraft on a catapult-equipped carrier. The United States completed the earliest land-based electromagnetic catapult launches in 2010, and the carrier USS Gerald R. Ford was fitted with the first electromagnetic catapult system in 2013, but, according to the report, assorted problems have so far kept it from conducting electromagnetic catapult tests with carrier aircraft.

  3. UK banks still run code written in the 1960s

    UK banks still run code written in the 1960s, and few people understand it. According to a survey of 200 UK banks, 16% rely on software from the 1960s, and nearly 40% still maintain code from the 1970s. Half of the banks admit that the software they depend on is understood by only one or two employees at or near retirement age, while 31.5% say they rely on one or two employees below retirement age who know the legacy systems. 38 banks report that they still use code designed to run on physical media such as punch cards, and 15% run code written for room-sized mainframes. Banks are sprawling institutions and are unlikely to rebuild their infrastructure for every technological innovation. One respondent said their bank's core system was built in the 1970s and still runs COBOL.

  4. Seattle struggles with shrinking tech employment

    When the Five Stones coffee shop near Microsoft's Redmond headquarters hired baristas a few months ago, it received résumés listing stints at Microsoft and other tech companies; applicants often held master's degrees, had backgrounds in graphic design or marketing, and some had held senior positions, while the job they were applying for paid the local minimum wage of $16.66 an hour. Five Stones passed over these highly credentialed applicants and instead prioritized traditional entry-level baristas, such as candidates with a high school education. According to Layoffs.fyi, which tracks layoffs, Seattle's two largest tech companies, Microsoft and Amazon, have cut more than 46,000 jobs since 2023, accounting for 85% of local tech-company layoffs. The mass layoffs have rippled into other parts of Seattle's economy: dining and retail spending in the commercial and shopping districts around the Amazon and Microsoft campuses has fallen, with transaction volume in popular areas down 7%, and 450 Seattle restaurants closed in the first half of 2025, equivalent to 16% of the city's total. Uber driver Juan Prado earned six figures in 2021 and often drove passengers to job interviews, but such demand is far lower this year. Local commercial real estate vacancy rates have also hit record highs.

  5. Astronomers discover a quasi-moon near Earth

    Astronomers have discovered a quasi-moon near Earth. The object, designated 2025 PN7, is a near-Earth asteroid that orbits the Sun roughly once a year and may have been lingering near Earth for about 60 years, escaping detection until a close pass this summer brought it within reach of telescopes. Quasi-moons of this kind are hard to spot because they are small and faint. The Pan-STARRS observatory in Hawaii observed 2025 PN7 on August 29, and archival data show it has been traveling in an Earth-like orbit for decades. Astronomers are still working to pin down its size; estimates put its diameter at 19 or 30 meters, which may make it the smallest quasi-moon known to accompany Earth.

  6. Microsoft Entra ID vulnerabilities could have been catastrophic

    Over the past decade, businesses around the world have migrated their digital infrastructure from self-hosted servers to the cloud, benefiting from the security features offered by cloud providers such as Microsoft. But when a cloud provider itself has a flaw, the consequences can be catastrophic. Security researcher Dirk-jan Mollema found two vulnerabilities in Entra ID, the identity and access management platform of Microsoft's Azure cloud, that could be used to obtain administrator privileges, giving him access to every user account stored in Entra ID, with potentially catastrophic impact. Mollema disclosed the vulnerabilities to Microsoft on July 14, and Microsoft shipped a patch on July 17. Microsoft later confirmed to Mollema that the issue was fully fixed on July 23, implemented additional remediations in August, and published a CVE for the vulnerability on September 4.

  7. Microsoft rolls out Gaming Copilot to users, except in China

    Microsoft has begun rolling out the beta of its gaming assistant Gaming Copilot to Windows 11 users, except in China. Gaming Copilot will also reach Xbox mobile app users next month. To use it, players need to install the Xbox PC app, open the Game Bar with the Windows + G shortcut, find Gaming Copilot, and sign in to an Xbox account. Players can use Gaming Copilot's Voice Mode to get help with in-game quests, receive recommendations for new games, and review achievements or play history.

  8. Retirement can improve mental health, but not for everyone

    Retirement can improve mental health, but not for everyone. According to a new study published in SSM - Mental Health, once the nine-to-five ends, retirement's effect on mental health depends on income level, the nature of the job, and the age at which one leaves the workforce. A University of Edinburgh team analyzed a dataset of 1,583 Dutch respondents who had retired and held no paid work, with an average retirement age of 66-67. The results show that the low-income group, those earning below the minimum wage, had the lowest mental health in retirement: they improved initially but declined after about two and a half years, an inverted-U pattern known as the "honeymoon fade" effect. The middle-income group showed a marked improvement in mental health before retirement and a slight further improvement afterward. The high-income group showed no change around the transition into retirement but improved significantly later on, with those who retired later improving more slowly.

  9. Pope declines to authorize an AI Pope

    Pope Leo XIV has refused to authorize the creation of an AI Pope. In an interview with biographer Eloise Allen he said: "If anyone should not be represented by a virtual avatar, I think the Pope is certainly near the top of the list." The Pope continued: "Recently someone asked for authorization to create an AI version of me, so that anyone could log on to a website and have a private conversation with the Pope. The AI Pope would answer their questions, and I said, 'I will not authorize that.'" Leo XIV has previously said that he chose the name Leo partly to honor the 19th-century Pope Leo XIII, who is known for Rerum Novarum, his encyclical on the exploitation of the working class during the Industrial Revolution. In his first address to the cardinals, Leo XIV called AI "another industrial revolution." The Pope said: "If the whole world is automated and only a few people are able not merely to survive but to live full and meaningful lives, there will be a big problem in the future, a serious one." He worries that the ultra-rich investing in AI are entirely ignoring the value of humans and humanity, and that if the Church, or anyone, fails to speak up, the danger is that the digital world will go its own way, with people reduced to pawns and pushed aside. The Pope said he does not oppose progress or new technology, but he does not like the direction things are currently headed. He has previously said that although AI can "simulate some aspects of human reasoning" and complete tasks with astonishing efficiency, it still cannot replace genuine moral discernment or the building of "real human relationships." The development of these technologies must go hand in hand with human and social values, the capacity to make conscientious judgments, and a growing sense of human responsibility.

  10. Mediterranean diet linked to lower risk of gum disease

    A UK study has found that people whose eating habits are close to the Mediterranean diet tend to have better gum health, are less likely to develop gum disease, and may have lower levels of inflammation. People who do not follow a Mediterranean-style diet are more prone to severe gum disease, especially frequent red-meat eaters. The team assessed 200 hospital patients from the oral, dental, and craniofacial biobank at King's College London, giving them dental examinations, taking blood samples, and surveying their eating habits with questionnaires. The Mediterranean diet is known for its emphasis on fruits, vegetables, whole grains, and healthy fats, and prior research has linked it to a lower risk of major diseases, including cardiovascular disease, neurodegenerative diseases, and some cancers. The new findings suggest that a balanced Mediterranean diet may help reduce gum disease and lower systemic inflammation.

  11. OpenAI researchers say AI hallucinations are mathematically inevitable

    OpenAI researchers have published a paper on the preprint platform arXiv arguing that, because of the statistical nature of large language models and fundamental computational limits, AI will produce plausible-sounding but incorrect output even with perfect training data. The researchers concede that AI hallucinations are mathematically inevitable and cannot be engineered away with more advanced techniques. Like students facing hard exam questions, the paper argues, large models guess under uncertainty and produce plausible but wrong statements rather than admit uncertainty. Even the most advanced AI systems still hallucinate, which undermines trust. The researchers prove that hallucinations stem from the statistical properties of how large models are trained, not from implementation flaws. They tested the competing DeepSeek-V3 model, Meta AI, and Claude 3.7 Sonnet, as well as OpenAI's own GPT-series models. ChatGPT also hallucinates; GPT-5 hallucinates less, but hallucinations still occur, and more advanced reasoning models hallucinate more often than simpler systems: the o1 reasoning model hallucinates 16% of the time, while the newer o3 and o4-mini hallucinate 33% and 48% of the time, respectively. OpenAI's research identifies three mathematical factors that make hallucinations unavoidable: epistemic uncertainty when information appears only sparsely in the training data, model limitations, and computational intractability.

  12. Sogou Input Method's cloud-control module quietly tampers with Edge and Chrome settings

    Security firm Huorong reports that a cloud-controlled delivery module in Sogou Input Method quietly tampers with the homepage and default search engine settings of the Edge and Chrome browsers. The affected Sogou Input Method version is 15.7.0.2192. It uses the Shiply release platform to deliver cloud-control configurations; the platform can target deliveries precisely by time window, region, application version, and other conditions, and supports staged-rollout strategies such as gray release, allowing small-scale testing before wide distribution. Sogou's promotion module first detects antivirus software on the user's device, then forcibly changes the homepage and default search engine of both Edge and Chrome by tampering with their configuration files. In Chrome's case, opening the browser redirects to page.wenxin9.com and then on to a navigation page, where the Baidu links all carry source-tracking parameters.

  13. Vatican has the highest per-capita Flathub package installs

    The seat of the Catholic Pope apparently loves Linux software. A Reddit user tallied downloads from the Linux app store Flathub by country/region and divided them by population. The result is surprising: Vatican City, the city-state within Rome, has the highest per-capita downloads in the world, largely because its population is only 496 (Wikipedia gives 882 for 2024); with 6,878 total downloads, that works out to nearly 14 downloads per person, far ahead of second-place Germany at about 4. The statistics show that Flathub is quite popular in Europe, the US, Canada, Australia, and New Zealand, but has few users in Asia and Africa. Flathub distributes Linux software packaged as Flatpaks, which the vast majority of Linux distributions can run.

  14. Austrian armed forces switch from MS Office to LibreOffice

    The Austrian armed forces have switched from the proprietary MS Office to the open-source LibreOffice. The move was not about saving license fees across roughly 16,000 workstations; in the words of military officials, it was about strengthening digital sovereignty, keeping the infrastructure independent, and ensuring that data is processed only internally. The main motivation is that Microsoft's office software is moving to the cloud, and the military cannot process internal data on external cloud services. The Austrian military had been trialing LibreOffice since 2022 to prepare for the migration. It previously used Microsoft Office 2016 Professional, which has now been uninstalled; users who still need Microsoft's office software can apply internally for MS Office 2024 LTSC. In the course of adopting LibreOffice, the Austrian military has also contributed code to the open-source project.

  15. Xiaomi will remotely fix a driver-assistance defect in 110,000 SU7s

    Xiaomi and the State Administration for Market Regulation announced on Friday the recall of certain SU7 Standard Edition electric vehicles produced between February 6, 2024 and August 30, 2025, totaling 116,887 units. Recall number S2025M0149I covers the XMA7000MBEVR2 and XMA7000MBEVR5 models (98,462 units); recall number S2025M0150I covers the BJ7000MBEVR2 model (18,425 units). In certain situations with the L2 highway navigation driver-assistance feature enabled, some vehicles within the recall scope may not adequately recognize, warn about, or respond to extreme edge-case scenarios; if the driver does not intervene in time, the risk of collision increases, posing a safety hazard. Xiaomi Auto Technology Co., Ltd. will upgrade the software of the recalled vehicles free of charge via over-the-air (OTA) updates to eliminate the hazard. Earlier this year, a fatal crash involving the SU7's driver-assistance system killed three university students.