OrangeBot.AI Digest — 2025-10-14

60 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. Surveillance data challenges what we thought we knew about location tracking (www.lighthousereports.com)
  2. Half of America's Voting Machines Are Now Owned by a MAGA Oligarch (dissentinbloom.substack.com)
  3. The day my smart vacuum turned against me (codetiger.github.io)
  4. America Is Sliding Toward Illiteracy (www.theatlantic.com)
  5. Why your boss isn't worried about AI – "can't you just turn it off?" (boydkane.com)
  6. What Americans die from vs. what the news reports on (ourworldindata.org)
  7. A 12,000-year-old obelisk with a human face was found in Karahan Tepe (www.trthaber.com)
  8. How bad can a $2.97 ADC be? (excamera.substack.com)
  9. GPT-5o-mini hallucinates medical residency applicant grades (www.thalamusgme.com)
  10. Astronomers 'image' a mysterious dark object in the distant Universe (www.mpg.de)
  11. CRISPR-like tools that finally can edit mitochondria DNA (www.nature.com)
  12. ADS-B Exposed (adsb.exposed)
  13. Pyrefly: Python type checker and language server in Rust (pyrefly.org)
  14. Zoo of array languages (ktye.github.io)
  15. Why is everything so scalable? (www.stavros.io)

GitHub Trending (15)

  1. anthropics / prompt-eng-interactive-tutorial

    Anthropic's Interactive Prompt Engineering Tutorial

  2. nvm-sh / nvm

    Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions

  3. GorvGoyl / Clone-Wars

    100+ open-source clones of popular sites like Airbnb, Amazon, Instagram, Netflix, Tiktok, Spotify, Whatsapp, Youtube etc. See source code, demo links, tech stack, github stars.

  4. alibaba / spring-ai-alibaba

    Agentic AI Framework for Java Developers

  5. datawhalechina / happy-llm

    📚 A hands-on tutorial on large language model principles and practice, from scratch

  6. asgeirtj / system_prompts_leaks

    Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini

  7. opendatalab / MinerU

    Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

  8. nitrojs / nitro

    Next Generation Server Toolkit. Create web servers with everything you need and deploy them wherever you prefer.

  9. Klavis-AI / klavis

    Klavis AI (YC X25): MCP integration platforms that let AI agents use tools reliably at any scale

  10. chili-chips-ba / wireguard-fpga

    Full-throttle, wire-speed hardware implementation of Wireguard VPN, using low-cost Artix7 FPGA with opensource toolchain. If you seek security and privacy, nothing is private in our codebase. Our door is wide open for backdoor scrutiny, be it related to RTL, embedded, build, bitstream or any other aspect of design and delivery package. Bujrum!

  11. dair-ai / Prompt-Engineering-Guide

    🐙 Guides, papers, lecture, notebooks and resources for prompt engineering

  12. 1Panel-dev / MaxKB

    🔥 MaxKB is a powerful, easy-to-use open-source platform for building enterprise-grade agents.

  13. public-apis / public-apis

    A collective list of free APIs

  14. KellerJordan / modded-nanogpt

    NanoGPT (124M) in 3 minutes

  15. volcengine / MineContext

    MineContext is your proactive, context-aware AI partner (Context-Engineering + ChatGPT Pulse)

Hugging Face (15)

  1. QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

    We propose QeRL, a Quantization-enhanced Reinforcement Learning framework for large language models (LLMs). While RL is essential for LLMs' reasoning capabilities, it is resource-intensive, requiring substantial GPU memory and long rollout durations. QeRL addresses these issues by combining NVFP4 quantization with Low-Rank Adaptation (LoRA), accelerating the rollout phase of RL while reducing memory overhead. Beyond efficiency, our findings show that quantization noise increases policy entropy, enhancing exploration and enabling the discovery of better strategies during RL. To further optimize exploration, QeRL introduces an Adaptive Quantization Noise (AQN) mechanism, which dynamically adjusts noise during training. Experiments demonstrate that QeRL delivers over a 1.5x speedup in the rollout phase. Moreover, this is the first framework to enable RL training of a 32B LLM on a single H100 80GB GPU, while delivering overall speedups for RL training. It also achieves faster reward growth and higher final accuracy than 16-bit LoRA and QLoRA, while matching the performance of full-parameter fine-tuning on mathematical benchmarks such as GSM8K (90.8%) and MATH 500 (77.4%) for the 7B model. These results establish QeRL as an efficient and effective framework for RL training in LLMs.
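
The claimed mechanism, that zero-mean noise on the logits raises policy entropy, can be sketched with a toy two-action policy. The numbers and the symmetric ±1 perturbation are illustrative only, not QeRL's NVFP4 quantization noise:

```python
import math

def softmax_entropy(logits):
    """Shannon entropy (nats) of softmax(logits)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return -sum(e / z * math.log(e / z) for e in exps)

# A peaked two-action policy that strongly prefers action 0.
h_clean = softmax_entropy([2.0, 0.0])

# Zero-mean "quantization" noise modeled as a symmetric +/-1 kick to the
# top logit; the expected entropy is the average of the two outcomes.
h_noisy = (softmax_entropy([3.0, 0.0]) + softmax_entropy([1.0, 0.0])) / 2

print(f"clean: {h_clean:.4f} nats")   # clean: 0.3653 nats
print(f"noisy: {h_noisy:.4f} nats")   # noisy: 0.3866 nats -> higher entropy
```

Because entropy is concave around a peaked distribution, the downward kick gains more entropy than the upward kick loses, so the expected entropy rises, which is the exploration effect the abstract describes.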

  2. Diffusion Transformers with Representation Autoencoders

    Latent generative modeling, where a pretrained autoencoder maps pixels into a latent space for the diffusion process, has become the standard strategy for Diffusion Transformers (DiT); however, the autoencoder component has barely evolved. Most DiTs continue to rely on the original VAE encoder, which introduces several limitations: outdated backbones that compromise architectural simplicity, low-dimensional latent spaces that restrict information capacity, and weak representations that result from purely reconstruction-based training and ultimately limit generative quality. In this work, we explore replacing the VAE with pretrained representation encoders (e.g., DINO, SigLIP, MAE) paired with trained decoders, forming what we term Representation Autoencoders (RAEs). These models provide both high-quality reconstructions and semantically rich latent spaces, while allowing for a scalable transformer-based architecture. Since these latent spaces are typically high-dimensional, a key challenge is enabling diffusion transformers to operate effectively within them. We analyze the sources of this difficulty, propose theoretically motivated solutions, and validate them empirically. Our approach achieves faster convergence without auxiliary representation alignment losses. Using a DiT variant equipped with a lightweight, wide DDT head, we achieve strong image generation results on ImageNet: 1.51 FID at 256x256 (no guidance) and 1.13 at both 256x256 and 512x512 (with guidance). RAE offers clear advantages and should be the new default for diffusion transformer training.

  3. OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs

    Recent advances in multimodal large language models (MLLMs) have demonstrated substantial potential in video understanding. However, existing benchmarks fail to comprehensively evaluate synergistic reasoning capabilities across audio and visual modalities, often neglecting either one of the modalities or integrating them in a logically inconsistent manner. To bridge this gap, we introduce OmniVideoBench, a large-scale and rigorously designed benchmark dedicated to assessing synergistic audio-visual understanding, with a strong emphasis on modality complementarity and logical consistency. Specifically, OmniVideoBench comprises 1000 high-quality question-answer (QA) pairs, each annotated with step-by-step reasoning traces, derived from 628 diverse videos ranging from several seconds to 30 minutes, and manually verified to guarantee complete correctness and uniqueness. Moreover, OmniVideoBench encompasses 13 carefully designed question types, covering temporal reasoning, spatial localization, counting, causal inference, summarization, and beyond, thereby capturing the essential challenges of video understanding. Evaluation of multiple MLLMs on OmniVideoBench reveals a pronounced gap between model performance and human reasoning, with open-source models lagging significantly behind their closed-source counterparts, underscoring the inherent difficulty of genuine audio-visual reasoning. We will release OmniVideoBench to foster the development of MLLMs with stronger and more generalizable reasoning capabilities.

  4. Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States

    Autoregressive (AR) models remain the standard for natural language generation but still suffer from high latency due to strictly sequential decoding. Recent diffusion-inspired approaches, such as LlaDA and Dream, mitigate this by generating in parallel, yet they suffer from two core limitations: information loss, as predictive distributions for non-finalized tokens are discarded at each step, and premature commitment, where local decisions are made without sufficient global coordination. We introduce Latent Refinement Decoding (LRD), a two-stage framework with Latent Refinement and a Predictive Feedback Loop. The first stage maintains masked positions as distributional mixtures of predicted tokens and the mask embedding, allowing the model to establish more globally consistent beliefs. The second stage progressively finalizes confident tokens while retaining uncertain ones for iterative feedback. KL-divergence dynamics provide a principled and reliable criterion for convergence and early stopping. Experiments across coding (HumanEval +6.3, MBPP +2.6) and reasoning (GSM8K +2.9, MATH500 +3.8) show that LRD improves accuracy while delivering speedups of up to 10.6x, making it a strong and versatile alternative for parallel sequence generation.
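
The KL-based convergence criterion can be illustrated with a toy refinement loop. The 4-token belief state and the mixing update are stand-ins (LRD's actual refinement is model-driven); only the stopping rule is the point:

```python
import math

def kl(p, q):
    """KL divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical 4-token belief state, refined each step by mixing toward a
# fixed stand-in target distribution. The loop stops early once the belief
# stops moving, i.e. the KL between successive belief states falls below tol.
belief = [0.97, 0.01, 0.01, 0.01]
target = [0.25, 0.25, 0.25, 0.25]
tol, steps = 1e-4, 0

while steps < 50:
    refined = [(b + t) / 2 for b, t in zip(belief, target)]
    delta = kl(refined, belief)          # how much the belief still moves
    belief, steps = refined, steps + 1
    if delta < tol:                      # belief has stabilized: stop early
        break

print(steps, [round(b, 3) for b in belief])
```

The KL between successive iterates shrinks geometrically here, so the loop halts after a handful of steps rather than running to the cap, which is the latency saving the paper's early stopping targets.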

  5. RLFR: Extending Reinforcement Learning for LLMs with Flow Environment

    Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a promising framework for improving reasoning abilities in Large Language Models (LLMs). However, a policy optimized with binary verification is prone to overlooking potentially valuable exploration in the reasoning trajectory. Given the heavy annotation cost of gold-standard Process Reward Models (PRMs), recent works attempt to use auxiliary signals for reward shaping of process tokens, such as entropy and likelihood collected from logit space. In this work, we offer a novel perspective on shaping RLVR with flow rewards derived from latent space, and propose RLFR, where flow fields of model latents are constructed from either off-policy high-quality data or on-policy rejection-sampling data, and the velocity deviations of policy latents within them are quantified to serve as a reward signal. RLFR first demonstrates that a well-established flow field can be a sound environment for reward-signal collection, highlighting that the expressive latent space remains much underexplored. Moreover, RLFR is able to compress any off-policy expert data as a reference for constituting reward signals, and we show that the efficient context dependence compressed within the hidden states is utilized, rather than individual token-level denotation, for context comprehension. Experiments on both language and multimodal reasoning benchmarks demonstrate the reliability of flow rewards and suggest a promising paradigm for reward shaping with auxiliary signals.

  6. Spotlight on Token Perception for Multimodal Reinforcement Learning

    While Reinforcement Learning with Verifiable Rewards (RLVR) has advanced the reasoning capabilities of Large Vision-Language Models (LVLMs), most existing methods in multimodal reasoning neglect the critical role of visual perception within the RLVR optimization process. In this paper, we undertake a pioneering exploration of multimodal RLVR through the novel perspective of token perception, which measures the visual dependency of each generated token. With a granular analysis of Chain-of-Thought (CoT) processes, we uncover two key insights: first, token perception in a rollout trajectory is sparsely distributed, where only a small fraction of tokens have high visual dependency for visually-grounded reasoning; second, different trajectories exhibit significant divergence in their overall visual dependency. Based on these observations, we propose Visually-Perceptive Policy Optimization (VPPO), a novel policy gradient algorithm that explicitly leverages token perception to refine the learning signal. Specifically, VPPO achieves this through a dual mechanism: it reweights a trajectory's advantage by its overall visual dependency, and focuses policy updates exclusively on perceptually pivotal tokens. On a comprehensive suite of eight perception and reasoning benchmarks, VPPO demonstrates substantial gains over leading open-source RL-tuned models, with its effectiveness consistently validated across 7B and 32B model scales. Our findings not only establish a new token-level perceptual perspective for analyzing multimodal RLVR but also present a novel and effective optimization strategy to significantly enhance the multimodal reasoning capabilities of LVLMs.

  7. AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration

    Audiovisual video captioning aims to generate semantically rich descriptions with temporal alignment between visual and auditory events, thereby benefiting both video understanding and generation. In this paper, we present AVoCaDO, a powerful audiovisual video captioner driven by the temporal orchestration between audio and visual modalities. We propose a two-stage post-training pipeline: (1) AVoCaDO SFT, which fine-tunes the model on a newly curated dataset of 107K high-quality, temporally-aligned audiovisual captions; and (2) AVoCaDO GRPO, which leverages tailored reward functions to further enhance temporal coherence and dialogue accuracy while regularizing caption length and reducing collapse. Experimental results demonstrate that AVoCaDO significantly outperforms existing open-source models across four audiovisual video captioning benchmarks, and also achieves competitive performance on the VDC and DREAM-1K benchmarks under visual-only settings.

  8. DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training

    In this work, we propose DiT360, a DiT-based framework that performs hybrid training on perspective and panoramic data for panoramic image generation. For the issues of maintaining geometric fidelity and photorealism in generation quality, we attribute the main reason to the lack of large-scale, high-quality, real-world panoramic data, where such a data-centric view differs from prior methods that focus on model design. Basically, DiT360 has several key modules for inter-domain transformation and intra-domain augmentation, applied at both the pre-VAE image level and the post-VAE token level. At the image level, we incorporate cross-domain knowledge through perspective image guidance and panoramic refinement, which enhance perceptual quality while regularizing diversity and photorealism. At the token level, hybrid supervision is applied across multiple modules, which include circular padding for boundary continuity, yaw loss for rotational robustness, and cube loss for distortion awareness. Extensive experiments on text-to-panorama, inpainting, and outpainting tasks demonstrate that our method achieves better boundary consistency and image fidelity across eleven quantitative metrics. Our code is available at https://github.com/Insta360-Research-Team/DiT360.

  9. Demystifying Reinforcement Learning in Agentic Reasoning

    Recently, the emergence of agentic RL has showcased that RL could also effectively improve the agentic reasoning ability of LLMs, yet the key design principles and optimal practices remain unclear. In this work, we conduct a comprehensive and systematic investigation to demystify reinforcement learning in agentic reasoning from three key perspectives: data, algorithm, and reasoning mode. We highlight our key insights: (i) Replacing stitched synthetic trajectories with real end-to-end tool-use trajectories yields a far stronger SFT initialization; high-diversity, model-aware datasets sustain exploration and markedly improve RL performance. (ii) Exploration-friendly techniques such as clip-higher, overlong reward shaping, and maintaining adequate policy entropy are crucial for agentic RL and improve training efficiency. (iii) A deliberative strategy with fewer tool calls outperforms frequent tool calls or verbose self-reasoning, improving tool efficiency and final accuracy. Together, these simple practices consistently enhance agentic reasoning and training efficiency, achieving strong results on challenging benchmarks with smaller models, and establishing a practical baseline for future agentic RL research. Beyond these empirical insights, we further contribute a high-quality, real end-to-end agentic SFT dataset along with a high-quality RL dataset, and demonstrate the effectiveness of our insights in boosting the agentic reasoning ability of LLMs across four challenging benchmarks, including AIME2024/AIME2025, GPQA-Diamond, and LiveCodeBench-v6. With our recipes, 4B-sized models could also achieve superior agentic reasoning performance compared to 32B-sized models. Code and models: https://github.com/Gen-Verse/Open-AgentRL

  10. Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

    While LLM agents can plan multi-step tasks, intervening at the planning stage, before any action is executed, is often the safest way to prevent harm, since certain risks can lead to severe consequences once carried out. However, existing guardrails mostly operate post-execution, which is difficult to scale and leaves little room for controllable supervision at the plan level. To address this challenge, we highlight three critical gaps in current research: data gap, model gap, and evaluation gap. To close the data gap, we introduce AuraGen, a controllable engine that (i) synthesizes benign trajectories, (ii) injects category-labeled risks with calibrated difficulty, and (iii) filters outputs via an automated reward model, producing large and reliable corpora for pre-execution safety. To close the model gap, we propose a foundational guardrail, Safiron, combining a cross-planner adapter with a compact guardian model. The adapter unifies different input formats, while Safiron flags risky cases, assigns risk types, and generates rationales; trained in two stages with a broadly explored data recipe, Safiron achieves robust transfer across settings. To close the evaluation gap, we release Pre-Exec Bench, a realistic benchmark covering diverse tools and branching trajectories, which measures detection, fine-grained categorization, explanation, and cross-planner generalization in human-verified scenarios. Extensive experiments demonstrate consistent gains of the proposed guardrail over strong baselines on Pre-Exec Bench, and ablations further distill actionable practices, providing a practical template for safer agentic systems.

  11. InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models

    General SVG modeling remains challenging due to fragmented datasets, limited transferability of methods across tasks, and the difficulty of handling structural complexity. In response, we leverage the strong transfer and generalization capabilities of multimodal large language models (MLLMs) to achieve unified modeling for SVG understanding, editing, and generation. We present the InternSVG family, an integrated data-benchmark-model suite. At its core is SAgoge, the largest and most comprehensive multimodal dataset for SVG tasks, encompassing both static graphics and dynamic animations. It covers icons, long-sequence illustrations, scientific diagrams, and dynamic animations, supporting tasks of varied difficulty levels and providing deeper hierarchies with richer attributes compared to previous datasets. Based on this resource, we introduce SArena, a companion benchmark with comprehensive task definitions and standardized evaluation that aligns with the domains and difficulty spectrum covered by SAgoge. Building on these foundations, we propose InternSVG, a unified MLLM for SVG understanding, editing, and generation with SVG-specific special tokens, subword-based embedding initialization, and a two-stage training strategy that progresses from short static SVGs to long-sequence illustrations and complex animations. This unified formulation induces positive transfer and improves overall performance. Experiments on SArena and prior benchmarks confirm that InternSVG achieves substantial gains and consistently outperforms leading open and proprietary counterparts.

  12. BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions

    Efficiently solving real-world problems with LLMs increasingly hinges on their ability to interact with dynamic web environments and autonomously acquire external information. While recent research like Search-R1 and WebDancer demonstrates strong performance in solving web tasks, they heavily rely on additional tools to convert the interactive web environment into static text content. This is in contrast to human browsing behaviors, which involve diverse interactions with the browser, such as scrolling, clicking, and typing. In this paper, we propose BrowserAgent, a more interactive agent that solves complex tasks through human-inspired browser actions. BrowserAgent operates directly on raw web pages via Playwright through a set of predefined browser actions. We adopt a two-stage training scheme (Supervised Fine-Tuning (SFT) and Rejection Fine-Tuning (RFT)) to improve the model's generalization abilities. Despite using significantly less training data than Search-R1, BrowserAgent achieves more competitive results across different Open-QA tasks. Additionally, we introduce an explicit memory mechanism to store key conclusions across steps, further enhancing the model's reasoning capabilities for long-horizon tasks. Notably, BrowserAgent-7B can achieve around 20% improvement over Search-R1 on multi-hop QA tasks like HotpotQA, 2Wiki, and Bamboogle. These results indicate that BrowserAgent can serve as a more advanced framework for more interactive and scalable web agents.

  13. ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems

    In recent years, the research focus of large language models (LLMs) and agents has shifted increasingly from demonstrating novel capabilities to complex reasoning and tackling challenging tasks. However, existing evaluations focus mainly on math/code contests or general tasks, while existing multi-domain academic benchmarks lack sufficient reasoning depth, leaving the field without a rigorous benchmark for high-level reasoning. To fill this gap, we introduce the Acadreason benchmark, designed to evaluate the ability of LLMs and agents to acquire and reason over academic knowledge. It consists of 50 expert-annotated academic problems across five high-reasoning domains, including computer science, economics, law, mathematics, and philosophy. All questions are sourced from top-tier publications in recent years and undergo rigorous annotation and quality control to ensure they are both challenging and answerable. We conduct systematic evaluations of over 10 mainstream LLMs and agents. The results show that most LLMs scored below 20 points, with even the cutting-edge GPT-5 achieving only 16 points. While agents achieved higher scores, none exceeded 40 points. This demonstrates the current capability gap between LLMs and agents in super-intelligent academic research tasks and highlights the challenges of Acadreason.

  14. Don't Just Fine-tune the Agent, Tune the Environment

    Large Language Model (LLM) agents show great promise for complex, multi-turn tool-use tasks, but their development is often hampered by the extreme scarcity of high-quality training data. Supervised fine-tuning (SFT) on synthetic data leads to overfitting, whereas standard reinforcement learning (RL) struggles with a critical cold-start problem and training instability. To address these challenges, we introduce Environment Tuning, a novel training paradigm that enables agents to learn complex behaviors directly from problem instances without relying on pre-collected expert trajectories. Environment Tuning orchestrates this learning process through a structured curriculum, actionable environment augmentation that provides corrective feedback, and fine-grained progress rewards to ensure stable and efficient exploration. Using only 400 problem instances from the Berkeley Function-Calling Leaderboard (BFCL) benchmark, our method not only achieves competitive in-distribution performance against strong baselines but also demonstrates superior out-of-distribution generalization, overcoming the performance collapse common to SFT-based approaches. Our work presents a paradigm shift from supervised fine-tuning on static trajectories to dynamic, environment-based exploration, paving the way for training more robust and data-efficient agents.

  15. DocReward: A Document Reward Model for Structuring and Stylizing

    Recent advances in agentic workflows have enabled the automation of tasks such as professional document generation. However, they primarily focus on textual quality, neglecting visual structure and style, which are crucial for readability and engagement. This gap arises mainly from the absence of suitable reward models to guide agentic workflows toward producing documents with stronger structural and stylistic quality. To address this, we propose DocReward, a document reward model that evaluates documents based on their structure and style. We construct a multi-domain dataset DocPair of 117K paired documents, covering 32 domains and 267 document types, each including a high- and low-professionalism document with identical content but different structure and style. This enables the model to evaluate professionalism comprehensively, and in a textual-quality-agnostic way. DocReward is trained using the Bradley-Terry loss to score documents, penalizing predictions that contradict the annotated ranking. To assess the performance of reward models, we create a test dataset containing document bundles ranked by well-educated human evaluators. Notably, DocReward outperforms GPT-4o and GPT-5 in accuracy by 30.6 and 19.4 percentage points, respectively, demonstrating its superiority over baselines. In an extrinsic evaluation of document generation, DocReward achieves a significantly higher win rate of 60.8%, compared to GPT-5's 37.7% win rate, demonstrating its utility in guiding generation agents toward producing human-preferred documents.
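
The Bradley-Terry objective the paper trains with can be sketched in a few lines, assuming scalar professionalism scores for the two documents in a pair (the scores below are made up for illustration):

```python
import math

def bradley_terry_loss(score_better, score_worse):
    """-log(sigmoid(s_better - s_worse)): small when the reward model
    scores the more professional document above the less professional one."""
    margin = score_better - score_worse
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Equal scores: the model cannot separate the pair, so the loss is ln 2.
print(round(bradley_terry_loss(0.5, 0.5), 4))   # 0.6931

# A confident, correct ranking drives the loss toward zero.
print(round(bradley_terry_loss(3.0, 0.5), 4))   # 0.0789
```

Minimizing this loss over the 117K annotated pairs pushes the model to score the high-professionalism document above its low-professionalism twin, penalizing predictions that contradict the annotated ranking.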

Solidot (15)

  1. The x86 Ecosystem Advisory Group's first year of results

    AMD and Intel announced last October that they were forming an x86 Ecosystem Advisory Group, aimed at greater consistency across x86 architecture implementations. The x86 ecosystem is to some extent co-developed by AMD and Intel, but the two companies have kept their distance, producing inefficiencies and divergence in parts of the instruction set architecture. Advanced Vector Extensions (AVX) is a typical example: for years AVX-512 was available only on Intel platforms; AMD added initial AVX-512 support with Zen 4 in 2022, and only the Zen 5 parts released in 2024 fully support the 512-bit data path. The group's results over the past year include the ChkTag memory-tagging ISA, Flexible Return Event Delivery (FRED), AVX10 (the next generation of the AVX instruction set), and ACE, an Advanced Matrix Extensions (AMX) feature for matrix multiplication, among others.

  2. Manufacturing weakens amid America's AI gold rush

    America's AI industry is booming, but manufacturing has slipped into a deeper slump. US manufacturing employed about 19.5 million workers at its 1979 peak; that number has since shrunk to under 13 million, and in the year through August the sector shed roughly 78,000 more jobs. Census data also show the number of newly founded manufacturers declining. Bureau of Economic Analysis data show factory investment fell about 6% in the year through July, the first decline since early 2021. Trump's tariff policies have also cut into manufacturers' profits. Manufacturing's malaise stands in sharp contrast to the enormous growth in AI investment. Most of the hardware used by the AI industry is exempt from tariffs. In the first half of 2025, US data-center investment grew nearly 37% year over year, while factory construction fell about 3% over the same period. US investment in computing equipment grew more than 45% year over year, while spending on traditional industrial equipment was essentially flat.

  3. Japan's summers have grown three weeks longer over the past 42 years

    A research team at Mie University found that over the 42 years from 1982 to 2023, the length of Japan's summer increased by about three weeks. Winter length was essentially unchanged, while spring and autumn kept shrinking. The team warned: "Rising sea-surface temperatures driven by global warming are the main cause. If the warming trend continues, the two-season pattern of long summers and long winters will intensify further." Over the 42 years, the start of summer moved earlier by about 12.6 days and its end later by about 8.8 days, for a total gain of about 21.4 days. In 2023, for example, Japan's summer ran from June 11 to October 9, a total of 121 days.
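
The reported numbers (June 11 to October 9 totaling 121 days, and a 12.6-day earlier start plus an 8.8-day later end totaling 21.4 days) are internally consistent, as a quick check shows:

```python
from datetime import date

# Summer 2023 as reported: June 11 through October 9, inclusive.
summer_days = (date(2023, 10, 9) - date(2023, 6, 11)).days + 1
print(summer_days)           # 121

# An ~12.6-day earlier start plus an ~8.8-day later end should equal
# the reported ~21.4-day total gain.
print(round(12.6 + 8.8, 1))  # 21.4
```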

  4. The Pope urges vigilance toward those who control the algorithms

    Pope Leo XIV last week received participants in the 39th congress of the international association of news agencies, urging the cultivation of "conscience" and "critical thinking" in an era awash in "junk" information and digital media. "We are not condemned to live in a world where truth and fiction can no longer be told apart," he said, quoting Hannah Arendt's famous line: "The ideal subject of totalitarian rule is not the convinced Nazi or the convinced Communist, but people for whom the distinction between fact and fiction no longer exists." The Pope said: "Algorithms generate content and data at unprecedented scale and speed, but who controls them? Artificial intelligence is changing how we access information and communicate, but who is directing it, and to what ends?" He warned: "We must remain vigilant to ensure that technology does not replace human beings, and that the management of information and algorithms does not end up in the hands of a few."

  5. Most open-weight models come from China

    Although large models from US companies such as OpenAI, Anthropic, and Google lead the benchmarks, they are essentially proprietary, with unreleased weights. According to statistics from Hugging Face and LMArena, the Chinese companies DeepSeek and Alibaba have released the most-downloaded open-weight models. Meta once championed open models, and Mark Zuckerberg said last year that the world would benefit if AI companies shared their models. Since then, however, Meta has slowed the release of its latest models, and Zuckerberg now says the best models will be kept in-house.

  6. Microsoft ends support for Windows 10

    On October 14, Microsoft ended mainstream support for Windows 10, which now enters an extended-support phase. According to Statcounter, as of September 2025 more than 40% of users were still running Windows 10, meaning hundreds of millions of users face security risks as a result of Microsoft's decision. Under outside pressure, Microsoft has offered alternative ways for users who cannot upgrade to Windows 11 (Microsoft raised the hardware requirements, leaving large numbers of Windows 10 PCs unable to upgrade) to keep receiving security updates. Users in the European Economic Area can register for free Extended Security Updates; users outside Europe can get free updates by signing in to their PC with a Microsoft account and backing up their settings; everyone else must pay a $30 fee.

  7. Nobel economics prize awarded to three economists who study innovation's impact on the economy

    The Royal Swedish Academy of Sciences announced the 2025 Nobel laureates in economics: American economist Joel Mokyr, for having identified, through historical observation, the prerequisites for sustained growth through technological progress; and American economist Peter Howitt and French economist Philippe Aghion, for their theory of sustained growth through creative destruction. The Academy said the laureates' work explains how technology gives rise to new products and production methods that replace the old, raising living standards, quality of life, and health around the world. Over the past two centuries, the world has for the first time experienced sustained economic growth, lifting vast numbers of people out of poverty and laying the foundation for prosperity. For most of human history, stagnation rather than growth was the norm. Their work shows that we must recognize and counter the threats to continued growth.

  8. SmartNav brings urban GPS accuracy down to 10 centimeters

    GPS accuracy suffers in dense urban areas because signals reflect off the surfaces of tall buildings and take longer to reach the receiver, skewing the distance calculation and degrading position accuracy. Researchers at the Norwegian University of Science and Technology developed SmartNav, which combines satellite corrections, wave analysis, and Google's 3D building data; in tests it achieved accuracy within 10 centimeters 90% of the time, offering an inexpensive approach to urban navigation. Google has 3D building-model data for roughly 4,000 cities worldwide, which the system uses to compute how satellite signals bounce between buildings.
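
The underlying multipath error can be sketched with toy 2D geometry; the positions and wall location below are invented for illustration, and this is not the SmartNav algorithm itself:

```python
import math

# Toy 2D multipath geometry: a receiver near a facade hears both the
# direct signal and a copy reflected off the wall. Mirroring the receiver
# across the wall gives the reflected path length as a straight line.
tx = (500.0, 300.0)     # simplified transmitter position (m)
rx = (0.0, 1.5)         # receiver on the street (m)
wall_x = -15.0          # vertical facade 15 m behind the receiver

rx_mirror = (2 * wall_x - rx[0], rx[1])   # receiver mirrored to (-30.0, 1.5)
direct = math.dist(tx, rx)
reflected = math.dist(tx, rx_mirror)

# The reflected copy travels farther; a receiver that locks onto it
# overestimates the range by this many meters.
print(round(reflected - direct, 2))
```

Even a few tens of meters of extra path translates directly into pseudorange error, which is why modeling reflections against 3D building data recovers so much accuracy.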

  9. Older fathers pass on more disease-causing mutations to their children

    A new study published in Nature shows that older fathers pass disease-causing mutations to their children at a higher rate than previously thought. Genome sequencing shows that in men in their early 30s, roughly 1 in 50 sperm carries a disease-causing mutation; by age 70, the figure rises to nearly 1 in 20. The researchers suggest that young men who expect to have children later in life could consider freezing their sperm, while older men planning families can consider the various screening technologies already available. Recent research shows that most cells in each of us carry about 70 new mutations absent in our parents, 80% of which originate in the father's testes; that does not count the larger-scale chromosomal abnormalities that are more common in the mother's eggs.

  10. Ferrari announces its first electric sports car

    Ferrari announced that its first electric sports car, the Elettrica, will launch next summer. The Elettrica has a top speed of 310 km/h, accelerates from 0 to 100 km/h in just 2.5 seconds, offers 530 km of range and up to 350 kW DC fast charging, and carries a 122 kWh battery with an energy density of 195 Wh/kg, which Ferrari says is the highest of any production EV. Electric cars, whose motors are often considered too quiet, commonly simulate the roar of a combustion engine; the Elettrica takes a different approach: sensors mounted on the inverter pick up the powertrain's real mechanical vibrations and amplify them, producing a continuously varying natural tone that reflects how the car is being driven. Ferrari says the sound gives the driver feedback, and drivers who prefer a quiet ride can switch it off.

  11. Firefox improves profile management

    Firefox has for years supported creating multiple profiles to store personal data, making it possible to separate work from personal browsing, test different settings, or share a computer with others. But Firefox never made profiles easy to discover or manage. That is about to change: Mozilla announced a profile-management feature that lets users create and switch between profiles more easily. The feature begins rolling out to users on October 14.

  12. Superbugs are widespread in newborns' blood

    According to a study published in The Lancet Regional Health – Western Pacific, researchers analyzed nearly 15,000 blood samples from sick infants collected between 2019 and 2020 at ten hospitals in Sri Lanka, Indonesia, Malaysia, Vietnam, and the Philippines, and found that the war against drug-resistant bacteria is not going well: antibiotic-resistant bacteria, so-called superbugs, are widespread in newborns' blood. Nearly 80% of infected newborns carried Gram-negative bacteria such as E. coli, Klebsiella, and Acinetobacter. Because of their cell-membrane structure, Gram-negative bacteria develop antibiotic resistance more readily than Gram-positive ones. The researchers say newborns become infected with resistant bacteria within days of birth. The study also found that fungal infections caused nearly one in ten of the infants' severe infections.

  13. Rogue object may be a repeatedly erupting substellar body

    A study has found that a free-floating planet is devouring a staggering amount of material: as much as 6 billion tonnes of gas and dust per second. The discovery blurs the line between planets and stars, hinting that star and planet formation are more alike than previously thought. Rogue planets are free-floating gaseous worlds that orbit no parent star; they are extremely common and may even outnumber the stars in the Milky Way. But how they form has long puzzled astronomers: do they first orbit a star like other planets and then get ejected to wander the galaxy alone, or can they form on their own the way stars do? Astronomers recently found that a rogue object named Cha 1107-7626 is growing at an astonishing, outburst-like rate. The object first drew attention back in 2008 because of what looked like a protoplanetary disk of material around it. In June, Cha 1107-7626 suddenly began consuming material at nearly 10 times its previous rate and kept it up for two months, a growth rate previously seen only in stars. The team believes a mechanism like the one found in stars must be at work: strong magnetic fields funneling material from a distant volume of gas and dust through narrow channels. But it remains unclear how or why the object suddenly began consuming so much mass.

  14. Venus's atmosphere holds more water than expected

    Venus was long thought to be a very dry planet with a sulfuric-acid-rich atmosphere. American scientists who reanalyzed data left over from the Pioneer Venus program found that Venus's atmosphere not only contains less sulfuric acid than previously believed but also more water and iron oxide than expected. The Pioneer Venus 2 mission carried four atmospheric probes, one large and three small, which continuously collected data on the composition of Venus's atmosphere as they fell. During descent the probes also captured atmospheric aerosols, which broke down inside the instruments, allowing their composition to be recorded. Those data sat buried in NASA's archives until the research team recently found them on a set of microfilm. The reanalysis found evidence of water, sulfur dioxide, molecular oxygen, and iron oxide in Venus's aerosol particles. The water content is higher than previously expected, roughly three times past estimates: water makes up about 60% of the aerosols' mass.

  15. In 2024, 3% of newborns in Japan were foreign nationals

    Some 20,000 foreign nationals were born in Japan in 2024, accounting for more than 3% of all newborns; both figures are believed to be firsts at such levels. With foreign residents, concentrated in the working-age bracket, having grown to about 3% of Japan's total population, the country has entered a new phase in which, at the birth stage too, foreigners partly offset Japan's low number of births. For Japan, foreigner policy that includes integration measures, rather than focusing solely on tighter controls, is becoming increasingly important. Final figures for 2024 put the number of births to foreign nationals in Japan at 22,878, up more than 3,000 from the previous year. By the mother's nationality, China led with 4,237 newborns, followed by the Philippines (1,807) and Brazil (1,351). Several countries with large populations in Japan, such as Nepal and Vietnam, were grouped into an "other countries" category in the survey, which had the most newborn mothers of all at 14,425.