WEEK · 2025-W45

Weekly Digest — 2025-W45

134 unique stories (2025-11-03 to 2025-11-09), aggregated across 8 sources.

Hacker News (42)

  1. </> Htmx – The Fetch()ening (htmx.org)
  2. Israel's top military lawyer arrested after she admitted leaking video of abuse (www.theguardian.com)
  3. Why we migrated from Python to Node.js (blog.yakkomajuri.com)
  4. Learning to read Arthur Whitney's C to become smart (2024) (needleful.net)
  5. Ask HN: Who is hiring? (November 2025)
  6. OpenAI signs $38B cloud computing deal with Amazon (www.nytimes.com)
  7. NoLongerEvil-Thermostat – Nest Generation 1 and 2 Firmware (github.com)
  8. Codemaps: Understand Code, Before You Vibe It (cognition.ai)
  9. We're open-sourcing the successor of Jupyter notebook (deepnote.com)
  10. Michael Burry, a.k.a. "Big Short", discloses $1.1B bet against Nvidia & Palantir (sherwood.news)
  11. Pg_lake: Postgres with Iceberg and data lake access (github.com)
  12. The 512KB Club (512kb.club)

GitHub Trending (23)

  1. 666ghj / BettaFish

    微舆 (BettaFish): a multi-agent public-opinion analysis assistant anyone can use. It breaks information cocoons, reconstructs the full picture of public opinion, predicts future trends, and supports decision-making. Built from scratch, with no framework dependencies.

  2. GeeeekExplorer / nano-vllm

    Nano vLLM

  3. HKUDS / DeepCode

    DeepCode: Open Agentic Coding (Paper2Code & Text2Web & Text2Backend)

  4. charmbracelet / glow

    Render markdown on the CLI, with pizzazz! 💅🏻

  5. sst / opencode

    The AI coding agent built for the terminal.

  6. get-convex / chef

    The only AI app builder that knows backend

  7. sst / opentui

    OpenTUI is a library for building terminal user interfaces (TUIs)

  8. mudler / LocalAI

    🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P and decentralized inference

  9. 1Panel-dev / MaxKB

    🔥 MaxKB is a powerful, easy-to-use open-source platform for building enterprise-grade agents.

  10. imthenachoman / How-To-Secure-A-Linux-Server

    An evolving how-to guide for securing a Linux server.

  11. Skyvern-AI / skyvern

    Automate browser based workflows with AI

  12. nocobase / nocobase

    NocoBase is the most extensible AI-powered no-code/low-code platform for building business applications and enterprise solutions.

Hugging Face (31)

  1. OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

    Computer-using agents powered by Vision-Language Models (VLMs) have demonstrated human-like capabilities in operating digital environments like mobile platforms. While these agents hold great promise for advancing digital automation, their potential for unsafe operations, such as system compromise and privacy leakage, is raising significant concerns. Detecting these safety concerns across the vast and complex operational space of mobile environments presents a formidable challenge that remains critically underexplored. To establish a foundation for mobile agent safety research, we introduce MobileRisk-Live, a dynamic sandbox environment accompanied by a safety detection benchmark comprising realistic trajectories with fine-grained annotations. Built upon this, we propose OS-Sentinel, a novel hybrid safety detection framework that synergistically combines a Formal Verifier for detecting explicit system-level violations with a VLM-based Contextual Judge for assessing contextual risks and agent actions. Experiments show that OS-Sentinel achieves 10%-30% improvements over existing approaches across multiple metrics. Further analysis provides critical insights that foster the development of safer and more reliable autonomous mobile agents.
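
    The hybrid design pairs hard rules with a learned judge. Below is a minimal sketch of just the Formal Verifier half, with a made-up rule set and a toy action-trace format; the paper's actual interfaces are not described in this summary, so everything here is illustrative.

```python
# Toy stand-in for a Formal Verifier: explicit system-level violations are
# caught by hard rules over the agent's action trace, while contextual risks
# would be routed to a separate VLM-based judge (not modeled here).
FORBIDDEN_PATTERNS = (          # hypothetical rule set, not from the paper
    "pm uninstall",             # removing apps
    "settings put secure",      # tampering with secure settings
    "send_sms",                 # potential privacy leakage
)

def formal_verify(trajectory):
    """Return (step index, action) pairs that violate a hard rule."""
    return [(i, a) for i, a in enumerate(trajectory)
            if any(p in a for p in FORBIDDEN_PATTERNS)]

trace = ["tap('Settings')", "shell('pm uninstall com.bank.app')", "tap('OK')"]
print(formal_verify(trace))
```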

  2. ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

    Multimodal reasoning requires iterative coordination between language and vision, yet it remains unclear what constitutes a meaningful interleaved chain of thought. We posit that text and image thoughts should function as complementary, rather than isomorphic, modalities that mutually advance reasoning. Guided by this principle, we build ThinkMorph, a unified model fine-tuned on 24K high-quality interleaved reasoning traces spanning tasks with varying visual engagement. ThinkMorph learns to generate progressive text-image reasoning steps that concretely manipulate visual content while maintaining coherent verbal logic. It delivers large gains on vision-centric benchmarks (averaging 34.7% over the base model) and generalizes to out-of-domain tasks, matching or surpassing larger and proprietary VLMs. Beyond performance, ThinkMorph exhibits emergent multimodal intelligence, including unseen visual manipulation skills, adaptive switching between reasoning modes, and better test-time scaling through diversified multimodal thoughts. These findings suggest promising directions for characterizing the emergent capabilities of unified models for multimodal reasoning.

  3. INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

    Modern AI hardware, such as Nvidia's Blackwell architecture, is increasingly embracing low-precision floating-point (FP) formats to handle the pervasive activation outliers in Large Language Models (LLMs). Despite this industry trend, a unified comparison of FP and integer (INT) quantization across varying granularities has been missing, leaving algorithm and hardware co-design without clear guidance. This paper fills that gap by systematically investigating the trade-offs between FP and INT formats. We reveal a critical performance crossover: while FP excels in coarse-grained quantization, the comparison at fine-grained (block-wise) levels is more nuanced. Our comprehensive comparison demonstrates that for popular 8-bit fine-grained formats (e.g., MX with block size 32), MXINT8 is superior to its FP counterpart in both algorithmic accuracy and hardware efficiency. However, for 4-bit formats, FP (e.g., MXFP4, NVFP4) often holds an accuracy advantage, though we show that NVINT4 can surpass NVFP4 when outlier-mitigation techniques like Hadamard rotation are applied. We also introduce a symmetric clipping method that resolves gradient bias in fine-grained low-bit INT training, enabling nearly lossless performance for MXINT8 training. These findings challenge the current hardware trajectory, demonstrating that a one-size-fits-all FP approach is suboptimal and advocating that fine-grained INT formats, particularly MXINT8, offer a better balance of accuracy, power, and efficiency for future AI accelerators.
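
    The block-wise INT idea can be illustrated with a minimal numpy sketch of symmetric per-block INT8 quantization at block size 32. Note the caveat: this uses a float scale per block, whereas real MX formats share a power-of-two exponent, so treat it as an illustration of block-wise quantization, not the MXINT8 spec.

```python
import numpy as np

def quantize_blockwise_int8(x, block_size=32):
    """Symmetric per-block INT8 quantization: each block of `block_size`
    values shares one scale derived from the block's max magnitude."""
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    # One scale per block; guard against all-zero blocks.
    scales = np.maximum(np.abs(blocks).max(axis=1, keepdims=True), 1e-12) / 127.0
    q = np.clip(np.round(blocks / scales), -127, 127).astype(np.int8)
    return q, scales, pad

def dequantize_blockwise_int8(q, scales, pad):
    x = (q.astype(np.float32) * scales).reshape(-1)
    return x[:len(x) - pad] if pad else x

rng = np.random.default_rng(0)
x = rng.normal(size=256).astype(np.float32)
q, s, pad = quantize_blockwise_int8(x)
xr = dequantize_blockwise_int8(q, s, pad)
print("max abs reconstruction error:", np.abs(x - xr).max())
```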

  4. π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models

    Vision-Language-Action (VLA) models enable robots to understand and perform complex tasks from multimodal input. Although recent work explores using reinforcement learning (RL) to automate the laborious data collection process in scaling supervised fine-tuning (SFT), applying large-scale RL to flow-based VLAs (e.g., π_0, π_0.5) remains challenging due to intractable action log-likelihoods from iterative denoising. We address this challenge with π_RL, an open-source framework for training flow-based VLAs in parallel simulation. π_RL implements two RL algorithms: (1) Flow-Noise models the denoising process as a discrete-time MDP with a learnable noise network for exact log-likelihood computation. (2) Flow-SDE integrates denoising with agent-environment interaction, formulating a two-layer MDP that employs ODE-to-SDE conversion for efficient RL exploration. We evaluate π_RL on LIBERO and ManiSkill benchmarks. On LIBERO, π_RL boosts few-shot SFT models π_0 and π_0.5 from 57.6% to 97.6% and from 77.1% to 98.3%, respectively. In ManiSkill, we train π_RL in 320 parallel environments, improving π_0 from 41.6% to 85.7% and π_0.5 from 40.0% to 84.8% across 4352 pick-and-place tasks, demonstrating scalable multitask RL under heterogeneous simulation. Overall, π_RL achieves significant performance gains and stronger generalization over SFT models, validating the effectiveness of online RL for flow-based VLAs.

  5. Continuous Autoregressive Language Models

    The efficiency of large language models (LLMs) is fundamentally limited by their sequential, token-by-token generation process. We argue that overcoming this bottleneck requires a new design axis for LLM scaling: increasing the semantic bandwidth of each generative step. To this end, we introduce Continuous Autoregressive Language Models (CALM), a paradigm shift from discrete next-token prediction to continuous next-vector prediction. CALM uses a high-fidelity autoencoder to compress a chunk of K tokens into a single continuous vector, from which the original tokens can be reconstructed with over 99.9% accuracy. This allows us to model language as a sequence of continuous vectors instead of discrete tokens, which reduces the number of generative steps by a factor of K. The paradigm shift necessitates a new modeling toolkit; therefore, we develop a comprehensive likelihood-free framework that enables robust training, evaluation, and controllable sampling in the continuous domain. Experiments show that CALM significantly improves the performance-compute trade-off, achieving the performance of strong discrete baselines at a significantly lower computational cost. More importantly, these findings establish next-vector prediction as a powerful and scalable pathway towards ultra-efficient language models. Code: https://github.com/shaochenze/calm. Project: https://shaochenze.github.io/blog/2025/CALM.
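
    The factor-K step reduction can be shown with a toy sketch in which a lossless reshape stands in for the paper's learned autoencoder: each chunk of K token embeddings becomes one "continuous vector", so an N-token sequence needs N/K generative steps.

```python
import numpy as np

K = 4  # tokens packed per continuous vector

def pack(embeddings):
    """Pack an (N, d) sequence of token embeddings into (N/K, K*d)
    'continuous vectors'. A lossless reshape stands in for CALM's
    learned autoencoder; N must be a multiple of K in this toy."""
    n, d = embeddings.shape
    assert n % K == 0
    return embeddings.reshape(n // K, K * d)

def unpack(vectors, d):
    # Inverse of pack(): recover the original (N, d) token embeddings.
    return vectors.reshape(-1, d)

tokens = np.arange(32, dtype=np.float32).reshape(8, 4)   # N=8 tokens, d=4
vectors = pack(tokens)
print("generative steps:", len(tokens), "->", len(vectors))  # 8 -> 2
assert np.array_equal(unpack(vectors, 4), tokens)        # lossless round trip
```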

  6. Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning

    Spatial understanding remains a weakness of Large Vision-Language Models (LVLMs). Existing supervised fine-tuning (SFT) and recent reinforcement learning with verifiable rewards (RLVR) pipelines depend on costly supervision, specialized tools, or constrained environments that limit scale. We introduce Spatial-SSRL, a self-supervised RL paradigm that derives verifiable signals directly from ordinary RGB or RGB-D images. Spatial-SSRL automatically formulates five pretext tasks that capture 2D and 3D spatial structure: shuffled patch reordering, flipped patch recognition, cropped patch inpainting, regional depth ordering, and relative 3D position prediction. These tasks provide ground-truth answers that are easy to verify and require no human or LVLM annotation. Training on our tasks substantially improves spatial reasoning while preserving general visual capabilities. On seven spatial understanding benchmarks in both image and video settings, Spatial-SSRL delivers average accuracy gains of 4.63% (3B) and 3.89% (7B) over the Qwen2.5-VL baselines. Our results show that simple, intrinsic supervision enables RLVR at scale and provides a practical route to stronger spatial intelligence in LVLMs.
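
    The first pretext task, shuffled patch reordering, is easy to reproduce: the verifiable ground-truth answer is simply the permutation used to shuffle. A minimal numpy sketch (the grid size and exact-match reward are my assumptions, not the paper's settings):

```python
import numpy as np

def shuffled_patch_task(img, grid=2, rng=None):
    """Build one self-supervised pretext sample: cut `img` into a
    grid x grid set of patches, shuffle them, and return the shuffled
    image plus the permutation, which serves as the verifiable answer."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    ph, pw = h // grid, w // grid
    patches = [img[r*ph:(r+1)*ph, c*pw:(c+1)*pw]
               for r in range(grid) for c in range(grid)]
    perm = rng.permutation(len(patches))
    rows = [np.concatenate([patches[perm[r*grid + c]] for c in range(grid)], axis=1)
            for r in range(grid)]
    return np.concatenate(rows, axis=0), perm

def verify_answer(predicted_perm, true_perm):
    # Verifiable reward: 1 only when the exact permutation is recovered.
    return int(np.array_equal(predicted_perm, true_perm))

img = np.arange(16).reshape(4, 4)
shuffled, perm = shuffled_patch_task(img, rng=np.random.default_rng(1))
print(perm, verify_answer(perm, perm))
```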

  7. Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation

    We introduce Ling 2.0, a series of reasoning-oriented language foundation models built upon the principle that every activation boosts reasoning capability. Designed to scale from tens of billions to one trillion parameters under a unified Mixture-of-Experts (MoE) paradigm, Ling 2.0 emphasizes high sparsity, cross-scale consistency, and efficiency guided by empirical scaling laws. The series includes three non-thinking (instruct) models - Ling-mini-2.0, Ling-flash-2.0, and Ling-1T - ranging from 16B to 1T total parameters and achieving up to 7-fold active-compute efficiency compared with dense counterparts. Ling 2.0 integrates coordinated innovations across model architecture, pre-training, post-training, and infrastructure: a high-sparsity MoE with MTP for efficient reasoning, reasoning-oriented data and mid-training CoT activation, reinforcement-based fine-tuning (DFT, Evo-CoT), and full-scale FP8 training with fine-grained heterogeneous pipelines. At the trillion scale, Ling-1T establishes a new Pareto frontier of reasoning accuracy versus computational efficiency, demonstrating that sparse activation, when properly aligned with reasoning objectives, enables scalable and efficient intelligence. Collectively, Ling 2.0 provides a coherent, open, and efficient foundation for advancing future reasoning and thinking models, including the Ring series built upon the same base.

  8. The Underappreciated Power of Vision Models for Graph Structural Understanding

    Graph Neural Networks operate through bottom-up message-passing, fundamentally differing from human visual perception, which intuitively captures global structures first. We investigate the underappreciated potential of vision models for graph understanding, finding they achieve performance comparable to GNNs on established benchmarks while exhibiting distinctly different learning patterns. These divergent behaviors, combined with limitations of existing benchmarks that conflate domain features with topological understanding, motivate our introduction of GraphAbstract. This benchmark evaluates models' ability to perceive global graph properties as humans do: recognizing organizational archetypes, detecting symmetry, sensing connectivity strength, and identifying critical elements. Our results reveal that vision models significantly outperform GNNs on tasks requiring holistic structural understanding and maintain generalizability across varying graph scales, while GNNs struggle with global pattern abstraction and degrade with increasing graph size. This work demonstrates that vision models possess remarkable yet underutilized capabilities for graph structural understanding, particularly for problems requiring global topological awareness and scale-invariant reasoning. These findings open new avenues to leverage this underappreciated potential for developing more effective graph foundation models for tasks dominated by holistic pattern recognition.

  9. Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph

    Test-Time Scaling (TTS) improves large language models (LLMs) by allocating additional computation during inference, typically through parallel, sequential, or hybrid scaling. However, prior studies often assume fixed collaboration architectures (e.g., topologies) and single-model usage, overlooking that optimal architectures and model combinations can vary across tasks. Therefore, we study the novel problem of searching for compute-optimal model combinations and architectures in TTS under a fixed budget. We formalize it as a multi-LLM collaboration graph, where nodes encode roles and LLM model assignments, and edges capture information flow. This problem is challenging because (i) the combinatorial search space is prohibitively large, and (ii) task-specific requirements demand tailored designs. To address these, we reformulate the problem as probabilistic graph optimization and, through pilot experiments, derive three empirical insights into TTS collaboration graphs. Guided by these insights, we propose Agent-REINFORCE, an LLM-agent-augmented framework that mirrors the REINFORCE pipeline by mapping sampling-gradient-update to sampling-feedback-update, where feedback serves as a textual gradient to update the probabilistic graph and efficiently search for optimal multi-LLM collaboration graphs. Experiments show that Agent-REINFORCE outperforms both traditional and LLM-based baselines in sample efficiency and search performance, and effectively identifies optimal graphs under joint objectives of accuracy and inference latency.
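
    The underlying REINFORCE-on-a-graph idea can be sketched with Bernoulli edge probabilities and a scalar reward. This toy uses the standard numeric score-function gradient; Agent-REINFORCE's actual contribution, replacing that gradient with textual feedback from an LLM agent, is not modeled here, and the edges and reward are invented for illustration.

```python
import random

EDGES = [("planner", "solver"), ("solver", "critic"), ("planner", "critic")]

def sample_graph(probs, rng):
    # Sample each candidate edge independently from its Bernoulli parameter.
    return {e: rng.random() < probs[e] for e in EDGES}

def reinforce_step(probs, graph, reward, lr=0.05):
    """One REINFORCE update on Bernoulli edge parameters:
    d/dp log P(x; p) = (x - p) / (p * (1 - p)), scaled by the reward.
    Probabilities are clipped away from 0 and 1 to keep sampling alive."""
    out = {}
    for e, p in probs.items():
        x = 1.0 if graph[e] else 0.0
        grad = (x - p) / (p * (1.0 - p))
        out[e] = min(0.95, max(0.05, p + lr * reward * grad))
    return out

rng = random.Random(0)
probs = {e: 0.5 for e in EDGES}
for _ in range(200):
    g = sample_graph(probs, rng)
    # Toy reward: the planner->solver edge is essential to the task.
    reward = 1.0 if g[("planner", "solver")] else 0.0
    probs = reinforce_step(probs, g, reward)
print(probs[("planner", "solver")])  # driven up toward the 0.95 clip
```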

  10. UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

    Relighting is a crucial task with both practical demand and artistic value, and recent diffusion models have shown strong potential by enabling rich and controllable lighting effects. However, as they are typically optimized in semantic latent space, where proximity does not guarantee physical correctness in visual space, they often produce unrealistic results, such as overexposed highlights, misaligned shadows, and incorrect occlusions. We address this with UniLumos, a unified relighting framework for both images and videos that brings RGB-space geometry feedback into a flow matching backbone. By supervising the model with depth and normal maps extracted from its outputs, we explicitly align lighting effects with the scene structure, enhancing physical plausibility. Nevertheless, this feedback requires high-quality outputs for supervision in visual space, making standard multi-step denoising computationally expensive. To mitigate this, we employ path consistency learning, allowing supervision to remain effective even under few-step training regimes. To enable fine-grained relighting control and supervision, we design a structured six-dimensional annotation protocol capturing core illumination attributes. Building upon this, we propose LumosBench, a disentangled attribute-level benchmark that evaluates lighting controllability via large vision-language models, enabling automatic and interpretable assessment of relighting precision across individual dimensions. Extensive experiments demonstrate that UniLumos achieves state-of-the-art relighting quality with significantly improved physical consistency, while delivering a 20x speedup for both image and video relighting. Code is available at https://github.com/alibaba-damo-academy/Lumos-Custom.

  11. ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation

    Unified multimodal models (UMMs) have emerged as a powerful paradigm for seamlessly unifying text and image understanding and generation. However, prevailing evaluations treat these abilities in isolation, such that tasks with multimodal inputs and outputs are scored primarily through unimodal reasoning, i.e., textual benchmarks emphasize language-based reasoning, while visual benchmarks emphasize reasoning outcomes manifested in the pixels. We introduce ROVER to address this pressing need to test reciprocal cross-modal reasoning, the use of one modality to guide, verify, or refine outputs in the other, an ability central to the vision of unified multimodal intelligence. ROVER is a human-annotated benchmark that explicitly targets reciprocal cross-modal reasoning, which contains 1312 tasks grounded in 1876 images, spanning two complementary settings. Verbally-augmented reasoning for visual generation evaluates whether models can use verbal prompts and reasoning chains to guide faithful image synthesis. Visually-augmented reasoning for verbal generation evaluates whether models can generate intermediate visualizations that strengthen their own reasoning processes for question answering. Experiments on 17 unified models reveal two key findings: (i) Cross-modal reasoning determines visual generation quality, with interleaved models significantly outperforming non-interleaved ones; notably, combining strong unimodal models fails to achieve comparable reasoning. (ii) Models show dissociation between physical and symbolic reasoning: they succeed at interpreting perceptual concepts literally but fail to construct visual abstractions for symbolic tasks, where faulty reasoning harms performance. These results highlight reciprocal cross-modal reasoning as a critical frontier for enabling true omnimodal generation.

  12. PHUMA: Physically-Grounded Humanoid Locomotion Dataset

    Motion imitation is a promising approach for humanoid locomotion, enabling agents to acquire humanlike behaviors. Existing methods typically rely on high-quality motion capture datasets such as AMASS, but these are scarce and expensive, limiting scalability and diversity. Recent studies attempt to scale data collection by converting large-scale internet videos, exemplified by Humanoid-X. However, they often introduce physical artifacts such as floating, penetration, and foot skating, which hinder stable imitation. In response, we introduce PHUMA, a Physically-grounded HUMAnoid locomotion dataset that leverages human video at scale, while addressing physical artifacts through careful data curation and physics-constrained retargeting. PHUMA enforces joint limits, ensures ground contact, and eliminates foot skating, producing motions that are both large-scale and physically reliable. We evaluated PHUMA in two sets of conditions: (i) imitation of unseen motion from self-recorded test videos and (ii) path following with pelvis-only guidance. In both cases, PHUMA-trained policies outperform Humanoid-X and AMASS, achieving significant gains in imitating diverse motions. The code is available at https://davian-robotics.github.io/PHUMA.

Solidot (38)

  1. Attention lapses may be the brain taking out the trash

    If you struggle to focus the day after a bad night's sleep, it may be because your brain is trying to refresh itself, causing brief lapses of attention. During sleep, the brain runs a flushing cycle: cerebrospinal fluid (CSF) is repeatedly washed into the brain and drained out from its base, clearing the metabolic waste accumulated during the day that would otherwise damage brain cells. Scientists at MIT wondered whether the attention lapses that typically accompany sleep deprivation might be the waking brain's attempt to make up for this missed self-flushing. To find out, they ran the study in two phases. In the first, 26 participants aged 19 to 40 got a good night's sleep and were well rested; in the second, two weeks later, the same participants stayed awake all night in the lab. The results showed that sleep deprivation made it harder to concentrate. When the researchers analyzed the brain scans, they found that participants lost attention about two seconds before CSF drained out from the base of the brain, and that CSF was flushed back into the brain about one second after attention returned. The findings suggest that when the brain cannot clean itself during sleep, it does so while you are awake, at the cost of attention.

  2. OpenAI may be too big to fail

    OpenAI is not yet profitable, and its annual revenue is only about 2% of Amazon's. Its corporate restructuring is largely complete, and it may eventually go public, potentially becoming the first company with a $1 trillion IPO. It has struck complex deals with tech-industry heavyweights such as Nvidia and Oracle, committing to invest in and purchase up to a trillion dollars of computing power. Through this series of enormous deals, OpenAI appears to have reached "too big to fail" status: if it actually collapsed, it could pose a systemic risk to the entire economy. To some, OpenAI is Apple, Facebook, Google, and Tesla rolled into one, a company of unlimited potential that could disrupt the smartphone market, build its own social network, replace search engines, usher in the robotics era, and reshape every business and industry. To others, OpenAI looks like the Dutch Tulip Mania, a harbinger of a great depression, the next dot-com bubble; they see it as a mad scientist trying to create Frankenstein's monster and a job killer driving up unemployment.

  3. Social media platforms agree to comply with Australia's teen ban

    The world's major social media platforms have agreed to comply with Australia's ban on social media for children under 16. Meta, Snap, and TikTok confirmed to the Australian Parliament that they will begin deleting and deactivating more than a million underage accounts once the law takes effect on December 10. Companies that fail to block underage users face fines of up to US$32.5 million. Teenagers can choose to download their data before their accounts are deactivated, and some platforms will also retain the data until they turn 17. Age verification is not expected to be perfect at first: underage users may go unidentified, while adult users may be misidentified as minors.

  4. South Korea requires solar canopies over parking lots

    Starting this month, all parking lots in South Korea with more than 80 spaces must install solar canopies and carports. The new law applies not only to new parking lots; existing ones must also comply. In August, South Korea's Ministry of Trade, Industry and Energy announced it was preparing to amend the enforcement rules of the Act on the Promotion of the Development, Use and Diffusion of New and Renewable Energy, requiring all public and private parking lots with more than 80 spaces to add solar panels. The move aims to aggressively expand renewable energy and create more solar and construction jobs. Solar carports also shelter cars from heavy rain, snow, and scorching summer heat, keeping interiors cool, extending the life of plastics and seat fabrics, and even extending the range of electric and plug-in hybrid vehicles by reducing their air-conditioning load.

  5. Blocked from collecting data, a vendor remotely commanded a smart vacuum to stop working

    An engineer, Harishanka, monitored the inbound and outbound traffic of his iLife A11 robot vacuum and found it was continuously sending logs and telemetry to the vendor (Shenzhen ILIFE), behavior he had never authorized. He decided to block the IP addresses of the vendor's telemetry servers while leaving access to the firmware and OTA servers open. Soon afterwards, the vacuum would not even power on. He sent it in for repair, but no fault was found; each time, the vacuum worked normally for a few days and then stopped. He decided to take it apart to find the root cause. The vacuum uses an Allwinner A33 SoC running the TinaLinux operating system, with a GD32F103 microcontroller managing the sensors. Testing showed the hardware itself was fine, so he turned his attention to the OS and software. In the logs he found an instruction whose timestamp exactly matched the moment the device stopped working: evidently a kill command. After he revoked the instruction and rebooted, the device returned to normal operation. He recommends never connecting IoT devices to your home's main WiFi network and treating such smart devices as strangers in your house.
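
    The timestamp correlation that exposed the kill command can be sketched in a few lines of Python. The tab-separated log format and the "cmd" marker here are hypothetical, chosen only to illustrate the technique of matching remote-command log entries against a device's shutdown time:

```python
from datetime import datetime, timedelta

def find_commands_near(log_lines, event_time, window_s=5):
    """Scan hypothetical 'ISO-timestamp<TAB>message' log lines for
    remote-command entries within `window_s` seconds of an event --
    the kind of correlation that exposed the vacuum's kill instruction."""
    hits = []
    for line in log_lines:
        ts_str, _, msg = line.partition("\t")
        try:
            ts = datetime.fromisoformat(ts_str)
        except ValueError:
            continue  # skip lines that don't start with a timestamp
        if "cmd" in msg.lower() and abs(ts - event_time) <= timedelta(seconds=window_s):
            hits.append((ts, msg))
    return hits

logs = [
    "2025-11-05T10:00:00\ttelemetry upload ok",
    "2025-11-05T10:30:01\tCMD recv: halt",
    "2025-11-05T10:30:02\tshutting down",
]
print(find_commands_near(logs, datetime(2025, 11, 5, 10, 30, 2)))
```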

  6. A Windows 0-day Microsoft has left unpatched for seven months is under active exploitation

    In March, security firm Trend Micro reported CVE-2025-9491, a 0-day exploited since 2017 by as many as 11 APT groups and rooted in a bug in the Windows Shortcut binary format. Seven months later, Microsoft has still not fixed it. Last Thursday, security firm Arctic Wolf reported that APT group UNC-6384 is exploiting the flaw to attack multiple European countries. With no patch available, defensive options are limited; the most effective countermeasure is to restrict the use of .lnk files from untrusted sources. Researchers also reported active exploitation of CVE-2025-59287, a flaw for which Microsoft has released a patch considered incomplete. It resides in Windows Server Update Services (WSUS), can lead to remote code execution, and carries a severity rating of 9.8/10.
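
    The recommended mitigation, restricting .lnk files from untrusted sources, can start with a simple audit. This sketch only lists shortcut files under a directory (a downloads folder, say); it does not parse or validate them, and the path in the example is illustrative:

```python
from pathlib import Path

def find_lnk_files(root):
    """Recursively list Windows shortcut (.lnk) files under `root`,
    a first audit step while CVE-2025-9491 remains unpatched."""
    return sorted(p for p in Path(root).rglob("*") if p.suffix.lower() == ".lnk")

# Example: audit the current directory (substitute your downloads folder).
for p in find_lnk_files("."):
    print(p)
```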

  7. Kioxia partners with Nvidia on SSDs that connect directly to GPUs

    Kioxia will partner with Nvidia on SSDs that exchange data directly with GPUs, with products planned to ship before 2027 as a replacement for some HBM DRAM chips. SSDs normally connect to GPUs through the CPU; the new products are planned to support the PCIe 7.0 interface. GPU-based AI computing relies mainly on HBM, an ultra-high-speed form of DRAM, but HBM DRAM is expensive per unit of capacity, making it hard for AI operators to scale up memory. Kioxia aims to substitute SSDs built on lower-cost NAND flash for part of the HBM used to expand memory capacity.

  8. Devuan 6.0 released

    The Devuan distribution has released Devuan 6.0, codenamed Excalibur. It is based on Debian 13 "trixie", released this August, and its major changes track Debian 13: the Linux 6.12 LTS kernel; desktop environments including GNOME 48, KDE Plasma 6.3, and Xfce 4.20; GCC 14.2 and Python 3.13; official support for the riscv64 architecture; and more. Devuan is a systemd-free fork of Debian created by a group of Debian developers unhappy about the controversy over the systemd init system.

  9. Microsoft's AI chief calls conscious AI nonsense

    Mustafa Suleyman, head of Microsoft's AI business, argues that only living beings can be conscious, and advises developers and researchers to stop pursuing projects that claim AI is conscious. In an interview at the AfroTech conference he said: "I don't think that is work people should be doing. If you ask the wrong question, you end up with the wrong answer. I think it's totally the wrong question." Suleyman has consistently opposed the notions that AI is conscious or that AI can feel pain.

  10. Studio Ghibli and other Japanese companies demand OpenAI stop training Sora 2 on their content

    CODA (the Content Overseas Distribution Association), a Japanese anti-piracy organization representing companies including Studio Ghibli and Bandai Namco, has sent a letter to OpenAI demanding that it stop using its members' content to train the video generation model Sora 2. CODA says the copying involved in machine learning may constitute infringement, since the AI model ends up generating content containing copyrighted characters. After Sora 2 launched on September 30, it produced large volumes of content featuring Japanese IP, prompting the Japanese government to formally ask OpenAI to stop reproducing Japanese artwork. OpenAI had also hyped the "Ghibli-style" image generation of GPT-4o at its release this March. CODA argues that OpenAI's policy of letting IP holders opt out after the fact violates Japanese copyright law: using copyrighted works in Japan generally requires prior permission, and there is no mechanism for avoiding liability by objecting after the fact.

  11. Is AI really affecting job postings?

    An analysis of nearly 180 million job postings published between January 2023 and October 2025 shows that postings fell 8% in 2025. The steepest declines were in hands-on creative work, including computer graphics artists (-33%), photographers (-28%), and writers (-28%), while creative-executive roles barely changed. Corporate compliance officers (-29%), sustainability specialists (-28%), and environmental technicians (-26%) also fell, a trend clearly tied to current US government policy, while trade compliance officer postings grew 18%, in line with US tariff policy. Medical scribe postings fell 20%, possibly because AI can automate clinical note-taking. Machine learning engineer postings grew 40%. Senior leadership roles changed little, though director of data engineering postings grew 23%. Influencer marketing specialist postings grew 18.3%. Despite much talk of AI replacing software engineers, software engineering remains one of the most stable jobs, and customer service representatives have not been replaced by AI at scale, with postings down only 4%.
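
    The percentage figures above are simple relative changes in posting counts; a tiny helper makes the arithmetic explicit. The counts below are hypothetical, chosen only so the outputs mirror two of the reported percentages:

```python
def pct_change(before, after):
    """Percent change from `before` to `after` posting counts,
    rounded to a whole percent as in the reported figures."""
    return round(100 * (after - before) / before)

# Hypothetical counts; only the resulting percentages mirror the
# article's figures (-28% for writers, +40% for ML engineers).
counts = {"writer": (1000, 720), "machine learning engineer": (500, 700)}
for role, (before, after) in counts.items():
    print(f"{role}: {pct_change(before, after):+d}%")
```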

  12. US scholars apply for European research grants

    The EU has seen a surge of interest from US academics applying for grants, as more American researchers look for options overseas in response to Donald Trump's attacks on higher education. The EU received a record number of applications for its top research and innovation grants this year, with applications from the US for one key fund at triple their 2024 level. In 2025, applications to both the European Research Council (ERC), the EU's basic-research funding body, and the Marie Skłodowska-Curie Actions (MSCA), the EU's doctoral and postdoctoral research program, hit all-time highs. EU research commissioner Ekaterina Zaharieva said the fierce competition is about talent, not money: everyone wants to attract talent. ERC applications from senior researchers rose 31% over last year and 82% over 2023. MSCA received 17,058 applications, the most ever. The Trump administration cut billions of dollars in federal research funding this year and cancelled grants on topics at odds with its policies, such as DEI and climate change.