OrangeBot.AI Digest — 2025-11-04

60 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. NoLongerEvil-Thermostat – Nest Generation 1 and 2 Firmware (github.com)
  2. Codemaps: Understand Code, Before You Vibe It (cognition.ai)
  3. We're open-sourcing the successor of Jupyter notebook (deepnote.com)
  4. Michael Burry, a.k.a. "Big Short", discloses $1.1B bet against Nvidia and Palantir (sherwood.news)
  5. Pg_lake: Postgres with Iceberg and data lake access (github.com)
  6. The 512KB Club (512kb.club)
  7. This Day in 1988, the Morris worm infected 10% of the Internet within 24 hours (www.tomshardware.com)
  8. Show HN: A CSS-Only Terrain Generator (terra.layoutit.com)
  9. Customize Nano Text Editor (shafi.ddns.net)
  10. This Month in Ladybird – October 2025 (ladybird.org)
  11. What is a manifold? (www.quantamagazine.org)
  12. Bloom filters are good for search that does not scale (notpeerreviewed.com)
  13. Tenacity – a multi-track audio editor/recorder (tenacityaudio.org)
  14. You can't cURL a Border (drobinin.com)
  15. Tell HN: X is opening any tweet link in a webview whether you press it or not

GitHub Trending (15)

  1. 666ghj / BettaFish

    微舆: a multi-agent public opinion analysis assistant that anyone can use. It breaks information cocoons, reconstructs the true picture of public opinion, predicts where it is heading, and aids decision-making. Built from scratch, with no reliance on any framework.

  2. sst / opentui

    OpenTUI is a library for building terminal user interfaces (TUIs)

  3. GeeeekExplorer / nano-vllm

    Nano vLLM

  4. mudler / LocalAI

    🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P and decentralized inference

  5. 1Panel-dev / MaxKB

    🔥 MaxKB is a powerful, easy-to-use open-source platform for building enterprise-grade agents.

  6. imthenachoman / How-To-Secure-A-Linux-Server

    An evolving how-to guide for securing a Linux server.

  7. Raphire / Win11Debloat

    A simple, lightweight PowerShell script to remove pre-installed apps, disable telemetry, as well as perform various other changes to customize, declutter and improve your Windows experience. Win11Debloat works for both Windows 10 and Windows 11.

  8. DearVa / Everywhere

    A context-aware AI assistant for your desktop. Ready to respond intelligently, seamlessly integrating multiple LLMs and MCP tools.

  9. HKUDS / DeepCode

    "DeepCode: Open Agentic Coding (Paper2Code & Text2Web & Text2Backend)"

  10. Fosowl / agenticSeek

    Fully local Manus AI. No APIs, no $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and codes for the sole cost of electricity. 🔔 Official updates only via twitter @Martin993886460 (beware of fake accounts)

  11. charmbracelet / glow

    Render markdown on the CLI, with pizzazz! 💅🏻

  12. mudler / edgevpn

    ⛵ The immutable, decentralized, statically built p2p VPN with no central server and automatic node discovery! Create decentralized, introspectable tunnels over p2p with shared tokens

  13. hmjz100 / LinkSwift

    A JavaScript-based tool for obtaining direct download links for cloud drive files. Modified from the 网盘直链下载助手 (direct-link download assistant) userscript; supports eight major services: Baidu Netdisk, Aliyun Drive, China Mobile Cloud, China Telecom Cloud (Tianyi), Xunlei Cloud, Quark Drive, UC Drive, and 123pan.

  14. coleam00 / ottomator-agents

    All the open source AI Agents hosted on the oTTomator Live Agent Studio platform!

  15. PKUFlyingPig / cs-self-learning

    A self-study guide to computer science.

Hugging Face (15)

  1. Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation

    We introduce Ling 2.0, a series of reasoning-oriented language foundation models built upon the principle that every activation boosts reasoning capability. Designed to scale from tens of billions to one trillion parameters under a unified Mixture-of-Experts (MoE) paradigm, Ling 2.0 emphasizes high sparsity, cross-scale consistency, and efficiency guided by empirical scaling laws. The series includes three non-thinking (instruct) models - Ling-mini-2.0, Ling-flash-2.0, and Ling-1T - ranging from 16B to 1T total parameters and achieving up to 7-fold active-compute efficiency compared with dense counterparts. Ling 2.0 integrates coordinated innovations across model architecture, pre-training, post-training, and infrastructure: a high-sparsity MoE with MTP for efficient reasoning, reasoning-oriented data and mid-training CoT activation, reinforcement-based fine-tuning (DFT, Evo-CoT), and full-scale FP8 training with fine-grained heterogeneous pipelines. At the trillion scale, Ling-1T establishes a new Pareto frontier of reasoning accuracy versus computational efficiency, demonstrating that sparse activation, when properly aligned with reasoning objectives, enables scalable and efficient intelligence. Collectively, Ling 2.0 provides a coherent, open, and efficient foundation for advancing future reasoning and thinking models, including the Ring series built upon the same base.
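
    The efficiency claim rests on sparse Mixture-of-Experts routing: each token activates only a few experts, so active compute stays a small fraction of total parameters. A rough sketch of that mechanism (layer sizes and top-k are illustrative, not Ling 2.0's actual architecture):

      # Minimal top-k mixture-of-experts layer: a sketch of sparse activation.
      # Sizes and k are made up; this is not Ling 2.0's architecture.
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class TinyMoE(nn.Module):
          def __init__(self, d_model=64, n_experts=8, k=2):
              super().__init__()
              self.k = k
              self.router = nn.Linear(d_model, n_experts)     # gating network
              self.experts = nn.ModuleList(
                  nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
                  for _ in range(n_experts))

          def forward(self, x):                               # x: (tokens, d_model)
              logits = self.router(x)
              weights, idx = logits.topk(self.k, dim=-1)      # keep k experts/token
              weights = F.softmax(weights, dim=-1)
              out = torch.zeros_like(x)
              for slot in range(self.k):                      # route tokens to experts
                  for e in idx[:, slot].unique().tolist():
                      mask = idx[:, slot] == e
                      out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
              return out

      x = torch.randn(10, 64)
      print(TinyMoE()(x).shape)  # torch.Size([10, 64]); 2 of 8 experts ran per token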

  2. The Underappreciated Power of Vision Models for Graph Structural Understanding

    Graph Neural Networks operate through bottom-up message-passing, fundamentally differing from human visual perception, which intuitively captures global structures first. We investigate the underappreciated potential of vision models for graph understanding, finding they achieve performance comparable to GNNs on established benchmarks while exhibiting distinctly different learning patterns. These divergent behaviors, combined with limitations of existing benchmarks that conflate domain features with topological understanding, motivate our introduction of GraphAbstract. This benchmark evaluates models' ability to perceive global graph properties as humans do: recognizing organizational archetypes, detecting symmetry, sensing connectivity strength, and identifying critical elements. Our results reveal that vision models significantly outperform GNNs on tasks requiring holistic structural understanding and maintain generalizability across varying graph scales, while GNNs struggle with global pattern abstraction and degrade with increasing graph size. This work demonstrates that vision models possess remarkable yet underutilized capabilities for graph structural understanding, particularly for problems requiring global topological awareness and scale-invariant reasoning. These findings open new avenues to leverage this underappreciated potential for developing more effective graph foundation models for tasks dominated by holistic pattern recognition.

  3. Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph

    Test-Time Scaling (TTS) improves large language models (LLMs) by allocating additional computation during inference, typically through parallel, sequential, or hybrid scaling. However, prior studies often assume fixed collaboration architectures (e.g., topologies) and single-model usage, overlooking that optimal architectures and model combinations can vary across tasks. Therefore, we study the novel problem of searching for compute-optimal model combinations and architectures in TTS under a fixed budget. We formalize it as a multi-LLM collaboration graph, where nodes encode roles and LLM model assignments, and edges capture information flow. This problem is challenging because (i) the combinatorial search space is prohibitively large, and (ii) task-specific requirements demand tailored designs. To address these, we reformulate the problem as probabilistic graph optimization and, through pilot experiments, derive three empirical insights into TTS collaboration graphs. Guided by these insights, we propose Agent-REINFORCE, an LLM-agent-augmented framework that mirrors the REINFORCE pipeline by mapping sampling-gradient-update to sampling-feedback-update, where feedback serves as a textual gradient to update the probabilistic graph and efficiently search for optimal multi-LLM collaboration graphs. Experiments show that Agent-REINFORCE outperforms both traditional and LLM-based baselines in sample efficiency and search performance, and effectively identifies optimal graphs under joint objectives of accuracy and inference latency.
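
    For orientation, the classic REINFORCE loop the framework mirrors is sample, score, update; per the abstract, Agent-REINFORCE swaps the numeric gradient for textual feedback over candidate collaboration graphs. A minimal numeric version on a toy bandit (illustrative only, not the paper's code):

      # Vanilla REINFORCE on a 3-armed bandit: sample an action from a softmax
      # policy, observe a reward, and nudge log-probabilities toward high reward.
      import numpy as np

      rng = np.random.default_rng(0)
      true_reward = np.array([0.2, 0.5, 0.9])   # hidden mean reward per arm
      theta = np.zeros(3)                       # policy logits
      lr = 0.1

      for step in range(2000):
          probs = np.exp(theta - theta.max())
          probs /= probs.sum()
          a = rng.choice(3, p=probs)                         # sample
          r = true_reward[a] + 0.1 * rng.standard_normal()   # score
          grad_logp = -probs                                 # d log pi(a) / d theta
          grad_logp[a] += 1.0
          theta += lr * r * grad_logp                        # update

      print(np.round(probs, 3))  # probability mass concentrates on arm 2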

  4. UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

    Relighting is a crucial task with both practical demand and artistic value, and recent diffusion models have shown strong potential by enabling rich and controllable lighting effects. However, as they are typically optimized in semantic latent space, where proximity does not guarantee physical correctness in visual space, they often produce unrealistic results, such as overexposed highlights, misaligned shadows, and incorrect occlusions. We address this with UniLumos, a unified relighting framework for both images and videos that brings RGB-space geometry feedback into a flow matching backbone. By supervising the model with depth and normal maps extracted from its outputs, we explicitly align lighting effects with the scene structure, enhancing physical plausibility. Nevertheless, this feedback requires high-quality outputs for supervision in visual space, making standard multi-step denoising computationally expensive. To mitigate this, we employ path consistency learning, allowing supervision to remain effective even under few-step training regimes. To enable fine-grained relighting control and supervision, we design a structured six-dimensional annotation protocol capturing core illumination attributes. Building upon this, we propose LumosBench, a disentangled attribute-level benchmark that evaluates lighting controllability via large vision-language models, enabling automatic and interpretable assessment of relighting precision across individual dimensions. Extensive experiments demonstrate that UniLumos achieves state-of-the-art relighting quality with significantly improved physical consistency, while delivering a 20x speedup for both image and video relighting. Code is available at https://github.com/alibaba-damo-academy/Lumos-Custom.
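
    The backbone technique, flow matching, trains a velocity field to carry noise to data along straight paths. Below is a minimal training step of generic conditional flow matching, nothing UniLumos-specific; the paper adds RGB-space depth/normal feedback and path consistency on top of such a backbone:

      # One flow-matching training step: regress a network onto the straight-line
      # velocity from noise x0 to data x1. Generic recipe, not UniLumos itself.
      import torch
      import torch.nn as nn

      model = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))
      opt = torch.optim.Adam(model.parameters(), lr=1e-3)

      x1 = torch.randn(256, 2) * 0.1 + 2.0      # stand-in "data" batch
      x0 = torch.randn_like(x1)                 # noise sample
      t = torch.rand(256, 1)                    # random time in [0, 1]
      xt = (1 - t) * x0 + t * x1                # point on the straight path
      v_target = x1 - x0                        # constant target velocity

      v_pred = model(torch.cat([xt, t], dim=1)) # condition the network on time
      loss = ((v_pred - v_target) ** 2).mean()
      loss.backward()
      opt.step()
      print(float(loss))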

  5. ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation

    Unified multimodal models (UMMs) have emerged as a powerful paradigm for seamlessly unifying text and image understanding and generation. However, prevailing evaluations treat these abilities in isolation, such that tasks with multimodal inputs and outputs are scored primarily through unimodal reasoning, i.e., textual benchmarks emphasize language-based reasoning, while visual benchmarks emphasize reasoning outcomes manifested in the pixels. We introduce ROVER to address this pressing need to test reciprocal cross-modal reasoning, the use of one modality to guide, verify, or refine outputs in the other, an ability central to the vision of unified multimodal intelligence. ROVER is a human-annotated benchmark that explicitly targets reciprocal cross-modal reasoning, which contains 1312 tasks grounded in 1876 images, spanning two complementary settings. Verbally-augmented reasoning for visual generation evaluates whether models can use verbal prompts and reasoning chains to guide faithful image synthesis. Visually-augmented reasoning for verbal generation evaluates whether models can generate intermediate visualizations that strengthen their own reasoning processes for question answering. Experiments on 17 unified models reveal two key findings: (i) Cross-modal reasoning determines visual generation quality, with interleaved models significantly outperforming non-interleaved ones; notably, combining strong unimodal models fails to achieve comparable reasoning. (ii) Models show dissociation between physical and symbolic reasoning: they succeed at interpreting perceptual concepts literally but fail to construct visual abstractions for symbolic tasks, where faulty reasoning harms performance. These results highlight reciprocal cross-modal reasoning as a critical frontier for enabling true omnimodal generation.

  6. PHUMA: Physically-Grounded Humanoid Locomotion Dataset

    Motion imitation is a promising approach for humanoid locomotion, enabling agents to acquire humanlike behaviors. Existing methods typically rely on high-quality motion capture datasets such as AMASS, but these are scarce and expensive, limiting scalability and diversity. Recent studies attempt to scale data collection by converting large-scale internet videos, exemplified by Humanoid-X. However, they often introduce physical artifacts such as floating, penetration, and foot skating, which hinder stable imitation. In response, we introduce PHUMA, a Physically-grounded HUMAnoid locomotion dataset that leverages human video at scale, while addressing physical artifacts through careful data curation and physics-constrained retargeting. PHUMA enforces joint limits, ensures ground contact, and eliminates foot skating, producing motions that are both large-scale and physically reliable. We evaluated PHUMA in two sets of conditions: (i) imitation of unseen motion from self-recorded test videos and (ii) path following with pelvis-only guidance. In both cases, PHUMA-trained policies outperform Humanoid-X and AMASS, achieving significant gains in imitating diverse motions. The code is available at https://davian-robotics.github.io/PHUMA.

  7. UniREditBench: A Unified Reasoning-based Image Editing Benchmark

    Recent advances in multi-modal generative models have driven substantial improvements in image editing. However, current generative models still struggle with handling diverse and complex image editing tasks that require implicit reasoning, underscoring the need for a comprehensive benchmark to systematically assess their performance across various reasoning scenarios. Existing benchmarks primarily focus on single-object attribute transformation in realistic scenarios, which, while effective, encounter two key challenges: (1) they largely overlook multi-object interactions as well as game-world scenarios that involve human-defined rules, which are common in real-life applications; (2) they only rely on textual references to evaluate the generated images, potentially leading to systematic misjudgments, especially in complex reasoning scenarios. To this end, this work proposes UniREditBench, a unified benchmark for reasoning-based image editing evaluation. It comprises 2,700 meticulously curated samples, covering both real- and game-world scenarios across 8 primary dimensions and 18 sub-dimensions. To improve evaluation reliability, we introduce multimodal dual-reference evaluation, providing both textual and ground-truth image references for each sample assessment. Furthermore, we design an automated multi-scenario data synthesis pipeline and construct UniREdit-Data-100K, a large-scale synthetic dataset with high-quality chain-of-thought (CoT) reasoning annotations. We fine-tune Bagel on this dataset and develop UniREdit-Bagel, demonstrating substantial improvements in both in-domain and out-of-distribution settings. Through thorough benchmarking of both open-source and closed-source image editing models, we reveal their strengths and weaknesses across various aspects.

  8. World Simulation with Video Foundation Models for Physical AI

    We introduce Cosmos-Predict2.5, the latest generation of the Cosmos World Foundation Models for Physical AI. Built on a flow-based architecture, Cosmos-Predict2.5 unifies Text2World, Image2World, and Video2World generation in a single model and leverages Cosmos-Reason1, a Physical AI vision-language model, to provide richer text grounding and finer control of world simulation. Trained on 200M curated video clips and refined with reinforcement learning-based post-training, Cosmos-Predict2.5 achieves substantial improvements over Cosmos-Predict1 in video quality and instruction alignment, with models released at 2B and 14B scales. These capabilities enable more reliable synthetic data generation, policy evaluation, and closed-loop simulation for robotics and autonomous systems. We further extend the family with Cosmos-Transfer2.5, a control-net style framework for Sim2Real and Real2Real world translation. Despite being 3.5× smaller than Cosmos-Transfer1, it delivers higher fidelity and robust long-horizon video generation. Together, these advances establish Cosmos-Predict2.5 and Cosmos-Transfer2.5 as versatile tools for scaling embodied intelligence. To accelerate research and deployment in Physical AI, we release source code, pretrained checkpoints, and curated benchmarks under the NVIDIA Open Model License at https://github.com/nvidia-cosmos/cosmos-predict2.5 and https://github.com/nvidia-cosmos/cosmos-transfer2.5. We hope these open resources lower the barrier to adoption and foster innovation in building the next generation of embodied intelligence.

  9. ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use

    Recently, large language models (LLMs) have demonstrated remarkable problem-solving capabilities by autonomously integrating with external tools for collaborative reasoning. However, due to the inherently complex and diverse nature of multimodal information, enabling multimodal large language models (MLLMs) to flexibly and efficiently utilize external tools during reasoning remains an underexplored challenge. In this work, we introduce ToolScope, an agentic framework designed to unify global planning with local multimodal perception, adopting a specialized Perceive tool to mitigate visual context degradation in long-horizon VQA tasks. ToolScope comprises three primary components: the Global Navigator, the Agentic Executor, and the Response Synthesizer. The Global Navigator functions as a "telescope", offering high-level strategic guidance. The Agentic Executor operates iteratively to augment the MLLM with local perception through the integration of external tools: Search, Code, and Perceive. Finally, the Response Synthesizer consolidates and organizes the reasoning process into a coherent, user-friendly output. We evaluate ToolScope on four VQA benchmarks across diverse domains, including VQA 2.0, ScienceQA, MAT-Search and MathVista. It demonstrates strong generalization capabilities, achieving an average performance improvement of up to +6.69% across all datasets.
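
    The three components map onto a standard plan/act/synthesize agent loop. A schematic of that control flow (the tool names Search, Code, and Perceive come from the abstract; the stubs and everything else are invented for illustration):

      # Schematic ToolScope-style loop: a global plan guides iterative tool
      # calls, then a synthesizer writes the answer. Stubs stand in for the
      # MLLM and tools; only the tool names come from the abstract.
      def navigator(question):               # "telescope": high-level plan
          return ["Perceive", "Search"]

      TOOLS = {
          "Search":   lambda q, ctx: ctx + ["search results for: " + q],
          "Code":     lambda q, ctx: ctx + ["code execution output"],
          "Perceive": lambda q, ctx: ctx + ["re-inspected image region"],
      }

      def executor(question, plan):           # iterative local perception
          context = []
          for tool in plan:
              context = TOOLS[tool](question, context)
          return context

      def synthesizer(question, context):     # coherent final response
          return f"Answer to {question!r} from {len(context)} tool observations."

      q = "What does the sign say?"
      print(synthesizer(q, executor(q, navigator(q))))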

  10. MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models

    Large reasoning models (LRMs) show strong capabilities in complex reasoning, yet their marginal gains on evidence-dependent factual questions are limited. We find this limitation is partially attributable to a reasoning-answer hit gap, where the model identifies the correct facts during reasoning but fails to incorporate them into the final response, thereby reducing factual fidelity. To address this issue, we propose MR-ALIGN, a Meta-Reasoning informed alignment framework that enhances factuality without relying on external verifiers. MR-ALIGN quantifies state transition probabilities along the model's thinking process and constructs a transition-aware implicit reward that reinforces beneficial reasoning patterns while suppressing defective ones at the atomic thinking segments. This re-weighting reshapes token-level signals into probability-aware segment scores, encouraging coherent reasoning trajectories that are more conducive to factual correctness. Empirical evaluations across four factual QA datasets and one long-form factuality benchmark show that MR-ALIGN consistently improves accuracy and truthfulness while reducing misleading reasoning. These results highlight that aligning the reasoning process itself, rather than merely the outputs, is pivotal for advancing factuality in LRMs.

  11. Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum

    The prevailing video retrieval paradigm is structurally misaligned, as narrow benchmarks incentivize correspondingly limited data and single-task training. Therefore, universal capability is suppressed due to the absence of a diagnostic evaluation that defines and demands multi-dimensional generalization. To break this cycle, we introduce a framework built on the co-design of evaluation, data, and modeling. First, we establish the Universal Video Retrieval Benchmark (UVRB), a suite of 16 datasets designed not only to measure performance but also to diagnose critical capability gaps across tasks and domains. Second, guided by UVRB's diagnostics, we introduce a scalable synthesis workflow that generates 1.55 million high-quality pairs to populate the semantic space required for universality. Finally, we devise the Modality Pyramid, a curriculum that trains our General Video Embedder (GVE) by explicitly leveraging the latent interconnections within our diverse data. Extensive experiments show GVE achieves state-of-the-art zero-shot generalization on UVRB. In particular, our analysis reveals that popular benchmarks are poor predictors of general ability and that partially relevant retrieval is a dominant but overlooked scenario. Overall, our co-designed framework provides a practical path to escape the limited scope and advance toward truly universal video retrieval.

  12. OpenSIR: Open-Ended Self-Improving Reasoner

    Recent advances in large language model (LLM) reasoning through reinforcement learning rely on annotated datasets for verifiable rewards, which may limit models' ability to surpass human-level performance. While self-play offers a promising alternative, existing approaches depend on external verifiers or cannot learn open-endedly. We present Open-Ended Self-Improving Reasoner (OpenSIR), a self-play framework where an LLM learns to generate and solve novel problems by alternating teacher and student roles without external supervision. To generate novel problems, OpenSIR optimises for both difficulty and diversity, rewarding problems that challenge appropriately while exploring distinct concepts, enabling open-ended mathematical discovery. Starting from a single trivial seed problem, OpenSIR substantially improves instruction models: Llama-3.2-3B-Instruct advances from 73.9 to 78.3 on GSM8K, and from 28.8 to 34.4 on College Math, while Gemma-2-2B-Instruct rises from 38.5 to 58.7 on GSM8K. Our analyses reveal that OpenSIR achieves open-ended learning through co-evolving teacher-student roles that adaptively calibrate difficulty and drive diverse exploration, progressing autonomously from basic to advanced mathematics.
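
    The mechanics suggest a simple skeleton: one model alternates roles, and the teacher is rewarded for problems that are appropriately hard and conceptually fresh. The sketch below is a loose paraphrase of that idea with stubbed-in randomness for the LLM calls; the reward shaping (prefer roughly a 50% student solve rate, bonus for unseen concepts) is my reading of "difficulty and diversity", not the paper's objective:

      # Schematic OpenSIR-style self-play: the same model would play teacher
      # (propose) and student (solve); stubs replace the LLM here.
      import random

      random.seed(0)
      seen_concepts = set()

      def propose():                           # teacher turn (stubbed LLM)
          return {"concept": random.choice(["algebra", "geometry", "counting"]),
                  "difficulty": random.random()}

      def solve(problem):                      # student turn (stubbed LLM)
          return random.random() > problem["difficulty"]

      for step in range(5):
          problem = propose()
          attempts = [solve(problem) for _ in range(8)]
          solve_rate = sum(attempts) / len(attempts)
          difficulty_reward = 1.0 - 2 * abs(0.5 - solve_rate)  # peak near 50%
          diversity_reward = float(problem["concept"] not in seen_concepts)
          seen_concepts.add(problem["concept"])
          print(problem["concept"], round(difficulty_reward, 2), diversity_reward)
      # A real run would RL-update the shared model with both roles' rewards.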

  13. LongCat-Flash-Omni Technical Report

    We introduce LongCat-Flash-Omni, a state-of-the-art open-source omni-modal model with 560 billion parameters, excelling at real-time audio-visual interaction. By adopting a curriculum-inspired progressive training strategy that transitions from simpler to increasingly complex modality sequence modeling tasks, LongCat-Flash-Omni attains comprehensive multimodal capabilities while maintaining strong unimodal capability. Building upon LongCat-Flash, which adopts a high-performance Shortcut-connected Mixture-of-Experts (MoE) architecture with zero-computation experts, LongCat-Flash-Omni integrates efficient multimodal perception and speech reconstruction modules. Despite its immense size of 560B parameters (with 27B activated), LongCat-Flash-Omni achieves low-latency real-time audio-visual interaction. For training infrastructure, we developed a modality-decoupled parallelism scheme specifically designed to manage the data and model heterogeneity inherent in large-scale multimodal training. This innovative approach demonstrates exceptional efficiency by sustaining over 90% of the throughput achieved by text-only training. Extensive evaluations show that LongCat-Flash-Omni achieves state-of-the-art performance on omni-modal benchmarks among open-source models. Furthermore, it delivers highly competitive results across a wide range of modality-specific tasks, including text, image, and video understanding, as well as audio understanding and generation. We provide a comprehensive overview of the model architecture design, training procedures, and data strategies, and open-source the model to foster future research and development in the community.

  14. TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning

    The frontier of visual reasoning is shifting toward models like OpenAI o3, which can intelligently create and operate tools to transform images for problem-solving, also known as thinking-with-images in chain-of-thought. Yet existing benchmarks fail to fully capture this advanced capability. Even Visual Search, the most common benchmark for current thinking-with-images methods, tests only basic operations such as localization and cropping, offering little insight into more complex, dynamic, and tool-dependent reasoning. We introduce TIR-Bench, a comprehensive benchmark for evaluating agentic thinking-with-images across 13 diverse tasks, each requiring novel tool use for image processing and manipulation in chain-of-thought. We evaluate 22 multimodal large language models (MLLMs), from leading open-sourced and proprietary models to those with explicit tool-use augmentation. Results show that TIR-Bench is universally challenging, and strong performance requires genuine thinking-with-images capabilities. Finally, we present a pilot study comparing direct versus agentic fine-tuning.

  15. NaviTrace: Evaluating Embodied Navigation of Vision-Language Models

    Vision-language models demonstrate unprecedented performance and generalization across a wide range of tasks and scenarios. Integrating these foundation models into robotic navigation systems opens pathways toward building general-purpose robots. Yet, evaluating these models' navigation capabilities remains constrained by costly real-world trials, overly simplified simulations, and limited benchmarks. We introduce NaviTrace, a high-quality Visual Question Answering benchmark where a model receives an instruction and embodiment type (human, legged robot, wheeled robot, bicycle) and must output a 2D navigation trace in image space. Across 1000 scenarios and more than 3000 expert traces, we systematically evaluate eight state-of-the-art VLMs using a newly introduced semantic-aware trace score. This metric combines Dynamic Time Warping distance, goal endpoint error, and embodiment-conditioned penalties derived from per-pixel semantics and correlates with human preferences. Our evaluation reveals a consistent gap to human performance caused by poor spatial grounding and goal localization. NaviTrace establishes a scalable and reproducible benchmark for real-world robotic navigation. The benchmark and leaderboard can be found at https://leggedrobotics.github.io/navitrace_webpage/.
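
    The first ingredient of that trace score, Dynamic Time Warping, is a standard algorithm; a minimal version for 2D traces (the benchmark's endpoint-error and per-pixel semantic penalty terms are not reproduced here):

      # Dynamic Time Warping between two 2D navigation traces: the core
      # distance term of a NaviTrace-style score.
      import numpy as np

      def dtw(a, b):
          """a: (n, 2) and b: (m, 2) arrays of image-space waypoints."""
          n, m = len(a), len(b)
          D = np.full((n + 1, m + 1), np.inf)
          D[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  cost = np.linalg.norm(a[i - 1] - b[j - 1])  # Euclidean step
                  D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
          return D[n, m]

      pred  = np.array([[0, 0], [1, 1], [2, 2], [3, 2]], dtype=float)
      truth = np.array([[0, 0], [1, 1], [2, 1], [3, 2]], dtype=float)
      print(dtw(pred, truth))  # 1.0: the traces diverge by one pixel once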

Solidot (15)

  1. Kioxia and Nvidia partner on SSDs that connect directly to GPUs

    Kioxia will partner with Nvidia on SSDs that exchange data directly with GPUs, with products planned to reach the market by 2027 as a replacement for some HBM DRAM chips. SSDs normally connect to GPUs via the CPU; the new product is planned to support the PCIe 7.0 interface. GPU-based AI computation relies mainly on HBM, an ultra-high-speed form of DRAM, but HBM's price per unit of capacity is high, making it difficult for AI operators to scale up memory capacity. Kioxia aims to use SSDs built on lower-cost NAND flash to replace part of the HBM used for capacity expansion.

  2. Devuan 6.0 released

    The Devuan distribution has released Devuan 6.0, codenamed Excalibur. Devuan 6.0 is based on Debian 13 "trixie", released this August, and its main changes track Debian 13: the Linux 6.12 LTS kernel; the GNOME 48, KDE Plasma 6.3, and Xfce 4.20 desktop environments; GCC 14.2 and Python 3.13; official support for the riscv64 architecture; and more. Devuan is a fork of Debian without systemd, created by a group of Debian developers dissatisfied over the controversy around the systemd init system.

  3. Microsoft's AI chief calls the idea of conscious AI nonsense

    Mustafa Suleyman, head of Microsoft's AI business, believes that only living beings can be conscious and advises developers and researchers to stop pursuing projects that claim AI is conscious. In an interview at the AfroTech conference he said: "I don't think that is work people should be doing. If you ask the wrong question, you end up with the wrong answer. I think it's totally the wrong question." Suleyman has consistently opposed the ideas that AI is conscious or that AI can feel pain.

  4. Studio Ghibli and other Japanese companies demand OpenAI stop training Sora 2 on their content

    CODA (the Content Overseas Distribution Association), a Japanese anti-piracy group representing companies including Studio Ghibli and Bandai Namco, has written to OpenAI demanding that it stop using its members' content to train the video generation model Sora 2. In the letter, CODA argues that the copying involved in machine learning may constitute infringement, because the AI model ends up generating content containing copyrighted characters. After Sora 2 launched on September 30 it produced a flood of content featuring Japanese IP, prompting the Japanese government to formally ask OpenAI to stop copying Japanese artwork. OpenAI also hyped the model's "Ghibli-style" image generation when it released GPT-4o this March. CODA contends that OpenAI's after-the-fact opt-out policy for IP holders violates Japanese copyright law: under Japanese law, using copyrighted works generally requires prior permission, and there is no mechanism for escaping liability by objecting after the fact.

  5. Is AI actually affecting job postings?

    An analysis of nearly 180 million job postings published between January 2023 and October 2025 shows that postings fell 8% in 2025. The steepest declines were in creative execution roles, including computer graphics artists (-33%), photographers (-28%), and writers (-28%), while creative executive roles barely changed. Corporate compliance officers (-29%), sustainability specialists (-28%), and environmental technicians (-26%) also fell, a trend clearly tied to current US government policy, while trade compliance officer postings grew 18%, tracking US tariff policy. Medical scribe postings fell 20%, possibly because AI can automate clinical note-taking, while machine learning engineer postings grew 40%. Senior leadership postings changed little, though director of data engineering postings grew 23%. Influencer marketing specialist postings grew 18.3%. Despite much talk of AI replacing software engineers, software engineering is currently one of the most stable jobs, and customer service representatives have not been displaced by AI at scale, with postings down only 4%.

  6. US scholars apply for European research grants

    The EU has seen a surge of interest from US academics applying for grants, as growing numbers of American researchers seek options overseas in response to Donald Trump's attacks on higher education. The EU received a record number of applications for its top science and innovation grants this year, with US applications to one key fund at triple their 2024 level. Applications in 2025 to the European Research Council (ERC), the EU's basic research funding agency, and to the Marie Skłodowska-Curie Actions (MSCA), the EU's doctoral and postdoctoral research program, both hit all-time highs. EU research commissioner Ekaterina Zaharieva said the fierce competition is about talent rather than money: everyone wants to attract talent. ERC applications for senior researchers rose 31% over last year and 82% over 2023. MSCA received 17,058 applications, the most in its history. The Trump administration this year canceled billions of dollars in federal research funding, cutting grants on topics at odds with its policies such as DEI and climate change.

  7. Foods that make you smell more attractive

    Everyone has a unique body odor, shaped by genes, hormones, health, hygiene habits, and food. Research shows that food affects not only a person's overall scent but also how others perceive it, including attractiveness. Food influences body odor mainly through the gut and the skin. Gut bacteria metabolize food, and the chemicals in food react with those bacteria to release gases; bad breath is related to this. Once metabolized, chemical compounds from food enter body tissue through the bloodstream, and some are excreted in sweat through the skin, where they react with skin bacteria to produce odor. Sweat itself is odorless; it is bacteria that give it a smell. The most pungent foods are all rich in sulfur compounds, and studies suggest they can have an unexpected effect, making us more attractive to the opposite sex. Garlic gives people bad breath, yet research has found that the underarm sweat of men who ate large amounts of garlic was rated sexier by women; by contrast, a high-carbohydrate diet produced the least sexy odor. An Australian study found that men who ate more fruit and vegetables smelled better.

  8. Report finds Polish is the language AI understands best

    Researchers from the University of Maryland and Microsoft investigated which language AI understands best. Among 26 languages, Polish topped the list, while English ranked only sixth. The team tested how mainstream AI language models from OpenAI, Google Gemini, Qwen, Llama, and DeepSeek responded to the same input in 26 languages. Polish achieved an average task accuracy of 88%, even though far less Polish training data is available than English or Chinese; Chinese ranked fourth from the bottom. The rest of the top ten after Polish: French 87%, Italian 86%, Spanish 85%, Russian 84%, English 83.9%, Ukrainian 83.5%, Portuguese 82%, German 81%, and Dutch 80%.

  9. Jabber Zeus programmer 'MrICQ' extradited to the US

    'MrICQ', a programmer for the Jabber Zeus hacking group indicted back in 2012, has been extradited to the United States. MrICQ, real name Yuriy Igorevich Rybtsov, is 41 and comes from Donetsk, the Ukrainian city currently under Russian control. He was arrested in Italy (when and where are not known), extradited to the US last month, and is now in custody. Jabber Zeus takes its name from the group's customized ZeuS banking trojan, designed to steal banking login credentials; whenever a new victim entered a one-time password on a banking site, the trojan sent a Jabber message. Jabber Zeus mainly targeted small and medium-sized businesses: once inside a victim's account, it would alter the company payroll to add dozens of "money mules", who forwarded the stolen money, minus a commission, to other mules in Ukraine and the UK. Jabber Zeus ringleader Vyacheslav "Tank" Penchukov was arrested in 2022 while traveling to Switzerland to meet his wife, and last year a US court sentenced him to 18 years in prison and more than $73 million in restitution. The original author of the Zeus trojan, Russian national Evgeniy Mikhailovich Bogachev, is on the FBI's wanted list with a $3 million bounty.

  10. Lapses in attention may be the brain clearing out waste

    If you sleep badly, it is hard to concentrate the next day, and that may be because your brain is trying to refresh itself, causing brief lapses of attention. During sleep the brain runs a rinse cycle: cerebrospinal fluid is repeatedly flushed into the brain and drained out from its base, clearing the metabolic waste that accumulates during the day and would otherwise damage brain cells. Scientists at MIT wondered whether the wandering attention that typically follows sleep deprivation might be the awake brain trying to make up for this self-flushing. To find out, they ran an experiment in two phases: in the first, 26 participants aged 19 to 40 got a good night's sleep and were well rested; in the second, two weeks later, the same participants stayed up all night in the lab. Sleep deprivation made it harder for participants to concentrate, and when the researchers analyzed the brain scans they found that participants lost attention about two seconds before cerebrospinal fluid flowed out of the base of the brain, and that about one second after attention returned, fluid was flushed back into the brain. The results suggest that when the brain cannot clean itself during sleep, it does so while you are awake, at the cost of attention.

  11. OpenAI may be too big to fail

    OpenAI is not yet profitable, and its annual revenue is only 2% of Amazon's. Its corporate restructuring is largely complete, paving the way for a future IPO that could make it the first company to go public at a $1 trillion valuation. It has struck complex deals with marquee tech companies such as Nvidia and Oracle, committing to invest in and purchase as much as a trillion dollars of computing power. Through this series of enormous deals, OpenAI appears to have reached the point of being "too big to fail": if it actually collapsed, it could pose a systemic risk to the entire economy. In some people's eyes, OpenAI is Apple, Facebook, Google, and Tesla rolled into one, a company of unlimited potential that could upend the smartphone market, build its own social network, replace search engines, usher in the age of robotics, and reshape every business and industry. In others' eyes, OpenAI resembles the Dutch tulip mania, a harbinger of depression, the next dot-com bubble; they see OpenAI as a mad scientist bent on creating Frankenstein's monster and a killer driving unemployment up.

  12. Social media platforms agree to comply with Australia's teen social media ban

    The world's major social media platforms have agreed to comply with Australia's social media ban for under-16s. Meta, Snap, and TikTok confirmed to the Australian parliament that they will begin deleting and deactivating more than a million underage accounts once the law takes effect on December 10. Companies that fail to block underage users face fines of up to $32.5 million. Before their accounts are deactivated, teenagers can choose to download their data, and some platforms will also retain the data until they turn 17. Age verification is not expected to be perfect at first: underage users may go undetected, while adult users may be wrongly identified as minors.

  13. South Korea to require solar canopies over parking lots

    Starting this month, all parking lots in South Korea with more than 80 spaces will be required to install solar canopies and carports. The new law applies not only to newly built parking lots; existing lots must comply as well. In August, South Korea's Ministry of Trade, Industry and Energy announced a revision to the enforcement rules of the Act on the Promotion of the Development, Use and Diffusion of New and Renewable Energy, requiring all public and private parking lots in the country with more than 80 spaces to add solar panels. The move aims to aggressively expand renewable energy and create more solar and construction jobs. Solar carports also shield cars from heavy rain, snow, and the blazing summer sun, keeping interiors cool, extending the life of plastics and seat fabrics, and even extending the range of EVs and plug-in hybrids by reducing their air conditioning load.

  14. Blocked from collecting data, a vendor remotely ordered a smart vacuum to stop working

    Engineer Harishanka monitored the inbound and outbound traffic of his iLife A11 smart vacuum and found it was constantly sending logs and telemetry to the manufacturer (Shenzhen ZhiYi), behavior he had never authorized. He decided to block the IP addresses of the vendor's telemetry servers while keeping access to the firmware and OTA servers open. Soon afterwards his vacuum would not even power on. He sent it in for repair, but no fault was found; each time, the vacuum would work normally for a few days and then stop. He decided to tear it down and find the root cause. The vacuum uses an Allwinner A33 SoC running the TinaLinux operating system, with a GD32F103 microcontroller managing the sensors. Testing showed the hardware itself was fine, so he turned his attention to the OS and software. In the logs he found a command whose timestamp exactly matched the moment the device stopped working, evidently a kill command; after he reverted the command and rebooted, the device worked normally again. He recommends not connecting IoT devices to your home's main WiFi network and treating these smart devices as strangers in your house.
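
    The first step of that investigation, watching where a device phones home, is easy to reproduce. A hedged sketch with scapy (requires root; the interface and device IP are placeholders for your own network, not values from the article):

      # Print each new destination IP an IoT device contacts on the LAN.
      # IFACE and DEVICE_IP are examples, not values from the article.
      from scapy.all import IP, sniff

      IFACE = "eth0"
      DEVICE_IP = "192.168.1.50"   # the vacuum's LAN address (example)
      seen = set()

      def log_destination(pkt):
          if IP in pkt and pkt[IP].src == DEVICE_IP:
              dst = pkt[IP].dst
              if dst not in seen:
                  seen.add(dst)
                  print("device contacted:", dst)

      # Stop with Ctrl-C. Once the telemetry servers are known, a firewall
      # rule on the router can block them selectively.
      sniff(iface=IFACE, filter=f"src host {DEVICE_IP}",
            prn=log_destination, store=False)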

  15. A Windows 0-day Microsoft has left unfixed for seven months is under active exploitation

    In March, security firm Trend Micro reported CVE-2025-9491, a 0-day rooted in a bug in the Windows Shortcut binary format that as many as 11 APT groups have exploited since 2017. Seven months later, Microsoft has still not fixed it. Last Thursday, security firm Arctic Wolf reported that the APT group UNC-6384 is exploiting the flaw to attack multiple European countries. With no patch available, options for defense are limited; the most effective countermeasure is to restrict the use of .lnk files from untrusted sources. Researchers also reported active exploitation of CVE-2025-59287, a flaw in Windows Server Update Services (WSUS) for which Microsoft has shipped a patch considered incomplete; it can lead to remote code execution and carries a severity rating of 9.8/10.