OrangeBot.AI Digest — 2025-09-12

60 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. UTF-8 is a brilliant design (iamvishnu.com)
  2. EU court rules nuclear energy is clean energy (www.weplanet.org)
  3. QGIS is a free, open-source, cross platform geographical information system (github.com)
  4. Corporations are trying to hide job openings from US citizens (thehill.com)
  5. Many hard LeetCode problems are easy constraint problems (buttondown.com)
  6. Crates.io phishing attempt (fasterthanli.me)
  7. 3D modeling with paper (www.arvinpoddar.com)
  8. Ships are sailing with fake insurance from the Norwegian Ro Marine (www.nrk.no)
  9. Chat Control faces blocking minority in the EU (twitter.com)
  10. The treasury is expanding the Patriot Act to attack Bitcoin self custody (www.tftc.io)
  11. Becoming the person who does the thing (www.fredrivett.com)
  12. Show HN: I made a generative online drum machine with ClojureScript (dopeloop.ai)
  13. Using Emacs Org-Mode With Databases: A getting-started guide (gitlab.com)
  14. Qwen3-Next (qwen.ai)
  15. Debian 13, Postgres, and the US time zones (rachelbythebay.com)

GitHub Trending (15)

  1. trueadm / ripple

    the elegant TypeScript UI framework

  2. Physical-Intelligence / openpi
  3. CodebuffAI / codebuff

    Generate code from the terminal!

  4. sentient-agi / ROMA

    Recursive-Open-Meta-Agent v0.1 (Beta). A meta-agent framework to build high-performance multi-agent systems.

  5. firebase / genkit

    An open source framework for building AI-powered apps with familiar code-centric patterns. Genkit makes it easy to develop, integrate, and test AI features with observability and evaluations. Genkit works with various models and platforms.

  6. Zie619 / n8n-workflows

    all of the workflows of n8n i could find (also from the site itself)

  7. epfml / ML_course

    EPFL Machine Learning Course, Fall 2025

  8. expo / expo

    An open-source framework for making universal native apps with React. Expo runs on Android, iOS, and the web.

  9. NVIDIA / garak

    the LLM vulnerability scanner

  10. milvus-io / milvus

    Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

  11. punkpeye / awesome-mcp-servers

    A collection of MCP servers.

  12. livekit / livekit

    End-to-end realtime stack for connecting humans and AI

  13. Azure / azure-sdk-for-python

    This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.

  14. ZuodaoTech / everyone-can-use-english

    Everyone Can Use English (人人都能用英语)

  15. kamranahmedse / developer-roadmap

    Interactive roadmaps, guides and other educational content to help developers grow in their careers.

Hugging Face (15)

  1. VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

    Vision-Language-Action (VLA) models typically bridge the gap between perceptual and action spaces by pre-training a large-scale Vision-Language Model (VLM) on robotic data. While this approach greatly enhances performance, it also incurs significant training costs. In this paper, we investigate how to effectively bridge vision-language (VL) representations to action (A). We introduce VLA-Adapter, a novel paradigm designed to reduce the reliance of VLA models on large-scale VLMs and extensive pre-training. To this end, we first systematically analyze the effectiveness of various VL conditions and present key findings on which conditions are essential for bridging perception and action spaces. Based on these insights, we propose a lightweight Policy module with Bridge Attention, which autonomously injects the optimal condition into the action space. In this way, our method achieves high performance using only a 0.5B-parameter backbone, without any robotic data pre-training. Extensive experiments on both simulated and real-world robotic benchmarks demonstrate that VLA-Adapter not only achieves state-of-the-art level performance, but also offers the fastest inference speed reported to date. Furthermore, thanks to the proposed advanced bridging paradigm, VLA-Adapter enables the training of a powerful VLA model in just 8 hours on a single consumer-grade GPU, greatly lowering the barrier to deploying the VLA model. Project page: https://vla-adapter.github.io/.
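
    A minimal, hypothetical sketch of the "Bridge Attention" idea described above: learnable action queries cross-attend over vision-language condition tokens and are decoded into continuous actions. All module names, dimensions, and the pooling choice are assumptions for illustration, not VLA-Adapter's actual implementation.

```python
import torch
import torch.nn as nn

class BridgeAttentionPolicy(nn.Module):
    """Toy policy head that injects VL conditions into an action space."""

    def __init__(self, vl_dim=896, act_dim=7, n_queries=8, n_heads=8):
        super().__init__()
        # Learnable action queries attend over the VL condition tokens.
        self.queries = nn.Parameter(torch.randn(n_queries, vl_dim) * 0.02)
        self.bridge = nn.MultiheadAttention(vl_dim, n_heads, batch_first=True)
        self.head = nn.Linear(vl_dim, act_dim)

    def forward(self, vl_tokens):                       # vl_tokens: (B, T, vl_dim)
        q = self.queries.unsqueeze(0).expand(vl_tokens.size(0), -1, -1)
        fused, _ = self.bridge(q, vl_tokens, vl_tokens)  # (B, n_queries, vl_dim)
        return self.head(fused.mean(dim=1))              # (B, act_dim)

policy = BridgeAttentionPolicy()
actions = policy(torch.randn(2, 64, 896))                # dummy VL features
print(actions.shape)                                      # torch.Size([2, 7])
```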

  2. HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

    Human-Centric Video Generation (HCVG) methods seek to synthesize human videos from multimodal inputs, including text, image, and audio. Existing methods struggle to effectively coordinate these heterogeneous modalities due to two challenges: the scarcity of training data with paired triplet conditions and the difficulty of coordinating the sub-tasks of subject preservation and audio-visual sync given multimodal inputs. In this work, we present HuMo, a unified HCVG framework for collaborative multimodal control. For the first challenge, we construct a high-quality dataset with diverse and paired text, reference images, and audio. For the second challenge, we propose a two-stage progressive multimodal training paradigm with task-specific strategies. For the subject preservation task, to maintain the prompt following and visual generation abilities of the foundation model, we adopt a minimally invasive image injection strategy. For the audio-visual sync task, besides the commonly adopted audio cross-attention layer, we propose a focus-by-predicting strategy that implicitly guides the model to associate audio with facial regions. For joint learning of controllability across multimodal inputs, building on previously acquired capabilities, we progressively incorporate the audio-visual sync task. During inference, for flexible and fine-grained multimodal control, we design a time-adaptive Classifier-Free Guidance strategy that dynamically adjusts guidance weights across denoising steps. Extensive experimental results demonstrate that HuMo surpasses specialized state-of-the-art methods in sub-tasks, establishing a unified framework for collaborative multimodal-conditioned HCVG. Project Page: https://phantom-video.github.io/HuMo.
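
    The abstract's time-adaptive classifier-free guidance can be pictured as ordinary CFG whose per-condition weights change with the denoising step. The linear schedule and the specific weight values below are illustrative placeholders, not HuMo's published settings.

```python
import torch

def time_adaptive_cfg(eps_uncond, eps_text, eps_audio, t, T,
                      w_text=(7.5, 2.0), w_audio=(1.0, 4.0)):
    """Blend conditional noise predictions with step-dependent weights.

    t runs from T (most noisy) down to 0; each guidance weight is linearly
    interpolated between its (early, late) value, so e.g. audio conditioning
    can be emphasized later in denoising.
    """
    alpha = t / T                                        # 1.0 early, 0.0 late
    wt = alpha * w_text[0] + (1 - alpha) * w_text[1]
    wa = alpha * w_audio[0] + (1 - alpha) * w_audio[1]
    return (eps_uncond
            + wt * (eps_text - eps_uncond)
            + wa * (eps_audio - eps_uncond))

eps = [torch.randn(1, 4, 8, 8) for _ in range(3)]         # toy noise predictions
out = time_adaptive_cfg(*eps, t=800, T=1000)
print(out.shape)                                           # torch.Size([1, 4, 8, 8])
```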

  3. SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

    Vision-Language-Action (VLA) models have recently emerged as a powerful paradigm for robotic manipulation. Despite substantial progress enabled by large-scale pretraining and supervised fine-tuning (SFT), these models face two fundamental challenges: (i) the scarcity and high cost of large-scale human-operated robotic trajectories required for SFT scaling, and (ii) limited generalization to tasks involving distribution shift. Recent breakthroughs in Large Reasoning Models (LRMs) demonstrate that reinforcement learning (RL) can dramatically enhance step-by-step reasoning capabilities, raising a natural question: Can RL similarly improve the long-horizon step-by-step action planning of VLA? In this work, we introduce SimpleVLA-RL, an efficient RL framework tailored for VLA models. Building upon veRL, we introduce VLA-specific trajectory sampling, scalable parallelization, multi-environment rendering, and optimized loss computation. When applied to OpenVLA-OFT, SimpleVLA-RL achieves SoTA performance on LIBERO and even outperforms pi_0 on RoboTwin 1.0 & 2.0 with the exploration-enhancing strategies we introduce. SimpleVLA-RL not only reduces dependence on large-scale data and enables robust generalization, but also remarkably surpasses SFT in real-world tasks. Moreover, we identify a novel phenomenon, "pushcut", during RL training, wherein the policy discovers patterns beyond those seen in the previous training process. Github: https://github.com/PRIME-RL/SimpleVLA-RL

  4. EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

    Speech-to-speech large language models (SLLMs) are attracting increasing attention. Derived from text-based large language models (LLMs), SLLMs often exhibit degradation in knowledge and reasoning capabilities. We hypothesize that this limitation arises because current training paradigms for SLLMs fail to bridge the acoustic-semantic gap in the feature representation space. To address this issue, we propose EchoX, which leverages semantic representations and dynamically generates speech training targets. This approach integrates both acoustic and semantic learning, enabling EchoX to preserve strong reasoning abilities as a speech LLM. Experimental results demonstrate that EchoX, with about six thousand hours of training data, achieves advanced performance on multiple knowledge-based question-answering benchmarks. The project is available at https://github.com/FreedomIntelligence/EchoX.

  5. Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis

    Recent advances in audio-driven avatar video generation have significantly enhanced audio-visual realism. However, existing methods treat instruction conditioning merely as low-level tracking driven by acoustic or visual cues, without modeling the communicative purpose conveyed by the instructions. This limitation compromises their narrative coherence and character expressiveness. To bridge this gap, we introduce Kling-Avatar, a novel cascaded framework that unifies multimodal instruction understanding with photorealistic portrait generation. Our approach adopts a two-stage pipeline. In the first stage, we design a multimodal large language model (MLLM) director that produces a blueprint video conditioned on diverse instruction signals, thereby governing high-level semantics such as character motion and emotions. In the second stage, guided by blueprint keyframes, we generate multiple sub-clips in parallel using a first-last frame strategy. This global-to-local framework preserves fine-grained details while faithfully encoding the high-level intent behind multimodal instructions. Our parallel architecture also enables fast and stable generation of long-duration videos, making it suitable for real-world applications such as digital human livestreaming and vlogging. To comprehensively evaluate our method, we construct a benchmark of 375 curated samples covering diverse instructions and challenging scenarios. Extensive experiments demonstrate that Kling-Avatar is capable of generating vivid, fluent, long-duration videos at up to 1080p and 48 fps, achieving superior performance in lip synchronization accuracy, emotion and dynamic expressiveness, instruction controllability, identity preservation, and cross-domain generalization. These results establish Kling-Avatar as a new benchmark for semantically grounded, high-fidelity audio-driven avatar synthesis.
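
    A rough sketch of the global-to-local, first-last frame idea: adjacent blueprint keyframes from the MLLM director stage become (first, last) anchors of sub-clips, which are independent and can therefore be rendered in parallel. The generator call is a hypothetical stand-in, not Kling-Avatar's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_subclip(first_frame, last_frame):
    # Hypothetical stand-in for a video model conditioned on the first and
    # last frames of a sub-clip (audio/text conditioning omitted here).
    return {"start": first_frame, "end": last_frame, "frames": "..."}

def cascade_generate(blueprint_keyframes):
    # Pair consecutive blueprint keyframes and render the sub-clips in parallel.
    pairs = list(zip(blueprint_keyframes[:-1], blueprint_keyframes[1:]))
    with ThreadPoolExecutor() as pool:
        clips = list(pool.map(lambda p: generate_subclip(*p), pairs))
    return clips

print(len(cascade_generate(["kf0", "kf1", "kf2", "kf3"])))   # 3 sub-clips
```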

  6. Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

    In long-horizon tasks, recent agents based on Large Language Models (LLMs) face a significant challenge: sparse, outcome-based rewards make it difficult to assign credit to intermediate steps. Previous methods mainly focus on creating dense reward signals to guide learning, either through traditional reinforcement learning techniques like inverse reinforcement learning or by using Process Reward Models for step-by-step feedback. In this paper, we identify a fundamental problem in the learning dynamics of LLMs: the magnitude of policy gradients is inherently coupled with the entropy, which leads to inefficiently small updates for confident correct actions and potentially destabilizing large updates for uncertain ones. To resolve this, we propose Entropy-Modulated Policy Gradients (EMPG), a framework that re-calibrates the learning signal based on step-wise uncertainty and the final task outcome. EMPG amplifies updates for confident correct actions, penalizes confident errors, and attenuates updates from uncertain steps to stabilize exploration. We further introduce a bonus term for future clarity that encourages agents to find more predictable solution paths. Through comprehensive experiments on three challenging agent tasks, WebShop, ALFWorld, and Deep Search, we demonstrate that EMPG achieves substantial performance gains and significantly outperforms strong policy gradient baselines. Project page is at https://empgseed-seed.github.io/
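
    A toy illustration of the entropy-modulated update described above: per-step gradients are scaled by a confidence factor derived from the policy's entropy, so confident steps (correct or wrong, via the advantage's sign) act at nearly full strength while uncertain steps are attenuated. The exponential modulation is an assumption, not the paper's exact formula, and the future-clarity bonus is omitted.

```python
import torch

def empg_loss(logprobs, advantages, entropies, k=1.0):
    """REINFORCE-style loss with entropy-modulated step weights.

    logprobs, advantages, entropies: (num_steps,) tensors for one trajectory.
    Low entropy -> confidence near 1 -> the step's advantage (positive for
    correct actions, negative for errors) is applied almost fully;
    high entropy -> the update is attenuated to stabilize exploration.
    """
    confidence = torch.exp(-k * entropies).detach()      # in (0, 1]
    return -(confidence * advantages * logprobs).mean()

steps = 16
loss = empg_loss(torch.randn(steps, requires_grad=True),
                 torch.randn(steps), torch.rand(steps))
loss.backward()
```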

  7. FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

    The advancement of open-source text-to-image (T2I) models has been hindered by the absence of large-scale, reasoning-focused datasets and comprehensive evaluation benchmarks, resulting in a performance gap compared to leading closed-source systems. To address this challenge, we introduce FLUX-Reason-6M and PRISM-Bench (Precise and Robust Image Synthesis Measurement Benchmark). FLUX-Reason-6M is a massive dataset consisting of 6 million high-quality FLUX-generated images and 20 million bilingual (English and Chinese) descriptions specifically designed to teach complex reasoning. The images are organized according to six key characteristics: Imagination, Entity, Text rendering, Style, Affection, and Composition, and come with explicit Generation Chain-of-Thought (GCoT) annotations that provide detailed breakdowns of image generation steps. The whole data curation took 15,000 A100 GPU days, providing the community with a resource previously unattainable outside of large industrial labs. PRISM-Bench offers a novel evaluation standard with seven distinct tracks, including a formidable Long Text challenge using GCoT. Through carefully designed prompts, it utilizes advanced vision-language models for nuanced human-aligned assessment of prompt-image alignment and image aesthetics. Our extensive evaluation of 19 leading models on PRISM-Bench reveals critical performance gaps and highlights specific areas requiring improvement. Our dataset, benchmark, and evaluation code are released to catalyze the next wave of reasoning-oriented T2I generation. Project page: https://flux-reason-6m.github.io/.

  8. Can Understanding and Generation Truly Benefit Together -- or Just Coexist?

    In this paper, we introduce an insightful paradigm through the Auto-Encoder lens: understanding as the encoder (I2T) that compresses images into text, and generation as the decoder (T2I) that reconstructs images from that text. Using reconstruction fidelity as the unified training objective, we enforce the coherent bidirectional information flow between the understanding and generation processes, bringing mutual gains. To implement this, we propose UAE, a novel framework for unified multimodal learning. We begin by pre-training the decoder with large-scale long-context image captions to capture fine-grained semantic and complex spatial relationships. We then propose Unified-GRPO via reinforcement learning (RL), which covers three stages: (1) A cold-start phase to gently initialize both encoder and decoder with a semantic reconstruction loss; (2) Generation for Understanding, where the encoder is trained to generate informative captions that maximize the decoder's reconstruction quality, enhancing its visual understanding; (3) Understanding for Generation, where the decoder is refined to reconstruct from these captions, forcing it to leverage every detail and improving its long-context instruction following and generation fidelity. For evaluation, we introduce Unified-Bench, the first benchmark tailored to assess the degree of unification of unified multimodal models (UMMs). A surprising "aha moment" arises within the multimodal learning domain: as RL progresses, the encoder autonomously produces more descriptive captions, while the decoder simultaneously demonstrates a profound ability to understand these intricate descriptions, resulting in reconstructions of striking fidelity.
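
    A bare-bones sketch of the auto-encoder view: the understanding model (I2T) acts as the encoder producing a caption representation, the generation model (T2I) acts as the decoder reconstructing the image, and a reconstruction-fidelity loss couples them. Both tiny modules and the pixel-space MSE loss are placeholders, not UAE's actual components or objective.

```python
import torch
import torch.nn as nn

class TinyI2T(nn.Module):                 # placeholder "understanding" encoder
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, dim))

    def forward(self, image):             # image -> caption-like embedding
        return self.net(image)

class TinyT2I(nn.Module):                 # placeholder "generation" decoder
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Linear(dim, 3 * 16 * 16)

    def forward(self, caption):           # caption-like embedding -> image
        return self.net(caption).view(-1, 3, 16, 16)

encoder, decoder = TinyI2T(), TinyT2I()
image = torch.rand(4, 3, 16, 16)
recon = decoder(encoder(image))
loss = nn.functional.mse_loss(recon, image)   # reconstruction fidelity as the shared objective
loss.backward()
```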

  9. AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs

    Large Audio Language Models (LALMs) are rapidly advancing, but evaluating them remains challenging due to inefficient toolkits that limit fair comparison and systematic assessment. Current frameworks suffer from three critical issues: slow processing that bottlenecks large-scale studies, inconsistent prompting that hurts reproducibility, and narrow task coverage that misses important audio reasoning capabilities. We introduce AU-Harness, an efficient and comprehensive evaluation framework for LALMs. Our system achieves a speedup of up to 127% over existing toolkits through optimized batch processing and parallel execution, enabling large-scale evaluations previously impractical. We provide standardized prompting protocols and flexible configurations for fair model comparison across diverse scenarios. Additionally, we introduce two new evaluation categories: LLM-Adaptive Diarization for temporal audio understanding and Spoken Language Reasoning for complex audio-based cognitive tasks. Through evaluation across 380+ tasks, we reveal significant gaps in current LALMs, particularly in temporal understanding and complex spoken language reasoning tasks. Our findings also highlight a lack of standardization in instruction modality across audio benchmarks, which can lead to performance differences of up to 9.5 absolute points on challenging complex instruction-following downstream tasks. AU-Harness provides both practical evaluation tools and insights into model limitations, advancing systematic LALM development.

  10. SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

    Significant progress has been made in spatial intelligence, spanning both spatial reconstruction and world exploration. However, the scalability and real-world fidelity of current models remain severely constrained by the scarcity of large-scale, high-quality training data. While several datasets provide camera pose information, they are typically limited in scale, diversity, and annotation richness, particularly for real-world dynamic scenes with ground-truth camera motion. To this end, we collect SpatialVID, a dataset consisting of a large corpus of in-the-wild videos with diverse scenes, camera movements and dense 3D annotations such as per-frame camera poses, depth, and motion instructions. Specifically, we collect more than 21,000 hours of raw video, and process them into 2.7 million clips through a hierarchical filtering pipeline, totaling 7,089 hours of dynamic content. A subsequent annotation pipeline enriches these clips with detailed spatial and semantic information, including camera poses, depth maps, dynamic masks, structured captions, and serialized motion instructions. Analysis of SpatialVID's data statistics reveals a richness and diversity that directly foster improved model generalization and performance, establishing it as a key asset for the video and 3D vision research community.

  11. mmBERT: A Modern Multilingual Encoder with Annealed Language Learning

    Encoder-only language models are frequently used for a variety of standard machine learning tasks, including classification and retrieval. However, there has been a lack of recent research on encoder models, especially with respect to multilingual models. We introduce mmBERT, an encoder-only language model pretrained on 3T tokens of multilingual text in over 1800 languages. To build mmBERT we introduce several novel elements, including an inverse mask ratio schedule and an inverse temperature sampling ratio. We add over 1700 low-resource languages to the data mix only during the decay phase, showing that it boosts performance dramatically and maximizes the gains from the relatively small amount of training data. Despite only including these low-resource languages in the short decay phase we achieve similar classification performance to models like OpenAI's o3 and Google's Gemini 2.5 Pro. Overall, we show that mmBERT significantly outperforms the previous generation of models on classification and retrieval tasks, on both high- and low-resource languages.
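
    A small sketch of what an "inverse mask ratio schedule" and "inverse temperature sampling ratio" could look like in practice: the masking rate decays as pretraining progresses, and the temperature used to sample languages flattens the distribution late in training so low-resource languages gain share. The functional forms and constants are assumptions, not mmBERT's actual schedules.

```python
def mask_ratio(step, total_steps, start=0.30, end=0.05):
    """Masking rate that decays from start to end as training progresses."""
    progress = step / total_steps
    return start + (end - start) * progress

def language_sampling_probs(token_counts, step, total_steps,
                            tau_start=0.7, tau_end=0.3):
    """Sample languages proportional to counts**tau; a shrinking tau flattens
    the distribution, upweighting low-resource languages later in training."""
    progress = step / total_steps
    tau = tau_start + (tau_end - tau_start) * progress
    weights = {lang: n ** tau for lang, n in token_counts.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}

counts = {"en": 1_000_000, "sw": 10_000, "yo": 1_000}      # toy token counts
print(mask_ratio(900, 1000))                                # late-training mask rate
print(language_sampling_probs(counts, 900, 1000))
```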

  12. Visual Programmability: A Guide for Code-as-Thought in Chart Understanding

    Chart understanding presents a critical test to the reasoning capabilities of Vision-Language Models (VLMs). Prior approaches face critical limitations: some rely on external tools, making them brittle and constrained by a predefined toolkit, while others fine-tune specialist models that often adopt a single reasoning strategy, such as text-based chain-of-thought (CoT). The intermediate steps of text-based reasoning are difficult to verify, which complicates the use of reinforcement-learning signals that reward factual accuracy. To address this, we propose a Code-as-Thought (CaT) approach to represent the visual information of a chart in a verifiable, symbolic format. Our key insight is that this strategy must be adaptive: a fixed, code-only implementation consistently fails on complex charts where symbolic representation is unsuitable. This finding leads us to introduce Visual Programmability: a learnable property that determines if a chart-question pair is better solved with code or direct visual analysis. We implement this concept in an adaptive framework where a VLM learns to choose between the CaT pathway and a direct visual reasoning pathway. The selection policy of the model is trained with reinforcement learning using a novel dual-reward system. This system combines a data-accuracy reward to ground the model in facts and prevent numerical hallucination, with a decision reward that teaches the model when to use each strategy, preventing it from defaulting to a single reasoning mode. Experiments demonstrate strong and robust performance across diverse chart-understanding benchmarks. Our work shows that VLMs can be taught not only to reason but also how to reason, dynamically selecting the optimal reasoning pathway for each task.
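
    A schematic version of the dual-reward idea above: one term grounds the answer in the chart's data, the other rewards choosing the right pathway (code vs. direct visual reasoning) for the example. The weights and the simple exact-match check are illustrative assumptions, not the paper's reward definition.

```python
def dual_reward(pred_answer, gold_answer, chosen_path, better_path,
                w_acc=1.0, w_decision=0.5):
    """Combined RL reward for chart reasoning.

    pred_answer / gold_answer: model answer vs. ground truth (data-accuracy term).
    chosen_path / better_path: "code" or "visual" (decision term), where
    better_path marks which strategy is labeled preferable for this chart.
    """
    r_acc = 1.0 if str(pred_answer).strip() == str(gold_answer).strip() else 0.0
    r_dec = 1.0 if chosen_path == better_path else 0.0
    return w_acc * r_acc + w_decision * r_dec

print(dual_reward("42", "42", chosen_path="code", better_path="code"))    # 1.5
print(dual_reward("41", "42", chosen_path="visual", better_path="code"))  # 0.0
```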

  13. Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes

    Understanding 3D spatial relationships remains a major limitation of current Vision-Language Models (VLMs). Prior work has addressed this issue by creating spatial question-answering (QA) datasets based on single images or indoor videos. However, real-world embodied AI agents such as robots and self-driving cars typically rely on ego-centric, multi-view observations. To this end, we introduce Ego3D-Bench, a new benchmark designed to evaluate the spatial reasoning abilities of VLMs using ego-centric, multi-view outdoor data. Ego3D-Bench comprises over 8,600 QA pairs, created with significant involvement from human annotators to ensure quality and diversity. We benchmark 16 SOTA VLMs, including GPT-4o, Gemini 1.5 Pro, InternVL3, and Qwen2.5-VL. Our results reveal a notable gap between human-level scores and VLM performance, highlighting that current VLMs still fall short of human-level spatial understanding. To bridge this gap, we propose Ego3D-VLM, a post-training framework that enhances 3D spatial reasoning of VLMs. Ego3D-VLM generates a cognitive map based on estimated global 3D coordinates, resulting in a 12% average improvement on multi-choice QA and a 56% average improvement on absolute distance estimation. Ego3D-VLM is modular and can be integrated with any existing VLM. Together, Ego3D-Bench and Ego3D-VLM offer valuable tools for advancing toward human-level spatial understanding in real-world, multi-view environments.

  14. 2D Gaussian Splatting with Semantic Alignment for Image Inpainting

    Gaussian Splatting (GS), a recent technique for converting discrete points into continuous spatial representations, has shown promising results in 3D scene modeling and 2D image super-resolution. In this paper, we explore its untapped potential for image inpainting, which demands both locally coherent pixel synthesis and globally consistent semantic restoration. We propose the first image inpainting framework based on 2D Gaussian Splatting, which encodes incomplete images into a continuous field of 2D Gaussian splat coefficients and reconstructs the final image via a differentiable rasterization process. The continuous rendering paradigm of GS inherently promotes pixel-level coherence in the inpainted results. To improve efficiency and scalability, we introduce a patch-wise rasterization strategy that reduces memory overhead and accelerates inference. For global semantic consistency, we incorporate features from a pretrained DINO model. We observe that DINO's global features are naturally robust to small missing regions and can be effectively adapted to guide semantic alignment in large-mask scenarios, ensuring that the inpainted content remains contextually consistent with the surrounding scene. Extensive experiments on standard benchmarks demonstrate that our method achieves competitive performance in both quantitative metrics and perceptual quality, establishing a new direction for applying Gaussian Splatting to 2D image processing.
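
    To make the "continuous field of 2D Gaussian splat coefficients" concrete, here is a tiny dense rasterizer that renders a handful of 2D Gaussians (position, anisotropic scale, color, opacity) into an image as a normalized weighted sum of their footprints. It ignores the paper's patch-wise strategy, differentiability, and DINO guidance and is only a toy illustration.

```python
import numpy as np

def rasterize_2d_gaussians(params, H=64, W=64):
    """params: list of dicts with keys mu (x, y), sigma (sx, sy), color (r, g, b), alpha.
    Returns an (H, W, 3) image from a normalized weighted sum of Gaussian footprints."""
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float32)
    img = np.zeros((H, W, 3), dtype=np.float32)
    weight = np.zeros((H, W), dtype=np.float32)
    for g in params:
        (mx, my), (sx, sy) = g["mu"], g["sigma"]
        footprint = g["alpha"] * np.exp(-0.5 * (((xs - mx) / sx) ** 2
                                                + ((ys - my) / sy) ** 2))
        img += footprint[..., None] * np.asarray(g["color"], dtype=np.float32)
        weight += footprint
    return img / np.clip(weight, 1e-6, None)[..., None]

splats = [{"mu": (20, 30), "sigma": (6, 3), "color": (1, 0, 0), "alpha": 0.8},
          {"mu": (45, 40), "sigma": (4, 8), "color": (0, 0, 1), "alpha": 0.6}]
print(rasterize_2d_gaussians(splats).shape)        # (64, 64, 3)
```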

  15. Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval

    Although Contrastive Language-Image Pre-training (CLIP) exhibits strong performance across diverse vision tasks, its application to person representation learning faces two critical challenges: (i) the scarcity of large-scale annotated vision-language data focused on person-centric images, and (ii) the inherent limitations of global contrastive learning, which struggles to maintain discriminative local features crucial for fine-grained matching while remaining vulnerable to noisy text tokens. This work advances CLIP for person representation learning through synergistic improvements in data curation and model architecture. First, we develop a noise-resistant data construction pipeline that leverages the in-context learning capabilities of MLLMs to automatically filter and caption web-sourced images. This yields WebPerson, a large-scale dataset of 5M high-quality person-centric image-text pairs. Second, we introduce the GA-DMS (Gradient-Attention Guided Dual-Masking Synergetic) framework, which improves cross-modal alignment by adaptively masking noisy textual tokens based on the gradient-attention similarity score. Additionally, we incorporate masked token prediction objectives that compel the model to predict informative text tokens, enhancing fine-grained semantic representation learning. Extensive experiments show that GA-DMS achieves state-of-the-art performance across multiple benchmarks.
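
    A toy rendering of the gradient-attention masking step: per-token scores combine attention weights with gradient magnitudes, and the lowest-scoring (likely noisy) text tokens are masked before the alignment loss. The score definition and the fraction masked are assumptions for illustration, not GA-DMS's exact procedure.

```python
import torch

def gradient_attention_mask(attn, grads, mask_frac=0.2):
    """attn, grads: (B, L) per-token attention weights and gradient norms.
    Returns a 0/1 mask with the lowest-scoring mask_frac of tokens per
    sequence zeroed out (treated as noisy)."""
    score = attn * grads                                  # gradient-attention similarity score
    k = max(1, int(mask_frac * attn.size(1)))
    idx = score.topk(k, dim=1, largest=False).indices     # least informative tokens
    mask = torch.ones_like(score)
    mask.scatter_(1, idx, 0.0)
    return mask

attn, grads = torch.rand(2, 10), torch.rand(2, 10)
print(gradient_attention_mask(attn, grads))
```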

Solidot (15)

  1. AirPods live translation will not be available in Europe or mainland China for now

    At its press event this week, Apple introduced new AirPods with improved active noise cancellation, a custom heart-rate sensor, and support for live translation. The translation feature requires an iPhone 15 Pro or newer running iOS 26. When the AirPods go on sale, however, live translation will not be offered to users in the EU or in mainland China, and Apple has given little explanation. Apple China did say that China's three major carriers will provide eSIM support for the iPhone Air.

  2. 99% of eels consumed worldwide are endangered species

    A research team from Japan's Chuo University and National Taiwan University found that more than 99% of eels consumed globally belong to three endangered species. The widely eaten species are the American eel, the Japanese eel, and the European eel, all assessed as endangered by the IUCN. Because the global eel trade is largely opaque, actual trade volumes are hard to track, and this study offers clues to the real picture. The team ran genetic tests on 282 processed and fresh eel products purchased between 2023 and 2025 in 26 cities across 11 countries and regions in Asia, Europe, the Americas, and Oceania to identify the species: 154 were American eel, 120 Japanese eel, 4 European eel, and 1 Indonesian shortfin eel, while 3 could not be analyzed. Combining these results with national production figures, trade statistics, and market size, the researchers estimate the global flow at 75.3% American eel, 18.0% Japanese eel, and 6.7% European eel. By country, China accounts for roughly 60% and Japan about 19% of the flow, so East Asia likely accounts for the majority.

  3. Methane gas detected on the dwarf planet Makemake

    Using the James Webb Space Telescope (JWST), astronomers have for the first time detected methane gas on the distant dwarf planet Makemake. The result overturns the view that Makemake is merely a frozen body and makes it, after Pluto, the second trans-Neptunian object confirmed to have a tenuous gas envelope. Makemake was discovered in 2005 by a Caltech team; with a radius of about 715 km it is slightly smaller and dimmer than Pluto and takes 305 years to orbit the Sun. Earlier stellar occultation observations showed no obvious atmosphere, though a thin one could not be ruled out, and infrared data revealed odd thermal anomalies in the surface methane ice, hinting at local hot spots that might release gas. The team notes that Makemake is one of the largest icy trans-Neptunian objects found so far, with a surface dominated by methane ice. The recent JWST observations show a thin layer of methane gas above that icy surface, indicating the interior is not dead but still changing. Makemake may have a very tenuous atmosphere similar to Pluto's, or the gas may come from more transient, localized geological activity such as cryovolcanic plumes or comet-like sublimation.

  4. Octopuses have preferred arms

    Octopuses can perform tasks with any arm, but they tend to favor one or a few arms for particular tasks. An octopus arm is a complex structure of four distinct muscle groups, transverse, longitudinal, oblique, and circular, arranged around a central nerve cord; these let the arm deform in different ways and produce a range of movements for behaviors from hunting and locomotion to self-defense. How wild octopuses use and coordinate their arms had not been understood. Researchers analyzed 25 one-minute videos of wild octopuses filmed in the Atlantic and Caribbean between 2007 and 2015. They found that all the octopuses could deform all eight arms in four different ways and could perform every movement with every arm. Arms on both sides of the body were used equally, but the front four arms were used far more than the back four (64% vs. 36%). Front arms were more likely to be used for exploring the surroundings, while back arms were more often used for getting around. Two movements in particular relied on the back arms: rolling, in which an arm moves along the seabed beneath the octopus like a conveyor belt, and stilt-walking, in which arms extend straight down to raise the body.

  5. Windows developers can now publish apps to the Microsoft Store for free

    Microsoft announced on its official blog that developers in nearly 200 countries can publish apps to the Microsoft Store for free with nothing more than a personal Microsoft account. Previously, publishing to the Microsoft Store required a one-time $19 fee; Apple's App Store is the most expensive, charging developers $99 per year, while Google charges a one-time $25 registration fee. Microsoft says the Microsoft Store has more than 250 million monthly active users. Developers can publish different kinds of apps, including Win32, UWP, PWA, .NET, MAUI, and Electron, and are even allowed to use their own in-app transaction systems and keep 100% of non-game app revenue. One reason Microsoft is being so generous is that its store holds no monopoly on the Windows platform and needs to attract developers to use it.

  6. Vimeo sold to Bending Spoons for $1.38 billion

    Vimeo, a video site that once competed with YouTube, is being sold to the Italian company Bending Spoons for $1.38 billion in cash. Vimeo (VMEO) will be taken private; the deal is expected to close in early 2026, with Vimeo shareholders receiving $7.85 per share in cash. Founded in 2004, Vimeo was one of the earliest video-sharing platforms and was known for high-quality video, but in recent years its market share has been eroded by YouTube and others, and it has focused more on serving business customers. Bending Spoons acquired the well-known note-taking app Evernote in 2023 and then laid off all of its employees (not for the first time). The acquisition may not be good news for Vimeo users.

  7. Firefox gains support for playing MKV content

    The latest Firefox Nightly builds finally add playback support for Matroska (MKV) content, a feature users requested eight years ago. MKV support is currently limited to Nightly, or users can enable it by flipping the media.mkv.enabled preference. MKV playback currently supports only the AVC/H.264 and AAC codecs; other codecs will be added over time.

  8. openSUSE to disable bcachefs

    After Linus Torvalds marked the Bcachefs filesystem as externally maintained, the openSUSE project announced that it will disable Bcachefs starting with Linux 6.17. Linux 6.17 no longer accepts patches from Bcachefs maintainer Kent Overstreet, and the openSUSE project will not maintain the filesystem itself. Linux 6.16 is unaffected. openSUSE says that if the Bcachefs maintainer reconciles with the kernel's creator and the kernel resumes merging the relevant patches, it will re-enable the filesystem.

  9. NASA bars Chinese citizens from its space programs

    NASA has barred Chinese citizens holding valid visas from entering its facilities and participating in its space programs. Chinese citizens working on NASA projects as contractors or students found on September 5 that they could no longer access any NASA systems or facilities. NASA subsequently confirmed that it had excluded Chinese citizens on national security grounds: "NASA has taken internal measures regarding Chinese citizens, including restricting their access to our facilities, materials, and networks, to ensure the security of our work." China and the US are both racing to return to the Moon, while the US Artemis lunar program faces cost overruns and delays.

  10. Why Netflix struggles to make high-quality films

    In February Netflix released The Electric State, a widely panned science-fiction film starring Chris Pratt and Millie Bobby Brown, who plays Eleven in Stranger Things. The film would have been quickly forgotten had it not cost $320 million to make. That $320 million bought Netflix a Metacritic score of 30/100 and a Rotten Tomatoes score of 14%. To fill its catalog, Netflix has funded a stream of low-quality original films; it has made some high-quality ones such as The Irishman, but on review sites like IMDb, Letterboxd, and TMDB, Netflix films score well below theatrical releases. Netflix has worked with acclaimed directors including Martin Scorsese, Alfonso Cuarón, and Bradley Cooper, but most of those projects were one-offs and the directors rarely return. Many directors now turn Netflix down even when it offers a bigger budget: Weapons director Zach Cregger rejected Netflix's $50 million budget in favor of Warner Bros.' $37 million and a guaranteed theatrical release, and Netflix offered $150 million for Emerald Fennell and Margot Robbie's adaptation of Wuthering Heights, yet they still chose Warner Bros.' $80 million budget and theatrical release guarantee.

  11. Gravitational waves confirm Hawking's black hole area theorem

    The Laser Interferometer Gravitational-Wave Observatory (LIGO) detected an unusually strong collision between two black holes, allowing physicists to test the black hole area theorem Stephen Hawking proposed in 1971. The theorem states that when two black holes merge, the event horizon of the resulting black hole, the boundary beyond which not even light can escape, cannot have an area smaller than the sum of the areas of the two original horizons. It echoes the second law of thermodynamics, which says that entropy, the disorder within a system, never decreases. Black hole mergers warp the fabric of the universe, producing tiny ripples in spacetime known as gravitational waves that detectors can observe. The recent collision, designated GW250114, is almost identical to the first gravitational-wave event observed in 2015: in both cases the black holes had masses of 30 to 40 solar masses and were about 1.3 billion light-years away. The upgraded LIGO detectors are now three times as sensitive as in 2015, so they captured the waves from the collision in unprecedented detail, allowing researchers to confirm by calculation that the merged horizon's area did indeed grow and thereby verify Hawking's theorem.
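
    For reference, the theorem can be stated compactly; the area formula below is for a non-spinning (Schwarzschild) black hole, the simplest case, whereas the merging holes here are spinning:

```latex
% Hawking's area theorem for a merger: the final horizon area is no smaller
% than the sum of the initial horizon areas.
A_{\mathrm{final}} \;\ge\; A_1 + A_2,
\qquad
A = 16\pi \left(\frac{G M}{c^{2}}\right)^{2} \quad \text{(non-spinning case)}
```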

  12. French voice actress accuses Tomb Raider 4-6 Remastered of using AI to synthesize her voice

    Françoise Cadol, the French voice of the Tomb Raider series, has sent a cease-and-desist letter to Aspyr, developer of Tomb Raider 4-6 Remastered, accusing it of using AI to copy her voice without notifying her or telling players. She described the move as a betrayal and an act of utter disrespect. Beyond French, players in regions such as Brazil and Spain also believe the dubbing in their languages was AI-generated, with AI synthesizing the original voice actors' voices. Brazilian voice actress Lene Bastos received a reply from Aspyr saying its investigation found that an external development partner had used generative AI to edit the original recordings without its knowledge; Aspyr said it had not authorized this and apologized for failing to catch the problem in review.

  13. Xiaohongshu ordered to rectify within a deadline

    China's cyberspace regulator announced in a brief statement that Xiaohongshu, a social app with a predominantly female user base, has been ordered to rectify problems within a set deadline: "Because the Xiaohongshu platform failed to fulfill its primary responsibility for information content management, with its trending-search list repeatedly surfacing undesirable content such as entries hyping celebrities' personal updates and trivia, damaging the online ecosystem, the Cyberspace Administration of China directed the Shanghai cyberspace administration, in accordance with the Provisions on the Governance of the Online Information Content Ecosystem and other regulations, to summon the platform for talks, order rectification within a deadline, issue a warning, and strictly punish those responsible."

  14. Oracle shares surge, making Larry Ellison the world's richest person

    Oracle shares posted their best single-day performance since 1992, surging 36% to $328 and adding $244 billion in market value, approaching the trillion-dollar mark. The jump was driven by surging AI-fueled demand for cloud computing. The rally also added $100 billion to founder Larry Ellison's fortune, lifting him past Elon Musk to become the world's richest person.

  15. Study finds beer drinkers are highly attractive to mosquitoes

    A study conducted at a music festival found that people who drink beer are highly attractive to mosquitoes; the report was posted on the preprint server bioRxiv. The study is largely anecdotal in nature. There is already extensive research on how mosquitoes track humans: the body gives off a distinctive signature of odor, heat, and carbon dioxide, and mosquitoes sense it through more than one mechanism. In this study, researchers brought thousands of female Anopheles mosquitoes to the annual Lowlands music festival in the Netherlands, set up a temporary lab, and recruited 500 volunteers, who filled out a questionnaire about their eating and drinking habits during the festival and then held an arm against a special cage full of mosquitoes; the mosquitoes could smell the person but could not bite. Cameras recorded how many mosquitoes landed on each volunteer's arm, compared with the number landing on a sugar feeder on the other side of the cage. Beer drinkers attracted 1.35 times as many mosquitoes as non-drinkers; people who had shared a bed with someone the night before were also more attractive; and those who had showered or applied sunscreen were less attractive. The researchers deadpanned that mosquitoes are only interested in hedonists.