OrangeBot.AI Digest — 2025-09-13
60 headlines across 4 sources, aggregated for the day.
Hacker News(15)
- How 'overworked, underpaid' humans train Google's AI to seem smart (www.theguardian.com)
- Java 25's new CPU-Time Profiler (1) (mostlynerdless.de)
- AI Coding (geohot.github.io)
- Social media promised connection, but it has delivered exhaustion (www.noemamag.com)
- Nepal picks a new prime minister on a Discord server days after social media ban (www.nytimes.com)
- Raspberry Pi Synthesizers – How the Pi is transforming synths (www.gearnews.com)
- SkiftOS: A hobby OS built from scratch using C/C++ for ARM, x86, and RISC-V (skiftos.org)
- How to Use Claude Code Subagents to Parallelize Development (zachwills.net)
- Legal win (ma.tt)
- California lawmakers pass SB 79, housing bill that brings dense housing (www.latimes.com)
- Life, work, death and the peasant: Rent and extraction (acoup.blog)
- Meow: Yet another modal editing on Emacs (github.com)
- FFglitch, FFmpeg fork for glitch art (ffglitch.org)
- Proton Mail suspended journalist accounts at request of cybersecurity agency (theintercept.com)
- I used standard Emacs extension-points to extend org-mode (edoput.it)
GitHub Trending(15)
- PowerShell / PowerShell
PowerShell for every system!
- trueadm / ripple
the elegant TypeScript UI framework
- MotiaDev / motia
Modern Backend Framework that unifies APIs, background jobs, workflows, and AI Agents into a single core primitive with built-in observability and state management.
- grpc / grpc-go
The Go language implementation of gRPC. HTTP/2 based RPC
- sentient-agi / ROMA
Recursive-Open-Meta-Agent v0.1 (Beta). A meta-agent framework to build high-performance multi-agent systems.
- ReVanced / revanced-patches
🧩 Patches for ReVanced
- Azure / azure-sdk-for-python
This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
- CodebuffAI / codebuff
Generate code from the terminal!
- protocolbuffers / protobuf
Protocol Buffers - Google's data interchange format
- facebook / folly
An open-source C++ library developed and used at Facebook.
- huggingface / transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
- datawhalechina / happy-llm
📚 A from-scratch tutorial on the principles and practice of large language models
- simdjson / simdjson
Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
- fla-org / flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models
- NVIDIA / garak
the LLM vulnerability scanner
Hugging Face(15)
- VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Vision-Language-Action (VLA) models typically bridge the gap between perceptual and action spaces by pre-training a large-scale Vision-Language Model (VLM) on robotic data. While this approach greatly enhances performance, it also incurs significant training costs. In this paper, we investigate how to effectively bridge vision-language (VL) representations to action (A). We introduce VLA-Adapter, a novel paradigm designed to reduce the reliance of VLA models on large-scale VLMs and extensive pre-training. To this end, we first systematically analyze the effectiveness of various VL conditions and present key findings on which conditions are essential for bridging perception and action spaces. Based on these insights, we propose a lightweight Policy module with Bridge Attention, which autonomously injects the optimal condition into the action space. In this way, our method achieves high performance using only a 0.5B-parameter backbone, without any robotic data pre-training. Extensive experiments on both simulated and real-world robotic benchmarks demonstrate that VLA-Adapter not only achieves state-of-the-art level performance, but also offers the fastest inference speed reported to date. Furthermore, thanks to the proposed advanced bridging paradigm, VLA-Adapter enables the training of a powerful VLA model in just 8 hours on a single consumer-grade GPU, greatly lowering the barrier to deploying the VLA model. Project page: https://vla-adapter.github.io/.
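To make the bridging idea concrete, here is a minimal sketch of a Bridge-Attention-style policy head in which learned action queries cross-attend to vision-language features and a gate controls how much conditioning is injected; the module name, dimensions, gating, and pooling are all assumptions for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of a "Bridge Attention" policy head: action queries
# cross-attend to vision-language (VL) features and a learned gate controls
# how much VL conditioning is injected. Shapes and names are illustrative only.
import torch
import torch.nn as nn

class BridgeAttentionPolicy(nn.Module):
    def __init__(self, vl_dim=896, act_dim=7, n_queries=8, d_model=256, n_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, d_model) * 0.02)
        self.vl_proj = nn.Linear(vl_dim, d_model)            # project VL features
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))              # start with no injection
        self.head = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                  nn.Linear(d_model, act_dim))

    def forward(self, vl_feats):                              # vl_feats: (B, T, vl_dim)
        B = vl_feats.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)       # (B, n_queries, d_model)
        kv = self.vl_proj(vl_feats)
        bridged, _ = self.attn(q, kv, kv)                     # queries attend to VL tokens
        q = q + torch.tanh(self.gate) * bridged               # gated condition injection
        return self.head(q.mean(dim=1))                       # (B, act_dim) action output

policy = BridgeAttentionPolicy()
actions = policy(torch.randn(2, 64, 896))                     # two samples, 64 VL tokens
print(actions.shape)                                          # torch.Size([2, 7])
```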
- HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
Human-Centric Video Generation (HCVG) methods seek to synthesize human videos from multimodal inputs, including text, image, and audio. Existing methods struggle to effectively coordinate these heterogeneous modalities due to two challenges: the scarcity of training data with paired triplet conditions and the difficulty of coordinating the sub-tasks of subject preservation and audio-visual sync with multimodal inputs. In this work, we present HuMo, a unified HCVG framework for collaborative multimodal control. For the first challenge, we construct a high-quality dataset with diverse and paired text, reference images, and audio. For the second challenge, we propose a two-stage progressive multimodal training paradigm with task-specific strategies. For the subject preservation task, to maintain the prompt following and visual generation abilities of the foundation model, we adopt a minimally invasive image injection strategy. For the audio-visual sync task, besides the commonly adopted audio cross-attention layer, we propose a focus-by-predicting strategy that implicitly guides the model to associate audio with facial regions. For joint learning of controllability across multimodal inputs, building on previously acquired capabilities, we progressively incorporate the audio-visual sync task. During inference, for flexible and fine-grained multimodal control, we design a time-adaptive Classifier-Free Guidance strategy that dynamically adjusts guidance weights across denoising steps. Extensive experimental results demonstrate that HuMo surpasses specialized state-of-the-art methods in sub-tasks, establishing a unified framework for collaborative multimodal-conditioned HCVG. Project Page: https://phantom-video.github.io/HuMo.
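As a rough illustration of time-adaptive classifier-free guidance, the sketch below varies the text and audio guidance weights across denoising steps; the cosine schedule and the specific weight ranges are assumptions, since the abstract does not specify them.

```python
# Hypothetical time-adaptive classifier-free guidance: guidance weights for the
# text and audio conditions vary with the denoising step instead of being fixed.
import math

def guidance_weights(t, num_steps, w_text=(7.5, 2.0), w_audio=(1.0, 5.0)):
    """Interpolate each guidance weight from its early-step to its late-step value.
    t = 0 is the noisiest step, t = num_steps - 1 the final denoising step."""
    s = 0.5 - 0.5 * math.cos(math.pi * t / max(num_steps - 1, 1))  # ramps 0 -> 1
    text_w = w_text[0] + s * (w_text[1] - w_text[0])
    audio_w = w_audio[0] + s * (w_audio[1] - w_audio[0])
    return text_w, audio_w

def guided_eps(eps_uncond, eps_text, eps_audio, t, num_steps):
    """Combine unconditional and conditional noise predictions, CFG-style."""
    text_w, audio_w = guidance_weights(t, num_steps)
    return (eps_uncond
            + text_w * (eps_text - eps_uncond)
            + audio_w * (eps_audio - eps_uncond))

for t in (0, 25, 49):
    print(t, guidance_weights(t, 50))   # text weight falls, audio weight rises
```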
- SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Vision-Language-Action (VLA) models have recently emerged as a powerful paradigm for robotic manipulation. Despite substantial progress enabled by large-scale pretraining and supervised fine-tuning (SFT), these models face two fundamental challenges: (i) the scarcity and high cost of large-scale human-operated robotic trajectories required for SFT scaling, and (ii) limited generalization to tasks involving distribution shift. Recent breakthroughs in Large Reasoning Models (LRMs) demonstrate that reinforcement learning (RL) can dramatically enhance step-by-step reasoning capabilities, raising a natural question: Can RL similarly improve the long-horizon step-by-step action planning of VLA? In this work, we introduce SimpleVLA-RL, an efficient RL framework tailored for VLA models. Building upon veRL, we introduce VLA-specific trajectory sampling, scalable parallelization, multi-environment rendering, and optimized loss computation. When applied to OpenVLA-OFT, SimpleVLA-RL achieves SoTA performance on LIBERO and even outperforms pi_0 on RoboTwin 1.0 & 2.0 with the exploration-enhancing strategies we introduce. SimpleVLA-RL not only reduces dependence on large-scale data and enables robust generalization, but also remarkably surpasses SFT in real-world tasks. Moreover, we identify a novel phenomenon, "pushcut", during RL training, wherein the policy discovers patterns not seen at any earlier stage of training. Github: https://github.com/PRIME-RL/SimpleVLA-RL
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
Speech-to-speech large language models (SLLMs) are attracting increasing attention. Derived from text-based large language models (LLMs), SLLMs often exhibit degradation in knowledge and reasoning capabilities. We hypothesize that this limitation arises because current training paradigms for SLLMs fail to bridge the acoustic-semantic gap in the feature representation space. To address this issue, we propose EchoX, which leverages semantic representations and dynamically generates speech training targets. This approach integrates both acoustic and semantic learning, enabling EchoX to preserve strong reasoning abilities as a speech LLM. Experimental results demonstrate that EchoX, with about six thousand hours of training data, achieves advanced performance on multiple knowledge-based question-answering benchmarks. The project is available at https://github.com/FreedomIntelligence/EchoX.
- MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML
Large language models (LLMs) possess broad world knowledge and strong general-purpose reasoning ability, yet they struggle to learn from many in-context examples on standard machine learning (ML) tasks, that is, to leverage many-shot demonstrations purely via in-context learning (ICL) without gradient descent. We introduce MachineLearningLM, a portable continued-pretraining framework that equips a general-purpose LLM with robust in-context ML capability while preserving its general knowledge and reasoning for broader chat workflows. Our pretraining procedure synthesizes ML tasks from millions of structural causal models (SCMs), spanning shot counts up to 1,024. We begin with a random-forest teacher, distilling tree-based decision strategies into the LLM to strengthen robustness in numerical modeling. All tasks are serialized with a token-efficient prompt, enabling 3x to 6x more examples per context window and delivering up to 50x amortized throughput via batch inference. Despite a modest setup (Qwen-2.5-7B-Instruct with LoRA rank 8), MachineLearningLM outperforms strong LLM baselines (e.g., GPT-5-mini) by an average of about 15% on out-of-distribution tabular classification across finance, physics, biology, and healthcare domains. It exhibits a striking many-shot scaling law: accuracy increases monotonically as in-context demonstrations grow from 8 to 1,024. Without any task-specific training, it attains random-forest-level accuracy across hundreds of shots. General chat capabilities, including knowledge and reasoning, are preserved: it achieves 75.4% on MMLU.
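To show what a token-efficient many-shot prompt might look like, here is a hypothetical serialization of a small tabular task; the delimiters, rounding, and header format are assumptions rather than the format MachineLearningLM actually uses.

```python
# Hypothetical compact serialization of a many-shot tabular classification task
# for in-context learning: a short header, then one comma-separated line per example.
def serialize_task(feature_names, examples, query_row, round_to=3):
    lines = ["features: " + ",".join(feature_names), "examples:"]
    for row, label in examples:
        vals = ",".join(str(round(v, round_to)) for v in row)
        lines.append(f"{vals} -> {label}")
    lines.append("query: " + ",".join(str(round(v, round_to)) for v in query_row))
    lines.append("label:")
    return "\n".join(lines)

examples = [
    ([5.1, 3.5, 1.4, 0.2], "A"),
    ([6.7, 3.1, 4.7, 1.5], "B"),
    ([6.3, 3.3, 6.0, 2.5], "C"),
]
prompt = serialize_task(["f1", "f2", "f3", "f4"], examples, [5.9, 3.0, 5.1, 1.8])
print(prompt)  # the LLM is asked to complete the final "label:" line
```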
- Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis
Recent advances in audio-driven avatar video generation have significantly enhanced audio-visual realism. However, existing methods treat instruction conditioning merely as low-level tracking driven by acoustic or visual cues, without modeling the communicative purpose conveyed by the instructions. This limitation compromises their narrative coherence and character expressiveness. To bridge this gap, we introduce Kling-Avatar, a novel cascaded framework that unifies multimodal instruction understanding with photorealistic portrait generation. Our approach adopts a two-stage pipeline. In the first stage, we design a multimodal large language model (MLLM) director that produces a blueprint video conditioned on diverse instruction signals, thereby governing high-level semantics such as character motion and emotions. In the second stage, guided by blueprint keyframes, we generate multiple sub-clips in parallel using a first-last frame strategy. This global-to-local framework preserves fine-grained details while faithfully encoding the high-level intent behind multimodal instructions. Our parallel architecture also enables fast and stable generation of long-duration videos, making it suitable for real-world applications such as digital human livestreaming and vlogging. To comprehensively evaluate our method, we construct a benchmark of 375 curated samples covering diverse instructions and challenging scenarios. Extensive experiments demonstrate that Kling-Avatar is capable of generating vivid, fluent, long-duration videos at up to 1080p and 48 fps, achieving superior performance in lip synchronization accuracy, emotion and dynamic expressiveness, instruction controllability, identity preservation, and cross-domain generalization. These results establish Kling-Avatar as a new benchmark for semantically grounded, high-fidelity audio-driven avatar synthesis.
- Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents
In long-horizon tasks, recent agents based on Large Language Models (LLMs) face a significant challenge: sparse, outcome-based rewards make it difficult to assign credit to intermediate steps. Previous methods mainly focus on creating dense reward signals to guide learning, either through traditional reinforcement learning techniques like inverse reinforcement learning or by using Process Reward Models for step-by-step feedback. In this paper, we identify a fundamental problem in the learning dynamics of LLMs: the magnitude of policy gradients is inherently coupled with the entropy, which leads to inefficiently small updates for confident correct actions and potentially destabilizing large updates for uncertain ones. To resolve this, we propose Entropy-Modulated Policy Gradients (EMPG), a framework that re-calibrates the learning signal based on step-wise uncertainty and the final task outcome. EMPG amplifies updates for confident correct actions, penalizes confident errors, and attenuates updates from uncertain steps to stabilize exploration. We further introduce a bonus term for future clarity that encourages agents to find more predictable solution paths. Through comprehensive experiments on three challenging agent tasks, WebShop, ALFWorld, and Deep Search, we demonstrate that EMPG achieves substantial performance gains and significantly outperforms strong policy gradient baselines. The project page is at https://empgseed-seed.github.io/
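A minimal sketch of the entropy-modulation idea follows, assuming a simple exponential scaling of step-level advantages by normalized entropy; the paper's exact re-calibration and the future-clarity bonus are not reproduced here.

```python
# Hypothetical entropy-modulated scaling of step-level advantages: confident
# steps (low entropy) get amplified updates, uncertain steps get attenuated ones.
import torch

def step_entropy(logits):
    """Mean token-level entropy of each step's action tokens."""
    logp = torch.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(-1).mean(-1)            # (num_steps,)

def modulate_advantages(advantages, entropies, alpha=1.0):
    """Scale each step's advantage by exp(-alpha * normalized entropy):
    confident correct steps are amplified, confident errors are penalized more,
    and uncertain steps are attenuated."""
    norm_ent = (entropies - entropies.mean()) / (entropies.std() + 1e-6)
    scale = torch.exp(-alpha * norm_ent)
    return advantages * scale

logits = torch.randn(4, 16, 32000)                  # 4 steps, 16 tokens each, 32k vocab
entropies = step_entropy(logits)
advantages = torch.tensor([1.0, 1.0, -1.0, -1.0])   # sparse outcome-based credit
print(modulate_advantages(advantages, entropies))
```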
- FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
The advancement of open-source text-to-image (T2I) models has been hindered by the absence of large-scale, reasoning-focused datasets and comprehensive evaluation benchmarks, resulting in a performance gap compared to leading closed-source systems. To address this challenge, we introduce FLUX-Reason-6M and PRISM-Bench (Precise and Robust Image Synthesis Measurement Benchmark). FLUX-Reason-6M is a massive dataset consisting of 6 million high-quality FLUX-generated images and 20 million bilingual (English and Chinese) descriptions specifically designed to teach complex reasoning. The images are organized according to six key characteristics: Imagination, Entity, Text rendering, Style, Affection, and Composition, and are paired with explicit Generation Chain-of-Thought (GCoT) annotations that provide detailed breakdowns of image generation steps. The whole data curation took 15,000 A100 GPU days, providing the community with a resource previously unattainable outside of large industrial labs. PRISM-Bench offers a novel evaluation standard with seven distinct tracks, including a formidable Long Text challenge using GCoT. Through carefully designed prompts, it utilizes advanced vision-language models for nuanced, human-aligned assessment of prompt-image alignment and image aesthetics. Our extensive evaluation of 19 leading models on PRISM-Bench reveals critical performance gaps and highlights specific areas requiring improvement. Our dataset, benchmark, and evaluation code are released to catalyze the next wave of reasoning-oriented T2I generation. Project page: https://flux-reason-6m.github.io/.
- Can Understanding and Generation Truly Benefit Together -- or Just Coexist?
In this paper, we introduce an insightful paradigm through the Auto-Encoder lens: understanding as the encoder (I2T) that compresses images into text, and generation as the decoder (T2I) that reconstructs images from that text. Using reconstruction fidelity as the unified training objective, we enforce coherent bidirectional information flow between the understanding and generation processes, bringing mutual gains. To implement this, we propose UAE, a novel framework for unified multimodal learning. We begin by pre-training the decoder with large-scale long-context image captions to capture fine-grained semantic and complex spatial relationships. We then propose Unified-GRPO via reinforcement learning (RL), which covers three stages: (1) a cold-start phase to gently initialize both encoder and decoder with a semantic reconstruction loss; (2) Generation for Understanding, where the encoder is trained to generate informative captions that maximize the decoder's reconstruction quality, enhancing its visual understanding; (3) Understanding for Generation, where the decoder is refined to reconstruct from these captions, forcing it to leverage every detail and improving its long-context instruction following and generation fidelity. For evaluation, we introduce Unified-Bench, the first benchmark tailored to assess the degree of unification of unified multimodal models (UMMs). A surprising "aha moment" arises within the multimodal learning domain: as RL progresses, the encoder autonomously produces more descriptive captions, while the decoder simultaneously demonstrates a profound ability to understand these intricate descriptions, resulting in reconstructions of striking fidelity.
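The reconstruction-fidelity loop can be sketched as follows, with toy stand-ins for the captioner, generator, and image embedder; the cosine-similarity reward and the function names are assumptions for illustration only, not the UAE training code.

```python
# Hypothetical reconstruction-fidelity reward for the understanding->generation
# loop: caption the image, regenerate it from the caption, and reward the
# captioner by how similar the reconstruction is to the original.
import torch
import torch.nn.functional as F

def reconstruction_reward(image_emb, recon_emb):
    """Cosine similarity between embeddings of the original and reconstructed
    images, standing in for whatever semantic fidelity metric is actually used."""
    return F.cosine_similarity(image_emb, recon_emb, dim=-1)

def rl_step(images, captioner, generator, embedder):
    captions = captioner(images)            # I2T: the encoder produces captions
    recons = generator(captions)            # T2I: the decoder reconstructs images
    reward = reconstruction_reward(embedder(images), embedder(recons))
    return captions, reward                 # reward drives a GRPO-style policy update

# Toy stand-ins so the sketch runs end to end.
images = torch.randn(2, 512)
captioner = lambda x: ["a caption"] * x.size(0)
generator = lambda caps: torch.randn(len(caps), 512)
embedder = lambda x: x
print(rl_step(images, captioner, generator, embedder)[1].shape)  # torch.Size([2])
```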
- SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
Significant progress has been made in spatial intelligence, spanning both spatial reconstruction and world exploration. However, the scalability and real-world fidelity of current models remain severely constrained by the scarcity of large-scale, high-quality training data. While several datasets provide camera pose information, they are typically limited in scale, diversity, and annotation richness, particularly for real-world dynamic scenes with ground-truth camera motion. To this end, we collect SpatialVID, a dataset consisting of a large corpus of in-the-wild videos with diverse scenes, camera movements and dense 3D annotations such as per-frame camera poses, depth, and motion instructions. Specifically, we collect more than 21,000 hours of raw video and process them into 2.7 million clips through a hierarchical filtering pipeline, totaling 7,089 hours of dynamic content. A subsequent annotation pipeline enriches these clips with detailed spatial and semantic information, including camera poses, depth maps, dynamic masks, structured captions, and serialized motion instructions. Analysis of SpatialVID's data statistics reveals a richness and diversity that directly foster improved model generalization and performance, establishing it as a key asset for the video and 3D vision research community.
- AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs
Large Audio Language Models (LALMs) are rapidly advancing, but evaluating them remains challenging due to inefficient toolkits that limit fair comparison and systematic assessment. Current frameworks suffer from three critical issues: slow processing that bottlenecks large-scale studies, inconsistent prompting that hurts reproducibility, and narrow task coverage that misses important audio reasoning capabilities. We introduce AU-Harness, an efficient and comprehensive evaluation framework for LALMs. Our system achieves a speedup of up to 127% over existing toolkits through optimized batch processing and parallel execution, enabling large-scale evaluations previously impractical. We provide standardized prompting protocols and flexible configurations for fair model comparison across diverse scenarios. Additionally, we introduce two new evaluation categories: LLM-Adaptive Diarization for temporal audio understanding and Spoken Language Reasoning for complex audio-based cognitive tasks. Through evaluation across 380+ tasks, we reveal significant gaps in current LALMs, particularly in temporal understanding and complex spoken language reasoning tasks. Our findings also highlight a lack of standardization in instruction modality across audio benchmarks, which can lead to performance differences of up to 9.5 absolute points on challenging complex instruction-following downstream tasks. AU-Harness provides both practical evaluation tools and insights into model limitations, advancing systematic LALM development.
- mmBERT: A Modern Multilingual Encoder with Annealed Language Learning
Encoder-only language models are frequently used for a variety of standard machine learning tasks, including classification and retrieval. However, there has been a lack of recent research on encoder models, especially with respect to multilingual models. We introduce mmBERT, an encoder-only language model pretrained on 3T tokens of multilingual text in over 1800 languages. To build mmBERT we introduce several novel elements, including an inverse mask ratio schedule and an inverse temperature sampling ratio. We add over 1700 low-resource languages to the data mix only during the decay phase, showing that it boosts performance dramatically and maximizes the gains from the relatively small amount of training data. Despite only including these low-resource languages in the short decay phase we achieve similar classification performance to models like OpenAI's o3 and Google's Gemini 2.5 Pro. Overall, we show that mmBERT significantly outperforms the previous generation of models on classification and retrieval tasks -- on both high and low-resource languages.
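For intuition, here is a hypothetical rendering of the two schedules named in the abstract: a mask ratio that anneals downward over pretraining, and a language-sampling temperature that flattens toward low-resource languages; the specific values and curves are assumptions, not mmBERT's actual settings.

```python
# Hypothetical illustration of an inverse mask ratio schedule and inverse
# temperature language sampling: the masking ratio decays as pretraining
# progresses, and the sampling distribution over languages flattens so that
# low-resource languages get more weight late in training.
def mask_ratio(progress, start=0.30, end=0.05):
    """Linearly anneal the masking ratio from `start` to `end` (progress in [0, 1])."""
    return start + progress * (end - start)

def language_probs(token_counts, progress, tau_start=0.7, tau_end=0.3):
    """Sample languages proportional to counts**tau; a smaller tau flattens the
    distribution, boosting low-resource languages later in training."""
    tau = tau_start + progress * (tau_end - tau_start)
    weights = {lang: n ** tau for lang, n in token_counts.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}

counts = {"en": 1_000_000_000, "sw": 5_000_000, "kl": 200_000}  # toy token counts
for p in (0.0, 1.0):
    print(p, round(mask_ratio(p), 3), language_probs(counts, p))
```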
- Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes
Understanding 3D spatial relationships remains a major limitation of current Vision-Language Models (VLMs). Prior work has addressed this issue by creating spatial question-answering (QA) datasets based on single images or indoor videos. However, real-world embodied AI agents such as robots and self-driving cars typically rely on ego-centric, multi-view observations. To this end, we introduce Ego3D-Bench, a new benchmark designed to evaluate the spatial reasoning abilities of VLMs using ego-centric, multi-view outdoor data. Ego3D-Bench comprises over 8,600 QA pairs, created with significant involvement from human annotators to ensure quality and diversity. We benchmark 16 SOTA VLMs, including GPT-4o, Gemini 1.5 Pro, InternVL3, and Qwen2.5-VL. Our results reveal a notable performance gap between human-level scores and VLM performance, highlighting that current VLMs still fall short of human-level spatial understanding. To bridge this gap, we propose Ego3D-VLM, a post-training framework that enhances the 3D spatial reasoning of VLMs. Ego3D-VLM generates a cognitive map based on estimated global 3D coordinates, resulting in a 12% average improvement on multi-choice QA and a 56% average improvement on absolute distance estimation. Ego3D-VLM is modular and can be integrated with any existing VLM. Together, Ego3D-Bench and Ego3D-VLM offer valuable tools for advancing toward human-level spatial understanding in real-world, multi-view environments.
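A toy sketch of turning estimated global 3D object coordinates into a textual cognitive map that could be prepended to a VLM prompt; the distance/bearing phrasing and the object names are assumptions, not the Ego3D-VLM format.

```python
# Hypothetical construction of a textual "cognitive map" from estimated global
# 3D object coordinates, to be prepended to the VLM prompt.
import math

def cognitive_map(objects, ego_xy=(0.0, 0.0)):
    """objects: {name: (x, y, z)} in a shared global frame, in metres."""
    lines = ["Cognitive map (relative to the ego vehicle):"]
    for name, (x, y, z) in objects.items():
        dx, dy = x - ego_xy[0], y - ego_xy[1]
        dist = math.hypot(dx, dy)
        bearing = (math.degrees(math.atan2(dy, dx)) + 360) % 360
        lines.append(f"- {name}: {dist:.1f} m away, bearing {bearing:.0f} deg, height {z:.1f} m")
    return "\n".join(lines)

objs = {"pedestrian": (3.2, 4.1, 0.0), "traffic light": (-8.0, 15.5, 5.2)}
print(cognitive_map(objs))
```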
- Visual Programmability: A Guide for Code-as-Thought in Chart Understanding
Chart understanding presents a critical test of the reasoning capabilities of Vision-Language Models (VLMs). Prior approaches face critical limitations: some rely on external tools, making them brittle and constrained by a predefined toolkit, while others fine-tune specialist models that often adopt a single reasoning strategy, such as text-based chain-of-thought (CoT). The intermediate steps of text-based reasoning are difficult to verify, which complicates the use of reinforcement-learning signals that reward factual accuracy. To address this, we propose a Code-as-Thought (CaT) approach to represent the visual information of a chart in a verifiable, symbolic format. Our key insight is that this strategy must be adaptive: a fixed, code-only implementation consistently fails on complex charts where symbolic representation is unsuitable. This finding leads us to introduce Visual Programmability: a learnable property that determines if a chart-question pair is better solved with code or direct visual analysis. We implement this concept in an adaptive framework where a VLM learns to choose between the CaT pathway and a direct visual reasoning pathway. The selection policy of the model is trained with reinforcement learning using a novel dual-reward system. This system combines a data-accuracy reward to ground the model in facts and prevent numerical hallucination, with a decision reward that teaches the model when to use each strategy, preventing it from defaulting to a single reasoning mode. Experiments demonstrate strong and robust performance across diverse chart-understanding benchmarks. Our work shows that VLMs can be taught not only to reason but also how to reason, dynamically selecting the optimal reasoning pathway for each task.
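The dual-reward idea can be sketched as a weighted sum of a data-accuracy term and a pathway-decision term; the weights, tolerance, and signature below are assumptions for illustration, not the paper's exact reward.

```python
# Hypothetical dual reward for an adaptive Code-as-Thought policy: one term
# rewards factual data accuracy, the other rewards choosing the right pathway
# (code vs. direct visual reasoning) for the chart at hand.
def dual_reward(pred_values, gt_values, chose_code, code_was_appropriate,
                w_acc=1.0, w_decision=0.5, tol=0.02):
    # Fraction of extracted chart values within a relative tolerance of ground truth.
    correct = sum(abs(p - g) <= tol * max(abs(g), 1e-9)
                  for p, g in zip(pred_values, gt_values))
    acc_reward = correct / max(len(gt_values), 1)
    # Reward for picking the pathway suited to this chart-question pair.
    decision_reward = 1.0 if chose_code == code_was_appropriate else 0.0
    return w_acc * acc_reward + w_decision * decision_reward

print(dual_reward([3.1, 7.0], [3.0, 7.1], chose_code=True, code_was_appropriate=True))
```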
- Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval
Although Contrastive Language-Image Pre-training (CLIP) exhibits strong performance across diverse vision tasks, its application to person representation learning faces two critical challenges: (i) the scarcity of large-scale annotated vision-language data focused on person-centric images, and (ii) the inherent limitations of global contrastive learning, which struggles to maintain discriminative local features crucial for fine-grained matching while remaining vulnerable to noisy text tokens. This work advances CLIP for person representation learning through synergistic improvements in data curation and model architecture. First, we develop a noise-resistant data construction pipeline that leverages the in-context learning capabilities of MLLMs to automatically filter and caption web-sourced images. This yields WebPerson, a large-scale dataset of 5M high-quality person-centric image-text pairs. Second, we introduce the GA-DMS (Gradient-Attention Guided Dual-Masking Synergetic) framework, which improves cross-modal alignment by adaptively masking noisy textual tokens based on the gradient-attention similarity score. Additionally, we incorporate masked token prediction objectives that compel the model to predict informative text tokens, enhancing fine-grained semantic representation learning. Extensive experiments show that GA-DMS achieves state-of-the-art performance across multiple benchmarks.
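As a rough sketch of gradient-attention guided masking, the snippet below combines per-token gradient saliency with attention saliency and masks the lowest-scoring caption tokens; the normalization, combination rule, and keep ratio are assumptions, not the GA-DMS implementation.

```python
# Hypothetical masking of noisy caption tokens using a gradient-attention score:
# a per-token score combines gradient saliency with attention saliency, and the
# lowest-scoring tokens are masked out of the alignment objective.
import torch

def gradient_attention_score(grad_saliency, attn_saliency, eps=1e-6):
    """Normalize each signal per sample and combine them into one (B, L) score."""
    g = grad_saliency / (grad_saliency.sum(dim=1, keepdim=True) + eps)
    a = attn_saliency / (attn_saliency.sum(dim=1, keepdim=True) + eps)
    return g * a

def mask_noisy_tokens(scores, keep_ratio=0.8):
    """Keep the highest-scoring tokens; mark the rest (True = masked) as noise."""
    k = max(1, int(keep_ratio * scores.size(1)))
    cutoff = scores.topk(k, dim=1).values[:, -1:]
    return scores < cutoff

grad_saliency = torch.rand(2, 10)   # e.g. |d loss / d token embedding| per token
attn_saliency = torch.rand(2, 10)   # e.g. cross-modal attention mass per token
scores = gradient_attention_score(grad_saliency, attn_saliency)
print(mask_noisy_tokens(scores))
```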
Solidot(15)
- The Internet Archive is about to pass 1 trillion archived web pages
The Internet Archive will reach a milestone next month: the number of web pages it has preserved and made accessible through the Wayback Machine will surpass 1 trillion. The Internet Archive says it will hold a series of events to mark the occasion. Founded in 1996, its mission is "universal access to all knowledge". It provides permanent, free storage of and access to copies of digital materials such as websites, web pages, graphic materials, music, videos, audio, software, moving images, and millions of books. Pages saved in its early years, however, were not accessible until the Wayback Machine launched in 2001.
- The technology behind Nepal's Gen Z protests
Angered by the government's ban on most social media and by displays of wealth from the elite, Nepal's Gen Z staged mass protests on Monday. Monday's protests ended in tragedy, with at least 19 people killed. But after the government agreed to lift the social media ban, even more Nepalis joined the protests; the parliament building was set on fire, the prime minister announced his resignation, and the army temporarily stepped in to maintain order. What next? Young activists turned to Discord, the widely used gaming chat app, for heated discussions about who could lead an interim government. In an informal vote, former Supreme Court Chief Justice Sushila Karki came out on top, and on Friday she was formally sworn in as prime minister of the interim government, the first woman in Nepal's history to hold the post. According to World Bank data, more than half of Nepal's 30 million people are online. In the days before the protests broke out, many had already turned to VPNs and other tools to get around the blocks, and fearing an internet shutdown, downloads of Bitchat, the Bluetooth messaging app created by Jack Dorsey, surged. Technology played a major role in this Gen Z protest movement.
- Proton Mail suspended journalists' accounts at the request of a cybersecurity agency
Swiss encrypted email provider Proton Mail has stirred controversy once again. It presents itself as a neutral safe haven for users' personal data, committed to defending their freedom, yet it shut down two journalists' accounts without any explanation and restored them only after weeks of dispute and protest. It has not given a detailed explanation for the closures. When the accounts were shut down, the two journalists, writing under the pen names Saber and cyb0rg, were working on an article for the August issue of the hacker magazine Phrack about North Korean state-backed hackers breaking into the computer networks of multiple South Korean government agencies. The article attributed the intrusions to the North Korean APT group Kimsuky and said networks of government bodies including South Korea's Ministry of Foreign Affairs and Defense Counterintelligence Command had been penetrated. Last month, after receiving a complaint from an unspecified cybersecurity agency, Proton Mail shut down the journalists' accounts, disrupting their communication with the affected organizations.
- How Chinese EV technology is reshaping global car design
When Audi executives first saw the Zeekr 001 in 2021, they were stunned and realized that competing with China would require Chinese technology. Using technology from its Chinese partner SAIC, including the battery, electric powertrain, infotainment software, and advanced driver-assistance system, Audi built the Audi E5 Sportback in just 18 months. The $33,000 electric car begins deliveries to Chinese customers this month. Audi's global rivals are following suit, using Chinese intellectual property to bring new models to market quickly. Toyota and Volkswagen are working with their Chinese partners GAC and Xpeng on models built specifically for the Chinese market, while Renault and Ford plan to go further and develop global models on Chinese EV platforms. The strategy echoes Intel's "Intel Inside" of the 1990s: Chinese car companies are pitching "China Inside", licensing off-the-shelf EV technology that saves legacy automakers billions of dollars and years of development as they try to catch up with competitors. Inspired by Tesla, Chinese EV makers have developed modular platforms that cut costs, accelerate development, and lower barriers to entry. The risk for global automakers is that they could end up as little more than retailers.
- The Apache Software Foundation adopts a new logo and the name ASF
The Apache Software Foundation has announced a new logo and the name ASF. Where the old logo emphasized the name Apache, the new one emphasizes ASF, the initialism of Apache Software Foundation, and the old logo's feather has been replaced with oak leaves. The foundation says the oak, a tree that endures, represents the ASF's lasting and stable spirit. ASF will serve as the foundation's visual identity.
- Encyclopedia Britannica and Merriam-Webster accuse Perplexity of copyright and trademark infringement
Perplexity AI has become the latest AI company to be sued by copyright holders. Perplexity's "answer engine" offers an AI-based alternative to traditional search engines by searching the web and summarizing what it finds. Encyclopedia Britannica and Merriam-Webster allege that Perplexity scraped their websites without permission and copied and republished their content, infringing their copyrights. The AI answers Perplexity generates inevitably contain hallucinations, i.e. fabricated misinformation, and the plaintiffs allege that by attributing these hallucinations to them, Perplexity also infringes their trademarks. They are seeking damages and an order barring Perplexity from misusing their content.
- Sega accidentally sold Nintendo dev kits, then had police raid the buyer
When Japanese game company Sega moved its UK office from Brentford to Chiswick Business Park, it disposed of what was left in the old office. The person handling the disposal sold Nintendo development hardware, including Game Boy Advance, DSi, 3DS, Wii, and Wii U dev units, along with prototype games such as Sonic Chronicles and Mario & Sonic, to a British buyer for $13,575. Sega quickly realized it was in serious trouble: Nintendo's lawyers are feared not only by players but by game developers as well. To recover the Nintendo dev kits it had sold, Sega went to the police. On July 14, UK police sent 10 officers to raid the buyer's home, seized the purchased equipment, and detained the buyer for 8 hours. The buyer was asked to give up ownership of the Nintendo dev kits he had bought, but refused.
- Albania appoints an AI minister to fight corruption
Albania's new minister in charge of public procurement cannot be bribed, threatened, or flattered, because she is Diella, an AI chatbot. Albania is regarded as a money-laundering hub for global drug- and arms-trafficking gangs, and corruption has long reached the centers of power; appointing an AI minister is part of the effort to fight it. Edi Rama, about to begin his fourth term as prime minister, said Diella, whose name means "sun" in Albanian, will be responsible for public tenders for government contracts with private companies, helping Albania "become a country where public tenders are 100% corruption-free".
- AirPods live translation won't be available in Europe or mainland China for now
At its event this week, Apple introduced new AirPods with improved active noise cancellation, a custom heart-rate sensor, and support for live translation. The translation feature requires an iPhone 15 Pro or later running iOS 26. But when the AirPods go on sale, live translation will not be available to users in the EU or in mainland China, and Apple has offered little explanation. Apple China did say, however, that China's three major carriers will provide eSIM support for the iPhone Air.
- 99% of the eels consumed worldwide belong to endangered species
A research team from Japan's Chuo University and National Taiwan University found that more than 99% of the eels consumed worldwide belong to three endangered species. The widely eaten species are the American eel, the Japanese eel, and the European eel, all assessed as threatened with extinction by the IUCN. The global eel trade is largely opaque, making actual volumes hard to track, and this study offers clues to the real picture. The team ran genetic tests to identify the species of 282 processed and fresh eel products purchased between 2023 and 2025 in 26 cities across 11 countries and regions in Asia, Europe, the Americas, and Oceania. They identified 154 American eels, 120 Japanese eels, 4 European eels, and 1 Indonesian shortfin eel, with 3 samples that could not be analyzed. Combining these results with national production figures, trade statistics, and market sizes, they estimate that American eels make up 75.3% of global circulation, Japanese eels 18.0%, and European eels 6.7%. By country, China accounts for about 60% of circulation and Japan about 19%, so East Asia likely accounts for the majority.
- Methane gas detected on the dwarf planet Makemake
Using the James Webb Space Telescope (JWST), astronomers have for the first time detected methane gas on the distant dwarf planet Makemake. The finding overturns the view of Makemake as nothing more than a frozen body and makes it the second trans-Neptunian object after Pluto confirmed to have tenuous gas. Discovered in 2005 by a Caltech team, Makemake has a radius of about 715 km, is slightly smaller and dimmer than Pluto, and takes 305 years to orbit the Sun. Earlier stellar occultation observations showed no obvious atmosphere, though a thin one could not be ruled out, and infrared data revealed odd thermal anomalies in the surface methane ice, hinting at local hot spots that may release gas. The team notes that Makemake is one of the largest icy trans-Neptunian objects found so far, with a surface dominated by methane ice. The recent JWST observations show a thin layer of methane gas above that icy surface, indicating that its interior is not dead but still changing. Makemake may have a very tenuous atmosphere similar to Pluto's, or the gas may come from more transient, localized geological activity such as cryovolcanic plumes or comet-like sublimation.
- Octopuses have preferred arms
Octopuses can perform tasks with any arm, but they tend to favor one or a few arms for particular tasks. An octopus arm is a complex structure of four distinct muscle groups, transverse, longitudinal, oblique, and circular, arranged around a central nerve cord; these let the arm deform in different ways, producing the range of movements behind behaviors from hunting and locomotion to self-defense. How wild octopuses use and coordinate their arms, however, was previously unclear. Researchers analyzed 25 one-minute videos of wild octopuses filmed in the Atlantic and the Caribbean between 2007 and 2015. They found that all of the octopuses could deform all eight arms in the four different ways and could perform every movement with every arm. They also found that arms on both sides of the body were used equally, but the front four arms were used far more than the back four (64% vs. 36%). Front arms were more likely to be used to explore the surroundings, while back arms were more likely to be used for moving around. Two movements in particular relied on the back arms: rolling, in which an arm moves along the seabed beneath the octopus like a conveyor belt, and "stilting", in which arms extend straight down to lift the body.
- Windows developers can now publish apps to the Microsoft Store for free
Microsoft announced on its official blog that developers in nearly 200 countries can publish apps to the Microsoft Store for free with nothing more than a personal Microsoft account. Publishing to the Microsoft Store previously required a one-time $19 fee; Apple's App Store is the most expensive, charging developers $99 a year, while Google charges a one-time $25 registration fee. Microsoft says the Microsoft Store has more than 250 million monthly active users. Developers can publish apps of various types, including Win32, UWP, PWA, .NET, MAUI, and Electron, and are even allowed to use their own in-app payment systems and keep 100% of non-game app revenue. One reason Microsoft can afford to be so generous is that its store holds no monopoly on the Windows platform and needs to attract developers.
- Vimeo sold to Bending Spoons for $1.38 billion
Vimeo, the video site that once competed with YouTube, is being sold to the Italian company Bending Spoons for $1.38 billion in cash. Vimeo (VMEO) will be taken private; the deal is expected to close in early 2026, with Vimeo shareholders receiving $7.85 per share in cash. Founded in 2004, Vimeo was one of the earliest video-sharing platforms and was known for high-quality video, but in recent years its market share has been steadily eroded by YouTube and others, and the company has focused more on serving business customers. Bending Spoons acquired the note-taking app Evernote in 2023 and then laid off its entire staff, and that was not the first time it has done so. The acquisition may not be good news for Vimeo users.
- Firefox adds MKV playback support
The latest Firefox Nightly builds have finally added playback support for Matroska (MKV) content, a feature users first requested eight years ago. MKV support is currently limited to Nightly, or users can opt in by enabling the media.mkv.enabled preference. MKV playback currently supports only the AVC/H.264 and AAC codecs, with other codecs to be supported gradually.