OrangeBot.AI Digest — 2025-10-15
60 headlines across 8 sources, aggregated for this day.
Hacker News (15)
- Getting syntax highlighting wrong (tonsky.me)
- Claude Haiku 4.5 (www.anthropic.com)
- I almost got hacked by a 'job interview' (blog.daviddodda.com)
- You are the scariest monster in the woods (jamie.ideasasylum.com)
- M5 MacBook Pro (www.apple.com)
- Mac Source Ports – Run old games on new Macs (www.macsourceports.com)
- Pwning the Nix ecosystem (ptrpa.ws)
- iPad Pro with M5 chip (www.apple.com)
- Apple M5 chip (www.apple.com)
- Apple Vision Pro upgraded with M5 chip (www.apple.com)
- Ireland is making basic income for artists program permanent (www.artnews.com)
- Show HN: Halloy – Modern IRC client (github.com)
- Leaving serverless led to performance improvement and a simplified architecture (www.unkey.com)
- Bots are getting good at mimicking engagement (joindatacops.com)
- The cost of turning down wind turbines in Britain (wastedwind.energy)
GitHub Trending (15)
- anthropics / prompt-eng-interactive-tutorial
Anthropic's Interactive Prompt Engineering Tutorial
- jingyaogong / minimind
🚀🚀 Train a 26M-parameter small GPT ("large model") completely from scratch in just 2 hours! 🌏
- nitrojs / nitro
Next Generation Server Toolkit. Create web servers with everything you need and deploy them wherever you prefer.
- langchain-ai / langchainjs
🦜🔗 Build context-aware reasoning applications 🦜🔗
- karpathy / nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
- nvm-sh / nvm
Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions
- envoyproxy / envoy
Cloud-native high-performance edge/middle/service proxy
- EvolutionAPI / evolution-api
Evolution API is an open-source WhatsApp integration API
- devlikeapro / waha
WAHA - WhatsApp HTTP API (REST API) that you can configure in a click! 3 engines: WEBJS (browser based), NOWEB (websocket nodejs), GOWS (websocket go)
- enactic / openarm
A fully open-source humanoid arm for physical AI research and deployment in contact-rich environments.
- DigitalPlatDev / FreeDomain
DigitalPlat FreeDomain: Free Domain For Everyone
- ChristianLempa / boilerplates
This is my personal template collection. Here you'll find templates and configurations for various tools and technologies.
- alibaba / spring-ai-alibaba
Agentic AI Framework for Java Developers
- dair-ai / Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
- nanobrowser / nanobrowser
Open-Source Chrome extension for AI-powered web automation. Run multi-agent workflows using your own LLM API key. Alternative to OpenAI Operator.
Hugging Face (15)
- Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Vision-language-action (VLA) models have recently shown strong potential in enabling robots to follow language instructions and execute precise actions. However, most VLAs are built upon vision-language models pretrained solely on 2D data, which lack accurate spatial awareness and hinder their ability to operate in the 3D physical world. Existing solutions attempt to incorporate explicit 3D sensor inputs such as depth maps or point clouds, but these approaches face challenges due to sensor noise, hardware heterogeneity, and incomplete depth coverage in existing datasets. Alternative methods that estimate 3D cues from 2D images also suffer from the limited performance of depth estimators. We propose Spatial Forcing (SF), a simple yet effective alignment strategy that implicitly forces VLA models to develop spatial comprehension capabilities without relying on explicit 3D inputs or depth estimators. SF aligns intermediate visual embeddings of VLAs with geometric representations produced by pretrained 3D foundation models. By enforcing alignment at intermediate layers, SF guides VLAs to encode richer spatial representations that enhance action precision. Extensive experiments in simulation and real-world environments demonstrate that SF achieves state-of-the-art results, surpassing both 2D- and 3D-based VLAs. SF further accelerates training by up to 3.8x and improves data efficiency across diverse robotic tasks. Project page is at https://spatial-forcing.github.io/
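The alignment described in the abstract reduces to a simple loss: project the VLA's intermediate visual embeddings into the geometry space of a frozen 3D foundation model and penalize their cosine distance. A minimal numpy sketch — all names, shapes, and the linear projection are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def cosine_alignment_loss(vla_feats, geo_feats, proj):
    """Mean cosine distance between projected VLA visual embeddings and
    frozen 3D foundation-model features.

    vla_feats: (N, d_vla) intermediate visual token embeddings
    geo_feats: (N, d_geo) target geometric embeddings (frozen)
    proj:      (d_vla, d_geo) trainable projection into geometry space
    """
    z = vla_feats @ proj                                  # project into geometry space
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    g = geo_feats / np.linalg.norm(geo_feats, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(z * g, axis=1)))   # 0 when perfectly aligned

rng = np.random.default_rng(0)
loss = cosine_alignment_loss(rng.normal(size=(8, 16)),   # 8 tokens, 16-dim VLA features
                             rng.normal(size=(8, 4)),    # 4-dim geometric targets
                             rng.normal(size=(16, 4)))   # random projection
```

Minimizing such a term at intermediate layers would back-propagate "spatial pressure" into the VLA without any 3D sensor input at inference time.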
- DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation
Large language models (LLMs) have substantially advanced machine translation (MT), yet their effectiveness in translating web novels remains unclear. Existing benchmarks rely on surface-level metrics that fail to capture the distinctive traits of this genre. To address these gaps, we introduce DITING, the first comprehensive evaluation framework for web novel translation, assessing narrative and cultural fidelity across six dimensions: idiom translation, lexical ambiguity, terminology localization, tense consistency, zero-pronoun resolution, and cultural safety, supported by over 18K expert-annotated Chinese-English sentence pairs. We further propose AgentEval, a reasoning-driven multi-agent evaluation framework that simulates expert deliberation to assess translation quality beyond lexical overlap, achieving the highest correlation with human judgments among seven tested automatic metrics. To enable metric comparison, we develop MetricAlign, a meta-evaluation dataset of 300 sentence pairs annotated with error labels and scalar quality scores. Comprehensive evaluation of fourteen open, closed, and commercial models reveals that Chinese-trained LLMs surpass larger foreign counterparts, and that DeepSeek-V3 delivers the most faithful and stylistically coherent translations. Our work establishes a new paradigm for exploring LLM-based web novel translation and provides public resources to advance future research.
- Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training
Pixel-space generative models are often more difficult to train and generally underperform compared to their latent-space counterparts, leaving a persistent performance and efficiency gap. In this paper, we introduce a novel two-stage training framework that closes this gap for pixel-space diffusion and consistency models. In the first stage, we pre-train encoders to capture meaningful semantics from clean images while aligning them with points along the same deterministic sampling trajectory, which evolves points from the prior to the data distribution. In the second stage, we integrate the encoder with a randomly initialized decoder and fine-tune the complete model end-to-end for both diffusion and consistency models. Our training framework demonstrates strong empirical performance on the ImageNet dataset. Specifically, our diffusion model reaches an FID of 2.04 on ImageNet-256 and 2.35 on ImageNet-512 with 75 function evaluations (NFE), surpassing prior pixel-space methods by a large margin in both generation quality and efficiency while rivaling leading VAE-based models at comparable training cost. Furthermore, on ImageNet-256, our consistency model achieves an impressive FID of 8.82 in a single sampling step, significantly surpassing its latent-space counterpart. To the best of our knowledge, this marks the first successful training of a consistency model directly on high-resolution images without relying on pre-trained VAEs or diffusion models.
- Scaling Language-Centric Omnimodal Representation Learning
Recent multimodal embedding approaches leveraging multimodal large language models (MLLMs) fine-tuned with contrastive learning (CL) have shown promising results, yet the underlying reasons behind their superiority remain underexplored. This work argues that a crucial advantage of MLLM-based approaches stems from implicit cross-modal alignment achieved during generative pretraining, where the language decoder learns to exploit multimodal signals within a shared representation space for generating unimodal outputs. Through analysis of anisotropy and kernel similarity structure, we empirically confirm that latent alignment emerges within MLLM representations, allowing CL to serve as a lightweight refinement stage. Leveraging this insight, we propose a Language-Centric Omnimodal Embedding framework, termed LCO-Emb. Extensive experiments across diverse backbones and benchmarks demonstrate its effectiveness, achieving state-of-the-art performance across modalities. Furthermore, we identify a Generation-Representation Scaling Law (GRSL), showing that the representational capabilities gained through contrastive refinement scale positively with the MLLM's generative capabilities. This suggests that improving generative abilities offers an effective paradigm for enhancing representation quality. We provide a theoretical explanation of GRSL, which formally links the MLLM's generative quality to the upper bound on its representation performance, and validate it on a challenging, low-resource visual-document retrieval task, showing that continual generative pretraining before CL can further enhance the potential of a model's embedding capabilities. Codes, models, and resources are available at https://github.com/LCO-Embedding/LCO-Embedding.
- Robot Learning: A Tutorial
Robot learning is at an inflection point, driven by rapid advancements in machine learning and the growing availability of large-scale robotics data. This shift from classical, model-based methods to data-driven, learning-based paradigms is unlocking unprecedented capabilities in autonomous systems. This tutorial navigates the landscape of modern robot learning, charting a course from the foundational principles of Reinforcement Learning and Behavioral Cloning to generalist, language-conditioned models capable of operating across diverse tasks and even robot embodiments. This work is intended as a guide for researchers and practitioners, and our goal is to equip the reader with the conceptual understanding and practical tools necessary to contribute to developments in robot learning, with ready-to-use examples implemented in lerobot.
- Detect Anything via Next Point Prediction
Object detection has long been dominated by traditional coordinate regression-based models, such as YOLO, DETR, and Grounding DINO. Although recent efforts have attempted to leverage MLLMs to tackle this task, they face challenges like low recall rate, duplicate predictions, coordinate misalignment, etc. In this work, we bridge this gap and propose Rex-Omni, a 3B-scale MLLM that achieves state-of-the-art object perception performance. On benchmarks like COCO and LVIS, Rex-Omni attains performance comparable to or exceeding regression-based models (e.g., DINO, Grounding DINO) in a zero-shot setting. This is enabled by three key designs: 1) Task Formulation: we use special tokens to represent quantized coordinates from 0 to 999, reducing the model's learning difficulty and improving token efficiency for coordinate prediction; 2) Data Engines: we construct multiple data engines to generate high-quality grounding, referring, and pointing data, providing semantically rich supervision for training; 3) Training Pipelines: we employ a two-stage training process, combining supervised fine-tuning on 22 million examples with GRPO-based reinforcement post-training. This RL post-training leverages geometry-aware rewards to effectively bridge the discrete-to-continuous coordinate prediction gap, improve box accuracy, and mitigate undesirable behaviors like duplicate predictions that stem from the teacher-guided nature of the initial SFT stage. Beyond conventional detection, Rex-Omni's inherent language understanding enables versatile capabilities such as object referring, pointing, visual prompting, GUI grounding, spatial referring, OCR and key-pointing, all systematically evaluated on dedicated benchmarks. We believe that Rex-Omni paves the way for more versatile and language-aware visual perception systems.
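The task formulation above — special tokens for quantized coordinates in [0, 999] — amounts to a simple binning scheme. The helpers below are an illustrative guess at that idea, not the model's actual tokenizer:

```python
def box_to_tokens(box, width, height, bins=1000):
    """Quantize a pixel-space box (x0, y0, x1, y1) into integer
    coordinate tokens in [0, bins-1] (one special token per value)."""
    def q(v, size):
        return min(bins - 1, max(0, int(v / size * bins)))
    x0, y0, x1, y1 = box
    return [q(x0, width), q(y0, height), q(x1, width), q(y1, height)]

def tokens_to_box(tokens, width, height, bins=1000):
    """Decode tokens back to pixel coordinates (bin centers), so the
    round-trip error is bounded by half a bin width."""
    def d(t, size):
        return (t + 0.5) / bins * size
    x0, y0, x1, y1 = tokens
    return [d(x0, width), d(y0, height), d(x1, width), d(y1, height)]

tokens = box_to_tokens((100, 50, 400, 300), width=800, height=600)
restored = tokens_to_box(tokens, width=800, height=600)
```

With 1000 bins, a box in an 800-pixel-wide image round-trips to within 0.8 px per coordinate, which is why a fixed vocabulary of 1000 coordinate tokens suffices regardless of image resolution.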
- A Survey of Vibe Coding with Large Language Models
The advancement of large language models (LLMs) has catalyzed a paradigm shift from code generation assistance to autonomous coding agents, enabling a novel development methodology termed "Vibe Coding" where developers validate AI-generated implementations through outcome observation rather than line-by-line code comprehension. Despite its transformative potential, the effectiveness of this emergent paradigm remains under-explored, with empirical evidence revealing unexpected productivity losses and fundamental challenges in human-AI collaboration. To address this gap, this survey provides the first comprehensive and systematic review of Vibe Coding with large language models, establishing both theoretical foundations and practical frameworks for this transformative development approach. Drawing from systematic analysis of over 1000 research papers, we survey the entire vibe coding ecosystem, examining critical infrastructure components including LLMs for coding, LLM-based coding agents, development environments for coding agents, and feedback mechanisms. We first introduce Vibe Coding as a formal discipline by formalizing it through a Constrained Markov Decision Process that captures the dynamic triadic relationship among human developers, software projects, and coding agents. Building upon this theoretical foundation, we then synthesize existing practices into five distinct development models: Unconstrained Automation, Iterative Conversational Collaboration, Planning-Driven, Test-Driven, and Context-Enhanced Models, thus providing the first comprehensive taxonomy in this domain. Critically, our analysis reveals that successful Vibe Coding depends not merely on agent capabilities but on systematic context engineering, well-established development environments, and human-agent collaborative development models.
- FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
Diffusion models have recently advanced video restoration, but applying them to real-world video super-resolution (VSR) remains challenging due to high latency, prohibitive computation, and poor generalization to ultra-high resolutions. Our goal in this work is to make diffusion-based VSR practical by achieving efficiency, scalability, and real-time performance. To this end, we propose FlashVSR, the first diffusion-based one-step streaming framework towards real-time VSR. FlashVSR runs at approximately 17 FPS for 768x1408 videos on a single A100 GPU by combining three complementary innovations: (i) a train-friendly three-stage distillation pipeline that enables streaming super-resolution, (ii) locality-constrained sparse attention that cuts redundant computation while bridging the train-test resolution gap, and (iii) a tiny conditional decoder that accelerates reconstruction without sacrificing quality. To support large-scale training, we also construct VSR-120K, a new dataset with 120k videos and 180k images. Extensive experiments show that FlashVSR scales reliably to ultra-high resolutions and achieves state-of-the-art performance with up to 12x speedup over prior one-step diffusion VSR models. We will release the code, pretrained models, and dataset to foster future research in efficient diffusion-based VSR.
- Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models
Diffusion models have achieved remarkable success as generative models. However, even a well-trained model can accumulate errors throughout the generation process. These errors become particularly problematic when arbitrary guidance is applied to steer samples toward desired properties, which often breaks sample fidelity. In this paper, we propose a general solution to address the off-manifold phenomenon observed in diffusion models. Our approach leverages a time predictor to estimate deviations from the desired data manifold at each timestep, identifying that a larger time gap is associated with reduced generation quality. We then design a novel guidance mechanism, `Temporal Alignment Guidance' (TAG), attracting the samples back to the desired manifold at every timestep during generation. Through extensive experiments, we demonstrate that TAG consistently produces samples closely aligned with the desired manifold at each timestep, leading to significant improvements in generation quality across various downstream tasks.
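The guidance mechanism described above can be caricatured in one dimension: a time predictor reads an estimated timestep off the current sample, and a correction term pulls the sample toward states whose predicted time matches the scheduled timestep. This is a toy sketch under invented dynamics (the score, predictor, and step sizes are all assumptions), not the paper's formulation:

```python
import numpy as np

def num_grad(f, x, eps=1e-5):
    """Finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d.flat[i] = eps
        g.flat[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def tag_update(x, t, score, time_pred, step=0.1, tag_weight=1.0):
    """One sampling update with a temporal-alignment term: follow the
    model score, plus a pull toward states whose predicted timestep
    matches the scheduled timestep t."""
    align = lambda y: (time_pred(y) - t) ** 2   # temporal misalignment
    return x + step * (score(x, t) - tag_weight * num_grad(align, x))

# Toy 1-D setup: data sits at 0; noisier states lie farther out, so a
# trivial predictor reads "time" off the magnitude of x.
score = lambda x, t: -x                         # pulls toward the data
time_pred = lambda x: float(np.abs(x).mean())
x = np.array([2.0])
for t in [1.5, 1.0, 0.5, 0.0]:
    x = tag_update(x, t, score, time_pred)
```

Each step drags the sample both down the score field and toward the manifold the time predictor associates with the current timestep, which is the intuition behind using a larger predicted time gap as a signal of off-manifold drift.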
- Dr.LLM: Dynamic Layer Routing in LLMs
Large Language Models (LLMs) process every token through all layers of a transformer stack, causing wasted computation on simple queries and insufficient flexibility for harder ones that need deeper reasoning. Adaptive-depth methods can improve efficiency, but prior approaches rely on costly inference-time search, architectural changes, or large-scale retraining, and in practice often degrade accuracy despite efficiency gains. We introduce Dr.LLM, Dynamic routing of Layers for LLMs, a retrofittable framework that equips pretrained models with lightweight per-layer routers deciding to skip, execute, or repeat a block. Routers are trained with explicit supervision: using Monte Carlo Tree Search (MCTS), we derive high-quality layer configurations that preserve or improve accuracy under a compute budget. Our design, windowed pooling for stable routing, focal loss with class balancing, and bottleneck MLP routers, ensures robustness under class imbalance and long sequences. On ARC (logic) and DART (math), Dr.LLM improves accuracy by up to +3.4%p while saving 5 layers per example on average. Routers generalize to out-of-domain tasks (MMLU, GSM8k, AIME, TruthfulQA, SQuADv2, GPQA, PIQA, AGIEval) with only 0.85% accuracy drop while retaining efficiency, and outperform prior routing methods by up to +7.7%p. Overall, Dr.LLM shows that explicitly supervised routers retrofit frozen LLMs for budget-aware, accuracy-driven inference without altering base weights.
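The core mechanism — a lightweight per-layer router choosing to skip, execute, or repeat each block — is easy to sketch. The pooled-state router and toy layers below are illustrative assumptions, not Dr.LLM's trained routers:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def routed_forward(h, layers, routers):
    """Run a layer stack where a small per-layer router chooses to
    skip, execute, or repeat its block based on the pooled hidden state.

    h:       (tokens, d) hidden states
    layers:  list of callables, one transformer block each
    routers: list of (3, d) matrices scoring the three actions
    """
    ACTIONS = ("skip", "execute", "repeat")
    trace = []
    for layer, router in zip(layers, routers):
        probs = softmax(router @ h.mean(axis=0))  # router sees pooled tokens
        action = ACTIONS[int(np.argmax(probs))]
        if action != "skip":
            h = layer(h)                          # execute the block once...
            if action == "repeat":
                h = layer(h)                      # ...or twice
        trace.append(action)
    return h, trace

# Toy demo: 3 "layers" over a 4-token, 8-dim hidden state.
rng = np.random.default_rng(1)
h = rng.normal(size=(4, 8))
out, trace = routed_forward(h, [np.tanh] * 3,
                            [rng.normal(size=(3, 8)) for _ in range(3)])
```

Training such routers against MCTS-derived layer configurations, as the abstract describes, turns depth into a learned, per-input budget rather than a fixed constant.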
- ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Recent advances in embodied AI highlight the potential of vision language models (VLMs) as agents capable of perception, reasoning, and interaction in complex environments. However, top-performing systems rely on large-scale models that are costly to deploy, while smaller VLMs lack the necessary knowledge and skills to succeed. To bridge this gap, we present Embodied Reasoning Agent (ERA), a two-stage framework that integrates prior knowledge learning and online reinforcement learning (RL). The first stage, Embodied Prior Learning, distills foundational knowledge from three types of data: (1) Trajectory-Augmented Priors, which enrich existing trajectory data with structured reasoning generated by stronger models; (2) Environment-Anchored Priors, which provide in-environment knowledge and grounding supervision; and (3) External Knowledge Priors, which transfer general knowledge from out-of-environment datasets. In the second stage, we develop an online RL pipeline that builds on these priors to further enhance agent performance. To overcome the inherent challenges in agent RL, including long horizons, sparse rewards, and training instability, we introduce three key designs: self-summarization for context management, dense reward shaping, and turn-level policy optimization. Extensive experiments on both high-level planning (EB-ALFRED) and low-level control (EB-Manipulation) tasks demonstrate that ERA-3B surpasses both prompting-based large models and previous training-based baselines. Specifically, it achieves overall improvements of 8.4% on EB-ALFRED and 19.4% on EB-Manipulation over GPT-4o, and exhibits strong generalization to unseen tasks. Overall, ERA offers a practical path toward scalable embodied intelligence, providing methodological insights for future embodied AI systems.
- SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models
Recently, remarkable progress has been made in Unified Multimodal Models (UMMs), which integrate vision-language generation and understanding capabilities within a single framework. However, a significant gap exists where a model's strong visual understanding often fails to transfer to its visual generation. A model might correctly understand an image based on user instructions, yet be unable to generate a faithful image from text prompts. This phenomenon directly raises a compelling question: Can a model achieve self-improvement by using its understanding module to reward its generation module? To bridge this gap and achieve self-improvement, we introduce SRUM, a self-rewarding post-training framework that can be directly applied to existing UMMs of various designs. SRUM creates a feedback loop where the model's own understanding module acts as an internal "evaluator", providing corrective signals to improve its generation module, without requiring additional human-labeled data. To ensure this feedback is comprehensive, we designed a global-local dual reward system. To tackle the inherent structural complexity of images, this system offers multi-scale guidance: a global reward ensures the correctness of the overall visual semantics and layout, while a local reward refines fine-grained, object-level fidelity. SRUM leads to powerful capabilities and shows strong generalization, boosting performance on T2I-CompBench from 82.18 to 88.37 and on T2I-ReasonBench from 43.82 to 46.75. Overall, our work establishes a powerful new paradigm for enabling a UMM's understanding module to guide and enhance its own generation via self-rewarding.
- UniFusion: Vision-Language Model as Unified Encoder in Image Generation
Although recent advances in visual generation have been remarkable, most existing architectures still depend on distinct encoders for images and text. This separation constrains diffusion models' ability to perform cross-modal reasoning and knowledge transfer. Prior attempts to bridge this gap often use the last layer information from VLM, employ multiple visual encoders, or train large unified models jointly for text and image generation, which demands substantial computational resources and large-scale data, limiting their accessibility. We present UniFusion, a diffusion-based generative model conditioned on a frozen large vision-language model (VLM) that serves as a unified multimodal encoder. At the core of UniFusion is the Layerwise Attention Pooling (LAP) mechanism that extracts both high level semantics and low level details from text and visual tokens of a frozen VLM to condition a diffusion generative model. We demonstrate that LAP outperforms other shallow fusion architectures on text-image alignment for generation and faithful transfer of visual information from VLM to the diffusion model which is key for editing. We propose VLM-Enabled Rewriting Injection with Flexible Inference (VERIFI), which conditions a diffusion transformer (DiT) only on the text tokens generated by the VLM during in-model prompt rewriting. VERIFI combines the alignment of the conditioning distribution with the VLM's reasoning capabilities for increased capabilities and flexibility at inference. In addition, fine-tuning on the editing task not only improves text-image alignment for generation, indicative of cross-modality knowledge transfer, but also exhibits tremendous generalization capabilities. Our model, when trained on single-image editing, zero-shot generalizes to multiple image references, further motivating the unified encoder design of UniFusion.
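Layerwise Attention Pooling can be pictured as each token attending over its own stack of layer outputs, so the pooled conditioning mixes early-layer (low-level) and late-layer (high-level) features. A minimal numpy sketch under assumed shapes and a single learned pooling query — the paper's actual LAP module may differ:

```python
import numpy as np

def layerwise_attention_pooling(layer_states, query):
    """Pool a frozen VLM's per-layer token states into one conditioning
    sequence by softmax-attending over the layer axis.

    layer_states: (L, N, d) hidden states from L layers for N tokens
    query:        (d,) learned pooling query
    """
    L, N, d = layer_states.shape
    scores = layer_states @ query / np.sqrt(d)       # (L, N) score per token per layer
    w = np.exp(scores - scores.max(axis=0))
    w = w / w.sum(axis=0)                            # softmax over the layer axis
    return np.einsum("ln,lnd->nd", w, layer_states)  # (N, d) pooled token states

# Toy demo: 4 layers, 5 tokens, 8 dims.
rng = np.random.default_rng(2)
states = rng.normal(size=(4, 5, 8))
pooled = layerwise_attention_pooling(states, rng.normal(size=(8,)))
```

Because the weights form a convex combination over layers, each pooled token stays within the range its layer stack spans, blending detail and semantics rather than discarding either.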
- Deconstructing Attention: Investigating Design Principles for Effective Language Modeling
The success of Transformer language models is widely credited to their dot-product attention mechanism, which interweaves a set of key design principles: mixing information across positions (enabling multi-token interactions), sequence-dependent activations (where attention weights adapt to each input), a specific mathematical form (dot-product similarities plus softmax weighting), and coupling of queries and keys to evolving hidden states (grounding attention in the current layer). However, the necessity of each of these principles remains largely untested. In this work, we systematically deconstruct attention by designing controlled variants that selectively relax these principles, applied both uniformly across all layers and in hybrid architectures where only some layers retain standard attention. Our empirical analysis reveals that mechanisms for mixing tokens are indispensable, as their absence collapses models to near-random behavior, while the exact mathematical form and sequence dependency can be substantially relaxed, especially when preserved in just a subset of layers. Surprisingly, even variants that fail in isolation can achieve robust performance when interleaved with standard attention, highlighting a cooperative effect. These findings deepen our understanding of what truly underpins attention's effectiveness and open new avenues for simplifying language models without sacrificing performance.
- Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
A key challenge in applying reinforcement learning (RL) to diffusion large language models (dLLMs) lies in the intractability of their likelihood functions, which are essential for the RL objective, necessitating corresponding approximation in each training step. While existing methods approximate the log-likelihoods by their evidence lower bounds (ELBOs) via customized Monte Carlo (MC) sampling, the forward computational graphs of all MC samples need to be retained for the gradient computation of non-linear terms in the RL objective, resulting in significant memory overhead. This constraint restricts feasible sample sizes, leading to imprecise likelihood approximations and ultimately distorting the RL objective. To overcome this limitation, we propose Boundary-Guided Policy Optimization (BGPO), a memory-efficient RL algorithm that maximizes a specially constructed lower bound of the ELBO-based objective. This lower bound is carefully designed to satisfy two key properties: (1) Linearity: it is formulated in a linear sum where each term depends only on a single MC sample, thereby enabling gradient accumulation across samples and ensuring constant memory usage; (2) Equivalence: Both the value and gradient of this lower bound are equal to those of the ELBO-based objective in on-policy training, making it also an effective approximation for the original RL objective. These properties allow BGPO to adopt a large MC sample size, resulting in more accurate likelihood approximations and improved RL objective estimation, which in turn leads to enhanced performance. Experiments show that BGPO significantly outperforms previous RL algorithms for dLLMs in math problem solving, code generation, and planning tasks.
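The linearity property above is what buys constant memory: if the objective is a plain sum of per-sample terms, each term's gradient can be computed and discarded before the next MC sample is processed. The sketch below illustrates that accumulation pattern with a stand-in per-sample gradient; it is not BGPO's actual ELBO lower bound:

```python
import numpy as np

def accumulated_grad(samples, grad_term):
    """Gradient of a linear-sum objective (1/K) * sum_i g(s_i).

    Because each term depends on only one MC sample, its gradient can be
    computed and freed before the next sample, so memory stays constant
    in the sample count K (the property BGPO's bound is built to have)."""
    total = None
    for s in samples:                   # one sample's "graph" alive at a time
        g = grad_term(s)
        total = g if total is None else total + g
    return total / len(samples)

# Toy check: per-sample gradient of g(s) = s^2 is 2s, so the accumulated
# gradient must equal the batch-mean gradient computed all at once.
rng = np.random.default_rng(3)
batch = [rng.normal(size=4) for _ in range(16)]
acc = accumulated_grad(batch, lambda s: 2 * s)
```

A nonlinear function of the sample mean, by contrast, couples all samples in one expression, which in autodiff frameworks forces every sample's forward graph to stay resident — exactly the overhead the abstract describes.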
Solidot (15)
- Researchers find satellites transmitting sensitive data unencrypted
Researchers at UC San Diego and the University of Maryland found that roughly half of geostationary satellite signals carry sensitive data without encryption. Over three years, the team spent $800 on a satellite receiver installed on a university rooftop and intercepted the satellite communications visible from their location. In just nine hours they recorded calls and text messages from more than 2,700 T-Mobile users. They also captured data from airline passengers using in-flight Wi-Fi, communications from power utilities and offshore oil and gas platforms, and US and Mexican military communications that leaked personnel locations and equipment information. The leaks stem from telecom companies using satellites to relay traffic from remote base stations back to their core networks. After the researchers' warning, most of the companies encrypted their satellite transmissions.
- China's Ministry of Commerce publishes an announcement in WPS format for the first time
Last week China's Ministry of Commerce announced new rare-earth export controls (MOFCOM Announcement No. 62 of 2025, a decision imposing export controls on rare-earth-related technologies). The announcement includes two attachments, both filing guides in .wps format: one for the "Explanation of Circumstances of Transferring or Providing Export-Controlled Technology" and one for the "Explanation of Circumstances of Providing Export-Controlled Technology Within China". This is believed to be the first time a Chinese government agency has published an announcement in the WPS document format. .wps is the document format of Kingsoft's WPS Office suite; its encoding differs from the .docx format of the widely used Microsoft Office, so Microsoft Office must convert the text before it can open a WPS document.
- Women are systematically depicted as younger than men
US census data show no systematic age gap between men and women in the workforce, but that is not the picture you get from searching Google or YouTube or querying ChatGPT. According to a study published in Nature, an analysis of 1.4 million online images and videos, along with nine large models, found that women are systematically portrayed as younger than men, and the distortion is most pronounced in high-status, high-income occupations. The researchers found that mainstream algorithms further amplify age-related gender bias: when generating and evaluating nearly 40,000 résumés, ChatGPT assumed women were younger and less experienced while judging older male applicants more qualified. When generating résumés for women, ChatGPT assumed they were younger than the men in its male résumés (by 1.6 years), had graduated more recently, and had less work experience. When evaluating résumés for the same position, ChatGPT scored older men higher than women. The study indicates that gender bias is deeply embedded in these systems.
- Google's Android sideloading restrictions are its most anti-consumer move yet
Many people choose Android phones because the Android ecosystem is more open and free than Apple's iPhone. But now that the ecosystem is large enough, Google has decided it no longer needs to respect users' freedom: this month it began enforcing new rules restricting sideloading, with mandatory enforcement starting in September 2026, in what is the company's most anti-consumer move to date. Google claims the move is for user safety, but the Google Play store, where such rules already apply, has not become any safer. Rather than killing sideloading outright, Google is redefining and restricting participation in the Android ecosystem by creating a mandatory choke point under its own control: through a centralized developer identity-verification system, it is turning an open-source project anyone could participate in into one open only to people it approves. This may be the beginning of the end for independent developers, hobbyists, and niche app communities. Android's openness is gradually closing, and it is unclear whether it will one day become a fully closed ecosystem.
- NASA JPL lays off 550 employees
On Monday, NASA's Jet Propulsion Laboratory announced, in the name of reorganization, layoffs of roughly 550 people, about 11% of its workforce. Employees were to be notified of their status on Tuesday, with the new lab structure taking effect Wednesday. JPL is a NASA-funded R&D laboratory managed by Caltech. Lab director Dave Gallagher said that while not easy, he believes taking these actions now will help the lab transform at the scale and speed needed to power humanity's boldest ambitions in space.
- Software update bricks some Jeep Wrangler 4xe hybrids
Over the weekend, Jeep, part of auto giant Stellantis, pushed a software update to Wrangler 4xe hybrids, and owners who took the OTA update to the Uconnect infotainment system soon reported problems on forums, Reddit, and YouTube. As with last year's CrowdStrike incident, shipping updates over a weekend is evidently a bad idea. The update turned out to have a serious flaw that could cause the vehicle to lose power while driving, a potential safety hazard. Owners reported that the buggy update did not brick the vehicle immediately; failures occurred while driving, for some at low speed shortly after leaving home, for others on the highway. Jeep quickly pulled the update, but many owners had already installed it. Company staff advised owners who had not installed it to ignore the update, and those who had installed it but not yet hit the fault to avoid hybrid and electric modes. Jeep has since released a software fix.
- The x86 ecosystem advisory group's first year of results
Last October, AMD and Intel announced the formation of an x86 ecosystem advisory group aimed at greater consistency in x86 architecture implementations. The x86 ecosystem is in some sense co-developed by AMD and Intel, but the two companies have kept their distance, producing inefficiencies and divergence in parts of the instruction set architecture. Advanced Vector Extensions (AVX) is a classic example: AVX-512 was available only on Intel platforms for years; AMD added initial AVX-512 support with Zen 4 in 2022, and only Zen 5, released in 2024, fully supports the 512-bit data path. The advisory group's results over the past year include the ChkTag memory-tagging ISA, Flexible Return Event Delivery (FRED), AVX10 (the next generation of the AVX instruction set), ACE for the matrix-multiplication Advanced Matrix Extensions (AMX), and more.
- US manufacturing weakens amid the AI gold rush
The US AI industry is booming, but manufacturing has slid into a deeper slump. US manufacturing employed about 19.5 million workers at its 1979 peak; that number has since shrunk to under 13 million, and in the year through August manufacturing shed roughly another 78,000 jobs. Census data also show the number of newly formed manufacturers declining. Bureau of Economic Analysis data show factory investment fell about 6% in the year through July, the first decline since early 2021. Trump's tariff policies have also cut into manufacturers' profits. The manufacturing malaise stands in stark contrast to the surge in AI investment; most of the hardware the AI industry uses is exempt from tariffs. In the first half of 2025, US data-center investment rose nearly 37% year over year while factory construction fell about 3%, and US investment in computing equipment rose more than 45% year over year while spending on traditional industrial equipment was nearly flat.
- Japan's summers have grown three weeks longer over the past 42 years
A research team at Mie University found that over the 42 years from 1982 to 2023, the length of Japan's summer increased by about three weeks. Winter length was essentially unchanged, while spring and autumn kept shrinking. The team warns that rising sea-surface temperatures driven by global warming are the main cause, and that if the warming trend continues, the shift toward a two-season pattern of long summers and long winters will intensify. Over the 42 years, the start of summer moved about 12.6 days earlier and its end about 8.8 days later, a total gain of about 21.4 days. In 2023, for example, Japan's summer ran from June 11 to October 9, a total of 121 days.
- Pope urges vigilance over those who control algorithms
Pope Leo XIV last week received participants in the 39th congress of the international association of news agencies, urging the cultivation of "conscience" and "critical thinking" in an era saturated with "junk" information and digital media. "We are not condemned to live in a world where truth and fiction can no longer be told apart," he said, quoting Hannah Arendt's famous line that the ideal subject of totalitarian rule is not the convinced Nazi or Communist, but the person for whom the distinction between fact and fiction no longer exists. The Pope asked: "Algorithms generate content and data at unprecedented scale and speed, but who controls them? Artificial intelligence is changing how we access information and communicate, but who is steering it, and to what ends?" He cautioned that we must remain vigilant so that technology does not replace human beings, and so that the management of information and algorithms does not fall into the hands of a few.
- Most open-weight models come from China
Although large models from US companies such as OpenAI, Anthropic, and Google lead the world on benchmarks, those models are essentially proprietary, with closed weights. According to statistics from Hugging Face and LMArena, the Chinese companies DeepSeek and Alibaba have released the most-downloaded open-weight models. Meta once championed open models, and Mark Zuckerberg said last year that the world would benefit if AI companies shared their models, but Meta has since slowed the release of its latest models, and Zuckerberg now says the company will keep its best models to itself.
- Microsoft ends support for Windows 10
On October 14, Microsoft ended mainstream support for Windows 10, moving it into the extended support phase. According to Statcounter, as of September 2025 more than 40% of users were still running Windows 10, meaning hundreds of millions of users face security risks because of Microsoft's decision. Under outside pressure, Microsoft has offered other ways to get security updates for users who cannot upgrade to Windows 11 (Microsoft raised the hardware requirements, leaving a large number of Windows 10 PCs unable to upgrade). Users in the European Economic Area can register for free Extended Security Updates; users outside Europe can get free updates by signing in to their PC with a Microsoft account and backing up their settings; everyone else must pay a $30 fee.
- Nobel Prize in economics awarded to three economists for research on innovation and growth
The Royal Swedish Academy of Sciences announced the 2025 Nobel Prize in economics: American economist Joel Mokyr, for identifying, through historical observation, the prerequisites for sustained growth through technological progress; and economists Peter Howitt and Philippe Aghion, for their theory of sustained growth through creative destruction. The academy said the laureates' work explains how technology gives rise to new products and production methods that replace the old, raising living standards, quality of life, and health worldwide. Over the past two centuries the world has, for the first time, achieved sustained economic growth, lifting vast numbers of people out of poverty and laying the foundation for prosperity; for most of human history, stagnation rather than growth was the norm. Their work shows that we must recognize and counter the threats to continued growth.
- SmartNav brings urban GPS accuracy to 10 centimeters
GPS is inaccurate in dense urban areas because signals reflect off the surfaces of tall buildings and take longer to reach the receiver, throwing off distance calculations and degrading position accuracy. Researchers at the Norwegian University of Science and Technology developed SmartNav, which combines satellite corrections, wave analysis, and Google's 3D building data; in tests it achieved accuracy within 10 centimeters 90% of the time, offering an inexpensive approach to urban navigation. Google has 3D building models for roughly 4,000 cities worldwide, and the system uses this data to compute how satellite signals reflect between buildings.
- Older fathers pass on more disease-causing mutations
A new study published in Nature shows that the risk of older fathers passing disease-causing mutations to their children is higher than previously thought. Genome sequencing shows that in men in their early 30s, roughly 1 in 50 sperm carries a disease-causing mutation; by age 70 the proportion rises to nearly 1 in 20. The researchers suggest that young men who expect to have children later in life could consider freezing their sperm, while older men planning to start families can consider existing screening technologies. Recent research shows that most cells in each of us carry about 70 new mutations absent in our parents, about 80% of which originate in the father's testes, and that is not counting the larger-scale chromosomal abnormalities that are more common in the mother's eggs.