OrangeBot.AI Digest — 2025-09-21
60 headlines across 8 sources, aggregated for this day.
Hacker News (15)
- Rail travel is booming in America (www.economist.com)
- Sj.h: A tiny little JSON parsing library in ~150 lines of C99 (github.com)
- I forced myself to spend a week in Instagram instead of Xcode (www.pixelpusher.club)
- DXGI debugging: Microsoft put me on a list (slugcat.systems)
- LaLiga's Anti-Piracy Crackdown Triggers Widespread Internet Disruptions in Spain (reclaimthenet.org)
- The University of Oxford has fallen out of the top three universities in the UK (hotminute.co.uk)
- UK, Canada and Australia formally recognise Palestinian state (www.theguardian.com)
- Why your outdoorsy friend suddenly has a gummy bear power bank (www.theverge.com)
- Meta exposé author faces $50k fine per breach of non-disparagement agreement (www.theguardian.com)
- They Thought They Were Free (1955) (press.uchicago.edu)
- Universities should be more than toll gates (www.waliddib.com)
- Vibe coding cleanup as a service (donado.co)
- Spectral Labs releases SGS-1: the first generative model for structured CAD (www.spectrallabs.ai)
- AI was supposed to help juniors shine. Why does it mostly make seniors stronger? (elma.dev)
- iFixit iPhone Air teardown (www.ifixit.com)
GitHub Trending (15)
- Alibaba-NLP / DeepResearch
Tongyi Deep Research, the Leading Open-source Deep Research Agent
- Gar-b-age / CookLikeHOC
🥢 Cooking like Lao Xiang Ji (老乡鸡) 🐔. The main content was completed in 2024; this is not an official Lao Xiang Ji repository. The text comes from the "Lao Xiang Ji Dish Traceability Report", organized, edited, and consolidated. CookLikeHOC.
- torvalds / linux
Linux kernel source tree
- LazyVim / LazyVim
Neovim config for the lazy
- x1xhlol / system-prompts-and-models-of-ai-tools
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus Agent Tools, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae, Traycer AI, VSCode Agent, Warp.dev, Windsurf, Xcode, Z.ai Code, dia & v0. (And other Open Sourced) System Prompts, Internal Tools & AI Models
- basecamp / omarchy
Opinionated Arch/Hyprland Setup
- fmtlib / fmt
A modern formatting library
- WECENG / ticket-purchase
Automated ticket grabbing for Damai (大麦), with support for selecting attendees, city, date and session, and price tier
- WebGoat / WebGoat
WebGoat is a deliberately insecure application
- CopilotKit / CopilotKit
React UI + elegant infrastructure for AI Copilots, AI chatbots, and in-app AI agents. The Agentic last-mile 🪁
- microsoft / AI-For-Beginners
12 Weeks, 24 Lessons, AI for All!
- HKUDS / AI-Researcher
[NeurIPS2025] "AI-Researcher: Autonomous Scientific Innovation" -- A production-ready version: https://novix.science/chat
- tldraw / tldraw
very good whiteboard SDK / infinite canvas SDK
- EbookFoundation / free-programming-books
📚 Freely available programming books
- ml-explore / mlx-swift-examples
Examples using MLX Swift
Hugging Face (15)
- ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential, yet progress is limited by the lack of large-scale, open-source computer use data and foundation models. In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task domains, built via a closed-loop pipeline uniting automated agents with human experts. Trained on this scaled-up data, ScaleCUA can operate seamlessly across platforms. Specifically, it delivers strong gains over baselines (+26.6 on WebArena-Lite-v2, +10.7 on ScreenSpot-Pro) and sets new state-of-the-art results (94.4% on MMBench-GUI L1-Hard, 60.6% on OSWorld-G, 47.4% on WebArena-Lite-v2). These findings underscore the power of data-driven scaling for general-purpose computer use agents. We will release data, models, and code to advance future research: https://github.com/OpenGVLab/ScaleCUA.
- FlowRL: Matching Reward Distributions for LLM Reasoning
We propose FlowRL: matching the full reward distribution via flow balancing instead of maximizing rewards in large language model (LLM) reinforcement learning (RL). Recent advanced reasoning models adopt reward-maximizing methods (e.g., PPO and GRPO), which tend to over-optimize dominant reward signals while neglecting less frequent but valid reasoning paths, thus reducing diversity. In contrast, we transform scalar rewards into a normalized target distribution using a learnable partition function, and then minimize the reverse KL divergence between the policy and the target distribution. We implement this idea as a flow-balanced optimization method that promotes diverse exploration and generalizable reasoning trajectories. We conduct experiments on math and code reasoning tasks: FlowRL achieves a significant average improvement of 10.0% over GRPO and 5.1% over PPO on math benchmarks, and performs consistently better on code reasoning tasks. These results highlight reward distribution-matching as a key step toward efficient exploration and diverse reasoning in LLM reinforcement learning.
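The core objective can be sketched in a few lines. This toy version computes the partition function exactly over a small candidate set (in the paper it is learnable), so treat it as an illustration of reward-distribution matching rather than the paper's implementation:

```python
import math

def reverse_kl_to_reward_target(policy_probs, rewards, beta=1.0):
    """Reverse KL from the policy to a softmax-normalized reward
    distribution over candidate trajectories. The normalizer z plays
    the role of the partition function."""
    z = sum(math.exp(beta * r) for r in rewards)          # partition function
    target = [math.exp(beta * r) / z for r in rewards]    # p(y) proportional to exp(beta*r(y))
    return sum(p * math.log(p / q)                        # KL(pi || p_target)
               for p, q in zip(policy_probs, target) if p > 0)

# A reward-maximizing policy piles mass on the single best trajectory;
# distribution matching prefers a policy that also covers less frequent
# but still-rewarded reasoning paths.
rewards = [1.0, 0.8, 0.7]
peaked = [0.98, 0.01, 0.01]   # collapsed onto the top-reward path
spread = [0.60, 0.25, 0.15]   # covers valid minority paths
# reverse_kl_to_reward_target(spread, rewards) is smaller than for `peaked`.
```

Under this objective, diversity among valid trajectories is rewarded directly instead of being a side effect of exploration noise.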
- Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
Large language models (LLMs) are increasingly applied in diverse real-world scenarios, each governed by bespoke behavioral and safety specifications (specs) custom-tailored by users or organizations. These specs, categorized into safety-specs and behavioral-specs, vary across scenarios and evolve with changing preferences and requirements. We formalize this challenge as specification alignment, focusing on LLMs' ability to follow dynamic, scenario-specific specs from both behavioral and safety perspectives. To address this challenge, we propose Align3, a lightweight method that employs Test-Time Deliberation (TTD) with hierarchical reflection and revision to reason over the specification boundaries. We further present SpecBench, a unified benchmark for measuring specification alignment, covering 5 scenarios, 103 specs, and 1,500 prompts. Experiments on 15 reasoning and 18 instruct models with several TTD methods, including Self-Refine, TPO, and MoreThink, yield three key findings: (i) test-time deliberation enhances specification alignment; (ii) Align3 advances the safety-helpfulness trade-off frontier with minimal overhead; (iii) SpecBench effectively reveals alignment gaps. These results highlight the potential of test-time deliberation as an effective strategy for reasoning over real-world specification boundaries.
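The abstract does not spell out Align3's hierarchical reflection, but the general test-time deliberation pattern it builds on can be sketched with caller-supplied model calls; the `generate`/`critique`/`revise` stubs below are hypothetical stand-ins, not the paper's API:

```python
def deliberate(prompt, specs, generate, critique, revise, max_rounds=3):
    """Generic test-time deliberation loop: draft an answer, check it
    against every spec, and revise only against the violated ones.
    critique(answer, spec) returns True when the spec is still violated."""
    answer = generate(prompt)
    for _ in range(max_rounds):
        violations = [s for s in specs if critique(answer, s)]
        if not violations:
            break                       # answer satisfies every spec
        answer = revise(answer, violations)
    return answer

# Toy demo with string stubs standing in for model calls
# (a "spec" here is just a forbidden word):
gen = lambda p: "foo bar"
crit = lambda ans, spec: spec in ans
rev = lambda ans, vs: " ".join(w for w in ans.split() if w not in vs)
print(deliberate("q", ["bar"], gen, crit, rev))   # -> foo
```

Because deliberation happens at inference time, the same base model can be pointed at a different spec list per scenario without retraining.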
- Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
Large language models (LLMs) are increasingly trained with reinforcement learning from verifiable rewards (RLVR), yet real-world deployment demands models that can self-improve without labels or external judges. Existing label-free methods (confidence minimization, self-consistency, or majority-vote objectives) stabilize learning but steadily shrink exploration, causing an entropy collapse: generations become shorter, less diverse, and brittle. Unlike prior approaches such as Test-Time Reinforcement Learning (TTRL), which primarily adapt models to the immediate unlabeled dataset at hand, our goal is broader: to enable general improvements without sacrificing the model's inherent exploration capacity and generalization ability, i.e., evolving. We formalize this issue and propose EVolution-Oriented and Label-free Reinforcement Learning (EVOL-RL), a simple rule that couples stability with variation under a label-free setting. EVOL-RL keeps the majority-voted answer as a stable anchor (selection) while adding a novelty-aware reward that favors responses whose reasoning differs from what has already been produced (variation), measured in semantic space. Implemented with GRPO, EVOL-RL also uses asymmetric clipping to preserve strong signals and an entropy regularizer to sustain search. This majority-for-selection + novelty-for-variation design prevents collapse, maintains longer and more informative chains of thought, and improves both pass@1 and pass@n. EVOL-RL consistently outperforms the majority-only TTRL baseline; e.g., training on label-free AIME24 lifts Qwen3-4B-Base AIME25 pass@1 from TTRL's 4.6% to 16.4%, and pass@16 from 18.5% to 37.9%. EVOL-RL not only prevents diversity collapse but also unlocks stronger generalization across domains (e.g., GPQA). Furthermore, we demonstrate that EVOL-RL also boosts performance in the RLVR setting, highlighting its broad applicability.
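The selection-plus-variation reward can be illustrated in a few lines. The ±1 anchor and the 0.5 novelty weight below are stand-in choices for illustration, not the paper's actual reward shaping:

```python
from collections import Counter

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm

def evol_rl_reward(answer, emb, group_answers, group_embs, novelty_weight=0.5):
    """Majority-for-selection + novelty-for-variation, sketched.
    Selection: the majority-voted answer acts as a stable anchor.
    Variation: reasoning whose embedding differs from the rest of the
    sampled group earns a novelty bonus, discouraging entropy collapse."""
    majority, _ = Counter(group_answers).most_common(1)[0]
    selection = 1.0 if answer == majority else -1.0        # stable anchor
    sims = [cosine(emb, e) for e in group_embs if e is not emb]
    novelty = 1.0 - max(sims) if sims else 0.0             # semantic novelty
    return selection + novelty_weight * novelty
```

Two responses giving the same majority answer thus earn different rewards when one of them reaches it through a reasoning path unlike the rest of the group.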
- Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
Recent studies have demonstrated the importance of high-quality visual representations in image generation and have highlighted the limitations of generative models in image understanding. As a generative paradigm originally designed for natural language, autoregressive models face similar challenges. In this work, we present the first systematic investigation into the mechanisms of applying the next-token prediction paradigm to the visual domain. We identify three key properties that hinder the learning of high-level visual semantics: local and conditional dependence, inter-step semantic inconsistency, and spatial invariance deficiency. We show that these issues can be effectively addressed by introducing self-supervised objectives during training, leading to a novel training framework, Self-guided Training for AutoRegressive models (ST-AR). Without relying on pre-trained representation models, ST-AR significantly enhances the image understanding ability of autoregressive models and leads to improved generation quality. Specifically, ST-AR brings approximately 42% FID improvement for LlamaGen-L and 49% FID improvement for LlamaGen-XL, while maintaining the same sampling strategy.
- FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
Search has emerged as core infrastructure for LLM-based agents and is widely viewed as critical on the path toward more general intelligence. Finance is a particularly demanding proving ground: analysts routinely conduct complex, multi-step searches over time-sensitive, domain-specific data, making it ideal for assessing both search proficiency and knowledge-grounded reasoning. Yet no existing open financial datasets evaluate the data-searching capability of end-to-end agents, largely because constructing realistic, complicated tasks requires deep financial expertise, and time-sensitive data is hard to evaluate. We present FinSearchComp, the first fully open-source agent benchmark for realistic, open-domain financial search and reasoning. FinSearchComp comprises three tasks -- Time-Sensitive Data Fetching, Simple Historical Lookup, and Complex Historical Investigation -- that closely reproduce real-world financial analyst workflows. To ensure difficulty and reliability, we engage 70 professional financial experts for annotation and implement a rigorous multi-stage quality-assurance pipeline. The benchmark includes 635 questions spanning global and Greater China markets, and we evaluate 21 models (products) on it. Grok 4 (web) tops the global subset, approaching expert-level accuracy. DouBao (web) leads on the Greater China subset. Experimental analyses show that equipping agents with web search and financial plugins substantially improves results on FinSearchComp, and the country origin of models and tools impacts performance significantly. By aligning with realistic analyst tasks and providing end-to-end evaluation, FinSearchComp offers a professional, high-difficulty testbed for complex financial search and reasoning.
- WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
Recent video diffusion models demonstrate strong potential in spatial intelligence tasks due to their rich latent world priors. However, this potential is hindered by their limited controllability and geometric inconsistency, creating a gap between their strong priors and their practical use in 3D/4D tasks. As a result, current approaches often rely on retraining or fine-tuning, which risks degrading pretrained knowledge and incurs high computational costs. To address this, we propose WorldForge, a training-free, inference-time framework composed of three tightly coupled modules. Intra-Step Recursive Refinement introduces a recursive refinement mechanism during inference, which repeatedly optimizes network predictions within each denoising step to enable precise trajectory injection. Flow-Gated Latent Fusion leverages optical flow similarity to decouple motion from appearance in the latent space and selectively inject trajectory guidance into motion-related channels. Dual-Path Self-Corrective Guidance compares guided and unguided denoising paths to adaptively correct trajectory drift caused by noisy or misaligned structural signals. Together, these components inject fine-grained, trajectory-aligned guidance without training, achieving both accurate motion control and photorealistic content generation. Extensive experiments across diverse benchmarks validate our method's superiority in realism, trajectory consistency, and visual fidelity. This work introduces a novel plug-and-play paradigm for controllable video synthesis, offering a new perspective on leveraging generative priors for spatial intelligence.
- RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
This paper presents RynnVLA-001, a vision-language-action (VLA) model built upon large-scale video generative pretraining from human demonstrations. We propose a novel two-stage pretraining methodology. The first stage, Ego-Centric Video Generative Pretraining, trains an Image-to-Video model on 12M ego-centric manipulation videos to predict future frames conditioned on an initial frame and a language instruction. The second stage, Human-Centric Trajectory-Aware Modeling, extends this by jointly predicting future keypoint trajectories, thereby effectively bridging visual frame prediction with action prediction. Furthermore, to enhance action representation, we propose ActionVAE, a variational autoencoder that compresses sequences of actions into compact latent embeddings, reducing the complexity of the VLA output space. When finetuned on the same downstream robotics datasets, RynnVLA-001 achieves superior performance over state-of-the-art baselines, demonstrating that the proposed pretraining strategy provides a more effective initialization for VLA models.
- AToken: A Unified Tokenizer for Vision
We present AToken, the first unified visual tokenizer that achieves both high-fidelity reconstruction and semantic understanding across images, videos, and 3D assets. Unlike existing tokenizers that specialize in either reconstruction or understanding for single modalities, AToken encodes these diverse visual inputs into a shared 4D latent space, unifying both tasks and modalities in a single framework. Specifically, we introduce a pure transformer architecture with 4D rotary position embeddings to process visual inputs of arbitrary resolutions and temporal durations. To ensure stable training, we introduce an adversarial-free training objective that combines perceptual and Gram matrix losses, achieving state-of-the-art reconstruction quality. By employing a progressive training curriculum, AToken gradually expands from single images to videos and 3D assets, and supports both continuous and discrete latent tokens. AToken achieves 0.21 rFID with 82.2% ImageNet accuracy for images, 3.01 rFVD with 32.6% MSRVTT retrieval for videos, and 28.19 PSNR with 90.9% classification accuracy for 3D. In downstream applications, AToken enables both visual generation tasks (e.g., image generation with continuous and discrete tokens, text-to-video generation, image-to-3D synthesis) and understanding tasks (e.g., multimodal LLMs), achieving competitive performance across all benchmarks. These results shed light on the next-generation multimodal AI systems built upon unified visual tokenization.
- Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for non-permissive, toxic, and personally identifiable content. To mitigate risks of memorization, we adopt the Goldfish objective during pretraining, strongly suppressing verbatim recall of data while retaining downstream task performance. The Apertus models also expand multilingual coverage, training on 15T tokens from over 1800 languages, with ~40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivalling or surpassing open-weight counterparts. Beyond model weights, we release all scientific artifacts from our development cycle with a permissive license, including data preparation scripts, checkpoints, evaluation suites, and training code, enabling transparent audit and extension.
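The Goldfish objective mentioned above suppresses memorization by deterministically excluding a fraction of token positions from the next-token loss. A hash-based sketch of that masking (the parameters `k` and `h` here are illustrative, not Apertus's settings) might look like:

```python
import hashlib

def goldfish_mask(token_ids, k=4, h=3):
    """Goldfish-style loss mask, sketched: a token is dropped from the
    next-token loss when a hash of its local h-token context selects it.
    Because the mask depends only on content, the same passage always
    hides the same ~1/k of its tokens, so verbatim recall of that
    passage is never fully supervised."""
    mask = []
    for i in range(len(token_ids)):
        ctx = tuple(token_ids[max(0, i - h):i + 1])
        digest = hashlib.sha256(repr(ctx).encode()).digest()
        mask.append(digest[0] % k != 0)   # False => excluded from the loss
    return mask
```

During training, the cross-entropy loss would simply be zeroed at positions where the mask is False; evaluation and inference are unaffected.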
- MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
Current instruction-based image editing (IBIE) methods struggle with challenging editing tasks, as both editing types and sample counts of existing datasets are limited. Moreover, traditional dataset construction often contains noisy image-caption pairs, which may introduce biases and limit model capabilities in complex editing scenarios. To address these limitations, we introduce MultiEdit, a comprehensive dataset featuring over 107K high-quality image editing samples. It encompasses 6 challenging editing tasks through a diverse collection of 18 non-style-transfer editing types and 38 style transfer operations, covering a spectrum from sophisticated style transfer to complex semantic operations like person reference editing and in-image text editing. We employ a novel dataset construction pipeline that utilizes two multi-modal large language models (MLLMs) to generate visual-adaptive editing instructions and produce high-fidelity edited images, respectively. Extensive experiments demonstrate that fine-tuning foundational open-source models with our MultiEdit-Train set substantially improves models' performance on sophisticated editing tasks in our proposed MultiEdit-Test benchmark, while effectively preserving their capabilities on the standard editing benchmark. We believe MultiEdit provides a valuable resource for advancing research into more diverse and challenging IBIE capabilities. Our dataset is available at https://huggingface.co/datasets/inclusionAI/MultiEdit.
- Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization
Materials characterization is fundamental to acquiring materials information, revealing the processing-microstructure-property relationships that guide material design and optimization. While multimodal large language models (MLLMs) have recently shown promise in generative and predictive tasks within materials science, their capacity to understand real-world characterization imaging data remains underexplored. To bridge this gap, we present MatCha, the first benchmark for materials characterization image understanding, comprising 1,500 questions that demand expert-level domain expertise. MatCha encompasses four key stages of materials research, comprising 21 distinct tasks, each designed to reflect authentic challenges faced by materials scientists. Our evaluation of state-of-the-art MLLMs on MatCha reveals a significant performance gap compared to human experts. These models exhibit degradation when addressing questions requiring higher-level expertise and sophisticated visual perception. Simple few-shot and chain-of-thought prompting struggle to alleviate these limitations. These findings highlight that existing MLLMs still exhibit limited adaptability to real-world materials characterization scenarios. We hope MatCha will facilitate future research in areas such as new material discovery and autonomous scientific agents. MatCha is available at https://github.com/FreedomIntelligence/MatCha.
- Agentic Software Engineering: Foundational Pillars and a Research Roadmap
Agentic Software Engineering (SE 3.0) represents a new era where intelligent agents are tasked not with simple code generation, but with achieving complex, goal-oriented SE objectives. To harness these new capabilities while ensuring trustworthiness, we must recognize a fundamental duality within the SE field in the Agentic SE era, comprising two symbiotic modalities: SE for Humans and SE for Agents. This duality demands a radical reimagining of the foundational pillars of SE (actors, processes, tools, and artifacts) which manifest differently across each modality. We propose two purpose-built workbenches to support this vision. The Agent Command Environment (ACE) serves as a command center where humans orchestrate and mentor agent teams, handling outputs such as Merge-Readiness Packs (MRPs) and Consultation Request Packs (CRPs). The Agent Execution Environment (AEE) is a digital workspace where agents perform tasks while invoking human expertise when facing ambiguity or complex trade-offs. This bi-directional partnership, which supports agent-initiated human callbacks and handovers, gives rise to new, structured engineering activities (i.e., processes) that redefine human-AI collaboration, elevating the practice from agentic coding to true agentic software engineering. This paper presents the Structured Agentic Software Engineering (SASE) vision, outlining several of the foundational pillars for the future of SE. The paper culminates in a research roadmap that identifies a few key challenges and opportunities while briefly discussing the resulting impact of this future on SE education. Our goal is not to offer a definitive solution, but to provide a conceptual scaffold with structured vocabulary to catalyze a community-wide dialogue, pushing the SE community to think beyond its classic, human-centric tenets toward a disciplined, scalable, and trustworthy agentic future.
- RecoWorld: Building Simulated Environments for Agentic Recommender Systems
We present RecoWorld, a blueprint for building simulated environments tailored to agentic recommender systems. Such environments give agents a proper training space where they can learn from errors without impacting real users. RecoWorld distinguishes itself with a dual-view architecture: a simulated user and an agentic recommender engage in multi-turn interactions aimed at maximizing user retention. The user simulator reviews recommended items, updates its mindset, and when sensing potential user disengagement, generates reflective instructions. The agentic recommender adapts its recommendations by incorporating these user instructions and reasoning traces, creating a dynamic feedback loop that actively engages users. This process leverages the exceptional reasoning capabilities of modern LLMs. We explore diverse content representations within the simulator, including text-based, multimodal, and semantic ID modeling, and discuss how multi-turn RL enables the recommender to refine its strategies through iterative interactions. RecoWorld also supports multi-agent simulations, allowing creators to simulate the responses of targeted user populations. It marks an important first step toward recommender systems where users and agents collaboratively shape personalized information streams. We envision new interaction paradigms where "user instructs, recommender responds," jointly optimizing user retention and engagement.
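The dual-view feedback loop described above can be sketched as follows; the recommender/user interfaces are invented for illustration and are not RecoWorld's actual API:

```python
def interaction_loop(recommender, user, rounds=5):
    """Dual-view loop sketched from the RecoWorld blueprint: the
    simulated user reviews each slate, updates its mindset, and issues
    a reflective instruction when it senses disengagement; the
    recommender folds that instruction into its next slate."""
    history, instruction = [], None
    for _ in range(rounds):
        items = recommender.recommend(history, instruction)
        feedback = user.review(items)          # mindset update happens here
        history.append((items, feedback))
        instruction = user.reflect() if user.disengaging() else None
    return history
```

In a multi-turn RL setup, `history` would become the trajectory the agentic recommender is trained on, with retention-oriented rewards derived from the simulated user's feedback.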
- Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding
Spatio-temporal video grounding (STVG) aims at localizing the spatio-temporal tube of a video, as specified by the input text query. In this paper, we utilize multimodal large language models (MLLMs) to explore a zero-shot solution in STVG. We reveal two key insights about MLLMs: (1) MLLMs tend to dynamically assign special tokens, referred to as grounding tokens, for grounding the text query; and (2) MLLMs often suffer from suboptimal grounding due to the inability to fully integrate the cues in the text query (e.g., attributes, actions) for inference. Based on these insights, we propose an MLLM-based zero-shot framework for STVG, which includes novel decomposed spatio-temporal highlighting (DSTH) and temporal-augmented assembling (TAS) strategies to unleash the reasoning ability of MLLMs. The DSTH strategy first decouples the original query into attribute and action sub-queries to query the presence of the target both spatially and temporally. It then uses a novel logit-guided re-attention (LRA) module to learn latent variables as spatial and temporal prompts, by regularizing token predictions for each sub-query. These prompts highlight attribute and action cues, respectively, directing the model's attention to reliable spatially and temporally relevant visual regions. In addition, as the spatial grounding by the attribute sub-query should be temporally consistent, we introduce the TAS strategy to assemble the predictions using the original video frames and the temporal-augmented frames as inputs to help improve temporal consistency. We evaluate our method on various MLLMs, and show that it outperforms SOTA methods on three common STVG benchmarks. The code will be available at https://github.com/zaiquanyang/LLaVA_Next_STVG.
Solidot (15)
- OpenAI researchers say AI hallucinations are mathematically inevitable
OpenAI researchers published a paper on the preprint platform arXiv arguing that, due to the statistical nature of large language models and computational constraints, AI will still produce plausible-sounding but incorrect outputs even with perfect training data. The researchers concede that AI hallucinations are mathematically inevitable and cannot be eliminated through more advanced engineering. Like students facing hard exam questions, the paper says, LLMs guess when uncertain, producing plausible but false statements rather than admitting uncertainty. Even the most advanced AI systems still hallucinate, which undermines trust. The researchers prove that hallucinations stem from the statistical properties of LLM training rather than from implementation flaws. They tested competing models including DeepSeek-V3, Meta AI, and Claude 3.7 Sonnet, as well as OpenAI's own GPT series. ChatGPT also hallucinates; GPT-5 hallucinates less, but hallucinations still occur, and more advanced reasoning models hallucinate more often than simpler systems: the o1 reasoning model hallucinates 16% of the time, while the newer o3 and o4-mini hallucinate 33% and 48% of the time respectively. OpenAI's research identifies three mathematical factors that make hallucinations inevitable: epistemic uncertainty when information appears too rarely in the training data, model limitations, and computational intractability.
- Sogou Input Method's cloud-control delivery module quietly tampers with Edge and Chrome settings
Security firm Huorong (火绒) reports that a cloud-control delivery module in Sogou Input Method quietly tampers with the homepage and default search engine settings of the Edge and Chrome browsers. The affected Sogou Input Method version is 15.7.0.2192. It uses the Shiply release platform to push cloud-control configurations; the platform can target deliveries precisely by time window, region, app version, and other conditions, and supports staged-rollout strategies, allowing small-scale testing before wide distribution. Sogou's promotion module first detects antivirus software on the user's device, then forcibly changes the homepage and default search engine of both Edge and Chrome by tampering with their configuration files. In Chrome's case, opening the browser redirects to page.wenxin9.com and then on to a navigation page; Baidu links clicked on that navigation page all carry source-tracking parameters.
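Because this kind of hijack works by rewriting Chrome's per-profile "Preferences" JSON file, an affected user can audit those settings directly. The Windows path and key names below are the commonly documented ones but vary by Chrome version and platform, so treat this as an auditing sketch rather than a stable API:

```python
import json
from pathlib import Path

# Default per-profile settings file on Windows (assumption: default install).
PREFS = Path.home() / "AppData/Local/Google/Chrome/User Data/Default/Preferences"

def audit_chrome_prefs(path=PREFS):
    """Read Chrome's Preferences JSON and report the settings a
    homepage/search hijacker typically rewrites."""
    prefs = json.loads(Path(path).read_text(encoding="utf-8"))
    report = {
        "homepage": prefs.get("homepage"),
        "startup_urls": prefs.get("session", {}).get("startup_urls"),
        "search_provider": prefs.get("default_search_provider_data", {})
                                .get("template_url_data", {}).get("keyword"),
    }
    urls = [report["homepage"], *(report["startup_urls"] or [])]
    # Flag the redirect domain named in the Huorong report.
    suspicious = [u for u in urls if u and "wenxin9.com" in u]
    return report, suspicious
```

Note that Chrome also keeps a "Secure Preferences" file with integrity checks precisely to resist this kind of external modification, which is why hijacked settings sometimes trigger a "reset settings" prompt.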
- Vatican City has the world's highest per-capita Flathub installs
The papal seat apparently loves Linux software. A Reddit user tallied downloads from the Linux app store Flathub by country/region and divided by population. The surprising result: Vatican City, the city-state within Rome, has the highest per-capita downloads in the world, in large part because it has only 496 residents (Wikipedia's 2024 figure is 882) against 6,878 total downloads, or nearly 14 per person, far ahead of second-place Germany at about 4 per person. The statistics show Flathub is quite popular in Europe, the US, Canada, Australia, and New Zealand, but has few users in Asia and Africa. Flathub distributes Linux software packaged as Flatpaks, which the vast majority of Linux distributions can run.
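The per-capita figures are easy to sanity-check:

```python
# Figures from the tally above: 6,878 Vatican downloads over either
# population estimate cited.
downloads = 6878
residents_tally = 496      # population figure used in the Reddit tally
residents_wiki = 882       # Wikipedia's 2024 estimate

print(round(downloads / residents_tally, 1))  # -> 13.9, the "nearly 14" figure
print(round(downloads / residents_wiki, 1))   # -> 7.8, still well above Germany's ~4
```

Even under the larger population estimate, the Vatican keeps the top spot by a wide margin.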
- Austrian military switches from MS Office to LibreOffice
The Austrian armed forces have switched from proprietary MS Office to the open-source LibreOffice. The move was not made to save license fees on roughly 16,000 workstations; in the words of military officials, it is meant to strengthen digital sovereignty, keep infrastructure independent, and ensure data is processed only internally. The main motivation is that Microsoft's office suite is moving to the cloud, and the military cannot use external cloud services to process internal data. The Austrian military began trialing LibreOffice in 2022 to prepare for the migration. It previously used Microsoft Office 2016 Professional, which has now been uninstalled; users who still need Microsoft's office software can apply internally for MS Office 2024 LTSC. While using LibreOffice, the Austrian military has also contributed code to the open-source project.
- Xiaomi to remotely fix a driver-assist defect in 110,000 of its SU7s
Xiaomi and China's State Administration for Market Regulation announced on Friday a recall of certain SU7 Standard Edition electric vehicles produced between February 6, 2024 and August 30, 2025, totaling 116,887 vehicles. Recall number S2025M0149I covers the XMA7000MBEVR2 and XMA7000MBEVR5 models, 98,462 vehicles in total; recall number S2025M0150I covers the BJ7000MBEVR2 model, 18,425 vehicles. In some situations with the L2 highway navigation driver-assist feature enabled, affected vehicles may not adequately recognize, warn about, or respond to extreme corner-case scenarios; if the driver does not intervene in time, the risk of collision increases, posing a safety hazard. Xiaomi Auto will push a free over-the-air (OTA) software update to the recalled vehicles to eliminate the hazard. Earlier this year, a fatal crash involving the SU7's driver-assist system killed three university students.
- US requires $100,000 payment for H-1B visa applications
US President Trump issued an executive order, citing the need to crack down on H-1B visa abuse and protect American jobs, declaring that visa holders entering the US from September 21 onward must pay $100,000 at application time. H-1B holders currently inside the US are unaffected, but after extending their visas they too will need proof of the $100,000 payment when entering or leaving the country. Holders currently abroad must return before September 21 or otherwise show proof of payment. US tech giants use H-1B visas on a large scale, often hiring H-1B holders even while conducting layoffs, which has drawn domestic criticism of H-1B abuse. Amazon had more than 10,000 H-1B visas approved in the first half of 2025, while Microsoft and Meta each had more than 5,000.
- Huawei and Zhejiang University release DeepSeek-R1-Safe
Huawei and Zhejiang University, working with Huawei Ascend chips and frameworks such as MindSpeedLLM, have released DeepSeek-R1-Safe, a safety-hardened version of the DeepSeek R1 model (China Unicom has a similarly named safety model). The source code is published on GitHub and other platforms. The researchers say they built a bilingual Chinese-English safety corpus grounded in domestic and international laws, regulations, and core values. The corpus includes not only annotations with safety chains of thought but also corresponding safe responses, and can be used for safety training, fine-tuning, and testing of large models. Test results show that DeepSeek-R1-Safe achieves a near-100% overall defense success rate against ordinary harmful prompts across 14 dimensions, including toxic and harmful speech, politically sensitive content, and incitement to illegal activity, and an overall defense success rate above 40% against multiple jailbreak patterns such as hypothetical scenarios, role play, and encrypted encodings. Its overall safety-defense capability reaches 83%, exceeding Qwen-235B, DeepSeek-R1-671B, and other contemporaneous models by 8% to 15% under the same test settings. Meanwhile, on general-capability benchmarks such as MMLU, GSM8K, and C-Eval, DeepSeek-R1-Safe loses less than 1% performance relative to DeepSeek-R1. These results indicate that DeepSeek-R1-Safe significantly improves safety protection while preserving usability, striking an effective balance between safety and general capability.
- Dogs can categorize toys by their function
According to a study published in Current Biology, some dogs can not only remember the names of objects such as favorite toys, but can also extend those labels to entirely new objects with similar functions, regardless of whether they look alike. This is an advanced cognitive ability known as "label extension", which animals typically acquire only after years of intensive training in captivity. These dogs needed no such training: through natural play with humans alone, they learned to categorize toys by function.
- Northrop Grumman resupply ship reaches the ISS after resolving a software issue
Northrop Grumman's Cygnus XL cargo ship arrived at the International Space Station a day later than scheduled, carrying more than five tons of supplies and experiment materials. It launched last Sunday on a SpaceX Falcon 9 rocket, but early Tuesday, during the trip to the station, its main engine shut down prematurely on both of two course-adjustment burns. Engineers later determined the early shutdowns were triggered by a conservative software protection measure and that the engine itself was working normally. After software parameters were updated, the ship approached to within 30 feet of the station on Thursday, where astronaut Jonny Kim captured it with the station's robotic arm.
- The auto industry is building far more cars than demand warrants
In a suburb of Chengdu, a city of 21 million, the dealer ZCAR (竹子买车) is selling cars at startling discounts, with 5,000 vehicles for customers to choose from: domestically built Audis at half price, and a seven-seat FAW SUV at 60% off. ZCAR says it buys in bulk from automakers and dealers, and can offer such low prices because of the auto industry's overcapacity. Investigations show that Chinese auto production far exceeds market demand, because production targets are shaped by government policy rather than consumer demand. EVs start at under $10,000, while most EVs in the US sell for over $35,000. Unsold vehicles end up with traders like ZCAR. Industry insiders and analysts believe the auto industry may face turmoil similar to what hit real estate and solar. The overcapacity problem is stark: according to the consultancy Gasgoo Auto Research Institute, Chinese automakers' factory capacity is double last year's actual output of 27.5 million vehicles.
- Steam will drop support for 32-bit Windows starting in 2026
Valve announced via a support document that Steam will stop supporting 32-bit Windows operating systems starting in 2026. According to the document, Steam will end 32-bit Windows support on January 1, 2026. 32-bit Windows 10 is the only 32-bit version Steam currently supports, and it accounts for just 0.01% of systems in the Steam hardware survey. 64-bit Windows 10 will remain supported, and 32-bit games will still run. The existing Steam client will keep working on 32-bit Windows 10 in the short term, but it will no longer receive updates, and Steam cannot guarantee it will continue to function in the future. Valve urges users to upgrade to a 64-bit version; going forward, Steam will support only 64-bit operating systems.
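A machine in the affected group can be identified with a quick check. This sketch assumes recent CPython, where `platform.machine()` reports the real architecture even from a 32-bit process running on 64-bit Windows (WOW64):

```python
import platform

def on_32bit_windows():
    """True only on a 32-bit Windows install, the configuration Steam is
    dropping; distinguishes a 32-bit OS from a merely 32-bit process."""
    if platform.system() != "Windows":
        return False
    # 32-bit Windows reports an x86-family machine type; 64-bit Windows
    # reports AMD64 (or ARM64) even to 32-bit processes.
    return platform.machine().lower() in ("x86", "i386", "i486", "i586", "i686")

print(on_32bit_windows())
```

Users who see True here would need to move to a 64-bit OS to keep receiving Steam client updates after the cutoff.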
- New material stretches to 46 times its length and self-heals
Researchers at National Yang Ming Chiao Tung University report in the journal Advanced Functional Materials a new material that can stretch to 46 times its original length. Even when severed, lightly pressing the broken pieces together at room temperature restores its shape and stretchability completely within 10 minutes. The sticky, elastic polyurethane organogel combines covalently linked cellulose nanocrystals (CNCs) with modified mechanically interlocked molecules (MIMs). The gel is sensitive to external stimuli such as stretching or heating, changing color from orange to blue depending on whether the material is at rest or under stress. These unique properties give the gel broad application prospects, including flexible electronic skin, soft robotics, and anti-counterfeiting schemes.
- 2025 Ig Nobel Prize winners announced
The 2025 Ig Nobel Prize winners have been announced. Created in 1991 as a good-natured parody of the Nobel Prizes, the Ig Nobels honor research that first makes people laugh and then makes them think. The Literature Prize went posthumously to Dr. William B. Bean, who recorded and analyzed the growth rate of one of his fingernails over 35 years, publishing five papers on it in medical journals, the first in 1953 and the last in 1980; his son accepted the prize on his behalf. The Psychology Prize went to Marcin Zajenkowski and Gilles Gignac for studying what happens when you tell narcissists they are intelligent. The Nutrition Prize went to Daniele Dendi and colleagues for studying which kinds of pizza rainbow lizards choose to eat at a seaside resort in Togo. The Pediatrics Prize went to Julie Mennella and Gary Beauchamp for studying what nursing infants experience after their mothers eat garlic. The Chemistry Prize went to Rotem Naftalovich and colleagues for studying eating the plastic Teflon as a way to add food volume and satiety without adding calories. The Peace Prize went to Fritz Renner and colleagues for demonstrating that drinking alcohol can sometimes improve a person's ability to speak a foreign language. The Engineering Design Prize went to Vikash Kumar and Sarthak Mittal for studying how redesigned shoe racks can solve the smelly-shoe problem. The Aviation Prize went to Francisco Sánchez and colleagues for studying whether drinking alcohol impairs bats' flight and echolocation. The Physics Prize went to Giacomo Bartolucci and colleagues for studying the physics of pasta sauce, finding that the phase transition that causes clumping can make for an unpleasant eating experience. The Biology Prize went to Tomoki Kojima and other Japanese scientists, who found that painting zebra-like stripes on black Wagyu cattle makes it harder for blood-sucking pests such as stable flies to approach, a promising pest-control method that does not rely on insecticides; this marks Japan's 19th consecutive year of winning an Ig Nobel. The team experimented on six black Wagyu cattle divided into three groups: one painted with white water-based paint stripes, one painted with inconspicuous black stripes, and one left unpainted. They then compared the number of flies gathering on each group and the frequency of fly-repelling behaviors such as head shaking and tail swishing. The black-and-white striped cattle attracted half as many flies as the other two groups and showed fewer repelling behaviors, though the mechanism behind the effect remains unknown.
- Google integrates Gemini AI into Chrome for US users
Google announced on its official blog that it is integrating Gemini AI into the Chrome desktop browser for all US users. The browser gains a prominent Gemini button; clicking it lets users converse with the Gemini chatbot, which can answer questions about the content of the current page and synthesize information across multiple pages. Users who dislike the feature can remove the Gemini button from the interface. Google also plans to bring more powerful Gemini capabilities in the future, such as controlling the browser cursor to perform tasks like adding items to a shopping cart.
- Samsung pushes software update that puts ads on its refrigerators
Samsung sells nine Family Hub refrigerator models in the US, with suggested retail prices from $1,800 to $3,500. The fridges are equipped with 21.5-inch or 32-inch displays on which users can choose what to show. This week Samsung pushed a software update to Family Hub fridges that begins showing ads on those displays. In a statement, Samsung said it is running a pilot in the US market, serving promotions and curated ads on select Family Hub models. Samsung says that if customers dislike a particular ad they can dismiss it, after which it will not be shown again; ads also do not appear if the display's Cover Screen is configured to show Art Mode or a photo album. Earlier this year Samsung claimed it had no plans to show ads on its fridge displays, but it has evidently gone back on its word.