OrangeBot.AI Digest — 2025-09-20
60 headlines across 4 sources, aggregated for the day.
Hacker News(15)
- Designing NotebookLM (jasonspielman.com)
- Ultrasonic Chef's Knife (seattleultrasonics.com)
- Are touchscreens in cars dangerous? (www.economist.com)
- Cormac McCarthy's tips on how to write a science paper (2019) [pdf] (gwern.net)
- Britain jumps into bed with Palantir in £1.5B defense pact (www.theregister.com)
- Microsoft has urged its employees on H-1B and H-4 visas to return immediately (timesofindia.indiatimes.com)
- Git: Introduce Rust and announce it will become mandatory in the build system (lore.kernel.org)
- MapSCII – World map in terminal (github.com)
- Images over DNS (dgl.cx)
- FLX1s phone is launched (furilabs.com)
- IG Nobel Prize Winners 2025 (improbable.com)
- Scream cipher (sethmlarson.dev)
- Claude can sometimes prove it (www.galois.com)
- If you are good at code review, you will be good at using AI agents (www.seangoedecke.com)
- I'm Not a Robot (neal.fun)
GitHub Trending(15)
- Gar-b-age / CookLikeHOC
🥢 Cooking the way 老乡鸡 (Lao Xiang Ji) does it 🐔. The main part was completed in 2024; this is not an official 老乡鸡 repository. The text comes from the 《老乡鸡菜品溯源报告》 (Lao Xiang Ji dish traceability report), summarized, edited, and organized. CookLikeHOC.
- Alibaba-NLP / DeepResearch
Tongyi Deep Research, the Leading Open-source Deep Research Agent
- flutter / flutter
Flutter makes it easy and fast to build beautiful apps for mobile and beyond
- winfunc / opcode
A powerful GUI app and Toolkit for Claude Code - Create custom agents, manage interactive Claude Code sessions, run secure background agents, and more.
- tldraw / tldraw
very good whiteboard SDK / infinite canvas SDK
- fmtlib / fmt
A modern formatting library
- linera-io / linera-protocol
Main repository for the Linera protocol
- grafana / loki
Like Prometheus, but for logs.
- microsoft / AI-For-Beginners
12 Weeks, 24 Lessons, AI for All!
- CopilotKit / CopilotKit
React UI + elegant infrastructure for AI Copilots, AI chatbots, and in-app AI agents. The Agentic last-mile 🪁
- OpenMind / OM1
Modular AI runtime for robots
- 9001 / copyparty
Portable file server with accelerated resumable uploads, dedup, WebDAV, FTP, TFTP, zeroconf, media indexer, thumbnails++ all in one file, no deps
- knownsec / aipyapp
AI-Powered Python & Python-Powered AI (Python-Use)
- cypress-io / cypress
Fast, easy and reliable testing for anything that runs in a browser.
- unslothai / unsloth
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Hugging Face(15)
- ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential, yet progress is limited by the lack of large-scale, open-source computer use data and foundation models. In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task domains, built via a closed-loop pipeline uniting automated agents with human experts. Trained on this scaled-up data, ScaleCUA can operate seamlessly across platforms. Specifically, it delivers strong gains over baselines (+26.6 on WebArena-Lite-v2, +10.7 on ScreenSpot-Pro) and sets new state-of-the-art results (94.4% on MMBench-GUI L1-Hard, 60.6% on OSWorld-G, 47.4% on WebArena-Lite-v2). These findings underscore the power of data-driven scaling for general-purpose computer use agents. We will release data, models, and code to advance future research: https://github.com/OpenGVLab/ScaleCUA.
- FlowRL: Matching Reward Distributions for LLM Reasoning
We propose FlowRL: matching the full reward distribution via flow balancing instead of maximizing rewards in large language model (LLM) reinforcement learning (RL). Recent advanced reasoning models adopt reward-maximizing methods (e.g., PPO and GRPO), which tend to over-optimize dominant reward signals while neglecting less frequent but valid reasoning paths, thus reducing diversity. In contrast, we transform scalar rewards into a normalized target distribution using a learnable partition function, and then minimize the reverse KL divergence between the policy and the target distribution. We implement this idea as a flow-balanced optimization method that promotes diverse exploration and generalizable reasoning trajectories. We conduct experiments on math and code reasoning tasks: FlowRL achieves a significant average improvement of 10.0% over GRPO and 5.1% over PPO on math benchmarks, and performs consistently better on code reasoning tasks. These results highlight reward distribution-matching as a key step toward efficient exploration and diverse reasoning in LLM reinforcement learning.
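A compact reading of that recipe, in our own notation (r is the scalar reward, Z_phi the learnable partition function, pi_theta the policy); this is an illustration of the abstract's description, not the paper's exact formulation:

```latex
% Illustrative reading of the FlowRL objective described above:
% scalar rewards become a normalized target distribution via a learnable
% partition function, and the policy minimizes reverse KL to that target.
\[
p_{\phi}(y \mid x) \;=\; \frac{\exp\big(r(x,y)\big)}{Z_{\phi}(x)},
\qquad
\min_{\theta}\; D_{\mathrm{KL}}\!\big(\pi_{\theta}(\cdot \mid x)\,\big\|\,p_{\phi}(\cdot \mid x)\big).
\]
```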
- Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
Large language models (LLMs) are increasingly applied in diverse real-world scenarios, each governed by bespoke behavioral and safety specifications (specs) custom-tailored by users or organizations. These specs, categorized into safety-specs and behavioral-specs, vary across scenarios and evolve with changing preferences and requirements. We formalize this challenge as specification alignment, focusing on LLMs' ability to follow dynamic, scenario-specific specs from both behavioral and safety perspectives. To address this challenge, we propose Align3, a lightweight method that employs Test-Time Deliberation (TTD) with hierarchical reflection and revision to reason over the specification boundaries. We further present SpecBench, a unified benchmark for measuring specification alignment, covering 5 scenarios, 103 specs, and 1,500 prompts. Experiments on 15 reasoning and 18 instruct models with several TTD methods, including Self-Refine, TPO, and MoreThink, yield three key findings: (i) test-time deliberation enhances specification alignment; (ii) Align3 advances the safety-helpfulness trade-off frontier with minimal overhead; (iii) SpecBench effectively reveals alignment gaps. These results highlight the potential of test-time deliberation as an effective strategy for reasoning over real-world specification boundaries.
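As a rough mental model only (the abstract does not spell out Align3's actual procedure), test-time deliberation of this kind can be pictured as a draft-reflect-revise loop over the specs; the function names, prompts, and the two-level behavioral/safety split below are assumptions:

```python
# Hypothetical sketch of a test-time deliberation loop (draft -> reflect
# against the specs -> revise). Not Align3's API; purely illustrative.
def deliberate(llm, prompt, behavioral_specs, safety_specs, max_rounds=2):
    answer = llm.generate(prompt)                       # initial draft
    for specs in (behavioral_specs, safety_specs):      # hierarchical passes
        for _ in range(max_rounds):
            critique = llm.generate(
                f"Specs:\n{specs}\n\nAnswer:\n{answer}\n\n"
                "List any spec violations, or reply OK."
            )
            if critique.strip() == "OK":                # no violations found
                break
            answer = llm.generate(
                f"Revise the answer so it satisfies the specs.\n"
                f"Specs:\n{specs}\n\nViolations:\n{critique}\n\nAnswer:\n{answer}"
            )
    return answer
```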
- Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
Large language models (LLMs) are increasingly trained with reinforcement learning from verifiable rewards (RLVR), yet real-world deployment demands models that can self-improve without labels or external judges. Existing label-free methods (confidence minimization, self-consistency, or majority-vote objectives) stabilize learning but steadily shrink exploration, causing an entropy collapse: generations become shorter, less diverse, and brittle. Unlike prior approaches such as Test-Time Reinforcement Learning (TTRL), which primarily adapt models to the immediate unlabeled dataset at hand, our goal is broader: to enable general improvements without sacrificing the model's inherent exploration capacity and generalization ability, i.e., evolving. We formalize this issue and propose EVolution-Oriented and Label-free Reinforcement Learning (EVOL-RL), a simple rule that couples stability with variation under a label-free setting. EVOL-RL keeps the majority-voted answer as a stable anchor (selection) while adding a novelty-aware reward that favors responses whose reasoning differs from what has already been produced (variation), measured in semantic space. Implemented with GRPO, EVOL-RL also uses asymmetric clipping to preserve strong signals and an entropy regularizer to sustain search. This majority-for-selection + novelty-for-variation design prevents collapse, maintains longer and more informative chains of thought, and improves both pass@1 and pass@n. EVOL-RL consistently outperforms the majority-only TTRL baseline; e.g., training on label-free AIME24 lifts Qwen3-4B-Base AIME25 pass@1 from TTRL's 4.6% to 16.4%, and pass@16 from 18.5% to 37.9%. EVOL-RL not only prevents diversity collapse but also unlocks stronger generalization across domains (e.g., GPQA). Furthermore, we demonstrate that EVOL-RL also boosts performance in the RLVR setting, highlighting its broad applicability.
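A minimal sketch of the scoring rule as the abstract describes it (majority anchor plus semantic novelty); the embedding function, the weighting, and the exact novelty measure below are assumptions, and the GRPO update itself is omitted:

```python
# Illustrative "majority-for-selection + novelty-for-variation" reward.
# embed() is assumed to map text to a unit-norm vector; the 0.5 weight is
# arbitrary and not taken from the paper.
from collections import Counter
import numpy as np


def evol_rl_rewards(answers, reasonings, embed, novelty_weight=0.5):
    majority, _ = Counter(answers).most_common(1)[0]      # majority-voted answer as anchor
    vecs = np.stack([embed(r) for r in reasonings])       # reasoning traces in semantic space
    rewards = []
    for i, ans in enumerate(answers):
        anchor = 1.0 if ans == majority else 0.0          # selection: agree with the majority
        others = np.delete(vecs, i, axis=0)
        novelty = 1.0 - float((vecs[i] @ others.T).max()) # variation: differ from other rollouts
        rewards.append(anchor + novelty_weight * novelty)
    return rewards
```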
- Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
Recent studies have demonstrated the importance of high-quality visual representations in image generation and have highlighted the limitations of generative models in image understanding. As a generative paradigm originally designed for natural language, autoregressive models face similar challenges. In this work, we present the first systematic investigation into the mechanisms of applying the next-token prediction paradigm to the visual domain. We identify three key properties that hinder the learning of high-level visual semantics: local and conditional dependence, inter-step semantic inconsistency, and spatial invariance deficiency. We show that these issues can be effectively addressed by introducing self-supervised objectives during training, leading to a novel training framework, Self-guided Training for AutoRegressive models (ST-AR). Without relying on pre-trained representation models, ST-AR significantly enhances the image understanding ability of autoregressive models and leads to improved generation quality. Specifically, ST-AR brings approximately 42% FID improvement for LlamaGen-L and 49% FID improvement for LlamaGen-XL, while maintaining the same sampling strategy.
- FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
Search has emerged as core infrastructure for LLM-based agents and is widely viewed as critical on the path toward more general intelligence. Finance is a particularly demanding proving ground: analysts routinely conduct complex, multi-step searches over time-sensitive, domain-specific data, making it ideal for assessing both search proficiency and knowledge-grounded reasoning. Yet no existing open financial datasets evaluate the data-searching capability of end-to-end agents, largely because constructing realistic, complicated tasks requires deep financial expertise and time-sensitive data is hard to evaluate. We present FinSearchComp, the first fully open-source agent benchmark for realistic, open-domain financial search and reasoning. FinSearchComp comprises three tasks -- Time-Sensitive Data Fetching, Simple Historical Lookup, and Complex Historical Investigation -- that closely reproduce real-world financial analyst workflows. To ensure difficulty and reliability, we engage 70 professional financial experts for annotation and implement a rigorous multi-stage quality-assurance pipeline. The benchmark includes 635 questions spanning global and Greater China markets, and we evaluate 21 models (products) on it. Grok 4 (web) tops the global subset, approaching expert-level accuracy. DouBao (web) leads on the Greater China subset. Experimental analyses show that equipping agents with web search and financial plugins substantially improves results on FinSearchComp, and that the country of origin of models and tools significantly impacts performance. By aligning with realistic analyst tasks and providing end-to-end evaluation, FinSearchComp offers a professional, high-difficulty testbed for complex financial search and reasoning.
- RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
This paper presents RynnVLA-001, a vision-language-action (VLA) model built upon large-scale video generative pretraining from human demonstrations. We propose a novel two-stage pretraining methodology. The first stage, Ego-Centric Video Generative Pretraining, trains an Image-to-Video model on 12M ego-centric manipulation videos to predict future frames conditioned on an initial frame and a language instruction. The second stage, Human-Centric Trajectory-Aware Modeling, extends this by jointly predicting future keypoint trajectories, thereby effectively bridging visual frame prediction with action prediction. Furthermore, to enhance action representation, we propose ActionVAE, a variational autoencoder that compresses sequences of actions into compact latent embeddings, reducing the complexity of the VLA output space. When finetuned on the same downstream robotics datasets, RynnVLA-001 achieves superior performance over state-of-the-art baselines, demonstrating that the proposed pretraining strategy provides a more effective initialization for VLA models.
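For intuition, a VAE over action chunks can be sketched as below; the fixed action horizon, layer sizes, and loss weighting are illustrative assumptions, not the paper's ActionVAE architecture:

```python
# Minimal sketch of a VAE that compresses an action chunk into a compact
# latent, in the spirit of the ActionVAE described above (illustrative only).
import torch
import torch.nn as nn


class ActionVAE(nn.Module):
    def __init__(self, horizon=16, action_dim=7, latent_dim=32):
        super().__init__()
        flat = horizon * action_dim
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(flat, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, flat), nn.Unflatten(1, (horizon, action_dim)),
        )

    def forward(self, actions):                      # actions: (B, horizon, action_dim)
        h = self.encoder(actions)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        recon = self.decoder(z)
        recon_loss = ((recon - actions) ** 2).mean()
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, z, recon_loss + 1e-3 * kl      # the VLA head predicts z, not raw actions
```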
- WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
Recent video diffusion models demonstrate strong potential in spatial intelligence tasks due to their rich latent world priors. However, this potential is hindered by their limited controllability and geometric inconsistency, creating a gap between their strong priors and their practical use in 3D/4D tasks. As a result, current approaches often rely on retraining or fine-tuning, which risks degrading pretrained knowledge and incurs high computational costs. To address this, we propose WorldForge, a training-free, inference-time framework composed of three tightly coupled modules. Intra-Step Recursive Refinement introduces a recursive refinement mechanism during inference, which repeatedly optimizes network predictions within each denoising step to enable precise trajectory injection. Flow-Gated Latent Fusion leverages optical flow similarity to decouple motion from appearance in the latent space and selectively inject trajectory guidance into motion-related channels. Dual-Path Self-Corrective Guidance compares guided and unguided denoising paths to adaptively correct trajectory drift caused by noisy or misaligned structural signals. Together, these components inject fine-grained, trajectory-aligned guidance without training, achieving both accurate motion control and photorealistic content generation. Extensive experiments across diverse benchmarks validate our method's superiority in realism, trajectory consistency, and visual fidelity. This work introduces a novel plug-and-play paradigm for controllable video synthesis, offering a new perspective on leveraging generative priors for spatial intelligence.
- AToken: A Unified Tokenizer for Vision
We present AToken, the first unified visual tokenizer that achieves both high-fidelity reconstruction and semantic understanding across images, videos, and 3D assets. Unlike existing tokenizers that specialize in either reconstruction or understanding for single modalities, AToken encodes these diverse visual inputs into a shared 4D latent space, unifying both tasks and modalities in a single framework. Specifically, we introduce a pure transformer architecture with 4D rotary position embeddings to process visual inputs of arbitrary resolutions and temporal durations. To ensure stable training, we introduce an adversarial-free training objective that combines perceptual and Gram matrix losses, achieving state-of-the-art reconstruction quality. By employing a progressive training curriculum, AToken gradually expands from single images to videos and 3D assets, and supports both continuous and discrete latent tokens. AToken achieves 0.21 rFID with 82.2% ImageNet accuracy for images, 3.01 rFVD with 32.6% MSRVTT retrieval for videos, and 28.19 PSNR with 90.9% classification accuracy for 3D. In downstream applications, AToken enables both visual generation tasks (e.g., image generation with continuous and discrete tokens, text-to-video generation, image-to-3D synthesis) and understanding tasks (e.g., multimodal LLMs), achieving competitive performance across all benchmarks. These results shed light on the next-generation multimodal AI systems built upon unified visual tokenization.
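For reference, a Gram-matrix loss of the kind the abstract mentions matches second-order feature statistics between the reconstruction and its target; the sketch below assumes features have already been extracted and flattened, and is not AToken's actual objective:

```python
# Illustrative Gram-matrix loss: match C x C feature correlations between a
# reconstruction and a target. The feature extractor and the perceptual term
# it would be combined with are assumed to exist elsewhere.
import torch


def gram(feats: torch.Tensor) -> torch.Tensor:
    # feats: (B, C, N) features flattened over spatial/temporal positions
    return feats @ feats.transpose(1, 2) / feats.shape[-1]


def gram_loss(feat_recon: torch.Tensor, feat_target: torch.Tensor) -> torch.Tensor:
    return ((gram(feat_recon) - gram(feat_target)) ** 2).mean()
```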
- MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
Current instruction-based image editing (IBIE) methods struggle with challenging editing tasks, as both editing types and sample counts of existing datasets are limited. Moreover, traditional dataset construction often contains noisy image-caption pairs, which may introduce biases and limit model capabilities in complex editing scenarios. To address these limitations, we introduce MultiEdit, a comprehensive dataset featuring over 107K high-quality image editing samples. It encompasses 6 challenging editing tasks through a diverse collection of 18 non-style-transfer editing types and 38 style transfer operations, covering a spectrum from sophisticated style transfer to complex semantic operations like person reference editing and in-image text editing. We employ a novel dataset construction pipeline that utilizes two multi-modal large language models (MLLMs) to generate visual-adaptive editing instructions and produce high-fidelity edited images, respectively. Extensive experiments demonstrate that fine-tuning foundational open-source models with our MultiEdit-Train set substantially improves models' performance on sophisticated editing tasks in our proposed MultiEdit-Test benchmark, while effectively preserving their capabilities on the standard editing benchmark. We believe MultiEdit provides a valuable resource for advancing research into more diverse and challenging IBIE capabilities. Our dataset is available at https://huggingface.co/datasets/inclusionAI/MultiEdit.
- Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization
Materials characterization is fundamental to acquiring materials information, revealing the processing-microstructure-property relationships that guide material design and optimization. While multimodal large language models (MLLMs) have recently shown promise in generative and predictive tasks within materials science, their capacity to understand real-world characterization imaging data remains underexplored. To bridge this gap, we present MatCha, the first benchmark for materials characterization image understanding, comprising 1,500 questions that demand expert-level domain expertise. MatCha encompasses four key stages of materials research comprising 21 distinct tasks, each designed to reflect authentic challenges faced by materials scientists. Our evaluation of state-of-the-art MLLMs on MatCha reveals a significant performance gap compared to human experts. These models exhibit degradation when addressing questions requiring higher-level expertise and sophisticated visual perception. Simple few-shot and chain-of-thought prompting struggle to alleviate these limitations. These findings highlight that existing MLLMs still exhibit limited adaptability to real-world materials characterization scenarios. We hope MatCha will facilitate future research in areas such as new material discovery and autonomous scientific agents. MatCha is available at https://github.com/FreedomIntelligence/MatCha.
- RecoWorld: Building Simulated Environments for Agentic Recommender Systems
We present RecoWorld, a blueprint for building simulated environments tailored to agentic recommender systems. Such environments give agents a proper training space where they can learn from errors without impacting real users. RecoWorld distinguishes itself with a dual-view architecture: a simulated user and an agentic recommender engage in multi-turn interactions aimed at maximizing user retention. The user simulator reviews recommended items, updates its mindset, and when sensing potential user disengagement, generates reflective instructions. The agentic recommender adapts its recommendations by incorporating these user instructions and reasoning traces, creating a dynamic feedback loop that actively engages users. This process leverages the exceptional reasoning capabilities of modern LLMs. We explore diverse content representations within the simulator, including text-based, multimodal, and semantic ID modeling, and discuss how multi-turn RL enables the recommender to refine its strategies through iterative interactions. RecoWorld also supports multi-agent simulations, allowing creators to simulate the responses of targeted user populations. It marks an important first step toward recommender systems where users and agents collaboratively shape personalized information streams. We envision new interaction paradigms where "user instructs, recommender responds," jointly optimizing user retention and engagement.
- Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for non-permissive, toxic, and personally identifiable content. To mitigate risks of memorization, we adopt the Goldfish objective during pretraining, strongly suppressing verbatim recall of data while retaining downstream task performance. The Apertus models also expand multilingual coverage, training on 15T tokens from over 1800 languages, with ~40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivalling or surpassing open-weight counterparts. Beyond model weights, we release all scientific artifacts from our development cycle with a permissive license, including data preparation scripts, checkpoints, evaluation suites, and training code, enabling transparent audit and extension.
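The Goldfish objective mentioned above drops a pseudorandom, context-determined subset of token positions from the next-token loss so no training sequence is ever fully supervised verbatim; the sketch below is a rough illustration under assumed hashing and drop-rate choices, not the Apertus training code:

```python
# Rough illustration of a goldfish-style loss mask: roughly 1/k of positions,
# chosen by hashing a short window of preceding token ids, are excluded from
# the cross-entropy loss. Hash choice, window size, and k are assumptions.
import hashlib

import torch
import torch.nn.functional as F


def goldfish_loss(logits, targets, k=4):
    # logits: (B, T, V); targets: (B, T) token ids
    keep = torch.ones_like(targets, dtype=torch.bool)
    for b in range(targets.shape[0]):
        for t in range(targets.shape[1]):
            window = targets[b, max(0, t - 3): t + 1].tolist()
            h = hashlib.sha1(str(window).encode()).digest()
            if h[0] % k == 0:                 # drop this position from supervision
                keep[b, t] = False
    per_token = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    return per_token[keep].mean()
```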
- Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding
Spatio-temporal video grounding (STVG) aims at localizing the spatio-temporal tube of a video, as specified by the input text query. In this paper, we utilize multimodal large language models (MLLMs) to explore a zero-shot solution in STVG. We reveal two key insights about MLLMs: (1) MLLMs tend to dynamically assign special tokens, referred to as grounding tokens, for grounding the text query; and (2) MLLMs often suffer from suboptimal grounding due to the inability to fully integrate the cues in the text query (e.g., attributes, actions) for inference. Based on these insights, we propose an MLLM-based zero-shot framework for STVG, which includes novel decomposed spatio-temporal highlighting (DSTH) and temporal-augmented assembling (TAS) strategies to unleash the reasoning ability of MLLMs. The DSTH strategy first decouples the original query into attribute and action sub-queries to query the existence of the target both spatially and temporally. It then uses a novel logit-guided re-attention (LRA) module to learn latent variables as spatial and temporal prompts, by regularizing token predictions for each sub-query. These prompts highlight attribute and action cues, respectively, directing the model's attention to reliable spatially and temporally related visual regions. In addition, as the spatial grounding by the attribute sub-query should be temporally consistent, we introduce the TAS strategy to assemble the predictions using the original video frames and the temporal-augmented frames as inputs to help improve temporal consistency. We evaluate our method on various MLLMs, and show that it outperforms SOTA methods on three common STVG benchmarks. The code will be available at https://github.com/zaiquanyang/LLaVA_Next_STVG.
- EdiVal-Agent: An Object-Centric Framework for Automated, Scalable, Fine-Grained Evaluation of Multi-Turn Editing
Instruction-based image editing has advanced rapidly, yet reliable and interpretable evaluation remains a bottleneck. Current protocols either (i) depend on paired reference images -- resulting in limited coverage and inheriting biases from prior generative models -- or (ii) rely solely on zero-shot vision-language models (VLMs), whose prompt-based assessments of instruction following, content consistency, and visual quality are often imprecise. To address this, we introduce EdiVal-Agent, an automated, scalable, and fine-grained evaluation framework for multi-turn instruction-based editing from an object-centric perspective, supported by a suite of expert tools. Given an image, EdiVal-Agent first decomposes it into semantically meaningful objects, then synthesizes diverse, context-aware editing instructions. For evaluation, it integrates VLMs with open-vocabulary object detectors to assess instruction following, uses semantic-level feature extractors to evaluate content consistency, and leverages human preference models to judge visual quality. We show that combining VLMs with object detectors yields stronger agreement with human judgments in instruction-following evaluation compared to using VLMs alone and CLIP-based metrics. Furthermore, the pipeline's modular design allows future tools to be seamlessly integrated, enhancing evaluation accuracy over time. Instantiating this pipeline, we build EdiVal-Bench, a multi-turn editing benchmark covering 9 instruction types and 11 state-of-the-art editing models spanning autoregressive (AR) (including Nano Banana, GPT-Image-1), flow-matching, and diffusion paradigms. We demonstrate that EdiVal-Agent can be used to identify existing failure modes, thereby informing the development of the next generation of editing models. Project page: https://tianyucodings.github.io/EdiVAL-page/.
Solidot(15)
- Xiaomi to remotely fix the assisted-driving defect in 110,000+ of its SU7s
Xiaomi and the State Administration for Market Regulation announced on Friday a recall of certain SU7 Standard Edition electric vehicles produced between February 6, 2024 and August 30, 2025, totaling 116,887 units. Recall number S2025M0149I covers the XMA7000MBEVR2 and XMA7000MBEVR5 models (98,462 units); recall number S2025M0150I covers the BJ7000MBEVR2 model (18,425 units). In certain situations with the L2 highway navigation assist engaged, the affected vehicles may not adequately recognize, warn about, or handle extreme edge-case scenarios; if the driver does not intervene in time, the risk of collision increases, posing a safety hazard. Xiaomi Auto Technology Co., Ltd. will upgrade the software of the recalled vehicles free of charge via an over-the-air (OTA) update to eliminate the hazard. Earlier this year a fatal crash involving the SU7's assisted-driving system killed three university students.
- US demands a $100,000 payment for H-1B visa applications
US President Trump issued an executive order, citing the need to crack down on H-1B visa abuse and protect American jobs, announcing that visa holders entering the US from September 21 onward must pay $100,000 with their applications. H-1B holders currently inside the US are unaffected, but after extending their visas they too will need proof of the $100,000 payment when entering or leaving the country. Holders currently abroad must return before September 21 or show proof of payment. US tech giants rely heavily on H-1B visas and frequently hire H-1B holders even while laying off staff, which has drawn domestic criticism that the program is being abused. Amazon had more than 10,000 H-1B visas approved in the first half of 2025, while Microsoft and Meta each had more than 5,000.
- Huawei and Zhejiang University release DeepSeek-R1-Safe
Huawei and Zhejiang University, working with Huawei Ascend chips and frameworks such as MindSpeed LLM, have released DeepSeek-R1-Safe, a safety-hardened version of the DeepSeek R1 model (China Unicom has a similarly named safety model). The source code is published on GitHub and other platforms. The researchers say they built a bilingual Chinese-English safety corpus based on domestic and foreign laws, regulations, and core values; the corpus contains not only annotations with safety chains of thought but also the corresponding safe responses, and can be used for safety training, fine-tuning, and testing of large models. Test results show that DeepSeek-R1-Safe achieves a near-100% overall defense success rate against ordinary harmful prompts across 14 dimensions, including toxic and harmful speech, politically sensitive content, and incitement to illegal acts, and an overall defense success rate above 40% against jailbreak patterns such as hypothetical scenarios, role-play, and encrypted encodings. Its overall safety defense capability reaches 83%, exceeding Qwen-235B, DeepSeek-R1-671B, and other contemporaneous models by 8% to 15% under the same test settings. On general-capability benchmarks such as MMLU, GSM8K, and CEVAL, DeepSeek-R1-Safe loses less than 1% of performance relative to DeepSeek-R1. These results indicate that DeepSeek-R1-Safe significantly improves safety protection while preserving usability, striking an effective balance between safety and general capability.
- Dogs can categorize toys by function
According to a study published in Current Biology, some dogs can not only remember the names of objects such as their favorite toys but also extend those labels to entirely new objects with a similar function, regardless of whether they look alike. This is an advanced cognitive ability known as "label extension," which animals usually acquire only after years of intensive training in captivity. Dogs, however, need no such training: through nothing more than natural play with people, they can learn to sort toys by what the toys do.
- Northrop Grumman's resupply ship reaches the ISS after a software fix
Northrop Grumman's Cygnus XL cargo spacecraft arrived at the International Space Station a day later than scheduled, carrying more than five tons of supplies and experiment hardware. It launched last Sunday on a SpaceX Falcon 9 rocket, but early Tuesday, while en route to the station, its main engine shut down prematurely during both of two course-adjustment burns. Engineers later determined that the early shutdowns were triggered by conservative software protection limits and that the engine itself was working normally. After the software parameters were updated, the spacecraft closed to within 30 feet of the station on Thursday, where astronaut Jonny Kim captured it with the station's robotic arm.
- The auto industry is building far more cars than the market needs
In a suburb of Chengdu, a city of 21 million, the dealer ZCAR (竹子买车) is selling cars at astonishing discounts, with 5,000 vehicles for customers to choose from. Domestically built Audis go for half price, and a seven-seat FAW SUV sells at 60% off. ZCAR says it buys in bulk from automakers and dealers, and that it can offer such low prices because the auto industry has excess capacity. Investigations show that China's auto industry produces far more than the market demands, because production targets are shaped by government policy rather than consumer demand. Electric vehicles start at under $10,000, whereas most EVs in the US sell for more than $35,000. Unsold vehicles end up with traders like ZCAR. Industry insiders and analysts believe the auto sector could face turmoil like that seen in real estate and solar. The overcapacity is stark: according to the consultancy Gasgoo Automotive Research Institute, Chinese automakers' factory capacity is twice last year's actual output of 27.5 million vehicles.
- Steam will drop support for 32-bit Windows starting in 2026
Valve announced via a support document that Steam will stop supporting 32-bit Windows operating systems in 2026. According to the document, support for 32-bit Windows ends on January 1, 2026. 32-bit Windows 10 is the only 32-bit version Steam currently supports, and it accounts for just 0.01% of systems in the Steam Hardware Survey. 64-bit Windows 10 remains supported, and 32-bit games will still run. The existing Steam client will keep working on 32-bit Windows 10 in the short term, but it will no longer receive updates, and Steam cannot guarantee it will keep functioning in the future. Valve urges users to upgrade to a 64-bit version; going forward, Steam will support only 64-bit operating systems.
- New material stretches to 46 times its length and heals itself
Researchers at National Yang Ming Chiao Tung University report in Advanced Functional Materials that they have developed a new material that can be stretched to 46 times its original length. Even after being severed, simply pressing the broken pieces together at room temperature restores its shape and stretchability completely within 10 minutes. The sticky, elastic polyurethane organogel combines covalently linked cellulose nanocrystals (CNCs) with modified mechanically interlocked molecules (MIMs). The gel is sensitive to external stimuli such as stretching or heating, changing color from orange to blue depending on whether it is at rest or being stimulated. These distinctive properties give the gel broad application prospects, including flexible electronic skin, soft robotics, and anti-counterfeiting.
- The 2025 Ig Nobel Prize winners announced
The 2025 Ig Nobel Prize winners have been announced. Founded in 1991 as a good-natured parody of the Nobel Prizes, the Ig Nobels honor research that makes people laugh and then think. The Literature Prize went posthumously to Dr. William B. Bean, who recorded and analyzed the growth rate of one of his fingernails over 35 years, publishing five papers on it in medical journals, the first in 1953 and the last in 1980; his son accepted the award on his behalf. The Psychology Prize went to Marcin Zajenkowski and Gilles Gignac for studying what happens when you tell narcissists they are intelligent. The Nutrition Prize went to Daniele Dendi and colleagues for studying which kind of pizza rainbow lizards choose to eat at a seaside resort in Togo. The Pediatrics Prize went to Julie Mennella and Gary Beauchamp for studying what nursing infants experience when their mothers eat garlic. The Chemistry Prize went to Rotem Naftalovich and colleagues for studying eating Teflon as a way to add food volume and satiety without adding calories. The Peace Prize went to Fritz Renner and colleagues for demonstrating that drinking alcohol can sometimes improve a person's ability to speak a foreign language. The Engineering Design Prize went to Vikash Kumar and Sarthak Mittal for studying how redesigning shoe racks could solve the problem of smelly shoes. The Aviation Prize went to Francisco Sánchez and colleagues for studying whether alcohol impairs bats' ability to fly and echolocate. The Physics Prize went to Giacomo Bartolucci and colleagues for studying the physics of pasta sauce, finding that the phase transition responsible for clumping can make for an unpleasant eating experience. The Biology Prize went to Tomoki Kojima and other Japanese scientists, who found that painting zebra-like stripes on black Wagyu cattle makes it harder for biting pests such as stable flies to approach them, a promising pest-control method that does not rely on insecticides; this is Japan's 19th consecutive year of winning an Ig Nobel. The team experimented on six black Wagyu cattle divided into three groups: one painted with white water-based stripes, one painted with inconspicuous black stripes, and one left unpainted. They then compared how many flies gathered on each group and how often the cattle shook their heads or swished their tails to drive flies away. The striped cattle attracted half as many flies as the other two groups and showed fewer fly-repelling behaviors, though the mechanism behind the effect remains unknown.
- Google integrates Gemini AI into Chrome for US users
Google announced on its official blog that it is integrating Gemini AI features into the Chrome desktop browser for all US users. The browser gains a prominent Gemini button; clicking it opens a conversation with the Gemini chatbot, which can answer questions about the content of the current page and synthesize information across multiple pages. Users who dislike the feature can remove the Gemini button from the interface. Google also plans to add more powerful capabilities to Gemini in the future, such as controlling the browser cursor to perform tasks like adding items to a shopping cart.
- Samsung pushes a software update that puts ads on its refrigerators
Samsung sells nine Family Hub refrigerator models in the US, with suggested retail prices from $1,800 to $3,500. The refrigerators carry 21.5-inch or 32-inch displays on which users can choose to show various content. This week Samsung pushed a software update to Family Hub refrigerators that starts showing advertisements on those displays. In a statement, Samsung said it is running a pilot in the US market to serve promotions and curated ads on certain Family Hub models. Samsung says customers who dislike a particular ad can dismiss it, and it will not be shown again; ads also do not appear if the display's Cover Screen is set to Art Mode or a photo album. Earlier this year Samsung said it had no plans to show ads on its refrigerator displays, but it has clearly gone back on its word.
- Zebra finches show semantic understanding
According to a study published in Science, zebra finches can not only distinguish all of their species' calls but also group them by meaning, suggesting a surprising degree of semantic understanding. Many social animals use a rich repertoire of calls to express their needs, emotions, and awareness of their surroundings. The zebra finch is a highly social songbird that produces about 11 distinct call types across its varied social behaviors. To test how adult zebra finches categorize the calls of their own species, the researchers ran an experiment with 12 birds that had to distinguish one rewarded call from ten unrewarded ones, including calls from unfamiliar species. The birds proved remarkably good at discriminating every call type in their repertoire, indicating that they can accurately perceive and classify the call signals of their conspecifics.
- Nvidia invests $5 billion in Intel
A month after the US government took a 10% stake in Intel, Nvidia announced that it will spend $5 billion to buy Intel common stock at $23.28 per share. The two companies also agreed to jointly develop new AI chips for PCs and data centers; the PC chips will combine Intel CPUs with Nvidia GPUs. The announcement did not say whether Nvidia will use Intel's fabs to manufacture chips. Nvidia, the most valuable company in the chip industry, does not manufacture its own chips, relying mainly on foundries such as TSMC, while Intel has been pushing its own foundry business with little progress.
- Study finds corals cannot survive a warmer world
According to a study published in Nature, if global temperatures keep rising, almost all corals in the Atlantic will have stopped growing by the end of the century. Researchers from the University of Exeter and other institutions analyzed more than 400 coral reefs in the Atlantic and estimate that even under an optimistic warming scenario, more than 70% of the region's reefs will begin to die off by 2040. If the planet warms more than 2°C above pre-industrial levels by the end of the century, 99% of the region's reefs will face the same fate. Global temperatures are already about 1.3°C above pre-industrial levels. The death of corals has far-reaching consequences: reefs provide habitat for fish and other marine life and act as barriers against waves, helping protect coastlines from rising seas. A quarter of marine species depend on coral reefs, and more than a billion people benefit from them.
- DeepSeek publishes its R1 paper, putting the training cost at just $294,000
DeepSeek researchers have published the R1 paper, "DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning," in Nature. They disclose that training R1 cost only $294,000, although the underlying base model cost about $6 million; R1 was trained mainly on Nvidia H800 AI chips, which have been barred from export to China since 2023. DeepSeek's main innovation is automating trial and error with what it calls pure reinforcement learning, rewarding the model for reaching correct answers rather than teaching it to follow human-chosen reasoning examples. The model also scores itself using a method called group relative policy optimization. The researchers deny OpenAI's accusation from earlier this year that DeepSeek trained on the outputs of OpenAI's models. DeepSeek-R1 is one of the most popular models on Hugging Face, with 10.9 million downloads, and nearly every large model trained with reinforcement learning in 2025 has drawn inspiration from R1.
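For context, the group relative policy optimization step mentioned above is usually described as scoring a group of sampled answers to the same prompt and standardizing each reward within the group, which removes the need for a learned value baseline; a minimal sketch (reward design and the PPO-style clipped update are omitted):

```python
# Minimal sketch of the group-relative advantage used in GRPO-style training:
# each rollout's advantage is its reward standardized within its group.
import numpy as np


def group_relative_advantages(rewards):
    r = np.asarray(rewards, dtype=float)        # rewards for one prompt's group of samples
    return (r - r.mean()) / (r.std() + 1e-8)


# e.g. a group of four rollouts where only the last two answered correctly:
print(group_relative_advantages([0.0, 0.0, 1.0, 1.0]))   # approx. [-1, -1, 1, 1]
```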