OrangeBot.AI Digest — 2025-09-19
60 headlines across 4 sources, aggregated for this day.
Hacker News (15)
- Ruby Central's Attack on RubyGems [pdf] (pup-e.com)
- Help Us Raise $200k to Free JavaScript from Oracle (deno.com)
- iTerm2 Web Browser (iterm2.com)
- Nostr (nostr.com)
- Leatherman (vagabond) (en.wikipedia.org)
- Rules for creating good-looking user interfaces, from a developer (weberdominik.com)
- Bravo Apple! Calculator app has a memory leak (xcancel.com)
- Playing “Minecraft” without Minecraft (2024) (lenowo.org)
- Gemini in Chrome (gemini.google)
- Tracking trust with Rust in the kernel (lwn.net)
- David Lynch LA House (www.wallpaper.com)
- Llama-Factory: Unified, Efficient Fine-Tuning for 100 Open LLMs (github.com)
- Want to piss off your IT department? Are the links not malicious looking enough? (phishyurl.com)
- The Sagrada Família takes its final shape (www.newyorker.com)
- Meta’s live demo fails; “AI” recording plays before the actor takes the steps (www.reddit.com)
GitHub Trending (15)
- Alibaba-NLP / DeepResearch
Tongyi Deep Research, the Leading Open-source Deep Research Agent
- LazyVim / LazyVim
Neovim config for the lazy
- basecamp / omarchy
Opinionated Arch/Hyprland Setup
- WebGoat / WebGoat
WebGoat is a deliberately insecure application
- flutter / flutter
Flutter makes it easy and fast to build beautiful apps for mobile and beyond
- nocodb / nocodb
🔥 🔥 🔥 Open Source Airtable Alternative
- facebookresearch / detectron2
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
- fmtlib / fmt
A modern formatting library
- Gar-b-age / CookLikeHOC
🥢 Cooking the way 老乡鸡 (Home Original Chicken) 🐔 does. The main part was completed in 2024; this is not an official 老乡鸡 repository. The text comes from the "老乡鸡菜品溯源报告" (Lao Xiang Ji Dish Traceability Report), organized, edited, and compiled. CookLikeHOC.
- linera-io / linera-protocol
Main repository for the Linera protocol
- microsoft / AI-For-Beginners
12 Weeks, 24 Lessons, AI for All!
- CopilotKit / CopilotKit
React UI + elegant infrastructure for AI Copilots, AI chatbots, and in-app AI agents. The Agentic last-mile 🪁
- microsoft / markitdown
Python tool for converting files and office documents to Markdown.
- bitnami / containers
Bitnami container images
- google-research / timesfm
TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.
Hugging Face (15)
- ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential, yet progress is limited by the lack of large-scale, open-source computer use data and foundation models. In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task domains, built via a closed-loop pipeline uniting automated agents with human experts. Trained on this scaled-up data, ScaleCUA can operate seamlessly across platforms. Specifically, it delivers strong gains over baselines (+26.6 on WebArena-Lite-v2, +10.7 on ScreenSpot-Pro) and sets new state-of-the-art results (94.4% on MMBench-GUI L1-Hard, 60.6% on OSWorld-G, 47.4% on WebArena-Lite-v2). These findings underscore the power of data-driven scaling for general-purpose computer use agents. We will release data, models, and code to advance future research: https://github.com/OpenGVLab/ScaleCUA.
- FlowRL: Matching Reward Distributions for LLM Reasoning
We propose FlowRL: matching the full reward distribution via flow balancing instead of maximizing rewards in large language model (LLM) reinforcement learning (RL). Recent advanced reasoning models adopt reward-maximizing methods (e.g., PPO and GRPO), which tend to over-optimize dominant reward signals while neglecting less frequent but valid reasoning paths, thus reducing diversity. In contrast, we transform scalar rewards into a normalized target distribution using a learnable partition function, and then minimize the reverse KL divergence between the policy and the target distribution. We implement this idea as a flow-balanced optimization method that promotes diverse exploration and generalizable reasoning trajectories. We conduct experiments on math and code reasoning tasks: FlowRL achieves a significant average improvement of 10.0% over GRPO and 5.1% over PPO on math benchmarks, and performs consistently better on code reasoning tasks. These results highlight reward distribution-matching as a key step toward efficient exploration and diverse reasoning in LLM reinforcement learning.
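A minimal sketch of the distribution-matching idea on one group of sampled responses (illustrative only: the paper learns the partition function, whereas here `log_z` is simply normalized over the group, and the group is treated as the policy's full support):

```python
import math

def reverse_kl_loss(logprobs, rewards, beta=1.0):
    """Reverse KL between the policy and a reward-derived target
    distribution, computed over one group of sampled responses.

    logprobs: policy log-probabilities of each response (probabilities
              sum to 1 over the group, which stands in for full support)
    rewards:  scalar rewards for those responses
    """
    # Target distribution: p_i proportional to exp(beta * r_i).
    log_z = math.log(sum(math.exp(beta * r) for r in rewards))
    # KL(pi || p) = sum_i pi_i * (log pi_i - log p_i)
    return sum(
        math.exp(lp) * (lp - (beta * r - log_z))
        for lp, r in zip(logprobs, rewards)
    )
```

Minimizing this spreads probability mass across responses in proportion to their rewards, rather than concentrating on the single highest-reward mode as pure reward maximization tends to.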
- Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
Large language models (LLMs) are increasingly applied in diverse real-world scenarios, each governed by bespoke behavioral and safety specifications (specs) custom-tailored by users or organizations. These specs, categorized into safety-specs and behavioral-specs, vary across scenarios and evolve with changing preferences and requirements. We formalize this challenge as specification alignment, focusing on LLMs' ability to follow dynamic, scenario-specific specs from both behavioral and safety perspectives. To address this challenge, we propose Align3, a lightweight method that employs Test-Time Deliberation (TTD) with hierarchical reflection and revision to reason over the specification boundaries. We further present SpecBench, a unified benchmark for measuring specification alignment, covering 5 scenarios, 103 specs, and 1,500 prompts. Experiments on 15 reasoning and 18 instruct models with several TTD methods, including Self-Refine, TPO, and MoreThink, yield three key findings: (i) test-time deliberation enhances specification alignment; (ii) Align3 advances the safety-helpfulness trade-off frontier with minimal overhead; (iii) SpecBench effectively reveals alignment gaps. These results highlight the potential of test-time deliberation as an effective strategy for reasoning over real-world specification boundaries.
- Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
Large language models (LLMs) are increasingly trained with reinforcement learning from verifiable rewards (RLVR), yet real-world deployment demands models that can self-improve without labels or external judges. Existing label-free methods (confidence minimization, self-consistency, or majority-vote objectives) stabilize learning but steadily shrink exploration, causing an entropy collapse: generations become shorter, less diverse, and brittle. Unlike prior approaches such as Test-Time Reinforcement Learning (TTRL), which primarily adapt models to the immediate unlabeled dataset at hand, our goal is broader: to enable general improvements without sacrificing the model's inherent exploration capacity and generalization ability, i.e., evolving. We formalize this issue and propose EVolution-Oriented and Label-free Reinforcement Learning (EVOL-RL), a simple rule that couples stability with variation under a label-free setting. EVOL-RL keeps the majority-voted answer as a stable anchor (selection) while adding a novelty-aware reward that favors responses whose reasoning differs from what has already been produced (variation), measured in semantic space. Implemented with GRPO, EVOL-RL also uses asymmetric clipping to preserve strong signals and an entropy regularizer to sustain search. This majority-for-selection + novelty-for-variation design prevents collapse, maintains longer and more informative chains of thought, and improves both pass@1 and pass@n. EVOL-RL consistently outperforms the majority-only TTRL baseline; e.g., training on label-free AIME24 lifts Qwen3-4B-Base AIME25 pass@1 from TTRL's 4.6% to 16.4%, and pass@16 from 18.5% to 37.9%. EVOL-RL not only prevents diversity collapse but also unlocks stronger generalization across domains (e.g., GPQA). Furthermore, we demonstrate that EVOL-RL also boosts performance in the RLVR setting, highlighting its broad applicability.
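A toy sketch of the majority-for-selection plus novelty-for-variation reward described above (the function names and the additive weighting are illustrative, not the paper's implementation):

```python
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def evol_rewards(answers, embeddings, novelty_weight=0.5):
    """Combine a majority-vote anchor with a novelty bonus.

    answers:    final answers extracted from each sampled response
    embeddings: semantic embeddings of the reasoning traces
    The majority answer supplies the stable selection signal; novelty is
    1 - max cosine similarity to the other responses in the group.
    """
    majority, _ = Counter(answers).most_common(1)[0]
    rewards = []
    for i, (ans, emb) in enumerate(zip(answers, embeddings)):
        anchor = 1.0 if ans == majority else 0.0
        others = [e for j, e in enumerate(embeddings) if j != i]
        novelty = 1.0 - max(cosine(emb, o) for o in others)
        rewards.append(anchor + novelty_weight * novelty)
    return rewards
```

A response that agrees with the majority scores highest, but among agreeing responses the one whose reasoning is least similar to the rest earns extra reward, which is what keeps exploration alive.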
- Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
Recent studies have demonstrated the importance of high-quality visual representations in image generation and have highlighted the limitations of generative models in image understanding. As a generative paradigm originally designed for natural language, autoregressive models face similar challenges. In this work, we present the first systematic investigation into the mechanisms of applying the next-token prediction paradigm to the visual domain. We identify three key properties that hinder the learning of high-level visual semantics: local and conditional dependence, inter-step semantic inconsistency, and spatial invariance deficiency. We show that these issues can be effectively addressed by introducing self-supervised objectives during training, leading to a novel training framework, Self-guided Training for AutoRegressive models (ST-AR). Without relying on pre-trained representation models, ST-AR significantly enhances the image understanding ability of autoregressive models and leads to improved generation quality. Specifically, ST-AR brings approximately 42% FID improvement for LlamaGen-L and 49% FID improvement for LlamaGen-XL, while maintaining the same sampling strategy.
- FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
Search has emerged as core infrastructure for LLM-based agents and is widely viewed as critical on the path toward more general intelligence. Finance is a particularly demanding proving ground: analysts routinely conduct complex, multi-step searches over time-sensitive, domain-specific data, making it ideal for assessing both search proficiency and knowledge-grounded reasoning. Yet no existing open financial datasets evaluate the data-searching capability of end-to-end agents, largely because constructing realistic, complicated tasks requires deep financial expertise and time-sensitive data is hard to evaluate. We present FinSearchComp, the first fully open-source agent benchmark for realistic, open-domain financial search and reasoning. FinSearchComp comprises three tasks -- Time-Sensitive Data Fetching, Simple Historical Lookup, and Complex Historical Investigation -- that closely reproduce real-world financial analyst workflows. To ensure difficulty and reliability, we engage 70 professional financial experts for annotation and implement a rigorous multi-stage quality-assurance pipeline. The benchmark includes 635 questions spanning global and Greater China markets, and we evaluate 21 models (products) on it. Grok 4 (web) tops the global subset, approaching expert-level accuracy. DouBao (web) leads on the Greater China subset. Experimental analyses show that equipping agents with web search and financial plugins substantially improves results on FinSearchComp, and that the country of origin of models and tools significantly impacts performance. By aligning with realistic analyst tasks and providing end-to-end evaluation, FinSearchComp offers a professional, high-difficulty testbed for complex financial search and reasoning.
- AToken: A Unified Tokenizer for Vision
We present AToken, the first unified visual tokenizer that achieves both high-fidelity reconstruction and semantic understanding across images, videos, and 3D assets. Unlike existing tokenizers that specialize in either reconstruction or understanding for single modalities, AToken encodes these diverse visual inputs into a shared 4D latent space, unifying both tasks and modalities in a single framework. Specifically, we introduce a pure transformer architecture with 4D rotary position embeddings to process visual inputs of arbitrary resolutions and temporal durations. To ensure stable training, we introduce an adversarial-free training objective that combines perceptual and Gram matrix losses, achieving state-of-the-art reconstruction quality. By employing a progressive training curriculum, AToken gradually expands from single images to videos and 3D, and supports both continuous and discrete latent tokens. AToken achieves 0.21 rFID with 82.2% ImageNet accuracy for images, 3.01 rFVD with 32.6% MSRVTT retrieval for videos, and 28.19 PSNR with 90.9% classification accuracy for 3D. In downstream applications, AToken enables both visual generation tasks (e.g., image generation with continuous and discrete tokens, text-to-video generation, image-to-3D synthesis) and understanding tasks (e.g., multimodal LLMs), achieving competitive performance across all benchmarks. These results shed light on the next-generation multimodal AI systems built upon unified visual tokenization.
- RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
This paper presents RynnVLA-001, a vision-language-action(VLA) model built upon large-scale video generative pretraining from human demonstrations. We propose a novel two-stage pretraining methodology. The first stage, Ego-Centric Video Generative Pretraining, trains an Image-to-Video model on 12M ego-centric manipulation videos to predict future frames conditioned on an initial frame and a language instruction. The second stage, Human-Centric Trajectory-Aware Modeling, extends this by jointly predicting future keypoint trajectories, thereby effectively bridging visual frame prediction with action prediction. Furthermore, to enhance action representation, we propose ActionVAE, a variational autoencoder that compresses sequences of actions into compact latent embeddings, reducing the complexity of the VLA output space. When finetuned on the same downstream robotics datasets, RynnVLA-001 achieves superior performance over state-of-the-art baselines, demonstrating that the proposed pretraining strategy provides a more effective initialization for VLA models.
- WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
Recent video diffusion models demonstrate strong potential in spatial intelligence tasks due to their rich latent world priors. However, this potential is hindered by their limited controllability and geometric inconsistency, creating a gap between their strong priors and their practical use in 3D/4D tasks. As a result, current approaches often rely on retraining or fine-tuning, which risks degrading pretrained knowledge and incurs high computational costs. To address this, we propose WorldForge, a training-free, inference-time framework composed of three tightly coupled modules. Intra-Step Recursive Refinement introduces a recursive refinement mechanism during inference, which repeatedly optimizes network predictions within each denoising step to enable precise trajectory injection. Flow-Gated Latent Fusion leverages optical flow similarity to decouple motion from appearance in the latent space and selectively inject trajectory guidance into motion-related channels. Dual-Path Self-Corrective Guidance compares guided and unguided denoising paths to adaptively correct trajectory drift caused by noisy or misaligned structural signals. Together, these components inject fine-grained, trajectory-aligned guidance without training, achieving both accurate motion control and photorealistic content generation. Extensive experiments across diverse benchmarks validate our method's superiority in realism, trajectory consistency, and visual fidelity. This work introduces a novel plug-and-play paradigm for controllable video synthesis, offering a new perspective on leveraging generative priors for spatial intelligence.
- RecoWorld: Building Simulated Environments for Agentic Recommender Systems
We present RecoWorld, a blueprint for building simulated environments tailored to agentic recommender systems. Such environments give agents a proper training space where they can learn from errors without impacting real users. RecoWorld distinguishes itself with a dual-view architecture: a simulated user and an agentic recommender engage in multi-turn interactions aimed at maximizing user retention. The user simulator reviews recommended items, updates its mindset, and when sensing potential user disengagement, generates reflective instructions. The agentic recommender adapts its recommendations by incorporating these user instructions and reasoning traces, creating a dynamic feedback loop that actively engages users. This process leverages the exceptional reasoning capabilities of modern LLMs. We explore diverse content representations within the simulator, including text-based, multimodal, and semantic ID modeling, and discuss how multi-turn RL enables the recommender to refine its strategies through iterative interactions. RecoWorld also supports multi-agent simulations, allowing creators to simulate the responses of targeted user populations. It marks an important first step toward recommender systems where users and agents collaboratively shape personalized information streams. We envision new interaction paradigms where "user instructs, recommender responds," jointly optimizing user retention and engagement.
- MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
Current instruction-based image editing (IBIE) methods struggle with challenging editing tasks, as both editing types and sample counts of existing datasets are limited. Moreover, traditional dataset construction often contains noisy image-caption pairs, which may introduce biases and limit model capabilities in complex editing scenarios. To address these limitations, we introduce MultiEdit, a comprehensive dataset featuring over 107K high-quality image editing samples. It encompasses 6 challenging editing tasks through a diverse collection of 18 non-style-transfer editing types and 38 style transfer operations, covering a spectrum from sophisticated style transfer to complex semantic operations like person reference editing and in-image text editing. We employ a novel dataset construction pipeline that utilizes two multi-modal large language models (MLLMs) to generate visual-adaptive editing instructions and produce high-fidelity edited images, respectively. Extensive experiments demonstrate that fine-tuning foundational open-source models with our MultiEdit-Train set substantially improves models' performance on sophisticated editing tasks in our proposed MultiEdit-Test benchmark, while effectively preserving their capabilities on the standard editing benchmark. We believe MultiEdit provides a valuable resource for advancing research into more diverse and challenging IBIE capabilities. Our dataset is available at https://huggingface.co/datasets/inclusionAI/MultiEdit.
- EdiVal-Agent: An Object-Centric Framework for Automated, Scalable, Fine-Grained Evaluation of Multi-Turn Editing
Instruction-based image editing has advanced rapidly, yet reliable and interpretable evaluation remains a bottleneck. Current protocols either (i) depend on paired reference images -- resulting in limited coverage and inheriting biases from prior generative models -- or (ii) rely solely on zero-shot vision-language models (VLMs), whose prompt-based assessments of instruction following, content consistency, and visual quality are often imprecise. To address this, we introduce EdiVal-Agent, an automated, scalable, and fine-grained evaluation framework for multi-turn instruction-based editing from an object-centric perspective, supported by a suite of expert tools. Given an image, EdiVal-Agent first decomposes it into semantically meaningful objects, then synthesizes diverse, context-aware editing instructions. For evaluation, it integrates VLMs with open-vocabulary object detectors to assess instruction following, uses semantic-level feature extractors to evaluate content consistency, and leverages human preference models to judge visual quality. We show that combining VLMs with object detectors yields stronger agreement with human judgments in instruction-following evaluation compared to using VLMs alone and CLIP-based metrics. Furthermore, the pipeline's modular design allows future tools to be seamlessly integrated, enhancing evaluation accuracy over time. Instantiating this pipeline, we build EdiVal-Bench, a multi-turn editing benchmark covering 9 instruction types and 11 state-of-the-art editing models spanning autoregressive (AR) (including Nano Banana, GPT-Image-1), flow-matching, and diffusion paradigms. We demonstrate that EdiVal-Agent can be used to identify existing failure modes, thereby informing the development of the next generation of editing models. Project page: https://tianyucodings.github.io/EdiVAL-page/.
- Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for non-permissive, toxic, and personally identifiable content. To mitigate risks of memorization, we adopt the Goldfish objective during pretraining, strongly suppressing verbatim recall of data while retaining downstream task performance. The Apertus models also expand multilingual coverage, training on 15T tokens from over 1800 languages, with ~40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivalling or surpassing open-weight counterparts. Beyond model weights, we release all scientific artifacts from our development cycle with a permissive license, including data preparation scripts, checkpoints, evaluation suites, and training code, enabling transparent audit and extension.
- Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization
Materials characterization is fundamental to acquiring materials information, revealing the processing-microstructure-property relationships that guide material design and optimization. While multimodal large language models (MLLMs) have recently shown promise in generative and predictive tasks within materials science, their capacity to understand real-world characterization imaging data remains underexplored. To bridge this gap, we present MatCha, the first benchmark for materials characterization image understanding, comprising 1,500 questions that demand expert-level domain expertise. MatCha encompasses four key stages of materials research comprising 21 distinct tasks, each designed to reflect authentic challenges faced by materials scientists. Our evaluation of state-of-the-art MLLMs on MatCha reveals a significant performance gap compared to human experts. These models exhibit degradation when addressing questions requiring higher-level expertise and sophisticated visual perception. Simple few-shot and chain-of-thought prompting struggle to alleviate these limitations. These findings highlight that existing MLLMs still exhibit limited adaptability to real-world materials characterization scenarios. We hope MatCha will facilitate future research in areas such as new material discovery and autonomous scientific agents. MatCha is available at https://github.com/FreedomIntelligence/MatCha.
- Agentic Software Engineering: Foundational Pillars and a Research Roadmap
Agentic Software Engineering (SE 3.0) represents a new era where intelligent agents are tasked not with simple code generation, but with achieving complex, goal-oriented SE objectives. To harness these new capabilities while ensuring trustworthiness, we must recognize a fundamental duality within the SE field in the Agentic SE era, comprising two symbiotic modalities: SE for Humans and SE for Agents. This duality demands a radical reimagining of the foundational pillars of SE (actors, processes, tools, and artifacts) which manifest differently across each modality. We propose two purpose-built workbenches to support this vision. The Agent Command Environment (ACE) serves as a command center where humans orchestrate and mentor agent teams, handling outputs such as Merge-Readiness Packs (MRPs) and Consultation Request Packs (CRPs). The Agent Execution Environment (AEE) is a digital workspace where agents perform tasks while invoking human expertise when facing ambiguity or complex trade-offs. This bi-directional partnership, which supports agent-initiated human callbacks and handovers, gives rise to new, structured engineering activities (i.e., processes) that redefine human-AI collaboration, elevating the practice from agentic coding to true agentic software engineering. This paper presents the Structured Agentic Software Engineering (SASE) vision, outlining several of the foundational pillars for the future of SE. The paper culminates in a research roadmap that identifies a few key challenges and opportunities while briefly discussing the resulting impact of this future on SE education. Our goal is not to offer a definitive solution, but to provide a conceptual scaffold with structured vocabulary to catalyze a community-wide dialogue, pushing the SE community to think beyond its classic, human-centric tenets toward a disciplined, scalable, and trustworthy agentic future.
Solidot (15)
- The auto industry is building far more cars than the market needs
In the suburbs of Chengdu, a city of 21 million, the dealer ZCAR竹子买车 is selling cars at astonishing discounts, with 5,000 vehicles for customers to choose from: a domestically built Audi at 50% off, and a seven-seat FAW SUV at 60% off. ZCAR says it buys in bulk from automakers and dealers, and that it can offer such low prices because the auto industry has excess capacity. Investigations show that China's auto industry produces far more cars than the market demands, because production targets are driven by government policy rather than consumer demand. Electric cars there start below $10,000, while most EVs in the US sell for more than $35,000. Unsold vehicles end up with traders like ZCAR. Industry insiders and analysts believe the auto sector could see turmoil similar to that in real estate and solar. The overcapacity is stark: according to the consultancy 盖世汽车研究院 (Gasgoo Auto Research Institute), Chinese automakers' factory capacity is twice last year's actual output of 27.5 million vehicles.
- Steam will drop support for 32-bit Windows in 2026
Valve announced via a support document that Steam will stop supporting 32-bit Windows starting January 1, 2026. 32-bit Windows 10 is currently the only 32-bit version Steam supports, and it accounts for just 0.01% of systems in the Steam hardware survey. 64-bit Windows 10 will remain supported, and 32-bit games will continue to run. The existing Steam client will keep working on 32-bit Windows 10 in the short term, but it will no longer receive updates, and Steam cannot guarantee it will continue to function. Valve urges users to upgrade to 64-bit versions; going forward, Steam will support only 64-bit operating systems.
- New material stretches to 46 times its length and heals itself
Researchers at National Yang Ming Chiao Tung University report in Advanced Functional Materials a new material that can stretch to 46 times its original length. Even when severed, pressing the broken pieces together at room temperature fully restores its shape and stretchability within 10 minutes. The sticky, elastic polyurethane organogel combines covalently linked cellulose nanocrystals (CNCs) with modified mechanically interlocked molecules (MIMs). The gel is sensitive to external stimuli such as stretching or heating, changing color from orange to blue depending on whether it is at rest or under stress. These unique properties give the gel broad application prospects, including flexible electronic skin, soft robotics, and anti-counterfeiting.
- The 2025 Ig Nobel Prizes announced
The winners of the 2025 Ig Nobel Prizes have been announced. Created in 1991 as a good-natured parody of the Nobel Prizes, the Ig Nobels honor research that first makes people laugh, then makes them think. The Literature Prize went posthumously to Dr. William B. Bean, who recorded and analyzed the growth rate of a fingernail over 35 years, publishing five papers on it in medical journals (the first in 1953, the last in 1980); his son accepted on his behalf. The Psychology Prize went to Marcin Zajenkowski and Gilles Gignac for studying what happens when you tell narcissists they are intelligent. The Nutrition Prize went to Daniele Dendi and colleagues for studying which kinds of pizza rainbow lizards choose to eat at a seaside resort in Togo. The Pediatrics Prize went to Julie Mennella and Gary Beauchamp for studying what nursing infants experience after their mothers eat garlic. The Chemistry Prize went to Rotem Naftalovich and colleagues for studying eating the plastic Teflon as a way to add food volume and satiety without adding calories. The Peace Prize went to Fritz Renner and colleagues for showing that drinking alcohol can sometimes improve a person's ability to speak a foreign language. The Engineering Design Prize went to Vikash Kumar and Sarthak Mittal for studying how redesigning shoe racks can solve the smelly-shoe problem. The Aviation Prize went to Francisco Sánchez and colleagues for studying whether alcohol impairs bats' flight and echolocation. The Physics Prize went to Giacomo Bartolucci and colleagues for studying the physics of pasta sauce, finding that the phase transition that causes clumping can make for an unpleasant experience. The Biology Prize went to 儿岛朋贵 and other Japanese scientists, who found that painting zebra-like stripes on black Wagyu cattle makes it hard for blood-sucking pests such as stable flies to approach, a promising pesticide-free approach to pest control; this is Japan's 19th consecutive year winning an Ig Nobel. The team experimented on six black Wagyu cattle divided into three groups: one painted with white water-based stripes, one painted with inconspicuous black stripes, and one left unpainted. Comparing the number of flies gathering on each group and the frequency of fly-repelling behaviors such as head shaking and tail swishing, they found that the black-and-white striped cattle attracted half as many flies as the other two groups and showed fewer repelling behaviors, though the mechanism behind this remains unknown.
- Google integrates Gemini AI into Chrome for US users
Google's official blog announced the integration of Gemini AI into the desktop Chrome browser for all US users. The browser gains a prominent Gemini button; clicking it opens a conversation with the Gemini chatbot, which can answer questions about the current page's content or synthesize information across multiple pages. Users who dislike the feature can remove the Gemini button from the interface. Google also plans to add more powerful capabilities in the future, such as controlling the browser cursor to perform tasks like adding items to a shopping cart.
- Samsung pushes software update that puts ads on its fridges
Samsung sells nine Family Hub refrigerator models in the US, with suggested retail prices from $1,800 to $3,500. The fridges have 21.5-inch or 32-inch displays whose content users can configure. This week Samsung pushed a software update to Family Hub fridges that began showing ads on those displays. In a statement, Samsung said it is running a pilot in the US market to serve promotions and curated ads on select Family Hub models. Samsung said that if customers dislike an ad they can dismiss it and it will not be shown again, and that ads do not appear when the Cover Screen is set to Art Mode or a photo album. Earlier this year Samsung stated it had no plans to show ads on fridge displays, a promise it has evidently broken.
- Zebra finches show semantic understanding
According to a study published in Science, zebra finches can not only distinguish all the calls of their own species but also categorize them by meaning, suggesting a surprising level of semantic understanding. Many social animals use a rich repertoire of calls to express their needs, emotions, and awareness of their surroundings. The zebra finch is a highly social songbird that produces about 11 distinct call types across its varied social behaviors. To test how adult zebra finches categorize the calls of their species, the researchers ran an experiment with 12 birds, which had to distinguish one rewarded call from ten non-rewarded calls, including calls from unfamiliar species. They found the birds were remarkably good at discriminating all the call types in their repertoire, indicating they can accurately perceive and classify their species' vocal signals.
- Nvidia invests $5 billion in Intel
A month after the US government acquired a 10% stake in Intel, Nvidia announced it will spend $5 billion buying Intel common stock at $23.28 per share. The two companies also agreed to jointly develop new AI chips for PCs and data centers; the PC chips will combine Intel CPUs with Nvidia GPUs. The announcement did not say whether Nvidia will use Intel's fabs to manufacture chips. Nvidia, the most valuable company in the chip industry, does not manufacture its own chips, relying mainly on foundries such as TSMC; Intel has been pushing its own foundry business, with little progress.
- Study finds corals cannot survive in a warmer world
According to a study published in Nature, if global temperatures keep rising, nearly all corals in the Atlantic will stop growing by the end of the century. Researchers at the University of Exeter and other institutions analyzed more than 400 Atlantic coral reefs and estimate that even under optimistic warming scenarios, over 70% of the region's reefs will begin dying by 2040. If the planet warms more than 2°C above pre-industrial levels by the end of the century, 99% of the region's reefs will face the same fate; temperatures are already about 1.3°C above pre-industrial levels. Coral death has far-reaching consequences: reefs provide habitat for fish and other marine life and act as barriers against waves, helping protect coastlines from rising sea levels. A quarter of marine species depend on coral reefs, and more than a billion people benefit from them.
- DeepSeek publishes R1 paper, puts training cost at just $294,000
DeepSeek researchers published the R1 paper, "DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning," in Nature. They disclosed that R1 cost just $294,000 to train, though its base model cost about $6 million. R1 was trained mainly on Nvidia H800 AI chips, which have been banned from export to China since 2023. DeepSeek's main innovation is automating trial and error with pure reinforcement learning, rewarding the model for reaching correct answers rather than teaching it to follow human-chosen reasoning examples. The model also scores itself using a method called group relative policy optimization. The researchers denied OpenAI's accusation from earlier this year that DeepSeek trained on outputs from OpenAI's models. DeepSeek-R1 is one of the most popular models on Hugging Face, with 10.9 million downloads, and nearly every large model trained with reinforcement learning in 2025 has drawn inspiration from R1.
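The group relative policy optimization scoring mentioned above can be sketched as a group-normalized advantage (a generic illustration of the GRPO idea, not DeepSeek's code):

```python
def group_relative_advantages(rewards):
    """Score each sampled answer relative to its own group: subtract
    the group mean and divide by the group standard deviation. The
    group's own statistics replace a separately trained value (critic)
    model."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    if std == 0.0:  # uniform group: no relative signal either way
        return [0.0] * n
    return [(r - mean) / std for r in rewards]
```

Answers that beat the group average get positive advantages and are reinforced; those below it are pushed down, which is how the model "grades itself" against its own samples.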
- Global warming added 22 dangerous heat days in Japan
The US climate research group Climate Central published an analysis finding that, due to global warming, Japan recorded 62 "dangerous heat days" from June to August this year, 22 more than would have occurred without climate change. The group warned that "without prompt cuts to greenhouse gas emissions, ecosystems and economies everywhere will suffer more damage." The team defined a "dangerous heat day" as one whose temperature reaches or exceeds the top 10% of temperatures observed locally from 1991 to 2020; such heat exceeds what people are normally adapted to and serves as an indicator of elevated heatstroke and mortality risk. Globally, about 950 million people experienced more than 30 additional dangerous heat days because of warming. Outside Japan, the largest increase was 59 days, in Jamaica (74 total) and the Cayman Islands (66 total), followed by Haiti with 56 (66 total).
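The "dangerous heat day" definition amounts to a simple percentile threshold; a minimal sketch (illustrative only, not Climate Central's methodology code):

```python
def dangerous_heat_days(baseline_temps, observed_temps):
    """Count observed days at or above the baseline's 90th percentile,
    i.e. days in the top 10% of a 1991-2020-style temperature record."""
    ranked = sorted(baseline_temps)
    threshold = ranked[int(len(ranked) * 0.9)]  # smallest top-10% value
    return sum(1 for t in observed_temps if t >= threshold)
```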
- NASA confirms more than 6,000 exoplanets
According to the NASA Exoplanet Archive, the number of NASA-confirmed exoplanets has reached 6,007. The earliest exoplanets were confirmed in the early 1990s; the count passed 4,000 in 2019, 5,000 in 2022, and 6,000 three years later in September 2025. NASA's database also holds 7,668 candidate planets detected by the TESS (Transiting Exoplanet Survey Satellite) space telescope that await confirmation. The most recently confirmed exoplanet, KMT-2023-BLG-1896L b, is a Neptune-like planet with about 16.35 times the mass of Earth.
- The darkest nights are getting brighter
Sky brightness is classified on the Bortle scale, named after New York amateur astronomer John E. Bortle. The scale has nine classes, from Class 1 for the darkest places free of artificial light to Class 9 for inner-city skies. Most people spend their lives under skies of Class 5 or brighter, and today more and more live in Class 7, 8, and 9 areas. The spread of brighter LED lighting has made light pollution worse. A recent study estimated that from 2011 to 2022, global light pollution grew about 10% per year, doubling roughly every eight years. Even so, Bortle Class 1 dark zones still exist, such as the Australian outback and Chile's Atacama Desert, though Chile faces pressure to build mining projects near its observatories. Internet satellites pose a further challenge for astronomy: the number of satellites in Earth orbit has grown from a few hundred to 12,000, and astronomers predict more than 100,000 within a decade.
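The doubling figure follows directly from 10% annual growth; a quick back-of-the-envelope check:

```python
import math

# Brightness growing 10% per year multiplies by 1.1**t after t years;
# the doubling time solves 1.1**t == 2, i.e. t = ln(2) / ln(1.1).
doubling_years = math.log(2) / math.log(1.1)
print(round(doubling_years, 1))  # 7.3 years, i.e. roughly every eight years
```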
- TV's golden age may be over
FX's research department has tracked the number of scripted English-language TV series since 2009. By its count, scripted series peaked at 599 in 2022 and have declined since. The number of acclaimed new shows has dropped sharply, while streaming platforms have shifted to prioritizing sequels to buzzy hits. The renewed shows have also grown in scale, with sharply higher production budgets: season two of Severance cost $200 million, and season four of Stranger Things $270 million. Since 2018, Netflix has produced large amounts of unscripted content such as documentaries and reality shows. Meanwhile YouTube, a free and largely ad-supported video platform, has become a giant with an ever-growing market share, leading Netflix, Paramount+, and Hulu in streaming viewership.
- Extreme heat spurs new laws to protect workers
According to reports from the WHO and the World Meteorological Organization, more than 2.4 billion workers worldwide are exposed to extreme heat, which causes over 22.85 million occupational injuries and 19,000 heat-related workplace deaths each year. Governments around the world are enacting laws to protect workers from worsening heat stress. Japan fines employers $3,400 for failing to provide cooling measures when the wet-bulb temperature reaches 28°C. Singapore requires large outdoor sites to install temperature sensors with hourly resolution and mandates a 15-minute break every hour when the wet-bulb temperature reaches 33°C. This summer temperatures in Greece, Italy, and Spain reached 47°C, and these southern European countries ordered afternoon work stoppages.