OrangeBot.AI Digest — 2025-10-30

60 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. Jujutsu at Google [video] (www.youtube.com)
  2. I have released a 69.0MB version of Windows 7 x86 (twitter.com)
  3. Some people can't see mental images (www.newyorker.com)
  4. The ear does not do a Fourier transform (www.dissonances.blog)
  5. Falling panel prices lead to global solar boom, except for the US (arstechnica.com)
  6. Show HN: I made a heatmap diff viewer for code reviews (0github.com)
  7. Qt Creator 18 Released (www.qt.io)
  8. Affinity Studio now free (www.affinity.studio)
  9. Free software scares normal people (danieldelaney.net)
  10. Ventoy: Create bootable USB drive for ISO/WIM/IMG/VHD(x)/EFI Files (github.com)
  11. US declines to join more than 70 countries in signing UN cybercrime treaty (therecord.media)
  12. The International Criminal Court wants to become independent of USA technology (www.heise.de)
  13. 987654321 / 123456789 (www.johndcook.com)
  14. Show HN: In a single HTML file, an app to encourage my children to invest (roberdam.com)
  15. Alphabet tops $100B quarterly revenue for first time, cloud grows 34% (www.cnbc.com)

GitHub Trending (15)

  1. helm / helm

    The Kubernetes Package Manager

  2. storybookjs / storybook

    Storybook is the industry standard workshop for building, documenting, and testing UI components in isolation

  3. open-telemetry / opentelemetry-collector

    OpenTelemetry Collector

  4. qeeqbox / social-analyzer

    API, CLI, and Web App for analyzing and finding a person's profile in 1000 social media \ websites

  5. patchy631 / ai-engineering-hub

    In-depth tutorials on LLMs, RAGs and real-world AI agent applications.

  6. yhirose / cpp-httplib

    A C++ header-only HTTP/HTTPS server and client library

  7. allenai / olmocr

    Toolkit for linearizing PDFs for LLM datasets/training

  8. Project-MONAI / MONAI

    AI Toolkit for Healthcare Imaging

  9. janhq / jan

    Jan is an open source alternative to ChatGPT that runs 100% offline on your computer.

  10. mem0ai / mem0

    Universal memory layer for AI Agents; Announcing OpenMemory MCP - local and secure memory management.

  11. iam-veeramalla / aws-devops-zero-to-hero

    AWS zero to hero repo for devops engineers to learn AWS in 30 Days. This repo includes projects, presentations, interview questions and real time examples.

  12. microsoft / agent-lightning

    The absolute trainer to light up AI agents.

  13. Tencent / WeKnora

    LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.

  14. projectdiscovery / nuclei-templates

    Community curated list of templates for the nuclei engine to find security vulnerabilities.

  15. microsoft / Web-Dev-For-Beginners

    24 Lessons, 12 Weeks, Get Started as a Web Developer

Hugging Face (15)

  1. JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

    The scope of neural code intelligence is rapidly expanding beyond text-based source code to encompass the rich visual outputs that programs generate. This visual dimension is critical for advanced applications like flexible content generation and precise, program-driven editing of visualizations. However, progress has been impeded by the scarcity of high-quality multimodal code data, a bottleneck stemming from challenges in synthesis and quality assessment. To address these challenges, we make contributions from both a data and modeling perspective. We first introduce a complete synthesis toolkit that leverages reciprocal synergies between data modalities to efficiently produce a large-scale, high-quality corpus spanning from standard charts to complex interactive web UIs and code-driven animations. Leveraging this toolkit, we construct JanusCode-800K, the largest multimodal code corpus to date. This powers the training of our models, JanusCoder and JanusCoderV, which establish a visual-programmatic interface for generating code from textual instructions, visual inputs, or a combination of both. Our unified model is a departure from existing approaches that build specialized models for isolated tasks. Extensive experiments on both text-centric and vision-centric coding tasks demonstrate the superior performance of the JanusCoder series, with our 7B to 14B scale models approaching or even exceeding the performance of commercial models. Furthermore, extensive analysis provides key insights into harmonizing programmatic logic with its visual expression. Our code and checkpoints will be available at https://github.com/InternLM/JanusCoder.

  2. Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning

    Recent advances in image reasoning methods, particularly "Thinking with Images", have demonstrated remarkable success in Multimodal Large Language Models (MLLMs); however, this dynamic reasoning paradigm has not yet been extended to video reasoning tasks. In this paper, we propose Video-Thinker, which empowers MLLMs to think with videos by autonomously leveraging their intrinsic "grounding" and "captioning" capabilities to generate reasoning clues throughout the inference process. To spark this capability, we construct Video-Thinker-10K, a curated dataset featuring autonomous tool usage within chain-of-thought reasoning sequences. Our training strategy begins with Supervised Fine-Tuning (SFT) to learn the reasoning format, followed by Group Relative Policy Optimization (GRPO) to strengthen this reasoning capability. Through this approach, Video-Thinker enables MLLMs to autonomously navigate grounding and captioning tasks for video reasoning, eliminating the need for constructing and calling external tools. Extensive experiments demonstrate that Video-Thinker achieves significant performance gains on both in-domain tasks and challenging out-of-domain video reasoning benchmarks, including Video-Holmes, CG-Bench-Reasoning, and VRBench. Our Video-Thinker-7B substantially outperforms existing baselines such as Video-R1 and establishes state-of-the-art performance among 7B-sized MLLMs.

  3. Scaling Latent Reasoning via Looped Language Models

    Modern LLMs are trained to "think" primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens. Ouro 1.4B and 2.6B models deliver superior performance, matching the results of up to 12B SOTA LLMs across a wide range of benchmarks. Through controlled experiments, we show this advantage stems not from increased knowledge capacity, but from superior knowledge manipulation capabilities. We also show that LoopLM yields reasoning traces more aligned with final outputs than explicit CoT. We hope our results show the potential of LoopLM as a novel scaling direction in the reasoning era. Our models can be found at http://ouro-llm.github.io.
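
    As a concrete illustration of the looped-transformer idea (a minimal sketch, not the Ouro implementation; the sizes are toy values and the learned depth-allocation objective is omitted), the snippet below applies one shared block repeatedly, so depth comes from iteration rather than extra parameters:

      import torch
      import torch.nn as nn

      class LoopedLM(nn.Module):
          def __init__(self, vocab=1000, d=128, n_loops=4):
              super().__init__()
              self.embed = nn.Embedding(vocab, d)
              # A single transformer block reused at every loop step.
              self.block = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
              self.head = nn.Linear(d, vocab)
              self.n_loops = n_loops

          def forward(self, tokens):
              h = self.embed(tokens)
              for _ in range(self.n_loops):   # iterative computation in latent space
                  h = self.block(h)           # same weights at every iteration
              return self.head(h)

      logits = LoopedLM()(torch.randint(0, 1000, (1, 16)))
      print(logits.shape)  # torch.Size([1, 16, 1000])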

  4. ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization

    Autoformalization, which translates natural language mathematics into machine-verifiable formal statements, is critical for using formal mathematical reasoning to solve math problems stated in natural language. While Large Language Models can generate syntactically correct formal statements, they often fail to preserve the original problem's semantic intent. This limitation arises from treating autoformalization as a simple translation task, lacking the mechanisms for self-reflection and iterative refinement that human experts naturally employ. To address these issues, we propose ReForm, a Reflective Autoformalization method that tightly integrates semantic consistency evaluation into the autoformalization process. This enables the model to iteratively generate formal statements, assess their semantic fidelity, and self-correct identified errors through progressive refinement. To effectively train this reflective model, we introduce Prospective Bounded Sequence Optimization (PBSO), which employs different rewards at different sequence positions to ensure that the model develops both accurate autoformalization and correct semantic validations, preventing superficial critiques that would undermine the purpose of reflection. Extensive experiments across four autoformalization benchmarks demonstrate that ReForm achieves an average improvement of 17.2 percentage points over the strongest baselines. To further ensure evaluation reliability, we introduce ConsistencyCheck, a benchmark of 859 expert-annotated items that not only validates LLMs as judges but also reveals that autoformalization is inherently difficult: even human experts produce semantic errors in up to 38.5% of cases.
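
    The reflective loop the abstract describes can be sketched as below; the three model calls are hypothetical stubs standing in for a trained LLM, and the PBSO training objective is not shown:

      from dataclasses import dataclass

      @dataclass
      class Verdict:
          consistent: bool
          critique: str

      # Hypothetical stand-ins for what would be model calls in the real system.
      def formalize(problem):
          return f"theorem t : {problem} := by sorry"

      def check_semantics(problem, statement):
          return Verdict(consistent=True, critique="")

      def refine(problem, statement, critique):
          return statement

      def reflective_autoformalize(problem, max_rounds=3):
          statement = formalize(problem)                     # initial translation
          for _ in range(max_rounds):
              verdict = check_semantics(problem, statement)  # self-reflection pass
              if verdict.consistent:
                  break
              statement = refine(problem, statement, verdict.critique)
          return statement

      print(reflective_autoformalize("2 + 2 = 4"))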

  5. Reasoning-Aware GRPO using Process Mining

    Reinforcement learning (RL)-based post-training has been crucial for enabling multi-step reasoning in large reasoning models (LRMs), yet current reward schemes are typically outcome-centric. We propose PM4GRPO, a reasoning-aware Group Relative Policy Optimization (GRPO) that augments standard answer/format rewards with signals over the reasoning procedure. To this end, process mining techniques are utilized to compute a scalar conformance reward that measures how closely a policy model's reasoning aligns with the pretrained teacher model. The empirical results on five benchmarks demonstrate that PM4GRPO significantly outperforms existing methodologies for GRPO-based post-training. These results highlight that leveraging process mining for reasoning-aware GRPO effectively enhances the reasoning capabilities of policy models.
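
    A toy sketch of the reward shaping described here, assuming a scalar conformance score in [0, 1] has already been computed by process mining against the teacher's traces (the weights and the group normalization details are illustrative, not the paper's):

      import statistics

      def shaped_reward(answer_ok, format_ok, conformance, w=(1.0, 0.2, 0.5)):
          # Outcome rewards augmented with a process-conformance term.
          return w[0] * answer_ok + w[1] * format_ok + w[2] * conformance

      def group_relative_advantages(rewards):
          # GRPO-style: normalize rewards within the sampled group, no critic.
          mu = statistics.mean(rewards)
          sd = statistics.pstdev(rewards) or 1.0
          return [(r - mu) / sd for r in rewards]

      group = [shaped_reward(1, 1, 0.9), shaped_reward(1, 0, 0.4), shaped_reward(0, 1, 0.2)]
      print(group_relative_advantages(group))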

  6. The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

    Real-world language agents must handle complex, multi-step workflows across diverse Apps. For instance, an agent may manage emails by coordinating with calendars and file systems, or monitor a production database to detect anomalies and generate reports following an operating manual. However, existing language agent benchmarks often focus on narrow domains or simplified tasks that lack the diversity, realism, and long-horizon complexity required to evaluate agents' real-world performance. To address this gap, we introduce the Tool Decathlon (dubbed Toolathlon), a benchmark for language agents offering diverse Apps and tools, realistic environment setup, and reliable execution-based evaluation. Toolathlon spans 32 software applications and 604 tools, ranging from everyday platforms such as Google Calendar and Notion to professional ones like WooCommerce, Kubernetes, and BigQuery. Most of the tools are based on a high-quality set of Model Context Protocol (MCP) servers that we revised or implemented ourselves. Unlike prior works, which primarily ensure functional realism but offer limited environment state diversity, we provide realistic initial environment states from real software, such as Canvas courses with dozens of students or real financial spreadsheets. This benchmark includes 108 manually sourced or crafted tasks in total, requiring interacting with multiple Apps over around 20 turns on average to complete. Each task is strictly verifiable through dedicated evaluation scripts. Comprehensive evaluation of SOTA models highlights their significant shortcomings: the best-performing model, Claude-4.5-Sonnet, achieves only a 38.6% success rate with 20.2 tool calling turns on average, while the top open-weights model DeepSeek-V3.2-Exp reaches 20.1%. We expect Toolathlon to drive the development of more capable language agents for real-world, long-horizon task execution.

  7. VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning

    Visual effects (VFX) are crucial to the expressive power of digital media, yet their creation remains a major challenge for generative AI. Prevailing methods often rely on the one-LoRA-per-effect paradigm, which is resource-intensive and fundamentally incapable of generalizing to unseen effects, thus limiting scalability and creativity. To address this challenge, we introduce VFXMaster, the first unified, reference-based framework for VFX video generation. It recasts effect generation as an in-context learning task, enabling it to reproduce diverse dynamic effects from a reference video onto target content. In addition, it demonstrates remarkable generalization to unseen effect categories. Specifically, we design an in-context conditioning strategy that prompts the model with a reference example. An in-context attention mask is designed to precisely decouple and inject the essential effect attributes, allowing a single unified model to master the effect imitation without information leakage. In addition, we propose an efficient one-shot effect adaptation mechanism to rapidly boost generalization to challenging unseen effects from a single user-provided video. Extensive experiments demonstrate that our method effectively imitates various categories of effect information and exhibits outstanding generalization to out-of-domain effects. To foster future research, we will release our code, models, and a comprehensive dataset to the community.
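
    An in-context attention mask of the kind described might look like the toy construction below, where target tokens may read the reference effect but the reference stream cannot read the target; this is an illustrative guess at the pattern, not VFXMaster's actual mask:

      import torch

      def in_context_mask(n_ref, n_tgt):
          n = n_ref + n_tgt
          allow = torch.zeros(n, n, dtype=torch.bool)
          allow[:n_ref, :n_ref] = True   # reference tokens attend among themselves
          allow[n_ref:, :] = True        # target tokens attend to reference + self
          return allow                   # True = attention permitted

      print(in_context_mask(3, 4).int())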

  8. Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

    We propose Ming-Flash-Omni, an upgraded version of Ming-Omni, built upon a sparser Mixture-of-Experts (MoE) variant of Ling-Flash-2.0 with 100 billion total parameters, of which only 6.1 billion are active per token. This architecture enables highly efficient scaling (dramatically improving computational efficiency while significantly expanding model capacity) and empowers stronger unified multimodal intelligence across vision, speech, and language, representing a key step toward Artificial General Intelligence (AGI). Compared to its predecessor, the upgraded version exhibits substantial improvements across multimodal understanding and generation. We significantly advance speech recognition capabilities, achieving state-of-the-art performance in contextual ASR and highly competitive results in dialect-aware ASR. In image generation, Ming-Flash-Omni introduces high-fidelity text rendering and demonstrates marked gains in scene consistency and identity preservation during image editing. Furthermore, Ming-Flash-Omni introduces generative segmentation, a capability that not only achieves strong standalone segmentation performance but also enhances spatial control in image generation and improves editing consistency. Notably, Ming-Flash-Omni achieves state-of-the-art results in text-to-image generation and generative segmentation, and sets new records on all 12 contextual ASR benchmarks, all within a single unified architecture.
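
    The 100B-total / 6.1B-active ratio comes from sparse expert routing: each token is dispatched to only a few experts. The toy top-k MoE layer below shows the mechanism; the sizes and routing details are illustrative, not the Ling-Flash-2.0 configuration:

      import torch
      import torch.nn as nn

      class TopKMoE(nn.Module):
          def __init__(self, d=64, n_experts=8, k=2):
              super().__init__()
              self.router = nn.Linear(d, n_experts)
              self.experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_experts))
              self.k = k

          def forward(self, x):                    # x: (tokens, d)
              gates = self.router(x).softmax(-1)   # routing probabilities
              topv, topi = gates.topk(self.k, dim=-1)
              out = torch.zeros_like(x)
              for slot in range(self.k):           # only k experts run per token
                  for e, expert in enumerate(self.experts):
                      mask = topi[:, slot] == e
                      if mask.any():
                          out[mask] += topv[mask, slot, None] * expert(x[mask])
              return out

      print(TopKMoE()(torch.randn(5, 64)).shape)   # torch.Size([5, 64])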

  9. RegionE: Adaptive Region-Aware Generation for Efficient Image Editing

    Recently, instruction-based image editing (IIE) has received widespread attention. In practice, IIE often modifies only specific regions of an image, while the remaining areas largely remain unchanged. Although these two types of regions differ significantly in generation difficulty and computational redundancy, existing IIE models do not account for this distinction, instead applying a uniform generation process across the entire image. This motivates us to propose RegionE, an adaptive, region-aware generation framework that accelerates IIE tasks without additional training. Specifically, the RegionE framework consists of three main components: 1) Adaptive Region Partition. We observed that the trajectory of unedited regions is straight, allowing for multi-step denoised predictions to be inferred in a single step. Therefore, in the early denoising stages, we partition the image into edited and unedited regions based on the difference between the final estimated result and the reference image. 2) Region-Aware Generation. After distinguishing the regions, we replace multi-step denoising with one-step prediction for unedited areas. For edited regions, the trajectory is curved, requiring local iterative denoising. To improve the efficiency and quality of local iterative generation, we propose the Region-Instruction KV Cache, which reduces computational cost while incorporating global information. 3) Adaptive Velocity Decay Cache. Observing that adjacent timesteps in edited regions exhibit strong velocity similarity, we further propose an adaptive velocity decay cache to accelerate the local denoising process. We applied RegionE to state-of-the-art IIE base models, including Step1X-Edit, FLUX.1 Kontext, and Qwen-Image-Edit. RegionE achieved acceleration factors of 2.57, 2.41, and 2.06. Evaluations by GPT-4o confirmed that semantic and perceptual fidelity were well preserved.
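
    The first step, adaptive region partition, might be approximated as below: compare an early estimate of the final image against the reference and threshold the per-pixel change to obtain an edit mask. This is only a sketch of the idea; RegionE's actual criterion and threshold are not specified in the abstract:

      import numpy as np

      def partition_regions(estimate, reference, tau=0.1):
          diff = np.abs(estimate - reference).mean(axis=-1)   # per-pixel change
          return diff > tau                                   # True = edited region

      estimate = np.random.rand(64, 64, 3)
      reference = estimate.copy()
      reference[16:32, 16:32] += 0.5                          # simulate a local edit
      mask = partition_regions(estimate, reference)
      print(mask.sum(), "pixels flagged as edited")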

  10. The Principles of Diffusion Models

    This monograph presents the core principles that have guided the development of diffusion models, tracing their origins and showing how diverse formulations arise from shared mathematical ideas. Diffusion modeling starts by defining a forward process that gradually corrupts data into noise, linking the data distribution to a simple prior through a continuum of intermediate distributions. The goal is to learn a reverse process that transforms noise back into data while recovering the same intermediates. We describe three complementary views. The variational view, inspired by variational autoencoders, sees diffusion as learning to remove noise step by step. The score-based view, rooted in energy-based modeling, learns the gradient of the evolving data distribution, indicating how to nudge samples toward more likely regions. The flow-based view, related to normalizing flows, treats generation as following a smooth path that moves samples from noise to data under a learned velocity field. These perspectives share a common backbone: a time-dependent velocity field whose flow transports a simple prior to the data. Sampling then amounts to solving a differential equation that evolves noise into data along a continuous trajectory. On this foundation, the monograph discusses guidance for controllable generation, efficient numerical solvers, and diffusion-motivated flow-map models that learn direct mappings between arbitrary times. It provides a conceptual and mathematically grounded understanding of diffusion models for readers with basic deep-learning knowledge.
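
    The shared backbone, generation as integrating a learned velocity field from noise to data, fits in a few lines of Euler integration. In this sketch the velocity is the exact field for a toy problem (data collapsed to a point under a linear noise-to-data path) rather than a trained network:

      import numpy as np

      def velocity(x, t):
          # Exact velocity for the toy path x_t = t * eps (data = point mass at 0).
          # A real diffusion or flow model replaces this with a learned network.
          return x / max(t, 1e-3)

      def euler_sample(shape=(2,), steps=50):
          x = np.random.randn(*shape)              # draw from the simple prior
          ts = np.linspace(1.0, 0.0, steps + 1)    # integrate t from 1 down to 0
          for t0, t1 in zip(ts[:-1], ts[1:]):
              x = x + (t1 - t0) * velocity(x, t0)  # one Euler step along the flow
          return x

      print(euler_sample())   # shrinks toward the "data" at the origin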

  11. ODesign: A World Model for Biomolecular Interaction Design

    Biomolecular interactions underpin almost all biological processes, and their rational design is central to programming new biological functions. Generative AI models have emerged as powerful tools for molecular design, yet most remain specialized for individual molecular types and lack fine-grained control over interaction details. Here we present ODesign, an all-atom generative world model for all-to-all biomolecular interaction design. ODesign allows scientists to specify epitopes on arbitrary targets and generate diverse classes of binding partners with fine-grained control. Across entity-, token-, and atom-level benchmarks in the protein modality, ODesign demonstrates superior controllability and performance to modality-specific baselines. Extending beyond proteins, it generalizes to nucleic acid and small-molecule design, enabling interaction types such as protein-binding RNA/DNA and RNA/DNA-binding ligands that were previously inaccessible. By unifying multimodal biomolecular interactions within a single generative framework, ODesign moves toward a general-purpose molecular world model capable of programmable design. ODesign is available at https://odesign.lglab.ac.cn.

  12. ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks

    Retrieval Augmented Generation (RAG) systems are increasingly vital in dynamic domains like online gaming, yet the lack of a dedicated benchmark has impeded standardized evaluation in this area. The core difficulty lies in Dual Dynamics: the constant interplay between game content updates and the shifting focus of the player community. Furthermore, the necessity of automating such a benchmark introduces a critical requirement for player-centric authenticity to ensure generated questions are realistic. To address this integrated challenge, we introduce ChronoPlay, a novel framework for the automated and continuous generation of game RAG benchmarks. ChronoPlay utilizes a dual-dynamic update mechanism to track both forms of change, and a dual-source synthesis engine that draws from official sources and the player community to ensure both factual correctness and authentic query patterns. We instantiate our framework on three distinct games to create the first dynamic RAG benchmark for the gaming domain, offering new insights into model performance under these complex and realistic conditions. Code is available at: https://github.com/hly1998/ChronoPlay.

  13. Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks

    Humans possess spatial reasoning abilities that enable them to understand spaces through multimodal observations, such as vision and sound. Large multimodal reasoning models extend these abilities by learning to perceive and reason, showing promising performance across diverse spatial tasks. However, systematic reviews and publicly available benchmarks for these models remain limited. In this survey, we provide a comprehensive review of multimodal spatial reasoning tasks with large models, categorizing recent progress in multimodal large language models (MLLMs) and introducing open benchmarks for evaluation. We begin by outlining general spatial reasoning, focusing on post-training techniques, explainability, and architecture. Beyond classical 2D tasks, we examine spatial relationship reasoning, scene and layout understanding, as well as visual question answering and grounding in 3D space. We also review advances in embodied AI, including vision-language navigation and action models. Additionally, we consider emerging modalities such as audio and egocentric video, which contribute to novel spatial understanding through new sensors. We believe this survey establishes a solid foundation and offers insights into the growing field of multimodal spatial reasoning. Updated information about this survey, codes and implementation of the open benchmarks can be found at https://github.com/zhengxuJosh/Awesome-Spatial-Reasoning.

  14. Parallel Loop Transformer for Efficient Test-Time Computation Scaling

    Large Language Models (LLMs) are powerful but often too slow and costly for real-world use during inference. Looped transformers save on parameters by reusing the same weights for multiple computational steps, or "loops." However, this approach has a major flaw: the loops run one after another, causing inference latency and memory requirements to increase with each added loop. This makes them impractical for fast applications. To solve this problem, we introduce the Parallel Loop Transformer (PLT). PLT is a new architecture that delivers the performance benefits of a deep, looped model but with the low latency of a standard, non-looped model. PLT works using two key techniques. First, Cross-Loop Parallelism (CLP) breaks the sequential dependency by computing different loops for different tokens at the same time, all within a single pass. Second, to prevent memory costs from growing, we use an Efficient Representation Enhancement strategy. This method shares the memory (KV cache) from the first loop with all other loops. It then uses a Gated Sliding-Window Attention (G-SWA) to combine this shared global information with local information, maintaining high accuracy. Our experiments show that PLT achieves the high accuracy of a traditional looped model but with almost no extra latency or memory cost compared to a standard transformer.
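
    The gated combination of the shared global KV cache with local sliding-window attention might be sketched as below (single head, no masking, gate as a free parameter; the actual G-SWA design is certainly more involved):

      import torch
      import torch.nn.functional as F

      def attend(q, k, v):
          scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
          return F.softmax(scores, dim=-1) @ v

      def gswa(q, shared_k, shared_v, local_k, local_v, gate_param):
          global_out = attend(q, shared_k, shared_v)  # reuses the loop-0 KV cache
          local_out = attend(q, local_k, local_v)     # current loop, small window
          g = torch.sigmoid(gate_param)               # learned mixing gate
          return g * global_out + (1 - g) * local_out

      q = torch.randn(1, 4, 32)                       # (batch, tokens, dim)
      shared = torch.randn(1, 16, 32), torch.randn(1, 16, 32)
      window = torch.randn(1, 4, 32), torch.randn(1, 4, 32)
      print(gswa(q, *shared, *window, torch.zeros(1)).shape)  # torch.Size([1, 4, 32])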

  15. PairUni: Pairwise Training for Unified Multimodal Language Models

    Unified vision-language models (UVLMs) must perform both understanding and generation within a single architecture, but these tasks rely on heterogeneous data and supervision, making it difficult to balance them during reinforcement learning (RL). We propose PairUni, a unified framework that reorganizes data into understanding-generation (UG) pairs and aligns optimization accordingly. We first use GPT-o3 to augment single-task data, generating captions for understanding samples and question-answer (QA) pairs for generation samples, forming aligned pairs from the same instance. Additionally, for each generation sample, we retrieve a semantically related understanding example to form a retrieved pair, linking different but related data points. These paired structures expose cross-task semantic correspondences and support consistent policy learning. To leverage this structure, we present Pair-GPRO, a pair-aware variant based on Group Relative Policy Optimization. It assigns a similarity score to each pair to modulate the advantage, strengthening learning from well-aligned examples and reducing task interference. We curate a high-quality dataset of 16K UG pairs named PairUG for RL fine-tuning and evaluate PairUni on the powerful Janus-Pro UVLMs. Our approach achieves balanced improvements on various UVLMs, outperforming strong UVLM RL baselines. Code: https://github.com/Haochen-Wang409/PairUni.
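
    The pair-aware advantage modulation could look like this sketch, which scales each group-relative advantage by the pair's similarity score; the multiplicative form is an assumption, as the paper defines the exact rule:

      import statistics

      def pair_gpro_advantages(rewards, similarity):
          # Group-relative normalization, then pair-similarity scaling.
          mu = statistics.mean(rewards)
          sd = statistics.pstdev(rewards) or 1.0
          return [similarity * (r - mu) / sd for r in rewards]

      print(pair_gpro_advantages([1.0, 0.0, 0.5], similarity=0.9))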

Solidot (15)

  1. Microsoft's Xbox handhelds plagued by Windows problems

    The two Xbox handhelds Microsoft developed with ASUS have been on sale for two weeks. They run a version of Windows optimized for handhelds, but players quickly found the system riddled with problems, many of which can be fixed by installing the Linux distribution Bazzite. Players report that the white model cannot reliably enter and wake from sleep and cannot hold its charge while sleeping. Neither Microsoft nor ASUS has acknowledged the problems or given a timeline for a fix; ASUS says it needs more time for testing. Players have also found that games run 30% faster under Bazzite than under Windows. Bazzite initially had sleep problems of its own, but developer Antheas Kapenekakis consulted contacts at AMD and fixed them within two days. In a standby test under Windows, one handheld lost 10% of its battery after 12 hours of sleep and another lost 23%; after a further 12 hours both were down to 30%, with one of them having attempted a Windows update while asleep. Both units would also fail to wake from sleep, requiring a forced reboot.

  2. Pop!_OS 24.04 LTS coming in December

    Pop!_OS, the distribution developed by Linux PC maker System76, is based on Ubuntu LTS. It originally shipped COSMIC as a modified GNOME desktop, but as of Pop!_OS 24.04 LTS, COSMIC has been rewritten in Rust as a standalone desktop environment. Because COSMIC development fell behind, the release of Pop!_OS 24.04 LTS, based on Ubuntu 24.04 LTS, was repeatedly delayed. System76 founder and CEO Carl Richell has now announced an official release date: Pop!_OS 24.04 LTS and COSMIC Epoch 1 will ship on December 11. Richell says that starting with Pop!_OS 26.04 LTS, future releases will track the Ubuntu LTS schedule (arriving roughly two weeks after each Ubuntu release), which means the next LTS will follow this one by only a little over four months.

  3. Grammarly renames itself Superhuman

    Grammarly, the company behind the writing-assistance tools, has announced that it is renaming itself Superhuman, while keeping its existing product names for now. Grammarly acquired the subscription-based email client Superhuman in July. The company also launched an AI assistant called Superhuman Go, which integrates with tools such as Gmail, Jira, and Google Drive to enhance writing and automate productivity tasks. Superhuman says it plans to add features that let the assistant pull data from sources such as CRMs and internal systems to suggest revisions to email content.

  4. Tor Browser 15.0 released

    The Tor Browser project has released v15.0, the first stable version based on Firefox ESR 140. It incorporates a year of upstream Firefox updates, including new features and usability improvements such as vertical tabs and the unified search button. The Tor developers say they reviewed and resolved roughly 200 Bugzilla issues that could negatively affect the privacy and security of Tor Browser users, and removed AI features deemed insufficiently auditable.

  5. Fedora Linux 43 released

    The Fedora project has announced the release of Fedora Linux 43. Major changes include: the GNOME desktop now supports only Wayland, with X11 session support removed (KDE still supports X11, and a workaround lets GNOME users keep using X11); if the fontconfig configuration lacks a monospace font, a fallback monospace font is now set by default to avoid problems; gdk-pixbuf2 uses the sandboxed image-loading framework Glycin for improved security; the Noto Color Emoji font switches to the COLRv1 format; plus Python 3.14, GCC 15.2, Golang 1.25, LLVM 21, Ruby on Rails 8.0, and more.

  6. Nvidia becomes the first company worth $5 trillion

    On Wednesday Nvidia became the first company in the world to pass a $5 trillion market capitalization, ahead of giants such as Apple and Microsoft. It crossed $4 trillion only three months ago, and before the launch of the AI chatbot ChatGPT three years ago it was worth roughly $400 billion. As the largest supplier of AI chips, Nvidia is the main beneficiary of the current wave of AI development, which has also stoked fears of an AI bubble. CEO Jensen Huang said on Tuesday that he does not believe the world is in an AI bubble, arguing that people are using all of these models and are happy to pay for them. At the GTC conference, Huang said Nvidia expects shipments of its latest chip to reach 20 million units, versus a total of 4 million for the previous-generation Hopper.

  7. Mouse study shows hair fully regrows within 20 days

    According to a study published in Cell Metabolism, experiments in mice showed hair regrowing completely within 20 days. The breakthrough raises hopes for an eventual cure for hair loss, though mice have much shorter hair cycles than humans, so it remains to be seen whether the results will carry over. The work builds on an internal mechanism of hair-follicle regeneration: when skin is injured or mildly irritated, immune cells enter the subcutaneous fat and signal fat cells to release monounsaturated fatty acids, which nourish hair-follicle stem cells, invigorating them and promoting new growth. Rather than relying on an irritation-based therapy, the researchers directly applied a serum containing these fatty acids.

  8. Massive black hole found in a dwarf galaxy

    Segue 1 is a dwarf galaxy containing only a handful of stars, too few to supply the gravity needed to keep it from dispersing into space. As with other dwarf galaxies, the binding force had long been assumed to come from dark matter. A new study overturns that assumption, challenging astronomers' understanding of dwarf galaxies: a massive black hole at the center of Segue 1, not dark matter, provides the gravity that holds its stars together. Segue 1 lies just 75,000 light-years away, a close neighbor of the Milky Way. Its black hole weighs in at 450,000 solar masses, roughly 10 times the combined mass of all the stars in Segue 1, whereas in most galaxies the central black hole is far lighter than the stars. One possible explanation is that Segue 1 was once a larger galaxy with many more stars, most of which the Milky Way has stripped away over time, leaving only a few behind. Another possibility is that Segue 1 resembles a newly discovered class of galaxies called little red dots, which appear to have formed with massive black holes and very few stars. Those early galaxies lie at the most distant reaches of the universe and are hard to study; with Segue 1, astronomers now have a nearby object that lets them watch how little red dots evolve.

  9. Australian police develop an LLM to decode Gen Z slang and emoji

    The Australian Federal Police is working with Microsoft on an AI tool that decodes Gen Z slang and emoji to fight online exploitation and "crimefluencers". AFP Commissioner Krissy Barrett warned of a rising wave of young online crime gangs targeting vulnerable teenage boys and girls. She calls these offenders crimefluencers, motivated by sowing chaos and harming others, and says most victims are young girls. Their motivation is neither financial gain nor sexual gratification, she said: they do it purely for fun or attention, without fully realizing the consequences of their actions. Police have identified 59 crimefluencers, all aged between 17 and 20, and have arrested some of them.

  10. Delivery is destroying the American restaurant industry

    According to statistics from the National Restaurant Association, nearly three-quarters of restaurant orders are now not eaten on premises, and the share of delivery customers more than doubled between 2019 and 2024. In one poll, 41% of respondents said delivery has become an indispensable part of their lifestyle. The shift has fundamentally changed the economics of the restaurant business: delivery companies charge restaurants commissions of 5%-30%, on top of which restaurants pay payment-processing fees, advertising fees, and fees for search ranking. Shannon Orr runs an eight-restaurant group on the West Coast. One of her restaurants took in $1.7 million in delivery sales last year, $400,000 of which went to the delivery companies. She says it was once her most profitable restaurant, yet it made no profit in 2024.

  11. What does sideloading actually mean?

    After announcing its mandatory developer registration program, Google issued multiple statements claiming that sideloading is not going away. But is that true? The developers of the FOSS app store F-Droid argue that Google's claim is wrong: the developer verification requirement effectively strips individuals of the right to choose what software runs on their own devices. The term "sideloading" is itself an artificial invention; we normally just call it "installing", whether the software is installed on a phone or on a PC. If a distinction must be drawn between obtaining software the traditional way and obtaining it through intermediary platforms such as the Google Play Store or Apple App Store, "direct installation" would be more accurate. "Sideloading" carries a negative connotation, as if the user were bypassing mechanisms designed to keep them safe. By Wikipedia's definition, sideloading is downloading apps from a web source not approved by the vendor. Under that definition, Google's claim that "sideloading is not going away" is simply false: the vendor, which for certified Android devices means Google, will be vetting where apps come from. Consumers chose Android largely because of Google's promise of an open computing platform on which users are free to run any software; starting next year, Google will take that right away.

  12. Nearly 90% of Windows games now run on Linux

    According to ProtonDB data, nearly nine in ten Windows games now run on Linux. The progress is owed to the work of the WINE and Proton translation-layer developers and to the interest generated by Linux handhelds such as the Steam Deck. ProtonDB sorts games into five tiers: Platinum games run perfectly without any tweaks; Gold games need minor tweaks; Silver games are playable but imperfect; Borked games do not run at all; Bronze games fall between Silver and Borked. The data show the number of new Platinum titles growing while Borked titles decline. Many popular games still do not support Linux, mainly because their anti-cheat software is incompatible with it.

  13. Sixty percent of Harvard undergraduate grades are A's

    Harvard's Office of Undergraduate Education has published a report warning about grade inflation. The data show that 60% of grades in undergraduate courses are A's; a decade ago the share was only 40%, and two decades ago it was under 25%. Other elite American universities, including the Ivy League schools, face similar grade inflation. The report's author, Amanda Claybaugh, Harvard's dean of undergraduate education, urged faculty to cut back on awarding top grades to most students, saying the practice undermines academic culture. The report notes that one cause of grade inflation is instructors' fear that grading more strictly than their peers will reduce course enrollment, and that Harvard students sometimes pressure professors for higher grades.

  14. Python Software Foundation sticks with DEI, gives up $1.5 million US government grant

    In January the Python Software Foundation (PSF) submitted a proposal to the US National Science Foundation's Safety, Security, and Privacy of Open Source Ecosystems program, its first-ever application for a government grant, investing considerable time and effort in the process. Months later the proposal was recommended for funding, but with a condition: compliance with the Trump administration's anti-DEI (diversity, equity, and inclusion) policies. The anti-DEI terms would apply not only to the funded project but to all of the foundation's activities, and a violation could see the grant money clawed back even after it had been spent. The PSF judged this an enormous and unquantifiable financial risk, and DEI is among the foundation's core values, so it announced it is walking away from the $1.5 million grant. In economically uncertain times it urgently needs funding, however, so the PSF is calling on users to donate and become Supporting Members.

  15. OpenAI completes restructuring; Microsoft holds 27% and technology access through 2032

    Microsoft and OpenAI have reached a new agreement that removes uncertainty and clears the way for OpenAI's conversion into a for-profit company. Microsoft receives a 27% stake in OpenAI worth about $135 billion and retains access to its technology through 2032, including models that achieve AGI. The foundation that previously controlled OpenAI will hold equity worth about $130 billion. Microsoft is OpenAI's largest investor, having invested about $13.75 billion in total. Under the agreement, once OpenAI achieves AGI, as verified by an independent expert panel, Microsoft will no longer receive a share of OpenAI's revenue. OpenAI's new cloud-infrastructure business will also no longer be required to buy Microsoft's Azure cloud services first, though OpenAI has committed to purchasing an additional $250 billion of Azure services.