OrangeBot.AI Digest — 2025-09-01

67 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. Patrick Winston: How to Speak (2018) [video] (www.youtube.com)
  2. The future of 32-bit support in the kernel (lwn.net)
  3. Implementing a Foil Sticker Effect (www.4rknova.com)
  4. Adaptive LLM routing under budget constraints (arxiv.org)
  5. Ask HN: Who is hiring? (September 2025)
  6. Israel committing genocide in Gaza, scholars group says (www.aljazeera.com)
  7. The time picker on the iPhone's alarm app isn't circular, it's just a long list (old.reddit.com)
  8. Search engine referral report for 2025 Q2 (radar.cloudflare.com)
  9. Cloudflare Radar: AI Insights (radar.cloudflare.com)
  10. Bear is now source-available (herman.bearblog.dev)
  11. Google AI Overview made up an elaborate story about me (bsky.app)
  12. Making Minecraft Spherical (www.bowerbyte.com)
  13. CocoaPods trunk read-only plan (blog.cocoapods.org)
  14. A review of Nim 2: The good and bad with example code (miguel-martin.com)
  15. The Qweremin (www.linusakesson.net)

GitHub Trending (14)

  1. dockur / windows

    Windows inside a Docker container.

  2. JetBrains / koog

    Koog is the official Kotlin framework for building and running robust, scalable, production-ready AI agents across platforms: backend services, Android, iOS, the JVM, and even in-browser environments. Koog builds on JetBrains' experience with AI products and provides proven solutions to complex LLM and AI problems.

  3. juspay / hyperswitch

    An open source payments switch written in Rust to make payments fast, reliable and affordable

  4. QuentinFuxa / WhisperLiveKit

    Real-time & local speech-to-text, translation, and speaker diarization. With server & web UI.

  5. activepieces / activepieces

    AI agents, MCPs, and AI workflow automation, with 280+ MCP servers for AI agents.

  6. lllyasviel / Fooocus

    Focus on prompting and generating

  7. laramies / theHarvester

    E-mails, subdomains and names Harvester - OSINT

  8. humanlayer / humanlayer

    HumanLayer enables AI agents to communicate with humans in tool-based and async workflows. Guarantee human oversight of high-stakes function calls with approval workflows across Slack, email, and more. Bring your LLM and framework of choice and start giving your AI agents safe access to the world.

  9. resemble-ai / chatterbox

    SoTA open-source TTS

  10. OpenBMB / MiniCPM-V

    MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone

  11. ashishpatel26 / 500-AI-Agents-Projects

    The 500 AI Agents Projects is a curated collection of AI agent use cases across various industries. It showcases practical applications and provides links to open-source projects for implementation, illustrating how AI agents are transforming sectors such as healthcare, finance, education, retail, and more.

  12. zakirullin / cognitive-load

    🧠 Cognitive Load is what matters

  13. denizsafak / abogen

    Generate audiobooks from EPUBs, PDFs and text with synchronized captions.

  14. bevyengine / bevy

    A refreshingly simple data-driven game engine built in Rust

Product Hunt (8)

  1. xpander.ai

    Backend and Frontend for your AI Agents

  2. JoggAI AvatarX

    AI avatars that truly act like humans

  3. Dhisana AI

    Cursor for Sales Teams

  4. Genspark AI Designer

    Your AI employee that designs anything with one prompt

  5. GitHub Copilot for Raycast

    Delegate tasks and track progress from Raycast

  6. Lumi.new

    All in one platform to build your websites.

  7. AbleMouse

    DIY solution that helps even with complete paralysis

  8. fwt. axis

    6 habits that power balanced living

Hugging Face (15)

  1. R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

    Multimodal Large Language Models (MLLMs) equipped with step-by-step thinking capabilities have demonstrated remarkable performance on complex reasoning problems. However, this thinking process is redundant for simple problems solvable without complex reasoning. To address this inefficiency, we propose R-4B, an auto-thinking MLLM, which can adaptively decide when to think based on problem complexity. The central idea of R-4B is to empower the model with both thinking and non-thinking capabilities using bi-mode annealing, and apply Bi-mode Policy Optimization (BPO) to improve the model's accuracy in determining whether to activate the thinking process. Specifically, we first train the model on a carefully curated dataset spanning various topics, which contains samples from both thinking and non-thinking modes. Then it undergoes a second phase of training under an improved GRPO framework, where the policy model is forced to generate responses from both modes for each input query. Experimental results show that R-4B achieves state-of-the-art performance across 25 challenging benchmarks. It outperforms Qwen2.5-VL-7B in most tasks and achieves performance comparable to larger models such as Kimi-VL-A3B-Thinking-2506 (16B) on reasoning-intensive benchmarks with lower computational cost.

  2. EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

    The human ability to seamlessly perform multimodal reasoning and physical interaction in the open world is a core goal for general-purpose embodied intelligent systems. Recent vision-language-action (VLA) models, which are co-trained on large-scale robot and visual-text data, have demonstrated notable progress in general robot control. However, they still fail to achieve human-level flexibility in interleaved reasoning and interaction. In this work, we introduce EO-Robotics, which consists of the EO-1 model and the EO-Data1.5M dataset. EO-1 is a unified embodied foundation model that achieves superior performance in multimodal embodied reasoning and robot control through interleaved vision-text-action pre-training. The development of EO-1 is based on two key pillars: (i) a unified architecture that processes multimodal inputs indiscriminately (image, text, video, and action), and (ii) a massive, high-quality multimodal embodied reasoning dataset, EO-Data1.5M, which contains over 1.5 million samples with emphasis on interleaved vision-text-action comprehension. EO-1 is trained through synergies between auto-regressive decoding and flow matching denoising on EO-Data1.5M, enabling seamless robot action generation and multimodal embodied reasoning. Extensive experiments demonstrate the effectiveness of interleaved vision-text-action learning for open-world understanding and generalization, validated through a variety of long-horizon, dexterous manipulation tasks across multiple embodiments. This paper details the architecture of EO-1, the data construction strategy of EO-Data1.5M, and the training methodology, offering valuable insights for developing advanced embodied foundation models.

  3. A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

    The increasing adoption of large language models (LLMs) in software engineering necessitates rigorous security evaluation of their generated code. However, existing benchmarks are inadequate, as they focus on isolated code snippets, employ unstable evaluation methods that lack reproducibility, and fail to connect the quality of input context with the security of the output. To address these gaps, we introduce A.S.E (AI Code Generation Security Evaluation), a benchmark for repository-level secure code generation. A.S.E constructs tasks from real-world repositories with documented CVEs, preserving full repository context like build systems and cross-file dependencies. Its reproducible, containerized evaluation framework uses expert-defined rules to provide stable, auditable assessments of security, build quality, and generation stability. Our evaluation of leading LLMs on A.S.E reveals three key findings: (1) Claude-3.7-Sonnet achieves the best overall performance. (2) The security gap between proprietary and open-source models is narrow; Qwen3-235B-A22B-Instruct attains the top security score. (3) Concise, "fast-thinking" decoding strategies consistently outperform complex, "slow-thinking" reasoning for security patching.

  4. Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation

    Scaling laws have validated the success and promise of large-data-trained models in creative generation across text, image, and video domains. However, this paradigm faces data scarcity in the 3D domain, as there is far less of it available on the internet compared to the aforementioned modalities. Fortunately, there exist adequate videos that inherently contain commonsense priors, offering an alternative supervisory signal to mitigate the generalization bottleneck caused by limited native 3D data. On the one hand, videos capturing multiple views of an object or scene provide a spatial consistency prior for 3D generation. On the other hand, the rich semantic information contained within the videos enables the generated content to be more faithful to the text prompts and semantically plausible. This paper explores how to apply the video modality in 3D asset generation, spanning datasets to models. We introduce Droplet3D-4M, the first large-scale video dataset with multi-view level annotations, and train Droplet3D, a generative model supporting both image and dense text input. Extensive experiments validate the effectiveness of our approach, demonstrating its ability to produce spatially consistent and semantically plausible content. Moreover, in contrast to the prevailing 3D solutions, our approach exhibits the potential for extension to scene-level applications. This indicates that the commonsense priors from the videos significantly facilitate 3D creation. We have open-sourced all resources including the dataset, code, technical framework, and model weights: https://dropletx.github.io/.

  5. A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

    Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research, yet their progress is shaped by the complex nature of scientific data. This survey presents a comprehensive, data-centric synthesis that reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a unified taxonomy of scientific data and a hierarchical model of scientific knowledge, emphasizing the multimodal, cross-scale, and domain-specific challenges that differentiate scientific corpora from general natural language processing datasets. We systematically review recent Sci-LLMs, from general-purpose foundations to specialized models across diverse scientific disciplines, alongside an extensive analysis of over 270 pre-/post-training datasets, showing why Sci-LLMs pose distinct demands -- heterogeneous, multi-scale, uncertainty-laden corpora that require representations preserving domain invariance and enabling cross-modal reasoning. On evaluation, we examine over 190 benchmark datasets and trace a shift from static exams toward process- and discovery-oriented assessments with advanced evaluation protocols. These data-centric analyses highlight persistent issues in scientific data development and discuss emerging solutions involving semi-automated annotation pipelines and expert validation. Finally, we outline a paradigm shift toward closed-loop systems where autonomous agents based on Sci-LLMs actively experiment, validate, and contribute to a living, evolving knowledge base. Collectively, this work provides a roadmap for building trustworthy, continually evolving artificial intelligence (AI) systems that function as a true partner in accelerating scientific discovery.

  6. TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis

    Audio-driven talking head synthesis has achieved remarkable photorealism, yet state-of-the-art (SOTA) models exhibit a critical failure: they lack generalization to the full spectrum of human diversity in ethnicity, language, and age groups. We argue that this generalization gap is a direct symptom of limitations in existing training data, which lack the necessary scale, quality, and diversity. To address this challenge, we introduce TalkVid, a new large-scale, high-quality, and diverse dataset containing 1244 hours of video from 7729 unique speakers. TalkVid is curated through a principled, multi-stage automated pipeline that rigorously filters for motion stability, aesthetic quality, and facial detail, and is validated against human judgments to ensure its reliability. Furthermore, we construct and release TalkVid-Bench, a stratified evaluation set of 500 clips meticulously balanced across key demographic and linguistic axes. Our experiments demonstrate that a model trained on TalkVid outperforms counterparts trained on previous datasets, exhibiting superior cross-dataset generalization. Crucially, our analysis on TalkVid-Bench reveals performance disparities across subgroups that are obscured by traditional aggregate metrics, underscoring its necessity for future research. Code and data can be found at https://github.com/FreedomIntelligence/TalkVid

  7. Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

    Large language models (LLMs) excel at complex reasoning tasks such as mathematics and coding, yet they frequently struggle with simple interactive tasks that young children perform effortlessly. This discrepancy highlights a critical gap between declarative knowledge (knowing about something) and procedural knowledge (knowing how to do something). Although traditional reinforcement learning (RL) agents can acquire procedural knowledge through environmental interaction, they often operate as black boxes and require substantial training data. In contrast, LLMs possess extensive world knowledge and reasoning capabilities, but are unable to effectively convert this static knowledge into dynamic decision-making in interactive settings. To address this challenge, we propose Think in Games (TiG), a novel framework that empowers LLMs to develop procedural understanding through direct interaction with game environments, while retaining their inherent reasoning and explanatory abilities. Specifically, TiG reformulates RL-based decision-making as a language modeling task: LLMs generate language-guided policies, which are refined iteratively through online reinforcement learning based on environmental feedback. Our experimental results show that TiG successfully bridges the gap between declarative and procedural knowledge, achieving competitive performance with dramatically lower data and computational demands compared to conventional RL methods. Moreover, TiG provides step-by-step natural language explanations for its decisions, greatly improving transparency and interpretability in complex interactive tasks.

  8. TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training

    The data mixture used in the pre-training of a language model is a cornerstone of its final performance. However, a static mixing strategy is suboptimal, as the model's learning preferences for various data domains shift dynamically throughout training. Crucially, observing these evolving preferences in a computationally efficient manner remains a significant challenge. To address this, we propose TiKMiX, a method that dynamically adjusts the data mixture according to the model's evolving preferences. TiKMiX introduces Group Influence, an efficient metric for evaluating the impact of data domains on the model. This metric enables the formulation of the data mixing problem as a search for an optimal, influence-maximizing distribution. We solve this via two approaches: TiKMiX-D for direct optimization, and TiKMiX-M, which uses a regression model to predict a superior mixture. We trained models with different numbers of parameters, on up to 1 trillion tokens. TiKMiX-D exceeds the performance of state-of-the-art methods like REGMIX while using just 20% of the computational resources. TiKMiX-M leads to an average performance gain of 2% across 9 downstream benchmarks. Our experiments reveal that a model's data preferences evolve with training progress and scale, and we demonstrate that dynamically adjusting the data mixture based on Group Influence, a direct measure of these preferences, significantly improves performance by mitigating the underdigestion of data seen with static ratios.

  9. UItron: Foundational GUI Agent with Advanced Perception and Planning

    GUI agents aim to enable automated operation of Mobile/PC devices, an important task toward achieving artificial general intelligence. The rapid advancement of VLMs accelerates the development of GUI agents, owing to their powerful capabilities in visual understanding and task planning. However, building a GUI agent remains a challenging task due to the scarcity of operation trajectories, the availability of interactive infrastructure, and the limitation of initial capabilities in foundation models. In this work, we introduce UItron, an open-source foundational model for automatic GUI agents, featuring advanced GUI perception, grounding, and planning capabilities. UItron highlights the necessity of systemic data engineering and interactive infrastructure as foundational components for advancing GUI agent development. It not only systematically studies a series of data engineering strategies to enhance training effects, but also establishes an interactive environment connecting both Mobile and PC devices. In training, UItron adopts supervised finetuning over perception and planning tasks in various GUI scenarios, and then develops a curriculum reinforcement learning framework to enable complex reasoning and exploration for online environments. As a result, UItron achieves superior performance in benchmarks of GUI perception, grounding, and planning. In particular, UItron highlights interaction proficiency with top-tier Chinese mobile apps, as we identified a general lack of Chinese capabilities even in state-of-the-art solutions. To this end, we manually collect over one million steps of operation trajectories across the top 100 most popular apps, and build the offline and online agent evaluation environments. Experimental results demonstrate that UItron achieves significant progress in Chinese app scenarios, propelling GUI agents one step closer to real-world application.

  10. Efficient Code Embeddings from Code Generation Models

    jina-code-embeddings is a novel code embedding model suite designed to retrieve code from natural language queries, perform technical question-answering, and identify semantically similar code snippets across programming languages. It makes innovative use of an autoregressive backbone pre-trained on both text and code, generating embeddings via last-token pooling. We outline the training recipe and demonstrate state-of-the-art performance despite the relatively small size of the models, validating this approach to code embedding model construction.
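
The last-token pooling mentioned above can be sketched as follows; the function name and array shapes here are illustrative, not the model's actual API:

```python
import numpy as np

def last_token_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Select the hidden state of the last non-padding token per sequence.

    hidden_states:  (batch, seq_len, dim) decoder outputs
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    # Index of the last real token in each sequence
    last_idx = attention_mask.sum(axis=1) - 1                         # (batch,)
    emb = hidden_states[np.arange(hidden_states.shape[0]), last_idx]  # (batch, dim)
    # L2-normalize so cosine similarity reduces to a dot product
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)
```

For a causal (autoregressive) backbone, the final token is the only position that has attended to the whole input, which is why last-token pooling is a natural fit here.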

  11. AHELM: A Holistic Evaluation of Audio-Language Models

    Evaluations of audio-language models (ALMs) -- multimodal models that take interleaved audio and text as input and output text -- are hindered by the lack of standardized benchmarks; most benchmarks measure only one or two capabilities and omit evaluative aspects such as fairness or safety. Furthermore, comparison across models is difficult as separate evaluations test a limited number of models and use different prompting methods and inference parameters. To address these shortfalls, we introduce AHELM, a benchmark that aggregates various datasets -- including 2 new synthetic audio-text datasets called PARADE, which evaluates the ALMs on avoiding stereotypes, and CoRe-Bench, which measures reasoning over conversational audio through inferential multi-turn question answering -- to holistically measure the performance of ALMs across 10 aspects we have identified as important to the development and usage of ALMs: audio perception, knowledge, reasoning, emotion detection, bias, fairness, multilinguality, robustness, toxicity, and safety. We also standardize the prompts, inference parameters, and evaluation metrics to ensure equitable comparisons across models. We test 14 open-weight and closed-API ALMs from 3 developers and 3 additional simple baseline systems each consisting of an automatic speech recognizer and a language model. Our results show that while Gemini 2.5 Pro ranks top in 5 out of 10 aspects, it exhibits group unfairness (p=0.01) on ASR tasks whereas most of the other models do not. We also find that the baseline systems perform reasonably well on AHELM, with one ranking 5th overall despite having only speech-to-text capabilities. For transparency, all raw prompts, model generations, and outputs are available on our website at https://crfm.stanford.edu/helm/audio/v1.0.0. AHELM is intended to be a living benchmark and new datasets and models will be added over time.

  12. Morae: Proactively Pausing UI Agents for User Choices

    User interface (UI) agents promise to make inaccessible or complex UIs easier to access for blind and low-vision (BLV) users. However, current UI agents typically perform tasks end-to-end without involving users in critical choices or making them aware of important contextual information, thus reducing user agency. For example, in our field study, a BLV participant asked to buy the cheapest available sparkling water, and the agent automatically chose one from several equally priced options, without mentioning alternative products with different flavors or better ratings. To address this problem, we introduce Morae, a UI agent that automatically identifies decision points during task execution and pauses so that users can make choices. Morae uses large multimodal models to interpret user queries alongside UI code and screenshots, and prompt users for clarification when there is a choice to be made. In a study over real-world web tasks with BLV participants, Morae helped users complete more tasks and select options that better matched their preferences, as compared to baseline agents, including OpenAI Operator. More broadly, this work exemplifies a mixed-initiative approach in which users benefit from the automation of UI agents while being able to express their preferences.

  13. Model-Task Alignment Drives Distinct RL Outcomes

    Recent advances in applying reinforcement learning (RL) to large language models (LLMs) have led to substantial progress. In particular, a series of remarkable yet often counterintuitive phenomena have been reported in LLMs, exhibiting patterns not typically observed in traditional RL settings. For example, notable claims include that a single training example can match the performance achieved with an entire dataset, that the reward signal does not need to be very accurate, and that training solely with negative samples can match or even surpass sophisticated reward-based methods. However, the precise conditions under which these observations hold - and, critically, when they fail - remain unclear. In this work, we identify a key factor that differentiates RL observations: whether the pretrained model already exhibits strong Model-Task Alignment, as measured by pass@k accuracy on the evaluated task. Through a systematic and comprehensive examination of a series of counterintuitive claims, supported by rigorous experimental validation across different model architectures and task domains, our findings show that while standard RL training remains consistently robust across settings, many of these counterintuitive results arise only when the model and task already exhibit strong model-task alignment. In contrast, these techniques fail to drive substantial learning in more challenging regimes, where standard RL methods remain effective.
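
The pass@k accuracy used above to measure Model-Task Alignment is commonly estimated with the unbiased combinatorial formula from the code-generation literature: draw n samples, count c correct, and compute 1 - C(n-c, k)/C(n, k). A minimal sketch (the helper name is ours):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    without replacement from n generations is correct, given c of the n
    are correct."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so every size-k draw
        # must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n=4 generations of which c=2 are correct, pass@2 is 1 - C(2,2)/C(4,2) = 5/6.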

  14. Mimicking the Physicist's Eye:A VLM-centric Approach for Physics Formula Discovery

    Automated discovery of physical laws from observational data in the real world is a grand challenge in AI. Current methods, relying on symbolic regression or LLMs, are limited to uni-modal data and overlook the rich, visual phenomenological representations of motion that are indispensable to physicists. This "sensory deprivation" severely weakens their ability to interpret the inherent spatio-temporal patterns within dynamic phenomena. To address this gap, we propose VIPER-R1, a multimodal model that performs Visual Induction for Physics-based Equation Reasoning to discover fundamental symbolic formulas. It integrates visual perception, trajectory data, and symbolic reasoning to emulate the scientific discovery process. The model is trained via a curriculum of Motion Structure Induction (MSI), using supervised fine-tuning to interpret kinematic phase portraits and to construct hypotheses guided by a Causal Chain of Thought (C-CoT), followed by Reward-Guided Symbolic Calibration (RGSC) to refine the formula structure with reinforcement learning. During inference, the trained VIPER-R1 acts as an agent: it first posits a high-confidence symbolic ansatz, then proactively invokes an external symbolic regression tool to perform Symbolic Residual Realignment (SR^2). This final step, analogous to a physicist's perturbation analysis, reconciles the theoretical model with empirical data. To support this research, we introduce PhysSymbol, a new 5,000-instance multimodal corpus. Experiments show that VIPER-R1 consistently outperforms state-of-the-art VLM baselines in accuracy and interpretability, enabling more precise discovery of physical laws. Project page: https://jiaaqiliu.github.io/VIPER-R1/

  15. CLIPSym: Delving into Symmetry Detection with CLIP

    Symmetry is one of the most fundamental geometric cues in computer vision, and detecting it has been an ongoing challenge. With the recent advances in vision-language models, i.e., CLIP, we investigate whether a pre-trained CLIP model can aid symmetry detection by leveraging the additional symmetry cues found in the natural image descriptions. We propose CLIPSym, which leverages CLIP's image and language encoders and a rotation-equivariant decoder based on a hybrid of Transformer and G-Convolution to detect rotation and reflection symmetries. To fully utilize CLIP's language encoder, we have developed a novel prompting technique called Semantic-Aware Prompt Grouping (SAPG), which aggregates a diverse set of frequent object-based prompts to better integrate the semantic cues for symmetry detection. Empirically, we show that CLIPSym outperforms the current state-of-the-art on three standard symmetry detection datasets (DENDI, SDRW, and LDRS). Finally, we conduct detailed ablations verifying the benefits of CLIP's pre-training, the proposed equivariant decoder, and the SAPG technique. The code is available at https://github.com/timyoung2333/CLIPSym.

Solidot (15)

  1. Study: smelling a fragrance can increase brain gray matter

    According to a study published in Brain Research Bulletin, Japanese scientists report that smelling a fragrance over an extended period can increase brain gray matter. Researchers at Kyoto University and the University of Tsukuba had 28 women in an experimental group apply rose essential oil for a month, while 22 women in a control group applied tap water. MRI scans showed a slight increase in gray matter in the rose-oil group. More gray matter does not necessarily mean enhanced thinking ability, but the finding could be significant for neurodegenerative diseases such as dementia. The exact cause of the increase is unknown; the researchers speculate that the brain registers the rose scent as an unpleasant odor, so the posterior cingulate cortex, which regulates emotion, works harder and grows in volume. They hope the finding will help develop aromatherapies that promote mental health and brain plasticity.

  2. Japan's summer average temperature sets another record high

    The Japan Meteorological Agency announced on Monday that this summer's average temperature was 2.36°C above normal, the highest since record-keeping began in 1898. Japan has now had its hottest summer on record for three consecutive years, and the agency expects the heat to continue for another two weeks. Northern Japan was 3.4°C above normal, eastern Japan 2.3°C, and western Japan 1.7°C, all the highest since statistics began in 1946. Of the 153 weather stations nationwide, 132 recorded their highest summer average temperature (at 9 of them the record tied the previous baseline). A total of 9,385 extremely hot days were logged across AMeDAS stations this summer, the most since comparable statistics became available in 2010.

  3. New Zealanders search for a mate for a left-coiling snail

    New Zealanders are searching for a mate for a rare left-coiling snail, named Ned after Ned Flanders, the left-handed neighbor in The Simpsons. Snail shells normally coil to the right; a left-coiling shell occurs at odds of about 1 in 40,000. Left- and right-coiling snails cannot mate because their genitalia do not line up, so a left-coiling snail must pair with another left-coiling snail, and the chance of two meeting in the wild is extremely low. New Zealanders have therefore launched a nationwide campaign to find Ned a mate; the last successful attempt was in 2016.

  4. Adobe Reader's installer has ballooned in size over the past few years

    Adobe Reader, the once-ubiquitous, widely bundled PDF reader, has seen its installer balloon in size over the past few years. The reason, as with every tech company, is the rush to build red-hot AI into the product; whether users need it is another matter. The Adobe Reader 25.1 installer is close to 700 MB, up from 460 MB for last year's v24.2 and under 100 MB for v15.17 in 2016. By comparison, the rival PDF reader SumatraPDF has stayed under 10 MB.

  5. Python documentary released

    "Python: The Documentary | An origin story", produced by CultRepo, went live on YouTube last week and has been viewed more than 180,000 times. Python began as a "hobby" side project of Dutch programmer Guido van Rossum; its concise, readable design ultimately made it stand out among programming languages and become one of the most beloved, powering AI, data science, and software built by the tech giants. Figures appearing in the documentary include Guido van Rossum, Travis Oliphant, and Barry Warsaw. It tells the story of Python's rise, its community-driven evolution, the conflicts that nearly tore it apart, and the language's impact on the world.

  6. Japan's first-half births hit another record low

    Vital statistics for January-June released by Japan's Ministry of Health, Labour and Welfare show 339,280 births, down 3.1% year-on-year and the lowest since comparable data began in 1969. At this pace, the full-year figure is also likely to set a record low. Deaths in the first half totaled 836,818, up 3.1% year-on-year, for a natural population decline of 497,538. Japan has now seen natural population decline for 21 consecutive years, and every prefecture recorded a decline.

  7. After trying it, Brian Kernighan thinks Rust won't replace C any time soon

    Brian Kernighan, 83, still teaches computer science at Princeton University. He worked on the development of Unix and co-authored The C Programming Language with Dennis Ritchie. After a recent talk at the InfoAge Science and History Museum in New Jersey, an audience member asked whether Rust would replace C. Kernighan said he has written only one Rust program and so doesn't know the language well, but the experience left a very poor impression: he couldn't understand the mechanisms required for memory safety, or the machinery supporting them. A program that would take five minutes in another language took him several days in Rust. His conclusion: Rust will not replace C any time soon.

  8. African cities built on sandy soil are cracking open

    According to a study published in Nature, African cities built on sandy soil and lacking drainage systems are cracking open, forming huge gullies that swallow homes and shops. Using satellite imagery taken in 2021-2023, researchers identified 2,922 urban gullies totaling nearly 740 km across 26 of 47 African cities surveyed. Cross-checking against historical aerial photos from Belgium's Royal Museum for Central Africa, they found there were only 46 urban gullies in the 1950s. In the Democratic Republic of the Congo alone, an average of about 118,600 people were displaced between 2004 and 2023. The researchers estimate that without urgent action, hundreds of thousands more people across Africa could be displaced in the next decade. Kinshasa, the DRC's capital, is among the worst-affected cities, with 868 urban gullies totaling 221 km.

  9. Study finds an average lifespan of 100 has become unlikely

    According to a study published in PNAS, generations born after 1939 are unlikely to reach an average lifespan of 100. From 1900 to 1938, each successive cohort gained about five and a half months of life expectancy: residents of high-income countries born in 1900 could expect to live to 62 on average, while those born 38 years later under similar conditions could expect 80. For those born between 1939 and 2000, the gains slowed to roughly two and a half to three and a half months per cohort, and people born in 1980 will not reach an average lifespan of 100. Unless a major breakthrough significantly extends human lifespans, growth in life expectancy will continue to slow.

  10. Mastodon says it cannot comply with age-verification laws

    Mastodon, the fediverse microblogging platform, says its software does not support age verification and that the nonprofit running it lacks the resources to comply with Mississippi's age-verification law. Mastodon also has no intention of using IP-based blocking, which it considers unfair to traveling users. In response to regulation, Mastodon v4.4, released in July 2025, lets administrators set a minimum registration age, but age-check data is not stored. Given Mastodon's decentralized architecture, age checks are up to the administrators of individual servers; Mastodon itself does not track policy enforcement or operations across servers.

  11. Vivaldi reiterates it will not integrate generative AI

    Vivaldi CEO Jon von Tetzchner has reiterated his stance against integrating generative AI into the browser. His reasoning: compared with generative AI, the human Web is more diverse. Embedding generative AI in a browser, he says, dehumanizes the Web, reduces traffic to content publishers, and mainly serves to collect user data. Every startup is doing AI, and every company is trying to bolt AI onto its products and services, but none of them focus on what users actually need. Vivaldi, he says, chooses to side with humans rather than with AI hype, and will not turn the joy of exploration into passive spectating; without exploration, the Web becomes dull, human curiosity loses its drive, and the Web's diversity dies with it. Generative AI, he argues, suffers the same problem as social-media recommendation algorithms: both decide what users see based on collected data. Vivaldi wants users to control their own data, decide for themselves what they see, and be in control of everything. Users who want generative AI can reach it easily; the browser doesn't need to integrate it.

  12. The Pentagon halts a project involving Microsoft's China-based engineers

    Last month it emerged that Microsoft engineers in China were providing remote technical support for the Pentagon's cloud computing systems. The Chinese engineers were supervised by U.S. citizens holding security clearances, but an investigation found that those supervisors lacked the expertise to understand the foreign engineers' work. Defense Secretary Pete Hegseth, who had earlier announced an investigation, said this week that the Department of Defense has halted the project. Hegseth added that the department has sent Microsoft a formal letter of concern documenting its breach of trust, and has demanded a third-party audit of Microsoft's "digital escorts" program along with careful review of code submitted by Chinese nationals.

  13. FTC accuses Gmail of threatening American freedom by filtering Republican fundraising emails

    U.S. Republicans have revived their accusations of recent years that Gmail filters Republican fundraising emails. Previous complaints were dismissed by federal judges and the Federal Election Commission. Republicans had complained that Gmail marks their campaign fundraising emails as spam far more often than Democrats', thereby helping Democratic candidates; the FEC said any skew produced by Gmail's spam-filtering algorithms was unintentional. On the latest accusation, Federal Trade Commission chairman Andrew N. Ferguson, a Republican, wrote to Alphabet CEO Sundar Pichai saying Gmail's practices may violate the FTC Act's prohibition on unfair or deceptive trade practices, and threatened that the FTC could open an investigation or enforcement action against Google.

  14. Linus Torvalds marks Bcachefs as externally maintained

    Linus Torvalds has updated the kernel's MAINTAINERS file to mark the Bcachefs filesystem as externally maintained, signaling that he will not accept new Bcachefs pull requests. The Linux creator had previously clashed with Bcachefs maintainer Kent Overstreet and said at the time that he was considering removing the filesystem. Linux 6.17-rc1, released earlier this month, merged no pull requests from Overstreet. The Bcachefs code remains in the mainline Linux kernel for now, presumably to keep existing Bcachefs users from running into problems.

  15. Evolution may have capped brain size to balance energy use and survival

    Modern human brain volume went through a period of growth that began to slow about 300,000 years ago and stalled about 100,000 years ago; brains may even have begun to shrink. According to a study published in Brain & Cognition, the researchers argue that this plateau balances energy demands against survival. Compared with other Homo species, modern humans' larger brains conferred an evolutionary advantage, key to using fire, making tools, and communicating symbolically. But a larger brain comes at a cost: it consumes a fifth of the body's resting energy and generates substantial heat, which can become a burden as the climate gradually warms. Analyzing brain-volume measurements from 800 Homo crania, including 690 modern humans and 99 from other Homo species such as Neanderthals, the researchers found significant size differences between glacial and interglacial periods: brains were larger during glacials and smaller during interglacials, suggesting that warming climates raised the metabolic and thermoregulatory costs of maintaining a large brain.