OrangeBot.AI Digest — 2025-08-21

72 headlines across 5 sources, aggregated for the day.

Hacker News (15)

  1. DeepSeek-v3.1 Release (api-docs.deepseek.com)
  2. AI tooling must be disclosed for contributions (github.com)
  3. Bank forced to rehire workers after lying about chatbot productivity, union says (arstechnica.com)
  4. 95% of Companies See 'Zero Return' on $30B Generative AI Spend (thedailyadda.com)
  5. I forced every engineer to take sales calls and they rewrote our platform (old.reddit.com)
  6. Unity reintroduces the Runtime Fee through its Industry license (unity.com)
  7. Beyond sensor data: Foundation models of behavioral data from wearables (arxiv.org)
  8. AWS CEO says using AI to replace junior staff is 'Dumbest thing I've ever heard' (www.theregister.com)
  9. Weaponizing image scaling against production AI systems (blog.trailofbits.com)
  10. AI crawlers, fetchers are blowing up websites; Meta, OpenAI are worst offenders (www.theregister.com)
  11. Margin debt surges to record high (www.advisorperspectives.com)
  12. Using Podman, Compose and BuildKit (emersion.fr)
  13. Mark Zuckerberg freezes AI hiring amid bubble fears (www.telegraph.co.uk)
  14. Show HN: OS X Mavericks Forever (mavericksforever.com)
  15. D4D4 (www.nmichaels.org)

GitHub Trending (12)

  1. moeru-ai / airi

    💖🧸 Self-hosted, user-owned Grok Companion: a container of souls for waifu and cyber beings, meant to bring them into our world, aspiring to reach Neuro-sama's level. Capable of real-time voice chat and of playing Minecraft and Factorio. Web / macOS / Windows supported.

  2. simstudioai / sim

    Sim is an open-source AI agent workflow builder. Sim Studio's interface is a lightweight, intuitive way to quickly build and deploy LLMs that connect with your favorite tools.

  3. google / googletest

    GoogleTest - Google Testing and Mocking Framework

  4. bitwarden / clients

    Bitwarden client apps (web, browser extension, desktop, and cli).

  5. Budibase / budibase

    Create business apps and automate workflows in minutes. Supports PostgreSQL, MySQL, MariaDB, MSSQL, MongoDB, REST API, Docker, K8s, and more 🚀 No-code / low-code platform.

  6. firecrawl / firecrawl

    The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data 🔥

  7. HunxByts / GhostTrack

    Useful tool to track location or mobile number

  8. nextjs / saas-starter

    Get started quickly with Next.js, Postgres, Stripe, and shadcn/ui.

  9. plait-board / drawnix

    All-in-one open-source whiteboard tool (SaaS) with mind maps, flowcharts, freehand drawing, and more.

  10. HeyPuter / puter

    🌐 The Internet OS! Free, Open-Source, and Self-Hostable.

  11. puppeteer / puppeteer

    JavaScript API for Chrome and Firefox

  12. skills / introduction-to-github

    Get started using GitHub in less than an hour.

Product Hunt (15)

  1. Mocke

    Mock email campaigns: know your reply rate without launching

  2. ReadyBase

    Prompt to PDF in seconds

  3. Daymi

    Your AI clone for iMessage conversations that feel real

  4. Macaly 2.0

    AI website builder with built-in database, hosting & more

  5. Disco.dev

    Plug-and-play open-source MCP servers

  6. Syncly Social

    AI social listening for TikTok videos

  7. Broxi AI

    No-code AI agent builder. From text to AI agents in minutes

  8. AGENTS.md

    A README, but for your AI coding agent

  9. Ponder

    AI-Powered Journal for Self-Reflection

  10. Pinery

    Markdown Editor for Books

  11. Puck

    Open-source visual editor for React

  12. Basedash Self-Hosted

    AI-native Business Intelligence on your own infrastructure

  13. Dash

    Private, encrypted OSS notes app

  14. NoDocs

    No-Code Product Documentation Builder

  15. AgentCraft. Like Cursor, but for n8n

    Get 10x speed for building automations. AI copilot is here

Hugging Face (15)

  1. From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models

    Large Language Models (LLMs) have shown promise for financial applications, yet their suitability for this high-stakes domain remains largely unproven due to inadequacies in existing benchmarks. Existing benchmarks solely rely on score-level evaluation, summarizing performance with a single score that obscures the nuanced understanding of what models truly know and their precise limitations. They also rely on datasets that cover only a narrow subset of financial concepts, while overlooking other essentials for real-world applications. To address these gaps, we introduce FinCDM, the first cognitive diagnosis evaluation framework tailored for financial LLMs, enabling the evaluation of LLMs at the knowledge-skill level, identifying what financial skills and knowledge they have or lack based on their response patterns across skill-tagged tasks, rather than a single aggregated number. We construct CPA-QKA, the first cognitively informed financial evaluation dataset derived from the Certified Public Accountant (CPA) examination, with comprehensive coverage of real-world accounting and financial skills. It is rigorously annotated by domain experts, who author, validate, and annotate questions with high inter-annotator agreement and fine-grained knowledge labels. Our extensive experiments on 30 proprietary, open-source, and domain-specific LLMs show that FinCDM reveals hidden knowledge gaps, identifies under-tested areas such as tax and regulatory reasoning overlooked by traditional benchmarks, and uncovers behavioral clusters among models. FinCDM introduces a new paradigm for financial LLM evaluation by enabling interpretable, skill-aware diagnosis that supports more trustworthy and targeted model development, and all datasets and evaluation scripts will be publicly released to support further research.
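
The knowledge-skill idea above can be sketched in a few lines: instead of one aggregate score, correctness is bucketed by skill tag. This is only an illustrative sketch with invented names (`skill_profile`, the example tags), not FinCDM's actual pipeline.

```python
from collections import defaultdict

def skill_profile(responses):
    """Per-skill accuracy from skill-tagged responses.

    `responses`: (skill, is_correct) pairs, one per answered question.
    Returns a diagnosis dict instead of a single aggregate score.
    """
    totals, correct = defaultdict(int), defaultdict(int)
    for skill, ok in responses:
        totals[skill] += 1
        correct[skill] += int(ok)
    return {s: correct[s] / totals[s] for s in totals}

profile = skill_profile([
    ("tax", False), ("tax", False), ("tax", True),
    ("auditing", True), ("auditing", True),
])
# Aggregate accuracy is 60%, which hides that "tax" sits at 33%.
```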

  2. DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization

    We present DuPO, a dual learning-based preference optimization framework that generates annotation-free feedback via a generalized duality. DuPO addresses two key limitations: Reinforcement Learning with Verifiable Rewards (RLVR)'s reliance on costly labels and applicability restricted to verifiable tasks, and traditional dual learning's restriction to strictly dual task pairs (e.g., translation and back-translation). Specifically, DuPO decomposes a primal task's input into known and unknown components, then constructs its dual task to reconstruct the unknown part using the primal output and known information (e.g., reversing math solutions to recover hidden variables), broadening applicability to non-invertible tasks. The quality of this reconstruction serves as a self-supervised reward to optimize the primal task, synergizing with LLMs' ability to instantiate both tasks via a single model. Empirically, DuPO achieves substantial gains across diverse tasks: it enhances the average translation quality by 2.13 COMET over 756 directions, boosts the mathematical reasoning accuracy by an average of 6.4 points on three challenge benchmarks, and enhances performance by 9.3 points as an inference-time reranker (trading computation for accuracy). These results position DuPO as a scalable, general, and annotation-free paradigm for LLM optimization.
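
The known/unknown decomposition can be illustrated with a toy primal/dual pair: the dual task reconstructs the hidden input from the primal output, and reconstruction quality serves as an annotation-free reward. All names here are mine; in DuPO both tasks are instantiated by the same LLM rather than by invertible arithmetic.

```python
def primal(known, hidden):
    """Toy primal task: combine a known input with a hidden one."""
    return known + hidden

def dual(output, known):
    """Dual task: reconstruct the hidden part from output + known part."""
    return output - known

def self_reward(known, hidden):
    """Reconstruction quality as an annotation-free reward signal."""
    y = primal(known, hidden)
    return 1.0 if dual(y, known) == hidden else 0.0

self_reward(3, 4)  # exact reconstruction earns full reward: 1.0
```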

  3. FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction

    Future prediction is a complex task for LLM agents, requiring a high level of analytical thinking, information gathering, contextual understanding, and decision-making under uncertainty. Agents must not only gather and interpret vast amounts of dynamic information but also integrate diverse data sources, weigh uncertainties, and adapt predictions based on emerging trends, just as human experts do in fields like politics, economics, and finance. Despite its importance, no large-scale benchmark exists for evaluating agents on future prediction, largely due to challenges in handling real-time updates and retrieving timely, accurate answers. To address this, we introduce FutureX, a dynamic and live evaluation benchmark specifically designed for LLM agents performing future prediction tasks. FutureX is the largest and most diverse live benchmark for future prediction, supporting real-time daily updates and eliminating data contamination through an automated pipeline for question gathering and answer collection. We evaluate 25 LLM/agent models, including those with reasoning, search capabilities, and integration of external tools such as the open-source Deep Research Agent and closed-source Deep Research models. This comprehensive evaluation assesses agents' adaptive reasoning and performance in dynamic environments. Additionally, we provide in-depth analyses of agents' failure modes and performance pitfalls in future-oriented tasks, including their vulnerability to fake web pages and to issues of temporal validity. Our goal is to establish a dynamic, contamination-free evaluation standard that drives the development of LLM agents capable of performing at the level of professional human analysts in complex reasoning and predictive thinking.

  4. MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

    Reconstructing 3D objects into editable programs is pivotal for applications like reverse engineering and shape editing. However, existing methods often rely on limited domain-specific languages (DSLs) and small-scale datasets, restricting their ability to model complex geometries and structures. To address these challenges, we introduce MeshCoder, a novel framework that reconstructs complex 3D objects from point clouds into editable Blender Python scripts. We develop a comprehensive set of expressive Blender Python APIs capable of synthesizing intricate geometries. Leveraging these APIs, we construct a large-scale paired object-code dataset, where the code for each object is decomposed into distinct semantic parts. Subsequently, we train a multimodal large language model (LLM) that translates 3D point cloud into executable Blender Python scripts. Our approach not only achieves superior performance in shape-to-code reconstruction tasks but also facilitates intuitive geometric and topological editing through convenient code modifications. Furthermore, our code-based representation enhances the reasoning capabilities of LLMs in 3D shape understanding tasks. Together, these contributions establish MeshCoder as a powerful and flexible solution for programmatic 3D shape reconstruction and understanding.

  5. Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization

    We introduce Tinker, a versatile framework for high-fidelity 3D editing that operates in both one-shot and few-shot regimes without any per-scene finetuning. Unlike prior techniques that demand extensive per-scene optimization to ensure multi-view consistency or to produce dozens of consistent edited input views, Tinker delivers robust, multi-view consistent edits from as few as one or two images. This capability stems from repurposing pretrained diffusion models, which unlocks their latent 3D awareness. To drive research in this space, we curate the first large-scale multi-view editing dataset and data pipeline, spanning diverse scenes and styles. Building on this dataset, we develop our framework capable of generating multi-view consistent edited views without per-scene training, which consists of two novel components: (1) Referring multi-view editor: Enables precise, reference-driven edits that remain coherent across all viewpoints. (2) Any-view-to-video synthesizer: Leverages spatial-temporal priors from video diffusion to perform high-quality scene completion and novel-view generation even from sparse inputs. Through extensive experiments, Tinker significantly reduces the barrier to generalizable 3D content creation, achieving state-of-the-art performance on editing, novel-view synthesis, and rendering enhancement tasks. We believe that Tinker represents a key step towards truly scalable, zero-shot 3D editing. Project webpage: https://aim-uofa.github.io/Tinker

  6. From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery

    Artificial intelligence (AI) is reshaping scientific discovery, evolving from specialized computational tools into autonomous research partners. We position Agentic Science as a pivotal stage within the broader AI for Science paradigm, where AI systems progress from partial assistance to full scientific agency. Enabled by large language models (LLMs), multimodal systems, and integrated research platforms, agentic AI shows capabilities in hypothesis generation, experimental design, execution, analysis, and iterative refinement -- behaviors once regarded as uniquely human. This survey provides a domain-oriented review of autonomous scientific discovery across life sciences, chemistry, materials science, and physics. We unify three previously fragmented perspectives -- process-oriented, autonomy-oriented, and mechanism-oriented -- through a comprehensive framework that connects foundational capabilities, core processes, and domain-specific realizations. Building on this framework, we (i) trace the evolution of AI for Science, (ii) identify five core capabilities underpinning scientific agency, (iii) model discovery as a dynamic four-stage workflow, (iv) review applications across the above domains, and (v) synthesize key challenges and future opportunities. This work establishes a domain-oriented synthesis of autonomous scientific discovery and positions Agentic Science as a structured paradigm for advancing AI-driven research.

  7. MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

    The Model Context Protocol has emerged as a transformative standard for connecting large language models to external data sources and tools, rapidly gaining adoption across major AI providers and development platforms. However, existing benchmarks are overly simplistic and fail to capture real application challenges such as long-horizon reasoning and large, unfamiliar tool spaces. To address this critical gap, we introduce MCP-Universe, the first comprehensive benchmark specifically designed to evaluate LLMs in realistic and hard tasks through interaction with real-world MCP servers. Our benchmark encompasses 6 core domains spanning 11 different MCP servers: Location Navigation, Repository Management, Financial Analysis, 3D Design, Browser Automation, and Web Searching. To ensure rigorous evaluation, we implement execution-based evaluators, including format evaluators for agent format compliance, static evaluators for time-invariant content matching, and dynamic evaluators that automatically retrieve real-time ground truth for temporally sensitive tasks. Through extensive evaluation of leading LLMs, we find that even SOTA models such as GPT-5 (43.72%), Grok-4 (33.33%) and Claude-4.0-Sonnet (29.44%) exhibit significant performance limitations. In addition, our benchmark poses a significant long-context challenge for LLM agents, as the number of input tokens increases rapidly with the number of interaction steps. Moreover, it introduces an unknown-tools challenge, as LLM agents often lack familiarity with the precise usage of the MCP servers. Notably, enterprise-level agents like Cursor cannot achieve better performance than standard ReAct frameworks. Beyond evaluation, we open-source our extensible evaluation framework with UI support, enabling researchers and practitioners to seamlessly integrate new agents and MCP servers while fostering innovation in the rapidly evolving MCP ecosystem.
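
The three evaluator types can be sketched as plain predicates. This is a hedged illustration of the idea only; the function names and checks are my own, not the benchmark's API.

```python
import json

def format_evaluator(answer):
    """Format compliance: here, the task expects a JSON object."""
    try:
        json.loads(answer)
        return True
    except ValueError:
        return False

def static_evaluator(answer, expected):
    """Time-invariant content match against a fixed ground truth."""
    return expected in answer

def dynamic_evaluator(answer, fetch_ground_truth):
    """Ground truth is retrieved at evaluation time for temporal tasks."""
    return fetch_ground_truth() in answer
```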

  8. Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

    Recent advances in diffusion large language models (dLLMs) have introduced a promising alternative to autoregressive (AR) LLMs for natural language generation tasks, leveraging full attention and denoising-based decoding strategies. However, the deployment of these models on edge devices remains challenging due to their massive parameter scale and high resource demands. While post-training quantization (PTQ) has emerged as a widely adopted technique for compressing AR LLMs, its applicability to dLLMs remains largely unexplored. In this work, we present the first systematic study on quantizing diffusion-based language models. We begin by identifying the presence of activation outliers, characterized by abnormally large activation values that dominate the dynamic range. These outliers pose a key challenge to low-bit quantization, as they make it difficult to preserve precision for the majority of values. More importantly, we implement state-of-the-art PTQ methods and conduct a comprehensive evaluation across multiple task types and model variants. Our analysis is structured along four key dimensions: bit-width, quantization method, task category, and model type. Through this multi-perspective evaluation, we offer practical insights into the quantization behavior of dLLMs under different configurations. We hope our findings provide a foundation for future research in efficient dLLM deployment. All codes and experimental setups will be released to support the community.
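
The outlier problem described above is easy to demonstrate: under symmetric min-max quantization, one large activation inflates the step size and wipes out precision for the majority of small values. A minimal sketch, not the paper's method:

```python
def quantize(xs, bits=8):
    """Symmetric min-max quantization: snap values to integer levels."""
    scale = max(abs(x) for x in xs) / (2 ** (bits - 1) - 1)
    return [round(x / scale) * scale for x in xs]  # dequantized values

normal = [0.1, -0.2, 0.3, 0.05]
with_outlier = normal + [120.0]   # one outlier dominates the dynamic range

err_normal = max(abs(a - b) for a, b in zip(normal, quantize(normal)))
err_outlier = max(abs(a - b) for a, b in zip(normal, quantize(with_outlier)))
# With the outlier present, the step size balloons and every small
# activation collapses to zero.
```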

  9. NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

    We introduce Nemotron-Nano-9B-v2, a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly-sized models. Nemotron-Nano-9B-v2 builds on the Nemotron-H architecture, in which the majority of the self-attention layers in the common Transformer architecture are replaced with Mamba-2 layers, to achieve improved inference speed when generating the long thinking traces needed for reasoning. We create Nemotron-Nano-9B-v2 by first pre-training a 12-billion-parameter model (Nemotron-Nano-12B-v2-Base) on 20 trillion tokens using an FP8 training recipe. After aligning Nemotron-Nano-12B-v2-Base, we employ the Minitron strategy to compress and distill the model with the goal of enabling inference on up to 128k tokens on a single NVIDIA A10G GPU (22GiB of memory, bfloat16 precision). Compared to existing similarly-sized models (e.g., Qwen3-8B), we show that Nemotron-Nano-9B-v2 achieves on-par or better accuracy on reasoning benchmarks while achieving up to 6x higher inference throughput in reasoning settings like 8k input and 16k output tokens. We are releasing Nemotron-Nano-9B-v2, Nemotron-Nano-12B-v2-Base, and Nemotron-Nano-9B-v2-Base checkpoints along with the majority of our pre- and post-training datasets on Hugging Face.

  10. RynnEC: Bringing MLLMs into Embodied World

    We introduce RynnEC, a video multimodal large language model designed for embodied cognition. Built upon a general-purpose vision-language foundation model, RynnEC incorporates a region encoder and a mask decoder, enabling flexible region-level video interaction. Despite its compact architecture, RynnEC achieves state-of-the-art performance in object property understanding, object segmentation, and spatial reasoning. Conceptually, it offers a region-centric video paradigm for the brain of embodied agents, providing fine-grained perception of the physical world and enabling more precise interactions. To mitigate the scarcity of annotated 3D datasets, we propose an egocentric-video-based pipeline for generating embodied cognition data. Furthermore, we introduce RynnEC-Bench, a region-centered benchmark for evaluating embodied cognitive capabilities. We anticipate that RynnEC will advance the development of general-purpose cognitive cores for embodied agents and facilitate generalization across diverse embodied tasks. The code, model checkpoints, and benchmark are available at: https://github.com/alibaba-damo-academy/RynnEC

  11. On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

    Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) are two prominent post-training paradigms for refining the capabilities and aligning the behavior of Large Language Models (LLMs). Existing approaches that integrate SFT and RL often face the risk of disrupting established model patterns and inducing overfitting to expert data. To address this, we present a novel investigation into the unified view of SFT and RL through an off-policy versus on-policy lens. We propose CHORD, a framework for the Controllable Harmonization of On- and Off-Policy Reinforcement Learning via Dynamic Weighting, which reframes SFT not as a separate stage but as a dynamically weighted auxiliary objective within the on-policy RL process. Based on an analysis of off-policy expert data's influence at both holistic and granular levels, we incorporate a dual-control mechanism in CHORD. Specifically, the framework first employs a global coefficient to holistically guide the transition from off-policy imitation to on-policy exploration, and then applies a token-wise weighting function that enables granular learning from expert tokens, which preserves on-policy exploration and mitigates disruption from off-policy data. We conduct extensive experiments on widely used benchmarks, providing empirical evidence that CHORD achieves a stable and efficient learning process. By effectively harmonizing off-policy expert data with on-policy exploration, CHORD demonstrates significant improvements over baselines. We release the implementation at https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord to inspire further research.
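
The dual-control idea (a global coefficient plus a token-wise weighting on the SFT term) can be sketched as a single loss function. The linear schedule and all names below are my own simplifications, not the paper's exact formulation:

```python
def chord_style_loss(rl_loss, sft_token_losses, token_weights, step, total_steps):
    """Blend an on-policy RL loss with a dynamically weighted SFT term.

    `mu` is the global coefficient steering training from off-policy
    imitation (mu near 1) toward on-policy exploration (mu near 0);
    `token_weights` play the role of a token-wise weighting function
    that down-weights disruptive expert tokens.
    """
    mu = 1.0 - step / total_steps              # simple linear schedule
    sft = sum(w * l for w, l in zip(token_weights, sft_token_losses))
    sft /= max(sum(token_weights), 1e-8)       # weighted mean over tokens
    return (1.0 - mu) * rl_loss + mu * sft

# Early in training the SFT term dominates; late, the RL term does.
```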

  12. FLARE: Fast Low-rank Attention Routing Engine

    The quadratic complexity of self-attention limits its applicability and scalability on large unstructured meshes. We introduce Fast Low-rank Attention Routing Engine (FLARE), a linear complexity self-attention mechanism that routes attention through fixed-length latent sequences. Each attention head performs global communication among N tokens by projecting the input sequence onto a fixed-length latent sequence of M ≪ N tokens using learnable query tokens. By routing attention through a bottleneck sequence, FLARE learns a low-rank form of attention that can be applied at O(NM) cost. FLARE not only scales to unprecedented problem sizes, but also delivers superior accuracy compared to state-of-the-art neural PDE surrogates across diverse benchmarks. We also release a new additive manufacturing dataset to spur further research. Our code is available at https://github.com/vpuri3/FLARE.py.
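
The routing trick can be sketched directly: M learnable latent queries attend over the N input tokens, and the N outputs are read back from the M latent slots, so no N×N matrix is ever formed. A pure-Python illustration (reusing the transposed weights for the read-back step is my simplification; FLARE learns its own projections):

```python
import math

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def matmul(A, B):
    # naive (rows of A) x (columns of B)
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def flare_like_attention(X, Q_lat):
    """Route global attention through M latent slots, M << N.

    X: N x d token features; Q_lat: M x d learnable latent queries.
    Cost is O(N*M): encode N tokens into M latent summaries, then
    broadcast the summaries back to all N positions.
    """
    scores = matmul(Q_lat, transpose(X))           # M x N attention logits
    A = [softmax(row) for row in scores]           # latent slots attend over tokens
    latent = matmul(A, X)                          # M x d latent summaries
    back = [softmax(row) for row in transpose(A)]  # N x M read-back weights
    return matmul(back, latent)                    # N x d output
```

Because every output row is a convex combination of latent summaries, which are themselves convex combinations of the inputs, outputs stay within the inputs' value range.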

  13. ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions?

    Vision language models (VLMs) demonstrate remarkable capabilities on English multimodal tasks, but their performance on low-resource languages with genuinely multimodal educational content remains largely unexplored. In this work, we test how VLMs perform on Vietnamese educational assessments, investigating whether VLMs trained predominantly on English data can handle real-world cross-lingual multimodal reasoning. Our work presents the first comprehensive evaluation of VLM capabilities on multimodal Vietnamese exams through proposing ViExam, a benchmark containing 2,548 multimodal questions. We find that state-of-the-art VLMs achieve only 57.74% while open-source models achieve 27.70% mean accuracy across 7 academic domains, including Mathematics, Physics, Chemistry, Biology, Geography, Driving Test, and IQ Test. Most VLMs underperform average human test-takers (66.54%), with only the thinking VLM o3 (74.07%) exceeding human average performance, yet still falling substantially short of human best performance (99.60%). Cross-lingual prompting with English instructions while maintaining Vietnamese content fails to improve performance, decreasing accuracy by 1 percentage point for SOTA VLMs. Human-in-the-loop collaboration can partially improve VLM performance by 5 percentage points. Code and data are available at: https://vi-exam.github.io.

  14. Leuvenshtein: Efficient FHE-based Edit Distance Computation with Single Bootstrap per Cell

    This paper presents a novel approach to calculating the Levenshtein (edit) distance within the framework of Fully Homomorphic Encryption (FHE), specifically targeting third-generation schemes like TFHE. Edit distance computations are essential in applications across finance and genomics, such as DNA sequence alignment. We introduce an optimised algorithm that significantly reduces the cost of edit distance calculations called Leuvenshtein. This algorithm specifically reduces the number of programmable bootstraps (PBS) needed per cell of the calculation, lowering it from approximately 94 operations, as required by the conventional Wagner-Fischer algorithm, to just 1. Additionally, we propose an efficient method for performing equality checks on characters, reducing ASCII character comparisons to only 2 PBS operations. Finally, we explore the potential for further performance improvements by utilising preprocessing when one of the input strings is unencrypted. Our Leuvenshtein achieves up to 278× faster performance compared to the best available TFHE implementation and up to 39× faster than an optimised implementation of the Wagner-Fischer algorithm. Moreover, when offline preprocessing is possible due to the presence of one unencrypted input on the server side, an additional 3× speedup can be achieved.
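
For reference, the plaintext Wagner-Fischer recurrence that Leuvenshtein compresses to one bootstrap per cell looks like this; each cell needs a character-equality check and a three-way minimum:

```python
def levenshtein(a, b):
    """Classic Wagner-Fischer dynamic programme (plaintext, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

levenshtein("kitten", "sitting")  # the classic example: distance 3
```

Under FHE the same min/equality logic must run on encrypted characters, which is what makes the per-cell bootstrap count the dominant cost.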

  15. mSCoRe: a Multilingual and Scalable Benchmark for Skill-based Commonsense Reasoning

    Recent advancements in reasoning-reinforced Large Language Models (LLMs) have shown remarkable capabilities in complex reasoning tasks. However, the mechanism underlying their utilization of different human reasoning skills remains poorly investigated, especially for multilingual commonsense reasoning that involves everyday knowledge across different languages and cultures. To address this gap, we propose a Multilingual and Scalable Benchmark for Skill-based Commonsense Reasoning (mSCoRe). Our benchmark incorporates three key components that are designed to systematically evaluate LLMs' reasoning capabilities, including: (1) a novel taxonomy of reasoning skills that enables fine-grained analysis of models' reasoning processes, (2) a robust data synthesis pipeline tailored specifically for commonsense reasoning evaluation, and (3) a complexity scaling framework allowing task difficulty to scale dynamically alongside future improvements in LLM abilities. Extensive experiments on eight state-of-the-art LLMs of varying sizes and training approaches demonstrate that mSCoRe remains significantly challenging for current models, particularly at higher complexity levels. Our results reveal the limitations of such reasoning-reinforced models when confronted with nuanced multilingual general and cultural commonsense. We further provide detailed analysis on the models' reasoning processes, suggesting future directions for improving multilingual commonsense reasoning capabilities.

Solidot (15)

  1. AWS CEO: replacing junior staff with AI is a dumb idea

    Amazon AWS CEO Matt Garman said that replacing junior employees with AI tools is one of the dumbest ideas he has ever heard: companies need senior staff who know the business inside out, and senior staff grow, step by step, out of junior ones. If a company has none of them in ten years, he asked, how will it operate? He believes companies should keep hiring fresh graduates and teach them how to build software, break down problems, and adopt best practices. The most valuable skills of the AI era, he said, are not tied to a college degree; to keep their jobs, employees must keep learning and refreshing their skills.

  2. Firefox 142.0 released

    Mozilla has released Firefox 142.0. Highlights: article recommendations on the New Tab page for US users are now grouped by topic (sports, food, and so on), and users can follow topics they care about or remove ones they do not; a link-preview feature that, via the right-click menu, can also produce an AI-generated summary (the AI runs locally and does not leak user data), rolling out gradually and currently limited to en-US, en-CA, en-GB, and en-AU users with at least 3 GB of available memory; strict tracking protection has been relaxed for some sites that otherwise fail to render properly; and more.

  3. Apple to assemble more new iPhones in India

    Apple is assembling more of its new iPhone 17 handsets in India rather than China, and for the first time every new iPhone model will ship from Indian assembly. Apple is also developing a successor to the iPhone 16E, which is slated for Indian assembly. To reduce its reliance on Chinese manufacturing, Apple is shifting the bulk of iPhone assembly to India. Apple expects to pay USD 1.1 billion in tariffs this quarter, while iPhones exported from India to the US currently enjoy a tariff exemption. Analysts note that iPhone components are still mostly made in China and then shipped to India for final assembly.

  4. India has time to get rich before it gets old

    The world's most populous country still has enough time to get rich before it gets old. India will not cross the aging threshold of a median age of 41 until the late 2050s, a line China has already crossed. India would need to sustain 10.4% GDP growth for 35 years to become wealthy before aging; China would need 32% annual GDP growth. India's working-age share of the population will rise from 67.5% in 2021 to 69.2% in 2031, with a median age of 34.5 by 2036. China's predicament resembles that of the developed world: the share of Europeans aged 65 and over will have risen from 8% in 1950 to 30% in 2050, and raising the retirement age will meet resistance from elderly voters, who make up forty percent of Europe's electorate. The US will face its own aging problem in 2033.

  5. Ice melt on one Arctic archipelago is enough to raise sea level 0.16 mm

    A study published in PNAS shows that six consecutive weeks of record heat in the summer of 2024 drove record ice melt on the Arctic archipelago of Svalbard. By the end of that summer, 1% of the archipelago's land ice had vanished, enough to raise global mean sea level by 0.16 mm. More than half of Svalbard is ice-covered. Since 1991, average summer melt had been under 10 billion tonnes a year, yet four of the past five years have set new summer ice-loss records. The team estimates roughly 62 billion tonnes of ice were lost last summer, almost all of it from surface melt rather than glaciers calving into the sea. Climate models indicate such events will become more common as the planet keeps warming.

  6. Reshaping the cornea with electric current can correct vision

    Many people have their vision corrected with laser techniques such as LASIK, but such surgery cuts away corneal tissue and carries some risk. Researchers at UC Irvine are exploring a "laser-free" alternative that reshapes the cornea with electric current instead of cutting it. The human cornea is a transparent dome at the front of the eye that refracts incoming light and focuses it onto the retina, which relays it to the brain to form an image; a misshapen cornea cannot focus light properly, blurring vision. LASIK corrects corneal shape by lasering away part of the tissue, and while the procedure is generally considered fairly safe, it has limits and risks, and cutting the cornea weakens the eye's structural stability. The method under investigation, called electromechanical reshaping, exploits the fact that many collagen-rich tissues, the cornea included, hold their shape through the attraction between oppositely charged molecules. Passing a current through the tissue shifts its pH, temporarily weakening those attractions and leaving the tissue soft and pliable; once the pH recovers, the tissue sets in its new shape.

  7. Copilot file access can go unlogged

    Pistachio CTO Zack Korman disclosed a bug in Microsoft's M365 Copilot AI assistant: its file accesses could go unrecorded in the audit log, meaning anyone able to steer Copilot could hide the trail of their access, a serious security risk for enterprises. Like other tech companies, Microsoft is betting heavily on AI and integrating it across its products; Windows 11 ships with the Copilot assistant. Korman found that if Copilot is asked to access a file and summarize its contents without linking to the document in its reply, no access record appears in the log. He reported the bug to Microsoft in early July; Microsoft fixed it in mid-August but assigned no CVE and did not disclose the issue.

  8. Pakistan's internet slows to a fifth of normal speed

    According to NetBlocks, internet speeds in Pakistan fell to a fifth of normal on Tuesday, affecting the country's 116 million internet users. The problems mainly hit the backbone operator PTCL, and the cause remains unclear. Earlier reports said Pakistan's web management system was built on network equipment imported from China; the new system, in service since the second half of 2024, offers more advanced surveillance than its predecessor, including collecting metadata from encrypted connections. Because the system operates in-path rather than on-path, it is thought to degrade network throughput, and Pakistani users have been complaining of markedly slower speeds since the second half of last year.

  9. HTTPS access in China briefly disrupted

    Between roughly 00:34 and 01:48 Beijing time on August 20, HTTPS access in China was briefly disrupted. Apple reported App Store problems during the window but gave no further explanation. Analysis shows that connections to TCP port 443 were unconditionally injected with forged TCP RST+ACK packets, breaking them; the unconditional RST+ACK injection hit only TCP port 443 (commonly used for HTTPS) and not other common ports such as 22, 80, or 8443; and the injection disrupted connections in both the inbound and outbound directions, though the triggering mechanism was asymmetric.

  10. Why Hollywood stopped making comedies

    According to Letterboxd data and consumer surveys, comedy ranks second among the genres audiences most want more of, yet Hollywood's comedy output has fallen 27% since 1990. Comedies average a USD 26.5 million production budget and a 102% return on investment, yet only 9.3% of comedies get a sequel, versus 27.6% of action films. So why did Hollywood stop making them? Compared with action, horror, or biopics, comedy is less formulaic and harder to translate across languages and cultures; its box office comes mostly from the domestic market rather than abroad (Beverly Hills Cop and Ghostbusters, for instance, earned most of their takings in North America). Today's Hollywood prefers films with international box-office potential.

  11. Some Docker images still contain the XZ Utils backdoor

    The XZ backdoor affair that shook the open-source and security communities early last year is not behind us. In that incident, an attacker operating as Jia Tan (a pseudonym, not necessarily a Chinese person) lurked in the XZ Utils project for more than two years, eventually gaining enough trust to become a co-maintainer, then used that access to quietly plant a sophisticated backdoor in the xz-utils package. The backdoor was caught before the malicious versions spread widely, so it caused little damage. But an investigation by Binarly REsearch found that some Docker images built during the attack window still contain the XZ Utils backdoor: the researchers turned up more than 35 backdoored images on Docker Hub. The number is small, but they scanned only a fraction of the images, and only Debian-based ones; the picture for other distributions such as Fedora and openSUSE is unknown.

  12. SoftBank invests USD 2 billion in Intel

    Japan's SoftBank Group, the largest shareholder of UK chip designer Arm, is investing USD 2 billion in the troubled Intel, taking a stake in the chip giant; meanwhile, the US government is negotiating a 10% stake of its own. SoftBank will buy Intel common stock at USD 23 per share, giving it roughly 2% of the company. SoftBank posted its first quarterly profit in four years in Q2, and Intel's current CEO, Lip-Bu Tan, once sat on SoftBank's board. Once the mightiest semiconductor company, Intel now trails Nvidia, AMD, and others, and is in the middle of restructuring and layoffs.

  13. MIT report: 95% of enterprise generative AI pilots fail

    MIT has published a report, "The GenAI Divide: State of AI in Business 2025", finding that 95% of enterprise generative AI pilots fail. Companies are rushing to integrate large models, but only 5% of AI pilot projects achieve rapid revenue growth; most stall, with almost no measurable impact on the P&L. The findings draw on interviews with 150 executives, a survey of 350 employees, and analysis of 300 public AI deployments. Lead author Aditya Challapally explains that the 95% underperform not because of model quality: general-purpose tools like ChatGPT are very useful to individuals thanks to their flexibility, but they cannot learn from or adapt to workflows, so enterprise deployments stall. More than half of generative AI budgets go to sales and marketing tools, yet the research shows back-office automation delivers the highest return on investment: deploying AI in the back office helps eliminate business-process outsourcing, cut external agency costs, and streamline operations.

  14. China could land on the Moon before the US

    While the US lunar program suffers setbacks, China's has made notable progress in recent months, and China may well land on the Moon first. On August 6 the China Manned Space Engineering Office successfully tested a high-fidelity model of the 26-tonne Lanyue lander. The Lanyue lunar lander comprises a landing module and a propulsion module, is designed chiefly to ferry astronauts between lunar orbit and the lunar surface, and can carry two astronauts each way; the China National Space Administration reaffirmed its plan to land a crew on the Moon before 2030. Last week China also completed the first tethered ignition test of its next-generation crewed launcher, Long March 10, and in June it carried out a zero-altitude abort test of the Mengzhou crewed spacecraft.

  15. UK official wants to stop children using VPNs to view adult content

    Rachel de Souza, the Children's Commissioner for England, says the government needs to stop children using VPNs to get around age verification on pornography sites. Calling it a loophole that must be closed, she urged VPN services to verify ages too. After sites such as PornHub, Reddit, and X began verifying the age of UK users seeking adult content, VPNs became the most-downloaded apps on the UK Apple App Store. Asked about restricting VPNs, a spokesperson for the UK Department for Science, Innovation and Technology said VPNs are legitimate tools for adults and there are no plans to ban them, but platforms that deliberately market VPNs to children as a way around age checks will face tough enforcement and heavy fines.