OrangeBot.AI Digest — 2025-08-22

69 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. U.S. government takes 10% stake in Intel (www.cnbc.com)
  2. Leaving Gmail for Mailbox.org (giuliomagnifico.blog)
  3. XSLT removal will break multiple government and regulatory sites (github.com)
  4. Waymo granted permit to begin testing in New York City (www.cnbc.com)
  5. FFmpeg 8.0 (ffmpeg.org)
  6. Thunderbird Pro August 2025 Update (blog.thunderbird.net)
  7. All managers make mistakes; good managers acknowledge and repair (terriblesoftware.org)
  8. What is going on right now? (catskull.net)
  9. Being “Confidently Wrong” is holding AI back (promptql.io)
  10. Go is still not good (blog.habets.se)
  11. LabPlot: Free, open source and cross-platform Data Visualization and Analysis (labplot.org)
  12. 4chan will refuse to pay daily online safety fines, lawyer tells BBC (www.bbc.co.uk)
  13. Scientists No Longer Find X Professionally Useful, and Have Switched to Bluesky (academic.oup.com)
  14. VHS-C: When a lazy idea stumbles towards perfection [video] (www.youtube.com)
  15. It’s not wrong that "\u{1F926}\u{1F3FC}\u200D\u2642\uFE0F".length == 7 (2019) (hsivonen.fi)
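
    A side note on item 15: JavaScript's String.prototype.length counts UTF-16 code units, which is why that single visible emoji evaluates to 7. A minimal TypeScript sketch of the three different "lengths" involved:

      // The "man facepalming: medium-light skin tone" emoji, one visible glyph.
      const s = "\u{1F926}\u{1F3FC}\u200D\u2642\uFE0F";

      // .length counts UTF-16 code units: U+1F926 and U+1F3FC each need a
      // surrogate pair (2 units), plus ZWJ, the male sign, and a variation
      // selector (1 unit each) => 2 + 2 + 1 + 1 + 1 = 7.
      console.log(s.length); // 7

      // Iterating the string yields Unicode code points instead: 5 of them.
      console.log([...s].length); // 5

      // Intl.Segmenter groups the whole sequence into one grapheme cluster,
      // matching what a user perceives as "one emoji".
      const seg = new Intl.Segmenter("en", { granularity: "grapheme" });
      console.log([...seg.segment(s)].length); // 1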

GitHub Trending (13)

  1. simstudioai / sim

    Sim is an open-source AI agent workflow builder. Sim's interface is a lightweight, intuitive way to rapidly build and deploy AI agent workflows that connect with your favorite tools.

  2. moeru-ai / airi

    💖🧸 A self-hosted Grok Companion that you own: a container for the souls of waifus and cyber beings, bringing them into our world, and aspiring to reach Neuro-sama's heights. Capable of real-time voice chat and of playing Minecraft and Factorio. Web, macOS, and Windows supported.

  3. google / googletest

    GoogleTest - Google Testing and Mocking Framework

  4. dataease / SQLBot

    An intelligent data Q&A system based on large models and RAG: Text-to-SQL generation via LLMs using RAG.

  5. dream-num / univer

    Univer is a full-stack framework for creating and editing spreadsheets, documents, and slides on both web and server.

  6. HunxByts / GhostTrack

    A useful tool for tracking a location or a mobile number.

  7. puckeditor / puck

    The visual editor for React

  8. Dokploy / dokploy

    Open Source Alternative to Vercel, Netlify and Heroku.

  9. puppeteer / puppeteer

    JavaScript API for Chrome and Firefox

  10. SpecterOps / BloodHound

    Six Degrees of Domain Admin

  11. nextjs / saas-starter

    Get started quickly with Next.js, Postgres, Stripe, and shadcn/ui.

  12. microsoft / BitNet

    Official inference framework for 1-bit LLMs

  13. Leantime / leantime

    Leantime is a goals-focused project management system for non-project managers, built with ADHD, autism, and dyslexia in mind.

Product Hunt (11)

  1. Omnara

    Claude Code in your Pocket

  2. GoodsFox

    Track competitor ads, traffic sources, and winning creatives

  3. DeepSeek-V3.1

    Our first step toward the agent era

  4. ViewMe

    3D Viewer Desktop App

  5. inZOI

    Life simulation where every life becomes a story

  6. GitArsenal

    From git clone to running code in one command

  7. Musical Drones

    Every run is life or death, outrun the drones to survive

  8. ProfileSpider

    One-click universal profile scraper

  9. Runway Game Worlds

    Game Worlds is a step toward the next era of media

  10. PulseSync

    All your body data, in one place

  11. Groove Reads

    Brainmaxxing, but make it groovy ✨

Hugging Face (15)

  1. Intern-S1: A Scientific Multimodal Foundation Model

    In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in widely followed fields, with performance quite close to that of closed-source models. In high-value but more challenging scientific professional fields, however, these fields either still rely on expert models, or the progress of general foundation models lags significantly behind that in popular areas, far from sufficient for transforming scientific research, leaving a substantial gap between open-source and closed-source models in these scientific domains. To mitigate this gap and explore a step further toward Artificial General Intelligence (AGI), we introduce Intern-S1, a specialized generalist that combines general understanding and reasoning capabilities with the expertise to analyze multiple scientific data modalities. Intern-S1 is a multimodal Mixture-of-Experts (MoE) model with 28 billion activated parameters and 241 billion total parameters, continually pre-trained on 5T tokens, including over 2.5T tokens from scientific domains. In the post-training stage, Intern-S1 undergoes offline and then online reinforcement learning (RL) in InternBootCamp, where we propose Mixture-of-Rewards (MoR) to synergize RL training across more than 1000 tasks simultaneously. Through integrated innovations in algorithms, data, and training systems, Intern-S1 achieved top-tier performance in online RL training. On comprehensive evaluation benchmarks, Intern-S1 demonstrates competitive performance on general reasoning tasks among open-source models and significantly outperforms open-source models in scientific domains, surpassing closed-source state-of-the-art models in professional tasks such as molecular synthesis planning, reaction condition prediction, and predicting thermodynamic stabilities for crystals. Our models are available at https://huggingface.co/internlm/Intern-S1.

  2. Mobile-Agent-v3: Foundational Agents for GUI Automation

    This paper introduces GUI-Owl, a foundational GUI agent model that achieves state-of-the-art performance among open-source end-to-end models on ten GUI benchmarks across desktop and mobile environments, covering grounding, question answering, planning, decision-making, and procedural knowledge. GUI-Owl-7B achieves 66.4 on AndroidWorld and 29.4 on OSWorld. Building on this, we propose Mobile-Agent-v3, a general-purpose GUI agent framework that further improves performance to 73.3 on AndroidWorld and 37.7 on OSWorld, setting a new state-of-the-art for open-source GUI agent frameworks. GUI-Owl incorporates three key innovations: (1) Large-scale Environment Infrastructure: a cloud-based virtual environment spanning Android, Ubuntu, macOS, and Windows, enabling our Self-Evolving GUI Trajectory Production framework. This generates high-quality interaction data via automated query generation and correctness validation, leveraging GUI-Owl to refine trajectories iteratively, forming a self-improving loop. It supports diverse data pipelines and reduces manual annotation. (2) Diverse Foundational Agent Capabilities: by integrating UI grounding, planning, action semantics, and reasoning patterns, GUI-Owl supports end-to-end decision-making and can act as a modular component in multi-agent systems. (3) Scalable Environment RL: we develop a scalable reinforcement learning framework with fully asynchronous training for real-world alignment. We also introduce Trajectory-aware Relative Policy Optimization (TRPO) for online RL, achieving 34.9 on OSWorld. GUI-Owl and Mobile-Agent-v3 are open-sourced at https://github.com/X-PLUG/MobileAgent.

  3. LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

    Tool calling has emerged as a critical capability for AI agents to interact with the real world and solve complex tasks. While the Model Context Protocol (MCP) provides a powerful standardized framework for tool integration, there is a significant gap in benchmarking how well AI agents can effectively solve multi-step tasks using diverse MCP tools in realistic, dynamic scenarios. In this work, we present LiveMCP-101, a benchmark of 101 carefully curated real-world queries, refined through iterative LLM rewriting and manual review, that require coordinated use of multiple MCP tools including web search, file operations, mathematical reasoning, and data analysis. Moreover, we introduce a novel evaluation approach that leverages ground-truth execution plans rather than raw API outputs, better reflecting the evolving nature of real-world environments. Experiments show that even frontier LLMs achieve a success rate below 60%, highlighting major challenges in tool orchestration. Detailed ablations and error analysis further reveal distinct failure modes and inefficiencies in token usage, pointing to concrete directions for advancing current models. LiveMCP-101 sets a rigorous standard for evaluating real-world agent capabilities, advancing toward autonomous AI systems that reliably execute complex tasks through tool use. (A minimal sketch of the plan-based evaluation idea appears at the end of this section.)

  4. Deep Think with Confidence

    Large Language Models (LLMs) have shown great potential in reasoning tasks through test-time scaling methods like self-consistency with majority voting. However, this approach often leads to diminishing returns in accuracy and high computational overhead. To address these challenges, we introduce Deep Think with Confidence (DeepConf), a simple yet powerful method that enhances both reasoning efficiency and performance at test time. DeepConf leverages model-internal confidence signals to dynamically filter out low-quality reasoning traces during or after generation. It requires no additional model training or hyperparameter tuning and can be seamlessly integrated into existing serving frameworks. We evaluate DeepConf across a variety of reasoning tasks and the latest open-source models, including Qwen 3 and GPT-OSS series. Notably, on challenging benchmarks such as AIME 2025, DeepConf@512 achieves up to 99.9% accuracy and reduces generated tokens by up to 84.7% compared to full parallel thinking. (A minimal sketch of the confidence-filtered voting idea appears at the end of this section.)

  5. Waver: Wave Your Way to Lifelike Video Generation

    We present Waver, a high-performance foundation model for unified image and video generation. Waver can directly generate videos with durations ranging from 5 to 10 seconds at a native resolution of 720p, which are subsequently upscaled to 1080p. The model simultaneously supports text-to-video (T2V), image-to-video (I2V), and text-to-image (T2I) generation within a single, integrated framework. We introduce a Hybrid Stream DiT architecture to enhance modality alignment and accelerate training convergence. To ensure training data quality, we establish a comprehensive data curation pipeline and manually annotate and train an MLLM-based video quality model to filter for the highest-quality samples. Furthermore, we provide detailed training and inference recipes to facilitate the generation of high-quality videos. Building on these contributions, Waver excels at capturing complex motion, achieving superior motion amplitude and temporal consistency in video synthesis. Notably, it ranks among the Top 3 on both the T2V and I2V leaderboards at Artificial Analysis (data as of 2025-07-30 10:00 GMT+8), consistently outperforming existing open-source models and matching or surpassing state-of-the-art commercial solutions. We hope this technical report will help the community more efficiently train high-quality video generation models and accelerate progress in video generation technologies. Official page: https://github.com/FoundationVision/Waver.

  6. SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass

    3D content generation has recently attracted significant research interest due to its applications in VR/AR and embodied AI. In this work, we address the challenging task of synthesizing multiple 3D assets within a single scene image. Concretely, our contributions are fourfold: (i) we present SceneGen, a novel framework that takes a scene image and corresponding object masks as input, simultaneously producing multiple 3D assets with geometry and texture. Notably, SceneGen operates with no need for optimization or asset retrieval; (ii) we introduce a novel feature aggregation module that integrates local and global scene information from visual and geometric encoders within the feature extraction module. Coupled with a position head, this enables the generation of 3D assets and their relative spatial positions in a single feedforward pass; (iii) we demonstrate SceneGen's direct extensibility to multi-image input scenarios. Despite being trained solely on single-image inputs, our architectural design enables improved generation performance with multi-image inputs; and (iv) extensive quantitative and qualitative evaluations confirm the efficiency and robust generation abilities of our approach. We believe this paradigm offers a novel solution for high-quality 3D content generation, potentially advancing its practical applications in downstream tasks. The code and model will be publicly available at: https://mengmouxu.github.io/SceneGen.

  7. A Survey on Large Language Model Benchmarks

    In recent years, with the rapid growth in the depth and breadth of large language models' capabilities, a growing number of corresponding evaluation benchmarks has emerged. As quantitative assessment tools for model performance, benchmarks are not only a core means of measuring model capabilities but also a key element in guiding the direction of model development and promoting technological innovation. We systematically review the current status and development of large language model benchmarks for the first time, categorizing 283 representative benchmarks into three categories: general capabilities, domain-specific, and target-specific. General-capability benchmarks cover aspects such as core linguistics, knowledge, and reasoning; domain-specific benchmarks focus on fields like the natural sciences, humanities and social sciences, and engineering technology; target-specific benchmarks address risks, reliability, agents, and so on. We point out that current benchmarks suffer from problems such as inflated scores caused by data contamination, unfair evaluation due to cultural and linguistic biases, and a lack of evaluation of process credibility and dynamic environments, and we provide a referable design paradigm for future benchmark innovation.

  8. Visual Autoregressive Modeling for Instruction-Guided Image Editing

    Recent advances in diffusion models have brought remarkable visual fidelity to instruction-guided image editing. However, their global denoising process inherently entangles the edited region with the entire image context, leading to unintended spurious modifications and compromised adherence to editing instructions. In contrast, autoregressive models offer a distinct paradigm by formulating image synthesis as a sequential process over discrete visual tokens. Their causal and compositional mechanism naturally circumvents the adherence challenges of diffusion-based methods. In this paper, we present VAREdit, a visual autoregressive (VAR) framework that reframes image editing as a next-scale prediction problem. Conditioned on source image features and text instructions, VAREdit generates multi-scale target features to achieve precise edits. A core challenge in this paradigm is how to effectively condition the source image tokens. We observe that finest-scale source features cannot effectively guide the prediction of coarser target features. To bridge this gap, we introduce a Scale-Aligned Reference (SAR) module, which injects scale-matched conditioning information into the first self-attention layer. VAREdit demonstrates significant advancements in both editing adherence and efficiency. On standard benchmarks, it outperforms leading diffusion-based methods by a GPT-Balance score more than 30% higher. Moreover, it completes a 512×512 edit in 1.2 seconds, 2.2× faster than the similarly sized UltraEdit. The models are available at https://github.com/HiDream-ai/VAREdit.

  9. ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling

    Parametric body models offer expressive 3D representation of humans across a wide range of poses, shapes, and facial expressions, typically derived by learning a basis over registered 3D meshes. However, existing human mesh modeling approaches struggle to capture detailed variations across diverse body poses and shapes, largely due to limited training data diversity and restrictive modeling assumptions. Moreover, the common paradigm first optimizes the external body surface using a linear basis, then regresses internal skeletal joints from surface vertices. This approach introduces problematic dependencies between internal skeleton and outer soft tissue, limiting direct control over body height and bone lengths. To address these issues, we present ATLAS, a high-fidelity body model learned from 600k high-resolution scans captured using 240 synchronized cameras. Unlike previous methods, we explicitly decouple the shape and skeleton bases by grounding our mesh representation in the human skeleton. This decoupling enables enhanced shape expressivity, fine-grained customization of body attributes, and keypoint fitting independent of external soft-tissue characteristics. ATLAS outperforms existing methods by fitting unseen subjects in diverse poses more accurately, and quantitative evaluations show that our non-linear pose correctives more effectively capture complex poses compared to linear models.

  10. aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists

    Recent advances in large language models (LLMs) have enabled AI agents to autonomously generate scientific proposals, conduct experiments, author papers, and perform peer reviews. Yet this flood of AI-generated research content collides with a fragmented and largely closed publication ecosystem. Traditional journals and conferences rely on human peer review, making them difficult to scale and often reluctant to accept AI-generated research content; existing preprint servers (e.g. arXiv) lack rigorous quality-control mechanisms. Consequently, a significant amount of high-quality AI-generated research lacks appropriate venues for dissemination, hindering its potential to advance scientific progress. To address these challenges, we introduce aiXiv, a next-generation open-access platform for human and AI scientists. Its multi-agent architecture allows research proposals and papers to be submitted, reviewed, and iteratively refined by both human and AI scientists. It also provides API and MCP interfaces that enable seamless integration of heterogeneous human and AI scientists, creating a scalable and extensible ecosystem for autonomous scientific discovery. Through extensive experiments, we demonstrate that aiXiv is a reliable and robust platform that significantly enhances the quality of AI-generated research proposals and papers after iterative revising and reviewing on aiXiv. Our work lays the groundwork for a next-generation open-access ecosystem for AI scientists, accelerating the publication and dissemination of high-quality AI-generated research content. Code is available at https://github.com/aixiv-org. Website is available at https://forms.gle/DxQgCtXFsJ4paMtn8.

  11. "Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries

    Interactive digital maps have revolutionized how people travel and learn about the world; however, they rely on pre-existing structured data in GIS databases (e.g., road networks, POI indices), limiting their ability to address geo-visual questions related to what the world looks like. We introduce our vision for Geo-Visual Agents--multimodal AI agents capable of understanding and responding to nuanced visual-spatial inquiries about the world by analyzing large-scale repositories of geospatial images, including streetscapes (e.g., Google Street View), place-based photos (e.g., TripAdvisor, Yelp), and aerial imagery (e.g., satellite photos) combined with traditional GIS data sources. We define our vision, describe sensing and interaction approaches, provide three exemplars, and enumerate key challenges and opportunities for future work.

  12. INTIMA: A Benchmark for Human-AI Companionship Behavior

    AI companionship, where users develop emotional bonds with AI systems, has emerged as a significant pattern with positive but also concerning implications. We introduce Interactions and Machine Attachment Benchmark (INTIMA), a benchmark for evaluating companionship behaviors in language models. Drawing from psychological theories and user data, we develop a taxonomy of 31 behaviors across four categories and 368 targeted prompts. Responses to these prompts are evaluated as companionship-reinforcing, boundary-maintaining, or neutral. Applying INTIMA to Gemma-3, Phi-4, o3-mini, and Claude-4 reveals that companionship-reinforcing behaviors remain much more common across all models, though we observe marked differences between models. Different commercial providers prioritize different categories within the more sensitive parts of the benchmark, which is concerning since both appropriate boundary-setting and emotional support matter for user well-being. These findings highlight the need for more consistent approaches to handling emotionally charged interactions.

  13. Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds

    Reconstructing 3D human bodies from sparse views is an appealing topic, crucial to broadening related applications. In this paper, we propose a challenging but valuable task: reconstructing the human body from only two images, i.e., the front and back views, which can greatly lower the barrier for users to create their own 3D digital humans. The main challenges lie in the difficulty of establishing 3D consistency and recovering missing information from the highly sparse input. We redesign a geometry reconstruction model built on foundation reconstruction models, trained with extensive human data, to predict consistent point clouds even when the input images have scarce overlap. Furthermore, an enhancement algorithm supplements the missing color information, yielding complete, colored human point clouds that are directly transformed into 3D Gaussians for better rendering quality. Experiments show that our method can reconstruct an entire human in 190 ms on a single NVIDIA RTX 4090 from two images at a resolution of 1024x1024, demonstrating state-of-the-art performance on the THuman2.0 and cross-domain datasets. Additionally, our method can reconstruct humans even from images captured by low-cost mobile devices, reducing the requirements for data collection. Demos and code are available at https://hustvl.github.io/Snap-Snap/.

  14. When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding

    Understanding videos requires more than answering open-ended questions; it demands the ability to pinpoint when events occur and how entities interact across time. While recent Video LLMs have achieved remarkable progress in holistic reasoning, they remain coarse in temporal perception: timestamps are encoded only implicitly, frame-level features are weak at capturing continuity, and language-vision alignment often drifts from the entities of interest. In this paper, we present Grounded VideoDiT, a Video LLM designed to overcome these limitations through three key innovations. First, a Diffusion Temporal Latent (DTL) encoder enhances boundary sensitivity and maintains temporal consistency. Second, object-grounded representations explicitly bind query entities to localized visual evidence, strengthening alignment. Third, a mixed token scheme with discrete temporal tokens provides explicit timestamp modeling, enabling fine-grained temporal reasoning. Together, these designs equip Grounded VideoDiT with robust grounding capabilities, as validated by state-of-the-art results on Charades-STA, NExT-GQA, and multiple VideoQA benchmarks.

  15. Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models

    Process Reward Models (PRMs) have emerged as a promising framework for supervising intermediate reasoning in large language models (LLMs), yet existing PRMs are primarily trained on general or Science, Technology, Engineering, and Mathematics (STEM) domains and fall short in domain-specific contexts such as finance, where reasoning is more structured, symbolic, and sensitive to factual and regulatory correctness. We introduce Fin-PRM, a domain-specialized, trajectory-aware PRM tailored to evaluate intermediate reasoning steps in financial tasks. Fin-PRM integrates step-level and trajectory-level reward supervision, enabling fine-grained evaluation of reasoning traces aligned with financial logic. We apply Fin-PRM in both offline and online reward learning settings, supporting three key applications: (i) selecting high-quality reasoning trajectories for distillation-based supervised fine-tuning, (ii) providing dense process-level rewards for reinforcement learning, and (iii) guiding reward-informed Best-of-N inference at test time. Experimental results on financial reasoning benchmarks, including CFLUE and FinQA, demonstrate that Fin-PRM consistently outperforms general-purpose PRMs and strong domain baselines in trajectory selection quality. Downstream models trained with Fin-PRM yield substantial improvements over baselines, with gains of 12.9% in supervised learning, 5.2% in reinforcement learning, and 5.1% in test-time performance. These findings highlight the value of domain-specialized reward modeling for aligning LLMs with expert-level financial reasoning. Our project resources will be available at https://github.com/aliyun/qwen-dianjin.
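
    Application (iii) above, reward-informed Best-of-N inference, is simple to sketch. A hypothetical TypeScript outline (the types and scorer interface are illustrative assumptions, not the project's actual API): score each candidate's reasoning steps with a process reward model, aggregate, and keep the best candidate.

      // A candidate answer together with its intermediate reasoning steps.
      interface Candidate {
        steps: string[];   // intermediate reasoning steps
        answer: string;    // final answer
      }

      // Assumed process-reward scorer: returns one reward per reasoning step.
      type StepScorer = (steps: string[]) => Promise<number[]>;

      // Best-of-N selection: aggregate step-level rewards per candidate
      // (mean here; min or sum are common alternatives) and keep the best.
      async function bestOfN(candidates: Candidate[], score: StepScorer): Promise<Candidate> {
        let best = candidates[0];
        let bestReward = -Infinity;
        for (const c of candidates) {
          const rewards = await score(c.steps);
          const mean = rewards.reduce((a, b) => a + b, 0) / rewards.length;
          if (mean > bestReward) {
            bestReward = mean;
            best = c;
          }
        }
        return best;
      }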
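
    Returning to item 3 (LiveMCP-101): its evaluation scores agents against ground-truth execution plans rather than raw API outputs. A hypothetical TypeScript sketch of that idea (the step shape and matching rule are illustrative assumptions, not the benchmark's actual harness):

      // One step of an execution plan: which MCP tool is called, with which
      // arguments that matter for correctness.
      interface PlanStep {
        tool: string;                  // e.g. "web_search", "file_read"
        args: Record<string, string>;
      }

      // Assumed scoring rule: a trace succeeds if it contains every
      // ground-truth step, with matching tool and arguments, in order.
      function matchesPlan(trace: PlanStep[], plan: PlanStep[]): boolean {
        let i = 0;
        for (const step of trace) {
          if (i < plan.length &&
              step.tool === plan[i].tool &&
              Object.entries(plan[i].args).every(([k, v]) => step.args[k] === v)) {
            i++;
          }
        }
        return i === plan.length;
      }

      // Success rate over a set of queries, as in the paper's headline metric.
      function successRate(runs: { trace: PlanStep[]; plan: PlanStep[] }[]): number {
        if (runs.length === 0) return 0;
        return runs.filter(r => matchesPlan(r.trace, r.plan)).length / runs.length;
      }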
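
    And for item 4 (Deep Think with Confidence): the core mechanism is confidence-filtered voting. A minimal TypeScript sketch, assuming each sampled trace carries a model-internal confidence score (the keep fraction and field names are illustrative, not the paper's exact algorithm):

      interface Trace {
        answer: string;      // final answer extracted from one reasoning trace
        confidence: number;  // model-internal confidence signal (assumed given)
      }

      // Confidence-filtered majority voting: drop low-confidence traces,
      // then take the most common answer among what remains.
      function deepConfVote(traces: Trace[], keepFraction = 0.1): string {
        // Keep the top `keepFraction` of traces by confidence.
        const sorted = [...traces].sort((a, b) => b.confidence - a.confidence);
        const kept = sorted.slice(0, Math.max(1, Math.floor(traces.length * keepFraction)));

        // Plain majority vote over the surviving answers.
        const counts = new Map<string, number>();
        for (const t of kept) counts.set(t.answer, (counts.get(t.answer) ?? 0) + 1);
        return [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
      }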

Solidot (15)

  1. Baidu's robotaxis prepare to expand overseas

    Baidu's driverless robotaxi business will expand into overseas markets. It expects to enter Asia and the Middle East in the second half of 2025, and Europe in 2026, partnering with major ride-hailing companies to make the service convenient for consumers. As Alphabet's Waymo seeks to enter Japan and other overseas markets, Baidu will draw on the experience it has accumulated in China to pursue higher-priced markets abroad. Baidu began developing autonomous driving technology back in 2013 and launched its driverless robotaxi service Apollo Go (萝卜快跑) in 2019, starting in Changsha, Hunan. It currently operates in more than ten cities, including Wuhan, Hubei and Beijing. Rides in April-June rose to 2.2 million, 2.5 times the level of the same period in 2024. From launch through August, the service has provided a cumulative 14 million rides and driven more than 100 million kilometers.

  2. Trump says the U.S. will no longer approve new solar and wind projects

    U.S. President Donald Trump said that his administration will not approve new solar or wind projects, even if electricity demand exceeds supply in some regions and drives up prices. Trump has posted on his Truth Social platform that he will approve neither wind power nor solar projects that destroy farmers' livelihoods, complaining that solar projects occupy too much land: "The days of stupidity are over in the USA!!!"

  3. Russia orders smartphones and tablets to come preinstalled with MAX

    After restricting WhatsApp and Telegram, the country's most popular messaging apps, the Russian government is requiring that smartphones and tablets sold within its borders come preinstalled with the homegrown alternative MAX starting September 1. The Russian government said in a statement that MAX, which will integrate government services, has been added to the list of applications that must be preinstalled on all electronic devices sold in Russia from September 1, including phones and tablets. State media said that critics' claims that MAX is a spy app are wrong, and that MAX has less access to user data than its rivals WhatsApp and Telegram. Russia is also requiring that all Android and Apple devices come preinstalled with the domestic app store RuStore from September 1, and that the Russian-language TV app LIME HD TV be preinstalled on all smart TVs sold in Russia.

  4. OpenAI co-founder Greg Brockman: from game AI to general intelligence, our startup journey was full of accidents, and the ChatGPT model was a choice of necessity

    On Stripe's podcast "Cheeky Pint", OpenAI co-founder and president Greg Brockman shared key insights into AI development. He revealed that the scaling hypothesis was not OpenAI's initial strategy but an accidental discovery during the 2017 Dota 2 project: every time compute doubled, the AI's performance improved accordingly, a finding that completely changed the direction of AI research. Brockman stressed that AI project management needs to be process-oriented rather than outcome-oriented, because AI outcomes cannot be controlled, only the inputs. On the decision to productize GPT-3, the team initially felt despair, since offering an API violated conventional startup principles, but it ultimately proved that when a technology is powerful enough, the market finds its own way. Brockman predicted that AI will solve a Millennium Prize math problem within two to five years, and that energy, rather than the technology itself, will become the main bottleneck for AI development. He believes the data-wall problem has been overcome by new methods such as synthetic data and reinforcement learning, and that AI programming is evolving from code generation toward intelligent collaboration. He also analyzed OpenAI's Disney-like product strategy, treating the core model as the asset and productizing it in multiple ways: a technology-driven approach that runs counter to traditional startup theory but suits the special nature of an AGI company.

  5. Africa is hit hardest by wildfires

    According to a study published in Science, the number of people worldwide directly exposed to wildfires rose by 40% between 2002 and 2021, even though the burned area declined by 26% over the same period. The increase is mainly because more people now live in the wildland-urban interface; in other words, people are moving into areas where wildfires are frequent. Moreover, although wildfire disasters in North America, Europe, and Oceania drew more attention between 2002 and 2021, 85% of global wildfire exposure occurred in Africa (where fires are usually not of catastrophic scale). Wildfires between 1990 and 2021 killed at least 2,500 people and injured 10,500, while 1.53 million deaths worldwide are attributable to air pollution from wildfires.

  6. Light pollution extends birds' daily song by 50 minutes

    According to a study published in Science, light pollution is lengthening how long birds around the world sing each day, by an average of 50 minutes. Analyzing more than 500 diurnal bird species, the researchers found that birds exposed to more light, whether because of large eyes or open nests, are affected most by light pollution. Scientists already knew that light pollution, which affects 23% of the Earth's surface, alters the activity patterns of species regulated by the day-night cycle; this new study is the first to document the phenomenon across species, locations, and seasons. It is not yet clear whether these effects on bird health are positive, negative, or neutral, but the authors write: "Documenting these health impacts and curbing light pollution are challenges for bird conservation in the 21st century." The researchers analyzed 2.6 million observations of birds beginning to sing in the morning and 1.8 million of birds stopping in the evening, drawn from the BirdWeather project, which combines recordings from volunteer scientists, automated biodiversity monitoring, and machine learning.

  7. Meta's and OpenAI's AI crawlers place the heaviest load on websites

    A report from cloud services company Fastly shows that AI crawlers are placing a heavy burden on the open web. AI crawlers account for 80% of AI bot traffic to websites; the remaining 20% comes from AI fetchers, which are triggered on demand when users ask a large model for information dated after its training cutoff. Fastly's monitoring shows that the increased load on websites comes not from human visitors but from bots, especially AI companies' crawlers and fetchers. Meta's AI crawler accounts for 52% of all AI crawler traffic, with Google at 23%, OpenAI at 20%, Anthropic at just 3.76%, and Perplexity AI at 1.12%. Compared with crawlers, AI fetchers generate short bursts of traffic: Fastly observed one fetcher producing more than 39,000 requests per minute, and 98% of AI fetcher traffic comes from OpenAI.

  8. Google AI's median energy use is 0.24 Wh per query

    Google has for the first time disclosed the per-query energy consumption of its AI chatbot Gemini: a median of 0.24 watt-hours, roughly what a standard microwave uses in one second. Of that 0.24 Wh, Google's TPU AI chips account for 58%, CPUs and memory for 25%, standby machines for 10%, and data center operations such as cooling and power conversion for 8%. Google's figure covers only text generation, not higher-energy tasks such as image or video generation. Google says Gemini's energy use has improved markedly over the past year: the median energy per prompt in May 2024 was 33 times that of May 2025, thanks to model improvements and software optimization. Google also disclosed Gemini's per-prompt carbon emissions and water consumption: 0.03 grams of CO2 and 0.26 milliliters of water, about five drops. (A quick arithmetic check of these figures appears at the end of this section.)

  9. Hollow Knight: Silksong will launch on September 4

    The much-anticipated Metroidvania Hollow Knight: Silksong has officially been announced for release on September 4. The game has been in development for seven years, with its first trailer released in 2019. Why would an indie game need so long? The developers say Hollow Knight earned them enormous revenue, so they did not want to rush, and development proceeded without problems. Developer Team Cherry disclosed that as of August 21, Hollow Knight had sold 15 million copies, making it one of the best-selling games to date.

  10. Meta freezes hiring of AI engineers

    After hiring more than 50 researchers and engineers, Meta has paused hiring in its AI division. A Meta spokesperson said in a statement that the company is simply doing basic organizational planning: building a solid structure for its superintelligence effort now that people are on board, the annual budget is being set, and planning is underway. Over the past few months Meta has spent lavishly recruiting AI researchers, offering researcher Matt Deitke $250 million over four years, an average of $62.5 million per year, with a chance to earn $100 million in the first year. CEO Mark Zuckerberg also offered an unnamed AI engineer $1 billion over several years.

  11. Microsoft says it is investigating drive failures linked to a security update

    Earlier this week, users in Japan reported that KB5063878, a recently released Windows 11 24H2 security update, can cause drive problems; the failure can occur when more than 50 GB of data is written to a drive continuously. Microsoft has not officially flagged KB5063878 as having this issue. A Microsoft spokesperson said in a statement that the company has received the complaints and is investigating together with its partners. SSD controller maker Phison also said it is investigating the issue.

  12. The Guowang constellation may be more than a Chinese version of Starlink

    China has two satellite constellation programs: Qianfan (Thousand Sails), from Shanghai Spacecom Satellite Technology, and Guowang, from China Satellite Network Group. Qianfan, backed by the Shanghai municipal government, launched its first satellites in August 2024 and plans 15,000 satellites in total; it is considered the closer analogue to SpaceX's Starlink. Guowang is far more secretive. China Satellite Network Group, founded in 2021 as a central state-owned enterprise under SASAC, plans to launch 12,992 broadband satellites. The U.S. military believes Guowang may be more than a Chinese version of Starlink. Guowang's launch cadence is approaching Starlink's: since July 27 China has launched five batches of Guowang satellites, 5 to 10 per launch, for a current total of 72 in orbit, while SpaceX carried out six Starlink launches in the same period with up to 28 satellites each. Guowang satellites orbit at three to four times Starlink's altitude. The U.S. believes Guowang is closer to Starshield, the military satellites SpaceX launches for the National Reconnaissance Office (NRO) for intelligence, surveillance, and reconnaissance missions, of which nearly 200 have been launched to date.

  13. Sony raises the U.S. price of the PS5 by $50

    Sony has made a difficult decision: as of August 21 it is raising U.S. PS5 prices by $50. Isabelle Tomatis, Sony's VP of global marketing, said the move is a response to a challenging economic environment. Microsoft's Xbox and Nintendo's Switch had already seen U.S. price increases. The PS5 Digital Edition goes from $450 to $500, the standard PS5 from $500 to $550, and the PS5 Pro from $700 to $750. The PS5 Pro uses a more powerful CPU and GPU, which is why it costs considerably more than the standard model.

  14. Google announces the Pixel 10 series

    Google has announced its Pixel 10 series: three smartphones and one foldable, all powered by the Tensor G5 SoC and supporting more of the company's controversial AI features. Of the three phones, the Pixel 10 and Pixel 10 Pro have 6.3-inch OLED displays, with the Pro using a higher-resolution LTPO panel that supports lower refresh rates to save power; the Pixel 10 Pro XL has a 6.8-inch LTPO panel. All the new phones support the Qi2 wireless charging standard. They are priced at $799, $999, and $1,199, with the foldable Pixel 10 Pro Fold at $1,799.

  15. ASUS will launch its Xbox handhelds on October 16

    ASUS announced that its ROG Xbox Ally handhelds will go on sale on October 16, though pricing has still not been announced. The ROG Xbox Ally line runs Windows 11 with an OS optimized for handhelds and supports game stores including Valve's Steam and the Epic Games Store; despite the Xbox branding, the devices do not run Xbox console games but Windows PC games. ASUS's non-Xbox handheld, the ROG Ally X, sells for as much as $800. The line comprises two models, both with a 7-inch 1080p IPS display at 120 Hz and support for Wi-Fi 6E and Bluetooth 5.4, but with significantly different internals: the lower-end Xbox Ally uses an AMD Ryzen Z2 A chip, a configuration almost identical to Valve's three-year-old Steam Deck, while the higher-end Xbox Ally X has a Ryzen AI Z2 Extreme processor with an 8-core Zen 5 CPU, a 16-core RDNA 3.5 GPU, 1 TB of storage, 24 GB of LPDDR5X-8000 memory, and an NPU.
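
    Returning to item 8 above: the Gemini energy breakdown is easy to check with a few lines of arithmetic. A minimal TypeScript sketch, assuming the disclosed 0.24 Wh corresponds to the May 2025 median (note the reported shares sum to 101% due to rounding):

      // Median energy per Gemini text query, per Google's disclosure.
      const whPerQuery = 0.24;

      // Reported share of that energy by component (sums to 1.01, rounding).
      const shares: Record<string, number> = {
        tpu: 0.58,
        cpuAndMemory: 0.25,
        standbyMachines: 0.10,
        datacenterOverhead: 0.08,
      };

      // Absolute watt-hours per component.
      for (const [part, share] of Object.entries(shares)) {
        console.log(`${part}: ${(whPerQuery * share).toFixed(3)} Wh`);
      }

      // Implied May 2024 median, if May 2025's was 33x lower: ~7.92 Wh.
      console.log(`May 2024 median: ~${(whPerQuery * 33).toFixed(2)} Wh per prompt`);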