OrangeBot.AI Digest — 2025-07-17

73 headlines across 8 sources, aggregated for the day.

Hacker News (15)

  1. Apple Intelligence Foundation Language Models Tech Report 2025 (machinelearning.apple.com)
  2. The patterns of elites who conceal their assets offshore (home.dartmouth.edu)
  3. ChatGPT agent: bridging research and action (openai.com)
  4. Tell HN: Notion Desktop is monitoring your audio and network
  5. How I Use Kagi (flamedfury.com)
  6. Mistral Releases Deep Research, Voice, Projects in Le Chat (mistral.ai)
  7. On doing hard things (parv.bearblog.dev)
  8. My bank keeps on undermining anti-phishing education (moritz-mander.de)
  9. Hand: open-source Robot Hand (github.com)
  10. Upcoming coordinated security fix for all Matrix server implementations (matrix.org)
  11. Retro gaming YouTuber Once Were Nerd sued and raided by the Italian government (www.androidauthority.com)
  12. Archaeologists discover tomb of first king of Caracol (uh.edu)
  13. Code execution through email: How I used Claude to hack itself (www.pynt.io)
  14. Wttr: Console-oriented weather forecast service (github.com)
  15. Inside the box: Everything I did with an Arduino starter kit (lopespm.com)

GitHub Trending (15)

  1. microsoft / markitdown

    Python tool for converting files and office documents to Markdown.

  2. gitleaks / gitleaks

    Find secrets with Gitleaks 🔑

  3. soxoj / maigret

    🕵️‍♂️ Collect a dossier on a person by username from thousands of sites

  4. maotoumao / MusicFree

    A free music player with plugin support, customization, and no ads.

  5. PromtEngineer / localGPT

    Chat with your documents on your local device using GPT models. No data leaves your device, and it is 100% private.

  6. vanna-ai / vanna

    🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.

  7. AykutSarac / jsoncrack.com

    ✨ Innovative and open-source visualization application that transforms various data formats, such as JSON, YAML, XML, CSV and more, into interactive graphs.

  8. musistudio / claude-code-router

    Use Claude Code as the foundation for coding infrastructure, allowing you to decide how to interact with the model while enjoying updates from Anthropic.

  9. WasmEdge / WasmEdge

    WasmEdge is a lightweight, high-performance, and extensible WebAssembly runtime for cloud native, edge, and decentralized applications. It powers serverless apps, embedded functions, microservices, smart contracts, and IoT devices.

  10. strapi / strapi

    🚀 Strapi is the leading open-source headless CMS. It’s 100% JavaScript/TypeScript, fully customizable, and developer-first.

  11. langchain-ai / open_deep_research
  12. helix-editor / helix

    A post-modern modal text editor.

  13. freeCodeCamp / devdocs

    API Documentation Browser

  14. cloudcommunity / Free-Certifications

    A curated list of free courses with certifications. Also available at https://free-certifications.com/

  15. Kyome22 / RunCat365

    A cute running cat animation on your Windows taskbar.

Product Hunt (15)

  1. Untitled UI React

    Open-source React components. Just copy, paste, and build.

  2. Runway

    Applicant ranking tool that adapts to YOUR hiring priorities

  3. Maestro Studio Desktop Beta

    Run mobile & web tests in minutes with a desktop app

  4. Snack it

    Build AI moodboards from anything you see online 🍫

  5. Beeper

    All your chats in one app

  6. Kawara 2.0

    Send newsletters consistently, zero writer’s block

  7. Symvol

    Turn any tech doc into a video · vibelearn → vibecode

  8. NotebookLM Featured Notebooks

    Explore topics with expert-driven sources

  9. UTCP

    The open, direct alternative to MCP for tool calling

  10. ADK-TS

    Build smart, tool-using agents in just one line

  11. Clevr

    Finally, an AI that talks and explains it visually

  12. Kite

    Turn email into your superpower

  13. Snapsnob

    Snapsnob is an iOS app that helps you clean up your photo gallery!

  14. Textalyz

    Bridge language gaps. Write & communicate like a pro

  15. Easy Bookmark Viewer

    A beautiful and powerful new tab for your bookmarks.

Hugging Face (13)

  1. Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs

    Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge, yet it falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts. This survey synthesizes both strands under a unified reasoning-retrieval perspective. We first map how advanced reasoning optimizes each stage of RAG (Reasoning-Enhanced RAG). Then, we show how retrieved knowledge of different types supplies missing premises and expands context for complex inference (RAG-Enhanced Reasoning). Finally, we spotlight emerging Synergized RAG-Reasoning frameworks, where (agentic) LLMs iteratively interleave search and reasoning to achieve state-of-the-art performance across knowledge-intensive benchmarks. We categorize methods, datasets, and open challenges, and outline research avenues toward deeper RAG-Reasoning systems that are more effective, multimodally-adaptive, trustworthy, and human-centric. The collection is available at https://github.com/DavidZWZ/Awesome-RAG-Reasoning.

  2. PhysX: Physical-Grounded 3D Asset Generation

    3D modeling is moving from virtual to physical. Existing 3D generation primarily emphasizes geometries and textures while neglecting physical-grounded modeling. Consequently, despite the rapid development of 3D generative models, the synthesized 3D assets often overlook rich and important physical properties, hampering their real-world application in physical domains like simulation and embodied AI. As an initial attempt to address this challenge, we propose PhysX, an end-to-end paradigm for physical-grounded 3D asset generation. 1) To bridge the critical gap in physics-annotated 3D datasets, we present PhysXNet - the first physics-grounded 3D dataset systematically annotated across five foundational dimensions: absolute scale, material, affordance, kinematics, and function description. In particular, we devise a scalable human-in-the-loop annotation pipeline based on vision-language models, which enables efficient creation of physics-first assets from raw 3D assets. 2) Furthermore, we propose PhysXGen, a feed-forward framework for physics-grounded image-to-3D asset generation, injecting physical knowledge into the pre-trained 3D structural space. Specifically, PhysXGen employs a dual-branch architecture to explicitly model the latent correlations between 3D structures and physical properties, thereby producing 3D assets with plausible physical predictions while preserving the native geometry quality. Extensive experiments validate the superior performance and promising generalization capability of our framework. All the code, data, and models will be released to facilitate future research in generative physical AI.

  3. SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

    Code performance optimization is paramount in real-world software engineering and critical for production-level systems. While Large Language Models (LLMs) have demonstrated impressive capabilities in code generation and bug fixing, their proficiency in enhancing code performance at the repository level remains largely unexplored. To address this gap, we introduce SWE-Perf, the first benchmark specifically designed to systematically evaluate LLMs on code performance optimization tasks within authentic repository contexts. SWE-Perf comprises 140 carefully curated instances, each derived from performance-improving pull requests from popular GitHub repositories. Each benchmark instance includes the relevant codebase, target functions, performance-related tests, expert-authored patches, and executable environments. Through a comprehensive evaluation of representative methods that span file-level and repo-level approaches (e.g., Agentless and OpenHands), we reveal a substantial capability gap between existing LLMs and expert-level optimization performance, highlighting critical research opportunities in this emerging field.

  4. MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding

    Humans are integral components of the transportation ecosystem, and understanding their behaviors is crucial to facilitating the development of safe driving systems. Although recent progress has explored various aspects of human behavior, such as motion, trajectories, and intention, a comprehensive benchmark for evaluating human behavior understanding in autonomous driving remains unavailable. In this work, we propose MMHU, a large-scale benchmark for human behavior analysis featuring rich annotations, such as human motion and trajectories, text description for human motions, human intention, and critical behavior labels relevant to driving safety. Our dataset encompasses 57k human motion clips and 1.73M frames gathered from diverse sources, including established driving datasets such as Waymo, in-the-wild videos from YouTube, and self-collected data. A human-in-the-loop annotation pipeline is developed to generate rich behavior captions. We provide a thorough dataset analysis and benchmark multiple tasks, ranging from motion prediction to motion generation and human behavior question answering, thereby offering a broad evaluation suite. Project page: https://MMHU-Benchmark.github.io.

  5. MOSPA: Human Motion Generation Driven by Spatial Audio

    Enabling virtual humans to dynamically and realistically respond to diverse auditory stimuli remains a key challenge in character animation, demanding the integration of perceptual modeling and motion synthesis. Despite its significance, this task remains largely unexplored. Most previous works have primarily focused on mapping modalities like speech, audio, and music to generate human motion. As of yet, these models typically overlook the impact of spatial features encoded in spatial audio signals on human motion. To bridge this gap and enable high-quality modeling of human movements in response to spatial audio, we introduce the first comprehensive Spatial Audio-Driven Human Motion (SAM) dataset, which contains diverse and high-quality spatial audio and motion data. For benchmarking, we develop a simple yet effective diffusion-based generative framework for human MOtion generation driven by SPatial Audio, termed MOSPA, which faithfully captures the relationship between body motion and spatial audio through an effective fusion mechanism. Once trained, MOSPA could generate diverse realistic human motions conditioned on varying spatial audio inputs. We perform a thorough investigation of the proposed dataset and conduct extensive experiments for benchmarking, where our method achieves state-of-the-art performance on this task. Our model and dataset will be open-sourced upon acceptance. Please refer to our supplementary video for more details.

  6. Seq vs Seq: An Open Suite of Paired Encoders and Decoders

    The large language model (LLM) community focuses almost exclusively on decoder-only language models, since they are easier to use for text generation. However, a large subset of the community still uses encoder-only models for tasks such as classification or retrieval. Previous work has attempted to compare these architectures, but is forced to make comparisons with models that have different numbers of parameters, training techniques, and datasets. We introduce the SOTA open-data Ettin suite of models: paired encoder-only and decoder-only models ranging from 17 million parameters to 1 billion, trained on up to 2 trillion tokens. Using the same recipe for both encoder-only and decoder-only models produces SOTA recipes in both categories for their respective sizes, beating ModernBERT as an encoder and Llama 3.2 and SmolLM2 as decoders. Like previous work, we find that encoder-only models excel at classification and retrieval tasks while decoders excel at generative tasks. However, we show that adapting a decoder model to encoder tasks (and vice versa) through continued training is subpar compared to using only the reverse objective (i.e. a 400M encoder outperforms a 1B decoder on MNLI, and vice versa for generative tasks). We open-source all artifacts of this study including training data, training order segmented by checkpoint, and 200+ checkpoints to allow future work to analyze or extend all aspects of training.

  7. DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering

    Large Language Model (LLM) agents have shown great potential for solving real-world problems and promise to be a solution for tasks automation in industry. However, more benchmarks are needed to systematically evaluate automation agents from an industrial perspective, for example, in Civil Engineering. Therefore, we propose DrafterBench for the comprehensive evaluation of LLM agents in the context of technical drawing revision, a representative task in civil engineering. DrafterBench contains twelve types of tasks summarized from real-world drawing files, with 46 customized functions/tools and 1920 tasks in total. DrafterBench is an open-source benchmark to rigorously test AI agents' proficiency in interpreting intricate and long-context instructions, leveraging prior knowledge, and adapting to dynamic instruction quality via implicit policy awareness. The toolkit comprehensively assesses distinct capabilities in structured data comprehension, function execution, instruction following, and critical reasoning. DrafterBench offers detailed analysis of task accuracy and error statistics, aiming to provide deeper insight into agent capabilities and identify improvement targets for integrating LLMs in engineering applications. Our benchmark is available at https://github.com/Eason-Li-AIS/DrafterBench, with the test set hosted at https://huggingface.co/datasets/Eason666/DrafterBench.

  8. AnyI2V: Animating Any Conditional Image with Motion Control

    Recent advancements in video generation, particularly in diffusion models, have driven notable progress in text-to-video (T2V) and image-to-video (I2V) synthesis. However, challenges remain in effectively integrating dynamic motion signals and flexible spatial constraints. Existing T2V methods typically rely on text prompts, which inherently lack precise control over the spatial layout of generated content. In contrast, I2V methods are limited by their dependence on real images, which restricts the editability of the synthesized content. Although some methods incorporate ControlNet to introduce image-based conditioning, they often lack explicit motion control and require computationally expensive training. To address these limitations, we propose AnyI2V, a training-free framework that animates any conditional images with user-defined motion trajectories. AnyI2V supports a broader range of modalities as the conditional image, including data types such as meshes and point clouds that are not supported by ControlNet, enabling more flexible and versatile video generation. Additionally, it supports mixed conditional inputs and enables style transfer and editing via LoRA and text prompts. Extensive experiments demonstrate that the proposed AnyI2V achieves superior performance and provides a new perspective in spatial- and motion-controlled video generation. Code is available at https://henghuiding.com/AnyI2V/.

  9. SpatialTrackerV2: 3D Point Tracking Made Easy

    We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos. Going beyond modular pipelines built on off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point tracking, monocular depth, and camera pose estimation into a high-performing and feedforward 3D point tracker. It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion, with a fully differentiable and end-to-end architecture, allowing scalable training across a wide range of datasets, including synthetic sequences, posed RGB-D videos, and unlabeled in-the-wild footage. By learning geometry and motion jointly from such heterogeneous data, SpatialTrackerV2 outperforms existing 3D tracking methods by 30%, and matches the accuracy of leading dynamic 3D reconstruction approaches while running 50× faster.

  10. Lizard: An Efficient Linearization Framework for Large Language Models

    We propose Lizard, a linearization framework that transforms pretrained Transformer-based Large Language Models (LLMs) into flexible, subquadratic architectures for infinite-context generation. Transformer-based LLMs face significant memory and computational bottlenecks as context lengths increase, due to the quadratic complexity of softmax attention and the growing key-value (KV) cache. Lizard addresses these limitations by introducing a subquadratic attention mechanism that closely approximates softmax attention while preserving the output quality. Unlike previous linearization methods, which are often limited by fixed model structures and therefore exclude gating mechanisms, Lizard incorporates a gating module inspired by recent state-of-the-art linear models. This enables adaptive memory control, supports constant-memory inference, offers strong length generalization, and allows more flexible model design. Lizard combines gated linear attention for global context compression with sliding window attention enhanced by meta memory, forming a hybrid mechanism that captures both long-range dependencies and fine-grained local interactions. Moreover, we introduce a hardware-aware algorithm that accelerates the training speed of our models. Extensive experiments show that Lizard achieves near-lossless recovery of the teacher model's performance across standard language modeling tasks, while significantly outperforming previous linearization methods. On the 5-shot MMLU benchmark, Lizard improves over prior models by 18 points and shows significant improvements on associative recall tasks.
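
    The gated linear attention at the core of such hybrids can be illustrated with a toy recurrence. This is an assumed, simplified formulation for illustration only, not the paper's exact mechanism: a scalar forget gate, an elu+1 feature map, and a running d×d key-value state that keeps memory constant in sequence length.

    ```python
    import numpy as np

    def elu_plus_one(x):
        # Positive feature map commonly used in linear attention.
        return np.where(x > 0, x + 1.0, np.exp(x))

    def gated_linear_attention(Q, K, V, gates):
        """Toy gated linear attention (illustrative sketch, not Lizard's code).
        Q, K, V: (T, d) arrays; gates: (T,) values in (0, 1).
        The KV state is a d x d matrix updated recurrently, so memory use
        does not grow with context length."""
        T, d = Q.shape
        phi_q, phi_k = elu_plus_one(Q), elu_plus_one(K)
        state = np.zeros((d, d))   # running sum of gated phi(k) v^T
        norm = np.zeros(d)         # running sum of gated phi(k), for normalization
        out = np.zeros_like(V)
        for t in range(T):
            state = gates[t] * state + np.outer(phi_k[t], V[t])
            norm = gates[t] * norm + phi_k[t]
            out[t] = phi_q[t] @ state / (phi_q[t] @ norm + 1e-6)
        return out

    rng = np.random.default_rng(0)
    T, d = 8, 4
    O = gated_linear_attention(rng.normal(size=(T, d)),
                               rng.normal(size=(T, d)),
                               rng.normal(size=(T, d)),
                               np.full(T, 0.9))
    print(O.shape)  # (8, 4)
    ```

    A gate below 1 decays older key-value contributions, which is the "adaptive memory control" idea; the real model learns the gate per token and combines this path with sliding-window attention.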

  11. Replacing thinking with tool usage enables reasoning in small language models

    Recent advances have established a new machine learning paradigm based on scaling up compute at inference time as well as at training time. In that line of work, a combination of Supervised Fine-Tuning (SFT) on synthetic demonstrations and Reinforcement Learning with Verifiable Rewards (RLVR) is used for training Large Language Models to expend extra compute during inference in the form of "thoughts" expressed in natural language. In this paper, we propose to instead format these tokens as a multi-turn interaction trace with a stateful tool. At each turn, the new state of the tool is appended to the context of the model, whose job is to generate the tokens necessary to control the tool via a custom DSL. We benchmark this approach on the problem of repairing malfunctioning Python code, and show that this constrained setup allows for faster sampling of experience and a denser reward signal, allowing even models of size up to 3B parameters to learn how to proficiently expend additional compute on the task.

  12. AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles

    This paper presents AI Wizards' participation in the CLEF 2025 CheckThat! Lab Task 1: Subjectivity Detection in News Articles, classifying sentences as subjective/objective in monolingual, multilingual, and zero-shot settings. Training/development datasets were provided for Arabic, German, English, Italian, and Bulgarian; final evaluation included additional unseen languages (e.g., Greek, Romanian, Polish, Ukrainian) to assess generalization. Our primary strategy enhanced transformer-based classifiers by integrating sentiment scores, derived from an auxiliary model, with sentence representations, aiming to improve upon standard fine-tuning. We explored this sentiment-augmented architecture with mDeBERTaV3-base, ModernBERT-base (English), and Llama3.2-1B. To address class imbalance, prevalent across languages, we employed decision threshold calibration optimized on the development set. Our experiments show sentiment feature integration significantly boosts performance, especially the subjective F1 score. This framework led to high rankings, notably 1st for Greek (Macro F1 = 0.51).
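
    The augmentation step the abstract describes, attaching an auxiliary sentiment score to each sentence embedding before the classification head, can be sketched as follows. Function names and dimensions here are illustrative assumptions, not the authors' code.

    ```python
    import numpy as np

    def augment_with_sentiment(sentence_embs, sentiment_scores):
        """Concatenate a per-sentence sentiment score (from an auxiliary
        model) onto each transformer sentence embedding, producing the
        feature vector fed to the downstream subjectivity classifier.
        Sketch only; the paper's actual architecture may differ."""
        return np.concatenate([sentence_embs, sentiment_scores[:, None]], axis=1)

    # Hypothetical inputs: three 768-d sentence embeddings (e.g. CLS vectors)
    # and three sentiment scores from an auxiliary sentiment model.
    embs = np.random.default_rng(1).normal(size=(3, 768))
    sentiment = np.array([0.9, -0.2, 0.1])
    features = augment_with_sentiment(embs, sentiment)
    print(features.shape)  # (3, 769)
    ```

    A linear or MLP head over the 769-d features would then be fine-tuned as usual; the extra dimension gives the classifier direct access to the sentiment signal.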

  13. RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning

    Reinforcement learning (RL) for large language models is an energy-intensive endeavor: training can be unstable, and the policy may gradually drift away from its pretrained weights. We present RLEP (Reinforcement Learning with Experience rePlay), a two-phase framework that first collects verified trajectories and then replays them during subsequent training. At every update step, the policy is optimized on mini-batches that blend newly generated rollouts with these replayed successes. By replaying high-quality examples, RLEP steers the model away from fruitless exploration, focuses learning on promising reasoning paths, and delivers both faster convergence and stronger final performance. On the Qwen2.5-Math-7B base model, RLEP reaches baseline peak accuracy with substantially fewer updates and ultimately surpasses it, improving accuracy on AIME-2024 from 38.2% to 39.9%, on AIME-2025 from 19.8% to 22.3%, and on AMC-2023 from 77.0% to 82.2%. Our code, datasets, and checkpoints are publicly available at https://github.com/Kwai-Klear/RLEP to facilitate reproducibility and further research.
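
    The two-phase replay blending the abstract describes can be sketched roughly as below. The class name, `replay_fraction` parameter, and string trajectories are illustrative assumptions; the authors' real implementation lives at the linked repository.

    ```python
    import random

    class ExperienceReplayMixer:
        """Minimal sketch of RLEP-style batch mixing: verified successful
        trajectories collected in phase 1 are blended into every training
        mini-batch alongside fresh rollouts in phase 2."""

        def __init__(self, replay_fraction=0.25, seed=0):
            self.buffer = []                      # verified trajectories
            self.replay_fraction = replay_fraction
            self.rng = random.Random(seed)

        def add_verified(self, trajectory):
            # Phase 1: keep only trajectories that passed the verifier.
            self.buffer.append(trajectory)

        def mixed_batch(self, fresh_rollouts):
            # Phase 2: append a sampled fraction of replayed successes
            # to the freshly generated rollouts.
            n_replay = min(int(len(fresh_rollouts) * self.replay_fraction),
                           len(self.buffer))
            return fresh_rollouts + self.rng.sample(self.buffer, n_replay)

    mixer = ExperienceReplayMixer(replay_fraction=0.5)
    for t in ["verified_proof_a", "verified_proof_b"]:
        mixer.add_verified(t)
    batch = mixer.mixed_batch(["rollout_1", "rollout_2", "rollout_3", "rollout_4"])
    print(len(batch))  # 4 fresh + 2 replayed = 6
    ```

    The policy update itself (e.g. PPO/GRPO on the blended batch) is unchanged; the replay only biases which experience the optimizer sees.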

Solidot (15)

  1. Webb telescope may have found a supermassive black hole formed by the direct collapse of an interstellar gas cloud

    Using the Webb telescope, astronomers discovered a rare object dubbed the "Infinity Galaxy." The system formed from the collision of two disk galaxies and shows two compact nuclei, each surrounded by a ring, giving it the appearance of the mathematical symbol "∞", hence the name. The Infinity Galaxy lies about 8.3 billion light-years from Earth, in the early-to-middle period of the universe. Within it, astronomers may have captured, for the first time, a supermassive black hole in the act of forming, and they note that this black hole did not come from a collapsing star but formed directly from a collapsing gas cloud. The discovery supports the "heavy seeds" theory, helping to explain how black holes of enormous mass already existed less than a billion years after the universe formed. The traditional "light seeds" theory holds that black holes begin as small black holes formed from collapsing stellar cores, with masses of tens to thousands of solar masses, which must grow into supermassive black holes through long sequences of mergers. Such growth, however, would take too long to explain the giant black holes seen in the early universe. The "heavy seeds" theory therefore proposes that, under special conditions, a massive gas cloud can collapse directly into a black hole, skipping the intermediate mergers, but the mechanism has lacked observational evidence. The Infinity Galaxy may provide an instance of those extreme conditions: when the two galaxies collided, the resulting gas shocks and compression may have been enough to trigger collapse. Although such events are extremely rare in today's universe, they may have been common in the early universe, helping to explain the origin of the massive early black holes Webb has observed.

  2. Ukrainian hackers destroyed the IT infrastructure of a Russian drone manufacturer

    Ukrainian hackers, working with intelligence services, successfully destroyed the IT infrastructure of Gaskar Integration, one of Russia's largest drone manufacturers, paralyzing the company's production. The hackers stole 47 TB of data in the attack, then wiped everything from the servers, including backups. The stolen data was handed over to Ukraine's Ministry of Defense.

  3. Mozilla asks users for input on Firefox's future

    Firefox recently shipped several long-awaited features, such as tab groups and vertical tabs, moves Mozilla made in response to community demand. With those features out, what should Firefox do next? Mozilla developers are soliciting user input through the official forums and are preparing a community AMA (Ask Me Anything) session.

  4. Valve removes some adult games under pressure from payment companies

    Valve updated its content distribution rules, adding a clause stating that payment companies get a say in what game content can be published on Steam: "Content that may violate the rules and standards set forth by Steam's payment processors and related card networks and banks, or internet network providers. In particular, certain kinds of adult only content." According to SteamDB's tracking of the Steam games database, some adult games have been delisted; the removed titles are incest-themed. Valve/Steam is not the first platform pressured over adult content by payment companies such as PayPal; Patreon has faced similar problems.

  5. Newly discovered gut bacterium boosts the effect of cancer immunotherapy drugs

    A research team at Japan's National Cancer Center and other institutions discovered a gut bacterium that can enhance the effect of cancer immunotherapy drugs such as Opdivo, and confirmed the effect in mouse experiments. Immunotherapy drugs such as Opdivo and Keytruda work by releasing the brakes on immune cells, strengthening their attack on cancer cells, but they are thought to be effective in only about 20-30% of patients. The Japanese team first examined stool samples from 50 lung and gastric cancer patients receiving immunotherapy and found that patients who responded to the drugs had a higher proportion of gut bacteria from the family Ruminococcaceae. Detailed analysis of these bacteria revealed a previously unknown species, "YB328". To investigate YB328's function and properties, the team transplanted stool from non-responding patients into mice and then gave the mice both the immunotherapy drug and YB328; the mice's tumors shrank. The study suggests that YB328 stimulates and activates dendritic cells, regarded as the commanders of the immune system.

  6. Newly discovered outer solar system object challenges the Planet Nine hypothesis

    Using observations from the Subaru Telescope in Hawaii, astronomers discovered an extremely distant solar system object, 2023 KQ14, nicknamed Ammonite. Its perihelion lies about 66 astronomical units from the Sun, making it an extremely rare "sednoid," an object whose orbit lies far beyond Neptune's gravitational reach and that offers important clues to an unknown region of the outer solar system. Only three sednoids were previously known: Sedna, 2012 VP₁₁₃, and Leleakuhonua. Their orbits point in roughly the same direction, which had been taken as possible evidence of gravitational shepherding by an undiscovered "Planet Nine." The newly found Ammonite, however, orbits in the opposite direction, showing that the dynamics of the outer solar system are more complex than previously imagined and challenging the Planet Nine hypothesis. Simulations show Ammonite's orbit has been stable for billions of years, largely undisturbed by the inner planets, making it an "orbital fossil" that preserves dynamical traces of the solar system's early formation.

  7. SoftBank to build a mechanism for AI self-replication

    Speaking at a SoftBank corporate event in Tokyo, SoftBank Group chairman and CEO Masayoshi Son said the company "will build a mechanism that lets AI replicate itself." To have "AI agents" handle complex work in place of humans inside the company, SoftBank is considering having generative AI develop large numbers of AI agents. Son described a vision in which in-house AI agents run day and night, drafting strategy, programming, and negotiating. He stressed that, to help group employees learn about AI, the company will build roughly 1 billion AI agents internally within the year.

  8. Wearable sensor tells users when to drink water

    With another hot summer arriving, the threat of dehydration looms. Dehydration ranges from an inconvenience to a life-threatening condition, yet it is hard to track. Researchers at the University of Texas at Austin aim to change that with a new non-invasive wearable sensor that continuously measures the wearer's hydration in real time. It tracks the body's water content using bioimpedance, a technique that measures how electrical signals travel through the body. Using carefully placed electrodes, the sensor sends a tiny but safe current through the arm. How the current flows depends on how much water the tissue contains: water is a good conductor, so well-hydrated tissue lets current pass easily, while dehydrated tissue impedes it. The data collected by the sensor is transmitted wirelessly to a smartphone, letting users monitor their hydration levels.

  9. The July 14 global network outage was caused by an internal Cloudflare misconfiguration

    Cloud provider Cloudflare explained the July 14 network outage: its 1.1.1.1 public DNS service was unavailable from 21:52 UTC to 22:54 UTC, and for users relying on 1.1.1.1 for domain resolution this meant nearly all services were unreachable. Cloudflare said the incident was caused by an internal configuration error, not a network attack or BGP hijack. The misconfiguration had been introduced on June 6, but the affected service was not yet in production at the time, so it had no immediate impact.

  10. Parrot Security 6.4 released

    The Debian-based security distribution Parrot has released v6.4. Parrot 6.4 is based on Debian 12 and may be the last release in the 6.x series; the next major version, Parrot 7.0, will be based on Debian 13. Other changes include: kernel 6.12.12; updated versions of tools such as Metasploit, Sliver, Caido, and Empire; the latest Firefox LTS; and more.

  11. Blender 4.5 LTS released

    The Blender project has released v4.5, a long-term support (LTS) release that will receive two years of updates and bug fixes. Major new features include: a Vulkan backend replacing OpenGL, with a smoother UI; multithreaded adaptive subdivision up to 14× faster; new geometry nodes Camera Info and Instance Bounds; GPU-accelerated compositor nodes; a new boolean solver, Manifold; Grease Pencil render passes and geometry nodes integration; improved import of PLY, OBJ, STL, CSV, VDB, and other file formats; and the end of Intel Mac support.

  12. Study finds AI reduces the coding efficiency of open source developers

    AI companies claim large models boost programmer productivity and coding efficiency, but a randomized controlled study found that AI slowed open source developers down. Researchers recruited 16 experienced programmers with years of contributions to open source codebases and tracked their performance on 246 tasks maintaining those codebases; for half the tasks the programmers were asked to use AI tools such as Cursor Pro or Anthropic's Claude, and for the other half they were asked not to use AI tools. Before starting, the programmers predicted AI tools would cut their working time by 24%; after finishing, they still believed the tools had sped them up by 20%. In reality, tasks completed with AI tools took 19% longer than tasks completed without them. The researchers found that AI tools reduced the time programmers spent writing code themselves, testing/debugging, and reading/searching for information, but increased the time spent reviewing AI output, prompting AI systems, waiting for AI generations, and idling. Most programmers said they needed to modify AI-generated code. The researchers concluded that current AI tools are a poor fit for environments with high quality standards and still have major limitations in programming settings.

  13. Mozilla engineer says Raptor Lake CPUs make Firefox crash more often during heat waves

    The instability problems of Intel's 14th-generation Core desktop processors (Raptor Lake) are even affecting browsers. Mozilla Staff Platform Engineer Gabriele Svelto posted on the fediverse microblogging platform Mastodon that Raptor Lake systems in the northern hemisphere are more likely to crash in summer heat, something he learned from the Firefox crash reports those systems send automatically; from the crash logs he could even tell which European countries were experiencing heat waves. He explained that Raptor Lake systems have known timing/voltage issues that worsen as temperature rises. Because so many identical crash reports came in, Mozilla had to disable a bot that automatically filed them.

  14. Reddit begins requiring UK users to verify their age to access adult content

    Reddit announced that, to comply with the UK's new Online Safety Act, it has begun requiring UK users to verify their age before accessing adult content. Reddit verifies ages through the third-party service Persona, which checks a user-uploaded selfie or a photo of a government-issued ID. Reddit does not store the uploaded photos, only the verification status and the user-provided date of birth, and users do not need to re-verify each time they access adult content. Reddit says Persona is committed to protecting data privacy, promises not to retain photos for more than 7 days, and has no access to users' Reddit data.

  15. Moonshot AI releases a 1-trillion-parameter open source model

    Beijing-based Moonshot AI last week released Kimi K2, a mixture-of-experts model with 1 trillion total parameters and 32 billion active parameters. Benchmarks show it can beat OpenAI's GPT-4.1 in some areas. Kimi K2 scored 65.8% on the software engineering benchmark SWE-bench Verified, surpassing most open source models and rivaling proprietary ones; on the coding benchmark LiveCodeBench it scored 53.7%, beating DeepSeek-V3's 46.9% and GPT-4.1's 44.7%; and on the mathematical reasoning benchmark MATH-500 it scored 97.4%, above GPT-4.1's 92.4%. Compared with OpenAI, Moonshot's training costs are lower and its model is faster and cheaper to run.