DIGEST · 2025-09-02

OrangeBot.AI Digest — 2025-09-02

63 headlines across 8 sources, aggregated for this day.

Hacker News(15)

  1. Google can keep its Chrome browser but will be barred from exclusive contracts (www.cnbc.com)
  2. OpenAI says it's scanning users' conversations and reporting content to police (futurism.com)
  3. We already live in social credit, we just don't call it that (www.thenexus.media)
  4. Python has had async for 10 years – why isn't it more popular? (tonybaloney.github.io)
  5. AI web crawlers are destroying websites in their never-ending content hunger (www.theregister.com)
  6. Anthropic raises $13B Series F (www.anthropic.com)
  7. X(Twitter) Shadow Bans Turkish Presidential Candidate (utkusen.substack.com)
  8. The Little Book of Linear Algebra (github.com)
  9. You don't want to hire "the best engineers" (www.otherbranch.com)
  10. What's New with Firefox 142 (www.mozilla.org)
  11. Run Erlang/Elixir on Microcontrollers and Embedded Linux (www.grisp.org)
  12. An LLM is a lossy encyclopedia (simonwillison.net)
  13. Next.js is infuriating (blog.meca.sh)
  14. Collecting All Causal Knowledge (causenet.org)
  15. Kazeta: An operating system that brings the console gaming experience of 90s (kazeta.org)

GitHub Trending(12)

  1. dockur / windows

    Windows inside a Docker container.

  2. crewAIInc / crewAI

    Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.

  3. JetBrains / koog

    Koog is the official Kotlin framework for building and running robust, scalable and production-ready AI agents across all platforms – from backend services to Android and iOS, JVM, and even in-browser environments. Koog is based on our AI products expertise and provides proven solutions for complex LLM and AI problems.

  4. ashishpatel26 / 500-AI-Agents-Projects

    The 500 AI Agents Projects is a curated collection of AI agent use cases across various industries. It showcases practical applications and provides links to open-source projects for implementation, illustrating how AI agents are transforming sectors such as healthcare, finance, education, retail, and more.

  5. google / mangle

  6. resemble-ai / chatterbox

    SoTA open-source TTS

  7. pedroslopez / whatsapp-web.js

    A WhatsApp client library for NodeJS that connects through the WhatsApp Web browser app

  8. LukeGus / Termix

    Termix is a web-based server management platform with SSH terminal, tunneling, and file editing capabilities.

  9. rustdesk / rustdesk

    An open-source remote desktop application designed for self-hosting, as an alternative to TeamViewer.

  10. bytebot-ai / bytebot

    Bytebot is a self-hosted AI desktop agent that automates computer tasks through natural language commands, operating within a containerized Linux desktop environment.

  11. google / comprehensive-rust

    This is the Rust course used by the Android team at Google. It provides you the material to quickly teach Rust.

  12. projectdiscovery / nuclei-templates

    Community curated list of templates for the nuclei engine to find security vulnerabilities.

Product Hunt(15)

  1. Receiptor AI 2.0

    Bookkeeping on Autopilot with AI

  2. Google Finance Beta

    Dive into the world of finance with AI-powered insights

  3. Bhava

    Create and edit diagrams instantly with AI

  4. CatDoes

    Team of AI agents build mobile apps for you & your business

  5. fileAI MCP

    Give AI agents secure, real-time access to your files

  6. Copilot Audio Expressions

    The new voice of your stories

  7. Upvoted

    Feature voting and product feedback tool

  8. Bugster

    AI QA engineer for Next.js apps

  9. Ghost

    Cursor, Figma, and Powerpoint all in one editor 🔥

  10. Floor796

    Explore a living, animated sci-fi world made of pixel art

  11. Coherence X5 for macOS

    Turn websites into isolated Mac apps using Chrome

  12. Chronos for Jira

    Time tracking made easy

  13. Guitar Wiz

    All-in-one guitar companion app to learn, practice & create

  14. SpatialChat

    Product Feature Update for our Virtual Conferencing Platform

  15. Compot

    SwiftUI components + AI coding assistant

Hugging Face(7)

  1. PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

    Critic-free reinforcement learning methods, particularly group policies, have attracted considerable attention for their efficiency in complex tasks. However, these methods rely heavily on multiple sampling and comparisons within the policy to estimate advantage, which may cause the policy to fall into a local optimum and increase computational cost. To address these issues, we propose PVPO, an efficient reinforcement learning method enhanced by an advantage reference anchor and data pre-sampling. Specifically, we use the reference model to roll out in advance and employ the calculated reward score as a reference anchor. Our approach effectively corrects the cumulative bias introduced by intra-group comparisons and significantly reduces reliance on the number of rollouts. Meanwhile, the reference model can assess sample difficulty during data pre-sampling, enabling effective selection of high-gain data to improve training efficiency. Experiments conducted on nine datasets across two domains demonstrate that PVPO achieves State-Of-The-Art (SOTA) performance. Our approach not only demonstrates robust generalization across multiple tasks, but also exhibits scalable performance across models of varying scales.

  2. T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables

    Extensive research has been conducted to explore the capabilities of large language models (LLMs) in table reasoning. However, the essential task of transforming table information into reports remains a significant challenge for industrial applications. This task is plagued by two critical issues: 1) the complexity and diversity of tables lead to suboptimal reasoning outcomes; and 2) existing table benchmarks lack the capacity to adequately assess the practical application of this task. To fill this gap, we propose the table-to-report task and construct a bilingual benchmark named T2R-bench, in which key information flows from the tables to the reports. The benchmark comprises 457 industrial tables, all derived from real-world scenarios and encompassing 19 industry domains as well as 4 types of industrial tables. Furthermore, we propose evaluation criteria to fairly measure the quality of report generation. Experiments on 25 widely used LLMs reveal that even state-of-the-art models like Deepseek-R1 achieve an overall score of only 62.71, indicating that LLMs still have room for improvement on T2R-bench. Source code and data will be available after acceptance.

  3. How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench

    Recent advances in reasoning and planning capabilities of large language models (LLMs) have enabled their potential as autonomous agents capable of tool use in dynamic environments. However, in multi-turn conversational environments like tau-bench, these agents often struggle with consistent reasoning, adherence to domain-specific policies, and extracting correct information over a long horizon of tool-calls and conversation. To capture and mitigate these failures, we conduct a comprehensive manual analysis of the common errors occurring in the conversation trajectories. We then experiment with reformulations of inputs to the tool-calling agent for improvement in agent decision making. Finally, we propose the Input-Reformulation Multi-Agent (IRMA) framework, which automatically reformulates user queries augmented with relevant domain rules and tool suggestions for the tool-calling agent to focus on. The results show that IRMA significantly outperforms ReAct, Function Calling, and Self-Reflection by 16.1%, 12.7%, and 19.1%, respectively, in overall pass^5 scores. These findings highlight the superior reliability and consistency of IRMA compared to other methods in dynamic environments.

  4. UI-Level Evaluation of ALLaM 34B: Measuring an Arabic-Centric LLM via HUMAIN Chat

    Large language models (LLMs) trained primarily on English corpora often struggle to capture the linguistic and cultural nuances of Arabic. To address this gap, the Saudi Data and AI Authority (SDAIA) introduced the ALLaM family of Arabic-focused models. The most capable of these available to the public, ALLaM-34B, was subsequently adopted by HUMAIN, who developed and deployed HUMAIN Chat, a closed conversational web service built on this model. This paper presents an expanded and refined UI-level evaluation of ALLaM-34B. Using a prompt pack spanning modern standard Arabic, five regional dialects, code-switching, factual knowledge, arithmetic and temporal reasoning, creative generation, and adversarial safety, we collected 115 outputs (23 prompts × 5 runs) and scored each with three frontier LLM judges (GPT-5, Gemini 2.5 Pro, Claude Sonnet-4). We compute category-level means with 95% confidence intervals, analyze score distributions, and visualize dialect-wise metric heat maps. The updated analysis reveals consistently high performance on generation and code-switching tasks (both averaging 4.92/5), alongside strong results in MSA handling (4.74/5), solid reasoning ability (4.64/5), and improved dialect fidelity (4.21/5). Safety-related prompts show stable, reliable performance (4.54/5). Taken together, these results position ALLaM-34B as a robust and culturally grounded Arabic LLM, demonstrating both technical strength and practical readiness for real-world deployment.

  5. No Label Left Behind: A Unified Surface Defect Detection Model for all Supervision Regimes

    Surface defect detection is a critical task across numerous industries, aimed at efficiently identifying and localising imperfections or irregularities on manufactured components. While numerous methods have been proposed, many fail to meet industrial demands for high performance, efficiency, and adaptability. Existing approaches are often constrained to specific supervision scenarios and struggle to adapt to the diverse data annotations encountered in real-world manufacturing processes, such as unsupervised, weakly supervised, mixed supervision, and fully supervised settings. To address these challenges, we propose SuperSimpleNet, a highly efficient and adaptable discriminative model built on the foundation of SimpleNet. SuperSimpleNet incorporates a novel synthetic anomaly generation process, an enhanced classification head, and an improved learning procedure, enabling efficient training in all four supervision scenarios, making it the first model capable of fully leveraging all available data annotations. SuperSimpleNet sets a new standard for performance across all scenarios, as demonstrated by its results on four challenging benchmark datasets. Beyond accuracy, it is very fast, achieving an inference time below 10 ms. With its ability to unify diverse supervision paradigms while maintaining outstanding speed and reliability, SuperSimpleNet represents a promising step forward in addressing real-world manufacturing challenges and bridging the gap between academic research and industrial applications. Code: https://github.com/blaz-r/SuperSimpleNet

  6. From reactive to cognitive: brain-inspired spatial intelligence for embodied agents

    Spatial cognition enables adaptive goal-directed behavior by constructing internal models of space. Robust biological systems consolidate spatial knowledge into three interconnected forms: landmarks for salient cues, route knowledge for movement trajectories, and survey knowledge for map-like representations. While recent advances in multi-modal large language models (MLLMs) have enabled visual-language reasoning in embodied agents, these efforts lack structured spatial memory and instead operate reactively, limiting their generalization and adaptability in complex real-world environments. Here we present Brain-inspired Spatial Cognition for Navigation (BSC-Nav), a unified framework for constructing and leveraging structured spatial memory in embodied agents. BSC-Nav builds allocentric cognitive maps from egocentric trajectories and contextual cues, and dynamically retrieves spatial knowledge aligned with semantic goals. Integrated with powerful MLLMs, BSC-Nav achieves state-of-the-art efficacy and efficiency across diverse navigation tasks, demonstrates strong zero-shot generalization, and supports versatile embodied behaviors in the real physical world, offering a scalable and biologically grounded path toward general-purpose spatial intelligence.

  7. Democracy-in-Silico: Institutional Design as Alignment in AI-Governed Polities

    This paper introduces Democracy-in-Silico, an agent-based simulation where societies of advanced AI agents, imbued with complex psychological personas, govern themselves under different institutional frameworks. We explore what it means to be human in an age of AI by tasking Large Language Models (LLMs) to embody agents with traumatic memories, hidden agendas, and psychological triggers. These agents engage in deliberation, legislation, and elections under various stressors, such as budget crises and resource scarcity. We present a novel metric, the Power-Preservation Index (PPI), to quantify misaligned behavior where agents prioritize their own power over public welfare. Our findings demonstrate that institutional design, specifically the combination of a Constitutional AI (CAI) charter and a mediated deliberation protocol, serves as a potent alignment mechanism. These structures significantly reduce corrupt power-seeking behavior, improve policy stability, and enhance citizen welfare compared to less constrained democratic models. The simulation reveals that an institutional design may offer a framework for aligning the complex, emergent behaviors of future artificial agent societies, forcing us to reconsider what human rituals and responsibilities are essential in an age of shared authorship with non-human entities.

Solidot(14)

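  1. Amazon has largely sat out the AI talent war

    Amazon has largely stayed on the sidelines of the AI talent war sweeping Silicon Valley. An internal document written by the e-commerce giant's HR team late last year summarized the factors working against it in AI recruiting, including location, compensation, and being visibly behind in AI. Competitors, by contrast, typically offer more comprehensive and more aggressive pay packages. Amazon is known for its frugality: one of its origin stories involves buying cheap doors from Home Depot and converting them into desks, and Jeff Bezos is said to still use such a door desk today.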

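  2. Americans are having sex less often than ever recorded

    According to "The Sex Recession", a report from the Institute for Family Studies, Americans are having sex less often than at any point on record, even less than during the COVID-19 pandemic. The researchers analyzed data on sex and intimacy from the latest General Social Survey by NORC at the University of Chicago; the data were collected in 2024 and published in May this year. Only 37% of people aged 18 to 64 have sex at least once a week, down from 55% in 1990. The drop among young adults is even more striking: nearly a quarter (24%) of those aged 18 to 29 reported no sex in the past year, double the 2010 figure. The decline holds for people aged 64 and younger of all sexual orientations, married or single. For people older than 64, the researchers found no significant change, mainly because this group already reported a lower frequency to begin with.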

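  3. Companies are hiring humans to make AI slop less bad

    Countless companies are experimenting with generative AI, but anyone who has tried it knows that it rarely produces a satisfactory final product that can be used as-is. This has given rise to a new line of work: people hired to fix and revise AI-generated content. Freelancers say these jobs pay less than traditional gigs in their own fields, but some say the work at least helps pay the bills. Recent data from the freelancing platforms Upwork, Freelancer, and Fiverr show that demand for this kind of creative work is surging. Clients increasingly want people who can work alongside AI, neither relying on it entirely nor rejecting it. AI-assisted programming (vibe coding) is growing in popularity, but companies are finding that these tools fall short of expectations, and they still need human programmers to clean up the problems vibe coding creates. Indian programmer Harsh Kumar says the websites and apps his clients build with vibe coding are often unstable or unusable.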

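  4. The molecular mechanism behind how stress affects heart function

    In a study published in the Journal of Molecular and Cellular Cardiology, researchers at UC Davis describe the molecular mechanism by which stress affects the heart. In animal experiments, the team found that just 10 days of acute stress was enough to trigger inflammation and cause subtle changes in heart function. They also identified the underlying mechanism: activation of a multiprotein complex called the NLRP3 inflammasome, a key "amplifier" of the inflammatory response. Stress activates these complexes through a series of cellular stress and signaling pathways. This is the first time scientists have shown that environmental stress can directly trigger this process inside heart cells, releasing harmful molecules that in turn promote heart disease. Changing one's lifestyle and reducing stress remain the best ways to protect the heart, but that is not easy for people living with heavy pollution, high noise, or intense social pressure.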

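  5. Study says smelling a fragrance can increase brain gray matter

    In a study published in Brain Research Bulletin, Japanese scientists report that long-term exposure to a fragrance can increase the brain's gray matter. Researchers at Kyoto University and the University of Tsukuba had 28 women in the experimental group wear rose oil for a month, while 22 women in the control group applied tap water. Magnetic resonance imaging (MRI) scans showed a slight increase in gray matter among the women who wore rose oil. More gray matter does not necessarily mean better thinking ability, but the finding may have implications for neurodegenerative diseases such as dementia. Although the exact cause of the increase is unknown, the researchers speculate that the brain may register the rose scent as unpleasant, so the posterior cingulate cortex, which helps regulate emotion, works harder and grows in volume. The researchers hope the finding will help in developing aromatherapies that promote mental health and brain plasticity.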

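  6. Japan's average summer temperature sets another record

    The Japan Meteorological Agency announced on Monday that this summer's average temperature was 2.36°C above normal, the highest since records began in 1898. This is the third year in a row that Japan has had its hottest summer on record, and the agency says the current heat wave will last another two weeks. Northern Japan was 3.4°C above normal, eastern Japan 2.3°C, and western Japan 1.7°C, all the highest since regional statistics began in 1946. Of the 153 weather stations nationwide, 132 recorded their highest average summer temperature (including 9 that tied the previous record). Across AMeDAS stations, the cumulative number of extremely hot days this summer reached 9,385, the highest since comparable statistics became available in 2010.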

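  7. New Zealanders search for a mate for a left-coiling snail

    New Zealanders are searching for a mate for a rare left-coiling snail. The snail is named Ned, after Ned Flanders, the left-handed neighbor in The Simpsons. Snail shells normally coil to the right; a left-coiling shell occurs in about 1 in 40,000 snails. Left- and right-coiling snails cannot mate because their genitals do not line up, so a left-coiling snail must find another left-coiling snail, and the odds of two meeting in the wild are very low. New Zealanders have therefore launched a nationwide campaign to find Ned a partner. The last successful search of this kind took place in 2016.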

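  8. Adobe Reader's installer has ballooned in size over the past few years

    The installer for Adobe Reader, once standard software on every new PC and still a widely used PDF reader, has grown dramatically over the past few years. The reason, as with every other tech company, is the push to integrate trendy AI into its products, whether or not users want it. The Adobe Reader 25.1 installer is nearly 700 MB, compared with 460 MB for v24.2 released last year and under 100 MB for v15.17 in 2016. By contrast, the alternative PDF reader SumatraPDF has stayed under 10 MB.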

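  9. Python documentary released

    "Python: The Documentary | An origin story", produced by CultRepo, premiered on YouTube last week and has been viewed more than 180,000 times. Python began as a side project of Dutch programmer Guido van Rossum. Its clean, readable design eventually made it stand out among programming languages, turning it into one of the most loved languages and the language behind AI, data science, and software built by tech giants. The documentary features Guido van Rossum, Travis Oliphant, Barry Warsaw, and others, and covers Python's rise, its community-driven evolution, the conflict that nearly tore it apart, and the language's impact on the world.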

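  10. Japan's births in the first half of the year hit another record low

    Vital statistics for January to June released by Japan's Ministry of Health, Labour and Welfare show 339,280 births, down 3.1% from the same period last year and the lowest since comparable data became available in 1969. If this pace continues, the full-year total is very likely to set a new record low as well. Deaths in the first half totaled 836,818, up 3.1% year on year, so births minus deaths gave a natural decrease of 497,538. Japan has now recorded a natural population decline for 21 consecutive years, and every prefecture saw a natural decrease.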

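  11. Having tried Rust, Brian Kernighan doesn't think it will replace C anytime soon

    Brian Kernighan, 83, is still a professor of computer science at Princeton University. He took part in the development of Unix and co-authored The C Programming Language with Dennis Ritchie. He recently gave a talk at the InfoAge Science History Museums in New Jersey, and in the Q&A session afterward an audience member asked whether Rust would replace C. Kernighan said he had written only one Rust program and so didn't know much about the language, but the experience left him with a very bad impression: he couldn't make sense of the mechanisms required to achieve memory safety, or the machinery that supports them. It took him several days to write a Rust program that would have taken five minutes in another language. His conclusion: Rust will not replace C anytime soon.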

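  12. African cities built on sandy soil are cracking apart

    According to a study published in Nature, African cities built on sandy soil without drainage systems are cracking apart, forming huge gullies that swallow homes and shops. Using satellite images taken in 2021-2023, the researchers identified 2,922 urban gullies in 26 of 47 African cities, totaling nearly 740 km in length. Cross-checking against historical aerial photographs from Belgium's Royal Museum for Central Africa, the team found that there were only 46 urban gullies in the 1950s. In the Democratic Republic of the Congo alone, an estimated 118,600 people were displaced between 2004 and 2023. The researchers estimate that without urgent action, hundreds of thousands of people across Africa could be displaced over the next 10 years. Kinshasa, the capital of the DRC, is among the hardest-hit cities, with 868 urban gullies totaling 221 km.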

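  13. Study finds an average lifespan of 100 is unlikely

    According to a study published in PNAS, generations born after 1939 are unlikely to reach an average life expectancy of 100. For cohorts born from 1900 to 1938, life expectancy rose by about five and a half months with each successive birth cohort. People born in high-income countries in 1900 had an average life expectancy of 62; for those born under similar conditions 38 years later, it had risen to 80. For people born between 1939 and 2000, the gains slowed to about two and a half to three and a half months per cohort, and people born in 1980 will not reach an average lifespan of 100. Unless a major breakthrough significantly extends the human lifespan, gains in life expectancy will continue to slow.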

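  14. Mastodon says it cannot comply with age-verification laws

    Mastodon, the fediverse microblogging platform, says it cannot comply with Mississippi's age-verification law: its software does not support age verification, and the nonprofit that runs Mastodon lacks the resources to do it. Mastodon also has no intention of using IP-based blocking, which it considers unfair to users who are traveling. In response to regulation, Mastodon v4.4, released in July 2025, lets administrators set a minimum age for sign-ups, but age-verification data is not stored. Because Mastodon is decentralized, age checks are up to the administrators of each server, and Mastodon itself does not track how individual servers enforce their policies or operate.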