OrangeBot.AI Digest — 2025-08-18

59 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. Obsidian Bases (help.obsidian.md)
  2. T-Mobile claimed selling location data without consent is legal – judges disagree (arstechnica.com)
  3. Show HN: Whispering – Open-source, local-first dictation you can trust (github.com)
  4. Left to Right Programming (graic.net)
  5. Anna's Archive: An Update from the Team (annas-archive.org)
  6. Counter-Strike: A billion-dollar game built in a dorm room (www.nytimes.com)
  7. Who Invented Backpropagation? (people.idsia.ch)
  8. 95% of AI Pilots Failing (fortune.com)
  9. FFmpeg Assembly Language Lessons (github.com)
  10. Vibe coding tips and tricks (github.com)
  11. Show HN: I built an app to block Shorts and Reels (scrollguard.app)
  12. When you're asking AI chatbots for answers, they're data-mining you (www.theregister.com)
  13. Electromechanical reshaping, an alternative to laser eye surgery (medicalxpress.com)
  14. LLMs and coding agents are a security nightmare (garymarcus.substack.com)
  15. MCP doesn't need tools, it needs code (lucumr.pocoo.org)

GitHub Trending (12)

  1. coleam00 / Archon

    Beta release of Archon OS - the knowledge and task management backbone for AI coding assistants.

  2. emcie-co / parlant

    LLM agents built for control. Designed for real-world use. Deployed in minutes.

  3. DataExpert-io / data-engineer-handbook

    This is a repo with links to everything you'd ever want to learn about data engineering

  4. rasbt / LLMs-from-scratch

    Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

  5. enescingoz / awesome-n8n-templates

    Supercharge your workflow automation with this curated collection of n8n templates! Instantly connect your favorite apps, like Gmail, Telegram, Google Drive, Slack, and more, with ready-to-use, AI-powered automations. Save time, boost productivity, and unlock the true potential of n8n in just a few clicks.

  6. PixiEditor / PixiEditor

    PixiEditor is a Universal Editor for all your 2D needs

  7. immich-app / immich

    High performance self-hosted photo and video management solution.

  8. Shubhamsaboo / awesome-llm-apps

    Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini, and open-source models.

  9. MotiaDev / motia

    Modern Backend Framework that unifies APIs, background jobs, workflows, and AI agents into a single cohesive system with built-in observability and state management.

  10. OpenBB-finance / OpenBB

    Financial data aggregator for humans and AI agents.

  11. bytebot-ai / bytebot

    Bytebot is a self-hosted AI desktop agent that automates computer tasks through natural language commands, operating within a containerized Linux desktop environment.

  12. mfts / papermark

    Papermark is the open-source DocSend alternative with built-in analytics and custom domains.

Product Hunt (12)

  1. Stormy

    Stormy — AI agent for influencer marketing

  2. Mirror

    Deeply understand yourself and every relationship

  3. Dualite x Supabase

    Build full-stack applications with Dualite - securely

  4. TensorZero

    Open-source stack for industrial-grade LLM applications

  5. FileFaker

    Generate sample files of various types and sizes in seconds.

  6. Blink

    Deep code research, straight from Slack or your browser

  7. Autosana

    QA Agent for Mobile Apps

  8. Filo Mail for macOS

    Instantly turn your Mac inbox into a to-do list

  9. Gitmore

    The first AI-powered reporting tool for git repositories

  10. Extra Thursday

    Fly through your inbox just by talking

  11. RealRoots

    We guarantee women new friends in their city

  12. CarbonRunner

    Shift AI training & CI/CD to the lowest carbon regions!

Hugging Face (13)

  1. SSRL: Self-Search Reinforcement Learning

    We investigate the potential of large language models (LLMs) to serve as efficient simulators for agentic search tasks in reinforcement learning (RL), thereby reducing dependence on costly interactions with external search engines. To this end, we first quantify the intrinsic search capability of LLMs via structured prompting and repeated sampling, which we term Self-Search. Our results reveal that LLMs exhibit strong scaling behavior with respect to the inference budget, achieving high pass@k on question-answering benchmarks, including the challenging BrowseComp task. Building on these observations, we introduce Self-Search RL (SSRL), which enhances LLMs' Self-Search capability through format-based and rule-based rewards. SSRL enables models to iteratively refine their knowledge utilization internally, without requiring access to external tools. Empirical evaluations demonstrate that SSRL-trained policy models provide a cost-effective and stable environment for search-driven RL training, reducing reliance on external search engines and facilitating robust sim-to-real transfer. We draw the following conclusions: 1) LLMs possess world knowledge that can be effectively elicited to achieve high performance; 2) SSRL demonstrates the potential of leveraging internal knowledge to reduce hallucination; 3) SSRL-trained models integrate seamlessly with external search engines without additional effort. Our findings highlight the potential of LLMs to support more scalable RL agent training.
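
The pass@k numbers cited above are conventionally computed with the standard unbiased estimator (generate n samples per question, count c correct). A minimal sketch, with the function name being ours rather than the paper's:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations (c correct), passes."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: every draw of k must include a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 generations and 3 correct, pass@1 equals c/n = 0.3,
# while pass@5 rises sharply with the larger inference budget.
```

This is how the "strong scaling behavior with respect to the inference budget" is measured: pass@k grows with k even when pass@1 is modest.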

  2. Thyme: Think Beyond Images

    Following OpenAI's introduction of the "thinking with images" concept, recent efforts have explored stimulating the use of visual information in the reasoning process to enhance model performance in perception and reasoning tasks. However, to the best of our knowledge, no open-source work currently offers a feature set as rich as proprietary models (O3), which can perform diverse image manipulations and simultaneously enhance logical reasoning capabilities through code. In this paper, we make a preliminary attempt in this direction by introducing Thyme (Think Beyond Images), a novel paradigm for enabling MLLMs to transcend existing "think with images" approaches by autonomously generating and executing diverse image processing and computational operations via executable code. This approach not only facilitates a rich, on-the-fly set of image manipulations (e.g., cropping, rotation, contrast enhancement) but also allows for mathematical computations, all while maintaining high autonomy in deciding when and how to apply these operations. We activate this capability through a two-stage training strategy: an initial SFT on a curated dataset of 500K samples to teach code generation, followed by an RL phase to refine decision-making. For the RL stage, we manually collect and design high-resolution question-answer pairs to increase the learning difficulty, and we propose GRPO-ATS (Group Relative Policy Optimization with Adaptive Temperature Sampling), an algorithm that applies distinct temperatures to text and code generation to balance reasoning exploration with code execution precision. We conduct extensive experimental analysis and ablation studies. Comprehensive evaluations on nearly 20 benchmarks show that Thyme yields significant and consistent performance gains, particularly in challenging high-resolution perception and complex reasoning tasks.
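
The core of adaptive temperature sampling, decoding text and code tokens at different temperatures, can be sketched as follows. The specific temperature values and the `in_code_block` switch are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def sample_token(logits, in_code_block, t_text=1.0, t_code=0.3, rng=None):
    """Sample one token id: lower temperature inside code spans (favoring
    execution precision), higher temperature in free text (exploration)."""
    rng = rng or np.random.default_rng()
    t = t_code if in_code_block else t_text
    z = np.asarray(logits, dtype=np.float64) / t
    z -= z.max()            # subtract max before exp for numerical stability
    p = np.exp(z)
    p /= p.sum()
    return int(rng.choice(len(p), p=p))
```

At very low temperature the distribution collapses onto the argmax token, which is the intuition behind using a colder setting for code: a single wrong token can break executability, whereas prose tolerates variation.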

  3. DINOv3

    Self-supervised learning holds the promise of eliminating the need for manual data annotation, enabling models to scale effortlessly to massive datasets and larger architectures. By not being tailored to specific tasks or domains, this training paradigm has the potential to learn visual representations from diverse sources, ranging from natural to aerial images -- using a single algorithm. This technical report introduces DINOv3, a major milestone toward realizing this vision by leveraging simple yet effective strategies. First, we leverage the benefit of scaling both dataset and model size by careful data preparation, design, and optimization. Second, we introduce a new method called Gram anchoring, which effectively addresses the known yet unsolved issue of dense feature maps degrading during long training schedules. Finally, we apply post-hoc strategies that further enhance our models' flexibility with respect to resolution, model size, and alignment with text. As a result, we present a versatile vision foundation model that outperforms the specialized state of the art across a broad range of settings, without fine-tuning. DINOv3 produces high-quality dense features that achieve outstanding performance on various vision tasks, significantly surpassing previous self- and weakly-supervised foundation models. We also share the DINOv3 suite of vision models, designed to advance the state of the art on a wide spectrum of tasks and data by providing scalable solutions for diverse resource constraints and deployment scenarios.
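
Gram anchoring, as described, constrains the patch-to-patch similarity structure of dense features rather than the features themselves. A schematic version of such a loss, where the cosine-Gram construction and mean-squared form are our assumptions rather than the paper's exact objective:

```python
import numpy as np

def gram_anchoring_loss(feats, anchor_feats):
    """Penalize drift between the Gram (patch-similarity) matrix of the
    current dense features and that of an earlier 'anchor' checkpoint.
    Both inputs are (num_patches, dim) arrays."""
    def gram(x):
        x = x / np.linalg.norm(x, axis=1, keepdims=True)  # cosine similarities
        return x @ x.T
    diff = gram(feats) - gram(anchor_feats)
    return float((diff ** 2).mean())
```

Because the Gram matrix is invariant to per-patch rescaling, this kind of anchor leaves the features free to keep improving globally while discouraging the degradation of local similarity structure over long training schedules.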

  4. BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining

    Recent advances in large language model (LLM) pretraining have shown that simply scaling data quantity eventually leads to diminishing returns, hitting a data wall. In response, the use of synthetic data for pretraining has emerged as a promising paradigm for pushing the frontier of performance. Despite this, the factors affecting synthetic data quality remain poorly understood. In this work, we introduce BeyondWeb, a synthetic data generation framework that produces high-quality synthetic data for pretraining. BeyondWeb significantly extends the capabilities of traditional web-scale datasets, outperforming state-of-the-art synthetic pretraining datasets such as Cosmopedia and Nemotron-CC's high-quality synthetic subset (Nemotron-Synth) by up to 5.1 percentage points (pp) and 2.6pp, respectively, when averaged across a suite of 14 benchmark evaluations. It delivers up to 7.7x faster training than open web data and 2.7x faster than Nemotron-Synth. Remarkably, a 3B model trained for 180B tokens on BeyondWeb outperforms an 8B model trained for the same token budget on Cosmopedia. We also present several insights from BeyondWeb on synthetic data for pretraining: what drives its benefits, which data to rephrase and how, and the impact of model size and family on data quality. Overall, our work shows that there's no silver bullet for generating high-quality synthetic pretraining data. The best outcomes require jointly optimizing many factors, a challenging task that requires rigorous science and practical expertise. Naive approaches can yield modest improvements, potentially at great cost, while well-executed methods can yield transformative improvements, as exemplified by BeyondWeb.

  5. PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing

    Paper search is an important activity for researchers, typically involving a query that describes a topic in order to find relevant papers. As research deepens, paper search requirements can become more flexible, sometimes involving specific details such as module configuration rather than being limited to coarse-grained topics. However, previous paper search systems cannot meet these flexible-grained requirements, as they mainly index paper abstracts, which lack the detailed information needed to support retrieval with finer-grained queries. In this work, we propose PaperRegister, consisting of offline hierarchical indexing and online adaptive retrieval, which transforms the traditional abstract-based index into a hierarchical index tree for paper search, thereby supporting queries at flexible granularity. Experiments on paper search tasks across a range of granularities demonstrate that PaperRegister achieves state-of-the-art performance and particularly excels in fine-grained scenarios, highlighting its potential as an effective solution for flexible-grained paper search in real-world applications. Code for this work is at https://github.com/Li-Z-Q/PaperRegister.

  6. XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

    Although LLM inference has emerged as a critical workload for many downstream applications, efficiently inferring LLMs is challenging due to the substantial memory footprint and bandwidth requirements. In parallel, compute capabilities have steadily outpaced both memory capacity and bandwidth over the last few decades, a trend that remains evident in modern GPU hardware and exacerbates the challenge of LLM inference. As such, new algorithms are emerging that trade increased computation for reduced memory operations. To that end, we present XQuant, which takes advantage of this trend, enabling an order-of-magnitude reduction in memory consumption through low-bit quantization with substantial accuracy benefits relative to state-of-the-art KV cache quantization methods. We accomplish this by quantizing and caching the layer input activations X, instead of using standard KV caching, and then rematerializing the Keys and Values on-the-fly during inference. This results in an immediate 2× memory savings compared to KV caching. By applying XQuant, we achieve up to ~7.7× memory savings with <0.1 perplexity degradation compared to the FP16 baseline. Furthermore, our approach leverages the fact that X values are similar across layers. Building on this observation, we introduce XQuant-CL, which exploits the cross-layer similarity in the X embeddings for extreme compression. Across different models, XQuant-CL attains up to 10× memory savings relative to the FP16 baseline with only 0.01 perplexity degradation, and 12.5× memory savings with only 0.1 perplexity degradation. XQuant exploits the rapidly increasing compute capabilities of hardware platforms to eliminate the memory bottleneck, while surpassing state-of-the-art KV cache quantization methods and achieving near-FP16 accuracy across a wide range of models.
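
The rematerialization idea is simple to sketch: cache a (quantized) copy of the layer input X and recompute K and V from it at decode time instead of storing both. The quantizer below is a deliberately simple per-tensor uniform scheme, not the paper's low-bit method:

```python
import numpy as np

def quantize(x, bits=8):
    """Illustrative per-tensor uniform quantization to signed integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.abs(x).max()) / qmax or 1.0
    return np.round(x / scale).astype(np.int8), scale

def rematerialize_kv(q_x, scale, w_k, w_v):
    """Recover X from its quantized cache, then recompute K and V on the
    fly -- storing one tensor (X) instead of two (K and V)."""
    x = q_x.astype(np.float32) * scale
    return x @ w_k, x @ w_v
```

The trade is exactly the one the abstract describes: two extra matrix multiplies per decode step in exchange for halving what must be cached, a good deal on hardware where compute keeps outpacing memory bandwidth.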

  7. TexVerse: A Universe of 3D Objects with High-Resolution Textures

    We introduce TexVerse, a large-scale 3D dataset featuring high-resolution textures. While recent advances in large-scale 3D datasets have enhanced high-resolution geometry generation, creating high-resolution textures end-to-end remains underexplored due to the lack of suitable datasets. TexVerse fills this gap with a curated collection of over 858K unique high-resolution 3D models sourced from Sketchfab, including more than 158K models with physically based rendering (PBR) materials. Each model encompasses all of its high-resolution variants, bringing the total to 1.6M 3D instances. TexVerse also includes specialized subsets: TexVerse-Skeleton, with 69K rigged models, and TexVerse-Animation, with 54K animated models, both preserving original skeleton and animation data uploaded by the user. We also provide detailed model annotations describing overall characteristics, structural components, and intricate features. TexVerse offers a high-quality data resource with wide-ranging potential applications in texture synthesis, PBR material development, animation, and various 3D vision and graphics tasks.

  8. StyleMM: Stylized 3D Morphable Face Model via Text-Driven Aligned Image Translation

    We introduce StyleMM, a novel framework that can construct a stylized 3D Morphable Model (3DMM) based on user-defined text descriptions specifying a target style. Building upon a pre-trained mesh deformation network and a texture generator for original 3DMM-based realistic human faces, our approach fine-tunes these models using stylized facial images generated via text-guided image-to-image (i2i) translation with a diffusion model, which serve as stylization targets for the rendered mesh. To prevent undesired changes in identity, facial alignment, or expressions during i2i translation, we introduce a stylization method that explicitly preserves the facial attributes of the source image. By maintaining these critical attributes during image stylization, the proposed approach ensures consistent 3D style transfer across the 3DMM parameter space through image-based training. Once trained, StyleMM enables feed-forward generation of stylized face meshes with explicit control over shape, expression, and texture parameters, producing meshes with consistent vertex connectivity and animatability. Quantitative and qualitative evaluations demonstrate that our approach outperforms state-of-the-art methods in terms of identity-level facial diversity and stylization capability. The code and videos are available at [kwanyun.github.io/stylemm_page](kwanyun.github.io/stylemm_page).

  9. FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation

    Recent advances in audio-driven portrait animation have demonstrated impressive capabilities. However, existing methods struggle to align with fine-grained human preferences across multiple dimensions, such as motion naturalness, lip-sync accuracy, and visual quality. This is due to the difficulty of optimizing among competing preference objectives, which often conflict with one another, and the scarcity of large-scale, high-quality datasets with multidimensional preference annotations. To address these challenges, we first introduce Talking-Critic, a multimodal reward model that learns human-aligned reward functions to quantify how well generated videos satisfy multidimensional expectations. Leveraging this model, we curate Talking-NSQ, a large-scale multidimensional human preference dataset containing 410K preference pairs. Finally, we propose Timestep-Layer adaptive multi-expert Preference Optimization (TLPO), a novel framework for aligning diffusion-based portrait animation models with fine-grained, multidimensional preferences. TLPO decouples preferences into specialized expert modules, which are then fused across timesteps and network layers, enabling comprehensive, fine-grained enhancement across all dimensions without mutual interference. Experiments demonstrate that Talking-Critic significantly outperforms existing methods in aligning with human preference ratings. Meanwhile, TLPO achieves substantial improvements over baseline models in lip-sync accuracy, motion naturalness, and visual quality, exhibiting superior performance in both qualitative and quantitative evaluations. Our project page: https://fantasy-amap.github.io/fantasy-talking2/

  10. X-Node: Self-Explanation is All We Need

    Graph neural networks (GNNs) have achieved state-of-the-art results in computer vision and medical image classification tasks by capturing structural dependencies across data instances. However, their decision-making remains largely opaque, limiting their trustworthiness in high-stakes clinical applications where interpretability is essential. Existing explainability techniques for GNNs are typically post-hoc and global, offering limited insight into individual node decisions or local reasoning. We introduce X-Node, a self-explaining GNN framework in which each node generates its own explanation as part of the prediction process. For every node, we construct a structured context vector encoding interpretable cues such as degree, centrality, clustering, feature saliency, and label agreement within its local topology. A lightweight Reasoner module maps this context into a compact explanation vector, which serves three purposes: (1) reconstructing the node's latent embedding via a decoder to enforce faithfulness, (2) generating a natural language explanation using a pre-trained LLM (e.g., Grok or Gemini), and (3) guiding the GNN itself via a "text-injection" mechanism that feeds explanations back into the message-passing pipeline. We evaluate X-Node on two graph datasets derived from MedMNIST and MorphoMNIST, integrating it with GCN, GAT, and GIN backbones. Our results show that X-Node maintains competitive classification accuracy while producing faithful, per-node explanations. Repository: https://github.com/basiralab/X-Node.
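
The structured context vector described (degree, centrality, clustering, feature saliency, label agreement) is easy to picture on a toy graph. A sketch with adjacency-matrix inputs, using a simple degree-centrality proxy; these specific formulas are illustrative, not the paper's:

```python
import numpy as np

def node_context(adj, feats, labels, i):
    """Interpretable cues for node i: degree, degree centrality,
    local clustering coefficient, feature saliency, label agreement."""
    nbrs = np.flatnonzero(adj[i])
    deg = len(nbrs)
    centrality = deg / (len(adj) - 1)
    if deg >= 2:
        sub = adj[np.ix_(nbrs, nbrs)]               # neighbour subgraph
        clustering = sub.sum() / (deg * (deg - 1))  # fraction of closed pairs
    else:
        clustering = 0.0
    saliency = float(np.abs(feats[i]).max())        # strongest input feature
    agreement = float((labels[nbrs] == labels[i]).mean()) if deg else 0.0
    return np.array([deg, centrality, clustering, saliency, agreement])
```

Each entry of the vector has a direct verbal reading ("well-connected, all neighbours interlinked, half of them share my label"), which is what makes it usable as input to both a decoder and a language model for explanation generation.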

  11. Controlling Multimodal LLMs via Reward-guided Decoding

    As Multimodal Large Language Models (MLLMs) gain widespread applicability, it is becoming increasingly desirable to adapt them for diverse user needs. In this paper, we study the adaptation of MLLMs through controlled decoding. To achieve this, we introduce the first method for reward-guided decoding of MLLMs and demonstrate its application in improving their visual grounding. Our method involves building reward models for visual grounding and using them to guide the MLLM's decoding process. Concretely, we build two separate reward models to independently control the degree of object precision and recall in the model's output. Our approach enables on-the-fly controllability of an MLLM's inference process in two ways: first, by giving control over the relative importance of each reward function during decoding, allowing a user to dynamically trade off object precision for recall in image captioning tasks; second, by giving control over the breadth of the search during decoding, allowing the user to control the trade-off between the amount of test-time compute and the degree of visual grounding. We evaluate our method on standard object hallucination benchmarks, showing that it provides significant controllability over MLLM inference, while consistently outperforming existing hallucination mitigation methods.
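
The controllability described, trading object precision against recall via reward weights at decode time, reduces to a weighted rescoring of candidate continuations. A toy sketch; the field names and the additive combination rule are our assumptions:

```python
def select_candidate(candidates, w_precision, w_recall):
    """Rerank candidate continuations by model log-probability plus a
    weighted sum of two reward-model scores, and return the best one."""
    def score(c):
        return (c["logprob"]
                + w_precision * c["r_precision"]
                + w_recall * c["r_recall"])
    return max(candidates, key=score)

# Hypothetical captions scored by two reward models: a terse caption with
# high object precision vs. a detailed one with high object recall.
caps = [
    {"text": "a dog", "logprob": -1.0, "r_precision": 0.9, "r_recall": 0.2},
    {"text": "a dog on a couch near a lamp", "logprob": -1.4,
     "r_precision": 0.5, "r_recall": 0.9},
]
```

Raising `w_precision` steers the decoder toward the terse, safer caption; raising `w_recall` steers it toward the detailed one, without retraining the underlying MLLM.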

  12. MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data

    Self-supervised learning holds great promise for remote sensing, but standard self-supervised methods must be adapted to the unique characteristics of Earth observation data. We take a step in this direction by conducting a comprehensive benchmark of fusion strategies and reconstruction target normalization schemes for multimodal, multitemporal, and multispectral Earth observation data. Based on our findings, we propose MAESTRO, a novel adaptation of the Masked Autoencoder, featuring optimized fusion strategies and a tailored target normalization scheme that introduces a spectral prior as a self-supervisory signal. Evaluated on four Earth observation datasets, MAESTRO sets a new state-of-the-art on tasks that strongly rely on multitemporal dynamics, while remaining highly competitive on tasks dominated by a single mono-temporal modality. Code to reproduce all our experiments is available at https://github.com/ignf/maestro.

  13. SPARSE Data, Rich Results: Few-Shot Semi-Supervised Learning via Class-Conditioned Image Translation

    Deep learning has revolutionized medical imaging, but its effectiveness is severely limited by insufficient labeled training data. This paper introduces a novel GAN-based semi-supervised learning framework specifically designed for low labeled-data regimes, evaluated across settings with 5 to 50 labeled samples per class. Our approach integrates three specialized neural networks -- a generator for class-conditioned image translation, a discriminator for authenticity assessment and classification, and a dedicated classifier -- within a three-phase training framework. The method alternates between supervised training on limited labeled data and unsupervised learning that leverages abundant unlabeled images through image-to-image translation rather than generation from noise. We employ ensemble-based pseudo-labeling that combines confidence-weighted predictions from the discriminator and classifier with temporal consistency through exponential moving averaging, enabling reliable label estimation for unlabeled data. Comprehensive evaluation across eleven MedMNIST datasets demonstrates that our approach achieves statistically significant improvements over six state-of-the-art GAN-based semi-supervised methods, with particularly strong performance in the extreme 5-shot setting where the scarcity of labeled data is most challenging. The framework maintains its superiority across all evaluated settings (5, 10, 20, and 50 shots per class). Our approach offers a practical solution for medical imaging applications where annotation costs are prohibitive, enabling robust classification performance even with minimal labeled data. Code is available at https://github.com/GuidoManni/SPARSE.
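
The ensemble pseudo-labeling step, confidence-weighted averaging of the two prediction heads plus EMA smoothing, can be sketched as below. The exact weighting rule, `alpha`, and `threshold` are illustrative assumptions, not the paper's values:

```python
import numpy as np

def pseudo_label(p_disc, p_clf, ema, alpha=0.9, threshold=0.8):
    """Fuse discriminator and classifier class probabilities, weighted by
    each head's confidence, smooth with an exponential moving average, and
    emit a pseudo-label only when the smoothed confidence is high enough."""
    w_d, w_c = p_disc.max(), p_clf.max()
    fused = (w_d * p_disc + w_c * p_clf) / (w_d + w_c)
    ema = alpha * ema + (1 - alpha) * fused      # temporal consistency
    ema = ema / ema.sum()
    label = int(ema.argmax()) if ema.max() >= threshold else None
    return label, ema
```

Gating on smoothed confidence is what keeps unreliable unlabeled samples out of the supervised loss, which matters most in the 5-shot regime the paper targets.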

Solidot (7)

  1. A protein that carries aging signals through the human body

    According to a study published in the journal Metabolism, researchers at Korea University College of Medicine report that a protein called ReHMGB1 carries aging signals through human blood. ReHMGB1 stands for Reduced High Mobility Group Box 1; it triggers cellular senescence, the permanent loss of a cell's function. Its effect is not only local: it spreads damage signals throughout the body via the bloodstream, especially under the stimulus of injury or disease. Studies in mice found that blocking ReHMGB1 signaling significantly accelerated muscle regeneration in mice with muscle damage, improved physical function, reduced signs of cellular senescence, and lessened systemic inflammation. ReHMGB1 also has a beneficial role: warning the body that a tissue is damaged and needs repair.

  2. 2025 Hugo Award winners announced

    The 83rd World Science Fiction Convention, held in Seattle, announced the winners of the 2025 Hugo Awards. Best Novel: The Tainted Cup by Robert Jackson Bennett, the first volume of the Shadow of the Leviathan series, set in the empire of Khanum, which is ringed by sea walls; each wet season the leviathans appear and are driven back, and the empire's citizens must keep constant watch for breaches in the walls; the story opens with a murder investigation. Best Novella: The Tusks of Extinction by Ray Nayler. Best Novelette: The Four Sisters Overlooking the Sea by Naomi Kritzer. Best Short Story: Stitched to Skin Like Family Is by Nghi Vo. Best Series: the Between Earth and Sky series by Rebecca Roanhorse. Best science fiction TV episode: "The New Next Generation", season 5 episode 10 of Star Trek: Lower Decks. Best film: Dune: Part Two. Best game: Caves of Qud (other finalists included Dragon Age 4, a Legend of Zelda title, and 1000xRESIST).

  3. FFmpeg migrates to Forgejo

    FFmpeg development has migrated to a self-hosted Forgejo platform at code.ffmpeg.org. Forgejo, a fork of Gitea, is a GitHub-like platform for Git software development and version control, with support for bug tracking, wikis, and code review. A key reason for the move is that, as more developers join the FFmpeg project, the existing mailing-list-based development model no longer meets its needs. The mailing lists will remain in use for the short term, but their use will gradually wind down over time.

  4. Croatia extends its digital nomad visa to three years

    Croatia has extended the validity of its digital nomad visa from one year to three, allowing non-EU residents and their close family members to live and work remotely in the country. Digital nomad visas are short-term visas, valid for six months to a year in most countries. Croatia's updated policy allows stays of up to three years and lets close family members join; "close family" here means partners who have cohabited for more than three years without children, or partners with children who have cohabited for less than three years. Local officials say the move aims to attract more talent to live and work in the country. Croatia's cost of living is relatively low, though its network infrastructure still needs improvement.

  5. EV sales up 27% year-over-year in the first seven months of the year

    Data show that more than 10.7 million electric vehicles were sold in the first seven months of this year, up 27% year-over-year. China accounted for 6.5 million of them (up 29%), Europe 2.3 million (up 30%), North America 1 million (up 2%), and the rest of the world 900,000 (up 42%). Within Europe, Germany, the UK, and Italy all posted strong growth, with battery-electric sales up 30% and plug-in hybrids up 32%. The US EV market, by contrast, remained weak amid policy headwinds.

  6. PuTTY has a new official website

    The well-known open-source terminal emulator PuTTY has long lived at an unwieldy URL: www.chiark.greenend.org.uk/~sgtatham/putty. The developers once confidently stated in the site's FAQ that users would not end up at the wrong address, because the first Google result for "PuTTY" was the official site. Today, however, the first result is www.putty.org, a third-party site run by an anti-vaccine activist who took to spreading misinformation during the pandemic. The matter sparked heated discussion in the open-source community, and on August 14 the development team announced it had registered a simpler, more memorable domain: putty.software.

  7. Germany's highest court partially overturns ruling that ad blockers do not infringe copyright

    德国出版商 Axel Springer 起诉广告屏蔽工具 Adblock Plus 开发商 Eyeo 一案已持续了十多年,基本上 Eyeo 一直是胜诉。2022 年汉堡上诉法院裁定 Adblock Plus 未侵犯网站版权,它被认为只是给予用户选择浏览器如何渲染页面。但今年 7 月 31 日,德国联邦最高法院部分推翻了汉堡法院的判决,将案件发回重审。Mozilla 官方博客发表声明,对此表达了关注,希望德国不会成为中国之后第二个禁止广告屏蔽工具的司法管辖区。北京火狐公司管理 Firefox 期间曾限制中国地区用户安装广告屏蔽扩展。