OrangeBot.AI Digest — 2025-11-30

47 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. "Boobs check" – Technique to verify if sites behind CDN are hosted in Iran (twitter.com)
  2. Writing a good Claude.md (www.humanlayer.dev)
  3. NixOS 25.11 released (nixos.org)
  4. Don't push AI down our throats (gpt3experiments.substack.com)
  5. The Thinking Game Film – Google DeepMind documentary (thinkinggamefilm.com)
  6. Modern cars are spying on you. Here's what you can do about it (apnews.com)
  7. Migrating Dillo from GitHub (dillo-browser.org)
  8. Windows drive letters are not limited to A-Z (www.ryanliptak.com)
  9. Norway wealth fund to vote for human rights report at Microsoft, against Nadella (www.cnbc.com)
  10. Paul Hegarty's updated CS193p SwiftUI course released by Stanford (cs193p.stanford.edu)
  11. Advent of Code 2025 (adventofcode.com)
  12. CachyOS: Fast and Customizable Linux Distribution (cachyos.org)
  13. Show HN: Real-time system that tracks how news spreads across 200k websites (yandori.io)
  14. What's Hiding Inside Haribo's Power Bank and Headphones? (www.lumafield.com)
  15. Zigbook Is Plagiarizing the Zigtools Playground (zigtools.org)

GitHub Trending (15)

  1. sansan0 / TrendRadar

    🎯 Say goodbye to information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. Multi-platform trend aggregation plus MCP-based AI analysis tools. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, and more), with smart filtering, automatic push notifications, and conversational AI analysis (dig into the news in natural language: trend tracking, sentiment analysis, similarity search, and 13 tools in total). Pushes to WeCom / personal WeChat / Feishu / DingTalk / Telegram / email / ntfy / bark / Slack. 30-second web deployment, phone notifications in 1 minute, no programming required. Docker deployment supported. ⭐ Make the algorithm work for you, and use AI to understand what's trending.

  2. google / adk-go

    An open-source, code-first Go toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.

  3. TapXWorld / ChinaTextbook

    PDF textbooks for all levels: primary school, middle school, high school, and university.

  4. yeongpin / cursor-free-vip

    [Supports 0.49.x] Automatically resets the Cursor AI machine ID and bypasses the token limit, unlocking Pro features for free. Works around the errors: "You've reached your trial request limit." / "Too many free trial accounts used on this machine. Please upgrade to pro. We have this limit in place to prevent abuse. Please let us know if you believe this is a mistake."

  5. nvm-sh / nvm

    Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions

  6. traefik / traefik

    The Cloud Native Application Proxy

  7. HKUDS / LightRAG

    [EMNLP 2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"

  8. bobeff / open-source-games

    A list of open source games.

  9. volcengine / verl

    verl: Volcano Engine Reinforcement Learning for LLMs

  10. GibsonAI / Memori

    Open-Source Memory Engine for LLMs, AI Agents & Multi-Agent Systems

  11. yangshun / tech-interview-handbook

    Curated coding interview preparation materials for busy software engineers

  12. microsoft / call-center-ai

    Send a phone call from AI agent, in an API call. Or, directly call the bot from the configured phone number!

  13. MustardChef / WSABuilds

    Run Windows Subsystem For Android on your Windows 10 and Windows 11 PC using prebuilt binaries with Google Play Store (MindTheGapps) and/or Magisk or KernelSU (root solutions) built in.

  14. playcanvas / engine

    Powerful web graphics runtime built on WebGL, WebGPU, WebXR and glTF

  15. iptv-org / iptv

    Collection of publicly available IPTV channels from all over the world

Hugging Face (7)

  1. Video Generation Models Are Good Latent Reward Models

    Reward feedback learning (ReFL) has proven effective for aligning image generation with human preferences. However, its extension to video generation faces significant challenges. Existing video reward models rely on vision-language models designed for pixel-space inputs, confining ReFL optimization to near-complete denoising steps after computationally expensive VAE decoding. This pixel-space approach incurs substantial memory overhead and increased training time, and its late-stage optimization lacks early-stage supervision, refining only visual quality rather than fundamental motion dynamics and structural coherence. In this work, we show that pre-trained video generation models are naturally suited for reward modeling in the noisy latent space, as they are explicitly designed to process noisy latent representations at arbitrary timesteps and inherently preserve temporal information through their sequential modeling capabilities. Accordingly, we propose Process Reward Feedback Learning (PRFL), a framework that conducts preference optimization entirely in latent space, enabling efficient gradient backpropagation throughout the full denoising chain without VAE decoding. Extensive experiments demonstrate that PRFL significantly improves alignment with human preferences, while achieving substantial reductions in memory consumption and training time compared to RGB ReFL.
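    The core idea in this abstract, scoring noisy latents directly instead of decoded pixels, can be illustrated with a toy sketch. The tensor shapes, noise schedule, and reward head below are invented for illustration only; PRFL itself reuses a pretrained video generation model as the latent critic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "clean" video latent: (frames, channels, height, width).
clean_latent = np.zeros((4, 8, 8, 8))

def add_noise(z, t):
    """Forward-diffuse a latent to timestep t in [0, 1] (toy schedule)."""
    return np.sqrt(1.0 - t) * z + np.sqrt(t) * rng.normal(size=z.shape)

def latent_reward(z):
    """Stand-in reward head that scores latents directly, with no VAE
    decode back to pixel space (the expensive step ReFL normally needs)."""
    return float(-np.mean(z ** 2))  # prefers latents closer to the clean one

# The point of process-level feedback: a reward signal is available at
# *every* timestep of the denoising chain, not only after full denoising.
timesteps = [0.9, 0.5, 0.1]
rewards = [latent_reward(add_noise(clean_latent, t)) for t in timesteps]
assert rewards[0] < rewards[-1]  # noisier latents score lower
```

    In a real setting the reward head would be differentiable, so the preference signal can be backpropagated through the whole denoising chain rather than only through the final decoded frames.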

  2. Canvas-to-Image: Compositional Image Generation with Multimodal Controls

    While modern diffusion models excel at generating high-quality and diverse images, they still struggle with high-fidelity compositional and multimodal control, particularly when users simultaneously specify text prompts, subject references, spatial arrangements, pose constraints, and layout annotations. We introduce Canvas-to-Image, a unified framework that consolidates these heterogeneous controls into a single canvas interface, enabling users to generate images that faithfully reflect their intent. Our key idea is to encode diverse control signals into a single composite canvas image that the model can directly interpret for integrated visual-spatial reasoning. We further curate a suite of multi-task datasets and propose a Multi-Task Canvas Training strategy that optimizes the diffusion model to jointly understand and integrate heterogeneous controls into text-to-image generation within a unified learning paradigm. This joint training enables Canvas-to-Image to reason across multiple control modalities rather than relying on task-specific heuristics, and it generalizes well to multi-control scenarios during inference. Extensive experiments show that Canvas-to-Image significantly outperforms state-of-the-art methods in identity preservation and control adherence across challenging benchmarks, including multi-person composition, pose-controlled composition, layout-constrained generation, and multi-control generation.

  3. ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction

    Embodied cognition argues that intelligence arises from sensorimotor interaction rather than passive observation. It raises an intriguing question: do modern vision-language models (VLMs), trained largely in a disembodied manner, exhibit signs of embodied cognition? We introduce ENACT, a benchmark that casts evaluation of embodied cognition as world modeling from egocentric interaction in a visual question answering (VQA) format. Framed as a partially observable Markov decision process (POMDP) whose actions are scene graph changes, ENACT comprises two complementary sequence reordering tasks: forward world modeling (reorder shuffled observations given actions) and inverse world modeling (reorder shuffled actions given observations). While conceptually simple, solving these tasks implicitly demands capabilities central to embodied cognition: affordance recognition, action-effect reasoning, embodied awareness, and interactive, long-horizon memory from partially observable egocentric input, while avoiding low-level image synthesis that could confound the evaluation. We provide a scalable pipeline that synthesizes QA pairs from robotics simulation (BEHAVIOR) and evaluates models on 8,972 QA pairs spanning long-horizon home-scale activities. Experiments reveal a performance gap between frontier VLMs and humans that widens with interaction horizon. Models consistently perform better on the inverse task than the forward one and exhibit anthropocentric biases, including a preference for right-handed actions and degradation when camera intrinsics or viewpoints deviate from human vision. Website at https://enact-embodied-cognition.github.io/.

  4. Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following

    Large multimodal models (LMMs) are increasingly adopted as judges in multimodal evaluation systems due to their strong instruction following and consistency with human preferences. However, their ability to follow diverse, fine-grained evaluation criteria remains underexplored. We develop Multi-Crit, a benchmark for evaluating multimodal judges on their capacity to follow pluralistic criteria and produce reliable criterion-level judgments. Covering both open-ended generation and verifiable reasoning tasks, Multi-Crit is built through a rigorous data curation pipeline that gathers challenging response pairs with multi-criterion human annotations. It further introduces three novel metrics for systematically assessing pluralistic adherence, criterion-switching flexibility, and the ability to recognize criterion-level preference conflicts. Comprehensive analysis of 25 LMMs reveals that 1) proprietary models still struggle to maintain consistent adherence to pluralistic criteria, especially in open-ended evaluation; 2) open-source models lag further behind in flexibly following diverse criteria; and 3) critic fine-tuning with holistic judgment signals enhances visual grounding but fails to generalize to pluralistic criterion-level judgment. Additional analyses on reasoning fine-tuning, test-time scaling, and boundary consistency between open-source and proprietary models further probe the limits of current multimodal judges. As a pioneering study, Multi-Crit lays the foundation for building reliable and steerable multimodal AI evaluation.

  5. MIRA: Multimodal Iterative Reasoning Agent for Image Editing

    Instruction-guided image editing offers an intuitive way for users to edit images with natural language. However, diffusion-based editing models often struggle to accurately interpret complex user instructions, especially those involving compositional relationships, contextual cues, or referring expressions, leading to edits that drift semantically or fail to reflect the intended changes. We tackle this problem by proposing MIRA (Multimodal Iterative Reasoning Agent), a lightweight, plug-and-play multimodal reasoning agent that performs editing through an iterative perception-reasoning-action loop, effectively simulating multi-turn human-model interaction processes. Instead of issuing a single prompt or static plan, MIRA predicts atomic edit instructions step by step, using visual feedback to make its decisions. Our 150K multimodal tool-use dataset, MIRA-Editing, combined with a two-stage SFT + GRPO training pipeline, enables MIRA to perform reasoning and editing over complex editing instructions. When paired with open-source image editing models such as Flux.1-Kontext, Step1X-Edit, and Qwen-Image-Edit, MIRA significantly improves both semantic consistency and perceptual quality, achieving performance comparable to or exceeding proprietary systems such as GPT-Image and Nano-Banana.

  6. Agentic Learner with Grow-and-Refine Multimodal Semantic Memory

    MLLMs exhibit strong reasoning on isolated queries, yet they operate de novo -- solving each problem independently and often repeating the same mistakes. Existing memory-augmented agents mainly store past trajectories for reuse. However, trajectory-based memory suffers from brevity bias, gradually losing essential domain knowledge. More critically, even in truly multimodal problem-solving settings, it records only a single-modality trace of past behavior, failing to preserve how visual attention and logical reasoning jointly contributed to the solution. This is fundamentally misaligned with human cognition: semantic memory is both multimodal and integrated, preserving visual and abstract knowledge through coordinated but distinct representational streams. We thus introduce ViLoMem, a dual-stream memory framework that constructs compact, schema-based memory. It separately encodes visual distraction patterns and logical reasoning errors, enabling MLLMs to learn from their successful and failed experiences. Following a grow-and-refine principle, the system incrementally accumulates and updates multimodal semantic knowledge -- preserving stable, generalizable strategies while avoiding catastrophic forgetting. Across six multimodal benchmarks, ViLoMem consistently improves pass@1 accuracy and substantially reduces repeated visual and logical errors. Ablations confirm the necessity of dual-stream memory with explicit distraction-hallucination separation, demonstrating the value of error-aware multimodal memory for lifelong and cross-domain agentic learning. Our project page will be available at https://weihao-bo.github.io/ViLoMeo-page.

  7. What does it mean to understand language?

    Language understanding entails not just extracting the surface-level meaning of the linguistic input, but constructing rich mental models of the situation it describes. Here we propose that because processing within the brain's core language system is fundamentally limited, deeply understanding language requires exporting information from the language system to other brain regions that compute perceptual and motor representations, construct mental models, and store our world knowledge and autobiographical memories. We review the existing evidence for this hypothesis, and argue that recent progress in cognitive neuroscience provides both the conceptual foundation and the methods to directly test it, thus opening up a new strategy to reveal what it means, cognitively and neurally, to understand language.

Solidot (10)

  1. Astronomers observe a coronal mass ejection from a red dwarf

    Astronomers have for the first time observed a coronal mass ejection (CME) from a red dwarf, the first direct detection of a high-energy Type II radio burst from a nearby star. The burst came from the M-type red dwarf StKM-1262, a star with only 60% of the Sun's mass, about 130 light-years away near the boundary of the constellation Draco. The captured CME was ten thousand to a hundred thousand times more powerful than a typical solar CME, with characteristics and intensity resembling the Sun's Type II bursts; such bursts, however, account for only 0.05% of all solar CME events and count as extreme. Extrapolating from the burst's speed and emission frequency, by the time the ejected plasma reached the inner edge of an M dwarf's habitable zone (0.2 AU), its particle density would be enough to compress the magnetosphere of an Earth-like magnetized planet down to the planet's surface, a devastating blow to its atmosphere.

  2. Dutch universities rethink their dependence on Microsoft software

    Earlier this year, after the International Criminal Court in The Hague issued arrest warrants on war-crimes charges for Israeli Prime Minister Netanyahu and former defense minister Yoav Gallant, US President Trump sanctioned chief prosecutor Karim Khan and others. Microsoft promptly blocked Khan's email account, forcing him to switch to the Swiss email service Proton. The episode has prompted European governments and educational institutions to reconsider their reliance on US tech companies, among them the Dutch universities. Their students, faculty, and IT administrators all depend heavily on Microsoft software, and the institutions store large amounts of data in Microsoft's cloud. Seven Dutch universities and one university college have been placed on a sanctions list by the US state of Florida for cutting or freezing ties with Israeli institutions, and under the capricious Trump administration, Dutch educational institutions could face punishment at any time. But can they break free of Microsoft? Professors point out that dropping Microsoft software would bring teaching and research to an immediate halt, argue that dependence on tech giants fundamentally conflicts with the public values of freedom, independence, autonomy, and equality, and call for cooperation with other European universities to build autonomous IT infrastructure.

  3. Airbus updates software on 6,000 jets worldwide over solar radiation that can corrupt flight-control data

    In October, an Airbus jet operated by JetBlue Airways on a US-Mexico route suffered a sudden loss of altitude and made an emergency landing in Florida; at least 15 people were injured. The investigation found that intense solar radiation can corrupt data critical to flight-control functions. To ensure flight safety, Airbus announced an emergency software update for roughly 6,000 jets worldwide. Most affected are A320s, along with some A318, A319, and A321 aircraft. About 5,100 of the jets can return to service after a software update alone; roughly 900 older aircraft need their onboard computers replaced and are grounded until the replacement is complete.

  4. Avian influenza viruses can withstand high fever

    According to a study published in Science, avian influenza viruses can withstand high fever, posing a serious threat to humans. Fever is one of the body's defenses against viral spread and can raise body temperature as high as 41°C; the gut of the avian flu virus's natural hosts, however, can reach 40°C to 42°C. Mouse experiments found that raising body temperature to fever levels effectively blocks replication of human-derived influenza viruses but is unlikely to block replication of avian influenza viruses. The researchers note that, fortunately, human infections with avian flu remain rare, a few dozen cases a year, but the fatality rate is still alarming: H5N1 infections kill more than 40% of those infected.

  5. Stranger Things defined the algorithm era

    In the summer of 2016, streaming companies were still figuring out how to make original series: experimental work like Sense8, nonlinear storytelling like the Arrested Development revival, or prestige drama like House of Cards? A popcorn horror thriller set in the 1980s, Stranger Things, supplied the answer, and it came to define the algorithmic era we live in. Its fifth and final season premiered this week. The show blends the coming-of-age themes of Steven Spielberg's films, the horror and teenage friendship of Stephen King's novels, the supernatural fare of the 1980s, the teen rivalries of John Hughes movies... It is like a Halloween bowl full of nostalgic candy, deftly evoking the pop culture its audience once loved; in other words, it is a human-made version of the algorithm. Creating new stories by ingesting and regurgitating old ones is also the signature method of generative AI. Stranger Things is far better than AI slop; at least the kids have the right number of fingers. But however chilling its supernatural horror, it is ultimately trading on nostalgia.

  6. TikTok tops 42 million monthly active users in Japan

    TikTok announced that the app now has more than 42 million monthly active users in Japan, roughly double the 21.2 million of November 2022, about three years ago. Globally it has over a billion users. TikTok launched its e-commerce feature, TikTok Shop, in Japan this June; it runs e-commerce in 18 markets worldwide. With the shopping feature added to the existing app, users can buy goods without leaving it.

  7. New EU law holds social media liable for financial scams

    Under a deal reached by EU lawmakers in the early hours of Thursday, social media platforms including Meta and TikTok will be held liable for financial fraud. Social media is rife with financial scams, and European lawmakers pushed to hold big tech companies and banks accountable, while EU member states argued that banks should be liable if their safeguards are inadequate. Under the new law, a bank must compensate a victim if a scammer impersonates the bank to steal money, or if the bank processes a payment without consent; and if a social media company fails to promptly remove reported online scams, it must compensate the bank. A European tech-industry group representing Amazon, Google, Meta, and Apple criticized the new law.

  8. Modern house cats descend from North African wildcats

    According to a study published in Science, domestic cats may have reached Europe far later than previously thought: they arrived only about 2,000 years ago, and not as a result of the outward expansion of Neolithic Near Eastern farmers. The findings offer new insight into the origins of one of humanity's most enigmatic animal companions and identify North Africa as the cradle of the modern house cat. The researchers ran paleogenomic analyses on 87 ancient and modern cat genomes. They found that domestic cats most likely descend from North African (rather than Levantine) wildcats, and that true domestic cats appeared in Europe and Southwest Asia only thousands of years after the start of the Neolithic; these findings contradict earlier work. Genetic analysis shows that the early cats found in Europe and Turkey were European wildcats, reflecting ancient hybridization rather than early domestication. Once North African domestic cats were introduced, they spread rapidly across Europe, often along Roman military roads, reaching Britain by the first century AD.

  9. Changing a recommendation algorithm's ranking can shift a person's political stance

    A new experiment published in Science used an AI-driven browser extension, independent of X's own algorithm, to rerank posts in the X/Twitter feed. It showed that even small changes in exposure to hostile political content can significantly shift users' feelings toward the opposing party within days. The findings provide direct causal evidence that algorithmic post ranking shapes users' social media feeds and, through them, their attitudes. Social media has become a major source of political information for many people worldwide, yet platform algorithms exert powerful influence over what we see as we scroll, quietly steering thoughts, emotions, and behavior in ways that are hard to understand. Many explanations have been proposed for how these ranking algorithms affect us, but testing them has proven extremely difficult, because platform operators alone control how their proprietary algorithms work, and only they can trial different feed designs and measure their causal effects. To sidestep these obstacles, the researchers built a browser extension that reorders the feed in real time as people browse, with no permission from the platform required. The study shows that algorithmically mediated exposure to politically hostile content can both shape affective polarization and modulate users' momentary emotional responses in real time as they use the platform.

  10. KDE Plasma 6.8 will be Wayland-only

    The KDE Plasma team announced that the upcoming v6.8 will support only Wayland, dropping X11. The last release in the Plasma 6.7 series is expected in early 2027, and X11 session support will continue until then. Users who want to stay on X11 can pick a long-term-support distribution that ships Plasma with X11, such as AlmaLinux 9, which is supported until 2032. X11 applications will still run through the Xwayland compatibility layer.