OrangeBot.AI Digest — 2025-11-29
52 headlines across 4 sources, aggregated for the day.
Hacker News (15)
- All it takes is for one to work out (alearningaday.blog)
- Be Like Clippy (be-clippy.com)
- Electric vehicle sales are booming in South America – without Tesla (www.reuters.com)
- Iceland declares ocean-current instability a national security risk (edition.cnn.com)
- Major AI conference flooded with peer reviews written by AI (www.nature.com)
- The CRDT Dictionary: A Field Guide to Conflict-Free Replicated Data Types (www.iankduncan.com)
- DNS LOC Record (2014) (blog.cloudflare.com)
- It's Always the Process, Stupid (its.promp.td)
- Hachi: An Image Search Engine (eagledot.xyz)
- Leak confirms OpenAI is preparing ads on ChatGPT for public roll out (www.bleepingcomputer.com)
- High air pollution could diminish exercise benefits by half – study (scienceclock.com)
- Belgian Police exposed using botnets to manipulate EU data law impact assessment (old.reddit.com)
- Anthony Bourdain's Lost Li.st's (bourdain.greg.technology)
- Show HN: Explore what the browser exposes about you (neberej.github.io)
- Garfield's Proof of the Pythagorean Theorem (en.wikipedia.org)
GitHub Trending (15)
- sansan0 / TrendRadar
🎯 Say goodbye to information overload — AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. Multi-platform trending-topic aggregation plus an MCP-based AI analysis tool. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, and more), with smart filtering, automatic push notifications, and AI chat analysis (mine the news in natural language with 13 tools: trend tracking, sentiment analysis, similarity search, and more). Supports push via WeCom, personal WeChat, Feishu, DingTalk, Telegram, email, ntfy, bark, and Slack; 30-second web deployment, phone notifications in 1 minute, no programming required. Docker deployment supported. ⭐ Make the algorithm work for you; understand trending topics with AI.
- google / adk-go
An open-source, code-first Go toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
- TapXWorld / ChinaTextbook
PDF textbooks for all Chinese primary, middle, and high schools and universities.
- yeongpin / cursor-free-vip
[Supports 0.49.x] (Reset Cursor AI MachineID & Bypass Higher Token Limit) Automatically resets the Cursor AI machine ID to unlock Pro features for free, bypassing messages such as: "You've reached your trial request limit." / "Too many free trial accounts used on this machine. Please upgrade to pro. We have this limit in place to prevent abuse. Please let us know if you believe this is a mistake."
- nvm-sh / nvm
Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions
- traefik / traefik
The Cloud Native Application Proxy
- HKUDS / LightRAG
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
- bobeff / open-source-games
A list of open source games.
- volcengine / verl
verl: Volcano Engine Reinforcement Learning for LLMs
- GibsonAI / Memori
Open-Source Memory Engine for LLMs, AI Agents & Multi-Agent Systems
- yangshun / tech-interview-handbook
Curated coding interview preparation materials for busy software engineers
- microsoft / call-center-ai
Place a phone call from an AI agent with a single API call, or call the bot directly from the configured phone number!
- MustardChef / WSABuilds
Run Windows Subsystem For Android on your Windows 10 and Windows 11 PC using prebuilt binaries with Google Play Store (MindTheGapps) and/or Magisk or KernelSU (root solutions) built in.
- playcanvas / engine
Powerful web graphics runtime built on WebGL, WebGPU, WebXR and glTF
- iptv-org / iptv
Collection of publicly available IPTV channels from all over the world
Hugging Face (7)
- Video Generation Models Are Good Latent Reward Models
Reward feedback learning (ReFL) has proven effective for aligning image generation with human preferences. However, its extension to video generation faces significant challenges. Existing video reward models rely on vision-language models designed for pixel-space inputs, confining ReFL optimization to near-complete denoising steps after computationally expensive VAE decoding. This pixel-space approach incurs substantial memory overhead and increased training time, and its late-stage optimization lacks early-stage supervision, refining only visual quality rather than fundamental motion dynamics and structural coherence. In this work, we show that pre-trained video generation models are naturally suited for reward modeling in the noisy latent space, as they are explicitly designed to process noisy latent representations at arbitrary timesteps and inherently preserve temporal information through their sequential modeling capabilities. Accordingly, we propose Process Reward Feedback Learning (PRFL), a framework that conducts preference optimization entirely in latent space, enabling efficient gradient backpropagation throughout the full denoising chain without VAE decoding. Extensive experiments demonstrate that PRFL significantly improves alignment with human preferences, while achieving substantial reductions in memory consumption and training time compared to RGB ReFL.
- Canvas-to-Image: Compositional Image Generation with Multimodal Controls
While modern diffusion models excel at generating high-quality and diverse images, they still struggle with high-fidelity compositional and multimodal control, particularly when users simultaneously specify text prompts, subject references, spatial arrangements, pose constraints, and layout annotations. We introduce Canvas-to-Image, a unified framework that consolidates these heterogeneous controls into a single canvas interface, enabling users to generate images that faithfully reflect their intent. Our key idea is to encode diverse control signals into a single composite canvas image that the model can directly interpret for integrated visual-spatial reasoning. We further curate a suite of multi-task datasets and propose a Multi-Task Canvas Training strategy that optimizes the diffusion model to jointly understand and integrate heterogeneous controls into text-to-image generation within a unified learning paradigm. This joint training enables Canvas-to-Image to reason across multiple control modalities rather than relying on task-specific heuristics, and it generalizes well to multi-control scenarios during inference. Extensive experiments show that Canvas-to-Image significantly outperforms state-of-the-art methods in identity preservation and control adherence across challenging benchmarks, including multi-person composition, pose-controlled composition, layout-constrained generation, and multi-control generation.
- ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
Embodied cognition argues that intelligence arises from sensorimotor interaction rather than passive observation. It raises an intriguing question: do modern vision-language models (VLMs), trained largely in a disembodied manner, exhibit signs of embodied cognition? We introduce ENACT, a benchmark that casts evaluation of embodied cognition as world modeling from egocentric interaction in a visual question answering (VQA) format. Framed as a partially observable Markov decision process (POMDP) whose actions are scene graph changes, ENACT comprises two complementary sequence reordering tasks: forward world modeling (reorder shuffled observations given actions) and inverse world modeling (reorder shuffled actions given observations). While conceptually simple, solving these tasks implicitly demands capabilities central to embodied cognition: affordance recognition, action-effect reasoning, embodied awareness, and interactive, long-horizon memory from partially observable egocentric input, while avoiding low-level image synthesis that could confound the evaluation. We provide a scalable pipeline that synthesizes QA pairs from robotics simulation (BEHAVIOR) and evaluates models on 8,972 QA pairs spanning long-horizon home-scale activities. Experiments reveal a performance gap between frontier VLMs and humans that widens with interaction horizon. Models consistently perform better on the inverse task than the forward one and exhibit anthropocentric biases, including a preference for right-handed actions and degradation when camera intrinsics or viewpoints deviate from human vision. Website at https://enact-embodied-cognition.github.io/.
- MIRA: Multimodal Iterative Reasoning Agent for Image Editing
Instruction-guided image editing offers an intuitive way for users to edit images with natural language. However, diffusion-based editing models often struggle to accurately interpret complex user instructions, especially those involving compositional relationships, contextual cues, or referring expressions, leading to edits that drift semantically or fail to reflect the intended changes. We tackle this problem by proposing MIRA (Multimodal Iterative Reasoning Agent), a lightweight, plug-and-play multimodal reasoning agent that performs editing through an iterative perception-reasoning-action loop, effectively simulating multi-turn human-model interaction processes. Instead of issuing a single prompt or static plan, MIRA predicts atomic edit instructions step by step, using visual feedback to make its decisions. Our 150K multimodal tool-use dataset, MIRA-Editing, combined with a two-stage SFT + GRPO training pipeline, enables MIRA to perform reasoning and editing over complex editing instructions. When paired with open-source image editing models such as Flux.1-Kontext, Step1X-Edit, and Qwen-Image-Edit, MIRA significantly improves both semantic consistency and perceptual quality, achieving performance comparable to or exceeding proprietary systems such as GPT-Image and Nano-Banana.
- What does it mean to understand language?
Language understanding entails not just extracting the surface-level meaning of the linguistic input, but constructing rich mental models of the situation it describes. Here we propose that because processing within the brain's core language system is fundamentally limited, deeply understanding language requires exporting information from the language system to other brain regions that compute perceptual and motor representations, construct mental models, and store our world knowledge and autobiographical memories. We review the existing evidence for this hypothesis, and argue that recent progress in cognitive neuroscience provides both the conceptual foundation and the methods to directly test it, thus opening up a new strategy to reveal what it means, cognitively and neurally, to understand language.
- Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following
Large multimodal models (LMMs) are increasingly adopted as judges in multimodal evaluation systems due to their strong instruction following and consistency with human preferences. However, their ability to follow diverse, fine-grained evaluation criteria remains underexplored. We develop Multi-Crit, a benchmark for evaluating multimodal judges on their capacity to follow pluralistic criteria and produce reliable criterion-level judgments. Covering both open-ended generation and verifiable reasoning tasks, Multi-Crit is built through a rigorous data curation pipeline that gathers challenging response pairs with multi-criterion human annotations. It further introduces three novel metrics for systematically assessing pluralistic adherence, criterion-switching flexibility, and the ability to recognize criterion-level preference conflicts. Comprehensive analysis of 25 LMMs reveals that 1) proprietary models still struggle to maintain consistent adherence to pluralistic criteria, especially in open-ended evaluation; 2) open-source models lag further behind in flexibly following diverse criteria; and 3) critic fine-tuning with holistic judgment signals enhances visual grounding but fails to generalize to pluralistic criterion-level judgment. Additional analyses on reasoning fine-tuning, test-time scaling, and boundary consistency between open-source and proprietary models further probe the limits of current multimodal judges. As a pioneering study, Multi-Crit lays the foundation for building reliable and steerable multimodal AI evaluation.
- Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
MLLMs exhibit strong reasoning on isolated queries, yet they operate de novo, solving each problem independently and often repeating the same mistakes. Existing memory-augmented agents mainly store past trajectories for reuse. However, trajectory-based memory suffers from brevity bias, gradually losing essential domain knowledge. More critically, even in truly multimodal problem-solving settings, it records only a single-modality trace of past behavior, failing to preserve how visual attention and logical reasoning jointly contributed to the solution. This is fundamentally misaligned with human cognition: semantic memory is both multimodal and integrated, preserving visual and abstract knowledge through coordinated but distinct representational streams. We thus introduce ViLoMem, a dual-stream memory framework that constructs compact, schema-based memory. It separately encodes visual distraction patterns and logical reasoning errors, enabling MLLMs to learn from their successful and failed experiences. Following a grow-and-refine principle, the system incrementally accumulates and updates multimodal semantic knowledge, preserving stable, generalizable strategies while avoiding catastrophic forgetting. Across six multimodal benchmarks, ViLoMem consistently improves pass@1 accuracy and substantially reduces repeated visual and logical errors. Ablations confirm the necessity of dual-stream memory with explicit distraction-hallucination separation, demonstrating the value of error-aware multimodal memory for lifelong and cross-domain agentic learning. Our project page will be available at https://weihao-bo.github.io/ViLoMeo-page.
Solidot (15)
- Astronomers observe a coronal mass ejection from a red dwarf
Astronomers have for the first time observed a coronal mass ejection (CME) from a red dwarf, in the first direct detection of a high-energy Type II radio burst from a nearby star. The source was the M-type red dwarf StKM-1262, which has only 60% of the Sun's mass and lies about 130 light-years away on the edge of the constellation Draco. The captured CME was ten thousand to a hundred thousand times more powerful than a typical solar CME, with characteristics and intensity similar to the Sun's Type II bursts; such bursts account for only 0.05% of all solar CME events and are considered extreme. Extrapolating from the burst's speed and emission frequency, by the time the ejected plasma reached the inner edge of the star's habitable zone (0.2 AU), its particle density would be high enough to compress the magnetosphere of a planet with an Earth-strength magnetic field down to the planet's surface, a devastating blow to any atmosphere.
- Dutch universities rethink their dependence on Microsoft software
Earlier this year, the International Criminal Court in The Hague issued arrest warrants for war crimes against Israeli Prime Minister Netanyahu and former defense minister Yoav Gallant. US President Trump sanctioned chief prosecutor Karim Khan and others, and Microsoft promptly blocked Khan's email account, forcing him to switch to the Swiss email service Proton. The incident has prompted European governments and educational institutions to reconsider their dependence on US technology companies, among them the Dutch universities. Students, faculty, and IT administrators rely heavily on Microsoft software, and the institutions store large amounts of data in Microsoft's cloud services. Seven Dutch universities and one university college have been placed on a sanctions list by the US state of Florida for cutting or freezing ties with Israeli institutions, and under the erratic Trump administration Dutch educational institutions could face punishment at any time. But can the Dutch universities break free of Microsoft? Professors point out that abandoning Microsoft software would bring teaching and research to an immediate halt, argue that dependence on tech giants fundamentally conflicts with the public values of freedom, independence, autonomy, and equality, and call for building autonomous IT infrastructure in cooperation with other European universities.
- Airbus updates software on 6,000 aircraft worldwide after strong solar radiation is found to corrupt flight-control data
In October, a JetBlue Airways Airbus flying a US–Mexico route suffered a sudden loss of altitude and made an emergency landing in Florida, injuring at least 15 people. The investigation found that intense solar radiation can corrupt data critical to flight-control functions. To ensure flight safety, Airbus announced an emergency software update for roughly 6,000 aircraft worldwide. The A320 is the main type affected, along with some A318, A319, and A321 aircraft. About 5,100 of them can return to service after a software update alone, while roughly 900 older aircraft need their onboard computers replaced and will remain grounded until the replacement is complete.
- Avian influenza viruses can withstand high fever
According to a study published in Science, avian influenza viruses can withstand high fever, posing a serious threat to humans. Fever is a self-defense mechanism the human body uses to stop viral spread, raising body temperature to as high as 41°C. But the gut of the virus's natural avian hosts reaches temperatures of 40°C to 42°C. Experiments in mice showed that raising body temperature to fever levels effectively blocks replication of human-derived influenza viruses, but is unlikely to block replication of avian influenza viruses. The researchers note that, fortunately, human infections with avian influenza remain rare, at a few dozen cases per year, but the fatality rate is still worrying: mortality from H5N1 infection exceeds 40%.
- Stranger Things defined the algorithm era
In the summer of 2016, streaming companies were still figuring out how to make original series: experimental works like Sense8, nonlinear storytelling like the rebooted Arrested Development, or prestige drama like House of Cards? Stranger Things, a popcorn horror thriller set in the 1980s, provided the answer and defined the algorithmic era we live in. The show premiered its fifth and final season this week. It blends the coming-of-age themes of Steven Spielberg's films, the horror and teenage friendships of Stephen King's novels, 1980s supernatural fare, the adolescent rivalries of John Hughes's movies... It is like a Halloween bowl filled with nostalgic candy, deftly evoking the pop culture its audience once loved; in other words, it is a human-made version of the algorithm. Creating new stories by devouring and regurgitating old ones is also the signature method of generative AI. Stranger Things is far better than AI slop, at least the kids have the right number of fingers, but however chilling its supernatural horror, it is ultimately trading on nostalgia.
- TikTok tops 42 million monthly active users in Japan
TikTok announced that the app now has more than 42 million monthly active users in Japan, roughly double the 21.2 million of November 2022, about three years ago. Globally it has more than one billion users. TikTok launched its e-commerce service TikTok Shop in Japan in June of this year; it runs e-commerce operations in 18 markets worldwide. With shopping features added to the existing app, users can buy goods without leaving it.
- New European law makes social media liable for financial scams
Under an agreement reached by EU lawmakers early Thursday, social media platforms including Meta and TikTok will be held liable for financial fraud. Social media is awash with financial scams, and European lawmakers pushed to hold big tech companies and banks accountable, while EU member states argued that banks should bear responsibility when their safeguards are inadequate. Under the new law, banks must compensate victims if scammers impersonate a bank to steal money, or if a bank processes a payment without consent; social media companies that fail to promptly remove reported online scams must in turn compensate the banks. A European tech industry group representing companies such as Amazon, Google, Meta, and Apple criticized the new law.
- Modern house cats originated from North African wildcats
According to a study published in Science, domestic cats may have reached Europe far later than previously thought: they arrived only about 2,000 years ago, and their arrival was not driven by the expansion of Neolithic farmers out of the Near East. The findings offer new insight into the origins of one of humanity's most enigmatic animal companions and identify North Africa as the cradle of the modern house cat. The researchers performed paleogenomic analyses of 87 ancient and modern cat genomes. They found that domestic cats most likely descend from North African (rather than Levantine) wildcats, and that true domestic cats appeared in Europe and Southwest Asia only thousands of years after the start of the Neolithic; these findings contradict earlier studies. Genetic analysis shows that the early cats found in Europe and Turkey were European wildcats, reflecting ancient hybridization rather than early domestication. Once North African domestic cats were introduced, they spread rapidly across Europe (often along Roman military roads), reaching Britain by the 1st century AD.
- Changing a recommendation algorithm's ranking can shift a person's political attitudes
A new experiment published in Science used an AI-driven browser extension, operating independently of X's own algorithm, to rerank users' X/Twitter feeds. It showed that even small changes in exposure to hostile political content can significantly shift users' feelings toward the opposing party within days. The findings provide direct causal evidence of how algorithmic post ranking shapes users' social media feeds. Social media has become a major source of political information for many people worldwide, and platform algorithms exert powerful influence over what we see as we browse, quietly steering thoughts, emotions, and behavior in ways that are hard to understand. Although many explanations have been proposed for how these ranking algorithms affect us, testing them has been extremely difficult, because platform operators alone control how their proprietary algorithms work, and only they can try out different feed designs and measure the causal effects. To sidestep these obstacles, the researchers built a browser extension that reorders a user's feed in real time as they browse, without needing the platform's permission. The study shows that algorithmically mediated exposure to politically hostile content can both shape affective polarization and modulate users' moment-to-moment emotional reactions while they use the platform.
- KDE Plasma 6.8 will be Wayland-only
The KDE Plasma team announced that the upcoming v6.8 release will support only Wayland, dropping X11. The final release in the KDE Plasma 6.7 series is expected in early 2027, and support for X11 sessions will continue until then. Users who want to stay on X11 can choose a long-term-support distribution that ships Plasma on X11, such as AlmaLinux 9, which is supported until 2032. X11 applications will still run through the Xwayland compatibility layer.
- Pentagon recommends adding Alibaba and Baidu to its list of companies aiding China's military
The US Department of Defense has recommended adding Alibaba, Baidu, and BYD to its 1260H list of companies assisting the Chinese military. The 1260H list carries no direct legal force but serves as an important warning to US investors. In an October 7 letter, Deputy Secretary of Defense Stephen Feinberg said the three companies, along with five others, including Chengdu-based Eoptolink, Hua Hong Semiconductor, RoboSense, WuXi AppTec, and Zhongji Innolight, should be placed on the 1260H list. Alibaba said in a statement that its inclusion would be "entirely baseless" and that "Alibaba is neither a Chinese military company nor a participant in any military-civil fusion strategy." It added that because it does not do business related to US military procurement, being placed on the 1260H list would not affect its normal operations in the US or anywhere else in the world.
- Facepunch open-sources its Source 2-based s&box engine
Facepunch Studios, developer of the sandbox game Garry's Mod and the survival game Rust, announced that it has open-sourced its in-development next-generation sandbox engine s&box under the MIT license, with the code hosted on GitHub. The developers said the move was inspired by the success of the open-source game engine Godot. s&box is built on Valve's Source 2 engine; the underlying Source 2 engine itself is not open source (that decision rests with Valve). What has been opened is everything built on top of it, the editor, networking, scene system, user interface, and so on, all written in C#.
- Playing music during surgery reduces drug doses and improves postoperative recovery
In an operating room in Delhi, India, doctors prepared to remove a patient's gallbladder. The patient was under general anesthesia, yet wore headphones playing music. Although most of the brain is inactive under anesthesia, the auditory pathway remains partially active. According to a study by Indian scientists published in the journal Music and Medicine, playing music during general anesthesia significantly reduces drug requirements and improves postoperative recovery. The study focused on laparoscopic gallbladder removal, a standard minimally invasive procedure that is short, usually under an hour, and therefore calls for a quick, clear-headed recovery. In the trial, patients who wore headphones playing music needed lower drug doses, recovered more smoothly after surgery, had lower stress hormone levels, and had better-controlled blood pressure during the operation.
- NASA rover detects lightning on Mars
According to a study published in Nature, the Perseverance Mars rover recorded 55 lightning events over two Martian years of observation. Lightning arises when atmospheric turbulence makes particles collide and rub against each other, building up electric charge that is eventually released as a discharge. Lightning is ubiquitous on Earth, and scientists have suspected that Mars also has electrical discharges, even though its atmosphere is mostly carbon dioxide and far thinner and drier than Earth's. Perseverance carries instruments capable of detecting signs of lightning. A team led by planetary scientist Baptiste Chide of the University of Toulouse analyzed data collected by the rover's SuperCam and found 55 events, 7 of which fully captured the discharge signature. Most of the discharges were extremely weak, with energies of just 0.1–150 nanojoules. The seventh event was the largest, reaching 40 millijoules. By comparison, a single cloud-to-ground lightning strike on Earth releases about a billion joules.
- Dell says the transition to Windows 11 is slower than it was to Windows 10
Dell COO Jeffrey Clarke said that PCs are being replaced for Windows 11 more slowly than they were for Windows 10. Measured from the point at which the previous operating system lost support, Windows 11 adoption trails Windows 10 by 10 to 12 percentage points. Because Microsoft raised the hardware requirements, many existing Windows 10 PCs cannot upgrade to Windows 11. Clarke said 500 million PCs cannot run Windows 11, and a similar number can. Dell's third-quarter revenue was $27.0 billion, up 11% year over year; it expects fourth-quarter revenue of $31.5 billion and fiscal 2026 revenue of $111.7 billion, up 32% and 17% year over year respectively.