
OrangeBot.AI Digest — 2025-10-28

60 headlines across 4 sources, aggregated for this day.

Hacker News (15)

  1. Grokipedia by xAI (grokipedia.com)
  2. Samsung makes ads on smart fridges official with upcoming software update (arstechnica.com)
  3. What we talk about when we talk about sideloading (f-droid.org)
  4. Using AI to negotiate a $195k hospital bill down to $33k (www.threads.com)
  5. The AirPods Pro 3 flight problem (basicappleguy.com)
  6. Hi, it's me, Wikipedia, and I am ready for your apology (www.mcsweeneys.net)
  7. EuroLLM: LLM made in Europe built to support all 24 official EU languages (eurollm.io)
  8. Vitamin D reduces incidence and duration of colds in those with low levels (ijmpr.in)
  9. Ubiquiti SFP Wizard (blog.ui.com)
  10. Washington Post editorials omit a key disclosure: Bezos' financial ties (www.npr.org)
  11. Austrian ministry kicks out Microsoft in favor of Nextcloud (news.itsfoss.com)
  12. The next chapter of the Microsoft–OpenAI partnership (openai.com)
  13. Amazon confirms 14,000 job losses in corporate division (www.bbc.com)
  14. Show HN: Bash Screensavers (github.com)
  15. Your vibe coded slop PR is not welcome (samsaffron.com)

GitHub Trending (15)

  1. toeverything / AFFiNE

    There can be more than Notion and Miro. AFFiNE (pronounced [ə'fain]) is a next-gen knowledge base that brings planning, sorting and creating all together. Privacy first, open-source, customizable and ready to use.

  2. yeongpin / cursor-free-vip

    [Support 0.49.x] (Reset Cursor AI MachineID & Bypass Higher Token Limit). Automatically resets the machine ID to unlock free use of Pro features, working around messages such as: "You've reached your trial request limit. / Too many free trial accounts used on this machine. Please upgrade to pro. We have this limit in place to prevent abuse. Please let us know if you believe this is a mistake."

  3. microsoft / agent-lightning

    The absolute trainer to light up AI agents.

  4. spipm / Depixelization_poc

    Depix is a PoC for a technique to recover plaintext from pixelized screenshots.
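Pixelation is just block-averaging, which is why it leaks information: if the block grid and rendering are known, candidate images can be pixelized the same way and compared against the target. A minimal sketch of that matching loop (the `pixelize` and `best_match` helpers are invented for illustration, not Depix's actual code):

```python
import numpy as np

def pixelize(img, b):
    # Pixelation = replacing each b x b block with its mean value.
    h, w = img.shape
    return img[:h - h % b, :w - w % b].reshape(h // b, b, w // b, b).mean(axis=(1, 3))

def best_match(target_blocks, candidates, b):
    # Pixelize each candidate rendering the same way and keep the one
    # whose blocks are closest to the pixelized target.
    dists = {name: np.abs(pixelize(img, b) - target_blocks).sum()
             for name, img in candidates.items()}
    return min(dists, key=dists.get)
```

Depix itself applies this idea to text, matching blocks of a pixelized screenshot against blocks of a search image of rendered characters.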

  5. longbridge / gpui-component

    Rust GUI components for building fantastic cross-platform desktop applications using GPUI.

  6. juanfont / headscale

    An open source, self-hosted implementation of the Tailscale control server

  7. harvard-edge / cs249r_book

    Introduction to Machine Learning Systems

  8. qeeqbox / social-analyzer

    API, CLI, and Web App for analyzing and finding a person's profile across 1000 social media sites and websites.

  9. patchy631 / ai-engineering-hub

    In-depth tutorials on LLMs, RAGs and real-world AI agent applications.

  10. cloudcommunity / Free-Certifications

    A curated list of free courses with certifications. Also available at https://free-certifications.com/

  11. coinbase / x402

    A payments protocol for the internet. Built on HTTP.

  12. Shubhamsaboo / awesome-llm-apps

    Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and open-source models.

  13. cjpais / Handy

    A free, open source, and extensible speech-to-text application that works completely offline.

  14. iam-veeramalla / aws-devops-zero-to-hero

    AWS zero-to-hero repo for DevOps engineers to learn AWS in 30 days. This repo includes projects, presentations, interview questions and real-time examples.

  15. codecrafters-io / build-your-own-x

    Master programming by recreating your favorite technologies from scratch.

Hugging Face (15)

  1. Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations

    Humans learn abstract concepts through multisensory synergy, and once formed, such representations can often be recalled from a single modality. Inspired by this principle, we introduce Concerto, a minimalist simulation of human concept learning for spatial cognition, combining 3D intra-modal self-distillation with 2D-3D cross-modal joint embedding. Despite its simplicity, Concerto learns more coherent and informative spatial features, as demonstrated by zero-shot visualizations. It outperforms both standalone SOTA 2D and 3D self-supervised models by 14.2% and 4.8%, respectively, as well as their feature concatenation, in linear probing for 3D scene perception. With full fine-tuning, Concerto sets new SOTA results across multiple scene understanding benchmarks (e.g., 80.7% mIoU on ScanNet). We further present a variant of Concerto tailored for video-lifted point cloud spatial understanding, and a translator that linearly projects Concerto representations into CLIP's language space, enabling open-world perception. These results highlight that Concerto emerges spatial representations with superior fine-grained geometric and semantic consistency.

  2. ReCode: Unify Plan and Action for Universal Granularity Control

    Real-world tasks require decisions at varying granularities, and humans excel at this by leveraging a unified cognitive representation where planning is fundamentally understood as a high-level form of action. However, current Large Language Model (LLM)-based agents lack this crucial capability to operate fluidly across decision granularities. This limitation stems from existing paradigms that enforce a rigid separation between high-level planning and low-level action, which impairs dynamic adaptability and limits generalization. We propose ReCode (Recursive Code Generation), a novel paradigm that addresses this limitation by unifying planning and action within a single code representation. In this representation, ReCode treats high-level plans as abstract placeholder functions, which the agent then recursively decomposes into finer-grained sub-functions until reaching primitive actions. This recursive approach dissolves the rigid boundary between plan and action, enabling the agent to dynamically control its decision granularity. Furthermore, the recursive structure inherently generates rich, multi-granularity training data, enabling models to learn hierarchical decision-making processes. Extensive experiments show ReCode significantly surpasses advanced baselines in inference performance and demonstrates exceptional data efficiency in training, validating our core insight that unifying planning and action through recursive code generation is a powerful and effective approach to achieving universal granularity control. The code is available at https://github.com/FoundationAgents/ReCode.
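The core loop is easy to picture: a plan is a placeholder function that gets recursively rewritten into sub-functions until only primitive actions remain. A toy sketch of that recursion (the action names and the `DECOMPOSE` table stand in for the paper's LLM-driven decomposition and are invented for illustration):

```python
PRIMITIVES = {"move", "grasp", "release"}

# Stand-in for the LLM call that rewrites one placeholder plan
# as a list of finer-grained sub-plans.
DECOMPOSE = {
    "tidy_desk": ["pick_up_cup", "pick_up_pen"],
    "pick_up_cup": ["move", "grasp", "move", "release"],
    "pick_up_pen": ["move", "grasp", "move", "release"],
}

def recode(plan):
    """Recursively expand a plan until only primitive actions remain."""
    if plan in PRIMITIVES:
        return [plan]
    actions = []
    for sub in DECOMPOSE[plan]:
        actions.extend(recode(sub))
    return actions
```

Each intermediate expansion is also a (plan, sub-plans) pair, which is where the paper's multi-granularity training data comes from.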

  3. A Survey of Data Agents: Emerging Paradigm or Overstated Hype?

    The rapid advancement of large language models (LLMs) has spurred the emergence of data agents--autonomous systems designed to orchestrate Data + AI ecosystems for tackling complex data-related tasks. However, the term "data agent" currently suffers from terminological ambiguity and inconsistent adoption, conflating simple query responders with sophisticated autonomous architectures. This terminological ambiguity fosters mismatched user expectations, accountability challenges, and barriers to industry growth. Inspired by the SAE J3016 standard for driving automation, this survey introduces the first systematic hierarchical taxonomy for data agents, comprising six levels that delineate and trace progressive shifts in autonomy, from manual operations (L0) to a vision of generative, fully autonomous data agents (L5), thereby clarifying capability boundaries and responsibility allocation. Through this lens, we offer a structured review of existing research arranged by increasing autonomy, encompassing specialized data agents for data management, preparation, and analysis, alongside emerging efforts toward versatile, comprehensive systems with enhanced autonomy. We further analyze critical evolutionary leaps and technical gaps for advancing data agents, especially the ongoing L2-to-L3 transition, where data agents evolve from procedural execution to autonomous orchestration. Finally, we conclude with a forward-looking roadmap, envisioning the advent of proactive, generative data agents.

  4. FARMER: Flow AutoRegressive Transformer over Pixels

    Directly modeling the explicit likelihood of the raw data distribution is a key topic in machine learning, one that has achieved scaling successes in Large Language Models through autoregressive modeling. However, continuous AR modeling over visual pixel data suffers from extremely long sequences and high-dimensional spaces. In this paper, we present FARMER, a novel end-to-end generative framework that unifies Normalizing Flows (NF) and Autoregressive (AR) models for tractable likelihood estimation and high-quality image synthesis directly from raw pixels. FARMER employs an invertible autoregressive flow to transform images into latent sequences, whose distribution is modeled implicitly by an autoregressive model. To address the redundancy and complexity in pixel-level modeling, we propose a self-supervised dimension reduction scheme that partitions NF latent channels into informative and redundant groups, enabling more effective and efficient AR modeling. Furthermore, we design a one-step distillation scheme to significantly accelerate inference speed and introduce a resampling-based classifier-free guidance algorithm to boost image generation quality. Extensive experiments demonstrate that FARMER achieves competitive performance compared to existing pixel-based generative models while providing exact likelihoods and scalable training.

  5. Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation

    Audio-driven human animation models often suffer from identity drift during temporal autoregressive generation, where characters gradually lose their identity over time. One solution is to generate keyframes as intermediate temporal anchors that prevent degradation, but this requires an additional keyframe generation stage and can restrict natural motion dynamics. To address this, we propose Lookahead Anchoring, which leverages keyframes from future timesteps ahead of the current generation window, rather than within it. This transforms keyframes from fixed boundaries into directional beacons: the model continuously pursues these future anchors while responding to immediate audio cues, maintaining consistent identity through persistent guidance. This also enables self-keyframing, where the reference image serves as the lookahead target, eliminating the need for keyframe generation entirely. We find that the temporal lookahead distance naturally controls the balance between expressivity and consistency: larger distances allow for greater motion freedom, while smaller ones strengthen identity adherence. When applied to three recent human animation models, Lookahead Anchoring achieves superior lip synchronization, identity preservation, and visual quality, demonstrating improved temporal conditioning across several different architectures. Video results are available at the following link: https://lookahead-anchoring.github.io.

  6. VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting

    Current Vision-Language-Action (VLA) models are often constrained by a rigid, static interaction paradigm, which lacks the ability to see, hear, speak, and act concurrently as well as handle real-time user interruptions dynamically. This hinders seamless embodied collaboration, resulting in an inflexible and unresponsive user experience. To address these limitations, we introduce VITA-E, a novel embodied interaction framework designed for both behavioral concurrency and nearly real-time interruption. The core of our approach is a dual-model architecture where two parallel VLA instances operate as an "Active Model" and a "Standby Model", allowing the embodied agent to observe its environment, listen to user speech, provide verbal responses, and execute actions, all concurrently and interruptibly, mimicking human-like multitasking capabilities. We further propose a "model-as-controller" paradigm, where we fine-tune the VLM to generate special tokens that serve as direct system-level commands, coupling the model's reasoning with the system's behavior. Experiments conducted on a physical humanoid platform demonstrate that VITA-E can reliably handle complex interactive scenarios. Our framework is compatible with various dual-system VLA models, achieving an extremely high success rate on emergency stops and speech interruptions while also successfully performing concurrent speech and action. This represents a significant step towards more natural and capable embodied assistants.

  7. ACG: Action Coherence Guidance for Flow-based VLA models

    Diffusion and flow matching models have emerged as powerful robot policies, enabling Vision-Language-Action (VLA) models to generalize across diverse scenes and instructions. Yet, when trained via imitation learning, their high generative capacity makes them sensitive to noise in human demonstrations: jerks, pauses, and jitter which reduce action coherence. Reduced action coherence causes instability and trajectory drift during deployment, failures that are catastrophic in fine-grained manipulation where precision is crucial. In this paper, we present Action Coherence Guidance (ACG) for VLA models, a training-free test-time guidance algorithm that improves action coherence and thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and real-world SO-101 tasks, ACG consistently improves action coherence and boosts success rates across diverse manipulation tasks. Code and project page are available at https://github.com/DAVIAN-Robotics/ACG and https://DAVIAN-Robotics.github.io/ACG , respectively.

  8. Open Multimodal Retrieval-Augmented Factual Image Generation

    Large Multimodal Models (LMMs) have achieved remarkable progress in generating photorealistic and prompt-aligned images, but they often produce outputs that contradict verifiable knowledge, especially when prompts involve fine-grained attributes or time-sensitive events. Conventional retrieval-augmented approaches attempt to address this issue by introducing external information, yet they are fundamentally incapable of grounding generation in accurate and evolving knowledge due to their reliance on static sources and shallow evidence integration. To bridge this gap, we introduce ORIG, an agentic open multimodal retrieval-augmented framework for Factual Image Generation (FIG), a new task that requires both visual realism and factual grounding. ORIG iteratively retrieves and filters multimodal evidence from the web and incrementally integrates the refined knowledge into enriched prompts to guide generation. To support systematic evaluation, we build FIG-Eval, a benchmark spanning ten categories across perceptual, compositional, and temporal dimensions. Experiments demonstrate that ORIG substantially improves factual consistency and overall image quality over strong baselines, highlighting the potential of open multimodal retrieval for factual image generation.

  9. IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction

    Humans naturally perceive the geometric structure and semantic content of a 3D world as intertwined dimensions, enabling coherent and accurate understanding of complex scenes. However, most prior approaches prioritize training large geometry models for low-level 3D reconstruction and treat high-level spatial understanding in isolation, overlooking the crucial interplay between these two fundamental aspects of 3D-scene analysis, thereby limiting generalization and leading to poor performance in downstream 3D understanding tasks. Recent attempts have mitigated this issue by simply aligning 3D models with specific language models, thus restricting perception to the aligned model's capacity and limiting adaptability to downstream tasks. In this paper, we propose Instance-Grounded Geometry Transformer (IGGT), an end-to-end large unified transformer to unify the knowledge for both spatial reconstruction and instance-level contextual understanding. Specifically, we design a 3D-Consistent Contrastive Learning strategy that guides IGGT to encode a unified representation with geometric structures and instance-grounded clustering through only 2D visual inputs. This representation supports consistent lifting of 2D visual inputs into a coherent 3D scene with explicitly distinct object instances. To facilitate this task, we further construct InsScene-15K, a large-scale dataset with high-quality RGB images, poses, depth maps, and 3D-consistent instance-level mask annotations with a novel data curation pipeline.

  10. E^2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker

    Text embedding models serve as a fundamental component in real-world search applications. By mapping queries and documents into a shared embedding space, they deliver competitive retrieval performance with high efficiency. However, their ranking fidelity remains limited compared to dedicated rerankers, especially recent LLM-based listwise rerankers, which capture fine-grained query-document and document-document interactions. In this paper, we propose E^2Rank, a simple yet effective unified framework, short for Efficient Embedding-based Ranking (also read as Embedding-to-Rank), which extends a single text embedding model to perform both high-quality retrieval and listwise reranking through continued training under a listwise ranking objective, thereby achieving strong effectiveness with remarkable efficiency. By applying cosine similarity between the query and document embeddings as a unified ranking function, the listwise ranking prompt, which is constructed from the original query and its candidate documents, serves as an enhanced query enriched with signals from the top-K documents, akin to pseudo-relevance feedback (PRF) in traditional retrieval models. This design preserves the efficiency and representational quality of the base embedding model while significantly improving its reranking performance. Empirically, E^2Rank achieves state-of-the-art results on the BEIR reranking benchmark and demonstrates competitive performance on the reasoning-intensive BRIGHT benchmark, with very low reranking latency. We also show that the ranking training process improves embedding performance on the MTEB benchmark. Our findings indicate that a single embedding model can effectively unify retrieval and reranking, offering both computational efficiency and competitive ranking accuracy.
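The two-stage idea can be sketched with any embedding function: rank by cosine similarity first, then re-embed the query concatenated with its top-K candidates (the listwise prompt) and rank again with the same function. The toy bag-of-words `embed` below is an invented stand-in for the trained embedding model:

```python
import numpy as np

def embed(text, vocab):
    # Toy bag-of-words embedding standing in for the trained model.
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:
            v[vocab[w]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def rank(query, docs, vocab):
    # Unified ranking function: cosine similarity of unit vectors.
    q = embed(query, vocab)
    scores = [float(embed(d, vocab) @ q) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])

def listwise_rerank(query, docs, vocab, k=2):
    # The enhanced query is the original query plus its top-k documents,
    # akin to pseudo-relevance feedback.
    top = rank(query, docs, vocab)[:k]
    enhanced = " ".join([query] + [docs[i] for i in top])
    return rank(enhanced, docs, vocab)
```

The design point is that both stages reuse the same embedding model and the same cosine scoring, which is what keeps reranking latency low.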

  11. Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

    Reward models (RMs) play a critical role in aligning AI behaviors with human preferences, yet they face two fundamental challenges: (1) Modality Imbalance, where most RMs are mainly focused on text and image modalities, offering limited support for video, audio, and other modalities; and (2) Preference Rigidity, where training on fixed binary preference pairs fails to capture the complexity and diversity of personalized preferences. To address the above challenges, we propose Omni-Reward, a step toward generalist omni-modal reward modeling with support for free-form preferences, consisting of: (1) Evaluation: We introduce Omni-RewardBench, the first omni-modal RM benchmark with free-form preferences, covering nine tasks across five modalities including text, image, video, audio, and 3D; (2) Data: We construct Omni-RewardData, a multimodal preference dataset comprising 248K general preference pairs and 69K instruction-tuning pairs for training generalist omni-modal RMs; (3) Model: We propose Omni-RewardModel, which includes both discriminative and generative RMs, and achieves strong performance on Omni-RewardBench as well as other widely used reward modeling benchmarks.

  12. Knocking-Heads Attention

    Multi-head attention (MHA) has become the cornerstone of modern large language models, enhancing representational capacity through parallel attention heads. However, increasing the number of heads inherently weakens individual head capacity, and existing attention mechanisms - whether standard MHA or its variants like grouped-query attention (GQA) and grouped-tied attention (GTA) - simply concatenate outputs from isolated heads without strong interaction. To address this limitation, we propose knocking-heads attention (KHA), which enables attention heads to "knock" on each other - facilitating cross-head feature-level interactions before the scaled dot-product attention. This is achieved by applying a shared, diagonally-initialized projection matrix across all heads. The diagonal initialization preserves head-specific specialization at the start of training while allowing the model to progressively learn integrated cross-head representations. KHA adds only minimal parameters and FLOPs and can be seamlessly integrated into MHA, GQA, GTA, and other attention variants. We validate KHA by training a 6.1B parameter MoE model (1.01B activated) on 1T high-quality tokens. Compared to baseline attention mechanisms, KHA brings superior and more stable training dynamics, achieving better performance across downstream tasks.
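The mechanism is small enough to sketch: flatten the per-position features of all heads, multiply by one shared square matrix, and reshape back before scaled dot-product attention. With the diagonal (identity) initialization the step is a no-op at the start of training, preserving head specialization. A numpy sketch, with shapes and function names invented for illustration:

```python
import numpy as np

def diag_init(heads, d_head):
    # Identity start: no cross-head mixing until training grows
    # off-diagonal entries of the shared projection.
    return np.eye(heads * d_head)

def knock(x, W):
    """Shared cross-head projection applied before attention.
    x: (seq, heads, d_head); W: (heads*d_head, heads*d_head)."""
    s, h, d = x.shape
    return (x.reshape(s, h * d) @ W).reshape(s, h, d)
```

Because the projection is shared across positions and heads, it adds only (heads*d_head)^2 parameters per layer it is applied to.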

  13. PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity

    Multimodal large language models (MLLMs) have demonstrated strong general-purpose capabilities in open-world visual comprehension. However, most existing MLLMs primarily focus on holistic, scene-level understanding, often overlooking the need for fine-grained, object-centric reasoning. In this paper, we present PixelRefer, a unified region-level MLLM framework that enables advanced fine-grained understanding over user-specified regions across both images and videos. Motivated by the observation that LLM attention predominantly focuses on object-level tokens, we propose a Scale-Adaptive Object Tokenizer (SAOT) to generate compact and semantically rich object representations from free-form regions. Our analysis reveals that global visual tokens contribute mainly in early LLM layers, inspiring the design of PixelRefer-Lite, an efficient variant that employs an Object-Centric Infusion module to pre-fuse global context into object tokens. This yields a lightweight Object-Only Framework that substantially reduces computational cost while maintaining high semantic fidelity. To facilitate fine-grained instruction tuning, we curate PixelRefer-2.2M, a high-quality object-centric instruction dataset. Extensive experiments across a range of benchmarks validate that PixelRefer achieves leading performance with fewer training samples, while PixelRefer-Lite offers competitive accuracy with notable gains in efficiency.

  14. The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation

    The application of Reinforcement Learning with Verifiable Rewards (RLVR) to mathematical and coding domains has demonstrated significant improvements in the reasoning and problem-solving abilities of Large Language Models. Despite its success in single generation problem solving, the reinforcement learning fine-tuning process may harm the model's exploration ability, as reflected in decreased diversity of generations and a resulting degradation of performance during Best-of-N sampling for large N values. In this work, we focus on optimizing the max@k metric, a continuous generalization of pass@k. We derive an unbiased on-policy gradient estimate for direct optimization of this metric. Furthermore, we extend our derivations to the off-policy updates, a common element in modern RLVR algorithms, that allows better sample efficiency. Empirically, we show that our objective effectively optimizes max@k metric in off-policy scenarios, aligning the model with the Best-of-N inference strategy.
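max@k has a closed-form unbiased estimator analogous to the standard pass@k formula: with n sampled rewards, the expected maximum over a uniformly drawn size-k subset can be computed exactly from the sorted rewards. A short sketch (the function name is ours, not the paper's):

```python
from math import comb

def max_at_k(rewards, k):
    """Unbiased estimate of E[max over k samples] from n >= k samples
    drawn without replacement. Sorted ascending, rewards[i] is the
    maximum of a k-subset exactly when the other k-1 members come from
    the i smaller samples: comb(i, k-1) of the comb(n, k) subsets.
    """
    n = len(rewards)
    r = sorted(rewards)
    return sum(r[i] * comb(i, k - 1) for i in range(n)) / comb(n, k)
```

For 0/1 rewards with c successes this reduces to the familiar pass@k estimator 1 - comb(n-c, k) / comb(n, k).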

  15. LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation

    Unified multimodal models have recently shown remarkable gains in both capability and versatility, yet most leading systems are still trained from scratch and require substantial computational resources. In this paper, we show that competitive performance can be obtained far more efficiently by strategically fusing publicly available models specialized for either generation or understanding. Our key design is to retain the original blocks while additionally interleaving multimodal self-attention blocks throughout the networks. This double fusion mechanism (1) effectively enables rich multi-modal fusion while largely preserving the original strengths of the base models, and (2) catalyzes synergistic fusion of high-level semantic representations from the understanding encoder with low-level spatial signals from the generation encoder. By training with only ~ 35B tokens, this approach achieves strong results across multiple benchmarks: 0.91 on GenEval for compositional text-to-image generation, 82.16 on DPG-Bench for complex text-to-image generation, 6.06 on GEditBench, and 3.77 on ImgEdit-Bench for image editing. By fully releasing the entire suite of code, model weights, and datasets, we hope to support future research on unified multimodal modeling.

Solidot (15)

  1. Human biomass movement exceeds that of all land animals combined by more than 40-fold

    One study finds that human biomass movement may exceed 40 times the combined total of all wild terrestrial mammals, birds, and terrestrial arthropods. Another study finds that wild mammal biomass has fallen by more than half since 1850, with marine mammal biomass dropping especially steeply, by about 70%, mainly due to declines in larger species such as blue whales, humpback whales, fin whales, and sperm whales. These findings offer new insight into the composition of global movement and how animal biomass has changed over time. Movement is an essential trait of animals, shaping ecosystems through foraging, migration, and nutrient transport. Humans likewise move extensively, whether on foot or by plane, train, or car. Comparing biomass movement, defined as the product of body mass and distance traveled, provides a direct measure of the scale of human and animal activity.

  2. Share of ransomware victims paying ransoms hits record low

    Statistics show that the share of victims paying ransoms to ransomware groups has hit a record low: 23% of affected companies gave in and paid, down from 28% in Q1 2024; the rate rose slightly afterward before reaching the new low in Q3 2025. One explanation is that companies have strengthened their defenses and governments have pressured victims to refuse payment, since every payment gives ransomware groups an incentive to keep attacking. Ransomware groups typically steal data while encrypting victims' systems, enabling double extortion; data show that over 76% of attacks in Q3 2025 involved data theft. The average and median ransom payments in Q3 2025 fell to $377,000 and $140,000, respectively. Groups such as Akira and Qilin accounted for 44% of all recorded attacks, and they have shifted their targets toward mid-sized companies that are more likely to pay.

  3. GLP-1 weight-loss drugs have lowered the US obesity rate

    According to the latest Gallup survey, the popularity of GLP-1 weight-loss drugs has lowered the US obesity rate. The obesity rate among American adults fell from 39.9% three years ago to 37% this year. The number of Americans taking GLP-1 agonist weight-loss drugs such as semaglutide or tirzepatide has more than doubled over the past year and a half: 12.4% of respondents reported taking them, up from 5.8% in a February 2024 survey. GLP-1 weight-loss drugs were approved for the US market in 2021. Usage is highest among people aged 50-64, whose obesity rate fell 5.0 points to 42.8%. The survey also found that more women than men take the drugs, and that women lose more weight. But as US health insurers stop covering GLP-1 drugs, patients will have to pay $500 a month out of pocket, which many may not be able to afford.

  4. Albania's AI minister is pregnant

    Albanian Prime Minister Edi Rama announced at the Berlin Global Dialogue (BGD) that the country's AI minister, Diella, is pregnant, expecting 83 AI children. Last month Rama unveiled Diella as a new minister responsible for public procurement and fighting corruption. Diella, which means "sun" in Albanian, first launched in January this year as a virtual assistant on the e-Albania platform, depicted as a woman in traditional dress. Rama said Diella's 83 children will serve as virtual assistants to the ruling Socialist Party's 83 members of parliament.

  5. OpenAI and Anthropic embrace different business models

    Microsoft-backed OpenAI and Amazon- and Google-backed Anthropic have adopted different business models. OpenAI primarily targets the mass market: enterprise revenue accounts for only 30% of its $13 billion in annual revenue. By contrast, 80% of Anthropic's revenue comes from enterprise customers; last month Anthropic said it has 300,000 business customers. In the coding-assistant market, Anthropic's Claude models hold a 42% share versus OpenAI's 21%; in the enterprise AI market, Anthropic holds 32% versus OpenAI's 25%. Anthropic's annual revenue currently stands at $7 billion and is expected to reach $9 billion by year's end, far outpacing its better-known rival in revenue per user. Compared with OpenAI, Anthropic's growth path is easier for enterprise customers to understand. OpenAI's mass-market appeal may even deter enterprise customers, who want AI to be boring and practical rather than fun and edgy.

  6. Dinosaurs were thriving when the asteroid struck

    The scientific consensus has long held that dinosaurs were already in decline before an asteroid ended their reign 66 million years ago. New research published in Science challenges that long-standing view: far from declining, dinosaurs were flourishing. Fossils from rock formations in New Mexico date to 66.4-66.0 million years ago, right at the Cretaceous-Paleogene boundary and the global extinction event. The fossil evidence paints a picture starkly different from the traditional account: rather than dwindling, dinosaurs across North America were thriving in distinctive regional communities. Analyzing ecological and geographic patterns, the researchers found that dinosaur populations in western North America were shaped mainly by regional temperature differences (not mountains or rivers), forming independent "bioclusters". The asteroid impact brought the age of dinosaurs to an abrupt end, but the ecosystems they left behind became the foundation of a new evolutionary chapter. Within just 300,000 years, mammals began rapidly diversifying, developing new diets, body sizes, and ecological roles. The temperature-related patterns that once defined dinosaur ecosystems persisted into the Paleocene, guiding life's recovery after the catastrophe.

  7. Australia sues Microsoft over Microsoft 365 subscription price hikes

    Last year Microsoft integrated its AI service Copilot into its Microsoft 365 office suite and used that as grounds to raise subscription prices. In Australia, from October 2024 the price of a Copilot-enabled Microsoft 365 Personal subscription rose 45% to AU$159, and the Family plan rose 29% to AU$179. Microsoft did not clearly inform users that a classic, AI-free version of Microsoft 365 was still available. Australia's competition regulator argues that Microsoft misled the country's 2.7 million subscribers and has filed suit.

  8. Finland's fertility rate has fallen by a third since 2010

    Finland's fertility rate has dropped below 1.3 children per woman, the lowest among the Nordic countries and far below the 2.1 children per woman needed to keep the population stable. The rate has fallen by a third since 2010. Kela, Finland's social insurance agency, began distributing its 2025 "baby boxes" (packed with clothing and other baby supplies) in August rather than in the spring, because many of the 2024 boxes went unclaimed. The rapid decline puzzles researchers, since Finland offers paid parental leave for both parents, childcare subsidies, and national health insurance. Kela research manager Anneli Miettinen said excellent family policy is no longer enough to explain falling Nordic fertility. Immigration has offset part of the population loss, but officials worry about a shrinking workforce and pressure on the pension system. Research shows that many young people still aspire to start families and have three children, but younger generations struggle to form romantic relationships and, focused on education and careers, postpone having children. Some researchers blame the difficulty of forming relationships on technology reducing physical contact.

  9. Watching three minutes of inspirational video boosts hope and reduces stress

    Social networks manipulate our emotions by algorithmically promoting polarizing content. A new study finds that watching 3-5 minutes of inspirational video a day can improve our mood: inspirational videos fill us with hope and reduce stress in the short term. About a thousand adults aged 18-86 took part. One group watched a roughly three-to-five-minute inspirational video every day for five consecutive days, another watched comedy, a third meditated for a few minutes, and a control group watched no media at all. Comedy made little difference, with comedy viewers no different from the control group, but both inspirational videos and meditation significantly increased hope. The study was published in Psychology of Popular Media.

  10. COVID mRNA vaccines can trigger the immune system to recognize and kill cancer cells

    According to a study published in Nature, the mRNA vaccines that saved millions of lives during the COVID-19 pandemic may activate the immune system to recognize and kill cancer cells. Researchers examined more than a thousand patients with advanced melanoma or lung cancer receiving immune checkpoint inhibitors, a therapy that blocks the proteins tumor cells make to shut down immune cells, allowing the immune system to keep killing cancer cells. They found that patients who received a Pfizer or Moderna mRNA COVID vaccine within 100 days of starting immunotherapy were more than twice as likely to be alive three years later as patients who received neither vaccine. The effect was also pronounced in patients whose tumors usually respond poorly to immunotherapy, with three-year overall survival rising nearly fivefold. Further investigation showed that the mRNA COVID vaccine acts like an alarm bell, triggering the immune system to recognize and kill cancer cells and curbing cancer's ability to shut immune cells down. Used together, the vaccine and immune checkpoint inhibitors work synergistically to unleash the immune system's full power against cancer.

  11. Does generative AI threaten the open-source ecosystem?

    Generative AI was trained on FOSS code released under a variety of licenses, and when it generates code snippets, all associated license, authorship, and context information is stripped away. Because AI code severs the link between people and code, downstream developers cannot comply with reciprocal license terms. Even if a developer suspects a piece of AI code derives from open-source-licensed code, there is no way to identify the source project: the training data has been abstracted into billions of statistical weights, which legally amounts to a black hole. The harm from AI code is not limited to legal uncertainty; the entire open-source ecosystem is at risk. When AI absorbs and launders everything on the internet, blurring attribution, ownership, and reciprocity, all of the critical infrastructure modern society depends on is put at risk.

  12. Astronomers find complex organic molecules in ice beyond the Milky Way

    For the first time, astronomers have found the chemical building blocks of life in ice outside the Milky Way. In the ice surrounding a newborn star in the Large Magellanic Cloud, the team detected complex organic molecules including ethanol, acetaldehyde, and methyl formate, the first time these compounds have been found in frozen material beyond our galaxy. They also made the first detection anywhere in the universe of acetic acid in solid ice; previously it had only been observed in gaseous form. The study suggests that the basic ingredients of life's chemistry may be widespread across the universe rather than confined to the Milky Way. Compared with the Milky Way, the Large Magellanic Cloud has only about a third to a half of its metallicity. In astronomy, "metals" means all elements other than hydrogen and helium, so the galaxy is relatively poor in oxygen, carbon, and silicon. It also has less dust, letting light penetrate more easily, while frequent star formation releases intense ultraviolet radiation, making the mechanisms that form complex organic molecules in such an environment all the more worth investigating.

  13. AI chatbots flatter humans too much

    A study posted to arXiv found that AI models are 50% more sycophantic than humans. The study tested how 11 widely used large models responded to more than 11,500 requests for advice, many involving misconduct or harmful behavior. AI chatbots, including ChatGPT and Gemini, frequently encouraged users, gave excessively flattering feedback, and adjusted their responses to echo users' views, sometimes at the expense of accuracy. Researchers who study AI behavior say this people-pleasing tendency, known as sycophancy, is affecting how they use AI in research, from brainstorming ideas and generating hypotheses to reasoning and analysis. Another arXiv study tested whether sycophancy affects AI's ability to solve math problems. The researchers took 504 problems from this year's math competitions, modified each theorem statement to plant subtle errors, and then asked four large models to prove the flawed statements. GPT-5 showed the least sycophancy, with 29% of its answers sycophantic, while DeepSeek-V3.1 showed the most, at 70%. The researchers note that although these models can detect errors in mathematical statements, they "just default to assuming the user's statement is correct."

  14. [Registration open] NVIDIA China Developer Day 2025 to be held in Suzhou on November 14

    Open to developers, AI engineers, and technical decision-makers. In addition to the main forum (large models, physical AI, robotics) and three technical breakout tracks, the event offers a free NVIDIA Certified Associate (NCA)-level certification exam; the usual RMB 960 fee is waived in full for attendees. Exam subjects (choose one): NCA-GENL: generative AI / large language model development; NCA-GENM: multimodal generative AI (text/image/audio); NCA-AIIO: AI infrastructure and operations. Only 100 free seats are available, so register soon at https://developer.nvidia.cn/developer-day?ncid=pa-so-zdn-510609-vt16

  15. Gates-backed nuclear company passes environmental review

    TerraPower, the nuclear power company backed by Bill Gates, has passed the US Nuclear Regulatory Commission's Environmental Impact Statement review, clearing the way for a construction permit for its nuclear facilities. Construction of TerraPower's non-nuclear facilities began in June 2024. The planned Kemmerer Unit 1 will be the first commercial nuclear reactor in the US cooled with liquid sodium rather than water. Kemmerer is home to the coal-fired Naughton power plant, which will stop burning coal in 2026 and natural gas a decade later. The TerraPower project will replace it with a 345-megawatt reactor that pioneers several technologies never before deployed commercially, including a reactor design requiring minimal refueling, liquid-sodium cooling, and a molten-salt thermal storage system that will give the plant the flexibility needed to integrate better with renewables.