Monthly Digest — 2026-05
24 unique stories across 31 days and 8 sources.
Hacker News(4)
GitHub Trending(4)
Product Hunt(4)
Hugging Face(4)
- Heterogeneous Scientific Foundation Model Collaboration
Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally limits their applicability to many real-world problems, especially in scientific domains where domain-specific foundation models have been developed to address specialized tasks beyond natural language. In this work, we introduce Eywa, a heterogeneous agentic framework designed to extend language-centric systems to a broader class of scientific foundation models. The key idea of Eywa is to augment domain-specific foundation models with a language-model-based reasoning interface, enabling language models to guide inference over non-linguistic data modalities. This design allows predictive foundation models, which are typically optimized for specialized data and tasks, to participate in higher-level reasoning and decision-making processes within agentic systems. Eywa can serve as a drop-in replacement for a single-agent pipeline (EywaAgent) or be integrated into existing multi-agent systems by replacing traditional agents with specialized agents (EywaMAS). We further investigate a planning-based orchestration framework in which a planner dynamically coordinates traditional agents and Eywa agents to solve complex tasks across heterogeneous data modalities (EywaOrchestra). We evaluate Eywa across a diverse set of scientific domains spanning physical, life, and social sciences. Experimental results demonstrate that Eywa improves performance on tasks involving structured and domain-specific data, while reducing reliance on language-based reasoning through effective collaboration with specialized foundation models.
- Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, long-horizon consistency, and causal understanding. We argue that the field should move beyond appearance synthesis toward intelligent visual generation: plausible visuals grounded in structure, dynamics, domain knowledge, and causal relations. To frame this shift, we introduce a five-level taxonomy: Atomic Generation, Conditional Generation, In-Context Generation, Agentic Generation, and World-Modeling Generation, progressing from passive renderers to interactive, agentic, world-aware generators. We analyze key technical drivers, including flow matching, unified understanding-and-generation models, improved visual representations, post-training, reward modeling, data curation, synthetic data distillation, and sampling acceleration. We further show that current evaluations often overestimate progress by emphasizing perceptual quality while missing structural, temporal, and causal failures. By combining benchmark review, in-the-wild stress tests, and expert-constrained case studies, this roadmap offers a capability-centered lens for understanding, evaluating, and advancing the next generation of intelligent visual generation systems.
- Co-Evolving Policy Distillation
Reinforcement learning with verifiable rewards (RLVR) and on-policy distillation (OPD) have become standard paradigms for post-training. We provide a unified analysis of how these two paradigms consolidate multiple expert capabilities into a single model and find that each loses capability in a different way: mixed RLVR suffers from an inter-capability divergence cost, while the pipeline of first training experts and then performing OPD avoids divergence but fails to fully absorb teacher capabilities because of large gaps in behavioral patterns between teacher and student. We propose Co-Evolving Policy Distillation (CoPD), which trains experts in parallel and introduces OPD during each expert's ongoing RLVR training rather than after expert training is complete, with experts serving as mutual teachers (making OPD bidirectional) so that they co-evolve. This yields more consistent behavioral patterns among experts while preserving sufficient complementary knowledge throughout training. Experiments show that CoPD achieves all-in-one integration of text, image, and video reasoning capabilities, significantly outperforming strong baselines such as mixed RLVR and MOPD, and even surpassing domain-specific experts. The parallel model-training pattern offered by CoPD may inspire a new paradigm for scaling training.
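The co-evolving schedule described in this abstract can be illustrated with a toy sketch. This is not the paper's implementation: the two-task bandit setup, the exact-gradient updates, and every name below (`rl_step`, `distill_step`, `co_evolve`, the reward targets) are assumptions made for illustration. Each "expert" takes a reward-driven step on its own task and, in the same iteration, a distillation step toward its peer on the peer's task, so neither expert is finished before distillation begins.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rl_step(logits, target, lr):
    """Exact policy-gradient ascent on expected reward (1 for `target`, else 0)."""
    p = softmax(logits)
    baseline = p[target]  # expected reward under the current policy
    return [z + lr * p[a] * ((1.0 if a == target else 0.0) - baseline)
            for a, z in enumerate(logits)]

def distill_step(student_logits, teacher_logits, lr):
    """Gradient descent on KL(student || teacher), taken under the student's own distribution."""
    p, q = softmax(student_logits), softmax(teacher_logits)
    log_ratio = [math.log(pi / qi) for pi, qi in zip(p, q)]
    kl = sum(pi * ri for pi, ri in zip(p, log_ratio))
    return [z - lr * pi * (ri - kl)
            for z, pi, ri in zip(student_logits, p, log_ratio)]

def co_evolve(num_actions=4, targets=(1, 2), steps=200, lr=1.0):
    # logits[e][t] holds expert e's policy logits on task t (hypothetical toy setup).
    logits = [[[0.0] * num_actions for _ in targets] for _ in targets]
    for _ in range(steps):
        for e in (0, 1):
            own, other = e, 1 - e
            # Reward-driven update on the expert's own task.
            logits[e][own] = rl_step(logits[e][own], targets[own], lr)
            # Interleaved distillation: learn the other task from the peer, which is still training.
            logits[e][other] = distill_step(logits[e][other], logits[other][other], lr)
    return [[softmax(logits[e][t]) for t in range(2)] for e in range(2)]

policies = co_evolve()
for e, per_task in enumerate(policies):
    for t, probs in enumerate(per_task):
        print(f"expert {e}, task {t}: {[round(x, 3) for x in probs]}")
```

In this toy, both experts end up concentrating probability on the rewarded action for both tasks, even though each receives reward on only one. The sequential pipeline the abstract contrasts against would instead run all reward-driven steps first and all distillation steps afterward.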
- ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control
Humanoid control systems have made significant progress in recent years, yet modeling fluent, interaction-rich behavior between a robot, its surrounding environment, and task-relevant objects remains a fundamental challenge. The difficulty arises from the need to jointly capture spatial context, temporal dynamics, robot actions, and task intent at scale, which is a poor match for conventional supervision. We propose ExoActor, a novel framework that leverages the generalization capabilities of large-scale video generation models to address this problem. The key insight in ExoActor is to use third-person video generation as a unified interface for modeling interaction dynamics. Given a task instruction and scene context, ExoActor synthesizes plausible execution processes that implicitly encode coordinated interactions between robot, environment, and objects. The video output is then transformed into executable humanoid behaviors through a pipeline that estimates human motion and executes it via a general motion controller, yielding a task-conditioned behavior sequence. To validate the framework, we implement it as an end-to-end system and demonstrate its generalization to new scenarios without additional real-world data collection. We conclude by discussing limitations of the current implementation and outlining promising directions for future research, illustrating how ExoActor provides a scalable approach to modeling interaction-rich humanoid behaviors and potentially opens a new avenue for generative models to advance general-purpose humanoid intelligence.
Techmeme(4)
- Servers operated by Ubuntu and its parent company Canonical have been down for more than a day, following a "sustained, cross-border attack" (Dan Goodin/Ars Technica)
Servers operated by Ubuntu and its parent company Canonical were knocked offline on Thursday morning and have remained down ever since …
- Apple has stopped offering a 256GB storage option for the Mac mini globally; Mac mini now starts at 512GB for $799 in the US (Joe Rossignol/MacRumors)
Apple this week stopped offering a 256GB storage option for the Mac mini worldwide. As a result, the desktop computer now has a higher starting price.
- The Academy of Motion Picture Arts and Sciences issues new rules saying acting and writing must be performed by humans and not AI to be eligible for Oscars (Lisa Richwine/Reuters)
Academy Awards organizers issued new rules on Friday to clarify that acting and writing must be performed by humans …
- Sources: Cerebras is seeking to raise as much as $4B in its IPO and is targeting a valuation of about $40B (Bloomberg)
Cerebras Systems Inc. is seeking to raise as much as $4 billion in its initial public offering, according to people familiar with the matter, as demand for the artificial intelligence chipmaker …
Solidot(4)
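- Data center developer Pure Data suspends Middle East investment projects
After its facilities were damaged in an attack, data center developer Pure Data has suspended all investment in Middle East projects. Pure Data operates or is developing more than 1 GW of data center capacity across Europe, Asia, and the Middle East. As infrastructure, data centers have become a significant target in wartime. Three Amazon AWS data centers in the Middle East were attacked, causing large-scale service outages for customers in the region and forcing Amazon to waive all fees for customers of its Middle East cloud regions, at a cost of about $150 million. Pure Data's data center campus on Yas Island in Abu Dhabi was hit by shrapnel. The company has not disclosed when the incident occurred or the extent of the damage.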
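- Germany's births in 2025 fall to lowest level since 1946
Preliminary data from Germany's Federal Statistical Office show that the number of births in 2025 fell to its lowest level since 1946. Germany recorded about 655,000 births in 2025, far below the 1.36 million at the peak of the baby boom in 1964; the 2024 figure was 680,000. Deaths, meanwhile, approached 1.01 million, so deaths exceeded births by more than 352,000 in 2025, a postwar record. Germany's birth rate fell for the fourth consecutive year and now stands at an average of 1.35 children per woman, a record low and far below the 2.1 needed to keep the population stable. Hamburg was the only German state where the fertility rate rose, increasing 0.5% in 2025.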
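As a quick consistency check on the figures in this item, the arithmetic can be sketched as follows. The inputs are the rounded numbers quoted above, so the results are approximate; in particular, the death count is given only as close to 1.01 million, and the variable names are illustrative.

```python
# Rounded figures as quoted in the summary (approximate by construction).
births_2025 = 655_000
births_2024 = 680_000
births_1964 = 1_360_000
deaths_2025 = 1_010_000  # reported only as "close to 1.01 million"

deficit = deaths_2025 - births_2025                      # excess of deaths over births
yoy_change = (births_2025 - births_2024) / births_2024   # year-over-year change in births
share_of_peak = births_2025 / births_1964                # 2025 births relative to the 1964 peak

print(f"deaths minus births: about {deficit:,}")
print(f"change in births vs 2024: {yoy_change:.1%}")
print(f"2025 births as share of 1964 peak: {share_of_peak:.1%}")
```

With these rounded inputs the gap comes to about 355,000, which agrees with the reported "more than 352,000" once the rounding of the death count is taken into account.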
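- The price tag Google puts on you
Swiss email provider Proton used 2025 ad-auction data to analyze more than 54,000 demographic profiles and estimate what advertisers pay to reach different kinds of Americans. The results show that the price gap between individuals is far larger than one might expect. The average American generates about $1,605 in ad value per year. A childless man aged 35-44 living in Bozeman, Montana, who uses a desktop computer for high-value business searches has an estimated ad value of $17,929.30, while a father aged 18-24 living in Fort Smith, Arkansas, who uses an Android phone for low-value searches is worth only $31.05. The $1,605 mean against a $760 median shows that a small number of high-value users pull the average up, and that this business model depends on those users. According to the analysis, users without children are worth about 17% more on average than users with children: once a user is flagged as a parent, the ads targeted at them shift from wealth-management ads at $6 per click to minivan and preschool ads at $2 per click. Desktop users are worth 4.9 times as much as Android users, and Apple iPhone users 2.7 times as much. Ad value peaks between ages 35 and 44 and declines after 65, although the ads aimed at older users fall into high-spend categories such as Medicare supplement insurance, pharmaceuticals, and financial products. Older users are worth less overall, but advertisers target them more precisely. Why are Bozeman residents worth so much? An influx of remote tech workers combined with outdoor-recreation spending has made it one of the most competitive local ad markets in the United States.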
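The gap between the $1,605 mean and the $760 median is what a right-skewed distribution produces. The sketch below uses a small made-up sample (the middle values are hypothetical and are not Proton's data; only the two extremes and the device multipliers come from the summary) to show a single very high value pulling the mean well above the median. It also derives the desktop-to-iPhone ratio implied by the two reported Android multipliers, assuming those multipliers are directly comparable averages.

```python
from statistics import mean, median

# Hypothetical annual ad values in USD; only the lowest and highest come from the summary.
ad_values = [31.05, 250.0, 500.0, 760.0, 900.0, 1400.0, 17929.30]

avg = mean(ad_values)
mid = median(ad_values)
print(f"mean = {avg:.2f}, median = {mid:.2f}")

# Reported multipliers, both relative to Android users.
desktop_vs_android = 4.9
iphone_vs_android = 2.7

# Implied ratio, assuming the two multipliers share the same Android baseline.
desktop_vs_iphone = desktop_vs_android / iphone_vs_android
print(f"desktop vs iPhone: about {desktop_vs_iphone:.2f}x")
```

Dropping the single largest value from the sample brings the mean much closer to the median, which is the sense in which a few high-value users carry the average.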
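- Asian countries ramp up coal-fired power generation in response to energy crisis
The latest Middle East energy crisis is pushing Asian countries to increase coal-fired power generation. Coal is a highly polluting source of emissions, and if the trend continues, global climate change will become more severe. India announced that it will postpone maintenance inspections at its domestic coal-fired power plants. International Energy Agency (IEA) data show that, as of 2023, coal accounted for 74% of India's electricity generation. Oil and natural gas together account for about 3%, and procurement from the Middle East is constrained, so India is increasing coal-fired generation to avoid the risk of blackouts. Thailand's power utility restarted 2 coal-fired units that had been scheduled for retirement. South Korea temporarily lifted the operating restriction that capped coal-fired plants at 80% of generating capacity and postponed the closure of two plants originally due to shut down in June. Japan will also raise the utilization rate of its coal-fired power plants. Bangladesh is adding sources of coal supply. Indonesia, the world's largest exporter of thermal coal, plans to raise its 2026 coal production target, originally set at 600 million tons. The Australian government, the second-largest exporter, also plans to expand coal production.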