Weekly Digest — 2025-W26
178 unique stories (2025-06-23 → 2025-06-29), aggregated across 8 sources.
Hacker News(42)
- Vera C. Rubin Observatory first images (rubinobservatory.org)
- Judge denies creating “mass surveillance program” harming all ChatGPT users (arstechnica.com)
- uv: An extremely fast Python package and project manager, written in Rust (github.com)
- How I use my terminal (jyn.dev)
- Officials concede they don't know the fate of Iran's uranium stockpile (www.nytimes.com)
- NASA's Voyager Found a 30k-50k Kelvin "Wall" at the Edge of Solar System (www.iflscience.com)
- Fun with uv and PEP 723 (www.cottongeeks.com)
- Man 'refused entry into US' as border control catch him with bald JD Vance meme (www.dublinlive.ie)
- iPhone customers upset by Apple Wallet ad pushing F1 movie (techcrunch.com)
- MCP is eating the world (www.stainless.com)
- Writing toy software is a joy (blog.jsbarretto.com)
- PlasticList – Plastic Levels in Foods (www.plasticlist.org)
GitHub Trending(29)
- microsoft / edit
We all edit.
- voideditor / void
- ghostty-org / ghostty
👻 Ghostty is a fast, feature-rich, and cross-platform terminal emulator that uses platform-native UI and GPU acceleration.
- kortix-ai / suna
Suna - Open Source Generalist AI Agent
- x1xhlol / system-prompts-and-models-of-ai-tools
FULL v0, Cursor, Manus, Same.dev, Lovable, Devin, Replit Agent, Windsurf Agent, VSCode Agent, Dia Browser & Trae AI (And other Open Sourced) System Prompts, Tools & AI Models.
- typst / typst
A new markup-based typesetting system that is powerful and easy to learn.
- DrKLO / Telegram
Telegram for Android source
- patchy631 / ai-engineering-hub
In-depth tutorials on LLMs, RAGs and real-world AI agent applications.
- HarbourMasters / SpaghettiKart
- jujumilk3 / leaked-system-prompts
Collection of leaked system prompts
- musistudio / claude-code-router
Use Claude Code as the foundation for coding infrastructure, allowing you to decide how to interact with the model while enjoying updates from Anthropic.
- DioxusLabs / dioxus
Fullstack app framework for web, desktop, and mobile.
Product Hunt(42)
- HeyBoss AI Boss Mode
Get your AI team to build your site and run your business.
- Read & Give by Bono
Read the news. Make a difference. Instantly.
- AI Assistant by Mintlify
A conversational, agentic assistant built into your docs
- Bookster.cc
Transform your knowledge into captivating ebooks in minutes.
- Karsa
Get a virtual US bank account + save/spend dollars globally
- Reducto Studio
Build production-ready document pipelines in one platform
- Pally - AI Relationship Management
All your connections, across all your socials.
- Pythagora 2.0
World's first all-in-one AI dev platform
- Runbear
Your best new hire, but AI — in Slack!
- Cekura
Launch reliable voice & chat AI agents 10x faster
- SmythOS
The open source agent OS
- Zen Agents (by Zencoder)
Build AI agents. Share org-wide. 100+ Tools & MCP
Hugging Face(30)
- Better Language Model Inversion by Compactly Representing Next-Token Distributions
Language model inversion seeks to recover hidden prompts using only language model outputs. This capability has implications for security and accountability in language model deployments, such as leaking private information from an API-protected language model's system message. We propose a new method, prompt inversion from logprob sequences (PILS), that recovers hidden prompts by gleaning clues from the model's next-token probabilities over the course of multiple generation steps. Our method is enabled by a key insight: the vector-valued outputs of a language model occupy a low-dimensional subspace. This enables us to losslessly compress the full next-token probability distribution over multiple generation steps using a linear map, allowing more output information to be used for inversion. Our approach yields massive gains over previous state-of-the-art methods for recovering hidden prompts, achieving 2 to 3.5 times higher exact recovery rates across test sets, in one case increasing the recovery rate from 17% to 60%. Our method also exhibits surprisingly good generalization behavior; for instance, an inverter trained on 16 generation steps gets 5 to 27 points higher prompt recovery when we increase the number of steps to 32 at test time. Furthermore, we demonstrate strong performance of our method on the more challenging task of recovering hidden system messages. We also analyze the role of verbatim repetition in prompt recovery and propose a new method for cross-family model transfer for logit-based inverters. Our findings show that next-token probabilities are a considerably more vulnerable attack surface for inversion attacks than previously known.
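The key insight, that a model's vector-valued outputs occupy a low-dimensional subspace, is easy to demonstrate. Below is a minimal numpy sketch with toy sizes and random stand-in matrices (not the paper's code): because logits are W·h for hidden state h, T steps of full-vocabulary outputs compress losslessly from T·V to T·d floats with a single linear map.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, T = 8_000, 512, 16                 # vocab size, hidden size, generation steps

W = rng.standard_normal((V, d))          # stand-in unembedding matrix
H = rng.standard_normal((T, d))          # stand-in hidden states, one per step
logits = H @ W.T                         # (T, V): every row lies in the column space of W

W_pinv = np.linalg.pinv(W)               # (d, V): the linear "compressor"
codes = logits @ W_pinv.T                # (T, d) compact codes, d << V

recon = codes @ W.T                      # decompress
assert np.allclose(recon, logits)        # round trip is (numerically) lossless
print(f"{V / d:.1f}x fewer floats per generation step")
```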
- Watermarking Autoregressive Image Generation
Watermarking the outputs of generative models has emerged as a promising approach for tracking their provenance. Despite significant interest in autoregressive image generation models and their potential for misuse, no prior work has attempted to watermark their outputs at the token level. In this work, we present the first such approach by adapting language model watermarking techniques to this setting. We identify a key challenge: the lack of reverse cycle-consistency (RCC), wherein re-tokenizing generated image tokens significantly alters the token sequence, effectively erasing the watermark. To address this and to make our method robust to common image transformations, neural compression, and removal attacks, we introduce (i) a custom tokenizer-detokenizer finetuning procedure that improves RCC, and (ii) a complementary watermark synchronization layer. As our experiments demonstrate, our approach enables reliable and robust watermark detection with theoretically grounded p-values.
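For readers unfamiliar with the token-level watermarks being adapted, here is a minimal sketch of the generic green-list detection recipe from language-model watermarking; the hash seeding, green fraction, and binomial p-value are the textbook construction applied to image tokens, not this paper's exact scheme, and the RCC finetuning and synchronization layer are omitted.

```python
import hashlib
from scipy.stats import binomtest

GAMMA = 0.5   # fraction of the token codebook treated as "green" at each step

def is_green(prev_token: int, token: int) -> bool:
    # Pseudo-random green list keyed on the preceding image token.
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return digest[0] < GAMMA * 256

def watermark_pvalue(tokens: list[int]) -> float:
    # Count green successors; with no watermark, hits follow a Binomial null.
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    return binomtest(hits, len(tokens) - 1, GAMMA, alternative="greater").pvalue
```

An unwatermarked token sequence yields a roughly uniform p-value, while a generator that preferentially samples green tokens drives it toward zero.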
- MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
Combining pre-trained expert models offers substantial potential for scalable multimodal reasoning, but building a unified framework remains challenging due to the increasing diversity of input modalities and task complexity. For instance, medical diagnosis requires precise reasoning over structured clinical tables, while financial forecasting depends on interpreting plot-based data to make informed predictions. To tackle this challenge, we introduce MEXA, a training-free framework that performs modality- and task-aware aggregation of multiple expert models to enable effective multimodal reasoning across diverse and distinct domains. MEXA dynamically selects expert models based on the input modality and the task-specific reasoning demands (i.e., skills). Each expert model, specialized in a modality task pair, generates interpretable textual reasoning outputs. MEXA then aggregates and reasons over these outputs using a Large Reasoning Model (LRM) to produce the final answer. This modular design allows flexible and transparent multimodal reasoning across diverse domains without additional training overhead. We extensively evaluate our approach on diverse multimodal benchmarks, including Video Reasoning, Audio Reasoning, 3D Understanding, and Medical QA. MEXA consistently delivers performance improvements over strong multimodal baselines, highlighting the effectiveness and broad applicability of our expert-driven selection and aggregation in diverse multimodal reasoning tasks.
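A rough sketch of what training-free, modality- and skill-aware aggregation could look like, with a hypothetical expert registry inferred from the abstract (the Expert fields, mexa_answer, and prompt format are illustrative assumptions, not the authors' API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Expert:
    name: str
    modalities: set[str]            # e.g. {"video"}, {"audio"}
    skills: set[str]                # e.g. {"temporal", "medical-qa"}
    run: Callable[[dict], str]      # returns interpretable textual reasoning

def mexa_answer(query: dict, experts: list[Expert], lrm: Callable[[str], str]) -> str:
    # 1. Modality- and skill-aware selection (no training involved).
    picked = [e for e in experts
              if e.modalities & query["modalities"] and e.skills & query["skills"]]
    # 2. Each selected expert emits a textual rationale.
    rationales = [f"[{e.name}] {e.run(query)}" for e in picked]
    # 3. A large reasoning model aggregates the rationales into the final answer.
    prompt = f"Question: {query['text']}\n" + "\n".join(rationales) + "\nAnswer:"
    return lrm(prompt)
```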
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
Vision-language models (VLMs) excel at multimodal understanding, yet their text-only decoding forces them to verbalize visual reasoning, limiting performance on tasks that demand visual imagination. Recent attempts train VLMs to render explicit images, but the heavy image-generation pre-training often hinders the reasoning ability. Inspired by the way humans reason with mental imagery, the internal construction and manipulation of visual cues, we investigate whether VLMs can reason through interleaved multimodal trajectories without producing explicit images. To this end, we present a Machine Mental Imagery framework, dubbed Mirage, which augments VLM decoding with latent visual tokens alongside ordinary text. Concretely, whenever the model chooses to "think visually", it recasts its hidden states as next tokens, thereby continuing a multimodal trajectory without generating pixel-level images. We begin by supervising the latent tokens through distillation from ground-truth image embeddings, then switch to text-only supervision to align the latent trajectory tightly with the task objective. A subsequent reinforcement learning stage further enhances multimodal reasoning capability. Experiments on diverse benchmarks demonstrate that Mirage unlocks stronger multimodal reasoning without explicit image generation.
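The decoding loop is the interesting part: instead of sampling a word, the model may recast its hidden state as the next input embedding. A toy sketch, using a GRU cell as a stand-in for the VLM decoder (the module names and the special-action convention are assumptions; the distillation and RL training stages are omitted):

```python
import torch
import torch.nn as nn

d, vocab = 256, 1000
embed = nn.Embedding(vocab, d)
decoder = nn.GRUCell(d, d)            # stand-in for one VLM decoder step
lm_head = nn.Linear(d, vocab + 1)     # index `vocab` = a special "think visually" action
to_latent = nn.Linear(d, d)           # hidden state -> latent visual token embedding

h = torch.zeros(1, d)
x = embed(torch.tensor([0]))          # BOS embedding
text_tokens = []
with torch.no_grad():
    for _ in range(20):
        h = decoder(x, h)
        action = int(lm_head(h).argmax(-1))
        if action == vocab:           # model elected to "think visually":
            x = to_latent(h)          # recast the hidden state as the next input
        else:                         # ordinary text token
            text_tokens.append(action)
            x = embed(torch.tensor([action]))
print(text_tokens)
```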
- From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models
One promise that Vision-Language-Action (VLA) models hold over traditional imitation learning for robotics is to leverage the broad generalization capabilities of large Vision-Language Models (VLMs) to produce versatile, "generalist" robot policies. However, current evaluations of VLAs remain insufficient. Traditional imitation learning benchmarks are unsuitable due to the lack of language instructions. Emerging benchmarks for VLAs that incorporate language often come with limited evaluation tasks and do not intend to investigate how much VLM pretraining truly contributes to the generalization capabilities of the downstream robotic policy. Meanwhile, much research relies on real-world robot setups designed in isolation by different institutions, which creates a barrier for reproducibility and accessibility. To address this gap, we introduce a unified probing suite of 50 simulation-based tasks across 10 subcategories spanning language instruction, vision, and objects. We systematically evaluate several state-of-the-art VLA architectures on this suite to understand their generalization capability. Our results show that while VLM backbones endow VLAs with robust perceptual understanding and high level planning, which we refer to as good intentions, this does not reliably translate into precise motor execution: when faced with out-of-distribution observations, policies often exhibit coherent intentions, but falter in action execution. Moreover, finetuning on action data can erode the original VLM's generalist reasoning abilities. We release our task suite and evaluation code to serve as a standardized benchmark for future VLAs and to drive research on closing the perception-to-action gap. More information, including the source code, can be found at https://ai4ce.github.io/INT-ACT/
- Optimizing Multilingual Text-To-Speech with Accents & Emotions
While state-of-the-art text-to-speech (TTS) systems achieve high naturalness in monolingual environments, synthesizing speech with correct multilingual accents (especially for Indic languages) and context-relevant emotions still poses difficulty, owing to cultural nuance discrepancies in current frameworks. This paper introduces a new TTS architecture that integrates accent with transliteration preservation and multi-scale emotion modelling, tuned in particular for Hindi and Indian English accents. Our approach extends the Parler-TTS model by integrating a language-specific phoneme-alignment hybrid encoder-decoder architecture and culture-sensitive emotion embedding layers trained on native speaker corpora, and by incorporating dynamic accent code-switching with residual vector quantization. Quantitative tests demonstrate a 23.7% improvement in accent accuracy (Word Error Rate reduced from 15.4% to 11.8%) and 85.3% emotion recognition accuracy from native listeners, surpassing METTS and VECL-TTS baselines. The novelty of the system is that it can code-switch in real time, generating statements such as "Namaste, let's talk about <Hindi phrase>" with uninterrupted accent shifts while preserving emotional consistency. Subjective evaluation with 200 users reported a mean opinion score (MOS) of 4.2/5 for cultural correctness, far better than existing multilingual systems (p<0.01). This research makes cross-lingual synthesis more feasible by showcasing scalable accent-emotion disentanglement, with direct applications in South Asian EdTech and accessibility software.
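The residual vector quantization mentioned for accent code-switching is a standard building block, sketched below in numpy with toy codebooks (sizes are illustrative; real systems learn the codebooks from data): each stage quantizes whatever residual the previous stage left behind.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_codes, n_stages = 8, 256, 4
# Real RVQ trains each codebook on the previous stage's residuals; the
# geometric scale decay below crudely mimics that shrinking distribution.
codebooks = [rng.standard_normal((n_codes, dim)) * 0.6 ** s for s in range(n_stages)]

def rvq_encode(x: np.ndarray) -> list[int]:
    codes, residual = [], x.copy()
    for cb in codebooks:
        idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]    # the next stage refines what is left
    return codes

def rvq_decode(codes: list[int]) -> np.ndarray:
    return sum(cb[i] for cb, i in zip(codebooks, codes))

x = rng.standard_normal(dim)
err = np.linalg.norm(x - rvq_decode(rvq_encode(x))) / np.linalg.norm(x)
print(f"relative reconstruction error after {n_stages} stages: {err:.2f}")
```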
- RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models
Recent multi-modal large language models (MLLMs) often struggle to generate personalized image captions, even when trained on high-quality captions. In this work, we observe that such limitations persist in existing post-training-based MLLM personalization methods. Specifically, despite being post-tuned with large-scale caption data through supervised fine-tuning (SFT), these models frequently fail to produce faithful descriptions in real-world scenarios, such as multi-concept image captioning. However, acquiring large-scale, high-quality captions for such complex settings is both costly and difficult. To address the data-centric nature of SFT, we propose a reinforcement learning (RL)-based post-training framework. To the best of our knowledge, this is the first RL-based approach to post-train MLLMs for personalized image captioning. Our method significantly enhances both visual recognition and personalized generation capabilities of MLLMs, and consistently outperforms existing SFT-based baselines, especially in the challenging multi-concept image captioning task.
- 4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time
Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at some times to any view at any time? We provide an affirmative answer with 4D-LRM, the first large-scale 4D reconstruction model that takes input from unconstrained views and timestamps and renders arbitrary novel view-time combinations. Unlike prior 4D approaches (e.g., optimization-based, geometry-based, or generative) that struggle with efficiency, generalization, or faithfulness, 4D-LRM learns a unified space-time representation and directly predicts per-pixel 4D Gaussian primitives from posed image tokens across time, enabling fast, high-quality rendering at, in principle, infinite frame rate. Our results demonstrate that scaling spatiotemporal pretraining enables accurate and efficient 4D reconstruction. We show that 4D-LRM generalizes to novel objects, interpolates across time, and handles diverse camera setups. It reconstructs 24-frame sequences in one forward pass in under 1.5 seconds on a single A100 GPU.
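As a rough illustration of the representation being predicted, here is what a per-pixel space-time Gaussian primitive might carry; the field layout and separable falloff are assumptions for clarity, not the paper's exact parameterization (which would use a full 4D covariance).

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian4D:
    mu_xyz: np.ndarray     # (3,) spatial center
    mu_t: float            # temporal center
    scale_xyz: np.ndarray  # (3,) spatial extents
    scale_t: float         # temporal extent: how long the blob persists
    rgb: np.ndarray        # (3,) color
    opacity: float

    def weight(self, xyz: np.ndarray, t: float) -> float:
        # Separable space-time falloff; a real model would carry a full
        # 4D covariance rather than axis-aligned scales.
        d2 = float((((xyz - self.mu_xyz) / self.scale_xyz) ** 2).sum())
        d2 += ((t - self.mu_t) / self.scale_t) ** 2
        return self.opacity * float(np.exp(-0.5 * d2))
```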
- Spec2RTL-Agent: Automated Hardware Code Generation from Complex Specifications Using LLM Agent Systems
Despite recent progress in generating hardware RTL code with LLMs, existing solutions still suffer from a substantial gap between practical application scenarios and the requirements of real-world RTL code development. Prior approaches either focus on overly simplified hardware descriptions or depend on extensive human guidance to process complex specifications, limiting their scalability and automation potential. In this paper, we address this gap by proposing an LLM agent system, termed Spec2RTL-Agent, designed to directly process complex specification documentation and generate corresponding RTL code implementations, advancing LLM-based RTL code generation toward more realistic application settings. To achieve this goal, Spec2RTL-Agent introduces a novel multi-agent collaboration framework that integrates three key enablers: (1) a reasoning and understanding module that translates specifications into structured, step-by-step implementation plans; (2) a progressive coding and prompt optimization module that iteratively refines the code across multiple representations to enhance correctness and synthesizability for RTL conversion; and (3) an adaptive reflection module that identifies and traces the source of errors during generation, ensuring a more robust code generation flow. Instead of directly generating RTL from natural language, our system strategically generates synthesizable C++ code, which is then optimized for HLS. This agent-driven refinement ensures greater correctness and compatibility compared to naive direct RTL generation approaches. We evaluate Spec2RTL-Agent on three specification documents, showing it generates accurate RTL code with up to 75% fewer human interventions than existing methods. This highlights its role as the first fully automated multi-agent system for RTL generation from unstructured specs, reducing reliance on human effort in hardware design.
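A schematic sketch of the refine-and-reflect control flow the abstract describes, with every callable a hypothetical placeholder for the paper's agents and toolchain:

```python
def spec2rtl(spec_text: str, llm, run_hls_check, max_rounds: int = 5) -> str:
    # Plan from the spec, then emit synthesizable C++ destined for HLS.
    plan = llm(f"Turn this spec into a step-by-step implementation plan:\n{spec_text}")
    code = llm(f"Write synthesizable, HLS-ready C++ following this plan:\n{plan}")
    for _ in range(max_rounds):
        ok, errors = run_hls_check(code)       # e.g. compile + HLS synthesis
        if ok:
            return code                        # ready for HLS -> RTL conversion
        # Adaptive reflection: trace the root cause, then patch the code.
        diagnosis = llm(f"Trace the root cause of these errors:\n{errors}\n{code}")
        code = llm(f"Revise the C++ to fix the diagnosed issue:\n{diagnosis}\n{code}")
    raise RuntimeError("failed to converge within the round budget")
```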
- Demystifying the Visual Quality Paradox in Multimodal Large Language Models
Recent Multimodal Large Language Models (MLLMs) excel on benchmark vision-language tasks, yet little is known about how input visual quality shapes their responses. Does higher perceptual quality of images already translate to better MLLM understanding? We conduct the first systematic study spanning leading MLLMs and a suite of vision-language benchmarks, applying controlled degradations and stylistic shifts to each image. Surprisingly, we uncover a visual-quality paradox: model, task, and even individual-instance performance can improve when images deviate from human-perceived fidelity. Off-the-shelf restoration pipelines fail to reconcile these idiosyncratic preferences. To close the gap, we introduce Visual-Quality Test-Time Tuning (VQ-TTT), a lightweight adaptation module that: (1) inserts a learnable, low-rank kernel before the frozen vision encoder to modulate frequency content; and (2) fine-tunes only shallow vision-encoder layers via LoRA. VQ-TTT dynamically adjusts each input image in a single forward pass, aligning it with task-specific model preferences. Across the evaluated MLLMs and all datasets, VQ-TTT delivers significant average accuracy gains, with no external models, cached features, or extra training data. These findings redefine "better" visual inputs for MLLMs and highlight the need for adaptive, rather than universally "clean", imagery in the new era of AI being the main data customer.
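A minimal sketch of the low-rank input-filter idea, under assumed shapes and rank (the LoRA-tuned shallow encoder layers are omitted): a learnable k x k kernel of low rank, initialized near identity, is applied depthwise before the frozen vision encoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankKernel(nn.Module):
    """Learnable k x k filter of low rank, applied depthwise, near-identity at init."""
    def __init__(self, k: int = 7, r: int = 2):
        super().__init__()
        self.u = nn.Parameter(torch.randn(k, r) * 1e-2)
        self.v = nn.Parameter(torch.randn(r, k) * 1e-2)
        ident = torch.zeros(k, k)
        ident[k // 2, k // 2] = 1.0
        self.register_buffer("ident", ident)   # identity tap: training starts from "no change"

    def forward(self, img: torch.Tensor) -> torch.Tensor:   # img: (B, 3, H, W)
        k = self.u.shape[0]
        kern = self.u @ self.v + self.ident    # (k, k), low-rank + identity
        weight = kern.repeat(3, 1, 1, 1)       # (3, 1, k, k): one shared depthwise filter
        return F.conv2d(img, weight, padding=k // 2, groups=3)

# Only this module (plus LoRA on shallow encoder layers, omitted here) is tuned;
# the vision encoder itself stays frozen.
out = LowRankKernel()(torch.randn(1, 3, 224, 224))
print(out.shape)   # torch.Size([1, 3, 224, 224])
```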
- 3D Arena: An Open Platform for Generative 3D Evaluation
Evaluating Generative 3D models remains challenging due to misalignment between automated metrics and human perception of quality. Current benchmarks rely on image-based metrics that ignore 3D structure or geometric measures that fail to capture perceptual appeal and real-world utility. To address this gap, we present 3D Arena, an open platform for evaluating image-to-3D generation models through large-scale human preference collection using pairwise comparisons. Since launching in June 2024, the platform has collected 123,243 votes from 8,096 users across 19 state-of-the-art models, establishing the largest human preference evaluation for Generative 3D. We contribute the iso3d dataset of 100 evaluation prompts and demonstrate quality control achieving 99.75% user authenticity through statistical fraud detection. Our ELO-based ranking system provides reliable model assessment, with the platform becoming an established evaluation resource. Through analysis of this preference data, we present insights into human preference patterns. Our findings reveal preferences for visual presentation features, with Gaussian splat outputs achieving a 16.6 ELO advantage over meshes and textured models receiving a 144.1 ELO advantage over untextured models. We provide recommendations for improving evaluation methods, including multi-criteria assessment, task-oriented evaluation, and format-aware comparison. The platform's community engagement establishes 3D Arena as a benchmark for the field while advancing understanding of human-centered evaluation in Generative 3D.
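For context, pairwise-vote leaderboards of this kind run an Elo-style update; a small self-contained sketch (the K-factor and base rating are conventional defaults, not values from the paper):

```python
def expected_score(r_a: float, r_b: float) -> float:
    # Probability that A beats B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def record_vote(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    e = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - e)     # surprising wins move ratings more
    ratings[loser] -= k * (1.0 - e)

ratings = {"model_A": 1000.0, "model_B": 1000.0}
votes = [("model_A", "model_B")] * 3 + [("model_B", "model_A")]
for w, l in votes:
    record_vote(ratings, w, l)
print(ratings)   # model_A ends above model_B after winning 3 of 4 comparisons
```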
- Audit & Repair: An Agentic Framework for Consistent Story Visualization in Text-to-Image Diffusion Models
Story visualization has become a popular task where visual scenes are generated to depict a narrative across multiple panels. A central challenge in this setting is maintaining visual consistency, particularly in how characters and objects persist and evolve throughout the story. Despite recent advances in diffusion models, current approaches often fail to preserve key character attributes, leading to incoherent narratives. In this work, we propose a collaborative multi-agent framework that autonomously identifies, corrects, and refines inconsistencies across multi-panel story visualizations. The agents operate in an iterative loop, enabling fine-grained, panel-level updates without re-generating entire sequences. Our framework is model-agnostic and flexibly integrates with a variety of diffusion models, including rectified flow transformers such as Flux and latent diffusion models such as Stable Diffusion. Quantitative and qualitative experiments show that our method outperforms prior approaches in terms of multi-panel consistency.
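The iterative, panel-level loop can be summarized in a few lines; everything here is a hypothetical placeholder for the paper's agents and diffusion backend:

```python
def audit_and_repair(panels, audit, repair, max_iters: int = 4):
    for _ in range(max_iters):
        findings = audit(panels)          # e.g. {2: "jacket color changed"}
        if not findings:
            break                         # the story is now consistent
        for idx, issue in findings.items():
            # Panel-level edit; the rest of the sequence is left untouched.
            panels[idx] = repair(panels[idx], issue, reference=panels)
    return panels
```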
Solidot(35)
- Climate warming will significantly cut food production
According to a global analysis of crop yields published in Nature, by the end of the century every 1°C of warming will cut the food available per person by about 121 kilocalories per day. Study author Andrew Hultgren of the University of Illinois Urbana-Champaign says that on the current trajectory, under a 3°C warming scenario, "that's basically like everyone giving up breakfast." Hultgren and colleagues compiled yield data for the world's six staple crops, which together supply more than two-thirds of the calories humans consume. They found that high temperatures inflict heavy losses on every crop except rice, which grows better in warmer nights. Projections indicate that by the end of the century maize yields will be 12% or 28% lower than they would have been without global warming, depending on whether greenhouse-gas emissions are moderate or very high. If farmers take no steps to adapt, crop losses in a high-warming scenario would be roughly one third larger by century's end. Even with such agricultural adaptation, the researchers note, the enormous crop losses caused by climate change are unlikely to be offset.
- Intel outsources marketing to Accenture
Intel's new CEO Lip-Bu Tan is working to cut costs and overhaul the business, and as part of the new strategy the chip giant is outsourcing its marketing operations to Accenture. Intel says it believes Accenture, with the help of AI, will communicate with customers more effectively. The company will tell most of its marketing staff by July 11 whether it plans to lay them off. Intel said the transformation of its marketing and operations functions will mean sweeping changes to team structures and may lead to headcount cuts, leaving only lean teams in place. It declined to say how many employees would lose their jobs, or how many people its marketing department employs.
- Psyche probe switches to its backup fuel line
NASA announced that the Psyche spacecraft has switched to its backup propellant line and is continuing toward its target, the asteroid Psyche. Launched in October 2023, the probe is headed for Psyche in the asteroid belt between Mars and Jupiter, about 4 billion kilometers from Earth; the asteroid is believed to contain iron, nickel, platinum, rare earths, and other elements of potentially enormous value. Arrival is expected in 2029. The spacecraft uses four xenon-fueled electric thrusters, which are far more fuel-efficient than conventional rocket engines. They operated normally until April 1 this year, when pressure in the fuel line dropped and the spacecraft shut them down in response. The good news is that flexibility is a major advantage of electric propulsion: while conventional engine burns must occur within a set window, the two-month thruster shutdown before the switch to the backup line did not affect the arrival date. NASA says the incident is thought to have been caused by a valve in the primary line that may have malfunctioned.
- Astronomers find a clue to the missing ordinary matter
While observing the Shapley Supercluster, the largest concentration of galaxies in the universe, astronomers discovered a filament of hot intergalactic gas roughly 23 million light-years long that may be a key clue in the hunt for the universe's missing ordinary matter. The filament lies among four sub-clusters of the Shapley Supercluster (A3528N, A3528S, A3530, and A3532), radiates X-rays at temperatures above ten million degrees, and has a total mass equivalent to about 10 Milky Ways. The researchers note that this matches the signature predicted by cosmological simulations for filamentary intergalactic gas in the cosmic web, the vast dark-matter-shaped network of filaments that threads intergalactic space and serves as a channel through which galaxies exchange gas. From the cosmic microwave background (CMB), the amount of ordinary matter in the early universe is well determined, yet the stars, galaxies, planets, gas, and dust directly observable in today's universe add up to only about half of what is expected, a discrepancy cosmologists call the "missing baryon problem". Since matter does not simply vanish, where this "missing" ordinary matter hides has been a major question in observational and theoretical astronomy in recent years. The discovery provides strong evidence that the missing ordinary matter resides in the extremely tenuous, hard-to-detect filaments of the cosmic web between galaxies.
- Global warming, 300 million years ago
Earth is warming rapidly today, but more than 300 million years ago a similar climate upheaval drove huge swings in marine life. A Nanjing University team reports in Science Advances that during the late Paleozoic (roughly 340 to 250 million years ago), marine life diversified and speciated rapidly while Earth cooled slowly, whereas episodes of abrupt warming, especially those driven by volcanic eruptions, triggered mass extinctions. The study centers on fusulines, ancient single-celled marine foraminifera that were individually tiny but so staggeringly abundant that they dominated the seafloor and earned the nickname "carbonate factories". The team found that over a span of more than 91.8 million years these organisms went through two bursts of diversification and four extinction crises. Around the massive Emeishan volcanic eruptions about 260 million years ago, the larger fusulines all but disappeared, and the end-Permian supervolcanic event 252 million years ago ended the lineage's evolutionary history for good. Alarmingly, modern human-driven warming is proceeding far faster than the warming from the ancient Emeishan flood basalts or the end-Permian volcanism, and today's marine ecosystems may be facing a trial like the one the fusulines once endured.
- Smartphones are humanity's parasites
Throughout human evolution, parasites such as head lice, fleas, and tapeworms have been constant companions. But the most powerful modern parasite is not a blood-sucking invertebrate; it is the smartphone. Smartphones feed on our time, attention, and personal information for the profit of tech companies and their advertisers. Viewed through the lens of evolution and parasitism, the smartphone poses a risk to society unlike any other. A parasite depends on its host and dies quickly without it; a head louse, for instance, costs its human host little more than an itch. Smartphones have changed our lives so thoroughly that many people cannot do without them, and the cost is that some become virtual slaves to the device, suffering sleep deprivation, weakened offline relationships, and emotional disorder. The human-smartphone relationship began as mutualism but has gradually turned parasitic: the popular apps it delivers exist not for users' benefit but to manipulate our behavior and emotions for the gain of their developers and advertisers. The user is the host and the smartphone the parasite. We need to place limits on it, at least to restore some of the mutualism, but the power of the tech oligarchs is more than ordinary people can withstand.
- China's solar installations set a new record in May
Official figures show China set a record for solar capacity installed in May, with the single month's additions exceeding what any other country installed in all of 2024. According to the National Energy Administration, 93 GW of solar capacity was added in May, breaking the 71 GW record set in December 2024. One reason for the surge is that on June 1 the government ended price guarantees for solar projects, under which a project was assured of profitability as soon as it went into operation. Another policy, effective May 1, makes it harder to connect rooftop solar panels to the grid. Analysts predict the new policies will slow the growth of solar installations. China's cumulative installed solar capacity now exceeds 1 TW.
- Vera C. Rubin Observatory releases its first panoramic images of the cosmos
The Vera C. Rubin Observatory published its first panoramic images of the universe on Monday, inaugurating the ten-year Legacy Survey of Space and Time (LSST), which will be the most comprehensive survey of the southern sky in history. The observatory sits atop Cerro Pachón in Chile and pairs an 8.4-meter telescope with LSSTCam, the largest and highest-resolution digital camera ever built, roughly the size of a car. The camera can sweep the entire southern sky every three nights. Among the first released images, LSSTCam captured the Virgo galaxy cluster about 50 million light-years from Earth in a frame containing as many as 10 million galaxies, yet those 10 million galaxies amount to just 0.05% of the roughly 20 billion galaxies LSST is expected to observe over the course of the mission.
- Google brings AI features to Chromebooks
Google is building AI features into more and more of its products, the latest being the Chromebook, its laptop line for the education market. Although the AI computation happens mostly in the cloud, using AI on a Chromebook still demands fairly capable hardware. The Chromebook Plus 14, built in partnership with Lenovo, carries a MediaTek Kompanio Ultra processor that Google calls the most powerful ARM chip ever in a Chromebook. The Kompanio Ultra's NPU delivers 50 TOPS of AI compute, enough to run some AI models locally and approaching Microsoft's Copilot+ PCs. The machine sells for $749.
- Amazon accelerates launches of its broadband internet satellites
On Monday ULA launched 27 of Amazon's Project Kuiper broadband internet satellites on an Atlas V rocket from Cape Canaveral, Florida. Project Kuiper has completed three launches so far, the first of them a test, putting 54 broadband satellites in orbit to date; Amazon plans a constellation of 3,232 satellites covering most densely populated regions. Amazon has bought more than 80 launches from four providers: ULA will fly nine Atlas V missions before that rocket retires, then 38 missions on Vulcan, each carrying an increased load of 45 satellites; Europe's Ariane 6 will fly 18 missions; and Blue Origin's New Glenn, owned by Jeff Bezos, will fly at least 12. Rival SpaceX will carry out Project Kuiper's fourth launch next month. SpaceX's own Starlink broadband constellation already numbers more than 7,000 satellites.
- IYO sues OpenAI over the IO trademark
IYO, a startup spun out of Google X, has sued OpenAI and Jony Ive's IO Products, Inc. over the IO trademark. OpenAI announced on May 21, 2025 that it would acquire IO for $6.5 billion, but quietly pulled the related promotional materials a few days ago. IYO makes an ear-worn device called IYO ONE that lets users interact with computers and AI through voice commands, with no screen or keyboard. The complaint accuses the defendants of deliberately adopting a confusingly similar name for a competing product. It alleges that OpenAI CEO Sam Altman and Ive's design studio LoveFrom met repeatedly with IYO representatives between 2022 and 2025, learning details of IYO's technology and business plans, and that in March 2025 Altman told IYO he was developing a competing product called io. IO Products, founded in September 2023, is working on screen-free computer-interaction hardware similar to IYO's. The suit seeks injunctive relief and damages for trademark infringement and unfair competition.
- Firefox 140 released
Mozilla has released Firefox 140, a long-term support (LTS) release. Major new features include: an "Unload Tab" option in the tab context menu, which frees the memory held by unused tabs and saves CPU; support for the CSS Custom Highlight API, which Chrome has shipped since v121; improved vertical tabs; support for adding custom search engines; and support for the SVG fetchpriority attribute, the Cookie Store API, and more.