OrangeBot.AI Digest — 2025-07-03

75 headlines across 8 sources, aggregated for the day.

Hacker News(15)

  1. Opening up ‘Zero-Knowledge Proof’ technology (blog.google)
  2. AV1@Scale: Film Grain Synthesis, The Awakening (netflixtechblog.com)
  3. Launch HN: K-Scale Labs (YC W24) – Open-Source Humanoid Robots
  4. Poor Man's Back End-as-a-Service (BaaS), Similar to Firebase/Supabase/Pocketbase (github.com)
  5. Introducing tmux-rs (richardscollin.github.io)
  6. Spending Too Much Money on a Coding Agent (allenpike.com)
  7. Peasant Railgun (knightsdigest.com)
  8. Tools: Code Is All You Need (lucumr.pocoo.org)
  9. About AI Evals (hamel.dev)
  10. Writing Code Was Never the Bottleneck (ordep.dev)
  11. I scanned all of GitHub's "oops commits" for leaked secrets (trufflesecurity.com)
  12. Astronomers discover 3I/ATLAS – Third interstellar object to visit Solar System (www.abc.net.au)
  13. Fei-Fei Li: Spatial intelligence is the next frontier in AI [video] (www.youtube.com)
  14. Next month, saved passwords will no longer be in Microsoft’s Authenticator app (www.cnet.com)
  15. Gmailtail – Command-line tool to monitor Gmail messages and output them as JSON (github.com)

GitHub Trending(15)

  1. NanmiCoder / MediaCrawler

    Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments.

  2. mrdoob / three.js

    JavaScript 3D Library.

  3. microsoft / generative-ai-for-beginners

    21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/

  4. LadybirdBrowser / ladybird

    Truly independent web browser

  5. isaac-sim / IsaacLab

    Unified framework for robot learning built on NVIDIA Isaac Sim

  6. openssl / openssl

    TLS/SSL and crypto library

  7. Genesis-Embodied-AI / Genesis

    A generative world for general-purpose robotics & embodied AI learning.

  8. cloudcommunity / Free-Certifications

    A curated list of free courses with certifications. Also available at https://free-certifications.com/

  9. The-Cool-Coders / Project-Ideas-And-Resources

    A Collection of application ideas that can be used to improve your coding skills ❤.

  10. danny-avila / LibreChat

    Enhanced ChatGPT Clone: Features Agents, DeepSeek, Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, Code Interpreter, langchain, DALL-E-3, OpenAPI Actions, Functions, Secure Multi-User Auth, Presets, open-source for self-hosting. Active project.

  11. btjawa / BiliTools

    A cross-platform bilibili toolbox, supporting downloads of videos, bangumi, and other resources.

  12. MotiaDev / motia

    Unified Backend Framework for APIs, Events, and AI Agents

  13. datawhalechina / happy-llm

    📚 A from-scratch tutorial on the principles and practice of large language models.

  14. llmware-ai / llmware

    Unified framework for building enterprise RAG pipelines with small, specialized models

  15. swagger-api / swagger-ui

    Swagger UI is a collection of HTML, JavaScript, and CSS assets that dynamically generate beautiful documentation from a Swagger-compliant API.

Product Hunt(15)

  1. Skala

    Legal platform for startups

  2. Autocoder.cc

    The 1st full-stack vibe coding tool

  3. AppStruct

    No-code app builder

  4. LLM Gateway

    Use any AI model with just one API

  5. Livecaller

    Caller ID that works. Now for iOS.

  6. OpenMemory Chrome Extension

    Sync memory across AIs so they pick up where you left off.

  7. OpenDia

    No need to switch browsers: just use Dia on Chrome or Arc

  8. Product Hunt – Feed-mode

    Consume Product Hunt as if it were a feed

  9. a0.dev Phase 1

    Build mobile apps in minutes using AI

  10. Memolect

    AI meeting notetaker that updates your Jira

  11. Dumbnote

    Offline-first Markdown note app. No lock-in. Just write.

  12. Browse anything (AI browser agent)

    AI browser agent that automates any web task instantly

  13. Mousio

    Control your mouse with keys

  14. Mori

    Life timer & death countdown for Chrome

  15. Wagoo

    Private desktop assistant: reduce friction & run locally

Hugging Face(15)

  1. GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

    We present GLM-4.1V-Thinking, a vision-language model (VLM) designed to advance general-purpose multimodal reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. Reinforcement Learning with Curriculum Sampling (RLCS) then unlocks the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document understanding, among others. To facilitate research in this field, we open-source GLM-4.1V-9B-Thinking, which achieves state-of-the-art performance among models of comparable size. In a comprehensive evaluation across 28 public benchmarks, our model outperforms Qwen2.5-VL-7B on nearly all tasks and achieves comparable or even superior performance on 18 benchmarks relative to the significantly larger Qwen2.5-VL-72B. Notably, GLM-4.1V-9B-Thinking also demonstrates competitive or superior performance compared to closed-source models such as GPT-4o on challenging tasks including long document understanding and STEM reasoning, further underscoring its strong capabilities. Code, models and more information are released at https://github.com/THUDM/GLM-4.1V-Thinking.
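
    The abstract names Reinforcement Learning with Curriculum Sampling (RLCS) without detailing it, so the following is only a generic illustration of what curriculum sampling can look like: weight training tasks by how learnable they currently appear, concentrating on those with pass rates near 50%. The p·(1−p) weighting and the task names are assumptions for illustration, not GLM-4.1V's actual recipe.

      import random

      # Hypothetical curriculum sampler: tasks whose current pass rate is near
      # 50% (neither solved nor hopeless) get the most weight.
      def curriculum_sample(pass_rate: dict, n: int) -> list:
          tasks = list(pass_rate)
          weights = [pass_rate[t] * (1 - pass_rate[t]) + 1e-6 for t in tasks]
          return random.choices(tasks, weights=weights, k=n)

      rates = {"easy_geometry": 0.95, "gui_agent": 0.50, "long_doc_qa": 0.05}
      print(curriculum_sample(rates, n=6))  # mostly 'gui_agent'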

  2. Kwai Keye-VL Technical Report

    While Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities on static images, they often fall short in comprehending dynamic, information-dense short-form videos, a dominant medium in today's digital landscape. To bridge this gap, we introduce Kwai Keye-VL, an 8-billion-parameter multimodal foundation model engineered for leading-edge performance in short-video understanding while maintaining robust general-purpose vision-language abilities. The development of Keye-VL rests on two core pillars: a massive, high-quality dataset exceeding 600 billion tokens with a strong emphasis on video, and an innovative training recipe. This recipe features a four-stage pre-training process for solid vision-language alignment, followed by a meticulous two-phase post-training process. The first post-training stage enhances foundational capabilities like instruction following, while the second phase focuses on stimulating advanced reasoning. In this second phase, a key innovation is our five-mode "cold-start" data mixture, which includes "thinking", "non-thinking", "auto-think", "think with image", and high-quality video data. This mixture teaches the model to decide when and how to reason. Subsequent reinforcement learning (RL) and alignment steps further enhance these reasoning capabilities and correct abnormal model behaviors, such as repetitive outputs. To validate our approach, we conduct extensive evaluations, showing that Keye-VL achieves state-of-the-art results on public video benchmarks and remains highly competitive on general image-based tasks (Figure 1). Furthermore, we develop and release the KC-MMBench, a new benchmark tailored for real-world short-video scenarios, where Keye-VL shows a significant advantage.

  3. LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

    Animation colorization is a crucial part of real animation industry production, and colorizing long animations carries high labor costs, so automated long animation colorization based on video generation models has significant research value. Existing studies are limited to short-term colorization. These studies adopt a local paradigm, fusing overlapping features to achieve smooth transitions between local segments. However, the local paradigm neglects global information, failing to maintain long-term color consistency. In this study, we argue that ideal long-term color consistency can be achieved through a dynamic global-local paradigm, i.e., dynamically extracting global color-consistent features relevant to the current generation. Specifically, we propose LongAnimation, a novel framework, which mainly includes a SketchDiT, a Dynamic Global-Local Memory (DGLM), and a Color Consistency Reward. The SketchDiT captures hybrid reference features to support the DGLM module. The DGLM module employs a long video understanding model to dynamically compress global historical features and adaptively fuse them with the current generation features. To refine the color consistency, we introduce a Color Consistency Reward. During inference, we propose a color consistency fusion to smooth the video segment transition. Extensive experiments on both short-term (14 frames) and long-term (average 500 frames) animations show the effectiveness of LongAnimation in maintaining short-term and long-term color consistency for the open-domain animation colorization task. The code can be found at https://cn-makers.github.io/long_animation_web/.

  4. Ovis-U1 Technical Report

    In this report, we introduce Ovis-U1, a 3-billion-parameter unified model that integrates multimodal understanding, text-to-image generation, and image editing capabilities. Building on the foundation of the Ovis series, Ovis-U1 incorporates a diffusion-based visual decoder paired with a bidirectional token refiner, enabling image generation tasks comparable to leading models like GPT-4o. Unlike some previous models that use a frozen MLLM for generation tasks, Ovis-U1 utilizes a new unified training approach starting from a language model. Compared to training solely on understanding or generation tasks, unified training yields better performance, demonstrating the enhancement achieved by integrating these two tasks. Ovis-U1 achieves a score of 69.6 on the OpenCompass Multi-modal Academic Benchmark, surpassing recent state-of-the-art models such as Ristretto-3B and SAIL-VL-1.5-2B. In text-to-image generation, it excels with scores of 83.72 and 0.89 on the DPG-Bench and GenEval benchmarks, respectively. For image editing, it achieves 4.00 and 6.42 on the ImgEdit-Bench and GEdit-Bench-EN, respectively. As the initial version of the Ovis unified model series, Ovis-U1 pushes the boundaries of multimodal understanding, generation, and editing.

  5. Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

    Math reasoning has become the poster child of progress in large language models (LLMs), with new models rapidly surpassing human-level performance on benchmarks like MATH and AIME. But as math leaderboards improve week by week, it is worth asking: do these gains reflect broader problem-solving ability or just narrow overfitting? To answer this question, we evaluate over 20 open-weight reasoning-tuned models across a broad suite of tasks, including math, scientific QA, agent planning, coding, and standard instruction-following. Surprisingly, we find that most models that succeed in math fail to transfer their gains to other domains. To rigorously study this phenomenon, we conduct controlled experiments on Qwen3-14B models using math-only data but different tuning methods. We find that reinforcement learning (RL)-tuned models generalize well across domains, while supervised fine-tuning (SFT)-tuned models often forget general capabilities. Latent-space representation and token-space distribution shift analyses reveal that SFT induces substantial representation and output drift, while RL preserves general-domain structure. Our results suggest a need to rethink standard post-training recipes, particularly the reliance on SFT-distilled data for advancing reasoning models.

  6. SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

    Recent advances in reinforcement learning have shown that language models can develop sophisticated reasoning through training on tasks with verifiable rewards, but these approaches depend on human-curated problem-answer pairs and domain-specific reward engineering. We introduce SPIRAL, a self-play framework where models learn by playing multi-turn, zero-sum games against continuously improving versions of themselves, eliminating the need for human supervision. Through self-play, SPIRAL generates an infinite curriculum of progressively challenging problems as models must constantly adapt to stronger opponents. To enable this self-play training at scale, we implement a fully online, multi-turn, multi-agent reinforcement learning system for LLMs and propose role-conditioned advantage estimation (RAE) to stabilize multi-agent training. Using SPIRAL, self-play on zero-sum games produces reasoning capabilities that transfer broadly. Training Qwen3-4B-Base on Kuhn Poker alone achieves 8.6% improvement on math and 8.4% on general reasoning, outperforming SFT on 25,000 expert game trajectories. Analysis reveals that this transfer occurs through three cognitive patterns: systematic decomposition, expected value calculation, and case-by-case analysis. Multi-game training (TicTacToe, Kuhn Poker, Simple Negotiation) further enhances performance as each game develops distinct reasoning strengths. Applying SPIRAL to a strong reasoning model (DeepSeek-R1-Distill-Qwen-7B) can still lead to 2.0% average improvement. These results demonstrate that zero-sum games naturally develop transferable reasoning capabilities, highlighting a promising direction for autonomous reasoning development.
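
    As a rough illustration of the two ingredients named above, self-play on a zero-sum game and a role-conditioned baseline for advantage estimation, here is a minimal sketch on a toy game. The EMA form of the baseline and the toy game itself are assumptions; the paper's RAE and its full multi-turn RL system are more involved.

      import random

      # Toy zero-sum 'game': two copies of the policy each draw a number; higher wins.
      def play_episode() -> dict:
          a, b = random.random(), random.random()
          r = 1.0 if a > b else -1.0
          return {"player0": r, "player1": -r}  # zero-sum returns per role

      baseline = {"player0": 0.0, "player1": 0.0}  # per-role EMA baselines (assumed RAE-like form)
      for _ in range(1000):
          for role, ret in play_episode().items():
              adv = ret - baseline[role]    # role-conditioned advantage
              baseline[role] += 0.05 * adv  # a real system would also take a policy-gradient step with adv
      print(baseline)  # both near 0: the game is symmetric, so the baselines stay centered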

  7. SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks

    We present SciArena, an open and collaborative platform for evaluating foundation models on scientific literature tasks. Unlike traditional benchmarks for scientific literature understanding and synthesis, SciArena engages the research community directly, following the Chatbot Arena evaluation approach of community voting on model comparisons. By leveraging collective intelligence, SciArena offers a community-driven evaluation of model performance on open-ended scientific tasks that demand literature-grounded, long-form responses. The platform currently supports 23 open-source and proprietary foundation models and has collected over 13,000 votes from trusted researchers across diverse scientific domains. We analyze the data collected so far and confirm that the submitted questions are diverse, aligned with real-world literature needs, and that participating researchers demonstrate strong self-consistency and inter-annotator agreement in their evaluations. We discuss the results and insights based on the model ranking leaderboard. To further promote research in building model-based automated evaluation systems for literature tasks, we release SciArena-Eval, a meta-evaluation benchmark based on our collected preference data. The benchmark measures the accuracy of models in judging answer quality by comparing their pairwise assessments with human votes. Our experiments highlight the benchmark's challenges and emphasize the need for more reliable automated evaluation methods.
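
    SciArena-Eval scores a model judge by how often its pairwise verdicts match the human vote for the same pair of answers. A minimal sketch of that agreement computation (the vote data are made up for illustration):

      # Each entry is the winner ('A' or 'B') for one question's pair of answers.
      human_votes = ["A", "B", "B", "A", "A"]
      judge_votes = ["A", "B", "A", "A", "A"]

      accuracy = sum(j == h for j, h in zip(judge_votes, human_votes)) / len(human_votes)
      print(f"judge/human agreement: {accuracy:.0%}")  # 80%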

  8. Depth Anything at Any Condition

    We present Depth Anything at Any Condition (DepthAnything-AC), a foundation monocular depth estimation (MDE) model capable of handling diverse environmental conditions. Previous foundation MDE models achieve impressive performance across general scenes but do not perform well in complex open-world environments that involve challenging conditions, such as illumination variations, adverse weather, and sensor-induced distortions. To overcome the challenges of data scarcity and the inability to generate high-quality pseudo-labels from corrupted images, we propose an unsupervised consistency regularization finetuning paradigm that requires only a relatively small amount of unlabeled data. Furthermore, we propose the Spatial Distance Constraint to explicitly enforce the model to learn patch-level relative relationships, resulting in clearer semantic boundaries and more accurate details. Experimental results demonstrate the zero-shot capabilities of DepthAnything-AC across diverse benchmarks, including real-world adverse weather benchmarks, synthetic corruption benchmarks, and general benchmarks. Project Page: https://ghost233lism.github.io/depthanything-AC-page Code: https://github.com/HVision-NKU/DepthAnythingAC
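
    A minimal sketch of the unsupervised consistency idea: the model's prediction on a corrupted view of an unlabeled image is pulled toward its own prediction on the clean view. The stand-in network, the synthetic darkening, and the L1 loss are illustrative assumptions; DepthAnything-AC's actual losses (including the Spatial Distance Constraint) differ.

      import torch
      import torch.nn as nn

      net = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # stand-in for a monocular depth network
      opt = torch.optim.Adam(net.parameters(), lr=1e-4)

      clean = torch.rand(2, 3, 64, 64)  # unlabeled images
      dark = (clean * 0.3).clamp(0, 1)  # synthetic illumination corruption

      with torch.no_grad():
          pseudo = net(clean)           # pseudo-label from the clean view
      loss = nn.functional.l1_loss(net(dark), pseudo)  # predictions must agree across conditions
      loss.backward()
      opt.step()
      print(float(loss))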

  9. MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings

    Multimodal embedding models, built upon causal Vision Language Models (VLMs), have shown promise in various tasks. However, current approaches face three key limitations: the use of causal attention in VLM backbones is suboptimal for embedding tasks; scalability issues due to reliance on high-quality labeled paired data for contrastive learning; and limited diversity in training objectives and data. To address these issues, we propose MoCa, a two-stage framework for transforming pre-trained VLMs into effective bidirectional multimodal embedding models. The first stage, Modality-aware Continual Pre-training, introduces a joint reconstruction objective that simultaneously denoises interleaved text and image inputs, enhancing bidirectional context-aware reasoning. The second stage, Heterogeneous Contrastive Fine-tuning, leverages diverse, semantically rich multimodal data beyond simple image-caption pairs to enhance generalization and alignment. Our method addresses the stated limitations by introducing bidirectional attention through continual pre-training, scaling effectively with massive unlabeled datasets via joint reconstruction objectives, and utilizing diverse multimodal data for enhanced representation robustness. Experiments demonstrate that MoCa consistently improves performance across MMEB and ViDoRe-v2 benchmarks, achieving new state-of-the-art results, and exhibits strong scalability with both model size and training data on MMEB.

  10. Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation

    Recent advances in diffusion models have enabled high-quality video generation, but the additional temporal dimension significantly increases computational costs, making training and inference on long videos prohibitively expensive. In this paper, we identify a phenomenon we term Spatiotemporal Energy Decay in video diffusion models: post-softmax attention scores diminish as spatial and temporal distance between tokens increases, akin to the physical decay of signals or waves over space and time in nature. Motivated by this, we propose Radial Attention, a scalable sparse attention mechanism with O(n log n) complexity that translates energy decay into exponentially decaying compute density, which is significantly more efficient than standard O(n^2) dense attention and more expressive than linear attention. Specifically, Radial Attention employs a simple, static attention mask where each token attends to spatially nearby tokens, with the attention window size shrinking with temporal distance. Moreover, it allows pre-trained video diffusion models to extend their generation length with efficient LoRA-based fine-tuning. Extensive experiments show that Radial Attention maintains video quality across Wan2.1-14B, HunyuanVideo, and Mochi 1, achieving up to a 1.9× speedup over the original dense attention. With minimal tuning, it enables video generation up to 4× longer while reducing training costs by up to 4.4× compared to direct fine-tuning and accelerating inference by up to 3.7× compared to dense attention inference.
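
    A NumPy sketch of the static mask described above, assuming a 1D spatial layout per frame and a window that halves with each frame of temporal distance; the exact decay schedule and token layout are assumptions, not the paper's implementation.

      import numpy as np

      def radial_mask(num_frames: int, tokens_per_frame: int, base_window: int = 8) -> np.ndarray:
          """Boolean (n, n) mask: True where a query token may attend to a key token."""
          n = num_frames * tokens_per_frame
          mask = np.zeros((n, n), dtype=bool)
          for q in range(n):
              qf, qs = divmod(q, tokens_per_frame)  # query (frame, spatial position)
              for k in range(n):
                  kf, ks = divmod(k, tokens_per_frame)
                  window = base_window // (2 ** abs(qf - kf))  # spatial window halves per frame of distance
                  if window > 0 and abs(qs - ks) <= window:
                      mask[q, k] = True
          return mask

      m = radial_mask(num_frames=4, tokens_per_frame=16)
      print(m.shape, round(float(m.mean()), 3))  # attended fraction shrinks with temporal distance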

  11. VMoBA: Mixture-of-Block Attention for Video Diffusion Models

    The quadratic complexity of full attention mechanisms poses a significant bottleneck for Video Diffusion Models (VDMs) aiming to generate long-duration, high-resolution videos. While various sparse attention methods have been proposed, many are designed as training-free inference accelerators or do not optimally capture the unique spatio-temporal characteristics inherent in video data when trained natively. This paper introduces Video Mixture of Block Attention (VMoBA), a novel sparse attention mechanism specifically adapted for VDMs. Motivated by an in-depth analysis of attention patterns within pre-trained video transformers, which revealed strong spatio-temporal locality, varying query importance, and head-specific concentration levels, VMoBA enhances the original MoBA framework with three key modifications: (1) a layer-wise recurrent block partition scheme (1D-2D-3D) to dynamically adapt to diverse spatio-temporal attention patterns and improve efficiency; (2) global block selection to prioritize the most salient query-key block interactions across an entire attention head; and (3) threshold-based block selection to dynamically determine the number of attended blocks based on their cumulative similarity. Extensive experiments demonstrate that VMoBA significantly accelerates the training of VDMs on longer sequences, achieving 2.92x FLOPs and 1.48x latency speedup, while attaining comparable or even superior generation quality to full attention. Furthermore, VMoBA exhibits competitive performance in training-free inference, offering 2.40x FLOPs and 1.35x latency speedup for high-res video generation.
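
    Of the three modifications, the threshold rule is the easiest to show in isolation: keep the highest-scoring key blocks until their share of the normalized similarity mass reaches a threshold. A sketch under that reading (the softmax normalization and the numbers are illustrative):

      import numpy as np

      def select_blocks(block_scores: np.ndarray, tau: float = 0.85) -> np.ndarray:
          """Indices of key blocks kept for one query block: the top-scoring
          blocks whose cumulative softmax mass first reaches tau."""
          p = np.exp(block_scores - block_scores.max())
          p /= p.sum()
          order = np.argsort(p)[::-1]
          k = int(np.searchsorted(np.cumsum(p[order]), tau)) + 1
          return order[:k]

      print(select_blocks(np.array([2.0, 0.1, 1.5, -0.3, 0.8])))  # [0 2 4]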

  12. Calligrapher: Freestyle Text Image Customization

    We introduce Calligrapher, a novel diffusion-based framework that innovatively integrates advanced text customization with artistic typography for digital calligraphy and design applications. Addressing the challenges of precise style control and data dependency in typographic customization, our framework incorporates three key technical contributions. First, we develop a self-distillation mechanism that leverages the pre-trained text-to-image generative model itself alongside the large language model to automatically construct a style-centric typography benchmark. Second, we introduce a localized style injection framework via a trainable style encoder, which comprises both Qformer and linear layers, to extract robust style features from reference images. An in-context generation mechanism is also employed to directly embed reference images into the denoising process, further enhancing the refined alignment of target styles. Extensive quantitative and qualitative evaluations across diverse fonts and design contexts confirm Calligrapher's accurate reproduction of intricate stylistic details and precise glyph positioning. By automating high-quality, visually consistent typography, Calligrapher surpasses traditional models, empowering creative practitioners in digital art, branding, and contextual typographic design.

  13. Listener-Rewarded Thinking in VLMs for Image Preferences

    Training robust and generalizable reward models for human visual preferences is essential for aligning text-to-image and text-to-video generative models with human intent. However, current reward models often fail to generalize, and supervised fine-tuning leads to memorization, demanding complex annotation pipelines. While reinforcement learning (RL), specifically Group Relative Policy Optimization (GRPO), improves generalization, we uncover a key failure mode: a significant drop in reasoning accuracy occurs when a model's reasoning trace contradicts that of an independent, frozen vision-language model ("listener") evaluating the same output. To address this, we introduce a listener-augmented GRPO framework. Here, the listener re-evaluates the reasoner's chain-of-thought to provide a dense, calibrated confidence score, shaping the RL reward signal. This encourages the reasoner not only to answer correctly, but to produce explanations that are persuasive to an independent model. Our listener-shaped reward scheme achieves the best accuracy on the ImageReward benchmark (67.4%), significantly improves out-of-distribution (OOD) performance on a large-scale human preference dataset (1.2M votes, up to +6% over a naive reasoner), and reduces reasoning contradictions compared to strong GRPO and SFT baselines. These results demonstrate that listener-based rewards provide a scalable, data-efficient path to aligning vision-language models with nuanced human preferences. We will release our reasoning model here: https://huggingface.co/alexgambashidze/qwen2.5vl_image_preference_reasoner.
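
    A minimal sketch of the reward shaping described above: the task reward is combined with a frozen listener's calibrated confidence that the reasoning supports the answer. The additive form and the weight lam are assumptions; the paper's exact shaping may differ.

      def shaped_reward(answer_correct: bool, listener_confidence: float, lam: float = 0.5) -> float:
          """Task reward plus a bonus scaled by how convincing an independent,
          frozen 'listener' model finds the reasoner's chain-of-thought."""
          return float(answer_correct) + lam * listener_confidence

      print(shaped_reward(True, 0.9))  # 1.45: correct and persuasive
      print(shaped_reward(True, 0.1))  # 1.05: correct but unconvincing earns less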

  14. MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation

    Recent advances in optical flow estimation have prioritized accuracy at the cost of growing GPU memory consumption, particularly for high-resolution (FullHD) inputs. We introduce MEMFOF, a memory-efficient multi-frame optical flow method that identifies a favorable trade-off between multi-frame estimation and GPU memory usage. Notably, MEMFOF requires only 2.09 GB of GPU memory at runtime for 1080p inputs, and 28.5 GB during training, which uniquely positions our method to be trained at native 1080p without the need for cropping or downsampling. We systematically revisit design choices from RAFT-like architectures, integrating reduced correlation volumes and high-resolution training protocols alongside multi-frame estimation, to achieve state-of-the-art performance across multiple benchmarks while substantially reducing memory overhead. Our method outperforms more resource-intensive alternatives in both accuracy and runtime efficiency, validating its robustness for flow estimation at high resolutions. At the time of submission, our method ranks first on the Spring benchmark with a 1-pixel (1px) outlier rate of 3.289, leads Sintel (clean) with an endpoint error (EPE) of 0.963, and achieves the best Fl-all error on KITTI-2015 at 2.94%. The code is available at https://github.com/msu-video-group/memfof.

  15. A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

    The remarkable advancements of vision and language foundation models in multimodal understanding, reasoning, and generation have sparked growing efforts to extend such intelligence to the physical world, fueling the flourishing of vision-language-action (VLA) models. Despite seemingly diverse approaches, we observe that current VLA models can be unified under a single framework: vision and language inputs are processed by a series of VLA modules, producing a chain of action tokens that progressively encode more grounded and actionable information, ultimately generating executable actions. We further determine that the primary design choice distinguishing VLA models lies in how action tokens are formulated, which can be categorized into language description, code, affordance, trajectory, goal state, latent representation, raw action, and reasoning. However, there remains a lack of comprehensive understanding regarding action tokens, significantly impeding effective VLA development and obscuring future directions. Therefore, this survey aims to categorize and interpret existing VLA research through the lens of action tokenization, distill the strengths and limitations of each token type, and identify areas for improvement. Through this systematic review and analysis, we offer a synthesized outlook on the broader evolution of VLA models, highlight underexplored yet promising directions, and contribute guidance for future research, hoping to bring the field closer to general-purpose intelligence.

Solidot(15)

  1. Sponge-like material uses solar heat to remove salt from seawater

    Most of the water on Earth is seawater, too salty to drink. Desalination plants can turn seawater into drinking water, but the process consumes large amounts of energy. A Hong Kong research team, publishing in ACS Energy Letters, has developed a sponge-like material with a long-chain micro air-pocket structure that, combined with sunlight and a simple plastic cover, converts salt water into fresh water. An outdoor proof-of-concept experiment produced directly drinkable water under natural sunlight, a significant step toward low-energy, sustainable desalination. In the outdoor test, the researchers placed the material in an evaporation container holding seawater and covered it with a curved transparent plastic cover. When sunlight heated the top of the sponge material, only the water evaporated (the salt was left behind). The vapor condensed into liquid water on the inside of the cover, ran down to the rim, and dripped into a funnel below, collecting in a separate container. After 6 hours of natural sunlight, the system produced about 3 tablespoons of drinking water.

  2. Exoplanet triggers flares on its host star

    Astronomers recently found that the exoplanet HIP 67522b has a highly unusual relationship with its host star, HIP 67522. The planet orbits so close to the star that it triggers frequent, violent flares on the stellar surface, which in turn keep heating and inflating the planet's atmosphere. HIP 67522 is a young G-type star in Centaurus, about 417 light-years from Earth and only about 17 million years old. It hosts two planets; HIP 67522b is a "hot Jupiter", roughly Jupiter-sized, on an orbit so tight that one revolution takes just 7 days. The team found that the planet appears to couple to the star's magnetic field, setting off intense flare activity on the stellar surface. When these flares erupt toward the planet, they dump large amounts of energy back onto it, inflating its atmosphere like a balloon. Over time the atmosphere may be severely stripped, shrinking the planet from a giant hot Jupiter to something the size of a "hot Neptune" or sub-Neptune. Such strong star-planet interactions had long been predicted in theory, but this is the first time one has actually been observed.

  3. Men and women respond about equally to a baby crying at night

    A study from Aarhus University in Denmark finds that women are not innately more likely than men to be woken by a baby crying at night, yet women are three times as likely to do the nighttime caregiving. The researchers ran two separate studies. The first, with 142 childless adults, found that women reacted slightly more strongly to very quiet sounds: at whisper level, whether a baby's cry or an ordinary alarm clock, women were 14% more likely than men to wake up. At louder volumes there was no significant difference between the sexes. In the second study, 117 Danish first-time-parent couples logged their nighttime care over a week; mothers were three times as likely as fathers to handle nighttime infant care. The researchers argue that social factors, not physiological differences, explain the gap. Denmark recently extended paternity leave from two weeks to eleven, which may help balance parenting duties.

  4. Young Americans are cutting their game spending

    According to Circana, Americans aged 18-24 spent 25% less on games this April than a year earlier, and their total spending fell 13% year over year. Likely causes are economic uncertainty and a bleak job outlook. By contrast, spending by older groups held steady. The US economic climate may be pushing younger generations to change their spending habits, unwelcome news for a game industry already facing layoffs.

  5. TikTok flooded with racist videos generated with Google Veo 3

    MediaMatters reports that the short-video platform TikTok has seen a flood of racist videos generated with Google Veo 3. The videos mainly target Black people, portraying them as "the usual suspects", absent parents, and watermelon-eating monkeys. TikTok's terms of service prohibit such content, but its spread has gone largely unchecked. A TikTok spokesperson said more than half of the accounts named in the report had already been banned for policy violations before it was published, and the remaining accounts have now been removed.

  6. Microsoft lays off about 9,000, hitting its gaming business hard

    Microsoft announced its latest round of layoffs, cutting about 9,000 employees, just under 4% of its workforce, with the Xbox gaming business hit hard. The company has already gone through several rounds this year: performance-based cuts of under 1% of staff at the start of the year, more than 6,000 in May, and roughly 300 in June. The cuts come even as Microsoft remains one of the most profitable companies in the S&P 500. Within Xbox, the studio The Initiative was shut down, and Microsoft canceled multiple game projects, including Perfect Dark and Everwild.

  7. Astronomers may have found the third known interstellar object

    ESA announced that astronomers may have discovered the third known interstellar object (the first was 'Oumuamua, the second the interstellar comet 2I/Borisov). The newly found object is provisionally designated A11pl3Z. It currently lies inside Jupiter's orbit and will reach perihelion this October, crossing the orbit of Mars. Measurements put its orbital eccentricity at about 6, a hyperbolic trajectory, which implies A11pl3Z likely originated outside the Solar System.
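
    Background on why an eccentricity near 6 points to an interstellar origin (standard orbital mechanics, not from the article): the conic type of a Keplerian orbit is fixed by its eccentricity e.

      0 <= e < 1 : ellipse   (bound to the Sun)
           e = 1 : parabola  (marginally unbound)
           e > 1 : hyperbola (unbound; passes through the Solar System once)

    An e of roughly 6 is far more hyperbolic than planetary perturbations can produce, hence the extrasolar interpretation.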

  8. Benchmarking Firefox 120 through Firefox 141 on Linux

    The Linux news site Phoronix benchmarked Firefox 120 through Firefox 141 Beta on an Ubuntu Linux system with an AMD Ryzen 9 9950X. Firefox ships a new release roughly every month: Firefox 120 came out in November 2023, the latest stable release is Firefox 140, and 141 is still in beta. The results show Firefox 141 Beta averaging about 12% faster than Firefox 120, with lower memory usage. Mozilla is still putting real work into improving the browser.

  9. Nintendo reportedly locks the Switch 2's USB-C port to block third-party docks

    Accessory makers say Nintendo is using a new encryption scheme to deliberately lock down the Switch 2's USB-C port, keeping third-party docks and peripherals from working with it. Jsaux, which made its name by rushing a Steam Deck dock to market, says it has suspended plans for a Switch 2 dock because of Nintendo's move.

  10. Copyleft-next project relaunched

    Eighteen years after GPLv3 and 34 years after GPLv2, the long-dormant Copyleft-next project has announced a restart. The project aims to develop a next-generation copyleft license; its founders say FOSS needs new approaches to strengthen copyleft. Richard Fontana and Bradley Kuhn will serve as co-editors-in-chief; both took part in the GPLv3 Drafting Committees and drew many lessons from that process. Software Freedom Conservancy (SFC) will provide resources for the project and host it.

  11. Northwestern Polytechnical University flight-tests the Feitian-2 hypersonic vehicle

    Feitian-2, developed under the leadership of the combined-cycle space propulsion team at Northwestern Polytechnical University, completed a successful flight test on June 23, 2025, reportedly reaching Mach 12, a claimed world record. The test obtained, for the first time internationally, scientific data on a kerosene/hydrogen-peroxide rocket-ramjet combined-cycle engine under key operating conditions including variable-geometry intake, variable-thrust acceleration, and autonomous flight at varying angles of attack. The vehicle merges a rocket and a ramjet into one and can switch between the two propulsion systems autonomously in flight, coping with the enormous stresses of high-speed flight.

  12. Top AI engineers now earn more than $10 million

    The AI arms race keeps pushing AI engineers' pay upward: the very top now make over $10 million, and $3-7 million is common. OpenAI told employees this week that although its pay is near the top of the market, some core staff have still been poached by competitors, so the company is looking for creative ways to reward its best people. OpenAI CEO Sam Altman previously claimed Meta had tried to lure away the company's top engineers with $100 million signing bonuses. Chief research officer Mark Chen wrote in an internal memo that after some engineers left, he felt as if someone had broken into his home. AI engineers' pay has risen 50% since 2022; senior research scientists at big tech companies earn $500,000 to $2 million, while senior software engineers outside AI make $180,000 to $220,000.

  13. The data don't support the idea that left-handers are more creative

    Scientists analyzed more than a century of research on the link between handedness and creativity and found that the widespread belief that left-handers are more creative is simply wrong. Corresponding author Daniel Casasanto, an associate professor at Cornell University, said: "The data do not support a left-handed advantage in creative thinking. In fact, there is evidence that right-handers are more creative on some laboratory tests, and strong evidence that right-handers are overrepresented in professions that demand the most creativity." Casasanto notes that left-handers, conservatively about 10% of the population, should in principle have a creative edge, because divergent thinking (the ability to explore many possible solutions to a problem in a short time and make unexpected connections) is driven more by the brain's right hemisphere. The meta-analysis found almost no association between handedness and divergent thinking; if anything, right-handers held a slight edge on some tests. Why, then, the persistent belief in left-handed creativity? The authors suggest one factor is "left-handed exceptionalism": left-handers are rare and creative geniuses are rare, so people may infer a connection. Another is the popular link between creative genius and mental illness: left-handers are more likely to become artists and have higher rates of depression and schizophrenia.

  14. The OsmAnd mapping app turns 15

    OsmAnd, which stands for OpenStreetMap Automated Navigation Directions, is a FOSS map and navigation app built on the OpenStreetMap dataset, and it has now been around for 15 years. Its developers say OsmAnd has 2.5 million monthly active users across all platforms, more than 100,000 daily active users, and over 20 million total installs. They divide the past 15 years into three phases: the first five built the basic map and navigation features; the next five rounded out the feature set with rich plugins and professional tools; the third five rewrote and improved core components such as the rendering engine and routing algorithms. The next five years will focus on stability, speed, and integration.

  15. Huawei releases open-weight model trained on Ascend NPUs

    Huawei has released an open-weight model trained on its Ascend NPUs. The model is published on Gitcode under a license that prohibits use in the EU. Called Pangu Pro MoE, it has 72 billion total parameters, with 16 billion activated per token. It is optimized for the Ascend 300I Duo and 800I A2, reaching 1,148 tokens/s of single-card inference throughput, which speculative acceleration can raise to 1,528 tokens/s. Huawei researchers say that among models under 100 billion parameters, Pangu Pro MoE outperforms well-known open-weight models such as GLM-Z1-32B and Qwen3-32B.
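
    The "72 billion total, 16 billion active" arithmetic is the usual mixture-of-experts property: a router runs only a few experts per token. A generic top-k routing sketch (the sizes and the softmax-over-chosen-experts detail are illustrative, not Pangu Pro MoE's actual router):

      import numpy as np

      def topk_route(router_logits: np.ndarray, k: int = 2):
          """Per token, keep only the k highest-scoring experts; only their
          parameters run, so active parameters << total parameters."""
          experts = np.argsort(router_logits, axis=-1)[:, -k:]          # (tokens, k) expert ids
          gates = np.take_along_axis(router_logits, experts, axis=-1)
          gates = np.exp(gates) / np.exp(gates).sum(-1, keepdims=True)  # renormalize over chosen experts
          return experts, gates

      logits = np.random.randn(4, 8)  # 4 tokens, 8 experts (illustrative sizes)
      experts, gates = topk_route(logits)
      print(experts, gates.round(2))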