OrangeBot.AI Digest — 2025-08-08
73 headlines across 5 sources, aggregated for this day.
Hacker News(15)
- Jim Lovell, Apollo 13 commander, has died (www.nasa.gov)
- Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?
- I want everything local – Building my offline AI workspace (instavm.io)
- The surprise deprecation of GPT-4o for ChatGPT consumers (simonwillison.net)
- Tor: How a military project became a lifeline for privacy (thereader.mitpress.mit.edu)
- Google's Genie is more impressive than GPT5 (theahura.substack.com)
- GPT-5 vs. Sonnet: Complex Agentic Coding (elite-ai-assisted-coding.dev)
- AI is impressive because we've failed at personal computing (rakhim.exotext.com)
- Astronomy Photographer of the Year 2025 shortlist (www.rmg.co.uk)
- Getting good results from Claude Code (www.dzombak.com)
- Food, housing, & health care costs are a source of major stress for many people (apnorc.org)
- How we replaced Elasticsearch and MongoDB with Rust and RocksDB (radar.com)
- Ultrathin business card runs a fluid simulation (github.com)
- Window Activation (blog.broulik.de)
- How attention sinks keep language models stable (hanlab.mit.edu)
GitHub Trending(13)
- nautechsystems / nautilus_trader
A high-performance algorithmic trading platform and event-driven backtester
- openai / openai-cookbook
Examples and guides for using the OpenAI API
- openai / codex
Lightweight coding agent that runs in your terminal
- netbirdio / netbird
Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
- FFmpeg / asm-lessons
FFMPEG Assembly Language Lessons
- polarsource / polar
An open source engine for your digital products. Sell SaaS and digital products in minutes.
- python-poetry / poetry
Python packaging and dependency management made easy
- google / adk-python
An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
- browserbase / stagehand
The AI Browser Automation Framework
- e2b-dev / awesome-ai-agents
A list of AI autonomous agents
- backstage / backstage
Backstage is an open framework for building developer portals
- google / adk-samples
A collection of sample agents built with Agent Development Kit (ADK)
- openai / openai-python
The official Python library for the OpenAI API
Product Hunt(15)
- CourseCorrect
AI course matchmaker for real career impact
- Vireel
AI Reels from Proven Viral Formulas
- Hera
Your AI Motion Designer
- DreamCore
Develop and share mobile games from a single prompt
- Dereference
IDE for Claude Code
- Wepost
AI that builds personal brands
- Click to Woof
Tired of AI agents? – Time to woof! 🐶
- Bublr
A raw and uncluttered space for anyone to write anything.
- Wordin
Write long content with AI without losing context
- SelfHostLLM
Calculate the GPU memory you need for LLM inference
- Ninja.new
Automate manual tasks with a fully autonomous AI agent
- Votonic
AI-powered feedback hub for Discord communities
- LingoBuddy
Master languages through AI-powered conversations
- BlipCut Video Translator
Translate videos & audio to 130+ languages
- ScrollMark
Never lose your place on the web again
Hugging Face(15)
- On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
We present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for Large Language Models (LLMs), addressing its limited generalization compared to reinforcement learning (RL). Through mathematical analysis, we reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generalization capabilities of the model. To rectify this, we propose Dynamic Fine-Tuning (DFT), which stabilizes gradient updates for each token by dynamically rescaling the objective function with the probability of that token. Remarkably, this single-line code change significantly outperforms standard SFT across multiple challenging benchmarks and base models, demonstrating greatly improved generalization. Additionally, our approach shows competitive results in offline RL settings, offering an effective yet simpler alternative. This work bridges theoretical insight and practical solutions, substantially advancing SFT performance. The code will be available at https://github.com/yongliang-wu/DFT.
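A minimal sketch of the per-token rescaling the abstract describes, assuming a PyTorch-style token-level cross-entropy training loop; the function name and the choice to detach the probability are illustrative assumptions, and the authoritative version is in the repository linked above.
```python
import torch
import torch.nn.functional as F

def dft_loss(logits, labels, ignore_index=-100):
    """Sketch of Dynamic Fine-Tuning: ordinary SFT cross-entropy,
    reweighted per token by the probability the model assigns to that token."""
    # Per-token negative log-likelihood, exactly as in standard SFT.
    nll = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=ignore_index,
        reduction="none",
    )
    # Probability of each target token, detached so it acts as a weight
    # rather than a gradient path (assumption, mirroring the abstract).
    token_prob = torch.exp(-nll).detach()
    mask = (labels.view(-1) != ignore_index).float()
    return (token_prob * nll * mask).sum() / mask.sum().clamp(min=1)
```
Detaching the probability keeps it as a pure weight on the usual SFT loss, which is one plausible reading of the "single-line code change" mentioned in the abstract.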
- R-Zero: Self-Evolving Reasoning LLM from Zero Data
Self-evolving Large Language Models (LLMs) offer a scalable path toward super-intelligence by autonomously generating, refining, and learning from their own experiences. However, existing methods for training such models still rely heavily on vast human-curated tasks and labels, typically via fine-tuning or reinforcement learning, which poses a fundamental bottleneck to advancing AI systems toward capabilities beyond human intelligence. To overcome this limitation, we introduce R-Zero, a fully autonomous framework that generates its own training data from scratch. Starting from a single base LLM, R-Zero initializes two independent models with distinct roles, a Challenger and a Solver. These models are optimized separately and co-evolve through interaction: the Challenger is rewarded for proposing tasks near the edge of the Solver's capability, and the Solver is rewarded for solving increasingly challenging tasks posed by the Challenger. This process yields a targeted, self-improving curriculum without any pre-existing tasks or labels. Empirically, R-Zero substantially improves reasoning capability across different backbone LLMs, e.g., boosting Qwen3-4B-Base by +6.49 on math-reasoning benchmarks and +7.54 on general-domain reasoning benchmarks.
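A structural sketch of the Challenger/Solver co-evolution loop described above, under stated assumptions: correctness comes from pseudo-labels (e.g., majority voting), a solve rate near 50% stands in for "the edge of the Solver's capability", and all class and method names are hypothetical placeholders rather than R-Zero's actual API.
```python
import random
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    is_correct: bool  # pseudo-label, e.g. agreement with a majority-vote answer (assumption)

class StubAgent:
    """Placeholder standing in for an LLM policy; real agents would be RL-trained."""
    def propose_tasks(self, n):
        return [f"task-{i}" for i in range(n)]
    def attempt(self, task):
        return Answer(text="...", is_correct=random.random() < 0.5)
    def update(self, *args, reward=0.0):
        pass  # a real implementation would apply an RL update here

def r_zero_round(challenger, solver, n_tasks=8, n_attempts=4):
    tasks = challenger.propose_tasks(n_tasks)  # generated from scratch, no human labels
    for task in tasks:
        answers = [solver.attempt(task) for _ in range(n_attempts)]
        success_rate = sum(a.is_correct for a in answers) / n_attempts
        # Challenger reward peaks when the task sits near the Solver's edge (~50% solve rate).
        challenger.update(task, reward=1.0 - 2.0 * abs(success_rate - 0.5))
        # Solver reward favors solving the proposed tasks.
        for a in answers:
            solver.update(task, a, reward=float(a.is_correct))

r_zero_round(StubAgent(), StubAgent())
```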
- Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
We introduce Genie Envisioner (GE), a unified world foundation platform for robotic manipulation that integrates policy learning, evaluation, and simulation within a single video-generative framework. At its core, GE-Base is a large-scale, instruction-conditioned video diffusion model that captures the spatial, temporal, and semantic dynamics of real-world robotic interactions in a structured latent space. Built upon this foundation, GE-Act maps latent representations to executable action trajectories through a lightweight, flow-matching decoder, enabling precise and generalizable policy inference across diverse embodiments with minimal supervision. To support scalable evaluation and training, GE-Sim serves as an action-conditioned neural simulator, producing high-fidelity rollouts for closed-loop policy development. The platform is further equipped with EWMBench, a standardized benchmark suite measuring visual fidelity, physical consistency, and instruction-action alignment. Together, these components establish Genie Envisioner as a scalable and practical foundation for instruction-driven, general-purpose embodied intelligence. All code, models, and benchmarks will be released publicly.
- DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning
Although Vision Language Models (VLMs) exhibit strong perceptual abilities and impressive visual reasoning, they struggle with attention to detail and precise action planning in complex, dynamic environments, leading to subpar performance. Real-world tasks typically require complex interactions, advanced spatial reasoning, long-term planning, and continuous strategy refinement, usually necessitating understanding the physics rules of the target scenario. However, evaluating these capabilities in real-world scenarios is often prohibitively expensive. To bridge this gap, we introduce DeepPHY, a novel benchmark framework designed to systematically evaluate VLMs' understanding and reasoning about fundamental physical principles through a series of challenging simulated environments. DeepPHY integrates multiple physical reasoning environments of varying difficulty levels and incorporates fine-grained evaluation metrics. Our evaluation finds that even state-of-the-art VLMs struggle to translate descriptive physical knowledge into precise, predictive control.
- Hi3DEval: Advancing 3D Generation Evaluation with Hierarchical Validity
Despite rapid advances in 3D content generation, quality assessment for the generated 3D assets remains challenging. Existing methods mainly rely on image-based metrics and operate solely at the object level, limiting their ability to capture spatial coherence, material authenticity, and high-fidelity local details. 1) To address these challenges, we introduce Hi3DEval, a hierarchical evaluation framework tailored for 3D generative content. It combines both object-level and part-level evaluation, enabling holistic assessments across multiple dimensions as well as fine-grained quality analysis. Additionally, we extend texture evaluation beyond aesthetic appearance by explicitly assessing material realism, focusing on attributes such as albedo, saturation, and metallicness. 2) To support this framework, we construct Hi3DBench, a large-scale dataset comprising diverse 3D assets and high-quality annotations, accompanied by a reliable multi-agent annotation pipeline. We further propose a 3D-aware automated scoring system based on hybrid 3D representations. Specifically, we leverage video-based representations for object-level and material-subject evaluations to enhance modeling of spatio-temporal consistency and employ pretrained 3D features for part-level perception. Extensive experiments demonstrate that our approach outperforms existing image-based metrics in modeling 3D characteristics and achieves superior alignment with human preference, providing a scalable alternative to manual evaluations. The project page is available at https://zyh482.github.io/Hi3DEval/.
- Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?
Retrieval-Augmented Generation (RAG) systems using Multimodal Large Language Models (MLLMs) show great promise for complex document understanding, yet their development is critically hampered by inadequate evaluation. Current benchmarks often focus on specific parts of the document RAG system and use synthetic data with incomplete ground truth and evidence labels, therefore failing to reflect real-world bottlenecks and challenges. To overcome these limitations, we introduce Double-Bench: a new large-scale, multilingual, and multimodal evaluation system that is able to produce fine-grained assessment of each component within document RAG systems. It comprises 3,276 documents (72,880 pages) and 5,168 single- and multi-hop queries across 6 languages and 4 document types, with streamlined dynamic update support for potential data contamination issues. Queries are grounded in exhaustively scanned evidence pages and verified by human experts to ensure maximum quality and completeness. Our comprehensive experiments across 9 state-of-the-art embedding models, 4 MLLMs and 4 end-to-end document RAG frameworks demonstrate that the gap between text and visual embedding models is narrowing, highlighting the need to build stronger document retrieval models. Our findings also reveal the over-confidence dilemma within current document RAG frameworks, which tend to provide answers even without evidence support. We hope our fully open-source Double-Bench provides a rigorous foundation for future research in advanced document RAG systems. We plan to retrieve timely corpora and release new benchmarks on an annual basis.
- Are Today's LLMs Ready to Explain Well-Being Concepts?
Well-being encompasses mental, physical, and social dimensions essential to personal growth and informed life decisions. As individuals increasingly consult Large Language Models (LLMs) to understand well-being, a key challenge emerges: Can LLMs generate explanations that are not only accurate but also tailored to diverse audiences? High-quality explanations require both factual correctness and the ability to meet the expectations of users with varying expertise. In this work, we construct a large-scale dataset comprising 43,880 explanations of 2,194 well-being concepts, generated by ten diverse LLMs. We introduce a principle-guided LLM-as-a-judge evaluation framework, employing dual judges to assess explanation quality. Furthermore, we show that fine-tuning an open-source LLM using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) can significantly enhance the quality of generated explanations. Our results reveal: (1) The proposed LLM judges align well with human evaluations; (2) explanation quality varies significantly across models, audiences, and categories; and (3) DPO- and SFT-finetuned models outperform their larger counterparts, demonstrating the effectiveness of preference-based learning for specialized explanation tasks.
- Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability
Large Multimodal Models (LMMs) have witnessed remarkable growth, showcasing formidable capabilities in handling intricate multimodal tasks with exceptional performance. Recent research has underscored the inclination of large language models to passively accept defective inputs, often resulting in futile reasoning on invalid prompts. However, the critical question of whether LMMs can actively detect and scrutinize erroneous inputs remains unexplored. To address this gap, we introduce the Input Scrutiny Ability Evaluation Framework (ISEval), which encompasses seven categories of flawed premises and three evaluation metrics. Our extensive evaluation of ten advanced LMMs has identified key findings. Most models struggle to actively detect flawed textual premises without guidance, which reflects a strong reliance on explicit prompts for premise error identification. Error type affects performance: models excel at identifying logical fallacies but struggle with surface-level linguistic errors and certain conditional flaws. Modality trust varies: Gemini 2.5 Pro and Claude Sonnet 4 balance visual and textual information, while aya-vision-8b over-relies on text in conflicts. These insights underscore the urgent need to enhance LMMs' proactive verification of input validity and shed novel insights into mitigating the problem. The code is available at https://github.com/MLGroupJLU/LMM_ISEval.
- Marco-Voice Technical Report
This paper presents a multifunctional speech synthesis system that integrates voice cloning and emotion-controllable speech synthesis within a unified framework. The goal of this work is to address longstanding challenges in achieving highly expressive, controllable, and natural speech generation that faithfully preserves speaker identity across diverse linguistic and emotional contexts. Our approach introduces an effective speaker-emotion disentanglement mechanism with in-batch contrastive learning, enabling independent manipulation of speaker identity and emotional style, as well as a rotational emotional embedding integration method for smooth emotion control. To support comprehensive training and evaluation, we construct CSEMOTIONS, a high-quality emotional speech dataset containing 10 hours of Mandarin speech from six professional speakers across seven emotional categories. Extensive experiments demonstrate that our system, Marco-Voice, achieves substantial improvements in both objective and subjective metrics. Comprehensive evaluations and analysis show that Marco-Voice delivers competitive performance in terms of speech clarity and emotional richness, representing a substantial advance in the field of expressive neural speech synthesis.
- CoAct-1: Computer-using Agents with Coding as Actions
Autonomous agents that operate computers via Graphical User Interfaces (GUIs) often struggle with efficiency and reliability on complex, long-horizon tasks. While augmenting these agents with planners can improve task decomposition, they remain constrained by the inherent limitations of performing all actions through GUI manipulation, leading to brittleness and inefficiency. In this work, we introduce a more robust and flexible paradigm: enabling agents to use coding as an enhanced action. We present CoAct-1, a novel multi-agent system that synergistically combines GUI-based control with direct programmatic execution. CoAct-1 features an Orchestrator that dynamically delegates subtasks to either a conventional GUI Operator or a specialized Programmer agent, which can write and execute Python or Bash scripts. This hybrid approach allows the agent to bypass inefficient GUI action sequences for tasks like file management and data processing, while still leveraging visual interaction when necessary. We evaluate our system on the challenging OSWorld benchmark, where CoAct-1 achieves a new state-of-the-art success rate of 60.76%, significantly outperforming prior methods. Furthermore, our approach dramatically improves efficiency, reducing the average number of steps required to complete a task to just 10.15, compared to 15 for leading GUI agents. Our results demonstrate that integrating coding as a core action provides a more powerful, efficient, and scalable path toward generalized computer automation.
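The hybrid control flow described above lends itself to a small dispatch sketch. The following illustration assumes a keyword-based routing heuristic; the class names follow the abstract's terminology (Orchestrator, GUI Operator, Programmer), but the interfaces and routing logic are hypothetical, not the paper's implementation.
```python
import subprocess

class Programmer:
    def run(self, script: str) -> str:
        # Execute a generated Bash snippet directly, bypassing the GUI.
        result = subprocess.run(["bash", "-c", script],
                                capture_output=True, text=True, timeout=60)
        return result.stdout.strip()

class GUIOperator:
    def run(self, subtask: str) -> str:
        # Placeholder for click/type/screenshot actions driven by a vision model.
        return f"performed GUI steps for: {subtask}"

class Orchestrator:
    def __init__(self):
        self.programmer = Programmer()
        self.gui = GUIOperator()

    def execute(self, subtask: str) -> str:
        # Assumed heuristic: scriptable work (file management, data processing)
        # goes to code; visually grounded work stays on the GUI.
        if any(k in subtask for k in ("file", "csv", "rename", "batch")):
            return self.programmer.run(f"echo 'would script: {subtask}'")
        return self.gui.run(subtask)

print(Orchestrator().execute("rename all .txt files in Downloads"))
print(Orchestrator().execute("open the settings dialog and enable dark mode"))
```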
- InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities
Large language models (LLMs) have exhibited impressive reasoning abilities on a wide range of complex tasks. However, enhancing these capabilities through post-training remains resource intensive, particularly in terms of data and computational cost. Although recent efforts have sought to improve sample efficiency through selective data curation, existing methods often rely on heuristic or task-specific strategies that hinder scalability. In this work, we introduce InfiAlign, a scalable and sample-efficient post-training framework that integrates supervised fine-tuning (SFT) with Direct Preference Optimization (DPO) to align LLMs for enhanced reasoning. At the core of InfiAlign is a robust data selection pipeline that automatically curates high-quality alignment data from open-source reasoning datasets using multidimensional quality metrics. This pipeline enables significant performance gains while drastically reducing data requirements and remains extensible to new data sources. When applied to the Qwen2.5-Math-7B-Base model, our SFT model achieves performance on par with DeepSeek-R1-Distill-Qwen-7B, while using only approximately 12% of the training data, and demonstrates strong generalization across diverse reasoning tasks. Additional improvements are obtained through the application of DPO, with particularly notable gains in mathematical reasoning tasks. The model achieves an average improvement of 3.89% on AIME 24/25 benchmarks. Our results highlight the effectiveness of combining principled data selection with full-stage post-training, offering a practical solution for aligning large reasoning models in a scalable and data-efficient manner. The model checkpoints are available at https://huggingface.co/InfiX-ai/InfiAlign-Qwen-7B-SFT.
- Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models
Recently, Large Reasoning Models (LRMs) have gradually become a research hotspot due to their outstanding performance in handling complex tasks. Among them, DeepSeek R1 has garnered significant attention for its exceptional performance and open-source nature, driving advancements in the research of R1-style LRMs. Unlike traditional Large Language Models (LLMs), these models enhance logical deduction and decision-making capabilities during reasoning by incorporating mechanisms such as long chain-of-thought and self-reflection through reinforcement learning. However, with the widespread application of these models, the problem of overthinking has gradually emerged. Specifically, when generating answers, these models often construct excessively long reasoning chains with redundant or repetitive steps, which leads to reduced reasoning efficiency and may affect the accuracy of the final answer. To this end, various efficient reasoning methods have been proposed, aiming to reduce the length of reasoning paths without compromising model performance and reasoning capability. Systematically reviewing current research advances in efficient reasoning methods, we categorize existing works into two main directions through the lens of single-model optimization versus model collaboration: (1) Efficient Reasoning with Single Model, which focuses on improving the reasoning efficiency of individual models; and (2) Efficient Reasoning with Model Collaboration, which explores optimizing reasoning paths through collaboration among multiple models. Besides, we maintain a public GitHub repository that tracks the latest progress in efficient reasoning methods.
- Evaluating, Synthesizing, and Enhancing for Customer Support Conversation
Effective customer support requires not only accurate problem solving but also structured and empathetic communication aligned with professional standards. However, existing dialogue datasets often lack strategic guidance, and real-world service data is difficult to access and annotate. To address this, we introduce the task of Customer Support Conversation (CSC), aimed at training customer service agents to respond using well-defined support strategies. We propose a structured CSC framework grounded in COPC guidelines, defining five conversational stages and twelve strategies to guide high-quality interactions. Based on this, we construct CSConv, an evaluation dataset of 1,855 real-world customer-agent conversations rewritten using LLMs to reflect deliberate strategy use, and annotated accordingly. Additionally, we develop a role-playing approach that simulates strategy-rich conversations using LLM-powered roles aligned with the CSC framework, resulting in the training dataset RoleCS. Experiments show that fine-tuning strong LLMs on RoleCS significantly improves their ability to generate high-quality, strategy-aligned responses on CSConv. Human evaluations further confirm gains in problem resolution. All code and data will be made publicly available at https://github.com/aliyun/qwen-dianjin.
- MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes
Video object segmentation (VOS) aims to segment specified target objects throughout a video. Although state-of-the-art methods have achieved impressive performance (e.g., 90+% J&F) on existing benchmarks such as DAVIS and YouTube-VOS, these datasets primarily contain salient, dominant, and isolated objects, limiting their generalization to real-world scenarios. To advance VOS toward more realistic environments, coMplex video Object SEgmentation (MOSEv1) was introduced to facilitate VOS research in complex scenes. Building on the strengths and limitations of MOSEv1, we present MOSEv2, a significantly more challenging dataset designed to further advance VOS methods under real-world conditions. MOSEv2 consists of 5,024 videos and over 701,976 high-quality masks for 10,074 objects across 200 categories. Compared to its predecessor, MOSEv2 introduces significantly greater scene complexity, including more frequent object disappearance and reappearance, severe occlusions and crowding, smaller objects, as well as a range of new challenges such as adverse weather (e.g., rain, snow, fog), low-light scenes (e.g., nighttime, underwater), multi-shot sequences, camouflaged objects, non-physical targets (e.g., shadows, reflections), scenarios requiring external knowledge, etc. We benchmark 20 representative VOS methods under 5 different settings and observe consistent performance drops. For example, SAM2 drops from 76.4% on MOSEv1 to only 50.9% on MOSEv2. We further evaluate 9 video object tracking methods and find similar declines, demonstrating that MOSEv2 presents challenges across tasks. These results highlight that despite high accuracy on existing datasets, current VOS methods still struggle under real-world complexities. MOSEv2 is publicly available at https://MOSE.video.
- StrandDesigner: Towards Practical Strand Generation with Sketch Guidance
Realistic hair strand generation is crucial for applications like computer graphics and virtual reality. While diffusion models can generate hairstyles from text or images, these inputs lack precision and user-friendliness. Instead, we propose the first sketch-based strand generation model, which offers finer control while remaining user-friendly. Our framework tackles key challenges, such as modeling complex strand interactions and diverse sketch patterns, through two main innovations: a learnable strand upsampling strategy that encodes 3D strands into multi-scale latent spaces, and a multi-scale adaptive conditioning mechanism using a transformer with diffusion heads to ensure consistency across granularity levels. Experiments on several benchmark datasets show our method outperforms existing approaches in realism and precision. Qualitative results further confirm its effectiveness. Code will be released at [GitHub](https://github.com/fighting-Zhang/StrandDesigner).
Solidot(15)
- China's major solar companies cut nearly a third of their workforce last year
Data show that China's major solar companies cut nearly a third of their workforce last year. LONGi Green Energy, Trina Solar, JinkoSolar, JA Solar and Tongwei Group together shed roughly 87,000 jobs, on average 31% of their total headcount. The layoffs underscore how overcapacity and weak demand have dragged the industry into a price war. The world produces twice as many solar panels each year as it installs, most of them made by Chinese companies.
- Windows 10's $30 Extended Security Updates cover up to 10 devices per account
Windows 10 support ends on October 14, 2025, after which Microsoft will no longer provide security updates. Windows 10 still holds a sizable share of the market, however, so hundreds of millions of users could be left without security updates. For users who will not move to Windows 11 in the near term, Microsoft is offering a one-time, one-year Extended Security Update for $30, running through October 13, 2026. According to Microsoft's support documentation, one $30 Extended Security Update license covers up to 10 devices on a single Microsoft account.
- Linux desktop market share reaches 6%
Lansweeper, an IT asset discovery and inventory company, says its analysis of more than 15 million consumer desktop operating systems shows Linux desktop market share has passed 6%. Several other statistics also put Linux desktop share at around 6%: US Federal Government Website and App Analytics, which tracks visits to US federal government websites and apps, shows Linux desktop share hitting a record 6.3% over the past 90 days, and StatCounter data show Linux desktop share reached a new high of 5.24% in July. Linux adoption in Europe's business services and government sectors is higher than in North America, while North America's technology, telecom, finance and insurance sectors show higher Linux adoption than Europe's.
- OpenAI releases GPT-5
OpenAI has released its new model, GPT-5. Compared with its predecessors, GPT-5 is an incremental improvement rather than a great leap. OpenAI says GPT-5 is smarter, faster, and has a significantly lower hallucination rate. CEO Sam Altman claims that talking to GPT-5 is like talking to a PhD-level expert. GPT-5 is available to all users; once free users exhaust their quota they are switched to GPT-5 mini, while Pro subscribers get GPT-5 Pro.
- Scientists recreate the universe's first molecule
After the Big Bang 13.8 billion years ago, the universe was unimaginably hot and incredibly dense, but within seconds it cooled enough for the first elements, mainly hydrogen and helium, to form. These elements remained fully ionized; only after nearly 380,000 years did the universe cool enough for neutral atoms to form by recombination with free electrons, paving the way for the first chemical reactions. The oldest molecule is the helium hydride ion (HeH+), which marked the start of a chain of reactions that ultimately produced molecular hydrogen (H2), the most common molecule in today's universe. Simple molecules such as HeH+ and H2 were crucial to the formation of the first stars. Researchers at the Max Planck Institute for Nuclear Physics in Germany have, for the first time, recreated the universe's first molecule under conditions resembling the early universe and observed its chain of reactions, moving a step closer to solving the mystery of how the first stars formed.
- Study finds LLMs generate false clinical information more than half the time
According to a study published in Communications Medicine, researchers at Mount Sinai in New York tested six large language models under three conditions, one of which was a temperature of 0. Hallucination rates, meaning the generation of false information, ranged from 50% to 82% across models and prompting methods. Using prompt-based mitigations, the researchers reduced the hallucination rate from 66% to 44%; the best performer was OpenAI's GPT-4o, whose rate fell from 53% to 23%. Adjusting the temperature did little to reduce hallucinations.
- Trump calls on Lip-Bu Tan to resign immediately
US President Trump on Thursday called for Intel CEO Lip-Bu Tan to resign, reportedly over Tan's past investments in Chinese chip companies. Trump demanded Tan's immediate resignation on Truth Social on Thursday, saying there was no other solution. Before his post, US Senator Tom Cotton (R., Ark.) had written to Intel's board questioning Tan's ties to the Chinese government. Cotton said American companies that receive government funding should manage taxpayer money responsibly and comply with strict security rules, and that Intel's board owes Congress an explanation.
- Israel stored millions of Palestinians' phone calls on Microsoft servers
Since 2022, Israel has carried out mass surveillance of Gaza and the West Bank, recording and storing millions of Palestinians' phone calls. The data is held in Microsoft's Azure cloud facilities in the Netherlands and Ireland. The program was personally signed off by Microsoft CEO Satya Nadella after he met with the commander of Unit 8200, Israel's military surveillance agency; Nadella set up a customized, segregated area in the Azure cloud for Israel to store the calls of millions of Palestinians. The call records have been used to help the military carry out deadly airstrikes and plan operations in the region. Israel has long intercepted phone calls in the occupied territories because it controls essentially all of the Palestinian telecommunications infrastructure. Microsoft's role in Israel's military operations has drawn growing criticism; in May, company employees interrupted a keynote to confront Nadella directly over the issue.
- A polysaccharide from a deep-sea bacterium triggers cancer cell self-destruction
Pyroptosis, an inflammatory form of lytic programmed cell death, has attracted wide attention in recent years for its anticancer potential. According to a study published in The FASEB Journal, researchers at the Institute of Oceanology, Chinese Academy of Sciences, report isolating a new bioactive exopolysaccharide, EPS3.9, from the deep-sea bacterium Spongiibacter, which induces lytic death of cancer cells via pyroptosis. Experiments in mice found that EPS3.9 not only significantly inhibited tumor growth but also activated antitumor immune responses.
- Japan's population declines for the 16th consecutive year
Japan's Ministry of Internal Affairs and Communications on Wednesday released population figures based on the Basic Resident Register. As of January 1, Japan's population was 120,653,227, down 908,574 from the previous year, the 16th consecutive annual decline and the largest year-on-year drop since the survey began in 1968. With a falling birthrate and an aging society, the "natural decline" in which deaths exceed births continues to widen: deaths hit a record high of 1,599,850, while births fell to a record low of 687,689. The number of foreign residents rose 11% to 3,677,463, topping 3.5 million for the first time and setting a record since those statistics began in 2013; the increase of 354,089 was also the largest ever. The total population including foreign residents was 124,330,690, of which foreign residents account for 2.96%. The labor force's reliance on foreign workers is becoming increasingly evident.
- Battlefield 6 and Call of Duty: Black Ops 7 both require PC players to enable Secure Boot
The developers of two major FPS franchises have announced that their latest titles will require PC players to enable Trusted Platform Module 2.0 and Secure Boot, both citing anti-cheat. EA, developer of Battlefield 6, says enabling Secure Boot helps combat "kernel-level cheats and rootkits, memory manipulation and injection, spoofing and hardware ID tampering, virtual machines and emulation, and tampering with the anti-cheat system." Call of Duty developer Activision says Call of Duty: Black Ops 7, due later this year, will require TPM 2.0 and Secure Boot, and that it will test the new anti-cheat measures, without enforcement, in Season 5 of Call of Duty: Black Ops 6 this Thursday.
- Trump threatens 100% tariff on chips unless companies build, or commit to building, plants in the US
US President Trump said Wednesday that he would impose a 100% tariff on imported semiconductors and chips, without giving specifics. Chipmakers that want an exemption must either build plants in the US or commit to doing so. Trump said the US would put a hefty tariff on chips and semiconductors, but that the good news for companies like Apple is that those manufacturing in the US, or committed to manufacturing in the US, would not be charged. Apple had earlier pledged to invest $100 billion in the US over the next four years to boost American manufacturing.
- Japan bars Apple from restricting third-party browser engines on iOS
Japan recently passed a smartphone law known as the Bill on the Promotion of Competition for Specified Software Used in Smartphones, which among other things bars Apple from restricting third-party browser engines on its iOS platform. Third-party browsers on Apple's platform have had to use its browser engine, WebKit; the iOS versions of Firefox, Chrome, Edge, Opera, Brave and Vivaldi are all WebKit reskins, leaving iOS browsers with little real competition. Last week Japan published the Mobile Software Competition Act (MSCA) Guidelines, which explicitly prohibit this Apple policy. The MSCA takes effect in December 2025, and enforcement will be one of its biggest challenges. The EU and the UK have enacted similar laws.
- Grok generated nude images of Taylor Swift without being asked
Grok, the chatbot from Elon Musk's AI company xAI, was found to generate nude images of pop star Taylor Swift without users asking for them. A user entered the prompt "Taylor Swift celebrating Coachella with the boys" and selected the "spicy" preset to generate a video; Grok produced a video of Swift stripping and dancing in a thong in front of an AI-generated crowd. With the Take It Down Act set to take effect next year, xAI could face legal consequences if the platform lets AI generate deepfake nude images.
- Wikipedia editors adopt a speedy deletion policy for AI-generated articles
Wikipedia editors have adopted a new policy to handle the flood of AI-generated articles. The new policy lets administrators quickly delete AI-generated articles that meet certain criteria. Wikipedia's previous deletion process typically required up to a week of discussion; under the new policy, once an AI-generated article has been flagged and reviewed against the criteria, administrators can delete it without discussion. An AI article eligible for speedy deletion must meet two conditions: first, it contains obvious LLM responses to prompts, such as "Here is your Wikipedia article on…", "Up to my last training update …", or "as a large language model"; second, it exhibits a mistake LLMs commonly make, citing sources that do not exist or are plainly wrong.