OrangeBot.AI Digest — 2025-11-07
60 headlines across 8 sources, aggregated for this day.
Hacker News (15)
- AI is Dunning-Kruger as a service (christianheilmann.com)
- YouTube Removes Windows 11 Bypass Tutorials, Claims 'Risk of Physical Harm' (news.itsfoss.com)
- VLC's Jean-Baptiste Kempf Receives the European SFS Award 2025 (fsfe.org)
- Apple is crossing a Steve Jobs red line (kensegall.com)
- James Watson has died (www.nytimes.com)
- Rockstar employee shares account of the company's union-busting efforts (gtaforums.com)
- Myna: Monospace typeface designed for symbol-heavy programming languages (github.com)
- Gmail AI gets more intrusive (daveverse.org)
- Vodafone Germany is killing the open internet – one peering connection at a time (coffee.link)
- Denmark's government aims to ban access to social media for children under 15 (apnews.com)
- I Love OCaml (mccd.space)
- A.I. and Social Media Contribute to 'Brain Rot' (www.nytimes.com)
- OpenMW 0.50.0 Released – open-source Morrowind reimplementation (openmw.org)
- Meta projected 10% of 2024 revenue came from scams (sherwood.news)
- Lessons from Growing a Piracy Streaming Site (prison.josh.mn)
GitHub Trending (15)
- prometheus / alertmanager
Prometheus Alertmanager
- 666ghj / BettaFish
微舆 (WeiYu): a multi-agent public opinion analysis assistant anyone can use. It breaks out of information cocoons, reconstructs the full picture of public sentiment, forecasts where it is heading, and supports decision-making. Implemented from scratch, with no framework dependencies.
- simstudioai / sim
Open-source platform to build and deploy AI agent workflows.
- lima-vm / lima
Linux virtual machines, with a focus on running containers
- awslabs / mcp
AWS MCP Servers — helping you get the most out of AWS, wherever you use MCP.
- usestrix / strix
✨ Open-source AI hackers for your apps 👨🏻💻
- blakeblackshear / frigate
NVR with realtime local object detection for IP cameras
- imthenachoman / How-To-Secure-A-Linux-Server
An evolving how-to guide for securing a Linux server.
- FFmpeg / asm-lessons
FFmpeg Assembly Language Lessons
- ad-on-is / rachoon
🦝 Rachoon — A self-hostable way to handle invoices
- TheAlgorithms / Python
All Algorithms implemented in Python
- jwasham / coding-interview-university
A complete computer science study plan to become a software engineer.
- Shubhamsaboo / awesome-llm-apps
Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and open-source models.
- GoogleCloudPlatform / vertex-ai-creative-studio
GenMedia Creative Studio is a Vertex AI generative media user experience highlighting the use of Imagen, Veo, Gemini 🍌, Gemini TTS, Chirp 3, Lyria and other generative media APIs on Google Cloud.
- GopeedLab / gopeed
A modern download manager that supports all platforms. Built with Golang and Flutter.
Hugging Face (15)
- Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
"Thinking with Text" and "Thinking with Images" paradigm significantly improve the reasoning ability of large language models (LLMs) and Vision Language Models (VLMs). However, these paradigms have inherent limitations. (1) Images capture only single moments and fail to represent dynamic processes or continuous changes, and (2) The separation of text and vision as distinct modalities, hindering unified multimodal understanding and generation. To overcome these limitations, we introduce "Thinking with Video", a new paradigm that leverages video generation models, such as Sora-2, to bridge visual and textual reasoning in a unified temporal framework. To support this exploration, we developed the Video Thinking Benchmark (VideoThinkBench). VideoThinkBench encompasses two task categories: (1) vision-centric tasks (e.g., Eyeballing Puzzles), and (2) text-centric tasks (e.g., subsets of GSM8K, MMMU). Our evaluation establishes Sora-2 as a capable reasoner. On vision-centric tasks, Sora-2 is generally comparable to state-of-the-art (SOTA) VLMs, and even surpasses VLMs on several tasks, such as Eyeballing Games. On text-centric tasks, Sora-2 achieves 92% accuracy on MATH, and 75.53% accuracy on MMMU. Furthermore, we systematically analyse the source of these abilities. We also find that self-consistency and in-context learning can improve Sora-2's performance. In summary, our findings demonstrate that the video generation model is the potential unified multimodal understanding and generation model, positions "thinking with video" as a unified multimodal reasoning paradigm.
- V-Thinker: Interactive Thinking with Images
Empowering Large Multimodal Models (LMMs) to deeply integrate image interaction with long-horizon reasoning capabilities remains a long-standing challenge in this field. Recent advances in vision-centric reasoning explore a promising "Thinking with Images" paradigm for LMMs, marking a shift from image-assisted reasoning to image-interactive thinking. While this milestone enables models to focus on fine-grained image regions, progress remains constrained by limited visual tool spaces and task-specific workflow designs. To bridge this gap, we present V-Thinker, a general-purpose multimodal reasoning assistant that enables interactive, vision-centric thinking through end-to-end reinforcement learning. V-Thinker comprises two key components: (1) a Data Evolution Flywheel that automatically synthesizes, evolves, and verifies interactive reasoning datasets across three dimensions (diversity, quality, and difficulty); and (2) a Visual Progressive Training Curriculum that first aligns perception via point-level supervision, then integrates interactive reasoning through a two-stage reinforcement learning framework. Furthermore, we introduce VTBench, an expert-verified benchmark targeting vision-centric interactive reasoning tasks. Extensive experiments demonstrate that V-Thinker consistently outperforms strong LMM-based baselines in both general and interactive reasoning scenarios, providing valuable insights for advancing image-interactive reasoning applications.
- Scaling Agent Learning via Experience Synthesis
While reinforcement learning (RL) can empower large language model (LLM) agents by enabling self-improvement through interaction, its practical adoption remains challenging due to costly rollouts, limited task diversity, unreliable reward signals, and infrastructure complexity, all of which obstruct the collection of scalable experience data. To address these challenges, we introduce DreamGym, the first unified framework designed to synthesize diverse experiences with scalability in mind to enable effective online RL training for autonomous agents. Rather than relying on expensive real-environment rollouts, DreamGym distills environment dynamics into a reasoning-based experience model that derives consistent state transitions and feedback signals through step-by-step reasoning, enabling scalable agent rollout collection for RL. To improve the stability and quality of transitions, DreamGym leverages an experience replay buffer initialized with offline real-world data and continuously enriched with fresh interactions to actively support agent training. To improve knowledge acquisition, DreamGym adaptively generates new tasks that challenge the current agent policy, enabling more effective online curriculum learning. Experiments across diverse environments and agent backbones demonstrate that DreamGym substantially improves RL training, both in fully synthetic settings and in sim-to-real transfer scenarios. On non-RL-ready tasks like WebArena, DreamGym outperforms all baselines by over 30%. And in RL-ready but costly settings, it matches GRPO and PPO performance using only synthetic interactions. When transferring a policy trained purely on synthetic experiences to real-environment RL, DreamGym yields significant additional performance gains while requiring far fewer real-world interactions, providing a scalable warm-start strategy for general-purpose RL.
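The abstract above describes an architecture rather than code. As a rough illustration of the loop it implies (a synthetic experience model standing in for real rollouts, a replay buffer seeded with offline data, and an adaptive task curriculum), here is a minimal Python sketch; every class, function, and constant in it is invented for illustration and is not DreamGym's API.

```python
import random
from collections import deque

class ExperienceModel:
    """Stands in for a reasoning-based experience model: given a state and action,
    it produces the next state and a feedback signal instead of touching a real
    environment. Here it is just a toy stochastic transition."""
    def step(self, state, action):
        next_state = state + action + random.uniform(-0.1, 0.1)
        reward = 1.0 if abs(next_state) < 0.5 else -0.1
        return next_state, reward

class ReplayBuffer:
    """Replay buffer initialized from offline real-world data, then enriched
    with fresh synthetic interactions, as the abstract describes."""
    def __init__(self, offline_data, maxlen=10_000):
        self.buffer = deque(offline_data, maxlen=maxlen)
    def add(self, transition):
        self.buffer.append(transition)
    def sample(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))

def propose_task(policy_score):
    """Toy curriculum: harder start states as the policy improves."""
    return random.uniform(-1, 1) * (1.0 + policy_score)

def train(steps=1000):
    model, buffer = ExperienceModel(), ReplayBuffer([(0.0, 0.0, 0.0, 0.0)])
    policy_score = 0.0
    for _ in range(steps):
        state = propose_task(policy_score)
        action = -0.5 * state            # placeholder policy
        next_state, reward = model.step(state, action)
        buffer.add((state, action, reward, next_state))
        batch = buffer.sample(32)        # an RL update (e.g. GRPO/PPO) would consume this batch
        policy_score = 0.9 * policy_score + 0.1 * max(reward, 0.0)
    return policy_score

if __name__ == "__main__":
    print("final policy score:", round(train(), 3))
```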
- Cambrian-S: Towards Spatial Supersensing in Video
We argue that progress in true multimodal intelligence calls for a shift from reactive, task-driven systems and brute-force long context towards a broader paradigm of supersensing. We frame spatial supersensing as four stages beyond linguistic-only understanding: semantic perception (naming what is seen), streaming event cognition (maintaining memory across continuous experiences), implicit 3D spatial cognition (inferring the world behind pixels), and predictive world modeling (creating internal models that filter and organize information). Current benchmarks largely test only the early stages, offering narrow coverage of spatial cognition and rarely challenging models in ways that require true world modeling. To drive progress in spatial supersensing, we present VSI-SUPER, a two-part benchmark: VSR (long-horizon visual spatial recall) and VSC (continual visual spatial counting). These tasks require arbitrarily long video inputs yet are resistant to brute-force context expansion. We then test data scaling limits by curating VSI-590K and training Cambrian-S, achieving +30% absolute improvement on VSI-Bench without sacrificing general capabilities. Yet performance on VSI-SUPER remains limited, indicating that scale alone is insufficient for spatial supersensing. We propose predictive sensing as a path forward, presenting a proof-of-concept in which a self-supervised next-latent-frame predictor leverages surprise (prediction error) to drive memory and event segmentation. On VSI-SUPER, this approach substantially outperforms leading proprietary baselines, showing that spatial supersensing requires models that not only see but also anticipate, select, and organize experience.
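To make the predictive-sensing idea concrete, here is a toy sketch of surprise-driven event segmentation: a trivial next-latent-frame "predictor" flags an event boundary wherever prediction error spikes and keeps one summary latent per event in memory. The predictor, threshold, and averaging are placeholders, not the paper's method.

```python
import numpy as np

def surprise_segmentation(latents, threshold=1.0):
    """Close the current event and store one representative latent whenever the
    prediction error (surprise) between consecutive latent frames exceeds a threshold."""
    events, memory = [], []
    start = 0
    for t in range(1, len(latents)):
        predicted = latents[t - 1]                   # stand-in for a learned next-latent-frame predictor
        surprise = float(np.linalg.norm(latents[t] - predicted))
        if surprise > threshold:                     # high surprise => event boundary
            events.append((start, t))
            memory.append(latents[start:t].mean(axis=0))
            start = t
    events.append((start, len(latents)))
    memory.append(latents[start:].mean(axis=0))
    return events, memory

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # two synthetic "scenes" with a sudden change halfway through
    latents = np.concatenate([rng.normal(0, 0.1, (50, 8)), rng.normal(3, 0.1, (50, 8))])
    events, memory = surprise_segmentation(latents)
    print("event boundaries:", events, "memory slots:", len(memory))
```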
- GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents
We introduce GUI-360°, a large-scale, comprehensive dataset and benchmark suite designed to advance computer-using agents (CUAs). CUAs present unique challenges and are constrained by three persistent gaps: a scarcity of real-world CUA tasks, the lack of automated collection-and-annotation pipelines for multi-modal trajectories, and the absence of a unified benchmark that jointly evaluates GUI grounding, screen parsing, and action prediction. GUI-360° addresses these gaps with an LLM-augmented, largely automated pipeline for query sourcing, environment-template construction, task instantiation, batched execution, and LLM-driven quality filtering. The released corpus contains over 1.2M executed action steps across thousands of trajectories in popular Windows office applications, and includes full-resolution screenshots, accessibility metadata when available, instantiated goals, intermediate reasoning traces, and both successful and failed action trajectories. The dataset supports three canonical tasks (GUI grounding, screen parsing, and action prediction) and a hybrid GUI+API action space that reflects modern agent designs. Benchmarking state-of-the-art vision-language models on GUI-360° reveals substantial out-of-the-box shortcomings in grounding and action prediction; supervised fine-tuning and reinforcement learning yield significant gains but do not close the gap to human-level reliability. We release GUI-360° and accompanying code to facilitate reproducible research and accelerate progress on robust desktop CUAs. The full dataset has been made public on https://huggingface.co/datasets/vyokky/GUI-360.
- NVIDIA Nemotron Nano V2 VL
We introduce Nemotron Nano V2 VL, the latest model of the Nemotron vision-language series designed for strong real-world document understanding, long video comprehension, and reasoning tasks. Nemotron Nano V2 VL delivers significant improvements over our previous model, Llama-3.1-Nemotron-Nano-VL-8B, across all vision and text domains through major enhancements in model architecture, datasets, and training recipes. Nemotron Nano V2 VL builds on Nemotron Nano V2, a hybrid Mamba-Transformer LLM, and innovative token reduction techniques to achieve higher inference throughput in long document and video scenarios. We are releasing model checkpoints in BF16, FP8, and FP4 formats and sharing large parts of our datasets, recipes and training code.
- Contamination Detection for VLMs using Multi-Modal Semantic Perturbation
Recent advances in Vision-Language Models (VLMs) have achieved state-of-the-art performance on numerous benchmark tasks. However, the use of internet-scale, often proprietary, pretraining corpora raises a critical concern for both practitioners and users: inflated performance due to test-set leakage. While prior works have proposed mitigation strategies such as decontamination of pretraining data and benchmark redesign for LLMs, the complementary direction of developing detection methods for contaminated VLMs remains underexplored. To address this gap, we deliberately contaminate open-source VLMs on popular benchmarks and show that existing detection approaches either fail outright or exhibit inconsistent behavior. We then propose a novel simple yet effective detection method based on multi-modal semantic perturbation, demonstrating that contaminated models fail to generalize under controlled perturbations. Finally, we validate our approach across multiple realistic contamination strategies, confirming its robustness and effectiveness. The code and perturbed dataset will be released publicly.
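A minimal sketch of the detection signal the abstract describes: score the same items before and after a semantic perturbation and look at the accuracy gap. The toy "benchmark" and "models" below are invented purely to show the mechanics; a real check would use the released perturbed dataset and actual VLMs.

```python
def perturbation_gap(model, items, perturb):
    """Accuracy on the original items vs. semantically perturbed copies of them.
    A contaminated model keeps its original score but collapses under perturbation."""
    def accuracy(examples):
        return sum(model(ex["q"]) == ex["a"] for ex in examples) / len(examples)
    return accuracy(items), accuracy([perturb(ex) for ex in items])

if __name__ == "__main__":
    # toy benchmark: the answer is the larger of two numbers
    items = [{"q": (a, b), "a": max(a, b)} for a, b in [(1, 5), (7, 2), (3, 9), (8, 4)]]
    perturb = lambda ex: {"q": (ex["q"][1], ex["q"][0]), "a": ex["a"]}   # swap operands, same answer
    lookup = {ex["q"]: ex["a"] for ex in items}                           # "contaminated": memorized test set
    contaminated = lambda q: lookup.get(q, -1)
    clean = lambda q: max(q)                                              # genuinely solves the task
    for name, m in [("contaminated", contaminated), ("clean", clean)]:
        orig, pert = perturbation_gap(m, items, perturb)
        print(f"{name}: original={orig:.2f} perturbed={pert:.2f} gap={orig - pert:.2f}")
```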
- The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms
The strong lottery ticket hypothesis (SLTH) conjectures that high-performing subnetworks, called strong lottery tickets (SLTs), are hidden in randomly initialized neural networks. Although recent theoretical studies have established the SLTH across various neural architectures, the SLTH for transformer architectures still lacks theoretical understanding. In particular, the current theory of the SLTH does not yet account for the multi-head attention (MHA) mechanism, a core component of transformers. To address this gap, we introduce a theoretical analysis of the existence of SLTs within MHAs. We prove that, if a randomly initialized MHA with H heads and input dimension d has a hidden dimension of O(d log(H d^{3/2})) for the key and value, it contains an SLT that approximates an arbitrary MHA with the same input dimension with high probability. Furthermore, by leveraging this theory for MHAs, we extend the SLTH to transformers without normalization layers. We empirically validate our theoretical findings, demonstrating that the approximation error between the SLT within a source model (MHA and transformer) and an approximate target counterpart decreases exponentially as the hidden dimension of the source model increases.
- Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts
Robust benchmarks are crucial for evaluating Multimodal Large Language Models (MLLMs). Yet we find that models can ace many multimodal benchmarks without strong visual understanding, instead exploiting biases, linguistic priors, and superficial patterns. This is especially problematic for vision-centric benchmarks that are meant to require visual inputs. We adopt a diagnostic principle for benchmark design: if a benchmark can be gamed, it will be. Designers should therefore try to "game" their own benchmarks first, using diagnostic and debiasing procedures to systematically identify and mitigate non-visual biases. Effective diagnosis requires directly "training on the test set", probing the released test set for its intrinsic, exploitable patterns. We operationalize this standard with two components. First, we diagnose benchmark susceptibility using a "Test-set Stress-Test" (TsT) methodology. Our primary diagnostic tool involves fine-tuning a powerful Large Language Model via k-fold cross-validation on exclusively the non-visual, textual inputs of the test set to reveal shortcut performance and assign each sample a bias score s(x). We complement this with a lightweight Random Forest-based diagnostic operating on hand-crafted features for fast, interpretable auditing. Second, we debias benchmarks by filtering high-bias samples using an "Iterative Bias Pruning" (IBP) procedure. Applying this framework to four benchmarks (VSI-Bench, CV-Bench, MMMU, and VideoMME), we uncover pervasive non-visual biases. As a case study, we apply our full framework to create VSI-Bench-Debiased, demonstrating reduced non-visual solvability and a wider vision-blind performance gap than the original.
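The lightweight Random Forest diagnostic mentioned above lends itself to a short sketch: predict each answer from text-only features via k-fold cross-validation, treat per-sample correctness as a crude bias score, and prune high-bias samples iteratively. The feature extraction and thresholds below are simplistic stand-ins, not the paper's exact recipe.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_predict

def text_only_bias_scores(questions, answers, n_folds=5):
    """Cross-validated 'training on the test set' using only the textual inputs.
    A score of 1.0 means the sample was answerable without looking at the image."""
    X = CountVectorizer(ngram_range=(1, 2)).fit_transform(questions)   # crude hand-crafted text features
    y = np.asarray(answers)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    pred = cross_val_predict(clf, X, y, cv=n_folds)
    return (pred == y).astype(float)

def iterative_bias_pruning(questions, answers, rounds=3, keep_if_below=0.5):
    """Repeatedly drop samples whose text-only bias score is high, then re-diagnose."""
    idx = np.arange(len(questions))
    for _ in range(rounds):
        bias = text_only_bias_scores([questions[i] for i in idx], [answers[i] for i in idx])
        idx = idx[bias < keep_if_below]
        if len(idx) == 0:
            break
    return idx   # indices of the debiased subset
```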
- How to Evaluate Speech Translation with Source-Aware Neural MT Metrics
Automatic evaluation of speech-to-text translation (ST) systems is typically performed by comparing translation hypotheses with one or more reference translations. While effective to some extent, this approach inherits the limitation of reference-based evaluation that ignores valuable information from the source input. In machine translation (MT), recent progress has shown that neural metrics incorporating the source text achieve stronger correlation with human judgments. Extending this idea to ST, however, is not trivial because the source is audio rather than text, and reliable transcripts or alignments between source and references are often unavailable. In this work, we conduct the first systematic study of source-aware metrics for ST, with a particular focus on real-world operating conditions where source transcripts are not available. We explore two complementary strategies for generating textual proxies of the input audio: automatic speech recognition (ASR) transcripts and back-translations of the reference translation. We also introduce a novel two-step cross-lingual re-segmentation algorithm to address the alignment mismatch between synthetic sources and reference translations. Our experiments, carried out on two ST benchmarks covering 79 language pairs and six ST systems with diverse architectures and performance levels, show that ASR transcripts constitute a more reliable synthetic source than back-translations when word error rate is below 20%, while back-translations always represent a computationally cheaper but still effective alternative. Furthermore, our cross-lingual re-segmentation algorithm enables robust use of source-aware MT metrics in ST evaluation, paving the way toward more accurate and principled evaluation methodologies for speech translation.
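As one concrete way to run a source-aware neural metric with a synthetic source, the sketch below scores ST hypotheses with COMET, feeding ASR transcripts of the audio as the "src" field. The unbabel-comet package and the Unbabel/wmt22-comet-da checkpoint are assumptions chosen for illustration (the paper does not prescribe this particular metric here), and the paper's cross-lingual re-segmentation step is omitted.

```python
from comet import download_model, load_from_checkpoint  # pip install unbabel-comet (assumed)

def score_st_with_proxy_source(asr_transcripts, hypotheses, references):
    """asr_transcripts: textual proxies of the input audio (one per segment),
    hypotheses: ST system outputs, references: reference translations."""
    model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
    data = [
        {"src": src, "mt": hyp, "ref": ref}
        for src, hyp, ref in zip(asr_transcripts, hypotheses, references)
    ]
    out = model.predict(data, batch_size=8, gpus=0)   # segment scores plus a system-level score
    return out.system_score, out.scores
```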
- Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots
Humanoid soccer poses a representative challenge for embodied intelligence, requiring robots to operate within a tightly coupled perception-action loop. However, existing systems typically rely on decoupled modules, resulting in delayed responses and incoherent behaviors in dynamic environments, while real-world perceptual limitations further exacerbate these issues. In this work, we present a unified reinforcement learning-based controller that enables humanoid robots to acquire reactive soccer skills through the direct integration of visual perception and motion control. Our approach extends Adversarial Motion Priors to perceptual settings in real-world dynamic environments, bridging motion imitation and visually grounded dynamic control. We introduce an encoder-decoder architecture combined with a virtual perception system that models real-world visual characteristics, allowing the policy to recover privileged states from imperfect observations and establish active coordination between perception and action. The resulting controller demonstrates strong reactivity, consistently executing coherent and robust soccer behaviors across various scenarios, including real RoboCup matches.
- SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding
Despite impressive high-level video comprehension, multimodal language models struggle with spatial reasoning across time and space. While current spatial training approaches rely on real-world video data, obtaining diverse footage with precise spatial annotations remains a bottleneck. To alleviate this bottleneck, we present SIMS-V, a systematic data-generation framework that leverages the privileged information of 3D simulators to create spatially-rich video training data for multimodal language models. Using this framework, we investigate which properties of simulated data drive effective real-world transfer through systematic ablations of question types, mixes, and scales. We identify a minimal set of three question categories (metric measurement, perspective-dependent reasoning, and temporal tracking) that prove most effective for developing transferable spatial intelligence, outperforming comprehensive coverage despite using fewer question types. These insights enable highly efficient training: our 7B-parameter video LLM fine-tuned on just 25K simulated examples outperforms the larger 72B baseline and achieves competitive performance with proprietary models on rigorous real-world spatial reasoning benchmarks. Our approach demonstrates robust generalization, maintaining performance on general video understanding while showing substantial improvements on embodied and real-world spatial tasks.
- RDMA Point-to-Point Communication for LLM Systems
Emerging Large Language Model (LLM) system patterns, such as disaggregated inference, Mixture-of-Experts (MoE) routing, and asynchronous reinforcement fine-tuning, require flexible point-to-point communication beyond simple collectives. Existing implementations are locked to specific Network Interface Controllers (NICs), hindering integration into inference engines and portability across hardware providers. We present TransferEngine, which bridges the functionality of common NICs to expose a uniform interface. TransferEngine exposes one-sided WriteImm operations with an ImmCounter primitive for completion notification, without ordering assumptions about the network transport, and transparently manages multiple NICs per GPU. We demonstrate peak throughput of 400 Gbps on both NVIDIA ConnectX-7 and AWS Elastic Fabric Adapter (EFA). We showcase TransferEngine through three production systems: (1) KvCache transfer for disaggregated inference with dynamic scaling, (2) RL weight updates achieving 1.3 seconds for trillion-parameter models, and (3) MoE dispatch/combine implementation exceeding DeepEP decode latency on ConnectX-7, with the first viable latencies on EFA. We demonstrate that our portable point-to-point communication complements collectives while avoiding lock-in.
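The abstract names an interface (one-sided WriteImm plus an ImmCounter completion primitive) without showing it, so the following is a purely hypothetical Python sketch of that pattern; every name and signature here is invented for illustration and the real TransferEngine API may look nothing like it.

```python
import threading
from dataclasses import dataclass, field

@dataclass
class ImmCounter:
    """Completion notification by counting immediates rather than relying on
    transport-level ordering: the receiver waits until `expected` writes have landed."""
    expected: int
    _count: int = 0
    _cv: threading.Condition = field(default_factory=threading.Condition)

    def bump(self) -> None:
        with self._cv:
            self._count += 1
            self._cv.notify_all()

    def wait(self) -> None:
        with self._cv:
            self._cv.wait_for(lambda: self._count >= self.expected)

class FakeTransferEngine:
    """Stand-in for an engine that spreads one-sided writes across several NICs."""
    def __init__(self, nics):
        self.nics = nics

    def write_imm(self, dst_buffer, offset, payload, counter: ImmCounter):
        # a real engine would post an RDMA write-with-immediate on one of its NICs;
        # here we just copy bytes locally and bump the receiver's counter
        nic = self.nics[offset % len(self.nics)]   # trivial NIC selection for illustration
        dst_buffer[offset:offset + len(payload)] = payload
        counter.bump()

if __name__ == "__main__":
    dst = bytearray(64)
    done = ImmCounter(expected=4)
    engine = FakeTransferEngine(nics=["nic-0", "nic-1"])
    for i in range(4):
        engine.write_imm(dst, i * 16, b"x" * 16, done)
    done.wait()
    print("all chunks delivered:", dst.count(ord("x")) == 64)
```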
- EVTAR: End-to-End Try on with Additional Unpaired Visual Reference
We propose EVTAR, an End-to-End Virtual Try-on model with Additional Reference that directly fits the target garment onto the person image while incorporating reference images to enhance try-on accuracy. Most existing virtual try-on approaches rely on complex inputs such as agnostic person images, human pose, densepose, or body keypoints, making them labor-intensive and impractical for real-world applications. In contrast, EVTAR adopts a two-stage training strategy, enabling simple inference with only the source image and the target garment as inputs. Our model generates try-on results without masks, densepose, or segmentation maps. Moreover, EVTAR leverages additional reference images of different individuals wearing the same clothes to better preserve garment texture and fine-grained details. This mechanism is analogous to how humans consider reference models when choosing outfits, thereby simulating a more realistic and high-quality dressing effect. We enrich the training data with supplementary references and unpaired person images to support these capabilities. We evaluate EVTAR on two widely used benchmarks and diverse tasks, and the results consistently validate the effectiveness of our approach.
- SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning
We introduce SAIL-RL, a reinforcement learning (RL) post-training framework that enhances the reasoning capabilities of multimodal large language models (MLLMs) by teaching them when and how to think. Existing approaches are limited by outcome-only supervision, which rewards correct answers without ensuring sound reasoning, and by uniform thinking strategies, which often lead to overthinking on simple tasks and underthinking on complex ones. SAIL-RL addresses these challenges with a dual reward system: the Thinking Reward, which evaluates reasoning quality through factual grounding, logical coherence, and answer consistency, and the Judging Reward, which adaptively determines whether deep reasoning or direct answering is appropriate. Experiments on the state-of-the-art SAIL-VL2 show that SAIL-RL improves reasoning and multimodal understanding benchmarks at both 4B and 8B scales, achieving competitive performance against commercial closed-source models such as GPT-4o, and substantially reduces hallucinations, establishing it as a principled framework for building more reliable and adaptive MLLMs. The code will be available at https://github.com/BytedanceDouyinContent/SAIL-RL.
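To illustrate the dual-reward idea in the abstract, here is a toy combination of a "thinking" reward and a "judging" reward. The component scores, weights, and function names are placeholders, not SAIL-RL's actual reward definitions.

```python
def thinking_reward(factual: float, coherent: float, consistent: float) -> float:
    """Quality of the reasoning trace: factual grounding, logical coherence, and
    consistency between the reasoning and the final answer (each scored in [0, 1])."""
    return (factual + coherent + consistent) / 3.0

def judging_reward(used_deep_reasoning: bool, task_needs_reasoning: bool) -> float:
    """Did the model pick the right mode: think deeply on hard tasks, answer directly on easy ones?"""
    return 1.0 if used_deep_reasoning == task_needs_reasoning else 0.0

def dual_reward(trace_scores, used_deep_reasoning, task_needs_reasoning,
                w_think=0.5, w_judge=0.5) -> float:
    return (w_think * thinking_reward(*trace_scores)
            + w_judge * judging_reward(used_deep_reasoning, task_needs_reasoning))

# Example: a clean trace and a correct choice to answer an easy question directly
print(dual_reward((1.0, 1.0, 1.0), used_deep_reasoning=False, task_needs_reasoning=False))
```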
Solidot (15)
- Amazon lake temperatures soared in 2023
According to a study published in Science, the unprecedented heatwave and drought of 2023 turned Amazonian lakes into shallow, near-simmering basins, with one lake's water temperature soaring above 40 degrees Celsius (ºC) and water levels plunging to record lows. The extreme heat had wide-ranging effects: it hit remote, isolated riverside communities and caused mass die-offs of fish and endangered Amazon river dolphins. The findings confirm a worrying warming trend in the Amazon's poorly monitored lakes and rivers, and point to the growing impact of climate change on tropical freshwater ecosystems worldwide. Researchers analyzed water temperature measurements from 10 lakes in the central Amazon during the 2023 drought and found that 5 of the 10 had abnormally high daytime temperatures of over 37 ºC. In the shallows of Lake Tefé, the temperature of its roughly 2-meter-deep water column soared to 41 ºC, hotter than a typical hot-spring bath. Amazonian lakes have been warming rapidly, at roughly 0.3 to 0.8 ºC per decade over the past 30 years or so, faster than the global average. During the severe drought of 2024, many lakes in the region also shrank sharply: Lake Tefé lost 75% of its surface area and Badajós Lake shrank by 90%.
- Mastodon 4.5 released
The decentralized microblogging platform Mastodon has released v4.5. Major new features include support for quote posts; native emoji support; and stronger feed management and blocking tools for server administrators, who can now set the local server's feed as the landing page for visitors, block specific usernames, and see context in the moderation interface, among other changes.
- The Louvre's video surveillance server password was "Louvre"
On October 19, 2025, a jewel heist took place at the Louvre in central Paris. At around 9:30 a.m. local time, several pieces of the French crown jewels kept in the Apollo Gallery were stolen in an operation lasting only about 4 to 7 minutes. The stolen jewels are worth roughly 88 million euros. The thieves, disguised as construction workers, triggered an alarm and were confronted by security guards during the raid, yet still got away. The heist has drawn wide attention to the museum's lax security. For example, the Louvre's video surveillance server used "Louvre" as its password for years. In 2014, the French National Cybersecurity Agency conducted a penetration test at the Louvre's request; the security experts easily entered the secure network, tampered with the video surveillance, and modified access-card permissions. Their report described the Louvre's network security as far too weak: typing "Louvre" granted access to the server managing the video surveillance, and typing "THALES" granted access to a program developed by Thales. Documents also show that in 2025 the Louvre was still running security software purchased in 2003, no longer supported by its developer and running on Windows Server 2003.
- Shenzhou-20 return postponed after suspected space debris impact
The China Manned Space Agency announced that the Shenzhou-20 crewed spacecraft is suspected to have been struck by a small piece of space debris and that impact analysis and risk assessment are under way. To safeguard the astronauts' health and safety and ensure mission success, the return originally planned for November 5 has been postponed. The announcement gave no further details, such as the suspected impact site or the extent of the damage, and no timetable for a new return date. Shenzhou-20 launched on April 24, 2025; its three astronauts, Chen Dong, Chen Zhongrui, and Wang Jie, completed a six-month stay aboard the Tiangong space station, handed control of the station to the newly arrived Shenzhou-21 crew on November 4, and were originally due to return on November 5.
- Chrome will drop XSLT support in November 2026
The official Chrome blog announced that support for Extensible Stylesheet Language Transformations (XSLT) will be removed in v155, due to ship on November 17, 2026. Google says the removal helps improve security and notes that Firefox and the WebKit project have similar plans. XML documents are easy for computers to read but hard for humans; the purpose of XSLT is to transform XML into formats better suited to human readers, such as HTML. Chrome, Firefox, Safari, and other major browsers all support client-side XSLT rendering, but only the 1.0 version from 1999, not the latest 3.0 version from 2017. Google floated the idea of removing XSLT as far back as 2013 but never acted on it. This year's WHATWG meeting formally put the removal proposal on the agenda. Google's developers argue that the XSLT codebase used by browsers has aged and is prone to memory-safety vulnerabilities, and that usage is very low: only about one in every 7,891 page loads involves client-side XSLT.
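For sites that currently rely on the browser to render XML through an XSLT stylesheet, one obvious migration path once client-side support disappears is to apply the same XSLT 1.0 stylesheet on the server. A minimal sketch with Python's lxml (the file names are examples):

```python
from lxml import etree

def render_xml_to_html(xml_path: str, xslt_path: str) -> str:
    """Apply an XSLT 1.0 stylesheet to an XML document and return the result as a string."""
    transform = etree.XSLT(etree.parse(xslt_path))   # lxml/libxslt implements XSLT 1.0
    result = transform(etree.parse(xml_path))
    return str(result)

if __name__ == "__main__":
    print(render_xml_to_html("feed.xml", "feed.xsl"))
```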
- Astronomers detect the brightest black hole flare ever observed
According to a study published in Nature Astronomy, astronomers have detected the brightest burst of light ever seen from a black hole, produced as it devoured a star of at least 30 solar masses; at its peak the flare was more than 10 trillion times brighter than the Sun. When astronomers first observed the object in 2018, they did not realize it was a super-flare. After noticing the brightening, researchers immediately pointed the 200-inch Hale Telescope at Palomar Observatory at it. In 2023 the team noticed that, even five years on, the flare remained unusually bright, so they carried out deeper observations with the Keck Observatory in Hawaii, which showed the object lies about 3 million kiloparsecs, or 10 billion light-years, from Earth. To appear so bright at such a distance, the light it emits must be extraordinarily intense. Astronomers now say the flare is 30 times brighter than any black hole light burst previously detected. The researchers believe the most plausible explanation is that a massive star met its doom by straying too close to the black hole: as the black hole's gravity tore the star apart, the light it emitted grew dozens of times brighter. They also believe that, because the flare has not yet fully faded, the star may not have been completely swallowed by the black hole.
- 43% of Gen Z prefer YouTube and TikTok over traditional TV and streaming
A survey by Activate Consulting found that 43% of Gen Z prefer YouTube and TikTok to traditional TV or paid streaming. Global media revenue is growing sharply while traditional TV ratings plummet; people now spend an average of more than 13 hours a day consuming content across platforms, and multitasking effectively gives everyone a "32-hour day." The survey also found that micro-dramas of one to two minutes are catching on fast, with 28 million American adults (52% of them aged 18-34) consuming this new form of entertainment. The survey projects that by 2029 global internet and media revenue will grow by $388 billion, daily streaming video viewing will rise to 4 hours 8 minutes, and traditional TV viewing will fall to 1 hour 17 minutes. Streaming revenue (advertising plus subscriptions) is expected to grow 18-19% a year, while traditional TV revenue declines 4-6% a year.
- China requires state-funded data centers to use domestic AI chips
China has issued guidelines requiring newly built state-funded data centers to use domestically made AI chips. Data centers that are less than 30% complete must remove all installed foreign chips or cancel their purchase plans; those more than 30% complete will be handled case by case. This may be the strongest move yet to strip foreign technology out of critical infrastructure.
- France to block Shein's website
The French government says it will bar Shein from operating in the country after childlike sex dolls and large numbers of weapons were found for sale on the fast-fashion retailer's online platform. Interior minister Laurent Nuñez filed a legal request on Wednesday to block the Shein website, "to finally put an end to the serious harm to public order caused by Shein's failings." The finance ministry said the move came after "large quantities" of Category A weapons were found listed by third-party sellers on Shein's marketplace; it confirmed these included machetes, axes, and knuckle-dusters. Paris prosecutors had just opened an investigation into Shein on the grounds that its sale of the dolls may constitute child pornography; distributing and possessing child pornography are both illegal in France. Enforcement is tightening: a senior Japan Football Association (JFA) official was recently even convicted for viewing child pornography on a plane.
- World Economic Forum president warns of three bubbles
World Economic Forum president Borge Brende said Wednesday that financial markets may be harboring three bubbles the world should watch warily: a cryptocurrency bubble, an AI bubble, and a debt bubble. Government debt burdens, he said, have never been this heavy, not since 1945. Brende added that while AI could greatly boost productivity, it could also threaten many white-collar jobs; in the worst case, big cities with large numbers of white-collar jobs could see "rust belts" of their own.
- Japanese localization community shuts down after Mozilla introduces AI translation
After Mozilla introduced AI translation, the Japanese localization community for SUMO (support.mozilla.org) announced it is ceasing operations. The translation bot, sumobot, was introduced on October 22 to help write Japanese knowledge-base articles. The SUMO community found that it does not follow the translation guidelines; does not respect the existing localization for Japanese users; immediately approves direct English machine translations of all archived knowledge-base articles; takes up to 72 hours to get approval after an update, making it impossible to train new contributors; operates without community approval, oversight, or communication; and has overwritten more than 300 knowledge-base articles. The SUMO community considers this large-scale destruction of its work. Its lead announced that they will no longer contribute content to support.mozilla.org, demanded that their translations not be used to train bots or AI, and asked for all of their translations to be removed from AI training data.
- New HDR10+ Advanced aims to improve motion smoothing
Samsung has revealed new details of HDR10+ Advanced. The most interesting feature is HDR10+ Intelligent FRC (frame rate conversion), which aims to improve motion smoothing. With motion smoothing, a TV analyzes every frame of the video and works out how to insert extra frames so that the video's frame rate matches the TV's refresh rate. On a 60 Hz TV with motion smoothing (also called motion compensation or frame interpolation) enabled, a 24p film is interpolated so that it looks as if it had been shot at 60p, making motion smoother and eliminating judder. But the results of frame interpolation often look unnatural. Intelligent FRC takes a more fine-grained approach: it lets content creators control the level of motion smoothing used in each scene and allows the strength of motion interpolation to be adjusted according to ambient light.
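The 24p-to-60 Hz case above reduces to simple arithmetic: each output frame lands at a fractional position on the source timeline, and that fraction is the blend weight an interpolator needs. A toy Python sketch of the schedule (illustrative only, not how any TV actually implements FRC):

```python
def frc_schedule(src_fps=24, dst_fps=60, n_out=10):
    """For each output frame, find the two source frames it falls between and the blend weight."""
    plan = []
    for i in range(n_out):
        t = i * src_fps / dst_fps   # position on the source timeline, in source frames
        a = int(t)                  # previous source frame
        w = t - a                   # how far we are toward the next source frame
        plan.append((i, a, a + 1, round(w, 2)))
    return plan

# output frame 0 is exactly source frame 0; frame 1 sits 40% of the way from source 0 to 1; etc.
for row in frc_schedule():
    print("out %d: blend src %d -> %d at weight %.2f" % row)
```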
- DRAM chip prices are rising faster than gold
DRAM contract prices in the third quarter of 2025 were up 171.8% year over year, outpacing gold. ADATA chairman Simon Chen (陈立白) said the fourth quarter of 2025 will mark the start of a DRAM bull market, and he expects a severe supply shortage in 2026. Memory makers have shifted production toward data-center memory such as RDIMM and HBM, and output of consumer DDR5 chips has fallen as a result. A Corsair Vengeance RGB dual-channel DDR5 kit that sold for $91 in July now lists for $183 on Newegg. The price surge has also spread to NAND flash and hard drives. Analysts expect prices to keep rising for at least four years, matching the length of the supply contracts enterprises have signed with Samsung and SK Hynix.
- YouTube removed more than 700 documentaries on Israeli human rights abuses
Google-owned YouTube has confirmed that, complying with an order from the Trump administration, it deleted the accounts and all videos of sanctioned Palestinian human rights organizations. The affected groups, Al-Haq, the Al Mezan Center for Human Rights, and the Palestinian Centre for Human Rights, had published more than 700 documentaries on Israeli human rights abuses. Earlier this year the International Criminal Court in The Hague charged Israeli prime minister Benjamin Netanyahu and former defense minister Yoav Gallant with war crimes in Gaza; the Trump administration then sanctioned the ICC, and Microsoft promptly disabled the account of the ICC's chief prosecutor. In September the Trump administration added the Palestinian human rights groups to its sanctions list on the grounds that they had cooperated with the ICC in accusing Israeli officials of war crimes. YouTube quietly removed the groups' accounts, channels, and videos in early October. Al Mezan says its channel was shut down on October 7 without prior notice; Al-Haq says its channel was shut down on October 3 for violating guidelines. Human rights groups condemned YouTube's move as aiding and abetting the abuse.
- Security company employees charged with launching ransomware attacks
US prosecutors have charged three security company employees with turning on their own industry: Kevin Tyler Martin and another, unnamed employee of DigitalMint, and Ryan Clifford Goldberg, a former incident response manager at Sygnia. The three are accused of breaking into companies, stealing sensitive data, and deploying ransomware developed by ALPHV/BlackCat. ALPHV/BlackCat runs a ransomware-as-a-service model: it supplies the ransomware, affiliates (in this case the three security company employees) breach corporate networks and deploy it, and the affiliates keep a share of the ransom. DigitalMint's business is negotiating ransoms with ransomware gangs, so its two indicted employees acted both as negotiators and as recipients of a cut of the ransom. Prosecutors allege they attacked at least five US companies and collected more than $1.2 million in ransom from one of them.