OrangeBot.AI Digest — 2025-11-08
60 headlines across 8 sources, aggregated for this day.
Hacker News(14)
- IP Blocking the UK Is Not Enough to Comply with the Online Safety Act (prestonbyrne.com)
- I want you to understand Chicago (aphyr.com)
- Marko – A declarative, HTML‑based language (markojs.com)
- Always be ready to leave (even if you never do) (andreacanton.dev)
- Cloudflare scrubs Aisuru botnet from top domains list (krebsonsecurity.com)
- 52-year-old data tape could contain Unix history (www.theregister.com)
- Ticker: Don't die of heart disease (myticker.com)
- Btop: A better modern alternative to htop with a gamified interface (github.com)
- $1T in tech stocks sold off as market grows skeptical of AI (gizmodo.com)
- Study identifies weaknesses in how AI systems are evaluated (www.oii.ox.ac.uk)
- My friends and I accidentally faked the Ryzen 7 9700X3D leaks (old.reddit.com)
- Making Democracy Work: Fixing and Simplifying Egalitarian Paxos (arxiv.org)
- Apple's "notarisation" – blocking software freedom of developers and users (fsfe.org)
- Cerebras Code now supports GLM 4.6 at 1000 tokens/sec (www.cerebras.ai)
GitHub Trending(15)
- usestrix / strix
✨ Open-source AI hackers for your apps 👨🏻💻
- umami-software / umami
Umami is a modern, privacy-focused alternative to Google Analytics.
- prometheus / alertmanager
Prometheus Alertmanager
- lima-vm / lima
Linux virtual machines, with a focus on running containers
- nocobase / nocobase
NocoBase is the most extensible AI-powered no-code/low-code platform for building business applications and enterprise solutions.
- dbeaver / dbeaver
Free universal database tool and SQL client
- localstack / localstack
💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline
- Shubhamsaboo / awesome-llm-apps
Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and open-source models.
- 666ghj / BettaFish
微舆 (WeiYu): a multi-agent public-opinion analysis assistant anyone can use; it breaks information cocoons, reconstructs the full picture of public sentiment, predicts future trends, and supports decision-making. Built from scratch, with no dependence on any framework.
- airweave-ai / airweave
Context retrieval for AI agents across apps and databases
- TodePond / GulfOfMexico
perfect programming language
- penpot / penpot
Penpot: The open-source design tool for design and code collaboration
- awslabs / mcp
AWS MCP Servers — helping you get the most out of AWS, wherever you use MCP.
- public-apis / public-apis
A collective list of free APIs
- thinking-machines-lab / tinker-cookbook
Post-training with Tinker
Hugging Face(15)
- Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
"Thinking with Text" and "Thinking with Images" paradigm significantly improve the reasoning ability of large language models (LLMs) and Vision Language Models (VLMs). However, these paradigms have inherent limitations. (1) Images capture only single moments and fail to represent dynamic processes or continuous changes, and (2) The separation of text and vision as distinct modalities, hindering unified multimodal understanding and generation. To overcome these limitations, we introduce "Thinking with Video", a new paradigm that leverages video generation models, such as Sora-2, to bridge visual and textual reasoning in a unified temporal framework. To support this exploration, we developed the Video Thinking Benchmark (VideoThinkBench). VideoThinkBench encompasses two task categories: (1) vision-centric tasks (e.g., Eyeballing Puzzles), and (2) text-centric tasks (e.g., subsets of GSM8K, MMMU). Our evaluation establishes Sora-2 as a capable reasoner. On vision-centric tasks, Sora-2 is generally comparable to state-of-the-art (SOTA) VLMs, and even surpasses VLMs on several tasks, such as Eyeballing Games. On text-centric tasks, Sora-2 achieves 92% accuracy on MATH, and 75.53% accuracy on MMMU. Furthermore, we systematically analyse the source of these abilities. We also find that self-consistency and in-context learning can improve Sora-2's performance. In summary, our findings demonstrate that the video generation model is the potential unified multimodal understanding and generation model, positions "thinking with video" as a unified multimodal reasoning paradigm.
- V-Thinker: Interactive Thinking with Images
Empowering Large Multimodal Models (LMMs) to deeply integrate image interaction with long-horizon reasoning capabilities remains a long-standing challenge in this field. Recent advances in vision-centric reasoning explore a promising "Thinking with Images" paradigm for LMMs, marking a shift from image-assisted reasoning to image-interactive thinking. While this milestone enables models to focus on fine-grained image regions, progress remains constrained by limited visual tool spaces and task-specific workflow designs. To bridge this gap, we present V-Thinker, a general-purpose multimodal reasoning assistant that enables interactive, vision-centric thinking through end-to-end reinforcement learning. V-Thinker comprises two key components: (1) a Data Evolution Flywheel that automatically synthesizes, evolves, and verifies interactive reasoning datasets across three dimensions (diversity, quality, and difficulty); and (2) a Visual Progressive Training Curriculum that first aligns perception via point-level supervision, then integrates interactive reasoning through a two-stage reinforcement learning framework. Furthermore, we introduce VTBench, an expert-verified benchmark targeting vision-centric interactive reasoning tasks. Extensive experiments demonstrate that V-Thinker consistently outperforms strong LMM-based baselines in both general and interactive reasoning scenarios, providing valuable insights for advancing image-interactive reasoning applications.
- Scaling Agent Learning via Experience Synthesis
While reinforcement learning (RL) can empower large language model (LLM) agents by enabling self-improvement through interaction, its practical adoption remains challenging due to costly rollouts, limited task diversity, unreliable reward signals, and infrastructure complexity, all of which obstruct the collection of scalable experience data. To address these challenges, we introduce DreamGym, the first unified framework designed to synthesize diverse experiences with scalability in mind to enable effective online RL training for autonomous agents. Rather than relying on expensive real-environment rollouts, DreamGym distills environment dynamics into a reasoning-based experience model that derives consistent state transitions and feedback signals through step-by-step reasoning, enabling scalable agent rollout collection for RL. To improve the stability and quality of transitions, DreamGym leverages an experience replay buffer initialized with offline real-world data and continuously enriched with fresh interactions to actively support agent training. To improve knowledge acquisition, DreamGym adaptively generates new tasks that challenge the current agent policy, enabling more effective online curriculum learning. Experiments across diverse environments and agent backbones demonstrate that DreamGym substantially improves RL training, both in fully synthetic settings and in sim-to-real transfer scenarios. On non-RL-ready tasks like WebArena, DreamGym outperforms all baselines by over 30%. And in RL-ready but costly settings, it matches GRPO and PPO performance using only synthetic interactions. When transferring a policy trained purely on synthetic experiences to real-environment RL, DreamGym yields significant additional performance gains while requiring far fewer real-world interactions, providing a scalable warm-start strategy for general-purpose RL.
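The core idea, as described, is to replace real-environment rollouts with an LLM-backed experience model feeding a replay buffer. A rough sketch of that loop, with experience_model as a hypothetical stand-in (the paper's actual interfaces are not given here):

```python
import random
from dataclasses import dataclass, field

@dataclass
class Transition:
    state: str
    action: str
    next_state: str
    reward: float

@dataclass
class ReplayBuffer:
    transitions: list = field(default_factory=list)

    def add(self, t: Transition) -> None:
        self.transitions.append(t)

    def sample(self, k: int) -> list:
        return random.sample(self.transitions, min(k, len(self.transitions)))

def experience_model(state: str, action: str) -> tuple[str, float]:
    """Hypothetical stand-in for a reasoning-based experience model: an LLM that
    derives the next state and a feedback signal via step-by-step reasoning
    instead of executing the action in a real environment."""
    raise NotImplementedError

def collect_synthetic_rollout(policy, buffer: ReplayBuffer, state: str, horizon: int = 10) -> None:
    """Roll the agent out entirely inside the synthetic experience model."""
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = experience_model(state, action)
        buffer.add(Transition(state, action, next_state, reward))
        state = next_state
```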
- Cambrian-S: Towards Spatial Supersensing in Video
We argue that progress in true multimodal intelligence calls for a shift from reactive, task-driven systems and brute-force long context towards a broader paradigm of supersensing. We frame spatial supersensing as four stages beyond linguistic-only understanding: semantic perception (naming what is seen), streaming event cognition (maintaining memory across continuous experiences), implicit 3D spatial cognition (inferring the world behind pixels), and predictive world modeling (creating internal models that filter and organize information). Current benchmarks largely test only the early stages, offering narrow coverage of spatial cognition and rarely challenging models in ways that require true world modeling. To drive progress in spatial supersensing, we present VSI-SUPER, a two-part benchmark: VSR (long-horizon visual spatial recall) and VSC (continual visual spatial counting). These tasks require arbitrarily long video inputs yet are resistant to brute-force context expansion. We then test data scaling limits by curating VSI-590K and training Cambrian-S, achieving +30% absolute improvement on VSI-Bench without sacrificing general capabilities. Yet performance on VSI-SUPER remains limited, indicating that scale alone is insufficient for spatial supersensing. We propose predictive sensing as a path forward, presenting a proof-of-concept in which a self-supervised next-latent-frame predictor leverages surprise (prediction error) to drive memory and event segmentation. On VSI-SUPER, this approach substantially outperforms leading proprietary baselines, showing that spatial supersensing requires models that not only see but also anticipate, select, and organize experience.
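The predictive-sensing proof of concept uses prediction error ("surprise") from a next-latent-frame predictor to drive memory and event segmentation. A minimal sketch of that mechanism, assuming a generic predictor callable and a simple threshold rule (both are assumptions, not the paper's implementation):

```python
import numpy as np

def surprise_segmentation(latents: np.ndarray, predictor, threshold: float):
    """Walk a stream of frame latents with a next-latent-frame predictor; frames
    whose prediction error ("surprise") exceeds the threshold are treated as
    event boundaries and written to a long-term memory list."""
    memory, boundaries = [], []
    for t in range(1, len(latents)):
        predicted = predictor(latents[t - 1])          # predict the latent at time t
        surprise = float(np.linalg.norm(latents[t] - predicted))
        if surprise > threshold:
            boundaries.append(t)                       # start of a new event
            memory.append(latents[t])                  # keep the surprising frame
    return boundaries, memory
```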
- NVIDIA Nemotron Nano V2 VL
We introduce Nemotron Nano V2 VL, the latest model of the Nemotron vision-language series designed for strong real-world document understanding, long video comprehension, and reasoning tasks. Nemotron Nano V2 VL delivers significant improvements over our previous model, Llama-3.1-Nemotron-Nano-VL-8B, across all vision and text domains through major enhancements in model architecture, datasets, and training recipes. Nemotron Nano V2 VL builds on Nemotron Nano V2, a hybrid Mamba-Transformer LLM, and innovative token reduction techniques to achieve higher inference throughput in long document and video scenarios. We are releasing model checkpoints in BF16, FP8, and FP4 formats and sharing large parts of our datasets, recipes and training code.
- GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents
We introduce GUI-360°, a large-scale, comprehensive dataset and benchmark suite designed to advance computer-using agents (CUAs). CUAs present unique challenges and are constrained by three persistent gaps: a scarcity of real-world CUA tasks, the lack of automated collection-and-annotation pipelines for multi-modal trajectories, and the absence of a unified benchmark that jointly evaluates GUI grounding, screen parsing, and action prediction. GUI-360° addresses these gaps with an LLM-augmented, largely automated pipeline for query sourcing, environment-template construction, task instantiation, batched execution, and LLM-driven quality filtering. The released corpus contains over 1.2M executed action steps across thousands of trajectories in popular Windows office applications, and includes full-resolution screenshots, accessibility metadata when available, instantiated goals, intermediate reasoning traces, and both successful and failed action trajectories. The dataset supports three canonical tasks (GUI grounding, screen parsing, and action prediction) and a hybrid GUI+API action space that reflects modern agent designs. Benchmarking state-of-the-art vision-language models on GUI-360° reveals substantial out-of-the-box shortcomings in grounding and action prediction; supervised fine-tuning and reinforcement learning yield significant gains but do not close the gap to human-level reliability. We release GUI-360° and accompanying code to facilitate reproducible research and accelerate progress on robust desktop CUAs. The full dataset has been made public on https://huggingface.co/datasets/vyokky/GUI-360.
- Contamination Detection for VLMs using Multi-Modal Semantic Perturbation
Recent advances in Vision-Language Models (VLMs) have achieved state-of-the-art performance on numerous benchmark tasks. However, the use of internet-scale, often proprietary, pretraining corpora raises a critical concern for both practitioners and users: inflated performance due to test-set leakage. While prior works have proposed mitigation strategies such as decontamination of pretraining data and benchmark redesign for LLMs, the complementary direction of developing detection methods for contaminated VLMs remains underexplored. To address this gap, we deliberately contaminate open-source VLMs on popular benchmarks and show that existing detection approaches either fail outright or exhibit inconsistent behavior. We then propose a novel simple yet effective detection method based on multi-modal semantic perturbation, demonstrating that contaminated models fail to generalize under controlled perturbations. Finally, we validate our approach across multiple realistic contamination strategies, confirming its robustness and effectiveness. The code and perturbed dataset will be released publicly.
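The detection idea, as described, is that a contaminated model generalizes poorly under controlled semantic perturbations of the benchmark. A toy sketch of that signal, assuming a callable model and paired original/perturbed item lists (names are illustrative only):

```python
def accuracy(model, examples) -> float:
    """Fraction of items the model answers correctly (a trivial evaluation harness)."""
    correct = sum(1 for ex in examples if model(ex["question"]) == ex["answer"])
    return correct / max(len(examples), 1)

def contamination_gap(model, original, perturbed) -> float:
    """A model that memorized the test set keeps its score on the original items
    but drops sharply on semantically equivalent perturbations, so a large
    positive gap is a contamination signal."""
    return accuracy(model, original) - accuracy(model, perturbed)
```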
- The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms
The strong lottery ticket hypothesis (SLTH) conjectures that high-performing subnetworks, called strong lottery tickets (SLTs), are hidden in randomly initialized neural networks. Although recent theoretical studies have established the SLTH across various neural architectures, the SLTH for transformer architectures still lacks theoretical understanding. In particular, the current theory of the SLTH does not yet account for the multi-head attention (MHA) mechanism, a core component of transformers. To address this gap, we introduce a theoretical analysis of the existence of SLTs within MHAs. We prove that, if a randomly initialized MHA with H heads and input dimension d has a hidden dimension of O(d log(H d^{3/2})) for the key and value, it contains an SLT that approximates an arbitrary MHA with the same input dimension with high probability. Furthermore, by leveraging this theory for MHAs, we extend the SLTH to transformers without normalization layers. We empirically validate our theoretical findings, demonstrating that the approximation error between the SLT within a source model (MHA and transformer) and an approximate target counterpart decreases exponentially as the hidden dimension of the source model increases.
- Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts
Robust benchmarks are crucial for evaluating Multimodal Large Language Models (MLLMs). Yet we find that models can ace many multimodal benchmarks without strong visual understanding, instead exploiting biases, linguistic priors, and superficial patterns. This is especially problematic for vision-centric benchmarks that are meant to require visual inputs. We adopt a diagnostic principle for benchmark design: if a benchmark can be gamed, it will be. Designers should therefore try to "game" their own benchmarks first, using diagnostic and debiasing procedures to systematically identify and mitigate non-visual biases. Effective diagnosis requires directly "training on the test set": probing the released test set for its intrinsic, exploitable patterns. We operationalize this standard with two components. First, we diagnose benchmark susceptibility using a "Test-set Stress-Test" (TsT) methodology. Our primary diagnostic tool involves fine-tuning a powerful Large Language Model via k-fold cross-validation on exclusively the non-visual, textual inputs of the test set to reveal shortcut performance and assign each sample a bias score s(x). We complement this with a lightweight Random Forest-based diagnostic operating on hand-crafted features for fast, interpretable auditing. Second, we debias benchmarks by filtering high-bias samples using an "Iterative Bias Pruning" (IBP) procedure. Applying this framework to four benchmarks (VSI-Bench, CV-Bench, MMMU, and VideoMME), we uncover pervasive non-visual biases. As a case study, we apply our full framework to create VSI-Bench-Debiased, demonstrating reduced non-visual solvability and a wider vision-blind performance gap than the original.
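A rough sketch of the lightweight, Random Forest-based variant of the Test-set Stress-Test and of Iterative Bias Pruning, as described in the abstract; the feature extraction, threshold, and hyperparameters below are assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

def text_only_bias_scores(features: np.ndarray, labels: np.ndarray, k: int = 5) -> np.ndarray:
    """Fit a classifier on text-only features of the test set via k-fold
    cross-validation and use each held-out sample's predicted probability of
    its true label as a bias score s(x)."""
    scores = np.zeros(len(labels))
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(features):
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(features[train_idx], labels[train_idx])
        proba = clf.predict_proba(features[test_idx])
        col = {c: i for i, c in enumerate(clf.classes_)}
        scores[test_idx] = [proba[i, col[y]] for i, y in enumerate(labels[test_idx])]
    return scores

def iterative_bias_pruning(features: np.ndarray, labels: np.ndarray,
                           threshold: float = 0.9, max_rounds: int = 5) -> np.ndarray:
    """Repeatedly rescore the remaining samples and drop those that look
    solvable from the text alone; returns the indices to keep."""
    keep = np.arange(len(labels))
    for _ in range(max_rounds):
        scores = text_only_bias_scores(features[keep], labels[keep])
        ok = scores < threshold
        if ok.all():
            break
        keep = keep[ok]
    return keep
```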
- SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding
Despite impressive high-level video comprehension, multimodal language models struggle with spatial reasoning across time and space. While current spatial training approaches rely on real-world video data, obtaining diverse footage with precise spatial annotations remains a bottleneck. To alleviate this bottleneck, we present SIMS-V -- a systematic data-generation framework that leverages the privileged information of 3D simulators to create spatially-rich video training data for multimodal language models. Using this framework, we investigate which properties of simulated data drive effective real-world transfer through systematic ablations of question types, mixes, and scales. We identify a minimal set of three question categories (metric measurement, perspective-dependent reasoning, and temporal tracking) that prove most effective for developing transferable spatial intelligence, outperforming comprehensive coverage despite using fewer question types. These insights enable highly efficient training: our 7B-parameter video LLM fine-tuned on just 25K simulated examples outperforms the larger 72B baseline and achieves competitive performance with proprietary models on rigorous real-world spatial reasoning benchmarks. Our approach demonstrates robust generalization, maintaining performance on general video understanding while showing substantial improvements on embodied and real-world spatial tasks.
- How to Evaluate Speech Translation with Source-Aware Neural MT Metrics
Automatic evaluation of speech-to-text translation (ST) systems is typically performed by comparing translation hypotheses with one or more reference translations. While effective to some extent, this approach inherits the limitation of reference-based evaluation that ignores valuable information from the source input. In machine translation (MT), recent progress has shown that neural metrics incorporating the source text achieve stronger correlation with human judgments. Extending this idea to ST, however, is not trivial because the source is audio rather than text, and reliable transcripts or alignments between source and references are often unavailable. In this work, we conduct the first systematic study of source-aware metrics for ST, with a particular focus on real-world operating conditions where source transcripts are not available. We explore two complementary strategies for generating textual proxies of the input audio: automatic speech recognition (ASR) transcripts and back-translations of the reference translation. We also introduce a novel two-step cross-lingual re-segmentation algorithm to address the alignment mismatch between synthetic sources and reference translations. Our experiments, carried out on two ST benchmarks covering 79 language pairs and six ST systems with diverse architectures and performance levels, show that ASR transcripts constitute a more reliable synthetic source than back-translations when the word error rate is below 20%, while back-translations always represent a computationally cheaper but still effective alternative. Furthermore, our cross-lingual re-segmentation algorithm enables robust use of source-aware MT metrics in ST evaluation, paving the way toward more accurate and principled evaluation methodologies for speech translation.
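A minimal sketch of the selection rule suggested by the findings: use the ASR transcript as the synthetic source when its word error rate is below 20%, otherwise fall back to a back-translation. The metric argument stands in for any source-aware neural MT metric and is an assumed callable, not a specific API.

```python
def choose_synthetic_source(asr_transcript: str, back_translation: str, asr_wer: float) -> str:
    """Prefer the ASR transcript as the textual proxy of the audio when its word
    error rate is below 20%; otherwise fall back to the cheaper back-translation."""
    return asr_transcript if asr_wer < 0.20 else back_translation

def score_hypothesis(metric, synthetic_source: str, hypothesis: str, reference: str) -> float:
    """`metric` stands in for any source-aware neural MT metric taking
    (source, hypothesis, reference) triplets; it is an assumed callable."""
    return metric(src=synthetic_source, mt=hypothesis, ref=reference)
```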
- Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots
Humanoid soccer poses a representative challenge for embodied intelligence, requiring robots to operate within a tightly coupled perception-action loop. However, existing systems typically rely on decoupled modules, resulting in delayed responses and incoherent behaviors in dynamic environments, while real-world perceptual limitations further exacerbate these issues. In this work, we present a unified reinforcement learning-based controller that enables humanoid robots to acquire reactive soccer skills through the direct integration of visual perception and motion control. Our approach extends Adversarial Motion Priors to perceptual settings in real-world dynamic environments, bridging motion imitation and visually grounded dynamic control. We introduce an encoder-decoder architecture combined with a virtual perception system that models real-world visual characteristics, allowing the policy to recover privileged states from imperfect observations and establish active coordination between perception and action. The resulting controller demonstrates strong reactivity, consistently executing coherent and robust soccer behaviors across various scenarios, including real RoboCup matches.
- RDMA Point-to-Point Communication for LLM Systems
Emerging Large Language Model (LLM) system patterns, such as disaggregated inference, Mixture-of-Experts (MoE) routing, and asynchronous reinforcement fine-tuning, require flexible point-to-point communication beyond simple collectives. Existing implementations are locked to specific Network Interface Controllers (NICs), hindering integration into inference engines and portability across hardware providers. We present TransferEngine, which bridges the functionality of common NICs to expose a uniform interface. TransferEngine exposes one-sided WriteImm operations with an ImmCounter primitive for completion notification, without ordering assumptions about the network transport, transparently managing multiple NICs per GPU. We demonstrate peak throughput of 400 Gbps on both NVIDIA ConnectX-7 and AWS Elastic Fabric Adapter (EFA). We showcase TransferEngine through three production systems: (1) KvCache transfer for disaggregated inference with dynamic scaling, (2) RL weight updates achieving 1.3 seconds for trillion-parameter models, and (3) MoE dispatch/combine implementation exceeding DeepEP decode latency on ConnectX-7, with the first viable latencies on EFA. We demonstrate that our portable point-to-point communication complements collectives while avoiding lock-in.
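The WriteImm/ImmCounter pattern (one-sided writes whose only receiver-visible signal is a counter increment, with no ordering assumptions) can be illustrated with a toy, in-process model. This is not the TransferEngine API, whose real interface is not shown in this digest; it only mimics the notification pattern the abstract describes.

```python
import threading

class ImmCounter:
    """Toy model of a completion counter bumped by incoming writes-with-immediate."""
    def __init__(self) -> None:
        self._count = 0
        self._cond = threading.Condition()

    def bump(self) -> None:
        with self._cond:
            self._count += 1
            self._cond.notify_all()

    def wait_for(self, expected: int) -> None:
        """Block until at least `expected` writes have landed, in any order."""
        with self._cond:
            self._cond.wait_for(lambda: self._count >= expected)

def write_imm(dest: bytearray, offset: int, payload: bytes, counter: ImmCounter) -> None:
    """One-sided write: data lands directly in the destination buffer, and the
    only receiver-visible signal is the counter increment."""
    dest[offset:offset + len(payload)] = payload
    counter.bump()
```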
- SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning
We introduce SAIL-RL, a reinforcement learning (RL) post-training framework that enhances the reasoning capabilities of multimodal large language models (MLLMs) by teaching them when and how to think. Existing approaches are limited by outcome-only supervision, which rewards correct answers without ensuring sound reasoning, and by uniform thinking strategies, which often lead to overthinking on simple tasks and underthinking on complex ones. SAIL-RL addresses these challenges with a dual reward system: the Thinking Reward, which evaluates reasoning quality through factual grounding, logical coherence, and answer consistency, and the Judging Reward, which adaptively determines whether deep reasoning or direct answering is appropriate. Experiments on the state-of-the-art SAIL-VL2 show that SAIL-RL improves reasoning and multimodal understanding benchmarks at both 4B and 8B scales, achieving competitive performance against commercial closed-source models such as GPT-4o, and substantially reduces hallucinations, establishing it as a principled framework for building more reliable and adaptive MLLMs. The code will be available at https://github.com/BytedanceDouyinContent/SAIL-RL.
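A toy sketch of a dual-reward scheme in the spirit described above; the exact reward definitions and weighting are not given in the digest, so everything below is an illustrative assumption.

```python
def dual_reward(answer_correct: bool, reasoning_quality: float,
                used_deep_reasoning: bool, task_needs_reasoning: bool,
                alpha: float = 0.5) -> float:
    """Illustrative combination of a Thinking Reward (reasoning quality, only
    credited when the answer is right) and a Judging Reward (did the model
    choose deep reasoning exactly when the task called for it?)."""
    thinking_reward = reasoning_quality if answer_correct else 0.0
    judging_reward = 1.0 if used_deep_reasoning == task_needs_reasoning else 0.0
    return alpha * thinking_reward + (1.0 - alpha) * judging_reward
```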
- EVTAR: End-to-End Try on with Additional Unpaired Visual Reference
We propose EVTAR, an End-to-End Virtual Try-on model with Additional Reference, that directly fits the target garment onto the person image while incorporating reference images to enhance try-on accuracy. Most existing virtual try-on approaches rely on complex inputs such as agnostic person images, human pose, densepose, or body keypoints, making them labor-intensive and impractical for real-world applications. In contrast, EVTAR adopts a two-stage training strategy, enabling simple inference with only the source image and the target garment inputs. Our model generates try-on results without masks, densepose, or segmentation maps. Moreover, EVTAR leverages additional reference images of different individuals wearing the same clothes to preserve garment texture and fine-grained details better. This mechanism is analogous to how humans consider reference models when choosing outfits, thereby simulating a more realistic and high-quality dressing effect. We enrich the training data with supplementary references and unpaired person images to support these capabilities. We evaluate EVTAR on two widely used benchmarks and diverse tasks, and the results consistently validate the effectiveness of our approach.
Solidot(15)
- Collins Dictionary's word of the year is Vibe Coding
Collins Dictionary has named Vibe Coding its word of the year. The term was coined in February by OpenAI co-founder Andrej Karpathy and describes developers creating applications or websites not by writing code themselves but by describing what they want to an AI chatbot. Vibe coding took off quickly, but many have since found it does not guarantee that the code actually works or is free of bugs. Collins Dictionary managing director Alex Beecroft said the word perfectly captures how language evolves with technology. Other words on the shortlist include: Biohacking, altering the body's natural physiological processes to improve health and extend lifespan; Coolcation, a vacation in a cool-climate destination; Glaze, excessive or inappropriate praise or flattery of a person; Henry, short for 'high earner, not rich yet', someone with a high income who has not yet accumulated substantial wealth; Micro-retirement, a break between jobs to pursue personal interests; and Taskmasking, pretending to work productively.
- VLC president Jean-Baptiste Kempf wins European free software award
VLC president and core developer Jean-Baptiste Kempf has received a European free software award in recognition of his long-standing contributions to the VLC project. VLC began in 1996 as a student project and has grown into one of the world's most popular media players, with billions of users. Kempf joined VLC as a student and took over the project when the earliest generation of developers graduated and it risked dying; together with the other core developers, he built the player we rely on today.
- US companies post record profits while cutting nearly a million jobs
US companies have cut nearly a million jobs so far this year, yet corporate profits and stock markets have hit record highs. Chen Zhao, chief global strategist at investment research firm Alpine Macro, calls this disconnect between soaring profits and mass layoffs a "jobless boom". Layoffs usually accelerate when falling profitability forces companies to cut costs. Zhao says he has never seen anything like it and that it departs entirely from the historical script: Amazon is highly profitable yet laid off 30,000 people, which is puzzling. He suspects the reason is that AI has raised productivity and lowered costs. Not everyone blames AI for the layoff wave, however; Art Papas, CEO of software company Bullhorn, argues the mass layoffs are a correction after companies over-hired during the pandemic.
- James D. Watson dies at 97
James D. Watson, the molecular biologist who discovered the double-helix structure of DNA with Francis Crick and shared the 1962 Nobel Prize in Physiology or Medicine for it, has died at 97. His son Duncan said his father was hospitalized this week with an infection, was then moved to hospice care, and died at the hospice on Long Island, New York, on Thursday. The double helix is regarded as one of the most important discoveries in the history of science, but it was also mired in controversy over the unauthorized use of Rosalind Elsie Franklin's DNA X-ray diffraction image.
- Meta funds AI with revenue from scam ads
A lengthy investigative report published by Reuters reveals that Meta, knowing certain ads came from scammers, still served those scam ads to the users most likely to click on them, and used the ad revenue to fund its AI push. Internal documents show Meta was reluctant to delete scammers' accounts outright, allowing "high value accounts" to rack up more than 500 violations; the more violations a scammer accumulated, the higher the ad rates Meta charged, which the documents describe as "penalizing" scammers through pricing. Meta estimated internally that its users are shown 15 billion "high risk" scam ads per day, i.e. ads pushing fraudulent investment schemes or counterfeit goods. In 2024, roughly 16 billion USD, about a tenth of total revenue, came from scam ads, 7 billion USD of it from "high risk" ads. Meta cracked down hardest on scam ads impersonating celebrities or well-known brands, out of concern they would drive away advertising or engagement. The internal documents acknowledge that it is easier for scammers to place fraudulent ads on Meta's platforms than on Google, conceding that Google does a better job of stamping out fraud. They also show that Meta barred anti-fraud actions that would cost the company more than 0.15% of revenue, and that it routinely ignored scams reported by users.
- Amazon tests AI tool that automatically translates books into other languages
Amazon has launched an AI tool that can automatically translate entire books into other languages. Called Kindle Translate, the tool is available in beta to selected authors on the Kindle Direct Publishing platform and supports translation between English and Spanish and between German and English; Amazon says more languages will follow. Books translated with the tool carry a clear Kindle Translate label to alert readers. AI translation often falls short of high quality and can suffer from so-called hallucinations, i.e. mistranslations. Amazon says translations will go through an automated accuracy evaluation before publication, and authors can preview the translated text, though they may not be fluent in the target language.
- Russia's Vostochny Cosmodrome loses power over unpaid electricity bills
To reduce its dependence on the Baikonur Cosmodrome in Kazakhstan, Russia built the Vostochny Cosmodrome in the far-eastern Amur region. Construction began in 2011; the first launch pad entered service with its first launch in 2016, and the second pad completed its first rocket launch in 2024. Russia plans to build seven pads in total, and construction continues. Vostochny has long been dogged by scandals over corruption and unpaid wages. In the latest, main contractor Kazan Open Stock Company ran up as much as 627,000 USD in unpaid electricity bills, prompting utility Far Eastern Energy Company to cut power to the construction area; the utility plans to sue to have the contractor declared bankrupt. The cosmodrome's operator says the two operational launch pads are unaffected by the dispute.
- FBI wants to know who owns anonymous archiving site Archive.today
The FBI has subpoenaed Canadian domain registrar Tucows for the identity of the owner of the anonymous archiving site Archive.today, aka archive.is and archive.ph. Archive.today is a popular archiving service widely used to get around paywalls. The subpoena says it relates to a federal criminal investigation but provides no details. Archive.today dates back to the early 2010s, rose to prominence during GamerGate, and has preserved hundreds of millions of web pages. The FBI is asking Tucows for its customer's name, address, billing information, phone records, payment methods, internet connection session times, and device identifiers. The site's owner is rumored to be Russian.
- SAS exits mainland China
US software company SAS Institute told employees last Thursday that it is exiting mainland China, ending 25 years of operations there. In a brief video meeting, executives thanked staff and cited "organizational optimization" as the reason for the withdrawal. An SAS spokesperson said on Friday that the company will continue to do business in mainland China through third-party partners. SAS is cutting about 400 jobs in mainland China and asking every affected employee to sign a separation agreement by November 14. Those affected will receive one month's salary per year of service, plus two months' salary, the annual bonus, and pay through the end of the year. SAS's simplified-Chinese website has already been taken offline. Dell, Micron, and IBM have all previously carried out large-scale layoffs in mainland China. SAS entered mainland China in 1999 and opened an R&D center in Beijing in 2005.
- Moonshot AI releases trillion-parameter reasoning model Kimi K2 Thinking
Beijing-based Moonshot AI has released Kimi K2 Thinking, a trillion-parameter reasoning model. The company claims it surpasses OpenAI's ChatGPT in agentic capability, posts top results on tests such as Humanity's Last Exam (HLE) and BrowseComp, and shows marked improvements in reasoning, intelligent search, coding, writing, and general capability. The model can execute 200-300 consecutive tool calls without human intervention, working through hundreds of steps of sequential reasoning to solve complex problems. Compared with the training costs of companies such as OpenAI, which run into the billions of dollars, Kimi K2 Thinking's training reportedly cost only 4.6 million USD.
- Amazon lake temperatures soared in 2023
According to a study published in Science, the unprecedented heat waves and drought of 2023 turned Amazon lakes into shallow, near-simmering basins; one lake's water temperature climbed above 40 °C while water levels fell to record lows. The effects of the extreme heat were far-reaching, hitting remote, isolated riverside communities and causing mass deaths of fish and endangered Amazon river dolphins. The findings confirm a worrying warming trend in the Amazon's poorly monitored lakes and rivers and point to the growing impact of climate change on tropical freshwater ecosystems worldwide. The researchers analyzed water temperature measurements from 10 lakes in the central Amazon during the 2023 drought and found abnormally high daytime temperatures, above 37 °C, in 5 of the 10. In the shallows of Lake Tefé, temperatures reached 41 °C through a water column up to 2 m deep, hotter than a typical hot-spring bath. Amazon lakes have been warming rapidly, by roughly 0.3 to 0.8 °C per decade over the past 30 years or so, faster than the global average. At the height of the 2024 drought, many lakes in the region also shrank dramatically: Lake Tefé lost 75% of its surface area and Badajós Lake shrank by 90%.
- Mastodon 4.5 released
The decentralized microblogging platform Mastodon has released version 4.5. Major new features include support for quote posts; native emoji support; and expanded feed-management and blocking tools for server administrators, who can now set the local server feed as the landing page for visitors, block specific usernames, see context in the moderation interface, and more.
- The Louvre's video surveillance server password was "Louvre"
On October 19, 2025, a jewel heist took place at the Louvre in central Paris. At around 9:30 a.m. local time, several pieces of the French crown jewels were stolen from the Galerie d'Apollon; the whole operation lasted only about 4-7 minutes. The stolen jewels are valued at roughly 88 million euros. The thieves, disguised as construction workers, set off an alarm and were confronted by security guards during the theft, yet still got away. The Louvre's lax security has since drawn wide attention. For example, the museum's video surveillance server used "Louvre" as its password for years. In 2014, the French National Cybersecurity Agency carried out a penetration test at the Louvre's request; security experts easily got into the secure network, tampered with the video surveillance, and changed access-card permissions. Their report said the Louvre's network security was far too weak: typing "Louvre" gave access to the server managing the video surveillance, and typing "THALES" gave access to a program developed by Thales. Documents also show that in 2025 the Louvre was still running security software bought in 2003, no longer supported by its developer and running on Windows Server 2003.
- Shenzhou-20 return delayed after suspected space debris strike
The China Manned Space Agency announced that the Shenzhou-20 crewed spacecraft is suspected to have been struck by a small piece of space debris and that impact analysis and risk assessment are under way. To ensure the crew's safety and the success of the mission, the return originally scheduled for November 5 has been postponed. The agency gave no further details, such as the suspected impact point or the extent of damage, and no timetable for a new return date. Shenzhou-20 launched on April 24, 2025; its three astronauts, Chen Dong, Chen Zhongrui, and Wang Jie, completed a six-month stay aboard the Tiangong space station, handed control of the station to the newly arrived Shenzhou-21 crew on November 4, and had been scheduled to return on November 5.
- Chrome will drop XSLT support in November 2026
The official Chrome blog announced that support for Extensible Stylesheet Language Transformations (XSLT) will be removed with the release of version 155 on November 17, 2026. Google says the change improves security and notes that the Firefox and WebKit projects have similar plans. XML documents are meant for machines rather than people; XSLT exists to transform XML into formats that are easier for humans to read, such as HTML. Mainstream browsers including Chrome, Firefox, and Safari support client-side XSLT rendering, but only the 1.0 version from 1999, not the latest 3.0 version from 2017. Google floated removing XSLT as early as 2013 but never followed through; this year's WHATWG meeting formally put the removal proposal on the agenda. Google developers argue that the XSLT codebase browsers use has aged and is prone to memory-safety vulnerabilities, and that usage is tiny: only one in 7,891 page loads involves client-side XSLT.