OrangeBot.AI Digest — 2025-12-30
58 headlines across 4 sources, aggregated for the day.
Hacker News(15)
- Mitsubishi Diatone D-160 (1985) (audio-database.com)
- Everything as code: How we manage our company in one monorepo (www.kasava.dev)
- A faster heart for F-Droid. Our new server is here (f-droid.org)
- FediMeteo: A €4 FreeBSD VPS Became a Global Weather Service (it-notes.dragas.net)
- A Vulnerability in Libsodium (00f.net)
- Show HN: 22 GB of Hacker News in SQLite (hackerbook.dosaygo.com)
- Loss32: Let's Build a Win32/Linux (loss32.org)
- Reverse Engineering a Mysterious UDP Stream in My Hotel (2016) (www.gkbrk.com)
- Public Sans – A strong, neutral typeface (public-sans.digital.gov)
- Win32 is the stable Linux ABI (loss32.org)
- No strcpy either (daniel.haxx.se)
- Times New American: A Tale of Two Fonts (hsu.cy)
- Non-Zero-Sum Games (nonzerosum.games)
- Nicolas Guillou, French ICC judge sanctioned by the US and “debanked” (www.lemonde.fr)
- Go away Python (lorentz.app)
GitHub Trending(13)
- BloopAI / vibe-kanban
Get 10X more out of Claude Code, Codex or any coding agent
- x1xhlol / system-prompts-and-models-of-ai-tools
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae, Traycer AI, VSCode Agent, Warp.dev, Windsurf, Xcode, Z.ai Code, Dia & v0. (And other Open Sourced) System Prompts, Internal Tools & AI Models
- QuantConnect / Lean
Lean Algorithmic Trading Engine by QuantConnect (Python, C#)
- jrouwe / JoltPhysics
A multi core friendly rigid body physics and collision detection library. Written in C++. Suitable for games and VR applications. Used by Horizon Forbidden West.
- timescale / pg-aiguide
MCP server and Claude plugin for Postgres skills and documentation. Helps AI coding tools generate better PostgreSQL code.
- resemble-ai / chatterbox
SoTA open-source TTS
- RustPython / RustPython
A Python Interpreter written in Rust
- sinelaw / fresh
Text editor for your terminal: easy, powerful and fast
- alexta69 / metube
Self-hosted YouTube downloader (web UI for youtube-dl / yt-dlp)
- anthropics / skills
Public repository for Agent Skills
- cjpais / Handy
A free, open source, and extensible speech-to-text application that works completely offline.
- sst / opencode
The open source coding agent.
- louislam / uptime-kuma
A fancy self-hosted monitoring tool
Hugging Face(15)
- Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Mixture-of-Experts (MoE) models lack explicit constraints to ensure the router's decisions align well with the experts' capabilities, which ultimately limits model performance. To address this, we propose expert-router coupling (ERC) loss, a lightweight auxiliary loss that tightly couples the router's decisions with expert capabilities. Our approach treats each expert's router embedding as a proxy token for the tokens assigned to that expert, and feeds perturbed router embeddings through the experts to obtain internal activations. The ERC loss enforces two constraints on these activations: (1) Each expert must exhibit higher activation for its own proxy token than for the proxy tokens of any other expert. (2) Each proxy token must elicit stronger activation from its corresponding expert than from any other expert. These constraints jointly ensure that each router embedding faithfully represents its corresponding expert's capability, while each expert specializes in processing the tokens actually routed to it. The ERC loss is computationally efficient, operating only on n^2 activations, where n is the number of experts. This represents a fixed cost independent of batch size, unlike prior coupling methods that scale with the number of tokens (often millions per batch). Through pre-training MoE-LLMs ranging from 3B to 15B parameters and extensive analysis on trillions of tokens, we demonstrate the effectiveness of the ERC loss. Moreover, the ERC loss offers flexible control and quantitative tracking of expert specialization levels during training, providing valuable insights into MoEs.
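A minimal sketch of the ERC idea under stated assumptions: the names (`erc_loss`, `router_emb`, `experts`) are illustrative, a PyTorch MoE layer is assumed, and the cross-entropy form below is our stand-in for the two ranking constraints, which the abstract states only qualitatively.

```python
import torch
import torch.nn.functional as F

def erc_loss(router_emb, experts, noise_std=0.01):
    """router_emb: (n, d), one router embedding per expert; experts: list of
    n feed-forward modules. Builds an n x n matrix A where A[i, j] is the
    mean activation of expert i on the (perturbed) proxy token of expert j,
    then applies the abstract's two constraints as row-wise and column-wise
    classification losses."""
    n, _ = router_emb.shape
    proxies = router_emb + noise_std * torch.randn_like(router_emb)
    # A[i, j]: activation strength of expert i on proxy token j
    A = torch.stack([experts[i](proxies).abs().mean(dim=-1) for i in range(n)])
    targets = torch.arange(n, device=A.device)
    loss_rows = F.cross_entropy(A, targets)      # (1) expert prefers its own proxy
    loss_cols = F.cross_entropy(A.t(), targets)  # (2) proxy prefers its own expert
    return loss_rows + loss_cols
```

Because `A` is only n x n, the extra cost is fixed regardless of batch size, which matches the abstract's efficiency claim.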
- LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Real-time video generation via diffusion is essential for building general-purpose multimodal interactive AI systems. However, the simultaneous denoising of all video frames with bidirectional attention via an iterative process in diffusion models prevents real-time interaction. While existing distillation methods can make the model autoregressive and reduce sampling steps to mitigate this, they focus primarily on text-to-video generation, leaving human-AI interaction unnatural and inefficient. This paper targets real-time interactive video diffusion conditioned on a multimodal context, including text, image, and audio, to bridge the gap. Observing that the leading on-policy distillation approach, Self Forcing, encounters challenges with multimodal conditioning (visual artifacts such as flickering, black frames, and quality degradation), we investigate an improved distillation recipe that emphasizes the quality of condition inputs as well as the initialization and schedule of the on-policy optimization. On benchmarks for multimodal-conditioned (audio, image, and text) avatar video generation, including HDTF, AVSpeech, and CelebV-HQ, our distilled model matches the visual quality of full-step, bidirectional baselines of similar or larger size at 20x lower inference cost and latency. Further, we integrate our model with audio language models and the long-form video inference technique Anchor-Heavy Identity Sinks to build LiveTalk, a real-time multimodal interactive avatar system. System-level evaluation on our curated multi-turn interaction benchmark shows LiveTalk outperforms state-of-the-art models (Sora2, Veo3) in multi-turn video coherence and content quality, while cutting response latency from 1-2 minutes to real time, enabling seamless human-AI multimodal interaction.
- Yume-1.5: A Text-Controlled Interactive World Generation Model
Recent approaches have demonstrated the promise of using diffusion models to generate interactive and explorable worlds. However, most of these methods face critical challenges such as excessively large parameter sizes, reliance on lengthy inference steps, and rapidly growing historical context, which severely limit real-time performance; they also lack text-controlled generation capabilities. To address these challenges, we propose Yume-1.5, a novel framework designed to generate realistic, interactive, and continuous worlds from a single image or text prompt. Yume-1.5 achieves this through a carefully designed framework that supports keyboard-based exploration of the generated worlds. The framework comprises three core components: (1) a long-video generation framework integrating unified context compression with linear attention; (2) a real-time streaming acceleration strategy powered by bidirectional attention distillation and an enhanced text embedding scheme; (3) a text-controlled method for generating world events. We have provided the codebase in the supplementary material.
- SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents
Agentic reinforcement learning (RL) holds great promise for developing autonomous agents for complex GUI tasks, but its scalability remains severely hampered by the verification of task completion. Existing task verification is a passive, post-hoc process: a verifier (i.e., a rule-based scoring script, reward or critic model, or LLM-as-a-Judge) analyzes the agent's entire interaction trajectory to determine whether the agent succeeded. Processing such verbose context, full of irrelevant, noisy history, strains the verification protocols and therefore leads to prohibitive cost and low reliability. To overcome this bottleneck, we propose SmartSnap, a paradigm shift from passive, post-hoc verification to proactive, in-situ self-verification by the agent itself. We introduce the Self-Verifying Agent, a new type of agent designed with dual missions: to not only complete a task but also prove its accomplishment with curated snapshot evidence. Guided by our proposed 3C Principles (Completeness, Conciseness, and Creativity), the agent leverages its access to the online environment to perform self-verification on a minimal, decisive set of snapshots. This evidence is provided as the sole material for a general LLM-as-a-Judge verifier to determine its validity and relevance. Experiments on mobile tasks across model families and scales demonstrate that the SmartSnap paradigm allows LLM-driven agents to be trained scalably, bringing performance gains of up to 26.08% and 16.66% to 8B and 30B models, respectively. The synergy between solution finding and evidence seeking cultivates efficient, self-verifying agents with competitive performance against DeepSeek V3.1 and Qwen3-235B-A22B.
- Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
Transparent objects remain notoriously hard for perception systems: refraction, reflection and transmission break the assumptions behind stereo, ToF and purely discriminative monocular depth, causing holes and temporally unstable estimates. Our key observation is that modern video diffusion models already synthesize convincing transparent phenomena, suggesting they have internalized the optical rules. We build TransPhy3D, a synthetic video corpus of transparent/reflective scenes: 11k sequences rendered with Blender/Cycles. Scenes are assembled from a curated bank of category-rich static assets and shape-rich procedural assets paired with glass/plastic/metal materials. We render RGB + depth + normals with physically based ray tracing and OptiX denoising. Starting from a large video diffusion model, we learn a video-to-video translator for depth (and normals) via lightweight LoRA adapters. During training we concatenate RGB and (noisy) depth latents in the DiT backbone and co-train on TransPhy3D and existing frame-wise synthetic datasets, yielding temporally consistent predictions for arbitrary-length input videos. The resulting model, DKT, achieves zero-shot SOTA on real and synthetic video benchmarks involving transparency: ClearPose, DREDS (CatKnown/CatNovel), and TransPhy3D-Test. It improves accuracy and temporal consistency over strong image/video baselines, and a normal variant sets the best video normal estimation results on ClearPose. A compact 1.3B version runs at ~0.17 s/frame. Integrated into a grasping stack, DKT's depth boosts success rates across translucent, reflective and diffuse surfaces, outperforming prior estimators. Together, these results support a broader claim: "Diffusion knows transparency." Generative video priors can be repurposed, efficiently and label-free, into robust, temporally coherent perception for challenging real-world manipulation.
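A hedged sketch of the conditioning scheme the abstract describes (clean RGB latents concatenated channel-wise with noisy depth latents before the DiT backbone). `dit`, `vae`, and `schedule.add_noise` are placeholders standing in for whatever model and noise schedule the authors use, not their API.

```python
import torch
import torch.nn.functional as F

def dkt_style_step(dit, vae, rgb_video, depth_video, t, schedule):
    """rgb_video, depth_video: (B, T, 3, H, W). The RGB latents stay clean
    as conditioning while the depth latents are noised and denoised, turning
    a video diffusion model into a video-to-video depth translator."""
    z_rgb = vae.encode(rgb_video)          # (B, T, c, h, w), condition
    z_dep = vae.encode(depth_video)        # (B, T, c, h, w), target
    eps = torch.randn_like(z_dep)
    z_noisy = schedule.add_noise(z_dep, eps, t)
    # Channel-wise concatenation: every DiT token sees the RGB content
    # alongside the current noisy depth estimate.
    x = torch.cat([z_rgb, z_noisy], dim=2)
    pred = dit(x, t)                       # noise prediction for the depth half
    return F.mse_loss(pred, eps)
```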
- Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion
Diffusion-based video super-resolution (VSR) methods achieve strong perceptual quality but remain impractical for latency-sensitive settings due to reliance on future frames and expensive multi-step denoising. We propose Stream-DiffVSR, a causally conditioned diffusion framework for efficient online VSR. Operating strictly on past frames, it combines a four-step distilled denoiser for fast inference, an Auto-regressive Temporal Guidance (ARTG) module that injects motion-aligned cues during latent denoising, and a lightweight temporal-aware decoder with a Temporal Processor Module (TPM) that enhances detail and temporal coherence. Stream-DiffVSR processes 720p frames in 0.328 seconds on an RTX4090 GPU and significantly outperforms prior diffusion-based methods. Compared with the online SOTA TMP, it boosts perceptual quality (LPIPS +0.095) while reducing latency by over 130x. Stream-DiffVSR achieves the lowest latency reported for diffusion-based VSR, reducing initial delay from over 4600 seconds to 0.328 seconds, thereby making it the first diffusion VSR method suitable for low-latency online deployment. Project page: https://jamichss.github.io/stream-diffvsr-project-page/
- SpotEdit: Selective Region Editing in Diffusion Transformers
Diffusion Transformer models have significantly advanced image editing by encoding conditional images and integrating them into transformer layers. However, most edits involve modifying only small regions, while current methods uniformly process and denoise all tokens at every timestep, causing redundant computation and potentially degrading unchanged areas. This raises a fundamental question: Is it truly necessary to regenerate every region during editing? To address this, we propose SpotEdit, a training-free diffusion editing framework that selectively updates only the modified regions. SpotEdit comprises two key components: SpotSelector identifies stable regions via perceptual similarity and skips their computation by reusing conditional image features; SpotFusion adaptively blends these features with edited tokens through a dynamic fusion mechanism, preserving contextual coherence and editing quality. By reducing unnecessary computation and maintaining high fidelity in unmodified areas, SpotEdit achieves efficient and precise image editing.
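A rough sketch of the selective-update idea under stated assumptions: a per-token similarity score plays the role of SpotSelector, and a similarity-weighted blend stands in for SpotFusion. The threshold value and the similarity measure are guesses, not the paper's.

```python
import torch

def spotedit_step(denoiser, cond_feats, edit_tokens, sim, t, tau=0.9):
    """cond_feats: (B, N, D) features of the source-image tokens.
    edit_tokens: (B, N, D) current noisy tokens. sim: (B, N) per-token
    perceptual similarity between source and edit, in [0, 1]."""
    stable = sim > tau                    # SpotSelector-style mask
    updated = denoiser(edit_tokens, t)    # a real implementation would
                                          # compute only the unstable tokens
    w = sim.clamp(0, 1).unsqueeze(-1)
    blended = w * cond_feats + (1 - w) * updated   # SpotFusion-style blend
    # Stable regions reuse the conditional features outright; the rest
    # take the blended update.
    return torch.where(stable.unsqueeze(-1), cond_feats, blended)
```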
- Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone
While autoregressive Large Vision-Language Models (VLMs) have achieved remarkable success, their sequential generation often limits their efficacy in complex visual planning and dynamic robotic control. In this work, we investigate the potential of constructing Vision-Language Models upon diffusion-based large language models (dLLMs) to overcome these limitations. We introduce Dream-VL, an open diffusion-based VLM (dVLM) that achieves state-of-the-art performance among previous dVLMs. Dream-VL is comparable to top-tier AR-based VLMs trained on open data on various benchmarks but exhibits superior potential when applied to visual planning tasks. Building upon Dream-VL, we introduce Dream-VLA, a dLLM-based Vision-Language-Action model (dVLA) developed through continuous pre-training on open robotic datasets. We demonstrate that the natively bidirectional nature of this diffusion backbone serves as a superior foundation for VLA tasks, inherently suited for action chunking and parallel generation, leading to significantly faster convergence in downstream fine-tuning. Dream-VLA achieves top-tier performance of 97.2% average success rate on LIBERO, 71.4% overall average on SimplerEnv-Bridge, and 60.5% overall average on SimplerEnv-Fractal, surpassing leading models such as π_0 and GR00T-N1. We also validate that dVLMs surpass AR baselines on downstream tasks across different training objectives. We release both Dream-VL and Dream-VLA to facilitate further research in the community.
- GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models
The text encoder is a critical component of text-to-image and text-to-video diffusion models, fundamentally determining the semantic fidelity of the generated content. However, its development has been hindered by two major challenges: the lack of an efficient evaluation framework that reliably predicts downstream generation performance, and the difficulty of effectively adapting pretrained language models for visual synthesis. To address these issues, we introduce GRAN-TED, a paradigm to Generate Robust, Aligned, and Nuanced Text Embeddings for Diffusion models. Our contribution is twofold. First, we propose TED-6K, a novel text-only benchmark that enables efficient and robust assessment of an encoder's representational quality without requiring costly end-to-end model training. We demonstrate that performance on TED-6K, standardized via a lightweight, unified adapter, strongly correlates with an encoder's effectiveness in downstream generation tasks. Notably, under our experimental setup, compared with training a diffusion model from scratch, evaluating with TED-6K is about 750x faster. Second, guided by this validated framework, we develop a superior text encoder using a novel two-stage training paradigm. This process involves an initial fine-tuning stage on a Multimodal Large Language Model for better visual representation, followed by a layer-wise weighting method to extract more nuanced and potent text features. Our experiments show that the resulting GRAN-TED encoder not only achieves state-of-the-art performance on TED-6K but also leads to demonstrable performance gains in text-to-image and text-to-video generation. Our TED-6K dataset and evaluation code are available at the following link: https://anonymous.4open.science/r/GRAN-TED-4FCC/.
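The layer-wise weighting step lends itself to a short sketch. This is a generic learned-softmax mixture over encoder layers, assuming that is roughly what the abstract means; the class name and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class LayerwiseTextFeatures(nn.Module):
    """Learned softmax weights over all hidden layers of a text encoder,
    so the diffusion model conditions on a mixture of layers rather than
    only the final one."""
    def __init__(self, n_layers: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_layers))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (L, B, T, D), one slice per encoder layer
        w = self.logits.softmax(dim=0).view(-1, 1, 1, 1)
        return (w * hidden_states).sum(dim=0)   # (B, T, D) mixed embedding
```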
- Act2Goal: From World Model To General Goal-conditioned Policy
Specifying robotic manipulation tasks in a manner that is both expressive and precise remains a central challenge. While visual goals provide a compact and unambiguous task specification, existing goal-conditioned policies often struggle with long-horizon manipulation due to their reliance on single-step action prediction without explicit modeling of task progress. We propose Act2Goal, a general goal-conditioned manipulation policy that integrates a goal-conditioned visual world model with multi-scale temporal control. Given a current observation and a target visual goal, the world model generates a plausible sequence of intermediate visual states that captures long-horizon structure. To translate this visual plan into robust execution, we introduce Multi-Scale Temporal Hashing (MSTH), which decomposes the imagined trajectory into dense proximal frames for fine-grained closed-loop control and sparse distal frames that anchor global task consistency. The policy couples these representations with motor control through end-to-end cross-attention, enabling coherent long-horizon behavior while remaining reactive to local disturbances. Act2Goal achieves strong zero-shot generalization to novel objects, spatial layouts, and environments. We further enable reward-free online adaptation through hindsight goal relabeling with LoRA-based finetuning, allowing rapid autonomous improvement without external supervision. Real-robot experiments demonstrate that Act2Goal improves success rates from 30% to 90% on challenging out-of-distribution tasks within minutes of autonomous interaction, validating that goal-conditioned world models with multi-scale temporal control provide structured guidance necessary for robust long-horizon manipulation. Project page: https://act2goal.github.io/
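The proximal/distal decomposition can be illustrated with a toy frame selector; the dense and sparse sampling rates below are assumptions, not the paper's values.

```python
def msth_indices(horizon, k_dense=4, k_sparse=4):
    """Pick frame indices from an imagined trajectory of length `horizon`:
    the first k_dense frames one by one (proximal, fine-grained control),
    then k_sparse roughly evenly spaced frames from the remainder
    (distal anchors for global task consistency)."""
    proximal = list(range(min(k_dense, horizon)))
    rest = list(range(k_dense, horizon))
    stride = max(1, len(rest) // k_sparse)
    distal = rest[stride - 1 :: stride][:k_sparse]
    return proximal, distal

# Example: a 32-frame imagined rollout
# -> ([0, 1, 2, 3], [10, 17, 24, 31])
print(msth_indices(32))
```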
- Web World Models
Language agents increasingly require persistent worlds in which they can act, remember, and learn. Existing approaches sit at two extremes: conventional web frameworks provide reliable but fixed contexts backed by databases, while fully generative world models aim for unlimited environments at the expense of controllability and practical engineering. In this work, we introduce the Web World Model (WWM), a middle ground where world state and "physics" are implemented in ordinary web code to ensure logical consistency, while large language models generate context, narratives, and high-level decisions on top of this structured latent state. We build a suite of WWMs on a realistic web stack, including an infinite travel atlas grounded in real geography, fictional galaxy explorers, web-scale encyclopedic and narrative worlds, and simulation- and game-like environments. Across these systems, we identify practical design principles for WWMs: separating code-defined rules from model-driven imagination, representing latent state as typed web interfaces, and utilizing deterministic generation to achieve unlimited but structured exploration. Our results suggest that web stacks themselves can serve as a scalable substrate for world models, enabling controllable yet open-ended environments. Project Page: https://github.com/Princeton-AI2-Lab/Web-World-Models.
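A minimal sketch of the code-rules/model-imagination split the abstract advocates, in Python for consistency with the other sketches here; `llm.complete` is a placeholder, not a real API, and the state schema is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Location:          # typed latent state: code-defined "physics"
    name: str
    lat: float
    lon: float

def travel(state: Location, dlat: float, dlon: float) -> Location:
    # Deterministic rule: movement is plain arithmetic, so the world
    # stays logically consistent no matter what the model writes.
    return Location(state.name, state.lat + dlat, state.lon + dlon)

def describe(state: Location, llm) -> str:
    # Model-driven imagination: narrative is generated from, but cannot
    # mutate, the structured state.
    return llm.complete(
        f"Describe arriving at {state.lat:.2f}, {state.lon:.2f}."
    )
```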
- DiRL: An Efficient Post-Training Framework for Diffusion Language Models
Diffusion Language Models (dLLMs) have emerged as promising alternatives to Auto-Regressive (AR) models. While recent efforts have validated their pre-training potential and accelerated inference speeds, the post-training landscape for dLLMs remains underdeveloped. Existing methods suffer from computational inefficiency and objective mismatches between training and inference, severely limiting performance on complex reasoning tasks such as mathematics. To address this, we introduce DiRL, an efficient post-training framework that tightly integrates FlexAttention-accelerated blockwise training with LMDeploy-optimized inference. This architecture enables a streamlined online model update loop, facilitating efficient two-stage post-training (Supervised Fine-Tuning followed by Reinforcement Learning). Building on this framework, we propose DiPO, the first unbiased Group Relative Policy Optimization (GRPO) implementation tailored for dLLMs. We validate our approach by training DiRL-8B-Instruct on high-quality math data. Our model achieves state-of-the-art math performance among dLLMs and surpasses comparable models in the Qwen2.5 series on several benchmarks.
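For background, the group-relative advantage computation that GRPO (and hence DiPO) builds on is simple to sketch. Note this is the standard formulation, not the paper's unbiased dLLM-specific variant, which is its contribution and is not reproduced here.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: (num_prompts, group_size), one group of sampled completions
    per prompt. Each sample's advantage is its reward normalized against
    its own group's mean and standard deviation."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True).clamp_min(1e-6)
    return (rewards - mean) / std
```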
- Training AI Co-Scientists Using Rubric Rewards
AI co-scientists are emerging as a tool to assist human researchers in achieving their research goals. A crucial feature of these AI co-scientists is the ability to generate a research plan given a set of aims and constraints. The plan may be used by researchers for brainstorming, or may even be implemented after further refinement. However, language models currently struggle to generate research plans that follow all constraints and implicit requirements. In this work, we study how to leverage the vast corpus of existing research papers to train language models that generate better research plans. We build a scalable, diverse training corpus by automatically extracting research goals and goal-specific grading rubrics from papers across several domains. We then train models for research plan generation via reinforcement learning with self-grading. A frozen copy of the initial policy acts as the grader during training, with the rubrics creating a generator-verifier gap that enables improvements without external human supervision. To validate this approach, we conduct a study with human experts for machine learning research goals, spanning 225 hours. The experts prefer plans generated by our finetuned Qwen3-30B-A3B model over the initial model for 70% of research goals, and approve 84% of the automatically extracted goal-specific grading rubrics. To assess generality, we also extend our approach to research goals from medical papers, and new arXiv preprints, evaluating with a jury of frontier models. Our finetuning yields 12-22% relative improvements and significant cross-domain generalization, proving effective even in problem settings like medical research where execution feedback is infeasible. Together, these findings demonstrate the potential of a scalable, automated training recipe as a step towards improving general AI co-scientists.
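A hedged sketch of the self-grading reward loop as the abstract describes it: a frozen copy of the initial policy scores each generated plan against its goal-specific rubric. `frozen_grader.grade` and the rubric-as-list format are assumptions, not the authors' interface.

```python
def rubric_reward(frozen_grader, goal: str, plan: str, rubric: list[str]) -> float:
    """Score a generated research plan against its goal-specific rubric,
    one criterion at a time; the fraction satisfied is the scalar RL reward."""
    passed = 0
    for criterion in rubric:
        verdict = frozen_grader.grade(
            f"Goal: {goal}\nPlan: {plan}\n"
            f"Does the plan satisfy this criterion: {criterion}? Answer yes or no."
        )
        passed += verdict.strip().lower().startswith("yes")
    return passed / len(rubric)
```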
- Video-BrowseComp: Benchmarking Agentic Video Research on Open Web
The evolution of autonomous agents is redefining information seeking, transitioning from passive retrieval to proactive, open-ended web research. However, while textual and static multimodal agents have seen rapid progress, a significant modality gap remains in processing the web's most dynamic modality: video. Existing video benchmarks predominantly focus on passive perception, feeding curated clips to models without requiring external retrieval. They fail to evaluate agentic video research, which necessitates actively interrogating video timelines, cross-referencing dispersed evidence, and verifying claims against the open web. To bridge this gap, we present Video-BrowseComp, a challenging benchmark comprising 210 questions tailored for open-web agentic video reasoning. Unlike prior benchmarks, Video-BrowseComp enforces a mandatory dependency on temporal visual evidence, ensuring that answers cannot be derived solely through text search but require navigating video timelines to verify external claims. Our evaluation of state-of-the-art models reveals a critical bottleneck: even advanced search-augmented models like GPT-5.1 (w/ Search) achieve only 15.24% accuracy. Our analysis reveals that these models largely rely on textual proxies, excelling in metadata-rich domains (e.g., TV shows with plot summaries) but collapsing in metadata-sparse, dynamic environments (e.g., sports, gameplay) where visual grounding is essential. As the first open-web video research benchmark, Video-BrowseComp advances the field beyond passive perception toward proactive video reasoning.
- VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs
In most existing embodied navigation tasks, instructions are well-defined and unambiguous, such as instruction following and object searching. Under this idealized setting, agents are required solely to produce effective navigation outputs conditioned on vision and language inputs. However, real-world navigation instructions are often vague and ambiguous, requiring the agent to resolve uncertainty and infer user intent through active dialog. To address this gap, we propose Interactive Instance Object Navigation (IION), a task that requires agents not only to generate navigation actions but also to produce language outputs via active dialog, thereby aligning more closely with practical settings. IION extends Instance Object Navigation (ION) by allowing agents to freely consult an oracle in natural language while navigating. Building on this task, we present the Vision Language-Language Navigation (VL-LN) benchmark, which provides a large-scale, automatically generated dataset and a comprehensive evaluation protocol for training and assessing dialog-enabled navigation models. VL-LN comprises over 41k long-horizon dialog-augmented trajectories for training and an automatic evaluation protocol with an oracle capable of responding to agent queries. Using this benchmark, we train a navigation model equipped with dialog capabilities and show that it achieves significant improvements over the baselines. Extensive experiments and analyses further demonstrate the effectiveness and reliability of VL-LN for advancing research on dialog-enabled embodied navigation. Code and dataset: https://0309hws.github.io/VL-LN.github.io/
Solidot(15)
- KDE Plasma's year in 2025
KDE developers summarized the desktop environment Plasma's major progress in 2025: the switch to the Wayland display server is essentially complete, and the Plasma release due in early 2027 will drop support for X11 sessions. Plasma continues to improve and mature, and has become the default desktop environment for many gaming-oriented distributions, including Bazzite, CachyOS, Garuda, Nobara, and SteamOS, which runs on Valve's handheld and console hardware. Fedora now gives its Plasma desktop edition equal standing with its GNOME edition, and Asahi Linux, the only distribution that runs on Apple's newer Mac hardware, also uses the KDE Plasma desktop. Parrot Linux recently made Plasma its default as well. Established distributions such as EndeavourOS, Manjaro, NixOS, OpenMandriva, Slackware, and TuxedoOS all default to Plasma.
- Mosquito mouthparts inspire 3D-printing nozzle design
A joint team from McGill University in Canada and Drexel University in the US has developed an inventive new high-resolution 3D-printing technique: turning the mouthparts (blood-feeding proboscis) of female mosquitoes into high-resolution 3D-printing nozzles. The technique can print ultra-fine lines with a precision of 20 micrometers, and offers a sustainable biological answer to the expense and energy cost of micro- and nano-fabrication. High-resolution 3D printing places extreme demands on nozzle precision; commercially available ultra-fine nozzles are mostly made of specialty metals or glass, with complex manufacturing and high cost. The team notes that conventional nozzles not only generate substantial waste during production and use but may also pose health risks due to process limitations. Looking for an alternative, the researchers turned to a highly evolved microstructure in nature: the mosquito proboscis. Over millions of years of evolution it has become a natural microneedle, roughly half the diameter of a human hair, combining a distinctive geometry with mechanical resilience. The team isolated mosquito proboscises under a microscope and fixed them to the tips of standard plastic dispensers with a specialty resin. The resulting bio-nozzles withstood considerable pressure and printed complex structures roughly twice as fine as current commercial nozzles.
- CAC drafts interim measures requiring AI service providers to prevent suicide and self-harm
The Cyberspace Administration of China has published a draft of the Interim Measures for the Administration of Anthropomorphic AI Interaction Services for public comment, with feedback due January 25. The draft contains what is considered the world's strictest policy of its kind, requiring service providers to take measures to stop AI from helping users commit suicide or self-harm. Its provisions include: Article 8: Providers shall assume primary responsibility for the security of anthropomorphic interaction services, establishing sound management systems covering review of algorithmic mechanisms, science and technology ethics review, content publication review, cybersecurity, data security, personal information protection, anti-telecom-and-online-fraud, major risk contingency plans, and emergency response; they shall have secure and controllable technical safeguards, and content-management technology and staff commensurate with the product's scale, business direction, and user base. Article 9: Providers shall fulfill security responsibilities across the entire life cycle of the service, defining security requirements for each stage (design, operation, upgrade, and termination of service), ensuring security measures are designed and deployed alongside service features, improving built-in security, strengthening runtime monitoring and risk assessment, promptly detecting and correcting systemic deviations and handling security issues, and retaining network logs as required by law. Providers shall have capabilities for mental health protection, emotional boundary guidance, and dependency risk warning, and must not adopt replacing social interaction, manipulating users' psychology, or inducing addiction and dependency as design goals. Article 11: Providers shall be able to recognize user state and, while protecting personal privacy, assess users' emotions and degree of dependence on the product; when extreme emotion or addiction is detected, they shall intervene with necessary measures. Providers shall prepare reply templates in advance and, upon detecting high-risk tendencies threatening a user's life, health, or property, promptly output comforting content that encourages seeking help, together with channels for professional assistance. Providers shall establish an emergency response mechanism: when a user explicitly expresses intent to commit suicide or self-harm, a human shall take over the conversation and steps shall be taken to contact the user's guardian or emergency contact. For minors and elderly users, providers shall require guardian and emergency contact information at registration. Article 17: When a user has used an anthropomorphic interaction service continuously for more than two hours, the provider shall dynamically remind the user, via pop-up or similar means, to take a break.
- China's auto sales overtake Japan's
Chinese automakers' global sales surpassed Japan's in 2025, taking the top spot for the first time. Based on company disclosures for January-November 2025 and data from S&P Global Mobility, Chinese automakers' global sales are expected to rise about 17% year over year to roughly 27 million vehicles. China first led the world in auto exports in 2023; its total sales will also rank first in 2025. Japanese automakers' combined sales were about 25 million vehicles, flat versus the prior year. Global auto sales were once a contest between the US and Japan; at its 2018 peak, Japan sold nearly 30 million vehicles. Meanwhile, signs of oversupply are growing in China's domestic market: the largest automaker, BYD, has begun cutting prices, price competition is intensifying, and Chinese manufacturers are turning to exports for a way out.
- Americans watched fewer new TV series in 2025
An analysis of the latest Nielsen data shows that no new original series cracked the top ten most-watched streaming shows in 2025, the first time that has happened since Nielsen began publishing streaming data in 2020. The data also show that free ad-supported streaming services are growing faster than paid ones. YouTube is the most-watched streaming service on US televisions, exceeding Netflix and Amazon combined. Netflix still dominates in hit series, accounting for about two-thirds of Nielsen's weekly top-ten original shows, but its lead is eroding: its share of streaming viewing has fallen below 20%. Disney's streaming share has been flat for three years, while Amazon is catching up. The most-watched original series of 2025 was the final season of Squid Game, followed by the second season of Wednesday and the latest season of Love Island.
- Wildfire air pollution is worse than expected
As fires consume land, they pump gases and particulates into the air, and their contribution to air pollution may have been underestimated. A study reports that, worldwide, emissions from wildfires and prescribed burns may far exceed previous estimates. Every year large tracts of forest, grassland, and peatland burn, releasing a complex mix of water vapor, ash, and carbon-based compounds. The researchers consulted a database of land area burned by wildland fires in forests, grasslands, and peatlands from 1997 to 2023, and gathered data on the organic compounds emitted when each vegetation type burns. They estimate that over the study period, wildland fires emitted an average of about 143 million tonnes of organic compounds per year, 21% higher than earlier estimates, suggesting the air pollution caused by wildland fire emissions may be more serious than previously believed.
- GOG and CD Projekt co-founder takes full control of GOG
Michal Kicinski, who co-founded the CD Projekt game studio and the digital game storefront Good Old Games (GOG) with Marcin Iwiński, has acquired all of GOG's shares from CD Projekt, taking full control of the platform. GOG will continue to operate independently and remain committed to DRM-free gaming: titles sold on GOG carry no DRM. CD Projekt will focus on game development, and GOG has signed an agreement under which future CD Projekt Red games will continue to be released DRM-free on GOG.
- 2026 could be the year of Qwen
Alibaba's open-weight model family Qwen (Tongyi Qianwen) is not the most advanced AI model, but it is among the most popular open-weight models, and its ease of use and modification has attracted researchers and businesses worldwide. Hundreds of papers at the AI conference NeurIPS used Qwen models; smart-glasses startup Rokid uses Qwen in its prototype devices; Airbnb, Perplexity, and Nvidia all use Qwen; and even Meta is reportedly using Qwen to help build new models. Qwen's rise suggests that a key measure of any AI model, beyond how intelligent it is, is how readily other products can be built on it.
- Blender survey shows most users don't use AI
5,102 people took the Blender Foundation's annual survey. The results: most respondents are 19-35 years old; 16% are from the US, 7.26% from Germany, 5.61% from China, and 5.46% from India; a third of respondents are artists and 17% are designers; half use Blender every day; more than half use Blender because it is free, fun, or open source; users tend to stay on a single LTS release for a long time; and most respondents do not use AI, with only 7% using it regularly.
- People who drink bottled water daily ingest 90,000 more microplastic particles a year
Sarah Sajedi was admiring the spectacular ocean views while visiting Thailand's Phi Phi Island, but looking down she found the beach strewn with plastic bottles. During her doctoral studies she analyzed more than 140 papers to assess the effect of plastic bottles on the human body. She found that people ingest on average 39,000-52,000 microplastic particles per year from food and drinking water, while those who drink bottled water daily take in an additional 90,000 particles per year. Sajedi advises drinking bottled water only in emergencies, not routinely. Microplastics are plastic particles between 1 micrometer and 5 millimeters; nanoplastics are smaller than 1 micrometer. These particles are invisible to the naked eye but are continually generated as bottles are manufactured, stored, transported, and degraded. Unlike other plastic particles that enter the body through the food chain, microplastics from bottles are more worrisome because they are ingested directly with drinking water. Once inside the body, microplastic particles can enter the bloodstream and reach vital organs, triggering chronic inflammation and exposing cells to oxidative stress, which in turn can disrupt the hormonal system, impair reproductive function, and damage the nervous system.
- Censorship and anti-censorship in Iran and Russia
During its conflict with Israel this June, Iran cut off internet access for several days and has since stepped up online censorship. Snowflake, developed by the Tor Project, is the most widely used traffic-obfuscation tool in Iran. To better counter Iran's blocking of bridges (unlisted Tor relays whose addresses can be obtained through various channels), the Tor Project developed the pluggable transport Conjure, which works something like the throwaway email addresses people create to dodge spam: blocking one bridge address does not stop users from obtaining a new one. Russia has also tightened its censorship; WebTunnel, a pluggable transport the Tor Project released last year that mimics HTTPS traffic, is popular there. After Russia intensified blocking of WebTunnel bridge addresses in June, the Tor Project began distributing WebTunnel bridges via Telegram. Next year the project plans to deploy Conjure and keep improving WebTunnel to better withstand blocking.
- SuperTux 0.7 gets its first beta
SuperTux, the open-source platformer modeled on Super Mario Bros., has released the first beta of its next major version, v0.7, after a gap of many years. The game stars Tux, the Linux mascot, in side-scrolling platforming in the style of Super Mario Bros. Development began in 2003, and the previous major release, v0.6, came out in 2019. v0.7 is a major update that reworks several worlds and introduces all-new art and music; the core gameplay is unchanged, but the experience may feel completely different from before. A Flatpak package is available.
- Sal Khan suggests companies donate 1% of profits to help workers displaced by AI
Khan Academy founder Sal Khan suggests that companies benefiting from automation donate 1% of their profits to retrain workers displaced by AI. He argues this is not charity but in companies' own interest: if corporate profits soar while unemployment rises, the public may come to support tighter regulation and higher taxes, or even bans on automation. Funding worker retraining costs big companies almost nothing, yet means a great deal to the public. The world's dozen or so largest companies earn over a trillion dollars in combined annual profits; donating one percent would create a fund of more than ten billion dollars a year, a fraction of which would be enough to build a centralized skills-training platform. The fund could be run by an independent nonprofit that coordinates with companies to ensure the skills it trains match market demand.
- Scientists find molecular differences in autistic brains
Scientists at Yale School of Medicine have found molecular differences between the brains of people with autism and those of neurotypical people. According to the study, published in The American Journal of Psychiatry, autistic brains have fewer of a particular type of glutamate receptor; glutamate is the brain's most common excitatory neurotransmitter. The reduced receptor count may be linked to multiple features of autism. Neurons communicate via electrical signals and chemical messengers called neurotransmitters: as a current travels through a neuron, it triggers the release of neurotransmitters, which pass the signal on to other neurons. This signaling can be excitatory or inhibitory. Excitatory signals mainly trigger release of glutamate, acting as a green light telling other neurons to fire; inhibitory signals act as a brake that suppresses neural activity. The brain needs the two in precise balance to function properly, and one leading hypothesis about the cause of autism is an imbalance between excitatory and inhibitory signaling in the brain.
- As memory prices soar, electronics prices will follow
Avril Wu, senior research vice president at market research firm TrendForce, advises buying any electronics you need now. The AI boom has created a worldwide memory shortage, and that shortage will affect the prices of all kinds of electronics. TrendForce data show demand for RAM chips exceeds supply by 10%, and demand is growing so fast that manufacturers must pay higher prices for chips every month. The price paid for DRAM, the most common type of memory chip, is 50% higher this quarter than last, and manufacturers that need chips early must pay two to three times as much. DRAM prices will rise another 40% next quarter, and Wu expects no decline in 2026. As memory makers such as Micron shift production toward high-end AI-related memory, the supply of memory chips for consumer electronics such as PCs, phones, game consoles, and TVs will shrink. Wu says electronics prices will keep rising for the foreseeable future.