Monthly Digest — 2026-01
397 unique stories across 31 days and 8 sources.
Hacker News (124)
- A website to destroy all websites (henry.codes)
- Linux is good now (www.pcgamer.com)
- If you care about security you might want to move the iPhone Camera app (blog.jgc.org)
- Finland detains ship and its crew after critical undersea cable damaged (www.cnn.com)
- NY Fed cash transfers to banks increase dramatically in Q4 2025 (www.dcreport.org)
- Daft Punk Easter Egg in the BPM Tempo of Harder, Better, Faster, Stronger? (www.madebywindmill.com)
- Publish on your own site, syndicate elsewhere (indieweb.org)
- Clicks Communicator (www.clicksphone.com)
- Total monthly number of StackOverflow questions over time (data.stackexchange.com)
- Report: Microsoft kills official way to activate Windows 11/10 without internet (www.neowin.net)
- Sirius DB (www.sirius-db.com)
- The C3 Programming Language (c3-lang.org)
- Claude Code On-the-Go (granda.org)
- Show HN: Terminal UI for AWS (github.com)
- I charged $18k for a Static HTML Page (2019) (idiallo.com)
- Lessons from 14 Years at Google (addyosmani.com)
- Google broke my heart (perishablepress.com)
- There were BGP anomalies during the Venezuela blackout (loworbitsecurity.com)
- Pebble Round 2 (repebble.com)
- Try to take my position: The best promotion advice I ever got (andrew.grahamyooll.com)
GitHub Trending (71)
- awslabs / amazon-bedrock-agentcore-samples
Amazon Bedrock AgentCore accelerates AI agents into production with the scale, reliability, and security critical to real-world deployment.
- BloopAI / vibe-kanban
Get 10X more out of Claude Code, Codex or any coding agent
- usememos / memos
An open-source, self-hosted note-taking service. Your thoughts, your data, your control — no tracking, no ads, no subscription fees.
- organicmaps / organicmaps
🍃 Organic Maps is a free Android & iOS offline maps app for travelers, tourists, hikers, and cyclists. It uses crowd-sourced OpenStreetMap data and is developed with love by the community. No ads, no tracking, no data collection, no crapware. Please donate to support the development!
- HQarroum / docker-android
🤖 A minimal and customizable Docker image running the Android emulator as a service.
- nocodb / nocodb
🔥 🔥 🔥 Open Source Airtable Alternative
- openai / openai-cookbook
Examples and guides for using the OpenAI API
- ourongxing / newsnow
Elegant reading of real-time and hottest news
- pathwaycom / pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
- OpenBB-finance / OpenBB
Financial data platform for analysts, quants and AI agents.
- anomalyco / opencode
The open source coding agent.
- protocolbuffers / protobuf
Protocol Buffers - Google's data interchange format
- Lissy93 / web-check
🕵️‍♂️ All-in-one OSINT tool for analysing any website
- microsoft / PowerToys
Microsoft PowerToys is a collection of utilities that help you customize Windows and streamline everyday tasks
- anthropics / claude-code-action
- thedotmack / claude-mem
A Claude Code plugin that automatically captures everything Claude does during your coding sessions, compresses it with AI (using Claude's agent-sdk), and injects relevant context back into future sessions.
- google / googletest
GoogleTest - Google Testing and Mocking Framework
- ChromeDevTools / chrome-devtools-mcp
Chrome DevTools for coding agents
- anthropics / claude-code
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.
- nothings / stb
stb single-file public domain libraries for C/C++
Hugging Face (90)
- mHC: Manifold-Constrained Hyper-Connections
Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundational models.
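The abstract does not specify which manifold mHC uses; as a hedged illustration of the underlying idea only (restoring the identity-mapping property by constraining the residual mixing weights), one could project each row of a hyper-connection mixing matrix onto the probability simplex, making it row-stochastic so that a state shared by all residual streams passes through unchanged. The matrix `W` and all names below are hypothetical; this is a sketch of the general principle, not the paper's method.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex
    (non-negative entries summing to 1), following the standard
    sort-and-threshold algorithm (Duchi et al., 2008)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

# Hypothetical residual-mixing matrix over n = 3 streams; projecting each
# row onto the simplex makes it row-stochastic, so a state shared by all
# streams is passed through unchanged (the identity-mapping property).
W = np.array([[1.2, -0.1, 0.3],
              [0.2,  0.9, 0.1],
              [0.0,  0.4, 0.5]])
W_proj = np.apply_along_axis(project_to_simplex, 1, W)

h = np.ones((3, 8)) * 0.7   # all three streams carry the same state
out = W_proj @ h
assert np.allclose(out, h)  # identity mapping preserved after projection
```

The unconstrained `W` would distort a shared state; after projection, each output stream is a convex combination of input streams, which is one simple way to recover the stability property the paper attributes to identity mapping.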
- Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
We introduce Youtu-LLM, a lightweight yet powerful language model that harmonizes high computational efficiency with native agentic intelligence. Unlike typical small models that rely on distillation, Youtu-LLM (1.96B) is pre-trained from scratch to systematically cultivate reasoning and planning capabilities. The key technical advancements are as follows: (1) Compact Architecture with Long-Context Support: Built on a dense Multi-Latent Attention (MLA) architecture with a novel STEM-oriented vocabulary, Youtu-LLM supports a 128k context window. This design enables robust long-context reasoning and state tracking within a minimal memory footprint, making it ideal for long-horizon agent and reasoning tasks. (2) Principled "Commonsense-STEM-Agent" Curriculum: We curated a massive corpus of approximately 11T tokens and implemented a multi-stage training strategy. By progressively shifting the pre-training data distribution from general commonsense to complex STEM and agentic tasks, we ensure the model acquires deep cognitive abilities rather than superficial alignment. (3) Scalable Agentic Mid-training: Specifically for the agentic mid-training, we employ diverse data construction schemes to synthesize rich and varied trajectories across math, coding, and tool-use domains. This high-quality data enables the model to internalize planning and reflection behaviors effectively. Extensive evaluations show that Youtu-LLM sets a new state-of-the-art for sub-2B LLMs. On general benchmarks, it achieves competitive performance against larger models, while on agent-specific tasks, it significantly surpasses existing SOTA baselines, demonstrating that lightweight models can possess strong intrinsic agentic capabilities.
- Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
Agentic crafting requires LLMs to operate in real-world environments over multiple turns by taking actions, observing outcomes, and iteratively refining artifacts. Despite its importance, the open-source community lacks a principled, end-to-end ecosystem to streamline agent development. We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agent LLMs. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME (ROME is Obviously an Agentic Model), an open-source agent grounded by ALE and trained on over one million trajectories. Our approach includes data composition protocols for synthesizing complex behaviors and a novel policy optimization algorithm, Interaction-based Policy Alignment (IPA), which assigns credit over semantic interaction chunks rather than individual tokens to improve long-horizon training stability. Empirically, we evaluate ROME within a structured setting and introduce Terminal Bench Pro, a benchmark with improved scale and contamination control. ROME demonstrates strong performance across benchmarks like SWE-bench Verified and Terminal Bench, proving the effectiveness of the ALE infrastructure.
- GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction
Recent advances in 3D reconstruction have achieved remarkable progress in high-quality scene capture from dense multi-view imagery, yet struggle when input views are limited. Various approaches, including regularization techniques, semantic priors, and geometric constraints, have been implemented to address this challenge. Latest diffusion-based methods have demonstrated substantial improvements by generating novel views from new camera poses to augment training data, surpassing earlier regularization and prior-based techniques. Despite this progress, we identify three critical limitations in these state-of-the-art approaches: inadequate coverage beyond known view peripheries, geometric inconsistencies across generated views, and computationally expensive pipelines. We introduce GaMO (Geometry-aware Multi-view Outpainter), a framework that reformulates sparse-view reconstruction through multi-view outpainting. Instead of generating new viewpoints, GaMO expands the field of view from existing camera poses, which inherently preserves geometric consistency while providing broader scene coverage. Our approach employs multi-view conditioning and geometry-aware denoising strategies in a zero-shot manner without training. Extensive experiments on Replica and ScanNet++ demonstrate state-of-the-art reconstruction quality across 3, 6, and 9 input views, outperforming prior methods in PSNR and LPIPS, while achieving a 25× speedup over SOTA diffusion-based methods with processing time under 10 minutes. Project page: https://yichuanh.github.io/GaMO/
- Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
Multi-step retrieval-augmented generation (RAG) has become a widely adopted strategy for enhancing large language models (LLMs) on tasks that demand global comprehension and intensive reasoning. Many RAG systems incorporate a working memory module to consolidate retrieved information. However, existing memory designs function primarily as passive storage that accumulates isolated facts for the purpose of condensing the lengthy inputs and generating new sub-queries through deduction. This static nature overlooks the crucial high-order correlations among primitive facts, the compositions of which can often provide stronger guidance for subsequent steps. Therefore, their representational strength and impact on multi-step reasoning and knowledge evolution are limited, resulting in fragmented reasoning and weak global sense-making capacity in extended contexts. We introduce HGMem, a hypergraph-based memory mechanism that extends the concept of memory beyond simple storage into a dynamic, expressive structure for complex reasoning and global understanding. In our approach, memory is represented as a hypergraph whose hyperedges correspond to distinct memory units, enabling the progressive formation of higher-order interactions within memory. This mechanism connects facts and thoughts around the focal problem, evolving into an integrated and situated knowledge structure that provides strong propositions for deeper reasoning in subsequent steps. We evaluate HGMem on several challenging datasets designed for global sense-making. Extensive experiments and in-depth analyses show that our method consistently improves multi-step RAG and substantially outperforms strong baseline systems across diverse tasks.
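As a toy illustration of the structural idea behind a hypergraph memory (hyperedges acting as memory units that group multiple facts, so retrieval can follow higher-order associations rather than isolated entries), a minimal sketch might look like the following. The class and method names are hypothetical; the real HGMem builds and evolves this structure dynamically during multi-step reasoning.

```python
from collections import defaultdict

class HypergraphMemory:
    """Toy hypergraph memory: facts are nodes, hyperedges are memory
    units that bundle several facts into one higher-order association."""

    def __init__(self):
        self.facts = {}                     # fact_id -> text
        self.hyperedges = []                # edge_id -> set of fact_ids
        self.fact_to_edges = defaultdict(set)

    def add_fact(self, fact_id, text):
        self.facts[fact_id] = text

    def add_hyperedge(self, fact_ids):
        """Record one memory unit linking several facts at once."""
        edge_id = len(self.hyperedges)
        self.hyperedges.append(set(fact_ids))
        for f in fact_ids:
            self.fact_to_edges[f].add(edge_id)
        return edge_id

    def related(self, fact_id):
        """All facts that share at least one memory unit with fact_id."""
        out = set()
        for e in self.fact_to_edges[fact_id]:
            out |= self.hyperedges[e]
        out.discard(fact_id)
        return out
```

Because a hyperedge connects any number of facts, a single retrieval hop from one fact can surface a whole composed context, which is the kind of higher-order correlation the paper argues passive fact stores miss.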
- Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
Large Language Models (LLMs) apply uniform computation to all tokens, despite language exhibiting highly non-uniform information density. This token-uniform regime wastes capacity on locally predictable spans while under-allocating computation to semantically critical transitions. We propose Dynamic Large Concept Models (DLCM), a hierarchical language modeling framework that learns semantic boundaries from latent representations and shifts computation from tokens to a compressed concept space where reasoning is more efficient. DLCM discovers variable-length concepts end-to-end without relying on predefined linguistic units. Hierarchical compression fundamentally changes scaling behavior. We introduce the first compression-aware scaling law, which disentangles token-level capacity, concept-level reasoning capacity, and compression ratio, enabling principled compute allocation under fixed FLOPs. To stably train this heterogeneous architecture, we further develop a decoupled μP parametrization that supports zero-shot hyperparameter transfer across widths and compression regimes. At a practical setting (R=4, corresponding to an average of four tokens per concept), DLCM reallocates roughly one-third of inference compute into a higher-capacity reasoning backbone, achieving a +2.69% average improvement across 12 zero-shot benchmarks under matched inference FLOPs.
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
While recent Multimodal Large Language Models (MLLMs) have attained significant strides in multimodal reasoning, their reasoning processes remain predominantly text-centric, leading to suboptimal performance in complex long-horizon, vision-centric tasks. In this paper, we establish a novel Generative Multimodal Reasoning paradigm and introduce DiffThinker, a diffusion-based reasoning framework. Conceptually, DiffThinker reformulates multimodal reasoning as a native generative image-to-image task, achieving superior logical consistency and spatial precision in vision-centric tasks. We perform a systematic comparison between DiffThinker and MLLMs, providing the first in-depth investigation into the intrinsic characteristics of this paradigm, revealing four core properties: efficiency, controllability, native parallelism, and collaboration. Extensive experiments across four domains (sequential planning, combinatorial optimization, constraint satisfaction, and spatial configuration) demonstrate that DiffThinker significantly outperforms leading closed source models including GPT-5 (+314.2%) and Gemini-3-Flash (+111.6%), as well as the fine-tuned Qwen3-VL-32B baseline (+39.0%), highlighting generative multimodal reasoning as a promising approach for vision-centric reasoning.
- On the Role of Discreteness in Diffusion LLMs
Diffusion models offer appealing properties for language generation, such as parallel decoding and iterative refinement, but the discrete and highly structured nature of text challenges the direct application of diffusion principles. In this paper, we revisit diffusion language modeling from the view of diffusion process and language modeling, and outline five properties that separate diffusion mechanics from language-specific requirements. We first categorize existing approaches into continuous diffusion in embedding space and discrete diffusion over tokens. We then show that each satisfies only part of the five essential properties and therefore reflects a structural trade-off. Through analyses of recent large diffusion language models, we identify two central issues: (i) uniform corruption does not respect how information is distributed across positions, and (ii) token-wise marginal training cannot capture multi-token dependencies during parallel decoding. These observations motivate diffusion processes that align more closely with the structure of text, and encourage future work toward more coherent diffusion language models.
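The first issue the authors identify, uniform corruption, is easy to illustrate: masked discrete diffusion typically corrupts every position independently with the same probability, regardless of how much information each position carries. A minimal sketch with hypothetical names (the noising schedules in real diffusion LLMs vary):

```python
import random

MASK = "<mask>"

def uniform_corrupt(tokens, t, rng):
    """Mask each position independently with probability t. This is the
    position-agnostic corruption the paper critiques: a rare named entity
    and a predictable function word are masked at exactly the same rate."""
    return [MASK if rng.random() < t else tok for tok in tokens]

rng = random.Random(0)
tokens = "the capital of France is Paris".split()
noised = uniform_corrupt(tokens, 0.5, rng)
```

At `t=0.5`, "Paris" (hard to recover) and "the" (trivial to recover) are equally likely to be masked, which is the mismatch between corruption and information distribution that the paper argues future diffusion processes should address.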
- NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
In this paper, we propose NeoVerse, a versatile 4D world model that is capable of 4D reconstruction, novel-trajectory video generation, and rich downstream applications. We first identify a common limitation of scalability in current 4D world modeling methods, caused either by expensive and specialized multi-view 4D data or by cumbersome training pre-processing. In contrast, our NeoVerse is built upon a core philosophy that makes the full pipeline scalable to diverse in-the-wild monocular videos. Specifically, NeoVerse features pose-free feed-forward 4D reconstruction, online monocular degradation pattern simulation, and other well-aligned techniques. These designs empower NeoVerse with versatility and generalization to various domains. Meanwhile, NeoVerse achieves state-of-the-art performance in standard reconstruction and generation benchmarks. Our project page is available at https://neoverse-4d.github.io
- Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization
Existing Large Language Model (LLM) agent frameworks face two significant challenges: high configuration costs and static capabilities. Building a high-quality agent often requires extensive manual effort in tool integration and prompt engineering, while deployed agents struggle to adapt to dynamic environments without expensive fine-tuning. To address these issues, we propose Youtu-Agent, a modular framework designed for the automated generation and continuous evolution of LLM agents. Youtu-Agent features a structured configuration system that decouples execution environments, toolkits, and context management, enabling flexible reuse and automated synthesis. We introduce two generation paradigms: a Workflow mode for standard tasks and a Meta-Agent mode for complex, non-standard requirements, capable of automatically generating tool code, prompts, and configurations. Furthermore, Youtu-Agent establishes a hybrid policy optimization system: (1) an Agent Practice module that enables agents to accumulate experience and improve performance through in-context optimization without parameter updates; and (2) an Agent RL module that integrates with distributed training frameworks to enable scalable and stable reinforcement learning of any Youtu-Agent in an end-to-end, large-scale manner. Experiments demonstrate that Youtu-Agent achieves state-of-the-art performance on WebWalkerQA (71.47%) and GAIA (72.8%) using open-weight models. Our automated generation pipeline achieves over an 81% tool synthesis success rate, while the Practice module improves performance on AIME 2024/2025 by +2.7% and +5.4% respectively. Moreover, our Agent RL training achieves a 40% speedup with steady performance improvement on 7B LLMs, enhancing coding/reasoning and search capabilities by up to 35% on math and 21% on general/multi-hop QA benchmarks.
- Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
Talking head generation creates lifelike avatars from static portraits for virtual communication and content creation. However, current models do not yet convey the feeling of truly interactive communication, often generating one-way responses that lack emotional engagement. We identify two key challenges toward truly interactive avatars: generating motion in real-time under causal constraints and learning expressive, vibrant reactions without additional labeled data. To address these challenges, we propose Avatar Forcing, a new framework for interactive head avatar generation that models real-time user-avatar interactions through diffusion forcing. This design allows the avatar to process real-time multimodal inputs, including the user's audio and motion, with low latency for instant reactions to both verbal and non-verbal cues such as speech, nods, and laughter. Furthermore, we introduce a direct preference optimization method that leverages synthetic losing samples constructed by dropping user conditions, enabling label-free learning of expressive interaction. Experimental results demonstrate that our framework enables real-time interaction with low latency (approximately 500ms), achieving a 6.8× speedup compared to the baseline, and produces reactive and expressive avatar motion, which is preferred in over 80% of comparisons against the baseline.
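The preference-optimization step builds on standard DPO (direct preference optimization), with the winning sample being a full-condition generation and the synthetic losing sample a generation with the user conditions dropped. A minimal sketch of the standard DPO loss on summed log-probabilities follows; the paper's exact objective and hyperparameters may differ, and all argument names are hypothetical.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * implicit reward margin),
    where the margin compares policy-vs-reference log-prob gains of the
    winning sample (full conditions) over the losing sample (user
    conditions dropped, as in the paper's label-free construction)."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With a zero margin the loss is log 2; as the policy assigns relatively more probability to the fully-conditioned (reactive) sample, the loss falls, pushing the avatar toward condition-following, expressive behavior without any human preference labels.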
- SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning
While Vision-Language Models (VLMs) can solve complex tasks through agentic reasoning, their capabilities remain largely constrained to text-oriented chain-of-thought or isolated tool invocation. They fail to exhibit the human-like proficiency required to seamlessly interleave dynamic tool manipulation with continuous reasoning, particularly in knowledge-intensive and visually complex scenarios that demand coordinated external tools such as search and image cropping. In this work, we introduce SenseNova-MARS, a novel Multimodal Agentic Reasoning and Search framework that empowers VLMs with interleaved visual reasoning and tool-use capabilities via reinforcement learning (RL). Specifically, SenseNova-MARS dynamically integrates the image search, text search, and image crop tools to tackle fine-grained and knowledge-intensive visual understanding challenges. In the RL stage, we propose the Batch-Normalized Group Sequence Policy Optimization (BN-GSPO) algorithm to improve the training stability and advance the model's ability to invoke tools and reason effectively. To comprehensively evaluate the agentic VLMs on complex visual tasks, we introduce the HR-MMSearch benchmark, the first search-oriented benchmark composed of high-resolution images with knowledge-intensive and search-driven questions. Experiments demonstrate that SenseNova-MARS achieves state-of-the-art performance on open-source search and fine-grained image understanding benchmarks. Specifically, on search-oriented benchmarks, SenseNova-MARS-8B scores 67.84 on MMSearch and 41.64 on HR-MMSearch, surpassing proprietary models such as Gemini-3-Flash and GPT-5. SenseNova-MARS represents a promising step toward agentic VLMs by providing effective and robust tool-use capabilities. To facilitate further research in this field, we will release all code, models, and datasets.
- NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
We present NextFlow, a unified decoder-only autoregressive transformer trained on 6 trillion interleaved text-image discrete tokens. By leveraging a unified vision representation within a unified autoregressive architecture, NextFlow natively activates multimodal understanding and generation capabilities, unlocking abilities of image editing, interleaved content and video generation. Motivated by the distinct nature of modalities - where text is strictly sequential and images are inherently hierarchical - we retain next-token prediction for text but adopt next-scale prediction for visual generation. This departs from traditional raster-scan methods, enabling the generation of 1024x1024 images in just 5 seconds - orders of magnitude faster than comparable AR models. We address the instabilities of multi-scale generation through a robust training recipe. Furthermore, we introduce a prefix-tuning strategy for reinforcement learning. Experiments demonstrate that NextFlow achieves state-of-the-art performance among unified models and rivals specialized diffusion baselines in visual quality.
- Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Large language models (LLMs) generate fluent and complex outputs but often fail to recognize their own mistakes and hallucinations. Existing approaches typically rely on external judges, multi-sample consistency, or text-based self-critique, which incur additional compute or correlate weakly with true correctness. We ask: can LLMs predict their own failures by inspecting internal states during inference? We introduce Gnosis, a lightweight self-awareness mechanism that enables frozen LLMs to perform intrinsic self-verification by decoding signals from hidden states and attention patterns. Gnosis passively observes internal traces, compresses them into fixed-budget descriptors, and predicts correctness with negligible inference cost, adding only ~5M parameters and operating independently of sequence length. Across math reasoning, open-domain question answering, and academic knowledge benchmarks, and over frozen backbones ranging from 1.7B to 20B parameters, Gnosis consistently outperforms strong internal baselines and large external judges in both accuracy and calibration. Moreover, it generalizes zero-shot to partial generations, enabling early detection of failing trajectories and compute-aware control. These results show that reliable correctness cues are intrinsic to generation process and can be extracted efficiently without external supervision.
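As a hedged sketch of the general idea (a lightweight probe reads fixed-size descriptors of internal states and predicts whether the answer is correct), one can train a small logistic-regression probe. The synthetic data below is a hypothetical stand-in for pooled hidden-state descriptors, not Gnosis itself, which also exploits attention patterns and a learned compression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for fixed-budget descriptors of hidden states: d-dim vectors,
# label 1.0 meaning "the model's answer was correct".
d, n = 16, 400
w_true = rng.normal(size=d)               # hidden "correctness direction"
X = rng.normal(size=(n, d))
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

# Tiny frozen-backbone probe: logistic regression fit by gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(correct)
    g = p - y                                 # gradient of log-loss
    w -= 0.1 * (X.T @ g) / n
    b -= 0.1 * g.mean()

pred = 1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5
acc = (pred == y).mean()
```

The probe adds only d+1 parameters and runs once per generation, which mirrors the paper's point that correctness cues can be decoded from internal traces at negligible inference cost; the real system's ~5M-parameter reader is of course far more expressive than this toy.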
- K-EXAONE Technical Report
This technical report presents K-EXAONE, a large-scale multilingual language model developed by LG AI Research. K-EXAONE is built on a Mixture-of-Experts architecture with 236B total parameters, activating 23B parameters during inference. It supports a 256K-token context window and covers six languages: Korean, English, Spanish, German, Japanese, and Vietnamese. We evaluate K-EXAONE on a comprehensive benchmark suite spanning reasoning, agentic, general, Korean, and multilingual abilities. Across these evaluations, K-EXAONE demonstrates performance comparable to open-weight models of similar size. K-EXAONE, designed to advance AI for a better life, is positioned as a powerful proprietary AI foundation model for a wide range of industrial and research applications.
- DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
Video Face Swapping (VFS) requires seamlessly injecting a source identity into a target video while meticulously preserving the original pose, expression, lighting, background, and dynamic information. Existing methods struggle to maintain identity similarity and attribute preservation while preserving temporal consistency. To address the challenge, we propose a comprehensive framework to seamlessly transfer the superiority of Image Face Swapping (IFS) to the video domain. We first introduce a novel data pipeline SyncID-Pipe that pre-trains an Identity-Anchored Video Synthesizer and combines it with IFS models to construct bidirectional ID quadruplets for explicit supervision. Building upon paired data, we propose the first Diffusion Transformer-based framework DreamID-V, employing a core Modality-Aware Conditioning module to discriminatively inject multi-modal conditions. Meanwhile, we propose a Synthetic-to-Real Curriculum mechanism and an Identity-Coherence Reinforcement Learning strategy to enhance visual realism and identity consistency under challenging scenarios. To address the issue of limited benchmarks, we introduce IDBench-V, a comprehensive benchmark encompassing diverse scenes. Extensive experiments demonstrate that DreamID-V outperforms state-of-the-art methods and further exhibits exceptional versatility, which can be seamlessly adapted to various swap-related tasks.
- InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
Existing depth estimation methods are fundamentally limited to predicting depth on discrete image grids. Such representations restrict their scalability to arbitrary output resolutions and hinder the geometric detail recovery. This paper introduces InfiniDepth, which represents depth as neural implicit fields. Through a simple yet effective local implicit decoder, we can query depth at continuous 2D coordinates, enabling arbitrary-resolution and fine-grained depth estimation. To better assess our method's capabilities, we curate a high-quality 4K synthetic benchmark from five different games, spanning diverse scenes with rich geometric and appearance details. Extensive experiments demonstrate that InfiniDepth achieves state-of-the-art performance on both synthetic and real-world benchmarks across relative and metric depth estimation tasks, particularly excelling in fine-detail regions. It also benefits the task of novel view synthesis under large viewpoint shifts, producing high-quality results with fewer holes and artifacts.
- MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization
Speaker-Attributed, Time-Stamped Transcription (SATS) aims to transcribe what is said and to precisely determine the timing of each speaker, which is particularly valuable for meeting transcription. Existing SATS systems rarely adopt an end-to-end formulation and are further constrained by limited context windows, weak long-range speaker memory, and the inability to output timestamps. To address these limitations, we present MOSS Transcribe Diarize, a unified multimodal large language model that jointly performs Speaker-Attributed, Time-Stamped Transcription in an end-to-end paradigm. Trained on extensive in-the-wild data and equipped with a 128k context window for up to 90-minute inputs, MOSS Transcribe Diarize scales well and generalizes robustly. Across comprehensive evaluations, it outperforms state-of-the-art commercial systems on multiple public and in-house benchmarks.
- LTX-2: Efficient Joint Audio-Visual Foundation Model
Recent text-to-video diffusion models can generate compelling video sequences, yet they remain silent -- missing the semantic, emotional, and atmospheric cues that audio provides. We introduce LTX-2, an open-source foundational model capable of generating high-quality, temporally synchronized audiovisual content in a unified manner. LTX-2 consists of an asymmetric dual-stream transformer with a 14B-parameter video stream and a 5B-parameter audio stream, coupled through bidirectional audio-video cross-attention layers with temporal positional embeddings and cross-modality AdaLN for shared timestep conditioning. This architecture enables efficient training and inference of a unified audiovisual model while allocating more capacity for video generation than audio generation. We employ a multilingual text encoder for broader prompt understanding and introduce a modality-aware classifier-free guidance (modality-CFG) mechanism for improved audiovisual alignment and controllability. Beyond generating speech, LTX-2 produces rich, coherent audio tracks that follow the characters, environment, style, and emotion of each scene -- complete with natural background and foley elements. In our evaluations, the model achieves state-of-the-art audiovisual quality and prompt adherence among open-source systems, while delivering results comparable to proprietary models at a fraction of their computational cost and inference time. All model weights and code are publicly released.
- SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence
We introduce SciEvalKit, a unified benchmarking toolkit designed to evaluate AI models for science across a broad range of scientific disciplines and task capabilities. Unlike general-purpose evaluation platforms, SciEvalKit focuses on the core competencies of scientific intelligence, including Scientific Multimodal Perception, Scientific Multimodal Reasoning, Scientific Multimodal Understanding, Scientific Symbolic Reasoning, Scientific Code Generation, Science Hypothesis Generation and Scientific Knowledge Understanding. It supports six major scientific domains, spanning from physics and chemistry to astronomy and materials science. SciEvalKit builds a foundation of expert-grade scientific benchmarks, curated from real-world, domain-specific datasets, ensuring that tasks reflect authentic scientific challenges. The toolkit features a flexible, extensible evaluation pipeline that enables batch evaluation across models and datasets, supports custom model and dataset integration, and provides transparent, reproducible, and comparable results. By bridging capability-based evaluation and disciplinary diversity, SciEvalKit offers a standardized yet customizable infrastructure to benchmark the next generation of scientific foundation models and intelligent agents. The toolkit is open-sourced and actively maintained to foster community-driven development and progress in AI4Science.
Solidot(112)
- Foreign tech workers are avoiding the US
As the Trump administration tightens entry screening, even demanding up to five years of social media history, foreign tech workers, researchers, and conference speakers are increasingly avoiding the United States, and industry conferences and events are shifting to friendlier venues in Europe, Canada, and Asia. Columnist Steven Vaughan-Nichols attended 13 tech conferences this year, most of them held outside the US. At these events, the non-technical topic non-Americans cared about most was the sweeping change the second Trump term has brought to the US; many attendees said they would neither look for jobs in the US nor attend conferences there. Trade show organizers have also begun canceling events originally planned for the US and relocating them to Europe, Canada, and Asia. Elites once gave up everything to go to America; that tide has now turned.
- Israel deploys the Iron Beam laser defense system
Israel has deployed Iron Beam, a laser system for defending against drones. Operating at 100 kilowatts, Iron Beam can shoot down drones, rockets, and mortar shells at an extremely low cost per shot. Developed by Rafael Advanced Defense Systems, it is built around an advanced laser source and a unique electro-optical targeting system that lets it intercept a wide range of targets at longer ranges with high precision and efficiency. Israel has released few details, saying only that Iron Beam has successfully intercepted rockets, mortar shells, and drones, and that its entry into service marks the beginning of the era of high-energy laser defense.
- India's GDP surpasses Japan's, making it the world's fourth-largest economy
According to the Indian government's year-end economic review, the country's GDP has surpassed Japan's, making India the world's fourth-largest economy. At the current growth rate, India is expected to overtake Germany within three years to become the third-largest economy, behind only the US and China. India's GDP is estimated at $4.18 trillion this year and projected to reach $7.3 trillion by 2030. Official 2025 GDP figures will not be published until 2026; the IMF had forecast that India's GDP would surpass Japan's next year.
- Iceland records its warmest Christmas Eve on record
Iceland experienced its warmest Christmas Eve on record, with temperatures peaking at 19.8°C. The Icelandic Met Office reported 19.8°C in the eastern town of Seyðisfjörður and 19.7°C at Bakkagerði in Borgarfjörður, also in the east; Iceland's average December temperatures normally range from -1°C to 4°C. Meteorologists attributed the warmth to a tropical air mass blanketing the island. Iceland has seen record-breaking heat waves throughout the year, and mosquitoes were found in the wild for the first time; until then, Iceland had been the only place outside Antarctica without wild mosquitoes.
- Rising income inequality is linked to longer working hours
According to a study published in Social Psychological and Personality Science, rising income inequality is associated with longer working hours. Income inequality has increased markedly worldwide over the past four decades, and researchers at Beijing Normal University and the University of Lausanne in Switzerland examined its relationship with hours worked. The first study used a dataset covering 69 countries from 1960-2019 and found that each 0.1 increase in the Gini coefficient was associated with about 60 additional working hours per year, equivalent to more than an extra work week annually. The second study focused on the US, using data on 33,083 participants from 1968-2021; each 0.1 increase in a state's Gini coefficient corresponded to roughly 53 additional working hours per participant per year, with the association stronger for Black participants than for white participants, and stronger for women than for men. The third study focused on China, with a dataset of 26,251 participants from 2012-2020; each one-unit increase in perceived inequality was associated with about 10 additional working hours per year. China and the US showed opposite patterns: in the US, inequality increased the working hours of disadvantaged groups, whereas in China it increased those of advantaged groups. The researchers were also surprised to find that rising inequality was linked to longer hours among urban residents but had no effect on rural residents.
- Linux reaches 3.19% of Steam users
According to Valve's December 2025 Steam Hardware & Software Survey, Linux accounts for 3.19% of Steam users, down 0.01 percentage points from the previous month but well above the 2.29% of December 2024. Among Linux gamers, 71.93% use AMD CPUs (the Steam Deck handheld runs on an AMD APU), compared with 47.27% of Windows gamers. Other figures: Windows 11 broke the 70% mark at 70.83%, with Windows 10 at 26.70%; Simplified Chinese users account for 22.12% and English users for 47.08%.
- Windows users should try Linux in 2026
Neowin compared the installation flows of Windows Vista, Windows 7, Windows 8/8.1, Windows 10, and Windows 11, showing that installation was straightforward in every version before Windows 11, which has turned into an ad-delivery system: throughout the process Microsoft continually pushes its products, including OneDrive, Microsoft 365, and Game Pass. Windows 11 increasingly makes users feel they do not really own the new PC they bought. Linux has no such problem, and it has come a long way in recent years, especially in gaming, once its weakest area. Valve's continual improvements to the Proton compatibility layer have greatly improved how well Windows games run on Linux, and in some cases games may even perform better on Linux than on Windows. In 2026, Windows users should give Linux a try.
- Sony PS5 ROM key leaked
Sony's PS5 Level 0 BootROM key leaked on New Year's Eve. The BootROM is the first code executed by the PS5's AMD APU after startup; it verifies that the bootloader is legitimate and signed by Sony. The key cannot be changed, as it is burned directly into the APU. The leak helps hackers work toward cracking the bootloader, but a full PS5 jailbreak remains unlikely for now, since hackers would still need to bypass other security measures Sony has built into the system. Sony has not issued an official statement on the matter.
- Cheap solar power is changing lives in Africa
Cheap solar power is transforming life in Africa, where blackouts are frequent. In South Africa, for example, solar has gone from almost nothing in 2019 to roughly 10% of electricity generation, most of it privately owned. Over the past decade the US has ramped up fossil fuel exports while China has focused on dominating renewables. Most of the world's solar panels, electric vehicles, and batteries are now made by Chinese companies, which are cutting prices sharply and scrambling for buyers. According to an analysis of Chinese export data for the first 10 months of 2025 by the UK energy tracker Ember, Africa's solar imports from China rose 50%.
- Microsoft ends phone activation for Windows 10/11
Windows users report via official forums and social media that Microsoft has discontinued phone activation for Windows 10/11, meaning the operating system can now only be activated online. Previously, Windows 10/11 supported offline activation by phone via Start > Settings > System > Activation, where activation by phone could be selected from the activation menu. Users report that when they attempt phone activation they now hear an automated message saying the feature has been discontinued and that "product activation support has moved online."
- SpaceX will lower the orbits of four thousand Starlink satellites in 2026
Michael Nicolls, SpaceX's vice president of Starlink, announced that to improve space safety, roughly 4,400 Starlink satellites will be lowered from 550 km to 480 km over the course of 2026. A lower orbit lets a failed satellite deorbit and reenter the atmosphere quickly, reducing collision risk and avoiding new space debris. Low Earth orbit has grown increasingly crowded in recent years: SpaceX has launched more than ten thousand Starlink satellites, and other broadband satellite companies are accelerating their own launches, raising concerns about Kessler Syndrome. Kessler Syndrome, or collisional cascading, is a hypothesis proposed by the American scientist Donald J. Kessler in 1978: once the density of objects in low Earth orbit reaches a certain threshold, debris from collisions triggers further collisions in a cascading chain reaction. If it occurred, SpaceX, as the largest satellite broadband operator, would clearly suffer the most.
- Titan may not have a global subsurface ocean
The dense atmosphere and methane lakes of Titan, Saturn's largest moon, have long fascinated scientists and fueled speculation that it could support life. According to a study published in Nature, NASA JPL researchers reanalyzed Titan data collected by the Cassini probe and concluded that it does not have a global subsurface ocean. Instead, they believe liquid inside Titan exists as localized pockets of meltwater. Heated by tidal energy, these pockets rise slowly toward the surface ice, potentially carrying organic molecules up from below and mixing them with material delivered by meteorite impacts. The researchers stress that this does not rule out the possibility of basic life forms. Their analysis suggests Titan should still have regions of liquid water, possibly as warm as 20°C, capable of transporting nutrients from the rocky core through the high-pressure ice and ultimately to the rigid ice shell at the surface.
- Reddit overtakes TikTok as the fourth most-visited social platform in the UK
Reddit has overtaken TikTok to become the UK's fourth most-visited social media platform. UK users are Reddit's second-largest audience after the US, and their number has grown 88% over the past two years; Ofcom data shows two-thirds of UK internet users now visit Reddit, up from one-third in 2023. Reddit is especially popular among young Britons: among 18-24-year-old UK users it is the sixth most-visited website, up from tenth a year earlier. Factors behind the rise include Google algorithm changes that surface forum content, and Reddit is social media in forum form. In the AI era, users are also increasingly turning to human-written content, a trend Reddit benefits from. More than half of Reddit's UK users are women.
- Tests show Windows 11 is the slowest of six Windows versions
Windows 11 is the only Windows version Microsoft currently supports, but it has a poor reputation among users for its higher hardware requirements, bloat, and AI features. Testing the latest builds of Windows XP, Windows Vista, Windows 7, Windows 8.1, Windows 10, and Windows 11 on six old ThinkPad X220 laptops (Intel Core i5-2520M CPU, 8 GB RAM, 256 GB drive) showed that Windows 11 boots the slowest; its installation takes 37.3 GB, slightly below Windows Vista's 37.8 GB and Windows 7's 44.6 GB; its memory use is 3.3 GB, peaking at 3.7 GB; and it stutters most readily on old hardware.
- Japan will use AI to speed up manga translation
Japanese manga is popular overseas, but many readers consume pirated copies. ABJ, a general incorporated association formed by publishers and others, surveyed about 900 piracy sites centered on manga carrying Japanese publications and found that in June 2025 alone these sites received 2.8 billion visits from readers in 123 countries and regions, with cumulative reads of 1.4 billion volumes; annual losses are estimated at 8.5 trillion yen. Japan hopes AI can accelerate manga translation so that legitimate works circulate overseas in multiple languages, keeping readers away from piracy sites. Currently only about 10% of the manga published in Japan each year is translated into English.
- China's broadcast regulator cracks down on "AI-mangled" videos
The National Radio and Television Administration announced a month-long special campaign against "AI-mangled" (AI 魔改) videos. The regulator said: "With the rapid development of generative AI, some online accounts are abusing AI tools to produce subversive alterations, absurd deconstructions, and vulgar adaptations of classic films, TV dramas, and animation. Such content seriously betrays the spirit of the original works, disrupts the order of online communication, encourages infringement, harms the industry's development, and interferes with minors' formation of a correct understanding of culture and reality. The campaign will focus on removing 'AI-mangled' videos based on TV adaptations of the Four Great Classical Novels, historical subjects, revolutionary subjects, and heroic figures that: first, gravely violate the spirit and character images of the originals, overturning basic understanding and deconstructing common consensus; second, depict gore, violence, sensationalism, and vulgarity, promote wrong values, or offend public morals; third, misappropriate or distort Chinese culture in ways that create clearly mistaken perceptions of real history and the symbols of Chinese civilization, undermining cultural identity. The campaign will also remove cult-style animations generated by altering animated characters well known to and loved by children."
- Human ancestors walked upright 7 million years ago
According to a study published in Science Advances, strong anatomical evidence from 7-million-year-old fossils shows that Sahelanthropus tchadensis, an ape-like species with a small brain, could walk upright, meaning human ancestors began walking on two legs far earlier than expected. Paleontologists at the University of Poitiers in France discovered the Sahelanthropus tchadensis fossils, dating back 7 million years, in the Djurab Desert of Chad in central Africa. Whether these fossils belong to a direct human ancestor or an extinct side branch of apes has long been disputed, and a key point of contention is whether Sahelanthropus could walk upright. Using advanced 3D imaging, the researchers analyzed the species' limb bone fossils and identified three key features supporting bipedalism. First, a tubercle on the front of the proximal femur: small but important, it is the attachment point of the iliofemoral ligament, the strongest ligament in the human body and critical for upright walking; this feature has so far been observed only in hominins. Second, a natural rotational twist of the femur (femoral anteversion) within the hominin range, which helps the leg swing forward for efficient walking. Third, gluteal muscles similar to those of early hominins, able to stabilize the hip joint and support standing, walking, and running. The latter two features had been noted in earlier research; the new study confirms them.
- Dell revives the XPS brand
At last year's CES, Dell announced a controversial rebranding: it retired the decades-old XPS, Inspiron, and Latitude brands in favor of Dell, Dell Pro, and Dell Pro Max, each with three tiers: Base, Plus, and Premium. Dell said the move would make it easier for customers to find the AI PC that fits their needs, with the Dell brand aimed at entertainment, education, and everyday work, Dell Pro at productivity, and Dell Pro Max at maximum performance. A year later, at CES 2026, Dell admitted that abandoning XPS was a mistake and is reviving it, positioning it as its premium laptop line; it has no plans to revive Inspiron or Latitude.
- Sega co-founder David Rosen dies
Sega co-founder David Rosen has died at 95. Rosen was a US Air Force airman stationed in Japan during the Korean War. Having grown fond of the country, he stayed after the war and founded Rosen Enterprises in 1954, which merged in 1965 with Nihon Goraku Bussan, whose coin-operated game business, Service Games, gave the merged company its abbreviated name: Sega. Over the next 15 years Sega moved from importing games to designing its own, from jukeboxes and pinball machines to arcade games, and built its own arcades. Rosen served on Sega's board until his retirement in 1996. During his tenure Sega's arcade business led the industry, though its console business lost out to Nintendo.
- American universities are still working well
US polls show Americans' attitudes toward higher education are deteriorating. Pew Research Center found the share of adults who consider college "very important" fell from 70% in 2013 to 35% today; an NBC poll shows the share who think a college degree is "not worth it" rose from 40% to 63% over the same period. Yet enrollment data tells a very different story: four-year colleges awarded 2 million bachelor's degrees in 2023, up from 1.6 million in 2010, and the share of 25-year-olds holding a bachelor's degree has grown steadily over the past 15 years. Economically, higher education remains a powerful draw. Bachelor's degree holders earn on average about 70% more than high school graduates with similar work experience. Accounting for scholarships, tuition at US public four-year colleges has fallen more than 20% since 2015, and even factoring in student loans, college graduates net about $8,000 more per year than those without degrees. Part of the perception gap may stem from misunderstandings about pricing: nearly half of US adults believe everyone pays the same tuition, when in fact fewer than 20% of families pay the official sticker price.