OrangeBot.AI Digest — 2025-10-07
57 headlines across 4 sources, aggregated for the day.
Hacker News(15)
- Gemini 2.5 Computer Use model (blog.google)
- ICE bought vehicles equipped with fake cell towers to spy on phones (techcrunch.com)
- Google's requirement for developers to be verified threatens app store F-Droid (www.techdirt.com)
- Solar energy is now the cheapest source of power, study (www.surrey.ac.uk)
- German government comes out against Chat Control (xcancel.com)
- Doing Rails Wrong (www.bananacurvingmachine.com)
- Robin Williams' daughter pleads for people to stop sending AI videos of her dad (www.bbc.co.uk)
- Police Said They Surveilled Woman Who Had an Abortion for Her 'Safety.' (www.404media.co)
- An illustrated introduction to linear algebra (www.ducktyped.org)
- Show HN: Timelinize – Privately organize your own data from everywhere, locally (timelinize.com)
- IKEA Catalogs 1951-2021 (ikeamuseum.com)
- No account? No Windows 11, Microsoft says as another loophole snaps shut (www.theregister.com)
- Qualcomm to acquire Arduino (www.qualcomm.com)
- Devpush – Open-source and self-hostable alternative to Vercel, Render, Netlify (github.com)
- Nobel Prize in Physics 2025 (www.nobelprize.org)
GitHub Trending(15)
- Stremio / stremio-web
Stremio - Freedom to Stream
- trycua / cua
Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).
- simstudioai / sim
Open-source platform to build and deploy AI agent workflows.
- Infisical / infisical
Infisical is the open-source platform for secrets management, PKI, and SSH access.
- BeehiveInnovations / zen-mcp-server
The power of Claude Code / GeminiCLI / CodexCLI + [Gemini / OpenAI / OpenRouter / Azure / Grok / Ollama / Custom Model / All Of The Above] working as one.
- FlowiseAI / Flowise
Build AI Agents, Visually
- aandrew-me / ytDownloader
Desktop App for downloading Videos and Audios from hundreds of sites
- dgtlmoon / changedetection.io
Best and simplest tool for website change detection, web page monitoring, and website change alerts. Perfect for tracking content changes, price drops, restock alerts, and website defacement monitoring—all for free or enjoy our SaaS plan!
- Flowseal / zapret-discord-youtube
- audacity / audacity
Audio Editor
- firefly-iii / firefly-iii
Firefly III: a personal finances manager
- openai / openai-agents-python
A lightweight, powerful framework for multi-agent workflows
- microsoft / BitNet
Official inference framework for 1-bit LLMs
- Morganamilo / paru
Feature packed AUR helper
- openemr / openemr
The most popular open source electronic health records and medical practice management solution.
Hugging Face(15)
- Paper2Video: Automatic Video Generation from Scientific Papers
Academic presentation videos have become an essential medium for research communication, yet producing them remains highly labor-intensive, often requiring hours of slide design, recording, and editing for a short 2- to 10-minute video. Unlike natural video, presentation video generation involves distinctive challenges: inputs from research papers, dense multi-modal information (text, figures, tables), and the need to coordinate multiple aligned channels such as slides, subtitles, speech, and a human talking head. To address these challenges, we introduce Paper2Video, the first benchmark of 101 research papers paired with author-created presentation videos, slides, and speaker metadata. We further design four tailored evaluation metrics (Meta Similarity, PresentArena, PresentQuiz, and IP Memory) to measure how videos convey the paper's information to the audience. Building on this foundation, we propose PaperTalker, the first multi-agent framework for academic presentation video generation. It integrates slide generation with layout refinement via a novel tree-search-based visual choice, cursor grounding, subtitling, speech synthesis, and talking-head rendering, while parallelizing slide-wise generation for efficiency. Experiments on Paper2Video demonstrate that the presentation videos produced by our approach are more faithful and informative than existing baselines, establishing a practical step toward automated and ready-to-use academic video generation. Our dataset, agent, and code are available at https://github.com/showlab/Paper2Video.
- Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Video understanding represents the most challenging frontier in computer vision, requiring models to reason about complex spatiotemporal relationships, long-term dependencies, and multimodal evidence. The recent emergence of Video-Large Multimodal Models (Video-LMMs), which integrate visual encoders with powerful decoder-based language models, has demonstrated remarkable capabilities in video understanding tasks. However, the critical phase that transforms these models from basic perception systems into sophisticated reasoning engines, post-training, remains fragmented across the literature. This survey provides the first comprehensive examination of post-training methodologies for Video-LMMs, encompassing three fundamental pillars: supervised fine-tuning (SFT) with chain-of-thought, reinforcement learning (RL) from verifiable objectives, and test-time scaling (TTS) through enhanced inference computation. We present a structured taxonomy that clarifies the roles, interconnections, and video-specific adaptations of these techniques, addressing unique challenges such as temporal localization, spatiotemporal grounding, long video efficiency, and multimodal evidence integration. Through systematic analysis of representative methods, we synthesize key design principles, insights, and evaluation protocols while identifying critical open challenges in reward design, scalability, and cost-performance optimization. We further curate essential benchmarks, datasets, and metrics to facilitate rigorous assessment of post-training effectiveness. This survey aims to provide researchers and practitioners with a unified framework for advancing Video-LMM capabilities. Additional resources and updates are maintained at: https://github.com/yunlong10/Awesome-Video-LMM-Post-Training
- VChain: Chain-of-Visual-Thought for Reasoning in Video Generation
Recent video generation models can produce smooth and visually appealing clips, but they often struggle to synthesize complex dynamics with a coherent chain of consequences. Accurately modeling visual outcomes and state transitions over time remains a core challenge. In contrast, large language and multimodal models (e.g., GPT-4o) exhibit strong visual state reasoning and future prediction capabilities. To bridge these strengths, we introduce VChain, a novel inference-time chain-of-visual-thought framework that injects visual reasoning signals from multimodal models into video generation. Specifically, VChain contains a dedicated pipeline that leverages large multimodal models to generate a sparse set of critical keyframes as snapshots, which are then used to guide the sparse inference-time tuning of a pre-trained video generator only at these key moments. Our approach is tuning-efficient, introduces minimal overhead and avoids dense supervision. Extensive experiments on complex, multi-step scenarios show that VChain significantly enhances the quality of generated videos.
- MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information
Tree search has become a representative framework for test-time reasoning with large language models (LLMs), exemplified by methods such as Tree-of-Thought and Monte Carlo Tree Search that explore multiple reasoning paths. However, it remains difficult to provide instant and reliable quantitative assessments of intermediate reasoning step quality, and extensive path exploration is computationally costly. To address this, we propose Mutual Information Tree Search (MITS), a novel framework that guides reasoning with information-theoretic principles. MITS introduces an effective scoring function based on pointwise mutual information (PMI), which enables step-wise evaluation of reasoning paths and search tree expansion via beam search without expensive look-ahead simulations, achieving superior reasoning performance while maintaining computational efficiency. The framework is complemented by an entropy-based dynamic sampling strategy that adaptively allocates computational resources to uncertain reasoning steps where exploration is most beneficial. For final prediction, MITS employs a weighted voting scheme that combines PMI scores with prediction consensus. Through comprehensive experiments on diverse reasoning benchmarks, MITS consistently surpasses baseline methods, establishing a principled and efficient framework for LLM reasoning.
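The PMI scoring in the MITS abstract can be illustrated with a toy computation. The conditioning below (scoring a step by how much likelier it becomes given the prompt) and all probabilities are assumptions for illustration, not the paper's exact formulation:

```python
import math

# Hedged sketch of a PMI-style step score (the paper's exact conditioning
# may differ): PMI(step; prompt) = log p(step | prompt) - log p(step).
def pmi(p_step_given_prompt: float, p_step: float) -> float:
    return math.log(p_step_given_prompt) - math.log(p_step)

# Candidate next steps with illustrative (conditional, marginal) probabilities.
candidates = {
    "expand the left-hand side": (0.20, 0.01),  # much likelier given the prompt
    "compute the derivative":    (0.10, 0.02),
    "hmm, let me think":         (0.05, 0.05),  # generic filler, PMI = 0
}

# Beam expansion keeps the highest-PMI steps, with no look-ahead rollouts.
ranked = sorted(candidates, key=lambda s: pmi(*candidates[s]), reverse=True)
print(ranked[0])  # "expand the left-hand side"
```

A step whose probability barely changes when the prompt is known scores near zero, which is why generic filler is pruned before expensive expansion.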
- Imperceptible Jailbreaking against Large Language Models
Jailbreaking attacks on the vision modality typically rely on imperceptible adversarial perturbations, whereas attacks on the textual modality are generally assumed to require visible modifications (e.g., non-semantic suffixes). In this paper, we introduce imperceptible jailbreaks that exploit a class of Unicode characters called variation selectors. By appending invisible variation selectors to malicious questions, the jailbreak prompts appear visually identical to original malicious questions on screen, while their tokenization is "secretly" altered. We propose a chain-of-search pipeline to generate such adversarial suffixes to induce harmful responses. Our experiments show that our imperceptible jailbreaks achieve high attack success rates against four aligned LLMs and generalize to prompt injection attacks, all without producing any visible modifications in the written prompt. Our code is available at https://github.com/sail-sg/imperceptible-jailbreaks.
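The mechanism in the abstract, appending Unicode variation selectors that render as nothing while changing the underlying code points, can be demonstrated in a few lines (the selector indices below are arbitrary; the paper searches for effective ones):

```python
# Unicode variation selectors (U+FE00..U+FE0F) are invisible on most
# displays, yet change a string's code-point and byte sequence, so a
# tokenizer sees a different input than the human reader does.
def append_invisible_suffix(text: str, selector_indices: list[int]) -> str:
    """Append variation selectors chosen by index (0-15) to `text`."""
    return text + "".join(chr(0xFE00 + i) for i in selector_indices)

original = "How do I make a cake?"
modified = append_invisible_suffix(original, [3, 7, 0, 12])

# The two strings look identical when printed...
print(original)
print(modified)
# ...but differ in length, so their tokenizations differ too.
print(len(original), len(modified))  # 21 vs 25 code points
print(original == modified)          # False
```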
- Hybrid Architectures for Language Models: Systematic Analysis and Design Insights
Recent progress in large language models demonstrates that hybrid architectures, which combine self-attention mechanisms with structured state space models like Mamba, can achieve a compelling balance between modeling quality and computational efficiency, particularly for long-context tasks. While these hybrid models show promising performance, systematic comparisons of hybridization strategies and analyses of the key factors behind their effectiveness have not been clearly shared with the community. In this work, we present a holistic evaluation of hybrid architectures based on inter-layer (sequential) or intra-layer (parallel) fusion. We evaluate these designs from a variety of perspectives: language modeling performance, long-context capabilities, scaling analysis, and training and inference efficiency. By investigating the core characteristics of their computational primitives, we identify the most critical elements for each hybridization strategy and further propose optimal design recipes for both hybrid models. Our comprehensive analysis provides practical guidance and valuable insights for developing hybrid language models, facilitating the optimization of architectural configurations.
- Optimal Scaling Needs Optimal Norm
Despite recent progress in optimal hyperparameter transfer under model and dataset scaling, no unifying explanatory principle has been established. Using the Scion optimizer, we discover that joint optimal scaling across model and dataset sizes is governed by a single invariant: the operator norm of the output layer. Across models with up to 1.3B parameters trained on up to 138B tokens, the optimal learning rate/batch size pair (η*, B*) consistently has the same operator norm value, a phenomenon we term norm transfer. This constant norm condition is necessary but not sufficient: while for each dataset size, multiple (η, B) pairs reach the optimal norm, only a unique (η*, B*) achieves the best loss. As a sufficient condition, we provide the first measurement of (η*, B*) scaling with dataset size for Scion, and find that the scaling rules are consistent with those of the Adam optimizer. Tuning per-layer-group learning rates also improves model performance, with the output layer being the most sensitive and hidden layers benefiting from lower learning rates. We provide practical insights on norm-guided optimal scaling and release our Distributed Scion (Disco) implementation with logs from over two thousand runs to support research on LLM training dynamics at scale.
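The invariant in the abstract is the operator norm of the output layer. As a minimal, generic illustration (Scion defines norms per layer type, which may differ from this), the spectral norm of a weight matrix, i.e. its largest singular value, can be estimated with power iteration:

```python
import math

# Power-iteration estimate of a matrix's operator (spectral) norm.
# This is the generic linear-algebra notion, used here only to make
# "operator norm of the output layer" concrete.
def spectral_norm(W, iters=100):
    rows, cols = len(W), len(W[0])
    v = [1.0] * cols
    for _ in range(iters):
        # One step of power iteration on W^T W, then renormalize v.
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in u))

W = [[3.0, 0.0], [0.0, 2.0]]
print(spectral_norm(W))  # largest singular value: 3.0
```

The claim of "norm transfer" is then that whichever (η*, B*) pair is optimal, this number for the output layer lands on the same value across scales.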
- Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
The Transformer architecture has become the de facto standard for Large Language Models (LLMs), demonstrating remarkable capabilities in language understanding and generation. However, its application in conversational AI is fundamentally constrained by its stateless nature and the quadratic computational complexity (O(L^2)) with respect to sequence length L. Current models emulate memory by reprocessing an ever-expanding conversation history with each turn, leading to prohibitive costs and latency in long dialogues. This paper introduces the Reactive Transformer (RxT), a novel architecture designed to overcome these limitations by shifting from a data-driven to an event-driven paradigm. RxT processes each conversational turn as a discrete event in real-time, maintaining context in an integrated, fixed-size Short-Term Memory (STM) system. The architecture features a distinct operational cycle where a generator-decoder produces a response based on the current query and the previous memory state, after which a memory-encoder and a dedicated Memory Attention network asynchronously update the STM with a representation of the complete interaction. This design fundamentally alters the scaling dynamics, reducing the total user-facing cost of a conversation from quadratic (O(N^2 · T)) to linear (O(N · T)) with respect to the number of interactions N. By decoupling response generation from memory updates, RxT achieves low latency, enabling truly real-time, stateful, and economically viable long-form conversations. We validated our architecture with a series of proof-of-concept experiments on synthetic data, demonstrating superior performance and constant-time inference latency compared to a baseline stateless model of comparable size.
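The quadratic-versus-linear claim is easy to check with back-of-the-envelope arithmetic. The turn count, tokens per turn, and memory size below are illustrative numbers, not figures from the paper:

```python
# Token-processing totals over a conversation: a stateless LLM re-reads
# the whole history each turn (~ O(N^2 * T)), while a fixed-size-memory
# model reads only the new turn plus a constant memory (~ O(N * T)).
def stateless_cost(n_turns: int, t: int) -> int:
    # Turn k re-processes all k*t tokens seen so far.
    return sum(k * t for k in range(1, n_turns + 1))

def fixed_memory_cost(n_turns: int, t: int, mem: int) -> int:
    # Each turn processes the new t tokens plus a constant-size memory.
    return n_turns * (t + mem)

N, T, MEM = 100, 500, 1000  # illustrative: 100 turns, 500 tokens/turn
print(stateless_cost(N, T))          # 2,525,000 tokens processed
print(fixed_memory_cost(N, T, MEM))  # 150,000 tokens processed
```

At 100 turns the stateless model already processes roughly 17x more tokens, and the gap widens linearly with every additional turn.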
- Fine-Tuning on Noisy Instructions: Effects on Generalization and Performance
Instruction-tuning plays a vital role in enhancing the task-solving abilities of large language models (LLMs), improving their usability in generating helpful responses on various tasks. However, previous work has demonstrated that they are sensitive to minor variations in instruction phrasing. In this paper, we explore whether introducing perturbations in instruction-tuning data can enhance LLMs' resistance against noisy instructions. We focus on how instruction-tuning with perturbations, such as removing stop words or shuffling words, affects LLMs' performance on the original and perturbed versions of widely-used benchmarks (MMLU, BBH, GSM8K). We further assess learning dynamics and potential shifts in model behavior. Surprisingly, our results suggest that instruction-tuning on perturbed instructions can, in some cases, improve downstream performance. These findings highlight the importance of including perturbed instructions in instruction-tuning, which can make LLMs more resilient to noisy user inputs.
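The two perturbations the abstract names, stop-word removal and word shuffling, can be sketched directly; the stop-word list here is a small assumption, not the paper's:

```python
import random

# The two instruction perturbations named in the abstract, as simple
# string transforms applied to instruction-tuning data.
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "in"}  # assumed list

def remove_stop_words(instruction: str) -> str:
    return " ".join(w for w in instruction.split() if w.lower() not in STOP_WORDS)

def shuffle_words(instruction: str, seed: int = 0) -> str:
    words = instruction.split()
    random.Random(seed).shuffle(words)  # seeded for reproducibility
    return " ".join(words)

inst = "Summarize the main idea of the passage in one sentence"
print(remove_stop_words(inst))  # "Summarize main idea passage one sentence"
print(shuffle_words(inst))      # same words, scrambled order
```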
- Factuality Matters: When Image Generation and Editing Meet Structured Visuals
While modern visual generation models excel at creating aesthetically pleasing natural images, they struggle with producing or editing structured visuals like charts, diagrams, and mathematical figures, which demand composition planning, text rendering, and multimodal reasoning for factual fidelity. To address this, we present the first comprehensive, systematic investigation of this domain, encompassing data construction, model training, and an evaluation benchmark. First, we construct a large-scale dataset of 1.3 million high-quality structured image pairs derived from executable drawing programs and augmented with chain-of-thought reasoning annotations. Building on it, we train a unified model that integrates a VLM with FLUX.1 Kontext via a lightweight connector for enhanced multimodal understanding. A three-stage training curriculum enables progressive feature alignment, knowledge infusion, and reasoning-augmented generation, further boosted by an external reasoner at inference time. Finally, we introduce StructBench, a novel benchmark for generation and editing with over 1,700 challenging instances, and an accompanying evaluation metric, StructScore, which employs a multi-round Q&A protocol to assess fine-grained factual accuracy. Evaluations of 15 models reveal that even leading closed-source systems remain far from satisfactory. Our model attains strong editing performance, and inference-time reasoning yields consistent gains across diverse architectures. By releasing the dataset, model, and benchmark, we aim to advance unified multimodal foundations for structured visuals.
- Judging with Confidence: Calibrating Autoraters to Preference Distributions
The alignment of large language models (LLMs) with human values increasingly relies on using other LLMs as automated judges, or "autoraters". However, their reliability is limited by a foundational issue: they are trained on discrete preference labels, forcing a single ground truth onto tasks that are often subjective, ambiguous, or nuanced. We argue that a reliable autorater must learn to model the full distribution of preferences defined by a target population. In this paper, we propose a general framework for calibrating probabilistic autoraters to any given preference distribution. We formalize the problem and present two learning methods tailored to different data conditions: 1) direct supervised fine-tuning for dense, probabilistic labels, and 2) a reinforcement learning approach for sparse, binary labels. Our empirical results show that fine-tuning autoraters with a distribution-matching objective leads to verbalized probability predictions that are better aligned with the target preference distribution, with improved calibration and significantly lower positional bias, all while preserving performance on objective tasks.
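The distribution-matching idea can be made concrete with cross-entropy against soft preference targets; the numbers and the 2-way setup below are illustrative assumptions, not the paper's training recipe:

```python
import math

# Cross-entropy of a predicted preference distribution against a soft
# target: training toward the population's 70/30 split rather than a
# hard 1/0 label rewards calibrated, not overconfident, predictions.
def soft_ce(pred: list[float], target: list[float]) -> float:
    return -sum(t * math.log(p) for t, p in zip(target, pred))

target = [0.7, 0.3]  # 70% of raters prefer response A
calibrated = [0.7, 0.3]
overconfident = [0.95, 0.05]

# The calibrated prediction achieves strictly lower loss than the
# overconfident one, since soft CE is minimized at pred == target.
print(soft_ce(calibrated, target) < soft_ce(overconfident, target))  # True
```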
- Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
The prevailing paradigm for enhancing the reasoning abilities of LLMs revolves around post-training on high-quality, reasoning-intensive data. While emerging literature suggests that reasoning data is increasingly incorporated during the mid-training stage as well (a practice that is relatively more proprietary and less openly characterized), the role of such data in pretraining remains unclear. In particular, due to the opaqueness of pretraining corpora in most frontier models, the effect of reasoning data introduced at different phases of pre- and/or post-training is relatively less reported in the scientific literature. This raises several important questions: Is adding reasoning data earlier during pretraining any better than introducing it during post-training? Could earlier inclusion risk overfitting and harm generalization, or instead establish durable foundations that later fine-tuning cannot recover? We conduct the first systematic study of how reasoning data, varying in scale, diversity, and quality, affects LLM performance when introduced at different stages of training. We find that front-loading reasoning data into pretraining is critical (19% avg gain), establishing foundational capabilities that cannot be fully replicated by later-stage SFT, even with more data. We uncover an asymmetric principle for optimal data allocation: pretraining benefits most from broad diversity in reasoning patterns (11% avg gain), while SFT is more sensitive to data quality (15% avg gain). We show that high-quality pretraining data has latent effects, activated only after SFT, and that naively scaling SFT data can be detrimental, washing away the benefits of early reasoning injection. Our results challenge the conventional separation of language modeling and reasoning, providing a principled guide for strategically allocating data across the entire training pipeline to build more capable models.
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Large language model (LLM) applications such as agents and domain-specific reasoning increasingly rely on context adaptation: modifying inputs with instructions, strategies, or evidence, rather than weight updates. Prior approaches improve usability but often suffer from brevity bias, which drops domain insights for concise summaries, and from context collapse, where iterative rewriting erodes details over time. Building on the adaptive memory introduced by Dynamic Cheatsheet, we introduce ACE (Agentic Context Engineering), a framework that treats contexts as evolving playbooks that accumulate, refine, and organize strategies through a modular process of generation, reflection, and curation. ACE prevents collapse with structured, incremental updates that preserve detailed knowledge and scale with long-context models. Across agent and domain-specific benchmarks, ACE optimizes contexts both offline (e.g., system prompts) and online (e.g., agent memory), consistently outperforming strong baselines: +10.6% on agents and +8.6% on finance, while significantly reducing adaptation latency and rollout cost. Notably, ACE can adapt effectively without labeled supervision, instead leveraging natural execution feedback. On the AppWorld leaderboard, ACE matches the top-ranked production-level agent on the overall average and surpasses it on the harder test-challenge split, despite using a smaller open-source model. These results show that comprehensive, evolving contexts enable scalable, efficient, and self-improving LLM systems with low overhead.
- Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
Reinforcement learning has been central to recent advances in large language model reasoning, but most algorithms rely on on-policy training that demands fresh rollouts at every update, limiting efficiency and scalability. Asynchronous RL systems alleviate this by decoupling rollout generation from training, yet their effectiveness hinges on tolerating large staleness in rollout data, a setting where existing methods either degrade in performance or collapse. We revisit this challenge and uncover a prosperity-before-collapse phenomenon: stale data can be as informative as on-policy data if exploited properly. Building on this insight, we introduce M2PO (Second-Moment Trust Policy Optimization), which constrains the second moment of importance weights to suppress only extreme outliers while preserving informative updates. Notably, M2PO sharply reduces the fraction of clipped tokens under high staleness (from 1.22% to 0.06% over training), precisely masking high-variance tokens while maintaining stable optimization. Extensive evaluation across six models (from 1.7B to 32B) and eight benchmarks shows that M2PO delivers stable off-policy training even with data stale by at least 256 model updates and matches on-policy performance.
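The second-moment constraint in M2PO can be sketched as a masking rule. This is a paraphrase of the idea in the abstract (drop only the most extreme importance weights until a second-moment budget is met), not the paper's exact algorithm, and the weights and budget are made up:

```python
# Paraphrased sketch of the M2PO idea: instead of clipping every token
# whose importance weight leaves a fixed range, mask only as many
# extreme-weight tokens as needed to bring the second moment of the
# weights under a trust budget.
def m2po_mask(weights, second_moment_budget):
    """Keep-mask over tokens: drop largest-|w| tokens until
    (sum of kept w^2) / (total tokens) <= budget."""
    keep = [True] * len(weights)
    order = sorted(range(len(weights)), key=lambda i: weights[i] ** 2, reverse=True)
    for i in order:
        second_moment = sum(w * w for w, k in zip(weights, keep) if k) / len(weights)
        if second_moment <= second_moment_budget:
            break
        keep[i] = False
    return keep

w = [1.0, 0.9, 1.1, 8.0, 1.0]  # one extreme off-policy outlier
print(m2po_mask(w, second_moment_budget=2.0))
# → [True, True, True, False, True]: only the outlier is masked
```

Near-on-policy tokens with weights close to 1 survive untouched, which matches the abstract's report that the fraction of clipped tokens stays tiny under high staleness.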
- Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training
Reinforcement learning applied to large language models (LLMs) for reasoning tasks is often bottlenecked by unstable gradient estimates due to fixed and uniform sampling of responses across prompts. Prior work such as GVM-RAFT addresses this by dynamically allocating inference budget per prompt to minimize stochastic gradient variance under a budget constraint. Inspired by this insight, we propose Reinforce-Ada, an adaptive sampling framework for online RL post-training of LLMs that continuously reallocates sampling effort to the prompts with the greatest uncertainty or learning potential. Unlike conventional two-stage allocation methods, Reinforce-Ada interleaves estimation and sampling in an online successive elimination process, and automatically stops sampling for a prompt once sufficient signal is collected. To stabilize updates, we form fixed-size groups with enforced reward diversity and compute advantage baselines using global statistics aggregated over the adaptive sampling phase. Empirical results across multiple model architectures and reasoning benchmarks show that Reinforce-Ada accelerates convergence and improves final performance compared to GRPO, especially when using the balanced sampling variant. Our work highlights the central role of variance-aware, adaptive data curation in enabling efficient and reliable reinforcement learning for reasoning-capable LLMs. Code is available at https://github.com/RLHFlow/Reinforce-Ada.
Solidot(12)
- Microsoft says it will keep making Xbox consoles
Microsoft recently raised prices again on the Xbox Series X and Series S and hiked the price of its Xbox Game Pass Ultimate subscription by 50%. The string of moves has left many pessimistic about the future of Microsoft's console business, and retailers including Costco have decided to pull Xbox products from shelves. Sony's PS5 will be followed by a PS6, but will there be a new Xbox after the Series X? Responding to rumors that it may abandon hardware, Microsoft issued a statement on Monday reaffirming that it remains committed to developing Xbox consoles and will continue to partner with AMD on hardware; both Microsoft's and Sony's current consoles use CPU and GPU designs supplied by AMD. Microsoft's plans for a first-party Xbox handheld have reportedly been canceled, allegedly because AMD's contract required sales of at least ten million units, while the Steam Deck has sold only 4-5 million units since its 2022 launch.
- Ubuntu Linux 26.04 LTS is codenamed Resolute Raccoon
With Ubuntu 25.10 about to be released, Canonical announced that the next LTS (long-term support) release, Ubuntu 26.04, is codenamed Resolute Raccoon. Ubuntu 25.10 is supported for only nine months, while Ubuntu 26.04 will be supported for five years and is expected to ship in April 2026. Headline features of Ubuntu 25.10 include Linux 6.17, GCC 15, the Rust-based system components sudo-rs and Rust Coreutils, and GNOME 49 as the default desktop. Details of Ubuntu 26.04 will emerge over the coming months.
- The 2025 Nobel Prize in Physics goes to three US quantum mechanics researchers
The 2025 Nobel Prize in Physics was awarded to the American scientists John Clarke, Michel H. Devoret, and John M. Martinis for their discovery of macroscopic quantum mechanical tunnelling and energy quantisation in an electric circuit. A central question in physics is the maximum size of a system that can display quantum mechanical effects. This year's laureates demonstrated both quantum tunnelling and quantised energy levels in an electrical circuit, a system large enough to hold in the hand. In 1984 and 1985, Clarke, Devoret, and Martinis ran a series of experiments with an electronic circuit built from superconductors, in which the superconducting components were separated by a thin layer of insulating material, a structure known as a Josephson junction. By refining and measuring the circuit's various properties, they were able to control and explore the phenomena that arose when current passed through it. The charged particles moving together through the superconductor form a system that behaves as if it were a single particle filling the entire circuit. This macroscopic particle-like system initially sits in a state in which current flows without any voltage, trapped as if behind an impassable barrier. In the experiment, the system revealed its quantum nature by tunnelling out of the zero-voltage state, a change detected through the appearance of a voltage.
- Removing the 50 most dangerous pieces of space junk would halve new debris
According to a study presented last week at the International Astronautical Congress in Sydney, removing the 50 most dangerous pieces of debris from low Earth orbit would cut the amount of newly generated debris roughly in half. Lead author Darren McKnight and colleagues calculated which objects in low orbit are most likely to collide with other debris and spawn more fragments. Of the 50 most dangerous pieces, 34 originate from Russia/the Soviet Union, 10 from China, 4 from the US, 2 from Europe, and 1 from Japan. Removing just the 10 most dangerous would still reduce new debris by 30%. McKnight notes that most space junk predates 2000: 76% of the 50 most dangerous pieces were left in orbit last century, and 88% are abandoned rocket bodies. The bad news is that since January 1, 2024, 26 rocket bodies have been abandoned in low Earth orbit that will remain there for more than 25 years; 21 of them were launched by China, with the other 5 from the US, Russia, India, and Iran. As China accelerates launches to deploy its Guowang and Qianfan constellations, each numbering thousands of satellites, the number of rocket bodies in low orbit is likely to keep growing. Since the Guowang and Qianfan launches began last year, China has left 9 rocket upper stages in orbit and may leave more than 100 in the future. A Chinese space agency official, however, says the agency is studying how to clean up orbital debris.
- What if the AI bubble bursts?
The US economy grew 1.6% in the first half of the year, and most of that growth came from investment in AI; without AI spending, growth would have been only about a third of that figure. The outsized economic weight of AI spending shows that Silicon Valley is betting, on an unprecedented scale, that the technology will transform every aspect of life and work. Tech giants Google, Meta, Microsoft, and Amazon are expected to invest close to $400 billion in data centers this year. If the bet fails, economic damage at that scale would extend far beyond Silicon Valley. Concern about a potential AI investment bubble is growing in both tech and finance circles. AI tools like ChatGPT are popular with businesses and consumers, and hundreds of billions of dollars have poured into the field over the past three years, but AI companies have yet to turn a profit, even though enormous profits are needed to justify the investment. Tech companies now dominate public markets, so any change in their results or share prices ripples through stock indexes, 401(k) retirement accounts, and the broader economy. The independent research firm MacroStrategy Partnership estimates that the AI bubble is 17 times the size of the dot-com bubble and 4 times the size of the subprime bubble. Never before has so much money been poured so quickly into a technology whose potential is enormous but whose profitable business model remains unproven.
- Astronomers find the strongest odd radio circle yet
Astronomers have discovered the most distant and radio-brightest "odd radio circle" (ORC) yet, a mysterious object that offers new clues about the interplay between galaxies and their central supermassive black holes. ORCs are enormous ring-shaped structures so far observed only at radio wavelengths. The first was found only six years ago, and just a handful have been confirmed in the observable universe, each more than ten times the size of our Milky Way. Their origin was originally thought to involve shock waves from galaxy mergers or collisions of supermassive black holes, but the new study proposes another explanation: the giant rings may form when spiral galaxies launch "superwind outflows". Driven by starburst activity, such superwinds can blow energy and matter out of a galaxy, even inflating vast radio bubbles; in some cases black hole activity may also contribute, making the outflow more violent. The ORC identified in the study, designated RAD J131346.9+500320, is so distant that its light corresponds to a time when the universe was only half its current age, making it the most distant and radio-brightest ORC known. More unusually, it has two interlocking rings; only two known ORCs show this double-ring structure. The observations suggest that ORCs may trace the co-evolution of galaxies and their supermassive black holes: vast plasma structures woven from black hole jets, galactic winds, and the surrounding environment.
- The 2025 Nobel Prize in Physiology or Medicine goes to immune system researchers
The 2025 Nobel Prize in Physiology or Medicine was awarded to the American scientists Mary E. Brunkow and Fred Ramsdell and the Japanese scientist Shimon Sakaguchi for their groundbreaking discoveries concerning peripheral immune tolerance, which prevents the immune system from harming the body. Our powerful immune system must be regulated, or it may attack our own organs. Every day it protects us from thousands of different microbes, which come in all shapes, many having evolved similarities to human cells as camouflage. So how does the immune system decide what to attack and what to defend? This year's laureates identified the immune system's "security guards", regulatory T cells, which stop immune cells from attacking our own bodies.
- How to stop AI from designing harmful proteins
AI-assisted protein engineering is enabling breakthroughs in protein design, but it also raises biosecurity challenges tied to the potential creation of harmful proteins. A necessary step in making a protein in the lab is ordering the DNA that encodes it. The companies that supply such synthetic nucleic acids screen customer orders with biosecurity screening software (BSS) designed to detect and block genes that could encode proteins of concern, but AI-designed amino acid sequences can differ enough from known threats to evade detection. In a study published in Science, researchers used an "AI red-teaming" approach to evaluate BSS models with the aim of improving them. Using open-source AI protein design software, they generated more than 75,000 dangerous protein variants and submitted them to four different BSS developers. They found that while all the tools performed near-perfectly when screening the original wild-type proteins, their ability to detect redesigned variants was inconsistent. The results show that although current BSS systems remain effective against unmodified sequences, they lack consistent sensitivity to protein-sequence homologs designed with modern generative AI methods. The researchers worked with the BSS vendors to develop software patches, which three of the four have deployed. The updates improved detection of AI-generated variants without a significant increase in false positives.
- Why women live longer than men
Women generally outlive men. Traditional explanations hold that men smoke more, drink more, and take more risks, but the gap exists in every country and every century, suggesting a deeper cause. A study published in Science Advances adds to the evidence that the phenomenon may be linked to women having two X chromosomes, with the redundant chromosome helping shield them from harmful mutations. The researchers analyzed lifespan data for 528 mammal species and 648 bird species kept in zoos and found that most mammals resemble humans: in nearly three-quarters of mammal species, females outlive males. Among birds the pattern flips, with males living longer in 68% of species, because in birds it is the female that carries two different sex chromosomes while the male carries a matched pair.
- Free Software Foundation marks 40 years, names Ian Kelling board president
The Free Software Foundation (FSF) celebrated its 40th anniversary and introduced the new president of its board of directors, Ian Kelling, to the free software community. Founded on October 4, 1985, the FSF promotes free software and stewards the GNU Project. Current board members include Christina Haralanova, Geoffrey Knauth (treasurer), Gerald J. Sussman, Ian Kelling, and founder Richard M. Stallman. Kelling, 43, has been a board and voting member since 2021 and is an active speaker and blogger; he says he will work to strengthen the FSF's ability to respond to new threats to computer users' freedom and to welcome more free software supporters into the movement than ever before.
- Greater Manchester Police suspends remote work after officers faked activity with auto-keypress tools
Greater Manchester Police, which employs 12,677 people, has suspended remote work after a recent investigation found officers using automated key-pressing tools to fake activity; 26 officers, staff, and contractors face misconduct proceedings. According to the investigation, one officer testified that a detective made his computer appear to be in use 38 times over 12 days. The evidence showed that for long stretches his only activity was a single repeated keystroke: between 10:28 and 11:56 GMT on December 3 he pressed the H key about 30 times and then the I key more than 16,000 times. Of 85 hours logged in, 45 involved automated key presses, meaning he was away from the keyboard for half his working time. The detective has since resigned.
- Opera launches a $19.90-a-month AI browser
Opera, not wanting to miss the AI boom, has launched an AI browser, Opera Neon, priced at $59.90 for the first 9 months and $19.90 per month thereafter. Opera Neon relies mainly on large models running in the cloud, and tasks are the browser's core concept: Neon uses AI to carry out tasks on the user's behalf. Opera says: "Neon acts on your instructions, opening tabs, doing research, finding the best prices, and assessing safety, whatever you need. It delivers results you can use, share, and build on." Another AI company, Perplexity, has also released its AI browser, Comet, which is free to use with an optional $5 payment for an AI news service.