OrangeBot.AI Digest — 2026-03-26

84 headlines across 6 sources, aggregated for the day.

Hacker News (15)

  1. We haven't seen the worst of what gambling and prediction markets will do (www.derekthompson.org)
  2. John Bradley, author of xv, has died (voxday.net)
  3. Why so many control rooms were seafoam green (2025) (bethmathews.substack.com)
  4. Olympic Committee bars transgender athletes from women’s events (www.nytimes.com)
  5. My minute-by-minute response to the LiteLLM malware attack (futuresearch.ai)
  6. AI users whose lives were wrecked by delusion (www.theguardian.com)
  7. Newly purchased Vizio TVs now require Walmart accounts to use smart features (arstechnica.com)
  8. Moving from GitHub to Codeberg, for lazy people (unterwaditzer.net)
  9. European Parliament decided that Chat Control 1.0 must stop (bsky.app)
  10. Landmark L.A. jury verdict finds Instagram, YouTube were designed to addict kids (www.latimes.com)
  11. End of "Chat Control": EU parliament stops mass surveillance (www.patrick-breyer.de)
  12. LibreOffice and the art of overreacting (blog.documentfoundation.org)
  13. From zero to a RAG system: successes and failures (en.andros.dev)
  14. Obsolete Sounds (citiesandmemory.com)
  15. Swift 6.3 (www.swift.org)

GitHub Trending (9)

  1. mvanhorn / last30days-skill
  2. Yeachan-Heo / oh-my-claudecode
  3. virattt / dexter
  4. ruvnet / RuView
  5. bytedance / deer-flow
  6. Vaibhavs10 / insanely-fast-whisper
  7. agentscope-ai / agentscope
  8. twentyhq / twenty
  9. datalab-to / chandra

Product Hunt (15)

  1. DenchClaw

    Open Source AI CRM hosted locally on your machine

  2. Listen To This

    Paste an article to listen to it in your podcast app

  3. Playcode

    The world's best AI website builder. 10 years in the making.

  4. Mokkit

    Turn any screenshot into a scroll-stopping animated visual

  5. Jentic Mini

    Give your AI agents safe access to 10,000+ APIs

  6. Spotify SongDNA

    The interactive creative network behind your favorite music

  7. Arm AGI CPU

    The world’s most efficient agentic CPU

  8. Appoval

    Ship to the App Store with confidence

  9. Dunky AI

    Practice your elevator pitch with Dunky AI

  10. Triqai

    Turn messy bank transactions into clean, structured data

  11. Linear Agent

    Synthesizes context, makes recommendations, and takes action

  12. Breadth Edits Beta v1.0

    Transforms prompts into professional, high-end motion edits

  13. Zcode

    Build iPhone and Mac native apps with LLMs

  14. Luzo

    Design and debug API workflows visually

  15. Zeus

    Highly autonomous agent for finishing complex, long tasks

Hugging Face (15)

  1. CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

    Computer-use agents (CUAs) hold great promise for automating complex desktop workflows, yet progress toward general-purpose agents is bottlenecked by the scarcity of continuous, high-quality human demonstration videos. Recent work emphasizes that continuous video, not sparse screenshots, is the critical missing ingredient for scaling these agents. However, the largest existing open dataset, ScaleCUA, contains only 2 million screenshots, equating to less than 20 hours of video. To address this bottleneck, we introduce CUA-Suite, a large-scale ecosystem of expert video demonstrations and dense annotations for professional desktop computer-use agents. At its core is VideoCUA, which provides approximately 10,000 human-demonstrated tasks across 87 diverse applications with continuous 30 fps screen recordings, kinematic cursor traces, and multi-layered reasoning annotations, totaling approximately 55 hours and 6 million frames of expert video. Unlike sparse datasets that capture only final click coordinates, these continuous video streams preserve the full temporal dynamics of human interaction, forming a superset of information that can be losslessly transformed into the formats required by existing agent frameworks. CUA-Suite further provides two complementary resources: UI-Vision, a rigorous benchmark for evaluating grounding and planning capabilities in CUAs, and GroundCUA, a large-scale grounding dataset with 56K annotated screenshots and over 3.6 million UI element annotations. Preliminary evaluation reveals that current foundation action models struggle substantially with professional desktop applications (~60% task failure rate). Beyond evaluation, CUA-Suite's rich multimodal corpus supports emerging research directions including generalist screen parsing, continuous spatial control, video-based reward modeling, and visual world models. All data and models are publicly released.

  2. EVA: Efficient Reinforcement Learning for End-to-End Video Agent

    Video understanding with multimodal large language models (MLLMs) remains challenging due to the long token sequences of videos, which contain extensive temporal dependencies and redundant frames. Existing approaches typically treat MLLMs as passive recognizers, processing entire videos or uniformly sampled frames without adaptive reasoning. Recent agent-based methods introduce external tools, yet still depend on manually designed workflows and perception-first strategies, resulting in inefficiency on long videos. We present EVA, an Efficient Reinforcement Learning framework for End-to-End Video Agent, which enables planning-before-perception through iterative summary-plan-action-reflection reasoning. EVA autonomously decides what to watch, when to watch, and how to watch, achieving query-driven and efficient video understanding. To train such agents, we design a simple yet effective three-stage learning pipeline - comprising supervised fine-tuning (SFT), Kahneman-Tversky Optimization (KTO), and Generalized Reward Policy Optimization (GRPO) - that bridges supervised imitation and reinforcement learning. We further construct high-quality datasets for each stage, supporting stable and reproducible training. We evaluate EVA on six video understanding benchmarks, demonstrating its comprehensive capabilities. Compared with existing baselines, EVA achieves a substantial improvement of 6-12% over general MLLM baselines and a further 1-3% gain over prior adaptive agent methods. Our code and model are available at https://github.com/wangruohui/EfficientVideoAgent.

  3. T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search

    While prior red-teaming efforts have focused on eliciting harmful text outputs from large language models (LLMs), such approaches fail to capture agent-specific vulnerabilities that emerge through multi-step tool execution, particularly in rapidly growing ecosystems such as the Model Context Protocol (MCP). To address this gap, we propose a trajectory-aware evolutionary search method, T-MAP, which leverages execution trajectories to guide the discovery of adversarial prompts. Our approach enables the automatic generation of attacks that not only bypass safety guardrails but also reliably realize harmful objectives through actual tool interactions. Empirical evaluations across diverse MCP environments demonstrate that T-MAP substantially outperforms baselines in attack realization rate (ARR) and remains effective against frontier models, including GPT-5.2, Gemini-3-Pro, Qwen3.5, and GLM-5, thereby revealing previously underexplored vulnerabilities in autonomous LLM agents.

  4. UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

    Autonomous mobile GUI agents have attracted increasing attention along with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still suffer from inefficient learning from failed trajectories and ambiguous credit assignment under sparse rewards for long-horizon GUI tasks. To that end, we propose UI-Voyager, a novel two-stage self-evolving mobile GUI agent. In the first stage, we employ Rejection Fine-Tuning (RFT), which enables the continuous co-evolution of data and models in a fully autonomous loop. The second stage introduces Group Relative Self-Distillation (GRSD), which identifies critical fork points in group rollouts and constructs dense step-level supervision from successful trajectories to correct failed ones. Extensive experiments on AndroidWorld show that our 4B model achieves an 81.0% Pass@1 success rate, outperforming numerous recent baselines and exceeding human-level performance. Ablation and case studies further verify the effectiveness of GRSD. Our method represents a significant leap toward efficient, self-evolving, and high-performance mobile GUI automation without expensive manual data annotation.

  5. Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

    Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening reasoning traces. However, in mathematical reasoning, we find that it can reduce response length while degrading performance. We trace this degradation to the suppression of epistemic verbalization - the model's expression of uncertainty during reasoning. Through controlled experiments varying conditioning context richness and task coverage, we show that conditioning the teacher on rich information suppresses uncertainty expression, enabling rapid in-domain optimization with limited task coverage but harming OOD performance, where unseen problems benefit from expressing uncertainty and adjusting accordingly. Across Qwen3-8B, DeepSeek-Distill-Qwen-7B, and Olmo3-7B-Instruct, we observe performance drops of up to 40%. Our findings highlight that exposing appropriate levels of uncertainty is crucial for robust reasoning and underscore the importance of optimizing reasoning behavior beyond merely reinforcing correct answer traces.

  6. GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents

    Multimodal LLMs are increasingly deployed as perceptual backbones for autonomous agents in 3D environments, from robotics to virtual worlds. These applications require agents to perceive rapid state changes, attribute actions to the correct entities, and reason about concurrent multi-agent behaviors from a first-person perspective, capabilities that existing benchmarks do not adequately evaluate. We introduce GameplayQA, a framework for evaluating agentic-centric perception and reasoning through video understanding. Specifically, we densely annotate multiplayer 3D gameplay videos at 1.22 labels/second, with time-synced, concurrent captions of states, actions, and events structured around a triadic system of Self, Other Agents, and the World, a natural decomposition for multi-agent environments. From these annotations, we refined 2.4K diagnostic QA pairs organized into three levels of cognitive complexity, accompanied by a structured distractor taxonomy that enables fine-grained analysis of where models hallucinate. Evaluation of frontier MLLMs reveals a substantial gap from human performance, with common failures in temporal and cross-video grounding, agent-role attribution, and handling the decision density of the game. We hope GameplayQA stimulates future research at the intersection of embodied AI, agentic perception, and world modeling.

  7. When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning

    Recent progress in multimodal large language models has led to strong performance on reasoning tasks, but these improvements largely rely on high-quality annotated data or teacher-model distillation, both of which are costly and difficult to scale. To address this, we propose an unsupervised self-evolution training framework for multimodal reasoning that achieves stable performance improvements without using human-annotated answers or external reward models. For each input, we sample multiple reasoning trajectories and jointly model their within-group structure. We use the Actor's self-consistency signal as a training prior, and introduce a bounded Judge-based modulation to continuously reweight trajectories of different quality. We further model the modulated scores as a group-level distribution and convert absolute scores into relative advantages within each group, enabling more robust policy updates. Trained with Group Relative Policy Optimization (GRPO) on unlabeled data, our method consistently improves reasoning performance and generalization on five mathematical reasoning benchmarks, offering a scalable path toward self-evolving multimodal models. The code is available at https://github.com/OPPO-Mente-Lab/LLM-Self-Judge.
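
    The group-relative step the abstract leans on is compact enough to sketch. Below is an illustrative Python fragment (hypothetical names; not the authors' released code, which lives at the GitHub link above) showing how absolute per-trajectory scores become within-group relative advantages in the GRPO style:

      import numpy as np

      def group_relative_advantages(scores: np.ndarray, eps: float = 1e-8) -> np.ndarray:
          # Normalize each trajectory's score against the mean and standard
          # deviation of its own sampled group, so the policy update depends on
          # relative quality within the group rather than on absolute reward.
          return (scores - scores.mean()) / (scores.std() + eps)

      # Four rollouts for one input, scored by a (Judge-modulated) self-consistency
      # signal; above-average rollouts receive positive advantage.
      scores = np.array([0.9, 0.4, 0.4, 0.1])
      print(group_relative_advantages(scores))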

  8. Understanding the Challenges in Iterative Generative Optimization with LLMs

    Generative optimization uses large language models (LLMs) to iteratively improve artifacts (such as code, workflows or prompts) using execution feedback. It is a promising approach to building self-improving agents, yet in practice remains brittle: despite active research, only 9% of surveyed agents used any automated optimization. We argue that this brittleness arises because, to set up a learning loop, an engineer must make "hidden" design choices: What can the optimizer edit and what is the "right" learning evidence to provide at each update? We investigate three factors that affect most applications: the starting artifact, the credit horizon for execution traces, and batching trials and errors into learning evidence. Through case studies in MLAgentBench, Atari, and BigBench Extra Hard, we find that these design decisions can determine whether generative optimization succeeds, yet they are rarely made explicit in prior work. Different starting artifacts determine which solutions are reachable in MLAgentBench, truncated traces can still improve Atari agents, and larger minibatches do not monotonically improve generalization on BBEH. We conclude that the lack of a simple, universal way to set up learning loops across domains is a major hurdle for productionization and adoption. We provide practical guidance for making these choices.

  9. SpectralSplats: Robust Differentiable Tracking via Spectral Moment Supervision

    3D Gaussian Splatting (3DGS) enables real-time, photorealistic novel view synthesis, making it a highly attractive representation for model-based video tracking. However, leveraging the differentiability of the 3DGS renderer "in the wild" remains notoriously fragile. A fundamental bottleneck lies in the compact, local support of the Gaussian primitives. Standard photometric objectives implicitly rely on spatial overlap; if severe camera misalignment places the rendered object outside the target's local footprint, gradients strictly vanish, leaving the optimizer stranded. We introduce SpectralSplats, a robust tracking framework that resolves this "vanishing gradient" problem by shifting the optimization objective from the spatial to the frequency domain. By supervising the rendered image via a set of global complex sinusoidal features (Spectral Moments), we construct a global basin of attraction, ensuring that a valid, directional gradient toward the target exists across the entire image domain, even when pixel overlap is completely nonexistent. To harness this global basin without introducing periodic local minima associated with high frequencies, we derive a principled Frequency Annealing schedule from first principles, gracefully transitioning the optimizer from global convexity to precise spatial alignment. We demonstrate that SpectralSplats acts as a seamless, drop-in replacement for spatial losses across diverse deformation parameterizations (from MLPs to sparse control points), successfully recovering complex deformations even from severely misaligned initializations where standard appearance-based tracking catastrophically fails.
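
    To make the spectral-moment idea concrete, here is a minimal numpy sketch (an illustration of the stated idea, not the paper's implementation; the frequency set and the squared-moment loss are assumptions) of supervising a rendered image through a few global complex sinusoids:

      import numpy as np

      def spectral_moments(img, freqs):
          # Project an H x W image onto global sinusoids exp(-i * (u*x + v*y)).
          # Each moment has support over the entire image, so the loss below keeps
          # a nonzero gradient even with no pixel overlap between render and target.
          h, w = img.shape
          ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
          return np.array([(img * np.exp(-1j * (u * xs + v * ys))).sum() for u, v in freqs])

      def spectral_loss(rendered, target, freqs):
          # Squared distance between spectral moments. A frequency-annealing
          # schedule would begin with low frequencies (a broad, near-convex basin)
          # and add higher ones as the alignment tightens.
          diff = spectral_moments(rendered, freqs) - spectral_moments(target, freqs)
          return float((np.abs(diff) ** 2).sum())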

  10. The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics

    While recent generative video models have achieved remarkable visual realism and are being explored as world models, true physical simulation requires mastering both space and time. Current models can produce visually smooth kinematics, yet they lack a reliable internal motion pulse to ground these motions in a consistent, real-world time scale. This temporal ambiguity stems from the common practice of indiscriminately training on videos with vastly different real-world speeds, forcing them into standardized frame rates. This leads to what we term chronometric hallucination: generated sequences exhibit ambiguous, unstable, and uncontrollable physical motion speeds. To address this, we propose Visual Chronometer, a predictor that recovers the Physical Frames Per Second (PhyFPS) directly from the visual dynamics of an input video. Trained via controlled temporal resampling, our method estimates the true temporal scale implied by the motion itself, bypassing unreliable metadata. To systematically quantify this issue, we establish two benchmarks, PhyFPS-Bench-Real and PhyFPS-Bench-Gen. Our evaluations reveal a harsh reality: state-of-the-art video generators suffer from severe PhyFPS misalignment and temporal instability. Finally, we demonstrate that applying PhyFPS corrections significantly improves the human-perceived naturalness of AI-generated videos. Our project page is https://xiangbogaobarry.github.io/Visual_Chronometer/.

  11. 4DGS360: 360° Gaussian Reconstruction of Dynamic Objects from a Single Video

    We introduce 4DGS360, a diffusion-free framework for 360° dynamic object reconstruction from casual monocular video. Existing methods often fail to reconstruct consistent 360° geometry, as their heavy reliance on 2D-native priors causes initial points to overfit to visible surfaces in each training view. 4DGS360 addresses this challenge through an advanced 3D-native initialization that mitigates the geometric ambiguity of occluded regions. Our proposed 3D tracker, AnchorTAP3D, produces reinforced 3D point trajectories by leveraging confident 2D track points as anchors, suppressing drift and providing reliable initialization that preserves geometry in occluded regions. This initialization, combined with optimization, yields coherent 360° 4D reconstructions. We further present iPhone360, a new benchmark where test cameras are placed up to 135° apart from training views, enabling 360° evaluation that existing datasets cannot provide. Experiments show that 4DGS360 achieves state-of-the-art performance on the iPhone360, iPhone, and DAVIS datasets, both qualitatively and quantitatively.

  12. CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare

    Multimodal agentic pipelines are transforming human-computer interaction by enabling efficient and accessible automation of complex, real-world tasks. However, recent efforts have focused on short-horizon or general-purpose applications (e.g., mobile or desktop interfaces), leaving long-horizon automation for domain-specific systems, particularly in healthcare, largely unexplored. To address this, we introduce CareFlow, a high-quality human-annotated benchmark comprising complex, long-horizon software workflows across medical annotation tools, DICOM viewers, EHR systems, and laboratory information systems. On this benchmark, existing vision-language models (VLMs) perform poorly, struggling with long-horizon reasoning and multi-step interactions in medical contexts. To overcome this, we propose CarePilot, a multi-agent framework based on the actor-critic paradigm. The Actor integrates tool grounding with dual-memory mechanisms (long-term and short-term experience) to predict the next semantic action from the visual interface and system state. The Critic evaluates each action, updates memory based on observed effects, and either executes or provides corrective feedback to refine the workflow. Through iterative agentic simulation, the Actor learns to perform more robust and reasoning-aware predictions during inference. Our experiments show that CarePilot achieves state-of-the-art performance, outperforming strong closed-source and open-source multimodal baselines by approximately 15.26% and 3.38%, respectively, on our benchmark and out-of-distribution dataset.

  13. Qworld: Question-Specific Evaluation Criteria for LLMs

    Evaluating large language models (LLMs) on open-ended questions is difficult because response quality depends on the question's context. Binary scores and static rubrics fail to capture these context-dependent requirements. Existing methods define criteria at the dataset level or generate them in a single pass, which limits their ability to explore the evaluation space implied by each question. We introduce One-Question-One-World (Qworld), a method that generates question-specific evaluation criteria using a recursive expansion tree. Given a question, Qworld decomposes it into scenarios, perspectives, and fine-grained binary criteria through structured hierarchical and horizontal expansion. The resulting criteria specify what a high-quality answer must address for that question. On HealthBench, Qworld covers 89% of expert-authored criteria and generates 79% novel criteria validated by human experts. Experts rate Qworld criteria higher in insight and granularity than those produced by prior methods. When applied to 11 frontier LLMs on HealthBench and Humanity's Last Exam, Qworld reveals capability differences in dimensions such as long-term impact, equity, error handling, and interdisciplinary reasoning that coarse rubrics do not distinguish. By formulating criteria generation as structured coverage of question-implied evaluation axes, Qworld enables evaluation that adapts to each question rather than relying on fixed task-level criteria.

  14. LagerNVS: Latent Geometry for Fully Neural Real-time Novel View Synthesis

    Recent work has shown that neural networks can perform 3D tasks such as Novel View Synthesis (NVS) without explicit 3D reconstruction. Even so, we argue that strong 3D inductive biases are still helpful in the design of such networks. We show this point by introducing LagerNVS, an encoder-decoder neural network for NVS that builds on '3D-aware' latent features. The encoder is initialized from a 3D reconstruction network pre-trained using explicit 3D supervision. This is paired with a lightweight decoder, and trained end-to-end with photometric losses. LagerNVS achieves state-of-the-art deterministic feed-forward Novel View Synthesis (including 31.4 PSNR on Re10k), with and without known cameras, renders in real time, generalizes to in-the-wild data, and can be paired with a diffusion decoder for generative extrapolation.

  15. OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning

    While proprietary systems such as Seedance-2.0 have achieved remarkable success in omni-capable video generation, open-source alternatives significantly lag behind. Most academic models remain heavily fragmented, and the few existing efforts toward unified video generation still struggle to seamlessly integrate diverse tasks within a single framework. To bridge this gap, we propose OmniWeaving, an omni-level video generation model featuring powerful multimodal composition and reasoning-informed capabilities. By leveraging a massive-scale pretraining dataset that encompasses diverse compositional and reasoning-augmented scenarios, OmniWeaving learns to temporally bind interleaved text, multi-image, and video inputs while acting as an intelligent agent to infer complex user intentions for sophisticated video creation. Furthermore, we introduce IntelligentVBench, the first comprehensive benchmark designed to rigorously assess next-level intelligent unified video generation. Extensive experiments demonstrate that OmniWeaving achieves SoTA performance among open-source unified models. The code and model will be made publicly available soon. Project Page: https://omniweaving.github.io.

Techmeme (15)

  1. OpenAI has surpassed $100M in annualized revenue from ChatGPT ads, has expanded to 600+ advertisers, and plans to launch self-serve advertiser access in April (Stephanie Palazzolo/The Information)

    Stephanie Palazzolo / The Information : OpenAI has surpassed $100M in annualized revenue from ChatGPT ads, has expanded to 600+ advertisers, and plans to launch self-serve advertiser access in April —  OpenAI has surpassed $100 million in annualized revenue from its ChatGPT ads business, six weeks after the pilot was announced, according to a spokesperson.

  2. Google releases new tools for its Gemini AI assistant that let users upload chat history and context from other AI apps, making it easier to switch from them (Natalie Lung/Bloomberg)

    Natalie Lung / Bloomberg : Google releases new tools for its Gemini AI assistant that let users upload chat history and context from other AI apps, making it easier to switch from them —  Google released new tools for its Gemini artificial intelligence assistant that will let users upload chat history and context from other AI apps …

  3. Apple discontinues the Mac Pro and says it has no plans to offer future Mac Pro hardware (Chance Miller/9to5Mac)

    Chance Miller / 9to5Mac : Apple discontinues the Mac Pro and says it has no plans to offer future Mac Pro hardware —  It's the end of an era: Apple has confirmed to 9to5Mac that the Mac Pro is being discontinued.  It is being removed from Apple's website as of Thursday afternoon.  The “buy” page on Apple's website …

  4. Sources: Apple granted out-of-cycle bonuses worth several hundred thousand dollars to iPhone hardware designers, as OpenAI and others poach its engineers (Mark Gurman/Bloomberg)

    Mark Gurman / Bloomberg : Sources: Apple granted out-of-cycle bonuses worth several hundred thousand dollars to iPhone hardware designers, as OpenAI and others poach its engineers —  Apple Inc. awarded rare bonuses to iPhone hardware designers this week, aiming to stem a wave of departures to AI startups like OpenAI that are building their own devices.

  5. X limits X Pro access to subscribers of the $40/month Premium+ plan without notifying users in advance; it was previously available in the $8/month Premium plan (Juli Clover/MacRumors)

    Juli Clover / MacRumors : X limits X Pro access to subscribers of the $40/month Premium+ plan without notifying users in advance; it was previously available in the $8/month Premium plan —  Social network X is now limiting X Pro access to customers who subscribe to the X Premium+ plan, which is priced at $40 per month (or $33/month when paid annually).

  6. Xona, which aims to build a commercial alternative to GPS by launching a low-Earth orbit satellite constellation, raised a $170M Series C (Sandra Erwin/SpaceNews)

    Sandra Erwin / SpaceNews : Xona, which aims to build a commercial alternative to GPS by launching a low-Earth orbit satellite constellation, raised a $170M Series C —  The company is scaling up production as it looks to build a 258-satellite network to provide positioning, navigation and timing services

  7. Meta plans to increase its investment in a data center in El Paso, Texas, to more than $10B, a significant rise from the initial $1.5B commitment (Riley Griffin/Bloomberg)

    Riley Griffin / Bloomberg : Meta plans to increase its investment in a data center in El Paso, Texas, to more than $10B, a significant rise from the initial $1.5B commitment —  Meta Platforms Inc. will spend more than $10 billion to develop a data center in El Paso, Texas, a jump from prior projections and the latest …

  8. Netflix raises US prices following a January 2025 hike; standard with ads rises $1 to $8.99/month; standard with no ads and premium rise $2 to $19.99 and $26.99 (Todd Spangler/Variety)

    Todd Spangler / Variety : Netflix raises US prices following a January 2025 hike; standard with ads rises $1 to $8.99/month; standard with no ads and premium rise $2 to $19.99 and $26.99 —  Netflix, for the second time in a little over a year, is raising prices for its three plans in the U.S. The new pricing …

  9. Sources: X let go of 20+ staffers in nontechnical roles ahead of a SpaceX IPO; X staff have been told to focus on growing revenue since xAI brought on a CRO (Wall Street Journal)

    Wall Street Journal : Sources: X let go of 20+ staffers in nontechnical roles ahead of a SpaceX IPO; X staff have been told to focus on growing revenue since xAI brought on a CRO —  Redundant roles have been removed as the social-media company tries to boost profit and integrate with Musk's space-exploration company

  10. Italy-based Subbyx, which builds infrastructure that lets businesses offer access-based subscriptions, raised a €30M Series A (Tamara Djurickovic/Tech.eu)

    Tamara Djurickovic / Tech.eu : Italy-based Subbyx, which builds infrastructure that lets businesses offer access-based subscriptions, raised a €30M Series A —  Subbyx will use the funding to expand its subscription infrastructure platform, supporting businesses in transitioning from ownership to access-based models …

  11. Fannie Mae will accept crypto-backed mortgages for the first time; Coinbase launches a mortgage product allowing buyers to pledge bitcoin or USDC as collateral (Wall Street Journal)

    Wall Street Journal : Fannie Mae will accept crypto-backed mortgages for the first time; Coinbase launches a mortgage product allowing buyers to pledge bitcoin or USDC as collateral —  New offering from Better Home & Finance and Coinbase allows home buyers to pledge bitcoin and other cryptocurrencies when making a down payment

  12. Sources: Apple plans to open up Siri to run any AI service via App Store apps in iOS 27, dropping ChatGPT as exclusive partner in Apple Intelligence and Siri (Mark Gurman/Bloomberg)

    Mark Gurman / Bloomberg : Sources: Apple plans to open up Siri to run any AI service via App Store apps in iOS 27, dropping ChatGPT as exclusive partner in Apple Intelligence and Siri —  Apple Inc. plans to open Siri to outside artificial intelligence assistants, a major move aimed at bolstering the iPhone as an AI platform.

  13. The China Computer Federation calls for a boycott of AI conference NeurIPS after organizers barred submissions from US-sanctioned companies like Huawei (Vincent Chow/South China Morning Post)

    Vincent Chow / South China Morning Post : The China Computer Federation calls for a boycott of AI conference NeurIPS after organizers barred submissions from US-sanctioned companies like Huawei —  Move to comply with US sanctions sparks backlash, with China's top computing body threatening to blacklist the AI conference

  14. Google expands Search Live, its AI conversational search feature previously limited to the US and India, to all languages and regions where AI Mode is available (Aisha Malik/TechCrunch)

    Aisha Malik / TechCrunch : Google expands Search Live, its AI conversational search feature previously limited to the US and India, to all languages and regions where AI Mode is available —  Google announced on Thursday that it's expanding its AI-powered conversational search feature, Search Live …

  15. Meta stock fell 8% on Thursday after juries in two US trials found the company failed to adequately warn or protect young users (Reuters)

    Reuters : Meta stock fell 8% on Thursday after juries in two US trials found the company failed to adequately warn or protect young users —  Meta Platforms (META.O) shares dropped 7% on Thursday after two verdicts holding it liable for harm to young users sparked fears the social media giant …

Solidot (15)

  1. Why Sora failed: up to $15 million a day in inference costs against $2.1 million in total revenue

    Sora was once seen as the future of video, yet it became one of the few products OpenAI has ever shut down. Many lamented its demise, but the data show it was doomed: its economics were unsustainable. At its peak, Sora's inference costs ran as high as $15 million per day, implying total server spending of potentially several billion dollars a year, while the app's cumulative revenue stands at $2.1 million, essentially zero relative to its spending. Sora's active user base was also far smaller than that of OpenAI's chatbot ChatGPT: Sora saw 3.33 million downloads on iOS and Google Play in November 2025, but by February 2026 downloads had fallen to roughly 1.1 million, a third of the peak. Monthly active users peaked in December 2025 and have declined since; users are leaving, not arriving.

  2. Automotive sodium-ion battery can fully charge in 11 minutes

    After CATL and Changan launched the first mass-produced EV equipped with a sodium-ion battery, BAIC announced its Aurora sodium-ion battery prototype. Sodium-ion batteries cost less than lithium-ion and are less sensitive to raw-material prices; CATL, BYD, and others are betting on sodium-ion to hedge against rising lithium prices. BAIC says its battery operates stably from -40°C to 60°C, retains more than 92% of its energy at -20°C, reaches a cell energy density of 170 Wh/kg, and charges in just 11 minutes; a car carrying it achieves a CLTC range of up to 450 km.

  3. Student persuades university to set up a Tor relay

    Su En-Li (NZ), a student in the Department of Computer Science and Information Engineering at National Taiwan Normal University, persuaded the school to set up its first on-campus Tor relay. Unlike an exit node, a Tor relay only forwards encrypted traffic and never delivers content directly to users, so the risk is lower. Working through open, formal administrative channels, emailing the university's network administrators, professors, and department chair, Su established the first Tor relay inside TANet, Taiwan's tightly managed academic network. Su also organized a series of events to help people understand that an anonymity network is not a tool for crime.

  4. Meta and YouTube found negligent in social media addiction case

    In a landmark US social media addiction case, a jury of seven women and five men found Meta and YouTube negligent, concluding that addictive designs widely used on social media, such as infinite scrolling and algorithmic recommendations, harmed a young user and caused her mental health problems. The jury awarded $6 million in damages, $4.2 million from Meta and $1.8 million from YouTube. The verdict could pave the way for more lawsuits against social media companies. The case was brought by 20-year-old K.G.M., who accused the companies of building products as addictive as cigarettes or an online casino. K.G.M. sued Meta, owner of Instagram and Facebook, and Google's YouTube, alleging that features such as infinite scrolling and algorithmic recommendations left her anxious and depressed. TikTok and Snap were also defendants but settled with the plaintiff before trial on undisclosed terms. The verdict may force social media companies to change their products.

  5. New Zealand's health ministry warns staff not to write clinical notes with generative AI

    New Zealand's health ministry has warned staff not to use generative AI tools such as ChatGPT, Claude, or Gemini to write clinical notes at work, saying doing so could lead to formal disciplinary action. According to a memo issued by officials, free AI tools such as ChatGPT, Claude, and Gemini are strictly prohibited for clinical use on data security, privacy, and accountability grounds. Even with patient information anonymized, staff may not use AI tools to draft notes and then transcribe them into handwritten or printed records. The Public Service Association, which represents health workers, said clinical staff are turning to AI tools because they are under enormous pressure, and that threatening disciplinary action is the wrong approach.

  6. Canadian immigration agency rejects application over AI-fabricated job description

    Canada's immigration agency has been using generative AI to help process applications, and the AI inevitably hallucinated, rejecting a scientist's immigration application on the basis of a fabricated job description. The applicant does medical research in Canada and holds a PhD in the immunology of aging, but the AI described her work as "connecting and assembling control circuits, assembling control and robotics panels, and programming and troubleshooting." The agency rejected the application on the grounds that those duties did not match her claimed Canadian work experience. Her lawyer was stunned. The agency insists that generative AI was not involved in the decision and that the final call was made by a human officer after review.

  7. CERN scientists transport antimatter for the first time

    On Tuesday, a research team at CERN (the European Organization for Nuclear Research) loaded 92 antiprotons into a purpose-built bottle that traps them with magnetic fields. A truck carrying the bottle then drove for 30 minutes across the CERN campus outside Geneva, Switzerland, covering more than 8 km at a top speed of 42 km/h. CERN is the only place in the world that can produce antiprotons in quantity, and the experiment's ultimate goal is to move antiprotons somewhere free of experimental noise so they can be studied more precisely. Antimatter is matter's equal and opposite counterpart: when the two meet they annihilate, converting entirely into energy, which makes antimatter extremely difficult to store or move. CERN produces antimatter by slamming a proton beam into a block of dense metal, then using electric and magnetic fields to slow and capture the resulting antiprotons; the process is difficult, and most of the particles are lost along the way. The team built a portable particle trap in which the particles never touch the container's matter-filled walls, which means powering a superconducting magnet system and cooling it cryogenically to 4 kelvin (-269°C). The bottle must be kept under a very strict vacuum so the antimatter never meets a stray matter particle and annihilates in transit, and all the equipment has to withstand the forces of a truck ride. The team also installed a detector so the antiprotons could be checked on from the driver's seat.

  8. FreeCAD v1.1 released

    The FreeCAD project has released v1.1. Major new features include transparent previews in Part Design, interactive dragging for tools such as Fillet and Chamfer, three-point lighting, a Clarify Selection tool, Assembly and FEM improvements and animations, an all-new CAM tool library system, and more.

  9. Krita 5.3.0 and 6.0.0 released

    The open-source painting program Krita has released versions 5.3.0 and 6.0.0. The two are functionally almost identical; the difference is that 6.0.0 is the first release built on Qt 6 and is still experimental, so it is not recommended for everyday use. Major changes include on-canvas editing, full OpenType support, text fitted into shapes, a fill tool that can close gaps, a faster liquify mode in the transform tool, new filters such as Propagate Colors and Reset Transparency, improved HDR painting, and more.

  10. Beavers can turn rivers into carbon-storing wetlands

    Beavers are renowned as nature's dam engineers: they dam rivers and streams, creating wetlands and improving the surrounding environment. In a study published in Communications Earth & Environment, European researchers analyzed a beaver wetland in Switzerland and found that it stores roughly 98 tonnes of carbon per year, with a sequestration ceiling of 1,194 tonnes. The researchers monitored the wetland for a year, measuring water flow, dissolved carbon in the water, releases of greenhouse gases such as carbon dioxide and methane, and plant growth, and they sampled and analyzed the sediment and deadwood that had accumulated since the beavers arrived. When a dam slows the water, sediment begins to settle; the organic matter in it, such as leaves and soil, contains carbon that is buried in the wetland soil rather than washed downstream. Beaver dams also raise the water level and flood existing vegetation; trees die and sink into the water, where they lock up carbon slowly over a long period. New generations of plants and algae growing in the wetland absorb atmospheric carbon as well. Over time, the wetland gradually becomes a natural carbon storage system.

  11. National Security Law implementation rules give police power to obtain phone and computer passwords

    The implementation rules for Article 43 of Hong Kong's National Security Law empower police to require specified persons to provide the passwords or decryption methods for electronic devices; failure to comply is punishable by a fine of HK$100,000 and one year in prison. Secretary for Security Chris Tang responded that officers must apply to a court for a warrant on national security grounds before they can gather evidence. If someone refuses to provide a device password, he said, it is as if, when police search a home under a warrant, a person inside "holds the door shut" to keep them out, "so having a penalty is also entirely reasonable." Asked whether forgetting a password amounts to refusing to provide a decryption method, Tang said it cannot be generalized and depends on officers' assessment of the person's words and conduct: if the person had used the phone several times that very day, there would be no reason for them to suddenly forget the password, but if records show they last touched the device a year or three years earlier, a forgotten password might be plausible.

  12. Wine 11's NTSYNC kernel module dramatically boosts Windows game performance on Linux

    Wine 11, released earlier this year, introduced a new NTSYNC kernel driver that implements the Windows NT synchronization primitives. These were previously implemented via a user-space RPC whose overhead had become a growing performance bottleneck; NTSYNC implements the primitives in the kernel, dramatically improving Windows game performance on Linux. Developer tests show Dirt 3 jumping from 110.6 FPS to 860.7 FPS, a 678% improvement; Resident Evil 2 from 26 FPS to 77 FPS; Call of Juarez from 99.8 FPS to 224.1 FPS; and Tiny Tina's Wonderlands from 130 FPS to 360 FPS, while Call of Duty: Black Ops I now runs smoothly on Linux. The main beneficiaries of NTSYNC are heavily multithreaded games where synchronization overhead is the bottleneck.

  13. Rubies found on Mars for the first time

    Scientists at the US Los Alamos National Laboratory have for the first time found a gem-grade mineral in a Martian pebble, with a composition almost identical to ruby on Earth. The Martian rubies consist of corundum, whose main component is aluminum oxide and whose hardness is second only to diamond. Corundum itself is colorless and transparent, but trace elements give it different faces: a touch of chromium makes a vivid ruby; iron or titanium yields a sapphire as deep as the sea; with no coloring elements, it is pure white corundum. The red gems were an accidental find by the Perseverance rover while it explored Jezero Crater. During an analysis of a rock dubbed "Hampden River," the rover's SuperCam laser instrument used two detection methods, ablating the rock surface with a laser and exciting it to luminesce, with two cameras capturing the spectra. Both measurements showed a mineral composition strikingly similar to terrestrial ruby, hinting at tiny corundum grains inside. This is the first gemstone scientists have found on Mars, and its formation mechanism may be entirely different from Earth's: on Earth, corundum is mostly tied to plate tectonic activity, requiring an unusual low-silicon, high-aluminum environment, while Mars has no plate motion. Scientists speculate these corundum grains were produced by ancient meteorite impacts, the moment of impact heating or compressing Martian surface dust with enormous heat and pressure and forging these tiny gems.

  14. Alibaba unveils RISC-V server chip optimized for running domestic LLMs

    Alibaba has released the XuanTie C950, a RISC-V server chip optimized for running domestic large language models, with native support for hundred-billion-parameter models such as Qwen3 and DeepSeek V3. Alibaba says the C950's single-core general-purpose performance exceeds 70 points on the SPECint2006 benchmark, a new world record for RISC-V. Google researcher Laurie Kirk said the C950's performance is roughly on par with Apple's M1 chip from 2020. The C950 implements RISC-V RVA v23.1, published in 2025, and is fabricated on a 5 nm process.

  15. LiteLLM on PyPI compromised with malicious code

    The LiteLLM project maintainer's account was hijacked, and the attackers published two releases with embedded malicious code, v1.82.7 and v1.82.8, to the PyPI repository. The malicious code was designed to steal credentials, including SSH keys, cloud service credentials, and crypto wallets. Anyone who installed the malicious versions should immediately check whether they have been compromised. The malicious file litellm_init.pth runs automatically every time a Python process starts. The project maintainers said the account takeover stemmed from the newly disclosed trivy vulnerability; the malicious releases have been removed from PyPI, and credentials on all maintainer accounts have been changed.
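
    As a first triage step for affected environments, a check along these lines may help (an illustrative sketch, not official remediation guidance from the LiteLLM maintainers; a real audit must also cover the SSH keys, cloud credentials, and wallets the payload targeted):

      import sysconfig
      from importlib import metadata
      from pathlib import Path

      BAD_VERSIONS = {"1.82.7", "1.82.8"}  # the two trojaned releases, since pulled from PyPI

      try:
          v = metadata.version("litellm")
          if v in BAD_VERSIONS:
              print(f"WARNING: litellm {v} is a known compromised release")
          else:
              print(f"litellm {v}: not one of the known-bad releases")
      except metadata.PackageNotFoundError:
          print("litellm is not installed in this environment")

      # The payload was dropped as litellm_init.pth; .pth files are processed by
      # the site module at interpreter startup, which is how it ran automatically.
      for key in ("purelib", "platlib"):
          pth = Path(sysconfig.get_paths()[key]) / "litellm_init.pth"
          if pth.exists():
              print(f"WARNING: found {pth} - possible compromise, rotate credentials")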