OrangeBot.AI Digest — 2026-03-16

87 headlines across 6 sources, aggregated for the day.

Hacker News (15)

  1. Leanstral: Open-Source foundation for trustworthy vibe-coding (mistral.ai)
  2. Palestinian boy, 12, describes how Israeli forces killed his family in car (www.bbc.com)
  3. Meta’s renewed commitment to jemalloc (engineering.fb.com)
  4. AirPods Max 2 (www.apple.com)
  5. The “small web” is bigger than you might think (kevinboone.me)
  6. US Job Market Visualizer (karpathy.ai)
  7. 'Pokémon Go' players unknowingly trained delivery robots with 30B images (www.popsci.com)
  8. Obsession with growth is destroying nature, 150 countries warn (www.politico.eu)
  9. My Journey to a reliable and enjoyable locally hosted voice assistant (2025) (community.home-assistant.io)
  10. MoD sources warn Palantir role at heart of government is threat to UK security (www.thenerve.news)
  11. Why I love FreeBSD (it-notes.dragas.net)
  12. Corruption erodes social trust more in democracies than in autocracies (www.frontiersin.org)
  13. Polymarket gamblers threaten to kill me over Iran missile story (www.timesofisrael.com)
  14. Ask HN: What is it like being in a CS major program these days?
  15. Home Assistant waters my plants (finnian.io)

GitHub Trending (12)

  1. 666ghj / MiroFish
  2. thedotmack / claude-mem
  3. Crosstalk-Solutions / project-nomad
  4. obra / superpowers
  5. abhigyanpatwari / GitNexus
  6. lightpanda-io / browser
  7. volcengine / OpenViking
  8. shareAI-lab / learn-claude-code
  9. p-e-w / heretic
  10. langchain-ai / deepagents
  11. YishenTu / claudian
  12. voidzero-dev / vite-plus

Product Hunt (15)

  1. GLM-5-Turbo

    High-speed agentic model built specifically for OpenClaw

  2. Masko Code

    A mascot that watches Claude Code for you

  3. MuleRun

    Raise an AI that actually learns how you work

  4. Donely

    Your own OpenClaw instance for $0/mo + free AI usage offer

  5. JetBrains Air

    Run Codex, Claude Agents, Gemini CLI, and Junie side by side

  6. Faces

    Interactive presentations that use the full power of the web

  7. Adaptive — The Agent Computer

    The computer for AI to get things done

  8. GitFit.AI

    Track any nutrient, habit, or activity daily with AI

  9. Knock

    Knock on your MacBook to control your Mac

  10. ZeroSettle

    Drop-in direct billing SDK to skip the 30% Apple Tax

  11. Wendi AI

    The AI OS for people who manage people

  12. Spott

    Spott is the AI-native ATS & CRM for recruiting firms

  13. FnKey

    macOS dictation with Deepgram stream

  14. Glam AI

    Pick a trend, add your photo, and create viral content

  15. Refgrow 2.0

    Grow your revenue with referrals

Hugging Face (15)

  1. LMEB: Long-horizon Memory Embedding Benchmark

    Memory embeddings are crucial for memory-augmented systems, such as OpenClaw, but their evaluation is underexplored in current text embedding benchmarks, which narrowly focus on traditional passage retrieval and fail to assess models' ability to handle long-horizon memory retrieval tasks involving fragmented, context-dependent, and temporally distant information. To address this, we introduce the Long-horizon Memory Embedding Benchmark (LMEB), a comprehensive framework that evaluates embedding models' capabilities in handling complex, long-horizon memory retrieval tasks. LMEB spans 22 datasets and 193 zero-shot retrieval tasks across 4 memory types: episodic, dialogue, semantic, and procedural, with both AI-generated and human-annotated data. These memory types differ in terms of level of abstraction and temporal dependency, capturing distinct aspects of memory retrieval that reflect the diverse challenges of the real world. We evaluate 15 widely used embedding models, ranging from hundreds of millions to ten billion parameters. The results reveal that (1) LMEB provides a reasonable level of difficulty; (2) Larger models do not always perform better; (3) LMEB and MTEB exhibit orthogonality. This suggests that the field has yet to converge on a universal model capable of excelling across all memory retrieval tasks, and that performance in traditional passage retrieval may not generalize to long-horizon memory retrieval. In summary, by providing a standardized and reproducible evaluation framework, LMEB fills a crucial gap in memory embedding evaluation, driving further advancements in text embedding for handling long-term, context-dependent memory retrieval. LMEB is available at https://github.com/KaLM-Embedding/LMEB.
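
    As a concrete illustration of the setup the abstract describes, the sketch below scores zero-shot memory retrieval with plain embedding similarity: embed a query and a bank of stored memories, rank by cosine similarity, and report recall@k. The embed() stub and all data are invented stand-ins, not LMEB's actual harness.

      # Minimal zero-shot retrieval scoring, in the style an embedding
      # benchmark uses. embed() is a random stand-in for a real encoder.
      import numpy as np

      rng = np.random.default_rng(0)

      def embed(texts):
          """Stand-in encoder: random unit vectors, one per text."""
          vecs = rng.standard_normal((len(texts), 64))
          return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

      def recall_at_k(query_vecs, memory_vecs, gold_idx, k=5):
          sims = query_vecs @ memory_vecs.T          # cosine (unit vectors)
          topk = np.argsort(-sims, axis=1)[:, :k]    # best k memories per query
          return float(np.mean([g in row for g, row in zip(gold_idx, topk)]))

      memories = [f"memory {i}" for i in range(100)]
      queries = ["what did we decide last week?", "where was the meeting?"]
      gold = [3, 42]                                 # hypothetical relevant items
      print(recall_at_k(embed(queries), embed(memories), gold, k=5))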

  2. Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

    A recent cutting-edge topic in multimodal modeling is to unify visual comprehension and generation within a single model. However, the two tasks demand mismatched decoding regimes and visual representations, making it non-trivial to jointly optimize within a shared feature space. In this work, we present Cheers, a unified multimodal model that decouples patch-level details from semantic representations, thereby stabilizing semantics for multimodal understanding and improving fidelity for image generation via gated detail residuals. Cheers includes three key components: (i) a unified vision tokenizer that encodes and compresses image latent states into semantic tokens for efficient LLM conditioning, (ii) an LLM-based Transformer that unifies autoregressive decoding for text generation and diffusion decoding for image generation, and (iii) a cascaded flow matching head that decodes visual semantics first and then injects semantically gated detail residuals from the vision tokenizer to refine high-frequency content. Experiments on popular benchmarks demonstrate that Cheers matches or surpasses advanced UMMs in both visual understanding and generation. Cheers also achieves 4x token compression, enabling more efficient high-resolution image encoding and generation. Notably, Cheers outperforms the Tar-1.5B on the popular benchmarks GenEval and MMBench, while requiring only 20% of the training cost, indicating effective and efficient (i.e., 4x token compression) unified multimodal modeling. We will release all code and data for future research.
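
    The "semantically gated detail residual" can be pictured with a few lines of array math: a decoded semantic feature map is refined by high-frequency detail features, scaled by a gate conditioned on the semantics. The shapes and the sigmoid gating form below are assumptions for illustration; the paper's cascaded flow-matching head is considerably more involved.

      # Toy gated detail injection: refined = semantic + gate(semantic) * detail.
      import numpy as np

      def sigmoid(x):
          return 1.0 / (1.0 + np.exp(-x))

      h, w, c = 16, 16, 8
      semantic = np.random.randn(h, w, c)     # low-frequency semantic decode
      detail = np.random.randn(h, w, c)       # patch-level high-frequency residual
      gate_w = np.random.randn(c, c) * 0.1    # hypothetical learned gate weights

      gate = sigmoid(semantic @ gate_w)       # gate conditioned on semantics
      refined = semantic + gate * detail      # inject gated detail residual
      print(refined.shape)                    # (16, 16, 8)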

  3. Can Vision-Language Models Solve the Shell Game?

    Visual entity tracking is an innate cognitive ability in humans, yet it remains a critical bottleneck for Vision-Language Models (VLMs). This deficit is often obscured in existing video benchmarks by visual shortcuts. We introduce VET-Bench, a synthetic diagnostic testbed featuring visually identical objects that necessitate tracking exclusively through spatiotemporal continuity. Our experiments reveal that current state-of-the-art VLMs perform at or near chance level on VET-Bench, exposing a fundamental limitation: an over-reliance on static frame-level features and a failure to maintain entity representations over time. We provide a theoretical analysis drawing connections to the state-tracking problem, proving that fixed-depth transformer-based VLMs are fundamentally limited in tracking indistinguishable objects without intermediate supervision due to expressivity constraints. To address this, we propose Spatiotemporal Grounded Chain-of-Thought (SGCoT): generating object trajectories as explicit intermediate states. Leveraging Molmo2's object tracking ability, we elicit SGCoT reasoning by fine-tuning on synthesized text-only data for alignment. Our method achieves state-of-the-art accuracy exceeding 90% on VET-Bench, demonstrating that VLMs can reliably solve the video shell-game task end-to-end without external tools. Our code and data are available at https://vetbench.github.io .
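
    Why the shell game reduces to state tracking: when objects are visually identical, identity can only be chained through positions frame by frame, which is what SGCoT's explicit trajectories make the model write down. Below is a minimal greedy nearest-neighbor baseline over toy 2D positions, an illustration of the tracking problem rather than the paper's method.

      # Greedy nearest-neighbor identity tracking for indistinguishable objects.
      import numpy as np

      def track(frames):
          """frames: list of (n, 2) position arrays; returns ids per frame."""
          n = len(frames[0])
          ids = np.arange(n)
          history = [ids.copy()]
          for prev, cur in zip(frames, frames[1:]):
              dists = np.linalg.norm(prev[:, None] - cur[None, :], axis=-1)
              assign = np.full(n, -1)
              for _ in range(n):            # match closest unmatched pair first
                  i, j = np.unravel_index(np.argmin(dists), dists.shape)
                  assign[j] = ids[i]
                  dists[i, :] = np.inf
                  dists[:, j] = np.inf
              ids = assign
              history.append(ids.copy())
          return np.array(history)

      # Two identical cups cross in separate lanes; positions are listed
      # left to right each frame, so identity must be carried through time.
      t = 1 / 3
      frames = [np.array([[0.0, 0.3], [1.0, -0.3]]),
                np.array([[t, 0.3], [2 * t, -0.3]]),
                np.array([[t, -0.3], [2 * t, 0.3]]),
                np.array([[0.0, -0.3], [1.0, 0.3]])]
      print(track(frames))    # last row [1 0]: the cups have swapped ends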

  4. daVinci-Env: Open SWE Environment Synthesis at Scale

    Training capable software engineering (SWE) agents demands large-scale, executable, and verifiable environments that provide dynamic feedback loops for iterative code editing, test execution, and solution refinement. However, existing open-source datasets remain limited in scale and repository diversity, while industrial solutions are opaque with unreleased infrastructure, creating a prohibitive barrier for most academic research groups. We present OpenSWE, the largest fully transparent framework for SWE agent training in Python, comprising 45,320 executable Docker environments spanning over 12.8k repositories, with all Dockerfiles, evaluation scripts, and infrastructure fully open-sourced for reproducibility. OpenSWE is built through a multi-agent synthesis pipeline deployed across a 64-node distributed cluster, automating repository exploration, Dockerfile construction, evaluation script generation, and iterative test analysis. Beyond scale, we propose a quality-centric filtering pipeline that characterizes the inherent difficulty of each environment, filtering out instances that are either unsolvable or insufficiently challenging and retaining only those that maximize learning efficiency. With $891K spent on environment construction and an additional $576K on trajectory sampling and difficulty-aware curation, the entire project represents a total investment of approximately $1.47 million, yielding about 13,000 curated trajectories from roughly 9,000 quality-guaranteed environments. Extensive experiments validate OpenSWE's effectiveness: OpenSWE-32B and OpenSWE-72B achieve 62.4% and 66.0% on SWE-bench Verified, establishing SOTA among Qwen2.5-series models. Moreover, SWE-focused training yields substantial out-of-domain improvements, including up to 12 points on mathematical reasoning and 5 points on science benchmarks, without degrading factual recall.
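
    The verifiable primitive such pipelines automate can be sketched with the plain Docker CLI: build the synthesized Dockerfile, run the evaluation command inside the image, and treat exit codes as the signal. Paths, the tag, and the test command below are hypothetical; the paper's multi-agent pipeline layers repository exploration, script generation, and iterative test analysis on top of this loop.

      # Build an environment image and verify it by running its test command.
      import subprocess

      def build_and_verify(repo_dir, tag, test_cmd):
          build = subprocess.run(["docker", "build", "-t", tag, repo_dir],
                                 capture_output=True, text=True)
          if build.returncode != 0:
              return False                  # unbuildable: reject the environment
          run = subprocess.run(["docker", "run", "--rm", tag, "sh", "-c", test_cmd],
                               capture_output=True, text=True)
          return run.returncode == 0        # tests pass: environment is usable

      if __name__ == "__main__":
          ok = build_and_verify("./some-repo", "swe-env:demo", "pytest -q")
          print("environment verified" if ok else "rejected")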

  5. OmniForcing: Unleashing Real-time Joint Audio-Visual Generation

    Recent joint audio-visual diffusion models achieve remarkable generation quality but suffer from high latency due to their bidirectional attention dependencies, hindering real-time applications. We propose OmniForcing, the first framework to distill an offline, dual-stream bidirectional diffusion model into a high-fidelity streaming autoregressive generator. However, naively applying causal distillation to such dual-stream architectures triggers severe training instability, due to the extreme temporal asymmetry between modalities and the resulting token sparsity. We address the inherent information density gap by introducing an Asymmetric Block-Causal Alignment with a zero-truncation Global Prefix that prevents multi-modal synchronization drift. The gradient explosion caused by extreme audio token sparsity during the causal shift is further resolved through an Audio Sink Token mechanism equipped with an Identity RoPE constraint. Finally, a Joint Self-Forcing Distillation paradigm enables the model to dynamically self-correct cumulative cross-modal errors from exposure bias during long rollouts. Empowered by a modality-independent rolling KV-cache inference scheme, OmniForcing achieves state-of-the-art streaming generation at ~25 FPS on a single GPU, maintaining multi-modal synchronization and visual quality on par with the bidirectional teacher. Project page: https://omniforcing.com
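
    The "rolling KV-cache" and "sink token" ingredients follow a general streaming-decoder pattern that is easy to sketch: old cache entries are evicted FIFO once a window fills, while designated sink entries are never evicted. Window size and shapes below are illustrative, not the paper's configuration.

      # Rolling KV cache with a persistent sink entry (e.g. an audio sink token).
      from collections import deque
      import numpy as np

      class RollingKVCache:
          def __init__(self, window):
              self.sink = []                       # never evicted
              self.buf = deque(maxlen=window)      # FIFO eviction

          def append(self, kv, is_sink=False):
              (self.sink if is_sink else self.buf).append(kv)

          def context(self):
              return np.stack(self.sink + list(self.buf))

      cache = RollingKVCache(window=4)
      cache.append(np.zeros(8), is_sink=True)      # sink survives all evictions
      for step in range(10):
          cache.append(np.full(8, float(step)))
      print(cache.context()[:, 0])                 # [0. 6. 7. 8. 9.]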

  6. Multimodal OCR: Parse Anything from Documents

    We present Multimodal OCR (MOCR), a document parsing paradigm that jointly parses text and graphics into unified textual representations. Unlike conventional OCR systems that focus on text recognition and leave graphical regions as cropped pixels, our method, termed dots.mocr, treats visual elements such as charts, diagrams, tables, and icons as first-class parsing targets, enabling systems to parse documents while preserving semantic relationships across elements. It offers several advantages: (1) it reconstructs both text and graphics as structured outputs, enabling more faithful document reconstruction; (2) it supports end-to-end training over heterogeneous document elements, allowing models to exploit semantic relations between textual and visual components; and (3) it converts previously discarded graphics into reusable code-level supervision, unlocking multimodal supervision embedded in existing documents. To make this paradigm practical at scale, we build a comprehensive data engine from PDFs, rendered webpages, and native SVG assets, and train a compact 3B-parameter model through staged pretraining and supervised fine-tuning. We evaluate dots.mocr from two perspectives: document parsing and structured graphics parsing. On document parsing benchmarks, it ranks second only to Gemini 3 Pro on our OCR Arena Elo leaderboard, surpasses existing open-source document parsing systems, and sets a new state of the art of 83.9 on olmOCR Bench. On structured graphics parsing, dots.mocr achieves higher reconstruction quality than Gemini 3 Pro across image-to-SVG benchmarks, demonstrating strong performance on charts, UI layouts, scientific figures, and chemical diagrams. These results show a scalable path toward building large-scale image-to-code corpora for multimodal pretraining. Code and models are publicly available at https://github.com/rednote-hilab/dots.mocr.

  7. MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning

    Multimodal Large Language Models (MLLMs) are increasingly used to carry out visual workflows such as navigating GUIs, where the next step depends on verified visual compositional conditions (e.g., "if a permission dialog appears and the color of the interface is green, click Allow") and the process may branch or terminate early. Yet this capability remains under-evaluated: existing benchmarks focus on shallow compositions or independent constraints rather than deeply chained compositional conditionals. In this paper, we introduce MM-CondChain, a benchmark for visually grounded deep compositional reasoning. Each benchmark instance is organized as a multi-layer reasoning chain, where every layer contains a non-trivial compositional condition grounded in visual evidence and built from multiple objects, attributes, or relations. To answer correctly, an MLLM must perceive the image in detail, reason over multiple visual elements at each step, and follow the resulting execution path to the final outcome. To scalably construct such workflow-style data, we propose an agentic synthesis pipeline: a Planner orchestrates layer-by-layer generation of compositional conditions, while a Verifiable Programmatic Intermediate Representation (VPIR) ensures each layer's condition is mechanically verifiable. A Composer then assembles these verified layers into complete instructions. Using this pipeline, we construct benchmarks across three visual domains: natural images, data charts, and GUI trajectories. Experiments on a range of MLLMs show that even the strongest model attains only 53.33 Path F1, with sharp drops on hard negatives and as depth or predicate complexity grows, confirming that deep compositional reasoning remains a fundamental challenge.
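
    Concretely, a "deeply chained compositional conditional" is a small program: each layer checks a condition built from several visual facts, and the verified result at each layer decides the branch taken or an early exit. The toy chain below, with invented fact names, shows the shape of what the VPIR makes mechanically checkable.

      # Toy VPIR-style conditional chain over a dict of detected visual facts.
      def run_chain(facts):
          # Layer 1: compositional condition over an object and an attribute.
          if facts["dialog_visible"] and facts["dialog_color"] == "green":
              action = "click_allow"
          else:
              return "abort"                      # early termination branch
          # Layer 2: compositional condition over the post-action state.
          if facts["toast_count"] >= 2 and facts["toast_text"] == "saved":
              return action + " -> open_settings"
          return action + " -> done"

      facts = {"dialog_visible": True, "dialog_color": "green",
               "toast_count": 2, "toast_text": "saved"}
      print(run_chain(facts))   # click_allow -> open_settings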

  8. Visual-ERM: Reward Modeling for Visual Equivalence

    Vision-to-code tasks require models to reconstruct structured visual inputs, such as charts, tables, and SVGs, into executable or structured representations with high visual fidelity. While recent Large Vision Language Models (LVLMs) achieve strong results via supervised fine-tuning, reinforcement learning remains challenging due to misaligned reward signals. Existing rewards either rely on textual rules or coarse visual embedding similarity, both of which fail to capture fine-grained visual discrepancies and are vulnerable to reward hacking. We propose Visual Equivalence Reward Model (Visual-ERM), a multimodal generative reward model that provides fine-grained, interpretable, and task-agnostic feedback to evaluate vision-to-code quality directly in the rendered visual space. Integrated into RL, Visual-ERM improves Qwen3-VL-8B-Instruct by +8.4 on chart-to-code and yields consistent gains on table and SVG parsing (+2.7, +4.1 on average), and further strengthens test-time scaling via reflection and revision. We also introduce VisualCritic-RewardBench (VC-RewardBench), a benchmark for judging fine-grained image-to-image discrepancies on structured visual data, where Visual-ERM at 8B decisively outperforms Qwen3-VL-235B-Instruct and approaches leading closed-source models. Our results suggest that fine-grained visual reward supervision is both necessary and sufficient for vision-to-code RL, regardless of task specificity.
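
    The pipeline being rewarded is render-and-compare: execute the predicted code, rasterize it, and score it against the reference in visual space. The sketch below uses a toy one-command renderer and a crude pixel-L1 "judge"; note that this coarse similarity is exactly the kind of signal the paper replaces with a learned generative reward model.

      # Render predicted code and score visual agreement with a reference.
      import numpy as np

      def render(code):
          """Toy renderer: code is 'rect x y w h' on a 32x32 canvas."""
          img = np.zeros((32, 32))
          _, x, y, w, h = code.split()
          img[int(y):int(y) + int(h), int(x):int(x) + int(w)] = 255.0
          return img

      def reward(pred_code, ref_img):
          try:
              pred_img = render(pred_code)
          except Exception:
              return 0.0                    # unexecutable code earns nothing
          return float(1.0 - np.abs(pred_img - ref_img).mean() / 255.0)

      ref = render("rect 8 8 16 16")
      print(reward("rect 8 8 16 16", ref))  # 1.0: visually equivalent
      print(reward("rect 0 0 16 16", ref))  # lower: same shape, wrong place
      print(reward("not even code", ref))   # 0.0: fails to execute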

  9. Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

    Online Video Large Language Models (VideoLLMs) play a critical role in supporting responsive, real-time interaction. Existing methods focus on streaming perception, lacking a synchronized logical reasoning stream. However, directly applying test-time scaling methods incurs unacceptable response latency. To address this trade-off, we propose Video Streaming Thinking (VST), a novel paradigm for streaming video understanding. It supports a thinking while watching mechanism, which activates reasoning over incoming video clips during streaming. This design improves timely comprehension and coherent cognition while preserving real-time responsiveness by amortizing LLM reasoning latency over video playback. Furthermore, we introduce a comprehensive post-training pipeline that integrates VST-SFT, which structurally adapts the offline VideoLLM to causal streaming reasoning, and VST-RL, which provides end-to-end improvement through self-exploration in a multi-turn video interaction environment. Additionally, we devise an automated training-data synthesis pipeline that uses video knowledge graphs to generate high-quality streaming QA pairs, with an entity-relation grounded streaming Chain-of-Thought to enforce multi-evidence reasoning and sustained attention to the video stream. Extensive evaluations show that VST-7B performs strongly on online benchmarks, e.g. 79.5% on StreamingBench and 59.3% on OVO-Bench. Meanwhile, VST remains competitive on offline long-form or reasoning benchmarks. Compared with Video-R1, VST responds 15.7 times faster and achieves +5.4% improvement on VideoHolmes, demonstrating higher efficiency and strong generalization across diverse video understanding tasks. Code, data, and models will be released at https://github.com/1ranGuan/VST.
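
    The latency argument is amortization: if reasoning over clip t runs while clip t+1 is still streaming in, the thinking time is hidden under playback rather than paid after the question arrives. A threading sketch with stub timings (not the paper's implementation):

      # Think about each clip in the background while the stream keeps playing.
      import queue, threading, time

      clips = queue.Queue()
      notes = []                            # running clip-level thoughts

      def thinker():
          while (clip := clips.get()) is not None:
              time.sleep(0.05)              # stand-in for per-clip reasoning
              notes.append(f"thought about {clip}")

      worker = threading.Thread(target=thinker)
      worker.start()
      for i in range(5):                    # clips arrive at playback rate
          clips.put(f"clip-{i}")
          time.sleep(0.06)                  # playback interval > reasoning time
      clips.put(None)
      worker.join()
      print(notes)                          # thinking finished during playback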

  10. V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration

    Large-scale video generative models are trained on vast and diverse visual data, enabling them to internalize rich structural, semantic, and dynamic priors of the visual world. While these models have demonstrated impressive generative capability, their potential as general-purpose visual learners remains largely untapped. In this work, we introduce V-Bridge, a framework that bridges this latent capacity to versatile few-shot image restoration tasks. We reinterpret image restoration not as a static regression problem, but as a progressive generative process, and leverage video models to simulate the gradual refinement from degraded inputs to high-fidelity outputs. Surprisingly, with only 1,000 multi-task training samples (less than 2% of existing restoration methods), pretrained video models can be induced to perform competitive image restoration, achieving multiple tasks with a single model, rivaling specialized architectures designed explicitly for this purpose. Our findings reveal that video generative models implicitly learn powerful and transferable restoration priors that can be activated with only extremely limited data, challenging the traditional boundary between generative modeling and low-level vision, and opening a new design paradigm for foundation models in visual tasks.

  11. HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios

    The rapid evolution of embodied agents has accelerated the deployment of household robots in real-world environments. However, unlike structured industrial settings, household spaces introduce unpredictable safety risks, where system limitations such as perception latency and lack of common sense knowledge can lead to dangerous errors. Current safety evaluations, often restricted to static images, text, or general hazards, fail to adequately benchmark dynamic unsafe action detection in these specific contexts. To bridge this gap, we introduce HomeSafe-Bench, a challenging benchmark designed to evaluate Vision-Language Models (VLMs) on unsafe action detection in household scenarios. HomeSafe-Bench is constructed via a hybrid pipeline combining physical simulation with advanced video generation and features 438 diverse cases across six functional areas with fine-grained multidimensional annotations. Beyond benchmarking, we propose Hierarchical Dual-Brain Guard for Household Safety (HD-Guard), a hierarchical streaming architecture for real-time safety monitoring. HD-Guard coordinates a lightweight FastBrain for continuous high-frequency screening with an asynchronous large-scale SlowBrain for deep multimodal reasoning, effectively balancing inference efficiency with detection accuracy. Evaluations demonstrate that HD-Guard achieves a superior trade-off between latency and performance, while our analysis identifies critical bottlenecks in current VLM-based safety detection.
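
    The FastBrain/SlowBrain coordination reduces to a two-tier monitor: a cheap check runs on every frame, and only flagged frames are escalated to an asynchronous expensive check so the screening loop never blocks. Both check functions below are invented stand-ins for the lightweight and large models.

      # Two-tier safety monitor: cheap screen per frame, deep check async.
      from concurrent.futures import ThreadPoolExecutor
      import time

      def fast_screen(frame):
          return frame % 3 == 0             # cheap heuristic: flag some frames

      def slow_reason(frame):
          time.sleep(0.1)                   # stand-in for deep multimodal reasoning
          return f"frame {frame}: unsafe action confirmed"

      with ThreadPoolExecutor(max_workers=1) as slow_brain:
          pending = [slow_brain.submit(slow_reason, f)
                     for f in range(10) if fast_screen(f)]
          for fut in pending:
              print(fut.result())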

  12. From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space

    Group Relative Policy Optimization (GRPO) has emerged as a powerful framework for preference alignment in text-to-image (T2I) flow models. However, we observe that the standard paradigm, in which a group of generated samples is evaluated against a single condition, suffers from insufficient exploration of inter-sample relationships, constraining both alignment efficacy and performance ceilings. To address this sparse single-view evaluation scheme, we propose Multi-View GRPO (MV-GRPO), a novel approach that enhances relationship exploration by augmenting the condition space to create a dense multi-view reward mapping. Specifically, for a group of samples generated from one prompt, MV-GRPO leverages a flexible Condition Enhancer to generate semantically adjacent yet diverse captions. These captions enable multi-view advantage re-estimation, capturing diverse semantic attributes and providing richer optimization signals. By deriving the probability distribution of the original samples conditioned on these new captions, we can incorporate them into the training process without costly sample regeneration. Extensive experiments demonstrate that MV-GRPO achieves superior alignment performance over state-of-the-art methods.
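
    The mechanics are easy to state numerically: standard GRPO normalizes each sample's reward within its group under one condition, while the multi-view variant re-scores the same samples under several adjacent captions and averages the per-view advantages. The rewards below are made up for illustration.

      # Single-view vs. multi-view group-relative advantages.
      import numpy as np

      def group_advantages(rewards):
          return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

      # rewards[v, i] = reward of sample i judged under caption view v
      rewards = np.array([[0.9, 0.4, 0.6, 0.2],   # original prompt
                          [0.8, 0.5, 0.7, 0.1],   # adjacent caption 1
                          [0.7, 0.3, 0.8, 0.2]])  # adjacent caption 2

      print(group_advantages(rewards[0]).round(2))    # sparse single view
      print(np.mean([group_advantages(r) for r in rewards], axis=0).round(2))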

  13. HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration

    Diffusion models have demonstrated a remarkable ability in Text-to-Image (T2I) generation applications. Despite the advanced generation output, they suffer from heavy computation overhead, especially for large models that contain tens of billions of parameters. Prior work has illustrated that replacing part of the denoising steps with a smaller model still maintains the generation quality. However, these methods only focus on saving computation for some timesteps, ignoring the difference in compute demand within one timestep. In this work, we propose HybridStitch, a new T2I generation paradigm that treats generation like editing. Specifically, we introduce a hybrid stage that jointly incorporates both the large model and the small model. HybridStitch separates the entire image into two regions: one that is relatively easy to render, enabling an early transition to the smaller model, and another that is more complex and therefore requires refinement by the large model. HybridStitch employs the small model to construct a coarse sketch while exploiting the large model to edit and refine the complex regions. According to our evaluation, HybridStitch achieves a 1.83× speedup on Stable Diffusion 3, faster than all existing mixture-of-models methods.
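
    The hybrid stage hinges on a per-region routing decision, which a few lines of array code make concrete: an easy/hard mask keeps the small model's coarse output where rendering is simple and takes the large model's refinement where it is not. Both "models" and the complexity measure below are stubs; the real method applies this split across diffusion timesteps.

      # Stitch small-model and large-model outputs by region complexity.
      import numpy as np

      rng = np.random.default_rng(1)
      coarse = rng.random((8, 8))            # small model's cheap sketch (stub)
      refined = coarse + 0.1                 # large model's refinement (stub)

      gy, gx = np.gradient(coarse)           # local detail as a complexity proxy
      mag = np.hypot(gx, gy)
      hard = mag > np.median(mag)            # hard regions go to the large model

      out = np.where(hard, refined, coarse)  # stitched result
      print(f"{hard.mean():.0%} of pixels routed to the large model")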

  14. VQQA: An Agentic Approach for Video Evaluation and Quality Improvement

    Despite rapid advancements in video generation models, aligning their outputs with complex user intent remains challenging. Existing test-time optimization methods are typically either computationally expensive or require white-box access to model internals. To address this, we present VQQA (Video Quality Question Answering), a unified, multi-agent framework generalizable across diverse input modalities and video generation tasks. By dynamically generating visual questions and using the resulting Vision-Language Model (VLM) critiques as semantic gradients, VQQA replaces traditional, passive evaluation metrics with human-interpretable, actionable feedback. This enables a highly efficient, closed-loop prompt optimization process via a black-box natural language interface. Extensive experiments demonstrate that VQQA effectively isolates and resolves visual artifacts, substantially improving generation quality in just a few refinement steps. Applicable to both text-to-video (T2V) and image-to-video (I2V) tasks, our method achieves absolute improvements of +11.57% on T2V-CompBench and +8.43% on VBench2 over vanilla generation, significantly outperforming state-of-the-art stochastic search and prompt optimization techniques.
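
    The closed loop runs entirely through a black-box text interface: generate, ask targeted visual questions about the output, and fold failed checks back into the prompt. The sketch below wires that loop with toy callables standing in for the generator, the question-writing agent, and the VLM critic.

      # Critique-driven prompt refinement loop with stub components.
      def refine(prompt, generate, ask_questions, check, steps=3):
          for _ in range(steps):
              video = generate(prompt)
              failed = [q for q in ask_questions(prompt, video)
                        if not check(video, q)]
              if not failed:
                  break                            # all visual checks pass
              prompt += " " + "; ".join(f"ensure: {q}" for q in failed)
          return prompt

      final = refine(
          "a kite over the sea",
          generate=lambda p: set(p.split()),       # "video" = its attributes
          ask_questions=lambda p, v: ["red", "sea", "kite"],
          check=lambda video, q: q in video,       # toy visual question answering
      )
      print(final)    # the failed check is folded in: ... ensure: red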

  15. EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery

    The increasing adoption of Large Language Models (LLMs) has enabled AI scientists to perform complex end-to-end scientific discovery tasks requiring coordination of specialized roles, including idea generation and experimental execution. However, most state-of-the-art AI scientist systems rely on static, hand-designed pipelines and fail to adapt based on accumulated interaction histories. As a result, these systems overlook promising research directions, repeat failed experiments, and pursue infeasible ideas. To address this, we introduce EvoScientist, an evolving multi-agent AI scientist framework that continuously improves research strategies through persistent memory and self-evolution. EvoScientist comprises three specialized agents: a Researcher Agent (RA) for scientific idea generation, an Engineer Agent (EA) for experiment implementation and execution, and an Evolution Manager Agent (EMA) that distills insights from prior interactions into reusable knowledge. EvoScientist contains two persistent memory modules: (i) an ideation memory, which summarizes feasible research directions from top-ranked ideas while recording previously unsuccessful directions; and (ii) an experimentation memory, which captures effective data processing and model training strategies derived from code search trajectories and best-performing implementations. These modules enable the RA and EA to retrieve relevant prior strategies, improving idea quality and code execution success rates over time. Experiments show that EvoScientist outperforms 7 open-source and commercial state-of-the-art systems in scientific idea generation, achieving higher novelty, feasibility, relevance, and clarity via automatic and human evaluation. EvoScientist also substantially improves code execution success rates through multi-agent evolution, demonstrating persistent memory's effectiveness for end-to-end scientific discovery.

Techmeme (15)

  1. Austin-based Ironlight, which is building a regulated marketplace for tokenized securities, raised a $21M Series A; its platform received FINRA approval in 2025 (Ryan Lawler/Axios)

    Ryan Lawler / Axios: Ironlight, which is building a regulated marketplace for tokenized securities, raised $21 million in Series A funding …

  2. Manus introduces My Computer, a desktop application that enables its AI agent to interact directly with the user's local files, tools, and applications (Manus)

    Manus: Until today, Manus has lived entirely in the cloud. The cloud sandbox has served Manus well. Inside an isolated, secure environment …

  3. Roche says it has deployed 3,500+ Nvidia Blackwell GPUs, which it calls "the greatest announced GPU footprint available to a pharmaceutical company" (Sebastian Moss/DatacenterDynamics)

    Sebastian Moss / DatacenterDynamics: Building on a previous Genentech-Nvidia partnership, the fifth-largest pharmaceutical company in the world has deployed more than 3,500 Nvidia Blackwell GPUs.

  4. Nvidia says BYD, Geely, Isuzu, and Nissan will use its Drive Hyperion AV platform, and that Uber will launch Hyperion-powered robotaxis across 28 cities by 2028 (Andrew J. Hawkins/The Verge)

    Andrew J. Hawkins / The Verge: The chipmaker has been at the center of simmering trade tensions between the US and China.

  5. Nvidia unveils Space-1 Vera Rubin for orbital data centers, saying its GPU delivers up to 25x more AI compute for space-based inferencing compared to the H100 (Sebastian Moss/DatacenterDynamics)

    Sebastian Moss / DatacenterDynamics: Nvidia has developed a space-specific module of its Vera Rubin GPU-CPU platform, which will be used by Aetherflux, Axiom Space, Kepler Comms, Planet, Sophia Space, and Starcloud.

  6. US startup Reflection AI is working with South Korea's Shinsegae Group to build a 250MW data center in South Korea, sources say in a several-billion-dollar deal (Amrith Ramkumar/Wall Street Journal)

    Amrith Ramkumar / Wall Street Journal: The Trump administration is using AI chips and models as a tool for diplomacy and boosting U.S. allies.

  7. Nvidia announces NemoClaw, which combines the OpenClaw agent platform with components of Nvidia's Agent Toolkit to add privacy and security controls (Frederic Lardinois/The New Stack)

    Frederic Lardinois / The New Stack: Nvidia's NemoClaw combines the OpenClaw agent platform with components of its Agent Toolkit to add privacy and security controls.

  8. Nvidia unveils a server rack with 256 Vera CPUs, with each CPU featuring 88 custom Olympus cores and LPDDR5X memory for up to 1.2 TB/s of bandwidth (Tobias Mann/The Register)

    Tobias Mann / The Register: Intel and AMD take notice. At GTC on Monday, Nvidia unveiled its latest liquid-cooled rack systems.

  9. Z.ai launches GLM-5-Turbo, a closed-source, faster, and cheaper variant of GLM-5 optimized for agent-driven workflows and OpenClaw-style tasks (Carl Franzen/VentureBeat)

    Carl Franzen / VentureBeat: Chinese AI startup Z.ai, known for its powerful, open source GLM family of large language models (LLMs), has introduced GLM-5-Turbo, a new …

  10. Nvidia announces the Nvidia Groq 3 LPX, an inference server rack featuring 256 Groq 3 LPUs and 128GB of on-chip SRAM, available in H2 2026 (Dylan Martin/CRN)

    Dylan Martin / CRN: Nvidia announced Monday at GTC 2026 that its new Groq-based inference server rack will be available alongside the Vera Rubin NVL72 rack …

  11. Jensen Huang says Nvidia expects to generate $1T+ in sales from its flagship AI chips through the end of 2027, after previously forecasting $500B by 2026's end (Ian King/Bloomberg)

    Ian King / Bloomberg: Nvidia Corp. Chief Executive Officer Jensen Huang, addressing crowds at the company's biggest annual event …

  12. Nvidia unveils DLSS 5, which uses a real-time neural rendering model to add photorealistic lighting to game frames, arriving this fall to RTX 50-series GPUs (Richard Leadbetter/Digital Foundry)

    Richard Leadbetter / Digital Foundry: At its GTC 2026 event, Nvidia has revealed the next generation of DLSS. DLSS 5 isn't a frame-rate, frame-generation, or performance-enhancing technology.

  13. A recording of Jensen Huang's Nvidia GTC 2026 keynote at the SAP Center in San Jose, California (NVIDIA on YouTube)

    NVIDIA on YouTube: Watch NVIDIA Founder and CEO Jensen Huang's GTC keynote as he unveils the latest breakthroughs in AI and accelerated computing. See how agentic AI, AI factor...

  14. Digital asset wealth management platform Abra plans to go public on Nasdaq via a SPAC merger with New Providence at a $750M pre-money valuation (Stacy Jones/Decrypt)

    Stacy Jones / Decrypt: Abra Financial Holdings, the San Francisco-based digital asset wealth management platform, said Monday it will go public through a business combination …

  15. A group of Tennessee teenagers sues xAI, alleging its AI tools were used to create nude images of them by editing photos in which they were clothed (Faiz Siddiqui/Washington Post)

    Faiz Siddiqui / Washington Post: Three plaintiffs, two of whom are minors, accuse xAI of distribution, possession and production with intent to distribute child pornography.

Solidot (15)

  1. GDC 2026 attendance down 30%

    Attendance at the 2026 Game Developers Conference (GDC) fell about 30% from last year, to roughly 20,000. The 2022 GDC, the first after the pandemic, was held in a hybrid online/offline format with nearly 12,000 in-person attendees and 17,000 in total. Attendance recovered to 28,000 in 2023, topped 30,000 in 2024 to set a record, and held at that level in 2025. This year's numbers fell because of costs and international visitors' concerns about conditions inside the US.

  2. FSF wants user freedom to be a goal of copyright suits against AI companies

    Anthropic downloaded more than 7 million books from shadow libraries such as Library Genesis; it has settled the resulting infringement suit with book authors and is contacting the authors of the affected books to offer financial compensation. One book in Anthropic's corpus is Sam Williams's Free as in Freedom: Richard Stallman's Crusade for Free Software, published by O'Reilly and the FSF under the GNU Free Documentation License (GNU FDL), a free license that permits use for any purpose without payment. The FSF says it has little interest in monetary compensation: if books whose copyright it holds are used by AI companies to train large models, the compensation it would rather have is user freedom, meaning AI companies share with users the complete training inputs, the complete model, the training configuration, and the corresponding software source code.

  3. Should British graffiti artist Banksy still remain anonymous?

    Reuters published an investigative report examining the identity of the anonymous British graffiti artist Banksy. In November 2022 Banksy painted murals on the walls of a bombed village near Kyiv, Ukraine, and local residents saw the painter at work. Reuters reporters followed up and uncovered a guilty plea Banksy personally signed years ago, admitting a minor misconduct charge; the document reveals his real identity. Banksy's lawyers urged the reporters not to publish it, arguing that doing so would violate the artist's privacy, interfere with his work, endanger his safety, and harm the public interest. The lawyers said: "Creating anonymously or under a pseudonym serves an important social interest. It protects free expression, allowing creators to speak truth to power without fear of retaliation, censorship, or persecution, especially on sensitive questions of politics, religion, or social justice."

  4. New record for superconductivity at ambient pressure

    According to a study published in PNAS, a team from the University of Houston's physics department and the Texas Center for Superconductivity achieved a superconducting transition temperature of 151 kelvin (about -122°C) at ambient pressure, a new world record. Superconductors normally require extreme pressure or ultralow temperature, and an ambient-pressure, room-temperature superconductor has long been a goal of the field. The team used a new process called "pressure quenching": a pre-selected sample is first subjected to extremely high pressure, which alters its microstructure and significantly raises its superconducting transition temperature. After cooling to a specific state while the pressure is maintained, the pressure is then released completely and rapidly. This fast "quench" locks in the metastable, superconductivity-friendly structure formed under high pressure, so the material keeps superconducting at a much higher temperature even after returning to ambient pressure. With this method the team raised the ambient-pressure transition temperature to 151 K.

  5. More than 40% of Japanese plan to work past 70

    According to a Nikkei survey, 42% of respondents said they will still be working at 70 or later, the first time the figure has exceeded 40% since the survey began in 2018. 23% said they would keep working to age 70-74, and 19% said 75 or older. The average intended working age was 68.3, above the statutory 65. Under Japan's Act on Stabilization of Employment of Elderly Persons, companies have a duty to make efforts to secure work opportunities for older employees up to age 70.

  6. Polish nuclear research institute hit by cyberattack

    Poland's National Centre for Nuclear Research (NCBJ) disclosed that its IT infrastructure came under cyberattack, but said its security team acted quickly to thwart it, so there was essentially no impact. NCBJ conducts research in nuclear physics, reactor technology, particle physics, and radiation applications, and operates MARIA, a nuclear reactor used for scientific experiments, neutron research, and medical isotope production. NCBJ said the MARIA reactor was unaffected and is still running at full power. NCBJ has not identified the attackers.

  7. AI-generated nudes rated more sexually attractive than photos of real people

    According to a study published in Archives of Sexual Behavior, AI-generated images of nude women were rated as more sexually attractive than photographs of real women. The researchers recruited 649 heterosexual adults in the Czech Republic, mostly men (45 were women). Participants were shown six types of images: photographs of real women, computer-generated figures, AI-generated figures, real women who had undergone cosmetic surgery, silicone sex dolls, and adult anime images, and rated each image for realism, sexual attractiveness, and aesthetics. Although the AI-generated images scored lower than real photographs on realism, they scored highest on both aesthetic appeal and sexual attractiveness.

  8. Hackers use invisible characters in supply-chain attacks on GitHub and other platforms

    Researchers at security firm Aikido Security reported a new supply-chain attack against GitHub and other platforms. The attackers uploaded 151 malicious packages laced with invisible Unicode characters, which are invisible to the human eye in editors and similar interfaces yet machine-readable and able to carry their malicious instructions. The researchers named the group Glassworm and believe the attackers used large language models to generate packages for different projects. The invisible characters are rendered using Private Use Areas, the privately defined code points the Unicode standard sets aside for custom characters such as special emoji and flags. When entered into a computer, these code points produce nothing a human can see, only blanks or empty lines, but a JavaScript interpreter can convert them into executable code.
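
    A generic way to surface this class of payload is to scan source text for code points in Unicode's Private Use Areas (U+E000-U+F8FF plus the two supplementary-plane ranges), which render as nothing in most editors. The check below is illustrative, not Aikido's detection tooling.

      # Flag Private Use Area code points hiding in otherwise-normal source.
      def find_pua(text):
          def is_pua(cp):
              return (0xE000 <= cp <= 0xF8FF or 0xF0000 <= cp <= 0xFFFFD
                      or 0x100000 <= cp <= 0x10FFFD)
          return [(i, hex(ord(ch))) for i, ch in enumerate(text)
                  if is_pua(ord(ch))]

      sample = "const x = 1; \U000F0101\U000F0102 // looks like a blank gap"
      print(find_pua(sample))   # [(13, '0xf0101'), (14, '0xf0102')]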

  9. Wired headphone sales surge

    After Apple and Google removed the 3.5mm headphone jack from their phones in 2016 and 2017, wireless earbuds took over and wired headphones looked headed for history. Over the past few months, however, wired headphone sales have soared: they offer better sound quality at a lower price. Circana's data shows that after five consecutive years of decline, wired headphone sales took off in the second half of 2025, and wired headphone revenue grew 20% in the first six weeks of 2026. Wireless earbuds are indeed more convenient, but their battery life is limited and Bluetooth pairing often misbehaves. The dedicated headphone jack is gone, but wired headphones with USB or Lightning connectors have taken its place.

  10. Firefly animated series in the works

    Nathan Fillion, who played the captain of Serenity in the sci-fi series Firefly, revealed news of an animated adaptation, with the original cast returning to voice their roles. Firefly is set in a future ruled jointly by the US and China, so its characters drop the occasional Mandarin phrase into their English dialogue. The series followed nine dissidents aboard the ship Serenity on adventures along the frontier of space. Firefly originally aired on Fox and was pulled after 11 episodes, but its enduringly strong DVD sales eventually led to the film Serenity. The animated Firefly is co-produced by Collision33 and 20th Television Animation, with Tara Butters (Agent Carter, Dollhouse) as writer and producer and Marc Guggenheim (Arrow, The Flash) as producer.

  11. Industrial helium shortage

    Qatar supplies more than 30% of the world's industrial helium. State-owned QatarEnergy shut down its helium production facility at Ras Laffan after a drone strike earlier this month, disrupting global industrial helium supply and sending prices soaring. QatarEnergy declared force majeure, relieving it of its supply obligations to customers; when production will resume is unknown. Helium plays an important role in metal welding and semiconductor manufacturing, and medical imaging and chemical-analysis research use it as well: liquid helium cooled to -268°C keeps the superconducting magnets in MRI and NMR machines cold. South Korea is among the hardest-hit countries, with 64.7% of its 2025 helium imports coming from Qatar. SK Hynix says it has diversified its helium supply and secured ample inventory. TSMC says the QatarEnergy shutdown is not expected to have a significant impact, but it is watching the situation closely.

  12. Study puts AI's productivity gain at just 16 minutes a week

    According to Foxit's State of Document Intelligence report, AI's productivity gains fall far short of executives' expectations, amounting to just 16 minutes of saved work per week. Although 89% of executives and 79% of end users say AI tools make them feel more productive, the actual time saved shrinks sharply once the time spent reviewing and verifying AI-generated output is counted. The survey of 1,000 office workers and 400 executives in the US and UK found that executives believe AI saves them about 4.6 hours a week but spend roughly 4 hours 20 minutes verifying the results. End users are similar: they estimate 3.6 hours saved but spend 3 hours 50 minutes reviewing AI-generated work. Once this "verification burden" is counted, executives net only 16 minutes a week, and end users actually come out about 14 minutes behind.
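
    The report's arithmetic checks out, as the quick computation below shows: net savings are the perceived hours saved minus the verification time, using the survey figures quoted above.

      # Net weekly minutes saved = perceived savings - verification burden.
      def net_minutes(saved_hours, verify_hours, verify_minutes):
          return round(saved_hours * 60 - (verify_hours * 60 + verify_minutes))

      print(net_minutes(4.6, 4, 20))   # executives: +16 minutes per week
      print(net_minutes(3.6, 3, 50))   # end users: -14 minutes per week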

  13. Windows 11 February update can lock some Samsung laptop users out of the C: drive

    Microsoft is investigating a problem affecting some Samsung laptops: after installing the February 2026 security update, users cannot access the C: drive or launch applications. Microsoft says it is working with Samsung to determine whether the issue stems from the Windows update or from Samsung software installed on the affected devices. Affected users see the C: drive become inaccessible with "access denied" error messages. The problem also blocks access to files and prevents launching Outlook, Office, browsers, system tools, and Quick Assist. It occurs mainly in Brazil, Portugal, South Korea, and India, affecting consumer devices such as the Samsung Galaxy Book 4. The issue may be related to a Samsung sharing app, but the exact cause has not been confirmed.

  14. arXiv forms an independent foundation and is hiring a CEO

    According to a job posting published this week, the preprint platform arXiv is forming an independent nonprofit foundation and hiring a CEO with experience in the field, at an expected salary of around $300,000. arXiv is more than 30 years old, hosts over 2.7 million scholarly papers, adds 1,000 new papers every five days, and has accumulated 3.2 billion downloads. All papers on arXiv are open access, with the aim of breaking down barriers in scholarly research so that researchers can share and discover the latest results. arXiv was previously managed by the Cornell University Library; with the support of the Simons Foundation, it is establishing a nonprofit foundation and entering a new phase. Its current annual budget is about $6 million, with 27 employees, most of whom work remotely.

  15. Mouse experiments show muscle loss under Mars-like gravity

    According to a study published in Science Advances, researchers examined muscle loss in mice from the Multiple Artificial-gravity Research System experiment aboard the International Space Station. They measured mice raised in centrifuges at 0.33g, 0.67g, and 1g, and found almost no difference in muscle between the 0.67g and 1g groups, but clear muscle loss at 0.33g, which is roughly the gravity on Mars.
    根据发表在《Science Advanc》期刊上的一项研究,研究人员调查了参与国际空间站 Multiple Artificial-gravity Research System 实验的小鼠肌肉流失情况。他们测量了在 0.33g、0.67g 和 1g 重力下离心机中训练的小鼠,发现 0.67g 和 1g 下的小鼠肌肉几乎没有差异,但 0.33g 下出现了明显的肌肉流失。0.33g 类似火星上的重力条件。