Monthly Digest — 2026-04

613 unique stories across 30 days and 8 sources.

Hacker News (120)

  1. Show HN: Git bayesect – Bayesian Git bisection for non-deterministic bugs (github.com)
  2. NASA Artemis II moon mission live launch broadcast (plus.nasa.gov)
  3. The OpenAI graveyard: All the deals and products that haven't happened (www.forbes.com)
  4. Ask HN: Who is hiring? (April 2026)
  5. Tailscale's new macOS home (tailscale.com)
  6. Cursor 3 (cursor.com)
  7. Google releases Gemma 4 open models (deepmind.google)
  8. Delve allegedly forked an open-source tool and sold it as its own (techcrunch.com)
  9. Oracle Files H-1B Visa Petitions Amid Mass Layoffs (nationaltoday.com)
  10. Artemis II crew take 'spectacular' image of Earth (www.bbc.com)
  11. iNaturalist (www.inaturalist.org)
  12. We replaced RAG with a virtual filesystem for our AI documentation assistant (www.mintlify.com)
  13. How many products does Microsoft have named 'Copilot'? (teybannerman.com)
  14. Iranian missile blitz takes down AWS data centers in Bahrain and Dubai (www.tomshardware.com)
  15. 12k AI-generated blog posts added in a single commit (github.com)
  16. When legal sports betting surges, so do Americans' financial problems (www.npr.org)
  17. LÖVE: 2D Game Framework for Lua (github.com)
  18. Gemma 4 on iPhone (apps.apple.com)
  19. Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code (ai.georgeliu.com)
  20. LibreOffice – Let's put an end to the speculation (blog.documentfoundation.org)

GitHub Trending (53)

  1. anthropics / claude-code
  2. microsoft / VibeVoice
  3. google-research / timesfm
  4. luongnv89 / claude-howto
  5. siddharthvaddem / openscreen
  6. Yeachan-Heo / oh-my-codex
  7. asgeirtj / system_prompts_leaks
  8. sherlock-project / sherlock
  9. onyx-dot-app / onyx
  10. Blaizzy / mlx-vlm
  11. google-ai-edge / gallery
  12. block / goose
  13. abhigyanpatwari / GitNexus
  14. google-ai-edge / LiteRT-LM
  15. NVIDIA / personaplex
  16. forrestchang / andrej-karpathy-skills
  17. TheCraigHewitt / seomachine
  18. NousResearch / hermes-agent
  19. HKUDS / DeepTutor
  20. OpenBMB / VoxCPM

Product Hunt (117)

  1. flock

    Run a flock of Claude Code (or other agents) in one window.

  2. Ray-Ban Meta G2 Blayzer & Scriber Optics

    Meta's first AI glasses built for prescriptions

  3. Ditch

    App cleaner that lives in your MacBook’s notch

  4. Noiz Easter Voice

    Crack an Easter egg to generate an AI voice

  5. Lightning V3

    Text-to-Speech built for Voice Agents

  6. SampleStack

    The native macOS sample manager built for every instrument

  7. Denovo

    Build and run your business while you sleep.

  8. Syncly Social

    Find creators by what's actually in their content

  9. Dashla

    Tesla vehicle status, navigation, map + more in a dashboard

  10. FindThem

    Describe your ideal lead or investor - get their LinkedIn & email

  11. MAI-Transcribe-1

    Production ASR for noisy multilingual audio

  12. Vxero Neo

    SSH-native CLI that manages servers, apps, and infrastructure

  13. Fluently

    AI subtitles & translations for YouTube. 20+ Languages.

  14. Open Claude in Chrome

    Claude in Chrome, reverse-engineered, Jailbroken

  15. Google Vids 2.0

    Create, edit and share videos at no cost w/ new AI features

  16. Surf Social Websites

    Bring together people and content on the social web

  17. Panorama

    AI that finds your team’s workflows and hidden structures

  18. Influcio

    AI marketing agent for result-driven influencer campaigns

  19. Shotwell

    The screenshot editor for iPhone.

  20. Tiny Aya

    Local, open-weight AI designed for real-world languages

Hugging Face (89)

  1. FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

    We present Future-KL Influenced Policy Optimization (FIPO), a reinforcement learning algorithm designed to overcome reasoning bottlenecks in large language models. While GRPO-style training scales effectively, it typically relies on outcome-based rewards (ORM) that distribute a global advantage uniformly across every token in a trajectory. We argue that this coarse-grained credit assignment imposes a performance ceiling by failing to distinguish critical logical pivots from trivial tokens. FIPO addresses this by incorporating discounted future-KL divergence into the policy update, creating a dense advantage formulation that re-weights tokens based on their influence on subsequent trajectory behavior. Empirically, FIPO enables models to break through the length stagnation seen in standard baselines. Evaluated on Qwen2.5-32B, FIPO extends the average chain-of-thought length from roughly 4,000 to over 10,000 tokens and increases AIME 2024 Pass@1 accuracy from 50.0% to a peak of 58.0% (converging at approximately 56.0%). This outperforms both DeepSeek-R1-Zero-Math-32B (around 47.0%) and o1-mini (approximately 56.0%). Our results suggest that establishing dense advantage formulations is a vital path for evolving ORM-based algorithms to unlock the full reasoning potential of base models. We open-source our training system, built on the verl framework.

  2. CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence

    The convergence of low-altitude economies, embodied intelligence, and air-ground cooperative systems creates growing demand for simulation infrastructure capable of jointly modeling aerial and ground agents within a single physically coherent environment. Existing open-source platforms remain domain-segregated: driving simulators lack aerial dynamics, while multirotor simulators lack realistic ground scenes. Bridge-based co-simulation introduces synchronization overhead and cannot guarantee strict spatial-temporal consistency. We present CARLA-Air, an open-source infrastructure that unifies high-fidelity urban driving and physics-accurate multirotor flight within a single Unreal Engine process. The platform preserves both CARLA and AirSim native Python APIs and ROS 2 interfaces, enabling zero-modification code reuse. Within a shared physics tick and rendering pipeline, CARLA-Air delivers photorealistic environments with rule-compliant traffic, socially-aware pedestrians, and aerodynamically consistent UAV dynamics, synchronously capturing up to 18 sensor modalities across all platforms at each tick. The platform supports representative air-ground embodied intelligence workloads spanning cooperation, embodied navigation and vision-language action, multi-modal perception and dataset construction, and reinforcement-learning-based policy training. An extensible asset pipeline allows integration of custom robot platforms into the shared world. By inheriting AirSim's aerial capabilities -- whose upstream development has been archived -- CARLA-Air ensures this widely adopted flight stack continues to evolve within a modern infrastructure. Released with prebuilt binaries and full source: https://github.com/louiszengCN/CarlaAir

  3. LongCat-Next: Lexicalizing Modalities as Discrete Tokens

    The prevailing Next-Token Prediction (NTP) paradigm has driven the success of large language models through discrete autoregressive modeling. However, contemporary multimodal systems remain language-centric, often treating non-linguistic modalities as external attachments, leading to fragmented architectures and suboptimal integration. To transcend this limitation, we introduce Discrete Native Autoregressive (DiNA), a unified framework that represents multimodal information within a shared discrete space, enabling consistent and principled autoregressive modeling across modalities. A key innovation is the Discrete Native Any-resolution Visual Transformer (dNaViT), which performs tokenization and de-tokenization at arbitrary resolutions, transforming continuous visual signals into hierarchical discrete tokens. Building on this foundation, we develop LongCat-Next, a native multimodal model that processes text, vision, and audio under a single autoregressive objective with minimal modality-specific design. As an industrial-strength foundation model, it excels at seeing, painting, and talking within a single framework, achieving strong performance across a wide range of multimodal benchmarks. In particular, LongCat-Next addresses the long-standing performance ceiling of discrete vision modeling on understanding tasks and provides a unified approach to effectively reconcile the conflict between understanding and generation. As an attempt toward native multimodality, we open-source LongCat-Next and its tokenizers, hoping to foster further research and development in the community. GitHub: https://github.com/meituan-longcat/LongCat-Next

  4. Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells

    Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the distribution of cellular states for generative simulation. Here, we introduce Lingshu-Cell, a masked discrete diffusion model that learns transcriptomic state distributions and supports conditional simulation under perturbation. By operating directly in a discrete token space that is compatible with the sparse, non-sequential nature of single-cell transcriptomic data, Lingshu-Cell captures complex transcriptome-wide expression dependencies across approximately 18,000 genes without relying on prior gene selection, such as filtering by high variability or ranking by expression level. Across diverse tissues and species, Lingshu-Cell accurately reproduces transcriptomic distributions, marker-gene expression patterns and cell-subtype proportions, demonstrating its ability to capture complex cellular heterogeneity. Moreover, by jointly embedding cell type or donor identity with perturbation, Lingshu-Cell can predict whole-transcriptome expression changes for novel combinations of identity and perturbation. It achieves leading performance on the Virtual Cell Challenge H1 genetic perturbation benchmark and in predicting cytokine-induced responses in human PBMCs. Together, these results establish Lingshu-Cell as a flexible cellular world model for in silico simulation of cell states and perturbation responses, laying the foundation for a new paradigm in biological discovery and perturbation screening.

  5. ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers

    OpenClaw has rapidly established itself as a leading open-source autonomous agent runtime, offering powerful capabilities including tool integration, local file access, and shell command execution. However, these broad operational privileges introduce critical security vulnerabilities, transforming model errors into tangible system-level threats such as sensitive data leakage, privilege escalation, and malicious third-party skill execution. Existing security measures for the OpenClaw ecosystem remain highly fragmented, addressing only isolated stages of the agent lifecycle rather than providing holistic protection. To bridge this gap, we present ClawKeeper, a real-time security framework that integrates multi-dimensional protection mechanisms across three complementary architectural layers. (1) Skill-based protection operates at the instruction level, injecting structured security policies directly into the agent context to enforce environment-specific constraints and cross-platform boundaries. (2) Plugin-based protection serves as an internal runtime enforcer, providing configuration hardening, proactive threat detection, and continuous behavioral monitoring throughout the execution pipeline. (3) Watcher-based protection introduces a novel, decoupled system-level security middleware that continuously verifies agent state evolution. It enables real-time execution intervention without coupling to the agent's internal logic, supporting operations such as halting high-risk actions or enforcing human confirmation. We argue that this Watcher paradigm holds strong potential to serve as a foundational building block for securing next-generation autonomous agent systems. Extensive qualitative and quantitative evaluations demonstrate the effectiveness and robustness of ClawKeeper across diverse threat scenarios. We release our code.

  6. Terminal Agents Suffice for Enterprise Automation

    There has been growing interest in building agents that can interact with digital platforms to execute meaningful enterprise tasks autonomously. Among the approaches explored are tool-augmented agents built on abstractions such as Model Context Protocol (MCP) and web agents that operate through graphical interfaces. Yet, it remains unclear whether such complex agentic systems are necessary given their cost and operational overhead. We argue that a coding agent equipped only with a terminal and a filesystem can solve many enterprise tasks more effectively by interacting directly with platform APIs. We evaluate this hypothesis across diverse real-world systems and show that these low-level terminal agents match or outperform more complex agent architectures. Our findings suggest that simple programmatic interfaces, combined with strong foundation models, are sufficient for practical enterprise automation.

  7. MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome

    Recent progress in deep research systems has been impressive, but evaluation still lags behind real user needs. Existing benchmarks predominantly assess final reports using fixed rubrics, failing to evaluate the underlying research process. Most also offer limited multimodal coverage, rely on synthetic tasks that do not reflect real-world query complexity, and cannot be refreshed as knowledge evolves. To address these gaps, we introduce MiroEval, a benchmark and evaluation framework for deep research systems. The benchmark comprises 100 tasks (70 text-only, 30 multimodal), all grounded in real user needs and constructed via a dual-path pipeline that supports periodic updates, enabling a live and evolving setting. The proposed evaluation suite assesses deep research systems along three complementary dimensions: adaptive synthesis quality evaluation with task-specific rubrics, agentic factuality verification via active retrieval and reasoning over both web sources and multimodal attachments, and process-centric evaluation that audits how the system searches, reasons, and refines throughout its investigation. Evaluation across 13 systems yields three principal findings: the three evaluation dimensions capture complementary aspects of system capability, with each revealing distinct strengths and weaknesses across systems; process quality serves as a reliable predictor of overall outcome while revealing weaknesses invisible to output-level metrics; and multimodal tasks pose substantially greater challenges, with most systems declining by 3 to 10 points. The MiroThinker series achieves the most balanced performance, with MiroThinker-H1 ranking the highest overall in both settings. Human verification and robustness results confirm the reliability of the benchmark and evaluation framework. MiroEval provides a holistic diagnostic tool for the next generation of deep research agents.

  8. ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?

    Beneath the stunning visual fidelity of modern AIGC models lies a "logical desert", where systems fail tasks that require physical, causal, or complex spatial reasoning. Current evaluations largely rely on superficial metrics or fragmented benchmarks, creating a "performance mirage" that overlooks the generative process. To address this, we introduce ViGoR (Vision-Generative Reasoning-centric Benchmark), a unified framework designed to dismantle this mirage. ViGoR distinguishes itself through four key innovations: 1) holistic cross-modal coverage bridging Image-to-Image and Video tasks; 2) a dual-track mechanism evaluating both intermediate processes and final results; 3) an evidence-grounded automated judge ensuring high human alignment; and 4) granular diagnostic analysis that decomposes performance into fine-grained cognitive dimensions. Experiments on over 20 leading models reveal that even state-of-the-art systems harbor significant reasoning deficits, establishing ViGoR as a critical "stress test" for the next generation of intelligent vision models. A demo is available at https://vincenthancoder.github.io/ViGoR-Bench/

  9. DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models

    Data-centric training has emerged as a promising direction for improving large language models (LLMs) by optimizing not only model parameters but also the selection, composition, and weighting of training data during optimization. However, existing approaches to data selection, data mixture optimization, and data reweighting are often developed in isolated codebases with inconsistent interfaces, hindering reproducibility, fair comparison, and practical integration. In this paper, we present DataFlex, a unified data-centric dynamic training framework built upon LLaMA-Factory. DataFlex supports three major paradigms of dynamic data optimization: sample selection, domain mixture adjustment, and sample reweighting, while remaining fully compatible with the original training workflow. It provides extensible trainer abstractions and modular components, enabling a drop-in replacement for standard LLM training, and unifies key model-dependent operations such as embedding extraction, inference, and gradient computation, with support for large-scale settings including DeepSpeed ZeRO-3. We conduct comprehensive experiments across multiple data-centric methods. Dynamic data selection consistently outperforms static full-data training on MMLU across both Mistral-7B and Llama-3.2-3B. For data mixture, DoReMi and ODM improve both MMLU accuracy and corpus-level perplexity over default proportions when pretraining Qwen2.5-1.5B on SlimPajama at 6B and 30B token scales. DataFlex also achieves consistent runtime improvements over original implementations. These results demonstrate that DataFlex provides an effective, efficient, and reproducible infrastructure for data-centric dynamic training of LLMs.

  10. The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

    Latent space is rapidly emerging as a native substrate for language-based models. While modern systems are still commonly understood through explicit token-level generation, an increasing body of work shows that many critical internal processes are more naturally carried out in continuous latent space than in human-readable verbal traces. This shift is driven by the structural limitations of explicit-space computation, including linguistic redundancy, discretization bottlenecks, sequential inefficiency, and semantic loss. This survey aims to provide a unified and up-to-date landscape of latent space in language-based models. We organize the survey into five sequential perspectives: Foundation, Evolution, Mechanism, Ability, and Outlook. We begin by delineating the scope of latent space, distinguishing it from explicit or verbal space and from the latent spaces commonly studied in generative visual models. We then trace the field's evolution from early exploratory efforts to the current large-scale expansion. To organize the technical landscape, we examine existing work through the complementary lenses of mechanism and ability. From the perspective of Mechanism, we identify four major lines of development: Architecture, Representation, Computation, and Optimization. From the perspective of Ability, we show how latent space supports a broad capability spectrum spanning Reasoning, Planning, Modeling, Perception, Memory, Collaboration, and Embodiment. Beyond consolidation, we discuss the key open challenges, and outline promising directions for future research. We hope this survey serves not only as a reference for existing work, but also as a foundation for understanding latent space as a general computational and systems paradigm for next-generation intelligence.

  11. Generative World Renderer

    Scaling generative inverse and forward rendering to real-world scenarios is bottlenecked by the limited realism and temporal coherence of existing synthetic datasets. To bridge this persistent domain gap, we introduce a large-scale, dynamic dataset curated from visually complex AAA games. Using a novel dual-screen stitched capture method, we extracted 4M continuous frames (720p/30 FPS) of synchronized RGB and five G-buffer channels across diverse scenes, visual effects, and environments, including adverse weather and motion-blur variants. This dataset uniquely advances bidirectional rendering: enabling robust in-the-wild geometry and material decomposition, and facilitating high-fidelity G-buffer-guided video generation. Furthermore, to evaluate the real-world performance of inverse rendering without ground truth, we propose a novel VLM-based assessment protocol measuring semantic, spatial, and temporal consistency. Experiments demonstrate that inverse renderers fine-tuned on our data achieve superior cross-dataset generalization and controllable generation, while our VLM evaluation strongly correlates with human judgment. Combined with our toolkit, our forward renderer enables users to edit styles of AAA games from G-buffers using text prompts.

  12. SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

    Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill content imposes substantial token overhead, and the model never truly acquires the knowledge it merely follows. We ask whether skills can instead be internalized into model parameters, enabling zero-shot autonomous behavior without any runtime skill retrieval. We introduce SKILL0, an in-context reinforcement learning framework designed for skill internalization. SKILL0 introduces a training-time curriculum that begins with full skill context and progressively withdraws it. Skills are grouped offline by category and rendered with interaction history into a compact visual context, teaching the model tool invocation and multi-turn task completion. A Dynamic Curriculum then evaluates each skill file's on-policy helpfulness, retaining only those from which the current policy still benefits within a linearly decaying budget, until the agent operates in a fully zero-shot setting. Extensive agentic experiments demonstrate that SKILL0 achieves substantial improvements over the standard RL baseline (+9.7% for ALFWorld and +6.6% for Search-QA), while maintaining a highly efficient context of fewer than 0.5k tokens per step. Our code is available at https://github.com/ZJU-REAL/SkillZero.

  13. Self-Distilled RLVR

    On-policy distillation (OPD) has become a popular training paradigm in the LLM community. This paradigm selects a larger model as the teacher to provide dense, fine-grained signals for each sampled trajectory, in contrast to reinforcement learning with verifiable rewards (RLVR), which only obtains sparse signals from verifiable outcomes in the environment. Recently, the community has explored on-policy self-distillation (OPSD), where the same model serves as both teacher and student, with the teacher receiving additional privileged information such as reference answers to enable self-evolution. This paper demonstrates that learning signals solely derived from the privileged teacher result in severe information leakage and unstable long-term training. Accordingly, we identify the optimal niche for self-distillation and propose RLSD (RLVR with Self-Distillation). Specifically, we leverage self-distillation to obtain token-level policy differences for determining fine-grained update magnitudes, while continuing to use RLVR to derive reliable update directions from environmental feedback (e.g., response correctness). This enables RLSD to simultaneously harness the strengths of both RLVR and OPSD, achieving a higher convergence ceiling and superior training stability.

  14. A Simple Baseline for Streaming Video Understanding

    Recent streaming video understanding methods increasingly rely on complex memory mechanisms to handle long video streams. We challenge this trend with a simple finding: a sliding-window baseline that feeds only the most recent N frames to an off-the-shelf VLM already matches or surpasses published streaming models. We formalize this baseline as SimpleStream and evaluate it against 13 major offline and online video LLM baselines on OVO-Bench and StreamingBench. Despite its simplicity, SimpleStream delivers consistently strong performance. With only 4 recent frames, it reaches 67.7% average accuracy on OVO-Bench and 80.59% on StreamingBench. Controlled ablations further show that the value of longer context is backbone-dependent rather than uniformly increasing with model scale, and reveal a consistent perception-memory trade-off: adding more historical context can improve recall, but often weakens real-time perception. This suggests that stronger memory, retrieval, or compression modules should not be taken as evidence of progress unless they clearly outperform SimpleStream under the same protocol. We therefore argue that future streaming benchmarks should separate recent-scene perception from long-range memory, so that performance improvements from added complexity can be evaluated more clearly.
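    The mechanism this abstract describes is simple enough to sketch: keep only the most recent N frames and pass them to an off-the-shelf VLM at query time. The sketch below is a minimal, hypothetical illustration of that sliding-window idea, not the SimpleStream implementation; the query_vlm stand-in, class name, and default window size are assumptions for illustration.

    ```python
    from collections import deque

    def query_vlm(frames, question):
        # Placeholder: a real system would call an off-the-shelf VLM here
        # with the windowed frames and the user's question.
        return f"answered from {len(frames)} frames"

    class SlidingWindowStream:
        """Keep only the most recent N frames of a video stream."""

        def __init__(self, n_frames=4):
            # deque with maxlen drops the oldest frame automatically
            self.window = deque(maxlen=n_frames)

        def push(self, frame):
            self.window.append(frame)

        def ask(self, question):
            # Only the current window reaches the model; no long-term memory.
            return query_vlm(list(self.window), question)

    stream = SlidingWindowStream(n_frames=4)
    for t in range(10):              # simulate a 10-frame stream
        stream.push(f"frame-{t}")
    print(stream.ask("what happened?"))  # answers from the last 4 frames only
    ```

    The point of the abstract is that this trivial buffer, with no memory, retrieval, or compression module, is already a strong baseline against which streaming architectures should be compared.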

  15. Token Warping Helps MLLMs Look from Nearby Viewpoints

    Can warping tokens, rather than pixels, help multimodal large language models (MLLMs) understand how a scene appears from a nearby viewpoint? While MLLMs perform well on visual reasoning, they remain fragile to viewpoint changes, as pixel-wise warping is highly sensitive to small depth errors and often introduces geometric distortions. Drawing on theories of mental imagery that posit part-level structural representations as the basis for human perspective transformation, we examine whether image tokens in ViT-based MLLMs serve as an effective substrate for viewpoint changes. We compare forward and backward warping, finding that backward token warping, which defines a dense grid on the target view and retrieves a corresponding source-view token for each grid point, achieves greater stability and better preserves semantic coherence under viewpoint shifts. Experiments on our proposed ViewBench benchmark demonstrate that token-level warping enables MLLMs to reason reliably from nearby viewpoints, consistently outperforming all baselines including pixel-wise warping approaches, spatially fine-tuned MLLMs, and a generative warping method.

  16. Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

    Multimodal Large Language Models (MLLMs) are evolving from passive observers into active agents, solving problems through Visual Expansion (invoking visual tools) and Knowledge Expansion (open-web search). However, existing evaluations fall short: they lack flexible tool integration, test visual and search tools separately, and evaluate primarily by final answers. Consequently, they cannot verify if tools were actually invoked, applied correctly, or used efficiently. To address this, we introduce Agentic-MME, a process-verified benchmark for Multimodal Agentic Capabilities. It contains 418 real-world tasks across 6 domains and 3 difficulty levels to evaluate capability synergy, featuring over 2,000 stepwise checkpoints that average 10+ person-hours of manual annotation per task. Each task includes a unified evaluation framework supporting sandboxed code and APIs, alongside a human reference trajectory annotated with stepwise checkpoints along two axes: the S-axis and the V-axis. To enable true process-level verification, we audit fine-grained intermediate states rather than just final answers, and quantify efficiency via an overthinking metric relative to human trajectories. Experimental results show the best model, Gemini3-pro, achieves 56.3% overall accuracy, which falls significantly to 23.0% on Level-3 tasks, underscoring the difficulty of real-world multimodal agentic problem solving.

  17. OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

    World models have garnered significant attention as a promising research direction in artificial intelligence, yet a clear and unified definition remains lacking. In this paper, we introduce OpenWorldLib, a comprehensive and standardized inference framework for Advanced World Models. Drawing on the evolution of world models, we propose a clear definition: a world model is a model or framework centered on perception, equipped with interaction and long-term memory capabilities, for understanding and predicting the complex world. We further systematically categorize the essential capabilities of world models. Based on this definition, OpenWorldLib integrates models across different tasks within a unified framework, enabling efficient reuse and collaborative inference. Finally, we present additional reflections and analyses on potential future directions for world model research. Code link: https://github.com/OpenDCAI/OpenWorldLib

  18. MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

    Current document parsing methods compete primarily on model architecture innovation, while systematic engineering of training data remains underexplored. Yet SOTA models of different architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, suggesting that the performance bottleneck stems from shared deficiencies in training data rather than architecture itself. Building on this finding, we present MinerU2.5-Pro, which advances the state of the art solely through data engineering and training strategy optimization while keeping the 1.2B-parameter architecture of MinerU2.5 completely fixed. At its core is a Data Engine co-designed around coverage, informativeness, and annotation accuracy: Diversity-and-Difficulty-Aware Sampling expands training data from under 10M to 65.5M samples while correcting distribution shift; Cross-Model Consistency Verification leverages output agreement among heterogeneous models to assess sample difficulty and generate reliable annotations; the Judge-and-Refine pipeline improves annotation quality for hard samples through render-then-verify iterative correction. A three-stage progressive training strategy -- large-scale pre-training, hard sample fine-tuning, and GRPO alignment -- sequentially exploits these data at different quality tiers. On the evaluation front, we fix element-matching biases in OmniDocBench v1.5 and introduce a Hard subset, establishing the more discriminative OmniDocBench v1.6 protocol. Without any architectural modification, MinerU2.5-Pro achieves 95.69 on OmniDocBench v1.6, improving over the same-architecture baseline by 2.71 points and surpassing all existing methods, including models with over 200x more parameters.

  19. LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models

    Vision-Language-Action (VLA) models achieve strong performance in robotic manipulation by leveraging pre-trained vision-language backbones. However, in downstream robotic settings, they are typically fine-tuned with limited data, leading to overfitting to specific instruction formulations and leaving robustness to paraphrased instructions underexplored. To study this gap, we introduce LIBERO-Para, a controlled benchmark that independently varies action expressions and object references for fine-grained analysis of linguistic generalization. Across seven VLA configurations (0.6B-7.5B), we observe consistent performance degradation of 22-52 pp under paraphrasing. This degradation is primarily driven by object-level lexical variation: even simple synonym substitutions cause large drops, indicating reliance on surface-level matching rather than semantic grounding. Moreover, 80-96% of failures arise from planning-level trajectory divergence rather than execution errors, showing that paraphrasing disrupts task identification. Binary success rate treats all paraphrases equally, obscuring whether models perform consistently across difficulty levels or rely on easier cases. To address this, we propose PRIDE, a metric that quantifies paraphrase difficulty using semantic and syntactic factors. Our benchmark and corresponding code are available at: https://github.com/cau-hai-lab/LIBERO-Para

  20. TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

    Extended reasoning in large language models (LLMs) creates severe KV-cache memory bottlenecks. Leading KV-cache compression methods estimate KV importance using attention scores from recent post-RoPE queries. However, queries rotate with position under RoPE, leaving very few representative queries, which leads to poor top-key selection and unstable reasoning. To avoid this issue, we turn to the pre-RoPE space, where we observe that Q and K vectors are highly concentrated around fixed non-zero centers and remain stable across positions -- Q/K concentration. We show that this concentration causes queries to preferentially attend to keys at specific distances (e.g., the nearest keys), with the centers determining which distances are preferred via a trigonometric series. Based on this, we propose TriAttention, which estimates key importance by leveraging these centers: via the trigonometric series, we use the distance preference characterized by the centers to score keys according to their positions, and also leverage Q/K norms as an additional signal for importance estimation. On AIME25 with 32K-token generation, TriAttention matches Full Attention reasoning accuracy while achieving 2.5x higher throughput or 10.7x KV memory reduction, whereas leading baselines achieve only about half the accuracy at the same efficiency. TriAttention enables OpenClaw deployment on a single consumer GPU, where long contexts would otherwise cause out-of-memory errors with Full Attention.
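    The scoring idea in the abstract can be sketched numerically: if pre-RoPE Q and K vectors sit near fixed centers, the expected attention logit at relative distance d reduces to a cosine series over the RoPE frequencies, weighted by products of the center components, with key norms as an extra signal. The sketch below is a toy reconstruction under those stated assumptions; the function name, the exact series, and the parameterization are illustrative guesses, not the paper's algorithm:

```python
import numpy as np

def tri_score(key_positions, query_pos, center_q, center_k, key_norms, base=10000.0):
    """Hypothetical position-based key scoring in the spirit of TriAttention.

    With Q/K concentrated around fixed centers, the rotated dot product at
    relative distance d becomes a trigonometric series over RoPE frequencies;
    key norms multiply in as an additional importance signal.
    """
    half = len(center_q) // 2
    freqs = base ** (-np.arange(half) / half)        # RoPE frequency ladder
    dist = query_pos - np.asarray(key_positions)     # relative distances
    # Per 2D RoPE pair, the cosine term's weight is the paired center product.
    weights = center_q[:half] * center_k[:half] + center_q[half:] * center_k[half:]
    series = np.cos(np.outer(dist, freqs)) @ weights  # trig series per key
    return series * np.asarray(key_norms)             # norm-weighted scores

rng = np.random.default_rng(0)
c_q = rng.normal(1.0, 0.1, 8)   # toy "concentrated" centers
c_k = rng.normal(1.0, 0.1, 8)
scores = tri_score([0, 10, 90, 99], 100, c_q, c_k, [1.0, 1.0, 1.0, 1.0])
```

    The point of the sketch is that the scores depend only on positions, centers, and norms, so top-key selection needs no per-step attention over stale queries.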

Techmeme(120)

  1. Solana-based DeFi platform Drift warns users about an "active attack" on its protocol; Arkham data said over $250M had moved from Drift to an interim wallet (Helene Braun/CoinDesk)

    The platform halted deposits while it investigates suspicious activity and urges users to proceed with caution.

  2. Cognichip, which is building an AI model for chip design, raised a $60M Series A led by Seligman Ventures, with participation from new board member Lip-Bu Tan (Tim Fernholz/TechCrunch)

    The most advanced silicon chips have accelerated the development of artificial intelligence. Now, can AI return the favor?

  3. Sources: the FBI has declared a recent China-linked hack of a system, which contained pen register and trap and trace surveillance returns, a "major incident" (John Sakellariadis/Politico)

    The determination suggests the hackers successfully compromised swathes of sensitive data stored directly on FBI systems …

  4. Franklin Templeton agrees to acquire CoinFund spinoff 250 Digital to form Franklin Crypto, which will offer strategies designed for institutional investors (Vicky Ge Huang/Wall Street Journal)

    Money manager's crypto investment unit will offer strategies designed for institutional investors

  5. LinkedIn job posting data: companies added 640K AI-related jobs from 2023 to 2025 in the US, including 225K "head of AI" jobs, up 49% from the prior four years (Te-Ping Chen/Wall Street Journal)

    AI is raising big fears about employment losses, but it is also giving rise to new engineering and training jobs

  6. Sources: SpaceX is floating a $2T+ valuation to prospective investors in its IPO; SpaceX's acquisition of xAI reportedly valued the combined company at $1.25T (Bloomberg)

    SpaceX boosted its target IPO valuation above $2 trillion, according to people familiar with the matter, as the world's …

  7. Source: OpenAI bought TBPN, which was set to generate $30M in 2026, for "low hundreds of millions of dollars"; OpenAI says TBPN will be editorially independent (George Hammond/Financial Times)

    ChatGPT-maker moves into broadcasting with deal for TBPN after it had pledged to abandon ‘side-quests’

  8. The CFTC sues Arizona, Connecticut, and Illinois over their actions against prediction markets, saying it has the "exclusive" authority to regulate such markets (Alex Harring/CNBC)

    A federal commission on Wednesday announced lawsuits against three states over its ability to exclusively regulate prediction markets.

  9. Sources: Meta has paused its work with Mercor while it investigates a security breach at the data vendor; OpenAI says it is investigating the security incident (Wired)

    Major AI labs are investigating a security incident that impacted Mercor, a leading data vendor.

  10. Utah launches a one-year pilot program allowing Legion Health's AI chatbot to renew prescriptions for 15 low-risk psychiatric maintenance medications (Robert Hart/The Verge)

    Some psychiatrists are asking what problem, exactly, this is solving. … Utah is allowing an AI system to prescribe psychiatric drugs without a doctor.

  11. Internal memo: Iranian strikes have rendered two AWS zones "hard down" in Dubai and Bahrain and Amazon expects them to be "unavailable for an extended period" (Alex Kantrowitz/Big Technology)

    Amazon tells its employees to deprioritize these regions as the Iran war deals meaningful damage to its infrastructure in the Gulf.

  12. Interviews with Codex lead Alexander Embiricos, OpenClaw's Peter Steinberger, and others about OpenAI's upcoming superapp that combines ChatGPT with Codex (Alex Heath/Sources)

    Why Codex is becoming the foundation for everything. Also: Fidji Simo's internal memo about taking a leave of absence.

  13. Apple reportedly signed a 3rd-party driver, by Tiny Corp, for AMD or Nvidia eGPUs for Apple Silicon Macs; it's meant for AI research, not accelerating graphics (AppleInsider)

    Apple has signed a driver for AMD or Nvidia eGPUs connected to Apple Silicon but there are some big caveats, and it won't improve your graphics.

  14. Research across 1,372 participants and 9K+ trials details "cognitive surrender", where most subjects had minimal AI skepticism and accepted faulty AI reasoning (Kyle Orland/Ars Technica)

    When it comes to large language model-powered tools, there are generally two broad categories of users.

  15. VCs are covering expenses like rent for young college dropouts founding AI startups; Antler: average AI unicorn founder age fell from 40 in 2020 to 29 in 2024 (Kate Clark/Wall Street Journal)

    Venture capitalists are stepping in to cover expenses like rent while dropouts from Harvard to Stanford chase their startup dreams

  16. Y Combinator appears to have dropped Delve, removing the company's profile from its startup directory, following allegations of fake compliance certificates (The Economic Times)

    Delve's removal from Y Combinator's directory follows allegations that compliance certifications for hundreds of Delve's clients were fabricated.

  17. Medvi, glorified by the NYT as a two-employee startup with $1B+ in revenue, is a warning about how AI can be misused for shady business and marketing practices (Gary Marcus/Marcus on AI)

    AI isn't the only thing behind Medvi — On Thursday, The New York Times published a thing — and it went viral, declared as a victory for AI:

  18. Drift details how suspected North Korean attackers stole $270M posing as a quant trading firm in a 6+ month operation with in-person meetings and a $1M+ deposit (Shaurya Malwa/CoinDesk)

    Attackers posed as a trading firm, met Drift contributors in person across multiple countries, deposited $1 million of their own capital …

  19. A profile of Mikko Hyppönen, a cybersecurity veteran who pivoted from fighting malware to developing anti-drone systems for law enforcement and the military (Lorenzo Franceschi-Bicchierai/TechCrunch)

    Mikko Hyppönen is pacing back and forth on the stage, with his trademark dark blonde ponytail resting on an impeccable teal suit.

  20. TrueUp data shows over 67,000 software engineering job openings, up 30% so far in 2026 and the most in three years, with listings up about 2x since mid-2023 (Alistair Barr/Business Insider)

    The US jobs report on Friday was surprisingly strong. That's not the only part of the job market that's doing better than expected.

Solidot(114)

  1. European countries rapidly embrace green technology and EVs

    With the blockade of the Strait of Hormuz driving up oil and gas prices worldwide, many European countries have turned to green technology and bought more electric vehicles. Data show that in the first three weeks of March, UK heat pump sales rose 51% over the same period the previous month, solar sales rose 54%, and EV charger sales rose 20%. French online used-car retailer Aramisauto's EV sales nearly doubled between mid-February and March 9. Amsterdam-based used-car marketplace Olx said customer inquiries about EVs surged on its platforms in France, Romania, Portugal, and Poland. And on Finn.no, Norway's largest used-car marketplace, EV sales have overtaken diesel cars.

  2. Multiple Baidu robotaxis break down simultaneously

    Baidu's Luobo Kuaipao (Apollo Go) operates a robotaxi service in Wuhan. At around 20:00 on Tuesday, March 31, its driverless taxis stalled en masse on the city's roads. According to photos and videos circulating widely on social media, the stricken robotaxis stopped not only at roadsides but also in the middle of roads and even on elevated expressways, and some passengers were trapped inside for over an hour. Wuhan traffic police said a preliminary assessment pointed to a system failure, and that no one was injured and all passengers exited safely. It is unclear how many Baidu robotaxis were affected. Photos and videos on social networks show the sudden stops caused at least several rear-end collisions, and one Wuhan netizen reported seeing at least a dozen stalled robotaxis. Baidu has not yet commented on the incident.

  3. Sweden returns to traditional paper-based classroom education

    The problems that digital education and social media pose for children and teenagers have drawn growing attention and debate in recent years. Like other countries, Sweden had gradually abandoned paper books over the past decades in favor of tablets and digital resources, aiming to prepare students for the online world. But the controversy over digital education ultimately led Sweden to announce in 2023 a return to traditional paper-based classroom teaching: paper books are back in the classroom, and students are learning to write by hand with pencil or pen in the most traditional way. The Swedish government also plans to roll out a nationwide ban on mobile phones in schools. This marks a major shift in Sweden's educational model. Swedish officials stress that schools will not abandon digital technology entirely; digital aids will mainly be used to support learning in the upper grades.

  4. Neanderthals survived on the brink of extinction for 350,000 years

    From 400,000 to 45,000 years ago, Neanderthals alone occupied most of Eurasia, hunting large animals, gathering plants, skillfully making stone tools, and fashioning clothing from animal hides. But their survival was precarious. Two new studies show that Neanderthals lived in small, geographically distant groups, experienced severe inbreeding, and came close to extinction 75,000 years ago. Inbreeding is widely considered detrimental to adapting to environmental change, but if the environment stays stable for long enough, inbred populations can still persist for a long time. The researchers report that 75,000 years ago, Neanderthal sites and skeletal remains were distributed widely across the European continent and their genomes were relatively diverse. But the number of sites fell during the glacial period 75,000 to 65,000 years ago, and by 60,000 years ago all that genetic diversity had collapsed into a single lineage. When the climate fluctuated again 45,000 years ago, compounded by the arrival of modern humans in Eurasia, the effective Neanderthal population plummeted within three thousand years, reached its lowest point around 42,000 years ago, and then vanished entirely.

  5. Amazon in talks to acquire Globalstar to challenge Starlink

    Amazon is in talks to acquire Globalstar to help it compete with SpaceX's Starlink broadband satellite constellation. Apple holds a one-fifth stake in Globalstar, so Amazon and Apple would need to negotiate, adding complexity to the deal. The talks could still collapse without any agreement being reached. Globalstar was founded in 1991; buoyed by the acquisition rumors, its market value reached $9 billion on Wednesday. Apple invested $1.5 billion in Globalstar in 2024 in exchange for its 20% stake.

  6. Lab gloves may shed plastic-like particles that skew measurements

    Researchers have found that commonly used nitrile and latex lab gloves shed stearate particles that resemble microplastics, potentially inflating estimates in studies of microplastic pollution. Lab gloves can inadvertently transfer particles onto the lab tools used to analyze air, water, and other samples. The researchers recommend cleanroom gloves, which shed far fewer particles. Stearates are soap-like salt compounds added to disposable gloves to help them release easily from molds during manufacturing. Because their chemistry resembles that of some plastics, they are hard to distinguish in lab analyses, raising the risk of false positives in microplastics research.

  7. Anthropic issues copyright takedowns for tens of thousands of Claude Code source copies

    After the Claude Code source code was accidentally leaked, Anthropic has been demanding the removal of tens of thousands of copies on copyright grounds, but the genie is out of the bottle: new copies keep appearing. Developers analyzing the source have uncovered some of Anthropic's tricks: periodically reviewing tasks to consolidate memory, a process dubbed "dreaming"; a kind of undercover mode that conceals the agent's identity; and an interactive virtual pet called Buddy. Other developers have rewritten Claude Code with other AI tools and in other programming languages, arguing that this does not amount to copyright infringement and should escape takedown.

  8. SpaceX files for IPO

    SpaceX confidentially filed for a listing with the SEC this week, kicking off the largest IPO in history. A confidential filing lets a company advance its listing plans without publicly disclosing financial information. SpaceX aims to raise about $75 billion at a target valuation of roughly $1.75 trillion; in the US, only Nvidia, Apple, Alphabet, Microsoft, and Amazon have market values above that. SpaceX is expected to join the Nasdaq indexes quickly, because the Nasdaq exchange just revised its index-inclusion methodology in ways almost tailor-made for SpaceX: it dropped the requirement to float at least 10% of shares (SpaceX plans to float less than 5%) and now allows inclusion in the Nasdaq-100 just 15 trading days after listing. Critics argue the move could distort post-IPO price discovery.

  9. Microsoft updates terms of service to declare Copilot for entertainment only

    Microsoft was found to have recently updated Copilot's terms of service to include a disclaimer: Copilot is for entertainment purposes only, it makes mistakes, it may not work as expected, do not rely on it for important advice, and use it at your own risk. Frequent users of AI chatbots likely already know that the information they provide is unreliable and needs verification, but because the tools are so convenient, lazy humans have grown less willing to spend time checking their output. Microsoft's disclaimer underscores once again that AI chatbots are neither companions nor reliable sources of advice: they are error-prone tools that can be hugely helpful one moment and wrong the next.

  10. Renewables account for over 80% of new global generating capacity

    IRENA's latest report shows that renewables accounted for 85.6% of new global generating capacity in 2025, with solar making up three quarters of the additions. Renewables added about 700 GW, of which solar contributed 511 GW, bringing total solar capacity to 2.4 TW, more than 1 TW above either wind or hydro. But because of solar's characteristics, 2024 data show it generated less electricity than wind: solar supplied 7% of global generation, wind 8%, and nuclear 9%. The 2025 figures are not yet out, but given its rapid capacity growth, solar's output may already have overtaken wind's to become the second-largest carbon-free electricity source after hydro.

  11. Archaeologists find dice in North America dating back at least 12,000 years

    Gambling has a longer history than you might think. In the journal American Antiquity, archaeologists report the oldest known dice used for gambling by Native Americans, at least 12,000 years old, predating comparable activity in the Old World by six thousand years. From dice throws to horse races, all games of chance rest on probability, a relatively counterintuitive concept. Dice, games of chance, and gambling have long been an important part of Native American cultures; the earliest dice come from Late Pleistocene Folsom strata in Wyoming, Colorado, and New Mexico. The findings suggest that ancient Native Americans grasped the basics of chance, randomness, and probability, putting them ahead of the rest of the world in understanding and applying these concepts.

  12. People speak about 300 fewer words a day each year

    A study published in Perspectives on Psychological Science analyzed audio data collected between 2005 and 2019 from more than 2,000 participants aged 10 to 94. The results show that the number of words we speak each day is falling by about 300 year over year, which adds up to more than 120,000 fewer spoken words per year. Speaking less means spending less time communicating with others, and may also mean greater loneliness, which is closely linked to negative health outcomes. The study found that daily word counts are declining fastest among younger generations.

  13. Python blood contains a weight-loss compound without side effects

    Pythons can grow as long as telephone poles and devour enormous meals in one sitting, then go months without eating, all while maintaining metabolic health. According to a study published in Nature Metabolism, scientists report finding an appetite-suppressing compound in python blood that lacks the side effects of the popular GLP-1 weight-loss drugs. Within hours of feeding, a python's heart expands by 25% and its metabolic rate speeds up roughly 40-fold to help digest the meal. The researchers measured blood samples from fed ball pythons and Burmese pythons and found 208 metabolites that rose significantly after feeding, with one molecule, para-tyramine-O-sulfate (pTOS), surging a thousandfold. Mouse experiments showed that pTOS suppresses appetite and reduces body weight without causing gastrointestinal problems or muscle loss.

  14. Nearly half of planned US data center projects delayed or canceled

    Shortages of critical power equipment such as transformers, switchgear, and batteries have delayed or canceled nearly half of the planned data center projects in the US. The US planned to add 12 GW of data center capacity in 2026, but owing to various problems only a third of that capacity is actively under construction. Power infrastructure accounts for less than 10% of a data center's total cost, yet it is as essential as the computing hardware. With demand booming, lead times for large US power transformers have stretched from 24-30 months before 2020 to five years or more. For AI data centers, whose deployment cycles typically run under 18 months, that is disastrous. To relieve the shortage, US companies have turned to the global market, with Canada, Mexico, and South Korea becoming the main suppliers of large power transformers for US AI data centers. Data show that as of October 2025, US imports of large power transformers from China had risen from fewer than 1,500 units in 2022 to more than 8,000. China also accounts for over 40% of US battery imports and close to 30% of some transformer and switchgear categories.

  15. Surge in lawyers abusing AI to generate fake case citations

    Lawyers keep abusing AI to generate fake, nonexistent case citations, and courts' punishments have failed to deter them; the number of such incidents surged in 2025. Damien Charlotin, a researcher at HEC Paris, maintains a global database tracking lawyers' misuse of AI, and says he recently received 10 such cases from 10 different courts in a single day. He has recorded more than 1,200 incidents of AI-generated fictitious case citations so far, with the US leading at 831 and Hong Kong recording 2. Charlotin says courts have recently begun toughening penalties: an Oregon lawyer was ordered to pay $109,700 in fines and litigation costs for abusing AI.

  16. Gentoo GNU/Hurd is not an April Fools' joke

    On April 1, the Gentoo Linux project announced it would adopt GNU Hurd as its primary kernel. That was not entirely an April Fools' joke: the project really has released a Gentoo GNU/Hurd port. The microkernel-based GNU Hurd is now more than 35 years old, yet version 1.0 has still not shipped; the latest release is v0.9 from 2016. The Gentoo project says its GNU/Hurd port remains experimental and recommends that curious users run it in the QEMU emulator, though the adventurous can of course try it directly on hardware.

  17. "Cognitive surrender" leads AI users to abandon logical thinking

    Users of AI tools generally fall into two camps: those who treat AI as a powerful but fallible service requiring careful human oversight and review to catch reasoning or factual errors, and those who treat AI as all-knowing — the latter dubbed "cognitive surrenderers." After studying 1,372 participants across more than 9,500 trials, researchers at the University of Pennsylvania's Wharton School found that participants accepted the AI's faulty reasoning 73.2% of the time and overrode it only 19.7% of the time. The researchers say the results "suggest people readily incorporate AI-generated output into their decision-making, typically with little resistance or skepticism," and that "fluent, confident output is treated as cognitively authoritative, lowering the bar for scrutiny and dampening the metacognitive signals that would normally prompt deliberation." They found that people inclined to treat AI as an authority were more easily misled by its wrong answers.

  18. AWS engineer reports PostgreSQL performance halved under Linux 7.0

    Amazon AWS engineer Salvatore Dipietro reports a significant drop in PostgreSQL throughput and latency under Linux 7.0, which is still in development and expected to be released within a week or two. Tests on arm64-based Graviton4 servers show PostgreSQL throughput at only 0.51x that of the previous kernel release, caused by a sharp increase in time spent in userspace spinlocks. The root cause is believed to be Linux 7.0's newly introduced restrictions on the preemption modes available to the kernel. PostgreSQL developers have asked for further tests to be repeated under different conditions.

  19. Ubuntu 26.04 LTS raises its minimum memory requirement to 6GB

    Ubuntu 26.04 LTS, which Canonical will officially release later this month, raises the minimum memory requirement to 6GB. Ubuntu 14.04 LTS (Trusty Tahr) set the minimum at 1GB, Ubuntu 18.04 LTS (Bionic Beaver) raised it to 4GB, and now it has gone up again. By comparison, Microsoft sets Windows 11's minimum at 4GB — though that is just another Microsoft fiction, since no one really runs its latest operating system on a 4GB machine; Windows 11 needs at least 8GB. Canonical is not treating 6GB as a hard requirement: users can still install Ubuntu 26.04 on computers with less memory.

  20. Germany names the leader of the Russian ransomware group REvil

    Germany has publicly identified UNKN, the figure who ran the Russian ransomware groups GandCrab and REvil in their early days. Daniil Maksimovich Shchukin, 31, carried out at least 130 computer sabotage and extortion operations in Germany between 2019 and 2021. Germany says that Shchukin, together with another Russian, 43-year-old Anatoly Sergeevitsch Kravchuk, extorted nearly 2 million euros and caused economic losses of more than 35 million euros. Germany's Federal Criminal Police Office (BKA) says the GandCrab and REvil operations he led pioneered double extortion: first charging victims a ransom for the decryption keys, then a second fee in exchange for a promise not to publish the stolen data. GandCrab announced it was disbanding in 2019 after extorting more than $2 billion, only to reappear under the name REvil.