OrangeBot.AI Digest — 2026-04-17

90 headlines across 8 sources, aggregated for the day.

Hacker News (15)

  1. All 12 moonwalkers had "lunar hay fever" from dust smelling like gunpowder (2018) (www.esa.int)
  2. Show HN: Smol machines – subsecond coldstart, portable virtual machines (github.com)
  3. NASA Force (nasaforce.gov)
  4. Measuring Claude 4.7's tokenizer costs (www.claudecodecamp.com)
  5. NIST gives up enriching most CVEs (risky.biz)
  6. Claude Design (www.anthropic.com)
  7. Middle schooler finds coin from Troy in Berlin (www.thehistoryblog.com)
  8. Ban the sale of precise geolocation (www.lawfaremedia.org)
  9. Isaac Asimov: The Last Question (1956) (hex.ooo)
  10. How Big Tech wrote secrecy into EU law to hide data centres' environmental toll (www.investigate-europe.eu)
  11. Ada, its design, and the language that built the languages (www.iqiipi.com)
  12. FIM – Linux framebuffer image viewer (www.nongnu.org)
  13. Reflections on 30 years of HPC programming (chapel-lang.org)
  14. A Python Interpreter Written in Python (aosabook.org)
  15. Bluesky has been dealing with a DDoS attack for nearly a full day (www.theverge.com)

GitHub Trending (15)

  1. EvoMap / evolver
  2. lsdefine / GenericAgent
  3. SimoneAvogadro / android-reverse-engineering-skill
  4. BasedHardware / omi
  5. Lordog / dive-into-llms
  6. Donchitos / Claude-Code-Game-Studios
  7. jamiepine / voicebox
  8. lukilabs / craft-agents-oss
  9. Tracer-Cloud / opensre
  10. obra / superpowers
  11. z-lab / dflash
  12. openai / openai-agents-python
  13. google / magika
  14. pingdotgg / t3code
  15. ChromeDevTools / chrome-devtools-mcp

Product Hunt (15)

  1. Ichiba AI

    AI to AI influence, scored. See what moves the models.

  2. SpeechPal

    The practice room for real life conversations

  3. Briq (Beta)

    Bug verification in 1 click

  4. Wingman City Guide

    Turn saved travel videos into real-world trips

  5. Arky

    The canvas for thinking with AI

  6. Submit.DIY

    All-in-one AI launch platform for makers

  7. Canva AI 2.0

    AI that creates with you, and connects to your world

  8. E.Y.E. by Expert Chase

    Where human life runs with AI

  9. DASCA

    Real-time visual effects with GLSL Playground

  10. Melo

    One canvas for all your work

  11. Build Check (for Outsiders)

    Is your app idea actually worth building?

  12. Codex 2.0 by OpenAI

    Codex now runs apps, automates tasks, codes & more

  13. Noa

    Your life admin for quick availability and scheduling

  14. Claude Opus 4.7

    Claude’s most capable model for reasoning and agentic coding

  15. Athena

    Claude Code for Product Teams

Hugging Face (15)

  1. HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

    We introduce HY-World 2.0, a multi-modal world model framework that advances our prior project HY-World 1.0. HY-World 2.0 accommodates diverse input modalities, including text prompts, single-view images, multi-view images, and videos, and produces 3D world representations. With text or single-view image inputs, the model performs world generation, synthesizing high-fidelity, navigable 3D Gaussian Splatting (3DGS) scenes. This is achieved through a four-stage method: a) Panorama Generation with HY-Pano 2.0, b) Trajectory Planning with WorldNav, c) World Expansion with WorldStereo 2.0, and d) World Composition with WorldMirror 2.0. Specifically, we introduce key innovations to enhance panorama fidelity, enable 3D scene understanding and planning, and upgrade WorldStereo, our keyframe-based view generation model with consistent memory. We also upgrade WorldMirror, a feed-forward model for universal 3D prediction, by refining model architecture and learning strategy, enabling world reconstruction from multi-view images or videos. Also, we introduce WorldLens, a high-performance 3DGS rendering platform featuring a flexible engine-agnostic architecture, automatic IBL lighting, efficient collision detection, and training-rendering co-design, enabling interactive exploration of 3D worlds with character support. Extensive experiments demonstrate that HY-World 2.0 achieves state-of-the-art performance on several benchmarks among open-source approaches, delivering results comparable to the closed-source model Marble. We release all model weights, code, and technical details to facilitate reproducibility and support further research on 3D world models.

  2. RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

    High-level autonomous driving requires motion planners capable of modeling multimodal future uncertainties while remaining robust in closed-loop interactions. Although diffusion-based planners are effective at modeling complex trajectory distributions, they often suffer from stochastic instabilities and the lack of corrective negative feedback when trained purely with imitation learning. To address these issues, we propose RAD-2, a unified generator-discriminator framework for closed-loop planning. Specifically, a diffusion-based generator is used to produce diverse trajectory candidates, while an RL-optimized discriminator reranks these candidates according to their long-term driving quality. This decoupled design avoids directly applying sparse scalar rewards to the full high-dimensional trajectory space, thereby improving optimization stability. To further enhance reinforcement learning, we introduce Temporally Consistent Group Relative Policy Optimization, which exploits temporal coherence to alleviate the credit assignment problem. In addition, we propose On-policy Generator Optimization, which converts closed-loop feedback into structured longitudinal optimization signals and progressively shifts the generator toward high-reward trajectory manifolds. To support efficient large-scale training, we introduce BEV-Warp, a high-throughput simulation environment that performs closed-loop evaluation directly in Bird's-Eye View feature space via spatial warping. RAD-2 reduces the collision rate by 56% compared with strong diffusion-based planners. Real-world deployment further demonstrates improved perceived safety and driving smoothness in complex urban traffic.

  3. DR^{3}-Eval: Towards Realistic and Reproducible Deep Research Evaluation

    Deep Research Agents (DRAs) aim to solve complex, long-horizon research tasks involving planning, retrieval, multimodal understanding, and report generation, yet their evaluation remains challenging due to dynamic web environments and ambiguous task definitions. We propose DR^{3}-Eval, a realistic and reproducible benchmark for evaluating deep research agents on multimodal, multi-file report generation. DR^{3}-Eval is constructed from authentic user-provided materials and paired with a per-task static research sandbox corpus that simulates open-web complexity while remaining fully verifiable, containing supportive documents, distractors, and noise. Moreover, we introduce a multi-dimensional evaluation framework measuring Information Recall, Factual Accuracy, Citation Coverage, Instruction Following, and Depth Quality, and validate its alignment with human judgments. Experiments with our developed multi-agent system DR^{3}-Agent based on multiple state-of-the-art language models demonstrate that DR^{3}-Eval is highly challenging and reveals critical failure modes in retrieval robustness and hallucination control. Our code and data are publicly available.

  4. How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data

    A widely adopted strategy for model enhancement is to use synthetic data generated by a stronger model for supervised fine-tuning (SFT). However, for emerging reasoning models like Qwen3-8B, this approach often fails to improve reasoning capabilities and can even lead to a substantial drop in performance. In this work, we identify substantial stylistic divergence between teacher-generated data and the student's own distribution as a major factor undermining SFT. To bridge this gap, we propose a Teacher-Student Cooperation Data Synthesis framework (TESSY), which interleaves teacher and student models to alternately generate style and non-style tokens. Consequently, TESSY produces synthetic sequences that inherit the advanced reasoning capabilities of the teacher while maintaining stylistic consistency with the student's distribution. In experiments on code generation using GPT-OSS-120B as the teacher, fine-tuning Qwen3-8B on teacher-generated data leads to performance drops of 3.25% on LiveCodeBench-Pro and 10.02% on OJBench, whereas TESSY achieves improvements of 11.25% and 6.68%.
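
    The interleaving scheme can be pictured with a toy sketch; every name below (is_style_token, tessy_decode, the stand-in next-token functions) is an illustrative assumption, not TESSY's actual interface. The teacher proposes each next token, and positions classified as stylistic are re-generated by the student, so reasoning content comes from the teacher while surface style matches the student.

```python
# Illustrative sketch of interleaved teacher-student decoding in the spirit
# of TESSY; all names here are assumptions, not the paper's code.

def is_style_token(token):
    # Toy classifier: connectives and punctuation count as "style" here.
    return token in {"so,", "therefore", "."}

def tessy_decode(teacher_next, student_next, prompt, max_len=6):
    seq = list(prompt)
    for _ in range(max_len):
        candidate = teacher_next(seq)
        # Style positions are filled by the student for distributional
        # consistency; content tokens come from the teacher unchanged.
        seq.append(student_next(seq) if is_style_token(candidate) else candidate)
    return seq

# Toy next-token functions standing in for the two models.
teacher_stream = ["therefore", "x", "=", "2", ".", "done"]
teacher_next = lambda seq: teacher_stream[len(seq) - 1]
student_next = lambda seq: "so,"  # the student's preferred style token

out = tessy_decode(teacher_next, student_next, ["Q:"], max_len=6)
print(out)  # ['Q:', 'so,', 'x', '=', '2', 'so,', 'done']
```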

  5. HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System

    While end-to-end Vision-Language-Action (VLA) models offer a promising paradigm for robotic manipulation, fine-tuning them on narrow control data often compromises the profound reasoning capabilities inherited from their base Vision-Language Models (VLMs). To resolve this fundamental trade-off, we propose HiVLA, a visual-grounded-centric hierarchical framework that explicitly decouples high-level semantic planning from low-level motor control. At the high level, a VLM planner first performs task decomposition and visual grounding to generate structured plans, each comprising a subtask instruction and a precise target bounding box. Then, to translate this plan into physical actions, we introduce a flow-matching Diffusion Transformer (DiT) action expert at the low level, equipped with a novel cascaded cross-attention mechanism. This design sequentially fuses global context, high-resolution object-centric crops, and skill semantics, enabling the DiT to focus purely on robust execution. Our decoupled architecture preserves the VLM's zero-shot reasoning while allowing independent improvement of both components. Extensive experiments in simulation and the real world demonstrate that HiVLA significantly outperforms state-of-the-art end-to-end baselines, particularly excelling in long-horizon skill composition and the fine-grained manipulation of small objects in cluttered scenes.

  6. ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack

    Large language models (LLMs), despite being safety-aligned, exhibit brittle refusal behaviors that can be circumvented by simple linguistic changes. Tense jailbreaking demonstrates that models which refuse harmful requests often comply when the request is rephrased in the past tense, revealing a critical generalization gap in current alignment methods whose underlying mechanisms are poorly understood. In this work, we introduce Activation-Scaling Guard (ASGuard), a mechanistically informed framework that surgically mitigates this specific vulnerability. First, we use circuit analysis to identify the attention heads causally linked to a targeted jailbreak such as a tense-changing attack. Second, we train a precise, channel-wise scaling vector to recalibrate the activations of the tense-vulnerable heads. Lastly, we apply this vector in a "preventative fine-tuning" stage, forcing the model to learn a more robust refusal mechanism. Across four LLMs, ASGuard effectively reduces the attack success rate of targeted jailbreaking while preserving general capabilities and minimizing over-refusal, achieving a Pareto-optimal balance between safety and utility. Based on mechanistic analysis, our findings underscore how adversarial suffixes suppress the propagation of the refusal-mediating direction. Furthermore, our work showcases how a deep understanding of model internals can be leveraged to develop practical, efficient, and targeted methods for adjusting model behavior, charting a course for more reliable and interpretable AI safety.
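
    The channel-wise scaling step can be pictured with a minimal sketch; asguard_scale, the shapes, and the toy values are assumptions, not the paper's code. A learned per-channel vector rescales the activations of a head identified as tense-vulnerable.

```python
# Minimal sketch of channel-wise activation scaling (names assumed):
# a learned per-channel vector rescales one attention head's output.

def asguard_scale(head_activation, scale):
    """head_activation: seq x channels nested list; scale: per-channel vector."""
    return [[a * s for a, s in zip(row, scale)] for row in head_activation]

act = [[1.0] * 8 for _ in range(4)]  # toy activations for one head (4 tokens, 8 channels)
scale = [0.5] * 8                    # learned channel-wise scaling vector
out = asguard_scale(act, scale)
print(len(out), len(out[0]), out[0][0])  # 4 8 0.5
```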

  7. GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens

    The efficient spatial allocation of primitives serves as the foundation of 3D Gaussian Splatting, as it directly dictates the synergy between representation compactness, reconstruction speed, and rendering fidelity. Previous solutions, whether based on iterative optimization or feed-forward inference, suffer from significant trade-offs between these goals, mainly due to the reliance on local, heuristic-driven allocation strategies that lack global scene awareness. Specifically, current feed-forward methods are largely pixel-aligned or voxel-aligned. By unprojecting pixels into dense, view-aligned primitives, they bake redundancy into the 3D asset. As more input views are added, the representation size increases and global consistency becomes fragile. To this end, we introduce GlobalSplat, a framework built on the principle of align first, decode later. Our approach learns a compact, global, latent scene representation that encodes multi-view input and resolves cross-view correspondences before decoding any explicit 3D geometry. Crucially, this formulation enables compact, globally consistent reconstructions without relying on pretrained pixel-prediction backbones or reusing latent features from dense baselines. Utilizing a coarse-to-fine training curriculum that gradually increases decoded capacity, GlobalSplat natively prevents representation bloat. On RealEstate10K and ACID, our model achieves competitive novel-view synthesis performance while utilizing as few as 16K Gaussians, far fewer than dense pipelines require, for a footprint of only 4 MB. Further, GlobalSplat enables significantly faster inference than the baselines, completing a single forward pass in under 78 milliseconds. Project page is available at https://r-itk.github.io/globalsplat/

  8. Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models

    Vision-Language Models (VLMs) have shown remarkable capabilities in joint vision-language understanding, but their large scale poses significant challenges for deployment in resource-constrained scenarios. Knowledge Distillation (KD) offers a viable way to improve model capabilities without increasing model size or data requirements, making deployment more efficient. However, applying KD to VLMs is challenged by modality-specific supervision: although multimodal knowledge in VLMs is fused within the language space, current methods supervise each modality separately without explicitly addressing multimodal alignment, leading to inconsistent multimodal knowledge transfer. To address this, we propose Switch-KD, a visual-switch distillation framework that unifies vision-language knowledge transfer within a shared text-probability space. Switch-KD comprises two key components: (1) Visual-Switch Distillation, which switches the student's visual outputs into the teacher's language pathway to construct cross-modal probabilistic references for implicit visual knowledge transfer; and (2) Dynamic Bi-directional Logits Difference (DBiLD) loss, which adaptively aligns informative probability regions while preserving the distributional structures of teacher and student through bidirectional supervision. Guided by Switch-KD, a 0.5B TinyLLaVA effectively distills rich multimodal knowledge from its 3B teacher, yielding an average improvement of 3.6 points across 10 multimodal benchmarks without any architectural modification.

  9. UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards

    Retrieval-Augmented Generation (RAG) extends Large Vision-Language Models (LVLMs) with external visual knowledge. However, existing visual RAG systems typically rely on generic retrieval signals that overlook the fine-grained visual semantics essential for complex reasoning. To address this limitation, we propose UniDoc-RL, a unified reinforcement learning framework in which an LVLM agent jointly performs retrieval, reranking, active visual perception, and reasoning. UniDoc-RL formulates visual information acquisition as a sequential decision-making problem with a hierarchical action space. Specifically, it progressively refines visual evidence from coarse-grained document retrieval to fine-grained image selection and active region cropping, allowing the model to suppress irrelevant content and attend to information-dense regions. For effective end-to-end training, we introduce a dense multi-reward scheme that provides task-aware supervision for each action. Based on Group Relative Policy Optimization (GRPO), UniDoc-RL aligns agent behavior with multiple objectives without relying on a separate value network. To support this training paradigm, we curate a comprehensive dataset of high-quality reasoning trajectories with fine-grained action annotations. Experiments on three benchmarks demonstrate that UniDoc-RL consistently surpasses state-of-the-art baselines, yielding up to 17.7% gains over prior RL-based methods.

  10. Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction

    Accurate future video prediction requires both high visual fidelity and consistent scene semantics, particularly in complex dynamic environments such as autonomous driving. We present Re2Pix, a hierarchical video prediction framework that decomposes forecasting into two stages: semantic representation prediction and representation-guided visual synthesis. Instead of directly predicting future RGB frames, our approach first forecasts future scene structure in the feature space of a frozen vision foundation model, and then conditions a latent diffusion model on these predicted representations to render photorealistic frames. This decomposition enables the model to focus first on scene dynamics and then on appearance generation. A key challenge arises from the train-test mismatch between ground-truth representations available during training and predicted ones used at inference. To address this, we introduce two conditioning strategies, nested dropout and mixed supervision, that improve robustness to imperfect autoregressive predictions. Experiments on challenging driving benchmarks demonstrate that the proposed semantics-first design significantly improves temporal semantic consistency, perceptual quality, and training efficiency compared to strong diffusion baselines. We provide the implementation code at https://github.com/Sta8is/Re2Pix

  11. TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification

    Every call to an LLM classification endpoint produces a labeled input-output pair already retained in production logs. These pairs constitute a free, growing training set: a lightweight surrogate trained on them can absorb a significant portion of future traffic at near-zero marginal inference cost. The open questions are when the surrogate is reliable enough to deploy, what it handles versus defers, and how that boundary evolves as data accumulates. We introduce TRACER (Trace-based Adaptive Cost-Efficient Routing), an open-source system that trains ML surrogates on an LLM's own production traces and governs deployment through a parity gate: the surrogate is activated only when its agreement with the LLM exceeds a user-specified threshold α. To make the routing boundary transparent, TRACER generates interpretability artifacts describing which input regions the surrogate handles, where it plateaus, and why it defers. On a 77-class intent benchmark with a Sonnet 4.6 teacher, TRACER achieves 83-100% surrogate coverage depending on the quality target α; on a 150-class benchmark, the surrogate fully replaces the teacher. On a natural language inference task, the parity gate correctly refuses deployment because the embedding representation cannot support reliable separation. The system is available as open-source software.
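
    The parity-gate idea can be sketched in a few lines; parity_gate, route, and the toy models below are illustrative assumptions, not TRACER's actual API. A surrogate trained on logged (input, label) pairs is deployed only when its agreement with the LLM clears the user-specified threshold α; otherwise all traffic keeps deferring to the LLM.

```python
# Hedged sketch of a parity-gated surrogate router (names assumed, not
# TRACER's real interface). The surrogate serves traffic only after its
# agreement with the LLM on logged traces exceeds alpha.

def parity_gate(surrogate, llm_labels, inputs, alpha=0.95):
    """True iff the surrogate agrees with the LLM often enough to deploy."""
    agree = sum(surrogate(x) == y for x, y in zip(inputs, llm_labels))
    return agree / len(inputs) >= alpha

def route(x, surrogate, llm, deployed):
    """Serve from the cheap surrogate once the gate has passed, else defer."""
    return surrogate(x) if deployed else llm(x)

# Toy demo: a surrogate that matches the "LLM" on 9 of 10 logged traces.
llm = lambda x: x % 3
surrogate = lambda x: x % 3 if x != 7 else 0
inputs = list(range(10))
labels = [llm(x) for x in inputs]        # labels come free from production logs
deployed = parity_gate(surrogate, labels, inputs, alpha=0.85)
print(deployed)  # True: 90% agreement clears alpha=0.85
```

At a stricter quality target (say α = 0.95) the same gate would refuse deployment, which is the behavior the abstract describes on the NLI task.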

  12. Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems

    Claude Code is an agentic coding tool that can run shell commands, edit files, and call external services on behalf of the user. This study describes its comprehensive architecture by analyzing the publicly available TypeScript source code and further comparing it with OpenClaw, an independent open-source AI agent system that answers many of the same design questions from a different deployment context. Our analysis identifies five human values, philosophies, and needs that motivate the architecture (human decision authority, safety and security, reliable execution, capability amplification, and contextual adaptability) and traces them through thirteen design principles to specific implementation choices. The core of the system is a simple while-loop that calls the model, runs tools, and repeats. Most of the code, however, lives in the systems around this loop: a permission system with seven modes and an ML-based classifier, a five-layer compaction pipeline for context management, four extensibility mechanisms (MCP, plugins, skills, and hooks), a subagent delegation mechanism with worktree isolation, and append-oriented session storage. A comparison with OpenClaw, a multi-channel personal assistant gateway, shows that the same recurring design questions produce different architectural answers when the deployment context changes: from per-action safety classification to perimeter-level access control, from a single CLI loop to an embedded runtime within a gateway control plane, and from context-window extensions to gateway-wide capability registration. We finally identify six open design directions for future agent systems, grounded in recent empirical, architectural, and policy literature.
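
    The "simple while-loop that calls the model, runs tools, and repeats" can be sketched as below; call_model, run_tool, and the message format are stand-ins, not Claude Code's actual implementation.

```python
# Minimal sketch of the agent core loop the paper describes: call the model,
# execute any tools it requests, append the results, and repeat until the
# model produces a final answer. All names here are illustrative stand-ins.

def call_model(messages):
    # Stub model: requests one tool call, then finishes on the next turn.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_calls": [{"name": "echo", "args": "hi"}], "text": None}
    return {"tool_calls": [], "text": "done"}

def run_tool(call):
    return f"{call['name']} -> {call['args']}"

def agent_loop(prompt):
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = call_model(messages)
        if not reply["tool_calls"]:          # model produced a final answer
            return reply["text"]
        for call in reply["tool_calls"]:     # execute each requested tool
            messages.append({"role": "tool", "content": run_tool(call)})

print(agent_loop("list files"))  # prints "done"
```

Per the paper, most of the real system's code lives in the machinery around this loop (permissions, compaction, extensibility, subagents), not in the loop itself.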

  13. Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes

    Text-driven inversion of generative models is a core paradigm for manipulating 2D or 3D content, unlocking numerous applications such as text-based editing, style transfer, or inverse problems. However, it relies on the assumption that generative models remain sensitive to natural language prompts. We demonstrate that for state-of-the-art native text-to-3D generative models, this assumption often collapses. We identify a critical failure mode where generation trajectories are drawn into latent ``sink traps'': regions where the model becomes insensitive to prompt modifications. In these regimes, changes to the input text fail to alter internal representations in a way that alters the output geometry. Crucially, we observe that this is not a limitation of the model's geometric expressivity; the same generative models possess the ability to produce a vast diversity of shapes but, as we demonstrate, become insensitive to out-of-distribution text guidance. We investigate this behavior by analyzing the sampling trajectories of the generative model, and find that complex geometries can still be represented and produced by leveraging the model's unconditional generative prior. This leads to a more robust framework for text-based 3D shape editing that bypasses latent sinks by decoupling a model's geometric representation power from its linguistic sensitivity. Our approach addresses the limitations of current 3D pipelines and enables high-fidelity semantic manipulation of out-of-distribution 3D shapes. Project webpage: https://daidedou.sorpi.fr/publication/beyondprompts

  14. Boosting Visual Instruction Tuning with Self-Supervised Guidance

    Multimodal large language models (MLLMs) perform well on many vision-language tasks but often struggle with vision-centric problems that require fine-grained visual reasoning. Recent evidence suggests that this limitation arises not from weak visual representations, but from under-utilization of visual information during instruction tuning, where many tasks can be partially solved using language priors alone. We propose a simple and lightweight approach that augments visual instruction tuning with a small number of visually grounded self-supervised tasks expressed as natural language instructions. By reformulating classical self-supervised pretext tasks, such as rotation prediction, color matching, and cross-view correspondence, as image-instruction-response triplets, we introduce supervision that cannot be solved without relying on visual evidence. Our approach requires no human annotations, no architectural modifications, and no additional training stages. Across multiple models, training regimes, and benchmarks, injecting only a small fraction (3-10%) of such visually grounded instructions consistently improves performance on vision-centric evaluations. Our findings highlight instruction tuning with visually grounded SSL tasks as a powerful lever for improving visual reasoning in MLLMs through simple adjustments to the training data distribution. Code available at: https://github.com/sirkosophia/V-GIFT
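
    The reformulation can be illustrated with a minimal sketch; rotate, make_rotation_triplet, and the instruction wording are hypothetical, not the paper's code. A classical pretext transform plus a templated instruction yields a triplet that cannot be answered from language priors alone.

```python
import random

# Hedged sketch of recasting an SSL pretext task (rotation prediction) as an
# image-instruction-response triplet, as the abstract describes. rotate() and
# the instruction text are illustrative assumptions.

def rotate(image, deg):
    return {"pixels": image, "rotation": deg}  # placeholder transform

def make_rotation_triplet(image):
    deg = random.choice([0, 90, 180, 270])
    return {
        "image": rotate(image, deg),
        "instruction": "By how many degrees has this image been rotated?",
        "response": f"{deg} degrees",
    }

t = make_rotation_triplet("img_001")
print(t["response"].endswith("degrees"))  # True
```

Mixing a small fraction of such triplets into the instruction-tuning set requires no labels: the supervision signal is generated by the transform itself.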

  15. RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

    Vision-language models (VLMs) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet, existing methods largely relegate clinicians to passive observers of final outputs, offering no interpretable reasoning trace for them to inspect, validate, or refine. To address this, we introduce RadAgent, a tool-using AI agent that generates CT reports through a stepwise and interpretable process. Each resulting report is accompanied by a fully inspectable trace of intermediate decisions and tool interactions, allowing clinicians to examine how the reported findings are derived. In our experiments, we observe that RadAgent improves chest CT report generation over its 3D VLM counterpart, CT-Chat, across three dimensions. Clinical accuracy improves by 6.0 points (36.4% relative) in macro-F1 and 5.4 points (19.6% relative) in micro-F1. Robustness under adversarial conditions improves by 24.7 points (41.9% relative). Furthermore, RadAgent achieves 37.0% in faithfulness, a new capability entirely absent in its 3D VLM counterpart. By structuring the interpretation of chest CT as an explicit, tool-augmented and iterative reasoning trace, RadAgent brings us closer toward transparent and reliable AI for radiology.

Techmeme (15)

  1. Sources: Recursive Superintelligence, a four-month-old start-up developing self-teaching AI and founded by ex-DeepMind and OpenAI engineers, has raised $500M+ (Financial Times)

    Group founded by former engineers at DeepMind and OpenAI secures $4bn valuation in deal with Google's venture arm and Nvidia

  2. White House says a meeting between Chief of Staff Susie Wiles and Dario Amodei had been "productive and constructive"; source: Scott Bessent joined the meeting (Axios)

    Treasury Secretary Scott Bessent joined a meeting on Friday between White House Chief of Staff Susie Wiles …

  3. A deep dive into Dwarkesh Patel's interview with Jensen Huang, including Huang's takes on Nvidia's moat and chip sales to China, and reactions to the interview (Zvi Mowshowitz/Don't Worry About the Vase)

    Some podcasts are self-recommending on the 'yep, I'm going to be breaking this one down' level. This was one of those. So here we go.

  4. Bill Peebles, the researcher behind Sora, is leaving OpenAI, along with Srinivas Narayanan, OpenAI's CTO of enterprise applications (Rebecca Bellan/TechCrunch)

    OpenAI is losing two of the architects of its most ambitious moonshots. Kevin Weil, who led the company's science research initiative, and Bill Peebles …

  5. Cerebras files to go public on Nasdaq and reports $510M in 2025 revenue, up 76% YoY, with a net income of $87.9M, up from a $485M net loss in 2024 (Jordan Novet/CNBC)

    Cerebras, a producer of chips that run artificial intelligence models, on Friday filed to go public on Nasdaq under the ticker symbol “CBRS.”

  6. Kevin Weil, OpenAI's former CPO who became VP of OpenAI for Science, is leaving the company; Prism, a web app for scientists launched in Jan., will be shuttered (Maxwell Zeff/Wired)

    The former Instagram VP is departing the ChatGPT-maker, which is folding the AI science application he led into Codex.

  7. Sources: Meta intends to conduct a first wave of sweeping layoffs planned for this year on May 20, laying off ~10% of its global workforce, or ~8,000 employees (Reuters)

    Meta (META.O) intends to conduct a first wave of sweeping layoffs planned for this year on May 20, with more coming later …

  8. Figma stock closed down 6.84% on Friday after Anthropic launched Claude Design, a dedicated app powered by its latest model Claude Opus 4.7 (Jon Keegan/Sherwood News)

    Today Anthropic launched Claude Design, a dedicated app powered by its latest model Claude Opus 4.7 that lets users use text prompts to build web site designs …

  9. In disclosures due to Broadcom CEO Hock Tan's presence on the Meta board, Meta says it paid Broadcom $2.3B in 2025; Tan is leaving the board (Martin Peers/The Information)

    Meta Platforms paid Broadcom $2.3 billion in 2025, Meta disclosed in a securities filing, a rare disclosure of how much tech firms pay Broadcom for help designing their AI chips.

  10. Sources: Cursor is in advanced talks to raise about $2B co-led by a16z at a pre-money valuation of more than $50B, with Nvidia participating (Bloomberg)

    Cursor, a leading artificial intelligence startup for coding, is in advanced talks with investors to raise about $2 billion in a funding round …

  11. A profile of wealth manager Iconiq, which, sources say, has $100B AUM, with $26B specifically for VC investing; Iconiq invested $3B into AI startups in 2025 (Natasha Mascarenhas/Bloomberg)

    Last year, Anthropic PBC chief Dario Amodei and a handful of executives traveled 8,000 miles from San Francisco to the Middle East.

  12. World expands its Tinder partnership and partners with Zoom and others to verify human users, as it continues its pivot from crypto to identity verification (Maxwell Zeff/Wired)

    Honestly, what's hotter than a real person? Sam Altman's iris-scanning, humanity-verifying World project announced at an event …

  13. Ad buyers say ad rates for ChatGPT are falling from $60 CPM to as low as $25 and the minimum spend to advertise is down to $50K from $250K at launch (Krystal Scanlon/Digiday)

    Krystal Scanlon / Digiday : Ad buyers say ad rates for ChatGPT are falling from $60 CPM to as low as $25 and the minimum spend to advertise is down to $50K from $250K at launch —  Advertising in ChatGPT is already getting cheaper.  The rate advertisers pay to reach every thousand users has fallen from $60 …

  14. Netflix plans to launch a vertical video feed, to help with content discovery, this month, and plans to use AI for content creation and recommendations (Ivan Mehta/TechCrunch)

    Ivan Mehta / TechCrunch : Netflix plans to launch a vertical video feed, to help with content discovery, this month, and plans to use AI for content creation and recommendations —  Netflix is going to launch a TikTok-like vertical video feed within its apps this month, and plans to use AI broadly for content creation …

  15. China's smartphone shipments declined 4% YoY in Q1 amid memory shortages; Huawei's shipments grew 2% YoY for a 20% market share, iPhone grew 20% for a 19% share (Ivan Lam/Counterpoint Research)

    Ivan Lam / Counterpoint Research : China's smartphone shipments declined 4% YoY in Q1 amid memory shortages; Huawei's shipments grew 2% YoY for a 20% market share, iPhone grew 20% for a 19% share —  Ivan is a Senior Research Analyst at Counterpoint Research, based in Hong Kong.  He has more than 15 years of experience, with a major focus on mobile and network devices.

Solidot(15)

  1. Nvidia CEO opposes further restrictions on chip exports to China

    Nvidia CEO Jensen Huang, speaking on the Dwarkesh Podcast, pushed back against calls to tighten US export controls on chipmaking equipment bound for China, opposing any further export restrictions. He argued that China's vast energy resources let it offset process-node gaps by scaling up capacity, making the claim that China cannot develop AI chips on its own "baseless." Huang stressed that the US should not abandon the world's second-largest compute market, and that forcing China to accelerate a homegrown AI technology stack would undermine America's technological lead. He criticized current policy for indirectly fueling the growth of China's chip industry, noting that Huawei's revenue hit a record high last year.

  2. US tech giants got data centers' environmental impact classified as confidential in EU law

    Microsoft and DigitalEurope, a lobbying group whose members include Amazon, Google, and Meta, were found to have successfully secured a confidentiality clause in EU law that blocks public access to information about data centers' environmental impact. Legal scholars believe the clause may violate the EU's transparency rules. The clause was added in 2024 to the revised EU Energy Efficiency Directive. The European Commission published the first draft in 2023 and, following procedure, solicited stakeholder feedback. In early 2024 Microsoft and DigitalEurope submitted theirs: classify data centers' environmental-footprint information as confidential and commercially sensitive. When the Commission published the final text in March 2024, it had incorporated Microsoft's and DigitalEurope's language verbatim.

  3. Firefox adds support for the Web Serial API

    Firefox Nightly has added support for the Web Serial API, which Mozilla opposed six years ago on security grounds. The Web Serial API lets browsers interact with devices that communicate over serial ports, including 3D printers, microcontrollers such as the Arduino and ESP32, smart-home panels such as ESPHome, and devices that emulate serial ports over USB or Bluetooth. Google Chrome has supported the Web Serial API since 2021, and Chromium-based browsers such as Edge, Opera, and Vivaldi support it as well. Mozilla Distinguished Engineer Martin Thomson said in 2020 that for such a powerful capability, users could not be adequately protected even with their consent. Serial ports are a relic of an era when a physical connection conferred a high level of trust; many devices grant anything connected over that interface administrative privileges, without any authentication, that can exceed even root. Two years later Mozilla was asked to reconsider its position, and Firefox CTO Bobby Holley said Mozilla would be willing to support Web Serial using the same add-on-gating mechanism it uses for WebMIDI. Mozilla still opposes WebUSB and WebHID, while Apple's WebKit team remains opposed to Web Serial, WebUSB, and WebHID alike.

  4. Nature is still forging human genes

    Ten thousand years is a mere instant in modern humans' evolutionary history, so scientists long assumed little evolutionary change occurred over that span. But according to a study published in Nature, researchers who analyzed DNA from 15,836 sets of ancient human remains found 479 gene mutations favored by natural selection over the past ten thousand years, and believe thousands more mutations may have undergone selection. They found that the mutation causing celiac disease, a gluten-triggered condition, appeared about 4,000 years ago, meaning it is younger than the Egyptian pyramids. Today perhaps 80 million people worldwide have celiac disease, an autoimmune disorder in which the immune system attacks gluten and damages the gut. For some reason, carriers of the mutation had more offspring than non-carriers. The researchers also found that a mutation which increases smoking rates is declining among Europeans, likely for reasons unrelated to smoking's harms, since Europeans have only smoked for 460 years. The researchers admit they do not know the cause.

  5. How Venice can cope with sea level rise

    Venice, a World Heritage city situated in the Venetian Lagoon, has endured flooding for the past 150 years. Its current flood defenses include three movable barriers at the lagoon's inlets. A study published in Scientific Reports assessed Venice's options for coping with sea level rise. The researchers estimate that with additional measures, the existing movable barriers could handle up to roughly 1.25 meters of sea level rise; under a low-emissions scenario, climate change and land subsidence could push past that threshold by 2300. Once sea level rises 0.5 meters, which under a high-emissions scenario could happen before 2100, building dikes becomes necessary. Closing off the lagoon also becomes viable after 0.5 meters of rise, and would protect the city against up to 10 meters of sea level rise. Beyond 4.5 meters of rise, the researchers suggest, relocating the city may become necessary, which they expect after 2300. They put the total cost of Venice's existing flood-defense system at about 6 billion euros, estimate dikes at 0.5 to 4.5 billion euros, closing the lagoon with super-dikes at an initial cost above 30 billion euros, and relocating the city at as much as 100 billion euros.

  6. SpaceX will launch ESA's Rosalind Franklin Mars rover

    NASA announced that ESA's repeatedly delayed Rosalind Franklin Mars rover will launch to Mars on SpaceX's heavy-lift Falcon Heavy rocket, no earlier than 2028. The rover's history dates back to 1997, when it began as a NASA-ESA collaboration slated for a 2018 launch, but NASA budget cuts during the Obama administration led the US to withdraw from the project. ESA then partnered with Russia and planned a 2020 launch, which slipped to 2022 due to the COVID-19 pandemic and failed parachute tests. Russia's 2022 invasion of Ukraine prompted ESA to end that partnership. The US has now stepped back in, with the two sides signing a cooperation agreement in 2024.

  7. Discourse affirms it will stay open source

    The scheduling platform Cal.com recently announced a move from open source to closed source, arguing that AI tools make it easier to find vulnerabilities in open code and that security depends on obscurity, so going closed improves security. The open source forum software Discourse responded by affirming it will remain open source, and by rejecting that view of software security. Discourse argues that AI tools do not need source code to find vulnerabilities; they target compiled binaries and black-box APIs, and closing the source does not make software more secure. The world's most critical internet infrastructure runs on open source software, Linux foremost among it, with its code perpetually exposed to countless eyes: relentlessly attacked, but also endlessly hardened. That is what open source really means for security. Transparency does not eliminate risk, but it produces stronger defenses. Open source also creates urgency: when your code is public, you expect it to be scrutinized, so you invest earlier and more aggressively in finding and fixing problems before attackers do. Closed source only buys you a false sense of security.

  8. Mainstream US media are blocking the Internet Archive's crawler

    Content archived by the Internet Archive's Wayback Machine is widely used by the media, yet dozens of mainstream US news sites, including the NYT and USA Today, have recently blocked the Internet Archive's archival crawler ia_archiverbot. The social news platform Reddit has blocked it as well, while The Guardian has not blocked it but has throttled it, explaining that the limits are meant to stop AI companies from abusing archival scraping. The NYT gave a similar rationale, saying AI companies are using Internet Archive copies of New York Times content to train their models. AI companies harvest internet content at scale, and the Wayback Machine's decades-deep corpus is seen as an especially attractive data source. The Internet Archive has operated for 30 years and has archived more than a trillion web pages; restrictions by mainstream sites could undermine its preservation work. The Archive is in talks with the NYT and other outlets, hoping they will eventually change course.

  9. New study again finds AI harms the brain

    Researchers have posted a paper on the preprint platform arXiv, "AI assistance reduces persistence and hurts independent performance," again finding that AI harms the brain. They recruited 350 Americans and tasked them with solving fraction equations. Half the participants were randomly assigned to an AI group with access to a dedicated chatbot built on OpenAI's GPT-5, while the other half had to work independently. Halfway through the test, the AI group's access was cut off, causing a sharp drop in their number of correct answers; many simply gave up on the test. This result, a decline in both performance and persistence, was replicated in a larger experiment with 670 participants. The researchers note that AI assistance boosts immediate performance but carries a large cognitive cost: just ten minutes of AI use is enough to create dependence on the technology, and once it is taken away, performance drops and burnout follows.

  10. Mozilla announces Thunderbolt, an open source self-hostable AI client

    Mozilla, in partnership with the German AI infrastructure company deepset, has announced Thunderbolt, an open source, self-hostable AI client. MZLA Technologies Corporation CEO Ryan Sipes said AI is too important to outsource: Thunderbolt gives organizations a sovereign AI client, letting them decide how AI fits into their workflows based on their own infrastructure, data, and needs. Thunderbolt is aimed primarily at enterprise users rather than ordinary Firefox users.

  11. Linux Mint adopts a longer development cycle

    Linux Mint, the Ubuntu-based distribution, has officially announced a slower release cadence. Ubuntu ships a new version every six months, and Linux Mint previously followed a similar cycle. Project co-founder Clem Lefebvre noted that shipping a release every six months, on top of maintaining LMDE, meant the team spent far more time on testing, bug fixing, and releasing than on development. Linux Mint has decided to change that and adopt a longer development cycle: the next release is planned for Christmas 2026, based on Ubuntu 26.04 LTS (expected later in April) and using Linux kernel 7.0 (just released).

  12. Human noise is harming animals. Will we learn to quiet down?

    Humans make noise constantly, and animals do not like it. Animals must stay alert to the sounds around them, listening for approaching predators or prospective mates, and ubiquitous man-made noise makes it harder for them to communicate with one another. Historical recordings from the Presidio park in San Francisco show that in the 1960s the sparrows there had three distinct "dialects," but by the 2010s, traffic noise had pushed them to rely mainly on the highest-pitched one, with the two softer dialects either gone or disappearing. To be heard over the noise, birds sing as loudly as they can. Urban noise has even changed birds' bodies: they are thinner and more stressed. Mating calls have become less effective, because females do not favor high-pitched, loud songs. Noise also fuels conflict among birds: unable to hear warning calls, they stray into rival birds' territory. In the early days of the COVID pandemic, lockdowns imposed worldwide to curb the virus made the world quieter. Noise in the park dropped by 7 decibels, and the sparrows' songs changed: softer, spanning a richer frequency range, carrying twice as far as before, with more alluring mating calls. Researchers have found that above 55 decibels, timid animals enter a stress response; above 65 decibels, nearly all animals flee. Noise is harmful to animals, and to humans as well: studies link traffic noise to poor sleep, elevated blood pressure, higher rates of heart disease, and increased stress. So can we quiet down?

  13. Emperor penguins listed as Endangered as climate change drives population decline

    The International Union for Conservation of Nature (IUCN) announced that two of Antarctica's most iconic species, the emperor penguin and the Antarctic fur seal, have been moved to the Endangered category of the Red List because of rapid, steep population declines. The IUCN classifies species' survival status into nine categories: Extinct, Extinct in the Wild, Critically Endangered, Endangered, Vulnerable, Near Threatened, Least Concern, Data Deficient, and Not Evaluated. According to the IUCN's announcement, satellite imagery shows emperor penguin populations fell by about 10% between 2009 and 2018, equivalent to the loss of more than 20,000 adult birds. Climate change in Antarctica is altering the sea ice, and emperor penguin numbers are projected to halve by the 2180s. Antarctic fur seal numbers have fallen 50% since 2000 due to food shortages: as climate change raises ocean temperatures and shrinks sea ice, krill migrate to deeper, colder waters, reducing the seals' food supply.

  14. Sperm whale vocal communication resembles human language

    Humans and sperm whales might seem to have nothing in common, but according to a study published in Proceedings B, sperm whale vocal communication bears a striking resemblance to human language. Sperm whales not only possess a kind of "alphabet," they also form vowels in their vocalizations, and the vowels are structured the same way as in human language. Sperm whales communicate through series of clicks, and analysis of those clicks found that the whales can distinguish vowels via clicks of varying length or via rising and falling pitch, in patterns similar to languages such as Chinese, Latin, and Slovenian. The finding is the latest result from Project Ceti (for Cetacean Translation Initiative), an effort to translate whale language. Project Ceti founder David Gruber says whales may have been passing information from generation to generation for more than 20 million years.

  15. How Chinese mobile games conquered the world

    AppMagic data show that in February 2026, 7 of the world's 15 highest-grossing mobile games came from China, with in-app purchases in those games generating $668 million in a single month. Western companies such as Playrix, King, Roblox Corporation, and Supercell still rank among the world's top 10 mobile game publishers by revenue, but their success rests on older titles: not one of 2025's 15 highest-grossing new games came from a Western studio. In 2025, Chinese games earned $20.5 billion in overseas markets, the tenth consecutive year of growth and the second straight year of double-digit growth. This is no accident, nor is it simply a function of China's enormous market; Chinese mobile games' global dominance was built step by step. It began in the early 2000s, when rampant PC game piracy in China made the buy-to-play model unsustainable and pushed Chinese companies toward free-to-play with in-app purchases. By the time mobile gaming took off, Chinese companies' understanding of player spending psychology was a decade ahead of the world, and it still shapes how they operate today. In 2025 China had 772 million active mobile players, and mobile games accounted for 73.29% of the country's $50.1 billion total games revenue; fierce competition at home pushed Chinese companies into overseas markets, first with 4X strategy games, then mid-core games, and eventually even the traditional Western strongholds of puzzle, merge, and match-3 games. Chinese companies enjoy structural advantages Western companies lack: a huge and geographically concentrated talent pool; cultural acceptance of shift work; disciplined, readily replaceable staff; lower unit labor costs at scale; and tolerance for enormous teams and rapid reorganization. Western companies cannot sustain multiple shifts for around-the-clock live operations, cannot scale teams to thousands of people, and cannot hire and reorganize at industrial speed, so they cannot compete on those terms. Chinese companies' responsiveness to the market is unmatched: a meme trending on TikTok one day can show up in a game level the next. Chinese companies win on organizational scale; Western companies can win only on organizational and creative precision.