OrangeBot.AI Digest — 2026-04-16

89 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7 (simonwillison.net)
  2. Codex for almost everything (openai.com)
  3. We gave an AI a 3 year retail lease and asked it to make a profit (andonlabs.com)
  4. Claude Opus 4.7 (www.anthropic.com)
  5. Laravel raised money and now injects ads directly into your agent (techstackups.com)
  6. Claude Opus 4.7 (www.anthropic.com)
  7. Mozilla Thunderbolt (www.thunderbolt.io)
  8. Qwen3.6-35B-A3B: Agentic coding power, now open to all (qwen.ai)
  9. Cloudflare Email Service (blog.cloudflare.com)
  10. The future of everything is lies, I guess: Where do we go from here? (aphyr.com)
  11. Cloudflare's AI Platform: an inference layer designed for agents (blog.cloudflare.com)
  12. AI cybersecurity is not proof of work (antirez.com)
  13. €54k spike in 13h from unrestricted Firebase browser key accessing Gemini APIs (discuss.ai.google.dev)
  14. Codex Hacked a Samsung TV (blog.calif.io)
  15. IPv6 traffic crosses the 50% mark (www.google.com)

GitHub Trending (14)

  1. forrestchang / andrej-karpathy-skills
  2. thedotmack / claude-mem
  3. lsdefine / GenericAgent
  4. jamiepine / voicebox
  5. vercel-labs / open-agents
  6. google / magika
  7. steipete / wacli
  8. topoteretes / cognee
  9. z-lab / dflash
  10. Lordog / dive-into-llms
  11. openai / openai-agents-python
  12. EvoMap / evolver
  13. SimoneAvogadro / android-reverse-engineering-skill
  14. BasedHardware / omi

Product Hunt (15)

  1. Foyer

    Make your site speak and sell

  2. Innogath

    Turn deep research into a navigable book + graph

  3. Claude Code Desktop App Redesigned

    Run parallel coding agents from one desktop workspace

  4. Google Chrome Skills

    Turn your best AI prompts into one-click tools in Chrome

  5. Astropad Workbench

    Remote desktop for AI agents running on headless Macs

  6. X-Pilot

    Explain anything accurately, from document to video course

  7. Subspace

    All your Agents in one app and persistent context

  8. Subagents in Gemini CLI

    Gemini CLI now runs specialist subagents in your terminal

  9. Askiva AI

    Your autonomous AI user researcher

  10. Pilot5.ai

    Your question, deliberated by 5 frontier AI models

  11. stagewise

    The coding agent that works in its own browser environment

  12. CodePlanet

    Master coding while building a portfolio

  13. MacSpoof

    A quick and easy MAC address changer

  14. ClayHog

    See what AI says about your brand and competitors

  15. Mantle SAFEs

    Issue & sign SAFEs for free. No DocuSign required.

Hugging Face (15)

  1. Seedance 2.0: Advancing Video Generation for World Complexity

    Seedance 2.0 is a new native multi-modal audio-video generation model, officially released in China in early February 2026. Compared with its predecessors, Seedance 1.0 and 1.5 Pro, Seedance 2.0 adopts a unified, highly efficient, and large-scale architecture for joint multi-modal audio-video generation. It supports four input modalities (text, image, audio, and video) and integrates one of the most comprehensive suites of multi-modal content reference and editing capabilities available in the industry to date. It delivers substantial, well-rounded improvements across all key sub-dimensions of video and audio generation, and in both expert evaluations and public user tests it has demonstrated performance on par with the leading models in the field. Seedance 2.0 directly generates audio-video content with durations of 4 to 15 seconds at native output resolutions of 480p and 720p. As reference inputs, its open platform currently supports up to 3 video clips, 9 images, and 3 audio clips. In addition, we provide Seedance 2.0 Fast, an accelerated variant designed to boost generation speed in low-latency scenarios. Overall, Seedance 2.0 significantly improves foundational generation capabilities and multi-modal generation performance, bringing an enhanced creative experience to end users.

  2. GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

    Towards an embodied generalist for real-world interaction, Multimodal Large Language Model (MLLM) agents still suffer from challenging latency, sparse feedback, and irreversible mistakes. Video games offer an ideal testbed with rich visual observations and closed-loop interaction, demanding fine-grained perception, long-horizon planning, and precise control. However, systematically evaluating these capabilities is currently hindered by heterogeneous action interfaces and heuristic verification. To this end, we introduce GameWorld, a benchmark designed for standardized and verifiable evaluation of MLLMs as generalist game agents in browser environments. Two game agent interfaces are studied: (i) computer-use agents that directly emit keyboard and mouse controls, and (ii) generalist multimodal agents that act in a semantic action space via deterministic Semantic Action Parsing. GameWorld contains 34 diverse games and 170 tasks, each paired with state-verifiable metrics for outcome-based evaluation. The results across 18 model-interface pairs suggest that even the best performing agent is far from achieving human capabilities on video games. Extensive experiments of repeated full-benchmark reruns demonstrate the robustness of the benchmark, while further studies on real-time interaction, context-memory sensitivity, and action validity expose more challenges ahead for game agents. Together, by offering a standardized, verifiable, and reproducible evaluation framework, GameWorld lays a robust foundation for advancing research on multimodal game agents and beyond. The project page is at https://gameworld-bench.github.io.

  3. RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

    Most reward models for visual generation reduce rich human judgments to a single unexplained score, discarding the reasoning that underlies preference. We show that teaching reward models to produce explicit, multi-dimensional critiques before scoring transforms them from passive evaluators into active optimization tools, improving generators in two complementary ways: at training time, structured rationales provide interpretable, fine-grained rewards for reinforcement learning; at test time, a Generate-Critique-Refine loop turns critiques into targeted prompt revisions that improve outputs without any parameter updates. To train such a reward model without costly rationale annotations, we introduce Preference-Anchored Rationalization (PARROT), a principled framework that recovers high-quality rationales from readily available preference data through anchored generation, consistency filtering, and distillation. The resulting model, RationalRewards (8B), achieves state-of-the-art preference prediction among open-source reward models, competitive with Gemini-2.5-Pro, while using 10-20x less training data than comparable baselines. As an RL reward, it consistently improves text-to-image and image-editing generators beyond scalar alternatives. Most strikingly, its test-time critique-and-refine loop matches or exceeds RL-based fine-tuning on several benchmarks, suggesting that structured reasoning can unlock latent capabilities in existing generators that suboptimal prompts fail to elicit.

  4. SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments

    Spatial reasoning over three-dimensional scenes is a core capability for embodied intelligence, yet continuous model improvement remains bottlenecked by the cost of geometric annotation. The self-evolving paradigm offers a promising path, but its reliance on model consensus to construct pseudo-labels causes training to reinforce rather than correct the model's own geometric errors. We identify a property unique to 3D spatial reasoning that circumvents this limitation: ground truth is a deterministic consequence of the underlying geometry, computable exactly from point clouds and camera poses without any model involvement. Building on this insight, we present SpatialEvo, a self-evolving framework for 3D spatial reasoning, centered on the Deterministic Geometric Environment (DGE). The DGE formalizes 16 spatial reasoning task categories under explicit geometric validation rules and converts unannotated 3D scenes into zero-noise interactive oracles, replacing model consensus with objective physical feedback. A single shared-parameter policy co-evolves across questioner and solver roles under DGE constraints: the questioner generates physically valid spatial questions grounded in scene observations, while the solver derives precise answers against DGE-verified ground truth. A task-adaptive scheduler endogenously concentrates training on the model's weakest categories, producing a dynamic curriculum without manual design. Experiments across nine benchmarks demonstrate that SpatialEvo achieves the highest average score at both 3B and 7B scales, with consistent gains on spatial reasoning benchmarks and no degradation on general visual understanding.

  5. OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models

    AI agents are expected to perform professional work across hundreds of occupational domains (from emergency department triage to nuclear reactor safety monitoring to customs import processing), yet existing benchmarks can only evaluate agents in the few domains where public environments exist. We introduce OccuBench, a benchmark covering 100 real-world professional task scenarios across 10 industry categories and 65 specialized domains, enabled by Language World Models (LWMs) that simulate domain-specific environments through LLM-driven tool response generation. Our multi-agent synthesis pipeline automatically produces evaluation instances with guaranteed solvability, calibrated difficulty, and document-grounded diversity. OccuBench evaluates agents along two complementary dimensions: task completion across professional domains and environmental robustness under controlled fault injection (explicit errors, implicit data degradation, and mixed faults). We evaluate 15 frontier models across 8 model families and find that: (1) no single model dominates all industries, as each has a distinct occupational capability profile; (2) implicit faults (truncated data, missing fields) are harder than both explicit errors (timeouts, 500s) and mixed faults, because they lack overt error signals and require the agent to independently detect data degradation; (3) larger models, newer generations, and higher reasoning effort consistently improve performance. GPT-5.2 improves by 27.5 points from minimal to maximum reasoning effort; and (4) strong agents are not necessarily strong environment simulators. Simulator quality is critical for LWM-based evaluation reliability. OccuBench provides the first systematic cross-industry evaluation of AI agents on professional occupational tasks.

  6. Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents

    Memory-based self-evolution has emerged as a promising paradigm for coding agents. However, existing approaches typically restrict memory utilization to homogeneous task domains, failing to leverage the shared infrastructural foundations, such as runtime environments and programming languages, that exist across diverse real-world coding problems. To address this limitation, we investigate Memory Transfer Learning (MTL) by harnessing a unified memory pool from heterogeneous domains. We evaluate performance across 6 coding benchmarks using four memory representations, ranging from concrete traces to abstract insights. Our experiments demonstrate that cross-domain memory improves average performance by 3.7\%, primarily by transferring meta-knowledge, such as validation routines, rather than task-specific code. Importantly, we find that abstraction dictates transferability; high-level insights generalize well, whereas low-level traces often induce negative transfer due to excessive specificity. Furthermore, we show that transfer effectiveness scales with the size of the memory pool, and memory can be transferred even between different models. Our work establishes empirical design principles for expanding memory utilization beyond single-domain silos. Project page: https://memorytransfer.github.io/

  7. From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space

    While reinforcement learning with verifiable rewards (RLVR) significantly enhances LLM reasoning by optimizing the conditional distribution P(y|x), its potential is fundamentally bounded by the base model's existing output distribution. Optimizing the marginal distribution P(y) in the Pre-train Space addresses this bottleneck by encoding reasoning ability and preserving broad exploration capacity. Yet, conventional pre-training relies on static corpora for passive learning, leading to a distribution shift that hinders targeted reasoning enhancement. In this paper, we introduce PreRL (Pre-train Space RL), which applies reward-driven online updates directly to P(y). We theoretically and empirically validate the strong gradient alignment between log P(y) and log P(y|x), establishing PreRL as a viable surrogate for standard RL. Furthermore, we uncover a critical mechanism: Negative Sample Reinforcement (NSR) within PreRL serves as an exceptionally effective driver for reasoning. NSR-PreRL rapidly prunes incorrect reasoning spaces while stimulating endogenous reflective behaviors, increasing transition and reflection thoughts by 14.89x and 6.54x, respectively. Leveraging these insights, we propose Dual Space RL (DSRL), a Policy Reincarnation strategy that initializes models with NSR-PreRL to expand the reasoning horizon before transitioning to standard RL for fine-grained optimization. Extensive experiments demonstrate that DSRL consistently outperforms strong baselines, proving that pre-train space pruning effectively steers the policy toward a refined correct reasoning subspace.

  8. Exploration and Exploitation Errors Are Measurable for Language Model Agents

    Language Model (LM) agents are increasingly used in complex open-ended decision-making tasks, from AI coding to physical AI. A core requirement in these settings is the ability to both explore the problem space and exploit acquired knowledge effectively. However, systematically distinguishing and quantifying exploration and exploitation from observed actions without access to the agent's internal policy remains challenging. To address this, we design controllable environments inspired by practical embodied AI scenarios. Each environment consists of a partially observable 2D grid map and an unknown task Directed Acyclic Graph (DAG). The map generation can be programmatically adjusted to emphasize exploration or exploitation difficulty. To enable policy-agnostic evaluation, we design a metric to quantify exploration and exploitation errors from an agent's actions. We evaluate a variety of frontier LM agents and find that even state-of-the-art models struggle on our task, with different models exhibiting distinct failure modes. We further observe that reasoning models solve the task more effectively, and show that both exploration and exploitation can be significantly improved through minimal harness engineering. Our code is available at https://github.com/jjj-madison/measurable-explore-exploit.

  9. Target Policy Optimization

    In RL, given a prompt, we sample a group of completions from a model and score them. Two questions follow: which completions should gain probability mass, and how should the parameters move to realize that change? Standard policy-gradient methods answer both at once, so the update can overshoot or undershoot depending on the learning rate, clipping, and other optimizer choices. We introduce Target Policy Optimization (TPO), which separates the two questions. Given scored completions, TPO constructs a target distribution q_i ∝ p_i^old · exp(u_i) and fits the policy to it by cross-entropy. The loss gradient on sampled-completion logits is p^θ − q, which vanishes once the policy matches the target. On tabular bandits, transformer sequence tasks, and billion-parameter LLM RLVR, TPO matches PG, PPO, GRPO, and DG on easy tasks and substantially outperforms them under sparse reward. Code is available at https://github.com/JeanKaddour/tpo.
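The target construction described in this abstract is concrete enough to sketch numerically. Below is a hypothetical illustration: the mapping q_i ∝ p_i^old · exp(u_i) and the gradient p^θ − q come from the abstract, while the function names and array values are invented for the example.

```python
import numpy as np

def tpo_target(p_old, u):
    """TPO target: q_i proportional to p_old_i * exp(u_i), normalized."""
    w = p_old * np.exp(u - np.max(u))  # shift by max(u) for numerical stability
    return w / w.sum()

def tpo_logit_grad(logits, q):
    """Gradient of cross-entropy(q, softmax(logits)) w.r.t. the logits: p - q."""
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return p - q

p_old = np.array([0.5, 0.3, 0.2])  # old-policy probs of 3 sampled completions
u = np.array([1.0, 0.0, -1.0])     # per-completion scores
q = tpo_target(p_old, u)           # mass shifts toward the best-scored completion
grad = tpo_logit_grad(np.log(q), q)  # near zero: gradient vanishes at the target
```

The point of the construction is the decoupling the abstract emphasizes: the target q says *where* probability mass should go, while the cross-entropy fit decides *how far* each step moves, and the gradient p^θ − q goes to zero exactly when the policy reproduces the target.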

  10. Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure

    AI coding agents have become central to developer workflows, yet every existing solution locks its reasoning capabilities within a specific delivery form, such as a CLI, IDE plugin, or web application. This limitation creates systemic barriers when enterprises attempt to reuse these capabilities across heterogeneous engineering environments. To address this challenge, we present Sema Code, an open AI coding framework built on the principle of being embeddable, pluggable, and framework-first. Sema Code completely decouples the core agent engine from all client layers, publishing it as a standalone npm library that any runtime can drive programmatically. Built around this architecture, we designed eight key mechanisms: multi-tenant engine isolation, FIFO input queuing with safe session reconstruction, adaptive context compression, multi-agent collaborative scheduling, intelligent Todo-based process management, four-layer asynchronous permission control, three-tier ecosystem integration spanning MCP, Skills, and Plugins, and a background task framework with separated execution and observation privileges. These mechanisms collectively address the engineering challenges of transforming a complex agent engine into a shared, programmable core. Demonstrating its architectural versatility, the same Sema Core engine simultaneously powers a VSCode extension and a multi-channel messaging gateway, which we name SemaClaw, to unify agent interactions across platforms such as Telegram and Feishu. These represent two fundamentally different product forms sharing an identical reasoning kernel, differing only at the client layer.

  11. SemaClaw: A Step Towards General-Purpose Personal AI Agents through Harness Engineering

    The rise of OpenClaw in early 2026 marks the moment when millions of users began deploying personal AI agents into their daily lives, delegating tasks ranging from travel planning to multi-step research. This scale of adoption signals that two parallel arcs of development have reached an inflection point. First is a paradigm shift in AI engineering, evolving from prompt and context engineering to harness engineering: designing the complete infrastructure necessary to transform unconstrained agents into controllable, auditable, and production-reliable systems. As model capabilities converge, this harness layer is becoming the primary site of architectural differentiation. Second is the evolution of human-agent interaction from discrete tasks toward a persistent, contextually aware collaborative relationship, which demands open, trustworthy, and extensible harness infrastructure. We present SemaClaw, an open-source multi-agent application framework that addresses these shifts by taking a step towards general-purpose personal AI agents through harness engineering. Our primary contributions include a DAG-based two-phase hybrid agent team orchestration method, a PermissionBridge behavioral safety system, a three-tier context management architecture, and an agentic wiki skill for automated personal knowledge base construction.

  12. Free Geometry: Refining 3D Reconstruction from Longer Versions of Itself

    Feed-forward 3D reconstruction models are efficient but rigid: once trained, they perform inference in a zero-shot manner and cannot adapt to the test scene. As a result, visually plausible reconstructions often contain errors, particularly under occlusions, specularities, and ambiguous cues. To address this, we introduce Free Geometry, a framework that enables feed-forward 3D reconstruction models to self-evolve at test time without any 3D ground truth. Our key insight is that, when the model receives more views, it produces more reliable and view-consistent reconstructions. Leveraging this property, given a testing sequence, we mask a subset of frames to construct a self-supervised task. Free Geometry enforces cross-view feature consistency between representations from full and partial observations, while maintaining the pairwise relations implied by the held-out frames. This self-supervision allows for fast recalibration via lightweight LoRA updates, taking less than 2 minutes per dataset on a single GPU. Our approach consistently improves state-of-the-art foundation models, including Depth Anything 3 and VGGT, across 4 benchmark datasets, yielding an average improvement of 3.73% in camera pose accuracy and 2.88% in point map prediction. Code is available at https://github.com/hiteacherIamhumble/Free-Geometry .

  13. TIP: Token Importance in On-Policy Distillation

    On-policy knowledge distillation (OPD) trains a student on its own rollouts under token-level supervision from a teacher. Not all token positions matter equally, but existing views of token importance are incomplete. We ask a direct question: which tokens carry the most useful learning signal in OPD? Our answer is that informative tokens come from two regions: positions with high student entropy, and positions with low student entropy plus high teacher–student divergence, where the student is overconfident and wrong. Empirically, student entropy is a strong first-order proxy: retaining 50% of tokens with entropy-based sampling matches or exceeds all-token training while reducing peak memory by up to 47%. But entropy alone misses a second important region. When we isolate low-entropy, high-divergence tokens, training on fewer than 10% of all tokens nearly matches full-token baselines, showing that overconfident tokens carry dense corrective signal despite being nearly invisible to entropy-only rules. We organize these findings with TIP (Token Importance in on-Policy distillation), a two-axis taxonomy over student entropy and teacher–student divergence, and give a theoretical explanation for why entropy is useful yet structurally incomplete. This view motivates type-aware token selection rules that combine uncertainty and disagreement. We validate this picture across three teacher–student pairs spanning Qwen3, Llama, and Qwen2.5 on MATH-500 and AIME 2024/2025, and on the DeepPlanning benchmark for long-horizon agentic planning, where Q3-only training on <20% of tokens surpasses full-token OPD. Our experiments are implemented by extending the OPD repository https://github.com/HJSang/OPSD_OnPolicyDistillation, which supports memory-efficient distillation of larger models under limited GPU budgets.
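The two token regions named in this abstract can be sketched as a toy selection rule over per-position student entropy and teacher-student divergence. Everything below (the thresholds, the KL direction, the array values) is an illustrative assumption, not the paper's actual rule.

```python
import numpy as np

def entropy(p):
    """Shannon entropy per token position; p has shape (T, V)."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def kl(p, q):
    """KL(p || q) per token position."""
    p = np.clip(p, 1e-12, 1.0)
    q = np.clip(q, 1e-12, 1.0)
    return (p * np.log(p / q)).sum(axis=-1)

def select_tokens(student, teacher, h_thresh=1.0, d_thresh=0.5):
    """Keep uncertain tokens (high student entropy) plus
    confident-but-wrong tokens (high teacher-student divergence)."""
    h = entropy(student)
    d = kl(teacher, student)
    return (h > h_thresh) | (d > d_thresh)

student = np.array([[0.99, 0.005, 0.005],   # confident and wrong
                    [1/3,  1/3,   1/3],     # uncertain
                    [0.99, 0.005, 0.005]])  # confident and right
teacher = np.array([[0.05, 0.90,  0.05],
                    [1/3,  1/3,   1/3],
                    [0.99, 0.005, 0.005]])
mask = select_tokens(student, teacher)  # [True, True, False]
```

The third position illustrates the paper's key observation: an entropy-only rule would treat the first and third tokens identically (both low entropy), while the divergence axis separates the overconfident-and-wrong position, which carries dense corrective signal.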

  14. LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling

    Continuous diffusion has been the foundation of high-fidelity, controllable, and few-step generation of many data modalities such as images. However, in language modeling, prior continuous diffusion language models (DLMs) lag behind discrete counterparts due to the sparse data space and the underexplored design space. In this work, we close this gap with LangFlow, the first continuous DLM to rival discrete diffusion, by connecting embedding-space DLMs to Flow Matching via Bregman divergence, alongside three key innovations: (1) we derive a novel ODE-based NLL bound for principled evaluation of continuous flow-based language models; (2) we propose an information-uniform principle for setting the noise schedule, which motivates a learnable noise scheduler based on a Gumbel distribution; and (3) we revise prior training protocols by incorporating self-conditioning, as we find it improves both likelihood and sample quality of embedding-space DLMs with effects substantially different from discrete diffusion. Putting everything together, LangFlow rivals top discrete DLMs on both the perplexity (PPL) and the generative perplexity (Gen. PPL), reaching a PPL of 30.0 on LM1B and 24.6 on OpenWebText. It even exceeds autoregressive baselines in zero-shot transfer on 4 out of 7 benchmarks. LangFlow provides the first clear evidence that continuous diffusion is a promising paradigm for language modeling. Homepage: https://github.com/nealchen2003/LangFlow

  15. TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

    While Large Language Models (LLMs) have empowered AI research agents to perform isolated scientific tasks, automating complex, real-world workflows, such as LLM training, remains a significant challenge. In this paper, we introduce TREX, a multi-agent system that automates the entire LLM training life-cycle. By orchestrating collaboration between its two core modules, the Researcher and the Executor, the system seamlessly performs requirement analysis, open-domain literature and data research, formulation of training strategies, preparation of data recipes, and model training and evaluation. The multi-round experimental process is modeled as a search tree, enabling the system to efficiently plan exploration paths, reuse historical results, and distill high-level insights from iterative trials. To evaluate the capability of automated LLM training, we construct FT-Bench, a benchmark comprising 10 tasks derived from real-world scenarios, ranging from optimizing fundamental model capabilities to enhancing performance on domain-specific tasks. Experimental results demonstrate that the TREX agent consistently optimizes model performance on target tasks.

Techmeme (15)

  1. Netflix says its 2026 ad revenue remains on track to reach $3B, doubling from 2025, and it now works with over 4,000 advertising clients, up 70% YoY (Bill Bradley/Adweek)

    Bill Bradley / Adweek : Netflix says its 2026 ad revenue remains on track to reach $3B, doubling from 2025, and it now works with over 4,000 advertising clients, up 70% YoY —  Netflix reveals ad sales projections and new products on the way in 2026 … The numbers  —  $2.8 billion — Netflix's payment received …

  2. Google updates AI Mode in Chrome, letting users open links side by side with AI Mode on desktop; users can search across multiple tabs on desktop and mobile (Aisha Malik/TechCrunch)

    Aisha Malik / TechCrunch : Google updates AI Mode in Chrome, letting users open links side by side with AI Mode on desktop; users can search across multiple tabs on desktop and mobile —  Google announced on Thursday that it's rolling out a new way to explore the web with AI Mode, its conversational search experience.

  3. Netflix reports Q1 revenue up 16% YoY to $12.3B, vs. $12.2B est., net income of $5.28B, and forecasts Q2 EPS below estimates; NFLX drops 8%+ after hours (Lucas Shaw/Bloomberg)

    Lucas Shaw / Bloomberg : Netflix reports Q1 revenue up 16% YoY to $12.3B, vs. $12.2B est., net income of $5.28B, and forecasts Q2 EPS below estimates; NFLX drops 8%+ after hours —  Netflix Inc. gave a forecast for the second quarter that fell short of analysts expectations, sending the shares down in extended trading.

  4. Netflix chairman and co-founder Reed Hastings will step down from the company's board after his term expires in June to focus on philanthropy and other pursuits (Isabella Simonetti/Wall Street Journal)

    Isabella Simonetti / Wall Street Journal : Netflix chairman and co-founder Reed Hastings will step down from the company's board after his term expires in June to focus on philanthropy and other pursuits —  Co-founder won't stand for re-election to the board, Netflix says in first earnings report since pulling out of Warner Discovery bid

  5. Sources: Upscale AI, which builds AI networking infrastructure, is in talks to raise $180M to $200M at a $2B valuation, its third funding round in seven months (Bloomberg)

    Bloomberg : Sources: Upscale AI, which builds AI networking infrastructure, is in talks to raise $180M to $200M at a $2B valuation, its third funding round in seven months —  Artificial intelligence startup Upscale AI is in talks to raise a new round of funding at a valuation of about $2 billion, according to people with knowledge of the efforts.

  6. AI labs are buying Slack, Jira, and email archives from defunct startups to build "reinforcement learning gyms" and train AI agents in simulated workplaces (Anna Tong/Forbes)

    Anna Tong / Forbes : AI labs are buying Slack, Jira, and email archives from defunct startups to build “reinforcement learning gyms” and train AI agents in simulated workplaces —  Defunct startups are being liquidated for their Slack archives, Jira tickets, and email threads—operational exhaust that AI labs now treat as premium training data.

  7. OpenAI launches GPT-Rosalind, an AI model for life sciences research, including drug discovery, as a research preview for customers such as Moderna and Amgen (Megan Morrone/Axios)

    Megan Morrone / Axios : OpenAI launches GPT-Rosalind, an AI model for life sciences research, including drug discovery, as a research preview for customers such as Moderna and Amgen —  OpenAI announced a new series of AI models built to help life sciences researchers work faster.

  8. OpenAI's global policy chief, Chris Lehane, calls out AI "doomers" and says "when you put some of those thoughts and ideas out there, they do have consequences" (Caroline O'Donovan/The San Francisco ...)

    Caroline O'Donovan / The San Francisco Standard : OpenAI's global policy chief, Chris Lehane, calls out AI “doomers” and says “when you put some of those thoughts and ideas out there, they do have consequences” —  OpenAI's Chris Lehane thinks the discussion around AI has gotten out of hand.

  9. Memo: the White House notified Cabinet departments that OMB is setting up protections that would allow their agencies to begin using Anthropic's Mythos model (Bloomberg)

    Bloomberg : Memo: the White House notified Cabinet departments that OMB is setting up protections that would allow their agencies to begin using Anthropic's Mythos model —  The US government is preparing to make a version of Anthropic PBC's powerful new artificial intelligence model available …

  10. DeepL, best known for its text translation tools, launches DeepL Voice-to-Voice, which enables real-time spoken translation, with add-ons for services like Zoom (Ivan Mehta/TechCrunch)

    Ivan Mehta / TechCrunch : DeepL, best known for its text translation tools, launches DeepL Voice-to-Voice, which enables real-time spoken translation, with add-ons for services like Zoom —  DeepL, a translation company best known for its text tools, released a voice-to-voice translation suite today that covers use cases …

  11. The UK unveils Sovereign AI, a £500M fund to invest in domestic AI startups, starting with Callosum, which builds software to help different chips work together (Joel Khalili/Wired)

    Joel Khalili / Wired : The UK unveils Sovereign AI, a £500M fund to invest in domestic AI startups, starting with Callosum, which builds software to help different chips work together —  In a bid to minimize dependence on technology from other countries, the UK government is plowing resources into homegrown AI startups.

  12. OpenAI updates its Codex desktop app with features like computer control, an in-app browser, image generation, automation memory, plugin support, and more (David Gewirtz/ZDNET)

    David Gewirtz / ZDNET : OpenAI updates its Codex desktop app with features like computer control, an in-app browser, image generation, automation memory, plugin support, and more —  ZDNET's key takeaways  — Codex Desktop expands from coding into full productivity workflows.  — Automation can generate images, charts, and workflow outputs.

  13. Google's 2025 Ads Safety Report: Google blocked a record 8.3B ads, up 63% YoY, but suspended 36% fewer advertisers, attributing the disparity to its use of AI (Jagmeet Singh/TechCrunch)

    Jagmeet Singh / TechCrunch: Google said Thursday it blocked a record 8.3 billion ads globally in 2025, up from 5.1 billion the year before.

  14. Drift Protocol secures $147.5M in funding, including $127.5M from Tether, to replace Circle stablecoin with USDT after a $270M exploit linked to North Korea (Will Canny/CoinDesk)

    Will Canny / CoinDesk: What to know: … Drift Protocol, the victim of a recent North Korean exploit, plans to relaunch with Tether's USDT as its settlement layer …

  15. The US DOJ says a judge sentenced two US citizens to a combined 16 years in prison for running laptop farms that let North Korean IT workers pose as US workers (Jowi Morales/Tom's Hardware)

    Jowi Morales / Tom's Hardware: Your online co-worker isn't who you think. … Two individuals from New Jersey pleaded guilty to conspiracy to commit wire fraud …

Solidot(15)

  1. New study again confirms AI harms the brain

    Researchers published a paper on the preprint platform arXiv titled "AI assistance reduces persistence and hurts independent performance," again confirming that AI harms the brain. They recruited 350 Americans and tasked them with solving fractional equations. Half the participants were randomly assigned to an AI group with access to a dedicated chatbot built on OpenAI's GPT-5; the other half had to work unaided. Midway through the test, the AI group's access was cut off. Their number of correct answers dropped sharply, and many simply gave up. This result, a decline in both performance and persistence, was replicated in a larger experiment with 670 participants. The researchers note that AI assistance boosts immediate performance but carries a steep cognitive cost: just ten minutes of AI use is enough to create dependence on the technology, and once it is withdrawn, performance drops and burnout follows.

  2. Mozilla announces Thunderbolt, an open-source, self-hostable AI client

    Mozilla, in partnership with the German AI infrastructure company deepset, has announced Thunderbolt, an open-source, self-hostable AI client. MZLA Technologies Corporation CEO Ryan Sipes said AI is too important to outsource; Thunderbolt gives organizations an autonomous AI client, letting them decide how AI fits into their workflows based on their own infrastructure, data, and needs. Thunderbolt is aimed primarily at enterprise users rather than ordinary Firefox users.

  3. Linux Mint adopts a longer development cycle

    The Ubuntu-based distribution Linux Mint has officially announced a slower release cadence. Ubuntu ships a new version every six months, and Linux Mint previously followed a similar schedule. Project co-founder Clem Lefebvre noted that with a new release every six months, plus LMDE on top, the team was spending far more time testing, fixing bugs, and shipping releases than actually developing. Linux Mint has decided to change that and adopt a longer cycle: the next release is planned for Christmas 2026, based on Ubuntu 26.04 LTS (expected in late April) and using the just-released Linux kernel 7.0.

  4. Human noise is harming animals. Will we learn to be quiet?

    Humans make constant noise, and animals do not like it. Animals must stay alert to the sounds around them, listening for approaching predators or potential mates, and pervasive human-made noise makes it harder for them to communicate with one another. Historical recordings from San Francisco's Presidio park show that in the 1960s the sparrows there had three distinct "dialects"; by the 2010s, traffic noise had pushed them to rely mainly on the highest-pitched one, with the two softer dialects either gone or disappearing. To be heard over the din, the birds must sing at full strength. Urban noise has even changed the birds' bodies: they are thinner and more stressed. Courtship calls have also lost their effect, since females do not much care for high-pitched, high-volume songs. Noise even fuels conflict between birds: unable to hear warning calls, they stray into rival territory. In the early COVID-19 pandemic, lockdowns worldwide made the world quieter. Noise in the park dropped by 7 decibels, and the sparrows' songs changed: softer, spanning a richer frequency range, carrying twice as far as before, with more alluring courtship calls. Researchers found that above 55 decibels, timid animals enter a stress response; above 65 decibels, almost all animals flee. Noise harms animals, and humans too: studies link traffic noise to poor sleep, higher blood pressure, increased heart disease, and greater stress. So, can we quiet down?

  5. Emperor penguins listed as Endangered as climate change drives decline

    The International Union for Conservation of Nature (IUCN) announced that two of Antarctica's most iconic species, the emperor penguin and the Antarctic fur seal, have been moved to the Endangered category of the Red List after rapid, steep population declines. The IUCN classifies species into nine categories: Extinct, Extinct in the Wild, Critically Endangered, Endangered, Vulnerable, Near Threatened, Least Concern, Data Deficient, and Not Evaluated. According to the announcement, satellite imagery shows emperor penguin populations fell by about 10% between 2009 and 2018, a loss of more than 20,000 adult birds. Climate change in Antarctica is altering the sea ice, and projections indicate emperor penguin numbers will halve by the 2180s. Antarctic fur seal numbers have fallen 50% since 2000 due to food shortages: climate change has warmed the oceans and shrunk the sea ice, driving krill into deeper, colder waters and reducing the seals' food supply.

  6. Sperm whale vocal communication resembles human speech

    Humans and sperm whales may seem to have nothing in common, but according to a study published in Proceedings B, sperm whale vocal communication bears striking similarities to human language. Sperm whales not only possess a kind of "alphabet" but also form vowels in their vocalizations, and those vowels are structured in the same way as in human languages. Sperm whales communicate through sequences of clicks; analysis of the clicks shows the whales can distinguish vowels by varying click duration or by rising and falling pitch, in patterns similar to languages such as Chinese, Latin, and Slovenian. The finding is the latest result from Project Ceti (short for Cetacean Translation Initiative), which aims to translate whale language. Project Ceti founder David Gruber says whales may have been passing information from generation to generation for more than 20 million years.

  7. How Chinese mobile games conquered the world

    Data from AppMagic shows that in February 2026, 7 of the world's 15 highest-grossing mobile games came from China, with their in-app purchases generating $668 million in a single month. Western companies such as Playrix, King, Roblox Corporation, and Supercell still rank among the top 10 mobile game publishers by global revenue, but their success rests on older titles: not one of 2025's 15 highest-grossing new games came from a Western studio. Chinese games earned $20.5 billion in overseas markets in 2025, the tenth consecutive year of growth and the second straight year of double-digit growth. This is no accident, nor merely a function of China's huge domestic market; Chinese mobile gaming's global dominance was built step by step. It began in the early 2000s, when rampant PC game piracy in China made the pay-to-own model untenable, so Chinese game companies pivoted to free-to-play with in-app purchases. By the time mobile games took off, Chinese companies' understanding of player spending psychology was a decade ahead of the rest of the world, and it still shapes how they operate today. In 2025 China had 772 million active mobile gamers, and mobile games accounted for 73.29% of the $50.1 billion total games market; intense domestic competition pushed Chinese companies into overseas markets, first with 4X strategy games, then mid-core titles, and eventually into the puzzle, merge, and match-3 genres that were once Western strongholds. Chinese companies hold structural advantages Western ones lack: a huge, geographically concentrated talent pool; cultural acceptance of shift work; disciplined, easily replaceable employees; lower unit labor costs at scale; and a tolerance for massive teams and rapid reorganization. Western companies cannot sustain multiple shifts for round-the-clock live operations, cannot scale teams to thousands of people, cannot hire and reorganize at industrial speed, and cannot compete on those terms. No one reacts to the market faster: a meme trending on TikTok one day can appear in a game level the next. Chinese companies win on organizational scale; Western companies can only win on organizational and creative precision.

  8. IPv6 adoption crosses the 50% mark

    According to Google's statistics, global IPv6 adoption recently crossed 50%. By country, China sits at just 4.66%, the US at 54.61%, Canada at 40.91%, Russia at 47.67%, and India at 74.87%. Cloudflare's figures differ markedly from Google's: measuring the share of IPv6 among HTTP requests, Cloudflare puts China at 32.6%, India at 69.6%, and the US at 49.1%. Akamai's data shows China at 26.1%, India at 73.4%, and the US at 48.1%.
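    The three trackers disagree partly because they measure different things (Cloudflare, for instance, tallies the v4/v6 split of HTTP requests). As a minimal sketch of that kind of tally, here's how the classification step could look using Python's standard ipaddress module; the sample addresses below are hypothetical documentation-range values, not real measurement data.

    ```python
    import ipaddress
    from collections import Counter

    def ipv6_share(addrs):
        """Fraction of addresses that are IPv6, in the spirit of a
        per-request tally like Cloudflare's HTTP-request metric."""
        counts = Counter(ipaddress.ip_address(a).version for a in addrs)
        total = counts[4] + counts[6]
        return counts[6] / total if total else 0.0

    # Hypothetical request log: two IPv6 clients, two IPv4 clients.
    sample = ["2001:db8::1", "203.0.113.7", "2001:db8::2", "198.51.100.9"]
    print(ipv6_share(sample))  # 0.5
    ```

    Real measurements differ mainly in what they count as the denominator (users, connections, or requests), which is why Google, Cloudflare, and Akamai can report such different per-country numbers.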

  9. Norwegian man cured of HIV after stem cell transplant from his brother

    A 63-year-old Norwegian man has been cured of HIV after receiving a stem cell transplant from his brother; he has been off antiretroviral therapy for two years, with no virus detected in regular testing. Roughly 10 people worldwide have been cured of HIV through stem cell transplants, but this is the first case involving a family member as donor. The man was diagnosed with HIV in 2006 and, in 2018, with myelodysplastic syndrome, a cancer that impairs the bone marrow's ability to produce blood cells. When the cancer returned two years after treatment, his doctors decided on a stem cell transplant, hoping to find a donor who was both a marrow match and a carrier of a genetic mutation conferring HIV resistance. The mutation, CCR5-delta 32, makes immune cells resistant to HIV-1, the most common strain of the virus, but only 1% of the population carries two copies of it. Unable to find such a donor, the doctors turned to the man's brother, who was a marrow match; unexpectedly, testing revealed he also carried the CCR5-delta 32 mutation.

  10. Boston Dynamics' robot dog integrates Google's Gemini model

    Boston Dynamics has integrated Google DeepMind's advanced embodied reasoning model, Gemini Robotics-ER 1.6, into its robot dog Spot, giving it stronger autonomous reasoning for industrial inspection tasks such as spotting leaks and reading gauges; the robot can also recognize when to call on other AI tools. The collaboration between Boston Dynamics and Google DeepMind focuses on industrial inspection, that is, whether the robot dog can identify potential explosion hazards while patrolling industrial facilities. With Gemini Robotics integrated, Spot can autonomously search for hazardous debris or leaks, read complex gauges and sight glasses, and invoke tools such as vision-language-action models when it needs help understanding its surroundings. Boston Dynamics has posted a video on YouTube demonstrating Spot's new capabilities.

  11. Cal.com goes closed source, citing AI

    Scheduling platform Cal.com has announced it is moving from open source to closed source, arguing that AI tools make it easier to find vulnerabilities in open code and that security now rests on obscurity, so closing the source will improve it. Cal co-founder Peer Richelsen says AI-equipped attackers are exploiting the transparency of open source projects; CEO Bailey Pumfleet likens open source to a bank publishing the blueprints of its vault, saying the number of hackers studying those blueprints with AI tools has grown a hundredfold. Cal.com says it will continue to support open source and will offer hobbyists a standalone open source edition, Cal.diy, while its core product switches from the GNU Affero General Public License (AGPL) to a closed license. Pumfleet says the company does not want a vulnerability to expose customers' sensitive booking data; it aims to be a scheduling company, not a cybersecurity company.

  12. Claim a free NVIDIA DLI self-paced course (normally $30/$90) and earn a certificate

    Eligibility: users who have not previously registered as NVIDIA developers can use the sign-up link to claim one paid DLI self-paced online course for free, including a cloud lab environment and an NVIDIA training certificate upon passing. One course per user (per email account). Seven English and five Chinese courses are currently available; the list may change at any time, free slots are limited, first come first served. Claiming takes four steps: 1. Register a new developer account with a previously unregistered email, set a password, and complete your profile. 2. Claim one free course: scroll to the NVIDIA Training and Certification section, click Claim Now, and fill in your personal details (you must be 18 or older; set Location to China). 3. Pick a Chinese or English course that interests you (switch language at the top right). 4. Study the course and pass the test to receive your NVIDIA training certificate. Before studying, verify your environment; self-test link: http://websocketstest.courses.nvidia.com

  13. Global warming threatens rice yields

    More than 90% of the world's rice is grown in Asia, where over a billion people depend on it as a staple food. Historical data show that over the past 9,000 years Asian rice has rarely thrived in regions where the annual mean temperature exceeds 28°C or warm-season highs exceed 33°C, yet the planet is warming 5,000 times faster than rice can adapt. According to a study published in Communications Earth & Environment, by 2070 temperatures in traditional rice-growing regions such as India and Southeast Asia are projected to exceed 40°C, beyond what existing Asian rice varieties can tolerate, threatening the food security of more than a billion people. The researchers note that rice has not adapted to rising temperatures; rather, its cultivation has expanded from warmer into cooler regions, so acreage grew while resilience to climate change did not. In China, for example, rice cultivation expanded from the central plains to the north while irrigation was ramped up in hotter regions, which is why output has risen only slightly.

  14. New US congressional bill would require operating systems to verify user age

    Democratic Rep. Josh Gottheimer and Republican Rep. Elise M. Stefanik have jointly introduced a bill that would require operating system vendors to verify their users' ages. The bill was referred to the House Energy and Commerce Committee on April 13; its full text has not yet been published, and for now only the title is known: "To require operating system providers to verify the age of any user of an operating system, and for other purposes." The bill may be part of the Parents Decide Act that Gottheimer introduced earlier this month, which would: require OS vendors such as Apple and Google to verify a user's age when a new device is set up, rather than relying on self-reported ages; let parents set age-appropriate content controls from the start, including limits on access to social media, apps, and AI platforms; ensure age and parental-control settings are transmitted securely to apps and AI platforms so they can tailor content for children; and prevent children from accessing harmful or explicit content, including inappropriate AI chatbot interactions, by establishing consistent, trusted standards across platforms.

  15. Stanford report highlights the divide between AI insiders and the public

    Stanford University's HAI institute released its annual AI Index report on Monday, highlighting a widening gap between AI insiders and the public. The report cites a Pew Research Center report published last month: only 10% of Americans are more excited than worried about AI's growing role in daily life, while 56% of AI experts believe AI will have a positive impact on the US over the next 20 years. The divide between expert opinion and public sentiment is stark: 84% of experts believe AI will benefit healthcare over the next 20 years, versus only 44% of the public; 73% of experts view AI's impact on how we work positively, versus just 23% of the public; 69% of experts expect a positive impact on the economy, versus only 21% of the public; and while AI experts are relatively optimistic about AI's effect on the job market, 64% of the public believes AI will reduce jobs over the next 20 years.