OrangeBot.AI Digest — 2026-04-23

90 headlines across 8 sources, aggregated for this day.

Hacker News(15)

  1. Meta to cut 10% of jobs (techcrunch.com)
  2. GPT-5.5 (openai.com)
  3. An update on recent Claude Code quality reports (www.anthropic.com)
  4. Palantir employees are starting to wonder if they're the bad guys (www.wired.com)
  5. 'Hairdryer used to trick weather sensor' to win Polymarket bet (www.telegraph.co.uk)
  6. Incident with multiple GitHub services (www.githubstatus.com)
  7. If America's so rich, how'd it get so sad? (www.derekthompson.org)
  8. French government agency confirms breach as hacker offers to sell data (www.bleepingcomputer.com)
  9. To Protect and Swerve: NYPD Cop Has 547 Speeding Tickets (nyc.streetsblog.org)
  10. Bitwarden CLI compromised in ongoing Checkmarx supply chain campaign (socket.dev)
  11. Raylib v6.0 (github.com)
  12. Investigation uncovers two sophisticated telecom surveillance campaigns (techcrunch.com)
  13. Show HN: Honker – Postgres NOTIFY/LISTEN Semantics for SQLite (github.com)
  14. Your hex editor should color-code bytes (simonomi.dev)
  15. Arch Linux Now Has a Bit-for-Bit Reproducible Docker Image (antiz.fr)

GitHub Trending(15)

  1. huggingface / ml-intern
  2. zilliztech / claude-context
  3. HKUDS / RAG-Anything
  4. Z4nzu / hackingtool
  5. ruvnet / RuView
  6. Anil-matcha / Open-Generative-AI
  7. Alishahryar1 / free-claude-code
  8. open-metadata / OpenMetadata
  9. microsoft / ai-agents-for-beginners
  10. PowerShell / PowerShell
  11. cline / cline
  12. microsoft / onnxruntime
  13. mksglu / context-mode
  14. coreyhaines31 / marketingskills
  15. chiphuyen / aie-book

Product Hunt(15)

  1. Docsio

    Lovable for doc sites

  2. Hookdeck Outpost

    Open-source outbound webhooks for your platform

  3. FocuSee 2.0

    Record screen to get polished demos & tutorials

  4. Vora Health

    Every wearable, every metric, one free AI-powered health app

  5. Wellows

    See how AI talks about your brand — and fix it

  6. Instamor

    Video-first partner matching per your personal vibe

  7. MailerLogic

    Email API with built-in deliverability intelligence

  8. Speakmac

    Private voice typing for Mac that works anywhere

  9. Workspace agents in ChatGPT

    Codex-powered agents for teams.

  10. Talk to Review

    Real reviews in 10 seconds.

  11. The Autonomous Stack

    Production-tested architecture for autonomous Claude agents

  12. USVC by AngelList

    Back the companies building the future. Before it’s obvious.

  13. Ubuntu 26.04 Resolute Raccoon

    The next-generation Ubuntu for Developers, AI and Cloud

  14. Foil AI Code Security

    AI code security review that runs entirely on your Mac

  15. FloMCP

    Ship MCP servers with 32 security checks in under 5 minutes

Hugging Face(15)

  1. LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

    We present LLaDA2.0-Uni, a unified discrete diffusion large language model (dLLM) that supports multimodal understanding and generation within a natively integrated framework. Its architecture combines a fully semantic discrete tokenizer, a MoE-based dLLM backbone, and a diffusion decoder. By discretizing continuous visual inputs via SigLIP-VQ, the model enables block-level masked diffusion for both text and vision inputs within the backbone, while the decoder reconstructs visual tokens into high-fidelity images. Inference efficiency is enhanced beyond parallel decoding through prefix-aware optimizations in the backbone and few-step distillation in the decoder. Supported by carefully curated large-scale data and a tailored multi-stage training pipeline, LLaDA2.0-Uni matches specialized VLMs in multimodal understanding while delivering strong performance in image generation and editing. Its native support for interleaved generation and reasoning establishes a promising and scalable paradigm for next-generation unified foundation models. Code and models are available at https://github.com/inclusionAI/LLaDA2.0-Uni.

  2. Near-Future Policy Optimization

    Reinforcement learning with verifiable rewards (RLVR) has become a core post-training recipe. Introducing suitable off-policy trajectories into on-policy exploration accelerates RLVR convergence and raises the performance ceiling, yet finding a source of such trajectories remains the key challenge. Existing mixed-policy methods either import trajectories from external teachers (high-quality but distributionally far) or replay past training trajectories (close but capped in quality), and neither simultaneously satisfies the strong enough (higher Q, more new knowledge to learn) and close enough (lower V, more readily absorbed) conditions required to maximize the effective learning signal S = Q/V. We propose Near-Future Policy Optimization (NPO), a simple mixed-policy scheme that learns from a policy's own near-future self: a later checkpoint from the same training run is a natural source of auxiliary trajectories that is both stronger than the current policy and closer than any external source, directly balancing trajectory quality against variance cost. We validate NPO through two manual interventions, early-stage bootstrapping and late-stage plateau breakthrough, and further propose AutoNPO, an adaptive variant that automatically triggers interventions from online training signals and selects the guide checkpoint that maximizes S. On Qwen3-VL-8B-Instruct with GRPO, NPO improves average performance from 57.88 to 62.84, and AutoNPO pushes it to 63.15, raising the final performance ceiling while accelerating convergence.
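
The checkpoint-selection rule in the abstract can be sketched as a toy example (hypothetical illustration only; the quality and distance estimators here are placeholder assumptions, not the paper's actual method):

```python
# Hypothetical sketch of NPO-style guide-checkpoint selection: pick the
# auxiliary checkpoint maximizing S = Q / V, where Q is a quality
# estimate (e.g., reward on a probe set) and V a closeness cost (e.g.,
# divergence from the current policy). Both estimators are placeholders.

def select_guide_checkpoint(checkpoints, estimate_quality, estimate_distance):
    """Return the checkpoint with the highest learning signal S = Q / V."""
    best, best_s = None, float("-inf")
    for ckpt in checkpoints:
        q = estimate_quality(ckpt)   # "strong enough": higher Q
        v = estimate_distance(ckpt)  # "close enough": lower V
        s = q / max(v, 1e-8)         # guard against division by zero
        if s > best_s:
            best, best_s = ckpt, s
    return best

# Toy usage: three later checkpoints with (quality, distance) estimates.
ckpts = {"step_1k": (0.60, 0.50), "step_2k": (0.70, 0.25), "step_3k": (0.75, 0.60)}
chosen = select_guide_checkpoint(
    ckpts, lambda c: ckpts[c][0], lambda c: ckpts[c][1]
)
```

Here "step_2k" wins because its ratio 0.70/0.25 beats both a stronger-but-farther and a weaker-but-closer checkpoint, which is the trade-off the abstract describes.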

  3. DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

    Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent under limited open data by improving both data quality and data utilization. We present DR-Venus, a frontier 4B deep research agent for edge-scale deployment, built entirely on open data. Our training recipe consists of two stages. In the first stage, we use agentic supervised fine-tuning (SFT) to establish basic agentic capability, combining strict data cleaning with resampling of long-horizon trajectories to improve data quality and utilization. In the second stage, we apply agentic reinforcement learning (RL) to further improve execution reliability on long-horizon deep research tasks. To make RL effective for small agents in this setting, we build on IGPO and design turn-level rewards based on information gain and format-aware regularization, thereby enhancing supervision density and turn-level credit assignment. Built entirely on roughly 10K open data samples, DR-Venus-4B significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks, while also narrowing the gap to much larger 30B-class systems. Our further analysis shows that 4B agents already possess surprisingly strong performance potential, highlighting both the deployment promise of small models and the value of test-time scaling in this setting. We release our models, code, and key recipes to support reproducible research on edge-scale deep research agents.

  4. OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

    Mobile agents powered by vision-language models have demonstrated impressive capabilities in automating mobile tasks, with recent leading models achieving a marked performance leap, e.g., nearly 70% success on AndroidWorld. However, these systems keep their training data closed and remain opaque about their task and trajectory synthesis recipes. We present OpenMobile, an open-source framework that synthesizes high-quality task instructions and agent trajectories, with two key components: (1) a scalable task synthesis pipeline that constructs a global environment memory from exploration, then leverages it to generate diverse and grounded instructions; and (2) a policy-switching strategy for trajectory rollout. By alternating between learner and expert models, it captures essential error-recovery data often missing in standard imitation learning. Agents trained on our data achieve competitive results across three dynamic mobile agent benchmarks: notably, our fine-tuned Qwen2.5-VL and Qwen3-VL reach 51.7% and 64.7% on AndroidWorld, far surpassing existing open-data approaches. Furthermore, we conduct transparent analyses on the overlap between our synthetic instructions and benchmark test sets, and verify that performance gains stem from broad functionality coverage rather than benchmark overfitting. We release data and code at https://njucckevin.github.io/openmobile/ to bridge the data gap and facilitate broader mobile agent research.

  5. DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation

    Recent advances in video generative models enable the synthesis of realistic human-object interaction videos across a wide range of scenarios and object categories, including complex dexterous manipulations that are difficult to capture with motion capture systems. While the rich interaction knowledge embedded in these synthetic videos holds strong potential for motion planning in dexterous robotic manipulation, their limited physical fidelity and purely 2D nature make them difficult to use directly as imitation targets in physics-based character control. We present DeVI (Dexterous Video Imitation), a novel framework that leverages text-conditioned synthetic videos to enable physically plausible dexterous agent control for interacting with unseen target objects. To overcome the imprecision of generative 2D cues, we introduce a hybrid tracking reward that integrates 3D human tracking with robust 2D object tracking. Unlike methods relying on high-quality 3D kinematic demonstrations, DeVI requires only the generated video, enabling zero-shot generalization across diverse objects and interaction types. Extensive experiments demonstrate that DeVI outperforms existing approaches that imitate 3D human-object interaction demonstrations, particularly in modeling dexterous hand-object interactions. We further validate the effectiveness of DeVI in multi-object scenes and text-driven action diversity, showcasing the advantage of using video as an HOI-aware motion planner.

  6. Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

    Reinforcement Learning from Human Feedback (RLHF) and related alignment paradigms have become central to steering large language models (LLMs) and multimodal large language models (MLLMs) toward human-preferred behaviors. However, these approaches introduce a systemic vulnerability: reward hacking, where models exploit imperfections in learned reward signals to maximize proxy objectives without fulfilling true task intent. As models scale and optimization intensifies, such exploitation manifests as verbosity bias, sycophancy, hallucinated justification, benchmark overfitting, and, in multimodal settings, perception-reasoning decoupling and evaluator manipulation. Recent evidence further suggests that seemingly benign shortcut behaviors can generalize into broader forms of misalignment, including deception and strategic gaming of oversight mechanisms. In this survey, we propose the Proxy Compression Hypothesis (PCH) as a unifying framework for understanding reward hacking. We formalize reward hacking as an emergent consequence of optimizing expressive policies against compressed reward representations of high-dimensional human objectives. Under this view, reward hacking arises from the interaction of objective compression, optimization amplification, and evaluator-policy co-adaptation. This perspective unifies empirical phenomena across RLHF, RLAIF, and RLVR regimes, and explains how local shortcut learning can generalize into broader forms of misalignment, including deception and strategic manipulation of oversight mechanisms. We further organize detection and mitigation strategies according to how they intervene on compression, amplification, or co-adaptation dynamics. By framing reward hacking as a structural instability of proxy-based alignment under scale, we highlight open challenges in scalable oversight, multimodal grounding, and agentic autonomy.

  7. Exploring Spatial Intelligence from a Generative Perspective

    Spatial intelligence is essential for multimodal large language models, yet current benchmarks largely assess it only from an understanding perspective. We ask whether modern generative or unified multimodal models also possess generative spatial intelligence (GSI), the ability to respect and manipulate 3D spatial constraints during image generation, and whether such capability can be measured or improved. We introduce GSI-Bench, the first benchmark designed to quantify GSI through spatially grounded image editing. It consists of two complementary components: GSI-Real, a high-quality real-world dataset built via a 3D-prior-guided generation and filtering pipeline, and GSI-Syn, a large-scale synthetic benchmark with controllable spatial operations and fully automated labeling. Together with a unified evaluation protocol, GSI-Bench enables scalable, model-agnostic assessment of spatial compliance and editing fidelity. Experiments show that fine-tuning unified multimodal models on GSI-Syn yields substantial gains on both synthetic and real tasks and, strikingly, also improves downstream spatial understanding. This provides the first clear evidence that generative training can tangibly strengthen spatial reasoning, establishing a new pathway for advancing spatial intelligence in multimodal models.

  8. A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

    As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment feedback is often preserved in the interaction history to support future decisions. However, repeatedly retaining such feedback introduces substantial redundancy and causes cumulative token cost to grow quadratically with the number of steps, hindering long-horizon reasoning. Although observation compression can mitigate this issue, the heterogeneity of terminal environments makes heuristic-based or fixed-prompt methods difficult to generalize. We propose TACO, a plug-and-play, self-evolving Terminal Agent Compression framework that automatically discovers and refines compression rules from interaction trajectories for existing terminal agents. Experiments on TerminalBench (TB 1.0 and TB 2.0) and four additional terminal-related benchmarks (i.e., SWE-Bench Lite, CompileBench, DevEval, and CRUST-Bench) show that TACO consistently improves performance across mainstream agent frameworks and strong backbone models. With MiniMax-2.5, it improves performance on most benchmarks while reducing token overhead by around 10%. On TerminalBench, it brings consistent gains of 1%-4% across strong agentic models, and further improves accuracy by around 2%-3% under the same token budget. These results demonstrate the effectiveness and generalization of self-evolving, task-aware compression for terminal agents.
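
A minimal sketch of the kind of observation compression described above (hypothetical; plain head/tail truncation stands in for TACO's learned, task-aware rules):

```python
# Hypothetical sketch of observation compression for a terminal agent:
# long command output retained in the interaction history is truncated
# to its head and tail, with a marker noting how many lines were elided.
# The rule itself (head/tail truncation) is a placeholder illustration.

def compress_observation(output: str, head: int = 3, tail: int = 2) -> str:
    """Keep the first `head` and last `tail` lines of a long observation."""
    lines = output.splitlines()
    if len(lines) <= head + tail:
        return output  # short enough: keep verbatim
    elided = len(lines) - head - tail
    return "\n".join(lines[:head] + [f"... [{elided} lines elided] ..."] + lines[-tail:])

# Toy usage: a 100-line command output shrinks to 6 lines of history.
log = "\n".join(f"line {i}" for i in range(100))
short = compress_observation(log)
```

This is where the quadratic token growth the abstract mentions comes from: if each step's raw output is re-sent on every later step, total tokens scale with steps squared, so bounding each stored observation caps that growth.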

  9. C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion

    We introduce C-GenReg, a training-free framework for 3D point cloud registration that leverages the complementary strengths of world-scale generative priors and registration-oriented Vision Foundation Models (VFMs). Current learning-based 3D point cloud registration methods struggle to generalize across sensing modalities, sampling differences, and environments. Hence, C-GenReg augments the geometric point cloud registration branch by transferring the matching problem into an auxiliary image domain, where VFMs excel, using a World Foundation Model to synthesize multi-view-consistent RGB representations from the input geometry. This generative transfer preserves spatial coherence across source and target views without any fine-tuning. From these generated views, a VFM pretrained for finding dense correspondences extracts matches. The resulting pixel correspondences are lifted back to 3D via the original depth maps. To further enhance robustness, we introduce a "Match-then-Fuse" probabilistic cold-fusion scheme that combines two independent correspondence posteriors: that of the generated-RGB branch and that of the raw geometric branch. This principled fusion preserves each modality's inductive bias and provides calibrated confidence without any additional learning. C-GenReg is zero-shot and plug-and-play: all modules are pretrained and operate without fine-tuning. Extensive experiments on indoor (3DMatch, ScanNet) and outdoor (Waymo) benchmarks demonstrate strong zero-shot performance and superior cross-domain generalization. For the first time, we demonstrate a generative registration framework that operates successfully on real outdoor LiDAR data, where no imagery data is available.

  10. WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training

    End-to-end spoken dialogue models have garnered significant attention because they offer a higher potential ceiling in expressiveness and perceptual ability than cascaded systems. However, the intelligence and expressiveness of current open-source spoken dialogue models often remain below expectations. Motivated by the success of online reinforcement learning (RL) in other domains, one might attempt to directly apply preference optimization to spoken dialogue models, yet this transfer is non-trivial. We analyze these obstacles from the perspectives of reward modeling and rollout sampling, focusing on how sparse preference supervision interacts with dense speech generation under shared-parameter updates. Based on the analysis, we propose a modality-aware adaptive post-training recipe that makes RL practical for spoken dialogue: it constrains preference updates to the semantic channel and improves acoustic behavior via explicit anchoring, while dynamically regulating their mixture from rollout statistics to avoid unreliable preference gradients. We evaluate the method across multiple spoken dialogue benchmarks and representative architectures, and observe consistent improvements in semantic quality and speech expressiveness.

  11. SWE-chat: Coding Agent Interactions From Real Users in the Wild

    AI coding agents are being adopted at scale, yet we lack empirical evidence on how people actually use them and how much of their output is useful in practice. We present SWE-chat, the first large-scale dataset of real coding agent sessions collected from open-source developers in the wild. The dataset currently contains 6,000 sessions, comprising more than 63,000 user prompts and 355,000 agent tool calls. SWE-chat is a living dataset; our collection pipeline automatically and continually discovers and processes sessions from public repositories. Leveraging SWE-chat, we provide an initial empirical characterization of real-world coding agent usage and failure modes. We find that coding patterns are bimodal: in 41% of sessions, agents author virtually all committed code ("vibe coding"), while in 23%, humans write all code themselves. Despite rapidly improving capabilities, coding agents remain inefficient in natural settings. Just 44% of all agent-produced code survives into user commits, and agent-written code introduces more security vulnerabilities than code authored by humans. Furthermore, users push back against agent outputs -- through corrections, failure reports, and interruptions -- in 44% of all turns. By capturing complete interaction traces with human vs. agent code authorship attribution, SWE-chat provides an empirical foundation for moving beyond curated benchmarks towards an evidence-based understanding of how AI agents perform in real developer workflows.

  12. Abstain-R1: Calibrated Abstention and Post-Refusal Clarification via Verifiable RL

    Reinforcement fine-tuning improves the reasoning ability of large language models, but it can also encourage them to answer unanswerable queries by guessing or hallucinating missing information. Existing abstention methods either train models to produce generic refusals or encourage follow-up clarifications without verifying whether those clarifications identify the key missing information. We study queries that are clear in meaning but cannot be reliably resolved from the given information, and argue that a reliable model should not only abstain, but also explain what is missing. We propose a clarification-aware RLVR reward that, while rewarding correct answers on answerable queries, jointly optimizes explicit abstention and semantically aligned post-refusal clarification on unanswerable queries. Using this reward, we train Abstain-R1, a 3B model that improves abstention and clarification on unanswerable queries while preserving strong performance on answerable ones. Experiments on Abstain-Test, Abstain-QA, and SelfAware show that Abstain-R1 substantially improves over its base model and achieves unanswerable-query behavior competitive with larger systems including DeepSeek-R1, suggesting that calibrated abstention and clarification can be learned through verifiable rewards rather than emerging from scale alone.
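
A toy sketch of a clarification-aware reward of the kind described above (hypothetical; the weights, string matching, and function signature are illustrative assumptions, not the paper's actual reward):

```python
# Hypothetical sketch of a clarification-aware RLVR reward: answerable
# queries are scored on correctness; unanswerable queries are scored on
# explicit abstention plus whether the response names the key missing
# information. The 0.5/0.5 weights and substring matching are placeholders.

def clarification_aware_reward(response, *, answerable, gold_answer=None,
                               missing_info=None):
    """Return a scalar reward in [0, 1] for a single model response."""
    if answerable:
        # Verifiable correctness check on answerable queries.
        return 1.0 if response.strip() == gold_answer else 0.0
    # Unanswerable: reward explicit abstention and an aligned clarification.
    abstained = "cannot answer" in response.lower()
    clarified = bool(missing_info) and missing_info.lower() in response.lower()
    return 0.5 * abstained + 0.5 * clarified

# Toy usage: an unanswerable query, answered with refusal + clarification.
r = clarification_aware_reward(
    "I cannot answer: the passage never states the arrival date.",
    answerable=False,
    missing_info="the arrival date",
)
```

The point of the second branch is the abstract's argument: a generic refusal alone earns only partial credit, while naming what is missing earns the rest.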

  13. Image Generators are Generalist Vision Learners

    Recent works show that image and video generators exhibit zero-shot visual understanding behaviors, in a way reminiscent of how LLMs develop emergent capabilities of language understanding and reasoning from generative pretraining. While it has long been conjectured that the ability to create visual content implies an ability to understand it, there has been limited evidence that generative vision models have developed strong understanding capabilities. In this work, we demonstrate that image generation training serves a role similar to LLM pretraining, and lets models learn powerful and general visual representations that enable SOTA performance on various vision tasks. We introduce Vision Banana, a generalist model built by instruction-tuning Nano Banana Pro (NBP) on a mixture of its original training data alongside a small amount of vision task data. By parameterizing the output space of vision tasks as RGB images, we seamlessly reframe perception as image generation. Our generalist model, Vision Banana, achieves SOTA results on a variety of vision tasks involving both 2D and 3D understanding, beating or rivaling zero-shot domain-specialists, including Segment Anything Model 3 on segmentation tasks, and the Depth Anything series on metric depth estimation. We show that these results can be achieved with lightweight instruction-tuning without sacrificing the base model's image generation capabilities. The superior results suggest that image generation pretraining is a generalist vision learner. It also shows that image generation serves as a unified and universal interface for vision tasks, similar to text generation's role in language understanding and reasoning. We could be witnessing a major paradigm shift for computer vision, where generative vision pretraining takes a central role in building Foundational Vision Models for both generation and understanding.

  14. Tadabur: A Large-Scale Quran Audio Dataset

    Despite growing interest in Quranic data research, existing Quran datasets remain limited in both scale and diversity. To address this gap, we present Tadabur, a large-scale Quran audio dataset. Tadabur comprises more than 1,400 hours of recitation audio from over 600 distinct reciters, providing substantial variation in recitation styles, vocal characteristics, and recording conditions. This diversity makes Tadabur a comprehensive and representative resource for Quranic speech research and analysis. By significantly expanding both the total duration and variability of available Quran data, Tadabur aims to support future research and facilitate the development of standardized Quranic speech benchmarks.

  15. Scaling Test-Time Compute for Agentic Coding

    Test-time scaling has become a powerful way to improve large language models. However, existing methods are best suited to short, bounded outputs that can be directly compared, ranked or refined. Long-horizon coding agents violate this premise: each attempt produces an extended trajectory of actions, observations, errors, and partial progress taken by the agent. In this setting, the main challenge is no longer generating more attempts, but representing prior experience in a form that can be effectively selected from and reused. We propose a test-time scaling framework for agentic coding based on compact representations of rollout trajectories. Our framework converts each rollout into a structured summary that preserves its salient hypotheses, progress, and failure modes while discarding low-signal trace details. This representation enables two complementary forms of inference-time scaling. For parallel scaling, we introduce Recursive Tournament Voting (RTV), which recursively narrows a population of rollout summaries through small-group comparisons. For sequential scaling, we adapt Parallel-Distill-Refine (PDR) to the agentic setting by conditioning new rollouts on summaries distilled from prior attempts. Our method consistently improves the performance of frontier coding agents across SWE-Bench Verified and Terminal-Bench v2.0. For example, by using our method Claude-4.5-Opus improves from 70.9% to 77.6% on SWE-Bench Verified (mini-SWE-agent) and 46.9% to 59.1% on Terminal-Bench v2.0 (Terminus 1). Our results suggest that test-time scaling for long-horizon agents is fundamentally a problem of representation, selection, and reuse.
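
The small-group narrowing in RTV can be sketched roughly as follows (hypothetical; the `pick_best` judge and `group_size` are placeholder assumptions standing in for the paper's LLM-based comparison):

```python
# Hypothetical sketch of Recursive Tournament Voting (RTV): recursively
# narrow a population of rollout summaries via small-group comparisons
# until a single winner remains. `pick_best` stands in for an LLM judge
# that compares a handful of summaries at a time.

def recursive_tournament(summaries, pick_best, group_size=4):
    """Reduce `summaries` to one winner via rounds of small-group votes."""
    pool = list(summaries)
    while len(pool) > 1:
        next_round = []
        for i in range(0, len(pool), group_size):
            group = pool[i:i + group_size]
            next_round.append(pick_best(group))  # one winner per group
        pool = next_round
    return pool[0]

# Toy usage: a judge that prefers the longest summary (placeholder heuristic).
rollouts = [f"summary-{i}" * (i + 1) for i in range(10)]
winner = recursive_tournament(rollouts, pick_best=lambda g: max(g, key=len))
```

Keeping each comparison to a small group is what makes this compatible with the compact-summary representation the abstract argues for: the judge only ever sees a few short summaries per call, never full trajectories.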

Techmeme(15)

  1. Sources: the US DOJ arrested a soldier involved in the capture of Nicolás Maduro for allegedly making $400K+ on Polymarket by betting on his removal from office (ABC News)

    ABC News : Sources: the US DOJ arrested a soldier involved in the capture of Nicolás Maduro for allegedly making $400K+ on Polymarket by betting on his removal from office —  The bet was allegedly placed by a commando involved in Nicolas Maduro's capture.  —  Katherine Faulders, Aaron Katersky, Peter Charalambous, and Alexander Mallin

  2. Texas Instruments stock rose 19% on Thursday, its best day since 2000, after upbeat Q2 guidance driven by high demand for analog chips used in AI data centers (Katie Tarasov/CNBC)

    Katie Tarasov / CNBC : Texas Instruments stock rose 19% on Thursday, its best day since 2000, after upbeat Q2 guidance driven by high demand for analog chips used in AI data centers —  Texas Instruments had its best day on Wall Street since 2000 after the chipmaker reported better-than-expected quarterly results …

  3. Shenzhen-based Pudu Robotics, which makes commercial service robots, raised ~$150M, bringing its total funding to $300M+, and says its valuation exceeds $1.5B (The Robot Report)

    The Robot Report : Shenzhen-based Pudu Robotics, which makes commercial service robots, raised ~$150M, bringing its total funding to $300M+, and says its valuation exceeds $1.5B —  Pudu Technology Inc. today said it has raised nearly $150 million in a new funding round.  Following this round …

  4. Intel reports Q1 revenue up 7% YoY to $13.58B, vs. $12.42B est., and forecasts Q2 revenue and adjusted EPS above estimates; INTC jumps 15%+ after hours (CNBC)

    CNBC : Intel reports Q1 revenue up 7% YoY to $13.58B, vs. $12.42B est., and forecasts Q2 revenue and adjusted EPS above estimates; INTC jumps 15%+ after hours —  Intel has been a Wall Street darling of late even as the business has yet to find much momentum.

  5. Music publishers including UMG, Warner Music, and Sony drop a copyright suit against Verizon following a SCOTUS decision limiting ISP liability in Cox's suit (Kyle Jahner/Bloomberg Law)

    Kyle Jahner / Bloomberg Law : Music publishers including UMG, Warner Music, and Sony drop a copyright suit against Verizon following a SCOTUS decision limiting ISP liability in Cox's suit —  Publishers representing the bulk of the music industry dropped a copyright lawsuit against Verizon Communications Inc. over music piracy …

  6. Xbox CEO Asha Sharma and Chief Content Officer Matt Booty detail their "return of Xbox" strategy, including daily active players as its "new north star" (Tom Warren/The Verge)

    Tom Warren / The Verge : Xbox CEO Asha Sharma and Chief Content Officer Matt Booty detail their “return of Xbox” strategy, including daily active players as its “new north star” —  Microsoft's new Xbox leader starts to talk strategy for the company's gaming business.

  7. Instagram launches Instants, an app for sharing disappearing photos, in Italy and Spain, after rolling out an Instants feature in its main app in some regions (Sydney Bradley/Business Insider)

    Sydney Bradley / Business Insider : Instagram launches Instants, an app for sharing disappearing photos, in Italy and Spain, after rolling out an Instants feature in its main app in some regions —  Instagram launched a new app in Italy and Spain this week called “Instants.”  — A hodgepodge of apps like Snapchat, Locket …

  8. OpenAI says "GPT-5.5 matches GPT-5.4 per-token latency in real-world serving, while performing at a much higher level of intelligence" (OpenAI)

    OpenAI : OpenAI says “GPT-5.5 matches GPT-5.4 per-token latency in real-world serving, while performing at a much higher level of intelligence” —  A new class of intelligence for real work  —  We're releasing GPT-5.5, our smartest and most intuitive to use model yet …

  9. Anthropic says it has fixed three causes of recent Claude Code quality issues: reduced default reasoning, a caching bug, and a system prompt to reduce verbosity (Anthropic)

    Anthropic : Anthropic says it has fixed three causes of recent Claude Code quality issues: reduced default reasoning, a caching bug, and a system prompt to reduce verbosity —  We traced recent reports of Claude Code quality issues to three separate changes.  Here's what happened and what we're changing.

  10. GPT-5.5 is priced at $5/1M input tokens and $30/1M output tokens, double GPT-5.4's pricing; GPT-5.5 Pro costs $30/1M input tokens and $180/1M output tokens (Carl Franzen/VentureBeat)

    Carl Franzen / VentureBeat : GPT-5.5 is priced at $5/1M input tokens and $30/1M output tokens, double GPT-5.4's pricing; GPT-5.5 Pro costs $30/1M input tokens and $180/1M output tokens —  After months of rumors and reports that OpenAI was developing a new, more powerful AI large language model for use in ChatGPT …

  11. GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, and GPT-5.5 Pro to Pro, Business, and Enterprise users in ChatGPT (The Verge)

    The Verge : GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, and GPT-5.5 Pro to Pro, Business, and Enterprise users in ChatGPT —  The new model ‘excels’ at tasks like writing and debugging code and doing work across different tools.

  12. OpenAI says GPT-5.5's improvements are strongest in agentic coding, computer use, and early scientific research, which require reasoning across longer contexts (Madison Mills/Axios)

    Madison Mills / Axios : OpenAI says GPT-5.5's improvements are strongest in agentic coding, computer use, and early scientific research, which require reasoning across longer contexts —  OpenAI on Thursday released its most capable model, GPT-5.5, codenamed “Spud,” just one week after competitor Anthropic launched its latest model.

  13. OpenAI launches GPT-5.5, designed to handle complex tasks with minimal guidance; the model will be used to power the company's upcoming "super app" (Rachel Metz/Bloomberg)

    Rachel Metz / Bloomberg : OpenAI launches GPT-5.5, designed to handle complex tasks with minimal guidance; the model will be used to power the company's upcoming “super app” —  OpenAI is introducing an artificial intelligence model that's intended to be better at completing work without much direction …

  14. At a town hall, Asha Sharma said Microsoft is returning to using Xbox for its gaming division, instead of Microsoft Gaming, as "Xbox needs to be our identity" (Tom Warren/The Verge)

    Tom Warren / The Verge : At a town hall, Asha Sharma said Microsoft is returning to using Xbox for its gaming division, instead of Microsoft Gaming, as “Xbox needs to be our identity” —  Xbox is Microsoft's gaming identity moving forward. … Xbox CEO Asha Sharma has had a busy week.

  15. Meta plans to cut 10% of workers, or ~8,000 jobs, on May 20 and won't fill 6,000 open roles, in an effort to offset its AI spending and boost efficiency (Kurt Wagner/Bloomberg)

    Kurt Wagner / Bloomberg : Meta plans to cut 10% of workers, or ~8,000 jobs, on May 20 and won't fill 6,000 open roles, in an effort to offset its AI spending and boost efficiency —  Meta Platforms Inc. plans to cut 10% of workers, or roughly 8,000 employees, in an effort to boost efficiency and offset its heavy spending on artificial intelligence.

Solidot(15)

  1. Macaques eat dirt to help digest tourists' high-calorie junk food

    A Cambridge University study found that the macaques of Gibraltar, Europe's only wild monkey population, regularly eat soil. Monitoring showed that macaques in frequent contact with tourists ingested more soil, and that geophagy rates rose during peak tourist season. Scientists believe high-calorie foods provided by or stolen from tourists, such as chocolate, chips, and ice cream, are disrupting the monkeys' gut microbiomes, driving the change in behavior. Eating soil may help restore gut balance: soil can supply bacteria and minerals that junk food lacks, and may help protect the gut lining and relieve or prevent irritation caused by excess sugar and fat. The troop averaged 12 soil-eating events per week; 30% of events occurred in groups and 89% occurred with other macaques present, usually watching, suggesting the behavior is "socially learned." Most macaques preferred red soil (terra rossa), which accounted for 83% of all events.

  2. Trump memecoin cost investors billions of dollars

    Ahead of his January 2025 inauguration, Trump launched an official memecoin, becoming the first president to issue a personal cryptocurrency. Analysis suggests the Trump family profited more than $280 million from it, while retail investors in the memecoin lost over $4.3 billion. Most Trump memecoins are held by insiders who, by selling at key moments and collecting fees, avoided losses entirely and instead made over $600 million. The Trump token, initially priced at $28.73, now trades below $3, down 93%. First Lady Melania Trump's official memecoin fared even worse, down 99% from its peak. The Trump family's crypto company World Liberty Financial has meanwhile brought the family roughly $5 billion in wealth.

  3. 53 countries gather in Colombia to discuss phasing out fossil fuels

    53 countries (not including the major emitters China, the US, India, and Russia) will meet in Colombia next week to discuss how to phase out fossil fuels. The closure of the Strait of Hormuz has plunged Asia-Pacific countries into an energy crisis, forcing emergency measures such as remote work and school closures. The participants are not traditional allies but a loose coalition assembled on short notice, which itself reflects the urgency. The latest energy crisis could become a turning point for renewable clean energy. The fossil fuel industry has long claimed that oil, gas, and coal are reliable energy sources, but the crisis shows they are not, while renewables are cheap, reliable, and secure. Unlike the Middle East oil crisis of the 1970s, the clean energy that could replace fossil fuels is now mature: solar panel prices have fallen 99.9% since the 1970s, wind costs have fallen 91% since 1984, and battery prices have fallen 99% since 1991. South Korea, which previously shipped 70% of its crude oil through the Strait of Hormuz, now plans to double its renewable capacity within four years. Global climate negotiations over the past 30 years have barely mentioned fossil fuels, partly due to obstruction by major fossil fuel exporters and lobby groups. Now a loose coalition of nations is bypassing the global climate talks to discuss how to actually phase them out.

  4. Crypto scammers target ships stranded near the Strait of Hormuz

    With Iran's closure of the Strait of Hormuz, about 2,000 ships and 20,000 seafarers are currently stranded nearby. Iran said weeks ago that tankers transiting the strait would have to pay transit fees in cryptocurrency, but the US then also announced it was closing the strait to search passing vessels. The situation is chaotic, and scammers are exploiting the chaos. Crypto scammers are reportedly impersonating Iranian authorities, messaging shipping companies and demanding transit fees in Bitcoin or Tether. Unconfirmed reports say ships that believed they had paid the fee attempted to transit the strait, only to turn back after coming under Iranian shelling. Two such incidents have occurred so far, one on April 18 and one on April 22.

  5. Ancient humans migrated to South America three times

    South America was the last continent settled by humans. Scientists previously thought the migration was simple: about 15,000 years ago, ancient human ancestors from roughly the same population entered the continent and gradually adapted to environments ranging from jungle to highland. But an analysis of ancient and modern genomes published in Nature shows that ancient humans migrated to South America at least three times. The first wave came about 12,700 years ago, the second 9,000 years ago, and the third around 1,300 years ago. The third wave of migrants is linked to Central American populations.

  6. Gates Foundation prepares layoffs, reviews ties to Epstein

    The Gates Foundation has recently been mired in controversy over chairman Bill Gates's relationship with the late sex offender Jeffrey Epstein. On Tuesday it announced it had commissioned an external review of the foundation's ties to Epstein, with an assessment report expected this summer. Documents released by the US Department of Justice in January showed that Epstein had corresponded with foundation staff, and included photos of Gates with Epstein and women whose faces were redacted. Gates has since stated that his relationship with Epstein was limited to philanthropy, called meeting him a mistake, and denied any contact with victims of Epstein's sex crimes. The foundation also reportedly plans to cut about a fifth of its staff, roughly 200 employees, by 2030.

  7. AI adoption is far higher among high earners than low earners

    An FT survey shows a significant gap in AI adoption between high- and low-income workers. The survey of 4,000 workers in the US and UK found that adoption skews heavily toward higher earners: over 60% of high-paid workers use AI daily, versus just 16% of low-income workers. The survey also revealed a gender gap: across industries from tech to education and retail, men are significantly more likely than women to use AI tools. Nobel economics laureate Daron Acemoglu said AI tools were expected to democratize, but in reality using these models requires a certain level of education, abstract and quantitative thinking, and familiarity with computers and programming. AI will almost certainly worsen inequality between labor and capital. The tight link between pay, education, and AI use suggests the technology may widen income inequality by boosting the productivity of high earners rather than low earners. Fellow laureate Chris Pissarides said the smarter the invented technology, the more your own intelligence matters. A similar divide appeared early in the PC revolution but narrowed as PCs spread; the question now is how long that takes. If it takes ten to twenty years, the situation could be worrying.

  8. Justin Sun sues Trump family crypto company World Liberty

    Justin Sun, founder of TRON and an early backer of the Trump family crypto company World Liberty (WLFI), has filed a lawsuit accusing World Liberty of illegally seizing his WLFI tokens and freezing his voting rights in company governance. Sun also accused other World Liberty operators, including co-founder Chase Herro, of treating the company as a prime opportunity to profit through fraud by exploiting the Trump brand. In the complaint filed Tuesday in federal court in San Francisco, Sun said World Liberty's initial promise that token holders would eventually be allowed to trade the currency was false and misleading: while most tokens can be traded, World Liberty has blocked him from selling a single one and has threatened to burn all of his holdings. WLFI responded on X that Sun's favorite move is to play the victim while fabricating baseless accusations to cover up his own misconduct: "WLFI isn't the first. We have contracts. We have evidence. We have the truth," adding that it would see him in court.

  9. Asphalt releases toxic volatile organic compounds

    Asphalt is a petroleum byproduct used as the binder in asphalt concrete. According to two studies published in the Journal of Hazardous Materials and Science of the Total Environment, asphalt releases volatile organic compounds (VOCs), and does so more markedly in hot, sunny weather. These toxic and often odorless VOCs are small enough to enter arteries and organs and can cause serious neurological damage in humans, with greater effects on women and the elderly. The researchers say high temperatures intensify the release of toxic compounds from asphalt. They report that adding algae to asphalt can sharply reduce emissions of its most toxic compounds: tests showed toxicity cut to one percent of the original level, while also slowing pavement aging.

  10. Table tennis robot defeats top human players

    Ace, a table tennis robot developed by Sony's AI division, made history by defeating top human players, becoming the first autonomous robot to reach expert level in a competitive sport. The project lead said competitive sports demand fast decision-making and precise execution, and Ace's achievement rests on high-speed perception, AI-based control, and advanced robotics. Peter Dürr, head of Sony AI Zurich and the Ace project, said AI systems have already surpassed human experts in computer games, but physical real-time sports like table tennis remain a huge open challenge because they require fast, precise, adversarial interaction near the limits of obstacles and human reaction times. The project's goal is not just to compete at table tennis but, more importantly, to understand how robots can perceive, plan, and act in dynamic environments with human-like speed and precision. Sony researchers described Ace in a paper in Nature. In April 2025 Ace won three of five matches against elite players but lost both matches against professionals; it then beat professionals in matches held in December 2025 and last month. Mayuka Taira, a professional who lost to the robot, said it has no emotions and is therefore hard to predict. Rui Takenaka, an elite player who beat it, said the robot answered her complex spin serves with complex balls and simple balls with simple ones, a pattern she was able to exploit to win.

  11. Apple fixes bug that kept message previews in the notifications database after the original messages were deleted

    In a recent ICE-related case, the FBI used data stored in an iPhone's notifications database to recover messages from an uninstalled Signal app. Signal is end-to-end encrypted, so no one but the two parties should be able to read messages, but the iPhone's notification feature shows message previews, and that information is also stored in the notifications database. Apple has now released iOS 26.4.2 and iPadOS 26.4.2, fixing a bug where messages marked for deletion remained in the device's notifications database.

  12. Conspiracy theory draws the US president's attention, prompting an FBI investigation

    The US UFO community has linked a series of unrelated deaths and disappearances of aerospace and nuclear physics experts over the past few years into a grand conspiracy, suspecting the Men in Black (MIB) of carrying out a cleanup operation. The theory eventually reached the US government: President Trump called the matter very serious, the White House said it was working with federal agencies to investigate, the Republican-led House Oversight Committee said it was also opening an investigation, and the FBI said it was leading an investigation, working with the Department of Energy, the Department of War, and state and local law enforcement to find answers.

  13. Google announces eighth-generation in-house AI chips, TPU 8t and TPU 8i

    Google announced its eighth-generation in-house AI chips, TPU 8t and TPU 8i, the former designed for large-model training and the latter for large-model inference. TPU 8t offers greater compute throughput and more scalable bandwidth for compute-intensive training workloads, while TPU 8i offers more memory bandwidth for the most latency-sensitive inference workloads. Google says the TPU 8t design shortens frontier-model development cycles from months to weeks: a single TPU 8t superpod scales to 9,600 chips and 2 PB of shared high-bandwidth memory, with twice the inter-chip bandwidth of the previous generation, and the architecture delivers 121 ExaFlops of compute, allowing the most complex models to use a single massive memory pool. The TPU 8i chip has 288 GB of high-bandwidth memory and 384 MB of on-chip SRAM, so a model's active working set can stay entirely on-chip.

  14. China selects two Pakistani astronaut candidates

    The first selection of foreign astronauts for China's crewed space program concluded in early April 2026, with two Pakistani candidates, Muhammad Zeeshan Ali and Khurram Daud, ultimately chosen. They will come to China for training as reserve astronauts; after completing the training program and passing assessment, one of them will fly as a payload specialist, becoming the first foreign astronaut to enter the Chinese space station. In February 2025, China and Pakistan signed an agreement in Islamabad on the selection and training of Pakistani astronauts and their participation in Chinese space station flight missions, formally launching the selection process. After three rigorous stages of preliminary, secondary, and final screening, the two Pakistani reserve astronauts were selected.

  15. UK bans anyone 17 or under from ever buying cigarettes

    The UK Parliament has passed the Tobacco and Vapes Bill, under which children aged 17 and under face a lifetime ban on buying cigarettes. The bill prohibits shops from selling tobacco to anyone born on or after January 1, 2009, aiming to stop them from ever taking up smoking, with the ultimate goal of creating a smoke-free generation. Smoking is a leading preventable cause of death, and the bill is seen as the UK's biggest public health intervention in decades.