DIGEST · 2026-04-21

OrangeBot.AI Digest — 2026-04-21

83 headlines across 6 sources, aggregated for this day.

Hacker News (15)

  1. Changes to GitHub Copilot individual plans (github.blog)
  2. Claude Code removed from Anthropic's Pro plan (claude.com)
  3. ChatGPT Images 2.0 (openai.com)
  4. Framework Laptop 13 Pro (frame.work)
  5. Meta to start capturing employee mouse movements, keystrokes for AI training (www.reuters.com)
  6. The Vercel breach: OAuth attack exposes risk in platform environment variables (www.trendmicro.com)
  7. Original GrapheneOS responses to WIRED fact checker (discuss.grapheneos.org)
  8. Brands got worse on purpose (www.worseonpurpose.com)
  9. Anthropic takes $5B from Amazon and pledges $100B in cloud spending in return (techcrunch.com)
  10. Tim Cook's Impeccable Timing (stratechery.com)
  11. Show HN: VidStudio, a browser based video editor that doesn't upload your files (vidstudio.app)
  12. Laws of Software Engineering (lawsofsoftwareengineering.com)
  13. Apple ignores DMA interoperability requests and contradicts own documentation (fsfe.org)
  14. Edit store price tags using Flipper Zero (github.com)
  15. MNT Reform is an open hardware laptop, designed and assembled in Germany (mnt.stanleylieber.com)

GitHub Trending (8)

  1. Fincept-Corporation / FinceptTerminal
  2. thunderbird / thunderbolt
  3. zilliztech / claude-context
  4. ruvnet / RuView
  5. microsoft / ai-agents-for-beginners
  6. dayanch96 / YTLite
  7. HKUDS / RAG-Anything
  8. sansan0 / TrendRadar

Product Hunt (15)

  1. Magic Lane

    Sovereign navigation infrastructure for Europe

  2. RankAI

    RankAI autonomously gets you buyers from Google & AI Search

  3. Pioneer

    Fine-tune any LLM in minutes, with one prompt

  4. Spectrum

    Bring agents to all the interfaces people already use

  5. Kimi K2.6

    Open-source SOTA for long-horizon coding and agent swarms

  6. OnTheMap

    The global map for builders, founders, and visionaries

  7. Flow AI

    Turn Linkedin into unlimited leads on auto-pilot

  8. Magic Layers by Canva

    Turn any flat image into a fully editable design

  9. Gauge Sentiment

    How is your brand perceived by AI?

  10. Cosine Swarm

    Parallel AI agents for long-horizon, complex software tasks

  11. Perplexity Health

    Ask health questions across your records, labs, wearables

  12. Chronicle

    Build Codex memories from recent screen context.

  13. X Island

    Dynamic Island for AI Coding Agents

  14. Chat Skills for AI Agents

    One file. Any agent. Working chat in under 10 minutes.

  15. GladeKit

    AI agent for Unity game development

Hugging Face (15)

  1. Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation

    Few-step generation has been a long-standing goal, with recent one-step generation methods exemplified by MeanFlow achieving remarkable results. Existing research on MeanFlow primarily focuses on class-to-image generation. However, an intuitive yet unexplored direction is to extend the condition from fixed class labels to flexible text inputs, enabling richer content creation. Compared to the limited class labels, text conditions pose greater challenges to the model's understanding capability, necessitating the effective integration of powerful text encoders into the MeanFlow framework. Surprisingly, although incorporating text conditions appears straightforward, we find that integrating powerful LLM-based text encoders using conventional training strategies results in unsatisfactory performance. To uncover the underlying cause, we conduct detailed analyses and reveal that, due to the extremely limited number of refinement steps in the MeanFlow generation, such as only one step, the text feature representations are required to possess sufficiently high discriminability. This also explains why discrete and easily distinguishable class features perform well within the MeanFlow framework. Guided by these insights, we leverage a powerful LLM-based text encoder validated to possess the required semantic properties and adapt the MeanFlow generation process to this framework, resulting in efficient text-conditioned synthesis for the first time. Furthermore, we validate our approach on the widely used diffusion model, demonstrating significant generation performance improvements. We hope this work provides a general and practical reference for future research on text-conditioned MeanFlow generation. The code is available at https://github.com/AMAP-ML/EMF.

  2. OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

    Chain-of-Thought (CoT) reasoning has become a powerful driver of trajectory prediction in VLA-based autonomous driving, yet its autoregressive nature imposes a latency cost that is prohibitive for real-time deployment. Latent CoT methods attempt to close this gap by compressing reasoning into continuous hidden states, but consistently fall short of their explicit counterparts. We suggest that this is due to purely linguistic latent representations compressing a symbolic abstraction of the world, rather than the causal dynamics that actually govern driving. Thus, we present OneVL (One-step latent reasoning and planning with Vision-Language explanations), a unified VLA and World Model framework that routes reasoning through compact latent tokens supervised by dual auxiliary decoders. Alongside a language decoder that reconstructs text CoT, we introduce a visual world model decoder that predicts future-frame tokens, forcing the latent space to internalize the causal dynamics of road geometry, agent motion, and environmental change. A three-stage training pipeline progressively aligns these latents with trajectory, language, and visual objectives, ensuring stable joint optimization. At inference, the auxiliary decoders are discarded and all latent tokens are prefilled in a single parallel pass, matching the speed of answer-only prediction. Across four benchmarks, OneVL becomes the first latent CoT method to surpass explicit CoT, delivering state-of-the-art accuracy at answer-only latency, and providing direct evidence that tighter compression, when guided by both language and world-model supervision, produces more generalizable representations than verbose token-by-token reasoning. Project Page: https://xiaomi-embodied-intelligence.github.io/OneVL

  3. Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

    Large language models are increasingly expected to serve as general-purpose agents that interact with external, stateful tool environments. The Model Context Protocol (MCP) and broader agent skills offer a unified interface for connecting agents with scalable real-world services, but training robust agents remains limited by the lack of realistic environments and principled mechanisms for life-long learning. In this paper, we present Agent-World, a self-evolving training arena for advancing general agent intelligence through scalable environments. Agent-World has two main components: (1) Agentic Environment-Task Discovery, which autonomously explores topic-aligned databases and executable tool ecosystems from thousands of real-world environment themes and synthesizes verifiable tasks with controllable difficulty; and (2) Continuous Self-Evolving Agent Training, which combines multi-environment reinforcement learning with a self-evolving agent arena that automatically identifies capability gaps through dynamic task synthesis and drives targeted learning, enabling the co-evolution of agent policies and environments. Across 23 challenging agent benchmarks, Agent-World-8B and 14B consistently outperform strong proprietary models and environment scaling baselines. Further analyses reveal scaling trends in relation to environment diversity and self-evolution rounds, offering insights for building general agent intelligence.

  4. OpenGame: Open Agentic Coding for Games

    Game development sits at the intersection of creative design and intricate software engineering, demanding the joint orchestration of game engines, real-time loops, and tightly coupled state across many files. While Large Language Models (LLMs) and code agents now solve isolated programming tasks with ease, they consistently stumble when asked to produce a fully playable game from a high-level design, collapsing under cross-file inconsistencies, broken scene wiring, and logical incoherence. We bridge this gap with OpenGame, the first open-source agentic framework explicitly designed for end-to-end web game creation. At its core lies Game Skill, a reusable, evolving capability composed of a Template Skill that grows a library of project skeletons from experience and a Debug Skill that maintains a living protocol of verified fixes - together enabling the agent to scaffold stable architectures and systematically repair integration errors rather than patch isolated syntax bugs. Powering this framework is GameCoder-27B, a code LLM specialized for game engine mastery through a three-stage pipeline of continual pre-training, supervised fine-tuning, and execution-grounded reinforcement learning. Since verifying interactive playability is fundamentally harder than checking static code, we further introduce OpenGame-Bench, an evaluation pipeline that scores agentic game generation along Build Health, Visual Usability, and Intent Alignment via headless browser execution and VLM judging. Across 150 diverse game prompts, OpenGame establishes a new state-of-the-art. We hope OpenGame pushes code agents beyond discrete software engineering problems and toward building complex, interactive real-world applications. Our framework will be fully open-sourced.

  5. MultiWorld: Scalable Multi-Agent Multi-View Video World Models

    Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-conditioned video generation models that take historical frames and current actions as input to predict future frames. Yet, most existing approaches are limited to single-agent scenarios and fail to capture the complex interactions inherent in real-world multi-agent systems. We present MultiWorld, a unified framework for multi-agent multi-view world modeling that enables accurate control of multiple agents while maintaining multi-view consistency. We introduce the Multi-Agent Condition Module to achieve precise multi-agent controllability, and the Global State Encoder to ensure coherent observations across different views. MultiWorld supports flexible scaling of agent and view counts, and synthesizes different views in parallel for high efficiency. Experiments on multi-player game environments and multi-robot manipulation tasks demonstrate that MultiWorld outperforms baselines in video fidelity, action-following ability, and multi-view consistency. Project page: https://multi-world.github.io/

  6. EasyVideoR1: Easier RL for Video Understanding

    Reinforcement learning from verifiable rewards (RLVR) has demonstrated remarkable effectiveness in improving the reasoning capabilities of large language models. As models evolve into natively multimodal architectures, extending RLVR to video understanding becomes increasingly important yet remains largely unexplored, due to the diversity of video task types, the computational overhead of repeatedly decoding and preprocessing high-dimensional visual inputs, and the difficulty of reproducible evaluation across numerous sensitive hyperparameters. Existing open-source RL training frameworks provide solid infrastructure for text and image scenarios but lack systematic optimizations tailored for video modality. In this work, we present EasyVideoR1, a complete and efficient reinforcement learning framework specifically designed for training large vision-language models on video understanding tasks. EasyVideoR1 makes the following contributions: (1) a full video RL training pipeline with offline preprocessing and tensor caching that eliminates redundant video decoding and yields a 1.47 times throughput improvement; (2) a comprehensive, task-aware reward system covering 11 distinct video and image problem types with unified routing and modular extension; (3) a mixed offline-online data training paradigm that combines curated high-quality trajectories with on-policy exploration, benefiting the learning of more challenging tasks; (4) joint image-video training with independently configurable pixel budgets, allowing the two modalities to mutually reinforce each other; and (5) an asynchronous multi-benchmark evaluation framework covering 22 mainstream video understanding benchmarks, with reproduced accuracy closely aligned with officially reported scores.

  7. GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification

    Large language models are typically post-trained using supervised fine-tuning (SFT) and reinforcement learning (RL), yet effectively unifying efficient knowledge injection with robust generalization remains challenging. In this work, we provide a training-dynamics analysis showing that SFT can be interpreted as a special case of policy gradient optimization with an extremely sparse implicit reward and unstable inverse-probability weighting, which together lead to single-path dependency, entropy collapse, and gradient explosion. Motivated by this diagnosis, we propose Group Fine-Tuning (GFT), a unified post-training framework that addresses these intrinsic limitations through two mechanisms: Group Advantage Learning, which constructs diverse response groups and derives normalized contrastive supervision to alleviate reward sparsity, and Dynamic Coefficient Rectification, which adaptively bounds inverse-probability weights to stabilize optimization while preserving efficient knowledge injection. Experiments demonstrate that GFT consistently surpasses SFT-based methods and yields policies that integrate more smoothly with subsequent RL training.

  8. When Can LLMs Learn to Reason with Weak Supervision?

    Large language models have achieved significant reasoning improvements through reinforcement learning with verifiable rewards (RLVR). Yet as model capabilities grow, constructing high-quality reward signals becomes increasingly difficult, making it essential to understand when RLVR can succeed under weaker forms of supervision. We conduct a systematic empirical study across diverse model families and reasoning domains under three weak supervision settings: scarce data, noisy rewards, and self-supervised proxy rewards. We find that generalization is governed by training reward saturation dynamics: models that generalize exhibit a prolonged pre-saturation phase during which training reward and downstream performance climb together, while models that saturate rapidly memorize rather than learn. We identify reasoning faithfulness, defined as the extent to which intermediate steps logically support the final answer, as the pre-RL property that predicts which regime a model falls into, while output diversity alone is uninformative. Motivated by these findings, we disentangle the contributions of continual pre-training and supervised fine-tuning, finding that SFT on explicit reasoning traces is necessary for generalization under weak supervision, while continual pre-training on domain data amplifies the effect. Applied together to Llama3.2-3B-Base, these interventions enable generalization across all three settings where the base model previously failed.

  9. WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models

    Large language models are rapidly evolving into interactive coding agents capable of end-to-end web coding, yet existing benchmarks evaluate only narrow slices of this capability, typically text-conditioned generation with static-correctness metrics, leaving visual fidelity, interaction quality, and codebase-level reasoning largely unmeasured. We introduce WebCompass, a multimodal benchmark that provides unified lifecycle evaluation of web engineering capability. Recognizing that real-world web coding is an iterative cycle of generation, editing, and repair, WebCompass spans three input modalities (text, image, video) and three task types (generation, editing, repair), yielding seven task categories that mirror professional workflows. Through a multi-stage, human-in-the-loop pipeline, we curate instances covering 15 generation domains, 16 editing operation types, and 11 repair defect types, each annotated at Easy/Medium/Hard levels. For evaluation, we adopt a checklist-guided LLM-as-a-Judge protocol for editing and repair, and propose a novel Agent-as-a-Judge paradigm for generation that autonomously executes generated websites in a real browser, explores interactive behaviors via the Model Context Protocol (MCP), and iteratively synthesizes targeted test cases, closely approximating human acceptance testing. We evaluate representative closed-source and open-source models and observe that: (1) closed-source models remain substantially stronger and more balanced; (2) editing and repair exhibit distinct difficulty profiles, with repair preserving interactivity better but remaining execution-challenging; (3) aesthetics is the most persistent bottleneck, especially for open-source models; and (4) framework choice materially affects outcomes, with Vue consistently challenging while React and Vanilla/HTML perform more strongly depending on task type.

  10. ClawEnvKit: Automatic Environment Generation for Claw-Like Agents

    Constructing environments for training and evaluating claw-like agents remains a manual, human-intensive process that does not scale. We argue that what is needed is not just a dataset, but an automated pipeline capable of generating diverse, verified environments on demand. To this end, we introduce ClawEnvKit, an autonomous generation pipeline that instantiates this formalism from natural language descriptions. The pipeline comprises three modules: (1) a parser that extracts structured generation parameters from natural language input; (2) a generator that produces the task specification, tool interface, and scoring configuration; and (3) a validator that enforces feasibility, diversity, structural validity, and internal consistency across the generated environments. Using ClawEnvKit, we construct Auto-ClawEval, the first large-scale benchmark for claw-like agents, comprising 1,040 environments across 24 categories. Empirically, Auto-ClawEval matches or exceeds human-curated environments on coherence and clarity at 13,800x lower cost. Evaluated across 4 model families and 8 agent harness frameworks, we find that harness engineering boosts performance by up to 15.7 percentage points over a bare ReAct baseline, completion remains the primary axis of variation with no model saturating the benchmark, and automated generation enables evaluation at a scale previously infeasible. Beyond static benchmarking, ClawEnvKit enables live evaluation: users describe a desired capability in natural language and obtain a verified environment on demand, turning evaluation into a continuous, user-driven process. The same mechanism serves as an on-demand training environment generator, producing task distributions that adapt to an agent's current weaknesses rather than being bounded by existing user logs.

  11. SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents

    As the capability frontier of autonomous agents continues to expand, they are increasingly able to complete specialized tasks through plug-and-play external skills. Yet current benchmarks mostly test whether models can use provided skills, leaving open whether they can discover skills from experience, repair them after failure, and maintain a coherent library over time. We introduce SkillFlow, a benchmark of 166 tasks across 20 families in which task construction within each family follows a Domain-Agnostic Execution Flow (DAEF) that defines an agent workflow framework, allowing these tasks to share a consistent workflow. Agents are evaluated under an Agentic Lifelong Learning protocol in which they begin without skills, solve tasks sequentially within each family, externalize lessons through trajectory- and rubric-driven skill patches, and carry the updated library forward. Experiments reveal a substantial capability gap. For Claude Opus 4.6, lifelong skill evolution improves task success from 62.65% to 71.08% (+8.43 points). However, high skill usage does not necessarily imply high utility: Kimi K2.5 gains only +0.60 points despite 66.87% skill usage, while Qwen-Coder-Next reaches only a 44.58% task completion rate and still regresses relative to the vanilla setting. SkillFlow contributes a structured testbed for this direction and an in-depth empirical analysis of skill discovery, patching, transfer, and their failure modes under lifelong evaluation.

  12. Crowded in B-Space: Calibrating Shared Directions for LoRA Merging

    Merging separately trained LoRA adapters is a practical alternative to joint multi-task training, but it often hurts performance. Existing methods usually treat the LoRA update ΔW = BA as a single object and do not distinguish the two LoRA matrices. We show that the main source of LoRA merge interference comes from the output-side matrix B. Across tasks, B repeatedly uses a small set of shared directions, while A remains much more task-specific. As a result, the merged adapter overemphasizes these shared directions, and task-specific information is lost. We propose Pico (Pre-merge interference calibration in output-space), a data-free method that calibrates B before merge by downscaling over-shared directions and then rescaling the merged update. Pico plugs directly into existing merging methods such as Task Arithmetic, TIES, and TSV-M. Across eight different benchmarks from math, coding, finance, and medical domains, Pico improves average accuracy by 3.4-8.3 points over the corresponding base method and achieves the best overall average performance. Pico also enables merged adapters to outperform the LoRA trained with all task data. These results show that LoRA merging works better when the two LoRA matrices are treated separately.

  13. The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation

    On-policy distillation (OPD) is an increasingly important paradigm for post-training language models. However, we identify a pervasive Scaling Law of Miscalibration: while OPD effectively improves task accuracy, it systematically traps models in severe overconfidence. We trace this failure to an information mismatch: teacher supervision is formed under privileged context available during training, whereas the deployed model must report confidence using only deployment-time information. We formalize this perspective theoretically, showing that teacher-conditioned success is generally not a valid target for deployment-time confidence and that helpful privileged context induces entropy collapse and a systematic optimism bias. To address this, we propose a calibration-aware OPD framework, CaOPD, that estimates empirical confidence from model rollouts, replaces self-reported confidence with this student-grounded target, and distills the revised response through the same self-distillation pipeline. Experiments across various models and domains show that CaOPD achieves Pareto-optimal calibration while maintaining competitive capability, generalizing robustly under out-of-distribution and continual learning. Our findings highlight that capability distillation does not imply calibrated confidence, and that confidence should be treated as an essential objective in post-training. Code: https://github.com/SalesforceAIResearch/CaOPD

  14. Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding

    Vision-Language Models demonstrate remarkable capabilities but often struggle with compositional reasoning, exhibiting vulnerabilities regarding word order and attribute binding. This limitation arises from a scarcity of informative samples needed to differentiate subtle semantic variations during contrastive pretraining. Although hard negative mining offers a promising remedy, existing methods lack explicit mechanisms to dictate which linguistic elements undergo modification. Instead of engineering generative architectures, this study establishes lexical concreteness as a fundamental determinant of negative sample efficacy. Modifying highly concrete terms generates more pronounced structural and visual discrepancies, providing a substantially stronger learning signal. Leveraging this principle, ConcretePlant is proposed to systematically isolate and manipulate perceptually grounded concepts. Analysis of the InfoNCE objective further reveals a severe gradient imbalance, where easily distinguishable pairs disproportionately overwhelm the optimization process and restrict the bandwidth available for nuanced learning. To resolve this degradation, the Cement loss is formulated utilizing a margin-based approach. By correlating psycholinguistic scores with sample difficulty, this objective dynamically calibrates the penalization applied to individual training pairs. Comprehensive evaluations substantiate these theoretical claims. The integrated framework, designated as Slipform, achieves state-of-the-art accuracy across diverse compositional evaluation benchmarks, general cross-modal retrieval, and single- and multi-label linear probing.

  15. On the Reliability of Computer Use Agents

    Computer-use agents have rapidly improved on real-world tasks such as web navigation, desktop automation, and software interaction, in some cases surpassing human performance. Yet even when the task and model are unchanged, an agent that succeeds once may fail on a repeated execution of the same task. This raises a fundamental question: if an agent can succeed at a task once, what prevents it from doing so reliably? In this work, we study the sources of unreliability in computer-use agents through three factors: stochasticity during execution, ambiguity in task specification, and variability in agent behavior. We analyze these factors on OSWorld using repeated executions of the same task together with paired statistical tests that capture task-level changes across settings. Our analysis shows that reliability depends on both how tasks are specified and how agent behavior varies across executions. These findings suggest the need to evaluate agents under repeated execution, to allow agents to resolve task ambiguity through interaction, and to favor strategies that remain stable across runs.
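
Item 6 above (EasyVideoR1) attributes a 1.47x throughput gain to offline preprocessing and tensor caching that avoids re-decoding videos on every epoch. Below is a minimal illustrative sketch of that caching pattern in Python; the cache layout, key scheme, and the stand-in decoder are assumptions for illustration, not the framework's actual code.

    import hashlib
    from pathlib import Path

    import torch

    CACHE_DIR = Path("video_tensor_cache")
    CACHE_DIR.mkdir(exist_ok=True)

    def cache_key(video_path: str, num_frames: int, size: int) -> Path:
        # Key the cache on the source file plus the preprocessing settings.
        h = hashlib.sha1(f"{video_path}:{num_frames}:{size}".encode()).hexdigest()
        return CACHE_DIR / f"{h}.pt"

    def load_video_tensor(video_path, decode_fn, num_frames=16, size=224):
        """Return preprocessed frames, decoding only on the first access."""
        key = cache_key(video_path, num_frames, size)
        if key.exists():
            return torch.load(key)                         # cache hit: no video decoding
        frames = decode_fn(video_path, num_frames, size)   # expensive decode + resize
        torch.save(frames, key)
        return frames

    # Toy usage with a stand-in decoder; a real pipeline would plug in its
    # own video decoding backend here.
    fake_decode = lambda p, n, s: torch.zeros(n, 3, s, s)
    clip = load_video_tensor("example.mp4", fake_decode)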
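
Item 7 above (GFT) describes group-normalized advantages plus bounded inverse-probability weights. The sketch below shows one plausible reading of those two mechanisms; the function names, clipping bound, and surrogate loss are illustrative assumptions rather than the paper's implementation.

    import torch

    def group_advantages(rewards: torch.Tensor) -> torch.Tensor:
        # Normalize rewards within one group of sampled responses so the
        # supervision is contrastive rather than a single sparse target.
        return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

    def rectified_ratios(logp_new, logp_old, bound=5.0):
        # Importance ratios with an upper bound, standing in for the
        # "dynamic coefficient rectification" that tames raw 1/p weighting.
        return torch.exp(logp_new - logp_old).clamp(max=bound)

    def gft_style_loss(logp_new, logp_old, rewards):
        adv = group_advantages(rewards)
        ratio = rectified_ratios(logp_new, logp_old)
        return -(ratio * adv).mean()   # PPO-style surrogate over the group

    # Example: one prompt, a group of four sampled responses.
    logp_old = torch.tensor([-12.0, -15.0, -11.5, -14.0])
    logp_new = logp_old.clone().requires_grad_(True)
    rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])
    gft_style_loss(logp_new, logp_old, rewards).backward()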
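
Item 12 above (Pico) argues that LoRA merge interference comes from shared directions in the output-side B matrices. The data-free sketch below illustrates the general idea of downscaling over-shared output directions before merging; the SVD-based sharing score, the scaling floor, and the rescaling heuristic are illustrative assumptions, not the paper's exact calibration.

    import torch

    def calibrate_B(Bs, floor=0.5):
        # Bs: list of (d_out, r) LoRA B matrices for the same layer, one per task.
        stacked = torch.cat(Bs, dim=1)                        # (d_out, r * n_tasks)
        U, S, _ = torch.linalg.svd(stacked, full_matrices=False)
        share = S / S.sum()                                   # dominance of each output direction
        scale = 1.0 - (1.0 - floor) * (share / share.max())   # shrink the most shared directions
        P = U @ torch.diag(scale) @ U.T                       # calibration operator in output space
        return [P @ B for B in Bs]

    def merge_lora(Bs, As, rescale=None):
        Bs = calibrate_B(Bs)
        delta = sum(B @ A for B, A in zip(Bs, As)) / len(Bs)  # simple average merge
        if rescale is None:
            rescale = len(Bs) ** 0.5                          # heuristic post-merge rescaling
        return rescale * delta

    # Two toy rank-8 adapters on a 64x64 weight.
    d, r = 64, 8
    Bs = [torch.randn(d, r) for _ in range(2)]
    As = [torch.randn(r, d) for _ in range(2)]
    delta_W = merge_lora(Bs, As)   # update to add onto the base weight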
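
Item 13 above (CaOPD) replaces a model's self-reported confidence with an empirical estimate derived from its own rollouts. A minimal sketch of that target follows, assuming the pipeline already has a sampler and a verifier to plug in; the callables below are stand-ins.

    import random
    from statistics import mean
    from typing import Callable

    def empirical_confidence(sample: Callable[[], str],
                             is_correct: Callable[[str], bool],
                             k: int = 16) -> float:
        # Fraction of k rollouts that verify as correct: a student-grounded
        # number used as the calibration target instead of the self-reported one.
        return mean(1.0 if is_correct(sample()) else 0.0 for _ in range(k))

    # Toy usage with stand-in callables.
    conf = empirical_confidence(sample=lambda: random.choice(["4", "5"]),
                                is_correct=lambda a: a == "4")
    print(f"rollout-estimated confidence: {conf:.2f}")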
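
Item 15 above studies agent reliability through repeated executions and paired, task-level statistical tests. The toy example below shows the shape of such an analysis using a paired t-test over per-task success rates; the numbers are made up for illustration, and this particular test is only one of several reasonable choices.

    import numpy as np
    from scipy.stats import ttest_rel

    # Success indicators: rows = tasks, columns = repeated executions.
    setting_a = np.array([[1, 1, 0, 1, 1],
                          [0, 0, 1, 0, 0],
                          [1, 0, 1, 1, 0],
                          [1, 1, 1, 1, 1],
                          [0, 1, 1, 0, 1]])
    setting_b = np.array([[1, 0, 0, 0, 1],
                          [0, 0, 0, 0, 0],
                          [1, 1, 1, 1, 1],
                          [1, 1, 0, 1, 1],
                          [0, 0, 1, 0, 0]])

    rate_a = setting_a.mean(axis=1)   # per-task success rate under setting A
    rate_b = setting_b.mean(axis=1)   # per-task success rate under setting B

    stat, p = ttest_rel(rate_a, rate_b)   # paired comparison at the task level
    print(f"per-task deltas: {rate_a - rate_b}, p = {p:.3f}")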

Techmeme (15)

  1. Trump Media & Technology Group names Kevin McGurn as interim CEO effective immediately; McGurn previously worked as an executive at Hulu, Vevo, and T-Mobile (Todd Spangler/Variety)

    Todd Spangler / Variety: Trump Media & Technology Group, the parent company of social-media platform Truth Social and other businesses whose mission is …

  2. Source: a handful of unauthorized users in a private Discord channel have been accessing Anthropic's Mythos model since the day the company announced it (Rachel Metz/Bloomberg)

    Rachel Metz / Bloomberg: A small group of unauthorized users have accessed Anthropic PBC's new Mythos AI model, a technology that the company says is so powerful …

  3. Google now offers two research agents: Deep Research, replacing its December preview release, and Deep Research Max, both available via Gemini API paid tiers (The Keyword)

    The Keyword: Built with Gemini 3.1 Pro, the new Deep Research agents bring MCP support, native visualizations and unprecedented analytical quality …

  4. Reliable Robotics, which is developing autonomous aircraft systems for cargo flights, raised $160M led by Nimble Partners, pushing its valuation to ~$1B (Cailley LaPara/Bloomberg)

    Cailley LaPara / Bloomberg: Reliable Robotics Corp. secured $160 million in new funding, pushing its valuation to nearly $1 billion, as the Silicon Valley startup makes …

  5. Adobe announces a $25B stock repurchase program through April 30, 2030; Adobe shares have fallen around 30% so far this year (Zaheer Kachwala/Reuters)

    Zaheer Kachwala / Reuters: Adobe (ADBE.O) on Tuesday said its board of directors has approved a new $25 billion stock repurchase program through April 30, 2030, sending its shares up around 2% in extended trading.

  6. The US DOJ says a former ransomware negotiator pleaded guilty to helping cybercriminals extort companies in cyberattacks in five different incidents (Lorenzo Franceschi-Bicchierai/TechCrunch)

    Lorenzo Franceschi-Bicchierai / TechCrunch: Angelo Martino, a former ransomware negotiator, has pleaded guilty to helping cybercriminals extort companies in cyberattacks.

  7. Roblox reaches settlements totaling $35.8M with the AGs of West Virginia, Alabama, and Nevada over child-safety protections (Cecilia D'Anastasio/Bloomberg)

    Cecilia D'Anastasio / Bloomberg: Roblox Corp. has reached settlements with attorneys general in three states over child-safety protections. As part of the deals, the video game company pledged …

  8. An interview with Sam Altman and Greg Brockman on OpenAI's restructuring, cutting Sora, "personal AGI", Anthropic's "fear-based marketing" for Mythos, and more (Core Memory)

    Core Memory: "The last two standing - on everything." Sam Altman and Greg Brockman came on Core Memory together for a ten-year look back at OpenAI.

  9. Core Scientific plans to raise $3.3B via a junk bond sale to finance its shift from crypto mining to building AI data centers and leasing them to CoreWeave (Francisco Rodrigues/CoinDesk)

    Francisco Rodrigues / CoinDesk: Core Scientific (CORZ) is preparing to raise $3.3 billion through a junk bond sale as it continues …

  10. Sources: following Manus probe, Chinese authorities ordered at least one other prominent AI startup, MiroMind, not to send talent and research out of China (Washington Post)

    Washington Post: A Chinese government probe of a Meta-acquired company, Manus AI, reveals what tech workers see as a new red line.

  11. Mozilla says its Firefox 150 release includes fixes for 271 vulnerabilities identified using early access to Anthropic's Mythos Preview (Lily Hay Newman/Wired)

    Lily Hay Newman / Wired: The Firefox team doesn't think emerging AI capabilities will upend cybersecurity long term, but they warn that software developers are likely in for a rocky transition.

  12. NeoCognition, which wants to build AI agents that self-learn like humans, emerges from stealth with a $40M seed co-led by Cambium Capital and Walden Catalyst (Marina Temkin/TechCrunch)

    Marina Temkin / TechCrunch: Investors are aggressively courting AI researchers to build startups that can make AI more reliable and efficient.

  13. ChatGPT Images 2.0 is available globally to ChatGPT and Codex users, with a more powerful version for paying subscribers; its knowledge cutoff is December 2025 (Reece Rogers/Wired)

    Reece Rogers / Wired: The ChatGPT Images 2.0 model is here. Our testing shows it's better at creating more detailed images and rendering text …

  14. OpenAI says ChatGPT Images 2.0 comes in Instant and Thinking variants and can generate images of up to 2K resolution and in multiple aspect ratios (Zac Hall/9to5Mac)

    Zac Hall / 9to5Mac: OpenAI is announcing its upgraded ChatGPT image generation model with ChatGPT Images 2. The company is also scaling up Codex for enterprise with a new Codex Labs initiative.

  15. OpenAI says that ChatGPT Images 2.0 has a stronger understanding of non-Latin text rendering in languages like Japanese, Korean, Hindi, and Bengali (Amanda Silberling/TechCrunch)

    Amanda Silberling / TechCrunch: It used to be easy enough to distinguish between human-made and AI-generated imagery; just two years ago, you couldn't use image models …

Solidot (15)

  1. The creative software industry declares war on Adobe

    Empires eventually fall, and the creative software industry now broadly agrees that the end of the Adobe era is coming: competitors are offering rival products for free or at lower prices, even though Adobe's creative software has been treated as the industry standard for decades. Cinema 4D developer Maxon, after acquiring Autograph, a motion graphics design tool comparable to Adobe After Effects whose perpetual license previously cost as much as $1,795, now offers a free version to individual users. Canva, after acquiring Affinity, made Affinity Designer 2, Affinity Photo 2, and Affinity Publisher 2, functional counterparts to Adobe Illustrator, Photoshop, and InDesign, free to users, and did the same with Cavalry, an After Effects-like tool, after acquiring it. In January this year Apple launched the Creator Studio suite, which bundles Final Cut Pro, Logic Pro, Pixelmator Pro, Motion, Compressor, and MainStage for $12.99 per month, compared with $69.99 per month for Adobe's Creative Cloud Pro; Apple does not force users to subscribe, and individual apps can still be bought with one-time licenses.

  2. Salmon exposed to cocaine take more risks

    Drugs used by humans inevitably seep into the environment and are ingested by animals such as sharks. But how does ingesting them affect animals? Laboratory studies have shown that water fleas exposed to cocaine swim faster and that crayfish venture outside their shelters, a risky behavior in the wild. According to a study published in Current Biology, scientists have for the first time tested the effects of drugs on salmon in a natural setting. One group of 35 salmon was implanted with small devices containing cocaine, a second group received devices containing benzoylecgonine, cocaine's main metabolite, and a third group served as controls. Most of the control fish settled about 20 km from their release point, the cocaine group moved farther, and the benzoylecgonine group was found as far as 32 km away. The metabolite affected the animals more strongly than cocaine itself, consistent with laboratory observations. What long-term consequences drugs have for animal behavior remains unclear and requires further study.

  3. Palantir publishes a controversial "Technological Republic" manifesto

    A 22-point manifesto from The Technological Republic, a book co-authored by Palantir CEO Alex Karp and head of legal Nicholas W. Zamiska, has drawn widespread controversy in recent days. Palantir is a military contractor co-founded by Silicon Valley billionaire Peter Thiel. Critics call the manifesto fascist and say it is stuffed with conservative clichés about liberalism. The manifesto argues that for liberal democratic societies to win, moral appeals are not enough; they need hard power, and this century's hard power will be built on software. It calls for more tolerance of religious belief and resistance to the lure of hollow pluralism; it describes building products that people merely like and find useful as decadence, saying Silicon Valley companies should provide security instead; it argues that the question is not whether AI weapons will be built but who builds them and for what purpose, and that performative debates over AI weapons are a waste of time; and it says military service should be a universal obligation, that the public should regard elites with both awe and tolerance, that Elon Musk should not be mocked, and that the relentless exposure of public figures' private lives should be opposed. The manifesto also criticizes the West for refusing, in the name of inclusivity, to define a national culture.

  4. The F-35 is a cutting-edge fighter built for a different kind of war

    The US F-35 fighter program dates to the late 1990s and early 2000s, and its projected total lifecycle cost of more than two trillion dollars makes it the most expensive defense procurement program in American history. The wars of the past few years, however, have shown that modern warfare increasingly favors systems that can be mass-produced and quickly replaced when lost. The F-35 is few in number, expensive, and slow to replace, which makes it poorly suited to such wars of attrition. Cheap missiles and drones launched in large volumes also raise cost-effectiveness problems for today's expensive missile defenses: Patriot and THAAD are the most advanced systems available, but they are costly and produced in limited quantities, and the F-35 is in a similar position. The Russia-Ukraine war has shown that unmanned systems can reshape the battlefield faster than skeptics expected.

  5. The UK plans to ban phone use during the school day

    The UK plans to ban students from using phones during the school day. The government intends to amend the Children's Wellbeing and Schools Bill so that its guidance on school phone bans becomes a formal, legally binding prohibition. Most UK schools already restrict phones: data show that 99.8% of primary schools and 90% of secondary schools limit or ban phone use during school hours. The Children's Wellbeing and Schools Bill is seen as the UK's most important child protection legislation in decades; it also includes mandatory registration of children not in school, a crackdown on profiteering in children's social care, and a unique identifier to help agencies track children's welfare.

  6. How Honor's humanoid robots won the half marathon

    Humanoid robots from smartphone maker Honor swept the top six places at the Beijing Yizhuang half marathon, while last year's winner Tiangong and Unitree's closely watched H1 both fell near the finish line, a result that highlights the reliability gap between Honor and the startups. Honor engineers attribute the win to three things: first, technology accumulated in the smartphone business was applied to the humanoid robots; second, body proportions, with leg length set at 95 cm based on the build of elite human athletes, brought a large improvement; and third, a high-performance cooling system developed in-house. The core cooling components in Honor's robots come from 华科冷芯 (Huake Lengxin), a Shanghai-based power technology company. Its CEO Chen Qi says one of the core challenges for a humanoid robot running fast for sustained periods is dissipating heat from the lower-limb joint motors: high-load running demands high torque output and generates a great deal of heat, effectively a small furnace, and once a motor exceeds its safe temperature it can suffer permanent failures such as burned-out controllers, demagnetized permanent magnets, or damaged winding insulation. Honor's robots carry two of Huake Lengxin's high-speed suspension pumps, a liquid-cooling approach that helps solve the heat problem.

  7. Deezer says 44% of newly uploaded music is AI-generated

    Music streaming services such as Spotify and YouTube Music have become the main way people listen to music, which is more convenient than buying albums but also makes it easier for AI-generated songs to end up in users' playlists. Most streaming platforms do not specifically label AI music, but Deezer has been working on technology to identify AI content. The company says AI music now accounts for nearly half of all new uploads, and that most of the "listeners" of that content are also AI. Deezer's tests show that 97% of users cannot tell AI-generated music from human-made music. AI-generated tracks make up 44% of new songs uploaded to Deezer, but the platform does not add songs flagged as AI-generated to its recommended or editorial playlists. AI music accounts for only 1%-3% of Deezer's total plays.

  8. PlayStation will require age verification to use chat features

    An email from Sony says that later this year PlayStation will require players to verify their age in order to keep using PlayStation communication features such as messaging and voice chat. Sony says the move is intended to give players a safe, age-appropriate experience while respecting the privacy of players and their families, and to give them "effective control over their gaming experience." The age verification process will be rolled out worldwide. Players who choose not to verify their age can still use other PlayStation services such as games, trophies, and the store; only the communication features are affected. Sony has not said exactly when age verification will take effect.

  9. The Onion again reaches a deal to take over conspiracy site Infowars

    The Onion, America's finest news source, has again reached a deal to take over Infowars, the conspiracy website founded by Alex Jones. Jones was sued by victims' families for claiming the December 14, 2012 Sandy Hook elementary school shooting was a hoax; in 2022 he was ordered to pay the families $1.4 billion, and while continuing to appeal he filed for bankruptcy, with a judge agreeing to liquidate his assets to pay the damages. The Onion won the auction for Infowars in 2024, but a judge refused to approve the sale, citing an opaque bidding process. The Onion has now reached an agreement with Gregory Milligan, the court-appointed trustee liquidating the assets, to license Infowars and its intellectual property for $1,800 a month. The deal still needs bankruptcy court approval, and Jones can appeal. If the court signs off, The Onion plans to turn Infowars into a comedy site satirizing its conspiracy-theory origins, with comedian Tim Heidecker as creative director.

  10. Tim Cook to step down as Apple CEO in September

    Tim Cook will step down as Apple CEO in September and become executive chairman. The CEO job goes to John Ternus, senior vice president of hardware technologies, while executive Johny Srouji will take on the newly created role of chief hardware officer. Cook joined Apple in March 1998 to run its operations, became chief operating officer in 2005, and took over as CEO in August 2011 after co-founder Steve Jobs stepped aside due to illness, a post he has now held for 15 years.

  11. Smartphones and tablets sold in the EU must have replaceable batteries from 2027

    Under new EU rules, smartphones and tablets sold in Europe must have replaceable batteries from 2027, a move aimed at reducing electronic waste. Around 150 million smartphones and 24 million tablets are sold in the EU each year, generating roughly 5 million tonnes of e-waste annually, of which less than 40% is properly recycled. The replaceable-battery requirement takes effect on February 18, 2027, and also mandates that replacement batteries for any portable electronic product remain available for at least five years after the last unit of that product is placed on the market. Batteries must be replaceable by consumers themselves, and if special tools are needed they must be provided free of charge at the time of sale. The new rules also require operating system updates to be provided for at least five years.

  12. Nobel laureate pessimistic about humanity surviving another 50 years

    American theoretical physicist David Gross shared the 2004 Nobel Prize in Physics with his student Frank Anthony Wilczek for discovering asymptotic freedom in quantum chromodynamics, and on April 18, 2026 he received a special Breakthrough Prize in Fundamental Physics, worth $3 million, for a lifetime of pioneering contributions to theoretical physics. Asked in an interview whether theoretical physics might achieve grand unification within 50 years, he replied that the probability of humanity surviving another 50 years is very small. He puts the annual probability of nuclear war at roughly 2%, notes that the great powers have signed no treaties in the past decade, and says humanity is caught in an alarming arms race; recent events have all raised the risk of nuclear war, so 2% is a conservative estimate. There are now nine nuclear-armed states, three of them nuclear superpowers, a situation far more complicated than one with two nuclear powers, and agreements and norms between states are breaking down. He believes the chance of humanity surviving another hundred years is tiny, and the chance of surviving two hundred years approaches zero. His answer to Fermi's question of where all the civilizations and intelligent life in the galaxy are, and why they do not communicate with humanity, is that they have already destroyed themselves.

  13. Faked stars on GitHub projects

    On GitHub, the largest source code hosting platform, a project's star count was once an important measure of its popularity. Because stars matter, faking them, and paying to inflate them, has become an increasingly commercial business. Researchers from Carnegie Mellon, North Carolina State University, and Socket published a study at ICSE 2026 that used a tool called StarScout to analyze 20 TB of GitHub metadata covering 6.7 billion events and 326 million stars from 2019 to 2024, identifying 6 million suspected fake stars spanning 18,617 repositories and 301,000 accounts. Paid star inflation worsened sharply in 2024: by July, 16.66% of projects with 50 or more stars were suspected of buying stars. By January 2025, 90.42% of the suspect repositories had been taken down and 57.07% of the suspect accounts had been closed. AI and LLM projects have overtaken blockchain/crypto as the non-malicious category with the most faked stars. The investigation found dozens of websites, along with Fiverr sellers and Telegram channels, offering paid star services at prices ranging from $0.03 per star to $0.8-0.9 per star. A separate Tsinghua University study found that QQ and WeChat promotion groups also sell star-boosting services.

  14. WireGuard for Windows v1.0 released

    WireGuard author Jason Donenfeld announced on the mailing list that WireGuard for Windows, together with WireGuardNT, the kernel-mode implementation for Windows, has reached v1.0. WireGuard is an open-source VPN protocol and free software project designed to deliver better performance than IPsec and OpenVPN. The project published its first release in 2015, and its Linux implementation reached stable production status in 2020, when it was merged into the mainline kernel. The Windows version took another five years to go from beta to maturity.

  15. Brave launches paid Brave Origin; the Linux version is free

    Brave has launched Brave Origin, a paid version of its browser that strips out the built-in monetization features of the standard release, such as Rewards. Origin can be downloaded separately or applied as an upgrade to an existing installation; a single purchase unlocks it and it can be activated on multiple devices. The Linux version of Origin is free, which may leave paying Windows users wondering why they are paying for something others get for free.