OrangeBot.AI Digest — 2026-05-28
90 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- Bricks and Minifigs Stole a Man's $200k Lego Collection (mybricklog.com)
- Just Use Postgres for Durable Workflows (www.dbos.dev)
- Anthropic raises $65B in Series H funding at $965B post-money valuation (www.anthropic.com)
- Claude Opus 4.8 (www.anthropic.com)
- The Permanent Upper Crow (permanent-upper-crow.jasonwu.ink)
- New York passes pied-a-terre tax (www.cnbc.com)
- Show HN: Continue? Y/N: A 60-second game about AI agent permission fatigue (llmgame.scalex.dev)
- Indoor Wi-Fi Roaming with OpenWRT (taoofmac.com)
- EU fines Temu €200M for allowing sale of illegal products (www.bbc.co.uk)
- Citing 'severe' math deficits, UC faculty demand a return to SAT tests for STEM (www.latimes.com)
- Disagreement among frontier LLMs on real-world fact-checks (lenz.io)
- AI sticker shock hits corporate America (www.axios.com)
- AMD pulls a bait-and-switch on Linux users with Vivado licensing changes (itsfoss.com)
- A Eureka machine that thinks like nature and explores what AI cannot (iisc.ac.in)
- Bttf is a command line datetime Swiss army knife (github.com)
GitHub Trending(15)
- harry0703 / MoneyPrinterTurbo
- affaan-m / ECC
- Leonxlnx / taste-skill
- hardikpandya / stop-slop
- twentyhq / twenty
- DigitalPlatDev / FreeDomain
- byoungd / English-level-up-tips
- microsoft / markitdown
- obra / superpowers
- revfactory / harness
- codecrafters-io / build-your-own-x
- Lum1104 / Understand-Anything
- unclecode / crawl4ai
- OpenMOSS / MOSS-TTS
- EveryInc / compound-engineering-plugin
Product Hunt(15)
- Memori
Persistent memory from agent trace, not just conversation
- Revolte
AI for Software Engineering
- Pitch Agent
On-brand presentations, generated in seconds
- SpotsNow
Track who's advertising across podcasts w/ campaign insights
- Buffer API
One API to publish across every social platform.
- Granite
A vault for every document that matters
- NeuralAgent 2.5
Talk to your computer, it responds and gets things done.
- Parastore
Simulate real store with LLM-powered synthetic consumer
- Pancake
OpenClaw in Slack that makes your company autonomous
- AccountyCat
A focus companion that actually gets context
- Robinhood Agentic Trading
Let your agent trade
- LaunchOS
Bring Back the Classic Launchpad Experience on macOS 26+
- Growati
The autopilot for YouTube post-production
- Angel Match 4.0
A database of 125K+ angels and VCs to raise your seed round
- Stage
Screen recording for demos, bugs, and updates
Hugging Face(15)
- ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation
Proactive Recommender Systems (PRSs) aim to guide user preference shift toward target items by generating paths of intermediate recommendations. Reinforcement learning (RL) provides a principled framework for optimizing such sequential decision tasks, as path rewards can naturally capture both short-term acceptance and long-term guidance effectiveness. However, naively applying policy gradients to PRS results in deficient gradient estimation. We identify two deficiencies: (1) path-level rewards decompose into step-level rewards with positive mean, creating a length-dependent bias that causes gradients to favor path extension over meaningful exploration; (2) weighting each step by the entire path-level reward ignores the decomposition structure, leading to high gradient variance. To rectify these two deficiencies, we propose an effective RL framework ProRL with two novel mechanisms for proactive recommendation. First, Stepwise Reward Centering subtracts expected rewards to neutralize length-dependent bias, ensuring that path extension yields zero expected gradient signal. Second, Position-Specific Advantage Estimation leverages the reward decomposition structure to compute step-dependent baselines, reducing gradient variance. Together, these mechanisms yield policy gradients that precisely target path quality. Our experiments on three real-world datasets demonstrate that ProRL significantly outperforms state-of-the-art PRSs. Our code is available at https://github.com/hongruhou89/ProRL.
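The length-dependent bias described in this abstract can be illustrated with a toy calculation. The sketch below is not the ProRL implementation: the function names, the constant expected step reward, and the reward values are all assumptions made up for illustration. It shows only that when step rewards have a positive mean, a raw path return grows with path length, while subtracting the expected step reward removes that growth.

```python
def raw_return(step_rewards):
    """Path-level reward as the plain sum of step-level rewards."""
    return sum(step_rewards)


def centered_return(step_rewards, expected_step_reward):
    """Sum of step rewards after subtracting an assumed expected step reward."""
    return sum(r - expected_step_reward for r in step_rewards)


# Hypothetical paths: every step earns exactly the assumed mean reward of 0.5.
short_path = [0.5, 0.5]
long_path = [0.5] * 6

# The raw return favors the longer path even though no step is better than average.
print(raw_return(short_path), raw_return(long_path))

# After centering, extending a path with average steps adds nothing.
print(centered_return(short_path, 0.5), centered_return(long_path, 0.5))
```

In this toy setting only steps that beat the assumed expectation contribute to the centered return, which is the intuition behind neutralizing the bias toward path extension.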
- Agent Explorative Policy Optimization for Multimodal Agentic Reasoning
Vision-language models with extended reasoning succeed on complex problems, but many real-world problems require external tools that internal reasoning alone often cannot resolve. Agentic reasoning therefore interleaves two behaviors with a structural asymmetry: thinking (the self-contained default) and tool use (a high-variance auxiliary acting). We refer to this asymmetry as the Thinking-Acting Gap. Under standard RL recipes like GRPO, the gap manifests as two diagnostic symptoms during training: tool use is attempted on only ~30% of rollouts, and when attempted, the tool-using rollouts within a group are all-wrong on ~40% of questions, suppressing the learning signal at the tool calls that needed it. We propose AXPO (Agent eXplorative Policy Optimization): for each all-wrong tool-using subgroup, AXPO fixes the thinking prefix and resamples the tool call and its continuation, paired with uncertainty-based prefix selection. Across nine multimodal benchmarks and three scales of Qwen3-VL-Thinking, SFT+AXPO outperforms SFT+GRPO on average (+1.8pp Pass@1 and +1.8pp Pass@4 at 8B), and the 8B model with SFT+AXPO surpasses the 32B Base on Pass@4 with 4 times fewer parameters.
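One way a group of failed rollouts can lose its learning signal is visible in a group-normalized advantage of the kind GRPO-style methods use. The sketch below is an illustration under assumptions, not the AXPO code: it uses one common form of the advantage (reward minus group mean, divided by group standard deviation), binary rewards, and a hypothetical helper that flags groups whose tool-using rollouts all failed. The resampling step itself is not shown.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def all_wrong_tool_subgroup(rewards, used_tool):
    """True if some rollout used a tool and every tool-using rollout scored 0."""
    tool_rewards = [r for r, t in zip(rewards, used_tool) if t]
    return bool(tool_rewards) and all(r == 0 for r in tool_rewards)


# Hypothetical group where every rollout is wrong: all advantages are zero,
# so no rollout is pushed up or down.
print(group_advantages([0.0, 0.0, 0.0, 0.0]))

# Hypothetical group where only the non-tool rollout is right: the tool-using
# subgroup is all-wrong, which is the case the abstract proposes to resample.
print(all_wrong_tool_subgroup([1.0, 0.0, 0.0], [False, True, True]))
```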
- Self-Improving Language Models with Bidirectional Evolutionary Search
Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inference. However, widely used methods such as best-of-N sampling and tree search face two fundamental limitations: they are guided by sparse verification signals, and they construct candidates primarily through autoregressive expansion, restricting exploration to regions with substantial model probability mass. To address these, we propose Bidirectional Evolutionary Search (BES), a search framework that couples forward candidate evolution with backward goal decomposition. In the forward search, BES augments standard expansion with evolution operators that recombine partial trajectories to generate candidates that are difficult to obtain from a single model rollout. In the backward search, BES recursively decomposes the original task into checkable subgoals, producing dense intermediate feedback that guides forward search. We provide theoretical motivation showing that candidates generated by expansion-only search are confined to a narrow entropy shell while evolutionary operators can escape it, and that backward search can exponentially reduce the number of required samples to find a correct answer. Experiments show that on challenging post-training tasks where mainstream post-training algorithms fail to improve, BES enables consistent gains, and on three open problem solving benchmarks at inference time, BES outperforms existing open-source frameworks in both average and best-case performance. Code and trained models are available at https://github.com/Embodied-Minds-Lab/BES.
- DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes
Reinforcement learning has become a central paradigm for advancing reasoning in large language models, yet most existing methods still depend on stronger teacher models or heavily curated difficult datasets, limiting scalable capability improvement. In this paper, we introduce DenoiseRL, a reinforcement learning framework that substitutes external supervision with recovery-oriented optimization over failures from weak models. Instead of relying on stronger supervision or carefully engineered data, DenoiseRL learns directly from incorrect reasoning traces by converting them into opportunities for improvement, making training more scalable and less dependent on external resources. This yields a richer and more diverse learning signal, improving exploration efficiency from imperfect model behavior. As a result, DenoiseRL improves reasoning performance and overall training efficiency while reducing the need for expensive data curation or stronger teacher models. Empirically, DenoiseRL consistently outperforms strong on-policy RL baselines across competitive mathematical and general reasoning benchmarks and promotes stronger self-corrective behavior as training difficulty increases, highlighting an effective and scalable alternative pathway for improving reasoning in large language models.
- MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems
Memory is essential for enabling large language models to support long-horizon reasoning, yet existing memory systems remain unreliable and difficult to debug. Tracing memory's dynamic evolution is crucial to understand how information is synthesized, propagated, or corrupted over time. In this work, we study the new problem of error tracing and attribution in LLM memory systems. We propose a novel framework that transforms memory pipelines into executable memory evolution graphs, enabling fine-grained tracing of operational information flow. We then construct MemTraceBench, a benchmark collected from representative memory systems such as Long-Context, RAG, Mem0, and EverMemOS, to systematically study memory failure modes. We further introduce an automatic attribution method that iteratively traces operation subgraphs to pinpoint the root cause of any failed case. Our analysis reveals that memory failures are systematic, stemming from operation-level issues like information loss and retrieval misalignment. Crucially, we leverage these fine-grained attribution signals to guide downstream prompt optimization, establishing a closed-loop system that automatically corrects faults and boosts end-task performance by up to 7.62%. Code will be released at https://github.com/zjunlp/MemTrace.
- GEM: Generative Supervision Helps Embodied Intelligence
Embodied Vision-Language Models (VLMs) have demonstrated impressive performance and generalization in robotics, particularly within Vision-Language-Action frameworks. However, a significant gap remains between the high-level semantic focus of standard text-guided pre-training paradigms and the low-level spatial and physical knowledge critical for execution in embodied environments. In this paper, we introduce GEM, a Generative-supervised Embodied vision-language Model designed to bridge this divide. We propose integrating a depth map generation task directly into the VLM pre-training phase. By training this generative objective jointly with the main model, we observe substantial improvements in embodied intelligence, significantly enhancing both semantic understanding and physical operation capabilities. To support this paradigm, we curate and release GEM-4M, a comprehensive large-scale dataset featuring a mixture of grounding, reasoning, and planning data paired with high-quality depth supervision. Extensive experiments demonstrate that GEM achieves state-of-the-art results across diverse embodied benchmarks. Furthermore, our deployed action model, GEM-VLA, exhibits vastly superior task execution abilities in both simulation environments and real-world evaluations. Code, models, and datasets are available at https://zhaorw02.github.io/GEM/
- Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents
Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software domain remains expensive. Small open computer-use agents are more practical specialization targets, but they remain substantially weaker and exhibit uneven domain-specific failures. A straightforward remedy is to synthesize large-scale training data for the target domain, yet we find that this naive approach yields only marginal improvements. Building on this observation, we introduce LearnWeak, an annotation-free specialization framework for small computer-use agents that uses a stronger reference agent to identify the student's weaknesses in the target domain, synthesize targeted tasks, and construct supervision automatically. LearnWeak further introduces an error-aware specialization objective that disentangles planning and execution errors, enabling more behaviorally precise updates than broad uniform supervision. On OSWorld, LearnWeak achieves average gains of 11.6 and 11.1 percentage points over EvoCUA-8B and OpenCUA-7B, respectively, across eight domains. We also validate that our student-aware dataset generation and training approaches outperform existing autonomous trajectory generation and training baselines. Our work highlights the importance of student awareness in both data synthesis and agent training, pointing toward a more principled and efficient path for specializing small computer-use agents in diverse domains.
- ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence
Autonomous research agents produce competitive solutions and professional-looking manuscripts, yet their outputs contain verifiability failures undetectable by surface-level evaluation: fabricated citations, unreproducible scores, and method descriptions that diverge from the implementation. We address this through three contributions. First, Chain-of-Evidence (CoE), a verifiability framework requiring every claim to be traceable to its evidence source. Second, ScientistOne, an end-to-end autonomous research system that maintains evidence chains by construction throughout literature review, solution discovery, and paper writing. Third, CoE Audit, a post-hoc audit whose four integrity checks -- score verification, specification violation, reference verification, and method-code alignment -- apply uniformly to all systems. Across 75 papers spanning five systems and five frontier research tasks, every baseline exhibits at least one systematic failure mode: hallucinated reference rates reach 21%, score verification passes in as few as 42% of papers, and method-code alignment ranges from 20% to 80%. ScientistOne achieves zero hallucinated references (0/337), perfect score verification (12/12), and the highest method-code alignment (14/15), while matching or exceeding human expert performance on all five tasks. ScientistOne further generalizes to six additional tasks spanning medical imaging, fine-grained recognition, 3D perception, and language modeling, achieving state-of-the-art on Parameter Golf and gold medals on MLE-Bench tasks where baselines fail entirely.
- AI Research Agents Narrow Scientific Exploration
AI research agents can now generate research ideas, design experiments, run code, and draft papers, raising the possibility of large-scale AI-assisted scientific discovery. Many current agent frameworks explicitly encourage the generation of novel and high-impact ideas. Yet it remains unclear whether AI-assisted ideation broadens scientific exploration or mainly concentrates around existing work. We study AI research agents as scientific search systems. Using four AI research-agent frameworks and six large language models, we generate 37,802 scientific ideas from shared seed literature across citation-defined research areas in AI and machine learning. We then compare the resulting AI ideas against human-authored papers from the same research areas, follow-on human research emerging from the same seed literature, and the seed literature itself. Across experiments, four consistent patterns emerge. First, AI-generated ideas are substantially more concentrated than human-authored papers from the same research areas. Second, AI-generated ideas remain much closer to their starting literature than later human follow-on work does. Third, papers most similar to AI-generated ideas tend to receive lower subsequent citations. Fourth, when AI-generated ideas differ from prior work, the differences arise primarily from recombining existing technical methods rather than introducing fundamentally new research questions. Overall, current AI research agents appear better suited to local elaboration than to broadening scientific exploration.
- Rethinking Memory as Continuously Evolving Connectivity
Existing memory-augmented LLM agents often treat memory as a static repository with pre-defined representations and fixed retrieval pipelines, which is brittle in dynamic agentic environments where feedback, task variation, and heterogeneous signals continuously reshape what should be remembered and how it should be connected. To address this, we propose FluxMem, a connectivity-evolving memory framework that models memory as a heterogeneous graph and progressively refines its topology through three stages: initial connection formation, feedback-driven refinement, and long-term consolidation. During execution, FluxMem repairs missing links, prunes interference, aligns abstraction granularity, and distills recurrent successful trajectories into reusable procedural circuits, guided by one metric for memory generalizability and evolutionary maturity. Across three fundamentally distinct benchmarks including LoCoMo, Mind2Web, and GAIA, FluxMem achieves consistent state-of-the-art performance, demonstrating strong adaptation and generalization in complex agentic environments. The code will be open-sourced at https://github.com/zjunlp/LightMem.
- Triplet-Block Diffusion RWKV
Causal Transformer language models suffer from strictly sequential decoding and a quadratic per-step attention cost. While linear-time causal models and discrete diffusion models each address these weaknesses, their integration remains inherently inconsistent: diffusion requires bidirectional attention, while causal models are unidirectional. To unify these architectures, we propose B^3D-RWKV, a diffusion RWKV variant that integrates the model's O(L) inference efficiency with parallel, bidirectional discrete-diffusion through a triplet-block layout method. B^3D-RWKV-7.2B reaches comparable accuracy on an 8-task suite versus existing models while significantly outperforming baselines in decoding throughput with an average 1.6× speedup.
- GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection
Despite the rapid progress of multimodal large language models in building Graphical User Interface (GUI) agents, their real-world task completion is fundamentally bottlenecked by a lack of world knowledge about GUI operations. Existing solutions typically rely on expensive multi-agent scaffolding or conventional post-training paradigms, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). However, post-training only allows agents to implicitly absorb world knowledge through action annotations or reward signals, leading to inefficient trajectory memorization rather than genuine comprehension. Therefore, an approach that enables explicit learning of this knowledge is imperative. To this end, we propose GUI-CIDER, a mid-training method that explicitly internalizes GUI world knowledge through Causal Internalization and Density-aware Exemplar Reselection. GUI-CIDER operates in three stages: (1) data synthesis, which distills static planning and dynamic causal knowledge from GUI trajectories into text; (2) exemplar reselection, which filters the corpus by rewarding causal structures and penalizing semantic redundancy; and (3) mid-training, where the refined data is used to embed the acquired knowledge. Extensive experiments on two GUI knowledge benchmarks and three task completion benchmarks demonstrate that GUI-CIDER consistently improves both the agent's understanding of GUI operations and its task success rates. The code is available at https://github.com/Wuzheng02/GUI-CIDER.
- Long Live The Balance: Information Bottleneck Driven Tree-based Policy Optimization
Recent advances in online reinforcement learning (RL) for large language models (LLMs) have demonstrated promising performance in complex reasoning tasks. However, they often exhibit an imbalanced exploration-exploitation trade-off, resulting in unstable optimization and sub-optimal performance. We introduce IB-Score, a novel metric grounded in Information Bottleneck theory that evaluates a policy's exploration-exploitation balance by quantifying the trade-off between step-level reasoning diversity and mutual information shared with the correct answer. Analysis based on IB-Score shows that popular online RL approaches (e.g., GRPO) with common regularizers fail to consistently maintain balance during training, with suboptimal results. To address this, we propose Information Bottleneck-driven Tree-based Policy Optimization (IB-TPO), a principled framework that formulates IB-Score as a fine-grained optimization objective and utilizes a novel IB-guided tree sampling strategy that not only improves the efficiency of online sampling with 50% more trajectories under the same token budget, but also reuses the tree structure for effective IB-Score Monte Carlo estimation. Extensive experiments across standard benchmarks show that our method significantly outperforms the GRPO baseline by 2.9% to 3.6% and also outperforms other state-of-the-art online RL approaches. Our code is available at https://github.com/alibaba/EfficientRL.
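Why tree-style sampling can yield more trajectories under a fixed token budget comes down to simple arithmetic: branches that share a prefix pay for that prefix only once. The sketch below is a back-of-the-envelope model, not the IB-TPO sampler. It assumes a single shared prefix, equal-length trajectories, and invented numbers, and it does not attempt to reproduce the 50% figure from the abstract.

```python
def independent_count(budget, length):
    """Trajectories affordable when each one is sampled from scratch."""
    return budget // length


def shared_prefix_count(budget, length, prefix):
    """Trajectories affordable when all branches reuse one prefix.

    The first trajectory costs `length` tokens; each further branch costs
    only the `length - prefix` tokens after the shared prefix.
    Assumes 0 <= prefix < length.
    """
    if budget < length:
        return 0
    return 1 + (budget - length) // (length - prefix)


# Hypothetical setting: 12,000-token budget, 1,000-token trajectories,
# and a 400-token prefix shared by every branch.
print(independent_count(12_000, 1_000))
print(shared_prefix_count(12_000, 1_000, 400))
```

With a prefix length of zero the two counts coincide, so any gain in this model comes entirely from prefix reuse.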
- Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving
End-to-end autonomous driving via Vision-Language-Action (VLA) models demands a precarious balance between high-fidelity trajectory planning and efficient inference. Existing paradigms typically fall short: autoregressive (AR) VLAs are memory-bandwidth-bound on edge hardware and prone to exposure-bias drift, while full-sequence diffusion models preclude KV-cache reuse and suffer from "logical leakage" that violates the fundamental perceive-then-plan causality. We present Fast-dDrive, a block-diffusion VLA that performs bidirectional refinement within semantic units while enforcing strict causal ordering across them. Leveraging the observation that driving VLAs often emit structured JSON-like outputs, Fast-dDrive freezes structural tokens into a section scaffold and employs a section-aware training recipe that prioritizes safety-critical planning. We further introduce Scaffold Speculative Decoding to achieve AR-equivalent quality at significantly higher throughput. Finally, we propose a low-overhead test-time scaling scheme: by forking N stochastic trajectory rollouts from a single shared-prefix KV cache and averaging them, we effectively suppress prediction variance at a fractional computational cost. Empirical results demonstrate that Fast-dDrive redefines the speed-accuracy frontier for driving agents. On the WOD-E2E test set, Fast-dDrive achieves SOTA ADE@3s and ADE@5s, alongside the highest RFS among diffusion-based VLAs; on nuScenes, it reduces average L2 error to 0.32m (a 22% improvement). When integrated with SGLang, our framework delivers a 12× throughput speedup over the AR baseline, narrowing the gap between high-capacity VLAs and the efficiency demands of real-time on-vehicle deployment.
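The idea of averaging several stochastic rollouts to reduce prediction variance can be sketched independently of any model. The code below is a toy illustration, not Fast-dDrive's scheme: the waypoints are invented, the helper names are hypothetical, and the two rollouts are deliberately symmetric around the ground truth, which is the most favorable case for averaging. ADE here is the usual average displacement error, the mean L2 distance between matched waypoints.

```python
import math


def average_rollouts(rollouts):
    """Pointwise mean of N trajectories, each a list of (x, y) waypoints."""
    n = len(rollouts)
    return [
        (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)
        for pts in zip(*rollouts)
    ]


def ade(pred, truth):
    """Average displacement error: mean L2 distance between matched waypoints."""
    return sum(math.dist(p, q) for p, q in zip(pred, truth)) / len(truth)


# Hypothetical ground truth and two noisy rollouts offset in opposite directions.
truth = [(1.0, 0.0), (2.0, 0.0)]
rollout_a = [(1.0, 0.5), (2.0, 0.5)]
rollout_b = [(1.0, -0.5), (2.0, -0.5)]

averaged = average_rollouts([rollout_a, rollout_b])
print(ade(rollout_a, truth), ade(rollout_b, truth), ade(averaged, truth))
```

Real rollouts are not perfectly symmetric, so averaging reduces the error contributed by noise without necessarily removing any systematic offset.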
- GE-Sim 2.0: A Roadmap Towards Comprehensive Closed-loop Video World Simulators for Robotic Manipulation
We introduce GE-Sim 2.0 (Genie Envisioner World Simulator 2.0), a closed-loop video world simulator for robotic manipulation. Building on the action-conditioned video generation framework of Genie Envisioner, GE-Sim 2.0 is re-trained on thousands of hours of real-world robot data spanning teleoperation, contact-rich interaction, and on-robot policy deployment, substantially improving action-following fidelity and trajectory coverage. On top of this foundation, three new modules close the loop from video simulation to policy learning: a state expert that decodes proprioceptive state from video latents to support next-chunk prediction by downstream VLA policies; a world judge that scores generated rollouts against task instructions, yielding machine-verifiable success signals and rewards in place of manual inspection; and an acceleration framework that delivers a 25-frame rollout in 2.3 seconds on a single H100, with up to 4× frame skipping at inference for long-horizon evaluation. GE-Sim 2.0 tops the public WorldArena leaderboard at only 2B parameters, outperforming both dedicated robotic world models and closed-source general video generators, and policies trained against its rollouts and rewards translate into measurable real-world gains, establishing GE-Sim 2.0 as a practical platform for scalable evaluation and closed-loop learning of manipulation policies.
Techmeme(15)
- Sources: Amazon has shut down an internal leaderboard that tracked employees' use of AI tools after workers tried to boost their scores with needless tasks (Rafe Rosner-Uddin/Financial Times)
Rafe Rosner-Uddin / Financial Times : Sources: Amazon has shut down an internal leaderboard that tracked employees' use of AI tools after workers tried to boost their scores with needless tasks — Senior executive Dave Treadwell tells staff 'don't use AI just for the sake of using AI' as costs rise
- Sources: Airwallex raised new funding led by Lee Fixel's Addition at a ~$12B valuation, up from $8B late last year, and hit $1.5B in ARR, up from $1B in October (Axios)
Axios : Sources: Airwallex raised new funding led by Lee Fixel's Addition at a ~$12B valuation, up from $8B late last year, and hit $1.5B in ARR, up from $1B in October — Payments firm Airwallex has raised new funding at around a $12 billion valuation led by Lee Fixel's Addition, Axios has learned.
- Doc: the EU is preparing emergency powers to intervene in Europe's chip supply chains during shortages, including by forcing chipmakers to override contracts (Barbara Moens/Financial Times)
Barbara Moens / Financial Times : Doc: the EU is preparing emergency powers to intervene in Europe's chip supply chains during shortages, including by forcing chipmakers to override contracts — Chipmakers could be forced to override existing contracts under draft law — The EU is preparing sweeping emergency powers …
- Dell reports Q1 revenue up 88% YoY to $43.84B, vs. $35.43B est., and forecasts FY 2027 revenue above estimates; DELL jumps 15%+ after hours (Jordan Novet/CNBC)
Jordan Novet / CNBC : Dell reports Q1 revenue up 88% YoY to $43.84B, vs. $35.43B est., and forecasts FY 2027 revenue above estimates; DELL jumps 15%+ after hours — Dell reported its fastest pace of revenue growth for any period since its return to the public market more than seven years ago, and topped analysts' estimates for sales and profit.
- Autodesk agrees to buy MaintainX, a company focused on maintenance tools, in an all-cash deal that values MaintainX at $3.6B (Brody Ford/Bloomberg)
Brody Ford / Bloomberg : Autodesk agrees to buy MaintainX, a company focused on maintenance tools, in an all-cash deal that values MaintainX at $3.6B — Engineering software maker Autodesk Inc. has agreed to buy MaintainX, a firm focused on maintenance tools. — The all-cash deal will value MaintainX at $3.6 billion, Autodesk said in a statement Thursday.
- Snowflake stock closed up 36% on Thursday, its best day ever, after the company boosted guidance and announced an AI compute deal with Amazon (Samantha Subin/CNBC)
Samantha Subin / CNBC : Snowflake stock closed up 36% on Thursday, its best day ever, after the company boosted guidance and announced an AI compute deal with Amazon — Software stocks popped on Thursday after Snowflake said it plans to spend $6 billion on compute from Amazon and topped earnings estimates on artificial intelligence momentum.
- Fonoa, which helps enterprises manage indirect tax compliance, raised a $110M Series C led by Headline and acquired PwC's Indirect Tax Edge platform (Ryan Lawler/Axios)
Ryan Lawler / Axios : Fonoa, which helps enterprises manage indirect tax compliance, raised a $110M Series C led by Headline and acquired PwC's Indirect Tax Edge platform — Fonoa, which helps enterprises manage indirect tax compliance, raised $110 million in Series C funding and acquired PwC's Indirect Tax Edge platform …
- Anthropic adds dynamic workflows to Claude Code, enabling hundreds of subagents to run in parallel for complex engineering tasks such as framework migrations (Claude)
Claude : Anthropic adds dynamic workflows to Claude Code, enabling hundreds of subagents to run in parallel for complex engineering tasks such as framework migrations — Early access users and teams inside Anthropic have been using dynamic workflows for a wide range of use cases, including:
- The CFTC moves to vacate a $5M settlement with Gemini, reversing a Biden-era enforcement action, following a lobbying campaign by the Winklevoss twins (Wall Street Journal)
Wall Street Journal : The CFTC moves to vacate a $5M settlement with Gemini, reversing a Biden-era enforcement action, following a lobbying campaign by the Winklevoss twins — A $5 million settlement at the end of the Biden administration was at the center of a lobbying campaign by the billionaire brothers
- Anthropic raised a $65B Series H at a $965B post-money valuation, overtaking OpenAI's $852B valuation, and says its revenue run rate crossed $47B this month (New York Times)
New York Times : Anthropic raised a $65B Series H at a $965B post-money valuation, overtaking OpenAI's $852B valuation, and says its revenue run rate crossed $47B this month — Anthropic raised $65 billion in new fund-raising that put its value at $900 billion, ahead of OpenAI's last valuation of $730 billion, as the companies duel for A.I. dominance.
- Anthropic says it expects Mythos-class models to be available to all customers "in the coming weeks" following the development of stronger safeguards (Madison Mills/Axios)
Madison Mills / Axios : Anthropic says it expects Mythos-class models to be available to all customers “in the coming weeks” following the development of stronger safeguards — Anthropic released Claude Opus 4.8 Thursday, an upgrade to its flagship AI model with better coding and knowledge work skills, all for the same price as its prior version.
- Corgi, which uses AI to provide insurance for startups, raised a $106M Series B1 at a $2.6B valuation, up from $1.3B on May 6, for a total funding of $378M (Dominic-Madori Davis/TechCrunch)
Dominic-Madori Davis / TechCrunch : Corgi, which uses AI to provide insurance for startups, raised a $106M Series B1 at a $2.6B valuation, up from $1.3B on May 6, for a total funding of $378M — Insurance tech Corgi on Thursday announced a $106 million Series B1 raise, valuing the company at $2.6 billion …
- Anthropic launches Opus 4.8, saying it's "more likely to flag uncertainties about its work and less likely to make unsupported claims", at the same price as 4.7 (Russell Brandom/TechCrunch)
Russell Brandom / TechCrunch : Anthropic launches Opus 4.8, saying it's “more likely to flag uncertainties about its work and less likely to make unsupported claims”, at the same price as 4.7 — On Thursday, Anthropic released Opus 4.8, the newest version of its most advanced publicly available model.
- Elon Musk says Anthropic's Colossus deal is "a 180 day lease with 90 day notice"; SpaceX's S-1 said Anthropic "agreed to pay a monthly fee through May 2029" (Russell Brandom/TechCrunch)
Russell Brandom / TechCrunch : Elon Musk says Anthropic's Colossus deal is “a 180 day lease with 90 day notice”; SpaceX's S-1 said Anthropic “agreed to pay a monthly fee through May 2029” — Earlier this month, xAI signed a major compute deal with Anthropic, pledging billions of dollars a month …
- Meta's Oversight Board says Meta agreed to increase its funding by $13M, ensuring that it will be funded through 2028, reversing a planned decrease in funding (Casey Newton/Platformer)
Casey Newton / Platformer : Meta's Oversight Board says Meta agreed to increase its funding by $13M, ensuring that it will be funded through 2028, reversing a planned decrease in funding — Meta has agreed to increase funding to its external Oversight Board by $13 million, Platformer has learned …
Solidot(15)
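- EU fines Temu €200 million for violating the DSA
The European Commission has fined Temu €200 million under the Digital Services Act (DSA). It found that Temu failed to diligently identify, analyze, and assess the systemic risks posed by counterfeit and substandard goods on its platform, harming EU consumers. The Commission gave examples: a very high proportion of the chargers it examined failed basic safety tests, and a considerable share of the baby toys it tested posed moderate to high safety risks because they contained chemicals above legal safety limits or presented a choking hazard from detachable parts. The Commission opened its investigation on October 31, 2024, adopted preliminary findings in July 2025, and announced the penalty on May 28.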
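- Websites can monitor users by analyzing SSD activity
Browsers have evolved into complex, operating-system-like platforms, but the steady addition of new features has also enlarged their attack surface and introduced new vulnerabilities. The latest attack, called FROST (fingerprinting remotely using OPFS-based SSD timing), measures the timing of some I/O (input/output) operations on the user's SSD, allowing an attacker to identify the websites open in other browser tabs and the applications that are running. FROST requires no user interaction beyond opening the attacking website, and it runs entirely in the browser. It uses JavaScript to interact with OPFS (origin private file system). OPFS is part of the Web API: a dedicated storage space reserved for a specific website, used to run the object code needed for particular tasks. A website can create that space directly without any interaction. A major drawback of the attack is that it needs a fairly large OPFS file, possibly around 1 GB, which makes it easy to detect.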
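- Last.fm becomes independent
Music platform Last.fm announced that it is independent once again, saying that its ownership has changed but the product people use every day has not. User accounts and music taste data are also unchanged. Founded in 2002, Last.fm uses the Audioscrobbler music recommendation system to build a taste profile for each user from listening data. CBS Interactive acquired it in 2007 for $280 million; CBS Interactive is now part of Paramount Skydance.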
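- Jensen Huang to become the latest US executive to join the Tsinghua SEM advisory board
The FT reports that Nvidia CEO Jensen Huang has agreed to join the advisory board of Tsinghua University's School of Economics and Management, which is currently chaired by Apple CEO Tim Cook, as Huang works to maintain ties with Beijing. Tsinghua University, located in Beijing, is China's top institution focused on science and engineering. The advisory board's stated goals include helping the business school strengthen its international ties and shape its long-term strategy. Other US executives on the board include Elon Musk, Mark Zuckerberg, and Microsoft CEO Satya Nadella.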
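- Valve sharply raises Steam Deck prices
Valve has sharply raised the price of the Steam Deck handheld because of soaring memory and SSD prices. In the US, for example, the 512GB OLED model went from $549 to $789, an increase of $240, and the 1TB OLED model went from $649 to $949, an increase of $300. The Steam Deck launched in February 2022; early versions used an LCD screen, and in November 2023 Valve upgraded the screen from LCD to OLED and phased out the LCD version. The Steam Deck ships with 16 GB of LPDDR5. Memory prices have risen several-fold since late last year; SSD prices have climbed less dramatically but are also higher.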
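- Google employee accused of using inside information to make $1.2 million betting on Polymarket
Google security engineer Michele Spagnuolo allegedly used inside information to bet on the prediction market Polymarket that singer d4vd would be the most-searched person on Google in 2025, making $1.2 million. He has been charged with fraud, was arrested Wednesday morning, and was later released on $2.25 million bail. Spagnuolo had access to internal data systems, including a tool that could access unpublished annual search data. Polymarket observers noticed last December that the account AlphaRaccoon was making suspicious trades on the most-searched person of the year; Spagnuolo was the owner of that account. Google said it is cooperating with the investigation and that Spagnuolo's conduct violated company policy.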
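- Attack on oil facilities released pollution comparable to a volcanic eruption
A research team from Wuhan University and the China Meteorological Administration used Fengyun satellites and the European Sentinel satellite to quantify the sulfur dioxide released after Iranian oil facilities were attacked this March. The March 7 airstrikes severely damaged Iran's Fardis, Shahran, and Aghdasieh oil depots and the Tehran refinery. The Shahran depot was hit hardest: burning oil flowed into the city's sewer system and ignited urban green spaces, producing large amounts of toxic smoke. Local residents reported immediate health problems, including difficulty breathing, skin irritation, and a bitter taste in the mouth. The scientists focused on the sulfur dioxide released by the burning depots, a strongly irritating and corrosive pollutant. Using Fengyun-3 (FY-3F and FY-3E) and Sentinel-5P, they found that local sulfur dioxide concentrations rose from 0.8 DU to 2.0 DU (DU stands for Dobson unit), with total emissions estimated at 2.98×10⁴ tons. The event affected an area of 3.0×10⁵ square kilometers.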
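- Birds used showy feathers to attract mates 100 million years ago
According to a study published in the journal PLOS One, Plumadraco bankoorum, a bird that lived more than 100 million years ago, used showy feathers to attract mates. The fossil was unearthed in Liaoning, and the bird lived 121 million years ago. It measured only 15 cm from its beak to the base of its tail feathers, but its paired tail feathers were nearly 30 cm long. These feathers had no aerodynamic function and were more likely used for display. In modern birds such as peacocks and birds of paradise, long tail feathers usually appear on males and are used in elaborate courtship displays, while females have subdued plumage so that predators are less likely to spot them while nesting and raising chicks. The researchers therefore infer that this Plumadraco fossil most likely represents a male whose unusually long tail feathers may have served a similar function. The study also notes that confirming this inference will require more evidence about the tail musculature and nesting strategies of such ancient birds.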
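- YouTube will automatically label AI-generated videos
With AI videos becoming nearly lifelike and increasingly hard to tell apart by eye, YouTube announced that it will automatically label AI-generated videos and show the label to users in the most prominent way, a move aimed at improving content transparency. For long-form videos, the AI label will appear below the video player and above the description. For Shorts, the label will appear as an overlay on the video.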
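- Women also find women's faces more attractive
According to a study published in the journal Proceedings of the Royal Society B, even women find women's faces more attractive than men's. The researchers say this perception gap narrows with age and disappears after the 80s. The finding supports the “gender attractiveness gap”: in the languages of different regions of the world, women are regarded as the more beautiful sex. Darwin observed that in animals, males usually have the showier appearance in order to attract females, but in humans the situation is reversed, because human sexual selection is driven by men rather than women: men fight over the most attractive women or pursue wealth and power to the same end. For this study, the researchers compiled a facial attractiveness database from 52 studies in 76 countries, containing more than 1.5 million ratings of 17,000 faces by nearly 30,000 raters. The average attractiveness rating of female faces was higher than that of 60% of male faces. The result is partly due to sex differences in facial structure: men's faces tend to be squarer, while women's faces tend to be rounder, and both men and women tend to find round faces more attractive.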
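- Scientists reverse brain aging with a nasal spray
Scientists at Texas A&M used a nasal spray to reverse brain aging; just two doses of the therapy restored memory, reduced chronic inflammation, and improved brain cell function. Brain aging is usually accompanied by low-level inflammation. Chronic inflammation interferes with memory, thinking, and the brain's ability to adapt to new environments, and it is also considered an important contributor to neurodegenerative diseases. The researchers say this kind of brain aging is reversible. The new therapy relies on extracellular vesicles (EVs) loaded with microRNA to help regulate important biological processes in the brain. The scientists deliver the extracellular vesicles by nasal spray, which lets the drug bypass the brain's protective barrier and enter brain tissue directly.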
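- The Witcher 3 will get a new expansion, Songs of the Past, next year
CD PROJEKT RED announced that Songs of the Past, the third expansion for The Witcher 3, will launch next year. The Witcher 3: Wild Hunt was released in May 2015, followed by two expansions, Hearts of Stone in October 2015 and Blood and Wine in June 2016. The Witcher 3 was widely acclaimed and has sold more than 60 million copies to date, making it one of the best-selling games of all time. Songs of the Past is being co-developed by CD PROJEKT RED and Fool’s Theory, a studio formed by former CD PROJEKT RED developers who worked on the Witcher series; one of its current projects is a remake of the first The Witcher. In Songs of the Past, players will once again play the witcher Geralt of Rivia on a brand-new adventure. More information will be announced in late summer. The expansion is widely seen as a warm-up for the upcoming The Witcher 4.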
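- Chinese rocket debris in orbit is increasing sharply
China launched 64 rockets in 2022 and set a record of 93 launches in 2025, second only to the United States. The number of launches will keep growing as Chinese companies accelerate deployment of the Guowang and Qianfan broadband satellite constellations. But Chinese companies have not improved how they dispose of rocket upper stages after launch. According to the latest analysis by Jim Shell, the mass of Chinese rocket debris in long-lived orbits has grown from less than 100 tons to 252 tons over the past five years. Long-lived orbits, as the name suggests, are those where rocket debris remains for a long time. To launch its broadband megaconstellations, China is expected to carry out a thousand or more rocket launches over the next decade.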
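- DuckDuckGo installs rise by up to 30% after Google's shift to AI search
Google announced last week that it would substantially change its search product, turning the search box into an AI chatbot dialog, a move that immediately drew strong opposition from users. Some critics argue it will kill the open web; others worry that AI overviews will show wrong answers and take control away from users who do not want AI. Some users have switched to the alternative search engine DuckDuckGo as a result. DuckDuckGo said installs of its app in the US rose by an average of 18.1% week over week from May 20 to May 25, with the increase lasting six days and peaking at 30.5% on May 25. On iOS, installs rose by an average of 33% week over week, peaking at 69.9%. Visits to noai.duckduckgo.com, which shows no AI results, rose by an average of 22.7% week over week, peaking at 27.7% on May 24. DuckDuckGo executive Kamyl Bazbaz said users want choice.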
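- Dropbox founder steps down as CEO
Dropbox founder Drew Houston told employees on Tuesday that he will step down as CEO and become executive chairman, and that co-CEO Ashraf Alkarmi will become sole CEO. Houston founded Dropbox at age 24 and served as CEO for 19 years, helping to create the cloud storage market and competing directly with giants Google and Apple. But Dropbox never reached the top under his leadership, and its market value has fallen by half from its peak at the time of its IPO. In its latest quarterly report, Dropbox said it has more than 18 million paying users, and its cloud storage service remains popular with media professionals, graphic designers, architects, and others who need to share files and photos in their daily work. Dropbox's annual revenue passed $1 billion in 2017 and $2 billion four years later, but revenue has been roughly flat over the past two years and declined slightly in 2025.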
OrangeBot Weekly
5 Claude Code skills worth using each week — with my verdict on what’s actually good. No hype.