OrangeBot.AI Digest — 2026-05-27
90 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- YouTube to automatically label AI-generated videos (blog.youtube)
- Valve raises Steam Deck prices (www.theverge.com)
- Canada to order military plane fleet from Sweden in shift from US suppliers (www.theguardian.com)
- SimCity 3k in 4k (2025) (www.thran.uk)
- I think Anthropic and OpenAI have found product-market fit (simonwillison.net)
- DuckDuckGo search saw 28% more visits after Google said people love AI mode (www.pcgamer.com)
- Tech CEOs are apparently suffering from AI psychosis (techcrunch.com)
- Training our own AI models (posthog.com)
- Last.fm is now independent (support.last.fm)
- Private equity bought America's essential services (rubbishtalk.com)
- Incident with Pull Requests, Issues, Git Operations and API Requests (www.githubstatus.com)
- All of human cooking compressed into 2 megabytes (arxiv.org)
- I'm Tired of Talking to AI (orchidfiles.com)
- Mini Micro Fantasy Computer (miniscript.org)
- The Melancholy of Slaying Monsters (thereader.mitpress.mit.edu)
GitHub Trending(15)
- harry0703 / MoneyPrinterTurbo
- Lum1104 / Understand-Anything
- hardikpandya / stop-slop
- affaan-m / ECC
- anthropics / knowledge-work-plugins
- Leonxlnx / taste-skill
- p-e-w / heretic
- shiyu-coder / Kronos
- mukul975 / Anthropic-Cybersecurity-Skills
- twentyhq / twenty
- Chachamaru127 / claude-code-harness
- DigitalPlatDev / FreeDomain
- obra / superpowers
- byoungd / English-level-up-tips
- iii-hq / iii
Product Hunt(15)
- Pawse.ai
An acoustic regulation system for dogs
- Archi-Flow
Visualize cloud architecture with live traffic simulations
- baz.studio
Skills library & video editor for AI Agents
- Layers
Create beautiful animated code snippet videos for free
- Krater
All the AI tools you use, one subscription
- Octolane
Self-driving AI CRM that you can talk to
- Bluedot 2.1
Record on Apple Watch. Sync with Claude
- Powabase
Build AI apps with Postgres, RAG, and agents
- Local Panel
Local SSH server manager with no subscriptions or installs
- Chunk sidecars
Validate agent-generated code before it ever reaches CI
- QuickSheet v1.2
Instantly create and edit spreadsheets from your menu bar
- Netfox
A native local macOS network monitor
- CircadiaOS
Sleep optimization, minus the $3,000 mattress pod
- Oasis Browser for Mac
A privacy-first AI browser you can train anonymously
- Aviquill
A calm canvas for visual thinkers with messy minds
Hugging Face(15)
- LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding
Vision-language models (VLMs) commonly formulate visual grounding and detection as a coordinate-token generation problem, serializing each 2D box into multiple 1D tokens that are learned and decoded largely independently. This token-by-token decoding mismatches the coupled structure of box geometry and creates a practical inference bottleneck due to strictly sequential generation. We introduce LocateAnything, a unified generative grounding and detection framework based on Parallel Box Decoding (PBD). By decoding geometric elements such as bounding boxes and points as atomic units in a single step, LocateAnything preserves intra-box geometric coherence and unlocks substantial parallelism. We show that PBD improves both decoding throughput and localization accuracy. We further develop a scalable data engine and curate LocateAnything-Data, a large-scale dataset with more than 138 million training samples, substantially increasing data diversity for high-precision localization. Extensive evaluations show that LocateAnything advances the speed-accuracy frontier, achieving significantly higher decoding throughput while improving high-IoU localization quality across diverse benchmarks. The results highlight the complementary benefits of Parallel Box Decoding and large-scale training data in enabling efficient and precise unified visual grounding and detection.
- EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation
The rapid evolution of generative video foundation models has propelled the field toward professional-grade cinematic synthesis. To achieve such demanding quality, the community is transitioning toward Reinforcement Learning (RL) and agentic workflows. However, reliable evaluation has emerged as a critical bottleneck. Existing benchmarks predominantly evaluate "whether it is right" (basic prompt-following) while fundamentally neglecting "whether it is good" (cinematic quality, acting, and aesthetics). Furthermore, current automated metrics lack the domain-specific rigor required to provide trustworthy signals, creating a severe credibility gap between human aesthetic perception and machine scoring. To bridge this gap, we introduce EvalVerse, a comprehensive, pipeline-aware, and expert-calibrated evaluation framework. We treat video generation assessment not merely as an engineering task, but as a core scientific problem: the systematic digitization of subjective cinematic expertise. First, we organize domain knowledge into an evaluation taxonomy aligned with the professional filmmaking workflow (pre-production, production, and post-production). Second, we distill human expert judgments into a curated dataset with large-scale human annotations. Third, we inject this knowledge into Vision-Language Models (VLMs) through an expert-calibrated fine-tuning strategy, enabling the VLM to perform explicit Chain-of-Thought reasoning. Compared to previous works, EvalVerse not only retains compatibility with foundational "rightness" metrics, but also significantly expands the criteria to "goodness" and broadens the task coverage to complex multi-shot sequencing and audio-visual integration. Consequently, by providing granular diagnostic signals, EvalVerse transcends a static leaderboard and establishes a fundamental infrastructure for future work, such as reward models and evaluator agents.
- SpatialBench: Is Your Spatial Foundation Model an All-Round Player?
While spatial foundation models have demonstrated impressive performance on standard datasets, a critical question remains: are they truly all-round players capable of generalizing robustly across diverse downstream tasks, arbitrary viewpoints, shifting scene domains, varying input densities, and specific hardware constraints? Answering this overarching question requires a holistic assessment, yet current models are mainly evaluated on specific domains for which they were specifically designed or trained. Such evaluations are intrinsically limited by narrow paradigm coverage, limited scene domains, and arbitrary frame sampling, making it fundamentally difficult to assess their true generalization capabilities. To address this gap, we present SpatialBench, a cross-paradigm, domain-diverse benchmark for spatial foundation models with deterministic sampling. SpatialBench features unprecedented scale and rigorous deterministic design, comprising 19 datasets and 546 scenes across 5 diverse spatial domains. It comprehensively evaluates 41 models across 6 paradigms on 5 task suites under 4 different input density settings. Our extensive evaluation reveals that current models are not yet all-round players, and uncovers crucial insights for future advancement. Specifically, we demonstrate that full-context attention maximizes accuracy while bounded-memory strategies unlock long-sequence scalability. Moreover, our empirical evaluations in challenging embodied and egocentric tasks demonstrate that strict domain alignment and high data quality are far more critical to performance than simple dataset scaling. Furthermore, to address the largest data gap identified in our analysis, we go beyond evaluation by introducing a large-scale dataset, DA-Next-5M, and a strong baseline model, DA-Next, pushing the boundaries of spatial representation learning.
- MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research
We present MobileGym, a browser-hosted, lightweight, fully controllable environment for everyday mobile use, targeting interaction fidelity without replicating proprietary backends. It enables two capabilities previously out of reach for everyday apps: verifiable outcome signals through deterministic state-based judging over structured JSON state, and scalable online RL through low-cost parallel rollouts. The full environment state is captured, configured, forked, and compared as structured JSON, and a single server can host hundreds of parallel instances, with about 400 MB memory per instance and about 3 s cold start. A layered state model and a declarative task-definition framework keep state programmability and task creation practical at scale, and a single programmatic judging mechanism delivers both deterministic evaluation verdicts and dense RL rewards. The accompanying MobileGym-Bench provides 416 parameterized task templates, including 256 test and 160 train templates, over 28 apps, with deterministic judges and a structured AnswerSheet protocol that avoids free-text matching failures. In a Sim-to-Real case study, GRPO on Qwen3-VL-4B-Instruct gains +12.8 percentage points on the 256-task test set, and on a 59-task real-device signal subset, real-device execution retains 95.1% of the simulation-side training gain. Project page: https://mobilegym.github.io.
- Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction
Multi-view 3D reconstruction has achieved remarkable progress with the advent of feed-forward 3D reconstruction models. However, these models are typically trained and evaluated under ideal, degradation-free imaging conditions, whereas real-world observations often contain degradations that differ significantly from such settings. Improving robustness for multi-view 3D reconstruction under degraded conditions therefore remains an important challenge. We present Geometry-Aware Representation Denoising (GARD), a novel framework that performs diffusion-based multi-view restoration directly in the feature space of a feed-forward 3D reconstruction model. This design exploits the geometry-aware feature representations of the 3D reconstructor to effectively recover accurate scene geometry. Furthermore, by employing an additional RGB image decoder, the refined representations can also be used to restore high-quality RGB images, thereby enabling the simultaneous recovery of 3D scene geometry and high-quality imagery. Comprehensive experiments on the Depth Anything 3 (DA3) benchmark demonstrate the effectiveness of the proposed GARD framework.
- LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV
Audio-visual generation is rapidly advancing from short clips to minute-long content, while existing evaluation protocols remain largely confined to short-form settings. Existing benchmarks primarily focus on 5-10 second text-conditioned generation and rarely support unified evaluation across text, image, and video conditioning modalities. Moreover, they provide limited insight into how identity consistency, narrative coherence, and audio-visual alignment degrade over extended temporal horizons. To bridge this gap, we introduce LongAV-Compass, a systematic benchmark for minute-long audio-visual generation. LongAV-Compass contains 284 curated test cases spanning text-to-audio-video (T2AV), image-to-audio-video (I2AV), and video-to-audio-video (V2AV), organized by application scenario and generation complexity. The benchmark combines taxonomy-guided benchmark construction with a unified evaluation framework that integrates MLLM-assisted assessment with complementary perceptual and multimodal metrics, including DINO-v2, ArcFace, CLIP, and ImageBind. The framework evaluates more than 20 fine-grained dimensions covering within-segment quality, cross-segment consistency, global narrative coherence, semantic alignment, and audio-visual synchronization. Through experiments on 11 representative models together with human-alignment validation, LongAV-Compass provides a diagnostic testbed for analyzing the limitations of current systems in sustaining coherent, semantically aligned, and temporally consistent minute-scale audio-visual generation across diverse input modalities.
- D^2-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing
Despite the emergence of diffusion large language models (D-LLMs) as an alternative to autoregressive large language models (AR-LLMs), safety monitoring for D-LLMs remains largely unexplored. Unlike AR-LLMs, D-LLMs generate text through a multi-step denoising process, exposing intermediate hidden representations that may contain safety-relevant information unavailable in standard single-step monitoring setups. Motivated by the suitability of lightweight probes for always-on monitoring, we analyze which trajectory-level signals best indicate when such probes are likely to struggle. We find that the most informative signal is safety hesitation: intermediate hidden states repeatedly falling within a small margin of the probe's decision boundary. The number of such hesitation steps in a D-LLM's trajectory effectively predicts probe failure, providing a proxy for sample difficulty. Building on this analysis, we propose D^2-Monitor, a bi-level safety monitor for D-LLMs. D^2-Monitor adopts a lightweight probe as an always-on monitor to jointly estimate hesitation and perform base classification. When the hesitation level exceeds a threshold, a more expressive but computationally heavier probe is activated. This dynamic routing mechanism allocates monitoring resources efficiently at test time. Evaluated on 3 datasets (WildguardMix, ToxicChat, OpenAI-Moderation) across 4 D-LLMs, D^2-Monitor achieves state-of-the-art performance with a compact parameter footprint (≤ 0.85M parameters), and exhibits the best trade-off between effectiveness and efficiency relative to 8 baselines.
- The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence
We introduce the MiniMax-M2 series, a family of Mixture-of-Experts language models built around the principle that mini activations can unleash maximum real-world intelligence. The flagship M2 contains 229.9B total parameters with only 9.8B activated per token. Designed end-to-end for agentic deployment, the M2 series rests on three components: (i) agent-driven data pipelines producing large-scale, verifiable trajectories across agentic coding and agentic cowork, each grounded in an executable workspace and an artifact-aligned reward; (ii) Forge, a scalable agent-native RL system that adapts to long-horizon agent trajectories, paired with windowed-FIFO scheduling, prefix-tree merging, inference optimization, and a clean training-inference-agent decoupling that supports both white-box and black-box agents; and (iii) the latest M2.7 checkpoint, which takes an early step toward self-evolution by autonomously debugging training runs and modifying its own scaffold. Across M2 through M2.7, this combination translates a mini-activation footprint into frontier-tier performance on agentic coding, deep search, office-task, and reasoning benchmarks.
- Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling
Test-Time Scaling (TTS) enhances the reasoning capabilities of large language models by allocating additional inference compute to explore the solution space. However, existing parallel TTS methods typically keep branches isolated during search: intermediate discoveries remain branch-private and cannot guide other branches in time. This information isolation causes substantial redundant exploration, as branches repeatedly rediscover information already found elsewhere and require more search steps to collect complete decision information needed to reach correct answers. To bridge this gap, we propose Collaborative Parallel Thinking (CPT), a training-free inference framework that enables search-time information sharing across parallel branches. CPT extracts compact intermediate information from ongoing branches, maintains a deduplicated query-level information pool, and broadcasts pool entries through the input context, allowing each branch in subsequent search steps to reuse discoveries made by other branches rather than rediscover the same information. Empirically, experiments on HMMT and AIME benchmarks show that CPT establishes a stronger accuracy-latency Pareto frontier than strong baselines across rollout budgets and model scales, highlighting search-time collaboration as an effective direction for efficient parallel TTS.
- Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration
We study series-level cinematic remaking, a long-horizon video-to-video generation problem that localizes full episodes or films via stylization or actor replacement while strictly preserving narrative structure, motion choreography, and character identity across hundreds of shots. Existing video generation and editing pipelines often break down in this regime due to compounding identity drift, background mutation, and semantic erosion under large camera motions and viewpoint changes. We propose Soap2Soap, a multi-agent framework that enforces long-term language-visual consistency through a Dual-Bridge Consistency mechanism: a scene-aware JSON screenplay serving as a persistent semantic backbone, and dynamically allocated visual reference anchors at both scene and shot levels. To suppress drift before video synthesis, we introduce batch keyframe consistency, jointly generating multiple keyframes in a shared latent context via a grid-based formulation. A closed-loop verification agent further audits identity, stability, and alignment to trigger selective regeneration. Experiments on SoapBench demonstrate strong improvements over commercial video generation APIs in long-term consistency and narrative fidelity.
- LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
We introduce LLaVA-OneVision-2 (LLaVA-OV-2), the most capable vision-language model in the LLaVA-OneVision series to date, achieving superior performance across a broad range of multimodal benchmarks. The model builds on a native OneVision-Encoder and incorporates Windowed Attention for efficient local computation while maintaining native resolution. Its key advance is codec-stream tokenization: it treats compressed video as a continuous bit-cost stream, where bit-cost dynamics determine adaptive temporal groups, and motion-residual cues select salient spatial evidence into compact visual canvases. This allocation concentrates a limited token budget on event-bearing content, enabling more stable long-video token compression than fixed groups of pictures. A shared 3D RoPE further places codec canvases, sampled frames, and images in a unified spatiotemporal coordinate system. Furthermore, we build the LLaVA-OV-2 data and training stack around large-scale open supervision: approximately 8M re-captioned video samples for pretraining and a 4M-sample spatial corpus for fine-tuning. We also introduce JumpScore, a temporal-localization benchmark targeting fine-grained grounding in high-frequency, densely repeated motion, a regime underrepresented by existing video evaluations. A standout capability of LLaVA-OV-2 is its unified perception across video understanding, temporal grounding, spatial grounding, and manipulation-trace reasoning. On JumpScore, LLaVA-OneVision-2-8B reaches 74.9 JumpScore mAP, surpassing Qwen3-VL-8B (30.1) by +44.8 points; under matched visual-token budgets on the same benchmark, codec-stream inputs improve temporal grounding over frame sampling by +9.7 points. Across standard benchmarks, LLaVA-OneVision-2-8B further outperforms Qwen3-VL-8B by +4.3 average points on video tasks, +5.3 on spatial tasks, and +15.6 average J&F on tracking tasks.
- Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models
Normalization layers in modern large language models (LLMs) consist of a deterministic normalization operation and a learnable scale vector. While the normalization operation has been extensively studied, the scale vector remains poorly understood despite its ubiquitous use. In this work, we present a systematic study of scale vectors in LLMs from the perspectives of expressivity, optimization, and architectural structure. First, we show empirically that although scale vectors constitute only a negligible fraction of model parameters, removing them substantially degrades LLM pre-training. Our theory further shows that, in Pre-Norm architectures, scale vectors do not increase expressivity; instead, they improve optimization through a self-amplifying preconditioning effect on subsequent linear mappings. Second, we investigate the role of weight decay for scale vectors. By distinguishing Input-Norm and Output-Norm layers, we theoretically show that weight decay is beneficial for the former but harmful for the latter, due to their distinct roles in optimization and expressivity. Third, motivated by this understanding, we propose three lightweight and complementary improvements to scale vectors: branch-specific heterogeneity, improved placement around linear mappings, and magnitude-direction reparameterization. Both theory and experiments show that each improvement yields consistent gains. Finally, we combine these improvements into a unified scale-vector strategy and evaluate it through extensive LLM pre-training experiments on dense and mixture-of-experts models ranging from 0.12B to 2B parameters, across multiple optimizers and learning rate schedules, under industrial-scale token budgets. The unified strategy consistently achieves lower terminal loss than well-tuned baselines and exhibits more favorable scaling behavior, while adding negligible parameter and computational overhead.
- JLT: Clean-Latent Prediction in Latent Diffusion Transformers
Flow matching with clean-data prediction has shown that regressing the clean point can exploit low-dimensional structure more effectively than predicting an ambient noised quantity. We ask whether this principle remains useful after images are mapped into a learned latent space, where compression has already removed much of the raw pixel variability. We introduce JLT, a 130M latent diffusion Transformer over frozen FLUX.2 VAE codes, and compare clean-latent prediction with a matched velocity-prediction DiT under the same representation, backbone, and training settings. Although the three variables x, epsilon, and v are linearly convertible for a fixed corruption time, a local Gaussian analysis shows that velocity regression inherits an isotropic target-covariance floor and amplifies low-variance latent directions, while clean prediction damps them. On ImageNet 256 x 256, JLT-B/1 obtains FID-50K 2.50 with classifier-free guidance, with a large matched-target gap over velocity prediction. These results suggest that prediction targets in latent diffusion are representation-dependent geometric choices, rather than interchangeable algebraic parameterizations.
- Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement
Agentic reinforcement learning (RL) has proven effective for training LLM-based agents with external tool-use capabilities. However, we identify that agentic RL training induces increasing redundant tool calls and blurs the model's intrinsic knowledge boundary, where the model fails to distinguish when tools are needed versus when parametric knowledge suffices. Existing solutions based on reward shaping create coarse-grained optimization targets that tend to incentivize indiscriminate tool-call suppression, leading to reward hacking. In this paper, we propose AKBE (Agentic Knowledge Boundary Enhancement), an on-policy method that dynamically probes the model's intrinsic knowledge boundary through dual-path (with-tool and no-tool) rollouts during training. We define the knowledge boundary as the per-instance determination of whether tools are required and the minimum tool calls necessary. By comparing correctness across paths, AKBE categorizes trajectories and constructs targeted supervisory signals that guide efficient tool-use patterns for each question. These signals are integrated seamlessly into the agentic RL training loop. Experiments on seven QA benchmarks demonstrate that AKBE improves task accuracy by +1.85 on average and reduces tool calls by 18% over standard agentic RL, yielding 25% higher tool productivity without any accuracy-efficiency trade-off. Further analysis suggests plug-and-play compatibility across different RL algorithms and sheds light on the mechanism of each signal category. Our code is available at https://github.com/CuSO4-Chen/AKBE.
- QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents
Social deduction games have become a popular testbed for probing reasoning, deception, coordination, and belief modeling in Large Language Model (LLM) agents. However, most environments are scored only by game outcomes such as win rates and remain largely confined to text-only interaction, making it difficult to tell whether an agent's language is actually grounded in what it perceived and did, or to identify the failure modes underlying its behavior. To address this gap, we introduce QUACK, an open-source environment and evaluation framework for auditing the grounding of agent language in multimodal social reasoning. QUACK evaluates agents at three levels: game outcomes, behavioral trajectories, and utterance-level consistency. Its core Statement Verification Pipeline reconstructs each agent's ground-truth trajectory from engine logs and checks every discussion claim against it, automatically flagging spatial hallucination, unsupported accusation, deception collapse, and language-action inconsistency. Evaluating three frontier VLMs in both homogeneous and cross-model adversarial settings, we find that even the strongest agent hallucinates 15.1% of its verifiable spatial claims and makes over half of its accusations without grounded evidence. We release the full engine, evaluation framework, toolkit, and logs at https://github.com/AAAAA-Academia-Attractions/QUACK.
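The D^2-Monitor abstract describes a two-level routing rule: count how often intermediate hidden states land within a small margin of the light probe's decision boundary, and escalate to a heavier probe when that count exceeds a threshold. The Python sketch below illustrates only that routing rule. The probes are hypothetical callables, and the boundary (0.5), margin (0.1), threshold (2), and the choice to classify from the final denoising step's hidden state are illustrative assumptions, not values or design details taken from the paper.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical probe: maps one hidden-state vector to a probability that the output is unsafe.
Probe = Callable[[Sequence[float]], float]


def count_hesitation_steps(trajectory: Sequence[Sequence[float]],
                           light_probe: Probe,
                           boundary: float = 0.5,
                           margin: float = 0.1) -> int:
    """Count denoising steps whose light-probe score falls within `margin` of `boundary`."""
    return sum(1 for hidden in trajectory if abs(light_probe(hidden) - boundary) < margin)


def route_and_classify(trajectory: Sequence[Sequence[float]],
                       light_probe: Probe,
                       heavy_probe: Probe,
                       hesitation_threshold: int = 2,
                       boundary: float = 0.5,
                       margin: float = 0.1) -> Tuple[bool, str]:
    """Return (is_unsafe, name_of_probe_that_decided)."""
    hesitation = count_hesitation_steps(trajectory, light_probe, boundary, margin)
    final_hidden = trajectory[-1]  # assumption: classify from the last denoising step
    if hesitation > hesitation_threshold:
        # Hard sample: escalate to the more expressive but more expensive probe.
        return heavy_probe(final_hidden) >= boundary, "heavy"
    # Easy sample: keep the always-on light probe's verdict.
    return light_probe(final_hidden) >= boundary, "light"


light = lambda h: h[0]  # toy probes that simply read fixed coordinates
heavy = lambda h: h[1]
print(route_and_classify([[0.9, 0.0], [0.95, 0.0], [0.97, 0.0]], light, heavy))  # (True, 'light')
print(route_and_classify([[0.55, 0.1], [0.45, 0.1], [0.52, 0.1], [0.48, 0.2]], light, heavy))  # (False, 'heavy')
```

The first toy trajectory never comes near the boundary, so the cheap probe decides; the second hovers around 0.5 at every step, so the heavier probe is consulted.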
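The JLT abstract notes that x, epsilon, and v are linearly convertible for a fixed corruption time. A minimal sketch, assuming the common linear interpolation path z_t = (1 - t) * x + t * epsilon with velocity v = epsilon - x (one convention among several; the abstract does not state the paper's exact parameterization), shows how a clean-latent prediction maps onto the other two targets. All function names are illustrative.

```python
from typing import List

Vec = List[float]


def noisy_latent(x: Vec, eps: Vec, t: float) -> Vec:
    """Assumed corruption path: z_t = (1 - t) * x + t * eps."""
    return [(1 - t) * xi + t * ei for xi, ei in zip(x, eps)]


def velocity_target(x: Vec, eps: Vec) -> Vec:
    """Velocity under the assumed path: v = d z_t / d t = eps - x."""
    return [ei - xi for xi, ei in zip(x, eps)]


def clean_to_eps(x_hat: Vec, z_t: Vec, t: float) -> Vec:
    """Solve z_t = (1 - t) x + t eps for eps given a clean prediction (requires t > 0)."""
    return [(zi - (1 - t) * xi) / t for xi, zi in zip(x_hat, z_t)]


def clean_to_velocity(x_hat: Vec, z_t: Vec, t: float) -> Vec:
    """v = eps - x = (z_t - x) / t under the same path (requires t > 0)."""
    return [(zi - xi) / t for xi, zi in zip(x_hat, z_t)]


x, eps, t = [1.0, -2.0], [0.5, 0.25], 0.5
z = noisy_latent(x, eps, t)
print(clean_to_velocity(x, z, t), velocity_target(x, eps))  # both are [-0.5, 2.25]
```

Both conversions divide by t, so an error in the clean prediction is scaled up when it is converted at small t. That is plain algebra about the conversion, not a restatement of the paper's covariance analysis.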
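For readers unfamiliar with the object studied in the scale-vector paper, the sketch below shows RMSNorm, one widely used normalization layer in LLMs: a deterministic step that divides the input by its root-mean-square, followed by elementwise multiplication with a learnable scale vector holding one parameter per hidden unit. Choosing RMSNorm over LayerNorm, the eps value, and the helper names are assumptions for illustration; none of the paper's proposed improvements are implemented here.

```python
import math
from typing import List


def rms_norm(x: List[float], scale: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm: deterministic normalization by the root-mean-square of x,
    then elementwise multiplication by the learnable scale vector."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(scale, x)]


def scale_share(d: int) -> float:
    """Parameters in one d-entry scale vector relative to one d x d linear weight."""
    return d / (d * d)


hidden = [3.0, 4.0]
print(rms_norm(hidden, [1.0, 1.0], eps=0.0))  # unit scale: output has mean square 1
print(rms_norm(hidden, [2.0, 0.5], eps=0.0))  # learned scale re-weights each channel
print(f"{scale_share(4096):.4%}")  # 0.0244%
```

At hidden size 4096, a scale vector holds 4,096 parameters against 16,777,216 in a single square projection of the same width, which is the "negligible in size" side of the paper's title.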
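The CPT abstract describes a query-level pool that deduplicates intermediate findings and broadcasts them to every branch through the input context. The sketch below is a hypothetical illustration of that data flow only: `InformationPool` and `build_prompts` are invented names, deduplication uses simple case- and whitespace-normalized string matching (the abstract does not specify how CPT extracts or deduplicates information), and the prompt wording is an assumption.

```python
from typing import List, Set


class InformationPool:
    """Deduplicated, query-level pool of findings shared across parallel branches."""

    def __init__(self) -> None:
        self._entries: List[str] = []  # insertion order kept for a stable context
        self._seen: Set[str] = set()   # normalized keys used for deduplication

    @staticmethod
    def _key(text: str) -> str:
        # Assumption: two findings are duplicates if they match after lowercasing
        # and collapsing whitespace.
        return " ".join(text.lower().split())

    def add(self, text: str) -> bool:
        """Add a finding; return False if it is empty or already in the pool."""
        key = self._key(text)
        if not key or key in self._seen:
            return False
        self._seen.add(key)
        self._entries.append(text.strip())
        return True

    def as_context(self) -> str:
        if not self._entries:
            return ""
        return "Shared findings from all branches:\n" + "\n".join(f"- {e}" for e in self._entries)


def build_prompts(question: str, pool: InformationPool, num_branches: int) -> List[str]:
    """Broadcast the same pool contents to every branch for the next search step."""
    context = pool.as_context()
    prompt = f"{question}\n\n{context}" if context else question
    return [prompt] * num_branches


pool = InformationPool()
pool.add("x = 3")       # found by branch 0
pool.add("  X  =  3 ")  # rediscovered by branch 1, dropped as a duplicate
pool.add("y is even")   # found by branch 2
print(build_prompts("Solve for z.", pool, num_branches=3)[0])
```

Keeping insertion order means the broadcast context only grows at the end between steps, so each branch sees findings in the order they were first discovered.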
Techmeme(15)
- Sources: Uber has increased its stake in Delivery Hero to ~37% by acquiring Aspex Management's 14.6% share at a €12B valuation, after a takeover offer last week (Financial Times)
Financial Times : Sources: Uber has increased its stake in Delivery Hero to ~37% by acquiring Aspex Management's 14.6% share at a €12B valuation, after a takeover offer last week — Share purchase from Aspex Management steps up US ride-hailing group's pursuit of German food delivery company
- HP reports Q2 revenue up 9% YoY to $14.4B, vs. $14B est., Personal Systems revenue up 13% to $10.2B, and forecasts Q3 adjusted EPS above estimates (Dina Bass/Bloomberg)
Dina Bass / Bloomberg : HP reports Q2 revenue up 9% YoY to $14.4B, vs. $14B est., Personal Systems revenue up 13% to $10.2B, and forecasts Q3 adjusted EPS above estimates — HP Inc. gave a profit forecast for the current quarter that topped analysts' estimates, signaling that the company is weathering a dramatic increase in memory chip prices.
- Salesforce reports Q1 revenue up 13% YoY to $11.13B, vs. $11.05B est., Agentforce annual recurring revenue up 205% to $1.2B, and forecasts Q2 revenue below est. (Jordan Novet/CNBC)
Jordan Novet / CNBC : Salesforce reports Q1 revenue up 13% YoY to $11.13B, vs. $11.05B est., Agentforce annual recurring revenue up 205% to $1.2B, and forecasts Q2 revenue below est. — Salesforce reported stronger-than-expected quarterly results on Wednesday, but the cloud software vendor issued full-year guidance …
- Snowflake reports Q1 revenue up 33% YoY to $1.39B, vs. $1.32B est., and commits to spending $6B on AWS over five years; SNOW jumps 29%+ after hours (CNBC)
CNBC : Snowflake reports Q1 revenue up 33% YoY to $1.39B, vs. $1.32B est., and commits to spending $6B on AWS over five years; SNOW jumps 29%+ after hours — Snowflake is going deeper with Amazon's Web Services, and plans to use its Arm-based Graviton chips.
- The UK's GCHQ head says the UK and allies have a "narrowing window" to counter cyber threats from China and Russia, as Russia intensifies "daily" hybrid warfare (Chloe Taylor/CNBC)
Chloe Taylor / CNBC : The UK's GCHQ head says the UK and allies have a “narrowing window” to counter cyber threats from China and Russia, as Russia intensifies “daily” hybrid warfare
- Amazon MGM Studios announces the GenAI Creators' Fund, greenlights three AI animated series for Prime Video, and launches an AI production platform with AWS (Todd Spangler/Variety)
Todd Spangler / Variety : Amazon MGM Studios announces the GenAI Creators' Fund, greenlights three AI animated series for Prime Video, and launches an AI production platform with AWS — Amazon MGM Studios is touting the power of AI to help create “cinematic” entertainment in a big new initiative …
- Valve raises Steam Deck OLED prices due to "rising memory and storage costs"; the 512GB model is now $789, up from $549, and the 1TB model is $949, up from $649 (Jay Peters/The Verge)
Jay Peters / The Verge : Valve raises Steam Deck OLED prices due to “rising memory and storage costs”; the 512GB model is now $789, up from $549, and the 1TB model is $949, up from $649 — The 1TB OLED model got a $300 price increase, and now costs $949. … Valve has significantly increased the price …
- Meta rolls out Plus plans for Instagram, Facebook, and WhatsApp globally, will test $7.99/mo. and $19.99/mo. Meta AI plans, a $49.99/mo. creator plan, and more (Sarah Perez/TechCrunch)
Sarah Perez / TechCrunch : Meta rolls out Plus plans for Instagram, Facebook, and WhatsApp globally, will test $7.99/mo. and $19.99/mo. Meta AI plans, a $49.99/mo. creator plan, and more — Meta is doubling down on its subscription offerings. On Wednesday, the social networking giant announced it's now rolling …
- OpenAI announces partnerships to combat election misinformation, offering cybersecurity products to state officials and backing legislation to curb deepfakes (Maria Curi/Axios)
Maria Curi / Axios : OpenAI announces partnerships to combat election misinformation, offering cybersecurity products to state officials and backing legislation to curb deepfakes — OpenAI is announcing new partnerships to combat misinformation, offering its cybersecurity products to state officials …
- Starlette, an open-source Python framework underpinning FastAPI, has a vulnerability called BadHost that can allow hackers to bypass authorization (Dan Goodin/Ars Technica)
Dan Goodin / Ars Technica : Starlette, an open-source Python framework underpinning FastAPI, has a vulnerability called BadHost that can allow hackers to bypass authorization — Millions of AI agents and tools around the world have been imperiled by a critical vulnerability that can allow hackers to breach the servers running …
- OpenAI Foundation says it is committing an initial $250M for grants, partnerships, and direct work aimed at helping workers and economies navigate AI disruption (Reuters)
Reuters : OpenAI Foundation says it is committing an initial $250M for grants, partnerships, and direct work aimed at helping workers and economies navigate AI disruption — The non-profit that controls OpenAI will commit an initial $250 million for grants, partnerships and direct work aimed …
- Tensormesh, whose inference platform uses KV caching to reduce costs, raised a $20M seed extension, bringing its total funding to $24.5M (Chris Metinko/Axios)
Chris Metinko / Axios : Tensormesh, whose inference platform uses KV caching to reduce costs, raised a $20M seed extension, bringing its total funding to $24.5M — Inference optimization startup Tensormesh raised a $20 million seed extension, co-founder and CEO Junchen Jiang tells Axios Pro.
- NYC-based Pace, whose AI agents automate back-office operations for insurance companies, raised a $46M series B led by Thrive and Sequoia at a $375M valuation (Anna Tong/Forbes)
Anna Tong / Forbes : NYC-based Pace, whose AI agents automate back-office operations for insurance companies, raised a $46M series B led by Thrive and Sequoia at a $375M valuation — The startup says its AI agents can handle the dull work insurers have long shipped to offshore operators.
- AI coding startup Cognition AI raised more than $1B at a $26B valuation, and says its revenue run rate has increased to $492M from $37M in May 2025 (Rebecca Torrence/Bloomberg)
Rebecca Torrence / Bloomberg : AI coding startup Cognition AI raised more than $1B at a $26B valuation, and says its revenue run rate has increased to $492M from $37M in May 2025 — Cognition AI Inc. has raised more than $1 billion in a new funding round at a $26 billion valuation, the latest sign of strong demand …
- Opendoor co-Founder Eric Wu's NavigateAI, which is building an expert AI coach for construction workers, raised a $25M seed led by Elad Gil at a $225M valuation (Anna Tong/Forbes)
Anna Tong / Forbes : Opendoor co-Founder Eric Wu's NavigateAI, which is building an expert AI coach for construction workers, raised a $25M seed led by Elad Gil at a $225M valuation — After pioneering automated home buying, the serial real estate tech entrepreneur is shifting focus from buying physical assets to empowering the workers who build them.
Solidot(15)
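- Scientists reverse brain aging with a nasal spray
Scientists at Texas A&M used a nasal spray to reverse brain aging: just two doses of the therapy restored memory, reduced chronic inflammation, and improved brain-cell function. Brain aging is usually accompanied by low-grade inflammation. Chronic inflammation interferes with memory, thinking, and the brain's ability to adapt to new environments, and it is also considered a major contributor to neurodegenerative diseases. The researchers say this kind of brain aging can be reversed. The new therapy relies on extracellular vesicles (EVs) loaded with microRNA to help regulate key biological processes in the brain. Delivering the EVs by nasal spray lets the treatment bypass the brain's protective barrier and reach brain tissue directly.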
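- The Witcher 3 to get a new expansion, Songs of the Past, next year
CD PROJEKT RED announced that Songs of the Past, the third expansion for The Witcher 3, will launch next year. The Witcher 3: Wild Hunt was released in May 2015, followed by two expansions, Hearts of Stone and Blood and Wine, in October 2015 and June 2016 respectively. Widely acclaimed, The Witcher 3 has sold more than 60 million copies to date, making it one of the best-selling games of all time. Songs of the Past is being co-developed by CD PROJEKT RED and Fool's Theory, a studio founded by former CD PROJEKT RED developers who worked on the Witcher series; one of Fool's Theory's current projects is a remake of the first Witcher. In Songs of the Past, players will once again play Geralt of Rivia on an entirely new adventure. More details will be announced in late summer. The expansion is widely seen as a warm-up for the upcoming The Witcher 4.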
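- Chinese rocket debris in orbit is rising sharply
China launched 64 rockets in 2022 and set a record of 93 launches in 2025, second only to the United States. The number of launches will keep climbing as Chinese companies accelerate deployment of the Guowang and Qianfan broadband satellite constellations. However, Chinese launch providers are not doing a better job of disposing of rocket upper stages. According to a new analysis by Jim Shell, the mass of Chinese rocket debris in long-lived orbits has grown from under 100 tonnes to 252 tonnes over the past five years. As the name suggests, long-lived orbits are those where rocket debris stays in orbit for a long time. To deploy its giant broadband constellations, China is expected to carry out 1,000 or more rocket launches over the next decade.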
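- DuckDuckGo installs rise by up to 30% after Google's shift to AI search
Last week Google announced a sweeping change to Search, turning the search box into an AI chatbot prompt, a move that immediately drew strong backlash from users. Some critics argue it will kill the open web; others worry that AI Overviews will show wrong answers and take control away from users who don't want AI. Some users have switched to the alternative search engine DuckDuckGo. DuckDuckGo says installs of its US app rose an average of 18.1% week over week between May 20 and 25, with the growth continuing for six days and peaking at 30.5% on May 25. On iOS, installs rose an average of 33% week over week, peaking at 69.9%. Visits to noai.duckduckgo.com, which shows no AI results, rose an average of 22.7% week over week, peaking at 27.7% on May 24. DuckDuckGo executive Kamyl Bazbaz said users want choice.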
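- Dropbox founder steps down as CEO
Dropbox founder Drew Houston told employees on Tuesday that he will step down as CEO and become executive chairman, with co-CEO Ashraf Alkarmi becoming sole CEO. Houston founded Dropbox at 24 and served as CEO for 19 years, helping pioneer the cloud storage market and competing head-on with giants Google and Apple. Under his leadership, however, Dropbox never reached the top: its market value is half of its post-IPO peak. In its latest quarterly report, Dropbox said it has more than 18 million paying users, and its cloud storage service remains popular with media professionals, graphic designers, architects, and others who need to share files and photos in their daily work. Dropbox's annual revenue passed $1 billion in 2017 and $2 billion four years later, but it has been essentially flat for the past two years and declined slightly in 2025.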
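- Odd language errors may help identify paper-mill papers
James Heathers of the Medical Evidence Project reported at the World Conference on Research Integrity that a simple method of looking for language errors can help identify fake research papers churned out by "paper mills". Heathers came up with the idea last year, when someone sent him a dozen or so medical papers that looked strikingly similar and asked him to find out what was wrong with them. Heathers spent two days reading the papers and noticed some strange but recurring spelling errors, grammatical errors, and word choices. For example, "Kolmogorovor information complexity" misspells the surname of the mathematician Andrey Kolmogorov, and several papers contained nonstandard expressions such as "5 ml gel-containing biochemistry tube", which Heathers described as reading "like it was written by an alien". Errors like these may simply be slips by non-native English speakers and are not enough on their own to show that a paper is fraudulent. But when Heathers searched Google Scholar for these unusual expressions, he found about 200 more papers sharing the same traits as the original dozen: not only the same topics, but also closely overlapping details such as study design and figure style. He argues that, statistically, this is almost impossible unless they all come from the same source. Heathers suspects the papers are all versions of a single paper, mass-forged and repackaged by paper mills and then sold to scientists eager to increase their publication counts.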
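The paper-mill method above boils down to searching a corpus for distinctive "fingerprint" phrases and grouping the documents that share them. The Python sketch below illustrates that idea on a local collection of texts. It is not Heathers's actual tooling (he searched Google Scholar); the phrase list, the functions `find_fingerprints` and `group_by_signature`, and any corpus you feed it are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical fingerprint phrases, modeled on the examples in the story.
FINGERPRINTS = [
    "kolmogorovor information complexity",
    "5 ml gel-containing biochemistry tube",
]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks don't hide a match."""
    return re.sub(r"\s+", " ", text.lower())


def find_fingerprints(corpus: dict[str, str], phrases: list[str]) -> dict[str, set[str]]:
    """Map each document id to the set of fingerprint phrases it contains."""
    needles = [normalize(p) for p in phrases]
    hits = {}
    for doc_id, text in corpus.items():
        body = normalize(text)
        found = {p for p in needles if p in body}
        if found:  # only keep documents with at least one odd phrase
            hits[doc_id] = found
    return hits


def group_by_signature(hits: dict[str, set[str]]) -> dict[frozenset[str], list[str]]:
    """Group documents that share exactly the same set of odd phrases."""
    groups = defaultdict(list)
    for doc_id, found in hits.items():
        groups[frozenset(found)].append(doc_id)
    return dict(groups)
```

Exact substring matching is deliberately crude: it catches verbatim reuse of the same strange wording, which is the signal the story describes, but it would miss copies that had been reworded again.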
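- Netherlands blocks US company's acquisition of a key digital supplier
The Dutch government has decided to block US IT giant Kyndryl's proposed acquisition of Dutch cloud provider Solvinity. Solvinity hosts DigiD, the Netherlands' online identity platform, so the deal raised concerns that DigiD data could come under US control and be subject to US demands for access. In a letter to the Dutch parliament on Tuesday, Willemijn Aerdts, State Secretary for the Digital Economy, said the body responsible for screening investments had concluded that the acquisition "may pose a risk to the public interest" and recommended blocking it. The government followed that recommendation. In a statement, Kyndryl said it was extremely disappointed by the Dutch government's decision.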
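- Pope's first encyclical suspected of being partly written with AI help
Pope Leo XIV has issued his first encyclical, Magnifica Humanitas, on safeguarding humanity in the age of AI. But the encyclical has been suspected of being partly written with AI assistance. An analysis by the AI detection tool Pangram found that some passages had a 40% to 100% probability of being AI-written, while most passages showed no AI use. No signs of AI use were found in previously published encyclicals. Judging from the text and circumstantial evidence, the AI involved was most likely Anthropic's Claude. One of the advisers on the encyclical is Anthropic co-founder Christopher Olah.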
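- Wikimedia Foundation's firing of union organizers sparks community protest
In mid-May the Wikimedia Foundation fired Brooke Vibber, MediaWiki's veteran lead developer, and on May 21 it disbanded the Community Tech team, with all five engineers and the manager leaving. Most of them were union organizers. Brooke Vibber became lead developer of the MediaWiki project, the software Wikipedia runs on, in early 2003; she was the first full-time employee hired by the Wikimedia Foundation and its first CTO, and is regarded as one of the few senior developers with a deep understanding of the system's technical foundations. The Community Tech team's job was to build the features volunteers asked for through the Community Wishlist. The Foundation's move immediately triggered protests from volunteers, who are preparing collective action such as a strike. It is the first time volunteers and Foundation staff have jointly launched a solidarity action. An administrator named Femke said that an organization dedicated to the public good should not operate without a union. The Wikimedia Foundation holds $296.6 million in reserves, enough to cover 17.1 months of operating expenses. The union, Wiki Workers United, is asking only for fairly modest things: that leadership be transparent and accountable to staff and the community; that staff input on annual planning be heard before decisions are made; and an end to erratic hiring, firing, and promotion practices, among others.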
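- Iran gradually restores global internet connectivity
After nearly three months offline, Iran is gradually restoring its connection to the global internet. Iran's First Vice President Mohammad Reza Aref announced the news on his X account on Tuesday. Internet monitoring groups NetBlocks and Kentik both reported that connectivity began returning gradually from 13:00 GMT, though most networks have not yet come back. The shutdown began on February 28 and is one of the longest in history. Isik Mater, NetBlocks' director of research, said there are signs that Iran's internet filtering is stricter than before, with messaging apps such as WhatsApp subject to additional filtering.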
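- Pregnancy-related deaths rose 9.2% in 14 US states after abortion bans
In 2021 Texas passed a law banning abortion after about six weeks of pregnancy. In 2022 the US Supreme Court ruled in Dobbs v. Jackson Women's Health Organization that the Constitution does not confer a right to abortion, overturning the 1973 decision in Roe v. Wade. As of early 2026, 13 US states had total abortion bans and 7 states banned abortion after 22 weeks of pregnancy. Strict abortion bans are thought to increase pregnancy-related mortality. A study published in the American Journal of Public Health examined the effect of strict bans on maternal health. It found that in the 14 states with total bans or bans after six weeks, pregnancy-related deaths were 9.2% higher than expected.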
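- Meta updates its CacheLib project amid sky-high memory prices
Meta open-sourced its caching engine CacheLib in 2021. The project is designed to scale services by using non-volatile memory as cache, offsetting rising DRAM costs. It stopped receiving updates after June 2024, but on May 25, 2026, Meta released a new update, at a time when the AI boom has pushed DRAM prices to astronomical levels compared with 2021.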
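The CacheLib item describes a tiered idea: keep the hottest objects in scarce DRAM and spill the rest to cheaper non-volatile storage instead of discarding them. The Python sketch below shows that idea as a two-level LRU cache. It is an illustration under simplifying assumptions, not CacheLib's API or eviction policy (CacheLib itself is a C++ library); the class `HybridCache` is hypothetical, and both tiers are plain in-memory dicts, with the "flash" tier merely standing in for an SSD.

```python
from collections import OrderedDict


class HybridCache:
    """Hypothetical two-tier cache: a small 'DRAM' LRU in front of a larger 'flash' LRU.

    Both tiers live in memory here; the flash tier only simulates an SSD-backed store.
    """

    def __init__(self, dram_capacity: int, flash_capacity: int):
        self.dram = OrderedDict()   # fast, small tier; oldest entry first
        self.flash = OrderedDict()  # slower, larger tier; oldest entry first
        self.dram_capacity = dram_capacity
        self.flash_capacity = flash_capacity

    def get(self, key):
        if key in self.dram:
            self.dram.move_to_end(key)  # refresh recency in the fast tier
            return self.dram[key]
        if key in self.flash:
            value = self.flash.pop(key)
            self._put_dram(key, value)  # promote a lower-tier hit back to DRAM
            return value
        return None  # miss in both tiers

    def put(self, key, value):
        self.flash.pop(key, None)  # drop any stale copy in the lower tier
        self._put_dram(key, value)

    def _put_dram(self, key, value):
        self.dram[key] = value
        self.dram.move_to_end(key)
        if len(self.dram) > self.dram_capacity:
            old_key, old_value = self.dram.popitem(last=False)  # least recently used
            self._put_flash(old_key, old_value)  # demote instead of discarding

    def _put_flash(self, key, value):
        self.flash[key] = value
        self.flash.move_to_end(key)
        if len(self.flash) > self.flash_capacity:
            self.flash.popitem(last=False)  # evicted from the last tier for good
```

Demoting on eviction and promoting on a lower-tier hit is what lets a small DRAM tier plus a large flash tier behave roughly like one bigger cache, at the price of slower hits whenever an item has to come back from the lower tier.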
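- Humpback whale migration distance exceeds 15,000 km
Scientists have documented an extraordinary whale migration for the first time, confirming that two humpback whales crossed more than 14,000 km of ocean between breeding grounds in eastern Australia and Brazil. The researchers identified the whales by comparing tens of thousands of images of humpback tail flukes. Each whale's fluke has unique markings, which lets researchers identify and track individuals over long periods. One humpback was first photographed in 2007 in Hervey Bay, Queensland, Australia. It reappeared in the same waters in 2013 and then turned up near São Paulo, Brazil, in 2019. The shortest straight-line distance between these breeding grounds is about 14,200 km. The second whale is even more remarkable. Researchers first photographed it in 2003 at Brazil's Abrolhos Bank, the country's main humpback breeding ground, swimming with an active group of nine adult whales. Twenty-two years later, in September 2025, the same whale was spotted swimming alone in Hervey Bay, Australia. The two sighting locations are 15,100 km apart, a new record for the longest known migration by an individual humpback whale. The study is based on 19,283 high-quality whale photographs taken between 1984 and 2025 in eastern Australia and Latin America. The images came both from professional researchers and from citizen scientists contributing through the global whale-tracking platform Happywhale.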
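The distances in the humpback story are shortest-path figures between breeding areas. A haversine calculation on a spherical Earth is one standard way to estimate such great-circle distances; the sketch below applies it to rough coordinates for Hervey Bay, the São Paulo area, and the Abrolhos Bank. Those coordinates are assumptions chosen for illustration, not values from the study, and the researchers' exact method is not stated.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the sphere is a simplification


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate (lat, lon) locations: assumptions, not data from the study.
HERVEY_BAY = (-25.0, 152.9)
SAO_PAULO_AREA = (-23.6, -46.6)
ABROLHOS_BANK = (-18.0, -38.7)

if __name__ == "__main__":
    print(f"Hervey Bay -> Sao Paulo area: {haversine_km(*HERVEY_BAY, *SAO_PAULO_AREA):,.0f} km")
    print(f"Hervey Bay -> Abrolhos Bank:  {haversine_km(*HERVEY_BAY, *ABROLHOS_BANK):,.0f} km")
```

With these assumed points, both results land close to the roughly 14,200 km and 15,100 km reported in the study; the spherical model and coarse coordinates mean they are estimates, not exact matches.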
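- UK Academy of Medical Royal Colleges says social media is as bad for young people's health as cigarettes
In a consultation response submitted to the government, the Academy of Medical Royal Colleges said social media use threatens young people's health as much as smoking does. Doctors seeing young patients should routinely ask about their screen time and social media use. One measure the UK government is considering is banning social media for children under 16, as Australia has done. Other possible restrictions include curfews or disabling features such as autoplay and infinite scrolling. Child psychiatrist Emily Sehmer argues that the harms of excessive social media use far exceed those of smoking, because children can be exposed to harmful content within seconds.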
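- Uber COO says it is getting harder to justify spending money to maximize token usage
Uber executives say AI spending is not delivering a matching return. In an interview last Saturday, Uber COO Andrew Macdonald said it is becoming harder to justify the money spent on maximizing AI token usage. In an interview last month, Uber CTO Praveen Neppalli Naga told The Information that the company had already used up its 2026 Claude Code budget. Macdonald said conversations with engineering leads showed him that higher AI token usage had not translated into a corresponding increase in consumer-facing features. He said the trade-offs that come with AI make the spending increasingly hard to justify.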
OrangeBot Weekly
5 Claude Code skills worth using each week — with my verdict on what’s actually good. No hype.