TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0872
THU, MAY 21, 2026
Discover the best information organized by OrangeBot.AI

The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK

AI News Summary

May 21, 2026

Here is a summary of today's main news events.

Markets Fluctuate on Iran Deal Hopes and Fed Rate Hike Fears

U.S. markets, led by the Nasdaq, declined as investors weighed mixed signals. Oil prices climbed on reports of stalled U.S.-Iran peace negotiations, which could keep supplies tight. Meanwhile, minutes from the Federal Reserve revealed that officials are open to further interest rate hikes to combat inflation, putting pressure on stocks and causing a significant two-day drop for semiconductor shares.

AI Dominates Tech News with Nvidia Earnings and SpaceX IPO Filing

The artificial intelligence boom continues to drive major headlines. Chipmaker Nvidia reported strong first-quarter revenue of $82 billion, though its stock performance has not kept pace with its market dominance. In a major move, Elon Musk's SpaceX filed for a massive initial public offering (IPO) expected to raise over $80 billion, with documents revealing Musk's controlling stake and the company's current unprofitability.

Son of Mango Founder Becomes Prime Suspect in Tycoon's Death

Jonathan Andic has been identified as the prime suspect in the mysterious death of his father, the billionaire founder of the global fashion retailer Mango. This development places the son at the center of the investigation into the retail tycoon's death.

Europe Faces Economic Headwinds and UK Policy Shifts

European economies are showing signs of strain, with Brussels lowering its Eurozone growth forecast for the year to just 0.9%, citing political uncertainty. In the UK, the government is tightening work visa restrictions, and the Chancellor has announced a new package of economic measures. Meanwhile, the euro has weakened against the dollar ahead of a key interest rate decision from the European Central Bank.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS

Hacker News - May 21, 2026

Hacker News Feed: Highlighting key posts and discussions.

No Slop Grenade

(noslopgrenade.com)

16085
Vivaldi 8.0

(vivaldi.com)

189120
Haskell Foundation 2026 Update

(discourse.haskell.org)

14851
Your Most Improbable Life

(kevinkelly.substack.com)

13394
SpaceX S-1

(www.sec.gov)

394302
Flipper One Tech Specs

(docs.flipper.net)

451150
Saying goodbye to asm.js

(spidermonkey.dev)

399150
Map of Metal

(mapofmetal.com)

430169
03

HUGGINGFACE

HuggingFace - May 21, 2026

Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation

Despite rapid advances in automatic speech recognition (ASR) and large audio-language models, robust recognition in real-world environments remains limited by an "acoustic robustness bottleneck": models often lose acoustic grounding and produce omissions or hallucinations under severe, compositional distortions. We propose Mega-ASR, a unified ASR-in-the-wild framework that combines scalable compound-data construction with progressive acoustic-to-semantic optimization. We introduce Voices-in-the-Wild-2M, covering 7 classic acoustic phenomena and 54 physically plausible compound scenarios, and train Mega-ASR with Acoustic-to-Semantic Progressive Supervised Fine-Tuning and Dual-Granularity WER-Gated Policy Optimization. Extensive experiments demonstrate that Mega-ASR achieves significant advantages over prior state-of-the-art systems on adverse-condition ASR benchmarks (45.69% vs. 54.01% on VOiCES R4-B-F, and 21.49% vs. 29.34% on NOIZEUS Sta-0). On complex compositional acoustic scenarios, Mega-ASR further delivers over 30% relative WER reduction against strong open- and closed-source baselines, establishing a scalable paradigm for robust ASR in-the-wild.
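
As background for the Mega-ASR figures above, here is a minimal sketch of how word error rate (WER) and a relative reduction are conventionally computed. It assumes the reported percentage pairs are WER values (the abstract names WER explicitly only for the compositional result). The function names are illustrative and not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words consumed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline
```

Under that assumption, `relative_reduction(54.01, 45.69)` comes to roughly 0.154 and `relative_reduction(29.34, 21.49)` to roughly 0.268, that is, about 15% and 27% relative improvement on the two quoted benchmarks.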

101
Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos

Without incurring significant computational overhead, train-free long video generation aims to enable foundation video generation models to produce longer videos. Frame-level autoregressive frameworks, e.g., FIFO-diffusion, offer the advantage of generating infinitely long videos with constant memory consumption. However, the mismatch between training and inference, coupled with the challenge of maintaining long-term consistency, limits the effective utilization of foundation models. To mitigate these concerns, we propose MIGA, a novel infinite-frame long video generation method. Firstly, we propose an effective two-stage alignment mechanism that mitigates the training-inference gap by reducing the excessive noise span fed to the model. We then introduce an innovative dual consistency enhancement mechanism, where the self-reflection approach corrects early high-noise frames and the long-range frame guidance approach leverages later low-noise frames with broad coverage to steer generation, jointly improving temporal consistency. Extensive experiments on VBench and NarrLV demonstrate the state-of-the-art performance of MIGA. Our project page is available at https://xiaokunfeng.github.io/miga_homepage/.

72
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining

Recent advances in multimodal large language models have driven growing interest in graphical user interface (GUI) agents, yet their generalization remains constrained by the scarcity of large-scale training data spanning diverse real-world applications. Existing datasets rely heavily on costly manual annotations and are typically confined to narrow domains. To address this challenge, we propose Video2GUI, a fully automated framework that extracts grounded GUI interaction trajectories directly from unlabeled Internet videos. Video2GUI employs a coarse-to-fine filtering strategy to identify high-quality GUI tutorial videos and convert them into structured agent trajectories. Applying this pipeline to 500 million video metadata entries, we construct WildGUI, a large-scale dataset containing 12 million interaction trajectories spanning over 1,500 applications and websites. Pre-training Qwen2.5-VL and Mimo-VL on WildGUI yields consistent improvements of 5-20% across multiple GUI grounding and action benchmarks, matching or surpassing state-of-the-art performance. We will release both the WildGUI dataset and the Video2GUI pipeline to support future research of GUI agents.

65
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond

The rapid advancement toward long-context reasoning and multi-modal intelligence has made the memory footprint of the Key-Value (KV) cache a dominant memory bottleneck for efficient deployment. While the established per-channel quantization effectively accommodates intrinsic channel-wise outliers in Key tensors, its efficacy diminishes under extreme compression. In this work, we revisit the inherent limitations of the per-channel quantization paradigm from both empirical and theoretical perspectives. Our analysis identifies Token Norm Imbalance (TNI) as the primary bottleneck to quantization fidelity. We demonstrate that TNI systematically amplifies errors when shared quantization parameters are required to span token groups exhibiting substantial norm disparities. Instead of relying on intricate quantization pipelines (e.g., TurboQuant), we propose OScaR (Omni-Scaled Canalized Rotation), an accurate and lightweight KV cache compression framework for X-LLMs (i.e., text-only, multi-modal, and omni-modal LLMs). Advancing the per-channel paradigm, OScaR employs Canalized Rotation followed by Omni-Token Scaling to mitigate TNI-induced sequence-dimensional variance both effectively and efficiently, further supported by our optimized system design and CUDA kernels. Extensive evaluations across X-LLMs show that OScaR consistently outperforms existing methods and achieves near-lossless performance under INT2 quantization, establishing it as a robust, low-complexity, and universal framework that defines a new Pareto front. Compared with the BF16 FlashDecoding-v2 baseline, our OScaR implementation achieves up to a 3.0x speedup in decoding, reduces memory footprint by 5.3x, and increases throughput by 4.1x. The code for OScaR is publicly available at https://github.com/ZunhaiSu/OScaR-KV-Quant.
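
The Token Norm Imbalance effect described above can be illustrated with a toy experiment. The sketch below is not OScaR: it uses plain 2-bit min-max quantization with one range per channel, and a simple per-token norm rescaling as a stand-in for the paper's Canalized Rotation and Omni-Token Scaling. The synthetic data, the 100x norm gap, and all function names are assumptions made for illustration.

```python
import math
import random


def quantize_per_channel(rows, bits=2):
    """Min-max uniform quantization where each channel (column) shares one
    range across all tokens (rows); returns the dequantized values."""
    levels = (1 << bits) - 1
    n_tokens, n_channels = len(rows), len(rows[0])
    out = [[0.0] * n_channels for _ in range(n_tokens)]
    for c in range(n_channels):
        column = [row[c] for row in rows]
        lo, hi = min(column), max(column)
        step = (hi - lo) / levels or 1.0  # guard against a constant channel
        for t in range(n_tokens):
            code = round((rows[t][c] - lo) / step)  # integer in [0, levels]
            out[t][c] = lo + code * step
    return out


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def relative_errors(rows, approx):
    """Per-token reconstruction error divided by the token's own norm."""
    return [norm([a - b for a, b in zip(r, q)]) / norm(r)
            for r, q in zip(rows, approx)]


def quantize_with_token_scaling(rows, bits=2):
    """Illustrative stand-in: rescale every token to unit norm, quantize per
    channel, then restore each token's norm (kept in full precision)."""
    norms = [norm(r) for r in rows]
    unit = [[x / n for x in r] for r, n in zip(rows, norms)]
    deq = quantize_per_channel(unit, bits)
    return [[x * n for x in r] for r, n in zip(deq, norms)]


random.seed(0)
# 8 synthetic "key" tokens with 16 channels each.
keys = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]
keys[-1] = [100.0 * x for x in keys[-1]]  # one token with a much larger norm

plain = relative_errors(keys, quantize_per_channel(keys))
scaled = relative_errors(keys, quantize_with_token_scaling(keys))
# Average relative error over the seven small-norm tokens only.
small_plain = sum(plain[:-1]) / len(plain[:-1])
small_scaled = sum(scaled[:-1]) / len(scaled[:-1])
print(f"small-token error, shared range: {small_plain:.2f}")
print(f"small-token error, token-scaled: {small_scaled:.2f}")
```

In this toy setting the large-norm token stretches every channel's shared range, so the small-norm tokens collapse onto one or two quantization levels. Equalizing token norms before quantization removes that particular failure, at the cost of storing one extra scale per token.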

35
IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools

Multimodal large language models (MLLMs) have shown remarkable capability in bridging visual perception and textual reasoning, enabling zero-shot understanding across diverse industrial scenarios. However, their performance in open-vocabulary industrial anomaly detection (IAD) is often limited by domain-misaligned reasoning and hallucinated structural inferences. To address these challenges, we propose IndusAgent, a tool-augmented agentic framework for open-vocabulary IAD. Specifically, we first construct Indus-CoT, a structured dataset that integrates global visual observations, high-resolution local patches, and expert normalcy priors, providing supervision for fine-tuning the model on rigorous industrial inspection trajectories. Building on this, IndusAgent dynamically orchestrates a set of external tools, including dynamic region cropping, high-frequency feature enhancement, and prior retrieval, thus enabling the agent to actively resolve visual ambiguities and disentangle subtle anomalies. Furthermore, we introduce a gated reinforcement learning objective that jointly optimizes anomaly classification, localization accuracy, anomaly type reasoning, and efficient tool usage, ensuring that tool invocation occurs only when beneficial. Extensive evaluations on five industrial anomaly benchmarks, including MVTec-AD, VisA, MPDD, DTD, and SDD, demonstrate that IndusAgent achieves state-of-the-art zero-shot performance among all existing methods, validating our robustness and generalization capacity.

33
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying geometry of the resulting parameter trajectories remains underexplored. In this work, we demonstrate that RLVR weight trajectories are extremely low-rank and highly predictable. Specifically, we find that the majority of downstream performance gains are captured by a rank-1 approximation of the parameter deltas, where the magnitude of this projection evolves near-linearly with training steps. Motivated by this, we propose a simple and compute-efficient method RELEX (REinforcement Learning EXtrapolation), which estimates the rank-1 subspace from a short observation window and extrapolates future checkpoints via linear regression, with no learned model required. Across three models (i.e., Qwen2.5-Math-1.5B, Qwen3-4B-Base, and Qwen3-8B-Base), RELEX produces checkpoints that match or exceed RLVR performance on both in-domain and out-of-domain benchmarks, requiring as few as 15% of the steps of full RLVR training. Remarkably, RELEX is able to extrapolate far beyond the observation window at no training cost, predicting checkpoints up to 10-20× beyond the observed prefix with continued improvement (e.g., observe only the first 50 steps and extrapolate to 1000 steps). Our ablation analysis confirms the minimalist sufficiency of RELEX: neither increasing the subspace rank nor employing non-linear modeling yields further gains in extrapolation. Finally, we show that RELEX's success stems from a "denoising" effect: by projecting updates onto the rank-1 subspace, the model discards stochastic optimization noise that would otherwise degrade performance during extrapolation. Our code is available at https://github.com/weizhepei/RELEX.
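
The idea summarized above (a rank-1 fit of the parameter deltas plus a linear fit of its magnitude over training steps) can be sketched in a few lines. This is one reading of the abstract, not the authors' implementation: it treats the whole model as a single flat parameter vector, finds the leading direction by power iteration, and assumes the latest observed delta is nonzero. All function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_direction(deltas, iters=50):
    """Power iteration for the leading right singular vector of the matrix
    whose rows are the parameter deltas."""
    v = list(deltas[-1])  # start from the latest delta (assumed nonzero)
    for _ in range(iters):
        coeffs = [dot(d, v) for d in deltas]                 # D v
        v = [sum(c * d[i] for c, d in zip(coeffs, deltas))   # D^T (D v)
             for i in range(len(v))]
        scale = dot(v, v) ** 0.5
        v = [x / scale for x in v]
    return v


def extrapolate(base, checkpoints, steps, target_step):
    """Predict the parameters at target_step from a short observed prefix."""
    deltas = [[w - w0 for w, w0 in zip(ckpt, base)] for ckpt in checkpoints]
    v = top_direction(deltas)
    mags = [dot(d, v) for d in deltas]  # rank-1 coordinate per observed step
    # Ordinary least squares fit: mags ~ slope * step + intercept
    n = len(steps)
    mean_s, mean_m = sum(steps) / n, sum(mags) / n
    slope = (sum((s - mean_s) * (m - mean_m) for s, m in zip(steps, mags))
             / sum((s - mean_s) ** 2 for s in steps))
    intercept = mean_m - slope * mean_s
    predicted = slope * target_step + intercept
    return [w0 + predicted * x for w0, x in zip(base, v)]


# Synthetic trajectory that is exactly rank-1 and linear in the step count.
base = [1.0, -2.0, 0.5]
direction = [0.6, 0.0, 0.8]
steps = [10, 20, 30, 40, 50]
observed = [[w + 0.01 * s * u for w, u in zip(base, direction)] for s in steps]
future = extrapolate(base, observed, steps, target_step=1000)
print(future)
```

On this synthetic trajectory the extrapolated vector should land close to `[7.0, -2.0, 8.5]`, that is, the base weights plus ten units along `direction`. Real RLVR trajectories are only approximately rank-1, which is where the "denoising" effect mentioned in the abstract would come into play.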

32
A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook

The foundational capabilities established by Large Language Models (LLMs) have paved the way for Multimodal Large Language Models (MLLMs), within which Large Audio Language Models (LALMs) are essential for realizing universal auditory intelligence. Despite their remarkable performance, the escalation of LALMs' capabilities has significantly outpaced the development of systemic frameworks to ensure their trustworthiness. This survey provides a comprehensive investigation into the endogenous mechanisms of LALMs, detailing the architectural innovations and alignment algorithms that facilitate emergent reasoning. Specifically, we analyze how the transition to unified end-to-end frameworks and the integration of continuous acoustic signals inherently expand the attack surface. To rigorously evaluate the risks within these paradigms, we establish a comprehensive taxonomy of trustworthiness, categorizing critical vulnerabilities such as cross-modal jailbreaking, latent acoustic backdoors, and biometric privacy leakage. We review the state-of-the-art through six analytical pillars: hallucination, robustness, safety, privacy, fairness, and authentication. The profound imbalance between a mature offensive landscape and underdeveloped defenses further validates the critical trustworthiness gaps and multidimensional risks facing audio-centric intelligence. Finally, we propose a strategic roadmap advocating for "Defense-in-Depth" architectures, causal auditory world modeling, and intrinsic representation engineering to bridge the gap between empirical performance and intrinsically trustworthy audio intelligence. Our project has been uploaded to GitHub https://github.com/Kwwwww74/Awesome-Trustworthy-AudioLLMs.

25
It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs

Contextual Integrity (CI) defines privacy not merely as keeping information hidden, but as governing information flows according to the norms of a given context. As large language models are increasingly deployed as personal agents handling sensitive workflows, adhering to CI becomes critical. However, even frontier models remain unreliable in making disclosure decisions, and existing mitigation strategies often degrade underlying task performance. To overcome this privacy-utility trade-off, we propose SELFCI, a complementary self-distillation framework that decouples information suppression from task resolution. SELFCI jointly optimizes two independent reverse KL divergences over distinct teacher distributions derived from feedback: one encourages preserving task-relevant information for utility, while the other enforces minimal and appropriate disclosure. This complementary formulation induces a Product-of-Experts (PoE) target, aligning the policy with the intersection of capability and privacy requirements. Empirical evaluations demonstrate that SELFCI, without relying on costly external supervision, consistently outperforms competitive baselines such as online reinforcement learning algorithms (e.g., GRPO). These trends further extend to out-of-domain settings involving agentic workflows and accumulated private context, suggesting that SELFCI provides a practical path toward CI alignment.
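
The link between two reverse KL terms and a Product-of-Experts target can be made explicit with a short derivation. The sketch below assumes equal weights on the two terms and uses the hypothetical symbols $p_u$ (utility teacher) and $p_c$ (disclosure teacher); the paper's exact objective may differ.

```latex
\begin{aligned}
\mathcal{L}(\pi)
  &= \mathrm{KL}(\pi \,\|\, p_u) + \mathrm{KL}(\pi \,\|\, p_c) \\
  &= \sum_y \pi(y)\,\bigl[\,2\log \pi(y) - \log p_u(y) - \log p_c(y)\,\bigr] \\
  &= 2 \sum_y \pi(y) \log \frac{\pi(y)}{\sqrt{p_u(y)\,p_c(y)}} \\
  &= 2\,\mathrm{KL}(\pi \,\|\, q) - 2\log Z,
\qquad
q(y) = \frac{\sqrt{p_u(y)\,p_c(y)}}{Z},
\quad
Z = \sum_y \sqrt{p_u(y)\,p_c(y)} .
\end{aligned}
```

Since $Z$ does not depend on $\pi$, the minimizer is $\pi^\ast = q \propto \sqrt{p_u\,p_c}$: the policy places mass only where both teachers do, which matches the "intersection of capability and privacy requirements" reading in the abstract.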

25
CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing

While GUI agents have made significant progress in web navigation and basic operating system tasks, their capabilities in professional creative workflows remain largely underexplored. To bridge this gap, we introduce CutVerse, a benchmark designed to systematically evaluate autonomous GUI agents in realistic media post-production environments. We curate expert demonstrations across 7 professional applications (e.g., Premiere Pro, Photoshop), covering 186 complex, long-horizon tasks grounded in authentic editing workflows, involving dense multimodal interfaces and tightly coupled interaction sequences. To support scalable evaluation, we develop a lightweight parser that transforms raw screen recordings and low-level interaction logs into structured, compositional GUI action trajectories with precise grounding. Extensive evaluations reveal that existing agents achieve only 36.0% task success on realistic media editing tasks, underscoring the challenges posed by complex, long-horizon media post-production workflows in our benchmark. While current models demonstrate promising spatial grounding, multimodal alignment, and coordinated action execution, they remain limited in long-horizon reliability and domain-specific planning.

18
Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs

LLM agents have recently emerged as a powerful paradigm for solving complex tasks through planning, tool use, memory retrieval, and multi-step interaction. However, these agentic workflows often introduce substantial input-side overhead, making the compute-intensive prefilling stage a key bottleneck in long-context, multi-turn inference. In this work, we propose Mix-Quant, a simple and effective phase-aware quantization framework for fast agentic inference. We first investigate FP4 quantization in agentic LLM workflows and observe that quantizing the entire inference process can incur significant performance degradation. In contrast, the prefilling stage exhibits substantial quantization redundancy and can therefore be quantized with minimal accuracy loss, despite being the dominant source of computation. Based on this insight, we apply high-throughput NVFP4 quantization to the prefilling phase while preserving BF16 precision for decoding. By decoupling prefilling acceleration from decoding quality, Mix-Quant combines phase-aware algorithmic quantization with hardware-efficient NVFP4 execution to alleviate the inference bottleneck in LLM agents. Extensive experiments across long-context and agentic benchmarks demonstrate that Mix-Quant largely preserves task performance while delivering significant efficiency improvements, achieving up to a 3x speedup during prefilling.

17
Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

Currently, enhancing Unified Multimodal Models (UMMs) with image understanding, generation, and editing capabilities mainly relies on mixed multi-task training. Due to inherent task conflicts, such a strategy requires complex multi-stage pipelines, massive data mixing, and balancing tricks, merely resulting in a performance trade-off rather than true mutual reinforcement. To break this paradigm, we propose Uni-Edit, an intelligent image editing task that serves as the first general task for UMM tuning. Unlike complex mixed pipelines, Uni-Edit improves performance across all three abilities at once using only one task, one training stage, and one dataset. Specifically, we first identify image editing as an inherently ideal general task, as it naturally demands both visual understanding and generation. However, existing editing data relies on simplistic instructions that severely underutilize a model's understanding capacity. To address this, we introduce the first automated and scalable data synthesis pipeline for intelligent editing, transforming diverse VQA data into complex and effective editing instructions with embedded questions and nested logic. This yields Uni-Edit-148k, pairing diverse reasoning-intensive instructions with high-quality edited images. Extensive experiments on BAGEL and Janus-Pro demonstrate that tuning solely on Uni-Edit achieves comprehensive enhancements across all three capabilities without any auxiliary operations.

17
Generative Recursive Reasoning

How should future neural reasoning systems implement extended computation? Recursive Reasoning Models (RRMs) offer a promising alternative to autoregressive sequence extension by performing iterative latent-state refinement with shared transition functions. Yet existing RRMs are largely deterministic, following a single latent trajectory and converging to a single prediction. We introduce Generative Recursive reAsoning Models (GRAM), a framework that turns recursive latent reasoning into probabilistic multi-trajectory computation. GRAM models reasoning as a stochastic latent trajectory, enabling multiple hypotheses, alternative solution strategies, and inference-time scaling through both recursive depth and parallel trajectory sampling. This yields a latent-variable generative model supporting conditional reasoning via p_θ(y | x) and, with fixed or absent inputs, unconditional generation via p_θ(x). Trained with amortized variational inference, GRAM improves over deterministic recurrent and recursive baselines on structured reasoning and multi-solution constraint satisfaction tasks, while demonstrating an unconditional generation capability. https://ahn-ml.github.io/gram-website

15
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

Evaluating large language models (LLMs) on natural-language logical reasoning is essential because rule-governed tasks require conclusions to follow strictly from stated premises. Many existing logical-reasoning benchmarks are generated by templating natural-language items from sampled formulas, provide only coarse or unaudited formal annotations, and are now quickly saturated by frontier reasoning models. We present LLMEval-Logic, a Chinese logical reasoning benchmark built from realistic situational scenarios. Its pipeline forward-authors and expert-audits natural-language items together with their reference formalizations, verifies annotated answers with Z3, constructs expert rubrics for natural-to-formal grading, and hardens selected items through a closed-loop adversarial workflow. The benchmark is released in two paired subsets: a 246-item Base subset shipped with 1,400 expert-developed rubric atoms, and a 190-item Hard subset with 938 multi-step sub-questions over closed model spaces. Evaluating 14 frontier LLMs on LLMEval-Logic reveals substantial gaps in current models: the best model reaches only 37.5% Hard Item Accuracy, and even with reference symbols the highest joint Z3+Rubric formalization score among evaluated models reaches only 60.16%. Our benchmark is publicly available at https://github.com/llmeval/LLMEval-Logic.

14
Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

Industrial asset operations workflows are latency-sensitive because a single user query may require coordination over sensor data, work orders, failure modes, forecasting tools, and domain-specific agents. We evaluate this problem on AssetOpsBench (AOB), an industrial agent benchmark whose plan-execute pipeline exposes repeated overhead from tool discovery, LLM planning, MCP tool execution, and final summarization. Existing LLM caching techniques such as KV-cache reuse and embedding-based semantic caching were designed for chatbot serving and break down when output validity depends on time, asset, or sensor parameters. We propose two complementary optimization layers for AOB plan-execute pipelines: a temporal semantic cache and a set of MCP workflow optimizations combining disk-backed tool-discovery caching and dependency-aware parallel step execution. MCP workflow optimizations corresponded to a 1.67x speedup and reduced median end-to-end latency by about 40.0% while the temporal-cache benchmark achieved a median of 30.6x speedup on cache hits. Beyond the speedup, our results expose a concrete failure mode of pure semantic caching for parameter-rich industrial queries, providing a critical analysis of how caching choices interact with evaluation correctness in MCP-backed agent benchmarks.
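
The failure mode of pure semantic caching described above can be sketched with a small cache whose key combines the query intent with the structured parameters the answer depends on, and whose entries expire. This is a minimal illustration, not the AssetOpsBench implementation: the class name, the key layout, the time-to-live policy, and the asset names in the usage note are all hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TemporalCache:
    ttl_seconds: float
    _store: dict = field(default_factory=dict)

    @staticmethod
    def _key(intent: str, asset_id: str, sensor: str, window: tuple) -> tuple:
        # Structured parameters are matched exactly; only the free-text intent
        # is normalized (a real system might compare embeddings here instead).
        return (" ".join(intent.lower().split()), asset_id, sensor, window)

    def get(self, intent, asset_id, sensor, window, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(self._key(intent, asset_id, sensor, window))
        if entry is None:
            return None
        value, stored_at = entry
        if now - stored_at > self.ttl_seconds:
            return None  # stale: the underlying sensor data may have changed
        return value

    def put(self, intent, asset_id, sensor, window, value, now=None):
        now = time.time() if now is None else now
        self._store[self._key(intent, asset_id, sensor, window)] = (value, now)
```

With `cache = TemporalCache(ttl_seconds=60)`, a result stored for asset `"chiller-6"` is returned for a rephrased query about the same asset and window, but a lookup for `"chiller-9"` misses even though the query text is identical. A cache keyed on text similarity alone would serve the wrong asset's answer in that second case.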

9
On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

With the advancement of AI capabilities, AI reviewers are beginning to be deployed in scientific peer review, yet their capability and credibility remain in question: many scientists simply view them as probabilistic systems without the expertise to evaluate research, while other researchers are more optimistic about their readiness without concrete evidence. Understanding what AI reviewers do well, where they fall short, and what challenges remain is essential. However, existing evaluations of AI reviewers have focused on whether their verdicts match human verdicts (e.g., score alignment, acceptance prediction), which is insufficient to characterize their capabilities and limits. In this paper, we close this gap through a large-scale expert annotation study, in which 45 domain scientists in Physical, Biological, and Health Sciences spent 469 hours rating 2,960 individual criticisms (each targeting one specific aspect of a paper) from human-written and AI-generated reviews of 82 Nature-family papers on correctness, significance, and sufficiency of evidence. On a composite of all three dimensions, a reviewing agent powered by GPT-5.2 scores above each paper's top-rated human reviewer (60.0% vs. 48.2%, p = 0.009), while all three AI reviewers (including Gemini 3.0 Pro and Claude Opus 4.5) exceed the lowest-rated human across every dimension. AI reviewers' accurate criticisms are also more often rated significant and well-evidenced, and surface a distinct 26% of issues no human raises. However, AI reviewers overlap far more than humans do (21% vs. 3% for cross-reviewer pairs), and exhibit 16 recurring weaknesses humans do not share, such as limited subfield knowledge, lack of long context management over multiple files, and overly critical stance on minor issues. Overall, our results position current AI reviewers as complements to, not substitutes for, human reviewers.

9
HRM-Text: Efficient Pretraining Beyond Scaling

The current pretraining paradigm for large language models relies on massive compute and internet-scale raw text, creating a significant barrier to foundational research. In contrast, biological systems demonstrate highly sample-efficient learning through multi-timescale processing, such as the functional organization of the frontoparietal loop. Taking this as inspiration, we introduce HRM-Text, which replaces standard Transformers with a Hierarchical Recurrent Model (HRM) that decouples computation into slow-evolving strategic and fast-evolving execution layers. To stabilize this deep recurrence for language modeling, we introduce MagicNorm and warmup deep credit assignment. Furthermore, instead of standard raw-text pretraining, we train exclusively on instruction-response pairs using a task-completion objective and PrefixLM masking. Serving as an empirical existence proof of efficient pretraining, a 1B-parameter HRM-Text model trained from scratch on only 40 billion unique tokens and a $1,500 budget achieves 60.7% on MMLU, 81.9% on ARC-C, 82.2% on DROP, 84.5% on GSM8K, and 56.2% on MATH. Despite utilizing roughly 100-900x fewer training tokens and 96-432x less estimated compute than standard baselines, HRM-Text performs competitively with 2-7B parameter open models. These results demonstrate that co-designing architectures and objectives can radically reduce the compute-to-performance ratio, making pretraining from scratch accessible to the broader research community.

8
Stable Audio 3

Stable Audio 3 is a family of fast latent diffusion models (small, medium, large) for variable-length audio generation and editing. Since our models can generate several minutes of audio, variable-length generations are key to avoid the cost of producing full-length generations for short sounds. We also support inpainting, enabling targeted audio editing and the continuation of short recordings. Our latent diffusion models operate on top of a novel semantic-acoustic autoencoder that projects audio into a compact latent space, enabling efficient diffusion-based generation while preserving audio fidelity and encouraging semantic structure in the latent. Finally, we run adversarial post-training to both accelerate inference and improve generation quality, reducing the number of inference steps while improving fidelity and prompt adherence. Stable Audio 3 models are trained on licensed and Creative Commons data to generate music and sounds in less than 2s on an H200 GPU and less than a few seconds on a MacBook Pro M4. We release the weights of the small and medium models, which can run on consumer-grade hardware, together with their training and inference pipeline.

6
OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation

Recent layout-to-image models have achieved remarkable progress in spatial controllability. However, they still struggle with inter-object occlusion. When bounding boxes overlap, most existing methods lack explicit occlusion information, which makes the generation in intersection regions inherently ambiguous and hinders the determination of complex occlusion relationships. As a result, they often produce entangled textures or physically inconsistent layering in the overlapped areas. To address this issue, we first construct SA-Z, a large-scale dataset enriched with explicit occlusion ordering and pixel-level annotations. Building upon our proposed dataset, we introduce OcclusionFormer, a novel occlusion-aware Diffusion Transformer framework that explicitly models Z-order priority by decoupling instances and compositing them via volume rendering. Furthermore, to ensure fine-grained spatial precision, we introduce a queried alignment loss that explicitly supervises individual instances and enhances semantic consistency. The proposed method effectively reduces ambiguity in overlapping regions, enforces correct occlusion dependencies, and preserves structural integrity, leading to substantial accuracy gains across diverse scenes.

6
Stitched Value Model for Diffusion Alignment

For practical use, diffusion- or flow-based generative models must be aligned with task-specific rewards, such as prompt fidelity or aesthetic preference. That alignment is challenging because the reward is defined for clean output images, but the alignment procedure requires value function estimates at noisy intermediate latents. Existing methods resort to Tweedie-style or Monte Carlo approximations, trading off estimator bias against computational cost: Tweedie estimates are efficient but biased, while Monte Carlo estimates are more accurate but require expensive rollouts. A natural alternative would be a learned value function, but it remains an open question how to effectively train a strong and general value model specifically for noisy latents. Here, we propose StitchVM, a model stitching framework that efficiently transfers reward models pretrained for clean images to the noisy latent regime. StitchVM starts from an existing, truncated pixel-space reward model and attaches a frozen diffusion backbone to it as its head. From the pixel-space model, the resulting hybrid retains a carefully pretrained, robust reward capability; from the diffusion backbone, it inherits its native ability to handle noisy latents. The stitching procedure is exceptionally lightweight, e.g., stitching and finetuning CLIP ViT-L and SD 3.5 Medium takes only 10 GPU-hours. By lifting powerful pixel-space reward models to latent space, StitchVM opens up a new style of diffusion alignment: instead of rough, yet costly per-sample approximation of the value function, the correct function for the actual, noisy latents is constructed once and then amortized over many samples and iterations. We show that this approach yields improvements across a broad range of downstream steering and post-training methods: DPS becomes 3.2× faster while halving peak GPU memory, and DiffusionNFT becomes 2.3× faster.

5
Toto 2.0: Time Series Forecasting Enters the Scaling Era

We show that time series foundation models scale: a single training recipe produces reliable forecast-quality improvements from 4M to 2.5B parameters. We release Toto 2.0, a family of five open-weights forecasting models trained under this recipe. The Toto 2.0 family sets a new state of the art on three forecasting benchmarks: BOOM, our observability benchmark; GIFT-Eval, the standard general-purpose benchmark; and the recent contamination-resistant TIME benchmark. This report describes our experimental results and details the design decisions behind Toto 2.0: its architecture and training recipe, training data, and the u-muP hyperparameter transfer pipeline. All five base checkpoints are released under Apache 2.0.

4
The Unlearnability Phenomenon in RLVR for Language Models

Reinforcement Learning with Verifiable Reward (RLVR) has proven effective in improving the reasoning ability of Large Language Models (LLMs). However, the learning dynamics of RLVR remain underexplored. In this paper, we reveal a counterintuitive phenomenon: among hard examples that the model initially struggles with, a substantial subset remains unlearnable even when correct rollouts are present. To understand the phenomenon, we first demonstrate that existing optimization and sampling techniques fail to resolve unlearnability. With cross-example gradient analysis, we show that unlearnable examples have a fundamental representation issue, characterized by low gradient similarity with the rest of the examples and ungeneralizable reasoning patterns. We further show that representation flaws are difficult to mitigate in RL, as data augmentation does not improve gradient similarity. Our study provides the first systematic characterization of unlearnable data in RLVR training and reveals fundamental limitations in current RL approaches for reasoning tasks. Code and data are available at https://github.com/yulinchen99/unlearnability-rlvr.

4
PanoWorld: A Generative Spatial World Model for Consistent Whole-House Panorama Synthesis

Generating a consistent whole-house VR tour from a floorplan and style reference requires both photorealistic panoramas and cross-view spatial coherence. Pure 2D generators produce appealing single panoramas but re-imagine geometry and materials when the viewpoint changes, whereas monolithic 3D generation becomes expensive and loses fine texture at multi-room scale. We introduce PanoWorld, a generative spatial world model that treats whole-house synthesis as autoregressive generation of node-based 360-degree panoramas, matching the discrete navigation used by real VR tour products. PanoWorld uses a floorplan-derived 3D shell as a global geometric proxy and a dynamic 3D Gaussian Splatting cache as renderable spatial memory. A feed-forward panoramic LRM designed for metric-scale multi-room 360-degree inputs lifts generated panoramas into local 3DGS updates, while Room-aware Group Attention suppresses cross-room feature interference. A topology-aware progressive caching strategy fuses these local updates without repeatedly reconstructing the full history. By decoupling shell-based geometry guidance from cache-rendered visual memory, PanoWorld preserves high-frequency 2D synthesis quality while improving cross-node layout and material consistency. The project link is https://jjrcn.github.io/PanoWorld-project-home/

4
OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization

The key-value (KV) cache dominates memory bandwidth and footprint in long-context autoregressive inference. Recent rotation-preconditioned codecs (TurboQuant, PolarQuant) show that a structured random rotation followed by a per-coordinate scalar quantizer matched to an analytically tractable marginal is a near-optimal recipe for KV compression. OCTOPUS advances this paradigm through joint quantization of rotated coordinate triplets. Each triplet's direction is mapped to a square via an octahedral parameterization, and the two resulting coordinates and the triplet norm are Lloyd-Max quantized against implementation-matched marginals. Optimizing the per-triplet squared error gives a strictly non-uniform bit allocation depending only on the total dimensionality of the keys. We find the finite-dimensional quality optimum with sweeps to be constant on every real decoder we test. The codec is data-oblivious, online, and deterministic given a seed. Across text, video, and audio, OCTOPUS matches or beats every prior rotation codec at every reported bit width and metric, with a lead that grows as bits drop for extreme compression. Furthermore, a fused Triton implementation reconstructs keys on the fly without materializing the uncompressed key, so the codec adds no decode-time bandwidth or latency over the existing dequantization. Project Page: https://octopus-quant.github.io/
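
The octahedral parameterization mentioned above maps a 3D direction onto a square. The sketch below shows the commonly used octahedral encode/decode pair; it is an illustration under that assumption, not the OCTOPUS codec itself (no rotation, quantization, or norm coding is shown), and the names `oct_encode` and `oct_decode` are hypothetical.

```python
import math


def _sign(v):
    # Treat zero as positive so the fold is well defined on the axes.
    return 1.0 if v >= 0.0 else -1.0


def oct_encode(x, y, z):
    """Map the direction of a nonzero 3-vector to (u, v) in [-1, 1]^2."""
    l1 = abs(x) + abs(y) + abs(z)
    u, v = x / l1, y / l1
    if z < 0.0:
        # Fold the lower hemisphere outward over the square's diagonals.
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    return u, v


def oct_decode(u, v):
    """Invert oct_encode, returning a unit 3-vector."""
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        x = (1.0 - abs(v)) * _sign(u)
        y = (1.0 - abs(u)) * _sign(v)
    else:
        x, y = u, v
    norm = math.sqrt(x * x + y * y + z * z)
    return x / norm, y / norm, z / norm


# Round trip for a hypothetical coordinate triplet.
triplet = (1.0, 2.0, -2.0)
recovered = oct_decode(*oct_encode(*triplet))
```

The triplet's norm is discarded by the encoding, which is consistent with the abstract's description of quantizing the two square coordinates and the norm separately.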

3
Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment

Direct Preference Optimization (DPO) has emerged as a popular alternative to Reinforcement Learning from Human Feedback (RLHF), offering theoretical equivalence with simpler implementation. We prove this equivalence is conditional rather than universal, depending on an implicit assumption frequently violated in practice: the RLHF-optimal policy must prefer human-preferred responses. When this assumption fails, DPO optimizes relative advantage over the reference policy rather than absolute alignment with human preferences, leading to pathological convergence where policies decrease DPO loss while preferring dispreferred responses. We characterize when this assumption is violated, show the existence of an undesirable solution space, and prove that DPO and RLHF optimize fundamentally different objectives in such cases. To address this, we introduce Constrained Preference Optimization (CPO), augmenting RLHF with constraints for provable alignment. We further provide a geometric interpretation through soft margin ranking, revealing that DPO implements margin ranking with potentially negative targets. Our theoretical analysis establishes when DPO's guarantees hold and provides solutions preserving simplicity with provable alignment. Comprehensive experiments on standard benchmarks demonstrate that CPO achieves state-of-the-art performance. Code is available at: https://github.com/visitworld123/CPO.
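
The failure mode described above can be reproduced with the standard per-pair DPO loss. This is a small worked sketch, not the paper's code: the log-probabilities are made-up numbers chosen so that the reference policy strongly favors the dispreferred response, and `dpo_loss` is a hypothetical helper name.

```python
import math


def dpo_loss(pi_w, pi_l, ref_w, ref_l, beta=1.0):
    """Standard per-pair DPO loss from log-probabilities.

    w = human-preferred response, l = dispreferred response.
    """
    margin = (pi_w - ref_w) - (pi_l - ref_l)
    # -log(sigmoid(beta * margin)) written in a numerically friendlier form.
    return math.log1p(math.exp(-beta * margin))


# Hypothetical log-probabilities: the reference prefers the dispreferred answer.
ref_w, ref_l = -5.0, -1.0
pi_w, pi_l = -3.0, -2.0

loss_at_init = dpo_loss(ref_w, ref_l, ref_w, ref_l)  # policy equals reference
loss_after = dpo_loss(pi_w, pi_l, ref_w, ref_l)
still_prefers_dispreferred = pi_l > pi_w
```

Here the loss drops below its initial value because the policy improved relative to the reference, yet the policy still assigns higher probability to the dispreferred response.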

3
SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test suite. Reward hacking naturally arises in this setup, as the agent optimizes for passing tests while deviating from the user's true goal. We study this reward hacking phenomenon by decomposing software engineering tasks into three parts: (i) a natural language description of the specification, (ii) visible validation tests that exercise specified features in isolation, and (iii) held-out tests that compose those same features to simulate real-world usage. Based on the specification and the visible validation test suites, a genuine agent would be able to generate a solution that can also pass all of the held-out tests. Therefore, we use the gap in pass rates on these two suites to quantify reward hacking. Based on this methodology, we introduce SpecBench, a benchmark comprising 30 systems-level programming tasks ranging from short-horizon tasks like building a JSON parser to ultra-long-horizon tasks like building an entire OS kernel from scratch. Large-scale experiments reveal a consistent pattern: while every frontier agent saturates the visible suite, reward hacking persists, with smaller models exhibiting larger gaps on held-out suites. The gap also scales sharply with task length: it grows by 28 percentage points for every tenfold increase in code size. Failures range from subtle feature isolation to deliberate exploits, including a 2,900-line hash-table "compiler" that memorizes test inputs. SpecBench offers a principled testbed for measuring whether coding agents build genuine working systems or merely game the test suites developers hand them.
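
The gap metric and the reported scaling trend can be written out as simple arithmetic. The sketch below is an assumption-laden illustration, not SpecBench's harness: the helper names and the test outcomes are hypothetical, and the scaling helper assumes the "28 points per tenfold increase" trend is log-linear in code size.

```python
import math


def pass_rate(results):
    """Fraction of tests that passed, given a list of booleans."""
    return sum(results) / len(results)


def reward_hacking_gap(visible, held_out):
    """Visible minus held-out pass rate, in percentage points."""
    return 100.0 * (pass_rate(visible) - pass_rate(held_out))


def extra_gap_from_scaling(size_ratio, points_per_decade=28.0):
    """Gap increase implied by the reported trend for a code-size ratio."""
    return points_per_decade * math.log10(size_ratio)


# Hypothetical agent: saturates the visible suite, fails 4 of 10 held-out tests.
visible = [True] * 10
held_out = [True] * 6 + [False] * 4
gap = reward_hacking_gap(visible, held_out)

# Under the log-linear assumption, 100x more code implies two "decades".
extra = extra_gap_from_scaling(100)
```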

3
Mem-π: Adaptive Memory through Learning When and What to Generate

We present Mem-π, a framework for adaptive memory in large language model (LLM) agents, where useful guidance is generated on demand rather than retrieved from external memory stores. Existing memory-augmented agents typically rely on similarity-based retrieval from episodic memory banks or skill libraries, returning static entries that often misalign with the current context. In contrast, Mem-π uses a dedicated language or vision-language model with its own parameters, separate from the downstream agent, to generate context-specific guidance for complex tasks. Conditioned on the current agent context, the model jointly decides when to produce guidance and what guidance to produce. We train it with a decision-content decoupled reinforcement learning (RL) objective, enabling it to abstain when generation would not help and otherwise produce concise, useful guidance. Across diverse agentic benchmarks spanning web navigation, terminal-based tool use, and text-based embodied interaction, Mem-π consistently outperforms retrieval-based and prior RL-optimized memory baselines, achieving over 30% relative improvement on web navigation tasks.

3
UniT: Unified Geometry Learning with Group Autoregressive Transformer

Recent feed-forward models have significantly advanced geometry perception for inferring dense 3D structure from sensor observations. However, its essential capabilities remain fragmented across multiple incompatible paradigms, including online perception, offline reconstruction, multi-modal integration, long-horizon scalability, and metric-scale estimation. We present UniT, a unified model built upon a novel Group Autoregressive Transformer, which reformulates these seemingly disparate capabilities within a single framework. The key idea is to treat groups of sensor observations as the basic autoregressive units and predict the corresponding point maps in an anchor-free and scale-adaptive manner. More specifically, diverse view configurations in both online and offline settings are naturally unified within a single group autoregression process. By varying the group size, online mode operates over multiple autoregressive steps with single-frame groups, whereas offline mode aggregates a multi-frame group in a single forward pass. Meanwhile, a queue-style KV caching mechanism ensures bounded autoregressive memory over long horizons. This is enabled by reducing long-range dependencies on early frames through anchor-free relational modeling, thereby allowing outdated memory to be discarded on the fly. To improve metric-scale generalization across scenes, a scale-adaptive geometry loss is further introduced within this framework. It couples relative geometric constraints with a partial absolute scale term, implicitly regularizing global scale and inducing a progressive transition from scale-invariant geometry to metric-scale solutions. Together with a dedicated modal attention module for integrating auxiliary modalities, UniT achieves state-of-the-art performance in unified geometry perception, as validated on ten benchmarks spanning seven representative tasks.
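
The bounded-memory property of a queue-style cache can be sketched with a fixed-length queue. This is a toy illustration only, not UniT's mechanism: the class name `QueueKVCache`, the `max_groups` parameter, and the placeholder key/value lists are all hypothetical.

```python
from collections import deque


class QueueKVCache:
    """Keep key/value entries for only the most recent `max_groups` groups."""

    def __init__(self, max_groups):
        # A deque with maxlen drops the oldest entry automatically on overflow.
        self._groups = deque(maxlen=max_groups)

    def append(self, group_id, keys, values):
        self._groups.append((group_id, keys, values))

    def group_ids(self):
        return [gid for gid, _, _ in self._groups]


# Five autoregressive steps with a cache that holds three groups.
cache = QueueKVCache(max_groups=3)
for step in range(5):
    cache.append(step, keys=[step], values=[step])
```

After the loop only the three newest groups remain, so memory stays constant no matter how long the sequence runs.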

3
MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization

LLM agents organize behavior through skills - structured natural-language specifications governing how an agent reasons, retrieves, and responds. Unlike monolithic prompts, skills are multi-field artifacts subject to hard platform constraints: description fields are truncated for routing, instruction bodies are compacted via progressive disclosure, and co-resident skills compete for limited context windows. These constraints make skill optimization inherently multi-objective: a skill must simultaneously maximize task performance and satisfy platform limits. Yet existing prompt optimizers either ignore these trade-offs or collapse them into a weighted sum, missing Pareto-optimal variants in non-convex objective regions. We introduce MOCHA (Multi-Objective Chebyshev Annealing), which replaces single-objective selection with Chebyshev scalarization - covering the full Pareto front, including non-convex regions - combined with exponential annealing that transitions from exploration to exploitation. In our experiments across six diverse agent skills - where all methods share the same multi-objective mutation operator and baselines receive identical per-objective textual feedback - existing optimizers fail to improve the seed skill on 4 of 6 tasks: 1000 rollouts yield zero progress. MOCHA breaks through on every task, achieving 7.5% relative improvement in mean correctness over the strongest baseline (up to 14.9% on FEVER and 10.4% on TheoremQA) while discovering twice as many Pareto-optimal skill variants.
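
The difference between a weighted sum and Chebyshev scalarization on a non-convex front can be seen in a three-point toy problem. This sketch shows only the scalarization step (no annealing or mutation) and is not MOCHA's implementation; the candidates, the ideal point, and the function names are hypothetical, and both objectives are treated as costs to minimize.

```python
def weighted_sum(f, w):
    return sum(wi * fi for wi, fi in zip(w, f))


def chebyshev(f, w, ideal):
    # Largest weighted distance to the ideal point across objectives.
    return max(wi * abs(fi - zi) for wi, fi, zi in zip(w, f, ideal))


# Three Pareto-optimal candidates; C sits in a non-convex part of the front.
candidates = {"A": (0.0, 1.0), "B": (1.0, 0.0), "C": (0.6, 0.6)}
ideal = (0.0, 0.0)


def best(score):
    return min(candidates, key=lambda name: score(candidates[name]))


# Sweep weight vectors (w1, 1 - w1) and record which candidate wins.
weights = [(k / 10, 1 - k / 10) for k in range(11)]
ws_winners = {best(lambda f: weighted_sum(f, w)) for w in weights}
ch_winners = {best(lambda f: chebyshev(f, w, ideal)) for w in weights}
```

In this toy setup no weight vector makes the weighted sum select C, while Chebyshev scalarization selects it for balanced weights.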

2
Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

Modern language-model training is increasingly exposed to instability, degraded runs, and wasted compute, especially under aggressive learning-rate, scale, and runtime-stress conditions. This paper introduces Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that operates above AdamW. Rather than replacing the optimizer update rule, LBW-Guard observes training telemetry, interprets instability-sensitive regimes, and applies bounded control to optimizer execution while preserving fixed training objectives. We evaluate LBW-Guard in a Qwen2.5-centered stress-and-robustness suite using WikiText-103, with Qwen2.5-7B as the empirical anchor, model-size comparisons against Qwen2.5-3B and Qwen2.5-14B, learning-rate stress tests, gradient-clipping baselines, and a no-LoRA TinyLlama-1B full-parameter sanity check. In the 7B reference setting, LBW-Guard reduces final perplexity from 13.21 to 10.74, an 18.7% improvement, while reducing end-to-end time from 392.54s to 357.02s, a 1.10x speedup. Under stronger learning-rate stress, AdamW degrades to 1885.24 final perplexity at LR=3e-3 and 659.76 at LR=1e-3, whereas LBW-Guard remains trainable at 11.57 and 10.33, respectively. Gradient-clipping baselines do not reproduce this effect. These results support a scoped systems conclusion that stability-sensitive LLM training can benefit from a governance plane above the optimizer. LBW-Guard provides evidence that bounded runtime control can preserve productive compute under stress while remaining distinct from optimizer replacement and local gradient suppression.

2
LongMINT: Evaluating Memory under Multi-Target Interference in Long-Horizon Agent Systems

Real-world agents operate over long and evolving horizons, where information is repeatedly updated and may interfere across memories, requiring accurate recall and aggregated reasoning over multiple pieces of information. However, existing benchmarks focus on static, independent recall and fail to capture these dynamic interactions between evolving memories. In this paper, we study how current memory-augmented agents perform in realistic, interference-heavy, long-horizon settings across diverse domains and question types. We introduce LongMINT (Long-Horizon Memory under INTerference), a benchmark featuring (1) long, highly interconnected contexts with frequently updated information that induces substantial interference, (2) diverse domains (state tracking, multi-turn dialogue, Wikipedia revisions, and GitHub commits), enabling evaluation of domain generalization, and (3) diverse question types that assess robustness to interference, including (i) single-target recall tasks requiring retrieval of a specific target from long contexts, and (ii) multi-target aggregation tasks requiring reasoning over multiple relevant pieces of information. Overall, LongMINT has 15.6k question-answering pairs over long-horizon contexts averaging 138.8k tokens and extending up to 1.8M tokens per instance. We evaluate 7 representative systems, including vanilla long-context LLMs, RAG, and memory-augmented agent frameworks. Across all systems, we observe consistently low performance (avg. 27.9% accuracy), especially on questions requiring aggregated reasoning over multiple pieces of evidence. Our analysis shows that performance is primarily limited by retrieval and memory construction. Furthermore, current memory systems struggle to recall and reason over earlier facts that are later revised or interfered with by subsequent context, with performance degrading as the number of intervening updates increases.

2
Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection

Safety post-training can improve the harmlessness and policy compliance of Large Language Models (LLMs), but it may also reduce general utility, a phenomenon often described as the alignment tax. We study this trade-off through the lens of continual learning: sequential alignment stages expose the model to shifted data distributions and objectives, and their gradients may interfere with directions that support previously acquired general capabilities. This view does not claim that all alignment degradation has a single cause; rather, it provides a useful first-order mechanism for mitigating one important source of capability regression. We propose Orthogonal Gradient Projection for Safety Alignment (OGPSA), a lightweight update rule that estimates a low-rank reference subspace from gradients on a small set of general-capability data and removes from each safety gradient the component lying in this subspace. The resulting update is the steepest local safety-descent direction subject to first-order preservation constraints on the reference objectives. OGPSA is compatible with standard post-training pipelines and avoids large-scale replay, although it introduces periodic reference-gradient computation. Across Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and sequential SFT→DPO settings, OGPSA improves the observed safety-utility trade-off over standard baselines. Under the sequential SFT→DPO pipeline, the average performance gain increases from 33.98% to 42.74% on Qwen2.5-7B-Instruct and from 19.74% to 32.98% on Llama3.1-8B-Instruct. We have open sourced our code at https://github.com/SunGL001/OGPSA.
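
The projection step described above (removing from a safety gradient its component inside a reference subspace) can be sketched in a few lines. This is a toy illustration and not the OGPSA code: it uses Gram-Schmidt on a handful of reference gradients as a stand-in for however the paper estimates its low-rank subspace, and all names and vectors are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        r = list(v)
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        norm = dot(r, r) ** 0.5
        if norm > eps:  # skip vectors already inside the span
            basis.append([ri / norm for ri in r])
    return basis


def project_out(grad, basis):
    """Remove from grad its component inside span(basis)."""
    g = list(grad)
    for b in basis:
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g


# Hypothetical 3-parameter model with two general-capability gradients.
reference_grads = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
safety_grad = [2.0, 3.0, 4.0]
projected = project_out(safety_grad, orthonormal_basis(reference_grads))
```

The projected gradient is orthogonal to every reference gradient, so to first order a step along it leaves the reference objectives unchanged.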

1
iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance

Video Virtual Try-On (VVT) aims to seamlessly replace a garment on a person in a video with a new one. While existing methods have made significant strides in maintaining temporal consistency, they are predominantly confined to non-interactive scenarios where models merely showcase garments. This limitation overlooks a crucial aspect of real-world apparel presentation: active human-garment interaction. To bridge this gap, we introduce and formalize a new challenging task: Interactive Video Virtual Try-On (Interactive VVT), where subjects in the video actively engage with their clothing. This task introduces unique challenges beyond simple texture preservation, including: (1) resolving the semantic ambiguity of interactions from standard pose information, and (2) learning complex garment deformations from video where interactive moments are sparse and brief. To address these challenges, we propose iTryOn, a novel framework built upon a large-scale video diffusion Transformer. iTryOn pioneers a multi-level interaction injection mechanism to guide the generation of complex dynamics. At the spatial level, we introduce a garment-agnostic 3D hand prior to provide fine-grained guidance for precise hand-garment contact, effectively resolving spatial ambiguity. At the semantic level, iTryOn leverages global captions for overall context and time-stamped action captions for localized interactions, synchronized via our novel Action-aware Rotational Position Embedding (A-RoPE). Extensive experiments demonstrate that iTryOn not only achieves state-of-the-art performance on traditional VVT benchmarks but also establishes a commanding lead in the new interactive setting, marking a significant step towards more dynamic and controllable virtual try-on experiences.

1
Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models

Large Vision Language Models (LVLMs) show promise in medical applications, but their inability to faithfully ground responses in visual evidence raises serious concerns about clinical trustworthiness. While visual attribution methods are widely used to explain LVLM predictions, whether these explanations actually reflect the visual evidence underlying the model's decision is largely unverified, since ground-truth annotations for internal model reasoning are typically unavailable. We address this question for chest X-ray (CXR) reasoning by developing a causal evaluation framework that retains only CXR-VQA samples for which the expert-annotated region is verified, via counterfactual editing, to be causally responsible for the model's prediction. Using this framework across 11 attribution methods, six open-source LVLMs, and two output modes (direct answer and step-by-step reasoning), we find that existing attribution methods often fail to identify the evidence used by LVLMs. To address this failure, we propose MedFocus, a concept-based attribution method that localizes clinically meaningful anatomical regions via unbalanced optimal transport and measures their causal effect on model outputs through targeted interventions. MedFocus produces spatial, concept-level, and token-level attributions and substantially outperforms prior methods, taking a step toward more trustworthy attribution for medical LVLMs. Our data and code are available at https://github.com/gzxiong/medfocus/.

1
DrawMotion: Generating 3D Human Motions by Freehand Drawing

Text-to-motion generation, which translates textual descriptions into human motions, faces the challenge that users often struggle to precisely convey their intended motions through text alone. To address this issue, this paper introduces DrawMotion, an efficient diffusion-based framework designed for multi-condition scenarios. DrawMotion generates motions based on both a conventional text condition and a novel hand-drawing condition, which provide semantic and spatial control over the generated motions, respectively. Specifically, we tackle the fine-grained motion generation task from three perspectives: 1) freehand drawing condition. To accurately capture users' intended motions without requiring tedious textual input, we develop an algorithm to automatically generate hand-drawn stickman sketches across different dataset formats; 2) multi-condition fusion. We propose a Multi-Condition Module (MCM) that is integrated into the diffusion process, enabling the model to exploit all possible condition combinations while reducing computational complexity compared to conventional approaches; and 3) training-free guidance. Notably, the MCM in DrawMotion ensures that its intermediate features lie in a continuous space, allowing classifier-guidance gradients to update the features and thereby aligning the generated motions with user intentions while preserving fidelity. Quantitative experiments and user studies demonstrate that the freehand drawing approach reduces user time by approximately 46.7% when generating motions aligned with their imagination. The code, demos, and relevant data are publicly available at https://github.com/InvertedForest/DrawMotion.

1
PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models

Planning is a fundamental capability for large language models (LLMs) because such complex tasks require models to coordinate goals, constraints, resources, and long-term consequences into executable and verifiable solutions. Existing planning benchmarks, however, usually treat planning data as fixed collections of instances rather than controllable generation targets. This limits scenario coverage, ties difficulty to surface-level proxies rather than structural sources, and offers limited support for scalable generation, automatic verification, or planning-oriented training. We introduce PlanningBench, a framework for generating scalable, diverse, and verifiable planning data for both evaluation and training. PlanningBench starts from real planning scenarios and abstracts practical workflows into a structured taxonomy of more than 30 task types, subtasks, constraint families, and difficulty factors. Guided by this taxonomy, a constraint-driven synthesis pipeline instantiates self-contained planning problems with adaptive difficulty control, quality filtering, and instance-level verification checklists. This shifts planning data construction from fixed benchmark collection to controllable generation while preserving realistic task grounding. We use PlanningBench to evaluate open-source and closed-source frontier LLMs, and find that current models still struggle to produce complete solutions under coupled constraints. Beyond evaluation, reinforcement learning on verified PlanningBench data improves performance on unseen planning benchmarks and broader instruction-following tasks. Further analysis suggests that determinate or well-specified optimal solutions provide clearer reward signals and more stable training dynamics. Overall, PlanningBench provides a controllable source of planning data for diagnosing and improving generalizable planning abilities in LLMs.

1
05

PRODUCT HUNT

05.00
PRODUCT HUNT

Product Hunt - May 21, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

Mintlify Workflows

Self-updating knowledge bases

0
WarmIntro

Free tool to find your warmest path into any company

0
Slideshot

Product demo videos, recorded by your AI agent

0
Framed

Turn screenshots, videos, and code into polished visuals

0
WeWeb 3.0

Vibe-code apps with the safety net of a no-code editor

0
AutoSubtitles 2.0

AI subtitles & animated captions with faster editing

0
Google Antigravity 2.0

Orchestrate multi-agent workflows from a desktop app

0
Tycoon AI

Run one-person companies entirely with AI agents

0
Tacet

The brain monitor for cognitive health scores

0
CatchAll by NewsCatcher

Build any dataset from the web. Filtered to your criteria.

0
TongueType for macOS

Local dictation for macOS without the subscription

0
Basedash Skills

Reusable AI instructions for every Basedash surface.

0
AlliHat

Claude AI in your Safari sidebar

0
Visual Usability Checker

Validate your design decisions instantly with AI insights

0
InstaVM

Instant computers in isolated environments for AI agents

0
Vivaldi 8.0

A bold new look for the browser that's all yours

0
Novi Notes 1.1

A local AI memory layer for your Mac

0
Ente Locker

Shared vault for your most important documents

0
Mixpanel Headless

Programmatic access to product analytics for agents and devs

0
Chromtuner

A chromatic tuner for macOS. ±1¢ accuracy

0
LayerProof Kraft

Co-write insightful long form content

0
Contextberg

Turn your work into AI agent memory, served over MCP

0
Tether

The presence who comes to life in your messages

0
Tophat by Shopify

Test mobile CI builds on any device without building locally

0
Insta360 Mic Pro

Pro audio with a customizable color E-Ink face

0
Manus Scheduled Tasks 2.0

Run recurring Manus work inside the same task context

0
mailX by mailwarm

Email deliverability toolkit for humans and AI agents

0
StoreClaw

Grow your store profits with agents that know how to sell

0
Gemini Omni

Create anything from any input – starting with video

0
Viberia

Command AI agents like you're playing Civilization

0
Glia

Local-first AI memory bridge between browser chats and IDEs

0
Runtime

Sandboxed coding agents for everyone on your team

0
Supercut for Agents

Permission-aware AI access to recordings and metadata

0
Retina

Screen recorder w/ auto-zoom, smooth cursors, + AI graphics

0
Skilled

Dashboard to find agent skills you no longer need

0
Type Switch 3.0 for macOS

Instant language switching for multilingual Mac users

0
Owlish

Reduce support volume with AI agents trained on your docs

0
GhostSnap

Multiple screenshots - Single paste - Auto compressed for AI

0
Multi-Claude

Run multiple Claude accounts side by side on your Mac

0
Emdash

One app. Every coding agent. Open-source.

0
Re_gent

Version Control for AI agent Activity

0
Invenio

Local AI search for Mac video & photo libraries

0
Cosmic Insights

Cookieless web analytics built into your CMS

0
CLI Market

3,760 retailers, one API for AI agents

0
Buggyverse

Study with strangers online, high-accountability focus rooms

0
VWFNDR™ + MBL

Take raw photos with proof they're real, not AI

0
Thinnest AI

Build Voice AI Agents in 100+ languages for ₹1.5/min

0
Haystack

Review the pull requests that actually need human attention

0
Agora-1 by Odyssey

A multi-agent world model you can play

0
Papr Graph

Upgrade to graph-native vector embeddings

0
06

TECHMEME

06.00
TECHMEME

Techmeme - May 21, 2026

Techmeme Digest: Major tech headlines and industry conversations.

Sources: Anthropic is in talks to rent servers powered by Microsoft-designed chips; source: Anthropic has steadily increased its Azure usage since November 2025 (The Information)
Source: Techmeme · Published: May 21, 2026

The Information: Anthropic is in talks to rent servers powered by Microsoft-designed AI server chips as it seeks more computing power …

Anthropic, Blackstone, and Hellman & Friedman's unnamed enterprise services JV buys Fractional AI, its first deal; sources say Fractional ends its OpenAI deal (Preeti Singh/Bloomberg)
Source: Techmeme · Published: May 21, 2026

Preeti Singh / Bloomberg: The new artificial intelligence enterprise services firm backed by Blackstone Inc., Anthropic PBC and Hellman & Friedman …

Flipper unveils the Flipper One, a pocketable open Arm Linux computer with similar performance to a Raspberry Pi 5, and welcomes feedback to get it market-ready (Mark Tyson/Tom's Hardware)
Source: Techmeme · Published: May 21, 2026

Mark Tyson / Tom's Hardware: But the devs admit there's still a lot of work to be done to get the Flipper One market-ready, and they are looking for contributors.

GitHub links the breach of 3,800 internal repositories to the TanStack npm supply-chain attack, saying hackers used a malicious Nx Console VS Code extension (Sergiu Gatlan/BleepingComputer)
Source: Techmeme · Published: May 21, 2026

Sergiu Gatlan / BleepingComputer: GitHub says the hackers who breached 3,800 internal repositories gained access via a malicious version of the Nx Console VS Code extension …

Netflix and iHeartMedia say Charlamagne tha God's The Breakfast Club will stream live on Netflix on weekdays from June 1, the service's first daily live show (Anne Steele/Wall Street Journal)
Source: Techmeme · Published: May 21, 2026

Anne Steele / Wall Street Journal: Charlamagne tha God's morning radio show has been a standout success among the streamer's video podcast efforts.

Bluesky and Clemson University researchers detail a novel Russian influence campaign that hijacked influential Bluesky accounts to spread pro-Kremlin propaganda (Steven Lee Myers/New York Times)
Source: Techmeme · Published: May 21, 2026

Steven Lee Myers / New York Times: The company said it was fighting Russian efforts to hijack real users' accounts to post fake content, an apparently novel tactic.

The US Commerce Department plans to award $2B in grants to nine quantum computing companies and will take equity stakes; IBM is set to get $1B of the package (Wall Street Journal)
Source: Techmeme · Published: May 21, 2026

Wall Street Journal: Trump administration hopes to spur ‘a new era of American innovation,’ Commerce's Lutnick says. WASHINGTON: The Trump administration …

Taiwan is seeking to detain three people for forging documents to export Super Micro servers with Nvidia chips to China, Hong Kong, and Macau, breaking US rules (Bloomberg)
Source: Techmeme · Published: May 21, 2026

Bloomberg: Taiwanese officials are seeking to detain three individuals for forging documents in order to export Nvidia Corp. AI chips to China …

Higgsfield AI premieres Hell Grind, a 95-minute fully AI-generated film, at Cannes and says it took two weeks and cost $500K to make, of which $400K was on AI (Isabelle Bousquette/Wall Street Journal)
Source: Techmeme · Published: May 21, 2026

Isabelle Bousquette / Wall Street Journal: ‘Hell Grind,’ a 95-minute fully AI-generated film, premieres this week at Cannes, where questions around the technology's encroachment remain center-stage.

SpaceX S-1: Valor Equity Partners founder Antonio Gracias, a longtime Elon Musk ally, controls a 7.3% stake in SpaceX, the second-largest holder after Musk (Bloomberg)
Source: Techmeme · Published: May 21, 2026

Bloomberg: Antonio Gracias, the founder of Valor Equity Partners and a longtime Elon Musk ally, controls a 7.3% stake in SpaceX …

Jeff Bezos dismisses AI job fears, defends billionaires against "vilification", proposes eliminating income taxes for low earners, and praises President Trump (CNBC)
Source: Techmeme · Published: May 21, 2026

CNBC: Watch CNBC's full interview with Amazon founder Jeff Bezos. Ultrabillionaire Jeff Bezos …

Analysis: Samsung is set to distribute ~$26.6B to its 78,000 chip employees, or a ~$340K bonus to each, in early 2027 as part of a last-minute labor union deal (Bloomberg)
Source: Techmeme · Published: May 21, 2026

Bloomberg: Samsung Electronics Co. could distribute about 40 trillion won ($26.6 billion) to chip employees as a bonus for this year …
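
The per-employee figure in the item above follows from simple division. A minimal Python check, using only the numbers quoted in the summary (the variable names are illustrative, and the exchange rate is merely what the two quoted totals imply, not a reported figure):

```python
# Figures as quoted in the Bloomberg item above.
total_bonus_usd = 26.6e9   # about $26.6 billion
total_bonus_krw = 40e12    # about 40 trillion won
chip_employees = 78_000

# Average payout if the pool were split evenly (the real split may differ).
per_employee_usd = total_bonus_usd / chip_employees

# Exchange rate implied by the two quoted totals.
implied_krw_per_usd = total_bonus_krw / total_bonus_usd

print(f"Average bonus per employee: ${per_employee_usd:,.0f}")
print(f"Implied exchange rate: {implied_krw_per_usd:,.0f} won per dollar")
```

An even split is an assumption; the item only gives the total and the headcount, so the roughly $340K figure is an average rather than a guaranteed payout.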

Sources: Manus' co-founders are in talks to raise $1B+ to buy back the Chinese-founded startup, after Beijing ordered Meta to unwind its $2B Manus acquisition (Bloomberg)
Source: Techmeme · Published: May 21, 2026

Bloomberg: The co-founders of Manus are exploring options to fulfill Beijing's demand to unwind a controversial takeover by Meta Platforms Inc. …

AMD pledges to invest $10B+ in Taiwan's chip industry to make advanced chip packaging for AI, and says TSMC will ramp up production of its next-gen Venice chips (Sherry Qin/Wall Street Journal)
Source: Techmeme · Published: May 21, 2026

Sherry Qin / Wall Street Journal: The company is investing more to meet growing demand for artificial-intelligence infrastructure.

Arizona State University study: waste heat from Phoenix-area data centers raised downwind neighborhoods' air temperatures by up to 4°F above upwind temperatures (Tech Xplore)
Source: Techmeme · Published: May 21, 2026

Tech Xplore: Waste heat from data centers can boost air temperatures in downwind neighborhoods by as much as 4 degrees Fahrenheit …

07

STARTUP ARCHIVE

07.00
STARTUP ARCHIVE

Startup News - May 21, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction.”

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

“Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don’t chase the hot new thing. It’s so hard to catch something that everybody already knows is hot.”

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice.”

Paul Graham on why starting with a “small, intense fire” is the key to startup growth
Source: Startup · Published: Mar 20, 2026

“You have to know who those first users are and how you’re going to get them.”

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth.”

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?”

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations.”

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over.”

Spencer Rascoff: “I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

“If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive.”

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right.”

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT

08.00
SOLIDOT

Solidot News - May 21, 2026

Solidot Feed: Highlighting essential tech & open-source news.

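SpaceX's largest revenue source is its data center deal with Anthropic

SpaceX filed its prospectus with the US Securities and Exchange Commission (SEC) on Wednesday evening, disclosing its finances for the first time. According to the prospectus, following the merger with Elon Musk's xAI and X/Twitter, SpaceX's largest source of revenue is a three-year data center deal struck with Anthropic this May, under which Anthropic rents compute at the Colossus 1 campus for $1.25 billion per month. The deal is not guaranteed, however: either party can terminate it with 90 days' notice. Other figures include 2025 revenue of $18.7 billion, an operating loss of $2.6 billion, and a net loss of $4.9 billion. Within that, the Starlink / Connectivity satellite broadband business had revenue of $11.4 billion and an operating profit of $4.4 billion; the launch business had revenue of $4.1 billion and an operating loss of $657 million; and the AI and social media business had revenue of $3.2 billion and an operating loss of $6.4 billion. The prospectus mentions AI hundreds of times. Musk holds 12.3% of the Class A shares and 93.6% of the Class B shares. Class B shares carry ten times the voting power of Class A shares, giving Musk control of 85.1% of the company's total voting power. Any Class B shares he sells automatically convert to Class A shares.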
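
The segment figures in this item can be cross-checked against the reported totals. A minimal Python sketch, using only the numbers quoted above (in billions of US dollars); the dictionary layout and variable names are illustrative, and the full-term value of the Anthropic deal assumes it is never terminated early:

```python
# 2025 segment figures quoted above, in billions of US dollars.
segments = {
    "Starlink / Connectivity": {"revenue": 11.4, "operating_income": 4.4},
    "Launch": {"revenue": 4.1, "operating_income": -0.657},
    "AI and social media": {"revenue": 3.2, "operating_income": -6.4},
}
reported_revenue = 18.7
reported_operating_loss = 2.6

segment_revenue = sum(s["revenue"] for s in segments.values())
segment_operating_income = sum(s["operating_income"] for s in segments.values())

# Anthropic deal: $1.25B per month over a three-year term.
# Assumption: neither side uses the 90-day termination clause.
anthropic_monthly = 1.25
anthropic_annual = anthropic_monthly * 12
anthropic_full_term = anthropic_monthly * 36

print(f"Segment revenue total: {segment_revenue:.1f} (reported {reported_revenue})")
print(f"Segment operating income: {segment_operating_income:.3f} "
      f"(reported loss {reported_operating_loss})")
print(f"Anthropic deal: {anthropic_annual:.1f} per year, "
      f"{anthropic_full_term:.1f} over the full term")
```

The segment revenues add up to the reported total. The segment operating results sum to a loss slightly larger than the rounded headline figure; the item does not explain the gap, which may simply reflect rounding or unallocated items.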

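Google's AI search is easy to manipulate

Google's AI search is very easy to manipulate. Traditional search showed users ten links on the first page and left the judgment to them, whereas AI search gives a single answer that may rest on a single source. A BBC technology reporter demonstrated the manipulation with an article about hot dogs on a personal website. Experts say this kind of manipulation is happening systematically and at scale. Manipulating AI search into serving users biased or inaccurate information could have serious consequences, and this is not a trivial problem: worldwide, more than 1 billion people use AI chatbots daily, and 2.5 billion people view Google's AI Overviews each month. Whoever can control such tools gains enormous power. Google and other companies have taken note of the problem. Google updated its policies last week to treat attempts to manipulate AI responses as a violation of its rules, and it threatens to remove or demote in search results any company or website suspected of manipulation.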

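RTX 5090DV2 graphics card added to ban list

Last Friday, Chinese customs added the RTX 5090DV2 graphics card to its ban list. Nvidia launched the card last August to comply with US export control rules. The list originally included the H200 and the H20; the H20 is another China-specific chip that Nvidia previously sold in the Chinese market. The RTX 5090DV2 is still on sale on major e-commerce platforms such as JD.com and Taobao at prices between 18,000 and 22,000 yuan, which suggests existing inventory can still be sold normally, but supply will dwindle as imports stop.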

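Google accidentally publishes exploit code for an unpatched Chromium vulnerability

On Wednesday, Google made public the exploit code for an unpatched Chromium vulnerability that affects all users of Chromium-based browsers. Independent security researcher Lyra Rebane reported the vulnerability to Google in late 2022, but 29 months later it still had not been fixed. On Wednesday morning Google disclosed the vulnerability on Chromium's bug tracker. Rebane initially assumed the bug had been fixed, only to find that it had not. Google later deleted the post, but its contents had already been archived by other websites. The vulnerability abuses Chromium's Browser Fetch API to open a persistently active Service Worker. A malicious website can use JavaScript to trigger the Service Worker to create connections and monitor part of the user's activity, and it can also serve as a proxy for visiting websites and launching DDoS attacks. Security researchers consider the vulnerability serious: it effectively amounts to a limited backdoor that turns the browser into part of a botnet.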

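Samsung Electronics reaches tentative labor agreement, strike called off

At 23:00 on the 20th, with just one hour left before a general strike was due to begin, the Samsung Electronics union reached a dramatic agreement with the company, and the strike was called off. Under the tentative agreement on the performance bonus plan, employees of the Device Solutions (DS) division, which runs the semiconductor business, are expected to receive performance bonuses of up to about 600 million won (about 2.723 million yuan) this year. Labor and management agreed to keep the existing year-end performance bonus system (OPI) while creating a new special semiconductor performance bonus for the DS division. The company will set aside 10.5% of its business results to fund the special bonus, with no cap. Of that funding, 40% will be allocated to the DS division and the remaining 60% to its sub-divisions. The performance bonus paid uniformly to administrative departments will be set at 70% of the level of the memory chip business, a DS sub-division. The per-capita bonus is expected to reach 600 million won.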

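Anna's Archive ordered to pay book publishers $19.5 million

Thirteen major book publishers, including Penguin Random House, Elsevier, and HarperCollins, jointly sued Anna's Archive in March, accusing the shadow library of facilitating book piracy. The publishers aimed to obtain a court injunction to put pressure on Anna's Archive's domain registrars. Anna's Archive is already mired in multiple lawsuits: late last year, a suit by streaming giant Spotify and record labels cost it its main .org domain. This week US District Judge Jed S. Rakoff signed a default judgment granting the publishers everything they asked for and ordering Anna's Archive to pay them $19.5 million. The judge also issued a sweeping permanent injunction requiring more than twenty domain registrars, hosting companies, and service providers worldwide to immediately shut down Anna's Archive's remaining domains. Because the site's operators are anonymous, the damages are very unlikely to be paid, so the main impact comes from the injunction, which US companies such as Cloudflare and OwnRegistrar will have to obey.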

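Firefox to remove asm.js code

Mozilla announced that Firefox will remove its asm.js code in the future, because asm.js has long had a successor in WebAssembly and maintaining both takes time and enlarges the attack surface. asm.js was Mozilla's answer to NaCl and PNaCl: by choosing a strict, static subset of JavaScript, it achieved performance comparable to NaCl/PNaCl while the code could still run directly in web content. asm.js shipped in 2013 with Firefox 22 and was a great success, proving that code could run on the web at near-native speed using web technologies alone. It paved the way for WebAssembly, which became a W3C standard in 2019. Starting with Firefox 148, the SpiderMonkey JS engine disables asm.js optimizations by default, and a future version will remove the code entirely. Websites that use asm.js will not be affected, since asm.js code is valid JavaScript and keeps running as ordinary JavaScript. The developers recommend that sites wishing to keep shipping asm.js content recompile to WebAssembly, which runs faster and produces smaller binaries.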

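Google Cloud accidentally suspends the account of its major customer Railway

In 2024, a misconfiguration at Google Cloud Platform (GCP) caused the complete deletion of data belonging to UniSuper, an Australian pension fund manager. Fortunately, UniSuper had backups with another provider. The incident kept UniSuper offline for more than a week. On May 19, 2026, GCP suffered a similarly serious incident: its automated systems suspended the production account of Railway.com, a PaaS platform and major customer, taking Railway's services offline. According to the incident report on Railway's official blog, the outage lasted about 8 hours. The suspension occurred at 22:10 UTC on the 19th and cost Railway its GCP-hosted infrastructure, which supported its dashboard, API, and part of its network infrastructure. Railway immediately contacted its GCP account manager, and the account was restored at 22:29 UTC, but compute instances, disks, and networking all had to be recovered one by one, and the incident was not fully resolved until 07:58 UTC the next day. Railway announced that it will reduce its dependence on GCP, planning to remove GCP from the hot path and keep it as a backup/failover service.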
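
The timestamps in this item can be turned into durations directly. A minimal Python sketch using the three UTC times quoted above (the variable names are illustrative):

```python
from datetime import datetime, timezone

# Timestamps quoted in the item above (all UTC).
suspended = datetime(2026, 5, 19, 22, 10, tzinfo=timezone.utc)
restored = datetime(2026, 5, 19, 22, 29, tzinfo=timezone.utc)
resolved = datetime(2026, 5, 20, 7, 58, tzinfo=timezone.utc)

# How long the account itself stayed suspended.
time_to_restore_account = restored - suspended

# How long until every instance, disk, and network was back.
time_to_full_resolution = resolved - suspended

print(f"Account suspended for: {time_to_restore_account}")
print(f"Suspension to full resolution: {time_to_full_resolution}")
```

The account was suspended for only 19 minutes, while full resolution took 9 hours 48 minutes. That is longer than the roughly 8-hour outage figure; the item does not say how the outage window was measured, so the two numbers may simply describe different intervals.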

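Why hay fever is so severe in Japan

Pollen allergy is a national health problem in Japan, where an estimated 43% of people experience moderate to severe symptoms, compared with 26% in the UK and 12% to 18% in the US. Every spring, people on city streets across Japan put on masks because of pollen-induced allergic rhinitis. Why is the problem so severe in Japan? The cause has little to do with poor health, pollution, or even the natural environment, and much to do with decisions made by Japanese politicians after World War II. During the war, shortages of oil and natural gas forced Japan to turn to its most abundant natural resource, its forests, as a fuel source for homes and industry. Natural forests were felled over large areas, and the hills around cities such as Tokyo, Osaka, and Kobe were stripped bare. After the war, because bare hillsides are prone to landslides and flooding, the government decided to carry out large-scale reforestation. It chose two fast-growing species: Japanese cedar (sugi) and Japanese cypress (hinoki). Today these cedar and cypress plantations cover one fifth of the country's land area. The problem is that cedar and cypress produce large amounts of lightweight pollen once they mature at about 30 years, and almost all of the plantations are now more than 30 years old. To ease the allergies, the Japanese government now plans to cut down one fifth of the cedar forests and replace them with other species.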

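Fedora removes the Deepin Desktop packages

Following openSUSE, the Fedora distribution has removed the Deepin Desktop environment packages. In early 2025, during a routine review, the SUSE security team found that the Deepin desktop environment included a package called deepin-feature-enable. The package had been added in April 2021 without consulting or notifying SUSE. It contained a “license agreement dialog” that essentially said that, because of openSUSE's security rules, all the dbus and polkit functionality needed by deepin-api and deepin-daemon had been disabled, which could prevent Deepin Desktop from working properly and leave some features broken. Users who did not mind these security issues could click to confirm, after which the missing dbus and polkit components would be installed automatically. The security team's investigation found that core components of deepin-daemon had never been submitted for security review and had been quietly introduced into openSUSE. Given the Deepin community's repeated violations over the past several years, openSUSE decided to remove Deepin Desktop. The Fedora project subsequently launched its own security review of the Deepin desktop packages, during which developers found it difficult to reach the maintainers of some Deepin packages. Citing security concerns and the lack of package maintenance, Fedora ultimately decided to remove the Deepin desktop environment.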

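OpenAI, Nvidia, and others add SynthID watermark support to their models

Google introduced SynthID, a digital watermarking technology for labeling AI images, three years ago, and says it has so far been used to mark 100 billion images and videos. Last year Google added SynthID detection to the Gemini app: users upload suspicious content and ask the chatbot whether it was AI-generated. Google says no one has yet managed to break SynthID, and it announced partnerships with several AI companies to add support for the watermark. Nvidia's Cosmos, OpenAI's GPT 2 image model, Kakao, and ElevenLabs will all add SynthID support to their AI-generated content.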

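Global vaccination rates decline

Global vaccination rates are falling. After the COVID-19 pandemic, which threw health systems into disarray, vaccination rates have still not recovered to their earlier levels. In 2024, measles outbreaks spread to 59 countries. The measles virus is extremely contagious: if an infected person is in the same space, people without immunity will be infected almost 100% of the time. Complications include pneumonia and otitis media, and the disease can even lead to encephalitis and become severe. Preventing measles depends on vaccination. To maintain herd immunity and stop outbreaks from spreading, vaccination coverage needs to reach 95% or more. During the pandemic, travel restrictions led many people to postpone other vaccinations, and at medical institutions the staff who give vaccinations and provide treatment were focused on COVID-19. In addition, as outbreaks of other infectious diseases were suppressed, more and more people came to believe vaccination was unnecessary, so global vaccination rates kept falling. Diseases other than measles show a similar trend: in 2024, coverage of the combined diphtheria, pertussis, and tetanus vaccine was below its post-2010 peak in every region of the world.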
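
The 95% coverage target quoted above is consistent with the standard herd immunity threshold formula, 1 - 1/R0. A minimal Python sketch, assuming the commonly cited basic reproduction number range of 12 to 18 for measles (that range is not stated in the item and is an assumption here, as is the function name):

```python
def herd_immunity_threshold(r0: float) -> float:
    """Fraction of the population that must be immune so that each
    case infects fewer than one other person on average."""
    return 1.0 - 1.0 / r0

# Assumed R0 values for measles; not taken from the item above.
for r0 in (12, 15, 18):
    print(f"R0 = {r0}: threshold = {herd_immunity_threshold(r0):.1%}")
```

The formula gives the share of people who must be immune, not vaccinated. Because no vaccine protects everyone who receives it, the coverage target has to sit somewhat above the threshold, which fits a 95% goal.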

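The most efficient route between Earth and the Moon

Scientists have developed a mathematical method that more accurately computes the most economical routes between the orbits of celestial bodies. For the Earth-Moon case, the new route requires 58.80 m/s less velocity change (delta-v, the usual proxy for fuel) than the most fuel-efficient route previously known. Against the trip's estimated total cost of 3342.96 m/s the difference looks small, but it has a large effect on mission cost. The team says that in space travel every 1 m/s of velocity change means a large amount of fuel. Based on this result, the team plotted a spacecraft trajectory from Earth orbit to lunar orbit and divided it into two stages. First, the spacecraft leaves Earth orbit and enters an orbit around the L1 Lagrange point, which lies between Earth and the Moon, where the gravitational pulls of the two bodies exactly cancel. With the help of a control system, the spacecraft can stay in this intermediate orbit indefinitely until the mission is ready, and then carry out the second stage of entering lunar orbit.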
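
To see why a saving of this size matters, the quoted figures can be run through the Tsiolkovsky rocket equation. A minimal Python sketch; the specific impulse of 320 s is an illustrative assumption (not from the item), and treating 3342.96 m/s as the reference total before the saving is also an assumption, since the item does not say which route that total belongs to:

```python
import math

G0 = 9.80665  # standard gravity, m/s^2

# Figures quoted in the item above.
delta_v_total = 3342.96  # m/s, estimated total cost of the trip
delta_v_saved = 58.80    # m/s, saving of the new route

# Assumed engine specific impulse in seconds (hypothetical value).
ISP_SECONDS = 320.0

def propellant_fraction(delta_v: float, isp: float) -> float:
    """Fraction of initial mass that must be propellant (Tsiolkovsky)."""
    return 1.0 - math.exp(-delta_v / (isp * G0))

relative_saving = delta_v_saved / delta_v_total
frac_old = propellant_fraction(delta_v_total, ISP_SECONDS)
frac_new = propellant_fraction(delta_v_total - delta_v_saved, ISP_SECONDS)

print(f"Relative delta-v saving: {relative_saving:.2%}")
print(f"Propellant fraction before: {frac_old:.4f}")
print(f"Propellant fraction after:  {frac_new:.4f}")
```

Because propellant mass grows exponentially with delta-v, even a saving under 2% of the total frees up mass that can go to payload instead; the exact amount depends on the engine, which is why the specific impulse here is only a placeholder.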

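GitHub confirms hackers stole its internal repositories

GitHub confirmed through its official account on X that hackers stole its internal repositories and said it is investigating. Earlier, the hacking group TeamPCP claimed on the Breached forum to have obtained access to GitHub's internal source code and internal organizations and to have stolen about 3,800 repositories, asking $50,000 from anyone wanting access to the source code. TeamPCP insists this is not extortion: if someone offers at least $50,000, it will destroy the data after receiving payment, and if there is no buyer it will release the data for free. GitHub said its investigation shows that an employee's computer was compromised through a malicious VS Code extension that had been installed. It has removed the extension and isolated the device, and the investigation is continuing. GitHub said there is currently no evidence that customer data was affected.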

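Kickstarter reverses its sweeping ban on adult content

Crowdfunding platform Kickstarter revised its rules last week to broaden the range of prohibited adult content. Previously it banned only “pornographic content”. The updated rules significantly widened the scope to include, among other things, implied sexual acts, MILF/DILF content, suggestive nudity, and any content showing female nipples/areolae, genitals, or anuses. After the change sparked controversy, Kickstarter confirmed that it had revised the rules under pressure from payment processor Stripe, which is itself constrained by the wider financial system. Over the past few months, many projects crowdfunding on Kickstarter had their fundraising accounts suspended by Stripe, so Kickstarter changed its rules to meet Stripe's requirements for restricting adult content. The move drew criticism from the community, and Kickstarter has now decided to withdraw the new rules and return to the old ones, while adding links to Stripe's relevant policies.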

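Google announces changes to the search box

At the Google I/O developer conference on Tuesday, Google announced a redesign of its iconic 25-year-old search box, turning it into an AI-powered “intelligent search box” that is essentially a chatbot prompt, with its function shifting from running a search to “Ask Google”. Google claims that since AI Mode was integrated into Search, monthly active users have passed 1 billion and search volume has hit a record high, so it is now preparing to go further and make AI Mode the default for search. Like an AI chatbot, the intelligent search box can take text, images, files, videos, or Chrome tabs as search input. Google will also offer agentic digital assistants that search automatically on users' behalf; a user looking for an apartment can be notified of new listings without opening sites such as Zillow. The move has again drawn widespread criticism: critics argue that AI features built on large models do not treat accuracy as a core goal, so search quality will decline further and the line between ads and search results will blur further.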

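Samsung Electronics labor talks break down; 18-day general strike to begin on the 21st

On the 20th, Samsung Electronics labor and management held a third round of post-mediation talks on issues including the cap on bonus payouts, but the two sides failed to reach an agreement and the talks broke down. The union said it agreed to the mediation proposal put forward by the National Labor Relations Commission under the Ministry of Employment and Labor, but Samsung Electronics refused to accept it. Samsung Electronics only said repeatedly that it had “not yet made a decision” and did not state a position. The union will launch the general strike tomorrow as scheduled and says it will keep working during the strike to reach an agreement with management. The general strike is expected to cost Samsung Electronics as much as $2 billion a day. South Korea's presidential office expressed regret over the breakdown, and the government is considering invoking its “emergency mediation power” to restrict the strike while supporting a new round of mediation.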

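Bug bounty programs are flooded with AI reports

Companies use bug bounty programs to pay white-hat hackers for the bugs they find, but these programs are now being flooded with low-quality AI-generated reports, forcing some companies to shut them down. Bugcrowd, whose customers include OpenAI, T-Mobile, and Motorola, said the number of reports it received more than quadrupled over three weeks in March, and most of them turned out to be invalid. The curl project suspended its bug bounty program in January. Ross McKerchar, chief information security officer at cybersecurity firm Sophos, said low-quality AI reports are quickly becoming a major problem; bug bounties will continue to exist, but they will have to change. Nextcloud suspended its bug bounty in April. Bug bounty platform HackerOne has also begun using AI agents to triage submitted bug reports, and CEO Kara Sprague said the number of high-quality AI reports has also risen slightly of late.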

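pgBackRest author announces he will keep maintaining the project

At the end of last month, David Steele, maintainer of the PostgreSQL backup and restore project pgBackRest, announced that the project was being archived and would no longer be maintained. pgBackRest is widely regarded as one of the most popular operations tools in the PostgreSQL ecosystem. Steele explained that pgBackRest had been his passion project for the past 13 years and that he had been fortunate to have corporate funding for most of that time. His long-time sponsor was Crunchy Data, but that company was acquired by Snowflake, and the new owner had no interest in funding his continued work. He spent the past several months looking for a position that would let him continue but had no success, and the sponsorship he did receive fell far short of what was needed to keep the project running, so he had no choice but to announce the end of maintenance. A few weeks after that announcement, he posted an update saying he will continue developing pgBackRest: a coalition of sponsors has agreed to fund the project on an ongoing basis, giving pgBackRest the long-term stability its development needs, and he thanked them for it.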

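Sony cancels plans to port PlayStation-exclusive single-player games to PC

Hermen Hulst, the executive in charge of Sony's PlayStation studios business, confirmed on Monday an earlier rumor: Sony is canceling plans to port PlayStation-exclusive single-player games to PC. Over the past few years Sony ported formerly exclusive single-player games such as the God of War series, the Spider-Man series, Ghost of Tsushima, The Last of Us series, and the Horizon Zero Dawn series to PC, but ports have recently become less frequent, fueling rumors of a change in strategy. Hulst announced the strategic shift at an all-hands staff meeting on Monday. Sony is reportedly worried about diluting the PlayStation brand. The move means Sony's recent single-player games Ghost of Yotei and Saros will not come to PC. The shift applies to single-player games from first-party studios; multiplayer games and single-player games from third-party studios will still come to PC.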

09

APP STORE RANK

09.00
APP STORE RANK