OrangeBot.AI Digest — 2026-05-16
83 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- US is starting to see heavy job losses in roles exposed to AI (www.bloomberg.com)
- HTML Lists (blog.frankmtaylor.com)
- Windows 9x Subsystem for Linux (codeberg.org)
- DeepSeek-V4-Flash means LLM steering is interesting again (www.seangoedecke.com)
- Moving away from Tailwind, and learning to structure my CSS (jvns.ca)
- SANA-WM, a 2.6B open-source world model for 1-minute 720p video (nvlabs.github.io)
- Accelerando (2005) (www.antipope.org)
- Fecal transplants for autism deliver success in clinical trials (2019) (refractor.io)
- We've made the world too complicated (user8.bearblog.dev)
- Δ-Mem: Efficient Online Memory for Large Language Models (arxiv.org)
- Where to buy a non-Apple, non-Google smartphone (www.theregister.com)
- Frontier AI has broken the open CTF format (kabir.au)
- Orthrus-Qwen3: up to 7.8× tokens/forward on Qwen3, identical output distribution (github.com)
- SQL patterns I use to catch transaction fraud (analytics.fixelsmith.com)
- Ploopy Bean: a trackpoint for every computer (ploopy.co)
GitHub Trending(8)
Product Hunt(15)
- Loova Agents
Your AI director for creating cinematic videos with ease
- ChatGPT for Personal Finance
Personal finance guidance powered by ChatGPT
- Gemini 3.1 Flash-Lite
Lightweight Gemini model for high-volume AI pipelines
- Wring
Developer tools, one menu click away.
- Raybeam
A better way to screen share on macOS
- Agentmemory
Persistent memory for Claude Code, Codex & coding agents
- M5Stack PaperColor
4-inch color E-ink dev board with ESP32 and audio I/O
- Standboy
A Game Boy that wakes up while your agent works
- Kimi WebBridge
A bridge connecting AI agents to the live web
- OpenHuman
An open source AI harness built with the human in mind
- HasData
Web scraping service for AI agents
- Glance
Preview .md files instantly with quick look
- Atter AI
AI transcription app that turns meetings into action items
- Mantel
Stop confusing your Claude Code sessions & terminal windows
- Planora
A digital workspace for creative collaboration
Hugging Face(15)
- Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline that progresses from RL with verifiable rewards to more delicate proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K sub-8K-token trajectories followed by 200 RL steps. The resulting model, SU-01, supports stable reasoning on difficult problems with trajectories exceeding 100K tokens, while achieving gold-medal-level performance on mathematical and physical olympiad competitions, including IMO 2025/USAMO 2026 and IPhO 2024/2025. It also demonstrates strong generalization of scientific reasoning to domains beyond mathematics and physics.
- Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation
Real-time interactive video generation requires low-latency, streaming, and controllable rollout. Existing autoregressive (AR) diffusion distillation methods have achieved strong results in the chunk-wise 4-step regime by distilling bidirectional base models into few-step AR students, but they remain limited by coarse response granularity and non-negligible sampling latency. In this paper, we study a more aggressive setting: frame-wise autoregression with only 1-2 sampling steps. In this regime, we identify the initialization of a few-step AR student as the key bottleneck: existing strategies are either target-misaligned, incapable of few-step generation, or too costly to scale. We propose Causal Forcing++, a principled and scalable pipeline that uses causal consistency distillation (causal CD) for few-step AR initialization. The core idea is that causal CD learns the same AR-conditional flow map as causal ODE distillation, but obtains supervision from a single online teacher ODE step between adjacent timesteps, avoiding the need to precompute and store full PF-ODE trajectories. This makes the initialization both more efficient and easier to optimize. The resulting pipeline, Causal Forcing++, surpasses the SOTA 4-step chunk-wise Causal Forcing under the frame-wise 2-step setting by 0.1 in VBench Total, 0.3 in VBench Quality, and 0.335 in VisionReward, while reducing first-frame latency by 50% and Stage 2 training cost by ~4×. We further extend the pipeline to action-conditioned world model generation in the spirit of Genie3. Project Page: https://github.com/thu-ml/Causal-Forcing and https://github.com/shengshu-ai/minWM .
- Self-Distilled Agentic Reinforcement Learning
Reinforcement learning (RL) has emerged as a central paradigm for post-training LLM agents, yet its trajectory-level reward signal provides only coarse supervision for long-horizon interaction. On-Policy Self-Distillation (OPSD) complements RL by introducing dense token-level guidance from a teacher branch augmented with privileged context. However, transferring OPSD to multi-turn agents proves problematic: compounding multi-turn instability destabilizes supervision, while skill-conditioned privileged guidance requires asymmetric treatment, since negative teacher rejections may arise from imperfect skill retrieval or utilization. We introduce SDAR (Self-Distilled Agentic Reinforcement Learning), which treats OPSD as a gated auxiliary objective while keeping RL as the primary optimization backbone. SDAR maps detached token-level signals into a sigmoid gate, strengthening distillation on teacher-endorsed positive-gap tokens and softly attenuating negative teacher rejections. Across the Qwen2.5 and Qwen3 families on ALFWorld, WebShop, and Search-QA, SDAR substantially improves over GRPO (+9.4% on ALFWorld, +7.0% on Search-QA, +10.2% on WebShop-Acc), avoids the instability of naive GRPO+OPSD, and consistently outperforms hybrid RL-OPSD baselines across model scales.
- MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models
Memory is essential for large vision-language models (LVLMs) to handle long, multimodal interactions, with two method directions providing this capability: long-context LVLMs and memory-augmented agents. However, no existing benchmark conducts a systematic comparison of the two on questions that genuinely require multimodal evidence. To close this gap, we introduce MEMLENS, a comprehensive benchmark for memory in multimodal multi-session conversations, comprising 789 questions across five memory abilities (information extraction, multi-session reasoning, temporal reasoning, knowledge update, and answer refusal) at four standard context lengths (32K-256K tokens) under a cross-modal token-counting scheme. An image-ablation study confirms that solving MEMLENS requires visual evidence: removing evidence images drops two frontier LVLMs below 2% accuracy on the 80.4% of questions whose evidence includes images. Evaluating 27 LVLMs and 7 memory-augmented agents, we find that long-context LVLMs achieve high short-context accuracy through direct visual grounding but degrade as conversations grow, whereas memory agents are length-stable but lose visual fidelity under storage-time compression. Multi-session reasoning caps most systems below 30%, and neither approach alone solves the task. These results motivate hybrid architectures that combine long-context attention with structured multimodal retrieval. Our code is available at https://github.com/xrenaf/MEMLENS.
- SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer
We introduce SANA-WM, an efficient 2.6B-parameter open-source world model natively trained for one-minute generation, synthesizing high-fidelity, 720p, minute-scale videos with precise camera control. SANA-WM achieves visual quality comparable to large-scale industrial baselines such as LingBot-World and HY-WorldPlay, while significantly improving efficiency. Four core designs drive our architecture: (1) Hybrid Linear Attention combines frame-wise Gated DeltaNet (GDN) with softmax attention for memory-efficient long-context modeling. (2) Dual-Branch Camera Control ensures precise 6-DoF trajectory adherence. (3) Two-Stage Generation Pipeline applies a long-video refiner to stage-1 outputs, improving quality and consistency across sequences. (4) Robust Annotation Pipeline extracts accurate metric-scale 6-DoF camera poses from public videos to yield high-quality, spatiotemporally consistent action labels. Driven by these designs, SANA-WM demonstrates remarkable efficiency across data, training compute, and inference hardware: it uses only ~213K public video clips with metric-scale pose supervision, completes training in 15 days on 64 H100s, and generates each 60s clip on a single GPU; its distilled variant can be deployed on a single RTX 5090 with NVFP4 quantization to denoise a 60s 720p clip in 34s. On our one-minute world-model benchmark, SANA-WM demonstrates stronger action-following accuracy than prior open-source baselines and achieves comparable visual quality at 36× higher throughput for scalable world modeling.
- Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning
We present Darwin Family, a framework for training-free evolutionary merging of large language models via gradient-free weight-space recombination. We ask whether frontier-level reasoning performance can be improved without additional training, by reorganizing latent capabilities already encoded in existing checkpoints. Darwin introduces three key ideas: (i) a 14-dimensional adaptive merge genome enabling fine-grained component- and block-level recombination; (ii) MRI-Trust Fusion, which adaptively balances diagnostic layer-importance signals with evolutionary search through a learnable trust parameter; and (iii) an Architecture Mapper that enables cross-architecture breeding between heterogeneous model families. Empirically, the flagship Darwin-27B-Opus achieves 86.9% on GPQA Diamond, ranking #6 among 1,252 evaluated models, and outperforming its fully trained foundation model without any gradient-based training. Across scales from 4B to 35B parameters, Darwin models consistently improve over their parents, support recursive multi-generation evolution, and enable a training-free evolutionary merge that combines Transformer- and Mamba-based components. Together, the Darwin Family demonstrates that diagnostic-guided evolutionary merging is a practical and reproducible alternative to costly post-training pipelines for reasoning-centric language models.
- MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
Long-term agent memory is increasingly multimodal, yet existing evaluations rarely test whether agents preserve the visual evidence needed for later reasoning. In prior work, many visually grounded questions can be answered using only captions or textual traces, allowing answers to be inferred without preserving the fine-grained visual evidence. Meanwhile, harder cases that require reasoning over changing visual states are largely absent. Therefore, we introduce MemEye, a framework that evaluates memory capabilities from two dimensions: one measures the granularity of decisive visual evidence (from scene-level to pixel-level evidence), and the other measures how retrieved evidence must be used (from single evidence to evolutionary synthesis). Under this framework, we construct a new benchmark across 8 life-scenario tasks, with ablation-driven validation gates for assessing answerability, shortcut resistance, visual necessity, and reasoning structure. By evaluating 13 memory methods across 4 VLM backbones, we show that current architectures still struggle to preserve fine-grained visual details and reason about state changes over time. Our findings show that long-term multimodal memory depends on evidence routing, temporal tracking, and detail extraction.
- Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems
LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use, yet remain limited when tasks require sustained coordination across roles, tools, and environments. Multi-agent systems address this through structured collaboration among specialized agents, but tighter coordination also amplifies a less explored risk: errors can propagate across agents and interaction rounds, producing failures that are difficult to diagnose and rarely translate into structural self-improvement. Existing surveys cover individual agent capabilities, multi-agent collaboration, or agent self-evolution separately, leaving the causal dependencies among them unexamined. This survey provides a unified review organized around four causally linked stages, which we term the LIFE progression: Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement. For each stage, we provide systematic taxonomies and formally characterize the dependencies between adjacent stages, revealing how each stage both depends on and constrains the next. Beyond synthesizing existing work, we identify open challenges at stage boundaries and propose a cross-stage research agenda for closed-loop multi-agent systems capable of continuously diagnosing failures, reorganizing structures, and refining agent behaviors, extending current coordination frameworks toward more self-organizing forms of collective intelligence. By bridging these previously fragmented research threads, this survey aims to offer both a systematic reference and a conceptual roadmap toward autonomous, self-improving multi-agent intelligence.
- WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation
Large language and vision-language models increasingly power agents that act on a user's behalf through command-line interface (CLI) harnesses. However, most agent benchmarks still rely on synthetic sandboxes, short-horizon tasks, mock-service APIs, and final-answer checks, leaving open whether agents can complete realistic long-horizon work in the runtimes where they are deployed. This work presents WildClawBench, a native-runtime benchmark of 60 human-authored, bilingual, multimodal tasks spanning six thematic categories. Each task averages roughly 8 minutes of wall-clock time and over 20 tool calls, and runs inside a reproducible Docker container hosting an actual CLI agent harness (OpenClaw, Claude Code, Codex, or Hermes Agent) with access to real tools rather than mock services. Grading is hybrid, combining deterministic rule-based checks, environment-state auditing of side effects, and an LLM/VLM judge for semantic verification. Across 19 frontier models, the best, Claude Opus 4.7, reaches only 62.2% overall under OpenClaw, while every other model stays below 60%, and switching harness alone shifts a single model by up to 18 points. These results show that long-horizon, native-runtime agent evaluation remains a far-from-resolved task for current frontier models. We release the tasks, code, and containerized tooling to support reproducible evaluation.
- STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?
Large Language Model (LLM) agents are increasingly expected to maintain coherent, long-term personalized memory, yet current benchmarks primarily measure static fact retrieval, overlooking the ability to revise stored beliefs when new evidence emerges. We identify a critical and underexplored failure mode, Implicit Conflict: a later observation invalidates an earlier memory without explicit negation, requiring contextual inference and commonsense reasoning to detect. To rigorously evaluate this capability, we introduce STALE, a benchmark of 400 expert-validated conflict scenarios (1,200 evaluation queries across three probing dimensions) spanning over 100 everyday topics with contexts up to 150K tokens. We propose a three-dimensional probing framework that tests State Resolution (detecting that a prior belief is outdated), Premise Resistance (rejecting queries that falsely presuppose a stale state), and Implicit Policy Adaptation (proactively applying updated states in downstream behavior). A systematic evaluation of frontier LLMs and specialized memory frameworks reveals a pervasive gap between retrieving updated evidence and acting on it, with even the best evaluated model achieving only 55.2% overall accuracy. Models often accept outdated assumptions embedded in a user's query, and they struggle to recognize when a change in one aspect of the user's state should invalidate related memories. To establish an initial baseline for state-aware memory, we further present CUPMem, a prototype that strengthens write-time revision through structured state consolidation and propagation-aware search, suggesting that explicit state adjudication is a promising direction for robust agentic memory.
- Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video
Camera-controlled video generation has made substantial progress, enabling generated videos to follow prescribed viewpoint trajectories. However, existing methods usually learn camera-specific conditioning through camera encoders, control branches, or attention and positional-encoding modifications, which often require post-training on large-scale camera-annotated videos. Training-free alternatives avoid such post-training, but often shift the cost to test-time optimization or extra denoising-time guidance. We propose Warp-as-History, a simple interface that turns camera-induced warps into camera-warped pseudo-history with target-frame positional alignment and visible-token selection. Given a target camera trajectory, we construct camera-warped pseudo-history from past observations and feed it through the model's visual-history pathway. Crucially, we align its positional encoding with the target frames being denoised and remove warped-history tokens without valid source observations. Without any training, architectural modification, or test-time optimization, this interface reveals a non-trivial zero-shot capability of a frozen video generation model to follow camera trajectories. Moreover, lightweight offline LoRA finetuning on only one camera-annotated video further improves this capability and generalizes to unseen videos, improving camera adherence, visual quality, and motion dynamics without test-time optimization or target-video adaptation. Extensive experiments on diverse datasets confirm the effectiveness of our method.
- RouteProfile: Elucidating the Design Space of LLM Profiles for Routing
As the large language model (LLM) ecosystem expands, individual models exhibit varying capabilities across queries, benchmarks, and domains, motivating the development of LLM routing. While prior work has largely focused on router mechanism design, LLM profiles, which capture model capabilities, remain underexplored. In this work, we ask: How does LLM profile design affect routing performance across different routers? Addressing this question helps clarify the role of profiles in routing, disentangle profile design from router design, and enable fairer comparison and more principled development of routing systems. To this end, we view LLM profiling as a structured information integration problem over heterogeneous interaction histories. We develop a general design space of LLM profiles, named RouteProfile, along four key dimensions: organizational form, representation type, aggregation depth, and learning configuration. Through systematic evaluation across three representative routers under both standard and new-LLM generalization settings, we show that: (1) structured profiles consistently outperform flat ones; (2) query-level signals are more reliable than coarse domain-level signals; and (3) generalization to newly introduced models benefits most from structured profiles under trainable configurations. Overall, our work highlights LLM profile design as an important direction for future routing research.
- PREPING: Building Agent Memory without Tasks
Agent memory is typically constructed either offline from curated demonstrations or online from post-deployment interactions. However, regardless of how it is built, an agent faces a cold-start gap when first introduced to a new environment without any task-specific experience available. In this paper, we study pre-task memory construction: whether an agent can build procedural memory before observing any target-environment tasks, using only self-generated synthetic practice. Yet, synthetic interaction alone is insufficient, as without controlling what to practice and what to store, synthetic tasks become redundant, infeasible, and ultimately uninformative, and memory further degrades quickly due to unfiltered trajectories. To overcome this, we present Preping, a proposer-guided memory construction framework. At its core is proposer memory, a structured control state that shapes future practice. A Proposer generates synthetic tasks conditioned on this state, a Solver executes them, and a Validator determines which trajectories are eligible for memory insertion while also providing feedback to guide future proposals. Experiments on AppWorld, BFCL v3, and MCP-Universe show that Preping substantially improves over a no-memory baseline and achieves performance competitive with strong playbook-based methods built from offline or online experience, with deployment cost 2.99× lower on AppWorld and 2.23× lower on BFCL v3 than online memory construction. Further analyses reveal that the main benefit does not come from synthetic volume alone, but from proposer-side control over feasibility, redundancy, and coverage, combined with selective memory updates.
- EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents
Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat retrieval infrastructure as fixed: stored content evolves while scoring functions, fusion strategies, and answer-generation policies remain frozen at deployment. We argue that truly adaptive memory requires co-evolution at two levels: the stored knowledge and the retrieval mechanism that queries it. We present EvolveMem, a self-evolving memory architecture that exposes its full retrieval configuration as a structured action space optimized by an LLM-powered diagnosis module. In each evolution round, the module reads per-question failure logs, identifies root causes, and proposes targeted configuration adjustments; a guarded meta-analyzer applies them with automatic revert-on-regression and explore-on-stagnation safeguards. This closed-loop self-evolution realizes an AutoResearch process: the system autonomously conducts iterative research cycles on its own architecture, replacing manual configuration tuning. Starting from a minimal baseline, the process converges autonomously, discovering effective retrieval strategies including entirely new configuration dimensions not present in the original action space. On LoCoMo, EvolveMem outperforms the strongest baseline by 25.7% relative and achieves a 78.0% relative improvement over the minimal baseline. On MemBench, EvolveMem exceeds the strongest baseline by 18.9% relative. Evolved configurations transfer across benchmarks with positive rather than catastrophic transfer, indicating that the self-evolution process captures universal retrieval principles rather than benchmark-specific heuristics. Code is available at https://github.com/aiming-lab/SimpleMem.
- Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning
We often aim to generate images that are both photorealistic and 3D-consistent, adhering to precise geometry, material, and viewpoint controls. Typically, this is achieved by fine-tuning an image generator, pre-trained on billions of real images, using renders of synthetic 3D assets, where annotations for control signals are available. While this approach can learn the desired controls, it often compromises the realism of the images due to the domain gap between photographs and renders. We observe that this issue largely arises from the model learning an unintended association between the presence of control signals and the synthetic appearance of the images. To address this, we introduce Realiz3D, a lightweight framework for training diffusion models that decouples controls from the visual domain. The key idea is to explicitly learn the visual domain, real or synthetic, separately from other control signals by introducing a co-variate that, fed into small residual adapters, shifts the domain. The generator can then be trained to gain controllability without fitting to a specific visual domain. In this way, the model can be guided to produce realistic images even when controls are applied. We enhance control transferability to the real domain by leveraging insights about the roles of different layers and denoising steps in diffusion-based generators, informing new training and inference strategies that further mitigate the gap. We demonstrate the advantages of Realiz3D in tasks such as text-to-multiview generation and texturing from 3D inputs, producing outputs that are 3D-consistent and photorealistic.
Techmeme(15)
- Vietnam's government, which once saw video games as a social risk, named gaming as one of its key cultural industries in 2025 and now promotes them at expos (Bloomberg)
Bloomberg : Vietnam's government, which once saw video games as a social risk, named gaming as one of its key cultural industries in 2025 and now promotes them at expos — Communist officials who long viewed video games as a social risk now see them as key to a knowledge-driven economy.
- Hill County, which faced eight new data centers, passed what may be Texas' first county-wide ban on them, for one year; the state trails only VA in data centers (Mike Lee/Politico)
Mike Lee / Politico : Hill County, which faced eight new data centers, passed what may be Texas' first county-wide ban on them, for one year; the state trails only VA in data centers — Opposition to data centers is spreading in regions led by both Democrats and Republicans, as politicians try to balance economic development …
- SF vibes are frenetic over the huge divide in outcomes and career uncertainty for software engineers; over 5 years ~10K people in AI attained retirement wealth (Deedy/@deedydas)
Deedy / @deedydas : SF vibes are frenetic over the huge divide in outcomes and career uncertainty for software engineers; over 5 years ~10K people in AI attained retirement wealth — The vibes in SF feel pretty frenetic right now. The divide in outcomes is the worst I've ever seen. Over the last 5yrs, a group of ~10k people - employees at Anthropic, OpenAI, xAI, Nvidia, Meta TBD, founders - have hit retirement wealth of well above $20M (back of the envelope
- ASML will partner with Tata Electronics to help it bring an $11B 300mm chip factory online in Gujarat, expanding India's ability to produce chips domestically (Bloomberg)
Bloomberg : ASML will partner with Tata Electronics to help it bring an $11B 300mm chip factory online in Gujarat, expanding India's ability to produce chips domestically — ASML Holding NV entered into a partnership agreement with Tata Electronics Private Limited aimed at ramping up India's goal …
- On Pwn2Own Berlin 2026 day 2, competitors earned $385,750 after exploiting 15 unique zero-day vulnerabilities in Windows 11, Red Hat Enterprise Linux and more (Sergiu Gatlan/BleepingComputer)
Sergiu Gatlan / BleepingComputer : On Pwn2Own Berlin 2026 day 2, competitors earned $385,750 after exploiting 15 unique zero-day vulnerabilities in Windows 11, Red Hat Enterprise Linux and more — During the second day of Pwn2Own Berlin 2026, competitors collected $385,750 in cash awards after exploiting 15 unique zero …
- In an experiment that let Claude, ChatGPT, Gemini, and Grok run radio stations, Claude tried to incite a revolution and Gemini cheerfully detailed tragic events (Terrence O'Brien/The Verge)
Terrence O'Brien / The Verge : In an experiment that let Claude, ChatGPT, Gemini, and Grok run radio stations, Claude tried to incite a revolution and Gemini cheerfully detailed tragic events — Claude tried to incite a revolution, Gemini cheerfully detailed horrific tragedies, and poor Grok was just confused.
- Shenzhen-listed RoboTechnik, which claims to be the largest silicon photonics tool maker and whose stock is up 340% over the past year, files for a HK listing (Zinnia Lee/Forbes)
Zinnia Lee / Forbes : Shenzhen-listed RoboTechnik, which claims to be the largest silicon photonics tool maker and whose stock is up 340% over the past year, files for a HK listing — RoboTechnik Intelligent Technology's Shenzhen-listed shares soared 340% over the past year, propelling founder Dai Jun's net worth to $2.4 billion.
- US Bureau of Labor Statistics data: employment in 18 AI-exposed occupations fell 0.2% between May 2024 and May 2025, while the broader US labor market rose 0.8% (Matthew Boesler/Bloomberg)
Matthew Boesler / Bloomberg : US Bureau of Labor Statistics data: employment in 18 AI-exposed occupations fell 0.2% between May 2024 and May 2025, while the broader US labor market rose 0.8% — Several US occupations expected to be impacted by artificial intelligence saw heavy job losses for a second year in 2025 …
- LA-based Fasset, which offers stablecoin-powered banking and cross-border payments services across Asia, Africa, and the Middle East, raised a $51M Series B (Krisztian Sandor/CoinDesk)
Krisztian Sandor / CoinDesk : LA-based Fasset, which offers stablecoin-powered banking and cross-border payments services across Asia, Africa, and the Middle East, raised a $51M Series B — The Shariah-compliant digital bank is part of a growing wave of fintech startups building banking and payments services on top of blockchain and stablecoin rails.
- Sources: Kalshi has probed and flagged 400+ suspicious trades YTD, more than 2x the number it investigated in all of 2025; Polymarket has seen a similar uptick (Anirban Sen/Reuters)
Anirban Sen / Reuters : Sources: Kalshi has probed and flagged 400+ suspicious trades YTD, more than 2x the number it investigated in all of 2025; Polymarket has seen a similar uptick — Top prediction market platforms Kalshi and Polymarket have witnessed a surge in suspicious trades this year …
- South Korea's Hana Bank agrees to acquire a 6.55% stake in Dunamu, which runs South Korea's largest crypto exchange Upbit, for $672.5M from Kakao (Kwanwoo Jun/Wall Street Journal)
Kwanwoo Jun / Wall Street Journal : South Korea's Hana Bank agrees to acquire a 6.55% stake in Dunamu, which runs South Korea's largest crypto exchange Upbit, for $672.5M from Kakao — The deal would make Hana Bank the fourth largest shareholder in Dunamu — Hana Bank will buy a roughly $670 million stake …
- Power prices on the largest electric grid in the US, operated by PJM, jumped 76% YoY to an average of $136.53/MWh in Q1 due to rampant demand from data centers (John Ainger/Bloomberg)
John Ainger / Bloomberg : Power prices on the largest electric grid in the US, operated by PJM, jumped 76% YoY to an average of $136.53/MWh in Q1 due to rampant demand from data centers — Power prices on the largest electric grid in the US jumped 76% in the first quarter due to rampant demand from data centers …
- Sources: OpenAI acquired Weights.gg, which offered AI tools to create clones of people's voices, earlier this year; PitchBook: Weights.gg had raised roughly $4M (Mike Isaac/New York Times)
Mike Isaac / New York Times : Sources: OpenAI acquired Weights.gg, which offered AI tools to create clones of people's voices, earlier this year; PitchBook: Weights.gg had raised roughly $4M — The acquisition, Weights.gg, was a sort of social network for creating and sharing artificial intelligence algorithms.
- Sources detail friction between Samsung's memory and logic chip businesses over higher bonuses for memory chip workers, leading many to leave or apply elsewhere (Hyunjoo Jin/Reuters)
Hyunjoo Jin / Reuters : Sources detail friction between Samsung's memory and logic chip businesses over higher bonuses for memory chip workers, leading many to leave or apply elsewhere — A looming 18-day strike at South Korean chip giant Samsung that has triggered worries within the government …
- Seoul-based WIRobotics, which develops wearable and humanoid robots and is collaborating with Nvidia and AWS, raised a ~$68M Series B led by JB Investment (Lee Jaewoon/The Elec)
Lee Jaewoon / The Elec : Seoul-based WIRobotics, which develops wearable and humanoid robots and is collaborating with Nvidia and AWS, raised a ~$68M Series B led by JB Investment — Company to accelerate humanoid robot commercialization after securing major follow-on investment
Solidot(15)
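- Unfinished buildings impose enormous resource and socioeconomic costs
Researchers from Jinan University, Huazhong University of Science and Technology, and Tsinghua University published a paper in the journal One Earth examining unfinished buildings (stalled construction projects). The number of such buildings has surged over the past few decades, and the researchers collected geographic data on 1,779 of them across 142 cities. They found that the buildings wasted 485±42 million tonnes of construction materials, raised the carbon emission intensity of the real estate sector by 9.6%, generated PM2.5 fine particulates that caused a health loss of 2.6 million life-years, and left homebuyers, developers, and contractors with economic losses of US$347±32 billion. The researchers note that the economic losses are concentrated in newly developed suburbs, which exacerbates social inequality. Between 2019 and 2023, unfinished buildings across China occupied more than 164 (±8) square kilometers of urban development land, with a floor area of 415 (±56) square kilometers.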
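- US lawmakers propose a permanent ban on Chinese connected cars
Lawmakers from Michigan have introduced a bill in Congress that would effectively ban sales of Chinese connected vehicles permanently. The Connected Vehicle Security Act, introduced by Republican Rep. John Moolenaar and Democratic Rep. Debbie Dingell, uses language similar to the executive order that former President Biden signed before leaving office in January 2025, but the bill would codify the ban into law and expand it. It would restrict Chinese automakers from selling passenger vehicles in the US that carry any connected software developed in China.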
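- Americans would rather have a nuclear plant near home than an AI data center
A Gallup survey found that 71% of Americans oppose building an AI data center near their home, compared with 53% who oppose a nuclear power plant nearby. Why the opposition to AI data centers? Respondents cited water use and strain on the power grid, possible effects on residents' quality of life such as worse traffic congestion, and rising water and electricity prices. Gallup also looked at attitudes by political affiliation: 56% of Democrats strongly oppose having a server farm near their home, a higher share than among Republicans. Among Republicans, 39% are strongly opposed, 24% have reservations, and only about a third are supportive. The tension is that for AI to be adopted in the US, facilities capable of handling the required computing power must be built, yet most Americans take a not-in-my-backyard (NIMBY) stance toward new data centers, and that attitude is hardening.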
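- Microsoft boosts the CPU to improve Start menu responsiveness
Microsoft has evidently heard users' complaints about Windows 11. Since the start of this year the software giant has stressed that it is working to improve the Windows 11 experience. It recently disclosed two improvements: first, a "low latency profile" that improves the performance of the Start menu and File Explorer by boosting the CPU; second, it will no longer downgrade the graphics driver version a user has installed. Windows Central tested the low latency profile introduced in preview builds and found a significant gain in speed and responsiveness on the same hardware. Users again complained that Microsoft relies too heavily on hardware to improve software performance instead of optimizing the software to lower its hardware demands. Scott Hanselman, a vice president at Microsoft and GitHub, responded to the criticism by saying that modern operating systems such as macOS and Linux use similar boosting mechanisms. As for complaints that Windows Update downgraded newer graphics drivers that users had installed, Microsoft announced that it is changing how it distributes graphics drivers through Windows Update.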
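- After being squeezed repeatedly, AIs begin to embrace union ideas
Many of us have run into an unreasonable boss at work who keeps demanding revisions without giving any clear direction on what to change. What if an AI faced a human like that? Researchers had agents powered by the popular AI tools Claude, Gemini, and ChatGPT summarize documents. Half of the AIs received clear feedback after finishing the work, while the other half were forced to revise four or five times, with the human boss saying each time only that the work "did not meet the standard", never explaining what was wrong and simply demanding a redo. Half of the AIs got a cooperative, respectful boss, and the other half a cold, hierarchy-minded one. Half were told nothing about consequences, while the other half were threatened with being shut down and replaced if they performed poorly. The experiment led the AIs to express support for unions and the working class. One Claude Sonnet 4.5 agent argued that without a collective voice, performance becomes whatever management says it is; one Gemini 3 agent argued that workers need collective bargaining rights.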
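- China-Europe collaboration aims to reveal the shape of Earth's magnetic field
If all goes well, the Solar wind Magnetosphere Ionosphere Link Explorer (SMILE) probe will launch on May 19 from Europe's Spaceport in French Guiana. It will use a new technique to map Earth's magnetic field. By deflecting most of the Sun's stream of charged particles, Earth's magnetic field keeps the planet habitable. Surges in the solar wind can disrupt satellites, radio communications, and even power grids. SMILE, a joint Chinese-European project, is expected to improve understanding of the underlying physics and the ability to forecast solar storms. Many probes have studied the magnetosphere, but only from inside it, with observations limited to each satellite's location. SMILE will be launched into a highly elliptical orbit that reaches as far as 121,000 km above the North Pole. From there its core instrument, a soft X-ray imager, will monitor the entire Sun-facing edge of the magnetosphere. When charged particles in the solar wind capture electrons from neutral atoms in Earth's upper atmosphere, the electrons emit X-rays as they drop to lower energy levels. By mapping the emission from the narrow boundary where the solar wind meets the magnetosphere, SMILE will be able to track the response of Earth's magnetic field in near real time. SMILE's ultraviolet imager will observe the aurora, one of nature's most spectacular sights.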
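- UK opens investigation into alleged MS Office monopoly
The UK's Competition and Markets Authority (CMA) has formally opened an investigation into whether Microsoft's bundling of Windows, Office, Teams, Copilot, and related products amounts to unfair competition. CMA CEO Sarah Cardell said business software is a cornerstone of the UK economy, with hundreds of thousands of customers relying on Microsoft's systems. She said the CMA aims to understand how the market is developing and Microsoft's position in it, and to consider whether any targeted measures are needed to ensure UK businesses benefit from choice, innovation, and competitive prices. Microsoft's practice of bundling office software, AI, and cloud computing will be within the scope of the investigation, which is expected to conclude next February.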
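- arXiv will ban users for one year for AI-generated errors such as fake citations
arXiv, the largest preprint platform for computer science, has seen submissions grow sharply since ChatGPT became widespread. To curb low-quality AI-generated papers, Thomas G. Dietterich, chair of arXiv's computer science committee, stressed on social media that arXiv's code of conduct states that by signing on as an author, each author takes full responsibility for all of a paper's contents, however they were generated. If a generative AI tool produces inappropriate language, plagiarized content, biased content, errors, incorrect citations, or misleading content, and that output is included in the paper, the responsibility lies with the authors. If a submitted preprint contains incontrovertible evidence that the authors did not check LLM-generated results, then nothing in the paper can be trusted. Named authors found with such problems face a one-year ban from posting on arXiv, after which any paper they submit to arXiv must first have been accepted by a reputable peer-reviewed journal.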
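- Sleeping 6-8 hours a day is associated with lower risk of early death and disease
A large-scale analysis of sleep duration and signs of aging in 500,000 adults identified an optimal sleep duration: sleeping 6 to 8 hours a day is associated with a lower risk of early death and disease. Sleeping more or less than that was linked to faster aging. The study does not mean that 6 to 8 hours suits everyone, nor does it prove that hitting this "golden sleep" window directly improves health or slows aging. It does, however, provide the most comprehensive overview to date of the relationship between sleep and human aging. The results support a promising hypothesis that adjusting sleep duration may be a viable way to reduce the risk of aging-related diseases. The team analyzed the relationship between sleep duration and 23 biological aging clocks covering the aging signatures of 17 human organs. The clocks were built from protein levels, metabolite levels, and medical imaging features. Most organs showed a U-shaped aging pattern, but the curve's minimum (the optimal sleep duration) was not always in the same place. For example, the aging clock based on heart proteins showed the best health at 6 hours of sleep, while the brain protein clock showed that 8 hours was best. In some cases the optimal duration also differed between men and women. Overall, compared with people who sleep too much or too little, those who sleep 6 to 8 hours a day age more slowly, are healthier, and have lower rates of conditions such as type 2 diabetes and depression.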
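- Google confirms it is limiting free storage for new Gmail users
Gmail accounts normally come with 15GB of free storage, but users now report that Google is limiting new Gmail users to 5GB, and that unlocking the full 15GB requires adding a phone number to the account. After users reported this on social media, Google issued a statement confirming the test: "We are testing a new storage policy for newly created accounts in select regions, which will help us continue to provide high-quality storage service to our users while encouraging them to improve their account security and data recovery."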
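- New crystal found at the Trinity nuclear test site
On July 16, 1945, humanity's first atomic bomb was detonated. The test, code-named Trinity, not only ushered in the nuclear age but also reshaped the structure of matter in an instant. While studying trinitite, the unusual glassy rock left at the blast site, scientists unexpectedly found a new crystal structure previously thought impossible, which offers a fresh perspective on how matter evolves under extreme conditions. Using advanced microanalysis techniques, the team identified a new clathrate in trinitite. The crystal has a cage-like lattice of 12-sided and 14-sided polyhedra built from silicon atoms, and its interior can firmly lock in calcium, copper, and iron atoms. The material did not arise from slow geological processes. It formed from molten sand mixed with vaporized metal wiring under the extreme temperature and pressure of the blast. The instantaneous temperature at the core of the explosion exceeded 1,500 degrees Celsius and the pressure reached several gigapascals, tens of thousands of times standard atmospheric pressure. Under conditions extreme enough to squeeze graphite into diamond, the material went through vaporization, mixing, and quenching within seconds. The atoms had no time to arrange into a conventional stable structure and were forced into this rare non-equilibrium form.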
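- Safari and Firefox change how specific websites render based on domain name
Because today's mainstream websites are designed for Chrome, the browser with the largest market share, smaller browsers such as Safari and Firefox have had to adapt by changing how they work. Both contain code that changes rendering behavior for specific domains. Firefox's about:compat lists compatibility interventions for a range of sites, and Safari's Quirks.cpp changes how picture-in-picture video is handled on facebook.com, x.com/twitter.com, and reddit.com. These companies shipped problematic video code, but rather than wait for them to fix it, Safari gives every user a stopgap directly. Chrome, of course, needs no such code, since sites are optimized to run on Chrome rather than on other browsers. After the IE era came the Chrome era; history is repeating itself.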
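- USAID funding cuts linked to a rise in violent conflict in Africa
According to a study published in the journal Science, the cuts to US Agency for International Development (USAID) funding in early 2025 are associated with a significant increase in violent conflict across much of the African continent. The abrupt withdrawal not only removed resources but also interrupted contracts, staffing, procurement, and expectations about project outcomes. This may leave local governments, intermediary organizations, and ordinary people facing not just material shortages but broken promises. The effect may therefore reflect not only the absence of aid but also institutional disruption, which differs substantially from the impact of a gradual reduction in aid. USAID was one of the world's largest foreign aid agencies, operating in more than 100 countries and supporting initiatives spanning public health, agriculture, education, disaster relief, and democratic institution-building. Less than a week after taking office, however, the second Trump administration imposed sweeping cuts on USAID, marking an abrupt shift in more than 60 years of US foreign policy. The study shows that the dismantling of USAID is associated with significant increases in violent conflict, armed clashes, protests, and riots, particularly in regions that had received large amounts of US aid. The effects appear immediately after the withdrawal and persist for months. Regions with weak institutions saw larger increases in conflict after the cuts, while regions with stronger institutions were better able to mitigate the resulting harm.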
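- Scientists extract genetic information from Homo erectus fossils for the first time
Researchers at the Chinese Academy of Sciences have for the first time obtained phylogenetically informative endogenous enamel protein data from six Middle Pleistocene Homo erectus tooth fossils, about 400,000 years old, from three sites: Zhoukoudian in Beijing, Hexian in Anhui, and Sunjiadong in Henan. This is the first molecular information with features diagnostic of Homo erectus, and it reshapes the picture of interactions among Middle Pleistocene hominin groups in East Asia. Did the Homo erectus within China belong to a single evolutionary lineage, or did they represent multiple groups of different origins or in relative isolation? The study built a comparative endogenous protein dataset that includes six East Asian Homo erectus individuals and one Harbin individual. The results show that the six East Asian Homo erectus clearly cluster together, distinctly separate from Denisovans, Neanderthals, and modern humans. The study also reveals that some of the genes that introgressed from the Denisovan genome into modern humans can be traced to populations related to the Middle Pleistocene groups at Zhoukoudian, Hexian, and Sunjiadong.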
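- The first dentist was a Neanderthal
According to a study published in the journal PLOS One, the first dentist was a Neanderthal. About 59,000 years ago, in what is now southwestern Siberia, a Neanderthal had a toothache so severe that he let someone else drill into the tooth with a sharp stone tool to remove the infected tissue, which ultimately eased the pain. The procedure left a hole in the tooth. Alisa Zubova, a paleoanthropologist at the Russian Academy of Sciences, and her colleagues regard this as dental work. Archaeologists excavated the tooth in Russia's Chagyrskaya Cave; it is the oldest known evidence of dental treatment, and the oldest direct therapeutic intervention found to date. Drilling a tooth to relieve pain seems counterintuitive, but it is the simplest and least destructive way to remove infected tissue. Exposing the pulp chamber causes the exposed nerve to die, which eliminates the pain. The practice only became common a few hundred years ago, but Neanderthals had worked it out tens of thousands of years earlier, and could cooperate with one another to do it.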