Monthly Digest — 2026-06

131 unique stories across 30 days and 8 sources.

Hacker News(24)

  1. Florida sues OpenAI and Sam Altman over AI risks (www.politico.com)
  2. GitHub and the crime against software (eblog.fly.dev)
  3. Should you normalize RGB values by 255 or 256? (30fps.net)
  4. AI Agent Guidelines for CS336 at Stanford (github.com)
  5. MAI-Thinking-1 (microsoft.ai)
  6. Gmail thinks I'm stupid, so I left (moddedbear.com)
  7. MAI-Code-1-Flash (microsoft.ai)
  8. Morningstar values SpaceX at $780B, half its IPO target (www.reuters.com)
  9. Elixir v1.20: Now a gradually typed language (elixir-lang.org)
  10. I was recently diagnosed with anti-NMDA receptor encephalitis (burntsushi.net)
  11. MacBook Neo is so popular that Apple doubled production (www.macrumors.com)
  12. Gemma 4 12B: A unified, encoder-free multimodal model (blog.google)
  13. Anthropic's open-source framework for AI-powered vulnerability discovery (github.com)
  14. Meta ships facial recognition on smart glasses (www.buchodi.com)
  15. The desperation of NYTimes (rozumem.xyz)
  16. Sagrada Família Lego set (www.lego.com)
  17. Gov.uk has replaced Stripe with Dutch provider Adyen (www.theregister.com)
  18. Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency (blog.google)
  19. New method turns ocean water into drinking water, without waste (www.rochester.edu)
  20. pg_durable: Microsoft open sources in-database durable execution (github.com)

GitHub Trending(15)

  1. microsoft / markitdown
  2. nesquena / hermes-webui
  3. supermemoryai / supermemory
  4. harry0703 / MoneyPrinterTurbo
  5. chopratejas / headroom
  6. affaan-m / ECC
  7. D4Vinci / Scrapling
  8. aquasecurity / trivy
  9. NousResearch / hermes-agent
  10. PaddlePaddle / PaddleOCR
  11. CopilotKit / CopilotKit
  12. lfnovo / open-notebook
  13. mvanhorn / last30days-skill
  14. MemPalace / mempalace
  15. danielmiessler / Personal_AI_Infrastructure

Product Hunt(24)

  1. R0Y OMNI 1.0

    Generate more accurate investment dashboards and reports

  2. Stella

    Local natural language search across all your files

  3. Sentinel

    Control your robots from anywhere in the world

  4. Tokenwise

    A smart LLM proxy that shows where you're overpaying

  5. Gusto Cofounder

    If Gusto, OpenClaw, and Claude Cowork had a baby...

  6. Knock agent for Slack

    Build, manage, and ship customer messaging from Slack

  7. Branda

    A fun new way to create & manage brands.

  8. choclift

    Use iPhone to open apps, Apple Shortcuts and websites on Mac

  9. Brand Context API

    Ship AI that stays on-brand

  10. Composer

    Multiplayer markdown for you, your team, and your agents.

  11. Town

    The assistant that learns how you work, then gets to work.

  12. Barflare

    Cloudflare Tunnels, managed from your menu bar

  13. Sun

    Collaborative voice API for agents

  14. Extella.AI

    Agentic platform that evolves & builds reusable systems

  15. ChatPilot

    Bulk delete, archive & timestamp your ChatGPT conversations

  16. Build Club Campus

    Virtual AI School: Upskill in AI and Become Great at it Fast

  17. Agent Browser Shield

    Block prompt inject & cut token costs for AI browser agents

  18. Nemotron 3 Ultra by NVIDIA

    Powers faster, efficient reasoning for long-running agents

  19. Leni

    The world’s most accurate AI for investors

  20. Lumo Studios

    Build Decks that Speak for Themselves

Hugging Face(20)

  1. GrepSeek: Training Search Agents for Direct Corpus Interaction

    Large Language Model (LLM) search agents have shown strong promise for knowledge-intensive language tasks through multiple rounds of reasoning and information retrieval. Most existing systems access information using a retriever that takes a keyword or natural language query and returns a ranked list of documents using an index of pre-computed document representations. In this work, we explore a complementary perspective in which the search agent treats the corpus itself as the search environment and finds evidence by issuing executable shell commands. We introduce GrepSeek, an optimized direct corpus interaction (DCI) search agent that trains a compact search agent to find, filter, and compose evidence from large text corpora. To address the instability of learning behavior directly with reinforcement learning on large corpora, we propose a two-stage training pipeline. First, we construct a cold-start dataset using an answer-aware Tutor and answer-blind Planner to generate verified, causally grounded search trajectories. Second, we refine the initialized policy with Group Relative Policy Optimization (GRPO), allowing the agent to improve its task-oriented search behavior through direct interaction with the corpus. To make DCI practical at scale, we further use a semantics-preserving sharded-parallel execution engine that accelerates shell-based retrieval by up to 7.6× while preserving byte-exact equivalence with sequential execution of the shell command. Experiments across seven open-domain question answering benchmarks show that GrepSeek achieves the strongest overall token-level F1 and Exact Match. Our analysis also highlights the limitations of purely lexical interaction on queries with substantial surface-form variation, suggesting DCI as a practical and competitive method for search agents that can complement existing retrieval paradigms in the real world.
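
    The sharded-parallel idea in the abstract can be illustrated with a toy, in-memory grep. This is a minimal sketch, not GrepSeek's engine: the function names and the thread-based sharding are assumptions made for illustration, and it only shows why a per-line filter gives the same output when shard results are concatenated in shard order.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def grep_lines(lines, pattern):
    """Sequential reference: keep the lines that match the regex, in order."""
    rx = re.compile(pattern)
    return [ln for ln in lines if rx.search(ln)]


def sharded_grep(lines, pattern, num_shards=4):
    """Hypothetical sharded variant: grep contiguous shards in parallel,
    then concatenate the shard outputs in shard order."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    size = max(1, -(-len(lines) // num_shards))  # ceiling division
    shards = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        # pool.map returns results in input order, so line order is preserved.
        parts = list(pool.map(lambda shard: grep_lines(shard, pattern), shards))
    return [ln for part in parts for ln in part]


corpus = [f"doc {i}: {'alpha' if i % 3 == 0 else 'beta'}" for i in range(20)]
assert sharded_grep(corpus, r"alpha") == grep_lines(corpus, r"alpha")
```

    Equivalence holds here because the filter looks at one line at a time. Commands that aggregate across lines (counting, sorting, deduplicating) would need an explicit merge step, which this sketch does not attempt.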

  2. COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

    LLM agents are increasingly expected not only to complete isolated tasks, but also to carry bounded representations of human expertise, judgment, and interaction style. Building such person-grounded agents remains difficult because actionable knowledge associated with a person or role is usually embedded in heterogeneous traces rather than written as clean instructions. Existing memory and persona systems capture fragments of this evidence, while skill frameworks provide portable packaging formats; however, there is no end-to-end workflow for distilling these traces into inspectable, correctable, and agent-usable skills. We present an automated trace-to-skill distillation system for generating person-grounded AI skills via expert knowledge distillation. Given materials from a target person or role, COLLEAGUE.SKILL produces a versioned skill package with two coordinated tracks: a capability track for practices, mental models, and decision heuristics, and a bounded behavior track for communication style, interaction rules, and correction history. The package can be inspected, invoked, updated through natural-language feedback, rolled back, installed across agent hosts, and optionally prepared for controlled distribution. We describe the artifact contract, generation workflow, correction lifecycle, deployment surface, and domain presets implemented in the open-source system. At the time of writing, the public repository has approximately 18.5k GitHub stars; the gallery lists 215 skills from 165 contributors and more than 100k cumulative stars across listed skill cards. The system illustrates how person-grounded skills can be represented as portable, correctable packages rather than opaque prompts or hidden memories.

  3. Trust-Region Behavior Blending for On-Policy Distillation

    On-policy distillation (OPD) trains a student on prefixes sampled from its own policy while matching a stronger teacher. This addresses the prefix mismatch of offline distillation, but early student rollouts can still be poor, placing teacher supervision on weak or low-quality prefixes. We propose Trust-Region behavior Blending (TRB), a warmup method that replaces the early rollout policy with the closest-to-teacher behavior policy inside a student-centered KL trust region, while keeping the per-prefix reverse-KL OPD loss unchanged. The KL budget is annealed to zero, so training returns to pure student rollouts after warmup. Across two math-reasoning distillation settings, TRB attains the strongest average among the compared methods.
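
    One way to picture "closest to the teacher inside a student-centered KL trust region" is a geometric blend of the two next-token distributions, with the mixing weight chosen by bisection so the blend stays within the KL budget. The sketch below is an assumption-heavy illustration: the abstract does not state the KL direction or the solver, so the reverse-KL constraint, the geometric form, and all function names are hypothetical. It also assumes strictly positive probabilities.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def geometric_blend(student, teacher, lam):
    """Normalized student^(1 - lam) * teacher^lam; lam=0 is the student."""
    weights = [s ** (1 - lam) * t ** lam for s, t in zip(student, teacher)]
    total = sum(weights)
    return [w / total for w in weights]


def blend_within_budget(student, teacher, eps, iters=50):
    """Largest blend toward the teacher with KL(blend || student) <= eps
    (assumed constraint form), found by bisection on lam."""
    if kl(teacher, student) <= eps:
        return list(teacher)  # the teacher itself is inside the region
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl(geometric_blend(student, teacher, mid), student) <= eps:
            lo = mid
        else:
            hi = mid
    return geometric_blend(student, teacher, lo)


student = [0.7, 0.2, 0.1]
teacher = [0.1, 0.2, 0.7]
behavior = blend_within_budget(student, teacher, eps=0.05)
```

    With the budget set to zero the blend collapses back to the student distribution, which matches the abstract's description of annealing the KL budget so that training returns to pure student rollouts.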

  4. SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue

    Zero-shot text-to-speech (TTS) has improved substantially for single-speaker synthesis, yet expressive long-form multi-speaker dialogue remains difficult. A common workaround is to synthesize each turn with a monologue TTS model and stitch the outputs together. This adds inference cost and often breaks acoustic consistency, conversational coherence, and affective continuity across turns. Recent dialogue TTS systems have begun to address this setting, but they still struggle to keep expressive coherence, controllable speaker switching, and monologue quality at the same time. We present SwanData-Speech and SwanVoice. SwanData-Speech builds monologue and dialogue corpora from in-the-wild audio, using Swan Forced Aligner for pause-aware word-level alignment and RobustMegaTTS3 for pronunciation-hard cases. Built on these data, SwanVoice is a zero-shot TTS model for 1-4 speakers, combining a 25 Hz VAE, raw-text conditioning with pause-aware symbols and pinyin substitution, and a flow-matching DiT with speaker-turn conditioning. Training starts from monologue speech, moves through mixed and real dialogue data, and then uses DiffusionNFT post-training with phone-level and speaker-similarity rewards. On SwanBench-Speech, SwanVoice obtains higher richness and hierarchy scores than all evaluated open-source baselines in both monologue and dialogue settings, while content accuracy remains the main limitation. Audio demos are available at https://swanaigc.github.io//#swanvoice.

  5. A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

    As agent capabilities advance, existing benchmarks, such as τ²-Bench, are becoming increasingly saturated. Yet constructing new benchmark tasks remains complex, costly, and labor-intensive. Moreover, the standard approach, in which scenarios are first written in natural language and then mapped to tool sequences, captures only a narrow subset of the tool-use patterns agents exercise. In this paper, we address these problems by reversing the task construction process. We propose TASTE: Task Synthesis from Tool Sequence Evolution, an automatic method that generates challenging tasks with broader tool-use coverage. TASTE utilizes an Adaptive Contrastive n-gram model trained on LLM-judged validity signals. This enables sampling valid tool sequences that cover a vast range of tool combinations. TASTE then selects representative sequences from the pool via clustering, instantiates them into complete benchmark tasks, and refines them through iterative difficulty evolution. Using TASTE, we construct τᶜ-Bench, a challenging extension of the three domains of τ²-Bench. We evaluate 11 agent/user LLM pairs and find that models nearly saturating τ²-Bench suffer severe performance drops on our tasks (e.g., Gemini-3-Flash falls from 0.82-0.94 to 0.28-0.61). Beyond increasing difficulty, our generated tasks more than double the number of unique tool combinations agents must execute. Our results suggest high scores on existing benchmarks often reflect saturation rather than robust task-solving ability. By automating the generation of difficult, high-coverage benchmarks, TASTE enables continuous, scalable evaluation of future agents.

  6. Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

    Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which constraints remain open, and which claims have actually been checked. We argue that this formulation puts too much routine state management inside the policy: reinforcement learning is forced to optimize both semantic search decisions and recoverable bookkeeping that the environment can maintain more reliably. We introduce Harness-1, a 20B search agent (retrieval subagent) trained with reinforcement learning inside a stateful search harness. The harness maintains environment-side working memory, including a candidate pool, an importance-tagged curated set, compact evidence links, verification records, compressed and deduplicated observations, and budget-aware context rendering. The policy retains the semantic decisions: what to search, which documents to keep or discard, what to verify, and when to stop. Across eight retrieval benchmarks spanning web, finance, patents, and multi-hop QA, Harness-1 achieves 0.730 average curated recall, outperforming the next strongest open search subagent by +11.4 points and remaining competitive with much larger frontier-model searchers. Its gains are especially strong on held-out transfer benchmarks, suggesting that reinforcement learning over explicit search state can produce retrieval behaviors that generalize beyond the training domains. Our code is available at https://github.com/pat-jj/harness-1.

  7. Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding

    Speculative decoding accelerates LLM inference by drafting multiple tokens and verifying them in parallel with the target model. However, its practical speedup is constrained by the trade-off between draft quality and drafting cost: autoregressive drafters model causal dependencies among draft tokens but incur sequential overhead, while parallel drafters reduce drafting cost but weaken intra-block dependency modeling. In this paper, we propose Domino, a speculative decoding framework that decouples causal dependency modeling from expensive autoregressive draft execution. Domino first uses a parallel draft backbone to produce preliminary draft distributions for the entire block, and then applies a lightweight Domino head to refine them with prefix-dependent causal information. To stabilize teacher-forced causal encoding, we further introduce a base-anchored training curriculum that first strengthens the parallel backbone and then gradually shifts optimization toward the causally corrected final distribution. Experiments on Qwen3 models show that Domino achieves up to 5.49× end-to-end speedup under the Transformers backend and up to 5.8× throughput speedup under SGLang serving.
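
    For readers new to the draft-then-verify loop that the abstract builds on, here is a minimal sketch of greedy verification in generic speculative decoding. It does not model Domino's parallel backbone or its refinement head; `verify_draft` and `toy_target` are hypothetical names, and the target model is called once per position for clarity, whereas real systems score all draft positions in a single parallel pass.

```python
def verify_draft(prefix, draft, target_next):
    """Accept draft tokens while they match the target model's greedy choice.
    On the first mismatch, emit the target's token and stop."""
    accepted = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # the correction replaces the bad draft token
            return accepted
        accepted.append(tok)
    # Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def toy_target(tokens):
    """Stand-in target model: the next token is (last token + 1) mod 10."""
    return (tokens[-1] + 1) % 10


partial = verify_draft([1, 2, 3], [4, 5, 7], toy_target)  # third draft token is wrong
full = verify_draft([1, 2, 3], [4, 5, 6], toy_target)     # whole draft is right
```

    Each call always yields at least one token that the target model agrees with, so the output matches plain greedy decoding; the speedup depends on how many draft tokens survive verification, which is why draft quality matters.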

  8. Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

    Watermarking embeds statistical signatures in AI-generated text for detection and attribution. We reveal a fundamental vulnerability: when users access multiple models (today's reality), watermarks trivially fail. Watermarks perturb output distributions away from the original, and in competitive markets, these perturbations are typically independent across providers. We theoretically prove that averaging output probability distributions recovers the unwatermarked distribution with up to a second-order error term. Empirically, simply averaging 3-5 models cancels out these perturbations. We introduce WASH (Watermark Attenuation via Statistical Hybridisation), which solves practical challenges in ensemble generation: vocabulary misalignment and tokenisation differences across heterogeneous models. Experiments across six watermarking schemes and three LLMs show that averaging across 3 models suppresses detection z-scores from 5-300 to below 2 (below the detection threshold of 4) and reduces TPR at 5% FPR to below 50%, while improving quality by 27.5% and running 6 times faster than the best baseline on long-sequence generation. Our results suggest that robust AI-text detection via watermarking requires either accepting this fundamental vulnerability or unprecedented coordination among model providers.
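
    The averaging argument can be seen in a toy setting. The sketch below is not WASH: it assumes every provider shares one base distribution over a single vocabulary and that each watermark boosts an independent random "green" token subset, which sidesteps the vocabulary and tokenisation alignment the paper addresses. All names and the boost strength are illustrative assumptions.

```python
import math
import random


def watermark(base, green, delta):
    """Tilt probability mass toward the green token subset, then renormalize."""
    weights = [p * math.exp(delta) if i in green else p for i, p in enumerate(base)]
    total = sum(weights)
    return [w / total for w in weights]


def l1(p, q):
    """L1 distance between two distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))


def average(dists):
    """Token-wise mean of several distributions over the same vocabulary."""
    n = len(dists)
    return [sum(column) / n for column in zip(*dists)]


rng = random.Random(0)
vocab = 50
raw = [rng.random() + 0.1 for _ in range(vocab)]
raw_total = sum(raw)
base = [x / raw_total for x in raw]

# Three hypothetical providers, each with its own independent green list.
models = []
for _ in range(3):
    green = set(rng.sample(range(vocab), vocab // 2))
    models.append(watermark(base, green, delta=2.0))

single = sum(l1(m, base) for m in models) / len(models)  # mean per-model shift
ensemble = l1(average(models), base)                     # shift after averaging
```

    By convexity of the L1 norm, `ensemble` can never exceed `single`; when the green lists are independent, the per-token shifts partly cancel and the gap is typically large.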

  9. OCC-RAG: Optimal Cognitive Core for Faithful Question Answering

    Recent progress in the development of language models has been defined by scale, with each generation absorbing more of the world's knowledge into its weights. However, many practical applications benefit more from robust reasoning than from extensive parametric knowledge. In this setting, task-specialized small language models (SLMs) offer a principled design choice. We introduce Optimal Cognitive Core (OCC), a family of SLMs built around this premise. As a variant of OCC, we present OCC-RAG, optimized for faithful question answering (QA) grounded in the provided context. This task directly aligns with the OCC design approach, requiring multi-hop reasoning over supplied passages while ignoring memorized knowledge. To train OCC-RAG, we implement a novel pipeline for synthesizing multi-context, multi-hop QA data at scale, producing a corpus of over three million examples targeting multi-hop reasoning, strict context faithfulness, and calibrated abstention. We release OCC-RAG-0.6B and OCC-RAG-1.7B, both mid-trained on this corpus. The models produce structured reasoning traces with source citations grounded in literal quotes from the context. Through OCC-RAG, we demonstrate that compact, task-specialized SLMs can match or exceed general-purpose models 2-6x their size across multi-hop reasoning (HotpotQA, MuSiQue, TAT-QA), faithfulness (ConFiQA), and refusal (MuSiQue-Un) benchmarks.

  10. From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain

    Identifying which brain regions represent a visual concept in the human brain is a central challenge in neuroscience. Existing approaches have localized coarse functional regions (e.g., faces, places) through activation maximization, identifying regions that activate strongly for a target concept relative to other concepts. Yet strong activation alone does not establish that a region represents the concept itself, as responses may instead be driven by correlated visual or semantic cues. We introduce BrainCause, an automated framework that combines generative and brain models to synthesize controlled stimuli and validate neural representations through targeted causal testing. Given a query specifying a concept of interest, our framework constructs targeted stimulus sets comprising concept images, counterfactual edits that remove the target concept while preserving other image content, and images with candidate correlated distractors. It then uses an image-to-fMRI encoding model to predict brain responses and searches for representations that respond specifically to the target concept over correlated alternatives. BrainCause returns validated candidate representations and proposes follow-up fMRI experiments to further test or extend its discoveries. Our approach successfully recovers known functional localizations and identifies new candidate representations across dozens of concepts, validated on both predicted and measured fMRI data. Critically, we show that without causal validation, a large fraction of localizations would be false positives, confirming that activation alone is insufficient evidence of representation.

  11. Trust Region On-Policy Distillation

    On-Policy Distillation (OPD) is a fundamental technique for efficient post-training of large language models (LLMs), with broad applications in agent learning, multi-task enhancement, and model compression. However, OPD training becomes unstable when the teacher and student distributions differ substantially, as teacher supervision on student-generated tokens may yield unreliable policy gradients and even cause optimization failure. This work addresses reliable on-policy token-level supervision through credit assignment strategies, and proposes Trust Region On-Policy Distillation, TrOPD. It features the following characteristics: 1) Trust-Region On-Policy Learning: TrOPD performs OPD only in regions where the teacher provides reliable supervision, mitigating the optimization difficulty of the K1 reverse-KL estimator under distribution mismatch. 2) Outlier Estimation: For outlier regions, we explore gradient clipping, masking, and forward-KL estimation to reduce the adverse effects of unreliable supervision. 3) Off-Policy Guidance: The student continues generation from teacher prefixes and uses forward KL to imitate off-policy guidance, encouraging on-policy exploration toward reliable regions. Experiments show that TrOPD consistently outperforms SoTA OPD baselines, including OPD, EOPD, and REOPOLD, across mathematical reasoning, code generation, and general-domain benchmarks.

  12. Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

    We introduce Humanoid-GPT, a GPT-style Transformer with causal attention trained on a billion-scale motion corpus for whole-body control. Unlike prior shallow MLP trackers constrained by scarce data and an agility-generalization trade-off, Humanoid-GPT is pre-trained on a 2B-frame retargeted corpus that unifies all major mocap datasets with large-scale in-house recordings. Scaling both data and model capacity yields a single generative Transformer that tracks highly dynamic behaviors while achieving unprecedented zero-shot generalization to unseen motions and control tasks. Extensive experiments and scaling analyses show that our model establishes a new performance frontier, demonstrating robust zero-shot generalization to unseen tasks while simultaneously tracking highly dynamic and complex motions.

  13. Audio Interaction Model

    Audio is an inherently interactive modality, yet today's Large Audio Language Models (LALMs) are offline, and streaming audio models each handle only a single task such as streaming ASR or voice chatting. It is time to unify them into one online LALM: a model that, through an always-on perceive-decide-respond loop, listens to sound, environment, and instructions in real time and reacts on the fly. We formalize this regime as the Audio Interaction Model, and realize it with Audio-Interaction, a unified streaming model that retains offline task execution while adding online general audio instruction following, from dialogue to full voice chatting, deciding when to respond from the semantics of the stream. To enable this, we propose SoundFlow, a framework that instantiates the perceive-decide-respond loop end to end, from data to training to deployment, through streaming-native data construction, comprehension-aware training, and asynchronous low-latency inference for stable real-time interaction. We further construct StreamAudio-2M, a 2.6M-item streaming corpus spanning 7 fundamental abilities and 28 sub-tasks, and Proactive-Sound-Bench for evaluating proactive audio intervention. Across 8 benchmarks, Audio-Interaction preserves competitive performance on mainstream audio tasks while unlocking capabilities inaccessible to offline LALMs, including real-time ASR, streaming audio instruction following, and proactive help.

  14. Cosmos 3: Omnimodal World Models for Physical AI

    We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI, effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License (https://openmdw.ai/license/1-1/) at https://github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3 . The project website is available at https://research.nvidia.com/labs/cosmos-lab/cosmos3 .

  15. Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

    Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent succeeds, but not which parts of the trajectory make the answer unreliable. We study span-level error localization for deep-research agents. We collect 2,790 real trajectories from two agent frameworks, three backbone models, and three benchmarks, convert raw logs into semantic spans, and annotate harmful error spans through LLM-assisted expert review. From these annotations, we build TELBench, a 1,000-instance benchmark for identifying error spans among normal exploration, failed searches, tentative hypotheses, and harmless noise. We further propose DRIFT, a claim-centric auditing framework that tracks agent claims, checks their support in trajectory evidence, and marks spans where unsupported or conflicting claims affect the answer path. Experiments across model families and auditing frameworks show that DRIFT improves span-level error localization and first-error accuracy by up to 30 percentage points. Our work provides a process-level view of reliability in deep-research agents.

  16. Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

    Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to reward hacking and ineffective or unsafe training outcomes. In real-world rubric-based RL, such hacking behaviors are often subtle and entangled with multiple judge biases, making them difficult to analyze, detect, and mitigate. In this paper, we introduce CHERRL, a controllable hacking environment for rubric-based RL. By injecting known biases into LaaJ, CHERRL enables stable reproduction of reward hacking, explicit observation of reward divergence, and precise identification of hacking onset. This provides a clean experimental testbed for studying the mechanisms and mitigations of reward hacking in rubric-based RL. To demonstrate its utility, we analyze different judge biases from the perspectives of discoverability and exploitability, and explore an agent-based system for automatically detecting reward hacking onset from training logs. The code and environment are publicly available at https://github.com/THUAIS-Lab/CHERRL.

  17. ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?

    Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story progresses, not maintain a fixed persona. Existing benchmarks measure factual recall at a given chapter, not whether responses align with the character's psychological trajectory, especially in scenarios the source text never explores. We introduce ArcANE (Arc-Aware Narrative Evaluation), an automatically constructed benchmark spanning 17 novels and 80 principal characters. A Character Arc segments the narrative into phases along a psychological axis, and each probe poses the same scenario across phases, spanning both situations within the source text and situations beyond it. Across six models and six context modes, conditioning on the Character Arc tops every other context strategy on every model, and the gap is largest on scenarios outside the source text where retrieval has nothing to find. We further fine-tune open-weight models on the same data to obtain ArcANE-8B/32B, which widen the Arc advantage even more on scenarios outside the source text.

  18. TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration

    Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on explicit user requests, which surface only the problems the user has noticed, while many other important problems coexist, hidden in plain sight, within the broader user context, with their total number unknown in advance. We frame this as the task of discovering multiple hidden problems from context, in which coexisting problems should be uncovered, grounded in supporting evidence, and paired with concrete actions. To this end, we introduce TIDE, a template-guided iterative framework with two complementary mechanisms. Specifically, motivated by the observation that single-pass prediction anchors on the most salient cases and yields generic claims, we propose iterative discovery, which surfaces a small batch of candidates per round while conditioning on what has already been found, so subsequent rounds extend coverage; and thought templates, reusable schemas distilled from previously solved cases that specify what contextual signals to attend to and how to connect them, anchoring each prediction in a recognizable problem class. We validate TIDE on two realistic settings, personal workspaces and software repositories, across four model backbones, showing substantial gains over single-shot and parallel multi-agent baselines on task coverage, identification, and resolution.

  19. Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

    Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA, which are costly at repository scale and brittle to evolving codebases. We introduce Code2LoRA, a hypernetwork framework that generates repository-specific LoRA adapters, effectively injecting repository knowledge with zero inference-time token overhead. Code2LoRA supports two usage scenarios: Code2LoRA-Static converts a single repository snapshot into an adapter, suitable for comprehension of stable codebases; while Code2LoRA-Evo maintains an adapter backed by a GRU hidden state updated per code diff, suitable for active development of evolving codebases. To evaluate Code2LoRA against parameter-efficient fine-tuning baselines, we build RepoPeftBench, a benchmark of 604 Python repositories with two tracks: a static track with 40K training and 12K test assertion-completion tasks, and an evolution track with 215K commit-derived training and 87K commit-derived test tasks. On the static track, Code2LoRA-Static achieves 63.8% cross-repo and 66.2% in-repo exact match, matching the per-repository LoRA upper bound; on the evolution track, Code2LoRA-Evo achieves 60.3% cross-repo exact match (+5.2 pp over a single shared LoRA). Code2LoRA's code can be found at https://anonymous.4open.science/r/code2lora-6857; the model checkpoints and RepoPeftBench datasets can be found at https://huggingface.co/code2lora.

  20. AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints

    Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. However, existing benchmarks still underexplore adaptive planning under such progressively revealed dual constraints. To address this gap, we introduce AdaPlanBench, a dynamic interactive benchmark for evaluating whether Large Language Model (LLM) agents can adaptively plan and re-plan under progressively revealed world and user constraints. AdaPlanBench is built on 307 household tasks, with a scalable constraint construction pipeline that augments each task with dual constraints. At runtime, agents interact with the environment in a multi-turn protocol where hidden constraints are revealed only when the agent proposes a plan that violates them, requiring iterative plan revision under accumulating feedback. This makes planning challenging, as agents must infer and track constraints from feedback while re-planning effectively. Experiments on ten leading LLMs show that adaptive planning under dual constraints remains challenging, with the best model reaching only 67.75% accuracy. We further observe that performance degrades as more constraints accumulate, with user constraints posing a particularly large challenge and failures often stemming from weaker physical grounding and reduced effectiveness. These results establish AdaPlanBench as a testbed for dual-constrained interactive planning and highlight the challenge of reliable adaptation to dynamically revealed constraints in LLM agents.
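
    The multi-turn protocol described above, where a hidden constraint surfaces only after a plan violates it, can be sketched as a small loop. This is an illustrative assumption, not AdaPlanBench's harness: the function names, the constraint representation, the choice to reveal every violated constraint at once, and the toy household task are all hypothetical.

```python
def run_episode(propose_plan, hidden_constraints, max_turns=5):
    """Each constraint is (description, check); check(plan) is True when satisfied.
    A constraint is revealed only after a proposed plan violates it."""
    if max_turns < 1:
        raise ValueError("max_turns must be >= 1")
    revealed = []
    plan = None
    for turn in range(1, max_turns + 1):
        plan = propose_plan(list(revealed))
        violated = [desc for desc, check in hidden_constraints if not check(plan)]
        if not violated:
            return {"success": True, "turns": turn, "plan": plan, "revealed": revealed}
        for desc in violated:
            if desc not in revealed:
                revealed.append(desc)  # feedback accumulates across turns
    return {"success": False, "turns": max_turns, "plan": plan, "revealed": revealed}


# Toy dual constraints: one about the world, one about the user.
constraints = [
    ("the microwave is broken", lambda plan: "microwave" not in plan),
    ("the user avoids dairy", lambda plan: "milk" not in plan),
]


def toy_agent(revealed):
    """Stand-in agent that revises its default plan using revealed constraints."""
    plan = ["microwave", "milk", "cup"]
    if "the microwave is broken" in revealed:
        plan[0] = "stove"
    if "the user avoids dairy" in revealed:
        plan[1] = "oat drink"
    return plan


result = run_episode(toy_agent, constraints)
```

    The toy agent succeeds on its second turn because it applies every revealed constraint. The benchmark's difficulty comes from agents that must infer and track such constraints from feedback rather than look them up.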

Techmeme(24)

  1. Alphabet is raising $80B in equity offerings, including a $10B investment deal with Berkshire Hathaway, to help raise money for its AI spending plans (Bloomberg)

    Bloomberg : Alphabet is raising $80B in equity offerings, including a $10B investment deal with Berkshire Hathaway, to help raise money for its AI spending plans —  Google parent Alphabet Inc. is raising $80 billion in equity offerings, including an investment deal with Berkshire Hathaway Inc. …

  2. HPE reports Q2 revenue up 40% YoY to $10.7B, vs. $9.74B est., Server revenue up 33%, forecasts revenue for FY26 and FY27 above est.; HPE jumps 30%+ after hours (Brody Ford/Bloomberg)

    Brody Ford / Bloomberg : HPE reports Q2 revenue up 40% YoY to $10.7B, vs. $9.74B est., Server revenue up 33%, forecasts revenue for FY26 and FY27 above est.; HPE jumps 30%+ after hours —  Hewlett Packard Enterprise Co. shares soared in extended trading after the company gave an outlook for annual sales that topped estimates …

  3. Gigascale Capital, a climate tech VC firm co-founded by former Meta CTO Mike Schroepfer, closed a $250M fund to back early-stage startups supporting the AI boom (Michelle Ma/Bloomberg)

    Michelle Ma / Bloomberg : Gigascale Capital, a climate tech VC firm co-founded by former Meta CTO Mike Schroepfer, closed a $250M fund to back early-stage startups supporting the AI boom —  Gigascale Capital closed a $250 million fund for early-stage startups.  —  Climate tech venture capital firm Gigascale Capital …

  4. Researchers find packages in the @redhat-cloud-services npm namespace shipped malware that harvests credentials for GitHub Actions, AWS, GCP, Azure, and others (Rohan Prabhu/Step Security Blog)

    Rohan Prabhu / Step Security Blog : Researchers find packages in the @redhat-cloud-services npm namespace shipped malware that harvests credentials for GitHub Actions, AWS, GCP, Azure, and others —  Several packages in the @redhat-cloud-services npm scope were found to carry malicious payloads that fire via a preinstall hook on every npm install.

  5. The US sanctions Nobitex, Iran's largest crypto exchange, accusing it of helping Iran's government and blacklisted state institutions evade Western sanctions (Gavin Finch/Reuters)

    Gavin Finch / Reuters : The US sanctions Nobitex, Iran's largest crypto exchange, accusing it of helping Iran's government and blacklisted state institutions evade Western sanctions —  The United States announced sanctions on Iran's biggest cryptocurrency exchange on Tuesday, accusing it of enabling the Iranian government …

  6. Palo Alto Networks reports Q3 revenue up 31% YoY to $3B, including $388M from CyberArk and Chronosphere, vs. $2.94B est., and forecasts Q4 revenue above est. (Samantha Subin/CNBC)

    Samantha Subin / CNBC : Palo Alto Networks reports Q3 revenue up 31% YoY to $3B, including $388M from CyberArk and Chronosphere, vs. $2.94B est., and forecasts Q4 revenue above est. —  Palo Alto Networks topped Wall Street's third-quarter estimates, citing AI advancements.

  7. Meta is testing a Series feature, letting select creators make episodic Reels that are placed in a dedicated hub on their profile, using both old and new Reels (Aisha Malik/TechCrunch)

    Aisha Malik / TechCrunch : Meta is testing a Series feature, letting select creators make episodic Reels that are placed in a dedicated hub on their profile, using both old and new Reels —  Meta is testing a new “Series” feature for Reels that's designed to make it easier to keep up with serialized content on Instagram and Facebook, the company told TechCrunch.

  8. Source: a CoreWeave-tied data center raised $900M via five-year junk bonds, priced at par to yield 7.5%, as the sector increasingly turns to high-yield bonds (Gowri Gurumurthy/Bloomberg)

    Gowri Gurumurthy / Bloomberg : Source: a CoreWeave-tied data center raised $900M via five-year junk bonds, priced at par to yield 7.5%, as the sector increasingly turns to high-yield bonds —  A data center tied to CoreWeave Inc. raised $900 million from a high-yield note offering, joining a wave of junk issuers tapping debt markets …

  9. CrowdStrike reports Q1 revenue up 26% YoY to $1.39B, vs. $1.36B est., and forecasts Q2 revenue of about $1.44B, vs. $1.43B est.; CRWD drops 9%+ after hours (Samantha Subin/CNBC)

    Samantha Subin / CNBC : CrowdStrike reports Q1 revenue up 26% YoY to $1.39B, vs. $1.36B est., and forecasts Q2 revenue of about $1.44B, vs. $1.43B est.; CRWD drops 9%+ after hours —  CrowdStrike narrowly beat Wall Street's fiscal first-quarter estimates after the bell on Wednesday, but shares slid 10% following the report.

  10. Broadcom reports Q2 revenue up 48% YoY to $22.19B, vs. $22.27B est., and forecasts Q3 semiconductor revenue from AI below estimates; AVGO drops 12%+ after hours (Reuters)

    Reuters : Broadcom reports Q2 revenue up 48% YoY to $22.19B, vs. $22.27B est., and forecasts Q3 semiconductor revenue from AI below estimates; AVGO drops 12%+ after hours —  Broadcom (AVGO.O) missed Wall Street expectations for second-quarter revenue on Wednesday, as increased competition …

  11. Filing: SpaceX aims to raise $75B in its IPO, selling 555.6M shares at $135 each, which would value the company at ~$1.77T (Bailey Lipschultz/Bloomberg)

    Bailey Lipschultz / Bloomberg : Filing: SpaceX aims to raise $75B in its IPO, selling 555.6M shares at $135 each, which would value the company at ~$1.77T —  SpaceX is seeking to raise $75 billion in an initial public offering that would be the biggest of all time, as Elon Musk's rocket, satellite and artificial intelligence …

  12. The US and other Five Eyes nations warn that China is flooding online job platforms with fake profiles and offers targeting government and military personnel (Greg Miller/Washington Post)

    Greg Miller / Washington Post : The US and other Five Eyes nations warn that China is flooding online job platforms with fake profiles and offers targeting government and military personnel —  Nations in the Five Eyes intelligence partnership warned that fake profiles and job offers are targeting military officers, spies …

  13. Internal docs from lawsuits by 1,400 school districts show how social media companies targeted kids: Meta paid "teen ambassadors", Snap sent school-hour alerts (Jennifer Valentino-DeVries/New York Times)

    Jennifer Valentino-DeVries / New York Times : Internal docs from lawsuits by 1,400 school districts show how social media companies targeted kids: Meta paid “teen ambassadors”, Snap sent school-hour alerts —  Internal documents show how tech giants grabbed children's attention throughout the day, a strategy that schools say has undermined education.

  14. Poke, which lets users access AI agents via text message, becomes the first AI agent approved for Apple's Messages for Business platform (Sarah Perez/TechCrunch)

    Sarah Perez / TechCrunch : Poke, which lets users access AI agents via text message, becomes the first AI agent approved for Apple's Messages for Business platform —  Poke, a startup that turns using AI agents into something as simple as sending a text message, has become the first AI agent approved to run on Apple's Messages for Business platform.

  15. Unsealed 2020 lawsuit: ex-IBM VP of threat intelligence alleges that IBM and AT&T concealed foreign cyber breaches to maintain eligibility for federal contracts (Bloomberg)

    Bloomberg : Unsealed 2020 lawsuit: ex-IBM VP of threat intelligence alleges that IBM and AT&T concealed foreign cyber breaches to maintain eligibility for federal contracts —  International Business Machines Corp. and AT&T Inc.'s computer systems were repeatedly breached by foreign hackers …

  16. Sources: Anthropic has embedded around half a dozen forward-deployed engineers within the NSA to help the agency deploy Mythos for offensive cyber operations (Financial Times)

    Financial Times : Sources: Anthropic has embedded around half a dozen forward-deployed engineers within the NSA to help the agency deploy Mythos for offensive cyber operations —  Arrangement comes as AI lab is locked in legal battle with Pentagon over Claude model

  17. Marvell and Flex, a contract manufacturer for electronics, will join the S&P 500; MRVL jumps 6%+ after hours after closing down 16.74% amid a broader sell-off (Kif Leswing/CNBC)

    Kif Leswing / CNBC : Marvell and Flex, a contract manufacturer for electronics, will join the S&P 500; MRVL jumps 6%+ after hours after closing down 16.74% amid a broader sell-off —  Marvell Technology, which makes parts and products needed for the AI infrastructure boom, is joining the S&P 500.

  18. Source: OpenAI and White House are discussing a government stake in the company, to seed something like the "Public Wealth Fund" that OpenAI outlined earlier (CNBC)

    CNBC : Source: OpenAI and White House are discussing a government stake in the company, to seed something like the “Public Wealth Fund” that OpenAI outlined earlier —  OpenAI CEO Sam Altman and the White House are in ongoing talks about a possible government stake in the artificial intelligence company, CNBC confirmed on Friday.

  19. Sources: Apollo and Blackstone finalized a $35B package for Anthropic to lease TPUs; Broadcom is backstopping payments on the debt's largest senior portions (Bloomberg)

    Bloomberg : Sources: Apollo and Blackstone finalized a $35B package for Anthropic to lease TPUs; Broadcom is backstopping payments on the debt's largest senior portions —  Apollo Global Management Inc. and Blackstone Inc. have finalized a $35 billion financing package for Anthropic PBC to expand its AI infrastructure …

  20. Trump signs a national security memorandum seeking to "accelerate the use of AI across intelligence and warfighting domains in line with American values" (Reuters)

    Reuters : Trump signs a national security memorandum seeking to “accelerate the use of AI across intelligence and warfighting domains in line with American values” —  The White House said on Friday it would accelerate the development and use of AI for national security applications …

Solidot(24)

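  1. Three Ebola vaccines are in development

    The International AIDS Vaccine Initiative (IAVI), the University of Oxford, and Moderna are developing vaccines against the Ebola virus. IAVI says the Ebola outbreak now under way in the Democratic Republic of the Congo may be the most severe to date. The outbreak is in a conflict zone; more than a thousand suspected cases have been reported, and neighboring Uganda has confirmed 9 cases. Six Ebola virus strains are known, only three of which cause outbreaks. A targeted vaccine exists for the most common strain, Zaire, but this outbreak involves the rarer Bundibugyo strain, for which there is no vaccine yet. Moderna has announced it will use mRNA technology to develop a vaccine against the Bundibugyo strain.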

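  2. Brazil's Amazon sees longer dry seasons and shifting rainfall patterns

    Two recently published studies show that the Brazilian Amazon is beginning to experience conditions previously predicted to arrive only decades from now, including longer dry seasons and altered rainfall patterns. Without countermeasures, the situation could deteriorate rapidly, threatening biodiversity, the replenishment of natural reservoirs, and forest function. One study shows the Amazon's dry season is lengthening from four months to six, with precipitation over that period falling by more than 150 mm. The second study analyzed drought in the Amazon between 2023 and 2024. It found that burned area increased by 9% and forest degradation alerts by 19%, and that at the peak of the drought as many as 4.2 million hectares of forest were affected by fire. The results indicate that the cycle of drought, fire, and degradation is intensifying and weakening the ecosystem's resilience. The area of the Amazon rainforest may also shrink.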

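  3. After China approved its first invasive brain-computer interface chip

    One day last October, Dong Hui suddenly decided to see whether he could hold a pen and write. Six years earlier, a spinal cord injury from a car accident had left him paralyzed from the neck down. Slowly but steadily, he wrote his name, "thank you," and the date. He could do this because of the brain-computer interface (BCI) chip trial he had joined. In November 2024, Dong Hui became one of the first patients in China to undergo brain surgery to implant an invasive BCI chip. This March, the implantable BCI product he uses was approved for commercial use. The device, called NEO, was developed jointly by the Shanghai startup Neuracle Technology and Tsinghua University. The surgery took about 1.5 hours, and the sensors that collect his brain's electrical signals were placed on his dura mater. The implant transmits signals to a computer, which translates them into commands controlling a soft robotic glove that he wears during 2.5 hours of daily training to help him learn to grasp. He began rehabilitation training about a week after surgery: "On the ninth day of training, my right hand managed to grab a ball without the glove. It was truly a miracle." Avinash Singh, a BCI researcher at the University of Technology Sydney, says one reason NEO was approved so quickly is that it is relatively less invasive: its 8 sensors sit on top of the brain's protective membrane, whereas the N1 chip developed by Elon Musk's Neuralink penetrates the cerebral cortex directly. NEO carries lower risks of bleeding, glial scarring, and long-term signal degradation. China is also moving to cover BCIs under medical insurance, and has listed them, alongside quantum technology and humanoid robots, among six key industries critical to China's future technological competitiveness. Information scientist Meicen Sun says one of China's major advantages is that patients are willing to embrace new technology. The US startup Axoft is working with a Chinese company to test BCIs on four patients in China and plans to scale up.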

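  4. Experimental drug significantly extends survival for patients with the deadliest cancer

    Pancreatic cancer is the deadliest cancer, and most existing therapies have little effect. Now a drug called daraxonrasib has reported Phase III clinical trial results. The trial enrolled 500 patients whose pancreatic cancer had spread: 248 took daraxonrasib daily and the other 252 received chemotherapy. Median survival was 13.2 months in the drug group versus 6.6 months in the chemotherapy group, meaning the drug doubled survival, with fewer side effects. The findings were presented at the American Society of Clinical Oncology annual meeting in Chicago, and experts believe the drug could lead a revolution in treatment. Daraxonrasib works by targeting a protein called KRAS, which drives nearly all pancreatic cancers. The drug uses a molecular glue to capture and inhibit the KRAS protein, thereby halting tumor growth.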

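  5. The soil that refuses to stop breathing

    For 15 years, French biochemist Sébastien Fontaine has been trying to kill soil, to find out how much carbon soil devoid of all life can release. His team sealed soil in jars and sterilized it with gamma radiation, then waited for the soil's carbon dioxide emissions, a sign of ongoing microbial respiration, to fall. They waited weeks, then months. Under the microscope the irradiated soil showed no sign of life, yet it kept releasing carbon dioxide. The soil refused to stop breathing. Fontaine's lab repeated the experiment and got the same result, and the researchers began looking for the source of respiration in lifeless soil. Fontaine's team now reports that its soil samples continued to consume oxygen and release carbon dioxide for six years. They propose that the metabolic processes that power life may also occur outside living cells. Their experiments suggest that such metabolism can operate in soil even without the biological proteins that normally organize the soil. If their hypothesis is correct, some biochemical reactions, such as those that release energy from carbon-rich sugar molecules, may not be unique to living organisms. Such reactions may even have existed before life appeared on Earth.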

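  6. Blue octopus is an entirely new species

    In 2015, scientists on a deep-sea expedition in the Galápagos Islands were reviewing footage from a remotely operated vehicle when they spotted a tiny, entirely blue octopus at a depth of about 1,773 meters. They collected the octopus for further analysis. Researchers have now concluded that this cute little creature, small enough to fit in the palm of a hand, belongs to an entirely new species. The study was published in the journal Zootaxa. The octopus had been kept in storage. Because it is unique and collecting another is extremely unlikely, the scientists were unwilling to dissect it for a full species identification. The team instead used mini-CT scanning, which showed that the animal has very short arms with few suckers, no ink sac, smooth skin, and one huge central (rachidian) tooth. They named the species Microeledone galapagensis.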

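  7. Iron-rich immune cells help homing pigeons navigate

    Migratory birds, sea turtles, and other animals appear able to sense the geomagnetic field and use it to navigate. According to a study published in the journal Science, iron-rich immune cells in the liver of homing pigeons may give them their magnetic compass. Analysis of thin tissue sections found that the pigeons' liver macrophages are rich in ferritin, but little is found in the spleen and none at all in the beak or brain. Further observation by electron microscopy found that the macrophages sit right next to neurons, all of which connect to the central nervous system. The researchers designed an experiment to test whether the iron-rich macrophages can guide pigeons like a magnetic compass: they suppressed macrophage activity with a drug called clodronate liposomes. The team trained 34 homing pigeons. By day, pigeons orient themselves by the position of the sun; when it is overcast or the sun is fully hidden by cloud, they rely on magnetoreception. The team injected 18 pigeons with clodronate and, 24 hours later, released them one by one when clouds completely blocked the sun. The pigeons all wore GPS transmitters, allowing the team to track their flight paths in real time. All 18 got lost and did not return until the sky cleared. None of the 16 control pigeons got lost. The researchers say that if the ferritin-assisted navigation mechanism is confirmed, it may be universal, applying to animals ranging from bees and bats to whales and sharks.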

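  8. NASA's low-boom supersonic X-59 to make its first attempt to break the sound barrier

    NASA has announced that the X-59 Quesst low-boom supersonic aircraft, designed by Lockheed Martin's Skunk Works, will make its first attempt to break the sound barrier this month. The X-59 is designed to exceed the speed of sound without the sonic boom that supersonic aircraft normally produce; instead it makes a quieter "thump," similar to hearing a car door close from indoors. It has no forward-facing window; cameras and a display give the pilot an augmented-reality external vision system showing what is ahead of the aircraft. If the X-59 succeeds, it could have a revolutionary impact on supersonic flight and the aviation industry by lifting current restrictions on supersonic flight. The X-59 completed its first flight in October 2025 and has made 14 test flights since March 2026; this month's supersonic flight is planned to reach Mach 1.4 at an altitude of 16.7 km.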

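  9. The genetic trade-off between youth and longevity

    Scientists have found that the gene vgll3 is directly linked both to early-life growth, development, and reproductive success and to accelerated aging and increased cancer risk late in life. The new study provides experimental evidence for the antagonistic pleiotropy hypothesis, which holds that certain genes confer advantages early in life but have harmful effects later. The researchers worked with the very short-lived African turquoise killifish, using CRISPR gene editing to modify the gene. Fish with the modified vgll3 gene grew faster and reached sexual maturity earlier, giving them a reproductive advantage in natural environments. The cost was a shorter lifespan and a higher chance of age-related cancers. The researchers note that nature prioritizes not longevity but continuity. Humans also carry the vgll3 gene, and the study may help us better understand human development, aging, and age-related disease.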

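  10. Meta lets employees opt out of tracking for up to 30 minutes at a time

    Meta recently began installing tracking software on US employees' computers that captures mouse movements, clicks, and keystrokes to train AI models, part of the company's larger plan to build AI agents that can carry out work tasks automatically. The tool, called the Model Capability Initiative (MCI), has provoked a strong backlash inside the company; some employees started a petition that has gathered more than 1,500 signatures. One anonymous employee called the company's conduct "very dystopian." According to an internal memo sent to employees on Tuesday, Meta has backed off slightly and will let employees opt out of tracking "for up to 30 minutes at a time"; employees can also apply to opt out of the tracking program permanently.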

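  11. Mathematicians warn of AI's threat to the mathematics profession

    Mathematicians have jointly published the Leiden Declaration, a statement backed by the International Mathematical Union, warning that AI is undermining mathematics by producing large numbers of plausible-looking but unreliable or even wrong proofs, weakening attribution, changing incentives, and giving technology companies outsized influence over research priorities. Hundreds of people have already signed the declaration, which warns that the development of AI threatens the intrinsic value of mathematical research. First, the declaration notes that distinguishing AI-generated proofs from correct mathematical proofs is very difficult and puts growing pressure on reviewers: generating AI papers is cheap but verifying them is expensive, and if later research builds on false premises, the errors compound. Second, AI is trained on existing mathematical papers but often fails to cite them correctly in its output, and copyright infringement is widespread in the training of AI models. Third, AI's incentives run counter to the values of the mathematics profession. The declaration urges mathematicians to treat AI as a tool, not a substitute for human responsibility. Individual mathematicians should disclose their use of AI and take responsibility for the correctness of their work. The declaration also warns that mathematics may be used for war, oppression, mass surveillance, and undermining democracy, so mathematicians should carefully weigh the ethics of collaborating with the technology industry.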

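  12. Microsoft's quantum chip has fundamental problems

    Microsoft has announced its second-generation quantum chip, Majorana 2. Experts, however, argue that Microsoft's quantum chips lack a solid research foundation and simply cannot work. Microsoft announced its first-generation quantum computing chip, Majorana 1, in early 2025; it uses what the company calls a topoconductor to observe and control Majorana particles, with the aim of producing more reliable and scalable qubits. The first-generation topoconductor used an indium arsenide semiconductor and an aluminum superconductor; for the second generation Microsoft switched to a lead superconductor, claiming that qubit lifetime rose from 20 seconds to 1 minute. Scientists are deeply skeptical of Microsoft's claims. Its latest preprint has not yet passed peer review, and physicist Henry Legg believes the data in the preprint come from random artifacts. Microsoft's previous preprint still has not passed peer review and has very likely been rejected by top journals.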

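  13. Mars mission MAVEN declared over after six months out of contact

    After six months of radio silence, MAVEN's mission has officially been declared over. The probe, launched in 2013, mysteriously lost contact in late December 2025 during a routine pass behind Mars. According to the last data it returned, the probe had entered an abnormal rapid spin, which knocked it off its orbit and drained its onboard batteries. A review board convened by NASA recently concluded that it cannot be recovered. Although it is expected to linger in orbit for another 50 to 100 years before crashing onto the Martian surface, its scientific life has come to an end. NASA has three probes in Mars orbit: Mars Odyssey, launched in 2001; Mars Reconnaissance Orbiter (MRO), launched in 2005; and Mars Atmosphere and Volatile Evolution (MAVEN), launched in 2013. MAVEN has served the shortest time of the three, and the other two are both near the end of their lives. Two European probes also remain in Mars orbit, and rovers are still on the surface, so Mars research will continue.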

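  14. Linux share among Steam users falls to 3.99%

    Valve has published the Steam Hardware & Software Survey for May 2026. After Linux use among Steam players hit a record 5.33% in March, its share fell for two consecutive months: 4.52% in April and 3.99% in May, a drop of 0.53 percentage points, though still double the figure from a year earlier. Windows accounts for 93.85% and OSX for 2.16%. Among player languages, English accounts for 39.48%, up 2.71%, and Simplified Chinese for 21.85%, down 1.56%. Intel CPUs account for 53.94% of users and AMD for 46.03%, with Intel's share slowly declining and AMD's slowly rising.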

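  15. Microsoft creates Coreutils for Windows, a fork of Rust Coreutils

    At this week's Build 2026 conference, Microsoft announced the Coreutils for Windows project, a fork of Rust Coreutils (uutils) maintained by the software giant. It is not a hard fork but a downstream distribution. Coreutils for Windows includes tools such as uutils/coreutils, findutils, and grep. Its goal is to make switching development between platforms such as Windows, WSL, macOS, and Linux more seamless: with unified commands, flags, and pipelines that behave the same way, existing scripts can be used directly without conversion. One wonders whether Steve Ballmer still remembers what he once said.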

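  16. Any level of drinking raises health risks

    A large-scale study shows that even less than one standard drink a day raises the risk of several cancers. The team analyzed 843 cohort and case-control studies published through 2023 and systematically assessed the association between alcohol and a range of diseases. For all 10 cancers examined, drinking was associated with higher risk, and risk kept rising as consumption increased. Even a daily intake of less than 10 grams of pure alcohol was associated with increased risk of pharyngeal, colorectal, esophageal, breast, liver, pancreatic, and prostate cancer. Pharyngeal cancer showed the largest increase, with risk more than doubling. Beyond cancer, drinking was also associated with higher risk of pancreatitis and of chronic liver diseases such as cirrhosis. The study found that the risk of chronic liver disease rises by at least 40% and the risk of pancreatitis by at least 22%. The results clearly show that cancer risk increases with any level of alcohol intake, while the evidence that "moderate drinking is good for health" is concentrated in some non-cancer diseases and the associations are weak.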

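  17. ISS astronauts told to prepare for emergency evacuation because of air leak

    Because the air leak in the Russian segment of the International Space Station has grown over the past few days from one pound of air per day to two pounds (0.9 kg), NASA ordered astronauts on the station to stay inside their spacecraft and be ready for an emergency evacuation. The four astronauts of NASA's Crew-12 mission, two Americans, one French astronaut, and one Russian, were ordered by NASA mission control at 9:04 a.m. US Eastern time on Friday to enter the Crew Dragon spacecraft docked to the station and put on their spacesuits in case the leak required an emergency evacuation. The leaking section is the PrK module, located between the Progress airlock and the Zvezda service module, and the leak is caused by tiny structural cracks. In recent months NASA and the Russian space agency have been discussing the cause of the leak and possible repairs.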

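  18. Brave sells a stripped-down version for $60

    Over the past few years the Brave browser has accumulated less-than-welcome features such as a cryptocurrency wallet, an AI assistant, a news feed, and a rewards program. In response to user complaints about bloat, Brave has launched Brave Origin, a stripped-down browser. It is free on Linux but paid on other platforms, and the price is steep. Brave Origin removes Brave Rewards, the wallet, Leo AI, the news feed, Talk, VPN, Tor, and other features, while keeping Brave Shields, the built-in ad and tracker blocker. A one-time license costs $59.99 and covers up to 10 devices. Whether it is worth $60 is up to the user.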

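  19. Processing itself may be linked to the health risks of ultra-processed foods

    A growing body of research links ultra-processed foods to heart disease, diabetes, premature death, and more. But scientists still debate what actually causes the health risks: the nutritional quality of the food itself, or the industrial processing and additives used in production. According to a study in the American Journal of Public Health, the processing itself may play an important role. Ultra-processing alters the cellular structure of food, strips out beneficial compounds, and introduces additives as well as compounds from packaging. An analysis of 20 years of US health and nutrition data shows that health indicators worsen with every 10% increase in calories from ultra-processed foods. People who eat ultra-processed foods weigh more and have poorer blood sugar control, higher blood pressure, and worse cholesterol levels. They are more likely to develop diabetes, metabolic syndrome, and cancer, and had a higher risk of death during the study period. The association persisted after accounting for the nutritional quality of the ultra-processed foods and their saturated fat, added sugar, and sodium content.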

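  20. Bumblebees can use tools to solve problems

    According to a study published in the journal Science, bumblebees can use tools to solve problems. Insects have joined the ranks of animals that can solve the "box and banana" problem, displaying basic intelligence. In the box-and-banana problem, chimpanzees stack boxes to reach a banana that was previously out of reach. In the new study, the researchers adapted the problem for bumblebees: a bee has to roll a polystyrene ball to a specific spot and then climb on it to reach an artificial flower on a low ceiling. The bumblebees in the experiment were only a few weeks old, and the researchers trained them to associate artificial flowers with a sugar-water reward. In the basic test, 75% of the bees reached the flower; in a more complex test, 23 of 30 bees succeeded. The researchers note that even though insect brains are very small, they can flexibly solve a variety of new problems.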