OrangeBot.AI Digest — 2026-05-14
90 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- The AI Zombification of Universities (www.thenewcritic.com)
- First public macOS kernel memory corruption exploit on Apple M5 (blog.calif.io)
- AI is making me dumb (jpain.io)
- Removing the modem and GPS from my 2024 RAV4 hybrid (arkadiyt.com)
- New Nginx Exploit (github.com)
- RTX 5090 and M4 MacBook Air: Can It Game? (scottjg.com)
- A message from President Kornbluth about funding and the talent pipeline (president.mit.edu)
- Bitcoin trader recovers wallet with help of Claude (www.tomshardware.com)
- Meta's New Reality: Record High Profits. Record Low Morale (www.wired.com)
- Computer Hobby Movement in Canada (museum.eecs.yorku.ca)
- USDA Projects Smallest US Wheat Harvest Since 1972 Due to Plains Drought (www.agweb.com)
- Sam Altman's Business Dealings Under GOP Scrutiny Ahead of OpenAI's IPO (www.wsj.com)
- Leaving the Physical World (www.eff.org)
- Rewrite Bun in Rust has been merged (github.com)
- A Claude Code and Codex Skill for Deliberate Skill Development (github.com)
GitHub Trending(15)
- ruvnet / RuView
- tinyhumansai / openhuman
- rohitg00 / agentmemory
- obra / superpowers
- K-Dense-AI / scientific-agent-skills
- shiyu-coder / Kronos
- roboflow / supervision
- influxdata / telegraf
- supertone-inc / supertonic
- Genymobile / scrcpy
- NVIDIA-AI-Blueprints / video-search-and-summarization
- CloakHQ / CloakBrowser
- mattpocock / skills
- github / spec-kit
- garrytan / gstack
Product Hunt(15)
- Open Browser Use
Open-source browser automation for local AI agents
- Agent FM for Claude Code & Codex
Tune in and stay in the loop with your agents 🎧
- Naptick AI
AI sleep companion that helps you fall asleep without struggle
- Causo for Fundraising
Pitch the right VCs, skip the grind
- Spellar 3.0
AI Meeting companion with cross-meeting memory
- Sherloq
Know which LinkedIn prospect is warm - right now
- Enjo Help Center
AI auto-builds your help centers that learn from your team
- Higgsfield Supercomputer
Run your entire creative pipeline from one chat agent
- DesignMD
Turn any website into an AI-ready design system
- Fei Design Mode
Directly edit and tweak UI pixels live with AI agents
- Asteroid
Build Browser, Linux and Windows AI agents in seconds
- Raindrop Workshop
Open source, free, local debugger for AI agents
- Tendem by Toloka
AI platform to hand off any task to a human expert
- Notion Developer Platform
Build on Notion, not just inside it
- Instants by Instagram
Send disappearing, unedited photos to Close Friends
Hugging Face(15)
- MinT: Managed Infrastructure for Training and Serving Millions of LLMs
We present MindLab Toolkit (MinT), a managed infrastructure system for Low-Rank Adaptation (LoRA) post-training and online serving. MinT targets a setting where many trained policies are produced over a small number of expensive base-model deployments. Instead of materializing each policy as a merged full checkpoint, MinT keeps the base model resident and moves exported LoRA adapter revisions through rollout, update, export, evaluation, serving, and rollback, hiding distributed training, serving, scheduling, and data movement behind a service interface. MinT scales this path along three axes. Scale Up extends LoRA RL to frontier-scale dense and MoE architectures, including MLA and DSA attention paths, with training and serving validated beyond 1T total parameters. Scale Down moves only the exported LoRA adapter, which can be under 1% of base-model size in rank-1 settings; adapter-only handoff reduces the measured step by 18.3x on a 4B dense model and 2.85x on a 30B MoE, while concurrent multi-policy GRPO shortens wall time by 1.77x and 1.45x without raising peak memory. Scale Out separates durable policy addressability from CPU/GPU working sets: a tensor-parallel deployment supports 10^6-scale addressable catalogs (measured single-engine sweeps through 100K) and thousand-adapter active waves at cluster scale, with cold loading treated as scheduled service work and packed MoE LoRA tensors improving live engine loading by 8.5-8.7x. MinT thus manages million-scale LoRA policy catalogs while training and serving selected adapter revisions over shared 1T-class base models.
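To see why a rank-1 adapter can sit far below 1% of base-model size, here is a rough parameter count for a single linear layer. It assumes the standard LoRA factorization (two matrices of shape d_out × r and r × d_in). The 4096 × 4096 layer shape and the function name are illustrative assumptions, not figures from the MinT paper.

```python
def lora_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Adapter parameters divided by the parameters of the frozen weight."""
    base_params = d_out * d_in               # frozen weight matrix
    adapter_params = rank * (d_out + d_in)   # the two low-rank factors
    return adapter_params / base_params


if __name__ == "__main__":
    # Hypothetical 4096 x 4096 projection with a rank-1 adapter.
    print(f"{lora_fraction(4096, 4096, 1):.4%}")
```

The ratio shrinks as layers get wider, which is consistent with the abstract's point that moving only the adapter is much cheaper than moving a merged checkpoint.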
- MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
Tabular Foundation Models have recently established the state of the art in supervised tabular learning, by leveraging pretraining to learn generalizable representations of numerical and categorical structured data. However, they lack native support for unstructured modalities such as text and image, and rely on frozen, pretrained embeddings to process them. On established Multimodal Tabular Learning benchmarks, we show that tuning the embeddings to the task improves performance. Existing benchmarks, however, often focus on the mere co-occurrence of modalities; this leads to high variance across datasets and masks the benefits of task-specific tuning. To address this gap, we introduce MulTaBench, a benchmark of 40 datasets, split equally between image-tabular and text-tabular tasks. We focus on predictive tasks where the modalities provide complementary predictive signal, and where generic embeddings lose critical information, necessitating Target-Aware Representations that are aligned with the task. Our experimental results demonstrate that the gains from target-aware representation tuning generalize across both text and image modalities, several tabular learners, encoder scales, and embedding dimensions. MulTaBench constitutes the largest image-tabular benchmarking effort to date, spanning high-impact domains such as healthcare and e-commerce. It is designed to enable the research of novel architectures which incorporate joint modeling and target-aware representations, paving the way for the development of novel Multimodal Tabular Foundation Models.
- AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
Few-step video generation has been significantly advanced by consistency distillation. However, the performance of consistency-distilled models often degrades as more sampling steps are allocated at test time, limiting their effectiveness for any-step video diffusion. This limitation arises because consistency distillation replaces the original probability-flow ODE trajectory with a consistency-sampling trajectory, weakening the desirable test-time scaling behavior of ODE sampling. To address this limitation, we introduce AnyFlow, the first any-step video diffusion distillation framework based on flow maps. Instead of distilling a model for only a few fixed sampling steps, AnyFlow optimizes the full ODE sampling trajectory. To this end, we shift the distillation target from endpoint consistency mapping ($z_t \rightarrow z_0$) to flow-map transition learning ($z_t \rightarrow z_r$) over arbitrary time intervals. We further propose Flow Map Backward Simulation, which decomposes a full Euler rollout into shortcut flow-map transitions, enabling efficient on-policy distillation that reduces test-time errors (i.e., discretization error in few-step sampling and exposure bias in causal generation). Extensive experiments across both bidirectional and causal architectures, at scales ranging from 1.3B to 14B parameters, demonstrate that AnyFlow matches or surpasses consistency-based counterparts in the few-step regime, while scaling with sampling step budgets.
- Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context
Long-context modeling is becoming a core capability of modern large vision-language models (LVLMs), enabling sustained context management across long-document understanding, video analysis, and multi-turn tool use in agentic workflows. Yet practical training recipes remain insufficiently explored, particularly for designing and balancing long-context data mixtures. In this work, we present a systematic study of long-context continued pre-training for LVLMs, extending a 7B model from 32K to 128K context with extensive ablations on long-document data. We first show that long-document VQA is substantially more effective than OCR transcription. Building on this observation, our ablations further yield three key findings: i) for sequence-length distribution, balanced data outperforms target-length-focused data (e.g., 128K), suggesting that long-context ability requires generalizable key-information retrieval across various lengths and positions; ii) retrieval remains the primary bottleneck, favoring retrieval-heavy mixtures with modest reasoning data for task diversity; and iii) pure long-document VQA largely preserves short-context capabilities, suggesting that instruction-formatted long data reduces the need for short-data mixing. Based on these findings, we introduce MMProLong, obtained by long-context continued pre-training from Qwen2.5-VL-7B with only a 5B-token budget. MMProLong improves long-document VQA scores by 7.1% and maintains strong performance at 256K and 512K contexts beyond its 128K training window, without additional training. It further generalizes to webpage-based multimodal needle retrieval, long-context vision-text compression, and long-video understanding without task-specific supervision. Overall, our study establishes a practical LongPT recipe and an empirical foundation for advancing long-context vision-language models.
- EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents
Voice agents, artificial intelligence systems that conduct spoken conversations to complete tasks, are increasingly deployed across enterprise applications. However, no existing benchmark jointly addresses two core evaluation challenges: generating realistic simulated conversations, and measuring quality across the full scope of voice-specific failure modes. We present EVA-Bench, an end-to-end evaluation framework that addresses both. On the simulation side, EVA-Bench orchestrates bot-to-bot audio conversations over dynamic multi-turn dialogues, with automatic simulation validation that detects user simulator error and appropriately regenerates conversations before scoring. On the measurement side, EVA-Bench introduces two composite metrics: EVA-A (Accuracy), capturing task completion, faithfulness, and audio-level speech fidelity; and EVA-X (Experience), capturing conversation progression, spoken conciseness, and turn-taking timing. Both metrics apply to different agent architectures, enabling direct cross-architecture comparison. EVA-Bench includes 213 scenarios across three enterprise domains, a controlled perturbation suite for accent and noise robustness, and pass@1, pass@k, pass^k measurements that distinguish peak from reliable capability. Across 12 systems spanning all three architectures, we find: (1) no system simultaneously exceeds 0.5 on both EVA-A pass@1 and EVA-X pass@1; (2) peak and reliable performance diverge substantially (median pass@k - pass^k gap of 0.44 on EVA-A); and (3) accent and noise perturbations expose substantial robustness gaps, with effects varying across architectures, systems, and metrics (mean up to 0.314). We release the full framework, evaluation suite, and benchmark data under an open-source license.
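The gap between peak and reliable capability can be made concrete with a small sketch. The abstract does not give EVA-Bench's exact estimators, so this assumes the commonly used combinatorial definitions: pass@k is the chance that at least one of k sampled runs succeeds, and pass^k is the chance that all k succeed, both estimated from n recorded runs with c successes. The function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k runs drawn from n succeeds."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts draws made up entirely of failed runs.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Probability that all k runs drawn from n succeed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(c, k) counts draws made up entirely of successful runs.
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    n, c, k = 10, 5, 3  # illustrative numbers, not benchmark data
    print(pass_at_k(n, c, k), pass_hat_k(n, c, k))
```

Under these assumptions, a system that succeeds on half of its runs looks strong on pass@k and weak on pass^k, which is the kind of divergence the abstract reports.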
- Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling
AI agents negotiate and transact in natural language with unfamiliar counterparts: a buyer bot facing an unknown seller, or a procurement assistant negotiating with a supplier. In such interactions, the counterpart's LLM, prompts, control logic, and rule-based fallbacks are hidden, while each decision can have monetary consequences. We ask whether an agent can predict an unfamiliar counterpart's next decision from a few interactions. To avoid real-world logging confounds, we study this problem in controlled bargaining and negotiation games, formulating it as target-adaptive text-tabular prediction: each decision point is a table row combining structured game state, offer history, and dialogue, while K previous games of the same target agent, i.e., the counterpart being modeled, are provided in the prompt as labeled adaptation examples. Our model is built on a tabular foundation model that represents rows using game-state features and LLM-based text representations, and adds LLM-as-Observer as an additional representation: a small frozen LLM reads the decision-time state and dialogue; its answer is discarded, and its hidden state becomes a decision-oriented feature, making the LLM an encoder rather than a direct few-shot predictor. Training on 13 frontier-LLM agents and testing on 91 held-out scaffolded agents, the full model outperforms direct LLM-as-Predictor prompting and game+text features baselines. Within this tabular model, Observer features contribute beyond the other feature schemes: at K=16, they improve response-prediction AUC by about 4 points across both tasks and reduce bargaining offer-prediction error by 14%. These results show that formulating counterpart prediction as a target-adaptive text-tabular task enables effective adaptation, and that hidden LLM representations expose decision-relevant signals that direct prompting does not surface.
- Qwen-Image-VAE-2.0 Technical Report
We present Qwen-Image-VAE-2.0, a suite of high-compression Variational Autoencoders (VAEs) that achieve significant advances in both reconstruction fidelity and diffusability. To address the reconstruction bottlenecks of high compression, we adopt an improved architecture featuring Global Skip Connections (GSC) and expanded latent channels. Moreover, we scale training to billions of images and incorporate a synthetic rendering engine to improve performance in text-rich scenarios. To tackle the convergence challenges of high-dimensional latent space, we implement an enhanced semantic alignment strategy to make the latent space highly amenable to diffusion modeling. To optimize computational efficiency, we leverage an asymmetric and attention-free encoder-decoder backbone to minimize encoding overhead. We present a comprehensive evaluation of Qwen-Image-VAE-2.0 on public reconstruction benchmarks. To evaluate performance in text-rich scenarios, we propose OmniDoc-TokenBench, a new benchmark comprising a diverse collection of real-world documents coupled with specialized OCR-based evaluation metrics. Qwen-Image-VAE-2.0 achieves state-of-the-art reconstruction performance, demonstrating exceptional capabilities in both general domains and text-rich scenarios at high compression ratio. Furthermore, downstream DiT experiments reveal our models possess superior diffusability, significantly accelerating convergence compared to existing high-compression baselines. These establish Qwen-Image-VAE-2.0 as a leading model with high compression, superior reconstruction, and exceptional diffusability.
- Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
Recent image editing models have achieved remarkable progress in instruction following, multimodal understanding, and complex visual editing. However, existing benchmarks often fail to faithfully reflect human judgment, especially for strong frontier models, due to limited task difficulty and coarse-grained evaluation protocols. In parallel, reward models have become increasingly important for RL-based image editing optimization, yet existing reward model benchmarks still rely on unrealistic evaluation settings that deviate from practical RL scenarios. These limitations hinder reliable assessment of both image editing models and reward models. To address these challenges, we introduce Edit-Compass and EditReward-Compass, a unified evaluation suite for image editing and reward modeling. Edit-Compass contains 2,388 carefully annotated instances spanning six progressively challenging task categories, covering capabilities such as world knowledge reasoning, visual reasoning, and multi-image editing. Beyond broad task coverage, Edit-Compass adopts a fine-grained multidimensional evaluation framework based on structured reasoning and carefully designed scoring rubrics. In parallel, EditReward-Compass contains 2,251 preference pairs that simulate realistic reward modeling scenarios during RL optimization.
- TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking
Dense 3D tracking from monocular video is fundamental to dynamic scene understanding. While recent 3D foundation models provide reliable per-frame geometry, recovering object motion in this geometry remains challenging and benefits from strong motion priors learned from real-world videos. Existing 3D trackers either follow iterative paradigms trained from scratch on synthetic data or fine-tune 3D reconstruction models learned from static multi-view images, both lacking real-world motion priors. Pre-trained video diffusion transformers (video DiTs) offer rich spatio-temporal priors from internet-scale videos, making them a promising foundation for 3D tracking. However, their frame-anchored formulation, which generates each frame's content, is fundamentally mismatched with reference-anchored dense 3D tracking, which must follow the same physical points from a reference frame across time. We present TrackCraft3R, the first method to repurpose a video DiT as a feed-forward dense 3D tracker. Given a monocular video and its frame-anchored reconstruction pointmap, TrackCraft3R predicts a reference-anchored tracking pointmap that follows every pixel of the first frame across time in a single forward pass, along with its visibility. We achieve this through two designs: (i) a dual-latent representation that uses per-frame geometry latents and reference-anchored track latents as dense queries, and (ii) temporal RoPE alignment, which specifies the target timestamp of each track latent. Together, these designs convert the per-frame generative paradigm of video DiTs into a reference-anchored tracking formulation with LoRA fine-tuning. TrackCraft3R achieves state-of-the-art performance on standard sparse and dense 3D tracking benchmarks, while running 1.3x faster and using 4.6x less peak memory than the strongest prior method. We further demonstrate robustness to large motions and long videos.
- Many-Shot CoT-ICL: Making In-Context Learning Truly Learn
In-context learning (ICL) adapts large language models (LLMs) to new tasks by conditioning on demonstrations in the prompt without parameter updates. With long-context models, many-shot ICL can use dozens to hundreds of examples and achieve performance comparable to fine-tuning, yet current understanding of its scaling behavior is largely derived from non-reasoning tasks. We study many-shot chain-of-thought in-context learning (CoT-ICL) for reasoning and show that standard many-shot rules do not transfer. Across non-reasoning and reasoning-oriented LLMs and across non-reasoning and reasoning tasks, we find: (i) a setting-dependent scaling effect, where increasing the number of CoT demonstrations is unstable for non-reasoning LLMs and benefits mainly reasoning-oriented LLMs; (ii) similarity-based retrieval helps on non-reasoning tasks but fails on reasoning, since semantic similarity poorly predicts procedural (i.e., CoT) compatibility; and (iii) an order-scaling effect, where performance variance grows with more CoT demonstrations. We interpret these behaviors by viewing many-shot CoT-ICL as in-context test-time learning rather than scaled pattern matching, and suggest two principles: (i) demonstrations should be easy for the target model to understand, and (ii) they should be ordered to support a smooth conceptual progression. Guided by these principles, we propose Curvilinear Demonstration Selection (CDS), a simple ordering method that yields up to a 5.42 percentage-point gain on geometry with 64 demonstrations. Overall, our results reframe the long context window from a retrieval buffer into a structured curriculum for in-context test-time learning.
- FrameSkip: Learning from Fewer but More Informative Frames in VLA Training
Vision-Language-Action (VLA) policies are commonly trained from dense robot demonstration trajectories, often collected through teleoperation, by sampling every recorded frame as if it provided equally useful supervision. We argue that this convention creates a temporal supervision imbalance: long low-change segments dominate the training stream, while manipulation-critical transitions such as alignment, contact, grasping, and release appear only sparsely. We introduce FrameSkip, a data-layer frame selection framework that scores trajectory frames using action variation, visual-action coherence, task-progress priors, and gripper-transition preservation, then remaps training samples toward high-importance frames under a target retention ratio. Because FrameSkip operates only in the dataloader, it leaves the VLA architecture, action head, training objective, and inference procedure unchanged. Across RoboCasa-GR1, SimplerEnv, and LIBERO, FrameSkip improves the success-retention trade-off over full-frame training and simpler frame selection variants, achieving a macro-average success rate of 76.15% across the three benchmarks compared with 66.50% for full-frame training while using a compressed trajectory view that retains 20% of unique frames in the main setting.
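As a rough illustration of data-layer frame selection, the sketch below scores frames and keeps a target fraction of them. It is not the FrameSkip implementation: it uses only two of the four signals named in the abstract (action variation and gripper-transition preservation), and the scoring rule, function name, and data layout are assumptions made for the example.

```python
def select_frames(actions, gripper, retain=0.2):
    """Return sorted indices of frames to keep from one trajectory.

    actions: list of per-frame action vectors (lists of floats)
    gripper: list of per-frame gripper states (e.g. 0 = open, 1 = closed)
    retain:  target fraction of frames to keep
    """
    n = len(actions)
    # Action variation: L1 change in the action relative to the previous frame.
    scores = [0.0] * n
    for t in range(1, n):
        scores[t] = sum(abs(a - b) for a, b in zip(actions[t], actions[t - 1]))
    # Gripper transitions are always kept, whatever their score.
    forced = {t for t in range(1, n) if gripper[t] != gripper[t - 1]}
    budget = max(len(forced), int(retain * n), 1)
    # Fill the remaining budget with the highest-scoring other frames.
    ranked = sorted((t for t in range(n) if t not in forced),
                    key=lambda t: -scores[t])
    keep = forced | set(ranked[:budget - len(forced)])
    return sorted(keep)


if __name__ == "__main__":
    actions = [[0.0], [0.0], [0.1], [1.0], [1.0], [1.0]]
    gripper = [0, 0, 0, 1, 1, 1]
    print(select_frames(actions, gripper, retain=0.5))
```

Because the selection happens before batching, a dataloader could apply it without touching the model or training objective, which matches the abstract's description of where FrameSkip operates.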
- The DAWN of World-Action Interactive Models
A plausible scene evolution depends on the maneuver being considered, while a good maneuver depends on how the scene may evolve. Existing World Action Models (WAMs) largely miss this reciprocity, treating world prediction and action generation as either isolated parallel branches or rigid predict-then-plan pipelines. We formalize this perspective as World-Action Interactive Models (WAIMs), and instantiate it in autonomous driving with DAWN (Denoising Actions and World iNteractive model), a simple yet strong latent generative baseline. DAWN operates in a compact semantic latent space and couples a World Predictor with a World-Conditioned Action Denoiser: the predicted world hypothesis conditions action denoising, while the denoised action hypothesis is fed back to update the world prediction, so that both are recursively refined during inference. Rather than eliminating test-time world evolution altogether or rolling out the full future in pixel space, DAWN performs a short explicit latent rollout that is sufficient to support long-horizon trajectory generation in complex interactive scenes. Experiments show that DAWN achieves strong planning performance and favorable safety-related results across multiple autonomous driving benchmarks. More broadly, our results suggest that interactive world-action generation is a principled path toward truly actionable world models.
- Asymmetric Flow Models
Flow-based generation in high-dimensional spaces is difficult because velocity prediction requires modeling high-dimensional noise, even when data has strong low-rank structure. We present Asymmetric Flow Modeling (AsymFlow), a rank-asymmetric velocity parameterization that restricts noise prediction to a low-rank subspace while keeping data prediction full-dimensional. From this asymmetric prediction, AsymFlow analytically recovers the full-dimensional velocity without changing the network architecture or training/sampling procedures. On ImageNet 256×256, AsymFlow achieves a leading 1.57 FID, outperforming prior DiT/JiT-like pixel diffusion models by a large margin. AsymFlow also provides the first-ever route for finetuning pretrained latent flow models into pixel-space models: aligning the low-rank pixel subspace to the latent space gives a seamless initialization that preserves the latent model's high-level semantics and structure, so finetuning mainly improves low-level mismatches rather than relearning pixel generation. We show that the pixel AsymFlow model finetuned from FLUX.2 klein 9B establishes a new state of the art for pixel-space text-to-image generation, beating its latent base on HPSv3, DPG-Bench, and GenEval while qualitatively showing substantially improved visual realism.
- Learning Agentic Policy from Action Guidance
Agentic reinforcement learning (RL) for Large Language Models (LLMs) critically depends on the exploration capability of the base policy, as training signals emerge only within its in-capability region. For tasks where the base policy cannot reach reward states, additional training or external guidance is needed to recover effective learning signals. Rather than relying on costly iterative supervised fine tuning (SFT), we exploit the abundant action data generated in everyday human interactions. We propose ActGuide-RL, which injects action data as plan-style reference guidance, enabling the agentic policy to overcome reachability barriers to reward states. Guided and unguided rollouts are then jointly optimized via mixed-policy training, internalizing the exploration gains back into the unguided policy. Motivated by a theoretical and empirical analysis of the benefit-risk trade-off, we adopt a minimal intervention principle that invokes guidance only as an adaptive fallback, matching task difficulty while minimizing off-policy risk. On search-agent benchmarks, ActGuide-RL substantially improves over zero RL (+10.7 pp on GAIA and +19 pp on XBench with Qwen3-4B), and performs on par with the SFT+RL pipeline without any cold start. This suggests a new paradigm for agentic RL that reduces the reliance on heavy SFT data by using scalable action guidance instead.
- HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution
Memory retrieval in agentic large language model (LLM) systems is often treated as a static lookup problem, relying on flat vector search or fixed binary relational graphs. However, fixed graph structures cannot capture the varying strength, confidence, and query-dependent relevance of relationships between events. In this paper, we propose HAGE, a weighted multi-relational memory framework that reconceptualizes retrieval as sequential, query-conditioned traversal over a unified relational memory graph. Memory is organized as relation-specific graph views over shared memory nodes, where each edge is associated with a trainable relation feature vector encoding multiple relational signals. Given a query, an LLM-based classifier identifies the relational intent, and a routing network dynamically modulates the corresponding dimensions of the edge embedding. Traversal scores are computed via a learned combination of semantic similarity and these query-conditioned edge representations. This allows memory traversal to prioritize high-utility relational paths while softly suppressing noisy or weakly relevant connections. Beyond adaptive traversal, HAGE further introduces a reinforcement learning-based training framework that jointly optimizes routing behavior and edge representations using downstream tasks. Finally, empirical results demonstrate improved long-horizon reasoning accuracy and a favorable accuracy-efficiency trade-off compared to state-of-the-art agentic memory systems. Our code is available at https://github.com/FredJiang0324/HAGE_MVPReview.
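The traversal score described in the HAGE abstract can be pictured with a toy calculation. This is a sketch under stated assumptions, not the authors' code: the mixing weight alpha is fixed here rather than learned, the gate vector stands in for the routing network's output, and all names and numbers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def traversal_score(query_vec, node_vec, edge_feat, gate, alpha=0.5):
    """Mix semantic similarity with a query-gated edge feature vector.

    gate:  per-dimension weights selecting which relational signals matter
           for this query (a stand-in for the routing network)
    alpha: weight on semantic similarity (fixed here, learned in the paper)
    """
    relational = sum(g * e for g, e in zip(gate, edge_feat))
    return alpha * cosine(query_vec, node_vec) + (1 - alpha) * relational


if __name__ == "__main__":
    # A query whose intent matches the first relation dimension.
    print(traversal_score([1.0, 0.0], [1.0, 0.0], [0.8, 0.2], [1.0, 0.0]))
```

With a gate of this kind, an edge that is strong on an irrelevant relation contributes little to the score, which is one way to read the abstract's "softly suppressing" weakly relevant connections.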
Techmeme(15)
- OpenAI adds remote access to Codex in the ChatGPT mobile app, letting users control Codex sessions running on a connected computer directly from a phone (Zac Hall/9to5Mac)
Zac Hall / 9to5Mac: OpenAI has released a new way to interact with its Codex app from your smartphone. An update to ChatGPT's mobile app brings remote access …
- Figma reports Q1 revenue up 46% YoY to $333.4M, vs. $313.2M est., and forecasts Q2 revenue above estimates, citing AI monetization; FIG jumps 8%+ after hours (Zaheer Kachwala/Reuters)
Zaheer Kachwala / Reuters: Figma (FIG.N) raised its annual revenue forecast on Thursday, as growing adoption of its artificial intelligence tools helped convert …
- Sources: Joshua Kushner's Thrive Capital told its stakeholders that it has invested ~$100M in Shopify, framed as a bet on how AI could lead to gains in commerce (Natasha Mascarenhas/Bloomberg)
Natasha Mascarenhas / Bloomberg: Joshua Kushner's Thrive Capital has taken a stake in Shopify Inc., according to people familiar with the matter, marking a rare bet by the venture firm on a public company.
- Sources: Microsoft plans to remove most of its Claude Code licenses and push its developers toward GitHub Copilot CLI, after previously pushing Claude Code (Tom Warren/The Verge)
Tom Warren / The Verge: Thousands of Microsoft developers will use GitHub Copilot CLI instead. Microsoft first started opening up access to Claude Code in December …
- The Senate Banking Committee advances the Clarity Act, which would make CFTC the primary regulator for most digital assets while SEC oversees digital securities (Bloomberg)
Bloomberg: Bitcoin climbed past $80,000 after the Senate Banking Committee advanced a landmark US digital asset market structure bill after months of negotiations.
- Sources: 50+ researchers and engineers have left xAI since the SpaceX acquisition via layoffs, firings, and voluntary departures; many have joined Meta and TML (Theo Wayt/The Information)
Theo Wayt / The Information: Call it the SpaceXAI exodus. More than 50 researchers and engineers working on xAI's Grok models have left …
- Sources: Intel has begun testing production of "low-end/legacy iPhone, iPad, and Mac processors"; Apple thinks TSMC's resources will continue tilting toward AI (@mingchikuo)
@mingchikuo: ... Apple has kicked off low-end/legacy iPhone, iPad, and Mac processors at Intel on the 18A-P series (using Foveros packaging).
- Ford stock jumped as much as 25% in two days after the launch of Ford Energy, a new subsidiary providing battery storage capacity to AI data centers (Christian Davies/Financial Times)
Christian Davies / Financial Times: New subsidiary pivots to energy storage batteries for AI after disastrous electric vehicle writedown. Shares in Detroit auto giant Ford surged …
- A look at GameStop's takeover bid for eBay, a bizarre job application by GameStop CEO Ryan Cohen, who does not have the money to buy eBay, to become eBay's CEO (Matt Levine/Bloomberg)
Matt Levine / Bloomberg: Also Adani Green and fund formation legal fees. GameStop: The simple model of GameStop Corp.'s proposal to acquire eBay Inc. is:
- Cerebras opens at $350, valuing the chipmaker at $100B+, after raising $5.5B by selling 30M shares at $185, the largest US tech IPO since Uber's debut in 2019 (Jordan Novet/CNBC)
Jordan Novet / CNBC : Cerebras opens at $350, valuing the chipmaker at $100B+, after raising $5.5B by selling 30M shares at $185, the largest US tech IPO since Uber's debut in 2019 — Cerebras Systems soared 68% in its Nasdaq debut on Thursday, closing at $311.07 after selling shares at $185, well above the company's expected range.
- Sources: OpenAI is weighing legal action against Apple after expectations that ChatGPT's Siri integration would generate billions in revenue fell short (Mark Gurman/Bloomberg)
Mark Gurman / Bloomberg : Sources: OpenAI is weighing legal action against Apple after expectations that ChatGPT's Siri integration would generate billions in revenue fell short — Apple Inc.'s two-year-old partnership with OpenAI has become strained, according to people familiar with the matter …
- OpenAI says two employee devices were impacted via a supply chain attack on TanStack but no user data or production systems were compromised (Lorenzo Franceschi-Bicchierai/TechCrunch)
Lorenzo Franceschi-Bicchierai / TechCrunch : OpenAI says two employee devices were impacted via a supply chain attack on TanStack but no user data or production systems were compromised — Earlier this week, hackers hijacked several open source projects used by dozens of companies and pushed updates designed to spread malware.
- Ian Crosby's Synthetic, which is building an AI bookkeeper, raised a $10M seed led by Khosla; Crosby founded Bench, which raised $100M+ and imploded in 2024 (Marina Temkin/TechCrunch)
Marina Temkin / TechCrunch : Ian Crosby's Synthetic, which is building an AI bookkeeper, raised a $10M seed led by Khosla; Crosby founded Bench, which raised $100M+ and imploded in 2024 — Ian Crosby, whose previous startup Bench Accounting famously shut down in 2024 before being bought for scraps …
- Anthropic and the Gates Foundation pledge $200M to use AI in health and education initiatives; the foundation signed a similar, $50M deal with OpenAI in January (Jeffrey Dastin/Reuters)
Jeffrey Dastin / Reuters : Anthropic and the Gates Foundation pledge $200M to use AI in health and education initiatives; the foundation signed a similar, $50M deal with OpenAI in January — Anthropic and the Gates Foundation have pledged $200 million to back artificial intelligence-related public goods …
- Razer updates the Blade 18 with an Intel Core Ultra 9 290HX Plus chip, starting at $4000 with an Nvidia RTX 5070 Ti, up $500 from the comparable 2025 model (Antonio G. Di Benedetto/The Verge)
Antonio G. Di Benedetto / The Verge : Razer updates the Blade 18 with an Intel Core Ultra 9 290HX Plus chip, starting at $4000 with an Nvidia RTX 5070 Ti, up $500 from the comparable 2025 model — RAMageddon has come for the Blade 18 — and the Blade 16, too. … Razer just announced a new Blade 18 gaming laptop with an Intel Core Ultra 9 290HX Plus chip.
Solidot(15)
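- Scientists extract genetic information from Homo erectus fossils for the first time
Researchers at the Chinese Academy of Sciences have, for the first time, recovered phylogenetically informative endogenous enamel protein data from six Middle Pleistocene Homo erectus teeth, roughly 400,000 years old, found at three sites: Zhoukoudian in Beijing, Hexian in Anhui, and Sunjiadong in Henan. It is the first molecular information carrying diagnostic Homo erectus features, and it redraws the network of interactions among ancient human groups in Middle Pleistocene East Asia. Did the Homo erectus populations in China belong to a single evolutionary lineage, or did they represent several groups of different origin or relative isolation? The study built a comparative endogenous protein dataset covering the six East Asian Homo erectus individuals and one Harbin individual. The six East Asian Homo erectus individuals clearly cluster as one branch, cleanly separated from Denisovans, Neanderthals, and modern humans. The study also finds that some of the genes that passed from the Denisovan genome into modern humans can be traced back to populations related to the Middle Pleistocene groups at Zhoukoudian, Hexian, and Sunjiadong.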
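- The first dentist was a Neanderthal
According to a study published in PLOS One, the first dentist was a Neanderthal. Some 59,000 years ago, in what is now southwestern Siberia, a Neanderthal had a toothache so severe that he let someone else drill into the tooth with a sharp stone tool to clear out the infected tissue, which eventually relieved the pain. The procedure left a hole in the tooth. Alisa Zubova, a paleoanthropologist at the Russian Academy of Sciences, and her colleagues interpret this as dental work. Archaeologists excavated the tooth from Chagyrskaya Cave in Russia. It is the oldest known evidence of dental treatment and the oldest direct treatment found so far. Drilling into a tooth to relieve pain may seem counterintuitive, but it is the simplest and least destructive way to remove infected tissue. Exposing the pulp chamber kills the exposed nerve, which eliminates the pain. The practice only became widespread a few hundred years ago, yet Neanderthals had discovered it tens of thousands of years earlier and could cooperate with one another to carry it out.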
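- Widespread cheating with AI tools forces Princeton to scrap unproctored exams
In 1893, Princeton University students petitioned to end faculty proctoring of exams. The university then established the Honor Code, under which students pledge: "I pledge my honor that I have not violated the Honor Code's academic integrity policy during this examination." The unproctored system lasted 133 years until it was voted out this week because of the spread of AI cheating tools. A 2025 survey of seniors found that 29.9% admitted to cheating on at least one assignment or exam. Among students pursuing the Bachelor of Science in Engineering (BSE), 40.8% admitted to cheating, compared with "only" 26.4% of Bachelor of Arts students. Most of the cheating relied on generative AI tools. The Honor Code depends on students reporting one another, but with phones, AI, and a culture that discourages informing on peers, many turn a blind eye. Students say long lines form at the men's restrooms during exams, a sign of how common cheating has become. The survey found that 44.6% of seniors had witnessed cheating but chose not to report it. Princeton faculty voted this week to end unproctored exams, with only one vote against. Starting July 1, all in-class exams must be proctored by instructors.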
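- Why are some people mosquito magnets?
Why do some people attract mosquitoes so much more than others? Scientists are trying to decode the chemical signals behind it. A range of sensory cues leads mosquitoes to choose a particular person to bite: mainly the odors and heat the body gives off and the carbon dioxide it exhales. Female mosquitoes, the only ones that bite humans, use finely tuned receptors to detect these signals and pick their targets. Mosquitoes begin to detect odors within about 10 meters of a person. As they get closer, body temperature and humidity also make some people more attractive to them. In the latest study, researchers released Aedes aegypti mosquitoes on 42 women in the lab and observed whom they preferred to bite. The researchers showed that mosquitoes rely on a blend of several odor compounds. Out of roughly 1,000 possible compounds, they identified 27 that mosquitoes can detect. The women mosquitoes most preferred, including women in the second trimester of pregnancy, secreted large amounts of specific compounds produced by the breakdown of sebum, the skin's oil. One of those compounds is 1-octen-3-ol.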
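- US approves H200 chip sales to 10 Chinese companies
The US has approved sales of H200 chips to 10 Chinese companies, but China has not yet approved any of the deals. Nvidia CEO Jensen Huang traveled to China this week with the US president, seeking a breakthrough. Huang was not originally on the White House's list for the China delegation. He joined at Trump's invitation, and the plane picked him up in Alaska. The visit may break the deadlock over chip sales. The US Department of Commerce has approved 10 Chinese companies, including Alibaba, Tencent, ByteDance, and JD.com, to buy Nvidia H200 chips. Distributors including Lenovo and Foxconn have also been approved. Buyers can purchase directly from Nvidia or through intermediaries. Under the US license terms, each approved customer may buy up to 75,000 chips.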
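- Study finds prenatal exposure to vegetable flavors helps babies learn to like vegetables
Vegetables are good for health, but getting children to eat them is a major challenge for parents. A study finds that expectant parents can plan ahead: familiarize children with the smell of vegetables before birth, and they are less likely to reject vegetables afterward. In the experiment, researchers had some pregnant women take kale powder capsules and others take carrot powder capsules, then observed the facial reactions of the fetuses or infants to kale and carrot. The fetuses were observed by ultrasound, and the children were observed again at three weeks and at three years of age. The researchers said pregnant women were unwilling to drink large amounts of kale or carrot juice for science, so they chose capsules. The results were largely consistent: children exposed to carrot powder did not reject carrots, and those exposed to kale liked kale. The researchers speculate that exposure to a specific flavor in late pregnancy may leave a lasting taste or smell memory that could shape dietary preferences years after birth. They noted that the study was small and said they would run a larger one if funding allows.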
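- Astronomers observe LAP1-B, an extremely small galaxy from more than 12 billion years ago
According to a study published in Nature, an international team of astronomers has observed the extremely small galaxy "LAP1-B", which dates to about 800 million years after the formation of the universe. The galaxy contains less oxygen than any other galaxy found so far and almost no heavy elements, placing it at an early stage of growth. The astronomers used gravitational lensing and NASA's Webb telescope to carry out more than 30 hours of high-sensitivity spectroscopic observation. The results show that the galaxy's oxygen abundance is less than 1/240 that of the Sun, while its carbon abundance is extremely high relative to oxygen. This suggests the observation likely captured the state shortly after first-generation stars exploded as supernovae. The total stellar mass of the Milky Way is about 100 billion times that of the Sun, whereas LAP1-B's total stellar mass is tiny, less than 3,300 solar masses, and most of the object consists of unknown "dark matter". LAP1-B may be an ancestor of the "fossil galaxies" around the Milky Way that stopped forming stars more than 12 billion years ago.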
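- Obesity rates are accelerating in developing countries
According to a study published in Nature, obesity rates have risen steadily in low- and middle-income countries over the past 45 years but have leveled off in many high-income countries. The study, a large-scale analysis of data on 232 million people in 200 countries and territories, found marked differences in obesity trajectories across countries, age groups, and sexes. In high-income countries in regions such as Western Europe, North America, and Australasia, obesity rates rose early in the study period and then leveled off in most countries, although prevalence varies. In Western European countries, for example, adult obesity prevalence has stabilized at between 11% and 23%, and prevalence among children and adolescents at between 4% and 15%. By contrast, obesity rates have climbed sharply in some low- and middle-income countries. In certain countries in Central Europe (such as Romania and Czechia) and Latin America (such as Brazil), adult obesity rates have reached 30% to 40%.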
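- Harvard faculty vote on whether to cap the number of A grades
Elite US universities face a grade inflation problem. At Harvard, for example, 60% of undergraduates receive A's, up from 40% a decade ago and under 25% two decades ago. In 2025, more than 50 students graduated with a perfect GPA, meaning an A in every course. To curb grade inflation, Harvard faculty are voting on a proposal that would cap the share of students receiving an A in each course at 20%, plus four additional A grades per course; change the calculation of university honors from traditional GPA to average percentile rank; and introduce a new pass, fail, and pass+ grading scheme for courses. In a faculty survey, 60% of the 205 respondents supported the "20+4" grading plan. The four extra A's help small seminars, where students typically earn higher grades. If the vote passes, the changes would take effect in the fall 2027 semester.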
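- Meta employees protest the company's tracking of their mouse movements and keystrokes
Meta recently began installing tracking software on US employees' computers to capture mouse movements, clicks, and keystrokes for training AI models, part of the company's larger plan to build AI agents that can carry out work tasks automatically. The tool, called the Model Capability Initiative (MCI), runs on work-related apps and websites and takes screenshots at irregular intervals. On Tuesday this week, Meta employees distributed flyers in several offices to protest the tracking software. The flyers appeared in meeting rooms, on vending machines, and on toilet paper holders, urging employees to sign an online petition against the software. The flyers and the petition cite the U.S. National Labor Relations Act, stating that employees' actions are legally protected when they choose to organize to improve working conditions.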
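- Windows Update will automatically roll back faulty drivers
Microsoft is adding a feature called Cloud-Initiated Driver Recovery to Windows Update. After a problematic driver has been installed, the feature will automatically roll back to the previous working version, without waiting for the hardware manufacturer to ship a fix or for users to fix it themselves. Today, handling a faulty driver means either the hardware manufacturer pushes an update or the user manually uninstalls the driver. Cloud-Initiated Driver Recovery will let Microsoft trigger a driver rollback remotely. Testing and validation of the feature will continue through August, with rollout to Windows users planned for September.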
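- Fired brothers deleted 96 databases
Companies normally revoke an employee's credentials before a dismissal, because a fired employee is a security risk. Twin brothers Muneeb and Sohaib Akhter, fired by the same company, deleted 96 US government databases within minutes. The brothers had criminal records and had previously been sentenced for crimes, yet the same company hired them one after the other in 2023 and 2024. The employer learned of their past in February 2025 and fired them immediately. Sohaib's account was closed, but his brother Muneeb's account was overlooked. Muneeb moved at once to sabotage the company: he deleted US government databases that the company maintained, and he also asked an AI tool how to clear the system logs on a server after deleting databases. The two were arrested on December 3. On April 15, 2026, Muneeb pleaded guilty, while Sohaib went before a jury and was convicted of conspiracy.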
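- Kickstarter bans adult content
Crowdfunding platform Kickstarter revised its rules over the past few days, broadening the range of prohibited adult content. Previously it banned only "pornographic content". The ban now covers far more, including but not limited to: implied sexual acts, MILF/DILF content, suggestive nudity, and any content showing female nipples or areolae, genitals, or anuses. It even bans buttocks. It is not yet clear why Kickstarter is doing this. Media outlets speculate that pressure from payment company Stripe may be the reason. In March, Kickstarter emailed project creators to warn that Stripe would review any project containing "adult/NSFW content" and might close the project's fundraising account.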
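- Both the American left and right voice concern about AI
Steve Bannon on the right and Bernie Sanders on the left are far apart on many issues, but both believe AI is a disaster for the working class. Sanders says the AI oligarchs want to replace workers as such, not merely particular jobs. Bannon says Silicon Valley does not care about ordinary people. Polls show the US is among the countries most concerned about AI. It is both the world's leading developer of AI and its leading opponent. Anti-AI sentiment in the US keeps rising. Maine passed the first statewide ban on data center construction in the US, although the governor vetoed it. A firebomb was thrown at the home of OpenAI CEO Sam Altman. A quarter of Americans accept violence as a means of opposition. Silicon Valley likes to compare AI to the Industrial Revolution and to stress the enormous wealth that revolution unleashed. The Industrial Revolution did drive economic growth, but living through it was another matter. Entrepreneurs amassed huge fortunes while workers' wages stagnated and working conditions worsened. Most Americans already feel the economic system is rigged in favor of the rich. Polling finds that the group of Americans most optimistic about AI's role in daily life is people earning more than $200,000 a year. Wealth and power are increasingly concentrated in the hands of a few.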
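- Arts and cultural activities are linked to slower aging
According to a study published in the journal Innovation in Aging, singing, painting, and visiting art galleries or museums help slow aging. The study links active participation in arts and cultural activities to better health. Slowing the pace of aging does not necessarily mean living longer. The study assessed biological aging with "epigenetic clocks", which can predict future morbidity and mortality. By one of the measures used, people who took part in an arts activity at least once a week aged 4% more slowly, and those who did so once a month aged 3% more slowly. Another test showed that people who took part in an arts activity at least once a week were on average one year younger in biological age than those who rarely did. By the same measure, people who exercised once a week were only six months younger. The researchers say the effect of the arts on the pace of aging is large enough to be comparable to the difference between smokers and people who have quit.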