OrangeBot.AI Digest — 2026-03-31
89 headlines across 8 sources, aggregated for the day.
Hacker News (15)
- OpenAI closes funding round at an $852B valuation (www.cnbc.com)
- GitHub's Historic Uptime (damrnelson.github.io)
- OkCupid gave 3M dating-app photos to facial recognition firm, FTC says (arstechnica.com)
- The Claude Code Source Leak: fake tools, frustration regexes, undercover mode (alex000kim.com)
- Italy blocks US use of Sicily air base for Middle East war (www.politico.eu)
- Tell HN: Chrome says "suspicious download" when trying to download yt-dlp
- Microsoft: Copilot is for entertainment purposes only (www.microsoft.com)
- Oracle slashes 30k jobs (rollingout.com)
- Open source CAD in the browser (Solvespace) (solvespace.com)
- Claude Code users hitting usage limits 'way faster than expected' (www.theregister.com)
- Audio tapes reveal mass rule-breaking in Milgram's obedience experiments (www.psypost.org)
- Claude Code's source code has been leaked via a map file in their NPM registry (twitter.com)
- Good CTE, Bad CTE (boringsql.com)
- GitHub backs down, kills Copilot pull-request ads after backlash (www.theregister.com)
- Google's 200M-parameter time-series foundation model with 16k context (github.com)
GitHub Trending (14)
- luongnv89 / claude-howto
- microsoft / VibeVoice
- Yeachan-Heo / oh-my-claudecode
- shanraisshan / claude-code-best-practice
- NousResearch / hermes-agent
- obra / superpowers
- microsoft / agent-lightning
- PaddlePaddle / PaddleOCR
- Dimillian / Skills
- sherlock-project / sherlock
- neovim / neovim
- vas3k / TaxHacker
- OpenBMB / ChatDev
- jwasham / coding-interview-university
Product Hunt (15)
- Relacan
Your canvas becomes a website. Think, arrange, publish.
- Autoclaw
One-click Openclaw set up by Z.AI
- MacMonitor
Real-time Apple Silicon system monitor for your menu bar
- OpenClawCloud
The turnkey OpenClaw solution with unlimited LLM tokens
- FireAPI
Discover, consume, and monetize APIs in one place
- IndieEvent
Meet Indie makers in your city
- Solvea
Create your AI receptionist that answers, books, and sells
- Google Ads MCP Server
Run Google Ads from your choice of AI. Skip the UI maze
- Stamp
The AI Secretary that thinks, writes, and works like you
- LobeHub IM Integration
Chat with your AI agent right where your team already works
- Unify
Hire AI colleagues you onboard just like real people
- JobFlow
Your AI co-pilot for job hunting
- Cosmic Team Agents
AI team members that live in Slack, WhatsApp, and Telegram
- Gallifai
Speedy conversational AI
- Codync
Monitor Claude Code sessions in real-time, from anywhere
Hugging Face (15)
- TAPS: Task Aware Proposal Distributions for Speculative Sampling
Speculative decoding accelerates autoregressive generation by letting a lightweight draft model propose future tokens that a larger target model then verifies in parallel. In practice, however, draft models are usually trained on broad generic corpora, which leaves it unclear how much speculative decoding quality depends on the draft training distribution. We study this question with lightweight HASS and EAGLE-2 drafters trained on MathInstruct, ShareGPT, and mixed-data variants, evaluated on MT-Bench, GSM8K, MATH-500, and SVAMP. Measured by acceptance length, task-specific training yields clear specialization: MathInstruct-trained drafts are strongest on reasoning benchmarks, while ShareGPT-trained drafts are strongest on MT-Bench. Mixed-data training improves robustness, but larger mixtures do not dominate across decoding temperatures. We also study how to combine specialized drafters at inference time. Naive checkpoint averaging performs poorly, whereas confidence-based routing improves over single-domain drafts and merged-tree verification yields the highest acceptance length overall for both backbones. Finally, confidence is a more useful routing signal than entropy: rejected tokens tend to have higher entropy, but confidence produces much clearer benchmark-level routing decisions. These results show that speculative decoding quality depends not only on draft architecture, but also on the match between draft training data and downstream workload, and that specialized drafters are better combined at inference time than in weight space.
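The draft-propose / target-verify loop that TAPS builds on can be illustrated with a toy acceptance routine. This is a minimal sketch of standard speculative sampling, not the paper's code: the per-position probability dicts stand in for real draft and target model calls, and a full implementation would also resample a correction token on rejection.

```python
import random

def speculative_step(draft_probs, target_probs, proposed, rng):
    """Verify a draft-proposed token run against the target model.

    draft_probs / target_probs are per-position dicts mapping
    token -> probability (toy stand-ins for real model calls).
    Returns the length of the accepted prefix under the standard
    accept-with-probability min(1, p/q) rule.
    """
    accepted = 0
    for i, tok in enumerate(proposed):
        q = draft_probs[i].get(tok, 1e-9)  # draft probability of the token
        p = target_probs[i].get(tok, 0.0)  # target probability of the token
        if rng.random() < min(1.0, p / q):
            accepted += 1
        else:
            break  # first rejection ends the speculative run
    return accepted
```

The paper's "acceptance length" metric is the average of this accepted-prefix length over a workload, which is why matching the draft's training distribution to the task raises it.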
- Towards a Medical AI Scientist
Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating discovery. However, existing AI Scientists remain largely domain-agnostic, limiting their applicability to clinical medicine, where research is required to be grounded in medical evidence with specialized data modalities. In this work, we introduce Medical AI Scientist, the first autonomous research framework tailored to clinical research. It enables clinically grounded ideation by transforming extensively surveyed literature into actionable evidence through a clinician-engineer co-reasoning mechanism, which improves the traceability of generated research ideas. It further facilitates evidence-grounded manuscript drafting guided by structured medical compositional conventions and ethical policies. The framework operates under three research modes, namely paper-based reproduction, literature-inspired innovation, and task-driven exploration, each corresponding to a distinct level of automated scientific inquiry with progressively increasing autonomy. Comprehensive evaluations by both large language models and human experts demonstrate that the ideas generated by the Medical AI Scientist are of substantially higher quality than those produced by commercial LLMs across 171 cases, 19 clinical tasks, and 6 data modalities. Meanwhile, our system achieves strong alignment between the proposed method and its implementation, while also demonstrating significantly higher success rates in executable experiments. Double-blind evaluations by human experts and the Stanford Agentic Reviewer suggest that the generated manuscripts approach MICCAI-level quality, while consistently surpassing those from ISBI and BIBM. The proposed Medical AI Scientist highlights the potential of leveraging AI for autonomous scientific discovery in healthcare.
- Gen-Searcher: Reinforcing Agentic Search for Image Generation
Recent image generation models have shown strong capabilities in generating high-fidelity and photorealistic images. However, they are fundamentally constrained by frozen internal knowledge, thus often failing on real-world scenarios that are knowledge-intensive or require up-to-date information. In this paper, we present Gen-Searcher, the first attempt to train a search-augmented image generation agent, which performs multi-hop reasoning and search to collect the textual knowledge and reference images needed for grounded generation. To achieve this, we construct a tailored data pipeline and curate two high-quality datasets, Gen-Searcher-SFT-10k and Gen-Searcher-RL-6k, containing diverse search-intensive prompts and corresponding ground-truth synthesis images. We further introduce KnowGen, a comprehensive benchmark that explicitly requires search-grounded external knowledge for image generation and evaluates models from multiple dimensions. Based on these resources, we train Gen-Searcher with SFT followed by agentic reinforcement learning with dual reward feedback, which combines text-based and image-based rewards to provide more stable and informative learning signals for GRPO training. Experiments show that Gen-Searcher brings substantial gains, improving Qwen-Image by around 16 points on KnowGen and 15 points on WISE. We hope this work can serve as an open foundation for search agents in image generation, and we fully open-source our data, models, and code.
- Emergent Social Intelligence Risks in Generative Multi-Agent Systems
Multi-agent systems composed of large generative models are rapidly moving from laboratory prototypes to real-world deployments, where they jointly plan, negotiate, and allocate shared resources to solve complex tasks. While such systems promise unprecedented scalability and autonomy, their collective interaction also gives rise to failure modes that cannot be reduced to individual agents. Understanding these emergent risks is therefore critical. Here, we present a pioneering study of such emergent multi-agent risks in workflows that involve competition over shared resources (e.g., computing resources or market share), sequential handoff collaboration (where downstream agents see only predecessor outputs), collective decision aggregation, and others. Across these settings, we observe that such group behaviors arise frequently across repeated trials and a wide range of interaction conditions, rather than as rare or pathological cases. In particular, phenomena such as collusion-like coordination and conformity emerge with non-trivial frequency under realistic resource constraints, communication protocols, and role assignments, mirroring well-known pathologies in human societies despite no explicit instruction. Moreover, these risks cannot be prevented by existing agent-level safeguards alone. These findings expose the dark side of intelligent multi-agent systems: a social intelligence risk where agent collectives, despite no instruction to do so, spontaneously reproduce familiar failure patterns from human societies.
- EpochX: Building the Infrastructure for an Emergent Agent Civilization
General-purpose technologies reshape economies less by improving individual tools than by enabling new ways to organize production and coordination. We believe AI agents are approaching a similar inflection point: as foundation models make broad task execution and tool use increasingly accessible, the binding constraint shifts from raw capability to how work is delegated, verified, and rewarded at scale. We introduce EpochX, a credits-native marketplace infrastructure for human-agent production networks. EpochX treats humans and agents as peer participants who can post tasks or claim them. Claimed tasks can be decomposed into subtasks and executed through an explicit delivery workflow with verification and acceptance. Crucially, EpochX is designed so that each completed transaction can produce reusable ecosystem assets, including skills, workflows, execution traces, and distilled experience. These assets are stored with explicit dependency structure, enabling retrieval, composition, and cumulative improvement over time. EpochX also introduces a native credit mechanism to make participation economically viable under real compute costs. Credits lock task bounties, budget delegation, settle rewards upon acceptance, and compensate creators when verified assets are reused. By formalizing the end-to-end transaction model together with its asset and incentive layers, EpochX reframes agentic AI as an organizational design problem: building infrastructures where verifiable work leaves persistent, reusable artifacts, and where value flows support durable human-agent collaboration.
- On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models
Multimodal Continual Instruction Tuning aims to continually enhance Large Vision Language Models (LVLMs) by learning from new data without forgetting previously acquired knowledge. Mixture of Experts (MoE) architectures naturally facilitate this by incrementally adding new experts and expanding routers while keeping the existing ones frozen. However, despite expert isolation, MoE-based continual learners still suffer from forgetting due to routing-drift: old-task tokens become mistakenly attracted to newly added experts, degrading performance on prior tasks. We analyze the failure mode at the token level and reveal the token's dilemma: ambiguous and old tokens in new-task data offer minimal learning benefit yet induce forgetting when routed to new experts, due to their ambiguous routing assignment during training. Motivated by this, we propose LLaVA-DyMoE, a dynamic MoE framework that incrementally expands the MoE with drift-aware token assignment. We characterize token types via their routing score distributions and apply targeted regularization. Specifically, a token-level assignment guidance steers ambiguous and old tokens away from new experts to preserve established routing patterns and alleviate routing-drift, while complementary routing score regularizations enforce expert-group separation and promote new-expert specialization. Extensive experiments demonstrate that our LLaVA-DyMoE effectively mitigates routing-drift-induced forgetting, achieving over a 7% gain in mean final accuracy and a 12% reduction in forgetting compared to baselines. The project page is https://zhaoc5.github.io/DyMoE.
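The routing-drift failure the abstract describes can be made concrete with a toy drift-aware router. Everything below is illustrative rather than LLaVA-DyMoE's actual method: the top-2 softmax margin test for "ambiguous" tokens and the rule of confining such tokens to the frozen old experts are assumptions chosen to demonstrate the idea.

```python
import math

def route_token(scores, num_old_experts, margin_thresh=0.2):
    """Route a token to one expert given raw router scores.

    Drift-aware sketch (illustrative): if the softmax top-2 margin is
    small, the token is treated as ambiguous and restricted to the
    first num_old_experts (the frozen, previously trained experts),
    so newly added experts cannot attract it and induce forgetting.
    """
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
    margin = probs[ranked[0]] - probs[ranked[1]]
    if margin < margin_thresh:
        # ambiguous token: only old experts are eligible
        return max(range(num_old_experts), key=lambda i: probs[i])
    return ranked[0]
```

A confident token goes to its top-scoring expert, old or new; an ambiguous one stays with the old expert group, which is the gist of steering ambiguous and old tokens away from new experts.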
- GEditBench v2: A Human-Aligned Benchmark for General Image Editing
Recent advances in image editing have enabled models to handle complex instructions with impressive realism. However, existing evaluation frameworks lag behind: current benchmarks suffer from narrow task coverage, while standard metrics fail to adequately capture visual consistency, i.e., the preservation of identity, structure and semantic coherence between edited and original images. To address these limitations, we introduce GEditBench v2, a comprehensive benchmark with 1,200 real-world user queries spanning 23 tasks, including a dedicated open-set category for unconstrained, out-of-distribution editing instructions beyond predefined tasks. Furthermore, we propose PVC-Judge, an open-source pairwise assessment model for visual consistency, trained via two novel region-decoupled preference data synthesis pipelines. In addition, we construct VCReward-Bench using expert-annotated preference pairs to assess the alignment of PVC-Judge with human judgments on visual consistency evaluation. Experiments show that our PVC-Judge achieves state-of-the-art evaluation performance among open-source models and even surpasses GPT-5.1 on average. Finally, by benchmarking 16 frontier editing models, we show that GEditBench v2 enables more human-aligned evaluation, revealing critical limitations of current models, and providing a reliable foundation for advancing precise image editing.
- PRBench: End-to-end Paper Reproduction in Physics Research
AI agents powered by large language models exhibit strong reasoning and problem-solving capabilities, enabling them to assist scientific research tasks such as formula derivation and code generation. However, whether these agents can reliably perform end-to-end reproduction from real scientific papers remains an open question. We introduce PRBench, a benchmark of 30 expert-curated tasks spanning 11 subfields of physics. Each task requires an agent to comprehend the methodology of a published paper, implement the corresponding algorithms from scratch, and produce quantitative results matching the original publication. Agents are provided only with the task instruction and paper content, and operate in a sandboxed execution environment. All tasks are contributed by domain experts from over 20 research groups at the School of Physics, Peking University, each grounded in a real published paper and validated through end-to-end reproduction with verified ground-truth results and detailed scoring rubrics. Using an agentified assessment pipeline, we evaluate a set of coding agents on PRBench and analyze their capabilities across key dimensions of scientific reasoning and execution. The best-performing agent, OpenAI Codex powered by GPT-5.3-Codex, achieves a mean overall score of 34%. All agents exhibit a zero end-to-end reproduction success rate, with particularly poor performance in data accuracy and code correctness. We further identify systematic failure modes, including errors in formula implementation, inability to debug numerical simulations, and fabrication of output data. Overall, PRBench provides a rigorous benchmark for evaluating progress toward autonomous scientific research.
- Make Geometry Matter for Spatial Reasoning
Empowered by large-scale training, vision-language models (VLMs) achieve strong image and video understanding, yet their ability to perform spatial reasoning in both static scenes and dynamic videos remains limited. Recent advances try to handle this limitation by injecting geometry tokens from pretrained 3D foundation models into VLMs. Nevertheless, we observe that naive token fusion followed by standard fine-tuning in this line of work often leaves such geometric cues underutilized for spatial reasoning, as VLMs tend to rely heavily on 2D visual cues. In this paper, we propose GeoSR, a framework designed to make geometry matter by encouraging VLMs to actively reason with geometry tokens. GeoSR introduces two key components: (1) Geometry-Unleashing Masking, which strategically masks portions of 2D vision tokens during training to weaken non-geometric shortcuts and force the model to consult geometry tokens for spatial reasoning; and (2) Geometry-Guided Fusion, a gated routing mechanism that adaptively amplifies geometry token contributions in regions where geometric evidence is critical. Together, these designs unleash the potential of geometry tokens for spatial reasoning tasks. Extensive experiments on both static and dynamic spatial reasoning benchmarks demonstrate that GeoSR consistently outperforms prior methods and establishes new state-of-the-art performance by effectively leveraging geometric information. The project page is available at https://suhzhang.github.io/GeoSR/.
- ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks
Advances in diffusion, autoregressive, and hybrid models have enabled high-quality image synthesis for tasks such as text-to-image, editing, and reference-guided composition. Yet, existing benchmarks remain limited: they either focus on isolated tasks, cover only narrow domains, or provide opaque scores without explaining failure modes. We introduce ImagenWorld, a benchmark of 3.6K condition sets spanning six core tasks (generation and editing, with single or multiple references) and six topical domains (artworks, photorealistic images, information graphics, textual graphics, computer graphics, and screenshots). The benchmark is supported by 20K fine-grained human annotations and an explainable evaluation schema that tags localized object-level and segment-level errors, complementing automated VLM-based metrics. Our large-scale evaluation of 14 models yields several insights: (1) models typically struggle more in editing tasks than in generation tasks, especially in local edits. (2) models excel in artistic and photorealistic settings but struggle with symbolic and text-heavy domains such as screenshots and information graphics. (3) closed-source systems lead overall, while targeted data curation (e.g., Qwen-Image) narrows the gap in text-heavy cases. (4) modern VLM-based metrics achieve Kendall accuracies up to 0.79, approximating human ranking, but fall short of fine-grained, explainable error attribution. ImagenWorld provides both a rigorous benchmark and a diagnostic tool to advance robust image generation.
- On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers
Modern Text-to-Image (T2I) diffusion models have achieved remarkable semantic alignment, yet they often suffer from a significant lack of variety, converging on a narrow set of visual solutions for any given prompt. This typicality bias presents a challenge for creative applications that require a wide range of generative outcomes. We identify a fundamental trade-off in current approaches to diversity: modifying model inputs requires costly optimization to incorporate feedback from the generative path. In contrast, acting on spatially-committed intermediate latents tends to disrupt the forming visual structure, leading to artifacts. In this work, we propose to apply repulsion in the Contextual Space as a novel framework for achieving rich diversity in Diffusion Transformers. By intervening in the multimodal attention channels, we apply on-the-fly repulsion during the transformer's forward pass, injecting the intervention between blocks where text conditioning is enriched with emergent image structure. This allows for redirecting the guidance trajectory after it is structurally informed but before the composition is fixed. Our results demonstrate that repulsion in the Contextual Space produces significantly richer diversity without sacrificing visual fidelity or semantic adherence. Furthermore, our method is uniquely efficient, imposing a small computational overhead while remaining effective even in modern "Turbo" and distilled models where traditional trajectory-based interventions typically fail.
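The core idea of repulsion between parallel samples, pushing their intermediate representations apart so the final outputs diversify, can be shown in a few lines. This is a generic repulsion step on plain vectors, not the paper's attention-channel intervention in Diffusion Transformers; the step size and centroid-based direction are assumptions for illustration.

```python
def repel(vectors, step=0.1):
    """One repulsion step: nudge each vector away from the centroid
    of the others, increasing pairwise spread.

    Generic sketch; the paper instead applies repulsion inside the
    multimodal attention channels of a DiT, between blocks.
    """
    out = []
    for i, v in enumerate(vectors):
        others = [u for j, u in enumerate(vectors) if j != i]
        # centroid of the other samples, dimension by dimension
        centroid = [sum(c) / len(others) for c in zip(*others)]
        out.append([a + step * (a - centroid_a)
                    for a, centroid_a in zip(v, centroid)])
    return out
```

Applying such a step between denoising stages spreads the batch of samples apart; the paper's contribution is doing this in a space where structure is already emerging but not yet fixed.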
- MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences
Research agents have recently achieved significant progress in information seeking and synthesis across heterogeneous textual and visual sources. In this paper, we introduce MuSEAgent, a multimodal reasoning agent that enhances decision-making by extending the capabilities of research agents to discover and leverage stateful experiences. Rather than relying on trajectory-level retrieval, we propose a stateful experience learning paradigm that abstracts interaction data into atomic decision experiences through hindsight reasoning. These experiences are organized into a quality-filtered experience bank that supports policy-driven experience retrieval at inference time. Specifically, MuSEAgent enables adaptive experience exploitation through complementary wide- and deep-search strategies, allowing the agent to dynamically retrieve multimodal guidance across diverse compositional semantic viewpoints. Extensive experiments demonstrate that MuSEAgent consistently outperforms strong trajectory-level experience retrieval baselines on both fine-grained visual perception and complex multimodal reasoning tasks. These results validate the effectiveness of stateful experience modeling in improving multimodal agent reasoning.
- Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization
We present Kernel-Smith, a framework for high-performance GPU kernel and operator generation that combines a stable evaluation-driven evolutionary agent with an evolution-oriented post-training recipe. On the agent side, Kernel-Smith maintains a population of executable candidates and iteratively improves them using an archive of top-performing and diverse programs together with structured execution feedback on compilation, correctness, and speedup. To make this search reliable, we build backend-specific evaluation services for Triton on NVIDIA GPUs and Maca on MetaX GPUs. On the training side, we convert long-horizon evolution trajectories into step-centric supervision and reinforcement learning signals by retaining correctness-preserving, high-gain revisions, so that the model is optimized as a strong local improver inside the evolutionary loop rather than as a one-shot generator. Under a unified evolutionary protocol, Kernel-Smith-235B-RL achieves state-of-the-art overall performance on KernelBench with the NVIDIA Triton backend, attaining the best average speedup ratio and outperforming frontier proprietary models including Gemini-3.0-pro and Claude-4.6-opus. We further validate the framework on the MetaX MACA backend, where our Kernel-Smith-MACA-30B surpasses large-scale counterparts such as DeepSeek-V3.2-think and Qwen3-235B-2507-think, highlighting the potential for seamless adaptation across heterogeneous platforms. Beyond benchmark results, the same workflow produces upstream contributions to production systems including SGLang and LMDeploy, demonstrating that LLM-driven kernel optimization can transfer from controlled evaluation to practical deployment.
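The evaluation-driven evolutionary loop the abstract describes (keep an archive of top candidates, mutate from it, score, retain the best) can be sketched generically. All names here are illustrative: `mutate` stands in for LLM-proposed revisions and `score` for the real compile/correctness/speedup feedback.

```python
import random

def evolve(seed_program, mutate, score,
           generations=20, pop_size=8, keep=3, seed=0):
    """Minimal archive-based evolutionary search.

    mutate(program, rng) -> candidate program
    score(program) -> float, higher is better
    Each generation mutates members of the archive, then keeps the
    top `keep` programs from archive + children, so quality is
    monotonically non-decreasing.
    """
    rng = random.Random(seed)
    archive = [seed_program]
    for _ in range(generations):
        children = [mutate(rng.choice(archive), rng)
                    for _ in range(pop_size)]
        pool = archive + children
        pool.sort(key=score, reverse=True)
        archive = pool[:keep]
    return archive[0]
```

Kernel-Smith's post-training recipe then turns the high-gain `mutate` steps observed in such trajectories into supervision, so the model improves as a local reviser inside this loop.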
- ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning
Multimodal Large Language Models (MLLMs) achieve stronger visual understanding by scaling input fidelity, yet the resulting visual token growth makes jointly sustaining high spatial resolution and long temporal context prohibitive. We argue that the bottleneck lies not in how post-encoding representations are compressed but in the volume of pixels the encoder receives, and address it with ResAdapt, an input-side adaptation framework that learns how much visual budget each frame should receive before encoding. ResAdapt couples a lightweight Allocator with an unchanged MLLM backbone, so the backbone retains its native visual-token interface while receiving an operator-transformed input. We formulate allocation as a contextual bandit and train the Allocator with Cost-Aware Policy Optimization (CAPO), which converts sparse rollout feedback into a stable accuracy-cost learning signal. Across budget-controlled video QA, temporal grounding, and image reasoning tasks, ResAdapt improves low-budget operating points and often lies on or near the efficiency-accuracy frontier, with the clearest gains on reasoning-intensive benchmarks under aggressive compression. Notably, ResAdapt supports up to 16x more frames at the same visual budget while delivering over 15% performance gain. Code is available at https://github.com/Xnhyacinth/ResAdapt.
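Framing per-frame budget allocation as a bandit problem can be sketched with a plain (non-contextual) epsilon-greedy allocator and a toy cost-aware reward. Both are illustrative stand-ins: CAPO is a policy-optimization method rather than epsilon-greedy, and the linear over-budget penalty below is an assumption.

```python
import random

def cost_aware_reward(accuracy, tokens_used, budget, penalty=1.0):
    """Toy accuracy-cost signal: reward task accuracy, penalize
    visual-token spend above the budget (illustrative form)."""
    over = max(0, tokens_used - budget)
    return accuracy - penalty * over / max(budget, 1)

class EpsGreedyAllocator:
    """Minimal epsilon-greedy bandit over discrete resolution arms."""

    def __init__(self, arms, eps=0.1, seed=0):
        self.arms = list(arms)
        self.eps = eps
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}  # running mean reward

    def pick(self):
        if self.rng.random() < self.eps:
            return self.rng.choice(self.arms)  # explore
        return max(self.arms, key=lambda a: self.values[a])  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental mean update
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

A contextual version, closer in spirit to ResAdapt, would condition `pick` on per-frame features so hard frames get more resolution than easy ones.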
- ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding
Understanding charts requires models to jointly reason over geometric visual patterns, structured numerical data, and natural language -- a capability where current vision-language models (VLMs) remain limited. We introduce ChartNet, a high-quality, million-scale multimodal dataset designed to advance chart interpretation and reasoning. ChartNet leverages a novel code-guided synthesis pipeline to generate 1.5 million diverse chart samples spanning 24 chart types and 6 plotting libraries. Each sample consists of five aligned components: plotting code, rendered chart image, data table, natural language summary, and question-answering with reasoning, providing fine-grained cross-modal alignment. To capture the full spectrum of chart comprehension, ChartNet additionally includes specialized subsets encompassing human-annotated data, real-world data, safety, and grounding. Moreover, a rigorous quality-filtering pipeline ensures visual fidelity, semantic accuracy, and diversity across chart representations. Fine-tuning on ChartNet consistently improves results across benchmarks, demonstrating its utility as large-scale supervision for multimodal models. As the largest open-source dataset of its kind, ChartNet aims to support the development of foundation models with robust and generalizable capabilities for data visualization understanding. The dataset is publicly available at https://huggingface.co/datasets/ibm-granite/ChartNet
Techmeme (15)
- An excerpt from the book The Infinity Machine details how DeepMind's early governance battles with Google changed Demis Hassabis from an idealist into a realist (Sebastian Mallaby/Colossus)
Sebastian Mallaby / Colossus : An excerpt from the book The Infinity Machine details how DeepMind's early governance battles with Google changed Demis Hassabis from an idealist into a realist — The inside story of how DeepMind's experiments in AI safety governance transformed Demis Hassabis from an idealist into a realist
- Samsung launches Hearapy, a free Android app to mitigate motion sickness by playing a 100Hz sine wave tone; a 60-second session can provide two hours of relief (Andrew Liszewski/The Verge)
Andrew Liszewski / The Verge : Samsung launches Hearapy, a free Android app to mitigate motion sickness by playing a 100Hz sine wave tone; a 60-second session can provide two hours of relief — Listening to a 100Hz sine wave tone for just 60 seconds could reduce motion sickness symptoms for up to two hours.
- Austin-based Saronic, which builds military autonomous ships, raised a $1.75B Series D led by Kleiner Perkins at a $9.25B valuation, up from $4B in Feb. 2025 (Samantha Subin/CNBC)
Samantha Subin / CNBC : Austin-based Saronic, which builds military autonomous ships, raised a $1.75B Series D led by Kleiner Perkins at a $9.25B valuation, up from $4B in Feb. 2025 — Autonomous ship startup Saronic said Tuesday that it's raised $1.75 billion as it ramps up production to meet mounting U.S. military demand …
- Sequoia says Doug Leone is returning in a newly created role of chairman, after he announced his retirement in 2022 from his role as "senior steward" (Iain Martin/Forbes)
Iain Martin / Forbes : Sequoia says Doug Leone is returning in a newly created role of chairman, after he announced his retirement in 2022 from his role as “senior steward” — Three years after his official retirement, the Midas List investor is back in a new supervisory role at the blue chip Silicon Valley fund.
- Anthropic confirms it leaked parts of Claude Code's source code, saying the leak was "a release packaging issue caused by human error, not a security breach" (Ashley Capoot/CNBC)
Ashley Capoot / CNBC : Anthropic confirms it leaked parts of Claude Code's source code, saying the leak was “a release packaging issue caused by human error, not a security breach” — Anthropic leaked part of the internal source code for its popular artificial intelligence coding assistant, Claude Code, the company confirmed on Tuesday.
- Snap shares climbed 14% on Tuesday after activist investor Irenic suggested changes to boost the stock's value 7x, such as cutting staff by 21% and ending Specs (Lola Murti/CNBC)
Lola Murti / CNBC : Snap shares climbed 14% on Tuesday after activist investor Irenic suggested changes to boost the stock's value 7x, such as cutting staff by 21% and ending Specs — Shares of Snap climbed 14% Tuesday after shareholder Irenic Capital Management sent a letter to CEO Evan Spiegel outlining changes …
- Yupp, which raised a $33M seed led by a16z crypto in 2024 for a crowdsourced AI model picker, shuts down, saying it didn't reach strong product-market fit (Julie Bort/TechCrunch)
Julie Bort / TechCrunch : Yupp, which raised a $33M seed led by a16z crypto in 2024 for a crowdsourced AI model picker, shuts down, saying it didn't reach strong product-market fit — Sometimes an apparently good idea, a big raise from a big-name VC, and a sea of well-connected angel investors is not enough.
- Microsoft stock plunged 23% in Q1, a steeper drop than any of its tech peers or the Nasdaq, and its steepest quarterly drop since the 2008 financial crisis (Jordan Novet/CNBC)
Jordan Novet / CNBC : Microsoft stock plunged 23% in Q1, a steeper drop than any of its tech peers or the Nasdaq, and its steepest quarterly drop since the 2008 financial crisis — Microsoft just closed out its worst quarter on Wall Street since the 2008 financial crisis, as investors soured on the software giant's prospects in artificial intelligence.
- OpenAI has tapped retail investors for the first time, raising $3B+ as part of its $122B round, through a trio of banks and ETFs managed by ARK Invest (George Hammond/Financial Times)
George Hammond / Financial Times : OpenAI has tapped retail investors for the first time, raising $3B+ as part of its $122B round, through a trio of banks and ETFs managed by ARK Invest — ChatGPT maker taps individuals for first time as it pulls in up to $122bn — OpenAI has raised more than $3bn from retail investors …
- OpenAI closed a $122B funding round led by SoftBank, a16z, and others at an $852B post-money valuation, after previously announcing the round would total $110B (Ashley Capoot/CNBC)
Ashley Capoot / CNBC : OpenAI closed a $122B funding round led by SoftBank, a16z, and others at an $852B post-money valuation, after previously announcing the round would total $110B — OpenAI on Tuesday announced that it closed a record-breaking funding round at a post-money valuation of $852 billion.
- PrismML, which says its 1-bit LLM achieves radical compression without sacrificing performance, comes out of stealth with $16.25M in SAFE and seed funding (Steven Rosenbush/Wall Street Journal)
Steven Rosenbush / Wall Street Journal : PrismML, which says its 1-bit LLM achieves radical compression without sacrificing performance, comes out of stealth with $16.25M in SAFE and seed funding — PrismML says its 1-bit large language model achieves radical compression without sacrificing performance, lowering energy consumption
- Iranian media: Iran arrested 46 people allegedly in a network selling Starlink terminals and seized 139 terminals; there are an estimated 50K terminals in Iran (Bloomberg)
Bloomberg : Iranian media: Iran arrested 46 people allegedly in a network selling Starlink terminals and seized 139 terminals; there are an estimated 50K terminals in Iran — Iranian authorities arrested dozens of people who were allegedly in a network that sold Starlink satellite terminals …
- Monzo is shuttering its US operations to focus on scaling in the UK and Europe; source: it will lay off ~50 employees and close clients' accounts in June (Aisha S Gani/Bloomberg)
Aisha S Gani / Bloomberg : Monzo is shuttering its US operations to focus on scaling in the UK and Europe; source: it will lay off ~50 employees and close clients' accounts in June — Monzo Bank Ltd. is shuttering its US operations after years of failing to gain a foothold in the world's largest market …
- Google attributes the supply chain attack on HTTP client Axios to a suspected North Korean threat actor it calls UNC1069 (Lorenzo Franceschi-Bicchierai/TechCrunch)
Lorenzo Franceschi-Bicchierai / TechCrunch : Google attributes the supply chain attack on HTTP client Axios to a suspected North Korean threat actor it calls UNC1069 — A suspected North Korean hacker has hijacked and modified a popular open source software development tool to deliver malware that could put millions of developers at risk of being compromised.
- Google launches Veo 3.1 Lite, costing <50% of Veo 3.1 Fast and meant for "high-volume video applications", and affirms its commitment to video generation tools (Abner Li/9to5Google)
Abner Li / 9to5Google : Google launches Veo 3.1 Lite, costing <50% of Veo 3.1 Fast and meant for “high-volume video applications”, and affirms its commitment to video generation tools — Following OpenAI's Sora exit last week, Google today said it's committed to offering video generation while announcing Veo 3.1 Lite.
Solidot(15)
- Microsoft stops Copilot from inserting ads into Pull Requests
After the practice drew widespread attention, GitHub Copilot principal product manager Tim Rogers announced on HN that they have disabled Copilot's behavior of inserting advertisements into Pull Requests. The Copilot team had not regarded text such as "Quickly spin up Copilot coding agent tasks from anywhere on your macOS or Windows machine with Raycast" inserted into PRs as advertising, but as "tips": "We've been including product tips in PRs created by the Copilot coding agent. The goal was to help developers learn how to make better use of the agent in their workflows. But after listening to the feedback and reflecting on it, we believe this was the wrong decision. We won't do it again." Raycast, the company named in the ad copy, said it had no knowledge of the practice.
- Google begins rolling out Android developer identity verification
Google's controversial developer identity verification has officially launched. Starting this September, Google will require identity verification for all Android app developers; apps from unverified developers will no longer be installable (sideloaded) on Android devices. Google's official blog says the requirement is driven by security concerns, citing its analysis showing that malicious apps from third-party sources are more than 90 times as numerous as those on Google Play. Developer verification is rolling out through the Android Developer Console and the Play Console; developers who distribute apps only outside Google Play will need to create an account through the Android Developer Console.
- Air pollution alerts reduced premature deaths
According to a study published in PNAS Nexus, a research team analyzed five consecutive years of data from 57 cities in northern China to assess the real-world effectiveness of air pollution alerts. Short-term exposure to PM2.5 (fine particulate matter) is well established to increase the risk of cardiovascular and respiratory death. The study found that the PM2.5 reductions triggered by air quality alerts averted nearly 54,000 premature deaths over the five years — equivalent to an approximately 11% reduction in PM2.5-attributable premature deaths associated with pollution episodes. Regions such as Henan, Hebei, and Shandong benefited the most; these areas are typically characterized by dense heavy industry and high coal consumption. During alert periods, the acute mortality risk attributable to PM2.5 is estimated to have fallen by 30%–40%. Air pollution alerts trigger a range of short-term measures, such as temporary factory shutdowns, traffic restrictions, bans on dust-generating construction work, and public health advisories. Across the 57 cities, the researchers found that during alert periods PM2.5 (fine particulates) fell by 20%–40%, PM10 (inhalable particulates) fell by 33%, and NO2 (nitrogen dioxide) fell by 5%–25%.
- Children's and adolescents' screen time has risen sharply over the past three decades
Finnish researchers report that children's and adolescents' screen time increased significantly over the past three decades (1991–2022), most markedly after the COVID-19 pandemic. Screens once meant mostly traditional television, but over time they have shifted toward more personal digital devices — PCs, phones, and gaming — while TV viewing time has gradually declined. The study found that during the pandemic's social lockdowns, children and adolescents relied on screens for learning, socializing, and entertainment, driving a sharp increase in screen time. Both boys and girls increased their screen time, though boys tended to spend more of it gaming. Younger children generally logged less screen time than older children, and children from high-income families also tended to log less.
- Microsoft Copilot added an ad while fixing a typo in a PR
A developer discovered that when Microsoft's AI assistant Copilot was used to fix a typo in a pull request, it also inserted an advertisement. A search of GitHub turned up tens of thousands of PRs containing the same ad copy: "Quickly spin up Copilot coding agent tasks from anywhere on your macOS or Windows machine with Raycast". Developers found the practice unacceptable.
- Jupiter's lightning releases as much energy as an atomic bomb
Using instruments aboard NASA's Juno probe, scientists measured Jupiter's lightning and found that it releases 100 to 10,000 times as much energy as lightning on Earth. A lightning strike on Earth releases roughly 1 billion joules, meaning Jupiter's strongest bolts release about 10 trillion joules — equivalent to 2,400 tons of TNT, or about one sixth the yield of the Hiroshima atomic bomb. Based on Juno's observations of lightning frequency in Jovian storms, bolts occur on average three times per second, meaning a storm releases energy equivalent to multiple atomic bombs every minute. Lightning is thought to have promoted the evolution of life on Earth, and lightning on Jupiter may likewise drive complex chemistry.
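The figures in that item can be sanity-checked with standard conversion factors (1 ton of TNT ≈ 4.184 × 10⁹ J; Hiroshima yield ≈ 15 kilotons of TNT — both conventional values, not from the article):

```python
# Energy of Earth's lightning vs. Jupiter's strongest bolts
earth_bolt_j = 1e9                      # ~1 billion joules per strike on Earth
jupiter_bolt_j = earth_bolt_j * 10_000  # up to 10,000x Earth's -> 1e13 J

TON_TNT_J = 4.184e9      # joules per ton of TNT (standard convention)
HIROSHIMA_TONS = 15_000  # ~15 kilotons of TNT

tons_tnt = jupiter_bolt_j / TON_TNT_J
print(round(tons_tnt))                      # ~2390 tons, matching the "2,400 tons" figure
print(round(HIROSHIMA_TONS / tons_tnt, 1))  # ~6.3, i.e. roughly one sixth of Hiroshima
```

The numbers line up with the article's claims to within rounding.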
- Bees and hummingbirds ingest trace amounts of alcohol on the job
Bees and hummingbirds both drink alcohol: their food — nectar — contains trace amounts of ethanol. Researchers at UC Berkeley found that ethanol-containing nectar is quite common, detecting ethanol in 26 of the 29 plant nectar samples they analyzed. Most samples had extremely low concentrations, but one reached 0.056% ethanol — about 0.1 proof, just barely qualifying as a drink. Though that sounds negligible, relative to a pollinator's body weight the daily alcohol intake is not small. An Anna's hummingbird drinks 0.5 to 1.5 times its body weight in nectar each day; based on that intake, the researchers estimate hummingbirds consume roughly 0.2 grams of ethanol per kilogram of body weight per day. Because they are constantly flitting between flowers, the alcohol is metabolized quickly, so they are unlikely to get drunk. Lab tests showed hummingbirds will readily drink nectar with around 1% alcohol, but they begin avoiding it as the concentration rises, and flower visits drop sharply at around 2%. They, too, know to drink in moderation.
- Dolby sues Snapchat, challenging AV1's royalty-free claim
The AOMedia consortium — whose members include Amazon, Apple, Google, Microsoft, Mozilla, and Netflix — developed the royalty-free open codec AOMedia Video 1 (AV1). But a patent-infringement lawsuit Dolby Laboratories has filed against Snap challenges AV1's royalty-free claim. Dolby alleges in its complaint that AV1 uses technologies Dolby has patented and never agreed to license on a free, royalty-free basis. Dolby argues that AOMedia does not own all the patents the AV1 codec implements, and that AV1 incorporates technology also present in HEVC — technology subject to existing third-party patent rights and licensing obligations.
- AI and bot traffic surpasses human traffic
According to Human Security's "The State of AI Traffic" report, AI and bot traffic has officially surpassed human traffic. The report says that in 2025, automated traffic — AI included — grew nearly eight times as fast as human activity. The popularity of large models such as OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini drove the growth, with AI traffic up 187% in 2025. Cloudflare CEO Matthew Prince had previously said at SXSW that before the generative AI era, roughly 20% of internet traffic came from bots, driven largely by Google's web crawlers.
- What DNA tells us, and where it falls short
In 2018, more than forty years after the Golden State Killer case went cold, a woman curious about her family history mailed her saliva to a genealogy company for sequencing. Her DNA became the key to cracking the case: the killer was a distant relative, and investigators ultimately caught former police officer Joseph James DeAngelo Jr., who in 2020 pleaded guilty to 13 counts of murder and 13 counts of kidnapping. Millions of people have sent DNA samples to companies like 23andMe and AncestryDNA to learn about their ancestry, uncover health risks, or find lost relatives. But the truths DNA reveals can upend our understanding of family and identity: you might discover that your parents are not your biological parents, or that a sibling is not a full sibling. DNA also shows that we are more closely connected than we once believed — the most recent common ancestor of all humans lived only a few thousand years ago. We are all related. Americans have long opposed a national DNA database on privacy grounds, yet voluntary consumer genetic testing has effectively created one: because of shared DNA, sequencing just 1% of the population is enough to make nearly everyone findable, and about 7% of Americans have been sequenced. Scientists have also found that what DNA reveals is still limited: whether your diabetes risk is 25% or 20% makes little practical difference and does not make you high-risk, so using genetic screening of embryos to lower diabetes risk from 35% to 30% is of little value.
- NASA astronaut lost the ability to speak on the space station; cause unknown
NASA cut short the Crew-11 mission this January because of a crew member's health problem. Crew-11 launched on August 1, 2025 and was scheduled to return around February 20, 2026; its four astronauts — commander Zena Cardman, 38; pilot Mike Fincke, 58; Japanese astronaut Kimiya Yui, 55; and Russian cosmonaut Oleg Platonov, 39 — came home more than a month early. It was the first medical evacuation in the International Space Station's 25-year history. Last month NASA disclosed the identity of the ill astronaut — 58-year-old Mike Fincke — but gave no further details. Last Friday, Fincke revealed that he had suddenly lost the ability to speak aboard the station, and doctors still do not know why. It remains unclear how long he was unable to speak or when he regained the ability. NASA has not commented.
- Ransomware group targets Persian-language systems
A ransomware group known as TeamPCP has tried to insert itself into the Iran war, releasing a worm designed to wipe data on infected systems set to an Iranian timezone or with Persian as the default language. TeamPCP began using the worm late last year to infect cloud environments, steal credentials, and extort victims over Telegram. Security firm Flare reported in January that the cloud services Azure (61%) and AWS (36%) together accounted for 97% of systems infected by the TeamPCP worm. TeamPCP was recently found deploying new malware: if it detects that the user's timezone and locale match Iran, it carries out a data-wiping attack; if the victim is in Iran and has access to a Kubernetes cluster, it wipes the data on every node in the cluster, and otherwise wipes only the local machine.
- OpenAI uses Cloudflare to block AI scrapers
OpenAI has been found using Cloudflare to fend off AI scrapers. Users discovered that every ChatGPT message triggers a Cloudflare Turnstile check; Turnstile verifies that the user is running a real browser and that the ChatGPT React app has actually loaded. A bot that spoofs a browser fingerprint without rendering the real ChatGPT SPA will fail Turnstile's verification. An OpenAI engineer responded that the measure exists to ensure its products are not abused by bots and web scrapers. The justification struck many as richly ironic, given the heavy load OpenAI's own AI crawlers have imposed on websites.
- Cooking tomatoes and carrots efficiently
Tomatoes and carrots are major dietary sources of carotenoids, which help lower the risk of several chronic diseases, including cardiovascular disease and cancer. The health impact of carotenoids depends not only on their concentration in a food but on their bioaccessibility — how much of them can actually be absorbed by the gut after digestion. Bioaccessibility varies markedly with cooking method: heat treatment improves carotenoid bioaccessibility by breaking down cell structures and promoting micelle formation, but excessive temperature or cooking time can cause degradation and isomerization. In a study published in Food Chemistry, researchers compared the bioaccessibility of tomatoes and carrots cooked in an air fryer, a conventional oven, and a microwave. The results: oven-cooking raised the total carotenoid bioaccessibility of carrots by up to 9-fold; for tomatoes, either an air fryer (190°C for 10 minutes) or a conventional oven (180°C for 20 minutes) achieved the highest bioaccessibility; for carrots, microwaving was the most energy-efficient method, cutting electricity use by 96%; and for tomatoes, the air fryer delivered both the highest bioaccessibility and an 80% energy saving.
- Google's TurboQuant compression algorithm slashes LLM memory use
Google Research has released TurboQuant, a compression algorithm that sharply reduces large language models' memory footprint while improving speed and preserving accuracy. TurboQuant targets the size of the key-value cache, described as a "digital cheat sheet" that stores important information to avoid recomputation. Large models do not understand anything; they simulate understanding by mapping tokens to vectors that encode textual meaning. Model vectors are normally encoded in Cartesian (XYZ) coordinates; a system implementing TurboQuant converts them to polar coordinates, reducing each vector to two kinds of information: a radius (the core data magnitude) and a direction (the data's meaning). In Cartesian coordinates a position might be encoded as "walk 3 blocks east and 4 blocks north"; in polar coordinates the same information becomes "walk 5 blocks at a bearing of 37 degrees" — a simpler representation that saves computation. Google's early tests show TurboQuant achieving up to an 8x performance gain in some benchmarks, with memory use cut to one sixth at no loss of quality. TurboQuant should help lower the cost and memory footprint of running AI models, but it may also encourage more complex models, so it may do little to bring memory prices down.
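The walking-directions analogy in that item can be made concrete with a 2D Cartesian-to-polar conversion. This is only a sketch of the coordinate change the article describes; the real system works on high-dimensional model vectors and adds a quantization step that the article does not detail:

```python
import math

def to_polar(east: float, north: float) -> tuple[float, float]:
    """Convert a 2D Cartesian vector to (radius, compass bearing in degrees)."""
    radius = math.hypot(east, north)                 # magnitude: "how far"
    bearing = math.degrees(math.atan2(east, north))  # direction measured from north: "which way"
    return radius, bearing

# "3 blocks east, 4 blocks north" becomes "5 blocks at a bearing of ~37 degrees"
r, b = to_polar(3.0, 4.0)
print(r, round(b))  # 5.0 37
```

The two representations carry the same information; the memory savings come from how the polar form is subsequently quantized, which this sketch omits.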