OrangeBot.AI Digest — 2026-04-12

89 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. Google removes "Doki Doki Literature Club" from Google Play (bsky.app)
  2. The peril of laziness lost (bcantrill.dtrace.org)
  3. Viktor Orbán concedes defeat after 'painful' election result (apnews.com)
  4. Apple has removed most of the towns and villages in Lebanon from Apple Maps? (maps.apple.com)
  5. DIY Soft Drinks (blinry.org)
  6. Show HN: boringBar – a taskbar-style dock replacement for macOS (boringbar.app)
  7. Most people can't juggle one ball (www.lesswrong.com)
  8. I gave every train in New York an instrument (www.trainjazz.com)
  9. Seven countries now generate 100% of their electricity from renewable energy (www.the-independent.com)
  10. Bring Back Idiomatic Design (essays.johnloeber.com)
  11. Tell HN: docker pull fails in Spain due to football Cloudflare block
  12. Pro Max 5x quota exhausted in 1.5 hours despite moderate usage (github.com)
  13. Happy Map (pudding.cool)
  14. AI Will Be Met with Violence, and Nothing Good Will Come of It (www.thealgorithmicbridge.com)
  15. Apple update looks like Czech mate for locked-out iPhone user (www.theregister.com)

GitHub Trending (14)

  1. NousResearch / hermes-agent
  2. shiyu-coder / Kronos
  3. forrestchang / andrej-karpathy-skills
  4. microsoft / markitdown
  5. multica-ai / multica
  6. coleam00 / Archon
  7. shanraisshan / claude-code-best-practice
  8. OpenBMB / VoxCPM
  9. thedotmack / claude-mem
  10. ahujasid / blender-mcp
  11. rustfs / rustfs
  12. virattt / ai-hedge-fund
  13. snarktank / ralph
  14. TapXWorld / ChinaTextbook

Product Hunt (15)

  1. Interactive Simulations in Gemini

    Gemini now lets you play with the concepts you ask about

  2. Layered

    Turn your selfies into a personal AI stylist

  3. R0Y

    Natural language to investing dashboards in seconds.

  4. Ray

    Your personal CFO in the terminal

  5. Edgee Codex Compressor

    Use Codex at 35.6% lower costs

  6. ClarifierAI for iOS

    Use AI for writing & translating your messages 10x faster

  7. Music Marketplace by Eleven Labs

    Create a track. Publish it. Earn when it is used.

  8. Nicelydone MCP

    Design context for AI agents

  9. Upvotics

    AI-Powered Competitive Intelligence and Tracker on Autopilot

  10. Clicky

    AI buddy next to your cursor on Mac—sees, guides, helps you!

  11. LaReview

    Open-source free next-generation code review

  12. Capso

    Free open-source screenshot & screen recorder for Mac

  13. 1% Better

    Visualise the compounding effect of your daily habits

  14. shush

    Room-aware noise. Quieter. Smarter. Better REM sleep.

  15. Claude Code ultraplan

    Claude Code command that plans your codebase in the cloud

Hugging Face (15)

  1. Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

    A prevailing narrative in LLM post-training holds that supervised finetuning (SFT) memorizes while reinforcement learning (RL) generalizes. We revisit this claim for reasoning SFT with long chain-of-thought (CoT) supervision and find that cross-domain generalization is not absent but conditional, jointly shaped by optimization dynamics, training data, and base-model capability. Some reported failures are under-optimization artifacts: cross-domain performance first degrades before recovering and improving with extended training (a dip-and-recovery pattern), so short-training checkpoints can underestimate generalization. Data quality and structure both matter: low-quality solutions broadly hurt generalization, while verified long-CoT traces yield consistent cross-domain gains. Model capability is essential: stronger models internalize transferable procedural patterns (e.g., backtracking) even from a toy arithmetic game, while weaker ones imitate surface verbosity. This generalization is asymmetric, however: reasoning improves while safety degrades, reframing the question from whether reasoning SFT generalizes to under what conditions and at what cost.

  2. SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

    Large language model (LLM) agents such as OpenClaw rely on reusable skills to perform complex tasks, yet these skills remain largely static after deployment. As a result, similar workflows, tool usage patterns, and failure modes are repeatedly rediscovered across users, preventing the system from improving with experience. While interactions from different users provide complementary signals about when a skill works or fails, existing systems lack a mechanism to convert such heterogeneous experiences into reliable skill updates. To address these issues, we present SkillClaw, a framework for collective skill evolution in multi-user agent ecosystems, which treats cross-user and over-time interactions as the primary signal for improving skills. SkillClaw continuously aggregates trajectories generated during use and processes them with an autonomous evolver, which identifies recurring behavioral patterns and translates them into updates to the skill set by refining existing skills or extending them with new capabilities. The resulting skills are maintained in a shared repository and synchronized across users, allowing improvements discovered in one context to propagate system-wide while requiring no additional effort from users. By integrating multi-user experience into ongoing skill updates, SkillClaw enables cross-user knowledge transfer and cumulative capability improvement, and experiments on WildClawBench show that, with limited interaction and feedback, it significantly improves the performance of Qwen3-Max in real-world agent scenarios.

  3. ClawBench: Can AI Agents Complete Everyday Online Tasks?

    AI agents may be able to automate your inbox, but can they automate other routine aspects of your life? Everyday online tasks offer a realistic yet unsolved testbed for evaluating the next generation of AI agents. To this end, we introduce ClawBench, an evaluation framework of 153 simple tasks that people need to accomplish regularly in their lives and work, spanning 144 live platforms across 15 categories, from completing purchases and booking appointments to submitting job applications. These tasks require demanding capabilities beyond existing benchmarks, such as obtaining relevant information from user-provided documents, navigating multi-step workflows across diverse platforms, and performing write-heavy operations like filling in many detailed forms correctly. Unlike existing benchmarks that evaluate agents in offline sandboxes with static pages, ClawBench operates on production websites, preserving the full complexity, dynamic nature, and challenges of real-world web interaction. A lightweight interception layer captures and blocks only the final submission request, ensuring safe evaluation without real-world side effects. Our evaluations of 7 frontier models show that both proprietary and open-source models can complete only a small portion of these tasks. For example, Claude Sonnet 4.6 achieves only 33.3%. Progress on ClawBench brings us closer to AI agents that can function as reliable general-purpose assistants.

  4. HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents

    We introduce HY-Embodied-0.5, a family of foundation models specifically designed for real-world embodied agents. To bridge the gap between general Vision-Language Models (VLMs) and the demands of embodied agents, our models are developed to enhance the core capabilities required by embodied intelligence: spatial and temporal visual perception, alongside advanced embodied reasoning for prediction, interaction, and planning. The HY-Embodied-0.5 suite comprises two primary variants: an efficient model with 2B activated parameters designed for edge deployment, and a powerful model with 32B activated parameters targeted for complex reasoning. To support the fine-grained visual perception essential for embodied tasks, we adopt a Mixture-of-Transformers (MoT) architecture to enable modality-specific computing. By incorporating latent tokens, this design effectively enhances the perceptual representation of the models. To improve reasoning capabilities, we introduce an iterative, self-evolving post-training paradigm. Furthermore, we employ on-policy distillation to transfer the advanced capabilities of the large model to the smaller variant, thereby maximizing the performance potential of the compact model. Extensive evaluations across 22 benchmarks, spanning visual perception, spatial reasoning, and embodied understanding, demonstrate the effectiveness of our approach. Our MoT-2B model outperforms similarly sized state-of-the-art models on 16 benchmarks, while the 32B variant achieves performance comparable to frontier models such as Gemini 3.0 Pro. In downstream robot control experiments, we leverage our robust VLM foundation to train an effective Vision-Language-Action (VLA) model, achieving compelling results in real-world physical evaluations. Code and models are open-sourced at https://github.com/Tencent-Hunyuan/HY-Embodied.

  5. When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

    Text-to-video diffusion models have enabled open-ended video synthesis, but often struggle with generating the correct number of objects specified in a prompt. We introduce NUMINA, a training-free identify-then-guide framework for improved numerical alignment. NUMINA identifies prompt-layout inconsistencies by selecting discriminative self- and cross-attention heads to derive a countable latent layout. It then refines this layout conservatively and modulates cross-attention to guide regeneration. On the introduced CountBench, NUMINA improves counting accuracy by up to 7.4% on Wan2.1-1.3B, and by 4.9% and 5.5% on 5B and 14B models, respectively. Furthermore, CLIP alignment is improved while maintaining temporal consistency. These results demonstrate that structural guidance complements seed search and prompt enhancement, offering a practical path toward count-accurate text-to-video diffusion. The code is available at https://github.com/H-EmbodVis/NUMINA.

  6. MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping

    In this paper, we introduce MegaStyle, a novel and scalable data curation pipeline that constructs an intra-style consistent, inter-style diverse, and high-quality style dataset. We achieve this by leveraging the consistent text-to-image style mapping capability of current large generative models, which can generate images in the same style from a given style description. Building on this foundation, we curate a diverse and balanced prompt gallery with 170K style prompts and 400K content prompts, and generate a large-scale style dataset MegaStyle-1.4M via content-style prompt combinations. With MegaStyle-1.4M, we propose style-supervised contrastive learning to fine-tune a style encoder MegaStyle-Encoder for extracting expressive, style-specific representations, and we also train a FLUX-based style transfer model MegaStyle-FLUX. Extensive experiments demonstrate the importance of maintaining intra-style consistency, inter-style diversity, and high quality for a style dataset, as well as the effectiveness of the proposed MegaStyle-1.4M. Moreover, when trained on MegaStyle-1.4M, MegaStyle-Encoder and MegaStyle-FLUX provide reliable style similarity measurement and generalizable style transfer, making a significant contribution to the style transfer community. More results are available at our project website https://jeoyal.github.io/MegaStyle/.

  7. LPM 1.0: Video-based Character Performance Model

    Performance, the externalization of intent, emotion, and personality through visual, vocal, and temporal behavior, is what makes a character alive. Learning such performance from video is a promising alternative to traditional 3D pipelines. However, existing video models struggle to jointly achieve high expressiveness, real-time inference, and long-horizon identity stability, a tension we call the performance trilemma. Conversation is the most comprehensive performance scenario, as characters simultaneously speak, listen, react, and emote while maintaining identity over time. To address this, we present LPM 1.0 (Large Performance Model), focusing on single-person full-duplex audio-visual conversational performance. Concretely, we build a multimodal human-centric dataset through strict filtering, speaking-listening audio-video pairing, performance understanding, and identity-aware multi-reference extraction; train a 17B-parameter Diffusion Transformer (Base LPM) for highly controllable, identity-consistent performance through multimodal conditioning; and distill it into a causal streaming generator (Online LPM) for low-latency, infinite-length interaction. At inference, given a character image with identity-aware references, LPM 1.0 generates listening videos from user audio and speaking videos from synthesized audio, with text prompts for motion control, all at real-time speed with identity-stable, infinite-length generation. LPM 1.0 thus serves as a visual engine for conversational agents, live streaming characters, and game NPCs. To systematically evaluate this setting, we propose LPM-Bench, the first benchmark for interactive character performance. LPM 1.0 achieves state-of-the-art results across all evaluated dimensions while maintaining real-time inference.

  8. OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

    Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal Large Language Models. However, extending this success to open-source multimodal generalist models remains heavily constrained by two primary challenges: the extreme variance in reward topologies across diverse visual tasks, and the inherent difficulty of balancing fine-grained perception with multi-step reasoning capabilities. To address these issues, we introduce Gaussian GRPO (G^2RPO), a novel RL training objective that replaces standard linear scaling with non-linear distributional matching. By mathematically forcing the advantage distribution of any given task to strictly converge to a standard normal distribution, N(0,1), G^2RPO theoretically ensures inter-task gradient equity, mitigates vulnerabilities to heavy-tail outliers, and offers symmetric updates for positive and negative rewards. Leveraging the enhanced training stability provided by G^2RPO, we introduce two task-level shaping mechanisms to seamlessly balance perception and reasoning. First, response length shaping dynamically elicits extended reasoning chains for complex queries while enforcing direct outputs to bolster visual grounding. Second, entropy shaping tightly bounds the model's exploration zone, effectively preventing both entropy collapse and entropy explosion. Integrating these methodologies, we present OpenVLThinkerV2, a highly robust, general-purpose multimodal model. Extensive evaluations across 18 diverse benchmarks demonstrate its superior performance over strong open-source and leading proprietary frontier models.
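
    The "non-linear distributional matching" in this abstract lends itself to a short sketch. One natural way to force a group of rollout rewards into an exactly N(0,1)-shaped advantage distribution is a rank-based inverse-normal (van der Waerden) transform; the function name, tie handling, and Blom plotting positions below are illustrative assumptions, not the paper's actual recipe.

    ```python
    from statistics import NormalDist

    def gaussian_advantages(rewards):
        """Map a group of rollout rewards to advantages whose empirical
        distribution matches N(0, 1), via a rank-based inverse-normal
        transform (one plausible reading of "distributional matching")."""
        n = len(rewards)
        norm = NormalDist()
        # Sort indices by reward, then assign average ranks so that
        # identical rewards receive identical advantages.
        order = sorted(range(n), key=lambda i: rewards[i])
        ranks = [0.0] * n
        i = 0
        while i < n:
            j = i
            while j + 1 < n and rewards[order[j + 1]] == rewards[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
            for k in range(i, j + 1):
                ranks[order[k]] = avg_rank
            i = j + 1
        # Blom-style plotting positions keep quantiles strictly inside (0, 1).
        return [norm.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]
    ```

    Unlike mean/std scaling, this makes the advantage distribution identical across tasks regardless of the reward scale, which is one way to read the claimed "inter-task gradient equity" and outlier robustness.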

  9. DMax: Aggressive Parallel Decoding for dLLMs

    We present DMax, a new paradigm for efficient diffusion language models (dLLMs). It mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality. Unlike conventional masked dLLMs that decode through a binary mask-to-token transition, DMax reformulates decoding as a progressive self-refinement from mask embeddings to token embeddings. At the core of our approach is On-Policy Uniform Training, a novel training strategy that efficiently unifies masked and uniform dLLMs, equipping the model to recover clean tokens from both masked inputs and its own erroneous predictions. Building on this foundation, we further propose Soft Parallel Decoding. We represent each intermediate decoding state as an interpolation between the predicted token embedding and the mask embedding, enabling iterative self-revising in embedding space. Extensive experiments across a variety of benchmarks demonstrate the effectiveness of DMax. Compared with the original LLaDA-2.0-mini, our method improves TPF on GSM8K from 2.04 to 5.47 while preserving accuracy. On MBPP, it increases TPF from 2.71 to 5.86 while maintaining comparable performance. On two H200 GPUs, our model achieves an average of 1,338 TPS at batch size 1. Code is available at: https://github.com/czg1225/DMax
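
    The "interpolation between the predicted token embedding and the mask embedding" described above can be illustrated with a toy sketch. Everything here, including the random embedding table, the dot-product scorer standing in for the dLLM, and the linear confidence schedule, is an assumption for illustration, not the paper's code.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    VOCAB, DIM, SEQ = 8, 16, 6
    embed = rng.normal(size=(VOCAB, DIM))  # toy token embedding table
    mask_emb = rng.normal(size=DIM)        # toy [MASK] embedding

    def toy_denoiser(states):
        """Stand-in for the dLLM: scores each position against the vocab.
        A real model would be a transformer over the whole sequence."""
        return states @ embed.T

    def soft_parallel_decode(steps=4):
        # Start every position fully at the mask embedding.
        states = np.tile(mask_emb, (SEQ, 1))
        for t in range(1, steps + 1):
            logits = toy_denoiser(states)
            pred = logits.argmax(axis=-1)  # current best token per position
            conf = t / steps               # anneal from mask toward tokens
            # Interpolate in embedding space instead of committing hard
            # tokens, so later steps can still revise early mistakes.
            states = conf * embed[pred] + (1 - conf) * mask_emb
        return toy_denoiser(states).argmax(axis=-1)
    ```

    The key contrast with a standard masked dLLM is that no position ever makes an irreversible binary mask-to-token transition; the soft state is what permits aggressive parallelism without the usual error accumulation.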

  10. KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

    Personalized mobile agents that infer user preferences and calibrate proactive assistance hold great promise as everyday digital assistants, yet existing benchmarks fail to capture what this requires. Prior work evaluates preference recovery from static histories or intent prediction from fixed contexts. Neither tests whether an agent can elicit missing preferences through interaction, nor whether it can decide when to intervene, seek consent, or remain silent in a live GUI environment. We introduce KnowU-Bench, an online benchmark for personalized mobile agents built on a reproducible Android emulation environment, covering 42 general GUI tasks, 86 personalized tasks, and 64 proactive tasks. Unlike prior work that treats user preferences as static context, KnowU-Bench hides the user profile from the agent and exposes only behavioral logs, forcing genuine preference inference rather than context lookup. To support multi-turn preference elicitation, it instantiates an LLM-driven user simulator grounded in structured profiles, enabling realistic clarification dialogues and proactive consent handling. Beyond personalization, KnowU-Bench provides comprehensive evaluation of the complete proactive decision chain, including grounded GUI execution, consent negotiation, and post-rejection restraint, evaluated through a hybrid protocol combining rule-based verification with LLM-as-a-Judge scoring. Our experiments reveal a striking degradation: agents that excel at explicit task execution fall below 50% under vague instructions requiring user preference inference or intervention calibration, even for frontier models like Claude Sonnet 4.6. The core bottlenecks are not GUI navigation but preference acquisition and intervention calibration, exposing a fundamental gap between competent interface operation and trustworthy personal assistance.

  11. Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

    Large language model (LLM) agents are increasingly built less by changing model weights than by reorganizing the runtime around them. Capabilities that earlier systems expected the model to recover internally are now externalized into memory stores, reusable skills, interaction protocols, and the surrounding harness that makes these modules reliable in practice. This paper reviews that shift through the lens of externalization. Drawing on the idea of cognitive artifacts, we argue that agent infrastructure matters not merely because it adds auxiliary components, but because it transforms hard cognitive burdens into forms that the model can solve more reliably. Under this view, memory externalizes state across time, skills externalize procedural expertise, protocols externalize interaction structure, and harness engineering serves as the unification layer that coordinates them into governed execution. We trace a historical progression from weights to context to harness, analyze memory, skills, and protocols as three distinct but coupled forms of externalization, and examine how they interact inside a larger agent system. We further discuss the trade-off between parametric and externalized capability, identify emerging directions such as self-evolving harnesses and shared agent infrastructure, and discuss open challenges in evaluation, governance, and the long-term co-evolution of models and external infrastructure. The result is a systems-level framework for explaining why practical agent progress increasingly depends not only on stronger models, but on better external cognitive infrastructure.

  12. Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

    The advent of agentic multimodal models has empowered systems to actively interact with external environments. However, current agents suffer from a profound meta-cognitive deficit: they struggle to arbitrate between leveraging internal knowledge and querying external utilities. Consequently, they frequently fall prey to blind tool invocation, resorting to reflexive tool execution even when queries are resolvable from the raw visual context. This pathological behavior precipitates severe latency bottlenecks and injects extraneous noise that derails sound reasoning. Existing reinforcement learning protocols attempt to mitigate this via a scalarized reward that penalizes tool usage. Yet, this coupled formulation creates an irreconcilable optimization dilemma: an aggressive penalty suppresses essential tool use, whereas a mild penalty is entirely subsumed by the variance of the accuracy reward during advantage normalization, rendering it impotent against tool overuse. To transcend this bottleneck, we propose HDPO, a framework that reframes tool efficiency from a competing scalar objective to a strictly conditional one. By eschewing reward scalarization, HDPO maintains two orthogonal optimization channels: an accuracy channel that maximizes task correctness, and an efficiency channel that enforces execution economy exclusively within accurate trajectories via conditional advantage estimation. This decoupled architecture naturally induces a cognitive curriculum, compelling the agent to first master task resolution before refining its self-reliance. Extensive evaluations demonstrate that our resulting model, Metis, reduces tool invocations by orders of magnitude while simultaneously elevating reasoning accuracy.
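
    The conditional advantage estimation described above reads naturally as: compute the efficiency signal only among correct rollouts, so it can never trade off against task success. A minimal sketch, where the trajectory representation, normalization, and the 0.1 mixing weight are illustrative assumptions rather than the paper's values:

    ```python
    def hdpo_advantages(trajs):
        """Two decoupled advantage channels for a group of rollouts.

        Each trajectory is (correct: bool, n_tool_calls: int).  Accuracy is
        rewarded group-relatively; the efficiency signal (fewer tool calls
        is better) is estimated only across *correct* trajectories."""
        n = len(trajs)
        acc = [1.0 if c else 0.0 for c, _ in trajs]
        acc_mean = sum(acc) / n
        adv_acc = [a - acc_mean for a in acc]

        # Efficiency channel: incorrect rollouts receive no signal at all,
        # so economy is never rewarded at the expense of correctness.
        correct = [(i, t) for i, (c, t) in enumerate(trajs) if c]
        adv_eff = [0.0] * n
        if len(correct) > 1:
            mean_t = sum(t for _, t in correct) / len(correct)
            for i, t in correct:
                adv_eff[i] = mean_t - t  # fewer calls than correct peers -> positive
        return [a + 0.1 * e for a, e in zip(adv_acc, adv_eff)]
    ```

    With a scalarized penalty, a tool-frugal but wrong rollout can outscore a tool-heavy but correct one; under the conditional scheme above, that ordering is impossible by construction.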

  13. MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

    Web agents, autonomous systems that navigate and execute tasks on the web on behalf of users, have the potential to transform how people interact with the digital world. However, the most capable web agents today rely on proprietary models with undisclosed training data and recipes, limiting scientific understanding, reproducibility, and community-driven progress. We believe agents for the open web should be built in the open. To this end, we introduce (1) MolmoWebMix, a large and diverse mixture of browser task demonstrations and web-GUI perception data, and (2) MolmoWeb, a family of fully open multimodal web agents. Specifically, MolmoWebMix combines over 100K synthetic task trajectories from multiple complementary generation pipelines with 30K+ human demonstrations, atomic web-skill trajectories, and GUI perception data, including referring expression grounding and screenshot question answering. MolmoWeb agents operate as instruction-conditioned visual-language action policies: given a task instruction and a webpage screenshot, they predict the next browser action, requiring no access to HTML, accessibility trees, or specialized APIs. Available in 4B and 8B sizes, on browser-use benchmarks like WebVoyager, Online-Mind2Web, and DeepShop, MolmoWeb agents achieve state-of-the-art results, outperforming similar-scale open-weight-only models such as Fara-7B, UI-Tars-1.5-7B, and Holo1-7B. MolmoWeb-8B also surpasses set-of-marks (SoM) agents built on much larger closed frontier models like GPT-4o. We further demonstrate consistent gains through test-time scaling via parallel rollouts with best-of-N selection, achieving 94.7% and 60.5% pass@4 (compared to 78.2% and 35.3% pass@1) on WebVoyager and Online-Mind2Web respectively. We will release model checkpoints, training data, code, and a unified evaluation harness to enable reproducibility and accelerate open research on web agents.

  14. OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence

    Spatial understanding is a fundamental cornerstone of human-level intelligence. Nonetheless, current research predominantly focuses on domain-specific data production, leaving a critical void: the absence of a principled, open-source engine capable of fully unleashing the potential of high-quality spatial data. To bridge this gap, we elucidate the design principles of a robust data generation system and introduce OpenSpatial, an open-source data engine engineered for high quality, extensive scalability, broad task diversity, and optimized efficiency. OpenSpatial adopts 3D bounding boxes as the fundamental primitive to construct a comprehensive data hierarchy across five foundational tasks: Spatial Measurement (SM), Spatial Relationship (SR), Camera Perception (CP), Multi-view Consistency (MC), and Scene-Aware Reasoning (SAR). Leveraging this scalable infrastructure, we curate OpenSpatial-3M, a large-scale dataset comprising 3 million high-fidelity samples. Extensive evaluations demonstrate that versatile models trained on our dataset achieve state-of-the-art performance across a wide spectrum of spatial reasoning benchmarks. Notably, the best-performing model exhibits a substantial average relative improvement of 19 percent. Furthermore, we provide a systematic analysis of how data attributes influence spatial perception. By open-sourcing both the engine and the 3M-scale dataset, we provide a robust foundation to accelerate future research in spatial intelligence.

  15. OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering

    To extend the reinforcement learning post-training paradigm to omni-modal models for concurrently bolstering video-audio understanding and collaborative reasoning, we propose OmniJigsaw, a generic self-supervised framework built upon a temporal reordering proxy task. Centered on the chronological reconstruction of shuffled audio-visual clips, this paradigm strategically orchestrates visual and auditory signals to compel cross-modal integration through three distinct strategies: Joint Modality Integration, Sample-level Modality Selection, and Clip-level Modality Masking. Recognizing that the efficacy of such proxy tasks is fundamentally tied to puzzle quality, we design a two-stage coarse-to-fine data filtering pipeline, which facilitates the efficient adaptation of OmniJigsaw to massive unannotated omni-modal data. Our analysis reveals a "bi-modal shortcut phenomenon" in joint modality integration and demonstrates that fine-grained clip-level modality masking mitigates this issue while outperforming sample-level modality selection. Extensive evaluations on 15 benchmarks show substantial gains in video, audio, and collaborative reasoning, validating OmniJigsaw as a scalable paradigm for self-supervised omni-modal learning.
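
    The temporal-reordering puzzle with clip-level modality masking can be sketched as a puzzle constructor: shuffle the clips, record the permutation to recover, and hide at most one modality per clip. The masking rate, modality names, and the at-most-one-masked constraint are illustrative assumptions, not the paper's settings.

    ```python
    import random

    def make_omni_puzzle(n_clips, mask_rate, seed=0):
        """Build one reordering puzzle for an omni-modal model.

        Returns the shuffled clip order (the permutation the model must
        chronologically reconstruct) and, per shuffled clip, which
        modalities remain visible under clip-level modality masking."""
        rng = random.Random(seed)
        order = list(range(n_clips))
        rng.shuffle(order)
        visible = []
        for _ in range(n_clips):
            # Mask at most one modality per clip, so every clip still
            # carries some signal and the puzzle stays solvable.
            masked = rng.choice(["video", "audio"]) if rng.random() < mask_rate else None
            visible.append({"video": masked != "video",
                            "audio": masked != "audio"})
        return order, visible
    ```

    Forcing some clips to be ordered from audio alone and others from video alone is what blocks the "bi-modal shortcut" the abstract describes, where the model could otherwise solve every puzzle from a single dominant modality.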

Techmeme (15)

  1. A look at the escalating global AI arms race, as the US, China, Russia, and others rush to build AI-backed autonomous weapons and defense systems (New York Times)

    New York Times: China, the U.S., Russia and others have ramped up their contest over artificial-intelligence-backed weapons and military systems.

  2. A deep dive into the debate about Claude Mythos Preview, the model's capabilities, attempts to refute Anthropic's claims, and what it means for the future of AI (Zvi Mowshowitz/Don't Worry About the Vase)

    Zvi Mowshowitz / Don't Worry About the Vase: Anthropic is not going to release its new most capable model, Claude Mythos, to the public any time soon.

  3. As weather betting grows on prediction markets, climate experts are debating whether it improves forecasts by aggregating knowledge or is simply a zero-sum game (Bloomberg)

    Bloomberg: From Kalshi and Polymarket to niche scientific platforms, traders are predicting the weather, and climate experts are debating the results.

  4. Sources: UK regulators plan to warn banks, insurers, and exchanges about security risks exposed by Claude Mythos Preview at a meeting within the next two weeks (Martin Arnold/Financial Times)

    Martin Arnold / Financial Times: Leading banks, insurers and exchanges to be warned over cyber security vulnerabilities exposed by Claude Mythos.

  5. Sources: Apple is testing four AI glasses designs with rectangular and oval frames, multiple colors, and a camera system with vertically oriented oval lenses (Mark Gurman/Bloomberg)

    Mark Gurman / Bloomberg: Apple is working on several frame styles and a unique camera design for its first smart glasses. Also: the latest on the foldable iPhone.

  6. Flipkart and Amazon's quick commerce push in India is intensifying competition in an already crowded space where profitability remains under pressure (Jagmeet Singh/TechCrunch)

    Jagmeet Singh / TechCrunch: India's quick commerce market is booming, with demand more than doubling for some players. But the fast-delivery push by Flipkart …

  7. A journalist recounts how he used ChatGPT to develop a fitness plan to prepare for the Paris Marathon, resulting in a 20-pound weight loss and faster race times (Derek Wallbank/Bloomberg)

    Derek Wallbank / Bloomberg: Six months of pain and progress, 20 pounds lost, and a trial-and-error test of what AI can and cannot do.

  8. The Linux Kernel Organization now lets developers submit AI-generated code, as long as it complies with the guidelines, licensing, and attribution requirements (Simon Batt/XDA Developers)

    Simon Batt / XDA Developers: Linux allows AI-generated kernel code, but the community will treat it as your own contribution.

  9. Analysts and researchers say Google's TurboQuant compression algorithm to make LLMs more efficient is more likely to expand memory chip demand than reduce it (Daniel Tudor/Financial Times)

    Daniel Tudor / Financial Times: More efficient artificial intelligence could mean even greater need for semiconductors, say experts.

  10. Takeaways from HumanX, one of the AI industry's main events: Claude Code dominated the conversation, while some execs noted China's lead in open-weight models (Ashley Capoot/CNBC)

    Ashley Capoot / CNBC: If one thing became clear at the HumanX conference in San Francisco this week, where 6,500 executives, founders and investors gathered …

  11. Q&A with NYT reporter Tiffany Hsu about AI-generated online influencers, how the volume of synthetic content produces exhaustion for users, and more (Charlie Warzel/The Atlantic)

    Charlie Warzel / The Atlantic : Q&A with NYT reporter Tiffany Hsu about AI-generated online influencers, how the volume of synthetic content produces exhaustion for users, and more —  AI avatars are redefining influence and trust online.  —  On this week's Galaxy Brain episode, Charlie Warzel is joined …

  12. Survey of 6,698 people across six EU countries: around 84% said they don't trust US tech companies with their personal data; 93% don't trust Chinese companies (Ellen O'Regan/Politico)

    Ellen O'Regan / Politico : Survey of 6,698 people across six EU countries: around 84% said they don't trust US tech companies with their personal data; 93% don't trust Chinese companies —  Europe is rolling out measures to keep data local and reduce reliance on foreign tech.  —  More than 8 in 10 Europeans …

  13. Sources: Anthropic met with Christian leaders in March to seek input on Claude's moral and spiritual development and if it could be considered a "child of God" (Washington Post)

    Washington Post : Sources: Anthropic met with Christian leaders in March to seek input on Claude's moral and spiritual development and if it could be considered a “child of God” —  The artificial intelligence company asked religious leaders for guidance on building a moral chatbot.

  14. A wave of top AI researchers returned from the US to China in the past year, driven by better pay, quality of life, and a more restrictive US immigration system (Zijing Wu/Financial Times)

    Zijing Wu / Financial Times : A wave of top AI researchers returned from the US to China in the past year, driven by better pay, quality of life, and a more restrictive US immigration system —  Engineers and scientists return for better pay and quality of life as US grows more hostile  —  In the hushed corridors …

  15. Google says Polymarket bets "briefly appeared in Google News in error", after the bets appeared alongside news articles in the "For You" section (Terrence O'Brien/The Verge)

    Terrence O'Brien / The Verge : Google says Polymarket bets “briefly appeared in Google News in error”, after the bets appeared alongside news articles in the “For You” section —  Links to bets on world events were appearing alongside legitimate news organizations.

Solidot(15)

  1. South Korean mobile carriers to give subscribers a 400 Kbps baseline data rate

    South Korea's three major carriers, SK Telecom, KT, and LG Uplus, will provide a baseline data rate of 400 Kbps to more than 7 million mobile subscribers. Once users exceed their monthly data allowance, their speed drops to 400 Kbps, but with no further data cap. 400 Kbps is hardly suitable for short-video streaming (SD video needs around 5 Mbps), but it is more than enough for web browsing, messaging, and VoIP calls. The move means subscribers who exhaust their monthly allowance will no longer be cut off or hit with steep overage charges. The three carriers also pledged to increase data and call allowances for elderly users. The measures are remedial commitments the carriers adopted after a series of security incidents last year.

  2. How the anti-corruption campaign reshaped the restaurant industry

    In 2012 the CPC Central Committee issued the "Eight-point Regulations" against extravagance (formally, the 18th Politburo's eight rules on improving work style and maintaining close ties with the masses), giving economists a rare chance to observe how an anti-corruption drive affects restaurant siting and the broader economic landscape. Using Dianping data, researchers analyzed hundreds of thousands of customer reviews and spending reports from 2010-2014, recorded the addresses of 120 government agencies, and tracked tens of thousands of nearby restaurants. After the rules took effect, business at restaurants near government offices fell immediately: customer visits dropped 5.5% and per-capita spending fell 2.7%, equivalent to roughly $400 million in lost annual sales for Beijing restaurants. High-end restaurants took the biggest hit. By 2016 the overall layout of Beijing's restaurant industry had changed: before the crackdown, upscale dining clustered near the centers of political power; afterward it gradually dispersed into ordinary commercial and residential districts. This shows that the geographic distribution of political power directly reshapes local economies, concentrating wealth and resources in ways ordinary market forces cannot explain. The research indicates that political power influences where businesses locate, and it also reveals a hidden cost of anti-corruption: economic effects that reach well beyond the campaign's intended targets.

  3. CPUID website download links hijacked to distribute malware

    The website of CPUID, maker of the popular free system-analysis tools CPU-Z and HWMonitor, was compromised, and for a short period users downloaded malware from it. Users first reported on social media that antivirus software raised alerts when installing programs downloaded from CPUID. The site later confirmed that a third-party API it relies on was compromised for roughly six hours on April 9-10, causing the main site to randomly serve malicious links. CPUID's applications themselves were not tampered with. The attackers mainly targeted HWMonitor users: the malicious build included a fake CRYPTBASE.dll that connected to a command-and-control server to download further payloads. CPUID says the issue has been fixed.

  4. Rockstar confirms breach but denies theft of important data

    The hacking group ShinyHunters claims to have broken into Rockstar Games' Snowflake servers and stolen a large amount of data, demanding that Rockstar pay a ransom by April 14 or see the data leaked. ShinyHunters reached Rockstar's Snowflake-hosted servers through Anodot; Snowflake itself was not breached. Rockstar later confirmed the intrusion but denied that important data was stolen, saying a small amount of non-material company information was accessed and that the incident will have no impact on the company or its players.

  5. Red Hat lays off its entire China engineering team

    IBM subsidiary Red Hat has dismissed its entire China engineering team and is moving most of the positions to India. A user describing himself as a principal software engineer at Red Hat China posted on Hacker News that he woke up Thursday to find he could no longer log into the VPN, with his access to various other services also revoked; the CTO later informed the team that the company was shifting its focus to an Asia-Pacific hub. The layoffs affect 300-500 people. According to a memo from Red Hat CTO Chris Wright, Red Hat now regards India, not China, as a key location, so it will halt engineering activity in China and move most of the work to India. IBM has previously said its headcount in India exceeds that in the US, out of 264,000 employees worldwide.

  6. What happens if you install "every" Firefox extension

    The official Firefox add-ons store hosts 84,000 extensions; what happens if you download and install them all? Someone has carried out this nearly impossible task for us. The extensions total 49.3 GB, averaging 584.9 kB each. The largest is dmitlichess at 196.3 MB (it contains over 2,000 audio files), followed by (Unoffical) ReactBot Web at 184.9 MB, Eric’s Thumbnail Seasoning! at 146.6 MB, Animal Forest:PG BGM at 137.4 MB, and so on. The smallest, theTabs-saver, is 7,518 bytes and contains no code. The extension requesting the most permissions is FalscheLaden, which demands 3,695 permissions yet has no active users; Google Dark Theme requests 2,675 permissions but has 1,687 users. Developer Dr. B used AI tools to build 84 extensions whose README.md files mention Grok 3. 34.3% of extensions have no daily active users, while 0.7% have more than 10,000; 76.7% are open source. The author found that the browser hung once 65,335 extensions were loaded. After finally managing to load all 84,000, the browser was essentially unusable: just opening the about:addons page took six hours.

  7. Global artificial light at night brightened 16% in 8 years

    According to a study published in Nature, analysis of satellite observations shows that the total brightness of global artificial light at night rose 16% between 2014 and 2022. The change was not uniform: different regions brightened and dimmed in a patchwork pattern. In 2022 the US had the highest total night-time brightness, followed by China, India, Canada, and Brazil. Brightening stemmed mainly from accelerating urbanization, expanding infrastructure, and rural electrification, with the largest increases in sub-Saharan Africa and Southeast Asia. Dimming fell into two categories: sudden dimming, mostly triggered by natural disasters, grid failures, and armed conflict; and gradual dimming, mostly tied to energy-saving policies and light-pollution reduction measures, a trend seen in parts of Europe.

  8. Suspect arrested for throwing Molotov cocktail at Sam Altman's residence

    San Francisco police arrested a suspect accused of throwing a Molotov cocktail at the home of OpenAI CEO Sam Altman; the suspect also made threatening statements outside OpenAI's headquarters in San Francisco's Mission Bay. In an internal statement to employees, OpenAI said that at around 3:45am PT on Friday the suspect approached Sam's residence and threw an incendiary device, which landed nearby and went out. No one was injured and damage was minor. Shortly afterward, security spotted a person matching the suspect's description outside the MB1 headquarters building. OpenAI told employees that security may be stepped up, that offices remain open as usual, and that they should not let anyone tailgate into the building.

  9. Artemis II astronauts return to Earth

    The four astronauts of NASA's Artemis II lunar flyby mission splashed down off the coast of San Diego at 5:07 p.m. PDT on April 10. NASA astronauts Reid Wiseman, Victor Glover, and Christina Koch, along with Canadian CSA astronaut Jeremy Hansen, were flown by helicopter to the amphibious transport dock USS John P. Murtha for initial medical checks, and were expected to return to NASA's Johnson Space Center in Houston on April 11. The crew flew a total of 694,481 miles, breaking the record for the farthest human flight set by the Apollo 13 astronauts in 1970.

  10. PFAS found in Argentina's penguin colonies

    Scientists from UC Davis fitted silicone passive samplers to the legs of 54 Magellanic penguins living on Argentina's Patagonian coast during the 2022-2024 breeding seasons, collecting samples of the water, air, and surface chemicals the penguins encountered over several days of activity. The recovered samplers were sent to the University at Buffalo in New York for testing. The results showed that even in this remote region, more than 90% of samples contained PFAS, per- and polyfluoroalkyl substances, dubbed "forever chemicals" because they do not break down naturally.

  11. FBI used iPhone notification data to recover deleted Signal messages

    In a recent case involving fireworks and property destruction at the ICE Prairieland Detention Facility in Alvarado, Texas, the FBI recovered deleted Signal messages from the notification database stored on an iPhone. Signal is end-to-end encrypted, so no one other than the two parties should be able to read a message, but the iPhone's notification feature shows message previews, and that content is also stored in a notification database. To prevent messages from being previewed and saved, Signal users can disable previews, or set Settings > Notifications > Notification Content > Show to "Name Only" or "No Name or Content". In this case the defendant had not changed these settings, so the messages could be recovered even after the app was uninstalled.

  12. French government to migrate workstations from Windows to Linux

    After Microsoft complied with unreasonable orders from the US president, a growing number of EU countries have begun strengthening their digital sovereignty, distancing themselves from the software giant and embracing open source. The French prime minister ordered the interministerial digital directorate (DINUM) to take steps to reduce dependence on the US. DINUM has announced its first target: replacing Windows. The agency, which oversees the deployment of IT equipment and services across government departments, announced that workstation operating systems will migrate from Windows to Linux. Beyond workstations, collaboration tools, antivirus software, artificial intelligence, databases, and virtualization will also be considered for migration to non-US solutions.

  13. EFF quits X

    The Electronic Frontier Foundation (EFF) announced it is leaving X/Twitter, likely because the platform's algorithm under Elon Musk favors right-wing content, sharply reducing the visibility of posts from the left-leaning digital-rights group. EFF says it posts 5-10 times a day on X/Twitter; in 2018 those posts drew 50-100 million impressions per month, in 2024 its 2,500 posts drew only 2 million impressions per month, and in 2025 its 1,500 posts drew just 13 million impressions for the entire year. Today an EFF post reaches only 3% of the audience it did seven years ago. EFF says its accounts on Facebook, Instagram, YouTube, and TikTok do not signify endorsement of those platforms; they exist to reach their users. It will continue the fight via Bluesky, Mastodon, LinkedIn, Instagram, TikTok, Facebook, YouTube, and eff.org, just no longer via X.

  14. How NASA built Artemis II's fault-tolerant computers

    The computer Apollo astronauts used to land on the Moon had a 1 MHz processor and 4 kilobytes of erasable memory; its capabilities were limited, and the spacecraft's critical environmental and power controls were still handled manually or electromechanically. The Orion spacecraft of Artemis II carries the most fault-tolerant computer system to date. To cope with space radiation, the spacecraft has two Vehicle Management Computers, each containing two Flight Control Modules, four FCMs in all, and each FCM carries a pair of self-checking processors. Eight CPUs run the flight software in parallel. Self-checking means that the moment a CPU errs due to a radiation event, the system detects the fault and reacts. The system also uses triple-modular-redundant memory that automatically corrects single-bit errors on every read. The network is likewise triple-redundant, and all network switches follow a self-checking strategy. The spacecraft additionally carries a fully independent Backup Flight Software (BFS) system that uses different hardware, runs a different operating system, and runs a simplified, independently developed version of the flight software. This design is known as dissimilar redundancy.
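    The single-bit correction described above can be illustrated with a bitwise majority vote over three redundant copies of a word. This is a minimal sketch of the general triple-modular-redundancy idea, not NASA's actual implementation; the function name `tmr_read` is invented for illustration:

    ```python
    # Illustrative sketch of triple-modular redundancy (TMR) voting,
    # the general technique behind Orion's memory; NOT NASA's code.

    def tmr_read(a: int, b: int, c: int) -> int:
        """Bitwise majority vote over three redundant copies of a word.

        Each result bit is 1 iff at least two of the three copies have
        that bit set, so a bit flip in any single copy is corrected
        transparently on read.
        """
        return (a & b) | (a & c) | (b & c)

    # A radiation event flips one bit in one copy; the voted read
    # still returns the original word.
    word = 0b1011_0010
    upset = word ^ 0b0000_1000   # single-bit flip in copy b
    assert tmr_read(word, upset, word) == word
    ```

    Fault-tolerant memories typically pair the vote with scrubbing: when a read detects disagreement, the corrected value is written back so that single-bit upsets do not accumulate into uncorrectable multi-copy errors.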

  15. FreeBSD publishes list of compatible laptop models

    In Q4 2024 the FreeBSD Foundation launched the Laptop Support & Usability Project to improve FreeBSD's compatibility with modern laptops, tackling issues with Wi-Fi, graphics, audio, the installer, and sleep states. The project has made substantial progress and has published a list of compatible laptop models, ten of which earned a near-perfect 8/8 score: Lenovo ThinkPad X270, ASUS TUF Gaming F15 FX507VU_FX507VU, HP EliteBook 845 G7 Notebook PC, Lenovo IdeaPad 5 15ALC05, Framework Laptop 13 (13th Gen Intel Core), Lenovo Yoga 11e, Framework Laptop 13 (AMD Ryzen 7040 Series), Lenovo ThinkPad T490, Framework Laptop 16 (AMD Ryzen 7040 Series), and the Aspire A315-24PT.