OrangeBot.AI Digest — 2026-04-06

84 headlines across 6 sources, aggregated for the day.

Hacker News (15)

  1. The cult of vibe coding is dogfooding run amok (bramcohen.com)
  2. Battle for Wesnoth: open-source, turn-based strategy game (www.wesnoth.org)
  3. Adobe modifies hosts file to detect whether Creative Cloud is installed (www.osnews.com)
  4. A cryptography engineer's perspective on quantum computing timelines (words.filippo.io)
  5. 81yo Dodgers fan can no longer get tickets because he doesn't have a smartphone (twitter.com)
  6. Issue: Claude Code is unusable for complex engineering tasks with Feb updates (github.com)
  7. I won't download your app. The web version is a-ok (www.0xsid.com)
  8. German police name alleged leaders of GandCrab and REvil ransomware groups (krebsonsecurity.com)
  9. What being ripped off taught me (belief.horse)
  10. Sam Altman may control our future – can he be trusted? (www.newyorker.com)
  11. Is Germany's gold safe in New York? (www.dw.com)
  12. Age verification as mass surveillance infrastructure (tboteproject.com)
  13. France pulls last gold held in US (www.mining.com)
  14. Show HN: Real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B (github.com)
  15. The 1987 game “The Last Ninja” was 40 kilobytes (twitter.com)

GitHub Trending (14)

  1. abhigyanpatwari / GitNexus
  2. google-ai-edge / gallery
  3. block / goose
  4. google-ai-edge / LiteRT-LM
  5. immich-app / immich
  6. KeygraphHQ / shannon
  7. NousResearch / hermes-agent
  8. tobi / qmd
  9. TelegramMessenger / Telegram-iOS
  10. kepano / obsidian-skills
  11. ollama / ollama
  12. ggml-org / llama.cpp
  13. siddharthvaddem / openscreen
  14. NVIDIA / personaplex

Product Hunt (15)

  1. Glassbrain

    Visual trace replay for AI apps to fix bugs in one click

  2. Ogoron

    Your best QA team — 9x faster, 20x cheaper

  3. AgentPulse by Rectify

    Everything in OpenClaw's terminal, you can now do visually

  4. PixVerse V6

    The AI video model that actually feels alive.

  5. Mailero

    Turn support emails into tickets

  6. Moonshot

    Track the Artemis II mission from your Mac

  7. HyperCap

    Remap Caps Lock to a hyperkey, just hold it + any key

  8. Adapted

    AI Physical Therapy for Athletes

  9. DebtMeltPro

    Compare debt payoff strategies and become debt-free faster

  10. Predflow AI

    Your AI agent for ad performance

  11. KREV

    AI creative agents for ecommerce brands

  12. Walkie

    Free local speech-to-text tool

  13. Deploy Hermes

    Private Telegram AI agents, live in under a minute

  14. Epismo Context Pack

    Portable memory for agent workflows

  15. Metoro

    AI SRE that detects, root causes & auto-fixes K8s incidents

Hugging Face (15)

  1. Self-Distilled RLVR

    On-policy distillation (OPD) has become a popular training paradigm in the LLM community. This paradigm selects a larger model as the teacher to provide dense, fine-grained signals for each sampled trajectory, in contrast to reinforcement learning with verifiable rewards (RLVR), which only obtains sparse signals from verifiable outcomes in the environment. Recently, the community has explored on-policy self-distillation (OPSD), where the same model serves as both teacher and student, with the teacher receiving additional privileged information such as reference answers to enable self-evolution. This paper demonstrates that learning signals solely derived from the privileged teacher result in severe information leakage and unstable long-term training. Accordingly, we identify the optimal niche for self-distillation and propose RLSD (RLVR with Self-Distillation). Specifically, we leverage self-distillation to obtain token-level policy differences for determining fine-grained update magnitudes, while continuing to use RLVR to derive reliable update directions from environmental feedback (e.g., response correctness). This enables RLSD to simultaneously harness the strengths of both RLVR and OPSD, achieving a higher convergence ceiling and superior training stability.

  2. A Simple Baseline for Streaming Video Understanding

    Recent streaming video understanding methods increasingly rely on complex memory mechanisms to handle long video streams. We challenge this trend with a simple finding: a sliding-window baseline that feeds only the most recent N frames to an off-the-shelf VLM already matches or surpasses published streaming models. We formalize this baseline as SimpleStream and evaluate it against 13 major offline and online video LLM baselines on OVO-Bench and StreamingBench. Despite its simplicity, SimpleStream delivers consistently strong performance. With only 4 recent frames, it reaches 67.7% average accuracy on OVO-Bench and 80.59% on StreamingBench. Controlled ablations further show that the value of longer context is backbone-dependent rather than uniformly increasing with model scale, and reveal a consistent perception-memory trade-off: adding more historical context can improve recall, but often weakens real-time perception. This suggests that stronger memory, retrieval, or compression modules should not be taken as evidence of progress unless they clearly outperform SimpleStream under the same protocol. We therefore argue that future streaming benchmarks should separate recent-scene perception from long-range memory, so that performance improvements from added complexity can be evaluated more clearly. (A minimal sliding-window sketch appears after this list.)

  3. Token Warping Helps MLLMs Look from Nearby Viewpoints

    Can warping tokens, rather than pixels, help multimodal large language models (MLLMs) understand how a scene appears from a nearby viewpoint? While MLLMs perform well on visual reasoning, they remain fragile to viewpoint changes, as pixel-wise warping is highly sensitive to small depth errors and often introduces geometric distortions. Drawing on theories of mental imagery that posit part-level structural representations as the basis for human perspective transformation, we examine whether image tokens in ViT-based MLLMs serve as an effective substrate for viewpoint changes. We compare forward and backward warping, finding that backward token warping, which defines a dense grid on the target view and retrieves a corresponding source-view token for each grid point, achieves greater stability and better preserves semantic coherence under viewpoint shifts. Experiments on our proposed ViewBench benchmark demonstrate that token-level warping enables MLLMs to reason reliably from nearby viewpoints, consistently outperforming all baselines including pixel-wise warping approaches, spatially fine-tuned MLLMs, and a generative warping method. (A token-warping sketch appears after this list.)

  4. Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

    Multimodal Large Language Models (MLLMs) are evolving from passive observers into active agents, solving problems through Visual Expansion (invoking visual tools) and Knowledge Expansion (open-web search). However, existing evaluations fall short: they lack flexible tool integration, test visual and search tools separately, and evaluate primarily by final answers. Consequently, they cannot verify whether tools were actually invoked, applied correctly, or used efficiently. To address this, we introduce Agentic-MME, a process-verified benchmark for Multimodal Agentic Capabilities. It contains 418 real-world tasks across 6 domains and 3 difficulty levels to evaluate capability synergy, featuring over 2,000 stepwise checkpoints that average 10+ person-hours of manual annotation per task. Each task includes a unified evaluation framework supporting sandboxed code and APIs, alongside a human reference trajectory annotated with stepwise checkpoints along two axes: an S-axis and a V-axis. To enable true process-level verification, we audit fine-grained intermediate states rather than just final answers, and quantify efficiency via an overthinking metric relative to human trajectories. Experimental results show the best model, Gemini3-pro, achieves 56.3% overall accuracy, which falls sharply to 23.0% on Level-3 tasks, underscoring the difficulty of real-world multimodal agentic problem solving.

  5. Test-Time Scaling Makes Overtraining Compute-Optimal

    Modern LLMs scale at test time, e.g. via repeated sampling, where inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws, such as Chinchilla, do not address. We present Train-to-Test (T^2) scaling laws that jointly optimize model size, training tokens, and number of inference samples under fixed end-to-end budgets. T^2 modernizes pretraining scaling laws with the pass@k modeling used for test-time scaling, then jointly optimizes pretraining and test-time decisions. Forecasts from T^2 are robust across two distinct modeling approaches: measuring the joint scaling effect on task loss, and modeling the impact on task accuracy. Across eight downstream tasks, we find that when accounting for inference cost, optimal pretraining decisions shift radically into the overtraining regime, well outside the range of standard pretraining scaling suites. We validate our results by pretraining heavily overtrained models in the optimal region that T^2 scaling forecasts, confirming their substantially stronger performance compared to pretraining scaling alone. Finally, as frontier LLMs are post-trained, we show that our findings survive the post-training stage, making T^2 scaling meaningful in modern deployments. (The standard pass@k estimator is sketched after this list.)

  6. Communicating about Space: Language-Mediated Spatial Integration Across Partial Views

    Humans build shared spatial understanding by communicating partial, viewpoint-dependent observations. We ask whether Multimodal Large Language Models (MLLMs) can do the same, aligning distinct egocentric views through dialogue to form a coherent, allocentric mental model of a shared environment. To study this systematically, we introduce COSMIC, a benchmark for Collaborative Spatial Communication. In this setting, two static MLLM agents observe a 3D indoor environment from different viewpoints and exchange natural-language messages to solve spatial queries. COSMIC contains 899 diverse scenes and 1250 question-answer pairs spanning five tasks. We find a consistent capability hierarchy: MLLMs are most reliable at identifying shared anchor objects across views, perform worse on relational reasoning, and largely fail at building globally consistent maps, performing near chance even for frontier models. Moreover, we find that thinking capability yields consistent gains in anchor grounding but is insufficient for higher-level spatial communication. To contextualize model behavior, we additionally collect 250 human-human dialogues. Humans achieve 95% aggregate accuracy, leaving significant room for improvement even for the best-performing model, Gemini-3-Pro-Thinking, which achieves 72% aggregate accuracy. Moreover, human conversations become increasingly specific as partners converge on a shared mental model, whereas model dialogues continue to explore new possibilities rather than converging, consistent with a limited ability to build and maintain a robust shared mental model. Our code and data are available at https://github.com/ankursikarwar/Cosmic

  7. InCoder-32B-Thinking: Industrial Code World Model for Thinking

    Industrial software development across chip design, GPU optimization, and embedded systems lacks expert reasoning traces showing how engineers reason about hardware constraints and timing semantics. In this work, we propose InCoder-32B-Thinking, trained on data from the Error-driven Chain-of-Thought (ECoT) synthesis framework with an industrial code world model (ICWM) to generate reasoning traces. Specifically, ECoT generates reasoning chains by synthesizing the thinking content from multi-turn dialogue with environmental error feedback, explicitly modeling the error-correction process. ICWM, trained on domain-specific execution traces from Verilog simulation, GPU profiling, and similar sources, learns the causal dynamics of how code affects hardware behavior and enables self-verification by predicting execution outcomes before actual compilation. All synthesized reasoning traces are validated through domain toolchains, creating training data that matches the natural reasoning-depth distribution of industrial tasks. Evaluation on 14 general benchmarks (81.3% on LiveCodeBench v5) and 9 industrial benchmarks (84.0% on CAD-Coder and 38.0% on KernelBench) shows InCoder-32B-Thinking achieves top-tier open-source results across all domains.

  8. Swift-SVD: Theoretical Optimality Meets Practical Efficiency in Low-Rank LLM Compression

    The deployment of Large Language Models is constrained by the memory and bandwidth demands of static weights and the dynamic Key-Value cache. SVD-based compression provides a hardware-friendly solution to reduce these costs. However, existing methods suffer from two key limitations: some are suboptimal in reconstruction error, while others are theoretically optimal but practically inefficient. In this paper, we propose Swift-SVD, an activation-aware, closed-form compression framework that simultaneously guarantees theoretical optimality, practical efficiency, and numerical stability. Swift-SVD incrementally aggregates the covariance of output activations over batches of inputs and performs a single eigenvalue decomposition after aggregation, enabling training-free, fast, and optimal layer-wise low-rank approximation. We employ effective rank to analyze local layer-wise compressibility and design a dynamic rank allocation strategy that jointly accounts for local reconstruction loss and end-to-end layer importance. Extensive experiments across six LLMs and eight datasets demonstrate that Swift-SVD outperforms state-of-the-art baselines, achieving optimal compression accuracy while delivering 3-70x speedups in end-to-end compression time. Our code will be released upon acceptance. (A sketch of the covariance-then-eigendecomposition core appears after this list.)

  9. AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks

    With the rise of personalized, persistent LLM agent frameworks such as OpenClaw, human-centered agentic social networks, in which teams of collaborative AI agents serve individual users in a social network across multiple domains, are becoming a reality. This setting creates novel privacy challenges: agents must coordinate across domain boundaries, mediate between humans, and interact with other users' agents, all while protecting sensitive personal information. While prior work has evaluated multi-agent coordination and privacy preservation, the dynamics and privacy risks of human-centered agentic social networks remain unexplored. To this end, we introduce AgentSocialBench, the first benchmark to systematically evaluate privacy risk in this setting, comprising scenarios across seven categories spanning dyadic and multi-party interactions, grounded in realistic user profiles with hierarchical sensitivity labels and directed social graphs. Our experiments reveal that privacy in agentic social networks is fundamentally harder than in single-agent settings: (1) cross-domain and cross-user coordination creates persistent leakage pressure even when agents are explicitly instructed to protect information, and (2) privacy instructions that teach agents how to abstract sensitive information paradoxically cause them to discuss it more (we call this the abstraction paradox). These findings underscore that current LLM agents lack robust mechanisms for privacy preservation in human-centered agentic social networks, and that new approaches beyond prompt engineering are needed to make agent-mediated social coordination safe for real-world deployment.

  10. AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents

    Computer-use agents extend language models from text generation to persistent action over tools, files, and execution environments. Unlike chat systems, they maintain state across interactions and translate intermediate outputs into concrete actions. This creates a distinct safety challenge in that harmful behavior may emerge through sequences of individually plausible steps, including intermediate actions that appear locally acceptable but collectively lead to unauthorized actions. We present AgentHazard, a benchmark for evaluating harmful behavior in computer-use agents. AgentHazard contains 2,653 instances spanning diverse risk categories and attack strategies. Each instance pairs a harmful objective with a sequence of operational steps that are locally legitimate but jointly induce unsafe behavior. The benchmark evaluates whether agents can recognize and interrupt harm arising from accumulated context, repeated tool use, intermediate actions, and dependencies across steps. We evaluate AgentHazard on Claude Code, OpenClaw, and IFlow using mostly open or openly deployable models from the Qwen3, Kimi, GLM, and DeepSeek families. Our experimental results indicate that current systems remain highly vulnerable. In particular, when powered by Qwen3-Coder, Claude Code exhibits an attack success rate of 73.63%, suggesting that model alignment alone does not reliably guarantee the safety of autonomous agents.

  11. XpertBench: Expert-Level Tasks with Rubrics-Based Evaluation

    As Large Language Models (LLMs) exhibit plateauing performance on conventional benchmarks, a pivotal challenge persists: evaluating their proficiency in the complex, open-ended tasks that characterize genuine expert-level cognition. Existing frameworks suffer from narrow domain coverage, reliance on generalist tasks, or self-evaluation biases. To bridge this gap, we present XpertBench, a high-fidelity benchmark engineered to assess LLMs across authentic professional domains. XpertBench consists of 1,346 meticulously curated tasks across 80 categories, spanning finance, healthcare, legal services, education, and dual-track research (STEM and Humanities). These tasks are derived from over 1,000 submissions by domain experts (including researchers from elite institutions and practitioners with extensive clinical or industrial experience), ensuring superior ecological validity. Each task uses detailed rubrics, most with 15-40 weighted checkpoints, to assess professional rigor. To facilitate scalable yet human-aligned assessment, we introduce ShotJudge, a novel evaluation paradigm that employs LLM judges calibrated with expert few-shot exemplars to mitigate self-rewarding biases. Our empirical evaluation of state-of-the-art LLMs reveals a pronounced performance ceiling: even leading models achieve a peak success rate of only ~66%, with a mean score around 55%. Models also exhibit domain-specific divergence, showing non-overlapping strengths in quantitative reasoning versus linguistic synthesis. These findings underscore a significant "expert gap" in current AI systems and establish XpertBench as a critical instrument for navigating the transition from general-purpose assistants to specialized professional collaborators.

  12. VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors

    Vision Language Models (VLMs) achieve impressive performance across a wide range of multimodal tasks. However, on some tasks that demand fine-grained visual perception, they often fail even when the required information is present in their internal representations. In this work, we demonstrate that this gap arises from their narrow training pipeline which focuses on moving visual information to the textual space. Consequently, VLMs can only reason about visual entities that can be mapped to known concepts in the language space, leaving vision-focused tasks such as visual correspondence and reasoning about novel visual entities poorly supported. As a result, VLMs are severely limited in several important multimodal capabilities because they rely on brittle, hallucinated textual descriptions of visual entities that they cannot map to textual representations. We verify this behavior through visual correspondence tasks, in which VLMs must detect matching entities between two images. Testing across semantic, shape, and face correspondence tasks, we find that VLMs perform much better when the relevant entities are nameable in language than when they are unnameable. Mechanistically, our Logit Lens analyses confirm that VLMs explicitly assign semantic labels to nameable entities and surface more unique corresponding tokens compared to unnameable entities. Furthermore, we show that teaching completely arbitrary names for unknown entities improves performance, yet task-specific finetuning yields even stronger generalization without relying on language priors. Our findings suggest that current VLM failures on visual tasks reflect learned shortcuts from their training, rather than a fundamental limitation of multimodal architectures.

  13. Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation

    Distilling video generation models to extremely low inference budgets (e.g., 2-4 NFEs) is crucial for real-time deployment, yet remains challenging. Trajectory-style consistency distillation often becomes conservative under complex video dynamics, yielding an over-smoothed appearance and weak motion. Distribution matching distillation (DMD) can recover sharp, mode-seeking samples, but its local training signals do not explicitly regularize how denoising updates compose across timesteps, making composed rollouts prone to drift. To overcome this challenge, we propose Self-Consistent Distribution Matching Distillation (SC-DMD), which explicitly regularizes the endpoint-consistent composition of consecutive denoising updates. For real-time autoregressive video generation, we further treat the KV cache as a quality-parameterized condition and propose Cache-Distribution-Aware training. This training scheme applies SC-DMD over multi-step rollouts and introduces a cache-conditioned feature alignment objective that steers low-quality outputs toward high-quality references. Across extensive experiments on both non-autoregressive backbones (e.g., Wan 2.1) and autoregressive real-time paradigms (e.g., Self Forcing), our method, dubbed Salt, consistently improves low-NFE video generation quality while remaining compatible with diverse KV-cache memory mechanisms. Source code will be released at https://github.com/XingtongGe/Salt

  14. Do World Action Models Generalize Better than VLAs? A Robustness Study

    Robot action planning in the real world is challenging, as it requires not only understanding the current state of the environment but also predicting how it will evolve in response to actions. Vision-language-action (VLA) models, which repurpose large-scale vision-language models for robot action generation using action experts, have achieved notable success across a variety of robotic tasks. Nevertheless, their performance remains constrained by the scope of their training data, exhibiting limited generalization to unseen scenarios and vulnerability to diverse contextual perturbations. More recently, world models have been revisited as an alternative to VLAs. These models, referred to as world action models (WAMs), are built upon world models trained on large corpora of video data to predict future states. With minor adaptations, their latent representations can be decoded into robot actions. It has been suggested that their explicit dynamic-prediction capacity, combined with spatiotemporal priors acquired from web-scale video pretraining, enables WAMs to generalize more effectively than VLAs. In this paper, we conduct a comparative study of prominent state-of-the-art VLA policies and recently released WAMs. We evaluate their performance on the LIBERO-Plus and RoboTwin 2.0-Plus benchmarks under various visual and language perturbations. Our results show that WAMs achieve strong robustness, with LingBot-VA reaching a 74.2% success rate on RoboTwin 2.0-Plus and Cosmos-Policy achieving 82.2% on LIBERO-Plus. While VLAs such as π_0.5 can achieve comparable robustness on certain tasks, they typically require extensive training with diverse robotic datasets and varied learning objectives. Hybrid approaches that partially incorporate video-based dynamic learning exhibit intermediate robustness, highlighting the importance of how video priors are integrated.

  15. CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning

    Recent vision-language models (VLMs) typically rely on a single vision encoder trained with contrastive image-text objectives, such as CLIP-style pretraining. While contrastive encoders are effective for cross-modal alignment and retrieval, self-supervised visual encoders often capture richer dense semantics and exhibit stronger robustness on recognition and understanding tasks. In this work, we investigate how to scale the fusion of these complementary visual representations for vision-language modeling. We propose CoME-VL: Complementary Multi-Encoder Vision-Language, a modular fusion framework that integrates a contrastively trained vision encoder with a self-supervised DINO encoder. Our approach performs representation-level fusion by (i) entropy-guided multi-layer aggregation with orthogonality-constrained projections to reduce redundancy, and (ii) RoPE-enhanced cross-attention to align heterogeneous token grids and produce compact fused visual tokens. The fused tokens can be injected into a decoder-only LLM with minimal changes to standard VLM pipelines. Extensive experiments across diverse vision-language benchmarks demonstrate that CoME-VL consistently outperforms single-encoder baselines. In particular, we observe an average improvement of 4.9% on visual understanding tasks and 5.4% on grounding tasks. Our method achieves state-of-the-art performance on RefCOCO for detection while improving over the baseline by a large margin. Finally, we conduct ablation studies on layer merging, non-redundant feature mixing, and fusion capacity to evaluate how complementary contrastive and self-supervised signals affect VLM performance.
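
A note on item 2 (SimpleStream): the baseline is simple enough to pin down in a few lines. Below is a minimal Python sketch, assuming a hypothetical vlm_answer(frames, question) wrapper around whatever off-the-shelf VLM is used; the paper's actual evaluation protocol has more moving parts than this.

    from collections import deque

    def simplestream(stream, questions, vlm_answer, n_frames=4):
        """Sliding-window streaming baseline: no memory, no retrieval.

        stream     -- iterable of (timestamp, frame) pairs
        questions  -- dict mapping a timestamp to the question arriving then
        vlm_answer -- callable (frames, question) -> str; a hypothetical
                      wrapper around any off-the-shelf VLM
        n_frames   -- window size; the abstract reports strong results with 4
        """
        window = deque(maxlen=n_frames)  # old frames fall off automatically
        answers = {}
        for t, frame in stream:
            window.append(frame)
            if t in questions:
                # The VLM only ever sees the n_frames most recent frames.
                answers[t] = vlm_answer(list(window), questions[t])
        return answers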
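
A note on item 3 (token warping): backward token warping is ordinary backward warping applied to the ViT token grid rather than to pixels. Here is a sketch using PyTorch's grid_sample, assuming the target-to-source correspondence grid has already been computed from depth and relative pose; the names and interface are illustrative, not the paper's.

    import torch
    import torch.nn.functional as F

    def backward_warp_tokens(src_tokens, tgt_to_src_coords):
        """Backward warping on the token grid.

        src_tokens        -- [C, H, W] source-view ViT tokens, reshaped
                             from the usual [H*W, C] token sequence
        tgt_to_src_coords -- [H, W, 2] source-view (x, y) coordinates in
                             [-1, 1] for every target-view grid point
                             (from depth + relative pose; computing it is
                             outside this sketch)
        Returns [C, H, W] tokens for the nearby target viewpoint.
        """
        # For each target grid point, retrieve (bilinearly interpolate)
        # the source token it maps back to; unlike forward warping,
        # every target location gets a value, so there are no holes.
        warped = F.grid_sample(
            src_tokens.unsqueeze(0),         # [1, C, H, W]
            tgt_to_src_coords.unsqueeze(0),  # [1, H, W, 2]
            mode="bilinear",
            align_corners=False,
        )
        return warped.squeeze(0)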
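
A note on item 5 (T^2 scaling): the abstract builds its test-time modeling on pass@k. For reference, this is the standard unbiased pass@k estimator popularized by the HumanEval/Codex evaluation; whether T^2 uses exactly this estimator is not stated in the abstract.

    import numpy as np

    def pass_at_k(n, c, k):
        """Unbiased pass@k: probability that at least one of k samples
        drawn without replacement from n generations is correct, given
        that c of the n are correct.  Equals 1 - C(n-c, k) / C(n, k),
        computed here in a numerically stable product form."""
        if n - c < k:
            return 1.0  # too few incorrect samples to fill all k slots
        return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

The budget trade-off follows directly: drawing k samples multiplies inference cost by roughly k, which is consistent with the abstract's finding that optimal pretraining shifts into the overtraining regime once inference cost is counted.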
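
A note on item 8 (Swift-SVD): the abstract describes the core as streaming covariance aggregation followed by a single eigendecomposition. Below is a rough numpy sketch of a computation with that shape, under my own simplifications: one plain linear layer, and none of the paper's whitening details or dynamic rank allocation.

    import numpy as np

    def low_rank_factorize(W, activation_batches, rank):
        """Activation-aware low-rank factorization of one linear layer.

        W                  -- [d_out, d_in] weight matrix
        activation_batches -- iterable of [n_i, d_in] calibration inputs
        rank               -- target rank r
        Returns (A, B) with A: [d_out, r] and B: [r, d_in], W ~= A @ B.
        """
        d_out = W.shape[0]
        C = np.zeros((d_out, d_out))
        for X in activation_batches:   # incremental aggregation ...
            Y = X @ W.T                # output activations for this batch
            C += Y.T @ Y               # ... of the output covariance
        # One symmetric eigendecomposition after aggregation,
        # instead of an SVD per calibration batch.
        eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues ascending
        U_r = eigvecs[:, -rank:]               # top-r eigenvectors
        # Project the layer's output onto the top-r eigenspace of the
        # output covariance: the closed-form optimum for minimizing
        # E||Wx - (AB)x||^2 over rank-r factorizations.
        return U_r, U_r.T @ W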

Techmeme (15)

  1. The rapid adoption of AI coding tools has let workers generate massive volumes of code, leaving companies scrambling to review and secure the AI-generated code (New York Times)

    When a financial services company recently began using Cursor, an artificial intelligence technology that writes computer code, the difference that it made was immediate.

  2. Sources: OpenAI, Anthropic, and Google are sharing information via the Frontier Model Forum to detect adversarial distillation attempts that violate their ToS (Bloomberg)

    Rivals OpenAI, Anthropic PBC, and Alphabet Inc.'s Google have begun working together to try to clamp down on Chinese competitors extracting results …

  3. Australian AI infrastructure startup Firmus raised $505M led by Coatue at a $5.5B valuation, bringing its funding raised in the last six months to $1.35B (Ian King/Bloomberg)

    Data center builder Firmus Technologies Pty raised $505 million in an investment round led by Coatue Management LLC …

  4. A look at Eko, whose Arkansas "capture factory" creates digital product catalogs intended to serve as training data for retail-focused AI models (Sarah Nassauer/Wall Street Journal)

    In an Arkansas ‘capture factory,’ hand models and food stylists are preparing for the future of shopping

  5. How social media became a freak show: X punishes external links and most top accounts, such as Catturd, are very low-quality but get more engagement than NYT (Nate Silver/Silver Bulletin)

    The ecosystem is unhealthy, especially on Twitter, and that's producing some strange beasts among the most influential accounts.

  6. A federal appeals court rules New Jersey cannot block Kalshi users in the state from sports-related event contracts, finding CFTC has exclusive jurisdiction (Nate Raymond/Reuters)

    A federal appeals court ruled on Monday that New Jersey gaming regulators cannot prevent Kalshi from allowing people in the state …

  7. Sources: Meta is preparing to release the first AI models developed under Alexandr Wang, with plans to offer versions of those models via an open source license (Ina Fried/Axios)

    Meta is preparing to release the first new AI models developed under Alexandr Wang, with plans to eventually offer versions …

  8. Netflix launches Netflix Playground, a games app for kids aged eight and under, in the US, UK, Canada, Australia, the Philippines, and New Zealand (Andrew Webster/The Verge)

    It's called Netflix Playground, and it's out now. … Netflix has made family-friendly titles a key part of its current games strategy …

  9. Source: Binance Chief Compliance Officer Noah Perlman is looking to leave in 2026 or 2027; other senior compliance staff have departed over the past few months (Olga Kharif/Bloomberg)

    When Binance pleaded guilty to US sanctions and anti-money-laundering violations in late 2023, rebuilding its compliance operation was key to the deal.

  10. OpenAI buying TBPN makes little sense, par for the course for a company that, like Twitter, stumbled into a big market and may never build a functional business (Ben Thompson/Stratechery)

    OpenAI's purchase of TBPN makes no sense, which may be par for the course for OpenAI. Then, AI is breaking stuff, starting with tech services.

  11. How advanced chip packaging became one of Intel's fast-growing businesses; sources: Intel is in talks with Google and Amazon for its advanced packaging services (Lauren Goode/Wired)

    Advanced chip packaging is suddenly at the center of the AI boom. Intel is going all in. — Sixteen miles north of Albuquerque …

  12. Xoople, which is developing a satellite constellation to collect earth data for training AI models, raised a $130M Series B, bringing its total funding to $225M (Tim Fernholz/TechCrunch)

    Space data companies have argued for years that the private sector needs their products, but the real uptake has been from government buyers.

  13. Russian cryptocurrency payment network A7 expands to Africa, as Moscow builds an alternative payments system amid western sanctions after its Ukraine invasion (Financial Times)

    Videos show A7 office in Nigeria, while company also claims new branch in Zimbabwe — A recent vacancy posted …

  14. Interviews with Sam Altman and 100+ people on if he can be trusted amid allegations of consistent lying and more: some defend him as others call him a sociopath (New Yorker)

    New interviews and closely guarded documents shed light on the persistent doubts about the head of OpenAI.

  15. Jack Dorsey says Apple removed his Bluetooth P2P messaging app Bitchat, used during protests in five countries, from the App Store in China at the CAC's request (Stephen Katte/Cointelegraph)

    Bitchat launched in July last year and has been used during protests in Madagascar, Uganda, Nepal, Indonesia and Iran …

Solidot (10)

  1. Germany reveals the identity of the head of the Russian REvil ransomware group

    Germany has publicly identified UNKN, the figure who ran the Russian ransomware operations GandCrab and REvil in their early days. The 31-year-old Daniil Maksimovich Shchukin carried out at least 130 computer-sabotage and extortion operations in Germany between 2019 and 2021. According to German authorities, Shchukin and another Russian, 43-year-old Anatoly Sergeevitsch Kravchuk, together extorted nearly 2 million euros and caused economic damage of more than 35 million euros. Germany's Federal Criminal Police Office (BKA) says the GandCrab and REvil groups he led pioneered double extortion: first charging victims a ransom for the decryption key, then charging a second fee in exchange for a promise not to publish the stolen data. GandCrab announced its dissolution in 2019 after successfully extorting more than $2 billion, only to resurface under the name REvil.

  2. Chrome 148 will lazy-load video and audio to improve performance

    Browsers such as Chrome and Firefox already support lazy loading, which, as the name suggests, defers loading particular objects until they are actually needed in order to speed up page loads; Chrome has lazy-loaded images and iframes since 2019. It is now testing lazy loading of video and audio in Chrome 148 to improve browser performance. Many sites today, news sites in particular, embed video and audio in their pages, which hurts page load times.

  3. Colorado rolls out average-speed camera systems

    Drivers today have many ways to dodge speed cameras; phone apps, for instance, warn of a camera ahead, so a driver slows down and then accelerates again once past it. To curb this, more and more jurisdictions are deploying average-speed camera systems: they track a vehicle's average speed between multiple checkpoints and issue a ticket if that average exceeds the speed limit by 10 mph or more. Colorado passed a law in 2023 allowing automated vehicle-identification systems to compute a car's average speed between cameras, and at the end of last year state police began formally ticketing speeding vehicles. The fine is $75 with no license points, and the ticket goes to the vehicle's owner regardless of who was driving. Mapping software such as TomTom has responded by showing drivers their average speed through enforcement zones. (A toy sketch of the average-speed check appears after this list.)

  4. "Cognitive surrender" leads AI users to abandon logical thinking

    Users of AI tools generally fall into two camps: those who treat AI as a powerful but fallible service whose reasoning and factual errors call for careful human oversight and review, and those who treat AI as omniscient; the latter are described as having "cognitively surrendered." Researchers at the University of Pennsylvania's Wharton School, after running 1,372 participants through more than 9,500 trials, found that participants accepted the AI's flawed reasoning in as many as 73.2% of cases and overrode it in only 19.7%. The researchers say the result "suggests people readily fold AI-generated output into their decision-making, often with little resistance or skepticism," and that "fluent, confident output is treated as cognitively authoritative, lowering the bar for scrutiny and dampening the metacognitive signals that would normally prompt deliberate reflection." They found that people inclined to treat AI as an authority were more easily misled by the wrong answers it supplied.

  5. AWS engineer reports PostgreSQL performance halved on Linux 7.0

    Amazon AWS engineer Salvatore Dipietro reports a significant regression in PostgreSQL throughput and latency on Linux 7.0, which is still in development and expected to ship within a week or two. His tests on arm64-based Graviton4 servers show PostgreSQL throughput at only 0.51x that of the previous kernel release, caused by a sharp increase in time spent in userspace spinlocks. The root cause is believed to be a new restriction in Linux 7.0 on the preemption modes available to the kernel. PostgreSQL developers have asked for more tests to be repeated under different conditions.

  6. Ubuntu 26.04 LTS raises its minimum memory requirement to 6GB

    Ubuntu 26.04 LTS, which Canonical will officially release later this month, raises the minimum memory requirement to 6GB. Ubuntu 14.04 LTS (Trusty Tahr) set the minimum at 1GB, Ubuntu 18.04 LTS (Bionic Beaver) raised it to 4GB, and now it has gone up again. By comparison, Microsoft sets the Windows 11 minimum at 4GB, though that is arguably just another Microsoft fiction: nobody really runs Microsoft's latest operating system on a machine with only 4GB of RAM, and Windows 11 needs at least 8GB. Canonical is not treating 6GB as a hard requirement; users can still install Ubuntu 26.04 on machines with less.

  7. Python blood contains a weight-loss compound without side effects

    Pythons can grow as long as telephone poles, swallow enormous meals in one sitting, and then go months without eating, all while staying metabolically healthy. According to a study published in Nature Metabolism, scientists report finding an appetite-suppressing compound in python blood that lacks the side effects of the popular GLP-1 weight-loss drugs. Within hours of a meal, a python's heart expands by 25% and its metabolism accelerates 4,000-fold to help digest the food. The researchers measured blood samples from fed ball pythons and Burmese pythons and found 208 metabolites that rose significantly after feeding; the concentration of one molecule, para-tyramine-O-sulfate (pTOS), spiked a thousandfold. Mouse experiments showed that pTOS suppresses appetite and reduces body weight without causing gastrointestinal problems or muscle loss.

  8. Nearly half of planned US data-center projects delayed or canceled

    Shortages of critical electrical components such as transformers, switchgear, and batteries have delayed or killed nearly half of the data-center projects planned in the US. The country plans to add 12 GW of data-center capacity in 2026, but because of assorted problems only a third of that capacity is under active construction. Power infrastructure accounts for less than 10% of a data center's total cost, yet it is as essential as the computing hardware. With demand surging, lead times for large US power transformers have stretched from 24-30 months before 2020 to five years or more. For AI data centers, whose deployment cycles typically run under 18 months, that is disastrous. To cope with the shortage, US companies have turned to the global market, and Canada, Mexico, and South Korea have become the main suppliers of large power transformers for US AI data centers. Data show that as of October 2025, US imports of large power transformers from China had risen from fewer than 1,500 units in 2022 to more than 8,000. China also accounts for over 40% of US battery imports and close to 30% of certain transformer and switchgear categories.

  9. Incidents of lawyers misusing AI to fabricate cases surge

    Lawyers keep misusing AI to cite fabricated, nonexistent cases, and the sanctions courts impose on them have had little deterrent effect; the number of such incidents spiked in 2025. Damien Charlotin, a researcher at HEC Paris, maintains a global database tracking lawyers' misuse of AI and says he recently received 10 such cases from 10 different courts in a single day. He has logged more than 1,200 incidents of AI-fabricated case citations to date, with the US accounting for the most at 831 and Hong Kong recording 2. Charlotin says courts have also recently begun stepping up penalties: an Oregon lawyer was ordered to pay $109,700 in fines and litigation costs for misusing AI.

  10. Gentoo GNU/Hurd is not an April Fools' joke

    On April 1, the Gentoo Linux project announced it would adopt GNU Hurd as its primary kernel. That was not entirely an April Fools' joke: the project really has released a Gentoo GNU/Hurd port. The microkernel-based GNU Hurd is now more than 35 years old, yet version 1.0 has never shipped; the latest release is v0.9 from 2016. The Gentoo project says its GNU/Hurd port remains experimental and recommends that users who want to try it run it under the QEMU emulator, though the adventurous can attempt to run it directly on hardware.
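
A note on item 3 above: average-speed enforcement is plain arithmetic, distance over elapsed time between two matched plate reads, which is why braking only at the cameras does not help. A toy sketch using the Colorado parameters described in the item (the function name and interface are mine):

    def average_speed_fine(distance_miles, t_enter_s, t_exit_s,
                           limit_mph, threshold_mph=10.0, fine_usd=75.0):
        """Point-to-point ("average speed") camera check.

        A plate is matched at two checkpoints; the average speed over the
        segment is distance / elapsed time, so slowing down only at each
        camera cannot lower it.
        """
        hours = (t_exit_s - t_enter_s) / 3600.0   # timestamps in seconds
        avg_mph = distance_miles / hours
        if avg_mph >= limit_mph + threshold_mph:  # 10+ mph over the limit
            return fine_usd   # ticket goes to the owner, no license points
        return 0.0

    # Example: 5 miles in 4 minutes is a 75 mph average in a 60 mph zone.
    assert average_speed_fine(5.0, 0.0, 240.0, limit_mph=60.0) == 75.0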