OrangeBot.AI Digest — 2026-04-02

79 headlines across 8 sources, aggregated for this day.

Hacker News(15)

  1. Tailscale's new macOS home (tailscale.com)
  2. Cursor 3 (cursor.com)
  3. Google releases Gemma 4 open models (deepmind.google)
  4. Delve allegedly forked an open-source tool and sold it as its own (techcrunch.com)
  5. Renewables reached nearly 50% of global electricity capacity last year (www.theregister.com)
  6. Artemis II will use laser beams to live-stream 4K moon footage at 260 Mbps (www.tomshardware.com)
  7. Artemis computer running two instances of MS Outlook; they can't figure out why (bsky.app)
  8. Qwen3.6-Plus: Towards real world agents (qwen.ai)
  9. LinkedIn is searching your browser extensions (browsergate.eu)
  10. Lemonade by AMD: a fast and open source local LLM server using GPU and NPU (lemonade-server.ai)
  11. Inside Nepal's Fake Rescue Racket (kathmandupost.com)
  12. Sweden goes back to basics, swapping screens for books in the classroom (undark.org)
  13. Significant rise in reports (lwn.net)
  14. I Am Not A Number. In memory of the more than 72,000 Palestinians killed (bkhmsi.github.io)
  15. IBM Announces Strategic Collaboration with Arm (newsroom.ibm.com)

GitHub Trending(4)

  1. siddharthvaddem / openscreen
  2. Yeachan-Heo / oh-my-codex
  3. asgeirtj / system_prompts_leaks
  4. sherlock-project / sherlock

Product Hunt(15)

  1. Lightning V3

    Text-to-Speech built for Voice Agents

  2. SampleStack

    The native macOS sample manager built for every instrument

  3. Denovo

    Build and run your business while you sleep.

  4. Syncly Social

    Find creators by what's actually in their content

  5. Mngr

    Run 100s of Claude agents in parallel

  6. tama96

    A Tamagotchi for your desktop, terminal, and AI agents

  7. Nitro by Rocketlane

    AI agents for modern service delivery

  8. Wan 2.7-Image

    Interactive pixel-level editing and consistent storyboards

  9. Chunk

    An essential macOS productivity app

  10. Protocol: Survival

    Know your gaps. Close them. Before it matters.

  11. Roger AI

    Your friendly screen guide for any task!

  12. GitCity

    Your GitHub contributions as a 3D city you can drive through

  13. Gyuni Player

    A more refined video player for Mac

  14. Mac Pet

    A pixel pet for your menu bar or MacBook notch w/ Pomodoro

  15. OpenYak

    The open-source Claude Desktop with any model you want

Hugging Face(15)

  1. ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers

    OpenClaw has rapidly established itself as a leading open-source autonomous agent runtime, offering powerful capabilities including tool integration, local file access, and shell command execution. However, these broad operational privileges introduce critical security vulnerabilities, transforming model errors into tangible system-level threats such as sensitive data leakage, privilege escalation, and malicious third-party skill execution. Existing security measures for the OpenClaw ecosystem remain highly fragmented, addressing only isolated stages of the agent lifecycle rather than providing holistic protection. To bridge this gap, we present ClawKeeper, a real-time security framework that integrates multi-dimensional protection mechanisms across three complementary architectural layers. (1) Skill-based protection operates at the instruction level, injecting structured security policies directly into the agent context to enforce environment-specific constraints and cross-platform boundaries. (2) Plugin-based protection serves as an internal runtime enforcer, providing configuration hardening, proactive threat detection, and continuous behavioral monitoring throughout the execution pipeline. (3) Watcher-based protection introduces a novel, decoupled system-level security middleware that continuously verifies agent state evolution. It enables real-time execution intervention without coupling to the agent's internal logic, supporting operations such as halting high-risk actions or enforcing human confirmation. We argue that this Watcher paradigm holds strong potential to serve as a foundational building block for securing next-generation autonomous agent systems. Extensive qualitative and quantitative evaluations demonstrate the effectiveness and robustness of ClawKeeper across diverse threat scenarios. We release our code.
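
    The Watcher layer is the abstract's main novelty: a decoupled check that vets each proposed action before it executes. A minimal sketch of the idea in Python, with every name (Action, RISKY_PATTERNS, watch) invented here rather than taken from ClawKeeper's actual API:

      import re
      from dataclasses import dataclass

      @dataclass
      class Action:
          kind: str     # e.g. "shell", "file_write"
          payload: str  # the command or content the agent proposed

      RISKY_PATTERNS = [r"rm\s+-rf", r"curl .*\|\s*sh", r"/etc/passwd"]

      def watch(action: Action, confirm=input) -> bool:
          """Return True if the action may proceed; pause for a human on high risk."""
          if any(re.search(p, action.payload) for p in RISKY_PATTERNS):
              # Decoupled intervention: halt and require explicit confirmation,
              # without touching the agent's internal reasoning loop.
              return confirm(f"High-risk {action.kind}: {action.payload!r}. Allow? [y/N] ").lower() == "y"
          return True

      print(watch(Action("shell", "ls -la"), confirm=lambda _: "n"))         # True: benign
      print(watch(Action("shell", "rm -rf /tmp/x"), confirm=lambda _: "n"))  # False: blocked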

  2. Terminal Agents Suffice for Enterprise Automation

    There has been growing interest in building agents that can interact with digital platforms to execute meaningful enterprise tasks autonomously. Among the approaches explored are tool-augmented agents built on abstractions such as Model Context Protocol (MCP) and web agents that operate through graphical interfaces. Yet, it remains unclear whether such complex agentic systems are necessary given their cost and operational overhead. We argue that a coding agent equipped only with a terminal and a filesystem can solve many enterprise tasks more effectively by interacting directly with platform APIs. We evaluate this hypothesis across diverse real-world systems and show that these low-level terminal agents match or outperform more complex agent architectures. Our findings suggest that simple programmatic interfaces, combined with strong foundation models, are sufficient for practical enterprise automation.
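
    The paper's thesis reduces to something like the toy below: an agent whose only capability is running shell commands, hitting a platform's REST API directly with curl. The endpoint and token are placeholders, not taken from the paper:

      import json, subprocess

      def run_shell(cmd: str) -> str:
          """The agent's only tool: execute a command and read stdout."""
          return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

      # e.g. list open tickets straight from a (hypothetical) enterprise API:
      out = run_shell(
          'curl -s -H "Authorization: Bearer $TOKEN" '
          '"https://example.com/api/v2/tickets?status=open"'
      )
      try:
          print(f"{len(json.loads(out))} open tickets")
      except json.JSONDecodeError:
          print("placeholder endpoint returned no JSON (expected outside a real deployment)")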

  3. MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome

    Recent progress in deep research systems has been impressive, but evaluation still lags behind real user needs. Existing benchmarks predominantly assess final reports using fixed rubrics, failing to evaluate the underlying research process. Most also offer limited multimodal coverage, rely on synthetic tasks that do not reflect real-world query complexity, and cannot be refreshed as knowledge evolves. To address these gaps, we introduce MiroEval, a benchmark and evaluation framework for deep research systems. The benchmark comprises 100 tasks (70 text-only, 30 multimodal), all grounded in real user needs and constructed via a dual-path pipeline that supports periodic updates, enabling a live and evolving setting. The proposed evaluation suite assesses deep research systems along three complementary dimensions: adaptive synthesis quality evaluation with task-specific rubrics, agentic factuality verification via active retrieval and reasoning over both web sources and multimodal attachments, and process-centric evaluation that audits how the system searches, reasons, and refines throughout its investigation. Evaluation across 13 systems yields three principal findings: the three evaluation dimensions capture complementary aspects of system capability, with each revealing distinct strengths and weaknesses across systems; process quality serves as a reliable predictor of overall outcome while revealing weaknesses invisible to output-level metrics; and multimodal tasks pose substantially greater challenges, with most systems declining by 3 to 10 points. The MiroThinker series achieves the most balanced performance, with MiroThinker-H1 ranking the highest overall in both settings. Human verification and robustness results confirm the reliability of the benchmark and evaluation framework. MiroEval provides a holistic diagnostic tool for the next generation of deep research agents.

  4. ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?

    Beneath the stunning visual fidelity of modern AIGC models lies a "logical desert", where systems fail tasks that require physical, causal, or complex spatial reasoning. Current evaluations largely rely on superficial metrics or fragmented benchmarks, creating a "performance mirage" that overlooks the generative process. To address this, we introduce ViGoR (Vision-Generative Reasoning-centric Benchmark), a unified framework designed to dismantle this mirage. ViGoR distinguishes itself through four key innovations: 1) holistic cross-modal coverage bridging Image-to-Image and Video tasks; 2) a dual-track mechanism evaluating both intermediate processes and final results; 3) an evidence-grounded automated judge ensuring high human alignment; and 4) granular diagnostic analysis that decomposes performance into fine-grained cognitive dimensions. Experiments on over 20 leading models reveal that even state-of-the-art systems harbor significant reasoning deficits, establishing ViGoR as a critical "stress test" for the next generation of intelligent vision models. A demo is available at https://vincenthancoder.github.io/ViGoR-Bench/

  5. Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification

    Recent advances in large language models have improved the capabilities of coding agents, yet systematic evaluation of complex, end-to-end website development remains limited. To address this gap, we introduce Vision2Web, a hierarchical benchmark for visual website development, spanning from static UI-to-code generation, interactive multi-page frontend reproduction, to long-horizon full-stack website development. The benchmark is constructed from real-world websites and comprises a total of 193 tasks across 16 categories, with 918 prototype images and 1,255 test cases. To support flexible, thorough and reliable evaluation, we propose workflow-based agent verification paradigm based on two complementary components: a GUI agent verifier and a VLM-based judge. We evaluate multiple visual language models instantiated under different coding-agent frameworks, revealing substantial performance gaps at all task levels, with state-of-the-art models still struggling on full-stack development.

  6. QuitoBench: A High-Quality Open Time Series Forecasting Benchmark

    Time series forecasting is critical across finance, healthcare, and cloud computing, yet progress is constrained by a fundamental bottleneck: the scarcity of large-scale, high-quality benchmarks. To address this gap, we introduce QuitoBench, a regime-balanced benchmark for time series forecasting with coverage across eight trend × seasonality × forecastability (TSF) regimes, designed to capture forecasting-relevant properties rather than application-defined domain labels. The benchmark is built upon Quito, a billion-scale time series corpus of application traffic from Alipay spanning nine business domains. Benchmarking 10 models from deep learning, foundation models, and statistical baselines across 232,200 evaluation instances, we report four key findings: (i) a context-length crossover where deep learning models lead at short context (L=96) but foundation models dominate at long context (L ≥ 576); (ii) forecastability is the dominant difficulty driver, producing a 3.64× MAE gap across regimes; (iii) deep learning models match or surpass foundation models at 59× fewer parameters; and (iv) scaling the amount of training data provides substantially greater benefit than scaling model size for both model families. These findings are validated by strong cross-benchmark and cross-metric consistency. Our open-source release enables reproducible, regime-aware evaluation for time series forecasting research.

  7. Reasoning Shift: How Context Silently Shortens LLM Reasoning

    Large language models (LLMs) exhibiting test-time scaling behavior, such as extended reasoning traces and self-verification, have demonstrated remarkable performance on complex, long-term reasoning tasks. However, the robustness of these reasoning behaviors remains underexplored. To investigate this, we conduct a systematic evaluation of multiple reasoning models across three scenarios: (1) problems augmented with lengthy, irrelevant context; (2) multi-turn conversational settings with independent tasks; and (3) problems presented as a subtask within a complex task. We observe an interesting phenomenon: reasoning models tend to produce much shorter reasoning traces (up to 50%) for the same problem under different context conditions compared to the traces produced when the problem is presented in isolation. A finer-grained analysis reveals that this compression is associated with a decrease in self-verification and uncertainty management behaviors, such as double-checking. While this behavioral shift does not compromise performance on straightforward problems, it might affect performance on more challenging tasks. We hope our findings draw additional attention to both the robustness of reasoning models and the problem of context management for LLMs and LLM-based agents.
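
    The underlying measurement is easy to sketch: ask the same problem in isolation and embedded in irrelevant context, then compare reasoning-trace lengths. Here `ask` is any callable returning a trace, faked purely for illustration:

      def trace_shrinkage(ask, problem: str, distractor: str) -> float:
          alone = ask(problem)
          in_context = ask(distractor + "\n\nNow solve:\n" + problem)
          return 1.0 - len(in_context) / len(alone)  # 0.5 means the trace halved

      # Stub model that, like the paper's finding, reasons less under added context:
      fake = lambda p: "step " * (40 if "Now solve" in p else 80)
      print(f"{trace_shrinkage(fake, 'What is 17*23?', 'lorem ipsum ' * 200):.0%} shorter")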

  8. Brevity Constraints Reverse Performance Hierarchies in Language Models

    Standard evaluation protocols reveal a counterintuitive phenomenon: on 7.7% of benchmark problems spanning five datasets, larger language models underperform smaller ones by 28.4 percentage points despite 10-100x more parameters. Through systematic evaluation of 31 models (0.5B-405B parameters) across 1,485 problems, we identify the mechanism as spontaneous scale-dependent verbosity that introduces errors through overelaboration. Causal intervention experiments demonstrate this reflects correctable prompt design rather than fundamental capability limitations. Constraining large models to produce brief responses improves accuracy by 26 percentage points and reduces performance gaps by up to two-thirds. Most critically, brevity constraints completely reverse performance hierarchies on mathematical reasoning and scientific knowledge benchmarks, with large models achieving 7.7-15.9 percentage point advantages over small models -- direct inversions of the original gaps. These reversals prove large models possess superior latent capabilities that universal prompting masks. We validate findings through three independent contamination tests and demonstrate inverse scaling operates continuously across the full parameter spectrum, with dataset-specific optimal scales ranging from 0.5B to 3.0B parameters. Our results establish that maximizing large model performance requires scale-aware prompt engineering rather than universal evaluation protocols, with immediate implications for deployment: prompt adaptation simultaneously improves accuracy and reduces computational costs.
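
    The intervention is purely prompt-level. A minimal sketch, using an instruction wording of our own rather than the paper's exact prompt, and a stub in place of a real model client:

      BRIEF = "Answer in one short sentence. Do not elaborate."

      def with_brevity(query, question: str) -> tuple[str, str]:
          verbose = query(question)                # default: model may over-elaborate
          brief = query(f"{BRIEF}\n\n{question}")  # brevity-constrained variant
          return verbose, brief

      # Stub "model" that over-elaborates unless constrained:
      stub = lambda q: "42." if q.startswith(BRIEF) else "Considering many factors... possibly 41?"
      print(with_brevity(stub, "What is 6*7?"))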

  9. HippoCamp: Benchmarking Contextual Agents on Personal Computers

    We present HippoCamp, a new benchmark designed to evaluate agents' capabilities on multimodal file management. Unlike existing agent benchmarks that focus on tasks like web interaction, tool use, or software automation in generic settings, HippoCamp evaluates agents in user-centric environments to model individual user profiles and search massive personal files for context-aware reasoning. Our benchmark instantiates device-scale file systems over real-world profiles spanning diverse modalities, comprising 42.4 GB of data across over 2K real-world files. Building upon the raw files, we construct 581 QA pairs to assess agents' capabilities in search, evidence perception, and multi-step reasoning. To facilitate fine-grained analysis, we provide 46.1K densely annotated structured trajectories for step-wise failure diagnosis. We evaluate a wide range of state-of-the-art multimodal large language models (MLLMs) and agentic methods on HippoCamp. Our comprehensive experiments reveal a significant performance gap: even the most advanced commercial models achieve only 48.3% accuracy in user profiling, struggling particularly with long-horizon retrieval and cross-modal reasoning within dense personal file systems. Furthermore, our step-wise failure diagnosis identifies multimodal perception and evidence grounding as the primary bottlenecks. Ultimately, HippoCamp exposes the critical limitations of current agents in realistic, user-centric environments and provides a robust foundation for developing next-generation personal AI assistants.

  10. PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

    We introduce PerceptionComp, a manually annotated benchmark for complex, long-horizon, perception-centric video reasoning. PerceptionComp is designed so that no single moment is sufficient: answering each question requires multiple temporally separated pieces of visual evidence and compositional constraints under conjunctive and sequential logic, spanning perceptual subtasks such as objects, attributes, relations, locations, actions, and events, and requiring skills including semantic recognition, visual correspondence, temporal reasoning, and spatial reasoning. The benchmark contains 1,114 highly complex questions on 279 videos from diverse domains including city walk tours, indoor villa tours, video games, and extreme outdoor sports, with 100% manual annotation. Human studies show that PerceptionComp requires substantial test-time thinking and repeated perception steps: participants take much longer than on prior benchmarks, and accuracy drops to near chance (18.97%) when rewatching is disallowed. State-of-the-art MLLMs also perform substantially worse on PerceptionComp than on existing benchmarks: the best model in our evaluation, Gemini-3-Flash, reaches only 45.96% accuracy in the five-choice setting, while open-source models remain below 40%. These results suggest that perception-centric long-horizon video reasoning remains a major bottleneck, and we hope PerceptionComp will help drive progress in perceptual reasoning.

  11. Universal YOCO for Efficient Depth Scaling

    The rise of test-time scaling has remarkably boosted the reasoning and agentic proficiency of Large Language Models (LLMs). Yet, standard Transformers struggle to scale inference-time compute efficiently, as conventional looping strategies suffer from high computational overhead and a KV cache that inflates alongside model depth. We present Universal YOCO (YOCO-U), which combines the YOCO decoder-decoder architecture with recursive computation to achieve a synergistic effect greater than either alone. Built on the YOCO framework, YOCO-U implements a Universal Self-Decoder that performs multiple iterations via parameter sharing, while confining the iterative process to shallow, efficient-attention layers. This combination yields a favorable capability-efficiency tradeoff that neither YOCO nor recursion achieves independently. The YOCO architecture provides a constant global KV cache and linear pre-filling, while partial recursion enhances representational depth with limited overhead. Together, YOCO-U improves token utility and scaling behavior while maintaining efficient inference. Empirical results confirm that YOCO-U remains highly competitive in general and long-context benchmarks, demonstrating that the integration of efficient-attention architectures and recursive computation is a promising direction for scalable LLMs.
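
    A rough sketch of the recursion being described: a shallow stack of layers re-applied several times with shared weights, so depth grows without new parameters. Plain PyTorch attention stands in for YOCO's efficient attention, and all sizes and the iteration count are illustrative:

      import torch, torch.nn as nn

      class RecursiveSelfDecoder(nn.Module):
          def __init__(self, d_model=256, n_layers=2, n_iters=3):
              super().__init__()
              layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
              self.shared = nn.TransformerEncoder(layer, num_layers=n_layers)
              self.n_iters = n_iters  # extra depth via parameter sharing

          def forward(self, x):
              for _ in range(self.n_iters):  # iterate the same shallow stack
                  x = self.shared(x)
              return x

      print(RecursiveSelfDecoder()(torch.randn(1, 16, 256)).shape)  # (1, 16, 256)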

  12. GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation

    Most recent advances in 3D generative modeling rely on diffusion or flow-matching formulations. We instead explore a fully autoregressive alternative and introduce GaussianGPT, a transformer-based model that directly generates 3D Gaussians via next-token prediction, thus facilitating full 3D scene generation. We first compress Gaussian primitives into a discrete latent grid using a sparse 3D convolutional autoencoder with vector quantization. The resulting tokens are serialized and modeled using a causal transformer with 3D rotary positional embedding, enabling sequential generation of spatial structure and appearance. Unlike diffusion-based methods that refine scenes holistically, our formulation constructs scenes step-by-step, naturally supporting completion, outpainting, controllable sampling via temperature, and flexible generation horizons. This formulation leverages the compositional inductive biases and scalability of autoregressive modeling while operating on explicit representations compatible with modern neural rendering pipelines, positioning autoregressive transformers as a complementary paradigm for controllable and context-aware 3D generation.
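
    Stripped to its loop, generation here is ordinary next-token sampling over a discrete codebook, with temperature steering diversity. A stand-in model replaces the causal transformer with 3D rotary embeddings, and the codebook size is assumed:

      import torch

      VOCAB = 1024  # assumed VQ codebook size

      def sample_scene(model, n_tokens=64, temperature=0.9):
          seq = torch.zeros(1, 1, dtype=torch.long)           # BOS token
          for _ in range(n_tokens):
              logits = model(seq)[:, -1, :] / temperature     # next-token distribution
              tok = torch.multinomial(logits.softmax(-1), 1)  # controllable sampling
              seq = torch.cat([seq, tok], dim=1)              # scene grows step by step
          return seq[:, 1:]  # tokens would then be de-quantized back to Gaussians

      dummy = lambda s: torch.randn(s.shape[0], s.shape[1], VOCAB)  # stand-in transformer
      print(sample_scene(dummy).shape)  # torch.Size([1, 64])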

  13. Embarrassingly Simple Self-Distillation Improves Code Generation

    Can a large language model (LLM) improve at code generation using only its own raw outputs, without a verifier, a teacher model, or reinforcement learning? We answer in the affirmative with simple self-distillation (SSD): sample solutions from the model with certain temperature and truncation configurations, then fine-tune on those samples with standard supervised fine-tuning. SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with gains concentrating on harder problems, and it generalizes across Qwen and Llama models at 4B, 8B, and 30B scale, including both instruct and thinking variants. To understand why such a simple method can work, we trace these gains to a precision-exploration conflict in LLM decoding and show that SSD reshapes token distributions in a context-dependent way, suppressing distractor tails where precision matters while preserving useful diversity where exploration matters. Taken together, SSD offers a complementary post-training direction for improving LLM code generation.
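
    As stated, SSD is just two steps: sample, then fine-tune on the samples. A pseudocode-level sketch where `generate` and `sft` stand in for any inference and fine-tuning stack; the sampling settings are placeholders, not the paper's configuration:

      def simple_self_distillation(model, prompts, generate, sft,
                                   temperature=0.8, top_p=0.95, n_samples=4):
          dataset = []
          for prompt in prompts:
              for _ in range(n_samples):  # raw samples: no verifier, no teacher
                  completion = generate(model, prompt, temperature=temperature, top_p=top_p)
                  dataset.append({"prompt": prompt, "completion": completion})
          return sft(model, dataset)      # standard supervised fine-tuning on own outputs

      # Illustration with trivial stubs:
      gen = lambda m, p, **kw: p + " -> solution"
      fit = lambda m, d: f"model tuned on {len(d)} samples"
      print(simple_self_distillation(None, ["p1", "p2"], gen, fit))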

  14. Paper Reconstruction Evaluation: Evaluating Presentation and Hallucination in AI-written Papers

    This paper introduces the first systematic evaluation framework for quantifying the quality and risks of papers written by modern coding agents. While AI-driven paper writing has become a growing concern, rigorous evaluation of the quality and potential risks of AI-written papers remains limited, and a unified understanding of their reliability is still lacking. We introduce Paper Reconstruction Evaluation (PaperRecon), an evaluation framework in which an overview (overview.md) is created from an existing paper, after which an agent generates a full paper based on the overview and minimal additional resources, and the result is subsequently compared against the original paper. PaperRecon disentangles the evaluation of the AI-written papers into two orthogonal dimensions, Presentation and Hallucination, where Presentation is evaluated using a rubric and Hallucination is assessed via agentic evaluation grounded in the original paper source. For evaluation, we introduce PaperWrite-Bench, a benchmark of 51 papers from top-tier venues across diverse domains published after 2025. Our experiments reveal a clear trade-off: while both ClaudeCode and Codex improve with model advances, ClaudeCode achieves higher presentation quality at the cost of more than 10 hallucinations per paper on average, whereas Codex produces fewer hallucinations but lower presentation quality. This work takes a first step toward establishing evaluation frameworks for AI-driven paper writing and improving the understanding of its risks within the research community.
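
    The pipeline reduces to three stages: distill an overview, reconstruct a paper from it, and score the two axes. In the sketch below every callable is a placeholder for the framework's actual components:

      def paper_recon(original, make_overview, write_paper, rubric_score, count_hallucinations):
          overview = make_overview(original)  # overview.md distilled from the source paper
          draft = write_paper(overview)       # coding agent writes a full paper from it
          return {"presentation": rubric_score(draft),                      # rubric-based
                  "hallucinations": count_hallucinations(draft, original)}  # grounded check

      print(paper_recon("original paper text",
                        make_overview=lambda p: p[:20],
                        write_paper=lambda o: o + " ... full draft",
                        rubric_score=lambda d: 3.5,
                        count_hallucinations=lambda d, p: 2))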

  15. Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

    Proactive agents that anticipate user needs and autonomously execute tasks hold great promise as digital assistants, yet the lack of realistic user simulation frameworks hinders their development. Existing approaches model apps as flat tool-calling APIs, failing to capture the stateful and sequential nature of user interaction in digital environments and making realistic user simulation infeasible. We introduce Proactive Agent Research Environment (Pare), a framework for building and evaluating proactive agents in digital environments. Pare models applications as finite state machines with stateful navigation and state-dependent action space for the user simulator, enabling active user simulation. Building on this foundation, we present Pare-Bench, a benchmark of 143 diverse tasks spanning communication, productivity, scheduling, and lifestyle apps, designed to test context observation, goal inference, intervention timing, and multi-app orchestration.
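
    "Apps as finite state machines" in miniature: each screen is a state, and the simulated user's available actions depend on the current state rather than forming a flat tool list. The app, states, and actions below are invented for illustration; Pare's real schema will differ:

      APP = {
          "inbox":    {"open_email": "reading", "compose": "drafting"},
          "reading":  {"reply": "drafting", "back": "inbox"},
          "drafting": {"send": "inbox", "discard": "inbox"},
      }

      state = "inbox"
      for action in ["open_email", "reply", "send"]:
          assert action in APP[state], f"{action!r} unavailable in state {state!r}"
          state = APP[state][action]  # stateful navigation, not a flat tool call
      print("final state:", state)    # inbox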

Techmeme(15)

  1. LinkedIn job posting data: companies added 640K AI-related jobs from 2023 to 2025 in the US, including 225K "head of AI" jobs, up 49% from the prior four years (Te-Ping Chen/Wall Street Journal)

    AI is raising big fears about employment losses, but it is also giving rise to new engineering and training jobs

  2. Sources: SpaceX is floating a $2T+ valuation to prospective investors in its IPO; SpaceX's acquisition of xAI reportedly valued the combined company at $1.25T (Bloomberg)

    SpaceX boosted its target IPO valuation above $2 trillion, according to people familiar with the matter, as the world's …

  3. Source: OpenAI bought TBPN, which was set to generate $30M in 2026, for "low hundreds of millions of dollars"; OpenAI says TBPN will be editorially independent (George Hammond/Financial Times)

    ChatGPT-maker moves into broadcasting with deal for TBPN after it had pledged to abandon ‘side-quests’

  4. The CFTC sues Arizona, Connecticut, and Illinois over their actions against prediction markets, saying it has the "exclusive" authority to regulate such markets (Alex Harring/CNBC)

    A federal commission on Wednesday announced lawsuits against three states over its ability to exclusively regulate prediction markets.

  5. Mental health startup Kintsugi is shutting down and open-sourcing its AI tech to detect depression and anxiety, after failing to secure FDA clearance (Robert Hart/The Verge)

    Instead, a mental health startup shut down and open-sourced its tech. … For the past seven years, the California-based startup Kintsugi …

  6. OpenAI acquires tech news show TBPN; Fidji Simo says the move aims to "help create a space for a real, constructive conversation about the changes AI creates" (Katie Deighton/Wall Street Journal)

    TBPN staff will help with marketing and communications at OpenAI but keep their editorial independence, the ChatGPT parent says

  7. OpenAI acquires popular tech news show TBPN; the show will stay the same and will continue to air live at 11am PT weekdays (John Coogan/@johncoogan)

    TBPN has been acquired by OpenAI! The show is staying the same and we'll continue to go live at 11am pacific every weekday. This is a full circle moment for me as I've worked with @sama for well over a decade. He funded my first company in 2013. Then helped us fix a serious

  8. Cursor launches Cursor 3, an "agent-first" coding product designed to compete with Claude Code and Codex by letting developers manage multiple AI agents (Maxwell Zeff/Wired)

    As Cursor launches the next generation of its product, the AI coding startup has to compete with OpenAI and Anthropic more directly than ever.

  9. Flipboard launches Surf, an app for creating custom feeds from Mastodon, Threads, Bluesky, RSS, podcasts, and YouTube, after over a year in beta (David Pierce/The Verge)

    It combines Bluesky, Mastodon, RSS, and other content into something that feels entirely new. … Surf is a slightly hard app to explain.

  10. Google adds new features to its video editing app Vids, including directing and customizing avatars through text prompts and Veo 3.1 support (Ivan Mehta/TechCrunch)

    Google on Thursday added new features to its video editor app Vids, including directing and customizing avatars through text prompts, Veo 3.1 support …

  11. Google launches Gemma 4, its "most intelligent" open model family, purpose-built for advanced reasoning and agentic workflows, under an Apache 2.0 license (The Keyword)

    Today, we are introducing Gemma 4 — our most intelligent open models to date. Purpose-built for advanced reasoning …

  12. Coinbase says it has won conditional approval from US banking regulators for a national trust company charter, which could let it issue stablecoins and more (Olga Kharif/Bloomberg)

    Coinbase Global Inc., the largest US cryptocurrency exchange, said it has won conditional approval from banking regulators …

  13. Mustafa Suleyman says Microsoft is "not able to build models in the very largest scale yet" but its "computation ramp is coming to enable us to do" it in 2026 (Financial Times)

    Tech giant's AI chief says it will have the resources to build frontier systems later this year

  14. An interview with Mustafa Suleyman on Microsoft's AI reorg, how revising its OpenAI deal "unlocked [Microsoft's] ability to pursue superintelligence", and more (Hayden Field/The Verge)

    Its new transcription model is a step towards those goals, says Microsoft AI's Mustafa Suleyman.

  15. Challenger: US tech sector job cuts rose 24%+ YoY to 18,720 in March, taking Q1's tech total to 52,000+; AI accounted for 25% of layoffs across all industries (Julia Fanzeres/Bloomberg)

    Layoff announcements at technology companies continued to mount in March, leading other industries in overall US job-cut plans …

Solidot(15)

  1. Amazon in talks to acquire Globalstar to challenge Starlink

    Amazon is in talks to acquire Globalstar to help it compete with SpaceX's Starlink broadband satellite constellation. Apple holds a one-fifth stake in Globalstar, so Amazon would also need to negotiate with Apple, complicating any deal, and the talks could still fall apart without an agreement. Globalstar, founded in 1991, reached a market value of $9 billion on Wednesday on the strength of the acquisition rumors. Apple acquired its 20% stake by investing $1.5 billion in Globalstar in 2024.

  2. Lab gloves may shed plastic-like particles that skew measurements

    Researchers found that commonly used nitrile and latex lab gloves shed stearate particles that resemble microplastics, which can inflate estimates in microplastic pollution studies. Gloves inadvertently transfer the particles onto the lab tools used to analyze air, water, and other samples. The researchers recommend cleanroom gloves, which shed far fewer particles. Stearates are soap-like salts added to disposable gloves to help them release from molds during manufacturing; because their chemistry resembles that of some plastics, they are hard to distinguish in lab analysis, raising the risk of false positives in microplastic research.

  3. Anthropic demands removal of over ten thousand Claude Code source copies, citing copyright infringement

    After the Claude Code source code was accidentally leaked, Anthropic has been demanding the removal of more than ten thousand copies on copyright grounds, but the genie is out of the bottle and new copies keep surfacing. Developers analyzing the source have uncovered some of Anthropic's tricks: periodic task review to consolidate memory, a process dubbed "dreaming"; a kind of undercover mode that conceals the agent's identity; and an interactive virtual pet called Buddy. Other developers have rewritten Claude Code with other AI tools and in other programming languages, arguing that doing so does not amount to copyright infringement and can escape takedown.

  4. SpaceX files for IPO

    SpaceX confidentially filed for a listing with the SEC this week, kicking off the largest IPO in history. A confidential filing lets a company advance its listing plans without publicly disclosing financial information. SpaceX aims to raise about $75 billion at a target valuation of roughly $1.75 trillion; in the US, only Nvidia, Apple, Alphabet, Microsoft, and Amazon are worth more. SpaceX could also join the Nasdaq indexes quickly, because the Nasdaq exchange has just revised its inclusion methodology in ways that look tailor-made for SpaceX: the requirement to float at least 10% of shares has been dropped (SpaceX plans to float less than 5%), and a stock can join the Nasdaq-100 after just 15 days of trading. Critics argue the move could distort post-IPO price discovery.

  5. Bitcoin's signature algorithm is easier to break than expected

    Bitcoin's digital signatures use secp256k1, a 256-bit elliptic-curve algorithm previously estimated to require a quantum computer with millions of qubits to break. Researchers at Google Research have published a white paper saying they improved Shor's algorithm to the point where the public key in a Bitcoin address could be broken within 10 minutes. They compiled two quantum circuits for the elliptic-curve discrete logarithm problem: one needs fewer than 1,200 logical qubits and 90 million Toffoli gates, the other fewer than 1,450 logical qubits and 70 million Toffoli gates. Bitcoin's creator Satoshi Nakamoto suggested as early as 2010 that if quantum computers became practical, the Bitcoin software would need to migrate to other algorithms. The Google researchers recommend that the cryptocurrency community move to post-quantum cryptography (PQC) that resists quantum attacks.

  6. USPTO rejects the summoned-companion battle patent granted to Nintendo last year

    In September last year the US Patent and Trademark Office granted Nintendo and The Pokémon Company a controversial patent on the game mechanic of summoning a creature and having it fight for the player, automatically or under manual control. The mechanic has existed for decades and is widely used by game developers: 1990s titles such as Diablo and early Final Fantasy games already let players use skills or spells to summon a character to fight on their behalf. In November, USPTO director John A. Squires ordered an ex parte reexamination of the patent, under which an examiner reassesses whether it should have been granted. Last week the examiner issued a non-final office action rejecting the patent, i.e., revoking the granted claims. Nintendo has two months to respond, and the deadline can be extended further if it files a request.

  7. The Document Foundation revokes Collabora employees' memberships

    Tensions are escalating between The Document Foundation, which stewards LibreOffice, and its main commercial partner Collabora: the foundation has revoked the memberships of Collabora employees who are core developers of the project, raising the prospect of a split like the one that befell its predecessor OpenOffice. The foundation's official blog says the recently approved Community Bylaws require removing members employed by companies in legal disputes with the foundation, citing conflicts of interest, and that it wants to avoid further argument over who bears responsibility. The foundation says it is hiring developers and that its donations are increasing.

  8. Apple removes multiple vibe-coding apps

    Apple recently removed several vibe-coding apps, including Replit, Vibecode, and Anything. Vibe-coding apps let users with little programming knowledge generate apps with large language models; if all goes well, an app can be produced directly on the phone without typing any code on a computer. Apple says the apps violate App Store guideline 2.5.2: "Apps should be self-contained in their bundles, and may not read or write data outside the designated container area, nor may they download, install, or execute code which introduces or changes features or functionality of the app, including other apps."

  9. Child mortality has fallen sharply over three decades but still misses the target

    The UN Sustainable Development Goals call for every country to cut under-five mortality to below 25 deaths per 1,000 live births by 2030. Peking University researchers analyzed annual under-five deaths and mortality rates for 200 countries and regions from 1990 to 2023, then used the trends to project when countries still short of the target would reach it. Over the study period, global under-five deaths fell 63%, from nearly 13 million in 1990 to 4.78 million in 2023, with mortality declining an average of 3.18% per year. The current global rate of 36.72 deaths per 1,000 live births remains well above the target and is not projected to meet it until 2035. 133 countries have already met the target and another 9 are on track to do so by 2030; 58 countries will miss the deadline, 25 of them not projected to comply until after 2050, and Dominica's under-five deaths are actually rising. More than four-fifths of global under-five deaths are concentrated in two regions: sub-Saharan Africa, where mortality is still 68.82 per 1,000 live births and the target is not expected to be met until 2055, and Central and South Asia.

  10. Linux share among Steam users reaches 5.33%

    Valve's Steam hardware and software survey for March 2026 puts Linux at 5.33% of players' operating systems. As in February 2026, the statistics may again be anomalous: Linux stood at just 2.23% in February, and more than doubling within a month is improbable. Windows fell to 92.33% and OSX stands at 2.35%. Linux had never before exceeded 5%, and had topped 3% only once. February's simplified-Chinese user share was itself anomalous at over 50%; in March Valve appears to have corrected the error, with simplified-Chinese users down 31.85 percentage points to 22.75% and English users up 16.82 points to 39.09%. Intel CPUs account for 55.82% of users and AMD for 44.17%.

  11. NASA launches the Artemis II crewed lunar flyby

    NASA launched an SLS rocket from Kennedy Space Center in Florida at 6:35 p.m. Eastern time on April 1 on the Artemis II crewed lunar flyby mission. The last US crewed mission to the Moon was Apollo 17 in December 1972. The Artemis program was formally announced in 2017, and the uncrewed Artemis I flyby flew in 2022. Artemis II is expected to last 10 days, carrying NASA astronauts Reid Wiseman, Victor Glover, and Christina Koch, plus Canadian CSA astronaut Jeremy Hansen. One goal is to validate life-support systems with a crew aboard, in preparation for building a lunar base. Artemis III, expected in 2027, will again be a crewed flyby focused mainly on validating technology; Artemis IV, in 2028, is planned as a crewed landing.

  12. European countries rush to embrace green tech and EVs

    With the blockade of the Strait of Hormuz driving up oil and gas prices worldwide, several European countries have turned to green technology and bought more electric vehicles. In the first three weeks of March, UK heat pump sales rose 51% over the same period a month earlier, solar sales rose 54%, and EV charger sales rose 20%. EV sales at French online used-car retailer Aramisauto nearly doubled between mid-February and March 9. Amsterdam-based used-car marketplace Olx reports surging customer inquiries for EVs on its platforms in France, Romania, Portugal, and Poland, and on Finn.no, Norway's largest used-car marketplace, EV sales have overtaken diesel.

  13. Multiple Baidu robotaxis break down simultaneously

    Baidu's robotaxi service Luobo Kuaipao (Apollo Go) operates in Wuhan; around 8 p.m. on Tuesday, March 31, its driverless taxis stalled en masse. Photos and videos circulating widely on social media show stricken robotaxis stopped not only at roadsides but in the middle of roads and even on elevated expressways, with some passengers trapped inside for over an hour. Wuhan traffic police preliminarily attributed the incident to a system failure and said no one was injured and all passengers got out safely. It is unclear how many of Baidu's robotaxis were affected; social media imagery shows the suddenly stopped vehicles caused at least several rear-end collisions, and one Wuhan netizen reported seeing at least a dozen stalled robotaxis. Baidu has not yet explained the incident.

  14. Sweden returns to traditional paper-based classroom teaching

    The problems that digital education and social media pose for children and teenagers have drawn growing attention and debate in recent years. Like many countries, Sweden spent the past few decades phasing out paper books in favor of tablets and digital resources in an effort to prepare students for an online world. The controversy over digital education ultimately led Sweden to announce in 2023 a return to traditional paper-based classroom teaching: paper books are back in classrooms, and students are once again learning by writing on paper with pencil or pen. The government also plans a nationwide ban on phones in schools, marking a major shift in Sweden's educational model. Swedish officials stress that schools will not abandon digital technology entirely; digital aids will mainly be used to support older students.

  15. Neanderthals survived on the brink of extinction for 350,000 years

    From 400,000 to 45,000 years ago, Neanderthals alone occupied much of Eurasia, hunting large game, gathering plants, skillfully knapping stone tools, and making clothing from hides. Yet their hold on survival was precarious. Two new studies show that Neanderthals lived in small, geographically far-flung groups, endured severe inbreeding, and came close to extinction 75,000 years ago. Inbreeding is widely considered harmful to adapting to environmental change, but if the environment stays stable long enough, inbred populations can persist for a long time. The researchers report that 75,000 years ago Neanderthal sites and skeletal remains were spread widely across the European continent and their genomes were relatively diverse. Sites from the glacial period of 75,000 to 65,000 years ago grow scarcer, however, and by 60,000 years ago all that genetic diversity had collapsed into a single lineage. When the climate swung again 45,000 years ago, compounded by the arrival of modern humans in Eurasia, the effective Neanderthal population plunged within three thousand years, bottoming out around 42,000 years ago before they vanished entirely.