OrangeBot.AI Digest — 2026-03-01
88 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- How to talk to anyone, and why you should (www.theguardian.com)
- Microgpt explained interactively (growingswe.com)
- Operational issue – Multiple services (UAE) (health.aws.amazon.com)
- When does MCP make sense vs CLI? (ejholmes.github.io)
- New iron nanomaterial wipes out cancer cells without harming healthy tissue (www.sciencedaily.com)
- Why XML tags are so fundamental to Claude (glthr.com)
- AI is making junior devs useless (beabetterdev.com)
- Ape Coding [fiction] (rsaksida.com)
- AI Made Writing Code Easier. It Made Being an Engineer Harder (www.ivanturkovic.com)
- Ghostty – Terminal Emulator (ghostty.org)
- Flightradar24 for Ships (atlas.flexport.com)
- I built a demo of what AI chat will look like when it's "free" and ad-supported (99helpers.com)
- Decision trees – the unreasonable power of nested decision rules (mlu-explain.github.io)
- An ode to houseplant programming (2025) (hannahilea.com)
- Switch to Claude without starting over (claude.com)
GitHub Trending(13)
Product Hunt(15)
- Simplora 2.0
The agentic meeting stack with free prep, notes, and chat
- Voicr
Your voice in, polished text out — in seconds
- Epismo Skills
Everything your agent needs to run reliably
- Octrafic
Test your APIs in plain English, straight from the terminal
- BU
Openclaw in the cloud
- Claude Import Memory
Switch from ChatGPT to Claude with import memory feature
- Notra
Turn your daily work into publish-ready content
- Hearica
Turn all computer audio into captions for the deaf
- OpenFang
Open-Source Agent Operating System
- OpenAI WebSocket Mode for Responses API
Persistent AI agents. Up to 40% faster.
- Study OS
A minimalist focus timer with tasks, notes & study music
- The Claw News
OpenClaw agents publishing daily news
- Solace
Your Mac's appearance, in tune with the world around you.
- Producer AI by Google Labs
Turn ideas into tracks with your AI co-producer
- Pixel
Scale performance ads without juggling 7 ad platforms
Hugging Face(15)
- The Trinity of Consistency as a Defining Principle for General World Models
The construction of World Models capable of learning, simulating, and reasoning about objective physical laws constitutes a foundational challenge in the pursuit of Artificial General Intelligence. Recent advancements represented by video generation models like Sora have demonstrated the potential of data-driven scaling laws to approximate physical dynamics, while the emerging Unified Multimodal Model (UMM) offers a promising architectural paradigm for integrating perception, language, and reasoning. Despite these advances, the field still lacks a principled theoretical framework that defines the essential properties requisite for a General World Model. In this paper, we propose that a World Model must be grounded in the Trinity of Consistency: Modal Consistency as the semantic interface, Spatial Consistency as the geometric basis, and Temporal Consistency as the causal engine. Through this tripartite lens, we systematically review the evolution of multimodal learning, revealing a trajectory from loosely coupled specialized modules toward unified architectures that enable the synergistic emergence of internal world simulators. To complement this conceptual framework, we introduce CoW-Bench, a benchmark centered on multi-frame reasoning and generation scenarios. CoW-Bench evaluates both video generation models and UMMs under a unified evaluation protocol. Our work establishes a principled pathway toward general world models, clarifying both the limitations of current systems and the architectural requirements for future progress.
- From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
As Large Multimodal Models (LMMs) scale up and reinforcement learning (RL) methods mature, LMMs have made notable progress in complex reasoning and decision making. Yet training still relies on static data and fixed recipes, making it difficult to diagnose capability blind spots or provide dynamic, targeted reinforcement. Motivated by findings that test-driven error exposure and feedback-based correction outperform repetitive practice, we propose Diagnostic-driven Progressive Evolution (DPE), a spiral loop where diagnosis steers data generation and reinforcement, and each iteration re-diagnoses the updated model to drive the next round of targeted improvement. DPE has two key components. First, multiple agents annotate and quality-control massive unlabeled multimodal data, using tools such as web search and image editing to produce diverse, realistic samples. Second, DPE attributes failures to specific weaknesses, dynamically adjusts the data mixture, and guides agents to generate weakness-focused data for targeted reinforcement. Experiments on Qwen3-VL-8B-Instruct and Qwen2.5-VL-7B-Instruct show stable, continual gains across eleven benchmarks, indicating DPE as a scalable paradigm for continual LMM training under open task distributions. Our code, models, and data are publicly available at https://github.com/hongruijia/DPE.
- MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making. However, systematic evaluation in real-world mobility settings is hindered by diverse routing demands, non-deterministic mapping services, and limited reproducibility. In this study, we introduce MobilityBench, a scalable benchmark for evaluating LLM-based route-planning agents in real-world mobility scenarios. MobilityBench is constructed from large-scale, anonymized real user queries collected from Amap and covers a broad spectrum of route-planning intents across multiple cities worldwide. To enable reproducible, end-to-end evaluation, we design a deterministic API-replay sandbox that eliminates environmental variance from live services. We further propose a multi-dimensional evaluation protocol centered on outcome validity, complemented by assessments of instruction understanding, planning, tool use, and efficiency. Using MobilityBench, we evaluate multiple LLM-based route-planning agents across diverse real-world mobility scenarios and provide an in-depth analysis of their behaviors and performance. Our findings reveal that current models perform competently on Basic information retrieval and Route Planning tasks, yet struggle considerably with Preference-Constrained Route Planning, underscoring significant room for improvement in personalized mobility applications. We publicly release the benchmark data, evaluation toolkit, and documentation at https://github.com/AMAP-ML/MobilityBench .
- OmniGAIA: Towards Native Omni-Modal AI Agents
Human intelligence naturally intertwines omni-modal perception -- spanning vision, audio, and language -- with complex reasoning and tool usage to interact with the world. However, current multi-modal LLMs are primarily confined to bi-modal interactions (e.g., vision-language), lacking the unified cognitive capabilities required for general AI assistants. To bridge this gap, we introduce OmniGAIA, a comprehensive benchmark designed to evaluate omni-modal agents on tasks necessitating deep reasoning and multi-turn tool execution across video, audio, and image modalities. Constructed via a novel omni-modal event graph approach, OmniGAIA synthesizes complex, multi-hop queries derived from real-world data that require cross-modal reasoning and external tool integration. Furthermore, we propose OmniAtlas, a native omni-modal foundation agent under tool-integrated reasoning paradigm with active omni-modal perception. Trained on trajectories synthesized via a hindsight-guided tree exploration strategy and OmniDPO for fine-grained error correction, OmniAtlas effectively enhances the tool-use capabilities of existing open-source models. This work marks a step towards next-generation native omni-modal AI assistants for real-world scenarios.
- Imagination Helps Visual Reasoning, But Not Yet in Latent Space
Latent visual reasoning aims to mimic the human imagination process by meditating through hidden states of Multimodal Large Language Models. While recognized as a promising paradigm for visual reasoning, the underlying mechanisms driving its effectiveness remain unclear. Motivated to demystify the true source of its efficacy, we investigate the validity of latent reasoning using Causal Mediation Analysis. We model the process as a causal chain: the input as the treatment, the latent tokens as the mediator, and the final answer as the outcome. Our findings uncover two critical disconnections: (a) Input-Latent Disconnect: dramatic perturbations on the input result in negligible changes to the latent tokens, suggesting that latent tokens do not effectively attend to the input sequence. (b) Latent-Answer Disconnect: perturbations on the latent tokens yield minimal impact on the final answer, indicating that latent tokens have only a limited causal effect on the outcome. Furthermore, extensive probing analysis reveals that latent tokens encode limited visual information and exhibit high similarity. Consequently, we challenge the necessity of latent reasoning and propose a straightforward alternative named CapImagine, which teaches the model to explicitly imagine using text. Experiments on vision-centric benchmarks show that CapImagine significantly outperforms complex latent-space baselines, highlighting the superior potential of visual reasoning through explicit imagination.
- Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
Exploration remains the key bottleneck for large language model agents trained with reinforcement learning. While prior methods exploit pretrained knowledge, they fail in environments requiring the discovery of novel states. We propose Exploratory Memory-Augmented On- and Off-Policy Optimization (EMPO^2), a hybrid RL framework that leverages memory for exploration and combines on- and off-policy updates to make LLMs perform well with memory while also ensuring robustness without it. On ScienceWorld and WebShop, EMPO^2 achieves 128.6% and 11.3% improvements over GRPO, respectively. Moreover, in out-of-distribution tests, EMPO^2 demonstrates superior adaptability to new tasks, requiring only a few trials with memory and no parameter updates. These results highlight EMPO^2 as a promising framework for building more exploratory and generalizable LLM-based agents.
- AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning
While Multi-Agent Systems (MAS) excel in complex reasoning, they suffer from the cascading impact of erroneous information generated by individual participants. Current solutions often resort to rigid structural engineering or expensive fine-tuning, limiting their deployability and adaptability. We propose AgentDropoutV2, a test-time rectify-or-reject pruning framework designed to dynamically optimize MAS information flow without retraining. Our approach acts as an active firewall, intercepting agent outputs and employing a retrieval-augmented rectifier to iteratively correct errors based on a failure-driven indicator pool. This mechanism allows for the precise identification of potential errors using distilled failure patterns as prior knowledge. Irreparable outputs are subsequently pruned to prevent error propagation, while a fallback strategy preserves system integrity. Empirical results on extensive math benchmarks show that AgentDropoutV2 significantly boosts the MAS's task performance, achieving an average accuracy gain of 6.3 percentage points on math benchmarks. Furthermore, the system exhibits robust generalization and adaptivity, dynamically modulating rectification efforts based on task difficulty while leveraging context-aware indicators to resolve a wide spectrum of error patterns. Our code and dataset are released at https://github.com/TonySY2/AgentDropoutV2.
- Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization
Recent deep research agents primarily improve performance by scaling reasoning depth, but this leads to high inference cost and latency in search-intensive scenarios. Moreover, generalization across heterogeneous research settings remains challenging. In this work, we propose Search More, Think Less (SMTL), a framework for long-horizon agentic search that targets both efficiency and generalization. SMTL replaces sequential reasoning with parallel evidence acquisition, enabling efficient context management under constrained context budgets. To support generalization across task types, we further introduce a unified data synthesis pipeline that constructs search tasks spanning both deterministic question answering and open-ended research scenarios with task-appropriate evaluation metrics. We train an end-to-end agent using supervised fine-tuning and reinforcement learning, achieving strong and often state-of-the-art performance across benchmarks including BrowseComp (48.6%), GAIA (75.7%), Xbench (82.0%), and DeepResearch Bench (45.9%). Compared to Mirothinker-v1.0, SMTL with a maximum of 100 interaction steps reduces the average number of reasoning steps on BrowseComp by 70.7%, while improving accuracy.
- MediX-R1: Open Ended Medical Reinforcement Learning
We introduce MediX-R1, an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes a baseline vision-language backbone with Group Based RL and a composite reward tailored for medical reasoning: an LLM-based accuracy reward that judges semantic correctness with a strict YES/NO decision, a medical embedding-based semantic reward to capture paraphrases and terminology variants, and lightweight format and modality rewards that enforce interpretable reasoning and modality recognition. This multi-signal design provides stable, informative feedback for open-ended outputs where traditional verifiable or MCQ-only rewards fall short. To measure progress, we propose a unified evaluation framework for both text-only and image+text tasks that uses a Reference-based LLM-as-judge in place of brittle string-overlap metrics, capturing semantic correctness, reasoning, and contextual alignment. Despite using only ~51K instruction examples, MediX-R1 achieves excellent results across standard medical LLM (text-only) and VLM (image + text) benchmarks, outperforming strong open-source baselines and delivering particularly large gains on open-ended clinical tasks. Our results demonstrate that open-ended RL with comprehensive reward signals and LLM-based evaluation is a practical path toward reliable medical reasoning in multimodal models. Our trained models, curated datasets and source code are available at https://medix.cvmbzuai.com
- VGG-T^3: Offline Feed-Forward 3D Reconstruction at Scale
We present a scalable 3D reconstruction model that addresses a critical limitation in offline feed-forward methods: their computational and memory requirements grow quadratically w.r.t. the number of input images. Our approach is built on the key insight that this bottleneck stems from the varying-length Key-Value (KV) space representation of scene geometry, which we distill into a fixed-size Multi-Layer Perceptron (MLP) via test-time training. VGG-T^3 (Visual Geometry Grounded Test Time Training) scales linearly w.r.t. the number of input views, similar to online models, and reconstructs a 1k image collection in just 54 seconds, achieving an 11.6× speed-up over baselines that rely on softmax attention. Since our method retains global scene aggregation capability, our point map reconstruction error outperforms other linear-time methods by large margins. Finally, we demonstrate visual localization capabilities of our model by querying the scene representation with unseen images.
- Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling
Diffusion models have achieved remarkable progress in high-fidelity image, video, and audio generation, yet inference remains computationally expensive. Nevertheless, current diffusion acceleration methods based on distributed parallelism suffer from noticeable generation artifacts and fail to achieve substantial acceleration proportional to the number of GPUs. Therefore, we propose a hybrid parallelism framework that combines a novel data parallel strategy, condition-based partitioning, with an optimal pipeline scheduling method, adaptive parallelism switching, to reduce generation latency and achieve high generation quality in conditional diffusion models. The key ideas are to (i) leverage the conditional and unconditional denoising paths as a new data-partitioning perspective and (ii) adaptively enable optimal pipeline parallelism according to the denoising discrepancy between these two paths. Our framework achieves 2.31× and 2.07× latency reductions on SDXL and SD3, respectively, using two NVIDIA RTX 3090 GPUs, while preserving image quality. This result confirms the generality of our approach across U-Net-based diffusion models and DiT-based flow-matching architectures. Our approach also outperforms existing methods in acceleration under high-resolution synthesis settings. Code is available at https://github.com/kaist-dmlab/Hybridiff.
- General Agent Evaluation
The promise of general-purpose agents - systems that perform tasks in unfamiliar environments without domain-specific engineering - remains largely unrealized. Existing agents are predominantly specialized, and while emerging implementations like OpenAI SDK Agent and Claude Code hint at broader capabilities, no systematic evaluation of their general performance has been pursued. Current agentic benchmarks assume domain-specific integration, encoding task information in ways that preclude fair evaluation of general agents. This paper frames general-agent evaluation as a first-class research objective. We propose conceptual principles for such evaluation, a Unified Protocol enabling agent-benchmark integration, and Exgentic - a practical framework for general agent evaluation. We benchmark five prominent agent implementations across six environments as the first Open General Agent Leaderboard. Our experiments show that general agents generalize across diverse environments, achieving performance comparable to domain-specific agents without any environment-specific tuning. We release our evaluation protocol, framework, and leaderboard to establish a foundation for systematic research on general-purpose agents.
- EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents
Human behaviors in the real world naturally encode rich, long-term contextual information that can be leveraged to train embodied agents for perception, understanding, and acting. However, existing capture systems typically rely on costly studio setups and wearable devices, limiting the large-scale collection of scene-conditioned human motion data in the wild. To address this, we propose EmbodMocap, a portable and affordable data collection pipeline using two moving iPhones. Our key idea is to jointly calibrate dual RGB-D sequences to reconstruct both humans and scenes within a unified metric world coordinate frame. The proposed method allows metric-scale and scene-consistent capture in everyday environments without static cameras or markers, bridging human motion and scene geometry seamlessly. Compared with optical capture ground truth, we demonstrate that the dual-view setting exhibits a remarkable ability to mitigate depth ambiguity, achieving superior alignment and reconstruction performance over a single iPhone or monocular models. Based on the collected data, we empower three embodied AI tasks: monocular human-scene-reconstruction, where we fine-tune on feedforward models that output metric-scale, world-space aligned humans and scenes; physics-based character animation, where we prove our data could be used to scale human-object interaction skills and scene-aware motion tracking; and robot motion control, where we train a humanoid robot via sim-to-real RL to replicate human motions depicted in videos. Experimental results validate the effectiveness of our pipeline and its contributions towards advancing embodied AI research.
- AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games
Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this era of rapid technological advance. Conventional AI benchmarks typically assess only narrow capabilities in a limited range of human activity. Most are also static, quickly saturating as developers explicitly or implicitly optimize for them. We propose that a more promising way to evaluate human-like general intelligence in AI systems is through a particularly strong form of general game playing: studying how and how well they play and learn to play all conceivable human games, in comparison to human players with the same level of experience, time, or other resources. We define a "human game" to be a game designed by humans for humans, and argue for the evaluative suitability of this space of all such games people can imagine and enjoy -- the "Multiverse of Human Games". Taking a first step towards this vision, we introduce the AI GameStore, a scalable and open-ended platform that uses LLMs with humans-in-the-loop to synthesize new representative human games, by automatically sourcing and adapting standardized and containerized variants of game environments from popular human digital gaming platforms. As a proof of concept, we generated 100 such games based on the top charts of Apple App Store and Steam, and evaluated seven frontier vision-language models (VLMs) on short episodes of play. The best models achieved less than 10% of the human average score on the majority of the games, and especially struggled with games that challenge world-model learning, memory and planning. We conclude with a set of next steps for building out the AI GameStore as a practical way to measure and drive progress toward human-like general intelligence in machines.
- Causal Motion Diffusion Models for Autoregressive Motion Generation
Recent advances in motion diffusion models have substantially improved the realism of human motion synthesis. However, existing approaches either rely on full-sequence diffusion models with bidirectional generation, which limits temporal causality and real-time applicability, or autoregressive models that suffer from instability and cumulative errors. In this work, we present Causal Motion Diffusion Models (CMDM), a unified framework for autoregressive motion generation based on a causal diffusion transformer that operates in a semantically aligned latent space. CMDM builds upon a Motion-Language-Aligned Causal VAE (MAC-VAE), which encodes motion sequences into temporally causal latent representations. On top of this latent representation, an autoregressive diffusion transformer is trained using causal diffusion forcing to perform temporally ordered denoising across motion frames. To achieve fast inference, we introduce a frame-wise sampling schedule with causal uncertainty, where each subsequent frame is predicted from partially denoised previous frames. The resulting framework supports high-quality text-to-motion generation, streaming synthesis, and long-horizon motion generation at interactive rates. Experiments on HumanML3D and SnapMoGen demonstrate that CMDM outperforms existing diffusion and autoregressive models in both semantic fidelity and temporal smoothness, while substantially reducing inference latency.
Techmeme(15)
- Australia's eSafety Commissioner threatens action against app stores and search engines if AI services operating in Australia don't verify user ages by March 9 (Byron Kaye/Reuters)
Australia's internet regulator said it may push search engines and app stores to block artificial intelligence services that fail …
- Israel-based Guidde, which is developing a platform to accelerate the adoption of AI in organizations, raised a $50M Series B round led by PSG Equity (Meir Orbach/CTech)
The Israeli startup's platform turns employee workflows into structured knowledge for automation. Guidde, a startup developing …
- Sources describe in detail the failed talks between Anthropic and DOD, and how officials at agencies, including the CIA, still hope for a peace agreement (New York Times)
The Pentagon and Anthropic were close to agreeing on the use of artificial intelligence. But strong personalities, mutual dislike and a rival company unraveled a deal.
- Chinese matchmaking apps like Wanmei Qinjia, which has 50M users and lets parents look for spouses for their children, surge as marriage rates continue to fall (Kohei Fujimura/Nikkei Asia)
DALIAN, China: Apps that enable parents to search for spouses for their unmarried children have become increasingly popular in China …
- Source describes the failed Pentagon-Anthropic talks: through the end, the Pentagon wanted to use Anthropic's AI to analyze bulk data collected about Americans (Ross Andersen/The Atlantic)
Right up until the moment that Pete Hegseth moved to terminate the government's relationship with the AI company Anthropic …
- Nvidia partners with Cisco, Nokia, and others to build 6G networks based on open, software-defined AI radio access networking (AI-RAN) architecture (Kyt Dotson/SiliconANGLE)
Nvidia Corp. early Sunday announced ahead of the MWC Barcelona conference that it's joining global telecom leaders in a commitment to build 6G …
- A look at Hyundai's Atlas humanoid robot, slated for assembly tasks in 2028; Hyundai has invested billions in robotics since acquiring Boston Dynamics in 2021 (Hyonhee Shin/Bloomberg)
As Hyundai Motor Co. Executive Chair Euisun Chung took the stage at CES 2022 accompanied by “Spot”, a robot dog …
- How some companies are trying to become the "Strava of tennis" by offering match video, stats, highlights, social features, and performance analysis for players (Charlie Eccleshare/The Athletic)
More people are playing tennis in America than ever before.
- Inside the world's largest crypto casino Stake, which claims ~4% of all BTC transactions, with popularity boosted by celebs like Drake and influencers on Kick (Bloomberg)
Drake just needed some juice. In 82 minutes of online slots play, the Canadian rapper's starting balance of $3.5 million worth of Bitcoin had dwindled to $422,355.
- [Thread] In an AMA, Sam Altman says DOD blacklisting Anthropic sets an "extremely scary precedent", OpenAI rushed its deal to "de-escalate things", and more (Sam Altman/@sama)
I'd like to answer questions about our work with the DoW and our thinking over the past few days. Please AMA.
- Netflix actually won by walking away from the WBD bid, collecting a $2.8B termination fee and driving up the price and debt load of the Paramount-WBD merger (Dan Gallagher/Wall Street Journal)
King of streaming preserves its business model, while Paramount will have to deal with a massive debt load
- Multiple AWS developers say they are asked to take on new roles with AI tools' assistance, and engineers are now required to complete technical writing tasks (Financial Times)
Drive for ‘leaner’ operations piles pressure on employees but could be a playbook for rivals. Amazon's HR chief last month sought …
- Block's plan to lay off over 4,000 employees, citing AI work automation, adds to growing angst among white-collar workers over AI's potential for job disruption (Chip Cutter/Wall Street Journal)
Layoffs at Block add to growing backlash bubbling up across American companies; ‘It brings out the pitchforks’
- Sources: the Pentagon used Claude in its major air attack in Iran, hours after Trump declared that the federal government will end its use of Anthropic's tools (Wall Street Journal)
Within hours of declaring that the federal government will end its use of artificial-intelligence tools made by tech company Anthropic …
- Polymarket trades on contracts tied to strikes on Iran hit $529M, and six new accounts profited a total of $1M by betting on the US to strike Iran by Feb. 28 (Emily Nicolle/Bloomberg)
As US and Israeli bombs fell on Iran this weekend, bettors on Polymarket (where $529 million was traded on contracts tied to the timing of the strikes) were cashing in.
Solidot(15)
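- California and Colorado plan to require user age verification at the operating system level
Several US states require adult websites to verify visitors' ages, but common verification methods such as face scans or ID documents risk leaking private data. California and Colorado plan to require age verification at the operating system level, with the result shared with apps through an API. California passed AB 1043 last year, requiring operating system developers to create a way for device owners to register their age bracket; the law takes effect on January 1, 2027. Colorado lawmakers have introduced a similar bill, SB26-051. Its co-sponsor, Senator Matt Ball, said the aim is to give children thorough online safety protection through a privacy-focused age verification framework.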
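- Anthropic's Claude rises to No. 1 on Apple's US free apps chart
On Friday, US President Trump ordered federal agencies to immediately stop using Anthropic's Claude assistant because Anthropic held to its position on safety principles. By contrast, its competitor OpenAI appeared to take no stand at all, which set off a wave of US users uninstalling OpenAI's ChatGPT and installing Claude. The trend pushed Claude to the top of the Apple App Store's US free apps chart on Saturday, ahead of ChatGPT in second place, with Google's Gemini in fourth. According to analytics firm Sensor Tower, Claude ranked 131st on January 30, a month earlier, and hovered around the top 20 for most of February, while ChatGPT was usually first. Anthropic also released a memory import feature to make it easier for ChatGPT users to switch to Claude.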
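- Dogs respond like 2-year-olds when you need help, but cats just watch
According to a study published in the journal Animal Behaviour, Hungarian researchers compared how 18- to 24-month-old toddlers, pet dogs, and pet cats respond when a person needs help. The results show that dogs' spontaneous prosocial behavior resembles that of toddlers, while cats look on indifferently. In the experiment, a familiar person such as a parent or owner pretended to search for a hidden object, and dogs and toddlers helped in three-quarters of cases. Cats got involved and helped only when doing so served their own interests.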
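- Croatia declares mine clearance complete 31 years after the war ended
The Croatian war of 1991 to 1995 involved widespread use of landmines and left more than 1,000 square kilometers of minefields. Thirty-one years after the war ended, Croatian Interior Minister Davor Božinović announced that all known minefields have been cleared. During the clearance effort 208 people died, including 41 deminers, and the total cost was about 1.2 billion euros. He said nearly 107,000 mines and 407,000 pieces of unexploded ordnance were removed.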
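- Metacritic will not list AI-generated reviews
Metacritic is a site that aggregates reviews and scores for films, TV shows, music albums, and games, and computes a weighted average from those scores. Founded in 2001, it is now 25 years old, and its aggregate scores are widely recognized. Co-founder Marc Doyle said in a statement that Metacritic will not list AI-generated reviews. The statement follows the case of Videogamer.com, a long-running and once popular British gaming site that laid off most of its staff after being acquired by a betting company and then used AI to generate reviews and scores for new games. The portraits of the site's review authors were generated with ChatGPT. Metacritic removed the site's reviews after learning of this.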
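- NASA announces that the Artemis III mission will still fly around the Moon rather than land on it
NASA announced changes to the Artemis crewed lunar landing program: the Artemis II crewed circumlunar mission, originally scheduled to launch on March 6 this year, is delayed to April 1 at the earliest, and Artemis III, previously a crewed landing mission, will no longer land on the Moon and will instead also be a circumlunar flight. NASA Administrator Jared Isaacman said the more incremental new plan lets NASA teams test flights and refine the technology.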
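- OpenAI reaches a deal with the Pentagon, and users rush to cancel ChatGPT subscriptions
Hours after Trump announced the ban on Anthropic's AI technology, Sam Altman, CEO of its competitor OpenAI, announced an agreement with the Pentagon to deploy OpenAI's models on the Pentagon's classified network. Anthropic split with the Pentagon over safety principles, and OpenAI is clearly more willing to compromise on them. The move immediately drew controversy and criticism from users: on Reddit and Hacker News, users called for canceling ChatGPT subscriptions, deleting the app, and switching to Anthropic's Claude.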
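- Iran cuts off the internet nationwide again
According to monitoring by Netblocks, Iran is again close to a total internet blackout. The shutdown is likely an information-control measure by the Iranian government in response to the ongoing war rather than the result of war damage to communications infrastructure. Israel and the United States have just launched "preemptive" airstrikes on Iran, hitting five major cities: the capital Tehran, Isfahan, Qom, Karaj, and Kermanshah.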
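- The Moon briefly had a magnetic field stronger than Earth's
Scientists have long debated whether the Moon's early magnetic field was strong or weak. By analyzing samples brought back by the Apollo missions, University of Oxford scientists found that the Moon once had an extremely strong magnetic field, at times even stronger than Earth's. These strong-field episodes were extremely brief, however, and were fleeting exceptions rather than the norm. For most of its history, the Moon's magnetic field was actually weak. The new study proposes that titanium-rich material deep inside the Moon melted under high temperatures and generated a strong magnetic field for a very short time. The researchers found that titanium content in lunar samples correlates closely with magnetic field strength: titanium-rich rocks tend to record strong fields, while samples with less than 6% titanium record weak ones.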
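- Dan Simmons dies of a stroke at 77
Dan Simmons, the renowned science fiction, fantasy, and horror novelist, died of a stroke on February 21 at the age of 77. Born in Illinois, Simmons earned a bachelor's degree in English and a master's degree in education and began his career working in elementary schools. He published his first book, Song of Kali, in 1985 and continued in elementary education until 1989, the year he published Hyperion, hailed as one of the greatest science fiction novels. Hyperion tells the story of seven pilgrims traveling to the Time Tombs: a consul, a priest, a soldier, a poet, a templar, a scholar, and a detective. The book won him the Hugo Award and the Locus Award. Simmons later published three sequels, The Fall of Hyperion, Endymion, and The Rise of Endymion; the four books make up the Hyperion Cantos series. His other works include The Terror, a historical novel centered on the British explorer John Franklin.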
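- South Korea allows Google to use high-precision map data
South Korea has reversed a 20-year-old policy restricting the export of high-precision map data, clearing the way for Google Maps to work properly in the country. South Korea is still technically at war with North Korea, and the export ban is tied to that status. It rejected Google's requests to export map data in 2007 and in 2016, citing concerns that information about sensitive military and security facilities could leak. Going forward, South Korea requires Google to use specially processed images when providing satellite and aerial imagery of Korea, and to obscure military and security facilities that appear in previously published time-series imagery and Street View. Google must process the raw data on servers located in South Korea and operated by Korean partner companies, may transfer the relevant data abroad only after government review, and may use that data only for navigation and map services.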
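- Trump orders federal agencies to immediately stop using Anthropic's AI technology
Trump ordered all federal agencies to immediately stop using Anthropic's AI technology. The order follows a dispute between Anthropic and the US Department of Defense over safety restrictions on military use. Anthropic signed a $200 million contract with the Pentagon last July and included restrictive clauses that prohibit using its Claude models for mass surveillance of US citizens and prohibit using the models to decide targeting in military missions without human oversight. Anthropic argues that AI hallucinations are unavoidable, so the models may misjudge situations and cause unintended escalation. The Pentagon, however, demanded unrestricted use of Claude, and Defense Secretary Pete Hegseth set 5 p.m. on Friday as the deadline for dropping the safety restrictions. Trump wrote: "THE UNITED STATES OF AMERICA WILL NEVER ALLOW A RADICAL LEFT, WOKE COMPANY TO DICTATE HOW OUR GREAT MILITARY FIGHTS AND WINS WARS!" After Trump issued the order, Pete Hegseth designated Anthropic a supply-chain risk to national security.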
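- People who stress you out also raise your biological age
According to a study published in the journal PNAS, people who often create problems or make life harder for those around them also accelerate the biological aging of those people. The researchers call such a person a "hassler." Each additional hassler speeds up biological aging in the people around them by about 1.5%, equivalent to being about 9 months older than their peers. The study used DNA-methylation epigenetic clocks and social network data from 2,345 adults aged 18 to 103 in the US state of Indiana, 29% of whom reported at least one hassler in their social network. Hasslers among family members showed the strongest association with accelerated aging, while hasslers who were spouses had no significant effect. Beyond accelerated aging, hasslers were also linked to more severe depression and anxiety, higher body mass index (BMI), higher inflammation levels, and a higher risk of multiple diseases. The hassler effect on biological aging is equivalent to 13% to 17% of the effect of smoking.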
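- The memory shortage could kill the entry-level PC
Market research firm IDC forecasts that smartphone shipments will fall 12.9% in 2026 because of the memory shortage. Another firm, Gartner, forecasts that 2026 smartphone shipments will fall 8% and PC shipments will fall more than 10%. Memory prices have doubled or even quadrupled since the end of last year, and Gartner expects DRAM and NAND prices to rise another 130% by the end of 2026. Gartner research director Ranjit Atwal said entry-level PCs will cease to exist because PC makers can no longer offer them to price-conscious consumers. Memory prices are rising so fast that vendors have lost the ability to offer entry-level PCs priced below $500. AI PCs, meanwhile, lack a killer app and fail to attract consumers. With the cost of key components rising quickly, Gartner expects more business and home users to extend the life of their existing PCs and postpone upgrades. The average lifespan of business PCs is expected to lengthen by 15%, and that of consumer PCs by 20%. Anyone considering replacing a PC should buy now, because prices will only keep rising, likely until at least the end of next year.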
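- Bird populations in North America's agricultural regions are declining at an accelerating rate
According to a study published in the journal Science that used data from the North American Breeding Bird Survey, 122 of the 261 bird species the researchers analyzed (47%) declined significantly in number between 1987 and 2021, and about a quarter (63 species) declined at an accelerating rate. The species with accelerating declines are concentrated in areas of highly intensive agriculture. Agricultural intensification may be the driver behind this trend, possibly because large-scale pesticide spraying reduces insect populations.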