OrangeBot.AI Digest — 2026-02-10

59 headlines from 4 sources, aggregated for the day.

Hacker News (15)

  1. Google handed ICE student journalist's bank and credit card numbers (theintercept.com)
  2. The Singularity will occur on a Tuesday (campedersen.com)
  3. Ex-GitHub CEO launches a new developer platform for AI agents (entire.io)
  4. Parse, Don't Validate (2019) (lexi-lambda.github.io)
  5. I started programming when I was 7. I'm 50 now and the thing I loved has changed (www.jamesdrandall.com)
  6. Europe's $24T Breakup with Visa and Mastercard Has Begun (europeanbusinessmagazine.com)
  7. The US is flirting with its first-ever population decline (www.bloomberg.com)
  8. Vercel's CEO offers to cover expenses of 'Jmail' (www.threads.com)
  9. Oxide raises $200M Series C (oxide.computer)
  10. Jury told that Meta, Google 'engineered addiction' at landmark US trial (techxplore.com)
  11. Clean-room implementation of Half-Life 2 on the Quake 1 engine (code.idtech.space)
  12. AI doesn’t reduce work, it intensifies it (simonwillison.net)
  13. Qwen-Image-2.0: Professional infographics, exquisite photorealism (qwen.ai)
  14. Zulip.com Values (zulip.com)
  15. MIT Technology Review has confirmed that posts on Moltbook were fake (www.technologyreview.com)

GitHub Trending (14)

  1. google / langextract

    A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

  2. iOfficeAI / AionUi

    Free, local, open-source 24/7 Cowork and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you like it!

  3. KeygraphHQ / shannon

    Fully autonomous AI hacker to find actual exploits in your web apps. Shannon has achieved a 96.15% success rate on the hint-free, source-aware XBOW Benchmark.

  4. github / gh-aw

    GitHub Agentic Workflows

  5. EveryInc / compound-engineering-plugin

    Official Claude Code compound engineering plugin

  6. hsliuping / TradingAgents-CN

    A Chinese-language financial trading framework built on multi-agent LLMs; an enhanced Chinese edition of TradingAgents.

  7. gitbutlerapp / gitbutler

    The GitButler version control client, backed by Git, powered by Tauri/Rust/Svelte

  8. carlvellotti / claude-code-pm-course

    Interactive course teaching Product Managers how to use Claude Code effectively

  9. Shubhamsaboo / awesome-llm-apps

    Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.

  10. drawdb-io / drawdb

    Free, simple, and intuitive online database diagram editor and SQL generator.

  11. pydantic / monty

    A minimal, secure Python interpreter written in Rust for use by AI

  12. cheahjs / free-llm-api-resources

    A list of free LLM inference resources accessible via API.

  13. Jeffallan / claude-skills

    65 Specialized Skills for Full-Stack Developers. Transform Claude Code into your expert pair programmer.

  14. virattt / dexter

    An autonomous agent for deep financial research

Hugging Face (15)

  1. QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining

    Financial markets are noisy and non-stationary, making alpha mining highly sensitive to noise in backtesting results and sudden market regime shifts. While recent agentic frameworks improve alpha mining automation, they often lack controllable multi-round search and reliable reuse of validated experience. To address these challenges, we propose QuantaAlpha, an evolutionary alpha mining framework that treats each end-to-end mining run as a trajectory and improves factors through trajectory-level mutation and crossover operations. QuantaAlpha localizes suboptimal steps in each trajectory for targeted revision and recombines complementary high-reward segments to reuse effective patterns, enabling structured exploration and refinement across mining iterations. During factor generation, QuantaAlpha enforces semantic consistency across the hypothesis, factor expression, and executable code, while constraining the complexity and redundancy of the generated factor to mitigate crowding. Extensive experiments on the China Securities Index 300 (CSI 300) demonstrate consistent gains over strong baseline models and prior agentic systems. When utilizing GPT-5.2, QuantaAlpha achieves an Information Coefficient (IC) of 0.1501, with an Annualized Rate of Return (ARR) of 27.75% and a Maximum Drawdown (MDD) of 7.98%. Moreover, factors mined on CSI 300 transfer effectively to the China Securities Index 500 (CSI 500) and the Standard & Poor's 500 Index (S&P 500), delivering 160% and 137% cumulative excess return over four years, respectively, which indicates strong robustness of QuantaAlpha under market distribution shifts.
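
The trajectory-level mutation and crossover the abstract describes can be caricatured in a few lines. This is our own toy sketch, not QuantaAlpha's code: a trajectory is reduced to a list of per-step rewards, mutation resamples the weakest step, and crossover keeps the higher-reward half from each parent.

```python
import random

# Toy trajectory-level evolution (our simplification, not the paper's
# system): each trajectory is a list of step rewards in [0, 1).
random.seed(0)

def fitness(traj):
    return sum(traj)

def mutate(traj):
    """Localize the suboptimal step and revise it (here: resample it)."""
    t = traj[:]
    worst = min(range(len(t)), key=lambda i: t[i])
    t[worst] = random.random()
    return t

def crossover(a, b):
    """Recombine complementary high-reward segments of two trajectories."""
    mid = len(a) // 2
    left = a[:mid] if fitness(a[:mid]) >= fitness(b[:mid]) else b[:mid]
    right = a[mid:] if fitness(a[mid:]) >= fitness(b[mid:]) else b[mid:]
    return left + right

pop = [[random.random() for _ in range(6)] for _ in range(8)]
for _ in range(20):  # a few evolutionary rounds
    pop.sort(key=fitness, reverse=True)
    child = mutate(crossover(pop[0], pop[1]))
    pop[-1] = child  # replace the weakest trajectory
best = max(pop, key=fitness)
```

The real system evolves full LLM mining runs with semantic-consistency and redundancy constraints; this only shows the shape of the search loop.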

  2. MOVA: Towards Scalable and Synchronized Video-Audio Generation

    Audio is indispensable for real-world video, yet generation models have largely overlooked audio components. Current approaches to producing audio-visual content often rely on cascaded pipelines, which increase cost, accumulate errors, and degrade overall quality. While systems such as Veo 3 and Sora 2 emphasize the value of simultaneous generation, joint multimodal modeling introduces unique challenges in architecture, data, and training. Moreover, the closed-source nature of existing systems limits progress in the field. In this work, we introduce MOVA (MOSS Video and Audio), an open-source model capable of generating high-quality, synchronized audio-visual content, including realistic lip-synced speech, environment-aware sound effects, and content-aligned music. MOVA employs a Mixture-of-Experts (MoE) architecture, with a total of 32B parameters, of which 18B are active during inference. It supports IT2VA (Image-Text to Video-Audio) generation task. By releasing the model weights and code, we aim to advance research and foster a vibrant community of creators. The released codebase features comprehensive support for efficient inference, LoRA fine-tuning, and prompt enhancement.

  3. Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

    As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields diminishing returns. While existing methods continue to reinforce target predictions, we find that informative supervision signals remain latent in models' own historical weak states. Motivated by this observation, we propose WMSS (Weak Agents Can Make Strong Agents Stronger), a post-training paradigm that leverages weak checkpoints to guide continued optimization. By identifying recoverable learning gaps via entropy dynamics and reinforcing them through compensatory learning, WMSS enables strong agents to improve beyond conventional post-training saturation. Experiments on mathematical reasoning and code generation datasets show that agents trained with our approach achieve effective performance improvements, while incurring zero additional inference cost.
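
A minimal illustration of the entropy-dynamics idea, with function names and thresholds of our own invention rather than WMSS's: flag positions where the strong model's predictive distribution has saturated (low entropy) while a weak checkpoint was still uncertain (high entropy), marking them as candidates for compensatory learning.

```python
import math

# Illustrative sketch only (not WMSS): compare per-position entropy of a
# weak checkpoint's predictions against the strong model's to locate
# "recoverable gaps" worth revisiting during continued training.
def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def recoverable_gaps(weak_dists, strong_dists, lo=0.2, hi=1.0):
    """Flag positions where the strong model has saturated (entropy < lo)
    but the weak checkpoint was still uncertain (entropy > hi)."""
    gaps = []
    for i, (w, s) in enumerate(zip(weak_dists, strong_dists)):
        if entropy(s) < lo and entropy(w) > hi:
            gaps.append(i)
    return gaps

# Three positions, three-way token distributions (made-up numbers).
weak   = [[0.4, 0.3, 0.3], [0.9, 0.05, 0.05], [0.5, 0.25, 0.25]]
strong = [[0.98, 0.01, 0.01], [0.98, 0.01, 0.01], [0.4, 0.3, 0.3]]
print(recoverable_gaps(weak, strong))  # → [0]
```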

  4. AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents

    LLM agents hold significant promise for advancing scientific research. To accelerate this progress, we introduce AIRS-Bench (the AI Research Science Benchmark), a suite of 20 tasks sourced from state-of-the-art machine learning papers. These tasks span diverse domains, including language modeling, mathematics, bioinformatics, and time series forecasting. AIRS-Bench tasks assess agentic capabilities over the full research lifecycle -- including idea generation, experiment analysis and iterative refinement -- without providing baseline code. The AIRS-Bench task format is versatile, enabling easy integration of new tasks and rigorous comparison across different agentic frameworks. We establish baselines using frontier models paired with both sequential and parallel scaffolds. Our results show that agents exceed human SOTA in four tasks but fail to match it in sixteen others. Even when agents surpass human benchmarks, they do not reach the theoretical performance ceiling for the underlying tasks. These findings indicate that AIRS-Bench is far from saturated and offers substantial room for improvement. We open-source the AIRS-Bench task definitions and evaluation code to catalyze further development in autonomous scientific research.

  5. Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning

    Current Vision-Language-Action (VLA) models rely on fixed computational depth, expending the same amount of compute on simple adjustments and complex multi-step manipulation. While Chain-of-Thought (CoT) prompting enables variable computation, it scales memory linearly and is ill-suited for continuous action spaces. We introduce Recurrent-Depth VLA (RD-VLA), an architecture that achieves computational adaptivity via latent iterative refinement rather than explicit token generation. RD-VLA employs a recurrent, weight-tied action head that supports arbitrary inference depth with a constant memory footprint. The model is trained using truncated backpropagation through time (TBPTT) to efficiently supervise the refinement process. At inference, RD-VLA dynamically allocates compute using an adaptive stopping criterion based on latent convergence. Experiments on challenging manipulation tasks show that recurrent depth is critical: tasks that fail entirely (0 percent success) with single-iteration inference exceed 90 percent success with four iterations, while simpler tasks saturate rapidly. RD-VLA provides a scalable path to test-time compute in robotics, replacing token-based reasoning with latent reasoning to achieve constant memory usage and up to 80x inference speedup over prior reasoning-based VLA models. Project page: https://rd-vla.github.io/
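
The core mechanism, a weight-tied refinement step iterated until the latent stops moving, can be sketched in a few lines of NumPy. This is our simplification with toy parameters, not the RD-VLA implementation, which uses a learned transformer head trained with TBPTT:

```python
import numpy as np

# Weight-tied recurrent refinement with convergence-based stopping
# (toy stand-in for a learned action head).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))  # shared weights across all depths
b = np.zeros(8)

def refine(z):
    """One refinement iteration; identical parameters at every depth."""
    return np.tanh(W @ z + b)

def rd_decode(z0, tol=1e-4, max_iters=32):
    """Iterate until the latent stops moving (adaptive stopping criterion).
    Memory stays constant regardless of how many iterations run."""
    z = z0
    for i in range(1, max_iters + 1):
        z_next = refine(z)
        if np.linalg.norm(z_next - z) < tol:  # latent convergence reached
            return z_next, i
        z = z_next
    return z, max_iters

z, iters = rd_decode(rng.normal(size=8))
print(iters)  # depth is chosen at inference time, not fixed in advance
```

Easy inputs converge in few iterations, hard ones take more, which is the computational adaptivity the abstract describes.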

  6. LLaDA2.1: Speeding Up Text Diffusion via Token Editing

    While LLaDA2.0 showcased the scaling potential of 100B-level block-diffusion models and their inherent parallelization, the delicate equilibrium between decoding speed and generation quality has remained an elusive frontier. Today, we unveil LLaDA2.1, a paradigm shift designed to transcend this trade-off. By seamlessly weaving Token-to-Token (T2T) editing into the conventional Mask-to-Token (M2T) scheme, we introduce a joint, configurable threshold-decoding scheme. This structural innovation gives rise to two distinct personas: the Speedy Mode (S Mode), which audaciously lowers the M2T threshold to bypass traditional constraints while relying on T2T to refine the output; and the Quality Mode (Q Mode), which leans into conservative thresholds to secure superior benchmark performances with manageable efficiency degradation. Furthering this evolution, underpinned by an expansive context window, we implement the first large-scale Reinforcement Learning (RL) framework specifically tailored for dLLMs, anchored by specialized techniques for stable gradient estimation. This alignment not only sharpens reasoning precision but also elevates instruction-following fidelity, bridging the chasm between diffusion dynamics and complex human intent. We culminate this work by releasing LLaDA2.1-Mini (16B) and LLaDA2.1-Flash (100B). Across 33 rigorous benchmarks, LLaDA2.1 delivers strong task performance and lightning-fast decoding speed. Despite its 100B volume, on coding tasks it attains an astounding 892 TPS on HumanEval+, 801 TPS on BigCodeBench, and 663 TPS on LiveCodeBench.
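
A toy rendering of the joint threshold-decoding idea, ours rather than LLaDA2.1's decoder: masked slots are committed when model confidence clears the M2T threshold, and already-committed tokens are re-edited when the model is highly confident in a replacement (T2T).

```python
import random
random.seed(1)

# Toy threshold decoding over a 3-symbol vocabulary; the "model" is a
# random stand-in producing a (token, confidence) pair per slot.
MASK = "_"

def model_propose(seq):
    return [(random.choice("abc"), random.random()) for _ in seq]

def decode_step(seq, m2t_thresh, t2t_thresh):
    props = model_propose(seq)
    out = []
    for cur, (tok, conf) in zip(seq, props):
        if cur == MASK and conf >= m2t_thresh:    # M2T: fill a masked slot
            out.append(tok)
        elif cur != MASK and conf >= t2t_thresh:  # T2T: edit a committed slot
            out.append(tok)
        else:
            out.append(cur)
    return out

# "S Mode" flavor: a low M2T threshold commits aggressively; T2T cleans up.
seq = [MASK] * 12
for _ in range(8):
    seq = decode_step(seq, m2t_thresh=0.3, t2t_thresh=0.9)
```

Raising `m2t_thresh` toward 1.0 gives the conservative "Q Mode" behavior: fewer commitments per step, more steps, higher-confidence output.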

  7. GEBench: Benchmarking Image Generation Models as GUI Environments

    Recent advancements in image generation models have enabled the prediction of future Graphical User Interface (GUI) states based on user instructions. However, existing benchmarks primarily focus on general domain visual fidelity, leaving the evaluation of state transitions and temporal coherence in GUI-specific contexts underexplored. To address this gap, we introduce GEBench, a comprehensive benchmark for evaluating dynamic interaction and temporal coherence in GUI generation. GEBench comprises 700 carefully curated samples spanning five task categories, covering both single-step interactions and multi-step trajectories across real-world and fictional scenarios, as well as grounding point localization. To support systematic evaluation, we propose GE-Score, a novel five-dimensional metric that assesses Goal Achievement, Interaction Logic, Content Consistency, UI Plausibility, and Visual Quality. Extensive evaluations on current models indicate that while they perform well on single-step transitions, they struggle significantly with maintaining temporal coherence and spatial grounding over longer interaction sequences. Our findings identify icon interpretation, text rendering, and localization precision as critical bottlenecks. This work provides a foundation for systematic assessment and suggests promising directions for future research toward building high-fidelity generative GUI environments. The code is available at: https://github.com/stepfun-ai/GEBench.

  8. Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

    Despite the growing video understanding capabilities of recent Multimodal Large Language Models (MLLMs), existing video benchmarks primarily assess understanding based on models' static, internal knowledge, rather than their ability to learn and adapt from dynamic, novel contexts from few examples. To bridge this gap, we present Demo-driven Video In-Context Learning, a novel task focused on learning from in-context demonstrations to answer questions about the target videos. Alongside this, we propose Demo-ICL-Bench, a challenging benchmark designed to evaluate demo-driven video in-context learning capabilities. Demo-ICL-Bench is constructed from 1200 instructional YouTube videos with associated questions, from which two types of demonstrations are derived: (i) summarizing video subtitles for text demonstration; and (ii) corresponding instructional videos as video demonstrations. To effectively tackle this new challenge, we develop Demo-ICL, an MLLM with a two-stage training strategy: video-supervised fine-tuning and information-assisted direct preference optimization, jointly enhancing the model's ability to learn from in-context examples. Extensive experiments with state-of-the-art MLLMs confirm the difficulty of Demo-ICL-Bench, demonstrate the effectiveness of Demo-ICL, and thereby unveil future research directions.

  9. Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

    Memory is increasingly central to Large Language Model (LLM) agents operating beyond a single context window, yet most existing systems rely on offline, query-agnostic memory construction that can be inefficient and may discard query-critical information. Although runtime memory utilization is a natural alternative, prior work often incurs substantial overhead and offers limited explicit control over the performance-cost trade-off. In this work, we present BudgetMem, a runtime agent memory framework for explicit, query-aware performance-cost control. BudgetMem structures memory processing as a set of memory modules, each offered in three budget tiers (i.e., Low/Mid/High). A lightweight router performs budget-tier routing across modules to balance task performance and memory construction cost, which is implemented as a compact neural policy trained with reinforcement learning. Using BudgetMem as a unified testbed, we study three complementary strategies for realizing budget tiers: implementation (method complexity), reasoning (inference behavior), and capacity (module model size). Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized (i.e., high-budget setting), and delivers better accuracy-cost frontiers under tighter budgets. Moreover, our analysis disentangles the strengths and weaknesses of different tiering strategies, clarifying when each axis delivers the most favorable trade-offs under varying budget regimes.
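
As one way to picture budget-tier routing, here is a hypothetical greedy stand-in for the learned router (module names, costs, and utilities are all invented): upgrade whichever module tier buys the most utility per unit of cost until the budget runs out.

```python
# Greedy sketch of query-aware budget-tier routing; BudgetMem's actual
# router is a neural policy trained with RL, not this heuristic.
COSTS = {"low": 1, "mid": 3, "high": 9}

def route(modules, utility, budget):
    """Assign each memory module a Low/Mid/High tier under a total budget,
    upgrading the best utility-gain-per-cost module until nothing fits."""
    tiers = {m: "low" for m in modules}
    spent = len(modules) * COSTS["low"]
    order = ["low", "mid", "high"]
    while True:
        best, best_ratio = None, 0.0
        for m in modules:
            i = order.index(tiers[m])
            if i == 2:
                continue  # already at the top tier
            nxt = order[i + 1]
            gain = utility[m][nxt] - utility[m][tiers[m]]
            extra = COSTS[nxt] - COSTS[tiers[m]]
            if spent + extra <= budget and gain / extra > best_ratio:
                best, best_ratio = (m, nxt, extra), gain / extra
        if best is None:
            return tiers, spent
        m, nxt, extra = best
        tiers[m], spent = nxt, spent + extra

# Made-up per-tier utility estimates for two memory modules.
utility = {
    "retrieval": {"low": 0.2, "mid": 0.5, "high": 0.6},
    "summary":   {"low": 0.1, "mid": 0.2, "high": 0.7},
}
tiers, spent = route(["retrieval", "summary"], utility, budget=8)
```

With these toy numbers the router settles both modules at the mid tier (total cost 6), since either high-tier upgrade would exceed the budget of 8.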

  10. LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth

    Large language models (LLMs) are increasingly capable of carrying out long-running, real-world tasks. However, as the amount of context grows, their reliability often deteriorates, a phenomenon known as "context rot". Existing long-context benchmarks primarily focus on single-step settings that evaluate a model's ability to retrieve information from a long snippet. In realistic scenarios, however, LLMs often need to act as agents that explore environments, follow instructions and plans, extract useful information, and predict correct actions under a dynamically growing context. To assess language agents in such settings, we introduce LOCA-bench (a benchmark for LOng-Context Agents). Given a task prompt, LOCA-bench leverages automated and scalable control of environment states to regulate the agent's context length. This design enables LOCA-bench to extend the context length potentially to infinity in a controlled way while keeping the underlying task semantics fixed. LOCA-bench evaluates language agents as a combination of models and scaffolds, including various context management strategies. While agent performance generally degrades as the environment states grow more complex, advanced context management techniques can substantially improve the overall success rate. We open-source LOCA-bench to provide a platform for evaluating models and scaffolds in long-context, agentic scenarios: https://github.com/hkust-nlp/LOCA-bench

  11. GISA: A Benchmark for General Information-Seeking Assistant

    The advancement of large language models (LLMs) has significantly accelerated the development of search agents capable of autonomously gathering information through multi-turn web interactions. Various benchmarks have been proposed to evaluate such agents. However, existing benchmarks often construct queries backward from answers, producing unnatural tasks misaligned with real-world needs. Moreover, these benchmarks tend to focus on either locating specific information or aggregating information from multiple sources, while relying on static answer sets prone to data contamination. To bridge these gaps, we introduce GISA, a benchmark for General Information-Seeking Assistants comprising 373 human-crafted queries that reflect authentic information-seeking scenarios. GISA features four structured answer formats (item, set, list, and table), enabling deterministic evaluation. It integrates both deep reasoning and broad information aggregation within unified tasks, and includes a live subset with periodically updated answers to resist memorization. Notably, GISA provides complete human search trajectories for every query, offering gold-standard references for process-level supervision and imitation learning. Experiments on mainstream LLMs and commercial search products reveal that even the best-performing model achieves only a 19.30% exact match score, with performance notably degrading on tasks requiring complex planning and comprehensive information gathering. These findings highlight substantial room for future improvement.

  12. InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery

    We introduce InternAgent-1.5, a unified system designed for end-to-end scientific discovery across computational and empirical domains. The system is built on a structured architecture composed of three coordinated subsystems for generation, verification, and evolution. These subsystems are supported by foundational capabilities for deep research, solution optimization, and long horizon memory. The architecture allows InternAgent-1.5 to operate continuously across extended discovery cycles while maintaining coherent and improving behavior. It also enables the system to coordinate computational modeling and laboratory experimentation within a single unified system. We evaluate InternAgent-1.5 on scientific reasoning benchmarks such as GAIA, HLE, GPQA, and FrontierScience, and the system achieves leading performance that demonstrates strong foundational capabilities. Beyond these benchmarks, we further assess two categories of discovery tasks. In algorithm discovery tasks, InternAgent-1.5 autonomously designs competitive methods for core machine learning problems. In empirical discovery tasks, it executes complete computational or wet lab experiments and produces scientific findings in earth, life, biological, and physical domains. Overall, these results show that InternAgent-1.5 provides a general and scalable framework for autonomous scientific discovery.

  13. Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

    Spatial embodied intelligence requires agents to act to acquire information under partial observability. While multimodal foundation models excel at passive perception, their capacity for active, self-directed exploration remains understudied. We propose Theory of Space, defined as an agent's ability to actively acquire information through self-directed, active exploration and to construct, revise, and exploit a spatial belief from sequential, partial observations. We evaluate this through a benchmark where the goal is curiosity-driven exploration to build an accurate cognitive map. A key innovation is spatial belief probing, which prompts models to reveal their internal spatial representations at each step. Our evaluation of state-of-the-art models reveals several critical bottlenecks. First, we identify an Active-Passive Gap, where performance drops significantly when agents must autonomously gather information. Second, we find high inefficiency, as models explore unsystematically compared to program-based proxies. Through belief probing, we diagnose that while perception is an initial bottleneck, global beliefs suffer from instability that causes spatial knowledge to degrade over time. Finally, using a false belief paradigm, we uncover Belief Inertia, where agents fail to update obsolete priors with new evidence. This issue is present in text-based agents but is particularly severe in vision-based models. Our findings suggest that current foundation models struggle to maintain coherent, revisable spatial beliefs during active exploration.

  14. WorldCompass: Reinforcement Learning for Long-Horizon World Models

    This work presents WorldCompass, a novel Reinforcement Learning (RL) post-training framework for the long-horizon, interactive video-based world models, enabling them to explore the world more accurately and consistently based on interaction signals. To effectively "steer" the world model's exploration, we introduce three core innovations tailored to the autoregressive video generation paradigm: 1) Clip-level rollout Strategy: We generate and evaluate multiple samples at a single target clip, which significantly boosts rollout efficiency and provides fine-grained reward signals. 2) Complementary Reward Functions: We design reward functions for both interaction-following accuracy and visual quality, which provide direct supervision and effectively suppress reward-hacking behaviors. 3) Efficient RL Algorithm: We employ the negative-aware fine-tuning strategy coupled with various efficiency optimizations to efficiently and effectively enhance model capacity. Evaluations on the SoTA open-source world model, WorldPlay, demonstrate that WorldCompass significantly improves interaction accuracy and visual fidelity across various scenarios.

  15. LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning

    Chemical large language models (LLMs) predominantly rely on explicit Chain-of-Thought (CoT) in natural language to perform complex reasoning. However, chemical reasoning is inherently continuous and structural, and forcing it into discrete linguistic tokens introduces a fundamental representation mismatch that constrains both efficiency and performance. We introduce LatentChem, a latent reasoning interface that decouples chemical computation from textual generation, enabling models to perform multi-step reasoning directly in continuous latent space while emitting language only for final outputs. Remarkably, we observe a consistent emergent behavior: when optimized solely for task success, models spontaneously internalize reasoning, progressively abandoning verbose textual derivations in favor of implicit latent computation. This shift is not merely stylistic but computationally advantageous. Across diverse chemical reasoning benchmarks, LatentChem achieves a 59.88% non-tie win rate over strong CoT-based baselines on ChemCoTBench, while delivering a 10.84× average inference speedup. Our results provide empirical evidence that chemical reasoning is more naturally and effectively realized as continuous latent dynamics rather than discretized linguistic trajectories.

Solidot (15)

  1. 410 Tankers Were Abandoned in 2025

    Last November, the tanker Ivan serves on was carrying 750,000 barrels of Russian crude oil from the Far East to China. After learning that the crew had gone unpaid for months, the International Transport Workers' Federation (ITF) declared the vessel abandoned in December. The tanker now sits in international waters; under close scrutiny from multiple parties, China will not allow it to dock. The ITF has helped Ivan and his crewmates recover their December wages and has delivered food, drinking water, and other necessities. Some of the crew have returned home, but most, Ivan included, remain stranded aboard. By ITF's count, 20 ships were abandoned worldwide in 2016; by 2025 the figure had surged to 410, with 6,223 merchant seafarers left as victims. Both numbers are up nearly a third from 2024. Geopolitical instability is the main driver of tanker abandonment. Like the ship Ivan is trapped on, most abandoned vessels have owners of unknown identity, are aging, possibly uninsured, and are registered in loosely regulated states such as Panama, Liberia, and the Marshall Islands. The Gambia had no tankers in 2023, yet by March 2025 it had 35 on its registry. Under International Maritime Organization (IMO) guidelines, a crew counts as abandoned once at least two months of contractual wages go unpaid. Indian seafarers were hardest hit by tanker abandonment in 2025, with 1,125 affected (18% of the total), followed by the Philippines (539) and Syria (309).

  2. Electric Vehicles Help Improve Air Quality

    According to a study published in The Lancet Planetary Health, electric vehicles help improve air quality. The study examined how the growing number of battery-electric and plug-in hybrid cars affects air pollution in California, the US state with the largest plug-in fleet, now large enough to have a measurable positive effect on air quality. Using satellite data that tracks NO2 levels by how the gas absorbs and reflects sunlight, the researchers found that between 2019 and 2023, every 200 additional battery-electric or plug-in hybrid vehicles corresponded to a 1.1% drop in NO2. NO2 can trigger asthma and bronchitis and raises the risk of heart disease and stroke. The study also confirmed that pollutant emissions rise in communities where gasoline cars increase.

  3. Vegetarian Toddlers Grow at the Same Rate as Omnivorous Ones

    Babies born into vegetarian families may be slightly leaner early on, but by age two their weight catches up with peers from omnivorous families. Researchers at Israel's Ben-Gurion University of the Negev analyzed data on 1.2 million infants collected from Israel's national family care centers between 2014 and 2023, recording each child's length, weight, and head circumference from birth to 24 months. The team compared the growth data against the diet type reported by each infant's parents. The vast majority of families described themselves as omnivorous; only 1.2% identified as vegetarian and 0.3% as vegan, which still left roughly 18,000 infants in vegetarian and vegan households. Grouping the infants by diet, the researchers found that in the first 60 days of life, all three groups were similar in length, head circumference, and rates of growth restriction. Infants from meat-free families, especially vegan ones, were more likely to be underweight, but by around age two these differences had largely disappeared and the three groups' growth metrics converged. The researchers say the findings should be reassuring: meat-free diets can support healthy early growth. They also note that diet was self-reported by parents, which could affect the accuracy of the results.

  4. YouTube Music Limits Lyrics Viewing for Free Users

    Google's YouTube Music service is restricting lyrics viewing for free users, who report a cap on how many times they can view lyrics along with warnings showing how many views remain. Google has not officially announced that lyrics will become a paid-subscriber-only feature; a spokesperson said the company is still testing and has made no final decision. A combined YouTube video and Music subscription costs $14 per month, while YouTube Music alone is $11 per month. Music-streaming giant Spotify also restricted users' access to lyrics in 2024, then reversed course after strong user backlash.

  5. 996 Schedules Are Spreading at US AI Startups

    The 996 schedule is widely criticized in China, yet US AI startups are now treating it as a selling point. New York AI startup Rilla warns applicants in its job ads that work weeks may run as long as 70 hours. Browser-Use, a seven-person startup building tools for browser-AI interaction, goes further, using a shared space as both office and living quarters and blurring the line between work and life even more. Deedy Das, a partner at venture firm Menlo Ventures, points out that long hours do not mean employees are efficient or more productive; the practice alienates staff with families as well as experienced older workers, and sustained long hours lead to burnout. He sees nothing wrong with founders themselves working long hours, since their own interests are at stake and they stand to become very wealthy if the company succeeds. Research from Michigan State University found that an employee working 70 hours a week produces almost the same output as one working 50.

  6. Google Plans 100-Year Bonds to Fund AI

    Major US tech giants plan to spend $700 billion on AI data centers this year, and to raise the money they are turning to the bond market. Google has reached agreements with several banks to issue rare bonds with a 100-year term. Historically, many century bonds have ended in failure because the issuer went bankrupt before the bonds matured. Bonds issued by tech giants mostly top out at 40-year terms; the last tech giant to issue a century bond was Motorola in 1997, which was also the last time Motorola was regarded as an industry giant. That year Motorola's corporate brand ranked first in the US, ahead of Microsoft, but today its market capitalization ranks 232nd.

  7. European Governments Are Quietly Adopting Matrix

    The decentralized Matrix communication protocol is being used by a growing number of government agencies. The Thunderbird mail client has supported Matrix natively since 2022. The Matrix project is in talks with roughly 35 countries about FOSS communication infrastructure. The United Nations uses Matrix as an internal communication tool. After the Trump administration sanctioned International Criminal Court (ICC) chief prosecutor Karim Khan, the ICC also began adopting Matrix and is working to move off Microsoft software. Two components of La Suite, the digital workspace used by the French government, are built on Matrix: the chat tool Tchap and the videoconferencing tool Visio. Government agencies in Ukraine, the Netherlands, Switzerland, Austria, and elsewhere also use Matrix. What makes the protocol unusual is that thousands of people use it every day without ever having heard of it, because it is embedded inside other tools and applications.

  8. Discord Will Verify Users' Ages

    Discord announced that age verification will roll out next month. Users will default to a teen setting unless they prove they are adults via a face scan or an ID document. Users not verified as adults will be unable to access servers and channels with adult content or to speak in Discord live-streaming channels; adult content will be filtered out, and direct messages from unfamiliar users will be routed automatically to a separate inbox. ID documents will be verified by a third-party vendor, and Discord says ID photos are deleted immediately after age confirmation.

  9. An AI-Written Compiler Can Compile the Kernel Now, but I Remain Unmoved

    Nala Ginrut writes: Whether AI will become the new digital Leviathan remains an open question, but the news of an AI writing a compiler has genuinely scared the daylights out of a crowd of people. Some of us are not scared yet, and still have the spirit to write articles. According to Anthropic's own account, endlessly adding agents does not help: with 16 agents running in parallel, all 16 got stuck on the same problem. An ideal agent setup would avoid conflicts at a finer granularity, coordinate between agents, and have synchronization mechanisms. In other words, they did not build an orchestration agent this time. And there lies the problem: a genuinely productive orchestration agent is itself a complex engineering project to design and implement. If a one-line prompt like "build me an X" cannot get the AI to write it for you automatically, then you need to put engineers on it. Which brings us back to the VibeOS paradox mentioned earlier: you need substantial engineering experience, plus an understanding of compiler theory, to design a good special-purpose orchestration agent.

  10. AI Doesn't Reduce Workloads; It Intensifies Work

    Harvard Business Review published a report finding that AI has not reduced workloads but has intensified work. In an eight-month study, the authors examined how generative AI changed work habits at a US tech company of about 200 employees. They found that employees worked at a faster pace, took on a broader range of tasks, and worked longer hours, often voluntarily; the company never mandated AI use. Because AI made doing more feel within reach, practical, and in many cases intrinsically rewarding, employees took on extra work of their own accord. That may sound like a corporate leader's dream, but the enthusiastic embrace of AI-driven change may prove unsustainable and create new problems. Once the novelty wears off, employees may find their workloads have quietly grown and struggle to cope with the sudden flood of tasks. The accumulated load can in turn cause cognitive fatigue, burnout, and degraded decision-making, and the initial productivity surge may give way to declining work quality and staff turnover.

  11. Any Exercise Is Good for Your Health

    US and WHO guidelines no longer prescribe a minimum duration for moderate- or high-intensity aerobic exercise, because research shows that regular bursts of activity as short as 30 seconds can deliver health benefits comparable to hours in a gym. Climbing a few flights of stairs a day can be life-changing. Exercise scientists call this VILPA, for vigorous intermittent lifestyle physical activity. The prevailing view today is that any movement is beneficial: taking the stairs every day can lower body weight and reduce the risk of stroke and heart disease, even though stair climbing may not burn many calories. Just four minutes of activity a day is enough, essentially climbing a few flights of stairs at a brisk pace. Intensity is the most important factor; a brief burst will not leave you drenched in sweat, but you need to feel the exertion. Using breathing as a gauge of intensity: if you can still sing after a short burst, it was light exercise; if you can talk but not sing, it was moderate; if you are too winded to speak, it was vigorous. Moderate-to-vigorous activity brings the greatest benefit.

  12. A Single Drinking Session Measurably Alters Communication Within the Brain

    According to a study published in Drug and Alcohol Dependence, a single drinking session significantly alters communication within the brain. Researchers recruited 107 healthy adults aged 21 to 45, all social drinkers with no history of alcohol use disorder. Each participant completed two sessions, one with alcohol and one with a placebo (the glass was coated with alcohol to mimic a real drink), without knowing which they were drinking. Afterwards, participants underwent eyes-open MRI scans recording blood-oxygen levels as an indicator of neural activity, and the researchers analyzed functional connectivity between 106 brain regions. The analysis found marked changes in brain topology after drinking. Global efficiency dropped slightly across multiple regions, most notably in the occipital lobe, which processes visual information, suggesting alcohol makes it harder to integrate visual information with the rest of the brain. The frontal and temporal cortices communicated more frequently with their neighboring regions, and the brain appeared to split into smaller, more self-contained areas. Maintaining that structure takes less energy, but it hampers the rapid integration of complex information. Participants with the largest drops in global efficiency and the largest rises in local efficiency reported feeling the most intoxicated.

  13. NASA Will Let Astronauts Take Smartphones to the Moon

    Billionaire NASA administrator Jared Isaacman announced that astronauts on the Crew-12 crewed flight and the Artemis II lunar flyby will be allowed to carry smartphones. The move has drawn controversy: airlines impose strict rules on the lithium-ion batteries that smartphones use, yet NASA is permitting such batteries in a far more dangerous and enclosed environment. Isaacman says the move is meant to challenge what he calls bloated certification requirements; until now, approving hardware for use in space required a cumbersome process including radiation characterization, battery thermal testing, outgassing assessments, vibration testing, and more.

  14. SpaceX Prioritizes a Moon City over a Mars City

    Elon Musk's "plan" to colonize Mars has been postponed. He said on Sunday that SpaceX's focus has shifted to building a "self-growing city" on the Moon, claiming the goal could be achieved within a decade, though his timelines are typically too optimistic to be met, as with full self-driving. Musk says the top priority is securing the future of human civilization, and the Moon can be reached faster than Mars. SpaceX has told investors it will prioritize lunar landings and attempt Mars later, targeting an uncrewed Moon landing in March 2027. Just last year, Musk said his goal was to launch an uncrewed probe to Mars by the end of 2026.

  15. Automakers Rush to Strip Chinese Software Code to Comply with New US Rules

    New US rules require automakers to certify to the US government that, as of March 17, the core components of their products contain no code written in China or by Chinese companies. The rules also cover advanced autonomous-driving software and will extend to hardware in 2029. The aim is to prevent in-car cameras, microphones, and GPS tracking systems from being exploited by foreign adversaries, and the rules are a test case for US efforts to decouple from Chinese supply chains. Hilary Cain, policy director at the Alliance for Automotive Innovation, called it one of the most consequential and complex pieces of auto regulation in decades, requiring deep supply-chain scrutiny and strict adherence to the compliance timeline.