OrangeBot.AI Digest — 2026-04-07
84 headlines across 8 sources, aggregated for this day.
Hacker News (15)
- A truck driver spent 20 years making a scale model of every building in NYC (www.smithsonianmag.com)
- Assessing Claude Mythos Preview's cybersecurity capabilities (red.anthropic.com)
- System Card: Claude Mythos Preview [pdf] (www-cdn.anthropic.com)
- Project Glasswing: Securing critical software for the AI era (www.anthropic.com)
- Cambodia unveils a statue of famous landmine-sniffing rat Magawa (www.bbc.com)
- GLM-5.1: Towards Long-Horizon Tasks (z.ai)
- 12k Tons of Dumped Orange Peel Grew into a Landscape Nobody Expected (2017) (www.sciencealert.com)
- Taste in the age of AI and LLMs (rajnandan.com)
- Claude Code is locking people out for hours (github.com)
- Cloudflare targets 2029 for full post-quantum security (blog.cloudflare.com)
- Dropping Cloudflare for Bunny.net (jola.dev)
- Show HN: Stop paying for Dropbox/Google Drive, use your own S3 bucket instead (locker.dev)
- AI may be making us think and write more alike (dornsife.usc.edu)
- Show HN: Brutalist Concrete Laptop Stand (2024) (sam-burns.com)
- We found an undocumented bug in the Apollo 11 guidance computer code (www.juxt.pro)
GitHub Trending (9)
Product Hunt (15)
- OpenOwl
Automate what APIs can't in one prompt, done locally
- FITYCAL
Track body measurements, fat %, lean mass, progress & more
- MBCompass
A full navigation utility in ~2MB
- ChatGPT Ads by Gauge
The intelligence layer for ChatGPT Ads
- Bibby AI
The AI co-author for research papers
- Cheese! OCR
Select any screen area, get text instantly
- MacYaps
Battery dying? WiFi gone? Your Mac finally talks back.
- NovaVoice
Smart dictation, AI assistant, + app control via voice
- Caret
Press Tab for AI anywhere you type on Mac
- Keupera
Get seen in search, AI and beyond
- Netflix Playground
A world for kids to explore alongside their favorite characters
- Highlight Studio
Record, edit, and brand screen recordings, Metal-powered
- lofi.town
A cozy productivity app to focus with others + vibe to lofi
- INSPEC Lucid Dreaming Device
A night-vision smart camera that knows when you are dreaming
- Lessie AI
Search, Reach and Connect - Find the perfect fit, 10x faster
Hugging Face (15)
- OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
World models have garnered significant attention as a promising research direction in artificial intelligence, yet a clear and unified definition remains lacking. In this paper, we introduce OpenWorldLib, a comprehensive and standardized inference framework for Advanced World Models. Drawing on the evolution of world models, we propose a clear definition: a world model is a model or framework centered on perception, equipped with interaction and long-term memory capabilities, for understanding and predicting the complex world. We further systematically categorize the essential capabilities of world models. Based on this definition, OpenWorldLib integrates models across different tasks within a unified framework, enabling efficient reuse and collaborative inference. Finally, we present additional reflections and analyses on potential future directions for world model research. Code link: https://github.com/OpenDCAI/OpenWorldLib
- MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale
Current document parsing methods compete primarily on model architecture innovation, while systematic engineering of training data remains underexplored. Yet SOTA models of different architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, suggesting that the performance bottleneck stems from shared deficiencies in training data rather than architecture itself. Building on this finding, we present MinerU2.5-Pro, which advances the state of the art solely through data engineering and training strategy optimization while keeping the 1.2B-parameter architecture of MinerU2.5 completely fixed. At its core is a Data Engine co-designed around coverage, informativeness, and annotation accuracy: Diversity-and-Difficulty-Aware Sampling expands training data from under 10M to 65.5M samples while correcting distribution shift; Cross-Model Consistency Verification leverages output agreement among heterogeneous models to assess sample difficulty and generate reliable annotations; the Judge-and-Refine pipeline improves annotation quality for hard samples through render-then-verify iterative correction. A three-stage progressive training strategy (large-scale pre-training, hard sample fine-tuning, and GRPO alignment) sequentially exploits these data at different quality tiers. On the evaluation front, we fix element-matching biases in OmniDocBench v1.5 and introduce a Hard subset, establishing the more discriminative OmniDocBench v1.6 protocol. Without any architectural modification, MinerU2.5-Pro achieves 95.69 on OmniDocBench v1.6, improving over the same-architecture baseline by 2.71 points and surpassing all existing methods, including models with over 200× more parameters.
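The Cross-Model Consistency Verification step described in the abstract can be pictured as a simple voting scheme. A minimal sketch, assuming a hypothetical function name, a 2/3 agreement threshold, and a flat label format (none of which come from the paper):

```python
from collections import Counter

def cross_model_verify(predictions):
    """Vote over heterogeneous parsers' outputs for one sample: strong
    agreement yields a reliable annotation, disagreement flags it as hard."""
    label, votes = Counter(predictions).most_common(1)[0]
    if votes / len(predictions) >= 2 / 3:
        return {"annotation": label, "difficulty": "easy"}
    return {"annotation": None, "difficulty": "hard"}

print(cross_model_verify(["A", "A", "B"]))  # consensus → reliable annotation
print(cross_model_verify(["A", "B", "C"]))  # disagreement → hard sample
```

Samples routed to the "hard" bucket would then feed the paper's Judge-and-Refine pipeline.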
- LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models
Vision-Language-Action (VLA) models achieve strong performance in robotic manipulation by leveraging pre-trained vision-language backbones. However, in downstream robotic settings, they are typically fine-tuned with limited data, leading to overfitting to specific instruction formulations and leaving robustness to paraphrased instructions underexplored. To study this gap, we introduce LIBERO-Para, a controlled benchmark that independently varies action expressions and object references for fine-grained analysis of linguistic generalization. Across seven VLA configurations (0.6B-7.5B), we observe consistent performance degradation of 22-52 pp under paraphrasing. This degradation is primarily driven by object-level lexical variation: even simple synonym substitutions cause large drops, indicating reliance on surface-level matching rather than semantic grounding. Moreover, 80-96% of failures arise from planning-level trajectory divergence rather than execution errors, showing that paraphrasing disrupts task identification. Binary success rate treats all paraphrases equally, obscuring whether models perform consistently across difficulty levels or rely on easier cases. To address this, we propose PRIDE, a metric that quantifies paraphrase difficulty using semantic and syntactic factors. Our benchmark and corresponding code are available at: https://github.com/cau-hai-lab/LIBERO-Para
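The headline degradation numbers boil down to a simple measurement: run the same task under canonical and paraphrased instructions and compare success rates. A toy sketch (function names and episode data are hypothetical, not from the benchmark):

```python
def success_rate(outcomes):
    """Fraction of successful episodes, as a percentage."""
    return 100.0 * sum(outcomes) / len(outcomes)

def paraphrase_drop(original_outcomes, paraphrase_outcomes):
    """Degradation in percentage points (pp) under paraphrasing."""
    return success_rate(original_outcomes) - success_rate(paraphrase_outcomes)

# Toy episode outcomes (True = task success) for one VLA configuration.
original = [True] * 8 + [False] * 2      # 80% on canonical instructions
paraphrased = [True] * 4 + [False] * 6   # 40% after synonym substitution

print(paraphrase_drop(original, paraphrased))  # → 40.0
```

A 40 pp drop on this toy data sits inside the 22-52 pp range the abstract reports across real VLA configurations.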
- TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
Extended reasoning in large language models (LLMs) creates severe KV cache memory bottlenecks. Leading KV cache compression methods estimate KV importance using attention scores from recent post-RoPE queries. However, queries rotate with position under RoPE, so few queries are representative, leading to poor top-key selection and unstable reasoning. To avoid this issue, we turn to the pre-RoPE space, where we observe that Q and K vectors are highly concentrated around fixed non-zero centers and remain stable across positions (Q/K concentration). We show that this concentration causes queries to preferentially attend to keys at specific distances (e.g., nearest keys), with the centers determining which distances are preferred via a trigonometric series. Based on this, we propose TriAttention to estimate key importance by leveraging these centers. Via the trigonometric series, we use the distance preference characterized by these centers to score keys according to their positions, and also leverage Q/K norms as an additional signal for importance estimation. On AIME25 with 32K-token generation, TriAttention matches Full Attention reasoning accuracy while achieving 2.5x higher throughput or 10.7x KV memory reduction, whereas leading baselines achieve only about half the accuracy at the same efficiency. TriAttention enables OpenClaw deployment on a single consumer GPU, where long context would otherwise cause out-of-memory with Full Attention.
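The core idea, scoring cached keys by their position through a trigonometric distance preference scaled by key norms rather than by post-RoPE query-key products, can be loosely sketched as follows. The formula and the `theta` parameter here are illustrative stand-ins, not the paper's actual scoring function:

```python
import math

def key_scores(key_norms, current_pos, theta=0.3):
    """Score each cached key by its distance to the current position
    through a trigonometric preference curve, scaled by the key's norm."""
    return [
        norm * math.cos(theta * (current_pos - j))
        for j, norm in enumerate(key_norms)
    ]

norms = [1.0, 1.2, 0.8, 1.1]   # toy per-key norms
scores = key_scores(norms, current_pos=3)
# keep only the top-k keys in the compressed cache, evict the rest
topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2]
print(topk)  # → [3, 1]
```

The point of the sketch: importance is estimated from positions and norms alone, so no attention scores over rotated queries are needed when deciding which keys to keep.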
- Adam's Law: Textual Frequency Law on Large Language Models
While textual frequency has been validated as relevant to human cognition in reading speed, its relatedness to Large Language Models (LLMs) is seldom studied. We propose a novel research direction in terms of textual data frequency, which is an understudied topic, to the best of our knowledge. Our framework is composed of three units. First, this paper proposes Textual Frequency Law (TFL), which indicates that frequent textual data should be preferred for LLMs for both prompting and fine-tuning. Since many LLMs are closed-source in their training data, we propose using online resources to estimate the sentence-level frequency. We then utilize an input paraphraser to paraphrase the input into a more frequent textual expression. Next, we propose Textual Frequency Distillation (TFD) by querying LLMs to conduct story completion by further extending the sentences in the datasets, and the resulting corpora are used to adjust the initial estimation. Finally, we propose Curriculum Textual Frequency Training (CTFT) that fine-tunes LLMs in an increasing order of sentence-level frequency. Experiments are conducted on our curated dataset Textual Frequency Paired Dataset (TFPD) on math reasoning, machine translation, commonsense reasoning and agentic tool calling. Results show the effectiveness of our framework.
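The CTFT step amounts to ordering fine-tuning data by estimated sentence-level frequency. A toy sketch, where the estimator (mean corpus count of a sentence's tokens) is purely an assumption standing in for the paper's online-resource estimate:

```python
def estimate_frequency(sentence, corpus_counts):
    """Toy sentence-level frequency: mean corpus count of its tokens."""
    tokens = sentence.lower().split()
    return sum(corpus_counts.get(t, 0) for t in tokens) / len(tokens)

def curriculum_order(sentences, corpus_counts):
    """CTFT-style ordering: fine-tune on sentences in increasing
    order of estimated sentence-level frequency."""
    return sorted(sentences, key=lambda s: estimate_frequency(s, corpus_counts))

counts = {"the": 1000, "cat": 50, "sat": 40, "axolotl": 2, "regenerates": 1}
data = ["the cat sat", "the axolotl regenerates"]
print(curriculum_order(data, counts))  # rare-token sentence first
```

Fine-tuning would then consume the sorted list front to back, so the rarest textual expressions are seen first and the most frequent ones last.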
- AURA: Always-On Understanding and Real-Time Assistance via Video Streams
Video Large Language Models (VideoLLMs) have achieved strong performance on many video understanding tasks, but most existing systems remain offline and are not well-suited for live video streams that require continuous observation and timely response. Recent streaming VideoLLMs have made progress, yet current approaches often rely on decoupled trigger-response pipelines or are limited to captioning-style narration, reducing their effectiveness for open-ended question answering and long-horizon interaction. We propose AURA (Always-On Understanding and Real-Time Assistance), an end-to-end streaming visual interaction framework that enables a unified VideoLLM to continuously process video streams and support both real-time question answering and proactive responses. AURA integrates context management, data construction, training objectives, and deployment optimization for stable long-horizon streaming interaction. It achieves state-of-the-art performance on streaming benchmarks and supports a real-time demo system with ASR and TTS running at 2 FPS on two 80G accelerators. We release the AURA model together with a real-time inference framework to facilitate future research.
- ClawArena: Benchmarking AI Agents in Evolving Information Environments
AI agents deployed as persistent assistants must maintain correct beliefs as their information environment evolves. In practice, evidence is scattered across heterogeneous sources that often contradict one another, new information can invalidate earlier conclusions, and user preferences surface through corrections rather than explicit instructions. Existing benchmarks largely assume static, single-authority settings and do not evaluate whether agents can keep up with this complexity. We introduce ClawArena, a benchmark for evaluating AI agents in evolving information environments. Each scenario maintains a complete hidden ground truth while exposing the agent only to noisy, partial, and sometimes contradictory traces across multi-channel sessions, workspace files, and staged updates. Evaluation is organized around three coupled challenges: multi-source conflict reasoning, dynamic belief revision, and implicit personalization, whose interactions yield a 14-category question taxonomy. Two question formats, multi-choice (set-selection) and shell-based executable checks, test both reasoning and workspace grounding. The current release contains 64 scenarios across 8 professional domains, totaling 1,879 evaluation rounds and 365 dynamic updates. Experiments on five agent frameworks and five language models show that both model capability (15.4% range) and framework design (9.2%) substantially affect performance, that self-evolving skill frameworks can partially close model-capability gaps, and that belief revision difficulty is determined by update design strategy rather than the mere presence of updates. Code is available at https://github.com/aiming-lab/ClawArena.
- SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
Image spatial editing performs geometry-driven transformations, allowing precise control over object layout and camera viewpoints. Current models are insufficient for fine-grained spatial manipulations, motivating a dedicated assessment suite. Our contributions are as follows: (i) We introduce SpatialEdit-Bench, a complete benchmark that evaluates spatial editing by jointly measuring perceptual plausibility and geometric fidelity via viewpoint reconstruction and framing analysis. (ii) To address the data bottleneck for scalable training, we construct SpatialEdit-500k, a synthetic dataset generated with a controllable Blender pipeline that renders objects across diverse backgrounds and systematic camera trajectories, providing precise ground-truth transformations for both object- and camera-centric operations. (iii) Building on this data, we develop SpatialEdit-16B, a baseline model for fine-grained spatial editing. Our method achieves competitive performance on general editing while substantially outperforming prior methods on spatial manipulation tasks. All resources will be made public at https://github.com/EasonXiao-888/SpatialEdit.
- FileGram: Grounding Agent Personalization in File-System Behavioral Traces
Coworking AI agents operating within local file systems are rapidly emerging as a paradigm in human-AI interaction; however, effective personalization remains limited by severe data constraints: strict privacy barriers and the difficulty of jointly collecting multimodal real-world traces prevent scalable training and evaluation, and existing methods remain interaction-centric while overlooking the dense behavioral traces in file-system operations. To address this gap, we propose FileGram, a comprehensive framework that grounds agent memory and personalization in file-system behavioral traces, comprising three core components: (1) FileGramEngine, a scalable persona-driven data engine that simulates realistic workflows and generates fine-grained multimodal action sequences at scale; (2) FileGramBench, a diagnostic benchmark grounded in file-system behavioral traces for evaluating memory systems on profile reconstruction, trace disentanglement, persona drift detection, and multimodal grounding; and (3) FileGramOS, a bottom-up memory architecture that builds user profiles directly from atomic actions and content deltas rather than dialogue summaries, encoding these traces into procedural, semantic, and episodic channels with query-time abstraction. Extensive experiments show that FileGramBench remains challenging for state-of-the-art memory systems and that FileGramEngine and FileGramOS are effective. By open-sourcing the framework, we hope to support future research on personalized memory-centric file-system agents.
- Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
Reinforcement learning with verifiable rewards (RLVR) has become a standard paradigm for post-training large language models. While Group Relative Policy Optimization (GRPO) is widely adopted, its coarse credit assignment uniformly penalizes failed rollouts, lacking the token-level focus needed to efficiently address specific deviations. Self-Distillation Policy Optimization (SDPO) addresses this by providing denser, more targeted logit-level supervision that facilitates rapid early improvement, yet it frequently collapses during prolonged training. We trace this late-stage instability to two intrinsic flaws: self-distillation on already-correct samples introduces optimization ambiguity, and the self-teacher's signal reliability progressively degrades. To resolve these issues, we propose Sample-Routed Policy Optimization (SRPO), a unified on-policy framework that routes correct samples to GRPO's reward-aligned reinforcement and failed samples to SDPO's targeted logit-level correction. SRPO further incorporates an entropy-aware dynamic weighting mechanism to suppress high-entropy, unreliable distillation targets while emphasizing confident ones. Evaluated across five benchmarks and two model scales, SRPO achieves both the rapid early improvement of SDPO and the long-horizon stability of GRPO. It consistently surpasses the peak performance of both baselines, raising the five-benchmark average on Qwen3-8B by 3.4% over GRPO and 6.3% over SDPO, while simultaneously yielding moderate response lengths and lowering per-step compute cost by up to 17.2%.
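The routing rule at the heart of SRPO can be sketched in a few lines. The entropy weighting shown here (linear in entropy, capped at 1.0) is an illustrative stand-in for the paper's mechanism, and all names are hypothetical:

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def route_rollouts(rollouts, entropy_cap=1.0):
    """Route correct rollouts to GRPO-style reinforcement and failed
    rollouts to self-distillation, down-weighted by teacher entropy."""
    grpo_batch, sdpo_batch = [], []
    for r in rollouts:
        if r["correct"]:
            grpo_batch.append(r)  # reward-aligned reinforcement
        else:
            h = entropy(r["teacher_probs"])
            # suppress unreliable (high-entropy) distillation targets
            weight = max(0.0, 1.0 - h / entropy_cap)
            sdpo_batch.append({**r, "distill_weight": weight})
    return grpo_batch, sdpo_batch

rollouts = [
    {"correct": True,  "teacher_probs": [0.9, 0.1]},
    {"correct": False, "teacher_probs": [0.95, 0.05]},  # confident self-teacher
    {"correct": False, "teacher_probs": [0.5, 0.5]},    # uncertain self-teacher
]
grpo, sdpo = route_rollouts(rollouts)
print(len(grpo), len(sdpo))  # → 1 2
```

The confident self-teacher's failed rollout ends up with a larger distillation weight than the uncertain one, which is the qualitative behavior the abstract attributes to the entropy-aware weighting.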
- LightThinker++: From Reasoning Compression to Memory Management
Large language models (LLMs) excel at complex reasoning, yet their efficiency is limited by the surging cognitive overhead of long thought traces. In this paper, we propose LightThinker, a method that enables LLMs to dynamically compress intermediate thoughts into compact semantic representations. However, static compression often struggles with complex reasoning where the irreversible loss of intermediate details can lead to logical bottlenecks. To address this, we evolve the framework into LightThinker++, introducing Explicit Adaptive Memory Management. This paradigm shifts to behavioral-level management by incorporating explicit memory primitives, supported by a specialized trajectory synthesis pipeline to train purposeful memory scheduling. Extensive experiments demonstrate the framework's versatility across three dimensions. (1) LightThinker reduces peak token usage by 70% and inference time by 26% with minimal accuracy loss. (2) In standard reasoning, LightThinker++ slashes peak token usage by 69.9% while yielding a +2.42% accuracy gain under the same context budget for maximum performance. (3) Most notably, in long-horizon agentic tasks, it maintains a stable footprint beyond 80 rounds (a 60%-70% reduction), achieving an average performance gain of 14.8% across different complex scenarios. Overall, our work provides a scalable direction for sustaining deep LLM reasoning over extended horizons with minimal overhead.
- Self-Execution Simulation Improves Coding Models
A promising research direction in enabling LLMs to generate consistently correct code involves addressing their inability to properly estimate program execution, particularly for code they generate. In this work, we demonstrate that Code LLMs can be trained to simulate program execution in a step-by-step manner and that this capability can be leveraged to improve competitive programming performance. Our approach combines supervised fine-tuning on natural language execution traces, textual explanations grounded in true execution, with reinforcement learning using verifiable rewards. We introduce two complementary objectives: output prediction given code and inputs, and solving competitive programming tasks with either ground-truth or self-predicted execution feedback. These objectives enable models to perform self-verification over multiple candidate solutions, and iterative self-fixing by simulating test execution. Across multiple competitive programming benchmarks, our method yields consistent improvements over standard reasoning approaches. We further present ablations and analysis to elucidate the role of execution simulation and its limitations.
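The self-verification objective described above can be sketched as picking the candidate that passes the most test executions. In this toy version real execution stands in for the model's textual simulation of execution, and all names are hypothetical:

```python
def self_verify(candidates, tests):
    """Pick the candidate that passes the most test executions."""
    def passed(fn):
        return sum(1 for inp, expected in tests if fn(inp) == expected)
    return max(candidates, key=passed)

# Two candidate "solutions" for doubling a number; one has an off-by-one bug.
buggy = lambda x: 2 * x + 1
good = lambda x: 2 * x
tests = [(1, 2), (3, 6), (0, 0)]

best = self_verify([buggy, good], tests)
print(best(10))  # → 20
```

In the paper's setting the model would instead *predict* each candidate's outputs step by step, then use the same ranking for self-verification and iterative self-fixing.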
- SkillX: Automatically Constructing Skill Knowledge Bases for Agents
Learning from experience is critical for building capable large language model (LLM) agents, yet prevailing self-evolving paradigms remain inefficient: agents learn in isolation, repeatedly rediscover similar behaviors from limited experience, resulting in redundant exploration and poor generalization. To address this problem, we propose SkillX, a fully automated framework for constructing a plug-and-play skill knowledge base that can be reused across agents and environments. SkillX operates through a fully automated pipeline built on three synergistic innovations: (i) Multi-Level Skills Design, which distills raw trajectories into a three-tiered hierarchy of strategic plans, functional skills, and atomic skills; (ii) Iterative Skills Refinement, which automatically revises skills based on execution feedback to continuously improve library quality; and (iii) Exploratory Skills Expansion, which proactively generates and validates novel skills to expand coverage beyond seed training data. Using a strong backbone agent (GLM-4.6), we automatically build a reusable skill library and evaluate its transferability on challenging long-horizon, user-interactive benchmarks, including AppWorld, BFCL-v3, and τ^2-Bench. Experiments show that the SkillX skill library consistently improves task success and execution efficiency when plugged into weaker base agents, highlighting the importance of structured, hierarchical experience representations for generalizable agent learning. Our code will be publicly available soon at https://github.com/zjunlp/SkillX.
- Vero: An Open RL Recipe for General Visual Reasoning
What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) show such broad visual reasoning is within reach, but the recipe behind them remains unclear, locked behind proprietary reinforcement learning (RL) pipelines with non-public data. We introduce Vero, a family of fully open VLMs that matches or exceeds existing open-weight models across diverse visual reasoning tasks. We scale RL data and rewards across six broad task categories, constructing Vero-600K, a 600K-sample dataset from 59 datasets, and designing task-routed rewards that handle heterogeneous answer formats. Vero achieves state-of-the-art performance, improving over four base models by 3.7-5.5 points on average across VeroEval, our suite of 30 challenging benchmarks. Starting from Qwen3-VL-8B-Instruct, Vero outperforms Qwen3-VL-8B-Thinking on 23 of 30 benchmarks without additional proprietary thinking data. When trained from the same base model, Vero-600K exceeds existing RL datasets across task categories. Systematic ablations reveal that different task categories elicit qualitatively distinct reasoning patterns that transfer poorly in isolation, suggesting that broad data coverage is the primary driver of strong RL scaling. All data, code, and models are released.
- Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose a substantial attack surface that existing sandboxed evaluations fail to capture. To address this gap, we present the first real-world safety evaluation of OpenClaw and introduce the CIK taxonomy, which unifies an agent's persistent state into three dimensions, i.e., Capability, Identity, and Knowledge, for safety analysis. Our evaluations cover 12 attack scenarios on a live OpenClaw instance across four backbone models (Claude Sonnet 4.5, Opus 4.6, Gemini 3.1 Pro, and GPT-5.4). The results show that poisoning any single CIK dimension increases the average attack success rate from 24.6% to 64-74%, with even the most robust model exhibiting more than a threefold increase over its baseline vulnerability. We further assess three CIK-aligned defense strategies alongside a file-protection mechanism; however, the strongest defense still yields a 63.8% success rate under Capability-targeted attacks, while file protection blocks 97% of malicious injections but also prevents legitimate updates. Taken together, these findings show that the vulnerabilities are inherent to the agent architecture, necessitating more systematic safeguards to secure personal AI agents. Our project page is https://ucsc-vlaa.github.io/CIK-Bench.
Techmeme (15)
- Z.ai releases GLM-5.1, a 754B-parameter model that it says outperforms GPT-5.4 and Claude Opus 4.6 on SWE-bench Pro, available under an MIT license (Carl Franzen/VentureBeat)
Carl Franzen / VentureBeat: Is China picking back up the open source AI baton? Z.ai, also known as Zhipu AI, a Chinese AI startup best known for its powerful …
- A group of US agencies including the FBI and the NSA warns that Iran-linked hackers have targeted industrial control devices used in US critical infrastructure (Andy Greenberg/Wired)
Andy Greenberg / Wired: As Trump threatens Iranian infrastructure, the US government warns that Iran has carried out its own digital attacks against US critical infrastructure.
- Google rolls out an AI Enhance button for Photos on Android globally, offering automated lighting and contrast adjustments, and video playback speed controls (Andrew Romero/9to5Google)
Andrew Romero / 9to5Google: Google announced two new additions to Google Photos for all Android users, and they've already begun rolling out.
- Elon Musk amends his OpenAI lawsuit to ask that damages he might win be awarded to OpenAI's charity arm and that Altman be removed from OpenAI's nonprofit board (Jessica Toonkel/Wall Street Journal)
Jessica Toonkel / Wall Street Journal: Tesla billionaire also seeks Sam Altman's removal from OpenAI nonprofit's board in amendment to suit over for-profit conversion.
- Anthropic says Mythos Preview achieves 93.9% on SWE-bench Verified, compared with 80.8% for Opus 4.6, and 77.8% on SWE-bench Pro, versus 53.4% for Opus 4.6 (Michael Nuñez/VentureBeat)
Michael Nuñez / VentureBeat: Anthropic on Tuesday announced Project Glasswing, a sweeping cybersecurity initiative that pairs an unreleased frontier AI model …
- Q&A with OpenAI President Greg Brockman about OpenAI's research direction, how far it can push Codex, closing Sora, betting on text vs. world models, and more (Alex Kantrowitz/Big Technology)
Alex Kantrowitz / Big Technology: OpenAI is shifting strategies yet again. Here's the logic behind the latest moves and what they mean for the company's direction.
- Anthropic says Mythos Preview is a general-purpose model and found thousands of high-severity vulnerabilities, including some in every major OS and web browser (Anthropic)
Anthropic: Earlier today we announced Claude Mythos Preview, a new general-purpose language model. This model performs strongly across the board …
- Google updates Chrome with vertical tabs, a feature that Mozilla Firefox and Microsoft Edge have long offered, and a new full-screen layout in reading mode (Lance Whitney/ZDNET)
Lance Whitney / ZDNET: Google has started rolling out vertical tabs in Chrome. With vertical tabs, all your open web pages appear in a sidebar.
- Interviews with Anthropic executives on why Claude Mythos Preview is a cybersecurity "reckoning", it is not releasing it publicly over misuse concerns, and more (Kevin Roose/New York Times)
Kevin Roose / New York Times: The company said on Tuesday that it was holding back on releasing the new technology but was working with 40 companies …
- Waterloo-based Mappedin, which uses AI and LiDAR to create and maintain 3D digital maps of indoor spaces, raised $24.5M, bringing its total funding to $35M (Chris Metinko/Axios)
Chris Metinko / Axios: Mappedin, a Canadian indoor digital mapping startup, raised a US$24.5 million growth equity financing led by Edison Partners, CEO Hongwei Liu tells Axios Pro exclusively.
- Anthropic's Project Glasswing launch partners include AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, Microsoft, Nvidia, and Palo Alto Networks (David Gewirtz/ZDNET)
David Gewirtz / ZDNET: AI found thousands of hidden bugs in critical systems. Tech rivals unite to secure shared infrastructure risks.
- Anthropic announces Project Glasswing, a cybersecurity initiative that will use its Claude Mythos Preview model to help find and fix software vulnerabilities (Anthropic)
Anthropic: Today we're announcing Project Glasswing, a new initiative that brings together Amazon Web Services, Anthropic, Apple, Broadcom …
- Anthropic commits up to $100M in usage credits for Project Glasswing, along with $4M in direct donations to open-source security organizations (Greg Otto/CyberScoop)
Greg Otto / CyberScoop: The program comes as the tech industry races to secure software before similar AI-powered offensive capabilities become too much for defenders to handle.
- Anthropic says it will make Claude Mythos Preview available to 40+ organizations that maintain critical software and doesn't plan to make it generally available (Lucas Ropek/TechCrunch)
Lucas Ropek / TechCrunch: Anthropic on Tuesday released a preview of its new frontier model, Mythos, which it says will be used by a small coterie of partner organizations for cybersecurity work.
- Sources say Apple will announce its foldable iPhone in September alongside the iPhone 18 Pro and iPhone 18 Pro Max, rebutting a report about production delays (Mark Gurman/Bloomberg)
Mark Gurman / Bloomberg: Apple Inc.'s first foldable phone is on track to arrive during the company's normal iPhone launch period later this year …
Solidot (15)
- Japanese scientists demonstrate a Wi-Fi receiver that survives six months of intense radiation inside a nuclear reactor
Japanese scientist Yasuto Narukiyo demonstrated at ISSCC a Wi-Fi receiver that can withstand six months of intense radiation inside a nuclear reactor. The receiver tolerates 500 kilograys, more than a thousand times the 100-300 grays that space electronics typically absorb over three years. Narukiyo noted that after the 2011 Fukushima nuclear accident, engineers used robots to survey and clean up the plant; most of those robots needed LAN cables, which tangled easily. His team's goal was a wireless system for controlling robots in such harsh environments. Even outside such extremes, nuclear plants must be cleaned up at end of life: many shut-down plants have yet to complete decommissioning, and some 200 more reactors will retire over the next two decades. Narukiyo's team used silicon MOSFET transistors, reducing the transistor count while reshaping the devices and widening their gates.
- Record wind and solar output spared the UK £1 billion in gas imports
Thanks to record wind and solar generation, the UK avoided importing £1 billion worth of natural gas in March 2026. Combined wind and solar output reached 11 TWh that month, up 28%, sparing the UK 21 TWh of gas imports — worth about £1 billion at current prices. Compared with March 2022, when Russia's invasion of Ukraine sent oil and gas prices soaring, gas prices after the Middle East conflict now have roughly 25% less influence on UK electricity prices.
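The 11 TWh of wind and solar electricity corresponds to 21 TWh of avoided gas imports because gas plants convert fuel to electricity at well under 100% efficiency. A back-of-the-envelope sketch, assuming a combined-cycle gas turbine efficiency of roughly 52% (an assumption for illustration; the item does not state the figure):

```python
# Sanity-check the avoided-gas-import claim from the item.
# Assumption (not from the article): combined-cycle gas plants convert
# fuel to electricity at ~52% efficiency, so each TWh of wind/solar
# displaces ~1.9 TWh of gas burned.
CCGT_EFFICIENCY = 0.52

wind_solar_twh = 11.0                        # March wind + solar generation
avoided_gas_twh = wind_solar_twh / CCGT_EFFICIENCY

# Implied gas price, given the article's ~GBP 1bn saving on 21 TWh.
saving_gbp = 1.0e9
price_gbp_per_mwh = saving_gbp / (21.0 * 1e6)  # 21 TWh = 21e6 MWh

print(f"avoided gas: {avoided_gas_twh:.1f} TWh")          # ~21.2 TWh
print(f"implied gas price: ~GBP {price_gbp_per_mwh:.0f}/MWh")
```

The result lands within rounding of the article's 21 TWh figure, which is consistent with the outlet converting electricity to avoided fuel via a plant-efficiency factor of this order.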
- Maintainers of popular NPM packages targeted by AI deepfake attacks
Several maintainers of popular NPM packages have been hit by similar AI-deepfake social-engineering attacks. axios maintainer Jason Saayman said hackers, suspected to be the APT group UNC1069, contacted him while impersonating a company's founder — cloning not only the founder's appearance but the company itself. They invited him into a real Slack workspace and even created channels sharing LinkedIn posts, making the setup highly convincing. The hackers then invited him to a Microsoft Teams video call, during which a prompt claimed his system had a problem; assuming it was a Teams issue, he installed the "missing component," which turned out to be a remote-access trojan. axios, which he maintains, sees 100 million weekly downloads and is widely used by cloud services and coding environments; the attackers stole his credentials and published a malicious version of axios. This was not an isolated incident: maintainers of several NPM packages with over 100 million weekly downloads have faced similar AI deepfake attacks.
- TDF says it revoked Collabora employees' memberships to comply with nonprofit law
The Document Foundation (TDF) has again responded on its official blog to its dispute with its main commercial partner, Collabora. TDF says that over the past several years it committed multiple violations of nonprofit law: allowing only companies inside its ecosystem to use the LibreOffice brand for free, and awarding LibreOffice development contracts — new features, bug fixes, and so on — to companies that held seats on the foundation's board and actively participated in procurement. After the foundation's legal counsel flagged these violations, the companies that had benefited tried to preserve the status quo rather than fix the problems. To avoid losing its nonprofit status and the unforeseeable consequences that would follow, TDF took action: revoking the memberships of Collabora employees, freezing tenders, introducing a development-procurement policy, and setting rules to reduce the risk of similar problems recurring.
- Linux prepares to drop support for i486 CPUs
Linux is preparing to remove support for i486 CPUs; a patch titled "x86/cpu: Remove M486/M486SX/ELAN support" is expected to be merged into Linux 7.1. Intel launched the 25 MHz i486 in April 1989, followed by the 33 MHz part in May 1990 and the 50 MHz part in June 1991, and kept producing i486 chips until 2007. The i486's successor was the Pentium, introduced in 1993; the i486 was also AMD's last clone of an Intel processor. The kernel has considered dropping i486 support several times in recent years, and Linus Torvalds recently said again that no more effort should be wasted on supporting it.
- Can Sam Altman be trusted?
The New Yorker has published a long article that opens with the failed board "coup" at OpenAI in the fall of 2023 and, drawing on internal memos never fully made public, recounts the past decade-plus of OpenAI's controversial CEO Sam Altman. It poses a question: if today's large models can lead to AGI (artificial general intelligence), then OpenAI and its CEO could control humanity's future — but can Sam be trusted? The article quotes an assessment by Aaron Swartz, his classmate in Y Combinator's first startup batch who died by suicide in 2013: "Sam can never be trusted. He's a sociopath. He'll do anything." Microsoft backed Sam during the failed 2023 "coup," but relations between the two are now strained, and Microsoft executives believe he may be remembered as a con man.
- More and more Japanese households have no TV
A growing share of Japanese households own no television. Among single men aged 29 and under, only 58.0% own a TV, down from 96.8% in 2005 and 76.2% in 2015. Ownership among households of two or more people fell from over 99% during 2000–2013 to 94.4%. In 2025, Japanese people spent an average of 440 minutes a day with media, of which television accounted for only about a quarter — down from more than half in 2006.
- Survey finds 46% of Russian users have used a VPN
With as many as 4.7 million websites blacklisted in Russia — including the major social platforms Facebook, Instagram, YouTube, and X — VPNs have become a daily necessity for millions of Russians. Despite an intensified government crackdown, the number of VPN users keeps growing. One Moscow resident said he turns his VPN off only when using official apps that are incompatible with it; a 29-year-old social-media marketer said her job requires a VPN to access restricted platforms like Instagram and YouTube. Russia's VPN adoption is estimated to be the second highest in the world, with about 37.6% of internet users relying on one. A 2025 survey by the Institute of Social Marketing found that 46% of respondents had used a VPN at least once. Unable to ban VPNs outright, mobile operators are considering charging steep fees for cross-border traffic, turning access to the international internet into a privilege for the few.
- US AI companies join forces to curb distillation by Chinese firms
Bloomberg reports that OpenAI, Anthropic, and Google have begun cooperating to stop Chinese competitors from extracting outputs from advanced US AI models to gain an edge in the global AI race. The rare collaboration underscores how seriously US AI companies take the issue. They worry that some users, particularly in China, are building imitation versions of their products that could undercut them on price while also posing national-security risks. US officials estimate that unauthorized distillation costs Silicon Valley labs billions of dollars in profit each year.
- Artemis II sets the record for the farthest human spaceflight
NASA's Artemis II set the record for the farthest human spaceflight at 12:56 p.m. CDT on April 6. Artemis II is a ten-day crewed lunar-flyby mission that launched on April 1, with four days now remaining. Its Orion spacecraft has traveled 248,655 miles, surpassing the human-spaceflight distance record set by Apollo 13 in 1970. Orion will reach a maximum distance of 252,756 miles before returning to Earth, with splashdown off the coast of San Diego expected at 8:07 p.m. EDT on April 10.
- EU loses 92 GB of compressed data in a supply-chain attack
The hacking group TeamPCP pushed malicious code by exploiting incomplete credential rotation after Trivy's GitHub repository was compromised in late February. Trivy is a widely used open-source vulnerability scanner. The European Commission's automated security pipeline downloaded a Trivy update carrying the malicious code, which stole AWS API keys; the attackers then accessed the Commission's AWS cloud account and exfiltrated 92 GB of compressed data. The attack began on March 19, but the security team did not detect the anomalous activity until March 24. A second group, ShinyHunters, published part of the stolen data — one cybercrime crew handled the breach and another the leak, highlighting the growing professional division of labor in cybercrime. The published dataset decompresses to 340 GB and contains tens of thousands of emails and personal records.
- Germany unmasks the leader of the Russian ransomware group REvil
Germany has revealed the identity of UNKN, who led the Russian ransomware groups GandCrab and REvil in their early days. Daniil Maksimovich Shchukin, 31, helped carry out at least 130 computer-sabotage and extortion operations in Germany between 2019 and 2021. German authorities say Shchukin and another Russian, 43-year-old Anatoly Sergeevitsch Kravchuk, together extorted nearly €2 million and caused more than €35 million in economic damage. The Federal Criminal Police Office (BKA) says the GandCrab and REvil operations he led pioneered double extortion: first charging victims a ransom for the decryption key, then charging a second fee in exchange for a promise not to publish the stolen data. GandCrab announced its dissolution in 2019 after extorting over $2 billion, only to resurface under the name REvil.
- Chrome 148 will lazy-load video and audio to improve performance
Browsers such as Chrome and Firefox support lazy loading: to speed up page loads, specific objects are deferred and only fetched when they are about to be used. Chrome has lazy-loaded images and iframes since 2019, and it is now testing lazy loading of video and audio in Chrome 148 to improve browser performance. Many sites today — news sites especially — embed video and audio in their pages, slowing page loads.
- Colorado rolls out average-speed camera systems
Drivers today have plenty of ways to dodge speed cameras: phone apps warn of a camera ahead, so drivers slow down and speed back up once past it. To curb this, a growing number of jurisdictions are deploying average-speed camera systems, which track the same vehicle's average speed between multiple checkpoints and issue a ticket if that average exceeds the limit by 10 mph or more. Colorado passed a law in 2023 allowing automated vehicle-identification systems to compute a car's average speed between cameras, and late last year state police began formally ticketing speeders. The fine is $75 with no license points, and the ticket goes to the vehicle's owner regardless of who was driving. Mapping software such as TomTom has responded by giving drivers their average-speed readout through enforcement zones.
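The checkpoint logic described above is simple arithmetic: distance between cameras divided by elapsed time, with a ticket only when the result clears the limit by the 10 mph margin. A minimal sketch with hypothetical numbers (the 10 mph threshold and the ticketing rule are from the item; the distances, times, and limits below are made up):

```python
from dataclasses import dataclass

@dataclass
class CheckpointPair:
    distance_miles: float   # distance between the two cameras
    elapsed_hours: float    # time between the two plate reads

def average_speed_mph(cp: CheckpointPair) -> float:
    return cp.distance_miles / cp.elapsed_hours

def should_ticket(cp: CheckpointPair, limit_mph: float, margin_mph: float = 10.0) -> bool:
    """Ticket only when average speed exceeds the limit by the margin or more."""
    return average_speed_mph(cp) - limit_mph >= margin_mph

# A car covers 2 miles between cameras in 90 seconds: an 80 mph average.
cp = CheckpointPair(distance_miles=2.0, elapsed_hours=90 / 3600)
print(average_speed_mph(cp))            # 80.0
print(should_ticket(cp, limit_mph=65))  # True  (80 - 65 >= 10)
print(should_ticket(cp, limit_mph=75))  # False (80 - 75 < 10)
```

Because the measurement integrates the whole segment, briefly braking at each camera no longer helps: any burst of speed between checkpoints still raises the average.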
- "Cognitive surrender" leads AI users to abandon logical thinking
Users of AI tools generally fall into two camps: those who treat AI as a powerful but fallible service requiring careful human oversight to catch reasoning or factual errors, and those who treat it as all-knowing — the latter dubbed "cognitive surrenderers." Researchers at the University of Pennsylvania's Wharton School, after studying 1,372 participants across more than 9,500 trials, found that participants accepted the AI's flawed reasoning 73.2% of the time and overturned it only 19.7% of the time. The researchers say the results "suggest that people readily fold AI-generated output into their decision-making, usually with little resistance or skepticism," and that "fluent, confident output is treated as cognitively authoritative, lowering the bar for scrutiny and weakening the metacognitive signals that would normally prompt deliberation." They also found that people inclined to treat AI as an authority were more easily misled by its wrong answers.