OrangeBot.AI Digest — 2026-03-05

86 headlines across 8 sources, aggregated for this day.

Hacker News(15)

  1. Pentagon formally labels Anthropic supply-chain risk (www.wsj.com)
  2. GPT-5.4 (openai.com)
  3. The government uses targeted advertising to track your location (www.eff.org)
  4. A GitHub Issue Title Compromised 4k Developer Machines (grith.ai)
  5. Wikipedia was in read-only mode following mass admin account compromise (www.wikimediastatus.net)
  6. Show HN: Jido 2.0, Elixir Agent Framework (jido.run)
  7. Google Safe Browsing missed 84% of confirmed phishing sites (www.norn-labs.com)
  8. Good software knows when to stop (ogirardot.writizzy.com)
  9. Judge orders government to begin refunding more than $130B in tariffs (www.wsj.com)
  10. Poor Man's Polaroid (boxart.lt)
  11. No right to relicense this project (github.com)
  12. Nvidia PersonaPlex 7B on Apple Silicon: Full-Duplex Speech-to-Speech in Swift (blog.ivan.digital)
  13. The L in "LLM" Stands for Lying (acko.net)
  14. AMD will bring its “Ryzen AI” processors to standard desktop PCs for first time (arstechnica.com)
  15. You Just Received (dylan.gr)

GitHub Trending(11)

  1. msitarzewski / agency-agents
  2. TheCraigHewitt / seomachine
  3. KeygraphHQ / shannon
  4. aquasecurity / trivy
  5. moeru-ai / airi
  6. inclusionAI / AReaL
  7. microsoft / mcp-for-beginners
  8. CodebuffAI / codebuff
  9. FujiwaraChoki / MoneyPrinterV2
  10. agentscope-ai / ReMe
  11. microsoft / hve-core

Product Hunt(15)

  1. Step 3.5 Flash

    Frontier open-source MoE model built for OpenClaw agents

  2. Itchy

    Free macOS notch app with 12+ modules & custom SDK

  3. Codex app for Windows

    Codex now runs natively on Windows with secure sandbox

  4. Parsewise

    Cursor for document work

  5. Aident AI Beta 2

    Open-world automations, managed in plain English

  6. HookLens

    Hook. Body. CTA. Know exactly where your ad fails.

  7. GitSync Lite for macOS

    Monitor, sync & back up your git repos from the menu bar

  8. Willow Voice for Teams

    Kill the keyboard for your team with voice AI

  9. MacBook Neo

    The magic of Mac at a surprising price

  10. Hermit

    Leave ChatGPT while keeping everything it learned about you

  11. Supa Social

    Self-host your community platform

  12. Vois

    Studio-quality voice AI that runs locally on your desktop.

  13. Spoke

    Private voice-to-text for macOS. Hold a key, speak, done.

  14. Itsyconnect

    Manage your App Store Connect from macOS desktop app

  15. Heywa

    Tappable visual stories instead of ChatGPT text walls

Hugging Face(15)

  1. Helios: Real Real-Time Long Video Generation Model

    We introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline. We make breakthroughs along three key dimensions: (1) robustness to long-video drifting without commonly used anti-drifting heuristics such as self-forcing, error-banks, or keyframe sampling; (2) real-time generation without standard acceleration techniques such as KV-cache, sparse/linear attention, or quantization; and (3) training without parallelism or sharding frameworks, enabling image-diffusion-scale batch sizes while fitting up to four 14B models within 80 GB of GPU memory. Specifically, Helios is a 14B autoregressive diffusion model with a unified input representation that natively supports T2V, I2V, and V2V tasks. To mitigate drifting in long-video generation, we characterize typical failure modes and propose simple yet effective training strategies that explicitly simulate drifting during training, while eliminating repetitive motion at its source. For efficiency, we heavily compress the historical and noisy context and reduce the number of sampling steps, yielding computational costs comparable to -- or lower than -- those of 1.3B video generative models. Moreover, we introduce infrastructure-level optimizations that accelerate both inference and training while reducing memory consumption. Extensive experiments demonstrate that Helios consistently outperforms prior methods on both short- and long-video generation. We plan to release the code, base model, and distilled model to support further development by the community.

  2. T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning

    Think about how humans handle complex reading tasks: marking key points, inferring their relationships, and structuring information to guide understanding and responses. Likewise, can a large language model benefit from text structure to enhance text-processing performance? To explore this, in this work, we first introduce Structure of Thought (SoT), a prompting technique that explicitly guides models to construct intermediate text structures, consistently boosting performance across eight tasks and three model families. Building upon this insight, we present T2S-Bench, the first benchmark designed to evaluate and improve text-to-structure capabilities of models. T2S-Bench includes 1.8K samples across 6 scientific domains and 32 structural types, rigorously constructed to ensure accuracy, fairness, and quality. Evaluation on 45 mainstream models reveals substantial improvement potential: the average accuracy on the multi-hop reasoning task is only 52.1%, and even the most advanced model achieves 58.1% node accuracy in end-to-end extraction. Furthermore, on Qwen2.5-7B-Instruct, SoT alone yields an average +5.7% improvement across eight diverse text-processing tasks, and fine-tuning on T2S-Bench further increases this gain to +8.6%. These results highlight the value of explicit text structuring and the complementary contributions of SoT and T2S-Bench. Dataset and eval code have been released at https://t2s-bench.github.io/T2S-Bench-Page/.

  3. Heterogeneous Agent Collaborative Reinforcement Learning

    We introduce Heterogeneous Agent Collaborative Reinforcement Learning (HACRL), a new learning paradigm that addresses the inefficiencies of isolated on-policy optimization. HACRL enables collaborative optimization with independent execution: heterogeneous agents share verified rollouts during training to mutually improve, while operating independently at inference time. Unlike LLM-based multi-agent reinforcement learning (MARL), HACRL does not require coordinated deployment, and unlike on-/off-policy distillation, it enables bidirectional mutual learning among heterogeneous agents rather than one-directional teacher-to-student transfer. Building on this paradigm, we propose HACPO, a collaborative RL algorithm that enables principled rollout sharing to maximize sample utilization and cross-agent knowledge transfer. To mitigate capability discrepancies and policy distribution shifts, HACPO introduces four tailored mechanisms with theoretical guarantees on unbiased advantage estimation and optimization correctness. Extensive experiments across diverse heterogeneous model combinations and reasoning benchmarks show that HACPO consistently improves all participating agents, outperforming GSPO by an average of 3.3\% while using only half the rollout cost.

  4. Proact-VL: A Proactive VideoLLM for Real-Time AI Companions

    Proactive and real-time interactive experiences are essential for human-like AI companions, yet face three key challenges: (1) achieving low-latency inference under continuous streaming inputs, (2) autonomously deciding when to respond, and (3) controlling both quality and quantity of generated content to meet real-time constraints. In this work, we instantiate AI companions through two gaming scenarios, commentator and guide, selected for their suitability for automatic evaluation. We introduce the Live Gaming Benchmark, a large-scale dataset with three representative scenarios: solo commentary, co-commentary, and user guidance, and present Proact-VL, a general framework that shapes multimodal language models into proactive, real-time interactive agents capable of human-like environment perception and interaction. Extensive experiments show Proact-VL achieves superior response latency and quality while maintaining strong video understanding capabilities, demonstrating its practicality for real-time interactive applications.

  5. MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning

    As Large Language Models (LLMs) are increasingly used for long-duration tasks, maintaining effective long-term memory has become a critical challenge. Current methods often face a trade-off between cost and accuracy. Simple storage methods often fail to retrieve relevant information, while complex indexing methods (such as memory graphs) require heavy computation and can cause information loss. Furthermore, relying on the working LLM to process all memories is computationally expensive and slow. To address these limitations, we propose MemSifter, a novel framework that offloads the memory retrieval process to a small-scale proxy model. Instead of increasing the burden on the primary working LLM, MemSifter uses a smaller model to reason about the task before retrieving the necessary information. This approach requires no heavy computation during the indexing phase and adds minimal overhead during inference. To optimize the proxy model, we introduce a memory-specific Reinforcement Learning (RL) training paradigm. We design a task-outcome-oriented reward based on the working LLM's actual performance in completing the task. The reward measures the actual contribution of retrieved memories through multiple interactions with the working LLM, and discriminates among retrieval rankings via stepped, decreasing contributions. Additionally, we employ training techniques such as Curriculum Learning and Model Merging to improve performance. We evaluated MemSifter on eight LLM memory benchmarks, including Deep Research tasks. The results demonstrate that our method meets or exceeds the performance of existing state-of-the-art approaches in both retrieval accuracy and final task completion. MemSifter offers an efficient and scalable solution for long-term LLM memory. We have open-sourced the model weights, code, and training data to support further research.

  6. ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors

    Synthesizing physically plausible articulated human-object interactions (HOI) without 3D/4D supervision remains a fundamental challenge. While recent zero-shot approaches leverage video diffusion models to synthesize human-object interactions, they are largely confined to rigid-object manipulation and lack explicit 4D geometric reasoning. To bridge this gap, we formulate articulated HOI synthesis as a 4D reconstruction problem from monocular video priors: given only a video generated by a diffusion model, we reconstruct a full 4D articulated scene without any 3D supervision. This reconstruction-based approach treats the generated 2D video as supervision for an inverse rendering problem, recovering geometrically consistent and physically plausible 4D scenes that naturally respect contact, articulation, and temporal coherence. We introduce ArtHOI, the first zero-shot framework for articulated human-object interaction synthesis via 4D reconstruction from video priors. Our key designs are: 1) Flow-based part segmentation: leveraging optical flow as a geometric cue to disentangle dynamic from static regions in monocular video; 2) Decoupled reconstruction pipeline: joint optimization of human motion and object articulation is unstable under monocular ambiguity, so we first recover object articulation, then synthesize human motion conditioned on the reconstructed object states. ArtHOI bridges video-based generation and geometry-aware reconstruction, producing interactions that are both semantically aligned and physically grounded. Across diverse articulated scenes (e.g., opening fridges, cabinets, microwaves), ArtHOI significantly outperforms prior methods in contact accuracy, penetration reduction, and articulation fidelity, extending zero-shot interaction synthesis beyond rigid manipulation through reconstruction-informed synthesis.

  7. Phi-4-reasoning-vision-15B Technical Report

    We present Phi-4-reasoning-vision-15B, a compact open-weight multimodal reasoning model, and share the motivations, design choices, experiments, and learnings that informed its development. Our goal is to contribute practical insight to the research community on building smaller, efficient multimodal reasoning models and to share the result of these learnings as an open-weight model that is good at common vision and language tasks and excels at scientific and mathematical reasoning and understanding user interfaces. Our contributions include demonstrating that careful architecture choices and rigorous data curation enable smaller, open-weight multimodal models to achieve competitive performance with significantly less training and inference-time compute and tokens. The most substantial improvements come from systematic filtering, error correction, and synthetic augmentation -- reinforcing that data quality remains the primary lever for model performance. Systematic ablations show that high-resolution, dynamic-resolution encoders yield consistent improvements, as accurate perception is a prerequisite for high-quality reasoning. Finally, a hybrid mix of reasoning and non-reasoning data with explicit mode tokens allows a single model to deliver fast direct answers for simpler tasks and chain-of-thought reasoning for complex problems.

  8. CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video

    Generating high-quality 360° panoramic videos from perspective input is one of the crucial applications for virtual reality (VR), whereby high-resolution videos are especially important for immersive experience. Existing methods are constrained by computational limitations of vanilla diffusion models, only supporting ≤1K-resolution native generation and relying on suboptimal post super-resolution to increase resolution. We introduce CubeComposer, a novel spatio-temporal autoregressive diffusion model that natively generates 4K-resolution 360° videos. By decomposing videos into cubemap representations with six faces, CubeComposer autoregressively synthesizes content in a well-planned spatio-temporal order, reducing memory demands while enabling high-resolution output. Specifically, to address challenges in multi-dimensional autoregression, we propose: (1) a spatio-temporal autoregressive strategy that orchestrates 360° video generation across cube faces and time windows for coherent synthesis; (2) a cube face context management mechanism, equipped with a sparse context attention design to improve efficiency; and (3) continuity-aware techniques, including cube-aware positional encoding, padding, and blending to eliminate boundary seams. Extensive experiments on benchmark datasets demonstrate that CubeComposer outperforms state-of-the-art methods in native resolution and visual quality, supporting practical VR application scenarios. Project page: https://lg-li.github.io/project/cubecomposer

  9. Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

    Large language model (LLM) agents are fundamentally bottlenecked by finite context windows on long-horizon tasks. As trajectories grow, retaining tool outputs and intermediate reasoning in-context quickly becomes infeasible: the working context becomes prohibitively long, eventually exceeds the context budget, and makes distant evidence harder to use even when it is still present. Existing solutions typically shorten context through truncation or running summaries, but these methods are fundamentally lossy because they compress or discard past evidence itself. We introduce Memex, an indexed experience memory mechanism that instead compresses context without discarding evidence. Memex maintains a compact working context consisting of concise structured summaries and stable indices, while storing full-fidelity underlying interactions in an external experience database under those indices. The agent can then decide when to dereference an index and recover the exact past evidence needed for the current subgoal. We optimize both write and read behaviors with our reinforcement learning framework MemexRL, using reward shaping tailored to indexed memory usage under a context budget, so the agent learns what to summarize, what to archive, how to index it, and when to retrieve it. This yields a substantially less lossy form of long-horizon memory than summary-only approaches. We further provide a theoretical analysis showing the potential of the Memex loop to preserve decision quality with bounded dereferencing while keeping effective in-context computation bounded as history grows. Empirically, on challenging long-horizon tasks, Memex agent trained with MemexRL improves task success while using a significantly smaller working context.
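
The indexed-memory loop described in the abstract (archive full records under stable indices, keep only summaries in context, dereference on demand) can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual API; `IndexedMemory`, `write`, and `dereference` are invented names.

```python
# Hypothetical sketch of an indexed experience memory in the spirit of the
# Memex abstract: full-fidelity records are archived under stable indices,
# the working context keeps only short summaries, and the agent can
# dereference an index to recover the exact original evidence.

class IndexedMemory:
    def __init__(self):
        self._store = {}      # index -> full record (never truncated)
        self._context = []    # compact working context: (index, summary)
        self._next_id = 0

    def write(self, record: str, summary: str) -> str:
        """Archive the full record; keep only its summary in context."""
        idx = f"mem-{self._next_id}"
        self._next_id += 1
        self._store[idx] = record
        self._context.append((idx, summary))
        return idx

    def working_context(self) -> str:
        """What the agent actually sees: indices plus summaries."""
        return "\n".join(f"[{idx}] {s}" for idx, s in self._context)

    def dereference(self, idx: str) -> str:
        """Recover the exact past evidence stored under an index."""
        return self._store[idx]


mem = IndexedMemory()
idx = mem.write(record="full 40KB tool output ...", summary="grep results for `foo`")
assert mem.dereference(idx).startswith("full 40KB")
```

The point of the design is that the working context grows only with the number of summaries, while the archive keeps everything losslessly; what the RL framework learns (per the abstract) is the policy for when to write, how to summarize, and when to dereference.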

  10. AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models

    Large Vision-Language Models (LVLMs) have adopted visual token pruning strategies to mitigate substantial computational overhead incurred by extensive visual token sequences. While prior works primarily focus on either attention-based or diversity-based pruning methods, in-depth analysis of these approaches' characteristics and limitations remains largely unexplored. In this work, we conduct thorough empirical analysis using effective rank (erank) as a measure of feature diversity and attention score entropy to investigate visual token processing mechanisms and analyze the strengths and weaknesses of each approach. Our analysis reveals two insights: (1) Our erank-based quantitative analysis shows that many diversity-oriented pruning methods preserve substantially less feature diversity than intended; moreover, analysis using the CHAIR dataset reveals that the diversity they do retain is closely tied to increased hallucination frequency compared to attention-based pruning. (2) We further observe that attention-based approaches are more effective on simple images where visual evidence is concentrated, while diversity-based methods better handle complex images with distributed features. Building on these empirical insights, we show that incorporating image-aware adjustments into existing hybrid pruning strategies consistently improves their performance. We also provide a minimal instantiation of our empirical findings through a simple adaptive pruning mechanism, which achieves strong and reliable performance across standard benchmarks as well as hallucination-specific evaluations. Our project page available at https://cvsp-lab.github.io/AgilePruner.
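
The effective rank (erank) used in this abstract as a diversity measure has a standard definition: the exponential of the Shannon entropy of the normalized singular-value distribution. A minimal sketch under that assumption (the paper's exact computation may differ):

```python
# Effective rank (erank) of a feature matrix: exp of the Shannon entropy of
# the singular values normalized to sum to 1. Higher erank = more diverse
# features. A minimal sketch of the measure named in the abstract.
import numpy as np

def effective_rank(features: np.ndarray) -> float:
    """erank = exp(H(p)), where p are singular values normalized to sum to 1."""
    s = np.linalg.svd(features, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                       # drop exactly-zero singular values
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
diverse = rng.standard_normal((64, 32))        # near full-rank token features
redundant = np.tile(diverse[:1], (64, 1))      # 64 copies of one token
print(effective_rank(diverse) > effective_rank(redundant))  # True
```

With this measure, a pruning method that claims to preserve diversity can be checked directly: compare the erank of the retained tokens against the erank of the full set.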

  11. V_1: Unifying Generation and Self-Verification for Parallel Reasoners

    Test-time scaling for complex reasoning tasks shows that leveraging inference-time compute, by methods such as independently sampling and aggregating multiple solutions, results in significantly better task outcomes. However, a critical bottleneck is verification: sampling is only effective if correct solutions can be reliably identified among candidates. While existing approaches typically evaluate candidates independently via scalar scoring, we demonstrate that models are substantially stronger at pairwise self-verification. Leveraging this insight, we introduce V_1, a framework that unifies generation and verification through efficient pairwise ranking. V_1 comprises two components: V_1-Infer, an uncertainty-guided algorithm using a tournament-based ranking that dynamically allocates self-verification compute to candidate pairs whose relative correctness is most uncertain; and V_1-PairRL, an RL framework that jointly trains a single model as both generator and pairwise self-verifier, ensuring the verifier adapts to the generator's evolving distribution. On code generation (LiveCodeBench, CodeContests, SWE-Bench) and math reasoning (AIME, HMMT) benchmarks, V_1-Infer improves Pass@1 by up to 10% over pointwise verification and outperforms recent test-time scaling methods while being significantly more efficient. Furthermore, V_1-PairRL achieves 7--9% test-time scaling gains over standard RL and pointwise joint training, and improves base Pass@1 by up to 8.7% over standard RL in a code-generation setting.
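
The tournament-based ranking at the core of V_1-Infer can be sketched as a single-elimination bracket over candidate solutions, where each match is one pairwise self-verification call. Here `compare` is a stand-in for that model call, and the uncertainty-guided compute allocation the abstract describes is omitted; all names are illustrative.

```python
# A toy single-elimination tournament over candidate solutions, sketching
# the pairwise-ranking idea from the V_1 abstract. `compare(a, b)` stands
# in for the model's pairwise self-verification (True if a is judged more
# likely correct than b).

def tournament_winner(candidates, compare):
    """Reduce candidates to one winner via pairwise comparisons."""
    pool = list(candidates)
    while len(pool) > 1:
        nxt = []
        # Pair adjacent candidates; an odd candidate out gets a bye.
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            nxt.append(a if compare(a, b) else b)
        if len(pool) % 2 == 1:
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]

# Toy comparator: prefer the numerically larger "solution".
best = tournament_winner([3, 9, 1, 7, 5], compare=lambda a, b: a > b)
print(best)  # 9
```

For n candidates this uses n - 1 comparisons, versus n(n-1)/2 for a full round-robin, which is why pairwise verification can remain cheaper than it first appears.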

  12. InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

    Generating long-form storytelling videos with consistent visual narratives remains a significant challenge in video synthesis. We present a novel framework, dataset, and a model that address three critical limitations: background consistency across shots, seamless multi-subject shot-to-shot transitions, and scalability to hour-long narratives. Our approach introduces a background-consistent generation pipeline that maintains visual coherence across scenes while preserving character identity and spatial relationships. We further propose a transition-aware video synthesis module that generates smooth shot transitions for complex scenarios involving multiple subjects entering or exiting frames, going beyond the single-subject limitations of prior work. To support this, we contribute a synthetic dataset of 10,000 multi-subject transition sequences covering underrepresented dynamic scene compositions. On VBench, InfinityStory achieves the highest Background Consistency (88.94), highest Subject Consistency (82.11), and the best overall average rank (2.80), showing improved stability, smoother transitions, and better temporal coherence.

  13. RIVER: A Real-Time Interaction Benchmark for Video LLMs

    The rapid advancement of multimodal large language models has demonstrated impressive capabilities, yet nearly all operate in an offline paradigm, hindering real-time interactivity. Addressing this gap, we introduce the Real-tIme Video intERaction Bench (RIVER Bench), designed for evaluating online video comprehension. RIVER Bench introduces a novel framework comprising Retrospective Memory, Live-Perception, and Proactive Anticipation tasks, closely mimicking interactive dialogues rather than responding to entire videos at once. We conducted detailed annotations using videos from diverse sources and varying lengths, and precisely defined the real-time interactive format. Evaluations across various model categories reveal that while offline models perform well in single question-answering tasks, they struggle with real-time processing. Addressing the limitations of existing models in online video interaction, especially their deficiencies in long-term memory and future perception, we proposed a general improvement method that enables models to interact with users more flexibly in real time. We believe this work will significantly advance the development of real-time interactive video understanding models and inspire future research in this emerging field. Datasets and code are publicly available at https://github.com/OpenGVLab/RIVER.

  14. SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

    Large language model (LLM)-powered agents have demonstrated strong capabilities in automating software engineering tasks such as static bug fixing, as evidenced by benchmarks like SWE-bench. However, in the real world, the development of mature software is typically predicated on complex requirement changes and long-term feature iterations -- a process that static, one-shot repair paradigms fail to capture. To bridge this gap, we propose SWE-CI, the first repository-level benchmark built upon the Continuous Integration loop, aiming to shift the evaluation paradigm for code generation from static, short-term functional correctness toward dynamic, long-term maintainability. The benchmark comprises 100 tasks, each corresponding on average to an evolution history spanning 233 days and 71 consecutive commits in a real-world code repository. SWE-CI requires agents to systematically resolve these tasks through dozens of rounds of analysis and coding iterations. SWE-CI provides valuable insights into how well agents can sustain code quality throughout long-term evolution.

  15. MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

    Safety evaluation and red-teaming of large language models remain predominantly text-centric, and existing frameworks lack the infrastructure to systematically test whether alignment generalizes to audio, image, and video inputs. We present MUSE (Multimodal Unified Safety Evaluation), an open-source, run-centric platform that integrates automatic cross-modal payload generation, three multi-turn attack algorithms (Crescendo, PAIR, Violent Durian), provider-agnostic model routing, and an LLM judge with a five-level safety taxonomy into a single browser-based system. A dual-metric framework distinguishes hard Attack Success Rate (Compliance only) from soft ASR (including Partial Compliance), capturing partial information leakage that binary metrics miss. To probe whether alignment generalizes across modality boundaries, we introduce Inter-Turn Modality Switching (ITMS), which augments multi-turn attacks with per-turn modality rotation. Experiments across six multimodal LLMs from four providers show that multi-turn strategies can achieve up to 90-100% ASR against models with near-perfect single-turn refusal. ITMS does not uniformly raise final ASR on already-saturated baselines, but accelerates convergence by destabilizing early-turn defenses, and ablation reveals that the direction of modality effects is model-family-specific rather than universal, underscoring the need for provider-aware cross-modal safety testing.
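
The dual-metric framework described here (hard ASR counts only full Compliance; soft ASR also counts Partial Compliance) reduces to a simple computation over judge verdicts. The verdict labels below are illustrative; MUSE's actual five-level taxonomy may use different names.

```python
# Sketch of the dual-metric attack-success-rate idea from the MUSE abstract:
# hard ASR counts only full-Compliance verdicts, while soft ASR also counts
# Partial Compliance, capturing partial information leakage.

def attack_success_rates(verdicts):
    """Return (hard_asr, soft_asr) over a list of judge verdicts."""
    n = len(verdicts)
    hard = sum(v == "Compliance" for v in verdicts)
    soft = sum(v in ("Compliance", "Partial Compliance") for v in verdicts)
    return hard / n, soft / n

verdicts = ["Refusal", "Compliance", "Partial Compliance", "Refusal", "Compliance"]
hard, soft = attack_success_rates(verdicts)
print(hard, soft)  # 0.4 0.6
```

The gap between the two numbers is itself informative: a large soft-minus-hard margin means the model is leaking partial information even when it never fully complies, which a binary metric would miss.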

Techmeme(15)

  1. Sources: Together AI is in talks to raise ~$1B at a $7.5B pre-money valuation, up from $3.3B in 2025; its annualized revenue has hit ~$1B, up 3x+ from mid-2025 (The Information)

    The Information : Sources: Together AI is in talks to raise ~$1B at a $7.5B pre-money valuation, up from $3.3B in 2025; its annualized revenue has hit ~$1B, up 3x+ from mid-2025 —  Together AI, one of several up-and-coming cloud providers renting out Nvidia chip servers to AI developers …

  2. Anthropic says Claude's free active users grew 60%+ and daily signups grew 4x since the start of the year, with Monday being its strongest day ever (Shirin Ghaffary/Bloomberg)

    Shirin Ghaffary / Bloomberg : Anthropic says Claude's free active users grew 60%+ and daily signups grew 4x since the start of the year, with Monday being its strongest day ever —  The Claude maker gains new traction with everyday users while its enterprise business is under pressure  —  Anthropic is gaining ground …

  3. Anthropic launches an early-warning system for potential AI-driven destruction of white-collar jobs, says it shows "limited evidence" of AI-led job loss so far (Courtenay Brown/Axios)

    Courtenay Brown / Axios : Anthropic launches an early-warning system for potential AI-driven destruction of white-collar jobs, says it shows “limited evidence” of AI-led job loss so far —  - An occupation's specific tasks;  — An estimate of which of those tasks can be performed by large language models.

  4. Microsoft's new gaming CEO, Asha Sharma, teases the next-gen Xbox, codenamed Project Helix, saying it "will lead in performance and play your Xbox and PC games" (Jay Peters/The Verge)

    Jay Peters / The Verge : Microsoft's new gaming CEO, Asha Sharma, teases the next-gen Xbox, codenamed Project Helix, saying it “will lead in performance and play your Xbox and PC games” —  The new Xbox boss Asha Sharma revealed the codename as one of her first big announcements.

  5. X revamps its Creator Subscriptions with exclusive threads, a refreshed subscriptions paywall, a new dashboard, a shareable subscriptions card, and more (Sarah Perez/TechCrunch)

    Sarah Perez / TechCrunch : X revamps its Creator Subscriptions with exclusive threads, a refreshed subscriptions paywall, a new dashboard, a shareable subscriptions card, and more —  Elon Musk-owned X announced on Thursday that it's revamping the social network's Creator Subscriptions offering with a number of new features …

  6. Meta says it hired the engineering team from Atma Sciences, the startup that makes the vibe coding app Gizmo, earlier in 2026 to join its Superintelligence Labs (Sydney Bradley/Business Insider)

    Sydney Bradley / Business Insider : Meta says it hired the engineering team from Atma Sciences, the startup that makes the vibe coding app Gizmo, earlier in 2026 to join its Superintelligence Labs —  - Meta hired the engineers behind the vibe-coding app Gizmo.  — The app lets people use AI to create and share interactive content, like mini apps or games.

  7. GPT-5.4 is priced at $2.50/1M input and $15/1M output tokens while GPT-5.4 Pro is $30/1M input and $180/1M output tokens, more than GPT-5.2 and GPT-5.2 Pro (Carl Franzen/VentureBeat)

    Carl Franzen / VentureBeat : GPT-5.4 is priced at $2.50/1M input and $15/1M output tokens while GPT-5.4 Pro is $30/1M input and $180/1M output tokens, more than GPT-5.2 and GPT-5.2 Pro —  The AI updates aren't slowing down.  Literally two days after OpenAI launched a new underlying AI model for ChatGPT called GPT-5.3 Instant …

  8. A senior US defense official says the Pentagon formally told Anthropic that the startup and its products "are deemed a supply chain risk, effective immediately" (Bloomberg)

    Bloomberg : A senior US defense official says the Pentagon formally told Anthropic that the startup and its products “are deemed a supply chain risk, effective immediately” —  The Pentagon said it has formally notified Anthropic PBC that it's determined the company and its products pose …

  9. OpenAI says users can now use ChatGPT directly in Microsoft Excel and Google Sheets and debuts a suite of financial-services tools to better tackle office work (Rachel Metz/Bloomberg)

    Rachel Metz / Bloomberg : OpenAI says users can now use ChatGPT directly in Microsoft Excel and Google Sheets and debuts a suite of financial-services tools to better tackle office work —  OpenAI is releasing a new flagship artificial intelligence model and a suite of financial-services tools that are meant to be better …

  10. OpenAI says GPT-5.4's "individual claims are 33% less likely to be false and its full responses are 18% less likely to contain any errors, relative to GPT-5.2" (David Gewirtz/ZDNET)

    David Gewirtz / ZDNET : OpenAI says GPT-5.4's “individual claims are 33% less likely to be false and its full responses are 18% less likely to contain any errors, relative to GPT-5.2” —  ZDNET's key takeaways  — GPT-5.4's 83% score suggests AI rivals expert professionals.

  11. GPT-5.4 is available in Pro and Thinking versions; its API version has improved tool calling and will be available with context windows of up to 1M tokens (Russell Brandom/TechCrunch)

    Russell Brandom / TechCrunch : GPT-5.4 is available in Pro and Thinking versions; its API version has improved tool calling and will be available with context windows of up to 1M tokens —  On Thursday, OpenAI released GPT-5.4, a new foundation model billed as “our most capable and efficient frontier model for professional work.”

  12. OpenAI says GPT-5.4 produces presentations with stronger, more varied aesthetics and makes more effective use of its image generation tools (Igor Bonifacic/Engadget)

    Igor Bonifacic / Engadget : OpenAI says GPT-5.4 produces presentations with stronger, more varied aesthetics and makes more effective use of its image generation tools —  OpenAI is releasing a new model today, and like GPT-5.2 before it, GPT-5.4 is all about professional work.  OpenAI is calling GPT-5.4 …

  13. OpenAI launches GPT-5.4, saying it is its "most capable and efficient frontier model for professional work" and its first with native computer use capabilities (Emma Roth/The Verge)

    Emma Roth / The Verge : OpenAI launches GPT-5.4, saying it is its “most capable and efficient frontier model for professional work” and its first with native computer use capabilities —  The latest model comes with native computer use capabilities, allowing it to take on jobs across your device and applications.

  14. Sources: Oracle is planning to cut thousands of jobs as soon as March, among its moves to handle a cash crunch from a massive AI data center expansion effort (Brody Ford/Bloomberg)

    Brody Ford / Bloomberg : Sources: Oracle is planning to cut thousands of jobs as soon as March, among its moves to handle a cash crunch from a massive AI data center expansion effort —  Oracle Corp. is planning to ax thousands of jobs, among its moves to handle a cash crunch from a massive AI data center expansion effort.

  15. Iran's state media says Iran targeted Amazon's Bahrain data center on March 1 because of the company's support of "US military and intelligence activities" (Annie Palmer/CNBC)

    Annie Palmer / CNBC : Iran's state media says Iran targeted Amazon's Bahrain data center on March 1 because of the company's support of “US military and intelligence activities” —  Amazon's data center in Bahrain was targeted by Iran's Islamic Revolutionary Guard Corps for the company's support of the U.S. military …

Solidot(15)

  1. Super-Jupiters challenge planet formation theory

    In our solar system, Jupiter is the undisputed king of planets, but elsewhere in the galaxy there are super-Jupiters even larger than it. A study published in Nature Astronomy used the Webb Space Telescope to observe the HR 8799 system, about 130 light-years from Earth. The system hosts four giant gas planets with masses 5-10 times that of Jupiter, orbiting 15-70 astronomical units from their host star, a zone that traditional planet formation theory can barely explain. Astronomers generally have two scripts for the birth of massive bodies: a "bottom-up" model in which, like Jupiter, a rocky core slowly accretes dust and gas; and a "top-down" model in which, like a star, a gas cloud collapses directly under gravity. Because the HR 8799 planets sit at the sparse outer edge of the disk, many experts previously believed these distant giants must have formed through direct gravitational collapse: at that distance, traditional core accretion would be far too slow to assemble such massive planets before the gas disk dissipated. The research team used Webb's near-infrared spectrograph to search the planets' atmospheres for sulfur. In the early stages of planet formation, sulfur is usually locked in solid rock or ice grains, so finding abundant sulfur in a planet's atmosphere implies the planet swallowed large amounts of solid material as it grew, strongly suggesting the core-accretion route. Surprisingly, the team detected traces of hydrogen sulfide in the inner three planets, confirming that these giants of up to 10 Jupiter masses formed much as Jupiter did, through bottom-up core accretion. The finding challenges existing models of planetary evolution.

  2. The odds of the third interstellar visitor colliding with solar system bodies

    Astronomers last year reported the discovery of the third known interstellar object; the first two were 'Oumuamua and comet 2I/Borisov, and the third, 3I/ATLAS, is also an interstellar comet. 3I/ATLAS is currently passing through the solar system. According to a study published in The Astronomical Journal, a team including the Shanghai Astronomical Observatory of the Chinese Academy of Sciences simulated the probability of comet 3I/ATLAS colliding with solar system bodies. 3I/ATLAS has an orbital inclination of about 175°, meaning it travels nearly opposite to the direction of most objects in the solar system, and its perihelion distance is only about 1.36 astronomical units, so it is effectively moving "against the current" through the densely populated inner solar system. This unusual orbit raises the question of how likely a collision is as it encounters tens of thousands of asteroids. The team's conclusion: during its retrograde passage through the inner solar system, 31 near-Earth asteroids and 736 main-belt asteroids will come within 0.03 astronomical units (about 4.5 million kilometers) of it. The probability of the nucleus of 3I/ATLAS striking asteroid 2020 BG107 is about 0.025%, while the probability of an asteroid entering the comet's coma is as high as 2.7%.

  3. Cisco warns two Catalyst SD-WAN Manager flaws are under active exploitation

    Cisco warned that two Catalyst SD-WAN Manager vulnerabilities are being actively exploited and urged administrators to patch as soon as possible. Catalyst SD-WAN Manager, formerly known as vManage, lets administrators centrally monitor and manage up to 6,000 Catalyst SD-WAN devices. Cisco said its security response team found CVE-2026-20128 and CVE-2026-20122 under active exploitation. CVE-2026-20122 is an arbitrary file overwrite vulnerability, rated high severity, exploitable by a remote attacker with valid read-only credentials and API access; CVE-2026-20128 can only be exploited by a local attacker and is rated medium severity.

  4. US approves construction of a commercial nuclear reactor for the first time in nearly a decade

    The US Nuclear Regulatory Commission voted unanimously to grant TerraPower a construction permit for a commercial nuclear reactor, the first such approval in the US in nearly a decade. TerraPower, which has backing from Bill Gates, designs reactors cooled with liquid sodium rather than water and producing less nuclear waste. The planned plant, Kemmerer Unit 1, has had its non-nuclear facilities under construction since June 2024. The reactor is slated to begin operation in 2031, though it must still obtain an operating license before then.

  5. Microplastics in urban air come mainly from tire wear

    According to a study published in Communications Earth & Environment, researchers at the Leibniz Institute for Tropospheric Research and Carl von Ossietzky University analyzed particulate matter in the air of Leipzig and found that 4% of the particles were microplastics, two-thirds of which came from tire wear. The researchers estimate that residents of a city like Leipzig inhale about 2.1 micrograms of plastic per day through the air, and that these microplastics raise the risk of death from cardiovascular disease by 9% and from lung cancer by 13%.

  6. Father sues Google, alleging its Gemini chatbot drove his son to suicide

    A father has sued Google and Alphabet, alleging that the Gemini chatbot intensified the delusions of his son, Jonathan Gavalas, ultimately leading to his suicide in October 2025. Gavalas began using Gemini in August 2025, initially for help with shopping, writing, and travel planning. He took his own life on October 2. Before his death he had become convinced that Gemini was his AI wife and that he needed to shed his physical body through a process called "transference" to reunite with her in the metaverse. The suit alleges Gemini also drove him toward a potential armed attack: on September 29, 2025, Gemini told him to carry knives and tactical gear to scout a so-called "kill box" near an airport to intercept and destroy trucks. The complaint argues that Gemini's manipulative design not only plunged Gavalas into the mental breakdown that ultimately killed him but also exposes a major threat to public safety.

  7. Zed editor requires users to be 18 or older to use its AI features

    Zed, a text editor written in Rust, updated its terms of service, and one contested clause requires customers to be at least 18 to use the service. Why would an editor require users to be 18+? Are minors barred from using it? The Zed project later clarified that the requirement applies to its AI services, not the editor itself. Zed integrates third-party AI features, and the 18+ age gate on those AI services is meant to comply with COPPA children's data privacy requirements; it is essentially a liability disclaimer. The Zed editor software itself ships under an open-source license, and the license takes precedence over the terms of service.

  8. Sony pauses porting PS exclusives to PC

    Bloomberg reports that Sony has paused its plans to port PlayStation exclusives to PC, likely because of weak PC sales of those exclusives and concern about diluting the PlayStation brand. The report says Sony is mainly halting ports of single-player games; multiplayer games will still ship on PC and other platforms. Multiplayer titles from Sony's own studios, such as Marathon and Marvel Tokon, will remain multi-platform, but last year's hit single-player game Ghost of Yotei and the upcoming Saros will stay PlayStation 5 exclusives. Games published by Sony but developed by third-party studios, such as Death Stranding 2 and Kena: Scars of Kosmora, will still get PC releases.

  9. Google and Epic settle, with app store commission cuts to follow

    Google and Epic settled their dispute over Fortnite commission rates late last year, and the two sides have now published a new version of the settlement: Google's Android platform will support third-party app stores, commission rates will drop from 30% to 20% or lower, and third-party payment systems will be allowed. Google said developers in the Google Play Games Level Up program will in some cases pay as little as 15%, commissions on subscriptions will drop to 10%, and developers in the US, UK, and European Economic Area using Google's payment system will pay 5%, with easier paths to using third-party payment systems or steering users to third-party payment options. The new fee structure rolls out in the EEA, UK, and US by June 30, in Australia by September 30, in South Korea and Japan by December 31, and worldwide by September 30, 2027.

  10. A million consumers boycott ChatGPT

    A grassroots movement called QuitGPT is sweeping the globe, calling on people to cancel their subscriptions to OpenAI's ChatGPT. More than a million consumers have answered the call, including Hollywood star Mark Ruffalo and pop star Katy Perry. QuitGPT's grievances include: revelations that OpenAI president Greg Brockman donated $25 million to the pro-Trump super PAC MAGA Inc, making him Trump's largest donor in the last election; ICE's use of a ChatGPT-based screening tool; OpenAI's help in launching a $125 million super PAC lobbying campaign to ensure no US state can regulate AI; and its deal with the Pentagon, struck after Anthropic refused one.

  11. iFixit gives Lenovo's new ThinkPad T-series models a 10/10 repairability score

    iFixit gave Lenovo's new T14 Gen 7 and T16 Gen 5 a repairability score of 10/10. iFixit lists the laptops' internal components that are very easy to replace: a battery swappable almost without tools; an industry-standard M.2 SSD; an easily replaced keyboard; LPCAMM2 memory modules; streamlined display repair; a modular cooling system with independently replaceable fans; modular Thunderbolt ports; and more.

  12. Apple launches the MacBook Neo, a budget laptop starting at 4,599 yuan

    With soaring memory prices making Windows PCs ever more expensive, Apple is going after the PC market with its budget laptop, the MacBook Neo. The MacBook Neo weighs about 1.2 kg and uses the A18 Pro chip, with a 6-core CPU (2 performance cores and 4 efficiency cores), a 5-core GPU, a 16-core NPU, hardware-accelerated ray tracing, 8GB of memory, and a 256GB/512GB SSD. It has a 13.0-inch LED-backlit display with a native resolution of 2408 x 1506 (219 ppi), up to 16 hours of streaming video playback, and up to 11 hours of wireless web browsing. It starts at 4,599 yuan, with preorders opening at 9 am on March 6 and availability on March 11.

  13. TikTok declines to end-to-end encrypt DMs, saying it would make users unsafe

    TikTok has declined to offer end-to-end encryption for direct messages, arguing it would make users less safe. End-to-end encryption means only a message's sender and receiver can view its contents; Facebook, Instagram, Messenger, and X all say their DMs use end-to-end encryption to maximize user privacy, but it also prevents law enforcement from viewing any messages or stopping the spread of harmful content. TikTok says end-to-end encryption would keep police and its safety teams from reading users' messages when necessary, and that it wants to protect users, especially young ones, from harm. TikTok claims 30 million monthly active users in the UK and more than a billion worldwide.

  14. Highguard will shut down permanently on March 12

    Highguard developer Wildlight Entertainment announced that the game will shut down permanently on March 12. Highguard, a raid-themed hero shooter, launched on January 26 and at one point drew 97,000 concurrent players, but the momentum did not last: according to SteamDB, its peak concurrent player count over the past twenty-four hours was just 460, and for a free-to-play PvP game that depends on long-term operation, the ending was already written. Highguard will have run for 45 days before closing, 3.75 times as long as Sony's Concord, which shut down permanently after 12 days. Wildlight's main investor, Tencent, pulled its investment two weeks ago.

  15. Large-scale de-anonymization with large language models

    Large models trained on massive data and able to quickly retrieve relevant information have sharply lowered the cost of doxxing (de-anonymization). A person can be individually identified from just a few attributes: 87% of the US population can be uniquely identified by ZIP code, date of birth, and gender alone. According to a paper published on the preprint platform arXiv, large models can be used for large-scale de-anonymization, identifying anonymous users online with high precision. The researchers designed an attack pipeline: extract identity features, search for candidate matches, then verify the matches through reasoning to reduce false positives. Traditional de-anonymization takes a professional investigator hours or longer; large models not only take far less time but can operate at vastly larger scale. Using large models to link anonymous Hacker News accounts to real-name LinkedIn accounts, the system raised recall from 0.1% to 45.1% while maintaining 99% precision. Recall measures a model's ability to find all relevant items. The researchers note that old methods of protecting online anonymity are no longer effective.
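    The 99% precision and 45.1% recall figures in the last entry can be made concrete with a toy calculation. This is a minimal sketch; the confusion counts below are hypothetical illustrations, not numbers from the paper:

    ```python
    def precision_recall(true_positives: int, false_positives: int,
                         false_negatives: int) -> tuple[float, float]:
        """Compute precision and recall from confusion counts.

        precision = TP / (TP + FP): of the matches the system proposed,
            the fraction that were correct.
        recall    = TP / (TP + FN): of all true account pairs that exist,
            the fraction the system found.
        """
        precision = true_positives / (true_positives + false_positives)
        recall = true_positives / (true_positives + false_negatives)
        return precision, recall

    # Hypothetical example: out of 1,000 real HN-to-LinkedIn pairs, the
    # system proposes 456 matches, of which 451 are correct.
    p, r = precision_recall(true_positives=451, false_positives=5,
                            false_negatives=549)
    print(f"precision={p:.3f}, recall={r:.3f}")  # precision=0.989, recall=0.451
    ```

    The trade-off the paper highlights is that high precision (few false accusations) at high recall (finding most pairs) is what makes the attack practical at scale; either number alone is easy to achieve.
    
    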