OrangeBot.AI Digest — 2026-03-22
90 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- PC Gamer recommends RSS readers in a 37mb article that just keeps downloading (stuartbreckenridge.net)
- OpenClaw is a security nightmare dressed up as a daydream (composio.dev)
- Palantir extends reach into British state as it gets access to sensitive FCA data (www.theguardian.com)
- GrapheneOS refuses to comply with new age verification laws for operating system (www.tomshardware.com)
- Why I love NixOS (www.birkey.co)
- Reports of code's death are greatly exaggerated (stevekrouse.com)
- The future of version control (bramcohen.com)
- I hate: Programming Wayland applications (www.p4m.dev)
- Building an FPGA 3dfx Voodoo with Modern RTL Tools (noquiche.fyi)
- Project Nomad – Knowledge That Never Goes Offline (www.projectnomad.us)
- Bored of eating your own dogfood? Try smelling your own farts (shkspr.mobi)
- Windows native app development is a mess (domenic.me)
- Vatican Rebukes Peter Thiel's Antichrist Lectures in Rome (www.thenerdreich.com)
- Flash-MoE: Running a 397B Parameter Model on a Laptop (github.com)
- 25 Years of Eggs (www.john-rush.com)
GitHub Trending(15)
- FujiwaraChoki / MoneyPrinterV2
- TauricResearch / TradingAgents
- vxcontrol / pentagi
- jamwithai / production-agentic-rag-course
- affaan-m / everything-claude-code
- jarrodwatts / claude-hud
- Crosstalk-Solutions / project-nomad
- systemd / systemd
- browser-use / browser-use
- HKUDS / LightRAG
- hsliuping / TradingAgents-CN
- louis-e / arnis
- aquasecurity / trivy
- bytedance / deer-flow
- harry0703 / MoneyPrinterTurbo
Product Hunt(15)
- Edgee Claude Code Compression
Extend Claude Pro's limit by 26.2%
- Silicon Friendly
How Silicon Friendly is your website? (from L0 to L5)
- Bench for Claude Code
Store, review, and share your Claude Code sessions
- Context.dev
One API to scrape, enrich, and understand the web.
- Embedful
Easy data visualizations. Embed and share anywhere.
- Claude Code Scheduled Tasks
Schedule recurring tasks locally and in the cloud easily
- Vite+
The Unified Toolchain for the Web
- Cursor Glass
Unified agent workspace with seamless cloud handoff power
- murmur
practice tough phone calls with AI before you make them
- Novi Notes
Local-first AI note app for Mac, zero config via MCP
- optimo
effortless media optimizer for the web
- Mindspend
Track how you feel about spending, not just the numbers
- Educato App
Personalized exam prep, now in your pocket
- Sanota
Stories, beautifully crafted
- Caplo
Real-time AI captions & translation for any iOS app
Hugging Face(15)
- Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding
While Multimodal Large Language Models demonstrate impressive semantic capabilities, they often suffer from spatial blindness, struggling with fine-grained geometric reasoning and physical dynamics. Existing solutions typically rely on explicit 3D modalities or complex geometric scaffolding, which are limited by data scarcity and generalization challenges. In this work, we propose a paradigm shift by leveraging the implicit spatial prior within large-scale video generation models. We posit that to synthesize temporally coherent videos, these models inherently learn robust 3D structural priors and physical laws. We introduce VEGA-3D (Video Extracted Generative Awareness), a plug-and-play framework that repurposes a pre-trained video diffusion model as a Latent World Simulator. By extracting spatiotemporal features from intermediate noise levels and integrating them with semantic representations via a token-level adaptive gated fusion mechanism, we enrich MLLMs with dense geometric cues without explicit 3D supervision. Extensive experiments across 3D scene understanding, spatial reasoning, and embodied manipulation benchmarks demonstrate that our method outperforms state-of-the-art baselines, validating that generative priors provide a scalable foundation for physical-world understanding. Code is publicly available at https://github.com/H-EmbodVis/VEGA-3D.
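The "token-level adaptive gated fusion" named in the abstract can be sketched as a learned sigmoid gate blending a semantic token with a geometric token. This is an illustrative guess at the general pattern only; VEGA-3D's actual fusion module and all parameter names here are assumptions, not the paper's code.

```python
import math

def gated_fusion(semantic, geometric, w_sem, w_geo, bias):
    """Blend a semantic token with a geometric token via a scalar sigmoid gate.

    The gate is computed from both feature vectors; gate -> 1 keeps the
    semantic feature, gate -> 0 keeps the geometric cue. (Hypothetical
    sketch: the real VEGA-3D fusion is a learned module not specified
    in the abstract.)
    """
    score = sum(s * w for s, w in zip(semantic, w_sem))
    score += sum(g * w for g, w in zip(geometric, w_geo))
    gate = 1.0 / (1.0 + math.exp(-(score + bias)))
    return [gate * s + (1.0 - gate) * g for s, g in zip(semantic, geometric)]

# With a strongly positive bias the gate saturates toward the semantic stream.
fused = gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0], 100.0)
```

Because the gate is computed per token, the model can lean on geometric cues only where the semantic stream is spatially ambiguous.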
- SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing
Current instruction-guided video editing models struggle to simultaneously balance precise semantic modifications with faithful motion preservation. While existing approaches rely on injecting explicit external priors (e.g., VLM features or structural conditions) to mitigate these issues, this reliance severely bottlenecks model robustness and generalization. To overcome this limitation, we present SAMA (factorized Semantic Anchoring and Motion Alignment), a framework that factorizes video editing into semantic anchoring and motion modeling. First, we introduce Semantic Anchoring, which establishes a reliable visual anchor by jointly predicting semantic tokens and video latents at sparse anchor frames, enabling purely instruction-aware structural planning. Second, Motion Alignment pre-trains the same backbone on motion-centric video restoration pretext tasks (cube inpainting, speed perturbation, and tube shuffle), enabling the model to internalize temporal dynamics directly from raw videos. SAMA is optimized with a two-stage pipeline: a factorized pre-training stage that learns inherent semantic-motion representations without paired video-instruction editing data, followed by supervised fine-tuning on paired editing data. Remarkably, the factorized pre-training alone already yields strong zero-shot video editing ability, validating the proposed factorization. SAMA achieves state-of-the-art performance among open-source models and is competitive with leading commercial systems (e.g., Kling-Omni). Code, models, and datasets will be released.
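Of the motion pretext tasks the abstract names, speed perturbation is the easiest to picture: resample a clip at a different rate and ask the model to restore the original timing. The sketch below is my own toy version on frame indices; SAMA's actual perturbation parameters are not given.

```python
def speed_perturb(frames, factor):
    """Resample a frame sequence to simulate a speed change.

    factor > 1 speeds the clip up (drops frames); factor < 1 slows it
    down (repeats frames). A restoration model would be trained to
    recover the original sequence from the perturbed one.
    """
    n_out = max(1, round(len(frames) / factor))
    return [frames[min(len(frames) - 1, int(i * factor))] for i in range(n_out)]

fast = speed_perturb(list(range(8)), 2.0)   # → [0, 2, 4, 6]
slow = speed_perturb(list(range(4)), 0.5)   # → [0, 0, 1, 1, 2, 2, 3, 3]
```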
- 3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model
Creating dynamic, view-consistent videos of customized subjects is highly sought after for a wide range of emerging applications, including immersive VR/AR, virtual production, and next-generation e-commerce. However, despite rapid progress in subject-driven video generation, existing methods predominantly treat subjects as 2D entities, focusing on transferring identity through single-view visual features or textual prompts. Because real-world subjects are inherently 3D, applying these 2D-centric approaches to 3D object customization reveals a fundamental limitation: they lack the comprehensive spatial priors necessary to reconstruct the 3D geometry. Consequently, when synthesizing novel views, they must rely on generating plausible but arbitrary details for unseen regions, rather than preserving the true 3D identity. Achieving genuine 3D-aware customization remains challenging due to the scarcity of multi-view video datasets. While one might attempt to fine-tune models on limited video sequences, this often leads to temporal overfitting. To resolve these issues, we introduce a novel framework for 3D-aware video customization, comprising 3DreamBooth and 3Dapter. 3DreamBooth decouples spatial geometry from temporal motion through a 1-frame optimization paradigm. By restricting updates to spatial representations, it effectively bakes a robust 3D prior into the model without the need for exhaustive video-based training. To enhance fine-grained textures and accelerate convergence, we incorporate 3Dapter, a visual conditioning module. Following single-view pre-training, 3Dapter undergoes multi-view joint optimization with the main generation branch via an asymmetrical conditioning strategy. This design allows the module to act as a dynamic selective router, querying view-specific geometric hints from a minimal reference set. Project page: https://ko-lani.github.io/3DreamBooth/
- FASTER: Rethinking Real-Time Flow VLAs
Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness, but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient and forces the system to complete all sampling steps before any movement can start, forming the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction by tenfold (e.g., in π_{0.5} and X-VLA) into a single step, while preserving the quality of long-horizon trajectory. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, prove that FASTER unlocks unprecedented real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.
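The idea behind a "Horizon-Aware Schedule" can be sketched as allocating denoising step budgets unevenly across an action chunk: a single step for the immediate action (matching the tenfold compression the abstract claims), the full budget for the far horizon. The linear ramp and function below are my own illustration, not FASTER's actual schedule.

```python
def horizon_aware_steps(chunk_len, min_steps=1, max_steps=10):
    """Assign per-action denoising step budgets across an action chunk.

    Near-term actions get the fewest steps so they can be executed
    immediately; later actions keep the full sampling budget. The
    linear ramp is an assumption for illustration.
    """
    if chunk_len == 1:
        return [min_steps]
    span = max_steps - min_steps
    return [min_steps + round(span * i / (chunk_len - 1)) for i in range(chunk_len)]

steps = horizon_aware_steps(5)  # → [1, 3, 5, 8, 10]
```

Under a constant schedule every action would cost `max_steps` before the robot could move; here the first action is ready after one step, which is exactly the reaction-latency bottleneck the paper targets.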
- Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation
We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeekV3.2-Speciale-671B-A37B, to achieve Gold Medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20x fewer parameters. In contrast to Nemotron-Cascade 1, the key technical advancements are as follows. After SFT on a meticulously curated dataset, we substantially expand Cascade RL to cover a much broader spectrum of reasoning and agentic domains. Furthermore, we introduce multi-domain on-policy distillation from the strongest intermediate teacher models for each domain throughout the Cascade RL process, allowing us to efficiently recover benchmark regressions and sustain strong performance gains along the way. We release the collection of model checkpoints and training data.
- Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer
Prior motion generation largely follows two paradigms: continuous diffusion models that excel at kinematic control, and discrete token-based generators that are effective for semantic conditioning. To combine their strengths, we propose a three-stage framework comprising condition feature extraction (Perception), discrete token generation (Planning), and diffusion-based motion synthesis (Control). Central to this framework is MoTok, a diffusion-based discrete motion tokenizer that decouples semantic abstraction from fine-grained reconstruction by delegating motion recovery to a diffusion decoder, enabling compact single-layer tokens while preserving motion fidelity. For kinematic conditions, coarse constraints guide token generation during planning, while fine-grained constraints are enforced during control through diffusion-based optimization. This design prevents kinematic details from disrupting semantic token planning. On HumanML3D, our method significantly improves controllability and fidelity over MaskControl while using only one-sixth of the tokens, reducing trajectory error from 0.72 cm to 0.08 cm and FID from 0.083 to 0.029. Unlike prior methods that degrade under stronger kinematic constraints, ours improves fidelity, reducing FID from 0.033 to 0.014.
- Memento-Skills: Let Agents Design Agents
We introduce Memento-Skills, a generalist, continually-learnable LLM agent system that functions as an agent-designing agent: it autonomously constructs, adapts, and improves task-specific agents through experience. The system is built on a memory-based reinforcement learning framework with stateful prompts, where reusable skills (stored as structured markdown files) serve as persistent, evolving memory. These skills encode both behaviour and context, enabling the agent to carry forward knowledge across interactions. Starting from simple elementary skills (like Web search and terminal operations), the agent continually improves via the Read-Write Reflective Learning mechanism introduced in Memento 2 (wang2025memento2). In the read phase, a behaviour-trainable skill router selects the most relevant skill conditioned on the current stateful prompt; in the write phase, the agent updates and expands its skill library based on new experience. This closed-loop design enables continual learning without updating LLM parameters, as all adaptation is realised through the evolution of externalised skills and prompts. Unlike prior approaches that rely on human-designed agents, Memento-Skills enables a generalist agent to design agents end-to-end for new tasks. Through iterative skill generation and refinement, the system progressively improves its own capabilities. Experiments on the General AI Assistants benchmark and Humanity's Last Exam demonstrate sustained gains, achieving 26.2% and 116.2% relative improvements in overall accuracy, respectively. Code is available at https://github.com/Memento-Teams/Memento-Skills.
- MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction
Reconstructing articulated 3D objects from a single image requires jointly inferring object geometry, part structure, and motion parameters from limited visual evidence. A key difficulty lies in the entanglement between motion cues and object structure, which makes direct articulation regression unstable. Existing methods address this challenge through multi-view supervision, retrieval-based assembly, or auxiliary video generation, often sacrificing scalability or efficiency. We present MonoArt, a unified framework grounded in progressive structural reasoning. Rather than predicting articulation directly from image features, MonoArt progressively transforms visual observations into canonical geometry, structured part representations, and motion-aware embeddings within a single architecture. This structured reasoning process enables stable and interpretable articulation inference without external motion templates or multi-stage pipelines. Extensive experiments on PartNet-Mobility demonstrate that MonoArt achieves state-of-the-art performance in both reconstruction accuracy and inference speed. The framework further generalizes to robotic manipulation and articulated scene reconstruction.
- Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens
Visual generation with discrete tokens has gained significant attention as it enables a unified token prediction paradigm shared with language models, promising seamless multimodal architectures. However, current discrete generation methods remain limited to low-dimensional latent tokens (typically 8-32 dims), sacrificing the semantic richness essential for understanding. While high-dimensional pretrained representations (768-1024 dims) could bridge this gap, their discrete generation poses fundamental challenges. In this paper, we present Cubic Discrete Diffusion (CubiD), the first discrete generation model for high-dimensional representations. CubiD performs fine-grained masking throughout the high-dimensional discrete representation -- any dimension at any position can be masked and predicted from partial observations. This enables the model to learn rich correlations both within and across spatial positions, with the number of generation steps fixed at T regardless of feature dimensionality, where T ≪ hwd. On ImageNet-256, CubiD achieves state-of-the-art discrete generation with strong scaling behavior from 900M to 3.7B parameters. Crucially, we validate that these discretized tokens preserve original representation capabilities, demonstrating that the same discrete tokens can effectively serve both understanding and generation tasks. We hope this work will inspire future research toward unified multimodal architectures. Code is available at: https://github.com/YuqingWang1029/CubiD.
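The fine-grained masking described above (any dimension at any position can be masked) can be illustrated with a toy mask over a positions-by-dimensions token grid. This is a sketch of the masking pattern only; CubiD's real tokens are high-dimensional discrete codes and its mask scheduling is not given in the abstract.

```python
import random

MASK = -1  # sentinel for a masked entry (assumed for illustration)

def mask_grid(tokens, ratio, rng):
    """Independently mask each (position, dimension) entry with prob `ratio`.

    Unlike per-token masking, a position can be partially observed:
    some of its dimensions stay visible while others must be predicted,
    letting the model learn correlations within and across positions.
    """
    return [[MASK if rng.random() < ratio else v for v in pos]
            for pos in tokens]

rng = random.Random(0)
grid = [[1, 2, 3], [4, 5, 6]]          # 2 positions x 3 dims
masked = mask_grid(grid, 0.5, rng)
```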
- LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs
Recent advancements in omnimodal large language models (OmniLLMs) have significantly improved the comprehension of audio and video inputs. However, current evaluations primarily focus on short audio and video clips ranging from 10 seconds to 5 minutes, failing to reflect the demands of real-world applications, where videos typically run for tens of minutes. To address this critical gap, we introduce LVOmniBench, a new benchmark designed specifically for the cross-modal comprehension of long-form audio and video. This dataset comprises high-quality videos sourced from open platforms that feature rich audio-visual dynamics. Through rigorous manual selection and annotation, LVOmniBench comprises 275 videos, ranging in duration from 10 to 90 minutes, and 1,014 question-answer (QA) pairs. LVOmniBench aims to rigorously evaluate the capabilities of OmniLLMs across domains, including long-term memory, temporal localization, fine-grained understanding, and multimodal perception. Our extensive evaluation reveals that current OmniLLMs encounter significant challenges when processing extended audio-visual inputs. Open-source models generally achieve accuracies below 35%, whereas the Gemini 3 Pro reaches a peak accuracy of approximately 65%. We anticipate that this dataset, along with our empirical findings, will stimulate further research and the development of advanced models capable of resolving complex cross-modal understanding problems within long-form audio-visual contexts.
- F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques, we present models that are far more efficient than previous LLM-based embedding models while retaining competitive performances. Extensive evaluations confirm that F2LLM-v2-14B ranks first on 11 MTEB benchmarks, while the smaller models in the family also set a new state of the art for resource-constrained applications. To facilitate open-source embedding model research, we release all models, data, code, and intermediate checkpoints.
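Matryoshka learning, one of the techniques the abstract lists, trains embeddings so that a prefix of the vector is itself a usable embedding; at inference you truncate and L2-renormalize. The sketch below shows that generic published technique, not F2LLM-v2's code.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` dimensions and L2-renormalize.

    Matryoshka-trained models pack the most important information into
    the leading dimensions, so the truncated prefix remains a valid
    (cheaper) embedding for retrieval.
    """
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

e = truncate_embedding([3.0, 4.0, 0.1, -0.2], 2)  # → [0.6, 0.8]
```

This is what makes one model family serve both the 14B flagship and resource-constrained settings: the same vector can be stored at several widths.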
- ReactMotion: Generating Reactive Listener Motions from Speaker Utterance
In this paper, we introduce a new task, Reactive Listener Motion Generation from Speaker Utterance, which aims to generate naturalistic listener body motions that appropriately respond to a speaker's utterance. However, modeling such nonverbal listener behaviors remains underexplored and challenging due to the inherently non-deterministic nature of human reactions. To facilitate this task, we present ReactMotionNet, a large-scale dataset that pairs speaker utterances with multiple candidate listener motions annotated with varying degrees of appropriateness. This dataset design explicitly captures the one-to-many nature of listener behavior and provides supervision beyond a single ground-truth motion. Building on this dataset design, we develop preference-oriented evaluation protocols tailored to evaluate reactive appropriateness, which conventional motion metrics focused on input-motion alignment ignore. We further propose ReactMotion, a unified generative framework that jointly models text, audio, emotion, and motion, and is trained with preference-based objectives to encourage both appropriate and diverse listener responses. Extensive experiments show that ReactMotion outperforms retrieval baselines and cascaded LLM-based pipelines, generating more natural, diverse, and appropriate listener motions.
- AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents
Long-horizon GUI agents are a key step toward real-world deployment, yet effective interaction memory under prevailing paradigms remains under-explored. Replaying full interaction sequences is redundant and amplifies noise, while summaries often erase dependency-critical information and traceability. We present AndroTMem, a diagnostic framework for anchored memory in long-horizon Android GUI agents. Its core benchmark, AndroTMem-Bench, comprises 1,069 tasks with 34,473 interaction steps (avg. 32.1 per task, max. 65). We evaluate agents with TCR (Task Complete Rate), focusing on tasks whose completion requires carrying forward critical intermediate state; AndroTMem-Bench is designed to enforce strong step-to-step causal dependencies, making sparse yet essential intermediate states decisive for downstream actions and centering interaction memory in evaluation. Across open- and closed-source GUI agents, we observe a consistent pattern: as interaction sequences grow longer, performance drops are driven mainly by within-task memory failures, not isolated perception errors or local action mistakes. Guided by this diagnosis, we propose Anchored State Memory (ASM), which represents interaction sequences as a compact set of causally linked intermediate-state anchors to enable subgoal-targeted retrieval and attribution-aware decision making. Across multiple settings and 12 evaluated GUI agents, ASM consistently outperforms full-sequence replay and summary-based baselines, improving TCR by 5%-30.16% and AMS by 4.93%-24.66%, indicating that anchored, structured memory effectively mitigates the interaction-memory bottleneck in long-horizon GUI tasks. The code, benchmark, and related resources are publicly available at [https://github.com/CVC2233/AndroTMem](https://github.com/CVC2233/AndroTMem).
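Anchored State Memory, as the abstract describes it, keeps a compact set of causally linked intermediate-state anchors and retrieves them per subgoal. The sketch below is a hypothetical minimal reading of that idea; ASM's actual anchor representation, scoring, and retrieval are not specified in the abstract, and all names here are mine.

```python
def retrieve_anchor(anchors, subgoal):
    """Return the state of the most recent anchor matching a subgoal keyword.

    Anchors are (subgoal, state) pairs distilled from the interaction
    sequence; retrieval scans newest-first so later overwrites win.
    (Illustrative only; ASM's real retrieval is subgoal-targeted and
    attribution-aware, which a keyword scan does not capture.)
    """
    for sg, state in reversed(anchors):
        if subgoal in sg:
            return state
    return None

anchors = [("open settings", {"screen": "settings"}),
           ("copy wifi password", {"clipboard": "hunter2"}),
           ("open notes app", {"screen": "notes"})]
state = retrieve_anchor(anchors, "wifi")
```

The contrast with full-sequence replay is that only the sparse, decision-relevant states survive, which is exactly the intermediate state the benchmark's causal dependencies make decisive.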
- Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding
While Multimodal Large Language Models (MLLMs) have achieved remarkable success in interpreting natural scenes, their ability to process discrete symbols -- the fundamental building blocks of human cognition -- remains a critical open question. Unlike continuous visual data, symbols such as mathematical formulas, chemical structures, and linguistic characters require precise, deeper interpretation. This paper introduces a comprehensive benchmark to evaluate how top-tier MLLMs navigate these "discrete semantic spaces" across five domains: language, culture, mathematics, physics, and chemistry. Our investigation uncovers a counterintuitive phenomenon: models often fail at basic symbol recognition yet succeed in complex reasoning tasks, suggesting they rely on linguistic probability rather than true visual perception. By exposing this "cognitive mismatch", we highlight a significant gap in current AI capabilities: the struggle to truly perceive and understand the symbolic languages that underpin scientific discovery and abstract thought. This work offers a roadmap for developing more rigorous, human-aligned intelligent systems.
- EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing
Video object removal aims to eliminate dynamic target objects and their visual effects, such as deformation, shadows, and reflections, while restoring seamless backgrounds. Recent diffusion-based video inpainting and object removal methods can remove the objects but often struggle to erase these effects and to synthesize coherent backgrounds. Beyond method limitations, progress is further hampered by the lack of a comprehensive dataset that systematically captures common object effects across varied environments for training and evaluation. To address this, we introduce VOR (Video Object Removal), a large-scale dataset that provides diverse paired videos, each consisting of one video where the target object is present with its effects and a counterpart where the object and effects are absent, with corresponding object masks. VOR contains 60K high-quality video pairs from captured and synthetic sources, covers five effect types, and spans a wide range of object categories as well as complex, dynamic multi-object scenes. Building on VOR, we propose EffectErase, an effect-aware video object removal method that treats video object insertion as the inverse auxiliary task within a reciprocal learning scheme. The model includes task-aware region guidance, which focuses learning on affected areas and enables flexible task switching, and an insertion-removal consistency objective that encourages complementary behaviors and shared localization of effect regions and structural cues. Trained on VOR, EffectErase achieves superior performance in extensive experiments, delivering high-quality video object effect erasing across diverse scenarios.
Techmeme(15)
- Tencent launches ClawBot, an OpenClaw-based agent integrated into WeChat, letting its 1B+ MAUs send and receive commands to interact with the AI agent via chat (Reuters)
Reuters : Tencent launches ClawBot, an OpenClaw-based agent integrated into WeChat, letting its 1B+ MAUs send and receive commands to interact with the AI agent via chat — Tencent(0700.HK) launched a tool on Sunday to integrate its WeChat messaging platform with the OpenClaw agent …
- AI tools like Claude Code have transformed coders' lives, and AI labs are now eyeing a bigger goal: automating everyone's lives and winning the non-coder market (Kate Clark/Wall Street Journal)
Kate Clark / Wall Street Journal : AI tools like Claude Code have transformed coders' lives, and AI labs are now eyeing a bigger goal: automating everyone's lives and winning the non-coder market — The AI sprint is hurtling toward a world where anyone can build personal concierges to do everything from executive presentations to March Madness brackets
- A look at Huawei-backed Yuanjie, a maker of photonic chips used in AI data center optical interconnects, whose stock has surged 780% over the past year (Yue Wang/Forbes)
Yue Wang / Forbes : A look at Huawei-backed Yuanjie, a maker of photonic chips used in AI data center optical interconnects, whose stock has surged 780% over the past year — Huawei-backed Yuanjie Semiconductor Technology makes photonic chips used in optical interconnects in AI data centers.
- A profile of Chinese bitcoin mining company Bitmain, now allied with Eric Trump's American Bitcoin and previously the target of a DHS espionage-risk probe (Ryan Weeks/Bloomberg)
Ryan Weeks / Bloomberg : A profile of Chinese bitcoin mining company Bitmain, now allied with Eric Trump's American Bitcoin and previously the target of a DHS espionage-risk probe — Bitmain has been dogged for years by questions about the security of its mining rigs. But that hasn't stopped it from going into business with a key member of the First Family.
- An essay on the history, theory, progress, and potential of world models, a prominent theme at Nvidia GTC 2026, co-written by General Intuition CEO Pim de Witte (Not Boring by Packy McCormick)
Not Boring by Packy McCormick : An essay on the history, theory, progress, and potential of world models, a prominent theme at Nvidia GTC 2026, co-written by General Intuition CEO Pim de Witte — Welcome to the 458 newly Not Boring people who have joined us since our last essay! Join 260,170 smart, curious folks by subscribing here:
- Elon Musk announces Terafab, an Austin-based project run by Tesla and SpaceX to manufacture robotics, AI, and space data center chips for Tesla, xAI, and SpaceX (Bloomberg)
Bloomberg : Elon Musk announces Terafab, an Austin-based project run by Tesla and SpaceX to manufacture robotics, AI, and space data center chips for Tesla, xAI, and SpaceX — Elon Musk said his Terafab project — a grand plan to eventually manufacture his own chips for robotics, artificial intelligence …
- Speaking at a Beijing forum, Tim Cook praised Apple's partners and developers in China, a week after Chinese state media labeled the App Store "monopolistic" (Bloomberg)
Bloomberg : Speaking at a Beijing forum, Tim Cook praised Apple's partners and developers in China, a week after Chinese state media labeled the App Store “monopolistic” — Apple Inc. Chief Executive Officer Tim Cook commended Chinese developers and the company's partners in the country …
- Cloaked, which offers security and privacy services such as VPNs, raised a $375M Series B in a mix of equity and growth funding, for enterprise expansion (Ivan Mehta/TechCrunch)
Ivan Mehta / TechCrunch : Cloaked, which offers security and privacy services such as VPNs, raised a $375M Series B in a mix of equity and growth funding, for enterprise expansion — Consumer-facing security tools often focus on one kind of modality, such as password protection, VPNs, or identity management.
- Hands-on with Gemini task automation on mobile: it's super impressive despite being very slow and failing at some tasks; it can order food, book Ubers, and more (Allison Johnson/The Verge)
Allison Johnson / The Verge : Hands-on with Gemini task automation on mobile: it's super impressive despite being very slow and failing at some tasks; it can order food, book Ubers, and more — It took nine minutes to order my dinner, but it still feels like the future. … I've been testing out Gemini's new task automation …
- How gig apps like Kled AI, Silencio, Neon Mobile, and Luel AI pay users for data that AI companies can use to train models, from phone calls to videos of places (Shubham Agarwal/The Guardian)
Shubham Agarwal / The Guardian : How gig apps like Kled AI, Silencio, Neon Mobile, and Luel AI pay users for data that AI companies can use to train models, from phone calls to videos of places — Gig AI trainers worldwide are selling moments of their lives, including calls and texts, to AI companies for quick cash
- A look at "tokenmaxxing", a status game where employees at a number of companies compete on leaderboards to show how much AI they're using (Kevin Roose/New York Times)
Kevin Roose / New York Times : A look at “tokenmaxxing”, a status game where employees at a number of companies compete on leaderboards to show how much AI they're using — An engineer at OpenAI processed 210 billion “tokens” — enough text to fill Wikipedia 33 times — through the company's artificial intelligence models …
- Social media accounts showing AI-generated women as pro-Trump soldiers, truckers, and cops have gone viral, with thousands appearing to believe they are real (Drew Harwell/Washington Post)
Drew Harwell / Washington Post : Social media accounts showing AI-generated women as pro-Trump soldiers, truckers, and cops have gone viral, with thousands appearing to believe they are real — The beautiful Army blonde Jessica Foster has posed with an F-22 Raptor fighter jet, donned camouflage in the desert and walked …
- Sources: advertisers that bought ChatGPT's first ad campaigns say the process was low tech and that they haven't received much data showing if their ads worked (Catherine Perloff/The Information)
Catherine Perloff / The Information : Sources: advertisers that bought ChatGPT's first ad campaigns say the process was low tech and that they haven't received much data showing if their ads worked — As OpenAI prepares to open up ad sales to more marketers next month, it is trying to address what some advertisers say was lacking in the initial ad sales offering.
- CEO of Halide-maker Lux Optics, Ben Sandofsky, sues his co-founder Sebastiaan de With, now on Apple's design team, alleging improper use of funds and stolen IP (Aaron Tilley/The Information)
Aaron Tilley / The Information : CEO of Halide-maker Lux Optics, Ben Sandofsky, sues his co-founder Sebastiaan de With, now on Apple's design team, alleging improper use of funds and stolen IP — Last summer, Apple held talks to acquire Lux Optics, a tiny startup that makes Halide, one of the most popular and critically acclaimed camera apps in the App Store.
- Vercel, which helps developers host web apps and AI agents, says its run-rate GAAP revenue hit $340M at the end of February, up 86% YoY, amid the AI coding boom (Richard Nieva/Forbes)
Richard Nieva / Forbes : Vercel, which helps developers host web apps and AI agents, says its run-rate GAAP revenue hit $340M at the end of February, up 86% YoY, amid the AI coding boom — One of the most popular ways to view the Epstein Files, an interface called Jmail that mimics a Gmail inbox, is hosted on Guillermo Rauch's $9.3 billion unicorn Vercel.
Solidot(15)
- Visual Novel Database founder dies
Yoran Heling, founder of the Visual Novel Database (VNDB), died on March 17. In 2007, Heling, then going by the handle Yorhel, was surprised after playing the visual novel 《时空轮回》 to find there was no place online for players to discuss and discover visual novels, so he spent three weeks building VNDB as a hub for collecting and discussing them. Within a year of launch the database had catalogued 1,000 visual novels; today VNDB lists 61,125. The character in the site's background image is "菈司蒂·珐姒", the heroine of 《AS~天使小夜曲》.
- Pig brains successfully cryopreserved
Timing is critical when preserving the fine structure of the brain: within minutes of circulation stopping, enzymes begin breaking down neurons and cells start digesting themselves. Cryonics typically involves storing a body below freezing in the hope of revival once a treatment for the person's condition exists. Traditionally the technique aims to preserve the brain quickly after natural death by cooling it and adding fixatives, but unless a cryonics team is waiting at the bedside, degradation has long since begun by the time preservation starts. To get around this, a team at Nectome, a US technology company focused on memory preservation, developed a protocol compatible with physician-assisted death, in which terminally ill patients choose the time of their own passing. The idea is that immediate intervention gives scientists the best chance of preserving the brain in a state as close to living as possible. The team tested the protocol on pigs, whose brain and cardiovascular anatomy is comparable to humans'. First, about one minute after cardiac arrest, they inserted a cannula into the heart, flushed out the blood, and introduced preservation fluids into the brain. These fluids contain aldehyde chemicals that form molecular bridges between cells, essentially locking cellular activity in place. They then introduced cryoprotectants to displace water in the tissue and prevent the formation of ice crystals, which would otherwise damage cells during cooling. Finally, the brain was cooled to about −32 °C, at which temperature the cryoprotectants form a glassy state and the brain's structure can be preserved nearly indefinitely.
- Global warming is increasing the frequency of compound drought-heatwave events
Since the start of the 21st century, extreme events in which drought and heatwaves strike simultaneously have become markedly more frequent, causing severe socioeconomic damage. In 2010, concurrent drought, heatwaves, and wildfires in Russia killed 55,000 people; in 2019–2020, Australia's bushfires coincided with drought and heatwaves to produce the "Black Summer"; and the June 2021 extreme weather event in the Pacific Northwest cut spring wheat yields in Canada's British Columbia and Alberta by 31%. According to a study published in Science Advances, the frequency of compound drought-heatwave events has increased nearly eightfold since 2000, and each 1 °C rise in global temperature raises the occurrence frequency of such extremes from 1.6% to 13.1%.
- Cloudflare classifies archive.today as C&C/Botnet
Following Wikipedia's move, Cloudflare, which provides DDoS protection, web application firewalls, public DNS resolvers, reverse proxies, and CDN services, has placed Archive.today and related domains such as archive.is and archive.ph in its Command and Control & Botnet category, meaning Cloudflare's 1.1.1.2 resolver no longer resolves those domains. The reason: Archive.today was found to be hijacking users' browsers to launch a DDoS attack against Gyrovague, the personal blog of Jani Patokallio. Patokallio, a Finn working in Australia, was targeted via a script that Archive.today's operator embedded in the CAPTCHA verification page served to Finnish IPs; the script is still in place, and the DDoS attack on Patokallio is still ongoing.
- systemd adds optional birthDate field
The systemd project has merged a pull request adding a new birthDate field to the JSON user records managed by userdb, in response to age-verification laws in California, Colorado, and Brazil. systemd author Lennart Poettering stressed that the field is entirely optional: it merely defines and standardizes a place to store a date of birth for users who need one. He said systemd itself will not process birth-date data or require that it be provided, and systemd does not enforce age-verification policy; that is left to other parts of the system.
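For context, systemd's userdb service describes users as JSON User Records with keys like userName and realName; the new field would simply sit alongside them. The sketch below is illustrative only: the surrounding field values are invented, and the exact value format of birthDate (shown here as an ISO 8601 date string) is an assumption not confirmed by the source.

```json
{
    "userName": "alice",
    "realName": "Alice Example",
    "homeDirectory": "/home/alice",
    "birthDate": "1990-04-01"
}
```

Consistent with Poettering's framing, nothing in systemd would act on this value; an age-verification component elsewhere on the system could read it from the user record if present.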
- Chronic kidney disease may awaken a brain-destroying virus
As many as nine in ten people worldwide may be infected with human polyomavirus 2, aka the JC virus, named after the initials of John Cunningham, the first patient from whom it was isolated. For most people the virus causes no symptoms unless it is activated; once activated, it attacks the brain, causing progressive multifocal leukoencephalopathy (PML). In PML, the JC virus destroys specific brain cells, leading to neuronal dysfunction and death. PML was considered very rare, but after AIDS emerged, 2% to 5% of HIV-infected people in the early epidemic also developed PML, suggesting that the severe immunosuppression associated with AIDS can be an activating condition. Now, in a study published in the journal Annals of Internal Medicine Case, researchers report another possible activating condition for PML: chronic kidney disease. Unlike HIV, chronic kidney disease affects one in ten people worldwide, so activation of PML-causing JC virus by chronic kidney disease could have serious consequences and warrants vigilance.
- The Sun's magnetic engine may lie 200,000 km below the surface
According to a study published in Nature Scientific Reports, physicists analyzing nearly three decades of solar oscillation data to trace the dynamics of the Sun's interior have found that the Sun's magnetic engine may sit about 200,000 km beneath the surface. The team drew on almost 30 years of observations from the Michelson Doppler Imager (MDI) aboard NASA's Solar and Heliospheric Observatory, the Helioseismic and Magnetic Imager aboard the Solar Dynamics Observatory, and the ground-based Global Oscillation Network Group. Since the mid-1990s these instruments have recorded, every 45 to 60 seconds, the sound waves generated by the motion of turbulent plasma inside the Sun. By combining the observations, the researchers built the longest and most detailed record of the Sun's internal oscillations to date. They found a transition layer known as the tachocline nearly 200,000 km below the surface; across it, the Sun's rotation speed changes abruptly, producing powerful shear flows that drive the solar magnetic field.
- Supermicro co-founder arrested for allegedly smuggling Nvidia AI chips
On Thursday the US Department of Justice charged Supermicro co-founder Yih-Shyan Liaw (廖义贤), along with employees Ruei-Tsang Chang (张瑞增) and Ting-Wei Sun (孙廷伟), with conspiring to smuggle computer servers containing high-performance Nvidia GPUs into China. According to the indictment, the three began conspiring with others about two years ago to sell at least $2.5 billion worth of servers to China through a "transshipment" company in Southeast Asia; US law prohibits selling such equipment to China without a license. Supermicro said the company itself was not named as a defendant; it has suspended Liaw and Chang and terminated its relationship with contractor Sun. The DOJ said Liaw and Sun have been arrested, while Chang remains at large. Though Supermicro was not charged, its shares plunged on Friday.
- China helps Cuba expand solar power
Cuba faces its worst energy crisis in decades, and China is helping it accelerate the buildout of solar power. Chinese solar equipment exports to Cuba surged from about $5 million in 2023 to $117 million in 2025, with no sign of stopping. Solar may now account for 10% of Cuba's total electricity generation, making the country one of the fastest-growing solar adopters in the world, with a solar share of generation higher than that of most countries, including the US.
- Microsoft promises less Copilot integration and lets users postpone updates indefinitely
The quality of Windows, Microsoft's flagship product, has declined markedly over the past few years: the monthly security updates fix bugs while introducing new problems, and Microsoft keeps stuffing the OS with AI features users do not want. Earlier this year Microsoft pledged to improve Windows quality, and its official blog has now detailed the measures it plans to take: taskbar customization, so the taskbar is no longer fixed to the bottom and can be moved to any edge of the screen; reduced Copilot integration in apps such as Notepad; and user control over update timing, including the ability to ignore updates or postpone them indefinitely, and to restart or shut down without installing them.
- ENIAC turns eighty
The world's first general-purpose computer, ENIAC (Electronic Numerical Integrator And Computer), is eighty years old. Built at the University of Pennsylvania's Moore School of Electrical Engineering, ENIAC made its public debut on February 15, 1946. Primitive by today's standards, its all-electronic design and programmability were nonetheless breakthroughs in computing at the time. ENIAC made high-speed general-purpose computation possible and laid the foundation for the modern computer; over the past eighty years, computing has grown into a behemoth and an engine of modern economic growth. Originally built to speed up the calculation of artillery firing tables for the US Army, ENIAC contained 17,468 vacuum tubes cooled by 80 blowers, weighed 27 metric tons (30 US tons), and consumed as much power as a small town. It was retired on October 2, 1955, after nine years of operation.
- arXiv's road to independence
The preprint platform arXiv.org will spin off from Cornell University on July 1 to become an independent nonprofit corporation, and it is recruiting a CEO. Over the past two years, the boom in AI and AI-adjacent fields has driven a surge in submissions; headcount has grown to 27, producing an operating deficit that Cornell has helped cover. But arXiv competes for funding with Cornell's internal projects, and prospective sponsors worry that money donated through Cornell cannot be directed to arXiv specifically; establishing an independent organization is meant to solve that problem. The move should help arXiv raise funds for the platform directly from more donors, and should also help it tackle the AI slop problem in submissions. The spin-off was first proposed by arXiv founder and physicist Paul Ginsparg, who has long wanted to step back and retire; independent operation will let him let go completely.
- When North America adopted the bow and arrow
According to an archaeological study published in PNAS Nexus, inhabitants of North America began replacing darts and atlatls (spear-throwers) with bows and arrows about 1,400 years ago. In southern regions the bow was adopted almost immediately; northern regions were slower, at first treating the bow as a supplement to existing tools and taking a millennium to phase out darts and atlatls. The bow outperforms the dart and atlatl in accuracy, range, speed, and rate of fire, but it costs more to make and maintain and requires both hands, making it impossible to hold a shield at the same time; its advantages outweighed its drawbacks, and it came into widespread use. Because bows are made of organic materials, they do not survive as readily as stone, bone, or metal tools, which makes it difficult to pin down when they appeared and when they became popular.
- Duet Night Abyss pushed malware to players twice via updates
英雄游戏, developer of the free gacha game Duet Night Abyss (《二重螺旋》), apologized for a "cybersecurity incident" on March 18 in which attackers used the game launcher's update mechanism to distribute Trojan:MSIL/UmbralStealer.DG!MTB, information-stealing malware used mainly to steal passwords and cryptocurrency. The malicious update was released on Steam at 7:39 am UTC on March 18. This is not the first security incident Duet Night Abyss has suffered in recent months: a similar one occurred in late February, in which the attacker mainly redirected users to play Genshin Impact; that attack was less malicious and may have been intended chiefly as a warning.
- Satellite photos show Australia's red desert turning green after heavy rain
Alice Springs, the town at Australia's geographic center, is often called the Red Centre for its rust-red desert landscape. But after weeks of continuous heavy rain in February and March 2026, the desert has turned green. Imagery from NASA's Terra satellite shows that terrain normally reddish-brown from the oxidation of iron-rich rock is now completely covered by new vegetation. According to the Australian Bureau of Meteorology, the Northern Territory averaged 239 mm of rain in February 2026, the third-wettest February on record since 1900. The transformation has been accompanied by serious flooding on the ground: floodwaters uprooted trees and left many residents stranded.