Monthly Digest — 2026-05
216 unique stories across 31 days and 8 sources.
Hacker News(44)
- Credit cards are vulnerable to brute force attacks (metin.nextc.org)
- Ti-84 Evo (education.ti.com)
- New research suggests people can communicate and practice skills while dreaming (www.newyorker.com)
- City Learns Flock Accessed Cameras in Children's Gymnastics Room as a Sales Demo (www.404media.co)
- VS Code inserting 'Co-Authored-by Copilot' into commits regardless of usage (github.com)
- Do_not_track (donottrack.sh)
- Dav2d (code.videolan.org)
- California to begin ticketing driverless cars that violate traffic laws (www.bbc.com)
- Statue of a man blinded by a flag put up by Banksy in central London (www.smithsonianmag.com)
- OpenAI's o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors (www.theguardian.com)
- I built my own hair electrolysis machine (www.scd31.com)
- Denuvo has been cracked in all single-player games it previously protected (www.tomshardware.com)
- Microsoft Edge stores all passwords in memory in clear text, even when unused (twitter.com)
- Heat pump sales rise across Europe (www.pv-magazine.com)
- US healthcare marketplaces shared citizenship and race data with ad tech giants (techcrunch.com)
- Days without GitHub incidents (www.dayswithoutgithubincident.com)
- .de TLD offline due to DNSSEC? (dnssec-analyzer.verisignlabs.com)
- California farmers to destroy 420k peach trees following Del Monte bankruptcy (www.sfgate.com)
- IBM didn't want Microsoft to use the Tab key to move between dialog fields (devblogs.microsoft.com)
- Computer Use is 45x more expensive than structured APIs (reflex.dev)
GitHub Trending(20)
- TauricResearch / TradingAgents
- soxoj / maigret
- warpdotdev / warp
- 1jehuang / jcode
- ruvnet / ruflo
- browserbase / skills
- Hmbown / DeepSeek-TUI
- virattt / dexter
- docusealco / docuseal
- addyosmani / agent-skills
- PriorLabs / TabPFN
- anthropics / financial-services
- z-lab / dflash
- InsForge / InsForge
- bytedance / UI-TARS-desktop
- rohitg00 / agentmemory
- datawhalechina / hello-agents
- CloakHQ / CloakBrowser
- yikart / AiToEarn
- playcanvas / supersplat
Product Hunt(44)
- Postiz
Agentic social media scheduler for agents like OpenClaw
- Bitgrain
Design studio lighter than Figma & more flexible than Canva
- Buda
Recruit agents to run your company as a synchronous team
- Zed 1.0
High-performance, open source, multiplayer code editor
- Cloud Computer by Manus
A dedicated cloud machine for bots and software
- Feather
Photo editor with local AI
- Filect
Organize Your Files With AI
- Microsoft Copilot Health
Dedicated space to bring your personal health data together
- Mockin 2.0
Ultimate career toolkit for UX/UI & Product designers
- Rosentic
Catch when coding agents break each other before merge
- Radar
The missing open-source Kubernetes UI
- PandaProbe
open source agent engineering platform
- Sleek Analytics for iOS
Your website analytics in your pocket
- Flowly
Your personal AI assistant, native to your desktop
- Claude Code & Codex Usage Trading Cards by Rudel
Get your trading card based on your CC & codex usage
- Dropy
Track prices on stores like Amazon, eBay, & AliExpress
- Flowstep 1.0
AI design engineer to turn your thoughts into editable UI
- Kilo Code v7 for VS Code
Parallel agents, diff reviewer, and multi-model comparisons
- Blaze
The AI-powered calendar that plans your day for you.
- Velo 2.0
Instantly turn your voice and screen into shareable videos
Hugging Face(29)
- Heterogeneous Scientific Foundation Model Collaboration
Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally limits their applicability to many real-world problems, especially in scientific domains where domain-specific foundation models have been developed to address specialized tasks beyond natural language. In this work, we introduce Eywa, a heterogeneous agentic framework designed to extend language-centric systems to a broader class of scientific foundation models. The key idea of Eywa is to augment domain-specific foundation models with a language-model-based reasoning interface, enabling language models to guide inference over non-linguistic data modalities. This design allows predictive foundation models, which are typically optimized for specialized data and tasks, to participate in higher-level reasoning and decision-making processes within agentic systems. Eywa can serve as a drop-in replacement for a single-agent pipeline (EywaAgent) or be integrated into existing multi-agent systems by replacing traditional agents with specialized agents (EywaMAS). We further investigate a planning-based orchestration framework in which a planner dynamically coordinates traditional agents and Eywa agents to solve complex tasks across heterogeneous data modalities (EywaOrchestra). We evaluate Eywa across a diverse set of scientific domains spanning physical, life, and social sciences. Experimental results demonstrate that Eywa improves performance on tasks involving structured and domain-specific data, while reducing reliance on language-based reasoning through effective collaboration with specialized foundation models.
- Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, long-horizon consistency, and causal understanding. We argue that the field should move beyond appearance synthesis toward intelligent visual generation: plausible visuals grounded in structure, dynamics, domain knowledge, and causal relations. To frame this shift, we introduce a five-level taxonomy: Atomic Generation, Conditional Generation, In-Context Generation, Agentic Generation, and World-Modeling Generation, progressing from passive renderers to interactive, agentic, world-aware generators. We analyze key technical drivers, including flow matching, unified understanding-and-generation models, improved visual representations, post-training, reward modeling, data curation, synthetic data distillation, and sampling acceleration. We further show that current evaluations often overestimate progress by emphasizing perceptual quality while missing structural, temporal, and causal failures. By combining benchmark review, in-the-wild stress tests, and expert-constrained case studies, this roadmap offers a capability-centered lens for understanding, evaluating, and advancing the next generation of intelligent visual generation systems.
- Co-Evolving Policy Distillation
RLVR and OPD have become standard paradigms for post-training. We provide a unified analysis of these two paradigms in consolidating multiple expert capabilities into a single model, identifying capability loss in different ways: mixed RLVR suffers from inter-capability divergence cost, while the pipeline of first training experts and then performing OPD, though avoiding divergence, fails to fully absorb teacher capabilities due to large behavioral pattern gaps between teacher and student. We propose Co-Evolving Policy Distillation (CoPD), which encourages parallel training of experts and introduces OPD during each expert's ongoing RLVR training rather than after complete expert training, with experts serving as mutual teachers (making OPD bidirectional) to co-evolve. This enables more consistent behavioral patterns among experts while maintaining sufficient complementary knowledge throughout. Experiments validate that CoPD achieves all-in-one integration of text, image, and video reasoning capabilities, significantly outperforming strong baselines such as mixed RLVR and MOPD, and even surpassing domain-specific experts. The model parallel training pattern offered by CoPD may inspire a novel training scaling paradigm.
- ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control
Humanoid control systems have made significant progress in recent years, yet modeling fluent interaction-rich behavior between a robot, its surrounding environment, and task-relevant objects remains a fundamental challenge. This difficulty arises from the need to jointly capture spatial context, temporal dynamics, robot actions, and task intent at scale, which is a poor match to conventional supervision. We propose ExoActor, a novel framework that leverages the generalization capabilities of large-scale video generation models to address this problem. The key insight in ExoActor is to use third-person video generation as a unified interface for modeling interaction dynamics. Given a task instruction and scene context, ExoActor synthesizes plausible execution processes that implicitly encode coordinated interactions between robot, environment, and objects. Such video output is then transformed into executable humanoid behaviors through a pipeline that estimates human motion and executes it via a general motion controller, yielding a task-conditioned behavior sequence. To validate the proposed framework, we implement it as an end-to-end system and demonstrate its generalization to new scenarios without additional real-world data collection. Furthermore, we conclude by discussing limitations of the current implementation and outlining promising directions for future research, illustrating how ExoActor provides a scalable approach to modeling interaction-rich humanoid behaviors, potentially opening a new avenue for generative models to advance general-purpose humanoid intelligence.
- Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists
Existing research infrastructure is fundamentally document-centric, providing citation links between papers but lacking explicit representations of methodological evolution. In particular, it does not capture the structured relationships that explain how and why research methods emerge, adapt, and build upon one another. With the rise of AI-driven research agents as a new class of consumers of scientific knowledge, this limitation becomes increasingly consequential, as such agents cannot reliably reconstruct method evolution topologies from unstructured text. We introduce Intern-Atlas, a methodological evolution graph that automatically identifies method-level entities, infers lineage relationships among methodologies, and captures the bottlenecks that drive transitions between successive innovations. Built from 1,030,314 papers spanning AI conferences, journals, and arXiv preprints, the resulting graph comprises 9,410,201 semantically typed edges, each grounded in verbatim source evidence, forming a queryable causal network of methodological development. To operationalize this structure, we further propose a self-guided temporal tree search algorithm for constructing evolution chains that trace the progression of methods over time. We evaluate the quality of the resulting graph against expert-curated ground-truth evolution chains and observe strong alignment. In addition, we demonstrate that Intern-Atlas enables downstream applications in idea evaluation and automated idea generation. We position methodological evolution graphs as a foundational data layer for the emerging automated scientific discovery.
- UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
Recent progress has shown that video diffusion models (VDMs) can be repurposed for diverse multimodal graphics tasks. However, existing methods often train separate models for each problem setting, which fixes the input-output mapping and limits the modeling of correlations across modalities. We present UniVidX, a unified multimodal framework that leverages VDM priors for versatile video generation. UniVidX formulates pixel-aligned tasks as conditional generation in a shared multimodal space, adapts to modality-specific distributions while preserving the backbone's native priors, and promotes cross-modal consistency during synthesis. It is built on three key designs. Stochastic Condition Masking (SCM) randomly partitions modalities into clean conditions and noisy targets during training, enabling omni-directional conditional generation instead of fixed mappings. Decoupled Gated LoRA (DGL) introduces per-modality LoRAs that are activated when a modality serves as the generation target, preserving the strong priors of the VDM. Cross-Modal Self-Attention (CMSA) shares keys and values across modalities while keeping modality-specific queries, facilitating information exchange and inter-modal alignment. We instantiate UniVidX in two domains: UniVid-Intrinsic, for RGB videos and intrinsic maps including albedo, irradiance, and normal; and UniVid-Alpha, for blended RGB videos and their constituent RGBA layers. Experiments show that both models achieve performance competitive with state-of-the-art methods across distinct tasks and generalize robustly to in-the-wild scenarios, even when trained on fewer than 1,000 videos. Project page: https://houyuanchen111.github.io/UniVidX.github.io/
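The Stochastic Condition Masking step described above can be pictured with a small sketch. This is only an illustration of the idea of randomly splitting modalities into clean conditions and noisy targets. The function name `stochastic_condition_mask`, the uniform choice of how many targets to draw, and the rule that at least one modality is always a target are assumptions, not details taken from the paper.

```python
import random


def stochastic_condition_mask(modalities, rng):
    """Randomly partition modalities into (clean conditions, noisy targets).

    Assumption: the number of targets is drawn uniformly from 1..len(modalities),
    so there is always at least one target to generate.
    """
    shuffled = list(modalities)
    rng.shuffle(shuffled)
    n_targets = rng.randint(1, len(shuffled))  # inclusive on both ends
    targets = sorted(shuffled[:n_targets])
    conditions = sorted(shuffled[n_targets:])
    return conditions, targets


if __name__ == "__main__":
    rng = random.Random(0)
    # Modalities named in the abstract for UniVid-Intrinsic.
    names = ["rgb", "albedo", "irradiance", "normal"]
    for step in range(3):
        conditions, targets = stochastic_condition_mask(names, rng)
        print(f"step {step}: conditions={conditions} targets={targets}")
```

Because every training step sees a different split, a single model is exposed to many input-to-output directions rather than one fixed mapping.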
- Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction
Agentic web search increasingly faces two distinct demands: deep reasoning over a single target, and structured aggregation across many entities and heterogeneous sources. Current systems struggle on both fronts. Breadth-oriented tasks demand schema-aligned outputs with wide coverage and cross-entity consistency, while depth-oriented tasks require coherent reasoning over long, branching search trajectories. We introduce Web2BigTable, a multi-agent framework for web-to-table search that supports both regimes. Web2BigTable adopts a bi-level architecture in which an upper-level orchestrator decomposes the task into sub-problems and lower-level worker agents solve them in parallel. Through a closed-loop run-verify-reflect process, the framework jointly improves decomposition and execution over time via persistent, human-readable external memory, with self-evolving updates to each individual agent. During execution, workers coordinate through a shared workspace that makes partial findings visible, allowing them to reduce redundant exploration, reconcile conflicting evidence, and adapt to emerging coverage gaps. Web2BigTable sets a new state of the art on WideSearch, reaching an Avg@4 Success Rate of 38.50 (7.5 times the second best at 5.10), Row F1 of 63.53 (+25.03 over the second best), and Item F1 of 80.12 (+14.42 over the second best). It also generalises to depth-oriented search on XBench-DeepSearch, achieving 73.0 accuracy. Code is available at https://github.com/web2bigtable/web2bigtable.
- Map2World: Segment Map Conditioned Text to 3D World Generation
3D world generation is essential for applications such as immersive content creation or autonomous driving simulation. Recent advances in 3D world generation have shown promising results; however, these methods are constrained by grid layouts and suffer from inconsistencies in object scale throughout the entire world. In this work, we introduce a novel framework, Map2World, that first enables 3D world generation conditioned on user-defined segment maps of arbitrary shapes and scales, ensuring global-scale consistency and flexibility across expansive environments. To further enhance the quality, we propose a detail enhancer network that generates fine details of the world. The detail enhancer enables the addition of fine-grained details without compromising overall scene coherence by incorporating global structure information. We design the entire pipeline to leverage strong priors from asset generators, achieving robust generalization across diverse domains, even under limited training data for scene generation. Extensive experiments demonstrate that our method significantly outperforms existing approaches in user-controllability, scale consistency, and content coherence, enabling users to generate 3D worlds under more complex conditions.
- Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions
Text-based 2D image editing models have recently reached an impressive level of maturity, motivating a growing body of work that heavily depends on these models to drive 3D edits. While effective for appearance-based modifications, such 2D-centric 3D editing pipelines often struggle with fine-grained 3D editing, where localized structural changes must be applied while strictly preserving an object's overall identity. To address this limitation, we propose Prox-E, a training-free framework that enables fine-grained 3D control through an explicit, primitive-based geometric abstraction. Our framework first abstracts an input 3D shape into a compact set of geometric primitives. A pretrained vision-language model (VLM) then edits this abstraction to specify primitive-level changes. These structural edits are subsequently used to guide a 3D generative model, enabling fine-grained, localized modifications while preserving unchanged regions of the original shape. Through extensive experiments, we demonstrate that our method consistently balances identity preservation, shape quality, and instruction fidelity more effectively than various existing approaches, including 2D-based 3D editors and training-based methods.
- MolmoAct2: Action Reasoning Models for Real-world Deployment
Vision-Language-Action (VLA) models aim to provide a single generalist controller for robots, but today's systems fall short on the criteria that matter for real-world deployment. Frontier models are closed, open-weight alternatives are tied to expensive hardware, reasoning-augmented policies pay prohibitive latency for their grounding, and fine-tuned success rates remain below the threshold for dependable use. We present MolmoAct2, a fully open action reasoning model built for practical deployment, advancing its predecessor along five axes. We introduce MolmoER, a VLM backbone specialized for spatial and embodied reasoning, trained on a 3.3M-sample corpus with a specialize-then-rehearse recipe. We release three new datasets spanning low-to-medium cost platforms, including MolmoAct2-BimanualYAM, 720 hours of teleoperated bimanual trajectories that constitute the largest open bimanual dataset to date, together with quality-filtered Franka (DROID) and SO100/101 subsets. We provide OpenFAST, an open-weight, open-data action tokenizer trained on millions of trajectories across five embodiments. We redesign the architecture to graft a flow-matching continuous-action expert onto a discrete-token VLM via per-layer KV-cache conditioning. Finally, we propose MolmoThink, an adaptive-depth reasoning variant that re-predicts depth tokens only for scene regions that change between timesteps, retaining geometric grounding at a fraction of prior latency. In the most extensive empirical study of any open VLA to date, spanning 7 simulation and real-world benchmarks, MolmoAct2 outperforms strong baselines including Pi-05, while MolmoER surpasses GPT-5 and Gemini Robotics ER-1.5 across 13 embodied-reasoning benchmarks. We release model weights, training code, and complete training data. Project page: https://allenai.org/blog/molmoact2
- From Context to Skills: Can Language Models Learn from Context Skillfully?
Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, where LMs directly learn relevant knowledge from the given context. An intuitive solution is inference-time skill augmentation: extracting the rules and procedures from context into natural-language skills. However, constructing such skills for context learning scenarios faces two challenges: the prohibitive cost of manual skill annotation for long, technically dense contexts, and the lack of external feedback for automated skill construction. In this paper, we propose Ctx2Skill, a self-evolving framework that autonomously discovers, refines, and selects context-specific skills without human supervision or external feedback. At its core, a multi-agent self-play loop has a Challenger that generates probing tasks and rubrics, a Reasoner that attempts to solve them guided by an evolving skill set, and a neutral Judge that provides binary feedback. Crucially, both the Challenger and the Reasoner evolve through accumulated skills: dedicated Proposer and Generator agents analyze failure cases and synthesize them into targeted skill updates for both sides, enabling automated skill discovery and refinement. To prevent adversarial collapse caused by increasingly extreme task generation and over-specialized skill accumulation, we further introduce a Cross-time Replay mechanism that identifies the skill set achieving the best balance across representative cases for the Reasoner side, ensuring robust and generalizable skill evolution. The resulting skills can be plugged into any language model to obtain better context learning capability. Evaluated on four context learning tasks from CL-bench, Ctx2Skill consistently improves solving rates across backbone models.
- Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs
While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with generated sequence length. To counteract this, we propose Persistent Visual Memory (PVM), a lightweight learnable module designed to ensure sustained, on-demand visual perception. Integrated as a parallel branch alongside the Feed-Forward Network (FFN) in LVLMs, PVM establishes a distance-agnostic retrieval pathway that directly provides visual embeddings for precise visual perception, thereby structurally mitigating the signal suppression inherent to deep generation. Extensive experiments on Qwen3-VL models demonstrate that PVM brings notable improvements with negligible parameter overhead, delivering consistent average accuracy gains across both 4B and 8B scales, particularly in complex reasoning tasks that demand persistent visual perception. Furthermore, in-depth analysis reveals that PVM can resist length-induced signal decay and accelerate internal prediction convergence.
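The "Visual Signal Dilution" effect can be illustrated with a toy calculation. The sketch below assumes every visual token shares one attention logit and every text token shares another, which is a simplification made here for illustration and not the paper's analysis. The function name and the token counts are hypothetical.

```python
import math


def visual_attention_mass(n_visual, n_text, visual_logit=0.0, text_logit=0.0):
    """Share of softmax attention that lands on visual tokens.

    Toy assumption: all visual tokens have the same logit, and all text
    tokens have the same logit, so the softmax reduces to a ratio of sums.
    """
    visual = n_visual * math.exp(visual_logit)
    text = n_text * math.exp(text_logit)
    return visual / (visual + text)


if __name__ == "__main__":
    n_visual = 256  # hypothetical number of image tokens
    for n_text in (0, 256, 1024, 4096, 16384):
        mass = visual_attention_mass(n_visual, n_text)
        print(f"text tokens={n_text:>6}  visual attention share={mass:.3f}")
```

In this toy setting the denominator grows with every generated text token while the visual numerator stays fixed, so once text tokens greatly outnumber visual tokens the visual share falls roughly in proportion to 1 / sequence length.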
- Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling
Recent research has shown that filtering massive English web corpora into high-quality subsets significantly improves training efficiency. However, for high-resource non-English languages like German, French, or Japanese, aggressive filtering creates a strategic dilemma: should practitioners prioritize diversity by training once on large amounts of lightly filtered web data, or prioritize quality by strictly filtering for a high-quality core and repeating it over multiple epochs? We investigate this trade-off for German by constructing hierarchical quality filters applied to 500M web documents, comparing multi-epoch training on the filtered subsets against single-pass training on a diverse corpus. Our experiments across multiple model scales and token budgets show that repeating high-quality data consistently outperforms single-pass training on larger, less filtered sets. Notably, the performance gap persists even after 7 epochs. Our findings suggest that for non-English LLMs, semantic concentration through quality filtering offers a more viable path to efficient language modeling than simply maximizing unique data volume. We release our German language models (called Boldt), as well as our cleaned evaluation benchmarks to the research community. Our experiments indicate that they achieve state-of-the-art results despite training on 10-360x fewer tokens than comparable models.
- ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration
This report describes ARIS (Auto-Research-in-sleep), an open-source research harness for autonomous research, including its architecture, assurance mechanisms, and early deployment experience. The performance of agent systems built on LLMs depends on both the model weights and the harness around them, which governs what information to store, retrieve, and present to the model. For long-horizon research workflows, the central failure mode is not a visible breakdown but a plausible unsupported success: a long-running agent can produce claims whose evidential support is incomplete, misreported, or silently inherited from the executor's framing. Therefore, we present ARIS as a research harness that coordinates machine-learning research workflows through cross-model adversarial collaboration as a default configuration: an executor model drives forward progress while a reviewer from a different model family is recommended to critique intermediate artifacts and request revisions. ARIS has three architectural layers. The execution layer provides more than 65 reusable Markdown-defined skills, model integrations via MCP, a persistent research wiki for iterative reuse of prior findings, and deterministic figure generation. The orchestration layer coordinates five end-to-end workflows with adjustable effort settings and configurable routing to reviewer models. The assurance layer includes a three-stage process for checking whether experimental claims are supported by evidence: integrity verification, result-to-claim mapping, and claim auditing that cross-checks manuscript statements against the claim ledger and raw evidence, as well as a five-pass scientific-editing pipeline, mathematical-proof checks, and visual inspection of the rendered PDF. A prototype self-improvement loop records research traces and proposes harness improvements that are adopted only after reviewer approval.
- OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet their development remains dominated by industrial giants. The typical industry recipe involves a highly resource-intensive pipeline spanning pre-training, continual pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL). In this report, we show that when fueled with informative and high-difficulty trajectories, a simple SFT approach could be surprisingly powerful for training frontier search agents. By introducing three simple data synthesis modifications: scaling knowledge graph size for richer exploration, expanding the tool set size for broader functionality, and strict low-step filtering, we establish a stronger baseline. Trained on merely 10.6k data points, our OpenSeeker-v2 achieves state-of-the-art performance across 4 benchmarks (30B-sized agents with ReAct paradigm): 46.0% on BrowseComp, 58.1% on BrowseComp-ZH, 34.6% on Humanity's Last Exam, and 78.0% on xbench, surpassing even Tongyi DeepResearch trained with heavy CPT+SFT+RL pipeline, which achieves 43.4%, 46.7%, 32.9%, and 75.0%, respectively. Notably, OpenSeeker-v2 represents the first state-of-the-art search agent within its model scale and paradigm to be developed by a purely academic team using only SFT. We are excited to open-source the OpenSeeker-v2 model weights and share our simple yet effective findings to make frontier search agent research more accessible to the community.
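The reported margins over Tongyi DeepResearch are easy to tabulate. The short script below only restates the numbers quoted in the abstract and subtracts them. The dictionary layout and variable names are illustrative.

```python
# (OpenSeeker-v2, Tongyi DeepResearch) scores in percent, as quoted above.
scores = {
    "BrowseComp": (46.0, 43.4),
    "BrowseComp-ZH": (58.1, 46.7),
    "Humanity's Last Exam": (34.6, 32.9),
    "xbench": (78.0, 75.0),
}

# Percentage-point margin per benchmark, rounded to one decimal place.
margins = {
    name: round(ours - baseline, 1) for name, (ours, baseline) in scores.items()
}

if __name__ == "__main__":
    for name, margin in margins.items():
        print(f"{name:<22} +{margin:.1f} points")
```

The gap is widest on BrowseComp-ZH and smallest on Humanity's Last Exam.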
- Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
The standard post-training recipe for large multimodal models (LMMs) applies supervised fine-tuning (SFT) on curated demonstrations followed by reinforcement learning with verifiable rewards (RLVR). However, SFT introduces distributional drift that neither preserves the model's original capabilities nor faithfully matches the supervision distribution. This problem is further amplified in multimodal reasoning, where perception errors and reasoning failures follow distinct drift patterns that compound during subsequent RL. We introduce PRISM, a three-stage pipeline that mitigates this drift by inserting an explicit distribution-alignment stage between SFT and RLVR. Building on the principle of on-policy distillation (OPD), PRISM casts alignment as a black-box, response-level adversarial game between the policy and a Mixture-of-Experts (MoE) discriminator with dedicated perception and reasoning experts, providing disentangled corrective signals that steer the policy toward the supervision distribution without requiring access to teacher logits. While 1.26M public demonstrations suffice for broad SFT initialization, distribution alignment demands higher-fidelity supervision; we therefore curate 113K additional demonstrations from Gemini 3 Flash, featuring dense visual grounding and step-by-step reasoning on the hardest unsolved problems. Experiments on Qwen3-VL show that PRISM consistently improves downstream RLVR performance across multiple RL algorithms (GRPO, DAPO, GSPO) and diverse multimodal benchmarks, improving average accuracy by +4.4 and +6.0 points over the SFT-to-RLVR baseline on 4B and 8B, respectively. Our code, data, and model checkpoints are publicly available at https://github.com/XIAO4579/PRISM.
- X2SAM: Any Segmentation in Images and Videos
Multimodal Large Language Models (MLLMs) have demonstrated strong image-level visual understanding and reasoning, yet their pixel-level perception across both images and videos remains limited. Foundation segmentation models such as the SAM series produce high-quality masks, but they rely on low-level visual prompts and cannot natively interpret complex conversational instructions. Existing segmentation MLLMs narrow this gap, but are usually specialized for either images or videos and rarely support both textual and visual prompts in one interface. We introduce X2SAM, a unified segmentation MLLM that extends any-segmentation capabilities from images to videos. Given conversational instructions and visual prompts, X2SAM couples an LLM with a Mask Memory module that stores guided vision features for temporally consistent video mask generation. The same formulation supports generic, open-vocabulary, referring, reasoning, grounded conversation generation, interactive, and visual grounded segmentation across image and video inputs. We further introduce the Video Visual Grounded (V-VGD) segmentation benchmark, which evaluates whether a model can segment object tracks in videos from interactive visual prompts. With a unified joint training strategy over heterogeneous image and video datasets, X2SAM delivers strong video segmentation performance, remains competitive on image segmentation benchmarks, and preserves general image and video chat ability.
- Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation
Distillation-based acceleration has become foundational for making autoregressive streaming video diffusion models practical, with distribution matching distillation (DMD) as the de facto choice. Existing methods, however, train the student to match the teacher's output indiscriminately, treating every rollout, frame, and pixel as equally reliable supervision. We argue that this caps distilled quality, since it overlooks two complementary axes of variance in DMD supervision: Inter-Reliability across student rollouts whose supervision varies in reliability, and Intra-Perplexity across spatial regions and temporal frames that contribute unequally to where quality can still be improved. The objective thus conflates two questions under a uniform weight: whether to learn from each rollout, and where to concentrate optimization within it. To address this, we propose Stream-R1, a Reliability-Perplexity Aware Reward Distillation framework that adaptively reweights the distillation objective at both rollout and spatiotemporal-element levels through a single shared reward-guided mechanism. At the Inter-Reliability level, Stream-R1 rescales each rollout's loss by an exponential of a pretrained video reward score, so that rollouts with reliable supervision dominate optimization. At the Intra-Perplexity level, it back-propagates the same reward model to extract per-pixel gradient saliency, which is factored into spatial and temporal weights that concentrate optimization pressure on regions and frames where refinement yields the largest expected gain. An adaptive balancing mechanism prevents any single quality axis from dominating across visual quality, motion quality, and text alignment. Stream-R1 attains consistent improvements on all three dimensions over distillation baselines on standard streaming video generation benchmarks, without architectural modification or additional inference cost.
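The rollout-level reweighting described above, where each rollout's loss is scaled by an exponential of a reward score, might look roughly like the following. This is a minimal sketch and not the authors' implementation. The temperature `beta`, the normalization of weights to sum to one, and the function name are assumptions introduced for illustration.

```python
import math


def reward_weighted_loss(losses, rewards, beta=1.0):
    """Combine per-rollout losses using weights proportional to exp(beta * reward).

    Assumptions (not from the abstract): `beta` is a temperature, and the
    weights are normalized to sum to 1 so the loss scale stays comparable.
    """
    # Subtract the max reward before exponentiating for numerical stability.
    peak = max(rewards)
    raw = [math.exp(beta * (r - peak)) for r in rewards]
    total = sum(raw)
    weights = [w / total for w in raw]
    loss = sum(w * l for w, l in zip(weights, losses))
    return loss, weights


if __name__ == "__main__":
    losses = [0.9, 0.5, 0.7]   # hypothetical per-rollout distillation losses
    rewards = [0.2, 1.5, 0.8]  # hypothetical reward-model scores
    loss, weights = reward_weighted_loss(losses, rewards, beta=2.0)
    print("weights:", [round(w, 3) for w in weights])
    print("weighted loss:", round(loss, 3))
```

With equal rewards the weights are uniform and the result is the plain mean, while a larger `beta` concentrates the objective on the highest-reward rollouts.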
- Stream-T1: Test-Time Scaling for Streaming Video Generation
While Test-Time Scaling (TTS) offers a promising direction to enhance video generation without the surging costs of training, current test-time video generation methods based on diffusion models suffer from exorbitant candidate exploration costs and lack temporal guidance. To address these structural bottlenecks, we propose shifting the focus to streaming video generation. We identify that its chunk-level synthesis and few denoising steps are intrinsically suited for TTS, significantly lowering computational overhead while enabling fine-grained temporal control. Driven by this insight, we introduce Stream-T1, a comprehensive TTS framework tailored for streaming video generation. Specifically, Stream-T1 is composed of three units: (1) Stream-Scaled Noise Propagation, which actively refines the initial latent noise of the generating chunk using historically proven, high-quality previous chunk noise, effectively establishing temporal dependency and utilizing the historical Gaussian prior to guide the current generation; (2) Stream-Scaled Reward Pruning, which comprehensively evaluates generated candidates to strike an optimal balance between local spatial aesthetics and global temporal coherence by integrating immediate short-term assessments with sliding-window-based long-term evaluations; (3) Stream-Scaled Memory Sinking, which dynamically routes the context evicted from KV-cache into distinct updating pathways guided by the reward feedback, ensuring that previously generated visual information effectively anchors and guides the subsequent video stream. Evaluated on both 5s and 30s comprehensive video benchmarks, Stream-T1 demonstrates profound superiority, significantly improving temporal consistency, motion smoothness, and frame-level visual quality.
- RLDX-1 Technical Report
While Vision-Language-Action models (VLAs) have shown remarkable progress toward human-like generalist robotic policies through the versatile intelligence (i.e. broad scene understanding and language-conditioned generalization) inherited from pre-trained Vision-Language Models, they still struggle with complex real-world tasks requiring broader functional capabilities (e.g. motion awareness, memory-aware decision making, and physical sensing). To address this, we introduce RLDX-1, a general-purpose robotic policy for dexterous manipulation built on the Multi-Stream Action Transformer (MSAT), an architecture that unifies these capabilities by integrating heterogeneous modalities through modality-specific streams with cross-modal joint self-attention. RLDX-1 further combines this architecture with system-level design choices, including synthesizing training data for rare manipulation scenarios, learning procedures specialized for human-like manipulation, and inference optimizations for real-time deployment. Through empirical evaluation, we show that RLDX-1 consistently outperforms recent frontier VLAs (e.g. π_{0.5} and GR00T N1.6) across both simulation benchmarks and real-world tasks that require broad functional capabilities beyond general versatility. In particular, RLDX-1 shows superiority in ALLEX humanoid tasks by achieving success rates of 86.8% while π_{0.5} and GR00T N1.6 achieve around 40%, highlighting the ability of RLDX-1 to control a high-DoF humanoid robot under diverse functional demands. Together, these results position RLDX-1 as a promising step toward reliable VLAs for complex, contact-rich, and dynamic real-world dexterous manipulation.
Techmeme(44)
- Servers operated by Ubuntu and its parent company Canonical have been down for more than a day, following a "sustained, cross-border attack" (Dan Goodin/Ars Technica)
Dan Goodin / Ars Technica : Servers operated by Ubuntu and its parent company Canonical have been down for more than a day, following a “sustained, cross-border attack” — Servers operated by Ubuntu and its parent company Canonical were knocked offline on Thursday morning and have remained down ever since …
- Apple has stopped offering a 256GB storage option for the Mac mini globally; Mac mini now starts at 512GB for $799 in the US (Joe Rossignol/MacRumors)
Joe Rossignol / MacRumors : Apple has stopped offering a 256GB storage option for the Mac mini globally; Mac mini now starts at 512GB for $799 in the US — Apple this week stopped offering a 256GB storage option for the Mac mini worldwide. As a result, the desktop computer now has a higher starting price.
- The Academy of Motion Picture Arts and Sciences issues new rules saying acting and writing must be performed by humans and not AI to be eligible for Oscars (Lisa Richwine/Reuters)
Lisa Richwine / Reuters : The Academy of Motion Picture Arts and Sciences issues new rules saying acting and writing must be performed by humans and not AI to be eligible for Oscars — Academy Awards organizers issued new rules on Friday to clarify that acting and writing must be performed by humans …
- Sources: Cerebras is seeking to raise as much as $4B in its IPO and is targeting a valuation of about $40B (Bloomberg)
Bloomberg : Sources: Cerebras is seeking to raise as much as $4B in its IPO and is targeting a valuation of about $40B — Cerebras Systems Inc. is seeking to raise as much as $4 billion in its initial public offering, according to people familiar with the matter, as demand for the artificial intelligence chipmaker …
- Analysis: after Trump's World Liberty raised $550M from investors, tokens worth hundreds of millions in USD were privately sold in "white glove" transactions (Olga Kharif/Bloomberg)
Olga Kharif / Bloomberg : Analysis: after Trump's World Liberty raised $550M from investors, tokens worth hundreds of millions in USD were privately sold in “white glove” transactions — The pitch was straightforward: Invest in the cryptocurrency venture of Donald Trump and his family …
- Investigation: Nobitex was founded by two brothers from Iran's elite Kharrazi family; the crypto exchange processed hundreds of millions beyond US sanctions (Reuters)
Reuters : Investigation: Nobitex was founded by two brothers from Iran's elite Kharrazi family; the crypto exchange processed hundreds of millions beyond US sanctions — Two brothers from the elite Kharrazi family, using an alternative surname, started up Nobitex in 2018.
- A Chinese court ruled that companies cannot terminate staff just to replace them with AI, following a similar ruling by another Chinese court in December 2025 (Victor Swezey/Bloomberg)
Victor Swezey / Bloomberg : A Chinese court ruled that companies cannot terminate staff just to replace them with AI, following a similar ruling by another Chinese court in December 2025 — A Chinese court ruled that companies cannot terminate employees just to replace them with artificial intelligence systems …
- Study: OpenAI's o1 correctly diagnosed 67% of emergency room patients using electronic records and a few sentences from nurses, vs. 50-55% for triage doctors (Robert Booth/The Guardian)
Robert Booth / The Guardian : Study: OpenAI's o1 correctly diagnosed 67% of emergency room patients using electronic records and a few sentences from nurses, vs. 50-55% for triage doctors — Researchers say results mark a ‘profound change in technology that will reshape medicine’ — From George Clooney in ER …
- A look at Atlassian and Twilio earnings beats, with early signs of Atlassian's AI response success and Twilio becoming a picks-and-shovels layer for AI agents (Jason Lemkin/SaaStr)
Jason Lemkin / SaaStr : A look at Atlassian and Twilio earnings beats, with early signs of Atlassian's AI response success and Twilio becoming a picks-and-shovels layer for AI agents — So this week two of the more important bellwether names in B2B software reported earnings. And neither of them just “beat.”
- Analysis: Asian suppliers account for ~90% of Nvidia's production costs, up from 65% in 2025, as latest wave of collaborations shifts from chips to physical AI (Abhishek Vishnoi/Bloomberg)
Abhishek Vishnoi / Bloomberg : Analysis: Asian suppliers account for ~90% of Nvidia's production costs, up from 65% in 2025, as latest wave of collaborations shifts from chips to physical AI — The list of Asian stocks that benefit from business partnership with Nvidia Corp. is getting longer, as the region further integrates …
- JLL: Japan's $23B data center market is set to grow ~50% by 2030, with 90% of sites concentrated in densely populated regions, prompting pushback from residents (Financial Times)
Financial Times : JLL: Japan's $23B data center market is set to grow ~50% by 2030, with 90% of sites concentrated in densely populated regions, prompting pushback from residents — Japan is getting ready for a huge surge in AI facilities — and complaints from nearby residents
- How Amazon's expansion into fashion helped Jeff Bezos enter fashion's inner circle, as he and Lauren Sánchez Bezos become underwriters for this year's Met Gala (Chavie Lieber/Wall Street Journal)
Chavie Lieber / Wall Street Journal : How Amazon's expansion into fashion helped Jeff Bezos enter fashion's inner circle, as he and Lauren Sánchez Bezos become underwriters for this year's Met Gala — The Amazon founder and Lauren Sánchez Bezos have become front-row fixtures through business expansion and charitable giving
- Duolingo reports Q1 revenue up 27% YoY to $292M, vs. $288.5M est., bookings up 14% to $308.5M, and expects slower growth in Q2; DUOL drops 12%+ after hours (Akash Sriram/Reuters)
Akash Sriram / Reuters : Duolingo reports Q1 revenue up 27% YoY to $292M, vs. $288.5M est., bookings up 14% to $308.5M, and expects slower growth in Q2; DUOL drops 12%+ after hours — Duolingo (DUOL.O) posted strong first-quarter results but signaled a more measured growth trajectory ahead, as the language-learning …
- Musk v. Altman: Greg Brockman testifies that his OpenAI stake is now worth ~$30B; a Musk attorney asks why he hasn't donated $29B to OpenAI's nonprofit arm (Bloomberg)
Bloomberg : Musk v. Altman: Greg Brockman testifies that his OpenAI stake is now worth ~$30B; a Musk attorney asks why he hasn't donated $29B to OpenAI's nonprofit arm — OpenAI co-founder and President Greg Brockman testified that his stake in the startup is now worth almost $30 billion …
- Palantir reports Q1 revenue up 85% YoY to $1.63B, vs. $1.54B est., US government revenue up 84% to $687M, and US commercial revenue up 133% to $595M (Jaspreet Singh/Reuters)
Jaspreet Singh / Reuters : Palantir reports Q1 revenue up 85% YoY to $1.63B, vs. $1.54B est., US government revenue up 84% to $687M, and US commercial revenue up 133% to $595M — Palantir Technologies (PLTR.O) beat Wall Street estimates for first-quarter revenue on Monday, driven by rising demand for its data analytics software …
- Elon Musk agrees to pay $1.5M to settle SEC allegations that he cheated Twitter shareholders in 2022 by failing to disclose the 5%+ stake he had in the company (Nicola M White/Bloomberg)
Nicola M White / Bloomberg : Elon Musk agrees to pay $1.5M to settle SEC allegations that he cheated Twitter shareholders in 2022 by failing to disclose the 5%+ stake he had in the company — Elon Musk agreed to settle Securities and Exchange Commission allegations that he cheated Twitter shareholders out of millions …
- Kaspersky says Daemon Tools, a widely used app for mounting disk images, has been backdoored in a monthlong compromise that has pushed malicious updates (Dan Goodin/Ars Technica)
Dan Goodin / Ars Technica : Kaspersky says Daemon Tools, a widely used app for mounting disk images, has been backdoored in a monthlong compromise that has pushed malicious updates — Daemon Tools, a widely used app for mounting disk images, has been backdoored in a monthlong compromise that has pushed malicious updates …
- EA reports Q4 net bookings up 3.6% YoY to $1.86B, vs. $2B est., weighed down by a post-launch drop-off in engagement for Battlefield 6 (Anhata Rooprai/Reuters)
Anhata Rooprai / Reuters : EA reports Q4 net bookings up 3.6% YoY to $1.86B, vs. $2B est., weighed down by a post-launch drop-off in engagement for Battlefield 6 — Videogame publisher Electronic Arts (EA.O) missed quarterly bookings estimates on Tuesday, weighed down by a post-launch drop-off in engagement for its …
- GlobalFoundries reports Q1 revenue up 3% YoY to $1.63B, in line with est., and forecasts Q2 revenue and adjusted earnings above estimates; GFS closes up 9.28% (Patrick Seitz/Investor's Business Daily)
Patrick Seitz / Investor's Business Daily : GlobalFoundries reports Q1 revenue up 3% YoY to $1.63B, in line with est., and forecasts Q2 revenue and adjusted earnings above estimates; GFS closes up 9.28% — Contract chipmaker GlobalFoundries (GFS) on Tuesday beat earnings estimates on in-line sales for the first quarter.
- Micron closes up 11% after announcing its highest-capacity SSD has started to ship, lifting its market cap past $700B for the first time; Sandisk closes up 12% (Lola Murti/CNBC)
Lola Murti / CNBC : Micron closes up 11% after announcing its highest-capacity SSD has started to ship, lifting its market cap past $700B for the first time; Sandisk closes up 12% — Micron's historic rally continued on Tuesday, with shares of the memory maker surging 11%, lifting the company's market cap past $700 billion for the first time.
Solidot(35)
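- Data center developer Pure Data suspends Middle East investment projects
After its facilities were damaged in an attack, data center developer Pure Data has suspended all investment in Middle East projects. Pure Data operates or is developing more than 1GW of data centers across Europe, Asia, and the Middle East. As infrastructure, data centers have become significant targets in war. Three Amazon AWS data centers in the Middle East were attacked, causing large-scale service outages for customers in the region and forcing Amazon to waive all fees for customers of its Middle East cloud regions, at a cost of about $150 million. Pure Data's data center campus on Yas Island in Abu Dhabi was hit by shrapnel. The company did not disclose when this happened or the extent of the damage.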
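- Germany's births in 2025 fall to the lowest level since 1946
Preliminary data from Germany's Federal Statistical Office show that the number of births in 2025 fell to its lowest level since 1946. About 655,000 babies were born in Germany in 2025, far below the 1.36 million at the peak of the baby boom in 1964; the 2024 figure was 680,000. Meanwhile, deaths approached 1.01 million, putting the gap between deaths and births in 2025 above 352,000, a postwar record. Germany's birth rate has declined for four consecutive years and now stands at 1.35 children per woman, a record low and well below the 2.1 needed to keep the population stable. Hamburg was the only German state where the fertility rate rose, increasing by 0.5% in 2025.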
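- The price tag Google puts on you
Swiss email provider Proton used 2025 ad auction data to analyze more than 54,000 demographic profiles and estimate what advertisers pay to reach different kinds of Americans. The gap between individuals turned out to be far wider than expected. The average American generates about $1,605 in ad value per year; a childless man aged 35-44 living in Bozeman, Montana, who runs high-value enterprise searches on a desktop computer is valued at an estimated $17,929.30; a father aged 18-24 living in Fort Smith, Arkansas, who runs low-value searches on an Android phone is valued at just $31.05. The $1,605 mean against a $760 median shows that a small number of high-value users pull the average up, and that this business model depends on them. According to the analysis, users without children are worth about 17% more on average than users with children; once a user is flagged as a parent, the ads targeted at them shift from wealth-management ads at $6 per click to minivan and preschool ads at $2 per click. Desktop users are worth 4.9 times as much as Android users, and iPhone users 2.7 times as much as Android users. Ad value peaks at ages 35-44 and falls after 65, although the ads aimed at older users belong to high-spend categories such as Medicare supplement insurance, pharmaceuticals, and financial products. Older users are worth less overall, but advertisers target them more precisely. Why are Bozeman residents worth so much? An influx of remote tech workers and spending on outdoor recreation have made it one of the most competitive local ad markets in the United States.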
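- Asian countries ramp up coal-fired power to cope with the energy crisis
The latest Middle East energy crisis has pushed Asian countries to increase coal-fired power generation. Coal is a highly polluting emissions source, and if the trend continues, global climate change will become more severe. India announced that it is postponing maintenance inspections at domestic coal-fired power plants. International Energy Agency (IEA) data show that, as of 2023, coal accounted for 74% of India's power generation, while oil and natural gas together accounted for about 3%. With purchases from the Middle East constrained, India is increasing coal-fired generation to avoid the risk of blackouts. A Thai power company restarted two coal-fired units that had been scheduled for retirement. South Korea temporarily lifted the operating limit that capped coal-fired plants at 80% of generating capacity and postponed the shutdown of two plants originally due to close in June. Japan will also raise the utilization rate of its coal-fired plants. Bangladesh is adding coal supply sources. Indonesia, the world's largest exporter of thermal coal, plans to raise its 2026 coal production plan, originally set at 600 million tons. The government of Australia, the second-largest exporter, also plans to expand coal production.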
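- Why OpenAI's system prompt specifically restricts goblins
The OpenAI Codex CLI system prompt includes a dedicated restriction on words such as goblins: "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query". According to the official explanation, starting with GPT-5.1 the company's models began mentioning goblins and similar words in metaphors far more often: use of "goblin" in ChatGPT rose 175% and use of "gremlin" rose 52%. OpenAI investigated and found that the Nerdy personality had inadvertently rewarded such metaphors, causing the habit of frequent goblin references to spread. To fix the problem, OpenAI retired the Nerdy personality, removed the goblin-friendly reward signal, and filtered related examples out of the training data to keep the behavior from reappearing inappropriately.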
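- Switzerland to vote in June on capping its population at ten million
Switzerland will hold a national referendum on June 14 to decide whether to limit the country's permanent resident population to ten million before 2050. Switzerland's birth rate is 1.29 children per woman, well below the replacement rate of 2.1, and its population growth is driven mainly by immigration. The Swiss population already exceeds 9 million, and official data show that foreign citizens made up more than 27% of the total in 2024. The proposal, backed by the right-wing Swiss People's Party, demands that "Switzerland's permanent resident population must not exceed 10 million before 2050, and Switzerland should abandon its free movement agreement with the EU". The latest poll of 16,176 Swiss respondents shows that 52% support or lean toward supporting the proposal and 46% oppose it, with the rest undecided.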
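- Root privilege escalation vulnerability Copy Fail disclosed in the kernel
The Xint Code team has reported a kernel root privilege escalation vulnerability known as Copy Fail. The flaw is very easy to exploit and affects nearly every kernel version released since 2017. The kernel security team's failure to notify distributions before disclosure has also drawn controversy. The kernel does not mark the corrupted page as needing writeback, so the file contents on disk stay unchanged while the page cache in memory has been tampered with. Because file accesses are served from the page cache, the corrupted data takes effect across the whole system immediately. A local unprivileged user can gain root by corrupting the page cache of a setuid binary. Since the page cache is shared between the host and containers, an attacker can exploit the flaw across container boundaries. The vulnerability affects nearly all distributions, and the major ones have released or are preparing to release patches.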
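- Mozilla opposes Chrome's Prompt API
In 2025 Google Chrome proposed the Prompt API, a unified JavaScript API for local models integrated into the browser, which must be downloaded before use. Google also intends to make the API a W3C standard. The large model integrated into desktop Chrome is Gemini Nano, which requires a local device with at least 4GB of video memory, 16GB of RAM, and at least 22GB of free space on the drive where the browser is installed. Mozilla developers have issued a statement opposing Chrome's Prompt API. They argue that the API has serious interoperability problems: every model has its own quirks, so system prompts have to be tuned for a specific model, and tuning that suits one model may be an overcorrection for another. To achieve interoperability, Mozilla and Apple might have to license Google's model or ship a model compatible with its quirks. Another major concern is the lack of model neutrality.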
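- VS Code inserts Co-Authored-by Copilot into commits by default
Microsoft's editor VS Code was found to insert Co-Authored-by Copilot into commits by default, whether or not the user had used its AI assistant Copilot. The discovery has again drawn heavy criticism from users. A Microsoft developer responded that the default-on behavior will be fixed in the next release, saying that if a user did not use the AI assistant, the code should not be described as co-authored by Copilot.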
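- China's green technology exports rose 70% in March
With a new energy crisis triggered by the blockade of the Strait of Hormuz, countries around the world are accelerating their shift to clean energy. China, the largest exporter of green technology, saw its combined exports of solar, batteries, and electric vehicles rise 70% year over year in March. Exported solar capacity reached 68GW, battery exports reached $10 billion, and exports of electric and hybrid vehicles rose 140% year over year. As many as 50 countries imported record volumes of solar equipment from China.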
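- Linux accounts for 4.52% of Steam users
In March 2026 the share of Steam players using Linux reached an unprecedented 5.33%, more than double the previous month. According to Valve's Steam Hardware and Software Survey for April 2026, the Linux share fell back to 4.52%, down 0.81 percentage points, but that is still double the level of a year earlier. Windows rose to 93.47%, and OSX stands at 2.01%. There is ample evidence that gaming on Linux has changed dramatically, and one notable trait of gaming on Linux is that it needs fewer resources than on Windows, which is especially attractive now that memory prices are soaring. Other figures: Simplified Chinese users account for 23.41% and English users for 36.77%. Intel CPUs account for 55.81% of users and AMD for 44.18%, almost unchanged from the previous month.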
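- UK's NHS prepares to close all its open source repositories, citing AI
Scheduling platform Cal.com announced last month that it was moving from open source to closed source, reasoning that AI tools make it easier to find vulnerabilities in open code and that closing the source therefore improves security, in effect relying on security through obscurity. The UK's National Health Service (NHS) is now preparing to close nearly all of its open source repositories on the same grounds, a decision that has provoked widespread controversy and criticism. Critics point out that most of the repositories the NHS has published are datasets, internal tools, guidance, research tools, front-end designs, and the like, none of which are affected by advances in security scanning. They add that whether code is open makes no difference to AI tools such as Anthropic Mythos, which can also analyze binaries and look for vulnerabilities. Critics have published an open letter urging the NHS to keep its code public.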
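- Scientists discover how coffee affects the gut and brain
According to a study published in Nature Communications, regularly drinking coffee, whether caffeinated or decaffeinated, affects the gut microbiome and in turn mood and stress levels. The researchers compared 31 regular coffee drinkers with 31 non-drinkers, defining regular drinkers as people who drink 3-5 cups a day. At the start of the experiment the coffee drinkers abstained from coffee for two weeks, during which the researchers continued to collect biological samples and monitor mental health. Coffee was then reintroduced, with half of the participants drinking decaffeinated coffee and the other half regular coffee, and participants did not know which kind they were drinking. All participants reported improved mood, which suggests that even decaffeinated coffee can improve mood. The study also found that regular coffee drinkers had higher levels of Eggertella sp. and Cryptobacterium curtum, and more Firmicutes. Only those who drank decaffeinated coffee showed improvements in learning and memory, while only those who consumed caffeine experienced reduced anxiety and improved attention and alertness.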
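- Astronomers find 27 candidate planets orbiting binary stars
Astronomers have found 27 candidate planets orbiting binary stars, similar to the desert planet Tatooine in Star Wars. Only 18 circumbinary planets have been discovered so far, compared with more than 8,000 planets that orbit a single star the way planets orbit the Sun. Scientists previously identified circumbinary planets through transits, which can be observed only under specific conditions. The new work instead relies on apsidal precession: it looks for a wobble in the orbits of eclipsing binary systems whose stars orbit each other, a wobble that can usually be explained only by the presence of a third body. Using data collected by NASA's Transiting Exoplanet Survey Satellite, the team identified 36 candidate objects in 1,590 star systems, 27 of which may have planetary mass. The researchers say more study is needed to determine whether they are circumbinary planets.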
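- MS Edge found to load all passwords into memory in clear text
The MS Edge browser was found to load all of its saved passwords into memory in clear text at startup. Chrome, by contrast, decrypts credentials only when they are needed and does not keep every password in memory. Both Edge and Chrome are based on the open source Chromium. Microsoft's approach makes it easier to scrape sensitive data from memory and raises the risk of password leaks in shared environments. Security researchers reported the issue to Microsoft and were told the behavior is by design. The researchers have published a proof-of-concept tool, EdgeSavedPasswordsDumper, on GitHub.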
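- NetHack releases version 5.0
NetHack, the 39-year-old roguelike game, has released version 5.0. Its first version came out in 1987; the "Net" in the name refers to its collaborative development over the network, and "Hack" refers to hack and slash in role-playing games. Players can take on roles such as Knight, Barbarian, Wizard, Ranger, Valkyrie, Monk, and Samurai, and the goal is to retrieve the Amulet of Yendor from the lowest level of the dungeon and offer it to their deity. Besides Windows, NetHack 5.0 supports MS-DOS and Amiga, and because it is fully open source it can be compiled to run on Linux and other Unix-like systems. NetHack 5.0 can now be built with any compiler that supports the C99 standard, uses Lua to generate dungeons, adds an optional tutorial for the early game, and more.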
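- CNN founder Ted Turner dies at 87
CNN founder Ted Turner has died at the age of 87. The network he created became known for round-the-clock live coverage of world news and revolutionized television news. CNN launched on June 1, 1980, as the first cable network to broadcast news 24 hours a day. CNN was sold to Time Warner in 1995 and Turner left the television industry; he always called CNN the greatest achievement of his life.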
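- Study says eating eggs helps lower Alzheimer's risk
Researchers found that eating one egg a day, at least five days a week, can lower the risk of Alzheimer's disease by up to 27%. Eating eggs 1-3 times a month lowers the risk by 17%, and eating them 2-4 times a week lowers it by 20%. The researchers say eggs supply key nutrients that benefit brain health. Eggs provide choline, a precursor of acetylcholine and phosphatidylcholine, both of which are essential to memory and synaptic function. Eggs also contain lutein and zeaxanthin, carotenoids associated with better cognitive performance and lower oxidative stress. In addition, eggs contain important omega-3 fatty acids, and the yolk is especially rich in phospholipids, which make up nearly 30% of an egg's total lipids and are essential to the function of neurotransmitter receptors. The study was funded by the American Egg Board.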
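- OpenAI president made to read his personal diary while testifying in court
Elon Musk testified in court last week, accusing OpenAI's two other co-founders, Greg Brockman and Sam Altman, of abandoning the nonprofit mission the company was founded on in pursuit of personal gain. Brockman took the stand this week and was made to read from his personal diary in front of the jury, passages that appeared to support Musk's accusation. Brockman said he has kept a diary since his student days and has used it throughout his career to think through major decisions. The diaries were submitted to the court as evidence last October and unsealed this January. In 2017 Musk gave OpenAI an ultimatum: either he would take full control of OpenAI's for-profit arm, or OpenAI would remain a nonprofit. Around the same time, Brockman was writing at length in his diary about the benefits of making money. After OpenAI set up a for-profit arm that Musk did not control, Brockman's personal stake in OpenAI is now worth $30 billion. In the diary he also agonized over whether voting against Musk's plan, or voting to remove Musk from the board, would be morally wrong. He wrote that taking the nonprofit away from Musk would be wrong and morally bankrupt.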
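- Oscars reject AI actors and AI-written screenplays
The Academy of Motion Picture Arts and Sciences, which awards the Oscars, announced that only performances by humans and screenplays written by humans are eligible for Oscar nominations. The Oscars will not ban the use of AI tools outright, but films will be judged on whether humans still play the central role in the creative work. The Academy said that if filmmakers use AI tools in their work, those tools will neither help nor harm the chances of a nomination. This is the first time the Academy has made explicit that the awards go only to human performances and human-written screenplays.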