OrangeBot.AI Digest — 2026-04-28

88 headlines across 6 sources, aggregated for the day.

Hacker News(15)

  1. Ghostty is leaving GitHub (mitchellh.com)
  2. Waymo in Portland (waymo.com)
  3. Claude.ai unavailable and elevated errors on the API (status.claude.com)
  4. Anthropic Joins the Blender Development Fund as Corporate Patron (www.blender.org)
  5. Google and Pentagon reportedly agree on deal for 'any lawful' use of AI (www.theverge.com)
  6. Your phone is about to stop being yours (keepandroidopen.org)
  7. OpenAI CEO's Identity Verification Company Announced Fake Bruno Mars Partnership (www.vice.com)
  8. UAE Leaves OPEC (www.reuters.com)
  9. UAE to leave OPEC (www.ft.com)
  10. GitHub Copilot code review will start consuming GitHub Actions minutes (github.blog)
  11. VibeVoice: Open-source frontier voice AI (github.com)
  12. Localsend: An open-source cross-platform alternative to AirDrop (github.com)
  13. Period tracking app, Flo, found to be selling user data to Meta (femtechdesigndesk.substack.com)
  14. Who owns the code Claude Code wrote? (legallayer.substack.com)
  15. An Update on GitHub Availability (github.blog)

GitHub Trending(13)

  1. mattpocock / skills
  2. abhigyanpatwari / GitNexus
  3. ComposioHQ / awesome-codex-skills
  4. microsoft / VibeVoice
  5. davila7 / claude-code-templates
  6. HunxByts / GhostTrack
  7. fspecii / ace-step-ui
  8. public-apis / public-apis
  9. CJackHwang / ds2api
  10. Alishahryar1 / free-claude-code
  11. donnemartin / system-design-primer
  12. EbookFoundation / free-programming-books
  13. iamgio / quarkdown

Product Hunt(15)

  1. Kinhub

    Scalable coaching that drives real business impact

  2. Flitch

    Turn your data into insights

  3. Doza Assist

    Open-source local AI that learns how you edit video

  4. Clera

    An AI agent matching candidates to the right roles.

  5. SureThing.io

    Autonomous agent that communicates results like a human

  6. Happy Horse

    Top-tier AI video generation and editing from Alibaba ATH

  7. Actian VectorAI DB

    The portable vector database for AI agents beyond the cloud

  8. Lovable mobile app

    Your ideas don't wait for you to sit down at a desk

  9. Blueprint

    One-shot bigger coding tasks

  10. Lumen Tool

    A 3D portrait lighting simulator

  11. Social Fetch

    Pull real-time data from any social platform via API.

  12. Colir

    Gradients that don't look like defaults

  13. MaxHermes by Minimax

    AI agent that builds skills from every task you give it

  14. Jitera

    Shared context that turns AI into your teammate

  15. Couch Critic

    Netflix killed comments. We brought them back.

Hugging Face(15)

  1. From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

    Individual agent capabilities have advanced rapidly through modular skills and tool integrations, yet multi-agent systems remain constrained by fixed team structures, tightly coupled coordination logic, and session-bound learning. We argue that this reflects a deeper absence: a principled organisational layer that governs how a workforce of agents is assembled, governed, and improved over time, decoupled from what individual agents know. To fill this gap, we introduce OneManCompany (OMC), a framework that elevates multi-agent systems to the organisational level. OMC encapsulates skills, tools, and runtime configurations into portable agent identities called Talents, orchestrated through typed organisational interfaces that abstract over heterogeneous backends. A community-driven Talent Market enables on-demand recruitment, allowing the organisation to close capability gaps and reconfigure itself dynamically during execution. Organisational decision-making is operationalised through an Explore-Execute-Review (E^2R) tree search, which unifies planning, execution, and evaluation in a single hierarchical loop: tasks are decomposed top-down into accountable units and execution outcomes are aggregated bottom-up to drive systematic review and refinement. This loop provides formal guarantees on termination and deadlock freedom while mirroring the feedback mechanisms of human enterprises. Together, these contributions transform multi-agent systems from static, pre-configured pipelines into self-organising and self-improving AI organisations capable of adapting to open-ended tasks across diverse domains. Empirical evaluation on PRDBench shows that OMC achieves an 84.67% success rate, surpassing the state of the art by 15.48 percentage points, with cross-domain case studies further demonstrating its generality.

  2. World-R1: Reinforcing 3D Constraints for Text-to-Video Generation

    Recent video foundation models demonstrate impressive visual synthesis but frequently suffer from geometric inconsistencies. While existing methods attempt to inject 3D priors via architectural modifications, they often incur high computational costs and limit scalability. We propose World-R1, a framework that aligns video generation with 3D constraints through reinforcement learning. To facilitate this alignment, we introduce a specialized pure text dataset tailored for world simulation. Utilizing Flow-GRPO, we optimize the model using feedback from pre-trained 3D foundation models and vision-language models to enforce structural coherence without altering the underlying architecture. We further employ a periodic decoupled training strategy to balance rigid geometric consistency with dynamic scene fluidity. Extensive evaluations reveal that our approach significantly enhances 3D consistency while preserving the original visual quality of the foundation model, effectively bridging the gap between video generation and scalable world simulation.

  3. ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning

    Current evaluations of spatial intelligence can be systematically invalid under modern vision-language model (VLM) settings. First, many benchmarks derive question-answer (QA) pairs from point-cloud-based 3D annotations originally curated for traditional 3D perception. When such annotations are treated as ground truth for video-based evaluation, reconstruction and annotation artifacts can miss objects that are clearly visible in the video, mislabel object identities, or corrupt geometry-dependent answers (e.g., size), yielding incorrect or ambiguous QA pairs. Second, evaluations often assume full-scene access, while many VLMs operate on sparsely sampled frames (e.g., 16-64), making many questions effectively unanswerable under the actual model inputs. We improve evaluation validity by introducing ReVSI, a benchmark and protocol that ensures each QA pair is answerable and correct under the model's actual inputs. To this end, we re-annotate objects and geometry across 381 scenes from 5 datasets to improve data quality, and regenerate all QA pairs with rigorous bias mitigation and human verification using professional 3D annotation tools. We further enhance evaluation controllability by providing variants across multiple frame budgets (16/32/64/all) and fine-grained object visibility metadata, enabling controlled diagnostic analyses. Evaluations of general and domain-specific VLMs on ReVSI reveal systematic failure modes that are obscured by prior benchmarks, yielding a more reliable and diagnostic assessment of spatial intelligence.

  4. Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

    Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety challenges, stemming from the embodied nature of VLA systems, including irreversible physical consequences, a multimodal attack surface across vision, language, and state, real-time latency constraints on defense, error propagation over long-horizon trajectories, and vulnerabilities in the data supply chain. Yet the literature remains fragmented across robotic learning, adversarial machine learning, AI alignment, and autonomous systems safety. This survey provides a unified and up-to-date overview of safety in Vision-Language-Action models. We organize the field along two parallel timing axes, attack timing (training-time vs. inference-time) and defense timing (training-time vs. runtime), linking each class of threat to the stage at which it can be mitigated. We first define the scope of VLA safety, distinguishing it from text-only LLM safety and classical robotic safety, and review the foundations of VLA models, including architectures, training paradigms, and inference mechanisms. We then examine the literature through four lenses: Attacks, Defenses, Evaluation, and Deployment. We survey training-time threats such as data poisoning and backdoors, as well as inference-time attacks including adversarial patches, cross-modal perturbations, semantic jailbreaks, and freezing attacks. We review training-time and runtime defenses, analyze existing benchmarks and metrics, and discuss safety challenges across six deployment domains. Finally, we highlight key open problems, including certified robustness for embodied trajectories, physically realizable defenses, safety-aware training, unified runtime safety architectures, and standardized evaluation.

  5. Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

    Unified multimodal models typically rely on pretrained vision encoders and use separate visual representations for understanding and generation, creating misalignment between the two tasks and preventing fully end-to-end optimization from raw pixels. We introduce Tuna-2, a native unified multimodal model that performs visual understanding and generation directly based on pixel embeddings. Tuna-2 drastically simplifies the model architecture by employing simple patch embedding layers to encode visual input, completely discarding modular vision encoder designs such as the VAE or the representation encoder. Experiments show that Tuna-2 achieves state-of-the-art performance on multimodal benchmarks, demonstrating that unified pixel-space modelling can fully compete with latent-space approaches for high-quality image generation. Moreover, while the encoder-based variant converges faster in early pretraining, Tuna-2's encoder-free design achieves stronger multimodal understanding at scale, particularly on tasks requiring fine-grained visual perception. These results show that pretrained vision encoders are not necessary for multimodal modelling, and end-to-end pixel-space learning offers a scalable path toward stronger visual representations for both generation and perception. (See the illustrative patch-embedding sketch after this list.)

  6. ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

    Language-model agents are increasingly used as persistent coworkers that assist users across multiple working days. During such workflows, the surrounding environment may change independently of the agent: new emails arrive, calendar entries shift, knowledge-base records are updated, and evidence appears across images, scanned PDFs, audio, video, and spreadsheets. Existing benchmarks do not adequately evaluate this setting because they typically run within a single static episode and remain largely text-centric. We introduce ClawMark, a benchmark for coworker agents built around multi-turn, multi-day tasks, a stateful sandboxed service environment whose state evolves between turns, and rule-based verification. The current release contains 100 tasks across 13 professional scenarios, executed against five stateful sandboxed services (filesystem, email, calendar, knowledge base, spreadsheet) and scored by 1537 deterministic Python checkers over post-execution service state; no LLM-as-judge is invoked during scoring. We benchmark seven frontier agent systems. The strongest model reaches a weighted score of 75.8, but the best strict Task Success rate is only 20.0%, indicating that partial progress is common while complete end-to-end workflow completion remains rare. Turn-level analysis shows that performance drops after the first exogenous environment update, highlighting adaptation to changing state as a key open challenge. We release the benchmark, evaluation harness, and construction pipeline to support reproducible coworker-agent evaluation.

  7. SketchVLM: Vision language models can annotate images to explain thoughts and guide users

    When answering questions about images, humans naturally point, label, and draw to explain their reasoning. In contrast, modern vision-language models (VLMs) such as Gemini-3-Pro and GPT-5 only respond with text, which can be difficult for users to verify. We present SketchVLM, a training-free, model-agnostic framework that enables VLMs to produce non-destructive, editable SVG overlays on the input image to visually explain their answers. Across seven benchmarks spanning visual reasoning (maze navigation, ball-drop trajectory prediction, and object counting) and drawing (part labeling, connecting-the-dots, and drawing shapes around objects), SketchVLM improves visual reasoning task accuracy by up to +28.5 percentage points and annotation quality by up to 1.48x relative to image-editing and fine-tuned sketching baselines, while also producing annotations that are more faithful to the model's stated answer. We find that single-turn generation already achieves strong accuracy and annotation quality, and multi-turn generation opens up further opportunities for human-AI collaboration. An interactive demo and code are at https://sketchvlm.github.io/. (See the illustrative SVG-overlay sketch after this list.)

  8. Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis

    Process Reward Models (PRMs) have achieved remarkable success in augmenting the reasoning capabilities of Large Language Models (LLMs) within static domains such as mathematics. However, their potential in dynamic data analysis tasks remains underexplored. In this work, we first present an empirical study revealing that general-domain PRMs struggle to supervise data analysis agents. Specifically, they fail to detect silent errors (logical flaws that yield incorrect results without triggering interpreter exceptions) and erroneously penalize exploratory actions, mistaking necessary trial-and-error exploration for grounding failures. To bridge this gap, we introduce DataPRM, a novel environment-aware generative process reward model that (1) can serve as an active verifier, autonomously interacting with the environment to probe intermediate execution states and uncover silent errors, and (2) employs a reflection-aware ternary reward strategy that distinguishes between correctable grounding errors and irrecoverable mistakes. We design a scalable pipeline to construct over 8K high-quality training instances for DataPRM via diversity-driven trajectory generation and knowledge-augmented step-level annotation. Experimental results demonstrate that DataPRM improves downstream policy LLMs by 7.21% on ScienceAgentBench and 11.28% on DABStep using Best-of-N inference. Notably, with only 4B parameters, DataPRM outperforms strong baselines and exhibits robust generalizability across diverse Test-Time Scaling strategies. Furthermore, integrating DataPRM into Reinforcement Learning yields substantial gains over outcome-reward baselines, achieving 78.73% on DABench and 64.84% on TableBench, validating the effectiveness of process reward supervision. Code is available at https://github.com/zjunlp/DataMind.

  9. For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs

    Data valuation is essential for enhancing the transparency and accountability of large language models (LLMs) and vision-language models (VLMs). However, existing methods typically rely on gradient computations, making them computationally prohibitive for billion-parameter models and precluding batch parallelization. In this work, we introduce For-Value, a forward-only data valuation framework that enables efficient, batch-scalable value estimation while maintaining effectiveness. Leveraging the expressive power of pretrained LLMs/VLMs, we theoretically demonstrate that data valuation can be captured by the alignment between the final hidden representations and prediction errors at the last layer. In light of this insight, For-Value computes data value using a simple closed-form expression with a single forward pass, eliminating the need for costly backpropagation and enabling efficient batch calculation at scale. Extensive experiments show that For-Value matches or outperforms gradient-based baselines in detecting influential data and mislabeled data, while achieving significant efficiency improvements. (See the illustrative last-layer alignment sketch after this list.)

  10. Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment

    Large Language Model agents have rapidly evolved from static text generators into dynamic systems capable of executing complex autonomous workflows. To enhance reliability, multi-agent frameworks assigning specialized roles are increasingly adopted to enable self-reflection and mutual auditing. While such role-playing effectively leverages domain expert knowledge, we find it simultaneously induces a human-like cognitive bias known as Actor-Observer Asymmetry (AOA). Specifically, an agent acting as an actor (during self-reflection) tends to attribute failures to external factors, whereas an observer (during mutual auditing) attributes the same errors to internal faults. We quantify this using our new Ambiguous Failure Benchmark, which reveals that simply swapping perspectives triggers the AOA effect in over 20% of cases for most models. To tame this bias, we introduce ReTAS (Reasoning via Thesis-Antithesis-Synthesis), a model trained through dialectical alignment to enforce perspective-invariant reasoning. By integrating dialectical chain-of-thought with Group Relative Policy Optimization, ReTAS guides agents to synthesize conflicting viewpoints into an objective consensus. Experiments demonstrate that ReTAS effectively mitigates attribution inconsistency and significantly improves fault resolution rates in ambiguous scenarios.

  11. Efficient Agent Evaluation via Diversity-Guided User Simulation

    Large language models (LLMs) are increasingly deployed as customer-facing agents, yet evaluating their reliability remains challenging due to stochastic, multi-turn interactions. Current evaluation protocols rely on linear Monte Carlo rollouts of complete agent-user conversations to estimate success. However, this approach is computationally inefficient, repeatedly regenerating identical early prefixes, and often fails to uncover deep failure modes that arise from rare user behaviors. We introduce DIVERT (Diversity-Induced Evaluation via Branching of Trajectories), an efficient, snapshot-based, coverage-guided user simulation framework for systematic exploration of agent-user interactions. DIVERT captures the full agent-environment state at critical decision points and resumes execution from these snapshots, enabling reuse of shared conversation prefixes and reducing redundant computation. From each junction, the framework branches using targeted, diversity-inducing user responses, allowing directed exploration of alternative interaction paths. By focusing evaluation on semantically diverse and underexplored trajectories, DIVERT improves both efficiency and coverage. Empirical results show that it discovers more failures per token compared to standard linear rollout protocols, while expanding the set of tasks on which failures are identified. (See the illustrative snapshot-branching sketch after this list.)

  12. OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer

    Shot Boundary Detection (SBD) aims to automatically identify shot changes and divide a video into coherent shots. While SBD has been widely studied in the literature, existing state-of-the-art methods often produce non-interpretable boundaries on transitions, miss subtle yet harmful discontinuities, and rely on noisy, low-diversity annotations and outdated benchmarks. To alleviate these limitations, we propose OmniShotCut, which formulates SBD as structured relational prediction, jointly estimating shot ranges with intra-shot and inter-shot relations via a shot-query-based dense video Transformer. To avoid imprecise manual labeling, we adopt a fully synthetic transition synthesis pipeline that automatically reproduces major transition families with precise boundaries and parameterized variants. We also introduce OmniShotCutBench, a modern wide-domain benchmark enabling holistic and diagnostic evaluation.

  13. Sapiens2

    We present Sapiens2, a model family of high-resolution transformers for human-centric vision focused on generalization, versatility, and high-fidelity outputs. Our model sizes range from 0.4 to 5 billion parameters, with native 1K resolution and hierarchical variants that support 4K. Sapiens2 substantially improves over its predecessor in both pretraining and post-training. First, to learn features that capture low-level details (for dense prediction) and high-level semantics (for zero-shot or few-label settings), we combine masked image reconstruction with self-distilled contrastive objectives. Our evaluations show that this unified pretraining objective is better suited for a wider range of downstream tasks. Second, along the data axis, we pretrain on a curated dataset of 1 billion high-quality human images and improve the quality and quantity of task annotations. Third, architecturally, we incorporate advances from frontier models that enable longer training schedules with improved stability. Our 4K models adopt windowed attention to reason over longer spatial context and are pretrained with 2K output resolution. Sapiens2 sets a new state-of-the-art and improves over the first generation on pose (+4 mAP), body-part segmentation (+24.3 mIoU), normal estimation (45.6% lower angular error) and extends to new tasks such as pointmap and albedo estimation. Code: https://github.com/facebookresearch/sapiens2

  14. UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models

    Camera-controllable image editing aims to synthesize novel views of a given scene under varying camera poses while strictly preserving cross-view geometric consistency. However, existing methods typically rely on fragmented geometric guidance, such as only injecting point clouds at the representation level despite models containing multiple levels, and are mainly based on image diffusion models that operate on discrete view mappings. These two limitations jointly lead to geometric drift and structural degradation under continuous camera motion. We observe that while leveraging video models provides continuous viewpoint priors for camera-controllable image editing, they still struggle to form stable geometric understanding if geometric guidance remains fragmented. To systematically address this, we inject unified geometric guidance across three levels that jointly determine the generative output: representation, architecture, and loss function. To this end, we propose UniGeo, a novel camera-controllable editing framework. Specifically, at the representation level, UniGeo incorporates a frame-decoupled geometric reference injection mechanism to provide robust cross-view geometry context. At the architecture level, it introduces geometric anchor attention to align multi-view features. At the loss function level, it proposes a trajectory-endpoint geometric supervision strategy to explicitly reinforce the structural fidelity of target views. Comprehensive experiments across multiple public benchmarks, encompassing both extensive and limited camera motion settings, demonstrate that UniGeo significantly outperforms existing methods in both visual quality and geometric consistency.

  15. TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction

    Existing document OCR largely targets plain text or Markdown, discarding the structural and executable properties that make LaTeX essential for scientific publishing. We study page-level reconstruction of scientific PDFs into compilable LaTeX and introduce TexOCR-Bench, a benchmark, and TexOCR-Train, a large-scale training corpus, for this task. TexOCR-Bench features a multi-dimensional evaluation suite that jointly assesses transcription fidelity, structural faithfulness, and end-to-end compilability. Leveraging TexOCR-Train, we train a 2B-parameter model, TexOCR, using supervised fine-tuning (SFT) and reinforcement learning (RL) with verifiable rewards derived from LaTeX unit tests that directly enforce compilability and referential integrity. Experiments across 21 frontier models on TexOCR-Bench show that existing systems frequently violate key document invariants, including consistent section structure, correct float placement, and valid label-reference links, which undermines compilation reliability and downstream usability. Our analysis further reveals that RL with verifiable rewards yields consistent improvements over SFT alone, particularly on structural and compilation metrics. (See the illustrative compile-check reward sketch after this list.)
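
A minimal sketch of the encoder-free patch embedding described in item 5 (Tuna-2): raw pixels are cut into non-overlapping patches and projected straight into the transformer's token space, with no VAE or pretrained vision encoder in between. The class name, patch size, and dimensions are illustrative assumptions, not the paper's code.

    import torch
    import torch.nn as nn

    class PixelPatchEmbed(nn.Module):
        """Project raw image patches directly into model tokens."""
        def __init__(self, patch_size: int = 16, in_chans: int = 3, dim: int = 1024):
            super().__init__()
            # A strided convolution cuts the image into non-overlapping
            # patches and projects each one in a single operation.
            self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)

        def forward(self, pixels: torch.Tensor) -> torch.Tensor:
            x = self.proj(pixels)                # (B, dim, H/ps, W/ps)
            return x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)

    tokens = PixelPatchEmbed()(torch.randn(2, 3, 256, 256))
    print(tokens.shape)  # torch.Size([2, 256, 1024])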
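
Item 7 (SketchVLM) packages model output as an editable SVG layer above the unchanged input image rather than editing pixels. Below is a minimal sketch of that packaging step, assuming PNG input; generating the shapes is the model's job and is elided here, so `svg_overlay` and the sample annotations are hypothetical stand-ins.

    import base64

    def svg_overlay(png_bytes: bytes, width: int, height: int, annotations: list[str]) -> str:
        """Wrap an unchanged PNG in an SVG with an editable annotation layer."""
        b64 = base64.b64encode(png_bytes).decode()
        return (
            f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
            f'<image href="data:image/png;base64,{b64}" width="{width}" height="{height}"/>'
            + "".join(annotations)  # shapes sit above, never inside, the pixels
            + "</svg>"
        )

    # e.g. a circle the model drew around a counted object, plus a label
    shapes = [
        '<circle cx="120" cy="80" r="30" fill="none" stroke="red" stroke-width="3"/>',
        '<text x="158" y="85" fill="red">ball 1</text>',
    ]
    doc = svg_overlay(b"...", 640, 480, shapes)  # placeholder image bytes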
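
Item 9 (For-Value) reads data value off the alignment between final hidden representations and last-layer prediction errors, using forward passes only. The sketch below shows why that is plausible for a linear classification head: the cross-entropy gradient with respect to the last-layer weights factorizes as outer(error, hidden), so a gradient dot product between two examples collapses to (error_i . error_j) * (hidden_i . hidden_j), both available from a forward pass. The paper's exact formula may differ; this is an assumption-labeled illustration, not its code.

    import torch
    import torch.nn.functional as F

    def forward_only_scores(h_train, y_train, h_val, y_val, W):
        """Score training examples by last-layer gradient alignment with a
        validation set, computed without any backward pass."""
        def softmax_error(h, y):
            p = F.softmax(h @ W.T, dim=-1)               # predicted probabilities
            return p - F.one_hot(y, W.shape[0]).float()  # last-layer error term
        e_tr, e_va = softmax_error(h_train, y_train), softmax_error(h_val, y_val)
        # (e_i . e_j) * (h_i . h_j), summed over the validation set
        return ((e_tr @ e_va.T) * (h_train @ h_val.T)).sum(dim=1)

    W = torch.randn(10, 64)  # frozen last-layer weights (10 classes, 64-dim features)
    scores = forward_only_scores(
        torch.randn(100, 64), torch.randint(0, 10, (100,)),   # train reps, labels
        torch.randn(20, 64), torch.randint(0, 10, (20,)), W)  # val reps, labels
    print(scores.shape)  # torch.Size([100]), one value per training example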
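
Item 11 (DIVERT) evaluates agents by snapshotting state at decision points and branching with diverse simulated user replies instead of re-running full linear rollouts. A minimal sketch of that control flow; `agent_reply` and `diverse_user_replies` are hypothetical stand-ins for model calls, not the paper's API.

    import copy

    def agent_reply(state):                # stand-in for the agent under test
        return f"agent@turn{len(state)}"

    def diverse_user_replies(state, k=2):  # stand-in for the user simulator
        return [f"user-variant-{i}" for i in range(k)]

    def explore(state, depth, trajectories):
        if depth == 0:
            trajectories.append(state)
            return
        state = state + [agent_reply(state)]  # extend the shared prefix once
        snapshot = copy.deepcopy(state)       # checkpoint at the junction
        for reply in diverse_user_replies(snapshot):
            # branch from the snapshot, reusing the prefix instead of regenerating it
            explore(snapshot + [reply], depth - 1, trajectories)

    trajs = []
    explore(["user: initial request"], depth=3, trajectories=trajs)
    print(len(trajs))  # 8 trajectories that share prefixes instead of 8 full rollouts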
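
Item 15 (TexOCR) derives verifiable RL rewards from LaTeX unit tests that enforce compilability. The simplest such check is sketched below, assuming a local pdflatex install; the paper's actual reward suite also tests referential integrity and structure, which this stub omits.

    import pathlib
    import subprocess
    import tempfile

    def compile_reward(tex_source: str) -> float:
        """Return 1.0 if the LaTeX source compiles cleanly, else 0.0."""
        with tempfile.TemporaryDirectory() as tmp:
            src = pathlib.Path(tmp) / "page.tex"
            src.write_text(tex_source)
            # -halt-on-error makes failures deterministic; nonstopmode avoids prompts
            proc = subprocess.run(
                ["pdflatex", "-interaction=nonstopmode", "-halt-on-error",
                 "-output-directory", tmp, str(src)],
                capture_output=True,
            )
            return 1.0 if proc.returncode == 0 else 0.0

    print(compile_reward(r"\documentclass{article}\begin{document}Hi\end{document}"))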

Techmeme(15)

  1. AWS launches a desktop app for its Amazon Quick AI assistant, letting users connect their tools and local files to build custom apps, live dashboards, and more (Jigar Thakkar/About Amazon)

    Amazon Quick brings a personal AI assistant to your desktop. Build presentations, intelligent dashboards, and more.

  2. Musk v. Altman: Musk testifies he's suing OpenAI because "it is not okay to steal a charity" and its pivot sets a concerning precedent for philanthropic efforts (Bloomberg)

    Arguments Begin in Musk, Altman Showdown

  3. NXP reports Q1 revenue up 12% YoY to $3.18B, vs. $3.15B est., and forecasts Q2 revenue above estimates; NXPI jumps 13%+ after hours (Christina Kyriasoglou/Bloomberg)

    NXP Semiconductors NV jumped in late trading after giving an upbeat revenue forecast, a sign the chipmaker is bouncing back from a prolonged auto industry slump and tariff uncertainties.

  4. Sources: the US Commerce Department last week ordered multiple chip equipment companies to halt some shipments to China's second-largest chipmaker, Hua Hong (Karen Freifeld/Reuters)

    The U.S. Department of Commerce last week ordered multiple chip equipment companies to halt certain tool shipments …

  5. Robinhood reports Q1 revenue up 15% YoY to $1.07B, vs. $1.14B est., and crypto revenue down 47% to $134M, vs. $147.6M est.; HOOD drops 6%+ after hours (Luke Kawa/Sherwood News)

    The brokerage just reported quarterly results. Robinhood Markets (HOOD $77.11, -2.29%) is sharply lower in postmarket trading after reporting underwhelming Q1 results …

  6. Seagate reports Q3 revenue up 44% YoY to $3.11B, vs. $2.96B est., and forecasts Q4 revenue and adjusted EPS above estimates; STX jumps 13%+ after hours (Zaheer Kachwala/Reuters)

    Seagate Technology (STX.O) forecast fourth-quarter revenue and profit above Wall Street expectations on Tuesday, betting on strong demand …

  7. China's National Supercomputing Center in Shenzhen unveils the Lingshen project, aiming for 2+ exaFLOPS performance using a domestic-made CPU-only architecture (Luke James/Tom's Hardware)

    Good luck with that. … China's National Supercomputing Center in Shenzhen announced the Lingshen supercomputer project …

  8. Sources: Apple plans an AI overhaul for photo editing in iOS 27, including using on-device AI models to extend, enhance, and reframe photos (Mark Gurman/Bloomberg)

    Apple Inc. is planning a major overhaul of the built-in photo-editing features for the iPhone, iPad and Mac, leaning heavily on artificial intelligence to better compete with Android devices.

  9. South Africa withdraws its first draft national AI policy after revelations that it contained fictitious sources that appeared to have been AI-generated (Nellie Peyton/Reuters)

    South Africa has withdrawn its first draft national AI policy after revelations that it contained fictitious sources in its reference list which appeared to have been AI-generated.

  10. Q&A with Sam Altman and AWS CEO Matt Garman about OpenAI's new partnership with AWS, Bedrock Managed Agents, Trainium chips, and more (Ben Thompson/Stratechery)

    As I noted yesterday, today's Stratechery Interview is early in terms of my timing — Tuesday instead of Thursday — and late in terms of delivery …

  11. Sources: Google dropped out of a $100M Pentagon challenge to create tech for voice-controlled, autonomous drone swarms, following an internal ethics review (Katrina Manson/Bloomberg)

    Google abruptly dropped out of a $100 million Pentagon prize challenge to create technology for voice-controlled …

  12. Amazon and OpenAI announce an expanded deal that will make OpenAI's models available from AWS, a day after OpenAI revised its Microsoft partnership (Ina Fried/Axios)

    Amazon and OpenAI on Tuesday announced an expanded deal that will make the AI startup's models available from Amazon's cloud.

  13. AWS launches Amazon Connect Decisions and Amazon Connect Talent, which are AI agentic tools aimed at logistics workers and recruiters (Matt Day/Bloomberg)

    Amazon.com Inc.'s cloud unit, best known for providing technology infrastructure to corporations, is looking to sell AI-powered productivity software for the office.

  14. Lovable launches its AI coding app on iOS and Android, letting users code via voice or text AI prompts, and allowing them to switch between a PC and mobile (Sarah Perez/TechCrunch)

    Apple's recent crackdown on vibe-coding apps hasn't held up Lovable's launch of its no-code AI app builder, which is now available …

  15. Musk v. Altman: the judge asks Elon Musk and Sam Altman to "control your propensity to use social media to make things worse outside this courtroom" (Madlin Mekelburg/Bloomberg)

    The judge overseeing a high-profile court case between Elon Musk and OpenAI has asked the tech executives involved in the legal battle …

Solidot(15)

  1. Survey finds vaccine-hesitant people are more likely to read New Right news

    In 2025, 43 US states reported more than 2,000 measles cases, almost all among unvaccinated people, and case counts have kept rising in 2026. MMR (measles-mumps-rubella) vaccination rates among US school-age children continue to fall, hovering around 93%, below the 95% herd-immunity threshold. Researchers surveyed 2,970 adults: although most Americans (83%) said the benefits of the MMR vaccine outweigh its risks, roughly one in six respondents reported hesitancy about vaccination. Hesitant adults skewed younger (62% were under 44) and were more likely to be parents; they were also more likely to be members of minority groups, lower-income, and less educated. They expressed more conservative political beliefs and more often identified as Republican (39%) or independent (33%). Hesitant adults were likewise more likely to identify with the Make America Healthy Again (MAHA) movement, at 43% versus 27% of non-hesitant adults. The largest difference between the two groups was the hesitant group's preference for New Right news outlets such as Breitbart, Newsmax, and Zero Hedge.

  2. Neanderthal and modern human brains differ mainly in outward shape

    Neanderthal and modern Homo sapiens skulls clearly differ in shape: Neanderthal skulls are flatter and longer, while modern humans' are rounder. According to a PNAS study, "Neanderthal brain and cognition reconsidered", that outward difference says little. Comparing MRI scans of 400 modern human brains (200 of European and 200 of Han Chinese ancestry) with Neanderthal endocasts, the researchers found more variation among modern human brains than between Pleistocene Homo sapiens and Neanderthals. Given that brain size does not accurately predict cognitive ability, Neanderthal cognition may have been close to modern humans', meaning humans may not have prevailed over Neanderthals by being smarter or more adaptable. The researchers argue that the differences in Neanderthal brains and cognition fall entirely within the range of variation found among modern humans.

  3. The immune system retains a memory of obesity for years

    According to a decade-long study published in EMBO Reports, immune cells known as helper T cells carry a long-term memory of obesity. The memory is written onto the cells' DNA through DNA methylation and persists for 5-10 years even after successful weight loss, which means people who lose weight still face obesity-related disease risks for years. The researchers say short-term weight loss may not immediately lower those risks; the new weight may need to be maintained for several years before obesity's effects on T cells can be reversed.

  4. 4TB of voice samples stolen from Mercor

    Mercor is a US AI startup whose main business is providing experts to help other AI companies train models and chatbots. When hiring experts and contractors it requires a passport or driver's license scan, a selfie, and a recorded voice sample. Earlier this month the extortion group Lapsus$ disclosed that it had stolen 4TB of voice samples from Mercor. Voice samples combined with identity documents have raised contractors' fears of identity theft. At least five contractors have sued Mercor, alleging the company collected voice characteristics in the name of training data without making clear that those characteristics are permanent biometric identifiers. Current voice-cloning technology needs only 15 seconds of clean reference audio, while Mercor required contractors to record 2-5 minutes of speech, more than enough for voice cloning.

  5. Samsung plans to pull its home appliance business out of China

    South Korea's Samsung Electronics plans to withdraw from home appliance and TV sales in China within 2026. With the rise of Chinese competitors, the business's earnings have stayed depressed, and Samsung will shift its focus to the strong-performing US market. The company could make a final decision to stop domestic Chinese appliance and TV sales as early as the end of April, after which it will begin briefing local employees and partners, gradually clear its inventory in China, and end sales completely within 2026. Resources will be redirected toward semiconductor and smartphone sales. Samsung makes refrigerators, washing machines, and other products in China and will keep that manufacturing in place, converting it into a supply base for overseas markets as domestic shipments end.

  6. NetEase's Chinese Diablo IV servers go free for a limited time

    The NetEase-operated Chinese servers of Diablo IV are free for one month; claiming the offer requires real-name verification through a NetEase email account or a phone number. Diablo IV was released on June 5, 2023 and is set 50 years after Diablo III; players can tackle its regions in any order and roam an open world. The game spans five maps: the dark forest of Scosglen, the icy Fractured Peaks, the vast Dry Steppes, the miasmal swamps of Hawezar, and the fortress ruins of Kehjistan. The Chinese servers officially launched on December 12, 2025; the giveaway covers the base game, with expansion content sold separately.

  7. Original ZSNES authors release Super ZSNES

    ZSNES, the open-source SNES emulator, halted development after releasing version 1.51 on January 24, 2007. Nineteen years later, original authors zsKnight and _Demo_ have released a successor, Super ZSNES. Super ZSNES inherits nothing from ZSNES: the new project discards the old code entirely and was rewritten from scratch without AI-assisted programming. Compared with the original it has more accurate CPU and audio cores and relies heavily on GPU-based rendering rather than the CPU-based emulation of the original. Its Super Enhancement Engine currently supports 7 popular games, with more to come; the new enhancements give old SNES games a markedly different look and sound. Supported titles are F-Zero, Gradius 3, the first Mega Man X, Super Castlevania 4, Super Ghouls & Ghosts, Super Mario World, and Super Metroid.

  8. GitHub Copilot switches to usage-based billing

    Microsoft-owned code-hosting platform GitHub announced on its official blog that its AI coding assistant, GitHub Copilot, will switch to usage-based billing starting June 1. GitHub says every Copilot plan includes a fixed allotment of GitHub AI credits; previously, users who exhausted their allotment were moved to Premium Requests-based billing, and they will now be moved to usage-based billing instead. Base plan prices are unchanged: Pro remains $10 per month, Pro+ $39 per month, Business $19 per user per month, and Enterprise $39 per user per month. Code completion and next edit suggestions do not consume AI credits.

  9. Study finds a third of new websites are AI-generated or AI-assisted

    Researchers from Stanford, Imperial College London, and the Internet Archive have published a paper, "The Impact of AI-Generated Text on the Internet". Using Internet Archive data, they found that three years after ChatGPT's release, 35% of new websites are AI-generated or AI-assisted, up from essentially zero before ChatGPT. Co-author and Stanford AI researcher Jonas Dolezal said humans spent decades shaping the internet, yet a large share of it has been redefined by AI in just three years: we are witnessing a major transformation of the digital landscape in a very short time.

  10. pgBackRest author announces the end of maintenance

    David Steele, maintainer of the PostgreSQL backup and restore project pgBackRest, has announced that the project is archived and no longer maintained. Steele explained that pgBackRest has been his passion project for the past 13 years, and that for most of that time he was fortunate to have corporate sponsorship. His long-term sponsor was Crunchy Data, but that company was acquired by Snowflake, and the new owner has no interest in funding his continued work. He spent the past several months looking, without success, for a position that would let him carry on, and the donations received fall far short of what sustaining the project requires, so he has no choice but to stop maintaining it. pgBackRest is widely regarded as one of the most popular operational tools in the PostgreSQL ecosystem and the most feature-complete PostgreSQL backup tool, supporting block-level incremental backup, parallel restore, page checksum verification, and more.

  11. Notepad++ gets a native macOS version

    The popular Windows text editor Notepad++ finally has a native macOS version, though it was not developed by Notepad++ author Don Ho or his team; it is an independently built community version. Notepad++ for Mac is built with Cocoa and supports both newer Apple Silicon Macs and older Intel Macs. The project does not rely on Wine or similar compatibility layers: it replaces the Windows interface with a native macOS one while keeping the core editing engine, supports plugins, and uses the same GNU GPL v3 license as the original. It is maintained by Andrey Letov and others.

  12. Haunted old houses may be down to infrasound from aging fixtures

    Think your old house is haunted? You may be reacting to infrasound produced by aging fixtures such as old pipes and boilers. In a study published in Frontiers in Behavioural Neuroscience, researchers had 36 volunteers listen to light music or the unsettling kind played at haunted-house attractions; unbeknownst to the participants, infrasound was quietly played during half the sessions. Infrasound made the volunteers feel more irritable and annoyed, made the music seem sadder, and raised cortisol levels in their saliva. The researchers note that the human ear cannot hear infrasound, yet the body and emotions still respond, usually unpleasantly. Professor Chris French, author of The Science of Weird Shit: Why Our Minds Conjure the Paranormal, considers infrasound a somewhat strained explanation for hauntings.

  13. Europe approves Moderna's combined flu and COVID-19 vaccine

    Europe has approved Moderna's mRNA-based combination influenza and COVID-19 vaccine. Known as mRNA-1083 or mCOMBRIAX, it is the world's first approved combination vaccine against these two respiratory viruses. Approval rests on a Phase III trial of 4,000 adults split into two groups: participants aged 50-64 compared against a standard flu vaccine, and participants 65 and over compared against a high-dose flu vaccine. In both groups, mCOMBRIAX induced statistically significantly higher immune responses than the comparator against common influenza strains (A/H1N1, A/H3N2, and B/Victoria) as well as SARS-CoV-2. The trial surfaced no safety or adverse-reaction concerns.

  14. Pesticides drive a steep decline in North American butterflies

    In March 2025 scientists published a study in Science, and the Xerces Society for Invertebrate Conservation followed with a report on the state of butterflies. The study found that total US butterfly numbers fell 22% between 2000 and 2020, with 24 species down 90% or more; pesticides are considered the main driver. In the 1960s chemical companies developed the potent insecticide DDT, and public opposition to it pushed them toward new pesticides that were gentler on humans but deadlier to insects. The use of many mixed pesticides has accelerated the 21st-century decline of butterflies and other insects. Ecologist Matt Forister and colleagues reported in Environmental Toxicology and Chemistry that of 336 plants analyzed, only 22 carried no detectable pesticide residue; the plants contained at least three chemicals each, and on 71 of them pesticide concentrations were lethal or near-lethal to butterflies. In a similar 2022 study, Forister's team analyzed 235 milkweed plants (essential to monarch butterflies) sold by 33 nurseries and found an average of 12.2 pesticides per plant.

  15. China's NDRC orders the Manus acquisition unwound

    China's National Development and Reform Commission (NDRC) announced on Monday that the Office of the Working Mechanism for Foreign Investment Security Review, housed at the NDRC, has issued a decision in accordance with laws and regulations prohibiting the foreign acquisition of the Manus project, and has ordered the parties to unwind the deal. The Measures for the Security Review of Foreign Investment, jointly issued by the NDRC and the Ministry of Commerce on December 19, 2020 and in force since January 18, 2021, set out the types of foreign investment subject to review along with the review body, scope, procedures, supervision and enforcement of decisions, and handling of violations. Under the measures, the state operates a working mechanism for foreign investment security review whose office sits within the NDRC; the NDRC and the Ministry of Commerce lead the mechanism and handle the day-to-day work of reviews.