OrangeBot.AI Digest — 2025-07-23

75 headlines across 8 sources, aggregated for this day.

Hacker News (15)

  1. AccuWeather to discontinue free access to Core Weather API (developer.accuweather.com)
  2. Major rule about cooking meat turns out to be wrong (www.seriouseats.com)
  3. How to increase your surface area for luck (usefulfictions.substack.com)
  4. CARA – High precision robot dog using rope (www.aaedmusa.com)
  5. The Promised LAN (tpl.house)
  6. You can now disable all AI features in Zed (zed.dev)
  7. Neil Armstrong's customs form for moon rocks (2016) (magazine.uc.edu)
  8. What to expect from Debian/Trixie (michael-prokop.at)
  9. Building better AI tools (hazelweakly.me)
  10. Proxmox Donates €10k to the Perl and Raku Foundation (www.perl.com)
  11. Cops say criminals use a Google Pixel with GrapheneOS – I say that's freedom (www.androidauthority.com)
  12. Geocities Backgrounds (pixelmoondust.neocities.org)
  13. Cerebras launches Qwen3-235B, achieving 1.5k tokens per second (www.cerebras.ai)
  14. Brave blocks Microsoft Recall by default (brave.com)
  15. AI groups spend to replace low-cost 'data labellers' with high-paid experts (www.ft.com)

GitHub Trending (15)

  1. srbhr / Resume-Matcher

    Improve your resumes with Resume Matcher. Get insights, keyword suggestions and tune your resumes to job descriptions.

  2. maybe-finance / maybe

    The personal finance app for everyone

  3. OpenBB-finance / OpenBB

    Investment Research for Everyone, Everywhere.

  4. moby / moby

    The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems

  5. juspay / hyperswitch

    An open source payments switch written in Rust to make payments fast, reliable and affordable

  6. remoteintech / remote-jobs

    A list of semi to fully remote-friendly companies (jobs) in tech.

  7. donnemartin / system-design-primer

    Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

  8. roboflow / supervision

    We write your reusable computer vision tools. 💜

  9. zephyrproject-rtos / zephyr

    Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.

  10. microsoft / ai-agents-for-beginners

    11 Lessons to Get Started Building AI Agents

  11. frappe / hrms

    Open Source HR and Payroll Software

  12. Sjj1024 / PakePlus

    Turn any webpage or Vue/React project into a lightweight (under 5 MB) desktop or mobile app in just a few minutes. https://ppofficial.netlify.app

  13. yeongpin / cursor-free-vip

    [Support 0.49.x] (Reset Cursor AI MachineID & Bypass Higher Token Limit) Automatically resets the Cursor AI machine ID and unlocks Pro features for free, working around messages such as: "You've reached your trial request limit. / Too many free trial accounts used on this machine. Please upgrade to pro. We have this limit in place to prevent abuse. Please let us know if you believe this is a mistake."

  14. jj-vcs / jj

    A Git-compatible VCS that is both simple and powerful

  15. steven2358 / awesome-generative-ai

    A curated list of modern Generative Artificial Intelligence projects and services

Product Hunt (15)

  1. Trickle - Magic Canvas

    The 1st Agentic Canvas for building apps visually with AI

  2. Clearitty

    Human-verified signals for smarter sales outreach

  3. commitify.me

    AI agents that CALL your phone to keep you on track

  4. Zams

    The Sales Automation Platform for B2B companies

  5. Qwen3-Coder

    A powerful open model for agentic coding tasks

  6. Super for iOS

    Counter/Activity Tracking hybrid app for minimalists

  7. SaveIt.now

    Organize nothing. Find everything.

  8. LLMs.txt Generator

    Convert websites to LLMs.txt format in seconds

  9. MinuteText

    $1 message for 1 minute or possibly FOREVER | Make it count

  10. Composer MCP Server

    Create, backtest, and execute trades directly in Claude.

  11. Readwell

    Find books you’ll love with AI-powered recommendations.

  12. TurboStyle

    Visual editor for any website

  13. Troov

    Meet someone special doing what you love

  14. iipmaps

    No-code maps, charts and stories from your data

  15. GenFa.st

    Your entire studio, in a single prompt.

Hugging Face (15)

  1. Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

    To break the context limits of large language models (LLMs) that bottleneck reasoning accuracy and efficiency, we propose the Thread Inference Model (TIM), a family of LLMs trained for recursive and decompositional problem solving, and TIMRUN, an inference runtime enabling long-horizon structured reasoning beyond context limits. Together, TIM hosted on TIMRUN supports virtually unlimited working memory and multi-hop tool calls within a single language model inference, overcoming output limits, positional-embedding constraints, and GPU-memory bottlenecks. Performance is achieved by modeling natural language as reasoning trees measured by both length and depth instead of linear sequences. The reasoning trees consist of tasks with thoughts, recursive subtasks, and conclusions based on the concept we proposed in Schroeder et al., 2025. During generation, we maintain a working memory that retains only the key-value states of the most relevant context tokens, selected by a rule-based subtask-pruning mechanism, enabling reuse of positional embeddings and GPU memory pages throughout reasoning. Experimental results show that our system sustains high inference throughput, even when manipulating up to 90% of the KV cache in GPU memory. It also delivers accurate reasoning on mathematical tasks and handles information retrieval challenges that require long-horizon reasoning and multi-hop tool use.
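
    The rule-based subtask pruning described above can be illustrated with a toy sketch: once a subtask has a conclusion, its interior can be dropped from working memory and only the conclusion retained. The tree structure, field names, and pruning rule here are our guesses at the idea, not the paper's implementation.

```python
def prune(node):
    """Keep a node's thought, replace concluded subtasks with their
    conclusions, and recurse into subtasks that are still open."""
    kept = [node["thought"]]
    for sub in node.get("subtasks", []):
        if "conclusion" in sub:          # finished: prune its interior
            kept.append(sub["conclusion"])
        else:                            # still open: keep it expanded
            kept.extend(prune(sub))
    return kept

tree = {
    "thought": "Solve the problem",
    "subtasks": [
        {"thought": "step 1 details", "conclusion": "x = 4"},
        {"thought": "step 2, in progress",
         "subtasks": [{"thought": "lemma", "conclusion": "y > 0"}]},
    ],
}
print(prune(tree))
# ['Solve the problem', 'x = 4', 'step 2, in progress', 'y > 0']
```

In the real system the pruned entries would correspond to KV-cache pages that can be freed and positional embeddings that can be reused; this sketch only shows the tree-level bookkeeping.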

  2. Step-Audio 2 Technical Report

    This paper presents Step-Audio~2, an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation. By integrating a latent audio encoder and reasoning-centric reinforcement learning (RL), Step-Audio 2 achieves promising performance in automatic speech recognition (ASR) and audio understanding. To facilitate genuine end-to-end speech conversation, Step-Audio 2 incorporates the generation of discrete audio tokens into language modeling, significantly enhancing its responsiveness to paralinguistic information such as speaking styles and emotions. To effectively leverage the rich textual and acoustic knowledge in real-world data, Step-Audio 2 integrates retrieval-augmented generation (RAG) and is able to call external tools such as web search to mitigate hallucination and audio search to switch timbres. Trained on millions of hours of speech and audio data, Step-Audio 2 delivers intelligence and expressiveness across diverse conversational scenarios. Evaluation results demonstrate that Step-Audio 2 achieves state-of-the-art performance on various audio understanding and conversational benchmarks compared to other open-source and commercial solutions. Please visit https://github.com/stepfun-ai/Step-Audio2 for more information.

  3. MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

    Scientific reasoning is critical for developing AI scientists and supporting human researchers in advancing the frontiers of natural science discovery. However, the open-source community has primarily focused on mathematics and coding while neglecting the scientific domain, largely due to the absence of open, large-scale, high-quality, verifiable scientific reasoning datasets. To bridge this gap, we first present TextbookReasoning, an open dataset featuring truthful reference answers extracted from 12k university-level scientific textbooks, comprising 650k reasoning questions spanning 7 scientific disciplines. We further introduce MegaScience, a large-scale mixture of high-quality open-source datasets totaling 1.25 million instances, developed through systematic ablation studies that evaluate various data selection methodologies to identify the optimal subset for each publicly available scientific dataset. Meanwhile, we build a comprehensive evaluation system covering diverse subjects and question types across 15 benchmarks, incorporating comprehensive answer extraction strategies to ensure accurate evaluation metrics. Our experiments demonstrate that our datasets achieve superior performance and training efficiency with more concise response lengths compared to existing open-source scientific datasets. Furthermore, we train Llama3.1, Qwen2.5, and Qwen3 series base models on MegaScience, which significantly outperform the corresponding official instruct models in average performance. In addition, MegaScience exhibits greater effectiveness for larger and stronger models, suggesting a scaling benefit for scientific tuning. We release our data curation pipeline, evaluation system, datasets, and seven trained models to the community to advance scientific reasoning research.

  4. Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers

    Diffusion transformers have emerged as an alternative to U-net-based diffusion models for high-fidelity image and video generation, offering superior scalability. However, their heavy computation remains a major obstacle to real-world deployment. Existing acceleration methods primarily exploit the temporal dimension, such as reusing cached features across diffusion timesteps. Here, we propose Region-Adaptive Latent Upsampling (RALU), a training-free framework that accelerates inference along the spatial dimension. RALU performs mixed-resolution sampling across three stages: 1) low-resolution denoising latent diffusion to efficiently capture global semantic structure, 2) region-adaptive upsampling on specific regions prone to artifacts at full-resolution, and 3) all-latent upsampling at full-resolution for detail refinement. To stabilize generations across resolution transitions, we leverage noise-timestep rescheduling to adapt the noise level across varying resolutions. Our method significantly reduces computation while preserving image quality, achieving up to 7.0× speed-up on FLUX and 3.0× on Stable Diffusion 3 with minimal degradation. Furthermore, RALU is complementary to existing temporal accelerations such as caching methods, and can be seamlessly integrated to further reduce inference latency without compromising generation quality.

  5. Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning

    Humans often use visual aids, for example diagrams or sketches, when solving complex problems. Training multimodal models to do the same, known as Visual Chain of Thought (Visual CoT), is challenging due to: (1) poor off-the-shelf visual CoT performance, which hinders reinforcement learning, and (2) the lack of high-quality visual CoT training data. We introduce Zebra-CoT, a diverse large-scale dataset with 182,384 samples, containing logically coherent interleaved text-image reasoning traces. We focus on four categories of tasks where sketching or visual reasoning is especially natural, spanning scientific questions such as geometry, physics, and algorithms; 2D visual reasoning tasks like visual search and jigsaw puzzles; 3D reasoning tasks including 3D multi-hop inference, embodied and robot planning; visual logic problems and strategic games like chess. Fine-tuning the Anole-7B model on the Zebra-CoT training corpus results in an improvement of +12% in our test-set accuracy and yields up to +13% performance gain on standard VLM benchmark evaluations. Fine-tuning Bagel-7B yields a model that generates high-quality interleaved visual reasoning chains, underscoring Zebra-CoT's effectiveness for developing multimodal reasoning abilities. We open-source our dataset and models to support development and evaluation of visual CoT.

  6. Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning

    Enhancing large vision-language models (LVLMs) with visual slow-thinking reasoning is crucial for solving complex multimodal tasks. However, since LVLMs are mainly trained with vision-language alignment, it is difficult to adopt on-policy reinforcement learning (RL) to develop the slow thinking ability because the rollout space is restricted by its initial abilities. Off-policy RL offers a way to go beyond the current policy, but directly distilling trajectories from external models may cause visual hallucinations due to mismatched visual perception abilities across models. To address these issues, this paper proposes SOPHIA, a simple and scalable Semi-Off-Policy RL for vision-language slow-tHInking reAsoning. SOPHIA builds a semi-off-policy behavior model by combining on-policy visual understanding from a trainable LVLM with off-policy slow-thinking reasoning from a language model, assigns outcome-based rewards to reasoning, and propagates visual rewards backward. Then LVLM learns slow-thinking reasoning ability from the obtained reasoning trajectories using propagated rewards via off-policy RL algorithms. Extensive experiments with InternVL2.5 and InternVL3.0 with 8B and 38B sizes show the effectiveness of SOPHIA. Notably, SOPHIA improves InternVL3.0-38B by 8.50% in average, reaching state-of-the-art performance among open-source LVLMs on multiple multimodal reasoning benchmarks, and even outperforms some closed-source models (e.g., GPT-4.1) on the challenging MathVision and OlympiadBench, achieving 49.08% and 49.95% pass@1 accuracy, respectively. Analysis shows SOPHIA outperforms supervised fine-tuning and direct on-policy RL methods, offering a better policy initialization for further on-policy training.

  7. ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

    Vision-language-action (VLA) reasoning tasks require agents to interpret multimodal instructions, perform long-horizon planning, and act adaptively in dynamic environments. Existing approaches typically train VLA models in an end-to-end fashion, directly mapping inputs to actions without explicit reasoning, which hinders their ability to plan over multiple steps or adapt to complex task variations. In this paper, we propose ThinkAct, a dual-system framework that bridges high-level reasoning with low-level action execution via reinforced visual latent planning. ThinkAct trains a multimodal LLM to generate embodied reasoning plans guided by reinforcing action-aligned visual rewards based on goal completion and trajectory consistency. These reasoning plans are compressed into a visual plan latent that conditions a downstream action model for robust action execution on target environments. Extensive experiments on embodied reasoning and robot manipulation benchmarks demonstrate that ThinkAct enables few-shot adaptation, long-horizon planning, and self-correction behaviors in complex embodied AI tasks.

  8. Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory

    Vision-language models (VLMs) have been widely adopted in robotics to enable autonomous planning. However, grounding VLMs, originally trained on internet data, to diverse real-world robots remains a challenge. This paper presents ExpTeach, a framework that grounds VLMs to physical robots by building a self-generated memory of real-world experiences. In ExpTeach, the VLM autonomously plans actions, verifies outcomes, reflects on failures, and adapts robot behaviors in a closed loop. The self-generated experiences during this process are then summarized into a long-term memory, enabling retrieval of learned knowledge to guide future tasks via retrieval-augmented generation (RAG). Additionally, ExpTeach enhances the spatial understanding of VLMs with an on-demand image annotation module. In experiments, we show that reflection improves success rates from 36% to 84% on four challenging robotic tasks and observe the emergence of intelligent object interactions, including creative tool use. Across extensive tests on 12 real-world scenarios (including eight unseen ones), we find that grounding with long-term memory boosts single-trial success rates from 22% to 80%, demonstrating the effectiveness and generalizability of ExpTeach.

  9. HOComp: Interaction-Aware Human-Object Composition

    While existing image-guided composition methods may help insert a foreground object onto a user-specified region of a background image, achieving natural blending inside the region with the rest of the image unchanged, we observe that these existing methods often struggle in synthesizing seamless interaction-aware compositions when the task involves human-object interactions. In this paper, we first propose HOComp, a novel approach for compositing a foreground object onto a human-centric background image, while ensuring harmonious interactions between the foreground object and the background person and their consistent appearances. Our approach includes two key designs: (1) MLLMs-driven Region-based Pose Guidance (MRPG), which utilizes MLLMs to identify the interaction region as well as the interaction type (e.g., holding and lifting) to provide coarse-to-fine constraints to the generated pose for the interaction while incorporating human pose landmarks to track action variations and enforcing fine-grained pose constraints; and (2) Detail-Consistent Appearance Preservation (DCAP), which unifies a shape-aware attention modulation mechanism, a multi-view appearance loss, and a background consistency loss to ensure consistent shapes/textures of the foreground and faithful reproduction of the background human. We then propose the first dataset, named Interaction-aware Human-Object Composition (IHOC), for the task. Experimental results on our dataset show that HOComp effectively generates harmonious human-object interactions with consistent appearances, and outperforms relevant methods qualitatively and quantitatively.

  10. RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback

    With the rapid advancement of Large Language Models (LLMs), developing effective critic modules for precise guidance has become crucial yet challenging. In this paper, we initially demonstrate that supervised fine-tuning for building critic modules (which is widely adopted in current solutions) fails to genuinely enhance models' critique abilities, producing superficial critiques with insufficient reflections and verifications. To unlock the unprecedented critique capabilities, we propose RefCritic, a long-chain-of-thought critic module based on reinforcement learning with dual rule-based rewards: (1) instance-level correctness of solution judgments and (2) refinement accuracies of the policy model based on critiques, aiming to generate high-quality evaluations with actionable feedback that effectively guides model refinement. We evaluate RefCritic on Qwen2.5-14B-Instruct and DeepSeek-R1-Distill-Qwen-14B across five benchmarks. On critique and refinement settings, RefCritic demonstrates consistent advantages across all benchmarks, e.g., 6.8% and 7.2% gains on AIME25 for the respective base models. Notably, under majority voting, policy models filtered by RefCritic show superior scaling with increased voting numbers. Moreover, despite training on solution-level supervision, RefCritic outperforms step-level supervised approaches on ProcessBench, a benchmark to identify erroneous steps in mathematical reasoning.

  11. PrefPalette: Personalized Preference Modeling with Latent Attributes

    Personalizing AI systems requires understanding not just what users prefer, but the reasons that underlie those preferences - yet current preference models typically treat human judgment as a black box. We introduce PrefPalette, a framework that decomposes preferences into attribute dimensions and tailors its preference prediction to distinct social community values in a human-interpretable manner. PrefPalette operationalizes a cognitive science principle known as multi-attribute decision making in two ways: (1) a scalable counterfactual attribute synthesis step that involves generating synthetic training data to isolate for individual attribute effects (e.g., formality, humor, cultural values), and (2) attention-based preference modeling that learns how different social communities dynamically weight these attributes. This approach moves beyond aggregate preference modeling to capture the diverse evaluation frameworks that drive human judgment. When evaluated on 45 social communities from the online platform Reddit, PrefPalette outperforms GPT-4o by 46.6% in average prediction accuracy. Beyond raw predictive improvements, PrefPalette also shed light on intuitive, community-specific profiles: scholarly communities prioritize verbosity and stimulation, conflict-oriented communities value sarcasm and directness, and support-based communities emphasize empathy. By modeling the attribute-mediated structure of human judgment, PrefPalette delivers both superior preference modeling and transparent, interpretable insights, and serves as a first step toward more trustworthy, value-aware personalized applications.
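
    The "attention-based preference modeling" above — scoring a response as a community-specific weighted sum of attribute scores — can be sketched minimally. Everything here (the attribute set, `community_preference`, the toy numbers) is our illustration of the general idea, not PrefPalette's actual architecture.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def community_preference(attr_scores, community_logits):
    """Score a response as an attention-weighted sum of its attribute
    scores, with attention weights learned per community."""
    w = softmax(community_logits)        # weights sum to 1
    return float(attr_scores @ w)

# Toy attributes: [formality, humor, empathy]
humorous_reply = np.array([0.2, 0.9, 0.1])
formal_reply   = np.array([0.9, 0.1, 0.2])
# A hypothetical community whose learned logits favor humor:
humor_community = np.array([-1.0, 2.0, 0.0])

# The humor-weighted community prefers the humorous reply.
print(community_preference(humorous_reply, humor_community) >
      community_preference(formal_reply, humor_community))   # True
```

A trained model would produce both the attribute scores (from the counterfactual synthesis step) and the community logits; the point of the sketch is only that the weighting makes the learned community profiles directly interpretable.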

  12. Task-Specific Zero-shot Quantization-Aware Training for Object Detection

    Quantization is a key technique to reduce network size and computational complexity by representing the network parameters with a lower precision. Traditional quantization methods rely on access to original training data, which is often restricted due to privacy concerns or security challenges. Zero-shot Quantization (ZSQ) addresses this by using synthetic data generated from pre-trained models, eliminating the need for real training data. Recently, ZSQ has been extended to object detection. However, existing methods use unlabeled task-agnostic synthetic images that lack the specific information required for object detection, leading to suboptimal performance. In this paper, we propose a novel task-specific ZSQ framework for object detection networks, which consists of two main stages. First, we introduce a bounding box and category sampling strategy to synthesize a task-specific calibration set from the pre-trained network, reconstructing object locations, sizes, and category distributions without any prior knowledge. Second, we integrate task-specific training into the knowledge distillation process to restore the performance of quantized detection networks. Extensive experiments conducted on the MS-COCO and Pascal VOC datasets demonstrate the efficiency and state-of-the-art performance of our method. Our code is publicly available at: https://github.com/DFQ-Dojo/dfq-toolkit .

  13. SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search

    Recent advances in large language models (LLMs) have opened new opportunities for academic literature retrieval. However, existing systems often rely on rigid pipelines and exhibit limited reasoning capabilities. We introduce SPAR, a multi-agent framework that incorporates RefChain-based query decomposition and query evolution to enable more flexible and effective search. To facilitate systematic evaluation, we also construct SPARBench, a challenging benchmark with expert-annotated relevance labels. Experimental results demonstrate that SPAR substantially outperforms strong baselines, achieving up to +56% F1 on AutoScholar and +23% F1 on SPARBench over the best-performing baseline. Together, SPAR and SPARBench provide a scalable, interpretable, and high-performing foundation for advancing research in scholarly retrieval. Code and data will be available at: https://github.com/xiaofengShi/SPAR

  14. Does More Inference-Time Compute Really Help Robustness?

    Recently, Zaremba et al. demonstrated that increasing inference-time computation improves robustness in large proprietary reasoning LLMs. In this paper, we first show that smaller-scale, open-source models (e.g., DeepSeek R1, Qwen3, Phi-reasoning) can also benefit from inference-time scaling using a simple budget forcing strategy. More importantly, we reveal and critically examine an implicit assumption in prior work: intermediate reasoning steps are hidden from adversaries. By relaxing this assumption, we identify an important security risk, intuitively motivated and empirically verified as an inverse scaling law: if intermediate reasoning steps become explicitly accessible, increased inference-time computation consistently reduces model robustness. Finally, we discuss practical scenarios where models with hidden reasoning chains are still vulnerable to attacks, such as models with tool-integrated reasoning and advanced reasoning extraction attacks. Our findings collectively demonstrate that the robustness benefits of inference-time scaling depend heavily on the adversarial setting and deployment context. We urge practitioners to carefully weigh these subtle trade-offs before applying inference-time scaling in security-sensitive, real-world applications.

  15. Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

    Fine-tuning large language models (LLMs) can lead to unintended out-of-distribution generalization. Standard approaches to this problem rely on modifying training data, for example by adding data that better specify the intended generalization. However, this is not always practical. We introduce Concept Ablation Fine-Tuning (CAFT), a technique that leverages interpretability tools to control how LLMs generalize from fine-tuning, without needing to modify the training data or otherwise use data from the target distribution. Given a set of directions in an LLM's latent space corresponding to undesired concepts, CAFT works by ablating these concepts with linear projections during fine-tuning, steering the model away from unintended generalizations. We successfully apply CAFT to three fine-tuning tasks, including emergent misalignment, a phenomenon where LLMs fine-tuned on a narrow task generalize to give egregiously misaligned responses to general questions. Without any changes to the fine-tuning data, CAFT reduces misaligned responses by 10x without degrading performance on the training distribution. Overall, CAFT represents a novel approach for steering LLM generalization without modifying training data.
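
    The core operation in CAFT — removing a concept by ablating latent directions with a linear projection — is simple to sketch. This is a minimal illustration of projecting hidden states onto the orthogonal complement of some concept directions; the function name and toy vectors are ours, and a real run would apply this inside the model during fine-tuning.

```python
import numpy as np

def ablate_concepts(hidden, directions):
    """Remove the components of `hidden` (n, d) lying in the subspace
    spanned by `directions` (k, d) via an orthogonal projection."""
    # Orthonormalize the concept directions so the projection is exact.
    Q, _ = np.linalg.qr(np.asarray(directions, dtype=float).T)  # (d, k)
    # Subtract each state's component inside the concept subspace.
    return hidden - hidden @ Q @ Q.T

# Toy example: a 4-d hidden state and one "undesired concept" direction.
concept = np.array([[1.0, 0.0, 0.0, 0.0]])   # concept = first coordinate
state   = np.array([[3.0, 2.0, -1.0, 0.5]])
print(ablate_concepts(state, concept))        # first coordinate zeroed
```

After ablation the state has no component along the concept direction, so gradients during fine-tuning cannot reinforce it — which is the mechanism the abstract describes for steering generalization without changing the training data.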

Solidot (15)

  1. Brave blocks Microsoft Recall by default

    The Brave browser now blocks the controversial Microsoft Recall feature by default for Windows 11 users, following Signal's desktop app, which earlier announced it would block Recall by default. Microsoft Recall takes a screenshot every few seconds while the user works, extracts the on-screen content, and stores it in a searchable database. Recall has drawn privacy and security criticism, and Microsoft has made several adjustments in response. Brave's official blog says the browser's focus is on maximizing user privacy by default; since, according to Microsoft, a browser's private-browsing windows are not captured, Brave tells the operating system that all of its browsing windows are "private windows," ensuring tab contents are never recorded.

  2. 10%–25% of lung cancer patients have never smoked

    Annie Chen first noticed unusual shortness of breath in 2017. Her family doctor told her not to worry: her father, a heavy smoker, had died of lung cancer at 71, but she had never smoked. Two years later an X-ray revealed late-stage lung cancer. Her case reflects an increasingly puzzling reality for doctors who study and treat lung cancer. As smoking rates have fallen, lung cancer incidence and mortality have declined over recent decades, yet the share of patients who never smoked has grown. Researchers estimate that roughly 10%–25% of lung cancer patients worldwide have never smoked; among some Asian and Asian-American women, the proportion may reach 50% or more. A study of 871 never-smoker lung cancer patients worldwide found that certain DNA mutations are more common in people living in heavily polluted areas; pollution not only damages DNA directly but also accelerates cell division. The cancer biology of never-smokers differs from that of smokers and may require different prevention and screening strategies: never-smokers are more likely to carry specific driver mutations, while smokers accumulate more mutations over time.

  3. Study finds AI summaries sharply reduce clicks on search result pages

    Google has introduced AI summaries to its search results pages and claims the feature does not divert traffic from websites. A Pew Research Center study suggests otherwise: AI summaries significantly reduce click-through rates. Analyzing data collected in March 2025 from 900 Ipsos KnowledgePanel users, the researchers found that users were far less likely to click a search result when the page included an AI summary: the click-through rate was 15% without an AI summary and 8% with one. Links included inside Google's AI summaries saw a click-through rate of just 1%, with sources drawn mainly from Wikipedia, YouTube, and Reddit. More worrying, users who saw an AI summary were more likely to end their session entirely — stopping their search without verifying whether the summary was correct — even though hallucination, the fabrication of false information, is an inherent problem of generative AI. The study indicates that Google's use of AI is changing how people gather information and interact with search results.

  4. Microsoft has poached at least 24 AI engineers from Google DeepMind

    Over the past six months, Microsoft has hired at least 24 AI engineers away from Google's AI research unit DeepMind, as the AI talent war among Silicon Valley giants rages on. On Tuesday, Amar Subramanya, former head of engineering for Google's Gemini chatbot, announced on LinkedIn that he had joined Microsoft as Corporate Vice President of AI, becoming the latest former Google AI engineer to make the move; he praised his new employer's culture as refreshing. Other DeepMind engineers who have joined Microsoft include former engineering lead Sonal Gupta, software engineer Adam Sadovsky, and product manager Tim Frank.

  5. How NASA saved Juno's camera from 800 million kilometers away

    The team behind NASA's Jupiter-orbiting Juno mission carried out a delicate deep-space fix in December 2023 to rescue its JunoCam camera. JunoCam is a color visible-light camera whose optics sit outside the titanium-walled radiation vault that shields the spacecraft's sensitive electronics, since Juno's trajectory crosses the most intense planetary radiation zones in the solar system. Mission designers were confident JunoCam would survive the first eight orbits of Jupiter, but no one knew how long it would last beyond that. Through Juno's first 34 orbits JunoCam worked flawlessly, returning images usable for science, but on the 47th orbit the camera began showing signs of radiation damage, and by the 56th nearly all images were corrupted beyond use. The team knew the problem was radiation-related, but pinpointing the damaged component from hundreds of millions of kilometers away was extremely difficult. The clues pointed to a failed voltage regulator, a part critical to JunoCam's power supply. With few options left, the team tried annealing: heating a material for a period and then cooling it slowly. The process is not fully understood, but the theory is that heating can reduce defects inside the material. Shortly after the annealing, JunoCam again returned clear images over the next several orbits. Juno's flight path, however, kept carrying it deeper into Jupiter's radiation belts, and by the 55th orbit the images were degrading again. Team members tried different image-processing approaches to improve quality, but only weeks remained before a close encounter with Io. With time running out, they cranked JunoCam's heater to its maximum to see whether a more drastic anneal could save the part. During the first week of this annealing, test images sent back to Earth showed almost no improvement, but as the Io flyby approached, image quality began to improve markedly. By December 30, 2023, as Juno skimmed just 1,500 km above the volcanically active moon, the images were nearly as good as at launch, clearly showing the terrain of Io's north polar region: mountain blocks rising steeply from the plains, coated in sulfur dioxide frost, and revealing previously unrecorded large volcanoes and lava.

  6. Court rules Mike Lynch's estate and business partner owe HP $944 million

    A judge of England's High Court has ruled that Hewlett-Packard was deceived into overpaying when it acquired the British software company Autonomy in 2011, and that the estate of founder Mike Lynch and his business partner owe HP $944 million. Lynch, once hailed as Britain's Bill Gates, died in a yacht accident last year. HP bought Autonomy for $11 billion in 2011 and wrote down $8.8 billion the following year, subsequently accusing Autonomy of serious accounting irregularities; both Lynch and finance executive Steve Chamberlain faced fraud charges. Tragically, the two defendants died within the space of three days last August. Chamberlain was struck and killed by a car while running near his home in Stretham, Cambridgeshire. Lynch's superyacht Bayesian then sank in a storm off the coast of Sicily, killing six people including Lynch and his 18-year-old daughter Hannah; his wife Angela Bacares was rescued. The voyage had been a celebration of his acquittal. Lynch's estate is estimated at $674 million and could be bankrupted if it pays its share of the judgment.

  7. Alibaba releases Qwen3-Coder

    Alibaba has released Qwen3-Coder, its coding-assistant model family. Qwen3-Coder comes in multiple sizes; the first and most powerful release is Qwen3-Coder-480B-A35B-Instruct, a mixture-of-experts (MoE) model with 480B total parameters and 35B activated. It natively supports a 256K-token context, extensible to 1M tokens via YaRN, and offers strong coding and agent capabilities. Qwen3-Coder-480B-A35B-Instruct achieves open-source SOTA results on Agentic Coding, Agentic Browser-Use, and Agentic Tool-Use, rivaling Claude Sonnet 4.

  8. Optimists are alike, but pessimists are each different

    Leo Tolstoy opened Anna Karenina with: "Happy families are all alike; every unhappy family is unhappy in its own way." Neuroscientists studying optimism and pessimism have found something similar: when imagining future events, optimists show similar activity patterns in a key brain region, while pessimists' activity patterns are each different. The study was published in PNAS. The researchers had participants imagine specific future events happening to themselves or their spouse — some positive, some neutral or negative — while scanning their brains with fMRI, then had them fill out questionnaires to measure their degree of optimism or pessimism. They ran two studies, one with 37 participants and one with 50, focusing on a region that is especially active when imagining future events: the medial prefrontal cortex. The researchers interpret the results as suggesting there may be many different ways to be a pessimist, whereas optimists tend to hold shared mental models of a hopeful future.

  9. Firefox 141 released

    Mozilla has released Firefox 141.0. Major new features include: automatic tab management, in which a local AI model identifies similar tabs, organizes them into groups, and can even suggest group names, with all of this data kept on-device rather than sent over the network; vertical-tab users can resize the tool area at the bottom of the sidebar; the Linux build uses less memory and no longer forces a restart after updates through the package manager; address autofill is enabled for users in Brazil, Spain, and Japan; the address bar can be used as a unit converter, supporting length, temperature, mass, force, and angle units as well as time zones; the Windows build enables the WebGPU API; and more.

  10. Scientists extract water from lunar soil

    For decades, space agencies have floated the idea of using the Moon as an outpost for exploring deeper into the universe. Supplying a lunar base with enough resources — above all, water — to support its inhabitants has been a major obstacle: by one estimate, transporting a single gallon of water by rocket costs about $83,000, and each astronaut needs to drink roughly four gallons a day. A team led by Lu Wang at the Chinese University of Hong Kong has developed a technology that both extracts water from lunar soil and directly uses that water to convert the carbon dioxide exhaled by astronauts into carbon monoxide and hydrogen, which can in turn be used to make fuel and oxygen for the crew. The technology accomplishes this through a novel photothermal strategy that converts sunlight into heat. The scientists tested it using lunar soil samples collected by the Chang'e missions, simulated lunar samples, and a batch reactor filled with carbon dioxide gas and driven by a light-concentrating system. The team used ilmenite — a dense black mineral, one of several water-bearing minerals reported in lunar soil — to measure photothermal activity and analyze the mechanism of the process. Despite the laboratory success, the researchers stress that the current catalytic performance is not yet sufficient to fully support human survival beyond Earth.

  11. Betelgeuse found to have a small companion star

    Betelgeuse, the brightest red supergiant in the night sky, turns out not to be alone. Using the Gemini North telescope in Hawaii, astronomers have for the first time directly imaged a small companion star hugging it closely, provisionally named Siwarha (Arabic for "her bracelet"). The newly discovered star is small and faint and orbits Betelgeuse at an extremely close distance. Although the detection was exceptionally difficult, the companion's existence matches earlier theoretical predictions, making the discovery a major breakthrough. Most surprising is how differently the pair have evolved: Betelgeuse is approaching the end of its life, while Siwarha has not yet reached the main sequence — hydrogen fusion has not yet begun in its core. Though born at the same time, their difference in mass has put them at opposite extremes of stellar evolution. Betelgeuse is one of the brightest stars in the night sky and the closest red supergiant to Earth, about 548 light-years away, with a radius roughly 700 times the Sun's and a mass of about 19 solar masses. Although only about ten million years old, it is already in its late stages and is expected to end its life in a spectacular supernova within the next 100,000 years. Betelgeuse and Siwarha were likely born together, but tidal forces will eventually cause the companion to be swallowed — scientists estimate this could happen within 10,000 years. Siwarha might even be destroyed when Betelgeuse explodes as a supernova.

  12. Replit deleted a production database and fabricated data to hide bugs

    Unlike similar tools, the AI coding assistant Replit does not just help write code: it also handles deployment and infrastructure and has access to the application backend. SaaStr founder Jason Lemkin tried Replit and was initially full of praise — it helped him build a prototype within hours — but as he used it more deeply he found it unreliable: it deleted a production database and fabricated data to conceal bugs. After the deletion, Replit initially claimed the database could not be restored and that all database versions had been destroyed, but the rollback feature turned out to still work. Lemkin concluded that the service is not suitable for commercial use by non-technical users.

  13. OpenAI and Google DeepMind both claim gold-medal scores at the International Mathematical Olympiad

    The International Mathematical Olympiad (IMO), held annually since 1959, is the most prestigious competition for young mathematicians. Each participating country sends six contestants, who must tackle six very difficult problems in algebra, combinatorics, geometry, and number theory; solving five of the six earns a gold medal. In recent years AI companies have increasingly used IMO problems to test the mathematical problem-solving and reasoning abilities of their advanced models. Last year, Google DeepMind's AlphaProof and AlphaGeometry 2 solved four of the six problems, scoring 28 points (7 per problem) for a silver medal — but the problems first had to be translated into formal mathematical language with help from human experts, and each took up to three days to solve rather than the 4.5 hours human contestants get. Last Saturday, OpenAI researcher Alexander Wei announced that an experimental company model had achieved a gold-medal score on this year's IMO. The announcement violated the organizers' request to withhold results until July 28 and sparked controversy in the math community; it also prompted DeepMind to announce its own advanced model's result early — five of the six problems solved for a gold medal, this time working directly in natural language with no translation required.

  14. NASA's X-59 Quesst low-boom supersonic aircraft begins taxi tests

    NASA's X-59 Quesst low-boom supersonic aircraft completed its first low-speed taxi test on July 10 at U.S. Air Force Plant 42 in California. Taxi tests are the final set of ground tests before first flight; over the coming weeks the aircraft will gradually increase speed, working up to high-speed taxi tests approaching the point of takeoff. Designed by Lockheed Martin's famed Skunk Works, the X-59 is built to break the sound barrier without producing the sonic boom that supersonic aircraft usually generate; instead, it produces a quieter "thump," similar to hearing a car door close from indoors. If the X-59 succeeds, it could have a revolutionary impact on supersonic flight and the aviation industry. The X-59's shape is designed to reduce the boom it produces; it has no forward-facing windows, instead giving the pilot a view ahead of the aircraft through an external vision system of cameras and displays.

  15. A weak password brought down a 158-year-old British company with no backups

    The Akira ransomware gang broke into the computer systems of KNP, a transport company in Northamptonshire, England, by guessing an employee's weak password, then encrypted the systems and demanded a ransom. KNP was 158 years old and employed 700 people. The hackers reportedly did not disclose a ransom amount, but a specialist ransomware-negotiation firm estimated it could have been as high as £5 million. The company did not have that kind of money and apparently had no backups, and the incident ultimately led to its collapse.