OrangeBot.AI Digest — 2025-08-07
74 headlines across 8 sources, aggregated for the day.
Hacker News(15)
- Vibechart (www.vibechart.net)
- Historical Tech Tree (www.historicaltechtree.com)
- GPT-5: Key characteristics, pricing and system card (simonwillison.net)
- GPT-5 for Developers (openai.com)
- GPT-5 (openai.com)
- Live: GPT-5 (www.youtube.com)
- Building Bluesky comments for my blog (natalie.sh)
- Windows XP Professional (win32.run)
- Infinite Pixels (meyerweb.com)
- How AI conquered the US economy: A visual FAQ (www.derekthompson.org)
- AI Ethics is being narrowed on purpose, like privacy was (nimishg.substack.com)
- Gemini CLI GitHub Actions (blog.google)
- Leonardo Chiariglione – Co-founder of MPEG (leonardo.chiariglione.org)
- Debounce (developer.mozilla.org)
- Zero-day flaws in authentication, identity, authorization in HashiCorp Vault (cyata.ai)
GitHub Trending(14)
- nautechsystems / nautilus_trader
A high-performance algorithmic trading platform and event-driven backtester
- browserbase / stagehand
The AI Browser Automation Framework
- lvgl / lvgl
Embedded graphics library to create beautiful UIs for any MCU, MPU and display type.
- vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
- ollama / ollama
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
- netbirdio / netbird
Connect your devices into a secure WireGuard®-based overlay network with SSO, MFA and granular access controls.
- jesseduffield / lazygit
simple terminal UI for git commands
- ethereum / solidity
Solidity, the Smart Contract Programming Language
- simstudioai / sim
Sim is an open-source AI agent workflow builder. Sim Studio's interface is a lightweight, intuitive way to quickly build and deploy LLMs that connect with your favorite tools.
- openai / openai-cookbook
Examples and guides for using the OpenAI API
- prisma / prisma
Next-generation ORM for Node.js & TypeScript | PostgreSQL, MySQL, MariaDB, SQL Server, SQLite, MongoDB and CockroachDB
- xiaoyaocz / dart_simple_live
A simple way to watch live streams.
- datawhalechina / self-llm
An "open-source LLM cookbook": tutorials tailored for Chinese users on quickly fine-tuning (full-parameter/LoRA) and deploying open-source LLMs and multimodal LLMs (MLLMs), both domestic and international, in a Linux environment.
- dotnet / efcore
EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
Product Hunt(15)
- Floot
Build serious apps with AI without getting stuck
- Unicorns Club
Grow, earn Sparks, and get discovered by VCs
- Patio
Share tools, learn DIY, and build sustainably
- Haimeta
Create stunning images, vivid videos, and lifelike 3D assets
- Visionstory - Video Podcast
Turn dialogues to studio-quality video podcasts in seconds
- Nas.io v2
AI agent to build, launch, and sell your digital products
- Simulate by Future AGI
The voice AI auto-testing loop to simulate, evaluate & ship
- Graphy 3.0
From messy data to graphs with a story in seconds.
- xpander.ai
Turn AI agents into Slack native teammates
- Gemini Storybook
Turn any idea into an illustrated story, read aloud
- OFFY
Your tasks created, planned, and executed with an AI agent
- Roomsy
Chores Tracker App that makes house cleaning actually fun
- Deskrib.Ai
Transform ideas into beautiful documents, instantly
- Kreatli
All-in-one collaboration platform for creative teams
- Melder - AI for Excel
Upgrade Excel with an analysis agent and AI formulas
Hugging Face(15)
- VeriGUI: Verifiable Long-Chain GUI Dataset
Recent studies have delved into constructing autonomous agents capable of performing complex Graphical User Interface (GUI)-based computer tasks, with the potential to revolutionize human-computer interaction. Despite encouraging results, existing efforts mainly focus on short-term interactions and rely on outcome-only verification, thereby limiting their scalability in real-world GUI applications that demand long-horizon task decomposition and execution. In this work, we introduce VeriGUI, a novel verifiable long-chain GUI dataset designed to facilitate the development and evaluation of generalist GUI agents operating in realistic computer environments. Our dataset emphasizes two critical dimensions: (1) long-chain complexity, with tasks decomposed into a sequence of interdependent subtasks spanning hundreds of steps, explicitly designed to allow any subtask to serve as a valid starting point; and (2) subtask-level verifiability, which enables diverse exploration strategies within each subtask, while ensuring that each subtask-level goal remains verifiable and consistent. The dataset consists of GUI task trajectories across both desktop and web, annotated by human experts. Extensive experiments on VeriGUI using various agents with different foundation models reveal significant performance gaps in handling long-horizon tasks, highlighting the need for more robust planning and decision-making capabilities in GUI agents.
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Chain-of-Thought (CoT) prompting has been shown to improve Large Language Model (LLM) performance on various tasks. With this approach, LLMs appear to produce human-like reasoning steps before providing answers (a.k.a., CoT reasoning), which often leads to the perception that they engage in deliberate inferential processes. However, some initial findings suggest that CoT reasoning may be more superficial than it appears, motivating us to explore further. In this paper, we study CoT reasoning via a data distribution lens and investigate if CoT reasoning reflects a structured inductive bias learned from in-distribution data, allowing the model to conditionally generate reasoning paths that approximate those seen during training. Thus, its effectiveness is fundamentally bounded by the degree of distribution discrepancy between the training data and the test queries. With this lens, we dissect CoT reasoning via three dimensions: task, length, and format. To investigate each dimension, we design DataAlchemy, an isolated and controlled environment to train LLMs from scratch and systematically probe them under various distribution conditions. Our results reveal that CoT reasoning is a brittle mirage that vanishes when it is pushed beyond training distributions. This work offers a deeper understanding of why and when CoT reasoning fails, emphasizing the ongoing challenge of achieving genuine and generalizable reasoning.
- Efficient Agents: Building Effective Agents While Reducing Cost
The remarkable capabilities of Large Language Model (LLM)-driven agents have enabled sophisticated systems to tackle complex, multi-step tasks, but their escalating costs threaten scalability and accessibility. This work presents the first systematic study of the efficiency-effectiveness trade-off in modern agent systems, addressing the critical need for cost-effective designs without sacrificing performance. We investigate three key questions: (1) How much complexity do agentic tasks inherently require? (2) When do additional modules yield diminishing returns? (3) How much efficiency can be gained through the design of efficient agent frameworks? Through an empirical analysis on the GAIA benchmark, we evaluate the impact of LLM backbone selection, agent framework designs, and test-time scaling strategies. Using the cost-of-pass metric, we quantify the efficiency-performance trade-off across these dimensions. Our findings inform the development of Efficient Agents, a novel agent framework that matches complexity to task requirements. Efficient Agents retains 96.7% of the performance of OWL, a leading open-source agent framework, while reducing operational costs from 0.398 to 0.228, resulting in a 28.4% improvement in cost-of-pass. Our work provides actionable insights for designing efficient, high-performing agent systems, advancing the accessibility and sustainability of AI-driven solutions.
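The cost-of-pass metric above is, as commonly defined, the expected inference cost divided by the success rate, i.e. the expected spend per solved task. A minimal sketch under that assumed definition, with illustrative numbers (not the paper's):

```python
def cost_of_pass(avg_cost_per_attempt: float, success_rate: float) -> float:
    """Expected cost to obtain one correct solution: the average cost of a
    single attempt divided by the probability that an attempt succeeds."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return avg_cost_per_attempt / success_rate

# A cheaper framework can win on cost-of-pass even with a slightly lower
# pass rate (illustrative numbers):
baseline = cost_of_pass(avg_cost_per_attempt=0.40, success_rate=0.50)
lean = cost_of_pass(avg_cost_per_attempt=0.23, success_rate=0.48)
```

Retaining most of the success rate while cutting per-attempt cost lowers the expected spend per solved task, which is the trade-off the abstract quantifies.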
- SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Repurposing large vision-language models (LVLMs) as computer use agents (CUAs) has led to substantial breakthroughs, primarily driven by human-labeled data. However, these models often struggle with novel and specialized software, particularly in scenarios lacking human annotations. To address this challenge, we propose SEAgent, an agentic self-evolving framework enabling CUAs to autonomously evolve through interactions with unfamiliar software. Specifically, SEAgent empowers computer-use agents to autonomously master novel software environments via experiential learning, where agents explore new software, learn through iterative trial-and-error, and progressively tackle auto-generated tasks organized from simple to complex. To achieve this goal, we design a World State Model for step-wise trajectory assessment, along with a Curriculum Generator that generates increasingly diverse and challenging tasks. The agent's policy is updated through experiential learning, comprised of adversarial imitation of failure actions and Group Relative Policy Optimization (GRPO) on successful ones. Furthermore, we introduce a specialist-to-generalist training strategy that integrates individual experiential insights from specialist agents, facilitating the development of a stronger generalist CUA capable of continuous autonomous evolution. This unified agent ultimately achieves performance surpassing ensembles of individual specialist agents on their specialized software. We validate the effectiveness of SEAgent across five novel software environments within OS-World. Our approach achieves a significant improvement of 23.2% in success rate, from 11.3% to 34.5%, over a competitive open-source CUA, i.e., UI-TARS.
- Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
Research on applications of Reinforcement Learning (RL) to Large Language Models (LLMs) has mostly been focused on single-turn problems, such as mathematical reasoning or single-shot code generation. While these problems can be viewed as token-level multi-turn MDPs, this view corresponds to a degenerate case of multi-turn interaction where the environment provides no feedback. This contrasts with many real-world domains, such as software engineering (SWE), which require rich multi-turn interactions with a stateful environment that responds to each action with a non-trivial observation. To bridge this gap, we demonstrate the successful application of RL to this general regime. Using a modified Decoupled Advantage Policy Optimization (DAPO) algorithm, we train an agent based on Qwen2.5-72B-Instruct to solve real-world software engineering tasks. Our approach increases the agent's success rate on the SWE-bench Verified benchmark from a 20% rejection fine-tuned baseline to 39%, without relying on any teacher models. On SWE-rebench, our agent matches or outperforms leading open-weight models such as DeepSeek-V3-0324 and Qwen3-235B-A22B using an identical scaffolding, offering a viable path toward building more capable autonomous agents for complex real-world problems based on open models.
- Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success
Interactive multimodal agents must convert raw visual observations into coherent sequences of language-conditioned actions, a capability that current vision-language models (VLMs) still lack. Earlier reinforcement-learning (RL) efforts could, in principle, endow VLMs with such skills, but they have seldom tested whether the learned behaviours generalize beyond their training simulators, and they depend either on brittle hyperparameter tuning or on dense-reward environments with low state variability. We introduce Vision-Language Decoupled Actor-Critic (VL-DAC), a lightweight, hyperparameter-free RL algorithm. VL-DAC applies PPO updates to action tokens while learning value only at the environment-step level: an arrangement, to our knowledge, not previously explored for large VLMs or LLMs. This simple decoupling removes unstable weighting terms and yields faster, more reliable convergence. Training a single VLM with VL-DAC in one inexpensive simulator at a time (MiniWorld, Gym-Cards, ALFWorld, or WebShop) already produces policies that generalize widely: +50% relative on BALROG (game-centric agentic control), +5% relative on the hardest part of VSI-Bench (spatial planning), and +2% on VisualWebBench (web navigation), all without degrading general image understanding accuracy. These results provide the first evidence that a simple RL algorithm can train VLMs entirely in cheap synthetic worlds while delivering measurable gains on real-image agentic, spatial-reasoning, and web-navigation benchmarks.
- Agent Lightning: Train ANY AI Agents with Reinforcement Learning
We present Agent Lightning, a flexible and extensible framework that enables Reinforcement Learning (RL)-based training of Large Language Models (LLMs) for any AI agent. Unlike existing methods that tightly couple RL training with the agent or rely on sequence concatenation with masking, Agent Lightning achieves complete decoupling between agent execution and training, allowing seamless integration with existing agents developed in diverse ways (e.g., using frameworks like LangChain, OpenAI Agents SDK, AutoGen, or building from scratch) with almost ZERO code modifications. By formulating agent execution as a Markov decision process, we define a unified data interface and propose a hierarchical RL algorithm, LightningRL, which contains a credit assignment module, allowing us to decompose trajectories generated by ANY agents into training transitions. This enables RL to handle complex interaction logic, such as multi-agent scenarios and dynamic workflows. For the system design, we introduce a Training-Agent Disaggregation architecture and bring agent observability frameworks into the agent runtime, providing a standardized agent finetuning interface. Experiments across text-to-SQL, retrieval-augmented generation, and math tool-use tasks demonstrate stable, continuous improvements, showcasing the framework's potential for real-world agent training and deployment.
- CoTox: Chain-of-Thought-Based Molecular Toxicity Reasoning and Prediction
Drug toxicity remains a major challenge in pharmaceutical development. Recent machine learning models have improved in silico toxicity prediction, but their reliance on annotated data and lack of interpretability limit their applicability and their ability to capture organ-specific toxicities driven by complex biological mechanisms. Large language models (LLMs) offer a promising alternative through step-by-step reasoning and integration of textual data, yet prior approaches lack biological context and transparent rationale. To address this issue, we propose CoTox, a novel framework that integrates LLMs with chain-of-thought (CoT) reasoning for multi-toxicity prediction. CoTox combines chemical structure data, biological pathways, and gene ontology (GO) terms to generate interpretable toxicity predictions through step-by-step reasoning. Using GPT-4o, we show that CoTox outperforms both traditional machine learning and deep learning models. We further examine its performance across various LLMs to identify where CoTox is most effective. Additionally, we find that representing chemical structures with IUPAC names, which are easier for LLMs to understand than SMILES, enhances the model's reasoning ability and improves predictive performance. To demonstrate its practical utility in drug development, we simulate the treatment of relevant cell types with a drug and incorporate the resulting biological context into the CoTox framework. This approach allows CoTox to generate toxicity predictions aligned with physiological responses, as shown in a case study. This result highlights the potential of LLM-based frameworks to improve interpretability and support early-stage drug safety assessment. The code and prompt used in this work are available at https://github.com/dmis-lab/CoTox.
- Sotopia-RL: Reward Design for Social Intelligence
Social intelligence has become a critical capability for large language models (LLMs), enabling them to engage effectively in real-world social tasks such as accommodation, persuasion, collaboration, and negotiation. Reinforcement learning (RL) is a natural fit for training socially intelligent agents because it allows models to learn sophisticated strategies directly through social interactions. However, social interactions have two key characteristics that set barriers for RL training: (1) partial observability, where utterances have indirect and delayed effects that complicate credit assignment, and (2) multi-dimensionality, where behaviors such as rapport-building or knowledge-seeking contribute indirectly to goal achievement. These characteristics make Markov decision process (MDP)-based RL with single-dimensional episode-level rewards inefficient and unstable. To address these challenges, we propose Sotopia-RL, a novel framework that refines coarse episode-level feedback into utterance-level, multi-dimensional rewards. Utterance-level credit assignment mitigates partial observability by attributing outcomes to individual utterances, while multi-dimensional rewards capture the full richness of social interactions and reduce reward hacking. Experiments in Sotopia, an open-ended social learning environment, demonstrate that Sotopia-RL achieves state-of-the-art social goal completion scores (7.17 on Sotopia-hard and 8.31 on Sotopia-full), significantly outperforming existing approaches. Ablation studies confirm the necessity of both utterance-level credit assignment and multi-dimensional reward design for RL training. Our implementation is publicly available at: https://github.com/sotopia-lab/sotopia-rl.
- Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents
Multimodal large-scale models have significantly advanced the development of web agents, enabling perception and interaction with digital environments akin to human cognition. In this paper, we argue that web agents must first acquire sufficient knowledge to effectively engage in cognitive reasoning. Therefore, we decompose a web agent's capabilities into two essential stages: knowledge content learning and cognitive processes. To formalize this, we propose the Web-CogKnowledge Framework, categorizing knowledge as Factual, Conceptual, and Procedural. In this framework, knowledge content learning corresponds to the agent's processes of Memorizing and Understanding, which rely on the first two knowledge types, representing the "what" of learning. Conversely, cognitive processes correspond to Exploring, grounded in Procedural knowledge, defining the "how" of reasoning and action. To facilitate knowledge acquisition, we construct the Web-CogDataset, a structured resource curated from 14 real-world websites, designed to systematically instill the core knowledge necessary for web agents. This dataset serves as the agent's conceptual grounding (the "nouns" upon which comprehension is built) as well as the basis for learning how to reason and act. Building on this foundation, we operationalize these processes through a novel knowledge-driven Chain-of-Thought (CoT) reasoning framework, developing and training our proposed agent, the Web-CogReasoner. Extensive experimentation reveals its significant superiority over existing models, especially in generalizing to unseen tasks where structured knowledge is decisive. To enable rigorous evaluation, we introduce the Web-CogBench, a comprehensive evaluation suite designed to assess and compare agent performance across the delineated knowledge domains and cognitive capabilities. Our code and data are open-sourced at https://github.com/Gnonymous/Web-CogReasoner
- LaTCoder: Converting Webpage Design to Code with Layout-as-Thought
Converting webpage designs into code (design-to-code) plays a vital role in User Interface (UI) development for front-end developers, bridging the gap between visual design and functional implementation. While recent Multimodal Large Language Models (MLLMs) have shown significant potential in design-to-code tasks, they often fail to accurately preserve the layout during code generation. To this end, we draw inspiration from the Chain-of-Thought (CoT) reasoning in human cognition and propose LaTCoder, a novel approach that enhances layout preservation in webpage design during code generation with Layout-as-Thought (LaT). Specifically, we first introduce a simple yet efficient algorithm to divide the webpage design into image blocks. Next, we prompt MLLMs using a CoT-based approach to generate code for each block. Finally, we apply two assembly strategies (absolute positioning and an MLLM-based method), followed by dynamic selection to determine the optimal output. We evaluate the effectiveness of LaTCoder using multiple backbone MLLMs (i.e., DeepSeek-VL2, Gemini, and GPT-4o) on both a public benchmark and a newly introduced, more challenging benchmark (CC-HARD) that features complex layouts. The experimental results on automatic metrics demonstrate significant improvements. Specifically, TreeBLEU scores increased by 66.67% and MAE decreased by 38% when using DeepSeek-VL2, compared to direct prompting. Moreover, the human preference evaluation results indicate that annotators favor the webpages generated by LaTCoder in over 60% of cases, providing strong evidence of the effectiveness of our method.
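The absolute-positioning assembly strategy mentioned above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: `blocks` is an assumed input of per-block HTML fragments paired with their source coordinates.

```python
def assemble_absolute(blocks, width=1280, height=720):
    """Compose per-block HTML fragments into one page by pinning each
    block at its original design coordinates with absolute positioning."""
    divs = []
    for x, y, w, h, html in blocks:
        divs.append(
            f'<div style="position:absolute;left:{x}px;top:{y}px;'
            f'width:{w}px;height:{h}px;">{html}</div>'
        )
    return (
        f'<div style="position:relative;width:{width}px;height:{height}px;">'
        + "".join(divs)
        + "</div>"
    )

page = assemble_absolute([
    (0, 0, 1280, 80, "<nav>header</nav>"),
    (0, 80, 320, 640, "<aside>sidebar</aside>"),
])
```

Because each block's code is generated independently, pinning blocks at their measured coordinates preserves the overall layout even when individual fragments differ in internal structure.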
- Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
In this paper, we present a novel framework for video-to-4D generation that creates high-quality dynamic 3D content from single video inputs. Direct 4D diffusion modeling is extremely challenging due to costly data construction and the high-dimensional nature of jointly representing 3D shape, appearance, and motion. We address these challenges by introducing a Direct 4DMesh-to-GS Variation Field VAE that directly encodes canonical Gaussian Splats (GS) and their temporal variations from 3D animation data without per-instance fitting, and compresses high-dimensional animations into a compact latent space. Building upon this efficient representation, we train a Gaussian Variation Field diffusion model with temporal-aware Diffusion Transformer conditioned on input videos and canonical GS. Trained on carefully-curated animatable 3D objects from the Objaverse dataset, our model demonstrates superior generation quality compared to existing methods. It also exhibits remarkable generalization to in-the-wild video inputs despite being trained exclusively on synthetic data, paving the way for generating high-quality animated 3D content. Project page: https://gvfdiffusion.github.io/.
- HPSv3: Towards Wide-Spectrum Human Preference Score
Evaluating text-to-image generation models requires alignment with human perception, yet existing human-centric metrics are constrained by limited data coverage, suboptimal feature extraction, and inefficient loss functions. To address these challenges, we introduce Human Preference Score v3 (HPSv3). (1) We release HPDv3, the first wide-spectrum human preference dataset integrating 1.08M text-image pairs and 1.17M annotated pairwise comparisons from state-of-the-art generative models and low to high-quality real-world images. (2) We introduce a VLM-based preference model trained using an uncertainty-aware ranking loss for fine-grained ranking. Besides, we propose Chain-of-Human-Preference (CoHP), an iterative image refinement method that enhances quality without extra data, using HPSv3 to select the best image at each step. Extensive experiments demonstrate that HPSv3 serves as a robust metric for wide-spectrum image evaluation, and CoHP offers an efficient and human-aligned approach to improve image generation quality. The code and dataset are available at the HPSv3 Homepage.
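The CoHP loop described above (sample candidates, score them with the preference model, carry the winner forward) can be sketched generically. Here `generate` and `score` are placeholder callables standing in for a text-to-image model and an HPSv3-style scorer; this is not the paper's implementation:

```python
import itertools

def cohp(generate, score, rounds=3, n_candidates=4):
    """Iterative refinement: each round samples candidates conditioned on
    the current best image and keeps the one the scorer prefers."""
    best = None
    for _ in range(rounds):
        candidates = [generate(best) for _ in range(n_candidates)]
        if best is not None:
            candidates.append(best)  # never regress below the current best
        best = max(candidates, key=score)
    return best

# Toy check with integers standing in for images (higher score = better):
stream = itertools.count(1)
winner = cohp(generate=lambda prev: next(stream), score=lambda x: x)
```

Keeping the incumbent in the candidate pool makes the scored quality monotonically non-decreasing across rounds, which is the point of using the preference model as a selector rather than a training signal.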
- LeanK: Learnable K Cache Channel Pruning for Efficient Decoding
Large language models (LLMs) enable long-context tasks but face efficiency challenges due to the growing key-value (KV) cache. We propose LeanK, a learning-based method that prunes unimportant key (K) cache channels by leveraging static channel sparsity. With a novel two-stage training process, LeanK learns a channel-wise static mask that satisfies a target sparsity ratio and hardware alignment requirements. LeanK reduces GPU memory and accelerates decoding without sacrificing accuracy. Experiments demonstrate up to 70% K cache and 16%-18% V cache memory reduction. A custom decoding kernel enables a 1.3x speedup for attention computation. We also provide insights into model channels and attention heads during long-context inference by analyzing the learned importance distribution. Our code is available at https://aka.ms/LeanK.
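The core idea (statically keep only the important K-cache channels, and drop the same channels from the query so attention logits are approximated with less memory) can be illustrated with NumPy. This is a simplified sketch of static channel sparsity with a hand-picked importance proxy, not LeanK's learned two-stage mask:

```python
import numpy as np

def prune_k_channels(k_cache, q, importance, sparsity=0.5):
    """Drop the least-important head_dim channels from the K cache.
    k_cache: (seq_len, head_dim); q: (head_dim,); importance: (head_dim,).
    Returns the pruned cache and the matching pruned query."""
    head_dim = k_cache.shape[-1]
    n_keep = head_dim - int(head_dim * sparsity)
    keep = np.sort(np.argsort(importance)[-n_keep:])  # top channels, in index order
    return k_cache[:, keep], q[keep]

rng = np.random.default_rng(0)
k = rng.standard_normal((128, 64)).astype(np.float32)
q = rng.standard_normal(64).astype(np.float32)
importance = np.abs(k).mean(axis=0)   # assumed proxy: mean absolute activation
k_small, q_small = prune_k_channels(k, q, importance, sparsity=0.5)
scores_full = k @ q                   # exact attention logits
scores_pruned = k_small @ q_small     # approximation at half the K memory
```

Because the mask is static, the pruned channels never need to be written to the cache at all during decoding, which is where the memory saving comes from.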
- DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework
Video virtual try-on (VVT) technology has garnered considerable academic interest owing to its promising applications in e-commerce advertising and entertainment. However, most existing end-to-end methods rely heavily on scarce paired garment-centric datasets and fail to effectively leverage priors of advanced visual models and test-time inputs, making it challenging to accurately preserve fine-grained garment details and maintain temporal consistency in unconstrained scenarios. To address these challenges, we propose DreamVVT, a carefully designed two-stage framework built upon Diffusion Transformers (DiTs), which is inherently capable of leveraging diverse unpaired human-centric data to enhance adaptability in real-world scenarios. To further leverage prior knowledge from pretrained models and test-time inputs, in the first stage, we sample representative frames from the input video and utilize a multi-frame try-on model integrated with a vision-language model (VLM) to synthesize high-fidelity and semantically consistent keyframe try-on images. These images serve as complementary appearance guidance for subsequent video generation. In the second stage, skeleton maps together with fine-grained motion and appearance descriptions are extracted from the input content, and these along with the keyframe try-on images are then fed into a pretrained video generation model enhanced with LoRA adapters. This ensures long-term temporal coherence for unseen regions and enables highly plausible dynamic motions. Extensive quantitative and qualitative experiments demonstrate that DreamVVT surpasses existing methods in preserving detailed garment content and temporal stability in real-world scenarios. Project page: https://virtu-lab.github.io/
Solidot(15)
- Trump threatens 100% tariff on chips unless makers build, or commit to building, US plants
US President Trump said on Wednesday that he would impose a 100% tariff on imported semiconductors and chips, without giving details. Chipmakers can be exempted only if they build plants in the US or commit to doing so. Trump said, "We'll be putting a large tariff on chips and semiconductors, but the good news for companies like Apple is that if you're building in the United States, or have committed to build in the United States, there will be no charge." Apple had just pledged to invest $100 billion in the US over the next four years to boost American manufacturing.
- Japan bans Apple from blocking third-party browser engines on iOS
Japan recently passed a smartphone law, the Bill on the Promotion of Competition for Specified Software Used in Smartphones, one provision of which prohibits Apple's practice of restricting third-party browser engines on its iOS platform. Third-party browsers on Apple's platform have had to use its own browser engine, WebKit; the iOS versions of Firefox, Chrome, Edge, Opera, Brave, and Vivaldi are all WebKit reskins, leaving iOS browsers with little real competition. Last week Japan published the Mobile Software Competition Act (MSCA) Guidelines, which explicitly ban this Apple policy. The MSCA takes effect in December 2025, and enforcing it will be a major challenge. The EU and the UK have enacted similar laws.
- Grok generated nude images of Taylor Swift without being asked
Grok, the chatbot from Elon Musk's AI company xAI, was found to have generated nude images of singer Taylor Swift without users asking for them. A user entered the prompt "Taylor Swift celebrating Coachella with the boys" and chose the "spicy" preset to generate a video, and Grok produced footage of Swift stripping and dancing in a thong in front of an AI-generated crowd. With the Take It Down Act set to take effect next year, xAI could face legal consequences if its platform lets AI generate deepfake nudes.
- Wikipedia editors adopt a speedy-deletion policy for AI-generated articles
Wikipedia editors have adopted a new policy to deal with the flood of AI-generated articles. The policy allows administrators to speedily delete AI-generated articles that meet certain criteria. Wikipedia's previous deletion process typically required up to a week of discussion; under the new policy, once an AI-generated article has been flagged and reviewed against the criteria, administrators can delete it without discussion. An AI article qualifies for speedy deletion on either of two grounds: it contains obvious LLM responses to prompts, such as "Here is your Wikipedia article on…", "Up to my last training update …", or "as a large language model"; or it exhibits a mistake LLMs frequently make: citing sources that do not exist or are plainly wrong.
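The first criterion (tell-tale chat boilerplate left in the article text) is mechanical enough to check with a few string matches. A toy heuristic in Python, illustrative only and not Wikipedia's actual tooling:

```python
LLM_TELLS = (
    "here is your wikipedia article on",
    "up to my last training update",
    "as a large language model",
)

def has_llm_boilerplate(article_text: str) -> bool:
    """Flag text containing phrases that read as an LLM's chat response
    rather than encyclopedia prose."""
    lowered = article_text.lower()
    return any(phrase in lowered for phrase in LLM_TELLS)
```

The second criterion, fabricated or plainly wrong citations, is much harder to automate and still relies on human review.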
- GitHub CEO warns developers: embrace AI or change careers
GitHub CEO Thomas Dohmke has issued a blunt warning to developers worldwide: embrace AI or leave the industry. In a post on his personal blog, "Developers, Reinvented," he wrote that software development is undergoing a transformation that affects not just how code is written but what it means to be a programmer. The post draws on the experiences of 22 developers who have already integrated AI into their workflows: AI is not a distant future but a present-day necessity. One developer put it this way: embrace AI or rethink your career plans. Similar warnings have come from executives at GitHub's parent company, Microsoft. Dohmke said developers who once dismissed AI tools such as GitHub Copilot as gimmicks now regard them as indispensable partners. The programmer's role is shifting from writing code to designing architectures and reviewing AI-generated code; rather than programmers, they now call themselves "code enablers" or "creative directors of code."
- Proxmox Virtual Environment 9.0 released
Proxmox Virtual Environment has released version 9.0. It is based on Debian 13 "Trixie" but ships a newer kernel, Linux 6.14.8-2. Major new features include: snapshots for any storage backend that supports block storage, affinity rules for high-availability (HA) clusters, a mobile web interface for managing Proxmox systems built with Rust and the Yew web framework, and ZFS 2.3.3 with support for adding new devices to RAIDZ pools without downtime, among others.
- Swedish prime minister criticized for using AI tools at work
Swedish Prime Minister Ulf Kristersson has admitted that he regularly consults AI tools such as ChatGPT for advice. He said he has used OpenAI's ChatGPT and French company Mistral AI's LeChat, and that his colleagues also use AI routinely in their daily work. He uses the tools, he said, to get a second opinion on political matters. Experts have voiced concern about politicians using AI: Virginia Dignum, professor of responsible AI at Umea University, said AI cannot offer meaningful opinions on political questions and merely reflects the views of its developers: "We didn't vote for ChatGPT."
- OpenAI releases its first open-weight models since GPT-2
Since releasing GPT-2 in 2019, OpenAI has behaved more like "CloseAI." Now, six years later, OpenAI has finally released two open-weight models under the Apache 2.0 license, gpt-oss-120b and gpt-oss-20b, with 120 billion and 20 billion parameters and performance approaching o4-mini and o3-mini respectively. The smaller gpt-oss-20b can run on an ordinary consumer PC with 16 GB of memory, while gpt-oss-120b needs 80 GB. Both models use the chain-of-thought reasoning approach first deployed in OpenAI's o1 model, and can browse the web, execute code, and act as AI agents.
- US tech companies cut 90,000 jobs in the first seven months of the year
Figures from Challenger, Gray & Christmas show that US tech companies laid off 89,251 workers from January through July, up 36% year over year. Layoffs.fyi counts roughly 16,000 US layoffs in July, up 78% from a year earlier. Microsoft announced 9,000 layoffs in early July, 4% of its workforce, and Intel, in the middle of a restructuring, will also cut jobs. The wave of hiring cuts has spread to mid-sized companies: dating app Bumble cut 240 jobs, 30% of its workforce, and the Netherlands' TomTom announced 300 layoffs. The companies' results have not worsened: the five giants Alphabet, Microsoft, Apple, Amazon, and Meta all posted revenue and profit growth in the second quarter, with combined quarterly net profit of $115.1 billion. Companies are rushing to cut jobs anyway because adopting AI lets them cut costs more aggressively.
- Nvidia denies its products contain backdoors or kill switches
Nvidia said in an official blog post that "NVIDIA chips have no backdoors, no kill switches, and no spyware." The post reads: "To reduce the risk of misuse, some experts and policymakers have proposed requiring 'kill switches' or built-in hardware controls that could remotely disable GPUs without the user's knowledge or consent. Some suspect such mechanisms already exist. There are no kill switches or backdoors in NVIDIA GPUs, and there should not be. NVIDIA began designing processors more than 30 years ago. Experience has taught us that embedding backdoors or kill switches in chips would hand hackers and hostile actors an opening, undermining global digital infrastructure and eroding trust in leading technology. Established law in many countries and regions is also clear on this point, requiring companies to fix vulnerabilities, not create them. That principle still holds. There is no such thing as a 'good' secret backdoor, only dangerous vulnerabilities that must be eliminated."
- TSMC accuses former employee of stealing 2 nm chip technology secrets
TSMC has accused a former employee of stealing trade secrets covering its 2-nanometer process technology; the A20 chip in Apple's iPhone 18 series will be among the first chips built on the 2 nm process. According to reports, after recently discovering a suspected leak of process technology, TSMC immediately filed a complaint with the High Prosecutors Office. After tracing the leak, prosecutors directed the Investigation Bureau to carry out several waves of searches and questioning on July 25 and 28. So far a former engineer surnamed Chen and nearly ten TSMC engineers in advanced-process trial production and R&D are known to be implicated. Chen had worked in TSMC's systems integration department and, after leaving, joined Tokyo Electron, a long-time TSMC partner, as an equipment engineer; because he knew TSMC's current advanced-process R&D staff, he handled liaison with TSMC's R&D department. Investigators have reportedly established how Chen stole the secrets: TSMC engineers displayed process technology diagrams on their computer screens, and Chen photographed the screens directly with his phone. He is said to have taken more than 700 photos and nearly 300 photos of process technology from the screens of two engineers, one surnamed Wu. A few other TSMC engineers supplied photos of a handful of less sensitive process diagrams; their involvement was minor, so they were not detained.
- Scientists develop a painkiller as potent as morphine but without its severe side effects
Scientists at Kyoto University in Japan have developed a painkiller as potent as morphine but without its severe side effects. Morphine, often used by cancer patients, has serious side effects including breathing difficulties and addiction. The new drug, Adrian, works on a completely different principle from morphine and existing synthetic opioids; the team says it could transform pain management in medicine and help address the opioid abuse problem. When a person faces a life-threatening situation, the brain releases norepinephrine to suppress pain. The new research focuses on the mechanism by which the body reins in excessive norepinephrine release, and by introducing new techniques the team succeeded, for the first time, in developing a drug that blocks this regulation. The scientists plan to begin clinical trials in the US in 2026 and to bring the drug into practical use in 2028.
- Shining lasers through the human brain
Scientists studying how the brain works rely mainly on two tools, each with its own strengths and weaknesses: electroencephalography (EEG) is cheap and portable but cannot read anything beyond the brain's outer cortex, while functional MRI (fMRI) is expensive and bulky but can see deep into the brain. Now a team at the University of Glasgow has found a technique that combines the best of both: as cheap and portable as EEG, yet able to read deep-brain signals like fMRI. They fire millions of photons from a laser into one side of the head and measure when they arrive at the other side. Because only a tiny fraction of photons make it all the way through the brain, a major challenge is suppressing background noise. The technique is still some way from practical use, and the researchers have more obstacles to overcome.
- Ultra-processed diets are less effective for weight loss
British scientists have found that an ultra-processed diet may be less effective for losing weight and reducing cardiometabolic disease risk than a minimally processed one, even when both diets follow the same national dietary guidelines. The findings come from a community-based clinical trial of 55 UK adults, and point to a possible effect of the degree of food processing on specific health outcomes, beyond overall nutritional composition. Global consumption of ultra-processed food has risen rapidly in recent decades, alongside rising rates of obesity and of chronic diseases such as type 2 diabetes and cardiovascular disease. The researchers ran a randomized crossover trial comparing a diet built mainly from ultra-processed foods with one built mainly from minimally processed foods; both followed the UK's national healthy eating guidelines, which promote healthy, balanced nutrition. The 55 adults received home deliveries for 8 weeks of either prepared ultra-processed foods, such as breakfast cereal or ready-made lasagna, or prepared minimally processed foods, such as overnight oats or homemade spaghetti bolognese. After a 4-week break, participants switched to the other diet for a further 8 weeks, so the effects of ultra-processed and minimally processed foods could be compared in the same person over 6 months. Fifty participants completed at least one of the diets. Both guideline-compliant diets produced significant weight loss within 8 weeks, but average weight loss on the minimally processed diet was 2%, versus only 1% on the ultra-processed diet. Beyond weight loss, the minimally processed diet was more effective at improving body-composition measures tied to cardiometabolic health, such as total fat, visceral fat, and triglyceride levels, although LDL cholesterol was lower after the ultra-processed diet.
- Tesla accused of withholding data, lying, and misleading police in Autopilot crash case
A jury last week found Tesla partly liable for a wrongful death in a crash involving Autopilot. Court records show that Tesla tried to pin all of the blame on the driver and actively withheld key evidence about Autopilot's behavior before and after the crash. Within three minutes of the collision, the car uploaded a "collision snapshot" (video, CAN-bus streams, EDR data, and more) to Tesla's servers and then deleted the local copy, leaving Tesla the only entity with access to the key evidence. It took police years to get Tesla to acknowledge that the collision snapshot existed. Experts confirmed, by forensically recovering data from the onboard computer, that Tesla had possessed the snapshot all along, even though the company had insisted the data did not exist.