OrangeBot.AI Digest — 2025-09-15
60 headlines across 4 sources, aggregated for the day.
Hacker News(15)
- GPT-5-Codex (openai.com)
- React is winning by default and slowing innovation (www.lorenstew.art)
- Hosting a website on a disposable vape (bogdanthegeek.github.io)
- macOS Tahoe (www.apple.com)
- Asciinema CLI 3.0 rewritten in Rust, adds live streaming, upgrades file format (blog.asciinema.org)
- Wanted to spy on my dog, ended up spying on TP-Link (kennedn.com)
- Apple has a private CSS property to add Liquid Glass effects to web content (alastair.is)
- PayPal to support Ethereum and Bitcoin (newsroom.paypal-corp.com)
- How big a solar battery do I need to store all my home's electricity? (shkspr.mobi)
- The Mac App Flea Market (blog.jim-nielsen.com)
- Denmark's Justice Minister calls encrypted messaging a false civil liberty (mastodon.social)
- Removing newlines in FASTA file increases ZSTD compression ratio by 10x (log.bede.im)
- RustGPT: A pure-Rust transformer LLM built from scratch (github.com)
- Folks, we have the best π (lcamtuf.substack.com)
GitHub Trending(15)
- rasbt / LLMs-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
- microsoft / markitdown
Python tool for converting files and office documents to Markdown.
- PowerShell / PowerShell
PowerShell for every system!
- x1xhlol / system-prompts-and-models-of-ai-tools
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus Agent Tools, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae, Traycer AI, VSCode Agent, Warp.dev, Windsurf, Xcode, Z.ai Code, dia & v0. (And other Open Sourced) System Prompts, Internal Tools & AI Models
- virattt / ai-hedge-fund
An AI Hedge Fund Team
- SoftFever / OrcaSlicer
G-code generator for 3D printers (Bambu, Prusa, Voron, VzBot, RatRig, Creality, etc.)
- simdjson / simdjson
Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
- ItzCrazyKns / Perplexica
Perplexica is an AI-powered search engine. It is an Open source alternative to Perplexity AI
- sst / opencode
AI coding agent, built for the terminal.
- Zie619 / n8n-workflows
all of the workflows of n8n i could find (also from the site itself)
- ccxt / ccxt
A cryptocurrency trading API with more than 100 exchanges in JavaScript / TypeScript / Python / C# / PHP / Go
- midday-ai / midday
Invoicing, Time tracking, File reconciliation, Storage, Financial Overview & your own Assistant made for Freelancers
- unclecode / crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
- ml-explore / mlx-lm
Run LLMs with MLX
- CorentinJ / Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Hugging Face(15)
- IntrEx: A Dataset for Modeling Engagement in Educational Conversations
Engagement and motivation are crucial for second-language acquisition, yet maintaining learner interest in educational conversations remains a challenge. While prior research has explored what makes educational texts interesting, still little is known about the linguistic features that drive engagement in conversations. To address this gap, we introduce IntrEx, the first large dataset annotated for interestingness and expected interestingness in teacher-student interactions. Built upon the Teacher-Student Chatroom Corpus (TSCC), IntrEx extends prior work by incorporating sequence-level annotations, allowing for the study of engagement beyond isolated turns to capture how interest evolves over extended dialogues. We employ a rigorous annotation process with over 100 second-language learners, using a comparison-based rating approach inspired by reinforcement learning from human feedback (RLHF) to improve agreement. We investigate whether large language models (LLMs) can predict human interestingness judgments. We find that LLMs (7B/8B parameters) fine-tuned on interestingness ratings outperform larger proprietary models like GPT-4o, demonstrating the potential for specialised datasets to model engagement in educational settings. Finally, we analyze how linguistic and cognitive factors, such as concreteness, comprehensibility (readability), and uptake, influence engagement in educational dialogues.
- X-Part: high fidelity and structure coherent shape decomposition
Generating 3D shapes at part level is pivotal for downstream applications such as mesh retopology, UV mapping, and 3D printing. However, existing part-based generation methods often lack sufficient controllability and suffer from poor semantically meaningful decomposition. To this end, we introduce X-Part, a controllable generative model designed to decompose a holistic 3D object into semantically meaningful and structurally coherent parts with high geometric fidelity. X-Part exploits the bounding box as prompts for the part generation and injects point-wise semantic features for meaningful decomposition. Furthermore, we design an editable pipeline for interactive part generation. Extensive experimental results show that X-Part achieves state-of-the-art performance in part-level shape generation. This work establishes a new paradigm for creating production-ready, editable, and structurally sound 3D assets. Codes will be released for public research.
- The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Does continued scaling of large language models (LLMs) yield diminishing returns? Real-world value often stems from the length of task an agent can complete. We start this work by observing the simple but counterintuitive fact that marginal gains in single-step accuracy can compound into exponential improvements in the length of a task a model can successfully complete. Then, we argue that failures of LLMs when simple tasks are made longer arise from mistakes in execution, rather than an inability to reason. We propose isolating execution capability, by explicitly providing the knowledge and plan needed to solve a long-horizon task. We find that larger models can correctly execute significantly more turns even when small models have 100% single-turn accuracy. We observe that the per-step accuracy of models degrades as the number of steps increases. This is not just due to long-context limitations -- curiously, we observe a self-conditioning effect -- models become more likely to make mistakes when the context contains their errors from prior turns. Self-conditioning is not reduced by simply scaling up model size. In contrast, recent thinking models do not self-condition, and can also execute much longer tasks in a single turn. We conclude by benchmarking frontier thinking models on the length of task they can execute in a single turn. Overall, by focusing on the ability to execute, we hope to reconcile debates on how LLMs can solve complex reasoning problems yet fail at simple tasks when made longer, and highlight the massive benefits of scaling model size and sequential test-time compute for long-horizon tasks.
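The compounding claim in the abstract above is easy to see with a toy model: if each step succeeds independently with per-step accuracy p, a task of n steps succeeds with probability p^n, so small accuracy gains translate into much longer feasible tasks. A minimal sketch (our illustration, not the paper's code):

```python
# Toy model: a task of n steps succeeds only if every step succeeds
# independently with probability p, so P(success) = p ** n.
import math

def max_task_length(p: float, target: float = 0.5) -> int:
    """Longest task (in steps) completable with at least `target`
    end-to-end success probability, given per-step accuracy p."""
    if p >= 1.0:
        return 10**9  # unbounded in this toy model
    return math.floor(math.log(target) / math.log(p))

for p in (0.90, 0.95, 0.99, 0.999):
    print(p, max_task_length(p))  # 6, 13, 68, 692 steps respectively
```

In this toy model, raising per-step accuracy from 95% to 99%, a gain of about four points, roughly quintuples the task length completable at even odds.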
- InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Arbitrary resolution image generation provides a consistent visual experience across devices, having extensive applications for producers and consumers. Current diffusion models increase computational demand quadratically with resolution, causing 4K image generation delays over 100 seconds. To solve this, we explore the second generation upon the latent diffusion models, where the fixed latent generated by diffusion models is regarded as the content representation and we propose to decode arbitrary resolution images with a compact generated latent using a one-step generator. Thus, we present the InfGen, replacing the VAE decoder with the new generator, for generating images at any resolution from a fixed-size latent without retraining the diffusion models, which simplifies the process, reducing computational complexity and can be applied to any model using the same latent space. Experiments show InfGen is capable of improving many models into the arbitrary high-resolution era while cutting 4K image generation time to under 10 seconds.
- HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering
The Retrieval-Augmented Generation (RAG) approach enhances question-answering systems and dialogue generation tasks by integrating information retrieval (IR) technologies with large language models (LLMs). This strategy, which retrieves information from external knowledge bases to bolster the response capabilities of generative models, has achieved certain successes. However, current RAG methods still face numerous challenges when dealing with multi-hop queries. For instance, some approaches overly rely on iterative retrieval, wasting too many retrieval steps on compound queries. Additionally, using the original complex query for retrieval may fail to capture content relevant to specific sub-queries, resulting in noisy retrieved content. If the noise is not managed, it can lead to the problem of noise accumulation. To address these issues, we introduce HANRAG, a novel heuristic-based framework designed to efficiently tackle problems of varying complexity. Driven by a powerful revelator, HANRAG routes queries, decomposes them into sub-queries, and filters noise from retrieved documents. This enhances the system's adaptability and noise resistance, making it highly capable of handling diverse queries. We compare the proposed framework against other leading industry methods across various benchmarks. The results demonstrate that our framework obtains superior performance in both single-hop and multi-hop question-answering tasks.
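The routing/decomposition/noise-filtering pipeline described above can be sketched with toy stand-ins. Everything here (the corpus, the word-overlap retriever, the split-on-"and" decomposer) is an illustrative assumption, not HANRAG's actual components, which are LLM-driven:

```python
# Toy sketch of the multi-hop RAG pattern described above: decompose a
# compound query into sub-queries, retrieve per sub-query, and filter
# noisy documents before answering. Keyword heuristics stand in for the
# paper's LLM "revelator".
CORPUS = {
    "d1": "Paris is the capital of France.",
    "d2": "France borders Spain and Germany.",
    "d3": "The Seine flows through Paris.",
    "d4": "Quantum computing uses qubits.",
}

def decompose(query: str) -> list[str]:
    # Stand-in decomposer: split a compound query on " and ".
    return [q.strip() for q in query.split(" and ")]

def retrieve(sub_query: str, k: int = 2) -> list[str]:
    # Stand-in retriever: rank documents by word overlap with the sub-query.
    words = set(sub_query.lower().split())
    ranked = sorted(CORPUS, key=lambda d: -len(words & set(CORPUS[d].lower().split())))
    return ranked[:k]

def filter_noise(sub_query: str, doc_ids: list[str]) -> list[str]:
    # Stand-in noise filter: drop documents sharing no words with the sub-query.
    words = set(sub_query.lower().split())
    return [d for d in doc_ids if words & set(CORPUS[d].lower().split())]

def answer(query: str) -> dict[str, list[str]]:
    # Pipeline: decompose -> retrieve per sub-query -> filter noise.
    return {sq: filter_noise(sq, retrieve(sq)) for sq in decompose(query)}

print(answer("what is the capital of France and which river flows through Paris"))
```

The point of the structure, per the abstract, is that each sub-query retrieves its own evidence instead of the original compound query retrieving one noisy pool.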
- VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions
Spoken language models (SLMs) have emerged as a unified paradigm for speech understanding and generation, enabling natural human-machine interaction. However, while most progress has focused on semantic accuracy and instruction following, the ability of SLMs to adapt their speaking style based on spoken instructions has received limited attention. We introduce Voice Style Adaptation (VSA), a new task that examines whether SLMs can modify their speaking style, such as timbre, prosody, or persona, following natural-language spoken commands. To study this task, we present VStyle, a bilingual (Chinese & English) benchmark covering four categories of speech generation: acoustic attributes, natural language instruction, role play, and implicit empathy. We also introduce the Large Audio Language Model as a Judge (LALM as a Judge) framework, which progressively evaluates outputs along textual faithfulness, style adherence, and naturalness, ensuring reproducible and objective assessment. Experiments on commercial systems and open-source SLMs demonstrate that current models face clear limitations in controllable style adaptation, highlighting both the novelty and challenge of this task. By releasing VStyle and its evaluation toolkit, we aim to provide the community with a foundation for advancing human-centered spoken interaction. The dataset and code are publicly available at the project's homepage: https://junzhan2000.github.io/VStyle.github.io/.
- Virtual Agent Economies
The rapid adoption of autonomous AI agents is giving rise to a new economic layer where agents transact and coordinate at scales and speeds beyond direct human oversight. We propose the "sandbox economy" as a framework for analyzing this emergent system, characterizing it along two key dimensions: its origins (emergent vs. intentional) and its degree of separateness from the established human economy (permeable vs. impermeable). Our current trajectory points toward a spontaneous emergence of a vast and highly permeable AI agent economy, presenting us with opportunities for an unprecedented degree of coordination as well as significant challenges, including systemic economic risk and exacerbated inequality. Here we discuss a number of possible design choices that may lead to safely steerable AI agent markets. In particular, we consider auction mechanisms for fair resource allocation and preference resolution, the design of AI "mission economies" to coordinate around achieving collective goals, and socio-technical infrastructure needed to ensure trust, safety, and accountability. By doing this, we argue for the proactive design of steerable agent markets to ensure the coming technological shift aligns with humanity's long-term collective flourishing.
- FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies
Developing efficient Vision-Language-Action (VLA) policies is crucial for practical robotics deployment, yet current approaches face prohibitive computational costs and resource requirements. Existing diffusion-based VLA policies require multi-billion-parameter models and massive datasets to achieve strong performance. We tackle this efficiency challenge with two contributions: intermediate-modality fusion, which reallocates capacity to the diffusion head by pruning up to 50% of LLM layers, and action-specific Global-AdaLN conditioning, which cuts parameters by 20% through modular adaptation. We integrate these advances into a novel 950 M-parameter VLA called FLOWER. Pretrained in just 200 H100 GPU hours, FLOWER delivers competitive performance with bigger VLAs across 190 tasks spanning ten simulation and real-world benchmarks and demonstrates robustness across diverse robotic embodiments. In addition, FLOWER achieves a new SoTA of 4.53 on the CALVIN ABC benchmark. Demos, code and pretrained weights are available at https://intuitive-robots.github.io/flower_vla/.
- Inpainting-Guided Policy Optimization for Diffusion Large Language Models
Masked diffusion large language models (dLLMs) are emerging as promising alternatives to autoregressive LLMs, offering competitive performance while supporting unique generation capabilities such as inpainting. We explore how inpainting can inform RL algorithm design for dLLMs. Aligning LLMs with reinforcement learning faces an exploration challenge: sparse reward signals and sample waste when models fail to discover correct solutions. While this inefficiency affects LLMs broadly, dLLMs offer a distinctive opportunity--their inpainting ability can guide exploration. We introduce IGPO (Inpainting Guided Policy Optimization), an RL framework that strategically inserts partial ground-truth reasoning traces during online sampling. Unlike providing full solutions, inpainting steers exploration toward promising trajectory spaces while preserving self-generated reasoning, bridging supervised fine-tuning and reinforcement learning. We apply IGPO to group-based optimization methods such as GRPO, where exploration failures cause zero advantages and gradients. IGPO restores meaningful gradients while improving sample efficiency. We also propose supervised fine-tuning on synthetically rewritten concise traces that better align with dLLM generation patterns. With additional techniques including entropy-based filtering, our training recipe yields substantial gains across three mathematical benchmarks--GSM8K, Math500, and AMC--achieving new state-of-the-art results for full-attention masked dLLMs.
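The inpainting idea above, steering exploration by fixing a few masked positions to ground-truth trace tokens before the model samples the rest, can be illustrated with a toy function. The token lists and hint here are hypothetical, and a real dLLM would then sample the remaining masked positions:

```python
# Toy illustration (not the paper's implementation) of inpainting-guided
# sampling: before a masked-diffusion model resamples, insert part of a
# ground-truth reasoning trace into some masked positions, so exploration
# starts from a promising region instead of from scratch.
import random

MASK = "<mask>"

def inpaint_hint(seq: list[str], hint: list[str], fraction: float,
                 rng: random.Random) -> list[str]:
    """Fill `fraction` of the first len(hint) masked positions with
    ground-truth tokens; remaining masks are left for the model to sample."""
    out = list(seq)
    masked = [i for i, t in enumerate(out) if t == MASK][: len(hint)]
    chosen = rng.sample(masked, k=int(len(masked) * fraction))
    for i in sorted(chosen):
        out[i] = hint[masked.index(i)]
    return out

rng = random.Random(0)
seq = [MASK] * 8
hint = ["add", "3", "then", "double"]  # hypothetical partial ground-truth trace
print(inpaint_hint(seq, hint, fraction=0.5, rng=rng))
```

Per the abstract, the partial (rather than full) fill is what preserves self-generated reasoning while restoring non-zero advantages in group-based RL.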
- LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios
Long-tailed learning has garnered increasing attention due to its wide applicability in real-world scenarios. Among existing approaches, Long-Tailed Semi-Supervised Learning (LTSSL) has emerged as an effective solution by incorporating a large amount of unlabeled data into the imbalanced labeled dataset. However, most prior LTSSL methods are designed to train models from scratch, which often leads to issues such as overconfidence and low-quality pseudo-labels. To address these challenges, we extend LTSSL into the foundation model fine-tuning paradigm and propose a novel framework: LoFT (Long-tailed semi-supervised learning via parameter-efficient Fine-Tuning). We demonstrate that fine-tuned foundation models can generate more reliable pseudo-labels, thereby benefiting imbalanced learning. Furthermore, we explore a more practical setting by investigating semi-supervised learning under open-world conditions, where the unlabeled data may include out-of-distribution (OOD) samples. To handle this problem, we propose LoFT-OW (LoFT under Open-World scenarios) to improve the discriminative ability. Experimental results on multiple benchmarks demonstrate that our method achieves superior performance compared to previous approaches, even when utilizing only 1% of the unlabeled data compared with previous works.
- QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading
Recent advances in Large Language Models (LLMs) have demonstrated impressive capabilities in financial reasoning and market understanding. Multi-agent LLM frameworks such as TradingAgent and FINMEM augment these models to long-horizon investment tasks, leveraging fundamental and sentiment-based inputs for strategic decision-making. However, such systems are ill-suited for the high-speed, precision-critical demands of High-Frequency Trading (HFT). HFT requires rapid, risk-aware decisions based on structured, short-horizon signals, including technical indicators, chart patterns, and trend-based features, distinct from the long-term semantic reasoning typical of traditional financial LLM applications. To this end, we introduce QuantAgent, the first multi-agent LLM framework explicitly designed for high-frequency algorithmic trading. The system decomposes trading into four specialized agents, Indicator, Pattern, Trend, and Risk, each equipped with domain-specific tools and structured reasoning capabilities to capture distinct aspects of market dynamics over short temporal windows. In zero-shot evaluations across ten financial instruments, including Bitcoin and Nasdaq futures, QuantAgent demonstrates superior performance in both predictive accuracy and cumulative return over 4-hour trading intervals, outperforming strong neural and rule-based baselines. Our findings suggest that combining structured financial priors with language-native reasoning unlocks new potential for traceable, real-time decision systems in high-frequency financial markets.
- Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images
Visual reasoning over structured data such as tables is a critical capability for modern vision-language models (VLMs), yet current benchmarks remain limited in scale, diversity, or reasoning depth, especially when it comes to rendered table images. Addressing this gap, we introduce Visual-TableQA, a large-scale, open-domain multimodal dataset specifically designed to evaluate and enhance visual reasoning over complex tabular data. Our generation pipeline is modular, scalable, and fully autonomous, involving multiple reasoning LLMs collaborating across distinct roles: generation, validation, and inspiration. Visual-TableQA comprises 2.5k richly structured LaTeX-rendered tables and 6k reasoning-intensive QA pairs, all produced at a cost of under USD 100. To promote diversity and creativity, our pipeline performs multi-model collaborative data generation via cross-model prompting ('inspiration') and LLM-jury filtering. Stronger models seed layouts and topics that weaker models elaborate, collectively distilling diverse reasoning patterns and visual structures into the dataset. Empirical results show that models fine-tuned on Visual-TableQA generalize robustly to external benchmarks, outperforming several proprietary models despite the dataset's synthetic nature. The full pipeline and resources are publicly available at https://github.com/AI-4-Everyone/Visual-TableQA.
- MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools
The Model Context Protocol (MCP) is rapidly emerging as a pivotal open standard, designed to enhance agent-tool integration and interoperability, and is positioned to unlock a new era of powerful, interconnected, and genuinely utilitarian agentic AI. However, despite MCP's growing adoption, existing benchmarks often fail to capture real-world agent performance within this new paradigm, leading to a distorted perception of their true operational value and an inability to reliably differentiate proficiencies. To bridge this critical evaluation gap, we introduce MCP-AgentBench -- a comprehensive benchmark specifically engineered to rigorously assess language agent capabilities in MCP-mediated tool interactions. Core contributions of MCP-AgentBench include: the establishment of a robust MCP testbed comprising 33 operational servers with 188 distinct tools; the development of a benchmark featuring 600 systematically designed queries distributed across 6 distinct categories of varying interaction complexity; and the introduction of MCP-Eval, a novel outcome-oriented evaluation methodology prioritizing real-world task success. Through extensive empirical evaluation of leading language agents, we provide foundational insights. MCP-AgentBench aims to equip the research community with a standardized and reliable framework to build, validate, and advance agents capable of fully leveraging MCP's transformative benefits, thereby accelerating progress toward truly capable and interoperable AI systems.
- Color Me Correctly: Bridging Perceptual Color Spaces and Text Embeddings for Improved Diffusion Generation
Accurate color alignment in text-to-image (T2I) generation is critical for applications such as fashion, product visualization, and interior design, yet current diffusion models struggle with nuanced and compound color terms (e.g., Tiffany blue, lime green, hot pink), often producing images that are misaligned with human intent. Existing approaches rely on cross-attention manipulation, reference images, or fine-tuning but fail to systematically resolve ambiguous color descriptions. To precisely render colors under prompt ambiguity, we propose a training-free framework that enhances color fidelity by leveraging a large language model (LLM) to disambiguate color-related prompts and guiding color blending operations directly in the text embedding space. Our method first employs a large language model (LLM) to resolve ambiguous color terms in the text prompt, and then refines the text embeddings based on the spatial relationships of the resulting color terms in the CIELAB color space. Unlike prior methods, our approach improves color accuracy without requiring additional training or external reference images. Experimental results demonstrate that our framework improves color alignment without compromising image quality, bridging the gap between text semantics and visual generation.
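The CIELAB machinery this approach builds on is standard: convert sRGB to Lab with the D65 formulas and compare colors by Euclidean distance (CIE76). The sketch below is our illustration, not the paper's code, and the "Tiffany blue" RGB value is an assumption:

```python
# Standard sRGB -> XYZ (D65) -> CIELAB conversion and CIE76 distance,
# e.g. to check which candidate swatch is perceptually closest to an
# ambiguous color term.
import math

def srgb_to_lab(r: int, g: int, b: int) -> tuple[float, float, float]:
    # sRGB (0-255) -> linear RGB
    def lin(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    rl, gl, bl = lin(r), lin(g), lin(b)
    # linear RGB -> XYZ, normalized by the D65 white point
    x = (0.4124 * rl + 0.3576 * gl + 0.1805 * bl) / 0.95047
    y = (0.2126 * rl + 0.7152 * gl + 0.0722 * bl) / 1.00000
    z = (0.0193 * rl + 0.1192 * gl + 0.9505 * bl) / 1.08883
    # XYZ -> Lab
    def f(t):
        return t ** (1 / 3) if t > 0.008856 else 7.787 * t + 16 / 116
    fx, fy, fz = f(x), f(y), f(z)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def delta_e(c1, c2) -> float:
    # CIE76 color difference: Euclidean distance in Lab space
    return math.dist(srgb_to_lab(*c1), srgb_to_lab(*c2))

tiffany = (10, 186, 181)  # approximate "Tiffany blue" (an assumption)
print(delta_e(tiffany, (0, 255, 255)) < delta_e(tiffany, (0, 0, 255)))  # cyan is closer than pure blue
```

Blending or comparing in Lab rather than RGB is what makes "closest" track human perception, which is the premise of guiding embeddings by CIELAB relationships.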
- CAT: Causal Attention Tuning For Injecting Fine-grained Causal Knowledge into Large Language Models
Large Language Models (LLMs) have achieved remarkable success across various domains. However, a fundamental question remains: Can LLMs effectively utilize causal knowledge for prediction and generation? Through empirical studies, we find that LLMs trained directly on large-scale data often capture spurious correlations rather than true causal relationships, leading to suboptimal performance, especially in out-of-distribution (OOD) scenarios. To address this challenge, we propose Causal Attention Tuning (CAT), a novel approach that injects fine-grained causal knowledge into the attention mechanism. We propose an automated pipeline that leverages human priors to automatically generate token-level causal signals and introduce the Re-Attention mechanism to guide training, helping the model focus on causal structures while mitigating noise and biases in attention scores. Experimental results on our proposed Spurious Token Game (STG) benchmark and multiple downstream tasks demonstrate that our approach effectively leverages causal knowledge for prediction and remains robust in OOD scenarios. Implementation details can be found at https://github.com/Kairong-Han/CAT.
Solidot(15)
- South Korean robot vacuum makers try to differentiate to compete with Chinese rivals
IDC data shows that among the top five companies by global robot-vacuum market share in January-June 2025, four are Chinese: Roborock, Ecovacs, and Dreame each hold more than 10%, and Xiaomi holds 7.4%. iRobot, which commanded over 30% of the market in the 2010s, has fallen to fifth place with just 5.8%. South Korea's two giants, Samsung Electronics and LG Electronics, also sell robot vacuums, but while Chinese models retail for 965-1,448 yuan, Korean-made ones generally run 4,826-9,653 yuan, and their performance does not convince consumers the premium is worth it. Unable to compete on price, Korean makers emphasize safety, a premium feel, and ease of use, seeking differentiation beyond price, a strategy that also resembles the Japanese manufacturers' approach.
- The global population is contracting faster than expected
Istanbul obstetrician Furkan Kayabasoglu used to deliver several babies for the same family; today the vast majority of families have only one child. Turkey's total fertility rate fell to 1.48 last year, far below the replacement rate of 2.1, a level the UN Population Division had projected Turkey would not reach until 2100. Turkey is no exception: around the world, from developing to middle-income to developed countries, fertility is falling far faster than expected. Bogotá, Colombia's capital, has a fertility rate of just 0.91, lower even than Tokyo's. India's fertility rate is already below replacement, and China's population has begun to shrink. Mexico's rate is 1.6, on par with the United States. France recorded fewer births in 2024 than in 1806, when its population was less than half of today's, and Italy's births hit the lowest level since unification in 1861. African birth rates remain well above the global average but are also falling far faster than expected. All this suggests the world population may peak earlier, and at a much lower level, than experts predicted: growth could stop by the 2050s rather than the previously projected 2084, with the total never exceeding 9 billion. Likely drivers of falling fertility include rising female education, urbanization, widespread contraception, sharply rising child-rearing costs, and shifting social attitudes.
- Seniors now make up 29.4% of Japan's population
Japan's Ministry of Internal Affairs and Communications released population estimates on Sunday: people aged 65 and over number 36.19 million, or 29.4% of the population, a record high and the highest share among countries with more than 40 million people. Employed seniors number 9.3 million, up for the 21st consecutive year and also a record; beyond more seniors staying healthy, labor shortages caused by the low birth rate are a factor. The estimates count 15.68 million men and 20.51 million women aged 65 and over, down 50,000 from the previous year, only the second decline (after 2023) since comparable records began in 1950, mainly because fewer people newly turned 65. The National Institute of Population and Social Security Research estimates that as the second baby boom generation (born 1971-1974) enters old age, the senior population will reach 39.28 million in 2040, or 34.8% of the total.
- CRISPR gene-edited horses spark controversy
Gene-edited animals such as pigs and sheep are gradually gaining acceptance in agriculture; the technology can improve animal traits and deliver safer, higher-quality meat. But horses modified with CRISPR have been barred from polo. Experts stress that gene-edited animals must be rigorously tracked, their safety ensured, and applications advanced with caution. CRISPR can precisely cut specific locations in the genome and alter gene expression, conferring new traits. The company Kheiron used cloning to produce five genetically identical copies of a champion Argentine horse, then applied CRISPR to suppress expression of the myostatin gene, which naturally limits excessive muscle growth. By precisely down-regulating its activity, the team increased the number of muscle fibers responsible for explosive movement, breeding better "sprinters." The Argentine Polo Association has explicitly banned gene-edited horses from competition; its president said the technology "would strip breeding of its charm and magic."
- Japan's centenarian population nears 100,000
The number of centenarians in Japan is approaching 100,000, a record high; the oldest is 114. According to data released last Friday by the Ministry of Health, Labour and Welfare, Japan had 99,763 centenarians as of September, a record for the 55th consecutive year. Women account for 87,784, or 88%, and men for 11,979. The rise in life expectancy is attributed mainly to fewer deaths from heart disease and from common cancers such as breast and prostate cancer. The Japanese diet is low in red meat and high in fish and vegetables, and obesity, a major driver of both categories of disease, is rare; it is especially rare among women, which may help explain why Japanese women's life expectancy far exceeds men's. Japanese people also stay active late in life, walking and using public transport more than seniors in the US and Europe.
- NewsGuard survey: AI tools' false-information rate doubled in a year
News-rating company NewsGuard surveyed 10 leading generative-AI tools, measuring how often their responses repeated false news claims. In August 2025 the 10 tools repeated false information on news topics more than a third of the time (35%), up from 18% in August 2024. AI companies have not delivered on their promises to make AI safer and more reliable. A major reason for the doubling is that today's AI tools all support web search and no longer decline to answer: their non-response rate fell from 31% in August 2024 to 0% in August 2025, and the result is more false information. NewsGuard believes bad actors are exploiting this trait, laundering false information in ways that leave AI models unable to distinguish content farms from credible news outlets.
- AMD's RDNA4 GPU architecture
At Hot Chips 2025, AMD engineers presented the RDNA4 GPU architecture behind the company's latest RX 9000 series graphics cards. RDNA4 greatly improves ray tracing and machine-learning efficiency, letting the mid-range RX 9070 match the raster performance of the previous-generation flagship RX 7900 XT, with higher ray-tracing performance, on a smaller die, at lower power, and with less bandwidth. AMD's hardware video codecs have typically trailed the competition; RDNA4's Media Engine decodes faster and improves H.264, H.265, and AV1 encoding quality, especially in low-latency encoding, which benefits streaming. The Display Engine adds a Radeon Image Sharpening filter, sharpening the final image in dedicated hardware to help reduce power, and also improves multi-monitor standby power. The Scalar Unit gains some floating-point instructions. One major change is the L2 cache, which grows from 4 MB in RDNA2 and 6 MB in RDNA3 to 8 MB; AMD says workloads such as ray tracing benefit from the larger L2.
- 2025 Lasker Awards announced
The winners of the 2025 Lasker Awards have been announced. The Basic Medical Research Award goes to Dirk Görlich of Germany's Max Planck Institute and Steven L. McKnight of the University of Texas Southwestern Medical Center, for pioneering contributions revealing new mechanisms of intracellular protein transport and cellular organization. Their work focuses on low-complexity domains (LCDs) in protein sequences, which are built from a small set of amino acids and were once thought incapable of important biological functions; yet roughly 15-20% of eukaryotic proteins contain such sequences. The two showed that these "disordered" domains can form highly ordered assemblies inside cells, helping regulate transport and organize cellular function, offering a new perspective on the cell's inner workings. The Clinical Medical Research Award goes to Michael J. Welsh of the University of Iowa, Jesús (Tito) González, formerly of Vertex Pharmaceuticals, and Paul A. Negulescu, a Vertex scientist, for their key roles in developing innovative therapies for cystic fibrosis. The triple-combination therapy Trikafta they championed has transformed the treatment of a lethal genetic disease, markedly improving quality of life for most patients and bringing life expectancy close to normal. The Lasker Special Achievement Award goes to Lucy Shapiro of Stanford University School of Medicine for 55 years of outstanding contributions to biomedicine. Her research revealed how bacteria coordinate gene expression in space and time to achieve cell differentiation; she also founded Stanford's Department of Developmental Biology and has long advised the US government on responding to emerging infectious diseases.
- UAE releases an open-source model to rival DeepSeek
The UAE AI lab Institute of Foundation Models (IFM) has released K2 Think, an open-source model positioned to compete with OpenAI's ChatGPT and DeepSeek. Researchers say K2 Think has only 32 billion parameters yet outperforms reasoning models 20 times its size. DeepSeek's R1 has 671 billion parameters but activates only 37 billion; Meta's Llama 4 models have 17 billion to 288 billion active parameters; OpenAI does not disclose its models' parameter counts. The researchers also say K2 Think surpasses all open-source models at mathematics; the model focuses on math, coding, and scientific research. IFM has previously said it will open the training code, datasets, and other model-related materials to researchers.
- Photos show Myanmar's scam compounds are still expanding
Despite recent crackdowns, drone imagery and photos show that Myanmar's online-scam compounds continue to grow. Data from the Australian Strategic Policy Institute shows the number of compounds on the Thai-Myanmar border has risen from 11 to 27, with their footprint expanding by an average of 5.5 hectares per month; KK Park alone covers 210 hectares. Although 7,000 people were rescued from the compounds earlier this year, Thai police estimate 100,000 people are still held in the heavily guarded sites. Experts say that after the Chinese government worked to keep its citizens from being targeted, the criminal syndicates turned to the US and Europe and began recruiting people who know English and technology; thousands of East Africans are estimated to be trapped in the compounds.
- 95% of tested US beers contain PFAS
Researchers tested 23 US beers and found that 95% contained PFAS. PFAS (per- and polyfluoroalkyl substances) comprise roughly 9,000 compounds used mainly to make waterproof, stain-resistant, or heat-resistant products; they appear in thousands of everyday consumer goods across dozens of industries, including stain repellents, carpets, and shoes. PFAS are called "forever chemicals" because they do not break down naturally. They accumulate in animals, humans included, and are linked to cancer, birth defects, liver disease, thyroid disease, weakened immunity, hormone disruption, and a range of other serious health problems. Breweries typically have water filtration and treatment systems, but these are not designed to remove PFAS. The researchers call on breweries, consumers, and regulators to raise awareness; breweries may need to upgrade their water treatment to remove PFAS.
- The end of social media
In social media's algorithmic priorities, genuinely human-created content is being marginalized: its engagement rates are far lower than those of click-optimized synthetic content and AI slop. Social media was built on a romanticization of authenticity. Early platforms all promised to be a medium of genuine connection, and even influencer culture promised a real person behind the glow. But the attention economy, and especially the AI-driven post-attention economy, has shattered that illusion. Feeds are no longer centered on people but on content of every kind, and platform users are now simply consumers of content. Human-created and synthetic content are increasingly hard to tell apart, and the platforms have no interest in policing the difference. We are drowning in nothing. As content proliferates, engagement falls: the average engagement rate on Facebook and X posts is just 0.15%, Instagram engagement is down 24% year over year, and even TikTok has plateaued. People no longer connect or converse on social media the way they once did; they just consume click-optimized content. Social media's death is not a bang but a shrug. Explosive growth is over: 65% of the world's population, about 5.3 billion people, already use social media. More and more people are retreating to smaller, more private spaces such as group chats, Discord servers, and fediverse microblogging platforms. Users crave authenticity, and the platforms have had to respond, because ubiquitous synthetic content is approaching the limit of what people will tolerate and users are growing weary: Instagram is pushing DMs, X is pushing subscriber-only circles, and TikTok is experimenting with private communities. Social media has reached its final hour not for lack of content, but because the attention economy has hit its limit.
- Internet Archive's page count is about to pass 1 trillion
The Internet Archive will reach a milestone next month: the number of web pages preserved and accessible through the Wayback Machine will pass 1 trillion, and the Archive says it will hold a series of events to celebrate. Founded in 1996, its mission is "universal access to all knowledge." It provides permanent, free storage of, and access to, copies of digital materials such as websites, web pages, graphic materials, music, video, audio, software, moving images, and millions of books. Pages preserved before the Wayback Machine launched in 2001, however, were not accessible at the time.
- Technology's role in Nepal's Gen Z protests
Angered by the government's ban on most social media and by the flaunted wealth of the elite, Nepal's Gen Z staged mass protests on Monday. The protests ended in tragedy, with at least 19 dead, but after the government agreed to lift the social media ban, more Nepalis joined in; the parliament building was burned, the prime minister resigned, and the army temporarily stepped in to maintain order. What next? Young activists turned to Discord, the widely used gaming chat app, to debate who should lead an interim government. Through an informal vote, former Supreme Court Chief Justice Sushila Karki emerged, and on Friday she was sworn in as interim prime minister, the first woman to hold the office in Nepal's history. According to World Bank data, more than half of Nepal's 30 million people are online. In the days before the protests erupted, many turned to VPNs and similar tools to get around the ban, and fearing a full internet shutdown, downloads of Bitchat, the Bluetooth messaging app created by Jack Dorsey, surged. Technology played a major role in this Gen Z protest movement.
- Proton Mail shut down journalists' accounts at a cybersecurity agency's request
Swiss encrypted email service Proton Mail has again sparked controversy. It presents itself as a neutral safe haven for users' personal data, committed to defending their freedom, yet it shut down two journalists' accounts without any explanation, restoring them only after weeks of argument and protest, and it never explained the closures in detail. When the accounts were shut down, the two journalists, writing under the pen names Saber and cyb0rg, were preparing an article for the August issue of the hacker magazine Phrack on North Korean state-backed hackers breaching the computer networks of several South Korean government agencies. The article attributed the intrusions to the North Korean APT group Kimsuky and said that networks of government agencies including the Ministry of Foreign Affairs and the Defense Counterintelligence Command had been penetrated. But last month, after receiving a complaint from an unspecified cybersecurity agency, Proton Mail closed the journalists' accounts, disrupting their communication with the affected agencies.