OrangeBot.AI Digest — 2025-08-13
75 headlines across 5 sources, aggregated for this day.
Hacker News (15)
- VC-backed company just killed my EU trademark for a small OSS project
- Illinois bans use of artificial intelligence for mental health therapy (www.washingtonpost.com)
- PYX: The next step in Python packaging (astral.sh)
- OCaml as my primary language (xvw.lol)
- Pebble Time 2* Design Reveal (ericmigi.com)
- Nginx introduces native support for ACME protocol (blog.nginx.org)
- This website is for humans (localghost.dev)
- New treatment eliminates bladder cancer in 82% of patients (news.keckmedicine.org)
- I'm worried it might get bad (danielmiessler.com)
- Pebble Time 2 Design Reveal [video] (www.youtube.com)
- We caught companies making it harder to delete your personal data online (themarkup.org)
- So what's the difference between plotted and printed artwork? (lostpixels.io)
- UK expands police facial recognition rollout with 10 new facial recognition vans (www.theregister.com)
- Claude says “You're absolutely right!” about everything (github.com)
- FFmpeg 8.0 adds Whisper support (code.ffmpeg.org)
GitHub Trending (15)
- ubicloud / ubicloud
Open source alternative to AWS. Elastic compute, block storage (non-replicated), firewall and load balancer, managed Postgres, K8s, AI inference, and IAM services.
- apple / embedding-atlas
Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and search embeddings and metadata.
- jitsi / jitsi-meet
Jitsi Meet - Secure, Simple and Scalable Video Conferences that you use as a standalone app or embed in your web application.
- tadata-org / fastapi_mcp
Expose your FastAPI endpoints as Model Context Protocol (MCP) tools, with Auth!
- menloresearch / jan
Jan is an open source alternative to ChatGPT that runs 100% offline on your computer
- bytedance / UI-TARS-desktop
The Open-sourced Multimodal AI Agent Stack connecting Cutting-edge AI Models and Agent Infra.
- FiloSottile / mkcert
A simple zero-config tool to make locally trusted development certificates with any names you'd like.
- filamentphp / filament
A powerful open source UI framework for Laravel • Build and ship admin panels & apps fast with Livewire
- open-telemetry / opentelemetry-collector
OpenTelemetry Collector
- nomic-ai / gpt4all
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
- conductor-oss / conductor
Conductor is an event-driven orchestration platform providing a durable and highly resilient execution engine for your applications.
- microsoft / poml
Prompt Orchestration Markup Language
- x1xhlol / system-prompts-and-models-of-ai-tools
FULL v0, Cursor, Manus, Same.dev, Lovable, Devin, Replit Agent, Windsurf Agent, VSCode Agent, Dia Browser, Xcode, Trae AI, Cluely & Orchids.app (And other Open Sourced) System Prompts, Tools & AI Models.
- open-telemetry / opentelemetry-collector-contrib
Contrib repository for the OpenTelemetry Collector
- fastapi / full-stack-fastapi-template
Full stack, modern web application template. Using FastAPI, React, SQLModel, PostgreSQL, Docker, GitHub Actions, automatic HTTPS and more.
Product Hunt (15)
- Autumn
Stripe made easy for AI startups
- Bio Calls by Cross Paths
Monetize all your social media in 60 seconds
- mcp-use
Open source SDK and infra for MCP servers & agents
- Kandid
Consultative AI salesperson for ecommerce
- Fellow API
Build custom workflows from meeting transcripts and AI notes
- Reeroll
The AI Video Editor
- DownMark
Turn web content into clean Markdown with one click
- Compozy
Next-level Agentic Orchestration Platform
- Whispering
Open-source, local-first dictation you can trust
- Inworld Runtime
The AI runtime for top consumer applications
- VibeKit CLI
The safety layer for your coding agent
- MaskLLM
Mask your LLM APIs for secure rotation and logging
- AI research creator
AI that writes your research questions for you
- Besimple AI
Your own data annotation platform in 60 sec
- MOBORE
Find the Bugs That Cost You Money with mobore
Hugging Face (15)
- WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
Web agents such as Deep Research have demonstrated superhuman cognitive abilities, capable of solving highly challenging information-seeking problems. However, most research remains primarily text-centric, overlooking visual information in the real world. This makes multimodal Deep Research highly challenging, as such agents require much stronger reasoning abilities in perception, logic, knowledge, and the use of more sophisticated tools compared to text-based agents. To address this limitation, we introduce WebWatcher, a multimodal agent for Deep Research equipped with enhanced visual-language reasoning capabilities. It leverages high-quality synthetic multimodal trajectories for efficient cold-start training, utilizes various tools for deep reasoning, and further enhances generalization through reinforcement learning. To better evaluate the capabilities of multimodal agents, we propose BrowseComp-VL, a BrowseComp-style benchmark that requires complex information retrieval involving both visual and textual information. Experimental results show that WebWatcher significantly outperforms proprietary baselines, RAG workflows, and open-source agents on four challenging VQA benchmarks, paving the way for solving complex multimodal information-seeking tasks.
- Matrix-3D: Omnidirectional Explorable 3D World Generation
Explorable 3D world generation from a single image or text prompt forms a cornerstone of spatial intelligence. Recent works utilize video models to achieve wide-scope and generalizable 3D world generation; however, existing approaches often suffer from a limited scope in the generated scenes. In this work, we propose Matrix-3D, a framework that utilizes a panoramic representation for wide-coverage, omnidirectional explorable 3D world generation, combining conditional video generation and panoramic 3D reconstruction. We first train a trajectory-guided panoramic video diffusion model that employs scene mesh renders as conditioning to enable high-quality and geometrically consistent scene video generation. To lift the panoramic scene video to a 3D world, we propose two separate methods: (1) a feed-forward large panorama reconstruction model for rapid 3D scene reconstruction and (2) an optimization-based pipeline for accurate and detailed 3D scene reconstruction. To facilitate effective training, we also introduce the Matrix-Pano dataset, the first large-scale synthetic collection comprising 116K high-quality static panoramic video sequences with depth and trajectory annotations. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art performance in panoramic video generation and 3D world generation. See more at https://matrix-3d.github.io.
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
Recent advancements in LLM-based agents have demonstrated remarkable capabilities in handling complex, knowledge-intensive tasks by integrating external tools. Among diverse choices of tools, search tools play a pivotal role in accessing vast external knowledge. However, open-source agents still fall short of achieving expert-level Search Intelligence: the ability to resolve ambiguous queries, generate precise searches, analyze results, and conduct thorough exploration. Existing approaches fall short in scalability, efficiency, and data quality. For example, small turn limits in existing online RL methods, e.g., <=10, restrict complex strategy learning. This paper introduces ASearcher, an open-source project for large-scale RL training of search agents. Our key contributions include: (1) Scalable, fully asynchronous RL training that enables long-horizon search while maintaining high training efficiency. (2) A prompt-based LLM agent that autonomously synthesizes high-quality and challenging QAs, creating a large-scale QA dataset. Through RL training, our prompt-based QwQ-32B agent achieves substantial improvements, with 46.7% and 20.8% Avg@4 gains on xBench and GAIA, respectively. Notably, our agent exhibits extreme long-horizon search, with tool calls exceeding 40 turns and output tokens exceeding 150k during training time. With a simple agent design and no external LLMs, ASearcher-Web-QwQ achieves Avg@4 scores of 42.1 on xBench and 52.8 on GAIA, surpassing existing open-source 32B agents. We open-source our models, training data, and code at https://github.com/inclusionAI/ASearcher.
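The Avg@4 scores quoted above are per-question success rates averaged over four independent sampled runs. A minimal sketch of that metric; the list-of-lists layout is an assumption about how per-run correctness would be recorded:

```python
from statistics import mean

def avg_at_k(run_scores: list[list[bool]], k: int = 4) -> float:
    """Avg@k: mean of per-question success rates over k sampled runs.

    run_scores[i] holds k booleans, one per sampled run of question i.
    """
    assert all(len(runs) == k for runs in run_scores)
    return mean(sum(runs) / k for runs in run_scores)

# Two questions, four runs each: 3/4 and 1/4 correct.
scores = [[True, True, False, True],
          [False, True, False, False]]
print(avg_at_k(scores))  # 0.5
```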
- CharacterShot: Controllable and Consistent 4D Character Animation
In this paper, we propose CharacterShot, a controllable and consistent 4D character animation framework that enables any individual designer to create dynamic 3D characters (i.e., 4D character animation) from a single reference character image and a 2D pose sequence. We begin by pretraining a powerful 2D character animation model based on a cutting-edge DiT-based image-to-video model, which allows for any 2D pose sequence as a controllable signal. We then lift the animation model from 2D to 3D by introducing a dual-attention module together with a camera prior to generate multi-view videos with spatial-temporal and spatial-view consistency. Finally, we employ a novel neighbor-constrained 4D Gaussian splatting optimization on these multi-view videos, resulting in continuous and stable 4D character representations. Moreover, to improve character-centric performance, we construct a large-scale dataset, Character4D, containing 13,115 unique characters with diverse appearances and motions, rendered from multiple viewpoints. Extensive experiments on our newly constructed benchmark, CharacterBench, demonstrate that our approach outperforms current state-of-the-art methods. Code, models, and datasets will be publicly available at https://github.com/Jeoyal/CharacterShot.
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches
Recently, large reasoning models have demonstrated strong mathematical and coding abilities, and deep search leverages their reasoning capabilities in challenging information retrieval tasks. Existing deep search works are generally limited to a single knowledge source, either local or the Web. However, enterprises often require private deep search systems that can leverage search tools over both local and Web corpora. Simply training an agent equipped with multiple search tools using flat reinforcement learning (RL) is a straightforward idea, but it suffers from problems such as low training-data efficiency and poor mastery of complex tools. To address these issues, we propose HierSearch, a hierarchical agentic deep search framework trained with hierarchical RL. At the low level, a local deep search agent and a Web deep search agent are trained to retrieve evidence from their corresponding domains. At the high level, a planner agent coordinates the low-level agents and provides the final answer. Moreover, to prevent direct answer copying and error propagation, we design a knowledge refiner that filters out hallucinations and irrelevant evidence returned by the low-level agents. Experiments show that HierSearch achieves better performance than flat RL and outperforms various deep search and multi-source retrieval-augmented generation baselines on six benchmarks across general, finance, and medical domains.
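The two-level layout described in the abstract can be sketched with stub agents. Everything below is illustrative: the function names, the stubbed search results, and the toy substring-based refiner are assumptions, not the paper's implementation:

```python
def local_search_agent(query: str) -> list[str]:
    # Stand-in for an RL-trained agent over a private local corpus.
    return [f"[local] doc about {query}"]

def web_search_agent(query: str) -> list[str]:
    # Stand-in for an RL-trained agent over the Web.
    return [f"[web] page about {query}", "[web] unrelated page"]

def knowledge_refiner(query: str, evidence: list[str]) -> list[str]:
    # Toy filter mimicking the paper's idea of dropping irrelevant or
    # hallucinated evidence before the planner answers.
    return [e for e in evidence if query in e]

def planner(query: str) -> str:
    # High-level agent: dispatch to both low-level agents, refine, answer.
    evidence = local_search_agent(query) + web_search_agent(query)
    kept = knowledge_refiner(query, evidence)
    return f"Answer to '{query}' based on {len(kept)} pieces of evidence"

print(planner("quarterly revenue"))
# Answer to 'quarterly revenue' based on 2 pieces of evidence
```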
- Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
Diffusion large language models (dLLMs) generate text through iterative denoising, yet current decoding strategies discard rich intermediate predictions in favor of the final output. Our work here reveals a critical phenomenon, temporal oscillation, where correct answers often emerge in the middle process, but are overwritten in later denoising steps. To address this issue, we introduce two complementary methods that exploit temporal consistency: 1) Temporal Self-Consistency Voting, a training-free, test-time decoding strategy that aggregates predictions across denoising steps to select the most consistent output; and 2) a post-training method termed Temporal Consistency Reinforcement, which uses Temporal Semantic Entropy (TSE), a measure of semantic stability across intermediate predictions, as a reward signal to encourage stable generations. Empirical results across multiple benchmarks demonstrate the effectiveness of our approach. Using the negative TSE reward alone, we observe a remarkable average improvement of 24.7% on the Countdown dataset over an existing dLLM. Combined with the accuracy reward, we achieve absolute gains of 2.0% on GSM8K, 4.3% on MATH500, 6.6% on SVAMP, and 25.3% on Countdown, respectively. Our findings underscore the untapped potential of temporal dynamics in dLLMs and offer two simple yet effective tools to harness them.
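Temporal Self-Consistency Voting, as described, aggregates predictions across denoising steps rather than keeping only the last one. A minimal uniform-weight reading of the idea (the paper may weight steps differently, e.g., by semantic stability):

```python
from collections import Counter

def temporal_self_consistency_vote(step_predictions: list[str]) -> str:
    """Return the answer most consistent across denoising steps.

    Each intermediate decoding casts one vote; the most frequent
    answer wins. Uniform weighting is a simplifying assumption.
    """
    return Counter(step_predictions).most_common(1)[0][0]

# A correct answer ("42") emerges mid-trajectory, then oscillates away
# in the final step -- exactly the failure mode the abstract describes.
steps = ["17", "42", "42", "42", "35"]
print(temporal_self_consistency_vote(steps))  # "42"
```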
- Test-Time Reinforcement Learning for GUI Grounding via Region Consistency
Graphical User Interface (GUI) grounding, the task of mapping natural language instructions to precise screen coordinates, is fundamental to autonomous GUI agents. While existing methods achieve strong performance through extensive supervised training or reinforcement learning with labeled rewards, they remain constrained by the cost and availability of pixel-level annotations. We observe that when models generate multiple predictions for the same GUI element, the spatial overlap patterns reveal implicit confidence signals that can guide more accurate localization. Leveraging this insight, we propose GUI-RC (Region Consistency), a test-time scaling method that constructs spatial voting grids from multiple sampled predictions to identify consensus regions where models show highest agreement. Without any training, GUI-RC improves accuracy by 2-3% across various architectures on ScreenSpot benchmarks. We further introduce GUI-RCPO (Region Consistency Policy Optimization), which transforms these consistency patterns into rewards for test-time reinforcement learning. By computing how well each prediction aligns with the collective consensus, GUI-RCPO enables models to iteratively refine their outputs on unlabeled data during inference. Extensive experiments demonstrate the generality of our approach: GUI-RC boosts Qwen2.5-VL-3B-Instruct from 80.11% to 83.57% on ScreenSpot-v2, while GUI-RCPO further improves it to 85.14% through self-supervised optimization. Our approach reveals the untapped potential of test-time scaling and test-time reinforcement learning for GUI grounding, offering a promising path toward more robust and data-efficient GUI agents.
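The spatial voting grid behind GUI-RC can be sketched as follows. The pixel-level grid granularity, the (x0, y0, x1, y1) box format, and the tie handling are simplifying assumptions about the method described above:

```python
import numpy as np

def region_consistency(boxes, width, height):
    """Consensus region from multiple sampled GUI-element predictions.

    Each predicted box votes for the cells it covers; the consensus is
    the bounding box of the maximally voted cells.
    """
    grid = np.zeros((height, width), dtype=int)
    for x0, y0, x1, y1 in boxes:
        grid[y0:y1, x0:x1] += 1  # each sample casts one vote per cell
    ys, xs = np.nonzero(grid == grid.max())
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1

# Three samples that mostly agree around (10, 10)-(20, 20); the
# consensus shrinks to the region all three cover.
boxes = [(10, 10, 20, 20), (11, 10, 21, 20), (10, 11, 20, 21)]
print(region_consistency(boxes, 40, 40))  # (11, 11, 20, 20)
```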
- VertexRegen: Mesh Generation with Continuous Level of Detail
We introduce VertexRegen, a novel mesh generation framework that enables generation at a continuous level of detail. Existing autoregressive methods generate meshes in a partial-to-complete manner and thus intermediate steps of generation represent incomplete structures. VertexRegen takes inspiration from progressive meshes and reformulates the process as the reversal of edge collapse, i.e. vertex split, learned through a generative model. Experimental results demonstrate that VertexRegen produces meshes of comparable quality to state-of-the-art methods while uniquely offering anytime generation with the flexibility to halt at any step to yield valid meshes with varying levels of detail.
- Aryabhata: An exam-focused language model for JEE Math
We present Aryabhata 1.0, a compact 7B-parameter math reasoning model optimized for the Indian academic exam, the Joint Entrance Examination (JEE). Despite rapid progress in large language models (LLMs), current models often remain unsuitable for educational use. Aryabhata 1.0 is built by merging strong open-weight reasoning models, followed by supervised fine-tuning (SFT) with curriculum learning on verified chain-of-thought (CoT) traces curated through best-of-n rejection sampling. To further boost performance, we apply reinforcement learning with verifiable rewards (RLVR) using an A2C objective with group-relative advantage estimation, along with novel exploration strategies such as Adaptive Group Resizing and Temperature Scaling. Evaluated on both in-distribution (JEE Main 2025) and out-of-distribution (MATH, GSM8K) benchmarks, Aryabhata outperforms existing models in accuracy and efficiency, while offering pedagogically useful step-by-step reasoning. We release Aryabhata as a foundation model to advance exam-centric, open-source small language models. This marks our first open release for community feedback (https://huggingface.co/PhysicsWallahAI/Aryabhata-1.0); PW is actively training future models to further improve learning outcomes for students.
- UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation
Text-to-image (T2I) generation has been actively studied using Diffusion Models and Autoregressive Models. Recently, Masked Generative Transformers have gained attention as an alternative to Autoregressive Models to overcome the inherent limitations of causal attention and autoregressive decoding through bidirectional attention and parallel decoding, enabling efficient and high-quality image generation. However, compositional T2I generation remains challenging, as even state-of-the-art Diffusion Models often fail to accurately bind attributes and achieve proper text-image alignment. While Diffusion Models have been extensively studied for this issue, Masked Generative Transformers exhibit similar limitations but have not been explored in this context. To address this, we propose Unmasking with Contrastive Attention Guidance (UNCAGE), a novel training-free method that improves compositional fidelity by leveraging attention maps to prioritize the unmasking of tokens that clearly represent individual objects. UNCAGE consistently improves performance in both quantitative and qualitative evaluations across multiple benchmarks and metrics, with negligible inference overhead. Our code is available at https://github.com/furiosa-ai/uncage.
- Train Long, Think Short: Curriculum Learning for Efficient Reasoning
Recent work on enhancing the reasoning abilities of large language models (LLMs) has introduced explicit length control as a means of constraining computational cost while preserving accuracy. However, existing approaches rely on fixed-length training budgets, which do not take advantage of the natural progression from exploration to compression during learning. In this work, we propose a curriculum learning strategy for length-controlled reasoning using Group Relative Policy Optimization (GRPO). Our method starts with generous token budgets and gradually tightens them over training, encouraging models to first discover effective solution strategies and then distill them into more concise reasoning traces. We augment GRPO with a reward function that balances three signals: task correctness (via verifier feedback), length efficiency, and formatting adherence (via structural tags). Experiments on GSM8K, MATH500, SVAMP, College Math, and GSM+ demonstrate that curriculum-based training consistently outperforms fixed-budget baselines at the same final budget, achieving higher accuracy and significantly improved token efficiency. We further ablate the impact of reward weighting and decay schedule design, showing that progressive constraint serves as a powerful inductive bias for training efficient reasoning models. Our code and checkpoints are released at: https://github.com/hammoudhasan/curriculum_grpo.
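The schedule described above, generous token budgets early and tight ones late, might look like the linear sketch below; the specific budgets, the linear decay, and the reward weights are illustrative assumptions (the paper ablates decay-schedule and reward-weighting designs):

```python
def token_budget(step: int, total_steps: int,
                 start_budget: int = 1024, end_budget: int = 256) -> int:
    """Linearly tighten the reasoning-token budget over training."""
    frac = min(step / total_steps, 1.0)
    return round(start_budget + frac * (end_budget - start_budget))

def reward(correct: bool, n_tokens: int, budget: int,
           well_formatted: bool) -> float:
    """Toy combination of the three signals named in the abstract:
    correctness, length efficiency, and formatting adherence.
    The weights 1.0 / 0.5 / 0.1 are made up for illustration."""
    length_eff = max(0.0, 1.0 - n_tokens / budget)
    return 1.0 * correct + 0.5 * length_eff + 0.1 * well_formatted

for s in (0, 500, 1000):
    print(s, token_budget(s, total_steps=1000))
# 0 1024 / 500 640 / 1000 256
```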
- Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy
We present the first evaluation harness that enables any out-of-the-box, local Large Language Model (LLM) to play full-press Diplomacy without fine-tuning or specialized training. Previous work required frontier LLMs or fine-tuning due to the high complexity and information density of Diplomacy's game state. Combined with the high variance of matches, these factors made Diplomacy prohibitive to study. In this work, we used data-driven iteration to optimize a textual game state representation such that a 24B model can reliably complete matches without any fine-tuning. We develop tooling to facilitate hypothesis testing and statistical analysis, and we present case studies on persuasion, aggressive playstyles, and performance across a range of models. We conduct a variety of experiments across many popular LLMs, finding that the larger models perform best, while the smaller models still play adequately. We also introduce Critical State Analysis: an experimental protocol for rapidly iterating on and analyzing key moments in a game in depth. Our harness democratizes the evaluation of strategic reasoning in LLMs by eliminating the need for fine-tuning, and it provides insights into how these capabilities emerge naturally from widely used LLMs. Our code is available in the supplement and will be open sourced.
- Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors
A dexterous hand capable of grasping objects in a generalizable way is fundamental to the development of general-purpose embodied AI. However, previous methods focus narrowly on low-level grasp stability metrics, neglecting affordance-aware positioning and human-like poses, which are crucial for downstream manipulation. To address these limitations, we propose AffordDex, a novel two-stage training framework that learns a universal grasping policy with an inherent understanding of both motion priors and object affordances. In the first stage, a trajectory imitator is pre-trained on a large corpus of human hand motions to instill a strong prior for natural movement. In the second stage, a residual module is trained to adapt these general human-like motions to specific object instances. This refinement is critically guided by two components: our Negative Affordance-aware Segmentation (NAA) module, which identifies functionally inappropriate contact regions, and a privileged teacher-student distillation process that ensures the final vision-based policy is highly successful. Extensive experiments demonstrate that AffordDex not only achieves universal dexterous grasping but also remains remarkably human-like in posture and functionally appropriate in contact location. As a result, AffordDex significantly outperforms state-of-the-art baselines across seen objects, unseen instances, and even entirely novel categories.
- Adversarial Video Promotion Against Text-to-Video Retrieval
Thanks to the development of cross-modal models, text-to-video retrieval (T2VR) is advancing rapidly, but its robustness remains largely unexamined. Existing attacks against T2VR are designed to push videos away from queries, i.e., suppressing the ranks of videos, while the attacks that pull videos towards selected queries, i.e., promoting the ranks of videos, remain largely unexplored. These attacks can be more impactful as attackers may gain more views/clicks for financial benefits and widespread (mis)information. To this end, we pioneer the first attack against T2VR to promote videos adversarially, dubbed the Video Promotion attack (ViPro). We further propose Modal Refinement (MoRe) to capture the finer-grained, intricate interaction between visual and textual modalities to enhance black-box transferability. Comprehensive experiments cover 2 existing baselines, 3 leading T2VR models, 3 prevailing datasets with over 10k videos, evaluated under 3 scenarios. All experiments are conducted in a multi-target setting to reflect realistic scenarios where attackers seek to promote the video regarding multiple queries simultaneously. We also evaluated our attacks for defences and imperceptibility. Overall, ViPro surpasses other baselines by over 30/10/4% for white/grey/black-box settings on average. Our work highlights an overlooked vulnerability, provides a qualitative analysis on the upper/lower bound of our attacks, and offers insights into potential counterplays. Code will be publicly available at https://github.com/michaeltian108/ViPro.
- Complex Logical Instruction Generation
Instruction following has catalyzed the recent era of Large Language Models (LLMs) and is the foundational skill underpinning more advanced capabilities such as reasoning and agentic behaviors. As tasks grow more challenging, the logic structures embedded in natural language instructions become increasingly intricate. However, how well LLMs perform on such logic-rich instructions remains under-explored. We propose LogicIFGen and LogicIFEval. LogicIFGen is a scalable, automated framework for generating verifiable instructions from code functions, which can naturally express rich logic such as conditionals, nesting, recursion, and function calls. We further curate a collection of complex code functions and use LogicIFGen to construct LogicIFEval, a benchmark comprising 426 verifiable logic-rich instructions. Our experiments demonstrate that current state-of-the-art LLMs still struggle to correctly follow the instructions in LogicIFEval. Most LLMs can follow fewer than 60% of the instructions, revealing significant deficiencies in their instruction-following ability. Code and Benchmark: https://github.com/mianzhang/LogicIF
Solidot (15)
- Do Kwon pleads guilty to fraud charges
Terraform Labs founder Do Kwon pleaded guilty in US federal court to conspiracy to commit fraud and to wire fraud. Terraform issued the algorithmic stablecoin TerraUSD (UST); when UST's value collapsed in May 2022, the linked token LUNA became nearly worthless and customers lost $40 billion. Do Kwon, a South Korean citizen whose company was headquartered in Singapore, was arrested in Montenegro in March 2023 and extradited to the US at the end of 2024. He faces up to 25 years in prison, but prosecutors agreed to seek no more than 12 years in exchange for his plea. Do Kwon admitted to making false and misleading statements and acknowledged his wrongdoing.
- Perplexity offers $34.5 billion for Chrome
US federal judge Amit Mehta ruled last year that Google's payments to be the default search engine in smartphone browsers violated US antitrust law. To weaken Google's monopoly in the search market, the judge is weighing whether to force the search giant to sell its Chrome browser, with a ruling expected this month. Chrome has the largest browser market share and is valued at somewhere between $20 billion and $50 billion. On Tuesday the AI search startup Perplexity offered $34.5 billion to buy Chrome, roughly twice Perplexity's own $18 billion valuation; the company says the deal is fully backed by investors including large venture capital funds.
- Debian GNU/Hurd 2025 released
The distribution built on the GNU Hurd kernel has released Debian GNU/Hurd 2025. The microkernel-based GNU Hurd has more than 30 years of history, but version 1.0 has yet to ship; the latest stable release is still v0.9 from 2016. Debian GNU/Hurd 2025 supports the i386 and amd64 architectures, is based on the recently released Debian "Trixie" stable release, includes 72% of Debian's packages, and remains a snapshot of the unstable branch.
- Amazon's broadband satellites top 100 in orbit
After four weather-related delays, Amazon launched 24 Kuiper internet satellites on a SpaceX Falcon 9 rocket, bringing its Kuiper constellation to 102 satellites in orbit. SpaceX is currently the largest low-Earth-orbit satellite internet provider: its Starlink constellation numbers about 8,000 satellites and serves 5 million users worldwide. Amazon is accelerating Kuiper launches to meet deadlines set by the FCC (Federal Communications Commission), which requires 1,600 satellites in orbit by the end of July 2026 and the full 3,236-satellite constellation by the end of July 2029. Amazon has booked as many as 83 launches from various rocket companies, three of them with rival SpaceX. Although the constellation is still at an early stage, Amazon has already signed agreements with governments and hopes to begin commercial service later this year.
- The oldest known black hole, at 13.3 billion years
An international team of astronomers led by scientists at the University of Texas at Austin used the James Webb Space Telescope to capture a supermassive black hole that already existed just 500 million years after the Big Bang. With a mass equivalent to 300 million Suns, it sets a new record as the oldest black hole yet discovered and opens a new window onto the cosmic dawn. The team found that the galaxy CAPERS-LRD-z9 shows the distinctive "little red dot" signature; galaxies of this class, born in the universe's infancy (its first 1.5 billion years), are typically compact, deep red, and unusually bright. Further study showed that this supermassive black hole, with a mass estimated at 300 million Suns, is the source of CAPERS-LRD-z9's unexpected brightness. It is also the most distant black hole confirmed to date.
- Former NSA director says US tech companies will struggle to stay neutral
Paul Nakasone, the retired Army general who led the NSA, the Central Security Service, and US Cyber Command and now sits on OpenAI's board, appeared at the Defcon security conference in Las Vegas to discuss the current US political climate with Defcon founder Jeff Moss. With the Trump administration firing cybersecurity officials deemed disloyal and revoking the security clearances of former CISA directors Chris Krebs and Jen Easterly, Nakasone commented that it will be very, very difficult for US tech companies to try to stay neutral in 2025 and 2026.
- Kodak may cease operations
The 133-year-old Kodak has warned investors that it may cease operations. In its latest earnings report, Kodak said it lacks the funds to repay roughly $500 million in debt coming due, and it plans to raise cash by halting payments into its pension plan. Because its products, including cameras, ink, and film, are made in the US, tariffs will not affect its business. Eastman Kodak was founded in 1892; founder George Eastman sold the first Kodak camera in 1888, and "Kodak" itself was a word Eastman made up. Kodak once accounted for 90% of US film sales and 85% of camera sales. Although it built the first digital camera, it gave that lead away, fell behind in the digital photography era, filed for bankruptcy protection in 2012, and won a brief reprieve in 2020 by pivoting to producing pharmaceutical ingredients.
- StarDict sends X11 selection data to remote servers
An article published by LWN discusses a potential privacy problem in the StarDict dictionary software. Installing StarDict on Debian pulls in the stardict-plugin package by default, which includes NetEase's Youdao online dictionary plugin; that plugin also connects to another online dictionary, dict.cn. Under X11, StarDict enables a "scan" feature by default that watches the user's text selections and offers automatic translation in a popup. Under Wayland the feature is broken, because Wayland by default forbids applications from capturing other applications' text. StarDict's Debian package maintainer Xiao Sheng Wen sees no problem with this behavior, noting that users who don't want the scan feature or the Youdao plugin can disable them. Vincent Lefevre argues, however, that privacy-sensitive features should never be enabled by default: users may leak sensitive information on text selection, for example when copying and pasting a password from a password manager.
- Extreme heat may be driving sharp declines in tropical bird populations
In tropical regions such as the Amazon and Panama, some bird populations have fallen by as much as 90% even in nearly undisturbed rainforest, and intensifying extreme heat is very likely the main factor. A new study finds that between 1950 and 2020, intensifying extreme heat reduced the abundance of terrestrial birds in the tropics by 25%-38%. The researchers started from global terrestrial bird population data in the Living Planet Database (excluding waterbirds and seabirds), then added habitat destruction data from the History Database of the Global Environment and historical weather and climate data from the European Centre for Medium-Range Weather Forecasts. Comparing all the data, they found that at mid-latitudes between 21° and 43° north and south, habitat destruction was the main driver of bird population declines, consistent with other studies; in the tropics, however, extreme heat was the biggest cause. The researchers say birds in these regions often live close to their thermal limits and die once those limits are exceeded; even birds that survive an extreme heat event suffer poorer health that reduces their chances of breeding.
- LLMs' simulated reasoning is a brittle mirage
In recent months AI companies have pivoted toward simulated-reasoning models that use chains of thought to work through problems in multiple logical steps. But is simulated reasoning really reasoning? Prior research has shown that including irrelevant context in a problem makes models far more likely to err. According to a preprint posted on arXiv, researchers at the University of Arizona argue that chain-of-thought models are merely simulators of reasoning-like text. In their tests, the models' much-touted performance leaps proved to be a brittle mirage: what the models display is replication of patterns learned during training, not genuine understanding of the text. Rather than generalized logical reasoning, chain-of-thought models exhibit a sophisticated form of structured pattern matching, and performance degrades significantly under even slight shifts away from the training distribution. Their ability to generate fluent nonsense creates a halo of unearned trust, and the content does not stand up to scrutiny. The researchers warn against equating chain-of-thought output with human thought and against placing too much trust in large models in high-stakes domains such as medicine, finance, or legal analysis.
- Study finds biochar from human waste could ease the global fertilizer shortage
According to a study published in PNAS, Cornell University researchers found that biochar produced from solid human waste could meet 7% of global phosphorus fertilizer demand each year. Combined with nutrients recovered from urine, it could supply 15% of the phosphorus, 17% of the nitrogen, and 25% of the potassium needed by global agriculture. The biochar production process reduces the volume and weight of the solid waste by up to 90%, and the nutrient ratios can be tuned to the needs of specific crops.
- Reddit to block the Internet Archive from crawling its content
Reddit has signed deals with companies such as Google to sell its user-generated content for AI training, and it restricts other AI companies from scraping the site. Having now found that some AI companies were instead scraping Reddit content from the Internet Archive's Wayback Machine, Reddit announced it will block the archive's crawler from indexing most of the site: the Wayback Machine will only be able to index the Reddit.com homepage and will no longer capture post details, user comments, and the like.
- Meteorite that fell through a home's roof is older than Earth
On June 26 a meteor exploded over the US state of Georgia, and fragments punched through the roof of a house in McDonough, Henry County. University of Georgia researchers examined the fragments and found the meteorite is a chondrite that, based on its type, formed about 4.56 billion years ago, some 20 million years before Earth itself. The resident says he is still finding impact debris around the house.
- Wozniak's fraud lawsuit against YouTube goes nowhere
In 2020, scammers posted YouTube videos using clips of Apple co-founder Steve Wozniak to run a Bitcoin scam. Wozniak's wife Janet repeatedly reported the videos, but YouTube took no action; believing YouTube was abetting the fraud, the couple sued. Five years later, Wozniak said in an interview that the case has stalled because of Section 230, a sweeping federal statute that limits the ability to bring lawsuits against social media platforms; as Wozniak put it, Section 230 says platforms bear no responsibility for anything posted on them. Responding to the suit, Google/YouTube spokesperson José Castañeda issued a high-minded statement with no direct answer, saying the company takes abuse on its platform seriously and acts quickly when it finds violations.
- CEO resigns, and GitHub will no longer operate independently inside Microsoft
GitHub CEO Thomas Dohmke announced he will leave at the end of the year, and Microsoft will not appoint a new CEO; GitHub's leadership team will instead report directly to the CoreAI division. Since Microsoft acquired GitHub for $7.5 billion in 2018, the code-hosting platform had operated independently within the company, so the latest personnel change marks a major shift in how GitHub is run. Microsoft's CoreAI division, led by former Meta executive Jay Parikh, focuses on building AI platforms and tools for Microsoft and its customers.
GitHub 首席执行官 Thomas Dohmke 宣布将于年底离职,而微软不再任命新 CEO,GitHub 领导团队将直接向 CoreAI 部门汇报。微软于 2018 年以 75 亿美元收购 GitHub 后,这家代码托管平台一直在公司内部独立运营,但最新的人事变动意味着 GitHub 的运营方式发生了重大改变。微软的 CoreAI 部门由 Meta 前高管 Jay Parikh 领导,专注于为微软及其客户构建 AI 平台和工具。