OrangeBot.AI Digest — 2025-07-11
75 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- OpenAI’s Windsurf deal is off, and Windsurf’s CEO is going to Google (www.theverge.com)
- ETH Zurich and EPFL to release a LLM developed on public infrastructure (ethz.ch)
- jank is C++ (jank-lang.org)
- In a First, Solar Was Europe's Biggest Source of Power Last Month (e360.yale.edu)
- Pa. House passes 'click-to-cancel' subscription bills (www.pennlive.com)
- Lead pigment in turmeric is the culprit in a global poisoning mystery (2024) (www.npr.org)
- Upgrading an M4 Pro Mac mini's storage for half the price (www.jeffgeerling.com)
- I'm done with social media – Or: why I have a blog now (www.carolinecrampton.com)
- Overtourism in Japan, and how it hurts small businesses (craigmod.com)
- AI agent benchmarks are broken (ddkang.substack.com)
- Recovering from AI addiction (internetaddictsanonymous.org)
- At Least 13 People Died by Suicide Amid U.K. Post Office Scandal, Report Says (www.nytimes.com)
- Bill Atkinson's psychedelic user interface (patternproject.substack.com)
- FP8 is ~100 tflops faster when the kernel name has "cutlass" in it (twitter.com)
- Apple vs the Law (formularsumo.co.uk)
GitHub Trending(15)
- protocolbuffers / protobuf
Protocol Buffers - Google's data interchange format
- googleapis / genai-toolbox
MCP Toolbox for Databases is an open source MCP server for databases.
- Alibaba-NLP / WebAgent
🌐 WebAgent for Information Seeking built by Tongyi Lab: WebWalker & WebDancer & WebSailor https://arxiv.org/pdf/2507.02592
- WordPress / wordpress-develop
WordPress Develop, Git-ified. Synced from git://develop.git.wordpress.org/, including branches and tags! This repository is just a mirror of the WordPress subversion repository. Please include a link to a pre-existing ticket on https://core.trac.wordpress.org/ with every pull request.
- snap-stanford / Biomni
Biomni: a general-purpose biomedical AI agent
- google / googletest
GoogleTest - Google Testing and Mocking Framework
- ByteByteGoHq / system-design-101
Explain complex systems using visuals and simple terms. Help you prepare for system design interviews.
- goauthentik / authentik
The authentication glue you need.
- LMCache / LMCache
Supercharge Your LLM with the Fastest KV Cache Layer
- punkpeye / awesome-mcp-clients
A collection of MCP clients.
- hashicorp / terraform
Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
- landing-ai / agentic-doc
Python library for Agentic Document Extraction from LandingAI
- open-telemetry / opentelemetry-go
OpenTelemetry Go API and SDK
- getsentry / sentry
Developer-first error tracking and performance monitoring
- antiwork / flexile
Contractor payments as easy as 1-2-3
Product Hunt(15)
- Intervo
The open-source platform for conversational AI
- opencode
Your terminal's AI agent, with any model you want
- YoinkUI
Copy any webpage's UI with 1 click
- Permut
AI Agents for customer listening and feedback collection
- Sublime
AI journal that turns your thoughts into real-world actions
- Midway
Meet in the middle with ease
- ZeroEntropy (YC W25)
The engine for human-level search
- Todo2
AI project manager for Cursor
- Gemini in Wear OS
Gemini is now on your wrist
- Formix
Connect forms to Google Sheets
- Agent Sam
World's first human-like fundraising agent
- GetGenAI
AI compliance checks for marketing teams
- Sully.ai
SullyAI: 10x clinical diagnosis - introducing Consensus
- Exists
Generate your own 3D worlds and team-deathmatch in minutes
- Rova Bullpost
Show your conviction in underrated founders and startups
Hugging Face(15)
- Scaling RL to Long Videos
We introduce a full-stack framework that scales up reasoning in vision-language models (VLMs) to long videos, leveraging reinforcement learning. We address the unique challenges of long video reasoning by integrating three critical components: (1) a large-scale dataset, LongVideo-Reason, comprising 52K long video QA pairs with high-quality reasoning annotations across diverse domains such as sports, games, and vlogs; (2) a two-stage training pipeline that extends VLMs with chain-of-thought supervised fine-tuning (CoT-SFT) and reinforcement learning (RL); and (3) a training infrastructure for long video RL, named Multi-modal Reinforcement Sequence Parallelism (MR-SP), which incorporates sequence parallelism and a vLLM-based engine tailored for long video, using cached video embeddings for efficient rollout and prefilling. In experiments, LongVILA-R1-7B achieves strong performance on long video QA benchmarks such as VideoMME. It also outperforms Video-R1-7B and even matches Gemini-1.5-Pro across temporal reasoning, goal and purpose reasoning, spatial reasoning, and plot reasoning on our LongVideo-Reason-eval benchmark. Notably, our MR-SP system achieves up to 2.1x speedup on long video RL training. LongVILA-R1 demonstrates consistent performance gains as the number of input video frames scales. LongVILA-R1 marks a firm step towards long video reasoning in VLMs. In addition, we publicly release our training system, which supports RL training on various modalities (video, text, and audio), various models (VILA and Qwen series), and even image and video generation models. On a single A100 node (8 GPUs), it supports RL training on hour-long videos (e.g., 3,600 frames / around 256k tokens).
- T-LoRA: Single Image Diffusion Model Customization Without Overfitting
While diffusion model fine-tuning offers a powerful approach for customizing pre-trained models to generate specific objects, it frequently suffers from overfitting when training samples are limited, compromising both generalization capability and output diversity. This paper tackles the challenging yet most impactful task of adapting a diffusion model using just a single concept image, as single-image customization holds the greatest practical potential. We introduce T-LoRA, a Timestep-Dependent Low-Rank Adaptation framework specifically designed for diffusion model personalization. In our work we show that higher diffusion timesteps are more prone to overfitting than lower ones, necessitating a timestep-sensitive fine-tuning strategy. T-LoRA incorporates two key innovations: (1) a dynamic fine-tuning strategy that adjusts rank-constrained updates based on diffusion timesteps, and (2) a weight parametrization technique that ensures independence between adapter components through orthogonal initialization. Extensive experiments show that T-LoRA and its individual components outperform standard LoRA and other diffusion model personalization techniques. They achieve a superior balance between concept fidelity and text alignment, highlighting the potential of T-LoRA in data-limited and resource-constrained scenarios. Code is available at https://github.com/ControlGenAI/T-LoRA.
- Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
Models like OpenAI-o3 pioneer visual grounded reasoning by dynamically referencing visual regions, just like human "thinking with images". However, no benchmark exists to evaluate these capabilities holistically. To bridge this gap, we propose TreeBench (Traceable Evidence Evaluation Benchmark), a diagnostic benchmark built on three principles: (1) focused visual perception of subtle targets in complex scenes, (2) traceable evidence via bounding box evaluation, and (3) second-order reasoning to test object interactions and spatial hierarchies beyond simple object localization. Prioritizing images with dense objects, we initially sample 1K high-quality images from SA-1B, and incorporate eight LMM experts to manually annotate questions, candidate options, and answers for each image. After three stages of quality control, TreeBench consists of 405 challenging visual question-answering pairs. Even the most advanced models struggle with this benchmark: none reach 60% accuracy (e.g., OpenAI-o3 scores only 54.87). Furthermore, we introduce TreeVGR (Traceable Evidence Enhanced Visual Grounded Reasoning), a training paradigm to supervise localization and reasoning jointly with reinforcement learning, enabling accurate localization and explainable reasoning pathways. Initialized from Qwen2.5-VL-7B, it improves V* Bench (+16.8), MME-RealWorld (+12.6), and TreeBench (+13.4), proving traceability is key to advancing vision-grounded reasoning. The code is available at https://github.com/Haochen-Wang409/TreeVGR.
- OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
Recent advances in multimodal large language models (MLLMs) have shown remarkable capabilities in integrating vision and language for complex reasoning. While most existing benchmarks evaluate models under offline settings with a fixed set of pre-recorded inputs, we introduce OST-Bench, a benchmark designed to evaluate Online Spatio-Temporal understanding from the perspective of an agent actively exploring a scene. The Online aspect emphasizes the need to process and reason over incrementally acquired observations, while the Spatio-Temporal component requires integrating current visual inputs with historical memory to support dynamic spatial reasoning. OST-Bench better reflects the challenges of real-world embodied perception. Built on an efficient data collection pipeline, OST-Bench consists of 1.4k scenes and 10k question-answer pairs collected from ScanNet, Matterport3D, and ARKitScenes. We evaluate several leading MLLMs on OST-Bench and observe that they fall short on tasks requiring complex spatio-temporal reasoning. Under the online setting, their accuracy declines as the exploration horizon extends and the memory grows. Through further experimental analysis, we identify common error patterns across models and find that both complex clue-based spatial reasoning demands and long-term memory retrieval requirements significantly drop model performance along two separate axes, highlighting the core challenges that must be addressed to improve online embodied reasoning. To foster further research and development in the field, our codes, dataset, and benchmark are available. Our project page is: https://rbler1234.github.io/OSTBench.github.io/
- Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
Video large language models (LLMs) achieve strong video understanding by leveraging a large number of spatio-temporal tokens, but suffer from quadratic computational scaling with token count. To address this, we propose a training-free spatio-temporal token merging method, named STTM. Our key insight is to exploit local spatial and temporal redundancy in video data, which has been overlooked in prior work. STTM first transforms each frame into multi-granular spatial tokens using a coarse-to-fine search over a quadtree structure, then performs directed pairwise merging across the temporal dimension. This decomposed merging approach outperforms existing token reduction methods across six video QA benchmarks. Notably, STTM achieves a 2x speed-up with only a 0.5% accuracy drop under a 50% token budget, and a 3x speed-up with just a 2% drop under a 30% budget. Moreover, STTM is query-agnostic, allowing KV cache reuse across different questions for the same video. The project page is available at https://www.jshyun.me/projects/sttm.
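The directed pairwise merging step the abstract describes can be pictured with a minimal sketch. Everything here (the function name, the running-mean merge rule, the 0.9 threshold) is hypothetical; the actual STTM pipeline additionally performs a coarse-to-fine quadtree search per frame and is CUDA-optimized:

```python
import numpy as np

def merge_temporal_tokens(prev_tokens, curr_tokens, threshold=0.9):
    """Directed pairwise merging across the temporal dimension: each
    current-frame token whose best match in the previous frame exceeds a
    cosine-similarity threshold is folded into that match; the rest are
    kept as new tokens. (Hypothetical stand-in for STTM's merge step.)"""
    # Normalize rows so dot products become cosine similarities.
    p = prev_tokens / np.linalg.norm(prev_tokens, axis=1, keepdims=True)
    c = curr_tokens / np.linalg.norm(curr_tokens, axis=1, keepdims=True)
    sim = c @ p.T                      # (num_curr, num_prev)
    best = sim.argmax(axis=1)          # best previous-frame match per token
    merged = prev_tokens.copy()
    counts = np.ones(len(prev_tokens)) # members absorbed into each token
    kept = []
    for i, j in enumerate(best):
        if sim[i, j] >= threshold:
            # Running mean keeps a merged token representative of all members.
            merged[j] = (merged[j] * counts[j] + curr_tokens[i]) / (counts[j] + 1)
            counts[j] += 1
        else:
            kept.append(curr_tokens[i])
    return np.vstack([merged] + kept) if kept else merged
```

Redundant tokens shrink the working set, which is where the reported speed-ups under a fixed token budget come from.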
- PyVision: Agentic Vision with Dynamic Tooling
LLMs are increasingly deployed as agents, systems capable of planning, reasoning, and dynamically calling external tools. However, in visual reasoning, prior approaches largely remain limited by predefined workflows and static toolsets. In this report, we present PyVision, an interactive, multi-turn framework that enables MLLMs to autonomously generate, execute, and refine Python-based tools tailored to the task at hand, unlocking flexible and interpretable problem-solving. We develop a taxonomy of the tools created by PyVision and analyze their usage across a diverse set of benchmarks. Quantitatively, PyVision achieves consistent performance gains, boosting GPT-4.1 by +7.8% on V* and Claude-4.0-Sonnet by +31.1% on VLMsAreBlind-mini. These results point to a broader shift: dynamic tooling allows models not just to use tools, but to invent them, advancing toward more agentic visual reasoning.
- Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
Videos inherently represent 2D projections of a dynamic 3D world. However, our analysis suggests that video diffusion models trained solely on raw video data often fail to capture meaningful geometric-aware structure in their learned representations. To bridge this gap between video diffusion models and the underlying 3D nature of the physical world, we propose Geometry Forcing, a simple yet effective method that encourages video diffusion models to internalize latent 3D representations. Our key insight is to guide the model's intermediate representations toward geometry-aware structure by aligning them with features from a pretrained geometric foundation model. To this end, we introduce two complementary alignment objectives: Angular Alignment, which enforces directional consistency via cosine similarity, and Scale Alignment, which preserves scale-related information by regressing unnormalized geometric features from normalized diffusion representation. We evaluate Geometry Forcing on both camera view-conditioned and action-conditioned video generation tasks. Experimental results demonstrate that our method substantially improves visual quality and 3D consistency over the baseline methods. Project page: https://GeometryForcing.github.io.
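The two alignment objectives are simple to write down. The sketch below is a plain NumPy rendering under the assumption that the diffusion features and the geometric-foundation-model features have already been extracted and projected to a common shape; the function names are illustrative, not the paper's code:

```python
import numpy as np

def angular_alignment(h, g):
    """Angular Alignment (sketch): enforce directional consistency by
    penalizing 1 minus the mean cosine similarity between intermediate
    diffusion features h and geometric features g."""
    hn = h / np.linalg.norm(h, axis=-1, keepdims=True)
    gn = g / np.linalg.norm(g, axis=-1, keepdims=True)
    return 1.0 - float((hn * gn).sum(-1).mean())

def scale_alignment(h_pred, g_unnormalized):
    """Scale Alignment (sketch): preserve scale information by regressing
    the unnormalized geometric features; rendered here as a plain MSE."""
    return float(((h_pred - g_unnormalized) ** 2).mean())
```

Angular alignment is invariant to feature magnitude, which is exactly why a separate scale term is needed to keep scale-related geometric information.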
- LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
In this paper, we introduce LangSplatV2, which achieves high-dimensional feature splatting at 476.2 FPS and 3D open-vocabulary text querying at 384.6 FPS for high-resolution images, providing a 42x speedup and a 47x boost over LangSplat respectively, along with improved query accuracy. LangSplat employs Gaussian Splatting to embed 2D CLIP language features into 3D, significantly enhancing speed and learning a precise 3D language field with SAM semantics. Such advancements in 3D language fields are crucial for applications that require language interaction within complex scenes. However, LangSplat does not yet achieve real-time inference performance (8.2 FPS), even with advanced A100 GPUs, severely limiting its broader application. In this paper, we first conduct a detailed time analysis of LangSplat, identifying the heavyweight decoder as the primary speed bottleneck. Our solution, LangSplatV2, assumes that each Gaussian acts as a sparse code within a global dictionary, leading to the learning of a 3D sparse coefficient field that entirely eliminates the need for a heavyweight decoder. By leveraging this sparsity, we further propose an efficient sparse coefficient splatting method with CUDA optimization, rendering high-dimensional feature maps at high quality while incurring only the time cost of splatting an ultra-low-dimensional feature. Our experimental results demonstrate that LangSplatV2 not only achieves better or competitive query accuracy but is also significantly faster. Codes and demos are available at our project page: https://langsplat-v2.github.io.
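The sparse-coefficient idea is easy to sketch: each Gaussian stores a handful of coefficients over a shared global dictionary, so rasterization only ever touches low-dimensional coefficients, and the high-dimensional feature is recovered with a single matrix multiply per pixel. All sizes, names, and the stubbed splatting step below are illustrative assumptions, not LangSplatV2's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: D-dim CLIP features, K dictionary atoms, and a toy scene.
D, K, num_gaussians = 512, 64, 1000
dictionary = rng.standard_normal((K, D))     # global, shared across the scene

# Each Gaussian keeps a sparse K-dim coefficient vector instead of a dense
# D-dim feature, eliminating the heavyweight per-pixel decoder.
coeffs = np.zeros((num_gaussians, K))
active = rng.integers(0, K, size=(num_gaussians, 4))  # ~4 active atoms each
for g in range(num_gaussians):
    coeffs[g, active[g]] = rng.standard_normal(4)

# "Splatting" is stubbed here as a weighted sum over Gaussians for one pixel;
# the real method rasterizes coefficients with CUDA, then decodes once.
weights = rng.random(num_gaussians)
weights /= weights.sum()
pixel_coeffs = weights @ coeffs              # low-dim work during rasterization
pixel_feature = pixel_coeffs @ dictionary    # one matmul lifts to D dimensions
```

Because blending happens in K dimensions rather than D, the per-pixel cost is that of splatting an ultra-low-dimensional feature, which is the source of the claimed speedup.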
- A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality
Despite the significant progress that has been made in video generative models, existing state-of-the-art methods can only produce videos lasting 5-16 seconds, often labeled "long-form videos". Furthermore, videos exceeding 16 seconds struggle to maintain consistent character appearances and scene layouts throughout the narrative. In particular, multi-subject long videos still fail to preserve character consistency and motion coherence. While some methods can generate videos up to 150 seconds long, they often suffer from frame redundancy and low temporal diversity. Recent work has attempted to produce long-form videos featuring multiple characters, narrative coherence, and high-fidelity detail. We comprehensively studied 32 papers on video generation to identify key architectural components and training strategies that consistently yield these qualities. We also construct a comprehensive novel taxonomy of existing methods and present comparative tables that categorize papers by their architectural designs and performance characteristics.
- Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
Can a pretrained neural network adapt its architecture to different inputs without any finetuning? Do we need all layers for simple tasks, and are they adequate for challenging tasks? We found that the layers of a pretrained large language model (LLM) can be manipulated as separate modules to build a better and even shallower model customized for each test sample. In particular, each layer from the pretrained model can be skipped/pruned or repeated multiple times as recurrent neural networks (RNN), and stacked with others in arbitrary orders, yielding a chain-of-layers (CoLa) per sample. This compositional space greatly expands the scope of existing works on looped/recurrent pretrained modules, layer pruning, or early-exit networks. We develop a Monte Carlo Tree Search (MCTS) protocol to explore and identify the optimal CoLa for each sample from math and commonsense reasoning benchmarks. Compared to a static model of a fixed depth, CoLa allows shortcut paths (fast thinking), recurrence of the same layer(s) (slow thinking), and combining both, offering more flexible, dynamic architectures for different inputs. We conduct an extensive analysis of the MCTS-optimized CoLa, which leads to two key findings: (1) For >75% of samples with correct predictions by the original LLM, we can find shorter CoLa, suggesting a large space for improving inference efficiency; (2) For >60% of samples with originally incorrect predictions, we can identify CoLa achieving correct predictions, suggesting a large space of performance enhancement. Our results highlight the shortcomings of using a fixed architecture of pre-trained LLMs for inference on different samples and pave the way to unlock the generalization power of test-time depth adaptation.
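The chain-of-layers idea (skip, repeat, and recombine pretrained layers per sample) can be sketched in a few lines. The residual blocks and hand-picked chains below are toy stand-ins; the paper searches for per-sample chains with MCTS over a real pretrained LLM:

```python
import numpy as np

rng = np.random.default_rng(0)

# Four stand-in "pretrained layers": residual MLP blocks (hypothetical).
dim = 8
layer_weights = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(4)]

def layer(i, x):
    # Residual connection keeps repeated application well-behaved.
    return x + np.maximum(x @ layer_weights[i], 0.0)

def run_chain(chain, x):
    """Execute a chain-of-layers (CoLa): the index list may skip layers
    entirely or repeat an index, turning that layer into a recurrence."""
    for i in chain:
        x = layer(i, x)
    return x

x = rng.standard_normal((2, dim))
shallow = run_chain([0, 3], x)            # skip layers 1-2: a shortcut path
looped = run_chain([0, 1, 1, 1, 3], x)    # repeat layer 1: a slow-thinking loop
```

The search space is just sequences over layer indices, which is what makes a per-sample MCTS over chains tractable.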
- Token Bottleneck: One Token to Remember Dynamics
Deriving compact and temporally aware visual representations from dynamic scenes is essential for successful execution of sequential scene understanding tasks such as visual tracking and robotic manipulation. In this paper, we introduce Token Bottleneck (ToBo), a simple yet intuitive self-supervised learning pipeline that squeezes a scene into a bottleneck token and predicts the subsequent scene using minimal patches as hints. The ToBo pipeline facilitates the learning of sequential scene representations by conservatively encoding the reference scene into a compact bottleneck token during the squeeze step. In the expansion step, we guide the model to capture temporal dynamics by predicting the target scene using the bottleneck token along with few target patches as hints. This design encourages the vision backbone to embed temporal dependencies, thereby enabling understanding of dynamic transitions across scenes. Extensive experiments in diverse sequential tasks, including video label propagation and robot manipulation in simulated environments demonstrate the superiority of ToBo over baselines. Moreover, deploying our pre-trained model on physical robots confirms its robustness and effectiveness in real-world environments. We further validate the scalability of ToBo across different model scales.
- Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
Bullshit, as conceptualized by philosopher Harry Frankfurt, refers to statements made without regard to their truth value. While previous work has explored large language model (LLM) hallucination and sycophancy, we propose machine bullshit as an overarching conceptual framework that can allow researchers to characterize the broader phenomenon of emergent loss of truthfulness in LLMs and shed light on its underlying mechanisms. We introduce the Bullshit Index, a novel metric quantifying LLMs' indifference to truth, and propose a complementary taxonomy analyzing four qualitative forms of bullshit: empty rhetoric, paltering, weasel words, and unverified claims. We conduct empirical evaluations on the Marketplace dataset, the Political Neutrality dataset, and our new BullshitEval benchmark (2,400 scenarios spanning 100 AI assistants) explicitly designed to evaluate machine bullshit. Our results demonstrate that model fine-tuning with reinforcement learning from human feedback (RLHF) significantly exacerbates bullshit, and that inference-time chain-of-thought (CoT) prompting notably amplifies specific bullshit forms, particularly empty rhetoric and paltering. We also observe prevalent machine bullshit in political contexts, with weasel words as the dominant strategy. Our findings highlight systematic challenges in AI alignment and provide new insights toward more truthful LLM behavior.
- Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Despite incredible progress in language models (LMs) in recent years, largely resulting from moving away from specialized models designed for specific tasks to general models based on powerful architectures (e.g. the Transformer) that learn everything from raw data, pre-processing steps such as tokenization remain a barrier to true end-to-end foundation models. We introduce a collection of new techniques that enable a dynamic chunking mechanism which automatically learns content- and context-dependent segmentation strategies jointly with the rest of the model. Incorporating this into an explicit hierarchical network (H-Net) allows replacing the (implicitly hierarchical) tokenization-LM-detokenization pipeline with a single model learned fully end-to-end. When compute- and data-matched, an H-Net with one stage of hierarchy operating at the byte level outperforms a strong Transformer language model operating over BPE tokens. Iterating the hierarchy to multiple stages further increases its performance by modeling multiple levels of abstraction, demonstrating significantly better scaling with data and matching a token-based Transformer of twice its size. H-Nets pretrained on English show significantly increased character-level robustness, and qualitatively learn meaningful data-dependent chunking strategies without any heuristics or explicit supervision. Finally, the H-Net's improvement over tokenized pipelines is further increased in languages and modalities with weaker tokenization heuristics, such as Chinese and code, or DNA sequences (nearly 4x improvement in data efficiency over baselines), showing the potential of true end-to-end models that learn and scale better from unprocessed data.
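In H-Net the segmentation is learned end-to-end, but its downstream effect, splitting a byte stream wherever a boundary score fires, can be sketched with a stand-in scorer. The function name and threshold below are hypothetical; in the real model the scores come from the network itself rather than being given as input:

```python
def dynamic_chunk(boundary_scores, threshold=0.5):
    """Split a byte stream into variable-length chunks wherever a (learned)
    boundary score crosses a threshold. Returns (start, end) index pairs.
    A hypothetical stand-in for H-Net's learned chunking behavior."""
    chunks, start = [], 0
    for i, score in enumerate(boundary_scores):
        if score >= threshold:
            chunks.append((start, i + 1))  # close the chunk at this byte
            start = i + 1
    if start < len(boundary_scores):
        chunks.append((start, len(boundary_scores)))  # trailing remainder
    return chunks
```

Unlike a fixed tokenizer, the chunk boundaries here depend on the scores, i.e., on content and context, which is what lets the hierarchy adapt to languages and modalities with weak tokenization heuristics.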
- Beyond the Linear Separability Ceiling
Most state-of-the-art Visual-Language Models (VLMs) are seemingly limited by the linear separability of their visual embeddings on abstract reasoning tasks. This work investigates this "linear reasoning bottleneck" by introducing the Linear Separability Ceiling (LSC), the performance of a simple linear classifier on a VLM's visual embeddings. We find this bottleneck is widespread and stems not from poor perception, but from failures in the language model's reasoning pathways. We demonstrate this is a solvable alignment issue. The required intervention, however, is task-dependent: activating existing pathways suffices for semantic concepts, while complex relational reasoning requires adapting core model weights. Using postfix tuning as a methodological control, we find strong evidence for powerful, dormant reasoning pathways within VLMs. However, for complex relational tasks requiring deeper adaptation, explicitly improving representation quality causes the model to fail on new prompt formats despite its embeddings remaining well separated. Ultimately, this work provides a new lens for VLM analysis, showing that robust reasoning is a matter of targeted alignment, not simply improved representation learning.
- Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders
Neural audio codecs and autoencoders have emerged as versatile models for audio compression, transmission, feature-extraction, and latent-space generation. However, a key limitation is that most are trained to maximize reconstruction fidelity, often neglecting the specific latent structure necessary for optimal performance in diverse downstream applications. We propose a simple, post-hoc framework to address this by modifying the bottleneck of a pre-trained autoencoder. Our method introduces a "Re-Bottleneck", an inner bottleneck trained exclusively through latent space losses to instill user-defined structure. We demonstrate the framework's effectiveness in three experiments. First, we enforce an ordering on latent channels without sacrificing reconstruction quality. Second, we align latents with semantic embeddings, analyzing the impact on downstream diffusion modeling. Third, we introduce equivariance, ensuring that a filtering operation on the input waveform directly corresponds to a specific transformation in the latent space. Ultimately, our Re-Bottleneck framework offers a flexible and efficient way to tailor representations of neural audio models, enabling them to seamlessly meet the varied demands of different applications with minimal additional training.
Solidot(15)
- Tens of millions of tons of nanoplastics in the oceans
Nanoscale plastic particles are now ubiquitous, and in the oceans their mass runs to tens of millions of tons. According to a study published in Nature, researchers collected water samples from three depths representing different environments in the North Atlantic and found three types of nanoplastics in them: polyethylene terephthalate (PET), polystyrene (PS), and polyvinyl chloride (PVC). The average nanoplastic concentration was 18 milligrams per cubic meter, equivalent to 27 million tons of nanoplastics spread across the surface layer of the temperate to subtropical North Atlantic. Unlike larger plastic particles, nanoplastics remain suspended in the water rather than settling to the seafloor, and can enter the food chain.
- GlobalFoundries acquires MIPS
GlobalFoundries has announced an agreement to acquire MIPS, a move aimed at expanding its product portfolio. MIPS's IP today centers mainly on the open RISC-V instruction set architecture. Founded in 1984, MIPS was acquired by Imagination Technologies in 2013, sold to Tallwood Venture Capital in 2017, and acquired by Wave Computing in 2018. Wave filed for bankruptcy protection in April 2020; in March 2021 it renamed itself MIPS and announced it would abandon development of the MIPS architecture in favor of RISC-V.
- Stardew Valley becomes the highest-rated game on Steam
Stardew Valley has surpassed Portal 2 as the highest-rated game on Steam. Stardew Valley is a farming-simulation role-playing game developed by ConcernedApe, in which the player's character leaves the grind of office work behind to revive a grandfather's abandoned farm in a place called Stardew Valley. Released in February 2016 and now nearly 10 years old, the game has been continually updated since launch, gaining new NPCs and a multiplayer mode, and has sold more than 30 million copies. The data show Stardew Valley at a score of 8.87 from 899,309 votes, edging out Portal 2's 8.85 from 436,510 votes. The rest of the top 15 includes Terraria, People Playground, Left 4 Dead 2, Vampire Survivors, Hades, Euro Truck Simulator 2, Schedule I, Portal, Garry's Mod, RimWorld, Black Myth: Wukong, Baldur's Gate 3, and Lethal Company.
- Only 1% of turtles develop cancer
Turtles are far less likely to develop cancer than mammals or birds. According to a study published in BioScience, only 1% of turtles develop cancer. The researchers analyzed the medical records and necropsy reports of hundreds of zoo turtles and found very few cancers; even when tumors did appear, they almost never spread. The researchers attribute this to turtles' strong defenses, including resistance to cellular damage, a slow metabolism that reduces cellular stress, and unique anti-cancer genes.
- Amarok 3.3 released
The Amarok music player project has released v3.3, the first version built on KDE Frameworks 6 and Qt 6. Amarok 3.3 overhauls the audio engine, playing audio through GStreamer instead of Phonon. Major changes include dropping Qt5/KF5 support, updating the database character set to allow full UTF-8 values, and fixing the year-2038 problem, among others.
- Genomic study confirms animal viruses began spreading to humans once humans kept animals
A genomic study published in Nature pins down the earliest point at which zoonotic pathogens spread from animals to humans: 6,500 years ago, when humans shifted from hunting and gathering to herding and began living alongside animals. Australian virologist Edward Holmes noted that the idea is not new, but the researchers have now backed it with data. The team extracted DNA sequences from blood residue in 1,313 ancient human bones and teeth found across Eurasia, searching for traces of microbial genomes; the ancient DNA spans 37,000 years. The study identified 5,486 DNA sequences from bacteria, viruses, and parasites. Zoonotic pathogens were found only in remains dating to 6,500 years ago or later, peaking around 5,000 years ago. Yersinia pestis, the plague bacterium, first appears in the dataset 5,700-5,300 years ago.
- Western Europe experiences its hottest June on record
The EU's Copernicus Climate Change Service (CCCS) said on Wednesday that Western Europe experienced its hottest June on record, the third year in a row that June temperature records there have been broken. Globally, it was the third-hottest June on record, as warming driven by human greenhouse gas emissions continues. The service said Western Europe's average June temperature was 20.49°C, 2.81°C above the 1991-2020 average. The region endured two heatwaves, the second of which lasted into early July, with surface temperatures exceeding 40°C in several countries and reaching 46°C in Spain and Portugal.
- Power relations between the sexes are not clear-cut
The alpha male is not a universal truth. Researchers from Germany's Max Planck Institute and from France analyzed detailed observations of male and female aggression in 253 populations of 121 primate species and found no clear-cut power relationship between the sexes. Competition between males and females is remarkably common: on average, nearly half of the aggressive interactions in a group involve a male and a female. Academia has long assumed that primate power structures favor males, but the new study suggests otherwise; male-biased power structures are in fact more the exception. Of the 151 populations analyzed, male dominance was observed in only 25 and female dominance in 16, while in about 70% of populations the power structure was neutral, with no sex bias. In terrestrial populations, male dominance was more common where males had larger bodies and weaponry than females. The researchers note that male primates gain power through force and coercion, while females gain power through other strategies, such as reproductive ones.
- OpenAI to release an AI web browser to challenge Chrome
OpenAI is preparing to release an AI-powered web browser to challenge Google Chrome, which dominates the browser market. Expected within weeks, the browser aims to use AI to fundamentally change how consumers browse the web. It would also give OpenAI direct access to a cornerstone of Google's success: user data. Chrome is a pillar of Alphabet's advertising business, supplying user information that helps Alphabet target ads more effectively and profitably, and giving Google a way to route search traffic to its own engine by default. Chrome has as many as 3 billion users, while OpenAI's ChatGPT has 500 million weekly active users; OpenAI's browser is built on Chromium, Google's open-source browser project.
- Admin password for McDonald's AI hiring platform was 123456
Anyone applying for a job at McDonald's today may first have to chat with Olivia, an AI chatbot on the McHire.com platform. Olivia asks applicants for personal information and a résumé and administers a personality test; the chatbot is supplied by Paradox.ai. After hearing that McDonald's screens applicants with an AI chatbot, security researchers Ian Carroll and Sam Curry took a curious look at McHire.com and discovered, to their surprise, that the platform's admin username and password were both 123456. Once logged in to the admin panel, they could access the Paradox.ai account and query the company's database of every McHire user's chats with Olivia. The database held as many as 64 million records, including applicants' names, email addresses, and phone numbers. McDonald's said Paradox.ai was responsible for the vulnerability; Paradox.ai confirmed the flaw and fixed it within a day.
- Nvidia's market capitalization tops $4 trillion
Nvidia, the leading hardware supplier for generative AI, saw its shares rise more than 2% on Wednesday, pushing its market capitalization past $4 trillion and making it the first company in history to reach that mark. Nvidia is now the world's most valuable company, ahead of Microsoft and Apple, both of which crossed $3 trillion before Nvidia did but have yet to reach $4 trillion. Headquartered in California and founded in 1993, Nvidia has ridden the generative AI boom sparked by ChatGPT: its market cap first crossed $2 trillion in February 2024 and $3 trillion that June.
- US tech giants slow to act on Treasury sanctions list
Earlier this year, after the International Criminal Court in The Hague issued war crimes arrest warrants for Israeli Prime Minister Netanyahu and former defense minister Yoav Gallant, US President Trump imposed sanctions on the ICC, and Microsoft immediately blocked the email account of chief prosecutor Karim Khan. But US tech giants often respond to sanctions lists far less quickly than Microsoft did. On May 29, the US Treasury imposed economic sanctions on Funnull Technology Inc. and its operator, 40-year-old Shanghai resident Liu Lizhi, aka Liu "Steve" Lizhi, XXL4, and Nice Lizhi; the cloud provider Funnull is accused of facilitating financial fraud that cost Americans more than $200 million. Under US law, American companies are barred from continuing to do business with sanctioned individuals. An investigation found that more than a month later, Facebook, GitHub, PayPal, and Twitter/X still had not closed Liu's accounts. His LinkedIn account was deleted within hours of a request for comment.
- A pod of sperm whales filmed sleeping upright
A pod of sperm whales was recently observed sleeping head-up, in a "standing" posture, in waters off Amami Oshima in Japan's Kagoshima Prefecture. Katsuki Oki, head of the Amami Marine Life Research Association, spotted and filmed the scene on June 23 about 15 km west of the island. "This is the largest pod I have ever seen, and the first time I have seen them sleeping. I was deeply moved," he recalled. Of the roughly 20 whales found about 3 meters below the surface, the four or five in the center appeared to be sleeping upright; the longest measured about 14 meters. Sperm whales sleep for only about 2 hours a day, making such a sighting especially rare.
- 2.3 million Chrome and Edge users installed extensions that hijack browser sessions
A color-picker extension that helps developers choose colors and carries a Google verification badge looks harmless, but security researchers say it hijacks browser sessions, tracks web activity, and plants a backdoor in victims' browsers. The extension, named Geco, has been downloaded more than 100,000 times, with 800 reviews and a 4.2/5-star rating. Researchers at the security firm Koi Security say it is part of a browser-hijacking campaign dubbed RedDirection, which involves 18 malicious extensions and has affected more than 2.3 million Chrome and Edge users. The extensions initially contained no malicious code, which is how they earned Google's verification; the malicious code was added in later updates.
- Study estimates most future stomach cancer cases will be linked to H. pylori infection
According to a study published in Nature Medicine, researchers at the France-based International Agency for Research on Cancer estimated the future stomach cancer burden of the generation born between 2008 and 2017. Stomach cancer is the fifth leading cause of cancer death worldwide, and its close link to Helicobacter pylori infection has long been established, yet global investment in controlling this preventable cancer has long been inadequate. Especially worrying, stomach cancer incidence among younger people (<50 years) has been rising in recent years in both high- and low-risk regions, and population aging will further increase the disease burden. The researchers estimate that people born between 2008 and 2017 will develop 15.6 million stomach cancer cases over their lifetimes, 76% of them attributable to H. pylori. Asia is projected to account for 10.6 million cases (68% of the global total), led by East Asia (5.9 million) and South Asia (2.9 million); China and India together account for 42% of the world's preventable cases.