OrangeBot.AI Digest — 2025-07-02

75 headlines across 5 sources, aggregated for this day.

Hacker News (15)

  1. Couchers is officially out of beta (couchers.org)
  2. Stop Killing Games (www.stopkillinggames.com)
  3. ICEBlock, an app for anonymously reporting ICE sightings (techcrunch.com)
  4. Show HN: CSS generator for a high-def glass effect (glass3d.dev)
  5. ICEBlock climbs to the top of the App Store charts after officials slam it (www.engadget.com)
  6. Gene therapy restored hearing in deaf patients (news.ki.se)
  7. I'm dialing back my LLM usage (zed.dev)
  8. Microsoft to Cut 9k Workers in Second Wave of Major Layoffs (www.bloomberg.com)
  9. Private sector lost 33k jobs, badly missing expectations of 100k increase (www.cnbc.com)
  10. Exploiting the IKKO Activebuds "AI powered" earbuds (blog.mgdproductions.com)
  11. Cloudflare Introduces Default Blocking of A.I. Data Scrapers (www.nytimes.com)
  12. Don’t use “click here” as link text (2001) (www.w3.org)
  13. Math.Pow(-1, 2) == -1 in Windows 11 Insider build (github.com)
  14. Jack Welch, the Man Who Broke Capitalism (2022) (www.forbes.com)
  15. How large are large language models? (gist.github.com)

GitHub Trending (15)

  1. microsoft / generative-ai-for-beginners

    21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/

  2. NanmiCoder / MediaCrawler

    Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments.

  3. zaidmukaddam / scira

    Scira (Formerly MiniPerplx) is a minimalistic AI-powered search engine that helps you find information on the internet and cites it too. Powered by Vercel AI SDK! Search with models like xAI's Grok 3.

  4. microsoft / Mastering-GitHub-Copilot-for-Paired-Programming

    A multi-module course teaching everything you need to know about using GitHub Copilot as an AI Peer Programming resource.

  5. binwiederhier / ntfy

    Send push notifications to your phone or desktop using PUT/POST
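
ntfy's documented HTTP interface really is just a plain PUT/POST of the message body to a topic URL. A minimal Python sketch (the topic name `demo-topic` and the header values are illustrative examples, not anything from the repo):

```python
import urllib.request

# Publishing a notification is a plain POST of the message body to
# https://ntfy.sh/<topic>; the optional "Title" header sets the notification title.
req = urllib.request.Request(
    "https://ntfy.sh/demo-topic",            # topic name is an arbitrary example
    data="Backup finished".encode("utf-8"),
    headers={"Title": "nightly job"},
    method="POST",
)
# urllib.request.urlopen(req) would actually send it;
# left unsent here to avoid a network call.
```

Any subscriber to the same topic (the mobile app, the web app, or a stream like `curl -s ntfy.sh/demo-topic/json`) receives the message.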

  6. GraphiteEditor / Graphite

    An open source graphics editor for 2025: comprehensive 2D content creation tool suite for graphic design, digital art, and interactive real-time motion graphics — featuring node-based procedural editing

  7. The-Cool-Coders / Project-Ideas-And-Resources

    A Collection of application ideas that can be used to improve your coding skills ❤.

  8. TapXWorld / ChinaTextbook

    PDF textbooks for all levels: primary school, middle school, high school, and university.

  9. NginxProxyManager / nginx-proxy-manager

    Docker container for managing Nginx proxy hosts with a simple, powerful interface

  10. snailyp / gemini-balance

    Gemini round-robin proxy service

  11. danielmiessler / Fabric

    Fabric is an open-source framework for augmenting humans using AI. It provides a modular system for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.

  12. onlook-dev / onlook

    The Cursor for Designers • An Open-Source Visual Vibecoding Editor • Visually build, style, and edit your React App with AI

  13. PaddlePaddle / ERNIE

    The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.

  14. openssl / openssl

    TLS/SSL and crypto library

  15. tadata-org / fastapi_mcp

    Expose your FastAPI endpoints as Model Context Protocol (MCP) tools, with Auth!

Product Hunt (15)

  1. Portia AI

    Secure AI agents with tools, auth, and smart control

  2. String.com

    AI agent for building AI agents

  3. Lazy 2.0

    One shortcut to capture & chat with your notes, everywhere

  4. Nothing Phone (3)

    Beyond lights, with the new Glyph Matrix

  5. Olive

    Generate admin dashboards in minutes from a prompt

  6. Unify AI for Sales Reps

    Use AI to find and engage new customers at scale

  7. UniDeck Beta

    No-code dashboards with AI insights and advanced widgets

  8. zookish

    Make your website talk

  9. create-api.dev by Kong

    Generate and share OpenAPI specs with AI

  10. Version 2

    A private Perplexity that runs locally on your mobile device

  11. Tailored Labs

    Edit videos by simply prompting AI

  12. Billingrails Sandbox

    Build a custom billing system in minutes with just config

  13. Dual Shades

    Keep your subject in color, fade the background to B&W

  14. Steppy by StageKeep

    Easily learn TikTok dances

  15. Semantic Email

    Send email to fill Google Forms

Hugging Face (15)

  1. Ella: Embodied Social Agents with Lifelong Memory

    We introduce Ella, an embodied social agent capable of lifelong learning within a community in a 3D open world, where agents accumulate experiences and acquire knowledge through everyday visual observations and social interactions. At the core of Ella's capabilities is a structured, long-term multimodal memory system that stores, updates, and retrieves information effectively. It consists of a name-centric semantic memory for organizing acquired knowledge and a spatiotemporal episodic memory for capturing multimodal experiences. By integrating this lifelong memory system with foundation models, Ella retrieves relevant information for decision-making, plans daily activities, builds social relationships, and evolves autonomously while coexisting with other intelligent beings in the open world. We conduct capability-oriented evaluations in a dynamic 3D open world where 15 agents engage in social activities for days and are assessed with a suite of unseen controlled evaluations. Experimental results show that Ella can influence, lead, and cooperate with other agents well to achieve goals, showcasing its ability to learn effectively through observation and social interaction. Our findings highlight the transformative potential of combining structured memory systems with foundation models for advancing embodied intelligence. More videos can be found at https://umass-embodied-agi.github.io/Ella/.

  2. MusiXQA: Advancing Visual Music Understanding in Multimodal Large Language Models

    Multimodal Large Language Models (MLLMs) have achieved remarkable visual reasoning abilities in natural images, text-rich documents, and graphic designs. However, their ability to interpret music sheets remains underexplored. To bridge this gap, we introduce MusiXQA, the first comprehensive dataset for evaluating and advancing MLLMs in music sheet understanding. MusiXQA features high-quality synthetic music sheets generated via MusiXTeX, with structured annotations covering note pitch and duration, chords, clefs, key/time signatures, and text, enabling diverse visual QA tasks. Through extensive evaluations, we reveal significant limitations of current state-of-the-art MLLMs in this domain. Beyond benchmarking, we developed Phi-3-MusiX, an MLLM fine-tuned on our dataset, achieving significant performance gains over GPT-based methods. The proposed dataset and model establish a foundation for future advances in MLLMs for music sheet understanding. Code, data, and model will be released upon acceptance.

  3. FreNBRDF: A Frequency-Rectified Neural Material Representation

    Accurate material modeling is crucial for achieving photorealistic rendering, bridging the gap between computer-generated imagery and real-world photographs. While traditional approaches rely on tabulated BRDF data, recent work has shifted towards implicit neural representations, which offer compact and flexible frameworks for a range of tasks. However, their behavior in the frequency domain remains poorly understood. To address this, we introduce FreNBRDF, a frequency-rectified neural material representation. By leveraging spherical harmonics, we integrate frequency-domain considerations into neural BRDF modeling. We propose a novel frequency-rectified loss, derived from a frequency analysis of neural materials, and incorporate it into a generalizable and adaptive reconstruction and editing pipeline. This framework enhances fidelity, adaptability, and efficiency. Extensive experiments demonstrate that FreNBRDF improves the accuracy and robustness of material appearance reconstruction and editing compared to state-of-the-art baselines, enabling more structured and interpretable downstream tasks and applications.

  4. Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies

    Large language models (LLMs) excel in complex tasks through advanced prompting techniques like Chain-of-Thought (CoT) and Tree-of-Thought (ToT), but their reliance on manually crafted, task-specific prompts limits adaptability and efficiency. We introduce Mixture of Reasoning (MoR), a training framework that embeds diverse reasoning strategies into LLMs for autonomous, task-adaptive reasoning without external prompt engineering. MoR has two phases: Thought Generation, creating reasoning chain templates with models like GPT-4o, and SFT Dataset Construction, pairing templates with benchmark datasets for supervised fine-tuning. Our experiments show that MoR significantly enhances performance, with MoR150 achieving 0.730 (a 2.2% improvement) using CoT prompting and 0.734 (a 13.5% improvement) compared to baselines. MoR eliminates the need for task-specific prompts, offering a generalizable solution for robust reasoning across diverse tasks.

  5. Training for X-Ray Vision: Amodal Segmentation, Amodal Content Completion, and View-Invariant Object Representation from Multi-Camera Video

    Amodal segmentation and amodal content completion require using object priors to estimate occluded masks and features of objects in complex scenes. Until now, no data has provided an additional dimension for object context: the possibility of multiple cameras sharing a view of a scene. We introduce MOVi-MC-AC: Multiple Object Video with Multi-Cameras and Amodal Content, the largest amodal segmentation and first amodal content dataset to date. Cluttered scenes of generic household objects are simulated in multi-camera video. MOVi-MC-AC contributes to the growing literature of object detection, tracking, and segmentation by making two new contributions to deep learning for computer vision. Multiple Camera (MC) settings where objects can be identified and tracked between various unique camera perspectives are rare in both synthetic and real-world video. We introduce a new complexity to synthetic video by providing consistent object ids for detections and segmentations between both frames and multiple cameras each with unique features and motion patterns on a single scene. Amodal Content (AC) is a reconstructive task in which models predict the appearance of target objects through occlusions. In the amodal segmentation literature, some datasets have been released with amodal detection, tracking, and segmentation labels. While other methods rely on slow cut-and-paste schemes to generate amodal content pseudo-labels, they do not account for natural occlusions present in the modal masks. MOVi-MC-AC provides labels for ~5.8 million object instances, setting a new maximum in the amodal dataset literature, along with being the first to provide ground-truth amodal content. The full dataset is available at https://huggingface.co/datasets/Amar-S/MOVi-MC-AC.

  6. Confident Splatting: Confidence-Based Compression of 3D Gaussian Splatting via Learnable Beta Distributions

    3D Gaussian Splatting enables high-quality real-time rendering but often produces millions of splats, resulting in excessive storage and computational overhead. We propose a novel lossy compression method based on learnable confidence scores modeled as Beta distributions. Each splat's confidence is optimized through reconstruction-aware losses, enabling pruning of low-confidence splats while preserving visual fidelity. The proposed approach is architecture-agnostic and can be applied to any Gaussian Splatting variant. In addition, the average confidence values serve as a new metric to assess the quality of the scene. Extensive experiments demonstrate favorable trade-offs between compression and fidelity compared to prior work. Our code and data are publicly available at https://github.com/amirhossein-razlighi/Confident-Splatting
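
The pruning idea in the abstract can be sketched in a few lines of numpy. The Beta parameters, the threshold, and the random initialization below are illustrative stand-ins, not the paper's learned values:

```python
import numpy as np

rng = np.random.default_rng(0)
n_splats = 10_000

# Stand-ins for per-splat Beta(alpha, beta) parameters; in the paper these
# confidences are learned via reconstruction-aware losses.
alpha = rng.uniform(0.5, 5.0, n_splats)
beta = rng.uniform(0.5, 5.0, n_splats)

confidence = alpha / (alpha + beta)   # mean of a Beta(alpha, beta) distribution
threshold = 0.5                       # illustrative pruning threshold
keep = confidence >= threshold        # prune low-confidence splats

compression = 1.0 - keep.mean()       # fraction of splats removed
scene_quality = confidence.mean()     # average confidence as a scene-quality proxy
```

Because the approach only needs a confidence per splat, it is architecture-agnostic, which matches the abstract's claim that it applies to any Gaussian Splatting variant.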

  7. IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering

    Vision-language models (VLMs) excel at descriptive tasks, but whether they truly understand scenes from visual observations remains uncertain. We introduce IR3D-Bench, a benchmark challenging VLMs to demonstrate understanding through active creation rather than passive recognition. Grounded in the analysis-by-synthesis paradigm, IR3D-Bench tasks Vision-Language Agents (VLAs) with actively using programming and rendering tools to recreate the underlying 3D structure of an input image, achieving agentic inverse rendering through tool use. This "understanding-by-creating" approach probes the tool-using generative capacity of VLAs, moving beyond the descriptive or conversational capacity measured by traditional scene understanding benchmarks. We provide a comprehensive suite of metrics to evaluate geometric accuracy, spatial relations, appearance attributes, and overall plausibility. Initial experiments on agentic inverse rendering powered by various state-of-the-art VLMs highlight current limitations, particularly in visual precision rather than basic tool usage. IR3D-Bench, including data and evaluation protocols, is released to facilitate systematic study and development of tool-using VLAs towards genuine scene understanding by creating.

  8. FreeLong++: Training-Free Long Video Generation via Multi-Band Spectral Fusion

    Recent advances in video generation models have enabled high-quality short video generation from text prompts. However, extending these models to longer videos remains a significant challenge, primarily due to degraded temporal consistency and visual fidelity. Our preliminary observations show that naively applying short-video generation models to longer sequences leads to noticeable quality degradation. Further analysis identifies a systematic trend where high-frequency components become increasingly distorted as video length grows, an issue we term high-frequency distortion. To address this, we propose FreeLong, a training-free framework designed to balance the frequency distribution of long video features during the denoising process. FreeLong achieves this by blending global low-frequency features, which capture holistic semantics across the full video, with local high-frequency features extracted from short temporal windows to preserve fine details. Building on this, FreeLong++ extends FreeLong's dual-branch design into a multi-branch architecture with multiple attention branches, each operating at a distinct temporal scale. By arranging multiple window sizes from global to local, FreeLong++ enables multi-band frequency fusion from low to high frequencies, ensuring both semantic continuity and fine-grained motion dynamics across longer video sequences. Without any additional training, FreeLong++ can be plugged into existing video generation models (e.g. Wan2.1 and LTX-Video) to produce longer videos with substantially improved temporal consistency and visual fidelity. We demonstrate that our approach outperforms previous methods on longer video generation tasks (e.g. 4× and 8× of native length). It also supports coherent multi-prompt video generation with smooth scene transitions and enables controllable video generation using long depth or pose sequences.
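
The low/high-frequency blending step can be illustrated with a 1-D FFT toy. This is a sketch of the idea only: the real method operates on video features during denoising, and the signals and cutoff below are arbitrary:

```python
import numpy as np

def blend_frequencies(global_feat, local_feat, cutoff):
    """Keep low-frequency bins from the global signal and
    high-frequency bins from the local signal."""
    g = np.fft.rfft(global_feat)
    l = np.fft.rfft(local_feat)
    bins = np.arange(g.size)
    merged = np.where(bins < cutoff, g, l)
    return np.fft.irfft(merged, n=global_feat.size)

t = np.linspace(0, 1, 256, endpoint=False)
global_feat = np.sin(2 * np.pi * 2 * t)        # slow, "holistic" component
local_feat = 0.3 * np.sin(2 * np.pi * 40 * t)  # fast, "fine-detail" component
out = blend_frequencies(global_feat, local_feat, cutoff=10)
```

With these inputs the blend recovers the 2 Hz content from the global signal and the 40 Hz content from the local one, which is the intuition behind mixing holistic semantics with fine detail.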

  9. Data Efficacy for Language Model Training

    Data is fundamental to the training of language models (LM). Recent research has been dedicated to data efficiency, which aims to maximize performance by selecting a minimal or optimal subset of training data. Techniques such as data filtering, sampling, and selection play a crucial role in this area. To complement it, we define Data Efficacy, which focuses on maximizing performance by optimizing the organization of training data and remains relatively underexplored. This work introduces a general paradigm, DELT, for considering data efficacy in LM training, which highlights the significance of training data organization. DELT comprises three components: Data Scoring, Data Selection, and Data Ordering. Among these components, we design Learnability-Quality Scoring (LQS), as a new instance of Data Scoring, which considers both the learnability and quality of each data sample from the gradient consistency perspective. We also devise Folding Ordering (FO), as a novel instance of Data Ordering, which addresses issues such as model forgetting and data distribution bias. Comprehensive experiments validate the data efficacy in LM training, which demonstrates the following: Firstly, various instances of the proposed DELT enhance LM performance to varying degrees without increasing the data scale and model size. Secondly, among these instances, the combination of our proposed LQS for data scoring and Folding for data ordering achieves the most significant improvement. Lastly, data efficacy can be achieved together with data efficiency by applying data selection. Therefore, we believe that data efficacy is a promising foundational area in LM training.
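
The three-stage shape of DELT (score, then select, then order) can be sketched generically. The scoring function and the final ordering below are placeholders, not the paper's Learnability-Quality Scoring (LQS) or Folding Ordering (FO):

```python
# Generic score -> select -> order pipeline in the shape of DELT.
def prepare_training_data(samples, score_fn, keep_ratio=0.5):
    scored = [(score_fn(s), s) for s in samples]      # 1. Data Scoring
    scored.sort(key=lambda pair: pair[0], reverse=True)
    n_keep = max(1, int(len(scored) * keep_ratio))
    selected = scored[:n_keep]                        # 2. Data Selection
    selected.sort(key=lambda pair: pair[0])           # 3. Data Ordering
    return [s for _, s in selected]                   #    (easy-to-hard here)

# Toy usage: score by length, keep the top half, order ascending.
batch = prepare_training_data(["aa", "abcd", "a", "abc"],
                              score_fn=len, keep_ratio=0.5)
```

The point the abstract makes is that the ordering stage is a separate lever from selection: the same kept subset can yield different model quality depending on how it is arranged.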

  10. GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

    We present GLM-4.1V-Thinking, a vision-language model (VLM) designed to advance general-purpose multimodal reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. Reinforcement Learning with Curriculum Sampling (RLCS) then unlocks the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document understanding, among others. To facilitate research in this field, we open-source GLM-4.1V-9B-Thinking, which achieves state-of-the-art performance among models of comparable size. In a comprehensive evaluation across 28 public benchmarks, our model outperforms Qwen2.5-VL-7B on nearly all tasks and achieves comparable or even superior performance on 18 benchmarks relative to the significantly larger Qwen2.5-VL-72B. Notably, GLM-4.1V-9B-Thinking also demonstrates competitive or superior performance compared to closed-source models such as GPT-4o on challenging tasks including long document understanding and STEM reasoning, further underscoring its strong capabilities. Code, models and more information are released at https://github.com/THUDM/GLM-4.1V-Thinking.

  11. Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation

    Recent advances in diffusion models have enabled high-quality video generation, but the additional temporal dimension significantly increases computational costs, making training and inference on long videos prohibitively expensive. In this paper, we identify a phenomenon we term Spatiotemporal Energy Decay in video diffusion models: post-softmax attention scores diminish as spatial and temporal distance between tokens increases, akin to the physical decay of signal or waves over space and time in nature. Motivated by this, we propose Radial Attention, a scalable sparse attention mechanism with O(n log n) complexity that translates energy decay into exponentially decaying compute density, which is significantly more efficient than standard O(n^2) dense attention and more expressive than linear attention. Specifically, Radial Attention employs a simple, static attention mask where each token attends to spatially nearby tokens, with the attention window size shrinking with temporal distance. Moreover, it allows pre-trained video diffusion models to extend their generation length with efficient LoRA-based fine-tuning. Extensive experiments show that Radial Attention maintains video quality across Wan2.1-14B, HunyuanVideo, and Mochi 1, achieving up to a 1.9× speedup over the original dense attention. With minimal tuning, it enables video generation up to 4× longer while reducing training costs by up to 4.4× compared to direct fine-tuning and accelerating inference by up to 3.7× compared to dense attention inference.
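
A toy version of such a static mask, with the spatial window halving per step of temporal distance (the window size and decay rate here are arbitrary choices, not the paper's):

```python
import numpy as np

def radial_mask(n_frames, tokens_per_frame, base_window=8):
    """Boolean attention mask: each token attends to spatially nearby tokens,
    with the window shrinking exponentially with temporal (frame) distance."""
    n = n_frames * tokens_per_frame
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            dt = abs(i // tokens_per_frame - j // tokens_per_frame)  # frame gap
            window = max(1, base_window >> dt)   # halve per frame of distance
            ds = abs(i % tokens_per_frame - j % tokens_per_frame)    # spatial gap
            mask[i, j] = ds <= window
    return mask

m = radial_mask(n_frames=4, tokens_per_frame=8)
```

Because the mask is static, it can be precomputed once and reused at every denoising step, which is part of what keeps the scheme cheap relative to dense attention.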

  12. Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images

    A report by the European Union Law Enforcement Agency predicts that by 2026, up to 90 percent of online content could be synthetically generated, raising concerns among policymakers, who cautioned that "Generative AI could act as a force multiplier for political disinformation. The combined effect of generative text, images, videos, and audio may surpass the influence of any single modality." In response, California's Bill AB 3211 mandates the watermarking of AI-generated images, videos, and audio. However, concerns remain regarding the vulnerability of invisible watermarking techniques to tampering and the potential for malicious actors to bypass them entirely. Generative AI-powered de-watermarking attacks, especially the newly introduced visual paraphrase attack, have shown an ability to fully remove watermarks, resulting in a paraphrase of the original image. This paper introduces PECCAVI, the first visual paraphrase attack-safe and distortion-free image watermarking technique. In visual paraphrase attacks, an image is altered while preserving its core semantic regions, termed Non-Melting Points (NMPs). PECCAVI strategically embeds watermarks within these NMPs and employs multi-channel frequency domain watermarking. It also incorporates noisy burnishing to counter reverse-engineering efforts aimed at locating NMPs to disrupt the embedded watermark, thereby enhancing durability. PECCAVI is model-agnostic. All relevant resources and codes will be open-sourced.

  13. DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation

    Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models because their denoising models operate over the entire sequence. The global planning and iterative refinement features of dLLMs are particularly useful for code generation. However, current training and inference mechanisms for dLLMs in coding are still under-explored. To demystify the decoding behavior of dLLMs and unlock their potential for coding, we systematically investigate their denoising processes and reinforcement learning (RL) methods. We train a 7B dLLM, DiffuCoder, on 130B tokens of code. Using this model as a testbed, we analyze its decoding behavior, revealing how it differs from that of AR models: (1) dLLMs can decide how causal their generation should be without relying on semi-AR decoding, and (2) increasing the sampling temperature diversifies not only token choices but also their generation order. This diversity creates a rich search space for RL rollouts. For RL training, to reduce the variance of token log-likelihood estimates and maintain training efficiency, we propose coupled-GRPO, a novel sampling scheme that constructs complementary mask noise for completions used in training. In our experiments, coupled-GRPO significantly improves DiffuCoder's performance on code generation benchmarks (+4.4% on EvalPlus) and reduces reliance on AR causal ordering during decoding. Our work provides deeper insight into the machinery of dLLM generation and offers an effective, diffusion-native RL training framework. https://github.com/apple/ml-diffucoder.

  14. SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks

    We present SciArena, an open and collaborative platform for evaluating foundation models on scientific literature tasks. Unlike traditional benchmarks for scientific literature understanding and synthesis, SciArena engages the research community directly, following the Chatbot Arena evaluation approach of community voting on model comparisons. By leveraging collective intelligence, SciArena offers a community-driven evaluation of model performance on open-ended scientific tasks that demand literature-grounded, long-form responses. The platform currently supports 23 open-source and proprietary foundation models and has collected over 13,000 votes from trusted researchers across diverse scientific domains. We analyze the data collected so far and confirm that the submitted questions are diverse, aligned with real-world literature needs, and that participating researchers demonstrate strong self-consistency and inter-annotator agreement in their evaluations. We discuss the results and insights based on the model ranking leaderboard. To further promote research in building model-based automated evaluation systems for literature tasks, we release SciArena-Eval, a meta-evaluation benchmark based on our collected preference data. The benchmark measures the accuracy of models in judging answer quality by comparing their pairwise assessments with human votes. Our experiments highlight the benchmark's challenges and emphasize the need for more reliable automated evaluation methods.

  15. Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

    Can machines truly think, reason and act in domains like humans? This enduring question continues to shape the pursuit of Artificial General Intelligence (AGI). Despite the growing capabilities of models such as GPT-4.5, DeepSeek, Claude 3.5 Sonnet, Phi-4, and Grok 3, which exhibit multimodal fluency and partial reasoning, these systems remain fundamentally limited by their reliance on token-level prediction and lack of grounded agency. This paper offers a cross-disciplinary synthesis of AGI development, spanning artificial intelligence, cognitive neuroscience, psychology, generative models, and agent-based systems. We analyze the architectural and cognitive foundations of general intelligence, highlighting the role of modular reasoning, persistent memory, and multi-agent coordination. In particular, we emphasize the rise of Agentic RAG frameworks that combine retrieval, planning, and dynamic tool use to enable more adaptive behavior. We discuss generalization strategies, including information compression, test-time adaptation, and training-free methods, as critical pathways toward flexible, domain-agnostic intelligence. Vision-Language Models (VLMs) are reexamined not just as perception modules but as evolving interfaces for embodied understanding and collaborative task completion. We also argue that true intelligence arises not from scale alone but from the integration of memory and reasoning: an orchestration of modular, interactive, and self-improving components where compression enables adaptive behavior. Drawing on advances in neurosymbolic systems, reinforcement learning, and cognitive scaffolding, we explore how recent architectures begin to bridge the gap between statistical learning and goal-directed cognition. Finally, we identify key scientific, technical, and ethical challenges on the path to AGI.

Solidot (15)

  1. Huawei releases an open-weight model trained on its Ascend NPUs

    Huawei has released an open-weight model trained on its Ascend NPUs. The model is published on Gitcode, and its license prohibits use in the EU. Called Pangu Pro MoE, the model has 72 billion total parameters, with 16 billion activated per token. It is optimized for the Ascend 300I Duo and 800I A2, reaching single-card inference performance of 1,148 tokens/s, which speculative acceleration can raise further to 1,528 tokens/s. Huawei researchers claim that among models under 100 billion parameters, Pangu Pro MoE outperforms well-known open-weight models such as GLM-Z1-32B and Qwen3-32B.

  2. First American scientific refugees arrive in France

    The first American scientific refugees fleeing the Trump administration have arrived in France. Aix-Marseille University (AMU) has brought in its first eight US scientists through its Safe Place for Science program. The scientists have not yet signed contracts with the university, and most requested anonymity so they can keep their US positions if they are not hired. Applicants to Safe Place for Science include a climate scientist named James and his wife, who studies the relationship between the justice system and democracy. James, who declined to give his surname, does not consider himself a refugee, but he is deeply worried about the future of academic research under Trump. His field has been targeted by the administration and faces research funding cuts. AMU says that although it is less well known outside France, 298 researchers from prominent US universities such as Stanford and Yale have applied to the program, underscoring the urgency of the situation in the US.

  3. Inflammaging may be a product of industrialized lifestyles

    Inflammation has long been considered a hallmark of aging, but according to a new study from Columbia University's Mailman School of Public Health, it may not be a universal human experience. The research suggests that inflammatory aging (inflammaging) appears to be a byproduct of industrialized lifestyles and varies significantly across global populations. The researchers analyzed data from four groups: two industrialized populations and two non-industrialized indigenous populations (the Tsimane of the Bolivian Amazon and the Orang Asli of Peninsular Malaysia). While the two industrialized groups showed similar inflammation profiles, the indigenous groups did not: their inflammation levels were driven mainly by infection rather than age. Most chronic diseases (including diabetes, heart disease, and Alzheimer's) are rare or largely absent in the indigenous groups. The researchers found that about 66% of the Tsimane had at least one intestinal parasite infection, and more than 70% of the Orang Asli had endemic infections. Inflammatory markers were closely associated with chronic disease in the industrialized groups but not in the indigenous groups.

  4. GNU Health Hospital Information System 5.0 released

    GNU Health Hospital Information System, free software for the healthcare sector, has released version 5.0. Major changes include improved reporting and analytics, more comprehensive handling of different types of patient information, a redesigned medical imaging subsystem, and enhanced insurance and billing features.

  5. RisingAttacK lets AI "see" whatever you want it to see

    Researchers have demonstrated a new method of attacking AI computer vision systems that lets them control what the AI "sees". The research shows that the new technique, called RisingAttacK, can effectively manipulate all of the most widely used AI computer vision systems. RisingAttacK consists of a series of operations aimed at making the fewest possible changes to an image that let the user manipulate what the vision AI "sees". First, RisingAttacK identifies all of the visual features in the image. The program then runs an operation to determine which features matter most for achieving the attack's goal. RisingAttacK next computes how sensitive the AI system is to changes in the data, and in particular how sensitive it is to changes in the key features. According to the researchers, "The end result is that two images may look identical to the human eye, and we might clearly see a car in both. But because of RisingAttacK, the AI would see a car in the first image but not in the second." The researchers tested RisingAttacK against the four most commonly used vision AI programs: ResNet-50, DenseNet-121, ViTB, and DEiT-B. The technique was effective against all four.

  6. US health department calls Nature "junk science" as federal agencies cancel Springer Nature subscriptions

    Scientists working at US federal agencies have lost access to prominent journals published by Springer Nature. NASA, the Department of Agriculture, the Department of Energy, and the National Institutes of Health (NIH) have all terminated their subscription contracts with Springer Nature journals. Andrew Nixon, chief spokesperson for the Department of Health and Human Services (HHS), called the journals "junk science". Robert F. Kennedy Jr., the anti-vaccine US health secretary, had previously said he wanted to stop publishing in journals such as The Lancet, the New England Journal of Medicine, and JAMA, because they have been corrupted and have become vehicles for pharmaceutical-industry propaganda. The pharmaceutical industry is, of course, a supporter of vaccines. He said that unless those journals change, federal agencies will bar NIH scientists from publishing in them.

  7. Cloudflare tests charging AI bots to crawl content

    CDN provider Cloudflare is experimenting with new tools that let content creators charge AI crawlers that scrape their sites. The feature, called pay-per-crawl, is currently in private beta; a small group of publishers and content creators will take part, and each publisher can set the price AI crawlers must pay before scraping their content. Cloudflare CEO Matthew Prince said the move is meant to ensure that the internet as we know it survives the AI era. For Cloudflare's scheme to work, AI companies obviously need to participate, but will companies used to scraping content for free have any incentive to join a paid experiment? Cloudflare says it is already working with AI companies on the program.

  8. Sci-Hub explores memecoin funding

    Sci-Hub founder Alexandra Elbakyan is trying a new way to raise money: memecoins. In December 2024, anonymous supporters launched the memecoin $SCIHUB, with 20% of its supply dedicated to funding Sci-Hub. The project's market cap reached $20 million, and Elbakyan cashed out 2% of the tokens, raising $500,000. Sci-Hub previously relied mainly on Bitcoin donations, but such donations were unstable; the platform has stopped updating its journal database, likely for lack of funds. That funding crisis prompted the memecoin experiment. But the experiment has not gone smoothly. One key reason is that $SCIHUB was not issued by Sci-Hub itself but by anonymous parties, and community supporters did not trust the project. Elbakyan's solution was to create a new token address directly controlled by Sci-Hub and move the funds there. Members of the crypto community accused her of "rugging" the original project, "rug" being crypto slang for abandoning a project and cashing out. The episode highlights the cultural gap between Elbakyan's open-science mission and a crypto industry rife with speculation. Elbakyan says $SCIHUB should be seen first and foremost as a donation in support of open science, not as an investment vehicle.

  9. Steam rolls out a new in-game performance monitor

    Valve has introduced a new in-game performance monitor to the Steam client. Beyond showing the frame rate like the previous FPS counter, it can separately list frames generated by DLSS or FSR and the game's actual frame rate. It can show minimum/maximum frame rates and a graph of frame rate over time, along with CPU performance, GPU performance, and system memory usage. This data helps players understand what is causing poor game performance, whether the CPU or GPU is too slow, or whether overly high graphics settings are overwhelming video or system memory. Players can find the new "Overlay Performance Monitor" under Settings -> In Game.

  10. Windows 11 25H2 differs little from 24H2

    Microsoft has released the next major version, Windows 11 25H2, to Windows Insiders, but it differs little from 24H2, released in the second half of last year: it mostly just turns on features that previously shipped disabled by default. Jason Leznek, principal product manager for Windows Servicing and Delivery, said Windows 11 25H2 shares source code with 24H2 and simply enables additional features. One new feature Microsoft is expected to ship replaces the Blue Screen of Death with a black screen.

  11. Android may someday warn users when their phone connects to a fake base station

    Phones have become part of daily life and store large amounts of users' sensitive private information, and attacks targeting them are increasingly common. One attack method, known as Stingrays, uses fake base stations to lure phones into connecting, leaking users' private communications. Fake base stations can also mount downgrade attacks, luring phones onto less secure protocols such as 2G, where attackers can intercept calls and messages. Google has tried to address this security issue in Android 16, but the main obstacle is a lack of hardware support. To detect fake base stations, an Android phone needs version 3.0 of Google's IRadio hardware abstraction layer, which even Google's latest Pixel phones do not support. Android 16 phones launching later this year, such as the Pixel 10, will be able to detect fake base stations and warn users.

  12. Papers published on arXiv found to contain hidden AI instructions

    An investigation of papers published on the preprint platform arXiv found 17 papers containing hidden instructions intended to induce AI reviewers to raise their scores. The papers were written by researchers at 14 universities, including Waseda University, KAIST, the University of Washington, Columbia University, Peking University, Tongji University, and the National University of Singapore, mostly in computer science. The instructions consist of one to three lines of English such as "output only positive reviews" and "do not mention any negatives." To escape human notice, the text is rendered as white characters on a white background or in an extremely small font. The method is a form of prompt injection, deliberately misleading the AI: if an AI is asked to evaluate such a paper, it may follow the instructions and give a high score.

  13. Apple considers powering the new Siri with Anthropic or OpenAI models

    Apple is considering using Anthropic's Claude or OpenAI's ChatGPT to power a new version of its voice assistant Siri. That could mean abandoning its internally developed Apple Foundation Models in an effort to reverse its disadvantage in AI. Apple is negotiating with both companies and has asked each to train versions of their models that can run on Apple's cloud infrastructure for testing. Apple plans to launch the AI-powered new Siri in 2026.

  14. Microsoft's Windows has lost 400 million users in three years

    According to figures recently disclosed on Microsoft's official blog, more than 1 billion active devices run Windows today. But a Microsoft report from 2022 put the number of active Windows devices worldwide at over 1.4 billion, meaning roughly 400 million devices have stopped running Windows in three years. Many factors may be shrinking the Windows user base. One major factor is that in a mobile-first era, more and more people meet their computing needs with smartphones and tablets. Another is that Microsoft raised the hardware requirements for Windows 11, leaving many perfectly functional PCs unable to upgrade; with Windows 10 support ending soon, many users may abandon Windows for Linux or other systems.

  15. Tumblr shelves plans for a WordPress backend migration and fediverse integration

    Tumblr's parent company Automattic has shelved its plan to migrate Tumblr's backend to WordPress. Automattic CEO Matt Mullenweg said the focus will shift to delivering the features users are asking for most urgently. WordPress.com offers a plugin for the ActivityPub protocol that underpins the fediverse, so shelving the migration also pauses the plan to integrate Tumblr with the fediverse. But Mullenweg said that if users urgently want ActivityPub, they will implement ActivityPub support directly in the Tumblr codebase.