Weekly Digest — 2025-W27
168 unique stories (2025-06-30 → 2025-07-06), aggregated across 8 sources.
Hacker News (42)
- The New Skill in AI Is Not Prompting, It's Context Engineering (www.philschmid.de)
- Xfinity using WiFi signals in your house to detect motion (www.xfinity.com)
- Proton joins suit against Apple for predatory practices (proton.me)
- Ask HN: What's the 2025 stack for a self-hosted photo library with local AI?
- I write type-safe generic data structures in C (danielchasehooper.com)
- There are no new ideas in AI, only new datasets (blog.jxmo.io)
- Figma Files Registration Statement for Proposed Initial Public Offering (www.figma.com)
- PlanetScale for Postgres (planetscale.com)
- The Fed says this is a cube of $1M. They're off by half a million (calvin.sh)
- Ask HN: Who is hiring? (July 2025)
- Scientists identify culprit behind biggest-ever U.S. honey bee die-off (www.science.org)
- Show HN: Jobs by Referral: Find jobs in your LinkedIn network (jobsbyreferral.com)
GitHub Trending (30)
- GraphiteEditor / Graphite
A FOSS graphics editor for 2025: comprehensive 2D content creation tool for graphic design, digital art, and interactive real-time motion graphics — featuring node-based procedural editing
- twentyhq / twenty
Building a modern alternative to Salesforce, powered by the community.
- nextcloud / all-in-one
📦 The official Nextcloud installation method. Provides easy deployment and maintenance with most features included in this one Nextcloud instance.
- midday-ai / midday
Invoicing, Time tracking, File reconciliation, Storage, Financial Overview & your own Assistant made for Freelancers
- octra-labs / wallet-gen
- actualbudget / actual
A local-first personal finance app
- microsoft / generative-ai-for-beginners
21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
- confident-ai / deepeval
The LLM Evaluation Framework
- ColorlibHQ / AdminLTE
AdminLTE - Free admin dashboard template based on Bootstrap 5
- NanmiCoder / MediaCrawler
Crawlers for Xiaohongshu (RED) notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments
- zaidmukaddam / scira
Scira (Formerly MiniPerplx) is a minimalistic AI-powered search engine that helps you find information on the internet and cites it too. Powered by Vercel AI SDK! Search with models like xAI's Grok 3.
- microsoft / Mastering-GitHub-Copilot-for-Paired-Programming
A multi-module course teaching everything you need to know about using GitHub Copilot as an AI Peer Programming resource.
Product Hunt (41)
- Tabl 1.0
A multi-player web browser
- Pokecut
Use AI to create photos with just a few clicks or a prompt
- Jotform Presentation Agents
Create AI presentations that talk, listen, and answer
- Picsart Ignite 2.0: AI for Creators
Instantly generate branded assets, ads, videos, fonts + more
- Foxylingo
Chat and exchange languages with real people worldwide
- Bolt Connect
Embedded marketplace payouts, designed for developers
- Cursor Agents: Browsers & Mobile
Work with a powerful coding assistant anywhere
- Dynamic Mockups
Create realistic mockups at scale
- Rybbit
The open source Google Analytics replacement
- co.dev MCP
Turn your ideas into full-stack apps
- Handit.ai
The open-source engine that auto-improves your AI agents
- Folderly AI
AI-generated emails that hit the inbox and get replies
Hugging Face (27)
- Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute
Test-time compute has emerged as a powerful paradigm for improving the performance of large language models (LLMs), where generating multiple outputs or refining individual chains can significantly boost answer accuracy. However, existing methods like Best-of-N, majority voting, and self-reflection typically apply reasoning in a uniform way across inputs, overlooking the fact that different problems may require different levels of reasoning depth. In this work, we propose Fractional Reasoning, a training-free and model-agnostic framework that enables continuous control over reasoning intensity at inference time, going beyond the limitations of fixed instructional prompts. Our method operates by extracting the latent steering vector associated with deeper reasoning and reapplying it with a tunable scaling factor, allowing the model to tailor its reasoning process to the complexity of each input. This supports two key modes of test-time scaling: (1) improving output quality in breadth-based strategies (e.g., Best-of-N, majority voting), and (2) enhancing the correctness of individual reasoning chains in depth-based strategies (e.g., self-reflection). Experiments on GSM8K, MATH500, and GPQA demonstrate that Fractional Reasoning consistently improves performance across diverse reasoning tasks and models.
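The core mechanism is simple enough to sketch. Below is a minimal, hedged illustration of latent steering with a tunable scale, assuming a Hugging Face causal LM; the model name, prompts, layer choice, and scale are placeholder assumptions for illustration, not the authors' settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper evaluates larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def mean_hidden(prompt: str, layer: int = -1) -> torch.Tensor:
    # Mean hidden state of a prompt at the chosen layer.
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[layer].mean(dim=1)  # shape (1, d_model)

# Steering vector: direction from a neutral prompt toward a
# "deeper reasoning" prompt (both prompts are illustrative).
v = mean_hidden("Let's reason step by step, very carefully.") - mean_hidden("Answer:")

alpha = 1.5  # reasoning intensity; continuously tunable at inference time

def steer(module, inputs, output):
    # Add the scaled steering vector to the block's hidden states.
    if isinstance(output, tuple):
        return (output[0] + alpha * v,) + output[1:]
    return output + alpha * v

handle = model.transformer.h[-1].register_forward_hook(steer)  # GPT-2 layout
ids = tok("Q: 3 workers take 9 hours. How long do 9 workers take? A:",
          return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=40)[0]))
handle.remove()
```

Breadth-based test-time scaling then amounts to sweeping `alpha` (or sampling several outputs per `alpha`) and voting over the results.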
- Adaptive Domain Modeling with Language Models: A Multi-Agent Approach to Task Planning
We introduce TAPAS (Task-based Adaptation and Planning using AgentS), a multi-agent framework that integrates Large Language Models (LLMs) with symbolic planning to solve complex tasks without the need for manually defined environment models. TAPAS employs specialized LLM-based agents that collaboratively generate and adapt domain models, initial states, and goal specifications as needed using structured tool-calling mechanisms. Through this tool-based interaction, downstream agents can request modifications from upstream agents, enabling adaptation to novel attributes and constraints without manual domain redefinition. A ReAct (Reason+Act)-style execution agent, coupled with natural language plan translation, bridges the gap between dynamically generated plans and real-world robot capabilities. TAPAS demonstrates strong performance in benchmark planning domains and in the VirtualHome simulated real-world environment.
- Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation
Internal world models (WMs) enable agents to understand the world's state and predict transitions, serving as the basis for advanced deliberative reasoning. Recent large Vision-Language Models (VLMs), such as OpenAI o3, GPT-4o, and Gemini, exhibit potential as general-purpose WMs. While the latest studies have evaluated and shown limitations in specific capabilities such as visual understanding, a systematic evaluation of VLMs' fundamental WM abilities remains absent. Drawing on comparative psychology and cognitive science, we propose a two-stage framework that assesses Perception (visual, spatial, temporal, quantitative, and motion) and Prediction (mechanistic simulation, transitive inference, compositional inference) to provide an atomic evaluation of VLMs as WMs. Guided by this framework, we introduce WM-ABench, a large-scale benchmark comprising 23 fine-grained evaluation dimensions across 6 diverse simulated environments with controlled counterfactual simulations. Through 660 experiments on 15 of the latest commercial and open-source VLMs, we find that these models exhibit striking limitations in basic world modeling abilities. For instance, almost all models perform at near-random accuracy when distinguishing motion trajectories. Additionally, they lack disentangled understanding -- e.g., some models tend to believe blue objects move faster than green ones. Richer results and analyses reveal significant gaps between VLMs and human-level world modeling.
- Spatial Mental Modeling from Limited Views
Can Vision Language Models (VLMs) imagine the full scene from just a few views, like humans do? Humans form spatial mental models, internal representations of unseen space, to reason about layout, perspective, and motion. Our new MindCube benchmark with 21,154 questions across 3,268 images exposes this critical gap, where existing VLMs exhibit near-random performance. Using MindCube, we systematically evaluate how well VLMs build robust spatial mental models through representing positions (cognitive mapping), orientations (perspective-taking), and dynamics (mental simulation for "what-if" movements). We then explore three approaches to help VLMs approximate spatial mental models, including unseen intermediate views, natural language reasoning chains, and cognitive maps. The significant improvement comes from a synergistic approach, "map-then-reason", that jointly trains the model to first generate a cognitive map and then reason upon it. By training models to reason over these internal maps, we boosted accuracy from 37.8% to 60.8% (+23.0%). Adding reinforcement learning pushed performance even further to 70.7% (+32.9%). Our key insight is that such scaffolding of spatial mental models, actively constructing and utilizing internal structured spatial representations with flexible reasoning processes, significantly improves understanding of unobservable space.
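A hedged sketch of the "map-then-reason" pattern at inference time follows; note the paper trains this behavior into the model rather than merely prompting for it, and `ask_vlm` is a hypothetical stand-in for any chat-style VLM client.

```python
def ask_vlm(images, prompt: str) -> str:
    # Hypothetical client call; wire up your own VLM API here.
    raise NotImplementedError

def map_then_reason(images, question: str) -> str:
    # Stage 1: build an explicit, structured spatial map from the limited views.
    cog_map = ask_vlm(images,
        "From these views, output a JSON cognitive map: each object's "
        "approximate (x, y) position and the direction it faces.")
    # Stage 2: reason over the generated map, not the raw pixels alone.
    return ask_vlm(images,
        f"Cognitive map:\n{cog_map}\n\nUsing the map, answer: {question}")
```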
- SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning
Multimodal in-context learning (ICL) remains underexplored despite significant potential for domains such as medicine. Clinicians routinely encounter diverse, specialized tasks requiring adaptation from limited examples, such as drawing insights from a few relevant prior cases or considering a constrained set of differential diagnoses. While multimodal large language models (MLLMs) have shown advances in medical visual question answering (VQA), their ability to learn multimodal tasks from context is largely unknown. We introduce SMMILE, the first expert-driven multimodal ICL benchmark for medical tasks. Eleven medical experts curated problems, each including a multimodal query and multimodal in-context examples as task demonstrations. SMMILE encompasses 111 problems (517 question-image-answer triplets) covering 6 medical specialties and 13 imaging modalities. We further introduce SMMILE++, an augmented variant with 1038 permuted problems. A comprehensive evaluation of 15 MLLMs demonstrates that most models exhibit moderate to poor multimodal ICL ability in medical tasks. In open-ended evaluations, ICL contributes only an 8% average improvement over zero-shot on SMMILE and 9.4% on SMMILE++. We observe a susceptibility to irrelevant in-context examples: even a single noisy or irrelevant example can degrade performance by up to 9.5%. Moreover, example ordering exhibits a recency bias: placing the most relevant example last can lead to substantial performance improvements of up to 71%. Our findings highlight critical limitations and biases in current MLLMs when learning multimodal medical tasks from context.
- Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy
Recent advances in deep generative modeling have unlocked unprecedented opportunities for video synthesis. In real-world applications, however, users often seek tools to faithfully realize their creative editing intentions with precise and consistent control. Despite the progress achieved by existing methods, ensuring fine-grained alignment with user intentions remains an open and challenging problem. In this work, we present Shape-for-Motion, a novel framework that incorporates a 3D proxy for precise and consistent video editing. Shape-for-Motion achieves this by converting the target object in the input video to a time-consistent mesh, i.e., a 3D proxy, allowing edits to be performed directly on the proxy and then inferred back to the video frames. To simplify the editing process, we design a novel Dual-Propagation Strategy that allows users to perform edits on the 3D mesh of a single frame, and the edits are then automatically propagated to the 3D meshes of the other frames. The 3D meshes for different frames are further projected onto the 2D space to produce the edited geometry and texture renderings, which serve as inputs to a decoupled video diffusion model for generating edited results. Our framework supports various precise and physically-consistent manipulations across the video frames, including pose editing, rotation, scaling, translation, texture modification, and object composition. Our approach marks a key step toward high-quality, controllable video editing workflows. Extensive experiments demonstrate the superiority and effectiveness of our approach. Project page: https://shapeformotion.github.io/
- Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs
Fine-tuning pretrained LLMs has been shown to be an effective strategy for reaching state-of-the-art performance on specific tasks like machine translation. However, this process of adaptation often implies sacrificing general-purpose capabilities, such as conversational reasoning and instruction-following, hampering the utility of the system in real-world applications that require a mixture of skills. In this paper, we introduce Tower+, a suite of models designed to deliver strong performance across both translation and multilingual general-purpose text capabilities. We achieve a Pareto frontier between translation specialization and multilingual general-purpose capabilities by introducing a novel training recipe that builds on Tower (Alves et al., 2024), comprising continued pretraining, supervised fine-tuning, preference optimization, and reinforcement learning with verifiable rewards. At each stage of training, we carefully generate and curate data to strengthen performance on translation as well as general-purpose tasks involving code generation, mathematics problem solving, and general instruction-following. We develop models at multiple scales: 2B, 9B, and 72B. Our smaller models often outperform larger general-purpose open-weight and proprietary LLMs (e.g., Llama 3.3 70B, GPT-4o). Our largest model delivers best-in-class translation performance for high-resource languages and top results in multilingual Arena Hard evaluations and in IF-MT, a benchmark we introduce for evaluating both translation and instruction-following. Our findings highlight that it is possible to rival frontier models in general capabilities, while optimizing for specific business domains, such as translation and localization.
- MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation
Recent advances in optical flow estimation have prioritized accuracy at the cost of growing GPU memory consumption, particularly for high-resolution (FullHD) inputs. We introduce MEMFOF, a memory-efficient multi-frame optical flow method that identifies a favorable trade-off between multi-frame estimation and GPU memory usage. Notably, MEMFOF requires only 2.09 GB of GPU memory at runtime for 1080p inputs, and 28.5 GB during training, which uniquely positions our method to be trained at native 1080p without the need for cropping or downsampling. We systematically revisit design choices from RAFT-like architectures, integrating reduced correlation volumes and high-resolution training protocols alongside multi-frame estimation, to achieve state-of-the-art performance across multiple benchmarks while substantially reducing memory overhead. Our method outperforms more resource-intensive alternatives in both accuracy and runtime efficiency, validating its robustness for flow estimation at high resolutions. At the time of submission, our method ranks first on the Spring benchmark with a 1-pixel (1px) outlier rate of 3.289, leads Sintel (clean) with an endpoint error (EPE) of 0.963, and achieves the best Fl-all error on KITTI-2015 at 2.94%. The code is available at https://github.com/msu-video-group/memfof.
- Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography
Metalenses offer significant potential for ultra-compact computational imaging but face challenges from complex optical degradation and computational restoration difficulties. Existing methods typically rely on precise optical calibration or massive paired datasets, which are non-trivial for real-world imaging systems. Furthermore, a lack of control over the inference process often results in undesirable hallucinated artifacts. We introduce Degradation-Modeled Multipath Diffusion for tunable metalens photography, leveraging powerful natural image priors from pretrained models instead of large datasets. Our framework uses positive, neutral, and negative-prompt paths to balance high-frequency detail generation, structural fidelity, and suppression of metalens-specific degradation, alongside pseudo data augmentation. A tunable decoder enables controlled trade-offs between fidelity and perceptual quality. Additionally, a spatially varying degradation-aware attention (SVDA) module adaptively models complex optical and sensor-induced degradation. Finally, we design and build a millimeter-scale MetaCamera for real-world validation. Extensive results show that our approach outperforms state-of-the-art methods, achieving high-fidelity and sharp image reconstruction. More materials: https://dmdiff.github.io/.
- Listener-Rewarded Thinking in VLMs for Image Preferences
Training robust and generalizable reward models for human visual preferences is essential for aligning text-to-image and text-to-video generative models with human intent. However, current reward models often fail to generalize, and supervised fine-tuning leads to memorization, demanding complex annotation pipelines. While reinforcement learning (RL), specifically Group Relative Policy Optimization (GRPO), improves generalization, we uncover a key failure mode: a significant drop in reasoning accuracy occurs when a model's reasoning trace contradicts that of an independent, frozen vision-language model ("listener") evaluating the same output. To address this, we introduce a listener-augmented GRPO framework. Here, the listener re-evaluates the reasoner's chain-of-thought to provide a dense, calibrated confidence score, shaping the RL reward signal. This encourages the reasoner not only to answer correctly, but to produce explanations that are persuasive to an independent model. Our listener-shaped reward scheme achieves best accuracy on the ImageReward benchmark (67.4%), significantly improves out-of-distribution (OOD) performance on a large-scale human preference dataset (1.2M votes, up to +6% over naive reasoner), and reduces reasoning contradictions compared to strong GRPO and SFT baselines. These results demonstrate that listener-based rewards provide a scalable, data-efficient path to aligning vision-language models with nuanced human preferences. We will release our reasoning model here: https://huggingface.co/alexgambashidze/qwen2.5vl_image_preference_reasoner.
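A minimal sketch of how a listener's confidence could shape the GRPO reward signal follows; the convex blend and the within-group standardization below are illustrative assumptions, not the paper's exact recipe.

```python
def listener_shaped_reward(answer_correct: bool, listener_conf: float,
                           beta: float = 0.5) -> float:
    # listener_conf: probability a frozen "listener" VLM assigns to the
    # reasoner's answer after reading its chain-of-thought.
    task_reward = 1.0 if answer_correct else 0.0
    return (1.0 - beta) * task_reward + beta * listener_conf

def group_advantages(rewards: list[float]) -> list[float]:
    # GRPO: advantages are rewards standardized within a group of
    # responses sampled for the same prompt.
    mu = sum(rewards) / len(rewards)
    var = sum((r - mu) ** 2 for r in rewards) / len(rewards)
    sd = var ** 0.5 or 1.0  # guard against a zero-variance group
    return [(r - mu) / sd for r in rewards]

# Example: a correct-but-unpersuasive trace vs. a wrong-but-persuasive one.
rewards = [listener_shaped_reward(True, 0.2), listener_shaped_reward(False, 0.9)]
print(group_advantages(rewards))  # the persuasive-but-wrong trace still loses
```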
- RoboScape: Physics-informed Embodied World Model
World models have become indispensable tools for embodied intelligence, serving as powerful simulators capable of generating realistic robotic videos while addressing critical data scarcity challenges. However, current embodied world models exhibit limited physical awareness, particularly in modeling 3D geometry and motion dynamics, resulting in unrealistic video generation for contact-rich robotic scenarios. In this paper, we present RoboScape, a unified physics-informed world model that jointly learns RGB video generation and physics knowledge within an integrated framework. We introduce two key physics-informed joint training tasks: temporal depth prediction that enhances 3D geometric consistency in video rendering, and keypoint dynamics learning that implicitly encodes physical properties (e.g., object shape and material characteristics) while improving complex motion modeling. Extensive experiments demonstrate that RoboScape generates videos with superior visual fidelity and physical plausibility across diverse robotic scenarios. We further validate its practical utility through downstream applications including robotic policy training with generated data and policy evaluation. Our work provides new insights for building efficient physics-informed world models to advance embodied intelligence research. The code is available at: https://github.com/tsinghua-fib-lab/RoboScape.
- MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning
The ability to process information from multiple modalities and to reason through it step-by-step remains a critical challenge in advancing artificial intelligence. However, existing reasoning benchmarks focus on text-only reasoning, or employ multimodal questions that can be answered by directly retrieving information from a non-text modality. Thus, complex reasoning remains poorly understood in multimodal domains. Here, we present MARBLE, a challenging multimodal reasoning benchmark that is designed to scrutinize multimodal language models (MLLMs) in their ability to carefully reason step-by-step through complex multimodal problems and environments. MARBLE is composed of two highly challenging tasks, M-Portal and M-Cube, that require the crafting and understanding of multistep plans under spatial, visual, and physical constraints. We find that current MLLMs perform poorly on MARBLE -- all 12 advanced models obtain near-random performance on M-Portal and 0% accuracy on M-Cube. Only in simplified subtasks do some models outperform the random baseline, indicating that complex reasoning is still a challenge for existing MLLMs. Moreover, we show that perception remains a bottleneck, where MLLMs occasionally fail to extract information from the visual inputs. By shedding light on the limitations of MLLMs, we hope that MARBLE will spur the development of the next generation of models with the ability to reason and plan across many multimodal reasoning steps.
Solidot (28)
- Odds of asteroid 2024 YR4 hitting the Moon rise to 1 in 25
Asteroid 2024 YR4 is all but certain to miss Earth, but the probability of it striking the Moon in December 2032 has risen to 1 in 25. If the impact happens, it is expected to leave a new crater roughly 1 km across on the lunar surface. The Moon itself needs no defending, and the impact would not affect its orbit, but the ejecta could reach the geosynchronous-orbit region and pose an interference risk to some satellite systems -- a reminder that space defense should cover not just Earth but the entire Earth-Moon system.
- Black carbon record shows humans began large-scale use of fire 50,000 years ago
A team at the Institute of Oceanology, Chinese Academy of Sciences, working with German and French researchers, published a paper in PNAS reconstructing the fire history of northern East Asia over the past 300,000 years from black carbon in marine sediments. Combining this with records from Europe, East Asia, Southeast Asia, and Australia, plus large archaeological datasets, they found that large-scale use of fire by modern humans began about 50,000 years ago. Archaeological evidence places the earliest human use of fire at roughly 1.7 million years ago, but exactly when humans began using fire at scale has remained an open question. Black carbon is the family of carbon-bearing compounds produced by burning biomass and fossil fuels; its highly stable aromatic structure lets it persist in sedimentary environments, and in marginal seas fed mainly by large rivers, sedimentary black carbon largely reflects fire activity at continental scale. The study argues that during the glacial period 50,000 years ago, modern humans began their second migration out of Africa. Falling sea levels exposed large shelf areas of the Indo-Pacific Warm Pool as land and weakened the rainforest barrier, letting humans spread to East Asia, Southeast Asia, and Australia in under ten thousand years. The rapid population expansion greatly increased the frequency of fire use, while the cold glacial climate and relatively scarce food pushed demand for fire higher still. Together these factors made 50,000 years ago the turning point for large-scale human use of fire, suggesting humans may have left a deep mark on the global carbon cycle as early as the last glacial period.
- Studies find consumers distrust AI-branded products
Two studies find that consumers have low trust in products labeled as AI and low willingness to buy them. AI branding hurt product promotion, and the effect was especially pronounced for high-risk products but less so for low-risk ones. In one study, researchers split participants into two groups of roughly 100 people each. One group read ads for fictional products and services that highlighted terms like "AI" or "AI-powered"; the other read ads using phrases such as "new technology" or "equipped with cutting-edge technology". Participants who saw the AI-keyword ads reported being less likely to try or buy the products and services. The other study, by the market-research firm Parks Associates, was larger: of about 4,000 Americans surveyed, 18% said AI would make them more likely to buy, 24% said less likely, and 58% said AI made no difference.
- Canonical's 2024 revenue reaches $292 million
According to the 2024 financial report Canonical filed with UK Companies House, the Ubuntu maker's revenue reached $292 million in 2024, up from $251 million in 2023 and $205 million in 2022, with headcount reaching 1,175. By comparison, in 2014 Canonical's revenue was just $81 million with about 337 employees, and the company ran at a loss for years. It remains unclear when Canonical will IPO; as early as 2022 there were reports of a planned 2023 IPO.
- Study finds the Cretaceous ocean was ruled by squid
The conventional view held that marine life in the late Cretaceous, 100 to 70 million years ago, was dominated by ammonites and fish, but a Japanese team has found that the ocean of the time actually belonged to squid. Lacking shells and bones, squid rarely fossilize and had never been included in reconstructions of the Cretaceous marine ecosystem. The team developed a technique that grinds rock away layer by layer at a hundredth-of-a-millimeter resolution, photographing each layer to digitally reconstruct every fossil inside, down to the tiniest. From Cretaceous rocks across Hokkaido they identified 263 fossilized squid beaks, averaging about 4 mm in size.
- Japan debates a bill allowing married couples to keep separate surnames
Last month Japan's parliament failed to pass a bill for an optional separate-surname system that would let married couples keep different family names, even though polls show majority public support. Japan is the only country that legally requires married couples to share a single surname, and 95% of women take their husband's name. A study by the NGO Asuniwa argues that letting couples keep separate surnames could help raise the birth rate, since many partners would rather not marry than change names. The teachers Uchiyama Yukari and Koike Yuki, for example, have divorced and remarried three times to work around the law: they remain unmarried most of the time, but marry briefly to register the births of their children and then divorce again.
- Huawei releases open-weight models trained on its Ascend NPUs
Huawei has released open-weight models trained on its Ascend NPUs. The models are published on Gitcode under a license that prohibits use in the EU. The model, called Pangu Pro MoE, has 72 billion total parameters with 16 billion activated per token. It is optimized for the Ascend 300I Duo and 800I A2, reaching 1,148 tokens/s single-card inference and up to 1,528 tokens/s with speculative acceleration. Huawei researchers say that among models under 100 billion parameters, Pangu Pro MoE outperforms well-known open-weight models such as GLM-Z1-32B and Qwen3-32B.
- First American science refugees arrive in France
The first American science refugees fleeing the Trump administration have arrived in France. Aix-Marseille University (AMU) has brought in an initial group of 8 US scientists through its Safe Place for Science program. The scientists have not yet signed contracts with the university, and most requested anonymity so they can keep their US positions if they are not hired. Applicants to the program include a climate scientist named James and his wife, who studies the relationship between the justice system and democracy. James, who declined to give his surname, does not consider himself a refugee, but he is deeply worried about the future of academic research under Trump: his field has been targeted by the administration and faces funding cuts. AMU says that although it is little known outside France, 298 researchers from prominent US universities such as Stanford and Yale applied to the program, underscoring the urgency of the situation in the US.
- Inflammaging may be a product of industrialized lifestyles
Inflammation has long been considered a hallmark of aging, but a new study from Columbia University's Mailman School of Public Health suggests it may not be a universal human experience. The research indicates that inflammatory aging ("inflammaging") appears to be a byproduct of industrialized lifestyles, with significant variation across global populations. The researchers analyzed data from four groups: two industrialized populations and two non-industrialized Indigenous populations, the Tsimane of the Bolivian Amazon and the Orang Asli of peninsular Malaysia. The two industrialized groups showed similar inflammatory profiles, but the Indigenous groups did not: their inflammation levels were driven mainly by infection rather than age. Most chronic diseases, including diabetes, heart disease, and Alzheimer's, are rare or essentially absent in the Indigenous groups. The researchers found that about 66% of the Tsimane have at least one intestinal parasite infection, and more than 70% of the Orang Asli carry endemic infections. Inflammatory markers correlated closely with chronic disease in the industrialized groups but not in the Indigenous ones.
- GNU Health Hospital Information System 5.0 released
GNU Health Hospital Information System, free software for the healthcare sector, has released version 5.0. Major changes include improved reporting and analytics, more comprehensive handling of different types of patient information, a redesigned medical imaging subsystem, and improved insurance and billing features.
- RisingAttacK makes AI "see" whatever the attacker wants
Researchers have demonstrated a new way to attack AI computer-vision systems that lets them control what the AI "sees". The technique, called RisingAttacK, was shown to effectively manipulate all of the most widely used computer-vision systems. RisingAttacK is a sequence of operations aimed at making the smallest possible changes to an image that let an attacker manipulate what the vision AI "sees". It first identifies all of the visual features in an image, then determines which features matter most to the attack's goal. It then computes how sensitive the AI system is to changes in the data for those key features. As the researchers put it: "the end result is that two images may look identical to the human eye, and we might clearly see a car in both. But due to RisingAttacK, the AI would see a car in the first image but not in the second." The researchers tested RisingAttacK against four of the most widely used vision models: ResNet-50, DenseNet-121, ViT-B, and DeiT-B. It was effective against all four.
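The write-up above does not include RisingAttacK's algorithm, but the general class of attack it belongs to is easy to sketch: a gradient-based perturbation, bounded to stay imperceptible, that suppresses one class. The sketch below is a generic illustration of that idea, not the RisingAttacK method itself.

```python
import torch
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

def suppress_class(x: torch.Tensor, target: int,
                   steps: int = 50, eps: float = 2 / 255) -> torch.Tensor:
    # Push the target class logit down while an L-infinity budget (eps,
    # here in normalized-pixel units -- a simplification) keeps the
    # change imperceptible to humans.
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=1e-2)
    for _ in range(steps):
        loss = model(x + delta)[0, target]  # minimize the target logit
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # enforce the perturbation budget
    return (x + delta).detach()

# x: a normalized (1, 3, 224, 224) image; 817 is ImageNet "sports car".
# adv = suppress_class(x, target=817)  # model no longer "sees" the car
```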
- US health department calls Nature "junk science", cancels all Nature subscriptions
Scientists at US federal agencies have lost access to leading journals published by Springer Nature. NASA, the Department of Agriculture, the Department of Energy, and the National Institutes of Health (NIH) have all terminated their subscription contracts for Springer Nature journals. Andrew Nixon, chief spokesperson for the Department of Health and Human Services (HHS), called the journals junk science. Robert F. Kennedy Jr., the anti-vaccine US health secretary, had earlier said he would stop publishing in journals such as The Lancet, the New England Journal of Medicine, and JAMA because they are corrupt and have become vehicles for pharmaceutical-industry propaganda; the pharmaceutical industry is, of course, a supporter of vaccines. He said that unless the journals change, federal agencies would bar NIH scientists from publishing in them.