OrangeBot.AI Digest — 2025-07-01

68 headlines across 5 sources, aggregated for the day.

Hacker News (15)

  1. Figma Files Registration Statement for Proposed Initial Public Offering (www.figma.com)
  2. PlanetScale for Postgres (planetscale.com)
  3. The Fed says this is a cube of $1M. They're off by half a million (calvin.sh)
  4. Ask HN: Who is hiring? (July 2025)
  5. Scientists identify culprit behind biggest-ever U.S. honey bee die-off (www.science.org)
  6. Show HN: Jobs by Referral: Find jobs in your LinkedIn network (jobsbyreferral.com)
  7. Grammarly acquires Superhuman (www.reuters.com)
  8. Feasibility study of a mission to Sedna - Nuclear propulsion and solar sailing (arxiv.org)
  9. Show HN: Spegel, a Terminal Browser That Uses LLMs to Rewrite Webpages (simedw.com)
  10. Cloudflare to introduce pay-per-crawl for AI bots (blog.cloudflare.com)
  11. I built something that changed my friend group's social fabric (blog.danpetrolito.xyz)
  12. OpenFLOW – Quickly make beautiful infrastructure diagrams local to your machine (github.com)
  13. Aging-related inflammation is not universal across human populations (www.publichealth.columbia.edu)
  14. Genetic code enables zebrafish to mend damaged organs (www.caltech.edu)
  15. Show HN: A continuation of IRS Direct File that can be self-hosted (github.com)

GitHub Trending (14)

  1. microsoft / generative-ai-for-beginners

    21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/

  2. GraphiteEditor / Graphite

    An open source graphics editor for 2025: comprehensive 2D content creation tool for graphic design, digital art, and interactive real-time motion graphics — featuring node-based procedural editing

  3. confident-ai / deepeval

    The LLM Evaluation Framework

  4. octra-labs / wallet-gen
  5. ColorlibHQ / AdminLTE

    AdminLTE - Free admin dashboard template based on Bootstrap 5

  6. twentyhq / twenty

    Building a modern alternative to Salesforce, powered by the community.

  7. actualbudget / actual

    A local-first personal finance app

  8. NanmiCoder / MediaCrawler

    Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments

  9. swisskyrepo / PayloadsAllTheThings

    A list of useful payloads and bypasses for Web Application Security and Pentest/CTF

  10. TapXWorld / ChinaTextbook

    PDF textbooks for all primary, middle, high school, and university levels.

  11. The-Cool-Coders / Project-Ideas-And-Resources

    A Collection of application ideas that can be used to improve your coding skills ❤.

  12. onlook-dev / onlook

    The Cursor for Designers • An Open-Source Visual Vibecoding Editor • Visually build, style, and edit your React App with AI

  13. jnsahaj / tweakcn

    A visual no-code theme editor for shadcn/ui components

  14. snailyp / gemini-balance

    Gemini polling proxy service

Product Hunt (15)

  1. Cursor Agents: Browsers & Mobile

    Work with a powerful coding assistant anywhere

  2. Dynamic Mockups

    Create realistic mockups at scale

  3. Rybbit

    The open source Google Analytics replacement

  4. co.dev MCP

    Turn your ideas into full-stack apps

  5. Handit.ai

    The open-source engine that auto-improves your AI agents

  6. Folderly AI

    AI-generated emails that hit the inbox and get replies

  7. Lunacal

    Stunning scheduling pages like you’ve never seen before

  8. Well Extract

    AI-powered receipt & invoice extraction for developers

  9. Mindly

    Capture anything in seconds with automatic organization

  10. KwaKwa AI Course Creator

    Instantly create and sell online courses

  11. NanoAPI Cloud

    A dependency map of your code that shows where to refactor

  12. BeamUp

    Drag-and-drop files straight into your S3

  13. Minicule

    Use AI to visualize biomedical knowledge

  14. PrompTessor

    AI prompt analysis and optimization

  15. The Influencer AI

    Create consistent AI personas for photos and videos

Hugging Face (15)

  1. Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs

    Fine-tuning pretrained LLMs has been shown to be an effective strategy for reaching state-of-the-art performance on specific tasks like machine translation. However, this process of adaptation often implies sacrificing general-purpose capabilities, such as conversational reasoning and instruction-following, hampering the utility of the system in real-world applications that require a mixture of skills. In this paper, we introduce Tower+, a suite of models designed to deliver strong performance across both translation and multilingual general-purpose text capabilities. We achieve a Pareto frontier between translation specialization and multilingual general-purpose capabilities by introducing a novel training recipe that builds on Tower (Alves et al., 2024), comprising continued pretraining, supervised fine-tuning, preference optimization, and reinforcement learning with verifiable rewards. At each stage of training, we carefully generate and curate data to strengthen performance on translation as well as general-purpose tasks involving code generation, mathematics problem solving, and general instruction-following. We develop models at multiple scales: 2B, 9B, and 72B. Our smaller models often outperform larger general-purpose open-weight and proprietary LLMs (e.g., Llama 3.3 70B, GPT-4o). Our largest model delivers best-in-class translation performance for high-resource languages and top results in multilingual Arena Hard evaluations and in IF-MT, a benchmark we introduce for evaluating both translation and instruction-following. Our findings highlight that it is possible to rival frontier models in general capabilities, while optimizing for specific business domains, such as translation and localization.

  2. MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation

    Recent advances in optical flow estimation have prioritized accuracy at the cost of growing GPU memory consumption, particularly for high-resolution (FullHD) inputs. We introduce MEMFOF, a memory-efficient multi-frame optical flow method that identifies a favorable trade-off between multi-frame estimation and GPU memory usage. Notably, MEMFOF requires only 2.09 GB of GPU memory at runtime for 1080p inputs, and 28.5 GB during training, which uniquely positions our method to be trained at native 1080p without the need for cropping or downsampling. We systematically revisit design choices from RAFT-like architectures, integrating reduced correlation volumes and high-resolution training protocols alongside multi-frame estimation, to achieve state-of-the-art performance across multiple benchmarks while substantially reducing memory overhead. Our method outperforms more resource-intensive alternatives in both accuracy and runtime efficiency, validating its robustness for flow estimation at high resolutions. At the time of submission, our method ranks first on the Spring benchmark with a 1-pixel (1px) outlier rate of 3.289, leads Sintel (clean) with an endpoint error (EPE) of 0.963, and achieves the best Fl-all error on KITTI-2015 at 2.94%. The code is available at https://github.com/msu-video-group/memfof.

  3. Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography

    Metalenses offer significant potential for ultra-compact computational imaging but face challenges from complex optical degradation and computational restoration difficulties. Existing methods typically rely on precise optical calibration or massive paired datasets, which are non-trivial for real-world imaging systems. Furthermore, a lack of control over the inference process often results in undesirable hallucinated artifacts. We introduce Degradation-Modeled Multipath Diffusion for tunable metalens photography, leveraging powerful natural image priors from pretrained models instead of large datasets. Our framework uses positive, neutral, and negative-prompt paths to balance high-frequency detail generation, structural fidelity, and suppression of metalens-specific degradation, alongside pseudo data augmentation. A tunable decoder enables controlled trade-offs between fidelity and perceptual quality. Additionally, a spatially varying degradation-aware attention (SVDA) module adaptively models complex optical and sensor-induced degradation. Finally, we design and build a millimeter-scale MetaCamera for real-world validation. Extensive results show that our approach outperforms state-of-the-art methods, achieving high-fidelity and sharp image reconstruction. More materials: https://dmdiff.github.io/.
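
    The abstract does not spell out how the three prompt paths are combined; one plausible reading is a classifier-free-guidance-style mixture, sketched below with hypothetical weights `w_pos` and `w_neg` (the paper's actual combination rule may differ).

    ```python
    import torch

    def multipath_guidance(eps_pos: torch.Tensor, eps_neu: torch.Tensor,
                           eps_neg: torch.Tensor,
                           w_pos: float = 3.0, w_neg: float = 1.5) -> torch.Tensor:
        """Combine the denoiser's noise predictions under the positive,
        neutral, and negative prompts: push toward detail generation and
        away from metalens-specific degradation, anchored at neutral."""
        return eps_neu + w_pos * (eps_pos - eps_neu) - w_neg * (eps_neg - eps_neu)
    ```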

  4. Listener-Rewarded Thinking in VLMs for Image Preferences

    Training robust and generalizable reward models for human visual preferences is essential for aligning text-to-image and text-to-video generative models with human intent. However, current reward models often fail to generalize, and supervised fine-tuning leads to memorization, demanding complex annotation pipelines. While reinforcement learning (RL), specifically Group Relative Policy Optimization (GRPO), improves generalization, we uncover a key failure mode: a significant drop in reasoning accuracy occurs when a model's reasoning trace contradicts that of an independent, frozen vision-language model ("listener") evaluating the same output. To address this, we introduce a listener-augmented GRPO framework. Here, the listener re-evaluates the reasoner's chain-of-thought to provide a dense, calibrated confidence score, shaping the RL reward signal. This encourages the reasoner not only to answer correctly, but to produce explanations that are persuasive to an independent model. Our listener-shaped reward scheme achieves best accuracy on the ImageReward benchmark (67.4%), significantly improves out-of-distribution (OOD) performance on a large-scale human preference dataset (1.2M votes, up to +6% over naive reasoner), and reduces reasoning contradictions compared to strong GRPO and SFT baselines. These results demonstrate that listener-based rewards provide a scalable, data-efficient path to aligning vision-language models with nuanced human preferences. We will release our reasoning model here: https://huggingface.co/alexgambashidze/qwen2.5vl_image_preference_reasoner.
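
    As a rough illustration of the listener-shaped reward, the sketch below mixes task correctness with the frozen listener's confidence in the reasoner's answer; `alpha` is a hypothetical mixing weight, not a value from the paper.

    ```python
    def listener_shaped_reward(answer_correct: bool,
                               listener_confidence: float,
                               alpha: float = 0.5) -> float:
        """listener_confidence is assumed to be the probability the frozen
        listener VLM assigns to the reasoner's answer after reading its
        chain-of-thought. Dense shaping: correct answers score higher when
        the explanation also persuades an independent model."""
        correctness = 1.0 if answer_correct else 0.0
        return (1.0 - alpha) * correctness + alpha * listener_confidence
    ```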

  5. RoboScape: Physics-informed Embodied World Model

    World models have become indispensable tools for embodied intelligence, serving as powerful simulators capable of generating realistic robotic videos while addressing critical data scarcity challenges. However, current embodied world models exhibit limited physical awareness, particularly in modeling 3D geometry and motion dynamics, resulting in unrealistic video generation for contact-rich robotic scenarios. In this paper, we present RoboScape, a unified physics-informed world model that jointly learns RGB video generation and physics knowledge within an integrated framework. We introduce two key physics-informed joint training tasks: temporal depth prediction that enhances 3D geometric consistency in video rendering, and keypoint dynamics learning that implicitly encodes physical properties (e.g., object shape and material characteristics) while improving complex motion modeling. Extensive experiments demonstrate that RoboScape generates videos with superior visual fidelity and physical plausibility across diverse robotic scenarios. We further validate its practical utility through downstream applications including robotic policy training with generated data and policy evaluation. Our work provides new insights for building efficient physics-informed world models to advance embodied intelligence research. The code is available at: https://github.com/tsinghua-fib-lab/RoboScape.
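
    A minimal sketch of what the joint objective could look like, assuming simple per-task losses and hypothetical weights `w_depth` and `w_kp` (the paper's exact formulation is not given in the abstract):

    ```python
    import torch.nn.functional as F

    def physics_informed_loss(pred_rgb, target_rgb, pred_depth, target_depth,
                              pred_kp, target_kp, w_depth=0.5, w_kp=0.5):
        """RGB video generation loss plus the two physics-informed auxiliary
        tasks named in the abstract: temporal depth prediction and keypoint
        dynamics learning."""
        loss_rgb = F.mse_loss(pred_rgb, target_rgb)        # video rendering
        loss_depth = F.l1_loss(pred_depth, target_depth)   # 3D geometric consistency
        loss_kp = F.mse_loss(pred_kp, target_kp)           # motion dynamics
        return loss_rgb + w_depth * loss_depth + w_kp * loss_kp
    ```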

  6. MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning

    The ability to process information from multiple modalities and to reason through it step-by-step remains a critical challenge in advancing artificial intelligence. However, existing reasoning benchmarks focus on text-only reasoning, or employ multimodal questions that can be answered by directly retrieving information from a non-text modality. Thus, complex reasoning remains poorly understood in multimodal domains. Here, we present MARBLE, a challenging multimodal reasoning benchmark that is designed to scrutinize multimodal language models (MLLMs) in their ability to carefully reason step-by-step through complex multimodal problems and environments. MARBLE is composed of two highly challenging tasks, M-Portal and M-Cube, that require the crafting and understanding of multistep plans under spatial, visual, and physical constraints. We find that current MLLMs perform poorly on MARBLE: all 12 advanced models obtain near-random performance on M-Portal and 0% accuracy on M-Cube. Only on simplified subtasks do some models outperform the random baseline, indicating that complex reasoning is still a challenge for existing MLLMs. Moreover, we show that perception remains a bottleneck, with MLLMs occasionally failing to extract information from the visual inputs. By shedding light on the limitations of MLLMs, we hope that MARBLE will spur the development of the next generation of models able to reason and plan across many multimodal reasoning steps.

  7. Ovis-U1 Technical Report

    In this report, we introduce Ovis-U1, a 3-billion-parameter unified model that integrates multimodal understanding, text-to-image generation, and image editing capabilities. Building on the foundation of the Ovis series, Ovis-U1 incorporates a diffusion-based visual decoder paired with a bidirectional token refiner, enabling image generation tasks comparable to leading models like GPT-4o. Unlike some previous models that use a frozen MLLM for generation tasks, Ovis-U1 utilizes a new unified training approach starting from a language model. Compared to training solely on understanding or generation tasks, unified training yields better performance, demonstrating the enhancement achieved by integrating these two tasks. Ovis-U1 achieves a score of 69.6 on the OpenCompass Multi-modal Academic Benchmark, surpassing recent state-of-the-art models such as Ristretto-3B and SAIL-VL-1.5-2B. In text-to-image generation, it excels with scores of 83.72 and 0.89 on the DPG-Bench and GenEval benchmarks, respectively. For image editing, it achieves 4.00 and 6.42 on the ImgEdit-Bench and GEdit-Bench-EN, respectively. As the initial version of the Ovis unified model series, Ovis-U1 pushes the boundaries of multimodal understanding, generation, and editing.

  8. UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding

    Urban research involves a wide range of scenarios and tasks that require the understanding of multi-modal data. Current methods often focus on specific data types and lack a unified framework in the urban field for processing them comprehensively. The recent success of multi-modal large language models (MLLMs) presents a promising opportunity to overcome this limitation. In this paper, we introduce UrbanLLaVA, a multi-modal large language model designed to process multiple types of urban data simultaneously and achieve strong performance across diverse urban tasks compared with general MLLMs. In UrbanLLaVA, we first curate a diverse urban instruction dataset encompassing both single-modal and cross-modal urban data, spanning from the location view to the global view of the urban environment. Additionally, we propose a multi-stage training framework that decouples spatial reasoning enhancement from domain knowledge learning, thereby improving the compatibility and downstream performance of UrbanLLaVA across diverse urban tasks. Finally, we also extend an existing benchmark for urban research to assess the performance of MLLMs across a wide range of urban tasks. Experimental results from three cities demonstrate that UrbanLLaVA outperforms open-source and proprietary MLLMs in both single-modal tasks and complex cross-modal tasks, and shows robust generalization abilities across cities. Source code and data are openly accessible to the research community via https://github.com/tsinghua-fib-lab/UrbanLLaVA.

  9. VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs

    In this paper, we introduce a simple training-free technique that improves the performance of drafter-based speculative decoding (SpD) methods by restructuring the language modeling head (LM head) used during drafting. Drafter-based speculative decoding leverages one or more smaller language models, a.k.a. drafters or draft models, to sample a draft sequence or tree of multiple tokens, which a base LLM (the target model) then verifies, accepting a subset as its valid generation. Because speculative decoding is usually assumed to require a one-to-one mapping between the vocabularies of the target and draft models, it has been natural to share the vocabulary between them, or even to share the LM head as in EAGLE or Medusa. We first identify that this draft token sampling scheme carries unnecessary inference overhead in drafting, especially for target LLMs with very large vocabularies. We then propose a simple technique, VocabTrim, that mitigates this overhead to improve generation speed in memory-bound environments. VocabTrim reconstructs the drafter LM head to contain only a limited set of tokens, selected as those most frequently sampled from the target model's vocabulary. While limiting the drafting vocabulary slightly degrades the acceptance rate, it significantly reduces drafting latency in memory-bound settings, which are common on edge devices, resulting in a higher memory-bound speed-up (MBSU). We show that our method boosts the memory-bound speed-up for Llama-3 models on Spec-Bench, by 16% for Llama-3.2-3B-Instruct.
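
    A minimal sketch of the head-trimming step as the abstract describes it: rebuild the drafter's LM head over only the most frequently sampled target-model tokens. The function below is illustrative (PyTorch, with hypothetical `token_counts` statistics gathered offline), not the paper's implementation.

    ```python
    import torch
    import torch.nn as nn

    def trim_lm_head(lm_head: nn.Linear, token_counts: torch.Tensor, keep: int):
        """Rebuild a drafter LM head over the `keep` most frequently sampled
        tokens. token_counts[i] holds the observed frequency of token i in
        target-model outputs; kept_ids maps trimmed logits back to the full
        vocabulary during verification."""
        kept_ids = torch.topk(token_counts, keep).indices.sort().values
        trimmed = nn.Linear(lm_head.in_features, keep,
                            bias=lm_head.bias is not None)
        with torch.no_grad():
            trimmed.weight.copy_(lm_head.weight[kept_ids])
            if lm_head.bias is not None:
                trimmed.bias.copy_(lm_head.bias[kept_ids])
        return trimmed, kept_ids
    ```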

  10. ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing

    While end-to-end video-to-audio generation has greatly improved, producing high-fidelity audio that authentically captures the nuances of visual content remains challenging. Like professionals in the creative industries, such generation requires sophisticated reasoning about items such as visual dynamics, acoustic environments, and temporal relationships. We present ThinkSound, a novel framework that leverages Chain-of-Thought (CoT) reasoning to enable stepwise, interactive audio generation and editing for videos. Our approach decomposes the process into three complementary stages: foundational foley generation that creates semantically coherent soundscapes, interactive object-centric refinement through precise user interactions, and targeted editing guided by natural language instructions. At each stage, a multimodal large language model generates contextually aligned CoT reasoning that guides a unified audio foundation model. Furthermore, we introduce AudioCoT, a comprehensive dataset with structured reasoning annotations that establishes connections between visual content, textual descriptions, and sound synthesis. Experiments demonstrate that ThinkSound achieves state-of-the-art performance in video-to-audio generation across both audio metrics and CoT metrics and excels in out-of-distribution Movie Gen Audio benchmark. The demo page is available at https://ThinkSound-Project.github.io.

  11. SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

    Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods, such as QLoRA and DoRA, reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost. In some cases, they may even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We propose a lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset of weights for loss and gradient computation. We also systematically analyze and address sensitivity across layers, tokens, and training steps. Our experimental results show that SparseLoRA reduces computational cost by up to 2.2x with a measured speedup of up to 1.6x, while maintaining accuracy across various downstream tasks, including commonsense and arithmetic reasoning, code generation, and instruction following.
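
    A toy version of a contextual, training-free channel selector in the spirit of the SVD sparsity estimator: score output channels through a cheap low-rank surrogate of the weight matrix and keep the top fraction. `rank` and `keep_frac` are hypothetical knobs, and this is not the paper's estimator.

    ```python
    import torch

    def svd_channel_selector(weight: torch.Tensor, x: torch.Tensor,
                             rank: int = 8, keep_frac: float = 0.5) -> torch.Tensor:
        """weight: (out, in) linear layer; x: (batch, in) current context.
        Scores channels via rank-`rank` SVD factors instead of the full
        dense matmul, then returns indices of channels to keep."""
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)  # precomputable
        scores = ((x @ Vh[:rank].T) * S[:rank]) @ U[:, :rank].T   # (batch, out)
        scores = scores.abs().mean(dim=0)                         # per-channel saliency
        k = max(1, int(keep_frac * weight.shape[0]))
        return torch.topk(scores, k).indices
    ```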

  12. Teaching a Language Model to Speak the Language of Tools

    External tool integration through function-calling is essential for practical language model applications, yet most multilingual models lack reliable tool-use capabilities in non-English languages. Even state-of-the-art multilingual models struggle with determining when to use tools and generating the structured outputs required for function calls, often exhibiting language confusion when prompted in lower-resource languages. This work presents a methodology for adapting existing language models to enable robust tool use in any target language, using Bulgarian as a case study. The approach involves continued training of the BgGPT model series (2.6B, 9B, 27B parameters) on a novel bilingual dataset of 10,035 function-calling examples designed to support standardized protocols like MCP (Model Context Protocol). The research introduces TUCAN (Tool-Using Capable Assistant Navigator), which achieves up to 28.75% improvement in function-calling accuracy over base models while preserving core language understanding, as verified on established Bulgarian benchmarks. Beyond accuracy gains, TUCAN models demonstrate production-ready response formatting with clean, parsable function calls, contrasting with the verbose and inconsistent outputs of base models. The models, evaluation framework, and dataset are released to enable replication for other languages. This work demonstrates a practical approach for extending tool-augmented capabilities beyond English-centric systems.
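
    For context, "clean, parsable function calls" typically means output that deserializes directly into a tool invocation. The snippet below is a hypothetical example of such output; the exact schema TUCAN emits is not specified in the abstract.

    ```python
    import json

    # Hypothetical model output: a single JSON object naming the tool and
    # its arguments, with no surrounding prose to strip away.
    model_output = '{"name": "get_weather", "arguments": {"city": "София"}}'

    call = json.loads(model_output)          # parses without any cleanup
    print(call["name"], call["arguments"])   # -> get_weather {'city': 'София'}
    ```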

  13. SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

    Recent advances in reinforcement learning have shown that language models can develop sophisticated reasoning through training on tasks with verifiable rewards, but these approaches depend on human-curated problem-answer pairs and domain-specific reward engineering. We introduce SPIRAL, a self-play framework where models learn by playing multi-turn, zero-sum games against continuously improving versions of themselves, eliminating the need for human supervision. Through self-play, SPIRAL generates an infinite curriculum of progressively challenging problems, as models must constantly adapt to stronger opponents. To enable this self-play training at scale, we implement a fully online, multi-turn, multi-agent reinforcement learning system for LLMs and propose role-conditioned advantage estimation (RAE) to stabilize multi-agent training. Using SPIRAL, self-play on zero-sum games produces reasoning capabilities that transfer broadly. Training Qwen3-4B-Base on Kuhn Poker alone achieves an 8.6% improvement on math and 8.4% on general reasoning, outperforming SFT on 25,000 expert game trajectories. Analysis reveals that this transfer occurs through three cognitive patterns: systematic decomposition, expected value calculation, and case-by-case analysis. Multi-game training (TicTacToe, Kuhn Poker, Simple Negotiation) further enhances performance as each game develops distinct reasoning strengths. Applying SPIRAL to a strong reasoning model (DeepSeek-R1-Distill-Qwen-7B) can still lead to a 2.0% average improvement. These results demonstrate that zero-sum games naturally develop transferable reasoning capabilities, highlighting a promising direction for autonomous reasoning development.
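
    A minimal sketch of role-conditioned advantage estimation as the abstract names it: baseline each episode return against the running mean for that player role, so the two sides of a zero-sum game are not collapsed into one noisy baseline. The exact estimator in the paper may differ.

    ```python
    from collections import defaultdict

    def role_conditioned_advantages(episodes):
        """episodes: list of (role, episode_return) pairs, e.g.
        [("player_0", 1.0), ("player_1", -1.0), ...]. Returns the same
        pairs with per-role, mean-baselined advantages."""
        totals, counts = defaultdict(float), defaultdict(int)
        for role, ret in episodes:
            totals[role] += ret
            counts[role] += 1
        baselines = {r: totals[r] / counts[r] for r in totals}
        return [(role, ret - baselines[role]) for role, ret in episodes]
    ```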

  14. Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective

    We propose a novel prompt design paradigm that challenges conventional wisdom in large language model (LLM) prompting. While conventional wisdom prioritizes well-crafted instructions and demonstrations for in-context learning (ICL), we show that pruning random demonstrations into seemingly incoherent "gibberish" can remarkably improve performance across diverse tasks. Notably, the "gibberish" always matches or surpasses state-of-the-art automatic prompt optimization techniques, achieving substantial gains regardless of LLM alignment. Nevertheless, discovering an effective pruning strategy is non-trivial: existing attribution methods and prompt compression algorithms fail to deliver robust results, let alone human intuition. To this end, we propose PromptQuine, a self-discovering prompt optimization framework that uses evolutionary search to find effective pruning strategies on its own in low-data regimes. Much like the emergent complexity in nature, such as symbiosis and self-organization, that arises in response to resource constraints, our framework evolves and refines unconventional yet highly effective prompts by leveraging only the tokens present within the context. We demonstrate its effectiveness across classification, multiple-choice question answering, generation, and math reasoning tasks and across LLMs, while achieving decent runtime efficiency. We hope our findings can guide mechanistic studies of in-context learning and serve as a call to action, paving the way for more open-ended search algorithms for more effective LLM prompting.
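
    A bare-bones evolutionary loop in the spirit described above: evolve keep/drop masks over the demonstration tokens and retain whatever pruned "gibberish" scores best on a small dev set. `fitness` is an assumed user-supplied metric, and all hyperparameters here are hypothetical.

    ```python
    import random

    def evolve_pruned_prompt(tokens, fitness, generations=50, pop=16, p_flip=0.1):
        """tokens: the in-context demonstration tokens; fitness: callable
        mapping a token list to a dev-set score. Evolves binary keep/drop
        masks by mutation plus truncation selection."""
        def mutate(mask):
            return [b if random.random() > p_flip else 1 - b for b in mask]

        population = [[1] * len(tokens)]  # start from the full prompt
        for _ in range(generations):
            population = [mutate(random.choice(population)) for _ in range(pop)]
            population.sort(
                key=lambda m: fitness([t for t, b in zip(tokens, m) if b]),
                reverse=True)
            population = population[: max(2, pop // 4)]  # keep the fittest
        return [t for t, b in zip(tokens, population[0]) if b]
    ```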

  15. Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention

    Depth images captured by Time-of-Flight (ToF) sensors are prone to noise, requiring denoising for reliable downstream applications. Previous works either focus on single-frame processing, or perform multi-frame processing without considering depth variations at corresponding pixels across frames, leading to undesirable temporal inconsistency and spatial ambiguity. In this paper, we propose a novel ToF depth denoising network leveraging motion-invariant graph fusion to simultaneously enhance temporal stability and spatial sharpness. Specifically, despite depth shifts across frames, graph structures exhibit temporal self-similarity, enabling cross-frame geometric attention for graph fusion. Then, by incorporating an image smoothness prior on the fused graph and a data fidelity term derived from the ToF noise distribution, we formulate a maximum a posteriori problem for ToF denoising. Finally, the solution is unrolled into iterative filters whose weights are adaptively learned from the graph-informed geometric attention, producing a high-performance yet interpretable network. Experimental results demonstrate that the proposed scheme achieves state-of-the-art performance in terms of accuracy and consistency on the synthetic DVToF dataset and exhibits robust generalization on the real Kinectv2 dataset. Source code will be released at https://github.com/davidweidawang/GIGA-ToF.
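
    In generic form, the maximum a posteriori problem described above pairs a data-fidelity term with a graph-Laplacian smoothness prior on the fused graph; a schematic version follows (the paper's exact terms may differ, and the fidelity term would be weighted by the actual ToF noise model):

    ```latex
    \hat{x} = \arg\min_{x} \; \|y - x\|_2^2 \; + \; \lambda \, x^{\top} L x
    % y: noisy ToF depth, x: denoised depth, L: Laplacian of the fused
    % cross-frame graph, \lambda: weight of the smoothness prior.
    ```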

Solidot (9)

  1. Odds of asteroid 2024 YR4 hitting the Moon rise to 1 in 25

    Asteroid 2024 YR4 is now essentially certain not to hit Earth, but the probability that it strikes the Moon in December 2032 has risen to 1 in 25. If the impact happens, it is expected to carve a new crater roughly 1 km across on the lunar surface. The Moon itself needs no defending, and the impact would not affect its orbit, but the ejecta could reach geosynchronous-orbit altitudes and pose an interference risk to some satellite systems. It is a reminder that space defense should cover not just Earth but the safety of the entire Earth-Moon system.

  2. Carbon record shows humans began using fire at scale 50,000 years ago

    A team at the Institute of Oceanology, Chinese Academy of Sciences, working with researchers in Germany and France, published a paper in PNAS that reconstructs the fire history of northern East Asia over the past 300,000 years from black-carbon records in marine sediments. Combining these with records from Europe, East Asia, Southeast Asia, and Australia, together with large archaeological-site datasets, they found that large-scale use of fire by modern humans began about 50,000 years ago. Archaeological evidence traces the earliest human use of fire to roughly 1.7 million years ago, but exactly when humans began using fire at scale has remained an open question. Black carbon is the collective term for the carbon-bearing compounds produced when biomass and fossil fuels burn; its highly stable aromatic structure lets it persist in sediments over long timescales, and in marginal seas fed mainly by large rivers, sedimentary black carbon largely reflects continental-scale fire activity. The study argues that during the glacial period 50,000 years ago, modern humans began their second migration out of Africa. Falling glacial sea levels exposed large areas of the Indo-Pacific Warm Pool's continental shelf as dry land and weakened the rainforest barrier, letting humans spread to East Asia, Southeast Asia, and even Australia in under ten thousand years. This rapid population expansion greatly increased the frequency of fire use, and the cold glacial climate and relative scarcity of food pushed demand for fire up further. Together these factors made 50,000 years ago the turning point for large-scale human fire use, and suggest that humans may already have left a deep mark on the global carbon cycle through fire during the last glacial period.

  3. Studies find consumers have little trust in AI-branded products

    Two studies find that consumers have low trust in products marketed as AI and low willingness to buy them. AI branding hurt product promotion, an effect that was especially pronounced for high-risk products and less evident for low-risk ones. In one study, researchers split participants into two groups of roughly 100 each. One group read ads for fictitious products and services that highlighted features such as "AI" or "AI-powered"; the other read ads that used terms like "new technology" or "equipped with cutting-edge technology". Participants who read the AI-keyword ads reported a lower likelihood of trying or buying the products and services than the other group. The second study, by market research firm Parks Associates, was larger: of about 4,000 Americans surveyed, 18% said AI would make them more likely to buy, 24% said less likely, and 58% said AI made no difference.

  4. Canonical posts $292 million in 2024 revenue

    According to the 2024 accounts Canonical filed with UK Companies House, the developer of the Ubuntu distribution booked $292 million in revenue for 2024, up from $251 million in 2023 and $205 million in 2022, with headcount reaching 1,175. By comparison, in 2014 Canonical had revenue of just $81 million and about 337 employees, and was running sustained losses. It remains unclear when Canonical will go public; reports of a 2023 IPO circulated as early as 2022.

  5. Study finds Cretaceous oceans were "ruled by squid"

    The prevailing view has been that marine life in the later Cretaceous, 100 to 70 million years ago, was dominated by ammonites and fish, but a Japanese research team has found that the oceans of the time actually belonged to squid. Because squid have no external shells or bones, they rarely survive as fossils, and they had never been incorporated into reconstructions of Cretaceous marine ecosystems. The team developed a new technique that grinds rock away layer by layer at a precision of one hundredth of a millimeter, photographing each layer to digitally reconstruct every fossil inside in three dimensions, down to the tiniest specimens. From Cretaceous rocks across Hokkaido they identified 263 fossilized squid beaks, averaging about 4 mm in size.

  6. Japan's contested separate-surname bill

    Last month Japan's parliament declined to pass a bill creating an optional separate-surname system that would let married couples keep different family names, even though polls show most of the public supports it. Japan is the only country whose law requires married couples to use a single surname, and 95% of women take their husband's name. A study by the NGO Asuniwa argues that letting couples keep separate surnames could help raise the birth rate, since many partners would rather not marry at all than change their names. The teachers Uchiyama Yukari and Koike Yuki, for example, have divorced and remarried three times to work around the law: they remain unmarried most of the time, but marry briefly to register their children's births and then divorce again.

  7. Chinese graphic designers confront AI image generators

    Chinese graphic designers are feeling the impact of AI image generators on their daily work. The tools mimic artistic styles with ease and have profoundly changed how clients view designers' output. An anonymous employee at a large e-commerce platform said that even before AI image generators took off, graphic designers at tech giants and large companies were being told to copy competitors or reproduce work seen on social media. To replicate a distinctive artistic style, a human has to understand and reverse-engineer it; an AI image generator simply introduces random variation into the style, and the result can look very much like a copy, errors included, which a human designer can then edit into a finished product. Those who don't embrace AI, the employee said, feel all too replaceable. Sendi Jia, a designer who runs studios in Beijing and London, said AI image generators are forcing designers and clients to rethink where a designer's value lies: merely in producing designs, or in consulting, creativity, strategy, direction, and aesthetics? Beijing graphic designer Erbing said AI cannot produce anything unique: "Every project poses different problems; designers exist to solve specific problems, not to churn out cookie-cutter visuals." He said the thinking behind a project often takes longer than the actual execution, and he regards AI image generators as a toy rather than a tool. Still, designers concede that the AI frenzy has dented clients' perception of the value of their work. Clients now expect designers to deliver for less money in less time, which can drag quality down. Erbing said some clients reason that if AI boosts efficiency their budgets can be halved, but a designer's job is not just making pictures.

  8. Bcachefs may be dropped from the Linux kernel

    Amid an ongoing dispute with maintainer Kent Overstreet, the creator of Linux has again threatened to remove the Bcachefs filesystem from the kernel. In a recent pull-request comment, Linus Torvalds said he may part ways with Bcachefs during the 6.17 merge window, citing a deep divergence in development philosophy: he said he cannot even raise questions about Bcachefs bug fixes, as if he were expected to pull code exactly as Overstreet demands, and that the only consensus the two reached after arguing was "we're done".

  9. Germany asks Apple and Google to pull DeepSeek

    Germany's Federal Commissioner for Data Protection and Freedom of Information, Meike Kamp, said on Friday that she has formally asked Apple and Google to remove the app of Chinese AI company DeepSeek from the German App Store and Google Play, because the company has failed to demonstrate that its data processing meets EU standards and is suspected of unlawfully transferring German users' personal data to China. According to DeepSeek's privacy policy, the app stores users' AI queries, uploaded files, and other personal information on servers in China. Earlier this year German regulators asked DeepSeek either to meet EU requirements for cross-border data transfers or to withdraw the app voluntarily; DeepSeek did not respond, so Kamp initiated the formal removal process.