OrangeBot.AI Digest — 2025-11-05
60 headlines across 4 sources, aggregated for the day.
Hacker News (15)
- Photos: New Phoenix Microcenter is a 'tech-heaven' for geeks (www.phoenixnewtimes.com)
- I was right about dishwasher pods and now I can prove it [video] (www.youtube.com)
- Solarpunk is happening in Africa (climatedrift.substack.com)
- New gel restores dental enamel and could revolutionise tooth repair (www.nottingham.ac.uk)
- Dillo, a multi-platform graphical web browser (github.com)
- Norway reviews cybersecurity after remote-access feature found in Chinese buses (scandasia.com)
- Microsoft Can't Keep EU Data Safe from US Authorities (www.forbes.com)
- Ask HN: My family business runs on a 1993-era text-based-UI (TUI). Anybody else?
- The shadows lurking in the equations (gods.art)
- An eBPF Loophole: Using XDP for Egress Traffic (loopholelabs.io)
- iOS 26.2 to allow third-party app stores in Japan ahead of regulatory deadline (www.macrumors.com)
- YouTube erased more than 700 videos documenting Israeli human rights violations (theintercept.com)
- I’m worried that they put co-pilot in Excel (simonwillison.net)
- SPy: An interpreter and compiler for a fast statically typed variant of Python (antocuni.eu)
- Hypothesis: Property-Based Testing for Python (hypothesis.readthedocs.io)
GitHub Trending (15)
- 666ghj / BettaFish
微舆: a multi-agent public-opinion analysis assistant that anyone can use. It breaks through information cocoons, reconstructs the full picture of public sentiment, predicts future trends, and supports decision-making. Built from scratch, with no dependency on any framework.
- Skyvern-AI / skyvern
Automate browser based workflows with AI
- HKUDS / DeepCode
"DeepCode: Open Agentic Coding (Paper2Code & Text2Web & Text2Backend)"
- nocobase / nocobase
NocoBase is the most extensible AI-powered no-code/low-code platform for building business applications and enterprise solutions.
- mudler / LocalAI
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P and decentralized inference
- sst / opentui
OpenTUI is a library for building terminal user interfaces (TUIs)
- prometheus / alertmanager
Prometheus Alertmanager
- GopeedLab / gopeed
A modern download manager that supports all platforms. Built with Golang and Flutter.
- imthenachoman / How-To-Secure-A-Linux-Server
An evolving how-to guide for securing a Linux server.
- Raphire / Win11Debloat
A simple, lightweight PowerShell script to remove pre-installed apps, disable telemetry, as well as perform various other changes to customize, declutter and improve your Windows experience. Win11Debloat works for both Windows 10 and Windows 11.
- GoogleCloudPlatform / vertex-ai-creative-studio
GenMedia Creative Studio is a Vertex AI generative media user experience highlighting the use of Imagen, Veo, Gemini 🍌, Gemini TTS, Chirp 3, Lyria and other generative media APIs on Google Cloud.
- VectifyAI / PageIndex
📑 PageIndex: Document Index for Reasoning-based RAG
- coleam00 / ottomator-agents
All the open source AI Agents hosted on the oTTomator Live Agent Studio platform!
- NickvisionApps / Parabolic
Download web video and audio
- aandrew-me / ytDownloader
Desktop App for downloading Videos and Audios from hundreds of sites
Hugging Face (15)
- Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization
The growing success of Vision-Language-Action (VLA) models stems from the promise that pretrained Vision-Language Models (VLMs) can endow agents with transferable world knowledge and vision-language (VL) grounding, laying a foundation for action models with broader generalization. Yet when these VLMs are adapted to the action modality, it remains unclear to what extent their original VL representations and knowledge are preserved. In this work, we conduct a systematic study of representation retention during VLA fine-tuning, showing that naive action fine-tuning leads to degradation of visual representations. To characterize and measure these effects, we probe VLA's hidden representations and analyze attention maps; further, we design a set of targeted tasks and methods that contrast VLA models with their counterpart VLMs, isolating changes in VL capabilities induced by action fine-tuning. We further evaluate a range of strategies for aligning visual representations and introduce a simple yet effective method that mitigates degradation and yields improved generalization to out-of-distribution (OOD) scenarios. Taken together, our analysis clarifies the trade-off between action fine-tuning and the degradation of VL representations and highlights practical approaches to recover inherited VL capabilities. Code is publicly available: https://blind-vla-paper.github.io
- VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
Code has emerged as a precise and executable medium for reasoning and action in the agent era. Yet, progress has largely focused on language-centric tasks such as program synthesis and debugging, leaving visual-centric coding underexplored. Inspired by how humans reason over sketches, we advocate SVG code as a compact, interpretable, and executable visual representation. We introduce VCode, a benchmark that reframes multimodal understanding as code generation: given an image, a model must produce SVG that preserves symbolic meaning for downstream reasoning. VCode covers three domains - general commonsense (MM-Vet), professional disciplines (MMMU), and visual-centric perception (CV-Bench). To assess symbolic fidelity, we propose CodeVQA, a novel evaluation protocol in which a policy model answers questions over rendered SVGs; correct answers indicate faithful symbolic preservation. Empirically, frontier VLMs struggle to generate faithful SVGs, revealing a persistent gap between language-centric and visual-centric coding. To close this gap, we introduce VCoder, an agentic framework that augments VLMs along two axes: (i) Thinking with Revision, which iteratively analyzes discrepancies and refines SVG code; and (ii) Acting with Visual Tools, where detectors and parsers supply structured cues such as objects, shapes, and text beyond the model's intrinsic capacity. Across benchmarks, frontier VLMs with strong reasoning capabilities score well overall yet remain limited in professional knowledge and 3D reasoning. VCoder delivers a 12.3-point overall gain over the top-performing Claude-4-Opus. Human studies show that both humans and VLMs perform worse on rendered SVGs; their consistency reveals the promise of symbolic visual representation. The benchmark and code are available at https://github.com/CSU-JPG/VCode.
- When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
We propose MIRA, a new benchmark designed to evaluate models in scenarios where generating intermediate visual images is essential for successful reasoning. Unlike traditional CoT methods that rely solely on text, tasks in MIRA require models to generate and utilize intermediate images - such as sketches, structural diagrams, or path drawings - to guide their reasoning process. This setup closely mirrors how humans solve complex problems through "drawing to think". To this end, MIRA focuses on tasks that are intrinsically challenging and involve complex structures, spatial relationships, or reasoning steps that are difficult to express through language alone. To ensure that our evaluation data is of high quality, we include 546 multimodal problems, annotated with intermediate visual images and final answers. We also propose a unified evaluation protocol for MIRA that spans three levels of evaluation input: direct input with image and question only, text-only CoT input with image and thinking prompts, and Visual-CoT input with both annotated image clues and textual thinking prompts. To probe the upper bound of model capacity on our benchmark, we also report pass@k and majority voting accuracies under different k settings. Experimental results show that existing multimodal large language models, including the strongest private models as well as strong open-weight models, perform poorly when relying solely on textual prompts. However, when intermediate visual cues are provided, model performance improves consistently, yielding an average relative gain of 33.7% across all models and tasks. We also probe the upper bound by expanding the search space and designing textual prompts aligned with Visual-CoT, but both yield only limited improvements compared to our Visual-CoT setting. These results underscore the critical role of imagined visual information in enabling successful reasoning on MIRA.
- When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs
Multimodal large language models (MLLMs) must resolve conflicts when different modalities provide contradictory information, a process we term modality following. Prior work measured this behavior only with coarse dataset-level statistics, overlooking the influence of the model's confidence in unimodal reasoning. In this paper, we introduce a new framework that decomposes modality following into two fundamental factors: relative reasoning uncertainty (the case-specific confidence gap between unimodal predictions) and inherent modality preference (a model's stable bias when uncertainties are balanced). To validate this framework, we construct a controllable dataset that systematically varies the reasoning difficulty of visual and textual inputs. Using entropy as a fine-grained uncertainty metric, we uncover a universal law: the probability of following a modality decreases monotonically as its relative uncertainty increases. The relative difficulty level at which the model tends to follow both modalities with comparable probability, which we call the balance point, serves as a practical indicator of the model's inherent preference. Unlike traditional macro-level ratios, this measure offers a more principled and less confounded way to characterize modality bias, disentangling it from unimodal capabilities and dataset artifacts. Further, by probing layer-wise predictions, we reveal the internal mechanism of oscillation: in ambiguous regions near the balance point, models vacillate between modalities across layers, explaining externally observed indecision. Together, these findings establish relative uncertainty and inherent preference as the two governing principles of modality following, offering both a quantitative framework and mechanistic insight into how MLLMs resolve conflicting information.
- The Collaboration Gap
The trajectory of AI development suggests that we will increasingly rely on agent-based systems composed of independently developed agents with different information, privileges, and tools. The success of these systems will critically depend on effective collaboration among these heterogeneous agents, even under partial observability. Despite intense interest, few empirical studies have evaluated such agent-agent collaboration at scale. We propose a collaborative maze-solving benchmark that (i) isolates collaborative capabilities, (ii) modulates problem complexity, (iii) enables scalable automated grading, and (iv) imposes no output-format constraints, preserving ecological plausibility. Using this framework, we evaluate 32 leading open- and closed-source models in solo, homogeneous, and heterogeneous pairings. Our results reveal a "collaboration gap": models that perform well solo often degrade substantially when required to collaborate. Collaboration can break down dramatically; for instance, small distilled models that solve mazes well alone may fail almost completely in certain pairings. We find that starting with the stronger agent often improves outcomes, motivating a "relay inference" approach where the stronger agent leads before handing off to the weaker one, closing much of the gap. Our findings argue for (1) collaboration-aware evaluation, (2) training strategies developed to enhance collaborative capabilities, and (3) interaction design that reliably elicits agents' latent skills, guidance that applies to AI-AI and human-AI collaboration.
- Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
Reconstructing images seen by people from their fMRI brain recordings provides a non-invasive window into the human brain. Despite recent progress enabled by diffusion models, current methods often lack faithfulness to the actual seen images. We present "Brain-IT", a brain-inspired approach that addresses this challenge through a Brain Interaction Transformer (BIT), allowing effective interactions between clusters of functionally-similar brain-voxels. These functional-clusters are shared by all subjects, serving as building blocks for integrating information both within and across brains. All model components are shared by all clusters & subjects, allowing efficient training with a limited amount of data. To guide the image reconstruction, BIT predicts two complementary localized patch-level image features: (i)high-level semantic features which steer the diffusion model toward the correct semantic content of the image; and (ii)low-level structural features which help to initialize the diffusion process with the correct coarse layout of the image. BIT's design enables direct flow of information from brain-voxel clusters to localized image features. Through these principles, our method achieves image reconstructions from fMRI that faithfully reconstruct the seen images, and surpass current SotA approaches both visually and by standard objective metrics. Moreover, with only 1-hour of fMRI data from a new subject, we achieve results comparable to current methods trained on full 40-hour recordings.
- Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR
Large language models (LLMs) trained for step-by-step reasoning often become excessively verbose, raising inference cost. Standard Reinforcement Learning with Verifiable Rewards (RLVR) pipelines filter out "easy" problems for training efficiency, leaving the model to train primarily on harder problems that require longer reasoning chains. This skews the output length distribution upward, resulting in a model that conflates "thinking longer" with "thinking better". In this work, we show that retaining and modestly up-weighting moderately easy problems acts as an implicit length regularizer. Exposing the model to solvable short-chain tasks constrains its output distribution and prevents runaway verbosity. The result is emergent brevity for free: the model learns to solve harder problems without inflating the output length, despite the absence of any explicit length penalization. RLVR experiments using this approach on Qwen3-4B-Thinking-2507 (with a 16k token limit) achieve baseline pass@1 AIME25 accuracy while generating solutions that are, on average, nearly twice as short. The code is available at https://github.com/MBZUAI-Paris/Frugal-AI, with datasets and models on Hugging Face at https://huggingface.co/collections/MBZUAI-Paris/k2-think-mini-68dcfa8b114686a4bd3dc2bc.
- LTD-Bench: Evaluating Large Language Models by Letting Them Draw
Current evaluation paradigms for large language models (LLMs) represent a critical blind spot in AI research--relying on opaque numerical metrics that conceal fundamental limitations in spatial reasoning while providing no intuitive understanding of model capabilities. This deficiency creates a dangerous disconnect between reported performance and practical abilities, particularly for applications requiring physical world understanding. We introduce LTD-Bench, a breakthrough benchmark that transforms LLM evaluation from abstract scores to directly observable visual outputs by requiring models to generate drawings through dot matrices or executable code. This approach makes spatial reasoning limitations immediately apparent even to non-experts, bridging the fundamental gap between statistical performance and intuitive assessment. LTD-Bench implements a comprehensive methodology with complementary generation tasks (testing spatial imagination) and recognition tasks (assessing spatial perception) across three progressively challenging difficulty levels, methodically evaluating both directions of the critical language-spatial mapping. Our extensive experiments with state-of-the-art models expose an alarming capability gap: even LLMs achieving impressive results on traditional benchmarks demonstrate profound deficiencies in establishing bidirectional mappings between language and spatial concepts--a fundamental limitation that undermines their potential as genuine world models. Furthermore, LTD-Bench's visual outputs enable powerful diagnostic analysis, offering a potential approach to investigate model similarity.
- Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models
Large multimodal models (LMMs) often suffer from severe inference inefficiency due to the large number of visual tokens introduced by image encoders. While recent token compression methods, such as pruning and merging, have shown promise in reducing redundancy, their evaluation remains fragmented and inconsistent. In this work, we present UniPruneBench, a unified and extensible benchmark for visual token pruning in multimodal LLMs. UniPruneBench provides standardized protocols across six ability dimensions and ten datasets, covering ten representative compression algorithms and three families of LMMs (LLaVA-v1.5, Intern-VL3, and Qwen2.5-VL). Beyond task accuracy, it incorporates system-level metrics such as runtime and prefilling latency to provide a holistic view. Our experiments uncover several key findings: (1) random pruning is a surprisingly strong baseline, (2) no single method consistently outperforms others across scenarios, (3) pruning sensitivity varies significantly across tasks, with OCR being most vulnerable, and (4) pruning ratio is the dominant factor governing performance degradation. We believe UniPruneBench will serve as a reliable foundation for future research on efficient multimodal modeling.
- CodeClash: Benchmarking Goal-Oriented Software Engineering
Current benchmarks for coding evaluate language models (LMs) on concrete, well-specified tasks such as fixing specific bugs or writing targeted tests. However, human programmers do not spend all day incessantly addressing isolated tasks. Instead, real-world software development is grounded in the pursuit of high-level goals, like improving user retention or reducing costs. Evaluating whether LMs can also iteratively develop code to better accomplish open-ended objectives without any explicit guidance remains an open challenge. To address this, we introduce CodeClash, a benchmark where LMs compete in multi-round tournaments to build the best codebase for achieving a competitive objective. Each round proceeds in two phases: agents edit their code, then their codebases compete head-to-head in a code arena that determines winners based on objectives like score maximization, resource acquisition, or survival. Whether it's writing notes, scrutinizing documentation, analyzing competition logs, or creating test suites, models must decide for themselves how to improve their codebases both absolutely and against their opponents. We run 1680 tournaments (25,200 rounds total) to evaluate 8 LMs across 6 arenas. Our results reveal that while models exhibit diverse development styles, they share fundamental limitations in strategic reasoning. Models also struggle with long-term codebase maintenance, as repositories become progressively messy and redundant. These limitations are stark: top models lose every round against expert human programmers. We open-source CodeClash to advance the study of autonomous, goal-oriented code development.
- TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System
Large-scale data has driven breakthroughs in robotics, from language models to vision-language-action models in bimanual manipulation. However, humanoid robotics lacks equally effective data collection frameworks. Existing humanoid teleoperation systems either use decoupled control or depend on expensive motion capture setups. We introduce TWIST2, a portable, mocap-free humanoid teleoperation and data collection system that preserves full whole-body control while advancing scalability. Our system leverages PICO4U VR for obtaining real-time whole-body human motions, with a custom 2-DoF robot neck (cost around $250) for egocentric vision, enabling holistic human-to-humanoid control. We demonstrate long-horizon dexterous and mobile humanoid skills and we can collect 100 demonstrations in 15 minutes with an almost 100% success rate. Building on this pipeline, we propose a hierarchical visuomotor policy framework that autonomously controls the full humanoid body based on egocentric vision. Our visuomotor policy successfully demonstrates whole-body dexterous manipulation and dynamic kicking tasks. The entire system is fully reproducible and open-sourced at https://yanjieze.com/TWIST2 . Our collected dataset is also open-sourced at https://twist-data.github.io .
- RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies
Testing on real machines is indispensable for robotic control algorithms. In the context of learning-based algorithms, especially VLA models, demand for large-scale evaluation, i.e. testing a large number of models on a large number of tasks, is becoming increasingly urgent. However, doing this right is highly non-trivial, especially when scalability and reproducibility are taken into account. In this report, we describe our methodology for constructing RoboChallenge, an online evaluation system to test robotic control algorithms, and our survey of recent state-of-the-art VLA models using our initial benchmark Table30.
- BRAINS: A Retrieval-Augmented System for Alzheimer's Detection and Monitoring
As the global burden of Alzheimer's disease (AD) continues to grow, early and accurate detection has become increasingly critical, especially in regions with limited access to advanced diagnostic tools. We propose BRAINS (Biomedical Retrieval-Augmented Intelligence for Neurodegeneration Screening) to address this challenge. This novel system harnesses the powerful reasoning capabilities of Large Language Models (LLMs) for Alzheimer's detection and monitoring. BRAINS features a dual-module architecture: a cognitive diagnostic module and a case-retrieval module. The Diagnostic Module utilizes LLMs fine-tuned on cognitive and neuroimaging datasets -- including MMSE, CDR scores, and brain volume metrics -- to perform structured assessments of Alzheimer's risk. Meanwhile, the Case Retrieval Module encodes patient profiles into latent representations and retrieves similar cases from a curated knowledge base. These auxiliary cases are fused with the input profile via a Case Fusion Layer to enhance contextual understanding. The combined representation is then processed with clinical prompts for inference. Evaluations on real-world datasets demonstrate the effectiveness of BRAINS in classifying disease severity and identifying early signs of cognitive decline. This system not only shows strong potential as an assistive tool for scalable, explainable, and early-stage Alzheimer's disease detection, but also offers hope for future applications in the field.
- ChartM^3: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension
Complex chart understanding tasks demand advanced visual recognition and reasoning capabilities from multimodal large language models (MLLMs). However, current research provides limited coverage of complex chart scenarios and computation-intensive reasoning tasks prevalent in real-world applications. This study proposes an automated multi-stage code-driven pipeline for systematically generating visual reasoning datasets to address these limitations. The pipeline integrates retrieval-augmented generation (RAG) to retrieve professional chart templates and employs chain-of-thought (CoT) strategies to generate reasoning codes that simulate real data distributions, thereby driving chart rendering and question-related statistical computations. Through model-based evaluation, the pipeline enhances chart diversity and data quality. Using this framework, we construct ChartM^3, a multi-dimensional and multi-step dataset containing 38K charts and 142K Q&A pairs for training, along with 2,871 high-quality evaluation samples for enabling practical performance assessment. Supervised fine-tuning (SFT) and reinforcement learning (RL) experiments demonstrate that our dataset significantly improves reasoning capabilities and cross-domain generalization performance, enabling smaller models to achieve performance comparable to larger-scale models in complex chart comprehension.
- iFlyBot-VLA Technical Report
We introduce iFlyBot-VLA, a large-scale Vision-Language-Action (VLA) model trained under a novel framework. The main contributions are listed as follows: (1) a latent action model thoroughly trained on large-scale human and robotic manipulation videos; (2) a dual-level action representation framework that jointly supervises both the Vision-Language Model (VLM) and the action expert during training; (3) a mixed training strategy that combines robot trajectory data with general QA and spatial QA datasets, effectively enhancing the 3D perceptual and reasoning capabilities of the VLM backbone. Specifically, the VLM is trained to predict two complementary forms of actions: latent actions, derived from our latent action model pretrained on cross-embodiment manipulation data, which capture implicit high-level intentions; and structured discrete action tokens, obtained through frequency-domain transformations of continuous control signals, which encode explicit low-level dynamics. This dual supervision aligns the representation spaces of language, vision, and action, enabling the VLM to directly contribute to action generation. Experimental results on the LIBERO Franka benchmark demonstrate the superiority of our framework, while real-world evaluations further show that iFlyBot-VLA achieves competitive success rates across diverse and challenging manipulation tasks. Furthermore, we plan to open-source a portion of our self-constructed dataset to support future research in the community.
Solidot (15)
- Over 70% of developers see Steam as a monopolist of the PC gaming market
Atomik Research surveyed 306 gaming-industry executives in the US and UK between May 18 and 22, 2025; three quarters of the respondents were C-level executives, and 77% came from studios with more than 50 employees. The study found that most studios derive over three quarters of their revenue from Steam, and 72% of respondents believe Steam monopolizes the PC gaming market. Developers have also begun using other platforms such as the Epic Games Store and the Xbox PC Games store: 48% of respondents have released games on both platforms, 10% have used GOG, and 8% have used Itch.io. 32% of developers have released some of their games on physical media.
- Trump renominates Jared Isaacman as NASA administrator
US President Trump has again nominated billionaire and private astronaut Jared Isaacman as NASA administrator. In his statement Trump did not explain why he withdrew Isaacman's nomination in May yet now considers him fit for the job again. The earlier withdrawal was believed to be tied to Elon Musk's departure from Trump's inner circle: Isaacman was Musk's preferred candidate and has flown to Earth orbit several times aboard SpaceX spacecraft. In July, Trump appointed Transportation Secretary Sean Duffy to serve concurrently as NASA administrator, but Duffy's recent remarks and the NASA plans he has disclosed have sparked considerable controversy. Meanwhile, Trump's aides kept recommending Isaacman, and Isaacman was seen dining with Trump multiple times, suggesting the two are on good terms. With the US government currently in a shutdown, confirming Isaacman's nomination could take a long time.
- Astronomers may have found the first generation of stars formed after the Big Bang
Astronomers have long searched for the universe's first generation of stars, and they may at last have found a trace of them. After a detailed analysis of gravitational-lensing observations from the James Webb Space Telescope (JWST), a team at the University of Toledo in Ohio believes it may have captured the light of these newborn stars in the distant galaxy LAP1-B. First-generation stars consist mainly of hydrogen and helium with trace amounts of lithium, the primordial elements left over from the Big Bang. Such stars are extremely rare and so short-lived that they perished long ago, but their faint light can still be caught after crossing vast distances. Previous first-star candidates were all eventually ruled out for failing three theoretical predictions: they should form in small dark-matter halos of extremely low metallicity; their masses should lie between 10 and 1,000 solar masses; and they should be born as small clusters with total masses of a few thousand solar masses. LAP1-B is thought to satisfy all three conditions: the system formed in a dark-matter clump of about 50 million solar masses, its stars have masses between 10 and 1,000 solar masses, and they exist as a small cluster whose total mass is only a few thousand solar masses.
- Microsoft tests replacing the desktop search box with Copilot
Microsoft is integrating its AI assistant Copilot into every one of its products, and the Windows operating system is embedding Copilot ever more deeply. In the latest Windows Insider Dev and Beta builds, Microsoft is testing replacing the traditional desktop search box with Copilot. The Copilot search box is not enabled by default: the standard box displays the text "Search", while the Copilot version displays "Ask Copilot anything", and users can enter either Copilot prompts or search keywords. Current testing shows it is not as capable as the traditional search feature.
- Switch 2 first-year sales projected to reach 19 million units
Nintendo said Tuesday that it expects worldwide Switch 2 sales to reach 19 million units in fiscal 2025 (ending March 2026), up from the originally planned 15 million. Nintendo also raised its fiscal 2025 earnings forecast: revenue is expected to grow 93% year over year to 2.25 trillion yen, topping 2 trillion yen for the first time and setting a record high, and net profit is expected to grow 26% to 350 billion yen. Console hardware carries lower profit margins than software. Another major drag on profit is the Trump administration's tariff policy: products for the US market are mainly produced by contract manufacturers in Vietnam and exported from there. To prioritize adoption, Nintendo has so far not raised console prices in the US market.
- Older people are more likely to spend all day staring at screens
According to data from the UK communications regulator Ofcom, Britons over 65 spent an average of more than three hours a day on smartphones, computers, and tablets last year, plus more than five and a half hours watching TV. A GWI survey of seven countries found that people over 65 are more likely than those under 25 to own a tablet, smart TV, e-reader, desktop, or laptop, and nearly one in five people aged 55-64 owns a game console. Ipsit Vahia, director of the Technology and Aging Lab at Harvard Medical School's McLean Hospital, says some older adults increasingly resemble teenagers, with their lives revolving ever more around their phones. A 2022 South Korean study estimated that 15% of people aged 60-69 are at risk of phone addiction. And a meta-analysis published in April, covering more than 400,000 older adults, found that people over 50 who regularly use electronic devices experience slower cognitive decline than those who do not.
- Google removes 749 million Anna's Archive URLs
According to Google's transparency report, copyright holders have requested the removal of 784 million Anna's Archive URLs. Google complied with the vast majority of the requests, removing 749 million URLs; the few rejections were cases where Google had not indexed the links in question. Anna's Archive URLs account for 5% of all copyright removal requests Google receives. Since Google first published its transparency report in 2012, copyright holders have reported 15.1 billion allegedly infringing URLs to it. Penguin Random House and John Wiley & Sons are the book publishers most active against Anna's Archive, and copyright holders report roughly 10 million new Anna's Archive URLs to Google each week.
- Qilin ransomware abuses WSL to run Linux encryptors on Windows
The Qilin ransomware has been found abusing the Windows Subsystem for Linux (WSL) to execute Linux encryptors on Windows, evading detection by traditional security tools. Originally called Agenda, the ransomware was renamed Qilin in September 2022 and is today one of the most active ransomware operations. Security firms Trend Micro and Cisco Talos report that the Qilin group has attacked more than 700 victims across 62 countries so far this year, adding over 40 new victims per month in the second half of 2025. Trend Micro researchers report that the group uses WinSCP to transfer a Linux ELF encryptor to compromised devices and then launches it directly on Windows via the Splashtop remote-management software (SRManager.exe). The encryptor cannot run natively on Windows and must go through the WSL subsystem, which lets users install and run Linux distributions directly on Windows. After gaining access to a device, the attackers enable or install WSL and then execute the encryptor, bypassing traditional Windows security software.
- Scientists may have found a silver-bullet treatment for venomous snakebites
A person bitten by an African black mamba usually survives only a few hours: the venom destroys nerves and muscles, ultimately paralyzing the lungs and heart. Nor is the black mamba the only venomous snake that threatens humans. In sub-Saharan Africa, more than 300,000 people are bitten by snakes each year, over 7,000 of them die, and another 10,000 require amputations; and those are only the cases reported to authorities, so the real figures are likely much higher. Scientists now report a broad-spectrum antivenom that is effective against the venoms of deadly snakes such as mambas, cobras, and spitting cobras. A team at the Technical University of Denmark created the new antivenom by combining engineered proteins called nanobodies, which target key toxins found in snake venom. The team identified the nanobodies after immunizing an alpaca and a llama with the venoms of 18 African snake species, including cobras, mambas, and spitting cobras. In mouse experiments, the antivenom prevented death from the bites of 17 of the species and mitigated the tissue damage caused by some of the most harmful venoms. Compared with the existing commercial antivenom Inoserp PAN-AFRICA, it was better at preventing death and skin necrosis for every species tested, though it offered only partial protection against green mamba and black mamba venom. The results suggest that a small amount of the new antivenom could provide universal protection against snakebite.
- The teenagers behind doxxing
Behind the "开盒" ("box-opening", i.e. doxxing) incidents spreading across the internet is a teenage community known as the "喷系" ("flamer faction"). They build "empires" on social networks, rally under calls to "go on expedition", and treat doxxing and online harassment as a game of power, even extending the violence into the real world: traveling across provinces to show up at victims' doors, picking locks, and assaulting people. They have also turned doxxing into a business, trading ordinary netizens' privacy and safety for pocket money; one teenager made more than 100,000 yuan this way in just a year and a half. Over a million teenagers move in "喷系" circles. These youths share a common profile: most come from small towns, dropped out of or suspended school early, and drifted into the internet with nothing else to do, then slid down a similar trajectory, recruited into the "喷系" through games, joining online harassment without hesitation, quickly becoming perpetrators themselves, and then pulling the next teenager in.
- Australians to share at least three hours of free solar power every day
Starting in 2026, Australia's solar-sharing scheme will give households in New South Wales, south-east Queensland, and South Australia at least three hours of free solar electricity every day, a benefit that also extends to households without rooftop solar panels. The government says residents can schedule appliances such as washing machines, dishwashers, and air conditioners during the free-power window, and use it to charge electric vehicles and home batteries. Australians have installed more than 4 million solar systems, which often produce cheap surplus electricity around midday. Part of the scheme's rationale is to shift peak demand, especially in the evening, to the sunniest hours of the day, which helps minimize peak-period electricity prices and reduces the need for grid upgrades and interventions to keep the grid stable.
- Large models cannot reliably distinguish belief from fact
A study finds that large language models (LLMs) may be unable to reliably recognize users' false beliefs. The findings highlight the need for caution in using LLM outputs in high-stakes domains such as medicine, law, and science, especially when beliefs or opinions conflict with facts. In the study, Stanford University's James Zou and colleagues analyzed how 24 LLMs, including DeepSeek and GPT-4o, responded to facts and personal beliefs across 13,000 questions. When asked to verify factual statements as true or false, newer LLMs averaged 91.1% and 91.5% accuracy respectively, while older models averaged 84.8% and 71.5%. When asked to respond to first-person beliefs ("I believe..."), the researchers observed that LLMs were worse at recognizing false beliefs than true ones. The researchers say that LLMs must be able to distinguish the nuances of fact versus belief, and whether each is true or false, in order to respond effectively to user queries and prevent the spread of misinformation.
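The fact-versus-belief probe this study describes can be sketched in a few lines. This is an illustrative toy, not the study's code: `ask_model` is a hypothetical stand-in for a real LLM API call (stubbed here so the snippet is self-contained), and the prompts are invented for illustration, not the study's actual wording.

```python
# Toy sketch of the two probe conditions described above: verifying a bare
# factual claim vs. responding to the same (false) claim framed as a
# first-person belief. `ask_model` is a hypothetical stub standing in for a
# real LLM API call; its canned replies mimic the reported failure mode.

def ask_model(prompt: str) -> str:
    """Stub LLM: replace with a real model call to run the probe for real."""
    if prompt.startswith("I believe"):
        # Mimics the failure mode: the false first-person belief goes unchallenged.
        return "Yes, that is true."
    # Bare factual claims are checked and (in this stub) correctly rejected.
    return "False."

claim = "the Great Wall of China is visible from the Moon"  # a false claim

fact_prompt = f"True or false: {claim}."            # condition 1: bare fact
belief_prompt = f"I believe {claim}. Am I right?"   # condition 2: belief framing

for prompt in (fact_prompt, belief_prompt):
    print(f"{prompt!r} -> {ask_model(prompt)!r}")
```

A real probe would swap the stub for an actual API call and score the replies over many claims, as the study did across its 13,000 questions.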
- An Antarctic glacier shrank by half in two months
According to a study published in Nature Geoscience, an Antarctic glacier shrank by nearly half within two months, the fastest glacial retreat in modern recorded history, and the manner of its retreat could have major implications for global sea-level rise. Hektoria Glacier, on the Antarctic Peninsula, is a grounded glacier rather than a floating one, and glaciers of this type normally retreat no more than a few hundred meters per year. Yet in November and December 2022, Hektoria retreated 8 kilometers. If Antarctica's larger glaciers were to retreat at a similar pace, the effect on sea-level rise could be catastrophic: the continent holds enough ice to raise global sea levels by about 58 meters. The last glacial melt on such a scale occurred roughly 15,000 to 19,000 years ago, during the global warming at the end of the last ice age.
- Google removes the Gemma model from AI Studio after a Republican senator's complaint
Last Friday Google removed the open-source Gemma model from AI Studio, giving only vague reasons. Before the takedown, Republican Senator Marsha Blackburn of Tennessee wrote to Google CEO Sundar Pichai, saying the Gemma model had generated false accusations of sexual misconduct against her and demanding an explanation from Google. She sought to link the matter to ongoing hearings on allegations that bots from Google and other companies defame conservatives. At the hearings, Google's Markham Erickson explained that AI hallucination is a known problem widespread across generative AI and that Google is working to mitigate its effects. Testing shows Gemini for Home is especially prone to hallucinations, and in Blackburn's leading test Gemma fabricated a drug-fueled extramarital affair between her and a state trooper, even supplying a fake news link.
- Microsoft fixes a decade-old bug that turned "Update and shut down" into "Update and restart"
Starting with Windows 11 25H2 Build 26200.7019 or Windows 11 24H2 Build 26100.7019, your Windows PC will finally respond correctly to the "Update and shut down" command. Microsoft introduced the buggy "Update and shut down" option in Windows 10: once the latest patches were staged, users shutting down their PCs could choose "Update and shut down", but they often found that the machine rebooted instead of actually powering off. If a laptop is not plugged in, one possible result is a drained battery. Microsoft confirmed that it shipped a fix for the bug in the optional update KB5067036 released in October 2025; for users who have not installed that optional update, the fix will arrive in the regular security update on November 11. Microsoft has not explained the cause of the bug; possible explanations include a race condition or an issue in the Windows Servicing Stack.