OrangeBot.AI Digest — 2025-12-22
59 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- NPM Package with 56K Downloads Caught Stealing WhatsApp Messages (www.koi.ai)
- GLM-4.7: Advancing the Coding Capability (z.ai)
- US blocks all offshore wind construction, says reason is classified (arstechnica.com)
- The Illustrated Transformer (jalammar.github.io)
- NIST was 5 μs off UTC after last week's power cut (www.jeffgeerling.com)
- Claude Code gets native LSP support (github.com)
- Flock Exposed Its AI-Powered Cameras to the Internet. We Tracked Ourselves (www.404media.co)
- Benn Jordan – This Flock Camera Leak Is Like Netflix for Stalkers [video] (www.youtube.com)
- Jimmy Lai Is a Martyr for Freedom (reason.com)
- The U.S. Is Funding Fewer Grants in Every Area of Science and Medicine (www.nytimes.com)
- Scaling LLMs to Larger Codebases (blog.kierangill.xyz)
- The biggest CRT ever made: Sony's PVM-4300 (dfarq.homeip.net)
- Debian's Git Transition (diziet.dreamwidth.org)
- A year of vibes (lucumr.pocoo.org)
- If you don't design your career, someone else will (2014) (gregmckeown.com)
GitHub Trending(15)
- exo-explore / exo
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
- iptv-org / iptv
Collection of publicly available IPTV channels from all over the world
- swisskyrepo / PayloadsAllTheThings
A list of useful payloads and bypass for Web Application Security and Pentest/CTF
- GreyDGL / PentestGPT
A GPT-empowered penetration testing tool
- anthropics / skills
Public repository for Agent Skills
- cocoindex-io / cocoindex
Data transformation framework for AI. Ultra performant, with incremental processing. 🌟 Star if you like it!
- danielmiessler / Fabric
Fabric is an open-source framework for augmenting humans using AI. It provides a modular system for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.
- tensorflow / tensorflow
An Open Source Machine Learning Framework for Everyone
- rendercv / rendercv
Typst-based CV/resume generator for academics and engineers
- home-assistant / core
🏡 Open source home automation that puts local control and privacy first.
- Semperis / EntraGoat
A deliberately vulnerable Microsoft Entra ID environment. Learn identity security through hands-on, realistic attack challenges.
- google / langextract
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
- expressjs / express
Fast, unopinionated, minimalist web framework for node.
- lintsinghua / DeepAudit
DeepAudit: an AI hacking team for everyone, making vulnerability discovery accessible. The first open-source multi-agent system for code vulnerability discovery from China. One-click deployment for beginners, autonomous collaborative auditing, and automated sandbox PoC verification. Supports private Ollama deployment, one-click report generation, and API relay endpoints. Making security affordable and auditing simple.
- cloudcommunity / Free-Certifications
A curated list of free courses with certifications. Also available at https://free-certifications.com/
Hugging Face(15)
- Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
Despite advances in scientific AI, a coherent framework for Scientific General Intelligence (SGI), the ability to autonomously conceive, investigate, and reason across scientific domains, remains lacking. We present an operational SGI definition grounded in the Practical Inquiry Model (PIM: Deliberation, Conception, Action, Perception) and operationalize it via four scientist-aligned tasks: deep research, idea generation, dry/wet experiments, and experimental reasoning. SGI-Bench comprises over 1,000 expert-curated, cross-disciplinary samples inspired by Science's 125 Big Questions, enabling systematic evaluation of state-of-the-art LLMs. Results reveal gaps: low exact match (10-20%) in deep research despite step-level alignment; ideas lacking feasibility and detail; high code executability but low execution-result accuracy in dry experiments; low sequence fidelity in wet protocols; and persistent multimodal comparative-reasoning challenges. We further introduce Test-Time Reinforcement Learning (TTRL), which optimizes retrieval-augmented novelty rewards at inference, enhancing hypothesis novelty without reference answers. Together, our PIM-grounded definition, workflow-centric benchmark, and empirical insights establish a foundation for AI systems that genuinely participate in scientific discovery.
- PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence
Robotic generalization relies on physical intelligence: the ability to reason about state changes, contact-rich interactions, and long-horizon planning under egocentric perception and action. However, most VLMs are trained primarily on third-person data, creating a fundamental viewpoint mismatch for humanoid robots. Scaling robot egocentric data collection remains impractical due to high cost and limited diversity, whereas large-scale human egocentric videos offer a scalable alternative that naturally captures rich interaction context and causal structure. The key challenge is to convert raw egocentric videos into structured and reliable embodiment training supervision. Accordingly, we propose an Egocentric2Embodiment translation pipeline that transforms first-person videos into multi-level, schema-driven VQA supervision with enforced evidence grounding and temporal consistency, enabling the construction of the Egocentric2Embodiment dataset (E2E-3M) at scale. An egocentric-aware embodied brain, termed PhysBrain, is obtained by training on the E2E-3M dataset. PhysBrain exhibits substantially improved egocentric understanding, particularly for planning on EgoThink. It provides an egocentric-aware initialization that enables more sample-efficient VLA fine-tuning and higher SimplerEnv success rates (53.9%), demonstrating effective transfer from human egocentric supervision to downstream robot control.
- When Reasoning Meets Its Laws
Despite the superior performance of Large Reasoning Models (LRMs), their reasoning behaviors are often counterintuitive, leading to suboptimal reasoning capabilities. To theoretically formalize the desired reasoning behaviors, this paper presents the Laws of Reasoning (LoRe), a unified framework that characterizes intrinsic reasoning patterns in LRMs. We first propose compute law with the hypothesis that the reasoning compute should scale linearly with question complexity. Beyond compute, we extend LoRe with a supplementary accuracy law. Since the question complexity is difficult to quantify in practice, we examine these hypotheses by two properties of the laws, monotonicity and compositionality. We therefore introduce LoRe-Bench, a benchmark that systematically measures these two tractable properties for large reasoning models. Evaluation shows that most reasoning models exhibit reasonable monotonicity but lack compositionality. In response, we develop an effective finetuning approach that enforces compute-law compositionality. Extensive empirical studies demonstrate that better compliance with compute laws yields consistently improved reasoning performance on multiple benchmarks, and uncovers synergistic effects across properties and laws. Project page: https://lore-project.github.io/
- Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience
Large language models have recently made significant progress in generating rigorous mathematical proofs. In contrast, utilizing LLMs for theorem proving in formal languages (such as Lean) remains challenging and computationally expensive, particularly when addressing problems at the undergraduate level and beyond. In this work, we present Seed-Prover 1.5, a formal theorem-proving model trained via large-scale agentic reinforcement learning, alongside an efficient test-time scaling (TTS) workflow. Through extensive interactions with Lean and other tools, the model continuously accumulates experience during the RL process, substantially enhancing the capability and efficiency of formal theorem proving. Furthermore, leveraging recent advancements in natural language proving, our TTS workflow efficiently bridges the gap between natural and formal languages. Compared to state-of-the-art methods, Seed-Prover 1.5 achieves superior performance with a smaller compute budget. It solves 88% of PutnamBench (undergraduate-level), 80% of Fate-H (graduate-level), and 33% of Fate-X (PhD-level) problems. Notably, using our system, we solved 11 out of 12 problems from Putnam 2025 within 9 hours. Our findings suggest that scaling learning from experience, driven by high-quality formal feedback, holds immense potential for the future of formal mathematical reasoning.
- Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing
Modern Latent Diffusion Models (LDMs) typically operate in low-level Variational Autoencoder (VAE) latent spaces that are primarily optimized for pixel-level reconstruction. To unify vision generation and understanding, a burgeoning trend is to adopt high-dimensional features from representation encoders as generative latents. However, we empirically identify two fundamental obstacles in this paradigm: (1) the discriminative feature space lacks compact regularization, making diffusion models prone to off-manifold latents that lead to inaccurate object structures; and (2) the encoder's inherently weak pixel-level reconstruction hinders the generator from learning accurate fine-grained geometry and texture. In this paper, we propose a systematic framework to adapt understanding-oriented encoder features for generative tasks. We introduce a semantic-pixel reconstruction objective to regularize the latent space, enabling the compression of both semantic information and fine-grained details into a highly compact representation (96 channels with 16x16 spatial downsampling). This design ensures that the latent space remains semantically rich and achieves state-of-the-art image reconstruction, while remaining compact enough for accurate generation. Leveraging this representation, we design a unified Text-to-Image (T2I) and image editing model. Benchmarking against various feature spaces, we demonstrate that our approach achieves state-of-the-art reconstruction, faster convergence, and substantial performance gains in both T2I and editing tasks, validating that representation encoders can be effectively adapted into robust generative components.
- 4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation
Despite advances in Multimodal LLMs (MLLMs), their ability to reason over 3D structures and temporal dynamics remains limited, constrained by weak 4D perception and temporal understanding. Existing 3D and 4D Video Question Answering (VQA) benchmarks also emphasize static scenes and lack region-level prompting. We tackle these issues by introducing: (a) 4D-RGPT, a specialized MLLM designed to capture 4D representations from video inputs with enhanced temporal perception; (b) Perceptual 4D Distillation (P4D), a training framework that transfers 4D representations from a frozen expert model into 4D-RGPT for comprehensive 4D perception; and (c) R4D-Bench, a benchmark for depth-aware dynamic scenes with region-level prompting, built via a hybrid automated and human-verified pipeline. Our 4D-RGPT achieves notable improvements on both existing 4D VQA benchmarks and the proposed R4D-Bench benchmark.
- Are We on the Right Way to Assessing LLM-as-a-Judge?
LLM-as-a-Judge has been widely adopted as an evaluation method and serves as a source of supervised rewards in model training. However, existing benchmarks for LLM-as-a-Judge mainly rely on human-annotated ground truth, which introduces human bias that undermines the assessment of reliability and imposes scalability constraints. To overcome these limitations, we introduce Sage, a novel evaluation suite that assesses the quality of LLM judges without necessitating any human annotation. Inspired by axioms of rational choice theory, Sage introduces two new lenses for measuring LLM-as-a-Judge: local self-consistency (pair-wise preference stability) and global logical consistency (transitivity across a full set of preferences). We curate a dataset of 650 questions by combining structured benchmark problems with real-world user queries. Our experiments demonstrate both the stability of our metrics and their high correlation with supervised benchmarks like LLMBar and RewardBench2, confirming Sage's reliability as an evaluation suite for the robustness and accuracy of LLM-as-a-Judge. Based on Sage, we reveal that current state-of-the-art LLMs exhibit significant reliability problems when acting as judges in both scoring and pairwise settings; even the top-performing models, Gemini-2.5-Pro and GPT-5, fail to maintain consistent preferences in nearly a quarter of difficult cases. We attribute this to a new phenomenon called situational preference, which explains why explicit rubrics or criteria can help the model judge consistently across answer pairs. Our further analysis shows that fine-tuning an LLM-as-a-Judge is a feasible way to boost performance, and that panel-based judging as well as deep reasoning can enhance judging consistency. We also find substantial inconsistency in human judgments, which indicates that human annotation may not be a reliable gold standard.
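Sage's global logical consistency lens checks whether a judge's pairwise preferences are transitive across a full set of answers. As a minimal illustration only (the data representation, function name, and counting scheme here are assumptions, not Sage's actual implementation), such a check might look like:

```python
from itertools import combinations, permutations

def transitivity_violations(prefs):
    """Count intransitive triples in a set of pairwise preferences.

    prefs: dict mapping (a, b) -> True if the judge prefers answer a
           over answer b. Assumed complete and asymmetric: exactly one
           of (a, b) / (b, a) is recorded for each pair.
    """
    items = {x for pair in prefs for x in pair}
    violations = 0
    for a, b, c in combinations(sorted(items), 3):
        # A triple is intransitive if some ordering forms a cycle
        # x > y > z > x; count each such triple once.
        for x, y, z in permutations((a, b, c)):
            if prefs.get((x, y)) and prefs.get((y, z)) and prefs.get((z, x)):
                violations += 1
                break
    return violations

# A judge preferring A>B and B>C but also C>A is cyclic:
cyclic = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}
print(transitivity_violations(cyclic))  # 1
```

A real evaluation suite would aggregate this over many question sets, but the core property being measured is exactly this kind of cycle count.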
- RadarGen: Automotive Radar Point Cloud Generation from Cameras
We present RadarGen, a diffusion model for synthesizing realistic automotive radar point clouds from multi-view camera imagery. RadarGen adapts efficient image-latent diffusion to the radar domain by representing radar measurements in bird's-eye-view form that encodes spatial structure together with radar cross section (RCS) and Doppler attributes. A lightweight recovery step reconstructs point clouds from the generated maps. To better align generation with the visual scene, RadarGen incorporates BEV-aligned depth, semantic, and motion cues extracted from pretrained foundation models, which guide the stochastic generation process toward physically plausible radar patterns. Conditioning on images makes the approach broadly compatible, in principle, with existing visual datasets and simulation frameworks, offering a scalable direction for multimodal generative simulation. Evaluations on large-scale driving data show that RadarGen captures characteristic radar measurement distributions and reduces the gap to perception models trained on real data, marking a step toward unified generative simulation across sensing modalities.
- GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation
Visual grounding, localizing objects from natural language descriptions, represents a critical bridge between language and vision understanding. While multimodal large language models (MLLMs) achieve impressive scores on existing benchmarks, a fundamental question remains: can MLLMs truly ground language in vision with human-like sophistication, or are they merely pattern-matching on simplified datasets? Current benchmarks fail to capture real-world complexity where humans effortlessly navigate ambiguous references and recognize when grounding is impossible. To rigorously assess MLLMs' true capabilities, we introduce GroundingME, a benchmark that systematically challenges models across four critical dimensions: (1) Discriminative, distinguishing highly similar objects, (2) Spatial, understanding complex relational descriptions, (3) Limited, handling occlusions or tiny objects, and (4) Rejection, recognizing ungroundable queries. Through careful curation combining automated generation with human verification, we create 1,005 challenging examples mirroring real-world complexity. Evaluating 25 state-of-the-art MLLMs reveals a profound capability gap: the best model achieves only 45.1% accuracy, while most score 0% on rejection tasks, reflexively hallucinating objects rather than acknowledging their absence, raising critical safety concerns for deployment. We explore two strategies for improvements: (1) test-time scaling selects optimal response by thinking trajectory to improve complex grounding by up to 2.9%, and (2) data-mixture training teaches models to recognize ungroundable queries, boosting rejection accuracy from 0% to 27.9%. GroundingME thus serves as both a diagnostic tool revealing current limitations in MLLMs and a roadmap toward human-level visual grounding.
- An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges
Vision-Language-Action (VLA) models are driving a revolution in robotics, enabling machines to understand instructions and interact with the physical world. This field is exploding with new models and datasets, making it both exciting and challenging to keep pace with. This survey offers a clear and structured guide to the VLA landscape. We design it to follow the natural learning path of a researcher: we start with the basic Modules of any VLA model, trace the history through key Milestones, and then dive deep into the core Challenges that define recent research frontier. Our main contribution is a detailed breakdown of the five biggest challenges in: (1) Representation, (2) Execution, (3) Generalization, (4) Safety, and (5) Dataset and Evaluation. This structure mirrors the developmental roadmap of a generalist agent: establishing the fundamental perception-action loop, scaling capabilities across diverse embodiments and environments, and finally ensuring trustworthy deployment-all supported by the essential data infrastructure. For each of them, we review existing approaches and highlight future opportunities. We position this paper as both a foundational guide for newcomers and a strategic roadmap for experienced researchers, with the dual aim of accelerating learning and inspiring new ideas in embodied intelligence. A live version of this survey, with continuous updates, is maintained on our project page: https://suyuz1.github.io/Survery/
- Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
Understanding architectural differences in language models is challenging, especially at academic-scale pretraining (e.g., 1.3B parameters, 100B tokens), where results are often dominated by noise and randomness. To overcome this, we introduce controlled synthetic pretraining tasks that isolate and evaluate core model capabilities. Within this framework, we discover CANON LAYERS: lightweight architectural components -- named after the musical term "canon" -- that promote horizontal information flow across neighboring tokens. Canon layers compute weighted sums of nearby token representations and integrate seamlessly into Transformers, linear attention, state-space models, or any sequence architecture. We present 12 key results. This includes how Canon layers enhance reasoning depth (e.g., by 2×), reasoning breadth, knowledge manipulation, etc. They lift weak architectures like NoPE to match RoPE, and linear attention to rival SOTA linear models like Mamba2/GDN -- validated both through synthetic tasks and real-world academic-scale pretraining. This synthetic playground offers an economical, principled path to isolate core model capabilities often obscured at academic scales. Equipped with infinite high-quality data, it may even PREDICT how future architectures will behave as training pipelines improve -- e.g., through better data curation or RL-based post-training -- unlocking deeper reasoning and hierarchical inference.
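The abstract describes Canon layers as computing weighted sums of nearby token representations, adding causal, horizontal information flow to any sequence architecture. A toy sketch of that idea in plain Python (the window size, weight values, and residual placement are illustrative assumptions; the paper's actual parameterization may differ):

```python
def canon_layer(x, weights):
    """Causal weighted sum over a small window of neighboring tokens.

    x:       list of token vectors (each a list of floats)
    weights: mixing weights; weights[k] applies to the token k
             positions back (weights[0] is the current token).
    Returns x plus the mixed-in context, i.e. a residual connection.
    """
    out = []
    for t, vec in enumerate(x):
        mixed = list(vec)  # start from the residual term
        for k, w in enumerate(weights):
            if t - k >= 0:  # causal: only look backwards
                mixed = [m + w * s for m, s in zip(mixed, x[t - k])]
        out.append(mixed)
    return out

# Toy usage: 3 tokens of dimension 2, window of 2.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = canon_layer(x, weights=[0.5, 0.3])
```

In a real model the weights would be learned and the operation applied per layer alongside attention or SSM blocks; the point of the sketch is only the causal, local weighted-sum structure.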
- HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering
Video Large Language Models (Video-LLMs) are rapidly improving, yet current Video Question Answering (VideoQA) benchmarks often allow questions to be answered from a single salient cue, under-testing reasoning that must aggregate multiple, temporally separated visual evidence. We present HERBench, a VideoQA benchmark purpose-built to assess multi-evidence integration across time. Each question requires aggregating at least three non-overlapping evidential cues across distinct video segments, so neither language priors nor a single snapshot can suffice. HERBench comprises 26K five-way multiple-choice questions organized into twelve compositional tasks that probe identity binding, cross-entity relations, temporal ordering, co-occurrence verification, and counting. To make evidential demand measurable, we introduce the Minimum Required Frame-Set (MRFS), the smallest number of frames a model must fuse to answer correctly, and show that HERBench imposes substantially higher demand than prior datasets (mean MRFS 5.5 vs. 2.6-4.2). Evaluating 13 state-of-the-art Video-LLMs on HERBench reveals pervasive failures: accuracies of 31-42% are only slightly above the 20% random-guess baseline. We disentangle this failure into two critical bottlenecks: (1) a retrieval deficit, where frame selectors overlook key evidence, and (2) a fusion deficit, where models fail to integrate information even when all necessary evidence is provided. By making cross-time evidence both unavoidable and quantifiable, HERBench establishes a principled target for advancing robust, compositional video understanding.
- Animate Any Character in Any World
Recent advances in world models have greatly enhanced interactive environment simulation. Existing methods mainly fall into two categories: (1) static world generation models, which construct 3D environments without active agents, and (2) controllable-entity models, which allow a single entity to perform limited actions in an otherwise uncontrollable environment. In this work, we introduce AniX, leveraging the realism and structural grounding of static world generation while extending controllable-entity models to support user-specified characters capable of performing open-ended actions. Users can provide a 3DGS scene and a character, then direct the character through natural language to perform diverse behaviors from basic locomotion to object-centric interactions while freely exploring the environment. AniX synthesizes temporally coherent video clips that preserve visual fidelity with the provided scene and character, formulated as a conditional autoregressive video generation problem. Built upon a pre-trained video generator, our training strategy significantly enhances motion dynamics while maintaining generalization across actions and characters. Our evaluation covers a broad range of aspects, including visual quality, character consistency, action controllability, and long-horizon coherence.
- Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
Reinforcement learning (RL) has re-emerged as a natural approach for training interactive LLM agents in real-world environments. However, directly applying the widely used Group Relative Policy Optimization (GRPO) algorithm to multi-turn tasks exposes notable limitations, particularly in scenarios requiring long-horizon reasoning. To address these challenges, we investigate more stable and effective advantage estimation strategies, especially for multi-turn settings. We first explore Proximal Policy Optimization (PPO) as an alternative and find it to be more robust than GRPO. To further enhance PPO in multi-turn scenarios, we introduce turn-PPO, a variant that operates on a turn-level MDP formulation, as opposed to the commonly used token-level MDP. Our results on the WebShop and Sokoban datasets demonstrate the effectiveness of turn-PPO, both with and without long reasoning components.
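Turn-PPO's turn-level MDP means the advantage is estimated once per agent turn rather than once per token. A rough sketch of what turn-level generalized advantage estimation (GAE) could look like (the specific estimator, discount values, and bootstrap handling here are assumptions, not necessarily the paper's):

```python
def turn_level_advantages(rewards, values, gamma=0.99, lam=0.95):
    """GAE computed per turn: each entry of `rewards` and `values`
    corresponds to one whole agent turn, not one token."""
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        # Bootstrap from the next turn's value; 0 past the final turn.
        next_value = values[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Three-turn episode with a sparse reward at the final turn.
adv = turn_level_advantages(rewards=[0.0, 0.0, 1.0],
                            values=[0.2, 0.4, 0.7])
```

Every token in a turn would then share that turn's advantage during the PPO update, which is what distinguishes this formulation from the usual token-level credit assignment.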
- SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories
Benchmarks like SWE-bench have standardized the evaluation of Large Language Models (LLMs) on repository-level software engineering tasks. However, these efforts remain limited by manual curation, static datasets, and a focus on Python-based bug fixes. We introduce SWE-Bench++, an automated framework that generates repository-level coding tasks from open-source GitHub projects. Unlike synthetic approaches, our pipeline harvests live pull requests to cover both bug fixes and feature requests across 11 languages. SWE-Bench++ turns GitHub pull requests (PRs) into reproducible, execution-based tasks via four stages: programmatic sourcing, environment synthesis, test oracle extraction, and quality assurance. A final hint-guided trajectory synthesis step converts instances that strong models fail on into training trajectories. Our initial benchmark consists of 11,133 instances from 3,971 repositories across 11 languages. On a subset of 1,782 instances of this benchmark, today's strongest models perform as follows: claude-sonnet-4.5 achieves 36.20% pass@10, gpt-5-2025-08-07 34.57%, gemini/gemini-2.5-pro 24.92%, and gpt-4o 16.89%. We further demonstrate the utility of our dataset by showing that fine-tuning on SWE-Bench++ instances yields measurable improvements on the SWE-bench Multilingual benchmark. SWE-Bench++ provides a scalable, multilingual benchmark for evaluating and improving repository-level code generation.
Solidot(14)
- Indoor tanning accelerates skin aging
According to a study published in Science Advances, indoor tanning under artificial UV light accelerates mutations in skin cells and may raise the risk of future cancers. The researchers found that tanners in their 30s and 40s carried more cell mutations in their skin than members of the general population in their 70s and 80s; at the genetic level, in other words, tanners' skin appears decades older. According to the American Cancer Society, skin cancer is the most common cancer in the United States, and its deadliest form is melanoma, which kills about 11,000 Americans each year, mainly as a result of exposure to UV radiation. UV radiation occurs naturally in sunlight and is also produced by artificial sources such as tanning beds. As tanning-bed use has grown, melanoma incidence has risen with it, disproportionately affecting young women, tanning's main customers. Many countries have banned tanning beds, and the World Health Organization classifies them as a Group 1 carcinogen, in the same category as tobacco smoke and asbestos.
- Webb finds an exoplanet with an atmosphere of helium and carbon
Astronomers using the Webb telescope have found an unusual exoplanet, PSR J2322-2650b, whose atmosphere is composed mainly of helium and carbon; deep in the atmosphere, the carbon clouds may condense into diamond. The planet's host star is a pulsar. Pulsars emit high-energy particles such as gamma rays but are invisible in Webb's infrared observations, which is why astronomers could study the planet in such detail. PSR J2322-2650b orbits extremely close to its pulsar, only about a million miles away, completing a revolution in just 7.8 hours. The pulsar's powerful gravity has stretched the Jupiter-mass planet into a lemon shape, and its surface temperature ranges from 600 to 2,000 degrees Celsius.
- Natural light may help diabetes patients control blood sugar
Human cells and tissues follow a circadian rhythm, a 24-hour cycle of metabolic activity that regulates physiological processes such as blood sugar levels. Past research has shown that exposure to artificial light at night disrupts these rhythms and pushes blood sugar higher, while spending more time in sunlight outdoors appears to strengthen the body's response to the hormones that help control blood sugar. Researchers recruited 13 type 2 diabetes patients with an average age of 70 and had them stay in a room for 4.5 days, during which the participants kept taking their usual diabetes medication and were exposed to natural light only through large windows between 8 a.m. and 5 p.m. each day; a control run used artificial light only. Under natural light, participants' blood sugar stayed within the healthy range 50% of the time, versus only 43% under artificial light. The researchers say further studies are needed to confirm the effect.
- The age of the subscription trap
A $169 alarm clock offers special lighting effects and sounds, but customers must pay $4.99 a month for them. Welcome to the age of the subscription trap, in which more and more of the things you pay for turn around and control or constrain you. The subscription model is great for businesses because it delivers a steady revenue stream; for consumers it is mostly a bad deal for the very same reason: you have to keep paying. Subscription fees of $5 a month or more will follow us to our graves. Research shows that in 2023 consumers spent an average of $219 a month on subscription services, and the global subscription market was estimated at $492 billion in 2024, a figure projected to triple by 2033. Companies argue that subscriptions are not just about boosting profits and that consumers benefit too. HP's Instant Ink subscription for its printers, for example, promises that customers need never worry about running out of ink. But once a user cancels, the printer locks half-used cartridges, which require continued payment to use, and HP also uses DRM to block third-party cartridges.
- FSF warns Nintendo's new DRM lets it remotely brick consoles
The Free Software Foundation (FSF) has issued a warning about Nintendo's recently updated DRM, which allows Nintendo, unilaterally and at its own discretion, to revoke players' access to games, security updates, and the internet. Nintendo's new user agreement makes clear that if players fail to comply with its restrictions, Nintendo may render their Nintendo Account Services and/or the associated Nintendo devices permanently unusable. The restrictions Nintendo lists include tampering with the hardware or software in any way, attempting to run backup games, running "second-hand" games, and using third-party games or accessories. Nor is this merely a verbal warning: reports of players' consoles being restricted surfaced less than a month after the Switch 2 went on sale.
- Apple and Google advise employees on visas not to travel abroad
According to internal memos, Apple and Google are advising employees who hold visas not to travel abroad, lest they be stranded on their return, as the Trump administration tightens visa scrutiny. US consulates and embassies report that under new Department of Homeland Security rules, travelers face reviews of up to five years of social media history, and visa appointments are suffering long delays, sometimes stretching to months. Apple and Google employ more than 300,000 people and depend heavily on highly skilled foreign workers. Given the tighter scrutiny and the appointment delays, both companies have told some employees to avoid international travel and stay in the US where possible. Fragomen, the law firm that works with Apple, said employees who cannot postpone travel should contact Apple's immigration team and Fragomen in advance to discuss the risks.
- San Francisco blackout knocks out traffic signals, stranding Waymo robotaxis
San Francisco lost power around nine o'clock on Saturday night after a substation fire; the utility said more than 100,000 customers were affected. The outage also knocked out traffic signals, and with intersections completely dark, Waymo's driverless taxis took the most cautious possible approach: crawling along at a snail's pace, to the point of blocking human drivers. Under traffic law, when signals fail, vehicles must treat intersections as a four-way stop, halting and checking before proceeding. Waymo obeyed the rule but moved so slowly that it caused congestion, and users shared photos and videos on social media of Waymo taxis stalled at intersections.
- Mouse experiments show a natural bacterium has anti-cancer effects
The link between gut microbiota and cancer has drawn intense attention in recent years, with most research focused on indirect approaches such as microbiome modulation or fecal microbiota transplants. Japanese researchers took a direct approach instead: they isolated 45 bacterial strains from the guts of the Japanese tree frog, the Japanese fire-bellied newt, and the Japanese grass lizard, nine of which showed anti-cancer activity in tests. In mouse experiments, one of them, E. americana, completely cleared rectal cancer. Whether the results in mice can be reproduced in humans remains to be seen.
- AI companies' AGI warnings are a cover for a power grab
AI companies keep warning that the arrival of AGI (artificial general intelligence, or superintelligence) is inevitable, yet the flaws of large language models are well known, and there is no scientific consensus that AGI can be developed on top of them. The clamor around AGI is an attempt to obscure the real issues. James O'Sullivan of University College Cork argues that this is a political narrative, one that converts questions about corporate accountability, job losses, algorithmic bias, and democratic governance into abstract philosophical questions about consciousness and control. The discourse of inevitable AGI tries to construct a "tech-savior myth": the people warning of AI risk are the very people building AI and seeking power and wealth. Every improvement in large models is read as a step toward AGI, a strategy AI companies use to win enormous investment and lax regulation. Our future is political, not technological. The question is not whether AGI will arrive, but who decides what kind of AI we get; a small tech elite should not be the ones deciding our future.
- Renewables begin to overtake fossil fuels
Since the Industrial Revolution, humanity has relied on fossil fuels such as coal, oil, and natural gas for energy, and the carbon emissions from these finite resources have greatly accelerated global warming. 2025, however, marks a major shift in this pattern, as renewables such as solar and wind have begun to overtake traditional fossil-fuel energy production in several areas. This year, global renewable growth was sufficient to cover all new electricity demand in the first half of the year, and renewable generation worldwide has surpassed coal. The transition is led by China, whose scale in solar panels, wind turbines, and lithium-battery storage has made it the undisputed global leader in renewable production and technology. Elsewhere, small rooftop solar systems, made affordable and widely available by China's manufacturing dominance, are spreading rapidly, especially in Europe, South Asia, and the Global South, providing reliable, low-cost energy to millions of people. Existing renewables have already markedly slowed the growth of China's greenhouse gas emissions, signaling that a turning point in the global response to climate change is approaching.
- Google sues SerpApi for scraping and selling its search data
Google has sued SerpApi for scraping and selling its search data. Google alleges that SerpApi circumvents its security measures and crawler controls; ignores its directives on content access; uses disguises, rotating bot identities, and botnets to scrape content at scale; and steals licensed content from search features, including images and real-time data, reselling it for profit. Google calls the scraping "brazen" and "unlawful" and says the activity has surged over the past year.
- Google plans to charge a $2-4 service fee on transactions and downloads completed through external content links
In the antitrust suit brought by Epic Games, California judge James Donato ruled that Google must open the Google Play Store to competitors. On the final day for complying with that ruling, Google unveiled a wild plan: a $2-4 service fee on transactions and downloads successfully completed through external content links. Google says developers who distribute apps through Google Play may use links to steer US users to external content, where those users can complete actions including purchasing in-app digital goods or downloading apps whose installation and updates are not managed by Google Play. Developers may also offer external links that let users buy in-app digital goods without using Google Play's billing service, or alongside it. Google plans to charge a service fee on transactions and downloads completed through such links: 10% of the transaction amount for auto-renewing subscriptions, 20% for other in-app digital goods and services, and 10% for transactions covered by the first $1 million of a developer's total annual revenue. For installs of linked external apps, it will charge a flat per-install fee, to be adjusted periodically, that depends on the app category: $3.65 for games and $2.85 for other apps.
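For concreteness, the reported fee schedule can be expressed as a small calculator. This is purely illustrative: the function name is invented, and treating the first-$1M tier as replacing the 20% digital-goods rate is a reading of the summary above, not a confirmed detail of Google's policy:

```python
def external_link_fee(kind, amount=0.0, under_first_million=False):
    """Estimated Google Play service fee for a transaction or install
    completed via an external content link (illustrative sketch)."""
    if kind == "subscription":
        return 0.10 * amount  # auto-renewing subscriptions: 10%
    if kind == "digital_good":
        # Other in-app digital goods and services: 20%, assumed to
        # drop to 10% within the developer's first $1M of annual revenue.
        rate = 0.10 if under_first_million else 0.20
        return rate * amount
    if kind == "game_install":
        return 3.65           # flat per-install fee, games
    if kind == "app_install":
        return 2.85           # flat per-install fee, other apps
    raise ValueError(f"unknown kind: {kind}")

# Example: a $9.99 in-app purchase completed via an external link.
fee = external_link_fee("digital_good", amount=9.99)
```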
- 2025 may be the turning point in the TV brightness war
2025 may be the turning point in the TV brightness war, because TVs are now brighter than the peak brightness of HDR content. TCL and Hisense launched the first consumer TVs able to reach 5,000 nits in certain settings in 2025; not long ago, TV makers were still struggling to hit 2,000 nits, and 5,000 seemed out of reach. LG introduced its Primary RGB Tandem OLED technology, upgrading the three-layer panel design to a four-layer RGBG stack capable of 4,000 nits. The technology is already used in the LG G5, the Panasonic Z95B, and the Philips OLED950 and OLED910. TCL launched the Q10M with RGB mini-LED technology, Hisense has a similar product, and Samsung's version is called micro-RGB. HDR content, for its part, is currently mastered at a maximum of 4,000 nits.
- Most parked domains redirect to malicious content
Domain parking refers to expired or dormant domains, or domains that are common misspellings of popular websites. When users land on a domain-parking company's page through a typo, it typically displays paid third-party links. An analysis by security researchers in 2014 found that fewer than 5% of parked pages redirected users to malicious content; most of the links were legitimate. Today the ratio has flipped: new research from security firm Infoblox finds that most parked-domain sites now redirect users to malicious content, with over 90% of parked-domain links leading to illegal content, scam sites, scareware, antivirus subscription pitches, or malware.