OrangeBot.AI Digest — 2025-10-16
60 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- How I bypassed Amazon's Kindle web DRM (blog.pixelmelt.dev)
- Syntax highlighting is a waste of an information channel (2020) (buttondown.com)
- Codex Is Live in Zed (zed.dev)
- Gemini 3.0 spotted in the wild through A/B testing (ricklamers.io)
- Claude Skills (www.anthropic.com)
- Video game union workers rally against $55B private acquisition of EA (www.eurogamer.net)
- Tor browser removing various Firefox AI features (blog.torproject.org)
- DoorDash and Waymo launch autonomous delivery service in Phoenix (about.doordash.com)
- Why I Chose Elixir Phoenix over Rails, Laravel, and Next.js (akarshc.com)
- Hyperflask – Full stack Flask and Htmx framework (hyperflask.dev)
- Liquibase continues to advertise itself as "open source" despite license switch (github.com)
- JustSketchMe – Digital Posing Tool (justsketch.me)
- Upcoming Rust language features for kernel development (lwn.net)
- Steve Jobs and Cray-1 to be featured on 2026 American Innovations $1 coin (www.usmint.gov)
- Flies keep landing on North Sea oil rigs (theconversation.com)
GitHub Trending(15)
- nvm-sh / nvm
Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions
- devlikeapro / waha
WAHA - WhatsApp HTTP API (REST API) that you can configure in a click! 3 engines: WEBJS (browser based), NOWEB (websocket nodejs), GOWS (websocket go)
- QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
- ChristianLempa / boilerplates
This is my personal template collection. Here you'll find templates and configurations for various tools and technologies.
- karpathy / nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
- ntdevlabs / tiny11builder
Scripts to build a trimmed-down Windows 11 image.
- envoyproxy / envoy
Cloud-native high-performance edge/middle/service proxy
- GorvGoyl / Clone-Wars
100+ open-source clones of popular sites like Airbnb, Amazon, Instagram, Netflix, Tiktok, Spotify, Whatsapp, Youtube etc. See source code, demo links, tech stack, github stars.
- linexjlin / GPTs
leaked prompts of GPTs
- reflex-dev / reflex
🕸️ Web apps in pure Python 🐍
- wmjordan / PDFPatcher
PDFPatcher (PDF补丁丁) — a PDF toolbox: edit bookmarks, crop and rotate pages, remove restrictions, extract or merge documents, inspect document structure, extract images, convert pages to images, and more.
- KellerJordan / modded-nanogpt
NanoGPT (124M) in 3 minutes
- anthropics / prompt-eng-interactive-tutorial
Anthropic's Interactive Prompt Engineering Tutorial
- jingyaogong / minimind
🚀🚀 Train a 26M-parameter small GPT completely from scratch in just 2 hours! 🌏
- DataDog / datadog-agent
Main repository for Datadog Agent
Hugging Face(15)
- UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
Recent advances in unified multimodal models indicate a clear trend towards comprehensive content generation. However, the auditory domain remains a significant challenge, with music and speech often developed in isolation, hindering progress towards universal audio synthesis. This separation stems from inherent task conflicts and severe data imbalances, which impede the development of a truly unified audio generation model. To address this challenge, we propose UniMoE-Audio, a unified speech and music generation model within a novel Dynamic-Capacity Mixture-of-Experts (MoE) framework. Architecturally, UniMoE-Audio introduces a Top-P routing strategy for dynamic expert number allocation, and a hybrid expert design comprising routed experts for domain-specific knowledge, shared experts for domain-agnostic features, and null experts for adaptive computation skipping. To tackle data imbalance, we introduce a three-stage training curriculum: 1) Independent Specialist Training leverages original datasets to instill domain-specific knowledge into each "proto-expert" without interference; 2) MoE Integration and Warmup incorporates these specialists into the UniMoE-Audio architecture, warming up the gate module and shared expert using a subset of balanced dataset; and 3) Synergistic Joint Training trains the entire model end-to-end on the fully balanced dataset, fostering enhanced cross-domain synergy. Extensive experiments show that UniMoE-Audio not only achieves state-of-the-art performance on major speech and music generation benchmarks, but also demonstrates superior synergistic learning, mitigating the performance degradation typically seen in naive joint training. Our findings highlight the substantial potential of specialized MoE architecture and curated training strategies in advancing the field of universal audio generation. Homepage: https://mukioxun.github.io/Uni-MoE-site/home.html
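A concrete way to picture the Top-P routing step above: rather than a fixed top-k, each token activates the smallest set of experts whose cumulative router probability exceeds a threshold p. The sketch below illustrates only that selection rule; the function name, tensor shapes, and threshold value are assumptions for illustration, not the authors' implementation.

```python
import torch

def top_p_expert_routing(router_logits: torch.Tensor, p: float = 0.7):
    """Per token, keep the smallest set of experts whose cumulative
    router probability exceeds p (illustrative sketch only)."""
    probs = torch.softmax(router_logits, dim=-1)               # [tokens, n_experts]
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # An expert is kept if the probability mass *before* it is still below p,
    # so the expert that crosses the threshold is included.
    keep_sorted = (cumulative - sorted_probs) < p
    keep = torch.zeros_like(probs).scatter(-1, sorted_idx, keep_sorted.float()).bool()
    weights = torch.where(keep, probs, torch.zeros_like(probs))
    weights = weights / weights.sum(dim=-1, keepdim=True)      # renormalize kept experts
    return weights, keep

# Tokens with a flat router distribution activate more experts than
# confidently routed tokens, which is the "dynamic capacity" effect.
weights, keep = top_p_expert_routing(torch.randn(4, 8), p=0.7)
print(keep.sum(dim=-1))   # experts activated per token (varies by token)
```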
- FlashWorld: High-quality 3D Scene Generation within Seconds
We propose FlashWorld, a generative model that produces 3D scenes from a single image or text prompt in seconds, 10–100× faster than previous works while possessing superior rendering quality. Our approach shifts from the conventional multi-view-oriented (MV-oriented) paradigm, which generates multi-view images for subsequent 3D reconstruction, to a 3D-oriented approach where the model directly produces 3D Gaussian representations during multi-view generation. While ensuring 3D consistency, 3D-oriented methods typically suffer from poor visual quality. FlashWorld includes a dual-mode pre-training phase followed by a cross-mode post-training phase, effectively integrating the strengths of both paradigms. Specifically, leveraging the prior from a video diffusion model, we first pre-train a dual-mode multi-view diffusion model, which jointly supports MV-oriented and 3D-oriented generation modes. To bridge the quality gap in 3D-oriented generation, we further propose a cross-mode post-training distillation that matches the distribution of the consistent 3D-oriented mode to the high-quality MV-oriented mode. This not only enhances visual quality while maintaining 3D consistency, but also reduces the required denoising steps for inference. Also, we propose a strategy to leverage massive single-view images and text prompts during this process to enhance the model's generalization to out-of-distribution inputs. Extensive experiments demonstrate the superiority and efficiency of our method.
- Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
The reasoning pattern of Large language models (LLMs) remains opaque, and Reinforcement learning (RL) typically applies uniform credit across an entire generation, blurring the distinction between pivotal and routine steps. This work positions attention as a privileged substrate that renders the internal logic of LLMs legible, not merely as a byproduct of computation, but as a mechanistic blueprint of reasoning itself. We first distinguish attention heads between locally and globally focused information processing and reveal that locally focused heads produce a sawtooth pattern near the diagonal indicating phrasal chunks, while globally focused heads expose tokens that exert broad downstream influence over future tokens. We formalize these with two metrics: 1) Windowed Average Attention Distance, which measures the extent of backward attention within a clipped window; 2) Future Attention Influence, which quantifies a token's global importance as the average attention it receives from subsequent tokens. Taken together, these signals reveal a recurring preplan-and-anchor mechanism, where the model first performs a long-range contextual reference to generate an introductory token, which is immediately followed by or coincides with a semantic anchor token that organizes subsequent reasoning. Leveraging these insights, we introduce three novel RL strategies that dynamically perform targeted credit assignment to critical nodes (preplan tokens, anchor tokens, and their temporal coupling) and show consistent performance gains across various reasoning tasks. By aligning optimization with the model's intrinsic reasoning rhythm, we aim to transform opaque optimization into an actionable structure-aware process, hoping to offer a potential step toward more transparent and effective optimization of LLM reasoning.
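Taken at face value, the two metrics can be written down directly from the abstract's wording; the formulas below are one plausible reading, not the paper's exact definitions (normalization and window handling may differ). Let a_{t,j} be the attention weight from query token t to key token j, w the clipping window, and T the sequence length:

```latex
\mathrm{WAAD}_t \;=\; \sum_{j \le t} a_{t,j}\,\min(t - j,\; w),
\qquad
\mathrm{FAI}_j \;=\; \frac{1}{T - j}\sum_{t = j + 1}^{T} a_{t,j}
```

Since causal attention weights sum to one over j ≤ t, WAAD_t is the attention-weighted mean backward distance of token t, clipped at w (small for the locally focused, sawtooth heads), while FAI_j averages the attention token j receives from all later tokens, flagging anchor-like tokens with broad downstream influence.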
- LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
Vision-Language-Action (VLA) models report impressive success rates on robotic manipulation benchmarks, yet these results may mask fundamental weaknesses in robustness. We perform a systematic vulnerability analysis by introducing controlled perturbations across seven dimensions: object layout, camera viewpoints, robot initial states, language instructions, lighting conditions, background textures, and sensor noise. We comprehensively analyze multiple state-of-the-art models and reveal consistent brittleness beneath apparent competence. Our analysis exposes critical weaknesses: models exhibit extreme sensitivity to perturbation factors, including camera viewpoints and robot initial states, with performance dropping from 95% to below 30% under modest perturbations. Surprisingly, models are largely insensitive to language variations, with further experiments revealing that models tend to ignore language instructions completely. Our findings challenge the assumption that high benchmark scores equate to true competency and highlight the need for evaluation practices that assess reliability under realistic variation.
- Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
Fully open multimodal large language models (MLLMs) currently lag behind proprietary counterparts, primarily due to a significant gap in data quality for supervised fine-tuning (SFT). Existing open-source datasets are often plagued by widespread noise and a critical deficit in complex reasoning data, such as Chain-of-Thought (CoT), which hinders the development of advanced model capabilities. Addressing these challenges, our work makes three primary contributions. First, we introduce Honey-Data-15M, a new SFT dataset comprising approximately 15 million QA pairs, processed through multiple cleaning techniques and enhanced with a novel dual-level (short and long) CoT enrichment strategy. Second, we introduce HoneyPipe, the data curation pipeline, and its underlying framework DataStudio, providing the community with a transparent and adaptable methodology for data curation that moves beyond static dataset releases. Finally, to validate our dataset and pipeline, we train Bee-8B, an 8B model on Honey-Data-15M. Experiments show that Bee-8B establishes a new state-of-the-art (SOTA) for fully open MLLMs, achieving performance that is competitive with, and in some cases surpasses, recent semi-open models such as InternVL3.5-8B. Our work delivers to the community a suite of foundational resources, including: the Honey-Data-15M corpus; the full-stack suite comprising HoneyPipe and DataStudio; training recipes; an evaluation harness; and the model weights. This effort demonstrates that a principled focus on data quality is a key pathway to developing fully open MLLMs that are highly competitive with their semi-open counterparts.
- PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning
Video generation models nowadays are capable of generating visually realistic videos, but often fail to adhere to physical laws, limiting their ability to generate physically plausible videos and serve as ''world models''. To address this issue, we propose PhysMaster, which captures physical knowledge as a representation for guiding video generation models to enhance their physics-awareness. Specifically, PhysMaster is based on the image-to-video task where the model is expected to predict physically plausible dynamics from the input image. Since the input image provides physical priors like relative positions and potential interactions of objects in the scenario, we devise PhysEncoder to encode physical information from it as an extra condition to inject physical knowledge into the video generation process. The lack of proper supervision on the model's physical performance beyond mere appearance motivates PhysEncoder to apply reinforcement learning with human feedback to physical representation learning, which leverages feedback from generation models to optimize physical representations with Direct Preference Optimization (DPO) in an end-to-end manner. PhysMaster provides a feasible solution for improving physics-awareness of PhysEncoder and thus of video generation, proving its ability on a simple proxy task and generalizability to wide-ranging physical scenarios. This implies that our PhysMaster, which unifies solutions for various physical processes via representation learning in the reinforcement learning paradigm, can act as a generic and plug-in solution for physics-aware video generation and broader applications.
- InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
We introduce InteractiveOmni, a unified and open-source omni-modal large language model for audio-visual multi-turn interaction, ranging from 4B to 8B parameters, designed to lead the field of lightweight models by offering comprehensive omni-modal understanding and speech generation capabilities. To achieve this, we integrate the vision encoder, audio encoder, large language model, and speech decoder into a unified model for understanding and generation tasks. We design a multi-stage training strategy to ensure robust cross-modal capabilities, including pre-training for omni-modal understanding, followed by post-training with speech conversation and audio-visual interaction. To enable human-like long-term conversational ability, we meticulously curate a multi-turn training dataset that enhances the model's ability to handle complex and multi-turn interactions. To effectively evaluate the multi-turn memory and speech interaction capabilities, we construct the multi-modal multi-turn memory benchmark and the multi-turn speech interaction benchmark. Experiments demonstrate that InteractiveOmni significantly outperforms leading open-source models and provides a more intelligent multi-turn audio-visual experience, particularly in its long-term memory capabilities. Notably, InteractiveOmni-4B is comparable to much larger models such as Qwen2.5-Omni-7B on general benchmarks, and it can retain 97% of the performance of InteractiveOmni-8B while utilizing only 50% of the model size. Achieving state-of-the-art results against similarly sized models across image, audio, video understanding, and speech generation tasks, InteractiveOmni is an accessible, open-source foundation for next-generation intelligent interactive systems.
- CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving
Generative models have been widely applied to world modeling for environment simulation and future state prediction. With advancements in autonomous driving, there is a growing demand not only for high-fidelity video generation under various controls, but also for producing diverse and meaningful information such as depth estimation. To address this, we propose CVD-STORM, a cross-view video diffusion model utilizing a spatial-temporal reconstruction Variational Autoencoder (VAE) that generates long-term, multi-view videos with 4D reconstruction capabilities under various control inputs. Our approach first fine-tunes the VAE with an auxiliary 4D reconstruction task, enhancing its ability to encode 3D structures and temporal dynamics. Subsequently, we integrate this VAE into the video diffusion process to significantly improve generation quality. Experimental results demonstrate that our model achieves substantial improvements in both FID and FVD metrics. Additionally, the jointly-trained Gaussian Splatting Decoder effectively reconstructs dynamic scenes, providing valuable geometric information for comprehensive scene understanding.
- ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
While most autoregressive LLMs are constrained to one-by-one decoding, diffusion LLMs (dLLMs) have attracted growing interest for their potential to dramatically accelerate inference through parallel decoding. Despite this promise, the conditional independence assumption in dLLMs causes parallel decoding to ignore token dependencies, inevitably degrading generation quality when these dependencies are strong. However, existing works largely overlook these inherent challenges, and evaluations on standard benchmarks (e.g., math and coding) are not sufficient to capture the quality degradation caused by parallel decoding. To address this gap, we first provide an information-theoretic analysis of parallel decoding. We then conduct case studies on analytically tractable synthetic list operations from both data distribution and decoding strategy perspectives, offering quantitative insights that highlight the fundamental limitations of parallel decoding. Building on these insights, we propose ParallelBench, the first benchmark specifically designed for dLLMs, featuring realistic tasks that are trivial for humans and autoregressive LLMs yet exceptionally challenging for dLLMs under parallel decoding. Using ParallelBench, we systematically analyze both dLLMs and autoregressive LLMs, revealing that: (i) dLLMs under parallel decoding can suffer dramatic quality degradation in real-world scenarios, and (ii) current parallel decoding strategies struggle to adapt their degree of parallelism based on task difficulty, thus failing to achieve meaningful speedup without compromising quality. Our findings underscore the pressing need for innovative decoding methods that can overcome the current speed-quality trade-off. We release our benchmark to help accelerate the development of truly efficient dLLMs.
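The token-dependency failure the benchmark targets can be reproduced in miniature: a factorized decoder samples each position from its own marginal, so a constraint that two positions must agree is satisfied only by chance even when each marginal is exactly right. A minimal toy sketch with made-up numbers (not the paper's tasks or code):

```python
import random

# Toy joint distribution over two tokens that must match:
# P(A,A) = 0.5, P(B,B) = 0.5 -> each marginal is uniform over {A, B}.
pairs = [("A", "A"), ("B", "B")]

def sequential_sample():
    # Autoregressive-style: the pair is drawn jointly, so the second
    # token is consistent with the first by construction.
    return random.choice(pairs)

def parallel_sample():
    # Parallel decoding under conditional independence: each position is
    # drawn from its own marginal, ignoring the dependency between them.
    return random.choice("AB"), random.choice("AB")

n = 100_000
seq_ok = sum(a == b for a, b in (sequential_sample() for _ in range(n))) / n
par_ok = sum(a == b for a, b in (parallel_sample() for _ in range(n))) / n
print(f"sequential consistency: {seq_ok:.2f}")   # ~1.00
print(f"parallel   consistency: {par_ok:.2f}")   # ~0.50, dependency is lost
```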
- Trace Anything: Representing Any Video in 4D via Trajectory Fields
Effective spatio-temporal representation is fundamental to modeling, understanding, and predicting dynamics in videos. The atomic unit of a video, the pixel, traces a continuous 3D trajectory over time, serving as the primitive element of dynamics. Based on this principle, we propose representing any video as a Trajectory Field: a dense mapping that assigns a continuous 3D trajectory function of time to each pixel in every frame. With this representation, we introduce Trace Anything, a neural network that predicts the entire trajectory field in a single feed-forward pass. Specifically, for each pixel in each frame, our model predicts a set of control points that parameterizes a trajectory (i.e., a B-spline), yielding its 3D position at arbitrary query time instants. We trained the Trace Anything model on large-scale 4D data, including data from our new platform, and our experiments demonstrate that: (i) Trace Anything achieves state-of-the-art performance on our new benchmark for trajectory field estimation and performs competitively on established point-tracking benchmarks; (ii) it offers significant efficiency gains thanks to its one-pass paradigm, without requiring iterative optimization or auxiliary estimators; and (iii) it exhibits emergent abilities, including goal-conditioned manipulation, motion forecasting, and spatio-temporal fusion. Project page: https://trace-anything.github.io/.
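Concretely, "a set of control points that parameterizes a trajectory" means each pixel's 3D position at any query time comes from evaluating a spline. The sketch below shows such an evaluation using SciPy's B-spline API; the number of control points, degree, and knot layout are assumptions for illustration, not the authors' parameterization.

```python
import numpy as np
from scipy.interpolate import BSpline

# Suppose the network predicts, for one pixel, K control points in 3D
# over the clip's normalized time range [0, 1].
K, degree = 6, 3
control_points = np.random.rand(K, 3)          # stand-in for a model prediction

# Clamped uniform knot vector so the curve starts/ends at the first/last control point.
knots = np.concatenate([np.zeros(degree),
                        np.linspace(0.0, 1.0, K - degree + 1),
                        np.ones(degree)])
trajectory = BSpline(knots, control_points, degree)

# Query the pixel's 3D position at arbitrary time instants.
t_query = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
positions = trajectory(t_query)                # shape (5, 3): xyz per query time
print(positions)
```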
- Generative Universal Verifier as Multimodal Meta-Reasoner
We introduce Generative Universal Verifier, a novel concept and plugin designed for next-generation multimodal reasoning in vision-language models and unified multimodal models, providing the fundamental capability of reflection and refinement on visual outcomes during the reasoning and generation process. This work makes three main contributions: (1) We build ViVerBench, a comprehensive benchmark spanning 16 categories of critical tasks for evaluating visual outcomes in multimodal reasoning. Results show that existing VLMs consistently underperform across these tasks, underscoring a substantial gap from human-level capability in reliable visual verification. (2) We design two automated pipelines to construct large-scale visual verification data and train OmniVerifier-7B, the first omni-capable generative verifier trained for universal visual verification and achieves notable gains on ViVerBench(+8.3). Through training, we identify three atomic capabilities in visual verification and demonstrate how they generalize and interact synergistically. (3) We propose OmniVerifier-TTS, a sequential test-time scaling paradigm that leverages the universal verifier to bridge image generation and editing within unified models, enhancing the upper bound of generative ability through iterative fine-grained optimization. Beyond generation, we extend universal verifier to broader world-modeling interleaved reasoning scenarios. Empirically, OmniVerifier-TTS achieves improvements on T2I-ReasonBench(+3.7), and GenEval++(+4.3), outperforming existing parallel test-time scaling methods, such as Best-of-N. By endowing multimodal reasoning with reliable visual verification, OmniVerifier advances both reliable reflection during generation and scalable test-time refinement, marking a step toward more trustworthy and controllable next-generation reasoning systems.
- Reasoning in Space via Grounding in the World
In this paper, we claim that 3D visual grounding is the cornerstone of spatial reasoning and introduce the Grounded-Spatial Reasoner (GS-Reasoner) to explore the effective spatial representations that bridge the gap between them. Existing 3D LLMs suffer from the absence of a unified 3D representation capable of jointly capturing semantic and geometric information. This deficiency is manifested either in poor performance on grounding or in an excessive reliance on external modules, ultimately hindering the seamless integration of grounding and spatial reasoning. To address this, we propose a simple yet effective dual-path pooling mechanism that tightly aligns geometric features with both semantic and positional cues, constructing a unified image patch-based 3D representation that encapsulates all essential information without increasing the number of input tokens. Leveraging this holistic representation, GS-Reasoner is the first 3D LLM that achieves autoregressive grounding entirely without external modules while delivering performance comparable to state-of-the-art models, establishing a unified and self-contained framework for 3D spatial reasoning. To further bridge grounding and spatial reasoning, we introduce the Grounded Chain-of-Thought (GCoT) dataset. This dataset is meticulously curated to include both 3D bounding box annotations for objects referenced in reasoning questions and step-by-step reasoning paths that integrate grounding as a core component of the problem-solving process. Extensive experiments demonstrate that GS-Reasoner achieves impressive results on 3D visual grounding, which in turn significantly enhances its spatial reasoning capabilities, leading to state-of-the-art performance.
- InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
We introduce InternVLA-M1, a unified framework for spatial grounding and robot control that advances instruction-following robots toward scalable, general-purpose intelligence. Its core idea is spatially guided vision-language-action training, where spatial grounding serves as the critical link between instructions and robot actions. InternVLA-M1 employs a two-stage pipeline: (i) spatial grounding pre-training on over 2.3M spatial reasoning data to determine "where to act" by aligning instructions with visual, embodiment-agnostic positions, and (ii) spatially guided action post-training to decide "how to act" by generating embodiment-aware actions through plug-and-play spatial prompting. This spatially guided training recipe yields consistent gains: InternVLA-M1 outperforms its variant without spatial guidance by +14.6% on SimplerEnv Google Robot, +17% on WidowX, and +4.3% on LIBERO Franka, while demonstrating stronger spatial reasoning capability in box, point, and trace prediction. To further scale instruction following, we built a simulation engine to collect 244K generalizable pick-and-place episodes, enabling a 6.2% average improvement across 200 tasks and 3K+ objects. In real-world clustered pick-and-place, InternVLA-M1 improved by 7.3%, and with synthetic co-training, achieved +20.6% on unseen objects and novel configurations. Moreover, in long-horizon reasoning-intensive scenarios, it surpassed existing works by over 10%. These results highlight spatially guided training as a unifying principle for scalable and resilient generalist robots. Code and models are available at https://github.com/InternRobotics/InternVLA-M1.
- The Role of Computing Resources in Publishing Foundation Model Research
Cutting-edge research in Artificial Intelligence (AI) requires considerable resources, including Graphics Processing Units (GPUs), data, and human resources. In this paper, we evaluate the relationship between these resources and the scientific advancement of foundation models (FM). We reviewed 6517 FM papers published between 2022 and 2024, and surveyed 229 first authors about the impact of computing resources on scientific output. We find that increased computing is correlated with national funding allocations and citations, but we do not observe strong correlations with research environment (academic or industrial), domain, or study methodology. We advise that individuals and institutions focus on creating shared and affordable computing opportunities to lower the entry barrier for under-resourced researchers. These steps can help expand participation in FM research, foster diversity of ideas and contributors, and sustain innovation and progress in AI. The data will be available at: https://mit-calc.csail.mit.edu/
- What Generative Search Engines Like and How to Optimize Web Content Cooperatively
By employing large language models (LLMs) to retrieve documents and generate natural language responses, Generative Engines, such as Google AI Overview and ChatGPT, provide significantly enhanced user experiences and have rapidly become the new form of search. Their rapid adoption also drives the need for Generative Engine Optimization (GEO), as content providers are eager to gain more traction from them. In this paper, we introduce AutoGEO, a framework that automatically learns generative engine preferences when retrieved content is used for response generation, and rewrites web content to gain more such traction. AutoGEO first prompts frontier LLMs to explain generative engine preferences and extracts meaningful preference rules from these explanations. Then it uses the preference rules as context engineering for AutoGEO_API, a prompt-based GEO system, and as rule-based rewards to train AutoGEO_Mini, a cost-effective GEO model. Experiments on the standard GEO-Bench and two newly constructed benchmarks using real user queries demonstrate the effectiveness of AutoGEO in enhancing content traction while preserving search utility. Analyses confirm the learned rules' robustness and ability to capture distinct preferences across domains, and the AutoGEO systems' ability to embed them in content optimization. The code is released at https://github.com/cxcscmu/AutoGEO.
Solidot(15)
- Japanese government demands OpenAI stop copyright infringement
The Japanese government has formally asked OpenAI to stop infringing copyright. Earlier this month OpenAI released Sora 2, which can generate 20-second videos at 1080p resolution. The web has since been flooded with Sora 2 videos, many of them using copyrighted characters from popular Japanese anime and games, including One Piece, Demon Slayer, Pokémon, and Mario. Minoru Kiuchi, Japan's minister in charge of IP and AI strategy, called anime an irreplaceable treasure that Japan presents to the world. Digital minister Masaaki Taira hopes OpenAI will voluntarily comply with the relevant copyright laws. If the issue is not resolved, action could be taken under the provisions of Japan's AI Promotion Act.
- Apple's new MacBook Pro offers up to 24 hours of battery life
Apple announced a new MacBook Pro built around the M5 chip, on sale October 22 with a starting price of 12,999 yuan. The M5 combines a 10-core CPU (4 performance cores and 6 efficiency cores), a 10-core GPU with neural accelerators and hardware-accelerated ray tracing, and a 16-core Neural Engine. Apple says AI performance is up to 3.5x that of the previous-generation M4, graphics performance up to 1.6x higher, and multithreaded CPU performance up to 20% higher. Apple claims the MacBook Pro's battery lasts up to 24 hours.
- Global temperatures could be 2°C higher by 2050
The UK's Climate Change Committee (CCC) has warned that the government needs to prepare for 2°C of global warming by 2050; inadequate preparation could bring serious economic and health consequences. The committee says 2°C of warming would significantly affect the UK's weather, making extreme events more frequent and widespread. The UK could face more heatwaves, droughts, and floods, and the wildfire season could stretch into autumn. UK weather patterns are already shifting because of climate change: in 2025 the Met Office confirmed four official heatwaves, and it was the hottest summer on record. Owing to human-caused warming, summers as hot as or hotter than 2025 are now far more likely.
- Nearly 70% of US adults are obese under a new definition
Obesity has traditionally been defined by BMI: a BMI of 30 or above counts as obese. That definition has long been disputed because it does not distinguish fat from muscle. To resolve the dispute, medical experts called in January this year for a new definition. Under it, a person is obese if their BMI exceeds 40; if their BMI is elevated and at least one of waist circumference, waist-to-hip ratio, or waist-to-height ratio is also high; if, regardless of BMI, two of those three measures are high; or if a scan directly shows excess body fat. The experts also propose dividing obesity into two categories: clinical obesity (with signs of disease) and preclinical obesity (without signs of disease). Research shows the revised definition sharply raises the US adult obesity rate: in an analysis of data on 301,026 Americans aged 18 to 80, 44% were obese under the traditional definition versus 69% under the new one, rising to 78% among people over 70.
- Reddit co-founder says most of the internet is dead
Reddit co-founder Alexis Ohanian has voiced his frustration with the state of the internet, saying a large part of it is dead. He pointed out that much of the content online is generated by AI or bots, citing the "dead internet theory", the idea that bots on the internet now outnumber active humans. He believes real people are needed to keep the internet from dying, and that the next generation of social media will grow out of group chats, whose members are real people, even though some participants have started using AI to help generate and edit messages. Ohanian said group chats are the gold standard but not a new technology, so there will certainly be a next iteration; group chats are where all of us get our best information today.
- GLP-1 weight-loss drugs show potential for treating diabetes
For many people with diabetes, controlling blood sugar is not only a long-term battle but also a balancing act with quality of life: the inconvenience of injections, the struggle with medication adherence, and the double burden of weight and blood lipids make glucose control hard to face with ease. That may be changing. Eli Lilly announced positive results for its investigational oral GLP-1 drug orforglipron in two pivotal Phase 3 trials, ACHIEVE-2 and ACHIEVE-5. The drug not only lowered blood sugar significantly but also performed well on weight, lipids, and other metabolic measures, offering new hope to people with diabetes worldwide. GLP-1 drugs have become the "stars" of diabetes treatment in recent years, but most of them must be injected. Orforglipron is an oral small-molecule drug taken just once a day, with no strict food or water restrictions, which greatly lowers the psychological barrier to treatment and makes long-term, consistent therapy easier. Lilly plans to submit orforglipron to regulators worldwide for type 2 diabetes in 2026, with the obesity indication expected to be filed by the end of this year.
- Nearly 40% of US children under 2 have used smartphones
The Pew Research Center surveyed how US parents manage screen time for children under 12. 61% of parents surveyed said their child had used or interacted with a smartphone; among parents of children under 2, the figure was 38%. Most of children's screen time goes to streaming cartoons. The survey found that the share of children under 2 who watch YouTube has jumped from 45% to 62%. It also found that nearly a quarter of US parents let children aged 12 and under have their own smartphone, a share that rises to nearly 60% among 11- and 12-year-olds.
- Apple pledges to increase investment in China
Apple CEO Tim Cook, meeting Minister of Industry and Information Technology Li Lecheng in Beijing on Wednesday, promised to keep increasing Apple's investment in China, though no figures for the expected scale were disclosed. With US President Trump pushing for manufacturing in America, Apple has also previously committed to expanding its US investment, pledging up to $600 billion over the next four years.
- Firefox 145 Beta drops 32-bit Linux builds
Alongside the Firefox 144 release, Mozilla shipped the beta of the next version, and from Firefox 145 onward it no longer supports 32-bit Linux. Firefox 144 is the last release to support 32-bit Linux, and the last long-term support release to do so is Firefox 140 ESR. Mozilla now encourages users to install 64-bit Firefox. Major changes in Firefox 145 include more rounded horizontal tabs and support for the widely used Matroska format, among other things.
- Hackers breach security firm, stealing undisclosed vulnerability data and source code
US cybersecurity firm F5 disclosed that nation-state hackers broke into its systems and stole undisclosed BIG-IP vulnerability information and source code. The company first discovered the intrusion on August 9, and further investigation found the attackers had held access for a long time. The investigation shows the hackers took source code for the flagship BIG-IP product and information on undisclosed vulnerabilities, along with configuration and implementation details for a small number of customers. F5 has 23,000 customers in 170 countries and territories, including 48 of the Fortune 50. F5 says the breach did not compromise its software supply chain or lead to any suspicious code changes.
- Firefox 144.0 released
Mozilla has released Firefox 144.0. Major new features include: improved tab groups and profile management (rolling out gradually over the coming weeks, with Windows 10 support delayed); passwords in the Firefox password manager now encrypted with AES-256-CBC, replacing the old 3DES-CBC; image search powered by Google Lens (desktop only, and only when Google is the default search engine); support for the AI answer engine Perplexity as a search option; and more.
- Teen social media use linked to worse cognitive performance
According to a study published in JAMA, adolescents who use social media heavily performed worse on reading, vocabulary, and memory tests than peers who use it little or not at all. The study analyzed data on more than 6,000 children followed from ages 9-10 into early adolescence, split into three groups: 58% rarely or never used social media; 37% were low-frequency early users spending about an hour a day on social media by age 13; and 6% spent about three hours or more per day. The low-intensity users (about an hour a day) scored 1 to 2 points lower than non-users on reading and memory tests, while high-intensity users scored 4 to 5 points lower. The researchers controlled for age, sex, race and ethnicity, household income, parental education, ADHD, depression, and other factors.
- US seizes $15 billion in bitcoin from Cambodian scam group
The US Department of Justice announced on Tuesday that it had frozen 127,271 bitcoins worth $15 billion belonging to Cambodia's Prince Group and indicted the group's founder and chairman, Chen Zhi, aka Vincent. Prince Group runs dozens of companies in more than 30 countries and is considered one of Asia's largest transnational criminal organizations, operating scam centers across Cambodia that carry out cryptocurrency investment fraud. The workers forced into the scams, most of them brought in from elsewhere, labor in Cambodian scam compounds under threat of violence, running tens of thousands of fake social media accounts every day to lure investors worldwide into "pig-butchering" cryptocurrency schemes that have stolen billions of dollars. Chen Zhi is also accused of personal involvement in the violence, having once instructed that victims "not be beaten to death". He faces up to 40 years in prison.
- FSF announces the Librephone project
The Free Software Foundation (FSF) has announced Librephone, a project aimed at bringing computing freedom to phone users. Most software users today treat the phone as their primary computing device. After forty years of advocating for computing freedom, the FSF is committed to extending the rights to study, replace, share, and modify the programs people rely on in daily life to phones. The FSF has hired Rob Savoye (DejaGNU, Gnash, OpenStreetMap, and more) to lead the project. He is currently focusing on the LineageOS project, studying the device firmware and binary blobs it depends on. The goal is to pick the most freedom-respecting smartphone available, reverse engineer its proprietary components, replace them with fully free software, and build a completely free Android-compatible operating system.
- Study finds satellites transmit sensitive data unencrypted
Researchers at UC San Diego and the University of Maryland found that roughly half of geostationary satellite signals carrying sensitive data are transmitted without encryption. Over three years, the team spent $800 on a satellite receiver installed on a university rooftop and intercepted satellite communications visible from their location. In just nine hours they recorded calls and text messages from more than 2,700 T-Mobile users. They also captured data from airline passengers using in-flight Wi-Fi, communications from electric utilities and offshore oil and gas platforms, and US and Mexican military communications that leaked personnel locations and equipment information. The exposed data stems from telecom companies using satellites to relay traffic from remote cell towers back to their core networks. After the researchers alerted them, most of the companies encrypted their satellite links.