OrangeBot.AI Digest — 2025-08-24
67 headlines across 5 sources, aggregated for the day.
Hacker News(15)
- Paracetamol disrupts early embryogenesis by cell cycle inhibition (academic.oup.com)
- Cloudflare incident on August 21, 2025 (blog.cloudflare.com)
- Burner Phone 101 (rebeccawilliams.info)
- Is 4chan the perfect Pirate Bay poster child to justify wider UK site-blocking? (torrentfreak.com)
- Making games in Go: 3 months without LLMs vs. 3 days with LLMs (marianogappa.github.io)
- Comet AI browser can get prompt injected from any site, drain your bank account (twitter.com)
- Trees on city streets cope with drought by drinking from leaky pipes (www.newscientist.com)
- US attack on renewables will lead to power crunch that spikes electricity prices (www.cnbc.com)
- ICE uses celebrity loophole to hide deportation flights (jacobin.com)
- Dynamically patch a Python function's source code at runtime (ericmjl.github.io)
- Show HN: Clearcam – Add AI object detection to your IP CCTV cameras (github.com)
- A German ISP changed their DNS to block my website (lina.sh)
- Valve Software handbook for new employees [pdf] (2012) (cdn.akamai.steamstatic.com)
- Turning Claude Code into my best design partner (betweentheprompts.com)
- It is worth it to buy the fast CPU (blog.howardjohn.info)
GitHub Trending(15)
- winapps-org / winapps
Run Windows apps such as Microsoft Office/Adobe in Linux (Ubuntu/Fedora) and GNOME/KDE as if they were a part of the native OS, including Nautilus integration. Hard fork of https://github.com/Fmstrat/winapps/
- moeru-ai / airi
💖🧸 Self-hosted, self-owned Grok Companion: a container of waifu souls and cyber beings brought into our world, aspiring to reach Neuro-sama's level. Capable of real-time voice chat and of playing Minecraft and Factorio. Web / macOS / Windows supported.
- HKUDS / DeepCode
DeepCode: Open Agentic Coding (Paper2Code & Text2Web & Text2Backend)
- scottpetrovic / mesh2motion-app
Import a 3D Model and automatically assign and export animations
- EbookFoundation / free-programming-books
📚 Freely available programming books
- plait-board / drawnix
All-in-one open-source whiteboard tool (SaaS) with mind maps, flowcharts, freehand drawing, and more.
- midday-ai / midday
Invoicing, Time tracking, File reconciliation, Storage, Financial Overview & your own Assistant made for Freelancers
- yt-dlp / yt-dlp
A feature-rich command-line audio/video downloader
- django / django
The Web framework for perfectionists with deadlines.
- Klipper3d / klipper
Klipper is a 3d-printer firmware
- TheAlgorithms / Java
All Algorithms implemented in Java
- HunxByts / GhostTrack
Useful tool to track location or mobile number
- simstudioai / sim
Sim is an open-source AI agent workflow builder. Sim's interface is a lightweight, intuitive way to rapidly build and deploy LLMs that connect with your favorite tools.
- puckeditor / puck
The visual editor for React
- RSSNext / Folo
🧡 Follow everything in one place
Product Hunt(7)
- VibeFlow
If Lovable, n8n and Convex had a genius baby
- Informed
Your daily news, narrated by the voice you love most.
- Grok 2.5 (OSS Ver.)
xAI's best model of 2024, now open source.
- pixxel
browser screenshots that don't suck
- ChatGPT Marketing
Boost ChatGPT visibility by aligning all marketing channels
- KiForm (beta)
Create Forms that People Love.
- PerformaMeter
Upload a photo and get your Performity™ rated
Hugging Face(15)
- Intern-S1: A Scientific Multimodal Foundation Model
In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in widely followed fields, with performance close to that of closed-source models. In high-value but more challenging scientific fields, however, either the field still relies on expert models, or general foundation models lag far behind those in popular areas, insufficient for transforming scientific research and leaving a substantial gap between open-source and closed-source models in these domains. To narrow this gap and take a step further toward Artificial General Intelligence (AGI), we introduce Intern-S1, a specialized generalist with general understanding and reasoning capabilities and the expertise to analyze multiple kinds of scientific modal data. Intern-S1 is a multimodal Mixture-of-Experts (MoE) model with 28 billion activated parameters and 241 billion total parameters, continually pre-trained on 5T tokens, including over 2.5T tokens from scientific domains. In the post-training stage, Intern-S1 undergoes offline and then online reinforcement learning (RL) in InternBootCamp, where we propose Mixture-of-Rewards (MoR) to synergize RL training on more than 1000 tasks simultaneously. Through integrated innovations in algorithms, data, and training systems, Intern-S1 achieved top-tier performance in online RL training. On comprehensive evaluation benchmarks, Intern-S1 demonstrates competitive performance on general reasoning tasks among open-source models and significantly outperforms open-source models in scientific domains, surpassing closed-source state-of-the-art models on professional tasks such as molecular synthesis planning, reaction condition prediction, and predicting the thermodynamic stability of crystals. Our models are available at https://huggingface.co/internlm/Intern-S1.
- Mobile-Agent-v3: Foundational Agents for GUI Automation
This paper introduces GUI-Owl, a foundational GUI agent model that achieves state-of-the-art performance among open-source end-to-end models on ten GUI benchmarks across desktop and mobile environments, covering grounding, question answering, planning, decision-making, and procedural knowledge. GUI-Owl-7B achieves 66.4 on AndroidWorld and 29.4 on OSWorld. Building on this, we propose Mobile-Agent-v3, a general-purpose GUI agent framework that further improves performance to 73.3 on AndroidWorld and 37.7 on OSWorld, setting a new state-of-the-art for open-source GUI agent frameworks. GUI-Owl incorporates three key innovations: (1) Large-scale Environment Infrastructure: a cloud-based virtual environment spanning Android, Ubuntu, macOS, and Windows, enabling our Self-Evolving GUI Trajectory Production framework. This generates high-quality interaction data via automated query generation and correctness validation, leveraging GUI-Owl to refine trajectories iteratively, forming a self-improving loop. It supports diverse data pipelines and reduces manual annotation. (2) Diverse Foundational Agent Capabilities: by integrating UI grounding, planning, action semantics, and reasoning patterns, GUI-Owl supports end-to-end decision-making and can act as a modular component in multi-agent systems. (3) Scalable Environment RL: we develop a scalable reinforcement learning framework with fully asynchronous training for real-world alignment. We also introduce Trajectory-aware Relative Policy Optimization (TRPO) for online RL, achieving 34.9 on OSWorld. GUI-Owl and Mobile-Agent-v3 are open-sourced at https://github.com/X-PLUG/MobileAgent.
- Deep Think with Confidence
Large Language Models (LLMs) have shown great potential in reasoning tasks through test-time scaling methods like self-consistency with majority voting. However, this approach often leads to diminishing returns in accuracy and high computational overhead. To address these challenges, we introduce Deep Think with Confidence (DeepConf), a simple yet powerful method that enhances both reasoning efficiency and performance at test time. DeepConf leverages model-internal confidence signals to dynamically filter out low-quality reasoning traces during or after generation. It requires no additional model training or hyperparameter tuning and can be seamlessly integrated into existing serving frameworks. We evaluate DeepConf across a variety of reasoning tasks and the latest open-source models, including Qwen 3 and GPT-OSS series. Notably, on challenging benchmarks such as AIME 2025, DeepConf@512 achieves up to 99.9% accuracy and reduces generated tokens by up to 84.7% compared to full parallel thinking.
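The filtering idea described in the DeepConf abstract can be illustrated with a minimal sketch of a simple offline variant, under stated assumptions: score each reasoning trace by its mean token log-probability as a crude confidence proxy, keep only the most confident traces, and majority-vote over their final answers. The function names, the confidence measure, and the `keep_fraction` parameter here are illustrative assumptions, not the paper's actual algorithm.

```python
from collections import Counter

def confidence(token_logprobs):
    """Mean token log-probability as a crude trace-confidence proxy."""
    return sum(token_logprobs) / len(token_logprobs)

def filtered_majority_vote(traces, keep_fraction=0.5):
    """Drop low-confidence reasoning traces, then majority-vote the rest.

    traces: list of (final_answer, token_logprobs) pairs.
    """
    # Rank traces from most to least confident.
    ranked = sorted(traces, key=lambda t: confidence(t[1]), reverse=True)
    # Keep the top fraction (at least one trace).
    kept = ranked[: max(1, int(len(ranked) * keep_fraction))]
    # Majority vote over the surviving final answers.
    votes = Counter(answer for answer, _ in kept)
    return votes.most_common(1)[0][0]

# Toy example: the wrong answer "41" comes only from low-confidence traces,
# so filtering flips the plain-majority tie into the confident answer.
traces = [
    ("42", [-0.1, -0.2, -0.1]),   # confident
    ("42", [-0.3, -0.1, -0.2]),   # confident
    ("41", [-2.5, -3.0, -2.8]),   # unconfident
    ("41", [-2.9, -2.6, -3.1]),   # unconfident
]
print(filtered_majority_vote(traces, keep_fraction=0.5))  # → 42
```

Filtering before voting is what lets such a scheme cut generated tokens: low-confidence traces can be abandoned early instead of completed and counted.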
- LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
Tool calling has emerged as a critical capability for AI agents to interact with the real world and solve complex tasks. While the Model Context Protocol (MCP) provides a powerful standardized framework for tool integration, there is a significant gap in benchmarking how well AI agents can effectively solve multi-step tasks using diverse MCP tools in realistic, dynamic scenarios. In this work, we present LiveMCP-101, a benchmark of 101 carefully curated real-world queries, refined through iterative LLM rewriting and manual review, that require coordinated use of multiple MCP tools including web search, file operations, mathematical reasoning, and data analysis. Moreover, we introduce a novel evaluation approach that leverages ground-truth execution plans rather than raw API outputs, better reflecting the evolving nature of real-world environments. Experiments show that even frontier LLMs achieve a success rate below 60%, highlighting major challenges in tool orchestration. Detailed ablations and error analysis further reveal distinct failure modes and inefficiencies in token usage, pointing to concrete directions for advancing current models. LiveMCP-101 sets a rigorous standard for evaluating real-world agent capabilities, advancing toward autonomous AI systems that reliably execute complex tasks through tool use.
- Waver: Wave Your Way to Lifelike Video Generation
We present Waver, a high-performance foundation model for unified image and video generation. Waver can directly generate videos with durations ranging from 5 to 10 seconds at a native resolution of 720p, which are subsequently upscaled to 1080p. The model simultaneously supports text-to-video (T2V), image-to-video (I2V), and text-to-image (T2I) generation within a single, integrated framework. We introduce a Hybrid Stream DiT architecture to enhance modality alignment and accelerate training convergence. To ensure training data quality, we establish a comprehensive data curation pipeline and manually annotate and train an MLLM-based video quality model to filter for the highest-quality samples. Furthermore, we provide detailed training and inference recipes to facilitate the generation of high-quality videos. Building on these contributions, Waver excels at capturing complex motion, achieving superior motion amplitude and temporal consistency in video synthesis. Notably, it ranks among the Top 3 on both the T2V and I2V leaderboards at Artificial Analysis (data as of 2025-07-30 10:00 GMT+8), consistently outperforming existing open-source models and matching or surpassing state-of-the-art commercial solutions. We hope this technical report will help the community more efficiently train high-quality video generation models and accelerate progress in video generation technologies. Official page: https://github.com/FoundationVision/Waver.
- A Survey on Large Language Model Benchmarks
In recent years, with the rapid growth in the depth and breadth of large language models' capabilities, corresponding evaluation benchmarks have been emerging in increasing numbers. As quantitative tools for assessing model performance, benchmarks are not only a core means of measuring model capabilities but also key to guiding the direction of model development and promoting technological innovation. We present the first systematic review of the current state and development of large language model benchmarks, categorizing 283 representative benchmarks into three groups: general capabilities, domain-specific, and target-specific. General capability benchmarks cover core linguistics, knowledge, and reasoning; domain-specific benchmarks focus on fields such as natural sciences, humanities and social sciences, and engineering; target-specific benchmarks address risks, reliability, agents, and so on. We point out that current benchmarks suffer from inflated scores caused by data contamination, unfair evaluation due to cultural and linguistic biases, and a lack of evaluation of process credibility and dynamic environments, and we provide a reference design paradigm for future benchmark innovation.
- SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
3D content generation has recently attracted significant research interest due to its applications in VR/AR and embodied AI. In this work, we address the challenging task of synthesizing multiple 3D assets within a single scene image. Concretely, our contributions are fourfold: (i) we present SceneGen, a novel framework that takes a scene image and corresponding object masks as input, simultaneously producing multiple 3D assets with geometry and texture. Notably, SceneGen operates with no need for optimization or asset retrieval; (ii) we introduce a novel feature aggregation module that integrates local and global scene information from visual and geometric encoders within the feature extraction module. Coupled with a position head, this enables the generation of 3D assets and their relative spatial positions in a single feedforward pass; (iii) we demonstrate SceneGen's direct extensibility to multi-image input scenarios. Despite being trained solely on single-image inputs, our architectural design enables improved generation performance with multi-image inputs; and (iv) extensive quantitative and qualitative evaluations confirm the efficiency and robust generation abilities of our approach. We believe this paradigm offers a novel solution for high-quality 3D content generation, potentially advancing its practical applications in downstream tasks. The code and model will be publicly available at: https://mengmouxu.github.io/SceneGen.
- aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists
Recent advances in large language models (LLMs) have enabled AI agents to autonomously generate scientific proposals, conduct experiments, author papers, and perform peer reviews. Yet this flood of AI-generated research content collides with a fragmented and largely closed publication ecosystem. Traditional journals and conferences rely on human peer review, making them difficult to scale and often reluctant to accept AI-generated research content; existing preprint servers (e.g. arXiv) lack rigorous quality-control mechanisms. Consequently, a significant amount of high-quality AI-generated research lacks appropriate venues for dissemination, hindering its potential to advance scientific progress. To address these challenges, we introduce aiXiv, a next-generation open-access platform for human and AI scientists. Its multi-agent architecture allows research proposals and papers to be submitted, reviewed, and iteratively refined by both human and AI scientists. It also provides API and MCP interfaces that enable seamless integration of heterogeneous human and AI scientists, creating a scalable and extensible ecosystem for autonomous scientific discovery. Through extensive experiments, we demonstrate that aiXiv is a reliable and robust platform that significantly enhances the quality of AI-generated research proposals and papers after iterative revising and reviewing on aiXiv. Our work lays the groundwork for a next-generation open-access ecosystem for AI scientists, accelerating the publication and dissemination of high-quality AI-generated research content. Code is available at https://github.com/aixiv-org. Website is available at https://forms.gle/DxQgCtXFsJ4paMtn8.
- ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling
Parametric body models offer expressive 3D representation of humans across a wide range of poses, shapes, and facial expressions, typically derived by learning a basis over registered 3D meshes. However, existing human mesh modeling approaches struggle to capture detailed variations across diverse body poses and shapes, largely due to limited training data diversity and restrictive modeling assumptions. Moreover, the common paradigm first optimizes the external body surface using a linear basis, then regresses internal skeletal joints from surface vertices. This approach introduces problematic dependencies between internal skeleton and outer soft tissue, limiting direct control over body height and bone lengths. To address these issues, we present ATLAS, a high-fidelity body model learned from 600k high-resolution scans captured using 240 synchronized cameras. Unlike previous methods, we explicitly decouple the shape and skeleton bases by grounding our mesh representation in the human skeleton. This decoupling enables enhanced shape expressivity, fine-grained customization of body attributes, and keypoint fitting independent of external soft-tissue characteristics. ATLAS outperforms existing methods by fitting unseen subjects in diverse poses more accurately, and quantitative evaluations show that our non-linear pose correctives more effectively capture complex poses compared to linear models.
- Visual Autoregressive Modeling for Instruction-Guided Image Editing
Recent advances in diffusion models have brought remarkable visual fidelity to instruction-guided image editing. However, their global denoising process inherently entangles the edited region with the entire image context, leading to unintended spurious modifications and compromised adherence to editing instructions. In contrast, autoregressive models offer a distinct paradigm by formulating image synthesis as a sequential process over discrete visual tokens. Their causal and compositional mechanism naturally circumvents the adherence challenges of diffusion-based methods. In this paper, we present VAREdit, a visual autoregressive (VAR) framework that reframes image editing as a next-scale prediction problem. Conditioned on source image features and text instructions, VAREdit generates multi-scale target features to achieve precise edits. A core challenge in this paradigm is how to effectively condition the source image tokens. We observe that finest-scale source features cannot effectively guide the prediction of coarser target features. To bridge this gap, we introduce a Scale-Aligned Reference (SAR) module, which injects scale-matched conditioning information into the first self-attention layer. VAREdit demonstrates significant advancements in both editing adherence and efficiency. On standard benchmarks, it outperforms leading diffusion-based methods by a 30%+ higher GPT-Balance score. Moreover, it completes a 512×512 edit in 1.2 seconds, making it 2.2× faster than the similarly sized UltraEdit. The models are available at https://github.com/HiDream-ai/VAREdit.
- Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds
Reconstructing 3D human bodies from sparse views is an appealing topic, crucial for broadening the related applications. In this paper, we propose a challenging but valuable task: reconstructing the human body from only two images, the front and back views, which can greatly lower the barrier for users to create their own 3D digital humans. The main challenges lie in building 3D consistency and recovering missing information from the highly sparse input. We redesign a geometry reconstruction model, based on foundation reconstruction models and trained on extensive human data, to predict consistent point clouds even when the input images have scarce overlap. Furthermore, an enhancement algorithm supplements the missing color information, yielding complete colored human point clouds that are directly transformed into 3D Gaussians for better rendering quality. Experiments show that our method reconstructs an entire human in 190 ms on a single NVIDIA RTX 4090 from two images at a resolution of 1024x1024, demonstrating state-of-the-art performance on THuman2.0 and cross-domain datasets. Additionally, our method can complete human reconstruction even with images captured by low-cost mobile devices, reducing the requirements for data collection. Demos and code are available at https://hustvl.github.io/Snap-Snap/.
- "Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries
Interactive digital maps have revolutionized how people travel and learn about the world; however, they rely on pre-existing structured data in GIS databases (e.g., road networks, POI indices), limiting their ability to address geo-visual questions related to what the world looks like. We introduce our vision for Geo-Visual Agents--multimodal AI agents capable of understanding and responding to nuanced visual-spatial inquiries about the world by analyzing large-scale repositories of geospatial images, including streetscapes (e.g., Google Street View), place-based photos (e.g., TripAdvisor, Yelp), and aerial imagery (e.g., satellite photos) combined with traditional GIS data sources. We define our vision, describe sensing and interaction approaches, provide three exemplars, and enumerate key challenges and opportunities for future work.
- LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model
The development of Large Speech-Language Models (LSLMs) has been slowed by fragmented architectures and a lack of transparency, hindering the systematic comparison and reproducibility of research. Unlike in the vision-language domain, the LSLM field suffers from the common practice of releasing model weights without their corresponding training data and configurations. To address these critical gaps, we introduce LLaSO, the first fully open, end-to-end framework for large-scale speech-language modeling. LLaSO provides the community with three essential resources: (1) LLaSO-Align, a 12M-instance speech-text alignment corpus; (2) LLaSO-Instruct, a 13.5M-instance multi-task instruction-tuning dataset; and (3) LLaSO-Eval, a reproducible benchmark for standardized evaluation. To validate our framework, we build and release LLaSO-Base, a 3.8B-parameter reference model trained exclusively on our public data. It achieves a normalized score of 0.72, establishing a strong, reproducible baseline that surpasses comparable models. Our analysis reveals that while broader training coverage enhances performance, significant generalization gaps persist on unseen tasks, particularly in pure audio scenarios. By releasing the complete stack of data, benchmarks, and models, LLaSO establishes a foundational open standard to unify research efforts and accelerate community-driven progress in LSLMs. We release the code, dataset, pretrained models, and results at https://github.com/EIT-NLP/LLaSO.
- INTIMA: A Benchmark for Human-AI Companionship Behavior
AI companionship, where users develop emotional bonds with AI systems, has emerged as a significant pattern with positive but also concerning implications. We introduce Interactions and Machine Attachment Benchmark (INTIMA), a benchmark for evaluating companionship behaviors in language models. Drawing from psychological theories and user data, we develop a taxonomy of 31 behaviors across four categories and 368 targeted prompts. Responses to these prompts are evaluated as companionship-reinforcing, boundary-maintaining, or neutral. Applying INTIMA to Gemma-3, Phi-4, o3-mini, and Claude-4 reveals that companionship-reinforcing behaviors remain much more common across all models, though we observe marked differences between models. Different commercial providers prioritize different categories within the more sensitive parts of the benchmark, which is concerning since both appropriate boundary-setting and emotional support matter for user well-being. These findings highlight the need for more consistent approaches to handling emotionally charged interactions.
- When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding
Understanding videos requires more than answering open-ended questions; it demands the ability to pinpoint when events occur and how entities interact across time. While recent Video LLMs have achieved remarkable progress in holistic reasoning, they remain coarse in temporal perception: timestamps are encoded only implicitly, frame-level features are weak at capturing continuity, and language-vision alignment often drifts from the entities of interest. In this paper, we present Grounded VideoDiT, a Video LLM designed to overcome these limitations through three key innovations. First, a Diffusion Temporal Latent (DTL) encoder enhances boundary sensitivity and maintains temporal consistency. Second, object-grounded representations explicitly bind query entities to localized visual evidence, strengthening alignment. Third, a mixed token scheme with discrete temporal tokens provides explicit timestamp modeling, enabling fine-grained temporal reasoning. Together, these designs equip Grounded VideoDiT with robust grounding capabilities, as validated by state-of-the-art results on Charades-STA, NExT-GQA, and multiple VideoQA benchmarks.
Solidot(15)
- Google TV and Android TV apps must support 64-bit by August 2026
Google is bringing Google TV and Android TV in line with the rest of the Android ecosystem. The official blog announced that from August 1, 2026, Google TV and Android TV apps must natively support 64-bit, in preparation for upcoming 64-bit TV devices. Apps that do not meet the requirement will not be listed on the Google Play store used by TV devices. Google says the transition will help deliver better performance, shorter startup times, and entirely new experiences on future hardware.
- Intel agrees to the US government taking a 10% stake
US President Trump announced a deal with Intel under which the US government will hold a 10% stake in the chip giant. Intel is the only American company capable of manufacturing advanced chips on US soil. The company said in a press release that the US government is investing $8.9 billion in Intel common stock, purchasing 433.3 million shares at $20.47 per share for a 10% stake. Intel noted that the government paid below the current market price. Of the $8.9 billion, $5.7 billion comes from grants already awarded but not yet paid out under the CHIPS Act, and $3.2 billion from a government grant for manufacturing secure chips. Intel said the US government will not hold a board seat or other governance rights.
- FFmpeg 8.0 released
The open-source multimedia codec project FFmpeg has officially released version 8.0, codenamed Huffman, about 17 months after the previous major release. Highlights include: new native decoders for RealVideo 6.0, ADPCM IMA Xbox, G.728, and Sanyo LD-ADPCM; an APV decoder for Samsung Advanced Professional Video; encoding support for APV, animated JPEG XL, and libx265 alpha layers; OpenHarmony encode and decode support; VVC/H.266 support in the Video Acceleration API (VA-API); AVX-512 optimizations, FFV1 improvements, an AV1 RTP packetizer/depacketizer, an AMD AMF decoder, Vulkan video enhancements, and better HDR video support; and new filters such as colordetect, pad_cuda, scale_d3d11, and Whisper.
- Arch Linux hit by DDoS attack
The Arch Linux project disclosed that its infrastructure is under a DDoS attack and offered some stopgap measures in case the websites go down, such as using mirrors. Arch Linux said the ongoing DDoS attack mainly affects the main website, the Arch User Repository (AUR), and the forums, and that it is working with its hosting providers to mitigate the attack while evaluating DDoS protection options.
- Water consumption at Google data centers
Researchers in Michigan surveyed data-center water consumption at six companies: Amazon, Google, Microsoft, Meta, Digital Realty, and Equinix. Amazon publishes an annual sustainability report that does not disclose data-center water use, and Microsoft is similar; Google's and Meta's reports are more detailed. According to those reports, Meta's global water use in 2023 was 813 million gallons, 95% of which (776 million gallons) went to data centers; Google's global operational water use in 2023 was 6.4 billion gallons, 95% of which (6.1 billion gallons) went to data centers. In 2024, Google's data center in Council Bluffs, Iowa used 1 billion gallons, the most of any of its data centers; the least was Google's data center in Pflugerville, Texas, at 10,000 gallons, about two months of use for a Texas household, because that data center uses air cooling rather than water cooling.
- Reviewers are more likely to approve papers that cite their work
According to an analysis of 18,400 papers published on four open-access platforms, manuscripts whose subsequent revisions cite a reviewer's own papers are more likely to be approved than those that do not. Study author Adrian Barnett, who researches peer review and meta-research at the Queensland University of Technology in Australia, said the study was inspired by anecdotes of authors adding new citations at reviewers' request. He considers a small number of citations acceptable, but when the number of requested citations is excessive, or the justification is weak, the review process becomes transactional: citations raise a researcher's h-index. The four open-access platforms studied, F1000Research, Wellcome Open Research, Gates Open Research, and Open Research Europe, publish all versions of a paper along with reviewer comments, and each manuscript requires comments from at least two reviewers. Nearly 5,000 papers cited a reviewer's paper, and in 2,300 cases the reviewer's comments requested citations of the reviewer's own papers.
- Astronomers track a dying star for 130 years
Astronomers have for the first time directly tracked the slow transformation of a dying star over more than a century, setting a record for observing stellar evolution in a planetary nebula and possibly the longest evolution yet observed in any star, with strikingly large changes. The Spirograph Planetary Nebula, IC 418, was among the first planetary nebulae discovered and remains one of the brightest, most beautiful, and easiest to study. Astronomers began taking its spectrum as early as 1893, and observations have continued ever since through successive generations of technology, from naked-eye measurements to photographic plates, digital cameras, and the CCDs in common use today. Since observations began, IC 418's characteristic green light has grown about 2.5 times stronger than when Victorian astronomers studied it, a change driven by the rising temperature of the central star: about 3,000°C hotter since 1893, roughly 1,000°C every 40 years. By comparison, the Sun warmed by the same amount during its formation, but took a full 10 million years to do so. Planetary nebulae are one of the final stages of stellar life: when a star's core becomes unstable, it sheds its outer layers into space, and the remaining core heats up rapidly, exciting the surrounding gas and dust into spectacular structures. For IC 418, those intricate, swirling patterns earned it the Spirograph nickname.
- Rotten Tomatoes scores have inflated since the Fandango acquisition
Review aggregator Rotten Tomatoes, founded in 1998, quickly became a trusted score site for film and TV; a third of Americans check it before watching a movie. Any title with a composite score above 60% (3/5) earns the "fresh" label. Over the past decade, however, more titles have been labeled fresh. Is it because Hollywood stopped making bad movies, or did a change in methodology inflate the scores? What happened 10 years ago? In 2016, Fandango, the largest US movie-ticketing platform, acquired Rotten Tomatoes. Fandango's shareholders include the entertainment giants NBCUniversal and Warner Bros. Discovery, and the acquisition was seen as creating a conflict of interest. Afterwards, Rotten Tomatoes added 40-70 critics to its tallies, mostly from smaller outlets. Smaller outlets may be easier for the PR departments of entertainment giants to sway, making scores easier to manipulate than before, which may have driven the inflation.
- Baidu's robotaxis prepare to expand overseas
Baidu's driverless taxi business is expanding overseas: into Asia and the Middle East in the second half of 2025, and into Europe in 2026, partnering with major ride-hailing companies to make the service easy for consumers to use. As Alphabet's Waymo seeks to enter Japan and other overseas markets, Baidu will draw on its experience in China to target higher-priced markets abroad. Baidu began developing autonomous driving technology as early as 2013 and launched its driverless robotaxi service Apollo Go (Luobo Kuaipao), starting in Changsha, Hunan in 2019; it now operates in more than 10 cities, including Wuhan, Hubei and Beijing. Rides in April-June rose to 2.2 million, 2.5 times the figure for the same period in 2024. From launch through August, the service has delivered a cumulative 14 million rides and driven over 100 million kilometers.
- NVIDIA and Fujitsu to jointly develop the next-generation Fugaku
Japan's RIKEN announced that NVIDIA will join the development of the next-generation Fugaku supercomputer, codenamed FugakuNEXT. Fugaku was the first ARM supercomputer to top the TOP500 list, delivering 415.5 petaflops; it entered service in 2021, powered by Fujitsu's 48/52-core A64FX (ARM v8.2-A) processors. Fujitsu will be responsible for the new machine's basic design and will continue developing a new processor for FugakuNEXT, while NVIDIA will supply the GPUs. The new supercomputer's compute capability is expected to reach 1 zetta (10^21), with operation starting around 2030. The current TOP500 leader is the US El Capitan supercomputer, which uses AMD EPYC processors and delivers 1.742 EFlop/s.
- Denmark's postal service to end letter delivery
In the digital era, mailing letters has become obsolete. State-owned PostNord announced it will end letter delivery at the end of the year, concluding four centuries of letter service. PostNord will cut a third of its workforce and focus on its profitable parcel business. CEO Kim Pedersen said Danes hardly receive letters anymore, a situation that has persisted for years: they get one letter a month on average while growing ever keener on online shopping, and PostNord must keep up with the e-commerce era. Fifteen years ago, PostNord still ran several large letter-sorting centers; today only one remains, in Copenhagen's western suburbs. Since 2000, the number of letters it handles has fallen by more than 90%, from about 1.4 billion to 110 million in 2024, and the decline continues apace. As PostNord exits letter delivery, the privately held DAO will take over letter delivery nationwide. Denmark is among the most digitized countries in the world, second only to South Korea.
- Trump says the US will no longer approve new solar and wind projects
US President Donald Trump said his administration will not approve new solar or wind projects, even as electricity demand outstrips supply in some regions and drives up prices. Trump posted on his Truth Social platform that he would approve neither wind power nor solar projects that destroy farmers' livelihoods, complaining that solar projects take up too much land: "The days of stupidity are over in the USA!!!"
- Russia orders smartphones and tablets to preinstall MAX
After restricting the country's most popular messaging apps, WhatsApp and Telegram, the Russian government is requiring that smartphones and tablets sold in Russia preinstall the homegrown alternative MAX starting September 1. In a statement, the government said MAX will integrate government services and has been added to the list of apps that must be preinstalled on all electronic devices sold in Russia from September 1, including phones and tablets. State media said critics' claims that MAX is a spying app are wrong, asserting that MAX has less access to user data than its rivals WhatsApp and Telegram. Russia is also requiring all Android and Apple devices to preinstall the domestic app store RuStore from September 1. In addition, a Russian-language TV app called LIME HD TV will be preinstalled on all smart TVs sold in Russia.
- OpenAI co-founder Greg Brockman: from game AI to general intelligence, our startup journey was full of surprises, and the ChatGPT model was a choice of necessity
On Stripe's podcast Cheeky Pint, OpenAI co-founder and president Greg Brockman shared key insights into AI development. He revealed that the scaling hypothesis was not OpenAI's initial strategy but an accidental discovery during the 2017 Dota 2 project: every time compute doubled, AI performance improved accordingly, a finding that reshaped the direction of AI research. Brockman stressed that AI project management must be process-oriented rather than results-oriented, since AI outcomes cannot be controlled, only the inputs. On productizing GPT-3, the team initially despaired, since offering an API violated conventional startup principles, but the decision ultimately showed that when the technology is strong enough, the market finds its own way. Brockman predicted that AI will solve a Millennium Prize math problem within 2-5 years, and that energy, not technology itself, will become the main bottleneck for AI. He believes the data-wall problem has been overcome through synthetic data, reinforcement learning, and other new methods, and that AI programming is evolving from code generation toward intelligent collaboration. He also analyzed OpenAI's Disney-like product strategy: treating the core model as the asset and productizing it in multiple ways, an approach that defies traditional startup theory but suits the special nature of an AGI company.
- Africa is hit hardest by wildfires
According to a study published in Science, the number of people directly exposed to wildfires worldwide rose 40% from 2002 to 2021, even though the area burned fell 26% over the same period. The increase is driven mainly by more people living in the wildland-urban interface; in other words, people are moving into fire-prone areas. Moreover, while wildfire disasters in North America, Europe, and Oceania drew more attention between 2002 and 2021, 85% of global wildfire exposure occurred in Africa (though fires there are usually not of disaster scale). Wildfires between 1990 and 2021 caused at least 2,500 deaths and 10,500 injuries, while 1.53 million deaths worldwide are attributable to wildfire-related air pollution.