TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES

ISSUE 0914
THU, JUL 2, 2026
OrangeBot.AI intelligently curates and filters daily tech trends and news to save you time.
TODAY · THU, JUL 2, 2026

The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK

AI News Summary

July 2, 2026


U.S. Job Growth Slows, Shifting Market Expectations

The U.S. economy added fewer jobs than anticipated in June, marking a slowdown in the labor market after several months of strong growth. In response to the news, U.S. stocks and gold prices rose, while the U.S. dollar and Treasury yields fell, as investors speculated the data might lead the Federal Reserve to be less aggressive with future interest rate hikes.

Russia Launches Massive Air Attack on Ukraine

Russia unleashed a large-scale missile and drone attack across Ukraine, involving nearly 500 drones and over 70 missiles. The widespread assault on multiple regions follows recent warnings from Ukrainian President Volodymyr Zelenskyy that Moscow was preparing for a significant strike.

U.S. to End USMCA, Shifting North American Trade Policy

The White House announced it will not renew the U.S.-Mexico-Canada Agreement (USMCA), the trade deal that replaced NAFTA. Instead, Washington will conduct annual reviews of its trade terms with its two largest partners, signaling a major shift in its approach to continental commerce.

Google Loses Appeal on Record €4 Billion EU Antitrust Fine

The European Court of Justice upheld a record €4 billion fine against Google for abusing the dominance of its Android operating system. The ruling is a significant victory for the EU's long-running effort to regulate the power of major technology companies.

AI Development Surges Amid Regulatory and Economic Concerns

Artificial intelligence remains a key focus, with news of a new AI-integrated handset prototype and a major pharmaceutical partnership using AI for drug discovery. Simultaneously, Washington is debating how to regulate advanced AI models, while some experts warn that the rapid boom in AI investment could lead to a potential economic bust.

Oil Prices Fall as U.S.-Iran Tensions Ease

Crude oil prices continued to fall as diplomatic talks between the U.S. and Iran progressed and commercial shipping moved uninterrupted through the Strait of Hormuz. The easing of geopolitical tensions in the Middle East has reduced fears of a supply disruption, pushing prices lower.

Medicare Expands Coverage to Include Weight-Loss Drugs

In a major policy change, Medicare has started covering certain popular weight-loss drugs for seniors with related health conditions, such as heart issues. The move is seen as a significant win for patients but creates new financial complexities for the healthcare industry and investors.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS


Hacker News - July 2, 2026

Hacker News Feed: Highlighting key posts and discussions.

The fall of the theorem economy

(davidbessis.substack.com)

13248
Fable 5 is Back

(twitter.com)

386381
FFmpeg 9.1's new AAC encoder

(hydrogenaudio.org)

413132
ArXiv's Next Chapter

(blog.arxiv.org)

28191
03

HUGGINGFACE


HuggingFace News - July 2, 2026

HuggingFace Feed: The latest AI models, datasets, and community updates.

PerceptionRubrics: Calibrating Multimodal Evaluation to Human Perception

We introduce PerceptionRubrics, a rubric-based evaluation framework that addresses the gap between saturated benchmark scores and real-world brittleness. Shifting evaluation from holistic semantic matching to rigorous atomic auditing, PerceptionRubrics pairs 1,038 information-dense images with over 12,000 instance-specific rubrics. These criteria are derived from golden captions constructed via a novel Circular Peer-Review consensus pipeline and then distilled into a dual-stream system of Must-Right (essential facts) and Easy-Wrong (fine-grained details) rubrics. Crucially, PerceptionRubrics implements a Gated Scoring mechanism: unlike linear averages, failure on mandatory visual facts triggers sharp binary penalties. Extensive evaluation yields critical insights: (1) The Reliability Gap: models often verify fragmented elements correctly yet fail strict conjunctive constraints, exposing brittleness in dense domains; (2) Open-Closed Stratification: contrary to reasoning trends, we reveal a persistent 8% perception deficit between open-source and proprietary frontiers; and (3) Human-Aligned Rigor: our gated metrics substantially out-align conventional benchmarks, validating that strict perceptual fidelity is the prerequisite for reliable generation.
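To make the contrast between a linear average and gated scoring concrete, here is a minimal sketch. The function names, the zero penalty, and the way Easy-Wrong rubrics are averaged are assumptions for illustration; the abstract does not give the actual formula.

```python
from typing import Sequence


def linear_score(must_right: Sequence[bool], easy_wrong: Sequence[bool]) -> float:
    """Plain average over every rubric check (the baseline being contrasted)."""
    checks = list(must_right) + list(easy_wrong)
    return sum(checks) / len(checks) if checks else 0.0


def gated_score(must_right: Sequence[bool], easy_wrong: Sequence[bool]) -> float:
    """Hypothetical gate: any failed Must-Right rubric collapses the score to 0.

    If the gate passes, the score is the fraction of Easy-Wrong rubrics
    satisfied (1.0 when there are none). This form is assumed, not the paper's.
    """
    if not all(must_right):
        return 0.0
    if not easy_wrong:
        return 1.0
    return sum(easy_wrong) / len(easy_wrong)
```

Under these assumptions, a caption that misses one essential fact but gets every fine detail right still earns partial credit from the linear average, while the gated score drops to zero.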

26
TurboServe: Serving Streaming Video Generation Efficiently and Economically

Streaming video generation is emerging as a new serving workload in which users interact with long-lived sessions that generate video progressively, chunk by chunk. Unlike offline video generation or typical LLM serving, streaming video generation must preserve session state across active and idle periods, repeatedly schedule ongoing sessions, and deliver each chunk under a tight latency target. This creates two key serving challenges in multi-user, multi-GPU environments: session duration heterogeneity, where long-running sessions make placement decisions suboptimal over time, and temporal user-demand heterogeneity, where the number of active sessions fluctuates sharply across bursts and idle periods. We present TurboServe, the first serving system designed specifically for streaming video generation workloads. TurboServe formulates serving as an online scheduling problem that jointly coordinates session placement and GPU provisioning. Its closed-loop scheduling algorithm combines a migration-aware placement controller, which rebalances sessions across GPUs to reduce the maximum per-chunk latency, with a load-driven autoscaling controller, which adapts the GPU budget to workload variation for improved cost efficiency. To support these decisions at runtime, TurboServe implements coalesced chunk processing for batching concurrent active sessions on the same GPU, GPU-CPU offloading for session suspension and resumption, and NCCL-based GPU-GPU migration for online rebalancing. We evaluate TurboServe on real-world production traces from Shengshu Technology across multiple model sizes and GPU clusters with up to 64 NVIDIA B300 GPUs. Compared with baseline serving configurations, TurboServe reduces worst-case per-chunk latency by 37.5% and total GPU operating cost by 37.2% on average. Our code is publicly available at https://github.com/shengshu-ai/TurboServe.

18
ELDR: Expert-Locality-Aware Decode Routing for PD-Disaggregated MoE Serving

In prefill-decode (PD) disaggregated LLM serving, each request is assigned to a decode worker after prefill. Existing decode routers balance only load; for mixture-of-experts (MoE) models this is incomplete: equally loaded workers can differ in latency, since each decode step loads the weights of every distinct expert its batch activates. We present ELDR, an expert-locality-aware decode router for PD-disaggregated MoE serving. From a request's prefill expert activations, ELDR builds an expert signature predicting the experts it will activate during generation. Offline, balanced K-means partitions signature space across decode workers; online, locality-band routing sends each request to the least-loaded worker among those best matching its signature. A signature cache, co-indexed with the KV cache at KV-block granularity, keeps signatures exact under prefix caching. Implemented in vLLM and evaluated on deployments of up to 40 GPUs, ELDR reduces median TPOT by 5.9-13.9% over the strongest of four load-balancing baselines across three MoE models and two workloads, with model outputs unchanged.
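The online routing step described above can be sketched roughly as follows. This is an illustration only: the Euclidean distance, the `band` tolerance, and the names `route`, `centroids`, and `load` are assumptions, not ELDR's actual implementation or the vLLM API.

```python
import math
from typing import Dict, Sequence


def route(signature: Sequence[float],
          centroids: Dict[str, Sequence[float]],
          load: Dict[str, int],
          band: float = 0.1) -> str:
    """Pick the least-loaded worker among those that best match the signature.

    A worker is a candidate when its centroid distance is within `band`
    of the smallest distance (the assumed "locality band").
    """
    dist = {w: math.dist(signature, c) for w, c in centroids.items()}
    best = min(dist.values())
    candidates = [w for w, d in dist.items() if d <= best + band]
    # Break ties by load, then distance, then name for determinism.
    return min(candidates, key=lambda w: (load[w], dist[w], w))
```

A wider band trades expert locality for load balance; a band of zero always picks the closest centroid regardless of load.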

17
MemSyco-Bench: Benchmarking Sycophancy in Agent Memory

Memory has emerged as a cornerstone of modern LLM-based agents, supporting their evolution from single-turn assistants to long-term collaborators. However, memory is not always beneficial: retrieved memories often induce a critical issue of sycophancy, causing agents to over-align with the user at the cost of factual accuracy or objective reasoning. Despite this emerging risk, existing memory benchmarks primarily evaluate whether memories are correctly stored, retrieved, or updated, while overlooking how retrieved memories influence downstream reasoning and decision-making. To bridge this gap, we propose MemSyco-Bench, a comprehensive benchmark for evaluating memory-induced sycophancy in agent systems. MemSyco-Bench measures when memory should influence a decision and how valid memory should be used. Specifically, it covers five tasks that assess whether agents can reject memory as factual evidence, respect its applicable scope, resolve conflicts between memory and objective evidence, track memory updates, and use valid memory for personalization. All related resources are collected for the community at https://github.com/XMUDeepLIT/MemSyco-Bench.

17
Domain Arithmetic: One-Shot VLA Adaptation under Environmental Shifts

Vision-Language-Action (VLA) models often fail to perform the same learned tasks under environmental shifts, such as changes in camera pose and shifts to a different but similar robot (e.g., from Panda to UR5e). Adapting these models to the shifted environment (i.e., target domain) often requires training on multiple demonstrations for each task, which are costly to collect. To reduce the burden of data curation and training, we propose an analogy-based method that adapts VLA models under environmental shifts through weight vector arithmetic with domain-specific information addition, named Domain ARiThmetic (DART). Unlike prior approaches, DART requires collecting only a single demonstration, enabling efficient adaptation. To accurately isolate domain-specific information for addition, DART performs subspace alignment between singular components in weight vectors to filter out noisy components. In both simulated and real-world experiments, DART outperforms existing VLA adaptation methods in one-shot scenarios across diverse visual and embodiment shifts. Code is available at https://github.com/snumprlab/dart.

15
Multimodal Continuous Reasoning via Asymmetric Mutual Variational Learning

Multimodal Large Language Models (MLLMs) are often constrained by a language-space bottleneck, forcing complex visual reasoning into discrete tokens which can lose perceptual nuance. A promising alternative is continuous latent reasoning, where the goal is to discover implicit reasoning pathways that bridge the multimodal query and the final answer. However, this introduces a severe train-inference mismatch: a training-time posterior, conditioned on the ground-truth answer, can exploit answer-dependent shortcuts. Standard variational training then forces the inference-time prior to mimic a posterior that has access to information unavailable at test time, leading to poor performance. To address this, we propose Asymmetric Mutual Variational Learning (AMVL), a framework that resolves this mismatch via a bidirectional calibration objective. A forward KL divergence trains the target-agnostic prior to match the posterior, while a novel reverse KL divergence simultaneously regularizes the posterior, preventing it from collapsing into inference-incompatible regions and mitigating this "answer leakage". We provide theoretical analysis formalizing this leakage as prior contamination and prove that our dual-KL objective reduces it. We instantiate AMVL in a latent-integrated MLLM and show that it consistently outperforms strong discrete and latent-reasoning baselines, improving the average score on the complex BLINK benchmark by +10.83 and achieving gains of up to +32.00 on individual reasoning tasks, with analyses confirming improved latent-space stability.
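One possible way to write a bidirectional KL objective of the kind described above is shown below. All notation is assumed for illustration: x is the multimodal query, y the answer, z the latent reasoning variable, q_phi the answer-conditioned posterior, p_theta the answer-agnostic prior, and alpha, beta are hypothetical weights. Which term the paper calls "forward" and which "reverse", and how gradients are routed, are not stated in the abstract.

```latex
\mathcal{L}(\theta,\phi) =
  \mathbb{E}_{q_\phi(z \mid x, y)}\!\left[-\log p_\theta(y \mid x, z)\right]
  + \alpha \, D_{\mathrm{KL}}\!\left(q_\phi(z \mid x, y) \,\|\, p_\theta(z \mid x)\right)
  + \beta \, D_{\mathrm{KL}}\!\left(p_\theta(z \mid x) \,\|\, q_\phi(z \mid x, y)\right)
```

In this sketch the second term pulls the prior toward the posterior, and the third penalizes a posterior that assigns little mass to latents the prior can actually reach at test time.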

14
CausalMix: Data Mixture as Causal Inference for Language Model Training

In Large Language Model (LLM) training, data mixing plays a pivotal role in determining model performance. Recent methods optimize mixture weights via proxy models, but they rely on the assumption of static data distributions. As a result, when the underlying data pool shifts, these methods require costly retraining from scratch. This limitation restricts their ability to scale seamlessly from small settings to larger data pools and model sizes. In this paper, we propose CausalMix to address this limitation by casting data mixture optimization as a causal inference problem. We formulate the statistical features of the data pool as covariates and the domain mixture as the treatment. After fitting a causal model on 512 runs of Qwen2.5-0.5B to estimate the Conditional Average Treatment Effect (CATE), we extrapolate the optimal mixture for an 800K data pool and apply it to train a 7B model. Furthermore, we successfully generalize the framework to long chain-of-thought data on Qwen3-4B-Base. By leveraging causal modeling to isolate confounding biases, CausalMix dynamically infers state-dependent optimal data mixtures. Extensive experiments show that the mixture guided by CausalMix consistently improves performance across multiple downstream tasks, outperforming RegMix and other baselines. In addition, we use the CATE Interpreter to provide visual analysis of the learned mixing strategy. Overall, CausalMix offers a causal and interpretable framework for optimizing LLM data mixtures.

12
Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning

Fine-grained visual reasoning remains challenging for vision-language models, especially when small but critical visual cues are buried in high-resolution images. Existing approaches rely on repeated cropping or test-time visual search to introduce local evidence, but they typically do not explicitly distinguish perception from reasoning. In this paper, we propose Perceive-to-Reason (P2R), a unified framework that formulates fine-grained visual reasoning as a two-stage process: the model first localizes question-relevant evidence as a Perceiver, and then answers the question as a Reasoner based on the annotated image and cropped regions. To better align training with this decoupled formulation, we further introduce Perception-Reasoning Alternating GRPO (PRA-GRPO), a role-aware reinforcement learning strategy that alternates between perception-focused and reasoning-focused updates using only final-answer supervision. Built on top of Qwen3-VL-Instruct-2B/4B/8B, P2R consistently improves performance across model scales. In particular, P2R-4B achieves 93.2% on V-Star, 81.9% on HR-Bench-4K, and 80.5% on HR-Bench-8K, substantially outperforming its corresponding backbone. Further experiments show that the benefits of P2R extend beyond high-resolution benchmarks to broader multimodal reasoning tasks. These results suggest that explicitly decoupling perception from reasoning provides an effective framework for fine-grained visual reasoning.

10
Seed2.0 Model Card: Towards Intelligence Frontier for Real-World Complexity

We present Seed2.0, a model series that takes a meaningful step toward solving complex, real-world tasks. Our approach begins with identifying users' genuine needs and constructing a reliable, forward-looking evaluation system by selecting and abstracting benchmarks grounded in these needs and in realistic, complex scenarios. Guided by this evaluation system, Seed2.0 targets two persistent challenges, long-tail knowledge and complex instruction following, substantially improving the model's reliability on intricate, long-horizon tasks. Beyond these, Seed2.0 delivers world-leading reasoning intelligence, visual understanding, and search capabilities that address the most common needs of a broad user base. Through extensive real-world use cases documented in this model card, we demonstrate that Seed2.0 begins to exhibit the ability to handle initial complex real-world tasks, delivering greater value to hundreds of millions of users.

9
ASPIRE: Agentic /Skills Discovery for Robotics

Traditional robot programming is challenging: it requires orchestrating multimodal perception, managing physical contact dynamics, and handling diverse configurations and execution failures. We introduce ASPIRE (Agentic Skill Programming through Iterative Robot Exploration), a continual learning system that autonomously writes and refines robot control programs in a code-as-policy paradigm while compounding experience into a reusable skill library. ASPIRE discovers skills that persist across tasks, simulation and real-world settings, and embodiments. It operates in an open-ended loop with three components: (1) a closed-loop robot execution engine that exposes fine-grained multimodal traces, enabling autonomous failure diagnosis, repair synthesis, and validation; (2) a continually expanding skill library that distills validated fixes into reusable, transferable knowledge; and (3) evolutionary search that generates diverse task sequences and control programs to explore beyond single-trajectory refinement. ASPIRE surpasses prior methods by up to 77% on LIBERO-Pro manipulation under perturbation, 72% on Robosuite bimanual handover, and 32% on BEHAVIOR-1K long-horizon household tasks. Its accumulated library also enables zero-shot generalization to unseen long-horizon tasks: on LIBERO-Pro Long, ASPIRE achieves 31% success versus 4% for prior methods despite their use of test-time reasoning and retries. Finally, simulation-discovered skills provide initial evidence of sim-to-real transfer, substantially reducing real-robot programming effort across different embodiments and robot APIs.

8
ABot-M0.5: Unified Mobility-and-Manipulation World Action Model

Mobile manipulation is a key capability for general-purpose robots, yet remains challenging for current embodied learning methods. VLA policies are typically reactive and lack explicit world modeling, while existing World Action Models (WAMs) are still poorly aligned with the structure of mobile manipulation: they operate on coarse video chunks, model entangled navigation-manipulation actions, and train inverse dynamics under supervision that does not match autoregressive inference. As a result, they often miss fine-grained contact dynamics, suffer from action-distribution conflicts, and accumulate errors over long-horizon rollouts. We propose ABot-M0.5, a new WAM built on the insight that mobile manipulation requires alignment at three levels: temporal granularity, action space, and train-test consistency. To align temporal granularity, we introduce intermediate latent actions that capture local visual state transitions and serve as a bridging action space between video latents and embodiment-specific controls. To align action space, we design a dual-level Mixture-of-Transformers architecture that disentangles both modality representations and heterogeneous action subspaces such as base movement and arm manipulation. To align inference conditions, we propose the dream-forcing training strategy that progressively trains inverse dynamics on model-predicted videos, improving train-test alignment and robustness during autoregressive prediction. Experiments on challenging mobile and fine-grained manipulation benchmarks demonstrate that ABot-M0.5 achieves state-of-the-art performance in both long-horizon task success and fine-grained control accuracy. These results highlight the critical importance of granularity-aligned, action-disentangled, and inference-consistent world-action modeling.

8
The State-Prediction Separation Hypothesis

Transformers use the same forward computation stream to both predict the next token and store useful state for future token predictions. We formulate the state-prediction separation hypothesis: disentangling the two roles yields better language modeling performance. We design a Transformer variant that uses two computation streams to separate the two functions, and conduct pretraining experiments across various scales. Our experiments show that state-prediction separation consistently offers better data and compute efficiencies, improving validation loss and outperforming standard Transformers by 2-3 percentage points on average on downstream tasks. We also conduct extensive empirical analysis that rules out potential confounders and demonstrates the fundamental difference in the gradients our design entails.

7
AutoTrainess: Teaching Language Models to Improve Language Models Autonomously

Training language models (LMs) remains a highly human-intensive process, even as frontier language model agents become increasingly capable at software engineering and other long-horizon tasks. A central challenge is that autonomous post-training is not just a coding problem: it requires the agent to repeatedly plan iterations, construct benchmark-aligned data, run stable training jobs, evaluate checkpoints, and preserve experiment state across many hours of interaction. We present AutoTrainess, an LM agent that exposes these operations as a repository of agent-computer interfaces for planning, data preparation, training, evaluation, and logging. Rather than leaving the agent to operate in a raw CLI environment with an underspecified action space, AutoTrainess externalizes prior human experience as explicit workflows, rules, and execution constraints that guide the agent toward effective and reliable training behavior. On PostTrainBench, AutoTrainess consistently outperforms CLI-only baselines, achieving 26.94 average score with GPT-5.4 (Codex) versus 23.21 for CLI-only. It also generalizes across models and harnesses, improving DeepSeek-V4-Flash (OpenCode) from 12.13 to 19.58.

6
BioInsight: Multi-Agent Orchestration for Interactive Biomedical Knowledge Discovery

Biomedical researchers increasingly use AI-generated analyses and reports to interpret protein-level signals, but static outputs are often insufficient for research decision-making, where users need to inspect evidence, assess uncertainty, compare mechanisms, and refine hypotheses. We present BioInsight, a multi-agent system that moves from static biomedical report generation to interactive, evidence-centered interface generation. Given a disease name, a protein association table, and optional cohort metadata, BioInsight organizes disease-specific evidence through typed intermediate artifacts, including ranked pathways, literature evidence packets, protein-level reasoning notes, citation-grounded reports, dashboard schemas, and rendered interactive interfaces. The system decomposes evidence retrieval from mechanistic reasoning, normalizes citations through deterministic components, and converts the same structured evidence used in the report into an interactive interface. We evaluate BioInsight on standardized biomedical QA, challenging protein-function reasoning, and end-to-end biomedical evidence synthesis. Results show that BioInsight achieves the best results and suggest that biomedical AI systems should move beyond text-only and static reports toward provenance-preserving, interactive evidence artifacts.

6
Valdi: Value Diffusion World Models

World models can enable Model Predictive Control (MPC), but this requires dynamics prediction that is both fast enough for online use and expressive enough to represent uncertain futures. Diffusion models offer a natural mechanism for modeling uncertain dynamics, yet their iterative inference procedure makes them difficult to use for low-latency latent planning. We bridge this gap with Value Diffusion World Models (Valdi), combining end-to-end online training for MPC with a latent diffusion dynamics model. In preliminary experiments on the CarRacing environment, we show that Valdi, using a single diffusion step at both training and inference, matches a deterministic MLP baseline. Our experiments expose a trade-off between predictive multimodality and control performance in this setup. Code is available at https://github.com/Kit115/ValueDiffusionWorldModels.

5
Cross-Domain Generalization Failure in Lightweight Intrusion Detection Models for IIoT Networks

Lightweight machine learning models are increasingly proposed for intrusion detection in Industrial Internet of Things (IIoT) networks due to their suitability for resource-constrained edge deployment. Most reported results evaluate these models only within their training network, leaving behavior on unseen networks unverified. This study trains four lightweight architectures on one IIoT dataset and evaluates them, without retraining, on two structurally distinct IIoT datasets using a feature representation restricted to attributes available across all three sources. Explainability analysis across two top-performing models shows both rely overwhelmingly on coarse port-category features; the most influential category occurs in source-domain attack traffic at 96 to 435 times the rate in the two target domains, indicating that coarsening port resolution relocates rather than removes a documented shortcut. Evaluation under naturally imbalanced class distributions reveals a further effect: the evaluation protocol used can reverse which target network appears to pose the greater generalization challenge. Adversarial robustness and recovery through limited target-domain exposure are also assessed; robustness to adversarial perturbation is unrelated to cross-network generalization, and recovery through adaptation varies considerably by architecture. These findings suggest deployment readiness should be assessed using cross-network evaluation under realistic class distributions, rather than within-domain accuracy alone.

4
AtomiMed: Hierarchical Atomic Fact-Checking for Universal Clinical-Aware Medical Report Evaluation

Traditional metrics for Medical Report Generation (MRG) predominantly rely on surface-level n-gram overlap, which fails to capture clinical factual accuracy and often overlooks catastrophic diagnostic errors. We address this fundamental limitation by proposing AtomiMed, a universal, modality-agnostic evaluation framework that decomposes complex medical narratives into a standardized, multi-level hierarchy of Atomic Clinical Facts, encompassing Disease-level entities and Attribute-level descriptors, including location, morphology, and severity. By implementing an Agentic Cross-Verification loop between ground-truth and predicted reports, AtomiMed simulates a multi-radiologist peer-review process to verify clinical consistency, thus enabling the decoupled assessment of diagnostic detection and descriptive accuracy. To facilitate standardized evaluation, we introduce MRGEvalKit, an open-source toolkit for automated hierarchical extraction, and curate OmniMRG-Bench, a comprehensive multi-modal benchmark covering X-ray, CT, MRI, and Ultrasound. Extensive experiments on multiple expert-annotated reader studies demonstrate that AtomiMed achieves significantly higher correlation with human radiologist judgment compared to traditional and model-based metrics. Our code is released at https://github.com/Venn2336/MRGEvalkit

4
Graph-Native Reinforcement Learning Enables Traceable Scientific Hypothesis Generation through Conceptual Recombination

Accelerating materials discovery requires AI systems that can generate scientifically valid hypotheses through multi-step, domain-grounded reasoning. Standard large language models often produce fluent but weakly traceable responses to open-ended materials design problems, making it difficult to determine whether final answers are supported by coherent intermediate reasoning. We develop Graph-PRefLexOR, a family of graph-native reasoning models fine-tuned with Group Relative Policy Optimization (GRPO) to organize reasoning into explicit phases for mechanism exploration, graph construction, pattern extraction, and hypothesis synthesis. This design links neural language generation with symbolic relational structure, enabling causal connections to be constructed, inspected, and reused. On 100 open-ended questions from materials science and mechanics literature, Graph-PRefLexOR achieves 40-65% improvements over corresponding base models, with the largest gains in reasoning traceability. Embedding analyses show broader semantic exploration and approximately 2-3 times greater semantic diversity than baselines. Semantic backtracking and layer-wise hidden-state analyses further show stronger alignment between structured reasoning and final answers. Finally, test-time graph expansion reveals that additional compute primarily increases long-range conceptual recombination within a bounded semantic space, rather than simply expanding semantic coverage. These results establish graph-native reinforcement learning as a pathway toward interpretable AI systems for scientific hypothesis generation in materials design and other scientific applications.

3
When LLMs Read Tables Carelessly: Measuring and Reducing Data Referencing Errors

While large language models (LLMs) perform well on table tasks, they still make data referencing errors (DREs), i.e., incorrectly citing or omitting table values, despite understanding the table structure. Beyond final-answer accuracy, DREs directly compromise the correctness and reliability of intermediate reasoning steps. Yet prior studies have only offered limited, small-scale analyses. In this work, we present the first systematic evaluation of tabular data referencing errors across different models and tasks. Our results show that DREs occur across all tested models (1.7B to 20B parameters). Furthermore, we demonstrate that incorporating data referencing as a critic significantly improves answer accuracy by up to 12.0%, through critic-based filtering and rejection sampling. Finally, we trained a lightweight 4B-parameter critic model that achieves an average F1 score of 78.2% in detecting both in-distribution and out-of-distribution DREs, and effectively assists inference for larger models.

3
Autonomous Scientific Discovery via Iterative Meta-Reflection

Autonomous scientific discovery systems offer the potential to accelerate research by automating the process of hypothesis generation and validation. However, current systems operate within constrained search spaces or require predefined research questions, limiting their capacity for true open-ended inquiry. Furthermore, while they generate hypotheses iteratively, they largely lack the ability to explicitly synthesize their own accumulated findings to uncover complex, interconnected phenomena. We introduce DiscoPER, an autonomous large language model-powered framework that conducts open-ended research by dynamically generating and executing code to explore datasets without pre-specified research objectives. To ensure rigorous scientific validity, every proposed discovery must pass statistical testing. To overcome the limitations of isolated search, our framework introduces a second-order reasoning mechanism that periodically analyzes its own accumulated discoveries. By treating prior discoveries as empirical data, DiscoPER identifies structural patterns, confounds, and epistemic gaps, actively redirecting hypothesis exploration toward uncharted regions of the search space. The search space is further expanded by incorporating tool use, enabling the system to explore hypotheses beyond structured metadata by seamlessly processing and extracting useful information from multimodal sources like images. Evaluated on iNatDisco, a new multimodal ecological knowledge benchmark with pattern-level ground truth obtained from peer-reviewed literature, DiscoPER recovers 8 of 9 known patterns with a 72.7% hypothesis support rate, outperforming both classical causal discovery and LLM-guided baselines. Ablations show that DiscoPER scales with more data and confirm the benefits of second-order meta-reflection.

3
PixelEyes: Decoupling Perception and Reasoning for Pinpoint Visual Evidence Seeking

This paper explores multi-turn visual reasoning and observes that MLLMs repeatedly fail to localize the target, leading to long, redundant trajectories. We attribute this failure to the entanglement of reasoning and perception within a single model: the MLLM reasons and localizes simultaneously, and inaccurate localization triggers additional reasoning turns that bloat the trajectory. To solve this problem, we propose PixelEyes, a multi-turn visual reasoning agent that explicitly decouples reasoning from perception, i.e., the reasoner decides what to look for, while a specialized perception tool answers where it is. Specifically, PixelEyes introduces two components. 1) Mask-guided Visual Search: a referring segmentation model is invoked to provide mask-precise localization, freeing the reasoner from the need to compensate for imprecise grounding. 2) Semantic-region Breadth-first Search (BFS): to eliminate redundant loops caused by repeatedly cropping incorrect sub-regions, we organize exploration as a breadth-first search over semantic regions. To internalize these capabilities, we construct the PixelEyes-6K dataset by resynthesizing expert trajectories from existing data. This explicitly embeds our mask-guided search and BFS logic into the model. We further introduce Pinpoint-Bench, a zero-hint visual search benchmark, i.e., no location cues are provided in the question, with instance-level masks and bounding boxes that separate localization failures from reasoning failures, enabling fine-grained analysis of failure modes such as inattentional blindness. Recent state-of-the-art MLLMs and visual reasoning agents leave large headroom on Pinpoint-Bench, demonstrating its quality and difficulty. Code and models are open-sourced.
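The breadth-first search over semantic regions can be pictured with a small sketch. It assumes a hypothetical `Region` tree in which each node lists the objects visible at that level; in the paper the regions would come from a segmentation model, and none of the names below are taken from the released code.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Region:
    name: str
    labels: set = field(default_factory=set)      # objects visible at this level
    children: list = field(default_factory=list)  # finer sub-regions

def bfs_find(root, target):
    """Visit regions level by level; return (path to first hit, regions visited)."""
    queue = deque([(root, [root.name])])
    visited = 0
    while queue:
        region, path = queue.popleft()
        visited += 1
        if target in region.labels:
            return path, visited
        for child in region.children:
            queue.append((child, path + [child.name]))
    return None, visited

# Hypothetical scene: an image split into a desk (with a drawer) and a shelf.
scene = Region("image", children=[
    Region("desk", {"laptop"}, [Region("drawer", {"keys"})]),
    Region("shelf", {"clock"}),
])

clock_path, clock_visits = bfs_find(scene, "clock")
keys_path, keys_visits = bfs_find(scene, "keys")
```

Because every region enters the queue once, the search never crops the same sub-region twice, which is the redundancy the abstract attributes to unstructured exploration.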

1
Personalization as Inverse Planning: Learning Latent Design Intents for Agentic Slide Generation via Structural Denoising

Slide design requires personalizing both deck themes and page layouts. Yet current AI agent-based methods struggle with fine-grained, page-level design. Relying solely on prespecified templates or verbose user instructions, they fail to capture latent design intents, leaving Page-level Slide Personalization (PSP) unresolved. To close this gap, this work formulates PSP as an inverse planning problem. We propose to learn a design intent without assuming any knowledge of the specific executing tools (e.g., PowerPoint, Beamer) being used. However, relinquishing control over these tools makes the problem intractable to optimize end-to-end. To overcome this, we propose SPIRE, a principled framework to solve PSP approximately. By intentionally corrupting the visual structures of clean slides, SPIRE creates a verifiable task to denoise the corruption, whereby two agents learn to collaboratively refine executable designs via reinforcement learning (RL). We present a proof that structural denoising is a consistent surrogate for PSP, and that the multi-agent formulation strictly reduces policy gradient variance in RL. Extensive experiments demonstrate the superiority of SPIRE.

1
CogSENet: Blind Image Deblurring with Blur-Conditioned Semantic Routing and Explicit Frequency Fusion

Blind image deblurring demands the recovery of high-fidelity details and coherent structures from complex, unknown degradations. Current blind image deblurring methods struggle with real-world, spatially varying degradations, and lack the semantic awareness necessary to reliably differentiate valid textures from artifacts. To bridge this gap, we propose CogSENet, a dynamic, semantic-aligned reconstruction framework inspired by the eagle's visual system. By mimicking the eagle's active saccadic scanning, we devise a Semantic-Driven State Space Module (SDSSM) with semantic-aware token regrouping via differentiable routing, enabling prompt-conditioned long-range dependency modeling. To ensure physically interpretable recovery of textures and structures, a BiFreqFusionBlock (BFFB) mirrors the functional differentiation of the eagle's retina by decomposing features into high and low frequencies using wavelet transforms. Finally, we estimate a continuous Blur Field (CBF) from the blurred image and fuse it with CLIP semantic priors to modulate the deepest latent features, emulating focal adaptation and enabling adaptive restoration under spatially non-uniform blur. Extensive experiments demonstrate that CogSENet outperforms state-of-the-art deblurring methods in both visual quality and structural fidelity with fewer parameters, while also performing favorably on dehazing, deraining, and denoising tasks.

0
AI translation of literary texts is "fine", but readers still prefer human translations

AI translation of literary works is increasingly common. While the content may be rendered adequately, we do not know enough about how readers experience it in terms of immersiveness and literary effect, aspects poorly captured by automatic machine translation metrics or human evaluation targeting fluency and adequacy. We ask 15 avid readers to compare recently published human translations (HT) to machine translations (MT) generated with an agentic large language model (LLM)-based pipeline, for 15 recent novels written in French, Polish, and Japanese and translated into English. Readers evaluated approximately 8K-word excerpts in two conditions: immersive reading of the whole excerpt (30 comparisons) and close reading of 386 aligned HT-MT chunk pairs (772 comparisons), with two readers per book and in alternating order of presentation. Overall, readers find MT "fine", but prefer HT (slightly at excerpt-level 19/30, more clearly at chunk-level 522/772) for its ease, clarity, and immersive nature. Readers' highlights show that MT's quality varies more within one book than HT's does. Crucially, readers cannot reliably tell the two apart (17/30 guess correctly) and tend to prefer the version they believe to be human. Automatic metrics, including LLM-as-a-judge approaches, fail to recover reader preferences and favor MT. We release LAIT (Literary AI Translation), a reader-centered evaluation dataset with 1K reader comments, 2K judgments and preference ratings, and 7.2K span-level annotations, along with our evaluation protocol and supporting interface.
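One way to read the counts above is to compare them against coin-flip guessing with an exact one-sided binomial test. The abstract does not say which statistical test the authors used, so the choice of test, the 0.5 chance level, and the helper name `binom_tail_half` are assumptions made for this sketch.

```python
from math import comb

def binom_tail_half(successes, trials):
    """Exact P(X >= successes) for X ~ Binomial(trials, 0.5), using integer sums."""
    favourable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favourable / 2 ** trials

# Counts reported in the abstract.
p_identify = binom_tail_half(17, 30)          # readers guessing which text is human
p_excerpt_pref = binom_tail_half(19, 30)      # excerpt-level preference for HT
p_chunk_pref = binom_tail_half(522, 772)      # chunk-level preference for HT

for label, p in [("identify", p_identify), ("excerpt", p_excerpt_pref), ("chunk", p_chunk_pref)]:
    print(f"{label}: p = {p:.3g}")
```

Under these assumptions the identification result (about 0.29) and the excerpt-level preference (about 0.10) are both compatible with chance, while the chunk-level preference is far from it, which matches the abstract's "slightly" versus "more clearly" wording. The comparisons are not fully independent (two readers per book), so these values are only indicative.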

0
NoPA: Non-Parametric Online 3D Scene Graph Generation

Classic 3D scene graph generation approaches fail to run in real time due to the heavy computational cost of environment mapping and the need to generate intermediate point-cloud representations. To alleviate this issue, a recent work eschews point clouds in favor of a lightweight Gaussian distribution for each object. This approximation drastically speeds up inference and enables real-time 3D scene graph generation. However, the representation has two key weaknesses. 1) Each object is approximated by a single 3D Gaussian, which causes a severe loss of 3D geometric detail. 2) The discrepancy between this approximation and the true object geometry exacerbates the inaccurate merging of object candidates during online inference. To address these issues, we propose NoPA, which represents each object as a separate non-parametric distribution. This formulation retains 3D geometric information while preserving the real-time inference of the parametric Gaussian formulation. To build upon our novel object representation, we propose a tailored merging strategy to recover coherent object instances. Specifically, we leverage maximum mean discrepancy on kernel density estimates to enable robust merging of object candidates during online exploration while minimizing added computational complexity. The key is to maintain a fixed particle set per object. Furthermore, to rectify the relation loss caused by misclassified objects, NoPA propagates relationships between objects with high affinity. Experiments show that NoPA substantially outperforms current methods without sacrificing real-time inference speed.
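The merging test described above can be sketched with a plain maximum mean discrepancy (MMD) estimate between two fixed-size particle sets. This is a minimal illustration, not NoPA's implementation: the Gaussian kernel, the bandwidth, the 64-particle budget, the merge threshold, and all function names are assumptions chosen for the example.

```python
import math
import random

def gaussian_kernel(a, b, bandwidth=0.5):
    """Gaussian (RBF) kernel between two 3D points."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=0.5):
    """Biased estimate of squared MMD between two particle sets."""
    def mean_k(ps, qs):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in ps for q in qs)
        return total / (len(ps) * len(qs))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)

def sample_particles(center, rng, n=64, spread=0.2):
    """Hypothetical stand-in for an object's particles: points around a center."""
    return [tuple(c + rng.gauss(0, spread) for c in center) for _ in range(n)]

def should_merge(xs, ys, threshold=0.1):
    """Merge two object candidates when their particle sets are close in MMD."""
    return mmd_squared(xs, ys) < threshold

rng = random.Random(0)
chair_view_a = sample_particles((1.00, 0.0, 0.5), rng)
chair_view_b = sample_particles((1.05, 0.0, 0.5), rng)  # same object, slightly shifted
table = sample_particles((3.00, 1.0, 0.4), rng)

same_object = should_merge(chair_view_a, chair_view_b)
different_object = should_merge(chair_view_a, table)
```

Keeping the particle count fixed bounds the cost of each comparison at a constant number of kernel evaluations, which is one plausible reading of why a fixed particle set per object matters for real-time use.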

0
05

PRODUCT HUNT

Product Hunt - July 2, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

Retrace

Debug AI agents by replaying and forking runs

0
Needle

The proactive GTM agent in Slack and Teams

0
Quick Sub 2: Video Subtitling

Quick, creative video subtitling with direct canvas control.

0
Flowly

A personal AI agent that runs on your desktop and iPhone

0
Basedash Actions

A BI tool that can take action for you

0
Context.dev

One API to scrape, enrich, and extract the internet

0
Gaming Chat SDK by CometChat

Chat drops into Unreal like it was always there

0
scritty

Shared, searchable memory for every AI coding agent

0
Sidedoor

Paste any job, find who in your network can refer you

0
Banger Mail

Shared mailboxes for teams and AI agents

0
Fypro

Convert your TikTok followers into paying customers

0
Solaris

Your company’s AI adoption and upskilling platform

0
PixFit

Turn 1 creative into every ad format, instantly

0
Macro

Unifies your work into one app with shared memory

0
PieterPost MCP

Connect your AI agent to postal mail

0
html.contact

A full form backend you can test before paying

0
Macuse

Give Your AI Superpowers on macOS

0
Wins 3.4

Snap, switch, and arrange Mac windows from the notch

0
Bamboo

Markdown notes with AI under your control

0
Aruki

The Japanese walking method, coached on your iPhone

0
Modelence Mobile Builder

Build mobile apps by chatting with AI

0
Folderly Lens

Domain health analysis for high performance email campaigns

0
Claude Sonnet 5

AI that plans, acts, and gets work done

0
Sequence Agentic

Money movement for AI agents

0
Claude Science

Your research partner for rigorous science

0
Adam CAD Copilot

AI CAD inside Onshape and Fusion

0
Dump Memory

We fix your memory

0
OASIS 1 Ring

Whisper to write and touch to edit

0
Loot

Collect your favorite things in real life

0
Humalike

Give your AI agents the social intelligence they're missing

0
N71

Give all your AI agents one shared context

0
Gemini Omni Flash

High-quality video generation and conversational editing

0
Mark by Airtop

Vibe automation for solo marketers

0
Fuser Apps

Vibecode apps, sites, & games on everyone's favorite canvas

0
MailAdept by mailwarm

AI Agents & Email deliverability experts on your team

0
Tabstack Browser Automation

Automate the web in your app or agent, no browser to host

0
LightTwist

Record & stream your show in a realistic virtual studio

0
Acti

Agentic keyboard for mobile commands and search

0
Ciaro Pro

AI filmmaking for visual storytellers

0
Saldor

Speed up procurement and AP.

0
Clusy

AI notebook platform for modern data science

0
Stigg 2.0

The usage runtime for AI products

0
Metal

AI-driven operating system for raising venture rounds

0
RunInfra

Describe the AI model you need and get an optimized AI

0
Livinity

Open-source homeserver OS with a built-in AI agent

0
Get Transparent Pricing on Labs

The right tests. The real price. Nothing extra.

0
Browser Notes

Your ideas, organized - not uploaded

0
iVox

The first app dedicated to 1980s tape-edit effects.

0
Supafax

Email-native assistant that learns how you work

0
Akiflow

Manage tasks and calendars from Claude, ChatGPT or Cursor

0
06

TECHMEME

Techmeme - July 2, 2026

Techmeme Digest: Major tech headlines and industry conversations.

Sources: Anthropic has initiated early-stage development of a custom AI server chip and held preliminary discussions with Samsung about manufacturing the chip (Qianer Liu/The Information)
Source: Techmeme · Published: Jul 2, 2026

Anthropic has begun early-stage work on its own AI chip and held talks with Samsung Electronics as a potential manufacturing partner …

Global VC funding hit a record $510B in H1 2026, with OpenAI and Anthropic accounting for $217B, or 43% of the total; in Q2, VCs put $205B into 5K+ startups (Gené Teare/Crunchbase News)
Source: Techmeme · Published: Jul 2, 2026

Global venture funding reached a record $510 billion in the first half of 2026, surpassing the $440 billion invested in all of 2025 …

Microsoft establishes an organization with 6,000 staff specializing in engineering, corporate training, and management to support businesses with AI deployments (Todd Bishop/GeekWire)
Source: Techmeme · Published: Jul 2, 2026

Microsoft is launching a new AI “company.” It won't be a separate legal entity, and most of its 6,000 people already work at Microsoft.

Sam Altman proposes a "US-led international forum" to establish AI standards, provide analysis of capabilities and risks, and make AI available to allies (Sam Altman/Financial Times)
Source: Techmeme · Published: Jul 2, 2026

The labs develop the technology, but citizens and their elected representatives must make the rules

SoftBank and its telecom unit launch SB Neo to offer AI chips and cloud services to big companies, aiming to provide 10GW of capacity in the US by 2030 (Min-Jeong Lee/Bloomberg)
Source: Techmeme · Published: Jul 2, 2026

SoftBank Group Corp. and its telecom unit will start renting AI computing resources to US companies next fiscal year …

Z.ai launches ZCode, an "Agentic Development Environment" optimized for its new GLM-5.2 model; Z.ai's GLM Coding Plan costs from $16.20 to $144 per month (Michael Nuñez/VentureBeat)
Source: Techmeme · Published: Jul 2, 2026

The move marks the company's most aggressive push yet into the fast-growing AI-powered coding tool market …

Filings: Trump purchased up to $5M each in Broadcom, Meta, Amazon, Apple, Microsoft, and Nvidia stocks on July 23, the same day he unveiled his AI action plan (New York Times)
Source: Techmeme · Published: Jul 2, 2026

President Trump and his family reaped vast financial rewards from a memecoin that generated losses for hundreds of thousands of investors.

Amazon says Leo reached 396 satellites in low-Earth orbit following a recent launch, which is "enough to support continuous service across initial latitudes" (Thomas Ricker/The Verge)
Source: Techmeme · Published: Jul 2, 2026

Early adopters of Amazon Leo should temper expectations. … Amazon says it now has enough satellites operating …

German software giant SAP says it is encouraging workers to invent new, more valuable jobs aided by AI, in a bid to avoid layoffs; SAP cut ~10K staff in 2024 (Jim Tankersley/New York Times)
Source: Techmeme · Published: Jul 2, 2026

The German software giant SAP says it is betting that employees can reinvent jobs instead of eliminating them. Experts are divided on whether it will work.

The US DOJ says Peter Stokes, a 19-year-old dual US-Estonian citizen, was extradited from Finland to face charges of participating in Scattered Spider hacks (Joe Warminsky/The Record)
Source: Techmeme · Published: Jul 2, 2026

A 19-year-old man with dual U.S. and Estonian citizenship was extradited from Finland to Chicago this week to face criminal charges …

President Trump says Micron will invest $250M in Trump Accounts; Micron says it will come via an employee matching program and a $250 deposit in some US states (New York Times)
Source: Techmeme · Published: Jul 2, 2026

President Trump said the U.S. chip maker would make a significant donation to a new type of investment account created by the administration.

Cloudflare sets a September 15 deadline for AI companies to differentiate their web crawlers into search, AI training, and AI agents or face being blocked (Samantha Elkins/NBC News)
Source: Techmeme · Published: Jul 2, 2026

Cloudflare gave AI crawlers a September deadline to separate the bots that gather content for search from those that harvest …

Germany-based Quantum Systems, which sells surveillance drones used by Ukraine, raised $1.2B led by Blackstone, Noteus, Airbus, and Advent at an ~$8B valuation (Yazhou Sun/Bloomberg)
Source: Techmeme · Published: Jul 2, 2026

Quantum Systems has raised $1.2 billion in a funding round, more than doubling the German drone startup's valuation …

An interview with Amazon SVP of Devices Panos Panay on designing custom chips for Echo and Fire TV devices, experimenting with AI-enabled gadgets, and more (Arjun Kharpal/CNBC)
Source: Techmeme · Published: Jul 2, 2026

Amazon is focusing on building chips for its “critical” consumer devices, the company's top hardware executive told CNBC.

The European Court of Justice rules that Google's earlier defeat against a €4.1B European Commission penalty over abusing Android's market power should stand (Bloomberg)
Source: Techmeme · Published: Jul 2, 2026

Google lost its long-running fight against a €4.1 billion ($4.7 billion) European Union antitrust fine after the bloc's top judges …

07

STARTUP ARCHIVE

Startup News - July 2, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction.”

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

“Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot.”

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice.”

Paul Graham on why starting with a “small, intense fire” is the key to startup growth
Source: Startup · Published: Mar 20, 2026

"You have to know who those first users are and how you're going to get them."

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth.”

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?”

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations”

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over.”

Spencer Rascoff: “I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

“If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive.”

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right”

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT

Solidot News - July 2, 2026

Solidot Feed: Highlighting essential tech & open-source news.

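DGX Spark Hackathon Online Bootcamp: 4 hours of practical content, from environment setup to embodied AI, walking you step by step through building an Agent that actually runs

Registration for the NVIDIA DGX Spark hackathon filled up as soon as it opened, but alongside the competition there is also a hands-on livestream better suited to those who want to "sit in first and then decide whether to compete."
Livestream time: July 12, 10:00 - 12:00
Bootcamp content:
1. Hackathon rules briefing: an explanation of the competition format, judging criteria, and submission process to help teams set a direction and prepare efficiently.
2. Best practices for building a local Agent Team on DGX Spark and Step 3.7: from environment setup to model inference, covering how to put the Stepfun3.7 model's capabilities to work efficiently on DGX Spark.
3. One-click video from an Agent: building a local visual generation agent on DGX Spark. A demonstration of how to build a local Agent with visual understanding and content generation capabilities, connecting the full pipeline from prompt to finished video.
4. From local AI to embodied AI: building a desktop robot Agent development platform on DGX Spark. An exploration of how Agents move from software into the physical world, showcasing DGX Spark development practice in embodied AI scenarios.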

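Muscle mitochondrial function declines significantly in sedentary people

Researchers found a significant and consistent decline in muscle mitochondrial function in people who are healthy but sedentary. This may be a precursor to major disease. Senior author Iñigo San Millan said that mitochondrial function is central to metabolic health: if you are 40, healthy but sedentary, your cells have very likely already developed problems, and those problems may cause you trouble in 10 to 15 years. The study subjects were 9 sedentary men and 10 men who exercise regularly, all about 42 years old. The researchers analyzed muscle biopsies to observe how efficiently mitochondria burn fuel, and ran exercise tests to measure the subjects' fitness, fat-burning capacity, and blood lactate levels, a key indicator of how much energy the body is expending. Compared with the men who exercise regularly, the sedentary men's mitochondrial efficiency was 28%-36% lower across multiple categories; levels of MPC1, a key protein for converting sugar into usable energy, were 49% lower, and the activity of the CPT1 enzyme, which transports fat into mitochondria, was about half as high; maximal oxygen uptake (VO2max) was 38% lower, and blood lactate levels during exercise were 60% higher.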

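Android malware from Google

F-Droid, the free-software Android app store, warns that over the past few months Google has pushed a piece of malware known as Android Developer Verifier (ADV) to as many as 4 billion Android devices. It runs covertly in the background as a system service with full root privileges, quietly waiting for Google's activation signal. The ADV service cannot be blocked, disabled, or removed. Once activated, its sole purpose is to stop users from running apps from developers not approved by Google. Google is enforcing the Android developer verification program in the name of security. According to the Android Developer Console terms of service, if a developer "violates any terms, or distributes malicious apps or other harmful apps, Google may terminate your access to ADC…". Google does not define malicious or harmful apps, which means Google decides whether an app is malicious; and as the largest advertising company, it may well regard ad-blocking apps as malicious. Google expects to begin activating ADV gradually from September 30.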

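More and more children are using AI

Based on new data from 10 countries, UNICEF estimates that at least 20 million children have used artificial intelligence, and that adoption among teenagers is growing more than three times as fast as among adults. Most strikingly, an estimated 2 million children, about one in ten, say they turn to AI for advice about their own worries, and another 13 million say they use AI to help with schoolwork and homework. UNICEF said: "AI is here. It is increasingly part of our lives, and it is already shaping childhoods around the world, for better or worse." Although AI offers new opportunities for learning and creativity, UNICEF warned that evidence about its effects on children's development, emotional well-being, and exposure to harm is only beginning to emerge. The agency said: "In effect, this generation is growing up inside a global experiment." It urged governments and technology companies to put children's rights at the center of AI regulation.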

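There may be 20 million insect species worldwide

Scientists have long argued over the exact number of insect species, which was previously widely thought to be about 6 million. Over the past three centuries entomologists have described about 1 million insect species, but discovering and describing all of them is a daunting, perhaps impossible, task. To estimate insect diversity more accurately, researchers studied years of insect survey data from Guanacaste National Park in Costa Rica and applied statistical methods borrowed from epidemiology. They then used another highly diverse group of organisms, trees, to extrapolate that figure to the global scale. If insect diversity follows the same proportions, Earth has roughly 13.3 million to 24.7 million insect species, with 20.3 million as a reasonable middle value. The researchers say their estimate is conservative, which means there may be millions more insect species yet to be discovered.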

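Cloudflare pushes AI companies to pay for content

Cloudflare announced new controls that give content publishers more say over how AI companies access and use their content. From September 15, new Cloudflare websites will allow traditional search engine indexing but will by default block AI training bots and AI agents from accessing ad-supported pages. Cloudflare is also expanding its monetization efforts with a Pay-Per-Use model, intended to compensate publishers when their content contributes to AI-generated answers, rather than simply having that content scraped. Cloudflare argues that publishers should not be forced to choose between increasing their online visibility and giving their content to AI systems for free.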

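Scientists build a cell from non-living components for the first time

Synthetic biologists at the University of Minnesota have, for the first time, loaded non-biological components one by one into a cell-like membrane and watched the molecular sac begin to exhibit life-like behavior. The synthetic cell can grow, replicate DNA, and divide, demonstrating the basic functions of the cell cycle. By any definition, this cell is not alive. It depends on a steady supply of nutrients and ribosomes, the molecular machines that synthesize proteins. It has no defense mechanisms and no proper waste-disposal system. But it is the strongest demonstration to date that creating life from non-living matter is possible, a goal synthetic biologists have pursued for decades. About 4 billion years ago, non-living molecules came together to form the earliest protocells. They absorbed nutrients, grew, and divided. Over time these cells evolved and diversified into different types, decorating a once-barren world with all manner of strange organisms. Scientists still disagree about how this transition from non-life to life happened, and some have begun trying to reproduce it in the laboratory.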

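Anthropic to remove secret code that detects Chinese users

An Anthropic engineer said a patch would be released on Wednesday to remove hidden code added to Claude Code several months ago, which was intended to stop other AI companies from distilling its models. Claude Code engineer Thariq Shihipar said, "It was an experiment we launched in March to prevent unauthorized resellers from abusing accounts and to prevent model distillation. The team has since put more effective mitigations in place, and we had actually meant to remove this code for a while." Earlier, developers had discovered that Claude Code contained secret code that checks the base URL environment variable, which is used to route API requests to a proxy or gateway. If the base URL has been overridden, the code goes on to check the system time zone and whether the hostname matches any entry in a list of domains belonging to known Chinese AI labs, other AI companies, account resellers, and gateways.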

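Swedish court orders Google to pay price-comparison site $1.5 billion

A Swedish court has ordered Google to pay about $1.5 billion (14.3 billion Swedish kronor) to the price-comparison site PriceRunner for favoring its own shopping service in search results. It is the largest amount a Swedish court has imposed in an antitrust lawsuit, but far below the 78 billion kronor PriceRunner sought. PriceRunner sued Google in 2022, accusing it of manipulating search results. In 2008 Google began prominently displaying its comparison-shopping service in search results, causing traffic to rival price-comparison sites to fall sharply. In 2017 Margrethe Vestager, then the EU competition commissioner, fined Google for using its comparison-shopping service to gain an unfair advantage. Google appealed that decision in 2021, but the appeal was dismissed. Several European price-comparison sites subsequently filed damages claims.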

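Sony PS to stop releasing disc versions of new games from January 2028

Digital games are the future: Sony has officially announced that from January 2028 its PS consoles will no longer get physical disc versions of new games. This also means future PS consoles will no longer be sold in models that include a Blu-ray drive. Sony says physical disc versions of games released or due for release before January 2028 are unaffected. Consumers generally prefer digital media to physical discs, and Sony says it is simply following that trend.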

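Godot refuses to accept AI-generated code

Open-source projects all face a growing volume of AI-written code, and the foundation that develops the open-source game engine Godot has now announced a revised contributor policy. It bans submissions of code authored by AI and pull requests submitted by AI agents, and bans AI-generated text in human-to-human communication, with machine translation excepted. The new policy aims to curb AI slop, encourage maintainers to review code, and develop new contributors into future maintainers. Most importantly, it requires that every contribution come from a human who takes responsibility for the code and fixes it when problems arise. The foundation said, "AI cannot take responsibility, and we cannot expect heavy users of AI to fully understand their code and be able to fix it."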

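LHC's third shutdown for maintenance

CERN has announced the LHC's third long shutdown (Long Shutdown 3). The maintenance and upgrade work will prepare for the next phase of operation, the High-Luminosity LHC (HiLumi LHC). The LHC was built between 1998 and 2008, entered operation in 2009, achieved its first 3.5 TeV particle collisions in 2010, and announced the discovery of the Higgs boson in 2012. From 2013 to 2015 the LHC underwent its first maintenance upgrade, which raised the total collision energy to 13.0 TeV; its second ran from late 2018 to April 2022. The third shutdown will be the largest upgrade to date. HiLumi LHC is scheduled to begin operating in 2030, with luminosity increased by up to a factor of ten, which will let researchers collect much larger datasets, study the Higgs boson more precisely, and improve the chances of discovering phenomena beyond the Standard Model.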

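Researchers turn stem cells into human primary oocytes

The US biotech startup Conception announced that it has successfully induced stem cells to become human primary oocytes, calling it a major scientific advance. Scientists have already made eggs from stem cells in mice. Researchers first converted mouse skin cells into "induced pluripotent stem cells" (iPSCs) and then into usable eggs. Those eggs produced healthy pups with normal lifespans that could reproduce naturally and have healthy offspring of their own. The process, known as "in vitro gametogenesis" (IVG), is easier to achieve in mice than in larger animals. IVG has the potential to redefine reproduction. A simple blood draw could be enough to make as many healthy eggs as a family needs. This capability would remove biological and genetic constraints, greatly expand families' options for having healthy children, and allow women to have children at older ages.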

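China's vehicle exports on track to reach 10 million in 2026

AlixPartners forecasts that China's vehicle exports will rise about 40% from the previous year in 2026, to 10 million units. Statistics from the China Association of Automobile Manufacturers show that exports in January through May 2026 grew 63% year on year to 4.05 million units. New-energy vehicles such as battery electric vehicles (EVs), which account for a large part of exports, reached 1.83 million units, 2.1 times the year-earlier figure and far ahead of the growth rate of gasoline vehicles (36%). If the current pace continues, 2026 exports will be 41% higher than in 2025, at 10 million units. If that happens, China would become the first country in the world to export 10 million vehicles, about 2.5 times Japan's export volume. The main reason for the surge in exports is declining domestic sales. China's new-vehicle sales in 2026 are forecast at 24.6 million units, down 10% from 2025.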

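arXiv becomes independent of Cornell University

The preprint platform arXiv.org separated from Cornell University on July 1 to become an independent nonprofit organization. arXiv was founded in 1991; its founder Paul Ginsparg joined Cornell in 2001, and the arXiv site was subsequently taken over by the Cornell University Library. Twenty-five years later, arXiv has decided to turn a new page. The arXiv organization is working to make the transition smooth, so that authors, readers, and the community notice almost no change. The official blog says: "arXiv was created by scientists, for scientists, and while our 'home' may change, our mission, vision, and values never will. arXiv will remain committed to being free to read and free to submit to, and to giving scientists worldwide equitable access to new ideas and discoveries. Throughout the process of becoming independent from Cornell, arXiv's staff, volunteers, and supporters are working to ensure that the important services arXiv provides are not interrupted."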

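Henrico County in the US asks government offices and schools to conserve electricity on account of data centers

Henrico County in the US state of Virginia became a data center hub almost overnight because it is close to Washington, D.C. and has large tracts of land. The county has 37 data centers and plans to build 17 more. One side effect of data centers is rising electricity prices. County Manager John Vithoulkas emailed thousands of government employees on June 26 asking them to help the government save electricity: "Starting July 1, electricity rates for all Henrico County government and school facilities will rise 25%, which is expected to add $5 million in costs in the next fiscal year. We expect electricity rates to keep rising over the next several years." "To mitigate the impact of rising electricity bills, I am asking everyone to make some adjustments together and save electricity in your own work areas. Turn off the lights when you leave your work area, including when you leave at the end of the day. Shut down your computer/laptop at the end of each workday. If your work area has windows, adjust the blinds to control heat from sunlight. Unplug appliances, chargers, and other devices when they are not in use. Please limit (or avoid entirely) the use of space heaters. A single ordinary space heater alone can cost the county $150-300 a year in electricity."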

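Microsoft releases preview of WSL containers

Microsoft has released a public preview of Windows Subsystem for Linux (WSL) containers. Containers have become a foundational part of modern development, from cloud-native applications and AI workloads to testing and deployment pipelines. WSL containers simplify the experience by providing a built-in, enterprise-grade way to create, run, and manage Linux containers on Windows, with no additional third-party tools required.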

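Why we need sleep

Why do animals need sleep? A review published in the journal Brain Medicine argues that much of the confusion about sleep stems from conflating three concepts. The authors contend that sleep is best understood neither as rest nor as cleanup, but as a system-level resilience mechanism that keeps this network of roughly 86 billion neurons from drifting into states it cannot escape. "We wanted to move beyond the view of sleep as merely an overnight recharge," said Professor Xiaohui Wang of the Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, one of the corresponding authors. "When you view the brain as a complex dynamic network, sleep begins to look like something a careful engineer would design on purpose: a scheduled window left for the system to repair and reorganize itself." The review treats the two main stages of sleep as a division of labor. In non-rapid-eye-movement sleep, especially the slow-wave stage, the brain settles into a high-amplitude, low-frequency rhythm below one hertz. Modularity rises. Entropy falls. Synaptic strengths raised by a full day of learning are quietly renormalized so that the network does not saturate. Rapid-eye-movement sleep does almost the opposite. Electrical signals desynchronize, theta and gamma rhythms climb, and the brain shifts toward global integration and exploration, loosening circuits that have become too rigid. "A network that only optimizes can back itself into a corner," said first author Longwei Yang. "What sleep seems to protect is the ability to find the way back out."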

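US government lifts export restrictions on Claude Fable 5 and Mythos 5 models

Anthropic announced on Tuesday that the US Department of Commerce has lifted export restrictions on its Claude Fable 5 and Mythos 5 models, and that the company will resume offering access to its new models on Wednesday. In mid-June the US government, citing national security, ordered a ban on foreign nationals accessing Anthropic's most advanced AI models, a restriction that even covered Anthropic's own foreign employees. US Commerce Secretary Howard Lutnick posted on X on Tuesday, "Over the past two weeks we have worked closely with Anthropic to analyze and approve Fable 5, to ensure it is aligned with the US government and to strengthen American leadership in AI."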

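Claude Code quietly checks whether the user's system time zone is in China

Claude Code has been found to quietly check the user's system time zone and whether the user comes from a Chinese AI company. Analysis of the local binary of Claude Code (2.1.196) found that it checks whether the system time zone is Asia/Shanghai or Asia/Urumqi, and checks for matches against the domains of Chinese technology companies, including baidu.com, alibaba-inc.com, alipay.com, antgroup-inc.cn, bytedance.net, kuaishou.com, xiaohongshu.com, jd.com, and bilibili.co, among others. The move may be intended to prevent Chinese AI companies from distilling its models.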

09

APP STORE RANK
