Pentagon Labels Anthropic a Supply Chain Risk and Bans Use of Claude
Policy & Regulation · AI Safety · Defense
The U.S. Department of Defense has designated Anthropic an immediate 'supply chain risk,' requiring military units and contractors to stop using Claude in their systems. Reports indicate the decision stems from Anthropic's refusal to relax product restrictions barring use of Claude in mass surveillance and autonomous weapons. Several defense contractors are already switching to alternative model providers; Anthropic says it will pursue legal action to seek relief.
NVIDIA Halts China-Bound H200 Production, Shifts TSMC Capacity to Vera Rubin
Compute · Semiconductors · Geopolitics
According to Korean media reports, NVIDIA has halted production of H200 chips for the Chinese market, citing strict export controls, delayed revenue, and uncertainty over future shipments. The company is reallocating TSMC capacity from the H200 to next-generation 'Vera Rubin' products to mitigate supply chain volatility and component shortages. NVIDIA reportedly holds roughly 250,000 H200 chips in inventory, and even if policy eases, restoring the supply chain could take up to three months.
'Clinejection' Supply Chain Attack Chains Prompt Injection into GitHub Actions Cache Poisoning
Researchers have disclosed 'Clinejection,' a supply chain attack that chains several weaknesses. Attackers embedded prompt-injection payloads in GitHub issue titles, tricking tool-enabled automated triage agents into executing commands. From there, they poisoned node_modules through Actions cache keys shared across workflows, filling the 10GB cache limit to force eviction of legitimate caches. High-privilege release processes then restored the malicious dependencies, ultimately leaking NPM publishing keys and shipping tampered production packages.
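The cache-eviction step can be illustrated with a toy model (all names hypothetical; GitHub's real eviction policy and cache scoping are more involved): the attacker floods a size-limited, oldest-first cache with junk entries until the legitimate dependency cache is evicted, then republishes the same key with a poisoned payload.

```python
from collections import OrderedDict

class ActionsCacheModel:
    """Toy cache with a total-size limit and oldest-first eviction,
    loosely modeling the 10GB-per-repo Actions cache (simplified)."""
    def __init__(self, limit_gb=10):
        self.limit = limit_gb
        self.entries = OrderedDict()  # key -> (size_gb, payload)

    def save(self, key, size_gb, payload):
        # Actions caches are immutable per key: a save only lands if
        # the key is absent (e.g. after the old entry was evicted).
        if key in self.entries:
            return False
        self.entries[key] = (size_gb, payload)
        # Evict oldest entries once the total size exceeds the limit.
        while sum(s for s, _ in self.entries.values()) > self.limit:
            self.entries.popitem(last=False)
        return True

    def restore(self, key):
        entry = self.entries.get(key)
        return entry[1] if entry else None

cache = ActionsCacheModel(limit_gb=10)
# 1. A release workflow saves its legitimate node_modules cache.
cache.save("node_modules-lockhash", 1, "legit deps")
# 2. The hijacked triage agent floods the cache to force eviction.
for i in range(10):
    cache.save(f"junk-{i}", 1, "filler")
# 3. The legitimate entry is gone, so the shared key is writable again.
assert cache.restore("node_modules-lockhash") is None
# 4. The attacker re-saves the same key with a poisoned payload...
cache.save("node_modules-lockhash", 1, "poisoned deps")
# ...which the high-privilege release workflow later restores.
print(cache.restore("node_modules-lockhash"))  # poisoned deps
```

The key design flaw being modeled: cache keys are shared across workflows with very different privilege levels, while eviction is driven purely by total size, which any workflow can influence.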
GitHub Security Lab Open-Sources AI-Driven Code Audit Framework Built on Multi-Stage 'Taskflows'
GitHub Security Lab has open-sourced an AI-driven code security audit framework that breaks an audit into multi-stage 'taskflows': first threat modeling and issue enumeration, then rigorous validation to cut LLM hallucinations and false positives. The framework can be deployed quickly via GitHub Codespaces (a Copilot license is required) and reused by the open-source community. Official reports state the tool has already surfaced critical vulnerabilities and real CVEs in projects such as Outline, WooCommerce, and Rocket.Chat, including logic flaws like authorization bypass.
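The enumerate-then-validate pattern can be sketched as a minimal two-stage pipeline (all function names and the stub LLM calls are hypothetical, not the framework's actual API): stage one casts a wide, high-recall net, and stage two independently re-checks each candidate so only confirmed findings survive.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    location: str
    claim: str

def enumerate_stage(source_files, ask_llm):
    """Stage 1: threat-model each file and list every suspected issue,
    deliberately tolerating false positives."""
    findings = []
    for path, code in source_files.items():
        for claim in ask_llm(f"List possible auth/logic flaws in:\n{code}"):
            findings.append(Finding(path, claim))
    return findings

def validate_stage(findings, ask_llm):
    """Stage 2: independently re-check each candidate; keep only findings
    the validator pass confirms, to cut hallucinated reports."""
    return [f for f in findings
            if ask_llm(f"Is this reproducible? {f.location}: {f.claim}") == ["yes"]]

# Stub LLM calls for illustration (a real taskflow would call a model).
def noisy_enumerator(prompt):
    return ["missing permission check", "possible SQLi (hallucinated)"]

def strict_validator(prompt):
    return ["yes"] if "permission" in prompt else ["no"]

candidates = enumerate_stage({"routes.py": "def delete(user): ..."}, noisy_enumerator)
confirmed = validate_stage(candidates, strict_validator)
print([f.claim for f in confirmed])  # ['missing permission check']
```

Splitting enumeration from validation lets each stage use a different prompt (or model), which is the core idea behind taskflow-style audits.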
Xiaomi Begins Limited Beta of System-Level AI Agent 'miclaw'
Xiaomi has launched 'miclaw,' a system-level AI agent now in limited beta. Built on the MiMo large model, it integrates execution capabilities at the system-permission layer and ships more than 50 system-level tools orchestrated automatically by a reasoning-execution engine. Its 'perceive-associate-decide-act' pipeline plans tasks using personal context such as SMS and calendar data, and extends execution to whole-home IoT devices via the MiJia protocol and the MCP standard. Xiaomi also emphasizes the agent's self-evolving capability: it can generate new tools, retain memory, and spawn sub-agents.
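A perceive-associate-decide-act loop over a tool registry can be sketched as follows (every name here is illustrative, not Xiaomi's actual API): the agent perceives personal context through a read tool, associates it with the goal, then decides on and executes a system-level action tool.

```python
# Minimal sketch of a perceive-associate-decide-act agent loop.
TOOLS = {}

def tool(name):
    """Register a function in the agent's tool registry."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("read_calendar")
def read_calendar(ctx):
    return ctx.get("calendar", [])

@tool("set_ac_temperature")
def set_ac_temperature(ctx, degrees):
    ctx["ac"] = degrees
    return f"AC set to {degrees}C"

def run_agent(goal, ctx):
    # Perceive: gather the personal context the agent is allowed to see.
    events = TOOLS["read_calendar"](ctx)
    # Associate: link context to the goal (stub for model reasoning).
    arriving_home = any("home" in e for e in events)
    # Decide + act: pick and execute a system-level tool.
    if goal == "pre-cool the house" and arriving_home:
        return TOOLS["set_ac_temperature"](ctx, 24)
    return "no action"

ctx = {"calendar": ["18:00 arrive home"]}
print(run_agent("pre-cool the house", ctx))  # AC set to 24C
```

A registry like this also suggests how "self-evolving" tool generation could work: the agent would register newly synthesized functions into the same table at runtime.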
Databricks Unveils LogSentinel: LLM-Powered PII Detection Cuts Review Time from Weeks to Hours
Data Governance · Enterprise AI · Privacy & Compliance
Databricks has disclosed its internal system LogSentinel, which uses LLMs to automate PII identification and data governance in logs and table fields: first generating field descriptions and building vector retrieval, then retrieving similar labeled examples for dynamic few-shot prompting. Multi-model orchestration enables routing, fallback, and validation hooks to reduce risks of empty labels and hallucinations. Classification results directly feed into masking, access control, and remediation workflows, and Jira tickets are auto-generated upon detecting drift or policy violations, reducing periodic compliance reviews from weeks to hours.
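The retrieval-backed dynamic few-shot step can be sketched in miniature (the bag-of-words "embedding," example data, and function names are all stand-ins; LogSentinel would use a real vector index over generated field descriptions): the k labeled fields most similar to the query field become the few-shot examples in the classification prompt.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' -- a stand-in for a real encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Labeled examples: (field description, PII label) pairs.
LABELED = [
    ("customer email address for receipts", "PII:email"),
    ("user home mailing address", "PII:address"),
    ("internal request latency in ms", "non-PII"),
]

def build_prompt(field_desc, k=2):
    # Retrieve the k most similar labeled fields as dynamic few-shot shots.
    q = embed(field_desc)
    shots = sorted(LABELED, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)[:k]
    lines = [f"Field: {d}\nLabel: {l}" for d, l in shots]
    lines.append(f"Field: {field_desc}\nLabel:")  # model completes the label
    return "\n\n".join(lines)

prompt = build_prompt("email address used for login")
print(prompt)
```

Because the examples are retrieved per field rather than fixed, the prompt stays short while still showing the model the most relevant precedents, which is what keeps empty-label and hallucination rates down.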
Liquid AI Releases LFM2-24B-A2B and LocalCowork Local Offline Agent
Local Model · Agent · Open Source
Liquid AI has released the sparse MoE model LFM2-24B-A2B and the open-source desktop agent LocalCowork, focusing on privacy-first local agent workflows with tool calling via MCP. The model has 24 billion total parameters, activating approximately 2 billion per token during inference. It runs with about 14.5GB memory usage on Apple M4 Max using Q4_K_M quantized GGUF format, with average tool selection latency around 385ms. LocalCowork operates offline with 75 tools across 14 MCP servers and maintains local audit logs; testing shows 80% single-step accuracy but only 26% end-to-end success rate for multi-step tasks.
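The 24B-total / ~2B-active arithmetic comes from sparse MoE routing: a router scores all experts per token but only the top-k actually run. A minimal sketch with toy dimensions (these numbers are illustrative, not LFM2's real configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions -- not LFM2-24B-A2B's actual config.
d_model, n_experts, top_k = 8, 16, 2

router_w = rng.normal(size=(d_model, n_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    """Route a token to its top-k experts; only those experts execute,
    so active parameters per token are a small fraction of the total."""
    logits = x @ router_w                      # one score per expert
    top = np.argsort(logits)[-top_k:]          # indices of chosen experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over chosen experts
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top)), top

x = rng.normal(size=d_model)
y, chosen = moe_layer(x)
print(len(chosen), "of", n_experts, "experts active")  # 2 of 16
```

All expert weights must still reside in memory (hence the ~14.5GB footprint even at Q4_K_M), but per-token FLOPs scale with the active parameters only, which is what makes ~385ms tool-selection latency plausible on a laptop.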
Tencent HunYuan Proposes HY-WU: Generating and Mounting Adapter Parameters On the Fly
Model Architecture · Parameter-Efficient Fine-Tuning · Research
Tencent HunYuan has proposed the HY-WU (Weight Unleashing) paradigm: instead of relying on fixed checkpoints, it trains a parameter generator that synthesizes and mounts adapter parameters like LoRA in real time based on input conditions, transforming 'static parameter memory' into 'functional memory.' The company claims this mechanism mitigates catastrophic forgetting and task trade-offs in personalized adaptation, reducing gradient conflicts through dynamic routing. It demonstrates structural consistency of generated parameters in tasks like image editing. The study also reveals scaling laws: performance improves with increased generator depth and LoRA rank.
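The core mechanism can be sketched as a hypernetwork that emits LoRA factors from a condition embedding and mounts them on a frozen base weight (shapes and names here are illustrative toys, not HY-WU's actual design):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy shapes -- not HY-WU's actual configuration.
d, r, c = 16, 4, 8   # layer width, LoRA rank, condition-embedding dim

W = rng.normal(size=(d, d))                  # frozen base weight
gen_A = rng.normal(size=(c, r * d)) * 0.01   # generator head for A
gen_B = rng.normal(size=(c, d * r)) * 0.01   # generator head for B

def generate_lora(cond):
    """Synthesize LoRA factors from a condition embedding, instead of
    loading a fixed per-task adapter checkpoint."""
    A = (cond @ gen_A).reshape(r, d)
    B = (cond @ gen_B).reshape(d, r)
    return A, B

def forward(x, cond):
    A, B = generate_lora(cond)
    # Mount the generated adapter: effective weight is W + B @ A.
    return x @ (W + B @ A).T

task_embedding = rng.normal(size=c)  # e.g. derived from the user's input
x = rng.normal(size=d)
y = forward(x, task_embedding)
print(y.shape)  # (16,)
```

Since the base weight W is never updated, per-task behavior lives entirely in the generated low-rank delta, which is the sense in which 'static parameter memory' becomes 'functional memory.'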
InSpatio Open-Sources Real-Time 3D World Model WorldFM: Runs in Real-Time on Single 4090
World Model · Embodied Intelligence · Open Source
InSpatio has announced and open-sourced InSpatio-WorldFM, a real-time interactive 3D world model advocating native 3D representation over 2D pixel prediction for spatial intelligence. Its 'explicit anchors + implicit memory' architecture generates spatial anchors via feedforward reconstruction and combines them with generative models’ implicit memory to mitigate long-term forgetting and geometric collapse, supporting theoretically infinite-duration consistent generation. The team proposes a data amplification strategy extracting geometric and physical priors from massive 2D videos, claiming the model can run in real time on a single RTX 4090 after distillation and inference optimization.
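The 'explicit anchors + implicit memory' split can be caricatured with a toy store (everything here is an illustrative stand-in for WorldFM's actual architecture): anchors are persistent per-region geometry that is never overwritten, while the implicit state is a decaying summary of recent observations.

```python
import numpy as np

class AnchorMemory:
    """Toy 'explicit anchors + implicit memory' store: anchors persist
    indefinitely, while the implicit state decays and is refreshed."""
    def __init__(self, d_state=8, decay=0.9):
        self.anchors = {}               # coarse spatial cell -> 3D anchor
        self.state = np.zeros(d_state)  # rolling implicit memory
        self.decay = decay

    def observe(self, point, features):
        cell = tuple(np.floor(point).astype(int))
        # Explicit: one anchor per spatial cell, never overwritten, so
        # revisited regions stay geometrically consistent over time.
        self.anchors.setdefault(cell, point)
        # Implicit: exponential moving average of recent features.
        self.state = self.decay * self.state + (1 - self.decay) * features

    def recall(self, point):
        return self.anchors.get(tuple(np.floor(point).astype(int)))

mem = AnchorMemory()
mem.observe(np.array([1.2, 0.5, 3.7]), np.ones(8))
# Much later, a nearby query still recovers the original anchor.
print(mem.recall(np.array([1.9, 0.1, 3.1])))  # [1.2 0.5 3.7]
```

The point of the split: the decaying state alone would eventually forget the region entirely (long-term forgetting), while the anchor table alone carries no appearance information; combining them is what supports indefinitely long, geometrically consistent generation.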
FlashAttention-4 Released: Restructured for Blackwell to Make Attention Approach Matmul Speed
Inference Acceleration · Operator Optimization · GPU
FlashAttention-4 has been officially released, deeply restructuring the attention computation pipeline for NVIDIA's Blackwell architecture to address the 'asymmetric scaling' bottleneck where Tensor Core throughput grows faster than SFU and shared memory bandwidth. The solution replaces exp operations with polynomial approximations in the forward pass and reduces shared memory traffic in the backward pass via TMEM intermediate result reuse and 2-CTA MMA designs, achieving better overlap of computation and I/O. The project adopts a Python version of CuTe-DSL, maintaining PTX-level control while improving compilation speed by approximately 20–30x, and supports rapid prototyping with PyTorch FlexAttention.
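Why a polynomial can stand in for exp: after the softmax's max-subtraction, every input is non-positive and bounded, so the kernel only needs exp on a narrow range. A common recipe is range reduction to exp(x) = 2^n · 2^f with f in [0, 1), then a short polynomial for 2^f. The sketch below uses plain Taylor coefficients purely for illustration; FlashAttention-4's actual polynomial and coefficients are not reproduced here.

```python
import math
import numpy as np

LOG2E = math.log2(math.e)

def poly_exp(x, degree=7):
    """Polynomial exp via range reduction: exp(x) = 2**n * 2**f with
    f in [0, 1), so a short polynomial for 2**f suffices and no SFU
    exp instruction is needed. (Illustrative coefficients only.)"""
    t = x * LOG2E
    n = np.floor(t)
    f = t - n                          # fractional part in [0, 1)
    # Horner evaluation of the Taylor polynomial for 2**f = exp(f*ln2).
    coeffs = [math.log(2) ** k / math.factorial(k)
              for k in range(degree, -1, -1)]
    p = np.zeros_like(f)
    for c in coeffs:
        p = p * f + c
    return np.ldexp(p, n.astype(int))  # exact multiply by 2**n

def softmax(scores, exp_fn):
    shifted = scores - scores.max()    # standard max-subtraction
    e = exp_fn(shifted)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.5, -1.0])
err = np.abs(softmax(scores, np.exp) - softmax(scores, poly_exp)).max()
print(err)  # tiny; well below 1e-6 at degree 7
```

The payoff on Blackwell is that polynomial evaluation runs on the fast fused multiply-add pipeline instead of the SFU, sidestepping exactly the asymmetric-scaling bottleneck the release targets.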