Inception Labs has released Mercury 2, a diffusion-based language reasoning model that uses parallel 'denoising' generation instead of autoregressive token-by-token decoding. On NVIDIA Blackwell, it achieves approximately 1009 tokens/second throughput and about 1.7 seconds of end-to-end latency. The model supports 128K context length, tool calling, and JSON output, with early access to an OpenAI-compatible API. Pricing is set at $0.25 per million input tokens and $0.75 per million output tokens, targeting low-latency use cases such as voice assistants, search, and real-time agent loops.
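An OpenAI-compatible surface means existing client code should need little more than a base-URL change. A minimal sketch of building a chat-completions payload with JSON output requested; the endpoint URL and the `mercury-2` model name here are placeholders, not confirmed values:

```python
import json

# Placeholder base URL for illustration only; the real endpoint is not public here.
BASE_URL = "https://api.example-inception.ai/v1"

def build_chat_request(prompt: str, model: str = "mercury-2") -> dict:
    """Build a chat-completions payload. JSON output is requested via
    response_format, which OpenAI-compatible servers commonly accept."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object"},  # structured JSON output
        "max_tokens": 256,
    }

payload = build_chat_request("Summarize today's AI news as JSON.")
print(json.dumps(payload, indent=2))
```

Any standard OpenAI-style client could then POST this payload to `{BASE_URL}/chat/completions` with an API key.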
U.S. DoD Pressures Anthropic to Loosen Claude Safeguards Amid $200M Contract Dispute
Policy & RegulationAI SafetyDefense
Multiple media outlets report that the U.S. Department of Defense, during collaboration talks with Anthropic, proposed an 'any lawful use' clause. If Anthropic refuses to relax safety restrictions on Claude, the DoD may terminate a roughly $200 million contract and designate the company as a 'supply chain risk.' Reports suggest the U.S. might invoke the Defense Production Act to compel technology access. The dispute centers on Anthropic's refusal to support lethal autonomous weapons and mass surveillance of U.S. citizens. Claude is reportedly one of the few frontier models capable of operating within classified networks.
Bridgewater: Four Tech Giants May Invest $650B in AI in 2026
Compute & Data CentersInvestment & FinancingMacro
Bridgewater Associates estimates that Alphabet, Amazon, Meta, and Microsoft will collectively spend around $650 billion on AI-related infrastructure in 2026, up from $410 billion in 2025. The firm attributes the increase to sustained compute demand outpacing supply, noting companies may redirect capital by reducing share buybacks. Bridgewater also highlights downside risks: heavy investment increases sensitivity to external financing and market sentiment, while data center construction could strain regional power grids and supply chains. However, the investment is expected to contribute approximately 100 basis points to U.S. GDP growth.
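As a sanity check on the quoted figures, the projected jump from $410 billion to $650 billion is close to a 60% year-over-year increase:

```python
# Figures in billions of USD, as quoted in the Bridgewater estimate above.
capex_2025 = 410
capex_2026 = 650

growth = (capex_2026 - capex_2025) / capex_2025
print(f"Projected YoY increase: {growth:.0%}")  # roughly 59%
```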
Meta Open-Sources RCCLX: Boosts AMD GPU Inference Communication, Reduces TTIT by 10%
Open SourceInference AccelerationHardware Ecosystem
Meta has open-sourced RCCLX, a GPU communication library for AMD platforms, integrated into Torchcomms to deliver communication backend capabilities comparable to NVIDIA's ecosystem. Its Direct Data Access (DDA) algorithm lets a GPU read memory directly from peer ranks, reducing small-message allreduce latency from O(N) to O(1). This delivers 10%–50% acceleration during decoding and 10%–30% improvement during prefill. It also introduces FP8 low-precision collectives to reduce overhead in large-scale data transfers. Meta's MI300X benchmarks show an approximately 10% reduction in TTIT and about a 7% overall throughput gain.
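To see why direct peer access matters for small messages, compare sequential step counts: a classic ring allreduce serializes 2*(N-1) communication steps across N ranks, while a scheme where every GPU reads all peers' buffers in parallel can finish in a constant number of rounds. A toy step-count model (illustrative only; this is not RCCLX's actual implementation):

```python
# Toy latency model for small-message allreduce across N GPUs.
# Ring allreduce: latency grows linearly with rank count.
# Direct data access (DDA-style): each rank loads peer memory in one
# parallel round, so the sequential step count stays constant.

def ring_allreduce_steps(n_ranks: int) -> int:
    """Sequential steps in a ring allreduce: reduce-scatter + all-gather."""
    return 2 * (n_ranks - 1)

def dda_allreduce_steps(n_ranks: int) -> int:
    """One parallel round: every rank reads all peers' buffers directly."""
    return 1

for n in (2, 4, 8, 16):
    print(f"{n} ranks: ring={ring_allreduce_steps(n)} steps, "
          f"direct={dda_allreduce_steps(n)} step")
```

For latency-bound small messages (where per-step overhead dominates bandwidth), this O(N) vs O(1) gap is what drives the decode-time gains described above.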
Cloudflare Launches Vinext: Next.js APIs Reimplemented on Vite, ~94% of API Surface AI-Generated
Open SourceDeveloper ToolsAI-Assisted Engineering
Cloudflare has launched Vinext, a reimplementation of Next.js APIs directly on Vite, enabling applications to deploy more natively on Cloudflare Workers. The team reported using Claude and OpenCode to generate and cover approximately 94% of the API surface within a week, validating behavioral consistency against existing test suites. Vinext introduces 'Traffic-aware Pre-Rendering,' which uses Cloudflare traffic analytics during deployment to pre-render only high-traffic pages, avoiding build times that grow linearly with page count. The case demonstrates both the feasibility and the boundary conditions of AI-assisted engineering in large-scale framework reimplementation.
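The pre-rendering selection can be sketched as a greedy cut over per-page traffic: sort pages by hits and pre-render only enough to cover a target share of requests. The analytics shape and coverage threshold below are assumptions for illustration, not Vinext's documented behavior:

```python
# Sketch of a traffic-aware pre-render selection: build cost tracks traffic
# concentration rather than total page count. Traffic data is hard-coded here;
# a real deployment would pull it from an analytics source.

def select_prerender_pages(traffic: dict[str, int], coverage: float = 0.9) -> list[str]:
    """Return the highest-traffic pages covering `coverage` of all hits."""
    total = sum(traffic.values())
    selected: list[str] = []
    covered = 0
    for page, hits in sorted(traffic.items(), key=lambda kv: -kv[1]):
        if covered / total >= coverage:
            break  # target share already covered; render the rest on demand
        selected.append(page)
        covered += hits
    return selected

traffic = {"/": 9000, "/pricing": 500, "/blog/a": 300, "/blog/b": 200}
print(select_prerender_pages(traffic))
```

With traffic this skewed, a single page already covers 90% of hits, so only it gets pre-rendered; long-tail pages fall back to on-demand rendering.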
Cerebras Reportedly Files Secretly for IPO, Fueled by Compute Supply Deal with OpenAI
SemiconductorsIPOCompute
Multiple reports indicate that AI chipmaker Cerebras Systems has confidentially filed for an IPO and begun meetings with potential investors, with a possible listing as early as April 2026. The IPO push is attributed to recent multi-year compute supply agreements, including a significant partnership with OpenAI, boosting market interest. Cerebras challenges the GPU paradigm with ultra-large-scale AI chips and systems, serving major institutional clients. However, concerns remain over reliance on large customers and geopolitical supply chain risks. Company details and regulatory filings have not yet been made public.
SambaNova Raises $350M Backed by Intel Capital, Launches Fifth-Generation RDU Accelerator SN50
SemiconductorsInvestment & FinancingInference Acceleration
SambaNova has raised $350 million, backed by Intel Capital, to advance its dataflow architecture and launch the fifth-generation RDU accelerator SN50, targeting the generative AI inference market. Reports put SN50's peak 16-bit and 8-bit floating-point performance at 1.6 and 3.2 petaFLOPS (2.5x and 5x its predecessor, respectively), with a hierarchical memory structure (432MB SRAM, 64GB HBM2E, up to 2TB DDR5) that emphasizes efficient model switching and cache management. SoftBank is named as an early customer, and the company claims per-user generation speeds up to five times those of NVIDIA's B200.
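A back-of-envelope check shows the two quoted speedups are mutually consistent, both implying the same ~0.64 petaFLOPS baseline for the previous generation:

```python
# Peak figures quoted for SN50, in petaFLOPS.
sn50_fp16 = 1.6   # 16-bit floating point, quoted as 2.5x the predecessor
sn50_fp8 = 3.2    # 8-bit floating point, quoted as 5x the predecessor

prev_from_fp16 = sn50_fp16 / 2.5
prev_from_fp8 = sn50_fp8 / 5
print(prev_from_fp16, prev_from_fp8)  # both imply ~0.64 petaFLOPS
```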
Multiverse Open-Sources HyperNova 60B: 50% Compression of 120B Model, VRAM Drops from 61GB to 32GB
Open SourceModel CompressionInference Deployment
Multiverse Computing has announced free full access to the compressed model HyperNova 60B 2602 on Hugging Face, described as a 50% compressed version of OpenAI's gpt-oss-120B. Using quantum-inspired CompactifAI compression, it reduces resource requirements significantly with only about 2%–3% accuracy loss. Official figures show runtime memory reduced from 61GB to 32GB, with enhanced tool calling and agent coding capabilities: BFCL v4 function calls improved ~1.5x, Terminal Bench Hard ~2x, and Tau2-Bench ~5x. This approach targets lower-barrier enterprise and research deployments, with more compressed models expected to follow.
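The quoted memory figures work out to slightly under the headline 50% at runtime (the 50% refers to parameter-count compression):

```python
# Official runtime memory figures quoted above, in GB.
mem_before = 61
mem_after = 32

reduction = 1 - mem_after / mem_before
print(f"Runtime memory reduction: {reduction:.1%}")  # about 47.5%
```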
Media Reports Hugging Face Open-Source Leaderboard Update: Qwen3.5 Ranks First, 8 of Top 10 from China
Open Source EcosystemModel BenchmarkingMultimodal
Media reports indicate Hugging Face released an updated open-source large model leaderboard on February 24, with Alibaba's native multimodal model Qwen3.5 ranked first. The model is said to have approximately 397 billion total parameters but activates only about 17 billion during inference, delivering performance close to Gemini 3 at roughly 5% of the token cost. Eight of the top ten models are reported to be from Chinese teams. The report notes successful adaptations across NVIDIA, AMD, Apple, Intel, and several Chinese domestic chip platforms. Alibaba is described as having published more than 400 open-source model releases with over 1 billion cumulative downloads, creating significant ecosystem momentum.
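The sparse-activation claim is easy to sanity-check: 17B active out of 397B total means only about 4% of parameters participate per token, which is what makes the quoted token-cost figure plausible for a mixture-of-experts design:

```python
# Parameter counts quoted above for Qwen3.5.
total_params = 397e9    # total parameters
active_params = 17e9    # parameters activated per inference step

fraction = active_params / total_params
print(f"Active per token: {fraction:.1%}")  # about 4.3%
```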
Anthropic Discloses 'Distillation' API Abuse: 24K Fake Accounts, 16M Interactions
AI SafetyAPI AbuseModel Distillation
Multiple reports and briefings indicate Anthropic has accused DeepSeek, Moonshot AI, and MiniMax of conducting high-frequency interactions with Claude via approximately 24,000 fake accounts, totaling over 16 million requests. The apparent intent was to 'distill' Claude's reasoning, coding, and tool-use capabilities, using proxies and evasion tactics to bypass restrictions. Anthropic is reported to have strengthened behavioral fingerprint detection and access controls, imposing stricter limits on entities under Chinese jurisdiction. The incident brings into focus the risk of replicating model capabilities via API outputs, raising concerns about stripped safety alignment, intellectual property boundaries, and circumvention of export controls. It may also force the industry to invest more heavily in API abuse defenses.
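Volume anomalies are the simplest signal such defenses key on: 16 million requests over roughly 24,000 accounts averages about 667 requests per account, far above typical organic use. A toy flagging heuristic, with thresholds and account schema that are purely illustrative, not Anthropic's actual detection logic:

```python
# Illustrative volume-based abuse flagging. Real defenses described above
# (behavioral fingerprinting, proxy detection) are far more sophisticated;
# the per-account limit here is an arbitrary assumption.

def flag_suspect_accounts(request_counts: dict[str, int],
                          per_account_limit: int = 2000) -> list[str]:
    """Return accounts whose request volume exceeds a plausible organic rate."""
    return sorted(a for a, n in request_counts.items() if n > per_account_limit)

stats = {"acct-1": 120, "acct-2": 5400, "acct-3": 80, "acct-4": 9800}
print(flag_suspect_accounts(stats))  # ['acct-2', 'acct-4']
```

In practice, distillation farms spread load across many accounts precisely to stay under such per-account thresholds, which is why the reported defenses lean on behavioral fingerprints rather than volume alone.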