Sunday, March 15, 2026
9 stories | 3 min read

Today's Highlights

1

Anthropic Expands Claude 4.6 Context to 1M Tokens Without Price Increase

Model Release | Long Context | Pricing

Anthropic has expanded the context window of Claude Opus 4.6 and Sonnet 4.6 to 1 million tokens and made the expanded window generally available (GA), with no premium for long-context usage. Each request supports up to 128K output tokens and can process up to 600 images or PDF pages. The models also offer four adaptive 'thinking' intensity levels to control cost. According to Anthropic, Opus 4.6 achieves 78.3% accuracy on the MRCR v2 million-token retrieval task. The expanded window is available to both subscription and API users at unchanged rates (Opus: $5 input / $25 output; Sonnet: $3 input / $15 output, per million tokens).
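
As a rough illustration of what those rates mean for a single long-context request, the arithmetic can be sketched as below. The rates are the ones quoted in this item; the model keys, the `request_cost` helper, and the example token counts are hypothetical, and the sketch ignores any discounts or other billing rules that may apply.

```python
# Per-million-token rates (USD) as quoted in the item above: (input, output).
# The dictionary keys are illustrative labels, not official model identifiers.
RATES_USD_PER_MTOK = {
    "opus-4.6": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request at flat per-token rates."""
    in_rate, out_rate = RATES_USD_PER_MTOK[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical request: 800K input tokens and the maximum 128K output tokens.
opus_cost = request_cost("opus-4.6", 800_000, 128_000)
sonnet_cost = request_cost("sonnet-4.6", 800_000, 128_000)
```

Under these assumptions, the hypothetical request comes to roughly $7.20 on Opus and $4.32 on Sonnet, with the output tokens accounting for a large share of the total despite being a small fraction of the token count.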

2

NVIDIA Open-Sources Nemotron 3 Super 120B with 1M Context Support

Open Source Model | Long Context | NVIDIA

NVIDIA has released the open-weight model Nemotron 3 Super 120B, which natively supports a 1 million token context. It is now available via Hugging Face, OpenRouter, Together, and other platforms. The model combines the Mamba long-sequence architecture with a mixture-of-experts (MoE) design, activating approximately 12 billion parameters per inference to reduce long-context inference costs. Reports indicate up to 5x throughput improvement and about 50% faster generation in multi-agent tasks. In addition to model weights, NVIDIA provides training data, training recipes, and enterprise-grade fine-tuning licenses, targeting on-premises or private enterprise deployment with full control.

3

xAI Undergoes Major Restructuring: Only 2 of 11 Co-Founders Remain, Macrohard Paused

Company News | Organizational Change | AI Coding

TechCrunch reports that xAI is undergoing a major restructuring and 'rebuilding from the ground up': only 2 of the original 11 co-founders remain, and several co-founders and senior engineers have departed. Musk acknowledged the company was 'not built right the first time.' According to the report, xAI's AI coding product line trails OpenAI Codex and Anthropic Claude Code, which prompted executives from SpaceX and Tesla to intervene in evaluations and drive downsizing. The company has paused its white-collar agent project Macrohard and shifted focus toward Digital Optimus in collaboration with Tesla. xAI currently has around 5,000 employees and is recruiting engineering leads from teams like Cursor to close development gaps.

4

European Parliament Approves Signing of AI Framework Convention, First Binding Treaty

Policy & Regulation | AI Governance | EU

The European Parliament has approved the EU's signing of the 'Council of Europe Artificial Intelligence Framework Convention,' described as the world’s first legally binding international treaty on AI governance. The vote passed with 455 in favor, 101 opposed, and 74 abstentions. The convention establishes common standards for transparency, documentation, risk management, and oversight mechanisms, applying to public authorities and private entities acting on their behalf. It aligns with existing regulations such as the EU AI Act and GDPR while allowing the EU to maintain higher protection levels. Negotiations began in 2022 and included EU member states, the UK, Canada, Israel, and the US. The treaty will be open for accession by additional countries.

5

Iranian Drone Attack on Amazon Data Centers in Middle East Causes Widespread Service Outages

Security Incident | Compute Infrastructure | Geopolitics

ABC reports that Iran launched drone attacks on Amazon data centers in the UAE and Bahrain. The Iranian Revolutionary Guard stated that the facilities supported U.S. military AI targeting systems and were therefore legitimate retaliation targets. The incident caused widespread disruptions to banking, ride-hailing, and food delivery services in Dubai and Abu Dhabi. Experts note that physical hardening of data centers could cost hundreds of millions of dollars per site. Risks to undersea cables in strategic areas like the Red Sea and Strait of Hormuz are also rising, and any damage could affect intercontinental communications and cloud service availability. These risks may influence tech companies’ future AI infrastructure deployment, location choices, and redundancy designs in high-risk regions.

6

NanoClaw Integrates with Docker MicroVM Sandboxes to Isolate AI Agent Tasks

Security | Agent | Sandbox

Yahoo Tech reports that open-source AI agent platform NanoClaw has partnered with Docker to integrate into Docker Sandboxes, providing isolated execution environments for each agent task via MicroVM-level sandboxing. The integration aims to mitigate security risks posed by agents like OpenClaw that can access files, credentials, and accounts. NanoClaw’s codebase is about 4,000 lines, emphasizing containerized operation and access only to explicitly mounted resources, thereby minimizing privilege escalation and data leakage surfaces. The project has gained approximately 21,000 GitHub stars and 3,800 forks. The collaboration supports enterprise experimentation and evaluation of agent workflows, offering more controlled isolation boundaries and rollback capabilities.

7

Hume AI Open-Sources TADA Speech Model: 1B/3B Versions, Zero Hallucinated Words in Testing

Open Source | Speech Generation | Model Release

Hume AI has open-sourced its speech generation model TADA, which uses a 'one-to-one correspondence between text tokens and audio signals' approach, enabling synchronized text and audio processing and claiming speeds over 5x faster than comparable systems. In tests across more than 1,000 samples, TADA produced no hallucinated words (no fabricated or missing transcriptions), with human evaluators rating naturalness at 3.78/5. The model comes in 1B and 3B versions, with the 3B variant supporting eight languages and designed to run efficiently on smartphones, lowering the barrier for on-device speech generation. Code and models are released under the MIT license on GitHub and Hugging Face, accompanied by a paper detailing technical specifications and evaluation setups.

8

llama.cpp Releases b8340: AVX512-FP16 Acceleration and GDN Fix

Open Source Tool | Inference Acceleration | Engineering

llama.cpp has released version b8340, adding native support for the AVX512-FP16 instruction set to optimize F16 computation on CPUs. Performance analysis shows a reduction of approximately 2.7 billion instructions, but although individual operations are more efficient, overall benchmark results remain limited by RAM bandwidth. The same release refactors the state matrix access pattern in the Gated Delta Net (GDN) kernel, improving cache bandwidth utilization through contiguous memory reads. This fixes a ~39% performance drop previously observed on backends such as Metal and with certain models. A new --fused-gdn flag has been added to control fusion paths. Official binary packages are provided for macOS/iOS/Linux/Windows across multiple backends.

9

ByteDance × Tsinghua Propose CUDA Agent: Using RL to Generate High-Performance CUDA Kernels

Paper | Reinforcement Learning | System Optimization

A joint team from ByteDance and Tsinghua University proposes CUDA Agent: an LLM agent trained via reinforcement learning to automatically write, execute, and iteratively optimize CUDA kernels. The framework combines directed pretraining with PPO policy updates, supported by a synthetic dataset of over 6,000 composite PyTorch operators. Performance evaluation and feedback occur within a highly isolated execution sandbox to minimize security and stability risks from generated code. The authors claim the method discovers customized memory access patterns and hardware-specific operator fusion, and that it outperforms both static compiler heuristics such as torch.compile and zero-shot generation from general-purpose large models across various operator graphs. They present this as a scalable path for automated performance engineering.

