AI Daily Brief

Tuesday, April 21, 2026

9 stories · 3 min read

Today's Highlights

Moonshot AI Open-Sources Kimi K2.6 with Support for 300 Collaborative Sub-Agents and 1T Parameter MoE Architecture

Open Source Model · Agent

Moonshot AI has released and open-sourced the Kimi K2.6 multimodal model, featuring a 1 trillion parameter Mixture-of-Experts (MoE) architecture (activating 32B), supporting 256K context length, and natively accepting image and video inputs. The core innovation is its agent swarm capability, enabling coordination of up to 300 sub-agents to execute complex tasks over 4,000 steps, with support for 13 hours of continuous programming. It achieves 80.2% on SWE-Bench Verified and improves MCPMark from 29.5 in K2.5 to 55.9. The API input cost is approximately $0.95 per million tokens. Released under a Modified MIT license, it supports multiple deployment methods including vLLM and SGLang, and can even run on consumer-grade RTX 4090 GPUs. Multiple applications have already integrated the model, with daily token calls reaching hundreds of millions.
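
To put the figures above in perspective, the following Python sketch computes the share of parameters active per token and a rough input-token bill at the quoted rate. The function names and the 200M-token workload are hypothetical, chosen only for illustration; the brief gives no output-token price, so output cost is left out.

```python
# Figures quoted in the brief.
TOTAL_PARAMS = 1_000_000_000_000   # 1T total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32B parameters activated per token
INPUT_USD_PER_MILLION = 0.95       # approximate API input price


def active_fraction(active: int, total: int) -> float:
    """Share of the model's parameters that are used for each token."""
    return active / total


def input_cost_usd(input_tokens: int,
                   usd_per_million: float = INPUT_USD_PER_MILLION) -> float:
    """Input-side cost only; output pricing is not stated in the brief."""
    return input_tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    # Hypothetical workload: 200 million input tokens in a day.
    print(f"active share: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    print(f"input cost:   ${input_cost_usd(200_000_000):.2f}")
```

At these numbers, roughly 3.2% of the parameters are active per token, and 200 million input tokens would cost about $190.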

Alibaba Releases Qwen3.6-Max-Preview, Tops Six Programming Benchmarks

Large Model · Programming

Alibaba's Qwen team has launched Qwen3.6-Max-Preview, a preview of its flagship model that significantly outperforms the previous-generation Qwen3.6-Plus in agentic coding, world knowledge, and instruction following. It posts the highest scores on six programming benchmarks, including SWE-bench Pro, Terminal-Bench 2.0, and SkillsBench, with improvements of 9.9 on SkillsBench and 10.8 on SciCode. On world knowledge benchmarks, it gains 2.3 on SuperGPQA and 5.3 on QwenChineseBench. According to ArtificialAnalysis evaluations, its overall performance surpasses GLM5.1 and MiniMax-M2.7, making it the strongest Chinese large language model to date. It is accessible now via Qwen Studio and will soon be available through Alibaba Cloud's Bailian API, which is compatible with both the OpenAI and Anthropic protocols.

GitHub Suspends New Copilot Signups Due to Runaway Compute Costs from AI Agent Workflows

Development Tools · Business Model

On April 20, GitHub announced that it is suspending new signups for the Copilot Pro, Pro+, and student plans because of excessive compute costs from long-running, parallelized tasks driven by AI agents, where a single request can cost more than the monthly subscription fee. Usage restrictions are also being tightened: Pro plans lose access to Opus models, and Pro+ retains only Opus 4.7, with Opus 4.5 and 4.6 removed; Pro+ usage limits are more than five times those of Pro. GitHub will add usage visualization to VS Code and the CLI, and affected users may request refunds by May 20. The change reflects an industry-wide challenge: balancing compute costs against subscription revenue as AI coding assistants evolve from code-completion tools into autonomous agents.

Recursive Superintelligence Raises $500M at $4B Valuation, Betting on Recursive Self-Improvement in AI

Funding · AGI

Recursive Superintelligence has raised $500 million in funding, co-led by Google Ventures (GV) and NVIDIA, achieving a $4 billion valuation. Founded by former engineers from Google DeepMind and OpenAI, the company aims to build AI systems capable of autonomously designing, testing, and optimizing their own code and architectures, reducing model iteration cycles from months to hours. Funds will be used to recruit top talent and build large-scale computing clusters, with plans to initiate its first 'Level 1' autonomous training run within the year. Notably, the company secured this high valuation just four months after founding and before launching any product, sparking concerns about capital bubbles in the AI sector.

NVIDIA Releases Nemotron 3 Super, 120B-Parameter Open MoE Model Designed for AI Agents

Open Source Model · AI Agent

NVIDIA has released Nemotron 3 Super, a 120-billion-parameter open Mixture-of-Experts (MoE) model designed specifically for AI agent workloads. It activates only 12.7 billion parameters per forward pass, which significantly reduces compute costs while delivering up to 7.5x performance gains. Because the model is open, it can be deployed across a range of scenarios, including smart contract security audits and decentralized prediction markets. The release marks NVIDIA’s deeper expansion into AI modeling as it transitions from a hardware provider to a full-stack hardware-software ecosystem player.
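
For readers unfamiliar with why an MoE model runs only a fraction of its parameters per forward pass, here is a toy top-k routing sketch in Python. It is a generic illustration of the technique, not NVIDIA's router: the scalar "experts", the gate scores, and the choice of k are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A toy "expert": any function from a scalar input to a scalar output.
Expert = Callable[[float], float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the gate scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x: float, experts: Sequence[Expert],
                gate_scores: Sequence[float], k: int) -> float:
    """Evaluate only the selected experts; the others are never called."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(gate_scores, k))


if __name__ == "__main__":
    # Four hypothetical experts, of which only two run for this input.
    experts = [lambda v: v + 1, lambda v: v * 2, lambda v: -v, lambda v: v * v]
    print(moe_forward(3.0, experts, gate_scores=[0.0, 2.0, -1.0, 2.0], k=2))
```

The compute saving comes from `moe_forward` skipping every expert outside the top k, so the cost per token tracks the active parameter count rather than the total.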

Vercel Breached After Employee Authorized Third-Party AI Tool, Hackers Demand $2M

Security Incident · Supply Chain Security

AI cloud platform Vercel confirmed a security incident stemming from a third-party AI tool, Context.ai, used by an employee. Attackers compromised the employee’s Google Workspace account, gaining limited access to internal Vercel systems and exposing some non-sensitive environment variables. The threat actor ShinyHunters claimed responsibility and is offering stolen data—including internal databases, employee accounts, and tokens—for $2 million. Vercel has contacted law enforcement and advised customers to rotate credentials and review integrations. This incident highlights supply chain security risks arising from improper configuration of AI tool access permissions in enterprise environments.

NVIDIA Cancels Full-Year Consumer GPU Launch for 2026 Due to Memory Shortage, RTX 50 Series Delayed to Q3

Chip · Supply Chain

Due to a global memory chip shortage, NVIDIA has canceled its entire 2026 consumer GPU launch schedule, which would make 2026 the first year in company history without a new product release. Although the RTX 50 Super series has been finalized, its production priority was downgraded in December 2025 as memory was prioritized for AI accelerators. Memory supply for gaming GPUs could shrink by up to 40%. The launch is now delayed to Q3 2026, though timely availability remains uncertain. The flagship RTX 5090D v2 starts at RMB 16,499 in China, with select overseas models priced above $3,000. If new cards fail to launch within the year, gamers may face a two-year gap without new product releases.

OpenAI Launches Codex Chronicle Feature to Build Development Context Memory from Screenshots

Development Tools · Product Release

OpenAI has introduced the Chronicle feature for the Mac version of its Codex desktop app. It uses a background agent to capture screenshots and build local memory, enabling Codex to understand specific user references such as on-screen error messages, documents being edited, or past projects. Data is stored temporarily on the local device, and users can view and edit it at any time. Currently available as a research preview only to Pro subscribers on Mac, it requires screen recording and accessibility permissions. OpenAI warns that the feature rapidly consumes API call quotas and that other applications may be able to access the temporarily stored screenshot files, which raises privacy concerns.

Moonshot AI and Tsinghua Propose PrfaaS Cross-Datacenter Inference Architecture, Achieving 54% Throughput Gain

Infrastructure · Research

Moonshot AI and Tsinghua University jointly proposed Prefill-as-a-Service (PrfaaS), an architecture that offloads the compute-intensive prefill phase of LLM inference to specialized high-performance clusters, then transfers the KVCache over standard Ethernet to local decoding clusters—overcoming traditional RDMA network limitations. Enabled by KVCache compression (up to 36x) via hybrid attention models, cross-datacenter transmission becomes feasible. Real-world testing using an internal 1T-parameter model shows PrfaaS achieves 54% higher throughput compared to homogeneous baselines, reduces P90 first-token latency by 64%, and consumes only 13% of available bandwidth. Combining request-length routing, hierarchical scheduling, and multi-connection TCP transmission, this architecture offers a novel infrastructure paradigm for large-scale LLM serving.
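
The routing and transfer trade-off described above can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the authors' scheduler: the token threshold, the rule that longer prompts go to the remote prefill cluster, and all names are hypothetical. Only the 36x compression ratio comes from the brief.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    request_id: str
    prompt_tokens: int


# Hypothetical cutoff; the brief does not state the routing threshold.
REMOTE_PREFILL_MIN_TOKENS = 8_192


def choose_prefill_site(req: Request,
                        threshold: int = REMOTE_PREFILL_MIN_TOKENS) -> str:
    """Assumed request-length routing: long prompts prefill remotely, short ones locally."""
    if req.prompt_tokens >= threshold:
        return "remote_prefill_cluster"
    return "local_cluster"


def kv_transfer_seconds(kv_bytes_uncompressed: float,
                        compression_ratio: float,
                        link_gbps: float) -> float:
    """Time to ship a compressed KVCache over a link, ignoring protocol overhead."""
    if compression_ratio <= 0 or link_gbps <= 0:
        raise ValueError("compression_ratio and link_gbps must be positive")
    bits_on_wire = kv_bytes_uncompressed / compression_ratio * 8
    return bits_on_wire / (link_gbps * 1e9)


if __name__ == "__main__":
    print(choose_prefill_site(Request("short", 512)))
    print(choose_prefill_site(Request("long", 50_000)))
    # Hypothetical 36 GB cache and 8 Gbit/s link, with the 36x ratio from the brief.
    print(kv_transfer_seconds(36e9, compression_ratio=36.0, link_gbps=8.0))
```

With these made-up sizes, compression cuts the transfer from 36 seconds to 1 second, which shows why a high compression ratio is what makes Ethernet-based transfer plausible.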
