Friday, April 3, 2026
10 stories · 3 min read

Today's Highlights

1

Google Releases Gemma 4 Open Source Model Series, 31B Version Ranks Third Globally Under Apache 2.0 License

Open Source Model · Google · Edge AI

Google DeepMind released the Gemma 4 open source model series on April 2, including four variants: E2B, E4B, 26B MoE, and 31B Dense. The 31B model ranks third globally on the Arena AI text leaderboard, while the 26B MoE ranks sixth, outperforming models 20 times its size. The 26B MoE uses 128 small experts and activates only 3.8B parameters to achieve inference performance equivalent to a 27B model. All models natively support image, video, and audio processing, with context windows up to 256K, coverage of over 140 languages, built-in function calling, and structured JSON output, making them suitable for agent workflows. E2B and E4B can run with near-zero latency on edge devices such as smartphones and Raspberry Pi. The models have transitioned from a custom license to Apache 2.0, eliminating friction for commercial deployment. They are now available on platforms including Hugging Face, Ollama, and Google AI Studio, with NVIDIA providing NVFP4 quantized versions and full-stack deployment support.
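The sparse mixture-of-experts idea behind the 26B MoE variant (a large expert pool, only a few experts active per token) reduces to top-k gating. The sketch below is generic and its sizes are illustrative, not Gemma 4's actual router:

```python
import math
import random

def top_k_route(gate_logits, k):
    """Select the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Illustrative: route one token to 2 of 128 experts, so only those
# 2 experts' parameters are computed -- the other 126 stay idle.
rng = random.Random(0)
gate_logits = [rng.gauss(0, 1) for _ in range(128)]
for expert, weight in top_k_route(gate_logits, 2):
    print(expert, round(weight, 3))
```

Because only the routed experts run per token, total parameter count (26B) and active parameter count (3.8B) can differ by an order of magnitude.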

2

Microsoft Launches Three Proprietary AI Foundation Models MAI Series, Speech-to-Text Word Error Rate at Just 3.8%

Microsoft · AI Model · Multimodal

Microsoft launched three in-house AI models—MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2—on April 2, marking its strategic shift from distributor to independent model developer. MAI-Transcribe-1 supports 25 languages and achieves an average word error rate of just 3.8% on the FLEURS benchmark, surpassing OpenAI Whisper and Google Gemini, while running 2.5x faster than Azure Fast at a starting price of $0.36 per hour. MAI-Voice-1 offers voice cloning in seconds and audio generation at 60x real time, priced at $22 per million characters. MAI-Image-2 doubles generation speed and has been integrated into Bing and PowerPoint. All three models were developed by a team of fewer than 10 people using half the GPU resources of competitors. Microsoft emphasizes its "humanistic AI" vision, adopting aggressive pricing to reduce costs, and plans to release a frontier LLM comparable to GPT in the future.
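For context, word error rate (WER) is the word-level edit distance (substitutions + deletions + insertions) between transcript and reference, divided by the reference word count. A minimal self-contained sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution / match
        prev = cur
    return prev[-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words
```

A 3.8% WER on FLEURS thus means roughly 4 word-level errors per 100 reference words, averaged across the 25 languages.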

3

Alibaba Launches Qwen3.6-Plus, Programming Capability Near Claude Opus 4.5, Leading Domestic Model

Alibaba · Large Model · Programming

Alibaba launched Qwen3.6-Plus, the latest large language model in its Qwen series, on April 2. It performs close to Claude Opus 4.5 on authoritative programming benchmarks such as SWE-bench, Terminal-Bench 2.0, and NL2Repo, outperforming domestic models like GLM-5 and Kimi-K2.5. The model has native multimodal understanding, enabling it to generate front-end code directly from UI screenshots or design drafts and complete an end-to-end workflow of "understand the interface → generate code → invoke tools to modify". It supports a 1-million-token context window and integrates with mainstream agent frameworks such as OpenClaw and Qwen Code. It is now available on Alibaba Cloud's Bailian platform, with input pricing starting at just $2 per million tokens. The release comes less than a month and a half after Qwen3.5, signaling an accelerated iteration pace, with a stronger flagship version, Qwen3.6-Max, due soon.

4

Foundational AI Funding Reaches $178 Billion in Q1 2026, Double 2025 Full-Year Total

Funding · AI Industry · Investment Trend

According to Crunchbase data, venture investment in foundational AI startups reached $178 billion in Q1 2026 (24 deals), double the $88.9 billion total for all of 2025. Funding is highly concentrated among top players: OpenAI has raised over $122 billion cumulatively, Anthropic secured a $30 billion Series G at a $380 billion valuation, and xAI completed a $20 billion Series E. European startup AMI set a record with a €1.03 billion seed round. Global AI investment totaled $211 billion in 2025, accounting for about 50% of all global VC funding. OpenAI and Anthropic are expected to prepare for IPOs in late 2026 or 2027, while xAI, having merged with SpaceX, will go public through the latter's IPO.

5

Hugging Face Releases Transformers v5.5.0, Integrates Gemma 4 and Two Other New Models on Launch Day

Hugging Face · Open Source Tool · Multimodal

Hugging Face released version 5.5.0 of the Transformers library, adding support for three new models. Gemma 4 is a multimodal model with image input, using fixed token budgets and 2D RoPE encoding to handle images of varying sizes. NomicBERT is the first open-source, reproducible long-context text embedding model, supporting context lengths up to 8,192 tokens and outperforming OpenAI Ada-002 on the MTEB and LoCo benchmarks. MusicFlamingo is an audio-language model that can understand up to 20 minutes of audio and introduces Rotary Time Embeddings. This release also natively implements the Mamba caching mechanism, removes remote-code-execution support for LightGlue, and speeds up static checks by as much as 27x.

6

GPU Rental Prices Surge 40% in Six Months, H100 One-Year Contract Hits $2.35/Hour

GPU · Infrastructure · Compute Shortage

SemiAnalysis released its H100 one-year rental price index, showing H100 rental prices surged from $1.70/hour in October 2025 to $2.35/hour in March 2026, a nearly 40% increase. Demand growth stems from expansions at AI companies like Anthropic, explosive growth in multi-agent workloads, and native media generation driving massive token consumption. Currently, on-demand capacity for almost all GPU types is sold out. Blackwell chip deliveries are backlogged until June–July 2026, with nearly all new capacity through August–September already reserved. Long-term contracts (4–5 years) are dominated by large AI labs. Future price trends depend on GB300 production ramp-up and AI model revenue growth.
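The "nearly 40%" headline figure follows directly from the two quoted prices (simple arithmetic, shown for transparency):

```python
old_price, new_price = 1.70, 2.35  # $/hour: October 2025 vs. March 2026
increase = (new_price - old_price) / old_price
print(f"{increase:.1%}")  # 38.2%, i.e. "nearly 40%"
```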

7

NMPA Releases Implementation Guidelines for AI+Drug Regulation, Aiming to Build Integrated Innovation System by 2030

Policy · Drug Regulation · AI Application

The National Medical Products Administration (NMPA) released the "Implementation Guidelines for Artificial Intelligence + Drug Regulation," proposing to initially establish an integrated innovation system linking drug regulation and AI by 2030. This includes high-quality datasets, vertical large models, and intelligent agents to enable efficient human-AI collaboration in review and approval, inspections, testing, and monitoring. By 2035, a digitally driven smart governance framework for drug safety is expected to be largely complete. Key focus areas include human-AI collaborative intelligent review, full-chain intelligent supervision, and digital transformation of risk management, particularly for high-risk products like blood products and traditional Chinese medicine injections. Foundational support emphasizes dataset development, model application systems, computing infrastructure, and enhanced security frameworks.

8

Zhipu and MiniMax Release First Post-IPO Financials, Revenue Up 132% and 159% Respectively

Zhipu · MiniMax · AI Commercialization

Chinese AI startups Zhipu AI and MiniMax released their first financial reports since going public on April 2. Zhipu AI achieved 2025 revenue of RMB 724 million (~$105 million), up 131.9% year-on-year, primarily through its Model-as-a-Service (MaaS) platform offering AI model deployment for institutional clients. MiniMax reported revenue of $79 million over the same period, up 159%, with more diversified income streams including enterprise services, the video generation platform Hailuo AI, and consumer products like the AI companion app Talkie. Both companies saw losses widen, but their rapid revenue growth reflects early success in business model exploration among China's pure-play AI startups. Among Hong Kong's Q1 IPOs, Zhipu and MiniMax stood out with cumulative share-price gains of approximately 5x.

9

Meta Releases KernelEvolve, AI Agent Automates GPU Kernel Optimization Outperforming Human Experts

Meta · AI Infrastructure · Kernel Optimization

Meta's engineering blog introduced KernelEvolve, a system transforming GPU kernel optimization from manual expert tuning into an automated, search-based agent workflow. The system treats kernel generation as a structured search problem, using retrieval-augmented knowledge bases to dynamically inject hardware-specific documentation and optimization patterns, enabling LLMs to generate optimized code for new hardware without pre-training. Automated evaluation and feedback loops enable continuous performance improvement, with successful strategies distilled into reusable skills and written back to the knowledge base. Unlike one-off code generation, KernelEvolve searches across hundreds of candidate implementations, frequently achieving performance superior to human experts in production workloads.
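Stripped of the LLM and the hardware profiler, the core loop is a plain generate-evaluate-keep-best search. Everything below (tile sizes, the cost function) is a stand-in for illustration, since Meta has not published KernelEvolve's code:

```python
def search_best_kernel(candidates, evaluate):
    """Benchmark every candidate implementation and keep the fastest seen.
    In a KernelEvolve-style system, `candidates` would come from LLM codegen
    and `evaluate` from profiling on real hardware; both are mocked here."""
    best, best_score = None, float("inf")
    for candidate in candidates:
        score = evaluate(candidate)  # lower = faster
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score

tile_sizes = [16, 32, 64, 128, 256]       # toy "kernel" search space
runtime = lambda tile: abs(tile - 64)     # toy cost model, minimized at 64
print(search_best_kernel(tile_sizes, runtime))  # (64, 0)
```

The searched-over candidate count (hundreds, per the post) is what distinguishes this from one-off code generation: the system can afford many bad candidates as long as evaluation is automated.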

10

Cloudflare Redesigns CDN Cache Architecture to Counter AI Crawlers, Introduces Tiered AI-Aware Caching

Cloudflare · Infrastructure · AI Crawler

Cloudflare published a blog post noting that AI crawler traffic patterns fundamentally differ from human behavior, with high unique URL ratios and long-tail sequential content scanning causing severe cache thrashing, evicting popular content and reducing hit rates for human users. Traditional LRU algorithms are insufficient for AI-era mixed workloads; early experiments show SIEVE or S3FIFO algorithms better preserve hit rates for human traffic. Cloudflare proposes a tiered AI-aware caching architecture: prioritizing edge response speed for human traffic, while reserving deeper, higher-capacity cache layers for AI training crawls. Platforms like Wikipedia and SourceHut have already experienced 50% bandwidth spikes and service instability due to AI crawling.
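SIEVE, one of the algorithms named above, is simple enough to sketch: a FIFO queue with a one-bit "visited" flag per entry and a hand that evicts the first unvisited entry it finds. This toy version (not Cloudflare's implementation) shows why a one-pass crawler scan struggles to evict content that humans keep re-requesting:

```python
from collections import OrderedDict

class SieveCache:
    """Toy SIEVE eviction sketch: a FIFO queue with a one-bit 'visited'
    flag per entry and a hand that evicts the first unvisited entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # key -> visited bit; insertion order = age
        self.hand = None           # key the eviction hand currently points at

    def get(self, key):
        """Return True on a hit; on a miss, admit the key (evicting if full)."""
        if key in self.data:
            self.data[key] = True  # mark visited -- no promotion, unlike LRU
            return True
        if len(self.data) >= self.capacity:
            self._evict()
        self.data[key] = False     # new entries start unvisited
        return False

    def _evict(self):
        keys = list(self.data)     # oldest first
        i = keys.index(self.hand) if self.hand in self.data else 0
        while self.data[keys[i]]:  # skip visited entries, clearing their bit
            self.data[keys[i]] = False
            i = (i + 1) % len(keys)
        self.hand = keys[(i + 1) % len(keys)] if len(keys) > 1 else None
        del self.data[keys[i]]

# One-pass "crawler" requests for 'b' and 'c' cannot evict 'a',
# which a "human" re-requested in between.
cache = SieveCache(2)
for key in ["a", "b", "a", "c"]:
    cache.get(key)
print(list(cache.data))  # ['a', 'c']
```

Under LRU, a long sequential scan would cycle every entry through the cache; here, one-hit-wonder keys like `b` are the first to go because they never earn a visited bit.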

