Thursday, April 2, 2026
10 stories · 3 min read

Today's Highlights

1

Liquid AI Releases LFM2.5-350M, 350M-Parameter Model Trained on 28T Tokens Challenges Large Model Paradigm

Model Release · Edge AI

Liquid AI has released a compact language model, LFM2.5-350M, with only 350 million parameters but trained on an extensive 28 trillion tokens, achieving a token-to-parameter ratio of 80,000:1. The model adopts a hybrid architecture based on Linear Input Varying (LIV) systems and incorporates grouped-query attention modules, supporting a 32k context window. It scores 76.96 on the IFEval instruction-following task, excelling in tool use and structured data extraction, though it is not suitable for complex math, programming, or creative writing. Optimized for edge devices, the model consumes only 81MB peak memory on Snapdragon GPUs, achieves an inference throughput of 40.4K output tokens per second on a single H100 GPU, and can run on a Raspberry Pi 5 with just 300MB of memory, demonstrating exceptional potential for edge deployment.

Read full article
2

NVIDIA Sets New Records in MLPerf Inference v6.0, Blackwell Ultra Delivers Up to 2.7x Performance Gain

Benchmark · Chip

NVIDIA's systems based on Blackwell Ultra GPUs have set multiple records in the MLPerf Inference v6.0 benchmark, bringing the company's tally to 291 training and inference wins since 2018, nine times more than all other platforms combined. This round introduced new workloads including DeepSeek-R1 and Qwen3-VL-235B, and NVIDIA was the only platform to submit results for every new model. Through TensorRT-LLM software optimization, the GB300 NVL72 system achieved up to a 2.7x performance improvement on DeepSeek-R1, cutting token cost by over 60%. Key technologies include Disaggregated Serving, Wide Expert Parallelism, and Multi-Token Prediction. Four GB300 NVL72 systems interconnected via Quantum-X800 InfiniBand deliver aggregate throughput in the millions of tokens per second across 288 GPUs.

Read full article
3

Hugging Face Launches TRL v1.0, Unified Post-Training Framework Transitions to Production-Grade

Open Source Tool · Model Training

Hugging Face has officially launched TRL v1.0, marking the library's transition from a research-focused project to a stable, production-grade reinforcement learning framework. TRL v1.0 offers a unified post-training workflow covering supervised fine-tuning (SFT), reward modeling, and alignment algorithms such as DPO and GRPO. It introduces three core features: a command-line interface, a unified configuration system, and an expanded suite of alignment algorithms. The new version supports YAML-driven training configurations, integrates with Hugging Face Accelerate for cross-hardware distributed training, and leverages PEFT (LoRA/QLoRA) and Unsloth acceleration kernels to significantly improve training efficiency and memory utilization. Additionally, trl.experimental provides a namespace for cutting-edge experimental methods such as ORPO, making large-model post-training more standardized, efficient, and reproducible.
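For orientation, here is a minimal supervised fine-tuning sketch using TRL's Python API. It assumes the pre-1.0 SFTTrainer/SFTConfig interface carries over into v1.0; the model and dataset names are placeholders, not details from the announcement.

```python
# Minimal SFT sketch with TRL (assumes the pre-1.0 SFTTrainer/SFTConfig API
# shape carries over into v1.0; model and dataset names are placeholders).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",        # any causal LM checkpoint
    train_dataset=train_dataset,
    args=SFTConfig(
        output_dir="sft-output",
        per_device_train_batch_size=2,
        num_train_epochs=1,
    ),
)
trainer.train()
```

The release also emphasizes driving the same job through the trl CLI with a YAML configuration file instead of a Python script.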

Read full article
4

Google DeepMind Launches Veo 3.1 Lite, Cost-Effective Video Generation Model Now Accessible via Gemini API

Video Generation · Product Release

Google DeepMind has launched Veo 3.1 Lite, a video generation model positioned as its most cost-effective option, cutting costs by more than half compared to its predecessor while matching the speed of Veo 3.1 Fast. The model supports text-to-video and image-to-video generation, offers durations of 4, 6, and 8 seconds, handles both landscape (16:9) and portrait (9:16) formats, and delivers output at 720p and 1080p resolution. Accessible through the Gemini API and Google AI Studio, it lowers the barrier for developers and is well suited to large-scale applications such as marketing, education, and social media content. The move signals a shift in video generation competition toward total cost of ownership and integration efficiency.
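As a rough illustration of the Gemini API access path, here is a sketch using the Google Gen AI Python SDK. The model ID "veo-3.1-lite" is assumed from the headline rather than confirmed, and the config fields mirror those documented for existing Veo models, so they may differ for the Lite release.

```python
# Hedged sketch: video generation via the Google Gen AI Python SDK.
# Model ID is hypothetical; config fields follow current Veo documentation.
import time

from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

operation = client.models.generate_videos(
    model="veo-3.1-lite",  # assumed model ID, not confirmed
    prompt="A slow drone shot over a coastline at sunrise",
    config=types.GenerateVideosConfig(
        aspect_ratio="16:9",   # or "9:16" for portrait
        resolution="720p",     # or "1080p"
        duration_seconds=8,    # 4, 6, or 8 per the announcement
    ),
)

# Video generation is a long-running operation; poll until it finishes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("veo_clip.mp4")
```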

Read full article
5

Oracle Initiates Massive Layoffs Amid Financial Pressure from AI Infrastructure Investments, Estimated 20,000–30,000 Jobs Affected

Corporate News · Layoffs

Oracle has initiated layoffs under mounting financial pressure from its massive AI infrastructure investments, with its stock down 25% year-to-date. The cuts affect cloud services, sales, and healthcare teams across multiple regions, including the U.S. and India. The company has taken on $50 billion in debt to build AI data centers, while its remaining performance obligations have surged to $455 billion. Oracle has not officially disclosed the scale, but analysts estimate 20,000 to 30,000 jobs may be eliminated, potentially freeing up $8–10 billion in free cash flow. Management emphasizes long-term returns from AI investments, noting that current AI compute orders stand at $553 billion. The move places Oracle in a broader wave of tech-industry workforce reductions, following Amazon, Microsoft, and Meta.

Read full article
6

SpaceX Confidentially Files for IPO, Targeting $1.75 Trillion Valuation and $75 Billion Raise

IPO · Aerospace

SpaceX has confidentially filed for an initial public offering (IPO) with the U.S. SEC, planning to go public by June 2026 with a target valuation exceeding $1.75 trillion and aiming to raise up to $75 billion, far surpassing Saudi Aramco's record $29 billion IPO in 2019. The offering is being led by Bank of America, Citigroup, Goldman Sachs, JPMorgan Chase, and Morgan Stanley. SpaceX dominates the commercial space launch market and operates the Starlink satellite internet service, with projected 2026 revenue approaching $20 billion. SpaceX previously acquired Elon Musk's AI company xAI, expanding its footprint in the AI domain. The filing comes as AI companies such as OpenAI and Anthropic also prepare for public listings.

Read full article
7

Zhipu AI Launches AutoClaw Agent Integrated with Pony-Alpha-2, Stock Surges 31.9% in Hong Kong

Agent · Product Release

Zhipu AI has launched the AutoClaw agent, built on the Pony-Alpha-2 model, offering seamless deployment and aiming to improve autonomous execution and collaboration efficiency on complex tasks. Pony-Alpha-2 shows significant improvements in reasoning and task comprehension, enabling rapid adaptation across a range of scenarios. Buoyed by expectations that annual recurring API revenue will grow 60-fold to 1.7 billion RMB, Zhipu AI's Hong Kong-listed shares (02513.HK) surged 31.9% on the day, making it one of the most closely watched stocks in the Hong Kong AI sector. The release underscores Zhipu AI's continued push into agent technology and deepens the industrial application of large models.

Read full article
8

AI Semiconductor Demand Squeezes Global Smartphone Market, Analysts Warn of Up to 31% Shipment Decline

Chip · Supply Chain

Surging demand for AI-related semiconductors is driving memory chip prices sharply higher, severely squeezing the global smartphone market. Jefferies warns that smartphone shipments could decline by as much as 31% over the next year. Server manufacturers' demand has already pushed memory prices up by 70%, with another 50% increase expected by Q2 2026, potentially raising high-end smartphone prices by over $150. Counterpoint Research and IDC also forecast a 12%-13% drop in smartphone shipments in 2026. Qualcomm and Apple are both affected, with Apple delaying standard model production and prioritizing high-end device shipments. Micron Technology reported a threefold year-over-year revenue increase to nearly $24 billion in its latest quarter, yet still meets only about half to two-thirds of customer demand.

Read full article
9

Holo3 Achieves 78.85% on OSWorld Benchmark with MoE Architecture, Activating Only 10B Parameters

Agent · Model Release

H Company has launched the Holo3 computer-use agent model, achieving an industry-leading 78.85% on the OSWorld-Verified benchmark and outperforming large models like GPT-5. Built on a Mixture-of-Experts (MoE) architecture with 122B total parameters, Holo3 activates only 10B parameters during inference, reducing computational cost to a fraction of that required by large closed-source models. Trained through a proprietary synthetic training flywheel that auto-generates enterprise environments, Holo3 demonstrates multi-step reasoning across applications and can orchestrate complex desktop workflows such as PDF parsing, budget validation, and email dispatch. The model represents a significant leap for open-source computer-use agents, showcasing the evolution from single-interface automation to adaptive, general-purpose agents.
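To make the sparse-activation claim concrete, here is a toy top-k mixture-of-experts layer in PyTorch. This is a generic illustration of MoE routing, not Holo3's actual architecture, and all sizes are arbitrary; the point is that each token is processed by only a few experts, so most parameters stay inactive per token.

```python
# Toy top-k mixture-of-experts layer (illustrative only, not Holo3's design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the top-k experts selected for each token are evaluated,
        # so most expert parameters are untouched for any given token.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 64)
print(ToyMoE()(x).shape)  # torch.Size([4, 64])
```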

Read full article
10

Kestra Secures $25 Million in Series A Funding, Open-Source Orchestration Platform Sees 25x Enterprise Revenue Growth in 18 Months

Funding · Open Source

Open-source orchestration platform Kestra has announced a $25 million Series A round led by RTP Global, bringing its total funding to $36 million. The company has grown enterprise revenue 25x over the past 18 months and executed over 2 billion workflows in 2025, a 20x year-over-year increase. With over 26,000 GitHub stars, Kestra is used by more than 30,000 organizations globally, including Apple, Toyota, and JPMorgan Chase, for use cases such as AI pipelines, cybersecurity analytics, and infrastructure automation. The new capital will fund Kestra 2.0, which features a distributed execution engine and real-time observability, a pay-as-you-go Kestra Cloud SaaS offering, and expanded operations in North America and Europe.

Read full article

Don't Miss Tomorrow's Insights

Join thousands of professionals who start their day with AI Daily Brief