Monday, March 16, 2026
10 stories · 3 min read

Today's Highlights

1

ByteDance Releases Doubao-Seed 2.0 Series, Natively Multimodal with Code Version

Model Release · Multimodal · Product Deployment

ByteDance has launched the Doubao large model 2.0 series (Doubao-Seed-2.0), offering multimodal general models including Pro, Lite, and Mini variants, along with a Code model for developers, addressing diverse latency and cost requirements. The company emphasized its natively multimodal framework, where modalities learn jointly from early training stages, and native support for Agent-style task execution. Model capabilities have been iteratively validated through productized channels such as the Doubao app and Volcano Engine API. The company continues its self-developed, closed-source strategy to strengthen the 'product—data—training' closed loop.

Read full article
2

Zhipu × Tsinghua Introduce GLM-OCR 0.9B: +50% Throughput, API at $0.2/Million Tokens

Multimodal · OCR · API Pricing

Zhipu AI and Tsinghua University have introduced GLM-OCR (0.9B) for document parsing and key information extraction. The model pairs a 0.4B CogViT vision encoder with a 0.5B GLM decoder and is enhanced with multi-token prediction (MTP), generating an average of 5.2 tokens per step during inference and boosting throughput by approximately 50%. The pipeline first runs PP-DocLayout-V3 for layout analysis, then performs parallel region recognition, supporting Markdown/JSON output as well as direct KIE-to-JSON generation. It scores 94.6 on OmniDocBench v1.5 and 94.0 on OCRBench (Text), supports deployment via vLLM/SGLang/Ollama, and is priced as a MaaS API at $0.2 per million tokens.

Read full article
3

Moonshot AI Seeks Up to $1 Billion Funding at ~$18 Billion Valuation

Funding · Large Model · China

Reports indicate that Moonshot AI is advancing a new funding round targeting up to approximately $1 billion, implying a valuation of around $18 billion—significantly higher than its previous ~$10 billion valuation. Existing investors including Alibaba, Tencent, and 5Y Capital are reported to have invested additional funds at the prior valuation tier. The accelerated fundraising reflects ongoing capital pursuit of leading Chinese large model companies, signaling that competition in domestic chatbot and Agent products will increasingly center on compute investment, product iteration speed, and ecosystem distribution capabilities.

Read full article
4

Lhasa Launches Tibetan Language Model DeepZang: 70M Corpus, 30K Hours Speech

Model Release · Open Source · Low-Resource Language

The world's first Tibetan large language model, DeepZang, was launched in Lhasa, developed by Tibet Jueluo Digital Industry Management Co., Ltd. It is reported to be China's first Tibetan large model to complete national generative AI filing and is positioned as an open-source large model platform supporting over 80 languages, offering dialogue, translation, and speech transcription alongside a dedicated app. Since 2018, the team has accumulated nearly 70 million high-quality Chinese-Tibetan parallel data samples and recorded over 30,000 hours of speech across the three major Tibetan dialect regions, forming a large-scale speech database; the launch event received certification as the 'world's first Tibetan large language model'.

Read full article
5

SILMA Open-Sources 150M Bilingual TTS: <8s Cloning, RTF ~0.12

Open Source · Speech · Multilingual

SILMA AI has released and open-sourced SILMA TTS v1 (150M parameters), supporting Arabic and English text-to-speech. It is built on a diffusion-based architecture inspired by F5-TTS and was trained from scratch on tens of thousands of hours of public and proprietary audio data. Key features include instant voice cloning from less than 8 seconds of reference audio; a real-time factor (RTF) of approximately 0.12 on an RTX 4090; full support for Arabic Tashkeel diacritics; and availability under the Apache 2.0 license for commercial use. The model and code are available on Hugging Face and GitHub, enabling rapid testing and customization with just two lines of code.
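For context, real-time factor is simply processing time divided by the duration of the audio produced; values below 1 mean faster-than-real-time synthesis. A minimal sketch (the helper below is illustrative, not part of the SILMA release):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Real-time factor (RTF): processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds

# An RTF of ~0.12 means generating 10 s of speech takes about 1.2 s.
print(round(real_time_factor(1.2, 10.0), 2))  # → 0.12
```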

Read full article
6

LangChain Releases Deep Agents: Planning + File System Context + Sub-Agent Isolation

Agent Framework · Development Tool · Context Management

LangChain has introduced Deep Agents, designed for complex tasks involving multiple steps, statefulness, and large intermediate outputs, providing a structured 'runtime wrapper' built on LangGraph. Default capabilities include task planning and decomposition via write_todos; virtual file system tools that read and write files and execute commands, offloading large outputs or intermediate state to prevent context window overflow; context isolation via task-spawned sub-agents, reducing the quality degradation caused by accumulating objectives and tool outputs in a single thread; and integration with the LangGraph Memory Store for cross-session long-term memory. The project outputs a standard CompiledStateGraph, making it easy to integrate streaming, checkpointing, and production deployment.
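The offloading idea can be sketched independently of LangChain's actual API: a large tool output is written to a virtual file store, and only a short reference enters the model's context. The store class, threshold, and path scheme below are illustrative assumptions, not Deep Agents internals:

```python
# Illustrative sketch of context offloading: large tool outputs are written
# to an in-memory "virtual file system" and replaced by a short pointer.

class VirtualFileStore:
    def __init__(self):
        self._files: dict[str, str] = {}

    def write(self, path: str, content: str) -> None:
        self._files[path] = content

    def read(self, path: str) -> str:
        return self._files[path]

def offload_if_large(store: VirtualFileStore, name: str, output: str,
                     max_chars: int = 2000) -> str:
    """Return small outputs as-is; store large ones and return a pointer."""
    if len(output) <= max_chars:
        return output
    path = f"/outputs/{name}.txt"
    store.write(path, output)
    return f"[output too large; saved to {path} ({len(output)} chars)]"

store = VirtualFileStore()
msg = offload_if_large(store, "web_search", "x" * 10_000)
print(msg)  # the context sees a short pointer, not 10,000 characters
```

The agent can later read the stored file back with a dedicated tool, so nothing is lost while the working context stays small.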

Read full article
7

Princeton Open-Sources OpenClaw-RL: Train via Dialogue, Score Rises from 0.17 to 0.76

Reinforcement Learning · Agent · Open Source

A Princeton team has proposed OpenClaw-RL, training AI agents using feedback generated within conversations: converting user dialogues, terminal commands, and tool calls into real-time training signals, reducing reliance on pre-collected datasets or teacher models. The framework consists of four decoupled, parallel modules supporting asynchronous training for continuous learning during usage. The method combines Binary RL for overall positive/negative rewards with Hindsight-Guided On-Policy Distillation to extract token-level improvement signals. Experiments show more natural expression after dozens of interactions; personalized scoring in simulated student scenarios improved from 0.17 to 0.76, and from 0.22 to 0.90 in teacher scenarios. Code is now open-sourced.
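The summary does not detail how OpenClaw-RL extracts its signals, but the binary RL component assigns an episode-level positive or negative reward from in-conversation feedback. A toy sketch of that mapping, where the keyword heuristic is purely an assumption for illustration:

```python
# Toy sketch: convert a user's in-conversation follow-up into a binary
# episode reward. The cue lists are illustrative, not from the paper.
POSITIVE_CUES = {"thanks", "perfect", "works", "great"}
NEGATIVE_CUES = {"wrong", "broken", "error", "doesn't work"}

def binary_reward(user_followup: str) -> int:
    """Map a follow-up message to a +1/-1 trajectory-level reward (0 = skip)."""
    text = user_followup.lower()
    if any(cue in text for cue in NEGATIVE_CUES):
        return -1
    if any(cue in text for cue in POSITIVE_CUES):
        return 1
    return 0  # no clear signal; drop this episode from training

rewards = [binary_reward(m) for m in
           ["Perfect, thanks!", "That command is wrong", "ok"]]
print(rewards)  # → [1, -1, 0]
```

In the actual framework this coarse reward is complemented by hindsight-guided on-policy distillation, which supplies the finer token-level improvement signals.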

Read full article
8

AWS Offers Managed OpenClaw on Lightsail: One-Click Deploy with Bedrock Pre-Configured

Cloud Service · Agent · Security

InfoQ reports that AWS has launched a managed OpenClaw blueprint on Amazon Lightsail, providing a 'one-click deploy' managed environment for the popular autonomous agent framework, lowering barriers for self-hosted configuration and security hardening. The blueprint comes pre-integrated with Amazon Bedrock and automates certain permission and deployment steps. Users can interact via web interface or messaging platforms like WhatsApp, Telegram, Slack, and Discord after SSH pairing. The offering is positioned as a response to widespread vulnerabilities and configuration complexity in self-hosted instances: managed default configurations reduce risks of misexposure and inconsistent security settings, while offering non-specialist teams a more standardized deployment path.

Read full article
9

Study: Google AI Overviews Suppress Traffic, Breaking News Surges 103%

Search · Media Ecosystem · Generative AI

A study by Define Media Group suggests that Google's AI Overviews are significantly reducing publishers' organic search traffic across most content categories, yet 'breaking news' traffic has bucked the trend, growing 103%. The report notes this growth is primarily driven by Google Discover, whose traffic has reached parity with Web Search for the first time. For queries like 'iran war', Top Stories carousels still take precedence over AI summaries, possibly due to limitations in LLMs' real-time coverage and accuracy. The statistics also show that AI summaries appear in 15.1% of news-related results, but in over 43% of results in science, health, and other domains. Publishers are advised to treat Discover as an independent distribution channel, optimizing topic selection and publishing cadence to offset declining evergreen-content traffic in the AIO era.

Read full article
10

Scale Labs Launches Showdown Human Evaluation Leaderboard: gpt-5.2-chat-latest and gemini-3-flash Rank High

Evaluation · Human Preference · LLM

Scale Labs has released the Showdown leaderboard, evaluating multiple LLMs based on real-world human preference through pairwise blind voting in authentic conversation scenarios, emphasizing avoidance of synthetic benchmarks or lab-only metrics. Results show gpt-5.2-chat-latest and gemini-3-flash ranking highly across multiple categories, with performance differentiated between Thinking and non-Thinking modes, as well as multimodal scenarios like voice input. Scale states the evaluation involves active users from over 80 countries and 70+ languages, while noting that engineering factors such as API response formatting may affect scores. The leaderboard aims to provide enterprises with user-experience-aligned references for model selection.
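The summary does not specify how Showdown turns pairwise blind votes into rankings; such votes are commonly aggregated with an Elo-style rating, sketched here under that assumption (the function and model names are illustrative):

```python
# Elo-style aggregation of pairwise blind votes. Assumption for
# illustration only: Showdown's actual scoring method is not specified.
def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    """Update two ratings after one pairwise vote; returns the new pair."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

ratings = {"model_a": 1000.0, "model_b": 1000.0}
# model_a wins three blind comparisons in a row
for _ in range(3):
    ratings["model_a"], ratings["model_b"] = elo_update(
        ratings["model_a"], ratings["model_b"])
print(ratings)  # model_a's rating rises; model_b's falls symmetrically
```

Because the expected-win term shrinks as the gap widens, repeated wins against the same opponent yield diminishing gains, which keeps the leaderboard stable under uneven matchup counts.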

Read full article
