Friday, January 23, 2026
10 stories · 3 min read

Today's Highlights

1

Baidu Launches Ernie 5.0: 2.4 Trillion Parameter Natively Multimodal Model

Large Model · Multimodal · Product Release

Baidu released the official version of Ernie 5.0 at its Ernie Moment conference, featuring a 2.4-trillion-parameter scale and native unified multimodal modeling that supports text, image, audio, and video input and output. The model uses an ultra-large-scale MoE design that activates less than 3% of its parameters per token to improve inference efficiency, and strengthens long-horizon tasks and tool use via end-to-end multi-turn reinforcement learning with chain-of-thought and chain-of-action. Individual users can try it through the Ernie app and official website, while enterprises and developers can access it via Baidu's Qianfan platform; the company also disclosed that monthly active users of the Ernie Assistant have surpassed 200 million.
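For scale, a quick back-of-envelope calculation, using only the figures stated above (2.4 trillion total parameters and a sub-3% activation ratio), shows the upper bound on parameters active per token:

```python
# Back-of-envelope: active parameters per token for a sparse MoE model.
total_params = 2.4e12        # 2.4 trillion, per Baidu's announcement
activation_ratio = 0.03      # "less than 3%", so this is an upper bound

active_params = total_params * activation_ratio
print(f"at most ~{active_params / 1e9:.0f}B parameters active per token")
# at most ~72B parameters active per token
```

In other words, despite its trillion-scale capacity, the per-token compute cost is closer to that of a dense model in the tens of billions of parameters, which is the point of the sparse MoE design.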

Read full article
2

Alibaba Open-Sources Qwen3-TTS: 10-Language TTS with Voice Cloning

Open Source · Speech · Model Release

Tongyi Qianwen announced the open-sourcing of the Qwen3-TTS speech generation model family, available in two sizes (0.6B and 1.7B), alongside the release of Qwen3-TTS-Tokenizer-12Hz, a multi-codebook speech encoder. The models support 10 languages including Chinese, English, Japanese, Korean, and German, enabling human-like voice synthesis, voice cloning, and voice design, with natural-language descriptions used to control timbre, emotion, and prosody. The family's Dual-Track architecture is optimized for streaming generation: the first audio packet is reportedly ready after just one input character, with end-to-end latency as low as 97 ms. Weights and examples are now available on Hugging Face.

Read full article
3

Alibaba Reportedly Preparing T-Head IPO, Starting with Restructuring and Employee Equity

Chip · Capital Markets

Reuters, citing Bloomberg, reported that Alibaba is preparing to spin off its AI chip division, T-Head, for an independent IPO. The plan reportedly begins with an internal restructuring to operate T-Head as a more autonomous entity and the introduction of employee equity participation; the report did not disclose a specific timeline or funding target. Founded in 2018, T-Head develops chips for cloud and edge scenarios and has launched inference chips such as Hanguang, which serve Alibaba Cloud and e-commerce systems. The capital move is seen as a way to secure additional funding for R&D and market expansion while giving the business unit more operational flexibility, amid rising momentum for financing and listings among domestic chip firms.

Read full article
4

Google Cloud Launches A4X + NVIDIA Dynamo, Achieving 6K Tokens/GPU/s for MoE Inference

Cloud Infrastructure · Inference Optimization · Chip

Google Cloud has released a reference architecture for large-scale MoE model inference: it combines NVIDIA GB200 NVL72 hardware with NVIDIA Dynamo on A4X instances, treats a 72-GPU rack as a unified compute domain, and improves utilization through WideEP/DeepEP expert parallelism and prefill-decode disaggregated scheduling. According to Google, throughput-optimized configurations exceed 6,000 tokens per second per GPU on MoE workloads such as DeepSeek-R1, while latency-optimized setups achieve inter-token latency of about 10 ms. The solution also leans on GKE's hardware-aware scheduling and on-demand weight loading to reduce cold-start and deployment overhead for terabyte-scale model weights.
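The prefill-decode split is the key scheduling idea: prefill is a compute-heavy batched pass over the whole prompt, while decode is a latency-sensitive token-by-token loop that reuses the KV cache the prefill produced. A toy sketch of the handoff in plain Python (purely conceptual, not NVIDIA Dynamo's API; all names are illustrative):

```python
# Conceptual sketch of prefill/decode disaggregation: prompt processing and
# token generation run on separate workers, linked by a queue carrying KV caches.
from dataclasses import dataclass, field
from queue import Queue

@dataclass
class Request:
    prompt: str
    kv_cache: list = field(default_factory=list)  # stands in for transferred KV blocks
    tokens: list = field(default_factory=list)

def prefill_worker(req: Request) -> Request:
    # One big batched pass over the full prompt produces the KV cache.
    req.kv_cache = [f"kv:{tok}" for tok in req.prompt.split()]
    return req

def decode_worker(req: Request, max_new: int = 3) -> Request:
    # Decode reuses the KV cache and emits one token at a time.
    for i in range(max_new):
        req.tokens.append(f"tok{i}")
    return req

decode_queue: Queue = Queue()
decode_queue.put(prefill_worker(Request("summarize this document")))
finished = decode_worker(decode_queue.get())
print(finished.tokens)  # ['tok0', 'tok1', 'tok2']
```

Running the two phases on separate worker pools, as the A4X reference architecture does across the rack, keeps long prefills from stalling the decode streams of other requests.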

Read full article
5

OpenAI Shares PostgreSQL Scaling Experience Supporting 800 Million ChatGPT Users

Engineering Practice · Database · Platform

OpenAI published an engineering article on its official news page detailing how it scaled PostgreSQL to handle the operational load of approximately 800 million ChatGPT users, sharing database-level scalability and reliability practices. The update was released alongside several other product and policy announcements, including national education initiatives and safety-related age-prediction methods. For development teams, it offers a reference for pushing a traditional relational database to ultra-large-scale AI workloads through engineering alone, though the summary includes no open-source code, cost figures, or benchmark numbers.

Read full article
6

GitHub Releases Copilot SDK Preview: Embed Agent Loops into Any App

Developer Tools · Agent

GitHub has released a technical preview of the Copilot SDK, enabling developers to embed Copilot CLI’s agent execution loop into any application without building underlying infrastructure such as context management, tool routing, or model orchestration. The SDK supports multi-model selection and natively integrates MCP to connect external data sources and tools, while providing GitHub authentication and streaming output. GitHub positions it as a reusable 'agent core,' extending Copilot capabilities beyond terminals and IDEs to custom GUIs, automated summarization/reporting, voice commands, and other forms, targeting reuse and integration scenarios.

Read full article
7

Anthropic Git MCP Exposed to Prompt Injection Vulnerabilities, Risking RCE/File Deletion

Security · MCP · Vulnerability

Security firm Cyata disclosed three chainable prompt-injection-related vulnerabilities in Anthropic's reference Git MCP server: due to insufficient validation of repository paths and Git command parameters, attackers could induce unintended Git operations, potentially leading to remote code execution, file deletion, or passive exfiltration of local files into LLM context. The incident highlights the input-validation and least-privilege challenges that arise when MCP-style 'tool servers' front high-privilege resources like code repositories and shell commands; MCP deployments need complementary mechanisms such as path allow-listing, parameter constraints, sandboxing, and auditing to keep model prompts from becoming an invisible control surface.
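As a concrete illustration of those mitigations, a Git-wrapping tool server can gate every call behind a path allow-list, a subcommand allow-list, and a flag filter. A minimal hypothetical sketch in Python (not Anthropic's actual server; `ALLOWED_REPO_ROOTS` and the permitted subcommands are assumptions for the example):

```python
from pathlib import Path
import subprocess

# Hypothetical allow-list of repository roots this tool server may touch.
ALLOWED_REPO_ROOTS = [Path("/srv/repos").resolve()]

# Read-only Git subcommands the tool is permitted to run; anything else is rejected.
ALLOWED_SUBCOMMANDS = {"status", "log", "diff", "show"}

def run_git(repo_path: str, subcommand: str, *args: str) -> str:
    """Run a Git command only if the path, subcommand, and arguments pass validation."""
    repo = Path(repo_path).resolve()  # resolve() collapses ../ traversal tricks
    if not any(repo.is_relative_to(root) for root in ALLOWED_REPO_ROOTS):
        raise PermissionError(f"repository outside allow-list: {repo}")
    if subcommand not in ALLOWED_SUBCOMMANDS:
        raise PermissionError(f"subcommand not permitted: {subcommand}")
    # Reject option-like arguments so model-supplied text cannot smuggle flags
    # such as --upload-pack or -c core.fsmonitor=<cmd> into the invocation.
    if any(a.startswith("-") for a in args):
        raise ValueError("option-like arguments are not allowed")
    result = subprocess.run(
        ["git", "-C", str(repo), subcommand, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

The `resolve()` call defeats `../` traversal, and rejecting option-like arguments stops injected prompt text from smuggling dangerous flags into the `git` call; sandboxing and audit logging would sit on top of checks like these.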

8

Unit 42: LLMs Can Be Exploited to Generate Phishing JavaScript in the Browser in Real Time

Security · Attack & Defense · LLM

Palo Alto Networks Unit 42 reported a new attack pattern it calls 'runtime assembly': attackers invoke trusted LLM service APIs (examples include DeepSeek and Google Gemini) from within the victim's browser to generate and execute malicious phishing JavaScript in real time, rather than deploying static scripts server-side. Because the code is generated afresh on each visit and delivered via trusted domains, traditional signature-based or static-rule detection becomes far less effective. The team's PoC mimics the LogoKit phishing campaign, bypassing safeguards with prompts that produce AJAX credential-theft logic. The report recommends runtime behavioral analysis in the browser and tighter controls over LLM tool calls and outbound requests.

Read full article
9

FlashLabs Open-Sources Chroma 1.0: End-to-End Voice Conversation with TTFT <150ms

Open Source · Speech · Real-time

FlashLabs has open-sourced Chroma 1.0, an end-to-end real-time voice AI model that emphasizes native speech-to-speech dialogue, avoiding the information loss and cumulative latency inherent in traditional ASR→LLM→TTS cascaded pipelines. The company claims the model has approximately 4 billion parameters and an end-to-end time-to-first-token (TTFT) under 150 ms (around 135 ms in internal testing), and that it supports natural turn-taking, emotional and prosodic control, and personalized voice cloning from a few seconds of reference audio, achieving a speaker-similarity score of 0.817. Code, paper, and benchmark results are available on Hugging Face and GitHub, with deployment options also offered on the company's voice-agent platform.

Read full article
10

South Korea Enacts AI Basic Act, Requiring Human Oversight and Mandatory Labeling for High-Impact AI

Policy · Regulation

South Korea has begun implementing its 'AI Basic Act,' reported as one of the first comprehensive AI regulatory frameworks to take effect. The law covers generative AI and high-risk sectors such as healthcare, finance, and transportation, mandating human oversight for high-impact AI applications and clear labeling of AI-generated content. Companies must also explain system operations to users, and violations are punishable by fines of up to 30 million Korean won. Local startups have expressed concern that vague wording may raise compliance costs and affect innovation and decisions about international expansion; the government responded that it will provide technical support, compliance education, and incentives. Forthcoming implementation guidelines and enforcement standards will directly shape the release pace of domestic models and applications.

Read full article
