Wednesday, January 28, 2026
9 stories, 3 min read

Today's Highlights

1

Moonshot Open-Sources Kimi K2.5: 100 Sub-Agents and Visual Encoding

Model Release, Open Source, Agent

On January 27, Moonshot released and open-sourced Kimi K2.5, a natively multimodal, agent-capable large model with an MoE architecture and up to 1T parameters. It accepts image and video input and generates front-end interfaces via 'visual encoding'. Its Agent Cluster/Swarm can automatically orchestrate up to 100 sub-agents in parallel, with cumulative tool usage of up to 1,500 steps on long-horizon tasks. The company also launched Kimi Code, which integrates with IDEs such as VS Code and Cursor. Publicly disclosed API pricing is $0.60 per million input tokens and $3 per million output tokens. The model is released under a modified MIT license that requires attribution for large-scale commercial use.
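To make the quoted pricing concrete, here is a minimal sketch that estimates the cost of a request from its token counts. The two prices are the figures quoted above; the function name and the example token counts are hypothetical, and real billing may differ (for instance, any cached-input discounts are not modeled).

```python
from decimal import Decimal

# Prices as quoted above, in USD per one million tokens.
INPUT_PRICE_PER_M = Decimal("0.60")
OUTPUT_PRICE_PER_M = Decimal("3.00")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate request cost in USD (hypothetical helper, not an official API)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_PRICE_PER_M
             + Decimal(output_tokens) * OUTPUT_PRICE_PER_M)
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Illustrative numbers only: a long agent run with 2M input and 200k output tokens.
    print(estimate_cost_usd(2_000_000, 200_000))
```

Using `Decimal` rather than floats keeps the arithmetic exact for currency-style values.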

2

OpenAI Launches Prism: Free GPT-5.2-Powered LaTeX Workspace for Research

Product Release, Research Tool, LLM Application

OpenAI has introduced Prism, a scientific collaboration workspace that integrates GPT-5.2-powered writing, revision, citation management, formula processing, and real-time collaboration into a native cloud-based LaTeX environment. It is freely available to all ChatGPT account users with support for unlimited projects and collaborators. Prism emphasizes 'stricter context management,' enabling reasoning and literature retrieval within project contexts, and leverages visual capabilities to convert whiteboard sketches into charts and other assets. OpenAI disclosed that approximately 1.3 million researchers submit over 8 million science- and math-related queries to ChatGPT weekly. The company plans to offer advanced features for commercial, enterprise, and educational users in the future.

3

DeepSeek Open-Sources DeepSeek-OCR 2: Achieves 91.09% on OmniDocBench

Open Source, Vision/OCR, Model Architecture

DeepSeek has released and open-sourced its OCR model DeepSeek-OCR 2. It introduces DeepEncoder V2, which uses 'causal flow'-style, semantics-driven reordering of visual tokens so that documents are processed according to their structure rather than in a fixed left-to-right scan, improving understanding of complex layouts, tables, and formulas. It achieves 91.09% on OmniDocBench v1.5, a 3.73% improvement over the previous version, while keeping the visual token budget between 256 and 1,120 to balance accuracy and inference cost. The model retains a 3B-parameter MoE architecture. It still struggles with high-density text, though cropping and sample augmentation can help. The approach is seen as a significant step toward a unified universal modality encoder.
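The summary does not say whether the 3.73% gain is absolute (percentage points) or relative to the previous score. The sketch below, with hypothetical function names, computes the previous-version score each reading would imply; it is an illustration of the arithmetic, not a statement of which reading DeepSeek intended.

```python
def baseline_if_percentage_points(new_score: float, gain_points: float) -> float:
    """Previous score if the gain is an absolute difference in percentage points."""
    return new_score - gain_points


def baseline_if_relative(new_score: float, gain_percent: float) -> float:
    """Previous score if the gain is relative to the previous score."""
    return new_score / (1 + gain_percent / 100)


if __name__ == "__main__":
    new_score, gain = 91.09, 3.73  # figures quoted above
    print(f"absolute reading: {baseline_if_percentage_points(new_score, gain):.2f}")
    print(f"relative reading: {baseline_if_relative(new_score, gain):.2f}")
```

The two readings differ by less than half a point here, but the distinction matters when comparing against other published results.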

4

Google Introduces Agentic Vision in Gemini 3 Flash: 5–10% Quality Gain

Model Capability, Multimodal, Developer Tool

Google has introduced 'Agentic Vision' in Gemini 3 Flash, upgrading image understanding from single-pass observation to an iterative 'Think-Act-Observe' investigation loop. The model can automatically execute code to perform deterministic operations such as zooming, cropping, counting, and labeling, then answer based on the accumulated evidence, which reduces hallucinations and coarse estimates. Official benchmarks show consistent 5%–10% quality improvements across most vision tasks when code execution is enabled. The system can autonomously detect when details are too small, generate code to magnify key regions, and produce a visual 'scratchpad' with bounding boxes and labels to validate its reasoning. The capability targets developer tools and emphasizes combining visual grounding with verifiable execution.
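The Think-Act-Observe loop described above can be sketched on a toy example. This is not Google's API or implementation: the image is a small grid of 0/1 values, the `think` policy is a hard-coded, hypothetical stand-in for the model's planning step, and the task ("count marked pixels in the top-left region") is chosen only to keep the example short.

```python
Image = list[list[int]]  # toy image: 1 marks an object pixel, 0 is background


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Deterministic 'zoom': return the requested sub-region."""
    return [row[left:left + width] for row in image[top:top + height]]


def count_marked(image: Image) -> int:
    """Deterministic count of marked pixels in the current view."""
    return sum(sum(row) for row in image)


def think(view: Image, evidence: dict[str, int]) -> str:
    """Hypothetical planner; a real system would query the model here."""
    if "count" in evidence:
        return "answer"
    if len(view) > 2:  # details judged too small: zoom in first
        return "zoom"
    return "count"


def count_top_left(image: Image, max_steps: int = 8) -> tuple[int, list[str]]:
    """Run the loop until the planner decides it has enough evidence to answer."""
    view: Image = image
    evidence: dict[str, int] = {}
    scratchpad: list[str] = []  # stands in for the annotated visual scratchpad
    for step in range(max_steps):
        action = think(view, evidence)              # Think
        if action == "zoom":                        # Act: run deterministic code
            half = len(view) // 2
            view = crop(view, 0, 0, half, half)
            scratchpad.append(f"step {step}: zoomed to top-left {half}x{half}")
        elif action == "count":
            evidence["count"] = count_marked(view)  # Observe: record the result
            scratchpad.append(f"step {step}: counted {evidence['count']} marked pixels")
        else:
            return evidence["count"], scratchpad
    raise RuntimeError("step budget exhausted before an answer was reached")


if __name__ == "__main__":
    grid = [[1, 0, 0, 0],
            [1, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]
    answer, notes = count_top_left(grid)
    print(answer, notes)
```

The point of the structure is that the final answer comes from executed code (`crop`, `count_marked`) rather than from a single-pass estimate, and the scratchpad records each step so the reasoning can be checked.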

5

Anthropic Launches MCP Apps and Integrates with Claude: Tool Calls Return Interactive UI

Agent, Protocol/Standard, Product Update

Anthropic has launched the MCP Apps open specification, natively supported in Claude.ai. It lets external tool calls return interactive UI components directly within the chat interface, such as draft Slack messages or Figma and document content, which reduces the friction of copy-pasting and switching between tabs. Multiple sources confirm integrations with tools including Asana, Slack, Figma, Box, and Canva, and developers can extend the ecosystem via MCP. The feature is available to Pro, Max, Team, and Enterprise subscribers. It reflects a shift in agent products from mere tool invocation toward standardized, interactive delivery within conversational interfaces.

6

Tencent HunYuan Open-Sources HPC-Ops Operator Library: Up to 2.22x Faster Attention

Inference Acceleration, Open Source, AI Infra

Tencent HunYuan has open-sourced its inference operator library HPC-Ops, which applies deep CUDA/CuTe optimizations to the core operators of large model inference and reportedly improves end-to-end throughput by about 30%. Its Attention operator achieves up to a 2.22x speedup over FlashInfer, and its GroupGEMM up to 1.88x over DeepGEMM. The library emphasizes instruction issue ordering and prefetch strategy optimizations tailored to mainstream inference GPUs (e.g., H20), pushing bandwidth utilization close to the hardware peak. It also introduces an abstraction layer to reduce the maintenance cost of high-performance kernels, and targets deployment scenarios for models such as HunYuan and DeepSeek.
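A per-operator speedup of 2.22x and an end-to-end gain of about 30% are not in conflict: only part of the runtime is spent in the accelerated operator. A minimal sketch using Amdahl's law illustrates the relationship. The function name and the 40% time fraction in the example are hypothetical and are not figures from Tencent.

```python
def end_to_end_speedup(fraction_accelerated: float, kernel_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the runtime gets faster.

    fraction_accelerated: share of original runtime spent in the optimized kernel (0..1).
    kernel_speedup: how much faster that kernel becomes (e.g. 2.22).
    """
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    remaining = 1.0 - fraction_accelerated
    return 1.0 / (remaining + fraction_accelerated / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical workload: 40% of runtime in attention, accelerated 2.22x.
    print(f"{end_to_end_speedup(0.40, 2.22):.2f}x overall")
```

In practice several operators are optimized at once and their shares of runtime vary by model and batch size, so this is a way to reason about the numbers rather than a reproduction of the reported result.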

7

StepFun Completes Over 5 Billion RMB B+ Round, Yin Qi Joins as Chairman

Funding, Organizational Change, Edge/Hardware

StepFun has disclosed the completion of a B+ financing round exceeding 5 billion RMB and has brought in Megvii founder Yin Qi as chairman, forming a '1+3' core decision-making team. Reports indicate the company will deeply integrate foundation model capabilities with terminal hardware to advance its 'AI+Device' strategy, while raising AI Infra investment to equal importance with algorithm development, with a focus on training efficiency, cluster stability, and resource utilization. The leadership changes emphasize engineering excellence and product-market fit. The aim is to embed large models into high-frequency entry points (e.g., vehicles, smartphones) to create hardware-software closed loops, and to use long-term infra capabilities to gain cost and efficiency advantages in future model iterations.

8

EU Launches DMA Rulemaking Procedure: Requires Google to Open Gemini and Search Data

Regulatory Policy, Antitrust/DMA, Data Access

The European Commission has initiated the Digital Markets Act (DMA) rulemaking procedure, requiring Google to provide competitors with access to its Gemini AI service and search data, aiming to prevent abuse of dominant positions and ensure fair market competition. While not a formal antitrust investigation, the process must conclude within six months and will result in enforceable draft measures. The EU is particularly examining whether Google fairly provides third parties with search data usable for training AI chatbots. Google has expressed concerns about potential impacts on privacy and innovation. This move signals the EU's intent to enforce 'interoperability/open interfaces' as a key regulatory constraint on major platforms in the AI and data domains, potentially reshaping competition in search and generative AI.

9

Node-Based AI Design Tool Flora Raises $42M in Series A

Funding, Generative AI Application, Creative Tool

AI design tool Flora has raised $42 million in a Series A round led by Redpoint Ventures. The product organizes creative workflows on a node-based canvas: designers generate and iterate on multimedia content from text, image, or video inputs, and visual nodes connect the content generation, editing, and collaboration steps, reducing cross-tool switching costs for creative teams. The company counts several large enterprises and design studios among its clients. Proceeds will fund expansion of enterprise sales and marketing, enhancement of traditional editing features, and growth of the team to 2–3 times its current size. The funding reflects a shift in 'generative design workstations' from isolated model calls toward reusable workflow orchestration and team collaboration platforms.

