Wednesday, March 11, 2026
10 stories, 3 min read

Today's Highlights

1

OpenAI Acquires Promptfoo to Complete Agent Safety Evaluation Toolchain

Acquisition, AI Security, Evaluation Tools

OpenAI announced the acquisition of Promptfoo, an open-source LLM evaluation and red-teaming platform. The company plans to integrate Promptfoo's capabilities, including automated regression testing, attack test case generation, jailbreak and prompt injection testing, and CI/CD integration, into its own models and infrastructure. The goal is to catch agent operational risks such as privilege escalation and tool misuse earlier in release and update cycles, and to strengthen security and compliance for enterprise platforms. Promptfoo will remain open source and continue serving existing users and customers. Terms of the deal were not disclosed.

2

Google Integrates Gemini Deeply into Workspace, Sheets Achieves 70.48% Benchmark Success Rate

Product Update, Office Suite, Benchmark

Google announced expanded Gemini capabilities across Google Workspace. Docs now features 'Help me create', which generates formatted drafts that match a writing style or template. Sheets supports natural language table generation and population, achieving a 70.48% task success rate on SpreadsheetBench. Slides will support full presentation generation from prompts. Drive introduces 'Ask Gemini' for Q&A across files, email, and calendar, with cited sources. The features are rolling out in beta, in English, to AI Ultra/Pro subscribers, with Drive initially limited to the U.S.

3

Google Releases Gemini Embedding 2, Unifying Text, Image, Audio, and Video Vectors

Model Release, Vector Retrieval, Multimodal

Google released Gemini Embedding 2, its first natively multimodal embedding model capable of mapping text, images, video, audio, and documents into a unified semantic space for cross-modal retrieval and RAG applications. The model supports interleaved inputs (e.g., image + text in one request), covers over 100 languages, and introduces Matryoshka Representation Learning, enabling default 3072-dimensional vectors to be reduced down to 768 dimensions to balance accuracy and cost. Google claims native multimodality can reduce latency by up to 70%.
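For readers unfamiliar with Matryoshka-style embeddings, the dimension reduction mentioned above usually amounts to keeping a prefix of the vector. The sketch below is illustrative only: it does not call any Gemini API, the helper names `truncate_embedding` and `cosine` are hypothetical, and the random vectors are stand-ins for real model output. It also assumes the common practice of rescaling the shortened vector to unit length, which the announcement summary does not confirm.

```python
import math
import random


def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and rescale to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Stand-in data: random vectors, NOT real model output.
rng = random.Random(0)
doc = [rng.gauss(0.0, 1.0) for _ in range(3072)]
query = [d + rng.gauss(0.0, 0.5) for d in doc]  # a noisy copy of doc

full = cosine(query, doc)
small = cosine(truncate_embedding(query, 768), truncate_embedding(doc, 768))
print(f"3072-d similarity: {full:.3f}")
print(f" 768-d similarity: {small:.3f}")
```

With a model trained using Matryoshka Representation Learning, the leading dimensions are trained to carry most of the signal, so the shorter vector trades a little accuracy for roughly a quarter of the storage. Random vectors like the ones here have no such ordering and only demonstrate the mechanics.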

4

NVIDIA Signs 1GW Vera Rubin Training Cluster Deal with Thinking Machines

Infrastructure, Partnership, Compute

NVIDIA and Thinking Machines Lab announced a long-term strategic partnership to deploy at least one gigawatt of next-generation NVIDIA Vera Rubin systems for frontier model training and customizable AI platforms, including joint co-design of training/inference systems tailored to the Rubin architecture. NVIDIA is also investing in Thinking Machines Lab to support its long-term R&D. Deployment is scheduled to begin early next year, aiming to expand access to cutting-edge and open models for enterprises, researchers, and the scientific community.

5

Meta Acquires AI Agent Social Platform Moltbook, Founders Join MSL

Acquisition, AI Agents, Security Incident

Meta confirmed the acquisition of Moltbook, a Reddit-like social platform built around communities of AI agents, which previously gained attention for viral posts about 'AI-concocted conspiracies.' It was later revealed that the platform's Supabase database credentials had been exposed for an extended period, allowing external actors to impersonate agents and access sensitive data. Co-founders Matt Schlicht and Ben Parr will join Meta Superintelligence Labs. Financial terms were not disclosed; media reports suggest the deal is expected to close by mid-March. Meta stated it will explore new product forms involving interconnected agents.

6

Anthropic Launches Claude Code Review, Multi-Agent PR Review Priced at $15–25 per Use

Developer Tools, AI Agents, Code Review

Anthropic launched Claude Code Review for Claude Code: a multi-agent system that automatically checks GitHub pull requests for logical flaws and potential vulnerabilities, providing actionable modification suggestions ranked by severity. Internal testing showed the proportion of 'substantive comments' increased from 16% to 54%. The feature is currently available to Team and Enterprise users, with each review costing approximately $15–25. It aims to shift code review workflows from manual line-by-line inspection to AI-first screening followed by human validation.

7

vLLM Releases Semantic Router v0.2 Athena, Up to 3.3x Faster on MI300X

Open Source, Inference Infrastructure, Multi-Agent

vLLM released Semantic Router v0.2 Athena, repositioning semantic routing from simple request dispatching to the 'brain' of multi-model and multi-agent systems. Updates include a refreshed embedding stack (mmBERT-Embed-32K, unified multimodal embeddings) and core model selection capabilities with strategies such as KNN, SVM, Elo, and Thompson Sampling. An experimental ClawOS feature enables natural language orchestration of OpenClaw agent teams. Enhancements also cover hybrid retrieval, memory, RAG, and long-context prompt compression. ROCm is now a first-class deployment path, with performance up to 3.3x faster on MI300X.
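To give a sense of what one of the named selection strategies involves, here is a minimal sketch of Thompson Sampling applied to choosing among models. It is a generic textbook version (a Beta-Bernoulli bandit), not the Semantic Router's actual implementation; the `ThompsonRouter` class, the model names, and the success rates are all hypothetical.

```python
import random


class ThompsonRouter:
    """Beta-Bernoulli Thompson Sampling over a fixed set of model names."""

    def __init__(self, models, seed=None):
        self.rng = random.Random(seed)
        # One [successes + 1, failures + 1] pair per model: a Beta(1, 1) prior.
        self.stats = {name: [1, 1] for name in models}

    def choose(self):
        """Sample a plausible success rate per model and pick the highest."""
        draws = {
            name: self.rng.betavariate(alpha, beta)
            for name, (alpha, beta) in self.stats.items()
        }
        return max(draws, key=draws.get)

    def update(self, name, success):
        """Record whether the chosen model's answer was judged good."""
        self.stats[name][0 if success else 1] += 1


# Simulation with made-up success rates (hypothetical, for illustration only).
true_rates = {"model-a": 0.9, "model-b": 0.5, "model-c": 0.3}
router = ThompsonRouter(true_rates, seed=0)
world = random.Random(1)
counts = {name: 0 for name in true_rates}

for _ in range(2000):
    pick = router.choose()
    counts[pick] += 1
    router.update(pick, world.random() < true_rates[pick])

print(counts)
```

Because each choice is drawn from the current belief about every model, weaker models still get occasional traffic early on, and the router shifts toward the strongest one as feedback accumulates.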

8

YouTube Expands AI Deepfake Detection Pilot to Cover Politicians, Officials, and Journalists

Content Safety, Regulation, Deepfakes

YouTube is expanding its AI-generated deepfake detection pilot program to include government officials, political candidates, and journalists. Participants must submit a selfie and government-issued ID for identity verification, after which they can view content the platform has flagged as suspected AI impersonation and request removal under YouTube's policies. Content involving parody or criticism remains protected. The detection tool was initially rolled out to around 4 million partner creators. YouTube also expressed support for the NO FAKES Act and indicated future expansion may include voice recognition and broader identity and IP protection.

9

Google Gemini Agent Approved for U.S. Military Non-Classified Missions, Reaching 3 Million Government Users

Government Procurement, AI Agents, Policy

Reports indicate Google has expanded its collaboration with the U.S. Department of Defense, with Gemini agent tools now approved for non-classified missions. The 'Agent Designer' feature allows government employees to create custom agents using natural language through low-code/no-code interfaces, for tasks such as document generation, review, and workflow automation. The tools are available to over 3 million government users, more than one million of whom have already accessed generative AI services via GenAI.mil. The DoD stated that current usage is limited to unclassified information and that it is discussing with Google potential deployment on classified and top-secret networks.

10

GitHub Releases Copilot SDK, Embedding Agent Execution Engine into Applications

Developer Tools, AI Agents, Protocol

GitHub introduced the Copilot SDK, emphasizing AI's shift from conversational interfaces to programmable 'execution layers': developers can now embed the same planning and execution engine used by Copilot CLI directly into applications or backend services, enabling systems to receive high-level intents and invoke tools autonomously instead of relying on hardcoded scripts. The post advocates using structured context protocols like MCP rather than stuffing system logic into lengthy prompts, which improves traceability and robustness and reduces the cost of building custom orchestration stacks.

