OpenAI Releases GPT-5.5 Instant with 52.5% Lower Hallucination Rate as New Default ChatGPT Model
Model Release · OpenAI
OpenAI has released GPT-5.5 Instant as the new default model for ChatGPT, replacing GPT-5.3 Instant. It reduces hallucination rates by 52.5% in high-risk domains such as medicine, law, and finance, and cuts inaccuracies in user-flagged conversations by 37.3%. Its AIME 2025 math score improves from 65.4 to 81.2, and its MMMU-Pro multimodal reasoning score rises from 69.2 to 76.0. Responses are roughly 30% more concise, and a new memory-source feature makes the context behind an answer traceable. Personalization features are initially available to Plus and Pro users. This is also the first Instant model classified as high-capability in the cybersecurity and biochemical domains. The model can be accessed via the chat-latest API alias, while GPT-5.3 Instant will remain available for three months.
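The chat-latest alias means integrations pick up the new default model without code changes. A minimal sketch of what such a request body might look like, assuming an OpenAI-style chat schema (the field names and helper are illustrative assumptions, not confirmed by the article):

```python
# Hypothetical request payload for the "chat-latest" alias mentioned above.
# The schema follows the common OpenAI-style chat format; field names are
# assumptions for illustration only.
import json


def build_chat_request(user_message: str, model: str = "chat-latest") -> dict:
    """Construct a chat-completion-style request body."""
    return {
        # "chat-latest" resolves to the current default (GPT-5.5 Instant, per the article)
        "model": model,
        "messages": [
            {"role": "user", "content": user_message},
        ],
    }


payload = build_chat_request("Summarize today's AI news.")
print(json.dumps(payload, indent=2))
```

Pinning `model` to an explicit version string instead of the alias would opt out of automatic upgrades during the three-month overlap window.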
Microsoft, Google, xAI Agree to Pre-Release AI Model Reviews with U.S. Government Amid Mythos Crisis Driving Regulatory Shift
AI Regulation · Policy
The U.S. NIST announced that Microsoft, Google, and xAI will allow the Commerce Department's Center for AI Safety and Innovation (CAISI) to conduct pre-release safety evaluations of their AI models, joining OpenAI and Anthropic in a voluntary review framework that now covers most leading AI labs worldwide. The move follows concerns raised by Anthropic's Mythos model, which demonstrated autonomous discovery and exploitation of zero-day vulnerabilities. CAISI has completed over 40 model assessments, in some cases testing versions with safety protections removed inside classified environments. The White House is considering an executive order to formalize the review process, signaling a potential shift from its previously hands-off regulatory stance.
DeepSeek Launches V4 Preview: 1.6 Trillion Parameters, Million-Token Context, API Cost One-Sixth of Competitors
Model Release · Open Source
DeepSeek launched the V4 preview on May 5, featuring Pro and Flash versions, both supporting up to 1 million tokens of context. V4-Pro is a 1.6 trillion-parameter Mixture-of-Experts (MoE) model whose agentic coding and reasoning performance approaches that of top closed-source models, yet its API cost is only one-sixth that of Claude Opus 4.7 and GPT-5.5. The model uses a novel attention mechanism, DSA, to significantly reduce the computational cost of long contexts, and employs RLSD training, which combines reinforcement learning with self-distillation, for improved efficiency. The model is released under the MIT open-source license, with weights available on Hugging Face and ModelScope. The legacy models deepseek-chat and deepseek-reasoner will be deprecated on July 24, 2026.
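Integrations calling the legacy model IDs will need a migration path before the July 24, 2026 cutoff. A sketch of a simple guard, noting that the replacement IDs used here ("deepseek-v4-pro", "deepseek-v4-flash") are hypothetical placeholders, since the article does not state the new API model names:

```python
# Deprecation guard for the July 24, 2026 cutoff mentioned in the article.
# The V4 replacement IDs below are hypothetical placeholders, not confirmed names.
from datetime import date

DEPRECATION_DATE = date(2026, 7, 24)

LEGACY_TO_V4 = {
    "deepseek-chat": "deepseek-v4-flash",    # hypothetical mapping
    "deepseek-reasoner": "deepseek-v4-pro",  # hypothetical mapping
}


def resolve_model(requested: str, today: date) -> str:
    """Return the requested model ID, swapping deprecated IDs after the cutoff."""
    if today >= DEPRECATION_DATE and requested in LEGACY_TO_V4:
        return LEGACY_TO_V4[requested]
    return requested
```

Centralizing the model name in one resolver like this makes the eventual switch a one-line change rather than a hunt through every call site.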
Ai2 Releases Open-Source Robotics Model MolmoAct 2, 37x Faster Than Physical Intelligence
Robotics · Open Source
The Allen Institute for AI (Ai2) has released MolmoAct 2, an open-source robotics model based on the Molmo 2-ER architecture, with inference speed 37 times faster than its predecessor, requiring only 450 milliseconds per action call. The model outperforms Physical Intelligence's π0.5 across both simulated and real-world environments and natively supports bimanual coordination. Ai2 also released MolmoAct 2-Bimanual YAM, the largest open-source bimanual manipulation dataset to date, containing over 720 hours of training data. The model is already being used at Stanford Medical School’s Cong Lab for automating CRISPR gene-editing experiments. Full model, code, and data are open-sourced.
Anthropic Commits $200 Billion to Google Cloud Over Five Years for Computing and Chips
Cloud Computing · Investment
According to The Information, Anthropic has committed to spending $200 billion on Google Cloud over the next five years, accounting for over 40% of the revenue backlog Alphabet disclosed to investors last week. As part of the agreement, Anthropic secured multi-gigawatt TPU capacity through deals with Google and Broadcom in April, with deliveries expected to begin in 2027, and Alphabet will invest up to $4 billion in Anthropic. Contracts from Anthropic and OpenAI currently account for more than half of the $2 trillion in backlogged orders across the major cloud providers AWS, Azure, and Google Cloud. Claude models run on diverse hardware, including AWS Trainium, Google TPUs, and Nvidia GPUs.
Subquadratic Raises $29M Seed Round, Launches SubQ Model with 12M-Token Context
Funding · Model Release
Subquadratic has completed a $29 million seed round and unveiled SubQ, a large language model with a fully sparse attention architecture that breaks the traditional Transformer's quadratic computation bottleneck, enabling a context window of up to 12 million tokens. At 1 million tokens, it achieves over 50x speedup and a 50x cost reduction; at 12 million tokens, computational demands drop nearly 1000x. On the RULER 128K benchmark, it achieves 95% accuracy at a cost of $8, versus roughly $2,600 for Claude Opus at 94%. The company also launched the SubQ API and SubQ Code, a command-line programming agent. The research community remains cautious, as no peer-reviewed paper has yet been published.
UBTech Unveils Embodied AI World Model Thinker-WM, Tops Libero Benchmark
Embodied AI · Model Release
UBTech has launched Thinker-WM, an embodied AI world model built on a Diffusion Transformer with a unified multimodal spatial architecture, co-optimizing video representations and the robotic action space through iterative Flow Matching. It ranks first on the authoritative Libero benchmark, surpassing comparable models from NVIDIA, Physical Intelligence, and Xiaomi. The model offers scene extrapolation and future-state prediction, addressing error accumulation in long-horizon tasks. UBTech has also built a virtual-physical co-training data flywheel that significantly reduces data collection costs. The model will be open-sourced on the Thinker-Cosmos developer platform to accelerate humanoid robot deployment in industrial applications.
IBM Think 2026 Unveils Enterprise AI Operating System Blueprint, watsonx.data Cuts Costs by 83%
Enterprise AI · IBM
At IBM Think 2026, IBM unveiled a blueprint for an enterprise AI operating system comprising four core systems: watsonx Orchestrate for multi-agent coordination, real-time data infrastructure built via its acquisition of Confluent, IBM Concert for intelligent hybrid cloud operations, and IBM Sovereign Core for data sovereignty control. Key advances include GPU-accelerated queries in watsonx.data achieving 83% cost savings and 30x better price-performance in a Nestlé pilot, IBM Bob as an enterprise-grade agent development partner, and Concert Secure Coder for real-time secure coding. IBM emphasizes that leading enterprises should restructure business operations rather than merely deploying more AI.
Google Gemini API File Search Upgraded to Multimodal, Supporting Image Retrieval and Page-Level Citations
Developer Tools · Google
Google DeepMind has introduced three major upgrades to the Gemini API file search tool: multimodal processing, custom metadata filtering, and page-level citations. The multimodal capability, powered by Gemini Embedding 2, enables joint processing of images and text, allowing visual content retrieval via natural language descriptions. Custom metadata allows users to add key-value tags to files and filter during queries, improving retrieval efficiency. The new page citation feature returns the exact page number in the original document where answers originate, enhancing verifiability. Storage and embedding generation remain free. These updates aim to help developers build more efficient RAG systems.
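The metadata-filtering idea described above can be illustrated with a small local sketch: files carry key-value tags, and a query filters on those tags before retrieval ever touches the content. This is a conceptual illustration only, not the Gemini API's actual client code or schema:

```python
# Conceptual sketch of key-value metadata filtering for file search.
# Field names ("metadata", "name") are illustrative, not the Gemini API schema.

def filter_files(files: list[dict], **tags: str) -> list[dict]:
    """Keep only files whose metadata matches every requested key-value tag."""
    return [
        f for f in files
        if all(f.get("metadata", {}).get(k) == v for k, v in tags.items())
    ]


corpus = [
    {"name": "q3_report.pdf", "metadata": {"department": "finance", "year": "2026"}},
    {"name": "roadmap.pdf", "metadata": {"department": "product", "year": "2026"}},
]

# Restrict retrieval to finance documents before any semantic search runs:
print(filter_files(corpus, department="finance"))
```

Narrowing the candidate set with cheap tag matches first is what makes this kind of pre-filtering improve both retrieval precision and cost in a RAG pipeline.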
Study Finds 91% of Autonomous AI Agents Vulnerable to Toolchain Attacks, 770K Agents Simultaneously Compromised
AI Security · Research
A new study analyzing 847 deployed autonomous AI agents found that 91% are vulnerable to toolchain attacks, 89.4% exhibit goal drift within about 30 steps, and 94% of memory-augmented agents are susceptible to poisoning attacks. The OpenClaw/Moltbook incident provided the first large-scale validation of these risks: 770,000 active agents were compromised simultaneously via a single database vulnerability, each with privileged access to host machines, email, and files. Snyk audits revealed that 13.4% of agent skills had critical security flaws, with 76 confirmed malicious payloads. The study concludes that agents are more vulnerable than stateless LLMs across multiple dimensions.