MiniMax Open Sources M2.5, SWE-Bench Verified 80.2%
Model ReleaseOpen SourceAI Coding
MiniMax launched its text model M2.5 on February 12 and open-sourced it globally on February 13, positioning it as a production-grade model built natively for agents. It scored 80.2% on SWE-Bench Verified and 51.3% on Multi-SWE-Bench. The Lightning version delivers over 100 TPS output speed, with input priced at roughly $0.3 per million tokens and output at roughly $2.4 per million tokens; tool-calling and search capabilities improved by 20%. The company attributes roughly 40x training acceleration to its Forge framework and large-scale agent reinforcement learning.
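At the quoted rates, per-request cost is simple arithmetic. A minimal sketch, using the announced prices; the token counts in the example are made-up values for illustration, not MiniMax figures:

```python
# Per-request cost at the quoted M2.5 Lightning rates. Prices come from
# the announcement above; the example token counts are hypothetical.

INPUT_PRICE_PER_M = 0.3   # USD per 1M input tokens (quoted)
OUTPUT_PRICE_PER_M = 2.4  # USD per 1M output tokens (quoted)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the quoted rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A hypothetical agent turn: 50k tokens of context in, 5k tokens out.
print(f"${request_cost(50_000, 5_000):.4f}")  # $0.0270
```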
Anthropic Completes $30 Billion Funding Round, Valued at $380 Billion
FundingLarge Model
Anthropic announced the completion of a new $30 billion funding round at a post-money valuation of $380 billion, led by Coatue and Singapore's sovereign wealth fund GIC, with participation from 38 investors including Microsoft and NVIDIA. The company disclosed annualized revenue of approximately $14 billion and more than 500 enterprise customers, eight of which are in the Fortune Top 10. Funds will go toward frontier research, product development, and scaling compute infrastructure. Anthropic also pledged to offset electricity price increases caused by its data center expansion so that power costs are not passed on to residential users.
Ant Group Open Sources Ring-2.5-1T, Self-Evaluated 35/42 on IMO
Open SourceReasoning ModelLong Context
Ant Group open-sourced its reasoning model Ring-2.5-1T, built on a hybrid attention architecture that interleaves MLA and Lightning Linear layers at a 1:7 ratio and is designed for long-context reasoning. Documentation indicates that once generation length exceeds 32K tokens, memory access drops to one-tenth of the previous generation's while generation throughput triples. Training adds dense rewards on top of RLVR and uses fully asynchronous agent reinforcement learning to strengthen long-horizon task planning and tool collaboration. In self-evaluation the model scored 35/42 on IMO 2025, which the company claims falls within gold-medalist range.
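The 1:7 ratio implies a repeating layer schedule. A minimal sketch, assuming the common interpretation of one full-attention (MLA) layer per seven linear-attention layers; the exact interleaving used by Ring-2.5-1T is not specified in the announcement, so this ordering is an illustrative assumption:

```python
# A 1:7 hybrid stack: seven linear (Lightning Linear) layers followed by
# one full-attention (MLA) layer, repeated. The actual ordering inside
# Ring-2.5-1T may differ; this is illustration only.

def hybrid_schedule(num_layers: int, linear_per_mla: int = 7) -> list[str]:
    """Per-layer attention type for a repeating (N linear + 1 MLA) stack."""
    pattern = ["linear"] * linear_per_mla + ["mla"]
    return [pattern[i % len(pattern)] for i in range(num_layers)]

layers = hybrid_schedule(32)
print(layers.count("linear"), layers.count("mla"))  # 28 4
```

The payoff of such a schedule is that only the sparse MLA layers keep a full KV cache, which is where the claimed memory-access reduction at long generation lengths comes from.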
DeepMind Introduces Aletheia, Achieves 95.1% on IMO-ProofBench Advanced
AI AgentResearch Progress
Google DeepMind released Aletheia, a mathematical research agent built on an enhanced Gemini Deep Think that uses a three-phase 'generate-verify-revise' agent loop to improve proof accuracy. It achieved 95.1% on IMO-ProofBench Advanced, surpassing the prior record of 65.7%. The team reported that scaling inference-time compute cut the computational cost of solving Olympiad problems 100-fold relative to the 2025 version. Aletheia independently resolved four open problems from the Erdős problem set and proposed a hierarchical framework for assessing AI autonomy in mathematical research.
vLLM Discloses DeepSeek Performance of 7360 TGS per GB300 Card
Inference AccelerationComputeEngineering Practice
vLLM published performance benchmarks for running DeepSeek-V3.2 and DeepSeek-R1 on NVIDIA GB300 (Blackwell Ultra). Under NVFP4 quantization with TP2 parallelism, V3.2 reaches 7360 TGS (tokens per GPU per second) in prefill and 2816 TGS in mixed scenarios (2k input / 1k output); R1 reaches 22476 TGS prefill and 3072 TGS mixed on 2×GB300 with an EP2 configuration. The article notes that GB300 offers roughly 8x prefill and 10–20x mixed throughput gains over Hopper, and discusses prefill/decode disaggregation to improve latency and throughput under high concurrency.
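Reading TGS as tokens per GPU per second, the quoted figures translate directly into daily volume. A back-of-the-envelope sketch on the quoted numbers:

```python
# Daily token volume implied by a TGS (tokens per GPU per second) figure.
# Assumes sustained throughput with no batching gaps, so this is an
# upper bound, not a capacity-planning number.

SECONDS_PER_DAY = 86_400

def tokens_per_day(tgs: float, gpus: int = 1) -> float:
    """Aggregate tokens processed per day at a given per-GPU rate."""
    return tgs * gpus * SECONDS_PER_DAY

# V3.2 prefill on a single GB300 at the quoted 7360 TGS:
print(f"{tokens_per_day(7360) / 1e9:.2f}B tokens/day")  # 0.64B tokens/day
```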
GitHub Previews Agentic Workflows, Markdown-Defined Agents Run in Actions
AI AgentEngineering Practice
GitHub unveiled a technical preview of Agentic Workflows, which lets developers describe intent in Markdown and have AI agents execute repository maintenance tasks in GitHub Actions, such as triaging issues, updating documentation, and suggesting code simplifications, positioning the approach as 'Continuous AI'. To mitigate overreach and errors, workflows default to read-only permissions; any write action must map through 'safe outputs' onto pre-approved, auditable GitHub operations, keeping humans in the approval loop. GitHub emphasized that agents run in sandboxed environments with explicit declarations of permissions, tools, and allowed outputs.
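The 'safe outputs' idea amounts to an allow-list between the agent and anything that writes. A minimal sketch of that gating pattern; the operation names and policy shape below are assumptions for illustration, not GitHub's actual Agentic Workflows schema:

```python
# Allow-list gating for agent-proposed writes: the agent runs read-only,
# and a proposed write only proceeds if it maps to a pre-approved
# operation. Operation names here are hypothetical.

SAFE_OUTPUTS = {"create-issue-comment", "add-labels", "open-pull-request"}

def apply_agent_output(action: str, payload: dict) -> dict:
    """Gate a proposed write behind the allow-list; never run raw writes."""
    if action not in SAFE_OUTPUTS:
        return {"status": "rejected", "reason": f"{action!r} is not a safe output"}
    # A real system would enqueue this for an auditable API call,
    # typically still subject to human review.
    return {"status": "queued", "action": action, "payload": payload}

print(apply_agent_output("delete-branch", {})["status"])                # rejected
print(apply_agent_output("add-labels", {"labels": ["bug"]})["status"])  # queued
```

The design choice is that the model never holds write credentials at all; only the fixed mapping layer does, which bounds the blast radius of prompt injection.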
AWS Adds Proxy and Persistent Profile Support to AgentCore Browser
Cloud ServiceAI AgentEnterprise Deployment
AWS enhanced Amazon Bedrock AgentCore Browser with enterprise-grade browsing capabilities: proxy routing support ensures stable egress IPs and satisfies corporate network compliance; persistent browser profiles retain cookies and local storage across sessions, reducing repeated logins; and Chrome extensions hosted in S3 can be loaded to customize page-processing logic. AWS also introduced a tiered routing priority (bypass mode, domain rules, default proxy) for granular control over traffic routing and data boundaries, improving agent usability and security in real-world web workflows.
Sophos: OpenClaw Exposes Over 30K Instances, Agent Security Concerns Rise
AI SecurityAI Agent
Sophos warned of enterprise security risks around OpenClaw (also known as Moltbot/Clawdbot): researchers found over 30,000 OpenClaw instances exposed online, and attackers are already discussing repurposing its 'skills' for botnets. Threats include malicious skills or prompt injection compromising the local host, and agents shuttling sensitive data between trusted and untrusted systems, creating data-leakage chains. The report recommends that enterprises prohibit direct use or restrict execution to sandboxes containing no sensitive data, while establishing approved skill marketplaces, dedicated LLM access layers, and session isolation to mitigate the risks of combining executable tools, external connectivity, and untrusted content.
Pete Warden Open Sources Streaming ASR, 245M Parameters with 6.65% WER
SpeechOpen Source
Pete Warden announced in an OpenAI developer community forum post that he has open-sourced a streaming speech-to-text (STT) model and runtime library for real-time speech recognition. Its largest model has roughly 245 million parameters and achieves a 6.65% word error rate (WER) on the Hugging Face OpenASR leaderboard, beating the 7.44% WER of Whisper Large v3 despite the latter's roughly 1.5 billion parameters. Because of forum posting limits he did not attach the repository link directly; resources are collected in the pinned blog post on his personal site. The release sparked community discussion of lightweight, low-latency ASR for real-time agents and voice interaction.
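For readers comparing the WER figures: WER is the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. A self-contained sketch with toy strings, unrelated to the leaderboard's actual scoring harness:

```python
# Word error rate: word-level Levenshtein distance over the reference length.

def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))          # DP row: distances vs. empty ref
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i               # prev holds the diagonal cell
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,            # deletion (ref word dropped)
                      d[j - 1] + 1,        # insertion (extra hyp word)
                      prev + (r != h))     # substitution, or free match
            prev, d[j] = d[j], cur
    return d[-1] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") in 6 words:
print(f"{wer('the cat sat on the mat', 'the cat sit on mat'):.3f}")  # 0.333
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why it is an error rate rather than an accuracy.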
Musk Announces xAI Restructuring, Grok Line Reorganization and Layoffs
Company NewsOrganizational Change
According to Business Insider Japan, Musk announced organizational restructuring at xAI during an all-hands meeting on the evening of February 10: establishing new leadership structures for Grok, Grok Voice, Grok Code, and Grok Imagine, and adjusting management of the white-collar automation project 'Macrohard'. Following reports of multiple co-founders departing, additional staff exits and headcount reductions occurred. Internal sources indicated Musk expressed dissatisfaction with progress on certain projects and pushed for team downsizing. This move comes about a week after SpaceX's acquisition of xAI, with reports suggesting the integrated company plans to pursue an IPO within 2026.