Saturday, January 31, 2026
9 stories · 3 min read

Today's Highlights

1

Mistral Releases Mistral 3 Open-Source Multimodal Family, Large 3 Totals 675B/Activates 41B

Model Release · Open Source

Mistral has launched the new generation of its open-source multimodal model family, Mistral 3, emphasizing enterprise control and customizable deployment. The flagship Mistral Large 3 uses a MoE architecture with approximately 675 billion total parameters and 41 billion activated parameters. The release also includes Ministral 14B/8B/3B models, covering efficient inference, complex multimodal tasks, and edge scenarios. The series supports over 40 languages and features image-text understanding capabilities, operable in local, cloud, or hybrid environments, with an emphasis on auditability and reduced vendor lock-in.

Read full article
2

OpenAI to Retire GPT-4o and Three Other Models from ChatGPT on February 13, Daily Active Usage at Only 0.1%

Large Model · Product Adjustment

OpenAI announced it will retire GPT-4o, GPT-4.1, GPT-4.1 mini, and o4-mini from ChatGPT on February 13, 2026. The company stated that currently only 0.1% of daily users still choose GPT-4o; after the launch of GPT-5, some paying users briefly requested its reinstatement. This adjustment aims to reduce the maintenance cost of running older models in parallel, streamline the available model lineup within ChatGPT, and drive user and developer migration toward newer models and product pathways.

Read full article
3

Study: 175,000 Ollama Hosts Exposed Publicly Worldwide, 7.23 Million Activities Detected Over 293 Days

Security · Infrastructure

SentinelOne and Censys disclosed that approximately 175,000 Ollama hosts are publicly exposed globally, with 7.23 million activities recorded over 293 days across 130 countries and 4,032 autonomous systems. About half of these hosts can execute code or invoke external APIs, and at least 201 run prompt templates with security restrictions removed. The report indicates China accounts for about 30% of the exposed hosts and the US for over 20%; this 'low-cost, abuse-prone' inference infrastructure could be exploited to generate spam, phishing, and disinformation, with the resource and billing costs borne by the exposed hosts.

Read full article
4

DeepSeek Plans V4 Launch Within Weeks: Million-Token Context and Enhanced Code Generation

Large Model · R&D Update

Media reports indicate DeepSeek plans to release its flagship model V4 within the coming weeks, likely around mid-February. V4 is expected to support a context window of up to one million tokens, enabling single-pass parsing of medium-sized codebases, technical documentation, and requirements specifications, alleviating context fragmentation in complex development workflows. Reports suggest improvements to continual-training mechanisms that reduce degradation in comprehension and enhance abstraction learning, with internal benchmarks reportedly showing code generation superior to some mainstream models. V4 may also introduce new architectures like mHC to improve parallel efficiency and better support Chinese domestic chips, reducing deployment costs and inference latency.

Read full article
5

BIGAI and Peking University Release TongGeometry: Solves Nearly 25 Years of IMO Geometry Problems in 38 Minutes on Single GPU

Research · Reasoning

Beijing Institute of General Artificial Intelligence (BIGAI), Peking University, and collaborators published TongGeometry in Nature Machine Intelligence, using a neuro-symbolic architecture with guided tree search to achieve both autonomous problem generation and automatic solving. The team claims the system can solve nearly 25 years of International Mathematical Olympiad (IMO) geometry problems within 38 minutes on a single consumer-grade GPU. A value function that simulates 'mathematical aesthetics' enables it to actively create new problems: three original geometry problems generated by the system have been selected for the 2024 Chinese National High School Mathematics Competition and elite U.S. olympiads, marking a case of AI-generated problems entering high-level human competitions.

Read full article
6

New arXiv Paper Proposes Scalable Power Sampling: Training-Free Inference Optimization That Cuts Latency by 10x+

Paper · Inference Optimization

A new arXiv paper introduces Scalable Power Sampling, an inference strategy that requires no training, external rewards, or verifiers. It approximates sampling from the power distribution via per-token low-temperature scaling, sharpening the output distribution during decoding. The paper argues that gains from reinforcement learning (RL) post-training stem more from 'sharpening' the model's existing distribution than from acquiring new capabilities, making inference-time sampling a viable alternative to some post-training. The authors claim the method matches or exceeds single-step GRPO gains across math, question-answering, and coding tasks on four different LLMs; compared to MCMC-based power sampling, it reduces inference latency by over 10x with significantly lower computational overhead.
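The per-token trick rests on a simple identity: raising a softmax distribution to a power α and renormalizing is the same as dividing the logits by a temperature T = 1/α. A minimal sketch of that idea (illustrative only; the paper's actual decoding loop, exponent schedule, and scaling details are not reproduced here):

```python
import numpy as np

def power_sample_token(logits, alpha=2.0, rng=None):
    """Sample one token from p(x)^alpha, renormalized.

    Per token, p(x)^alpha / Z equals softmax(logits * alpha),
    i.e. sampling at temperature T = 1/alpha. With alpha > 1 the
    distribution sharpens around its mode -- no retraining needed.
    """
    rng = rng or np.random.default_rng()
    scaled = logits * alpha
    scaled = scaled - scaled.max()          # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(len(logits), p=probs))

# Demonstrate the sharpening effect on a toy logit vector.
logits = np.array([2.0, 1.0, 0.5, 0.1])
base = np.exp(logits - logits.max()); base /= base.sum()
sharp = np.exp(2 * logits - 2 * logits.max()); sharp /= sharp.sum()
assert sharp[0] > base[0]   # the mode gains probability mass
```

In a real decoder this would replace the standard sampling step at each token, which is why it adds essentially no latency compared to MCMC-style approaches that draw and score many full sequences.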

Read full article
7

EU AI Act Enforcement May Be Centralized Under AI Office, Italy, Germany Demand Limits

Policy · Regulation

According to MLex, the European Commission plans to strengthen the AI Office's authority to centralize enforcement of the EU AI Act. However, member states including Italy and Germany oppose the 'centralized enforcement' model, fearing erosion of national regulatory autonomy, and demand clear responsibility boundaries and retention of control over sensitive domains. Meanwhile, key European Parliament members advocate revising regulations to cover AI agents, ban generative deepfakes, and restore/strengthen registration obligations and protections for sensitive data. These developments reflect ongoing tensions in the EU over rebalancing centralized enforcement power and addressing emerging AI risks.

Read full article
8

Praetorian Open-Sources Julius: Fingerprints LLM Services Like Ollama and Enumerates Model Lists

Security · Open-Source Tool

Security firm Praetorian released and open-sourced Julius, a tool for HTTP fingerprinting of LLM services, helping security teams detect unauthorized deployments of Ollama, LiteLLM, Open WebUI, and others within enterprise networks. Julius identifies service types based on target URLs, extracts available model lists, and provides configuration details for interacting with the service. Detection rules are defined in YAML and support response caching to minimize redundant requests, with JSON output facilitating integration into automated security workflows. Built in Go, the tool currently focuses on HTTP fingerprinting and is licensed under Apache-2.0.
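The core fingerprinting pattern is easy to picture: probe a characteristic path, match a marker in the response, and extract the model list. The sketch below mimics that flow in Python for an Ollama-style `/api/tags` response (the rule structure is a hypothetical stand-in for Julius's YAML rules, not its actual schema; the parsing runs offline on a canned body):

```python
import json

# Hypothetical fingerprint rule in the spirit of Julius's YAML rules.
OLLAMA_RULE = {
    "service": "ollama",
    "path": "/api/tags",    # Ollama's model-list endpoint
    "marker": "models",     # key expected in the JSON response
}

def match_ollama(body: str):
    """Return model names if the body looks like an Ollama
    /api/tags response, else None."""
    try:
        data = json.loads(body)
    except ValueError:
        return None
    if not isinstance(data, dict) or OLLAMA_RULE["marker"] not in data:
        return None
    return [m.get("name", "?") for m in data["models"]]

# Offline example with a canned response body:
sample = '{"models": [{"name": "llama3:8b"}, {"name": "qwen2:7b"}]}'
print(match_ollama(sample))   # → ['llama3:8b', 'qwen2:7b']
```

A real scanner would issue the HTTP GET for `rule["path"]` against each target, cache responses to avoid redundant requests (as Julius does), and emit the matches as JSON for downstream tooling.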

Read full article
9

Ex-DeepMind Team Poetiq Raises $45.8M Seed Round for LLM Self-Optimizing 'Meta-System'

Funding · Agent

Poetiq, founded by former Google DeepMind researchers, announced a $45.8 million seed funding round. Its 'meta-system' software wraps existing LLMs into AI agents capable of self-optimizing output quality and automatically terminating unnecessary computations during inference to reduce costs. It also enables rapid adaptation to user tasks with minimal training samples. The company claims the system helped GPT-5.2 achieve a 16% improvement over the previous top score on the ARC-AGI-2 benchmark. Funds will be used to advance productization and expand the team, focusing on engineering capabilities that enhance reasoning quality and control inference costs.

Read full article

Don't Miss Tomorrow's Insights

Join thousands of professionals who start their day with AI Daily Brief