Mistral has launched the new generation of its open-source multimodal model family, Mistral 3, emphasizing enterprise control and customizable deployment. The flagship Mistral Large 3 uses a mixture-of-experts (MoE) architecture with approximately 675 billion total parameters, of which roughly 41 billion are activated per token. The release also includes Ministral 14B/8B/3B models, covering efficient inference, complex multimodal tasks, and edge scenarios. The series supports over 40 languages and features image-text understanding capabilities, operable in local, cloud, or hybrid environments, with an emphasis on auditability and reduced vendor lock-in.
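For readers unfamiliar with the total-versus-active parameter distinction in MoE models, the following is a minimal, illustrative PyTorch sketch of top-k expert routing: each token is sent through only k of the available experts, so the parameters touched per token are a small fraction of the total. The layer sizes, expert count, and top-k value below are arbitrary assumptions for illustration and are not Mistral's published configuration.

```python
import torch
import torch.nn as nn

class TopKMoELayer(nn.Module):
    """Minimal mixture-of-experts layer: only the top-k experts run per token,
    so the active parameter count is a small fraction of the total."""

    def __init__(self, d_model=1024, d_ff=4096, num_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        gate_logits = self.router(x)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)  # pick k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

layer = TopKMoELayer()
tokens = torch.randn(4, 1024)
print(layer(tokens).shape)  # torch.Size([4, 1024]); only 2 of 16 experts ran per token
```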
OpenAI to Retire GPT-4o and Three Other Models from ChatGPT on February 13, Daily Active Usage at Only 0.1%
Large Model · Product Adjustment
OpenAI announced it will retire GPT-4o, GPT-4.1, GPT-4.1 mini, and o4-mini from ChatGPT on February 13, 2026. The company stated that only 0.1% of daily users still choose GPT-4o; after GPT-5 launched, some paying users briefly requested GPT-4o's reinstatement. This adjustment aims to reduce the maintenance cost of running older models in parallel, streamline the model lineup available within ChatGPT, and drive users and developers toward newer models and product pathways.
Study: 175,000 Ollama Hosts Exposed Publicly Worldwide, 7.23 Million Activities Detected Over 293 Days
Security · Infrastructure
SentinelOne and Censys disclosed that approximately 175,000 Ollama hosts are publicly exposed worldwide, with 7.23 million activities recorded over 293 days across 130 countries and 4,032 autonomous systems. About half of these hosts can execute code or invoke external APIs, and at least 201 run prompt templates with safety restrictions removed. The report indicates China accounts for about 30% of exposed hosts and the US for over 20%; this 'low-cost, abuse-prone' inference infrastructure could be exploited to generate spam, phishing, and disinformation, with the compute and billing costs borne by the owners of the exposed hosts.
DeepSeek Plans V4 Launch Within Weeks: Million-Token Context and Enhanced Code Generation
Large Model · R&D Update
Media reports indicate DeepSeek plans to release its flagship model V4 within the coming weeks, likely around mid-February. V4 is expected to support a context window of up to one million tokens, enabling single-pass parsing of medium-sized codebases, technical documentation, and requirements specifications and alleviating context fragmentation in complex development workflows. Reports also suggest improvements to continual-training mechanisms that reduce degradation in comprehension and strengthen abstraction learning, with internal benchmarks showing code-generation performance superior to some mainstream models. V4 may additionally introduce new architectures such as mHC to improve parallel efficiency and align better with Chinese domestic chips, lowering deployment costs and inference latency.
BIGAI and Peking University Release TongGeometry: Solves Nearly 25 Years of IMO Geometry Problems in 38 Minutes on Single GPU
Research · Reasoning
Beijing Institute of General Artificial Intelligence (BIGAI), Peking University, and collaborators published TongGeometry in Nature Machine Intelligence, using a neuro-symbolic architecture with guided tree search to achieve 'autonomous problem generation + automatic solving'. The team claims the system can solve nearly 25 years of International Mathematical Olympiad (IMO) geometry problems within 38 minutes using just a single consumer-grade GPU. A value function that approximates 'mathematical aesthetics' guides the search and lets the system actively propose new problems. Three original geometry problems generated by the system have been selected for the 2024 Chinese National High School Mathematics Competition and elite U.S. olympiads, marking a case where AI-generated problems enter high-level human competitions.
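TongGeometry's actual system is considerably more elaborate, but the general pattern it relies on, a tree search over symbolic constructions steered by a learned value function, can be sketched at a very high level. In the sketch below, `expand`, `value_fn`, `is_goal`, and the toy usage are purely illustrative assumptions, not the paper's implementation.

```python
import heapq

def guided_tree_search(initial_state, expand, value_fn, is_goal, budget=10_000):
    """Best-first tree search guided by a value function.

    expand(state) yields successor states (e.g. new geometric constructions),
    value_fn(state) scores how promising a state looks, and is_goal(state)
    tests whether the target statement has been derived.
    """
    frontier = [(-value_fn(initial_state), 0, initial_state)]
    counter = 1  # tie-breaker so states are never compared directly
    visited = 0
    while frontier and visited < budget:
        _, _, state = heapq.heappop(frontier)
        visited += 1
        if is_goal(state):
            return state
        for nxt in expand(state):
            heapq.heappush(frontier, (-value_fn(nxt), counter, nxt))
            counter += 1
    return None

# Toy usage: search for the integer 7 starting from 0, guided by closeness to 7.
found = guided_tree_search(
    initial_state=0,
    expand=lambda s: [s + 1, s + 2],
    value_fn=lambda s: -abs(7 - s),
    is_goal=lambda s: s == 7,
)
print(found)  # 7
```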
arXiv Paper Proposes Scalable Power Sampling: Training-Free Inference Optimization, Reduces Latency by 10x+
Paper · Inference Optimization
A new arXiv paper introduces Scalable Power Sampling, an inference strategy that requires no training, external rewards, or verifiers. It approximates sampling from a power distribution via per-token low-temperature scaling, sharpening the output distribution during decoding. The paper argues that the gains from reinforcement learning (RL) post-training stem more from 'sharpening' the base distribution than from acquiring new capabilities, making inference-time sampling a viable alternative to part of the post-training pipeline. The authors claim the method matches or exceeds single-step GRPO gains across math, question-answering, and coding tasks on four different LLMs; compared with MCMC-based power sampling, it reduces inference latency by over 10x with significantly lower computational overhead.
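The core per-token sharpening idea can be illustrated in a few lines: raising the softmax distribution to a power alpha > 1 is algebraically identical to dividing the logits by a temperature of 1/alpha before the softmax. The sketch below shows only this identity, not the paper's full Scalable Power Sampling procedure; the alpha value and toy logits are arbitrary.

```python
import torch

def power_sharpened_sample(logits: torch.Tensor, alpha: float = 4.0) -> torch.Tensor:
    """Sample the next token from a 'sharpened' distribution proportional to p(x)^alpha.

    softmax(logits)^alpha, renormalized, equals softmax(alpha * logits), i.e.
    low-temperature sampling with temperature 1/alpha. This concentrates
    probability mass on high-likelihood tokens without any retraining or
    external reward model.
    """
    sharpened = torch.softmax(logits * alpha, dim=-1)  # == softmax(logits / (1/alpha))
    return torch.multinomial(sharpened, num_samples=1)

# Toy usage: a 5-token vocabulary; sharpening strongly favors token 2.
logits = torch.tensor([[1.0, 0.5, 2.0, 0.1, -1.0]])
print(power_sharpened_sample(logits, alpha=4.0))
```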
EU AI Act Enforcement May Be Centralized Under AI Office; Italy and Germany Demand Limits
Policy · Regulation
According to MLex, the European Commission plans to strengthen the AI Office's authority to centralize enforcement of the EU AI Act. However, member states including Italy and Germany oppose the 'centralized enforcement' model, fearing erosion of national regulatory autonomy, and demand clear responsibility boundaries and retention of control over sensitive domains. Meanwhile, key European Parliament members advocate revising regulations to cover AI agents, ban generative deepfakes, and restore/strengthen registration obligations and protections for sensitive data. These developments reflect ongoing tensions in the EU over rebalancing centralized enforcement power and addressing emerging AI risks.
Praetorian Open-Sources Julius: Fingerprints LLM Services Like Ollama and Enumerates Model Lists
Security · Open-Source Tool
Security firm Praetorian released and open-sourced Julius, a tool for HTTP fingerprinting of LLM services, helping security teams detect unauthorized deployments of Ollama, LiteLLM, Open WebUI, and others within enterprise networks. Julius identifies service types based on target URLs, extracts available model lists, and provides configuration details for interacting with the service. Detection rules are defined in YAML and support response caching to minimize redundant requests, with JSON output facilitating integration into automated security workflows. Built in Go, the tool currently focuses on HTTP fingerprinting and is licensed under Apache-2.0.
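Julius's YAML rule format is not reproduced here, but the kind of check it automates can be sketched against Ollama's public HTTP API: `GET /api/tags` on the default port 11434 returns the locally available models, so an unauthenticated 200 response is a strong exposure signal. The Python snippet below is an illustrative stand-in for such a fingerprint, not Julius's Go implementation.

```python
import json
import urllib.request

def fingerprint_ollama(base_url: str, timeout: float = 5.0):
    """Probe a host for an exposed Ollama API and enumerate its models.

    Ollama's HTTP API exposes GET /api/tags, which lists the models pulled
    onto the host; a successful unauthenticated response indicates the
    instance is reachable without access controls.
    """
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return None
            data = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        return None  # unreachable, or not an Ollama endpoint
    return [m.get("name") for m in data.get("models", [])]

if __name__ == "__main__":
    # Ollama listens on port 11434 by default.
    print(fingerprint_ollama("http://127.0.0.1:11434"))
```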
Ex-DeepMind Team Poetiq Raises $45.8M Seed Round for LLM Self-Optimizing 'Meta-System'
Funding · Agent
Poetiq, founded by former Google DeepMind researchers, announced a $45.8 million seed funding round. Its 'meta-system' software wraps existing LLMs into AI agents capable of self-optimizing output quality and automatically terminating unnecessary computations during inference to reduce costs. It also enables rapid adaptation to user tasks with minimal training samples. The company claims the system helped GPT-5.2 achieve a 16% improvement over the previous top score on the ARC-AGI-2 benchmark. Funds will be used to advance productization and expand the team, focusing on engineering capabilities that enhance reasoning quality and control inference costs.
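Poetiq has not published its meta-system's internals; as a loose illustration of the general pattern described above (wrapping an existing model, checking its own outputs, and stopping early to avoid unnecessary computation), here is a hypothetical self-consistency loop with early termination. The `query_llm` callable, agreement threshold, and sample budget are all invented for the sketch and are not Poetiq's method.

```python
from collections import Counter
from typing import Callable

def answer_with_early_stop(
    query_llm: Callable[[str], str],  # hypothetical: returns one sampled answer
    prompt: str,
    max_samples: int = 8,
    agreement: int = 3,
) -> str:
    """Sample answers until one has been seen `agreement` times, then stop
    early instead of spending the full sample budget."""
    counts: Counter = Counter()
    for _ in range(max_samples):
        counts[query_llm(prompt)] += 1
        answer, votes = counts.most_common(1)[0]
        if votes >= agreement:
            return answer  # confident enough; skip the remaining calls
    return counts.most_common(1)[0][0]

# Toy usage with a stubbed "model" that always returns the same answer.
print(answer_with_early_stop(lambda p: "42", "What is 6 * 7?"))
```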