NVIDIA Open Sources Alpamayo Autonomous Driving VLA, Including 10B Model and 1700+ Hours of Data
Autonomous Driving · Open Source Model
NVIDIA released the open-source Alpamayo model family at CES. The core model, Alpamayo 1, is a 10-billion-parameter chain-of-thought Vision-Language-Action (VLA) model that reasons step by step and outputs reasoning traces alongside driving trajectories, improving the interpretability of decisions in rare long-tail scenarios. NVIDIA also released over 1,700 hours of driving data and the AlpaSim simulation and verification framework. The model and code are available on Hugging Face, so developers can fine-tune them and integrate them into autonomous driving software stacks for higher-level autonomy verification and deployment.
NVIDIA Releases Cosmos/GR00T and Jetson T4000, Completing Open-Source Toolchain for Robotics Stack
Robotics · Edge AI · Hardware
NVIDIA announced open models and tools for robotics "Physical AI": Cosmos Transfer/Predict 2.5 for simulation data generation, Cosmos Reason 2, and Isaac GR00T N1.6, a humanoid-robot VLA model with enhanced reasoning and control. It also open-sourced Isaac Lab-Arena for policy evaluation and benchmarking, and launched OSMO, a cloud-native orchestration framework that unifies the development workflow. On the hardware side, it released the Jetson T4000 module with 1,200 TFLOPS of FP4 compute and 64GB of memory, priced at $1,999 per unit in orders of 1,000 or more, claiming a 4x performance improvement over the previous generation.
NVIDIA Unveils Rubin AI Platform Roadmap, Claims Inference Costs Can Be Reduced by Up to 10x
AI Chip · Data Center
NVIDIA showcased the Rubin AI platform roadmap at CES: an AI supercomputer composed of six chips, emphasizing the co-design of compute and memory bandwidth to address the "memory bottleneck." Reports say it could cut inference costs by up to 10x and reduce the number of GPUs required by up to 4x in certain training scenarios (e.g., MoE). NVIDIA also launched an "Inference Context Memory Storage Platform" for agents, designed to manage and accelerate long-context data. Rubin is expected to ship in the second half of 2026 to partners including AWS, Google, Microsoft, and OpenAI.
Google TV Introduces Gemini: TV Platform Supports Image/Video Generation and Voice-Activated System Settings
Product Update · Multimodal
Google rolled out a major Gemini update for Google TV: Nano Banana image generation and Veo video generation come to the TV platform, with voice-driven generation and editing of images and videos and automatic creation of montage videos from Google Photos. The update also brings a chat interface and "Learn More" info cards to the TV, and can adjust system settings in response to voice feedback (e.g., raising the relevant volume setting if users say dialogue is too quiet). The features launch first on TCL Google TV models, with other devices following in the coming months; devices must run Android 14 or later.
vLLM Releases Semantic Router v0.1 Iris, Routing Expands from 14 Categories to Plugin-based Infinite Strategies
AI Infra · Routing · Security
The vLLM team released Semantic Router v0.1 "Iris," designed for "Mixture of Models" (MoM) setups to make system-level routing decisions between users and models. The new version adopts a "Signal-Decision" plugin chain architecture, allowing multiple signal types to be combined into routing decisions and expanding routing from a fixed set of 14 categories to an unbounded set of configurable strategies. On the performance side, it introduces modular LoRA to reduce latency; on the security side, it adds HaluGate three-stage hallucination detection. It also provides one-click installation, Kubernetes deployment, a visual dashboard, support for the OpenAI Responses API, and intelligent tool management.
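The "Signal-Decision" idea, where pluggable signals score a request and decision rules pick a model, can be sketched as follows. This is an illustrative toy, not the project's actual API; names like `Route` and `route` are invented for the sketch.

```python
from dataclasses import dataclass
from typing import Callable

# A signal plugin inspects the prompt and emits a score in [0, inf).
SignalFn = Callable[[str], float]

@dataclass
class Route:
    model: str
    rule: Callable[[dict], bool]  # decision rule over collected signal scores

def route(prompt: str, signals: dict[str, SignalFn],
          routes: list[Route], default: str) -> str:
    """Run every signal plugin, then pick the first route whose rule fires."""
    scores = {name: fn(prompt) for name, fn in signals.items()}
    for r in routes:
        if r.rule(scores):
            return r.model
    return default

# Illustrative plugins: a keyword signal for code and a prompt-length signal.
signals = {
    "code": lambda p: 1.0 if "def " in p or "```" in p else 0.0,
    "length": lambda p: len(p) / 1000,
}
routes = [
    Route("coder-model", lambda s: s["code"] > 0.5),
    Route("long-context-model", lambda s: s["length"] > 0.5),
]
print(route("please write def add(a, b):", signals, routes, "general-model"))
```

Because signals and rules are plain plugins, new strategies are added by configuration rather than by retraining a fixed classifier, which is the point of moving beyond 14 hard-coded categories.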
TII Releases Falcon H1R 7B Reasoning Model, Claims AIME 2025 Score of 83.1 and Opens Source
Open Source Model · Reasoning
Abu Dhabi's Technology Innovation Institute (TII) released the Falcon H1R 7B reasoning model on Hugging Face. Despite its modest 7B parameter scale, it focuses on reasoning quality and throughput efficiency. According to official materials, it underwent two-stage post-training (SFT + GRPO) combined with DeepConf test-time scaling, improving accuracy while reducing the number of generated tokens. Materials claim a score of 83.1 on AIME 2025 and 68.6 on the coding benchmark LCB v6, with token usage reduced by 38% through confidence-based dynamic pruning. The model is released under the Falcon LLM License, which allows commercial use with attribution and adherence to the acceptable use policy.
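The confidence-based pruning idea behind the token savings can be shown schematically: sample several reasoning traces, stop any trace whose running token confidence collapses, and vote among the survivors. This toy uses hand-written confidence lists; the real DeepConf operates on model token probabilities during generation.

```python
from collections import Counter

def prune_and_vote(traces, threshold=0.6, window=3):
    """Keep only traces whose sliding-window mean token confidence never
    drops below `threshold`; majority-vote the surviving final answers."""
    survivors = []
    for answer, confidences in traces:
        ok = True
        for i in range(len(confidences) - window + 1):
            if sum(confidences[i:i + window]) / window < threshold:
                ok = False  # in practice the trace is stopped early here,
                break       # which is where the token savings come from
        if ok:
            survivors.append(answer)
    return Counter(survivors).most_common(1)[0][0] if survivors else None

# Toy traces: (final answer, per-token confidence scores)
traces = [
    ("42", [0.90, 0.80, 0.85, 0.90]),
    ("42", [0.95, 0.90, 0.88, 0.92]),
    ("17", [0.90, 0.50, 0.40, 0.30]),  # confidence collapses -> pruned
]
print(prune_and_vote(traces))  # -> 42
```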
Ant Group Open Sources Medical LLM AntAngelMed: 100B MoE, 6.1B Activated Parameters, Claims Top Scores on Multiple Benchmarks
Medical AI · Open Source Model
Ant Group has reportedly open-sourced the medical large language model "AntAngelMed," a 100B-parameter MoE model that activates approximately 6.1B parameters per token. Materials claim it ranks first among open-source models on evaluations such as HealthBench, MedAIBench, and MedBench. Training proceeded in three stages: continued pre-training, supervised fine-tuning (SFT), and GRPO reinforcement learning, balancing medical reasoning capability against medical safety and ethics requirements. On inference efficiency, materials claim speeds of 200+ tokens/s on the H20 GPU. Target applications include medical Q&A, health management, and clinical assistance.
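The gap between 100B total and ~6.1B activated parameters is a general property of MoE top-k routing: each token runs through the shared layers plus only k of E experts. The arithmetic below uses made-up layer sizes chosen to roughly echo that split, not AntAngelMed's actual architecture.

```python
def moe_param_counts(num_experts, experts_per_token, expert_params, shared_params):
    """Total parameters stored vs. parameters touched per token in an MoE."""
    total = shared_params + num_experts * expert_params
    active = shared_params + experts_per_token * expert_params
    return total, active

# Toy configuration: 64 experts of 1.5B params each plus 3B shared params,
# routing 2 experts per token -> ~99B total, ~6B active.
total, active = moe_param_counts(num_experts=64, experts_per_token=2,
                                 expert_params=1.5e9, shared_params=3e9)
print(f"total={total / 1e9:.0f}B active={active / 1e9:.0f}B")
```

This is why an MoE can claim frontier-scale capacity while its per-token compute (and hence tokens/s on a given GPU) behaves more like a small dense model.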
Intel Proposes DeepMath: Letting LLMs Generate and Execute Python Snippets to Reduce Math Errors
LLM · Agent Tool Calling · Security
Intel proposed the DeepMath architecture to strengthen LLMs' mathematical abilities: instead of emitting lengthy reasoning text, the model generates small Python snippets that run in a restricted sandbox, offloading deterministic calculation to code, which reduces arithmetic and numerical errors and shortens reasoning traces. The scheme is built on Qwen3-4B Thinking and fine-tuned with GRPO, with training rewards favoring correct answers and concise code. To meet production safety requirements, the Python execution environment uses a module whitelist and execution timeouts and disables file and network access, mitigating the security risks that tool calling introduces.
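The whitelist-and-restricted-builtins part of such a sandbox can be approximated in plain Python. This is an illustrative sketch under my own assumptions, not Intel's implementation; a production sandbox would additionally run snippets in an isolated subprocess with a hard timeout and no network namespace, which in-process `exec` cannot enforce.

```python
import builtins

ALLOWED_MODULES = {"math", "fractions", "decimal"}  # illustrative whitelist

def _safe_import(name, *args, **kwargs):
    """Import hook that rejects any module outside the whitelist."""
    if name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError(f"module {name!r} is not whitelisted")
    return builtins.__import__(name, *args, **kwargs)

# Builtins kept for arithmetic; open/exec/eval are deliberately absent,
# which blocks file access from inside the snippet.
SAFE_BUILTINS = {
    "abs": abs, "min": min, "max": max, "sum": sum, "range": range,
    "len": len, "round": round, "print": print, "__import__": _safe_import,
}

def run_snippet(code: str) -> dict:
    """Execute a model-generated snippet with restricted builtins and a
    module whitelist, returning its global scope for answer extraction."""
    scope = {"__builtins__": SAFE_BUILTINS}
    exec(code, scope)
    return scope

scope = run_snippet("import math\nanswer = round(math.sqrt(2) * 100)")
print(scope["answer"])  # -> 141
```

Extracting the result from the snippet's scope (here the `answer` variable, a naming convention I am assuming) is what lets the deterministic calculation replace error-prone in-text arithmetic.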
MiniMax's Hong Kong IPO Leans Towards Pricing at Upper Limit, Targeting ~$538M Raise at ~$6.5B Valuation
Financing/IPO · Large Model
MiniMax is reportedly leaning towards pricing its Hong Kong IPO at the upper end of the range (HK$151–165 per share), targeting a raise of approximately $538 million at a market valuation of about $6.5 billion. Materials state the company began book-building on December 31, 2025, with the order book oversubscribed several times over; pricing is expected to be finalized on January 6, with trading to begin on January 9. MiniMax's main business is multimodal large models and applications, with backers including Tencent and Alibaba. A successful listing would give LLM companies entering the public market a new pricing reference and financing precedent.
NVIDIA Adds Cache-Aware Streaming to Nemotron Speech ASR, Claims Up to 3x Concurrency and 24ms Median Final-Transcription Latency
ASR · AI Infra
NVIDIA introduced a "cache-aware" streaming architecture in Nemotron Speech ASR, replacing traditional buffered inference that re-processes overlapping windows: each audio segment is encoded only once, historical computation is reused, redundant work is eliminated, and latency drift under high concurrency is stabilized. Officially, the design delivers up to 3x concurrency with near-linear memory scaling, and achieved a median time-to-final-transcription of 24ms in validation with partners. The model targets real-time voice-agent deployments and supports runtime trade-offs and parameter tuning between latency and WER.
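The contrast with buffered inference can be sketched with a toy frame counter: buffered decoding re-encodes the overlap between consecutive windows, while a cache-aware decoder encodes each new chunk exactly once and carries state forward. `buffered` and `cache_aware` are illustrative stand-ins, not Nemotron's actual decoding loop.

```python
def buffered(audio_chunks, overlap=2):
    """Buffered inference: each step re-processes `overlap` old frames
    so the encoder sees enough left context."""
    computed, prev = 0, []
    for chunk in audio_chunks:
        window = prev[-overlap:] + chunk
        computed += len(window)  # overlapping frames are recomputed
        prev = window
    return computed

def cache_aware(audio_chunks):
    """Cache-aware streaming: left context lives in the cache, so every
    frame is encoded exactly once."""
    computed, cache = 0, []
    for chunk in audio_chunks:
        cache = cache + chunk    # reuse of historical computation
        computed += len(chunk)   # only the new frames are encoded
    return computed

chunks = [[1, 2, 3, 4]] * 10  # 10 chunks of 4 frames each
print(buffered(chunks), cache_aware(chunks))  # buffered does extra work
```

The per-chunk compute of the cache-aware path is constant regardless of the overlap, which is what keeps latency stable as concurrency grows.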