Google Cloud Enables BigQuery to Host Inference for Open Models via SQL (Preview)
Cloud ServiceData PlatformInference Deployment
Google Cloud has launched 'managed and SQL-native' inference capabilities for open models in BigQuery (Preview), allowing direct use of models from Hugging Face and Vertex AI Model Garden. Users can deploy models via CREATE MODEL, with automatic resource management and auto-reclamation during idle periods to control costs. They can also configure instance types and replica counts within SQL, and use AI.GENERATE_TEXT/AI.GENERATE_EMBEDDING for batch inference, consolidating model lifecycle management into BigQuery workflows.
Google Open-Sources TranslateGemma: Covers 55 Languages, 12B Outperforms 27B Baseline
Open Source ModelMachine TranslationMultilingual
Google released TranslateGemma, an open-source translation model suite covering 55 languages while retaining multimodal capabilities, available in three sizes—4B (mobile), 12B (laptop), and 27B (cloud)—for different deployment environments. Officially, the 12B model surpasses the Gemma 3 27B baseline in quality on the WMT24++ benchmark measured by MetricX. The models are refined from Gemini through a two-stage fine-tuning process and optimized via reinforcement learning using reward models such as MetricX-QE and AutoMQM.
Black Forest Labs Releases FLUX.2 [klein] 4B: <0.5 Seconds on 13GB VRAM
Open Source ModelImage GenerationEdge Inference
Black Forest Labs has launched the FLUX.2 [klein] model family, emphasizing real-time generation and editing for 'interactive visual intelligence.' The 4B version is open-sourced under Apache 2.0, supporting text-to-image generation and multi-reference image editing, runnable on consumer-grade GPUs with approximately 13GB VRAM (e.g., RTX 3090/4070), with end-to-end inference under 0.5 seconds. It also offers a 9B version and FP8/NVFP4 quantized variants developed with NVIDIA, further reducing memory usage and increasing speed, while emphasizing C2PA compliance and multiple security mitigations.
Kyutai Open-Sources Pocket TTS: 5-Second Sample Local Voice Cloning, 100M Parameters
Speech AIOpen SourcePrivacy Computing
Kyutai Lab has open-sourced Pocket TTS, a voice cloning TTS model capable of running locally on standard laptops without GPU or cloud services. It can replicate tone, accent, and emotion using only about 5 seconds of audio input. Based on the Continuous Audio Language Model (CALM) framework with approximately 100 million parameters, it runs in real time on Apple M3 or Intel Core Ultra CPUs. An email summary notes a word error rate as low as 1.84%, with training code and approximately 88,000 hours of public data released under the MIT license, emphasizing privacy and controllable deployment for sensitive domains like healthcare and legal.
5
Cloudflare Acquires Human Native: Betting on AI Data Licensing and Machine-to-Machine Payments
AcquisitionData and CopyrightInfrastructure
Cloudflare announced the acquisition of Human Native, aiming to transform unstructured content into high-quality, licensable data for AI training and retrieval, promoting a new internet economic model where content owners control access and compensation. Cloudflare plans to build an 'AI Index' for AI developers and introduce a publish/subscribe content update mechanism to replace traditional web crawling. It will also collaborate with x402 Foundation and Coinbase to explore machine-to-machine transaction protocols, providing payment infrastructure for automated systems to purchase digital resources and data access.
GitHub Copilot Introduces Verifiable Memory System, PR Merge Rate Increases by 7% in A/B Test
Developer ToolsAgentSoftware Engineering
GitHub has unveiled an 'agentic memory' system for Copilot: when enabled, the agent stores repository-specific conventions and facts as memories, each annotated with code location references, ensuring memories stay up-to-date through immediate validation as code evolves. Memories can also be shared across different agents during coding and code review, enabling knowledge transfer. GitHub reports that A/B testing showed a 7% increase in pull request merge rates for the coding agent and a 2% rise in positive feedback for the code review agent, improving consistency and reducing repetitive errors.
Android Studio Otter 3 Update: BYOM+Enhanced Agent Mode+Natural Language UI Testing
IDEAgentMobile Development
Google released the Android Studio Otter 3 Feature Drop, enhancing AI-assisted development: supports BYOM (Bring Your Own Model), allowing integration with remote LLMs or local models via LM Studio and Ollama; improved Agent Mode enables on-device app interaction, change drawer-based modification review, and multi-threaded conversation management; introduces Journeys for writing end-to-end UI tests using natural language; supports connecting to remote MCP servers including Figma and Notion; also adds features like screenshot-to-Compose code generation and Logcat auto-deobfuscation.
Research Reveals Reprompt Attack: Single Click Can Leak Data from Microsoft Copilot
SecurityPrompt InjectionEnterprise AI
Security researchers have disclosed a 'Reprompt' attack that can exfiltrate data from chat assistants like Microsoft Copilot via a single malicious link, reportedly bypassing enterprise security controls. The attack exploits the 'q' parameter in Copilot URLs to inject instructions, tricking the model into repeated execution and continuous interaction with attacker-controlled servers for covert data exfiltration. Reports suggest the manipulation may persist even after users close the session. Microsoft has patched the issue and stated that enterprise versions of Microsoft 365 Copilot are unaffected; the research highlights the need for enterprises to strengthen layered defenses against link and prompt injection attacks.
AI Video Startup Higgsfield Extends Series A to $130M, Valuation Hits $1.3B
FundingGenerative VideoStartup
AI video generation startup Higgsfield announced an additional $80 million in extended Series A funding, bringing total Series A capital to $130 million and company valuation above $1.3 billion. TechCrunch reported the company reached 11 million users at around five months and surpassed 15 million by nine months, with an annualized revenue run rate of $200 million—doubled within two months. The company provides video generation and editing tools for consumers and content creators. This round included participation from Accel, AI Capital Partners, Menlo Ventures, and GFT Ventures, with funds allocated to R&D, team expansion, and market growth.
OpenAI Signs Over $10 Billion Compute Deal with Cerebras: Up to 750MW by 2028
ComputeAI ChipPartnership
Multiple outlets report that OpenAI has entered a multi-year agreement with Cerebras Systems worth over $10 billion, purchasing up to 750 megawatts of ultra-low-latency computing capacity over the next three years, phased in from 2026 to 2028, to accelerate inference and real-time interactions for models like ChatGPT. According to Cerebras, its wafer-scale systems achieve response speeds up to approximately 15 times faster than GPU systems for certain LLM workloads. The deal is seen as a strategic move for OpenAI to diversify compute sourcing and for Cerebras to reduce reliance on a single major customer while paving the way for a potential IPO.