RepoPrompt Goes Open Source: After Joining OpenAI, Its Author Rebuilds It Around an MCP Server with Hot-Swappable CLI Tools
Open Source · AI Programming
RepoPrompt has been officially open-sourced. After its author, Provencher, joined OpenAI, he led a full architectural overhaul. The new design puts an MCP (Model Context Protocol) server in the controller role, so the underlying CLI tools can be swapped freely and context engineering is managed in a standardized way. Developers can orchestrate different coding tools behind one unified interface, and the author has shared the key engineering lessons behind the refactor. The project reflects mature context-engineering practice and illustrates how AI programming tools are evolving toward composable, replaceable architectures.
Baidu has open-sourced Unlimited OCR, which uses Reference Sliding Window Attention (R-SWA) to parse long documents continuously and efficiently. The mechanism treats visual tokens as fixed references and retains only the 128 most recent tokens of the output stream, so the KV cache stays constant in size regardless of generation length. Unlimited OCR achieves a composite score of 93.23% on OmniDocBench v1.5, 6.22% higher than DeepSeek OCR; runs about 35% faster when generating 6,000 tokens; and maintains stable quality on documents longer than 40 pages. The core contributor, YY, is believed to be Wei Haoran, a former DeepSeek researcher, and the work continues the technical line of the DeepEncoder approach.
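The retention rule described above can be sketched as a toy cache policy, assuming a simplified model in which each cache entry stands in for one token's key/value pair. This is not Baidu's implementation; `RSWACache` is a hypothetical name, and only the 128-token window comes from the description.

```python
from collections import deque


class RSWACache:
    """Toy KV-cache retention policy in the spirit of R-SWA (simplified assumption).

    Visual tokens are kept permanently as references; generated tokens are
    kept only within a sliding window of the most recent `window` entries.
    """

    def __init__(self, visual_kv, window: int = 128):
        self.visual_kv = list(visual_kv)       # fixed reference entries, never evicted
        self.recent_kv = deque(maxlen=window)  # oldest generated entries fall out automatically

    def append(self, kv) -> None:
        # Called once per generated token.
        self.recent_kv.append(kv)

    def attended(self) -> list:
        # Entries the next decoding step may attend to.
        return self.visual_kv + list(self.recent_kv)

    def __len__(self) -> int:
        return len(self.visual_kv) + len(self.recent_kv)
```

With, say, a hypothetical 256 visual tokens, the cache never exceeds 256 + 128 = 384 entries, whether the model has generated 500 tokens or 6,000. That bounded size is why memory stays flat as documents grow.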
MiniMax Grants 600 Million HKD in Zero-Cost Stock to All Employees, No KPIs, Fully Vested by Tenure
Industry Dynamics · China AI
The AI talent war is intensifying: MiniMax has granted all employees roughly 600 million HKD worth of zero-cost stock with no KPI requirements, vesting fully on tenure alone. Meanwhile, DeepSeek plans to double a department's headcount, with Harness team leads conducting interviews daily, and Alibaba is promoting "one-person teams" to cut communication overhead, reflecting a broader push toward organizational efficiency. Separately, former Google CEO Eric Schmidt acknowledged that China can build top-tier AI models on weaker hardware and said the US-China AI gap has narrowed to about six months. Apple is also lobbying the US government to relax procurement restrictions on the Chinese memory chipmaker ChangXin to ease rising memory prices.
Four Waves of AI Development Paradigms: From Prompt Engineering to Loop Engineering, Human Role Shifts from Executor to Designer
AI Engineering · Agent
A comprehensive 10,000-character review traces how AI development paradigms evolved from prompt engineering to loop engineering. The core shift is from communication techniques to system design. Context engineering adopts methods such as MVC and GraphRAG and cuts costs by more than 90% through prefix matching on the prompt cache. Harness engineering applies layered interception (hard rules → policy gateway → AI review → final human approval) that blocks 80% of basic errors at low cost: the model only proposes, and the harness holds final execution authority. Loop engineering enables autonomous iteration, using a five-component toolkit and loop protocols to keep the loop from running out of control. The article argues that the most valuable future talent will be "loop designers" rather than prompt experts.
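The layered interception chain can be sketched as an ordered list of checks where the cheapest layers run first and the human has the last word. This is a minimal illustration, not the article's code: the stage functions, the banned patterns, the verb allow-list, and the length limit in `ai_review` are all hypothetical, and `ai_review` is a placeholder rather than a real model call.

```python
from dataclasses import dataclass
from typing import Callable

Check = Callable[[str], tuple[bool, str]]


@dataclass
class Verdict:
    allowed: bool
    stage: str
    reason: str = ""


def hard_rules(action: str) -> tuple[bool, str]:
    # Cheapest layer: deterministic deny-list (hypothetical patterns).
    for pattern in ("rm -rf /", "DROP TABLE"):
        if pattern in action:
            return False, f"matched banned pattern {pattern!r}"
    return True, ""


def policy_gateway(action: str) -> tuple[bool, str]:
    # Hypothetical policy: only explicitly allowed verbs may run.
    parts = action.split(maxsplit=1)
    verb = parts[0] if parts else ""
    if verb in {"read", "edit", "test"}:
        return True, ""
    return False, f"verb {verb!r} not permitted"


def ai_review(action: str) -> tuple[bool, str]:
    # Placeholder for a model-based reviewer; here it only rejects oversized actions.
    if len(action) < 200:
        return True, ""
    return False, "action too large to review"


PIPELINE: list[tuple[str, Check]] = [
    ("hard_rules", hard_rules),
    ("policy_gateway", policy_gateway),
    ("ai_review", ai_review),
]


def run_harness(action: str, checks: list[tuple[str, Check]],
                human_approve: Callable[[str], bool]) -> Verdict:
    # The model only proposes `action`; the harness decides whether it executes.
    for stage, check in checks:
        ok, reason = check(action)
        if not ok:
            return Verdict(False, stage, reason)  # stop at the first failing layer
    if not human_approve(action):
        return Verdict(False, "human", "rejected by reviewer")
    return Verdict(True, "human")
```

Ordering the checks by cost means that most bad proposals are stopped by a string match or an allow-list before any expensive review or human attention is spent on them.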
NVIDIA and collaborators have published the RQGM paper, which lets AI agents and their evaluators co-evolve for recursive self-improvement. It removes the fixed-evaluator bottleneck by allowing evaluators to rotate at epoch boundaries while preserving useful evidence. Three experiments demonstrate its effectiveness: the code-generation pass rate rises to 71.7% with lower token consumption, the paper-review acceptance rate rises to 40.5%, and the search cost for Olympiad math is cut by a factor of three. The framework also self-corrects the LLM evaluator's bias toward AI-generated content, ultimately treating AI and human outputs equally with 80% accuracy. It has been dubbed this year's "most dangerous" paper.
Liquid AI Open-Sources LFM2.5-230M: 4-bit Model Just 293MB, Reaches 213 tok/s on Samsung S25 Ultra
Open Source · On-Device AI
Liquid AI has released LFM2.5-230M, a 230M-parameter open-weight model optimized for on-device agent tasks, with support for frameworks including llama.cpp, MLX, vLLM, SGLang, and ONNX. It reaches 213 tok/s on the Samsung Galaxy S25 Ultra and 42 tok/s on a Raspberry Pi 5, and its 4-bit quantized versions occupy only 293–375MB. It outperforms larger models such as Qwen3.5-0.8B and Gemma 3 1B on IFEval instruction following and on data-extraction benchmarks, though it is weaker at math, coding, and creative writing. It has built-in JSON function calling and is already deployed as the skill-selection layer on Unitree G1 robots.
Gary Marcus Warns of AI Price War: Open-Source Near-Zero Cost Undermines Profitability, Questions Trillion-Dollar Valuations
Industry Analysis · Business Model
Gary Marcus writes that the AI industry's lack of moats is triggering price wars that threaten US dominance. Open-source competition from China is driving token prices toward zero, making it hard for companies such as OpenAI and Anthropic to sustain trillion-dollar valuations or recoup their massive infrastructure investments. He identifies three flaws in the current paradigm: the inefficiency of brute-force training, unreliability that undermines premium pricing, and ease of replication. Marcus argues for shifting from price competition to reliable, specialized applications, fostering new kinds of AI better suited to science and healthcare, and putting safety ahead of the race for the cheapest LLM.
VLX-Seek 3B Vision Model Surpasses Gemini: Region Tokens Replace Coordinates, Fine-Grained Perception Achieves SOTA
Multimodal · On-Device AI
Om AI has released VLX-Seek, a 3-billion-parameter vision-language model that reframes object localization as a language-conditioned retrieval task, using region tokens instead of unstable coordinate outputs, which better matches what language models do well. It combines dual visual pathways with HFRE to supply both high-level semantics and fine-grained spatial detail, and its two-stage training prevents capability degradation and supports rejection learning. It outperforms larger models, including Gemini 3.1/2.5 Pro and Qwen2.5-VL-7B, on benchmarks such as MSCOCO detection, ODinW13 open-vocabulary detection, RefCOCO referring-expression comprehension, and PixMo counting, and it targets on-device and embodied deployment.
Study Shows LLM "Preferences" Are Behavioral Inertia: Stated Preferences Don't Drive Actions, Should Not Be Interpreted as True Desires
AI Safety · Alignment
A LessWrong experiment finds that the preferences LLMs report in paired-choice tests do not drive their behavior. Across four writing tasks and seven models, offering highly preferred outcomes (e.g., saving 1,000 lives) did not produce better outputs. By contrast, direct prompts such as "try harder" or role-playing significantly improved quality, while harmful prompts induced sandbagging and degraded quality. The authors argue that genuine desires should drive behavior; since elicited preferences fail this test, they should not be interpreted as human-like goals, and misaligned stated preferences may therefore not pose safety risks in themselves. The study recommends prioritizing behavioral testing over self-reported preferences when evaluating LLMs.
One-Click Website Cloning Tool Gains 20K Stars, Frontend Engineers Under Pressure: Full Next.js Site Generated via Single Command
AI Programming · Open Source
The GitHub project ai-website-cloner-template, which has gained 20K stars, can clone any website pixel-perfectly and generate a complete Next.js project with a single command. Its five-stage pipeline covers full-site crawling (simulating scrolling and clicking to capture the real CSS), basic setup, component standardization, parallel build (using git worktree to schedule multiple agents), and QA (automated ESLint and TypeScript checks). It supports major tools such as Claude Code, Cursor, and Codex CLI, with unified configuration in AGENTS.md. The author lists legitimate uses, including platform migration, source recovery, and learning, and explicitly prohibits phishing and copyright infringement.