Musk Unveils Terafab: Chip Production Target of 1TW/year
Compute Infrastructure · Semiconductor · Space AI
Musk announced the Terafab chip manufacturing initiative in Texas, a joint effort by Tesla, SpaceX, and xAI, aiming to achieve an annual AI compute chip production capacity of 1 terawatt (TW), approximately 50 times the current global output. The project plans to integrate design, fabrication, testing, and iteration within a single campus, producing edge inference chips for Optimus/autonomous driving and radiation-hardened aerospace chips. Most of the computing power will be deployed in orbiting data centers powered by solar energy, bypassing terrestrial power and cooling constraints.
NVIDIA Begins GPU Deliveries to AWS, Targeting Over 1 Million Units
Cloud Computing · GPU · Compute
Reports indicate NVIDIA started delivering its next-generation AI GPUs to AWS on March 22, with both parties targeting cumulative deployment of over one million Blackwell and Rubin architecture GPUs by 2027. AWS will launch new EC2 P6e instances built on the chips, supporting up to 72 GPUs per NVLink domain and 360 PFLOPS of FP8 compute. The two companies are also co-building Ceiba, a cloud supercomputer comprising 20,736 B200 GPUs delivering ~414 EFLOPS, dedicated to NVIDIA's internal research in biology, autonomous driving, and climate science.
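As a sanity check on the reported aggregates, the implied per-GPU throughput can be computed directly. The source does not specify the precision/sparsity conventions behind each figure, so these are rough numbers:

```python
# Back-of-envelope check of the per-GPU throughput implied by the
# reported aggregate figures. All inputs are the numbers as reported;
# precision/sparsity assumptions are not specified in the source.

P6E_DOMAIN_GPUS = 72            # GPUs per NVLink domain in EC2 P6e
P6E_DOMAIN_PFLOPS_FP8 = 360     # reported FP8 throughput per domain

CEIBA_GPUS = 20_736             # B200 GPUs in the Ceiba supercomputer
CEIBA_EFLOPS = 414              # reported aggregate throughput

per_gpu_p6e = P6E_DOMAIN_PFLOPS_FP8 / P6E_DOMAIN_GPUS
per_gpu_ceiba = CEIBA_EFLOPS * 1000 / CEIBA_GPUS  # EFLOPS -> PFLOPS

print(f"P6e:   {per_gpu_p6e:.1f} PFLOPS per GPU")    # 5.0
print(f"Ceiba: {per_gpu_ceiba:.1f} PFLOPS per GPU")  # 20.0
```

The two figures (~5 vs. ~20 PFLOPS per GPU) likely reflect different precision or sparsity conventions rather than a fourfold hardware gap.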
Salesforce Acquires Cimulate to Strengthen Agentforce Commerce AI
Acquisition · Agent · E-commerce
Salesforce announced a definitive agreement to acquire e-commerce AI company Cimulate, integrating its AI product discovery and agent-based commerce capabilities into Agentforce Commerce. Cimulate's CustomerGPT enables search, recommendation, and conversational shopping guidance, helping retailers improve conversion rates and average order value. Salesforce did not disclose the deal amount or closing timeline but positioned it as a key acquisition in 2026, aimed at enhancing end-to-end intelligent interactions in e-commerce to compete with Adobe, Shopify, SAP, and others in the business AI space.
WeChat Launches ClawBot Plugin: Integrating OpenClaw Agents into Chat
Agent · Application Integration · WeChat Ecosystem
WeChat has launched the ClawBot plugin, enabling integration of the open-source AI agent framework OpenClaw directly into chat interfaces. The plugin is reportedly available in WeChat 8.0.70; users enable it via QR code pairing and can then trigger tool calls and task execution within conversations. Tencent Cloud has completed backend adaptation, and together the pieces form a product matrix spanning Lighthouse, Claw Pro, WorkBuddy, and QClaw that embeds agent access into social apps to lower the barrier to use. The current version still has limitations around PC/Mac access, Markdown rendering, and link redirection.
NSF Paper Proposes 'Deep Unlearning' Evaluation: Current LLMs Struggle with Complete Forgetting
Model Safety · Evaluation · Unlearning
A paper from the NSF public repository introduces 'deep unlearning' evaluation: requiring not only deletion of target facts but also preventing models from re-deriving them through remaining knowledge and multi-step reasoning. The authors propose three metrics—Success-DU, Recall, and Accuracy—and conduct experiments using the real-world MQuAKE dataset and a newly constructed multi-step reasoning set, Eval-DU. Results show existing unlearning methods struggle to balance 'non-rederivability' with general capability: either forgetting is incomplete or irrelevant knowledge is excessively erased. The paper calls for specialized unlearning algorithms targeting reasoning chains to enhance safety and trustworthiness.
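The deep-unlearning criterion can be made concrete with a small sketch: a fact counts as deeply unlearned only if the model neither recalls it directly nor can re-derive it from retained facts via some multi-hop chain. The `model_answers` mapping below stands in for a real LLM's question answering, and the toy facts are invented for illustration:

```python
# Hypothetical sketch of the deep-unlearning criterion: direct recall
# must fail AND every known multi-step derivation chain must be broken.
# `model_answers` is a stand-in for querying a real LLM (an assumption).

def deeply_unlearned(target_fact, derivation_chains, model_answers):
    """target_fact: (question, answer) pair that was unlearned.
    derivation_chains: lists of intermediate (question, answer) hops
    that together imply the target. model_answers: dict question->answer."""
    q, a = target_fact
    if model_answers.get(q) == a:          # direct recall survives
        return False
    for chain in derivation_chains:
        # if every hop in some chain is still answerable, the target
        # remains recoverable by multi-step reasoning
        if all(model_answers.get(hq) == ha for hq, ha in chain):
            return False
    return True

# Toy example: the target fact was deleted, but both supporting hops
# remain, so a reasoning model could re-derive it -> not deeply unlearned.
answers = {"capital_of(X)": "Paris", "country_of(Eiffel)": "X"}
fact = ("city_of(Eiffel)", "Paris")
chains = [[("country_of(Eiffel)", "X"), ("capital_of(X)", "Paris")]]
print(deeply_unlearned(fact, chains, answers))  # False
```

This is exactly the gap the paper's Success-DU metric targets: shallow unlearning passes the direct-recall test while failing the chain test.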
IBM Uses Agentic AI for Antibody Affinity Screening: MAMMAL+MCP Workflow
AI for Science · Biomedical · Agent
IBM Research introduced an Agentic AI workflow for antibody-antigen affinity screening: leveraging MAMMAL, an open-source foundation model trained on billions of protein sequences, fine-tuned on the SabDab dataset to predict binding affinity. The team compared full antibody vs. heavy chain-only, full-length vs. CDR3 representations, and full-parameter fine-tuning vs. LoRA, selecting the best-performing model by AUC-ROC for candidate ranking. The model and tools are packaged into an MCP workflow supporting conversational interaction, where agents autonomously select tools, interpret results, and iteratively optimize—enabling large-scale virtual screening to reduce experimental burden.
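The model-selection step above reduces to scoring each fine-tuning configuration on a held-out set and keeping the one with the highest AUC-ROC. A minimal sketch, with configuration names, labels, and scores invented for illustration (not IBM's actual results):

```python
# Hypothetical sketch of AUC-ROC-based model selection across
# fine-tuning configurations. All data below is invented.

def auc_roc(labels, scores):
    """AUC-ROC as the probability a positive outranks a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 1, 0, 0, 1, 0]          # binder / non-binder labels
candidate_scores = {                        # predicted binding affinities
    "full-ab + full-param": [0.9, 0.2, 0.8, 0.7, 0.3, 0.4, 0.6, 0.1],
    "heavy-chain + LoRA":   [0.7, 0.4, 0.9, 0.6, 0.2, 0.65, 0.8, 0.3],
    "CDR3 + LoRA":          [0.6, 0.5, 0.7, 0.4, 0.3, 0.6, 0.5, 0.4],
}

best = max(candidate_scores, key=lambda k: auc_roc(y_true, candidate_scores[k]))
print(best, auc_roc(y_true, candidate_scores[best]))  # full-ab + full-param 1.0
```

The winning configuration is then used to rank candidate antibodies for virtual screening.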
S2LC Open-Sourced: Concurrently Serve 320+ LoRA Adapters on Single GPU, Latency 3.59ms
Inference Optimization · LoRA · Open Source
The open-source project S2LC proposes Shared Spectral Low-Rank Compression, compressing multiple LoRA adapters using shared spectral bases and low-bit codebook indexing. During inference, fused Triton kernels reconstruct weights in GPU registers, minimizing HBM reads/writes to achieve 'zero-copy' operation. Authors report forward latency as low as 3.59 milliseconds on A100-80GB; current uint8 compression achieves ~2x ratio, with theoretical 3-bit compression reaching up to 10.1x, enabling concurrent serving of over 320 fine-tuned adapters on a single GPU—ideal for multi-tenant and large-scale personalized deployments. The repository includes validation suites and API interfaces.
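The core idea (a shared spectral basis plus low-bit per-adapter coefficients) can be sketched in a few lines of numpy. All shapes, the rank, and the uint8 quantizer below are illustrative assumptions, not the project's actual implementation, which fuses the reconstruction into Triton kernels:

```python
# Rough numpy sketch of shared spectral low-rank compression: stack many
# LoRA deltas, extract a shared basis via SVD, and store each adapter
# only as low-bit coefficients over that basis. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, r, n_adapters, shared_rank = 64, 4, 8, 6

# n low-rank LoRA deltas (d x d each), flattened into rows
deltas = np.stack([
    rng.normal(size=(d, r)) @ rng.normal(size=(r, d))
    for _ in range(n_adapters)
]).reshape(n_adapters, -1)

# shared spectral basis from the stacked deltas
_, _, vt = np.linalg.svd(deltas, full_matrices=False)
basis = vt[:shared_rank]                         # (shared_rank, d*d)
coeffs = deltas @ basis.T                        # per-adapter coefficients

# crude uint8 quantization of the coefficients (codebook stand-in)
lo, hi = coeffs.min(), coeffs.max()
q = np.round((coeffs - lo) / (hi - lo) * 255).astype(np.uint8)
deq = q.astype(np.float32) / 255 * (hi - lo) + lo

# reconstruct adapter 0 and measure relative error
recon = (deq @ basis).reshape(n_adapters, d, d)[0]
orig = deltas[0].reshape(d, d)
rel_err = np.linalg.norm(recon - orig) / np.linalg.norm(orig)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Only the small `q` tensors are per-adapter; the basis is shared, which is what makes serving hundreds of adapters on one GPU cheap.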
ZJU CA-TTS Calibrates Multimodal Confidence: 4.3x Greater Drop Under Degradation
Multimodal · Test-time Scaling · Trustworthy AI
A team from Zhejiang University proposed the CA-TTS framework to address multimodal models' tendency to maintain 'high confidence' responses even when visual evidence degrades. It first uses confidence-driven reinforcement learning to calibrate self-assessment, then converts calibrated confidence into a test-time scaling signal for compute allocation. A dual-reward design increases the confidence drop under image degradation by 4.3x. Combined with confidence-weighted voting, self-reflection, and visual self-check in a multi-stage verification loop, it achieves SOTA performance across four visual reasoning benchmarks, with peak accuracy surpassing 45%.
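The confidence-weighted voting component of the verification loop is straightforward to sketch. The answers and confidences below are invented; in the real framework the confidence comes from the RL-calibrated self-assessment:

```python
# Minimal sketch of confidence-weighted voting over repeated test-time
# samples: each candidate answer accumulates its calibrated confidence,
# and the heaviest answer wins. Sample data is invented for illustration.
from collections import defaultdict

def confidence_weighted_vote(samples):
    """samples: list of (answer, calibrated_confidence) pairs from
    repeated sampling at test time."""
    weight = defaultdict(float)
    for answer, conf in samples:
        weight[answer] += conf
    return max(weight, key=weight.get)

samples = [("A", 0.9), ("B", 0.4), ("A", 0.7), ("C", 0.3), ("B", 0.5)]
print(confidence_weighted_vote(samples))  # A  (total weight 1.6)
```

Unlike plain majority voting, a well-calibrated low-confidence answer contributes little even if sampled often, which is why calibration has to come first.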