Kimi K2.5: Visual Agentic Intelligence

Abstract

We introduce Kimi K2.5, an open-source multimodal agentic model designed to advance general agentic intelligence. The model focuses on the joint optimization of text and vision so that the two modalities enhance each other, employing techniques such as joint text-vision pre-training, zero-vision SFT, and joint text-vision reinforcement learning. Built on this multimodal foundation, K2.5 introduces Agent Swarm, a self-directed parallel agent orchestration framework that dynamically decomposes complex tasks into heterogeneous sub-problems and executes them concurrently. Evaluations demonstrate state-of-the-art results across coding, vision, reasoning, and agentic tasks, and Agent Swarm reduces latency by up to 4.5x over single-agent baselines.
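The parallel orchestration pattern described for Agent Swarm can be illustrated with a minimal sketch. This is not the paper's implementation; the `decompose`, `run_subagent`, and `agent_swarm` names are hypothetical, and a thread pool stands in for the model's self-directed sub-agent execution:

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(task):
    # Hypothetical decomposition: split a task into independent sub-problems.
    return [part.strip() for part in task.split(";")]

def run_subagent(subtask):
    # Stand-in for a sub-agent (e.g., a model call) handling one sub-problem.
    return f"result({subtask})"

def agent_swarm(task, max_workers=4):
    # Decompose the task, execute sub-problems concurrently, aggregate results.
    subtasks = decompose(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_subagent, subtasks))
```

Concurrent execution of independent sub-problems is what yields the latency reduction over a single agent working through them sequentially.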

Yuyao Ge 葛钰峣