Kimi K2.5: Visual Agentic Intelligence

Abstract

We introduce Kimi K2.5, an open-source multimodal agentic model designed to advance general agentic intelligence. The model focuses on the joint optimization of text and vision so that the two modalities enhance each other, employing techniques such as joint text-vision pre-training, zero-vision SFT, and joint text-vision reinforcement learning. Built on this multimodal foundation, K2.5 introduces Agent Swarm, a self-directed parallel agent orchestration framework that dynamically decomposes complex tasks into heterogeneous sub-problems and executes them concurrently. Evaluations demonstrate state-of-the-art results across coding, vision, reasoning, and agentic tasks, and Agent Swarm reduces latency by up to 4.5x over single-agent baselines.
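The parallel orchestration pattern described for Agent Swarm can be illustrated with a minimal sketch. This is not the paper's implementation; the `decompose`, `run_subagent`, and `agent_swarm` names are hypothetical, and a thread pool stands in for the model's self-directed sub-agent execution:

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(task):
    # Hypothetical decomposition: split a task into independent sub-problems.
    return [part.strip() for part in task.split(";")]

def run_subagent(subtask):
    # Stand-in for a sub-agent (e.g., a model call) handling one sub-problem.
    return f"result({subtask})"

def agent_swarm(task, max_workers=4):
    # Decompose the task, execute sub-problems concurrently, aggregate results.
    subtasks = decompose(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_subagent, subtasks))
```

Concurrent execution of independent sub-problems is what yields the latency reduction over a single agent working through them sequentially.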

Yuyao Ge 葛钰峣