YuyaoGe's Website
LLM
Kimi K2.6: Advancing Open-Source Coding
We are open sourcing our latest model, Kimi K2.6, featuring state-of-the-art coding, long-horizon execution, and agent swarm …
Kimi Team
,
Yuyao Ge 葛钰峣
Cite
Blog
Hugging Face
Ask KIMI
SkillForge: Co-Evolving Skills and Agents via Dynamic Skill Lifecycles
An agentic RL method that evolves the skill library through a fitness-driven lifecycle, enabling skills and the model to co-evolve throughout training.
Yuyao Ge 葛钰峣
,
Shenghua Liu
,
Yiwei Wang
,
Tianyu Liu
,
Yuchen He
,
Baolong Bi
,
Lingrui Mei
,
Jiayu Yao
,
Lizhe Chen
,
Jiafeng Guo
,
Xueqi Cheng
PDF
Prism-Δ: Differential Subspace Steering for Prompt Highlighting in Large Language Models
We propose PRISM-Δ, a differential subspace steering method for prompt highlighting that matches or exceeds the best existing method on 19 of 20 configurations with relative gains up to +10.6%, while halving the fluency cost.
Yuyao Ge 葛钰峣
,
Shenghua Liu
,
Yiwei Wang
,
Tianyu Liu
,
Baolong Bi
,
Lingrui Mei
,
Jiayu Yao
,
Jiafeng Guo
,
Xueqi Cheng
Cite
PDF
arXiv
Hugging Face
GitHub
Project
Ask KIMI
Do Large Language Models Already Know the Answer Before They Finish Thinking?
Probing hidden states during reasoning reveals that LLMs already know the answer before finishing thinking. We detect overthinking via ‘jumps’ and intervene during inference to improve reasoning.
Yuyao Ge 葛钰峣
,
Shenghua Liu
,
Yiwei Wang
,
Tianyu Liu
,
Lingrui Mei
,
Baolong Bi
,
Jiayuan Guo
,
Jiayu Yao
,
Jiafeng Guo
,
Xueqi Cheng
Code
PDF
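The detection idea above, reduced to its simplest form: train a lightweight probe on intermediate hidden states and watch its confidence across reasoning steps. A hypothetical sketch with stand-in data, not the paper's code (`hidden`, `answer`, and the probe itself are all assumptions):

```python
# Hypothetical sketch (not the paper's code): a linear probe on intermediate
# hidden states that predicts the final answer before reasoning finishes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hidden = rng.normal(size=(1000, 512))   # stand-in for per-step hidden states
answer = rng.integers(0, 2, size=1000)  # stand-in for eventual answer labels

probe = LogisticRegression(max_iter=1000).fit(hidden, answer)
confidence = probe.predict_proba(hidden)[:, 1]
# A sharp rise ("jump") in probe confidence partway through a reasoning trace
# would indicate the answer is already encoded before thinking completes.
```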
PromptCD: Test-Time Behavior Enhancement via Polarity-Prompt Contrastive Decoding
Abstract: We present PromptCD, a test-time method for controlling LLM behavior without additional training. The approach creates paired positive and negative guiding prompts for a target behavior, then contrasts the model's responses: at the token-probability level for LLMs, and via visual attention patterns for VLMs.
Baolong Bi
,
Yuyao Ge 葛钰峣
,
Shenghua Liu
,
Yuchen He
,
Siqian Tong
,
Lizhe Chen
,
Lingrui Mei
,
Zehao Li
,
Yiwei Wang
,
Yujun Cai
,
Ming-Hsuan Yang
,
Xueqi Cheng
Cite
PDF
arXiv
Ask KIMI
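The token-level contrast PromptCD describes can be sketched roughly as follows. This is a minimal illustration under my own assumptions (a Hugging Face-style causal LM whose forward pass returns `.logits`; `alpha` is a hypothetical contrast weight), not the paper's implementation:

```python
# Minimal sketch of polarity-prompt contrastive decoding (illustrative only).
import torch
import torch.nn.functional as F

def contrastive_next_token_logits(model, pos_ids, neg_ids, alpha=0.5):
    """Contrast next-token log-probs under positive vs. negative prompts.

    pos_ids / neg_ids: input ids for the same query prefixed with the
    positive / negative behavior prompt. alpha is an assumed contrast weight.
    """
    with torch.no_grad():
        logp_pos = F.log_softmax(model(pos_ids).logits[:, -1, :], dim=-1)
        logp_neg = F.log_softmax(model(neg_ids).logits[:, -1, :], dim=-1)
    # Boost tokens favored under the positive prompt and penalize those the
    # negative prompt favors.
    return (1 + alpha) * logp_pos - alpha * logp_neg
```

Decoding would then proceed (greedy or sampled) from the contrasted logits instead of the raw ones.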
Kimi K2.5: Visual Agentic Intelligence
We introduce Kimi K2.5, an open-source multimodal agentic model designed to advance general agentic intelligence. The model focuses on …
Kimi Team
,
Yuyao Ge 葛钰峣
Cite
PDF
arXiv
Blog
Hugging Face
GitHub
Ask KIMI
Gated Differentiable Working Memory for Long-Context Language Modeling
Abstract: Long contexts challenge transformers as attention scores dilute across thousands of tokens and critical information is often lost in the middle. We reframe test-time adaptation as a budget-constrained memory consolidation problem and propose Gdwm (Gated Differentiable Working Memory), which introduces a write controller that estimates Contextual Utility, an information-theoretic measure of long-range contextual dependence, to allocate gradient steps efficiently.
Lingrui Mei
,
Shenghua Liu
,
Yiwei Wang
,
Yuyao Ge 葛钰峣
,
Baolong Bi
,
Jiayu Yao
,
Jun Wan
,
Ziling Yin
,
Jiafeng Guo
,
Xueqi Cheng
Cite
PDF
arXiv
Ask KIMI
Reward and Guidance through Rubrics: Promoting Exploration to Improve Multi-Domain Reasoning
Abstract: Reinforcement learning (RL) has shown great promise in enhancing LLM reasoning, but current approaches mainly focus on single domains with verifiable rewards. We propose RGR-GRPO, a rubric-driven RL framework for multi-domain reasoning that uses rubrics to provide fine-grained reward signals and offline guidance.
Baolong Bi
,
Shenghua Liu
,
Yiwei Wang
,
Siqian Tong
,
Lingrui Mei
,
Yuyao Ge 葛钰峣
,
Yilong Xu
,
Jiafeng Guo
,
Xueqi Cheng
Cite
PDF
arXiv
Ask KIMI
Can Graph Descriptive Order Affect Solving Graph Problems with LLMs?
We present the first comprehensive analysis of how the order of graph descriptions impacts LLM performance, evaluating four graph description orders across six graph problems using six mainstream LLMs.
Yuyao Ge 葛钰峣
,
Shenghua Liu
,
Baolong Bi
,
Yiwei Wang
,
Lingrui Mei
,
Wenjie Feng
,
Lizhe Chen
,
Xueqi Cheng
Cite
Slides
Video
PDF
ACL Anthology
Ask KIMI
Poster
GitHub
Kimi K2: Open Agentic Intelligence
We introduce Kimi K2, a Mixture-of-Experts large language model with 32 billion activated parameters and 1 trillion total parameters. …
Kimi Team
,
Yuyao Ge 葛钰峣
Cite
PDF
arXiv
Blog
Hugging Face
GitHub
Ask KIMI
PIS: Linking Importance Sampling and Attention Mechanisms for Efficient Prompt Compression
Abstract: Large language models (LLMs) have achieved remarkable progress, demonstrating unprecedented capabilities across various natural language processing tasks. However, the high costs associated with such exceptional performance limit the widespread adoption of LLMs, highlighting the need for prompt compression.
Lizhe Chen
,
Binjia Zhou
,
Yuyao Ge 葛钰峣
,
Jiayi Chen
,
Shiguang Ni
Cite
PDF
arXiv
Ask KIMI
a1: Steep Test-time Scaling Law via Environment Augmented Generation
Large Language Models (LLMs) have made remarkable breakthroughs in reasoning, yet continue to struggle with hallucinations, logical …
Lingrui Mei
,
Shenghua Liu
,
Yiwei Wang
,
Baolong Bi
,
Yuyao Ge 葛钰峣
,
Jun Wan
,
Yurong Wu
,
Xueqi Cheng
Cite
PDF
Ask KIMI
Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking
We present the first comprehensive analysis of the impacts of CoT prompting on Reasoning LLMs, finding that one-shot CoT consistently enhances performance and reduces excessive reflections by approximately 90%.
Yuyao Ge 葛钰峣
,
Shenghua Liu
,
Yiwei Wang
,
Lingrui Mei
,
Lizhe Chen
,
Baolong Bi
,
Xueqi Cheng
Cite
PDF
arXiv
Hugging Face
Ask KIMI
EMNLP 2024 Paper Sharing | Fewer is More: CoT Exemplars Should Be Few but Refined
The authors propose CoT-Influx, a method that improves LLM reasoning by optimizing both the selection and the content of CoT exemplars; its core idea is to maximize the effective information in the input through pruning.
Yuyao Ge 葛钰峣
Oct 24, 2024
2 min read
Paper Sharing
Translating Words to Worlds: Zero-Shot Synthesis of 3D Terrain from Textual Descriptions Using Large Language Models
Abstract: The current research on text-guided 3D synthesis predominantly utilizes complex diffusion models, posing significant challenges in tasks like terrain generation. This study ventures into the direct synthesis of text-to-3D terrain in a zero-shot fashion, circumventing the need for diffusion models.
Guangzi Zhang
,
Lizhe Chen
,
Yu Zhang
,
Yan Liu
,
Yuyao Ge 葛钰峣
,
Xingquan Cai
Cite
PDF
Paper Sharing | Broad Decoding Strategies Can Jailbreak Large Models
In this paper, the authors introduce MaliciousInstruct, a new dataset; a method for evaluating the toxicity of model responses; an attack that manipulates decoding hyperparameters, termed generation exploitation; and an alignment strategy, generation-aware alignment.
Yuyao Ge 葛钰峣
Apr 9, 2024
2 min read
Paper Sharing
Paper Walkthrough | TTA: A New Method for Assessing the Confidence of LLM Answers
This paper proposes a new method that comprehensively assesses the credibility of a large model's multiple candidate answers, mitigating the model's overconfidence in wrong answers.
Yuyao Ge 葛钰峣
Mar 25, 2024
2 min read
Paper Sharing
Softmax Regression and Its Optimization Problem
This post is part of my notes on the Deep Learning Systems course taught at CMU by Tianqi Chen and J. Zico Kolter.
Yuyao Ge 葛钰峣
Mar 21, 2024
3 min read
Notes
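For orientation, the model this post studies fits in a few lines; a minimal NumPy sketch of softmax regression with one gradient step on the cross-entropy loss (my own sketch, not the course's code):

```python
# Minimal softmax regression with a plain gradient step (NumPy sketch).
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize against overflow
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def grad_step(X, y, W, lr=0.1):
    """One gradient step on mean cross-entropy loss.

    X: (n, d) features; y: (n,) integer class labels; W: (d, k) weights.
    The gradient of mean cross-entropy wrt W is X^T (softmax(XW) - Y) / n.
    """
    n = X.shape[0]
    P = softmax(X @ W)
    P[np.arange(n), y] -= 1.0  # softmax(XW) - one_hot(y)
    return W - lr * (X.T @ P) / n
```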
Paper Walkthrough | March's Latest Survey of LLM Agents for Games
March's latest survey of large-model agents for games.
Yuyao Ge 葛钰峣
Mar 21, 2024
1 min read
Paper Sharing
Paper Walkthrough | February's Latest LLM Survey, from Word2Vec Author Tomas Mikolov
The latest large-model survey, from February, by Word2Vec author Tomas Mikolov.
Yuyao Ge 葛钰峣
Mar 16, 2024
2 min read
Paper Sharing
Coding Practice | Understanding Self-Attention in One Article
This article reproduces in code the self-attention mechanism used in the Transformer architecture.
Yuyao Ge 葛钰峣
Mar 10, 2024
3 min read
Coding Practice
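The mechanism the post reproduces is scaled dot-product attention; here is a compact single-head NumPy sketch for reference (mine, not the post's code):

```python
# Single-head scaled dot-product self-attention (NumPy sketch).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # attention-weighted values
```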
Paper Walkthrough | Auto-CoT: Automatically Generating CoT via Clustering
CoT has had two paradigms: Zero-shot CoT, which appends "Let's think step by step" to the question, and Manual CoT (Few-shot CoT), where each exemplar consists of a question and its reasoning chain. How well the second performs depends on how well the CoT exemplars are written, which takes manual effort. This paper proposes Auto-CoT, which generates Few-shot CoT demonstrations automatically, freeing our hands!
Yuyao Ge 葛钰峣
Mar 2, 2024
1 min read
Paper Sharing
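Auto-CoT's selection step, sketched under assumptions: questions are clustered via precomputed embeddings (the encoder is unspecified here), and the question nearest each centroid becomes a demonstration, to be answered with zero-shot CoT. A rough illustration, not the paper's code:

```python
# Illustrative Auto-CoT-style demonstration selection (assumed embeddings).
import numpy as np
from sklearn.cluster import KMeans

def pick_demo_questions(questions, embeddings, k=8):
    """Cluster questions; take the one closest to each centroid.

    embeddings: (n, d) question embeddings from any sentence encoder
    (assumed precomputed). Each selected question would then be answered
    with zero-shot CoT ("Let's think step by step") to build demonstrations.
    """
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    demos = []
    for c in range(k):
        idx = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[idx] - km.cluster_centers_[c], axis=1)
        demos.append(questions[idx[np.argmin(dists)]])
    return demos
```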
Paper Walkthrough | Do Longer Chains of Thought Make Large Models Smarter?
Chain of thought (CoT) has already proven, in practice, to substantially improve large models' reasoning. However, no work so far has explained the relationship between CoT length and reasoning ability. This paper runs systematic experiments on CoT around this core question and draws many interesting, counterintuitive conclusions.
Yuyao Ge 葛钰峣
Feb 26, 2024
1 min read
Paper Sharing
Paper Walkthrough | The Emergence of Large Models Is a Mirage
This paper argues that large models exhibit no "emergence": their abilities grow linearly as parameters increase.
Yuyao Ge 葛钰峣
Dec 16, 2023
2 min read
Paper Sharing
How Should We Supervise AI Stronger Than Humans? | Weak-to-strong generalization
AI is becoming ever more capable, approaching and even surpassing human ability, and humans can hardly supervise such superhuman AI anymore. So how do we supervise AI that is stronger than we are?
Yuyao Ge 葛钰峣
Dec 15, 2023
1 min read
Notes
Paper Walkthrough | Multimodal Representations of Graph Structure for LLMs
This paper explores the impact of encoding global and local graph structures using different modalities, focusing particularly on node classification tasks.
Yuyao Ge 葛钰峣
Nov 21, 2023
10 min read
Paper Sharing
Paper Walkthrough | Graph-Guided Reasoning for Multi-Hop Question Answering in Large Language Models
Proposes a graph-guided, LLM-based reasoning approach for multi-step reasoning problems. The paper makes two main contributions: the reasoning approach itself, and an in-context learning method for knowledge-triple extraction that allows variable definitions.
Yuyao Ge 葛钰峣
Nov 20, 2023
2 min read
Paper Sharing
Paper Walkthrough | GraphText: Mapping Graphs to Text
In this paper, the authors propose GRAPHTEXT, a framework that maps graphs into text space based on the characteristics of graph structure.
Yuyao Ge 葛钰峣
Nov 14, 2023
1 min read
Paper Sharing
Paper Walkthrough | ReAct: An LLM Reasoning Paradigm Combining Reasoning and Acting
The ReAct paradigm: combining reasoning and acting in large language models.
Yuyao Ge 葛钰峣
Oct 27, 2023
1 min read
Paper Sharing
Paper Walkthrough | Reasoning with Language Model is Planning with World Model
Past research shows that humans have an internal world model that lets them simulate actions and their effects on the world's state, enabling deliberate planning for complex tasks spanning motor control, imagination, reasoning, and decision-making. Large models, by contrast, can only reason autoregressively, so the authors bring reinforcement learning and Monte Carlo tree search into large-model reasoning.
Yuyao Ge 葛钰峣
Oct 26, 2023
1 min read
Paper Sharing
Paper Walkthrough | Can Language Models Solve Graph Problems in Natural Language?
This paper is the first attempt to solve graph problems with large models. The authors propose NLGraph, a natural-language benchmark of over a thousand basic graph problems, and run experiments on GPT-3/4 to draw several conclusions. At the end of the paper, they propose two methods for improving how large models handle graph problems.
Yuyao Ge 葛钰峣
Oct 23, 2023
2 min read
Paper Sharing
Paper Walkthrough | TALK LIKE A GRAPH: ENCODING GRAPHS FOR LARGE LANGUAGE MODELS
Despite notable progress in automated reasoning over natural text, graph reasoning with large language models (LLMs) remains unsatisfactory. In this paper, the authors conduct the first comprehensive study of encoding graph-structured data as text for LLMs to reason over.
Yuyao Ge 葛钰峣
Oct 21, 2023
1 min read
Paper Sharing
Paper Walkthrough | LANGUAGE MODELS REPRESENT SPACE AND TIME
The dominance of large language models (LLMs) has sparked debate over whether these systems merely do shallow statistics over massive data or genuinely learn a world model. In this paper, the authors run experiments with Llama 2 on three spatial and three temporal datasets and find evidence for the latter.
Yuyao Ge 葛钰峣
Oct 11, 2023
2 min read
Paper Sharing
llama Model Quantization Report (Part 2)
Further study of LLM.int8 and the emergent-features phenomenon.
Yuyao Ge 葛钰峣
Aug 11, 2023
1 min read
Paper Sharing
llama Model Quantization Report (Part 1)
Fundamentals of large-model quantization and LLM.int8 quantization.
Yuyao Ge 葛钰峣
Aug 4, 2023
3 min read
Paper Sharing