Efficacy of Context Summarization Techniques on Large Language Model Chatbots: Balancing Compression and Recollection
This paper investigates the efficacy of various context summarization techniques for LLM chatbots, with a focus on evaluating recollection performance. The study explores the challenges posed by LLMs’ statelessness and the limits that token constraints place on their context window. It presents a comparative analysis of different summarization methods, including FullContext and Cohere Summarize, and introduces a novel LLM prompt for context compression. The research aims to strike a balance between context retention and data compression in order to reduce API call costs and enhance the usability of LLMs for both personal and enterprise applications. The findings indicate that certain techniques can significantly reduce the space occupied by conversation history while maintaining adequate context recollection, with FullContext emerging as a balanced method for doing so: applied to a large dataset of real ChatGPT conversations, it achieved a compression ratio of 12.3 (92% space saving) while retaining 77% of the original context. On a hand-crafted dataset designed to test retention of fine details, Cohere Summarize preserved 6 of 7 key topics at a compression ratio of 5.1 (80% space saving). The paper contributes to the field by providing insights into cost-effective LLM utilization and into expanding the effective context window of lower-grade LLMs.
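For reference, the reported space-saving percentages follow from the compression ratios under the standard definition, compression ratio = original size / compressed size, so space saving = 1 − 1/ratio. The short sketch below only illustrates this arithmetic and is not code from the study:

```python
def space_saving(compression_ratio: float) -> float:
    """Fraction of space saved, assuming ratio = original_size / compressed_size."""
    return 1.0 - 1.0 / compression_ratio

# Figures reported in the abstract:
print(f"{space_saving(12.3):.0%}")  # ~92% (FullContext, real ChatGPT conversations)
print(f"{space_saving(5.1):.0%}")   # ~80% (Cohere Summarize, hand-crafted dataset)
```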