Large language models (LLMs) have achieved remarkable results across a wide range of natural language processing tasks. Scientific text summarization is particularly challenging because of the specialized nature of scientific content. Evaluating LLMs on this task requires carefully constructed benchmarks and metrics. Several research pap