Safety Aha Moment

1 Introduction

Recent advances in large language models (LLMs) have significantly improved their intelligence, particularly their reasoning and decision-making capabilities [1, 2]. However, these performance gains are often accompanied by a widening gap between capability and safety, which moves these models further away from the AI-45° Law [3]. For example, existing LLMs frequently struggle to uphold ethical principles, societal norms, and broader human values, especially when navigating the complexities of real-world scenarios. ...