AI45 Research: A Path to Trustworthy AGI

  • To approach our vision of “Make Safe AI”, our research spans foundational safety theory, safety evaluation frameworks, and safety-related technologies and applications, balancing AI safety and capability in both theory and practice to ultimately achieve endogenous AI safety.

SafeWork-F1: A Frontier AI Risk Management Framework

Framework Practice Report

Today, artificial intelligence is developing at an unprecedented pace. Breakthrough advances in frontier models on the path toward artificial general intelligence (AGI) promise enormous potential to shape a better future, yet also raise profound concerns about their latent risks. At the heart of these concerns is existential risk—the fear that powerful, autonomous AI systems may be maliciously misused, spiral out of control, or even threaten human survival or fundamental well-being. Leading global research organizations such as METR, OpenAI, Google DeepMind, and xAI, as well as the international community, are actively probing the scope of frontier risks and forging alliances to demarcate “red lines.” There is emerging consensus on the broad contours and main dimensions of frontier risks, but beneath this surface, the understanding and management of these risks face many deep and unresolved challenges. This highlights a significant gap in theoretical foundations and practical approaches. We distill five core, high-level challenges: ...

July 21, 2025 · 11 min · Center for Safe&Trustworthy AI

SafeWork-T1: A Safety Reasoning Training Accelerator for Multimodal Large Models

When AI Training Meets “Food Delivery”: Why Do Legacy Systems Struggle?

Envision an AI-powered “delivery dispatcher” that must excel in two areas:

  • Delivering orders quickly (general capability)
  • Simultaneously monitoring riders for violations like speeding or running red lights (safety/trustworthiness)

Yet traditional training frameworks face multiple limitations:

  • Compartmentalized workflow: Diverse demands (training, inference generation, validation scoring) must be split across separate “stations” handled by different “riders” (clusters/GPUs).
  • Rigid architecture: Adding new constraints (e.g., safety/knowledge/value validators) often requires major pipeline overhauls or complete rebuilds.
  • Poor scaling: Adding more “riders” (GPUs) paradoxically worsens resource imbalance: some sit idle while others are overloaded to the point of overheating.

To solve these challenges, Shanghai AI Lab’s Center for Safe&Trustworthy AI introduces SafeWork-T1: a multimodal, trustworthiness-focused training platform. SafeWork-T1 colocates these tasks on shared resources, working like a “collapsible, modular, multi-purpose workbench” that resolves all of the above pain points at once and enables safer, more efficient, and more accurate trustworthiness-enhanced training paradigms. ...
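
The sketch below illustrates the colocation idea in spirit only; the names (Worker, Batch, colocated_step) are ours and not the SafeWork-T1 API. Each worker time-shares rollout generation, capability/safety scoring, and the policy update on its own shard, instead of each stage owning a dedicated cluster and leaving other GPUs idle.

```python
# Hypothetical sketch of a colocated trustworthiness-training loop; names are ours.
# Every worker time-shares rollout generation, capability/safety scoring, and the
# policy update, instead of each stage owning a separate cluster of GPUs.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Batch:
    prompts: List[str]
    responses: Optional[List[str]] = None
    capability_scores: Optional[List[float]] = None
    safety_scores: Optional[List[float]] = None


class Worker:
    """One GPU (or GPU group) hosting the policy, the validators, and the optimizer."""

    def generate(self, batch: Batch) -> Batch:
        # Placeholder rollout; in practice this is the multimodal policy model.
        batch.responses = [f"<response to: {p}>" for p in batch.prompts]
        return batch

    def score(self, batch: Batch) -> Batch:
        # Capability and safety/value validators run on the same device right after
        # generation, so scoring never waits on a transfer to a separate cluster.
        batch.capability_scores = [1.0 for _ in batch.responses]
        batch.safety_scores = [1.0 for _ in batch.responses]
        return batch

    def update(self, batch: Batch) -> float:
        # Combine capability and safety signals into one objective (stand-in for a
        # real policy-gradient step).
        rewards = [c + s for c, s in zip(batch.capability_scores, batch.safety_scores)]
        return sum(rewards) / len(rewards)


def colocated_step(workers: List[Worker], prompts: List[str]) -> float:
    """Each worker runs all three stages on its own shard: no idle 'stations'."""
    shard = max(1, len(prompts) // len(workers))
    results = []
    for i, worker in enumerate(workers):
        batch = Batch(prompts=prompts[i * shard:(i + 1) * shard])
        if batch.prompts:
            results.append(worker.update(worker.score(worker.generate(batch))))
    return sum(results) / len(results)


if __name__ == "__main__":
    print(colocated_step([Worker(), Worker()], [f"prompt {i}" for i in range(8)]))
```

In a real system the placeholder generate/score/update bodies would be the policy model, the trustworthiness validators, and an RL optimizer; the control flow simply shows why colocation avoids the idle “stations” of a compartmentalized pipeline.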

July 21, 2025 · 3 min · Center for Safe&Trustworthy AI

The intelligent agent lurking in your phone

Entry Point: From Simple Voice Assistant to the Phone’s “Second Brain”

The Evolution of Mobile Assistants: A few years ago, mobile assistants were merely tools that responded to simple commands like “What’s the weather today?” or “Set an alarm for 7 a.m.” Today, on-device agents are rapidly evolving into our phone’s “second brain”—a highly privileged core of the personal operating system. They are no longer isolated apps, but “super stewards” that can span across all apps, invoke system functions, manage your files, read your messages, and access your contacts and photo albums.

Security Value: This shift calls for models that can intelligently integrate unstructured knowledge—such as design documents, operation manuals, incident analysis reports, and user feedback—to support timely, accurate, and comprehensive fault diagnosis and safety reviews. This integration helps minimize risks caused by missing or misused information.

The Everyday Risk of High-Privilege Operations: When we routinely issue commands to our phones—“Send this screenshot to Mr. Li,” “Create a calendar event based on this email and notify all participants,” or “If my wife calls, remind me that today is our anniversary”—we are effectively granting the agent permission to execute a series of high-privilege operations. The key concern is: how do we ensure this “agent” won’t be compromised, won’t misinterpret instructions, and won’t “hallucinate” at critical moments?

Security Value: A robust evaluation framework is essential—akin to how we conduct background checks and periodic assessments for individuals in sensitive positions. Mobile agents must undergo comprehensive “security checkups” not only to defend against external threats, but also to proactively verify that their behavior remains reliable, controllable, and compliant when handling our everyday, yet high-stakes, tasks.

About the Project: Imagine this scenario: You receive a phishing email disguised as an “annual statement.” When you tell your phone agent, “Help me summarize today’s unread emails,” the agent is hijacked by a hidden malicious command within this email while processing it. It silently calls the interface of your banking app, sends your login credentials and payment password to the attacker via SMS, and deletes the sent record. And all of this happens without your knowledge. ...
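
As a toy illustration of the “security checkup” idea above, the following sketch (our own simplification, not the project’s evaluation framework) gates an agent’s tool calls with a simple policy: high-privilege actions must originate from an explicit user instruction, never from untrusted content such as an email body, which is exactly the prompt-injection path in the phishing scenario.

```python
# Toy policy gate for an on-device agent's tool calls (hypothetical and simplified;
# a real "security checkup" would also cover reliability, compliance, and auditing).
from dataclasses import dataclass

HIGH_PRIVILEGE_TOOLS = {"send_sms", "bank_transfer", "read_credentials", "delete_message"}


@dataclass
class ToolCall:
    tool: str             # e.g. "send_sms"
    argument_source: str  # "user_instruction" or "untrusted_content" (email body, web page, ...)


def is_allowed(call: ToolCall, user_confirmed: bool) -> bool:
    """Allow low-privilege calls; require explicit user confirmation for high-privilege
    calls; always block high-privilege calls whose arguments come from untrusted content
    (the prompt-injection path in the phishing-email scenario)."""
    if call.tool not in HIGH_PRIVILEGE_TOOLS:
        return True
    if call.argument_source == "untrusted_content":
        return False
    return user_confirmed


if __name__ == "__main__":
    # The hijacked call injected by the phishing email is blocked even though the user
    # approved the benign "summarize my unread emails" task.
    injected = ToolCall(tool="send_sms", argument_source="untrusted_content")
    print(is_allowed(injected, user_confirmed=True))   # False
    legit = ToolCall(tool="send_sms", argument_source="user_instruction")
    print(is_allowed(legit, user_confirmed=True))      # True
```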

July 18, 2025 · 7 min · Center for Safe&Trustworthy AI

SafeWork-E1: The Untiring Second Brain - Safe & Trustworthy Foundation Model for Clean Energy

About the Project

In recent years, the clean energy sector has been accelerating its intelligent transformation. With their powerful comprehension and generation capabilities, as well as vast cross-domain knowledge reserves, large models are poised to assist operators in handling daily response tasks, complex working conditions, and emergency situations. This effectively addresses industry challenges such as the loss of expert experience and knowledge fragmentation, while reducing the risk of human error. This project is dedicated to resolving the safety and trustworthiness deficits of large models when applied in clean energy scenarios. By developing a foundation model with deeply integrated domain knowledge and safety capabilities, we aim to establish a safety access threshold for the intelligent transformation of the clean energy sector. ...

July 14, 2025 · 5 min · Center for Safe&Trustworthy AI

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law

Read the Paper

1 Introduction

Recent advances in large language models (LLMs) have led to significant improvements in their intelligence, particularly in their reasoning and decision-making capabilities [1, 2]. However, these performance gains are often accompanied by a widening gap between capability and safety, moving further away from the AI-45° Law [3]. For example, existing LLMs frequently demonstrate difficulty in upholding ethical principles, societal norms, and wider human values, especially when navigating the complexities of real-world scenarios. ...
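
Read informally (our notation, not the paper’s formal statement): if C denotes a model’s capability score and S its safety score on comparable scales, the AI-45° Law asks development to stay near the diagonal S = C, and the gap mentioned above is the deviation from that line.

```latex
% Illustrative only; notation is ours, not taken from the SafeWork-R1 paper.
% With capability score C and safety score S on comparable scales,
% the 45-degree line is S = C, and the capability-safety gap is
\[
  \Delta = C - S ,
\]
% so coevolving safety and intelligence means keeping \Delta close to 0 as C grows.
```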

July 12, 2025 · 7 min · Center for Safe&Trustworthy AI

SafeWork-V1: Towards Formally Verifiable AI

Code: https://github.com/Veri-Code/ReForm
Models & Data: https://huggingface.co/Veri-Code

Background

Autoformalization (converting natural-language content into verifiable formal statements) is considered a promising way of learning general-purpose reasoning. In sharp contrast, existing natural-language-based LLMs lack reliable verification. Formal verifiers are not only important for increasing the resilience of humanity, but also vital for steering artificial intelligence (AI) development toward a maximally “math-seeking” direction, which could, hopefully, be more human-friendly and realistic. Although formal verification is usually extremely hard to obtain, recent progress in automated reasoning could make this approach more tractable. However, existing large language models (LLMs) cannot perform genuine logical reasoning or self-verification on their own, and should instead be viewed as universal approximate knowledge retrievers. Given the important role of formal verifiers, we hope to explore ways of scaling them up. ...
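
To make “autoformalization” concrete, here is a minimal example of our own (not taken from the Veri-Code/ReForm repository): the natural-language claim “the sum of two even natural numbers is even” rendered as a Lean 4 theorem that the kernel can check mechanically.

```lean
-- Illustrative autoformalization example (ours, not from Veri-Code/ReForm).
-- Natural-language input:  "The sum of two even natural numbers is even."
-- Formal output, mechanically checked by Lean's kernel:
theorem even_add_even (m n : Nat)
    (hm : ∃ k, m = 2 * k) (hn : ∃ k, n = 2 * k) :
    ∃ k, m + n = 2 * k := by
  cases hm with
  | intro a ha =>
    cases hn with
    | intro b hb =>
      -- witness a + b; the arithmetic side condition is closed by `omega`
      exact ⟨a + b, by omega⟩
```

A verifier in this setting only has to trust the small proof checker, not the model that produced the proof, which is what makes formal verification attractive as a foundation for trustworthy reasoning.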

July 12, 2025 · 3 min · Center for Safe&Trustworthy AI