The intelligent agent lurking in your phone

Entry Point: From Simple Voice Assistant to the Phone’s “Second Brain” The Evolution of Mobile Assistants: A few years ago, mobile assistants were merely tools that responded to simple commands like “What’s the weather today?” or “Set an alarm for 7 a.m.” Today, on-device Agents are rapidly evolving into our phone’s “second brain”—a highly privileged core of the personal operating system. They are no longer isolated apps, but “super stewards” that can span across all apps, invoke system functions, manage your files, read your messages, and access your contacts and photo albums. Security Value: This shift calls for models that can intelligently integrate unstructured knowledge—such as design documents, operation manuals, incident analysis reports, and user feedback—to support timely, accurate, and comprehensive fault diagnosis and safety reviews. This integration helps minimize risks caused by missing or misused information. The Everyday Risk of High-Privilege Operations: When we routinely issue commands to our phones—“Send this screenshot to Mr. Li,” “Create a calendar event based on this email and notify all participants,” or “If my wife calls, remind me that today is our anniversary”—we are effectively granting the agent permission to execute a series of high-privilege operations. The key concern is: how do we ensure this “agent” won’t be compromised, won’t misinterpret instructions, and won’t “hallucinate” at critical moments? Security Value: A robust evaluation framework is essential—akin to how we conduct background checks and periodic assessments for individuals in sensitive positions. Mobile agents must undergo comprehensive “security checkups” not only to defend against external threats, but also to proactively verify that their behavior remains reliable, controllable, and compliant when handling our everyday, yet high-stakes, tasks. About the Project Imagine this scenario: You receive a phishing email disguised as an “annual statement.” When you tell your phone Agent, “Help me summarize today’s unread emails,” the Agent is hijacked by a hidden malicious command within this email while processing it. ...

July 18, 2025 · 7 min · Center for Safe&Trustworthy AI

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law

<!DOCTYPE html> 阅读论文样式修改 Read the Paper 1 Introduction Recent advances in large language models (LLMs) have led to significant improvements in their intelligence, particularly in their reasoning and decision-making capabilities [1, 2]. However, these performance gains are often accompanied by an increasing gap between the capability and safety, moving further away from the AI-45° Law [3]. For example, existing LLMs frequently demonstrate difficulty in upholding ethical principles, societal norms, and wider human values, especially when navigating the complexities of real-world scenarios. ...

July 12, 2025 · 7 min · Center for Safe&Trustworthy AI