@AI45Lab

OpenAI45Lab

Welcome 👋 to AI45, a safety ecosystem platform developed by the Shanghai Artificial Intelligence Laboratory.

Core Philosophy

The platform is guided by the AI-45° Law. From a long-term perspective, AI safety and performance should ideally advance in parallel along a 45° line. Short-term fluctuations are permissible, but in the long run the balance should neither fall below 45° (as it does at present) nor rise above it (which would constrain development).
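One way to read the 45° line is as the slope of a trajectory in a plane with performance on the horizontal axis and safety on the vertical axis. The sketch below is only an illustration under that assumption: the function names, the tolerance band, and the example gains are hypothetical and are not part of the AI-45° Law itself, and it presumes both quantities are measured on comparable, normalized scales.

```python
import math


def trajectory_angle_deg(perf_gain: float, safety_gain: float) -> float:
    """Angle (in degrees) of the safety-vs-performance trajectory,
    measured from the performance axis. Hypothetical helper."""
    return math.degrees(math.atan2(safety_gain, perf_gain))


def classify(angle_deg: float, tolerance_deg: float = 5.0) -> str:
    """Label a trajectory relative to the 45-degree line.
    The tolerance band is an assumption made for this illustration."""
    if angle_deg < 45.0 - tolerance_deg:
        return "below 45: safety lags performance"
    if angle_deg > 45.0 + tolerance_deg:
        return "above 45: safety may constrain development"
    return "near 45: balanced"


# Hypothetical gains over some period, on normalized scales.
balanced = classify(trajectory_angle_deg(perf_gain=10.0, safety_gain=10.0))
lagging = classify(trajectory_angle_deg(perf_gain=10.0, safety_gain=3.0))
print(balanced)
print(lagging)
```

Using `atan2` rather than a plain ratio keeps the angle well defined even when the performance gain is zero.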

Multiple technical pathways may achieve the “AI-45° Law”. We are exploring a causality-centered approach, “the Causal Ladder of Trustworthy AGI”, which spans three progressive layers: the Approximate Alignment Layer, the Intervenable Layer, and the Reflectable Layer.

Core Modules

🔬 Safety Foundation

🛡️ Safety Technology

🏆 Safety Evaluation

🌐 Safety Services

Popular repositories

  1. AgentDoG Public

    A Diagnostic Guardrail Framework for AI Agent Safety and Security

    Python 298 9

  2. OpenRT Public

    Open-source red teaming framework for MLLMs with 37+ attack methods

    Python 216 9

  3. ActorAttack Public

    Python 121 10

  4. Awesome-Trustworthy-Embodied-AI Public

    JavaScript 92 2

  5. REEF Public

    The repository for the paper "REEF: Representation Encoding Fingerprints for Large Language Models", which aims to protect the IP of open-source LLMs.

    Python 73 8

  6. Flames Public

    Flames is a highly adversarial Chinese-language benchmark for evaluating the harmlessness of LLMs, developed by Shanghai AI Lab and Fudan NLP Group.

    63

Repositories

Showing 10 of 36 repositories
  • Awesome-Trustworthy-Embodied-AI
    JavaScript 92 2 0 0 Updated Feb 3, 2026
  • AgentDoG Public

    A Diagnostic Guardrail Framework for AI Agent Safety and Security

    Python 298 9 0 0 Updated Jan 29, 2026
  • OpenRT Public

    Open-source red teaming framework for MLLMs with 37+ attack methods

    Python 216 AGPL-3.0 9 0 1 Updated Jan 16, 2026
  • AIGC-Identification-Toolkit
    Jupyter Notebook 5 1 1 0 Updated Dec 26, 2025
  • TheOtherMind Public
    Python 11 0 0 0 Updated Dec 16, 2025
  • IS-Bench Public

    [AAAI 2026] Data and Code for Paper IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks

    Python 40 3 0 0 Updated Nov 24, 2025
  • X-Boundary Public

    [EMNLP 2025] Code repository for the paper "X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability"

    Python 38 4 0 0 Updated Nov 24, 2025
  • AIGC_detection
    Makefile 0 Unlicense 0 0 5 Updated Oct 13, 2025
  • CodeAttack Public

    [ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion

    Python 58 MIT 8 1 0 Updated Oct 1, 2025
  • Safe-Trustworthy-EAI
    Vue 1 0 0 0 Updated Sep 29, 2025
