Musubi's PolicyAI Now Integrates with NVIDIA NeMo Guardrails

    At Musubi, we design our tools to work well alongside whatever else T&S teams are using, because we believe good T&S tooling should be modular — each component doing one thing well, and connecting easily with the rest of your stack. Today, we're making that easier: Musubi's PolicyAI now officially integrates with NVIDIA's NeMo Guardrails, so teams using NeMo to manage LLM safety can connect PolicyAI's custom content moderation to their pipeline in a few minutes of configuration.

    What Is NeMo Guardrails?

    NeMo Guardrails is NVIDIA's open-source toolkit for adding programmable safety controls to LLM applications. It sits between your users and your AI model, evaluating both what users send and what the model responds with, and blocking anything that falls outside defined boundaries. It's become a widely-adopted foundation for teams building AI assistants, chatbots, and copilots who need structured, reliable safety controls.

    What PolicyAI Adds

    NeMo Guardrails includes a solid set of built-in safety features. PolicyAI is a separate content moderation API that extends that foundation by letting you define your own policies in plain language, so your guardrails reflect what your specific platform needs, instead of a generic taxonomy.

    Instead of selecting from fixed violation categories, you write policies in the language your team already uses. Something like "Block any response that commits to a specific refund amount or timeline" is a policy that any T&S practitioner can write and iterate on without code or model retraining.

    Policies are organized by tags and endpoints, so you can maintain separate rule sets for different environments, product surfaces, or user segments, and switch between them with a single configuration variable. PolicyAI also includes an analytics, testing, and oversight layer: you can test policies against real content before deploying them, monitor how your rules perform over time, and identify patterns in flagged content to surface emerging threats before they become widespread.

    When something is flagged, PolicyAI returns a structured and configurable assessment with a policy label, severity score, confidence score, and reasoning, so the decision is legible to whoever needs to understand it, whether that's a moderator, a compliance team, or an engineer tuning the rules.
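
As an illustration, a flagged response might come back with an assessment along these lines (a hypothetical shape; the field names below are assumptions drawn from the description above, not PolicyAI's documented schema):

```json
{
  "policy": "no-refund-commitments",
  "verdict": "unsafe",
  "severity": 0.8,
  "confidence": 0.93,
  "reasoning": "The response promises a full refund within 5 business days, committing to both an amount and a timeline."
}
```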

    How It Works

    Every message flows through the pipeline in both directions. When a user sends a message, PolicyAI checks it against all policies attached to your configured tag before it reaches the model. When the model responds, PolicyAI checks the output before it reaches the user. If any policy returns unsafe at either stage, the message is blocked and replaced with a refusal response. If PolicyAI can't reach a determination (for instance, if no policies are attached to the specified tag), the integration fails closed rather than silently passing content through.
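
The blocking logic described above can be sketched in a few lines. This is a minimal illustration of the fail-closed behavior, not PolicyAI's actual API; the function and field names are placeholders:

```python
# Sketch of the fail-closed decision described above. `assessments` is the
# list of per-policy results returned for one message at one stage of the
# pipeline; the "verdict" field name is illustrative, not PolicyAI's schema.

REFUSAL = "I'm sorry, I can't help with that."

def moderate(message: str, assessments: list[dict]) -> str:
    # Fail closed: an empty result (e.g. no policies attached to the
    # configured tag) means no determination was reached, so block.
    if not assessments:
        return REFUSAL
    # Block if any policy flags the message as unsafe at this stage.
    if any(a.get("verdict") == "unsafe" for a in assessments):
        return REFUSAL
    # Every policy returned safe: pass the message through unchanged.
    return message
```

The same check runs twice per turn: once on the user's input before it reaches the model, and once on the model's output before it reaches the user.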

    What This Looks Like in Practice

    The flexibility here opens up use cases that fixed-category moderation can't easily address.

    • Customer support — A support bot should be helpful, but it shouldn't make commitments about refunds or pricing that your team can't honor. You can write a policy that catches those commitments before they reach users, without touching your application code.
    • Healthcare — Generic content categories don't map cleanly onto clinical boundaries. With PolicyAI, you can write policies that address medical diagnoses, medication recommendations, or other domain-specific concerns at the level of granularity your legal and clinical teams actually need.
    • Education — An AI tutor for younger students needs different content thresholds than one designed for adults. PolicyAI lets you write age-appropriate policies and switch between them based on user context.
    • Financial services — You need to catch content that could be construed as investment advice, and you need documentation when something is flagged so your compliance team understands why. Both are built into how PolicyAI explains its decisions.
    • Internal enterprise tools — What counts as confidential is specific to your organization, so you can write policies that reflect your actual business context. Off-the-shelf moderation tools assume a generic platform; PolicyAI assumes you know your platform better than anyone else does.

    Why This Matters

    Most content moderation tools give you a fixed set of categories to work within — a predetermined taxonomy of what counts as unsafe. That works well for well-understood violation types, but it breaks down when your platform has specific requirements, when language evolves, or when you're operating in a domain with its own standards.

    PolicyAI is built around the idea that the people closest to a platform understand what should and shouldn't happen there better than any generic classifier can. When your T&S team can write and update policies directly, in language they already use, they can respond to new edge cases and emerging threats without waiting on engineering. And when content is blocked, the explanation is clear enough that compliance teams, stakeholders, and moderators themselves can understand it. The moderation keeps pace with your platform because the people running it are in direct control.

    Getting Started

    PolicyAI and NeMo Guardrails are each set up independently — you'll need a PolicyAI API key from Musubi to connect them. Once you have that, enabling input and output moderation is a matter of adding a few lines to your NeMo config.yml. The integration supports both Colang 1.x and 2.x.
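
As a rough sketch, the addition to config.yml amounts to attaching moderation flows to NeMo Guardrails' input and output rails. The flow names below are placeholders, not the integration's actual identifiers; the exact names and any required credentials are in the NeMo Guardrails documentation:

```yaml
rails:
  input:
    flows:
      - policyai check input    # placeholder flow name
  output:
    flows:
      - policyai check output   # placeholder flow name
```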

    Full setup instructions are in the NeMo Guardrails documentation.

    Want to see how Musubi works with NeMo Guardrails? Get in touch — we're happy to walk you through it.