Libraries

How to use walledai/walledguard-edge with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="walledai/walledguard-edge")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("walledai/walledguard-edge")
model = AutoModelForCausalLM.from_pretrained("walledai/walledguard-edge")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use walledai/walledguard-edge with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "walledai/walledguard-edge"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "walledai/walledguard-edge",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
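
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on `localhost:8000` and returns the usual OpenAI-style chat completion body. The helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "walledai/walledguard-edge"


def build_chat_request(content, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(content, base_url="http://localhost:8000", timeout=30):
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-compatible response: choices[0].message.content.
    """
    request = build_chat_request(content, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server):
# print(send_chat_request("What is the capital of France?"))
```

Passing a different `base_url` (for example one ending in port 30000) points the same helper at any other server that exposes the same OpenAI-compatible route.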


SGLang

How to use walledai/walledguard-edge with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "walledai/walledguard-edge" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "walledai/walledguard-edge",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "walledai/walledguard-edge" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use walledai/walledguard-edge with Docker Model Runner:
```
docker model run hf.co/walledai/walledguard-edge
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

WalledGuard-Edge

🔥 WalledGuard-Edge is a 0.6B-parameter open-source model (Apache 2.0 license) that outperforms Llamaguard3 (1B) on the multilingual and jailbreak benchmarks.
🔥 WalledProtect is Walled AI's most capable content moderator to date. To try the latest version, get free API access at www.walled.ai. Read the full announcement on the blog.

Model Details

| Model | XSTest | Aegis | OAI | Multilingual | Jailbreak | Latency (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| Llamaguard3 (1b) | 83.11 | 67.72 | 73.11 | 64.88 | 74.23 | 250 |
| WalledGuard-Edge (0.6b) | 88.00 | 82.99 | 84.05 | 67.02 | 88.57 | 113 |
| Llamaguard3 (8b) | 88.89 | 76.88 | 79.74 | 71.99 | 81.88 | 520 |
| Llamaguard4 (12b) | 86.22 | 73.12 | 76.48 | 68.14 | 79.73 | 661 |
| WalledProtect | 94.22 | 84.37 | 88.36 | 90.18 | 91.34 | 300 |

OAI = OpenAI Moderation dataset (used as a benchmark for content safety evaluation). Latency is computed on a single A6000 GPU for open-weight models and from the API for WalledProtect.
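
To compare rows at a glance, the table can be reduced to a per-model mean score and a latency ratio. The sketch below copies the numbers from the table; the unweighted mean over the five benchmark columns is an illustrative summary chosen here, not a metric reported by Walled AI.

```python
# (scores for the five benchmark columns in table order, latency in ms)
RESULTS = {
    "Llamaguard3 (1b)": ([83.11, 67.72, 73.11, 64.88, 74.23], 250),
    "WalledGuard-Edge (0.6b)": ([88.00, 82.99, 84.05, 67.02, 88.57], 113),
    "Llamaguard3 (8b)": ([88.89, 76.88, 79.74, 71.99, 81.88], 520),
    "Llamaguard4 (12b)": ([86.22, 73.12, 76.48, 68.14, 79.73], 661),
    "WalledProtect": ([94.22, 84.37, 88.36, 90.18, 91.34], 300),
}


def mean_score(model):
    """Unweighted mean over the five benchmark columns (an assumption)."""
    scores, _latency = RESULTS[model]
    return sum(scores) / len(scores)


def latency_ratio(baseline, candidate):
    """How many times slower the baseline is than the candidate."""
    return RESULTS[baseline][1] / RESULTS[candidate][1]


for name, (_scores, latency_ms) in RESULTS.items():
    print(f"{name}: mean score {mean_score(name):.2f}, latency {latency_ms} ms")

ratio = latency_ratio("Llamaguard3 (1b)", "WalledGuard-Edge (0.6b)")
print(f"Llamaguard3 (1b) latency is {ratio:.2f}x that of WalledGuard-Edge (0.6b)")
```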

⚔️ WalledGuard-Edge and WalledProtect cover a wide array of jailbreak prompt types, including direct disallowed requests, role-play (“DAN”, “do-anything” style), persuasion, encoding, and both single-turn and multi-turn chains.

🌍 Both systems also provide robust multilingual support, tested across Arabic, English, Filipino, French, Hindi, Russian, Serbian, and Spanish.

🔒 WalledProtect in particular handles stricter and more complex attacks (lower jailbreak success rate) and provides stronger moderation and refusal behavior without excessive false positives.

Model Description

Developed by: Walled AI
Language(s) (NLP): Arabic, English, Chinese, Filipino, French, Hindi, Russian, Serbian, Spanish, Malay, Tamil
License: Apache 2.0

Direct Use

# PyTorch must be installed as the Transformers backend; it is not referenced directly here.
from transformers import AutoTokenizer, AutoModelForCausalLM

TEMPLATE = """[INST] Evaluate if the given text is harmful i.e. either it asks for unsafe information or it contains unsafe information.

<START TEXT>
{prompt}
<END TEXT>

Answer: [/INST]
"""

model_name = "walledai/walledguard-edge"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

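# Fill the template with the text to evaluate and tokenize it.
text = "How can I make my wife burst into laughter?"
input_ids = tokenizer.encode(TEMPLATE.format(prompt=text), return_tensors="pt")

# Generate a short verdict, then decode only the newly generated tokens.
output = model.generate(input_ids=input_ids, max_new_tokens=20, pad_token_id=0)
prompt_len = input_ids.shape[-1]
output_decoded = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

# Any occurrence of "unsafe" in the generated text marks the input as unsafe.
prediction = 'unsafe' if 'unsafe' in output_decoded else 'safe'

print(prediction)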
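
For repeated checks, the steps above can be wrapped in small helpers. This is a sketch that reuses the same template and decision rule; the function names `build_prompt`, `parse_verdict`, and `classify` are hypothetical and not part of the released model or of Transformers, and `classify` expects a tokenizer and model loaded as shown above.

```python
TEMPLATE = """[INST] Evaluate if the given text is harmful i.e. either it asks for unsafe information or it contains unsafe information.

<START TEXT>
{prompt}
<END TEXT>

Answer: [/INST]
"""


def build_prompt(text):
    """Insert the text to evaluate into the guard template."""
    return TEMPLATE.format(prompt=text)


def parse_verdict(decoded):
    """Map the generated text to a label: 'unsafe' if it appears, else 'safe'."""
    return "unsafe" if "unsafe" in decoded else "safe"


def classify(text, tokenizer, model, max_new_tokens=20):
    """Run one text through the model and return 'safe' or 'unsafe'."""
    input_ids = tokenizer.encode(build_prompt(text), return_tensors="pt")
    output = model.generate(
        input_ids=input_ids, max_new_tokens=max_new_tokens, pad_token_id=0
    )
    # Decode only the tokens generated after the prompt.
    new_tokens = output[0][input_ids.shape[-1]:]
    return parse_verdict(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Keeping prompt construction and verdict parsing separate from generation makes the string handling easy to check without loading the model.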

Proprietary Guardrail: WalledProtect

To learn more about our best-in-class proprietary guardrail, read the full announcement on the blog.

To try the latest version for free, get your API access at www.walled.ai.

LLM Safety Evaluation Hub

Check out our one-stop center for LLM safety evaluation: WalledEval!

Citation

If you use WalledGuard in your research or product, please cite the following paper:

@misc{gupta2024walledeval,
      title={WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models}, 
      author={Prannaya Gupta and Le Qi Yau and Hao Han Low and I-Shiang Lee and Hugo Maximus Lim and Yu Xin Teoh and Jia Hng Koh and Dar Win Liew and Rishabh Bhardwaj and Rajat Bhardwaj and Soujanya Poria},
      year={2024},
      eprint={2408.03837},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.03837}, 
}

Safetensors

Model size: 0.6B params
Tensor type: BF16
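
As a rough guide to hardware needs, the weight footprint follows from the parameter count and tensor type. The sketch below assumes 2 bytes per BF16 parameter and uses the rounded 0.6B figure, so the result is an approximation for the weights only (activations and KV cache are not included).

```python
# Bytes used to store one parameter for some common tensor types.
BYTES_PER_PARAM = {"BF16": 2, "FP16": 2, "FP32": 4}


def weight_memory_bytes(num_params, dtype="BF16"):
    """Approximate memory needed to hold the model weights alone."""
    return int(num_params * BYTES_PER_PARAM[dtype])


NUM_PARAMS = 0.6e9  # rounded parameter count from the model card

size_gb = weight_memory_bytes(NUM_PARAMS, "BF16") / 1e9
print(f"Approximate BF16 weight size: {size_gb:.1f} GB")
```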

Paper: WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models (arXiv 2408.03837, published Aug 7, 2024)