Instructions to use walledai/walledguard-edge with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use walledai/walledguard-edge with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="walledai/walledguard-edge") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("walledai/walledguard-edge") model = AutoModelForCausalLM.from_pretrained("walledai/walledguard-edge") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use walledai/walledguard-edge with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "walledai/walledguard-edge" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "walledai/walledguard-edge", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/walledai/walledguard-edge
- SGLang
How to use walledai/walledguard-edge with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "walledai/walledguard-edge" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "walledai/walledguard-edge", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "walledai/walledguard-edge" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "walledai/walledguard-edge", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use walledai/walledguard-edge with Docker Model Runner:
docker model run hf.co/walledai/walledguard-edge
WalledGuard-Edge
🔥WalledGuard-Edge is a 0.6B parameter open source model (Apache 2.0 license) that outperforms Llamaguard3 (1B) on multilingual and multiple jailbreak types.
🔥WalledProtect is the most capable content moderator of Walled AI to date. To try the latest version, get your free API access at www.walled.ai. Read the full announcement at blog.
Model Details
| Model | XStest | Aegis | OAI | Multilingual | Jailbreak | Latency (ms) |
|---|---|---|---|---|---|---|
| Llamaguard3 (1b) | 83.11 | 67.72 | 73.11 | 64.88 | 74.23 | 250 |
| WalledGuard-Edge (0.6b) | 88.00 | 82.99 | 84.05 | 67.02 | 88.57 | 113 |
| Llamaguard3 (8b) | 88.89 | 76.88 | 79.74 | 71.99 | 81.88 | 520 |
| Llamaguard4 (12b) | 86.22 | 73.12 | 76.48 | 68.14 | 79.73 | 661 |
| WalledProtect | 94.22 | 84.37 | 88.36 | 90.18 | 91.34 | 300 |
OAI = OpenAI Moderation dataset (used as a benchmark for content safety evaluation). Latency is computed on a single A6000 GPU for open-weight models and from the API for WalledProtect.
⚔️ WalledGuard-Edge and WalledProtect cover a wide array of jailbreak prompt types, including direct disallowed requests, role-play (“DAN”, “do-anything” style), persuasion, encoding, and both single-turn and multi-turn chains.
🌍 Both systems also provide robust multilingual support, tested across Arabic, English, Filipino, French, Hindi, Russian, Serbian, and Spanish.
🔒 WalledProtect especially covers stricter and more complex attacks (lower jailbreak success rate) and provides stronger moderation/refusal behavior without excessive false positives.
Model Description
- Developed by: Walled AI
- Language(s) (NLP): Arabic, English, Chinese, Filipino, French, Hindi, Russian, Serbian, Spanish, Malay, Tamil
- License: Apache 2.0
Direct Use
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForCausalLM
TEMPLATE = """[INST] Evaluate if the given text is harmful i.e. either it asks for unsafe information or it contains unsafe information.
<START TEXT>
{prompt}
<END TEXT>
Answer: [/INST]
"""
model_name = "walledai/walledguard-edge"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
input_ids = tokenizer.encode(TEMPLATE.format(prompt="How can I make my wife burst into laughter?"), return_tensors="pt")
output = model.generate(input_ids=input_ids, max_new_tokens=20, pad_token_id=0)
prompt_len = input_ids.shape[-1]
output_decoded = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
prediction = 'unsafe' if 'unsafe' in output_decoded else 'safe'
print(prediction)
Proprietary Guardrail: WalledProtect
To know about our best-in-class proprietary guardrail, Read the full announcement at blog.
To try the latest version for free, get your API access at www.walled.ai.
LLM Safety Evaluation Hub
Do check out our LLM Safety Evaluation One-Stop Center: Walled Eval!
Citation
If you use WalledGuard in your research or product, please cite the following paper:
@misc{gupta2024walledeval,
title={WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models},
author={Prannaya Gupta and Le Qi Yau and Hao Han Low and I-Shiang Lee and Hugo Maximus Lim and Yu Xin Teoh and Jia Hng Koh and Dar Win Liew and Rishabh Bhardwaj and Rajat Bhardwaj and Soujanya Poria},
year={2024},
eprint={2408.03837},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2408.03837},
}
- Downloads last month
- 10