Instructions to use sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF with llama-cpp-python:
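```python
# !pip install llama-cpp-python
from llama_cpp import Llama

# Download the Q4_K_M file from the Hub (cached locally) and load it.
llm = Llama.from_pretrained(
    repo_id="sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF",
    filename="LFM2.5-1.2B-Instruct-Q4_K_M.gguf",
)

# Returns an OpenAI-style chat-completion dict.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
print(response["choices"][0]["message"]["content"])
```
- Notebooks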
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF with llama.cpp:
Install from brew
```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M
```
Install from WinGet (Windows)
```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M
```
Use pre-built binary
```shell
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M
```
Use Docker
docker model run hf.co/sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```
Use Docker
docker model run hf.co/sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M
- Ollama
How to use sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF with Ollama:
ollama run hf.co/sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M
- Unsloth Studio
How to use sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF to start chatting
```
Using HuggingFace Spaces for Unsloth
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF to start chatting.
- Pi
How to use sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF with Pi:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to `~/.pi/agent/models.json`:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF with Hermes Agent:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M
```
Run Hermes
hermes
- Docker Model Runner
How to use sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF with Docker Model Runner:
docker model run hf.co/sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M
- Lemonade
How to use sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M
```
Run and chat with the model
lemonade run user.LFM2.5-1.2B-Instruct-DGX-Spark-GGUF-Q4_K_M
List all available models
lemonade list
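Every server-based option above (llama-server, `vllm serve`, and the local servers Pi and Hermes are pointed at) exposes an OpenAI-compatible `/v1/chat/completions` endpoint. As a Python alternative to the curl example, here is a minimal sketch using only the standard library. It assumes llama-server's default address `http://localhost:8080/v1` (port 8000 for vLLM) and the standard OpenAI chat-completion response shape; `build_chat_request` and `chat` are hypothetical helper names, not part of any of these tools.

```python
import json
import urllib.request

# Assumed defaults: llama-server listens on port 8080 (use 8000 for `vllm serve`).
BASE_URL = "http://localhost:8080/v1"
MODEL = "sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF:Q4_K_M"


def build_chat_request(
    prompt: str, model: str = MODEL, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout: float = 120.0, **kwargs) -> str:
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Standard OpenAI chat-completion response shape.
    return payload["choices"][0]["message"]["content"]
```

With a server running, `print(chat("What is the capital of France?"))` sends the same request as the curl example. For vLLM, pass `base_url="http://localhost:8000/v1"` and `model="sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF"`, since vLLM checks the `model` field against the name it serves.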
v0.1.6: Real-time Metrics & Blackwell-Optimized Docker (Recommended)
This model is fully compatible with the DGX-Spark-llama.cpp-Bench. Experience the state-of-the-art inference engine optimized for NVIDIA Blackwell (DGX Spark) hardware.
Key Features (v0.1.6)
- Real-time Performance Metrics: Now visualizes `Input TPS` and `Output TPS` during streaming.
- Improved Reasoning UI: Renders the model's Chain-of-Thought (CoT) more stably.
- Blackwell Optimization: Native support for ARM64/SM121 and CUDA 13.0 FP4.
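The card does not say how the Bench UI computes `Input TPS` and `Output TPS`. A minimal sketch, assuming the usual definitions (prompt tokens per second of prompt processing, and generated tokens per second of generation) and field names modeled on the `timings` object that llama-server attaches to its responses; the `Timings` class and helper functions are illustrative, not the Bench's code.

```python
from dataclasses import dataclass


@dataclass
class Timings:
    """Per-request timings; field names modeled on llama-server's `timings` (assumption)."""

    prompt_n: int  # prompt (input) tokens processed
    prompt_ms: float  # wall time spent processing the prompt, in milliseconds
    predicted_n: int  # output tokens generated
    predicted_ms: float  # wall time spent generating, in milliseconds


def tokens_per_second(n_tokens: int, elapsed_ms: float) -> float:
    """Throughput in tokens/s; 0.0 while no time has elapsed yet."""
    if elapsed_ms <= 0:
        return 0.0
    return n_tokens / (elapsed_ms / 1000.0)


def throughput(t: Timings) -> tuple[float, float]:
    """Return (input TPS, output TPS) for one request."""
    return (
        tokens_per_second(t.prompt_n, t.prompt_ms),
        tokens_per_second(t.predicted_n, t.predicted_ms),
    )
```

During streaming, the same output-TPS formula can be re-evaluated after each token, using the elapsed time since the first generated token.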
Quick Start
```shell
# Pull the latest optimized image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:v0.1.6
```
For more details, visit our GitHub repository: https://github.com/sowilow/DGX-Spark-llama.cpp-Bench
v0.1.5: Real-time Metrics & Blackwell-Optimized Docker (Recommended)
This model is fully compatible with the DGX-Spark-llama.cpp-Bench. Experience the state-of-the-art inference engine optimized for NVIDIA Blackwell (DGX Spark) hardware.
Key Features (v0.1.5)
- Real-time Performance Metrics: Visualizes `Input TPS` and `Output TPS` during streaming.
- Improved Reasoning UI: Renders the model's Chain-of-Thought (CoT) more stably.
- Blackwell Optimization: Native support for ARM64/SM121 and CUDA 13.0 FP4.
Quick Start
```shell
# Pull the v0.1.5 image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:v0.1.5
```
For more details, visit our GitHub repository: https://github.com/sowilow/DGX-Spark-llama.cpp-Bench
v0.1.4: Quick Start with Blackwell-Optimized Docker (Recommended)
This model is fully compatible with the DGX-Spark-llama.cpp-Bench. Experience the best performance on NVIDIA Blackwell (DGX Spark) hardware with our optimized inference engine.
Key Features (v0.1.4)
- Blackwell Optimized: Native support for ARM64/SM121 and CUDA 13.0 FP4.
- Intelligent Reasoning UI: Automatic extraction and visualization of reasoning processes (CoT).
- One-Click Deployment: Standardized environment via GHCR Docker image.
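One common convention is for a model to wrap its chain of thought in `<think>...</think>` tags. The card does not say which delimiter the Bench UI detects, or whether this Instruct model emits one; assuming that tag convention, extracting the reasoning from a completion could look like the sketch below. `split_reasoning` is a hypothetical helper.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer).

    Closed <think> blocks are joined as the reasoning; the remaining text is
    the answer. A trailing <think> with no closing tag (e.g. mid-stream) is
    treated as reasoning still in progress.
    """
    reasoning = [m.strip() for m in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text)
    open_idx = answer.find("<think>")
    if open_idx != -1:
        reasoning.append(answer[open_idx + len("<think>"):].strip())
        answer = answer[:open_idx]
    return "\n".join(reasoning), answer.strip()
```

Handling the unclosed tag lets a streaming UI show partial reasoning before the answer arrives.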
How to Run
```shell
# Pull the v0.1.4 image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:v0.1.4
# Follow the instructions in our repo to serve this model
# GitHub: https://github.com/sowilow/DGX-Spark-llama.cpp-Bench
```
Quick Start with Docker (Recommended)
You can easily run this model using the DGX-Spark-llama.cpp-Bench inference engine. It's pre-configured for high-performance inference on NVIDIA hardware (especially Blackwell/DGX Spark).
1. Pull the Docker Image
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest
2. Run the Inference Server
For detailed configuration and usage, visit the GitHub repository: https://github.com/sowilow/DGX-Spark-llama.cpp-Bench
LFM2.5-1.2B-Instruct-DGX-Spark-GGUF
This repository contains GGUF-quantized weights for LFM2.5-1.2B-Instruct, specifically optimized for NVIDIA Blackwell (DGX Spark) hardware.
Key Features
- Hardware Optimized: Built with CUDA 13.0 and SM121 (Blackwell) native acceleration.
- Quantization:
- Q4_K_M: Balanced performance and accuracy.
- Q8_0: Higher precision, closer to the original weights, at a larger file size.
- Base Model Integration: Linked directly to the original LiquidAI/LFM2.5-1.2B-Instruct.
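To make the Q4_K_M / Q8_0 trade-off concrete: in llama.cpp, Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale `d = max|x| / 127`, which works out to 8.5 bits per weight. Q4_K packs 256 weights into a 144-byte super-block (4.5 bits per weight), and Q4_K_M keeps some tensors at higher precision, so its average is somewhat above that. The sketch below reimplements the Q8_0 round trip in plain Python for illustration (it is not llama.cpp's code path). It also estimates file sizes from a nominal 1.2 billion parameters; real files differ because the exact parameter count, metadata, and Q4_K_M's mixed tensor types are not modeled.

```python
import math
import struct

QK8_0 = 32  # weights per Q8_0 block in llama.cpp


def to_fp16(x: float) -> float:
    """Round a float to IEEE half precision (how the block scale is stored)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_half_away(v: float) -> int:
    # Mirrors C roundf(): ties round away from zero (Python's round() uses ties-to-even).
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize_q8_0_block(xs: list[float]) -> tuple[float, list[int]]:
    """Quantize one block of 32 floats to (fp16 scale, 32 int8 values)."""
    if len(xs) != QK8_0:
        raise ValueError(f"expected {QK8_0} values, got {len(xs)}")
    amax = max(abs(x) for x in xs)
    d = amax / 127.0
    inv_d = 1.0 / d if d else 0.0
    qs = [round_half_away(x * inv_d) for x in xs]
    return to_fp16(d), qs


def dequantize_q8_0_block(d: float, qs: list[int]) -> list[float]:
    """Reconstruct approximate weights as q * d."""
    return [q * d for q in qs]


def approx_size_bytes(n_params: float, bits_per_weight: float) -> float:
    """Rough tensor-data size; ignores metadata and per-tensor type mixing."""
    return n_params * bits_per_weight / 8


# Q8_0: 32 int8 values + one fp16 scale per 32 weights -> (32*8 + 16) / 32 = 8.5 bits/weight.
Q8_0_BPW = (QK8_0 * 8 + 16) / QK8_0
# Q4_K: one 144-byte super-block per 256 weights -> 4.5 bits/weight.
Q4_K_BPW = 144 * 8 / 256
```

At a nominal 1.2 B parameters, `approx_size_bytes` gives about 1.3 GB for Q8_0 and about 0.7 GB for pure Q4_K, with Q4_K_M landing a little above the latter.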
License & Attribution
This model is a quantized version of the original LiquidAI/LFM2.5-1.2B-Instruct and is subject to its original license.
Files Included
- `lfm2.5-1.2b-instruct-q4_k_m.gguf`: 4-bit quantized model.
- `lfm2.5-1.2b-instruct-q8_0.gguf`: 8-bit quantized model.
Created using DGX-Spark-llama.cpp-Bench
Model tree for sowilow/LFM2.5-1.2B-Instruct-DGX-Spark-GGUF
Base model
LiquidAI/LFM2.5-1.2B-Base