dragonkue/colbert-ko-0.1b

This is a Korean ColBERT model fine-tuned from skt/A.X-Encoder-base using PyLate. It maps sentences and paragraphs to sequences of token embeddings with 32, 64, 96, or 128 dimensions, and scores query-document pairs with the MaxSim operator for late-interaction retrieval.

Model Details

Model Description

  • Model Type: PyLate model
  • Base model: skt/A.X-Encoder-base
  • Document Length: 2048 tokens
  • Query Length: 32 tokens
  • Output Dimensionality: 128 (Matryoshka: 32 / 64 / 96 / 128)
  • Similarity Function: MaxSim
  • Language: Korean
  • License: Apache-2.0

Full Model Architecture

ColBERTWrapper(
  (0): Transformer({'max_seq_length': 2047, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Dense({'in_features': 768, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
)

Usage

This model supports Matryoshka embeddings with multiple dimensions (32, 64, 96, 128) via separate projection heads (Jina-ColBERT-v2 style).
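
To make the "separate projection heads" idea concrete, here is a minimal pure-Python sketch. It assumes one bias-free linear head per dimension on top of the 768-dimensional encoder states, followed by L2 normalization. `MultiHeadProjector`, its random weights, and the normalization step are illustrative assumptions, not the code in the bundled model.py; only the method name `set_active_dim` mirrors the call used in Quick Start.

```python
import math
import random


class MultiHeadProjector:
    """Hypothetical stand-in: one bias-free linear head per output dimension."""

    def __init__(self, hidden_size, dims=(32, 64, 96, 128), seed=0):
        rng = random.Random(seed)
        # Each head is its own (dim x hidden_size) matrix; heads share no weights.
        self.heads = {
            dim: [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)] for _ in range(dim)]
            for dim in dims
        }
        self.active_dim = max(dims)

    def set_active_dim(self, dim):
        if dim not in self.heads:
            raise ValueError(f"unsupported dim {dim}; choose from {sorted(self.heads)}")
        self.active_dim = dim

    def project(self, token_states):
        """Map per-token hidden states to L2-normalized embeddings of the active dim."""
        weight = self.heads[self.active_dim]
        out = []
        for state in token_states:
            vec = [sum(w * x for w, x in zip(row, state)) for row in weight]
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            out.append([v / norm for v in vec])
        return out


projector = MultiHeadProjector(hidden_size=768)
tokens = [[0.1] * 768, [-0.2] * 768]  # two fake token hidden states
projector.set_active_dim(64)
embs = projector.project(tokens)
print(len(embs), len(embs[0]))  # 2 tokens, 64 dimensions each
```

Under this assumption each head produces its own vector rather than a prefix of the 128-dim one, so embeddings produced at different dimensions should not be mixed in a single index.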

The model ships a bundled model.py, so it can be loaded with the transformers library alone; no third-party retrieval package is required. The custom code is opt-in: pass trust_remote_code=True to from_pretrained.

Installation

pip install transformers torch

Quick Start

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dragonkue/colbert-ko-0.1b")
model = AutoModel.from_pretrained(
    "dragonkue/colbert-ko-0.1b",
    trust_remote_code=True,
)
model.eval()
# model.cuda()  # optional

# Pick embedding dimension (32, 64, 96, or 128). Default is 128.
model.set_active_dim(128)

queries = ["검색 쿼리"]
documents = ["문서 내용"]

q_embs = model.encode(tokenizer, queries, is_query=True)
d_embs = model.encode(tokenizer, documents, is_query=False)

print(f"Query shape: {q_embs[0].shape}")  # (num_tokens, 128)
print(f"Doc shape:   {d_embs[0].shape}")  # (num_tokens, 128)

# ColBERT late-interaction (MaxSim) score for one (query, doc) pair
score = model.maxsim(q_embs[0], d_embs[0])
print(f"MaxSim: {score:.4f}")
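
For reference, MaxSim takes each query token, finds its best-matching document token, and sums those best matches over the query. The sketch below spells this out in plain Python. It assumes dot-product similarity over L2-normalized token embeddings; `maxsim_score` is a hypothetical helper for illustration, not the implementation behind model.maxsim.

```python
def maxsim_score(query_embs, doc_embs):
    """Sum over query tokens of the best dot-product match among document tokens."""
    total = 0.0
    for q in query_embs:
        # Similarity of this query token to every document token.
        sims = [sum(a * b for a, b in zip(q, d)) for d in doc_embs]
        total += max(sims)
    return total


# Toy 2-dimensional unit-length token embeddings, for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.6, 0.8]]
print(maxsim_score(query, doc))
```

Because the score is a sum over query tokens, it grows with query length; compare scores only across documents for the same query.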

Reranking

# Reuses `model` and `tokenizer` loaded in Quick Start.
query = "대한민국의 수도는 어디인가요?"
candidates = [
    "대한민국의 수도는 서울이며, 인구가 가장 많은 도시이다.",
    "파리는 프랑스의 수도이고 에펠탑으로 유명하다.",
    "오늘 날씨는 맑고 기온은 25도 정도이다.",
]

q_embs = model.encode(tokenizer, [query], is_query=True)
c_embs = model.encode(tokenizer, candidates, is_query=False)

ranked = sorted(
    [(i, model.maxsim(q_embs[0], c_embs[i])) for i in range(len(candidates))],
    key=lambda x: x[1],
    reverse=True,
)
for rank, (i, score) in enumerate(ranked, start=1):
    print(f"  rank {rank}: score={score:.4f}  | {candidates[i]}")

Approximate Retrieval (PyLate PLAID, optional)

For PLAID indexing over a large corpus you can additionally use PyLate. PyLate loads the model as a standard sentence-transformers checkpoint (no trust_remote_code needed). The 128-dim head saved at 1_Dense/model.safetensors matches the Matryoshka 128-dim head, so PyLate produces the same embeddings as the snippet above.

from pylate import indexes, models, retrieve

model = models.ColBERT(model_name_or_path="dragonkue/colbert-ko-0.1b")

index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
    override=True,
)

documents_ids = ["1", "2", "3"]
documents = ["첫번째 문서입니다", "두번째 문서입니다", "세번째 문서입니다"]

documents_embeddings = model.encode(documents, is_query=False)
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)

retriever = retrieve.ColBERT(index=index)
queries_embeddings = model.encode(["첫번째 문서 검색"], is_query=True)
scores = retriever.retrieve(queries_embeddings=queries_embeddings, k=3)
print(scores)

The PyLate path is fixed to the 128-dim head; for Matryoshka dimension switching (32 / 64 / 96 / 128), use the trust_remote_code=True path shown above.

Evaluation Results (NDCG@10)

Comparison with Other Models (dim128)

| Model (parameters) | AutoRAG | Ko-StrategyQA | NanoBEIR-Ko | Avg |
|---|---|---|---|---|
| dragonkue-colbert-ko-0.1b (149M) | 0.970 | 0.735 | 0.503 | 0.737 |
| BGE-M3-MultiVec (568M) | 0.844 | 0.797 | 0.569 | 0.737 |
| LFM2-ColBERT (353M) | 0.833 | 0.757 | 0.528 | 0.706 |
| colbert-ko-v1 (149M) | 0.966 | 0.713 | 0.476 | 0.718 |

Performance by Embedding Dimension

| Dimension | AutoRAG | Ko-StrategyQA | NanoBEIR-Ko |
|---|---|---|---|
| 32 | 0.976 | 0.713 | 0.485 |
| 64 | 0.979 | 0.725 | 0.492 |
| 96 | 0.969 | 0.739 | 0.501 |
| 128 | 0.970 | 0.735 | 0.503 |

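Differences across dimensions are small (within about 0.03 NDCG@10 on every benchmark, and about 0.01 on AutoRAG) and can flip between runs on the more saturated benchmarks such as AutoRAG, where the 32- and 64-dim heads score slightly above the 128-dim head. The 128-dim head is the default; on Ko-StrategyQA and NanoBEIR-Ko the 32- and 64-dim heads give up roughly 1 to 2 points of NDCG@10 in exchange for a 2–4× reduction in vector size (the 96-dim head is about 1.3× smaller).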
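
The storage side of this trade-off is simple arithmetic: a multi-vector index stores one vector per token, so raw size scales linearly with the embedding dimension. The sketch below assumes uncompressed float32 values and a hypothetical average document length of 300 tokens; neither figure comes from this card, and PLAID-style quantization would shrink the numbers further.

```python
def index_bytes(num_tokens, dim, bytes_per_value=4):
    """Raw size of one document's token embeddings (float32 by default)."""
    return num_tokens * dim * bytes_per_value


DOC_TOKENS = 300  # hypothetical average document length in tokens
for dim in (32, 64, 96, 128):
    size_kib = index_bytes(DOC_TOKENS, dim) / 1024
    print(f"dim={dim:>3}: {size_kib:6.1f} KiB per document ({dim / 128:.2f}x the 128-dim size)")
```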

Scores for dragonkue-colbert-ko-0.1b are measured here with raw MaxSim (no PLAID quantization) on transformers==5.3.0 and pytorch==2.9. Scores for the other models are reproduced from prior reports and may differ from a re-run on the same stack.
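
For readers unfamiliar with the metric, the sketch below shows one common way to compute NDCG@10 for a single query. It assumes linear gain (the relevance label itself) and a log2 rank discount; the evaluation harness used for the tables above may differ in details such as gain function or tie handling. `dcg_at_k` and `ndcg_at_k` are illustrative helpers.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (rank 1 has discount log2(2) = 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """DCG of the system ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# One relevant document in the corpus, retrieved at rank 2.
print(round(ndcg_at_k([0, 1, 0], [1, 0, 0]), 4))  # 0.6309
```

The benchmark score is the mean of this per-query value over all queries.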

Training Details

  • Loss: src.losses.MatryoshkaColBERTLoss with these parameters:
    {
        "dims": [
            32,
            64,
            96,
            128
        ],
        "weights": [
            0.25,
            0.25,
            0.25,
            0.25
        ],
        "temperature": 1.0
    }
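
The source of src.losses.MatryoshkaColBERTLoss is not shown on this card. One plausible reading of the parameters above is a weighted sum of per-dimension contrastive losses, each computed from MaxSim scores produced by that dimension's head. The sketch below follows that reading; the function names, the choice of softmax cross-entropy, and the convention that temperature divides the scores are all assumptions.

```python
import math


def contrastive_loss(scores, temperature=1.0):
    """Cross-entropy over candidate scores, with the positive at index 0."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return log_norm - scaled[0]


def matryoshka_loss(scores_by_dim, weights_by_dim, temperature=1.0):
    """Weighted sum of the per-dimension contrastive losses."""
    return sum(
        weights_by_dim[dim] * contrastive_loss(scores, temperature)
        for dim, scores in scores_by_dim.items()
    )


# Hypothetical MaxSim scores [positive, negative] from each head for one query.
scores = {32: [20.1, 18.7], 64: [21.4, 18.9], 96: [22.0, 18.5], 128: [22.3, 18.2]}
weights = {32: 0.25, 64: 0.25, 96: 0.25, 128: 0.25}
print(f"{matryoshka_loss(scores, weights):.4f}")
```

With equal weights of 0.25, every dimension contributes equally to the gradient, which is consistent with the small quality gap between heads reported in the evaluation tables.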
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • learning_rate: 1e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True
  • dataloader_drop_last: True
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: {'use_reentrant': False}
  • router_mapping: {'anchor': 'query', 'positive': 'document', 'neg_0': 'document'}
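
To see what learning_rate: 1e-05 and warmup_ratio: 0.1 imply together with the linear scheduler (lr_scheduler_type: linear in the full list), the sketch below computes the learning rate at a given step for linear warmup followed by linear decay to zero. The total step count is a hypothetical value, since the card does not state it, and `linear_schedule_lr` is an illustrative helper rather than the trainer's own code.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=1e-5, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 10_000  # hypothetical; the card does not state the total step count
for step in (0, 500, 1_000, 5_500, 10_000):
    print(step, f"{linear_schedule_lr(step, TOTAL_STEPS):.2e}")
```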

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: {'use_reentrant': False}
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {'anchor': 'query', 'positive': 'document', 'neg_0': 'document'}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0017 10 4.1388
0.0034 20 4.1142
0.0051 30 3.9797
0.0067 40 3.8761
0.0084 50 3.6167
0.0101 60 3.424
0.0118 70 3.0256
0.0135 80 2.827
0.0152 90 2.5787
0.0169 100 2.2696
0.0185 110 2.0266
0.0202 120 1.6815
0.0219 130 1.4739
0.0236 140 1.2877
0.0253 150 1.1474
0.0270 160 1.0143
0.0286 170 0.9363
0.0303 180 0.9189
0.0320 190 0.7442
0.0337 200 0.6919
0.0354 210 0.6251
0.0371 220 0.6527
0.0388 230 0.5923
0.0404 240 0.572
0.0421 250 0.5255
0.0438 260 0.4407
0.0455 270 0.5038
0.0472 280 0.3939
0.0489 290 0.3938
0.0506 300 0.3253
0.0522 310 0.335
0.0539 320 0.2855
0.0556 330 0.2396
0.0573 340 0.252
0.0590 350 0.2299
0.0607 360 0.2133
0.0624 370 0.2186
0.0640 380 0.1935
0.0657 390 0.1743
0.0674 400 0.1462
0.0691 410 0.1552
0.0708 420 0.1491
0.0725 430 0.1581
0.0741 440 0.1635
0.0758 450 0.1383
0.0775 460 0.1377
0.0792 470 0.1155
0.0809 480 0.1184
0.0826 490 0.1333
0.0843 500 0.1341
0.0859 510 0.1259
0.0876 520 0.0748
0.0893 530 0.1342
0.0910 540 0.1058
0.0927 550 0.1024
0.0944 560 0.0921
0.0961 570 0.104
0.0977 580 0.1069
0.0994 590 0.0925
0.1011 600 0.1146
0.1028 610 0.0682
0.1045 620 0.0711
0.1062 630 0.1491
0.1079 640 0.0602
0.1095 650 0.0753
0.1112 660 0.0713
0.1129 670 0.0739
0.1146 680 0.0783
0.1163 690 0.0678
0.1180 700 0.0963
0.1196 710 0.0677
0.1213 720 0.0829
0.1230 730 0.0719
0.1247 740 0.0646
0.1264 750 0.0927
0.1281 760 0.0755
0.1298 770 0.0799
0.1314 780 0.0535
0.1331 790 0.0555
0.1348 800 0.0804
0.1365 810 0.0627
0.1382 820 0.0726
0.1399 830 0.0685
0.1416 840 0.0421
0.1432 850 0.0895
0.1449 860 0.0964
0.1466 870 0.0515
0.1483 880 0.0825
0.1500 890 0.0801
0.1517 900 0.0579
0.1534 910 0.0559
0.1550 920 0.0432
0.1567 930 0.0553
0.1584 940 0.0577
0.1601 950 0.0451
0.1618 960 0.049
0.1635 970 0.0459
0.1651 980 0.0684
0.1668 990 0.0449
0.1685 1000 0.0392
0.1702 1010 0.071
0.1719 1020 0.0511
0.1736 1030 0.0501
0.1753 1040 0.0464
0.1769 1050 0.0678
0.1786 1060 0.0597
0.1803 1070 0.0569
0.1820 1080 0.044
0.1837 1090 0.0452
0.1854 1100 0.0394
0.1871 1110 0.0496
0.1887 1120 0.0296
0.1904 1130 0.0321
0.1921 1140 0.0525
0.1938 1150 0.058
0.1955 1160 0.0552
0.1972 1170 0.035
0.1989 1180 0.0468
0.1999 1186 -
0.2005 1190 0.0383
0.2022 1200 0.0599
0.2039 1210 0.0572
0.2056 1220 0.0383
0.2073 1230 0.0486
0.2090 1240 0.0407
0.2107 1250 0.044
0.2123 1260 0.04
0.2140 1270 0.0338
0.2157 1280 0.036
0.2174 1290 0.0511
0.2191 1300 0.0472
0.2208 1310 0.031
0.2224 1320 0.0614
0.2241 1330 0.0388
0.2258 1340 0.0403
0.2275 1350 0.047
0.2292 1360 0.033
0.2309 1370 0.0524
0.2326 1380 0.0357
0.2342 1390 0.0463
0.2359 1400 0.0355
0.2376 1410 0.0411
0.2393 1420 0.028
0.2410 1430 0.0386
0.2427 1440 0.0553
0.2444 1450 0.0353
0.2460 1460 0.0462
0.2477 1470 0.0399
0.2494 1480 0.0319
0.2511 1490 0.0456
0.2528 1500 0.0302
0.2545 1510 0.0366
0.2562 1520 0.0409
0.2578 1530 0.0337
0.2595 1540 0.0362
0.2612 1550 0.0318
0.2629 1560 0.0433
0.2646 1570 0.0379
0.2663 1580 0.0419
0.2679 1590 0.0225
0.2696 1600 0.0269
0.2713 1610 0.0295
0.2730 1620 0.048
0.2747 1630 0.0382
0.2764 1640 0.0341
0.2781 1650 0.0334
0.2797 1660 0.0534
0.2814 1670 0.0445
0.2831 1680 0.0284
0.2848 1690 0.0327
0.2865 1700 0.0309
0.2882 1710 0.0372
0.2899 1720 0.0384
0.2915 1730 0.022
0.2932 1740 0.0266
0.2949 1750 0.0399
0.2966 1760 0.0342
0.2983 1770 0.0391
0.3000 1780 0.0349
0.3017 1790 0.0365
0.3033 1800 0.0322
0.3050 1810 0.0414
0.3067 1820 0.0297
0.3084 1830 0.0446
0.3101 1840 0.0312
0.3118 1850 0.0379
0.3134 1860 0.0252
0.3151 1870 0.0424
0.3168 1880 0.0367
0.3185 1890 0.0226
0.3202 1900 0.0319
0.3219 1910 0.0189
0.3236 1920 0.0219
0.3252 1930 0.0341
0.3269 1940 0.0505
0.3286 1950 0.0176
0.3303 1960 0.0328
0.3320 1970 0.0276
0.3337 1980 0.0251
0.3354 1990 0.0603
0.3370 2000 0.0243
0.3387 2010 0.0316
0.3404 2020 0.0294
0.3421 2030 0.025
0.3438 2040 0.0255
0.3455 2050 0.0318
0.3472 2060 0.025
0.3488 2070 0.0273
0.3505 2080 0.0338
0.3522 2090 0.0299
0.3539 2100 0.0275
0.3556 2110 0.0184
0.3573 2120 0.0244
0.3589 2130 0.0432
0.3606 2140 0.0325
0.3623 2150 0.0525
0.3640 2160 0.0329
0.3657 2170 0.0236
0.3674 2180 0.0309
0.3691 2190 0.0195
0.3707 2200 0.0318
0.3724 2210 0.0229
0.3741 2220 0.0312
0.3758 2230 0.0186
0.3775 2240 0.0231
0.3792 2250 0.0262
0.3809 2260 0.0287
0.3825 2270 0.0299
0.3842 2280 0.0302
0.3859 2290 0.0281
0.3876 2300 0.0252
0.3893 2310 0.0362
0.3910 2320 0.0266
0.3927 2330 0.0304
0.3943 2340 0.0259
0.3960 2350 0.0276
0.3977 2360 0.0219
0.3994 2370 0.0361
0.3997 2372 -

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 5.1.1
  • PyLate: 1.3.4
  • Transformers: 4.56.2
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.4.1
  • Tokenizers: 0.22.2-rc0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084"
}

PyLate

@misc{PyLate,
    title = {PyLate: Flexible Training and Retrieval for Late Interaction Models},
    author = {Chaffin, Antoine and Sourty, Raphaël},
    url = {https://github.com/lightonai/pylate},
    year = {2024}
}

Jina ColBERT v2

@article{jina-colbert-v2,
    title={Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever},
    author={Rohan Jha and Bo Wang and Michael Günther and Saba Sturua and Mohammad Kalim Akram and Han Xiao},
    year={2024},
    journal={arXiv preprint arXiv:2408.16672},
    url={https://arxiv.org/abs/2408.16672}
}