TikTok-10M Dataset

We just released our first dataset of 10M highly curated TikTok videos. Would love feedback!

If you are training or fine-tuning video models and need curated datasets, feel free to reach out!

I would love a link.

The-data-company/TikTok-10M on Hugging Face

I tried loading it using datasets.load_dataset and it worked after grabbing the CSV link directly from the GitHub repo. Make sure to use streaming=True if memory is an issue.
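Here is a minimal sketch of that approach. It assumes the Hugging Face `datasets` library is installed and that the data is a single CSV reachable at a raw URL. The actual link is not given in this thread, so you would pass your own. The helper name `load_tiktok_csv` is hypothetical. The call inside it is the library's generic CSV loader, `load_dataset("csv", data_files=..., split=..., streaming=...)`.

```python
def load_tiktok_csv(csv_url, streaming=True):
    """Hypothetical helper: load one CSV (local path or URL) with `datasets`.

    With streaming=True, load_dataset returns an IterableDataset that reads
    rows lazily instead of materialising the whole file in memory.
    """
    # Imported inside the function so this file can be imported even if
    # `datasets` is not installed yet.
    from datasets import load_dataset

    # The generic "csv" builder accepts a path or URL through data_files.
    # A single file is exposed as the "train" split.
    return load_dataset("csv", data_files=csv_url, split="train", streaming=streaming)
```

For example, `ds = load_tiktok_csv(csv_url)` followed by `for row in ds.take(5): print(row)` reads only the first few rows. Each row arrives as a dict keyed by the CSV's column names.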

I load it with load_dataset using the raw CSV URL, then switch on streaming so I don’t run out of memory while scanning or sampling videos.
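A streamed dataset can't be indexed at random, so drawing a uniform sample of videos needs a one-pass method. One option is reservoir sampling (Algorithm R), which the thread does not mention and is offered here as a swapped-in technique. This stdlib-only sketch keeps just `k` rows in memory however long the stream is. The function name and its parameters are illustrative.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniformly sample k items from an iterable in a single pass (Algorithm R).

    Memory use is O(k), so it works on a streaming source such as an
    IterableDataset that yields one row at a time.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)  # seeded RNG for reproducible samples
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep row i with probability k / (i + 1) by overwriting a random slot.
            j = rng.randint(0, i)  # inclusive range: i + 1 possible values
            if j < k:
                reservoir[j] = row
    return reservoir
```

This still reads the entire stream once. If a rough sample is good enough, `IterableDataset.take(n)` (first n rows) or `IterableDataset.shuffle(seed=..., buffer_size=...)` (shuffles only within a buffer, so it is approximate) are cheaper.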