Corpus List
Korpora provides the following corpora.
- Korean Chatbot Data
- KcBERT Pre-Training Corpus
- Korean Hate Speech Dataset
- Korean Petitions
- KorNLI
- KorSTS
- Korean WikiText
- NamuWikiText
- NAVER x Changwon NER
- NAVER Sentiment Movie Corpus
- Korean Question Pair
- Ko-En Parallel Corpus
- Modu: Newspaper
- Modu: Messenger
- Modu: Morphemes
- Modu: Named Entity
- Modu: Spoken
- Modu: Web
- Modu: Written
- AI Hub Ko-En Parallel Corpus
- OpenSubtitles2016
Warning
Because of licensing restrictions on the Modu corpora and the AI Hub Ko-En Parallel Corpus, Korpora does not provide download functions for these corpora; it only offers load functions. If you wish to use them, please complete the authentication process required by the National Institute of Korean Language and AI Hub, and then download the corpora manually.
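Since these corpora must be placed on disk by hand, it can help to confirm the files are actually there before calling a load function. The following is a minimal sketch using only the Python standard library; the helper name `find_manual_corpus_files`, the example directory, and the `*.json` pattern are hypothetical and are not part of Korpora's API. Adjust them to wherever you saved the download and to the file type you received.

```python
from pathlib import Path
from typing import List


def find_manual_corpus_files(root_dir: str, pattern: str = "*.json") -> List[Path]:
    """Return manually downloaded corpus files under root_dir, sorted by path.

    Hypothetical helper (not part of Korpora). Raises FileNotFoundError when
    the directory is missing or contains no file matching the pattern.
    """
    root = Path(root_dir).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(
            f"{root} does not exist. Download the corpus manually first."
        )
    # Search recursively, because archives often unpack into subdirectories.
    files = sorted(p for p in root.rglob(pattern) if p.is_file())
    if not files:
        raise FileNotFoundError(f"No files matching {pattern!r} under {root}.")
    return files


if __name__ == "__main__":
    # The directory below is an assumed example location, not a Korpora default.
    try:
        paths = find_manual_corpus_files("~/Korpora/my_manual_corpus")
        print(f"Found {len(paths)} file(s); ready to load.")
    except FileNotFoundError as err:
        print(err)
```

Once the check passes, the directory or the returned paths can be handed to the corresponding load function for that corpus.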