Korpora: Korean Corpora Archives
Korpora is an open-source Python package that aims to minimize such inconvenience. The name Korpora comes from the word corpora, the plural of corpus, and is short for Korean Corpora. We hope that Korpora will serve as a starting point that encourages the release of more Korean datasets and takes Korean natural language processing to the next level.
List of corpora
Korpora provides the following corpora.
License
- Korpora is licensed under the Creative Commons Attribution 4.0 (CC BY 4.0) license. This license covers the Korpora package and all of its components.
- Its users have the following rights.
  - Share: They are free to reproduce, distribute, exhibit, perform, and broadcast the material, including in a changed format.
  - Adapt: They can remix, transform, and build upon the material for any purpose, even commercially.
- Its users have the following obligations. The rights listed above remain valid as long as these obligations are fulfilled.
  - Attribution: They must indicate that they have used Korpora.
  - No additional restrictions: For any derivative work of Korpora, they cannot impose a license stricter than CC BY permits.
- For example, if you have only downloaded and used Korpora, you need to fulfill only the 'attribution' obligation. However, if you create and distribute models, documents, or any other derivative works of Korpora, you must fulfill both the 'attribution' and 'no additional restrictions' obligations.
- Each corpus adheres to its own license policy. Please check the license of the corpus before using it!
Contributors