Bijankhan Corpus

The Bijankhan corpus is a tagged corpus that is suitable for natural language processing (NLP) research on the Persian language. The collection is gathered from daily news and common texts, and all of its documents are categorized by subject (for example, political or cultural) into about 4300 subject categories. The corpus contains about 2.6 million manually tagged words, annotated with a tag set of 550 Persian part-of-speech tags. The Bijankhan corpus was created by the Database Research Group at the University of Tehran. The corpus is non-free in that it is not free for commercial use, although these restrictions vary by country. It is named after Mahmood Bijankhan, a professor of linguistics at the University of Tehran, in recognition of his contributions to this area.


See also

* Hamshahri Corpus

