JMdict
JMdict (Japanese–Multilingual Dictionary) is a large machine-readable multilingual Japanese dictionary. As of March 2023, it contains Japanese–English translations for around 199,000 entries, representing 282,000 unique headword-reading combinations. The dictionary files are free to use with attribution (Creative Commons Attribution-ShareAlike); they have been widely adopted on the Internet and are used in many computer and smartphone applications. The project is considered a standard Japanese–English reference on the Internet and is used by the Unihan Database and several other Japanese–English projects.

History

The JMdict project was started by computational linguist Jim Breen in 1991 with the creation of EDICT (a plain text flat file in EUC-JP encoding), which was later expanded to a UTF-8-encoded XML file in 1999 as JMdict. The XML format allows for multiple surface forms of lexemes and multiple readings, as well as cross-references and annotations. It permits glosses i ...
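As a rough illustration of how an XML layout can hold several surface forms and readings in one entry, here is a minimal Python sketch. The element names (`entry`, `ent_seq`, `k_ele`/`keb`, `r_ele`/`reb`, `sense`, `gloss`) are assumed to match the JMdict DTD; the sample entry, its sequence number, and the `parse_entry` helper are invented for the example and are not taken from the real file.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of a JMdict entry (illustrative only;
# the sequence number is a placeholder, not a real JMdict ID).
SAMPLE = """
<entry>
  <ent_seq>0000001</ent_seq>
  <k_ele><keb>辞書</keb></k_ele>
  <k_ele><keb>字書</keb></k_ele>
  <r_ele><reb>じしょ</reb></r_ele>
  <sense>
    <gloss>dictionary</gloss>
    <gloss>lexicon</gloss>
  </sense>
</entry>
"""

def parse_entry(xml_text):
    """Collect headwords, readings and glosses from one entry element."""
    entry = ET.fromstring(xml_text)
    return {
        "seq": entry.findtext("ent_seq"),
        # Several <keb> elements model multiple surface forms of one lexeme.
        "headwords": [e.text for e in entry.iter("keb")],
        # Several <reb> elements would model multiple readings.
        "readings": [e.text for e in entry.iter("reb")],
        # One list of glosses per <sense>.
        "senses": [[g.text for g in s.findall("gloss")]
                   for s in entry.findall("sense")],
    }

parsed = parse_entry(SAMPLE)
print(parsed["headwords"], parsed["readings"], parsed["senses"])
```

A flat one-line-per-entry format has no natural place for repeated headword or reading fields, which is presumably part of why the nested layout was adopted.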
|
Jim Breen
James William Breen (born 1947) is a Research Fellow at Monash University in Australia, where he was a professor in the area of IT and telecommunications before his retirement in 2003. He holds a BSc in mathematics, an MBA and a PhD in computational linguistics, all from the University of Melbourne. He is well known for his involvement in several popular free Japanese-related projects: the EDICT and JMdict Japanese–English dictionaries, the KANJIDIC kanji dictionary, and the WWWJDIC portal, which provides an interface to search them. His EDICT dictionary and WWWJDIC server have been described as "reliable and close to comprehensive". The 195,000-term lexicon is used by popular apps such as ImiWa (iOS) and AEDict (Android), and has b ...
|
Nonprofit
A nonprofit organization (NPO), also known as a nonbusiness entity, nonprofit institution, not-for-profit organization, or simply a nonprofit, is a non-governmental (private) legal entity organized and operated for a collective, public, or social benefit, as opposed to an entity that operates as a business aiming to generate a profit for its owners. A nonprofit organization is subject to the non-distribution constraint: any revenues that exceed expenses must be committed to the organization's purpose, not taken by private parties. Depending on the local laws, charities are regularly organized as non-profits. A host of organizations may be non-profit, including some political organizations, schools, hospitals, business associations, churches, foundations, social clubs, and consumer cooperatives. Nonprofit entities may seek approval from governments to be tax-exempt, and some may also qualify to receive tax-deductible contributions, but an enti ...
|
UTF-8
UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation Format 8-bit. Almost every webpage is transmitted as UTF-8. UTF-8 supports all 1,112,064 valid Unicode code points using a variable-width encoding of one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. It was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that a UTF-8-encoded file using only those characters is identical to an ASCII file. Most software designed for any extended ASCII can read and write UTF-8, and this results in fewer internationalization issues than any alternative text encoding. UTF-8 is dominant for all countries/languages on the internet, with 99% global ...
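The variable-width behaviour and the ASCII compatibility described above can be checked with a short Python sketch. It relies only on the built-in `str.encode`; the sample characters are arbitrary choices for illustration.

```python
# Valid code points: U+0000..U+10FFFF minus the 2,048 surrogates
# (U+D800..U+DFFF), which UTF-8 does not encode.
valid_code_points = 0x110000 - (0xDFFF - 0xD800 + 1)

# One sample character per encoded length (arbitrary picks).
samples = ["A", "\u00e9", "\u8a9e", "\U0001F600"]
lengths = [len(ch.encode("utf-8")) for ch in samples]

# Text restricted to the first 128 code points encodes to the same bytes
# in UTF-8 and in ASCII.
ascii_text = "plain ASCII"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

print(valid_code_points)  # 1112064
print(lengths)            # [1, 2, 3, 4]
print(same_bytes)         # True
```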
|
CEDICT
The CEDICT project was started by Paul Denisowski in 1997 and is maintained by a team on mdbg.net under the name CC-CEDICT, with the aim of providing a complete Chinese-to-English dictionary with pronunciation in pinyin for the Chinese characters.

Content

CEDICT is a text file; other programs (or simply Notepad, egrep, or an equivalent) are needed to search and display it. This project is used by several other Chinese–English projects. The Unihan Database uses CEDICT data for most of its information about character compounds, but this is auxiliary and is explicitly not a part of the main Unicode database.

Features:
* Traditional Chinese and Simplified Chinese
* Pinyin (several pronunciations)
* American English (several)
* It had 122,444 entries in UTF-8.

The basic format of a CEDICT entry is:

Traditional Simplified [pin1 yin1] /American English equivalent 1/equivalent 2/
漢字 汉字 [han4 zi4] /Chinese character/CL:個|个/

Example of a simple egrep search:

$ egrep -i 有 ...
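Because each entry is a single line of text, it can be split into fields with a regular expression. The sketch below assumes the usual CC-CEDICT layout, in which the pinyin sits in square brackets and the glosses are separated by slashes, and assumes that comment lines begin with `#`. The `parse_cedict_line` helper is a hypothetical name, not part of any CEDICT tooling.

```python
import re

# Assumed layout: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
ENTRY_RE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")

def parse_cedict_line(line):
    """Split one CEDICT-style line into fields; None for comments or junk."""
    if line.startswith("#"):
        return None
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, glosses = match.groups()
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin.split(),      # one syllable per list item
        "glosses": glosses.split("/"),  # one English equivalent per item
    }

entry = parse_cedict_line("漢字 汉字 [han4 zi4] /Chinese character/CL:個|个/")
print(entry["pinyin"], entry["glosses"])
```

Lines that do not match the pattern return `None` instead of raising, so a caller can skip headers and malformed lines while scanning the file.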
|
Tatoeba
Tatoeba is a free collection of example sentences with translations geared towards foreign language learners. It is available in more than 400 languages. Its name comes from the Japanese phrase tatoeba (例えば), meaning 'for example'. It is written and maintained by a community of volunteers through a model of open collaboration. Individual contributors are known as "Tatoebans". It is run by Association Tatoeba, a French non-profit organization funded through donations.

History and development

In 2006, Trang Ho was frustrated that, unlike some of their Japanese counterparts, German bilingual dictionaries didn't feature full-text search of usage examples with translations. It led her to imagine her ideal dictionary and to build a prototype hosted on SourceForge under the name "multilangdict." The main focus was already the crowdsourcing of translated sentences: "A Wikipedia type of thing, except people ad ...
|
Monash University
Monash University is a public research university based in Melbourne, Victoria, Australia. Named after World War I general Sir John Monash, it was founded in 1958 and is the second oldest university in the state. The university has a number of campuses, four of which are in Victoria (Clayton, Caulfield, Peninsula, and Parkville), one in Malaysia and another one in Indonesia. Monash also owns land (3.6 hectares) in Notting Hill, opposite its Clayton campus. Monash has a research and teaching centre in Prato, Italy, a graduate research school in Mumbai, India and graduate schools in Suzhou, China and T ...
|
EPWING
EPWING is the standard format for electronic dictionaries, primarily used for Japanese. A subset of EPWING V1 is standardized as JIS X 4081 ("Retrieval Data Structure for Japanese Electronic Publication").

History

In 1986, Fujitsu, Iwanami Shoten, Sony and Dai Nippon Printing worked together to publish a new CD-ROM edition of Kōjien. They created a specification called "WING". In 1988, after CD-ROM was standardized as ISO 9660, the WING specification was renamed to "EPWING" ("Electronic Publishing WING").

File structure

The basic structure of EPWING is as follows:

.
└── Catalogs
└──
    ├── DATA
    │   └── HONMON
    └── GAIJI
        ├── GA16FULL ...
|
WWWJDIC
WWWJDIC is an online Japanese dictionary based on the electronic dictionaries compiled and collected by Australian academic Jim Breen. The main Japanese–English dictionary file (EDICT) contains over 180,000 entries, and the ENAMDICT dictionary contains over 720,000 Japanese surnames, first names, place names and product names. WWWJDIC also contains several specialized dictionaries covering topics such as life sciences, law, computing, engineering, etc. For example sentences with Japanese words, WWWJDIC makes use of a sentence database from the Tatoeba project, largely based on the Tanaka Corpus. Unlike the original Tanaka Corpus, the sentences from the Tatoeba project are not public domain, but are available under the non-rest ...
|
Flat-file Database
A flat-file database is a database stored in a file called a flat file. Records follow a uniform format, and there are no structures for indexing or recognizing relationships between records. The file is simple. A flat file can be a plain text file (e.g. CSV, TXT or TSV), or a binary file. Relationships can be inferred from the data in the database, but the database format itself does not make those relationships explicit. The term has generally implied a small database, but very large databases can also be flat.

Overview

Plain text files usually contain one record per line. Examples of flat files include /etc/passwd and /etc/group on Unix-like operating systems. Another example of a flat file is a name-and-address list with the fields Name, Address and Phone Number. Flat files are typically either delimiter-separated or fixed-width.

Delimiter-separated values

In delimiter-separated values files, the fields are separated by a character or string c ...
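A minimal Python sketch of reading a delimiter-separated flat file follows. The records are made-up lines in the colon-delimited style of /etc/passwd; the `FIELDS` names and the `read_records` helper are labels chosen for this example, not part of any standard API.

```python
import csv
import io

# Made-up records in the colon-delimited style of /etc/passwd.
FLAT_FILE = """\
alice:x:1000:1000:Alice Example:/home/alice:/bin/sh
bob:x:1001:1001:Bob Example:/home/bob:/bin/sh
"""

# Field names chosen for this example, following the passwd column order.
FIELDS = ["name", "password", "uid", "gid", "gecos", "home", "shell"]

def read_records(text):
    """Parse one record per line, with fields separated by ':'."""
    reader = csv.reader(io.StringIO(text), delimiter=":",
                        quoting=csv.QUOTE_NONE)
    return [dict(zip(FIELDS, row)) for row in reader if row]

records = read_records(FLAT_FILE)

# The format has no index, so a lookup is a linear scan over the records.
bob = next(r for r in records if r["name"] == "bob")
print(bob["uid"], bob["home"])  # 1001 /home/bob
```

Any relationship between records, such as two users sharing a group ID, has to be inferred by the reading program, since the file format itself does not express it.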
|
EUC-JP
Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese (characters). The most commonly used EUC codes are variable-length encodings with a character belonging to an ISO/IEC 646 compliant coded character set (such as ASCII) taking one byte, and a character belonging to a 94×94 coded character set (such as GB 2312) represented in two bytes. The EUC-CN form of GB 2312 and EUC-KR are examples of such two-byte EUC codes. EUC-JP includes characters represented by up to three bytes, including an initial shift code, whereas a single character in EUC-TW can take up to four bytes. Modern applications are more likely to use UTF-8, which supports all of the glyphs of the EUC codes, and more, and is generally more portable with fewer vendor deviations and errors. EUC is, however, still very popular, especially EUC-KR for South Korea.

Encoding structure

The structure of EUC is based on the ISO/IEC 2022 standard, which specifies a system of graphical character set ...
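The one-, two- and three-byte forms of EUC-JP can be observed with Python's built-in `euc_jp` codec. This is a small sketch, not a full description of the encoding: the sample characters are arbitrary, and the statement that the three-byte form starts with the prefix byte 0x8F (used for JIS X 0212 characters) is an assumption about the codec's behaviour.

```python
# Arbitrary sample characters, one per encoded form.
samples = {
    "ASCII letter": "A",
    "hiragana (JIS X 0208)": "\u3042",   # あ
    "half-width katakana": "\uff71",     # ｱ, encoded with a 0x8E prefix
    "JIS X 0212 kanji": "\u4e02",        # 丂, assumed to use a 0x8F prefix
}

encoded = {label: ch.encode("euc_jp") for label, ch in samples.items()}
for label, raw in encoded.items():
    print(f"{label}: {raw.hex()} ({len(raw)} byte(s))")

# Converting legacy EUC-JP data to UTF-8 is a decode followed by an encode.
legacy = "辞書".encode("euc_jp")
as_utf8 = legacy.decode("euc_jp").encode("utf-8")
print(len(legacy), len(as_utf8))
```

The last two lines show the kind of conversion needed when moving an EUC-JP text file to UTF-8: the bytes are decoded with the legacy codec and re-encoded, and the byte count changes because the two encodings use different widths for the same characters.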
|
Japanese Dictionary
Japanese dictionaries have a history that began over 1300 years ago when Japanese Buddhist priests, who wanted to understand Chinese sutras, adapted Chinese character dictionaries. Present-day Japanese lexicographers are exploring computerized editing and electronic dictionaries. According to Nakao Keisuke:

It has often been said that dictionary publishing in Japan is active and prosperous, that Japanese people are well provided for with reference tools, and that lexicography here, in practice as well as in research, has produced a number of valuable reference books together with voluminous academic studies. (1998:35)

After introducing some Japanese "dictionary" words, this article will discuss early and modern Japanese dictionaries, demarcated at the 1603 CE lexicographical sea-change from Nippo Jisho, the first bilingual Japanese–Portuguese dictionary. "Early" here will refer to lexicography during the Heian, Kamakura, and Muromachi periods (794–1573); and "modern" to Japanese dictionar ...
|
Plain Text
In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects (floating-point numbers, images, etc.). It may also include a limited number of "whitespace" characters that affect simple arrangement of text, such as spaces, line breaks, or tabulation characters. Plain text is different from formatted text, where style information is included; from structured text, where structural parts of the document such as paragraphs, sections, and the like are identified; and from binary files in which some portions must be interpreted as binary objects (encoded integers, real numbers, images, etc.). The term is sometimes used quite loosely, to mean files that contain only "readable" content (or just files with nothing that the speaker does not prefer). For example, that could exclude any indication of fonts or layout (such as markup, markdown, or even tabs); characters s ...