Text Simplification
Text simplification is an operation used in natural language processing to change, enhance, classify, or otherwise process an existing body of human-readable text so that its grammar and structure are greatly simplified while the underlying meaning and information remain the same. Text simplification is an important area of research because of communication needs in an increasingly complex and interconnected world dominated by science, technology, and new media. Natural human languages, however, pose difficulties because they ordinarily contain large vocabularies and complex constructions that machines, no matter how fast and well-programmed, cannot easily process. Researchers have found that, to reduce this linguistic diversity, they can use methods of semantic compression to limit and simplify the set of words used in given texts.

Example

Text simplification is illustrated with an example used by Siddharthan (2006). The first sentence contains two relative clauses and one c ...
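As a minimal sketch of one such operation, the following Python snippet detaches a non-restrictive relative clause into its own sentence. The regex heuristic and the sample sentence are illustrative only and are not Siddharthan's method, which relies on full syntactic analysis:

<syntaxhighlight lang="python">
import re

def split_relative_clause(sentence):
    """Crude heuristic: detach a non-restrictive 'who/which' clause
    (set off by commas) into a separate sentence."""
    match = re.match(r"^(.*?), (who|which) (.*?), (.*)$", sentence)
    if match is None:
        return [sentence]
    head, _, clause, rest = match.groups()
    # crude assumption: the whole head noun phrase is the antecedent
    return [f"{head} {rest}", f"{head} {clause}."]

print(split_relative_clause(
    "Mr. Anthony, who runs an employment agency, decries program trading."))
# ['Mr. Anthony decries program trading.',
#  'Mr. Anthony runs an employment agency.']
</syntaxhighlight>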
Natural Language Processing
Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related to information retrieval, knowledge representation, and computational linguistics, a subfield of linguistics. Major tasks in natural language processing are speech recognition, text classification, natural language understanding, and natural language generation.

History

Natural language processing has its roots in the 1950s. Already in 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence, though at the time that was not articulated as a problem separate from artificial intelligence. The proposed test includes a task that involves the automated interpretation and generation of natural language ...
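A brief sketch of some of the major tasks listed above, using the spaCy library; it assumes spaCy and its small English model are installed ("pip install spacy" and "python -m spacy download en_core_web_sm"):

<syntaxhighlight lang="python">
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Alan Turing published an article in 1950.")

# tokenization, part-of-speech tagging, dependency parsing
for token in doc:
    print(token.text, token.pos_, token.dep_)

# named-entity recognition, e.g. "Alan Turing" PERSON, "1950" DATE
for ent in doc.ents:
    print(ent.text, ent.label_)
</syntaxhighlight>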
Lexical Simplification
Lexical simplification is a sub-task of text simplification. It can be defined as any lexical substitution task that reduces text complexity.

See also
* Lexical substitution
* Text simplification

References
* Advaith Siddharthan. Syntactic Simplification and Text Cohesion. In Research on Language and Computation, Volume 4, Issue 1, June 2006, Pages 77–109, Springer Science, the Netherlands.
* Siddhartha Jonnalagadda, Luis Tari, Joerg Hakenberg, Chitta Baral and Graciela Gonzalez. Towards Effective Sentence Simplification for Automatic Processing of Biomedical Text. In Proc. of NAACL-HLT 2009, Boulder, USA, June 2009.
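As a rough illustration of the task defined above, the sketch below replaces a word with its most frequent WordNet synonym. It assumes NLTK's WordNet data and the wordfreq package are installed, and it deliberately ignores context, which a real system must not:

<syntaxhighlight lang="python">
from nltk.corpus import wordnet as wn
from wordfreq import word_frequency

def simplify_word(word, pos=wn.ADJ):
    """Return the most frequent WordNet synonym of `word` (or the word
    itself if no synonym is more common)."""
    candidates = {word}
    for synset in wn.synsets(word, pos=pos):
        for lemma in synset.lemmas():
            candidates.add(lemma.name().replace("_", " "))
    # pick the candidate an everyday reader is most likely to know
    return max(candidates, key=lambda w: word_frequency(w, "en"))

print(simplify_word("diminutive"))  # prints a more common synonym, e.g. 'tiny'
</syntaxhighlight>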
Speech Recognition
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). It incorporates knowledge and research in the computer science, linguistics, and computer engineering fields. The reverse process is speech synthesis. Some speech recognition systems require "training" (also called "enrollment"), where an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker-independent" systems; systems that use training are called "speaker-dependent". Speech recognition applications include voice user interfaces ...
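A hedged sketch of speech-to-text using the third-party SpeechRecognition package ("pip install SpeechRecognition"); the audio file name is hypothetical, and the free Google web engine shown is only one of several back-ends:

<syntaxhighlight lang="python">
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("enrollment_sample.wav") as source:
    audio = recognizer.record(source)   # read the whole file

# sends the audio to Google's free web API and prints the transcript
print(recognizer.recognize_google(audio))
</syntaxhighlight>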
Computational Linguistics
Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology, and neuroscience, among others. Computational linguistics is closely related to mathematical linguistics.

Origins

The field has overlapped with artificial intelligence since efforts began in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English. Since rule-based approaches were able to make arithmetic (systematic) calculations much faster and more accurately than humans, it was expected that lexicon, morphology, syntax, and semantics could be learned using explicit rules, a ...
Basic English
Basic English (a backronym for British American Scientific International and Commercial English) is a controlled language based on standard English, but with a greatly simplified vocabulary and grammar. It was created by the linguist and philosopher Charles Kay Ogden as an international auxiliary language, and as an aid for teaching English as a second language. It was presented in Ogden's 1930 book ''Basic English: A General Introduction with Rules and Grammar''. The first work on Basic English was written by two Englishmen, Ivor Richards of Harvard University and Charles Kay Ogden of the University of Cambridge in England. The design of Basic English drew heavily on the semiotic theory put forward by Ogden and Richards in their 1923 book ''The Meaning of Meaning''. Ogden's Basic, and the concept of a simplified English, gained its greatest publicity just after the Allied victory in World War II as a means for world peace. He was convinced that the world needed to grad ...
Simplified Technical English
ASD-STE100 Simplified Technical English (STE) is a controlled natural language designed to simplify and clarify technical documentation. It was originally developed during the 1980s by the European Association of Aerospace Industries (AECMA), at the request of the European airline industry, which wanted a standardized form of English for aircraft maintenance documentation that could be easily understood by non-native English speakers. It has since been adopted in many other fields outside the aerospace, defense, and maintenance domains for its clear, consistent, and comprehensive nature. The current edition of the STE Standard, published in January 2025, consists of 53 writing rules and a dictionary of approximately 900 approved words.

History

The first attempts towards controlled English were made between the 1930s and the 1970s with Basic English, Caterpillar Fundamental English, and Eastman Kodak's KISL. In 1979, aerospace documentation was written in American English (Boeing, D ...
Text Normalization
Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it. Text normalization requires being aware of what type of text is to be normalized and how it is to be processed afterwards; there is no all-purpose normalization procedure.

Applications

Text normalization is frequently used when converting text to speech. Numbers, dates, acronyms, and abbreviations are non-standard "words" that need to be pronounced differently depending on context (Sproat, R.; Black, A.; Chen, S.; Kumar, S.; Ostendorf, M.; Richards, C. (2001). "Normalization of non-standard words." ''Computer Speech and Language'' 15: 287–333. doi:10.1006/csla.2001.0169). For example:
* "$200" would be pronounced as "two hundred dollars" in English, but as "lua selau tālā" in Samoan.
* "vi" ...
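A minimal normalization sketch for the currency example above, using the num2words package; production normalizers (cf. Sproat et al. 2001) handle many more classes of non-standard words and many languages:

<syntaxhighlight lang="python">
import re
from num2words import num2words

def normalize_currency(text):
    """Expand '$<digits>' into words so a TTS system can pronounce it."""
    return re.sub(
        r"\$(\d+)",
        lambda m: num2words(int(m.group(1))) + " dollars",
        text,
    )

print(normalize_currency("The ticket costs $200."))
# The ticket costs two hundred dollars.
</syntaxhighlight>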
Semantic Compression
In natural language processing, semantic compression is a process of compacting a lexicon used to build a textual document (or a set of documents) by reducing language heterogeneity, while maintaining text semantics. As a result, the same ideas can be represented using a smaller set of words. In most applications, semantic compression is a lossy compression: increased prolixity does not compensate for the lexical compression, and an original document cannot be reconstructed in a reverse process.

By generalization

Semantic compression is basically achieved in two steps, using frequency dictionaries and a semantic network:
# determining cumulated term frequencies to identify the target lexicon,
# replacing less frequent terms with their hypernyms (generalization) from the target lexicon.
Step 1 requires assembling word frequencies and information on semantic relationships, specifically hyponymy. Moving upwards in the word hierarchy, a cumulative concept frequency is calculated by adding a su ...
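An illustrative sketch of step 2 above (generalization): a rare term is replaced by the nearest WordNet hypernym that already belongs to the target lexicon. It assumes NLTK's WordNet data, and the tiny target lexicon is a stand-in for one built from frequency dictionaries in step 1:

<syntaxhighlight lang="python">
from nltk.corpus import wordnet as wn

target_lexicon = {"dog", "animal", "vehicle", "food"}

def generalize(word):
    """Walk up the hyponymy hierarchy until a lexicon word is found."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    while synsets:
        names = {l.name() for s in synsets for l in s.lemmas()}
        hit = names & target_lexicon
        if hit:
            return hit.pop()
        # move one level up the hierarchy
        synsets = [h for s in synsets for h in s.hypernyms()]
    return word  # no hypernym in the lexicon; keep the original term

print(generalize("dalmatian"))  # 'dog'
</syntaxhighlight>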
Lexical Substitution
Lexical substitution is the task of identifying a substitute for a word in the context of a clause. For instance, given the following text: "After the ''match'', replace any remaining fluid deficit to prevent chronic dehydration throughout the tournament", a substitute of ''game'' might be given. Lexical substitution is closely related to word sense disambiguation (WSD), in that both aim to determine the meaning of a word. However, while WSD consists of automatically assigning the appropriate sense from a fixed sense inventory, lexical substitution does not impose any constraint on which substitute to choose as the best representative for the word in context. By not prescribing the inventory, lexical substitution overcomes the issue of the granularity of sense distinctions and provides a level playing field for systems that automatically acquire word senses (a task referred to as word sense induction).

Evaluation

In order to evaluate automatic systems on lexical su ...
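A simplified scoring sketch in the spirit of the SemEval-2007 lexical substitution "best" measure: a system's single guess earns the fraction of human annotators who proposed the same substitute. The gold data below is invented for illustration, and the real task definition includes further measures:

<syntaxhighlight lang="python">
from collections import Counter

def best_score(system_guess, gold_substitutes):
    gold = Counter(gold_substitutes)   # annotator votes per substitute
    return gold[system_guess] / sum(gold.values())

gold = ["game", "game", "game", "contest", "fixture"]  # 5 annotators
print(best_score("game", gold))     # 0.6
print(best_score("contest", gold))  # 0.2
</syntaxhighlight>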
Language Reform
Language reform is a kind of language planning by widespread change to a language. The typical methods of language reform are simplification and linguistic purism. Simplification regularises vocabulary, grammar, or spelling. Purism aligns the language with a form which is deemed 'purer'. Language reforms are intentional changes to language; this article does not cover natural language change, such as the Great Vowel Shift.

Simplification

By far the most common language reform is simplification. The most common simplification is spelling reform, but inflection, syntax, vocabulary, and word formation can also be targets for simplification. For example, in English, there are many prefixes which mean "the opposite of", e.g. ''un-'', ''in-'', ''a(n)-'', ''dis-'', and ''de-''. A language reform might propose to replace the redundant prefixes with one, such as ''un-''.

Purification

Linguistic purism or linguistic protectionism is the prescriptive practice of recognising one form of a l ...
Meaning (linguistic)
Semantics is the study of linguistic meaning. It examines what meaning is, how words get their meaning, and how the meaning of a complex expression depends on its parts. Part of this process involves the distinction between sense and reference. Sense is given by the ideas and concepts associated with an expression, while reference is the object to which an expression points. Semantics contrasts with syntax, which studies the rules that dictate how to create grammatically correct sentences, and pragmatics, which investigates how people use language in communication. Lexical semantics is the branch of semantics that studies word meaning. It examines whether words have one or several meanings and in what lexical relations they stand to one another. Phrasal semantics studies the meaning of sentences by exploring the phenomenon of compositionality, or how new meanings can be created by arranging words. Formal semantics relies on logic and mathematics to provide precise framewor ...
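A toy model of compositionality: word meanings are represented as functions and applied to one another to compute the meaning of "every student sleeps". The two-individual "world" is invented for illustration and stands in for a formal model:

<syntaxhighlight lang="python">
students = {"ann", "bob"}                       # the noun denotes a set
sleeps = lambda x: x in {"ann", "bob"}          # the verb denotes a predicate
every = lambda noun: lambda verb: all(verb(x) for x in noun)

# [[every]]([[student]])([[sleeps]]) -> True: each student is a sleeper
print(every(students)(sleeps))
</syntaxhighlight>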
Controlled Natural Language
Controlled natural languages (CNLs) are subsets of natural languages that are obtained by restricting the grammar and vocabulary in order to reduce or eliminate ambiguity and complexity. Traditionally, controlled languages fall into two major types: those that improve readability for human readers (e.g. non-native speakers), and those that enable reliable automatic semantic analysis of the language. Languages of the first type (often called "simplified" or "technical" languages), for example ASD Simplified Technical English, Caterpillar Technical English, and IBM's Easy English, are used in industry to increase the quality of technical documentation, and possibly to simplify the semi-automatic translation of the documentation. These languages restrict the writer through general rules such as "Keep sentences short", "Avoid the use of pronouns", "Only use dictionary-approved words", and "Use only the active voice". Languages of the second type have a formal syntax and formal semantics, ...
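A minimal checker for two of the rules quoted above, "Keep sentences short" and "Only use dictionary-approved words". The length limit and the tiny approved dictionary are invented for illustration; real controlled-language checkers enforce full standards such as ASD-STE100:

<syntaxhighlight lang="python">
import re

APPROVED = {"close", "the", "valve", "before", "you", "remove", "pump"}
MAX_WORDS = 20

def check(sentence):
    """Return a list of rule violations for one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    issues = []
    if len(words) > MAX_WORDS:
        issues.append(f"sentence has {len(words)} words (limit {MAX_WORDS})")
    for w in words:
        if w not in APPROVED:
            issues.append(f"'{w}' is not a dictionary-approved word")
    return issues

print(check("Close the valve before you remove the pump."))  # []
print(check("Isolate the valve."))                           # flags 'isolate'
</syntaxhighlight>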