HFST

Helsinki Finite-State Technology (HFST) is a computer programming library and set of utilities for natural language processing with finite-state automata and finite-state transducers. It is free and open-source software, released under a mix of the GNU General Public License version 3 (GPLv3) and the Apache License.


Features

The library provides a common interface to several interchangeable backends, such as OpenFST, foma and SFST. The utilities comprise various compilers, such as hfst-twolc (a compiler for morphological two-level rules), hfst-lexc (a compiler for lexicon definitions) and hfst-regexp2fst (a regular expression compiler). Functions from Xerox's proprietary scripting language xfst are duplicated in hfst-xfst, and those of the pattern matching utility pmatch in hfst-pmatch, which goes beyond the finite-state formalism by supporting recursive transition networks (RTNs). The library and utilities are written in C++, with an interface to the library in Python and a utility for looking up results from transducers ported to Java and Python. Transducers in HFST may incorporate weights, depending on the backend. Among the backends that perform FST operations, only OpenFST currently supports weights. HFST provides two ''native'' backends, one designed for fast lookup (''hfst-optimized-lookup''), the other for format interchange. Both of them can be weighted.
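
As an illustration of what weighted lookup in a transducer involves, the following is a minimal sketch in plain Python. It does not use HFST or any of its backends: the class `WeightedFst`, its methods, the state numbering and the analysis tags are hypothetical names chosen for this example. Weights are assumed to add along a path, with lower totals ranked first.

```python
from collections import defaultdict

EPS = ""  # epsilon: an arc that consumes no input symbol


class WeightedFst:
    """Toy weighted transducer (illustrative only, not the HFST API).

    Assumes single-character input symbols and no epsilon-input cycles.
    """

    def __init__(self, start=0):
        self.start = start
        self.finals = {}               # state -> final weight
        self.arcs = defaultdict(list)  # state -> [(in, out, weight, target)]

    def add_arc(self, src, in_sym, out_sym, weight, dst):
        self.arcs[src].append((in_sym, out_sym, weight, dst))

    def set_final(self, state, weight=0.0):
        self.finals[state] = weight

    def lookup(self, text):
        """Return every (output, total weight) pair for text, cheapest first."""
        results = []
        # Each stack entry: (state, input position, output so far, weight so far)
        stack = [(self.start, 0, "", 0.0)]
        while stack:
            state, pos, out, w = stack.pop()
            if pos == len(text) and state in self.finals:
                results.append((out, w + self.finals[state]))
            for in_sym, out_sym, arc_w, dst in self.arcs[state]:
                if in_sym == EPS:
                    stack.append((dst, pos, out + out_sym, w + arc_w))
                elif pos < len(text) and text[pos] == in_sym:
                    stack.append((dst, pos + 1, out + out_sym, w + arc_w))
        return sorted(results, key=lambda r: (r[1], r[0]))


def build_demo():
    """Build a tiny analyser for 'walk' and 'walks' with made-up weights."""
    fst = WeightedFst()
    for i, ch in enumerate("walk"):
        fst.add_arc(i, ch, ch, 0.0, i + 1)   # copy the stem unchanged
    fst.add_arc(4, "s", "+V+3Sg", 0.5, 5)    # verb reading, cheaper
    fst.add_arc(4, "s", "+N+Pl", 1.0, 5)     # noun reading, costlier
    fst.add_arc(4, EPS, "+N+Sg", 0.0, 5)     # bare stem read as singular noun
    fst.set_final(5)
    return fst


fst = build_demo()
print(fst.lookup("walks"))  # [('walk+V+3Sg', 0.5), ('walk+N+Pl', 1.0)]
print(fst.lookup("walk"))   # [('walk+N+Sg', 0.0)]
```

The ambiguous input `walks` yields two analyses, and the weights decide their order. An unweighted transducer would return the same two outputs without a ranking.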


Uses

HFST has been used for writing various linguistic tools, such as spell-checkers, hyphenators, and morphologies. Morphological dictionaries written in other formalisms have also been converted to HFST's formats.


See also

* Foma (software)


External links

* https://github.com/hfst/hfst/wiki - A documentation wiki


References

* Lindén, Krister; Axelson, Erik; Drobac, Senka; Hardwick, Sam; Kuokkala, Juha; Niemi, Jyrki; Pirinen, Tommi; Silfverberg, Miikka (2013). "HFST - A System for Creating NLP Tools". In Mahlow, Cerstin; Piotrowski, Michael (eds.). Systems and Frameworks for Computational Morphology. Communications in Computer and Information Science, vol. 380. Humboldt-Universität in Berlin: Springer. pp. 53–71. https://researchportal.helsinki.fi/en/publications/hfsta-system-for-creating-nlp-tools