The Multidimensional Hierarchical Toolkit (MDH) is a Linux-based, open-source toolkit of portable software that supports very fast, flexible, multi-dimensional and hierarchical storage, retrieval and manipulation of information in databases ranging in size up to 256 terabytes. The package is written in C and C++ and is available in source code form under the GNU GPL, LGPL and Free Documentation licenses. The distribution kit contains demonstration implementations of network-capable, interactive text and sequence retrieval tools that work with very large genomic databases and illustrate the toolkit's capability to manipulate massive sets of genomic information.


Distribution

The toolkit is distributed as part of the Mumps Compiler. Versions exist for Linux, Cygwin, and Windows XP.


Origins

The toolkit is a solution to the problem of manipulating very large, sparse, multi-dimensional matrices indexed by character strings. It is based on MUMPS (also referred to as M), a general-purpose programming language that originated in the mid-1960s at the Massachusetts General Hospital.


Key features

The principal database feature of the project is the "global array", which permits direct, efficient manipulation of multi-dimensional arrays of effectively unlimited size. A global array is a persistent, sparse, undeclared, multi-dimensional, string-indexed, disk-based data structure. A global array may appear anywhere an ordinary array reference is permitted, and data may be stored at intermediate nodes as well as at leaf nodes of the database array. The number of subscripts in an array reference is limited only by the total length of the reference once all subscripts are expanded to their string values. The toolkit includes several functions to traverse the database and manipulate the arrays. It makes the database and function set available as C++ classes and also permits interpretive execution of legacy Mumps scripts. Using the toolkit requires installing the MDH and Mumps distribution kit and related code.
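The behaviour of a global array can be illustrated with a minimal in-memory sketch. It is written in Python for brevity and is not the toolkit's API: the class name `GlobalArraySketch` and its methods `set`, `get` and `order` are hypothetical, the data lives in memory rather than on disk, and subscripts are compared as plain strings, whereas Mumps collation treats numeric subscripts specially.

```python
import bisect


class GlobalArraySketch:
    """In-memory stand-in for a sparse, string-indexed, multi-dimensional array.

    Hypothetical illustration only: the real toolkit keeps global arrays on
    disk and exposes them through its own C++ classes.
    """

    def __init__(self):
        self._values = {}    # tuple of string subscripts -> stored value
        self._children = {}  # parent tuple -> sorted list of child subscripts

    def set(self, subscripts, value):
        key = tuple(str(s) for s in subscripts)
        self._values[key] = value
        # Register every prefix so intermediate nodes can be traversed
        # even when they hold no value of their own.
        for depth in range(len(key)):
            siblings = self._children.setdefault(key[:depth], [])
            pos = bisect.bisect_left(siblings, key[depth])
            if pos == len(siblings) or siblings[pos] != key[depth]:
                siblings.insert(pos, key[depth])

    def get(self, subscripts, default=""):
        return self._values.get(tuple(str(s) for s in subscripts), default)

    def order(self, parent, current=""):
        """Return the subscript following `current` under `parent`, or "" at the end.

        Assumption: plain string ordering, not full Mumps collation.
        """
        siblings = self._children.get(tuple(str(s) for s in parent), [])
        pos = bisect.bisect_right(siblings, str(current))
        return siblings[pos] if pos < len(siblings) else ""


g = GlobalArraySketch()
g.set(["doc"], "root note")               # value at an intermediate node
g.set(["doc", "b", "title"], "second")    # values at leaf nodes
g.set(["doc", "a", "title"], "first")

# Walk the second-level subscripts in sorted order, in the style of $order().
visited = []
key = g.order(["doc"])
while key != "":
    visited.append(key)
    key = g.order(["doc"], key)
```

Nothing is declared in advance and only the nodes that were set occupy storage, which is the sense in which the structure is sparse and undeclared. The node `("doc", "a")` exists for traversal but holds no value, while `("doc",)` holds a value and also has descendants.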


Functions implemented

The toolkit implements the legacy Mumps functions $ascii(), $extract(), $find(), $length(), $name(), $justify(), $order(), $piece(), and $test, as well as vector and matrix operations, Boyer–Moore–Gosper string search algorithm functions, a Smith–Waterman algorithm function, relational algebra operations and access to the Perl Compatible Regular Expressions (PCRE) library.
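For readers unfamiliar with Mumps string functions, the following Python sketch approximates the conventional behaviour of three of them: $piece(), $extract() and $find(). The helper names `piece`, `extract` and `find` are hypothetical and do not reproduce the toolkit's C++ interface; they only mirror the 1-based, inclusive conventions those functions follow in standard Mumps.

```python
def piece(text, delimiter, n=1):
    """Return the n-th delimiter-separated field of text (1-based), or ""."""
    if delimiter == "":
        return ""
    parts = text.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


def extract(text, start=1, end=None):
    """Return characters start..end of text, counted from 1, both ends inclusive."""
    if end is None:
        end = start
    if end < 1 or start > end:
        return ""
    return text[max(start, 1) - 1:end]


def find(text, target, start=1):
    """Return the 1-based position just after the first match, or 0 if none."""
    index = text.find(target, max(start, 1) - 1)
    return 0 if index < 0 else index + len(target) + 1


record = "alpha^beta^gamma"
second_field = piece(record, "^", 2)     # "beta"
first_five = extract(record, 1, 5)       # "alpha"
after_beta = find(record, "beta")        # 11
```

Fields packed into a single delimited string and pulled apart with $piece() are a common Mumps idiom for storing several values at one array node.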