Owl Scientific Computing

Owl Scientific Computing is a software system for scientific and engineering computing developed in the Department of Computer Science and Technology, University of Cambridge. The Systems Research Group (SRG) in the department recognises Owl as one of the representative systems developed in SRG in the 2010s. The source code is licensed under the MIT License and can be accessed from the GitHub repository. The library is mostly designed and developed in the functional programming language OCaml. As a functional programming language, OCaml offers runtime efficiency, a flexible module system, static type checking, a garbage collector, and powerful type inference. Owl inherits these features directly from OCaml. With Owl, users can write succinct, type-safe numerical applications in a concise functional language without sacrificing performance. This speeds up the development life cycle and reduces the cost of moving from prototype to production use. The system serves as the de facto tool for computation-intensive tasks in OCaml.


History

Owl was developed when Dr. Liang Wang was working as a postdoctoral researcher in OCaml Labs. Owl originated from a research project which studied the design of synchronous parallel machines for large-scale distributed computing in July 2016. At that time, the libraries for numerical computing in the OCaml ecosystem were very limited and the tooling was fragmented. To test various analytical applications, many numerical functions had to be implemented, from low-level algebra and random number generators to high-level functionality such as algorithmic differentiation and deep neural networks. As these code snippets accumulated, the functions were extracted and wrapped into a standalone library named Owl. Owl's architecture went through at least a dozen iterations in the beginning, and some of the architectural changes were quite drastic. After one year of intensive development, Owl was capable of many complicated numerical tasks (e.g. image classification). Dr. Liang Wang held a tutorial at CUFP 2017 to demonstrate data science in OCaml. In 2018, Prof. Richard Mortier gave a talk about Owl at the Alan Turing Institute. To further promote OCaml and functional programming in data science, Owl provides abundant learning materials in the form of a detailed manual.


Design and features

Owl implements many advanced numerical functions on top of its implementation of n-dimensional arrays. Compared to other numerical libraries, Owl is unique in several respects; for example, algorithmic differentiation and distributed computing are included as integral components of the core system to maximise developers' productivity. The figure below gives a bird's-eye view of Owl's system architecture.

The subsystem on the left is Owl's numerical subsystem. The modules contained in this subsystem fall into three categories. The first category is the core modules, which contain basic data structures, i.e., the N-dimensional array (Ndarray) in both dense and sparse forms. The Ndarray module supports various number types: float32, float64, complex32, complex64, int16, int32, etc. The core modules also provide foreign function interfaces to other low-level numerical libraries, such as CBLAS and LAPACK. These libraries are fully interfaced to the Linear Algebra module.

The second category is the classic analytics modules. This part contains basic mathematical and statistical functions, linear algebra, regression, optimisation, plotting, etc. Advanced mathematical and statistical functions such as statistical hypothesis testing and Markov chain Monte Carlo are also included. As a core functionality, Owl provides the algorithmic differentiation (or automatic differentiation) and dynamic computation graph modules.

The highest level in the Owl architecture includes modules for more advanced numerical applications such as neural networks, natural language processing, data processing, etc. The Zoo system is used for efficient scripting and code sharing. The modules in the second category, especially algorithmic differentiation, make the code at this level quite concise.

The subsystem on the right is called the Actor subsystem, which extends Owl's capability to parallel and distributed computing. The core idea is to transform a user application from sequential execution mode into parallel mode (using various computation engines) with minimal effort. The method is to compose the two subsystems together with functors to generate the parallel version of the module defined in the numerical subsystem.

Besides what is shown in this figure, Owl has several other features, for example the JavaScript and unikernel backends, integration with other frameworks such as TensorFlow and PyTorch, and the use of GPUs and other accelerator frameworks via a symbolic graph.
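The functor-based composition described in this section can be illustrated with a small, self-contained OCaml sketch. It does not use Owl or Actor: the signatures NUMERIC and ENGINE, the functor Make_sum, and the Reordered engine are all hypothetical names chosen for illustration, and the "parallel" engine is only a stand-in that processes chunks in a different order rather than on real workers. The point is that the numerical code is written once and the execution engine is swapped by functor application.

```ocaml
(* Hypothetical sketch of functor composition; not Owl's or Actor's API. *)

(* What a numerical module must provide. *)
module type NUMERIC = sig
  type t
  val zero : t
  val add : t -> t -> t
  val of_float : float -> t
  val to_float : t -> float
end

(* What a computation engine must provide: a way to map over work items. *)
module type ENGINE = sig
  val map : ('a -> 'b) -> 'a list -> 'b list
end

module Float_numeric : NUMERIC = struct
  type t = float
  let zero = 0.0
  let add = ( +. )
  let of_float x = x
  let to_float x = x
end

(* Sequential engine: processes work items in order. *)
module Sequential : ENGINE = struct
  let map = List.map
end

(* Stand-in for a parallel engine: processes work items in reverse order
   and restores the original order of the results. A real engine would
   dispatch the items to workers instead. *)
module Reordered : ENGINE = struct
  let map f xs = List.rev (List.map f (List.rev xs))
end

(* The functor composes a numerical module with an engine. *)
module Make_sum (N : NUMERIC) (E : ENGINE) = struct
  (* Split a list into consecutive chunks of at most n elements. *)
  let chunks n xs =
    let n = max 1 n in
    let rec go acc cur k = function
      | [] -> List.rev (if cur = [] then acc else List.rev cur :: acc)
      | x :: rest ->
        if k = n then go (List.rev cur :: acc) [ x ] 1 rest
        else go acc (x :: cur) (k + 1) rest
    in
    go [] [] 0 xs

  (* Sum of squares: per-chunk partial sums run on the engine,
     then the partial sums are combined. *)
  let sum_squares ?(chunk_size = 4) xs =
    let partial chunk =
      List.fold_left (fun acc x -> N.add acc (N.of_float (x *. x))) N.zero chunk
    in
    chunks chunk_size xs
    |> E.map partial
    |> List.fold_left N.add N.zero
    |> N.to_float
end

module Seq_sum = Make_sum (Float_numeric) (Sequential)
module Par_sum = Make_sum (Float_numeric) (Reordered)

let () =
  let xs = List.init 10 (fun i -> float_of_int (i + 1)) in
  Printf.printf "sequential: %g\n" (Seq_sum.sum_squares xs);
  Printf.printf "reordered:  %g\n" (Par_sum.sum_squares xs)
```

Both instantiations share the same sum_squares definition; only the engine argument differs, which is the property the composition approach relies on.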


Research

The Owl project is research oriented and supports research on numerical computing in multiple related topics. Some of its research topics are listed below.

* Synchronous parallel distributed machine learning design. Owl is the first to propose using sampling to synchronise nodes in iterative algorithms. The work, published on arXiv, comes with a solid mathematical proof. This idea proved to be ahead of its time and was later proposed in top machine learning conferences.
* One of the factors that contribute to Owl's small code base is that it builds advanced analytical functions around algorithmic differentiation. This idea also proved to be popular and developed into the paradigm of differentiable programming. It is now used in popular numerical packages such as JuliaDiff.
* Using the computation graph offers another dimension of optimisation for computation in Owl. The computation graph also bridges Owl applications and hardware accelerators such as GPUs and TPUs. The computation graph later became a de facto intermediate representation. Standards such as the Open Neural Network Exchange and the Neural Network Exchange Format are now widely supported by various deep learning frameworks such as TensorFlow and PyTorch.
* The idea of service-level composition and serving was investigated in the Zoo subsystem of Owl. The prototype demonstrates the streamlining of various stages of code development, including composition, testing, distribution, validation, and deployment. It is very similar to the later MLOps concepts. This topic has recently attracted attention in top systems conferences such as OSDI.

As a result of research following some of these directions, Owl has produced several publications. In 2018, a paper titled Data Analytics Service Composition and Deployment on Edge Devices was accepted at the ACM SIGCOMM 2018 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks. Two talks were also accepted at the OCaml Workshop of the International Conference on Functional Programming 2019, on the topics of numerical ordinary differential equation solving and executing Owl computation on GPUs. An internship in OCaml Labs investigated image segmentation and related memory optimisation in Owl. In 2022, the book <> was published by Springer. In 2023, the book <> was published by Apress.
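The research topic of building analytical functions around algorithmic differentiation can be made concrete with a minimal sketch in plain OCaml. It uses forward-mode differentiation with dual numbers, which is one standard technique and is not claimed to be how Owl implements it; the Dual module, diff, and descend are hypothetical names for illustration only. Once a derivative operator exists, an optimisation routine such as gradient descent needs only a few lines.

```ocaml
(* Forward-mode algorithmic differentiation with dual numbers.
   Illustrative only: these names are hypothetical, not Owl's API. *)
module Dual = struct
  (* A dual number pairs a value with its derivative w.r.t. the input. *)
  type t = { v : float; d : float }

  let const c = { v = c; d = 0.0 } (* constants have zero derivative *)
  let var x = { v = x; d = 1.0 }   (* the input has derivative one *)

  (* Arithmetic lifted to dual numbers using the usual calculus rules. *)
  let ( + ) a b = { v = a.v +. b.v; d = a.d +. b.d }
  let ( - ) a b = { v = a.v -. b.v; d = a.d -. b.d }
  let ( * ) a b = { v = a.v *. b.v; d = (a.d *. b.v) +. (a.v *. b.d) }
  let sin a = { v = Stdlib.sin a.v; d = a.d *. Stdlib.cos a.v }

  (* Derivative of f at x: seed the input and read off the tangent. *)
  let diff f x = (f (var x)).d
end

(* f(x) = x sin x + 2x *)
let f x = Dual.((sin x * x) + (const 2.0 * x))

(* g(x) = (x - 3)^2, minimised at x = 3 *)
let g x = Dual.(let e = x - const 3.0 in e * e)

(* Gradient descent built directly on top of the derivative operator. *)
let rec descend ~rate ~steps h x =
  if steps = 0 then x
  else descend ~rate ~steps:(steps - 1) h (x -. (rate *. Dual.diff h x))

let () =
  Printf.printf "f'(1.0)  = %.6f\n" (Dual.diff f 1.0);
  Printf.printf "argmin g = %.6f\n" (descend ~rate:0.1 ~steps:100 g 0.0)
```

The functions f and g are written as ordinary expressions; no derivative is coded by hand, which is what keeps code built on such an operator small.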


See also

* Array programming
* List of numerical-analysis software


