Alluxio
   HOME

TheInfoList



OR:

Alluxio is an
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use and view the source code, design documents, or content of the product. The open source model is a decentrali ...
virtual
distributed file system A clustered file system (CFS) is a file system which is shared by being simultaneously Mount (computing), mounted on multiple Server (computing), servers. There are several approaches to computer cluster, clustering, most of which do not emplo ...
(VDFS). Initially as research project "Tachyon", Alluxio was created at the
University of California, Berkeley The University of California, Berkeley (UC Berkeley, Berkeley, Cal, or California), is a Public university, public Land-grant university, land-grant research university in Berkeley, California, United States. Founded in 1868 and named after t ...
's
AMPLab AMPLAB was a University of California, Berkeley lab focused on big data analytics located in Soda Hall. The name stands for the Algorithms, Machines and People Lab. It has been publishing papers since 2008 and was officially launched in 2011. The A ...
as
Haoyuan Li Haoyuan (H.Y.) Li is a computer scientist and entrepreneur specializing in distributed systems, big data, and cloud computing. He is best known for proposing Virtual Distributed File System (VDFS), and creating an open-source data orchestration s ...
's Ph.D. Thesis, advised by Professor
Scott Shenker Scott J. Shenker (born January 24, 1956) is an American computer scientist, and professor of computer science at the University of California, Berkeley. He is also the leader of the Extensible Internet Group at the International Computer Science ...
& Professor
Ion Stoica Ion Stoica (born ) is a Romanian–American computer scientist specializing in distributed systems, cloud computing and computer networking. He is a professor of computer science at the University of California, Berkeley and co-director of AMPL ...
. Alluxio is situated between computation and storage in the
big data Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data processing, data-processing application software, software. Data with many entries (rows) offer greater statistical power, while data with ...
analytics stack. It provides a data abstraction layer for computation frameworks, enabling applications to connect to numerous storage systems through a common interface. The software is published under the
Apache License The Apache License is a permissive free software license written by the Apache Software Foundation (ASF). It allows users to use the software for any purpose, to distribute it, to modify it, and to distribute modified versions of the software ...
. Data Driven Applications, such as Data Analytics, Machine Learning, and AI, use APIs (such as Hadoop HDFS API, S3 API, FUSE API) provided by Alluxio to interact with data from various storage systems at a fast speed. Popular frameworks running on top of Alluxio include
Apache Spark Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of Californ ...
, Presto,
TensorFlow TensorFlow is a Library (computing), software library for machine learning and artificial intelligence. It can be used across a range of tasks, but is used mainly for Types of artificial neural networks#Training, training and Statistical infer ...
,
Trino Trino () is a ''comune'' (municipality) in the Province of Vercelli in the Italian region Piedmont, located about northeast of Turin and about southwest of Vercelli, at the foot of the Montferrat hills. Trino borders the following municipalit ...
,
Apache Hive Apache Hive is a data warehouse software project. It is built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like Interface (computing), interface to query data stored in various databases and file systems that i ...
, and
PyTorch PyTorch is a machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. It is one of the mo ...
, etc. Alluxio can be deployed on-premise, in the cloud (e.g.
Microsoft Azure Microsoft Azure, or just Azure ( /ˈæʒər, ˈeɪʒər/ ''AZH-ər, AY-zhər'', UK also /ˈæzjʊər, ˈeɪzjʊər/ ''AZ-ure, AY-zure''), is the cloud computing platform developed by Microsoft. It has management, access and development of ...
, AWS,
Google Compute Engine Google Compute Engine (GCE) is the infrastructure as a service (IaaS) component of Google Cloud Platform which is built on the global infrastructure that runs Google's search engine, Gmail, YouTube and other services. Google Compute Engine enab ...
), or a hybrid cloud environment. It can run on bare-metal or in containerized environments such as
Kubernetes Kubernetes (), also known as K8s is an open-source software, open-source OS-level virtualization, container orchestration (computing), orchestration system for automating software deployment, scaling, and management. Originally designed by Googl ...
, Docker,
Apache Mesos Apache Mesos is an Open-source software, open-source project to manage computer clusters. It was developed at the University of California, Berkeley. History Mesos began as a research project in the UC Berkeley RAD Lab by then PhD students Benja ...
.


History

Alluxio was initially started by
Haoyuan Li Haoyuan (H.Y.) Li is a computer scientist and entrepreneur specializing in distributed systems, big data, and cloud computing. He is best known for proposing Virtual Distributed File System (VDFS), and creating an open-source data orchestration s ...
at UC Berkeley's
AMPLab AMPLAB was a University of California, Berkeley lab focused on big data analytics located in Soda Hall. The name stands for the Algorithms, Machines and People Lab. It has been publishing papers since 2008 and was officially launched in 2011. The A ...
in 2013, and open sourced in 2014. Alluxio had in excess of 1000 contributors in 2018,Open HUB Alluxio development activity
/ref> making it one of the most active projects in the data eco-system.


Enterprises that use Alluxio

The following is a list of notable enterprises that have used or are using Alluxio:


See also

*
Clustered file system A clustered file system (CFS) is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system (only direct attached stora ...
*
Comparison of distributed file systems In computing, a distributed file system (DFS) or network file system is any file system that allows access from multiple hosts to files shared via a computer network. This makes it possible for multiple users on multiple machines to share file ...
*
Global Namespace Global may refer to: General *Globe, a spherical model of celestial bodies *Earth, the third planet from the Sun Entertainment * ''Global'' (Paul van Dyk album), 2003 * ''Global'' (Bunji Garlin album), 2007 * ''Global'' (Humanoid album), 198 ...
*
List of file systems The following lists identify, characterize, and link to more thorough information on file systems. Many older operating systems support only their one "native" file system, which does not bear any name apart from the name of the operating system i ...


References


External links

* {{URL, https://www.alluxio.io Free and open-source software Distributed file systems Free software programmed in Java (programming language) Software using the Apache license