Azure Data Lake

Azure Data Lake is a scalable data storage and analytics service. The service is hosted in Azure, Microsoft's public cloud.


History

The Azure Data Lake service was released on November 16, 2016. It is based on COSMOS, which is used to store and process data for applications such as Azure, AdCenter, Bing, MSN, Skype and Windows Live. COSMOS features a SQL-like query engine called SCOPE, upon which U-SQL was built.


Azure Data Lake Store

Users can store structured, semi-structured or unstructured data produced from applications including social networks, relational data, sensors, videos, web apps, mobile or desktop devices. A single Azure Data Lake Store account can store trillions of files, where a single file can be greater than a petabyte in size.


Azure Data Lake Analytics

Azure Data Lake Analytics is a parallel on-demand job service. The parallel processing system is based on Microsoft Dryad. Dryad can represent arbitrary directed acyclic graphs (DAGs) of computation. Data Lake Analytics provides a distributed infrastructure that can dynamically allocate or de-allocate resources so customers pay only for the services they use. Azure Data Lake Analytics uses Apache YARN, the part of Apache Hadoop that governs resource management across clusters. Microsoft Azure Data Lake Store supports any application that uses the Hadoop Distributed File System (HDFS) interface.
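
To make the idea of a computation expressed as a directed acyclic graph more concrete, the following is a minimal single-process sketch in Python. It is not Dryad's or Data Lake Analytics' API: the function `run_dag`, the task names and the sample data are all hypothetical, and a real system would schedule independent vertices in parallel across machines rather than run them one after another.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, deps):
    """Run each task after all of its dependencies and return results by name.

    tasks: task name -> function that receives a dict of dependency results
    deps:  task name -> set of task names that must finish first
    (Both structures are hypothetical, chosen only for this illustration.)
    """
    results = {}
    graph = {name: deps.get(name, set()) for name in tasks}
    # static_order() yields every node after all of its predecessors and
    # raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in deps.get(name, set())}
        results[name] = tasks[name](inputs)
    return results


# Hypothetical three-vertex job: two independent reads feed one combine step.
tasks = {
    "read_a": lambda _: [1, 2, 3],
    "read_b": lambda _: [10, 20, 30],
    "combine": lambda inp: [a + b for a, b in zip(inp["read_a"], inp["read_b"])],
}
deps = {"combine": {"read_a", "read_b"}}

results = run_dag(tasks, deps)
print(results["combine"])  # [11, 22, 33]
```

Because `read_a` and `read_b` do not depend on each other, a distributed scheduler would be free to run them at the same time; only `combine` has to wait for both.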


U-SQL

Using Data Lake Analytics, users can develop and run parallel data transformation and processing programs in U-SQL, a query language that combines SQL with C#. U-SQL was designed as an evolution of the declarative SQL language, with native extensibility through user code written in C#. U-SQL uses C# data types and the C# expression language.
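
The combination of a declarative query shape with expressions written in a general-purpose language can be illustrated by analogy. The sketch below is Python, not U-SQL, and does not reproduce U-SQL syntax: the `select` helper, the column names and the sample rows are hypothetical. It only shows the pattern in which the query states what to filter and project, while each filter and column expression is ordinary host-language code (C# in U-SQL, Python lambdas here).

```python
from datetime import datetime

# Hypothetical in-memory rowset; in U-SQL the rows would come from an
# EXTRACT statement reading a file in the store.
search_log = [
    {"region": "en-us", "start": datetime(2012, 2, 15), "query": "weather"},
    {"region": "en-gb", "start": datetime(2012, 2, 16), "query": "train times"},
    {"region": "fr-fr", "start": datetime(2012, 2, 17), "query": "cinema"},
]


def select(rows, where, **columns):
    """Keep rows for which where(row) is true, then compute each output column.

    Each keyword argument maps an output column name to a function of the
    input row, playing the role of a host-language expression in a SELECT list.
    """
    return [
        {name: expr(row) for name, expr in columns.items()}
        for row in rows
        if where(row)
    ]


result = select(
    search_log,
    where=lambda r: r["start"] >= datetime(2012, 2, 16),
    region=lambda r: r["region"].upper(),  # host-language string method
    query=lambda r: r["query"],
)
print(result)
```

With the sample rows above, only the second and third rows pass the date filter, and their `region` values are upper-cased by the embedded expression.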


See also

* Data lake


External links


Data Lake on Microsoft Azure