
Single-instance storage (SIS) is a system's ability to take multiple copies of content and replace them with a single shared copy. It is a means of eliminating data duplication and increasing efficiency. SIS is frequently implemented in file systems, e-mail server software, data backup, and other storage-related computer software. Single-instance storage is a simple variant of data deduplication. While data deduplication may work at a segment or sub-block level, single-instance storage works at the whole-file level and eliminates redundant copies of entire files or e-mail messages.
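The difference between whole-file and sub-block granularity can be illustrated with a minimal Python sketch. The file names, the 4-byte block size, and the use of SHA-256 are assumptions chosen for illustration only; real products use much larger blocks and their own identification schemes.

```python
import hashlib

def whole_file_stored_bytes(files):
    """Bytes kept when only byte-identical files share one copy (SIS-style)."""
    unique = {hashlib.sha256(data).digest(): len(data) for data in files.values()}
    return sum(unique.values())

def block_level_stored_bytes(files, block_size=4):
    """Bytes kept when identical fixed-size blocks share one copy."""
    unique = {}
    for data in files.values():
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())

# Hypothetical sample data: one exact duplicate and one near-duplicate.
files = {
    "a.txt": b"AAAABBBBCCCC",
    "copy_of_a.txt": b"AAAABBBBCCCC",  # exact duplicate of a.txt
    "almost_a.txt": b"AAAABBBBDDDD",   # differs only in its last block
}

raw_bytes = sum(len(data) for data in files.values())
whole_file_bytes = whole_file_stored_bytes(files)
block_bytes = block_level_stored_bytes(files)

print(raw_bytes, whole_file_bytes, block_bytes)  # prints "36 24 16"
```

Whole-file matching removes only the exact duplicate, while block-level matching also shares the blocks that the near-duplicate has in common with the other files.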


Concept

In the case of an e-mail server, single-instance storage means that a single copy of a message is held within the server's database while individual mailboxes access the content through a reference pointer. It is a common misconception that the primary benefit of single-instance storage in mail servers is a reduction in disk space requirements. Its primary benefit is in fact much more efficient delivery of messages sent to large distribution lists. In a mail server scenario, disk space savings from single-instance storage are transient and drop off very quickly over time.

When used in conjunction with backup software, single-instance storage can reduce the quantity of archive media required, since it avoids storing duplicate copies of the same file. Identical files are often installed on multiple computers, for example operating system files. With single-instance storage, only one copy of a file is written to the backup media, which reduces the space required. This becomes more important when the storage is offsite and on cloud storage such as Amazon S3. In such cases, it has been reported that deduplication can help reduce the costs of storage, costs of bandwidth and backup windows by up to 10:1.

Novell GroupWise was built on single-instance storage, which accounts for its large capacity.

ISO CD/DVD image files can be optimized to use SIS to reduce the size of a CD/DVD compilation (if there are enough duplicated files) so that it fits onto smaller media.

SIS is related to system-wide duplicate-file search and multiple file instance detection tools, such as the P2P application BearShare (versions 5.n and below), but differs in that SIS reduces storage utilization automatically and creates and retains symbolic linkages, whereas BearShare allows for manual deletion of duplicates and of the associated user-level, Windows Explorer-type icon links.
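The mail-server case, in which one stored message is shared by many mailboxes through reference pointers, can be sketched as follows. This is a toy in-memory model and not the design of any particular mail server: the class name `SingleInstanceMailStore`, its methods, and the use of a SHA-256 digest as the message identifier are all assumptions made for illustration.

```python
import hashlib

class SingleInstanceMailStore:
    """Toy model: each message body is stored once; mailboxes hold references."""

    def __init__(self):
        self.messages = {}   # message id -> body, stored exactly once
        self.refcounts = {}  # message id -> number of mailbox references
        self.mailboxes = {}  # mailbox name -> list of message ids

    def deliver(self, body, recipients):
        """Store the body once, then add one reference per recipient."""
        msg_id = hashlib.sha256(body).hexdigest()
        if msg_id not in self.messages:
            self.messages[msg_id] = body
            self.refcounts[msg_id] = 0
        for name in recipients:
            self.mailboxes.setdefault(name, []).append(msg_id)
            self.refcounts[msg_id] += 1
        return msg_id

    def read(self, mailbox):
        """Resolve a mailbox's references to the shared message bodies."""
        return [self.messages[m] for m in self.mailboxes.get(mailbox, [])]

    def delete(self, mailbox, msg_id):
        """Drop one reference; free the body when no mailbox refers to it."""
        self.mailboxes[mailbox].remove(msg_id)
        self.refcounts[msg_id] -= 1
        if self.refcounts[msg_id] == 0:
            del self.messages[msg_id]
            del self.refcounts[msg_id]

store = SingleInstanceMailStore()
msg_id = store.deliver(b"Quarterly report attached.", ["alice", "bob", "carol"])
copies_after_delivery = len(store.messages)  # one body for three recipients

store.delete("alice", msg_id)
store.delete("bob", msg_id)
copies_while_referenced = len(store.messages)  # carol still references it

store.delete("carol", msg_id)
copies_after_last_delete = len(store.messages)  # body has been freed
```

Delivery to a large distribution list costs one write of the body plus one small reference per recipient, which is the delivery-efficiency benefit described above.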


Microsoft

SIS was introduced with the Remote Installation Services feature of Windows 2000 Server. A typical server might hold ten or more unique installation configurations (perhaps with different device drivers or software suites), but perhaps only 20% of the data may be unique between configurations. Microsoft states that "SIS works by searching a hard disk volume to identify duplicate files. When SIS finds identical files, it saves one copy of the file to a central repository, called the SIS Common Store, and replaces other copies with pointers to the stored versions." Files are compared solely by the hashes of their content; files with different names or dates can be consolidated so long as the data itself is identical.

Windows Server 2003 Standard Edition has SIS capabilities but is limited to OEM OS system installs. The file-based Windows Imaging Format introduced in Windows Vista also supported single-instance storage.

Single-instance storage was a feature of Microsoft Exchange Server since version 4.0 and is also present in Microsoft's Windows Home Server. In Exchange 2007 it deduplicates attachments only, and it was dropped completely in Microsoft Exchange Server 2010.

Microsoft announced Windows Storage Server 2008 (WSS2008) with Single Instance Storage on June 1, 2009, and states that this feature is not available on Windows Server 2008.

SIS has been officially deprecated since Windows Server 2012, when a new chunk-based data deduplication mechanism was introduced. This mechanism is more powerful than SIS: it allows files with similar content to be deduplicated as long as they have stretches of identical data. Since Windows Server 2019, data deduplication has been fully supported on ReFS.
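The scan-and-consolidate behaviour in the Microsoft description quoted above can be approximated with a short Python sketch. It is not Microsoft's implementation: hard links stand in for the pointers, the `common_store` directory stands in for the SIS Common Store, and the function names, SHA-256 digests, and sample file names are assumptions. The sketch also assumes that the store lies outside the scanned tree and on the same file system, because hard links cannot cross file systems.

```python
import filecmp
import hashlib
import os
import shutil
import tempfile
from collections import defaultdict

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def consolidate(root, store_dir):
    """Replace duplicate files under root with hard links into store_dir.

    Returns the number of files replaced by links.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                groups[file_digest(path)].append(path)

    os.makedirs(store_dir, exist_ok=True)
    replaced = 0
    for digest, paths in groups.items():
        if len(paths) < 2:
            continue  # unique content stays where it is
        stored = os.path.join(store_dir, digest)
        if not os.path.exists(stored):
            shutil.copy2(paths[0], stored)  # the single shared copy
        for path in paths:
            if os.path.samefile(path, stored):
                continue  # already linked by an earlier run
            # Extra byte-by-byte check guards against a digest collision.
            if filecmp.cmp(path, stored, shallow=False):
                os.remove(path)
                os.link(stored, path)
                replaced += 1
    return replaced

# Hypothetical demo: two configurations that share one driver file.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "images")
    store_dir = os.path.join(tmp, "common_store")
    os.makedirs(os.path.join(root, "config2"))
    first = os.path.join(root, "driver.sys")
    second = os.path.join(root, "config2", "driver_copy.sys")
    unique = os.path.join(root, "unique.ini")
    for path, data in [(first, b"same bytes"), (second, b"same bytes"),
                       (unique, b"other bytes")]:
        with open(path, "wb") as f:
            f.write(data)

    replaced = consolidate(root, store_dir)
    shares_storage = os.path.samefile(first, second)
    stored_count = len(os.listdir(store_dir))
    with open(unique, "rb") as f:
        unique_content = f.read()
```

Files with different names are consolidated because only content is compared. Since hard links share writes, a change made through one name is visible through all of them, so this sketch suits read-only data such as installation images.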


See also

* Capacity optimization
* Data deduplication
* Peer-to-peer file sharing
* WinFS

