Serialize
   HOME

TheInfoList



OR:

In computing, serialization (or serialisation, also referred to as pickling in Python) is the process of translating a
data structure In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
or
object Object may refer to: General meanings * Object (philosophy), a thing, being, or concept ** Object (abstract), an object which does not exist at any particular time or place ** Physical object, an identifiable collection of matter * Goal, an a ...
state into a format that can be stored (e.g. files in secondary storage devices,
data buffer In computer science, a data buffer (or just buffer) is a region of memory used to store data temporarily while it is being moved from one place to another. Typically, the data is stored in a buffer as it is retrieved from an input device (such as ...
s in primary storage devices) or transmitted (e.g.
data stream In connection-oriented communication, a data stream is the transmission of a sequence of digitally encoded signals to convey information. Typically, the transmitted symbols are grouped into a series of packets. Data streaming has become u ...
s over
computer networks A computer network is a collection of communicating computers and other devices, such as printers and smart phones. In order to communicate, the computers and devices must be connected by wired media like copper cables, optical fibers, or ...
) and reconstructed later (possibly in a different computer environment). When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of
references A reference is a relationship between Object (philosophy), objects in which one object designates, or acts as a means by which to connect to or link to, another object. The first object in this relation is said to ''refer to'' the second object. ...
, this process is not straightforward. Serialization of
object Object may refer to: General meanings * Object (philosophy), a thing, being, or concept ** Object (abstract), an object which does not exist at any particular time or place ** Physical object, an identifiable collection of matter * Goal, an a ...
s does not include any of their associated methods with which they were previously linked. This process of serializing an object is also called marshalling an object in some situations. The opposite operation, extracting a data structure from a series of bytes, is deserialization, (also called unserialization or
unmarshalling In computer science, marshalling or marshaling ( US spelling) is the process of transforming the memory representation of an object into a data format suitable for storage or transmission, especially between different runtimes. It is typically ...
). In networking equipment hardware, the part that is responsible for serialization and deserialization is commonly called SerDes.


Uses

Uses of serialization include: * serializing data for transfer across wires and networks (
messaging A message is a unit of communication that conveys information from a sender to a receiver. It can be transmitted through various forms, such as spoken or written words, signals, or electronic data, and can range from simple instructions to co ...
). * storing data (in
database In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
s, on
hard disk drive A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating hard disk drive platter, pla ...
s). *
remote procedure call In distributed computing, a remote procedure call (RPC) is when a computer program causes a procedure (subroutine) to execute in a different address space (commonly on another computer on a shared computer network), which is written as if it were a ...
s, e.g., as in
SOAP Soap is a salt (chemistry), salt of a fatty acid (sometimes other carboxylic acids) used for cleaning and lubricating products as well as other applications. In a domestic setting, soaps, specifically "toilet soaps", are surfactants usually u ...
. * distributing objects, especially in
component-based software engineering Component-based software engineering (CBSE), also called component-based development (CBD), is a style of software engineering that aims to construct a software system from software component, components that are loosely-Coupling (computer program ...
such as COM,
CORBA The Common Object Request Broker Architecture (CORBA) is a standard defined by the Object Management Group (OMG) designed to facilitate the communication of systems that are deployed on diverse platforms. CORBA enables collaboration between sy ...
, etc. * detecting changes in time-varying data. For some of these features to be useful, architecture independence must be maintained. For example, for maximal use of distribution, a computer running on a different
hardware architecture In engineering, hardware architecture refers to the identification of a system's physical components and their interrelationships. This description, often called a hardware design model, allows hardware designers to understand how their compon ...
should be able to reliably reconstruct a serialized data stream, regardless of
endianness file:Gullivers_travels.jpg, ''Gulliver's Travels'' by Jonathan Swift, the novel from which the term was coined In computing, endianness is the order in which bytes within a word (data type), word of digital data are transmitted over a data comm ...
. This means that the simpler and faster procedure of directly copying the memory layout of the data structure cannot work reliably for all architectures. Serializing the data structure in an architecture-independent format means preventing the problems of
byte ordering '' Jonathan_Swift.html" ;"title="Gulliver's Travels'' by Jonathan Swift">Gulliver's Travels'' by Jonathan Swift, the novel from which the term was coined In computing, endianness is the order in which bytes within a word (data type), word of d ...
, memory layout, or simply different ways of representing data structures in different
programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
s. Inherent to any serialization scheme is that, because the encoding of the data is by definition serial, extracting one part of the serialized data structure requires that the entire object be read from start to end, and reconstructed. In many applications, this linearity is an asset, because it enables simple, common I/O interfaces to be utilized to hold and pass on the state of an object. In applications where higher performance is an issue, it can make sense to expend more effort to deal with a more complex, non-linear storage organization. Even on a single machine, primitive pointer objects are too fragile to save because the objects to which they point may be reloaded to a different location in memory. To deal with this, the serialization process includes a step called ''
unswizzling In computer science, pointer swizzling is the conversion of references based on name or position into direct pointer references (memory addresses). It is typically performed during deserialization or loading of a relocatable object from a disk ...
'' or ''pointer unswizzling'', where direct pointer references are converted to references based on name or position. The deserialization process includes an inverse step called ''
pointer swizzling In computer science, pointer swizzling is the conversion of references based on name or position into direct pointer references (memory addresses). It is typically performed during deserialization or loading of a relocatable object from a disk ...
''. Since both serializing and deserializing can be driven from common code (for example, the ''Serialize'' function in
Microsoft Foundation Classes Microsoft Foundation Class Library (MFC) is a C++ Object-oriented programming, object-oriented Library (computer science), library for developing desktop applications for Windows. MFC was introduced by Microsoft in 1992 and quickly gained wides ...
), it is possible for the common code to do both at the same time, and thus, 1) detect differences between the objects being serialized and their prior copies, and 2) provide the input for the next such detection. It is not necessary to actually build the prior copy because differences can be detected on the fly, a technique called differential execution. This is useful in the programming of user interfaces whose contents are time-varying — graphical objects can be created, removed, altered, or made to handle input events without necessarily having to write separate code to do those things.


Drawbacks

Serialization breaks the opacity of an
abstract data type In computer science, an abstract data type (ADT) is a mathematical model for data types, defined by its behavior (semantics) from the point of view of a '' user'' of the data, specifically in terms of possible values, possible operations on data ...
by potentially exposing private implementation details. Trivial implementations which serialize all data members may violate encapsulation. To discourage competitors from making compatible products, publishers of
proprietary software Proprietary software is computer software, software that grants its creator, publisher, or other rightsholder or rightsholder partner a legal monopoly by modern copyright and intellectual property law to exclude the recipient from freely sharing t ...
often keep the details of their programs' serialization formats a
trade secret A trade secret is a form of intellectual property (IP) comprising confidential information that is not generally known or readily ascertainable, derives economic value from its secrecy, and is protected by reasonable efforts to maintain its conf ...
. Some deliberately
obfuscate Obfuscation is the wikt:obscure#Verb, obscuring of the intended meaning (linguistics), meaning of communication by making the message difficult to understand, usually with mental confusion, confusing and ambiguity, ambiguous language. The obfusca ...
or even
encrypt In cryptography, encryption (more specifically, encoding) is the process of transforming information in a way that, ideally, only authorized parties can decode. This process converts the original representation of the information, known as plai ...
the serialized data. Yet, interoperability requires that applications be able to understand each other's serialization formats. Therefore, remote method call architectures such as
CORBA The Common Object Request Broker Architecture (CORBA) is a standard defined by the Object Management Group (OMG) designed to facilitate the communication of systems that are deployed on diverse platforms. CORBA enables collaboration between sy ...
define their serialization formats in detail. Many institutions, such as archives and libraries, attempt to
future proof Future-proofing (also futureproofing) is the process of anticipating the future and developing methods of risk mitigation, minimizing the effects of shocks and stresses of future events. Future-proofing is used in industries such as infrastruct ...
their
backup In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "wikt:back ...
archives—in particular,
database dump A database dump contains a record of the table structure and/or the data from a database and is usually in the form of a list of SQL statements ("SQL dump"). A database dump is most often used for backing up a database so that its contents can ...
s—by storing them in some relatively
human-readable In computing, a human-readable medium or human-readable format is any encoding of data or information that can be naturally read by humans, resulting in human-readable data. It is often encoded as ASCII or Unicode text, rather than as binary da ...
serialized format.


Serialization formats

The
Xerox Network Systems Xerox Network Systems (XNS) is a computer networking protocol suite developed by Xerox within the Xerox Network Systems Architecture. It provided general purpose network communications, internetwork routing and packet delivery, and higher level ...
Courier technology in the early 1980s influenced the first widely adopted standard.
Sun Microsystems Sun Microsystems, Inc., often known as Sun for short, was an American technology company that existed from 1982 to 2010 which developed and sold computers, computer components, software, and information technology services. Sun contributed sig ...
published the External Data Representation (XDR) in 1987. XDR is an
open format An open file format is a file format for storing digital data, defined by an openly published specification usually maintained by a standards organization, and which can be used and implemented by anyone. An open file format is licensed with a ...
, and standardized a
STD 67
(RFC 4506) by
IETF The Internet Engineering Task Force (IETF) is a standards organization for the Internet standard, Internet and is responsible for the technical standards that make up the Internet protocol suite (TCP/IP). It has no formal membership roster ...
. In the late 1990s, a push to provide an alternative to the standard serialization protocols started:
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing data. It defines a set of rules for encoding electronic document, documents in a format that is both human-readable and Machine-r ...
, an
SGML The Standard Generalized Markup Language (SGML; International Organization for Standardization, ISO 8879:1986) is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on t ...
subset, was used to produce a human-readable text-based encoding. Such an encoding can be useful for persistent objects that may be read and understood by humans or communicated to other systems regardless of programming language. It has the disadvantage of losing the more compact, byte-stream-based encoding, but by this point larger storage and transmission capacities made file size less of a concern than in the early days of computing. In the 2000s, XML was often used for asynchronous transfer of structured data between client and server in
Ajax Ajax may refer to: Greek mythology and tragedy * Ajax the Great, a Greek mythological hero, son of King Telamon and Periboea * Ajax the Lesser, a Greek mythological hero, son of Oileus, the king of Locris * Ajax (play), ''Ajax'' (play), by the an ...
web applications. XML is an open format and standardized as a W3C recommendation.
JSON JSON (JavaScript Object Notation, pronounced or ) is an open standard file format and electronic data interchange, data interchange format that uses Human-readable medium and data, human-readable text to store and transmit data objects consi ...
is a lightweight plain-text alternative to XML and is also commonly used for client-server communication in web applications. JSON is based on
JavaScript syntax The syntax of JavaScript is the set of rules that define a correctly structured JavaScript program. The examples below make use of the log function of the console object present in most browsers for standard text output. The JavaScript sta ...
but is independent of JavaScript and supported in many other programming languages. JSON is standardized a
STD 90
()
ECMA-404
an

YAML YAML ( ) is a human-readable data serialization language. It is commonly used for configuration files and in applications where data is being stored or transmitted. YAML targets many of the same communications applications as Extensible Marku ...
is a strict superset of JSON and includes additional features such as a data type tags, support for cyclic data structures, indentation-sensitive syntax, and multiple forms of scalar data quoting. YAML is an open format.
Property list In the macOS, iOS, NeXTSTEP, and GNUstep programming frameworks, property list files are files that store serialized objects. Property list files use the filename extension .plist, and thus are often referred to as p-list files. Property l ...
s are used for serialization by
NeXTSTEP NeXTSTEP is a discontinued object-oriented, multitasking operating system based on the Mach kernel and the UNIX-derived BSD. It was developed by NeXT, founded by Steve Jobs, in the late 1980s and early 1990s and was initially used for its ...
,
GNUstep GNUstep is a free software implementation of the Cocoa (formerly OpenStep) Objective-C frameworks, widget toolkit, and application development tools for Unix-like operating systems and Microsoft Windows. It is part of the GNU Project. GNUst ...
,
macOS macOS, previously OS X and originally Mac OS X, is a Unix, Unix-based operating system developed and marketed by Apple Inc., Apple since 2001. It is the current operating system for Apple's Mac (computer), Mac computers. With ...
, and
iOS Ios, Io or Nio (, ; ; locally Nios, Νιός) is a Greek island in the Cyclades group in the Aegean Sea. Ios is a hilly island with cliffs down to the sea on most sides. It is situated halfway between Naxos and Santorini. It is about long an ...
frameworks. ''Property list'', or ''p-list'' for short, doesn't refer to a single serialization format but instead several different variants, some human-readable and one binary. For large volume scientific datasets, such as satellite data and output of numerical climate, weather, or ocean models, specific binary serialization standards have been developed, e.g. HDF,
netCDF NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. The project homepage is hosted by the Unidat ...
and the older
GRIB GRIB (GRIdded Binary or General Regularly-distributed Information in Binary form) is a concise data format commonly used in meteorology to store historical and forecast weather data. It is standardized by the World Meteorological Organization's ...
.


Programming language support

Several
object-oriented programming Object-oriented programming (OOP) is a programming paradigm based on the concept of '' objects''. Objects can contain data (called fields, attributes or properties) and have actions they can perform (called procedures or methods and impl ...
languages directly support ''object serialization'' (or ''object archival''), either by
syntactic sugar In computer science, syntactic sugar is syntax within a programming language that is designed to make things easier to read or to express. It makes the language "sweeter" for human use: things can be expressed more clearly, more concisely, or in an ...
elements or providing a standard interface for doing so. The languages which do so include
Ruby Ruby is a pinkish-red-to-blood-red-colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapph ...
,
Smalltalk Smalltalk is a purely object oriented programming language (OOP) that was originally created in the 1970s for educational use, specifically for constructionist learning, but later found use in business. It was created at Xerox PARC by Learni ...
, Python,
PHP PHP is a general-purpose scripting language geared towards web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by the PHP Group. ...
,
Objective-C Objective-C is a high-level general-purpose, object-oriented programming language that adds Smalltalk-style message passing (messaging) to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was ...
,
Delphi Delphi (; ), in legend previously called Pytho (Πυθώ), was an ancient sacred precinct and the seat of Pythia, the major oracle who was consulted about important decisions throughout the ancient Classical antiquity, classical world. The A ...
,
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
, and the
.NET The .NET platform (pronounced as "''dot net"'') is a free and open-source, managed code, managed computer software framework for Microsoft Windows, Windows, Linux, and macOS operating systems. The project is mainly developed by Microsoft emplo ...
family of languages. There are also libraries available that add serialization support to languages that lack native support for it.


C and C++

C and C++ do not provide serialization as any sort of high-level construct, but both languages support writing any of the built-in
data types In computer science and computer programming, a data type (or simply type) is a collection or grouping of data values, usually specified by a set of possible values, a set of allowed operations on these values, and/or a representation of these ...
, as well as plain old data structs, as binary data. As such, it is usually trivial to write custom serialization functions. Moreover, compiler-based solutions, such as the ODB ORM system for C++ and the
gSOAP gSOAP is a C and C++ software development toolkit for SOAP/XML web services and generic XML data bindings. Given a set of C/C++ type declarations, the compiler-based gSOAP tools generate serialization routines in source code for efficient XML se ...
toolkit for C and C++, are capable of automatically producing serialization code with few or no modifications to class declarations. Other popular serialization frameworks are Boost.Serialization from the Boost Framework, the S11n framework, and Cereal. MFC framework (Microsoft) also provides serialization methodology as part of its Document-View architecture.


CFML

CFML ColdFusion Markup Language, more commonly known as CFML, is a scripting language for web development that runs on the Java virtual machine (JVM), the .NET framework, and Google App Engine. Several commercial and free and open-source software imp ...
allows data structures to be serialized to WDDX with the
/code> tag and to
JSON JSON (JavaScript Object Notation, pronounced or ) is an open standard file format and electronic data interchange, data interchange format that uses Human-readable medium and data, human-readable text to store and transmit data objects consi ...
with th
SerializeJSON()
function.


Delphi

Delphi Delphi (; ), in legend previously called Pytho (Πυθώ), was an ancient sacred precinct and the seat of Pythia, the major oracle who was consulted about important decisions throughout the ancient Classical antiquity, classical world. The A ...
provides a built-in mechanism for serialization of components (also called persistent objects), which is fully integrated with its IDE. The component's contents are saved to a DFM file and reloaded on-the-fly.


Go

Go natively supports unmarshalling/marshalling of
JSON JSON (JavaScript Object Notation, pronounced or ) is an open standard file format and electronic data interchange, data interchange format that uses Human-readable medium and data, human-readable text to store and transmit data objects consi ...
and
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing data. It defines a set of rules for encoding electronic document, documents in a format that is both human-readable and Machine-r ...
data. There are also third-party modules that support
YAML YAML ( ) is a human-readable data serialization language. It is commonly used for configuration files and in applications where data is being stored or transmitted. YAML targets many of the same communications applications as Extensible Marku ...
and
Protocol Buffers Protocol Buffers (Protobuf) is a free and open-source cross-platform data format used to serialize structured data. It is useful in developing programs that communicate with each other over a network or for storing data. The method involves an ...
. Go also supports ''Gobs''.


Haskell

In Haskell, serialization is supported for types that are members of the Read and Show
type class In computer science, a type class is a type system construct that supports ad hoc polymorphism. This is achieved by adding constraints to type variables in parametrically polymorphic types. Such a constraint typically involves a type class T a ...
es. Every type that is a member of the Read type class defines a function that will extract the data from the string representation of the dumped data. The Show type class, in turn, contains the show function from which a string representation of the object can be generated. The programmer need not define the functions explicitly—merely declaring a type to be deriving Read or deriving Show, or both, can make the compiler generate the appropriate functions for many cases (but not all: function types, for example, cannot automatically derive Show or Read). The auto-generated instance for Show also produces valid source code, so the same Haskell value can be generated by running the code produced by show in, for example, a Haskell interpreter. For more efficient serialization, there are haskell libraries that allow high-speed serialization in binary format, e.g
binary


Java

Java provides automatic serialization which requires that the object be marked by implementing the interface. Implementing the interface marks the class as "okay to serialize", and Java then handles serialization internally. There are no serialization methods defined on the Serializable interface, but a serializable class can optionally define methods with certain special names and signatures that if defined, will be called as part of the serialization/deserialization process. The language also allows the developer to override the serialization process more thoroughly by implementing another interface, the interface, which includes two special methods that are used to save and restore the object's state.
There are three primary reasons why objects are not serializable by default and must implement the Serializable interface to access Java's serialization mechanism.
Firstly, not all objects capture useful semantics in a serialized state. For example, a object is tied to the state of the current
JVM A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally descri ...
. There is no context in which a deserialized Thread object would maintain useful semantics.
Secondly, the serialized state of an object forms part of its class' compatibility contract. Maintaining compatibility between versions of serializable classes requires additional effort and consideration. Therefore, making a class serializable needs to be a deliberate design decision and not a default condition.
Lastly, serialization allows access to non- transient private members of a class that are not otherwise accessible. Classes containing sensitive information (for example, a password) should not be serializable nor externalizable. The standard encoding method uses a recursive graph-based translation of the object's class descriptor and serializable fields into a byte stream. Primitives as well as non-transient, non-static referenced objects are encoded into the stream. Each object that is referenced by the serialized object via a field that is not marked as transient must also be serialized; and if any object in the complete graph of non-transient object references is not serializable, then serialization will fail. The developer can influence this behavior by marking objects as transient, or by redefining the serialization for an object so that some portion of the reference graph is truncated and not serialized.
Java does not use constructor to serialize objects. It is possible to serialize Java objects through
JDBC Java Database Connectivity (JDBC) is an application programming interface (API) for the Java (programming language), Java programming language which defines how a client may access a database. It is a Java-based data access technology used for Java ...
and store them into a database. While Swing components do implement the Serializable interface, they are not guaranteed to be portable between different versions of the Java Virtual Machine. As such, a Swing component, or any component which inherits it, may be serialized to a byte stream, but it is not guaranteed that this will be re-constitutable on another machine.


JavaScript

Since ECMAScript 5.1,
JavaScript JavaScript (), often abbreviated as JS, is a programming language and core technology of the World Wide Web, alongside HTML and CSS. Ninety-nine percent of websites use JavaScript on the client side for webpage behavior. Web browsers have ...
has included the built-in JSON object and its methods JSON.parse() and JSON.stringify(). Although JSON is originally based on a subset of JavaScript, there are boundary cases where JSON is not valid JavaScript. Specifically, JSON allows the Unicode line terminators and to appear unescaped in quoted strings, while ECMAScript 2018 and older does not. See the main article on JSON.


Julia

Julia implements serialization through the serialize() / deserialize() modules, intended to work within the same version of Julia, and/or instance of the same system image. The HDF5.jl package offers a more stable alternative, using a documented format and common library with wrappers for different languages, while the default serialization format is suggested to have been designed rather with maximal performance for network communication in mind.


Lisp

Generally a
Lisp Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation. Originally specified in the late 1950s, ...
data structure can be serialized with the functions "read" and "print". A variable foo containing, for example, a list of arrays would be printed by (print foo). Similarly an object can be read from a stream named s by (read s). These two parts of the Lisp implementation are called the Printer and the Reader. The output of "print" is human readable; it uses lists demarked by parentheses, for example: . In many types of Lisp, including
Common Lisp Common Lisp (CL) is a dialect of the Lisp programming language, published in American National Standards Institute (ANSI) standard document ''ANSI INCITS 226-1994 (S2018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperli ...
, the printer cannot represent every type of data because it is not clear how to do so. In Common Lisp for example the printer cannot print CLOS objects. Instead the programmer may write a method on the generic function print-object, this will be invoked when the object is printed. This is somewhat similar to the method used in Ruby. Lisp code itself is written in the syntax of the reader, called read syntax. Most languages use separate and different parsers to deal with code and data, Lisp only uses one. A file containing lisp code may be read into memory as a data structure, transformed by another program, then possibly executed or written out, such as in a
read–eval–print loop A read–eval–print loop (REPL), also termed an interactive toplevel or language shell, is a simple interactive computer programming environment that takes single user inputs, executes them, and returns the result to the user; a program written ...
. Not all readers/writers support cyclic, recursive or shared structures.


.NET

.NET The .NET platform (pronounced as "''dot net"'') is a free and open-source, managed code, managed computer software framework for Microsoft Windows, Windows, Linux, and macOS operating systems. The project is mainly developed by Microsoft emplo ...
has several serializers designed by
Microsoft Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The ear ...
. There are also many serializers by third parties. More than a dozen serializers are discussed and teste
here
an
here


OCaml

OCaml OCaml ( , formerly Objective Caml) is a General-purpose programming language, general-purpose, High-level programming language, high-level, Comparison of multi-paradigm programming languages, multi-paradigm programming language which extends the ...
's standard library provides marshalling through the Marshal module and the Pervasives functions output_value and input_value. While OCaml programming is statically type-checked, uses of the Marshal module may break type guarantees, as there is no way to check whether an unmarshalled stream represents objects of the expected type. In OCaml it is difficult to marshal a function or a data structure which contains a function (e.g. an object which contains a method), because executable code in functions cannot be transmitted across different programs. (There is a flag to marshal the code position of a function but it can only be unmarshalled in exactly the same program). The standard marshalling functions can preserve sharing and handle cyclic data, which can be configured by a flag.


Perl

Several
Perl Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". Perl was developed ...
modules available from
CPAN The Comprehensive Perl Archive Network (CPAN) is a software repository of over 220,000 software modules and accompanying documentation for 45,500 distributions, written in the Perl programming language by over 14,500 contributors. ''CPAN'' can de ...
provide serialization mechanisms, including Storable , JSON::XS and FreezeThaw. Storable includes functions to serialize and deserialize Perl data structures to and from files or Perl scalars. In addition to serializing directly to files, Storable includes the freeze function to return a serialized copy of the data packed into a scalar, and thaw to deserialize such a scalar. This is useful for sending a complex data structure over a
network socket A network socket is a software structure within a network node of a computer network that serves as an endpoint for sending and receiving data across the network. The structure and properties of a socket are defined by an application programming ...
or storing it in a database. When serializing structures with Storable, there are network safe functions that always store their data in a format that is readable on any computer at a small cost of speed. These functions are named nstore, nfreeze, etc. There are no "n" functions for deserializing these structures — the regular thaw and retrieve deserialize structures serialized with the "n" functions and their machine-specific equivalents.


PHP

PHP PHP is a general-purpose scripting language geared towards web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by the PHP Group. ...
originally implemented serialization through the built-in serialize() and unserialize() functions. PHP can serialize any of its data types except resources (file pointers, sockets, etc.). The built-in unserialize() function is often dangerous when used on completely untrusted data. For objects, there are two " magic methods" that can be implemented within a class — __sleep() and __wakeup() — that are called from within serialize() and unserialize(), respectively, that can clean up and restore an object. For example, it may be desirable to close a database connection on serialization and restore the connection on deserialization; this functionality would be handled in these two magic methods. They also permit the object to pick which properties are serialized. Since PHP 5.1, there is an object-oriented serialization mechanism for objects, the Serializable interface.


Prolog

Prolog Prolog is a logic programming language that has its origins in artificial intelligence, automated theorem proving, and computational linguistics. Prolog has its roots in first-order logic, a formal logic. Unlike many other programming language ...
's ''term'' structure, which is the only data structure of the language, can be serialized out through the built-in predicate write_term/3 and serialized-in through the built-in predicates read/1 and read_term/2. The resulting stream is uncompressed text (in some encoding determined by configuration of the target stream), with any free variables in the term represented by placeholder variable names. The predicate write_term/3 is standardized in the ISO Specification for Prolog (ISO/IEC 13211-1) on pages 59 ff. ("Writing a term, § 7.10.5"). Therefore it is expected that terms serialized-out by one implementation can be serialized-in by another without ambiguity or surprises. In practice, implementation-specific extensions (e.g. SWI-Prolog's dictionaries) may use non-standard term structures, so interoperability may break in edge cases. As examples, see the corresponding manual pages for SWI-Prolog, SICStus Prolog, GNU Prolog. Whether and how serialized terms received over the network are checked against a specification (after deserialization from the character stream has happened) is left to the implementer. Prolog's built-in Definite Clause Grammars can be applied at that stage.


Python

The core general serialization mechanism is the pickle
standard library In computer programming, a standard library is the library (computing), library made available across Programming language implementation, implementations of a programming language. Often, a standard library is specified by its associated program ...
module, alluding to the database systems term ''pickling'' to describe data serialization (''unpickling'' for ''deserializing''). Pickle uses a simple
stack Stack may refer to: Places * Stack Island, an island game reserve in Bass Strait, south-eastern Australia, in Tasmania’s Hunter Island Group * Blue Stack Mountains, in Co. Donegal, Ireland People * Stack (surname) (including a list of people ...
-based
virtual machine In computing, a virtual machine (VM) is the virtualization or emulator, emulation of a computer system. Virtual machines are based on computer architectures and provide the functionality of a physical computer. Their implementations may involve ...
that records the instructions used to reconstruct the object. It is a cross-versio
customisable
but unsafe (not secure against erroneous or malicious data) serialization format. Malformed or maliciously constructed data, may cause the deserializer to import arbitrary modules and instantiate any object. The standard library also includes modules serializing to standard data formats:
/code> (with built-in support for basic scalar and collection types and able to support arbitrary types vi

.
/code> (with support for both binary and XML
property list In the macOS, iOS, NeXTSTEP, and GNUstep programming frameworks, property list files are files that store serialized objects. Property list files use the filename extension .plist, and thus are often referred to as p-list files. Property l ...
formats). xdrlib
/code> (with support for the External Data Representation (XDR) standard as described in RFC 1014). Finally, it is recommended that an object's
/code> be evaluable in the right environment, making it a rough match for Common Lisp's
/code>. Not all object types can be pickled automatically, especially ones that hold
operating system An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ...
resources like
file handle In Unix and Unix-like computer operating systems, a file descriptor (FD, less frequently fildes) is a process-unique identifier (handle) for a file or other input/output resource, such as a pipe or network socket. File descriptors typically ha ...
s, but users can register custom "reduction" and construction functions to support the pickling and unpickling of arbitrary types. Pickle was originally implemented as the pure Python pickle module, but, in versions of Python prior to 3.0, the cPickle module (also a built-in) offers improved performance (up to 1000 times faster). The cPickle was adapted from the
Unladen Swallow Unladen Swallow was an optimization branch of CPython, the reference implementation of the Python programming language, which incorporated a just-in-time compiler built using LLVM into CPython's virtual machine. Like many things regarding Python ...
project. In Python 3, users should always import the standard version, which attempts to import the accelerated version and falls back to the pure Python version.


R

R has the function dput which writes an ASCII text representation of an R object to a file or connection. A representation can be read from a file using dget. More specific, the function serialize serializes an R object to a connection, the output being a raw vector coded in hexadecimal format. The unserialize function allows to read an object from a connection or a raw vector.


REBOL

REBOL will serialize to file (save/all) or to a string! (mold/all). Strings and files can be deserialized using the polymorphic load function. RProtoBuf provides cross-language data serialization in R, using
Protocol Buffers Protocol Buffers (Protobuf) is a free and open-source cross-platform data format used to serialize structured data. It is useful in developing programs that communicate with each other over a network or for storing data. The method involves an ...
.


Ruby

Ruby Ruby is a pinkish-red-to-blood-red-colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapph ...
includes the standard module Marshal
/code> with 2 methods dump and load, akin to the standard Unix utilities
dump Deoxyuridine monophosphate (dUMP), also known as deoxyuridylic acid or deoxyuridylate in its conjugate acid and conjugate base forms, respectively, is a deoxynucleotide. It is an intermediate in the metabolism of deoxyribonucleotides. Biosynthes ...
and restore. These methods serialize to the standard class String, that is, they effectively become a sequence of bytes. Some objects cannot be serialized (doing so would raise a TypeError exception): bindings, procedure objects, instances of class IO, singleton objects and interfaces. If a class requires custom serialization (for example, it requires certain cleanup actions done on dumping / restoring), it can be done by implementing 2 methods: _dump and _load. The
instance method A method in object-oriented programming (OOP) is a Procedure (computer science), procedure associated with an Object (computer science), object, and generally also a Message passing, message. An object consists of ''state data'' and ''behavior''; ...
_dump should return a String object containing all the information necessary to reconstitute objects of this class and all referenced objects up to a maximum depth given as an integer parameter (a value of -1 implies that depth checking should be disabled). The
class method A method in object-oriented programming (OOP) is a procedure associated with an object, and generally also a message. An object consists of ''state data'' and ''behavior''; these compose an ''interface'', which specifies how the object may be us ...
_load should take a String and return an object of this class.


Rust

Serde
/code> is the most widely used library, or crate, for serialization in
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH) ...
which provides the and traits.


Smalltalk

In general, non-recursive and non-sharing objects can be stored and retrieved in a human readable form using the storeOn:/readFrom: protocol. The storeOn: method generates the text of a Smalltalk expression which – when evaluated using readFrom: – recreates the original object. This scheme is special, in that it uses a procedural description of the object, not the data itself. It is therefore very flexible, allowing for classes to define more compact representations. However, in its original form, it does not handle cyclic data structures or preserve the identity of shared references (i.e. two references a single object will be restored as references to two equal, but not identical copies). For this, various portable and non-portable alternatives exist. Some of them are specific to a particular Smalltalk implementation or class library. There are several ways in Squeak Smalltalk to serialize and store objects. The easiest and most used are storeOn:/readFrom: and binary storage formats based on SmartRefStream serializers. In addition, bundled objects can be stored and retrieved using ImageSegments. Both provide a so-called "binary-object storage framework", which support serialization into and retrieval from a compact binary form. Both handle cyclic, recursive and shared structures, storage/retrieval of class and
metaclass In object-oriented programming, a metaclass is a Class (computer science), class whose Instance (computer programming), instances are classes themselves. Unlike ordinary classes, which define the behaviors of objects, metaclasses specify the beha ...
info and include mechanisms for "on the fly" object migration (i.e. to convert instances which were written by an older version of a class with a different object layout). The APIs are similar (storeBinary/readBinary), but the encoding details are different, making these two formats incompatible. However, the Smalltalk/X code is open source and free and can be loaded into other Smalltalks to allow for cross-dialect object interchange. Object serialization is not part of the ANSI Smalltalk specification. As a result, the code to serialize an object varies by Smalltalk implementation. The resulting binary data also varies. For instance, a serialized object created in Squeak Smalltalk cannot be restored in Ambrai Smalltalk. Consequently, various applications that do work on multiple Smalltalk implementations that rely on object serialization cannot share data between these different implementations. These applications include the MinneStore object database and some
RPC RPC may refer to: Science and technology * Rational polynomial coefficient * Reactive Plastic Curtain, a carbon-dioxide-absorbing device used in some rebreather breathing sets * Regional Playback Control, a regional lockout technology for DVDs ...
packages. A solution to this problem is SIXX, which is a package for multiple Smalltalks that uses an
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing data. It defines a set of rules for encoding electronic document, documents in a format that is both human-readable and Machine-r ...
-based format for serialization.


Swift

The
Swift Swift or SWIFT most commonly refers to: * SWIFT, an international organization facilitating transactions between banks ** SWIFT code * Swift (programming language) * Swift (bird), a family of birds It may also refer to: Organizations * SWIF ...
standard library provides two protocols, Encodable and Decodable (composed together as Codable), which allow instances of conforming types to be serialized to or deserialized from
JSON JSON (JavaScript Object Notation, pronounced or ) is an open standard file format and electronic data interchange, data interchange format that uses Human-readable medium and data, human-readable text to store and transmit data objects consi ...
,
property list In the macOS, iOS, NeXTSTEP, and GNUstep programming frameworks, property list files are files that store serialized objects. Property list files use the filename extension .plist, and thus are often referred to as p-list files. Property l ...
s, or other formats. Default implementations of these protocols can be generated by the compiler for types whose stored properties are also Decodable or Encodable.


PowerShell

PowerShell PowerShell is a shell program developed by Microsoft for task automation and configuration management. As is typical for a shell, it provides a command-line interpreter for interactive use and a script interpreter for automation via a langu ...
implements serialization through the built-in cmdlet Export-CliXML. Export-CliXML serializes .NET objects and stores the resulting XML in a file. To reconstitute the objects, use the Import-CliXML cmdlet, which generates a deserialized object from the XML in the exported file. Deserialized objects, often known as "property bags" are not live objects; they are snapshots that have properties, but no methods. Two dimensional data structures can also be (de)serialized in CSV format using the built-in cmdlets Import-CSV and Export-CSV.


See also

* Commutation (telemetry) *
Comparison of data serialization formats This is a comparison of data serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file format A document file format is a Text file, text or bi ...
*
Container format A container format (informally, sometimes called a wrapper) or metafile is a file format that allows multiple data streams to be embedded into a single file, usually along with metadata for identifying and further detailing those streams. Nota ...
*
Hibernate (Java) Hibernate ORM (or simply Hibernate) is an object–relational mapping tool for the Java programming language. It provides a framework for mapping an object-oriented domain model to a relational database. Hibernate handles object–relational im ...
*
XML Schema An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constrai ...
*
Basic Encoding Rules X.690 is an ITU-T standard specifying several Abstract Syntax Notation One, ASN.1 encoding formats: * #BER encoding, Basic Encoding Rules (BER) * #CER encoding, Canonical Encoding Rules (CER) * #DER encoding, Distinguished Encoding Rules (DER) T ...
*
Google Protocol Buffers Protocol Buffers (Protobuf) is a free and open-source cross-platform data format used to serialize structured data. It is useful in developing programs that communicate with each other over a network or for storing data. The method involves an ...
* Wikibase * Apache Avro


References


External links

*
Java 1.4 Object Serialization documentation





Databoard
- Binary serialization with partial and random access, type system, RPC, type adaption, and text format {{Authority control * Persistence