Xena is open-source software for use in digital preservation. Xena is short for XML Electronic Normalising for Archives.
Xena is a Java application that was developed by the National Archives of Australia. It is available free of charge under the GNU General Public License.
Version 6.1.0 was released on 31 July 2013. Source code and binaries for Linux, OS X and Windows are available from SourceForge. However, as of 2018, it is no longer maintained or supported.
Mode of operation
Xena attempts to avoid digital obsolescence by converting files into an openly specified format, such as ODF or PNG. If the file format is not supported or the Binary Normalisation option is selected, Xena will perform ASCII Base64 encoding on binary files and wrap the output in XML metadata. The resulting .xena file is plain text, although the content of the data itself is not directly human-readable. The exact original file can be retrieved by stripping the metadata and reversing the Base64 encoding, using an internal viewer.
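The binary normalisation round trip described above can be sketched in a few lines of Python. This is only an illustration of the idea (Base64-encode the bytes, wrap them in XML, then reverse both steps), not Xena's actual implementation: the element names `xena_wrapper`, `meta` and `content`, the `source_name` attribute and both function names are hypothetical, and the real .xena schema is not reproduced here.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(data: bytes, source_name: str) -> str:
    """Base64-encode raw bytes and wrap them in a small XML envelope.

    The element and attribute names are hypothetical placeholders,
    not the schema used by real .xena files.
    """
    root = ET.Element("xena_wrapper")
    meta = ET.SubElement(root, "meta")
    meta.set("source_name", source_name)
    content = ET.SubElement(root, "content")
    # Base64 output is pure ASCII, so the whole document stays plain text.
    content.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def unwrap_binary(xml_text: str) -> bytes:
    """Strip the metadata envelope and reverse the Base64 encoding."""
    root = ET.fromstring(xml_text)
    encoded = root.findtext("content", default="")
    return base64.b64decode(encoded)


original = bytes(range(256))  # arbitrary binary payload
wrapped = wrap_binary(original, "example.bin")
restored = unwrap_binary(wrapped)
```

Because Base64 is lossless, `restored` is byte-for-byte identical to `original`, which mirrors the claim that the exact original file can be retrieved.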
Features
Platforms supported by Xena are Microsoft Windows, Linux and Mac OS X.
Xena uses a series of plugins to identify file formats and convert them to an appropriate openly specified format.
Xena has an application programming interface which allows any reasonably skilled Java developer to develop a plugin to cover a new file type.
Xena can process individual files or whole directories. When processing a whole directory, it can preserve the original directory structure of the converted records.
Xena can create plain text versions of file formats such as TIFF, Word and PDF, with the use of Tesseract.
The Xena interface or Xena Viewer can be used to view or export a Xena file (extension .xena) in its target file format. These files contain the normalised file as well as any extra information relevant to the normalisation process.
The Xena Viewer supports bulk export of Xena files to target file formats.
Xena can be used via its graphical user interface or the command line.
For Xena to be fully functional, it requires a local installation of the following external software:
*LibreOffice suite - to convert office documents to OpenDocument format
*Tesseract - to create plain text versions of file formats
*ImageMagick - to convert a subset of image files to PNG
*Readpst - to convert Microsoft Outlook PST files to XML. Readpst is part of the free and open-source libpst software suite
*FLAC - to convert audio files to FLAC format. This is also required to play back audio files using Xena.
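As a rough aid for checking the dependencies listed above, the following Python sketch looks up each tool's executable on the system `PATH`. It rests on several assumptions: the executable names (`soffice`, `tesseract`, `convert`, `readpst`, `flac`) are typical for common installations but vary by platform and version, and nothing here reflects how Xena itself locates these programs. The helper names are hypothetical.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Executable names are assumptions about a typical installation;
# they may differ by platform or version (e.g. newer ImageMagick
# releases ship a "magick" executable instead of "convert").
EXTERNAL_TOOLS: Dict[str, str] = {
    "LibreOffice": "soffice",
    "Tesseract": "tesseract",
    "ImageMagick": "convert",
    "Readpst": "readpst",
    "FLAC": "flac",
}


def find_tools(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each tool name to its executable path, or None if not found."""
    return {name: which(exe) for name, exe in EXTERNAL_TOOLS.items()}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the sorted names of tools for which no executable was found."""
    return sorted(name for name, path in found.items() if path is None)
```

Calling `missing_tools(find_tools())` returns the tools that could not be found on the current machine. The `which` parameter exists only so the lookup can be replaced in tests.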
Supported file types
Xena will recognise and process the file types listed below, plus a few others of minor importance. Unsupported file types will automatically undergo binary normalisation.
Office file formats:
*Microsoft Office files (including MS Office XML, SYLK spreadsheets and Rich Text Format) are converted to the corresponding OpenDocument files
*Microsoft Outlook PST files are parsed for their individual messages, which are converted to XML files, and a Xena index file is created
*Microsoft Project MPP files are converted to XML
*OpenOffice.org XML files (SXC, SXI, SXW) are converted to the corresponding OpenDocument formats
*WordPerfect WPD files are converted to OpenDocument ODT
*OpenDocument documents (ODT, ODS, ODB, ODP) are preserved unchanged
*Acrobat PDF files are stored as binaries
*Mailbox files (MBX) are converted to individual XML files
Graphics:
*BMP, GIF, PSD, PCX, RAS, and the X Window System XBM and XPM bitmap files are converted to PNG; TIFF files additionally get embedded metadata stored in Xena XML. If the Tesseract OCR software is installed, text will be extracted from TIFF files.
*OpenDocument Drawings (ODG) and SVG files are wrapped in Xena XML
*JPG and PNG files are stored unchanged
Archive files:
*Files are extracted from archives (ZIP, GZIP, TAR/TAR.gz, JAR, WAR, Mac binary) and normalised into a separate Xena file. A Xena index file is created which, when opened in the internal Xena viewer, will display the files in a table.
Audio files:
*MP3, WAV, AIFF, and OGG formats are converted to FLAC files.
Databases:
*SQL files are processed as plain text wrapped in XML
Other file types:
*HTML is converted to XHTML
*TXT text files and CSS files are stored as plain text wrapped in XML
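The overall rule in this section (a known file type maps to a target format, anything else falls back to binary normalisation) can be illustrated with a small lookup table in Python. This is a simplified sketch covering only a subset of the types listed above. It assumes that file types are identified by lowercase filename extension, which is a convenience for the example; Xena itself identifies formats through its plugins. The names `TARGETS`, `FALLBACK` and `target_for` are hypothetical.

```python
from pathlib import PurePath

# Extension -> target format, transcribed from a subset of the list above.
# Keying on the filename extension is an assumption made for illustration.
TARGETS = {
    ".bmp": "PNG",
    ".gif": "PNG",
    ".psd": "PNG",
    ".pcx": "PNG",
    ".mp3": "FLAC",
    ".wav": "FLAC",
    ".aiff": "FLAC",
    ".ogg": "FLAC",
    ".wpd": "OpenDocument ODT",
    ".mpp": "XML",
    ".html": "XHTML",
    ".txt": "plain text wrapped in XML",
    ".css": "plain text wrapped in XML",
    ".sql": "plain text wrapped in XML",
}

# Unsupported file types undergo binary normalisation.
FALLBACK = "binary normalisation"


def target_for(filename: str) -> str:
    """Return the target format for a filename, or the binary fallback."""
    suffix = PurePath(filename).suffix.lower()
    return TARGETS.get(suffix, FALLBACK)
```

For example, `target_for("scan.BMP")` yields `"PNG"`, while a file with an unknown or missing extension yields the fallback.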
Reviews
An April 22, 2010 review in Practical e-Records rated Xena at 82/100 points. At present Xena has no target preservation format for video files.
External links
*Xena on SourceForge
*Xena wiki on SourceForge
*Xena project description at The Australian Service for Knowledge of Open Source Software
*National Archives of Australia - software