# Unisens File Format
Unisens is a universal data format for multi-sensor data. It was developed at the FZI Research Center for Information Technology and the KIT (formerly University of Karlsruhe). The motivation for specifying a new data format was the need for an open, universal, generic and sustainable format for storing and archiving sensor data. Other main requirements were a human-readable header and the use of future-proof standards like XML. The complete list of features is given in the List of Features section below.
A unisens data set is represented by a folder in the file system. The name of the data set is the name of the folder. The folder contains a header file and at least one data file. The filename of the header file is always unisens.xml. The header file contains all meta information of the data set. It references the data files and describes them. The header file is written in a human-readable XML (eXtensible Markup Language) format, and its structure is defined by an XML Schema Definition (XSD).
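The folder layout described above can be checked with a few lines of Python. This is a minimal sketch, not part of any unisens library: the helper names `looks_like_unisens_dataset` and `dataset_name` are hypothetical, and the check covers only the folder layout (a unisens.xml header plus at least one other file), not the XML content.

```python
from pathlib import Path

HEADER_NAME = "unisens.xml"  # fixed header file name given in the text


def looks_like_unisens_dataset(folder):
    """Return True if `folder` holds a unisens.xml header and at least one other file.

    Hypothetical helper: it inspects the folder layout only and does not
    validate the header against the XSD.
    """
    folder = Path(folder)
    if not (folder / HEADER_NAME).is_file():
        return False
    data_files = [p for p in folder.iterdir()
                  if p.is_file() and p.name != HEADER_NAME]
    return len(data_files) >= 1


def dataset_name(folder):
    """The data set name is the name of its folder."""
    return Path(folder).resolve().name
```

Calling `looks_like_unisens_dataset("path/to/some_folder")` returns `False` for a folder that has no header file or no data file.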
# File structure
In this example there are seven data files (e.g. ecg.bin and acc.bin) and the unisens.xml header file.
# Entry Types
The data files are called entries of the data set. Entries can be of three different types:
| Entry type | Description |
|---|---|
| Signal Entry | Continuous signal with a fixed sample rate (e.g. an ECG signal) |
| Values Entry | Time-discrete values, consisting of time stamp and value (e.g. values of a blood pressure measurement) |
| Event Entry | Time-discrete events, consisting of time stamp and event type (e.g. a detected R-peak in an ECG signal) |
Each entry in a dataset has an entry ID. The entry ID equals the filename.
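To make the three entry types concrete, they can be pictured as simple records. The sketch below is only an illustration in Python: the class and field names are hypothetical and do not mirror the API of any unisens library, and the file names other than ecg.bin are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SignalEntry:
    entry_id: str          # equals the file name, e.g. "ecg.bin"
    sample_rate: float     # fixed sample rate in Hz
    samples: List[float]   # one value per sample, in recording order


@dataclass
class ValuesEntry:
    entry_id: str
    values: List[Tuple[int, float]]  # (time stamp, value) pairs


@dataclass
class EventEntry:
    entry_id: str
    events: List[Tuple[int, str]]    # (time stamp, event type) pairs


# Illustrative instances; all numbers are invented for the example.
ecg = SignalEntry("ecg.bin", 200.0, [0.0, 0.1, 0.3])
blood_pressure = ValuesEntry("bloodpressure.csv", [(0, 120.0), (6000, 118.0)])
r_peaks = EventEntry("rpeaks.csv", [(153, "R"), (318, "R")])
```

A Signal Entry needs no time stamps because the sample position and the sample rate determine the time, whereas Values and Event Entries carry an explicit time stamp per item.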
# File formats
The data of an entry can be represented by different file formats:
| File format | File ending |
|---|---|
| Binary | *.bin |
| CSV (Character Separated Values) | *.csv |
| XML (eXtensible Markup Language) | *.xml |
For large data sets the binary format is recommended. The text-based formats (CSV and XML) need much more storage space (by a factor of 4 to 10).
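A rough size estimate shows why this matters. The sketch below assumes a binary file that contains nothing but the raw samples; the recording parameters (one ECG channel, 200 Hz, INT16, 24 hours) are illustrative, and the text-format range simply applies the factor of 4 to 10 stated above.

```python
def binary_size_bytes(sample_rate_hz, bytes_per_sample, channels, duration_s):
    """Size of the raw sample data, assuming no additional file overhead."""
    return int(sample_rate_hz * bytes_per_sample * channels * duration_s)


# Illustrative recording: one ECG channel, 200 Hz, INT16 (2 bytes), 24 hours.
ecg_bytes = binary_size_bytes(200, 2, 1, 24 * 3600)

# Text formats need roughly 4 to 10 times as much space.
text_low_bytes = 4 * ecg_bytes
text_high_bytes = 10 * ecg_bytes

print(ecg_bytes)        # 34560000 (about 34.6 MB)
print(text_low_bytes)   # 138240000
print(text_high_bytes)  # 345600000
```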
# Data Types
Values Entries and Signal Entries can use the following data types:
| Data Type | Size (bytes) | Value Range |
|---|---|---|
| DOUBLE | 8 | 4.9 · 10^−324 ... 1.7976931348623157 · 10^308 |
| FLOAT | 4 | 1.4 · 10^−45 ... 3.4028235 · 10^38 |
| INT32 | 4 | −2147483648 ... 2147483647 |
| INT16 | 2 | −32768 ... 32767 |
| INT8 | 1 | −128 ... 127 |
| UINT32 | 4 | 0 ... 4294967295 |
| UINT16 | 2 | 0 ... 65535 |
| UINT8 | 1 | 0 ... 255 |
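When reading or writing binary data without a unisens library, these data types line up with the format codes of Python's standard `struct` module. The mapping below is a sketch for illustration: the dictionary `STRUCT_CODES` and the helpers `size_in_bytes` and `fits` are hypothetical names, not part of unisens.

```python
import struct

# unisens data type -> struct format code (standard sizes apply with "<" or ">")
STRUCT_CODES = {
    "DOUBLE": "d", "FLOAT": "f",
    "INT32": "i", "INT16": "h", "INT8": "b",
    "UINT32": "I", "UINT16": "H", "UINT8": "B",
}


def size_in_bytes(data_type):
    """Size of one stored value, matching the table above."""
    return struct.calcsize("<" + STRUCT_CODES[data_type])


def fits(data_type, value):
    """Return True if `value` can be stored in the given data type."""
    try:
        struct.pack("<" + STRUCT_CODES[data_type], value)
        return True
    except (struct.error, OverflowError):
        return False
```

For example, `fits("INT16", 32767)` is `True`, while `fits("INT16", 32768)` and `fits("UINT8", -1)` are `False`.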
For Values Entries and Signal Entries, lsbValue (a double value) and baseline (an integer value) can be specified to map stored data to real-world values. The names and meanings of lsbValue and baseline are based on the idea of data originating from analog-to-digital converters (ADC):
- baseline: the ADC output value that would map to an input of 0 physical units. This value can lie outside the ADC output range.
- lsbValue: the value of the physical variable represented by the least significant bit of the ADC.
Real-world values are calculated from the stored data as follows:
value = (ADCout - baseline) * lsbValue
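The formula translates directly into code. In this sketch the function name `to_physical` is hypothetical, and the baseline of 2048 and lsbValue of 0.5 are made-up numbers chosen only to keep the arithmetic easy to follow.

```python
def to_physical(adc_out, baseline, lsb_value):
    """Map a stored ADC value to a real-world value."""
    return (adc_out - baseline) * lsb_value


# Illustrative settings: the stored value 2048 corresponds to 0 physical units,
# and each ADC step corresponds to 0.5 physical units.
BASELINE = 2048
LSB_VALUE = 0.5

stored = [2046, 2048, 2050]
physical = [to_physical(v, BASELINE, LSB_VALUE) for v in stored]
print(physical)  # [-1.0, 0.0, 1.0]
```

Keeping the stored data as integers and applying lsbValue and baseline only on read means no unit conversion has to happen on the recording device.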
# Software using unisens
- SensorManager
- UnisensViewer
- DataAnalyzer
- movisensXS
- EDFBrowser
# Libraries for unisens
- Java: unisens4java, unisens2Excel. The Java library can be used directly in MATLAB.
- MATLAB: unisensMatlabTools. Useful tools for working with unisens data sets.
- R: unisensR on GitHub, unisensR on CRAN
- Python: pyunisens
# List of Features
Handles different types of data
- continuous signals e.g. ECG, acceleration, thoracic impedance, etc.
- events e.g. trigger annotations, artifact regions, etc.
- values e.g. blood pressure, respiration rate, heart rate
One or more simultaneous channels may be recorded
- e.g. multi channel ECG
- e.g. diastolic and systolic blood pressure
- e.g. three-axis acceleration sensor data
Simultaneous, synchronous storage at different sample rates
- e.g. acceleration data at 80 Hz
- e.g. ECG at 200 Hz
Multiple sensors of one type may be recorded
- e.g. one ECG sensor for dry electrodes and one for adhesive electrodes
- e.g. multiple acceleration sensors with different locations
Can easily be used in embedded systems
- simple way for writing data
- variable byte order (little endian, big endian)
- support of multiple data types e.g. int32, uint16, double, etc.
- no unit conversions are necessary
Sample-exact data access
- different time bases for different entries possible
- access via sample stamp
Support of different file formats
- binary files, CSV files, XML files (planned)
Reference implementation in Java
- no dependency on commercial software
- little platform dependency
- easy integration in MATLAB®
Human readable meta file
- XML format
- readable in every browser or editor
Separation of meta information and data
- one meta file (header file) and one or multiple data files
- easy opening of data files with 3rd-party software
Direct data access possible
- data access without unisens library is possible
Flexible data organisation
- free comments
- data can be arranged in logical groups
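The points on byte order, multiple channels and direct data access can be illustrated by decoding a binary entry with nothing but the Python standard library. This is a sketch under stated assumptions, not a reference reader: it assumes that the file holds only raw samples and that the channels of each sample are stored one after another (interleaved). In practice the data type, channel count and byte order would be taken from unisens.xml. The function name `read_binary_entry` is hypothetical.

```python
import struct


def read_binary_entry(raw, code, channels, little_endian=True):
    """Decode raw bytes into a list of per-sample tuples (one value per channel).

    `code` is a struct format code such as "h" for INT16 or "d" for DOUBLE.
    Assumes interleaved channels and no file header.
    """
    prefix = "<" if little_endian else ">"
    frame = struct.Struct(prefix + code * channels)
    usable = len(raw) - len(raw) % frame.size  # ignore a trailing partial sample
    return [frame.unpack_from(raw, offset)
            for offset in range(0, usable, frame.size)]


# Two samples of a three-channel INT16 signal, little endian.
raw = struct.pack("<6h", 1, 2, 3, 4, 5, 6)
print(read_binary_entry(raw, "h", 3))  # [(1, 2, 3), (4, 5, 6)]
```

For a real file, `raw` would come from something like `open("ecg.bin", "rb").read()`. Reading with the wrong byte order does not raise an error; it silently yields wrong numbers, so the setting has to match the header.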
# Technical Documentation
- Dataset: Coherent unit of a header file and an arbitrary number of data files.
- Header File: The header file contains meta information about a data set and describes the data files. The header file is always named unisens.xml.
- Data Entry / data file: Data files in a data set are called Entries. Data Entries contain the actual data. The following Entry types exist: Signal Entry, Values Entry and Event Entry.
- Signal Entry: Continuous data with a fixed sample rate (e.g. ECG or acceleration signal). A Signal Entry can have multiple channels (e.g. several ECG leads or the 3 axes of an acceleration sensor).
- Values Entry: Time-discrete values, consisting of a time stamp and value (e.g. values of a blood pressure measurement). Values Entries can have multiple channels (e.g. systolic and diastolic values of a blood pressure measurement).
- Event Entry: Time-discrete events, consisting of time stamp and event type (e.g. a detected R-peak in an ECG signal).
- Channel: Subdivision of a Signal or Values Entry. All channels in one entry share the same unit and the same sample rate.
- Data type: The following data types are supported: uint8, int8, uint16, int16, uint32, int32, float, double.
- File Format: Entries can be stored using the following file formats: Binary, CSV (Character Separated Values), XML (eXtensible Markup Language).
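Because every Signal Entry has a fixed sample rate, a sample position can be converted to a time and from there to the matching position in an entry with a different sample rate. The sketch below assumes that sample stamps count from the start of the measurement; the function names are hypothetical, and the 200 Hz and 80 Hz rates are the example rates from the feature list.

```python
def sample_to_seconds(sample_stamp, sample_rate_hz):
    """Time since start of measurement for a given sample stamp."""
    return sample_stamp / sample_rate_hz


def convert_sample_stamp(sample_stamp, from_rate_hz, to_rate_hz):
    """Nearest sample stamp in an entry that uses a different sample rate."""
    return round(sample_stamp * to_rate_hz / from_rate_hz)


# Sample 1000 of a 200 Hz ECG lies 5 s after the start of the measurement,
# which corresponds to sample 400 of 80 Hz acceleration data.
print(sample_to_seconds(1000, 200))         # 5.0
print(convert_sample_stamp(1000, 200, 80))  # 400
```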
Multiple files are needed for one data set.
The timestamp of the start of measurement is given as local time. It can therefore only be converted to UTC if the time zone (and, as a consequence, the offset to UTC) is known. Timestamps of measurements started during the hour after the clock change from daylight saving time to standard time are not unique. By convention, use custom attributes to add time zone and offset information, e.g.:
<customAttribute key="timeZoneId" value="America/New_York"/> <customAttribute key="timeZoneOffset" value="-18000"/>
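With these attributes present, the local start time can be converted to UTC. The sketch below rests on several assumptions: the offset is interpreted as seconds (−18000 s is −5 h, which matches New York standard time), the wrapping `customAttributes` element and the ISO-formatted start time are placeholders for illustration, and the function name `start_as_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

# Illustrative fragment; the wrapping element is an assumption.
snippet = """<customAttributes>
  <customAttribute key="timeZoneId" value="America/New_York"/>
  <customAttribute key="timeZoneOffset" value="-18000"/>
</customAttributes>"""

# Collect the custom attributes as a key -> value dictionary.
attrs = {el.get("key"): el.get("value")
         for el in ET.fromstring(snippet).iter("customAttribute")}


def start_as_utc(local_start_iso, offset_seconds):
    """Attach a fixed UTC offset to a naive local time and convert it to UTC."""
    local = datetime.fromisoformat(local_start_iso)
    tz = timezone(timedelta(seconds=offset_seconds))
    return local.replace(tzinfo=tz).astimezone(timezone.utc)


# Hypothetical start of measurement in local time.
start_utc = start_as_utc("2021-01-15T08:30:00", int(attrs["timeZoneOffset"]))
print(start_utc.isoformat())  # 2021-01-15T13:30:00+00:00
```

Storing the numeric offset alongside the time zone ID resolves the ambiguous hour at the end of daylight saving time, since the two possible instants differ in their offset.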
The range of values that can be represented when using integer values is limited by the baseline value, which is specified as an integer (4-byte) value.
# XSD Schema Definition
You can find the documentation of the unisens XML Schema Definition (XSD) here.
# Java API Documentation
The documentation for the Java API can be found here.
Malte Kirst, Jörg Ottenbacher, and Radoslav Nedkov. "UNISENS–Ein universelles Datenformat für Multisensordaten." Biosignalverarbeitung: Innovationen bei der Erfassung und Analyse bioelektrischer und biomagnetischer Signale (2008): 106-108