# Unisens File Format

Unisens is a universal data format for multi-sensor data. It was developed at the FZI Research Center for Information Technology and the KIT (formerly University of Karlsruhe). The motivation for specifying a new data format was the need for an open, universal, generic and sustainable format for storing and archiving sensor data. Other main requirements were a human-readable header and the use of future-proof standards such as XML. The complete list of features is given in the List of Features section.

# Overview

A unisens data set is represented by a folder in the file system. The name of the data set equals the name of the folder. The folder contains a header file and at least one data file. The filename of the header file is always unisens.xml. The header file contains all meta information of the data set. It references the data files and describes them. The header file is written in a human-readable XML (eXtensible Markup Language) format, and its structure is given by an XML Schema Definition (XSD).
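As a rough illustration of this layout, the following Python sketch checks whether a folder looks like a unisens data set, that is, whether it contains `unisens.xml` and at least one further file. The helper names `looks_like_dataset` and `dataset_name` are hypothetical and not part of any unisens library, and the check neither parses the header nor validates it against the XSD.

```python
from pathlib import Path

HEADER_NAME = "unisens.xml"  # the header file always has this name


def looks_like_dataset(folder):
    """Return True if `folder` has a header file and at least one data file.

    Hypothetical helper: it only inspects file names and does not parse
    or validate the header.
    """
    folder = Path(folder)
    if not (folder / HEADER_NAME).is_file():
        return False
    # Every regular file other than the header is counted as a data file.
    data_files = [p for p in folder.iterdir()
                  if p.is_file() and p.name != HEADER_NAME]
    return len(data_files) >= 1


def dataset_name(folder):
    """The name of the data set equals the name of the folder."""
    return Path(folder).resolve().name
```

A folder that contains only `unisens.xml` is rejected by this check, because a data set needs at least one data file.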

# File structure

Figure: Structure of a unisens dataset in the file system.

In this example there are seven data files (e.g. ecg.bin and acc.bin) and the unisens.xml header file.

# Entry Types

The data files are called entries of the data set. Entries can be of three different types:

| Entry type | Description |
| --- | --- |
| Signal Entry | Continuous signal with a fixed sample rate (e.g. an ECG signal) |
| Values Entry | Time-discrete values, consisting of time stamp and value (e.g. values of a blood pressure measurement) |
| Event Entry | Time-discrete events, consisting of time stamp and event type (e.g. a detected R-peak in an ECG signal) |

Each entry in a dataset has an entry ID. The entry ID equals the filename.

# File formats

The data of an entry can be represented by different file formats:

| File format | File extension |
| --- | --- |
| Binary | *.bin |
| CSV (Character Separated Values) | *.csv |
| XML (eXtensible Markup Language) | *.xml |

For large data sets the binary format is recommended. The text-based formats (CSV and XML) need much more file space (by a factor of 4 to 10).
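Because the entry ID equals the filename, the file format of an entry can be guessed from the extension of its ID. The sketch below shows one way to do that in Python. The function `file_format` is a hypothetical helper, and it assumes that the extension is reliable; the header file describes each entry and should be consulted where the two disagree.

```python
from pathlib import PurePath

# File extensions from the table above.
FORMATS = {".bin": "Binary", ".csv": "CSV", ".xml": "XML"}


def file_format(entry_id):
    """Guess the file format of an entry from its ID (which equals the filename)."""
    suffix = PurePath(entry_id).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"unknown file extension in entry ID {entry_id!r}")
    return FORMATS[suffix]
```

Note that `unisens.xml` itself also ends in `.xml`, so a caller would need to exclude the header file before applying this helper to the contents of a data set folder.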

# Data Types

Values Entries and Signal Entries can use the following data types:

| Data Type | Size (bytes) | Value Range |
| --- | --- | --- |
| DOUBLE | 8 | 4.9 · 10^−324 ... 1.7976931348623157 · 10^308 |
| FLOAT | 4 | 1.4 · 10^−45 ... 3.4028235 · 10^38 |
| INT32 | 4 | −2147483648 ... 2147483647 |
| INT16 | 2 | −32768 ... 32767 |
| INT8 | 1 | −128 ... 127 |
| UINT32 | 4 | 0 ... 4294967295 |
| UINT16 | 2 | 0 ... 65535 |
| UINT8 | 1 | 0 ... 255 |
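To make the sizes in this table concrete, here is a minimal Python sketch that maps each data type name to a format character of the standard `struct` module and decodes a run of raw bytes. The names `STRUCT_CODES` and `decode_samples` are hypothetical. The sketch treats the input as a flat sequence of samples of one type and makes no assumption about how the channels of an entry are arranged in a file, since that is not described here.

```python
import struct

# Unisens data type name -> struct format character (standard sizes).
STRUCT_CODES = {
    "DOUBLE": "d", "FLOAT": "f",
    "INT32": "i", "INT16": "h", "INT8": "b",
    "UINT32": "I", "UINT16": "H", "UINT8": "B",
}


def decode_samples(raw, data_type, little_endian=True):
    """Decode a bytes object into a list of numbers of one data type."""
    code = STRUCT_CODES[data_type.upper()]
    # "<" and ">" select little and big endian with standard sizes.
    prefix = "<" if little_endian else ">"
    size = struct.calcsize(prefix + code)
    if len(raw) % size != 0:
        raise ValueError("byte count is not a multiple of the sample size")
    count = len(raw) // size
    return list(struct.unpack(f"{prefix}{count}{code}", raw))
```

The same four bytes decode differently depending on the declared type and byte order, which is why both have to be known before a binary file can be read.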

For Values Entries and Signal Entries, lsbValue (a double value) and baseline (an integer value) can be specified to map stored data to real-world values.

Name and meaning of lsbValue and baseline are based on the idea of data originating from analog to digital converters (ADC):

  • baseline: the value of ADC output that would map to 0 physical units input. This value can be beyond the ADC output range.

  • lsbValue: the equivalent value of the physical variable represented by the least significant bit of the ADC.

Real-world values are calculated from the stored data as follows:

 value = (ADCout - baseline) * lsbValue
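The formula translates directly into code. The following Python sketch implements it together with its inverse; the function names `to_physical` and `to_stored` are hypothetical, and the inverse assumes that stored values are integer ADC counts, so it rounds to the nearest count.

```python
def to_physical(adc_out, baseline, lsb_value):
    """Map a stored sample to a real-world value: (ADCout - baseline) * lsbValue."""
    return (adc_out - baseline) * lsb_value


def to_stored(value, baseline, lsb_value):
    """Inverse mapping, rounded to the nearest integer ADC count."""
    return round(value / lsb_value) + baseline
```

For example, with an assumed baseline of 2048 and an lsbValue of 0.5, a stored sample of 2050 maps to `to_physical(2050, 2048, 0.5)`, which is 1.0 in physical units, and a stored sample equal to the baseline maps to 0.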

# Software using unisens

# Libraries for unisens

# List of Features

  • Handles different types of data

    • continuous signals e.g. ECG, acceleration, thoracic impedance, etc.
    • events e.g. trigger annotations, artifact regions, etc.
    • values e.g. blood pressure, respiration rate, heart rate
  • One or more simultaneous channels may be recorded

    • e.g. multi channel ECG
    • e.g. diastolic and systolic blood pressure
    • e.g. three axial acceleration sensor data
  • Simultaneous, synchronous storage at different sample rates

    • e.g. acceleration data at 80 Hz
    • e.g. ECG at 200 Hz
  • Multiple sensors of one type may be recorded

    • e.g. one ECG sensor for dry electrodes and one for adhesive electrodes
    • e.g. multiple acceleration sensors with different locations
  • Can easily be used in embedded systems

    • simple way for writing data
    • variable byte order (little endian, big endian)
    • support of multiple data types e.g. int32, uint16, double, etc.
    • no unit conversions are necessary
  • Sample-exact data access

    • different time bases for different entries possible
    • access via sample stamp
  • Support of different file formats

    • binary files, CSV files, XML files projected
  • Reference implementation in Java

    • no dependency on commercial software
    • little platform dependency
    • easy integration in MATLAB®
  • Human-readable meta file

    • XML format
    • readable in every browser or editor
  • Separation of meta information and data

    • one meta file (header file) and one or multiple data files
    • easy opening of data files with 3rd-party software
  • Direct data access possible

    • data access without unisens library is possible
  • Flexible way for data organisation

    • free comments
    • data can be arranged in logical groups
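The combination of sample-exact access and different sample rates listed above can be illustrated with a little arithmetic. The Python sketch below converts a sample stamp into a time offset and into the nearest sample stamp of an entry with another sample rate. It assumes that a sample stamp counts samples from the start of the measurement at the sample rate of its entry and that both entries start at the same instant. Both function names are hypothetical.

```python
from fractions import Fraction


def stamp_to_seconds(sample_stamp, sample_rate):
    """Time offset from the start of the measurement, in seconds (exact)."""
    return Fraction(sample_stamp) / Fraction(sample_rate)


def convert_stamp(sample_stamp, from_rate, to_rate):
    """Nearest sample stamp in an entry with a different sample rate."""
    return round(Fraction(sample_stamp) * Fraction(to_rate) / Fraction(from_rate))
```

Using the rates from the list, sample 400 of an ECG recorded at 200 Hz lies 2 seconds into the measurement and corresponds to sample 160 of acceleration data recorded at 80 Hz. Exact fractions are used so that long recordings do not accumulate floating-point error.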

# Technical Documentation

# Definitions

  • Dataset: Coherent unit of a header file and an arbitrary number of data files.
  • Header File: The header file contains meta information of a data set and describes the data files. The header file is always named unisens.xml.
  • Data Entry / data file: Data files in a data set are called Entries. Data Entries contain the actual data. The following Entry types exist: Signal Entry, Values Entry and Event Entry.
  • Signal Entry: Continuous data with a fixed sample rate (e.g. ECG or acceleration signal). A Signal Entry can have multiple channels (e.g. several ECG leads or the 3 axes of an acceleration sensor).
  • Values Entry: Time-discrete values, consisting of a time stamp and value (e.g. values of a blood pressure measurement). Values Entries can have multiple channels (e.g. systolic and diastolic values of a blood pressure measurement).
  • Event Entry: Time-discrete events, consisting of time stamp and event type (e.g. a detected R-peak in an ECG signal).
  • Channel: Subdivision of a Signal or Values Entry. All channels in one entry share the same unit and the same sample rate.
  • Data type: The following data types are supported: uint8, int8, uint16, int16, uint32, int32, float, double.
  • File Format: Entries can be stored using the following file formats: Binary, CSV (Character Separated Values), XML (eXtensible Markup Language).

# Limitations

  • Multiple files needed for one data set.

  • The timestamp of the start of measurement is given as local time. It can therefore only be converted to UTC if the time zone (and consequently the offset to UTC) is known. Timestamps of measurements started during the hour after the clock change from daylight saving time to standard time are not unique. By convention, use custom attributes to add time zone and offset information, e.g.:

        <customAttribute key="timeZoneId" value="America/New_York"/>
        <customAttribute key="timeZoneOffset" value="-18000"/>
    
  • The range of values that can be represented when using integer values is limited by the baseline value, which is specified as a 4-byte integer value.
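The time zone convention from the list above can be applied as in the following Python sketch, which reads the two custom attributes and converts a local start time to UTC. It rests on several assumptions: `timeZoneOffset` is taken to be the offset from UTC in seconds (−18000 s is −5 h), the attributes are wrapped in a hypothetical `root` element so that the fragment parses on its own, and the start time is an arbitrary example value. In a real header the elements may carry an XML namespace, in which case the tag lookup would have to include it.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

# The two custom attributes from the example above, wrapped in a
# hypothetical root element so that the fragment can be parsed on its own.
FRAGMENT = """
<root>
  <customAttribute key="timeZoneId" value="America/New_York"/>
  <customAttribute key="timeZoneOffset" value="-18000"/>
</root>
"""


def custom_attributes(xml_text):
    """Collect key/value pairs of all customAttribute elements."""
    root = ET.fromstring(xml_text)
    return {e.get("key"): e.get("value") for e in root.iter("customAttribute")}


def start_as_utc(local_start, offset_seconds):
    """Convert a naive local start time to UTC using a fixed offset.

    Assumption: offset_seconds is local time minus UTC, in seconds.
    """
    tz = timezone(timedelta(seconds=offset_seconds))
    return local_start.replace(tzinfo=tz).astimezone(timezone.utc)


attrs = custom_attributes(FRAGMENT)
utc_start = start_as_utc(datetime(2022, 1, 15, 10, 0, 0),
                         int(attrs["timeZoneOffset"]))
```

Storing the numeric offset alongside the time zone ID resolves the ambiguity during the repeated hour at the end of daylight saving time, because the two occurrences of the same local time have different offsets.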

# XSD Schema Definition

You can find the documentation of the unisens XML Schema Definition (XSD) here.

# Java API Documentation

The documentation for the Java API can be found here.

# Literature

Malte Kirst, Jörg Ottenbacher, and Radoslav Nedkov. "UNISENS–Ein universelles Datenformat für Multisensordaten." Biosignalverarbeitung: Innovationen bei der Erfassung und Analyse bioelektrischer und biomagnetischer Signale (2008): 106-108

Last Updated: 2/2/2022, 2:30:01 PM
© 2024 movisens GmbH