The HDF5 Filter Interface

Introduction

The HDF5 Filter interface (H5Z) provides a flexible pipeline mechanism for processing dataset data during I/O operations. Filters can perform data compression, error checking, data transformation, and other custom operations on dataset chunks.

Filters operate on chunked datasets only (see Using HDF5 Filters for details on dataset chunking) and are applied independently to each chunk. Multiple filters can be chained together in a pipeline, where the output of one filter becomes the input to the next.

Built-in Filters

HDF5 includes several standard filters:

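DEFLATE (gzip): General-purpose lossless compression using the DEFLATE algorithm, the same one used by gzip (H5Z_FILTER_DEFLATE). Provides good compression ratios at moderate CPU cost, with a compression level from 0 to 9 that trades speed for size.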

SZIP: Lossless compression designed for scientific data (H5Z_FILTER_SZIP). It can outperform DEFLATE on some data patterns, but it is an optional component, so its availability depends on how the library was built.

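SHUFFLE: Reorders the bytes of each chunk so that the first bytes of all elements are stored together, then the second bytes, and so on (H5Z_FILTER_SHUFFLE). It does not reduce the data size itself, but it often makes the data more compressible, so it is typically placed before a compression filter.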

FLETCHER32: Computes and verifies checksums for error detection (H5Z_FILTER_FLETCHER32). Ensures data integrity.

NBIT: Lossless compression for datasets with unused bits (H5Z_FILTER_NBIT).

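SCALEOFFSET: Compression using scaling and offset (H5Z_FILTER_SCALEOFFSET). Lossless for integer data; for floating-point data it is lossy, with precision controlled by the scale factor.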
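To make the SHUFFLE description concrete, the following is a minimal sketch of the byte reordering in plain C. It does not call the HDF5 library, and the function names shuffle_bytes and unshuffle_bytes are hypothetical names chosen for illustration; the library's own implementation may differ in details such as how trailing bytes are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: group byte j of every element together.
 * src holds nelmts elements of elmt_size bytes each; dst must be a
 * separate buffer of the same total size. */
void shuffle_bytes(const unsigned char *src, unsigned char *dst,
                   size_t nelmts, size_t elmt_size)
{
    assert(src != dst); /* this sketch does not support in-place use */
    for (size_t i = 0; i < nelmts; i++)
        for (size_t j = 0; j < elmt_size; j++)
            dst[j * nelmts + i] = src[i * elmt_size + j];
}

/* Inverse of shuffle_bytes: restore the original element layout. */
void unshuffle_bytes(const unsigned char *src, unsigned char *dst,
                     size_t nelmts, size_t elmt_size)
{
    assert(src != dst);
    for (size_t i = 0; i < nelmts; i++)
        for (size_t j = 0; j < elmt_size; j++)
            dst[i * elmt_size + j] = src[j * nelmts + i];
}
```

For slowly varying numeric data, the high-order bytes of neighboring elements tend to be identical, so grouping them produces long runs that a compressor such as DEFLATE can exploit.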

Using Filters

Filters are configured through dataset creation property lists. Enable chunking first using H5Pset_chunk, then add compression with functions like H5Pset_deflate.

Filter Pipelines

Multiple filters can be combined in a pipeline. Filters are applied in the order they are added during write operations and in reverse order during read operations. A common pipeline adds H5Pset_shuffle first, then H5Pset_deflate, then H5Pset_fletcher32, so that bytes are reordered before compression and the checksum covers the data as stored.

Custom Filters

Applications can create and register custom filters:

Define a filter function with signature matching H5Z_func_t
Create an H5Z_class_t structure describing the filter
Register the filter using H5Zregister
Apply the filter using H5Pset_filter with the filter ID

Custom filters enable domain-specific data transformations, specialized compression algorithms, encryption, and other custom processing.

Querying Filters

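The H5Z interface provides functions to query and manage registered filters:

H5Zfilter_avail checks whether a filter is registered and available to the library
H5Zget_filter_info reports whether a filter's encoder and decoder are enabled
H5Zunregister removes a filter's registration from the library; to remove a filter from a property list's pipeline, use H5Premove_filter instead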

Filter Plugins

HDF5 supports dynamic loading of filter plugins, allowing filters to be added without recompiling applications. See HDF5 Filter Plugins for details on creating and using filter plugins.

Summary

The H5Z filter interface provides:

Built-in compression and error checking filters
Flexible filter pipeline mechanism
Support for custom user-defined filters
Dynamic filter plugin loading
Per-chunk processing, so only the chunks being read or written pass through the pipeline

Filters are essential for reducing storage requirements and ensuring data integrity in HDF5 files while maintaining compatibility and performance.
