Helper classes¶
This section describes some classes that do not fit in any other section and that serve mainly ancillary purposes.
The Filters class¶
- class tables.Filters(complevel: int = 0, complib: Literal['zlib', 'lzo', 'bzip2', 'blosc', 'blosc2'] = 'zlib', shuffle: bool = True, bitshuffle: bool = False, fletcher32: bool = False, least_significant_digit: int | None = None, _new: bool = True)[source]¶
Container for filter properties.
This class is meant to serve as a container that keeps information about the filter properties associated with the chunked leaves, that is Table, CArray, EArray and VLArray.
Instances of this class can be directly compared for equality.
- Parameters:
complevel (int) – Specifies a compression level for data. The allowed range is 0-9. A value of 0 (the default) disables compression.
complib (str) – Specifies the compression library to be used. Right now, ‘zlib’ (the default), ‘lzo’, ‘bzip2’, ‘blosc’ and ‘blosc2’ are supported. Additional compressors for Blosc like ‘blosc:blosclz’ (‘blosclz’ is the default in case the additional compressor is not specified), ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:zlib’ and ‘blosc:zstd’ are supported too. Also, additional compressors for Blosc2 like ‘blosc2:blosclz’ (‘blosclz’ is the default in case the additional compressor is not specified), ‘blosc2:lz4’, ‘blosc2:lz4hc’, ‘blosc2:zlib’ and ‘blosc2:zstd’ are supported too. Specifying a compression library which is not available in the system issues a FiltersWarning and sets the library to the default one.
shuffle (bool) – Whether to use the Shuffle filter in the HDF5 library. This is normally used to improve the compression ratio. A false value disables shuffling and a true one enables it. The default value depends on whether compression is enabled or not; if compression is enabled, shuffling defaults to be enabled, else shuffling is disabled. Shuffling can only be used when compression is enabled.
bitshuffle (bool) – Whether to use the BitShuffle filter in the Blosc/Blosc2 libraries. This is normally used to improve the compression ratio. A false value disables bitshuffling and a true one enables it. The default value is disabled.
fletcher32 (bool) – Whether to use the Fletcher32 filter in the HDF5 library. This is used to add a checksum on each data chunk. A false value (the default) disables the checksum.
least_significant_digit (int) – If specified, data will be truncated (quantized). In conjunction with enabling compression, this produces ‘lossy’, but significantly more efficient compression. For example, if least_significant_digit=1, data will be quantized using around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). Default is None, or no quantization.

Note

Quantization is only applied if some form of compression is enabled.
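As a hedged illustration of that rule, here is a plain NumPy sketch (independent of PyTables, which applies the truncation internally when writing) that derives bits and scale for least_significant_digit=1 as described above:

import numpy as np

# Reproduce the quantization rule quoted above for least_significant_digit=1.
least_significant_digit = 1
bits = int(np.ceil(np.log2(10.0 ** least_significant_digit)))   # -> 4
scale = 2.0 ** bits                                             # -> 16.0

data = np.array([3.14159, 2.71828])
print(np.around(scale * data) / scale)   # [3.125 2.6875], both within 0.1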
Examples
This is a small example on using the Filters class:
import numpy as np
import tables as tb

fileh = tb.open_file('test5.h5', mode='w')
atom = tb.Float32Atom()
filters = tb.Filters(complevel=1, complib='blosc', fletcher32=True)
arr = fileh.create_earray(fileh.root, 'earray', atom, (0, 2),
                          "A growable array", filters=filters)

# Append several rows in only one call
arr.append(np.array([[1., 2.],
                     [2., 3.],
                     [3., 4.]], dtype=np.float32))

# Print information on that enlargeable array
print("Result Array:")
print(repr(arr))
fileh.close()
This enforces the use of the Blosc library, a compression level of 1 and a Fletcher32 checksum filter as well. See the output of this example:
Result Array:
/earray (EArray(3, 2), fletcher32, shuffle, blosc(1)) 'A growable array'
  type = float32
  shape = (3, 2)
  itemsize = 4
  nrows = 3
  extdim = 0
  flavor = 'numpy'
  byteorder = 'little'
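The filters actually applied to a leaf can be checked after the fact through the leaf’s filters attribute; a short sketch reusing the file created above:

import tables as tb

# Reopen the file from the example above and inspect the stored filters.
with tb.open_file('test5.h5', mode='r') as fileh:
    flt = fileh.root.earray.filters
    print(flt.complib, flt.complevel, flt.fletcher32)   # blosc 1 True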
Filters attributes¶
- fletcher32¶
Whether the Fletcher32 filter is active or not.
- complevel¶
The compression level (0 disables compression).
- complib¶
The compression filter used (irrelevant when compression is not enabled).
- shuffle¶
Whether the Shuffle filter is active or not.
- bitshuffle¶
Whether the BitShuffle filter is active or not (Blosc/Blosc2 only).
Filters methods¶
- Filters.copy(**override) Filters [source]¶
Get a copy of the filters, possibly overriding some arguments.
Constructor arguments to be overridden must be passed as keyword arguments.
Using this method is recommended over replacing the attributes of an instance, since instances of this class may become immutable in the future:
>>> filters1 = Filters()
>>> filters2 = filters1.copy()
>>> filters1 == filters2
True
>>> filters1 is filters2
False
>>> filters3 = filters1.copy(complevel=1)
Traceback (most recent call last):
...
ValueError: compression library ``None`` is not supported...
>>> filters3 = filters1.copy(complevel=1, complib='zlib')
>>> print(filters1)
Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None)
>>> print(filters3)
Filters(complevel=1, complib='zlib', shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None)
>>> filters1.copy(foobar=42)
Traceback (most recent call last):
...
TypeError: ...__init__() got an unexpected keyword argument 'foobar'
The Index class¶
- class tables.index.Index(parentnode: Group, name: str, atom: Atom | None = None, title: str = '', kind: Literal['ultralight', 'light', 'medium', 'full'] | None = None, optlevel: int | None = None, filters: Filters | None = None, tmp_dir: str | None = None, expectedrows: int = 0, byteorder: str | None = None, blocksizes: tuple[int, int, int, int] | None = None, new: bool = True)[source]¶
Represents the index of a column in a table.
This class is used to keep the indexing information for columns in a Table dataset (see The Table class). It is actually a descendant of the Group class (see The Group class), with some added functionality. An Index is always associated with one and only one column in the table.
Note
This class is mainly intended for internal use, but some of its documented attributes and methods may be interesting for the programmer.
- Parameters:
parentnode (Group) – The parent Group object.

Changed in version 3.0: Renamed from parentNode to parentnode.
name (str) – The name of this node in its parent group.
atom (Atom) – An Atom object representing the shape and type of the atomic objects to be saved. Only scalar atoms are supported.
title – Sets a TITLE attribute of the Index entity.
kind – The desired kind for this index. The ‘full’ kind specifies a complete track of the row position (64-bit), while the ‘medium’, ‘light’ or ‘ultralight’ kinds only specify in which chunk the row is (using 32-bit, 16-bit and 8-bit respectively).
optlevel – The desired optimization level for this index.
filters (Filters) – An instance of the Filters class that provides information about the desired I/O filters to be applied during the life of this object.
tmp_dir – The directory for the temporary files.
expectedrows – A user estimate of the number of row slices that will be added to the growable dimension in the IndexArray object.
byteorder – The byteorder of the index datasets on-disk.
blocksizes – The four main sizes of the compound blocks in index datasets (a low level parameter).
new – Whether this Index is new or has to be read from disk.
Index instance variables¶
- Index.column¶
The Column (see The Column class) instance for the indexed column.
- Index.dirty¶
Whether the index is dirty or not. Dirty indexes are out of sync with column data, so they exist but they are not usable.
- Index.filters¶
Filter properties for this index - see Filters in The Filters class.
- Index.is_csi¶
Whether the index is completely sorted or not.
Changed in version 3.0: The is_CSI property has been renamed into is_csi.
- Index.nelements¶
The number of currently indexed rows for this column.
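Although Index objects are documented here, they are normally created and retrieved through the indexed column rather than instantiated directly; a small hedged sketch (the file, table and column names are illustrative):

import tables as tb

class Particle(tb.IsDescription):
    name = tb.StringCol(16)
    energy = tb.Float64Col()

with tb.open_file('indexed.h5', mode='w') as fileh:
    table = fileh.create_table('/', 'particles', Particle)
    row = table.row
    for i in range(100):
        row['name'] = 'p%d' % i
        row['energy'] = float(i)
        row.append()
    table.flush()
    # Indexes are created through the column, not by instantiating Index.
    table.cols.energy.create_index(kind='medium', optlevel=6)
    idx = table.cols.energy.index          # the Index instance
    print(idx.dirty, idx.is_csi, idx.nelements)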
Index methods¶
Index special methods¶
- Index.__getitem__(key: int | slice) int | ndarray [source]¶
Return the index values in the specified range.

If the key argument is an integer, the corresponding index value is returned. If key is a slice, the range of indices determined by it is returned. A negative step in the slice is supported, meaning that the results will be returned in reverse order.

This method is equivalent to Index.read_indices().
The IndexArray class¶
- class tables.indexes.IndexArray(parentnode: Group, name: str, atom: Atom | None = None, title: str = '', filters: Filters | None = None, byteorder: str | None = None)[source]¶
Represent the index (sorted or reverse index) dataset in HDF5 file.
All NumPy typecodes are supported except for complex datatypes.
- Parameters:
parentnode – The Index object from which this object hangs.
Changed in version 3.0: Renamed from parentNode to parentnode.
name (str) – The name of this node in its parent group.
atom – An Atom object representing the shape and type of the atomic objects to be saved. Only scalar atoms are supported.
title – Sets a TITLE attribute on the array entity.
filters (Filters) – An instance of the Filters class that provides information about the desired I/O filters to be applied during the life of this object.
byteorder – The byteorder of the data on disk.
- property chunksize: int¶
The chunksize for this object.
- property slicesize: int¶
The slicesize for this object.
The Enum class¶
- class tables.misc.enum.Enum(enum: list[str] | tuple[str, ...] | dict[str, Any] | Enum)[source]¶
Enumerated type.
Each instance of this class represents an enumerated type. The values of the type must be declared exhaustively and named with strings, and they might be given explicit concrete values, though this is not compulsory. Once the type is defined, it can not be modified.
There are three ways of defining an enumerated type. Each one of them corresponds to the type of the only argument in the constructor of Enum:
Sequence of names: each enumerated value is named using a string, and its order is determined by its position in the sequence; the concrete value is assigned automatically:
>>> boolEnum = Enum(['True', 'False'])
Mapping of names: each enumerated value is named by a string and given an explicit concrete value. All of the concrete values must be different, or a ValueError will be raised:
>>> priority = Enum({'red': 20, 'orange': 10, 'green': 0})
>>> colors = Enum({'red': 1, 'blue': 1})
Traceback (most recent call last):
...
ValueError: enumerated values contain duplicate concrete values: 1
Enumerated type: in that case, a copy of the original enumerated type is created. Both enumerated types are considered equal:
>>> prio2 = Enum(priority)
>>> priority == prio2
True
Please note that names starting with _ are not allowed, since they are reserved for internal usage:
>>> prio2 = Enum(['_xx'])
Traceback (most recent call last):
...
ValueError: name of enumerated value can not start with ``_``: '_xx'
The concrete value of an enumerated value is obtained by getting its name as an attribute of the Enum instance (see __getattr__()) or as an item (see __getitem__()). This allows comparisons between enumerated values and assigning them to ordinary Python variables:
>>> redv = priority.red
>>> redv == priority['red']
True
>>> redv > priority.green
True
>>> priority.red == priority.orange
False
The name of the enumerated value corresponding to a concrete value can also be obtained by using the __call__() method of the enumerated type. In this way you get the symbolic name to use it later with __getitem__():
>>> priority(redv)
'red'
>>> priority.red == priority[priority(priority.red)]
True
(If you ask, the __getitem__() method is not used for this purpose to avoid ambiguity in the case of using strings as concrete values.)
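Enumerated types are typically consumed by enumerated table columns; a brief hedged sketch using EnumCol and Table.get_enum() (the file and column names are illustrative):

import tables as tb

colors = tb.Enum(['red', 'green', 'blue'])

class Ball(tb.IsDescription):
    color = tb.EnumCol(colors, 'red', base='uint8')

with tb.open_file('enum_demo.h5', mode='w') as fileh:
    table = fileh.create_table('/', 'balls', Ball)
    row = table.row
    row['color'] = colors.green    # store the concrete value
    row.append()
    table.flush()
    enum = table.get_enum('color')         # recover the Enum from the column
    print(enum(table[0]['color']))         # -> 'green'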
Enum special methods¶
- Enum.__call__(value: Any, *default: Any) Any [source]¶
Get the name of the enumerated value with that concrete value.
If there is no value with that concrete value in the enumeration and a second argument is given as a default, this is returned. Else, a ValueError is raised.
This method can be used for checking that a concrete value belongs to the set of concrete values in an enumerated type.
Examples
Let enum be an enumerated type defined as:

>>> enum = Enum({'T0': 0, 'T1': 2, 'T2': 5})
then:
>>> enum(5)
'T2'
>>> enum(42, None) is None
True
>>> enum(42)
Traceback (most recent call last):
...
ValueError: no enumerated value with that concrete value: 42
- Enum.__contains__(name: str) bool [source]¶
Is there an enumerated value with that name in the type?
If the enumerated type has an enumerated value with that name, True is returned. Otherwise, False is returned. The name must be a string.
This method does not check for concrete values matching a value in an enumerated type. For that, please use the Enum.__call__() method.

Examples

Let enum be an enumerated type defined as:

>>> enum = Enum({'T0': 0, 'T1': 2, 'T2': 5})
then:
>>> 'T1' in enum
True
>>> 'foo' in enum
False
>>> 0 in enum
Traceback (most recent call last):
...
TypeError: name of enumerated value is not a string: 0
>>> enum.T1 in enum  # Be careful with this!
Traceback (most recent call last):
...
TypeError: name of enumerated value is not a string: 2
- Enum.__eq__(other: Enum) bool [source]¶
Is the other enumerated type equivalent to this one?
Two enumerated types are equivalent if they have exactly the same enumerated values (i.e. with the same names and concrete values).
Examples
Let enum* be enumerated types defined as:

>>> enum1 = Enum({'T0': 0, 'T1': 2})
>>> enum2 = Enum(enum1)
>>> enum3 = Enum({'T1': 2, 'T0': 0})
>>> enum4 = Enum({'T0': 0, 'T1': 2, 'T2': 5})
>>> enum5 = Enum({'T0': 0})
>>> enum6 = Enum({'T0': 10, 'T1': 20})
then:
>>> enum1 == enum1
True
>>> enum1 == enum2 == enum3
True
>>> enum1 == enum4
False
>>> enum5 == enum1
False
>>> enum1 == enum6
False
Comparing enumerated types with other kinds of objects produces a false result:
>>> enum1 == {'T0': 0, 'T1': 2}
False
>>> enum1 == ['T0', 'T1']
False
>>> enum1 == 2
False
- Enum.__getattr__(name: str) Any [source]¶
Get the concrete value of the enumerated value with that name.
The name of the enumerated value must be a string. If there is no value with that name in the enumeration, an AttributeError is raised.
Examples
Let enum be an enumerated type defined as:

>>> enum = Enum({'T0': 0, 'T1': 2, 'T2': 5})
then:
>>> enum.T1
2
>>> enum.foo
Traceback (most recent call last):
...
AttributeError: no enumerated value with that name: 'foo'
- Enum.__getitem__(name: str) Any [source]¶
Get the concrete value of the enumerated value with that name.
The name of the enumerated value must be a string. If there is no value with that name in the enumeration, a KeyError is raised.
Examples
Let enum be an enumerated type defined as:

>>> enum = Enum({'T0': 0, 'T1': 2, 'T2': 5})
then:
>>> enum['T1']
2
>>> enum['foo']
Traceback (most recent call last):
...
KeyError: "no enumerated value with that name: 'foo'"
- Enum.__iter__() Generator[Any, None, None] [source]¶
Iterate over the enumerated values.
Enumerated values are returned as (name, value) pairs in no particular order.
Examples
>>> enumvals = {'red': 4, 'green': 2, 'blue': 1}
>>> enum = Enum(enumvals)
>>> enumdict = dict([(name, value) for (name, value) in enum])
>>> enumvals == enumdict
True
The UnImplemented class¶
- class tables.UnImplemented(parentnode: Group, name: str)[source]¶
This class represents datasets not supported by PyTables in an HDF5 file.
When reading a generic HDF5 file (i.e. one that has not been created with PyTables, but with some other HDF5 library based tool), chances are that the specific combination of datatypes or dataspaces in some dataset might not be supported by PyTables yet. In such a case, this dataset will be mapped into an UnImplemented instance and the user will still be able to access the complete object tree of the generic HDF5 file. The user will also be able to read and write the attributes of the dataset, access some of its metadata, and perform certain hierarchy manipulation operations like deleting or moving (but not copying) the node. Of course, the user will not be able to read the actual data stored in it.

This is an elegant way to allow users to work with generic HDF5 files despite the fact that some of their datasets are not supported by PyTables. However, if you are really interested in having full access to an unimplemented dataset, please get in contact with the developer team.
This class does not have any public instance variables or methods, except those inherited from the Leaf class (see The Leaf class).
- byteorder: str | None¶
The endianness of data in memory (‘big’, ‘little’ or ‘irrelevant’).
- nrows¶
The length of the first dimension of the data.
- shape¶
The shape of the stored data.
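Unsupported datasets in a foreign file can be spotted by walking the object tree; a short hedged sketch (‘foreign.h5’ is a placeholder for a file produced by some other HDF5 tool):

import tables as tb

with tb.open_file('foreign.h5', mode='r') as fileh:
    for node in fileh.walk_nodes('/'):
        if isinstance(node, tb.UnImplemented):
            print('unsupported dataset:', node._v_pathname)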
The Unknown class¶
The ChunkInfo class¶
- class tables.ChunkInfo(start: tuple[int, ...] | None, filter_mask: int | None, offset: int | None, size: int | None)[source]¶
Information about storage for a given chunk.
It may also refer to a chunk which is within the dataset’s shape but that does not exist in storage, i.e. a missing chunk.
An instance of this named tuple class contains the following information, in field order:
- start¶
The coordinates in dataset items where the chunk starts, a tuple of integers with the same rank as the dataset. These coordinates are always aligned with chunk boundaries. Also present for missing chunks.
- filter_mask¶
An integer where each active bit signals that the filter in its position in the pipeline was disabled when storing the chunk. For instance, 0b10 disables shuffling, 0b100 disables szip, and so on. None for missing chunks.
- offset¶
An integer which indicates the offset in bytes of chunk data as it exists in storage. None for missing chunks.
- size¶
An integer which indicates the size in bytes of chunk data as it exists in storage. None for missing chunks.
Exceptions module¶
The exceptions module declares the exceptions and warnings that are specific to PyTables.
- exception tables.HDF5ExtError(*args, **kargs)[source]¶
A low level HDF5 operation failed.
This exception is raised by the low level PyTables components used for accessing HDF5 files. It usually signals that something is not going well in the HDF5 library or even at the Input/Output level.
Errors in the HDF5 C library may be accompanied by an extensive HDF5 back trace on standard error (see also tables.silence_hdf5_messages()).

Changed in version 2.4.
- Parameters:
message – error message
h5bt –
This parameter (keyword only) controls the HDF5 back trace handling. Any keyword arguments other than h5bt are ignored.

- if set to False the HDF5 back trace is ignored and the HDF5ExtError.h5backtrace attribute is set to None
- if set to True the back trace is retrieved from the HDF5 library and stored in the HDF5ExtError.h5backtrace attribute as a list of tuples
- if set to “VERBOSE” (default) the HDF5 back trace is stored in the HDF5ExtError.h5backtrace attribute and also included in the string representation of the exception
- if not set (or set to None) the default policy is used (see HDF5ExtError.DEFAULT_H5_BACKTRACE_POLICY)
- format_h5_backtrace(backtrace: list[tuple[str, int, str, str]] | None = None) str [source]¶
Convert the HDF5 back trace, represented as a list of tuples (see HDF5ExtError.h5backtrace), into a string.

Added in version 2.4.
- DEFAULT_H5_BACKTRACE_POLICY = 'VERBOSE'¶
Default policy for HDF5 backtrace handling
- if set to False the HDF5 back trace is ignored and the HDF5ExtError.h5backtrace attribute is set to None
- if set to True the back trace is retrieved from the HDF5 library and stored in the HDF5ExtError.h5backtrace attribute as a list of tuples
- if set to “VERBOSE” (default) the HDF5 back trace is stored in the HDF5ExtError.h5backtrace attribute and also included in the string representation of the exception

This parameter can be set using the PT_DEFAULT_H5_BACKTRACE_POLICY environment variable. Allowed values are “IGNORE” (or “FALSE”), “SAVE” (or “TRUE”) and “VERBOSE” to set the policy to False, True and “VERBOSE” respectively. The special value “DEFAULT” can be used to reset the policy to the default value.

Added in version 2.4.
- h5backtrace¶
HDF5 back trace.
Contains the HDF5 back trace as a (possibly empty) list of tuples. Each tuple has the following format:
(filename, line number, function name, text)
Depending on the value of the h5bt parameter passed to the initializer the h5backtrace attribute can be set to None. This means that the HDF5 back trace has been simply ignored (not retrieved from the HDF5 C library error stack) or that there has been an error (silently ignored) during the HDF5 back trace retrieval.
Added in version 2.4.
See also

traceback.format_list()
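A hedged sketch of inspecting the back trace after catching the exception (the damaged file name is hypothetical):

import tables as tb

try:
    fileh = tb.open_file('damaged.h5', mode='r')   # a hypothetical corrupt file
except tb.HDF5ExtError as exc:
    # h5backtrace may be None, depending on the active back trace policy.
    if exc.h5backtrace is not None:
        print(exc.format_h5_backtrace())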
- exception tables.ClosedNodeError[source]¶
The operation can not be completed because the node is closed.
For instance, listing the children of a closed group is not allowed.
- exception tables.ClosedFileError[source]¶
The operation can not be completed because the hosting file is closed.
For instance, getting an existing node from a closed file is not allowed.
- exception tables.FileModeError[source]¶
The operation can not be carried out because the mode in which the hosting file is opened is not adequate.
For instance, removing an existing leaf from a read-only file is not allowed.
- exception tables.NodeError[source]¶
Invalid hierarchy manipulation operation requested.
This exception is raised when the user requests an operation on the hierarchy which can not be run because of the current layout of the tree. This includes accessing nonexistent nodes, moving or copying or creating over an existing node, non-recursively removing groups with children, and other similarly invalid operations.
A node in a PyTables database cannot be simply overwritten by replacing it. Instead, the old node must be removed explicitly before another one can take its place. This is done to protect interactive users from inadvertently deleting whole trees of data by a single erroneous command.
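For instance, replacing an existing array requires an explicit removal first; a minimal sketch (the file and node names are illustrative):

import tables as tb

with tb.open_file('replace_demo.h5', mode='w') as fileh:
    fileh.create_array('/', 'x', [1, 2, 3])
    try:
        fileh.create_array('/', 'x', [4, 5, 6])    # path already taken
    except tb.NodeError:
        fileh.remove_node('/', 'x')                # explicit removal first
        fileh.create_array('/', 'x', [4, 5, 6])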
- exception tables.NoSuchNodeError[source]¶
An operation was requested on a node that does not exist.
This exception is raised when an operation gets a path name or a (where, name) pair leading to a nonexistent node.
- exception tables.UndoRedoError[source]¶
Problems with doing/redoing actions with Undo/Redo feature.
This exception indicates a problem related to the Undo/Redo mechanism, such as trying to undo or redo actions with this mechanism disabled, or going to a nonexistent mark.
- exception tables.UndoRedoWarning[source]¶
Issued when an action not supporting Undo/Redo is run.
This warning is only shown when the Undo/Redo mechanism is enabled.
- exception tables.NaturalNameWarning[source]¶
Issued when a non-pythonic name is given for a node.
This is not an error and may even be very useful in certain contexts, but one should be aware that such nodes cannot be accessed using natural naming (instead, getattr() must be used explicitly).
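For example, a node whose name starts with a digit can only be reached with getattr() or File.get_node(); a short sketch (the file and node names are illustrative):

import tables as tb

with tb.open_file('names_demo.h5', mode='w') as fileh:
    # Issues a NaturalNameWarning: '2col' is not a valid Python identifier.
    fileh.create_array('/', '2col', [1, 2, 3])
    # fileh.root.2col would be a SyntaxError, so use getattr() or get_node().
    arr = getattr(fileh.root, '2col')
    print(arr.read())    # [1, 2, 3]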
- exception tables.PerformanceWarning[source]¶
Warning for operations which may cause a performance drop.
This warning is issued when an operation is made on the database which may cause it to slow down on future operations (i.e. making the node tree grow too much).
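Like any Python warning, it can be filtered with the standard warnings machinery; a short sketch (the bulk-load section here is only a stand-in for an operation known to trigger the warning):

import warnings
import tables as tb

# Temporarily silence PerformanceWarning around a noisy section,
# e.g. a bulk load that creates many sibling nodes.
with warnings.catch_warnings():
    warnings.simplefilter('ignore', tb.PerformanceWarning)
    with tb.open_file('bulk.h5', mode='w') as fileh:
        for i in range(10):
            fileh.create_array('/', 'a%d' % i, [i])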
- exception tables.FlavorError[source]¶
Unsupported or unavailable flavor or flavor conversion.
This exception is raised when an unsupported or unavailable flavor is given to a dataset, or when a conversion of data between two given flavors is neither supported nor available.
- exception tables.FlavorWarning[source]¶
Unsupported or unavailable flavor conversion.
This warning is issued when a conversion of data between two given flavors is neither supported nor available, and raising an error would render the data inaccessible (e.g. on a dataset of an unavailable flavor in a read-only file).
See the FlavorError class for more information.
- exception tables.FiltersWarning[source]¶
Unavailable filters.
This warning is issued when a valid filter is specified but it is not available in the system. It may mean that an available default filter is to be used instead.
- exception tables.OldIndexWarning[source]¶
Unsupported index format.
This warning is issued when an index in an unsupported format is found. The index will be marked as invalid and will behave as if it doesn’t exist.
- exception tables.DataTypeWarning[source]¶
Unsupported data type.
This warning is issued when an unsupported HDF5 data type is found (normally in a file created with a tool other than PyTables).
- exception tables.ExperimentalFeatureWarning[source]¶
Generic warning for experimental features.
This warning is issued when using a functionality that is still experimental and that users have to use with care.
- exception tables.ChunkError[source]¶
An operation related to direct chunk access failed.
This exception may be related to the properties of the dataset or the chunk being accessed, or to how the chunk is being accessed. It is a base for more specific exceptions.
- exception tables.NotChunkedError[source]¶
A direct chunking operation was attempted on a non-chunked dataset.
For instance, chunk information was requested for a plain Array instance.