Helper classes

This section describes some classes that do not fit in any other section and that mainly serve for ancillary purposes.

The Filters class

class tables.Filters(complevel: int = 0, complib: Literal['zlib', 'lzo', 'bzip2', 'blosc', 'blosc2'] = 'zlib', shuffle: bool = True, bitshuffle: bool = False, fletcher32: bool = False, least_significant_digit: int | None = None, _new: bool = True)

Container for filter properties.

This class is meant to serve as a container that keeps information about the filter properties associated with the chunked leaves, that is Table, CArray, EArray and VLArray.

Instances of this class can be directly compared for equality.

Parameters:
  • complevel (int) – Specifies a compression level for data. The allowed range is 0-9. A value of 0 (the default) disables compression.

  • complib (str) – Specifies the compression library to be used. Right now, ‘zlib’ (the default), ‘lzo’, ‘bzip2’, ‘blosc’ and ‘blosc2’ are supported. Additional compressors for Blosc like ‘blosc:blosclz’ (‘blosclz’ is the default in case the additional compressor is not specified), ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:zlib’ and ‘blosc:zstd’ are supported too. Also, additional compressors for Blosc2 like ‘blosc2:blosclz’ (‘blosclz’ is the default in case the additional compressor is not specified), ‘blosc2:lz4’, ‘blosc2:lz4hc’, ‘blosc2:zlib’ and ‘blosc2:zstd’ are supported too. Specifying a compression library which is not available in the system issues a FiltersWarning and sets the library to the default one.

  • shuffle (bool) – Whether to use the Shuffle filter in the HDF5 library. This is normally used to improve the compression ratio. A false value disables shuffling and a true one enables it. The default value depends on whether compression is enabled or not; if compression is enabled, shuffling defaults to be enabled, else shuffling is disabled. Shuffling can only be used when compression is enabled.

  • bitshuffle (bool) – Whether to use the BitShuffle filter in the Blosc/Blosc2 libraries. This is normally used to improve the compression ratio. A false value disables bitshuffling and a true one enables it. The default value is disabled.

  • fletcher32 (bool) – Whether to use the Fletcher32 filter in the HDF5 library. This is used to add a checksum on each data chunk. A false value (the default) disables the checksum.

  • least_significant_digit (int) –

    If specified, data will be truncated (quantized). In conjunction with enabling compression, this produces ‘lossy’, but significantly more efficient compression. For example, if least_significant_digit=1, data will be quantized using around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). Default is None, or no quantization.

    Note

    Quantization is only applied if some form of compression is enabled.
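The quantization rule described for least_significant_digit can be sketched in plain Python. This is only an illustration, not the code PyTables runs: the helper name quantize is hypothetical, the built-in round stands in for around, and the way bits is derived (the smallest integer with 2**bits >= 10**least_significant_digit) is an assumption chosen because it reproduces bits=4 for least_significant_digit=1.

```python
import math


def quantize(value, least_significant_digit):
    """Illustrative stand-in for around(scale*data)/scale (not PyTables code)."""
    # Assumption: bits is the smallest integer such that 2**bits covers
    # 10**least_significant_digit; this gives bits=4 for one decimal digit.
    bits = math.ceil(math.log2(10 ** least_significant_digit))
    scale = 2 ** bits
    return round(scale * value) / scale


quantized = quantize(3.14159, 1)
print(quantized)                 # 3.125
print(abs(quantized - 3.14159))  # the error stays below the requested 0.1
```

The quantized value is a multiple of 1/16, so its low mantissa bits are zero, which is what makes the data easier to compress afterwards.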

Examples

This is a small example of using the Filters class:

import numpy as np
import tables as tb

fileh = tb.open_file('test5.h5', mode='w')
atom = tb.Float32Atom()
filters = tb.Filters(complevel=1, complib='blosc', fletcher32=True)
arr = fileh.create_earray(fileh.root, 'earray', atom, (0, 2),
                          "A growable array", filters=filters)

# Append several rows in only one call
arr.append(np.array([[1., 2.],
                     [2., 3.],
                     [3., 4.]], dtype=np.float32))

# Print information on that enlargeable array
print("Result Array:")
print(repr(arr))
fileh.close()

This enforces the use of the Blosc library, a compression level of 1 and a Fletcher32 checksum filter as well. See the output of this example:

Result Array:
/earray (EArray(3, 2), fletcher32, shuffle, blosc(1)) 'A growable ...
type = float32
shape = (3, 2)
itemsize = 4
nrows = 3
extdim = 0
flavor = 'numpy'
byteorder = 'little'

Filters attributes

fletcher32

Whether the Fletcher32 filter is active or not.

complevel

The compression level (0 disables compression).

complib

The compression filter used (irrelevant when compression is not enabled).

shuffle

Whether the Shuffle filter is active or not.

bitshuffle

Whether the BitShuffle filter is active or not (Blosc/Blosc2 only).

Filters methods

Filters.copy(**override) → Filters

Get a copy of the filters, possibly overriding some arguments.

Constructor arguments to be overridden must be passed as keyword arguments.

Using this method is recommended over replacing the attributes of an instance, since instances of this class may become immutable in the future:

>>> filters1 = Filters()
>>> filters2 = filters1.copy()
>>> filters1 == filters2
True
>>> filters1 is filters2
False
>>> filters3 = filters1.copy(complevel=1) 
Traceback (most recent call last):
...
ValueError: compression library ``None`` is not supported...
>>> filters3 = filters1.copy(complevel=1, complib='zlib')
>>> print(filters1)
Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None)
>>> print(filters3)
Filters(complevel=1, complib='zlib', shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None)
>>> filters1.copy(foobar=42) 
Traceback (most recent call last):
...
TypeError: ...__init__() got an unexpected keyword argument ...

The Index class

class tables.index.Index(parentnode: Group, name: str, atom: Atom | None = None, title: str = '', kind: Literal['ultralight', 'light', 'medium', 'full'] | None = None, optlevel: int | None = None, filters: Filters | None = None, tmp_dir: str | None = None, expectedrows: int = 0, byteorder: str | None = None, blocksizes: tuple[int, int, int, int] | None = None, new: bool = True)

Represents the index of a column in a table.

This class is used to keep the indexing information for columns in a Table dataset (see The Table class). It is actually a descendant of the Group class (see The Group class), with some added functionality. An Index is always associated with one and only one column in the table.

Note

This class is mainly intended for internal use, but some of its documented attributes and methods may be interesting for the programmer.

Parameters:
  • parentnode

    The parent Group object.

    Changed in version 3.0: Renamed from parentNode to parentnode.

  • name (str) – The name of this node in its parent group.

  • atom (Atom) – An Atom object representing the shape and type of the atomic objects to be saved. Only scalar atoms are supported.

  • title – Sets a TITLE attribute of the Index entity.

  • kind – The desired kind for this index. The ‘full’ kind specifies a complete track of the row position (64-bit), while the ‘medium’, ‘light’ or ‘ultralight’ kinds only specify in which chunk the row is (using 32-bit, 16-bit and 8-bit respectively).

  • optlevel – The desired optimization level for this index.

  • filters (Filters) – An instance of the Filters class that provides information about the desired I/O filters to be applied during the life of this object.

  • tmp_dir – The directory for the temporary files.

  • expectedrows – Represents a user estimate of the number of row slices that will be added to the growable dimension in the IndexArray object.

  • byteorder – The byteorder of the index datasets on-disk.

  • blocksizes – The four main sizes of the compound blocks in index datasets (a low level parameter).

  • new – Whether this Index is new or has to be read from disk.

Index instance variables

Index.column

Column instance for the indexed column.

See The Column class.

Index.dirty

Whether the index is dirty or not.

Dirty indexes are out of sync with column data, so they exist but they are not usable.

Index.filters

Filter properties for this index.

See Filters in The Filters class.

Index.is_csi

Whether the index is completely sorted or not.

Changed in version 3.0: The is_CSI property has been renamed into is_csi.

Index.nelements

The number of currently indexed rows for this column.

Index methods

Index.read_sorted(start: int | None = None, stop: int | None = None, step: int | None = None) → ndarray

Return the sorted values of the index in the specified range.

The meaning of the start, stop and step arguments is the same as in Table.read_sorted().

Index.read_indices(start: int | None = None, stop: int | None = None, step: int | None = None) → ndarray

Return the indices of the index in the specified range.

The meaning of the start, stop and step arguments is the same as in Table.read_sorted().

Index special methods

Index.__getitem__(key: int | slice) → int | ndarray

Return the indices of the index in the specified range.

If the key argument is an integer, the corresponding index is returned. If key is a slice, the range of indices determined by it is returned. A negative value of step in the slice is supported, meaning that the results will be returned in reverse order.

This method is equivalent to Index.read_indices().

The IndexArray class

class tables.indexes.IndexArray(parentnode: Group, name: str, atom: Atom | None = None, title: str = '', filters: Filters | None = None, byteorder: str | None = None)

Represents the index (sorted or reverse index) dataset in an HDF5 file.

All NumPy typecodes are supported except for complex datatypes.

Parameters:
  • parentnode

    The Index object from which this object will hang.

    Changed in version 3.0: Renamed from parentNode to parentnode.

  • name (str) – The name of this node in its parent group.

  • atom – An Atom object representing the shape and type of the atomic objects to be saved. Only scalar atoms are supported.

  • title – Sets a TITLE attribute on the array entity.

  • filters (Filters) – An instance of the Filters class that provides information about the desired I/O filters to be applied during the life of this object.

  • byteorder – The byteorder of the data on-disk.

property chunksize: int

Size of the chunk for the object.

property slicesize: int

Size of the slice for the object.

The Enum class

class tables.misc.enum.Enum(enum: list[str] | tuple[str, ...] | dict[str, Any] | Enum)

Enumerated type.

Each instance of this class represents an enumerated type. The values of the type must be declared exhaustively and named with strings, and they might be given explicit concrete values, though this is not compulsory. Once the type is defined, it can not be modified.

There are three ways of defining an enumerated type. Each one of them corresponds to the type of the only argument in the constructor of Enum:

  • Sequence of names: each enumerated value is named using a string, and its order is determined by its position in the sequence; the concrete value is assigned automatically:

    >>> boolEnum = Enum(['True', 'False'])
    
  • Mapping of names: each enumerated value is named by a string and given an explicit concrete value. All of the concrete values must be different, or a ValueError will be raised:

    >>> priority = Enum({'red': 20, 'orange': 10, 'green': 0})
    >>> colors = Enum({'red': 1, 'blue': 1})
    Traceback (most recent call last):
    ...
    ValueError: enumerated values contain duplicate concrete values: 1
    
  • Enumerated type: in that case, a copy of the original enumerated type is created. Both enumerated types are considered equal:

    >>> prio2 = Enum(priority)
    >>> priority == prio2
    True
    

Please note that names starting with _ are not allowed, since they are reserved for internal usage:

>>> prio2 = Enum(['_xx'])
Traceback (most recent call last):
...
ValueError: name of enumerated value can not start with ``_``: '_xx'

The concrete value of an enumerated value is obtained by getting its name as an attribute of the Enum instance (see __getattr__()) or as an item (see __getitem__()). This allows comparisons between enumerated values and assigning them to ordinary Python variables:

>>> redv = priority.red
>>> redv == priority['red']
True
>>> redv > priority.green
True
>>> priority.red == priority.orange
False

The name of the enumerated value corresponding to a concrete value can also be obtained by using the __call__() method of the enumerated type. In this way you get the symbolic name to use it later with __getitem__():

>>> priority(redv)
'red'
>>> priority.red == priority[priority(priority.red)]
True

(If you ask, the __getitem__() method is not used for this purpose to avoid ambiguity in the case of using strings as concrete values.)

Enum special methods

Enum.__call__(value: Any, *default: Any) → Any

Get the name of the enumerated value with that concrete value.

If there is no value with that concrete value in the enumeration and a second argument is given as a default, this is returned. Else, a ValueError is raised.

This method can be used for checking that a concrete value belongs to the set of concrete values in an enumerated type.

Examples

Let enum be an enumerated type defined as:

>>> enum = Enum({'T0': 0, 'T1': 2, 'T2': 5})

then:

>>> enum(5)
'T2'
>>> enum(42, None) is None
True
>>> enum(42)
Traceback (most recent call last):
  ...
ValueError: no enumerated value with that concrete value: 42

Enum.__contains__(name: str) → bool

Return True if the Enum has a value with the specified name.

If the enumerated type has an enumerated value with that name, True is returned. Otherwise, False is returned. The name must be a string.

This method does not check for concrete values matching a value in an enumerated type. For that, please use the Enum.__call__() method.

Examples

Let enum be an enumerated type defined as:

>>> enum = Enum({'T0': 0, 'T1': 2, 'T2': 5})

then:

>>> 'T1' in enum
True
>>> 'foo' in enum
False
>>> 0 in enum
Traceback (most recent call last):
  ...
TypeError: name of enumerated value is not a string: 0
>>> enum.T1 in enum  # Be careful with this!
Traceback (most recent call last):
  ...
TypeError: name of enumerated value is not a string: 2

Enum.__eq__(other: Enum) → bool

Return True if other is equivalent to this enumerated type.

Two enumerated types are equivalent if they have exactly the same enumerated values (i.e. with the same names and concrete values).

Examples

Let enum* be enumerated types defined as:

>>> enum1 = Enum({'T0': 0, 'T1': 2})
>>> enum2 = Enum(enum1)
>>> enum3 = Enum({'T1': 2, 'T0': 0})
>>> enum4 = Enum({'T0': 0, 'T1': 2, 'T2': 5})
>>> enum5 = Enum({'T0': 0})
>>> enum6 = Enum({'T0': 10, 'T1': 20})

then:

>>> enum1 == enum1
True
>>> enum1 == enum2 == enum3
True
>>> enum1 == enum4
False
>>> enum5 == enum1
False
>>> enum1 == enum6
False

Comparing enumerated types with other kinds of objects produces a false result:

>>> enum1 == {'T0': 0, 'T1': 2}
False
>>> enum1 == ['T0', 'T1']
False
>>> enum1 == 2
False

Enum.__getattr__(name: str) → Any

Get the concrete value of the enumerated value with that name.

The name of the enumerated value must be a string. If there is no value with that name in the enumeration, an AttributeError is raised.

Examples

Let enum be an enumerated type defined as:

>>> enum = Enum({'T0': 0, 'T1': 2, 'T2': 5})

then:

>>> enum.T1
2
>>> enum.foo
Traceback (most recent call last):
  ...
AttributeError: no enumerated value with that name: 'foo'

Enum.__getitem__(name: str) → Any

Get the concrete value of the enumerated value with that name.

The name of the enumerated value must be a string. If there is no value with that name in the enumeration, a KeyError is raised.

Examples

Let enum be an enumerated type defined as:

>>> enum = Enum({'T0': 0, 'T1': 2, 'T2': 5})

then:

>>> enum['T1']
2
>>> enum['foo']
Traceback (most recent call last):
  ...
KeyError: "no enumerated value with that name: 'foo'"

Enum.__iter__() → Generator[Any]

Iterate over the enumerated values.

Enumerated values are returned as (name, value) pairs in no particular order.

Examples

>>> enumvals = {'red': 4, 'green': 2, 'blue': 1}
>>> enum = Enum(enumvals)
>>> enumdict = dict([(name, value) for (name, value) in enum])
>>> enumvals == enumdict
True

Enum.__len__() → int

Return the number of enumerated values in the enumerated type.

Examples

>>> len(Enum(['e%d' % i for i in range(10)]))
10

Enum.__repr__() → str

Return the canonical string representation of the enumeration.

The output of this method can be evaluated to give a new enumeration object that will compare equal to this one.

Examples

>>> repr(Enum({'name': 10}))
"Enum({'name': 10})"

The UnImplemented class

class tables.UnImplemented(parentnode: Group, name: str)

Class representing datasets not supported by PyTables in an HDF5 file.

When reading a generic HDF5 file (i.e. one that has not been created with PyTables, but with some other HDF5 library based tool), chances are that the specific combination of datatypes or dataspaces in some dataset might not be supported by PyTables yet. In such a case, this dataset will be mapped into an UnImplemented instance and the user will still be able to access the complete object tree of the generic HDF5 file. The user will also be able to read and write the attributes of the dataset, access some of its metadata, and perform certain hierarchy manipulation operations like deleting or moving (but not copying) the node. Of course, the user will not be able to read the actual data on it.

This is an elegant way to allow users to work with generic HDF5 files despite the fact that some of its datasets are not supported by PyTables. However, if you are really interested in having full access to an unimplemented dataset, please get in contact with the developer team.

This class does not have any public instance variables or methods, except those inherited from the Leaf class (see The Leaf class).

byteorder: str | None

The endianness of data in memory (‘big’, ‘little’ or ‘irrelevant’).

nrows

The length of the first dimension of the data.

shape

The shape of the stored data.

The Unknown class

class tables.Unknown(parentnode: Group, name: str)

Class representing nodes reported as unknown by the HDF5 library.

This class does not have any public instance variables or methods, except those inherited from the Node class.

The ChunkInfo class

class tables.ChunkInfo(start: tuple[int, ...] | None, filter_mask: int | None, offset: int | None, size: int | None)

Information about storage for a given chunk.

It may also refer to a chunk which is within the dataset’s shape but that does not exist in storage, i.e. a missing chunk.

An instance of this named tuple class contains the following information, in field order:

start

The coordinates in dataset items where the chunk starts, a tuple of integers with the same rank as the dataset. These coordinates are always aligned with chunk boundaries. Also present for missing chunks.

filter_mask

An integer where each active bit signals that the filter in its position in the pipeline was disabled when storing the chunk. For instance, 0b10 disables shuffling, 0b100 disables szip, and so on. None for missing chunks.

offset

An integer which indicates the offset in bytes of chunk data as it exists in storage. None for missing chunks.

size

An integer which indicates the size in bytes of chunk data as it exists in storage. None for missing chunks.
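
How filter_mask is read can be sketched with a stand-in named tuple. The class below only mirrors the four documented fields; it is not tables.ChunkInfo, and the pipeline list, the chunk values and the helper disabled_filters are hypothetical names and data used purely for illustration.

```python
from collections import namedtuple

# Stand-in with the same field order as documented above (assumption:
# used here only so that the example runs without an HDF5 file).
ChunkInfoSketch = namedtuple("ChunkInfoSketch", "start filter_mask offset size")


def disabled_filters(filter_mask, pipeline):
    """Return the pipeline filters whose bit is set in filter_mask."""
    return [name for position, name in enumerate(pipeline)
            if filter_mask & (1 << position)]


# Hypothetical pipeline order and chunk values.
pipeline = ["fletcher32", "shuffle", "blosc"]
stored = ChunkInfoSketch(start=(0, 0), filter_mask=0b10, offset=4096, size=512)
missing = ChunkInfoSketch(start=(100, 0), filter_mask=None, offset=None, size=None)

print(disabled_filters(stored.filter_mask, pipeline))  # ['shuffle']
print(missing.offset is None)                          # True
```

A missing chunk still carries its start coordinates, so checking offset (or size) against None is enough to tell it apart from a stored one.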

Exceptions module

The exceptions module declares the exceptions and warnings that are specific to PyTables.

exception tables.HDF5ExtError(*args, **kargs)

A low level HDF5 operation failed.

This exception is raised by the low level PyTables components used for accessing HDF5 files. It usually signals that something is not going well in the HDF5 library or even at the Input/Output level.

Errors in the HDF5 C library may be accompanied by an extensive HDF5 back trace on standard error (see also tables.silence_hdf5_messages()).

Changed in version 2.4.

Parameters:
  • message – error message

  • h5bt

    This parameter (keyword only) controls the HDF5 back trace handling. Any keyword arguments other than h5bt are ignored.

format_h5_backtrace(backtrace: list[tuple[str, int, str, str]] | None = None) → str

Convert the HDF5 trace back into a string.

The HDF5 trace back is represented as a list of tuples.

See HDF5ExtError.h5backtrace.

Added in version 2.4.

classmethod set_policy_from_env() → str

Set the policy from environment variables.

DEFAULT_H5_BACKTRACE_POLICY = 'VERBOSE'

Default policy for HDF5 backtrace handling

  • if set to False the HDF5 back trace is ignored and the HDF5ExtError.h5backtrace attribute is set to None

  • if set to True the back trace is retrieved from the HDF5 library and stored in the HDF5ExtError.h5backtrace attribute as a list of tuples

  • if set to “VERBOSE” (default) the HDF5 back trace is stored in the HDF5ExtError.h5backtrace attribute and also included in the string representation of the exception

This parameter can be set using the PT_DEFAULT_H5_BACKTRACE_POLICY environment variable. Allowed values are “IGNORE” (or “FALSE”), “SAVE” (or “TRUE”) and “VERBOSE” to set the policy to False, True and “VERBOSE” respectively. The special value “DEFAULT” can be used to reset the policy to the default value.
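
The mapping from the environment variable to the policy value, as listed above, can be sketched as follows. The function name policy_from_env is hypothetical and this is not the library's implementation; in particular, how PyTables treats unset or unrecognized values is not stated here, so falling back to the default in those cases is an assumption.

```python
import os

DEFAULT_POLICY = "VERBOSE"

# Allowed strings and the policy values they select, as listed above.
_POLICIES = {
    "IGNORE": False, "FALSE": False,
    "SAVE": True, "TRUE": True,
    "VERBOSE": "VERBOSE",
    "DEFAULT": DEFAULT_POLICY,
}


def policy_from_env(environ=os.environ):
    """Sketch only: resolve the back trace policy from an environment mapping."""
    raw = environ.get("PT_DEFAULT_H5_BACKTRACE_POLICY", "DEFAULT")
    # Assumption: unknown values fall back to the default policy.
    return _POLICIES.get(raw, DEFAULT_POLICY)


print(policy_from_env({"PT_DEFAULT_H5_BACKTRACE_POLICY": "SAVE"}))    # True
print(policy_from_env({"PT_DEFAULT_H5_BACKTRACE_POLICY": "IGNORE"}))  # False
print(policy_from_env({}))                                            # VERBOSE
```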

Added in version 2.4.

h5backtrace

HDF5 back trace.

Contains the HDF5 back trace as a (possibly empty) list of tuples. Each tuple has the following format:

(filename, line number, function name, text)

Depending on the value of the h5bt parameter passed to the initializer the h5backtrace attribute can be set to None. This means that the HDF5 back trace has been simply ignored (not retrieved from the HDF5 C library error stack) or that there has been an error (silently ignored) during the HDF5 back trace retrieval.

Added in version 2.4.

See also

traceback.format_list()
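
Because each entry has the (filename, line number, function name, text) layout, a back trace list can be handed to the standard traceback.format_list() function. The entries below are made up for illustration; a real list would come from the h5backtrace attribute of a caught HDF5ExtError.

```python
import traceback

# Hypothetical back trace entries in the documented tuple format.
h5backtrace = [
    ("H5F.c", 123, "H5Fopen", "unable to open file"),
    ("H5Fint.c", 456, "H5F_open", "unable to read superblock"),
]

# h5backtrace may be None when the back trace was ignored, so guard for it.
formatted = ""
if h5backtrace is not None:
    formatted = "".join(traceback.format_list(h5backtrace))
print(formatted)
```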

exception tables.ClosedNodeError

The operation can not be completed because the node is closed.

For instance, listing the children of a closed group is not allowed.

exception tables.ClosedFileError

The operation can not be completed because the hosting file is closed.

For instance, getting an existing node from a closed file is not allowed.

exception tables.FileModeError

File mode error.

The operation can not be carried out because the mode in which the hosting file is opened is not adequate.

For instance, removing an existing leaf from a read-only file is not allowed.

exception tables.NodeError

Invalid hierarchy manipulation operation requested.

This exception is raised when the user requests an operation on the hierarchy which can not be run because of the current layout of the tree. This includes accessing nonexistent nodes, moving or copying or creating over an existing node, non-recursively removing groups with children, and other similarly invalid operations.

A node in a PyTables database cannot be simply overwritten by replacing it. Instead, the old node must be removed explicitly before another one can take its place. This is done to protect interactive users from inadvertently deleting whole trees of data by a single erroneous command.
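
The remove-then-create pattern described above might look like the following sketch. It assumes PyTables is installed and importable as tables; the file name and node name are hypothetical, and the file is written to a temporary directory.

```python
import os
import tempfile

import tables as tb

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "node_error_demo.h5")  # hypothetical file name
    with tb.open_file(path, mode="w") as fileh:
        fileh.create_array(fileh.root, "data", [1, 2, 3])
        try:
            # Creating over an existing node is refused.
            fileh.create_array(fileh.root, "data", [4, 5, 6])
        except tb.NodeError:
            overwrite_refused = True
            # Remove the old node explicitly, then create the new one.
            fileh.remove_node(fileh.root, "data")
            fileh.create_array(fileh.root, "data", [4, 5, 6])
        else:
            overwrite_refused = False
        values = list(fileh.root.data.read())

print(overwrite_refused, values)
```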

exception tables.NoSuchNodeError

An operation was requested on a node that does not exist.

This exception is raised when an operation gets a path name or a (where, name) pair leading to a nonexistent node.

exception tables.UndoRedoError

Problems with undoing/redoing actions with the Undo/Redo feature.

This exception indicates a problem related to the Undo/Redo mechanism, such as trying to undo or redo actions with this mechanism disabled, or going to a nonexistent mark.

exception tables.UndoRedoWarning

Issued when an action not supporting Undo/Redo is run.

This warning is only shown when the Undo/Redo mechanism is enabled.

exception tables.NaturalNameWarning

Issued when a non-pythonic name is given for a node.

This is not an error and may even be very useful in certain contexts, but one should be aware that such nodes cannot be accessed using natural naming (instead, getattr() must be used explicitly).
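
Whether a node name can be reached through natural naming can be approximated with the standard library. The check below is an assumption about what counts as a pythonic name (a valid identifier that is not a Python keyword); it is not necessarily the exact rule PyTables applies, and is_natural_name is a hypothetical helper.

```python
import keyword


def is_natural_name(name):
    """Approximate test: can `name` be written as an attribute (obj.name)?"""
    return name.isidentifier() and not keyword.iskeyword(name)


for name in ("temperature", "2nd_run", "my-table", "class"):
    access = "natural naming" if is_natural_name(name) else "getattr()"
    print(f"{name!r}: {access}")
```

For a name that fails the check, a child node would be reached with getattr(group, name) rather than group.name.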

exception tables.PerformanceWarning

Warning for operations which may cause a performance drop.

This warning is issued when an operation is made on the database which may cause it to slow down on future operations (e.g. making the node tree grow too much).

exception tables.FlavorError

Unsupported or unavailable flavor or flavor conversion.

This exception is raised when an unsupported or unavailable flavor is given to a dataset, or when a conversion of data between two given flavors is unsupported or unavailable.

exception tables.FlavorWarning

Unsupported or unavailable flavor conversion.

This warning is issued when a conversion of data between two given flavors is unsupported or unavailable, and raising an error would render the data inaccessible (e.g. on a dataset of an unavailable flavor in a read-only file).

See the FlavorError class for more information.

exception tables.FiltersWarning

Unavailable filters.

This warning is issued when a valid filter is specified but it is not available in the system. It may mean that an available default filter is to be used instead.

exception tables.OldIndexWarning

Unsupported index format.

This warning is issued when an index in an unsupported format is found. The index will be marked as invalid and will behave as if it doesn’t exist.

exception tables.DataTypeWarning

Unsupported data type.

This warning is issued when an unsupported HDF5 data type is found (normally in a file created with a tool other than PyTables).

exception tables.ExperimentalFeatureWarning

Generic warning for experimental features.

This warning is issued when using a functionality that is still experimental and that users have to use with care.

exception tables.ChunkError

An operation related to direct chunk access failed.

This exception may be related to the properties of the dataset or the chunk being accessed, or to how the chunk is being accessed. It is a base for more specific exceptions.

exception tables.NotChunkedError

A direct chunking operation was attempted on a non-chunked dataset.

For instance, chunk information was requested for a plain Array instance.

exception tables.NotChunkAlignedError

Coordinate not aligned to the chunks.

A direct chunk read/write operation was given coordinates that do not match the chunk’s start.

These operations require coordinates that are integer multiples of the dataset’s chunksize.
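
The alignment requirement amounts to simple modular arithmetic, which can be sketched without PyTables. The helper names and the chunk shape below are hypothetical; in practice the chunk shape would be taken from the dataset being accessed.

```python
def is_chunk_aligned(coords, chunkshape):
    """True if every coordinate is an integer multiple of the chunk size."""
    return all(c % s == 0 for c, s in zip(coords, chunkshape))


def chunk_start(coords, chunkshape):
    """Round each coordinate down to the start of the chunk containing it."""
    return tuple(c - c % s for c, s in zip(coords, chunkshape))


chunkshape = (100, 2)  # hypothetical chunk shape
print(is_chunk_aligned((200, 0), chunkshape))  # True
print(is_chunk_aligned((250, 0), chunkshape))  # False
print(chunk_start((250, 1), chunkshape))       # (200, 0)
```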

exception tables.NoSuchChunkError

The chunk with the given coordinates does not exist in storage.

The coordinates are within the dataset’s shape, though.

This is only an error when the chunk is to be read. Such a missing chunk can be written, in which case it is created in storage.