Migrating from PyTables 2.x to 3.x

Authors: Antonio Valentino, Anthony Scopatz, Thomas Provoost

This document describes the major changes in PyTables in going from the 2.x to 3.x series and what you need to know when migrating downstream code bases.

Python 3 at Last!

The PyTables 3.x series now ships with full compatibility for Python 3.1+. Additionally, we plan on maintaining compatibility with Python 2.7 for the foreseeable future. Python 2.6 is no longer supported but may work in most cases. Note that the entire 3.x series now relies on numexpr v2.1+, which is itself the first version of numexpr to support both Python 2 and 3.

Numeric, Numarray, NetCDF3, & HDF5 1.6 No More!

PyTables no longer supports numeric and numarray. Please use numpy instead. Additionally, the tables.netcdf3 module has been removed. Please refer to the netcdf4-python project for further support. Lastly, the older HDF5 1.6 API is no longer supported. Please upgrade to HDF5 1.8+.

Unicode all the strings!

In Python 3, all strings are natively in Unicode. This introduces some difficulties, as the native HDF5 string format is not Unicode-compatible. To minimize explicit conversion troubles when writing, especially when creating data sets from existing Python objects, string objects are implicitly cast to non-Unicode for HDF5 storage. A warning is issued whenever this implicit cast happens.

This is not true Unicode compatibility; it exists mainly for convenience when working with the pure-Unicode Python 3 string type. Any string that cannot be cast to ASCII when your data set is created will therefore still raise an error. For true Unicode support, look into the VLUnicodeAtom class.
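
As a rough illustration of what "castable as ASCII" means, the following pure-Python sketch checks whether a string survives an ASCII encode. The helper name is_ascii_castable is hypothetical and is not part of PyTables; it only mirrors the rule described above and is not the library's internal implementation.

```python
def is_ascii_castable(text):
    """Hypothetical helper: return True if `text` can be encoded as ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        # At least one character falls outside the 7-bit ASCII range.
        return False
    return True


# A plain ASCII label passes the check, an accented one does not.
print(is_ascii_castable("temperature"))
print(is_ascii_castable("température"))
```

Strings that fail such a check are the ones for which a true Unicode atom such as VLUnicodeAtom is worth considering.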

Major API Changes

The PyTables developers, by popular demand, have taken the opportunity that a major version number upgrade affords to implement significant API changes. We have tried to do this in a way that will not immediately break most existing code, though in some cases breakages may still occur.

PEP 8 Compliance

The PyTables 3.x series now follows the PEP 8 coding standard. This makes code that uses PyTables more consistent with surrounding Python code that also adheres to this standard. The primary way in which the 2.x series was not PEP 8 compliant was its variable naming conventions. Approximately 450 API variables were identified and updated for PyTables 3.x.

To ease migration, PyTables ships with a new pt2to3 command line tool. This tool runs over a file and replaces any instances of the old variable names with the 3.x version of the name. It covers the overwhelming majority of cases and was used to transition the PyTables code base itself! However, it may also accidentally pick up variable names in third-party code that have exactly the same name as a PyTables variable. This is because pt2to3 was implemented using regular expressions rather than a fancier AST-based method. By using regexes, pt2to3 works on both Python and Cython code.
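
The trade-off of a regex-based rename can be sketched in a few lines. This is not the actual pt2to3 source; it is a minimal illustration that assumes a three-entry subset of the name mapping listed later in this document, and the names RENAMES, PATTERN and rename_2x_to_3x are made up for the example.

```python
import re

# A small subset of the 2.x -> 3.x names listed in this document.
RENAMES = {
    "openFile": "open_file",
    "createArray": "create_array",
    "getNode": "get_node",
}

# \b word boundaries keep the pattern from matching inside longer identifiers.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def rename_2x_to_3x(source):
    """Replace every whole-word occurrence of a 2.x name with its 3.x name."""
    return PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


old_code = (
    "h5 = tables.openFile('data.h5')\n"
    "node = other_lib.getNode(h5)\n"   # not a PyTables call, but same name
    "my_getNodeHelper()\n"             # longer identifier, left untouched
)
print(rename_2x_to_3x(old_code))
```

Because the pattern only sees text, the call on other_lib is renamed as well, which is exactly the kind of false positive described above. The same text-only approach is what allows such a tool to process Cython sources that a Python AST parser would reject.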

pt2to3 help:

usage: pt2to3 [-h] [-r] [-p] [-o OUTPUT] [-i] filename

PyTables 2.x -> 3.x API transition tool This tool displays to standard out, so
it is common to pipe this to another file: $ pt2to3 oldfile.py > newfile.py

positional arguments:
  filename              path to input file.

optional arguments:
  -h, --help            show this help message and exit
  -r, --reverse         reverts changes, going from 3.x -> 2.x.
  -p, --no-ignore-previous
                        ignores previous_api() calls.
  -o OUTPUT             output file to write to.
  -i, --inplace         overwrites the file in-place.

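Note that pt2to3 only works on a single file, not a directory. However, a simple Bash script may be written to run pt2to3 over all Python and Cython files in a directory and its sub-directories:

#!/bin/bash
# Convert every Python and Cython source file below the current directory.
find . -type f \( -name '*.py' -o -name '*.pyx' \) -print0 |
while IFS= read -r -d '' f
do
    echo "$f"
    # Replace the original only if pt2to3 succeeded.
    pt2to3 "$f" > temp.txt && mv temp.txt "$f"
done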
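
The same directory walk can also be sketched in Python. This version assumes that only .py and .pyx files should be converted and that pt2to3 is on the PATH; the function names find_sources and convert_tree are illustrative. It relies on the -i (in-place) flag shown in the help text above.

```python
import subprocess
from pathlib import Path

# Assumption: only Python and Cython sources should be converted.
SUFFIXES = {".py", ".pyx"}


def find_sources(root):
    """Return all regular files under `root` with a matching suffix, sorted."""
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.suffix in SUFFIXES
    )


def convert_tree(root):
    """Run `pt2to3 -i` on every source file found under `root`."""
    for path in find_sources(root):
        print(path)
        # check=True stops at the first file that pt2to3 fails on.
        subprocess.run(["pt2to3", "-i", str(path)], check=True)
```

Calling convert_tree(".") from the top of a project would rewrite the files in place, so it is best run on a tree that is under version control.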

Note

pt2to3 uses the argparse module that is part of the Python standard library since Python 2.7. Users of Python 2.6 should install argparse separately (e.g. via pip).

The old APIs and variable names will continue to be supported for the short term, where possible. (The major backwards incompatible changes come from the renaming of some function and method arguments and keyword arguments.) Using the 2.x APIs in the 3.x series, however, will issue warnings. The following is the release plan for the warning types:

  • 3.0 - PendingDeprecationWarning

  • 3.1 - DeprecationWarning

  • >=3.2 - Remove warnings, previous_api(), and _past.py; keep pt2to3.

The current plan is to maintain the old APIs for at least 2 years, though this is subject to change.
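
Python's default warning filters hide PendingDeprecationWarning, so remaining uses of the old names can go unnoticed unless the filter is changed. The sketch below shows one way to surface them with the standard warnings module. The openFile function here is a hypothetical stand-in that merely imitates an old-style name emitting such a warning; it is not the PyTables implementation.

```python
import warnings


def openFile(*args, **kwargs):
    """Hypothetical stand-in for a 2.x-style name that warns when called."""
    warnings.warn(
        "openFile() is pending deprecation, use open_file()",
        PendingDeprecationWarning,
        stacklevel=2,
    )
    return "opened"


def old_api_calls(func):
    """Run `func` and return the messages of any deprecation warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" overrides the default filter that would hide the warning.
        warnings.simplefilter("always")
        func()
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, (PendingDeprecationWarning, DeprecationWarning))
    ]


print(old_api_calls(openFile))
```

For a whole test suite, running the interpreter with -W error::PendingDeprecationWarning turns each such warning into an exception instead.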

Consistent create_xxx() Signatures

Also by popular demand, it is now possible to create all data sets (Array, CArray, EArray, VLArray, and Table) from existing Python objects. Constructors for these classes now accept either of the following keyword arguments:

  • an obj to initialize with data

  • or both atom and shape to initialize an empty structure, if possible.

These keyword arguments are also now part of the function signature for the corresponding create_xxx() methods on the File class. These would be called as follows:

# All create methods will support the following
create_xxx(where, name, obj=obj)

# All non-variable length arrays support the following:
create_xxx(where, name, atom=atom, shape=shape)

The obj argument and the atom/shape pair are mutually exclusive. Previously, only Array could be created from an existing Python object, using the object keyword argument.
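
A short sketch of the two calling forms follows. It assumes NumPy and PyTables 3.x are installed; the node names, the temporary file name and the function name create_examples are illustrative only.

```python
import os
import tempfile


def create_examples():
    """Create one array from data and one empty array; return their shapes."""
    # Third-party imports are kept local so they are only needed at call time.
    import numpy as np
    import tables

    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.h5")
        with tables.open_file(path, mode="w") as h5:
            # Form 1: initialize the data set from an existing object.
            from_obj = h5.create_array(
                "/", "from_obj", obj=np.arange(6).reshape(2, 3)
            )
            # Form 2: describe an empty structure with atom and shape.
            empty = h5.create_carray(
                "/", "empty", atom=tables.Float64Atom(), shape=(4, 5)
            )
            return tuple(from_obj.shape), tuple(empty.shape)
```

Here the first data set takes its type and shape from the NumPy array, while the second is allocated from the atom and shape alone and can be filled later.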

API Name Changes

The following table shows the old 2.x names that have been updated to their new values in the 3.x series. Please use the pt2to3 tool to convert between these.

2.x Name -> 3.x Name

AtomFromHDF5Type -> atom_from_hdf5_type
AtomToHDF5Type -> atom_to_hdf5_type
BoolTypeNextAfter -> bool_type_next_after
HDF5ClassToString -> hdf5_class_to_string
HDF5ToNPExtType -> hdf5_to_np_ext_type
HDF5ToNPNestedType -> hdf5_to_np_nested_type
IObuf -> iobuf
IObufcpy -> iobufcpy
IntTypeNextAfter -> int_type_next_after
NPExtPrefixesToPTKinds -> npext_prefixes_to_ptkinds
PTSpecialKinds -> pt_special_kinds
PTTypeToHDF5 -> pttype_to_hdf5
StringNextAfter -> string_next_after
__allowedInitKwArgs -> __allowed_init_kwargs
__getRootGroup -> __get_root_group
__next__inKernel -> __next__inkernel
_actionLogName -> _action_log_name
_actionLogParent -> _action_log_parent
_actionLogPath -> _action_log_path
_addRowsToIndex -> _add_rows_to_index
_appendZeros -> _append_zeros
_autoIndex -> _autoindex
_byteShape -> _byte_shape
_c_classId -> _c_classid
_c_shadowNameRE -> _c_shadow_name_re
_cacheDescriptionData -> _cache_description_data
_checkAndSetPair -> _check_and_set_pair
_checkAttributes -> _check_attributes
_checkBase -> _checkbase
_checkColumn -> _check_column
_checkGroup -> _check_group
_checkNotClosed -> _check_not_closed
_checkOpen -> _check_open
_checkShape -> _check_shape
_checkShapeAppend -> _check_shape_append
_checkUndoEnabled -> _check_undo_enabled
_checkWritable -> _check_writable
_check_sortby_CSI -> _check_sortby_csi
_closeFile -> _close_file
_codeToOp -> _code_to_op
_column__createIndex -> _column__create_index
_compileCondition -> _compile_condition
_conditionCache -> _condition_cache
_convertTime64 -> _convert_time64
_convertTime64_ -> _convert_time64_
_convertTypes -> _convert_types
_createArray -> _create_array
_createCArray -> _create_carray
_createMark -> _create_mark
_createPath -> _create_path
_createTable -> _create_table
_createTransaction -> _create_transaction
_createTransactionGroup -> _create_transaction_group
_disableIndexingInQueries -> _disable_indexing_in_queries
_doReIndex -> _do_reindex
_emptyArrayCache -> _empty_array_cache
_enableIndexingInQueries -> _enable_indexing_in_queries
_enabledIndexingInQueries -> _enabled_indexing_in_queries
_exprvarsCache -> _exprvars_cache
_f_copyChildren -> _f_copy_children
_f_delAttr -> _f_delattr
_f_getAttr -> _f_getattr
_f_getChild -> _f_get_child
_f_isVisible -> _f_isvisible
_f_iterNodes -> _f_iter_nodes
_f_listNodes -> _f_list_nodes
_f_setAttr -> _f_setattr
_f_walkGroups -> _f_walk_groups
_f_walkNodes -> _f_walknodes
_fancySelection -> _fancy_selection
_fillCol -> _fill_col
_flushBufferedRows -> _flush_buffered_rows
_flushFile -> _flush_file
_flushModRows -> _flush_mod_rows
_g_addChildrenNames -> _g_add_children_names
_g_checkGroup -> _g_check_group
_g_checkHasChild -> _g_check_has_child
_g_checkName -> _g_check_name
_g_checkNotContains -> _g_check_not_contains
_g_checkOpen -> _g_check_open
_g_closeDescendents -> _g_close_descendents
_g_closeGroup -> _g_close_group
_g_copyAsChild -> _g_copy_as_child
_g_copyChildren -> _g_copy_children
_g_copyRows -> _g_copy_rows
_g_copyRows_optim -> _g_copy_rows_optim
_g_copyWithStats -> _g_copy_with_stats
_g_createHardLink -> _g_create_hard_link
_g_delAndLog -> _g_del_and_log
_g_delLocation -> _g_del_location
_g_flushGroup -> _g_flush_group
_g_getAttr -> _g_getattr
_g_getChildGroupClass -> _g_get_child_group_class
_g_getChildLeafClass -> _g_get_child_leaf_class
_g_getGChildAttr -> _g_get_gchild_attr
_g_getLChildAttr -> _g_get_lchild_attr
_g_getLinkClass -> _g_get_link_class
_g_listAttr -> _g_list_attr
_g_listGroup -> _g_list_group
_g_loadChild -> _g_load_child
_g_logAdd -> _g_log_add
_g_logCreate -> _g_log_create
_g_logMove -> _g_log_move
_g_maybeRemove -> _g_maybe_remove
_g_moveNode -> _g_move_node
_g_postInitHook -> _g_post_init_hook
_g_postReviveHook -> _g_post_revive_hook
_g_preKillHook -> _g_pre_kill_hook
_g_propIndexes -> _g_prop_indexes
_g_readCoords -> _g_read_coords
_g_readSelection -> _g_read_selection
_g_readSlice -> _g_read_slice
_g_readSortedSlice -> _g_read_sorted_slice
_g_refNode -> _g_refnode
_g_removeAndLog -> _g_remove_and_log
_g_setAttr -> _g_setattr
_g_setLocation -> _g_set_location
_g_setNestedNamesDescr -> _g_set_nested_names_descr
_g_setPathNames -> _g_set_path_names
_g_unrefNode -> _g_unrefnode
_g_updateDependent -> _g_update_dependent
_g_updateLocation -> _g_update_location
_g_updateNodeLocation -> _g_update_node_location
_g_updateTableLocation -> _g_update_table_location
_g_widthWarning -> _g_width_warning
_g_writeCoords -> _g_write_coords
_g_writeSelection -> _g_write_selection
_g_writeSlice -> _g_write_slice
_getColumnInstance -> _get_column_instance
_getConditionKey -> _get_condition_key
_getContainer -> _get_container
_getEnumMap -> _get_enum_map
_getFileId -> _get_file_id
_getFinalAction -> _get_final_action
_getInfo -> _get_info
_getLinkClass -> _get_link_class
_getMarkID -> _get_mark_id
_getNode -> _get_node
_getOrCreatePath -> _get_or_create_path
_getTypeColNames -> _get_type_col_names
_getUnsavedNrows -> _get_unsaved_nrows
_getValueFromContainer -> _get_value_from_container
_hiddenNameRE -> _hidden_name_re
_hiddenPathRE -> _hidden_path_re
_indexNameOf -> _index_name_of
_indexNameOf_ -> _index_name_of_
_indexPathnameOf -> _index_pathname_of
_indexPathnameOfColumn -> _index_pathname_of_column
_indexPathnameOfColumn_ -> _index_pathname_of_column_
_indexPathnameOf_ -> _index_pathname_of_
_initLoop -> _init_loop
_initSortedSlice -> _init_sorted_slice
_isWritable -> _iswritable
_is_CSI -> _is_csi
_killNode -> _killnode
_lineChunkSize -> _line_chunksize
_lineSeparator -> _line_separator
_markColumnsAsDirty -> _mark_columns_as_dirty
_newBuffer -> _new_buffer
_notReadableError -> _not_readable_error
_npSizeType -> _npsizetype
_nxTypeFromNPType -> _nxtype_from_nptype
_opToCode -> _op_to_code
_openArray -> _open_array
_openUnImplemented -> _open_unimplemented
_pointSelection -> _point_selection
_processRange -> _process_range
_processRangeRead -> _process_range_read
_pythonIdRE -> _python_id_re
_reIndex -> _reindex
_readArray -> _read_array
_readCoordinates -> _read_coordinates
_readCoords -> _read_coords
_readIndexSlice -> _read_index_slice
_readSelection -> _read_selection
_readSlice -> _read_slice
_readSortedSlice -> _read_sorted_slice
_refNode -> _refnode
_requiredExprVars -> _required_expr_vars
_reservedIdRE -> _reserved_id_re
_reviveNode -> _revivenode
_saveBufferedRows -> _save_buffered_rows
_searchBin -> _search_bin
_searchBinNA_b -> _search_bin_na_b
_searchBinNA_d -> _search_bin_na_d
_searchBinNA_e -> _search_bin_na_e
_searchBinNA_f -> _search_bin_na_f
_searchBinNA_g -> _search_bin_na_g
_searchBinNA_i -> _search_bin_na_i
_searchBinNA_ll -> _search_bin_na_ll
_searchBinNA_s -> _search_bin_na_s
_searchBinNA_ub -> _search_bin_na_ub
_searchBinNA_ui -> _search_bin_na_ui
_searchBinNA_ull -> _search_bin_na_ull
_searchBinNA_us -> _search_bin_na_us
_setAttributes -> _set_attributes
_setColumnIndexing -> _set_column_indexing
_shadowName -> _shadow_name
_shadowParent -> _shadow_parent
_shadowPath -> _shadow_path
_sizeToShape -> _size_to_shape
_tableColumnPathnameOfIndex -> _table_column_pathname_of_index
_tableFile -> _table_file
_tablePath -> _table_path
_table__autoIndex -> _table__autoindex
_table__getautoIndex -> _table__getautoindex
_table__setautoIndex -> _table__setautoindex
_table__whereIndexed -> _table__where_indexed
_transGroupName -> _trans_group_name
_transGroupParent -> _trans_group_parent
_transGroupPath -> _trans_group_path
_transName -> _trans_name
_transParent -> _trans_parent
_transPath -> _trans_path
_transVersion -> _trans_version
_unrefNode -> _unrefnode
_updateNodeLocations -> _update_node_locations
_useIndex -> _use_index
_vShape -> _vshape
_vType -> _vtype
_v__nodeFile -> _v__nodefile
_v__nodePath -> _v__nodepath
_v_colObjects -> _v_colobjects
_v_maxGroupWidth -> _v_max_group_width
_v_maxTreeDepth -> _v_maxtreedepth
_v_nestedDescr -> _v_nested_descr
_v_nestedFormats -> _v_nested_formats
_v_nestedNames -> _v_nested_names
_v_objectID -> _v_objectid
_whereCondition -> _where_condition
_writeCoords -> _write_coords
_writeSelection -> _write_selection
_writeSlice -> _write_slice
appendLastRow -> append_last_row
attrFromShadow -> attr_from_shadow
attrToShadow -> attr_to_shadow
autoIndex -> autoindex
bufcoordsData -> bufcoords_data
calcChunksize -> calc_chunksize
checkFileAccess -> check_file_access
checkNameValidity -> check_name_validity
childName -> childname
chunkmapData -> chunkmap_data
classIdDict -> class_id_dict
className -> classname
classNameDict -> class_name_dict
containerRef -> containerref
convertToNPAtom -> convert_to_np_atom
convertToNPAtom2 -> convert_to_np_atom2
copyChildren -> copy_children
copyClass -> copyclass
copyFile -> copy_file
copyLeaf -> copy_leaf
copyNode -> copy_node
copyNodeAttrs -> copy_node_attrs
countLoggedInstances -> count_logged_instances
createArray -> create_array
createCArray -> create_carray
createCSIndex -> create_csindex
createEArray -> create_earray
createExternalLink -> create_external_link
createGroup -> create_group
createHardLink -> create_hard_link
createIndex -> create_index
createIndexesDescr -> create_indexes_descr
createIndexesTable -> create_indexes_table
createNestedType -> create_nested_type
createSoftLink -> create_soft_link
createTable -> create_table
createVLArray -> create_vlarray
defaultAutoIndex -> default_auto_index
defaultIndexFilters -> default_index_filters
delAttr -> del_attr
delAttrs -> _del_attrs
delNodeAttr -> del_node_attr
detectNumberOfCores -> detect_number_of_cores
disableUndo -> disable_undo
dumpGroup -> dump_group
dumpLeaf -> dump_leaf
dumpLoggedInstances -> dump_logged_instances
enableUndo -> enable_undo
enumFromHDF5 -> enum_from_hdf5
enumToHDF5 -> enum_to_hdf5
fetchLoggedInstances -> fetch_logged_instances
flushRowsToIndex -> flush_rows_to_index
getAttr -> get_attr
getAttrs -> _get_attrs
getClassByName -> get_class_by_name
getColsInOrder -> get_cols_in_order
getCurrentMark -> get_current_mark
getEnum -> get_enum
getFilters -> get_filters
getHDF5Version -> get_hdf5_version
getIndices -> get_indices
getLRUbounds -> get_lru_bounds
getLRUsorted -> get_lru_sorted
getLookupRange -> get_lookup_range
getNestedField -> get_nested_field
getNestedFieldCache -> get_nested_field_cache
getNestedType -> get_nested_type
getNode -> get_node
getNodeAttr -> get_node_attr
getPyTablesVersion -> get_pytables_version
getTypeEnum -> get_type_enum
getWhereList -> get_where_list
hdf5Extension -> hdf5extension
hdf5Version -> hdf5_version
indexChunk -> indexchunk
indexValid -> indexvalid
indexValidData -> index_valid_data
indexValues -> indexvalues
indexValuesData -> index_values_data
indexesExtension -> indexesextension
infType -> inftype
infinityF -> infinityf
infinityMap -> infinitymap
initRead -> initread
isHDF5File -> is_hdf5_file
isPyTablesFile -> is_pytables_file
isUndoEnabled -> is_undo_enabled
isVisible -> isvisible
isVisibleName -> isvisiblename
isVisibleNode -> is_visible_node
isVisiblePath -> isvisiblepath
is_CSI -> is_csi
iterNodes -> iter_nodes
iterseqMaxElements -> iterseq_max_elements
joinPath -> join_path
joinPaths -> join_paths
linkExtension -> linkextension
listLoggedInstances -> list_logged_instances
listNodes -> list_nodes
loadEnum -> load_enum
logInstanceCreation -> log_instance_creation
lrucacheExtension -> lrucacheextension
metaIsDescription -> MetaIsDescription
modifyColumn -> modify_column
modifyColumns -> modify_columns
modifyCoordinates -> modify_coordinates
modifyRows -> modify_rows
moveFromShadow -> move_from_shadow
moveNode -> move_node
moveToShadow -> move_to_shadow
newNode -> new_node
newSet -> newset
newdstGroup -> newdst_group
objectID -> object_id
oldPathname -> oldpathname
openFile -> open_file
openNode -> open_node
parentNode -> parentnode
parentPath -> parentpath
reIndex -> reindex
reIndexDirty -> reindex_dirty
readCoordinates -> read_coordinates
readIndices -> read_indices
readSlice -> read_slice
readSorted -> read_sorted
readWhere -> read_where
read_sliceLR -> read_slice_lr
recreateIndexes -> recreate_indexes
redoAddAttr -> redo_add_attr
redoCreate -> redo_create
redoDelAttr -> redo_del_attr
redoMove -> redo_move
redoRemove -> redo_remove
removeIndex -> remove_index
removeNode -> remove_node
removeRows -> remove_rows
renameNode -> rename_node
rootUEP -> root_uep
searchLastRow -> search_last_row
setAttr -> set_attr
setAttrs -> _set_attrs
setBloscMaxThreads -> set_blosc_max_threads
setInputsRange -> set_inputs_range
setNodeAttr -> set_node_attr
setOutput -> set_output
setOutputRange -> set_output_range
silenceHDF5Messages -> silence_hdf5_messages
splitPath -> split_path
tableExtension -> tableextension
undoAddAttr -> undo_add_attr
undoCreate -> undo_create
undoDelAttr -> undo_del_attr
undoMove -> undo_move
undoRemove -> undo_remove
utilsExtension -> utilsextension
walkGroups -> walk_groups
walkNodes -> walk_nodes
whereAppend -> append_where
whereCond -> wherecond
whichClass -> which_class
whichLibVersion -> which_lib_version
willQueryUseIndexing -> will_query_use_indexing
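
The renames above are not a purely mechanical camelCase-to-snake_case conversion, which is one reason to rely on pt2to3 rather than a hand-written rule. The sketch below compares a naive converter against a handful of entries copied from the list; the function naive_snake_case and the SAMPLE dictionary are illustrative only and are not part of PyTables.

```python
import re

# A few 2.x -> 3.x pairs copied from the list above.
SAMPLE = {
    "createArray": "create_array",
    "openFile": "open_file",
    "createCArray": "create_carray",
    "isVisible": "isvisible",
    "whereAppend": "append_where",
    "getAttrs": "_get_attrs",
}


def naive_snake_case(name):
    """Insert '_' before a capital that follows a lowercase letter or digit."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


# Names for which the mechanical rule disagrees with the real 3.x name.
mismatches = [old for old, new in SAMPLE.items() if naive_snake_case(old) != new]
print(mismatches)
```

In this sample the naive rule gets the first three names right but fails on isVisible, whereAppend and getAttrs, so an explicit mapping is needed.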


Enjoy data!

—The PyTables Developers