h5py-Tutorial

Data binge, March 8th, 2019

Import packages

In [10]:
import h5py
import numpy as np
import matplotlib.pyplot as plt

The general paradigm for working with HDF5 files is to:

  1. Open/Create the object
  2. Access the object
  3. Close the object

In [11]:
f = h5py.File('/Users/chgautschi/Desktop/laser_stim_exp.h5','r')
# Do your analysis/modification here
f.close()

-> Data is only guaranteed to be on disk after calling .close() (or .flush())
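The open/access/close pattern can be sketched on a throwaway file. This is a minimal sketch, not the workflow used for the experiment file: the scratch path and the dataset name `values` are placeholders made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file so the example does not touch real data.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# 1. Open/create the file object ('w' creates or truncates the file).
f = h5py.File(demo_path, "w")
try:
    # 2. Access the object: here we write a small dataset.
    f.create_dataset("values", data=np.arange(5))
    # flush() asks the HDF5 library to push its buffers to disk before close().
    f.flush()
finally:
    # 3. Close the object, even if the code above raised an error.
    f.close()

# A closed file handle evaluates to False.
file_is_open = bool(f)

# Reopen read-only to confirm that the data reached the disk.
with h5py.File(demo_path, "r") as f2:
    stored = f2["values"][:]

print(file_is_open, stored)
```

The `with` form used in the rest of this tutorial does the same thing as the `try`/`finally` above and is usually the safer choice.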

Open and browse groups of an existing HDF5 file

In [12]:
with h5py.File('/Users/chgautschi/Desktop/laser_stim_exp.h5','r') as f:
    for group in f.keys():
        print(group)
124563082491
1304000026
2016050966
2016080033
801010272
801010278
801010378
801010459
801010534
801010543
801010546

-> Groups work somewhat like Python dictionaries

In [9]:
with h5py.File('/Users/chgautschi/Desktop/laser_stim_exp.h5','r') as f:
    for group,content in f.items():
        print(group)
        for sub_group in content.keys():
            print('\t'+sub_group)
124563082491
	log
	ref_im
	targets
	trial_image
	trial_videos
1304000026
	log
	ref_im
	targets
	trial_image
	trial_videos
2016050966
	log
	trial_image
	trial_videos
2016080033
	log
	ref_im
	targets
	trial_image
	trial_videos
801010272
	log
	ref_im
	targets
	trial_image
	trial_videos
801010278
	log
	trial_image
	trial_videos
801010378
	log
	trial_image
	trial_videos
801010459
	log
	trial_image
	trial_videos
801010534
	log
	ref_im
	trial_image
	trial_videos
801010543
	log
	ref_im
	targets
	trial_image
	trial_videos
801010546
	log
	trial_image
	trial_videos
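For deeper hierarchies, nested loops get unwieldy. The sketch below uses `visititems()` to walk a file recursively and also shows dictionary-style membership tests and `get()`. It runs on a small demo file built on the spot; the names `mouse_a`, `ref_im` and `trial_1` are placeholders, not entries of the experiment file.

```python
import os
import tempfile

import h5py
import numpy as np

demo_path = os.path.join(tempfile.mkdtemp(), "browse_demo.h5")

# Build a tiny file whose layout loosely mimics the one above (placeholder names).
with h5py.File(demo_path, "w") as f:
    mouse = f.create_group("mouse_a")
    mouse.create_dataset("ref_im", data=np.zeros((4, 4, 3), dtype=np.uint8))
    videos = mouse.create_group("trial_videos")
    videos.create_dataset("trial_1", data=np.zeros((2, 4, 4), dtype=np.uint8))

found = []

def record(name, obj):
    """Collect (path, kind) for every object below the root group."""
    kind = "dataset" if isinstance(obj, h5py.Dataset) else "group"
    found.append((name, kind))
    # Returning None tells visititems() to keep going.

with h5py.File(demo_path, "r") as f:
    has_mouse = "mouse_a" in f           # dictionary-style membership test
    missing = f.get("no_such_mouse")     # None instead of a KeyError
    f.visititems(record)                 # recurses into all subgroups

for name, kind in found:
    print(f"{kind:8s}{name}")
```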

Dataset vs. numpy array

In [13]:
with h5py.File('/Users/chgautschi/Desktop/laser_stim_exp.h5','r') as f:
    mouse_ref_im_dset = f['801010543/ref_im']
    print(type(mouse_ref_im_dset))
    print(mouse_ref_im_dset.shape)
    mouse_ref_im = f['801010543/ref_im'][:,:,:]
    print(type(mouse_ref_im))
<class 'h5py._hl.dataset.Dataset'>
(256, 256, 3)
<class 'numpy.ndarray'>

-> Datasets live on disk, NumPy arrays live in RAM

In [14]:
plt.imshow(mouse_ref_im)
Out[14]:
<matplotlib.image.AxesImage at 0x119321da0>
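The difference shows up once the file is closed: the array is still usable (which is why the plot above works outside the `with` block), while the dataset handle is not. A minimal sketch on a scratch file, with a placeholder dataset name:

```python
import os
import tempfile

import h5py
import numpy as np

demo_path = os.path.join(tempfile.mkdtemp(), "dset_vs_array.h5")

with h5py.File(demo_path, "w") as f:
    f.create_dataset("ref_im", data=np.ones((4, 4, 3), dtype=np.uint8))

with h5py.File(demo_path, "r") as f:
    dset = f["ref_im"]   # handle to data that stays on disk
    arr = dset[...]      # reads everything into a NumPy array in RAM

# The array is an independent copy and survives closing the file.
arr_sum = int(arr.sum())

# The dataset handle became invalid when the file was closed.
dset_valid = bool(dset)

print(arr_sum, dset_valid)
```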

Partial access of (big) dataset

In [17]:
with h5py.File('/Users/chgautschi/Desktop/laser_stim_exp.h5','r') as f:
    print(f['801010543/trial_videos/trial_4'].shape)
    video_slice = f['801010543/trial_videos/trial_4'][300:302,:,:]

fig = plt.figure(figsize=(10,10))
for ids,vid in enumerate(video_slice):  
    fig.add_subplot(1,2,ids+1)
    plt.imshow(vid)
(1327, 256, 256)

-> Datasets work somewhat like NumPy arrays

You can slice into multi-terabyte datasets stored on disk as if they were real NumPy arrays.

-> Data is read from disk only when needed
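Partial access also makes it possible to process a dataset that does not fit in RAM by reading it block by block. The sketch below computes a mean frame that way. The synthetic video, the dataset name `video` and the block size of 10 frames are assumptions chosen for the demo.

```python
import os
import tempfile

import h5py
import numpy as np

demo_path = os.path.join(tempfile.mkdtemp(), "partial_demo.h5")

# Synthetic stand-in for a video: 50 frames of 8 x 8 pixels.
rng = np.random.default_rng(0)
frames = rng.integers(0, 255, size=(50, 8, 8), dtype=np.uint8)

with h5py.File(demo_path, "w") as f:
    f.create_dataset("video", data=frames)

block = 10  # number of frames held in RAM at any one time
with h5py.File(demo_path, "r") as f:
    dset = f["video"]
    total = np.zeros(dset.shape[1:], dtype=np.float64)
    for start in range(0, dset.shape[0], block):
        chunk = dset[start:start + block]           # only these frames are read
        total += chunk.sum(axis=0, dtype=np.float64)
    mean_frame = total / dset.shape[0]

print(mean_frame.shape)
```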

Writing of new datasets in HDF5 file

Steps to create a dataset:

  1. Define the dataset characteristics (datatype, dataspace, properties).
  2. Decide which group to attach the dataset to.
  3. Create the dataset.
  4. Close the dataset.

In [16]:
path = '/Users/chgautschi/Desktop/2019_2_13_IK2_reaching1.raw'
mouse_id = '801010272'
vid = np.fromfile(path,dtype=np.uint8).reshape((-1,320,640))  # uint8 matches the dtype the dataset is stored with
timestamp = 1551984753.0323927

with h5py.File('/Users/chgautschi/Desktop/laser_stim_exp.h5','r+') as f:
    behavior_group = f[mouse_id].require_group('behavior_videos')
    b_vid = behavior_group.require_dataset('session_1',shape=vid.shape,dtype=np.uint8,data=vid)
    
    b_vid.attrs.create('timestamp',timestamp)
    b_vid.attrs.create('parameter_a',0)
    b_vid.attrs.create('parameter_b',1)
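To check that a write went through, the dataset and its attributes can be read back. The sketch below repeats the steps on a scratch file with a small synthetic video. The group name `mouse_a`, the reduced frame size and the gzip compression are assumptions for the demo and not part of the cell above.

```python
import os
import tempfile

import h5py
import numpy as np

demo_path = os.path.join(tempfile.mkdtemp(), "write_demo.h5")

# Synthetic stand-in for a behavior video (3 frames, 32 x 64 pixels).
video = np.zeros((3, 32, 64), dtype=np.uint8)
video[1] = 200  # a value above 127, which only fits in an unsigned byte

with h5py.File(demo_path, "w") as f:
    # require_group() returns the group if it exists and creates it otherwise.
    group = f.require_group("mouse_a").require_group("behavior_videos")
    dset = group.create_dataset("session_1", data=video, compression="gzip")
    # Attributes store small pieces of metadata next to the data.
    dset.attrs["timestamp"] = 1551984753.0323927
    dset.attrs["parameter_a"] = 0

with h5py.File(demo_path, "r") as f:
    dset = f["mouse_a/behavior_videos/session_1"]
    stored_shape = dset.shape
    stored_dtype = dset.dtype
    stored_attrs = dict(dset.attrs)
    max_value = int(dset[1].max())

print(stored_shape, stored_dtype, stored_attrs, max_value)
```

Compression is optional. It is applied transparently on write and undone on read, so the slicing shown earlier works unchanged.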