Contents
Normally, user needs to follow below code to process many files
In [1]: import pytraj as pt
In [2]: import numpy as np
In [3]: template = 'tz2.%s.nc'
In [4]: flist = [template % str(i) for i in range(4)]
In [5]: print(flist)
['tz2.0.nc', 'tz2.1.nc', 'tz2.2.nc', 'tz2.3.nc']
In [6]: trajlist = [pt.iterload(fname, 'tz2.parm7') for fname in flist]
# calculate radgyr
In [7]: data = []
In [8]: for traj in trajlist:
...: data.append(pt.radgyr(traj))
...:
In [9]: data = np.array(data).flatten()
In [10]: print(data)
[ 9.4451 9.5746 10.1923 ..., 7.4834 7.4324 7.6204]
However, pytraj
offer shorter (and easy?) way to do
# load all files into a single TrajectoryIterator
In [11]: filelist = ['tz2.0.nc', 'tz2.1.nc', 'tz2.2.nc']
In [12]: traj = pt.iterload(filelist, 'tz2.parm7')
# perform calculation
In [13]: data = pt.radgyr(traj)
# that's it
In [14]: print(data)
[ 9.4451 9.5746 10.1923 ..., 7.4834 7.4324 7.6204]
Supposed you have a list of 5 (or whatever) trajectories, you only want to load those files 1st to 100-th frames
and skip every 10 frames. Below is a convention cpptraj
input.
parm 2koc.parm7
trajin traj0.nc 1 100 10
trajin traj1.nc 1 100 10
trajin traj2.nc 1 100 10
trajin traj3.nc 1 100 10
trajin traj4.nc 1 100 10
In pytraj
, you can specify frame_slice
import pytraj as pt
pt.iterload('traj*.nc', top='2koc.parm7', frame_slice=[(0, 100, 10),]*5)
# [(0, 100, 10),]*5 is equal to [(0, 100, 10), (0, 100, 10),(0, 100, 10),(0, 100, 10),(0, 100, 10),]
In [15]: import pytraj as pt
In [16]: frame_indices = [2, 4, 7, 51, 53]
# use ``load`` to load those frames to memory
In [17]: traj0 = pt.load('tz2.nc', 'tz2.parm7', frame_indices=frame_indices)
In [18]: traj0
Out[18]:
pytraj.Trajectory, 5 frames:
Size: 0.000025 (GB)
<Topology: 223 atoms, 13 residues, 1 mols, non-PBC>
# only loadd coordinates for specific atoms
In [19]: traj1 = pt.load('tz2.nc', 'tz2.parm7', frame_indices=frame_indices, mask='@CA')
In [20]: traj1
Out[20]:
pytraj.Trajectory, 5 frames:
Size: 0.000001 (GB)
<Topology: 12 atoms, 12 residues, 12 mols, non-PBC>
# or use ``iterload``
In [21]: frame_indices = [2, 4, 7, 51, 53]
In [22]: traj2 = pt.iterload('tz2.nc', 'tz2.parm7')
In [23]: traj2
Out[23]:
pytraj.TrajectoryIterator, 101 frames:
Size: 0.000503 (GB)
<Topology: 223 atoms, 13 residues, 1 mols, non-PBC>
In [24]: traj2[frame_indices, '@CA']
Out[24]:
pytraj.Trajectory, 5 frames:
Size: 0.000001 (GB)
<Topology: 12 atoms, 12 residues, 12 mols, non-PBC>
In [25]: import pytraj as pt
# load multiple files
In [26]: traj = pt.iterload(['tz2.0.nc', 'tz2.1.nc', 'tz2.2.nc'], top='tz2.parm7')
In [27]: traj.save('tz2_combined.nc', overwrite=True)
If memory is critical, do not load all frames into memory.
# DO this (only a single frame will be loaded to memory)
In [28]: pt.radgyr(traj, frame_indices=[0, 200, 300, 301])
Out[28]: array([ 9.4451, 7.4324, 7.4834, 7.4324])
# DON'T do this if you want to save memory (all 4 frames will be loaded to memory)
In [29]: pt.radgyr(traj[[0, 200, 300, 301]])
Out[29]: array([ 9.4451, 7.4324, 7.4834, 7.4324])
In [30]: pt.iterframe(traj, frame_indices=[0, 200, 300, 301])
Out[30]:
<FrameIterator with start=None, stop=None, step=None, n_frames=4,
frame_indices=[0, 200, 300, 301],
mask=None, autoimage=False, rmsfit=None, copy=False>
In [31]: traj[[0, 200, 300, 301]]
Out[31]:
pytraj.Trajectory, 4 frames:
Size: 0.000020 (GB)
<Topology: 223 atoms, 13 residues, 1 mols, non-PBC>
Example: calculate pairwise rmsd for 3000 frames (~3.9 G) only costs 112.7 MB
$ python memory/pairwise_rmsd_shorter_version.py
pytraj.TrajectoryIterator, 3000 frames:
Size: 3.911264 (GB)
<Topology: 58329 atoms, 19202 residues, 19168 mols, PBC with box type = ortho>
Filename: memory/pairwise_rmsd_shorter_version.py
Line # Mem usage Increment Line Contents
================================================
6 30.1 MiB 0.0 MiB @profile
7 def compute(mask='!:WAT,K+,Cl-', mat_type='half'):
8 64.0 MiB 33.9 MiB traj = pt.iterload('GAAC3.5000frames.nc', 'GAAC.topo', frame_slice=(0, 3000))
9 64.1 MiB 0.1 MiB print(traj)
10 112.7 MiB 48.5 MiB return pt.pairwise_rmsd(traj, mask=mask, mat_type=mat_type)
See also: Fancy indexing of trajectory
# convert Amber netcdf to Charmm dcd file.
pt.iterload('traj.nc', 'prmtop').save('traj.dcd', overwrite=True)
Sometimes you need to perform very long analysis (hours), you need to save the output to
disk to do further analysis. You have options to save data to different files and write
code to load the data back. However, you can use pytraj.to_pickle
and
pytraj.read_pickle
to save the state of data. Check the example:
In [32]: traj3 = pt.datafiles.load_trpcage()
In [33]: data = pt.dssp(traj, ':3-7')
In [34]: data
Out[34]:
(array(['THR:3', 'TRP:4', 'GLU:5', 'ASN:6', 'GLY:7'],
dtype='<U5'), array([['0', '0', '0', '0', '0'],
['0', '0', 'S', '0', '0'],
['0', '0', 'S', '0', '0'],
...,
['0', '0', '0', '0', '0'],
['0', '0', 'S', '0', '0'],
['0', '0', 'S', '0', '0']],
dtype='<U1'), <pytraj.DatasetList with 8 datasets>
none_avg
[ 1. 0.8 0.8 ..., 1. 0.8 0.8]
para_avg
[ 0. 0. 0. ..., 0. 0. 0.]
anti_avg
[ 0. 0. 0. ..., 0. 0. 0.]
...
turn_avg
[ 0. 0. 0. ..., 0. 0. 0.]
bend_avg
[ 0. 0.2 0.2 ..., 0. 0.2 0.2])
In [35]: pt.to_pickle(data, 'my_data.pk')
# load the data's state back for further analysis
In [36]: pt.read_pickle('my_data.pk')
Out[36]:
(array(['THR:3', 'TRP:4', 'GLU:5', 'ASN:6', 'GLY:7'],
dtype='<U5'), array([['0', '0', '0', '0', '0'],
['0', '0', 'S', '0', '0'],
['0', '0', 'S', '0', '0'],
...,
['0', '0', '0', '0', '0'],
['0', '0', 'S', '0', '0'],
['0', '0', 'S', '0', '0']],
dtype='<U1'), <pytraj.DatasetList with 8 datasets>
none_avg
[ 1. 0.8 0.8 ..., 1. 0.8 0.8]
para_avg
[ 0. 0. 0. ..., 0. 0. 0.]
anti_avg
[ 0. 0. 0. ..., 0. 0. 0.]
...
turn_avg
[ 0. 0. 0. ..., 0. 0. 0.]
bend_avg
[ 0. 0.2 0.2 ..., 0. 0.2 0.2])
$ cat > radgyr_mpi.py <<EOF
import pytraj as pt
# import mpi4py to get rank of each core
from mpi4py import MPI
comm = MPI.COMM_WORLD
# load trajectory to each core. Use iterload to save memory
traj = pt.iterload('tz2.nc', 'tz2.parm7')
# compute radgyr by sending this method to pt.pmap_mpi function
data = pt.pmap_mpi(pt.radgyr, traj, '@CA')
# data is sent to first core (rank=0)
if comm.rank == 0:
# save data
# pt.to_pickle(data, 'data.pk')
print(data)
EOF
$ # run
$ mpirun -n 4 python radgyr_mpi.py
# you can reload data by `pt.read_pickle('data.pk')
# send pt.radgyr method to `pt.pmap` function
# need to specify the number of cores
# (choose wisely)
In [37]: pt.pmap(pt.radgyr, traj, '@CA', n_cores=4)
Out[37]:
OrderedDict([('RoG_00000',
array([ 8.1092, 7.7643, 8.0969, ..., 6.3806, 6.2414, 6.4899]))])
This does not work with ipython-notebook but it’s still good for interactive ipython
In [106]: import pytraj as pt
In [107]: pt.info('radgyr')
[<name>] [<mask1>] [out <filename>] [mass] [nomax] [tensor]
Calculate radius of gyration of atoms in <mask>