pytraj.cluster.kmeans(traj=None, mask='*', n_clusters=10, random_point=True, kseed=1, maxit=100, metric='rms', top=None, frame_indices=None, options='', dtype='ndarray')
Perform clustering and return the cluster index for each frame.
Parameters:
    traj : Trajectory-like or iterable that produces Frame
    mask : str, default: '*' (all atoms)
    n_clusters : int, default: 10
    random_point : bool, default: True
    kseed : int, default: 1
    maxit : int, default: 100
    metric : str, {'rms', 'dme'}, default: 'rms'
    top : Topology, optional, default: None
    frame_indices : {None, 1D array-like}, optional
    options : str, optional
        Extra clustering options given as a single string. The options fall into three groups: sieve options, output options, and coordinate output options.
    dtype : str, default: 'ndarray'
Returns:
    1D numpy array of frame indices
Notes
- For options, check the examples below.
- Install libcpptraj with the -openmp flag to speed up this calculation.
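The examples on this page pass 'sieve 5' in options to reduce memory use. The sketch below illustrates why that can help, under two assumptions that are not stated on this page: that a regular sieve keeps every N-th frame starting from the first one, and that memory is dominated by a pairwise distance matrix over the clustered frames. It uses plain NumPy only; sieved_frames and n_pairwise_distances are hypothetical helper names, not pytraj functions, and the trajectory length is made up.

```python
import numpy as np


def sieved_frames(n_frames, sieve):
    """Frame indices kept by a regular sieve (assumed: every `sieve`-th frame from 0)."""
    return np.arange(0, n_frames, sieve)


def n_pairwise_distances(n):
    """Number of unique frame pairs in a symmetric pairwise distance matrix."""
    return n * (n - 1) // 2


n_frames = 100  # hypothetical trajectory length
kept = sieved_frames(n_frames, sieve=5)

# Compare the number of pairwise distances with and without sieving
full_pairs = n_pairwise_distances(n_frames)
sieved_pairs = n_pairwise_distances(len(kept))
```

Under these assumptions, a sieve of 5 keeps one fifth of the frames, and the number of pairwise distances shrinks by roughly the square of that factor.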
Examples
>>> import pytraj as pt
>>> from pytraj.cluster import kmeans
>>> traj = pt.datafiles.load_tz2()
>>> # use default options
>>> cluster_data = kmeans(traj)
>>> cluster_data.cluster_index
array([8, 8, 6, ..., 0, 0, 0], dtype=int32)
>>> cluster_data.centroids
array([95, 34, 42, 40, 71, 10, 12, 74, 1, 64], dtype=int32)
>>> # update n_clusters
>>> data = kmeans(traj, n_clusters=5)
>>> # update n_clusters with CA atoms
>>> data = kmeans(traj, n_clusters=5, mask='@CA')
>>> # specify distance metric
>>> data = kmeans(traj, n_clusters=5, mask='@CA', kseed=100, metric='dme')
>>> # add sieve number for less memory
>>> data = kmeans(traj, n_clusters=5, mask='@CA', kseed=100, metric='rms', options='sieve 5')
>>> # add sieve number for less memory, and specify random seed for sieve
>>> data = kmeans(traj, n_clusters=5, mask='@CA', kseed=100, metric='rms', options='sieve 5 sieveseed 1')
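Once a cluster index is available, per-cluster populations and member frames can be derived with plain NumPy. The following is a minimal sketch rather than part of pytraj: cluster_index is a small stand-in array with made-up values in the same form as cluster_data.cluster_index above, and cluster_populations and frames_in_cluster are hypothetical helper names introduced for illustration.

```python
import numpy as np


def cluster_populations(cluster_index, n_clusters):
    """Return the number of frames assigned to each cluster."""
    return np.bincount(np.asarray(cluster_index), minlength=n_clusters)


def frames_in_cluster(cluster_index, cluster_id):
    """Return the frame indices assigned to one cluster."""
    return np.flatnonzero(np.asarray(cluster_index) == cluster_id)


# Stand-in for cluster_data.cluster_index (hypothetical: 8 frames, 3 clusters)
cluster_index = np.array([2, 2, 0, 1, 0, 0, 1, 2], dtype=np.int32)

populations = cluster_populations(cluster_index, n_clusters=3)
members_of_0 = frames_in_cluster(cluster_index, 0)
```

With a real result, the same helpers could be called on cluster_data.cluster_index, passing the n_clusters value used for the clustering run.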