pytraj.cluster.kmeans(traj=None, mask='*', n_clusters=10, random_point=True, kseed=1, maxit=100, metric='rms', top=None, frame_indices=None, options='', dtype='ndarray')
Perform clustering and return the cluster index for each frame.
Parameters
- traj : Trajectory-like or iterable that produces Frame
- mask : str, default: * (all atoms)
- n_clusters : int, default: 10
- random_point : bool, default: True
- kseed : int, default: 1
- maxit : int, default: 100
- metric : str, {'rms', 'dme'}
- top : Topology, optional, default: None
- frame_indices : {None, 1D array-like}, optional
- options : str, optional
  - Sieve options
  - Output options
  - Coordinate output options
- dtype : str, default: 'ndarray'
Returns
- 1D numpy array of cluster indices, one per frame
Notes
- See the examples for how to pass extra settings through options.
- Install libcpptraj with the -openmp flag to speed up this calculation.
Examples
>>> import pytraj as pt
>>> from pytraj.cluster import kmeans
>>> traj = pt.datafiles.load_tz2()
>>> # use default options
>>> kmeans(traj)
array([8, 8, 6, ..., 0, 0, 0], dtype=int32)
>>> # update n_clusters
>>> data = kmeans(traj, n_clusters=5)
>>> # update n_clusters with CA atoms
>>> data = kmeans(traj, n_clusters=5, mask='@CA')
>>> # specify distance metric
>>> data = kmeans(traj, n_clusters=5, mask='@CA', kseed=100, metric='dme')
>>> # add sieve number for less memory
>>> data = kmeans(traj, n_clusters=5, mask='@CA', kseed=100, metric='rms', options='sieve 5')
>>> # add sieve number for less memory, and specify random seed for sieve
>>> data = kmeans(traj, n_clusters=5, mask='@CA', kseed=100, metric='rms', options='sieve 5 sieveseed 1')
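Because the function returns one cluster index per frame, a common next step is to group frame indices by cluster and count how many frames fall into each one. The following is a minimal sketch using only the standard library; the helper name `frames_by_cluster` and the `labels` list are hypothetical stand-ins (not part of pytraj), with `labels` playing the role of the array returned by `kmeans`.

```python
from collections import defaultdict


def frames_by_cluster(labels):
    """Group frame indices by cluster index.

    labels: per-frame cluster indices, e.g. the array returned by kmeans().
    (Hypothetical helper for illustration; not part of pytraj.)
    """
    groups = defaultdict(list)
    for frame_idx, cluster_idx in enumerate(labels):
        # int() also normalizes numpy integer scalars to plain Python ints
        groups[int(cluster_idx)].append(frame_idx)
    return dict(groups)


# Hypothetical labels standing in for the output of kmeans(traj, n_clusters=3)
labels = [2, 2, 0, 1, 0, 0]

groups = frames_by_cluster(labels)

# Number of frames assigned to each cluster
populations = {cluster: len(frames) for cluster, frames in groups.items()}
```

With real data, `labels` would be replaced by the array returned from `kmeans(traj, ...)`; the grouped frame indices could then be used to select frames belonging to a given cluster.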