pytraj.cluster.kmeans(traj=None, mask='*', n_clusters=10, random_point=True, kseed=1, maxit=100, metric='rms', top=None, frame_indices=None, options='', dtype='ndarray')
Perform clustering and return the cluster index for each frame.
Parameters:
traj : Trajectory-like or iterable that produces Frame
mask : str, default: * (all atoms)
n_clusters : int, default: 10
random_point : bool, default: True
kseed : int, default: 1 (random seed used with random_point)
maxit : int, default: 100
metric : str, {'rms', 'dme'}
top : Topology, optional, default: None
frame_indices : {None, 1D array-like}, optional
options : str, optional
    Extra cpptraj keywords, such as sieve options, output options and coordinate output options (see the cpptraj manual text in the dbscan notes).
Returns:
1D numpy array of frame indices
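The parameters n_clusters, random_point, kseed and maxit correspond to the usual k-means loop. It picks seeded random starting points, assigns every frame to its nearest center, recomputes the centers, and stops when assignments no longer change or after maxit iterations. The sketch below illustrates that loop in plain Python. It is hypothetical: kmeans_sketch and squared_distance are made-up names, and each frame is treated as a generic feature vector with squared Euclidean distance. cpptraj clusters on RMSD or DME instead, so this is not pytraj's implementation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sketch(points, n_clusters=10, kseed=1, maxit=100):
    """Toy k-means (hypothetical helper, not pytraj/cpptraj code).

    Returns (cluster index per point, representative point index per cluster).
    """
    if maxit < 1:
        raise ValueError("maxit must be >= 1")
    rng = random.Random(kseed)  # seeded, so runs are reproducible
    # Start from n_clusters randomly chosen points, in the spirit of random_point=True.
    centers = [list(points[i]) for i in rng.sample(range(len(points)), n_clusters)]
    labels = None
    for _ in range(maxit):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(n_clusters), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    # Representative point per cluster: the member closest to the final center.
    centroids = []
    for k in range(n_clusters):
        members = [i for i, lab in enumerate(labels) if lab == k]
        if members:
            centroids.append(min(members, key=lambda i: squared_distance(points[i], centers[k])))
        else:
            centroids.append(-1)  # empty cluster
    return labels, centroids


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, centroids = kmeans_sketch(points, n_clusters=2, kseed=1, maxit=100)
print(labels, centroids)
```

The representative indices are loosely analogous to the frame indices in cluster_data.centroids shown in the examples. cpptraj may choose its representative frames differently.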
Notes
- For the format of options, check the examples.
- Install libcpptraj with the -openmp flag to speed up this calculation.
Examples
>>> import pytraj as pt
>>> from pytraj.cluster import kmeans
>>> traj = pt.datafiles.load_tz2()
>>> # use default options
>>> cluster_data = kmeans(traj)
>>> cluster_data.cluster_index
array([8, 8, 6, ..., 0, 0, 0], dtype=int32)
>>> cluster_data.centroids
array([95, 34, 42, 40, 71, 10, 12, 74, 1, 64], dtype=int32)
>>> # update n_clusters
>>> data = kmeans(traj, n_clusters=5)
>>> # update n_clusters with CA atoms
>>> data = kmeans(traj, n_clusters=5, mask='@CA')
>>> # specify distance metric
>>> data = kmeans(traj, n_clusters=5, mask='@CA', kseed=100, metric='dme')
>>> # add a sieve value to reduce memory use
>>> data = kmeans(traj, n_clusters=5, mask='@CA', kseed=100, metric='rms', options='sieve 5')
>>> # use a random sieve with a fixed seed (sieveseed applies together with 'random')
>>> data = kmeans(traj, n_clusters=5, mask='@CA', kseed=100, metric='rms', options='sieve 5 random sieveseed 1')
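Sieving saves memory because a pairwise distance cache kept in memory holds one value per pair of frames, N(N-1)/2 values in total. Keeping one frame in five therefore cuts that count roughly 25-fold. The arithmetic below is a hypothetical sketch with illustrative helpers sieve_frames and n_pairs. Two selection rules are assumptions, not cpptraj's code: that a regular sieve keeps every sieve-th frame, and that a random sieve (the random and sieveseed keywords) keeps the same number of frames drawn with a seeded generator.

```python
import random


def sieve_frames(n_frames, sieve, random_sieve=False, sieveseed=None):
    """Frames clustered directly under a sieve (hypothetical helper).

    Assumes a regular sieve keeps every `sieve`-th frame and a random sieve
    keeps the same number of frames, chosen with a seeded RNG.
    """
    kept = list(range(0, n_frames, sieve))
    if not random_sieve:
        return kept
    rng = random.Random(sieveseed)
    return sorted(rng.sample(range(n_frames), len(kept)))


def n_pairs(n):
    """Number of unique pairwise distances among n frames: n(n-1)/2."""
    return n * (n - 1) // 2


n_frames = 1000  # hypothetical trajectory length
kept = sieve_frames(n_frames, 5)
print(len(kept))                              # 200
print(n_pairs(n_frames), n_pairs(len(kept)))  # 499500 19900
```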
pytraj.cluster.dbscan(traj=None, mask='', options='', dtype='dataset')
DBSCAN clustering. Limited support.
Parameters:
traj : Trajectory-like or any iterable that produces Frame
mask : str
dtype : str
top : Topology, optional
options : str
Notes
Call pytraj._verbose() to see more output. Turn it off with pytraj._verbose(False).
cpptraj manual:
Algorithms:
[hieragglo [epsilon <e>] [clusters <n>] [linkage|averagelinkage|complete]
[epsilonplot <file>]]
[dbscan minpoints <n> epsilon <e> [sievetoframe] [kdist <k> [kfile <prefix>]]]
[dpeaks epsilon <e> [noise] [dvdfile <density_vs_dist_file>]
[choosepoints {manual | auto}]
[distancecut <distcut>] [densitycut <densitycut>]
[runavg <runavg_file>] [deltafile <file>] [gauss]]
[kmeans clusters <n> [randompoint [kseed <seed>]] [maxit <iterations>]
[{readtxt|readinfo} infofile <file>]
Distance metric options: {rms | srmsd | dme | data}
{ [[rms | srmsd] [<mask>] [mass] [nofit]] | [dme [<mask>]] |
[data <dset0>[,<dset1>,...]] }
[sieve <#> [random [sieveseed <#>]]] [loadpairdist] [savepairdist] [pairdist <name>]
[pairwisecache {mem | none}]
Output options:
[out <cnumvtime>] [gracecolor] [summary <summaryfile>] [info <infofile>]
[summarysplit <splitfile>] [splitframe <comma-separated frame list>]
[clustersvtime <filename> cvtwindow <window size>]
[cpopvtime <file> [normpop | normframe]] [lifetime]
[sil <silhouette file prefix>]
Coordinate output options:
[ clusterout <trajfileprefix> [clusterfmt <trajformat>] ]
[ singlerepout <trajfilename> [singlerepfmt <trajformat>] ]
[ repout <repprefix> [repfmt <repfmt>] [repframe] ]
[ avgout <avgprefix> [avgfmt <avgfmt>] ]
Experimental options:
[[drawgraph | drawgraph3d] [draw_tol <tolerance>] [draw_maxit <iterations>]]
Cluster structures based on coordinates (RMSD/DME) or given data set(s).
<crd set> can be created with the 'createcrd' command.
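The two coordinate metrics compare different things. RMSD compares atom positions directly, so the rms metric superimposes the structures first unless nofit is given. DME compares each structure's internal atom-pair distances, so it is unchanged by rigid translation or rotation and needs no fit. The sketch below uses hypothetical helpers rmsd_nofit and dme and evaluates both on a rigidly shifted copy. It omits the fitting step and uses a common DME definition (root-mean-square difference over unique atom pairs). cpptraj's exact normalization may differ.

```python
import math
from itertools import combinations


def rmsd_nofit(a, b):
    """RMSD between two conformations without superposition (cf. the 'nofit' keyword)."""
    total = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
                for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(total / len(a))


def dme(a, b):
    """Distance-matrix error: RMS difference of all intra-structure atom-pair distances."""
    diffs = [
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
        for i, j in combinations(range(len(a)), 2)
    ]
    return math.sqrt(sum(diffs) / len(diffs))


ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 10.0, y, z) for x, y, z in ref]  # rigid translation by 10 along x
print(rmsd_nofit(ref, shifted))  # 10.0
print(dme(ref, shifted))         # 0.0
```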
Examples
>>> import pytraj as pt
>>> traj = pt.datafiles.load_tz2()
>>> data = pt.cluster.dbscan(traj, mask='@CA', options='epsilon 1.7 minpoints 5')
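In the example above, epsilon 1.7 is the neighborhood radius in the units of the distance metric. minpoints 5 is the number of frames a neighborhood must contain for its center to count as a core point. The minimal sketch below runs DBSCAN over a precomputed frame-to-frame distance matrix. It is the textbook algorithm, not cpptraj's code, and dbscan_sketch is a hypothetical name. It also counts a frame as its own neighbor, which may differ from cpptraj's convention. Frames labeled -1 are noise.

```python
def dbscan_sketch(dist, epsilon, minpoints):
    """Textbook DBSCAN on a square distance matrix (hypothetical helper).

    Returns one label per frame: 0, 1, ... for clusters, -1 for noise.
    """
    n = len(dist)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Region query: all frames within epsilon of frame i, including i itself.
        return [j for j in range(n) if dist[i][j] <= epsilon]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < minpoints:
            labels[i] = -1  # noise for now; may later be claimed as a border frame
            continue
        cluster += 1  # i is a core frame: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border frame
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= minpoints:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


# 1D toy "frames": two dense groups and one outlier.
positions = [0.0, 0.5, 1.0, 1.5, 10.0, 10.5, 11.0, 50.0]
dist = [[abs(a - b) for b in positions] for a in positions]
print(dbscan_sketch(dist, epsilon=1.0, minpoints=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Raising minpoints to 5 on the same data leaves no core frames, so every frame is labeled noise.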