Modified version of the FIt-SNE implementation (https://github.com/KlugerLab/FIt-SNE/)
that exposes argument names and defaults within CytoExploreR.
This function should not be used directly; data should instead be mapped
using cyto_map.
fftRtsne( X, dims = 2, perplexity = 30, theta = 0.5, max_iter = 750, fft_not_bh = TRUE, ann_not_vptree = TRUE, stop_early_exag_iter = 250, exaggeration_factor = 12, no_momentum_during_exag = FALSE, start_late_exag_iter = -1, late_exag_coeff = 1, mom_switch_iter = 250, momentum = 0.5, final_momentum = 0.8, learning_rate = "auto", n_trees = 50, search_k = -1, rand_seed = -1, nterms = 3, intervals_per_integer = 1, min_num_intervals = 50, K = -1, sigma = -30, initialization = "pca", max_step_norm = 5, load_affinities = NULL, fast_tsne_path = NULL, nthreads = 0, perplexity_list = NULL, get_costs = FALSE, df = 1 )
Argument | Description |
---|---|
X | a matrix containing the data to be mapped. |
dims | dimensionality of the embedding, set to 2 by default. |
perplexity | used to determine the bandwidth of the Gaussian kernel in the input space, set to 30 by default. |
theta | set to 0 for exact t-SNE. If non-zero, either Barnes Hut or FIt-SNE is used, depending on fft_not_bh. For Barnes Hut, theta determines the accuracy of the approximation. Set to 0.5 by default. |
max_iter | number of iterations of t-SNE to run, set to 750 by default. |
fft_not_bh | if theta is non-zero, this determines whether to use the FIt-SNE or the Barnes Hut approximation, set to TRUE by default to use FIt-SNE. |
ann_not_vptree | whether to use approximate nearest neighbours (TRUE) or vp-trees as in bhtsne (FALSE), set to TRUE by default to use approximate nearest neighbours. |
stop_early_exag_iter | when to switch off early exaggeration, set to 250 by default. |
exaggeration_factor | coefficient for early exaggeration (>1), set to 12 by default. |
no_momentum_during_exag | set to FALSE (0) to use momentum and other optimisation tricks. Can be set to TRUE (1) to do plain, vanilla gradient descent (useful for testing large exaggeration coefficients). Set to FALSE by default. |
start_late_exag_iter | when to start late exaggeration, set to -1 by default to not use late exaggeration. |
late_exag_coeff | late exaggeration coefficient, set to 1 by default to not use late exaggeration. |
mom_switch_iter | iteration number to switch from momentum to final_momentum, set to 250 by default. |
momentum | initial value of momentum, set to 0.5 by default. |
final_momentum | value of momentum to use later in the optimisation, set to 0.8 by default. |
learning_rate | either the desired numeric learning rate or 'auto'. 'auto' sets the learning rate to N/exaggeration_factor, where N is the sample size, or to 200 if N/exaggeration_factor < 200. Set to 'auto' by default. |
n_trees | when using Annoy, the number of search trees to use, set to 50 by default. |
search_k | when using Annoy, the number of nodes to inspect during search. Set to -1 by default, which uses 3*perplexity*n_trees (or K*n_trees when using fixed sigma). |
rand_seed | seed for random initialisation, set to -1 by default to initialise random number generator with current time. |
nterms | if using FIt-SNE, this is the number of interpolation points per sub-interval, set to 3 by default. |
intervals_per_integer | see min_num_intervals. |
min_num_intervals | let maxloc = ceil(max(max(X))) and minloc = floor(min(min(X))), i.e. the points lie in the hypercube [minloc, maxloc]^dims. The number of intervals in each dimension is either min_num_intervals or ceil((maxloc - minloc)/intervals_per_integer), whichever is larger. min_num_intervals must be an integer > 0, and intervals_per_integer must be > 0. Defaults are min_num_intervals = 50 and intervals_per_integer = 1. |
K | number of nearest neighbours to get when using fixed sigma, set to -1 by default. |
sigma | fixed sigma value to use when perplexity==-1, set to -30 by default. |
initialization | 'pca', 'random', or an N x dims array used to initialise the solution, set to 'pca' by default. |
max_step_norm | maximum distance that a point is allowed to move on one iteration. Larger steps are clipped to this value. This prevents possible instabilities during gradient descent. Set to -1 to switch it off. Set to 5 by default. |
load_affinities | if 1, input similarities are loaded from a file and not computed. If 2, input similarities are saved into a file. If 0, affinities are neither saved nor loaded. |
fast_tsne_path | path to FItSNE executable. |
nthreads | number of threads to use, set to 0 by default to use all available threads. |
perplexity_list | if perplexity==0, a combination of perplexities is used, with values taken from perplexity_list. Set to NULL by default. |
get_costs | logical indicating whether the KL-divergence costs computed every 50 iterations should be returned, set to FALSE by default. |
df | positive numeric that controls the degrees of freedom of the t-distribution. The actual number of degrees of freedom is 2*df-1, so the standard t-SNE choice of 1 degree of freedom corresponds to df=1. Large df approximates a Gaussian kernel. df<1 corresponds to heavier tails, which can often resolve substructure in the embedding. See Kobak et al. (2019) for details. Set to 1 by default. |
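
Several of the defaults above are derived from other arguments rather than fixed. The following is a minimal sketch of those rules as the table describes them. The helper names (`auto_learning_rate`, `default_search_k`, `n_intervals`, `actual_dof`) are hypothetical and are not part of CytoExploreR or FIt-SNE; the use of `perplexity == -1` to detect fixed sigma is an assumption taken from the `sigma` entry.

```r
# Hypothetical helpers that mirror the rules described in the argument table.
# They are illustrative only and are not exported by CytoExploreR or FIt-SNE.

# learning_rate = "auto": N / exaggeration_factor, but never below 200.
auto_learning_rate <- function(n, exaggeration_factor = 12) {
  max(200, n / exaggeration_factor)
}

# search_k = -1: 3 * perplexity * n_trees, or K * n_trees with fixed sigma.
# Assumption: fixed sigma is in use when perplexity == -1.
default_search_k <- function(perplexity = 30, n_trees = 50, K = -1) {
  if (perplexity == -1) K * n_trees else 3 * perplexity * n_trees
}

# Number of interpolation intervals per dimension for a set of points Y.
n_intervals <- function(Y, min_num_intervals = 50, intervals_per_integer = 1) {
  maxloc <- ceiling(max(Y))
  minloc <- floor(min(Y))
  max(min_num_intervals, ceiling((maxloc - minloc) / intervals_per_integer))
}

# Degrees of freedom of the t-distribution implied by the df argument.
actual_dof <- function(df = 1) 2 * df - 1

auto_learning_rate(1000)    # 1000 / 12 is below 200, so 200 is used
auto_learning_rate(120000)  # 120000 / 12 = 10000
default_search_k()          # 3 * 30 * 50 = 4500
n_intervals(matrix(c(-40.2, 35.7, 10, -3), ncol = 2))  # ceiling(36 - (-41)) = 77
actual_dof(1)               # standard t-SNE: 1 degree of freedom
```

For small samples the 'auto' learning rate is therefore pinned at 200, and the interval count only exceeds min_num_intervals once the points span more than min_num_intervals * intervals_per_integer units.
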
Linderman, G., Rachh, M., Hoskins, J., Steinerberger, S., Kluger, Y. (2019). Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nature Methods. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6402590/.