FIt-SNE Based on Kluger Lab FIt-SNE

Modified version of the https://github.com/KlugerLab/FIt-SNE/ implementation that exposes argument names and defaults within CytoExploreR. This function should not be used directly; data should instead be mapped using cyto_map.

fftRtsne(
  X,
  dims = 2,
  perplexity = 30,
  theta = 0.5,
  max_iter = 750,
  fft_not_bh = TRUE,
  ann_not_vptree = TRUE,
  stop_early_exag_iter = 250,
  exaggeration_factor = 12,
  no_momentum_during_exag = FALSE,
  start_late_exag_iter = -1,
  late_exag_coeff = 1,
  mom_switch_iter = 250,
  momentum = 0.5,
  final_momentum = 0.8,
  learning_rate = "auto",
  n_trees = 50,
  search_k = -1,
  rand_seed = -1,
  nterms = 3,
  intervals_per_integer = 1,
  min_num_intervals = 50,
  K = -1,
  sigma = -30,
  initialization = "pca",
  max_step_norm = 5,
  load_affinities = NULL,
  fast_tsne_path = NULL,
  nthreads = 0,
  perplexity_list = NULL,
  get_costs = FALSE,
  df = 1
)

Arguments

X	a matrix containing the data to be mapped.
dims	dimensionality of the embedding, set to 2 by default.
perplexity	used to determine the bandwidth of the Gaussian kernel in the input space, set to 30 by default.
theta	set to 0 for exact t-SNE. If non-zero, either Barnes-Hut or FIt-SNE is used, depending on fft_not_bh. When Barnes-Hut is used, theta determines the accuracy of the approximation. Set to 0.5 by default.
max_iter	number of iterations of t-SNE to run, set to 750 by default.
fft_not_bh	if theta is non-zero, this determines whether to use FIt-SNE (TRUE) or the Barnes-Hut approximation (FALSE), set to TRUE by default for FIt-SNE.
ann_not_vptree	whether to use approximate nearest neighbours via Annoy (TRUE) or vp-trees as in bhtsne (FALSE), set to TRUE by default.
stop_early_exag_iter	when to switch off early exaggeration, set to 250 by default.
exaggeration_factor	coefficient for early exaggeration (>1), set to 12 by default.
no_momentum_during_exag	set to FALSE (default) to use momentum and other optimisation tricks. Can be set to TRUE to do plain, vanilla gradient descent (useful for testing large exaggeration coefficients).
start_late_exag_iter	when to start late exaggeration, set to -1 by default to not use late exaggeration.
late_exag_coeff	late exaggeration coefficient, set to 1 by default to not use late exaggeration.
mom_switch_iter	iteration number to switch from momentum to final_momentum, set to 250 by default.
momentum	initial value of momentum, set to 0.5 by default.
final_momentum	value of momentum to use later in the optimisation, set to 0.8 by default.
learning_rate	set to the desired learning rate or to 'auto' (default), which sets the learning rate to N/exaggeration_factor, where N is the sample size, or to 200 if N/exaggeration_factor < 200.
n_trees	when using Annoy, the number of search trees to use, set to 50 by default.
search_k	when using Annoy, the number of nodes to inspect during search. The default (-1) uses 3*perplexity*n_trees (or K*n_trees when using fixed sigma).
rand_seed	seed for random initialisation, set to -1 by default to initialise random number generator with current time.
nterms	if using FIt-SNE, this is the number of interpolation points per sub-interval, set to 3 by default.
intervals_per_integer	see min_num_intervals.
min_num_intervals	let maxloc = ceil(max(max(X))) and minloc = floor(min(min(X))), computed over the current embedding coordinates, so that the points lie in the box [minloc, maxloc]^dims. The number of intervals in each dimension is either min_num_intervals or ceil((maxloc - minloc)/intervals_per_integer), whichever is larger. min_num_intervals must be an integer > 0, and intervals_per_integer must be > 0. Defaults are min_num_intervals = 50 and intervals_per_integer = 1.
K	number of nearest neighbours to get when using fixed sigma, set to -1 by default.
sigma	fixed sigma value to use when perplexity==-1, set to -30 by default.
initialization	'pca', 'random', or an N x dims matrix used to initialise the solution, set to 'pca' by default.
max_step_norm	maximum distance that a point is allowed to move on one iteration. Larger steps are clipped to this value. This prevents possible instabilities during gradient descent. Set to -1 to switch it off. Set to 5 by default.
load_affinities	if 1, input similarities are loaded from a file and not computed. If 2, input similarities are saved into a file. If 0, affinities are neither saved nor loaded.
fast_tsne_path	path to FItSNE executable.
nthreads	number of threads to use, set to 0 by default to use all available threads.
perplexity_list	if perplexity == 0, a combination of perplexities is used, with values taken from perplexity_list. Set to NULL by default.
get_costs	logical indicating whether the KL-divergence costs computed every 50 iterations should be returned, set to FALSE by default.
df	positive numeric that controls the degree of freedom of the t-distribution. The actual degree of freedom is 2*df-1. The standard t-SNE choice of 1 degree of freedom corresponds to df=1. Large df approximates a Gaussian kernel. df<1 corresponds to heavier tails, which can often resolve substructure in the embedding. See Kobak et al. (2019) for details. Default is 1.0.
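
To illustrate how perplexity sets the bandwidth of the Gaussian kernel, the sketch below uses the standard t-SNE calibration: a bisection search on the precision beta = 1/(2*sigma^2) until the perplexity of one point's conditional distribution matches the target. It is a base R illustration under that assumption only; calibrate_beta is a hypothetical helper, not part of CytoExploreR or FIt-SNE, which perform this step in compiled code.

```r
# Hypothetical helper (illustration only): bisection on beta = 1 / (2 * sigma^2)
# so that the perplexity exp(H) of one point's conditional distribution
# matches the target. Entropy H is measured in nats.
calibrate_beta <- function(d2, perplexity = 30, tol = 1e-5, max_iter = 100) {
  target <- log(perplexity)
  beta <- 1
  lo <- 0
  hi <- Inf
  for (i in seq_len(max_iter)) {
    w <- exp(-beta * (d2 - min(d2)))      # shifting by min(d2) avoids underflow
    p <- w / sum(w)
    h <- -sum(p[p > 0] * log(p[p > 0]))   # Shannon entropy in nats
    if (abs(h - target) < tol) break
    if (h > target) {
      lo <- beta                           # too flat: narrow the kernel
      beta <- if (is.finite(hi)) (beta + hi) / 2 else beta * 2
    } else {
      hi <- beta                           # too peaked: widen the kernel
      beta <- (beta + lo) / 2
    }
  }
  list(beta = beta, p = p, perplexity = exp(h))
}

set.seed(1)
X <- matrix(rnorm(200 * 5), ncol = 5)
d2 <- colSums((t(X[-1, ]) - X[1, ])^2)     # squared distances from point 1
res <- calibrate_beta(d2, perplexity = 30)
res$perplexity                             # close to 30
```

Larger perplexity values produce a smaller beta, that is, a wider Gaussian kernel that spreads probability over more neighbours.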
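
The 'auto' rule described for learning_rate reduces to a single max(). The helper below is hypothetical and simply restates that documented rule in base R.

```r
# Hypothetical helper restating the documented 'auto' learning-rate rule:
# N / exaggeration_factor, but never less than 200.
auto_learning_rate <- function(n, exaggeration_factor = 12) {
  max(200, n / exaggeration_factor)
}

auto_learning_rate(1000)    # 200, because 1000 / 12 < 200
auto_learning_rate(120000)  # 10000
```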
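
The default for search_k can be sketched the same way, assuming -1 is resolved exactly as the search_k description states. default_search_k is a hypothetical helper, and it ignores the perplexity_list case (perplexity == 0), which the description above does not cover.

```r
# Hypothetical helper: resolve search_k = -1 to the documented default.
# Perplexity-based bandwidths: 3 * perplexity * n_trees.
# Fixed sigma (perplexity == -1): K * n_trees.
default_search_k <- function(search_k = -1, perplexity = 30, n_trees = 50, K = -1) {
  if (search_k != -1) return(search_k)     # an explicit value is kept as-is
  if (perplexity == -1) K * n_trees else 3 * perplexity * n_trees
}

default_search_k()                         # 4500
default_search_k(perplexity = -1, K = 90)  # 4500
```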
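
The interval count described under min_num_intervals and intervals_per_integer can be computed directly from a set of embedding coordinates. The sketch below follows the formula as written in that description; n_intervals and the toy matrices are hypothetical.

```r
# Hypothetical helper following the min_num_intervals description: the points
# lie in [minloc, maxloc] per dimension, and the grid uses
# max(min_num_intervals, ceil((maxloc - minloc) / intervals_per_integer)).
n_intervals <- function(Y, min_num_intervals = 50, intervals_per_integer = 1) {
  stopifnot(min_num_intervals >= 1, intervals_per_integer > 0)
  maxloc <- ceiling(max(Y))
  minloc <- floor(min(Y))
  max(min_num_intervals, ceiling((maxloc - minloc) / intervals_per_integer))
}

Y_small <- matrix(c(-10.2, 3.5, 7.9, -4.1), ncol = 2)
n_intervals(Y_small)  # 50: the span from -11 to 8 is only 19
Y_large <- matrix(c(-60.5, 20, 45.2, -10), ncol = 2)
n_intervals(Y_large)  # 107: the span from -61 to 46
```

Because the count never drops below min_num_intervals, compact embeddings (typical early in the optimisation) still get a reasonably fine interpolation grid.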
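
The effect of df can be seen by evaluating the output-space kernel at a few distances. This sketch assumes the heavy-tailed kernel form w(d) = (1 + d^2/df)^(-df) described by Kobak et al. (2019); t_kernel is a hypothetical helper, not the FIt-SNE implementation.

```r
# Assumed heavy-tailed kernel (Kobak et al. 2019 form): (1 + d^2 / df)^(-df).
# df = 1 gives the standard t-SNE (Cauchy) kernel 1 / (1 + d^2);
# as df grows the kernel approaches the Gaussian exp(-d^2).
t_kernel <- function(d, df = 1) {
  stopifnot(df > 0)
  (1 + d^2 / df)^(-df)
}

d <- c(0, 1, 2, 5)
t_kernel(d, df = 1)    # 1.0, 0.5, 0.2, about 0.0385
t_kernel(d, df = 0.5)  # heavier tails: larger values at every d > 0
t_kernel(d, df = 1e6)  # close to exp(-d^2)
```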

References

Linderman, G., Rachh, M., Hoskins, J., Steinerberger, S., Kluger, Y. (2019). Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nature Methods. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6402590/.
