Random projection is an interesting concept. It centres on the Johnson-Lindenstrauss lemma, which essentially says: give me an error tolerance, and I will give you a projection to a lower dimension such that the distances between points are nearly preserved (up to that error).
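
In its common quantitative form (the same one used in the code below), the lemma says: for any 0 < eps < 1 and any n points, there is a linear map f into k dimensions, with

    k >= 4 * log(n) / (eps^2 / 2 - eps^3 / 3)

such that for every pair of points u and v

    (1 - eps) * ||u - v||^2  <=  ||f(u) - f(v)||^2  <=  (1 + eps) * ||u - v||^2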

This is interesting and can be used for dimension reduction problems, in a similar way to PCA. However, its approach is very different. With PCA we in essence say: we want to project to a (given) lower dimension, give me the matrix which minimises the reconstruction error. The distinction is subtle enough to be worth considering.
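
For contrast, here is a minimal sketch of the PCA version of that statement using base R's `prcomp`; the matrix `X` and target dimension `k` are purely illustrative:

# PCA: one deterministic answer for a given target dimension k
X <- matrix(rnorm(100 * 10), nrow=100)  # 100 records, 10 features of made-up data
k <- 2
pca <- prcomp(X, center=TRUE)
X_reduced <- pca$x[, 1:k]               # scores on the first k principal components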

Another difference, as indicated in the title of this post, is that random projections (RP) are random! Where PCA basically has one solution, RP has potentially infinitely many. Here is some R code which produces random projections, based on several papers freely available on the internet.


#' random projections

#' starting with a matrix A, with n records and m features,
#' we want to reduce it to E, which is n by k
#' 
#' what we need to do is produce a matrix R which is m by k
#' and fill it with 0, -1, 1 (each scaled by sqrt(3)) with probabilities 2/3, 1/6, 1/6
#' 
#' then we just multiply A by R to find E; there are
#' various shortcuts to make it really fast, but let's not worry about that
#' 
#' this code is a mix of the sklearn code `random_projection.py` and also here: 
#' http://www.cs.ucsc.edu/~optas/papers/jl.pdf

johnson_lindenstrauss_min_dim <- function(n_samples, eps=0.1) {
  denominator <- (eps ^ 2 / 2) - (eps ^ 3 / 3)
  # cap at n_samples: n points always fit exactly in at most n dimensions,
  # so a larger target dimension is never needed
  return(min(floor(4 * log(n_samples) / denominator), n_samples))
}
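
As a quick sanity check of the bound, using the function above, one million points need the following target dimensions:

johnson_lindenstrauss_min_dim(1e6, eps=0.5)  # 663
johnson_lindenstrauss_min_dim(1e6, eps=0.1)  # 11841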

generate_random_matrix <- function(n=1, rows=NULL, cols=NULL) {
  # entries are sqrt(3) * {-1, 0, 1} with probabilities 1/6, 2/3, 1/6,
  # i.e. Achlioptas' sparse random projection matrix
  if (is.null(rows) | is.null(cols)) {
    # no dimensions given: return a plain vector of n random entries
    return(sample(sqrt(3)*c(-1, 1, 0), n, replace=TRUE, prob=c(1/6, 1/6, 2/3)))
  } else {
    R <- sample(sqrt(3)*c(-1, 1, 0), rows*cols, replace=TRUE, prob=c(1/6, 1/6, 2/3))
    dim(R) <- c(rows, cols)
    return(R)
  }
}
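
To convince yourself the sqrt(3) scaling is sensible (each entry then has mean 0 and variance 1, which keeps squared distances unbiased after projection), inspect a large sample; the dimensions below are arbitrary:

set.seed(1)
R <- generate_random_matrix(rows=1000, cols=100)
mean(R)               # close to 0
var(as.vector(R))     # close to 1, since 3 * (1/6 + 1/6) = 1
table(R) / length(R)  # roughly 1/6, 2/3, 1/6 for -sqrt(3), 0, sqrt(3)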

#' generate the projection as required; if the target dimension is not
#' specified we will use the dimension given by
#' `johnson_lindenstrauss_min_dim` as defined above.
random_projection <- function(A, n_features=NULL, eps=0.1) {
  # convert to matrix if the format is a dataframe.
  if (is.data.frame(A)) {
    # check that every column is numeric
    if (!all(sapply(A, is.numeric))) {
      warning("Not all columns are numeric. Non-numeric columns will be ignored.")
    }
    A <- as.matrix(A[, sapply(A, is.numeric)])        
  }
  
  get_dim <- dim(A)
  if (is.null(n_features)) {
    # the JL bound depends on the number of samples (rows), not features;
    # cap at the original feature count since we want to reduce the number of features
    n_features <- min(johnson_lindenstrauss_min_dim(get_dim[1], eps=eps), get_dim[2])
  }
  R <- (1/sqrt(n_features)) * generate_random_matrix(rows=get_dim[2], cols=n_features)
  
  return(A %*% R)
}


rowA <- 4
colA <- 4
A <- rnorm(rowA*colA)
dim(A) <- c(rowA, colA)

E <- random_projection(A, n_features=2)
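
The 4x4 toy example above is too small for the lemma to say anything meaningful. For a rough check of the distance-preservation claim you want many features and a less aggressive reduction; the sizes below are arbitrary:

set.seed(42)
n <- 50
m <- 5000
X <- matrix(rnorm(n * m), nrow=n)
E <- random_projection(X, n_features=1000)

# ratios of projected to original pairwise distances should sit close to 1
summary(as.vector(dist(E) / dist(X)))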