Example for dimension reduction (SVD vs Random Projection) in R

I am learning about dimension reduction techniques in R. I took one image as input and reduced its dimension with SVD using this code:
library(raster)
img <- raster("C:/Users/***/Pictures/pansy.jpg")
img_flip <- flip(img, direction = "y")
img <- t(as.matrix(img_flip))
dim(img)
image(img,col=grey(seq(0,1,length=256))) # Actual
img_svd <- svd(img)
u <- img_svd$u
d <- diag(img_svd$d)
v <- img_svd$v
u1 <- as.matrix(u[, 1:50])
d1 <- as.matrix(d[1:50, 1:50])
v1 <- as.matrix(v[, 1:50])
photo1 <- u1 %*% d1 %*% t(v1)
image(photo1, col = grey(seq(0, 1, length = 256))) #Reduced
After that, I got output like this:
Then I learned about random projection and wanted to compare it with SVD.
I followed the steps below for the same image:
Convert the image into a matrix [465 X 600]
Create a random matrix of size [600 X 300] filled with +1 and -1 with equal probability
Multiply the two matrices to get a matrix of size [465 X 300]
Code:
rp_img <- raster("C:/Users/***/Pictures/pansy.jpg")
img_flip <- flip(rp_img, direction = "y")
rp_img <- t(as.matrix(img_flip))
rm <- form_sparse_matrix(600,n_cols = ncol(rp_img),TRUE,0.5,method = "probability") # To create a random matrix with +1 and -1 of equal probability
photo2 <- rp_img %*% rm
image(photo2, col = grey(seq(0, 1, length = 256)))
After that, I viewed the image, and it looks like this:
Are my steps for performing random projection correct? Where have I made a mistake?
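For reference, the three steps listed above can be sketched in base R. This is only an illustration under stated assumptions: a matrix of random grey values stands in for the image, and the 1/sqrt(k) scaling is added here and is not part of the original steps. The projected matrix lives in a different 300-column space, so it is not expected to look like the picture. What a random projection approximately preserves is the distance between rows.

```r
set.seed(1)
n <- 465; d <- 600; k <- 300

# Stand-in for the image matrix (assumption: random grey values in [0, 1])
X <- matrix(runif(n * d), nrow = n, ncol = d)

# Random matrix with entries +1 / -1 drawn with equal probability
R <- matrix(sample(c(-1, 1), d * k, replace = TRUE), nrow = d, ncol = k)

# Project; the 1/sqrt(k) scaling keeps distances on the original scale
Y <- (X %*% R) / sqrt(k)
dim(Y)

# Compare pairwise distances between the first 20 rows before and after
d_orig <- as.numeric(dist(X[1:20, ]))
d_proj <- as.numeric(dist(Y[1:20, ]))
ratio <- d_proj / d_orig
range(ratio)  # values near 1 mean the distances are roughly preserved
```

A fair comparison with the SVD result would therefore look at how well distances or downstream results are preserved, rather than at how the projected matrix looks when drawn with image().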

Related

Add a gaussian blur with specified sigma to a satellite image using R

I am trying to add a Gaussian blur with sigma = 0.5 to a satellite image. I have found that the package spatialEco has a function called gaussian.kernel which does exactly what I want. So basically I convert my raster image into a matrix, then I create a Gaussian matrix with sigma = 0.5, and finally I multiply those two matrices. After the multiplication I receive an error: Error in gm * m : non-conformable arrays. I am guessing it's because I don't set the n parameter of gaussian.kernel to match the dimensions of my matrix. Can someone help me? Here is the code:
library(raster)
library(spatialEco)
pan = raster("path/pan.tif")
m = as.matrix(pan)
dim(m) #2775 1435 are the dimensions of the matrix.
gm = spatialEco::gaussian.kernel(sigma = 0.5, n = ????)
wm = gm * m
Just use raster.gaussian.smooth directly on the RasterLayer object as in:
smoothed = spatialEco::raster.gaussian.smooth(pan, sigma = 0.5)
plot(pan)
plot(smoothed)
Gaussian filters do not work the way you described. The kernel is much smaller
than the image, because submatrices of the image are convolved with the kernel
in so-called focal or moving window operations. Just give the source code a look:
# This is the code for spatialEco::gaussian.kernel
gaussian.kernel <- function(sigma=2, n=5)
{
m <- matrix(ncol=n, nrow=n)
mcol <- rep(1:n, n)
mrow <- rep(1:n, each=n)
x <- mcol - ceiling(n/2)
y <- mrow - ceiling(n/2)
m[cbind(mrow, mcol)] <- 1/(2*pi*sigma^2) * exp(-(x^2+y^2)/(2*sigma^2))
m / sum(m)
}
# This is the code for spatialEco::raster.gaussian.smooth
raster.gaussian.smooth <- function(x, sigma = 2, n = 5, type = mean, ...) {
if (!inherits(x, "RasterLayer")) stop("MUST BE RasterLayer OBJECT")
gm <- gaussian.kernel(sigma=sigma, n=n)
return(raster::focal(x, w = gm, fun = type, na.rm=TRUE, pad=FALSE, ...) )
}
The code for raster::focal can be found in the raster package's GitHub repository. Also, for your information, matrix multiplication in R is done with the %*% operator; * multiplies element by element.
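To make the moving window idea concrete, here is a minimal base-R sketch that works on a plain matrix. The helper names make_kernel and smooth_matrix are hypothetical and are not part of spatialEco or raster. Border cells are simply left as NA, which is an assumption of this sketch and not a statement about what raster::focal does.

```r
# Small normalised Gaussian kernel of odd size n (hypothetical helper)
make_kernel <- function(sigma = 0.5, n = 3) {
  stopifnot(n %% 2 == 1)
  half <- (n - 1) / 2
  offs <- seq(-half, half)
  k <- outer(offs, offs, function(dx, dy) exp(-(dx^2 + dy^2) / (2 * sigma^2)))
  k / sum(k)
}

# Slide the kernel over the matrix; each output cell is the weighted
# sum of the window centred on it (element-wise *, then sum)
smooth_matrix <- function(m, k) {
  half <- (nrow(k) - 1) / 2
  out <- matrix(NA_real_, nrow(m), ncol(m))
  for (i in (1 + half):(nrow(m) - half)) {
    for (j in (1 + half):(ncol(m) - half)) {
      window <- m[(i - half):(i + half), (j - half):(j + half)]
      out[i, j] <- sum(window * k)
    }
  }
  out
}

m <- matrix(1:80, nrow = 10, ncol = 8)
k <- make_kernel(sigma = 0.5, n = 3)
s <- smooth_matrix(m, k)
dim(k)  # the kernel stays 3 x 3 regardless of the image size
dim(s)  # the result has the same shape as the input
```

Note that window * k is valid here because both are 3 x 3. The error in the question comes from applying * to a kernel and a full image of different dimensions.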

Generating 3D data with cube as a decision surface

I am new to R. My task is to write a function that simulates 500 observations of three standard normal variables, x, y and z.
I am then to use a cube as a decision surface and categorize the observations by whether they fall inside or outside the cube.
Below is my code. I am able to plot the 3D data, but I am not sure how to categorize the observations into two classes.
library(scatterplot3d)
set.seed (1234)
nObs <- 500
x <- matrix (rnorm (1.25*nObs), ncol =2)
y <- matrix (rnorm (1.25*nObs), ncol =2)
z <- matrix (rnorm (1.25*nObs), ncol =2)
mSample <- function(nObs,x,y,z){
x1 <- rnorm(1,x)
x1[y==1,] <- x[y==1,] + 1
mSample <- as_tibble(rbind(mvnfast::rmvn(x,y = y1,z = z1), mvnfast::rmvn(x,y = y1,z = z1)))
mSample[1:x1, 1.25] <- 0
mSample[(x1 + 1):(x1 + 1), 1.25] <- 1
mSample <- mSample[sample(nrow(mSample)), ]
colnames(mSample <- c("x", "y", "class"))
mSample
}
spl <- scatterplot3d(x,y,z)
spl <- scatterplot3d(x,y,z,pch=16,highlight.3d=TRUE)
I had a similar question to this recently. Basically, to know whether a given point is inside or outside a cube, you first need to know the size of the cube. In the test below, cubeLength acts as half the edge length of a cube centred at the origin.
Then, simply iterate over all the points (nObs) and do an if statement
if (x > -cubeLength && x < cubeLength && y > -cubeLength && y < cubeLength && z > -cubeLength && z < cubeLength) {
label <- "positive" # inside the cube
} else {
label <- "negative" # outside the cube
}
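The same test can be written without a loop. The following is a minimal sketch, assuming a cube centred at the origin and a half edge length of 1; the name halfEdge and the value 1 are chosen here only for illustration.

```r
set.seed(1234)
nObs <- 500

# Three independent standard normal variables
pts <- data.frame(x = rnorm(nObs), y = rnorm(nObs), z = rnorm(nObs))

halfEdge <- 1  # assumption: cube spans [-1, 1] on every axis

# A point is inside the cube only if all three coordinates are within range
inside <- with(pts, abs(x) < halfEdge & abs(y) < halfEdge & abs(z) < halfEdge)
pts$class <- factor(ifelse(inside, "inside", "outside"),
                    levels = c("inside", "outside"))

table(pts$class)
```

The class column can then drive the point colours in a 3D plot, for example through the color argument of scatterplot3d.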

How to clip an isosurface to a ball?

Consider the Togliatti implicit surface. I want to clip it to the ball centered at the origin with radius 4.8. One solution, with the misc3d package, is to use the mask argument of the computeContour3d function, which restricts the computation to the points satisfying x^2+y^2+z^2 < 4.8^2:
library(misc3d)
# Togliatti surface equation: f(x,y,z) = 0
f <- function(x,y,z){
w <- 1
64*(x-w)*
(x^4-4*x^3*w-10*x^2*y^2-4*x^2*w^2+16*x*w^3-20*x*y^2*w+5*y^4+16*w^4-20*y^2*w^2) -
5*sqrt(5-sqrt(5))*(2*z-sqrt(5-sqrt(5))*w)*(4*(x^2+y^2-z^2)+(1+3*sqrt(5))*w^2)^2
}
# make grid
nx <- 220; ny <- 220; nz <- 220
x <- seq(-5, 5, length=nx)
y <- seq(-5, 5, length=ny)
z <- seq(-4, 4, length=nz)
g <- expand.grid(x=x, y=y, z=z)
# calculate voxel
voxel <- array(with(g, f(x,y,z)), dim = c(nx,ny,nz))
# mask: keep points satisfying x^2+y^2+z^2 < 4.8^2, in order to
# clip the surface to the ball of radius 4.8
mask <- array(with(g, x^2+y^2+z^2 < 4.8^2), dim = c(nx,ny,nz))
# compute isosurface
surf <- computeContour3d(voxel, maxvol=max(voxel), level=0, mask=mask, x=x, y=y, z=z)
# draw isosurface
drawScene.rgl(makeTriangles(surf, smooth=TRUE))
But the borders of the resulting surface are irregular:
How to get regular, smooth borders?
The solution I found resorts to spherical coordinates. It consists of defining the function f in terms of spherical coordinates (ρ, θ, ϕ), computing the isosurface with ρ running from 0 to the desired radius, and then transforming the result to Cartesian coordinates:
# Togliatti surface equation with spherical coordinates
f <- function(ρ, θ, ϕ){
w <- 1
x <- ρ*cos(θ)*sin(ϕ)
y <- ρ*sin(θ)*sin(ϕ)
z <- ρ*cos(ϕ)
64*(x-w)*
(x^4-4*x^3*w-10*x^2*y^2-4*x^2*w^2+16*x*w^3-20*x*y^2*w+5*y^4+16*w^4-20*y^2*w^2) -
5*sqrt(5-sqrt(5))*(2*z-sqrt(5-sqrt(5))*w)*(4*(x^2+y^2-z^2)+(1+3*sqrt(5))*w^2)^2
}
# make grid
nρ <- 300; nθ <- 400; nϕ <- 300
ρ <- seq(0, 4.8, length = nρ) # ρ runs from 0 to the desired radius
θ <- seq(0, 2*pi, length = nθ)
ϕ <- seq(0, pi, length = nϕ)
g <- expand.grid(ρ=ρ, θ=θ, ϕ=ϕ)
# calculate voxel
voxel <- array(with(g, f(ρ,θ,ϕ)), dim = c(nρ,nθ,nϕ))
# calculate isosurface
surf <- computeContour3d(voxel, maxvol=max(voxel), level=0, x=ρ, y=θ, z=ϕ)
# transform to Cartesian coordinates
surf <- t(apply(surf, 1, function(rtp){
ρ <- rtp[1]; θ <- rtp[2]; ϕ <- rtp[3]
c(
ρ*cos(θ)*sin(ϕ),
ρ*sin(θ)*sin(ϕ),
ρ*cos(ϕ)
)
}))
# draw isosurface
drawScene.rgl(makeTriangles(surf, smooth=TRUE, color = "violetred"))
Now the resulting surface has regular, smooth borders:
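As a side note, the conversion to Cartesian coordinates can also be written without apply, which may be noticeably faster for a large number of vertices. This is a small sketch: the helper name sph2cart is hypothetical, and it assumes a matrix whose columns are (ρ, θ, ϕ) in that order, as in the code above. ASCII names are used for portability.

```r
# Vectorised spherical -> Cartesian conversion (hypothetical helper).
# rtp: matrix with columns rho, theta (azimuth), phi (polar angle)
sph2cart <- function(rtp) {
  rho <- rtp[, 1]; theta <- rtp[, 2]; phi <- rtp[, 3]
  cbind(
    rho * cos(theta) * sin(phi),
    rho * sin(theta) * sin(phi),
    rho * cos(phi)
  )
}

# Two test points: one on the x axis, one on the z axis
rtp <- rbind(c(2, 0, pi / 2),
             c(3, 0, 0))
xyz <- sph2cart(rtp)
xyz
```

The distance of each converted point from the origin equals its ρ, so every vertex produced with ρ up to 4.8 lies inside the ball by construction.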
Your solution is excellent for the problem you stated, because spherical coordinates are so natural for that boundary. However, here is a more general solution that would work for other smooth boundaries.
The idea is to allow input of a boundary function, and cull points when they are too large or too small. In your case it would be the squared distance from the origin, and you would want to cull points where the value is bigger than 4.8^2. But sometimes the triangles being drawn to make the smooth surface should only be partially culled: one point would be kept and two deleted, or two kept and one deleted. If you cull the whole triangle that leads to the jagged edges in your original plot.
To fix this, the points can be modified. If only one is supposed to be kept, then the other two points can be shrunk towards it until they lie on an approximation to the boundary. If two are supposed to be kept you want the shape to be a quadrilateral, so you would build that out of two triangles.
This function does that, assuming the input surf is the output of computeContour3d:
boundSurface <- function(surf, boundFn, bound = 0, greater = TRUE) {
# surf is n x 3: each row is a point, consecutive triplets of rows are triangles
values <- matrix(boundFn(surf) - bound, 3)
# values is 3 x (m = n/3): each column holds the boundFn values at the
# three vertices of one triangle
if (!greater)
values <- -values
keep <- values >= 0
# counts is m vector counting number of points to keep in each triangle
counts <- apply(keep, 2, sum)
# result is initialized to an empty array
result <- matrix(nrow = 0, ncol = 3)
# singles is set to all the rows of surf where exactly one
# point in the triangle is kept, say s x 3
singles <- surf[rep(counts == 1, each = 3),]
if (length(singles)) {
# singleValues is a subset of values where only one vertex is kept
singleValues <- values[, counts == 1]
singleIndex <- 3*col(singleValues) + 1:3 - 3
# good is the index of the vertex to keep, bad are those to fix
good <- apply(singleValues, 2, function(col) which(col >= 0))
bad <- apply(singleValues, 2, function(col) which(col < 0))
for (j in 1:ncol(singleValues)) {
goodval <- singleValues[good[j], j]
for (i in 1:2) {
badval <- singleValues[bad[i,j], j]
alpha <- goodval/(goodval - badval)
singles[singleIndex[bad[i,j], j], ] <-
(1-alpha)*singles[singleIndex[good[j], j],] +
alpha *singles[singleIndex[bad[i,j], j],]
}
}
result <- rbind(result, singles)
}
doubles <- surf[rep(counts == 2, each = 3),]
if (length(doubles)) {
# doubleValues is a subset of values where two vertices are kept
doubleValues <- values[, counts == 2]
doubleIndex <- 3*col(doubleValues) + 1:3 - 3
doubles2 <- doubles
# good is the index of the vertex to keep, bad are those to fix
good <- apply(doubleValues, 2, function(col) which(col >= 0))
bad <- apply(doubleValues, 2, function(col) which(col < 0))
newvert <- matrix(NA, 2, 3)
for (j in 1:ncol(doubleValues)) {
badval <- doubleValues[bad[j], j]
for (i in 1:2) {
goodval <- doubleValues[good[i,j], j]
alpha <- goodval/(goodval - badval)
newvert[i,] <-
(1-alpha)*doubles[doubleIndex[good[i,j], j],] +
alpha *doubles[doubleIndex[bad[j], j],]
}
doubles[doubleIndex[bad[j], j],] <- newvert[1,]
doubles2[doubleIndex[good[1,j], j],] <- newvert[1,]
doubles2[doubleIndex[bad[j], j],] <- newvert[2,]
}
result <- rbind(result, doubles, doubles2)
}
# Finally add all the rows of surf where the whole
# triangle is kept
rbind(result, surf[rep(counts == 3, each = 3),])
}
You would use it after computeContour3d and before makeTriangles, e.g.
fn <- function(x) {
apply(x^2, 1, sum)
}
drawScene.rgl(makeTriangles(boundSurface(surf, fn, bound = 4.8^2,
greater = FALSE),
smooth = TRUE))
Here's the output I see:
It's not quite as good as yours, but it would work for many different boundary functions.
Edited to add: Version 0.100.26 of rgl now has a function clipMesh3d which incorporates these ideas.
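The vertex-shrinking step inside boundSurface can be illustrated on a single edge. The sketch below uses two made-up points, one inside and one outside the ball of radius 4.8, and the same alpha formula as the function. The sign convention matches greater = FALSE, so kept points have non-negative values.

```r
boundFn <- function(p) sum(p^2)   # squared distance from the origin
bound <- 4.8^2

good <- c(4, 0, 0)   # inside the ball: kept
bad  <- c(6, 0, 0)   # outside the ball: to be moved

# With greater = FALSE the values are negated, so "keep" means value >= 0
goodval <- -(boundFn(good) - bound)
badval  <- -(boundFn(bad) - bound)

# Linear interpolation towards the point where the value would be zero
alpha <- goodval / (goodval - badval)
newpt <- (1 - alpha) * good + alpha * bad

alpha
newpt
sqrt(sum(newpt^2))  # near 4.8, not exactly 4.8
```

Because the interpolation is linear in the boundary function and not in the radius, the new vertex lands close to the sphere but not exactly on it, which is consistent with the "approximation to the boundary" described above.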

Multi-data likelihood function and mle2 function from bbmle package in R

I have written a custom likelihood function that fits a multi-data model that integrates mark-recapture and telemetry data (sensu Royle et al. 2013 Methods in Ecology and Evolution). The likelihood function is designed to be flexible in terms of whether and how many covariates are specified for different linear models in different likelihood components, which is determined by values supplied as function arguments (i.e., data matrices "detcovs" and "dencovs" in my code). The likelihood function works when I directly supply it to optimization functions (e.g., optim or nlm), but does not play nice with the mle2 function in the bbmle package. My problem is that I continually run into the following error: "some named arguments in 'start' are not arguments to the specified log-likelihood function". This is my first attempt at writing custom likelihood functions, so I'm sure there are general coding conventions of which I'm unaware that make such tasks much more efficient and amenable to the mle2 function. Below is my likelihood function, code creating the starting value objects, and code calling the mle2 function. Any advice on how to solve the error and general comments on writing cleaner functions are welcome. Many thanks in advance.
Edit: As requested, I have simplified the likelihood function and provided code to simulate reproducible data to which the model can be fit. Included in the simulation code are 2 custom functions and use of the raster function from the raster package. Hopefully, I have sufficiently simplified everything to enable others to troubleshoot. Again, many thanks for your help!
Jared
Likelihood function:
CSCR.RSF.intlik2.EXAMPLE <- function(alpha0,sigma,alphas=NULL,betas=NULL,n0,yscr=NULL,K=NULL,X=X,trapcovs=NULL,Gden=NULL,Gdet=NULL,ytel=NULL,stel=NULL,
dencovs=NULL,detcovs=NULL){
#
# this version of the code handles a covariate on log(Density). This is starting value 5
#
# start = vector of starting values
# yscr = nind x ntraps encounter matrix
# K = number of occasions
# X = trap locations
# Gden = matrix with grid cell coordinates for density raster
# Gdet = matrix with grid cell coordinates for RSF raster
# dencovs = all covariate values for all nGden pixels in density raster
# trapcovs = covariate value at trap locations
# detcovs = all covariate values for all nGrsf pixels in RSF raster
# ytel = nguys x nGdet matrix of telemetry fixes in each nGdet pixels
# stel = home range center of telemetered individuals, IF you wish to estimate it. Not necessary
# alphas = starting values for RSF/detfn coefficients excluding sigma and intercept
# alpha0 = starting values for RSF/detfn intercept
# sigma = starting value for RSF/detfn sigma
# betas = starting values for density function coefficients
# n0 = starting value for number of undetected individuals on log scale
#
n0 = exp(n0)
nGden = nrow(Gden)
D = e2dist(X,Gden)
nGdet <- nrow(Gdet)
alphas = alphas
loglam = alpha0 -(1/(2*sigma*sigma))*D*D + as.vector(trapcovs%*%alphas) # ztrap recycled over nG
psi = exp(as.vector(dencovs%*%betas))
psi = psi/sum(psi)
probcap = 1-exp(-exp(loglam))
#probcap = (exp(theta0)/(1+exp(theta0)))*exp(-theta1*D*D)
Pm = matrix(NA,nrow=nrow(probcap),ncol=ncol(probcap))
ymat = yscr
ymat = rbind(yscr,rep(0,ncol(yscr)))
lik.marg = rep(NA,nrow(ymat))
for(i in 1:nrow(ymat)){
Pm[1:length(Pm)] = (dbinom(rep(ymat[i,],nGden),rep(K,nGden),probcap[1:length(Pm)],log=TRUE))
lik.cond = exp(colSums(Pm))
lik.marg[i] = sum( lik.cond*psi )
}
nv = c(rep(1,length(lik.marg)-1),n0)
part1 = lgamma(nrow(yscr)+n0+1) - lgamma(n0+1)
part2 = sum(nv*log(lik.marg))
out = -1*(part1+ part2)
lam = t(exp(a0 - (1/(2*sigma*sigma))*t(D2)+ as.vector(detcovs%*%alphas)))# recycle zall over all ytel guys
# lam is now nGdet x nG!
denom = rowSums(lam)
probs = lam/denom # each column is the probs for a guy at column [j]
tel.loglik = -1*sum( ytel*log(probs) )
out = out + tel.loglik
out
}
Data simulation code:
library(raster)
library(bbmle)
e2dist <- function (x, y){
i <- sort(rep(1:nrow(y), nrow(x)))
dvec <- sqrt((x[, 1] - y[i, 1])^2 + (x[, 2] - y[i, 2])^2)
matrix(dvec, nrow = nrow(x), ncol = nrow(y), byrow = F)
}
spcov <- function(R) {
v <- sqrt(nrow(R))
D <- as.matrix(dist(R))
V <- exp(-D/2)
cov1 <- t(chol(V)) %*% rnorm(nrow(R))
Rd <- as.data.frame(R)
colnames(Rd) <- c("x", "y")
Rd$C <- as.numeric((cov1 - mean(cov1)) / sd(cov1))
return(Rd)
}
set.seed(1234)
co <- seq(0.3, 0.7, length=5)
X <- cbind(rep(co, each=5),
rep(co, times=5))
B <- 10
co <- seq(0, 1, length=B)
Z <- cbind(rep(co, each=B), rep(co, times=B))
dencovs <- cbind(spcov(Z),spcov(Z)[,3]) # ordered as reading raster image from left to right, bottom to top
dimnames(dencovs)[[2]][3:4] <- c("dencov1","dencov2")
denr.list <- vector("list",2)
for(i in 1:2){
denr.list[[i]] <- raster(
list(x=seq(0,1,length=10),
y=seq(0,1,length=10),
z=t(matrix(dencovs[,i+2],10,10,byrow=TRUE)))
)
}
B <- 20
co <- seq(0, 1, length=B)
Z <- cbind(rep(co, each=B), rep(co, times=B))
detcovs <- cbind(spcov(Z),spcov(Z)[,3]) # ordered as reading raster image from left to right, bottom to top
dimnames(detcovs)[[2]][3:4] <- c("detcov1","detcov2")
detr.list <- vector("list",2) # the loop below fills detr.list
trapcovs <- matrix(0,nrow(X),2) # one row per trap; J = nrow(X) is only defined further down
for(i in 1:2){
detr.list[[i]] <- raster(
list(x=seq(0,1,length=20),
y=seq(0,1,length=20),
z=t(matrix(detcovs[,i+2],20,20,byrow=TRUE)))
)
trapcovs[,i] <- extract(detr.list[[i]],X)
}
alpha0 <- -3
sigma <- 0.15
alphas <- c(1,-1)
beta0 <- 3
betas <- c(-1,1)
pixelArea <- (dencovs$y[2] - dencovs$y[1])^2
mu <- exp(beta0 + as.matrix(dencovs[,3:4])%*%betas)*pixelArea
EN <- sum(mu)
N <- rpois(1, EN)
pi <- mu/sum(mu)
s <- dencovs[sample(1:nrow(dencovs), size=N, replace=TRUE, prob=pi),1:2]
J <- nrow(X)
K <- 10
yc <- d <- p <- matrix(NA, N, J)
D <- e2dist(s,X)
loglam <- t(alpha0 - t((1/(2*sigma*sigma))*D*D) + as.vector(trapcovs%*%alphas))
p <- 1-exp(-exp(loglam))
for(i in 1:N) {
for(j in 1:J) {
yc[i,j] <- rbinom(1, K, p[i,j])
}
}
detected <- apply(yc>0, 1, any)
yscr <- yc[detected,]
ntel <- 5
nfixes <- 100
poss.tel <- which(s[,1]>0.2 & s[,1]<0.8 & s[,2]>0.2 & s[,2]<0.8)
stel.id <- sample(poss.tel,ntel)
stel <- s[stel.id,]
ytel <- matrix(NA,ntel,nrow(detcovs))
d <- e2dist(stel,detcovs[,1:2])
lam <- t(exp(1 - t((1/(2*sigma*sigma))*d*d) + as.vector(as.matrix(detcovs[,3:4])%*%alphas)))
for(i in 1:ntel){
ytel[i,] <- rmultinom(1,nfixes,lam[i,]/sum(lam[i,]))
}
Specify starting values and call mle2 function:
start1 <- list(alpha0=alpha0,sigma=sigma,alphas=alphas,betas=betas,n0=log(N-nrow(yscr)))
parnames(CSCR.RSF.intlik2.EXAMPLE) <- names(start1)
out1 <- mle2(CSCR.RSF.intlik2.EXAMPLE,start=start1,method="SANN",optimizer="optim",
data=list(yscr=yscr,K=K,X=X,trapcovs=trapcovs,Gden=dencovs[,1:2],Gdet=detcovs[,1:2],
ytel=ytel,stel=stel,dencovs=as.matrix(dencovs[,3:4]),detcovs=as.matrix(detcovs[,3:4]))
)
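One possible source of this kind of error is a start list that contains vector-valued entries such as alphas and betas, whose elements may end up with names that do not match any argument of the function. A pattern that avoids the issue is a thin wrapper with one scalar argument per coefficient. The sketch below uses a toy regression likelihood and base R's optim so that it runs without extra packages. The names nll_vec and nll_scalar are hypothetical, and whether this resolves the error for the model above is an assumption that would need checking.

```r
# Toy negative log-likelihood with a vector-valued argument (betas)
nll_vec <- function(b0, betas, x, y) {
  mu <- b0 + as.vector(x %*% betas)
  -sum(dnorm(y, mean = mu, sd = 1, log = TRUE))
}

# Wrapper with one scalar argument per coefficient
nll_scalar <- function(b0, beta1, beta2, x, y) {
  nll_vec(b0, betas = c(beta1, beta2), x = x, y = y)
}

set.seed(1)
x <- matrix(rnorm(200), nrow = 100, ncol = 2)
y <- 1 + as.vector(x %*% c(-1, 1)) + rnorm(100)

# Every name in the start list is now a formal argument of the wrapper
start <- list(b0 = 0, beta1 = 0, beta2 = 0)
all(names(start) %in% names(formals(nll_scalar)))

# Base R check that the wrapper can be optimised
fit <- optim(unlist(start), function(p) {
  do.call(nll_scalar, c(as.list(p), list(x = x, y = y)))
})
fit$par

# With bbmle, the wrapper would be passed in the usual way, e.g.
# mle2(nll_scalar, start = start, data = list(x = x, y = y))
```

bbmle also documents a vecpar argument of mle2 for likelihoods that take a single parameter vector, used together with parnames, which is an alternative worth checking in the package documentation.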

Offline orthogonal range counting implementation

It seems that a statistical problem that I am working on requires doing something known in computational geometry as "offline orthogonal range counting":
Suppose I have a set of n points (for the moment, in the plane). For every pair of points i and j, I would like to count the number of remaining points in the set that are in the rectangle whose diagonal is the segment with endpoints i and j. The overall output then is a vector of n(n-1) values each in [0, 1, 2, ... , n-2].
I've seen that a rich literature on the problem (or at least a very similar problem) exists, but I cannot find an implementation. I would prefer an R (a statistical computing language) package, but I guess that's asking too much. An open source C/C++ implementation will also work.
Thanks.
I hope I understand your problem correctly. Here is an implementation in R using the geometry package. I use the
mesh.drectangle function, which computes a signed distance from points p to the boundary of a rectangle.
I create all pairs of points using combn
for each pair, I compute the distance from the rectangle rect_p to the other points
if the distance is < 0, I keep the point.
For example
library(geometry)
## I generate some data
set.seed(1234)
p.x <- sample(1:100,size=30,replace=T)
p.y <- sample(1:100,size=30,replace=T)
points <- cbind(p.x,p.y)
## the algorithm
ll <- combn(1:nrow(points),2,function(x){
# order the corners so that (x1, y1) is the lower left and (x2, y2) the upper right
x1 <- min(p.x[x]); y1 <- min(p.y[x])
x2 <- max(p.x[x]); y2 <- max(p.y[x])
p <- points[-x,]
d <- mesh.drectangle(p,x1,y1,x2,y2)
res <- NA
if(length(which(d <0))){
points.in = as.data.frame(p,ncol=2)[ d < 0 , ]
res <- list(n = nrow(points.in),
rect = list(x1=x1,x2=x2,y1=y1,y2=y2),
points.in = points.in)
}
res
},simplify=F)
ll <- ll[!is.na(ll)]
## the result
nn <- do.call(rbind,lapply(ll,'[[','n'))
To visualize the results, I plot the rectangles that contain 5 points, for example.
library(grid)
grid.newpage()
vp <- plotViewport(xscale = extendrange(p.x),
yscale = extendrange(p.y))
pushViewport(vp)
grid.xaxis()
grid.yaxis()
grid.points(x=points[,'p.x'],y=points[,'p.y'],pch='*')
cols <- rainbow(length(ll))
ll <- ll[nn == 5] ## here I plot only the rectangle with 5 points
lapply(seq_along(ll),function(i){
x <- ll[[i]]
col <- sample(cols,1)
x1<- x$rect$x1; x2<- x$rect$x2
y1<- x$rect$y1; y2<- x$rect$y2
grid.rect(x=(x1+x2)*.5,y=(y1+y2)*.5,
width= x2-x1,height = y2-y1,
default.units ='native',
gp=gpar(fill=col,col='red',alpha=0.2)
)
grid.points(x=x$points.in$p.x,y=x$points.in$p.y,pch=19,
gp=gpar(col=rep(col,x$n)))
}
)
upViewport()
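For comparison, the counts themselves can be obtained with base R only. This is a brute-force sketch, not one of the efficient range counting algorithms from the computational geometry literature. The helper name count_in_rect is hypothetical, the rectangle is treated as closed (points on the border count), and each unordered pair is reported once, so the n(n-1) ordered values in the question correspond to each count appearing twice.

```r
# Brute-force orthogonal range counting for all pairs (hypothetical helper)
count_in_rect <- function(px, py) {
  pairs <- t(combn(length(px), 2))
  counts <- apply(pairs, 1, function(ij) {
    xr <- range(px[ij])
    yr <- range(py[ij])
    inside <- px >= xr[1] & px <= xr[2] & py >= yr[1] & py <= yr[2]
    sum(inside) - 2  # drop the two corner points themselves
  })
  cbind(i = pairs[, 1], j = pairs[, 2], count = counts)
}

# Four points on a diagonal: (0,0), (2,2), (1,1), (3,3)
px <- c(0, 2, 1, 3)
py <- c(0, 2, 1, 3)
res <- count_in_rect(px, py)
res
```

The work grows as n^3 because every pair is compared against every point, so this is only practical for moderate n.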
